Abstract
Protein structure prediction typically generates a large number of candidate structures (decoys), from which the best prediction must be selected. Clustering is a common selection approach: it partitions the decoys into groups according to their pairwise similarities. Unlike traditional clustering algorithms, spectral clustering can converge to the global optimal solution. Its core idea, "reduce dimensionality first, then cluster," makes it well suited to protein structures, which carry a large amount of information. Selecting docking complexes is more complicated than selecting monomer structures or loop structures. This paper therefore takes protein docking complexes as its research object and proposes a spectral-clustering-based method for selecting the best predicted structure of a docking complex. Experiments on ten datasets compare the proposed method with energy scoring and with a method based on the affinity propagation algorithm. The results show that spectral clustering performs well in selecting docking-complex structures and is able to pick out representative structures.
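The abstract does not spell out the selection pipeline, so the following is only a minimal sketch of generic spectral-clustering decoy selection, not the authors' method. It assumes a precomputed symmetric distance matrix between decoys (for example RMSD), a Gaussian affinity with a hand-picked width `sigma`, the Ng-Jordan-Weiss normalized spectral embedding ("reduce dimensionality first"), and a plain k-means step ("then cluster"). Treating the medoid of the most populated cluster as the "best" prediction is also an assumption. The helper names `spectral_embed`, `kmeans` and `select_best` are hypothetical; only NumPy is used.

```python
import numpy as np


def spectral_embed(dist, k, sigma):
    """Embed decoys into k dimensions (Ng-Jordan-Weiss style).

    dist  : (n, n) symmetric pairwise distances between decoys, e.g. RMSD
            (assumption: the paper's similarity measure may differ).
    sigma : Gaussian kernel width (hypothetical tuning parameter).
    Assumes every decoy has nonzero affinity to at least one other decoy.
    """
    w = np.exp(-dist ** 2 / (2.0 * sigma ** 2))   # Gaussian affinity matrix W
    np.fill_diagonal(w, 0.0)                        # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))       # D^{-1/2} from node degrees
    m = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(m)                     # eigenvalues in ascending order
    u = vecs[:, -k:]                                # k leading eigenvectors
    return u / np.linalg.norm(u, axis=1, keepdims=True)  # unit-length rows


def kmeans(x, k, iters=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])             # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)              # assign each point to nearest center
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):               # centers stopped moving
            break
        centers = new
    return labels


def select_best(dist, k=2, sigma=1.0):
    """Return the index of the medoid of the largest spectral cluster."""
    labels = kmeans(spectral_embed(dist, k, sigma), k)
    biggest = np.bincount(labels).argmax()          # most populated cluster
    members = np.flatnonzero(labels == biggest)
    sub = dist[np.ix_(members, members)]
    return int(members[np.argmin(sub.sum(axis=1))])  # smallest total distance
```

As a toy usage example, decoys placed on a line at positions `0.0, 0.2, 0.25, 0.3, 0.5, 6.0, 6.3` (with `dist` their absolute differences) split into a five-member and a two-member cluster, and `select_best(dist)` returns index 2, the most central member of the larger cluster. The farthest-point initialisation is chosen only to keep the sketch deterministic; random restarts are more common in practice.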
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2015, No. 10, pp. 2365-2368 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61170125)
Keywords
spectral clustering
structure selection
energy scoring
docking complexes