
An Exemplar Selection Algorithm for Protein Structure Clustering

Cited by: 7
Abstract: This paper proposes an algorithm for selecting cluster exemplars (cluster centers) in protein structure clustering. Clustering is an indispensable post-processing step in protein structure prediction, but the quality threshold (QT) clustering algorithm commonly used there depends on an empirically chosen cluster radius, and other clustering algorithms, such as affinity propagation (AP), likewise have parameters that affect the resulting cluster distribution. To remove this dependence on subjective, experience-based parameters, the paper proposes an exemplar selection algorithm (ESA) that analyzes the clustering results obtained under different parameter settings, selects the best exemplar, and thereby determines empirical parameters such as the cluster radius. In experiments on real protein structure data sets, the algorithm selected the best exemplar without prior knowledge of the empirical parameters, and it also supports finding objective clustering parameters suited to a given data set for different clustering algorithms.
Source: Acta Automatica Sinica (《自动化学报》), 2011, No. 6, pp. 682-692 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (60970055).
Keywords: protein structure; clustering; quality threshold (QT); affinity propagation (AP); exemplar selection
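
The abstract's starting point, that a radius-based clustering returns different cluster centers for different radii, can be illustrated with a small sketch. This is not the paper's ESA and not the exact QT algorithm: it is a simplified greedy variant that repeatedly picks the structure with the most neighbors within the radius. The function name `radius_clusters` and the distance matrix `DIST` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def radius_clusters(dist: List[List[float]], radius: float) -> List[Tuple[int, List[int]]]:
    """Greedy radius-based clustering over a symmetric distance matrix.

    Simplified illustration (an assumption, not the paper's method):
    the unassigned point with the most unassigned neighbors within
    `radius` becomes a center; its neighbors form the cluster and are
    removed; the process repeats until every point is assigned.
    Returns (center, members) pairs in the order they were extracted.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for c in sorted(remaining):
            members = [j for j in sorted(remaining) if dist[c][j] <= radius]
            # Strict '>' keeps the lowest index on ties.
            if len(members) > len(best_members):
                best_center, best_members = c, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


# Hypothetical pairwise distances (for example RMSD values) for five structures.
DIST = [
    [0.0, 1.0, 2.0, 6.0, 7.0],
    [1.0, 0.0, 1.0, 5.0, 6.0],
    [2.0, 1.0, 0.0, 4.0, 5.0],
    [6.0, 5.0, 4.0, 0.0, 1.0],
    [7.0, 6.0, 5.0, 1.0, 0.0],
]

for r in (1.0, 4.0):
    print(r, radius_clusters(DIST, r))
```

On this toy matrix the largest cluster is centered on structure 1 at radius 1.0 and on structure 2 at radius 4.0, which is the kind of parameter sensitivity the abstract says the exemplar selection algorithm is meant to resolve.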


