Abstract
Protein-protein interaction (PPI) networks are an emerging research area in bioinformatics. Spectral clustering has recently played an important role in predicting the functions of unknown proteins, but it requires the number of clusters to be fixed in advance. To address this problem, this paper proposes a spectral clustering algorithm combined with an edge-based score search. The algorithm first preprocesses the PPI data with spectral clustering. It then constructs a score matrix for the edges connecting protein nodes to capture the correlations among data samples, and uses particle swarm optimization (PSO) to determine the optimal threshold for the edge scores. Finally, it obtains the clustering result by breadth-first traversal of the protein nodes. Experiments on a PPI dataset show that the algorithm not only determines the number of clusters automatically but also improves both the precision and the F-measure of the clustering results.
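The pipeline above (score edges, pick a score threshold with PSO, then read clusters off a breadth-first traversal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectral preprocessing step is omitted, the Jaccard similarity of closed neighbourhoods is a hypothetical stand-in for the paper's edge score, and Newman modularity is an assumed PSO fitness, since the abstract defines neither. The function names `edge_scores`, `bfs_clusters`, `modularity` and `pso_threshold` are illustrative.

```python
from collections import deque
import random


def edge_scores(adj):
    """Score every edge (u, v) by the Jaccard similarity of the closed
    neighbourhoods N[u] and N[v]. Hypothetical stand-in for the paper's
    spectral-based edge score."""
    scores = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                a, b = nbrs | {u}, adj[v] | {v}
                scores[(u, v)] = len(a & b) / len(a | b)
    return scores


def bfs_clusters(adj, scores, threshold):
    """Breadth-first traversal that only crosses edges scoring >= threshold.
    Each traversal tree is one cluster, so the cluster count is not preset."""
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen and scores[(min(u, v), max(u, v))] >= threshold:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(comp))
    return clusters


def modularity(adj, clusters):
    """Newman modularity Q of a partition (assumed fitness, not the paper's)."""
    m = sum(len(n) for n in adj.values()) / 2
    q = 0.0
    for comp in clusters:
        members = set(comp)
        internal = sum(1 for u in comp for v in adj[u] if v in members) / 2
        degree = sum(len(adj[u]) for u in comp)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def pso_threshold(fitness, lo=0.0, hi=1.0, n_particles=20, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximises fitness over one scalar (the threshold)."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [fitness(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull towards personal best + pull towards global best.
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep inside [lo, hi]
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest


if __name__ == "__main__":
    # Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    scores = edge_scores(adj)
    t = pso_threshold(lambda x: modularity(adj, bfs_clusters(adj, scores, x)))
    print(t, bfs_clusters(adj, scores, t))
```

Because the traversal only crosses edges at or above the threshold, the number of clusters is a by-product of the threshold rather than an input, and PSO only has to search a one-dimensional interval. On the toy graph the bridge 2-3 scores 1/3 while the triangle edges score 0.75 or 1.0, so any threshold in (1/3, 0.75] splits the graph into the two triangles.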
Source
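Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD (Chinese Science Citation Database); Peking University Core Journals (北大核心)
2012, Issue 7, pp. 2442-2446 (5 pages)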
Funding
National Natural Science Foundation of China (61100164, 61173190)
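Natural Science Basic Research Plan of Shaanxi Province (2010JQ8034)
Fundamental Research Funds for the Central Universities (GK200902016)
Graduate Innovation Fund of Shaanxi Normal University (2011CXS030)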
Keywords
spectral clustering algorithm
particle swarm optimization algorithm
protein-protein interaction network