
Prediction of Relative Solvent Accessibility Using Support Vector Regression
Abstract A new method is presented for predicting the relative solvent accessibility of residues from protein primary sequences. The method is based on support vector regression (SVR) and takes local sequence information around each residue as input. Unlike most earlier methods, which only classify a residue's relative solvent accessibility into discrete states, it predicts a continuous value and therefore retains more information about the protein's three-dimensional structure than state classification does. The method was tested on three data sets, RS-126, Manesh-215 and CB-513. Different parameters and window widths were compared to find the best model, and prediction quality was measured with metrics such as mean absolute error (MAE) and correlation coefficient (CC). Under 3-fold cross-validation, MAE was lower and CC was higher on all three data sets than with RVP-Net, an earlier method based on a multilayer feed-forward neural network. Using multiple sequence alignments as input improved further on single-sequence input: on CB-513 the method reached an MAE of 16.8% and a CC of 0.562, compared with 18.8% and 0.480 for RVP-Net. These results indicate that support vector regression is an effective tool for protein sequence analysis.
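As an illustration of the two ingredients the abstract names, a local sequence window as input and MAE and CC as evaluation measures, the following is a minimal sketch in plain Python. The one-hot encoding, the zero padding at chain ends, and the function names (`window_features`, `mean_absolute_error`, `correlation_coefficient`) are assumptions made for this example; the abstract does not state how the authors encoded residues or which window width they chose, and the regression step itself is not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue letters


def window_features(sequence, center, half_width):
    """One-hot encode the residues in a window centred on `center`.

    Assumed encoding: window positions that fall off either end of the
    chain, and unknown residue letters, are encoded as all zeros.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        features.extend(one_hot)
    return features


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical sequence and accessibility values, for demonstration only.
    x = window_features("MKTAYIAK", center=0, half_width=2)
    print(len(x))  # 5 window positions x 20 residue types = 100
    observed = [10.0, 35.0, 60.0, 80.0]
    predicted = [20.0, 30.0, 50.0, 90.0]
    print(mean_absolute_error(observed, predicted))
    print(correlation_coefficient(observed, predicted))
```

Each feature vector produced this way would be paired with the residue's observed relative solvent accessibility as a regression target. Any SVR implementation could then be trained on those pairs and scored with the two metric functions.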
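Source Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》), indexed in CAS, CSCD and the Peking University Core Journals list, 2007, No. 1, pp. 1-5 (5 pages)
Funding Key research project of the high-level university construction program, University of Science and Technology of China
Keywords relative solvent accessibility; support vector machine; machine learning; protein structure prediction; bioinformatics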

References (12)

  • 1. Naderi-Manesh H, Sadeghi M, Arab S, et al. Prediction of protein surface accessibility with information theory [J]. Proteins, 2001, 42:452-459.
  • 2. Rost B, Sander C. Conservation and prediction of solvent accessibility in protein families [J]. Proteins, 1994, 20:216-226.
  • 3. Yuan Z, Burrage K, Mattick JS. Prediction of protein solvent accessibility using support vector machines [J]. Proteins, 2002, 48:566-570.
  • 4. Kim H, Park H. Prediction of protein relative solvent accessibility with support vector machines and long-range interactive 3D local descriptor [J]. Proteins, 2004, 54:557-562.
  • 5. Ahmad S, Gromiha MM, Sarai A. Real value prediction of solvent accessibility from amino acid sequence [J]. Proteins, 2003, 50:629-635.
  • 6. Rost B, Sander C. Improved prediction of protein secondary structure by using sequence profiles and neural network [J]. Proc Natl Acad Sci, 1993, 90:7558-7562.
  • 7. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction [J]. Proteins, 2000, 40:502-511.
  • 8. Hua SJ, Sun ZR. A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach [J]. J Mol Biol, 2001, 308:397-407.
  • 9. Cai CZ, Han LY, Ji ZL, et al. Enzyme family classification by support vector machines [J]. Proteins, 2004, 55:66-76.
  • 10. Li X, Zhang TW, Li L, Guo Z. A study of the effectiveness of decision-tree feature gene selection for SVM [J]. Chinese Journal of Biomedical Engineering, 2004, 23(1):66-72.

