Prediction of Relative Solvent Accessibility Using Support Vector Regression

Abstract: This paper proposes a new method for predicting the relative solvent accessibility (RSA) of residues from protein primary sequences. The method is based on support vector regression (SVR) and uses the local sequence information around each residue as input. Most previous methods only classify residues into discrete accessibility states. This method instead predicts RSA as a continuous value, and therefore retains more information about the protein's three-dimensional structure. Three data sets, RS-126, Manesh-215 and CB-513, were used for evaluation. The best results were obtained by comparing different parameter settings and window sizes, and prediction performance was measured by the mean absolute error (MAE), the correlation coefficient (CC) and related measures. Under 3-fold cross-validation, the method achieved a lower MAE and a higher CC on all three data sets than RVP-Net, a previous method based on a multilayer feed-forward neural network. Using multiple sequence alignments as input further improved the results over single-sequence input: on CB-513 the method reached an MAE of 16.8% and a CC of 0.562, compared with 18.8% and 0.480 for RVP-Net. These results show that support vector regression is an effective tool for protein sequence analysis.
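The abstract describes the pipeline only at a high level: local sequence windows as input, an SVR that outputs a continuous RSA value, and evaluation by MAE and CC under 3-fold cross-validation. The Python sketch below illustrates those steps under assumptions that do not come from the paper. Each residue is encoded as a one-hot window with a 21st slot for padding and non-standard residues. The regressor is a linear ε-insensitive SVR trained by stochastic subgradient descent, a deliberately simple stand-in for the kernel SVR one would normally fit with a library such as LIBSVM. RSA is scaled to [0, 1], so an MAE of 0.168 corresponds to 16.8%. CC is taken to be Pearson's correlation. All function and parameter names (`encode_windows`, `train_linear_svr`, `epsilon`, `lam`, and so on) are hypothetical.

```python
"""Sketch: sliding-window encoding, linear epsilon-SVR and MAE/CC for RSA.

Assumptions (not taken from the paper): one-hot windows with a padding
slot, a linear epsilon-insensitive SVR trained by stochastic subgradient
descent, and RSA targets scaled to [0, 1].
"""
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
ALPHABET = len(AMINO_ACIDS) + 1       # extra slot for padding/unknown


def encode_windows(seq, window=9):
    """One feature vector per residue: one-hot codes of the `window`
    residues centred on it; positions beyond the chain ends and
    non-standard residues use the extra slot."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    rows = []
    for i in range(len(seq)):
        vec = [0.0] * (window * ALPHABET)
        for k, j in enumerate(range(i - half, i + half + 1)):
            if 0 <= j < len(seq) and seq[j] in AMINO_ACIDS:
                slot = AMINO_ACIDS.index(seq[j])
            else:
                slot = ALPHABET - 1  # padding / unknown residue
            vec[k * ALPHABET + slot] = 1.0
        rows.append(vec)
    return rows


def predict(w, b, X):
    """Linear model output w.x + b for every row of X."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def train_linear_svr(X, y, epsilon=0.05, lam=1e-4, lr=0.01, epochs=200):
    """Minimise lam/2*||w||^2 + mean(max(0, |y - (w.x + b)| - epsilon))
    with per-sample subgradient steps, visiting samples in a fixed order."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            r = target - (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # subgradient of the loss w.r.t. the prediction: 0 inside the tube
            g = -1.0 if r > epsilon else (1.0 if r < -epsilon else 0.0)
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return 0.0 if st == 0 or sp == 0 else cov / (st * sp)


def k_fold_indices(n, k=3):
    """Split range(n) (e.g. indices of protein chains) into k contiguous,
    near-equal folds."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

With multiple-sequence-alignment input, one common choice would be to replace each one-hot column with that position's residue-frequency profile from the alignment. The regression and the metrics would stay unchanged. On real data, folds are normally formed from whole protein chains rather than individual residues. This keeps neighbouring residues of one chain from appearing in both the training and the test set.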

Authors: Xu Wenlong, Li Ao, Wang Minghui, Jiang Zhaohui, Feng Huanqing

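Affiliations: Department of Electronic Science and Technology, University of Science and Technology of China; College of Life Science and Bioengineering, Beijing University of Technology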

Source: Chinese Journal of Biomedical Engineering, 2007, No. 1, pp. 1-5 (5 pages). Indexed in CAS, CSCD and the Peking University core journal list.

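Funding: Key Research Project of the High-Level University Construction Program, University of Science and Technology of China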

Keywords: relative solvent accessibility; support vector machine; machine learning; protein structure prediction; bioinformatics

Classification number: Q617 [Biology / Biophysics]

References (12 cited; 10 listed)

[1] Naderi-Manesh H, Sadeghi M, Arab S, et al. Prediction of protein surface accessibility with information theory[J]. Proteins, 2001, 42: 452-459.
[2] Rost B, Sander C. Conservation and prediction of solvent accessibility in protein families[J]. Proteins, 1994, 20: 216-226.
[3] Yuan Z, Burrage K, Mattick JS. Prediction of protein solvent accessibility using support vector machines[J]. Proteins, 2002, 48: 566-570.
[4] Kim H, Park H. Prediction of protein relative solvent accessibility with support vector machines and long-range interactive 3D local descriptor[J]. Proteins, 2004, 54: 557-562.
[5] Ahmad S, Gromiha MM, Sarai A. Real value prediction of solvent accessibility from amino acid sequence[J]. Proteins, 2003, 50: 629-635.
[6] Rost B, Sander C. Improved prediction of protein secondary structure by using sequence profiles and neural network[J]. Proc Natl Acad Sci USA, 1993, 90: 7558-7562.
[7] Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction[J]. Proteins, 2000, 40: 502-511.
[8] Hua SJ, Sun ZR. A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach[J]. J Mol Biol, 2001, 308: 397-407.
[9] Cai CZ, Han LY, Ji ZL, et al. Enzyme family classification by support vector machines[J]. Proteins, 2004, 55: 66-76.
[10] Li X, Zhang TW, Li L, Guo Z. Study on the effectiveness of decision-tree feature gene selection methods for SVM[J]. Chinese Journal of Biomedical Engineering, 2004, 23(1): 66-72. (in Chinese)