Abstract
This work proposes a novel method for predicting the relative solvent accessibility (RSA) of residues from protein primary sequences. The method is based on support vector regression (SVR) and uses the local sequence information around each residue as input. Most previous methods only classify residues into discrete accessibility states. This method instead predicts continuous RSA values, so it retains more information about the protein's three-dimensional structure than state classification does. Three data sets, RS-126, Manesh-215 and CB-513, were used to evaluate prediction performance. The best results were found by comparing different parameter settings and window widths. Performance was measured by the mean absolute error (MAE) and the correlation coefficient (CC). The method was compared with RVP-Net, a previous method based on a multilayer feed-forward neural network. Under 3-fold cross-validation it gave a lower MAE and a higher CC than RVP-Net on all three data sets. Using multiple sequence alignments as input gave better results than single-sequence input. With this input, the method reached an MAE of 16.8% and a CC of 0.562 on CB-513, compared with 18.8% and 0.480 for RVP-Net. These results demonstrate the efficiency of the method and show that support vector regression is a useful tool for protein sequence analysis.
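The abstract names three concrete ingredients: a window of local sequence around each residue as the regression input, 3-fold cross-validation, and MAE/CC as the evaluation measures. The sketch illustrates them under stated assumptions. It is not the authors' code. The one-hot encoding (20 amino acids plus one padding/unknown slot per window position), the default half-width, and all function names are hypothetical choices for illustration. RSA values are assumed to be fractions in [0, 1], so multiply the MAE by 100 to get a percentage. The SVR training step itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOTS = len(AMINO_ACIDS) + 1  # extra slot marks padding beyond chain ends or unknown residues


def window_features(seq: str, center: int, half_width: int = 7) -> List[float]:
    """Hypothetical encoding: one-hot code every residue in a window centred on `center`.

    The window covers 2 * half_width + 1 positions, each contributing SLOTS values.
    """
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * SLOTS
        if 0 <= pos < len(seq) and seq[pos] in INDEX:
            slot[INDEX[seq[pos]]] = 1.0
        else:
            slot[-1] = 1.0  # position lies outside the chain or is a non-standard residue
        features.extend(slot)
    return features


def k_fold_indices(n: int, k: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Split item indices (e.g. proteins) into k (train, test) folds by striding."""
    folds = []
    for i in range(k):
        test = list(range(i, n, k))
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        folds.append((train, test))
    return folds


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE between observed and predicted RSA values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_cc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted RSA values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    x = window_features("MKTAYIAKQR", center=0, half_width=2)
    print(len(x))  # 5 positions x 21 slots = 105
    print(mean_absolute_error([0.1, 0.5, 0.9], [0.2, 0.4, 0.7]))
    print(pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The folds stride over item indices. In practice these would index whole proteins rather than individual residues, so that residues from one chain never appear in both the training and the test set.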
Source
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)
2007, No. 1, pp. 1-5 (5 pages)
Indexed in: CAS, CSCD, Peking University Core Journals
Funding
Key Research Project for High-Level University Construction, University of Science and Technology of China
Keywords
relative solvent accessibility
support vector machine
machine learning
protein structure prediction
bioinformatics