Abstract
The coding scheme is one of the important factors affecting the accuracy of protein secondary structure prediction. For single-sequence protein secondary structure prediction, a new comprehensive coding scheme is proposed. It classifies amino acids according to their propensity factors for each type of secondary structure and their hydrophobicity values, and represents each class of amino acids in binary form. Under identical experimental conditions, the CB513 data set is first encoded with the different coding schemes, and a Support Vector Machine (SVM) is then trained to build the prediction model. The results show that the prediction accuracy of the proposed coding is 1.48% and 10.68% higher than that of the 20-bit orthogonal coding and the 5-bit coding, respectively. The coding therefore appears well suited to structure prediction for non-homologous or low-homology proteins.
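To make the encoding step concrete, the following is a minimal Python sketch of the two ideas the abstract mentions: the 20-bit orthogonal code, and a binary code assigned per amino acid class. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the actual classification, so TOY_GROUPS is an arbitrary placeholder partition, and the function names and the sliding-window layout are likewise assumptions.

```python
from math import ceil, log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Hypothetical placeholder partition of the 20 residues into 4 classes.
# This is NOT the classification proposed in the paper.
TOY_GROUPS = ["AVLIMFWC", "GSTPYNQH", "DE", "KR"]


def orthogonal_encode(residue):
    """20-bit orthogonal (one-hot) code: one bit per amino acid type."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def class_binary_encode(residue, groups=TOY_GROUPS):
    """Binary code of the index of the class that contains `residue`."""
    n_bits = max(1, ceil(log2(len(groups))))
    for index, members in enumerate(groups):
        if residue in members:
            # Most significant bit first.
            return [(index >> shift) & 1 for shift in reversed(range(n_bits))]
    raise ValueError(f"residue {residue!r} is not in any class")


def window_features(sequence, center, half_width, encode):
    """Concatenate the codes of the residues in a window around `center`.

    Positions past either end of the sequence are padded with zeros. With
    the class code this padding coincides with the code of class 0, which
    is a simplification of this sketch.
    """
    pad = [0] * len(encode(AMINO_ACIDS[0]))
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.extend(encode(sequence[pos]))
        else:
            features.extend(pad)
    return features
```

For a window of 2 * half_width + 1 residues, the orthogonal code yields 20 bits per position, while a class-based code with k classes needs only ceil(log2(k)) bits per position. In a setup like the one described, such feature vectors would be paired with the secondary structure label of the central residue and passed to an SVM trainer.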
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2011, Issue 18, pp. 163-165 (3 pages)
Funding
National Natural Science Foundation of China (No. 61070060)
Keywords
coding scheme
protein secondary structure prediction
Support Vector Machine (SVM)