
Predicting Disulfide Connectivity Based on Correlation Coefficients Selection (基于特征相关性选择的二硫键预测算法)
Abstract: Disulfide bonds are an important structural feature that helps keep protein structure and function stable. Earlier methods for predicting disulfide connectivity patterns typically applied feature selection and fed the selected features into a machine learning model, without accounting for correlation between features. Building on that conventional pipeline, this paper first selects features by Fisher score, then computes the correlation coefficient for each pair of features in the selected subspace and removes features that are highly linearly correlated. Support vector regression is trained and tested on the remaining features, with 4-fold cross-validation on the benchmark dataset. The authors report promising results relative to previous methods.
Author: Liu Kun (刘坤)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2017, No. 11, pp. 2093-2096, 2117 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61373062, 61371040)
Keywords: bioinformatics; disulfide bond; support vector regression; correlation coefficient; feature selection
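
The feature-filtering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data, the `top_k` and `max_abs_corr` parameters, and the greedy rule of keeping the higher-scoring feature of a correlated pair are all assumptions made for illustration, since the abstract does not specify them. The sketch uses the standard Fisher score (between-class scatter divided by within-class scatter) and the Pearson correlation coefficient, and leaves out the support vector regression and 4-fold cross-validation stage.

```python
from math import sqrt
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score of each feature column: between-class / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating, or constant overall.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_a * var_b)


def select_features(X, y, top_k, max_abs_corr):
    """Keep the top_k features by Fisher score, then drop any feature whose
    absolute correlation with an already kept feature reaches max_abs_corr.
    The greedy keep-the-higher-score rule is an assumption of this sketch."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = []
    for j in ranked[:top_k]:
        col_j = [row[j] for row in X]
        if all(abs(pearson(col_j, [row[i] for row in X])) < max_abs_corr
               for i in kept):
            kept.append(j)
    return kept


# Toy data (hypothetical): feature 1 is an exact linear function of feature 0.
X = [
    [1.0, 2.5, 0.5],
    [1.2, 2.9, 0.1],
    [0.9, 2.3, 0.3],
    [3.0, 6.5, 0.4],
    [3.1, 6.7, 0.2],
    [2.9, 6.3, 0.6],
]
y = [0, 0, 0, 1, 1, 1]  # e.g. 1 = bonded cysteine pair, 0 = not bonded

kept = select_features(X, y, top_k=3, max_abs_corr=0.9)
```

Here `kept` holds the column indices that would be passed on to the regression model. Because features 0 and 1 are linearly dependent, only one of them survives the filter, while the weakly correlated feature 2 is retained.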
