Abstract
Multi-label machine learning is currently one of the more effective approaches for classifying and predicting the functions of multifunctional proteins. The Random k-Labelsets algorithm (RAkEL) randomly partitions the full label set into subsets of size k and makes its predictions on these subsets. Although this approach takes the correlations between labels into account, random k-label partitioning produces a large number of redundant label subsets, which increases the computational load on the classifiers. This paper improves the traditional RAkEL algorithm: the Apriori algorithm is added to mine association rules among the labels, the labels are partitioned according to the mined rules, and an ensemble of Label Powerset (LP) classifiers is then trained to obtain the final model, which is used for label prediction. The improved multi-label classification algorithm is applied to predict the classes of multifunctional enzymes (one kind of multifunctional protein) and is compared with traditional multi-label classification algorithms. The improved classifier performs better on the relevant evaluation metrics, reaching an average precision (AP) of 92.03%.
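The abstract only outlines the pipeline, so the following Python sketch is a hedged reconstruction of the general idea, not the authors' code. It assumes that each sample's labels are given as a set; that Apriori is run over label co-occurrence with hypothetical thresholds `min_support`, `min_conf` and `max_size`; that every rule X → Y with sufficient confidence contributes the label subset X ∪ Y; that non-maximal subsets are dropped and uncovered labels get singleton subsets; that a 1-nearest-neighbour learner stands in for the unspecified LP base classifier; and that predictions are combined by RAkEL-style per-label voting with a 0.5 threshold. All function and class names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_label_itemsets(label_sets, min_support, max_size):
    """Return {frozenset of labels: support count} for all frequent label itemsets."""
    min_count = min_support * len(label_sets)
    counts = Counter(l for ls in label_sets for l in ls)
    frequent = {frozenset([l]): c for l, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)
    k = 2
    while frequent and k <= max_size:
        # Join (k-1)-itemsets; prune candidates with an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == k and all(frozenset(s) in frequent
                                   for s in combinations(u, k - 1)):
                candidates.add(u)
        frequent = {}
        for c in candidates:
            cnt = sum(1 for ls in label_sets if c <= ls)
            if cnt >= min_count:
                frequent[c] = cnt
        all_frequent.update(frequent)
        k += 1
    return all_frequent


def rule_label_subsets(itemsets, min_conf):
    """Keep itemset X ∪ Y if some rule X -> Y drawn from it has confidence >= min_conf."""
    subsets = set()
    for items, sup in itemsets.items():
        if len(items) >= 2 and any(sup / itemsets[frozenset(lhs)] >= min_conf
                                   for r in range(1, len(items))
                                   for lhs in combinations(items, r)):
            subsets.add(items)
    return subsets


class LabelPowerset1NN:
    """LP on one label subset: each distinct label combination is one class.
    The 1-NN base learner is a stand-in, not the paper's classifier."""

    def __init__(self, subset):
        self.subset = subset

    def fit(self, X, label_sets):
        self.X = X
        self.y = [frozenset(ls & self.subset) for ls in label_sets]
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


def fit_ensemble(X, label_sets, min_support=0.2, min_conf=0.6, max_size=3):
    itemsets = apriori_label_itemsets(label_sets, min_support, max_size)
    subsets = rule_label_subsets(itemsets, min_conf)
    # Drop subsets contained in a larger one to limit redundant LP models.
    subsets = {s for s in subsets if not any(s < t for t in subsets)}
    uncovered = set().union(*label_sets) - set().union(*subsets)
    subsets |= {frozenset([l]) for l in uncovered}
    return [LabelPowerset1NN(s).fit(X, label_sets) for s in subsets]


def predict_ensemble(models, x, threshold=0.5):
    """Per-label vote: fraction of models covering a label that predict it."""
    votes, totals = Counter(), Counter()
    for m in models:
        pred = m.predict(x)
        for l in m.subset:
            totals[l] += 1
            votes[l] += l in pred
    return {l for l in totals if votes[l] / totals[l] > threshold}


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    Y = [frozenset(s) for s in ("AB", "AB", "ABC", "CD", "CD", "CD")]
    models = fit_ensemble(X, Y)
    print(sorted(predict_ensemble(models, [0.2, 0.1])))  # ['A', 'B']
```

Compared with random k-labelsets, the label subsets in this sketch come from co-occurrence patterns in the training labels, so labels that rarely appear together are not forced into the same LP problem.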
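The abstract reports AP without defining it. Assuming the standard ranking-based multi-label average precision, each sample's labels are ranked by predicted score; for every relevant label, the fraction of labels ranked at or above it that are also relevant is computed; these fractions are averaged per sample and then across samples. A minimal Python sketch, with the hypothetical function name `average_precision` and per-label scores passed as dicts:

```python
def average_precision(scores, label_sets):
    """Ranking-based multi-label average precision.

    scores: list of {label: score}; label_sets: list of sets of relevant labels.
    Samples with no relevant label are skipped (AP is undefined for them).
    """
    total, counted = 0.0, 0
    for s, y in zip(scores, label_sets):
        if not y:
            continue
        # Rank 1 = highest score; ties keep the dict's insertion order.
        ranked = sorted(s, key=lambda l: -s[l])
        rank = {l: i + 1 for i, l in enumerate(ranked)}
        prec = 0.0
        for l in y:
            hits = sum(1 for l2 in y if rank[l2] <= rank[l])
            prec += hits / rank[l]
        total += prec / len(y)
        counted += 1
    return total / counted if counted else 0.0
```

For instance, if the relevant labels {A, B} are ranked 1st and 4th, that sample scores (1/1 + 2/4) / 2 = 0.75.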
Authors
PI Sai-qi (皮赛奇), LIU Gan (刘干)
College of Humanities and Science, Guizhou Minzu University, Guiyang 550025, China
Source
Software Guide (《软件导刊》)
2021, No. 7, pp. 34-37 (4 pages)
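Funding
Natural Science Foundation Project of the College of Humanities and Science, Guizhou Minzu University (19rwjs003)
Natural Science Foundation Project of Guizhou Minzu University (GZMU[2019]QN2).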