Abstract
Liver cancer is a malignant tumor of the digestive system with a high incidence in China; its high mortality makes it a serious threat to patients. Prognosis is usually judged only roughly, from a doctor's professional knowledge and accumulated experience, and the accuracy of such judgments is poor. Building on an analysis of the basic principles of the random forest algorithm, this paper proposes an improved random-forest-based feature selection algorithm. A system is implemented in Python that preprocesses data, invokes the algorithms, controls their parameters, and displays test results. The system is applied to liver cancer prognosis prediction, and the effects of different algorithms, parameters, and internal strategies on prediction accuracy and computational performance are compared and analyzed. The results show that, compared with a pruned decision tree, the random forest generalizes better and trains faster, and that the improved feature selection algorithm significantly reduces the feature set while maintaining prediction accuracy.
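The abstract does not spell out the improved feature selection algorithm, so the following is only a minimal sketch of the general idea it describes: shrinking a feature set by random forest importance while keeping prediction accuracy. The function `select_features`, its parameters `tolerance` and `drop_fraction`, the use of scikit-learn, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def select_features(X, y, tolerance=0.01, drop_fraction=0.2, seed=0):
    """Backward elimination driven by random forest importances.

    Hypothetical illustration, not the paper's algorithm: repeatedly drop the
    least important features while cross-validated accuracy stays within
    `tolerance` of the full-feature baseline.
    """
    def cv_accuracy(columns):
        forest = RandomForestClassifier(n_estimators=50, random_state=seed)
        return cross_val_score(forest, X[:, columns], y, cv=3).mean()

    kept = list(range(X.shape[1]))
    baseline = cv_accuracy(kept)
    best_score = baseline
    while len(kept) > 1:
        forest = RandomForestClassifier(n_estimators=50, random_state=seed)
        forest.fit(X[:, kept], y)
        # Rank the remaining features from most to least important.
        order = forest.feature_importances_.argsort()[::-1]
        n_drop = max(1, int(len(kept) * drop_fraction))
        candidate = [kept[i] for i in order[:len(kept) - n_drop]]
        score = cv_accuracy(candidate)
        if score < baseline - tolerance:
            break  # shrinking further would cost too much accuracy
        kept, best_score = candidate, score
    return sorted(kept), baseline, best_score


# Synthetic stand-in data; the paper's liver cancer dataset is not used here.
X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                           n_redundant=2, random_state=0)
selected, baseline, final = select_features(X, y)
print(f"kept {len(selected)} of {X.shape[1]} features; "
      f"accuracy {baseline:.3f} -> {final:.3f}")
```

The stopping rule compares each candidate subset against the full-feature baseline rather than the previous step, so accuracy cannot drift downward over many small eliminations. Other criteria (for example permutation importance or out-of-bag error) would fit the same loop.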
Authors
LIU Yunxiang (刘云翔); CHEN Bin (陈斌); ZHOU Ziyi (周子宜)
School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
Source
Modern Electronics Technique (《现代电子技术》)
Listed in the Peking University Chinese Core Journals index (北大核心)
2019, Issue 12, pp. 117-121 (5 pages)
Funding
National Natural Science Foundation of China (61702334)
Natural Science Foundation of Shanghai (17ZR1429700)
Keywords
random forest algorithm
feature selection
liver cancer prognosis prediction
decision tree
prediction accuracy
feature set