Abstract
Rotation forest (RoF) is an ensemble classification algorithm built on linear analysis theory and decision trees. It achieves good results and maintains ensemble classification accuracy even with a small number of classifiers. For some gene data sets, however, the data are not linearly separable, and the original algorithm classifies them poorly. To address this, the paper proposes a rotation forest algorithm based on kernel principal component analysis (KPCA-RoF). It uses a Gaussian radial basis kernel function together with principal component analysis to apply a nonlinear mapping to the gene data sets and to handle differences in the data, with emphasis on parameter selection, and then uses decision trees for ensemble learning. Experiments show that the improved algorithm handles linearly inseparable data well and improves classification accuracy on gene data sets.
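As a rough illustration of the nonlinear mapping step mentioned in the abstract, the following minimal sketch builds a Gaussian RBF kernel matrix and centers it in feature space, which is the usual first stage of kernel PCA before eigendecomposition. It is not the authors' implementation: the function names, the toy samples, and the value of `sigma` are assumptions made for illustration, and the abstract does not specify how the paper selects its parameters or how the transformed features are passed to the decision trees.

```python
import math


def rbf_kernel_matrix(samples, sigma):
    """Gaussian RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-sq_dist / (2.0 * sigma ** 2))
            kernel[i][j] = value
            kernel[j][i] = value  # the kernel matrix is symmetric
    return kernel


def center_kernel_matrix(kernel):
    """Center the kernel matrix in feature space: K - 1K - K1 + 1K1."""
    n = len(kernel)
    # Row means equal column means because the kernel matrix is symmetric.
    row_means = [sum(row) / n for row in kernel]
    total_mean = sum(row_means) / n
    return [
        [kernel[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three samples with two features each.
toy_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
toy_kernel = rbf_kernel_matrix(toy_samples, sigma=1.0)  # sigma is an assumed value
toy_centered = center_kernel_matrix(toy_kernel)
```

In a full kernel PCA, the leading eigenvectors of the centered matrix would give the nonlinear principal components; the kernel width `sigma` controls how local the mapping is, which is one reason parameter choice matters for this kind of method.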
Authors
LU Huijuan (陆慧娟)
LIU Yaqing (刘亚卿)
MENG Yaqiong (孟亚琼)
GUAN Wei (关伟)
LIU Yanqiu (刘砚秋)
College of Information Engineering, China Jiliang University, Hangzhou 310018, China; College of Modern Science and Technology, China Jiliang University, Hangzhou 310018, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
PKU Core Journal (北大核心)
2017, Issue 10, pp. 1570-1578 (9 pages)
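Funding
National Natural Science Foundation of China, Nos. 61272315 and 60905034
Natural Science Foundation of Zhejiang Province, No. Y1110342
Project of the National Security Bureau (国家安全总局项目), No. zhejiang-00062014AQ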
Keywords
kernel function
principal component analysis
decision tree
rotation forest
gene data classification