Abstract
The random forest algorithm achieves good classification performance across a wide range of application scenarios and datasets. When applied to imbalanced binary classification datasets, however, it is limited by the skewed class ratio of the sample data and by the majority-voting mechanism at the leaf nodes of the decision trees, so the minority class, which accounts for a relatively small share of the samples, carries little weight in the final vote. To address this, the node classification rule of the original random forest algorithm is improved. During model training, the class proportion at each node and the node depth are considered jointly, adding classification information that favors the minority class and thereby raising its classification accuracy. Tests of the improved algorithm on different datasets show that it outperforms the traditional algorithm on imbalanced datasets, and that the classification accuracy of the minority class improves significantly when the sample size is large.
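The abstract does not give the exact weighting formula, so the following Python sketch only illustrates the general idea under stated assumptions: after training an ordinary scikit-learn random forest on an imbalanced binary problem, each tree's plain majority vote is replaced by a leaf score that combines the minority-class proportion stored at the leaf with the leaf's depth, so that deep, minority-dominated leaves contribute more to the final decision. The synthetic data, the helper names (node_depths, weighted_vote), the depth weighting 1 + depth / max_depth, and the 0.5 decision threshold are all illustrative assumptions, not the authors' rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary data (roughly 95:5); class 1 is the minority class.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def node_depths(tree):
    """Depth of every node in a fitted sklearn tree, found by explicit traversal."""
    depths = np.zeros(tree.node_count)
    stack = [(0, 0)]                                # (node id, depth); the root is node 0
    while stack:
        node, d = stack.pop()
        depths[node] = d
        if tree.children_left[node] != -1:          # internal node: push both children
            stack.append((tree.children_left[node], d + 1))
            stack.append((tree.children_right[node], d + 1))
    return depths

def weighted_vote(forest, X):
    """Average a depth- and proportion-weighted minority score over all trees."""
    score = np.zeros(len(X))
    for est in forest.estimators_:
        tree = est.tree_
        leaves = est.apply(X)                       # leaf index reached by each sample
        value = tree.value[leaves, 0, :]            # per-class counts (or fractions) at the leaf
        prop_minority = value[:, 1] / value.sum(axis=1)
        depth = node_depths(tree)[leaves]
        # Illustrative rule: deeper, purer minority leaves get a larger say.
        score += prop_minority * (1.0 + depth / max(tree.max_depth, 1))
    return score / len(forest.estimators_)

# Predict the minority class when the weighted score clears a (tunable) threshold.
pred = (weighted_vote(forest, X_te) > 0.5).astype(int)
print("minority-class recall:", (pred[y_te == 1] == 1).mean())
```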
Authors
LIU Yao-jie (刘耀杰), LIU Du-yu (刘独玉)
School of Electrical and Information Engineering, Southwest Minzu University, Chengdu 610041, China
Source
Computer Technology and Development (《计算机技术与发展》), 2019, No. 6, pp. 100-104 (5 pages)
Funding
Supported by the Fundamental Research Funds for the Central Universities (2017ZYXS09)