摘要
针对不平衡数据集的有效分类问题,提出一种结合代价敏感学习和随机森林算法的分类器。首先提出了一种新型不纯度度量,该度量不仅考虑了决策树的总代价,还考虑了同一节点对于不同样本的代价差异;其次,执行随机森林算法,对数据集作K次抽样,构建K个基础分类器;然后,基于提出的不纯度度量,通过分类回归树(CART)算法来构建决策树,从而形成决策树森林;最后,随机森林通过投票机制做出数据分类决策。在UCI数据库上进行实验,与传统随机森林和现有的代价敏感随机森林分类器相比,该分类器在分类精度、AUC面积和Kappa系数这3种性能度量上都具有良好的表现。
For the problem of effective classification on imbalanced data sets,a classifier combining cost-sensitive learning and random forest algorithm is proposed.Firstly,a new impurity measure is proposed,taking into account not only the total cost of the decision tree,but also the cost difference of the same node for different samples.Then,the random forest algorithm is executed,K times sampling for the data set is performed,and K basic classifiers are built.Then,the decision tree is constructed by the classification regression tree (CART) algorithm based on the proposed impurity measure,so as to form the decision tree forest.Finally,the random forest algorithm makes the data classification decision by voting mechanism.In the UCI database,compared with the traditional random forest and the existing cost-sensitive random forest classifier,this classifier has good performance in the classification accuracy,AUC area and Kappa coefficient.
出处
《计算机科学》
CSCD
北大核心
2017年第B11期98-101,共4页
Computer Science
关键词
代价敏感学习
随机森林
不纯度度量
分类回归树(CART)
不平衡数据
Cost-sensitive learning, Random forest, Impurity measurement, Classification regression tree (CART ), Imbalanced data