Abstract
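This paper introduces R-C 4.5, an effective improved decision tree model, and its simplified versions. The aim is to build a simple tree while improving the interpretability of the attribute selection measure and reducing empty branches, meaningless branches, and overfitting. The model is based on the well-known C 4.5 decision tree model but improves its attribute selection and branching strategy. By merging branches with poor classification performance, R-C 4.5 effectively avoids problems such as fragmentation. Experiments show that R-C 4.5 effectively improves the robustness of the tree while maintaining predictive accuracy. As simplified versions of R-C 4.5, R-C 4.5c and R-C 4.5s generate even simpler trees, and R-C 4.5s does its work in the data preprocessing stage, which makes it easy to implement.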
This paper proposes R-C 4.5, an effective improved decision tree model, together with its simplified versions, to enhance the interpretability of the test attribute selection measure, reduce the number of insignificant or empty branches, and avoid overfitting. The model is based on C 4.5 and improves its attribute selection and partitioning methods. R-C 4.5 merges branches that have high entropy, because these branches classify poorly in the divide-and-conquer process. Experimental results show that R-C 4.5 improves the robustness of the tree while maintaining predictive accuracy. As simplified versions of R-C 4.5, R-C 4.5c and R-C 4.5s construct simpler trees. R-C 4.5s applies its improvement in the data preprocessing stage, so it is the easiest of the three versions to implement.
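The abstract does not give the exact merging criterion, so the following is only a minimal sketch of the general idea of pooling high-entropy branches after a split. The function names `entropy` and `split_and_merge`, the fixed `threshold` parameter, and the rule "merge every branch whose entropy is at or above the threshold" are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def split_and_merge(rows, attribute, target, threshold):
    """Partition rows by `attribute`, then pool the high-entropy branches.

    Hypothetical rule (an assumption, not the paper's criterion): every
    branch whose class entropy is >= `threshold` is merged into a single
    combined branch; the remaining branches are kept as they are.
    Returns a dict mapping a tuple of attribute values to class labels.
    """
    branches = defaultdict(list)
    for row in rows:
        branches[row[attribute]].append(row[target])

    result = {}
    merged_values, merged_labels = [], []
    for value, labels in branches.items():
        if entropy(labels) >= threshold:
            merged_values.append(value)
            merged_labels.extend(labels)
        else:
            result[(value,)] = labels
    if merged_values:
        result[tuple(sorted(merged_values))] = merged_labels
    return result
```

Under these assumptions, a split on an attribute with two pure values and two mixed values yields three branches instead of four, since the two mixed values end up sharing one branch.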
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2006, Issue z1, pp. 996-1001 (6 pages)
Journal of Tsinghua University(Science and Technology)
Funding
Supported by the "211" Project of Shanghai University of Finance and Economics