Abstract
To make decision trees robust, we begin by expressing Information Gain, the splitting metric used in C4.5, in terms of the confidence of a rule. This immediately explains why Information Gain, like confidence, yields rules that are biased towards the majority class. To overcome this bias, we introduce a new measure, Class Confidence Proportion (CCP), which forms the basis of CCPDT (Class Confidence Proportion Decision Tree). Together these two changes yield a classifier that performs statistically better than not only traditional decision trees but also trees learned from data that has been balanced by well-known sampling techniques.
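The two claims in the abstract can be illustrated with a small sketch. It writes Information Gain for a split in terms of rule confidence, conf(x → y) = count(x, y) / count(x), and contrasts confidence with a class-confidence-based proportion on an imbalanced toy split. The CC and CCP formulas used here, CC(x → y) = Supp(x, y) / Supp(y) and CCP(x → y) = CC(x → y) / (CC(x → y) + CC(x → ¬y)), are assumptions not stated in this abstract. The branch counts and all function names are hypothetical. The sketch does not reproduce the CCPDT splitting criterion itself.

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) of a probability vector; 0 * log 0 is treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(branches):
    """Information Gain of a split, written in terms of rule confidence.

    `branches` maps each attribute value x to a dict {class: count}.
    Each branch's conditional entropy is -sum_y conf(x -> y) * log2 conf(x -> y),
    where conf(x -> y) = count(x, y) / count(x).
    """
    classes = sorted({c for counts in branches.values() for c in counts})
    total = sum(sum(counts.values()) for counts in branches.values())
    class_totals = {c: sum(counts.get(c, 0) for counts in branches.values()) for c in classes}
    h_parent = entropy([class_totals[c] / total for c in classes])
    h_children = 0.0
    for counts in branches.values():
        n_x = sum(counts.values())
        confidences = [counts.get(c, 0) / n_x for c in classes]
        h_children += (n_x / total) * entropy(confidences)
    return h_parent - h_children

def confidence(branches, x, y):
    """conf(x -> y): fraction of examples covered by x that belong to class y."""
    counts = branches[x]
    return counts.get(y, 0) / sum(counts.values())

def class_confidence(branches, x, y):
    """Assumed CC(x -> y) = Supp(x, y) / Supp(y): fraction of class-y examples covered by x."""
    n_y = sum(counts.get(y, 0) for counts in branches.values())
    return branches[x].get(y, 0) / n_y

def ccp(branches, x, y, other):
    """Assumed binary CCP: CC(x -> y) normalised against CC(x -> other).

    Assumes x covers at least one example of y or other (else division by zero).
    """
    cc_y = class_confidence(branches, x, y)
    cc_other = class_confidence(branches, x, other)
    return cc_y / (cc_y + cc_other)

# Hypothetical imbalanced data: 100 "pos" (minority) vs 900 "neg" (majority).
branches = {
    "x=1": {"pos": 90, "neg": 300},
    "x=0": {"pos": 10, "neg": 600},
}

print(f"IG of split:          {information_gain(branches):.3f}")
print(f"conf(x=1 -> pos):     {confidence(branches, 'x=1', 'pos'):.3f}")
print(f"conf(x=1 -> neg):     {confidence(branches, 'x=1', 'neg'):.3f}")
print(f"CCP(x=1 -> pos):      {ccp(branches, 'x=1', 'pos', 'neg'):.3f}")
print(f"CCP(x=1 -> neg):      {ccp(branches, 'x=1', 'neg', 'pos'):.3f}")
```

In this toy split, branch x=1 covers 90% of the minority class, yet its confidence towards "pos" is only about 0.23 because the majority class dominates the branch. The class-normalised CCP for the same rule is about 0.73. This is the kind of majority-class bias the abstract attributes to confidence and, through the rewriting above, to Information Gain.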
Source
《大观周刊》 (Daguan Zhoukan)
2012, No. 26, pp. 113-114 (2 pages)
Keywords
Information Gain
Class Confidence Proportion (CCP)
Decision Tree
Classifier