
A Decision Tree Algorithm for Imbalanced Data Sets

Abstract: In order to make decision trees robust, we begin by expressing Information Gain, the metric used in C4.5, in terms of the confidence of a rule. This allows us to immediately explain why Information Gain, like confidence, results in rules that are biased towards the majority class. To overcome this bias, we introduce a new measure, Class Confidence Proportion (CCP), which forms the basis of CCPDT (Class Confidence Proportion Decision Tree). Together these two changes yield a classifier that performs statistically better than not only traditional decision trees but also trees learned from data that has been balanced by well-known sampling techniques.
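The abstract does not state the formulas, so the following is only a minimal sketch of the contrast it describes. It assumes the usual rule confidence, Conf(X → y) = supp(X ∧ y) / supp(X), and the CCP definition given in the CCPDT literature, CCP(X → y) = TPR / (TPR + FPR), where TPR and FPR are the fractions of class-y and non-class-y instances covered by X. The function names and the counts (100 minority and 900 majority instances, with a rule covering 80 and 180 of them) are hypothetical and chosen purely for illustration.

```python
def confidence(tp, fp):
    """Conf(X -> y): share of the instances covered by rule X that belong to class y.

    tp: covered instances of class y; fp: covered instances of other classes.
    """
    return tp / (tp + fp)


def class_confidence_proportion(tp, fp, n_pos, n_neg):
    """CCP(X -> y) = TPR / (TPR + FPR) (assumed definition, see lead-in).

    Each count is normalised by the size of its own class, so the measure
    does not depend on how large the majority class is relative to the minority.
    """
    tpr = tp / n_pos  # fraction of class y covered by X
    fpr = fp / n_neg  # fraction of the other class covered by X
    return tpr / (tpr + fpr)


# Hypothetical imbalanced data set: 100 minority vs. 900 majority instances.
# The rule covers 80% of the minority class but only 20% of the majority class.
conf = confidence(tp=80, fp=180)
ccp = class_confidence_proportion(tp=80, fp=180, n_pos=100, n_neg=900)

print(f"confidence = {conf:.3f}")
print(f"CCP        = {ccp:.3f}")
```

With these counts, confidence is 80/260, so the rule appears to favour the majority class, while CCP is 0.8 / (0.8 + 0.2), reflecting that the rule covers a far larger fraction of the minority class than of the majority class. This is the kind of majority-class bias the abstract attributes to confidence and, through it, to Information Gain.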
Source: 《大观周刊》, 2012, No. 26, pp. 113-114 (2 pages)
Keywords: Class Confidence Proportion (CCP); Decision Tree; Classifier; Information Gain

