Abstract
ID3 is a popular decision tree algorithm, widely used because it is simple and easy to implement. However, the trees it generates are often too large and complex, which also reduces the algorithm's efficiency. To optimize the tree structure, speed up tree generation, and avoid overfitting, this paper also takes into account how well each splitting attribute classifies the data: if the classification quality on a branch reaches a predefined standard, that branch is not split any further. The improved ID3 algorithm introduces the concept of maximum support and adopts a pre-pruning strategy. Experimental results show that the improved algorithm produces a more compact decision tree without reducing accuracy.
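The pre-pruning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: the abstract does not define "maximum support", so the sketch assumes it means the fraction of samples at a node that belong to the majority class. The names `build_tree`, `support_threshold`, and `count_leaves`, and the toy weather-style data, are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Standard ID3 information gain of splitting on attribute `attr`."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs, support_threshold=1.0):
    """ID3 with an assumed pre-pruning rule.

    A leaf is a class label; an internal node is (attribute, {value: subtree}).
    With support_threshold=1.0 this behaves like plain ID3.
    """
    majority, count = Counter(labels).most_common(1)[0]
    # Assumed pre-pruning test: stop splitting this branch once the
    # majority class covers at least `support_threshold` of its samples.
    if count / len(labels) >= support_threshold or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     remaining, support_threshold)
    return (best, branches)


def count_leaves(tree):
    """Number of leaves, used here as a simple measure of tree size."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_leaves(sub) for sub in tree[1].values())


# Toy data invented for illustration only (not from the paper).
rows = ([{"outlook": "sunny", "windy": "no"}] * 4
        + [{"outlook": "sunny", "windy": "yes"}] * 1
        + [{"outlook": "rainy", "windy": "no"}] * 3
        + [{"outlook": "rainy", "windy": "yes"}] * 2)
labels = ["yes"] * 4 + ["no"] * 6

full_tree = build_tree(rows, labels, ["outlook", "windy"])
pruned_tree = build_tree(rows, labels, ["outlook", "windy"],
                         support_threshold=0.8)
```

On this toy data the unpruned tree splits the "sunny" branch again on "windy" to isolate a single sample, while a threshold of 0.8 stops at the "sunny" node because 4 of its 5 samples already share one class. The threshold trades a small amount of training-set purity for a smaller tree; a suitable value would have to be chosen empirically.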
Source
Computer and Modernization (《计算机与现代化》)
2008, No. 9, pp. 47-50 (4 pages)
Funding
Supported by the Shanghai Municipal Science and Technology Commission (Project No. 05DZ11C06)
Keywords
data mining
decision tree
pre-pruning