Abstract
To address the multi-value bias of decision tree algorithms in classification, this paper proposes MID3, an improved ID3 algorithm based on the correlation coefficient. While building the decision tree, the algorithm brings the correlation between each attribute and the classification result into the attribute selection at each node, which mitigates ID3's bias toward multi-valued attributes to a certain extent. It also considers two levels of nodes at once so that the tree structure is optimized globally. Experiments on UCI data set samples compare the proposed algorithm with ID3 and report the relative efficiency of the two. The results show that the proposed algorithm improves the average classification accuracy and produces a more reasonable decision tree structure.
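The multi-value bias that the abstract refers to can be illustrated with a minimal sketch of the standard ID3 information gain. The toy data and the names `row_id` and `weather` are invented for illustration. The sketch does not reproduce MID3, whose correlation-based selection criterion is not specified in this abstract; it only shows the problem that MID3 is meant to mitigate.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Standard ID3 information gain of splitting `labels` on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (assumed, not from the paper): 8 samples with a binary class.
labels = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
row_id = list(range(8))  # unique value per sample, no predictive meaning
weather = ["sun", "sun", "sun", "rain", "rain", "rain", "sun", "sun"]

gain_id = information_gain(row_id, labels)
gain_weather = information_gain(weather, labels)

print(f"gain(row_id)  = {gain_id:.4f}")
print(f"gain(weather) = {gain_weather:.4f}")
```

Because every value of `row_id` isolates a single sample, each subset is pure and the attribute receives the maximum possible gain, so plain ID3 would select it over the genuinely informative `weather` attribute. A criterion that also weighs how strongly an attribute is related to the class, as the abstract describes, is one way to counter this.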
Authors
LV Jie
WANG Feng-qin
WANG Li-na
Naval Aviation University, Yantai, Shandong 264001, China
Source
Computing Technology and Automation
2023, No. 1, pp. 119-122 (4 pages)
Keywords
correlation coefficient
ID3
MID3
information entropy
decision tree