Abstract
Decision tree algorithms are sensitive to noise and overlapped regions in the training sample set, both of which hurt the simplicity and generality of the resulting trees. To address this problem, a sample selection algorithm based on the multi-edit nearest neighbor rule is proposed. Under ideal conditions, this algorithm can eliminate noise that satisfies certain prerequisites, remove from each overlapped region the samples of the classes with smaller posterior probability, and finally form a boundary between samples of different classes that conforms to the Bayes classification criterion. When applied to an appropriate training dataset, it markedly reduces the size of the resulting decision trees without sacrificing classification accuracy. This improves both the understandability and the usability of decision trees, and thereby their performance.
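The abstract does not spell out the editing procedure, so the following is only a minimal sketch of the commonly described multi-edit nearest neighbor scheme (random partition, 1-NN classification of each part against the next part, removal of misclassified samples, repeat until several rounds remove nothing). The function name `multi_edit` and the parameters `n_subsets`, `patience`, and `seed` are assumptions made for illustration and may differ from what the paper uses.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def multi_edit(samples, labels, n_subsets=3, patience=3, seed=0):
    """Return sorted indices of the samples kept by multi-edit 1-NN filtering.

    Hypothetical helper: stops after `patience` consecutive rounds in which
    no sample is removed, or when too few samples remain to partition.
    """
    rng = random.Random(seed)
    keep = list(range(len(samples)))
    quiet_rounds = 0
    while quiet_rounds < patience and len(keep) >= n_subsets:
        # Diffusion: shuffle, then split into n_subsets non-empty parts.
        rng.shuffle(keep)
        parts = [keep[i::n_subsets] for i in range(n_subsets)]
        survivors = []
        for i, part in enumerate(parts):
            # Classification: the next part serves as the 1-NN reference set.
            ref = parts[(i + 1) % n_subsets]
            for p in part:
                nearest = min(ref, key=lambda r: sq_dist(samples[p], samples[r]))
                # Editing: keep only samples whose label matches the 1-NN label.
                if labels[nearest] == labels[p]:
                    survivors.append(p)
        # Count this round as "quiet" only if nothing was removed.
        quiet_rounds = quiet_rounds + 1 if len(survivors) == len(keep) else 0
        keep = survivors
    return sorted(keep)
```

In this sketch the returned indices would select the filtered training set, on which a decision tree is then built with whatever induction algorithm is in use. Because each round uses a fresh random partition, the result depends on `seed`.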
Source
Control and Decision (《控制与决策》)
EI
CSCD
Peking University Core Journals (北大核心)
2003, No. 1, pp. 96-98, 102 (4 pages in total)
Funding
Supported by the National 863 High-Tech Program of China (863-511-945-005, 863-306-ZD13-05-6)