Abstract
Outliers are often ignored in data mining, yet they frequently carry important information. This paper mines outliers by combining a decision tree with a dissimilarity measure. The information gain of each attribute is computed and used to construct the decision tree recursively; pruning the tree gives a first-pass detection of outliers; a matrix is then built with the dissimilarity formula to identify the final outlier set.
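The abstract names two computations without giving their formulas, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes categorical attributes, the standard ID3 entropy-based information gain, and a simple-matching dissimilarity (the fraction of attributes on which two records differ). The function names, the mean-dissimilarity rule, and the `threshold` parameter are hypothetical; tree construction and pruning are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3-style gain: label entropy minus the weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def dissimilarity(a, b):
    """Assumed simple-matching dissimilarity: share of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(rows):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    return [[dissimilarity(a, b) for b in rows] for a in rows]


def outliers(rows, threshold):
    """Hypothetical rule: flag rows whose mean dissimilarity to the others exceeds threshold."""
    matrix = dissimilarity_matrix(rows)
    n = len(rows)
    return [i for i in range(n) if sum(matrix[i]) / (n - 1) > threshold]
```

In a pipeline like the one described, `information_gain` would select the split attribute at each node, and `outliers` would be applied only to the candidate records left over after pruning.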
Source
《微计算机信息》
2009, Issue 21, pp. 131-132, 124 (3 pages in total)
Control & Automation
Funding
Jiangxi Provincial Department of Education Fund Project (赣教技字[2005]42)
Jiangxi Provincial Teaching Reform Fund Project (赣教改字[2005]100)
Keywords
outlier data
decision tree
dissimilarity