Abstract
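Outlier mining can reveal rare events and phenomena and uncover interesting patterns; its broad range of applications has attracted wide attention. This survey first introduces the definition of an outlier, the causes of outlying behavior, and a classification of outlier mining algorithms. It then discusses distance-based and density-based outlier mining algorithms in some detail, pointing out their strengths, weaknesses, and directions for development. It focuses on current research hotspots: mining high-dimensional, large-volume data, spatial data mining, time-series outlier mining, and applications of outlier mining techniques, and it identifies directions for further research.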
The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. This survey provides a comprehensive overview of existing outlier mining techniques and summarizes their features to help users choose, study, and improve algorithms for outlier mining. It examines outlier mining techniques for high-dimensional data, spatial data, and sequential data, points out their advantages and disadvantages, and proposes directions for future research on outlier mining.
Source
《计算机科学》
CSCD
Peking University Core Journal
2008, No. 11, pp. 13-18, 27 (7 pages in total)
Computer Science
Funding
National Natural Science Foundation of China (60603041)
Natural Science Foundation of Jiangsu Higher Education Institutions (05KJB520017)
Keywords
Outlier mining
Local outlier
Subspace
Pruning
Spatial outlier
High-dimensional data
Data stream
Outlier mining, Local outlier, Subspace, Pruning, Spatial outlier, High-dimensional data, Sequential data