Abstract
This paper introduces the definition of outliers and the traditional outlier detection algorithms. Because traditional algorithms are not suitable for high-dimensional data, it analyzes the advantages and disadvantages of existing subspace-based outlier detection algorithms for high-dimensional data and proposes an outlier detection algorithm based on strongly correlated subspaces. The strongly correlated subspaces are found using information entropy and mutual information, and classical outlier mining algorithms are then used to identify the outliers in these subspaces. UCI data sets and synthetic data sets are used as experimental data to verify the feasibility of the algorithm. Simulation results show that the algorithm effectively identifies outliers in high-dimensional data sets, with high precision and short running time.
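The abstract does not give the exact procedure, so the following is only a minimal sketch of how information entropy and mutual information might be used to flag strongly correlated attribute pairs as candidate subspaces. The function names (`discretize`, `strongly_correlated_pairs`), the equal-width binning, and the `bins` and `threshold` parameters are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def entropy(rows):
    """Shannon entropy in bits of discrete observations (1-D labels or 2-D rows)."""
    rows = np.asarray(rows)
    if rows.ndim == 1:
        rows = rows.reshape(-1, 1)
    # Count how often each distinct row occurs.
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete label arrays."""
    return entropy(x) + entropy(y) - entropy(np.column_stack([x, y]))


def discretize(col, bins=4):
    """Equal-width binning of a numeric column (an assumed preprocessing step)."""
    edges = np.histogram_bin_edges(col, bins=bins)
    # Interior edges only, so labels fall in 0 .. bins-1.
    return np.digitize(col, edges[1:-1])


def strongly_correlated_pairs(X, threshold, bins=4):
    """Return attribute index pairs whose mutual information is >= threshold."""
    X = np.asarray(X, dtype=float)
    cols = [discretize(X[:, j], bins) for j in range(X.shape[1])]
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if mutual_information(cols[i], cols[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Under this sketch, a classical detector (for example a distance-based or density-based method) would then be run on the data projected onto each selected pair or on a larger subspace built from them. How the paper actually combines attributes into subspaces is not stated in the abstract.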
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal
2017, No. 10, pp. 2754-2758 (5 pages)
Funding
National Science Foundation for Young Scientists of China (61602335)
Keywords
high-dimensional data
data mining
outlier detection
strong correlation subspace
mutual information