Abstract
Outlier detection is a core problem in data mining and is widely applied in industrial production. An accurate and efficient outlier detection method can reflect the operating state of an industrial system in a timely manner and give the personnel concerned a basis for decisions. Traditional outlier detection methods, however, cannot reliably detect outliers in data that combine complex change patterns, a small range of variation, and the characteristics of streaming data. This paper therefore proposes a new outlier detection method for this type of data. First, the data are partitioned by clustering so that similar samples fall into the same category. Data within one category are assumed to share the same distribution, which is simpler to fit than the distribution of the whole data set, so the original complex distribution is decomposed into a superposition of simpler per-cluster distributions. Second, kernel density estimation (KDE) hypothesis testing is applied to the data under test to detect outliers. Experiments on standard datasets from the UCI repository and on real industrial data show that the proposed method improves detection accuracy to some extent compared with traditional outlier detection methods.
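The two steps above can be illustrated with a minimal Python sketch. It is hypothetical and is not the authors' implementation. The abstract names neither the clustering algorithm, the kernel, the bandwidth, nor the test statistic, so the sketch makes several assumptions. It uses plain k-means for step 1 and a Gaussian product-kernel KDE with Scott's-rule bandwidths for each cluster. For the test, it computes a rank-based p-value that compares a test point's density with the leave-one-out densities of its nearest cluster's members, and it rejects at an assumed significance level `alpha = 0.05`. All names (`kmeans`, `ClusterKDE`, `ClusterKDEDetector`, `is_outlier`) are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the cluster center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def assign(data: Sequence[Point], centers: Sequence[Point]) -> List[List[Point]]:
    """Group every point under its nearest center."""
    groups: List[List[Point]] = [[] for _ in centers]
    for p in data:
        groups[nearest(p, centers)].append(p)
    return groups


def kmeans(data: Sequence[Point], k: int, iters: int = 100, seed: int = 0):
    """Step 1 (assumed to be k-means): Lloyd iterations until centers stop moving."""
    centers = random.Random(seed).sample(list(data), k)
    for _ in range(iters):
        groups = assign(data, centers)
        # New center = group mean; an empty group keeps its old center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)


class ClusterKDE:
    """Gaussian product-kernel KDE for one cluster (assumed Scott's-rule bandwidth)."""

    def __init__(self, pts: Sequence[Point]):
        self.pts = list(pts)  # needs at least 2 points
        n, dim = len(self.pts), len(self.pts[0])
        self.h = []
        for col in zip(*self.pts):
            mean = sum(col) / n
            sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
            self.h.append(max(sd, 1e-9) * n ** (-1.0 / (dim + 4)))
        # Assumption: leave-one-out densities of the cluster's own points
        # serve as the empirical null distribution of the test statistic.
        self.loo = [self.density(p, skip=i) for i, p in enumerate(self.pts)]

    def density(self, x: Point, skip: int = -1) -> float:
        """KDE value at x, optionally leaving out the point with index `skip`."""
        total, count = 0.0, 0
        for i, y in enumerate(self.pts):
            if i == skip:
                continue
            k = 1.0
            for xd, yd, hd in zip(x, y, self.h):
                u = (xd - yd) / hd
                k *= math.exp(-0.5 * u * u) / (hd * math.sqrt(2 * math.pi))
            total += k
            count += 1
        return total / count

    def p_value(self, x: Point) -> float:
        """(below + 1) / (n + 1), where below counts leave-one-out densities <= f(x)."""
        fx = self.density(x)
        below = sum(1 for v in self.loo if v <= fx)
        return (below + 1) / (len(self.loo) + 1)


class ClusterKDEDetector:
    """Hypothetical step 2: test a point against the KDE of its nearest cluster."""

    def __init__(self, data: Sequence[Point], k: int, seed: int = 0):
        centers, groups = kmeans(data, k, seed=seed)
        # Clusters with fewer than 2 points cannot supply leave-one-out scores.
        kept = [(c, g) for c, g in zip(centers, groups) if len(g) >= 2]
        self.centers = [c for c, _ in kept]
        self.models = [ClusterKDE(g) for _, g in kept]

    def p_value(self, x: Point) -> float:
        return self.models[nearest(x, self.centers)].p_value(x)

    def is_outlier(self, x: Point, alpha: float = 0.05) -> bool:
        """Reject 'x follows its cluster's distribution' when p < alpha (assumed level)."""
        return self.p_value(x) < alpha
```

As an example, fit `ClusterKDEDetector(data, k=2)` on two well-separated 2-D blobs. A point near either blob center then receives a large p-value. A point midway between the blobs has a density far below that of every member of its nearest cluster, so it is flagged. Fitting one KDE per cluster instead of a single global KDE follows the abstract's idea of decomposing a complex distribution into simpler per-cluster distributions. This batch sketch does not model the streaming aspect.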
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journal
2017, No. 5, pp. 997-1004 (8 pages)
Funding
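Supported by the National Natural Science Foundation of China (61503178)
Supported by the Natural Science Foundation of Jiangsu Province (BK20150587)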
Keywords
outlier detection
clustering
hypothesis testing
kernel density estimation