Abstract
Anomaly detection is an important research direction in data mining. Most current density-based anomaly detection algorithms rely on assumptions about the sample distribution, are sensitive to the nearest-neighbor parameter k, and cannot detect collective outliers. To address these problems, a kernel density fluctuation algorithm based on kernel density estimation is proposed. A kernel density fluctuation factor is defined that jointly evaluates how kernel density values fluctuate inside and outside a data point's neighborhood. This factor serves as the detection metric, and detection rules are formulated to identify outliers. The metric accounts for both the local and the global characteristics of a data point and also helps to find collective anomalies. Experimental results on the datasets show that the proposed algorithm achieves better detection results and is fairly robust to its parameters.
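The abstract does not give the formula for the kernel density fluctuation factor or the detection rules, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes a Gaussian kernel with a fixed bandwidth, leave-one-out density estimates, and a score that averages a local term (mean density of the k nearest neighbors divided by the point's own density) with a global term (mean density of the whole dataset divided by the point's own density). The names `fluctuation_scores` and `flag_outliers` and the threshold of 2.0 are invented for this sketch.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def loo_gaussian_kde(points, bandwidth):
    """Leave-one-out Gaussian kernel density estimate at every point."""
    n, d = len(points), len(points[0])
    norm = (n - 1) * (2 * math.pi) ** (d / 2) * bandwidth ** d
    dens = []
    for i, p in enumerate(points):
        total = sum(
            math.exp(-sq_dist(p, q) / (2 * bandwidth ** 2))
            for j, q in enumerate(points)
            if j != i
        )
        dens.append(total / norm)
    return dens


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    order = sorted((sq_dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return [j for _, j in order[:k]]


def fluctuation_scores(points, k=3, bandwidth=1.0):
    """HYPOTHETICAL score, not the paper's kernel density fluctuation factor.

    Averages a local term (mean neighbor density / own density) and a global
    term (mean density of all points / own density). Values well above 1 mean
    the point lies in a sparser region than its neighbors and/or the dataset.
    """
    dens = loo_gaussian_kde(points, bandwidth)
    global_mean = sum(dens) / len(dens)
    scores = []
    for i, di in enumerate(dens):
        if di == 0.0:  # density underflowed: treat as maximally anomalous
            scores.append(math.inf)
            continue
        local_mean = sum(dens[j] for j in knn(points, i, k)) / k
        scores.append(0.5 * (local_mean / di + global_mean / di))
    return scores


def flag_outliers(scores, threshold=2.0):
    """Hypothetical detection rule: flag every point whose score exceeds threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# A tight cluster of five points plus one isolated point at index 5.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
print(flag_outliers(fluctuation_scores(pts, k=3, bandwidth=1.0)))  # prints [5]
```

Including a global term is one simple way to reflect the abstract's point about collective anomalies: a small, remote group of points can look normal relative to its own few neighbors, yet its density is still low compared with the dataset as a whole.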
Authors
ZHANG Bowen (张博文)
LIU Zhi (刘智)
SANG Guoming (桑国明)
Information Science and Technology College, Dalian Maritime University, Dalian, Liaoning 116026, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2021, Issue 12, pp. 132-136 (5 pages)
Funding
National Natural Science Foundation of China (61672122)
Fundamental Research Funds for the Central Universities (3132019207)
Keywords
data mining
anomaly detection
kernel density estimation
kernel density fluctuation
sensitivity analysis