Abstract
This paper analyzes the characteristics and shortcomings of traditional density-based clustering methods, reviews the current state of research on density-based clustering algorithms, and proposes an improved clustering algorithm based on a density distribution function. The density of each point is measured using the K-nearest-neighbor (KNN) idea, and the point with the current maximum density is taken as the center point. Using a local scale, the cluster is extended from the center point, and at each extension a radius scale factor is introduced to identify core points. The cluster is then extended again from the KNN of each core point until the density falls to a given ratio of the center point's density. Several examples of the algorithm are given, and it is compared experimentally with the grid-based shared nearest neighbor (GNN) clustering algorithm in terms of clustering accuracy and efficiency. The experiments indicate that the improved algorithm greatly reduces the sensitivity of density-based clustering algorithms to parameters, improves clustering of high-dimensional data sets with uneven density distribution, and raises clustering accuracy and efficiency.
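The expansion procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the local scale or the radius scale factor, so both are omitted here, and density is approximated (as an assumption) by the inverse of the mean distance to the k nearest neighbors. The names `knn_density`, `density_cluster`, `k`, and `ratio` are hypothetical.

```python
import numpy as np


def knn_density(X, k):
    """Assumed density proxy: inverse mean distance to the k nearest neighbors."""
    # Pairwise Euclidean distances (O(n^2) memory; fine for a small sketch).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    mean_dist = np.take_along_axis(d, nbrs, axis=1).mean(axis=1)
    return 1.0 / (mean_dist + 1e-12), nbrs


def density_cluster(X, k=4, ratio=0.5):
    """Grow clusters from the densest unassigned point through KNN links.

    Expansion stops at points whose density is below `ratio` times the
    density of the cluster's center point. The paper's local scale and
    radius scale factor are NOT modeled in this sketch.
    """
    density, nbrs = knn_density(X, k)
    labels = np.full(len(X), -1)
    cluster = 0
    while (labels == -1).any():
        unassigned = np.flatnonzero(labels == -1)
        # Center point: the densest point that has no label yet.
        center = unassigned[np.argmax(density[unassigned])]
        threshold = ratio * density[center]
        labels[center] = cluster
        frontier = [center]
        while frontier:
            p = frontier.pop()
            for q in nbrs[p]:
                if labels[q] == -1 and density[q] >= threshold:
                    labels[q] = cluster
                    frontier.append(q)
        cluster += 1
    return labels
```

In this simplified form, every point receives a label, so isolated low-density points end up as small clusters of their own rather than being marked as noise. Because the stopping threshold is relative to each center's density, a sparse cluster and a dense cluster can both be recovered with the same `ratio`, which is the intuition behind the reduced parameter sensitivity the abstract reports.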
Source
Control Theory & Applications (《控制理论与应用》)
Indexed in: EI, CAS, CSCD, PKU Core
2011, No. 12, pp. 1791-1796 (6 pages)
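Funding
National Natural Science Foundation of China (60634020)
Natural Science Foundation of Hunan Province (08JJ3132)
Fundamental Research Funds for the Central Universities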
Keywords
clustering algorithms; KNN; GNN; density distribution function; OPTICS (ordering points to identify the clustering structure); DENCLUE (density-based clustering); local scale; radius scale factor