Abstract
The Density Peak Clustering (DPC) algorithm cannot accurately select cluster centers for datasets with varied densities and complex shapes, and the Clustering by Local Gravitation (LGC) algorithm has many parameters that require manual tuning. To address these issues, a Clustering algorithm based on Local Gravity and Distance (LGDC) was proposed. First, a local gravity model was used to calculate the concentration (CE) of each data point, and the distance between each point and the points with higher CE was determined from the CE values. Then, data points with both high CE and high distance values were selected as cluster centers. Finally, the remaining data points were assigned based on the idea that the CE of points inside a cluster is much higher than that of boundary points, and balanced k-nearest neighbors were used to adjust the parameters automatically. Experimental results show that LGDC achieves better clustering results on four synthetic datasets, and that on real datasets such as Wine, SCADI and Soybean, the Adjusted Rand Index (ARI) of LGDC is on average 0.1447 higher than that of algorithms such as DPC and LGC.
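The center-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the CE formula, so `concentration_standin` is a hypothetical stand-in (one minus the length of the mean unit vector pointing to the k nearest neighbors, which is high when neighbors surround a point evenly). The assignment step uses DPC's nearest-higher-scoring-neighbor rule rather than LGDC's own rule, and `k` is supplied by hand instead of being tuned through balanced k-nearest neighbors. All function names are assumptions made for illustration.

```python
import numpy as np

def concentration_standin(X, k):
    """Stand-in concentration score (an assumption, not the paper's CE)."""
    # diff[i, j] = X[j] - X[i]; dist[i, j] = Euclidean distance.
    diff = X[None, :, :] - X[:, None, :]
    dist = np.linalg.norm(diff, axis=2)
    # k nearest neighbours; column 0 is the point itself (assumes no duplicates).
    nn = np.argsort(dist, axis=1, kind="stable")[:, 1:k + 1]
    rows = np.arange(len(X))[:, None]
    unit = diff[rows, nn] / np.maximum(dist[rows, nn], 1e-12)[:, :, None]
    # Evenly surrounded points have a mean unit vector near zero, so a score near 1.
    ce = 1.0 - np.linalg.norm(unit.mean(axis=1), axis=1)
    return ce, dist

def select_centers_and_assign(X, k, n_clusters):
    ce, dist = concentration_standin(X, k)
    n = len(X)
    order = np.argsort(-ce, kind="stable")      # highest score first
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist.max()                # top-scoring point gets the largest distance
    for r in range(1, n):
        i = order[r]
        higher = order[:r]                      # points ranked above i
        j = higher[np.argmin(dist[i, higher])]  # nearest higher-scoring point
        delta[i] = dist[i, j]
        parent[i] = j
    # Centers: points with both a high score and a high distance value.
    gamma = (ce * delta)[order]
    centers = order[np.argsort(-gamma, kind="stable")[:n_clusters]]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # DPC-style assignment (swapped in here): inherit the label of the parent.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels

# Toy data: two well-separated 3x3 grids of points.
grid = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
X = np.vstack([grid, grid + [10.0, 0.0]])
centers, labels = select_centers_and_assign(X, k=8, n_clusters=2)
print(sorted(centers.tolist()), labels.tolist())
```

On this toy data the middle point of each grid is surrounded symmetrically, so it receives the highest stand-in score and a large distance value, which is the combination the abstract uses to identify cluster centers.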
Authors
DU Jie
MA Yan
HUANG Hui
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Source
Journal of Computer Applications
CSCD
Peking University Core Journals
2022, Issue 5, pp. 1472-1479 (8 pages)
Funding
National Natural Science Foundation of China (61373004).
Keywords
density peak clustering
gravity clustering
local gravity model
concentration
distance