Abstract
Density peaks clustering (DPC) is a density-based clustering algorithm that can find clusters of arbitrary shape. However, on datasets with density gaps among clusters, DPC cannot accurately select the clustering centers. Moreover, its allocation strategy for non-center points can cause continuous errors (a single misassigned point passes its wrong label on to the points assigned through it), which degrades clustering performance. Fuzzy k-nearest neighbor DPC (FKNN-DPC) is an improved DPC algorithm that avoids continuous errors by combining boundary-point detection with a two-step allocation strategy for non-center points. However, the boundary detection of FKNN-DPC performs poorly when clusters differ in density, and its allocation strategy does not take the nearest-neighbor information of data points into account. To address these issues, an improved DPC algorithm based on relative density and nearest neighbor relationships (RN-DPC) is proposed. First, relative density is defined to eliminate the density gap among clusters, so that correct clustering centers can be selected even when clusters differ in density. Then, the reverse k-nearest neighbor relationship is used to detect boundary points, and the shared nearest neighbor relationship is introduced to improve the non-center allocation strategy of FKNN-DPC. Finally, RN-DPC is compared with several clustering algorithms on synthetic and real-world datasets, and the experimental results demonstrate its effectiveness and rationality.
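The abstract builds on the standard DPC quantities (a local density score ρ, the distance δ to the nearest higher-density point, the decision value γ = ρ·δ used to pick centers, and the one-step allocation that copies the label of the nearest higher-density point) and on kNN relations. The plain-Python sketch below illustrates those pieces under stated assumptions. The kNN-based density ρ_i = Σ exp(−d_ij) over the k nearest neighbours is an assumed choice, and `relative_density` (ρ_i divided by the mean ρ of its k nearest neighbours) is a hypothetical stand-in, not the paper's definition. Reverse-kNN counts and shared-nearest-neighbour counts are shown only as raw quantities, without the paper's boundary thresholds or two-step allocation rules. All function names are hypothetical.

```python
import math


def pairwise_dist(points):
    """Euclidean distance matrix (list of lists) for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn(dist, i, k):
    """Indices of the k nearest neighbours of point i, excluding i itself."""
    others = (j for j in range(len(dist)) if j != i)
    return sorted(others, key=lambda j: dist[i][j])[:k]


def knn_density(dist, k):
    """Assumed kNN density: rho_i = sum of exp(-d_ij) over i's k nearest neighbours."""
    return [sum(math.exp(-dist[i][j]) for j in knn(dist, i, k)) for i in range(len(dist))]


def relative_density(rho, dist, k):
    """HYPOTHETICAL relative density, not the paper's definition: rho_i divided by
    the mean rho of i's k nearest neighbours, so each point is compared with its
    own neighbourhood instead of the whole dataset."""
    return [rho[i] / (sum(rho[j] for j in knn(dist, i, k)) / k) for i in range(len(rho))]


def delta(score, dist):
    """DPC delta: distance to the nearest point with a strictly higher score;
    the highest-scoring point gets its largest distance to any point."""
    n = len(score)
    out = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if score[j] > score[i]]
        out.append(min(higher) if higher else max(dist[i]))
    return out


def select_centers(score, delt, n_centers):
    """Pick the n_centers points with the largest decision value gamma = score * delta."""
    gamma = [s * d for s, d in zip(score, delt)]
    return sorted(range(len(gamma)), key=lambda p: gamma[p], reverse=True)[:n_centers]


def dpc_assign(score, dist, centers):
    """Standard one-step DPC allocation: visit points in decreasing score order and
    copy the label of the nearest point with a strictly higher score."""
    n = len(score)
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    for i in sorted(range(n), key=lambda p: score[p], reverse=True):
        if labels[i] != -1:
            continue
        higher = [j for j in range(n) if score[j] > score[i]]
        if not higher:
            raise ValueError("the highest-scoring point must be selected as a center")
        parent = min(higher, key=lambda j: dist[i][j])
        labels[i] = labels[parent]  # parent was visited earlier, so it is labelled
    return labels


def reverse_knn_counts(dist, k):
    """For each point, how many other points list it among their k nearest neighbours."""
    counts = [0] * len(dist)
    for i in range(len(dist)):
        for j in knn(dist, i, k):
            counts[j] += 1
    return counts


def shared_neighbours(dist, i, j, k):
    """Shared-nearest-neighbour count: size of the overlap of the kNN sets of i and j."""
    return len(set(knn(dist, i, k)) & set(knn(dist, j, k)))
```

On two five-point plus-shaped clusters, one tightly packed around (0, 0) and one spread out around (10, 0), with k = 4, both the raw kNN density and the hypothetical relative density select one center per cluster, and `dpc_assign` labels each cluster uniformly; the relative score ranks the sparse cluster's center first. Because `dpc_assign` copies labels from a nearest higher-scoring point, labels propagate along chains, so one wrong label is inherited by every point downstream of it; this is the continuous-error behaviour the abstract attributes to DPC.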
Authors
DAI Yongyang, ZHANG Qinghua, ZHI Xuechao
Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China
Source
Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
CSCD
Peking University Core Journal
2021, Issue 5, pp. 791-805 (15 pages)
Funding
National Key Research and Development Program of China (2020YFC2003502)
National Natural Science Foundation of China (61876201)
Keywords
clustering
density peaks
nearest neighbor relationship
boundary detection
nearest neighbor assignment