Abstract
Density clustering is an important class of cluster analysis methods: it does not require the number of clusters to be specified in advance, and it can discover clusters of arbitrary shape. However, the choice of the number of nearest neighbors K or the neighborhood radius Eps used to compute density strongly affects the clustering result. In addition, when the spacing between clusters varies widely within a dataset, density clustering cannot adapt to changes in the density of data objects within clusters, so the result deviates substantially from the actual structure of the data. To address these shortcomings, a density clustering algorithm based on effective neighboring points and adaptive density distribution is proposed. First, a telescopic radius is determined from the relative distance and the effective neighboring points of each data object are defined, which overcomes the influence of the choice of K on the clustering result. Second, thresholds for core points and boundary points are calculated using the relative distance, and the data objects in the core region of each cluster are determined from the effective neighboring points, which improves the efficiency of cluster analysis. Third, the effective distance within each cluster is adjusted, which mitigates the problems of uneven density distribution within clusters and excessive distance between clusters. Finally, the effectiveness of the proposed algorithm is validated on artificial and UCI datasets.
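The motivation in the abstract (a single neighborhood radius cannot serve clusters of very different density) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the definitions of relative distance, telescopic radius, or effective neighboring points, so the function `adaptive_radius_neighbors`, its `scale` parameter, and the rule "radius = scale × distance to the nearest neighbor" are hypothetical stand-ins chosen only to show the idea of a per-point radius.

```python
import math


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def fixed_radius_neighbors(points, eps):
    """Neighbors within one global radius eps, as in a DBSCAN-style query."""
    d = pairwise_distances(points)
    n = len(points)
    return [[j for j in range(n) if j != i and d[i][j] <= eps] for i in range(n)]


def adaptive_radius_neighbors(points, scale=1.5):
    """Neighbors within a per-point radius (hypothetical rule, not the paper's).

    Assumption: the radius of each point is `scale` times the distance to its
    nearest neighbor, so the radius stretches in sparse regions and shrinks in
    dense ones.
    """
    d = pairwise_distances(points)
    n = len(points)
    neighbors = []
    for i in range(n):
        nearest = min(d[i][j] for j in range(n) if j != i)
        radius = scale * nearest
        neighbors.append([j for j in range(n) if j != i and d[i][j] <= radius])
    return neighbors


# Two square clusters with the same shape but very different density.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
points = dense + sparse

fixed = fixed_radius_neighbors(points, eps=0.5)
adaptive = adaptive_radius_neighbors(points, scale=1.5)

print("fixed eps=0.5:", [len(nb) for nb in fixed])
print("adaptive     :", [len(nb) for nb in adaptive])
```

With the fixed radius, the points of the sparse cluster find no neighbors at all, whereas the per-point radius gives every point the three other members of its own cluster. Enlarging `eps` enough to cover the sparse cluster would work for this toy data, but on data where dense clusters lie close together it risks merging them, which is the trade-off the abstract describes.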
Authors
闫强强
张敏
荀亚玲
YAN Qiang-qiang; ZHANG Min; XUN Ya-ling (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
Source
《计算机技术与发展》
2022, No. 9, pp. 14-22 (9 pages)
Computer Technology and Development
Funding
National Young Scientists Fund of China (61602335)
Natural Science Foundation of Shanxi Province (201901D211302)
Keywords
density clustering
telescopic radius
effective neighboring points
adaptive density distribution
relative distance