Abstract
The K-means algorithm selects its initial cluster centers at random from the sample set, which makes the clustering results unstable, and its performance is easily degraded by outliers. To address these shortcomings, this paper defines a neighborhood radius based on the dissimilarity matrix and successively selects the samples with the smallest neighborhood radius as initial cluster centers, until the neighborhood radius reaches the average neighborhood radius of the sample set. If fewer than K centers have been selected, the neighborhood parameter is gradually reduced and the search repeated until K initial centers are chosen. An experimentally derived formula for removing outliers is then given, and the final clustering result is obtained. Experimental results show that the algorithm improves in both accuracy and number of iterations.
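The center-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything specific here is an assumption made for illustration: the dissimilarity matrix is taken to be pairwise Euclidean distance, the neighborhood radius of a sample is taken to be the distance to its m-th nearest neighbor (with m playing the role of the neighborhood parameter), and samples inside an already chosen neighborhood are skipped. The names `neighborhood_radius` and `select_initial_centers` are hypothetical, and the outlier-removal formula is not reproduced because the abstract does not state it.

```python
import numpy as np


def neighborhood_radius(D, m):
    """Assumed definition: distance from each sample to its m-th nearest neighbor."""
    # After sorting a row, column 0 is the zero self-distance,
    # so column m holds the distance to the m-th nearest neighbor.
    return np.sort(D, axis=1)[:, m]


def select_initial_centers(X, k, m):
    """Return indices of k initial centers, shrinking m if too few are found."""
    # Dissimilarity matrix, assumed here to be pairwise Euclidean distance.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while m >= 1:
        r = neighborhood_radius(D, m)
        mean_r = r.mean()
        available = np.ones(len(X), dtype=bool)
        centers = []
        # Visit samples from smallest to largest neighborhood radius.
        for i in np.argsort(r, kind="stable"):
            # Stop once the radius exceeds the average or k centers are found.
            if r[i] > mean_r or len(centers) == k:
                break
            if available[i]:
                centers.append(int(i))
                # Assumption: samples inside the chosen neighborhood
                # are not eligible as further centers.
                available &= D[i] > r[i]
        if len(centers) == k:
            return np.array(centers)
        m -= 1  # too few centers: shrink the neighborhood parameter and retry
    raise ValueError("could not select k centers; try a different k or m")


# Two well-separated groups of five points each.
group = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
X = np.vstack([group, group + 10.0])
centers = select_initial_centers(X, k=2, m=3)
```

In this toy data set the middle point of each group has the smallest assumed neighborhood radius, so one center is picked per group. The resulting indices could then be used to seed a standard K-means run in place of random initialization.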
Authors
李汉波
魏福义
张嘉龙
刘志伟
LI Hanbo; WEI Fuyi; ZHANG Jialong; LIU Zhiwei (South China Agricultural University, Guangzhou 510642, China)
Source
《现代信息科技》
2021, No. 7, pp. 67-70 (4 pages)
Modern Information Technology
Funding
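National Natural Science Foundation of China, Youth Program (11701189)
Guangdong Province College Students' Innovation and Entrepreneurship Program (S202010564034)
South China Agricultural University 微达安 Industrial College Project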