Abstract
The K-means algorithm is sensitive to its initial cluster centers, and the clustering result fluctuates with the choice of those centers. To address this problem, this paper proposes an adaptive K-means initialization method based on minimum variance, which places the initial cluster centers in K different dense regions of the samples so that the clustering result converges to the global optimum. First, the variance of the samples is computed from their spatial distribution to obtain a measure of sample compactness, and candidate initial cluster centers that satisfy the selection condition are chosen on that basis. Then the candidates are processed further and K initial cluster centers are selected from them. Experiments show that the algorithm achieves high clustering performance, is robust to noise and isolated points, and is suitable for clustering large-scale data sets.
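The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the compactness measure (mean squared distance to the m nearest neighbours), the candidate condition (local variance no larger than the average), the farthest-first filtering step, and the names local_variance and min_variance_init. It should not be read as the authors' exact method.

```python
import math


def local_variance(points, i, m):
    # Assumed compactness measure: mean squared distance from points[i]
    # to its m nearest neighbours (small value = dense neighbourhood).
    d2 = sorted(math.dist(points[i], q) ** 2
                for j, q in enumerate(points) if j != i)
    return sum(d2[:m]) / m


def min_variance_init(points, k, m=5):
    # Hypothetical helper: returns up to k initial centers taken from the data.
    n = len(points)
    var = [local_variance(points, i, m) for i in range(n)]

    # Stage 1 (assumed condition): keep samples whose local variance is
    # no larger than the average, which drops isolated points.
    mean_var = sum(var) / n
    candidates = [i for i in range(n) if var[i] <= mean_var]

    # Stage 2 (assumed filter): start from the most compact candidate, then
    # repeatedly add the candidate farthest from all centers chosen so far.
    chosen = [min(candidates, key=lambda i: var[i])]
    while len(chosen) < k:
        rest = [i for i in candidates if i not in chosen]
        if not rest:
            break
        chosen.append(max(
            rest,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        ))
    return [points[i] for i in chosen]
```

The returned points would then be passed to an ordinary K-means loop as its starting centers. This brute-force version compares every pair of samples, so it is only meant for small inputs; a neighbour index would be needed at the scale the abstract mentions.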
Source
《长春理工大学学报(自然科学版)》
2015, No. 5, pp. 140-144, 149 (6 pages)
Journal of Changchun University of Science and Technology(Natural Science Edition)
Keywords
clustering
K-means
variance
initial cluster centers