Abstract
To address the K-means clustering algorithm's dependence on its initial cluster centers, an improved algorithm is proposed that selects the initial centers according to the spatial distribution of the data. The algorithm first defines the distance between samples, the average difference of each sample, and the overall average difference of the sample set. It then sorts the samples by average difference and selects as initial cluster centers those samples that have a relatively large average difference and whose difference from the already-selected centers exceeds the overall average difference of the sample set. Experiments show that the improved algorithm increases the stability and accuracy of the clustering results, markedly reduces the number of iterations, and converges quickly.
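The selection procedure summarized in the abstract can be sketched roughly as follows. The abstract does not give the exact definitions, so this sketch makes several assumptions: the difference between two samples is taken to be their Euclidean distance, a sample's average difference is its mean distance to all other samples, and the overall average difference is the mean of those per-sample values. The function names `select_initial_centers` and `k_means`, and the fallback used when fewer than `k` samples pass the threshold, are hypothetical and not taken from the paper.

```python
import math


def select_initial_centers(samples, k):
    """Pick k initial centers from samples (a list of equal-length tuples).

    Assumes at least two samples and Euclidean distance as the difference.
    """
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Average difference of each sample: mean distance to the other n - 1 samples.
    avg_diff = [sum(row) / (n - 1) for row in dist]
    # Overall average difference of the sample set: mean of the per-sample values.
    total_avg = sum(avg_diff) / n
    # Visit samples from the largest to the smallest average difference.
    order = sorted(range(n), key=lambda i: avg_diff[i], reverse=True)
    chosen = []
    for i in order:
        # Accept a sample only if it differs from every chosen center
        # by more than the overall average difference.
        if all(dist[i][j] > total_avg for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Hypothetical fallback (not from the source): if too few samples pass the
    # threshold, fill up with the remaining samples in sorted order.
    for i in order:
        if len(chosen) == k:
            break
        if i not in chosen:
            chosen.append(i)
    return [samples[i] for i in chosen]


def k_means(samples, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assign each sample to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            return labels, centers, iteration
        centers = new_centers
    return labels, centers, max_iter


# Toy data: three well-separated groups of three points each.
samples = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (5, 9), (5, 10), (6, 9)]
initial = select_initial_centers(samples, 3)
labels, centers, iterations = k_means(samples, initial)
```

Because the selection is deterministic, repeated runs on the same data start from the same centers, which is one plausible reading of the stability claim. Note that the threshold test depends on how the clusters are laid out; on data where some clusters lie closer together than the overall average difference, this sketch falls back to the hypothetical fill-in step.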
Source
《控制与决策》 (Control and Decision)
EI
CSCD
PKU Core Journal (北大核心)
2017, No. 4, pp. 759-762 (4 pages)
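Funding
National Natural Science Foundation of China (61473118)
Natural Science Foundation of Hunan Province (2015JJ2074)
Open Fund of the Innovation Platform of Universities in Hunan Province (13K102)
Science and Technology Program of Hunan Province (2016TP1021)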
Keywords
K-means clustering
initial clustering center
sample difference