Abstract
The K-means clustering algorithm is sensitive to the choice of initial cluster centers. To address this problem, an optimization algorithm combining variance and the sum of squared errors (SSE) is proposed. First, based on variance and distance, the algorithm selects k sample sets that lie in different regions and whose points are relatively concentrated. Then, within each of these k sets, the sample point that minimizes the set's SSE is chosen, giving k initial cluster centers. The improved algorithm and several other algorithms were used to cluster datasets selected from the UCI repository, and their clustering results were compared. The results show that the improved algorithm not only improves clustering quality but also reduces the number of clustering iterations and speeds up convergence.
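The abstract describes the initialization only at a high level, so the following Python sketch shows one possible reading of it rather than the authors' exact procedure. The neighbourhood size `m`, the definition of each candidate set as a point plus its `m - 1` nearest neighbours, and the score "distance to the nearest already-chosen set divided by local variance" are all assumptions of this sketch; the function names (`init_centers`, `kmeans`, etc.) are hypothetical. Each initial center is the member of its set that minimizes the set's SSE, i.e. the sum of ||s - x||^2 over all s in the set, after which ordinary Lloyd iterations run.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(pts: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def neighbourhood(data: Sequence[Point], i: int, m: int) -> List[int]:
    """Hypothetical candidate set: point i plus its m - 1 nearest neighbours."""
    others = sorted((j for j in range(len(data)) if j != i),
                    key=lambda j: sq_dist(data[i], data[j]))
    return [i] + others[: m - 1]


def set_variance(data: Sequence[Point], idx: Sequence[int]) -> float:
    """Mean squared distance of the set's points to their centroid."""
    pts = [data[j] for j in idx]
    c = mean_point(pts)
    return sum(sq_dist(p, c) for p in pts) / len(pts)


def init_centers(data: Sequence[Point], k: int, m: int) -> List[Point]:
    """Hypothetical variance + SSE initialization (one reading of the abstract)."""
    n = len(data)
    sets = [neighbourhood(data, i, m) for i in range(n)]
    var = [set_variance(data, s) for s in sets]
    eps = 1e-12  # guards against division by zero for duplicate points
    # First set: the most concentrated neighbourhood (smallest variance).
    anchors = [min(range(n), key=lambda i: var[i])]
    while len(anchors) < k:
        # Assumed score: distance to the nearest chosen anchor divided by
        # local variance, favouring dense regions far from existing sets.
        def score(i: int) -> float:
            d = min(math.sqrt(sq_dist(data[i], data[a])) for a in anchors)
            return d / (var[i] + eps)
        candidates = [i for i in range(n) if i not in anchors]
        anchors.append(max(candidates, key=score))
    centers = []
    for a in anchors:
        members = [data[j] for j in sets[a]]
        # Initial center: the member point that minimizes the set's SSE.
        best = min(members, key=lambda x: sum(sq_dist(s, x) for s in members))
        centers.append(tuple(best))
    return centers


def kmeans(data: Sequence[Point], centers: Sequence[Point], max_iter: int = 100):
    """Standard Lloyd iterations; returns (centers, labels, iterations)."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for it in range(1, max_iter + 1):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in data]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Keep an empty cluster's center where it was.
            new_centers.append(mean_point(members) if members else centers[c])
        if new_centers == centers:  # assignments stable, so means unchanged
            return new_centers, labels, it
        centers = new_centers
    return centers, labels, max_iter


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (0, 10), (0, 11), (1, 10), (1, 11)]
    start = init_centers(data, k=3, m=4)
    final, labels, iters = kmeans(data, start)
    print(start, final, labels, iters)
```

Dividing distance by variance is just one simple way to combine the two criteria named in the abstract; the paper may weight or threshold them differently, and `m` would need tuning per dataset.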
Authors
ZENG Ruming; LI Yunfei (College of Mathematics and Information, China West Normal University, Nanchong 637009, China)
Source
Journal of Shaoyang University: Natural Science Edition, 2021, No. 2, pp. 8-14 (7 pages)
Funding
China West Normal University Talent Research Fund Project (17YC381).