
K-means Algorithm Based on Minimum Deviation Initialized Clustering Centers (最小方差优化初始聚类中心的K-means算法)

Cited by: 86
Abstract: The traditional K-means algorithm selects its initial cluster centers at random, which tends to make the clustering results unstable. Existing K-means variants that optimize the initial centers require parameters to be chosen, so their results lack objectivity. To address both problems, this paper proposes a K-means algorithm that uses minimum variance to optimize the initial cluster centers, based on how compactly the samples are distributed in space. The algorithm computes a variance for each sample from the spatial distribution of the data and uses it as a measure of compactness: the smaller a sample's variance, the more densely other samples gather around it. It then chooses as initial cluster centers the K samples that have the smallest variance (that is, the highest compactness) and lie a certain distance apart from each other, and runs K-means from those centers. Experiments on data sets from the UCI machine learning repository and on synthetic data sets containing varying proportions of noise show that the algorithm achieves good clustering results, that the results are stable, and that it is robust to noise.
Source: Computer Engineering (《计算机工程》), indexed in CAS and CSCD, 2014, No. 8, pp. 205-211, 223 (8 pages)
Funding: National Natural Science Foundation of China (31372250); Shaanxi Provincial Science and Technology Research Program (2013K12-03-24); Fundamental Research Funds for the Central Universities (GK201102007)
Keywords: clustering; K-means algorithm; variance; compactness; initial cluster centers
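
The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact formulas, so two definitions here are assumptions. A sample's "variance" is taken to be the variance of its distances to all other samples, and the "certain distance" separating centers is taken to be the mean pairwise distance. The function name `min_variance_init` and the `radius` parameter are hypothetical.

```python
import math
from statistics import mean, pvariance


def min_variance_init(points, k, radius=None):
    """Pick up to k initial centers: low per-sample variance, mutually far apart.

    Assumptions (not taken from the paper): the per-sample variance is the
    variance of a sample's distances to the other samples, and the default
    separation radius is the mean pairwise distance.
    """
    n = len(points)
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # For each sample, its distances to the other n - 1 samples.
    others = [[dist[i][j] for j in range(n) if j != i] for i in range(n)]
    # Assumed compactness measure: smaller variance = denser neighbourhood.
    variance = [pvariance(row) for row in others]
    if radius is None:
        # Assumed separation threshold: mean distance over all pairs.
        radius = mean(d for row in others for d in row)

    available = set(range(n))
    centers = []
    while len(centers) < k and available:
        # Take the remaining sample with the smallest variance
        # (ties broken by index so the result is deterministic).
        best = min(available, key=lambda i: (variance[i], i))
        centers.append(points[best])
        # Drop every sample within `radius` of the chosen one, so the
        # next center is forced to lie some distance away.
        available = {j for j in available if dist[best][j] > radius}
    # May return fewer than k centers if the candidates run out.
    return centers
```

The returned centers would then be used as the starting centers for an ordinary K-means run in place of random seeds. Because nothing in the selection is random, repeated runs on the same data start from the same centers, which is the stability property the abstract emphasizes.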
