
Clustering Algorithm CARDBK Improved from K-means Algorithm (cited by: 12)
Abstract: The CARDBK clustering algorithm differs from batch K-means in that a point is not assigned to a single cluster; instead, it influences the centroids of several clusters at once. How strongly a point influences a given cluster's centroid depends on the distances between that point and the centroids of the other clusters that lie closer to it. The algorithm is evaluated on five performance indices of the clustering result: entropy, purity, F1 value, Rand Index, and normalized mutual information (NMI). Compared with several other algorithms on several data sets, it produces relatively good clustering results. Compared with several other algorithms on a single data set under many different initialization conditions, its results are both relatively good and stable. Its running time scales linearly with data set size, and it is comparatively fast.
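The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the CARDBK update itself. It assumes a hypothetical weight `exp(-beta * penalty)`, where `penalty` sums how much closer each nearer centroid is to the point; the names `soft_kmeans_step`, `cluster`, and `beta` are illustrative and do not come from the paper.

```python
import math


def soft_kmeans_step(points, centroids, beta=1.0):
    """One batch update in which every point contributes to every centroid.

    ASSUMPTION: the weight rule below is a stand-in chosen for illustration;
    the paper's actual formula is not stated in the abstract.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    totals = [0.0] * len(centroids)
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        for k, d_k in enumerate(dists):
            # Penalty is zero for the nearest centroid and grows with the
            # gap between d_k and each centroid that is nearer to the point.
            penalty = sum(d_k - d_j for d_j in dists if d_j < d_k)
            w = math.exp(-beta * penalty)
            totals[k] += w
            for i in range(dim):
                sums[k][i] += w * p[i]
    # Weighted mean per centroid; keep the old centroid if it got no weight.
    return [
        [s / totals[k] for s in sums[k]] if totals[k] > 0 else list(centroids[k])
        for k in range(len(centroids))
    ]


def cluster(points, centroids, beta=1.0, iters=20):
    """Repeat the batch update a fixed number of times."""
    for _ in range(iters):
        centroids = soft_kmeans_step(points, centroids, beta)
    return centroids
```

Under this assumed rule, a large `beta` makes the update behave almost like ordinary batch K-means, because only the nearest centroid receives noticeable weight, while `beta = 0` gives every point equal influence on every centroid, so all centroids move to the global mean.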
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2015, Issue 3, pp. 201-205 (5 pages)
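Funding: National Natural Science Foundation of China (61379019, 71102149); China Postdoctoral Science Foundation (2013M540704); Sichuan Province Academic and Technical Leaders Training Fund; Sichuan Province Postdoctoral Research Fund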
Keywords: clustering; document clustering; text clustering; K-means; algorithm