
A Parallel Speeding K-means Clustering Method (一种并行的加速k-均值聚类方法)

Cited by: 2
Abstract: Traditional k-means clustering cannot efficiently handle massive datasets. To address this, the paper proposes an accelerated k-means clustering method based on parallel computing (Pk-means). The method first randomly partitions the full sample set into several independent, identically distributed working sets and runs traditional k-means on each working set in parallel, obtaining the center and radius of every resulting cluster. It then merges the sub-clusters from the individual working sets according to the relationships among the per-set clustering results, and performs a second merging pass over special samples to correct the final clustering. Experimental results show that, on large-scale datasets, Pk-means substantially improves clustering efficiency while maintaining clustering quality.
Author: Wang Xiuhua (王秀华)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2013, No. 6X, pp. 4299-4302 (4 pages)
Keywords: k-means clustering; parallel computing; parallel k-means clustering; working set; efficiency
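
The procedure outlined in the abstract (random partition into working sets, k-means on each set in parallel, merging of sub-clusters, then a correction pass) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the abstract does not state the merging criterion or how "special" samples are identified, so the greedy closest-center merge and the final nearest-center reassignment below are assumptions, as are all function and parameter names (`kmeans`, `merge_subclusters`, `pk_means`, `n_sets`). Threads are used only to keep the sketch portable; a process pool or a cluster would be needed for real speedup in Python.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def mean(points):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[j].append(p)
    return groups


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd k-means; returns (center, radius, size) per non-empty cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    groups = assign(points, centers)
    return [(c, max(math.dist(c, p) for p in g), len(g))
            for c, g in zip(centers, groups) if g]


def merge_subclusters(subs, k):
    """ASSUMED rule: repeatedly merge the two closest centers until k remain."""
    subs = list(subs)
    while len(subs) > k:
        pairs = ((a, b) for a in range(len(subs)) for b in range(a + 1, len(subs)))
        i, j = min(pairs, key=lambda ab: math.dist(subs[ab[0]][0], subs[ab[1]][0]))
        (c1, r1, n1), (c2, r2, n2) = subs[i], subs[j]
        n = n1 + n2
        c = tuple((n1 * x + n2 * y) / n for x, y in zip(c1, c2))
        # Upper bound on the radius of the merged cluster.
        r = max(math.dist(c, c1) + r1, math.dist(c, c2) + r2)
        del subs[j]  # j > i, so index i is still valid
        subs[i] = (c, r, n)
    return subs


def pk_means(points, k, n_sets=4, seed=0):
    """Sketch of the partition / parallel k-means / merge / correct pipeline."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    # Random partition into n_sets working sets of near-equal size.
    work_sets = [shuffled[i::n_sets] for i in range(n_sets)]
    with ThreadPoolExecutor(max_workers=n_sets) as pool:
        results = pool.map(lambda ws: kmeans(ws, k, seed=seed), work_sets)
        subs = [s for res in results for s in res]
    centers = [c for c, _, _ in merge_subclusters(subs, k)]
    # ASSUMED correction pass: reassign every sample to its nearest merged center.
    labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
              for p in points]
    return centers, labels
```

Each working set must contain at least `k` samples for the initialization to succeed. The per-cluster radius is computed and carried through the merge because the abstract mentions it, but this sketch does not use it as a merging criterion; a radius-overlap test would be one plausible alternative to the distance-only rule chosen here.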