K-means clustering algorithm with Hadoop framework for large data mining (times cited: 22)

Abstract: To improve the efficiency of big data clustering, a K-means clustering algorithm based on the Hadoop framework is proposed. Using the MapReduce model adopted by Hadoop, the big data set is divided into many data blocks. In the Map phase, a weighted K-means clustering algorithm clusters each data block independently, producing cluster centers and their weights. In the Reduce phase, a weighted fusion K-means clustering algorithm fuses the cluster centers and weights obtained in the Map phase to produce the final clustering result. Clustering experiments on the HIGGS dataset show that the algorithm greatly improves the computational efficiency of K-means clustering on big data while maintaining clustering accuracy.
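The abstract does not give the exact weighting or fusion rules, so the following is only a minimal single-process sketch of the Map/Reduce idea under stated assumptions: each Map task runs an ordinary Lloyd-style K-means on its block and reports every local center with a weight equal to the number of points assigned to it; the Reduce step then runs the same weighted K-means over all local centers, treating each center as a point carrying that weight. The function names (`weighted_kmeans`, `map_phase`, `reduce_phase`, `cluster_big_data`), the contiguous block split, and the deterministic farthest-first seeding are hypothetical choices for illustration. No Hadoop API is used, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: start from the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    max_iter: int = 100) -> Tuple[List[Point], List[float]]:
    """Lloyd-style K-means in which each point contributes with its weight.

    Returns the k centers and, for each center, the total weight of the
    points averaged into it (0.0 for a cluster that ended up empty).
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    centers = _farthest_first_init(points, k)
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        # Assignment step: each point goes to its nearest center.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda i: _sq_dist(p, centers[i]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Update step: weighted mean; an empty cluster keeps its old center.
        new_centers = [
            tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:  # assignments stable, so means unchanged
            break
        centers = new_centers
    return centers, mass


def map_phase(block: List[Point], k: int) -> List[Tuple[Point, float]]:
    """Map (assumed form): cluster one block with unit weights and emit
    (local center, weight) pairs, dropping empty clusters."""
    centers, mass = weighted_kmeans(block, [1.0] * len(block), k)
    return [(c, m) for c, m in zip(centers, mass) if m > 0]


def reduce_phase(pairs: List[Tuple[Point, float]], k: int) -> List[Point]:
    """Reduce (assumed form): fuse all local centers with weighted K-means."""
    centers = [c for c, _ in pairs]
    weights = [w for _, w in pairs]
    final, _ = weighted_kmeans(centers, weights, k)
    return final


def cluster_big_data(data: List[Point], k: int, n_blocks: int) -> List[Point]:
    """Single-process stand-in for the MapReduce job: split the data into
    contiguous blocks, map each block, then reduce all emitted pairs."""
    size = math.ceil(len(data) / n_blocks)
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    pairs = [pair for block in blocks for pair in map_phase(block, k)]
    return reduce_phase(pairs, k)
```

Because each local center is the mean of its points and carries their count as its weight, the weighted mean that the Reduce step computes over a group of local centers equals the mean of all raw points behind them. This is why, in this sketch, the fusion step can work on k centers per block instead of on the full data set.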
Authors: 李爽 (LI Shuang), 陈瑞瑞 (CHEN Rui-rui), 林楠 (LIN Nan) (School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou 451199, China; College of Software and Application of Science and Technology, Zhengzhou University, Zhengzhou 451199, China)
Source: 《计算机工程与设计》 (Computer Engineering and Design), Peking University core journal, 2018, no. 12, pp. 3734-3738 (5 pages)
Funding: National Natural Science Foundation of China (61502204)
Keywords: data mining; K-means clustering; Hadoop framework; big data; MapReduce model