
Research on key technologies of distributed Hadoop data mining in cloud computing platform (article in English) Cited by: 10
Abstract: Big data feature mining in a cloud computing environment is the basis for big data statistics and analysis. To improve the accuracy and speed of clustering, this paper designs a data mining scheme based on a distributed Hadoop platform and entropy-weighted feature selection. The scheme first uses a directed acyclic graph to analyze the MapReduce job-flow scheduling problem on the Hadoop platform. It then uses parallel MapReduce execution to carry out the distributed computation. Finally, it applies an entropy-weighted clustering algorithm to mine massive data. Simulation results show that the proposed scheme achieves good clustering quality and running efficiency.
Authors: Jie HE (何婕); Min LAI (赖敏) (College of Electronic Information Engineering, Chongqing Radio and Television University, Chongqing 401520, China; Chongqing Institute of Engineering, College of Software Engineering & Computer Science, Chongqing 401320, China)
Source: Machine Tool & Hydraulics (《机床与液压》), Peking University Core Journal, 2018, No. 24, pp. 144-149 (6 pages)
Funding: Chongqing Science and Technology Research Project of the Education Commission (KJ1737458)
Keywords: cloud computing; big data mining; MapReduce; Hadoop; entropy weighting; clustering algorithm

