
Data Mining Based on Improved Map/Reduce and Pattern Space Division (cited by: 4)
Abstract: Map/Reduce can perform data mining by using key/value pairs to process the many-to-many correspondence between a data set and a pattern set. For more complex pattern types, however, combinatorial explosion makes the pattern set too large for this correspondence to be processed directly. To address this, the paper proposes pattern space division, which converts the problem of processing the correspondence between the data set and the pattern set into the problem of processing the correspondence between the data set and a set of sub-pattern sets. The scheduling mechanism of the Map/Reduce cluster and the way it organizes and processes key/value pairs are also improved, strengthening the ability of Map/Reduce to execute pattern mining tasks. The results show that, on a Map/Reduce cluster, algorithms implemented this way for some of the more complex pattern types achieve a higher degree of parallelism than Map/Reduce versions of the traditional algorithms.
Authors: 刘骞 (Liu Qian), 陈明 (Chen Ming)
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2011, No. 8, pp. 140-142 (3 pages)
Keywords: improved Map/Reduce; cluster; many-to-many mapping; pattern space division; pattern coding; data mining
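
The abstract's central idea, replacing one oversized pattern set with a set of sub-pattern sets that Map/Reduce can handle as key/value pairs, can be illustrated with a small single-process sketch. This is not the authors' algorithm: the pattern type (itemsets), the division rule (a pattern belongs to the sub-pattern set named by its smallest item), and every function name below are assumptions chosen only for illustration, and the paper's scheduling improvements and pattern coding are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions):
    """Emit (sub-pattern-set key, projected transaction) pairs.

    Assumed division rule: sub-pattern set k holds every itemset whose
    smallest item is k, so a transaction is sent only to the sets it
    can actually support.
    """
    for transaction in transactions:
        items = sorted(set(transaction))
        for i, key in enumerate(items):
            # Only items >= key can appear in a pattern of set `key`.
            yield key, tuple(items[i:])


def shuffle(pairs):
    """Group values by key, as a Map/Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, projected, min_support, max_len):
    """Count the patterns of one sub-pattern set, independently of the others."""
    counts = defaultdict(int)
    for items in projected:
        rest = items[1:]  # items[0] is the key itself
        for r in range(max_len):  # pattern length is r + 1
            for combo in combinations(rest, r):
                counts[(key,) + combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def mine(transactions, min_support=2, max_len=3):
    """Run map, shuffle, and reduce in one process (illustrative only)."""
    result = {}
    for key, projected in shuffle(map_phase(transactions)).items():
        result.update(reduce_phase(key, projected, min_support, max_len))
    return result
```

Because each reduce call touches only one sub-pattern set, the reduce calls are independent and could run on separate workers, which is the kind of parallelism the abstract refers to. Calling `mine(transactions, min_support, max_len)` on a list of item lists returns a dictionary from frequent itemsets (as sorted tuples) to their support counts.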

References (4)

  • 1. Ng Andrew Y, Bradski Gary, Chu ChengTao. MapReduce for machine learning on multicore[R]. NIPS, 2006.
  • 2. Dan Gillick, Arlo Faria, John DeNero. Map/Reduce: distributed computing for machine learning[R]. 2006.
  • 3. Dean Jeffrey, Ghemawat Sanjay. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1):107-113.
  • 4. Chen Kang, Zheng Weimin. Cloud computing: system instances and current research[J]. Journal of Software (软件学报), 2009, 20(5):1337-1348. (cited by: 1312)

