Data Mining Based on Improved Map/Reduce and Pattern Space Division (cited by: 4)

Abstract: Data mining can be expressed in Map/Reduce by using key/value pairs to process the many-to-many correspondence between a data set and a pattern set. For more complex pattern types, however, combinatorial explosion makes the pattern set so large that Map/Reduce cannot always process this correspondence directly. This paper therefore proposes pattern space division, which converts the problem of processing the many-to-many correspondence between the data set and the pattern set into the problem of processing the many-to-many correspondence between the data set and a set of sub-pattern sets. It also improves the scheduling mechanism of the Map/Reduce cluster and the way key/value pairs are organized and processed, strengthening the ability of Map/Reduce to execute pattern mining tasks. For some of the more complex pattern types, mining algorithms implemented this way on a Map/Reduce cluster achieve higher parallelism than Map/Reduce versions of traditional algorithms.
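The abstract does not say how the pattern space is divided, so the following is only a minimal single-process sketch of the general idea, not the authors' algorithm. It assumes itemset patterns and a hypothetical division rule: each sub-pattern set holds all patterns that share the same smallest item. The names `map_phase`, `shuffle`, `reduce_phase`, and `mine`, and the parameters `max_len` and `min_support`, are illustrative assumptions. The map step emits one key/value pair per (sub-pattern set, transaction) instead of one per (pattern, transaction), which keeps the number of intermediate pairs linear in the transaction length.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions):
    """Emit (sub-pattern-set key, transaction suffix) pairs.

    Assumed division rule: a sub-pattern set is identified by the smallest
    item of its patterns. Any pattern with smallest item items[i] that occurs
    in the transaction is a subset of items[i:].
    """
    for transaction in transactions:
        items = sorted(set(transaction))
        for i, item in enumerate(items):
            yield item, tuple(items[i:])


def shuffle(pairs):
    """Group values by key, standing in for the Map/Reduce shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, suffixes, max_len, min_support):
    """Count every pattern of one sub-pattern set and keep the frequent ones."""
    counts = defaultdict(int)
    for suffix in suffixes:
        rest = suffix[1:]  # suffix[0] is the key itself
        for extra in range(max_len):  # pattern length = 1 + extra
            for combo in combinations(rest, extra):
                counts[(key,) + combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def mine(transactions, max_len=3, min_support=2):
    """Run map, shuffle, and reduce in sequence; each key could run in parallel."""
    result = {}
    for key, suffixes in shuffle(map_phase(transactions)).items():
        result.update(reduce_phase(key, suffixes, max_len, min_support))
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"]]
    for pattern, support in sorted(mine(data).items()):
        print(pattern, support)
```

Because each reduce call only sees one sub-pattern set, the reducers share no state, which is the property that would let a real cluster process the sub-pattern sets in parallel. A different division rule (for example, one based on a pattern encoding) would only change how the map step chooses its keys.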

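Authors: Liu Qian (刘骞), Chen Ming (陈明)

Affiliation: Department of Computer Science and Technology, China University of Petroleum

Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2011, No. 8, pp. 140-142 (3 pages)

Keywords: improved Map/Reduce cluster; many-to-many mapping; pattern space division; pattern coding; data mining

Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]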


