Research on a Rough Set Mass Data Partition Algorithm Based on Attribute Reduction (Cited by: 1)

Mass Data Partition for Rough Set on Attribute Reduction Algorithm

Abstract: This paper uses rough set theory to study a key problem in the distributed processing of mass data: how to partition a massive data set. Classical rough set algorithms require the data to be memory-resident, so they cannot handle mass data effectively. To process mass data sets directly, the authors start from the definition of a best partition, combine it with the idea of attribute reduction, and propose a mass data partition algorithm for rough sets based on attribute reduction (Mass Data Partition for Rough Set on Attribute Reduction, MDPRS-AR). Simulation experiments show that MDPRS-AR partitions data about 70% more efficiently than traditional algorithms, with little loss of accuracy compared with algorithms that process the entire data set as a whole.
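The abstract does not spell out the MDPRS-AR procedure or its definition of a best partition, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It shows textbook rough set dependency and a greedy forward-selection reduct, followed by a hypothetical splitting step that keeps each indiscernibility class of the reduct in a single part. All function names (`partition`, `dependency`, `quick_reduct`, `split_by_reduct`) and the toy table are illustrative assumptions.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Share of rows in the positive region, i.e. rows whose
    indiscernibility class carries a single decision value."""
    if not rows:
        return 1.0
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, cond_attrs, decision):
    """Greedy forward selection: add the attribute that raises the
    dependency most until it matches that of the full attribute set."""
    target = dependency(rows, cond_attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

def split_by_reduct(rows, reduct, n_parts):
    """Hypothetical splitting rule (an assumption, not from the paper):
    place each indiscernibility class wholly in the currently smallest part."""
    parts = [[] for _ in range(n_parts)]
    for block in sorted(partition(rows, reduct), key=len, reverse=True):
        min(parts, key=len).extend(rows[i] for i in block)
    return parts

# Toy decision table: the decision "d" is fully determined by "b".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
reduct = quick_reduct(rows, ["a", "b", "c"], "d")
parts = split_by_reduct(rows, reduct, 2)
```

Keeping whole indiscernibility classes together means that a rule induced on one part is not contradicted by rows of the same class in another part, which is one plausible reading of why a reduct-driven split could lose little accuracy. The greedy reduct is a heuristic and is not guaranteed to be minimal.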

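Authors: Xia Qisi (夏奇思), Wang Ruchuan (王汝传)

Affiliations: College of Computer, Nanjing University of Posts and Telecommunications; State Key Laboratory for Novel Software Technology, Nanjing University

Source: Computer Technology and Development (《计算机技术与发展》), 2010, No. 4, pp. 5-7, 11 (4 pages)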

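Funding: National Natural Science Foundation of China (60973139, 60773041); Natural Science Foundation of Jiangsu Province (BK2008451); National High-Tech 863 Program (2007AA01Z404, 2007AA01Z478); State Key Laboratory of Modern Communications Fund (9140C1105040805); National and Jiangsu Provincial Postdoctoral Funds (0801019C, 20090451240, 20090451241); Jiangsu Universities Science and Technology Innovation Program (CX08B-086Z); Jiangsu "Six Talent Peaks" Project (2008118); Jiangsu "Qing Lan" Project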

Keywords: mass data; rough set; data partition; distributed processing; attribute reduction

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

