Clustering Algorithm for Data Stream with Heterogeneous Attributes Based on Information Entropy Dimension Reduction

Abstract: Existing data stream clustering algorithms cannot handle data streams with high-dimensional heterogeneous (mixed numeric and categorical) attributes. To address this problem, this paper improves the offline and online clustering processes of the HPStream algorithm. It uses a frequency matrix to handle categorical attributes, and it reduces data dimensionality with a categorical attribute selection method based on information entropy. Experimental results show that the algorithm can effectively handle data sets with heterogeneous attributes and relatively high dimensionality, and that its clustering precision is 5% to 15% higher than that of HPStream.
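The abstract names the two mechanisms but not their formulas, so the following is only a minimal sketch of how they could fit together, not the authors' method. It assumes that the "frequency matrix" of a cluster stores, for each categorical attribute, the count of each observed value; that entropy-based selection keeps, per cluster, the k categorical attributes with the lowest Shannon entropy (by analogy with how HPStream projects each cluster onto its lowest-spread numeric dimensions); and that a point is compared with a cluster through a simple frequency-based dissimilarity in the spirit of Huang's k-modes extension. The class and method names (`CategoricalFrequencyMatrix`, `select_attributes`, `distance`) are hypothetical, and the numeric attributes, handled by HPStream's own cluster statistics, are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


class CategoricalFrequencyMatrix:
    """Hypothetical per-cluster frequency matrix for categorical attributes.

    counts[j][v] is the number of points absorbed by the cluster whose
    j-th categorical attribute has value v. (Assumed layout, not from the paper.)
    """

    def __init__(self, num_attrs: int) -> None:
        self.n = 0  # number of points summarized by this cluster
        self.counts: List[Counter] = [Counter() for _ in range(num_attrs)]

    def add(self, values: Sequence[str]) -> None:
        """Absorb one point in a single pass; raw points are not stored."""
        self.n += 1
        for j, v in enumerate(values):
            self.counts[j][v] += 1

    def merge(self, other: "CategoricalFrequencyMatrix") -> None:
        """Counts are additive, so two clusters merge by summing their matrices."""
        self.n += other.n
        for mine, theirs in zip(self.counts, other.counts):
            mine.update(theirs)  # Counter.update adds counts

    def entropy(self, j: int) -> float:
        """Shannon entropy (bits) of attribute j's value distribution in this cluster."""
        h = 0.0
        for c in self.counts[j].values():
            p = c / self.n
            h -= p * math.log2(p)
        return h

    def select_attributes(self, k: int) -> List[int]:
        """Assumed selection rule: keep the k lowest-entropy (most concentrated) attributes."""
        order = sorted(range(len(self.counts)), key=lambda j: (self.entropy(j), j))
        return sorted(order[:k])

    def distance(self, values: Sequence[str], attrs: Sequence[int]) -> float:
        """Assumed frequency-based dissimilarity in [0, 1], over selected attributes only."""
        if not attrs:
            raise ValueError("attrs must be non-empty")
        if self.n == 0:
            return 1.0  # an empty cluster matches nothing
        total = sum(1.0 - self.counts[j][values[j]] / self.n for j in attrs)
        return total / len(attrs)


if __name__ == "__main__":
    fm = CategoricalFrequencyMatrix(num_attrs=3)
    for rec in [("red", "tcp", "a"), ("red", "udp", "b"),
                ("red", "tcp", "c"), ("red", "tcp", "d")]:
        fm.add(rec)
    kept = fm.select_attributes(2)
    print(kept)                                    # [0, 1]
    print(fm.distance(("red", "tcp", "z"), kept))  # 0.125
```

In this toy cluster, attribute 0 always takes the same value (entropy 0), attribute 1 is mostly `tcp` (about 0.81 bits), and attribute 2 is spread evenly over four values (2 bits), so the sketch discards attribute 2 for this cluster. The idea is that a high-entropy attribute does little to characterize the cluster, so dropping it lowers the dimensionality used in distance computations.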

Authors: Tan Jianjian, Zheng Hongyuan, Ding Qiulin

Affiliation: College of Information Science and Technology, Nanjing University of Aeronautics and Astronautics

Source: Computer Engineering (《计算机工程》; indexed in CAS, CSCD and the Peking University core journal list), 2011, No. 19, pp. 82-84, 87 (4 pages)

Keywords: data stream mining; heterogeneous attributes; frequency matrix; information entropy; dimension reduction

CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]

References (5)

[1] Aggarwal C C, Han Jiawei, Wang Jianyong, et al. A Framework for Clustering Evolving Data Streams[C]//Proceedings of the 29th International Conference on Very Large Data Bases. Berlin, Germany: [s. n.], 2003: 81-92.
[2] Aggarwal C C, Han Jiawei, Yu P S. A Framework for Projected Clustering of High Dimensional Data Streams[C]//Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, Canada: [s. n.], 2004: 852-863.
[3] Yao Wenji, Gao Mingxia, Mao Guojun, Li Guangkui. XML Data Stream Clustering Algorithm Based on Sliding Window[J]. Computer Engineering, 2010, 36(13): 87-89.
[4] Yang Chunyu, Zhou Jie. A Clustering Algorithm for Data Streams with Heterogeneous Attributes[J]. Chinese Journal of Computers, 2007, 30(8): 1364-1371.
[5] Huang Zhexue. Extensions to the K-means Algorithm for Clustering Large Data Sets with Categorical Values[J]. Data Mining and Knowledge Discovery, 1998, 2(3): 283-304.
