
Clustering Algorithm for Data Stream with Heterogeneous Attributes Based on Information Entropy Dimension Reduction
Abstract: Existing data stream clustering algorithms cannot handle high-dimensional data streams with heterogeneous (mixed numerical and categorical) attributes. To address this problem, the paper improves both the offline and the online clustering processes of the HPStream algorithm: a frequency matrix is used to handle the categorical attributes, and a categorical attribute selection method based on information entropy is used to reduce the data dimensionality. Experimental results show that the algorithm handles data sets with mixed attributes and relatively high dimensionality effectively, and that its clustering precision is 5% to 15% higher than that of the HPStream algorithm.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 19, pp. 82-84, 87 (4 pages in total).
Keywords: data stream mining; heterogeneous attributes; frequency matrix; information entropy; dimension reduction
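The abstract names two ingredients, a frequency matrix for categorical attributes and an entropy-based selection of those attributes, without giving formulas. The following is a minimal sketch of how the two could fit together, not the paper's actual procedure. The function names, the toy records, and the choice to rank attributes by ascending Shannon entropy are all assumptions made for illustration; the abstract does not state which attributes the selection step keeps.

```python
import math
from collections import Counter


def frequency_matrix(records, categorical_indices):
    """Map each categorical attribute index to a Counter of its value counts.

    Hypothetical helper: one row of counts per attribute.
    """
    return {i: Counter(rec[i] for rec in records) for i in categorical_indices}


def shannon_entropy(freq):
    """Shannon entropy (in bits) of the distribution described by value counts."""
    total = sum(freq.values())
    return sum((c / total) * math.log2(total / c) for c in freq.values() if c > 0)


def rank_attributes_by_entropy(records, categorical_indices):
    """Return (attribute index, entropy) pairs sorted by ascending entropy.

    Assumption: ascending order is only one possible selection criterion.
    """
    matrix = frequency_matrix(records, categorical_indices)
    scored = [(i, shannon_entropy(matrix[i])) for i in categorical_indices]
    return sorted(scored, key=lambda pair: pair[1])


# Toy records with three categorical attributes (illustrative values only).
records = [
    ("a", "x", "p"),
    ("a", "x", "p"),
    ("a", "y", "q"),
    ("a", "z", "p"),
]
ranking = rank_attributes_by_entropy(records, [0, 1, 2])
for index, entropy in ranking:
    print(f"attribute {index}: entropy = {entropy:.3f} bits")
```

In this toy data attribute 0 takes a single value, so its entropy is zero, while attribute 1 takes three values and has the highest entropy. A streaming version would update the counts incrementally as records arrive instead of recomputing them from a stored batch.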


