Research on Clustering and Evolution Analysis of High Dimensional Data Stream
Original Chinese title: 高维数据流聚类及其演化分析研究 (cited 9 times)
Abstract: Clustering analysis over data streams has become an active research topic. This paper presents CAStream, a subspace-based algorithm for clustering and evolution analysis over high-dimensional data streams. CAStream partitions the data space into grid cells, maintains approximate summary statistics for each cell, stores snapshots of potentially dense grid cells in an improved pyramid time frame, and finally finds the clusters and analyzes their evolution by depth-first search. CAStream can handle high-dimensional data streams and discover clusters of arbitrary shape. Experiments on real and synthetic datasets demonstrate the applicability and effectiveness of the approach.
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 11, pp. 2005-2011 (7 pages). Indexed in EI, CSCD, and the Peking University core journal list.
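Funding: National Natural Science Foundation of China (70371015); Research Fund for the Doctoral Program of Higher Education of China, Ministry of Education (20040286009).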
Keywords: data stream; clustering analysis; improved pyramid time frame; evolution analysis
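The abstract says CAStream grids the data space and keeps approximate statistics for each grid cell. The following is a minimal sketch of that step under stated assumptions: every dimension shares one range [lo, hi] split into equal-width bins, a subspace is given as a tuple of dimension indices, and the approximation is Lossy Counting (Manku and Motwani's approximate frequency counting), a swapped-in technique since the abstract does not name CAStream's exact scheme. `cell_of`, `LossyCellCounter`, and `dense_cells` are hypothetical names.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def cell_of(point: Sequence[float], dims: Sequence[int],
            lo: float, hi: float, bins: int) -> Tuple[int, ...]:
    """Map a point to its grid cell in the subspace spanned by `dims`.

    Assumption: every dimension shares the range [lo, hi], split into
    `bins` equal-width intervals.
    """
    width = (hi - lo) / bins
    key = []
    for d in dims:
        idx = int((point[d] - lo) // width)
        key.append(min(max(idx, 0), bins - 1))  # clamp out-of-range values
    return tuple(key)


class LossyCellCounter:
    """Approximate per-cell counts via Lossy Counting (an assumed stand-in).

    A stored count underestimates the true count by at most epsilon * n,
    where n is the number of points seen so far.
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0
        self.entries: Dict[Hashable, List[int]] = {}  # key -> [count, max_error]

    def add(self, key: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if key in self.entries:
            self.entries[key][0] += 1
        else:
            # The cell may have been pruned before, so its count can be
            # short by up to (bucket - 1).
            self.entries[key] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At each bucket boundary, drop cells that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def dense_cells(self, support: float) -> Dict[Hashable, int]:
        """Cells whose estimated count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


counter = LossyCellCounter(epsilon=0.1)
for p in [(0.1, 0.1)] * 18 + [(0.9, 0.9)] * 2:
    counter.add(cell_of(p, dims=(0, 1), lo=0.0, hi=1.0, bins=4))
print(counter.dense_cells(support=0.5))  # {(0, 0): 18}
```

Lossy Counting is used here because its memory stays bounded while its error is bounded by epsilon times the stream length, which fits the abstract's goal of keeping only approximate cell statistics over an unbounded stream.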
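The abstract stores snapshots of potentially dense grid cells in an improved pyramid time frame but does not spell out the improvement. The sketch below implements the standard pyramidal time frame introduced with CluStream by Aggarwal et al.; whatever modification CAStream makes is not reproduced here. `PyramidalTimeFrame`, `alpha`, `l`, and `window_counts` are hypothetical names, and treating a snapshot as cumulative per-cell counts that can be subtracted to recover a time window is an assumption.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class PyramidalTimeFrame:
    """Snapshot store following the CluStream-style pyramidal time frame.

    A snapshot taken at clock time t >= 1 has order i, the largest i such
    that alpha**i divides t. Only the latest alpha**l + 1 snapshots of each
    order are kept, so storage grows roughly logarithmically with time while
    recent history stays finely resolved.
    """

    def __init__(self, alpha: int = 2, l: int = 1) -> None:
        if alpha < 2 or l < 1:
            raise ValueError("alpha must be >= 2 and l >= 1")
        self.alpha = alpha
        self.capacity = alpha ** l + 1
        self.orders: Dict[int, Deque[Tuple[int, Any]]] = {}

    def order_of(self, t: int) -> int:
        if t < 1:
            raise ValueError("clock times start at 1")
        i = 0
        while t % (self.alpha ** (i + 1)) == 0:
            i += 1
        return i

    def store(self, t: int, snapshot: Any) -> None:
        order = self.order_of(t)
        # deque(maxlen=...) evicts the oldest snapshot of this order.
        queue = self.orders.setdefault(order, deque(maxlen=self.capacity))
        queue.append((t, snapshot))

    def stored_times(self) -> List[int]:
        return sorted(ts for q in self.orders.values() for ts, _ in q)

    def find(self, t: int) -> Optional[Tuple[int, Any]]:
        """Return the latest stored (time, snapshot) taken at or before t."""
        candidates = [(ts, s) for q in self.orders.values()
                      for ts, s in q if ts <= t]
        return max(candidates, key=lambda item: item[0], default=None)


def window_counts(now: Dict[Any, int], past: Dict[Any, int]) -> Dict[Any, int]:
    """Per-cell counts that arrived between two cumulative snapshots."""
    diff = {k: v - past.get(k, 0) for k, v in now.items()}
    return {k: v for k, v in diff.items() if v > 0}


ptf = PyramidalTimeFrame(alpha=2, l=1)
for t in range(1, 17):
    ptf.store(t, {"t": t})
print(ptf.stored_times())  # [4, 6, 8, 10, 11, 12, 13, 14, 15, 16]
print(ptf.find(9)[0])      # 8
```

With alpha = 2 and l = 1, sixteen clock ticks leave ten snapshots; a query for time 9 falls back to the nearest earlier snapshot (time 8), which is the approximation the pyramidal layout trades for its small footprint.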
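The abstract's last step finds clusters by depth-first search and analyzes how they evolve. A minimal sketch, assuming two dense cells belong to the same cluster when they share a face (their indices differ by one in exactly one dimension), as in grid-based subspace clustering such as CLIQUE. The evolution labels (appeared, disappeared, persisted) and the overlap rule are illustrative assumptions, not the paper's definitions; `dfs_clusters` and `evolution` are hypothetical names.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Cell = Tuple[int, ...]


def neighbors(cell: Cell) -> Iterator[Cell]:
    """Face-adjacent cells: shift one coordinate by -1 or +1."""
    for d in range(len(cell)):
        for step in (-1, 1):
            yield cell[:d] + (cell[d] + step,) + cell[d + 1:]


def dfs_clusters(dense: Set[Cell]) -> List[FrozenSet[Cell]]:
    """Connected components of dense cells, found by iterative DFS."""
    seen: Set[Cell] = set()
    clusters: List[FrozenSet[Cell]] = []
    for start in sorted(dense):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            cell = stack.pop()
            component.append(cell)
            for nb in neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(frozenset(component))
    return clusters


def evolution(old: List[FrozenSet[Cell]],
              new: List[FrozenSet[Cell]]) -> Dict[str, int]:
    """Count coarse evolution events between two clusterings by cell overlap."""
    events = {"appeared": 0, "disappeared": 0, "persisted": 0}
    for c in new:
        if any(c & o for o in old):
            events["persisted"] += 1
        else:
            events["appeared"] += 1
    for o in old:
        if not any(o & c for c in new):
            events["disappeared"] += 1
    return events


now = dfs_clusters({(0, 0), (0, 1), (1, 1), (3, 3), (3, 4)})
before = dfs_clusters({(0, 0), (0, 1), (5, 5)})
print(len(now))                # 2
print(evolution(before, now))  # {'appeared': 1, 'disappeared': 1, 'persisted': 1}
```

Because clusters are unions of adjacent cells rather than centroids, a chain of dense cells can trace out a cluster of arbitrary shape, matching the abstract's claim; the explicit stack avoids Python's recursion limit on long chains.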