
An algorithm for clustering data streams using incremental DFT (cited by: 3)
Abstract: Clustering data streams is an important branch of data stream mining. Because data streams are massive, fast, and arrive dynamically, traditional static data mining techniques cannot meet the requirements of online analysis. The core of data stream clustering is to design a one-pass scan algorithm that incrementally maintains, in limited memory, a small synopsis (digest) of features that is far smaller than the whole data set, so that the stream can be clustered in real time and online. This paper adopts the sliding window model, which is widely used in data stream processing, to handle data changes and updates, and proposes a new data stream synopsis algorithm based on the incremental discrete Fourier transform (DFT); the DFT digest reduces the data and can be maintained incrementally. On top of this synopsis, k-means clustering is used to partition the data and mine the stream online. The resulting algorithm reduces running time and saves main memory, which makes it suitable for online clustering. Experiments on clustering real electricity consumption (load) data verify its effectiveness.
Source: Journal of North China Electric Power University: Natural Science Edition, 2007, No. 5, pp. 85-89 (5 pages). Indexed in CAS; Peking University core journal.
Keywords: data stream; sliding window; incremental DFT; clustering; k-means
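The abstract does not spell out the paper's exact update rule, digest size, or k-means settings. The following is a minimal Python sketch under stated assumptions: the DFT over a sliding window of w values is updated with the standard sliding-DFT recurrence X_k <- e^(2*pi*i*k/w) * (X_k - x_old + x_new), the digest keeps only the first M coefficients, and plain Lloyd's k-means is run on the digests. The names SlidingDFTDigest, kmeans, W, M and the synthetic series are hypothetical illustrations, not taken from the paper, and the data is not the paper's electricity load data.

```python
import cmath
import math
from collections import deque


class SlidingDFTDigest:
    """Hypothetical helper: first `num_coeffs` DFT coefficients of the last `window` values.

    Each update costs O(num_coeffs) instead of a full DFT of the window.
    """

    def __init__(self, window, num_coeffs):
        self.window = window
        self.num_coeffs = num_coeffs
        # The window starts zero-filled, so all-zero coefficients are exact at t = 0.
        self.values = deque([0.0] * window, maxlen=window)
        self.coeffs = [0j] * num_coeffs
        # Twiddle factors e^{+2*pi*i*k/w} used by the sliding-DFT recurrence.
        self.twiddles = [cmath.exp(2j * math.pi * k / window) for k in range(num_coeffs)]

    def update(self, x_new):
        x_old = self.values[0]      # value about to leave the window
        self.values.append(x_new)   # deque with maxlen drops x_old automatically
        for k in range(self.num_coeffs):
            # X_k(t+1) = e^{2*pi*i*k/w} * (X_k(t) - x_old + x_new)
            self.coeffs[k] = self.twiddles[k] * (self.coeffs[k] - x_old + x_new)

    def feature_vector(self):
        # Real and imaginary parts, so digests can be compared with Euclidean distance.
        out = []
        for c in self.coeffs:
            out.extend((c.real, c.imag))
        return out


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; deterministic initialization with the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Synthetic example: two flat series and two series with one cycle per window.
W, M = 24, 3
series = [
    [1.0] * 48,
    [1.0 + 5.0 * math.sin(2 * math.pi * n / W) for n in range(48)],
    [1.2] * 48,
    [1.2 + 5.5 * math.sin(2 * math.pi * n / W) for n in range(48)],
]
digests = []
for s in series:
    d = SlidingDFTDigest(W, M)
    for x in s:
        d.update(x)
    digests.append(d)

labels, centers = kmeans([d.feature_vector() for d in digests], k=2)
print(labels)  # prints [0, 1, 0, 1]
```

Keeping only a few low-order coefficients is what makes the digest small: memory per stream is O(M) coefficients plus the raw window needed to know which value leaves it. Because the recurrence accumulates floating-point error over long streams, a practical version would likely recompute the exact DFT of the window from time to time.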


