An Algorithm for Clustering Data Streams Using Incremental DFT (original title: 基于增量DFT概要的数据流聚类算法)

Cited by: 3

Abstract: Clustering is an important branch of data stream mining. Because data streams arrive in massive volume, at high speed, and dynamically, traditional static data mining techniques cannot meet the requirements of online analysis. The core of data stream clustering is to design a single-pass algorithm that keeps only a small synopsis (digest) of the data in limited memory, far smaller than the data set itself, so that the stream can be clustered online in real time. Using the sliding window model widely applied in data stream processing, this paper proposes a new data stream synopsis algorithm based on the incremental discrete Fourier transform (DFT) and applies k-means clustering to the synopses to mine the stream online. In this algorithm, k-means partitions the data into subsets, the sliding window handles data change and update, and the DFT synopsis reduces the data and can be maintained incrementally. The algorithm reduces running time and saves memory, which makes it suitable for online clustering. Experiments on real electrical load data verify its effectiveness.
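The pipeline the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the update formula, so the sketch uses the standard sliding-DFT recurrence X'_m = (X_m - x_old + x_new) · e^{j2πm/w} for a window of length w. The names `SlidingDFT`, `direct_dft`, `kmeans`, and `demo`, the window length, the number of retained coefficients, and the synthetic streams are all hypothetical choices made for the example.

```python
import cmath
import math
import random
from collections import deque


class SlidingDFT:
    """First n_coeffs DFT coefficients of the most recent `window` values."""

    def __init__(self, window, n_coeffs):
        self.window = window
        self.buf = deque([0.0] * window, maxlen=window)  # starts as all zeros
        self.coeffs = [0j] * n_coeffs                     # DFT of all zeros
        # Rotation e^{j*2*pi*m/w} applied to coefficient m on every slide.
        self.twiddle = [cmath.exp(2j * math.pi * m / window)
                        for m in range(n_coeffs)]

    def update(self, x_new):
        """Slide the window by one value in O(n_coeffs) time."""
        delta = x_new - self.buf[0]   # newest value minus the one leaving
        self.buf.append(x_new)        # the deque drops the oldest value
        self.coeffs = [(c + delta) * t
                       for c, t in zip(self.coeffs, self.twiddle)]

    def features(self):
        """Real-valued synopsis: (re, im) of each coefficient, scaled by 1/w."""
        return [part / self.window
                for c in self.coeffs for part in (c.real, c.imag)]


def direct_dft(values, n_coeffs):
    """Reference O(w * n_coeffs) DFT, used only to check the recurrence."""
    w = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * m * n / w)
                for n, v in enumerate(values))
            for m in range(n_coeffs)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def demo():
    """Cluster six synthetic streams (two shapes) by their DFT synopses."""
    window, n_coeffs = 32, 5
    # Each stream is (level, period, amplitude); values are made up.
    specs = [(1.0, 16, 1.0), (1.0, 16, 1.1), (1.0, 16, 1.2),
             (3.0, 8, 1.0), (3.0, 8, 1.1), (3.0, 8, 1.2)]
    synopses = [SlidingDFT(window, n_coeffs) for _ in specs]
    for t in range(80):  # single pass over the data
        for s, (level, period, amp) in zip(synopses, specs):
            s.update(level + amp * math.sin(2 * math.pi * t / period))
    return synopses, kmeans([s.features() for s in synopses], k=2)


if __name__ == "__main__":
    _, labels = demo()
    print(labels)
```

Each update touches only the retained coefficients, so the per-value cost does not depend on the window length, and k-means runs on the short feature vectors rather than on the raw windows. Rounding error accumulates in the recurrence over long runs, so a practical version would probably recompute the coefficients from the buffer periodically.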

Authors: Kong Yinghui (孔英会), An Jing (安静), Che Linlin (车辚辚), Liu Yunfeng (刘云峰)

Affiliation: School of Electrical and Electronic Engineering, North China Electric Power University

Source: Journal of North China Electric Power University (Natural Science Edition), 2007, No. 5, pp. 85-89 (5 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).

Keywords: data stream; sliding window; incremental DFT; clustering; k-means

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]



