
Design and Implementation of the DPFP-Stream Algorithm for Stream Data (cited by: 1)

Realization and Implementation of Distributed Parallel Mining of Frequent Patterns for Data Streams
Abstract: Finding frequent patterns in massive data has long been a focus of data mining research, with wide application in retail market data analysis, network monitoring, web usage mining, and stock market prediction. Although many frequent pattern mining algorithms for static datasets have been proposed over the past decade, a data stream is a continuous, unbounded, ordered sequence of elements generated at high speed, and the knowledge embedded in it is likely to change over time. Frequent pattern mining over data streams therefore has to differ from mining over static datasets. To better analyze online data streams by extracting frequent patterns at multiple time granularities and monitoring how those patterns change, the paper proposes DPFP-stream, a frequent pattern mining algorithm for data streams that builds on the efficient FP-tree structure and borrows the ideas of tilted-time windows and MapReduce. The algorithm is implemented on the Storm platform, uses Kafka as its data source, and stores intermediate results in the in-memory database Redis. Extensive experiments show that the algorithm finds frequent patterns in high-speed data streams efficiently and with stable performance. In real-time computation over massive data, it can both keep up with high-speed streams and monitor how frequent patterns change across time granularities.
Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 7, pp. 29-33 (5 pages)
Funding: National Natural Science Foundation of China (61302158, 61571238)
Keywords: DPFP-stream; MapReduce; Storm; Redis
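
The abstract names tilted-time windows as the mechanism for tracking frequent patterns at several time granularities, but this record does not give the window layout. The sketch below illustrates one common variant, a logarithmic tilted-time window, for the support counts of a single pattern. The class name `TiltedTimeWindow`, the `slots_per_level` parameter, and the merge rule are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


class TiltedTimeWindow:
    """Support counts of one pattern, kept at progressively coarser granularity.

    Hypothetical sketch: levels[i] holds counts that each cover 2**i batches,
    newest first. Recent batches stay fine-grained, older ones are merged.
    """

    def __init__(self, slots_per_level=2):
        self.slots_per_level = slots_per_level
        self.levels = []

    def add(self, count):
        """Record the pattern's support count for the newest batch."""
        carry = count
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append(deque())
            window = self.levels[level]
            window.appendleft(carry)
            carry = None
            if len(window) > self.slots_per_level:
                # Level overflowed: merge its two oldest counts into one
                # coarser count and push that to the next level.
                oldest = window.pop()
                second_oldest = window.pop()
                carry = oldest + second_oldest
            level += 1

    def total(self):
        """Total support over every batch seen so far."""
        return sum(sum(window) for window in self.levels)

    def snapshot(self):
        """Counts per level, finest level first, newest count first."""
        return [list(window) for window in self.levels]
```

Feeding per-batch counts into `add` keeps the number of stored values logarithmic in the number of batches, while `total` still returns the exact overall support. In a deployment like the one the abstract describes, such a structure could be held per pattern in the intermediate store, though how the paper lays this out is not stated here.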

