
Frequent Itemset Mining over Time-Sensitive Streams (cited by: 29)
Abstract: The distribution of data in a stream changes dynamically over time, which the traditional transaction-based sliding window model cannot reflect, so its mining results are inaccurate. This paper first identifies the problems in processing time-sensitive data streams, then builds a timestamp-based sliding window model and converts it into a transaction-based variable sliding window for processing; on this model, a frequent itemset mining algorithm named FIMoTS is proposed. The algorithm introduces the concept of a type transforming bound to classify itemsets dynamically, and defers the processing of itemsets as the window size changes: an itemset's support is recomputed only when its type transforming bound exceeds a given threshold, which prunes the computation. Experimental results on four datasets of different densities show that the algorithm significantly outperforms the Naive method in computational efficiency while keeping memory overhead essentially unchanged.
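The window model in the abstract can be illustrated with a minimal Python sketch. It shows only how a fixed time span yields a variable number of transactions, together with a naive recount of supports of the kind the Naive baseline presumably performs. The class name `TimestampWindow`, the `span` and `max_len` parameters, and the sample stream are all hypothetical; the type transforming bound of FIMoTS is not defined in the abstract and is not reproduced here.

```python
from collections import deque
from itertools import combinations


class TimestampWindow:
    """Keep only the transactions whose timestamp lies in (now - span, now]."""

    def __init__(self, span):
        self.span = span      # window length in time units (assumed parameter)
        self.items = deque()  # (timestamp, frozenset) pairs, oldest first

    def add(self, timestamp, transaction):
        """Append a transaction, then evict everything that has expired."""
        self.items.append((timestamp, frozenset(transaction)))
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def size(self):
        """Number of transactions covered; it varies with the arrival rate."""
        return len(self.items)

    def frequent_itemsets(self, min_support, max_len=2):
        """Naive recount: rescan every transaction currently in the window."""
        counts = {}
        for _, transaction in self.items:
            for k in range(1, max_len + 1):
                for itemset in combinations(sorted(transaction), k):
                    counts[itemset] = counts.get(itemset, 0) + 1
        threshold = min_support * self.size()
        return {s: c for s, c in counts.items() if c >= threshold}


# Hypothetical stream: three transactions arrive close together, then a gap.
window = TimestampWindow(span=10)
sizes = []
for ts, tx in [(1, "ab"), (2, "abc"), (3, "ac"), (12, "ab"), (13, "b")]:
    window.add(ts, tx)
    sizes.append(window.size())

print(sizes)                          # window size after each arrival
print(window.frequent_itemsets(0.6))  # itemsets with support >= 60%
```

Because the time span is fixed while the arrival rate is not, the number of transactions in the window shrinks after the gap. A naive method must rescan the window on every such change, and this is the cost that deferred processing aims to avoid.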
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 11, pp. 2283-2293 (11 pages)
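Funding: National Natural Science Foundation of China (61100112); Ministry of Education Humanities and Social Sciences Research Youth Fund (11YJCZH006); Beijing Natural Science Foundation (9092014, 4112053); Central University of Finance and Economics Research Innovation Team Support Program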
Keywords: frequent itemsets; data stream; time-sensitive; sliding window; data mining
