期刊文献+

基于滑动窗口的数据流闭合频繁模式的挖掘 被引量:26

Mining Frequent Closed Patterns from a Sliding Window over Data Streams
下载PDF
导出
摘要 频繁闭合模式集惟一确定频繁模式完全集并且数量小得多,然而,如何挖掘滑动窗口中的频繁闭合模式集是一个很大的挑战.根据数据流的特点,提出了一种发现滑动窗口中频繁闭合模式的新方法DSCFI.DSCFI算法将滑动窗口分割为若干个基本窗口,以基本窗口为更新单位,利用已有的频繁闭合模式挖掘算法计算每个基本窗口的潜在频繁闭合项集,将它们及其子集存储到一种新的数据结构DSCFItree中,DSCFItree能够增量更新,利用DSCFItree可以快速地挖掘滑动窗口中的所有频繁闭合模式.最后,通过实验验证了这种方法的有效性. The set of frequent closed patterns determines exactly the complete set of all frequent patterns and is usually much smaller than the latter. But how to mine frequent closed patterns from a sliding window is a very big challenge. According to the features of data streams, a new algorithm, call DS_CFI, is proposed to solve the problem of mining the frequent closed itemsets. A sliding window is divided into several basic windows and the basic window is served as an updating unit. Latency frequent closed itemsets of every basic window are mined by the existing frequent closed pattern algorithms. Those itemsets and their subset are stored in a new data structure called DSCFI_tree. The DSCFI_tree can be incrementally updated and the frequent closed itemsets in a sliding window can be rapidly found based on DSCFI_tree. The experimental results show the feasibility and effectiveness of the algorithm.
出处 《计算机研究与发展》 EI CSCD 北大核心 2006年第10期1738-1743,共6页 Journal of Computer Research and Development
基金 江苏省高技术基金项目(BG2004034) 江苏省2004年度研究生创新计划基金项目(xm04-36)~~
关键词 数据流 闭合频繁项集 滑动窗口 关联规则 知识发现 data stream frequent closed item sliding window association rule knowledge discovery
  • 相关文献

参考文献13

  • 1C Giannella,J Han,J Pei,et al.Mining frequent patterns in data streams at multiple time granularities[G].In:H Kargupta,A Joshi,K Sivakumar,et al,eds.Next Generation Data Mining.Cambridge,Mass:MIT Press,2003
  • 2G S Manku,R Motwani.Approximate frequency counts over streaming data[C].The 28th Int'l Conference on Very Large Data Bases (VLDB 2002),Hong Kong,2002
  • 3Song Guojie,Wang Tengjiao,Tang Shiwei,et al.Estimation and maintenance of frequent pattern in data streams[C].National Data Base Conference 2003,Changsha,2003
  • 4Joong Hyuk Chang,Won Suk Lee.Finding recent frequent itemsets adaptively over online data streams[C].The 9th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining (KDD 03),Washington,DC,2003
  • 5Wei Guang Teng,Ming-Syan Chen,Philip S Yu.A regression-based temporal pattern mining scheme for data streams[C].The Int'l Conf on Very Large Data Bases(VLDB 2003),Berlin,Germany,2003
  • 6Graham Cormode,Flip Korn,S Muthukrishnan,et al.Finding hierarchical heavy hitters in data streams[C].The Int'l Conf on Very Large Data Bases (VLDB 2003),Berlin,Germany,2003
  • 7Graham Cormode,S Muthukrishnan.What's hot and What's not:Tracking most frequent items dynamically[C].The ACM Symp on Principles of Database Systems (PODS 2003),San Diego,CA,USA,2003
  • 8C Sirish,M J Franklin.Streaming queries over streaming data[C].The 28th Int'l Conf on Very Large Data Bases,Hong Kong,2002
  • 9N Pasquier,Y Bastide,R Taouil,et al.Discovering frequent closed itemsets for association rules[C].In:Beeri C,et al,eds.Proc of the 7th Int'l Conf on Database Theory.Berlin:Springer-Verlag,1999.398-416
  • 10J Pei,J Han,R Mao.CLOSET:An efficient algorithm for mining frequent closed itemsets[C].In:D Gunopulos,et al,eds.Proc of the 2000 ACM SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery.New York:ACM Press,2000.21-30

二级参考文献8

  • 1[1]Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In: Beeri C, et al, eds. Proc. of the 7th Int'l. Conf. on Database Theory. Jerusalem: Springer-Verlag, 1999. 398~416.
  • 2[2]Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Beeri C, et al, eds. Proc. of the 20th Int'l. Conf. on Very Large Databases. Santiago: Morgan Kaufmann Publishers, 1994. 487~499.
  • 3[3]Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In: Gunopulos D, et al, eds. Proc. of the 2000 ACM SIGMOD Int'l. Workshop on Data Mining and Knowledge Discovery. Dallas: ACM Press, 2000. 21~30.
  • 4[4]Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Georgakopoulos D, et al, eds. Proc. of the 17th Int'l. Conf. on Data Engineering. Heidelberg: IEEE Press, 2001. 443~452.
  • 5[5]Zaki MJ, Hsiao CJ. CHARM: An efficient algorithm for closed itemset mining. In: Grossman R, et al, eds. Proc. of the 2nd SIAM Int'l. Conf. on Data Mining. Arlington: SIAM, 2002. 12~28.
  • 6[6]Liu JQ, Pan YH, Wang K, Han J. Mining frequent item sets by opportunistic projection. In: Hand D, et al, eds. Proc. of the 8th ACM SIGKDD Int'l. Conf. on Knowledge Discovery and Data Mining. Alberta: ACM Press, 2002. 229~238.
  • 7[7]Srikant R. Quest synthetic data generation code. San Jose: IBM Almaden Research Center, 1994. http://www.almaden.ibm.com/ software/quest/Resources/index.shtml
  • 8[8]Blake C, Merz C. UCI Repository of machine learning. Irvine: University of California, Department of Information and Computer Science, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html

共引文献18

同被引文献368

引证文献26

二级引证文献199

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部