Abstract
The data distribution of a data stream changes dynamically over time, but the traditional transaction-based sliding window model cannot reflect this characteristic, so its mining results are inaccurate. This paper first identifies the problems that arise in time-sensitive data stream processing. It then builds a timestamp-based sliding window model, which is converted into a transaction-based variable-size sliding window for processing, and on this model proposes a frequent itemset mining algorithm named FIMoTS. The algorithm introduces the concept of a type transforming bound to classify itemsets dynamically, and defers the processing of itemsets according to changes in the window size: an itemset's support is recomputed only when its type transforming bound exceeds a given threshold, which prunes unnecessary computation. Experimental results on four datasets of different densities show that the algorithm significantly outperforms the Naive method in computational efficiency while keeping memory overhead essentially unchanged.
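The abstract turns on one observation: in a timestamp-based window the number of transactions varies, so the absolute support threshold moves even when an itemset's own count does not. The sketch below illustrates only a naive recount baseline under that model. It is not FIMoTS and does not implement type transforming bounds. The class name `TimeSlidingWindow`, the parameters `span`, `min_sup`, and `max_len`, and the choice to count every itemset up to `max_len` items are all hypothetical.

```python
import math
from collections import Counter, deque
from fractions import Fraction
from itertools import combinations


class TimeSlidingWindow:
    """Hypothetical naive baseline (not FIMoTS): keeps every transaction whose
    timestamp lies in (now - span, now] and counts all itemsets up to max_len."""

    def __init__(self, span, min_sup, max_len=2):
        self.span = span                        # window length in time units (assumed)
        self.min_sup = Fraction(str(min_sup))   # exact relative support, avoids float rounding
        self.max_len = max_len                  # largest itemset size tracked (assumed)
        self.window = deque()                   # (timestamp, frozenset of items), oldest first
        self.counts = Counter()                 # itemset tuple -> count in current window

    def _itemsets(self, items):
        # Enumerate all non-empty subsets of size <= max_len, items sorted.
        ordered = sorted(items)
        for k in range(1, min(self.max_len, len(ordered)) + 1):
            yield from combinations(ordered, k)

    def add(self, timestamp, items):
        items = frozenset(items)
        # Evict transactions that fell out of the time window. Because eviction
        # depends on timestamps, the transaction count in the window varies.
        while self.window and self.window[0][0] <= timestamp - self.span:
            _, old = self.window.popleft()
            for s in self._itemsets(old):
                self.counts[s] -= 1
                if self.counts[s] == 0:
                    del self.counts[s]
        self.window.append((timestamp, items))
        for s in self._itemsets(items):
            self.counts[s] += 1

    def frequent(self):
        # The absolute threshold is recomputed from the current window size,
        # so an itemset's frequent/infrequent status can change even when its
        # own count is unchanged. This baseline rescans every counted itemset.
        threshold = math.ceil(self.min_sup * len(self.window))
        return {s: c for s, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    w = TimeSlidingWindow(span=10, min_sup=0.5)
    w.add(1, {"a", "b"})
    w.add(2, {"a", "c"})
    w.add(5, {"b", "c"})
    print(w.frequent())   # 3 transactions in window, threshold 2
    w.add(12, {"a"})      # evicts timestamps 1 and 2
    print(w.frequent())   # 2 transactions in window, threshold 1
```

In the usage above, the pair `("b", "c")` keeps a count of 1 throughout but becomes frequent only after the window shrinks from three transactions to two. A full rescan on every window change is exactly the cost that the deferred, bound-based processing described in the abstract is meant to avoid.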
Source
Chinese Journal of Computers
EI
CSCD
Peking University Core Journal
2012, No. 11, pp. 2283-2293 (11 pages)
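Funding
National Natural Science Foundation of China (61100112)
Youth Fund for Humanities and Social Sciences Research of the Ministry of Education of China (11YJCZH006)
Beijing Natural Science Foundation (9092014, 4112053)
Research Innovation Team Support Program of Central University of Finance and Economics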
Keywords
frequent itemsets
data stream
time-sensitive
sliding window
data mining