Abstract
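A time-window-based data preprocessing algorithm is proposed. Oriented toward specific applications and drawing on existing knowledge, the algorithm can intelligently filter out some "noisy" data. Unlike the usual definition, "noisy" data here means data that is deterministically governed by known rules; studies show that such data can severely interfere with subsequent data mining. Practical test results indicate that the algorithm improves the results of some existing data mining algorithms.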
A time-window-based data preprocessing algorithm is proposed. Being application oriented, the algorithm can intelligently filter out 'noisy' data, that is, data determined by currently known rules, which may hinder the mining of new rules from the database. When the Apriori algorithm is applied to data preprocessed in this way, the authors obtain frequent itemsets closer to their target. Furthermore, when TW_SP, a multi-dimensional sequential pattern mining algorithm proposed by other researchers in 2001, is applied to the preprocessed data, the resulting sequential patterns prove to be clearer.
Source
《小型微型计算机系统》
CSCD
PKU Core Journal
2004, Issue 1, pp. 89-92 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (6983 5 0 0)
Keywords
data preprocessing
data mining
sequential patterns