摘要
序列模式挖掘是从给定序列中发现出现频率高的模式的一种方法,目前已在诸多领域被广泛应用.假定子模式p_i和p_j(i<j)可以分别匹配事件A和事件B,传统的序列模式挖掘方法能够对事件B在事件A之后的序列进行检测,而不能对事件B发生在事件A之前的序列进行识别.为了解决此问题,文中提出了周期性一般间隙约束的序列模式挖掘问题,该问题具有如下5个特点:间隙约束的最小值可为负值的一般间隙约束;每个间隙约束都相同的周期性模式;在支持数统计方面无特殊约束,即允许序列中事件多次使用;该挖掘问题满足Apriori性质;挖掘支持率大于给定的频繁度阈值的频繁模式.为了进行有效地挖掘,采用深度优先的方式建立模式树.文中采用模式匹配技术,在一遍扫描序列数据库的情况下,建立其所有超模式的不完整网树森林(不完整网树是网树的最后一层结点,可以存储在一个数组中,可以有效地表示一个模式在一个序列中的支持数),并对这些超模式的支持率进行有效地计算,进而挖掘出所有频繁模式,有效地提高了序列模式挖掘速度.实验结果验证了文中算法的可行性和有效性.
Sequential pattern plays an essential role in many Pi and Pj (i〈j) can match even the sequences in which event B mining is to discover the frequent patterns in the sequences and critical data mining tasks with broad applications. Given sub-patterns ts A and B respectively, traditional pattern mining methods can detect is after event A, but fail to find the sequences with event B occurring before event A. To tackle this challenge, in this paper, we propose sequential pattern mining with periodic general gap constraints with five characteristics as follows. The minimal gap constraint, namely general gap constraint, can be a negative value. All gap constraints of the pattern are the same. Any event in the sequence can be used more than once in different supports. The problem satisfies the Apriori property under the new definition of offset sequences. If the support ratio of a pattern is greater than the given threshold, the pattern is a frequent pattern. To solve the problem effectively, a depth-first search strategy is used to create the pattern tree. An Incom- plete Nettree structure, the last layer of a Nettree which can be stored in an array, can represent the support of a pattern in the sequence. Pattern matching method is used to create the Incomplete Nettrees for all super patterns of a frequent pattern by one way scan over the sequence database. Therefore, we can calculate the support ratios of these super patterns and find the frequent ones. Experimental results validate the feasibility and effectiveness of the proposed algorithms.
作者
武优西
周坤
刘靖宇
江贺
吴信东
WU You-Xi ZHOU Kun LIU Jing-Yu JIANG He WU Xin-Donga(School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401 School of Software, Dalian University of Technology, Dalian, Liaoning 116621 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009 Department of Computer Science, University of Vermont, Burlington 05405, USA)
出处
《计算机学报》
EI
CSCD
北大核心
2017年第6期1338-1352,共15页
Chinese Journal of Computers
基金
国家自然科学基金(61229301)
教育部创新团队项目(IRT13059)
河北省自然科学基金(F2013202138)
河北省教育厅重点项目(ZD2014009)
河北省教育厅青年基金(QN2014192)资助~~