期刊文献+

周期性一般间隙约束的序列模式挖掘 被引量:12

Mining Sequential Patterns with Periodic General Gap Constraints
下载PDF
导出
摘要 序列模式挖掘是从给定序列中发现出现频率高的模式的一种方法,目前已在诸多领域被广泛应用.假定子模式p_i和p_j(i<j)可以分别匹配事件A和事件B,传统的序列模式挖掘方法能够对事件B在事件A之后的序列进行检测,而不能对事件B发生在事件A之前的序列进行识别.为了解决此问题,文中提出了周期性一般间隙约束的序列模式挖掘问题,该问题具有如下5个特点:间隙约束的最小值可为负值的一般间隙约束;每个间隙约束都相同的周期性模式;在支持数统计方面无特殊约束,即允许序列中事件多次使用;该挖掘问题满足Apriori性质;挖掘支持率大于给定的频繁度阈值的频繁模式.为了进行有效地挖掘,采用深度优先的方式建立模式树.文中采用模式匹配技术,在一遍扫描序列数据库的情况下,建立其所有超模式的不完整网树森林(不完整网树是网树的最后一层结点,可以存储在一个数组中,可以有效地表示一个模式在一个序列中的支持数),并对这些超模式的支持率进行有效地计算,进而挖掘出所有频繁模式,有效地提高了序列模式挖掘速度.实验结果验证了文中算法的可行性和有效性. Sequential pattern plays an essential role in many Pi and Pj (i〈j) can match even the sequences in which event B mining is to discover the frequent patterns in the sequences and critical data mining tasks with broad applications. Given sub-patterns ts A and B respectively, traditional pattern mining methods can detect is after event A, but fail to find the sequences with event B occurring before event A. To tackle this challenge, in this paper, we propose sequential pattern mining with periodic general gap constraints with five characteristics as follows. The minimal gap constraint, namely general gap constraint, can be a negative value. All gap constraints of the pattern are the same. Any event in the sequence can be used more than once in different supports. The problem satisfies the Apriori property under the new definition of offset sequences. If the support ratio of a pattern is greater than the given threshold, the pattern is a frequent pattern. To solve the problem effectively, a depth-first search strategy is used to create the pattern tree. An Incom- plete Nettree structure, the last layer of a Nettree which can be stored in an array, can represent the support of a pattern in the sequence. Pattern matching method is used to create the Incomplete Nettrees for all super patterns of a frequent pattern by one way scan over the sequence database. Therefore, we can calculate the support ratios of these super patterns and find the frequent ones. Experimental results validate the feasibility and effectiveness of the proposed algorithms.
作者 武优西 周坤 刘靖宇 江贺 吴信东 WU You-Xi ZHOU Kun LIU Jing-Yu JIANG He WU Xin-Donga(School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401 School of Software, Dalian University of Technology, Dalian, Liaoning 116621 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009 Department of Computer Science, University of Vermont, Burlington 05405, USA)
出处 《计算机学报》 EI CSCD 北大核心 2017年第6期1338-1352,共15页 Chinese Journal of Computers
基金 国家自然科学基金(61229301) 教育部创新团队项目(IRT13059) 河北省自然科学基金(F2013202138) 河北省教育厅重点项目(ZD2014009) 河北省教育厅青年基金(QN2014192)资助~~
关键词 序列模式挖掘 一般间隙 频繁模式 模式匹配 APRIORI性质 人工智能 sequential pattern mining general gap frequent pattern pattern matching Apriori property artificial intelligence
  • 相关文献

参考文献3

二级参考文献15

  • 1邹翔,张巍,刘洋,蔡庆生.分布式序列模式发现算法的研究[J].软件学报,2005,16(7):1262-1269. 被引量:19
  • 2Fischer M J, Paterson M S. String Matching and Other Products// Karp R M, ed. Complexity of Computation. Cambridge, USA. MIT Press, 1974:113-125.
  • 3Zhang Minghua, Kao B, Cheung D W, et al. Mining Periodic Pat- terns with Gap Requirement from Sequences // Proc of the ACM SIGMOD International Conference on Special Interest Group on Man- agement of Data. Baltimore, USA, 2005 : 623-633.
  • 4Cole J R, Chai B, Marsh T L, et al. The Ribosomal Database Pro- ject(RDP-II) : Sequences and Tools for High-Throughput rRNA Analysis. Nucleic Acids Research, 2005, 33 ( zl ) : 294-296.
  • 5He Dan, Wu Xindong, Zhu Xingquan. SAIL-APPROX: An Effi- cient On-line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints// Proc of the IEEE International Conference on Bioinformatics and Biomedicine. Silicon Valley, USA, 2007:151-158.
  • 6Xie Fei, Wu Xindong, Hu Xuegang, et al. Sequential Pattern Min- ing with Wildcards// Proc of the 22nd IEEE International Confer-ence on Tools with Artificial Intelligence. Arras, France, 2010: 241-247.
  • 7Ji Xiaonan, Bailey J, Dong Guozhu. Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints. Knowledge and Infor- mation Systems, 2007, 11 (3) : 259-286.
  • 8He Yu, Wu Xindong, Zhu Xingquan, et al. Mining Frequent Pat- terns with Wildcards from Biological Sequences//Proc of the IEEE International Conference on Information Reuse and Integration. Las Vegas, USA, 2007:329-334.
  • 9Califf M E, Mooney R J. Bottom-up Relational Leaming of Pattern Matching Rules for Information Extraction. Journal of Machine Learning Research, 2003, 4(6) : 177-210.
  • 10Manber U, Baeza-Yates R. An Algorithm for String Matching with a Sequence of Don' t Cares. Information Processing Letters, 1991, 37 (3) : 133-136.

共引文献32

同被引文献76

引证文献12

二级引证文献35

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部