Abstract
When a match fails, the Aho-Corasick automaton may have to backtrack several times before it reaches a valid successor state. To address this, this paper proposes a fast multi-pattern matching algorithm based on the Aho-Corasick automaton. The algorithm builds a mismatch successor pointer for every state. On a mismatch, the matcher follows this pointer directly to the valid successor state, which avoids the repeated backtracking of the Aho-Corasick automaton and improves matching efficiency. While the automaton is being built, the algorithm uses dynamic programming to record information for each state, such as the matching length and the number of matches. During matching, it uses this information to compute statistics such as how many times each pattern occurs in the text and the position of its earliest occurrence. Experimental results show that the algorithm is accurate and efficient, and that it supports online operation.
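The abstract does not give code. The sketch below is a minimal Python illustration of the ideas it describes, and it rests on one assumption: that the "mismatch successor pointers" amount to precomputing, for every state and character, the state that the classical fail-link walk would eventually reach. This is the standard deterministic Aho-Corasick construction, and the paper's actual data structures may differ. The names `build_automaton`, `scan`, `count` and `outputs` are hypothetical. The sketch also assumes that patterns are non-empty and that `alphabet` covers every character in the patterns and the text.

```python
from collections import deque


def build_automaton(patterns, alphabet):
    """Build a trie, then precompute a full transition table `delta` so that a
    mismatch jumps straight to the valid successor state (no fail-link walk at
    match time). Assumes non-empty patterns drawn from `alphabet`."""
    goto, ends = [{}], [[]]          # trie edges; pattern indices ending at each state
    for idx, p in enumerate(patterns):
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                ends.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        ends[s].append(idx)

    n = len(goto)
    fail = [0] * n
    delta = [{} for _ in range(n)]
    count = [0] * n                  # DP: number of pattern matches recognised at a state
    outputs = [[] for _ in range(n)] # which patterns are recognised at a state
    queue = deque([0])
    while queue:                     # BFS: fail[s] is shallower, so it is already final
        s = queue.popleft()
        count[s] = len(ends[s]) + (count[fail[s]] if s else 0)
        outputs[s] = ends[s] + (outputs[fail[s]] if s else [])
        for ch in alphabet:
            t = goto[s].get(ch)
            if t is not None:
                fail[t] = delta[fail[s]][ch] if s else 0
                delta[s][ch] = t
                queue.append(t)
            else:
                # mismatch successor: reuse the already-computed target of the fail state
                delta[s][ch] = delta[fail[s]][ch] if s else 0
    return delta, count, outputs


def scan(text, patterns, alphabet):
    """Return (total matches, per-pattern occurrence counts, earliest start index or -1)."""
    delta, count, outputs = build_automaton(patterns, alphabet)
    freq = [0] * len(patterns)
    first = [-1] * len(patterns)
    total, s = 0, 0
    for i, ch in enumerate(text):
        s = delta[s][ch]             # exactly one table lookup per character
        total += count[s]
        for idx in outputs[s]:
            freq[idx] += 1
            if first[idx] < 0:
                first[idx] = i - len(patterns[idx]) + 1
    return total, freq, first
```

For example, `scan("ushers", ["he", "she", "his", "hers"], set("ushersi"))` returns `(3, [1, 1, 0, 1], [2, 1, -1, 2])`, so overlapping occurrences are all counted. In this sketch the per-state count is filled in by the recurrence `count[s] = len(ends[s]) + count[fail[s]]`, which is one plausible reading of the dynamic-programming step mentioned above.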
Source
Computer Engineering (计算机工程), 2012, No. 11, pp. 173-176 (4 pages)
Indexed in: CAS, CSCD
Funding
National Natural Science Foundation of China (61170108, 6110019)
Zhejiang Provincial Xinmiao Talents Program (2011R404018)