Optimizing of large-number-patterns string matching algorithms based on definite-state automata (cited by: 3)

Abstract: Because the CPU cache is small, the scanning speed of DFA-based multi-pattern string-matching algorithms drops rapidly, especially when the number of patterns is very large. To address this, we reduce the scanning time of such algorithms by rearranging the state table and shrinking the DFA alphabet. Both methods lower the probability of large-scale random memory access and raise the probability of contiguous memory access, so the cache hit rate increases and the search time on the DFA decreases. Shrinking the DFA alphabet also reduces the storage complexity. The AC++ algorithm, obtained by applying these methods to the Aho-Corasick (AC) algorithm, bears out the theoretical analysis. Experimental results show that AC++ scans faster and occupies less storage than AC in most cases, and the advantage is most pronounced when the number of patterns is very large. Because the DFA is a widely used building block of many string-matching algorithms, such as DAWG and SBOM, the optimization method discussed is of practical significance.
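The abstract names two techniques without giving details, so the following is only a minimal sketch of one plausible reading, not the authors' AC++ implementation. It builds an Aho-Corasick DFA in which (a) every byte that occurs in no pattern shares a single table column, and (b) states are renumbered in breadth-first order so that shallow states sit next to each other in one flat table. The function names `build_compact_dfa` and `scan`, the breadth-first renumbering, and the single shared column are all assumptions made for illustration. Patterns are assumed to be non-empty byte strings.

```python
from collections import deque


def build_compact_dfa(patterns):
    """Aho-Corasick DFA with a reduced alphabet and BFS-ordered states.

    Illustrative assumptions (not taken from the paper): bytes absent
    from every pattern share column 0, and states are renumbered in
    breadth-first order before the table is flattened.
    """
    # 1. Alphabet reduction: map each byte value to a column index.
    used = sorted({b for p in patterns for b in p})
    col = [0] * 256
    for i, b in enumerate(used, start=1):
        col[b] = i
    width = len(used) + 1

    # 2. Build the trie (goto function) with temporary state ids.
    goto = [{}]      # goto[state][column] -> state
    out = [set()]    # indices of patterns recognised in each state
    for idx, p in enumerate(patterns):
        s = 0
        for b in p:
            c = col[b]
            if c not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(idx)

    # 3. Breadth-first pass: failure links, full transitions, BFS order.
    n = len(goto)
    fail = [0] * n
    delta = [[0] * width for _ in range(n)]
    order = [0]
    queue = deque()
    for c, t in goto[0].items():
        delta[0][c] = t
        queue.append(t)
    while queue:
        s = queue.popleft()
        order.append(s)
        out[s] |= out[fail[s]]
        for c in range(width):
            t = goto[s].get(c)
            if t is None:
                delta[s][c] = delta[fail[s]][c]
            else:
                fail[t] = delta[fail[s]][c]
                delta[s][c] = t
                queue.append(t)

    # 4. Renumber states in BFS order and flatten into one table.
    new_id = [0] * n
    for new, old in enumerate(order):
        new_id[old] = new
    table = [0] * (n * width)
    outputs = [()] * n
    for old in range(n):
        base = new_id[old] * width
        for c in range(width):
            table[base + c] = new_id[delta[old][c]]
        outputs[new_id[old]] = tuple(sorted(out[old]))
    return col, width, table, outputs


def scan(text, dfa, patterns):
    """Return (start_offset, pattern_index) for every match in text."""
    col, width, table, outputs = dfa
    s = 0
    hits = []
    for pos, b in enumerate(text):
        s = table[s * width + col[b]]   # one table lookup per input byte
        for idx in outputs[s]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return hits
```

For the patterns `he`, `she`, `his`, `hers`, this sketch needs 6 columns instead of 256, so the 10-state table holds 60 entries rather than 2560. A smaller table of this kind is what makes it more likely that consecutive lookups fall in memory that is already cached. Python itself will not show the cache effect; the sketch only illustrates the table layout.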

Authors: Chen Xunxun (陈训逊), Fang Binxing (方滨兴)

Affiliation: Computer Networks and Information Security Technology Research Center

Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报（英文版）), 2007, No. 2, pp. 236-239 (4 pages). Indexed in EI and CAS.

Keywords: multi-pattern string matching; definite-state automata; Aho-Corasick algorithm; CACHE; finite automata; algorithm optimization

Classification: TP301.1 [Automation and Computer Technology: Computer Architecture]

References (4)

1. Allauzen C, Raffinot M. Factor Oracle of a Set of Words. 1999.
2. Aho A, Corasick M. Efficient string matching: An aid to bibliographic search. Communications of the ACM, 1975.
3. Knuth D, Morris J, Pratt V. Fast pattern matching in strings. SIAM Journal on Computing, 1977.
4. Wu S, Manber U. A Fast Algorithm for Multi-pattern Searching. 1994.
