Abstract
Approximate string matching is widely used in network security. Starting from the similarity of Chinese strings, this paper proposes improving character-level similarity by decomposing individual Chinese characters. It analyzes the "clustering" property of Chinese characters, introduces a key representation for characters, formulates the mapping between characters and their keys as rules, and discusses how these rules can be obtained. The paper then designs a rule-based framework for approximate matching of Chinese strings and proposes a new similarity calculation model. An experiment covering the whole workflow validates the approach and demonstrates the advantages of rule-based approximate matching of Chinese strings.
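The abstract does not give the rule set or the similarity model, so the following is only a minimal sketch of the general idea, under stated assumptions: each character is mapped to a key shared by its cluster (here, a common component), and similarity is taken as a normalized edit distance over the key sequences. The `RULES` table, the function names, and the choice of Levenshtein distance are all hypothetical and are not taken from the paper.

```python
# Hypothetical rule table: character -> key shared by its cluster.
# The entries are illustrative only (characters sharing a visible component).
RULES = {
    "清": "青", "请": "青", "情": "青", "晴": "青",
    "账": "长", "帐": "长",
}


def to_keys(text):
    """Replace each character by its key; characters without a rule map to themselves."""
    return [RULES.get(ch, ch) for ch in text]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(s, t, use_rules=True):
    """Normalized similarity in [0, 1], computed as 1 - distance / longer length."""
    a, b = (to_keys(s), to_keys(t)) if use_rules else (list(s), list(t))
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

With this sketch, `similarity("账户", "帐户", use_rules=False)` compares raw characters and yields 0.5, while the rule-based comparison treats 账 and 帐 as the same key and yields 1.0. Collapsing a whole cluster to one key is a deliberate simplification; the paper's model may weight such matches differently.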
Source
《网络安全技术与应用》
2010, Issue 12, pp. 41-44, 40 (5 pages in total)
Network Security Technology & Application
Keywords
Chinese character strings
Approximate string matching
Clustering property
Rules