Deformation-Sensitive Word Filtering Algorithm Based on Improved Trie Tree

Cited by: 4
Abstract: In text processing, filtering systems for ordinary sensitive words are already mature, but methods for filtering the deformed sensitive words that are now widespread still need improvement, especially for complex Chinese deformed sensitive words. To address this problem, the paper analyzes and summarizes deformed sensitive words and proposes a filtering algorithm based on an improved Trie tree. The algorithm proceeds through four stages: analyzing and classifying deformed sensitive words, separation preprocessing of the text, constructing a Trie tree suited to the characteristics of Chinese, and filtering deformed sensitive words. Together these stages form a complete filtering system for Chinese text. Repeated experiments show that the algorithm not only finds ordinary sensitive words in Chinese text effectively but also filters out deformed sensitive words efficiently: recall reaches 95.46% for all sensitive words and 92.49% for deformed sensitive words, which widens the search range for sensitive words and improves filtering accuracy.
Author: YE Qing (叶情), College of Computer Science, Sichuan University, Chengdu 610065
Source: Modern Computer (《现代计算机》), 2018, No. 22, pp. 3-7 (5 pages)
Funding: National Natural Science Foundation of China (No. 61332001)
Keywords: Sensitive Word Filtering; Trie Tree; Deformation-Sensitive Word; Text Separation; Fuzzy Matching
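The abstract names a Trie tree and fuzzy matching but does not say how the paper's improved Trie or its text separation step are built. The Python sketch below shows only the general idea, using a plain Trie that tolerates a few noise characters inserted inside a word. This is one simple kind of deformation and is not the paper's algorithm. The names `SensitiveWordFilter` and `max_gap`, and the rule that any non-alphanumeric character counts as noise, are assumptions made for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.is_end = False  # True if a dictionary word ends at this node


class SensitiveWordFilter:
    """Plain Trie matcher that skips noise characters inside a word (illustrative only)."""

    def __init__(self, words, max_gap=2):
        self.root = TrieNode()
        self.max_gap = max_gap  # assumed limit on consecutive noise characters
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    @staticmethod
    def _is_noise(ch):
        # Assumption: anything that is not a letter, digit, or CJK ideograph is noise.
        return not ch.isalnum()

    def _match_at(self, text, start):
        """Return the end index of the longest match starting at `start`, or -1."""
        node, end, gap = self.root, -1, 0
        i = start
        while i < len(text):
            ch = text[i]
            if ch in node.children:
                node = node.children[ch]
                gap = 0
                if node.is_end:
                    end = i + 1
            elif node is not self.root and self._is_noise(ch) and gap < self.max_gap:
                gap += 1  # skip a noise character inside a partially matched word
            else:
                break
            i += 1
        return end

    def find_all(self, text):
        """Return (start, end, matched_text) for each non-overlapping hit."""
        hits, i = [], 0
        while i < len(text):
            end = self._match_at(text, i)
            if end > 0:
                hits.append((i, end, text[i:end]))
                i = end
            else:
                i += 1
        return hits

    def mask(self, text, repl="*"):
        """Replace every matched span with a single-character mask of equal length."""
        out = list(text)
        for start, end, _ in self.find_all(text):
            out[start:end] = repl * (end - start)
        return "".join(out)


if __name__ == "__main__":
    f = SensitiveWordFilter(["敏感词", "bad"])
    print(f.find_all("敏*感-词"))
    print(f.mask("a b.a.d day"))
```

Bounding the number of skipped characters with `max_gap` keeps distant, unrelated characters from being joined into a false match. Other deformations, such as pinyin substitution or split characters, would need additional normalization that this sketch does not attempt.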