Abstract
This paper analyzes the strengths and weaknesses of the character-based and word-based Chinese word segmentation methods commonly used in full-text retrieval systems, and proposes a word segmentation system for Chinese search engines based on a hybrid model of characters, words, and phrases. Using a generalized concept of the word, it designs a segmentation dictionary and improves the maximum matching (MM) segmentation algorithm. Finally, the segmentation system is applied to full-text retrieval.
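For orientation, the baseline that the paper improves on, forward maximum matching (MM), can be sketched as below. This is the textbook algorithm only: the paper's improved variant and its dictionary design are not described in this record, so nothing here should be read as the authors' method. The function name `forward_max_match` and the tiny `LEXICON` are hypothetical, chosen for illustration; the lexicon mixes words and a longer phrase to echo the hybrid character/word/phrase idea, with single characters as the fallback.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Segment `text` greedily, always taking the longest lexicon entry
    that starts at the current position (textbook forward MM)."""
    if max_len is None:
        # Longest entry bounds the candidate window; 1 if the lexicon is empty.
        max_len = max((len(entry) for entry in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so every position yields a token and the loop advances.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon: ordinary words plus one longer phrase-level entry.
LEXICON = {"中文", "搜索", "引擎", "搜索引擎", "分词", "系统"}

if __name__ == "__main__":
    print(forward_max_match("中文搜索引擎分词系统", LEXICON))
```

Because the longest match wins, the phrase-level entry 搜索引擎 is kept whole rather than split into 搜索 and 引擎; characters absent from the lexicon fall through as single-character tokens.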
Source
《武汉工业学院学报》
CAS
2002, Issue 3, pp. 37-40 (4 pages)
Journal of Wuhan Polytechnic University