Abstract
To improve the segmentation accuracy of the reverse maximum matching (RMM) algorithm, this study applies a word-frequency threshold, a single-character function, and related techniques, and obtains good disambiguation results. Experiments show that the improved algorithm still follows the longest-word-first principle while also detecting and resolving covering ambiguity. The improved RMM keeps its large speed advantage and further raises segmentation accuracy. This gives it practical value for small and medium-sized search engines that rely on mechanical (dictionary-based) segmentation and want to improve segmentation precision.
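To make the baseline concrete, here is a minimal Python sketch of plain reverse maximum matching, the algorithm the study sets out to improve. It does not reproduce the paper's word-frequency threshold or single-character function, whose details are not given in this abstract. The toy dictionary, the `max_len` parameter, and the `single_char_rate` helper (one plausible reading of the "single-character rate" keyword, taken here as the share of one-character tokens) are illustrative assumptions.

```python
from typing import List, Set


def rmm_segment(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Plain reverse maximum matching (baseline sketch, not the paper's variant).

    Scans from the end of the text and, at each position, takes the longest
    dictionary word ending there (long-word-first principle). If no word
    matches, a single character is emitted as its own token.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens: List[str] = []
    end = len(text)
    while end > 0:
        # Try the longest window first, shrinking one character at a time.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A one-character window is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    # Tokens were collected right to left; restore reading order.
    tokens.reverse()
    return tokens


def single_char_rate(tokens: List[str]) -> float:
    """Hypothetical helper: fraction of tokens that are a single character."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if len(t) == 1) / len(tokens)


if __name__ == "__main__":
    toy_dict = {"研究", "研究生", "生命", "起源"}
    result = rmm_segment("研究生命起源", toy_dict, max_len=3)
    print(result, single_char_rate(result))
```

With this toy dictionary, RMM segments 研究生命起源 as 研究 / 生命 / 起源, whereas forward maximum matching would greedily take 研究生 and leave 命 as a stray single character. A higher single-character rate is a common, though not infallible, hint that a segmentation has gone wrong.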
Source
Journal of Hebei Agricultural University (《河北农业大学学报》)
CAS
CSCD
Peking University Core Journal
2009, No. 4, pp. 100-102, 107 (4 pages in total)
Funding
Science and Technology Research and Development Program of Hebei Province (07213512)
Keywords
Chinese word segmentation
reverse maximum matching (RMM)
single-character rate
term frequency