Abstract
Chinese word segmentation is a fundamental research topic in Chinese information processing, and it is an essential basic component of many areas of that field. This paper first gives an overview of the basic concepts and applications of Chinese word segmentation and of its basic methods. It then analyzes the two greatest difficulties in word segmentation. Finally, it points out future research directions for Chinese word segmentation.
Source
Information Technology (《信息技术》)
2009, No. 7, pp. 187-189, 192 (4 pages)
Keywords
Chinese word segmentation
segmentation algorithm
ambiguity
unlisted (out-of-vocabulary) words