A Chinese Word Segmentation System Based on Conditional Random Fields (cited by: 15)

CRF-based Chinese Word Segmentation Research
Abstract  Chinese word segmentation is a fundamental first step in natural language processing. This paper proposes a Chinese word segmentation model based on conditional random fields (CRF). As a discriminative model, a CRF can incorporate arbitrary, non-independent features. Segmentation is first cast as a tagging problem; the CRF model then tags each Chinese character, and the tag sequence is converted into the corresponding segmentation. Model parameters are trained with the perceptron algorithm. Compared with earlier CRF-based segmentation models, the system defines and uses a different set of feature functions and achieves better segmentation results. In a closed test on the PK test set of the 1st SIGHAN word segmentation bakeoff, the F-value is 95.2%.
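The tagging view of segmentation and the F-value in the abstract can be illustrated with a minimal sketch. The abstract does not state the tag set or the scoring code, so the four-tag B/M/E/S scheme and the helper names `words_to_tags`, `tags_to_words`, and `f_value` below are assumptions made for illustration only, not the authors' implementation. The CRF tagger itself is not shown.

```python
from typing import List, Set, Tuple


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one tag per character.

    Assumed tag set (not given in the abstract): B = word begin,
    M = word middle, E = word end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Convert a per-character tag sequence back into words."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A B or S tag starts a new word; flush any unfinished one first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # An E or S tag closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset spans (start, end) of each word."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_value(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of word-level precision and recall."""
    if not gold or not pred:
        return 0.0
    correct = len(word_spans(gold) & word_spans(pred))
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For instance, `words_to_tags(["汉语", "分词", "是", "基本", "工作"])` yields one tag per character, and passing those tags with the unsegmented string to `tags_to_words` recovers the original word list.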
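Source  Control & Automation (《微计算机信息》), PKU Core Journal, 2006, No. 10S, pp. 178-180 (3 pages)
Funding  863 Program project "Research on a Chinese Platform Evaluation System and Construction of a Basic Database" (2004AA114010); 863 Program project "Evaluation System and Framework for Chinese Information Processing and Human-Computer Interaction Technology" (2003AA111010)
Keywords  Chinese word segmentation; conditional random fields (CRF); perceptron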

References (7)

  • [1] J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data [C]. In Proceedings of the 18th International Conference on Machine Learning, pages 282-289, 2001.
  • [2] F. Peng, F. Feng, and A. McCallum. Chinese Segmentation and New Word Detection using Conditional Random Fields [C]. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), pages 562-568, August 23-27, 2004.
  • [3] H. T. Ng and J. K. Low. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based? [C]. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), 2004.
  • [4] N. Xue. Chinese Word Segmentation as Character Tagging [J]. International Journal of Computational Linguistics and Chinese Language Processing, 2003.
  • [5] M. Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with the Perceptron Algorithm [C]. In Proceedings of EMNLP 2002, 2002.
  • [6] R. Sproat and T. Emerson. The First International Chinese Word Segmentation Bakeoff [C]. In Proceedings of the SIGHAN Workshop, 2003.
  • [7] Jin Chunshi, Ding Xiaoqing, Peng Liangrui, and Liu Changsong. A Morpheme-Based Japanese Word Segmentation Method and Its Application in OCR Systems [J]. Control & Automation (《微计算机信息》), 2006(01X): 244-246. (cited by: 2)

Second-level references: 1

Co-citing documents: 1

Co-cited documents: 125

Citing documents: 15

Second-level citing documents: 112
