A New Error-driven Learning Approach for Chinese Word Segmentation (cited by: 9)
Abstract: A well-known problem in Chinese word segmentation (CWS) is the lack of a single agreed definition of a word. Different segmentation standards yield different segmentation results, and different applications require different results. Developing a separate CWS system for each standard is impractical, so it is important to be able to adapt the output of an existing CWS system flexibly and effectively to multiple standards. This paper presents a new transformation-based learning (TBL) approach that post-processes segmentation output to adapt it to different standards. Unlike earlier transformation-based learning methods for CWS, the approach makes use of linguistic information, introducing word class and word-internal structure into the rule templates and transformation rules. The approach is evaluated on four standard test sets, each representing a different segmentation standard, and its performance is comparable to that of several state-of-the-art approaches that perform Chinese word segmentation under a single standard.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2006, No. 3, pp. 160-164 (5 pages)
Keywords: Chinese word segmentation; rule template; word class; word-internal structure; transformation-based learning (TBL)
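
The adaptation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not give the paper's rule templates, so the `Token` and `SplitRule` types, the word-class labels `PER`, `ORG` and `W`, the single "split a word of class X into its internal parts" template, and the toy sentence are all assumptions made for illustration. The sketch shows the general error-driven loop of transformation-based learning: each candidate rule is scored by how many boundary errors it removes against a gold segmentation in the target standard, and the best rule is kept until no rule helps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """A word from the base segmenter (hypothetical representation)."""
    text: str
    word_class: str               # assumed labels: "PER", "ORG", "W" (ordinary word)
    parts: Tuple[str, ...] = ()   # word-internal structure; empty if atomic


@dataclass(frozen=True)
class SplitRule:
    """Assumed rule template: split every word of a given class into its parts."""
    word_class: str

    def apply(self, tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for t in tokens:
            if t.word_class == self.word_class and t.parts:
                # Split pieces are treated as ordinary atomic words afterwards.
                out.extend(Token(p, "W") for p in t.parts)
            else:
                out.append(t)
        return out


def boundaries(words: List[str]) -> set:
    """Character offsets at which a word ends."""
    pos, ends = 0, set()
    for w in words:
        pos += len(w)
        ends.add(pos)
    return ends


def total_errors(corpus) -> int:
    """Count boundaries that differ between current and gold segmentations."""
    return sum(
        len(boundaries([t.text for t in toks]) ^ boundaries(gold))
        for toks, gold in corpus
    )


def learn(corpus, candidates):
    """Greedy error-driven selection: keep the rule that removes the most errors."""
    learned = []
    current = [(list(toks), gold) for toks, gold in corpus]
    while True:
        base = total_errors(current)
        best, best_gain = None, 0
        for rule in candidates:
            trial = [(rule.apply(toks), gold) for toks, gold in current]
            gain = base - total_errors(trial)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:          # no rule reduces the error count any further
            return learned
        learned.append(best)
        current = [(best.apply(toks), gold) for toks, gold in current]


# Toy data: the target standard splits person names but keeps organization names whole.
sentence = [
    Token("张三", "PER", ("张", "三")),
    Token("喜欢", "W"),
    Token("北京大学", "ORG", ("北京", "大学")),
]
gold = ["张", "三", "喜欢", "北京大学"]

rules = learn([(sentence, gold)], [SplitRule("PER"), SplitRule("ORG")])

adapted = sentence
for rule in rules:
    adapted = rule.apply(adapted)
print([t.text for t in adapted])  # ['张', '三', '喜欢', '北京大学']
```

On this toy input only the person-name rule is learned, because splitting the organization name would add a boundary error under the assumed target standard. Working on the segmenter's output rather than on the segmenter itself is what lets one base system serve several standards.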