Automatic identification of transliterated name based on co-occurrence frequency statistics of words
(Original title: 基于用字共现频率统计的外国译名自动识别)

Cited by: 1
Abstract: To reduce the negative effects of word segmentation, this paper proposes a method for automatically identifying transliterated foreign names based on co-occurrence frequency statistics of the characters used in them. First, the character-usage features of transliterated names are analyzed statistically and the concept of a transliterated-name co-occurrence string is introduced; a table of non-transliteration characters is then derived from the table of transliteration characters and the table of common Chinese characters. Second, the boundary of a transliterated name is defined on this basis. Finally, a method for adjusting segmentation errors is designed from the boundary definition. Tests on an open corpus show that, compared with the maximum word frequency segmentation algorithm, the proposed algorithm improves precision, recall, and F-measure in transliterated-name identification.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2012, No. 1, pp. 362-366 (5 pages)
Funding: National Natural Science Foundation of China (60702056)
Keywords: transliterated name; segmentation; co-occurrence string; frequency statistics; boundary of transliterated name; natural language processing
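
The abstract gives only an outline of the pipeline, so the following is a minimal Python sketch of the general idea rather than the authors' algorithm. It makes several assumptions that do not come from the paper: the toy name list and common-character string, the function names (`build_tables`, `find_candidates`, `adjust_segmentation`), the rule that a candidate is a maximal run of transliteration characters whose adjacent pairs have all been observed in known names, and the `min_len` threshold. The paper's actual boundary definition and frequency thresholds are not stated in the abstract.

```python
from collections import Counter


def build_tables(known_names, common_chars):
    """Derive character tables from toy data (all inputs are hypothetical)."""
    # Transliteration character table: every character seen in a known name.
    name_chars = set("".join(known_names))
    # Non-transliteration table: common characters never seen in a name.
    # Such characters can never fall inside a candidate span below.
    non_name_chars = set(common_chars) - name_chars
    # Co-occurrence counts of adjacent character pairs inside known names.
    pair_counts = Counter(
        name[i:i + 2] for name in known_names for i in range(len(name) - 1)
    )
    return name_chars, non_name_chars, pair_counts


def find_candidates(text, name_chars, pair_counts, min_len=2):
    """Return (start, end) spans of candidate names (assumed boundary rule).

    A span is a maximal run of transliteration characters in which every
    adjacent pair occurs at least once in the known names.
    """
    spans, start = [], None
    # The "\0" sentinel is not a name character, so it closes a trailing run.
    for i, ch in enumerate(text + "\0"):
        is_name_char = ch in name_chars
        if (start is not None and is_name_char
                and pair_counts[text[i - 1:i + 1]] > 0):
            continue  # the current run keeps growing
        if start is not None and i - start >= min_len:
            spans.append((start, i))
        start = i if is_name_char else None
    return spans


def adjust_segmentation(tokens, spans):
    """Re-cut a segmenter's tokens so each candidate span is one token."""
    text = "".join(tokens)
    # Collect the cut positions implied by the original tokens.
    cuts, pos = {0}, 0
    for tok in tokens:
        pos += len(tok)
        cuts.add(pos)
    for start, end in spans:
        # Drop cuts strictly inside the span and force cuts at its edges.
        cuts = {c for c in cuts if not (start < c < end)}
        cuts.update((start, end))
    ordered = sorted(cuts)
    return [text[a:b] for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    names = ["克林顿", "布什", "奥巴马", "普京"]          # toy name list
    common = "今天会见了的是在奥巴马普京"                  # toy common characters
    name_chars, non_name_chars, pair_counts = build_tables(names, common)
    text = "今天奥巴马会见了普京"
    spans = find_candidates(text, name_chars, pair_counts)
    # A hypothetical segmenter output that wrongly splits both names.
    tokens = ["今天", "奥", "巴马", "会见", "了", "普", "京"]
    print(adjust_segmentation(tokens, spans))
```

Working on cut positions rather than on token lists keeps the adjustment step independent of how the upstream segmenter split the text: any cut inside a candidate span is discarded, whatever produced it.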