Original title: 利用音译和网络挖掘翻译命名实体 (cited by: 11)

Named Entity Translation with Web Mining and Transliteration
Abstract: This paper presents a novel approach that combines transliteration with web mining to improve named entity translation. First, a transliteration model generates a translation candidate; the transliteration information is then used together with web mining to obtain additional candidates. Finally, a Maximum Entropy (ME) model ranks the candidates using features of the source term and each candidate, such as pronunciation similarity, contextual features, and web page co-occurrence, and the top-ranked candidate is taken as the final translation. Experimental results show that the approach significantly improves the precision of named entity translation.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 1, pp. 23-29 (7 pages)
Keywords: artificial intelligence; machine translation; transliteration; named entity translation; web mining
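
The ranking step described in the abstract can be illustrated with a minimal sketch of a log-linear (maximum entropy) scorer over candidate translations. This is not the authors' implementation: the feature names (`pron_sim`, `context`, `cooccur`), the weights, the candidate labels, and all feature values are hypothetical and chosen only to show how several features might be combined into one ranking. In practice the weights would be learned from training data.

```python
import math

# Hypothetical feature weights (assumption: hand-set here for illustration;
# a real system would learn them from training data).
WEIGHTS = {"pron_sim": 2.0, "context": 1.0, "cooccur": 1.5}


def maxent_rank(candidates):
    """Rank candidates with a log-linear (maximum entropy) model.

    `candidates` maps each candidate translation to a dict of feature values.
    Returns (candidate, probability) pairs sorted by descending probability.
    """
    # Linear score: weighted sum of the feature values for each candidate.
    scores = {
        cand: sum(WEIGHTS[name] * value for name, value in feats.items())
        for cand, feats in candidates.items()
    }
    # Normalise over the candidate set (subtract the max for numerical stability).
    top = max(scores.values())
    exp_scores = {cand: math.exp(s - top) for cand, s in scores.items()}
    z = sum(exp_scores.values())
    return sorted(
        ((cand, e / z) for cand, e in exp_scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Made-up feature values for three candidate translations of one source name.
candidates = {
    "candidate_a": {"pron_sim": 0.9, "context": 0.2, "cooccur": 0.1},  # e.g. from transliteration
    "candidate_b": {"pron_sim": 0.7, "context": 0.8, "cooccur": 0.9},  # e.g. from web mining
    "candidate_c": {"pron_sim": 0.3, "context": 0.4, "cooccur": 0.2},  # e.g. from web mining
}

ranked = maxent_rank(candidates)
best = ranked[0][0]  # the top-ranked candidate is taken as the final translation
```

With these made-up values, the web-mined candidate with strong contextual and co-occurrence evidence outranks the candidate that is only phonetically closest, which is the kind of trade-off a combined feature model is meant to capture.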