The Translation Selection of Anchor Text in Wikipedia Cross-Lingual Link Discovery (Wikipedia跨语言链接发现中的锚文本译项选择)
Abstract: Wikipedia Cross-Lingual Link Discovery (CLLD) aims to automatically identify topic-relevant anchor texts in source-language Wikipedia articles and to recommend a set of relevant target-language links for each anchor text. The task involves three key problems: anchor text identification, anchor text translation, and target link discovery. In anchor text translation, a single anchor text may have several candidate translations, and an incorrect choice directly reduces the accuracy of link recommendation in target link discovery. To address this, the paper proposes a context-based translation selection method that determines the translation of an anchor text by voting based on pointwise mutual information (PMI). The method is tested on translation selection for person names, terminology, and abbreviations in Chinese and English Wikipedia articles, and the experiments show that it achieves good results.
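Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2016, No. 2, pp. 196-201, 216 (7 pages in total)
Funding: National Key Technology R&D Program of China (2012BAH14F00); National Basic Research Program of China (973 Program) (2010CB530401)
Keywords: Wikipedia; cross-lingual link discovery (CLLD); anchor text; translation selection; pointwise mutual information (PMI)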
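
The abstract names PMI-based voting but does not spell out the procedure, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that PMI is estimated from document-level co-occurrence counts, that each context term casts one vote for the candidate translation with which it has the highest PMI, and that the candidate with the most votes is selected. The function names, the toy corpus, and the example candidates are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(documents):
    """Count document frequencies of single terms and of unordered term pairs."""
    term_df = Counter()
    pair_df = Counter()
    for doc in documents:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_df, pair_df, len(documents)


def pmi(x, y, term_df, pair_df, n_docs):
    """PMI(x, y) = log(p(x, y) / (p(x) * p(y))); -inf if x and y never co-occur."""
    joint = pair_df[frozenset((x, y))]
    if joint == 0:
        return float("-inf")
    return math.log(joint * n_docs / (term_df[x] * term_df[y]))


def select_translation(candidates, context_terms, term_df, pair_df, n_docs):
    """Assumed voting scheme: each context term votes for its highest-PMI candidate."""
    votes = Counter()
    for ctx in context_terms:
        scores = {c: pmi(c, ctx, term_df, pair_df, n_docs) for c in candidates}
        best = max(scores, key=scores.get)
        # A context term that co-occurs with no candidate casts no vote.
        if scores[best] > float("-inf"):
            votes[best] += 1
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, votes


# Hypothetical toy corpus: each document is a list of target-language terms.
documents = [
    ["anterior cruciate ligament", "knee", "injury", "surgery"],
    ["anterior cruciate ligament", "knee", "athlete"],
    ["association for computational linguistics", "parsing", "conference"],
    ["association for computational linguistics", "translation", "conference"],
    ["knee", "surgery", "hospital"],
    ["conference", "paper", "parsing"],
]
term_df, pair_df, n_docs = build_counts(documents)

# Two hypothetical candidate translations for an ambiguous abbreviation.
candidates = [
    "anterior cruciate ligament",
    "association for computational linguistics",
]
context = ["parsing", "conference", "translation"]
winner, votes = select_translation(candidates, context, term_df, pair_df, n_docs)
print(winner, dict(votes))
```

One-vote-per-context-term is only one possible reading of "voting"; summing PMI scores per candidate, or weighting votes by distance from the anchor text, would be equally plausible variants.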