Mongolian-Chinese Neural Machine Translation Based on Encoder-Decoder Reconstruction Framework (cited by: 6)
Abstract: In recent years, end-to-end neural machine translation has become the focus of machine translation research because of its high translation accuracy and simple model structure. However, it still has a major shortcoming: the model tends to translate some source words repeatedly while mistakenly ignoring others. To address this, we add a reconstructor to the end-to-end model. First, Word2vec is used to produce vector representations of the Mongolian-Chinese bilingual dataset; next, an end-to-end Mongolian-Chinese neural machine translation model is pre-trained; finally, the Mongolian-Chinese neural machine translation model based on the encoder-decoder reconstruction framework is trained. An attention-based Mongolian-Chinese neural machine translation model serves as the baseline system. Experimental results show that the framework significantly improves the adequacy of Mongolian-Chinese machine translation and yields better translations than the conventional attention-based Mongolian-Chinese machine translation model.
Authors: Sun Xiaoqian; Su Yila; Zhao Yaping; Wang Yufei; Ren Qingdaoerji (College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China)
Source: Computer Applications and Software (PKU Core Journal), 2020, No. 4, pp. 150-155, 163 (7 pages)
Funding: National Natural Science Foundation of China (61363052, 61502255); Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605); Inner Mongolia Autonomous Region Ethnic Affairs Commission Foundation Project (MW-2017-MGYWXXH-03).
Keywords: Mongolian-Chinese machine translation; end-to-end; reconstructor; over-translation and under-translation
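The abstract does not give the training objective, so the following is only an illustrative sketch of how a reconstructor is commonly attached to an encoder-decoder model: the usual translation loss is combined with a reconstruction loss that measures how well the source sentence can be recovered from the decoder's states. The function names and the weight `lam` are assumptions made for illustration and are not taken from the paper; the probabilities are toy numbers, not experimental results.

```python
import math
from typing import Sequence


def negative_log_likelihood(token_probs: Sequence[float]) -> float:
    """Sum of -log p over the tokens of one sequence."""
    return -sum(math.log(p) for p in token_probs)


def joint_loss(
    translation_probs: Sequence[float],
    reconstruction_probs: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Translation loss plus lam times the reconstruction loss.

    translation_probs: probability the decoder assigns to each
        reference target token.
    reconstruction_probs: probability the reconstructor assigns to
        each source token when reading the decoder's hidden states.
    lam: hypothetical weighting factor (an assumption, not a value
        reported in the paper).
    """
    translation_loss = negative_log_likelihood(translation_probs)
    reconstruction_loss = negative_log_likelihood(reconstruction_probs)
    return translation_loss + lam * reconstruction_loss


# Toy comparison: same translation probabilities, but in the second
# case one source word is hard to recover, as it would be if the
# decoder had skipped it.
complete = joint_loss([0.9, 0.8, 0.9], [0.8, 0.7, 0.9])
dropped = joint_loss([0.9, 0.8, 0.9], [0.8, 0.05, 0.9])
print(dropped > complete)
```

Under this formulation, a translation that leaves a source word uncovered is penalized through the reconstruction term even when its translation likelihood is unchanged, which is the intuition behind using a reconstructor against over-translation and under-translation.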