Abstract
In recent years, end-to-end neural machine translation has become a focus of machine translation research because of its high translation accuracy and simple model structure. It still has one major shortcoming: the model tends to translate some source words repeatedly (over-translation) while mistakenly omitting others (under-translation). To address this, we add a reconstructor to the end-to-end model. First, Word2vec is used to produce vector representations of the Mongolian-Chinese bilingual dataset; next, an end-to-end Mongolian-Chinese neural machine translation model is pre-trained; finally, the Mongolian-Chinese neural machine translation model based on the encoder-decoder-reconstructor framework is trained. An attention-based Mongolian-Chinese neural machine translation model serves as the baseline system. Experimental results show that the framework significantly improves the adequacy of Mongolian-Chinese machine translation and yields better translations than the conventional attention-based Mongolian-Chinese model.
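The abstract does not state the training objective, so the following is only a minimal sketch under an assumption: that the reconstructor is trained in the way commonly used for encoder-decoder-reconstructor models, by adding a weighted reconstruction term to the usual translation likelihood. The function names, the weight `lam`, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of log-probabilities over one sequence of per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def reconstruction_objective(translation_probs, reconstruction_probs, lam=1.0):
    """Joint score: log P(target | source) + lam * log R(source | decoder states).

    translation_probs: probabilities the decoder assigned to each target token.
    reconstruction_probs: probabilities the reconstructor assigned to each
        source token when rebuilding the source from the decoder states.
    lam: hypothetical weight balancing the two terms (an assumption).
    """
    likelihood = sequence_log_prob(translation_probs)
    reconstruction = sequence_log_prob(reconstruction_probs)
    return likelihood + lam * reconstruction


# Toy numbers: both hypotheses have the same translation likelihood, but the
# second one reconstructs the middle source word poorly, as would be expected
# if that word had been skipped during translation.
complete = reconstruction_objective([0.9, 0.8], [0.9, 0.9, 0.8])
dropped = reconstruction_objective([0.9, 0.8], [0.9, 0.1, 0.8])
```

Under this assumed objective, a hypothesis that leaves a source word untranslated is penalised through the reconstruction term even when its translation likelihood is unchanged, which is the intuition behind using a reconstructor against over-translation and under-translation.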
Authors
Sun Xiaoqian (孙晓骞)
Su Yila (苏依拉)
Zhao Yaping (赵亚平)
Wang Yufei (王宇飞)
Ren Qingdaoerji (仁庆道尔吉)
College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2020, Issue 4, pp. 150-155, 163 (7 pages)
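Funding
National Natural Science Foundation of China (61363052, 61502255)
Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605)
Inner Mongolia Autonomous Region Ethnic Affairs Commission Fund (MW-2017-MGYWXXH-03)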
Keywords
Mongolian-Chinese machine translation
End-to-end
Reconstructor
Over-translation and under-translation