Abstract
The main difficulty in low-resource neural machine translation is the lack of large parallel corpora for training. With the development of pre-trained models, which have brought substantial improvements across natural language processing tasks, this paper proposes a neural machine translation model that incorporates the pre-trained ELMO model to address the low-resource translation problem. The proposed model outperforms back-translation by more than 0.7 BLEU on the Turkish-English low-resource translation task and by more than 0.8 BLEU on the Romanian-English task. In addition, on four simulated low-resource tasks (Chinese-English, French-English, German-English and Spanish-English), it improves over the traditional neural machine translation model by 2.3, 3.2, 2.6 and 3.2 BLEU respectively. The experimental results show that fusing ELMO into the model is effective for low-resource neural machine translation.
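The abstract states that ELMO representations are fused into the translation model but does not spell out the fusion mechanism. The standard ELMo recipe collapses the biLM's layer outputs into a single task-specific vector via a learned softmax-weighted sum scaled by a scalar. A minimal pure-Python sketch of that scalar mix (the `scalars` and `gamma` parameters are the learnable task weights from the original ELMo formulation, not values reported in this paper):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def elmo_scalar_mix(layer_reps, scalars, gamma):
    """ELMo-style task-specific mix: gamma * sum_j softmax(scalars)_j * h_j.

    layer_reps: list of L layer vectors (one token's representation per biLM layer),
                each a list of floats of equal dimension.
    scalars:    L learnable mixing logits.
    gamma:      learnable global scale.
    Returns the mixed vector fed to the downstream (here, NMT encoder) model.
    """
    w = softmax(scalars)
    dim = len(layer_reps[0])
    return [gamma * sum(w[j] * layer_reps[j][d] for j in range(len(layer_reps)))
            for d in range(dim)]
```

With equal logits the mix reduces to a plain layer average, e.g. `elmo_scalar_mix([[1.0], [3.0], [5.0]], [0.0, 0.0, 0.0], 1.0)` averages the three layers to `[3.0]`. In practice the mixed vector is concatenated with (or added to) the NMT encoder's token embeddings; the exact choice in this paper is not given in the abstract.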
Authors
WANG Hao-chang, SUN Meng-ran, ZHAO Tie-jun (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
Source
Computer and Modernization (《计算机与现代化》), 2021, No. 7, pp. 38-42 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (61402099, 61702093).
Keywords
low-resource
parallel corpus
pre-training model
neural machine translation model