Abstract
In recent years, neural machine translation has developed rapidly, yet because it depends on large-scale parallel corpora, its performance on low-resource languages is often poor. Building on an analysis of the encoder-decoder framework and the attention mechanism, and drawing on the idea of dual learning, this paper proposes a semi-supervised neural network model for low-resource translation. The model combines a relatively large monolingual corpus with a small parallel corpus to translate low-resource languages. Experimental results show that when the parallel corpus is too small to train an ordinary neural network model, the semi-supervised model still achieves reasonable results; however, it demands a very large amount of monolingual data, and good performance is reached only once that data reaches a certain order of magnitude.
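The abstract describes a semi-supervised setup in the spirit of dual learning: two attention-based encoder-decoder models are trained jointly on a small parallel corpus, with monolingual data exploited through a round-trip (translate, then translate back) signal. The sketch below is a minimal illustration of that idea, not the authors' implementation; PyTorch, the toy model sizes, the class and function names, and the simplified greedy round-trip reconstruction (in place of the policy-gradient and language-model rewards used in the original dual-learning formulation) are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code) of semi-supervised NMT with a
# dual-learning-style round trip: a supervised loss on a small parallel batch
# plus a reconstruction loss X -> Y' -> X on monolingual source sentences.
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD, BOS, EOS = 0, 1, 2  # special token ids (assumed)

class Seq2SeqAttn(nn.Module):
    """Tiny GRU encoder-decoder with dot-product attention."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim, padding_idx=PAD)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src, tgt_in):
        enc, h = self.encoder(self.src_emb(src))                  # (B, S, D)
        dec, _ = self.decoder(self.tgt_emb(tgt_in), h)            # (B, T, D)
        attn = torch.softmax(dec @ enc.transpose(1, 2), dim=-1)   # (B, T, S)
        ctx = attn @ enc                                          # (B, T, D)
        return self.out(torch.cat([dec, ctx], dim=-1))            # (B, T, V)

    @torch.no_grad()
    def greedy_decode(self, src, max_len=20):
        enc, h = self.encoder(self.src_emb(src))
        tok = torch.full((src.size(0), 1), BOS, dtype=torch.long, device=src.device)
        outs = []
        for _ in range(max_len):
            dec, h = self.decoder(self.tgt_emb(tok), h)
            attn = torch.softmax(dec @ enc.transpose(1, 2), dim=-1)
            ctx = attn @ enc
            tok = self.out(torch.cat([dec, ctx], dim=-1)).argmax(-1)
            outs.append(tok)
        return torch.cat(outs, dim=1)

def xent(logits, gold):
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           gold.reshape(-1), ignore_index=PAD)

def train_step(f_xy, f_yx, opt, parallel, mono_x):
    """One update: supervised loss on the parallel batch, plus a round-trip
    reconstruction loss on monolingual X (gradient only reaches f_yx here,
    since the intermediate greedy decode is non-differentiable)."""
    src, tgt_in, tgt_out = parallel
    sup = xent(f_xy(src, tgt_in), tgt_out)

    pseudo_y = f_xy.greedy_decode(mono_x)                         # X -> Y'
    rec_in = torch.cat([torch.full_like(mono_x[:, :1], BOS), mono_x[:, :-1]], dim=1)
    recon = xent(f_yx(pseudo_y, rec_in), mono_x)                  # Y' -> X

    loss = sup + recon
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    V = 100  # toy vocabulary size
    f_xy, f_yx = Seq2SeqAttn(V, V), Seq2SeqAttn(V, V)
    opt = torch.optim.Adam(list(f_xy.parameters()) + list(f_yx.parameters()), lr=1e-3)
    src = torch.randint(3, V, (8, 10))
    tgt = torch.randint(3, V, (8, 12))
    tgt_in = torch.cat([torch.full((8, 1), BOS, dtype=torch.long), tgt[:, :-1]], dim=1)
    mono_x = torch.randint(3, V, (16, 10))
    print(train_step(f_xy, f_yx, opt, (src, tgt_in, tgt), mono_x))
```

In a full dual-learning system the forward model would also be rewarded for producing fluent intermediate translations (via a language model) and updated with policy gradients; the sketch keeps only the reconstruction leg to show how monolingual data can supplement a small parallel corpus.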
Authors
陆雯洁
谭儒昕
刘功申
孙环荣
LU Wenjie; TAN Ruxin; LIU Gongshen; SUN Huanrong (Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, Shanghai 200240, China; Shanghai Jiao Tong University-Shanghai Songheng Information Content Analysis Joint Lab, Shanghai 200240, China)
Source
《厦门大学学报(自然科学版)》
CAS
CSCD
PKU Core Journals (北大核心)
2019, No. 2, pp. 200-208 (9 pages)
Journal of Xiamen University: Natural Science
Funding
National Natural Science Foundation of China (61772337, 61472248)
Keywords
semi-supervised learning
low-resource language
machine translation