Abstract
Attention-based neural machine translation (NMT) has become the dominant approach to machine translation, outperforming statistical machine translation in many translation directions, with the advantage being especially pronounced when the training corpus is large. The model adopts an encoder-decoder framework that casts translation as a sequence-to-sequence problem. However, in encoder-decoder models built on gated recurrent units (GRUs), vanishing gradients make the model hard to converge and severely degraded as more layers are added, which in turn hurts translation quality. This paper replaces the GRU with the simple recurrent unit (SRU) and deepens the encoder and decoder by stacking network layers, improving the performance of the NMT model. We conducted experiments on German-English and Uyghur-Chinese translation tasks. The results show that using SRUs in the NMT model effectively resolves the training difficulty caused by vanishing gradients, and that deepening the model significantly improves translation quality while keeping the training speed essentially unchanged. We also compared our model experimentally with an NMT model based on residual connections, and the results show that ours has a significant advantage.
Attention-based neural machine translation models, which adopt an encoder-decoder framework to model translation as a sequence-to-sequence problem, have become extremely popular. In this paper, we replace the gated recurrent units in the classical encoder and decoder with simple recurrent units (SRUs), and deepen the structure of the encoder and decoder by stacking network layers to improve the performance of the neural machine translation model. We conducted experiments on German-English and Uyghur-Chinese translation tasks. Experimental results show that translation performance is significantly improved at no extra cost in training speed, and that our model outperforms a counterpart based on residual connections.
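To make the mechanism concrete, below is a minimal NumPy sketch of the SRU recurrence (Lei et al., 2017) that the abstract refers to, together with a plain layer stack. It is an illustration under assumptions, not the authors' implementation: the names `SRUCell` and `deep_sru_encoder` are hypothetical, the weights are random, and the input and hidden sizes are kept equal so the element-wise highway term in each cell is well defined. The point it shows is that the heavy matrix products depend only on the inputs and can be computed for all time steps at once, leaving only a cheap element-wise loop as the sequential part, and that each cell's highway path back to its input is what lets gradients survive deep stacking.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRUCell:
    """One simple recurrent unit (SRU) layer, after Lei et al. (2017).

    The matrix products use only the input sequence, never h_{t-1},
    so they are batched across time; only the element-wise cell
    update below is sequential.
    """

    def __init__(self, size, rng):
        # Square weights for illustration (input size == hidden size),
        # so layers stack with element-wise highway connections.
        self.W  = rng.standard_normal((size, size)) * 0.1  # candidate state
        self.Wf = rng.standard_normal((size, size)) * 0.1  # forget gate
        self.Wr = rng.standard_normal((size, size)) * 0.1  # reset gate
        self.bf = np.zeros(size)
        self.br = np.zeros(size)

    def forward(self, xs):
        # xs: (T, size) input sequence; returns (T, size) hidden states.
        xt = xs @ self.W.T                      # candidates, all steps at once
        f  = sigmoid(xs @ self.Wf.T + self.bf)  # forget gates, all steps at once
        r  = sigmoid(xs @ self.Wr.T + self.br)  # reset gates, all steps at once
        c  = np.zeros(xs.shape[1])
        hs = np.empty_like(xs)
        for t in range(xs.shape[0]):            # only this loop is sequential
            c = f[t] * c + (1.0 - f[t]) * xt[t]               # cell state
            hs[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * xs[t]  # highway to input
        return hs

def deep_sru_encoder(xs, cells):
    """Stack SRU layers; the built-in highway path in each cell keeps
    gradients flowing through many layers without vanishing."""
    h = xs
    for cell in cells:
        h = cell.forward(h)
    return h

rng = np.random.default_rng(0)
T, d, depth = 5, 8, 4
cells = [SRUCell(d, rng) for _ in range(depth)]
out = deep_sru_encoder(rng.standard_normal((T, d)), cells)
print(out.shape)  # (5, 8)
```

Note the contrast with a plain residual connection, which would add the layer input back unconditionally; the SRU's reset gate `r` makes that skip path input-dependent, which is one plausible reading of why the paper reports an advantage over the residual baseline.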
Authors
ZHANG Wen; FENG Yang; LIU Qun (Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; ADAPT Centre, School of Computing, Dublin City University, Dublin, Ireland)
Source
Journal of Chinese Information Processing (《中文信息学报》)
Indexed in CSCD; PKU Core Journal (北大核心)
2018, No. 10, pp. 36-44 (9 pages)
Keywords
gated recurrent unit
gradient vanishing
residual connection
simple recurrent unit