Abstract
In Mongolian-Chinese neural machine translation, the basic granularity of the input sequence affects translation quality. To select a suitable translation granularity, this paper segments the Mongolian and Chinese corpora at four Mongolian-Chinese granularity combinations (word-word, word-subword, subword-word, and subword-subword) and compares the translation performance of each combination in a long short-term memory (LSTM) model and a Transformer model. The experimental results show that, in both translation models, segmenting both the Mongolian and the Chinese corpora at the subword level gives the best results.
Authors
Gao Fen
Su Yila
Niu Xianghua
Zhao Xu
Fan Tingting
Ren Qingdaoerji
College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, Inner Mongolia, China
Source
Computer Applications and Software (《计算机应用与软件》)
PKU Core Journal
2020, Issue 4, pp. 145-149, 170 (6 pages)
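Funding
National Natural Science Foundation of China (61363052, 61502255)
Natural Science Foundation of Inner Mongolia Autonomous Region (2016MS0605)
Ethnic Affairs Commission of Inner Mongolia Autonomous Region fund project (MW-2017-MGYWXXH-03).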