Abstract
Automatic post-editing (APE) automatically corrects errors in machine-translated text and can thereby improve the output quality of machine translation systems. Current APE research focuses mainly on general domains; APE for patent translations, which are highly specialized and demand high translation quality, has received little attention. This paper studies APE for patent translations and proposes an ensemble APE model weighted by the distribution of translation error classes. First, a term-weighted translation edit rate (WTER) is proposed, which adds a term probability factor for each word to the translation edit rate (TER), so that samples with more term errors receive higher WTER values. Then, WTER is used to select subsets of mistranslation, omission, addition, and shift error samples from training data constructed from three machine translation systems, and each subset is used to build an APE sub-model biased toward correcting that error class. Finally, the biased sub-models are combined with weights given by the distribution of translation error classes. The method targets the highly specialized, term-dense nature of patents: each sub-model addresses one error class, which accounts for error-correction bias, while the ensemble covers the diversity of translation errors. Experiments on an English-Chinese patent abstract dataset show that, compared with the three baseline systems, the proposed method improves BLEU by an average of 2.52, 2.28, and 2.27, respectively.
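The abstract does not give the WTER formula, so the following is only a minimal sketch of the general idea of scaling edit costs by a per-word term probability. Everything in it is an assumption made for illustration: the function name `weighted_edit_rate`, the cost `1 + term_prob(token)`, the use of the larger of the two weights for a substitution, and the omission of TER's shift operation. It should not be read as the authors' definition.

```python
from typing import Callable, Sequence


def weighted_edit_rate(
    hyp: Sequence[str],
    ref: Sequence[str],
    term_prob: Callable[[str], float],
) -> float:
    """Word-level edit rate with term-weighted edit costs (hypothetical).

    Each edit on a token costs 1 + term_prob(token), so edits that touch
    likely terms are more expensive. Shifts, which real TER supports,
    are deliberately left out to keep the sketch short.
    """

    def weight(token: str) -> float:
        return 1.0 + term_prob(token)

    n, m = len(hyp), len(ref)
    # dp[i][j] = minimum weighted cost of turning hyp[:i] into ref[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + weight(hyp[i - 1])  # delete a hypothesis word
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + weight(ref[j - 1])  # insert a reference word

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if hyp[i - 1] == ref[j - 1]:
                substitute = dp[i - 1][j - 1]  # match, no cost
            else:
                substitute = dp[i - 1][j - 1] + max(
                    weight(hyp[i - 1]), weight(ref[j - 1])
                )
            delete = dp[i - 1][j] + weight(hyp[i - 1])
            insert = dp[i][j - 1] + weight(ref[j - 1])
            dp[i][j] = min(substitute, delete, insert)

    # Normalize by reference length, as TER does.
    return dp[n][m] / max(m, 1)
```

With `term_prob` returning 0 for every token, all costs are 1 and the score reduces to the ordinary word-level edit distance divided by the reference length. With nonzero term probabilities, a sample whose only error is on a term scores higher than one whose only error is on a function word, which is the ranking behaviour the abstract attributes to WTER.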
Authors
赵三元
王裴岩
叶娜
赵欣瑜
蔡东风
张桂平
ZHAO Sanyuan; WANG Peiyan; YE Na; ZHAO Xinyu; CAI Dongfeng; ZHANG Guiping (Human-Computer Intelligence Research Center, Shenyang Aerospace University, Shenyang 110136, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals
2023, Issue S02, pp. 44-51 (8 pages)
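Funding
National Natural Science Foundation of China (U1908216)
Ministry of Education Humanities and Social Sciences Youth Fund (19YJC740107)
Shenyang Science and Technology Plan (20-202-1-28).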
Keywords
Automatic post-editing
Patent translation
Distribution of translation errors
Ensemble
Translation edit rate