Abstract
To improve the effectiveness and accuracy of machine English translation and the domain adaptation ability of the machine translation model, this paper studies a machine English translation model based on data mining. An LDA model is used to mine topic information from the text of the parallel corpus: each topic is represented by a multinomial distribution over the vocabulary, the proportion of each topic in every document of the collection is determined, and specific words are obtained by sampling from the multinomial distribution over the vocabulary that corresponds to the sampled topic. Maximum likelihood estimation is applied to the target-language monolingual corpus, with the parallel corpus serving as the training objective. The target-language monolingual corpus is estimated through importance sampling and the law of total probability to build the machine English translation model, and beam search is used to draw the samples that approximate the expected value, yielding the English sentence translation. Test results show that, when translating sentences from different corpora, the model achieves a semantic-information recall above 96% and a GLEU score above 58, and has strong domain adaptation ability.
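The topic-modeling step in the abstract (per-document topic proportions, then a topic draw, then a word draw from that topic's multinomial over the vocabulary) can be illustrated with a minimal sketch of the standard LDA generative process. This is not the authors' implementation: the toy topics, the Dirichlet prior `alpha`, and the function names `sample_dirichlet` and `generate_document` are all assumptions made for illustration, and the sketch covers only generation, not the inference needed to mine topics from a real parallel corpus.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topic_word: one dict per topic, mapping word -> probability
                (a multinomial over the vocabulary; toy values here)
    alpha:      Dirichlet prior over topics (hypothetical values)
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topic_ids = list(range(len(topic_word)))
    words = []
    for _ in range(length):
        # Sample a topic according to the document's topic proportions.
        z = rng.choices(topic_ids, weights=theta, k=1)[0]
        # Sample a word from that topic's multinomial over the vocabulary.
        vocab = list(topic_word[z].keys())
        probs = list(topic_word[z].values())
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return theta, words


# Toy, made-up topics; a real model would estimate these from the corpus.
TOPIC_WORD = [
    {"engine": 0.5, "road": 0.3, "bridge": 0.2},
    {"verb": 0.4, "noun": 0.4, "clause": 0.2},
]

rng = random.Random(0)
theta, words = generate_document(TOPIC_WORD, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic proportions:", theta)
print("sampled words:", words)
```

In practice the direction is reversed: the topic-word distributions and the per-document proportions are inferred from observed text, for example with variational inference or Gibbs sampling, rather than fixed by hand as above.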
Authors
王雪
王娟
胡仁青
WANG Xue; WANG Juan; HU Renqing (Xi’an Traffic Engineering Institute, Xi’an 710300, China)
Source
《电子设计工程》
2022, No. 15, pp. 167-171 (5 pages)
Electronic Design Engineering
Funding
Xi’an Traffic Engineering Institute 2021 Young and Middle-aged Fund Project (21KY-62).