Abstract
To address the low keyword-extraction accuracy of current English-Chinese automatic translation search engines, which degrades translation quality, this paper designs an English-Chinese automatic online translation system based on multi-corpus keyword search. A web crawler collects Chinese and English data to build a database. Keywords are then extracted with a text keyword extraction algorithm that combines word co-occurrence, position information, and similarity, and English retrieval and text classification are performed by computing a keyword similarity threshold from weighted feature words; together these steps enable automatic English-Chinese online translation. The results show that, compared with the traditional TF-IDF algorithm and a co-occurrence-based keyword extraction algorithm, the proposed keyword extraction algorithm achieves the highest precision, recall, and comprehensive index. The proposed improved similarity threshold calculation method attains a recall of 91.5% and a precision of 98.2%, both clearly higher than the existing edit-distance and cosine similarity algorithms. Its time cost is only 60 s, which is 180 s and 390 s lower than those two algorithms, respectively. The proposed algorithm can therefore retrieve keyword features and classify text accurately, significantly improving English-Chinese online translation, and the designed system is feasible.
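As a rough illustration of the ideas named in the abstract, the Python sketch below ranks words by co-occurrence counts with a position boost and then evaluates a predicted keyword set by precision, recall, and F1. It is a simplified toy, not the paper's algorithm: the function names, the `position_boost` value, the stopword list, and the use of F1 as the "comprehensive index" are all assumptions, and the similarity component is omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def score_keywords(sentences, position_boost=1.5, top_k=3):
    """Rank words by sentence-level co-occurrence, boosting first-sentence words.

    Assumption: a word's base score is the number of distinct words it
    co-occurs with, summed over sentences; words appearing in the first
    sentence are multiplied by `position_boost`.
    """
    tokenized = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    cooc = Counter()
    for tokens in tokenized:
        # Each unordered pair of distinct words in a sentence counts once.
        for a, b in combinations(sorted(set(tokens)), 2):
            cooc[a] += 1
            cooc[b] += 1
    first = set(tokenized[0]) if tokenized else set()
    scores = {
        w: c * (position_boost if w in first else 1.0) for w, c in cooc.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def precision_recall_f1(predicted, reference):
    """Return (precision, recall, F1) for predicted vs. reference keyword sets."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    docs = [
        "translation system uses keyword search",
        "keyword search improves translation",
    ]
    top = score_keywords(docs)
    print(top)
    print(precision_recall_f1(top, ["keyword", "search", "corpus"]))
```

Precision is the share of extracted keywords that are correct, and recall is the share of reference keywords that were found; F1 is their harmonic mean, which is one common way to combine the two into a single score.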
Author
HE Jing (贺婧), Xi'an Innovation College of Yan'an University, Xi'an 710100, China
Source
Automation & Instrumentation (《自动化与仪器仪表》)
2023, No. 2, pp. 170-175, 180 (7 pages)
Funding
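Shaanxi Provincial Department of Education 2021 Special Scientific Research Program (21JK0445): "Interlingual Communication of Discourse with Chinese Characteristics in the Context of 'Telling China's Stories Well': A Case Study of the English Translation of Exhibition Signs at the Shaanxi Provincial Archives".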
Keywords
keyword search
English-Chinese automatic translation
web crawler
word co-occurrence
similarity threshold