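Abstract
Efficient automatic word alignment is key to building word-aligned corpora. Many current word alignment methods share three shortcomings: the unknown (out-of-vocabulary) word problem, the flexible translation problem, and the global optimal matching problem. To address them, this paper proposes a word alignment model based on maximum matching in a weighted bipartite graph. The bilingual sentence pair is modeled as a bipartite graph, the similarity between words is computed from word form, semantics, part of speech, and co-occurrence information, and the final alignment is obtained by maximum matching on the weighted bipartite graph. Experiments on Chinese-Japanese word alignment show that the method resolves these three shortcomings to some extent, reaching an F-score of 80%, better than the 72% of GIZA++.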
This paper proposes a word alignment model that aligns words by maximum matching on a weighted bipartite graph and measures word similarity by morphological similarity, semantic distance, part of speech, and co-occurrence. Experiments on Chinese-Japanese word alignment show that the model partly solves several problems of existing word alignment methods, namely the unknown word problem, the synonym problem, and the global optimization problem. In the experiments, the method achieves an F-score of 80%, compared with 72% for GIZA++.
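The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the word similarities have already been combined into a single score per word pair, uses the standard Hungarian (Kuhn-Munkres) algorithm as one way to find a maximum-weight bipartite matching, and the function name `align`, the threshold `min_sim`, and the scores in `sim` are all hypothetical.

```python
def align(sim, min_sim=0.1):
    """Return (source_index, target_index) pairs of a maximum-weight matching.

    sim[i][j] is an assumed, precomputed similarity in [0, 1] between source
    word i and target word j. Pairs scoring below min_sim (a hypothetical
    cut-off) are left unaligned.
    """
    if not sim or not sim[0]:
        return []
    n, m = len(sim), len(sim[0])
    size = max(n, m)
    # Pad to a square matrix with zero-weight dummy words, and negate the
    # weights so that a minimum-cost assignment is a maximum-weight matching.
    cost = [[-sim[i][j] if i < n and j < m else 0.0 for j in range(size)]
            for i in range(size)]

    inf = float("inf")
    u = [0.0] * (size + 1)      # row potentials (1-indexed)
    v = [0.0] * (size + 1)      # column potentials (1-indexed)
    p = [0] * (size + 1)        # p[j] = row currently matched to column j
    way = [0] * (size + 1)      # back-pointers along the augmenting path
    for i in range(1, size + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (size + 1)
        used = [False] * (size + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, size + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(size + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the root.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    pairs = [(p[j] - 1, j - 1) for j in range(1, size + 1)]
    # Drop dummy words and weak links, then order by source position.
    return sorted((i, j) for i, j in pairs
                  if i < n and j < m and sim[i][j] >= min_sim)


# Made-up scores for a 2-word source sentence and a 3-word target sentence.
sim = [
    [0.90, 0.80, 0.00],
    [0.85, 0.10, 0.00],
]
print(align(sim))
```

With these made-up scores, a greedy aligner that always takes the single best remaining link would pick (0, 0) first and be left with the weak link (1, 1), for a total of 1.00. The matching above instead selects (0, 1) and (1, 0), total 1.65, which is the kind of global optimum the abstract refers to.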
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2007, Issue 5, pp. 101-106 (6 pages)
Funding
Supported by the Fuji Xerox Visiting Researcher Program
Keywords
computer application
Chinese information processing
word alignment
bipartite graph
matching