
Word Alignment Between Chinese and Japanese Based on Weighted Bipartite Graph

Times cited: 7
Abstract: Efficient automatic word alignment is the key to building word-aligned corpora. Many current word alignment methods suffer from three shortcomings: the unknown-word problem, the flexible-translation (synonym) problem, and the global optimization problem. To address them, this paper proposes a word alignment model based on maximum matching on a weighted bipartite graph. Each bilingual sentence pair is modeled as a bipartite graph, the similarity between words is computed from word-form (morphological), semantic, part-of-speech and co-occurrence information, and the final alignment is obtained by maximum matching on the weighted bipartite graph. Experiments on Chinese-Japanese word alignment show that the method partly solves the three problems above, achieving an F-score of 80%, compared with 72% for GIZA++.
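The abstract describes a three-step pipeline: build a bipartite graph over the words of a sentence pair, weight each edge by a combined word similarity, and take a maximum-weight matching as the alignment. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Everything in it is hypothetical: the names `WEIGHTS`, `THRESHOLD`, `form_similarity`, `word_similarity`, `hungarian_min` and `align`; the feature weights and threshold values; the character-overlap proxy for word-form similarity; and the `semantic` and `cooc` dictionaries, which stand in for the paper's semantic-distance and co-occurrence measures. The abstract does not say which matching algorithm the authors used; the sketch uses the Hungarian (Kuhn-Munkres) algorithm, one standard way to compute a maximum-weight bipartite matching.

```python
"""Hypothetical sketch: word alignment as maximum-weight bipartite matching."""
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL feature weights and threshold; the abstract does not give the paper's values.
WEIGHTS = {"form": 0.4, "semantic": 0.3, "pos": 0.1, "cooc": 0.2}
THRESHOLD = 0.2


def form_similarity(zh: str, ja: str) -> float:
    """Dice coefficient over the two words' character sets (a crude shared-kanji proxy)."""
    a, b = set(zh), set(ja)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def word_similarity(zh: str, ja: str, zh_pos: str, ja_pos: str,
                    semantic: Dict[Tuple[str, str], float],
                    cooc: Dict[Tuple[str, str], float]) -> float:
    """Weighted sum of word-form, semantic, POS and co-occurrence scores."""
    feats = {
        "form": form_similarity(zh, ja),
        "semantic": semantic.get((zh, ja), 0.0),  # stand-in for a thesaurus-based distance
        "pos": 1.0 if zh_pos == ja_pos else 0.0,   # assumes both sides share a coarse tagset
        "cooc": cooc.get((zh, ja), 0.0),           # e.g. a precomputed Dice score
    }
    return sum(WEIGHTS[k] * v for k, v in feats.items())


def hungarian_min(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres with potentials; requires rows <= cols. Returns the column of each row."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matched/unmatched edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def align(zh_words: Sequence[str], ja_words: Sequence[str],
          zh_tags: Sequence[str], ja_tags: Sequence[str],
          semantic: Dict[Tuple[str, str], float],
          cooc: Dict[Tuple[str, str], float],
          threshold: float = THRESHOLD) -> List[Tuple[int, int]]:
    """Return (zh_index, ja_index) links taken from a maximum-weight matching."""
    if not zh_words or not ja_words:
        return []
    w = [[word_similarity(zw, jw, zt, jt, semantic, cooc)
          for jw, jt in zip(ja_words, ja_tags)]
         for zw, zt in zip(zh_words, zh_tags)]
    # Zero out weak edges so they can never be chosen over leaving a word unaligned.
    w = [[x if x >= threshold else 0.0 for x in row] for row in w]
    transposed = len(w) > len(w[0])  # hungarian_min needs rows <= cols
    mat = [list(col) for col in zip(*w)] if transposed else w
    picks = hungarian_min([[-x for x in row] for row in mat])  # max weight = min of negated
    links = []
    for r, c in enumerate(picks):
        i, j = (c, r) if transposed else (r, c)
        if w[i][j] > 0.0:  # a zero-weight pick means the word stays unaligned
            links.append((i, j))
    return sorted(links)


links = align(["我", "喜欢", "音乐"], ["私", "は", "音楽", "が", "好き", "です"],
              ["PRON", "VERB", "NOUN"], ["PRON", "ADP", "NOUN", "ADP", "VERB", "AUX"],
              semantic={("我", "私"): 1.0, ("喜欢", "好き"): 0.9},
              cooc={("音乐", "音楽"): 0.8})
print(links)  # [(0, 0), (1, 4), (2, 2)]
```

Because matching is global, each word receives at most one partner chosen for the best total score rather than greedily. Raw character comparison matches only 音 between 音乐 and 音楽, since 乐 and 楽 are different code points; a simplified-Chinese-to-kanji mapping table, which this sketch omits, would raise that word-form score.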
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2007, No. 5, pp. 101-106 (6 pages)
Funding: supported by the Fuji Xerox Visiting Researcher Program
Keywords: computer application; Chinese information processing; word alignment; bipartite graph; matching

References (11)

  • [1] F. Och, H. Ney. A Systematic Comparison of Various Statistical Alignment Models[J]. Computational Linguistics, 2003, 29(1): 19-51.
  • [2] P. F. Brown, J. Cocke, S. A. D. Pietra, V. J. D. Pietra, et al. A Statistical Approach to Machine Translation[J]. Computational Linguistics, 1990, 16(2): 79-85.
  • [3] W. Gale, K. Church. Identifying Word Correspondences in Parallel Texts[A]. DARPA Workshop on Speech and Natural Language[C]. Pacific Grove, CA, 1991: 152-157.
  • [4] Y. Zhang, Q. Ma, H. Isahara. Use of Kanji Information in Constructing a Japanese-Chinese Bilingual Lexicon[A]. The 4th Workshop on Asian Language Resources[C]. Hainan, 2004: 39-46.
  • [5] D. Wu. Bracketing and Aligning Words and Constituents in Parallel Text Using Stochastic Inversion Transduction Grammars[A]. Parallel Text Processing: Alignment and Use of Translation Corpora[M]. Dordrecht: Kluwer, 2000.
  • [6] Liu Xiaohu, Wu Wei, Li Sheng, Zhao Tiejun, Cai Meng, Ju Yingjie. A Dictionary- and Statistics-Based Lexical-Level Alignment Algorithm for Corpora[J]. Journal of the China Society for Scientific and Technical Information, 1997, 16(1): 21-27. (in Chinese)
  • [7] Lü Yajuan, Zhao Tiejun, Li Sheng, Yang Muyun. Word Alignment of Bilingual Corpora Combining Statistical and Dictionary Methods[A]. The 6th National Joint Conference on Computational Linguistics[C], 2001, 8. (in Chinese)
  • [8] Chang Baobao. Research on Statistics-Based Extraction of Translation-Equivalent Word Pairs[J]. Chinese Journal of Computers, 2003, 26(5): 616-621. (in Chinese)
  • [9] Lü Xueqiang, Wu Honglin, Yao Tianshun. English-Chinese Word Alignment without a Bilingual Dictionary[J]. Chinese Journal of Computers, 2004, 27(8): 1036-1045. (in Chinese)
  • [10] J. Jiang, D. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy[A]. International Conference on Research in Computational Linguistics[C]. China, Taiwan, 1997: 19-33.
