Word Alignment Between Chinese and Japanese Based on Weighted Bipartite Graph (基于加权二部图的汉日词对齐)

Cited by: 7

Abstract: Efficient automatic word alignment is key to building word-aligned corpora. Many current word alignment methods share three shortcomings: the unknown word problem, the flexible translation (synonym) problem, and the global optimal matching problem. To address them, the paper proposes a word alignment model based on maximum matching on a weighted bipartite graph. The model represents each bilingual sentence pair as a bipartite graph, measures the similarity between words using morphological similarity, semantic distance, part of speech, and co-occurrence, and obtains the final alignment by maximum matching on the weighted graph. Experiments on Chinese-Japanese word alignment show that the method partly solves these three problems: its F-score is 80%, compared with 72% for GIZA++.
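The pipeline described in the abstract (score word pairs, then pick the globally best one-to-one matching) can be illustrated with a minimal Python sketch. The abstract does not say how the four similarity signals are combined or which matching algorithm is used, so everything specific here is an assumption: the linear mix in `combine`, the values in `FEATURE_WEIGHTS`, the `threshold` that leaves weak pairs unaligned, and the choice of the Hungarian algorithm. The similarity matrix in the example is made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical feature weights: the abstract names the four feature types
# but not how they are combined, so this linear mix is an assumption.
FEATURE_WEIGHTS: Dict[str, float] = {
    "form": 0.4, "semantic": 0.3, "pos": 0.1, "cooccurrence": 0.2,
}


def combine(features: Dict[str, float]) -> float:
    """Weighted sum of per-feature similarity scores, each in [0, 1]."""
    return sum(w * features.get(name, 0.0) for name, w in FEATURE_WEIGHTS.items())


def max_weight_matching(weights: List[List[float]]) -> List[Tuple[int, int]]:
    """Maximum-weight matching on a complete weighted bipartite graph.

    Runs the Hungarian algorithm on negated weights (assumed choice of
    algorithm). Every node on the shorter side gets exactly one partner.
    """
    if not weights or not weights[0]:
        return []
    transposed = len(weights) > len(weights[0])
    if transposed:  # the loop below needs rows <= columns
        weights = [list(col) for col in zip(*weights)]
    n, m = len(weights), len(weights[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    match = [0] * (m + 1)    # match[j] = row assigned to column j, 0 if free
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    pairs = [(match[j] - 1, j - 1) for j in range(1, m + 1) if match[j]]
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    return sorted(pairs)


def align(sim: List[List[float]], threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Keep matched pairs whose similarity reaches the (assumed) threshold."""
    return [(i, j) for i, j in max_weight_matching(sim) if sim[i][j] >= threshold]


if __name__ == "__main__":
    # Illustrative similarity matrix: rows are Chinese words (我, 学习, 日语),
    # columns are Japanese words (私, は, 日本語, を, 勉強する).
    sim = [
        [0.9, 0.1, 0.1, 0.0, 0.1],
        [0.1, 0.0, 0.2, 0.1, 0.8],
        [0.1, 0.0, 0.9, 0.0, 0.2],
    ]
    print(align(sim))  # [(0, 0), (1, 4), (2, 2)]
```

The matching step is what addresses the global optimization problem: for the weights `[[0.9, 0.8], [0.8, 0.1]]`, a greedy aligner would take the 0.9 pair first and be left with the 0.1 pair (total 1.0), whereas the maximum-weight matching pairs the two 0.8 entries (total 1.6).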

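Authors: Wu Honglin (吴宏林), Liu Shaoming (刘绍明), Yu Ge (于戈)

Affiliations: Institute of Computer Software and Theory, School of Information, Northeastern University (东北大学); Fuji Xerox, Japan

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list; 2007, No. 5, pp. 101-106 (6 pages)

Funding: Supported by the Fuji Xerox Visiting Researcher Program

Keywords: computer application; Chinese information processing; word alignment; bipartite graph; matching

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]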

