Identifying English BaseNPs Through a Combination of Maximum Entropy Approach and Brill Approach (cited by: 6)

Abstract: To further improve the accuracy of base noun phrase (BaseNP) identification, this paper proposes an English BaseNP identification algorithm that combines the maximum entropy approach and the Brill approach, drawing on the characteristics of each. The algorithm builds on a high-accuracy part-of-speech (POS) tagger. In both the training and testing phases, the maximum entropy approach first identifies BaseNPs, and its already highly accurate output then serves as the initial annotation for the Brill approach. Experimental results show that the combined algorithm achieves precision and recall of over 94%, merging the strengths of the two approaches. These results are comparable to the best reported results for English BaseNP identification on the same training and test corpus.
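The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the first stage here is a simple POS heuristic standing in for a trained maximum entropy classifier, the single transformation rule is hand-written rather than learned, and the B/I/O tag scheme and all names (`initial_tags`, `Rule`, `apply_rules`, `chunk_spans`, `precision_recall`) are assumptions made for illustration. The sketch only shows how Brill-style rules can correct an initial annotation and how chunk-level precision and recall are computed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag)

# POS tags the stand-in first stage treats as part of a noun phrase (assumption).
NP_POS = {"DT", "JJ", "NN", "NNS", "NNP", "PRP", "CD"}


def initial_tags(sentence: Sequence[Token]) -> List[str]:
    """Hypothetical stand-in for the maximum entropy stage: emits B/I/O tags."""
    tags: List[str] = []
    for _, pos in sentence:
        if pos not in NP_POS:
            tags.append("O")
        elif tags and tags[-1] in ("B", "I"):
            tags.append("I")  # continue the current phrase
        else:
            tags.append("B")  # start a new phrase
    return tags


@dataclass(frozen=True)
class Rule:
    """Brill-style transformation: retag from_tag as to_tag in a given context."""
    from_tag: str
    to_tag: str
    cur_pos: str                # POS required on the current token
    prev_tags: FrozenSet[str]   # chunk tags allowed on the previous token


def apply_rules(sentence: Sequence[Token], tags: Sequence[str],
                rules: Sequence[Rule]) -> List[str]:
    """Apply the rules in order; each rule reads the tags left by the previous one."""
    current = list(tags)
    for rule in rules:
        updated = list(current)
        for i in range(1, len(sentence)):
            if (current[i] == rule.from_tag
                    and sentence[i][1] == rule.cur_pos
                    and current[i - 1] in rule.prev_tags):
                updated[i] = rule.to_tag
        current = updated
    return current


def chunk_spans(tags: Sequence[str]) -> Set[Tuple[int, int]]:
    """Convert B/I/O tags into a set of (start, end) spans, end exclusive."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        if tag in ("B", "O") and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B" or (tag == "I" and start is None):
            start = i
    return spans


def precision_recall(pred: Set[Tuple[int, int]],
                     gold: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Chunk-level precision and recall: a chunk counts only on an exact span match."""
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


sentence = [("He", "PRP"), ("gave", "VBD"), ("the", "DT"),
            ("boy", "NN"), ("a", "DT"), ("book", "NN")]
gold = ["B", "O", "B", "I", "B", "I"]

# Hand-written example rule (assumption): a determiner following NP material opens a new NP.
rules = [Rule("I", "B", "DT", frozenset({"B", "I"}))]

initial = initial_tags(sentence)
refined = apply_rules(sentence, initial, rules)
print("initial:", initial, precision_recall(chunk_spans(initial), chunk_spans(gold)))
print("refined:", refined, precision_recall(chunk_spans(refined), chunk_spans(gold)))
```

In this toy sentence the first stage merges "the boy a book" into one phrase, and the rule splits it at the second determiner. In a real system the rules would be learned from the errors of the first stage on training data, which is why a more accurate initial annotation leaves the Brill stage with fewer and more specific errors to correct.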

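Authors: Lü Lin, Liu Yushu

Affiliation: School of Computer Science and Technology, Beijing Institute of Technology

Source: Transactions of Beijing Institute of Technology (indexed in EI, CAS, CSCD, PKU Core), 2006, No. 6, pp. 500-503 (4 pages)

Funding: National ministry-level pre-research project (504-4)

Keywords: BaseNP; phrase identification; maximum entropy; Brill approach

Classification: TP301.2 [Automation and Computer Technology: Computer System Architecture]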

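References (8)

1 Brill E. Transformation-based error-driven parsing [C] // The Third International Workshop on Parsing Technologies. Tilburg, Netherlands: [s.n.], 1993: 13-16.
2 Cardie C, Pierce D. Error-driven pruning of the treebank grammars for base noun phrase identification [C] // COLING-ACL'98. Montreal, Canada: [s.n.], 1998: 218-224.
3 Zhou Yaqian, Guo Yikun, Huang Xuanjing, Wu Lide. Chinese and English BaseNP identification based on the maximum entropy approach [J]. Journal of Computer Research and Development, 2003, 40(3): 440-446. Cited by: 62
4 Tjong Kim Sang E F. Memory-based shallow parsing [J]. Journal of Machine Learning Research, 2002(2): 559-594.
5 Lü Lin, Zhou Shibin, Liu Yushu. Design and implementation of a high-performance English part-of-speech tagger [J]. Transactions of Beijing Institute of Technology, 2005, 25(10): 876-879. Cited by: 5
6 Xu Yanyong, Zhou Xianzhong, Jing Xianghe, Guo Zhongwei. Chinese sentence analysis based on the maximum entropy model [J]. Acta Electronica Sinica, 2003, 31(11): 1608-1612. Cited by: 16
7 Li Sujian, Liu Qun, Yang Zhifeng. Chunk parsing based on the maximum entropy model [J]. Chinese Journal of Computers, 2003, 26(12): 1722-1727. Cited by: 58
8 Tjong Kim Sang E F, Daelemans W, Déjean H, et al. Applying system combination to base noun phrase identification [C] // COLING 2000. Saarbrücken, Germany: Morgan Kaufmann Publishers, 2000: 857-863.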
