
Biomedical Named Entity Recognition Based on Cascaded Classifier Fusion

Bio-entity recognition based on cascade generalization
Abstract: Most models for biomedical named entity recognition rely on a single machine learning algorithm, and their recognition performance is limited. This paper proposes a cascade generalization method that fuses conditional random fields (CRFs) and maximum entropy (Maxent) classifiers. The method exploits the correlation and complementarity between the base classifiers, combines their outputs with an effective feature set, and relearns on the result to obtain the fused model. Experiments show that the fused model outperforms each individual classifier and most of the systems reported at the JNLPBA workshop, reaching an F-measure of 70.7%, which indicates that the fusion method is effective.
Source: Journal of Daqing Petroleum Institute (《大庆石油学院学报》), indexed in CAS and the Peking University Core Journals list, 2011, No. 2, pp. 91-94, 122 (4 pages in total)
Funding: Natural Science Foundation of Heilongjiang Province (F200603)
Keywords: conditional random fields; maximum entropy; classifier fusion (cascade generalization); feature extraction; biomedical named entity recognition
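The fusion idea summarized in the abstract (a base classifier's output is fed, together with the original features, to a second classifier that relearns on it) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which classifier feeds which, so both levels here use a small maximum entropy classifier as a stand-in, whereas the paper fuses a CRF with Maxent. The class `MaxEnt`, the functions `token_features`, `cascade_features`, `train_cascade` and `tag`, the feature set, and the toy sentences are all hypothetical.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Tiny multinomial logistic regression (maximum entropy) over sparse binary features."""

    def __init__(self, labels, epochs=200, lr=0.2):
        self.labels = list(labels)
        self.epochs = epochs
        self.lr = lr
        self.w = defaultdict(float)  # (feature, label) -> weight

    def predict_proba(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def predict(self, feats):
        p = self.predict_proba(feats)
        return max(self.labels, key=lambda y: p[y])

    def fit(self, X, Y):
        # Stochastic gradient ascent on the conditional log-likelihood.
        for _ in range(self.epochs):
            for feats, gold in zip(X, Y):
                p = self.predict_proba(feats)
                for y in self.labels:
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.w[(f, y)] += self.lr * grad
        return self


def token_features(sent, i):
    """Hypothetical level-0 features: word identity plus simple orthography."""
    w = sent[i]
    return {"bias", "w=" + w.lower(), "cap=" + str(w[0].isupper()),
            "digit=" + str(any(c.isdigit() for c in w))}


def cascade_features(sent, i, base_tags):
    """Level-1 features: the original features plus the base classifier's tags."""
    feats = set(token_features(sent, i))
    feats.add("base=" + base_tags[i])
    feats.add("base_prev=" + (base_tags[i - 1] if i > 0 else "BOS"))
    feats.add("base_next=" + (base_tags[i + 1] if i + 1 < len(sent) else "EOS"))
    return feats


def base_predict(sent, base):
    return [base.predict(token_features(sent, i)) for i in range(len(sent))]


def train_cascade(sentences, tag_seqs, labels):
    Y = [t for seq in tag_seqs for t in seq]
    X0 = [token_features(s, i) for s in sentences for i in range(len(s))]
    base = MaxEnt(labels).fit(X0, Y)
    # Simplification: base predictions are made on the training data itself;
    # a careful setup would obtain them by cross-validation.
    X1 = []
    for s in sentences:
        base_tags = base_predict(s, base)
        X1.extend(cascade_features(s, i, base_tags) for i in range(len(s)))
    meta = MaxEnt(labels).fit(X1, Y)
    return base, meta


def tag(sentence, base, meta):
    base_tags = base_predict(sentence, base)
    return [meta.predict(cascade_features(sentence, i, base_tags))
            for i in range(len(sentence))]


# Toy data in a JNLPBA-like BIO scheme (invented for illustration only).
LABELS = ["O", "B-DNA", "I-DNA", "B-protein", "I-protein"]
sentences = [["IL-2", "gene", "expression"], ["the", "NF-kappaB", "protein"]]
tags = [["B-DNA", "I-DNA", "O"], ["O", "B-protein", "I-protein"]]

base, meta = train_cascade(sentences, tags, LABELS)
print(tag(["the", "IL-2", "gene"], base, meta))
```

The design choice worth noting is in `cascade_features`: the second-level classifier sees the base tags of the current and neighbouring tokens alongside the original features, which is one way it could learn to correct systematic errors of the base classifier.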


