
Biomedical Named Entity Recognition Based on Word Representation Methods (cited by 19)

Research of Word Representations on Biomedical Named Entity Recognition
Abstract: Biomedical named entity recognition is a prerequisite for biomedical information extraction. Most current entity recognition methods are based on machine learning and depend on features crafted by hand from domain knowledge and experience. Choosing suitable features requires repeated experiments, and these features rarely exploit deep semantic information. To investigate the effect of semantic information on named entity recognition, this paper trains on a large-scale unlabeled corpus, which can be downloaded from public databases such as PubMed, to obtain semantic information automatically. This yields three kinds of word representations: word embeddings, clusters based on word embeddings, and Brown clusters. These representations are used as features of CRF and SVM models for semi-supervised learning, and comparative experiments are conducted under the same conditions with respect to the word embedding dimension and the number of clusters. The experimental results show that word representations can effectively learn latent semantic information and thereby improve the performance of existing machine-learning-based entity recognition systems. Without using a dictionary or any other external resources, the system reaches a precision, recall, and F-score of 91.24%, 85.80%, and 88.44%, respectively, on the public evaluation corpus BioCreative II GM.
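The abstract describes turning three unsupervised word representations into features for CRF and SVM taggers. The sketch below illustrates that idea under stated assumptions: the embedding vectors and Brown bit-string paths are made-up toy values (in the paper they are learned from unlabeled text), plain k-means is assumed as the clustering method over embeddings because the abstract does not name one, and the feature names and Brown prefix lengths (4, 6, 10) are hypothetical choices. It only builds per-token feature dictionaries of the kind CRF toolkits consume; it does not train a CRF or SVM. It also checks the reported F-score as the harmonic mean of precision and recall.

```python
import math
import random

# Hypothetical toy representations; real ones would be learned from unlabeled text.
EMBEDDINGS = {
    "p53":     [0.90, 0.10, 0.00],
    "BRCA1":   [0.85, 0.15, 0.05],
    "protein": [0.20, 0.80, 0.10],
    "gene":    [0.25, 0.75, 0.05],
    "binds":   [0.00, 0.10, 0.90],
    "the":     [0.05, 0.05, 0.95],
}
# Hypothetical word -> bit-string path in a Brown cluster hierarchy.
BROWN_PATHS = {
    "p53": "1101001", "BRCA1": "1101011", "protein": "1100",
    "gene": "1100", "binds": "0110", "the": "0010",
}


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


WORDS = sorted(EMBEDDINGS)
# Assumed clustering step: k-means over the embeddings gives a cluster id per word.
CLUSTER_ID = dict(zip(WORDS, kmeans([EMBEDDINGS[w] for w in WORDS], k=3)))


def token_features(tokens, i, prefix_lengths=(4, 6, 10)):
    """Feature dict for tokens[i] combining surface cues and word representations."""
    w = tokens[i]
    feats = {"word.lower": w.lower(),
             "is_upper_start": w[:1].isupper(),
             "has_digit": any(ch.isdigit() for ch in w)}
    # 1) Word embedding: each dimension becomes a real-valued feature.
    for d, val in enumerate(EMBEDDINGS.get(w, [])):
        feats[f"emb[{d}]"] = val
    # 2) Cluster over embeddings: one categorical feature.
    if w in CLUSTER_ID:
        feats["emb_cluster"] = CLUSTER_ID[w]
    # 3) Brown cluster: bit-path prefixes give coarse-to-fine word classes.
    path = BROWN_PATHS.get(w)
    if path:
        for p in prefix_lengths:
            feats[f"brown[:{p}]"] = path[:p]
    return feats


def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


sentence = ["BRCA1", "binds", "the", "protein"]
features = [token_features(sentence, i) for i in range(len(sentence))]
print(features[0]["brown[:4]"])          # → 1101
print(round(f_score(91.24, 85.80), 2))   # → 88.44
```

In this sketch the real-valued embedding dimensions can be fed to either model directly, while cluster ids and Brown prefixes are categorical and would typically become indicator features; shorter Brown prefixes group words into coarser classes, which is one way such features can generalize to entity names unseen in the labeled data.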
Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems), CSCD, Peking University Core Journal, 2016, No. 2, pp. 302-307 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61173101, 61173100)
Keywords: semi-supervised; word representation; cluster; entity recognition

