
Text Information Extraction Based on Features and Hidden Markov Models

Information Extraction Based on Character Extraction and HMMs
Abstract: This paper proposes a new text information extraction technique based on text blocks. The technique uses the semantic and structural features of the text to identify the states that have distinguishing features. Building on that result, an improved hidden Markov model (HMM) is then used to extract the remaining states, which have no such features. The method was tested on 100 computer science paper headers from the dataset provided by the CORA search engine research group at CMU in the USA. The results show that both precision and recall improve over methods based on words and traditional HMMs, and that efficiency improves as well.
Source: Journal of Henan University of Science and Technology: Natural Science (CAS), 2008, No. 2, pp. 55-57, 70 (4 pages in total)
Funding: Jilin Province Science and Technology Development Plan Project (20050527)
Keywords: text block; feature extraction; hidden Markov models (HMMs)
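
The two-stage idea described in the abstract can be illustrated with a minimal sketch: blocks that carry a clear feature are labeled first, and an HMM then fills in the remaining blocks with the already-labeled ones held fixed. Everything specific here is an assumption made for illustration and not taken from the paper: the state names, the regular-expression feature rules, the hand-set probabilities, the toy emission score, and the choice to apply the stage-one labels as hard constraints inside Viterbi decoding. The abstract does not say how the paper's "improved" HMM actually works.

```python
import math
import re

# Hypothetical header fields used as HMM states (not the paper's state set).
STATES = ["title", "author", "affiliation", "email"]

def label_by_feature(block):
    """Stage 1: label a block only if it shows an assumed surface feature."""
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", block):
        return "email"
    if re.search(r"\b(University|Institute|Department)\b", block):
        return "affiliation"
    return None  # no distinguishing feature; left to the HMM

# Illustrative, hand-set probabilities (not estimated from any data).
START = {"title": 0.85, "author": 0.05, "affiliation": 0.05, "email": 0.05}
TRANS = {
    "title": {"title": 0.2, "author": 0.7, "affiliation": 0.05, "email": 0.05},
    "author": {"title": 0.05, "author": 0.3, "affiliation": 0.5, "email": 0.15},
    "affiliation": {"title": 0.05, "author": 0.25, "affiliation": 0.3, "email": 0.4},
    "email": {"title": 0.05, "author": 0.35, "affiliation": 0.3, "email": 0.3},
}

def emission(state, block):
    """Toy block-level likelihood score based only on word count."""
    n = len(block.split())
    if state == "title":
        return 0.8 if n >= 5 else 0.2
    if state == "author":
        return 0.8 if n < 5 else 0.2
    return 0.1  # featureless blocks are unlikely to be affiliation or email

def decode(blocks):
    """Stage 2: Viterbi decoding with stage-one labels held fixed."""
    if not blocks:
        return []
    fixed = [label_by_feature(b) for b in blocks]

    def allowed(t):
        return [fixed[t]] if fixed[t] else STATES

    def emit(s, t):
        # A fixed block contributes no emission term (log 1 = 0).
        return 0.0 if fixed[t] else math.log(emission(s, blocks[t]))

    best = [{s: math.log(START[s]) + emit(s, 0) for s in allowed(0)}]
    back = [{}]
    for t in range(1, len(blocks)):
        best.append({})
        back.append({})
        for s in allowed(t):
            prev = max(best[t - 1],
                       key=lambda p: best[t - 1][p] + math.log(TRANS[p][s]))
            best[t][s] = best[t - 1][prev] + math.log(TRANS[prev][s]) + emit(s, t)
            back[t][s] = prev

    # Trace the best path backwards from the final block.
    state = max(best[-1], key=best[-1].get)
    path = [state]
    for t in range(len(blocks) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

header = [
    "Learning Hidden Markov Model Structure for Information Extraction",
    "Jane Doe",
    "Department of Computer Science, Example University",
    "jane@example.edu",
]
print(decode(header))
```

Pinning the feature-labeled blocks reduces the number of candidate states at those positions to one, which is one plausible reading of why a block-level, feature-first approach could reduce decoding work compared with a word-level HMM.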
