News Text Event Extraction Driven by Event Sample
(Chinese title: 基于事件实例驱动的新闻文本事件抽取) Cited by: 12
Abstract: Current event extraction methods are typically driven by event arguments or trigger words. This tends to produce an imbalance between positive and negative examples, and it suffers from data sparseness when the corpus is small. This paper proposes an event extraction method driven by event instances (event samples). First, representative features that characterize the occurrence of an event are extracted from the sentences of a document to form the representation of each candidate event instance. Second, a binary classifier separates event instances from non-event instances in the news text. Finally, the event instances are clustered with a k-medoids algorithm based on hierarchical clustering to complete event extraction. The method overcomes the positive/negative imbalance and data sparseness problems, and it removes the need to predefine event types. Experimental results confirm its effectiveness: compared with traditional methods, both the precision and the recall of event extraction improve significantly.
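Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2011, No. 8, pp. 232-235 (4 pages)
Funding: Supported by the Major Program of the National Social Science Foundation of China (09&ZD014) and the National 863 Program of China (2007AA01Z439)
Keywords: event sample; classification; news text; clustering; event extraction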
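
The final clustering stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how hierarchical clustering and k-medoids are combined, so this sketch assumes one plausible reading: average-link agglomerative clustering decides the number of clusters and supplies the initial medoids, and k-medoids then refines the assignment. The feature-set representation, the Jaccard distance, the `threshold` value, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import FrozenSet, List

# Hypothetical representation: each event instance is a set of feature strings.
Instance = FrozenSet[str]


def distance(a: Instance, b: Instance) -> float:
    """Jaccard distance between two feature sets (an assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def medoid(cluster: List[int], data: List[Instance]) -> int:
    """Index of the member with the smallest total distance to the other members."""
    return min(cluster, key=lambda i: sum(distance(data[i], data[j]) for j in cluster))


def agglomerate(data: List[Instance], threshold: float) -> List[List[int]]:
    """Average-link agglomerative clustering.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold`, so the number of clusters is not fixed in advance.
    """
    clusters = [[i] for i in range(len(data))]

    def link(c1: List[int], c2: List[int]) -> float:
        total = sum(distance(data[i], data[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


def k_medoids(data: List[Instance], medoids: List[int], max_iter: int = 20) -> List[List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for i in range(len(data)):
            nearest = min(range(len(medoids)),
                          key=lambda m: distance(data[i], data[medoids[m]]))
            clusters[nearest].append(i)
        # Keep the old medoid if a cluster happens to end up empty.
        updated = [medoid(c, data) if c else m for c, m in zip(clusters, medoids)]
        if updated == medoids:
            break
        medoids = updated
    return clusters


def extract_events(data: List[Instance], threshold: float = 0.7) -> List[List[int]]:
    """Seed k-medoids with the medoids of the hierarchical clusters."""
    seeds = [medoid(c, data) for c in agglomerate(data, threshold)]
    return k_medoids(data, seeds)


# Toy event instances (invented for illustration only).
instances = [
    frozenset({"earthquake", "Sichuan", "casualties"}),
    frozenset({"earthquake", "Sichuan", "rescue"}),
    frozenset({"earthquake", "rescue", "casualties"}),
    frozenset({"flood", "Yangtze", "evacuation"}),
    frozenset({"flood", "Yangtze", "rainfall"}),
]
event_clusters = extract_events(instances)
```

On this toy input the three earthquake instances and the two flood instances end up in separate clusters. Because the stopping threshold, rather than a preset k, determines how many clusters emerge, the sketch is consistent with the abstract's claim that event types need not be defined in advance.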