Abstract
Current mainstream approaches to event extraction are driven by event arguments or trigger words. These approaches tend to produce an imbalance between positive and negative examples, and they suffer from data sparseness when the corpus is small. This paper proposes an event extraction method driven by event instances. First, representative features that characterize the occurrence of an event are extracted from the sentences of a document to form candidate event instance representations. Second, a binary classifier separates event instances from non-event instances in news text. Finally, the event instances are clustered with a k-medoids algorithm based on hierarchical clustering to complete event extraction. The method not only overcomes the imbalance of positive and negative examples and the data sparseness problem, but also removes the limitation of having to define event types in advance. Experimental results confirm the effectiveness of the method: compared with traditional methods, both the precision and the recall of event extraction improve significantly.
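The final clustering step can be illustrated with a minimal sketch. The abstract does not specify how hierarchical clustering and k-medoids are combined, so the scheme below is an assumption: average-link agglomerative clustering runs until clusters are farther apart than a distance threshold, which fixes the number of clusters without predefined event types, and the medoid of each resulting cluster seeds a standard k-medoids refinement. The function names, the Euclidean distance, the `threshold` parameter, and the toy feature vectors are all hypothetical and are not taken from the paper. The input is assumed to contain only the instances that the binary classifier has already accepted as events.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def medoid(points, members):
    """Return the member index with the smallest total distance to the other members."""
    return min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))


def agglomerate(points, threshold):
    """Average-link agglomerative clustering (assumed linkage).

    Merging stops once the closest pair of clusters is farther apart than threshold.
    """
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        if link(a, b) > threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


def k_medoids(points, medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    for _ in range(max_iter):
        groups = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            groups[nearest].append(i)
        # Keep the old medoid if a group ended up empty (possible with duplicate points).
        new_medoids = [medoid(points, g) if g else m for m, g in groups.items()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return groups


def cluster_event_instances(points, threshold):
    """Seed k-medoids with the medoids of the hierarchical clusters (assumed combination)."""
    seeds = [medoid(points, c) for c in agglomerate(points, threshold)]
    return k_medoids(points, seeds)


# Toy feature vectors standing in for event instances (hypothetical data).
instances = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(cluster_event_instances(instances, threshold=3.0))
# prints {0: [0, 1, 2], 3: [3, 4, 5]}
```

Each key of the result is the index of a medoid, that is, the instance that best represents one extracted event, and each value lists the instances assigned to it. Because the threshold determines the number of clusters, no event types need to be fixed in advance, which matches the motivation stated in the abstract.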
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2011, Issue 8, pp. 232-235 (4 pages)
Computer Science
Funding
Major Project of the National Social Science Fund of China (09&ZD014)
National High-Tech R&D Program of China (863 Program) (2007AA01Z439)
Keywords
Event instance
Classification
News text
Clustering
Event extraction