Abstract
Event extraction is a classic information extraction task in natural language processing. It helps people obtain structured event information from massive text data quickly and accurately, and it plays an important role in applications such as event logic graph construction, public opinion monitoring, and situation awareness. Because events are complex in composition, a document often contains several related event sentences. If extraction is performed on single sentences only, it is difficult to recover all the information about an event that is scattered across the document. To address this problem, this paper proposes a document-level event extraction method based on global semantic matching. First, a sequence labeling model based on a long short-term memory network and a conditional random field (LSTM-CRF) performs sentence-level event extraction. Second, building on the sentence-level results, the proposed global semantic matching method judges event coreference, and the sentence-level event information is fused to obtain complete event information. Finally, the model is evaluated on the MUC-4 event extraction dataset. The results show that the proposed method extracts event elements scattered across a document more accurately and achieves a clear improvement in F1 score.
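The fusion step described in the abstract (judge whether sentence-level events are coreferent, then merge their arguments into one document-level event) can be illustrated with a minimal sketch. The abstract does not specify how global semantic matching is computed, so the sketch below substitutes a simple Jaccard overlap of argument strings as a stand-in similarity. The names `SentenceEvent`, `similarity`, `fuse`, the threshold value, and the sample data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceEvent:
    """A sentence-level extraction result: event type plus role -> argument strings."""
    event_type: str
    arguments: dict = field(default_factory=dict)


def similarity(x: SentenceEvent, y: SentenceEvent) -> float:
    """Stand-in for semantic matching (an assumption, not the paper's model):
    Jaccard overlap of lowercased argument strings, 0.0 if event types differ."""
    if x.event_type != y.event_type:
        return 0.0
    xa = {a.lower() for values in x.arguments.values() for a in values}
    ya = {a.lower() for values in y.arguments.values() for a in values}
    if not xa or not ya:
        return 0.0
    return len(xa & ya) / len(xa | ya)


def fuse(events, threshold=0.3):
    """Greedily merge each sentence-level event into the best-matching
    document-level event, or start a new one if no match reaches the threshold."""
    document_events = []
    for ev in events:
        best, best_score = None, threshold
        for cand in document_events:
            score = similarity(ev, cand)
            if score >= best_score:
                best, best_score = cand, score
        if best is None:
            # Copy the argument sets so the input events are not mutated later.
            copied = {role: set(values) for role, values in ev.arguments.items()}
            document_events.append(SentenceEvent(ev.event_type, copied))
        else:
            # Treated as coreferent: union the arguments role by role.
            for role, values in ev.arguments.items():
                best.arguments.setdefault(role, set()).update(values)
    return document_events


# Hypothetical sentence-level results from one document.
sample = [
    SentenceEvent("attack", {"perpetrator": {"guerrillas"}, "target": {"embassy"}}),
    SentenceEvent("attack", {"target": {"embassy"}, "weapon": {"bomb"}}),
    SentenceEvent("kidnapping", {"victim": {"mayor"}}),
]
fused = fuse(sample)
print(len(fused))  # prints 2: the two attack sentences are merged
```

In this sketch the two attack sentences share the argument "embassy", so their overlap (1 of 3 distinct strings) passes the assumed threshold and their roles are merged into one record. A learned semantic matcher would replace `similarity` without changing the merge logic.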
Authors
GAO Bing (高兵)
HUANGFU Nan (皇甫楠)
ZOU Qi-jie (邹启杰)
QIN Jing (秦静)
Affiliations: School of Information Engineering, Dalian University, Dalian 116622, China; Dalian Key Laboratory of Smart Healthcare and Health, Dalian University, Dalian 116622, China; School of Software Engineering, Dalian University, Dalian 116622, China
Source
Computer Technology and Development (《计算机技术与发展》)
2023, Issue 7, pp. 154-159 (6 pages)
Funding
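Young Scientists Fund of the National Natural Science Foundation of China (62002038)
Liaoning Province Scientific Research Funding Project (LJKZ1180)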
Keywords
event extraction
document-level event extraction
global semantic matching
argument identification
information fusion
machine learning