Abstract
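Event extraction plays an important role in natural language processing applications such as stock market trend prediction. Traditional event extraction focuses on whether the types of triggers and arguments are correct, and seldom analyzes extraction quality and practical value in light of application needs. In the financial domain, the objects that events act on and the actions themselves are the main concern. This paper therefore focuses on financial events and extracts triple events ET(Sub, Pred, Obj). Chinese financial news contains many instances of event nesting and component sharing, which easily lead to omitted events and missing event components. To address these problems, this paper builds a Chinese event extraction framework that combines syntactic and semantic dependency parsing, summarizes four common default structures, and designs corresponding completion rules. First, based on the syntactic dependency tree, it analyzes the lexical and syntactic structure of verbs and builds core verb chains so that each core verb corresponds to one event, solving the event omission problem. Second, it adds semantic dependency relations to the syntactic dependency tree to establish semantic associations between events, yielding the Syntactic Semantic Dependency Parsing (SSDP) tree. Third, it adjusts the SSDP tree and optimizes the syntactic structure to form the SSDP graph, in which word nodes with equivalent syntactic structure lie at the same level, providing a path for subsequent event extraction. Fourth, it summarizes four common default structures and designs corresponding completion rules to solve the problem of missing event components. Finally, detailed experiments on Chinese financial news headlines and the CoNLL2009 Chinese corpus show that the method is effective.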
As a sub-task of information extraction, event extraction plays an important role in natural language processing applications such as stock market trend forecasting, where it provides strong clues that help event users (e.g., investors, managers, and governments) analyze the market and make decisions. At present, most studies of event extraction focus on whether the types of triggers and arguments are correct, and seldom consider the effect and value of event extraction in light of application requirements. We call this traditional event extraction. Its event types and standards are derived from ACE2005 (8 categories and 33 sub-categories), KBP2015, ERE, and others. However, these standards are of limited use for event extraction in the specific domain of finance. For example, ACE2005 has no "overweight" event type, although overweighting is a behavior specific to the financial domain. In this paper, we focus on financial news and extract open events without types. In finance and economics, most event users care chiefly about the objects that events act on and the actions themselves. Therefore, in line with this application requirement, we propose to extract financial events as triples ET(Sub, Pred, Obj), where Sub, Pred, and Obj denote the subject, predicate, and object, respectively. However, Chinese financial news commonly exhibits event nesting and component default, which lead to omitted events and missing key event elements. To tackle these issues, drawing on the expression habits and characteristics of Chinese, we build a Chinese event extraction framework based on syntactic and semantic dependency parsing, summarize four common default structures, and design corresponding completion rules. Specifically, we first summarize four prominent phenomena in extracting events from financial news headlines and trace their causes to the fact that the relevance between syntactic and semantic structure is either not analyzed in depth or absent altogether. Second, using the syntactic dependency parse tree and lexical structure, we propose core verb chains, which ensure that each core verb corresponds to one event and thus solve the event omission problem. Third, we add semantic dependency relations between events to the syntactic dependency tree, yielding what we call the Syntactic Semantic Dependency Parsing (SSDP) tree. To better separate the detected events and their properties, we adjust and optimize the SSDP tree into the SSDP graph, in which word nodes with the same syntactic structure sit at the same level, providing a path for subsequent event extraction. Fourth, following the linguistic classification of default structures, we summarize four common default structures and propose ten corresponding completion rules to solve the component default problem; the complete Chinese event extraction algorithm based on the SSDP graph is given at the end of that section. Finally, we describe the experiments in detail, including the dataset, labeling standard, and evaluation metrics. The method is verified on two datasets, financial news titles and general-domain news titles, and we conduct comprehensive benchmarks on Chinese financial news titles and the CoNLL2009 Chinese corpus. The experimental results show that the proposed methods are effective.
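To make the triple representation concrete, the following is a minimal Python sketch of ET(Sub, Pred, Obj) and of how a defaulted component might be filled in along a chain of events. The `Event` class, the `complete_shared_subject` function, and the toy headline are all hypothetical names and data introduced here for illustration; the single rule shown (inherit the nearest preceding subject) is an assumed stand-in and is not one of the paper's ten completion rules, which the abstract does not spell out.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """An open event triple ET(Sub, Pred, Obj); None marks a defaulted component."""
    sub: Optional[str]
    pred: str
    obj: Optional[str]


def complete_shared_subject(chain: List[Event]) -> List[Event]:
    """Fill a missing subject from the nearest preceding event that has one.

    Hypothetical rule for illustration only; it is not taken from the paper.
    """
    completed: List[Event] = []
    last_sub: Optional[str] = None
    for event in chain:
        if event.sub is None and last_sub is not None:
            # Share the subject of the closest earlier event in the chain.
            event = replace(event, sub=last_sub)
        if event.sub is not None:
            last_sub = event.sub
        completed.append(event)
    return completed


# Invented toy headline: "Company A acquires Company B and raises its dividend".
# One event per core verb; the second event's subject is defaulted.
chain = [
    Event(sub="Company A", pred="acquires", obj="Company B"),
    Event(sub=None, pred="raises", obj="dividend"),
]
events = complete_shared_subject(chain)
```

In this sketch each core verb yields exactly one `Event`, mirroring the one-verb-one-event idea of the core verb chain; a real system would derive the chain and the sharing relations from the SSDP graph rather than from list order.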
Authors
万齐智
万常选
胡蓉
刘德喜
WAN Qi-Zhi; WAN Chang-Xuan; HU Rong; LIU De-Xi (School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330032; School of Software and Internet of Things Engineering, Jiangxi University of Finance and Economics, Nanchang 330032; Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013)
Source
《计算机学报》
EI
CSCD
Peking University Core Journals (北大核心)
2021, Issue 3, pp. 508-530 (23 pages)
Chinese Journal of Computers
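Funding
National Natural Science Foundation of China (61972184, 61562032, 61762042)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ180198, GJJ180252)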
Keywords
Chinese event extraction (中文事件抽取)
core verb chain (核心动词链)
syntactic semantic dependency parsing graph (句法语义依存分析图)
event semantics relevance (事件语义关联)
default complement (缺省补全)