
Fin-BERT-Based Event Extraction Method for the Chinese Financial Domain
Abstract Event extraction aims to extract content of interest to humans from large volumes of unstructured, event-related text. Most existing event extraction methods are built on general-domain corpora and rarely take domain-specific prior knowledge into account. Most also handle documents that contain multiple events poorly, and perform badly on test sets with many negative examples. To address these problems, this paper proposes Fin-PTPCG, a model based on Fin-BERT (financial bidirectional encoder representation from Transformers) and PTPCG (pseudo-trigger-aware pruned complete graph). The method exploits the representational capacity of the pre-trained Fin-BERT model to incorporate domain-specific prior knowledge at the encoding stage. In the event detection module, multiple binary classifiers are stacked so that the model can identify documents containing multiple events and filter out negative examples. After entities are extracted, they are connected into a complete graph, which is pruned by computing a similarity matrix; pseudo-triggers are selected to address the absence of annotated trigger words. Finally, an event classifier completes the extraction. On the ChFinAnn and DuEE-fin datasets, the method improves the event extraction F1 score over the baseline methods by 0.7 and 3.7 percentage points, respectively.
Authors LI Yi (李熠); GENG Chaoyang (耿朝阳); YANG Dan (杨丹) (School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, PKU Core; 2024, No. 14, pp. 123-132 (10 pages)
Keywords event extraction; event detection; information extraction; natural language processing
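The abstract names two mechanisms that a small example may make more concrete: event detection with one binary classifier per event type, and pruning a complete entity graph with a similarity matrix. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the 0.5 threshold, the toy logits and embeddings, and the use of cosine similarity are all hypothetical; the abstract does not say which similarity function or thresholds the paper uses.

```python
from __future__ import annotations

import math
from itertools import combinations


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def detect_event_types(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Make one independent yes/no decision per event type.

    Because each type has its own binary classifier, a document can be
    assigned several types at once, or none (a negative example).
    The threshold is an assumed value.
    """
    return [etype for etype, z in logits.items() if sigmoid(z) >= threshold]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_complete_graph(
    embeddings: dict[str, list[float]], threshold: float
) -> set[tuple[str, str]]:
    """Start from the complete graph over entities and keep only the
    edges whose pairwise similarity reaches the threshold.

    Cosine similarity is an assumption standing in for the paper's
    similarity computation.
    """
    kept = set()
    for a, b in combinations(sorted(embeddings), 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            kept.add((a, b))
    return kept


# Toy inputs (hypothetical values, for illustration only).
doc_logits = {"EquityPledge": 2.0, "EquityFreeze": -1.0, "EquityOverweight": 0.3}
negative_logits = {"EquityPledge": -3.0, "EquityFreeze": -0.1}
entity_vectors = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}

detected = detect_event_types(doc_logits)
detected_negative = detect_event_types(negative_logits)
edges = prune_complete_graph(entity_vectors, threshold=0.5)
```

With these toy values, the first document is assigned two event types, the second is assigned none, and only the edge between the two near-parallel entity vectors survives pruning.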
