
Information extraction method for railway equipment accidents based on multi-dimensional character feature representation (cited by: 5)
Abstract: To address the difficulty of analyzing data in railway equipment accident investigation reports, an accident information extraction method based on multi-dimensional character feature representation is proposed. In the data preprocessing stage, a subject pattern matching method extracts the subject paragraphs to which named entities belong. For text feature representation, a multi-dimensional feature representation method transforms text into feature vectors. A named entity recognition model for railway equipment accidents is then trained with a bidirectional long short-term memory (BiLSTM) network combined with a conditional random field (CRF), and validated experimentally on railway equipment accident investigation reports. The results show that subject pattern matching preprocessing improves the comprehensive evaluation index of the multi-dimensional character feature + BiLSTM + CRF model by 22.86%, and that, compared with word2vec feature representation, multi-dimensional character feature representation improves the comprehensive evaluation index of the BiLSTM + CRF model by 4.89%.
Author: ZHANG Pengxiang (张鹏翔) (Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
Source: China Safety Science Journal (《中国安全科学学报》), indexed in CAS, CSCD, PKU Core; 2022, Issue 6, pp. 109-114 (6 pages)
Keywords: multi-dimensional character feature; railway equipment accident; information extraction; subject pattern matching; named entity recognition
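
The abstract names two steps that come before the BiLSTM+CRF model: subject pattern matching to isolate the relevant paragraphs, and a multi-dimensional representation of each character. The following is a minimal Python sketch of what those two steps might look like, not the author's implementation. The section headings in `SUBJECT_PATTERNS`, the function names, and the chosen feature dimensions (character, digit flag, punctuation flag, neighboring characters) are all assumptions made for illustration; the abstract does not specify them.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical subject headings; the paper's actual patterns are not given in the abstract.
SUBJECT_PATTERNS = {
    "overview": re.compile(r"^[一二三四五六七八九十]+、\s*事故概况"),
    "cause": re.compile(r"^[一二三四五六七八九十]+、\s*事故原因"),
}
# Any numbered heading such as "三、..." closes the current subject section.
ANY_HEADING = re.compile(r"^[一二三四五六七八九十]+、")


def extract_subject_paragraphs(report: str) -> dict[str, list[str]]:
    """Group the paragraphs of a report under the subject heading they follow."""
    sections: dict[str, list[str]] = {name: [] for name in SUBJECT_PATTERNS}
    current = None
    for raw_line in report.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if ANY_HEADING.match(line):
            # Switch to the matching subject, or to None for headings of no interest.
            current = next(
                (name for name, pat in SUBJECT_PATTERNS.items() if pat.match(line)),
                None,
            )
            continue
        if current is not None:
            sections[current].append(line)
    return sections


def char_features(text: str) -> list[dict[str, object]]:
    """Build one feature dictionary per character (assumed feature dimensions)."""
    feats = []
    for i, ch in enumerate(text):
        feats.append({
            "char": ch,
            "is_digit": ch.isdigit(),
            "is_punct": unicodedata.category(ch).startswith("P"),
            "prev": text[i - 1] if i > 0 else "<BOS>",
            "next": text[i + 1] if i + 1 < len(text) else "<EOS>",
        })
    return feats
```

In a full pipeline, each feature dictionary would presumably be mapped to an embedding or one-hot vector and the concatenated vectors fed to the sequence labeling model; that mapping is outside the scope of this sketch.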
