Abstract
To address the difficulty of analyzing data in railway equipment accident investigation reports, an accident information extraction method based on multi-dimensional character feature representation is proposed. In the data preprocessing stage, a subject pattern matching method extracts the subject paragraphs to which named entities belong. For text feature representation, a multi-dimensional feature representation method transforms text into feature vectors. A named entity recognition model for railway equipment accidents is then trained with a bidirectional long short-term memory (BiLSTM) network combined with a conditional random field (CRF). Finally, railway equipment accident investigation reports are used for experimental verification. The results show that subject pattern matching preprocessing improves the comprehensive evaluation index of the multi-dimensional character feature + BiLSTM + CRF model by 22.86%, and that, compared with word2vec feature representation, the multi-dimensional character feature representation improves the comprehensive evaluation index of the BiLSTM + CRF model by 4.89%.
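The abstract does not give the details of the subject pattern matching step, so the following is only a minimal sketch of one way such preprocessing could work: numbered section headings are matched against a small table of topic titles, and only the paragraphs under those headings are kept for later named entity recognition. The heading format, the topic titles, and the names `HEADING`, `TOPICS`, and `extract_topic_paragraphs` are all assumptions made for illustration, not the author's implementation.

```python
import re

# Assumed heading format: a number, a dot (or the CJK list mark), then a title.
HEADING = re.compile(r"^\s*\d+[.、]\s*(?P<title>.+?)\s*$")

# Hypothetical topic titles mapped to short keys; a real report would need
# its own table of heading patterns.
TOPICS = {
    "accident overview": "overview",
    "cause analysis": "cause",
}


def extract_topic_paragraphs(report):
    """Return {topic_key: text} for sections whose heading matches a topic."""
    sections = {}
    current = None  # key of the topic section being read, or None
    for line in report.splitlines():
        match = HEADING.match(line)
        if match:
            # Every numbered heading starts a new section; sections that are
            # not in TOPICS reset `current` to None and are skipped.
            current = TOPICS.get(match.group("title").lower())
            if current is not None:
                sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {key: " ".join(lines) for key, lines in sections.items()}


sample_report = (
    "Report No. 12\n"
    "1. Accident overview\n"
    "Turnout failed at station A.\n"
    "Train delayed.\n"
    "2. Handling\n"
    "Crew dispatched.\n"
    "3. Cause analysis\n"
    "Worn switch rail.\n"
)
topic_paragraphs = extract_topic_paragraphs(sample_report)
```

In this sketch any line that begins with a number and a dot is treated as a heading, so numbered items inside a paragraph would end a section early; a production version would need stricter patterns.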
Author
张鹏翔
ZHANG Pengxiang (Standards & Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
Source
《中国安全科学学报》
CAS
CSCD
PKU Core Journal (北大核心)
2022, Issue 6, pp. 109-114 (6 pages)
China Safety Science Journal
Keywords
multi-dimensional character feature (多维字符特征)
railway equipment accident (铁路设备事故)
information extraction (信息抽取)
subject pattern matching (主题模式匹配)
named entity recognition (命名实体识别)