Construction of a Tea Knowledge Graph Integrating BERT-WWM and an Attention Mechanism

Cited by: 3
Abstract: [Objective] To address the incomplete corpus databases, weak aggregation of multi-source heterogeneous data, and difficulty of knowledge sharing in the tea domain, this study proposes a tea knowledge graph construction method based on the BERT-WWM-BiLSTM-AttTea-CRF model. [Method] The randomly masked BERT layer in the pre-trained model was replaced with a BERT-WWM (Whole Word Masking) layer. Based on the global text features of the tea-domain corpus, an attention layer that assigns weights to key tea entities was designed to improve the accuracy of text extraction. Finally, a conditional random field (CRF) classified and extracted the entities in each sequence, completing the pipeline for Chinese tea entity recognition. [Result] The BERT-WWM-BiLSTM-AttTea-CRF model effectively identified entities in tea knowledge text, and its entity extraction on unstructured tea data outperformed mainstream models such as RoBERTa_BiLSTM_CRF and ALBERT_BiLSTM_CRF, with precision, recall, and F1 score of 92.03%, 90.36%, and 91.19%, respectively. The improved model markedly raised the recognition rate for tea variety data and tea disease data, with F1 scores of 94.32% and 94.05%, respectively. [Conclusion] The tea knowledge graph constructed in this study offers wide data coverage, strong aggregation ability, and a complete system, and is of significance for building knowledge graphs in specific agricultural domains and for research on extracting Chinese agricultural named entities.
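The difference between random masking and whole word masking described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-segmented sentence, the function names, and the masking rate are hypothetical, and the sketch omits details of real BERT pre-training such as the 80/10/10 replacement scheme. It only shows that character-level masking can hide part of a multi-character Chinese term, while whole word masking hides the term as a unit.

```python
import random

MASK = "[MASK]"

def char_level_mask(words, rate, rng):
    """Mask each character independently, so a word may be only partly hidden."""
    chars = [c for w in words for c in w]
    return [MASK if rng.random() < rate else c for c in chars]

def whole_word_mask(words, rate, rng):
    """Select whole words; every character of a selected word is masked together."""
    out = []
    for w in words:
        if rng.random() < rate:
            out.extend([MASK] * len(w))
        else:
            out.extend(list(w))
    return out

# Hypothetical pre-segmented sentence (word boundaries are an assumption;
# in practice they would come from a Chinese word segmenter).
words = ["茶", "炭疽病", "危害", "茶树", "叶片"]

rng = random.Random(0)
print(char_level_mask(words, 0.3, rng))
print(whole_word_mask(words, 0.3, rng))
```

With whole word masking, a term such as 炭疽病 (anthracnose) is either fully visible or fully masked, which pushes the model to predict the entire domain term from context instead of completing it from its remaining characters.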
Authors: LIU Yong-bo (刘永波); HUANG Qiang (黄强); GAO Wen-bo (高文波); HE Peng (何鹏); XU Yu-sha (许钰莎) (Institute of Agricultural Information and Rural Economy, Sichuan Academy of Agricultural Sciences, Chengdu 610066, China; Sichuan Agricultural University, Ya'an, Sichuan 625014, China)
Source: Southwest China Journal of Agricultural Sciences (《西南农业学报》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue 12, pp. 2912-2921 (10 pages)
Funding: National Key Research and Development Program of China (2020YFD1100601).
Keywords: Tea; Knowledge graph; Conditional random field; Bi-directional long short-term memory; Attention mechanism