Abstract
With the rapid development of digital agriculture, crop named entity recognition, the foundation for building agricultural knowledge graphs, is becoming an efficient way to identify entities in crop research. Crop entities, however, have complex structure, inconsistent mentions, and many confounding factors, all of which severely limit recognition performance in the crop domain. To address this, the paper proposes an entity recognition model based on a pre-trained language model: BERT encodes the words in the text, a bidirectional LSTM (Long Short-Term Memory) network captures the context of each keyword in a sentence, and a CRF (Conditional Random Field) layer captures the dependencies between words. The model is validated on a crop named entity recognition dataset constructed by the authors. Experiments show that the model recognizes crop entities effectively and outperforms existing entity recognition models.
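The role of the CRF layer described above can be illustrated with a minimal Viterbi decoding sketch. It is not the authors' implementation: the tag set (`O`, `B-CROP`, `I-CROP`), the transition scores, and the emission scores are hypothetical values chosen for illustration, and the emission scores merely stand in for what a BERT + BiLSTM encoder would produce for each token.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for a single "CROP" entity type (an assumption).
TAGS = ["O", "B-CROP", "I-CROP"]

NEG = -1e4  # large penalty used to forbid a transition

# TRANSITIONS[prev][cur]: illustrative scores, not learned values.
# "I-CROP" may only follow "B-CROP" or "I-CROP".
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "O":      {"O": 0.0, "B-CROP": 0.0, "I-CROP": NEG},
    "B-CROP": {"O": 0.0, "B-CROP": 0.0, "I-CROP": 0.5},
    "I-CROP": {"O": 0.0, "B-CROP": 0.0, "I-CROP": 0.5},
}

# Score for starting a sentence with each tag; "I-CROP" cannot start one.
START: Dict[str, float] = {"O": 0.0, "B-CROP": 0.0, "I-CROP": NEG}


def greedy_decode(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-scoring tag per token, ignoring neighbouring tags."""
    return [max(TAGS, key=lambda t: e[t]) for e in emissions]


def viterbi_decode(emissions: List[Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the tag sequence with the highest total (emission + transition) score."""
    if not emissions:
        return [], 0.0

    # Best score of any path ending in each tag at the first token.
    score = {t: START[t] + emissions[0][t] for t in TAGS}
    backpointers: List[Dict[str, str]] = []

    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # Best previous tag for reaching `cur` at this position.
            best_prev = max(TAGS, key=lambda p: score[p] + TRANSITIONS[p][cur])
            new_score[cur] = score[best_prev] + TRANSITIONS[best_prev][cur] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag.
    last = max(TAGS, key=lambda t: score[t])
    best_score = score[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_score


# Hypothetical per-token emission scores for a three-token sentence.
emissions = [
    {"O": 0.1, "B-CROP": 1.0, "I-CROP": 1.2},
    {"O": 0.2, "B-CROP": 0.3, "I-CROP": 1.5},
    {"O": 2.0, "B-CROP": 0.1, "I-CROP": 0.4},
]

print(greedy_decode(emissions))      # per-token choice, may violate BIO rules
print(viterbi_decode(emissions)[0])  # sequence-level choice under the transition scores
```

With these made-up scores, per-token decoding starts the sentence with `I-CROP`, which is not a valid BIO sequence, while the sequence-level decoder returns `B-CROP`, `I-CROP`, `O`. In a trained model the transition scores would be learned jointly with the encoder rather than set by hand.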
Authors
Shen Zilei (沈子雷)
Du Yongqiang (杜永强)
College of Information Engineering, Xinyang Agriculture and Forestry University, Xinyang 464000, Henan, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal
2024, Issue 6, pp. 223-229 (7 pages)
Funding
Henan Province Science and Technology Research Project (222102110189).
Keywords
Named entity recognition
BERT pre-trained language model
Bidirectional LSTM (long short-term memory)
Crops