Agricultural question classification model based on BERT word vector and TextCNN (cited by: 9)
Abstract: 【Objective】To study how different combinations of word vectors and deep learning models affect agricultural question classification, and so provide technical support for building an agricultural intelligent question answering system.【Method】Question-and-answer data were crawled from websites such as the Agricultural Planting Network, and 20,000 questions were selected and manually annotated to build an agricultural question classification corpus. Bidirectional encoder representation from transformers (BERT) was used to encode the questions at the character level, and a text convolutional neural network (TextCNN) was used to extract high-dimensional features from the questions for classification.【Result】In the word vector comparison, combining BERT character vectors with TextCNN gave an F1 value of 93.32% for agricultural question classification, 2.1 percentage points higher than with Word2vec character vectors. In the comparison of deep learning models, TextCNN combined with Word2vec and with BERT character vectors reached F1 values of 91.22% and 93.32%, respectively, both better than the other models. In the fine-grained experiment, BERT-TextCNN reached F1 values of 86.06%, 90.56%, 95.04% and 85.55% in the four categories of cultivation technology, field management, soil, fertilizer and water management, and other, respectively, all better than the other deep learning models. As for hyperparameters, the BERT-TextCNN model performed best with convolution kernel sizes of [3,4,5], a learning rate of 5e-5, and 5 iterations. Even with unbalanced data samples, its average classification accuracy on agricultural questions still exceeded 93.00%, which meets the question classification needs of an agricultural intelligent question answering system.【Suggestion】Improve data annotation quality through open source platforms such as Ali NLP; supplement word frequency and document features during classification to improve model accuracy; agriculture-related government departments should strengthen cooperation and actively explore new models for the digital extension and service of agricultural technology.
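The feature extraction step described in the abstract (convolution windows of 3, 4 and 5 characters over per-character vectors, followed by pooling and a four-way classifier) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the embedding width, the number of filters per kernel size, the random weights, and all function names are hypothetical, and random vectors stand in for real BERT outputs. Only the kernel sizes [3,4,5] and the four category names come from the abstract.

```python
import random

# From the abstract: kernel sizes and the four question categories.
KERNEL_SIZES = [3, 4, 5]
CATEGORIES = [
    "cultivation technology",
    "field management",
    "soil, fertilizer and water management",
    "other",
]

# Hypothetical toy settings (real BERT hidden states are much wider).
EMBED_DIM = 8
FILTERS_PER_SIZE = 2


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one kernel over the character sequence, apply ReLU, max-pool over time.

    embeddings: list of T vectors of length D, one per character.
    kernel: list of k vectors of length D, a filter spanning k characters.
    """
    k = len(kernel)
    best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        activation = bias + sum(
            w * x
            for kernel_row, emb_row in zip(kernel, window)
            for w, x in zip(kernel_row, emb_row)
        )
        best = max(best, activation)
    return best


def textcnn_features(embeddings, filters_by_size):
    """Concatenate the pooled feature of every filter of every kernel size."""
    return [
        conv1d_max_pool(embeddings, kernel)
        for size in sorted(filters_by_size)
        for kernel in filters_by_size[size]
    ]


def classify(features, weights, biases):
    """Linear layer over the pooled features, then argmax over class scores."""
    scores = [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights in [-1, 1]; a stand-in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Stand-in for the BERT character vectors of a 12-character question.
question_embeddings = rand_matrix(12, EMBED_DIM)
filters_by_size = {
    k: [rand_matrix(k, EMBED_DIM) for _ in range(FILTERS_PER_SIZE)]
    for k in KERNEL_SIZES
}

features = textcnn_features(question_embeddings, filters_by_size)
weights = rand_matrix(len(CATEGORIES), len(features))
biases = [0.0] * len(CATEGORIES)
predicted = classify(features, weights, biases)
print("predicted category:", CATEGORIES[predicted])
```

Because the weights are random, the printed category is meaningless; the point is the shape of the computation. Each kernel size contributes one pooled value per filter, so the feature vector has a fixed length regardless of question length, which is what lets a single linear layer classify questions of varying length.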
Authors: BAO Tong; LUO Rui; GUO Ting; GUI Shu-ting; REN Ni (Information Center, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu 210014, China; Institute of Science and Technology Information, Jiangsu University, Zhenjiang, Jiangsu 212013, China)
Source: Journal of Southern Agriculture (《南方农业学报》); indexed in CAS, CSCD and the Peking University Core Journals list; 2022, Issue 7, pp. 2068-2076 (9 pages)
Funding: National Social Science Fund of China project (19BTQ032).
Keywords: agricultural questions; intelligent question answering system; question classification; bidirectional encoder representation from transformers (BERT); text convolutional neural network (TextCNN)