Abstract
With the arrival of the big data era, question answering systems have become an effective means of obtaining information. Question classification, a key component of a question answering system, directly affects system performance. Current research on question classification focuses mainly on modern Chinese, and few studies address questions about Chinese classics. Starting from the concept of question classification, this paper constructs a question classification corpus for Chinese classics, designs a question classification scheme for Chinese classics, and compares the classification performance of support vector machine (SVM), recurrent neural network (RNN), long short-term memory network (LSTM), bidirectional LSTM (BiLSTM), and Bidirectional Encoder Representations from Transformers (BERT) models. The experimental results show that BERT outperforms SVM and the traditional deep learning models, reaching an F1 score of 95.55% on the proposed classification scheme. BERT therefore has a clear advantage in classifying questions about Chinese classics and has potential for wider adoption and application.
Authors
刘忠宝
贾君枝
LIU Zhongbao; JIA Junzhi (Institute of Language Intelligence, Beijing Language and Culture University, Beijing 100083, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China)
Source
《晋图学刊》
2022, No. 3, pp. 34-43 (10 pages)
Shanxi Library Journal
Funding
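Ministry of Education Late-Stage Funding Project for Philosophy and Social Sciences Research, "Research on the Theory, Methods, and Applications of Digital Humanities in the Big Data Environment" (Project No. 21JHQ081).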
Keywords
Chinese classics
question classification
deep learning
BERT model