Abstract
【Purpose/significance】Automatic indexing of questions in Q&A communities can support a website's information organization and information services. Most existing research on automatic indexing focuses on extraction indexing, which is not well suited to questions in Q&A communities.【Method/process】Taking the CNGOLD (金投网) Q&A community as an example, this paper combines assignment indexing and extraction indexing and proposes an automatic question-indexing model based on the pre-trained language model BERT and TF-IDF. The model applies a BERT-based multi-label classification algorithm to assign index terms to each question, divides questions into short and long questions, and uses the TF-IDF algorithm to extract terms from long questions to supplement their index tags.【Result/conclusion】Experimental results show that the proposed model can effectively index questions in Q&A communities automatically, which is significant for improving users' information retrieval.【Innovation/limitation】Using the internal and external features of questions, this paper constructs an automatic indexing model for Q&A community questions based on BERT and TF-IDF, and proposes a BERT-based multi-label classification algorithm.
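The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_classifier` is a hypothetical stand-in for a fine-tuned BERT multi-label classifier, the `threshold` and `long_min_tokens` values are invented for the example, questions are assumed to be pre-tokenized, and the smoothed IDF formula is one common variant rather than the one used in the paper.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def tfidf_keywords(doc_tokens: List[str],
                   corpus_tokens: List[List[str]],
                   top_k: int = 2) -> List[str]:
    """Return the top_k terms of doc_tokens ranked by a smoothed TF-IDF score."""
    n_docs = len(corpus_tokens)
    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF (an assumption; the paper's exact weighting is not given).
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def index_question(tokens: List[str],
                   corpus_tokens: List[List[str]],
                   classify: Callable[[List[str]], Dict[str, float]],
                   threshold: float = 0.5,       # hypothetical cut-off
                   long_min_tokens: int = 8,     # hypothetical short/long boundary
                   top_k: int = 2) -> List[str]:
    """Assignment indexing for every question, plus extraction for long ones."""
    # Step 1: assignment indexing, keeping every label whose probability
    # reaches the threshold (multi-label, so several labels may pass).
    probs = classify(tokens)
    tags = [label for label, p in sorted(probs.items()) if p >= threshold]
    # Step 2: for long questions only, supplement with TF-IDF keywords.
    if len(tokens) >= long_min_tokens:
        for keyword in tfidf_keywords(tokens, corpus_tokens, top_k):
            if keyword not in tags:
                tags.append(keyword)
    return tags


def toy_classifier(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a BERT multi-label head (per-label probabilities)."""
    return {
        "gold": 0.9 if "gold" in tokens else 0.1,
        "futures": 0.8 if "futures" in tokens else 0.2,
    }
```

In a real system the `classify` callable would wrap a fine-tuned BERT model with one sigmoid output per controlled-vocabulary label, and Chinese questions would first pass through a word segmenter before TF-IDF scoring.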
Authors
唐晓波
刘江南
TANG Xiao-bo; LIU Jiang-nan (School of Information Management, Wuhan University, Wuhan 430012, China; Center for Studies of Information Resources, Wuhan University, Wuhan 430012, China)
Source
《情报科学》
CSSCI
PKU Core Journal (北大核心)
2021, No. 3, pp. 3-10 (8 pages)
Information Science
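Funding
National Natural Science Foundation of China project "Research on Intelligent Consulting Services Based on Text and Web Semantic Analysis" (71673209).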