Abstract
[Purpose/significance] Deep learning models have become the dominant approach to named entity recognition, but the preprocessing in most current models ignores the semantic information carried by a word's context in the text. It is therefore important to determine how the semantic representation of text affects entity recognition. [Method/process] Taking the construction of a Chinese cuisine ontology as an example, this paper builds four types of models (CRFs, BiLSTM-CRFs, Char2vec-BiLSTM-CRFs, and BERT-BiLSTM-CRFs) and compares them to analyze how semantic representation of text affects recognition performance. It also compares single-term extraction with collection extraction using the BERT-BiLSTM-CRFs model, and applies that model to the construction of the cuisine ontology. [Result/conclusion] The experiments show that the BERT-BiLSTM-CRFs model achieves higher entity recognition accuracy than the other models, with an overall F1 improvement of 8.7%, and that recognizing and extracting entities individually works better than extracting them as a collection. [Limitations] The experimental data are limited in scale; follow-up work will be carried out on larger datasets.
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2021, Issue 10, pp. 8-17 (10 pages)
Information Studies: Theory & Application
Funding
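Key Project of the National Social Science Fund of China, "Research on Domain Knowledge Processing and Organization Models in the Big Data Environment" (Project No. 20ATQ006)
An outcome of the Nanjing University Young Interdisciplinary Team Project in the Humanities and Social Sciences, "Semantic Analysis and Knowledge Graph Research on Local Gazetteer Texts for Humanities Computing"
Supported by talent development programs including the Jiangsu Young Social Science Talents program and the Nanjing University Zhongying Young Scholars (Tang Scholar) program.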