Abstract
To address the problem that entities in travel reviews are long, structurally complex, and heavily nested, a travel entity recognition method based on the RoBERTa-WWM-BiLSTM-CRF model is proposed. First, the RoBERTa-WWM (A Robustly Optimized BERT Pre-training Approach-Whole Word Masking) pre-trained language model is used to obtain character vectors carrying prior semantic information from travel reviews. Second, a bidirectional long short-term memory network (BiLSTM) is introduced to further obtain a bidirectional representation of the text sequence that contains contextual information. Finally, a conditional random field (CRF) outputs the optimal tag sequence. Experiments on the constructed tourism dataset show that the RoBERTa-WWM-BiLSTM-CRF model achieves better recognition results than existing mainstream models, verifying the effectiveness of the method for named entity recognition.
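The final step of the pipeline, in which the CRF layer outputs the optimal tag sequence, is commonly implemented with Viterbi decoding over per-character emission scores and tag-to-tag transition scores. The following is a minimal sketch of that decoding step only, not the authors' implementation. The example sentence, the `LOC` entity type, the BIO tag set, and every score are hypothetical values chosen for illustration; in the described model the emission scores would come from the BiLSTM layer and the transition scores would be learned by the CRF.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def viterbi_decode(
    emissions: List[Scores],
    transitions: Dict[str, Scores],
    tags: List[str],
) -> Tuple[float, List[str]]:
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][tag]     : score of `tag` at position t
    transitions[a][b]     : score of moving from tag a to tag b
    """
    n = len(emissions)
    # score[t][tag]: best score of any tag path ending in `tag` at position t
    score: List[Scores] = [dict() for _ in range(n)]
    backptr: List[Dict[str, str]] = [dict() for _ in range(n)]

    for tag in tags:
        score[0][tag] = emissions[0][tag]

    for t in range(1, n):
        for cur in tags:
            best_prev = max(
                tags, key=lambda p: score[t - 1][p] + transitions[p][cur]
            )
            score[t][cur] = (
                score[t - 1][best_prev]
                + transitions[best_prev][cur]
                + emissions[t][cur]
            )
            backptr[t][cur] = best_prev

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: score[n - 1][tag])
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return score[n - 1][last], path


# Hypothetical tag set and scores (assumptions, not taken from the paper).
tags = ["O", "B-LOC", "I-LOC"]
chars = ["去", "天", "山", "玩"]  # hypothetical review fragment
emissions = [
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.1},
    {"O": 0.6, "B-LOC": 1.5, "I-LOC": 1.4},
    {"O": 1.0, "B-LOC": 0.2, "I-LOC": 0.9},
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.2},
]
transitions = {
    "O": {"O": 0.5, "B-LOC": 0.0, "I-LOC": -10.0},  # O -> I-LOC is invalid in BIO
    "B-LOC": {"O": -0.5, "B-LOC": -2.0, "I-LOC": 1.0},
    "I-LOC": {"O": 0.0, "B-LOC": -2.0, "I-LOC": 0.5},
}

# Per-character argmax ignores transitions and splits the entity.
greedy_path = [max(tags, key=lambda tg: e[tg]) for e in emissions]
best_score, best_path = viterbi_decode(emissions, transitions, tags)

print(greedy_path)  # ['O', 'B-LOC', 'O', 'O']
print(best_path)    # ['O', 'B-LOC', 'I-LOC', 'O']
```

With these illustrative scores, per-character argmax labels only the first character of the two-character entity, while joint decoding with transition scores recovers the full span. This is the usual motivation for placing a CRF on top of a BiLSTM when entities are long.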
Authors
LI Sheng-nan (李胜楠); XU Chun (徐春)
School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China
Source
Computer and Information Technology (《电脑与信息技术》)
2022, No. 6, pp. 34-38 (5 pages)
Funding
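Xinjiang Natural Science Foundation Project (Grant No. 2019D01A23)
Xinjiang University of Finance and Economics Research Fund Project (Grant No. 2022XGC073)
Xinjiang Social Science Foundation Project (Grant No. 18BGL086)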