Abstract
Fake review detection has important applications in e-commerce, social media, and other fields. Although existing fake review detection models incorporate the sentiment information of the text, they neglect the extraction of sentiment information during pre-training, which results in low accuracy. To address this problem, this article proposes a fake review detection model (FR-SG) based on sentiment-information pre-training and a Bidirectional Gated Recurrent Unit (Bi-GRU) to improve the accuracy of fake review detection. First, the semantic vector of the text is obtained with the Albert model. Next, Term Frequency-Inverse Document Frequency (TF-IDF) and K-means++ clustering are used to mine sentiment seed words from the reviews, and the attribute words and sentiment words in the text are masked based on these seed words. Then, a sentiment-oriented optimization objective is used to embed the sentiment information into the semantic representation, generating the sentiment vector. Finally, the concatenation of these two vectors is fed into the fake review detection network to obtain the classification result for the text. Experimental results show that FR-SG improves the accuracy of fake review detection compared with the Bi-GRU+Attention model.
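The seed-word mining and masking steps can be illustrated with a minimal sketch. It is not the authors' implementation. It ranks words by TF-IDF and masks the top-ranked ones, while the K-means++ clustering step, the Albert encoder, and the Bi-GRU classifier are omitted. The smoothed IDF formula, the function names (`tfidf_scores`, `top_seed_candidates`, `mask_tokens`), the stopword list, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return each word's maximum TF-IDF score over a list of tokenized docs."""
    n_docs = len(docs)
    # Document frequency: number of reviews that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        tf = Counter(doc)
        for word, count in tf.items():
            # Smoothed IDF (an assumption; the paper's exact variant is not stated).
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
            score = (count / len(doc)) * idf
            best[word] = max(best.get(word, 0.0), score)
    return best


def top_seed_candidates(scores, stopwords, k):
    """Pick the k highest-scoring non-stopwords as seed-word candidates."""
    ranked = sorted(
        (w for w in scores if w not in stopwords),
        key=lambda w: (-scores[w], w),  # ties broken alphabetically
    )
    return set(ranked[:k])


def mask_tokens(tokens, seed_words, mask="[MASK]"):
    """Replace every token that is a seed word with the mask symbol."""
    return [mask if token in seed_words else token for token in tokens]


# Hypothetical toy data, not taken from the paper.
reviews = [
    "the room was wonderful and the staff was wonderful".split(),
    "the room was dirty and the staff was rude".split(),
    "the location was great".split(),
]
stopwords = {"the", "was", "and"}

scores = tfidf_scores(reviews)
seeds = top_seed_candidates(scores, stopwords, k=3)
masked_reviews = [mask_tokens(review, seeds) for review in reviews]
```

In the pipeline described in the abstract, candidates like these would additionally be clustered with K-means++ before masking, and the masked text would drive the sentiment-oriented pre-training objective.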
Authors
ZHANG Yuying (张玉莹)
ZHU Guangli (朱广丽)
ZHANG Youqiang (张友强)
SUN Zhengyan (孙争艳)
ZHANG Shunxiang (张顺香)
Affiliations: School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui, 232001, China; Institute of Artificial Intelligence Research, Hefei Comprehensive National Science Center, Hefei, Anhui, 230088, China
Source
Guangxi Sciences (《广西科学》)
CAS
Peking University Core Journals (北大核心)
2023, Issue 1, pp. 169-176 (8 pages)
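Funding
Supported by the General Program of the National Natural Science Foundation of China (62076006)
and the University Collaborative Innovation Project of Anhui Province (GXXT 2021008).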