Abstract
To identify fake product reviews effectively, this paper proposes a detection method based on sentiment polarity and SMOTE over-sampling. First, a multidimensional feature model of fake reviews is constructed from the characteristics of online fake reviews. Second, statistical indicators such as the mean and the standard deviation of sentiment polarity are added to the sentiment polarity algorithm to characterize fake reviews more fully. Finally, to address the class imbalance in fake review data, the SMOTE algorithm is used to optimize a random forest classification model and thereby improve detection performance. Several groups of experiments on real review data from dianping.com (大众点评网) show that the proposed method achieves higher precision, recall, and F-score on fake review datasets with imbalanced positive and negative samples. Jointly considering sentiment polarity and sample imbalance can help e-commerce platforms filter fake reviews effectively and provide consumers with more genuine and reliable review data.
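As a rough illustration of two ingredients named in the abstract, the sketch below computes the mean and standard deviation of per-sentence polarity scores for a review and then generates synthetic minority-class samples by SMOTE-style interpolation. It is not the authors' implementation: the function names, the toy polarity scores, the use of the population standard deviation, and the choice of `k` are all assumptions made for illustration, and the random forest classifier that would be trained on the balanced data is omitted.

```python
import math
import random
import statistics


def polarity_stats(sentence_polarities):
    """Return (mean, population std) of a review's per-sentence polarity scores."""
    mean = statistics.fmean(sentence_polarities)
    std = statistics.pstdev(sentence_polarities)
    return mean, std


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from the minority class.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical per-sentence polarity scores for four fake reviews (toy data).
fake_review_sentences = [
    [0.9, 0.8, 1.0],
    [0.7, 0.9],
    [1.0, 1.0, 0.9, 0.8],
    [0.6, 0.95],
]

# Feature vector per review: [polarity mean, polarity std].
fake_features = [list(polarity_stats(s)) for s in fake_review_sentences]

# Over-sample the minority (fake) class with six synthetic feature vectors.
synthetic_features = smote(fake_features, n_new=6, k=2)
```

In practice the feature vectors would carry the other dimensions of the feature model as well, and the original plus synthetic minority samples would be combined with the majority class before training the classifier.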
Authors
Miao Yuqing (缪裕青)
Ou Weijian (欧威健)
Liu Tonglai (刘同来)
Liu Shuiqing (刘水清)
Wen Yimin (文益民)
School of Computer Science & Information Security; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 7, pp. 2042-2045 (4 pages)
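Funding
Natural Science Foundation of Guangxi (2014GXNSFAA118395)
National Natural Science Foundation of China (61363029)
Innovation Project of Graduate Education, Guilin University of Electronic Technology (2016YJCX72)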
Keywords
fake reviews
sentiment polarity
reviewer behavior
logistic regression
random forest