Micro-blog text emotion classification based on the fusion of content features and spread features
Abstract: Text vector representations based on Word2vec do not fully account for the content features and spread features of micro-blog texts, so the resulting vectors represent the text poorly, and a single machine learning algorithm does not classify micro-blog emotion with high accuracy. To improve emotion classification for micro-blog text, this paper proposes a new text vector representation method and combines it with an improved Stacking ensemble learning algorithm.

First, the method builds text feature vectors rich in semantic and emotional information by fusing content features (emoticons, word semantics, part of speech, and emotion) with spread features (the numbers of comments, retweets, and likes). A feature vector is constructed for each content feature, and these vectors are spliced into an initial, content-based text feature vector. The influence of a text is then computed from its spread features and combined with the initial text feature vector, further enriching the semantic and emotional information in the representation.

Second, in the improved Stacking algorithm, four classification algorithms (AdaBoost, random forest, GBDT, and XGBoost) are trained on the initial training data set with 5-fold cross-validation to produce the base classifiers. Each base classifier outputs a class probability vector rather than a class label. Each base classifier is assigned a weight according to its performance on the training data set, and that weight is multiplied by its class probability vector to give a weighted class probability vector. For each text and each category, the maximum, minimum, and average weighted probability values across all base classifiers are retained. These values, together with the original text feature vector, form the input to the meta-classifier, for which the simple and stable logistic regression algorithm is chosen.

Experiments on a micro-blog data set show that the proposed method represents text vectors better and that the weighted Stacking ensemble classifier outperforms any single classifier. Compared with other emotion classification methods, the proposed method improves accuracy by 1.75% to 4.90%.
Authors: CHEN Hongyang (陈红阳); HUANG Zhenghong (黄正洪); HE Yingying (何盈盈); ZHOU Yeli (周也力). School of Computer Engineering, Chongqing College of Humanities, Science and Technology, Chongqing 401524, China; School of Information Science and Technology, Chengdu University of Technology, Chengdu 610059, China
Source: Journal of Chongqing University of Technology: Natural Science (Peking University Core Journal), 2023, No. 7, pp. 245-255 (11 pages)
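Funding: Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN201901801, KJQN202001803, KJQN202303114); Key Science and Technology Research Project of Chongqing Municipal Education Commission (KJZD-202001801); 2020 University-level Technological Innovation Special Project of Chongqing College of Humanities, Science and Technology (CQRKZK2020004).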
Keywords: micro-blog text; emotion feature; part-of-speech feature; spread feature; emotion classification
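
The weighted Stacking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the weighting formula, so `performance_weights` assumes that each base classifier's weight is its training accuracy divided by the sum of all accuracies. The function names and the order of the output columns (maxima, minima, means, then the original features) are also hypothetical choices.

```python
from __future__ import annotations

from statistics import fmean
from typing import Sequence


def performance_weights(accuracies: Sequence[float]) -> list[float]:
    """Turn per-classifier accuracies into weights that sum to 1.

    Assumed scheme: the abstract only says weights follow performance
    on the training data set, not how they are computed.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")
    return [a / total for a in accuracies]


def meta_features(
    class_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    text_features: Sequence[float],
) -> list[float]:
    """Build one meta-classifier input row for a single text.

    class_probs[k][c] is base classifier k's probability for class c.
    """
    if len(class_probs) != len(weights):
        raise ValueError("need one weight per base classifier")
    # Multiply each classifier's class probability vector by its weight.
    weighted = [[w * p for p in probs] for w, probs in zip(weights, class_probs)]
    # Regroup by class: one tuple of weighted probabilities per class.
    per_class = list(zip(*weighted))
    maxima = [max(col) for col in per_class]
    minima = [min(col) for col in per_class]
    means = [fmean(col) for col in per_class]
    # Keep the original text feature vector alongside the summary values.
    return maxima + minima + means + list(text_features)


# Hypothetical example: three base classifiers, two emotion classes,
# and a two-dimensional text feature vector.
example_row = meta_features(
    class_probs=[[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
    weights=[0.5, 0.25, 0.25],
    text_features=[0.1, 0.2],
)
```

Rows built this way, one per text, would then be passed to the meta-classifier. The abstract names logistic regression for that role, and any standard implementation (for example scikit-learn's `LogisticRegression`) could be used for it.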