Abstract
In image-text bimodal emotion classification, insufficient feature extraction and information redundancy during multimodal feature fusion are common problems. This paper introduces attention mechanisms into the multi-channel feature extraction and fusion process and proposes a multimodal emotion classification model that integrates attention mechanisms. First, TextCNN and BERT are used to extract local text features and contextual text features respectively, and a residual network extracts image features. Second, a cross-modal attention mechanism enables information interaction between the modalities, enhancing the feature representation of each modality. Then, a self-attention mechanism fuses the multimodal features. Finally, a Softmax classifier produces the emotion classification result. On the public TumEmo image-text dataset, the model achieves 75.2% accuracy and a 74.3% F1 score on seven-class emotion classification, demonstrating good performance.
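The cross-modal interaction step described above can be illustrated with scaled dot-product attention in which one modality's features serve as queries and the other modality's features serve as keys and values. The sketch below is a minimal, hypothetical illustration (the paper does not specify exact dimensions or implementation details; the toy sizes of 4 text tokens, 6 image regions, and feature dimension 8 are assumptions for demonstration only):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention: one modality attends to the other.

    query_feats:   (Lq, d) features of the modality being enhanced
    context_feats: (Lk, d) features of the other modality
    Returns an enhanced (Lq, d) representation of the query modality.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (Lq, Lk) similarity
    weights = softmax(scores, axis=-1)                   # attention over context
    return weights @ context_feats                       # (Lq, d) weighted sum

# Hypothetical toy dimensions: 4 text tokens, 6 image regions, d = 8
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
image = rng.normal(size=(6, 8))

text_enhanced = cross_modal_attention(text, image)   # text attends to image
image_enhanced = cross_modal_attention(image, text)  # image attends to text
print(text_enhanced.shape, image_enhanced.shape)     # shapes (4, 8) and (6, 8)
```

In a full model along the lines the abstract describes, the two enhanced representations would then be concatenated and passed through a self-attention fusion layer before the Softmax classifier.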
Authors
PENG Junwen; LI Lei
School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012
Source
Software (《软件》), 2023, No. 12, pp. 176-180 (5 pages)