Abstract
Speech emotion recognition suffers from insufficient training samples, limited recognition accuracy, and models with large numbers of parameters. To address these problems, this paper proposes a VGGNet-based speech emotion recognition method that incorporates multi-head attention. First, the data are augmented in two ways: white Gaussian noise is added to the original speech, and the color saturation of the log-Mel spectrograms is adjusted. Then, a lightweight VGGNet is built, and the augmented spectrograms are fed into the network. Finally, a multi-head attention mechanism is combined with the VGGNet to improve recognition accuracy. The method was compared against other mainstream algorithms on the RAVDESS and IEMOCAP datasets. It achieves the highest recognition accuracy on both, at 88.3% and 77.11%, respectively.
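The abstract names two generic building blocks without giving their parameters. The Python/NumPy sketch below shows one common way to realize each. The first is adding white Gaussian noise to a waveform at a chosen signal-to-noise ratio. The second is standard scaled dot-product multi-head self-attention applied to a sequence of CNN feature vectors. Several settings are illustrative assumptions, not the authors' configuration: the 20 dB SNR, the synthetic test tone, the 50 × 64 feature shape, the four heads, and the random projection matrices. The function names are also hypothetical. The saturation adjustment of the log-Mel spectrogram and the lightweight VGGNet itself are not shown.

```python
import numpy as np

def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return signal plus zero-mean white Gaussian noise at the requested SNR in dB (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product multi-head self-attention over x of shape (T, d_model)."""
    t, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split(h):
        # (T, d_model) -> (num_heads, T, d_head)
        return h.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(t, d_model)
    return concat @ w_o, weights

# Usage with assumed, synthetic inputs.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone at 16 kHz
noisy = add_white_gaussian_noise(speech, snr_db=20, rng=rng)

# Hypothetical CNN output: 50 time frames x 64 channels.
features = rng.standard_normal((50, 64))
d = 64
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out, attn = multi_head_self_attention(features, w_q, w_k, w_v, w_o, num_heads=4)
print(out.shape, attn.shape)  # (50, 64) (4, 50, 50)
```

Noise injection of this kind yields a new training example per chosen SNR, so several SNR values multiply the training set. The attention step lets each time frame weight every other frame before classification, which is one plausible way to combine attention with a convolutional feature extractor.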
Authors
Jiao Yameng; Zhou Chengzhi; Li Wenping; Cui Lin; Dong Mian (School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China)
Source
Foreign Electronic Measurement Technology
Listed in the Peking University Chinese Core Journals
2022, Issue 1, pp. 63-69 (7 pages)
Funding
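Special Scientific Research Program of the Shaanxi Provincial Department of Education (20JK0647)
Natural Science Basic Research Program of Shaanxi Province (2021JQ692)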
Keywords
speech emotion recognition
data augmentation
multi-head self-attention