Abstract
Noise in speech corrupts the information carried by the speech signal and degrades recognition performance. To improve the accuracy and efficiency of audio/video dual-stream speech recognition, an algorithm based on an attention LSTM (At-LSTM) is proposed. First, the wavelet threshold method is used to denoise the audio/video dual-stream speech data, and the denoised data are vector-quantized. An At-LSTM model is then built on the quantized data to extract the local and global features of the speech data and to fuse them. Finally, the speech data are classified on the basis of the fused features, which completes recognition of the audio/video dual-stream speech. Experimental results show that the method achieves high recognition accuracy and short recognition time, and that the recognized speech contains less noise and is more fluent.
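The wavelet threshold denoising step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the wavelet basis, the decomposition depth, or the thresholding rule, so everything specific here is an assumption: a single-level orthonormal Haar transform, soft thresholding of the detail coefficients, and the universal threshold with a median-based noise estimate. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass (pairwise sums)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass (pairwise differences)
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed even/odd samples."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


def wavelet_denoise(x, threshold=None):
    """Denoise x by soft-thresholding its Haar detail coefficients.

    Assumption (not from the paper): when no threshold is given, use the
    universal threshold sigma * sqrt(2 * ln(n)), with sigma estimated from
    the median absolute detail coefficient.
    """
    approx, detail = haar_dwt(x)
    if threshold is None:
        sigma = np.median(np.abs(detail)) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    return haar_idwt(approx, soft_threshold(detail, threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 1024, endpoint=False)
    clean = np.sin(2 * np.pi * 3 * t)                 # stand-in for a speech frame
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    denoised = wavelet_denoise(noisy)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

A practical system would more likely use a multi-level decomposition with a smoother wavelet. The single-level Haar case is used here only because it keeps the transform, the thresholding, and the reconstruction visible in a few lines.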
Authors
ZHANG Tian-tian (张添添); WANG Jing (王婧)
Tianhua College, Shanghai Normal University, Shanghai 201815, China; International School of Software, Wuhan University, Wuhan, Hubei 430072, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2023, No. 1, pp. 251-254, 282 (5 pages in total)
Keywords
Attention mechanism
LSTM model
Audio/video
Dual-stream speech recognition
Speech data denoising