Abstract
Deep learning has advanced rapidly in natural language processing, but the particular nature of speech signals means that current speech recognition algorithms cannot extract time-domain and frequency-domain features effectively. To address the confusion between time-domain and frequency-domain features, an audio recognition module is designed around the attention mechanism; it takes as input the time-domain features of the audio signal together with frequency-domain features obtained via the FFT. The cross-attention mechanism of the Transformer captures the long-range dependencies of the audio signal, while one-dimensional temporal convolution captures local feature information, so that global and local features are fused more effectively. In experiments on four mainstream Chinese speech datasets, with comparisons against multiple algorithms, the character error rate is reduced by more than 11.1% and the real-time factor falls below 0.03. A system designed around the algorithm improves its practicality.
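The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: character error rate as the Levenshtein edit distance between the reference and recognized character sequences divided by the reference length, and real-time factor as decoding time divided by audio duration. The abstract does not describe the paper's scoring procedure, nor whether the 11.1% reduction is relative or absolute, so the function names and sample strings below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a 6-character reference.
    reference = "今天天气很好"
    hypothesis = "今天天汽很好"
    print("CER:", character_error_rate(reference, hypothesis))
    # Hypothetical timing: 0.25 s of decoding for 10 s of audio.
    print("RTF:", real_time_factor(0.25, 10.0))
```

Under these definitions, a real-time factor below 0.03 means that decoding takes less than 3% of the audio's duration.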
Author
付默予
FU Moyu (China Coal (Tianjin) Underground Engineering Intelligent Research Institute Co., Ltd., Tianjin 300120, China)
Source
《电子设计工程》
2024, No. 21, pp. 132-136 (5 pages)
Electronic Design Engineering