Abstract
Building on the strong performance of long short-term memory (LSTM) neural networks in speech recognition, this paper introduces a new depth-gated LSTM method. The method uses a depth-control gating function to connect the LSTM units of stacked layers, introducing a linear dependence between the lower and upper recurrent layers so that deeper speech models can be built. The model is trained with the connectionist temporal classification (CTC) criterion to build an end-to-end speech recognition system, which removes the forced alignment between labels and input sequences that hidden Markov models require. Experiments show that the depth-gated LSTM improves speech modeling performance: compared with a model using standard LSTM, accuracy improves by about 4%.
Authors
张瑞珍
韩跃平
张晓通
ZHANG Rui-zhen; HAN Yue-ping; ZHANG Xiao-tong (School of Information and Communication Engineering, North University of China, Taiyuan 030051, China)
Source
《中北大学学报(自然科学版)》
CAS
2020, Issue 3, pp. 244-248 (5 pages)
Journal of North University of China(Natural Science Edition)
Keywords
speech recognition
depth-gated LSTM
connectionist temporal classification
end-to-end