摘要
为提高常规自动语音识别(ASR)系统的精度,提出基于隐式马尔可夫模型混合连接时间分类/注意力机制的端到端ASR系统设计方法。首先,针对可观测时变序列语音识别过程中存在的连续性强、词汇量大的语音识别难点,基于隐式马尔可夫模型对语音识别过程进行模拟,实现了语音识别模型参数化;其次,使用连接时间分类目标函数作为辅助任务,在多目标学习框架中训练语音识别过程的关注模型编码器,可降低序列级连接时间分类目标近似度,实现语音识别过程精度提升;最后,通过在自建语音识别库上的仿真实验,验证所提算法在识别效率和精度上的性能优势。
In order to improve the accuracy of the conventional automatic speech recognition(ASR) system, an end-to-end ASR system design method based on the Hidden Markov Model(HMM) connection time classification/attention mechanism is proposed. Firstly, the speech recognition process is simulated based on the implicit Markov model to realize the parameterization of the speech recognition model, aiming at the difficulty of speech recognition with strong continuity and large vocabulary in the speech recognition process of observable time variant sequence. Secondly, using the objective function as the auxiliary task, the attention model coder of the speech recognition process is trained in the multi-target learning framework, which can reduce the approximate degree of the sequence level connection time classification target and improve the accuracy of the speech recognition process. Finally, simulation experiments on the self-built speech recognition library verify the performance advantages of the proposed algorithm in terms of recognition efficiency and accuracy.
作者
陈聪
贺杰
陈佳
CHEN Cong;HE Jie;CHEN Jia(School of Data Science and Sofware Engineering,Wuzhou University,Wuzhou 543002,China)
出处
《控制工程》
CSCD
北大核心
2021年第3期585-591,共7页
Control Engineering of China
基金
国家自然科学基金项目(61562074,61961036)
广西高校行业软件技术重点实验室资助项目。
关键词
隐式马尔可夫
连接时间分类
注意力机制
端到端
语音识别
Hidden Markov
connection time classification
attention mechanism
end-to-end
speech recognition