
Hybrid Connectionist Temporal Classification/Attention End-to-End Speech Recognition (cited by: 6)

End-to-end Speech Recognition of Hybrid Connection Time and Attention Mechanism
Abstract: To improve the accuracy of conventional automatic speech recognition (ASR) systems, a design method for an end-to-end ASR system is proposed, based on a hidden Markov model (HMM) and a hybrid connectionist temporal classification (CTC)/attention mechanism. First, to address the difficulties that strong continuity and a large vocabulary pose when recognizing speech as an observable time-varying sequence, the recognition process is modeled with a hidden Markov model, which parameterizes the speech recognition model. Second, the CTC objective function is used as an auxiliary task, and the encoder of the attention model is trained within a multi-objective learning framework, which reduces the degree of approximation in the sequence-level CTC objective and improves recognition accuracy. Finally, simulation experiments on a self-built speech recognition corpus verify the advantages of the proposed algorithm in recognition efficiency and accuracy.
Authors: CHEN Cong; HE Jie; CHEN Jia (School of Data Science and Software Engineering, Wuzhou University, Wuzhou 543002, China)
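Source: Control Engineering of China (《控制工程》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 3, pp. 585-591 (7 pages)
Funding: National Natural Science Foundation of China (61562074, 61961036); project funded by the Guangxi Colleges and Universities Key Laboratory of Industry Software Technology.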
Keywords: hidden Markov model; connectionist temporal classification; attention mechanism; end-to-end; speech recognition
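
The abstract describes training the attention model's encoder with the CTC objective as an auxiliary task, but this record does not give the loss itself. The sketch below illustrates one common way such a multi-objective loss is formed, a weighted sum of the CTC and attention negative log-likelihoods. It is an assumption, not the authors' stated formulation. The function names, the toy probabilities, and the default `ctc_weight=0.3` are all hypothetical, and the CTC term is computed in the probability domain for readability (practical implementations work in log space).

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC forward algorithm: return -log p(target | probs).

    probs[t][k] is the probability of symbol k at frame t.
    Probability-domain version, suitable only for short toy inputs.
    """
    # Interleave blanks around the labels: [a, b] -> [_, a, _, b, _]
    ext = [blank]
    for sym in target:
        ext += [sym, blank]
    size = len(ext)

    # Frame 0 may start in the leading blank or in the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end in the final blank or in the final label.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


def attention_neg_log_likelihood(step_probs, target):
    """Cross-entropy of a decoder: step_probs[i][k] = p(k at output step i)."""
    return -sum(math.log(step_probs[i][sym]) for i, sym in enumerate(target))


def joint_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Weighted sum of the two objectives (ctc_weight is a hypothetical value)."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


# Toy example: vocabulary {0: blank, 1: "a"}, two frames, target "a".
frame_probs = [[0.5, 0.5], [0.5, 0.5]]
decoder_probs = [[0.2, 0.8]]
target = [1]

loss = joint_loss(
    ctc_neg_log_likelihood(frame_probs, target),
    attention_neg_log_likelihood(decoder_probs, target),
)
```

In this toy case three alignments ("a_", "_a", "aa") produce the target, so the CTC likelihood is 0.75. Setting `ctc_weight` to 0 recovers a purely attention-based objective, and setting it to 1 recovers pure CTC.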
