Mongolian acoustic modeling based on deep neural network (cited by: 5)
Abstract: The Gaussian mixture model (GMM) used in acoustic modeling for Mongolian speech recognition cannot adequately describe the correlations among Mongolian acoustic features and is limited by its independence assumption. To address this, the study investigates Mongolian acoustic modeling with a deep neural network (DNN). First, the DNN couples classification with learning the internal structure of the speech features in order to extract Mongolian acoustic features, and a DNN-HMM Mongolian acoustic model is constructed. Second, a training algorithm is designed that combines unsupervised pre-training with supervised fine-tuning. In addition, dropout is applied during training of the DNN-HMM Mongolian acoustic model to avoid over-fitting. Finally, the GMM-HMM and DNN-HMM Mongolian acoustic models are compared on a small-scale corpus using the Kaldi experimental platform. The experimental results show that the word recognition error rate of the DNN-HMM Mongolian acoustic model is reduced by 7.5% and its sentence recognition error rate by 13.63%, and that applying dropout during training effectively avoids over-fitting of the DNN-HMM Mongolian acoustic model.
Authors: MA Zhiqiang (马志强); LI Tuya (李图雅); YANG Shuangtao (杨双涛); ZHANG Li (张力) (School of Data Science & Application, Inner Mongolia University of Technology, Hohhot 010080, China)
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), CSCD, Peking University Core Journal, 2018, No. 3, pp. 486-492 (7 pages)
Funding: National Natural Science Foundation of China (61762070, 61650205)
Keywords: speech recognition; acoustic model; GMM-HMM; DNN-HMM; supervised learning; pre-training; over-fitting; dropout
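The abstract reports results as a word recognition error rate and a sentence recognition error rate but does not define either metric. The sketch below shows the conventional definitions, which are assumed here rather than taken from the paper: word error rate is the word-level edit distance between reference and hypothesis divided by the number of reference words, and sentence error rate is the fraction of sentences containing at least one error. The function names and the placeholder transcripts are hypothetical and are not data from the study.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total word-level edit errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def sentence_error_rate(refs, hyps):
    """Fraction of sentences whose hypothesis differs from the reference."""
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return wrong / len(refs)


# Placeholder transcripts (not from the paper's corpus).
refs = ["a b c d", "e f"]
hyps = ["a x c d", "e f"]
wer = word_error_rate(refs, hyps)      # one substitution over six reference words
ser = sentence_error_rate(refs, hyps)  # one of two sentences contains an error
```

Under these definitions a single wrong word makes the whole sentence count as an error, so the sentence-level rate is never lower than the word-level rate would suggest for the same errors.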