Improvement of Phoneme Acoustic Modeling in Large Vocabulary Continuous Mandarin Speech Recognition System

Original title: 汉语连续语音识别之音素声学模型的改进. Cited by: 7
Abstract: This paper studies improvements to acoustic models built on main-vowel phoneme units. Because of the characteristics of Mandarin speech, the main-vowel model is widely used. Analysis of the main-vowel phoneme model shows that the character boundaries within the syllable sequence of a word are ambiguous, so the decomposition of a word's pronunciation into syllables is not unique. An improved main-vowel method is therefore proposed that makes the character boundaries in a syllable sequence explicit, reduces the size of the unit inventory, and raises the recognition rate. To describe co-articulation in continuous speech, a corresponding tonal question set is designed for the improved main-vowel units, and context-dependent triphone models are built using decision-tree-based state tying. Experiments show that the improved tonal phoneme set uses fewer units than the original and yields an absolute reduction in character error rate (CER) of about 0.4%-0.6%.
Source: Computer Simulation (《计算机仿真》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 5, pp. 355-358 (4 pages)
Keywords: large vocabulary continuous Mandarin speech recognition; phoneme; main vowel; decision tree
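
The abstract reports its result as an absolute reduction in character error rate (CER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the character-level edit distance between reference and hypothesis, divided by the reference length. The function names `edit_distance` and `cer` and the example strings are illustrative assumptions, not taken from the paper, and the paper's actual scoring tool is not stated in this record.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference character missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis character)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted character out of six.
    reference = "今天天气很好"
    hypothesis = "今天天汽很好"
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

An absolute reduction of 0.4% means the CER value itself drops by 0.004, independent of the baseline, as opposed to a relative reduction expressed as a fraction of the baseline.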