
Articulatory Feature-Based Acoustic Modeling of Chinese Putonghua Speech (cited by: 14)

Tonal articulatory feature-based acoustic modeling for Chinese Putonghua speech recognition
Abstract: Articulatory features (AF) that characterize Chinese Putonghua speech are introduced into acoustic modeling for a Putonghua conversational Large Vocabulary Continuous Speech Recognition (LVCSR) system, using a tonal articulatory tandem approach. Based on the pronunciation characteristics of Putonghua, a set of nine articulatory features is defined to distinguish its vowels, consonants, and tones. Neural networks are trained with these features as targets to estimate the posterior probability that the speech signal belongs to each AF class, and these posteriors are used as the input features of the acoustic model in Automatic Speech Recognition (ASR). The method was evaluated on a speaker-independent, large-vocabulary, spontaneous conversational Putonghua recognition system and compared with a baseline acoustic model built on standard spectral features. At the same decoding speed, the tonal AF-based acoustic model reduces the Character Error Rate (CER) by 6.8% relative. When the AF are combined with the spectral features at the feature level and the word level, the combined system achieves a 10.1% relative CER reduction over the spectral-feature system. These results indicate that the AF-based acoustic model captures the characteristics of speech more effectively, and that exploiting the complementary information in the articulatory and spectral features improves recognition performance further.
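The tandem idea in the abstract, classifier posteriors used as acoustic-model input and optionally fused with spectral features, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the toy classifier scores, the use of log posteriors (a common choice in tandem systems that the abstract does not confirm), and the omission of any dimensionality reduction. The last function shows how a relative error reduction such as the reported 6.8% or 10.1% is computed from two absolute error rates.

```python
import math


def softmax(logits):
    """Convert one classifier's raw scores into posterior probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tandem_frame(af_logits, spectral, floor=1e-10):
    """Build one frame's fused feature vector (hypothetical helper).

    af_logits: one list of raw scores per articulatory-feature classifier
               (the abstract describes nine such classifiers).
    spectral:  the frame's spectral features; their type is not specified here.
    """
    af_part = []
    for logits in af_logits:
        # Assumption: take log posteriors, floored to avoid log(0).
        af_part.extend(math.log(max(p, floor)) for p in softmax(logits))
    # Feature-level fusion: append the AF log posteriors to the spectral features.
    return list(spectral) + af_part


def relative_reduction(baseline_cer, new_cer):
    """Relative error-rate reduction in percent."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Toy frame: two AF classifiers (3 and 2 classes) plus 4 spectral coefficients.
frame = tandem_frame([[2.0, 0.5, -1.0], [0.1, 0.3]], [0.12, -0.40, 0.88, 0.05])
```

In this toy example `frame` has 4 + 3 + 2 = 9 dimensions. A real system would run all nine classifiers on every frame and train the acoustic model on the resulting sequence of vectors.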
Source: Acta Acustica (《声学学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2010, No. 2, pp. 254-260 (7 pages)
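Funding: Supported by the National Key Technology R&D Program of China (2008BAI50B00) and the National Natural Science Foundation of China (10925419, 90920302, 10874203, 60875014)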

