Abstract
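Articulatory features that characterize the pronunciation of Mandarin Chinese (Putonghua) are introduced into acoustic modeling for Putonghua speech recognition. Based on the pronunciation characteristics of Putonghua, nine articulatory features are defined to distinguish its vowels, consonants, and tone information. Neural networks are trained with these features as targets to obtain the posterior probabilities that a speech signal belongs to each articulatory feature class, and these posteriors are used as the input features for building the acoustic model. Experiments were carried out on a speaker-independent, large-vocabulary, spontaneous conversational Putonghua speech recognition system and compared with an acoustic model based on spectral features. At the same decoding speed, the acoustic model built with this method reduces the character error rate by 6.8% relative. Fusion experiments combining articulatory and spectral features were also conducted; the fused system reduces the character error rate by 10.1% relative to the spectral-feature system. These results indicate that the articulatory-feature-based acoustic model represents speech characteristics more effectively, and that exploiting the complementarity of articulatory and spectral features further improves recognition performance.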
The development of a Chinese Putonghua conversational Large Vocabulary Continuous Speech Recognition (LVCSR) system using tonal articulatory tandem features is presented. A set of nine Articulatory Features (AF) used for classifying the sounds and tones of Chinese Putonghua is given, and the posteriors of these nine AF classifiers are used as features for Automatic Speech Recognition (ASR). In experiments on Chinese Putonghua conversational LVCSR, the tonal AF-based ASR achieves a 6.8% relative reduction in Character Error Rate (CER) compared with a baseline ASR using standard acoustic features. When the AF are combined with standard acoustic features at the feature level and the word level, the CER is reduced by 10.1% relative. These results show that the AF effectively capture the characteristics of speech pronunciation, and that the complementary information provided by standard acoustic features and AF allows the combined system to improve performance further.
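The tandem idea described in the abstract (classifier posteriors used as recognizer input, optionally combined with spectral features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `tandem_frame`, the use of log posteriors, the probability floor, and the toy scores are all hypothetical, and the abstract does not specify how the posteriors are post-processed.

```python
import math

def softmax(scores):
    """Convert raw classifier scores into posterior probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tandem_frame(af_scores, spectral=None, floor=1e-10):
    """Build one tandem feature vector for a single speech frame.

    af_scores: one list of raw scores per articulatory-feature classifier
               (hypothetical layout; the paper uses nine classifiers).
    spectral:  optional spectral features appended for feature-level fusion.
    """
    features = []
    for scores in af_scores:
        # Assumption: log posteriors, a common choice in tandem systems.
        features.extend(math.log(max(p, floor)) for p in softmax(scores))
    if spectral is not None:
        features.extend(spectral)
    return features

# Toy frame with two classifiers (not the paper's nine) and made-up values.
frame = tandem_frame([[2.0, 0.5, 0.1], [0.3, 1.2]], spectral=[0.1, -0.4])
print(len(frame))  # 3 + 2 log posteriors + 2 spectral values = 7
```

Passing `spectral=None` yields the AF-only feature vector, while supplying spectral values corresponds to a simple feature-level concatenation. Word-level combination, which the abstract also mentions, operates on recognizer outputs and is not covered by this sketch.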
Source
《声学学报》
EI
CSCD
PKU Core Journal (北大核心)
2010, No. 2, pp. 254-260 (7 pages)
Acta Acustica
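Funding
Supported by the National Key Technology R&D Program of China (2008BAI50B00)
and the National Natural Science Foundation of China (10925419, 90920302, 10874203, 60875014)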