Abstract
This paper studies the choice of acoustic modeling units for Chinese Putonghua spontaneous speech recognition. Under three-state HMMs, the two most popular modeling units, extended initial/final (XIF) units and phoneme units, each have their own advantages and drawbacks. On the one hand, the severe pronunciation variation of spontaneous speech favors coarse-grained XIF units, which can absorb many kinds of variation. On the other hand, a three-state structure may not describe complex units effectively, which favors fine-grained phoneme units. Building on theoretical results from experimental phonetics and on a duration analysis of XIF units, we argue that XIF units should be split selectively, and we propose an XIF model that separates the nasal coda. Experiments on a Chinese Putonghua spontaneous speech recognition task show that the proposed method clearly outperforms both XIF-based and phoneme-based modeling, reducing the character error rate by 2.23% and 9.45%, respectively.
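The abstract does not list the resulting unit inventory, so the following is only a minimal Python sketch, assuming pinyin-style labels for initials and finals, of what separating the nasal coda from an XIF final could look like: a final ending in -n or -ng becomes a nucleus unit plus a separate coda unit, and every other unit is kept whole. The function names and the coda labels `_n` and `_ng` are hypothetical. The paper splits units selectively, and which finals it actually splits is not stated here, so splitting every nasal final is itself an assumption.

```python
from __future__ import annotations

# Hypothetical sketch: nasal-coda separation for pinyin-style XIF units.
# Assumptions (not from the paper): every final ending in -n or -ng is split,
# and coda units get the made-up labels "_n" / "_ng" so they stay distinct
# from the initial "n".
NASAL_CODAS = ("ng", "n")


def split_nasal_coda(final: str) -> list[str]:
    """Return [nucleus, coda_unit] for a nasal final, else [final] unchanged."""
    for coda in NASAL_CODAS:
        # Require a non-empty nucleus so a bare "n" is never split.
        if final.endswith(coda) and len(final) > len(coda):
            return [final[: -len(coda)], "_" + coda]
    return [final]


def expand_syllables(syllables: list[tuple[str, str]]) -> list[str]:
    """Map (initial, final) pairs to a flat unit sequence with codas split off.

    An empty initial marks a zero-initial syllable and emits no initial unit.
    """
    units: list[str] = []
    for initial, final in syllables:
        if initial:
            units.append(initial)
        units.extend(split_nasal_coda(final))
    return units


if __name__ == "__main__":
    # "zhong guo": the final "ong" is split, "uo" has no nasal coda.
    print(expand_syllables([("zh", "ong"), ("g", "uo")]))
    # -> ['zh', 'o', '_ng', 'g', 'uo']
```

Whether a split nucleus such as the "i" of "in" should share a model with the stand-alone final "i" is a separate design choice that this sketch leaves open.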
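Source
Acta Acustica (声学学报)
EI
CSCD
Peking University Core Journal
2010, No. 5, pp. 587-592 (6 pages)
Funding
National Basic Research Program of China (973 Program), grant 2004CB318106
National High Technology Research and Development Program of China (863 Program), grants 2006AA010102 and 2006AA01Z195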