Abstract
Speech recognition accuracy drops sharply in reverberant environments compared with quiet ones. To address this, the paper proposes a method that combines visual speech information with audio features. A region of interest (ROI) containing the speaker's lips is rapidly detected and located, yielding a sequence of ROI images. A discrete cosine transform (DCT) is first applied to each ROI image to extract visual features that reflect the speaker's lip movement. Audio features are extracted with the well-established Mel-frequency cepstral coefficient (MFCC) method. A hidden Markov model (HMM) is used for training and recognition on the resulting visual and audio features. Experimental results show that combining visual and audio features effectively improves the speech recognition rate in reverberant environments.
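The visual front end summarized above (a DCT of each lip ROI, combined with per-frame audio features) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dct2`, `visual_features`, and `fuse`, the choice to keep the top-left 3x3 block of low-frequency coefficients, and fusion by simple concatenation are all assumptions made for illustration. The abstract does not specify how many coefficients are kept or how the two feature streams are combined.

```python
import math


def dct2(image):
    """Orthonormal 2-D DCT-II of a grayscale image given as a list of rows."""
    rows, cols = len(image), len(image[0])

    def dct1(vec):
        # Orthonormal 1-D DCT-II of a single row or column.
        n = len(vec)
        out = []
        for k in range(n):
            total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                        for i, v in enumerate(vec))
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            out.append(scale * total)
        return out

    # The 2-D transform is separable: transform every row, then every column.
    row_pass = [dct1(row) for row in image]
    col_pass = [dct1([row_pass[r][c] for r in range(rows)])
                for c in range(cols)]
    # col_pass is indexed [column][frequency]; transpose back to [u][v].
    return [[col_pass[c][u] for c in range(cols)] for u in range(rows)]


def visual_features(roi, keep=3):
    """Keep the top-left keep x keep low-frequency DCT coefficients.

    Assumption: a square low-frequency block is used as the visual feature;
    the source does not state which coefficients are selected.
    """
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


def fuse(audio_vec, visual_vec):
    """Assumed feature-level fusion: concatenate audio and visual vectors."""
    return list(audio_vec) + list(visual_vec)


if __name__ == "__main__":
    # Hypothetical 4x4 lip ROI and a hypothetical 2-dimensional audio vector.
    roi = [[10, 12, 12, 10],
           [40, 80, 80, 40],
           [40, 80, 80, 40],
           [10, 12, 12, 10]]
    audio = [1.5, -0.3]  # stand-in for one frame of MFCCs
    print(fuse(audio, visual_features(roi)))
```

In practice the audio and video streams run at different frame rates, so the visual features would need to be interpolated or repeated to align with the audio frames before fusion; that step is omitted here. The fused vectors would then serve as the observation sequence for HMM training and recognition.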
Source
《电声技术》
2012, No. 12, pp. 42-45 (4 pages)
Audio Engineering
Funding
Natural Science Foundation of Shaanxi Province (2012JM1010)
Graduate Starting Seed Fund of Northwestern Polytechnical University (Z2012008)
Keywords
speech recognition
reverberation
audio-visual feature fusion
hidden Markov model (HMM)