
Audio-Visual Speech Recognition in Reverberant Environments (混响环境中的视-听语音识别)

Cited by: 3
Abstract: Speech recognition accuracy drops sharply in reverberant environments compared with quiet ones. To address this, the paper proposes a method that combines visual speech information with audio features. A region of interest (ROI) containing the speaker's lips is rapidly detected and located, yielding a sequence of ROI images. A discrete cosine transform (DCT) is then applied to each ROI image to extract visual features that reflect the speaker's lip movements. The audio features are Mel-frequency cepstral coefficients (MFCCs), a well-established choice. A hidden Markov model (HMM) is used for training and recognition on the extracted visual and audio features. Experimental results show that combining audio and visual features effectively improves the speech recognition rate in reverberant environments.
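The DCT-based visual feature step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dct_1d`, `dct_2d`, `visual_features`), the orthonormal DCT-II scaling, and the choice to keep a small top-left block of low-frequency coefficients are all assumptions made for illustration, since the abstract does not say which coefficients are retained.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence (assumed scaling)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(img):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in img]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # one entry per column
    return [list(r) for r in zip(*cols)]          # transpose back to row-major


def visual_features(roi, keep=3):
    """Hypothetical feature selection: flatten the top-left keep x keep block
    of low-frequency coefficients. `keep` must not exceed the ROI size."""
    coeffs = dct_2d(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# A toy 4x4 grayscale "mouth ROI" with constant intensity.
flat_roi = [[2.0] * 4 for _ in range(4)]
features = visual_features(flat_roi, keep=3)
```

For the constant toy ROI above, all energy lands in the first (DC) coefficient and the remaining features are zero up to rounding. In a full system, one such feature vector per video frame would be paired with the MFCC vector of the matching audio frame before HMM training.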
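Affiliation: Northwestern Polytechnical University
Source: Audio Engineering (《电声技术》), 2012, No. 12, pp. 42-45 (4 pages)
Funding: Natural Science Foundation of Shaanxi Province (2012JM1010); Northwestern Polytechnical University Graduate Starting Seed Fund (Z2012008)
Keywords: speech recognition; reverberation; audio-visual feature fusion; hidden Markov model (HMM)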

