
Multimodal voice activity detection (cited by: 6)
Abstract In speech signal processing systems, frame-energy-based voice activity detection (VAD) is often degraded by background noise and by the non-stationary energy of speech segments. To improve the performance and robustness of VAD, this paper introduces visual information. Visual features are generated with a data-driven linear transformation. Building on a proposed general statistical VAD model, two single-modality VAD systems are constructed, and a two-stage fusion method combines them into a multimodal VAD system. Experiments show that the multimodal VAD, which uses both audio and visual information, achieves a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate compared with the frame-energy-based audio VAD. These results indicate that the multimodal method eliminates most sentence-breaking errors and significantly improves frame-level detection performance.
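Authors LIU Peng (刘鹏), WANG Zuoying (王作英)
Source Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2005, No. 7, pp. 896-899 (4 pages)
Funding National "863" High-Tech Program (2001AA114071)
Keywords speech recognition; voice activity detection; multimodal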


