
A Query-by-Example Spoken Term Detection Method Based on Phonetic Posteriorgram
Chinese title: 基于音素后验概率的样例语音关键词检测方法
Cited by: 3
Abstract: Spoken term detection (STD) under low-resource conditions is a challenging task, because traditional approaches based on large-vocabulary continuous speech recognition (LVCSR) are often unusable. To address this, we propose a query-by-example (QBE) STD method that combines posteriorgram features taken from the output layer of a deep neural network (DNN) with a modified dynamic time warping (DTW) search algorithm. Complementary subsystems are built on input features derived from an unsupervised Gaussian mixture model (GMM) and from DNN monophone models trained on Chinese and English. The subsystems are evaluated on the SWS2013 multilingual database of low-resource languages. Compared with the baseline system, score-level fusion across languages and subsystems is shown to improve detection performance significantly.
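The abstract combines three ingredients: posteriorgram features, a DTW search over them, and score-level fusion of subsystems. The sketch below illustrates generic versions of the last two under stated assumptions. The frame distance is the negative log inner product of two posterior vectors (a common choice for posteriorgram matching, assumed here because the abstract does not name the distance). The search is a plain subsequence DTW with free start and end points in the utterance, not the paper's specific DTW modification. Fusion is per-query z-normalization followed by a weighted average. All function names (`frame_distance`, `subsequence_dtw`, `znorm`, `fuse`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A posteriorgram: one posterior vector (over phone or GMM classes) per frame.
Posteriorgram = List[List[float]]


def frame_distance(q: Sequence[float], x: Sequence[float], eps: float = 1e-10) -> float:
    """Negative log inner product of two posterior vectors (assumed distance)."""
    dot = sum(a * b for a, b in zip(q, x))
    return -math.log(max(dot, eps))  # eps guards against log(0)


def subsequence_dtw(query: Posteriorgram, utt: Posteriorgram) -> Tuple[float, int, int]:
    """Align the whole query to the best-matching span of the utterance.

    Free start/end in the utterance; steps (1,0), (1,1), (0,1).
    Returns (length-normalised cost, start frame, end frame); lower cost = better match.
    """
    n, m = len(query), len(utt)
    cost = [[math.inf] * m for _ in range(n)]  # accumulated cost
    length = [[0] * m for _ in range(n)]       # path length in steps
    start = [[0] * m for _ in range(n)]        # utterance frame where the path began
    for i in range(n):
        for j in range(m):
            d = frame_distance(query[i], utt[j])
            if i == 0:
                # Free start: the first query frame may align with any utterance frame.
                cost[i][j], length[i][j], start[i][j] = d, 1, j
                continue
            # Predecessors: vertical, diagonal, horizontal.
            candidates = [(i - 1, j)]
            if j > 0:
                candidates += [(i - 1, j - 1), (i, j - 1)]
            pi, pj = min(candidates, key=lambda p: cost[p[0]][p[1]])
            cost[i][j] = cost[pi][pj] + d
            length[i][j] = length[pi][pj] + 1
            start[i][j] = start[pi][pj]
    # Free end: pick the end frame with the lowest length-normalised cost.
    end = min(range(m), key=lambda j: cost[n - 1][j] / length[n - 1][j])
    return cost[n - 1][end] / length[n - 1][end], start[n - 1][end], end


def znorm(scores: Sequence[float]) -> List[float]:
    """Per-query z-normalisation of one subsystem's detection scores."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores)) or 1.0
    return [(s - mean) / std for s in scores]


def fuse(subsystem_scores: Sequence[Sequence[float]], weights: Sequence[float] = ()) -> List[float]:
    """Weighted average of z-normalised scores (higher = more likely a hit)."""
    k = len(subsystem_scores)
    weights = list(weights) or [1.0 / k] * k  # equal weights by default
    normed = [znorm(s) for s in subsystem_scores]
    return [sum(w * col[t] for w, col in zip(weights, normed)) for t in range(len(normed[0]))]


if __name__ == "__main__":
    # Toy 3-class posteriorgrams: the query is "A B"; the utterance is "C C A B C".
    A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
    cost, s, e = subsequence_dtw([A, B], [C, C, A, B, C])
    print(cost, s, e)  # best span: frames 2 to 3
```

Because `subsequence_dtw` returns a cost (lower is better), a subsystem would pass negated costs to `fuse` so that all fused scores share the "higher is better" convention. Normalizing per query before averaging keeps subsystems with different cost ranges (for example GMM-based versus DNN-based posteriorgrams) from dominating the fused score.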
Source: Journal of Tianjin University: Science and Technology (《天津大学学报(自然科学与工程技术版)》), 2015, no. 9, pp. 757-760 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (grants 61370034, 61273268, 61403224)
Keywords: query-by-example; spoken term detection; deep neural network (DNN) output-layer features; dynamic time warping

