
A Robust Bootstrapping Algorithm of Speaker Models for On-Line Unsupervised Speaker Indexing

Cited by: 3
Abstract: This paper proposes a robust bootstrapping framework for speaker models that combines two components. First, a multi-eigenspace modeling technique based on regression classes (RC-MES), built on a regression tree, performs eigenvoice analysis within each regression class, which makes it possible to train speaker models from sparse data. Second, a short-segment clustering strategy keeps overly short speech segments from degrading bootstrap training. The method was evaluated on nearly 8 hours of recorded real discussions. Even when the average bootstrapping segment is shorter than 5 seconds, the proposed method remains robust: it improves speaker change detection and outperforms conventional bootstrapping methods.
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2007, No. 3, pp. 608-616 (9 pages)
Funding: Supported by the Science & Technology Research and Development Plan of Shaanxi Province of China under Grant No. 2005k04G23
Keywords: speaker indexing; speaker model; regression class; eigenvoice
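
The short-segment clustering idea from the abstract can be illustrated with a minimal sketch. The abstract does not give the actual clustering criterion, so everything specific here is an assumption: the function name `cluster_short_segments`, the use of Euclidean distance between mean feature vectors, the `max_distance` threshold, and the 5-second `min_duration` (chosen only to echo the segment length mentioned in the abstract). It shows one plausible way to pool overly short segments until they are long enough to bootstrap a model, not the authors' method.

```python
from math import dist


def cluster_short_segments(segments, min_duration=5.0, max_distance=1.0):
    """Greedily pool short segments that have similar mean feature vectors.

    segments: list of (duration_seconds, mean_feature_vector) tuples.
    Returns a list of clusters, each a dict holding the pooled duration,
    the duration-weighted mean vector, and the member segment indices.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    clusters = []
    for index, (duration, mean) in enumerate(segments):
        if duration >= min_duration:
            # Long enough to bootstrap a speaker model on its own.
            clusters.append(
                {"duration": duration, "mean": list(mean), "members": [index]}
            )
            continue

        # Look for the closest cluster that is still too short.
        best = None
        for cluster in clusters:
            if cluster["duration"] >= min_duration:
                continue
            d = dist(cluster["mean"], mean)
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, cluster)

        if best is None:
            # No similar short cluster yet: start a new one.
            clusters.append(
                {"duration": duration, "mean": list(mean), "members": [index]}
            )
        else:
            # Merge into the closest short cluster, updating its weighted mean.
            cluster = best[1]
            total = cluster["duration"] + duration
            cluster["mean"] = [
                (cluster["duration"] * m + duration * x) / total
                for m, x in zip(cluster["mean"], mean)
            ]
            cluster["duration"] = total
            cluster["members"].append(index)
    return clusters


if __name__ == "__main__":
    demo = [
        (2.0, [0.0, 0.0]),
        (6.0, [5.0, 5.0]),
        (2.5, [0.1, 0.0]),
        (1.0, [5.1, 5.0]),
        (1.5, [0.0, 0.1]),
    ]
    for cluster in cluster_short_segments(demo):
        print(cluster["members"], cluster["duration"])
```

In this toy input, the three short segments near the origin are pooled into one cluster, while the 1-second segment near `[5.0, 5.0]` stays separate because its only close neighbour is already long enough on its own. A real system would more likely compare segments with a model-based distance such as BIC than with raw Euclidean distance.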