Journal Article

A Synthetic Speech Detection System Using SE-Res2Net (Cited by: 3)
Abstract  Traditional Automatic Speaker Verification (ASV) systems struggle to distinguish synthetic speech, so building a speaker protection system is urgent. To counter synthetic-speech attacks on speaker verification systems, a two-channel speech feature based on Empirical Mode Decomposition (EMD) of Mel Frequency Cepstral Coefficients and Inverse Mel Frequency Cepstral Coefficients (MFCC+IMFCC) is proposed at the feature level as the front-end feature for synthetic speech detection. On the back-end classifier, the Res2Net network and the Squeeze-and-Excitation network (SENet) are cascaded to form an SE-Res2Net network, improving the model's generalization ability. The scores of different feature-model combinations are then fused to further improve performance. Experimental results on the ASVspoof2019 dataset show that the designed system detects synthetic speech effectively: compared with the baseline system of the ASVspoof2019 challenge, the Equal Error Rate (EER) and tandem Detection Cost Function (t-DCF) of the fused model are reduced by 49% and 64%, respectively.
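The squeeze-and-excitation operation that the paper cascades with Res2Net can be sketched as below. This is a minimal numpy illustration of the SE mechanism itself (global average pooling, bottleneck MLP, sigmoid channel gating); the weight shapes, reduction ratio, and function names here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def se_block(feature_maps: np.ndarray, w_squeeze: np.ndarray,
             w_excite: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation channel reweighting (illustrative sketch).

    feature_maps: (channels, height, width) activations from a conv layer.
    w_squeeze:    (channels, reduced) bottleneck projection weights.
    w_excite:     (reduced, channels) expansion projection weights.
    """
    # Squeeze: global average pooling collapses each channel map to a scalar.
    squeezed = feature_maps.mean(axis=(1, 2))             # (channels,)
    # Excitation: bottleneck MLP, ReLU then sigmoid, yields per-channel gates.
    hidden = np.maximum(squeezed @ w_squeeze, 0.0)        # (reduced,)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w_excite)))    # (channels,) in (0, 1)
    # Scale: reweight each channel map by its learned gate.
    return feature_maps * gates[:, None, None]
```

In a trained SE-Res2Net, `w_squeeze` and `w_excite` are learned parameters; the gating lets the network emphasize channels whose spectral patterns are most discriminative between bona fide and synthetic speech.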
Authors  LIANG Chao; GAO Yong (School of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
Source  Radio Engineering (《无线电工程》), Peking University Core Journal, 2022, No. 9, pp. 1560-1565 (6 pages)
Keywords  synthetic speech detection; Res2Net; Empirical Mode Decomposition (EMD); SENet; Equal Error Rate (EER); tandem Detection Cost Function (t-DCF)
