Abstract
In speaker verification, the commonly used acoustic features (e.g., MFCC and PLP) mainly carry phonetic content and channel information; speaker information is only weakly present in them and is easily disturbed by the phonetic content, channel variation, and noise in the speech signal. To address this problem, this paper proposes a method for extracting speaker features from the speech signal based on a deep neural network (DNN) trained for speech recognition, using the nonlinear outputs of the DNN's hidden layers as speaker features. Text-independent and text-dependent GMM-UBM speaker verification experiments on the RSR2015 database show that, compared with conventional MFCC features, the proposed features yield a clear reduction in the system's equal error rate (EER).
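The core idea in the abstract, taking the nonlinear hidden-layer outputs of an ASR-style DNN as per-frame speaker features, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, the sigmoid nonlinearity, the random (untrained) weights, and the synthetic MFCC frame are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    """Elementwise logistic nonlinearity."""
    return 1.0 / (1.0 + np.exp(-x))

# Toy feed-forward DNN: input is one 39-dim MFCC frame, two hidden layers,
# and an output layer standing in for ASR senone targets (sizes are
# illustrative, not from the paper).
layer_dims = [39, 512, 512, 1000]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_dims[:-1], layer_dims[1:])]
biases = [np.zeros(n) for n in layer_dims[1:]]

def hidden_activations(frame):
    """Forward one frame and collect each hidden layer's nonlinear output.

    These activation vectors are the candidate speaker features; the
    ASR output layer itself is skipped.
    """
    h = frame
    feats = []
    for W, b in zip(weights[:-1], biases[:-1]):  # stop before output layer
        h = sigmoid(h @ W + b)
        feats.append(h)
    return feats

frame = rng.standard_normal(39)   # synthetic stand-in for an MFCC frame
feats = hidden_activations(frame)
print([f.shape for f in feats])   # one 512-dim feature vector per hidden layer
```

In the paper's setup, such per-frame vectors would replace MFCCs as the front-end features fed to a GMM-UBM back end.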
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2017, No. 1, pp. 142-146 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (61273264)