Abstract
This paper proposes a robust bootstrapping framework for speaker indexing. It combines a multi-eigenspace modeling technique based on regression classes (RC-MES), which performs eigenvoice analysis within each regression class of a regression tree so that speaker models can be trained from sparse data, with a short-segment clustering strategy that keeps overly short speech segments from degrading bootstrapping. Experiments on nearly 8 hours of recorded real-world discussions show that the method remains robust even when the average bootstrapping segment is shorter than 5 seconds: it improves speaker change detection and outperforms conventional bootstrapping methods.
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2007, Issue 3, pp. 608-616 (9 pages)
Journal of Software
Funding
Supported by the Science & Technology Research and Development Plan of Shaanxi Province of China under Grant No. 2005k04G23 (陕西省科学技术研究发展计划)
Keywords
speaker indexing (说话人检索)
speaker model (说话人模型)
regression class (回归类)
eigenvoice (特征音)