Comparison of Different Implementations of MFCC (Cited by: 19)


Abstract: The performance of the Mel-Frequency Cepstrum Coefficients (MFCC) may be affected by (1) the number of filters, (2) the shape of the filters, (3) the way in which the filters are spaced, and (4) the way in which the power spectrum is warped. In this paper, several comparison experiments are conducted to find the best implementation. The traditional MFCC calculation excludes the 0th coefficient because it is regarded as somewhat unreliable. Based on analysis and experiments, the authors find that it can be regarded as the generalized frequency band energy (FBE) and is hence useful, which results in the FBE-MFCC. The authors also propose a better analysis of the frame energy, namely auto-regressive analysis, which outperforms its 1st- and/or 2nd-order differential derivatives. Experiments with the '863' Speech Database show that, compared with the traditional MFCC with its corresponding auto-regressive analysis coefficients, the FBE-MFCC and the frame energy with their corresponding auto-regressive analysis coefficients form the best combination, reducing the Chinese syllable error rate (CSER) by about 10%, while the FBE-MFCC with the corresponding auto-regressive analysis coefficients reduces the CSER by 2.5%. Comparison experiments are also conducted on a highly casual Chinese speech database, the Chinese Annotated Spontaneous Speech (CASS) corpus, where the FBE-MFCC reduces the error rate by about 2.9% on average.
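The four factors listed in the abstract (filter count, filter shape, filter spacing, and spectrum warping) correspond to the parameters of a generic MFCC front end. The following is a minimal sketch of such a front end, not the authors' implementation. It assumes triangular filters, the common 2595·log10(1 + f/700) mel formula, a Hamming window, and a naive DFT so that it runs on the standard library alone. All function and parameter names, including `keep_c0`, are hypothetical. The auto-regressive analysis mentioned in the abstract is not sketched because the abstract does not define it.

```python
import cmath
import math

def hz_to_mel(f):
    # Assumed warping: a widely used mel formula, not necessarily the paper's.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(n_filters)]
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge
            bank[i][k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            bank[i][k] = (right - k) / (right - center)
    return bank

def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mfcc(frame, sample_rate=16000, n_filters=24, n_ceps=13, keep_c0=True):
    """Cepstral coefficients of one frame; keep_c0=False drops coefficient 0."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log energy in each filter; the small constant guards against log(0).
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
             for row in bank]
    # Unnormalized DCT-II of the log filter-bank energies.
    ceps = [sum(e * math.cos(math.pi * k * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for k in range(n_ceps)]
    return ceps if keep_c0 else ceps[1:]

if __name__ == "__main__":
    # One 256-sample frame of a 440 Hz tone sampled at 16 kHz.
    frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(256)]
    print(len(mfcc(frame)), len(mfcc(frame, keep_c0=False)))
```

In this sketch, coefficient 0 is simply the sum of the log filter-bank energies, so `keep_c0=False` mirrors the traditional practice of discarding it, while `n_filters` and the filter construction in `mel_filterbank` are the places where the filter count, shape, and spacing could be varied.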

Authors: ZHENG Fang (郑方), ZHANG Guoliang (张国亮), SONG Zhanjiang (宋战江)

Affiliation: Center of Speech Technology

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2001, No. 6, pp. 582-589 (8 pages). Indexed in SCIE, EI, CSCD.

Keywords: MFCC, frequency band energy, auto-regressive analysis, generalized initial/final

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]



