The Research Progress of Speech Information Processing
(Chinese title: 言语信息处理的进展)

Cited by: 3

Abstract: This paper reviews progress in speech information processing, with particular attention to the current state of Chinese speech processing. Speech information processing covers speech recognition, speaker recognition, speech synthesis, and computational speech perception. Research on recognizing accented and casually pronounced speech has given strong support to applications such as language learning and spoken-proficiency assessment. In speaker recognition, the main research focus is improving recognition accuracy under cross-channel conditions, environmental noise, multiple speakers, short utterances, and time-varying speech. Speech synthesis research concentrates on multilingual, emotional, and audio-visual speech synthesis. Computational speech perception has produced work on speech audiometry, noise-suppression algorithms, frequency-response compensation methods for hearing aids, and speech-signal enhancement algorithms. Effectively combining speech processing technology with linguistics and web technology has promoted more harmonious human-computer speech interaction.

Authors: Cai Lianhong (蔡莲红), Jia Jia (贾珈), Zheng Fang (郑方)

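Affiliations: Department of Computer Science and Technology, Tsinghua University; Center for Speech and Language Technologies, Research Institute of Information Technology, Tsinghua University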

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 6, pp. 137-141 (5 pages)

Funding: National Natural Science Foundation of China (grants 61003094, 60928005, 60805008)

Keywords: speech recognition; speaker recognition; speech synthesis; computational speech perception

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
