
Application of latent semantic analysis in continuous speech recognition
Abstract: This paper studies the theory of latent semantic analysis (LSA) and the techniques required to apply it to continuous speech recognition. An LSA model is built on the WSJ0 text corpus and interpolated with a conventional 3-gram model, yielding a hybrid statistical language model (LSA+3-gram) that incorporates semantic information. To further optimize the hybrid model, the paper proposes a k-means clustering algorithm whose initial centroids are chosen with a density function, and uses it to cluster the vectors of the LSA space. In continuous speech recognition experiments on the WSJ0 corpus, the LSA+3-gram hybrid model reduces the word error rate by about 13.3% relative to the standard 3-gram baseline.
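The abstract does not give the exact formulas for the LSA model or for the interpolation. As a rough illustration only, the sketch below follows the common formulation of LSA language modeling in the style of Bellegarda's work: an entropy-weighted word-by-document matrix, a rank-R truncated SVD, the current history folded into the LSA space, and a cosine similarity turned into a word distribution, which is then linearly interpolated with 3-gram probabilities. The entropy weighting, the exponent `gamma`, the min-shift normalization, the weight `lam`, and the function names `build_lsa_space`, `lsa_word_probs`, and `interpolate` are all assumptions, not the paper's published method.

```python
import numpy as np

def build_lsa_space(counts: np.ndarray, rank: int):
    """counts: M x N word-by-document count matrix (needs N >= 2).
    Returns the rank-R word space U (M x R), singular values s (R,),
    and each word's normalized entropy eps (M,). Assumed weighting scheme."""
    M, N = counts.shape
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts, dtype=float), where=totals > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    eps = -plogp.sum(axis=1) / np.log(N)          # 0 = topical word, 1 = spread evenly
    doc_len = counts.sum(axis=0, keepdims=True)
    W = (1.0 - eps)[:, None] * counts / np.maximum(doc_len, 1)
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    # rank must not exceed the number of non-zero singular values (s is divided by later)
    return U[:, :rank], s[:rank], eps

def lsa_word_probs(U, s, eps, history_counts, gamma=1.0):
    """Distribution over all M words given the word counts of the current history."""
    n = max(history_counts.sum(), 1)
    d = (1.0 - eps) * history_counts / n           # history weighted like a document
    v = (d @ U) / s                                # fold-in: d^T U S^{-1}
    a = U * np.sqrt(s)                             # word vectors u_i S^{1/2}
    b = v * np.sqrt(s)                             # history vector v S^{1/2}
    cos = a @ b / (np.linalg.norm(a, axis=1) * np.linalg.norm(b) + 1e-12)
    scores = (cos - cos.min()) ** gamma            # shift so every score is >= 0
    total = scores.sum()
    return scores / total if total > 0 else np.full(len(cos), 1.0 / len(cos))

def interpolate(p_ngram, p_lsa, lam=0.7):
    """Linear interpolation; lam would normally be tuned on held-out data."""
    return lam * p_ngram + (1.0 - lam) * p_lsa

# Toy usage: 4 words, 3 documents; the 3-gram probabilities are made up.
counts = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 0], [0, 0, 2]], dtype=float)
U, s, eps = build_lsa_space(counts, rank=2)
p_lsa = lsa_word_probs(U, s, eps, np.array([1.0, 0.0, 0.0, 0.0]))
p_3gram = np.array([0.4, 0.3, 0.2, 0.1])
p_mixed = interpolate(p_3gram, p_lsa, lam=0.7)
```

Because both component distributions sum to one, any `lam` in [0, 1] keeps the mixture a valid distribution; the word with the lowest similarity gets zero LSA mass, which the 3-gram term then compensates for.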
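The abstract says only that a density function is used to initialize the k-means centroids; it does not define that function. The sketch below assumes one simple choice: a point's density is the number of points within `radius` of it, the densest point becomes the first centroid, and each further centroid is the densest remaining point lying farther than `radius` from every centroid already chosen. Ordinary Lloyd iterations follow. In an LSA setting the rows of `X` might be length-normalized word vectors u_i S^{1/2}, so that Euclidean distance tracks cosine similarity. The names `density_init` and `kmeans` and the radius criterion are hypothetical.

```python
import numpy as np

def density_init(X, k, radius):
    """Pick k initial centroids that are dense and mutually well separated (assumed rule)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    density = (d2 <= radius ** 2).sum(axis=1)              # neighbours within radius, incl. self
    order = np.argsort(-density, kind="stable")            # densest points first
    chosen = []
    for i in order:
        if all(d2[i, j] > radius ** 2 for j in chosen):
            chosen.append(int(i))
            if len(chosen) == k:
                break
    # Fallback: if too few separated dense points exist, add the point farthest from the chosen set.
    while len(chosen) < k:
        chosen.append(int(d2[:, chosen].min(axis=1).argmax()))
    return X[chosen].astype(float)

def assign(X, C):
    """Index of the nearest centroid for every row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)

def kmeans(X, k, radius, n_iter=100):
    """Standard Lloyd k-means seeded by density_init."""
    C = density_init(X, k, radius)
    for _ in range(n_iter):
        labels = assign(X, C)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return assign(X, C), C

# Toy usage: two tight groups of three points around (0, 0) and (5, 5).
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
labels, C = kmeans(X, 2, radius=1.0)
# the three points near (0, 0) and the three near (5, 5) end up in separate clusters
```

Seeding from dense, well-separated points is meant to avoid the poor local optima that random initialization can produce; the choice of `radius` plays roughly the role of a bandwidth and would have to be tuned for real LSA vectors.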
Source: Computer Engineering and Applications (《计算机工程与应用》), 2009, No. 32, pp. 111-113 (3 pages). Indexed in CSCD and the Peking University core journal list.
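Funding: National Natural Science Foundation of China (No. 60573189); National High Technology Research and Development Program of China (863 Program) (Nos. 2006AA01Z139, 2006AA010107, 2006AA010108); Natural Science Foundation of Fujian Province (No. 2006J0043).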
Keywords: latent semantic analysis; N-gram; k-means clustering; continuous speech recognition