Spoken term detection and fine-grained localization method based on multi-scale distance matrices

Abstract: To address the low localization accuracy of existing spoken term detection methods, this paper proposes a spoken term detection and fine-grained localization method based on multi-scale distance matrices (MF-STD). The method first uses a residual convolutional network to extract features and constructs a distance matrix to model the correlation between the inputs. It then learns localization information at different scales through multi-scale segmentation and decoupled heads. Finally, the model is optimized with a multi-scale weighted localization loss, a confidence loss, and a classification loss, which enables fine-grained prediction of both keyword existence and time-domain boundaries. Experimental results on the LibriSpeech dataset show that for in-vocabulary terms, precision and intersection over union (IoU) reach 97.1% and 88.6%, respectively; for out-of-vocabulary terms, they reach 96.7% and 88.2%, respectively. Compared with existing spoken term detection and localization methods, MF-STD significantly improves detection accuracy and localization precision, which supports both the proposed method and the effectiveness of multi-scale feature modeling and fine-grained localization constraints in spoken term detection tasks.
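Two ideas from the abstract, the distance matrix between inputs and the IoU used to score time-domain boundaries, can be illustrated with a minimal Python sketch. The abstract does not state which distance metric, feature dimensions, or interval convention the paper uses, so the cosine distance, the function names (`cosine_distance`, `distance_matrix`, `temporal_iou`), and the toy frame vectors below are all assumptions chosen for illustration; this is not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors.

    Assumption: the paper's actual distance metric is not given in the abstract.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero frame as uncorrelated with everything
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(query_frames, utterance_frames):
    """Rows index query (keyword) frames, columns index utterance frames."""
    return [[cosine_distance(q, x) for x in utterance_frames]
            for q in query_frames]


def temporal_iou(pred, truth):
    """IoU of two time intervals, each given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical toy features: 2 query frames, 3 utterance frames, 2 dimensions.
query = [[1.0, 0.0], [0.0, 1.0]]
utterance = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
D = distance_matrix(query, utterance)

# Predicted keyword span vs. a hypothetical ground-truth span.
iou = temporal_iou((1.0, 2.0), (1.5, 2.5))
```

In this toy case `D` has one row per query frame and one column per utterance frame, with small entries where frames point in the same direction. The two spans overlap for 0.5 s out of a 1.5 s union, so `iou` is one third. In a full system the frame vectors would come from the feature extractor rather than being written by hand.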

Authors: Li Xiangrui; Mao Qirong (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China; Jiangsu Province Big Data Ubiquitous Perception & Intelligent Agriculture Application Engineering Research Center, Zhenjiang, Jiangsu 212013, China)

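Affiliations: School of Computer Science and Communication Engineering, Jiangsu University; Jiangsu Province Big Data Ubiquitous Perception & Intelligent Agriculture Application Engineering Research Center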

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 11, pp. 3370-3375 (6 pages)

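Funding: Jiangsu Province Key Research and Development Program (BE2020036); Special Research Project of the School of Emergency Management, Jiangsu University (KY-A-01).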

Keywords: spoken term detection; speech fine-grained localization; multi-scale detection; convolutional residual network

Classification number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]

