
Unsupervised monocular image depth prediction in football sports scenes based on improved FeatDepth
Abstract
[Objective] To reduce cost while improving the accuracy of image depth prediction, and to apply depth estimation to complex football scenes, an unsupervised monocular image depth prediction method based on an improved FeatDepth is proposed. The monocular depth estimation model obtains the relative depth of the players, the football, and the goal in a football scene, from which the distances between targets can be calculated. This supports applications such as auxiliary football training and offside monitoring.
[Methods] First, an attention mechanism is introduced into the original FeatDepth so that the model pays more attention to effective feature information. Second, the GAM global attention mechanism module is embedded into both the PoseNet and DepthNet networks of FeatDepth, adding extra context information to the networks and improving the depth prediction performance of FeatDepth with essentially no increase in computational cost. Third, because football scenes place higher demands on depth prediction, the loss function scheme of the original FeatDepth is adopted so that the model performs better in low-texture areas and on details. The final loss function combines a single-view reconstruction loss and a cross-view reconstruction loss: the single-view reconstruction loss consists of discriminative and convergent losses built on the reconstruction loss, and the cross-view reconstruction loss consists of feature-metric and photometric losses. A dataset was then built from the parts of the public KITTI dataset that contain more person scenes, with 4,721 images in the training set, 631 in the validation set, and 584 in the test set. Model comparison experiments were conducted to verify the effectiveness of the improvement strategy.
[Results] The improved model with the GAM global attention mechanism module is called G-FeatDepth. In comparative experiments on this dataset, G-FeatDepth improves on the original FeatDepth model on every evaluation index: absolute relative error decreases by 0.007, squared relative error by 0.051, root-mean-square error by 0.032, and root-mean-square logarithmic error by 0.005, while accuracy at threshold δ < 1.25 increases by 0.009, at δ < 1.25² by 0.004, and at δ < 1.25³ by 0.002. The model therefore lowers every error index and raises every accuracy index. In actual inference on the dataset, G-FeatDepth also predicts depth better in low-texture areas and on details than the other models.
[Conclusions] Comparing the inference results of each model on image data from football scenes shows that the improved model, G-FeatDepth, predicts depth better in low-texture areas (e.g., the football and the goal) and on details (e.g., limbs). It is therefore better suited to predicting depth information in football scenes, and achieves the goal of applying an unsupervised monocular depth estimation model to football sports scenes.
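The abstract reports seven evaluation indices but does not give their formulas. The following is a minimal sketch of how they are commonly computed for KITTI-style depth evaluation, assuming the conventional definitions apply here. The function name `depth_metrics`, the dictionary keys, and the rule that non-positive ground-truth values mark missing depth are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Conventional monocular-depth error and accuracy metrics.

    Hypothetical helper: pred and gt are arrays of predicted and
    ground-truth depth with the same shape. Predictions are assumed
    to be strictly positive.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Assumption: non-positive ground truth means "no measurement".
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    diff = pred - gt
    # Per-pixel ratio used by the threshold accuracies (always >= 1).
    ratio = np.maximum(pred / gt, gt / pred)

    return {
        "abs_rel": np.mean(np.abs(diff) / gt),    # absolute relative error
        "sq_rel": np.mean(diff ** 2 / gt),        # squared relative error
        "rmse": np.sqrt(np.mean(diff ** 2)),      # root-mean-square error
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "delta_1": np.mean(ratio < 1.25),         # accuracy, delta < 1.25
        "delta_2": np.mean(ratio < 1.25 ** 2),    # accuracy, delta < 1.25^2
        "delta_3": np.mean(ratio < 1.25 ** 3),    # accuracy, delta < 1.25^3
    }
```

Under these definitions, lower is better for the four error terms and higher is better for the three threshold accuracies, which matches the direction of the changes reported in the results.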
Authors: FU Huixuan (傅荟璇); XU Quanwen (徐权文); WANG Yuchao (王宇超) (College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China)
Source: Experimental Technology and Management (《实验技术与管理》), indexed in CAS and the Peking University Core Journals list, 2024, No. 10, pp. 74-84 (11 pages)
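Funding: National Natural Science Foundation of China, General Program (52271313); Fundamental Research Funds for the Central Universities (3072024GH0405).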
Keywords: football sports scenes; unsupervised monocular depth estimation; FeatDepth; attention mechanism; GAM; image reconstruction