Abstract
Vision-based 3D hand pose estimation is an important technique for human-computer interaction. Current vision-based hand pose estimation algorithms are easily disturbed by complex environmental factors such as illumination changes, occlusion, and environmental noise, so the robustness of the model cannot be guaranteed. These variable factors make it difficult for traditional deep learning methods to achieve satisfactory results in real-world scenarios. To address this problem, this paper proposes a hand pose estimation algorithm based on feature disentanglement, which refines the key features in hand images to improve the model's robustness across different environments. First, the encoder is pretrained with frequency-domain (spectrum) augmentation, reducing the influence of environmental noise on low-level visual feature extraction. Second, a dual-branch structure is introduced in the decoding stage to separate causal features from non-causal features, reducing the impact of non-causal features on the pose estimation task and improving the model's ability to handle complex environments. Finally, global pose information and local joint information are fused to achieve unified optimization across scales. Quantitative and qualitative analyses on two public datasets verify the accuracy and robustness of the proposed method.
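The abstract does not say how the frequency-domain augmentation is carried out. One common form of such augmentation perturbs the Fourier amplitude of an image while leaving its phase unchanged, on the assumption that phase carries most of the structural content. The sketch below illustrates that general idea only; the function name `amplitude_jitter`, the `strength` parameter, and the uniform scaling are assumptions for illustration, not the authors' method.

```python
import numpy as np


def amplitude_jitter(image, strength=0.2, rng=None):
    """Randomly rescale the Fourier amplitude of an image, keeping its phase.

    Hypothetical helper for illustration; not taken from the paper.
    image: float array of shape (H, W) or (H, W, C) with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    # 2D FFT over the two spatial axes (channels, if any, are transformed independently)
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    # Per-frequency scale factors drawn from [1 - strength, 1 + strength]
    scale = rng.uniform(1.0 - strength, 1.0 + strength, size=amplitude.shape)
    perturbed = (amplitude * scale) * np.exp(1j * phase)
    # Independent scaling breaks conjugate symmetry, so the inverse
    # transform is complex in general; keep only the real part
    out = np.fft.ifft2(perturbed, axes=(0, 1)).real
    return np.clip(out, 0.0, 1.0)
```

With `strength=0.0` the function returns the input image up to floating-point error, which makes it easy to check the transform round trip before enabling the perturbation.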
Authors
高鲲
张皓洋
李达
闫野
印二威
GAO Kun; ZHANG Haoyang; LI Da; YAN Ye; YIN Erwei (College of Engineering, Peking University, Beijing 100091, China; National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100071, China; Intelligent Game and Decision Laboratory, Beijing 100071, China; Tianjin Artificial Intelligence Innovation Center, Tianjin 100071, China; College of Software, Nankai University, Tianjin 300071, China)
Source
《智能安全》
2024, No. 3, pp. 54-65 (12 pages)
Artificial Intelligence Security
Funding
National Key Research and Development Program of China (2023YFF1203900, 2023YFF1203903)
National Natural Science Foundation of China (62332019, 62076250)
Keywords
feature separation
complicated environment
3D hand pose estimation
causal and non-causal feature disentanglement
fusion of global and local information