Cross-View Temporal Contrastive Learning for Self-Supervised Video Representation

Original title: 跨视图时序对比学习的自监督视频表征算法

Abstract: Existing self-supervised video representation algorithms focus mainly on short-term motion between video frames. However, the action sequence changes little from frame to frame, the limited semantics of single-view data constrain the expressive power of deep features, and the rich multi-view information in video actions is not fully used. To address this, a temporal contrastive learning algorithm based on cross-view semantic consistency is proposed. It learns, in a self-supervised manner, the temporal variation of actions contained in two kinds of data: RGB frames and optical flow fields. The main ideas are as follows. First, a local temporal contrastive learning method is designed that adopts different positive and negative sample partition strategies to mine the temporal correlation and discriminability between non-overlapping clips of the same instance, which strengthens fine-grained feature representation. Second, a global contrastive learning method is studied that increases the number of positive samples through cross-view semantic co-training and learns the semantic consistency of different views across multiple instances, which improves the generalization ability of the model. The model is evaluated on two downstream tasks. Experimental results on the UCF101 and HMDB51 datasets show that, on action recognition and video retrieval, the proposed method improves on leading mainstream methods by 2 to 3.5 percentage points on average.
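The abstract does not give the loss functions, so the following is only a minimal sketch of how the two contrastive terms could be set up, assuming a standard multi-positive InfoNCE objective. The function name `info_nce`, the temperature, the random embeddings, and the particular choice of positives and negatives are all illustrative assumptions, not the authors' implementation. In particular, the global term here simply treats optical-flow clips of the same video as extra positives, which is a simplification of cross-view co-training.

```python
import numpy as np


def info_nce(query, keys, positive_mask, temperature=0.1):
    """Multi-positive InfoNCE loss for one query (assumed form, not from the paper).

    query: (d,) embedding; keys: (n, d) embeddings;
    positive_mask: (n,) booleans marking which keys count as positives.
    """
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = k @ q / temperature          # cosine similarity scaled by temperature
    logits = logits - logits.max()        # subtract max for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    mask = np.asarray(positive_mask, dtype=bool)
    return float(-log_prob[mask].mean())


rng = np.random.default_rng(0)
dim = 16
# Hypothetical embeddings: 3 videos x 2 non-overlapping clips, one array per view.
rgb = rng.normal(size=(3, 2, dim))
flow = rng.normal(size=(3, 2, dim))

anchor = rgb[0, 0]  # first RGB clip of video 0

# Local term (one possible partition): the other clip of the same video is the
# positive; clips of other videos are negatives.
local_keys = np.concatenate([rgb[0, 1:], rgb[1:].reshape(-1, dim)])
local_mask = np.array([True] + [False] * 4)
local_loss = info_nce(anchor, local_keys, local_mask)

# Global cross-view term (simplified): flow clips of the same video are added
# as extra positives; both views of other videos are negatives.
global_keys = np.concatenate(
    [rgb[0, 1:], flow[0], rgb[1:].reshape(-1, dim), flow[1:].reshape(-1, dim)]
)
global_mask = np.array([True] * 3 + [False] * 8)
global_loss = info_nce(anchor, global_keys, global_mask)

print(local_loss, global_loss)
```

Passing the partition as a mask keeps the loss itself fixed while the positive and negative assignment varies, which is one way to accommodate the "different partition strategies" the abstract mentions.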

Authors: WANG Lulu; XU Zengmin; ZHANG Xuelian; MENG Ruxing; LU Tao

Affiliations: Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China; Center for Applied Mathematics of Guangxi (GUET), Guilin, Guangxi 541004, China; Anview.ai, Guilin, Guangxi 541010, China; Hubei Key Laboratory of Intelligent Robot, School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue 18, pp. 158-166 (9 pages)

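Funding: Guangxi Natural Science Foundation (2024GXNSFAA010493); National Natural Science Foundation of China (61862015, 62072350); Guangxi Science and Technology Base and Talent Special Project (AD23023002, AD21220114); Guangxi Key Research and Development Program (AB17195025).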

Keywords: self-supervised learning; video representation learning; temporal contrastive learning; local contrastive learning; cross-view co-training

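Classification numbers: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]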
