
Deepfake Detection Method Integrating Multiple Parameter-Efficient Fine-Tuning Techniques
Abstract: In recent years, as deepfake technology has matured, face-swapping applications and synthesized videos have become commonplace. While deepfake technology provides entertainment, it also gives malicious actors opportunities for abuse, so deepfake detection has become increasingly important. Existing detection methods commonly suffer from poor robustness across compression rates, weak cross-dataset generalization, and high training overhead. To address these problems, this paper proposes a deepfake detection method that integrates multiple parameter-efficient fine-tuning techniques. The method uses a vision Transformer pretrained with the masked image modeling (MIM) self-supervised method as its backbone. A low-rank adaptation (LoRA) method improved with the Kronecker product fine-tunes the parameters of the pretrained model's self-attention modules; in parallel, a convolutional adapter learns local image texture information to strengthen the pretrained model's adaptability to the deepfake detection task; a classical adapter, also attached in a parallel structure, fine-tunes the pretrained model's feed-forward networks to make full use of the knowledge acquired during pretraining; and a multi-layer perceptron replaces the original classification head to perform detection. Experiments on six mainstream datasets show that, with only 2×10^7 trainable parameters, the model achieves an average frame-level AUC of about 0.996. In cross-compression-rate experiments, the average frame-level AUC drop is 0.135; in cross-dataset generalization experiments, the frame-level AUC averages 0.765.
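The abstract describes three parameter-efficient branches grafted onto a frozen, MIM-pretrained vision Transformer: a Kronecker-product low-rank update on the self-attention projections, a parallel convolutional adapter over the patch grid, and a parallel bottleneck adapter beside the feed-forward network. The PyTorch sketch below shows how one encoder block could combine them. It is a minimal illustration assuming a ViT-Base backbone; every module name, shape, and hyperparameter (e.g. the (72,24)×(32,32) Kronecker factorization and the 64-dim adapter bottlenecks) is an assumption for illustration, not the authors' released code.

```python
# Sketch of one fine-tuned encoder block, assuming a ViT-Base backbone
# (768-dim tokens, 12 heads, a 14x14 patch grid plus one [CLS] token).
# All module names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KronLoRALinear(nn.Module):
    """Frozen linear layer plus a trainable Kronecker-product low-rank update.

    Delta W = kron(A, B); with A zero-initialized, the layer starts out
    exactly equal to the pretrained layer.
    """

    def __init__(self, base: nn.Linear, a_shape, b_shape, scale=1.0):
        super().__init__()
        out_f, in_f = base.weight.shape
        assert a_shape[0] * b_shape[0] == out_f and a_shape[1] * b_shape[1] == in_f
        self.base, self.scale = base, scale
        for p in self.base.parameters():
            p.requires_grad_(False)                     # freeze pretrained weights
        self.A = nn.Parameter(torch.zeros(a_shape))
        self.B = nn.Parameter(torch.randn(b_shape) * 0.02)

    def forward(self, x):
        delta_w = torch.kron(self.A, self.B)            # rebuilt (out_f, in_f) update
        return self.base(x) + self.scale * F.linear(x, delta_w)


class KronAttention(nn.Module):
    """Multi-head self-attention whose fused qkv projection carries the update."""

    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        # (72*32, 24*32) = (2304, 768) matches ViT-Base's fused qkv weight.
        self.qkv = KronLoRALinear(nn.Linear(dim, dim * 3), (72, 24), (32, 32))
        self.proj = nn.Linear(dim, dim)
        for p in self.proj.parameters():                # pretrained output proj, frozen
            p.requires_grad_(False)

    def forward(self, x):
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, c // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        return self.proj((attn @ v).transpose(1, 2).reshape(b, n, c))


class ConvAdapter(nn.Module):
    """Parallel convolutional adapter: 3x3 mixing over the patch grid for local texture."""

    def __init__(self, dim=768, hidden=64, grid=14):
        super().__init__()
        self.grid = grid
        self.down, self.up = nn.Linear(dim, hidden), nn.Linear(hidden, dim)
        self.conv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1)

    def forward(self, tokens):                          # (B, 1 + grid^2, dim)
        cls, patch = tokens[:, :1], tokens[:, 1:]
        b = patch.shape[0]
        h = self.down(patch).transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        h = F.gelu(self.conv(h)).flatten(2).transpose(1, 2)
        return torch.cat([torch.zeros_like(cls), self.up(h)], dim=1)  # [CLS] untouched


class Adapter(nn.Module):
    """Classical bottleneck adapter, attached in parallel with the frozen FFN."""

    def __init__(self, dim=768, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return self.net(x)


class PEFTBlock(nn.Module):
    """One encoder block combining the three parameter-efficient branches."""

    def __init__(self, dim=768):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = KronAttention(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        for p in self.ffn.parameters():                 # pretrained FFN stays frozen
            p.requires_grad_(False)
        self.conv_adapter, self.ffn_adapter = ConvAdapter(dim), Adapter(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h) + self.conv_adapter(h)     # parallel conv branch
        h = self.norm2(x)
        return x + self.ffn(h) + self.ffn_adapter(h)    # parallel classical adapter


# Shape check on a ViT-Base-sized sequence: 1 [CLS] + 196 patch tokens.
print(PEFTBlock()(torch.randn(2, 197, 768)).shape)      # torch.Size([2, 197, 768])
```

Under these assumptions, the Kronecker factorization kron(A, B) spans a full 2304×768 update from only 72×24 + 32×32 ≈ 2.8×10^3 trainable values, and zero-initializing A keeps each block identical to the pretrained model at the start of fine-tuning, which is the usual motivation for pairing such adapters with a frozen backbone.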
Authors: ZHANG Yiwen (张溢文), CAI Manchun (蔡满春), CHEN Yonghao (陈咏豪), ZHU Yi (朱懿), YAO Lifeng (姚利峰) (College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》; CSCD, Peking University Core Journal), 2024, No. 12, pp. 3335-3347 (13 pages)
Funding: Double First-Class Innovation Research Project in Cyberspace Security Law Enforcement Technology, People's Public Security University of China (2023SYL07)
Keywords: deepfakes; vision Transformer; self-supervised pretrained models; low-rank adaptation (LoRA); parameter-efficient fine-tuning