Differential Privacy Data Synthesis Method Based on Latent Diffusion Model

Abstract: Data sharing and publication unlock the value of data and drive scientific progress and socio-economic development in the era of digital intelligence. Protecting data copyright and personal privacy while sharing data nevertheless remains a major challenge. Differential privacy data synthesis is an effective means of protecting data privacy: data holders release synthetic data in place of real data, which protects privacy while also improving the generality and availability of the data. To address the low utility of image samples synthesized by existing differentially private generative models, this paper proposes a two-stage differentially private generative model based on a latent diffusion model. First, differential privacy-aware information compression is applied to the original images, projecting them from pixel space into a latent space to obtain desensitized latent vector representations of the original sensitive data. The latent vectors are then fed into a diffusion model, which gradually transforms them into a prior distribution, and new samples are drawn through the denoising process. Finally, the model is trained and used for data synthesis on the MNIST and Fashion MNIST datasets. The results show clear improvements in Fréchet inception distance (FID) and downstream task accuracy over state-of-the-art models such as DP-Sinkhorn.
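The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how differential privacy is enforced in the compression stage, so the DP-SGD-style clip-and-noise aggregation below is an assumption, as are the linear noise schedule, the function names (`linear_alpha_bars`, `q_sample`, `dp_average_gradient`), and all parameter values. The forward step uses the standard closed form z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε, which is one common way a latent vector is gradually turned into a Gaussian prior.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(z0, t, alpha_bars, rng):
    """Forward diffusion: draw z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for z in z0]


def dp_average_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD-style aggregation (an assumption, not confirmed by the abstract):
    clip each per-sample gradient to L2 norm clip_norm, sum the clipped
    gradients, add Gaussian noise with std noise_multiplier * clip_norm,
    then divide by the batch size."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(g):
            total[i] += x * scale
    sigma = noise_multiplier * clip_norm
    n = len(per_sample_grads)
    return [(s + rng.gauss(0.0, sigma)) / n for s in total]


# Hypothetical usage: a 4-dimensional latent vector and a 1000-step schedule.
rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
z0 = [0.5, -1.2, 0.3, 2.0]
z_early = q_sample(z0, 0, alpha_bars, rng)    # still close to z0
z_late = q_sample(z0, 999, alpha_bars, rng)   # nearly pure Gaussian noise

# Hypothetical per-sample gradients for one training batch of the encoder.
noisy_grad = dp_average_gradient(
    [[3.0, 4.0], [0.1, -0.2]], clip_norm=1.0, noise_multiplier=1.1, rng=rng
)
```

In a two-stage design like the one described, the privacy cost would be paid while the compression model is trained on the sensitive images, which is why the clipping step sits in the first stage of this sketch. The diffusion stage then only sees the latent vectors.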

Authors: GE Yinchi; ZHANG Hui; SUN Haohang (State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing 100191, China)

Affiliation: State Key Laboratory of Complex & Critical Software Environment, Beihang University

Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, No. 3, pp. 30-38 (9 pages)

Keywords: differential privacy; data synthesis; generative models; autoencoder; diffusion models

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
