
A Machine Unlearning Method via Feature Constraint and Adaptive Loss Balance
Abstract  With the accelerated advancement of digitization, data elements have become a core driving force of modern society. At the same time, data security issues have grown increasingly prominent: data breaches and privacy violations occur frequently and cause serious losses to individuals, organizations, and even countries. Against this backdrop, the security of data elements has drawn attention from all sectors of society, and data privacy protection in deep learning models has likewise attracted widespread concern. Machine unlearning, a key technology for protecting user privacy, aims to enable a model to remove the influence of specific data while maintaining its generalization performance on the remaining data, providing an effective solution for protecting data elements in deep learning models. Existing machine unlearning methods fall into two categories: exact unlearning and approximate unlearning. Exact unlearning requires intervening in the model's original training process, whereas approximate unlearning struggles to balance unlearning effectiveness against model generalization. To address these issues, we propose an approximate unlearning framework based on feature constraints and adaptive loss balancing, organized as a "forgetting-recovering" process. First, for the "forgetting" step, a randomly initialized model that has never been trained on the forgetting samples serves as a teacher to guide the feature outputs of the unlearning model, enforcing forgetting at the feature level so that information about the forgotten data cannot easily be extracted from the model. Then, a small amount of remaining data is used for fine-tuning to "recover" the model's generalization performance on the other data. We further cast this framework as a multi-task optimization problem and introduce adaptive loss balancing to automatically weigh the "forgetting" and "recovering" tasks, preventing the model from over-forgetting or over-recovering and allowing both tasks to be trained steadily. Taking convolutional neural network models as an example, we compare against UNSIR and several other baselines on three public image classification datasets. The results show that the unlearned model not only achieves the intended forgetting effect but also outperforms comparable methods on remaining-data accuracy, time overhead, and the distribution of prediction results, thereby better preserving the model's generalization performance.
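The "forgetting-recovering" objective described in the abstract can be illustrated with a short, hypothetical PyTorch sketch. The helper unlearning_step, the features() accessor on the models, and the uncertainty-style weighting via learnable log-variances are assumptions made purely for illustration; the paper's exact architecture and adaptive balancing scheme may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

def unlearning_step(unlearn_model, random_teacher, forget_batch, retain_batch,
                    log_vars, optimizer):
    # One joint optimization step over the "forgetting" and "recovering" losses.
    x_f, _ = forget_batch      # samples whose influence should be removed
    x_r, y_r = retain_batch    # small subset of the remaining data

    # Forgetting: pull the unlearning model's features on the forget samples toward
    # those of a randomly initialized teacher that never saw these samples.
    with torch.no_grad():
        target_feat = random_teacher.features(x_f)   # features() is an assumed accessor
    forget_loss = F.mse_loss(unlearn_model.features(x_f), target_feat)

    # Recovering: plain fine-tuning on a small amount of remaining data.
    recover_loss = F.cross_entropy(unlearn_model(x_r), y_r)

    # Adaptive loss balance (assumed uncertainty-style weighting): learnable
    # log-variances scale each task loss so neither task dominates training.
    total = (torch.exp(-log_vars[0]) * forget_loss
             + torch.exp(-log_vars[1]) * recover_loss
             + log_vars.sum())

    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return forget_loss.item(), recover_loss.item()

# Usage (assumed): log_vars = nn.Parameter(torch.zeros(2)) and an optimizer over
# list(unlearn_model.parameters()) + [log_vars]; random_teacher stays frozen.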
Authors  Yin Yuyu; Wu Guangqiang; Li Youhuizi; Wang Xinyu; Gao Honghao (School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018; Key Laboratory of Complex Systems Modeling and Simulation (Hangzhou Dianzi University), Ministry of Education, Hangzhou 310018; School of Computer Engineering and Science, Shanghai University, Shanghai 200444; Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea 461701)
Source  Journal of Computer Research and Development, 2024, No. 10, pp. 2649-2661 (13 pages). Indexed in EI, CSCD, and the Peking University Core Journal list.
Funding  National Natural Science Foundation of China (62272140); Natural Science Foundation of Zhejiang Province (LY22F020018); Zhejiang Province "Pioneer" and "Leading Goose" R&D Program (2024C01166).
Keywords  data element security; machine unlearning; feature constraints; multi-task optimization; adaptive loss balance