Research on differential privacy protection for the Stacking algorithm

Abstract: Homogeneous ensemble learning algorithms are comparatively sensitive to noise, which makes it hard to achieve both good predictive performance and effective privacy protection. To address this, the paper proposes DPStacking, a differential-privacy-based algorithm that combines the heterogeneous Stacking algorithm with differential privacy to improve both privacy protection and predictive performance. Because the low-level and high-level models of Stacking can each be built from different learners, a privacy budget allocation scheme designed for one specific learner generally does not carry over to a Stacking algorithm composed of arbitrary base learners and meta-learners. The paper therefore designs a privacy budget allocation scheme based on the meta-learner, which assigns different privacy budgets to the different components of the meta-learner's input according to the Pearson correlation coefficient and the parallel composition property of differential privacy. Theoretical analysis and experiments show that DPStacking satisfies ε-differential privacy. Compared with the differentially private random forest algorithm (DiffRFs), AdaBoost algorithm (DP-AdaBoost), and XGBoost algorithm (DPXGB), it protects data privacy effectively while achieving better predictive performance, and it better mitigates the noise sensitivity of a single homogeneous ensemble learning algorithm.
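The abstract names the ingredients of the budget allocation scheme (Pearson correlation coefficients and parallel composition) but does not give the formula. The sketch below is only one plausible reading, not the authors' method. It assumes each column of the meta-learner input is one "component", weights it by the absolute Pearson correlation with the label, and scales the budgets so that the largest one equals the total ε. That scaling relies on the parallel composition property: mechanisms applied to disjoint parts of the data, each satisfying ε_i-differential privacy, jointly satisfy (max ε_i)-differential privacy. The function names, the `min_weight` floor, and the unit sensitivity are all hypothetical choices made for illustration.

```python
import numpy as np


def pearson_weights(meta_features, y):
    """Absolute Pearson correlation between each meta-feature column and y."""
    weights = []
    for j in range(meta_features.shape[1]):
        r = np.corrcoef(meta_features[:, j], y)[0, 1]
        # A constant column yields NaN; treat it as uncorrelated.
        weights.append(0.0 if np.isnan(r) else abs(r))
    return np.array(weights)


def allocate_budgets(meta_features, y, total_epsilon, min_weight=1e-3):
    """Assumed rule: epsilon_i proportional to |r_i|, with max epsilon_i = total.

    Under parallel composition the overall guarantee is max(epsilon_i),
    so the most correlated component receives the full budget.
    """
    w = np.maximum(pearson_weights(meta_features, y), min_weight)
    return total_epsilon * w / w.max()


def add_laplace_noise(column, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism with scale = sensitivity / epsilon (sensitivity assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    return column + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=column.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200).astype(float)
    # Two hypothetical base-learner outputs: one informative, one not.
    meta = np.column_stack([y + 0.1 * rng.normal(size=200), rng.normal(size=200)])
    eps = allocate_budgets(meta, y, total_epsilon=1.0)
    noisy = np.column_stack(
        [add_laplace_noise(meta[:, j], eps[j], rng=rng) for j in range(meta.shape[1])]
    )
    print(eps, noisy.shape)
```

Under this assumed rule, a component that correlates weakly with the label receives a smaller budget and therefore more noise, while the most informative component is perturbed least. Whether the paper allocates in this direction, and how it defines sensitivity, cannot be determined from the abstract alone.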

Authors: DONG Yan-ling (董燕灵); ZHANG Shu-fen (张淑芬); XU Jing-cheng (徐精诚); WANG Hao-shi (王豪石) (College of Science, North China University of Science and Technology, Tangshan 063210; Hebei Key Laboratory of Data Science and Application, Tangshan 063210; Tangshan Key Laboratory of Data Science, Tangshan 063210; Tangshan Key Laboratory of Big Data Security and Intelligent Computing, Tangshan 063210, China)

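Affiliations: College of Science, North China University of Science and Technology; Hebei Key Laboratory of Data Science and Application; Tangshan Key Laboratory of Data Science; Tangshan Key Laboratory of Big Data Security and Intelligent Computing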

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 2, pp. 244-252 (9 pages)

Funding: National Natural Science Foundation of China (U20A20179).

Keywords: differential privacy; privacy budget allocation; Stacking algorithm; ensemble learning

Classification code: TP309.2 [Automation and Computer Technology: Computer System Architecture]



