Research on Gradient-Optimization-Based Backdoor Identification for Large Language Models

Abstract: As large language models (LLMs) grow in popularity and are applied in more and more fields, their security problems have come to the fore. Training an LLM places extremely demanding requirements on datasets and computing resources, so most users who need one directly use open-source datasets and models from the Internet, which provides fertile ground for backdoor attacks. In a backdoor attack, the model behaves normally on clean input, just as it would if no backdoor had been injected, but produces abnormal output when the input carries a backdoor trigger. An effective way to defend against backdoor attacks is backdoor identification. Gradient-based optimization methods are currently in common use, but the settings of their internal impact factors affect identification performance. This paper experimentally measures the effects of three such factors (the number of word tokens, the number of nearest neighbors, and the noise scale) and analyzes their mechanisms of action, to provide a reference for researchers who use these methods in the future.
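The abstract names three factors but does not spell out the identification algorithm, so the following is only a minimal sketch of where such factors could enter a gradient-based trigger search. Everything in it is an assumption made for illustration: the function `invert_trigger`, the toy linear scorer `w`, the random embedding table `E`, and the parameter names `n_tokens`, `k_neighbors`, and `noise_scale` are hypothetical and are not taken from the paper. The sketch optimizes continuous trigger embeddings by gradient descent on noisy inputs, then maps each optimized vector to its nearest vocabulary tokens by cosine similarity.

```python
import numpy as np

def invert_trigger(E, w, n_tokens=3, k_neighbors=5, noise_scale=0.05,
                   steps=200, lr=0.5, seed=0):
    """Toy gradient-based trigger search (illustrative assumption only).

    E: (vocab_size, dim) embedding table; w: (dim,) weights of a toy linear
    scorer whose logit for the attacker's target class is w . mean(embeddings).
    Returns an (n_tokens, k_neighbors) array of candidate token ids.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros((n_tokens, E.shape[1]))      # continuous trigger embeddings
    for _ in range(steps):
        # Evaluate the loss at a noisy copy of the current trigger.
        noisy = z + noise_scale * rng.standard_normal(z.shape)
        s = w @ noisy.mean(axis=0)            # target-class logit
        p = 1.0 / (1.0 + np.exp(-s))          # sigmoid(s)
        # Gradient of -log(sigmoid(s)) with respect to each trigger row.
        grad = -(1.0 - p) * w / n_tokens
        z -= lr * grad                        # same step for every row
    # Project each optimized vector onto its nearest vocabulary tokens.
    E_unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    z_unit = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    sims = z_unit @ E_unit.T                  # cosine similarities
    return np.argsort(-sims, axis=1)[:, :k_neighbors]

# Hypothetical setup: plant token 7 as the direction the toy scorer rewards.
E = np.random.default_rng(42).standard_normal((50, 8))
candidates = invert_trigger(E, w=E[7])
print(candidates)
```

In this linear toy the noise only changes the step size, because the gradient direction is always `w`; in a real LLM the loss surface is nonlinear, so the noise scale, the trigger length, and the number of neighbors kept at the projection step would each change which tokens are recovered.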

Authors: Chen Jiahua (陈佳华); Chen Yu (陈宇); Cao Qi (曹婍) (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610066, China; School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; CAS Key Laboratory of AI Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

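Affiliations: School of Information and Software Engineering, University of Electronic Science and Technology of China; School of Computer Science, Beijing University of Posts and Telecommunications; CAS Key Laboratory of AI Security, Institute of Computing Technology, Chinese Academy of Sciences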

Source: Cyber Security and Data Governance (《网络安全与数据治理》), 2023, No. 12, pp. 14-19 (6 pages)

Funding: National Key Research and Development Program of China (2022YFB3103700, 2022YFB3103701).

Keywords: large language models; backdoor attack; gradient-based backdoor identification; impact factor

Classification number: TP309 [Automation and Computer Technology: Computer System Architecture]
