Abstract
Open-source software defect prediction mines data from historical software repositories and uses either defect-related metrics or the syntactic and semantic features of the source code itself, together with machine learning or deep learning methods, to find software defects in advance, thereby reducing repair costs and improving product quality. Vulnerability prediction mines software instance repositories to extract and label code modules, then predicts whether new code instances contain vulnerabilities, reducing the cost of vulnerability discovery and repair. We survey the literature on software defect prediction published from 2000 to December 2022. Taking machine learning and deep learning as the starting point, we review two types of prediction models: those based on software metrics and those based on syntax and semantics. Based on these two types of models, we analyze the differences and connections between software defect prediction and vulnerability prediction. We then analyze in detail six frontier research topics: dataset sources and processing, code vector representation methods, improvement of pre-trained models, exploration of deep learning models, fine-grained prediction techniques, and model transfer between software defect prediction and vulnerability prediction. Finally, we point out future directions for software defect prediction.
Authors
Tian Xiao (田笑)
Chang Jiyou (常继友)
Zhang Chi (张弛)
Rong Jingfeng (荣景峰)
Wang Ziyu (王子昱)
Zhang Guanghua (张光华)
Wang He (王鹤)
Wu Gaofei (伍高飞)
Hu Jinglu (胡敬炉)
Zhang Yuqing (张玉清)
Affiliations: School of Cyber Engineering, Xidian University, Xi’an 710126; National Computer Network Intrusion Protection Center (University of Chinese Academy of Sciences), Beijing 101408; School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018; Guangxi Key Laboratory of Cryptography and Information Security (Guilin University of Electronic Technology), Guilin, Guangxi 541000; Graduate School of Information, Production and Systems, Waseda University, Japan 808-0135; College of Cyberspace Security, Hainan University, Haikou 570228; Zhongguancun Laboratory, Beijing 100094
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core (北大核心)
2023, Issue 7, pp. 1467-1488 (22 pages)
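Funding
Open Project of the Sichuan Provincial Key Laboratory of Advanced Cryptography and System Security (SKLACSS-202205)
Key Research and Development Program of Hainan Province (GHYF2022010, ZDYF202012)
National Natural Science Foundation of China (U1836210)
Natural Science Basic Research Program of Shaanxi Province (2021JQ-192)
Project of the Guangxi Key Laboratory of Cryptography and Information Security (GCIS202123)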
Keywords
software defect prediction
vulnerability prediction
machine learning
deep learning
metric
semantic and syntactic analysis