Abstract
During software development, developers often search historical software repositories (HSRs) and then copy and paste the code they find in order to reuse it. An HSR also stores the bugs of the reused code and the corresponding fix information, which can help in handling similar bugs. On this basis, a similar-bug identification method based on HSR mining is proposed. First, modules with known bugs are extracted from the HSR by analyzing the change logs, and a bug module library is built. Then, a similar-code detection method based on the abstract syntax tree (AST) is used to identify code in the software under test that is similar to code in the bug module library; with the help of the corresponding bug and fix information stored in the HSR, the modules of the software under test that may contain latent bugs are identified. In addition, to improve the accuracy of similar-code detection, the AST-based code feature metrics are optimized. Experiments on 18 C programs and 164 pairs of cloned code show that the proposed method identifies all the similar code and outperforms existing tools. The effect of code similarity on similar-bug identification is verified on a manually constructed bug module library. Finally, validation on 8 large real-world C projects yields an average bug recall of 94%, indicating that mining the HSR can effectively support bug understanding for similar code that spreads across projects.
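The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper targets C code and uses optimized AST feature metrics that the abstract does not specify. As a stand-in, the sketch assumes Python's built-in `ast` module, a crude feature vector of AST node-type counts, and cosine similarity with a hypothetical threshold of 0.9. The names `ast_feature_vector`, `cosine_similarity`, `find_similar_bug_modules`, and the layout of `bug_library` are all illustrative assumptions.

```python
import ast
from collections import Counter
from math import sqrt


def ast_feature_vector(source: str) -> Counter:
    """Count AST node types in a code fragment (assumed, simplified feature metric)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_bug_modules(candidate: str, bug_library: dict, threshold: float = 0.9):
    """Return (module name, similarity, fix note) for library entries similar to candidate.

    bug_library maps a hypothetical module name to a (source, fix note) pair,
    standing in for the bug and fix information mined from an HSR.
    """
    candidate_vec = ast_feature_vector(candidate)
    matches = []
    for name, (source, fix_note) in bug_library.items():
        score = cosine_similarity(candidate_vec, ast_feature_vector(source))
        if score >= threshold:
            matches.append((name, score, fix_note))
    # Most similar known-buggy module first.
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical bug module library with one relevant and one unrelated entry.
BUG_LIBRARY = {
    "mean": (
        "def mean(xs):\n    return sum(xs) / len(xs)\n",
        "fix: guard against empty input before dividing",
    ),
    "countdown": (
        "def countdown(n):\n    while n > 0:\n        n -= 1\n",
        "fix: reject negative start values",
    ),
}

# A renamed copy of the buggy module, as might result from copy/paste reuse.
CANDIDATE = "def average(values):\n    return sum(values) / len(values)\n"

MATCHES = find_similar_bug_modules(CANDIDATE, BUG_LIBRARY)
```

Because the feature vector ignores identifier names, the renamed copy still matches the known-buggy `mean` module, and the stored fix note can be surfaced for the candidate. Raw node-type counts are far coarser than a tuned metric, so a real detector would need richer structural features to keep false positives down.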
Authors
GONG Dan (龚丹)
WANG Tiantian (王甜甜)
SU Xiaohong (苏小红)
DONG Meihan (董美含)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; Department of Computer Science and Technology, Harbin Huade University, Harbin 150001, China
Source
《系统工程与电子技术》
EI
CSCD
Peking University Core Journals (北大核心)
2020, No. 10, pp. 2399-2408 (10 pages)
Systems Engineering and Electronics
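Funding
National Natural Science Foundation of China (61672191)
National Key Research and Development Program of China, 13th Five-Year Plan (2017YFC0702204)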
Keywords
software reuse
historical software repository (HSR)
clone code
similar bug
abstract syntax tree (AST)