Abstract
Detecting code clones is a key problem in software maintenance and in software infringement disputes. For reasons such as trade secrecy, clone detection techniques based on source code comparison often cannot be used in infringement disputes over commercial software. For scenarios in which the source code is unavailable, this paper therefore proposes a code clone detection method based on the analysis of binary executable files. First, the instruction type sequences of the binary executable files are obtained through decompilation and instruction type abstraction. Then, a suffix tree is built over these instruction type sequences, and its properties are used to extract function-level clone information between instruction sequences; eliminating gravel instructions further improves detection performance. Finally, based on the MIPS32 instruction set, the Linux kernel and obfuscated code are used as binary test samples for clone levels 0 to 2 and clone levels 1 to 4, respectively, and the method is compared with source code detection tools. The results show that, even without source code, the proposed algorithm can perform fine-grained clone analysis and achieves good detection performance for code clones at all levels.
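The pipeline described in the abstract (instruction type abstraction, gravel elimination, function-level matching) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `TYPE_OF` table, the choice of `nop` as the only gravel instruction, and the `clone_ratio` score are hypothetical and are not taken from the paper. The paper matches sequences with a suffix tree; this sketch swaps in a plain dynamic-programming longest-common-run search, which finds the same longest shared run for a single pair of functions but does not scale the way a suffix tree does.

```python
# Hypothetical mapping from MIPS32 mnemonics to coarse instruction types.
# The paper's actual type table is not given in the abstract.
TYPE_OF = {
    "add": "ARITH", "addu": "ARITH", "addiu": "ARITH",
    "sub": "ARITH", "subu": "ARITH",
    "and": "LOGIC", "or": "LOGIC", "xor": "LOGIC",
    "lw": "LOAD", "lb": "LOAD",
    "sw": "STORE", "sb": "STORE",
    "beq": "BRANCH", "bne": "BRANCH",
    "j": "JUMP", "jal": "JUMP", "jr": "JUMP",
}

# Assumption: treat filler instructions such as nop as "gravel" to be removed.
GRAVEL = {"nop"}


def abstract_types(mnemonics):
    """Drop gravel instructions and map the rest to instruction types."""
    return [TYPE_OF.get(m, "OTHER") for m in mnemonics if m not in GRAVEL]


def longest_common_run(a, b):
    """Return the longest contiguous run shared by sequences a and b.

    Stand-in for the suffix-tree lookup: O(len(a) * len(b)) dynamic programming.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def clone_ratio(func_a, func_b):
    """Hypothetical score: shared run length over the shorter function length."""
    a, b = abstract_types(func_a), abstract_types(func_b)
    if not a or not b:
        return 0.0
    return len(longest_common_run(a, b)) / min(len(a), len(b))


if __name__ == "__main__":
    f = ["lw", "addu", "nop", "sw", "jr"]
    g = ["lb", "addiu", "sb", "nop", "jr"]
    print(clone_ratio(f, g))
```

In this toy input the two functions use different mnemonics and place `nop` differently, yet they abstract to the same type sequence, which is the kind of similarity that type abstraction and gravel elimination are meant to expose.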
Authors
ZHANG Ling-hao (张凌浩)
GUI Sheng-lin (桂盛霖)
MU Feng-jun (穆逢君)
WANG Sheng (王胜)
Affiliations: State Grid Sichuan Electric Power Research Institute, Chengdu 610000, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; The 30th Institute of China Electronics Technology Group Corporation, Chengdu 610041, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2019, Issue 10, pp. 141-147 (7 pages)
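Funding
National Natural Science Foundation of China (61401067)
Science and Technology Project of State Grid Sichuan Electric Power Company (521997170001P, 521997170017)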
Keywords
Code clone
Binary executable file
Suffix tree
Performance optimization