Decompilation analyzes and transforms low-level programming language (PL) code, such as binary or assembly code, into equivalent high-level PL code. It plays a vital role in cyberspace security, for example in software vulnerability discovery and analysis and in malicious code detection and analysis, and in software engineering, for example in source code analysis, optimization, and cross-language and cross-operating-system migration. Unfortunately, existing decompilers rely mainly on rules written by experts, which leads to bottlenecks such as low scalability, difficult development, and long development cycles. The generated high-level PL code often violates coding conventions, and its readability remains relatively low. These problems hinder the efficiency of advanced applications (e.g., vulnerability discovery) built on decompiled high-level PL code. In this paper, we propose a decompilation approach based on attention-based neural machine translation (NMT), which converts low-level PL into high-level PL that is readable and functionally similar. To compensate for the information asymmetry between low-level and high-level PL, we design a translation method based on the basic operations of the low-level PL. This method improves the generalization of the NMT model and captures the translation rules between PLs more accurately and efficiently. We also implement a neural decompilation framework called Neutron. An evaluation on two practical applications shows that Neutron's average program accuracy is 96.96%, which is better than that of the traditional NMT model.
Program comprehension is one of the most important applications of decompilation. The more abstract the decompilation result, the easier it is to understand. An intrinsic function is introduced by the compiler to reduce the overhead of a function call and is inlined at the point where it is called. When analyzing decompiled code with many inlined intrinsic functions, reverse engineers may be confused by these detailed, repeated operations and lose sight of the goal. In this paper, we propose a method based on graph isomorphism that first detects intrinsic functions on the control flow graph (CFG) of the target function. We then identify the boundary of each intrinsic function, determine its parameters and return value, and reduce it to a single function call in the disassembled program. Experimental results show that our method is more efficient at reducing intrinsic functions than state-of-the-art decompilers such as Hex-Rays, REC, and RD (Retargetable Decompiler).
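The detect-then-reduce idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a CFG stored as a dictionary from basic-block name to successor set, uses a naive backtracking search for an induced subgraph match, and ignores the instructions inside each block, the parameters, and the return value. The names `find_pattern`, `collapse`, `call_intrinsic`, and the block labels are hypothetical.

```python
from typing import Dict, Optional, Set

# Assumed representation: basic block name -> names of its successor blocks.
# Every block must appear as a key, even if it has no successors.
Graph = Dict[str, Set[str]]


def find_pattern(cfg: Graph, pattern: Graph) -> Optional[Dict[str, str]]:
    """Return a mapping pattern block -> CFG block, or None if no match.

    A match is an injective mapping under which two pattern blocks are
    connected exactly when their images are connected in the CFG.
    """
    p_nodes = list(pattern)

    def consistent(mapping: Dict[str, str], p: str, c: str) -> bool:
        # Edges between p and already-mapped blocks must agree in both directions.
        for q, d in mapping.items():
            if (q in pattern[p]) != (d in cfg[c]):
                return False
            if (p in pattern[q]) != (c in cfg[d]):
                return False
        # Self-loops must agree as well.
        return (p in pattern[p]) == (c in cfg[c])

    def extend(mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]
        for c in cfg:
            if c in mapping.values() or not consistent(mapping, p, c):
                continue
            mapping[p] = c
            found = extend(mapping)
            if found is not None:
                return found
            del mapping[p]  # backtrack and try the next candidate block
        return None

    return extend({})


def collapse(cfg: Graph, matched: Set[str], call_name: str) -> Graph:
    """Replace the matched blocks with a single node standing for one call."""
    reduced: Graph = {call_name: set()}
    for block, succs in cfg.items():
        src = call_name if block in matched else block
        reduced.setdefault(src, set())
        for succ in succs:
            dst = call_name if succ in matched else succ
            if src == call_name and dst == call_name:
                continue  # drop edges internal to the matched region
            reduced[src].add(dst)
    return reduced


# Hypothetical target function whose CFG contains a two-block loop.
cfg: Graph = {
    "entry": {"loop_head"},
    "loop_head": {"loop_body", "after"},
    "loop_body": {"loop_head"},
    "after": set(),
}
# Hypothetical CFG shape of an inlined intrinsic: a head/body loop.
pattern: Graph = {"head": {"body"}, "body": {"head"}}

mapping = find_pattern(cfg, pattern)
reduced = collapse(cfg, set(mapping.values()), "call_intrinsic") if mapping else cfg
```

A practical detector would also compare the instructions in each matched block, since many unrelated loops share the same CFG shape.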
Decompiling, as a means of analysing and understanding software, has great practical value. This paper presents a decompiling method proposed by the authors, in which the techniques of library-function pattern recognition, intermediate language, symbolic execution, rule-based data type recovery, program transformation, and knowledge engineering are applied to different phases of decompiling. It then discusses how techniques for developing expert systems are adopted to build a decompiling system shell that is independent of knowledge about the language and the program's running environment. The shell becomes a real decompiler once new knowledge of the application environment is interactively acquired.
Funding: Our research was supported by NSFC grant U1836211; the recipient is Professor Kai Chen.