摘要
随着代码混淆、加壳技术的应用,基于行为特征的Android应用相似性检测受到的影响愈加明显.提出了一种抗混淆的大规模Android应用相似性检测方法,通过提取应用内特定文件的内容特征计算应用相似性,该方法不受代码混淆的影响,且能有效抵抗文件混淆带来的干扰.对5.9万个应用内的文件类型进行统计,选取具有普遍性、代表性和可度量性的图片文件、音频文件和布局文件作为特征文件.针对3种特征文件的特点,提出了不同内容特征提取方法和相似度计算方法,并通过学习对其相似度赋予权重,进一步提高应用相似性检测的准确性.使用正版应用和已知恶意应用作为标准,对5.9万个应用进行相似性检测实验,结果显示基于文件内容的相似性检测可以准确识别重打包应用和含有已知恶意代码的应用,并且在效率和准确性上均优于现有方案.
Code obfuscation exerts a huge impact on similarity detection among Android applications based on behavior characteristics. In order to deal with the situation, we propose a novel way of similarity detection among Android applications based on file content characteristics, which computes the similarity of file content features and can be applied to large-scale scenario in real world. Our method is not subject to code obfuscation or file obfuscation. We choose to utilize the characteristics of image, audio and layout files which are shown in our statistics as the most representative features in Android applications. Meanwhile, different weights are given to these features through machine learning, which further enhances the accuracy of our method. In addition, we implement a prototype system and particularly optimize each step to speed up the calculation, making our system suitable for large-scale scenario and give a good calculation performance. The experiments dataset contains 59 000 applications. And for both legitimate application and malware applications, our system successfully detects those repackaged pirate applications and those with the similar malicious component, which prove the effectiveness of our method. The experiment results demonstrate that similarity detection based on file content characteristics could resist the file obfuscation and give better performance in both accuracy and efficiency.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2014年第7期1446-1457,共12页
Journal of Computer Research and Development
基金
国家"九七三"重点基础研究发展计划基金项目(2012CB315804)
国家自然科学基金重大研究计划项目(91118006)
国家自然科学基金项目(61073179)
北京市自然科学基金项目(4122086)
关键词
文件内容特征
模糊散列
感知特征
安卓
应用相似性
抗混淆
file content characteristics
fuzzy similarity
anti-obfuscation Hash
perceptual features
Android
application