
Feature selection model for harmfulness prediction of clone code (cited by: 2)
Abstract: To address irrelevant and redundant features in harmfulness prediction of clone code, a combined feature selection model for clone code harmfulness is proposed, based on degree of relevance and degree of influence. First, the information gain ratio is used to produce a preliminary relevance ranking of the feature data. The highly ranked features are then retained and the remaining irrelevant features are removed, which reduces the feature search space. Next, the wrapper-type sequential floating forward selection algorithm is combined with each of six classifiers, including Naive Bayes, to determine the optimal feature subset. Finally, the different feature selection methods are compared, and the strengths of each method under different selection criteria are used to analyze, filter, and optimize the feature data. Experimental results show that, compared with no feature selection, harmfulness prediction accuracy increases by 15.2 to 34 percentage points; compared with other feature selection methods, the F1-measure increases by 1.1 to 10.1 percentage points and the AUC by 0.7 to 22.1 percentage points. The method can therefore greatly improve the accuracy of the harmfulness prediction model.
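The two-stage procedure described in the abstract (a gain-ratio filter followed by a wrapper search with sequential floating forward selection) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes discrete feature values, uses a small categorical Naive Bayes with leave-one-out accuracy as the wrapper criterion, and all function names (`gain_ratio`, `filter_top_k`, `nb_loo_accuracy`, `sffs`) are hypothetical. The paper evaluates six classifiers; only a Naive Bayes stand-in is shown here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(column, labels):
    """Information gain of a feature column, normalized by its split information."""
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info

def filter_top_k(X, y, k):
    """Filter stage: indices of the k features with the highest gain ratio."""
    columns = list(zip(*X))
    ranked = sorted(range(len(columns)),
                    key=lambda j: gain_ratio(columns[j], y), reverse=True)
    return ranked[:k]

def nb_loo_accuracy(X, y, feats):
    """Wrapper criterion (assumed): leave-one-out accuracy of a categorical Naive Bayes."""
    hits = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        train = [(x, c) for j, (x, c) in enumerate(zip(X, y)) if j != i]
        prior = Counter(c for _, c in train)
        best_class, best_logp = None, -math.inf
        for c in sorted(prior):
            logp = math.log(prior[c] / len(train))
            for f in feats:
                match = sum(1 for x, cc in train if cc == c and x[f] == xi[f])
                n_values = len({x[f] for x in X})
                logp += math.log((match + 1) / (prior[c] + n_values))  # Laplace smoothing
            if logp > best_logp:
                best_class, best_logp = c, logp
        hits += best_class == yi
    return hits / len(X)

def sffs(candidates, score, max_size):
    """Sequential floating forward selection; returns (best score, best subset)."""
    selected, best_at_size = [], {}
    while len(selected) < max_size:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        # Forward step: add the single feature that helps the criterion most.
        f_add = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if s > best_at_size.get(len(selected), (-math.inf, None))[0]:
            best_at_size[len(selected)] = (s, list(selected))
        # Floating step: drop features while that beats the best smaller subset seen.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            s_red = score(reduced)
            if s_red <= best_at_size[len(reduced)][0]:
                break
            selected = reduced
            best_at_size[len(reduced)] = (s_red, list(reduced))
    return max(best_at_size.values(), key=lambda t: t[0])
```

A typical call would first run `filter_top_k(X, y, k)` to shrink the search space and then pass the surviving indices to `sffs` with `lambda s: nb_loo_accuracy(X, y, s)` as the criterion. Swapping in another classifier only requires replacing that scoring function.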
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2017, No. 4, pp. 1135-1142, 1163 (9 pages)
Funding: National Natural Science Foundation of China (61363017, 61462071); Natural Science Foundation of Inner Mongolia Autonomous Region (2014MS0613, 2015MS0606)
Keywords: clone code; harmfulness prediction; feature subset; information gain ratio; feature selection
