Domain Adaptation Approach for Cross-project Software Defect Prediction (cited by: 15)
Abstract: Software defect prediction aims to help developers find and locate potential defects in software components at an early stage, so that testing resources can be allocated more effectively and product quality improved. Cross-project defect prediction trains a model on defect datasets from existing projects and uses it to predict defects in a new project, but its performance is often poor. The main reason is that datasets sampled from different projects differ substantially in their probability distributions, which strongly degrades prediction accuracy. To address this problem, this study proposes a cross-project software defect prediction approach based on supervised domain adaptation. Instance-weighted domain adaptation is combined with the training process of a machine learning prediction model: target-dependent weights are constructed and applied to the abundant source-project instances, so that they weight the loss function when the expected loss over the source data is minimized and thereby influence how the model parameters are learned. In this way the distribution properties of the target project's defect data are adapted into the training set, which enables the reuse of defect data and cross-project defect prediction. The approach is evaluated empirically on ten large open-source software projects. Different experimental settings are analyzed from the perspectives of datasets, data preprocessing, and experimental results, and overfitting of the prediction model is analyzed at the data, prediction model, and model adaptation levels. The results show that the approach outperforms similar methods, is significantly better than the baseline, and approaches or matches the performance of within-project defect prediction.
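The instance-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not say how the target-dependent weights are constructed or which prediction model is used, so everything below is an assumption made for illustration: the weights are a Gaussian-kernel similarity between each source instance and the target instances, the model is a logistic regression trained by gradient descent, and the names `target_weights`, `fit_weighted_logreg`, `predict_proba`, `bandwidth`, and the toy data are all hypothetical. It is not the authors' method.

```python
import math


def target_weights(source_X, target_X, bandwidth=1.0):
    """Assumed weighting scheme (not from the paper): each source instance
    gets its mean Gaussian-kernel similarity to the target instances,
    then weights are rescaled so that they average to 1."""
    raw = []
    for s in source_X:
        sims = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(s, t)) / (2 * bandwidth ** 2))
            for t in target_X
        ]
        raw.append(sum(sims) / len(sims))
    total = sum(raw)
    return [w * len(raw) / total for w in raw]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(coef, bias, x):
    """Predicted probability that instance x is defective."""
    return sigmoid(bias + sum(c * v for c, v in zip(coef, x)))


def fit_weighted_logreg(X, y, weights, lr=0.1, epochs=500):
    """Minimize the instance-weighted logistic loss on source data
    with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_c, grad_b = [0.0] * d, 0.0
        for xi, yi, wi in zip(X, y, weights):
            # The instance weight scales this sample's loss gradient.
            err = wi * (predict_proba(coef, bias, xi) - yi)
            for j, v in enumerate(xi):
                grad_c[j] += err * v
            grad_b += err
        coef = [c - lr * g / n for c, g in zip(coef, grad_c)]
        bias -= lr * grad_b / n
    return coef, bias


# Toy data (hypothetical): one software metric per module, label 1 = defective.
source_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
source_y = [0, 0, 0, 1, 1, 1]
target_X = [[4.0], [4.5], [5.0]]

weights = target_weights(source_X, target_X)
coef, bias = fit_weighted_logreg(source_X, source_y, weights)
print("weights:", [round(w, 3) for w in weights])
print("P(defect | x=5.0):", round(predict_proba(coef, bias, [5.0]), 3))
```

In this sketch, source instances that lie close to the target data receive larger weights and therefore contribute more to the gradient, which is one simple way to let the target distribution shape a model trained on source data.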
Authors: CHEN Shu (陈曙); YE Jun-Min (叶俊民); LIU Tong (刘童) (School of Computer, Central China Normal University, Wuhan 430079, China)
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the PKU Core list), 2020, No. 2, pp. 266-281 (16 pages)
Funding: National Key Technology R&D Program of China (2015BAK33B00).
Keywords: software defect prediction; software defect metrics; machine learning; transfer learning; domain adaptation