Abstract
This paper proposes a method for analyzing how class imbalance affects the stability and prediction performance of software defect prediction models. First, undersampling is used to construct, from the original data set, a group of new data sets whose imbalance ratios are lower than that of the original data set. A fixed random seed is used during construction so that data sets built from the same original data set at the same imbalance ratio contain identical data, which reduces the run-to-run randomness of the results. Second, with the Matthews correlation coefficient (MCC) as the performance evaluation indicator, each new data set is fed to the model's classification algorithm for training, prediction, and evaluation, yielding the MCC value of the current data set at each imbalance ratio; a performance stability evaluation indicator is also proposed. The experimental results show that, compared with AUC, MCC is better suited as an evaluation indicator of the stability of software defect prediction models under class imbalance, and that, in terms of the stability of software defect prediction performance, cost-sensitive models outperform ensemble models.
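The two mechanical steps named in the abstract, seeded undersampling to a lower imbalance ratio and MCC scoring, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `undersample` and `mcc`, the convention that label 1 marks the (minority) defective class, and the definition of the imbalance ratio as majority count divided by minority count are all assumptions made here. The paper's stability indicator is not defined in the abstract and is therefore not shown.

```python
import math
import random


def undersample(labels, target_ratio, seed=0):
    """Return sorted indices of a subset whose majority/minority ratio is
    at most target_ratio. Only majority-class (label 0) samples are dropped.
    Assumption: label 1 is the minority (defective) class."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), int(target_ratio * len(minority)))
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    return sorted(minority + rng.sample(majority, keep))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels.
    Returns 0.0 when the denominator is zero (a common convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical data: 10 defective and 90 clean modules (ratio 9:1),
# reduced to a 3:1 ratio with a fixed seed.
labels = [1] * 10 + [0] * 90
subset = undersample(labels, target_ratio=3, seed=42)
```

Calling `undersample` again with the same seed returns the same indices, which is the property the abstract relies on to keep repeated runs comparable; a classifier would then be trained and scored with `mcc` on each such subset.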
Authors
ZHANG Yan-mei; ZHI Sheng-lin; JIANG Shu-juan; YUAN Guan (Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; KeHua Data Co., Ltd., Shenzhen, Guangdong 518055, China)
Source
Acta Electronica Sinica
EI
CAS
CSCD
Peking University Core Journal
2023, No. 8, pp. 2076-2087 (12 pages)
Funding
National Natural Science Foundation of China (No. 61673384, No. 71774159)
China Postdoctoral Science Foundation, Special Funding (No. 2021T140707)
Keywords
class imbalance
defect prediction
stability
prediction performance
evaluation indicator