
Credit risk prediction model based on borderline adaptive SMOTE and Focal Loss improved LightGBM (cited 6 times)
Abstract To address the problem that dataset imbalance in credit risk assessment degrades model prediction performance, a credit risk prediction model is proposed that combines the Borderline Adaptive Synthetic Minority Oversampling TEchnique (BA-SMOTE) with FLLightGBM (Focal Loss-Light Gradient Boosting Machine), an algorithm that uses the Focal Loss function to improve the loss function of LightGBM (Light Gradient Boosting Machine). First, building on the Borderline Synthetic Minority Oversampling TEchnique (Borderline-SMOTE), an adaptive idea and a new interpolation method are introduced, so that each minority-class sample on the border generates a different number of new samples and the new samples lie closer to the original minority-class sample, thereby balancing the dataset. Second, the Focal Loss function is used to improve the loss function of the LightGBM algorithm, and the improved algorithm is trained on the new dataset to obtain the final BA-SMOTE-FLLightGBM model, built from the BA-SMOTE method and the FLLightGBM algorithm. Finally, credit risk prediction is performed on the Lending Club dataset. Experimental results show that, compared with the imbalanced classification algorithms RUSBoost (Random Under-Sampling with adaBoost), CUSBoost (Cluster-based Under-Sampling with adaBoost), KSMOTE-AdaBoost (K-means clustering SMOTE with AdaBoost) and AK-SMOTE-Catboost (AllKnn-SMOTE-Catboost), the constructed model improves markedly on both evaluation indicators, by 9.0% to 31.3% on G-mean and by 5.0% to 14.1% on AUC (Area Under Curve). These results verify that the proposed model achieves better default prediction in credit risk assessment.
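The abstract does not give the exact form of the loss used in FLLightGBM, so the following is only a minimal sketch of the commonly used binary Focal Loss, FL = -α_t (1 - p_t)^γ log(p_t), together with the per-sample gradient and Hessian with respect to the raw score that a gradient-boosting library needs for a custom objective. The function names, the defaults α = 0.25 and γ = 2, and the finite-difference Hessian are illustrative assumptions, not details taken from the paper.

```python
import math


def _log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def _terms(z, y, alpha):
    """Return (s, alpha_t, p_t, log p_t) for raw score z and label y in {0, 1}."""
    s = 1.0 if y == 1 else -1.0               # sign that maps z to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    log_pt = _log_sigmoid(s * z)              # p_t = sigmoid(s * z)
    return s, alpha_t, math.exp(log_pt), log_pt


def focal_loss(z, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one sample (alpha, gamma defaults are assumptions)."""
    _, alpha_t, pt, log_pt = _terms(z, y, alpha)
    return -alpha_t * (1.0 - pt) ** gamma * log_pt


def focal_grad(z, y, alpha=0.25, gamma=2.0):
    """Analytic derivative of focal_loss with respect to the raw score z."""
    s, alpha_t, pt, log_pt = _terms(z, y, alpha)
    return s * alpha_t * (1.0 - pt) ** gamma * (gamma * pt * log_pt + pt - 1.0)


def focal_hess(z, y, alpha=0.25, gamma=2.0, eps=1e-4):
    """Second derivative, approximated by a central difference of the gradient."""
    upper = focal_grad(z + eps, y, alpha, gamma)
    lower = focal_grad(z - eps, y, alpha, gamma)
    return (upper - lower) / (2.0 * eps)


def focal_objective(y_true, raw_scores, alpha=0.25, gamma=2.0):
    """Hypothetical helper: per-sample (grad, hess) lists for a custom objective."""
    pairs = list(zip(raw_scores, y_true))
    grad = [focal_grad(z, y, alpha, gamma) for z, y in pairs]
    hess = [focal_hess(z, y, alpha, gamma) for z, y in pairs]
    return grad, hess
```

With γ = 0 and α = 0.5 the gradient reduces to half of the usual cross-entropy gradient (p - y), which is a convenient sanity check. In practice the same formulas would be vectorised over arrays before being handed to a boosting library.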
Authors CHEN Hailong; YANG Chang; DU Mei; ZHANG Yingyu (College of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2022, No. 7, pp. 2256-2264 (9 pages)
Funding National Natural Science Foundation of China (61772160); Harbin Science and Technology Innovation Talent Research Special Project (2017RAQXJ045).
Keywords credit risk; imbalanced data; oversampling; LightGBM (Light Gradient Boosting Machine); Focal Loss