Abstract
Credit scorecards are built on a logistic regression model and on an XGBoost machine learning model, and the two models are compared on personal credit scoring. XGBoost performs better on AUC, KS, F1, and Accuracy. First, the two models are compared in terms of data inclusiveness, interpretability, and model accuracy. Second, using competition data for housing loan default risk prediction, a scorecard is built with each model, and AUC, KS, F1, and Accuracy are used to evaluate the classification performance and predictive accuracy of the two models. Finally, the evaluation results are compared to analyze why XGBoost outperforms logistic regression. On the test set, XGBoost improves AUC, KS, F1, and Accuracy over logistic regression by 19.9%, 17.5%, 15.4%, and 11.9%, respectively. The advantage is attributed to three factors: XGBoost incorporates more feature dimensions, handles missing values in a more principled way, and includes a regularization term in its objective.
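The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes binary labels with 1 = default, scores that are predicted default probabilities, and a 0.5 cut-off for F1 and Accuracy; the cut-off, the toy data, and the function names are illustrative assumptions, not the authors' settings or code. AUC is computed as the probability that a randomly chosen defaulter scores above a randomly chosen non-defaulter, and KS as the largest gap between the cumulative true-positive and false-positive rates over all thresholds.

```python
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: share of (defaulter, non-defaulter) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """KS statistic: max of (TPR - FPR) over every observed score used as a threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        # Everything scoring at or above t is flagged as a predicted default.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        best = max(best, tp / n_pos - fp / n_neg)
    return best


def f1_and_accuracy(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """F1 and Accuracy after cutting scores at an assumed threshold (0.5 by default)."""
    preds: List[int] = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(y_true, preds) if y == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = default, scores = predicted default probability.
    y = [0, 0, 1, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    f1, acc = f1_and_accuracy(y, s)
    print(f"AUC={auc(y, s):.4f} KS={ks(y, s):.4f} F1={f1:.4f} Accuracy={acc:.4f}")
```

The pairwise AUC loop is O(n²) and is written for readability only; on a full competition dataset a sorting-based routine such as scikit-learn's `roc_auc_score` is the practical choice.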
Authors
ZHANG Libin (张利斌)
WU Zongwen (吴宗文)
School of Economics, South-Central Minzu University, Wuhan 430074, China
Source
Journal of South-Central University for Nationalities: Natural Science Edition (《中南民族大学学报(自然科学版)》)
CAS
Peking University Core Journal (北大核心)
2023, Issue 6, pp. 846-852 (7 pages)
Funding
Supported by the Graduate Innovation Fund of South-Central Minzu University (3212021sycxjj195).