Abstract
Enterprise bankruptcy data are high-dimensional and class-imbalanced, which degrades predictive performance and biases predictions toward the majority class. To improve prediction accuracy for enterprises at risk of bankruptcy, this paper addresses the problem at three levels: features, data, and model. First, a feature extraction rule based on the Pearson correlation coefficient is proposed for feature selection. Next, existing balancing techniques are applied to balance the data. Finally, a random forest algorithm with a modified classification threshold is proposed to build the enterprise bankruptcy prediction model. Experimental results on a dataset of 10173 companies show that the proposed method has certain advantages and offers useful reference value for subsequent research on enterprise bankruptcy prediction.
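The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual selection rule, the balancing technique, or the threshold value, so everything specific here is an assumption made for illustration: the rule "keep features whose absolute Pearson correlation with the label reaches `min_abs_corr`", random oversampling as the stand-in balancing step, the default threshold of 0.3, and all function names. The forest itself is not implemented; `predict_with_threshold` only shows how a lowered threshold changes the decision once each company has a fraction of trees voting "bankrupt".

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, labels, min_abs_corr=0.1):
    """Assumed rule: keep column indices with |r(feature, label)| >= min_abs_corr."""
    n_cols = len(rows[0])
    return [
        j for j in range(n_cols)
        if abs(pearson([r[j] for r in rows], labels)) >= min_abs_corr
    ]


def oversample_minority(rows, labels, seed=0):
    """Stand-in balancing step: duplicate minority rows until classes match in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # only one class present, nothing to balance
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def predict_with_threshold(vote_fractions, threshold=0.3):
    """Label a company bankrupt (1) when the share of trees voting 1 reaches threshold."""
    return [1 if p >= threshold else 0 for p in vote_fractions]
```

With a library random forest, the vote fractions would typically come from the predicted probability of the positive class. Lowering the threshold below 0.5 trades more false alarms for fewer missed bankruptcies, which is the usual motivation for threshold moving on imbalanced data.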
Authors
ZHANG Kanglin (张康林)
YE Chunming (叶春明)
Business School, University of Shanghai for Science and Technology, Shanghai 200093
Source
Science & Technology for Development (《科技促进发展》)
2021, No. 4, pp. 748-758 (11 pages)
Keywords
improved random forest
enterprise bankruptcy prediction
high-dimensional imbalance
feature extraction
class balancing