Research on Bi-level Variable Selection for Logistic Regression (cited by: 13)

Abstract: Variable selection is an important step in statistical modeling: choosing suitable variables yields a robust model with a simple structure and accurate predictions. This paper proposes a new penalized bi-level variable selection method for logistic regression, the adaptive Sparse Group Lasso (adSGL). Its distinguishing feature is that it selects variables according to their grouping structure, achieving selection both between groups and within groups. Its advantage is that it applies different amounts of penalty to individual coefficients and to group coefficients, which avoids over-penalizing large coefficients and thereby improves estimation and prediction accuracy. The difficulty in solving the model is that the penalized likelihood function is not strictly convex, so the paper solves it by group (block) coordinate descent and establishes a criterion for choosing the tuning parameters. Simulation studies show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL both improves bi-level selection accuracy and reduces model error. Finally, the paper applies adSGL to a credit card credit scoring study, where it achieves higher classification accuracy and better robustness than logistic regression.
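The abstract does not give the adSGL penalty explicitly, so the following is only a minimal sketch of what such an objective might look like. It assumes a generic adaptive sparse group lasso form: the logistic negative log-likelihood plus `lam * [(1 - alpha) * sum_g w_g * ||beta_g||_2 + alpha * sum_j v_j * |beta_j|]`, with weights taken as inverse magnitudes of an initial estimate. The function names, the mixing parameter `alpha`, and the weight construction are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def neg_log_likelihood(beta, X, y):
    """Average logistic negative log-likelihood for labels y in {0, 1}."""
    eta = X @ beta
    # log(1 + exp(eta)) computed stably via logaddexp
    return np.mean(np.logaddexp(0.0, eta) - y * eta)

def adaptive_weights(beta_init, groups, eps=1e-8):
    """Assumed weights: inverse magnitude of an initial estimate, so
    coefficients (and groups) that start large are penalized less."""
    coef_w = 1.0 / (np.abs(beta_init) + eps)
    group_w = np.array(
        [1.0 / (np.linalg.norm(beta_init[idx]) + eps) for idx in groups]
    )
    return group_w, coef_w

def adsgl_penalty(beta, groups, lam, alpha, group_w, coef_w):
    """Assumed bi-level penalty: weighted group L2 norms (between-group
    selection) plus weighted L1 terms (within-group selection)."""
    group_term = sum(
        w * np.linalg.norm(beta[idx]) for w, idx in zip(group_w, groups)
    )
    coef_term = np.sum(coef_w * np.abs(beta))
    return lam * ((1.0 - alpha) * group_term + alpha * coef_term)

def adsgl_objective(beta, X, y, groups, lam, alpha, group_w, coef_w):
    """Penalized objective to be minimized, e.g. by block coordinate descent."""
    return neg_log_likelihood(beta, X, y) + adsgl_penalty(
        beta, groups, lam, alpha, group_w, coef_w
    )
```

Here `groups` is a list of index lists, for example `[[0, 1], [2, 3]]`. With all weights equal to one, this assumed penalty reduces to the (unweighted) Sparse Group Lasso form, which is the sense in which the weights make the method "adaptive" in this sketch.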

Authors: Wang Xiaoyan (王小燕), Fang Kuangnan (方匡南), Xie Bangchang (谢邦昌)

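Affiliations: Department of Statistics, School of Economics, Xiamen University; School of Economics, Xiamen University; Department of Statistics and Information Science, Fu Jen Catholic University, Taiwan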

Source: Statistical Research (《统计研究》), CSSCI, Peking University Core Journal, 2014, No. 9, pp. 107-112 (6 pages)

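Funding: National Natural Science Foundation of China General Program "Group Variable Selection for Generalized Linear Models and Its Application in Credit Scoring" (71471152); National Social Science Fund of China Major Project "Research on Big Data and the Development of Statistical Theory" (13&ZD148); National Social Science Fund of China Youth Project "Research on High-dimensional Variable Selection Methods for Big Data and Their Applications" (13CTJ001)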

Keywords: variable selection; grouped variables; penalized likelihood; credit scoring

Classification code: F222.3 [Economics and Management: National Economy]
