Abstract
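In current GIS-based models for debris-flow susceptibility (DFS) assessment, statistical models require the conditioning factors to be independent, and their weights are controlled by how the factor values are divided into classes; linear machine learning struggles with nonlinear problems, while commonly used nonlinear models are inefficient to tune. Because random forest (RF) can overcome many shortcomings of the commonly used models and has rarely been applied to DFS assessment, this study first carries out RF-based DFS assessment using Bayesian-optimized models (linear and RBF support vector machines, quadratic discriminant analysis, and RF) and 26 debris-flow conditioning factors. It then examines the reliability of DFS assessment under changes in the factor combination and in the modelling sample, using the RF relative-weight ranking and the Monte Carlo method respectively. The results show that, between the zones RF classifies as not susceptible and as relatively susceptible, 21 factors indicate differences in the environments in which debris flows develop; the RF relative-weight ranking effectively identifies the locally optimal factor combination for the susceptibility models; the assessment uncertainty caused by random sample splitting is greatest in the medium-susceptibility zone and should be reduced by increasing the proportion of modelling samples and improving the model; and RF attains an AUC of 0.86, an overall prediction accuracy of 0.79, an F1 score of 0.66, and a Brier score of 0.14. These indicators and their reliability are the best among the models, so RF can be a preferred choice for quantitative DFS assessment.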
Models widely used in GIS for debris-flow susceptibility (DFS) assessment remain inadequate. Models based on classical statistical theory (e.g. information value, weight of evidence, and certainty factors) require the debris-flow conditioning factors to be independent, and the weights of these factors depend on the classification method. Linear machine learning may fail on nonlinear classification problems, whereas hyper-parameter tuning of common nonlinear techniques is difficult. Random forest (RF) can resolve most of the problems of these common models but has rarely been applied in DFS assessment. This article investigates RF-based DFS assessment and evaluates its reliability, using 26 conditioning factors and four models whose hyper-parameters are tuned by Bayesian optimization: random forest (RF), linear support vector machine (LSVM), radial basis function support vector machine (RBF-SVM), and quadratic discriminant analysis (QDA). A modified five-fold cross-validation method is first adopted to evaluate the DFS assessment; the RF relative-weight ranking and the Monte Carlo method are then used to investigate the reliability of the assessment under different combinations of conditioning factors and under random sample splits, respectively. The results demonstrate that 21 of the 26 conditioning factors indicate differences between environments with different debris-flow rates. The RF relative-weight ranking effectively determines the locally optimal combination of factors for the four models. The uncertainty of the susceptibility assessment resulting from the random sample split is greatest in the medium-susceptibility zone (0.4 to 0.6) and can be reduced by increasing the proportion of the model-building sample and improving the susceptibility model. The prediction performance of RF is AUC = 0.86, overall accuracy = 0.79, F1 score = 0.66, and Brier score = 0.14; these values and their reliability are the best among the four models. Therefore, RF can be a superior model for quantitative DFS assessment.
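For readers unfamiliar with the four performance indicators quoted in the abstract (AUC, overall accuracy, F1 score, Brier score), the following is a minimal sketch of how each can be computed from true labels and predicted susceptibility probabilities. The toy data, the function names, and the 0.5 classification threshold are illustrative assumptions; they are not taken from the paper, and the sketch does not reproduce the authors' models or results.

```python
# Toy labels (1 = debris flow, 0 = none) and predicted probabilities.
# Hypothetical values for illustration only, not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.65, 0.4, 0.35, 0.1, 0.8, 0.55]


def auc(y, p):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def overall_accuracy(y, p, threshold=0.5):
    """Share of samples classified correctly at the assumed threshold."""
    pred = [int(s >= threshold) for s in p]
    return sum(a == b for a, b in zip(pred, y)) / len(y)


def f1_score(y, p, threshold=0.5):
    """Harmonic mean of precision and recall for the positive class."""
    pred = [int(s >= threshold) for s in p]
    tp = sum(a == 1 and b == 1 for a, b in zip(pred, y))
    fp = sum(a == 1 and b == 0 for a, b in zip(pred, y))
    fn = sum(a == 0 and b == 1 for a, b in zip(pred, y))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((s - t) ** 2 for s, t in zip(p, y)) / len(y)


print(auc(y_true, y_prob), overall_accuracy(y_true, y_prob),
      f1_score(y_true, y_prob), brier_score(y_true, y_prob))
```

AUC and the Brier score are computed directly from the probabilities, whereas overall accuracy and F1 depend on the chosen threshold, which is one reason the four indicators are usually reported together.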
Authors
张书豪
吴光
Zhang Shuhao; Wu Guang (Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China)
Source
《地球科学》
EI
CAS
CSCD
Peking University Core Journals
2019, No. 9, pp. 3115-3134 (20 pages)
Earth Science
Funding
Science and Technology Development Program of China Railway Corporation (No. 2010G004-I)
Keywords
debris-flow susceptibility
random forest
reliability
support vector machine
quadratic discriminant analysis
conditioning factors combination
GIS
engineering geology