Random forest is a specific algorithm, not omnipotent for all datasets (cited by: 8)
Abstract The random forest model has attracted wide attention since its publication in 2001. Because random forest handles both regression and classification and is not bound by the assumptions of parametric tests (normality, homogeneity of variance, and independence among explanatory variables), its applications have grown rapidly, and some researchers even treat it as an omnipotent tool for any analysis. In fact, random forest is a specific algorithm with clear characteristics: it is an ensemble of many decision trees that fits the observations by local optimization, and it often fits poorly when the data have strong partial effects. Using occurrence data for species of Cicadidae, I compared random forest with multiple linear regression, generalized additive models, and artificial neural networks for regression, and with linear discriminant analysis for classification, with emphasis on the fragmented character of random forest predictions. Although its predictions looked fragmented, random forest had the highest accuracy for data with multicollinearity and interactions, outperforming the other three regression models, and it also performed better than linear discriminant analysis for classification. Random forest has both strengths and weaknesses, so I suggest comparing multiple models in data analysis rather than relying on one "powerful" model. The R code in the paper may serve as a reference for researchers.
Author LI Xin-Hai (李欣海) (Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source Chinese Journal of Applied Entomology (《应用昆虫学报》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2019, Issue 1, pp. 170-179 (10 pages)
Funding National Natural Science Foundation of China, General Program (31772479, 31572287)
Keywords random forest; partial effect; interaction; multicollinearity; R
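
The abstract's point about local fitting and fragmented predictions can be illustrated with a small sketch. This is not the paper's R code, which is not reproduced here. It is a hypothetical pure-Python toy under stated assumptions: synthetic data (y = 2x plus noise), one predictor, and bagged regression trees standing in for a random forest (with a single predictor, random feature selection has nothing to choose from, so the forest reduces to bagging). All function names are made up for illustration.

```python
import random

def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def fit_tree(xs, ys, depth=0, max_depth=3, min_leaf=2):
    """Fit a 1-D regression tree; a leaf is the mean of its y values."""
    if depth >= max_depth or len(xs) < 2 * min_leaf:
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return sum(ys) / len(ys)
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]],
                    depth + 1, max_depth, min_leaf)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]],
                     depth + 1, max_depth, min_leaf)
    return (threshold, left, right)

def predict_tree(node, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: each tree is fitted to a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([xs[j] for j in idx], [ys[j] for j in idx]))
    return forest

def predict_forest(forest, x):
    """Average the piecewise-constant predictions of all trees."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Synthetic data (an assumption for illustration): a global linear trend.
noise = random.Random(1)
xs = [0.25 * i for i in range(40)]              # x from 0 to 9.75
ys = [2.0 * x + noise.gauss(0.0, 0.5) for x in xs]

forest = fit_forest(xs, ys)
line = fit_line(xs, ys)

for x in (5.0, 15.0, 20.0):                     # 15 and 20 lie outside the data
    print(x, round(predict_forest(forest, x), 2),
          round(line[0] * x + line[1], 2))
```

In this sketch the tree ensemble returns the same value for x = 15 and x = 20, because every split threshold lies inside the training range, while the fitted line keeps following the trend. That is one concrete way local fitting can miss a global effect, and it is consistent with the abstract's advice to compare several models rather than rely on one.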
