Research on Random Forest Feature Selection Method by Human Knowledge (cited by: 3)
Abstract: Feature selection picks the most effective features from the original feature space, which reduces data dimensionality and improves the performance of learning algorithms. In dimensionality reduction, common feature selection methods rely mainly on the statistical properties of the data and use only the information in the data itself to choose effective features. In some practical problems, however, a large body of human experience has accumulated. This human knowledge may have an important influence on feature selection, yet few feature selection methods make use of it. For problems of this kind, and to take both human knowledge and collected data into account, a two-stage screening feature selection model based on random forest and a fuzzy system is proposed. The model first uses the random forest algorithm to remove redundant features from the original data set as a preliminary screening. It then builds a fuzzy system from the human knowledge associated with the preselected features, computes an evaluation score for each preselected feature, and screens out the final key features. Experiments on a real gasoline purification data set show a significant improvement over conventional feature selection methods, which verifies the effectiveness of the random forest feature selection method combined with human knowledge.
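The two-stage screening described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fuzzy rules, membership functions, or thresholds, so everything below is an assumption made for illustration: the function names, the four rules, the `min_importance` cutoff, and the example numbers are all hypothetical. Stage 1 takes feature importances as given input (in practice they might come from a random forest, for example scikit-learn's `feature_importances_` attribute), and stage 2 combines each importance with an expert relevance rating in [0, 1] through a small zero-order Sugeno-style rule base.

```python
def preselect(importances, min_importance):
    """Stage 1 (assumed form): keep features whose importance reaches a cutoff."""
    return {f: v for f, v in importances.items() if v >= min_importance}


def fuzzy_score(importance, expert):
    """Stage 2 (hypothetical rule base): score in [0, 1] from two inputs in [0, 1].

    Memberships: high(x) = x, low(x) = 1 - x. AND is taken as min.
    Rules (all assumed, not taken from the paper):
      importance high AND expert high -> 1.0
      importance high AND expert low  -> 0.5
      importance low  AND expert high -> 0.5
      importance low  AND expert low  -> 0.0
    The output is the firing-strength-weighted average of the rule outputs.
    """
    rules = [
        (min(importance, expert), 1.0),
        (min(importance, 1.0 - expert), 0.5),
        (min(1.0 - importance, expert), 0.5),
        (min(1.0 - importance, 1.0 - expert), 0.0),
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total


def select_features(importances, expert_ratings, min_importance, top_k):
    """Run both stages and return the top_k feature names by fuzzy score."""
    kept = preselect(importances, min_importance)
    peak = max(kept.values())  # rescale kept importances into [0, 1]
    scores = {
        f: fuzzy_score(v / peak, expert_ratings[f]) for f, v in kept.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up example values, for illustration only.
importances = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.07, "f5": 0.03}
expert_ratings = {"f1": 0.2, "f2": 0.9, "f3": 0.8, "f4": 0.9, "f5": 0.1}
selected = select_features(importances, expert_ratings, min_importance=0.1, top_k=2)
print(selected)
```

In this sketch a feature with the highest data-driven importance (`f1`) can still rank below features that experts rate as more relevant, which is the kind of interaction between data and human knowledge the abstract describes.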
Authors: DAI Gui-yang (戴贵洋); QI Xiu-li (綦秀利); YU Xiao-han (余晓晗) (School of Command & Control Engineering, Army Engineering University of PLA, Nanjing 210007, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 7, pp. 155-160 (6 pages)
Funding: National Natural Science Foundation of China (61806221).
Keywords: feature selection; random forest; human knowledge; fuzzy system; data dimensionality reduction