Abstract
Because multi-source small-sample data have complex attributes, integrating them tends to produce noticeable overfitting and underfitting. To address this, the article proposes a fast integration method for multi-source small-sample data based on random forest. The method takes the attribute characteristics of the data into account. In the model-construction stage, it exploits the close fit between granular vectors and the features of multi-source small-sample data, and uses granular vectors as the basic structure of the random forest. A granulation layer normalizes the multi-source small-sample data, and its granulation results serve as the nodes of the decision layer. In the integration stage, the data are integrated according to the distance between the multi-source small-sample data and the decision-layer nodes. In the tests, overfitting accounted for only 0.29% of the integration results and underfitting for only 0.27%, indicating good integration performance.
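The abstract gives no formulas for the granulation layer, the decision-layer nodes, or the distance measure. The following is therefore only a minimal, hypothetical sketch of the general idea "normalize, form granules, integrate by distance to nodes". It is not the authors' method. It makes three assumptions. First, min-max normalization stands in for the granulation layer. Second, the centres of occupied equal-width grid cells stand in for the decision-layer nodes. Third, the distance is Euclidean. The random forest itself is omitted. The function names `min_max_normalize`, `granule_centers`, and `integrate` are invented for illustration.

```python
import math
from collections import defaultdict


def min_max_normalize(records):
    """Scale every feature column to [0, 1].

    Assumption: stands in for the paper's (unspecified) granulation-layer normalization.
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*records))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(rec, lows, highs))
        for rec in records
    ]


def granule_centers(normalized, bins=2):
    """Hypothetical node construction: split each normalized feature into `bins`
    equal-width intervals and return the centre of every occupied grid cell."""
    centers = set()
    for rec in normalized:
        # Clamp v == 1.0 into the last interval.
        cell = tuple(min(int(v * bins), bins - 1) for v in rec)
        centers.add(tuple((i + 0.5) / bins for i in cell))
    return sorted(centers)


def integrate(normalized, nodes):
    """Group record indices under their nearest node (Euclidean distance)."""
    groups = defaultdict(list)
    for idx, rec in enumerate(normalized):
        nearest = min(nodes, key=lambda n: math.dist(rec, n))
        groups[nearest].append(idx)
    return dict(groups)


# Two hypothetical sources describing the same two features on different scales.
source_a = [(1.0, 200.0), (2.0, 220.0)]
source_b = [(9.0, 950.0), (10.0, 1000.0)]
norm = min_max_normalize(source_a + source_b)
nodes = granule_centers(norm, bins=2)
print(integrate(norm, nodes))  # → {(0.25, 0.25): [0, 1], (0.75, 0.75): [2, 3]}
```

Normalizing first puts features from different sources on a common scale before any distance is computed. Without that step, the column with the largest raw range would dominate the nearest-node assignment.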
Authors
HE Yun (何昀); ZHANG Chuan (张川); ZHANG Jifu (张继夫); CHEN Wei (陈伟)
Aviation University of Air Force, Changchun, Jilin 130021, China
Source
Information & Computer (《信息与电脑》), 2024, No. 1, pp. 52-54 (3 pages)
Keywords
random forest
multi-source small sample data
fast integration
attribute characteristics
random forest model