Abstract
Traditional big data classification methods do not optimize classification around the main features of the data, which leads to low accuracy and poor efficiency. To address this, a dynamic classification method for unbalanced big data based on the random forest algorithm is proposed. First, the basic framework of the classification system and the hardware structure of the classification processor are designed. For the instantaneous energy among the time-domain features of big data, the frame number and zero-crossing rate are calculated. Based on these results, linear prediction and Mel-frequency cepstral coefficients are combined with the frame number to construct the main feature function of unbalanced big data. Next, the expression function of the random forest algorithm is used to build the basic framework of the algorithm, and each of its sub-models is optimized for classification. A decision tree model is then obtained and used to split on the features of the unbalanced data, which achieves dynamic classification of the data. Minority-class and majority-class evaluation indicators are used to verify the results theoretically. Simulation results show that the proposed method achieves higher classification accuracy and a better classification effect, and can serve as a reference for future work on dynamic classification of big data.
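The abstract names frame-based time-domain features (energy and zero-crossing rate) and separate minority-class and majority-class indicators, but gives no formulas. The sketch below illustrates one common reading of those terms and is not the authors' implementation. All function names, the frame length and hop parameters, and the choice of per-class recall and G-mean as the indicators are assumptions made for illustration.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a 1-D sequence into frames of frame_len samples, stepping by hop.

    The number of frames returned is what the sketch treats as the
    'frame number' (an assumption; the abstract does not define it).
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Energy of one frame: the sum of its squared samples."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def per_class_recall(y_true, y_pred, label):
    """Recall for a single class: correctly predicted / actual members."""
    actual = [p for t, p in zip(y_true, y_pred) if t == label]
    if not actual:
        return 0.0
    return sum(1 for p in actual if p == label) / len(actual)


def g_mean(y_true, y_pred, minority, majority):
    """Geometric mean of minority-class and majority-class recall.

    Assumed here as a combined indicator; it is low whenever either
    class is classified poorly, unlike plain accuracy.
    """
    return math.sqrt(per_class_recall(y_true, y_pred, minority)
                     * per_class_recall(y_true, y_pred, majority))
```

For example, an alternating signal `[1, -1, 1, -1, 1, -1, 1, -1]` split with `frame_len=4` and `hop=2` yields three frames, each with energy 4 and a zero-crossing rate of 1.0. Reporting recall per class matters for unbalanced data because overall accuracy can stay high even when the minority class is mostly misclassified.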
Authors
BAO Han (包涵)
FAN Xiao-an (范晓安)
Jilin University, Changchun, Jilin 130012, China
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2020, No. 8, pp. 311-314, 461 (5 pages)
Keywords
Cloud computing
Unbalanced big data
Dynamic classification
Frame number
Random forest algorithm