Abstract
Determining the poverty level of financially disadvantaged university students can be framed as a classification problem in which a classification model is built and trained on sample data. However, the accuracy of a single classification model depends on the size of the sample data and the complexity of its data types, and it is hard to balance model speed against accuracy. Ensembling multiple classification algorithms helps avoid the overfitting of a single classification algorithm. Feature dimensionality reduction through neighborhood component analysis (NCA) lowers the computational cost of the initial classification model, a cost function is introduced to penalize misclassification losses, and Bayesian optimization is used for hyperparameter tuning. The results show that the generalization ability of the improved classification model increases markedly. Computation time decreases, the misclassification rate drops from the initial 8% to 5%, and the accuracy of the model rises by nearly 4%.
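The abstract does not give the form of the cost function, so the following is only a minimal sketch of the general idea of penalizing misclassification: given class probabilities from any classifier, predict the class with the lowest expected cost rather than the most probable one. The function name, the three poverty levels, and every value in the cost matrix are hypothetical and are not taken from the paper.

```python
def min_expected_cost_class(posteriors, cost):
    """Pick the class with the lowest expected misclassification cost.

    posteriors[i] is the estimated probability that the true class is i.
    cost[i][j] is the penalty for predicting class j when the truth is i.
    Returns (best_class_index, list_of_expected_costs_per_prediction).
    """
    n_classes = len(posteriors)
    expected = [
        sum(posteriors[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    return best, expected


# Hypothetical levels: 0 = not poor, 1 = poor, 2 = severely poor.
# Assumed costs: missing a severely poor student is penalized most heavily.
COST = [
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 1.0],
    [4.0, 2.0, 0.0],
]

# Assumed classifier output for one student (illustrative numbers only).
posteriors = [0.5, 0.3, 0.2]
best, expected = min_expected_cost_class(posteriors, COST)
print(best, expected)
```

With a plain 0/1 cost matrix this rule reduces to choosing the most probable class. An asymmetric matrix shifts borderline cases toward the class whose misjudgment is more expensive.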
Authors
Li Bin (李斌)
Wang Weixing (王卫星)
Modern Education Technology Center, College of Applied Engineering, Henan University of Science and Technology, Sanmenxia 472000, Henan, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2019, No. 8, pp. 281-287, 299 (8 pages)
Funding
Henan Province 2017 Higher Education Teaching Reform Research and Practice Project (2017SJGLX636)
Keywords
Classification algorithm
Neighborhood component analysis (NCA)
Bayesian tuning
MATLAB
Identification of impoverished students