Abstract
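An instance-weighted hidden naive Bayes (IWHNB) algorithm is used to analyze and predict educational data, with attribute value frequency weighting applied to improve model accuracy. Each instance is weighted according to the frequencies of its attribute values, so that an instance's weight reflects its influence on the experimental result. The experiments use the Portuguese secondary school student performance data set provided by UCI, compare several Bayesian algorithms, and evaluate model performance with ten runs of ten-fold cross-validation. The results for accuracy, recall, AUC, and F-score show that IWHNB performs best on this data set of 649 instances.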
To predict students' grades more accurately, this paper adopts a new naive Bayes algorithm, Instance Weighted Hidden Naive Bayes (IWHNB), to analyze and predict educational data. Each instance is weighted by computing the frequencies of its attribute values, and the weight of an instance represents its influence on the experimental results. The data set consists of the grades of students from two Portuguese secondary schools, obtained from the machine learning repository of the University of California, Irvine (UCI). Before training and testing, the data were discretized, missing values were replaced, and useless attributes were deleted. Hidden naive Bayes (HNB), naive Bayes (NB), averaged one-dependence estimators (AODE), and attribute value frequency weighted naive Bayes (AVFWNB), an instance-weighted naive Bayes algorithm, were selected for comparison. Ten runs of ten-fold cross-validation on the data set of 649 instances show that IWHNB achieves the highest accuracy, AUC, and F-score. In summary, IWHNB has good classification performance in predicting student achievement.
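The weighting idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact weighting formula, so the sketch assumes that an instance's weight is the mean relative frequency of its attribute values; the function names, the toy attributes, and the Laplace-smoothed weighted prior are all hypothetical illustrations, not the paper's IWHNB implementation.

```python
from collections import Counter


def attribute_value_frequency_weights(instances):
    """Return one weight per instance.

    Assumption (not from the paper): the weight is the mean relative
    frequency of the instance's attribute values in the data set.
    """
    if not instances:
        return []
    n = len(instances)
    num_attrs = len(instances[0])
    # counts[j][v] = number of instances whose j-th attribute equals v
    counts = [Counter(row[j] for row in instances) for j in range(num_attrs)]
    weights = []
    for row in instances:
        freqs = [counts[j][row[j]] / n for j in range(num_attrs)]
        weights.append(sum(freqs) / num_attrs)
    return weights


def weighted_class_priors(labels, weights):
    """Laplace-smoothed class priors in which each instance counts with its weight."""
    classes = sorted(set(labels))
    total = sum(weights)
    priors = {}
    for c in classes:
        mass = sum(w for y, w in zip(labels, weights) if y == c)
        priors[c] = (mass + 1.0) / (total + len(classes))
    return priors


# Hypothetical discretized toy records: (school, sex, study_time)
toy = [
    ("GP", "F", "low"),
    ("GP", "M", "low"),
    ("GP", "F", "high"),
    ("MS", "F", "low"),
]
toy_labels = ["pass", "pass", "fail", "pass"]

toy_weights = attribute_value_frequency_weights(toy)
toy_priors = weighted_class_priors(toy_labels, toy_weights)
```

Under this assumption, instances made up of common attribute values receive larger weights and therefore contribute more to the weighted counts from which the probabilities are estimated.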
Authors
王狄
余良俊
WANG Di; YU Liang-jun (Hubei Normal University, Computer and Information Engineering College, Huangshi, Hubei 435002, China; Hubei University of Education, School of Computer Science and Engineering, Wuhan 430205, China)
Source
《湖北第二师范学院学报》
2023, No. 8, pp. 101-108 (8 pages)
Journal of Hubei University of Education
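Funding
Technology Innovation Special Project (open-competition mechanism) of the Hubei Provincial Department of Science and Technology (2019AEE020)
Open Fund of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2018A05)
Hubei University of Education Talent Introduction Research Start-up Fund (20RC07)
Hubei University of Education University-level Teaching Reform Project (X2022002).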
Keywords
big data
hidden naive Bayes
instance weighting
ten-fold cross-validation
attribute value frequency