Abstract
Diabetes is a common chronic disease with a long asymptomatic stage. This article introduces five machine learning classification algorithms: Naive Bayes, Support Vector Machine, Logistic Regression, Decision Tree, and the ensemble classifier Random Forest. Using the Weka data mining platform, the authors mine and analyze diabetes data and compare the classifiers on the confusion matrix, Kappa coefficient, ROC curve, root mean square error, and relative absolute error. The comparison identifies the algorithm best suited to diabetes prediction and offers an approach for mining data on other diseases in the medical field.
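As a rough illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes the Kappa coefficient from a confusion matrix, plus root mean square error and relative absolute error for a binary task. The function names and the sample numbers are hypothetical and are not taken from the paper; the relative absolute error here uses the mean of the actual labels as its baseline, which is a simplification and may differ from how Weka computes it.

```python
import math


def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


def rmse(actual, predicted):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_absolute_error(actual, predicted):
    """Total absolute error divided by that of a baseline predicting the label mean."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


# Made-up illustrative data, not results from the paper.
example_cm = [[40, 10], [5, 45]]
example_actual = [1, 0, 1, 0]
example_predicted = [0.9, 0.2, 0.6, 0.4]

if __name__ == "__main__":
    print("kappa:", cohen_kappa(example_cm))
    print("rmse:", rmse(example_actual, example_predicted))
    print("rae:", relative_absolute_error(example_actual, example_predicted))
```

A Kappa value near 1 indicates agreement well beyond chance, while lower RMSE and relative absolute error indicate predicted probabilities closer to the true labels.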
Authors
WANG Chengwu (王成武); YAN Junfeng (晏峻峰)
School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2021, No. 1, pp. 64-68 (5 pages)
Funding
Key Scientific Research Project of the Hunan Provincial Department of Education (18A219).