Abstract
Objective: To develop a machine learning model that accurately predicts fatty liver, helping medical staff identify and classify people at high risk of fatty liver.

Methods: Clinical data from people who underwent health check-ups at a tertiary hospital in Shiyan City between April and December 2018 were collected retrospectively. Sixteen indicators, including sex, age, body mass index (BMI), and triglycerides (TG), were selected. Six machine learning algorithms were used to build fatty liver prediction models: CatBoost, XGBoost, K-nearest neighbors (KNN), logistic regression, naive Bayes, and support vector machine (SVM). Seventy percent of the data formed the training set and the remaining 30% formed the test set. Predictive performance was evaluated and compared using the area under the receiver operating characteristic curve (AUC) and calibration curves. The Shapley additive explanations (SHAP) method was used to visualize the prediction model.

Results: A total of 6237 subjects were enrolled, 3208 of whom had fatty liver. The AUCs of CatBoost, XGBoost, KNN, logistic regression, naive Bayes, and SVM were 0.926, 0.911, 0.807, 0.880, 0.882, and 0.862, respectively. All models except naive Bayes showed good calibration, and CatBoost had the best predictive performance of the six. In the CatBoost variable importance matrix plot, the top five predictors were BMI, TG, total cholesterol, age, and low-density lipoprotein (LDL). In addition, BMI, TG, and LDL showed a potential interaction in their effect on the course of fatty liver. A BMI above 20 kg/m² was a risk factor associated with a markedly increased risk of fatty liver.

Conclusion: CatBoost was the best-performing prediction model. It can be used to predict fatty liver and to explain interactions among feature variables. BMI, TG, and LDL can identify people at high risk of fatty liver.
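The Methods name three evaluation steps: a 70%/30% train/test split, ROC AUC, and calibration curves. The following minimal, dependency-free Python sketch illustrates what each step computes. It is not the authors' pipeline.

The following are assumptions made for illustration:

- Stratifying the split by class.
- The hypothetical helper names `stratified_split`, `roc_auc`, and `calibration_table`.
- The use of 10 equal-width probability bins.

In the study, the scores would come from a fitted model such as CatBoost rather than from toy numbers.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices into (train, test) with ~train_frac of each class
    in the training part. Stratifying by class is an assumption; the
    abstract only states a 70%/30% split."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the probability that a randomly chosen positive
    (label 1) gets a higher score than a randomly chosen negative (label 0);
    ties count as 1/2. Quadratic in the number of cases, fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(labels: Sequence[int], probs: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Points of a calibration curve: for each non-empty equal-width bin of
    predicted probability, return (mean predicted probability, observed
    fraction of positives, number of cases). A well-calibrated model has
    the first two values close to each other in every bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[k].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


if __name__ == "__main__":
    y_test = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]  # toy scores, not data from the study
    print(roc_auc(y_test, scores))  # 0.75
```

Each of the six models would be scored on the same held-out 30% and compared by AUC. Its calibration table would be checked against the diagonal (mean predicted probability ≈ observed rate). In practice, scikit-learn's `sklearn.metrics.roc_auc_score` and `sklearn.calibration.calibration_curve` provide equivalent, faster computations.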
Authors
DAI Xiao-xia (戴晓霞)
LI Juan (李娟)
WANG Chen-yang (王晨阳)
ZHU Guo-jun (朱国军)
LIU Bing (刘冰)
Research Center of Health Management and Health Service Development, Hubei University of Medicine; Community Health Service Center, Bailang Development Zone; Renmin Hospital, Hubei University of Medicine, Shiyan, Hubei 442000, China
Source
Journal of Hubei University of Medicine
CAS
2022, No. 6, pp. 574-577, 583, F0004 (6 pages in total)
Funding
National Natural Science Foundation of China (71774049).
Keywords
Machine learning
Fatty liver
Predictive model