Abstract
Objective: Adverse outcomes in patients with unstable angina pectoris (UAP) are multidimensional, yet traditional statistical methods mostly predict a single outcome and cannot handle the feature redundancy and label imbalance found in multilabel data. This study applies the multilabel synthetic minority over-sampling technique (MLSMOTE) and builds a multilabel prediction model in order to improve predictive performance.
Methods: UAP patients admitted to the Second Hospital of Shanxi Medical University from January 2017 to May 2020 were enrolled, and their information was collected through a combined retrospective and prospective clinical cohort. Myocardial infarction (MI), heart failure (HF), revascularization, stroke and death were taken as outcomes. A multilabel feature subset was selected with the improved ReliefF-based multilabel feature selection algorithm (RF-ML), and label imbalance was handled with MLSMOTE. On this basis, classifier chain (CC) multilabel classification models were built with random forest, naive Bayes, support vector machine (SVM) and K-nearest neighbors (KNN) as base classifiers, and their performance was evaluated and compared.
Results: RF-ML selected 18 variables for the model: uric acid (UA), creatinine (Cr), platelet count, chloride (Cl), hemoglobin (Hb), systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate (HR), sodium (Na), total bilirubin (TBIL), indirect bilirubin (IBIL), albumin (ALB), total bile acid (TBA), body mass index (BMI), blood glucose, direct bilirubin (DBIL), low-density lipoprotein cholesterol (LDL-C) and high-density lipoprotein cholesterol (HDL-C). MLSMOTE was applied to rebalance the five labels in the study (MI, HF, revascularization, stroke and death). CC models were then built on the rebalanced data with random forest, naive Bayes, SVM and KNN as base classifiers. The CC model with naive Bayes as base classifier outperformed the other models on six metrics: ranking loss, macro-AUC, micro-AUC, macro-F1, micro-F1 and macro-recall.
Conclusion: Imbalance processing with MLSMOTE improved the imbalance ratio of the original labels to some extent. The CC models built on the balanced data take both label-specific features and label correlations into account, and the CC model with naive Bayes as base classifier performed best.
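The imbalance processing described in the abstract can be illustrated with a minimal pure-Python sketch of the MLSMOTE idea. It is not the authors' implementation: the toy data, the label frequencies, the neighbour count `k` and the function names `imbalance_ratios` and `mlsmote` are all assumptions made for illustration. The sketch measures imbalance with the per-label imbalance ratio (largest label count divided by this label's count) and its mean, treats labels whose ratio exceeds the mean as minority labels, and creates one synthetic sample per minority instance by interpolating toward a nearby minority neighbour. A synthetic sample keeps a label when at least half of the seed and its neighbours carry it, which is one possible labelling rule.

```python
import math
import random


def imbalance_ratios(Y):
    """Return (per-label imbalance ratios, their mean) for binary label rows."""
    counts = [sum(col) for col in zip(*Y)]
    top = max(counts)
    ratios = [top / c if c else math.inf for c in counts]
    finite = [r for r in ratios if math.isfinite(r)]
    return ratios, sum(finite) / len(finite)


def mlsmote(X, Y, k=3, seed=0):
    """Oversample labels whose imbalance ratio exceeds the mean ratio."""
    rng = random.Random(seed)
    ratios, mean_ir = imbalance_ratios(Y)
    X_new, Y_new = [], []
    for label, ratio in enumerate(ratios):
        if not (math.isfinite(ratio) and ratio > mean_ir):
            continue  # only labels rarer than average are oversampled
        bag = [i for i, y in enumerate(Y) if y[label] == 1]
        if len(bag) < 2:
            continue  # a neighbour is needed for interpolation
        for i in bag:
            # k nearest neighbours of the seed among instances with this label
            neighbours = sorted(
                (j for j in bag if j != i),
                key=lambda j: math.dist(X[i], X[j]),
            )[:k]
            ref = rng.choice(neighbours)
            gap = rng.random()
            # synthetic features lie on the segment between seed and neighbour
            X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[ref])])
            # keep a label if at least half of seed + neighbours carry it
            votes = [sum(col) for col in zip(Y[i], *(Y[j] for j in neighbours))]
            Y_new.append([int(2 * v >= len(neighbours) + 1) for v in votes])
    return X + X_new, Y + Y_new


# Hypothetical toy data: 200 samples, 4 features, 5 labels of unequal frequency.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
Y = [[int(i % m == 0) for m in (2, 3, 4, 20, 25)] for i in range(200)]

_, before = imbalance_ratios(Y)
X_bal, Y_bal = mlsmote(X, Y)
_, after = imbalance_ratios(Y_bal)
print(f"mean imbalance ratio before: {before:.2f}, after: {after:.2f}")
print(f"samples before: {len(X)}, after: {len(X_bal)}")
```

In this toy setup only the two rarest labels are oversampled, so the mean imbalance ratio drops without the labels becoming fully balanced, which matches the abstract's wording that the imbalance is improved "to some extent". A classifier chain would then be trained on the rebalanced data.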
Authors
Wang Ziyun
Zhang Yu
Han Gangfei
Yan Jingjing
Tian Jing
Affiliation: Shanxi Medical University, Taiyuan 030001, China (second affiliation not stated)
Source
Chinese Journal of Evidence-Based Cardiovascular Medicine (《中国循证心血管医学杂志》)
2024, Issue 6, pp. 651-656 (6 pages)
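Funding
National Natural Science Foundation of China (82103958)
Shanxi Province Science and Technology Innovation Talent Team Special Program (202204051001026)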
Keywords
Unstable angina pectoris
Multilabel feature selection
Multilabel imbalance
Label-specific features