
Construction and validation of a prognostic model for pancreatic ductal adenocarcinoma based on machine learning algorithm
Abstract

Background and Aims: Pancreatic ductal adenocarcinoma (PDAC) is the most common pathological type of pancreatic cancer. Its long-term prognosis is poor, and individualized prognostic assessment tools are lacking. Using large-sample real-world data from the SEER database and machine learning algorithms, this study constructed a prognostic nomogram for PDAC patients, with the aim of providing precise, individualized prognostic evaluation to inform clinical decision-making.

Methods: Clinicopathological and prognostic data of patients with pathologically confirmed PDAC diagnosed from 2000 to 2018 were extracted from the SEER database according to the inclusion and exclusion criteria. Patients were randomly divided into a training set and a validation set at a 7:3 ratio. In the training set, independent prognostic factors were screened using univariate and multivariate Cox proportional hazards models, LASSO regression, and random survival forests, and nomograms were constructed to predict 6-, 12-, and 36-month cancer-specific survival (CSS) and overall survival (OS). The models were then validated and evaluated in both the training and validation sets using the concordance index (C-index), receiver operating characteristic (ROC) curves, calibration curves, survival curves, and decision curve analysis.

Results: A total of 4,237 patients were included, 2,965 in the training set and 1,272 in the validation set; baseline characteristics were balanced and comparable between the two sets. The median follow-up was 18 (9-36) months in the training set and 18 (9-37) months in the validation set. Multivariate Cox analysis showed that age, T stage, N stage, M stage, differentiation, surgery, systemic therapy, and chemotherapy were independent factors for OS (all P<0.05), and that age, T stage, N stage, M stage, differentiation, surgery, and chemotherapy were independent factors for CSS (all P<0.05). LASSO regression showed that age, differentiation, T stage, N stage, M stage, chemotherapy, surgery, extent of lymph node dissection, radiotherapy, and systemic therapy were associated with OS, and that differentiation, T stage, N stage, M stage, chemotherapy, surgery, extent of lymph node dissection, radiotherapy, and systemic therapy were associated with CSS. In the random survival forest model, the five variables with the highest importance scores were systemic therapy, differentiation, N stage, chemotherapy, and T stage for OS, and systemic therapy, differentiation, N stage, chemotherapy, and AJCC stage for CSS. Based on the multivariate Cox, LASSO, and random survival forest results together with clinical relevance, seven clinical features (age, T stage, N stage, M stage, differentiation, surgery, and chemotherapy) were selected to build models predicting 6-, 12-, and 36-month OS and CSS. For OS, the C-index was 0.692 (95% CI 0.681-0.704) in the training set and 0.680 (95% CI 0.664-0.698) in the validation set; for CSS, it was 0.696 (95% CI 0.684-0.707) and 0.680 (95% CI 0.662-0.698), respectively. ROC curves indicated good predictive value, and all calibration curves lay close to the ideal 45° reference line.

Conclusion: Age, TNM stage, differentiation, surgery, and chemotherapy are independent prognostic factors for PDAC patients. The prediction model built on these variables shows relatively high discrimination and accuracy and can help clinicians develop precise, individualized treatment and follow-up plans for PDAC patients.
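The C-index values reported above measure how well the model ranks patients by risk. The abstract does not say which definition or software was used; the sketch below assumes Harrell's C-index for right-censored data and is not the authors' code. It assumes a higher risk score means worse prognosis (for example, a Cox linear predictor), and the toy follow-up times, event flags, and scores in the usage example are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data (sketch).

    A pair of patients is comparable when the one with the shorter
    follow-up time had an observed event (event flag 1). The pair is
    concordant when that patient also has the higher predicted risk;
    tied risk scores count as half-concordant. Pairs with equal
    follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplified: ignore tied follow-up times
        # `a` is the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time was censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, 1 = death observed, 0 = censored.
times = [6, 12, 18, 36]
events = [1, 1, 0, 1]
risks = [3.0, 2.0, 2.5, 1.0]
print(harrell_c_index(times, events, risks))  # 4 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of about 0.68 to 0.70 in context. Confidence intervals such as those quoted are usually obtained by bootstrapping or analytic variance estimates, which this sketch omits.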
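Survival nomograms of this kind are commonly a graphical form of a Cox proportional hazards model, in which the predicted survival at time t is S(t | x) = S0(t) ** exp(beta . x). The abstract does not state explicitly that the final nomogram is Cox-based, so the sketch below is an assumption. All coefficient values, covariate names, and baseline survival probabilities in it are HYPOTHETICAL placeholders chosen for illustration; they are not estimates from this study.

```python
import math


def predict_survival(x, beta, baseline_survival):
    """Predicted survival probabilities from a fitted Cox model (sketch).

    Uses S(t | x) = S0(t) ** exp(beta . x), where S0(t) is the baseline
    survival at time t for a patient with all covariates equal to zero.
    x and beta are dicts keyed by covariate name (0/1 dummy coding assumed);
    covariates missing from x are treated as 0.
    baseline_survival maps time in months to S0(t).
    """
    linear_predictor = sum(coef * x.get(name, 0) for name, coef in beta.items())
    hazard_ratio = math.exp(linear_predictor)
    return {t: s0 ** hazard_ratio for t, s0 in baseline_survival.items()}


# HYPOTHETICAL coefficients and baseline survival, for illustration only.
beta = {"age_ge_70": 0.4, "n1": 0.3, "m1": 0.9, "surgery": -0.8, "chemotherapy": -0.6}
baseline = {6: 0.90, 12: 0.75, 36: 0.40}

patient = {"age_ge_70": 1, "m1": 1, "chemotherapy": 1}  # hypothetical patient
for month, prob in predict_survival(patient, beta, baseline).items():
    print(f"{month}-month survival: {prob:.3f}")
```

In a typical Cox-based nomogram, each covariate's points are proportional to its contribution to beta . x, so the total-points axis is a rescaled linear predictor and the 6-, 12-, and 36-month survival axes are read off the same formula.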
Authors: ZHANG Yeguang; ZHAO Pan; ZHANG Hui; HUANG Zhenghong; HUANG Kun (Department of Ultrasound Medicine, Mianyang Traditional Chinese Medicine Hospital, Mianyang, Sichuan 621000, China; Department of General Surgery, Mianyang Traditional Chinese Medicine Hospital, Mianyang, Sichuan 621000, China; College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China)
Source: China Journal of General Surgery (indexed in CAS, CSCD, and the Peking University core journal list), 2024, No. 9, pp. 1459-1472 (14 pages).
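Funding: Mianyang Municipal Health Commission Fund Project, Sichuan Province (202309); Mianyang Traditional Chinese Medicine Hospital Fund Project, Sichuan Province (MYSZYYYKT2023117).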
Keywords: Pancreatic Neoplasms; Prognosis; SEER Program; Machine Learning; Nomograms