Abstract
Aim: To rank the factors influencing premature ovarian insufficiency (POI) using machine learning algorithms and to identify the factors with the greatest impact on POI. Methods: Inclusion and exclusion criteria were first established, and 500 patients presenting with irregular menstruation were selected; differences in age and occupation were analyzed across traditional Chinese medicine syndrome types. Six machine learning algorithms (logistic regression, support vector machine, decision tree, random forest, extreme gradient boosting, and K-nearest neighbor) were then used to classify patients for POI prediction, and their predictive performance was compared using the Matthews correlation coefficient and AUC. The influencing factors were ranked by the decrease in accuracy and the decrease in Gini impurity in the random forest, and the top 5 factors were obtained in combination with stepwise elimination. Results: The random forest algorithm achieved the highest Matthews correlation coefficient, accuracy, and AUC, at 0.399, 0.717, and 0.908, respectively. The influencing factors of POI were history of uterine or pelvic surgery, education level, age, history of weight loss, and history of smoking. The Borda count scores of these factors were, in order: history of uterine or pelvic surgery (2.446), education level (2.924), age (4.060), history of weight loss (5.303), and history of smoking (6.429). Conclusions: Random forest outperformed the other 5 algorithms in predicting POI. When patient data are incomplete, clinicians could use these 5 factors as a basis for preliminary intervention in patients with irregular menstruation.
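The two quantities the abstract relies on, the Matthews correlation coefficient and a Borda count over feature rankings, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the Borda scores were computed, so the sketch assumes the score is a feature's mean rank position over several rankings (lower meaning more important), and the feature labels and the three rankings are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def borda_mean_rank(rankings):
    """Average each feature's 1-based position over several rankings.

    Assumption: a lower mean position means a more important feature.
    Every ranking must contain the same set of features.
    """
    totals = {feature: 0 for feature in rankings[0]}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] += position
    return {f: total / len(rankings) for f, total in totals.items()}


# Hypothetical rankings (most important first), e.g. from repeated
# random forest fits or from different importance measures.
ranking_a = ["surgery", "education", "age", "weight_loss", "smoking"]
ranking_b = ["education", "surgery", "age", "smoking", "weight_loss"]
ranking_c = ["surgery", "education", "age", "weight_loss", "smoking"]

scores = borda_mean_rank([ranking_a, ranking_b, ranking_c])
ordered = sorted(scores, key=scores.get)  # ascending mean rank

for feature in ordered:
    print(f"{feature}: {scores[feature]:.3f}")
```

Averaging positions rather than summing points keeps the scores on the same scale as a rank, which matches the non-integer scores reported above, but other Borda variants (for example, points awarded in descending order) would be equally plausible readings.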
Authors
LU Yuting
SHENG Zhenghe
HUANG Fei
PEI Shicheng
MENG Hualin
WU Shanguang
(Department of Medicine, Guangxi University of Science and Technology, Liuzhou Key Laboratory of Guizhong Characteristic Medicinal Resources Development, Liuzhou, Guangxi 545005; School of Pharmacy, Hunan University of Chinese Medicine, Hunan Engineering Technology Research Center of Bioactive Substance Discovery of Chinese Medicine, Changsha 410208; Department of Traditional Chinese Medicine Internal Medicine, Liuzhou People's Hospital, Liuzhou, Guangxi 545005)
Source
Journal of Zhengzhou University (Medical Sciences)
CAS
Peking University Core Journals
2024, Issue 2, pp. 246-251 (6 pages)
Funding
National Natural Science Foundation of China (21766003)
Hunan Provincial Postgraduate Research Innovation Project (CX20220776)
Keywords
premature ovarian insufficiency
machine learning
feature ranking