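Abstract: Objective: To explore the value of the CatBoost algorithm in predicting carotid atherosclerosis in young and middle-aged adults, and to provide a feasible technical approach for early screening of carotid atherosclerosis in this population. Methods: The study subjects were 2258 young and middle-aged adults who underwent health check-ups at the physical examination center of a hospital in Beijing between 2016 and 2018. Carotid atherosclerosis was diagnosed from the results of carotid color Doppler ultrasound. Undersampling was used to balance the samples. Variable importance was analyzed for feature selection, and a CatBoost model was built. Models were also built with two other machine learning algorithms, logistic regression and an artificial neural network, and compared with the CatBoost model. Sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC) were used as evaluation metrics. Results: On the test set, the CatBoost model achieved the highest sensitivity, specificity, accuracy, and AUC: 82.8%, 96.7%, 90.3%, and 0.92, respectively. For the logistic regression and neural network models, sensitivity, specificity, and accuracy all fell between 62.4% and 73.3%, and AUC fell between 0.72 and 0.78. Importance analysis indicated that the three most important factors affecting carotid atherosclerosis in young and middle-aged adults were, in order, age, waist-to-height ratio, and high-density lipoprotein cholesterol. Conclusion: Applying the CatBoost algorithm to the prediction of carotid atherosclerosis in young and middle-aged adults is feasible to a certain degree, and it has higher diagnostic value than the other traditional algorithms.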
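The balancing and evaluation steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels, and the choice of random undersampling to a 1:1 class ratio are assumptions made for illustration, and the rank-based AUC is one standard way to compute the area under the ROC curve.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.
    Assumption: 1:1 random undersampling; the abstract does not give the ratio."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [rows[i] for i in keep], [labels[i] for i in keep]

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from the confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only.
toy_rows = list(range(8))
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0]
bal_rows, bal_labels = undersample(toy_rows, toy_labels)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.5, 0.35]
metrics = binary_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

In practice the balanced training set would be passed to a classifier such as CatBoost, and the metrics would be computed on a held-out test set that keeps its original class ratio.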
Funding: the Shandong Provincial Natural Science Fund (ZR2020ME254 and ZR2020QD061).
Abstract: Accurate estimation of dew point temperature (Tdew) plays a very important role in the fields of water resource management, agricultural engineering, climatology and energy utilization. However, there are few studies on the applicability of local Tdew algorithms at regional scales. This study evaluated the performance of a new machine learning algorithm, i.e., gradient boosting on decision trees with categorical features support (CatBoost), in estimating daily Tdew using limited local and cross-station meteorological data. The random forests (RF) algorithm was also assessed for comparison. Daily meteorological data from 2016 to 2019, including maximum, minimum and average temperature (Tmax, Tmin and Tmean), maximum, minimum and average relative humidity (RHmax, RHmin and RHmean), and maximum, minimum and average global solar radiation (Rsmax, Rsmin and Rsmean) from three weather stations in Hunan, China, were used to evaluate the CatBoost and RF algorithms. The results showed that both algorithms achieved satisfactory estimation accuracy at the target stations (on average RMSE = 1.020 ℃, R² = 0.969, MAE = 0.718 ℃ and NRMSE = 0.087) in the absence of complete meteorological parameters (with only temperature data as input). The CatBoost algorithm (on average RMSE = 1.900 ℃ and R² = 0.835) was better than the RF algorithm (on average RMSE = 2.214 ℃ and R² = 0.828). The accuracy and stability of the CatBoost and RF algorithms were positively correlated with the number of input parameters, and the three-parameter algorithms achieved higher estimation accuracy than the two-parameter algorithms. The developed methodology is helpful for predicting Tdew at regional scale.
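For readers unfamiliar with the four error statistics quoted in this abstract, the sketch below shows one common way to compute them. The abstract does not state the exact definitions, so two choices here are assumptions: R² is taken as the coefficient of determination, and NRMSE is RMSE divided by the mean of the observations. The toy values are invented.

```python
import math

def regression_metrics(obs, pred):
    """Return RMSE, MAE, R^2 and NRMSE for paired observations and predictions.
    Assumptions: R^2 = 1 - SS_res / SS_tot; NRMSE = RMSE / mean(obs)."""
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)  # total sum of squares
    return {
        "rmse": rmse,
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
        "nrmse": rmse / mean_obs,
    }

# Toy dew point values in degrees Celsius, invented for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
estimated = [11.0, 12.0, 13.0, 17.0]
stats = regression_metrics(observed, estimated)
```

Other normalizations for NRMSE, such as dividing by the observed range, are also in use, so values are only comparable across studies that share a definition.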
Funding: This research was financially supported by the Ministry of Small and Medium-sized Enterprises (SMEs) and Startups (MSS), Korea, under the "Regional Specialized Industry Development Program (R&D, S2855401)" supervised by the Korea Institute for Advancement of Technology (KIAT).
Abstract: Despite the advances of recent decades in the field of smart grids, energy consumption forecasting using meteorological features is still challenging. This paper proposes a genetic algorithm-based adaptive error curve learning ensemble (GA-ECLE) model. The proposed technique copes with stochastic variations to improve energy consumption forecasting using a machine learning-based ensemble approach. A modified ensemble model that uses the model's error as a feature is used to improve the forecast accuracy. This approach combines three models, namely CatBoost (CB), Gradient Boost (GB), and Multilayer Perceptron (MLP). The inner mechanism of the ensembled CB-GB-MLP model consists of generating meta-data from the Gradient Boosting and CatBoost models and computing the final predictions with the Multilayer Perceptron network. A genetic algorithm is used to obtain the optimal features for the model. To demonstrate the proposed model's effectiveness, we used a four-phase technique on real energy consumption data from Jeju island. In the first phase, we obtained results by applying the CB-GB-MLP model. In the second phase, we used a GA-ensembled model with optimal features. The third phase compares the energy forecasting results with the proposed ECL-based model. In the fourth and final phase, we applied the GA-ECLE model. We obtained a mean absolute error of 3.05 and a root mean square error of 5.05. Extensive experimental results are provided, demonstrating the superiority of the proposed GA-ECLE model over traditional ensemble models.
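The stacking idea in this abstract, where base-model predictions become the inputs of a final model, can be illustrated with a deliberately simplified sketch. It is not the GA-ECLE method. A one-parameter least-squares blend stands in for the MLP meta-learner, the two lists of base predictions are invented stand-ins for CatBoost and Gradient Boosting outputs, and neither the error-curve feature nor the genetic algorithm is modelled.

```python
def fit_blend_weight(pred_a, pred_b, target):
    """Least-squares weight w for target ~ w * pred_a + (1 - w) * pred_b.
    Assumption: a linear blend replaces the MLP meta-learner of the paper."""
    num = sum((t - b) * (a - b) for a, b, t in zip(pred_a, pred_b, target))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    # If both base models agree everywhere, any weight gives the same blend.
    return num / den if den else 0.5

def blend(pred_a, pred_b, w):
    """Combine two base-model predictions with the fitted weight."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

# Hypothetical held-out predictions from two base models (toy numbers):
# model A overestimates by 2, model B underestimates by 2.
actual = [10.0, 20.0, 30.0]
base_a = [12.0, 22.0, 32.0]
base_b = [8.0, 18.0, 28.0]

w = fit_blend_weight(base_a, base_b, actual)
final = blend(base_a, base_b, w)
mae = sum(abs(f - t) for f, t in zip(final, actual)) / len(actual)
```

The meta-level weight should be fitted on predictions for data the base models did not train on. Otherwise the blend rewards whichever base model overfits most.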