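Abstract: Short-term power load is affected by many factors and is highly volatile and random, which makes high-accuracy short-term load forecasting difficult. To fully extract the features in load data and improve short-term forecasting accuracy, a short-term load forecasting model based on mode decomposition and long and short-term temporal networks with attention (LSTNet-Attn) is proposed. First, the model applies complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to the original load time series, which contains many high-frequency components and a complex frequency composition; frequency separation yields several intrinsic mode functions (IMFs) containing different frequency components. Second, date features are constructed on top of the collected features, and the Boruta algorithm is used to reduce redundancy in the input data dimensions. The LSTNet-Attn forecasting model is then built on this basis; it comprises a convolutional module, a recurrent-skip module, an autoregressive (AR) module and an attention module. The convolutional and recurrent-skip modules extract the highly nonlinear long- and short-term features and the linear features from the input load data; the AR module compensates for the neural network's insensitivity to linear features; and the attention mechanism assigns more weight to important features to capture the relationship between global and local information, improving forecasting accuracy. Finally, the model is validated on the MIT dataset and examined through comparison with commonly used forecasting models and an ablation study, which shows that it effectively improves load forecasting accuracy.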
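The abstract does not list which date features were constructed. As a minimal sketch, assuming hourly load timestamps and an illustrative choice of calendar fields (the function name, the field names and the sample start time are hypothetical, not taken from the paper), such features could be derived as follows:

```python
from datetime import datetime, timedelta


def date_features(ts: datetime) -> dict:
    """Calendar features for one load timestamp (illustrative field choice)."""
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),               # Monday = 0 ... Sunday = 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),  # Saturday or Sunday
    }


# Hypothetical sample: four hourly readings starting on a Friday evening.
start = datetime(2024, 1, 5, 22)
rows = [date_features(start + timedelta(hours=h)) for h in range(4)]
```

Columns like these would then be candidates for a feature selection step such as Boruta, alongside the measured variables.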
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52074085, U21A20117 and U21A20475), the Fundamental Research Funds for the Central Universities (Grant No. N2004010) and the Liaoning Revitalization Talents Program (XLYC1907065).
Abstract: The quality of hot-rolled steel strip is directly affected by the strip crown. Traditional machine learning models have shown limitations in accurately predicting the strip crown, particularly when dealing with imbalanced data. This limitation results in poor production quality and efficiency, leading to increased production costs. To address this issue, a novel strip crown prediction model based on the Boruta and extremely randomized trees (Boruta-ERT) algorithms was proposed. To improve the accuracy of the model, the synthetic minority over-sampling technique (SMOTE) was used to balance the imbalanced data sets. The Boruta-ERT prediction model was then used to select features and predict the strip crown. With the 2160 mm hot rolling production lines of a steel plant serving as the research object, the experimental results showed that 97.01% of the predictions had an absolute error of less than 8 μm. This level of accuracy met the control requirements for strip crown and demonstrated significant benefits for improving the production quality of steel strip.
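The core step of the synthetic minority over-sampling technique is interpolation between an under-represented sample and one of its nearest neighbours from the same group. The sketch below shows only that step in plain Python; the function name, the toy points and the parameter defaults are assumptions for illustration, and the abstract does not say how the authors configured or adapted the technique for strip crown data.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already covered by that group.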
Abstract: Data-driven modelling methods are being developed to achieve more accurate performance prediction of proton exchange membrane fuel cell (PEMFC) systems, whose physicochemical phenomena are complicated. However, little research in this field details the pre-processing and selection of balance of plant (BOP) features for the input layer of system performance prediction at different current densities. Furthermore, most previous research applies neural networks to simulation data rather than to real-time bench or vehicle operation datasets, which leads to low robustness and unreliable practical results. This paper details the application of a novel algorithm denoted XGBoost-Boruta, which combines an ensemble learning approach with a wrapper approach to improve the robustness of feature selection and to increase the accuracy and robustness of PEMFC system performance prediction. By introducing the Z score and shadow features to eliminate the randomness of conventional ensemble learning methods, seven key controllable BOP variables of the hydrogen anode, air cathode and cooling subsystems are selected as the original input variables to determine their dependency on the stack voltage. Two case studies are presented for verification and validation of the proposed algorithm, based on a real-time dataset of bench experimental data and on data obtained from heavy truck operation at current densities ranging from 100 to 1500 mA/cm². The feature selection strategy based on the proposed XGBoost-Boruta algorithm decreases the RMSE by 23.8% and 14.1% and increases R² by 0.06 and 0.04 for the bench experimental and heavy truck validation datasets, respectively.
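The shadow-feature idea can be illustrated without any particular learner: each feature is shuffled to produce a "shadow" copy that has no real relationship with the target, and a real feature scores a hit only when its importance exceeds the best shadow importance. The sketch below is a simplification under stated assumptions: absolute Pearson correlation stands in for the tree-based importance Z scores used by Boruta, the statistical test on hit counts is omitted, and all names and data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def shadow_hits(columns, y, rounds=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(rounds):
        shadow_scores = []
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)  # shuffling breaks any link to the target
            shadow_scores.append(abs(pearson(shadow, y)))
        threshold = max(shadow_scores)
        for j, col in enumerate(columns):
            if abs(pearson(col, y)) > threshold:
                hits[j] += 1
    return hits


# Hypothetical data: x0 drives the target, x1 is pure noise.
data_rng = random.Random(1)
x0 = [data_rng.gauss(0, 1) for _ in range(200)]
x1 = [data_rng.gauss(0, 1) for _ in range(200)]
y = [2 * a + data_rng.gauss(0, 0.1) for a in x0]
hits = shadow_hits([x0, x1], y)
```

A feature that wins in nearly every round would be kept, while one that rarely beats the shadows would be rejected; the real algorithm makes that decision with a statistical test rather than a raw count.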
Abstract: Wind speed forecasting is important for wind energy forecasting. In the modern era, the increase in energy demand can be managed effectively by forecasting the wind speed accurately. The main objective of this research is to improve the performance of wind speed forecasting by handling uncertainty, the curse of dimensionality, overfitting and non-linearity. The curse of dimensionality and overfitting are handled by Boruta feature selection. Uncertainty and non-linearity are addressed by the deep learning based Bi-directional Long Short Term Memory (Bi-LSTM). In this paper, Bi-LSTM with Boruta feature selection, named BFS-Bi-LSTM, is proposed to improve the performance of wind speed forecasting. The model identifies relevant features for wind speed forecasting from the meteorological features using Boruta wrapper feature selection (BFS). Bi-LSTM then predicts the wind speed by considering the wind speed from the past and future time steps. The proposed BFS-Bi-LSTM model is compared against Multilayer Perceptron (MLP), MLP with Boruta (BFS-MLP), Long Short Term Memory (LSTM), LSTM with Boruta (BFS-LSTM) and Bi-LSTM in terms of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE) and R². The BFS-Bi-LSTM surpassed the other models, producing an RMSE of 0.784, MAE of 0.530, MSE of 0.615 and R² of 0.8766. The experimental results show that the BFS-Bi-LSTM produced better forecasting results than the other models.
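The four reported metrics follow from the same residuals, and RMSE is the square root of MSE (0.784² ≈ 0.615, consistent with the figures in the abstract). A minimal sketch of how they are computed, using hypothetical observed and predicted values rather than data from the paper:

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MSE and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse, "R2": r2}


# Hypothetical wind speeds (observed vs. predicted).
m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```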
Funding: Supported by the Key Project of the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China (2021D01D06) and the National Natural Science Foundation of China (41961059).
Abstract: Visible and near-infrared (vis-NIR) spectroscopy allows fast and efficient determination of soil organic matter (SOM). However, a prerequisite for predicting SOM with vis-NIR spectroscopy is the effective removal of redundant information. This study therefore applies three wavelength selection strategies to obtain the spectral response characteristics of SOM. The SOM content and spectral information of 110 soil samples from the Ogan-Kuqa River Oasis were measured under laboratory conditions in July 2017. Pearson correlation analysis was used to preselect, from the preprocessed spectra, the wavelengths that passed the significance test at the 0.01 level. The successive projection algorithm (SPA), competitive adaptive reweighted sampling (CARS) and the Boruta algorithm were then used to detect the optimal variables among the preselected wavelengths. Finally, partial least squares regression (PLSR) and random forest (RF) models combined with the optimal wavelengths were applied to develop a quantitative estimation model of the SOM content. The results demonstrate that the optimal variables selected were mainly located near the spectral absorption features (i.e., 1400.0, 1900.0 and 2200.0 nm), and that CARS and the Boruta algorithm also selected a few visible wavelengths in the range of 480.0–510.0 nm. Both models achieved satisfactory prediction of the SOM content, and the RF model was more accurate than the PLSR model. The SOM content prediction model built with the Boruta algorithm combined with the RF model performed best: with 23 variables, it achieved a coefficient of determination (R²) of 0.78 and a residual prediction deviation (RPD) of 2.38. The Boruta algorithm effectively removed redundant information and refined the selected wavelengths, improving the prediction accuracy of the estimated SOM content. Combining vis-NIR spectroscopy with machine learning is therefore an important way to improve the accuracy of SOM prediction in arid land.
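The residual prediction deviation is commonly defined as the standard deviation of the observed values divided by the RMSE of the predictions, so larger values indicate a more useful model. The sketch below follows that common definition with the sample standard deviation (n - 1); conventions differ between studies, the abstract does not state which one was used, and the sample values are hypothetical.

```python
import math


def rpd(y_true, y_pred):
    """Residual prediction deviation: SD of observations divided by RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Sample standard deviation (n - 1); an assumption, conventions vary.
    sd = math.sqrt(sum((t - mean_true) ** 2 for t in y_true) / (n - 1))
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return sd / rmse


# Hypothetical SOM contents (observed vs. predicted).
value = rpd([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0])
```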
Funding: Supported by the Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease both the performance and the interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on the features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method is used to explore how the input features contribute to the output of a complex ML model. Larger settlements are observed to be induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low face pressure at the top of the shield increases the model's output.
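SHAP attributions are based on Shapley values, which average each feature's marginal contribution over all orders in which features can be switched from a baseline to their actual values. For a handful of features this can be computed exactly by enumeration, as in the sketch below. It is an illustration of the underlying quantity, not the approximation methods of the SHAP library, and the toy model, inputs and baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input."""
    n = len(x)
    totals = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = f(current)
        for j in order:
            current[j] = x[j]            # switch feature j to its real value
            value = f(current)
            totals[j] += value - prev    # marginal contribution of feature j
            prev = value
    return [t / len(orders) for t in totals]


def toy_model(v):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 2.0], baseline=[0.0, 0.0, 0.0])
```

The attributions always sum to the difference between the model output at the input and at the baseline, and the interaction term is split evenly between the two features involved.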