To reduce CO₂ emissions in response to global climate change, shale reservoirs could be ideal candidates for long-term carbon geo-sequestration involving multi-scale transport processes. However, most current CO₂ sequestration models do not adequately consider multiple transport mechanisms. Moreover, the evaluation of CO₂ storage processes usually involves laborious and time-consuming numerical simulations unsuitable for practical prediction and decision-making. In this paper, an integrated model involving gas diffusion, adsorption, dissolution, slip flow, and Darcy flow is proposed to accurately characterize CO₂ storage in depleted shale reservoirs, supporting the establishment of a training database. On this basis, a hybrid physics-informed data-driven neural network (HPDNN) is developed as a deep learning surrogate for prediction and inversion. By incorporating multiple sources of scientific knowledge, the HPDNN can be configured with limited simulation resources, significantly accelerating the forward and inversion processes. Furthermore, the HPDNN can more intelligently predict injection performance, precisely perform reservoir parameter inversion, and reasonably evaluate the CO₂ storage capacity under complicated scenarios. The validation and test results demonstrate that the HPDNN can ensure high accuracy and strong robustness across an extensive applicability range when dealing with field data with multiple noise sources. This study has tremendous potential to replace traditional modeling tools for predicting and making decisions about CO₂ storage projects in depleted shale reservoirs.
With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models leverage the ability to capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. Subsequently, the related studies are categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
The complex sand-casting process, combined with interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency, which includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, this model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. The prediction model, when applied in the factory, greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
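The RF-plus-Gini workflow described above can be sketched as follows. The feature names and data here are synthetic stand-ins (the study's real process parameters and defect labels are not reproduced in the abstract); the point is the classify-then-rank pattern.

```python
# Sketch: random forest defect classification plus Gini-based feature importance.
# Synthetic data with a hypothetical dominant parameter; not the paper's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical process parameters, e.g. pouring temperature, sand moisture, permeability
X = rng.normal(size=(n, 3))
# Synthetic rule: defect occurrence driven mostly by the first parameter
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

recall = recall_score(y_te, rf.predict(X_te))
# Mean-decrease-in-impurity (Gini) importances; should flag the first parameter
importances = rf.feature_importances_
```

On this synthetic task the first feature dominates the importance ranking, mirroring how the study identifies which process parameters drive each defect type.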
The heat transfer through a concave permeable fin is analyzed by the local thermal non-equilibrium (LTNE) model. The governing dimensional temperature equations for the solid and fluid phases of the porous extended surface are modeled and then nondimensionalized by suitable dimensionless terms. Further, the obtained nondimensional equations are solved by the clique polynomial method (CPM). The effects of several dimensionless parameters on the fin's thermal profiles are shown by graphical illustrations. Additionally, the current study implements deep neural structures to solve physics-governed coupled equations, and the best-suited hyperparameters are attained by comparison with various network combinations. The results of the CPM and the physics-informed neural network (PINN) exhibit good agreement, signifying that both methods effectively solve the thermal modeling problem.
Landslide susceptibility mapping is a crucial tool for disaster prevention and management. The performance of conventional data-driven models is greatly influenced by the quality of the sample data. The random selection of negative samples results in a lack of interpretability throughout the assessment process. To address this limitation and construct a high-quality negative-sample database, this study introduces a physics-informed machine learning approach, combining the random forest model with Scoops 3D, to optimize the negative-sample selection strategy and assess the landslide susceptibility of the study area. Scoops 3D is employed to determine the factor of safety, leveraging Bishop's simplified method. Instead of conventional random selection, negative samples are extracted from areas with a high factor of safety. Subsequently, the results of the conventional random forest model and the physics-informed data-driven model are analyzed and discussed, focusing on model performance and prediction uncertainty. In comparison to conventional methods, the physics-informed model, set with a safety-area threshold of 3, demonstrates a noteworthy improvement in the mean AUC value of 36.7%, coupled with reduced prediction uncertainty. It is evident that the determination of the safety-area threshold exerts an impact on both prediction uncertainty and model performance.
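The core sampling idea, drawing negatives only from cells whose factor of safety (FoS) exceeds a threshold, is easy to sketch. The FoS values below are synthetic; in the study they come from Scoops 3D via Bishop's simplified method.

```python
# Sketch: physics-informed negative-sample selection for susceptibility mapping.
# FoS values are synthetic stand-ins for Scoops 3D output.
import numpy as np

rng = np.random.default_rng(42)
n_cells = 10_000
fos = rng.uniform(0.5, 5.0, size=n_cells)     # stand-in factor-of-safety grid
landslide = fos < 1.0                         # positive (unstable) cells

threshold = 3.0                               # safety-area threshold from the study
stable_pool = np.flatnonzero(fos > threshold) # restrict negatives to high-FoS cells
n_neg = int(landslide.sum())                  # balance positives and negatives
negatives = rng.choice(stable_pool, size=n_neg, replace=False)
```

Compared with drawing negatives uniformly at random (which may pick cells that are in fact near failure), every selected negative here is physically justified as stable, which is what improves interpretability and AUC in the study.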
This study explores the effectiveness of machine learning models in predicting the air-side performance of microchannel heat exchangers. The data were generated by experimentally validated Computational Fluid Dynamics (CFD) simulations of air-to-water microchannel heat exchangers. A distinctive aspect of this research is the comparative analysis of four diverse machine learning algorithms: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Gaussian Process Regression (GPR). These models are adeptly applied to predict air-side heat transfer performance with high precision, with ANN and GPR exhibiting notably superior accuracy. Additionally, this research further delves into the influence of both geometric and operational parameters, including louvered angle, fin height, fin spacing, air inlet temperature, velocity, and tube temperature, on model performance. Moreover, it innovatively incorporates dimensionless numbers such as aspect ratio, fin height-to-spacing ratio, Reynolds number, Nusselt number, normalized air inlet temperature, temperature difference, and louvered angle into the input variables. This strategic inclusion significantly refines the predictive capabilities of the models by establishing a robust analytical framework supported by the CFD-generated database. The results show the enhanced prediction accuracy achieved by integrating dimensionless numbers, highlighting the effectiveness of data-driven approaches in precisely forecasting heat exchanger performance. This advancement is pivotal for the geometric optimization of heat exchangers, illustrating the considerable potential of integrating sophisticated modeling techniques with traditional engineering metrics.
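The dimensionless-feature idea can be sketched as a preprocessing step: derive dimensionless groups from raw geometric and operating inputs and append them to the feature matrix. The sample values, hydraulic-diameter formula (rectangular duct, D_h = 2ab/(a+b)), and air properties are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch: augmenting raw inputs with dimensionless groups before model training.
# Property values are standard air data at ~25 C; geometry values are hypothetical.
import numpy as np

rho, mu = 1.184, 1.85e-5                  # air density (kg/m^3), viscosity (Pa s)
fin_height = np.array([8e-3, 10e-3])      # m, hypothetical samples
fin_spacing = np.array([1.2e-3, 1.5e-3])  # m
velocity = np.array([2.0, 4.0])           # m/s, air inlet velocity

# Hydraulic diameter of a rectangular channel: D_h = 2ab / (a + b)
d_h = 2 * fin_height * fin_spacing / (fin_height + fin_spacing)

features = np.column_stack([
    fin_height / fin_spacing,             # fin height-to-spacing ratio
    rho * velocity * d_h / mu,            # Reynolds number
])
```

Feeding such groups to the regressors (ANN, SVM, RF, GPR) encodes similarity across geometries that raw dimensional inputs do not, which is the mechanism behind the reported accuracy gain.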
NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. The platform has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials across energy storage, conversion, and structural applications. Additionally, NJmat serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the workflow for materials informatics, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module leveraging language models like MatBERT and those based on Word2Vec to support materials prediction tasks. By utilizing clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models, which are often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance the empirical foundation of SD models. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
The world's increasing population requires the process industry to produce food, fuels, chemicals, and consumer products in a more efficient and sustainable way. Functional process materials lie at the heart of this challenge. Traditionally, new advanced materials are found empirically or through trial-and-error approaches. As theoretical methods and associated tools are continuously improved and computer power has reached a high level, it is now efficient and popular to use computational methods to guide material selection and design. Due to the strong interaction between material selection and the operation of the process in which the material is used, it is essential to perform material and process design simultaneously. Despite this significant connection, the solution of the integrated material and process design problem is not easy because multiple models at different scales are usually required. Hybrid modeling provides a promising option to tackle such complex design problems. In hybrid modeling, the material properties, which are computationally expensive to obtain, are described by data-driven models, while the well-known process-related principles are represented by mechanistic models. This article highlights the significance of hybrid modeling in multiscale material and process design. The generic design methodology is first introduced. Six important application areas are then selected: four from the chemical engineering field and two from the energy systems engineering domain. For each selected area, state-of-the-art work using hybrid modeling for multiscale material and process design is discussed. Concluding remarks are provided at the end, and current limitations and future opportunities are pointed out.
Increasing the production and utilization of shale gas is of great significance for building a clean and low-carbon energy system. A sharp decline in gas production has been widely observed in shale gas reservoirs. Forecasting shale gas production is still challenging due to complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase flow, and multi-scale flow, as well as data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. First, a comprehensive dominant-factor system has been established, including geological, drilling, fracturing, and production factors. Data processing and visualization are required to ensure data quality and determine the final data set. A shale gas production evaluation model is developed to evaluate shale gas production levels. Finally, the random forest algorithm is used to forecast shale gas production. The prediction accuracy of the shale gas production level is higher than 95% based on the shale gas reservoirs in China. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed shale gas production evaluation framework overcomes the numerous assumptions of analytical or semi-analytical models and avoids the huge computational cost and poor generalization of numerical modelling.
Accurate insight into the heat generation rate (HGR) of lithium-ion batteries (LIBs) is one of the key issues for battery management systems to formulate thermal safety warning strategies in advance. For this reason, this paper proposes a novel physics-informed neural network (PINN) approach for HGR estimation of LIBs under various driving conditions. Specifically, a single particle model with thermodynamics (SPMT) is first constructed to extract the critical physical knowledge related to battery HGR. Subsequently, the surface concentrations of the positive and negative electrodes in the battery SPMT model are integrated into bidirectional long short-term memory (BiLSTM) networks as physical information. Combined with other feature variables, this constitutes a novel PINN approach that achieves HGR estimation of LIBs with higher accuracy. Additionally, some critical hyperparameters of the BiLSTM used in the PINN approach are determined through a Bayesian optimization algorithm (BOA), and the results of the BOA-based BiLSTM are compared with those of traditional BiLSTM/LSTM networks. Eventually, combined with the HGR data generated from the validated virtual battery, it is proved that the proposed approach can well predict the battery HGR under the dynamic stress test (DST) and the worldwide harmonized light vehicles test procedure (WLTP): the mean absolute error under DST is 0.542 kW/m³, and the root mean square error under WLTP is 1.428 kW/m³ at 25 °C. Lastly, the investigation results also offer a new perspective on the application of the PINN approach to battery HGR estimation.
Steam cracking is the dominant technology for producing light olefins, which are believed to be the foundation of the chemical industry. Predictive models of the cracking process can boost production efficiency and profit margins. Rapid advancements in machine learning research have recently enabled data-driven solutions to usher in a new era of process modeling. Meanwhile, its practical application to steam cracking is still hindered by the trade-off between prediction accuracy and computational speed. This research presents a framework for data-driven intelligent modeling of the steam cracking process. Industrial data preparation and feature engineering techniques provide computation-ready datasets for the framework, and feedstock similarities are exploited using k-means clustering. We propose the LArge-Residuals-Deletion Multivariate Adaptive Regression Spline (LARD-MARS), a modeling approach that explicitly generates output formulas and eliminates potentially outlying instances. The framework is further validated by the presentation of clustering results, the explanation of variable importance, and the testing and comparison of model performance.
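The large-residuals-deletion idea can be sketched as an iterate-fit-and-trim loop: fit a regression, drop instances whose residuals exceed a multiple of the residual standard deviation, and refit until the kept set stabilizes. A plain linear model stands in for the MARS basis here (the paper uses multivariate adaptive regression splines; the deletion loop is the point of the sketch).

```python
# Sketch: large-residuals-deletion regression loop (LARD), with a linear model
# as a stand-in for MARS. Synthetic data with injected gross outliers.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.05 * rng.normal(size=500)
y[:10] += 5.0                                   # inject gross outliers

def lard_fit(X, y, k=3.0, max_iter=5):
    """Fit, delete instances with |residual| > k * residual std, refit."""
    mask = np.ones(len(y), dtype=bool)
    model = LinearRegression()
    for _ in range(max_iter):
        model.fit(X[mask], y[mask])
        resid = y - model.predict(X)
        cut = k * resid[mask].std()
        new_mask = np.abs(resid) < cut
        if np.array_equal(new_mask, mask):      # kept set stabilized
            break
        mask = new_mask
    return model, mask

model, kept = lard_fit(X, y)
```

After trimming, the fitted coefficients recover the clean relationship (2.0 and -1.0) rather than being dragged by the outlying instances, which is the robustness property LARD-MARS targets.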
Vortex-induced vibration (VIV) is a challenge in ocean engineering. Several devices, including fairings, have been designed to suppress VIV. However, how to optimize the design of suppression devices is still a problem to be solved. In this paper, an optimization design methodology is presented based on data-driven models and a genetic algorithm (GA). Data-driven models are introduced to substitute for complex physics-based equations. The GA is used to rapidly search for the optimal suppression device among all possible solutions. Taking fairings as an example, a VIV response database for different fairings is established based on parameterized models in which the model sections of fairings are controlled by several control points and Bezier curves. Then a data-driven model, which can predict the VIV response of fairings with different sections accurately and efficiently, is trained through a BP neural network. Finally, a comprehensive optimization method and process is proposed based on the GA and the data-driven model. The proposed method is demonstrated by its application to a case study. It turns out that the proposed method can perform the optimization design of fairings effectively: VIV can be reduced obviously through the optimization design.
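The GA-plus-surrogate loop can be sketched compactly: a cheap surrogate replaces the physics model inside the fitness evaluation, and the GA evolves the control-point parameters. The quadratic "surrogate" below is a stand-in with a known optimum (the paper trains a BP neural network on a VIV response database); the GA operators shown are generic, not the paper's exact configuration.

```python
# Sketch: genetic algorithm searching a surrogate model of VIV response.
# The surrogate is a stand-in quadratic with its minimum at x = 0.3.
import numpy as np

rng = np.random.default_rng(7)

def surrogate_viv(x):
    """Stand-in surrogate: predicted VIV response for design parameters x."""
    return np.sum((x - 0.3) ** 2, axis=-1)

def ga_minimize(f, dim=4, pop=40, gens=60, mut=0.1):
    P = rng.uniform(0, 1, size=(pop, dim))         # control-point params in [0, 1]
    for _ in range(gens):
        order = np.argsort(f(P))
        parents = P[order[: pop // 2]]             # truncation selection (elitist)
        cut = rng.integers(1, dim, size=pop // 2)  # one-point crossover
        a = parents[rng.permutation(pop // 2)]
        b = parents[rng.permutation(pop // 2)]
        children = np.where(np.arange(dim) < cut[:, None], a, b)
        children += mut * rng.normal(size=children.shape)  # Gaussian mutation
        children = np.clip(children, 0, 1)
        P = np.vstack([parents, children])
    return P[np.argmin(f(P))]

best = ga_minimize(surrogate_viv)
```

Because every fitness call hits the surrogate rather than a CFD or physics solve, the GA can evaluate thousands of candidate fairing sections cheaply, which is what makes the search "rapid" in the paper's sense.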
Machine learning (ML) provides a new surrogate method for investigating groundwater flow dynamics in unsaturated soils. Traditional pure data-driven methods (e.g. the deep neural network, DNN) can provide rapid predictions, but they require sufficient on-site data for accurate training and lack interpretability with respect to the physical processes within the data. In this paper, we provide a physics- and equality-constrained artificial neural network (PECANN) to derive unsaturated infiltration solutions with a small amount of initial and boundary data. PECANN takes the physics-informed neural network (PINN) as a foundation, encodes the unsaturated infiltration physical laws (i.e. the Richards equation, RE) into the loss function, and uses the augmented Lagrangian method to constrain the learning process of the solutions of the RE by adding a stronger penalty for the initial and boundary conditions. Four unsaturated infiltration cases are designed to test the training performance of PECANN: one-dimensional (1D) steady-state unsaturated infiltration, 1D transient-state infiltration, two-dimensional (2D) transient-state infiltration, and 1D coupled unsaturated infiltration and deformation. The predicted results of PECANN are compared with finite difference solutions or analytical solutions. The results indicate that PECANN can accurately capture the variations of pressure head during unsaturated infiltration and presents higher precision and robustness than the DNN and the PINN. It is also revealed that PECANN can achieve the same accuracy as the finite difference method with fewer initial and boundary training data. Additionally, we investigate the effect of the hyperparameters of PECANN on solving the RE problem. PECANN provides an effective tool for simulating unsaturated infiltration.
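For reference, the 1D head-based Richards equation and the augmented-Lagrangian loss structure implied by the description above can be written as follows; the paper's exact formulation (variable choices, multiplier updates) may differ.

```latex
% 1D Richards equation, head-based form: C(h) is specific moisture capacity,
% K(h) hydraulic conductivity, z elevation.
C(h)\,\frac{\partial h}{\partial t}
  = \frac{\partial}{\partial z}\!\left[K(h)\left(\frac{\partial h}{\partial z} + 1\right)\right]

% Augmented-Lagrangian training loss: PDE residual plus multiplier and penalty
% terms on the stacked initial/boundary-condition residuals g(\theta).
\mathcal{L}(\theta,\lambda;\mu)
  = \frac{1}{N_r}\sum_{i=1}^{N_r} r_\theta(z_i,t_i)^2
  + \lambda^{\top} g(\theta)
  + \frac{\mu}{2}\,\lVert g(\theta)\rVert_2^2
```

Here \(r_\theta\) is the network's RE residual at collocation points, and \(\lambda\), \(\mu\) are updated during training, which is how the augmented Lagrangian enforces the initial and boundary conditions more strongly than a fixed penalty would.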
To equip data-driven dynamic chemical process models with strong interpretability, we develop a light attention–convolution–gate recurrent unit (LACG) architecture with three sub-modules, a basic module, a brand-new light attention module, and a residue module, which are specially designed to learn the general dynamic behavior, transient disturbances, and other input factors of chemical processes, respectively. Combined with a hyperparameter optimization framework, Optuna, the effectiveness of the proposed LACG is tested by distributed control system data-driven modeling experiments on the discharge flowrate of an actual deethanization process. The LACG model provides significant advantages in prediction accuracy and model generalization compared with other models, including the feedforward neural network, the convolution neural network, long short-term memory (LSTM), and attention-LSTM. Moreover, compared with the simulation results of a deethanization model built using Aspen Plus Dynamics V12.1, the LACG parameters are demonstrated to be interpretable, and more details on the variable interactions can be observed from the model parameters in comparison with the traditional interpretable model, attention-LSTM. This contribution enriches interpretable machine learning knowledge and provides a reliable method with high accuracy for actual chemical process modeling, paving a route to intelligent manufacturing.
The dynamical modeling of projectile systems with sufficient accuracy is of great difficulty due to the high-dimensional space and various perturbations. With the rapid development of data science and scientific measurement tools, numerous data-driven methods have recently been devoted to discovering governing laws from data. In this work, a data-driven method is employed to perform the modeling of the projectile based on the Kramers–Moyal formulas. More specifically, the four-dimensional projectile system is assumed to be an Itô stochastic differential equation. Then the least squares method and sparse learning are applied to identify the drift coefficient and diffusion matrix from sample path data, which agree well with the real system. The effectiveness of the data-driven method demonstrates that it will become a powerful tool for extracting governing equations and predicting complex dynamical behaviors of the projectile.
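The drift-identification step can be illustrated on a one-dimensional toy problem: for an SDE dX = a(X) dt + b dW, the first Kramers–Moyal coefficient gives a(x) ≈ E[ΔX | X ≈ x] / Δt, and a least squares fit over a candidate dictionary recovers it from a sample path. The Ornstein–Uhlenbeck process below is a stand-in for the paper's 4D projectile system.

```python
# Sketch: least squares identification of the drift from a simulated sample path,
# via the first Kramers-Moyal coefficient. 1D Ornstein-Uhlenbeck stand-in.
import numpy as np

rng = np.random.default_rng(3)
dt, n = 1e-3, 200_000
theta, sigma = 2.0, 0.5                  # true drift a(x) = -theta * x
noise = rng.normal(size=n - 1)

x = np.empty(n)                          # Euler-Maruyama sample path
x[0] = 1.0
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * np.sqrt(dt) * noise[i]

# Least squares fit of the drift on the dictionary {1, x}: dX/dt ~ c0 + c1 * x
dx_dt = np.diff(x) / dt
A = np.column_stack([np.ones(n - 1), x[:-1]])
c0, c1 = np.linalg.lstsq(A, dx_dt, rcond=None)[0]
```

The recovered slope c1 approximates -theta; in the paper the same regression runs over a larger dictionary with sparse learning to prune spurious terms, and the second Kramers–Moyal coefficient yields the diffusion matrix.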
This work addresses the multiscale optimization of the purification processes of antibody fragments. Chromatography decisions in the manufacturing processes are optimized, including the number of chromatography columns and their sizes, the number of cycles per batch, and the operational flow velocities. Data-driven models of chromatography throughput are developed considering loaded mass, flow velocity, and column bed height as the inputs, using manufacturing-scale simulated datasets based on microscale experimental data. The piecewise linear regression modeling method is adopted due to its simplicity and better prediction accuracy in comparison with other methods. Two alternative mixed-integer nonlinear programming (MINLP) models are proposed to minimize the total cost of goods per gram of the antibody purification process, incorporating the data-driven models. These MINLP models are then reformulated as mixed-integer linear programming (MILP) models using linearization techniques and multiparametric disaggregation. Two industrially relevant cases with different chromatography column size alternatives are investigated to demonstrate the applicability of the proposed models.
The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. Data boosting augmentation involves designing the reliability weight and actual-virtual weight functions and developing a double-weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
Using stochastic dynamic simulation for railway vehicle collisions still faces many challenges, such as high modelling complexity and time-consuming computation. To address these challenges, we introduce a novel data-driven stochastic process modelling (DSPM) approach into the dynamic simulation of railway vehicle collisions. This DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate the computation of stochastic processes. By applying the DSPM, Gaussian process regression (GPR), and finite element (FE) methods to two collision scenarios (i.e. a lead car colliding with a rigid wall, and a lead car colliding with another lead car), we are able to achieve a comprehensive analysis. The comparison between the DSPM approach and the FE method revealed that the DSPM approach is capable of calculating the corresponding confidence intervals while simultaneously improving the overall computational efficiency. Comparing the DSPM approach with the GPR method indicates that the DSPM approach has the ability to accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamics simulation of railway vehicle collisions.
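The confidence-interval idea that distinguishes stochastic surrogates from deterministic FE runs can be sketched with plain Gaussian process regression: the model returns both a mean response and a standard deviation at unseen inputs. This is the baseline GPR the paper compares against; the DSPM approach builds on richer kernels and stochastic variational inference, which are not reproduced here.

```python
# Sketch: Gaussian-process surrogate returning mean and 95% confidence band.
# Synthetic 1D "collision response" data; the paper's scenarios are multivariate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(30, 1))            # e.g. impact speed samples
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30) # noisy stand-in response

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True, random_state=0).fit(X, y)

X_new = np.linspace(0, 10, 50)[:, None]
mean, std = gpr.predict(X_new, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # 95% confidence band
```

Each prediction comes with an uncertainty band at no extra simulation cost, which is the capability the paper's comparison against the FE method highlights.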
Recently, haze in China has become more and more serious, but it is very difficult to model and control. Here, a data-driven model is introduced for the simulation and monitoring of China's haze. First, a multi-dimensional evaluation system is built to evaluate government performance on haze control. Second, a data-driven model is employed to reveal the operating mechanism of China's haze, described as a multi-input, multi-output system. Third, a prototype system is set up to verify the proposed scheme, and the result provides us with a graphical tool to monitor different haze control strategies.
Funding: This work is funded by the National Natural Science Foundation of China (Nos. 42202292 and 42141011) and the Program for Jilin University (JLU) Science and Technology Innovative Research Team (No. 2019TD-35). The authors would also like to thank the reviewers and editors, whose critical comments were very helpful in preparing this article.
文摘To reduce CO_(2) emissions in response to global climate change, shale reservoirs could be ideal candidates for long-term carbon geo-sequestration involving multi-scale transport processes. However, most current CO_(2) sequestration models do not adequately consider multiple transport mechanisms. Moreover, the evaluation of CO_(2) storage processes usually involves laborious and time-consuming numerical simulations unsuitable for practical prediction and decision-making. In this paper, an integrated model involving gas diffusion, adsorption, dissolution, slip flow, and Darcy flow is proposed to accurately characterize CO_(2) storage in depleted shale reservoirs, supporting the establishment of a training database. On this basis, a hybrid physics-informed data-driven neural network (HPDNN) is developed as a deep learning surrogate for prediction and inversion. By incorporating multiple sources of scientific knowledge, the HPDNN can be configured with limited simulation resources, significantly accelerating the forward and inversion processes. Furthermore, the HPDNN can more intelligently predict injection performance, precisely perform reservoir parameter inversion, and reasonably evaluate the CO_(2) storage capacity under complicated scenarios. The validation and test results demonstrate that the HPDNN can ensure high accuracy and strong robustness across an extensive applicability range when dealing with field data with multiple noise sources. This study has tremendous potential to replace traditional modeling tools for predicting and making decisions about CO_(2) storage projects in depleted shale reservoirs.
基金Supported by the National Natural Science Foundation of China under Grant No. 52131102.
文摘With the rapid advancement of machine learning technology and its growing adoption in research and engineering applications, an increasing number of studies have embraced data-driven approaches for modeling wind turbine wakes. These models leverage the ability to capture complex, high-dimensional characteristics of wind turbine wakes while offering significantly greater efficiency in the prediction process than physics-driven models. As a result, data-driven wind turbine wake models are regarded as powerful and effective tools for predicting wake behavior and turbine power output. This paper aims to provide a concise yet comprehensive review of existing studies on wind turbine wake modeling that employ data-driven approaches. It begins by defining and classifying machine learning methods to facilitate a clearer understanding of the reviewed literature. Subsequently, the related studies are categorized into four key areas: wind turbine power prediction, data-driven analytic wake models, wake field reconstruction, and the incorporation of explicit physical constraints. The accuracy of data-driven models is influenced by two primary factors: the quality of the training data and the performance of the model itself. Accordingly, both data accuracy and model structure are discussed in detail within the review.
基金Financially supported by the National Key Research and Development Program of China (2022YFB3706800, 2020YFB1710100) and the National Natural Science Foundation of China (51821001, 52090042, 52074183).
文摘The complex sand-casting process, combined with the interactions between process parameters, makes it difficult to control casting quality, resulting in a high scrap rate. A strategy based on a data-driven model was proposed to reduce casting defects and improve production efficiency, which includes a random forest (RF) classification model, feature importance analysis, and process parameter optimization with Monte Carlo simulation. The collected data, covering four types of defects and the corresponding process parameters, were used to construct the RF model. Classification results show a recall rate above 90% for all categories. The Gini index was used to assess the importance of the process parameters in the formation of the various defects in the RF model. Finally, the classification model was applied to different production conditions for quality prediction. In the case of process parameter optimization for gas porosity defects, this model serves as the experimental process in the Monte Carlo method to estimate a better temperature distribution. The prediction model, when applied in the factory, greatly improved the efficiency of defect detection. Results show that the scrap rate decreased from 10.16% to 6.68%.
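The RF-plus-Gini-importance workflow described above can be sketched in a few lines. Everything here is invented for illustration (the process parameters, the synthetic defect rule, and the data are assumptions, not values from the paper); the point is only to show how a trained forest exposes Gini importances.

```python
# Hypothetical sketch: random forest defect classification with Gini importance.
# Feature columns and the defect-generating rule are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.normal(1400, 30, n),   # invented: pouring temperature (deg C)
    rng.uniform(2, 8, n),      # invented: mold humidity (%)
    rng.normal(100, 10, n),    # invented: injection pressure (kPa)
])
# Synthetic rule: gas porosity defect driven mainly by mold humidity
y = (X[:, 1] > 5.5).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importance = clf.feature_importances_   # Gini importances, normalized to sum to 1
top_feature = int(np.argmax(importance))  # expected: index 1 (humidity)
```

With a rule this clean, the forest attributes almost all importance to the humidity column, which is the kind of ranking the paper uses to link parameters to defect types.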
基金Funding for this work was provided through the Small Research Project under grant number RGP.1/141/45.
文摘The heat transfer through a concave permeable fin is analyzed by the local thermal non-equilibrium (LTNE) model. The governing dimensional temperature equations for the solid and fluid phases of the porous extended surface are modeled, and are then nondimensionalized by suitable dimensionless terms. Further, the obtained nondimensional equations are solved by the clique polynomial method (CPM). The effects of several dimensionless parameters on the fin's thermal profiles are shown by graphical illustrations. Additionally, the current study implements deep neural structures to solve physics-governed coupled equations, and the best-suited hyperparameters are attained by comparison with various network combinations. The results of the CPM and the physics-informed neural network (PINN) exhibit good agreement, signifying that both methods effectively solve the thermal modeling problem.
基金Project (G2022165004L) supported by the High-end Foreign Expert Introduction Program, China; Project (2021XM3008) supported by the Special Foundation of the Postdoctoral Support Program, Chongqing, China; Project (2018-ZL-01) supported by the Sichuan Transportation Science and Technology Project, China; Project (HZ2021001) supported by the Chongqing Municipal Education Commission, China.
文摘Landslide susceptibility mapping is a crucial tool for disaster prevention and management. The performance of conventional data-driven models is greatly influenced by the quality of the sample data. The random selection of negative samples results in a lack of interpretability throughout the assessment process. To address this limitation and construct a high-quality negative-sample database, this study introduces a physics-informed machine learning approach, combining the random forest model with Scoops 3D, to optimize the negative-sample selection strategy and assess the landslide susceptibility of the study area. Scoops 3D is employed to determine the factor of safety, leveraging Bishop's simplified method. Instead of conventional random selection, negative samples are extracted from areas with a high factor of safety. Subsequently, the results of the conventional random forest model and the physics-informed data-driven model are analyzed and discussed, focusing on model performance and prediction uncertainty. In comparison to conventional methods, the physics-informed model, set with a safety-area threshold of 3, demonstrates a noteworthy improvement in the mean AUC value of 36.7%, coupled with reduced prediction uncertainty. It is evident that the determination of the safety-area threshold exerts an impact on both prediction uncertainty and model performance.
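The negative-sampling idea above is simple to state in code: draw negatives only from cells whose factor of safety exceeds the chosen threshold. In this minimal sketch the factor-of-safety field is synthetic (a stand-in for a Scoops 3D / Bishop's-method output), and the threshold of 3 follows the abstract.

```python
# Physics-informed negative sampling: restrict negatives to "safe" cells.
# The FoS field here is random, standing in for a Scoops 3D computation.
import numpy as np

rng = np.random.default_rng(1)
fos = rng.uniform(0.5, 5.0, size=10_000)   # stand-in factor-of-safety values per cell
threshold = 3.0                            # safety-area threshold used in the study

safe_idx = np.flatnonzero(fos > threshold) # candidate pool of high-FoS cells
n_neg = 500
negatives = rng.choice(safe_idx, size=n_neg, replace=False)  # sampled negative cells
```

Compared with uniform random selection over the whole map, every chosen negative is now physically justified as stable, which is the source of the interpretability gain claimed in the abstract.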
基金supported by the National Natural Science Foundation of China(Grant No.52306026)the Wenzhou Municipal Science and Technology Research Program(Grant No.G20220012)+2 种基金the Special Innovation Project Fund of the Institute of Wenzhou,Zhejiang University(XMGL-KJZX202205)the State Key Laboratory of Air-Conditioning Equipment and System Energy Conservation Open Project(Project No.ACSKL2021KT01)the Special Innovation Project Fund of the Institute of Wenzhou,Zhejiang University(XMGL-KJZX-202205).
文摘This study explores the effectiveness of machine learning models in predicting the air-side performance of microchannel heat exchangers. The data were generated by experimentally validated Computational Fluid Dynamics (CFD) simulations of air-to-water microchannel heat exchangers. A distinctive aspect of this research is the comparative analysis of four diverse machine learning algorithms: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF), and Gaussian Process Regression (GPR). These models are adeptly applied to predict air-side heat transfer performance with high precision, with ANN and GPR exhibiting notably superior accuracy. Additionally, this research further delves into the influence of both geometric and operational parameters, including louvered angle, fin height, fin spacing, air inlet temperature, velocity, and tube temperature, on model performance. Moreover, it innovatively incorporates dimensionless numbers such as aspect ratio, fin height-to-spacing ratio, Reynolds number, Nusselt number, normalized air inlet temperature, temperature difference, and louvered angle into the input variables. This strategic inclusion significantly refines the predictive capabilities of the models by establishing a robust analytical framework supported by the CFD-generated database. The results show the enhanced prediction accuracy achieved by integrating dimensionless numbers, highlighting the effectiveness of data-driven approaches in precisely forecasting heat exchanger performance. This advancement is pivotal for the geometric optimization of heat exchangers, illustrating the considerable potential of integrating sophisticated modeling techniques with traditional engineering metrics.
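Building dimensionless input features of the kind listed above is a one-liner each. The property values below are generic assumptions for air at room conditions, not data from the study; they only illustrate how raw geometric and operating parameters collapse into the dimensionless groups fed to the models.

```python
# Illustrative construction of dimensionless model inputs.
# All numerical values are assumptions for demonstration only.
rho, mu = 1.18, 1.85e-5            # air density (kg/m^3), dynamic viscosity (Pa*s)
v, d_h = 3.0, 1.5e-3               # inlet velocity (m/s), hydraulic diameter (m)
fin_height, fin_spacing = 8e-3, 1.2e-3   # fin geometry (m)
channel_w, channel_h = 1.0e-3, 0.5e-3    # channel cross-section (m)

reynolds = rho * v * d_h / mu            # Re = rho*v*D_h/mu, flow regime
aspect_ratio = channel_w / channel_h     # channel aspect ratio
height_to_spacing = fin_height / fin_spacing  # fin height-to-spacing ratio
```

Feeding such ratios instead of (or alongside) raw dimensional inputs is what lets one trained model generalize across geometrically similar designs.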
基金supported by the Jiangsu Provincial Science and Technology Project Basic Research Program(Natural Science Foundation of Jiangsu Province)(No.BK20211283).
文摘NJmat is a user-friendly, data-driven machine learning interface designed for materials design and analysis. The platform integrates advanced computational techniques, including natural language processing (NLP), large language models (LLM), machine learning potentials (MLP), and graph neural networks (GNN), to facilitate materials discovery. The platform has been applied in diverse materials research areas, including perovskite surface design, catalyst discovery, battery materials screening, structural alloy design, and molecular informatics. By automating feature selection, predictive modeling, and result interpretation, NJmat accelerates the development of high-performance materials across energy storage, conversion, and structural applications. Additionally, NJmat serves as an educational tool, allowing students and researchers to apply machine learning techniques in materials science with minimal coding expertise. Through automated feature extraction, genetic algorithms, and interpretable machine learning models, NJmat simplifies the workflow for materials informatics, bridging the gap between AI and experimental materials research. The latest version (available at https://figshare.com/articles/software/NJmatML/24607893 (accessed on 01 January 2025)) enhances its functionality by incorporating NJmatNLP, a module leveraging language models such as MatBERT and those based on Word2Vec to support materials prediction tasks. By utilizing clustering and cosine similarity analysis with UMAP visualization, NJmat enables intuitive exploration of materials datasets. While NJmat primarily focuses on structure-property relationships and the discovery of novel chemistries, it can also assist in optimizing processing conditions when relevant parameters are included in the training data. By providing an accessible, integrated environment for machine learning-driven materials discovery, NJmat aligns with the objectives of the Materials Genome Initiative and promotes broader adoption of AI techniques in materials science.
基金sponsored by the U.S.Department of Housing and Urban Development(Grant No.NJLTS0027-22)The opinions expressed in this study are the authors alone,and do not represent the U.S.Depart-ment of HUD’s opinions.
文摘This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models, often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models' empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
文摘The world's increasing population requires the process industry to produce food, fuels, chemicals, and consumer products in a more efficient and sustainable way. Functional process materials lie at the heart of this challenge. Traditionally, new advanced materials are found empirically or through trial-and-error approaches. As theoretical methods and associated tools are being continuously improved and computer power has reached a high level, it is now efficient and popular to use computational methods to guide material selection and design. Due to the strong interaction between material selection and the operation of the process in which the material is used, it is essential to perform material and process design simultaneously. Despite this significant connection, the solution of the integrated material and process design problem is not easy because multiple models at different scales are usually required. Hybrid modeling provides a promising option to tackle such complex design problems. In hybrid modeling, the material properties, which are computationally expensive to obtain, are described by data-driven models, while the well-known process-related principles are represented by mechanistic models. This article highlights the significance of hybrid modeling in multiscale material and process design. The generic design methodology is first introduced. Six important application areas are then selected: four from the chemical engineering field and two from the energy systems engineering domain. For each selected area, state-of-the-art work using hybrid modeling for multiscale material and process design is discussed. Concluding remarks are provided at the end, and current limitations and future opportunities are pointed out.
基金funded by National Natural Science Foundation of China(52004238)China Postdoctoral Science Foundation(2019M663561).
文摘Increasing the production and utilization of shale gas is of great significance for building a clean and low-carbon energy system. Sharp declines in gas production have been widely observed in shale gas reservoirs. How to forecast shale gas production is still challenging due to complex fracture networks, dynamic fracture properties, frac hits, complicated multiphase flow, and multi-scale flow, as well as data quality and uncertainty. This work develops an integrated framework for evaluating shale gas well production based on data-driven models. Firstly, a comprehensive dominant-factor system has been established, including geological, drilling, fracturing, and production factors. Data processing and visualization are required to ensure data quality and determine the final data set. A shale gas production evaluation model is developed to evaluate shale gas production levels. Finally, the random forest algorithm is used to forecast shale gas production. The prediction accuracy of the shale gas production level is higher than 95% based on the shale gas reservoirs in China. Forty-one wells are randomly selected to predict cumulative gas production using the optimal regression model. The proposed shale gas production evaluation framework overcomes the excessive assumptions of analytical or semi-analytical models and avoids the huge computational cost and poor generalization of numerical modelling.
基金funded by the Artificial Intelligence Technology Project of Xi’an Science and Technology Bureau in China(No.21RGZN0014)。
文摘Accurate insight into the heat generation rate (HGR) of lithium-ion batteries (LIBs) is one of the key issues for battery management systems to formulate thermal safety warning strategies in advance. For this reason, this paper proposes a novel physics-informed neural network (PINN) approach for HGR estimation of LIBs under various driving conditions. Specifically, a single particle model with thermodynamics (SPMT) is first constructed for extracting the critical physical knowledge related to battery HGR. Subsequently, the surface concentrations of the positive and negative electrodes in the battery SPMT model are integrated into bidirectional long short-term memory (BiLSTM) networks as physical information. Combined with other feature variables, a novel PINN approach to achieve HGR estimation of LIBs with higher accuracy is constituted. Additionally, some critical hyperparameters of the BiLSTM used in the PINN approach are determined through a Bayesian optimization algorithm (BOA), and the results of the BOA-based BiLSTM are compared with those of other traditional BiLSTM/LSTM networks. Eventually, combined with the HGR data generated from the validated virtual battery, it is proved that the proposed approach can well predict the battery HGR under the dynamic stress test (DST) and the worldwide harmonized light vehicles test procedure (WLTP): the mean absolute error under DST is 0.542 kW/m^(3), and the root mean square error under WLTP is 1.428 kW/m^(3) at 25℃. Lastly, the investigation results of this paper also show a new perspective on the application of the PINN approach in battery HGR estimation.
基金supported by the National Key Research and Development Program of China(2021 YFB 4000500,2021 YFB 4000501,and 2021 YFB 4000502)。
文摘Steam cracking is the dominant technology for producing light olefins, which are believed to be the foundation of the chemical industry. Predictive models of the cracking process can boost production efficiency and profit margin. Rapid advancements in machine learning research have recently enabled data-driven solutions to usher in a new era of process modeling. Meanwhile, its practical application to steam cracking is still hindered by the trade-off between prediction accuracy and computational speed. This research presents a framework for data-driven intelligent modeling of the steam cracking process. Industrial data preparation and feature engineering techniques provide computation-ready datasets for the framework, and feedstock similarities are exploited using k-means clustering. We propose LArge-Residuals-Deletion Multivariate Adaptive Regression Spline (LARD-MARS), a modeling approach that explicitly generates output formulas and eliminates potentially outlying instances. The framework is validated further by the presentation of clustering results, the explanation of variable importance, and the testing and comparison of model performance.
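The feedstock-grouping step above can be illustrated with k-means on a toy characterization table. The two feature columns and both feedstock families are invented (rough stand-ins for light versus heavy naphtha surrogates); the paper's actual features and per-cluster LARD-MARS models are not reproduced here.

```python
# Sketch of feedstock clustering: k-means groups similar feedstocks so that a
# separate regression model could be trained per cluster. Data are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Invented families: columns = [specific gravity, mean boiling point (deg C)]
light = rng.normal([0.70, 95.0], [0.01, 2.0], size=(60, 2))
heavy = rng.normal([0.76, 130.0], [0.01, 3.0], size=(60, 2))
X = np.vstack([light, heavy])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_   # cluster id per feedstock sample
```

With families this well separated, the recovered clusters coincide with the true grouping; on real feedstock data one would typically standardize the columns first so no single unit dominates the distance.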
基金supported by the National Natural Science Foundation of China(Grant No.51809279)the Major National Science and Technology Program(Grant No.2016ZX05028-001-05)+1 种基金Program for Changjiang Scholars and Innovative Research Team in University(Grant No.IRT14R58)the Fundamental Research Funds for the Central Universities,that is,the Opening Fund of National Engineering Laboratory of Offshore Geophysical and Exploration Equipment(Grant No.20CX02302A).
文摘Vortex-induced vibration (VIV) is a challenge in ocean engineering. Several devices, including fairings, have been designed to suppress VIV. However, how to optimize the design of suppression devices is still a problem to be solved. In this paper, an optimization design methodology is presented based on data-driven models and a genetic algorithm (GA). Data-driven models are introduced to substitute for complex physics-based equations, and the GA is used to rapidly search for the optimal suppression device among all possible solutions. Taking fairings as an example, a VIV response database for different fairings is established based on parameterized models in which the fairing sections are controlled by several control points and Bezier curves. Then a data-driven model, which can predict the VIV response of fairings with different sections accurately and efficiently, is trained through a BP neural network. Finally, a comprehensive optimization method and process is proposed based on the GA and the data-driven model. The proposed method is demonstrated by its application to a case study. It turns out that the proposed method can perform the optimization design of fairings effectively, and VIV can be reduced markedly through the optimization design.
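The surrogate-plus-GA loop described above can be sketched compactly. Here the trained neural-network VIV predictor is replaced by an explicit toy function with a known minimum (an assumption purely for demonstration), and the two "control-point" parameters are abstract coordinates in [0, 1].

```python
# Surrogate-assisted GA sketch: cheap surrogate scores candidates, GA evolves them.
# The quadratic surrogate and its optimum at (0.3, 0.7) are invented stand-ins.
import numpy as np

rng = np.random.default_rng(3)

def surrogate_viv(x):
    """Toy stand-in for the data-driven VIV response model (minimum at (0.3, 0.7))."""
    return (x[..., 0] - 0.3) ** 2 + (x[..., 1] - 0.7) ** 2

pop = rng.uniform(0, 1, size=(40, 2))            # candidate fairing parameters
for _ in range(60):                              # generations
    fitness = surrogate_viv(pop)
    parents = pop[np.argsort(fitness)[:20]]      # selection: keep the best half
    children = parents[rng.integers(0, 20, 20)] + rng.normal(0, 0.05, (20, 2))
    pop = np.vstack([parents, np.clip(children, 0, 1)])  # elitism + mutation

best = pop[np.argmin(surrogate_viv(pop))]        # near (0.3, 0.7) after evolution
```

Because each fitness evaluation is just a surrogate call rather than a CFD or experimental run, the GA can afford thousands of evaluations, which is the practical point of the paper's pairing.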
基金funding support from the science and technology innovation Program of Hunan Province(Grant No.2023RC1017)Hunan Provincial Postgraduate Research and Innovation Project(Grant No.CX20220109)National Natural Science Foundation of China Youth Fund(Grant No.52208378).
文摘Machine learning (ML) provides a new surrogate method for investigating groundwater flow dynamics in unsaturated soils. Traditional purely data-driven methods (e.g. a deep neural network, DNN) can provide rapid predictions, but they require sufficient on-site data for accurate training and lack interpretability with respect to the physical processes within the data. In this paper, we provide a physics- and equality-constrained artificial neural network (PECANN) to derive unsaturated infiltration solutions with a small amount of initial and boundary data. PECANN takes the physics-informed neural network (PINN) as a foundation, encodes the physical laws of unsaturated infiltration (i.e. the Richards equation, RE) into the loss function, and uses the augmented Lagrangian method to constrain the learning process of the solutions of the RE by adding a stronger penalty for the initial and boundary conditions. Four unsaturated infiltration cases are designed to test the training performance of PECANN, i.e. one-dimensional (1D) steady-state unsaturated infiltration, 1D transient-state infiltration, two-dimensional (2D) transient-state infiltration, and 1D coupled unsaturated infiltration and deformation. The predicted results of PECANN are compared with finite difference solutions or analytical solutions. The results indicate that PECANN can accurately capture the variations of the pressure head during unsaturated infiltration, and offers higher precision and robustness than the DNN and PINN. It is also revealed that PECANN can achieve the same accuracy as the finite difference method with fewer initial and boundary training data. Additionally, we investigate the effect of the hyperparameters of PECANN on solving the RE problem. PECANN provides an effective tool for simulating unsaturated infiltration.
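The augmented Lagrangian mechanism that PECANN uses to enforce initial/boundary conditions can be shown on a scalar problem instead of a full neural network: minimize f(x, y) = x^2 + y^2 subject to x + y = 1, where the equality constraint stands in for a boundary-condition residual. This is a sketch of the constraint-handling idea only, not of the paper's network or the Richards equation.

```python
# Augmented Lagrangian sketch: the multiplier update progressively strengthens
# the penalty on the constraint, mimicking how PECANN enforces boundary terms.
import numpy as np

lam, mu = 0.0, 10.0            # Lagrange multiplier and penalty weight
x = np.zeros(2)
for _ in range(30):            # outer loop: multiplier updates
    for _ in range(200):       # inner loop: minimize L = f + lam*g + (mu/2)*g^2
        g = x.sum() - 1.0      # constraint residual g(x) = x + y - 1
        grad = 2 * x + (lam + mu * g) * np.ones(2)
        x -= 0.01 * grad       # plain gradient descent step
    lam += mu * (x.sum() - 1.0)  # dual ascent on the multiplier

# Converges to the constrained optimum x = y = 0.5 with lam -> -1
constraint_violation = abs(x.sum() - 1.0)
```

Unlike a fixed penalty, the multiplier update drives the constraint violation toward zero without requiring mu to grow unboundedly, which is why the method trains stably with strong boundary enforcement.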
基金support provided by the National Natural Science Foundation of China(22122802,22278044,and 21878028)the Chongqing Science Fund for Distinguished Young Scholars(CSTB2022NSCQ-JQX0021)the Fundamental Research Funds for the Central Universities(2022CDJXY-003).
文摘To equip data-driven dynamic chemical process models with strong interpretability, we develop a light attention–convolution–gate recurrent unit (LACG) architecture with three sub-modules: a basic module, a brand-new light attention module, and a residue module. These are specially designed to learn the general dynamic behavior, transient disturbances, and other input factors of chemical processes, respectively. Combined with a hyperparameter optimization framework, Optuna, the effectiveness of the proposed LACG is tested by distributed control system data-driven modeling experiments on the discharge flowrate of an actual deethanization process. The LACG model provides significant advantages in prediction accuracy and model generalization compared with other models, including the feedforward neural network, convolution neural network, long short-term memory (LSTM), and attention-LSTM. Moreover, compared with the simulation results of a deethanization model built using Aspen Plus Dynamics V12.1, the LACG parameters are demonstrated to be interpretable, and more details on the variable interactions can be observed from the model parameters in comparison with the traditional interpretable model attention-LSTM. This contribution enriches interpretable machine learning knowledge and provides a reliable method with high accuracy for actual chemical process modeling, paving a route to intelligent manufacturing.
基金the Six Talent Peaks Project in Jiangsu Province,China(Grant No.JXQC-002)。
文摘The dynamical modeling of projectile systems with sufficient accuracy is of great difficulty due to the high-dimensional state space and various perturbations. With the rapid development of data science and scientific measurement tools, numerous data-driven methods have recently been devoted to discovering governing laws from data. In this work, a data-driven method is employed to perform the modeling of the projectile based on the Kramers–Moyal formulas. More specifically, the four-dimensional projectile system is assumed to be an Itô stochastic differential equation. Then the least squares method and sparse learning are applied to identify the drift coefficient and diffusion matrix from sample path data, which agree well with the real system. The effectiveness of the data-driven method demonstrates that it will become a powerful tool for extracting governing equations and predicting complex dynamical behaviors of the projectile.
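The drift/diffusion identification idea behind the Kramers–Moyal formulas is easy to demonstrate in one dimension. Below, a 1-D Ornstein–Uhlenbeck process stands in for the four-dimensional projectile SDE (an assumption for illustration); the drift slope and diffusion constant are then recovered from the first and second conditional moments of the increments by least squares.

```python
# Kramers-Moyal sketch: simulate dX = -theta*X dt + sigma dW, then recover
# theta and sigma^2 from sample-path increments. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(4)
theta, sigma, dt, n = 2.0, 0.5, 1e-3, 400_000
x = np.empty(n)
x[0] = 1.0
noise = rng.normal(0.0, np.sqrt(dt), n - 1)
for i in range(n - 1):                       # Euler-Maruyama sample path
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]

dx = np.diff(x)
# First Kramers-Moyal coefficient: E[dx | x] = a*x*dt, least-squares fit of a
a = (x[:-1] @ dx) / (x[:-1] @ x[:-1]) / dt   # expect a close to -theta
# Second Kramers-Moyal coefficient: E[dx^2]/dt approximates sigma^2
sigma2 = np.mean(dx ** 2) / dt
```

The same moment-based regressions, applied componentwise with a sparse-learning penalty, extend to the vector drift and diffusion matrix described in the abstract.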
文摘This work addresses the multiscale optimization of the purification processes of antibody fragments. Chromatography decisions in the manufacturing processes are optimized, including the number of chromatography columns and their sizes, the number of cycles per batch, and the operational flow velocities. Data-driven models of chromatography throughput are developed considering loaded mass, flow velocity, and column bed height as the inputs, using manufacturing-scale simulated datasets based on microscale experimental data. The piecewise linear regression modeling method is adopted due to its simplicity and better prediction accuracy in comparison with other methods. Two alternative mixed-integer nonlinear programming (MINLP) models are proposed to minimize the total cost of goods per gram of the antibody purification process, incorporating the data-driven models. These MINLP models are then reformulated as mixed-integer linear programming (MILP) models using linearization techniques and multiparametric disaggregation. Two industrially relevant cases with different chromatography column size alternatives are investigated to demonstrate the applicability of the proposed models.
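One reason piecewise linear throughput models embed cleanly into MILP formulations is that, with a known breakpoint, the fit reduces to ordinary least squares on a hinge feature. The breakpoint, segment slopes, and data below are all invented for illustration.

```python
# Piecewise linear regression via a hinge feature max(load - b, 0):
# with breakpoint b fixed, the two-segment fit is a single least-squares solve.
import numpy as np

rng = np.random.default_rng(5)
load = rng.uniform(0, 10, 200)                          # loaded mass (arbitrary units)
true = np.where(load < 4, 2.0 * load, 8.0 + 0.5 * (load - 4))
y = true + rng.normal(0, 0.05, 200)                     # noisy "throughput"

b = 4.0                                                 # assumed breakpoint
A = np.column_stack([np.ones_like(load), load, np.maximum(load - b, 0.0)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
slope_left = coef[1]                                    # expect about 2.0
slope_right = coef[1] + coef[2]                         # expect about 0.5
```

In the optimization model, each linear segment contributes one linear constraint plus a binary selector, which is exactly the structure that the MINLP-to-MILP reformulation exploits.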
基金supported in part by the National Natural Science Foundation of China(NSFC)(92167106,61833014)Key Research and Development Program of Zhejiang Province(2022C01206)。
文摘The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. The data boosting augmentation involves designing the reliability-weight and actual-virtual-weight functions, and developing a double-weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
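The actual-virtual weighting idea can be illustrated with a deliberately simplified stand-in for the double-weighted PLS model: a weighted least-squares fit in which real samples carry full weight and generated (virtual) samples carry a reduced reliability weight. The weights, noise levels, and one-feature model below are assumptions for demonstration, not the paper's formulation.

```python
# Weighted fusion of real and virtual samples: down-weighting the noisier
# virtual data keeps them useful without letting them dominate the fit.
import numpy as np

rng = np.random.default_rng(7)
x_real = rng.uniform(0, 1, 30)
y_real = 3.0 * x_real + rng.normal(0, 0.05, 30)     # scarce, accurate real data
x_virt = rng.uniform(0, 1, 300)
y_virt = 3.0 * x_virt + rng.normal(0, 0.3, 300)     # plentiful, noisier virtual data

x = np.concatenate([x_real, x_virt])
y = np.concatenate([y_real, y_virt])
w = np.concatenate([np.ones(30), np.full(300, 0.2)])  # assumed reliability weights

slope = np.sum(w * x * y) / np.sum(w * x * x)       # weighted LS through the origin
```

The same principle, applied per-sample inside a PLS decomposition with separately designed reliability and actual-virtual weight functions, is what the abstract's double-weighted model formalizes.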
基金supported by the National Key Research and Development Project(No.2019YFB1405401)the National Natural Science Foundation of China(No.5217120056)。
文摘Using stochastic dynamic simulation for railway vehicle collisions still faces many challenges, such as high modelling complexity and time-consuming computation. To address these challenges, we introduce a novel data-driven stochastic process modelling (DSPM) approach into the dynamic simulation of railway vehicle collisions. This DSPM approach consists of two steps: (i) process description, in which four kinds of kernels are used to describe the uncertainty inherent in collision processes; and (ii) solving, in which stochastic variational inference and mini-batch algorithms are used to accelerate the computation of stochastic processes. By applying the DSPM, Gaussian process regression (GPR), and finite element (FE) methods to two collision scenarios (i.e. a lead car colliding with a rigid wall, and a lead car colliding with another lead car), we are able to achieve a comprehensive analysis. The comparison between the DSPM approach and the FE method reveals that the DSPM approach is capable of calculating the corresponding confidence interval while simultaneously improving the overall computational efficiency. Comparing the DSPM approach with the GPR method indicates that the DSPM approach has the ability to accurately describe the dynamic response under unknown conditions. Overall, this research demonstrates the feasibility and usability of the proposed DSPM approach for stochastic dynamics simulation of railway vehicle collisions.
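The confidence-interval capability mentioned above is the hallmark of Gaussian-process-style models, and can be shown with a minimal numpy GP regression. The RBF kernel, the sine target, and the noise level are illustrative stand-ins for the DSPM kernels and the collision-response data.

```python
# Minimal GP regression with a 95% confidence band, illustrating how
# kernel-based stochastic models quantify predictive uncertainty.
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(6)
xt = np.linspace(0, 5, 25)
yt = np.sin(xt) + rng.normal(0, 0.05, xt.size)       # noisy training responses
xs = np.linspace(0, 5, 101)                          # prediction points

noise = 0.05 ** 2
K = rbf(xt, xt) + noise * np.eye(xt.size)
Ks = rbf(xs, xt)
mean = Ks @ np.linalg.solve(K, yt)                   # posterior mean
cov = rbf(xs, xs) - Ks @ np.linalg.solve(K, Ks.T)    # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
upper, lower = mean + 1.96 * std, mean - 1.96 * std  # 95% confidence band
```

An FE run yields one deterministic trajectory per evaluation; a kernel model like this returns a mean prediction and a band in a single solve, which is the efficiency-plus-uncertainty trade the abstract highlights.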
文摘Recently, haze in China has become more and more serious, but it is very difficult to model and control. Here, a data-driven model is introduced for the simulation and monitoring of China's haze. First, a multi-dimensional evaluation system is built to evaluate government performance in controlling China's haze. Second, a data-driven model is employed to reveal the operating mechanism of China's haze, described as a multi-input, multi-output system. Third, a prototype system is set up to verify the proposed scheme, and the result provides us with a graphical tool for monitoring different haze control strategies.