For the composition analysis and identification of ancient glass products, L1 regularization, K-Means cluster analysis, the elbow rule and other methods were combined to build logistic regression, cluster analysis, hyper-parameter test and other models. SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, the sub-classification under different chemical compositions, and a test and rationality analysis of the hyper-parameter K. The research can provide theoretical support for the protection and restoration of ancient glass relics.
BACKGROUND: Radiation pneumonitis (RP) is a severe complication of thoracic radiotherapy that may lead to dyspnea and lung fibrosis, and negatively affects patients' quality of life. AIM: To carry out multiple regression analysis on the influencing factors of radiation pneumonitis. METHODS: Records of 234 patients who received chest radiotherapy in Huzhou Central Hospital (Huzhou, Zhejiang Province, China) from January 2018 to February 2021 were reviewed, and the patients were divided into a study group or a control group according to whether radiation pneumonitis was present. Among them, 93 patients with radiation pneumonitis were included in the study group and 141 without radiation pneumonitis were included in the control group. General characteristics and radiation and imaging examination data of the two groups were collected and compared. Because statistically significant differences were observed, multiple regression analysis was performed on age, tumor type, chemotherapy history, forced vital capacity (FVC), forced expiratory volume in the first second (FEV1), carbon monoxide diffusion volume (DLCO), FEV1/FVC ratio, planned target area (PTV), mean lung dose (MLD), total number of radiation fields, percentage of lung tissue in total lung volume (vdose), probability of normal tissue complications (NTCP), and other factors. RESULTS: The proportions of patients aged ≥60 years, with a diagnosis of lung cancer, and with a history of chemotherapy were higher in the study group than in the control group (P<0.05); FEV1, DLCO, and FEV1/FVC ratio in the study group were lower than those in the control group (P<0.05), while PTV, MLD, total field number, vdose, and NTCP were higher than in the control group (P<0.05). Logistic regression analysis showed that age, lung cancer diagnosis, chemotherapy history, FEV1, FEV1/FVC ratio, PTV, MLD, total number of radiation fields, vdose, and NTCP were risk factors for radiation pneumonitis. CONCLUSION: We have identified patient age, type of lung cancer, history of chemotherapy, lung function, and radiotherapy parameters as risk factors for radiation pneumonitis. Comprehensive evaluation and examination should be carried out before radiotherapy to effectively prevent radiation pneumonitis.
The rapid spread of the COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major public health concern. Modelling the trend of the COVID-19 pandemic statistically is an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate how to combine Fourier terms, which capture seasonality, with ARIMA errors, which capture other dynamics in the data. We analyzed a 156-week national-level COVID-19 dataset covering 2020 to 2023 using a Dynamic Harmonic Regression model, including simulation analysis and accuracy improvement. Most importantly, we provide new advanced pathways which may serve as targets for developing new solutions and approaches.
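The Fourier-term part of the dynamic harmonic regression described above can be illustrated with a minimal sketch. It only builds the sine/cosine regressors; fitting the regression with ARIMA errors is not shown. The function name `fourier_terms`, the 52-week period, and the choice of two harmonic pairs are assumptions made for illustration and are not taken from the paper.

```python
import math


def fourier_terms(n_obs, period, n_harmonics):
    """Build seasonal regressors for t = 1..n_obs.

    Each row holds [sin(2*pi*1*t/period), cos(2*pi*1*t/period),
                    sin(2*pi*2*t/period), cos(2*pi*2*t/period), ...].
    """
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        rows.append(row)
    return rows


# 156 weekly observations; the annual period of 52 weeks and the two
# harmonic pairs are assumed values, not ones reported in the abstract.
X = fourier_terms(156, 52, 2)
```

In a full dynamic harmonic regression these columns would enter as external regressors, with the number of harmonic pairs typically chosen by an information criterion.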
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we compared the models, which yielded prediction error rates of 22%, 23%, 20%, and 27% for LDA, QDA, logistic regression, and KNN, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
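As a small illustration of how a prediction error rate is read off a confusion matrix, the sketch below computes one minus the share of correctly classified cases. The counts are made up for the example and are not the study's data; `error_rate` is a hypothetical helper name.

```python
def error_rate(confusion):
    """Misclassification rate from a square confusion matrix.

    confusion[i][j] is the number of cases of true class i predicted as class j,
    so the diagonal holds the correctly classified cases.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total


# Made-up two-class counts, for illustration only.
example = [[40, 10],
           [10, 40]]
rate = error_rate(example)  # 80 of 100 cases are on the diagonal
```

Comparing several classifiers this way only makes sense when each confusion matrix comes from the same held-out test set.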
This paper studies the deterioration of bridge substructures utilizing the Long-Term Bridge Performance (LTBP) Program InfoBridge™ and develops a survival model using Cox proportional hazards regression. The survival analysis is based on the National Bridge Inventory (NBI) dataset. The study calculates the survival rate of reinforced and prestressed concrete piles on bridges under marine conditions over a 29-year span (from 1992 to 2020). The state of Maryland is the primary focus of this study, with data from three neighboring regions (the District of Columbia, Virginia, and Delaware) used to expand the sample size. The data obtained from the National Bridge Inventory are condensed and filtered to acquire the most relevant information for model development. The Cox proportional hazards regression is applied to the condensed NBI data with six parameters: age, ADT, ADTT, number of spans, span length, and structural length. Two survival models are generated for the bridge substructures: reinforced and prestressed concrete piles in Maryland, and reinforced and prestressed concrete piles in wet service conditions in the District of Columbia, Maryland, Delaware, and Virginia. Results from the Cox proportional hazards regression are used to construct Markov chains to demonstrate the sequence of the deterioration of bridge substructures. The Markov chains can be used as a tool to assist in the prediction and decision-making for repair, rehabilitation, and replacement of bridge piles. Based on the numerical model, the Pile Assessment Matrix Program (PAM) is developed to facilitate the assessment and maintenance of current bridge structures. The program integrates the NBI database with the inspection and research reports from the departments of transportation of various states, to serve as a tool for condition state simulation based on maintenance or rehabilitation strategies.
Because traffic flow information for lanes at intersections without detectors is difficult to obtain in most metropolises of the world, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. First, cluster analysis is used to cluster the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections with detectors. Then, based on the results of the cluster analysis, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested on lane traffic volume data from the road network of Nanjing city. The method resolves the problem of predicting the traffic volume of lanes at non-detector isolated signal-controlled intersections and can be widely used for urban traffic flow guidance and urban traffic control in cities where not enough intersections are equipped with detectors.
Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than the variable selection method based only on empirical knowledge and in most cases outperforms the baseline method. The results showed that on all three datasets the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables, which are 1.959, 3.718 and 2.181, respectively. The LASSO-PLSR model with empirical support and 20 selected variables exhibited a significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
[Objective] The research aimed to identify the factors that significantly influence population variation of the oriental fruit fly. [Method] Using stepwise regression analysis, the pattern of population variation of the oriental fruit fly in Jianshui County, Yunnan Province, and the meteorological factors that drive its occurrence were analyzed, and a regression model was built. The regression model was then tested against data from Jianshui County, Yunnan Province, for 2004-2006. [Result] The main meteorological factors that influenced the occurrence of the oriental fruit fly were relative humidity, the lowest monthly temperature and rainfall. [Conclusion] This study provides a reference for research on predicting the timing, quantity and occurrence peak of the oriental fruit fly.
BYD is one of the largest new energy vehicle companies in China. Analyzing its situation and the factors that affect its value helps to understand and identify development opportunities and potential problems. On the one hand, this paper makes a qualitative analysis of BYD, using the SWOT model to study the internal capability and external environment of BYD. On the other hand, a multiple regression model is used for quantitative analysis of BYD's enterprise value; the model is established based on three factors (enterprise fundamentals, investor behavior and psychology, and macroeconomic policy uncertainty), and stepwise regression is carried out. The results show that increases in institutional investors' shareholding ratio, in the investor sentiment index, and in the M2 growth rate will increase the overall enterprise value, while an increase in economic policy uncertainty will decrease the enterprise value.
The effect of pruning severity on tree growth was analyzed by change point detection using segmented regression. The present study applied this analysis to a well-known published data set including diameter growth response, tree age, pruning severity and pretreatment crown size. First, multiple regression analysis was performed to assess the effect of tree age, pruning severity and pretreatment crown size on diameter growth response. Next, segmented regression analysis was performed to assess the effect of pruning severity on diameter growth response. The results of the multiple regression showed that diameter growth response was significantly influenced by pruning severity and pretreatment crown size. The results of the segmented regression showed that in the whole data set, an abrupt change toward a decrease in diameter growth response was detected at 25% of the live crown removed. However, in the group of fully crowned, open-grown trees, diameter growth response continuously decreased with increasing pruning severity with no significant abrupt change, whereas in the group with 70%-90% live crown, diameter growth response did not significantly decrease up to the break point (53% crown removed) and then abruptly decreased. This may be the first study to show the numerical evaluation of the effect of pruning severity on tree growth by change point analysis.
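The idea of detecting a break point with segmented regression can be sketched as follows. This is a minimal illustration, not the estimation procedure used in the study: it assumes a single break point, a continuous piecewise-linear model y = a + b·x + c·max(0, x − ψ), and a plain grid search over candidate values of ψ. The data are synthetic and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_segmented(xs, ys, candidates):
    """Return (sse, break_point, [a, b, c]) for the candidate with the lowest SSE.

    Candidates must lie strictly inside the range of xs so the hinge column
    max(0, x - psi) is not collinear with the intercept and slope columns.
    """
    best = None
    for psi in candidates:
        rows = [[1.0, x, max(0.0, x - psi)] for x in xs]
        # Normal equations (A^T A) coef = A^T y for the three-column design.
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        coef = solve_linear(ata, aty)
        sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
                  for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, psi, coef)
    return best


# Synthetic data: flat response up to x = 25, then a decline of 0.5 per unit.
xs = [float(x) for x in range(0, 65, 5)]
ys = [10.0 - 0.5 * max(0.0, x - 25.0) for x in xs]
sse, break_point, coef = fit_segmented(xs, ys, [float(p) for p in range(5, 60, 5)])
```

A grid search keeps the sketch short; dedicated segmented-regression software usually estimates the break point iteratively and also reports its confidence interval.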
A multivariable regression analysis of the in-situ stress field, which considers the non-linear deformation behavior of faults in practical projects, is presented based on a newly developed three-dimensional displacement discontinuity method (DDM) program. The Barton-Bandis model and the Kulhawy model are adopted as the normal and the tangential deformation models of faults, respectively, with the Mohr-Coulomb failure criterion satisfied. In practical projects, the values of the mechanical parameters of rock and faults from in-situ tests are restricted to a bounded range, and the optimal mechanical parameters are obtained from this range by a loop. Compared with the traditional finite element method (FEM), the DDM regression results are more accurate.
Prediction of blast-induced ground vibration using scaled distance regression analysis has been one of the most popular methods employed by engineers for many decades. It uses the maximum charge per delay and the distance of monitoring as the major factors for predicting the peak particle velocity (PPV). It is established that the PPV is caused by the maximum charge per delay and varies with the distance of monitoring and site geology. During production blasting, the waves induced by blasting of different holes interfere with and superimpose on each other, which may result in a higher PPV than the value predicted with scaled distance regression analysis. This phenomenon of interference/superimposition of waves is not considered in scaled distance regression analysis. In this paper, an attempt has been made to compare the predicted values of blast-induced ground vibration from multi-hole trial blasting with those from single-hole blasting in an opencast coal mine under the same geological conditions. Further, the modified prediction equation for the multi-hole trial blasting was obtained using single-hole regression analysis. The error between predicted and actual values of multi-hole blast-induced ground vibration was found to be reduced by 8.5%.
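A scaled distance regression can be sketched as a log-log least-squares fit. The abstract does not state the functional form, so this sketch assumes the commonly used square-root scaling, PPV = K · (D / √Q)^(−b), where D is the monitoring distance and Q the maximum charge per delay. The function names and the data are hypothetical; the data are generated from known constants so the fit can be checked.

```python
import math


def fit_scaled_distance(distances, charges, ppvs):
    """Fit PPV = K * SD**(-b) with SD = D / sqrt(Q) by least squares in log space.

    Taking logs gives log(PPV) = log(K) - b * log(SD), a straight line.
    Returns (K, b).
    """
    xs = [math.log(d / math.sqrt(q)) for d, q in zip(distances, charges)]
    ys = [math.log(v) for v in ppvs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_ppv(k, b, distance, charge):
    """Predicted PPV at a given distance and maximum charge per delay."""
    return k * (distance / math.sqrt(charge)) ** (-b)


# Synthetic observations generated from assumed site constants K = 500, b = 1.5.
distances = [50.0, 100.0, 200.0, 400.0]   # distance of monitoring
charges = [25.0, 50.0, 100.0, 100.0]      # maximum charge per delay
ppvs = [predict_ppv(500.0, 1.5, d, q) for d, q in zip(distances, charges)]
k_fit, b_fit = fit_scaled_distance(distances, charges, ppvs)
```

Because the fit uses only the maximum charge per delay and distance, it cannot represent the superposition of waves from several holes, which is the limitation the abstract addresses.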
This paper presents an analysis to forecast the loads of an isolated area where the load history is not available or may not represent the realistic demand for electricity. The analysis is done through linear regression and is based on identifying the factors on which electrical load growth depends. To determine these factors, areas are selected whose load growth rate histories are known and whose load growth deciding factors are similar to those of the isolated area. The proposed analysis is applied to an isolated area of Bangladesh called Swandip, where no past history of electrical load demand is available and there is no possibility of connecting the area to the mainland grid system.
Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) from wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Diagonal connection structures are complex, and it is difficult to derive the discriminant of the airflow directions of their airways. To overcome these disadvantages, we have applied a multiple regression method to analyze the effect of the changing rules of mine airflows on the stability of a mine ventilation system. The amount of air (Qj) is determined for the major airway and an optimum regression equation was derived for Qi as a function of the independent variable (Ri), i.e., the ventilation resistance between different airways. Corresponding countermeasures are then proposed according to the changes in airflows. The calculated results agree very well with our practical situation, indicating that multiple regression analysis is simple, quick and practical and is therefore an effective method to analyze the stability of mine ventilation systems.
This study aims to extend the multivariate adaptive regression splines (MARS)-Monte Carlo simulation (MCS) method for reliability analysis of slopes in spatially variable soils. This approach is used to explore the influences of the multiscale spatial variability of soil properties on the probability of failure (P_f) of the slopes. In the proposed approach, the relationship between the factor of safety and the soil strength parameters characterized with spatial variability is approximated by the MARS, with the aid of Karhunen-Loeve expansion. MCS is subsequently performed on the established MARS model to evaluate P_f. Finally, a nominally homogeneous cohesive-frictional slope and a heterogeneous cohesive slope, which are both characterized with different spatial variabilities, are utilized to illustrate the proposed approach. Results showed that the proposed approach can estimate the P_f of the slopes efficiently in spatially variable soils with sufficient accuracy. Moreover, the approach is relatively robust to the influence of different statistics of soil properties, thereby making it an effective and practical tool for addressing slope reliability problems concerning time-consuming deterministic stability models with low levels of P_f. Furthermore, disregarding the multiscale spatial variability of soil properties can overestimate or underestimate the P_f. Although the difference is small in general, the multiscale spatial variability of the soil properties must still be considered in the reliability analysis of heterogeneous slopes, especially for those highly related to cost effective and accurate designs.
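The Monte Carlo step, evaluating P_f on an already fitted surrogate, can be sketched as counting the share of samples whose predicted factor of safety falls below 1. The sketch does not fit a MARS model or use a Karhunen-Loeve expansion: the surrogate below is a toy placeholder with a single random cohesion value, and every name and number in it is an assumption made for illustration.

```python
import random


def estimate_pf(surrogate, sample_inputs, n_samples, seed=0):
    """Monte Carlo estimate of the probability of failure.

    surrogate(x) returns a factor of safety; failure is counted when it is
    below 1. sample_inputs(rng) draws one random input for the surrogate.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        if surrogate(sample_inputs(rng)) < 1.0:
            failures += 1
    return failures / n_samples


def toy_surrogate(cohesion):
    # Placeholder for a fitted surrogate model: factor of safety
    # proportional to cohesion (hypothetical relationship).
    return cohesion / 10.0


def sample_cohesion(rng):
    # Hypothetical cohesion distribution: normal, mean 15, standard deviation 3.
    return rng.gauss(15.0, 3.0)


pf = estimate_pf(toy_surrogate, sample_cohesion, 20000)
```

The appeal of a surrogate here is cost: each call is cheap, so many samples can be drawn where a deterministic stability model would be too slow, which matters most when P_f is small.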
In the spectral analysis of laser-induced breakdown spectroscopy, abundant characteristic spectral lines and severe interference information exist simultaneously in the original spectral data. Here, a feature selection method called recursive feature elimination based on ridge regression (Ridge-RFE) is proposed for the original spectral data to make full use of the valid information in the spectra. In the Ridge-RFE method, the absolute value of the ridge regression coefficient is used as the criterion for screening spectral features: the feature with the smallest absolute weight in the input feature subset is removed by recursive feature elimination (RFE), and the selected features are used as inputs of the partial least squares regression (PLS) model. The Ridge-RFE-based PLS model was used to measure Fe, Si, Mg, Cu, Zn and Mn in 51 aluminum alloy samples, and the results showed that the root mean square error of prediction decreased greatly compared with the PLS model with the full spectrum as input. The overall results demonstrate that the Ridge-RFE method is more efficient at eliminating redundant features, enabling the PLS model to give better quantitative analysis results and improving the generalization ability of the model.
A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in the day-ahead electricity market. PCA was applied to extract the main influences on day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other historical loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to the back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.
Several parameters, such as assimilable organic carbon (AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 μg/L, the biological stability of drinking water can be controlled. Both chloramine residual and AOC are correlated with heterotrophic plate counts (HPC) (log value), with correlation coefficients of -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented, and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and that AOC is a suitable index to assess the biological stability of drinking water.
With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR applicable to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
Funding: This research received funding from the Maryland Department of Transportation State Highway Administration.
Abstract: This paper studies the deterioration of bridge substructures using the Long-Term Bridge Performance (LTBP) Program InfoBridge™ and develops a survival model using Cox proportional hazards regression. The survival analysis is based on the National Bridge Inventory (NBI) dataset. The study calculates the survival rate of reinforced and prestressed concrete piles on bridges under marine conditions over a 29-year span (from 1992 to 2020). The state of Maryland is the primary focus of this study, with data from three neighboring regions (the District of Columbia, Virginia, and Delaware) used to expand the sample size. The data obtained from the National Bridge Inventory are condensed and filtered to acquire the most relevant information for model development. The Cox proportional hazards regression is applied to the condensed NBI data with six parameters: age, ADT, ADTT, number of spans, span length, and structural length. Two survival models are generated for the bridge substructures: reinforced and prestressed concrete piles in Maryland, and reinforced and prestressed concrete piles in wet service conditions in the District of Columbia, Maryland, Delaware, and Virginia. Results from the Cox proportional hazards regression are used to construct Markov chains to demonstrate the sequence of the deterioration of bridge substructures. The Markov chains can be used as a tool to assist in prediction and decision-making for the repair, rehabilitation, and replacement of bridge piles. Based on the numerical model, the Pile Assessment Matrix Program (PAM) is developed to facilitate the assessment and maintenance of current bridge structures. The program integrates the NBI database with inspection and research reports from various state departments of transportation, to serve as a tool for condition state simulation based on maintenance or rehabilitation strategies.
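The Markov-chain step described in this abstract can be illustrated with a small Python sketch that propagates a condition-state distribution through a yearly transition matrix. The three states and every probability below are hypothetical placeholders; the abstract derives its transition behaviour from the Cox regression results, which are not reproduced here.

```python
def step(distribution, transition):
    """Advance a condition-state distribution by one period: new = old x transition."""
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]

def project(distribution, transition, years):
    """Apply the one-period step repeatedly."""
    for _ in range(years):
        distribution = step(distribution, transition)
    return distribution

# Hypothetical yearly transition matrix (good, fair, poor); each row sums to 1.
P = [
    [0.90, 0.10, 0.00],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],  # "poor" is absorbing until a repair resets the state
]
start = [1.0, 0.0, 0.0]  # all piles start in "good" condition
after_10 = project(start, P, 10)
```

A repair or rehabilitation strategy could be modelled in the same way by swapping in a different transition matrix for the years in which the intervention applies.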
Funding: The National Natural Science Foundation of China (No. 50378016).
Abstract: Because traffic flow information for lanes at non-detector intersections is difficult to obtain in most metropolises of the world, cluster analysis and stepwise regression are integrated, based on the relationships between the lanes of signal-controlled intersections, to predict the traffic volume of lanes at non-detector isolated controlled intersections. First, cluster analysis is used to cluster the lanes of non-detector isolated signal-controlled intersections together with the lanes of all signal-controlled intersections with detectors. Then, based on the results of the cluster analysis, traffic volume samples are selected randomly and stepwise regression is used to predict the traffic volume of lanes at non-detector isolated signal-controlled intersections. The method is tested on traffic volume data for lanes in the road network of Nanjing city. The problem of predicting the traffic volume of lanes at non-detector isolated signal-controlled intersections is resolved, and the method can be widely used in urban traffic flow guidance and urban traffic control in cities without enough intersections equipped with detectors.
Funding: Financial support from the National Natural Science Foundation of China (No. 62205172), the Huaneng Group Science and Technology Research Project (No. HNKJ22-H105), the Tsinghua University Initiative Scientific Research Program, and the International Joint Mission on Climate Change and Carbon Neutrality.
Abstract: Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast amount of spectral information in LIBS contains signal uncertainty, which can affect its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient, mutual information, the least absolute shrinkage and selection operator (LASSO) and random forest, and then filtered and combined with empirical variables related to fingerprint elements of coal ash content. Subsequently, these variables are fed into a partial least squares regression (PLSR). Additionally, in some models, certain variables unrelated to ash content are removed manually to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It is significantly better than variable selection based on empirical knowledge alone and in most cases outperforms the baseline method. The results showed that, on all three datasets, the hybrid strategy for variable selection combining empirical knowledge and data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, which were significantly lower than those obtained from multiple linear regression using only 12 empirical variables (1.959, 3.718 and 2.181, respectively). The LASSO-PLSR model with empirical support and 20 selected variables exhibited significantly improved performance after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. Such results demonstrate that using empirical knowledge as a support for data-driven variable selection can be a viable approach to improve the accuracy and reliability of LIBS quantification.
Funding: Supported by the National Key Technology R&D Program in the 11th Five-Year Plan of China (2006BAD10A14).
Abstract: [Objective] The research aimed to study the factors that significantly influence population variations of the oriental fruit fly. [Method] Using stepwise regression analysis, the pattern of population variation of the oriental fruit fly in Jianshui County of Yunnan Province and the meteorological factors that caused its occurrence were analyzed, and a regression model was built. Finally, the regression model was tested on data from Jianshui County of Yunnan Province for 2004-2006. [Result] The main meteorological factors that influenced the occurrence of the oriental fruit fly were relative humidity, the lowest monthly temperature and rainfall. [Conclusion] This study provides a reference for research on predicting the time, quantity and occurrence peak of the oriental fruit fly.
Abstract: BYD is one of the largest new energy vehicle companies in China. Analyzing its situation and the factors that affect its value helps to understand and identify development opportunities and potential problems. On the one hand, this paper makes a qualitative analysis of BYD, using the SWOT model to study the internal capability and external environment of BYD. On the other hand, a multiple regression model is used for quantitative analysis of BYD's enterprise value; the model is established based on three factors (enterprise fundamentals, investor behavior and psychology, and macroeconomic policy uncertainty), and stepwise regression is carried out. The results show that increases in institutional investors' shareholding ratio, the investor sentiment index, and the M2 growth rate increase the overall enterprise value, while an increase in economic policy uncertainty decreases the enterprise value.
Abstract: The effect of pruning severity on tree growth was analyzed by change point detection using segmented regression. The present study applied this analysis to a well-known published data set including diameter growth response, tree age, pruning severity and pretreatment crown size. First, multiple regression analysis was performed to assess the effect of tree age, pruning severity and pretreatment crown size on diameter growth response. Next, segmented regression analysis was performed to assess the effect of pruning severity on diameter growth response. The results of the multiple regression showed that diameter growth response was significantly influenced by pruning severity and pretreatment crown size. The results of the segmented regression showed that, in the whole data set, an abrupt change toward a decrease in diameter growth response was detected at 25% of the live crown removed. However, in the group of fully crowned, open-grown trees, diameter growth response decreased continuously with increasing pruning severity with no significant abrupt change, whereas in the group with 70%-90% live crown, diameter growth response did not decrease significantly up to the break point (53% of the crown removed) and then decreased abruptly. This may be the first study to show a numerical evaluation of the effect of pruning severity on tree growth by change point analysis.
Funding: Financially supported by the Western Transport Technical Project of the Ministry of Transport, China (No. 2009318000046).
Abstract: A multivariable regression analysis of the in-situ stress field, which considers the non-linear deformation behavior of faults in practical projects, is presented based on a newly developed three-dimensional displacement discontinuity method (DDM) program. The Barton-Bandis model and the Kulhaway model are adopted as the normal and the tangential deformation models of faults, respectively, where the Mohr-Coulomb failure criterion is satisfied. In practical projects, the values of the mechanical parameters of rock and faults are restricted to a bounded range based on in-situ tests, and the optimal mechanical parameters are obtained from this range by a loop. Compared with the traditional finite element method (FEM), the DDM regression results are more accurate.
Abstract: Prediction of blast-induced ground vibration using scaled distance regression analysis has been one of the most popular methods employed by engineers for many decades. It uses the maximum charge per delay and the distance of monitoring as the major factors for predicting the peak particle velocity (PPV). It is established that the PPV is caused by the maximum charge per delay, which varies with the distance of monitoring and site geology. During production blasting, the waves induced by blasting of different holes interfere destructively with each other, which may result in a higher PPV than the value predicted with scaled distance regression analysis. This phenomenon of interference/superimposition of waves is not considered in scaled distance regression analysis. In this paper, an attempt has been made to compare the predicted values of blast-induced ground vibration using multi-hole trial blasting with single-hole blasting in an opencast coal mine under the same geological conditions. Further, the modified prediction equation for the multi-hole trial blasting was obtained using single-hole regression analysis. The error between predicted and actual values of multi-hole blast-induced ground vibration was found to be reduced by 8.5%.
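For readers unfamiliar with scaled distance regression, the following Python sketch fits the commonly used power law PPV = K * (D / sqrt(Q))^(-b) by least squares in log-log space. The square-root scaling form, the function names, and the synthetic data are assumptions for illustration; the abstract does not give its exact equation or measurements.

```python
import math

def fit_scaled_distance(distances_m, charges_kg, ppv_mm_s):
    """Fit log10(PPV) = log10(K) - b * log10(D / sqrt(Q)) by ordinary least squares."""
    xs = [math.log10(d / math.sqrt(q)) for d, q in zip(distances_m, charges_kg)]
    ys = [math.log10(v) for v in ppv_mm_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, -slope  # site constants K and b

def predict_ppv(k, b, distance_m, charge_kg):
    """Predict PPV from distance and maximum charge per delay."""
    return k * (distance_m / math.sqrt(charge_kg)) ** (-b)

# Synthetic, noise-free data generated from assumed constants K = 500, b = 1.5.
distances = [50.0, 100.0, 200.0, 400.0]
charges = [25.0, 50.0, 100.0, 50.0]
ppvs = [predict_ppv(500.0, 1.5, d, q) for d, q in zip(distances, charges)]
k_hat, b_hat = fit_scaled_distance(distances, charges, ppvs)
```

With real monitoring data the fit would not be exact, and the residuals give a first indication of effects such as wave superposition that the scaled distance form does not capture.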
Abstract: This paper presents an analysis to forecast the loads of an isolated area where the load history is not available or may not represent the realistic demand for electricity. The analysis is done through linear regression and is based on identifying the factors on which electrical load growth depends. To determine these factors, areas are selected whose load growth rate histories are known and whose load growth deciding factors are similar to those of the isolated area. The proposed analysis is applied to an isolated area of Bangladesh called Swandip, where no past history of electrical load demand is available and there is no possibility of connecting the area to the mainland grid system.
Funding: Supported by the projects under the Innovation Team of the Safety Standards and Testing Technology for Agricultural Products of Zhejiang Province, China (Grant No. 2010R50028), and the National Key Technologies R&D Program of China during the 11th Five-Year Plan Period (Grant No. 2006BAK02A18).
Abstract: Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate between transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in the spectral ranges of 4 000-8 000 cm⁻¹ and 4 000-10 000 cm⁻¹ were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000-10 000 cm⁻¹, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Funding: Project F010206 supported by the National Natural Science Foundation of China.
Abstract: Diagonal connection structures are complex, and it is difficult to derive the discriminant of the airflow directions of their airways. To overcome these disadvantages, we have applied a multiple regression method to analyze the effect of changes in mine airflow patterns on the stability of a mine ventilation system. The amount of air (Qj) is determined for the major airway, and an optimum regression equation was derived for Qi as a function of the independent variable (Ri), i.e., the ventilation resistance between different airways. Corresponding countermeasures are then proposed according to the changes in airflows. The calculated results agree very well with our practical situation, indicating that multiple regression analysis is simple, quick and practical and is therefore an effective method to analyze the stability of mine ventilation systems.
Funding: Supported by The Hong Kong Polytechnic University through the project RU3Y, the Research Grant Council through the project PolyU 5128/13E, the National Natural Science Foundation of China (Grant No. 51778313), and the Cooperative Innovation Center of Engineering Construction and Safety in Shandong Blue Economic Zone.
Abstract: This study aims to extend the multivariate adaptive regression splines (MARS)-Monte Carlo simulation (MCS) method for reliability analysis of slopes in spatially variable soils. This approach is used to explore the influences of the multiscale spatial variability of soil properties on the probability of failure (P_f) of the slopes. In the proposed approach, the relationship between the factor of safety and the soil strength parameters characterized by spatial variability is approximated by MARS, with the aid of the Karhunen-Loeve expansion. MCS is subsequently performed on the established MARS model to evaluate P_f. Finally, a nominally homogeneous cohesive-frictional slope and a heterogeneous cohesive slope, which are both characterized by different spatial variabilities, are used to illustrate the proposed approach. Results showed that the proposed approach can estimate the P_f of slopes in spatially variable soils efficiently and with sufficient accuracy. Moreover, the approach is relatively robust to the influence of different statistics of soil properties, making it an effective and practical tool for addressing slope reliability problems that involve time-consuming deterministic stability models and low levels of P_f. Furthermore, disregarding the multiscale spatial variability of soil properties can overestimate or underestimate P_f. Although the difference is small in general, the multiscale spatial variability of the soil properties must still be considered in the reliability analysis of heterogeneous slopes, especially where cost-effective and accurate designs are required.
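The surrogate-plus-Monte-Carlo workflow in this abstract can be sketched in Python as follows: random soil parameters are drawn, a cheap surrogate returns a factor of safety, and the fraction of samples with a factor of safety below 1 estimates P_f. The surrogate here is a made-up linear function standing in for a fitted MARS model, and the input distributions are hypothetical; spatial variability and the Karhunen-Loeve expansion are not modelled.

```python
import math
import random

def estimate_pf(surrogate, sample_inputs, n_samples, seed=0):
    """Monte Carlo estimate of the probability of failure: share of samples with FS < 1."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    failures = 0
    for _ in range(n_samples):
        if surrogate(sample_inputs(rng)) < 1.0:
            failures += 1
    return failures / n_samples

def sample_strength(rng):
    """Hypothetical inputs: lognormal cohesion (kPa) and normal friction angle (degrees)."""
    cohesion = rng.lognormvariate(math.log(10.0), 0.3)
    friction_deg = rng.gauss(30.0, 3.0)
    return cohesion, friction_deg

def toy_surrogate(params):
    """Placeholder for a fitted surrogate model; NOT a real slope stability formula."""
    cohesion, friction_deg = params
    return 0.05 * cohesion + 0.03 * friction_deg

pf = estimate_pf(toy_surrogate, sample_strength, n_samples=20000)
```

Because the surrogate is cheap to evaluate, a large `n_samples` is affordable, which matters when P_f is small and failures are rare among the samples.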
Funding: Supported by the National Key Research and Development Program of China (No. 2016YFF0102502), the Key Research Program of Frontier Sciences, CAS (No. QYZDJ-SSW-JSC037), the Youth Innovation Promotion Association, CAS, and the Liao Ning Revitalization Talents Program (No. XLYC1807110).
Abstract: In the spectral analysis of laser-induced breakdown spectroscopy, abundant characteristic spectral lines and severe interference information exist simultaneously in the original spectral data. Here, a feature selection method called recursive feature elimination based on ridge regression (Ridge-RFE) is recommended for the original spectral data to make full use of the valid information in the spectra. In the Ridge-RFE method, the absolute value of the ridge regression coefficient was used as the criterion to screen spectral features, the feature with the smallest absolute weight in the input feature subset was removed by recursive feature elimination (RFE), and the selected features were used as inputs of the partial least squares regression (PLS) model. The Ridge-RFE based PLS model was used to measure Fe, Si, Mg, Cu, Zn and Mn in 51 aluminum alloy samples, and the results showed that the root mean square error of prediction decreased greatly compared to the PLS model with the full spectrum as input. The overall results demonstrate that the Ridge-RFE method eliminates redundant features more efficiently, enables the PLS model to deliver better quantitative analysis results, and improves model generalization ability.
Funding: Project (70671039) supported by the National Natural Science Foundation of China.
Abstract: A combined model based on principal components analysis (PCA) and a generalized regression neural network (GRNN) was adopted to forecast electricity price in the day-ahead electricity market. PCA was applied to extract the main influences on the day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other historical loads and prices, weather and temperature; GRNN was then employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to a back-propagation (BP) neural network and a standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.
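The comparison metric named in this abstract, mean absolute percentage error (MAPE), is straightforward to compute. The Python sketch below uses the standard definition; the function name and the example prices are illustrative only and are not taken from the PJM case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Illustrative hourly prices: each forecast is off by 10% of the actual value.
example_error = mape([100.0, 200.0], [110.0, 180.0])
```

Because MAPE divides by the actual value, it becomes unstable for prices near zero, which is worth keeping in mind for electricity markets where such prices can occur.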
Funding: The National High Tech Research and Development Program (863) of China (No. 2002AA601140) and the National Natural Science Foundation of China (No. 50238020).
Abstract: Several parameters, such as assimilable organic carbon (AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when the chloramine residual is more than 0.3 mg/L or the AOC content is below 50 μg/L, the biological stability of drinking water can be controlled. Both chloramine residual and AOC are correlated with the log value of Heterotrophic Plate Counts (HPC), with correlation coefficients of -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented, and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and that AOC is a suitable index for assessing the biological stability of drinking water.
Funding: Funded by the National Natural Science Foundation of China (81202283, 81473070, 81373102 and 81202267), the Key Grant of Natural Science Foundation of the Jiangsu Higher Education Institutions of China (10KJA330034 and 11KJA330001), the Research Fund for the Doctoral Program of Higher Education of China (20113234110002), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
Abstract: With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie human complex diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and that both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
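The Bonferroni correction mentioned for single-locus LR can be shown in a few lines of Python: with m tests and family-wise level alpha, each test is compared against alpha / m. The function name, the default alpha of 0.05, and the example p-values are illustrative assumptions, not values from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant if its p-value is at most alpha divided by the number of tests."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)  # per-test significance level
    return [p <= threshold for p in p_values]

# Three hypothetical single-SNP p-values; the per-test threshold is 0.05 / 3.
flags = bonferroni_significant([0.001, 0.02, 0.04])
```

The per-test threshold shrinks as the number of SNPs grows, which explains why the corrected single-locus approach tends to be conservative when many correlated SNPs are tested in one region.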