Accurately estimating blasting vibration during rock blasting is the foundation of blasting vibration management. In this study, Tuna Swarm Optimization (TSO), Whale Optimization Algorithm (WOA), and Cuckoo Search (CS) were used to optimize two hyperparameters in support vector regression (SVR). Based on these methods, three hybrid models to predict peak particle velocity (PPV) for bench blasting were developed. Eighty-eight samples were collected to establish the PPV database; eight initial blasting parameters were chosen as input parameters for the prediction model, and the PPV was the output parameter. The coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and a10-index were selected as predictive performance evaluation indicators. The normalized mutual information value was then used to evaluate the impact of the input parameters on the PPV prediction outcomes. According to the findings, TSO, WOA, and CS all enhance the predictive performance of the SVR model, and the TSO-SVR model provides the most accurate predictions. The optimized hybrid SVR models outperform the unoptimized traditional prediction model. The maximum charge per delay has the greatest impact on the predicted PPV.
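The four evaluation indicators named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define the a10-index, so the sketch assumes the commonly used definition (the share of samples whose predicted-to-measured ratio lies between 0.9 and 1.1), and the function name and PPV values are hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, MAE and a10-index for paired measured/predicted values."""
    n = len(measured)
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(y - p) for y, p in zip(measured, predicted)) / n
    # Assumed definition: fraction of samples with predicted/measured in [0.9, 1.1]
    a10 = sum(1 for y, p in zip(measured, predicted) if 0.9 <= p / y <= 1.1) / n
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "a10": a10}


# Hypothetical PPV values, for illustration only (not data from the study)
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.1, 3.5, 6.0, 9.0]
metrics = regression_metrics(measured, predicted)
```

Higher R² and a10-index and lower RMSE and MAE indicate a better fit, which is how the three hybrid models would be ranked against the unoptimized SVR.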
The global pandemic of coronavirus disease 2019 (COVID-19) has significantly affected tourism, especially in Spain, which was among the first countries to be affected by the pandemic and is among the world's biggest tourist destinations. Stock market values are responding to the evolution of the pandemic, especially in the case of tourism companies. Being able to quantify this relationship therefore allows us to predict the effect of the pandemic on shares in the tourism sector, thereby improving the response to the crisis by policymakers and investors. Accordingly, a dynamic regression model was developed to predict the behavior of shares in the Spanish tourism sector according to the evolution of the COVID-19 pandemic in the medium term. It has been confirmed that both the number of deaths and the number of cases are good predictors of abnormal stock prices in the tourism sector.
The rapidly spreading COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major public health concern. Modelling the trend of the COVID-19 pandemic statistically remains an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate combining Fourier terms for capturing seasonality with ARIMA errors and other dynamics in the data. We analyzed a 156-week national-level COVID-19 dataset, covering 2020 to 2023, using a Dynamic Harmonic Regression model, including simulation analysis and accuracy improvement. Most importantly, we provide new advanced pathways which may serve as targets for developing new solutions and approaches.
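The Fourier terms used in a dynamic harmonic regression can be sketched as below. This is only an illustration of the regressor construction; the seasonal period of 52 weeks and the number of harmonics K = 2 are assumptions not stated in the abstract, and the ARIMA error model that would be fitted on top of these regressors is not shown.

```python
import math


def fourier_terms(t, period, K):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..K."""
    row = []
    for k in range(1, K + 1):
        angle = 2.0 * math.pi * k * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


# Assumed setup: 156 weekly observations, yearly period of 52 weeks, K = 2 harmonics
design = [fourier_terms(t, period=52, K=2) for t in range(156)]
```

Each row of `design` holds the seasonal regressors for one week. In a dynamic harmonic regression these columns enter as exogenous variables, while the remaining short-term dynamics are left to the ARIMA error term.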
The safety factor is a crucial quantitative index for evaluating slope stability. However, traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer, completing the stacked learning model and improving prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of these six regression models and to correct the over- and underfitting problems of the single regression models, further improving the prediction accuracy. The mean square error (MSE) between the predicted and true values and the fit to the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression model (MSE = 0.03917). Therefore, the former has a higher prediction accuracy and better data fitting. This study applies the sparrow search algorithm to the prediction of the slope safety factor and shows its advantages over traditional methods. The proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy and offers a fresh approach to handling intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research is relevant to the modernization and digitalization of slope safety assessments.
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing responses are suggested. Under some regularity conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameters can be constructed easily.
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we compared the models, which yielded prediction error rates of 22%, 23%, 20%, and 27% for LDA, QDA, logistic regression, and KNN, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
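The error percentages quoted above are derived from confusion matrices. A minimal sketch of that calculation follows; the two-class matrix is hypothetical and is not taken from the study, whose class definitions the abstract does not give.

```python
def error_rate(confusion):
    """Misclassification rate: off-diagonal count divided by total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical confusion matrix (rows: true class, columns: predicted class)
confusion = [
    [40, 10],
    [10, 40],
]
rate = error_rate(confusion)
```

With this made-up matrix, 20 of 100 cases fall off the diagonal, giving an error rate of the same order as the 20% reported for logistic regression.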
Mixture of Experts (MoE) regression models are widely studied in statistics and machine learning for modeling heterogeneity in data for regression, clustering, and classification. The Laplace distribution is one of the most important statistical tools for analyzing thick-tailed data. Laplace Mixture of Linear Experts (LMoLE) regression models are based on the Laplace distribution, which is more robust. Similar to modelling the variance parameter in a homogeneous population, in this paper we propose and study a novel class of models, heteroscedastic Laplace mixture of experts regression models, to analyze heteroscedastic data coming from a heterogeneous population. The issues of maximum likelihood estimation are addressed. In particular, a Minorization-Maximization (MM) algorithm for estimating the regression parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo simulations. Results from the analysis of two real data sets are presented.
Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model is still difficult. In this paper, a genetic algorithm (GA) has been employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for predicting uniaxial compressive strength (UCS) and modulus of elasticity (E). First, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which root mean squared error (RMSE) was defined as the fitness function. Results have shown that the GA models are more precise than the MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and the index properties of rocks with a simple formulation and acceptable accuracy.
In this paper, we study some robustness aspects of linear regression models in the presence of outliers or discordant observations, considering the use of stable distributions for the response in place of the usual normality assumption. It is well known that, in general, there is no closed form for the probability density function of stable distributions. However, under a Bayesian approach, the use of a latent or auxiliary random variable simplifies obtaining any posterior distribution related to stable distributions. To show the usefulness of the computational aspects, the methodology is applied to two examples: one is related to a standard linear regression model with an explanatory variable and the other is related to a simulated data set assuming a 2³ factorial experiment. Posterior summaries of interest are obtained using MCMC (Markov Chain Monte Carlo) methods and the OpenBugs software.
Nitrate nitrogen (NO₃⁻-N) from agricultural activities and industrial wastewater has become the main source of groundwater pollution, which has raised widespread concerns, particularly in arid and semi-arid river basins with little water that meets relevant standards. This study aimed to investigate the performance of spatial and non-spatial regression models in modeling nitrate pollution in a semi-intensive farming region of Iran. To model the groundwater NO₃⁻-N concentration, both natural and anthropogenic factors affecting groundwater NO₃⁻-N were selected. The results of Moran's I test showed that groundwater nitrate concentration had a significant spatial dependence on the density of wells, distance from streams, total annual precipitation, and distance from roads in the study area. This study provided a way to estimate nitrate pollution using both natural and anthropogenic factors in arid and semi-arid areas where only a few factors are available. Spatial regression methods with spatial correlation structures are effective tools to support spatial decision-making in water pollution control.
A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of the least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
In this paper, by using some inequalities for negatively orthant dependent (NOD, in short) random variables and the truncation method for random variables, we investigate the nonparametric regression model. The complete consistency result for the estimator of g(x) is presented.
Rudraprayag, in the Garhwal Himalayan division, is one of the districts in India most vulnerable to landslides. Heavy rainfall, steep slopes, and developmental activities are important factors in the occurrence of landslides in the district. Therefore, specific assessment of landslide susceptibility and its accuracy at the regional level is essential for disaster management and proper land use planning. The article evaluates the effectiveness of frequency ratio, fuzzy logic, and logistic regression models for assessing landslide susceptibility in Rudraprayag district of Uttarakhand state, India. A landslide inventory map was prepared and verified by field data. Fourteen landslide parameters and the generated inventory map were utilized to prepare landslide susceptibility maps through the frequency ratio, fuzzy logic, and logistic regression models. The landslide susceptibility maps generated through these models were classified into very high, high, medium, low, and very low categories using natural breaks classification. The receiver operating characteristic (ROC) curve, the spatially agreed area approach, and the seed cell area index (SCAI) method were used to validate the landslide models. Validation results revealed that the fuzzy logic model was the most effective in assessing landslide susceptibility in the study area. The landslide susceptibility map generated through the fuzzy logic model can be best utilized for landslide disaster management and effective land use planning.
In this article, to improve the doubly robust estimator, nonlinear regression models with missing responses are studied. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to the usual augmented inverse probability weighted estimators.
A number of statistical tests are proposed for change-point detection in a general nonparametric regression model under mild conditions. New proofs are given for the weak convergence of the underlying processes; these remove the stringent condition of bounded total variation of the regression function and need only second moments. Since many quantities, such as the regression function, the distribution of the covariates, and the distribution of the errors, are unspecified, the results are not distribution-free. A weighted bootstrap approach is proposed to approximate the limiting distributions. Results of a simulation study show good performance for moderate sample sizes.
Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to analyze findings in the sub-healthy population completely and accurately. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health studies. Methods: The sample was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model, and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate excessive zero counts. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) comparing the ZINB model and the ZIP model; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike information criterion (335 112), and the smallest Bayesian information criterion (335 455), indicating the best goodness of fit. The predictive probabilities for most counts in the ZINB model fitted the observed counts best. The logit section of the ZINB model analysis showed that age, sex, occupation, smoking, alcohol drinking, ethnicity, and obesity were determinants of the presence of sub-health symptoms; the negative binomial section of the ZINB model analysis showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status, and obesity had a significant effect on the severity of sub-health. Conclusions: All tests for goodness of fit and the predictive probability curve produced the same finding: the ZINB model was the optimum model for exploring the influencing factors of sub-health symptoms.
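The information criteria reported for the ZINB model follow from the log likelihood through the standard formulas AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The sketch below checks that the reported figures are mutually consistent. The parameter count k = 37 is not stated in the abstract; it is an assumption back-calculated here from the reported AIC, and the reported log likelihood is rounded.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*lnL."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*lnL."""
    return k * math.log(n) - 2 * log_lik


log_lik = -167519  # reported ZINB log likelihood (rounded in the abstract)
n = 78307          # reported number of respondents
k = 37             # assumption: inferred from the reported AIC, not stated in the abstract

zinb_aic = aic(log_lik, k)
zinb_bic = bic(log_lik, k, n)
```

Under that assumed k, the formulas give the reported AIC of 335 112 and, after rounding, the reported BIC of 335 455. For both criteria, smaller values indicate a better trade-off between fit and model complexity.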
The desired economics of hard rock surface mining is mainly determined by the process design parameters that minimize the overall cost per tonne of rock mined in drilling, blasting, handling, and primary crushing under given rockmass conditions. The most effective process design parameters could be established from regression models of the cumulative influence of rockmass and mine design parameters on the overall cost per tonne of rock drilled, blasted, handled, and crushed. These models could be developed from the large volume of data accumulated worldwide on the costs per tonne of hard rock surface mining in drilling, blasting, handling, and primary crushing versus the parameters of rockmass and mine design. This paper covers only the development of regression models for oversize generation, blasthole productivity, and blasting cost for iron ore surface mines, for which data are available. The SPSS standard statistical correlation-regression analysis software was used in the analysis. Interpretation of the models generated shows that the individual effects of the determinant rockmass and blast design parameters on oversize generation, blasthole productivity, and blasting cost all agree with the findings of other researchers and with the theory of explosive rock fragmentation, and that the models could be used to estimate oversize generation, blasthole productivity, and blasting cost in rockmass and blast design conditions similar to those of the iron ore surface mines examined in this study. However, the regression models obtained here could not be used alone for the optimization of blast design, because most of the determinant parameters also have conflicting effects on the other processes of drilling, handling, and primary crushing of the blasted rock. Also, the quality and content of the regression models could be enhanced further by increasing the number of rockmass and blast design parameters and the volume of data considered in the regression analysis.
We propose a subsampling method for robust estimation of regression models which is built on classical methods such as the least squares method. It makes use of the non-robust nature of the underlying classical method to find a good sample from regression data contaminated with outliers, and then applies the classical method to the good sample to produce robust estimates of the regression model parameters. The subsampling method is a computational method rooted in the bootstrap methodology, which trades analytical treatment for intensive computation; it finds the good sample through repeated fitting of the regression model to many random subsamples of the contaminated data instead of through an analytical treatment of the outliers. The subsampling method can be applied to all regression models for which non-robust classical methods are available. In the present paper, we focus on the basic formulation and robustness property of the subsampling method that are valid for all regression models. We also discuss variations of the method and apply it to three examples involving three different regression models.
A changepoint in statistical applications refers to an observational time point at which the structural pattern changes during a long-term experimentation process. In many cases, the changepoint time and cause are documented, and it is reasonably straightforward to statistically adjust (homogenize) the series for the effects of the changepoint. However, many changepoint times are undocumented, and the changepoint times themselves are the main purpose of study. In this article, changepoint analysis in two-phase linear regression models is developed and discussed. Following the idea of Liu and Qian (2010) for segmented linear regression models, a modified empirical likelihood ratio statistic is proposed to test whether a changepoint exists during the long-term experiment and observation. The modified empirical likelihood ratio statistic is computation-friendly, and its p-value can be easily approximated based on the large-sample properties. The procedure is applied to the Old Faithful geyser eruption data from October 1980.
Internal solitary wave propagation over a submarine ridge results in energy dissipation, in which the hydrodynamic interaction between the wave and the ridge affects the marine environment. This study analyzes the effects of ridge height and potential energy during wave-ridge interaction with binary and cumulative logistic regression models. In testing the global null hypothesis, all three statistics (Likelihood Ratio, Score, and Wald) give p<0.001. Comparing the two kinds of models, the test values obtained by the cumulative logistic regression models are better than those obtained by the binary logistic regression models. Although this study employed a cumulative logistic regression model, three probability functions, p^1, p^2 and p^3, are utilized for investigating the weighted influence of factors on wave reflection. Deviance and Pearson tests are applied to check the goodness-of-fit of the proposed model. The analytical results demonstrated that both ridge height (X1) and potential energy (X2) significantly impact (p<0.0001) the amplitude-based reflected rate; the P-values for the deviance and Pearson tests are both >0.05 (0.2839 and 0.3438, respectively). That is, the goodness-of-fit between ridge height (X1) and potential energy (X2) can further predict parameters under the scenario of the best parsimonious model. Investigation of six measures of predictive power (R², Max-rescaled R², Somers' D, Gamma, Tau-a, and c) indicates that the proposed model has better predictive ability than ridge height alone and is very similar to the model with the interaction of ridge height and potential energy. It can be concluded that the goodness-of-fit and prediction ability of the cumulative logistic regression model are better than those of the binary logistic regression model.
Funding (PPV blasting vibration study): This work was financially supported by the National Natural Science Foundation of China (Grant No. 42072309), the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) (Grant No. CUGDCJJ202217), the Knowledge Innovation Program of Wuhan-Basic Research (Grant No. 2022020801010199), and the Hubei Key Laboratory of Blasting Engineering Foundation (HKLBEF202002).
Funding (slope safety factor study): This work was supported by the Basic Research Special Plan of Yunnan Provincial Department of Science and Technology-General Project (Grant No. 202101AT070094).
Abstract: The safety factor is a crucial quantitative index for evaluating slope stability. However, the traditional calculation methods suffer from unreasonable assumptions, complex soil composition, and inadequate consideration of the influencing factors, leading to large errors in their calculations. Therefore, a stacking ensemble learning model (stacking-SSAOP) based on multi-layer regression algorithm fusion and optimized by the sparrow search algorithm is proposed for predicting the slope safety factor. In this method, the density, cohesion, friction angle, slope angle, slope height, and pore pressure ratio are selected as characteristic parameters from the 210 sets of established slope sample data. Random Forest, Extra Trees, AdaBoost, Bagging, and Support Vector regression are used as the base models (inner loop) to construct the first-level regression algorithm layer, and XGBoost is used as the meta-model (outer loop) to construct the second-level regression algorithm layer and complete the construction of the stacked learning model for improving the model prediction accuracy. The sparrow search algorithm is used to optimize the hyperparameters of the above six regression models and correct the over- and underfitting problems of the single regression model to further improve the prediction accuracy. The mean square error (MSE) of the predicted and true values and the fitting of the data are compared and analyzed. The MSE of the stacking-SSAOP model was found to be smaller than that of the single regression model (MSE=0.03917). Therefore, the former has a higher prediction accuracy and better data fitting. This study innovatively applies the sparrow search algorithm to predict the slope safety factor, showcasing its advantages over traditional methods. Additionally, our proposed stacking-SSAOP model integrates multiple regression algorithms to enhance prediction accuracy. This model not only refines the prediction accuracy of the slope safety factor but also offers a fresh approach to handling the intricate soil composition and other influencing factors, making it a precise and reliable method for slope stability evaluation. This research holds importance for the modernization and digitalization of slope safety assessments.
Abstract: In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing responses are suggested. Under some regularity conditions, the corresponding Wilks phenomena are obtained and confidence regions for the parameters can be constructed easily.
Abstract: Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. By employing the confusion matrix and error percentages, we selected the best-performing model; the prediction error rates were 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
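The error percentages mentioned in this abstract are normally read off a confusion matrix as the share of off-diagonal counts. The sketch below illustrates that calculation only; the function name `error_rate` and the counts in `toy_confusion` are made up for illustration and are not taken from the study.

```python
def error_rate(confusion):
    """Misclassification rate: off-diagonal count divided by total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical counts for a two-class problem
# (rows: true class, columns: predicted class). Not data from the study.
toy_confusion = [
    [45, 5],
    [10, 40],
]

toy_error = error_rate(toy_confusion)
print(f"error rate: {toy_error:.2%}")
```

Comparing models this way amounts to computing one such rate per model on the same held-out data and preferring the smallest.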
Funding: The National Natural Science Foundation of China (11861041, 11261025).
Abstract: Mixture of Experts (MoE) regression models are widely studied in statistics and machine learning for modeling heterogeneity in data for regression, clustering and classification. The Laplace distribution is one of the most important statistical tools for analyzing heavy-tailed data. Laplace Mixture of Linear Experts (LMoLE) regression models are based on the Laplace distribution, which makes them more robust. Similar to modelling the variance parameter in a homogeneous population, we propose and study a novel class of models in this paper: heteroscedastic Laplace mixture of experts regression models for analyzing heteroscedastic data from a heterogeneous population. The issues of maximum likelihood estimation are addressed. In particular, a Minorization-Maximization (MM) algorithm for estimating the regression parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo simulations. Results from the analysis of two real data sets are presented.
Abstract: Recently, many regression models have been presented for predicting the mechanical parameters of rocks from rock index properties. Although statistical analysis is a common method for developing regression models, selecting a suitable transformation of the independent variables in a regression model is still difficult. In this paper, a genetic algorithm (GA) has been employed as a heuristic search method for selecting the best transformation of the independent variables (some index properties of rocks) in regression models for predicting uniaxial compressive strength (UCS) and modulus of elasticity (E). Firstly, multiple linear regression (MLR) analysis was performed on a data set to establish predictive models. Then, two GA models were developed in which the root mean squared error (RMSE) was defined as the fitness function. Results have shown that GA models are more precise than MLR models and are able to explain the relation between the intrinsic strength/elasticity properties and index properties of rocks with a simple formulation and acceptable accuracy.
Funding: Financial support from the Brazilian institution Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq).
Abstract: In this paper, we study some robustness aspects of linear regression models in the presence of outliers or discordant observations, considering the use of stable distributions for the response in place of the usual normality assumption. It is well known that, in general, there is no closed form for the probability density function of stable distributions. However, under a Bayesian approach, the use of a latent or auxiliary random variable simplifies obtaining any posterior distribution related to stable distributions. To show the usefulness of the computational aspects, the methodology is applied to two examples: one is related to a standard linear regression model with an explanatory variable and the other is related to a simulated data set assuming a 2^3 factorial experiment. Posterior summaries of interest are obtained using MCMC (Markov Chain Monte Carlo) methods and the OpenBugs software.
Abstract: Nitrate nitrogen (NO₃⁻-N) from agricultural activities and in industrial wastewater has become the main source of groundwater pollution, which has raised widespread concerns, particularly in arid and semi-arid river basins with little water that meets relevant standards. This study aimed to investigate the performance of spatial and non-spatial regression models in modeling nitrate pollution in a semi-intensive farming region of Iran. To model the groundwater's NO₃⁻-N concentration, both natural and anthropogenic factors affecting groundwater NO₃⁻-N were selected. The results of Moran's I test showed that groundwater nitrate concentration had a significant spatial dependence on the density of wells, distance from streams, total annual precipitation, and distance from roads in the study area. This study provided a way to estimate nitrate pollution using both natural and anthropogenic factors in arid and semi-arid areas where only a few factors are available. Spatial regression methods with spatial correlation structures are effective tools to support spatial decision-making in water pollution control.
Abstract: A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of least favorable curve, introduced by Severini and Wong (1992). The authors use this framework to derive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures. The results obtained by Hamilton et al. (1982), Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
Funding: Supported by the Research Teaching Model Curriculum of Anhui University (xjyjkc1407); the Students Innovative Training Project of Anhui University (201310357004, 201410357117, 201410357249); and the Quality Improvement Projects for Undergraduate Education of Anhui University (ZLTS2015035).
Abstract: In this paper, by using some inequalities for negatively orthant dependent (NOD, in short) random variables and the truncation method for random variables, we investigate the nonparametric regression model. The complete consistency result for the estimator of g(x) is presented.
Abstract: Rudraprayag in the Garhwal Himalayan division is one of the districts most vulnerable to landslides in India. Heavy rainfall, steep slopes and developmental activities are important factors for the occurrence of landslides in the district. Therefore, specific assessment of landslide susceptibility and its accuracy at the regional level is essential for disaster management and proper land use planning. The article evaluates the effectiveness of frequency ratio, fuzzy logic and logistic regression models for assessing landslide susceptibility in Rudraprayag district of Uttarakhand state, India. A landslide inventory map was prepared and verified by field data. Fourteen landslide parameters and the generated inventory map were utilized to prepare landslide susceptibility maps through frequency ratio, fuzzy logic and logistic regression models. Landslide susceptibility maps generated through these models were classified into very high, high, medium, low and very low categories using natural breaks classification. The receiver operating characteristics (ROC) curve, the spatially agreed area approach and the seed cell area index (SCAI) method were used to validate the landslide models. Validation results revealed that the fuzzy logic model was more effective in assessing landslide susceptibility in the study area. The landslide susceptibility map generated through the fuzzy logic model can be best utilized for landslide disaster management and effective land use planning.
Abstract: In this article, nonlinear regression models with missing responses are studied with the aim of improving the doubly robust estimator. Based on the covariate balancing propensity score (CBPS), estimators for the regression coefficients and the population mean are obtained. It is proved that the proposed estimators are asymptotically normal. In simulation studies, the proposed estimators show improved performance relative to the usual augmented inverse probability weighted estimators.
Abstract: A number of statistical tests are proposed for the purpose of change-point detection in a general nonparametric regression model under mild conditions. New proofs are given for the weak convergence of the underlying processes, which remove the stringent condition of bounded total variation of the regression function and need only second moments. Since many quantities, such as the regression function, the distribution of the covariates and the distribution of the errors, are unspecified, the results are not distribution-free. A weighted bootstrap approach is proposed to approximate the limiting distributions. Results of a simulation study show good performance for moderate sample sizes.
Funding: Supported by the Basic Performance Key Project, the Ministry of Science and Technology of the People’s Republic of China (No. 2006FY110300).
Abstract: Objective: Sub-health status has progressively gained more attention from both medical professionals and the public. Treating the number of sub-health symptoms as count data rather than dichotomous data helps to completely and accurately analyze findings in the sub-healthy population. This study aims to compare the goodness of fit of count outcome models to identify the optimum model for sub-health study. Methods: The sample of the study was derived from a large-scale population survey on physiological and psychological constants conducted from 2007 to 2011 in 4 provinces and 2 autonomous regions in China. We constructed four count outcome models using SAS: the Poisson model, the negative binomial (NB) model, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. The number of sub-health symptoms was used as the main outcome measure. The alpha dispersion parameter and the O test were used to identify over-dispersed data, and the Vuong test was used to evaluate the excessive zero count. The goodness of fit of the regression models was determined by predictive probability curves and likelihood ratio test statistics. Results: Of all 78 307 respondents, 38.53% reported no sub-health symptoms. The mean number of sub-health symptoms was 2.98, and the standard deviation was 3.72. The statistic O in the over-dispersion test was 720.995 (P<0.001); the estimated alpha was 0.618 (95% CI: 0.600-0.636) comparing the ZINB model and the ZIP model; the Vuong test statistic Z was 45.487. These results indicated over-dispersion of the data and excessive zero counts in this sub-health study. The ZINB model had the largest log likelihood (-167 519), the smallest Akaike’s Information Criterion coefficient (335 112) and the smallest Bayesian information criterion coefficient (335 455), indicating the best goodness of fit. The predictive probabilities for most counts in the ZINB model fitted the observed counts best. The logit section of the ZINB model analysis showed that age, sex, occupation, smoking, alcohol drinking, ethnicity and obesity were determinants of the presence of sub-health symptoms; the negative binomial section of the ZINB model analysis showed that sex, occupation, smoking, alcohol drinking, ethnicity, marital status and obesity had significant effects on the severity of sub-health. Conclusions: All tests for goodness of fit and the predictive probability curve produced the same finding: the ZINB model was the optimum model for exploring the influencing factors of sub-health symptoms.
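The summary statistics quoted in this abstract already hint at why a plain Poisson model fits poorly, and a few lines of arithmetic make that visible. The sketch below uses only the numbers reported above; the variable names are illustrative, and the AIC and BIC formulas are the textbook definitions, assumed here rather than confirmed as the ones the SAS output used.

```python
import math

# Summary statistics quoted in the abstract.
n = 78307
mean_count = 2.98
sd_count = 3.72
observed_zero_share = 0.3853
zinb_loglik = -167519
zinb_aic = 335112

# A Poisson model forces variance == mean, so a ratio well above 1
# points to over-dispersion.
dispersion_ratio = sd_count ** 2 / mean_count

# Share of zeros that a plain Poisson model with the same mean would predict.
poisson_zero_share = math.exp(-mean_count)

# Assumption: AIC = 2k - 2*logL and BIC = k*ln(n) - 2*logL (textbook forms).
# Under that assumption the reported AIC implies k fitted parameters.
implied_k = (zinb_aic + 2 * zinb_loglik) / 2
implied_bic = implied_k * math.log(n) - 2 * zinb_loglik

print(f"variance/mean ratio: {dispersion_ratio:.2f}")
print(f"Poisson-implied zero share: {poisson_zero_share:.3f} "
      f"vs observed {observed_zero_share:.3f}")
print(f"implied k: {implied_k:.0f}, implied BIC: {implied_bic:.0f}")
```

The variance is several times the mean and the observed share of zeros is far above what a Poisson model with the same mean would produce, which is consistent with the abstract's conclusion that a zero-inflated, over-dispersed model is needed. The BIC recomputed under the stated assumption lands close to the reported value.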
Abstract: The desired economics of hard rock surface mining is mainly determined by the process design parameters that minimize the overall cost per tonne of rock mined in drilling, blasting, handling and primary crushing under given rockmass conditions. The most effective process design parameters could be established from regression models of the cumulative influence of rockmass and mine design parameters on the overall cost per tonne of rock drilled, blasted, handled and crushed. These models could be developed from the large volume of data accumulated worldwide on the costs per tonne of hard rock surface mining in drilling, blasting, handling and primary crushing versus the parameters of rockmass and mine design. This paper addresses only the development of regression models for oversize generation, blasthole productivity and blasting cost for iron ore surface mines, for which data are available. The SPSS standard statistical correlation-regression analysis software was used in the analysis. Interpretation of the models generated shows that the individual effects of the determinant rockmass and blast design parameters on oversize generation, blasthole productivity and blasting cost all agree with the findings of other researchers and with the theory of explosive rock fragmentation. The models could therefore be used to estimate oversize generation, blasthole productivity and blasting cost in rockmass and blast design conditions similar to those of the iron ore surface mines examined in this study. However, the regression models obtained here could not be used alone for the optimization of blast design, because most of the determinant parameters also have conflicting effects on the other processes of drilling, handling and primary crushing of the blasted rock. Also, the quality and content of the regression models could be enhanced further by increasing the number of rockmass and blast design parameters and the volume of data considered in the regression analysis.
Abstract: We propose a subsampling method for robust estimation of regression models which is built on classical methods such as the least squares method. It makes use of the non-robust nature of the underlying classical method to find a good sample from regression data contaminated with outliers, and then applies the classical method to the good sample to produce robust estimates of the regression model parameters. The subsampling method is a computational method rooted in the bootstrap methodology which trades analytical treatment for intensive computation; it finds the good sample through repeated fitting of the regression model to many random subsamples of the contaminated data instead of through an analytical treatment of the outliers. The subsampling method can be applied to all regression models for which non-robust classical methods are available. In the present paper, we focus on the basic formulation and robustness property of the subsampling method that are valid for all regression models. We also discuss variations of the method and apply it to three examples involving three different regression models.
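The idea of finding a good sample by repeatedly fitting a classical estimator to random subsamples can be illustrated with a very small sketch. This is not the authors' algorithm: the selection rule (smallest median absolute residual over the full data), the function names `least_squares_line` and `subsample_fit`, and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def least_squares_line(points):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def subsample_fit(points, size=5, trials=200, seed=0):
    """Fit least squares to many random subsamples and keep the fit whose
    median absolute residual over ALL points is smallest (assumed rule)."""
    rng = random.Random(seed)
    best_score, best_fit = float("inf"), None
    for _ in range(trials):
        sample = rng.sample(points, size)
        a, b = least_squares_line(sample)
        score = statistics.median(abs(y - (a + b * x)) for x, y in points)
        if score < best_score:
            best_score, best_fit = score, (a, b)
    return best_fit


# Toy data: 20 points on y = 1 + 2x plus 4 gross outliers (made up).
inliers = [(float(x), 1.0 + 2.0 * x) for x in range(20)]
outliers = [(3.5, 40.0), (7.5, -30.0), (12.5, 60.0), (16.5, -20.0)]
data = inliers + outliers

robust_a, robust_b = subsample_fit(data)
plain_a, plain_b = least_squares_line(data)
print(f"subsample fit: intercept={robust_a:.3f}, slope={robust_b:.3f}")
print(f"plain least squares: intercept={plain_a:.3f}, slope={plain_b:.3f}")
```

The classical fit is used unchanged inside the loop; robustness comes only from which subsample is kept, which mirrors the trade of analytical treatment for computation described in the abstract.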
Abstract: A changepoint in statistical applications refers to an observational time point at which the structural pattern changes during a somewhat long-term experimentation process. In many cases, the changepoint time and cause are documented and it is reasonably straightforward to statistically adjust (homogenize) the series for the effects of the changepoint. Unfortunately, many changepoint times are undocumented and the changepoint times themselves are the main purpose of study. In this article, changepoint analysis in two-phase linear regression models is developed and discussed. Following Liu and Qian's (2010) idea for segmented linear regression models, a modified empirical likelihood ratio statistic is proposed to test whether a changepoint exists during the long-term experiment and observation. The modified empirical likelihood ratio statistic is computation-friendly and its p-value can be easily approximated based on the large-sample properties. The procedure is applied to the Old Faithful geyser eruption data from October 1980.
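To make the two-phase setting concrete, the sketch below locates a single changepoint by fitting separate least squares lines on either side of each candidate split and keeping the split with the smallest total residual sum of squares. This grid search is a simple stand-in, not the modified empirical likelihood ratio statistic described in the abstract, and it only locates a candidate changepoint without testing its significance. The function names and the toy series are illustrative assumptions.

```python
def sse_of_line(xs, ys):
    """Residual sum of squares of the least squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def best_split(xs, ys, min_size=3):
    """Index k such that fitting xs[:k] and xs[k:] separately minimizes
    the combined residual sum of squares."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(
        candidates,
        key=lambda k: sse_of_line(xs[:k], ys[:k]) + sse_of_line(xs[k:], ys[k:]),
    )


# Toy series (made up): y = x for the first 10 points, then y = 3x - 25.
xs = [float(i) for i in range(20)]
ys = [x if x < 10 else 3.0 * x - 25.0 for x in xs]

split_index = best_split(xs, ys)
print(f"estimated changepoint before observation index {split_index}")
```

On noisy data the minimizing split is only an estimate, which is why a formal test such as the one proposed in the article is needed before concluding that a changepoint exists.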
Funding: This paper was financially supported by NSC96-2628-E-366-004-MY2 and NSC96-2628-E-132-001-MY2.
Abstract: Internal solitary wave propagation over a submarine ridge results in energy dissipation, in which the hydrodynamic interaction between the wave and the ridge affects the marine environment. This study analyzes the effects of ridge height and potential energy during wave-ridge interaction with binary and cumulative logistic regression models. In testing the global null hypothesis, all p-values are <0.001 under three statistical methods: Likelihood Ratio, Score, and Wald. Comparing the two kinds of models, the test values obtained by the cumulative logistic regression models are better than those obtained by the binary logistic regression models. Although this study employed the cumulative logistic regression model, three probability functions, p^1, p^2 and p^3, are utilized for investigating the weighted influence of factors on wave reflection. Deviance and Pearson tests are applied to check the goodness-of-fit of the proposed model. The analytical results demonstrated that both ridge height (X1) and potential energy (X2) significantly impact (p<0.0001) the amplitude-based reflected rate; the p-values for the deviance and Pearson tests are both >0.05 (0.2839 and 0.3438, respectively). That is, the goodness-of-fit between ridge height (X1) and potential energy (X2) can further predict parameters under the scenario of the best parsimonious model. Investigation of six measures of predictive power (R2, Max-rescaled R2, Somers' D, Gamma, Tau-a, and c) indicates that these predictive estimates of the proposed model have better predictive ability than ridge height alone, and are very similar to the interaction of ridge height and potential energy. It can be concluded that the goodness-of-fit and prediction ability of the cumulative logistic regression model are better than those of the binary logistic regression model.