Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/appr...Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.展开更多
BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale c...BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale cannot be fully understood due to lack of information.AIM To identify key factors that may explain the variability in case lethality across countries.METHODS We identified 21 Potential risk factors for coronavirus disease 2019(COVID-19)case fatality rate for all the countries with available data.We examined univariate relationships of each variable with case fatality rate(CFR),and all independent variables to identify candidate variables for our final multiple model.Multiple regression analysis technique was used to assess the strength of relationship.RESULTS The mean of COVID-19 mortality was 1.52±1.72%.There was a statistically significant inverse correlation between health expenditure,and number of computed tomography scanners per 1 million with CFR,and significant direct correlation was found between literacy,and air pollution with CFR.This final model can predict approximately 97%of the changes in CFR.CONCLUSION The current study recommends some new predictors explaining affect mortality rate.Thus,it could help decision-makers develop health policies to fight COVID-19.展开更多
In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), ob...In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.展开更多
The global pandemic,coronavirus disease 2019(COVID-19),has significantly affected tourism,especially in Spain,as it was among the first countries to be affected by the pandemic and is among the world’s biggest touris...The global pandemic,coronavirus disease 2019(COVID-19),has significantly affected tourism,especially in Spain,as it was among the first countries to be affected by the pandemic and is among the world’s biggest tourist destinations.Stock market values are responding to the evolution of the pandemic,especially in the case of tourist companies.Therefore,being able to quantify this relationship allows us to predict the effect of the pandemic on shares in the tourism sector,thereby improving the response to the crisis by policymakers and investors.Accordingly,a dynamic regression model was developed to predict the behavior of shares in the Spanish tourism sector according to the evolution of the COVID-19 pandemic in the medium term.It has been confirmed that both the number of deaths and cases are good predictors of abnormal stock prices in the tourism sector.展开更多
Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a n...Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression(RCR) model with fusing failure time data.Firstly, some interesting natures of parameters estimation based on the nonlinear RCR model are given. Based on these natures,the failure time data can be fused as the prior information reasonably. Specifically, the fixed parameters are calculated by the field degradation data of the evaluated equipment and the prior information of random coefficient is estimated with fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, the probability density function(PDF) of the RUL with considering the limitation of the failure threshold is performed. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.展开更多
In the era of big data,traditional regression models cannot deal with uncertain big data efficiently and accurately.In order to make up for this deficiency,this paper proposes a quantum fuzzy regression model,which us...In the era of big data,traditional regression models cannot deal with uncertain big data efficiently and accurately.In order to make up for this deficiency,this paper proposes a quantum fuzzy regression model,which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation.In this paper,data envelopment analysis(DEA)is used to calculate the degree of importance of each data point.Meanwhile,Harrow,Hassidim and Lloyd(HHL)algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation.The application of the quantum fuzzy regression model to smallscale financial data proves that its accuracy is greatly improved compared with the quantum regression model.Moreover,due to the introduction of quantum computing,the speed of dealing with high-dimensional data matrix has an exponential improvement compared with the fuzzy regression model.The quantum fuzzy regression model proposed in this paper combines the advantages of fuzzy theory and quantum computing which can efficiently calculate high-dimensional data matrix and complete parameter estimation using quantum computing while retaining the uncertainty in big data.Thus,it is a new model for efficient and accurate big data processing in uncertain environments.展开更多
Wireless sensor network(WSN)positioning has a good effect on indoor positioning,so it has received extensive attention in the field of positioning.Non-line-of sight(NLOS)is a primary challenge in indoor complex enviro...Wireless sensor network(WSN)positioning has a good effect on indoor positioning,so it has received extensive attention in the field of positioning.Non-line-of sight(NLOS)is a primary challenge in indoor complex environment.In this paper,a robust localization algorithm based on Gaussian mixture model and fitting polynomial is proposed to solve the problem of NLOS error.Firstly,fitting polynomials are used to predict the measured values.The residuals of predicted and measured values are clustered by Gaussian mixture model(GMM).The LOS probability and NLOS probability are calculated according to the clustering centers.The measured values are filtered by Kalman filter(KF),variable parameter unscented Kalman filter(VPUKF)and variable parameter particle filter(VPPF)in turn.The distance value processed by KF and VPUKF and the distance value processed by KF,VPUKF and VPPF are combined according to probability.Finally,the maximum likelihood method is used to calculate the position coordinate estimation.Through simulation comparison,the proposed algorithm has better positioning accuracy than several comparison algorithms in this paper.And it shows strong robustness in strong NLOS environment.展开更多
Regression models for survival time data involve estimation of the hazard rate as a function of predictor variables and associated slope parameters. An adaptive approach is formulated for such hazard regression modeli...Regression models for survival time data involve estimation of the hazard rate as a function of predictor variables and associated slope parameters. An adaptive approach is formulated for such hazard regression modeling. The hazard rate is modeled using fractional polynomials, that is, linear combinations of products of power transforms of time together with other available predictors. These fractional polynomial models are restricted to generating positive-valued hazard rates and decreasing survival times. Exponentially distributed survival times are a special case. Parameters are estimated using maximum likelihood estimation allowing for right censored survival times. Models are evaluated and compared using likelihood cross-validation (LCV) scores. LCV scores and tolerance parameters are used to control an adaptive search through alternative fractional polynomial hazard rate models to identify effective models for the underlying survival time data. These methods are demonstrated using two different survival time data sets including survival times for lung cancer patients and for multiple myeloma patients. For the lung cancer data, the hazard rate depends distinctly on time. However, controlling for cell type provides a distinct improvement while the hazard rate depends only on cell type and no longer on time. Furthermore, Cox regression is unable to identify a cell type effect. For the multiple myeloma data, the hazard rate also depends distinctly on time. Moreover, consideration of hemoglobin at diagnosis provides a distinct improvement, the hazard rate still depends distinctly on time, and hemoglobin distinctly moderates the effect of time on the hazard rate. These results indicate that adaptive hazard rate modeling can provide unique insights into survival time data.展开更多
Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods trea...Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods treat covariates, either time-invariant or time-varying, as having multiplicative effects while general dependence on time is left un-estimated. An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials restricted so that the conditional hazard rates are positive-valued and so that excess time probability functions (generalizing survival functions for single event times) are decreasing. Maximum likelihood is used to estimate parameters adjusting for right censored event times. Likelihood cross-validation (LCV) scores are used to compare models. Adaptive searches through alternate conditional hazard rate models are controlled by LCV scores combined with tolerance parameters. These searches identify effective models for the underlying multiple event time data. Conditional hazard regression is demonstrated using data on times between tumor recurrence for bladder cancer patients. Analyses of theory-based models for these data using extensions of Cox regression provide conflicting results on effects to treatment group and the initial number of tumors. On the other hand, fractional polynomial analyses of these theory-based models provide consistent results identifying significant effects to treatment group and initial number of tumors using both model-based and robust empirical tests. Adaptive analyses further identify distinct moderation by group of the effect of tumor order and an additive effect to group after controlling for nonlinear effects to initial number of tumors and tumor order. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data.展开更多
Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods trea...Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods treat covariates, either time-invariant or time-varying, as having multiplicative effects while general dependence on time is left un-estimated. An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials restricted so that the conditional hazard rates are positive-valued and so that excess time probability functions (generalizing survival functions for single event times) are decreasing. Maximum likelihood is used to estimate parameters adjusting for right censored event times. Likelihood cross-validation (LCV) scores are used to compare models. Adaptive searches through alternate conditional hazard rate models are controlled by LCV scores combined with tolerance parameters. These searches identify effective models for the underlying multiple event time data. Conditional hazard regression is demonstrated using data on times between tumor recurrence for bladder cancer patients. Analyses of theory-based models for these data using extensions of Cox regression provide conflicting results on effects to treatment group and the initial number of tumors. On the other hand, fractional polynomial analyses of these theory-based models provide consistent results identifying significant effects to treatment group and initial number of tumors using both model-based and robust empirical tests. Adaptive analyses further identify distinct moderation by group of the effect of tumor order and an additive effect to group after controlling for nonlinear effects to initial number of tumors and tumor order. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data.展开更多
Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressiv...Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressive mixed models are constants. However, for complicated data, the coefficients of covariates may change with time. In this article, we propose a kind of partial time-varying coefficient regression and autoregressive mixed model and obtain the local weighted least-square estimators of coefficient functions by the local polynomial technique. The asymptotic normality properties of estimators are derived under regularity conditions, and simulation studies are conducted to empirically examine the finite-sample performances of the proposed estimators. Finally, we use real data about Lake Shasta inflow to illustrate the application of the proposed model.展开更多
Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressiv...Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressive mixed models are constants. However, for complicated data, the coefficients of covariates may change with time. In this article, we propose a kind of partial time-varying coefficient regression and autoregressive mixed model and obtain the local weighted least-square estimators of coefficient functions by the local polynomial technique. The asymptotic normality properties of estimators are derived under regularity conditions, and simulation studies are conducted to empirically examine the finite-sample performances of the proposed estimators. Finally, we use real data about Lake Shasta inflow to illustrate the application of the proposed model.展开更多
In this paper, the Schwarz Information Criterion (SIC) is used to detect the change points in polynomial regression models. Switching quadratic regression models with same amount of model deviation and switching polyn...In this paper, the Schwarz Information Criterion (SIC) is used to detect the change points in polynomial regression models. Switching quadratic regression models with same amount of model deviation and switching polynomial regression models with different amount of model deviation for different segments of regression are considered. The number of separate regimes and their corresponding regression orders are assume to be known. The method is then applied to cable data sets and the change points are successfully detected.展开更多
Cyber losses in terms of number of records breached under cyber incidents commonly feature a significant portion of zeros, specific characteristics of mid-range losses and large losses, which make it hard to model the...Cyber losses in terms of number of records breached under cyber incidents commonly feature a significant portion of zeros, specific characteristics of mid-range losses and large losses, which make it hard to model the whole range of the losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate and large losses and consider heterogeneous effects in mixture components. To apply our proposed model to Privacy Right Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability to model zero losses, finite mixture distributions for moderate body and an extreme value distribution for large losses capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combining with our frequency model (generalized linear mixed model) for data breaches, aggregate loss distributions are investigated and applications on cyber insurance pricing and risk management are discussed.展开更多
Rapidly spreading COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major health public concern. The tendency of COVID-19 pandemic and statistical modelling represents an urg...Rapidly spreading COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major health public concern. The tendency of COVID-19 pandemic and statistical modelling represents an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate combining Fourier terms for capturing seasonality with ARIMA errors and other dynamics in the data. Therefore, we have analyzed 156 weeks COVID-19 dataset on national level using Dynamic Harmonic Regression model, including simulation analysis and accuracy improvement from 2020 to 2023. Most importantly, we provide new advanced pathways which may serve as targets for developing new solutions and approaches.展开更多
In this study, a multivariate local quadratic polynomial regression(MLQPR) method is proposed to design a model for the sludge volume index(SVI). In MLQPR, a quadratic polynomial regression function is established to ...In this study, a multivariate local quadratic polynomial regression(MLQPR) method is proposed to design a model for the sludge volume index(SVI). In MLQPR, a quadratic polynomial regression function is established to describe the relationship between SVI and the relative variables, and the important terms of the quadratic polynomial regression function are determined by the significant test of the corresponding coefficients. Moreover, a local estimation method is introduced to adjust the weights of the quadratic polynomial regression function to improve the model accuracy. Finally, the proposed method is applied to predict the SVI values in a real wastewater treatment process(WWTP). The experimental results demonstrate that the proposed MLQPR method has faster testing speed and more accurate results than some existing methods.展开更多
In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are o...In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.展开更多
The technology of tunnel boring machine(TBM)has been widely applied for underground construction worldwide;however,how to ensure the TBM tunneling process safe and efficient remains a major concern.Advance rate is a k...The technology of tunnel boring machine(TBM)has been widely applied for underground construction worldwide;however,how to ensure the TBM tunneling process safe and efficient remains a major concern.Advance rate is a key parameter of TBM operation and reflects the TBM-ground interaction,for which a reliable prediction helps optimize the TBM performance.Here,we develop a hybrid neural network model,called Attention-ResNet-LSTM,for accurate prediction of the TBM advance rate.A database including geological properties and TBM operational parameters from the Yangtze River Natural Gas Pipeline Project is used to train and test this deep learning model.The evolutionary polynomial regression method is adopted to aid the selection of input parameters.The results of numerical exper-iments show that our Attention-ResNet-LSTM model outperforms other commonly-used intelligent models with a lower root mean square error and a lower mean absolute percentage error.Further,parametric analyses are conducted to explore the effects of the sequence length of historical data and the model architecture on the prediction accuracy.A correlation analysis between the input and output parameters is also implemented to provide guidance for adjusting relevant TBM operational parameters.The performance of our hybrid intelligent model is demonstrated in a case study of TBM tunneling through a complex ground with variable strata.Finally,data collected from the Baimang River Tunnel Project in Shenzhen of China are used to further test the generalization of our model.The results indicate that,compared to the conventional ResNet-LSTM model,our model has a better predictive capability for scenarios with unknown datasets due to its self-adaptive characteristic.展开更多
Social network is the mainstream medium of current information dissemination,and it is particularly important to accurately predict its propagation law.In this paper,we introduce a social network propagation model int...Social network is the mainstream medium of current information dissemination,and it is particularly important to accurately predict its propagation law.In this paper,we introduce a social network propagation model integrating multiple linear regression and infectious disease model.Firstly,we proposed the features that affect social network communication from three dimensions.Then,we predicted the node influence via multiple linear regression.Lastly,we used the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks.The experimental results on a real social network dataset showed that the prediction results of the model are consistent with the actual information dissemination trends.展开更多
This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,w...This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.展开更多
文摘Purpose:The purpose of this study is to develop and compare model choice strategies in context of logistic regression.Model choice means the choice of the covariates to be included in the model.Design/methodology/approach:The study is based on Monte Carlo simulations.The methods are compared in terms of three measures of accuracy:specificity and two kinds of sensitivity.A loss function combining sensitivity and specificity is introduced and used for a final comparison.Findings:The choice of method depends on how much the users emphasize sensitivity against specificity.It also depends on the sample size.For a typical logistic regression setting with a moderate sample size and a small to moderate effect size,either BIC,BICc or Lasso seems to be optimal.Research limitations:Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data.Thus,more simulations are needed.Practical implications:Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper.Alternatively,they could run their own simulations and calculate the loss function.Originality/value:This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression.The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
文摘BACKGROUND The spread of the severe acute respiratory syndrome coronavirus 2 outbreak worldwide has caused concern regarding the mortality rate caused by the infection.The determinants of mortality on a global scale cannot be fully understood due to lack of information.AIM To identify key factors that may explain the variability in case lethality across countries.METHODS We identified 21 Potential risk factors for coronavirus disease 2019(COVID-19)case fatality rate for all the countries with available data.We examined univariate relationships of each variable with case fatality rate(CFR),and all independent variables to identify candidate variables for our final multiple model.Multiple regression analysis technique was used to assess the strength of relationship.RESULTS The mean of COVID-19 mortality was 1.52±1.72%.There was a statistically significant inverse correlation between health expenditure,and number of computed tomography scanners per 1 million with CFR,and significant direct correlation was found between literacy,and air pollution with CFR.This final model can predict approximately 97%of the changes in CFR.CONCLUSION The current study recommends some new predictors explaining affect mortality rate.Thus,it could help decision-makers develop health policies to fight COVID-19.
文摘In this paper, a logistical regression statistical analysis (LR) is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as “slinky mode” (SM), observed to travel around the torus in Madison Symmetric Torus (MST). The LR analysis is used to utilize the modified Sine-Gordon dynamic equation model to predict with high confidence whether the slinky mode will lock or not lock when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions, the slinky mode “locks” at or near the intersection of poloidal and/or toroidal gaps in MST. However, locked mode cease to travel around the torus;while unlocked mode keeps traveling without a change in the energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by LR analysis are shown to improve the Sine-Gordon model’s ability to determine the locking/unlocking of magnetohydrodyamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of slinky mode proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
文摘The global pandemic,coronavirus disease 2019(COVID-19),has significantly affected tourism,especially in Spain,as it was among the first countries to be affected by the pandemic and is among the world’s biggest tourist destinations.Stock market values are responding to the evolution of the pandemic,especially in the case of tourist companies.Therefore,being able to quantify this relationship allows us to predict the effect of the pandemic on shares in the tourism sector,thereby improving the response to the crisis by policymakers and investors.Accordingly,a dynamic regression model was developed to predict the behavior of shares in the Spanish tourism sector according to the evolution of the COVID-19 pandemic in the medium term.It has been confirmed that both the number of deaths and cases are good predictors of abnormal stock prices in the tourism sector.
基金supported by National Natural Science Foundation of China (61703410,61873175,62073336,61873273,61773386,61922089)。
文摘Remaining useful life(RUL) prediction is one of the most crucial elements in prognostics and health management(PHM). Aiming at the imperfect prior information, this paper proposes an RUL prediction method based on a nonlinear random coefficient regression(RCR) model with fusing failure time data.Firstly, some interesting natures of parameters estimation based on the nonlinear RCR model are given. Based on these natures,the failure time data can be fused as the prior information reasonably. Specifically, the fixed parameters are calculated by the field degradation data of the evaluated equipment and the prior information of random coefficient is estimated with fusing the failure time data of congeneric equipment. Then, the prior information of the random coefficient is updated online under the Bayesian framework, the probability density function(PDF) of the RUL with considering the limitation of the failure threshold is performed. Finally, two case studies are used for experimental verification. Compared with the traditional Bayesian method, the proposed method can effectively reduce the influence of imperfect prior information and improve the accuracy of RUL prediction.
基金This work is supported by the NationalNatural Science Foundation of China(No.62076042)the Key Research and Development Project of Sichuan Province(Nos.2021YFSY0012,2020YFG0307,2021YFG0332)+3 种基金the Science and Technology Innovation Project of Sichuan(No.2020017)the Key Research and Development Project of Chengdu(No.2019-YF05-02028-GX)the Innovation Team of Quantum Security Communication of Sichuan Province(No.17TD0009)the Academic and Technical Leaders Training Funding Support Projects of Sichuan Province(No.2016120080102643).
文摘In the era of big data,traditional regression models cannot deal with uncertain big data efficiently and accurately.In order to make up for this deficiency,this paper proposes a quantum fuzzy regression model,which uses fuzzy theory to describe the uncertainty in big data sets and uses quantum computing to exponentially improve the efficiency of data set preprocessing and parameter estimation.In this paper,data envelopment analysis(DEA)is used to calculate the degree of importance of each data point.Meanwhile,Harrow,Hassidim and Lloyd(HHL)algorithm and quantum swap circuits are used to improve the efficiency of high-dimensional data matrix calculation.The application of the quantum fuzzy regression model to smallscale financial data proves that its accuracy is greatly improved compared with the quantum regression model.Moreover,due to the introduction of quantum computing,the speed of dealing with high-dimensional data matrix has an exponential improvement compared with the fuzzy regression model.The quantum fuzzy regression model proposed in this paper combines the advantages of fuzzy theory and quantum computing which can efficiently calculate high-dimensional data matrix and complete parameter estimation using quantum computing while retaining the uncertainty in big data.Thus,it is a new model for efficient and accurate big data processing in uncertain environments.
基金supported by the National Natural Science Foundation of China under Grant No.62273083 and No.61973069Natural Science Foundation of Hebei Province under Grant No.F2020501012。
文摘Wireless sensor network(WSN)positioning has a good effect on indoor positioning,so it has received extensive attention in the field of positioning.Non-line-of sight(NLOS)is a primary challenge in indoor complex environment.In this paper,a robust localization algorithm based on Gaussian mixture model and fitting polynomial is proposed to solve the problem of NLOS error.Firstly,fitting polynomials are used to predict the measured values.The residuals of predicted and measured values are clustered by Gaussian mixture model(GMM).The LOS probability and NLOS probability are calculated according to the clustering centers.The measured values are filtered by Kalman filter(KF),variable parameter unscented Kalman filter(VPUKF)and variable parameter particle filter(VPPF)in turn.The distance value processed by KF and VPUKF and the distance value processed by KF,VPUKF and VPPF are combined according to probability.Finally,the maximum likelihood method is used to calculate the position coordinate estimation.Through simulation comparison,the proposed algorithm has better positioning accuracy than several comparison algorithms in this paper.And it shows strong robustness in strong NLOS environment.
文摘Regression models for survival time data involve estimation of the hazard rate as a function of predictor variables and associated slope parameters. An adaptive approach is formulated for such hazard regression modeling. The hazard rate is modeled using fractional polynomials, that is, linear combinations of products of power transforms of time together with other available predictors. These fractional polynomial models are restricted to generating positive-valued hazard rates and decreasing survival times. Exponentially distributed survival times are a special case. Parameters are estimated using maximum likelihood estimation allowing for right censored survival times. Models are evaluated and compared using likelihood cross-validation (LCV) scores. LCV scores and tolerance parameters are used to control an adaptive search through alternative fractional polynomial hazard rate models to identify effective models for the underlying survival time data. These methods are demonstrated using two different survival time data sets including survival times for lung cancer patients and for multiple myeloma patients. For the lung cancer data, the hazard rate depends distinctly on time. However, controlling for cell type provides a distinct improvement while the hazard rate depends only on cell type and no longer on time. Furthermore, Cox regression is unable to identify a cell type effect. For the multiple myeloma data, the hazard rate also depends distinctly on time. Moreover, consideration of hemoglobin at diagnosis provides a distinct improvement, the hazard rate still depends distinctly on time, and hemoglobin distinctly moderates the effect of time on the hazard rate. These results indicate that adaptive hazard rate modeling can provide unique insights into survival time data.
文摘Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods treat covariates, either time-invariant or time-varying, as having multiplicative effects while general dependence on time is left un-estimated. An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials restricted so that the conditional hazard rates are positive-valued and so that excess time probability functions (generalizing survival functions for single event times) are decreasing. Maximum likelihood is used to estimate parameters adjusting for right censored event times. Likelihood cross-validation (LCV) scores are used to compare models. Adaptive searches through alternate conditional hazard rate models are controlled by LCV scores combined with tolerance parameters. These searches identify effective models for the underlying multiple event time data. Conditional hazard regression is demonstrated using data on times between tumor recurrence for bladder cancer patients. Analyses of theory-based models for these data using extensions of Cox regression provide conflicting results on effects to treatment group and the initial number of tumors. On the other hand, fractional polynomial analyses of these theory-based models provide consistent results identifying significant effects to treatment group and initial number of tumors using both model-based and robust empirical tests. Adaptive analyses further identify distinct moderation by group of the effect of tumor order and an additive effect to group after controlling for nonlinear effects to initial number of tumors and tumor order. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data.
文摘Recurrent event time data and more general multiple event time data are commonly analyzed using extensions of Cox regression, or proportional hazards regression, as used with single event time data. These methods treat covariates, either time-invariant or time-varying, as having multiplicative effects while general dependence on time is left un-estimated. An adaptive approach is formulated for analyzing multiple event time data. Conditional hazard rates are modeled in terms of dependence on both time and covariates using fractional polynomials restricted so that the conditional hazard rates are positive-valued and so that excess time probability functions (generalizing survival functions for single event times) are decreasing. Maximum likelihood is used to estimate parameters adjusting for right censored event times. Likelihood cross-validation (LCV) scores are used to compare models. Adaptive searches through alternate conditional hazard rate models are controlled by LCV scores combined with tolerance parameters. These searches identify effective models for the underlying multiple event time data. Conditional hazard regression is demonstrated using data on times between tumor recurrence for bladder cancer patients. Analyses of theory-based models for these data using extensions of Cox regression provide conflicting results on effects to treatment group and the initial number of tumors. On the other hand, fractional polynomial analyses of these theory-based models provide consistent results identifying significant effects to treatment group and initial number of tumors using both model-based and robust empirical tests. Adaptive analyses further identify distinct moderation by group of the effect of tumor order and an additive effect to group after controlling for nonlinear effects to initial number of tumors and tumor order. Results of example analyses indicate that adaptive conditional hazard rate modeling can generate useful insights into multiple event time data.
文摘Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressive mixed models are constants. However, for complicated data, the coefficients of covariates may change with time. In this article, we propose a kind of partial time-varying coefficient regression and autoregressive mixed model and obtain the local weighted least-square estimators of coefficient functions by the local polynomial technique. The asymptotic normality properties of estimators are derived under regularity conditions, and simulation studies are conducted to empirically examine the finite-sample performances of the proposed estimators. Finally, we use real data about Lake Shasta inflow to illustrate the application of the proposed model.
文摘Regression and autoregressive mixed models are classical models used to analyze the relationship between time series response variable and other covariates. The coefficients in traditional regression and autoregressive mixed models are constants. However, for complicated data, the coefficients of covariates may change with time. In this article, we propose a kind of partial time-varying coefficient regression and autoregressive mixed model and obtain the local weighted least-square estimators of coefficient functions by the local polynomial technique. The asymptotic normality properties of estimators are derived under regularity conditions, and simulation studies are conducted to empirically examine the finite-sample performances of the proposed estimators. Finally, we use real data about Lake Shasta inflow to illustrate the application of the proposed model.
文摘In this paper, the Schwarz Information Criterion (SIC) is used to detect the change points in polynomial regression models. Switching quadratic regression models with same amount of model deviation and switching polynomial regression models with different amount of model deviation for different segments of regression are considered. The number of separate regimes and their corresponding regression orders are assume to be known. The method is then applied to cable data sets and the change points are successfully detected.
文摘Cyber losses in terms of number of records breached under cyber incidents commonly feature a significant portion of zeros, specific characteristics of mid-range losses and large losses, which make it hard to model the whole range of the losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate and large losses and consider heterogeneous effects in mixture components. To apply our proposed model to Privacy Right Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability to model zero losses, finite mixture distributions for moderate body and an extreme value distribution for large losses capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combining with our frequency model (generalized linear mixed model) for data breaches, aggregate loss distributions are investigated and applications on cyber insurance pricing and risk management are discussed.
文摘Rapidly spreading COVID-19 virus and its variants, especially in metropolitan areas around the world, became a major health public concern. The tendency of COVID-19 pandemic and statistical modelling represents an urgent challenge in the United States for which there are few solutions. In this paper, we demonstrate combining Fourier terms for capturing seasonality with ARIMA errors and other dynamics in the data. Therefore, we have analyzed 156 weeks COVID-19 dataset on national level using Dynamic Harmonic Regression model, including simulation analysis and accuracy improvement from 2020 to 2023. Most importantly, we provide new advanced pathways which may serve as targets for developing new solutions and approaches.
文摘In this study, a multivariate local quadratic polynomial regression(MLQPR) method is proposed to design a model for the sludge volume index(SVI). In MLQPR, a quadratic polynomial regression function is established to describe the relationship between SVI and the relative variables, and the important terms of the quadratic polynomial regression function are determined by the significant test of the corresponding coefficients. Moreover, a local estimation method is introduced to adjust the weights of the quadratic polynomial regression function to improve the model accuracy. Finally, the proposed method is applied to predict the SVI values in a real wastewater treatment process(WWTP). The experimental results demonstrate that the proposed MLQPR method has faster testing speed and more accurate results than some existing methods.
文摘In this paper, three smoothed empirical log-likelihood ratio functions for the parameters of nonlinear models with missing response are suggested. Under some regular conditions, the corresponding Wilks phenomena are obtained and the confidence regions for the parameter can be constructed easily.
基金The research was supported by the National Natural Science Foundation of China(Grant No.52008307)the Shanghai Sci-ence and Technology Innovation Program(Grant No.19DZ1201004)The third author would like to acknowledge the funding by the China Postdoctoral Science Foundation(Grant No.2023M732670).
文摘The technology of tunnel boring machine(TBM)has been widely applied for underground construction worldwide;however,how to ensure the TBM tunneling process safe and efficient remains a major concern.Advance rate is a key parameter of TBM operation and reflects the TBM-ground interaction,for which a reliable prediction helps optimize the TBM performance.Here,we develop a hybrid neural network model,called Attention-ResNet-LSTM,for accurate prediction of the TBM advance rate.A database including geological properties and TBM operational parameters from the Yangtze River Natural Gas Pipeline Project is used to train and test this deep learning model.The evolutionary polynomial regression method is adopted to aid the selection of input parameters.The results of numerical exper-iments show that our Attention-ResNet-LSTM model outperforms other commonly-used intelligent models with a lower root mean square error and a lower mean absolute percentage error.Further,parametric analyses are conducted to explore the effects of the sequence length of historical data and the model architecture on the prediction accuracy.A correlation analysis between the input and output parameters is also implemented to provide guidance for adjusting relevant TBM operational parameters.The performance of our hybrid intelligent model is demonstrated in a case study of TBM tunneling through a complex ground with variable strata.Finally,data collected from the Baimang River Tunnel Project in Shenzhen of China are used to further test the generalization of our model.The results indicate that,compared to the conventional ResNet-LSTM model,our model has a better predictive capability for scenarios with unknown datasets due to its self-adaptive characteristic.
基金This work was supported by the 2021 Project of the“14th Five-Year Plan”of Shaanxi Education Science“Research on the Application of Educational Data Mining in Applied Undergraduate Teaching-Taking the Course of‘Computer Application Technology’as an Example”(SGH21Y0403)the Teaching Reform and Research Projects for Practical Teaching in 2022“Research on Practical Teaching of Applied Undergraduate Projects Based on‘Combination of Courses and Certificates”-Taking Computer Application Technology Courses as an Example”(SJJG02012)the 11th batch of Teaching Reform Research Project of Xi’an Jiaotong University City College“Project-Driven Cultivation and Research on Information Literacy of Applied Undergraduate Students in the Information Times-Taking Computer Application Technology Course Teaching as an Example”(111001).
文摘Social network is the mainstream medium of current information dissemination,and it is particularly important to accurately predict its propagation law.In this paper,we introduce a social network propagation model integrating multiple linear regression and infectious disease model.Firstly,we proposed the features that affect social network communication from three dimensions.Then,we predicted the node influence via multiple linear regression.Lastly,we used the node influence as the state transition of the infectious disease model to predict the trend of information dissemination in social networks.The experimental results on a real social network dataset showed that the prediction results of the model are consistent with the actual information dissemination trends.
文摘This paper presents a case study on the IPUMS NHIS database,which provides data from censuses and surveys on the health of the U.S.population,including data related to COVID-19.By addressing gaps in previous studies,we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms.Our experiments focus on four groups of factors:demographic,socio-economic,health condition,and related to COVID-19 vaccination.By analysing the sensitivity of the variables used to train the models and the VEC(variable effect characteristics)analysis on the variable values,we identify and measure importance of various factors that influence the severity of COVID-19 symptoms.