The topic of this article is one-sided hypothesis testing for disparity, i.e., whether the mean of one group is larger than that of another when there is uncertainty as to the group from which a datum is drawn. For each datum, the uncertainty is captured with a given discrete probability distribution over the groups. Such situations arise, for example, in the use of Bayesian imputation methods to assess race and ethnicity disparities with certain insurance, health, and financial data. A widely used method to implement this assessment is the Bayesian Improved Surname Geocoding (BISG) method, which assigns a discrete probability over six race/ethnicity groups to an individual given the individual's surname and address location. Using a Bayesian framework and Markov chain Monte Carlo sampling from the joint posterior distribution of the group means, the probability of a disparity hypothesis is estimated. Four methods are developed and compared with an illustrative data set. Three of these methods are implemented in R and one in WinBUGS. These methods are programmed for any number of groups between two and six inclusive. All the codes are provided in the appendices.
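The core posterior computation in this abstract reduces to a simple Monte Carlo estimate. The sketch below is illustrative only: synthetic normal draws stand in for the paper's actual MCMC output for the two group means, and the prior, likelihood, and group-membership imputation are assumptions, not the article's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for two group means. In practice these
# would come from MCMC sampling of the joint posterior, with group
# membership imputed from the per-datum discrete probabilities.
mu_group1 = rng.normal(loc=5.2, scale=0.3, size=10_000)
mu_group2 = rng.normal(loc=4.8, scale=0.3, size=10_000)

# Posterior probability of the one-sided disparity hypothesis mu1 > mu2:
# the fraction of joint posterior draws in which mean 1 exceeds mean 2.
p_disparity = np.mean(mu_group1 > mu_group2)
```

The same one-line estimate applies for any number of groups: compare the relevant pair of mean vectors draw by draw.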
BACKGROUND Portal hypertension (PHT), primarily induced by cirrhosis, manifests severe symptoms impacting patient survival. Although transjugular intrahepatic portosystemic shunt (TIPS) is a critical intervention for managing PHT, it carries risks like hepatic encephalopathy, thus affecting patient survival prognosis. To our knowledge, existing prognostic models for post-TIPS survival in patients with PHT fail to account for the interplay among and collective impact of various prognostic factors on outcomes. Consequently, the development of an innovative modeling approach is essential to address this limitation. AIM To develop and validate a Bayesian network (BN)-based survival prediction model for patients with cirrhosis-induced PHT having undergone TIPS. METHODS The clinical data of 393 patients with cirrhosis-induced PHT who underwent TIPS surgery at the Second Affiliated Hospital of Chongqing Medical University between January 2015 and May 2022 were retrospectively analyzed. Variables were selected using Cox and least absolute shrinkage and selection operator regression methods, and a BN-based model was established and evaluated to predict survival in patients having undergone TIPS surgery for PHT. RESULTS Variable selection revealed the following as key factors impacting survival: age, ascites, hypertension, indications for TIPS, postoperative portal vein pressure (post-PVP), aspartate aminotransferase, alkaline phosphatase, total bilirubin, prealbumin, the Child-Pugh grade, and the model for end-stage liver disease (MELD) score. Based on the above-mentioned variables, a BN-based 2-year survival prognostic prediction model was constructed, which identified the following factors to be directly linked to the survival time: age, ascites, indications for TIPS, concurrent hypertension, post-PVP, the Child-Pugh grade, and the MELD score. The Bayesian information criterion was 3589.04, and 10-fold cross-validation indicated an average log-likelihood loss of 5.55 with a standard deviation of 0.16. The model's accuracy, precision, recall, and F1 score were 0.90, 0.92, 0.97, and 0.95, respectively, with the area under the receiver operating characteristic curve being 0.72. CONCLUSION This study successfully developed a BN-based survival prediction model with good predictive capabilities. It offers valuable insights for treatment strategies and prognostic evaluations in patients having undergone TIPS surgery for PHT.
Xinjiang Uygur Autonomous Region is a typical inland arid area in China with a sparse and uneven distribution of meteorological stations, limited access to precipitation data, and significant water scarcity. Evaluating and integrating precipitation datasets from different sources to accurately characterize precipitation patterns has become a challenge, in order to provide more accurate and alternative precipitation information for the region, which can even improve the performance of hydrological modelling. This study evaluated the applicability of five widely used satellite-based precipitation products (Climate Hazards Group InfraRed Precipitation with Station (CHIRPS), China Meteorological Forcing Dataset (CMFD), Climate Prediction Center morphing method (CMORPH), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA)) and a reanalysis precipitation dataset (ECMWF Reanalysis v5-Land Dataset (ERA5-Land)) in Xinjiang using ground-based observational precipitation data from a limited number of meteorological stations. Based on this assessment, we proposed a framework that integrated different precipitation datasets with varying spatial resolutions using a dynamic Bayesian model averaging (DBMA) approach, the expectation-maximization method, and the ordinary Kriging interpolation method. The daily precipitation data merged using the DBMA approach exhibited distinct spatiotemporal variability, with an outstanding performance, as indicated by a low root mean square error (RMSE = 1.40 mm/d) and a high Pearson's correlation coefficient (CC = 0.67). Compared with the traditional simple model averaging (SMA) and individual product data, although the DBMA-fused precipitation data were slightly lower than the best precipitation product (CMFD), the overall performance of DBMA was more robust. The error analysis between the DBMA-fused precipitation dataset and the more advanced Integrated Multi-satellite Retrievals for Global Precipitation Measurement Final (IMERG-F) precipitation product, as well as hydrological simulations in the Ebinur Lake Basin, further demonstrated the superior performance of the DBMA-fused precipitation dataset in the entire Xinjiang region. The proposed framework for solving the fusion problem of multi-source precipitation data with different spatial resolutions is feasible for application in inland arid areas, and aids in obtaining more accurate regional hydrological information and improving regional water resources management capabilities and meteorological research in these regions.
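A single dynamic-BMA weight update can be illustrated in a few lines. The forgetting factor, the Gaussian predictive likelihood, and all parameter values below are assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def dbma_update(weights, products, obs, sigma=1.0, alpha=0.95):
    """One dynamic-BMA step (illustrative): discount the prior weights
    with a forgetting factor alpha, then reweight each product by its
    Gaussian predictive likelihood for the gauge observation obs."""
    w = weights ** alpha          # forgetting: flatten weights toward uniform
    w /= w.sum()
    lik = np.exp(-0.5 * ((products - obs) / sigma) ** 2)
    w = w * lik                   # Bayes update with predictive likelihood
    return w / w.sum()

# Three hypothetical product estimates for one day at one gauge (mm/d):
weights = np.array([1/3, 1/3, 1/3])
products = np.array([2.0, 3.5, 8.0])
obs = 3.0                         # gauge observation
weights = dbma_update(weights, products, obs)
fused = float(np.dot(weights, products))   # DBMA-fused precipitation value
```

Products close to the gauge observation gain weight, so the fused value is pulled toward the better-performing datasets over time.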
Classical survival analysis assumes all subjects will experience the event of interest, but in some cases, a portion of the population may never encounter the event. These survival methods further assume independent survival times, which is not valid for honey bees, which live in nests. This study introduces a semi-parametric marginal proportional hazards mixture cure (PHMC) model with an exchangeable correlation structure, using generalized estimating equations for survival data analysis. The model was tested on clustered right-censored bee survival data with a cured fraction, where two bee species were subjected to different entomopathogens to test the effect of the entomopathogens on the survival of the bee species. The Expectation-Solution algorithm is used to estimate the parameters. The study notes weak positive associations between cure statuses (ρ1 = 0.0007) and between survival times for uncured bees (ρ2 = 0.0890), emphasizing their importance. The odds of being uncured are higher for A. mellifera than for the species M. ferruginea. The bee species A. mellifera is more susceptible to the entomopathogens icipe 7, icipe 20, and icipe 69. The Cox-Snell residuals show that the proposed semi-parametric PH model generally fits the data well compared to a model that assumes an independent correlation structure. Thus, the semi-parametric marginal PHMC model is a parsimonious model for correlated bee survival data.
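The defining feature of a mixture cure model, a population survival curve that plateaus at the cure fraction instead of decaying to zero, can be sketched directly. The logistic incidence link and exponential latency below are illustrative stand-ins for the semi-parametric components the paper estimates.

```python
import math

def population_survival(t, x_beta, x_gamma, base_hazard=0.1):
    """Population survival for a PH mixture cure model (sketch):
    S_pop(t) = pi + (1 - pi) * S_uncured(t), where pi is the cure
    probability from a logistic link and the uncured fraction follows
    an exponential proportional-hazards latency distribution."""
    pi_cure = 1.0 / (1.0 + math.exp(x_beta))            # P(cured); x_beta is the incidence linear predictor
    s_uncured = math.exp(-base_hazard * t * math.exp(x_gamma))  # PH latency survival
    return pi_cure + (1.0 - pi_cure) * s_uncured

# As t grows, survival plateaus at the cure fraction rather than 0:
s_late = population_survival(1e6, x_beta=0.0, x_gamma=0.0)
```

With `x_beta = 0` the cure probability is 0.5, so the curve levels off at 0.5 for large `t`; this plateau is what distinguishes cure models from classical survival models.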
The large blast furnace is essential equipment in the process of iron and steel manufacturing. Due to the complex operation process and frequent fluctuations of variables, conventional monitoring methods often produce false alarms. To address this problem, an ensemble of greedy dynamic principal component analysis-Gaussian mixture model (EGDPCA-GMM) is proposed in this paper. First, PCA-GMM is introduced to deal with the collinearity and the non-Gaussian distribution of blast furnace data. Second, in order to capture the dynamics of the data, the greedy algorithm is used to determine the extended variables and their corresponding time lags, so as to avoid introducing unnecessary noise. Then the bagging ensemble is adopted in cooperation with the greedy extension to eliminate the randomness introduced by the greedy algorithm and further reduce the false alarm rate (FAR) of the monitoring results. Finally, the algorithm is applied to the blast furnace of a large iron and steel group in South China to verify its performance. Compared with the basic algorithms, the proposed method achieves the lowest FAR while keeping the missed alarm rate (MAR) stable.
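The dynamic-extension idea, appending lagged copies of selected variables before PCA, can be sketched with NumPy. Here the lagged variables are fixed rather than chosen greedily, and a Hotelling T² statistic stands in for the full GMM-based monitoring index, so this is a simplified illustration of the pipeline, not the paper's algorithm.

```python
import numpy as np

def lag_extend(X, lag_vars, lag=1):
    """Append lagged copies of the selected variables (dynamic extension).
    In the paper the variables and lags are chosen greedily; here they
    are supplied for illustration."""
    base = X[lag:, :]               # current samples
    lagged = X[:-lag, lag_vars]     # lagged copies of the chosen variables
    return np.hstack([base, lagged])

def t2_statistic(X, n_pc=2):
    """Hotelling T^2 in the leading principal subspace (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_pc].T
    var = s[:n_pc] ** 2 / (len(X) - 1)      # variance captured by each PC
    return np.sum(scores ** 2 / var, axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))               # synthetic stand-in for furnace data
Xe = lag_extend(X, lag_vars=[0, 2])         # extend variables 0 and 2 by one lag
t2 = t2_statistic(Xe)                       # monitoring statistic per sample
```

Bagging would repeat this over resampled training sets and vote on alarms, which is how the ensemble suppresses the randomness of the greedy selection.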
The relationship between the water content or saturation of unsaturated soils and the matric suction is commonly described by the soil-water characteristic curve (SWCC). Current studies of SWCC models focus on fine-grained soils like clay and silty soils, while SWCC models for the grinding soil-rock mixture (SRM) are less studied. Considering that the SRM is in a certain compaction state in an actual project, this study established a surface model with the three coupled variables of compaction degree, matric suction, and moisture content based on the Cavalcante-Zornberg soil-water characteristic curve model. Then, the influence of each fitting parameter on the curve was analyzed. For the common SRM, a soil-water characteristic test was conducted, and the experimental measurements exhibit remarkable consistency with the model surface. The analysis shows that the surface model intuitively describes the soil-water characteristics of grinding SRM, and can provide the SWCC of soils with bimodal pore characteristics under specific compaction degrees. Furthermore, it can reflect the influence of compaction degree on the SWCC of rock-soil mass and has a certain predictive capability. The SWCCs of SRM with various soil-rock ratios have a double-step shape. With the increase in compaction degree, the curves as a whole tend toward decreasing mass moisture content, and the curve changes are mainly concentrated in the large-pore section.
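For reference, an exponential Cavalcante-Zornberg-type curve and a bimodal (double-step) extension can be written compactly. The functional forms and parameter values below are assumptions for illustration; the exact surface model coupling compaction degree should be taken from the paper itself.

```python
import math

def swcc_cz(psi, theta_r, theta_s, delta):
    """Cavalcante-Zornberg-type SWCC (sketch): water content decays
    exponentially from saturated theta_s to residual theta_r as the
    matric suction psi increases, at rate delta."""
    return theta_r + (theta_s - theta_r) * math.exp(-delta * psi)

def swcc_bimodal(psi, theta_r, theta_s, w, delta1, delta2):
    """Bimodal (double-step) variant: superpose two pore families with
    weight w, giving the double-step shape reported for SRM curves
    (a fast-draining large-pore step and a slow small-pore step)."""
    step = w * math.exp(-delta1 * psi) + (1 - w) * math.exp(-delta2 * psi)
    return theta_r + (theta_s - theta_r) * step

# Saturated end of the curve (psi = 0) recovers theta_s:
theta_sat = swcc_bimodal(0.0, theta_r=0.05, theta_s=0.35, w=0.6, delta1=0.5, delta2=0.005)
```

Increasing compaction degree could be represented by shrinking `w` and `delta1`, since the paper reports that compaction mainly suppresses the large-pore section of the curve.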
Objective Body fluid mixtures are complex biological samples that frequently occur in crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied in the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented specific methylation profiles of the 10 markers. Results Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significantly strong correlation between the methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1 and 20:1). Two random forest classification models were trained for the prediction of mixture types and the prediction of the mixture proportion of two components, based on the methylation levels of the 10 markers. For the mixture prediction, Model-1 presented outstanding prediction accuracy, which reached up to 99.3% in 427 training samples, and had a remarkable accuracy of 100% in 243 independent test samples. For the mixture proportion prediction, Model-2 demonstrated an excellent accuracy of 98.8% in 252 training samples and 98.2% in 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion These results indicate the excellent capability and powerful value of the multiplex methylation system in the identification of forensic body fluid mixtures.
Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on highway systems. Speed distribution is one of the traffic stream parameters that has been used to quantify traffic conditions. Previous studies have shown that multi-modal probability distributions of speeds give excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these previous analytical studies do not incorporate the influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values of traffic states, which separate them into homogeneous groups, using the Bayesian change-point detection (BCD) technique. The study used archived one-year traffic data from 2015 collected on Florida's Interstate 295 freeway corridor. Information criteria results revealed three traffic states, which were identified as free-flow, transitional flow (congestion onset/offset), and the congested condition. The findings of the DPM-GLM indicated that in all estimated states, the traffic speed decreases when traffic occupancy increases. Comparison of the influence of traffic occupancy between traffic states showed that traffic occupancy has more impact on the free-flow and the congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model revealed promising findings in characterizing levels of traffic congestion.
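The cut-point idea can be illustrated with a deliberately simple frequentist stand-in for the Bayesian change-point step: pick the speed threshold that minimises the within-group sum of squares. This is an assumption-laden sketch on synthetic bimodal speeds, not the paper's BCD model.

```python
import numpy as np

def best_cutpoint(speeds):
    """Single cut-point search (sketch): sort the speeds and choose the
    split minimising the total within-group sum of squares."""
    s = np.sort(speeds)
    best, best_cost = None, np.inf
    for k in range(1, len(s)):
        lo, hi = s[:k], s[k:]
        cost = ((lo - lo.mean()) ** 2).sum() + ((hi - hi.mean()) ** 2).sum()
        if cost < best_cost:
            best, best_cost = 0.5 * (s[k - 1] + s[k]), cost
    return best

# Synthetic two-state speeds (mph): a congested mode and a free-flow mode.
rng = np.random.default_rng(2)
speeds = np.concatenate([rng.normal(30, 3, 150), rng.normal(65, 4, 150)])
cut = best_cutpoint(speeds)   # expected to land between the two modes
```

A three-state version would search over two cut-points jointly, mirroring the free-flow / transitional / congested partition reported in the study.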
Background: Workplace violence (WV) towards psychiatric staff has commonly been associated with Posttraumatic Stress Disorder (PTSD). However, prospective studies have shown that not all psychiatric staff who experience workplace violence develop post-traumatic stress. Purpose: We aimed to examine the longitudinal trajectories of PTSD in this population to identify possible subgroups that might be more at risk, and to investigate whether certain risk factors for PTSD might predict membership in these subgroups. Method: In a sample of psychiatric staff from 18 psychiatric wards in Denmark who had reported an incident of WV, we used Latent Growth Mixture Modelling (LGMM) followed by logistic regression analysis. Results: We found three separate PTSD trajectories: a recovering, a delayed-onset, and a moderate-stable trajectory. Higher social support and negative cognitive appraisals about oneself, the world, and self-blame predicted membership in the delayed-onset trajectory, while higher social support and lower acceptance coping predicted membership in the moderate-stable trajectory. Conclusion: Although most psychiatric staff go through a natural recovery, it is important to be aware of and identify staff members who might be struggling long-term. A greater focus on the factors that might predict these groups should be an important task for psychiatric departments in preventing posttraumatic symptomatology from work.
Cyber losses, in terms of the number of records breached in cyber incidents, commonly feature a significant portion of zeros and distinct characteristics of mid-range and large losses, which make it hard to model the whole range of the losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zeros, moderate and large losses, and account for heterogeneous effects in the mixture components. To apply our proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability to model zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, aggregate loss distributions are investigated, and applications to cyber insurance pricing and risk management are discussed.
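The three-component spliced severity structure can be sketched as a sampler. The mixing probabilities and distributional choices below are illustrative assumptions; a full implementation would truncate the body below the tail threshold and tie the zero probability to covariates, as the paper does.

```python
import random

def sample_spliced(p_zero, body_mu, body_sigma, tail_u, tail_alpha, p_tail, rng):
    """Draw one loss from a three-component spliced model (sketch):
    an atom at zero, a lognormal body, and a Pareto tail above the
    threshold tail_u, mixed with probabilities p_zero / body / p_tail."""
    u = rng.random()
    if u < p_zero:
        return 0.0
    if u < p_zero + p_tail:
        # Pareto(tail_alpha) above tail_u via inverse CDF; 1 - random() is in (0, 1]
        return tail_u * (1.0 - rng.random()) ** (-1.0 / tail_alpha)
    # Lognormal body; with these parameters it lies essentially below tail_u
    return rng.lognormvariate(body_mu, body_sigma)

rng = random.Random(3)
losses = [sample_spliced(0.4, 8.0, 1.0, 1e6, 2.5, 0.05, rng) for _ in range(20_000)]
zero_share = sum(l == 0.0 for l in losses) / len(losses)
```

Convolving draws like these with a frequency model gives the aggregate loss distribution used for pricing; the heavy Pareto tail is what drives the high quantiles.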
Recently, the application of Bayesian updating to predict excavation-induced deformation has proven successful and improved prediction accuracy significantly. However, updating the ground settlement profile, which is crucial for determining potential damage to nearby infrastructure, has received limited attention. To address this, this paper proposes a physics-guided simplified model combined with a Bayesian updating framework to accurately predict the ground settlement profile. The advantage of this model is that it eliminates the need for complex finite element modeling and makes the updating framework user-friendly. Furthermore, the model is physically interpretable, which can provide valuable references for construction adjustments. The effectiveness of the proposed method is demonstrated through two field case studies, showing that it can yield satisfactory predictions of the settlement profile.
The state of in situ stress is a crucial parameter in subsurface engineering, especially for critical projects like nuclear waste repositories. As one of the two ISRM suggested methods, the overcoring (OC) method is widely used to estimate the full stress tensors in rocks by independent regression analysis of the data from each OC test. However, such customary independent analysis of individual OC tests, known as no pooling, is liable to yield unreliable test-specific stress estimates due to the various sources of uncertainty involved in the OC method. To address this problem, a practical and no-cost solution is considered by incorporating into the OC data analysis additional information implied within adjacent OC tests, which are usually available in OC measurement campaigns. Hence, this paper presents a Bayesian partial pooling (hierarchical) model for the combined analysis of adjacent OC tests. We performed five case studies using OC test data from a nuclear waste repository research site in Sweden. The results demonstrate that partial pooling of adjacent OC tests indeed allows borrowing of information across adjacent tests, and yields improved stress tensor estimates with reduced uncertainties simultaneously for all individual tests compared with independent no-pooling analyses, particularly for the unreliable no-pooling stress estimates. A further model comparison shows that the partial pooling model also gives better predictive performance, confirming that the information borrowed across adjacent OC tests is relevant and effective.
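The borrowing of strength that partial pooling provides can be seen in the classic hierarchical shrinkage formula. The variance components and stress values below are assumed for illustration, whereas the paper estimates them from the adjacent tests within a full Bayesian model.

```python
import numpy as np

def partial_pool(means, within_var, between_var):
    """Classic hierarchical shrinkage (sketch): each test-specific mean
    is pulled toward the grand mean, and the pull is stronger when the
    within-test variance is large relative to the between-test variance."""
    means = np.asarray(means, dtype=float)
    grand = means.mean()
    shrink = between_var / (between_var + within_var)   # pooling factor in [0, 1]
    return grand + shrink * (means - grand)

# Five hypothetical stress-component estimates from adjacent OC tests (MPa);
# the last one is an unreliable outlier of the kind no pooling cannot fix.
no_pool = np.array([12.0, 15.0, 9.0, 14.0, 30.0])
pooled = partial_pool(no_pool, within_var=25.0, between_var=16.0)
```

The outlying test is pulled down toward the ensemble while the low estimates are pulled up, which is exactly the behaviour the paper reports for unreliable no-pooling estimates.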
The multi-source passive localization problem is of great interest in signal processing, with many applications. In this paper, a sparse representation model based on the covariance matrix is constructed for the long-range localization scenario, and a sparse Bayesian learning algorithm based on a Laplace prior for the signal covariance is developed for the basis mismatch problem caused by targets deviating from the initial point grid. An adaptive grid sparse Bayesian learning targets localization (AGSBL) algorithm is proposed. The AGSBL algorithm implements covariance-based sparse signal reconstruction and grid-adaptive localization dictionary learning. Simulation results show that the AGSBL algorithm outperforms traditional compressed-sensing localization algorithms for different signal-to-noise ratios and different numbers of targets in long-range scenes.
Bayesian model averaging (BMA) is a popular and powerful statistical method for taking account of uncertainty about model form or assumptions. Usually the long-run (frequentist) performance of the resulting estimator is hard to derive. This paper proposes a mixture of priors and sampling distributions as the basis of a Bayes estimator. The frequentist properties of the new Bayes estimator follow automatically from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily exchange rate of the Euro to the US Dollar.
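For readers unfamiliar with BMA itself, a common finite approximation uses BIC-based weights. The sketch below is a generic illustration of model averaging with invented numbers, not the paper's mixture-of-priors construction.

```python
import numpy as np

def bma_estimate(estimates, bics):
    """BMA point estimate (sketch): approximate posterior model
    probabilities with BIC weights, exp(-BIC/2) normalized across
    models, then average the per-model estimates of the common quantity."""
    bics = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (bics - bics.min()))   # subtract min for numerical stability
    w /= w.sum()
    return float(np.dot(w, estimates)), w

# Three hypothetical models for the same exchange-rate quantity:
est, weights = bma_estimate([1.10, 1.14, 1.25], bics=[100.0, 102.0, 110.0])
```

The averaged estimate lands between the estimates of the well-supported models, while the poorly supported third model contributes almost nothing.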
BACKGROUND The factors affecting the prognosis and the role of adjuvant therapy in advanced gallbladder carcinoma (GBC) after curative resection remain unclear. AIM To provide a survival prediction model for patients with GBC and to identify the role of adjuvant therapy. METHODS Patients with curatively resected advanced gallbladder adenocarcinoma (T3 and T4) were selected from the Surveillance, Epidemiology, and End Results database between 2004 and 2015. A survival prediction model based on a Bayesian network (BN) was constructed using the tree-augmented naïve Bayes algorithm, and composite importance measures were applied to rank the influence of factors on survival. The dataset was randomly divided into a training dataset to establish the BN model and a testing dataset to test the model at a ratio of 7:3. The confusion matrix and receiver operating characteristic curve were used to evaluate the model accuracy. RESULTS A total of 818 patients met the inclusion criteria. The median survival time was 9.0 mo. The accuracy of the BN model was 69.67%, and the area under the curve value for the testing dataset was 77.72%. Adjuvant radiation, adjuvant chemotherapy (CTx), T stage, scope of regional lymph node surgery, and radiation sequence were ranked as the top five prognostic factors. A survival prediction table was established based on T stage, N stage, adjuvant radiotherapy (XRT), and CTx. The distribution of the survival time (>9.0 mo) was affected by the different treatments, in the order adjuvant chemoradiotherapy (cXRT) > adjuvant radiation > adjuvant chemotherapy > surgery alone. For patients with node-positive disease, the model predicted the largest benefit for adjuvant chemoradiotherapy. The survival analysis showed a significant difference among the different adjuvant therapy groups (log rank: surgery alone vs CTx, P < 0.001; surgery alone vs XRT, P = 0.014; surgery alone vs cXRT, P < 0.001). CONCLUSION The BN-based survival prediction model can be used as a decision-making support tool for advanced GBC patients. Adjuvant chemoradiotherapy is expected to improve survival significantly for patients with node-positive disease.
The cluster-based channel model is the mainstream of fifth-generation mobile communications, so the accuracy of the clustering algorithm is important. The traditional Gaussian mixture model (GMM) does not consider the power information, which is important for channel multipath clustering. In this paper, a normalized power-weighted GMM (PGMM) is introduced to model the channel multipath components (MPCs). With MPC power as a weighting factor, the PGMM can fit the MPCs in accordance with cluster-based channel models. First, the expectation maximization (EM) algorithm is employed to optimize the PGMM parameters. Then, to further increase the searching ability of EM and choose the optimal number of components without resorting to cross-validation, variational Bayesian (VB) inference is employed. Finally, 28 GHz indoor channel measurement data are used to demonstrate the effectiveness of the PGMM clustering algorithm.
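One power-weighted EM iteration can be sketched as follows. The exact placement of the normalized power weights in the PGMM updates is an assumption of this sketch (they scale each MPC's contribution to the M-step statistics); the paper's formulation should be consulted for the precise algorithm.

```python
import numpy as np

def e_step_pgmm(X, power, means, covs_inv, covs_det, mix):
    """One illustrative PGMM iteration: standard Gaussian E-step
    responsibilities, then an M-step mean update in which each MPC's
    contribution is scaled by its normalized power, so strong paths
    dominate the cluster fits."""
    n, k, dim = len(X), len(mix), X.shape[1]
    resp = np.zeros((n, k))
    for j in range(k):
        d = X - means[j]
        quad = np.einsum('ni,ij,nj->n', d, covs_inv[j], d)   # Mahalanobis terms
        resp[:, j] = mix[j] * np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** dim * covs_det[j])
    resp /= resp.sum(axis=1, keepdims=True)       # normalize responsibilities
    w = power / power.sum()                       # normalized MPC power weights
    nk = (w[:, None] * resp).sum(axis=0)          # power-weighted effective counts
    new_means = (w[:, None] * resp).T @ X / nk[:, None]
    return resp, new_means

# Two synthetic MPC clusters in a 2-D parameter space (e.g. delay/angle):
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
power = np.ones(100)                              # equal powers for this demo
means = np.array([[0.5, 0.5], [2.5, 2.5]])        # deliberately offset initial means
covs_inv = np.array([np.eye(2), np.eye(2)])
covs_det = np.array([1.0, 1.0])
resp, new_means = e_step_pgmm(X, power, means, covs_inv, covs_det, np.array([0.5, 0.5]))
```

With unequal powers, high-power MPCs would pull the updated means toward themselves; VB inference would additionally prune redundant components.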
The stability of soil-rock mixtures (SRMs), which are widely distributed in slopes, is of significant concern for slope safety evaluation and disaster prevention. The failure behavior of SRM slopes under surface loading conditions was investigated through a series of centrifuge model tests considering various volumetric gravel contents. The displacement field of the slope was determined with an image-based displacement measurement system to observe the deformation of the soil and the movement of the blocks during loading. The test results showed that the ultimate bearing capacity and the stiffness of SRM slopes increased evidently when the volumetric block content exceeded a threshold value. Moreover, more evident slips occurred around the blocks in the SRM slope. The microscopic analysis of block motion showed that the rotation of the blocks could aggravate the deformation localization and facilitate the development of the slip surface. The high correlation between the rotation of the key blocks and the slope failure indicated that the blocks became the dominant load-bearing medium influencing slope failure. The blocks in the sliding body formed a chain to bear the load and changed the displacement distribution of the adjacent matrix sand through block rotation.
Funding: Supported by the Chinese Nursing Association (No. ZHKY202111); the Scientific Research Program of the School of Nursing, Chongqing Medical University (No. 20230307); and the Chongqing Science and Health Joint Medical Research Program (No. 2024MSXM063).
Abstract: BACKGROUND Portal hypertension (PHT), primarily induced by cirrhosis, manifests severe symptoms impacting patient survival. Although transjugular intrahepatic portosystemic shunt (TIPS) is a critical intervention for managing PHT, it carries risks like hepatic encephalopathy, thus affecting patient survival prognosis. To our knowledge, existing prognostic models for post-TIPS survival in patients with PHT fail to account for the interplay among, and collective impact of, various prognostic factors on outcomes. Consequently, the development of an innovative modeling approach is essential to address this limitation. AIM To develop and validate a Bayesian network (BN)-based survival prediction model for patients with cirrhosis-induced PHT who have undergone TIPS. METHODS The clinical data of 393 patients with cirrhosis-induced PHT who underwent TIPS surgery at the Second Affiliated Hospital of Chongqing Medical University between January 2015 and May 2022 were retrospectively analyzed. Variables were selected using Cox and least absolute shrinkage and selection operator regression methods, and a BN-based model was established and evaluated to predict survival in patients who had undergone TIPS surgery for PHT. RESULTS Variable selection revealed the following as key factors impacting survival: age, ascites, hypertension, indications for TIPS, postoperative portal vein pressure (post-PVP), aspartate aminotransferase, alkaline phosphatase, total bilirubin, prealbumin, the Child-Pugh grade, and the model for end-stage liver disease (MELD) score. Based on the above-mentioned variables, a BN-based 2-year survival prognostic prediction model was constructed, which identified the following factors as directly linked to survival time: age, ascites, indications for TIPS, concurrent hypertension, post-PVP, the Child-Pugh grade, and the MELD score. The Bayesian information criterion was 3589.04, and 10-fold cross-validation indicated an average log-likelihood loss of 5.55 with a standard deviation of 0.16. The model's accuracy, precision, recall, and F1 score were 0.90, 0.92, 0.97, and 0.95, respectively, with the area under the receiver operating characteristic curve being 0.72. CONCLUSION This study successfully developed a BN-based survival prediction model with good predictive capabilities. It offers valuable insights for treatment strategies and prognostic evaluations in patients who have undergone TIPS surgery for PHT.
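The four headline metrics reported above all derive from a confusion matrix. As a minimal sketch (the counts below are illustrative values chosen to be consistent with the reported metrics, not the study's actual data):

```python
# Classification metrics as reported for a BN survival model: accuracy,
# precision, recall (sensitivity), and F1 from confusion-matrix counts.
# The counts are illustrative, not taken from the study.

def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=97, fp=8, fn=3, tn=2)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))
# → 0.9 0.92 0.97 0.95
```

Note that a model can score well on these threshold-based metrics while having a lower AUC (0.72 here), since AUC measures ranking quality across all thresholds.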
Funding: Supported by the Technology Innovation Team (Tianshan Innovation Team), Innovative Team for Efficient Utilization of Water Resources in Arid Regions (2022TSYCTD0001); the National Natural Science Foundation of China (42171269); and the Xinjiang Academician Workstation Cooperative Research Project (2020.B-001).
Abstract: Xinjiang Uygur Autonomous Region is a typical inland arid area in China with a sparse and uneven distribution of meteorological stations, limited access to precipitation data, and significant water scarcity. Evaluating and integrating precipitation datasets from different sources to accurately characterize precipitation patterns has therefore become a challenge; meeting it would provide more accurate, alternative precipitation information for the region and could even improve the performance of hydrological modelling. This study evaluated the applicability of five widely used satellite-based precipitation products (Climate Hazards Group InfraRed Precipitation with Station (CHIRPS), China Meteorological Forcing Dataset (CMFD), Climate Prediction Center morphing method (CMORPH), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA)) and a reanalysis precipitation dataset (ECMWF Reanalysis v5-Land Dataset (ERA5-Land)) in Xinjiang, using ground-based observational precipitation data from a limited number of meteorological stations. Based on this assessment, we proposed a framework that integrated the different precipitation datasets with varying spatial resolutions using a dynamic Bayesian model averaging (DBMA) approach, the expectation-maximization method, and ordinary Kriging interpolation. The daily precipitation data merged using the DBMA approach exhibited distinct spatiotemporal variability and outstanding performance, as indicated by a low root mean square error (RMSE = 1.40 mm/d) and a high Pearson's correlation coefficient (CC = 0.67). Compared with traditional simple model averaging (SMA) and the individual product data, although the DBMA-fused precipitation data performed slightly below the best precipitation product (CMFD), the overall performance of DBMA was more robust. Error analysis between the DBMA-fused precipitation dataset and the more advanced Integrated Multi-satellite Retrievals for Global Precipitation Measurement Final (IMERG-F) precipitation product, as well as hydrological simulations in the Ebinur Lake Basin, further demonstrated the superior performance of the DBMA-fused precipitation dataset across the entire Xinjiang region. The proposed framework for fusing multi-source precipitation data with different spatial resolutions is feasible for application in inland arid areas, and aids in obtaining more accurate regional hydrological information and improving regional water resources management capabilities and meteorological research in these regions.
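The core of dynamic Bayesian model averaging is a recursive weight update: each product's weight is raised by a forgetting factor and then multiplied by its likelihood at the gauge observations. A minimal sketch, assuming Gaussian member errors (the member values, forgetting factor, and error scale are illustrative assumptions, not values from the study):

```python
import numpy as np

# Sketch of one DBMA weight-update step: forgetting (exponent alpha) keeps
# weights adaptive over time; the Gaussian likelihood rewards the product
# closest to the gauge observation. All numbers below are illustrative.

def dbma_update(weights, members, obs, sigma=1.0, alpha=0.95):
    w = weights ** alpha          # discount old evidence
    w /= w.sum()
    lik = np.exp(-0.5 * ((members - obs) / sigma) ** 2)
    w = w * lik                   # Bayes update against the gauge value
    return w / w.sum()

w = np.full(3, 1 / 3)             # e.g. CMFD, ERA5-Land, TMPA at one gauge
for members, obs in [(np.array([2.1, 3.0, 0.2]), 2.0),
                     (np.array([1.4, 2.5, 0.1]), 1.5)]:
    w = dbma_update(w, members, obs)
print(w)   # weight accumulates on the member closest to the gauges
```

The fused estimate at each grid cell is then the weight-averaged sum of the member products, with weights varying in time (hence "dynamic").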
Abstract: Classical survival analysis assumes all subjects will experience the event of interest, but in some cases a portion of the population may never encounter the event. These survival methods further assume independent survival times, which is not valid for honey bees, which live in nests. The study introduces a semi-parametric marginal proportional hazards mixture cure (PHMC) model with an exchangeable correlation structure, using generalized estimating equations for survival data analysis. The model was tested on clustered right-censored bee survival data with a cured fraction, where two bee species were subjected to different entomopathogens to test the effect of the entomopathogens on the survival of the bee species. The Expectation-Solution algorithm is used to estimate the parameters. The study notes a weak positive association between cure statuses (ρ1 = 0.0007) and between survival times for uncured bees (ρ2 = 0.0890), emphasizing their importance. The odds of being uncured for A. mellifera are higher than the odds for the species M. ferruginea. The bee species A. mellifera is more susceptible to the entomopathogens icipe 7, icipe 20, and icipe 69. The Cox-Snell residuals show that the proposed semi-parametric PH model generally fits the data well compared to the model that assumes an independent correlation structure. Thus, the semi-parametric marginal proportional hazards mixture cure model is a parsimonious model for correlated bee survival data.
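The defining feature of a mixture cure model is that population survival decomposes as S(t) = π + (1 − π)·Su(t), where π is the cured fraction and Su the survival of the uncured. A minimal sketch, taking Su exponential purely for illustration (the parameter values are hypothetical, not estimates from the bee data):

```python
import math

# Mixture cure survival: a cured fraction `pi_cure` never experiences the
# event, so the population survival curve plateaus at pi_cure instead of
# decaying to zero. The exponential uncured survival is an illustrative choice.

def mixture_cure_survival(t, pi_cure, hazard_uncured):
    s_uncured = math.exp(-hazard_uncured * t)
    return pi_cure + (1 - pi_cure) * s_uncured

print(mixture_cure_survival(0.0, 0.3, 0.5))    # 1.0 at time zero
print(mixture_cure_survival(100.0, 0.3, 0.5))  # plateaus near the cured fraction 0.3
```

This plateau is what standard proportional hazards models cannot capture, and why a cured fraction must be modeled explicitly.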
Funding: Supported by the National Natural Science Foundation of China (61903326, 61933015).
Abstract: The large blast furnace is essential equipment in the process of iron and steel manufacturing. Due to the complex operation process and frequent fluctuations of variables, conventional monitoring methods often raise false alarms. To address this problem, an ensemble of greedy dynamic principal component analysis-Gaussian mixture model (EGDPCA-GMM) is proposed in this paper. First, PCA-GMM is introduced to deal with the collinearity and the non-Gaussian distribution of blast furnace data. Second, in order to capture the dynamics of the data, a greedy algorithm is used to determine the extended variables and their corresponding time lags, so as to avoid introducing unnecessary noise. Then, bagging ensembles are adopted in cooperation with the greedy extension to eliminate the randomness introduced by the greedy algorithm and further reduce the false alarm rate (FAR) of the monitoring results. Finally, the algorithm is applied to the blast furnace of a large iron and steel group in South China to verify its performance. Compared with the baseline algorithms, the proposed method achieves the lowest FAR while keeping the missed alarm rate (MAR) stable.
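The "dynamic" part of dynamic PCA is the augmentation of each sample with time-lagged copies of selected variables before the PCA decomposition. The sketch below fixes the lag set by hand for illustration (in the paper a greedy search chooses which variables and lags to add) and monitors with a Hotelling's T² statistic in the retained subspace:

```python
import numpy as np

# Sketch of lagged-matrix construction plus PCA monitoring. The choice of
# which variables/lags to extend is hard-coded here; the paper's greedy
# algorithm would select it. Data are synthetic.

def lagged_matrix(X, lags):
    """Stack X with time-lagged copies; `lags` maps column index -> max lag."""
    max_lag = max(lags.values())
    blocks = [X[max_lag:]]
    for col, lag in lags.items():
        for k in range(1, lag + 1):
            blocks.append(X[max_lag - k:len(X) - k, [col]])
    return np.hstack(blocks)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Xd = lagged_matrix(X, {0: 2, 2: 1})      # add var 0 at lags 1-2, var 2 at lag 1
Xc = Xd - Xd.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                   # retain 2 principal components
t2 = np.sum((scores / (s[:2] / np.sqrt(len(Xc) - 1))) ** 2, axis=1)
print(Xd.shape, t2.mean())               # T^2 averages near the PC count
```

In EGDPCA-GMM the Gaussian-mixture step then replaces the single-Gaussian T² control limit, since blast furnace data are non-Gaussian.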
Funding: Funded by the Science and Technology Research Program of Chongqing Municipal Education Commission (grant number KJZD-K202100705) and the Talents Program Supply System of Chongqing (grant number cstc2022ycjhbgzxm0080).
Abstract: The relationship between the water content or saturation of unsaturated soils and their matrix suction is commonly described by the soil-water characteristic curve (SWCC). Currently, study of SWCC models has focused on fine-grained soils such as clay and silty soils, while SWCC models for the grinding soil-rock mixture (SRM) are less studied. Considering that SRM is in a certain compaction state in actual projects, this study established a surface model with three coupled variables, namely compaction degree, matrix suction, and moisture content, based on the Cavalcante-Zornberg soil-water characteristic curve model. The influence of each fitting parameter on the curve was then analyzed. For the common SRM, a soil-water characteristic test was conducted, and the experimental measurements exhibit remarkable consistency with the model surface. The analysis shows that the surface model intuitively describes the soil-water characteristics of grinding SRM, and can provide the SWCC of soils with bimodal pore characteristics under specific compaction degrees. Furthermore, it can reflect the influence of compaction degree on the SWCC of rock-soil mass and has a certain predictive capability. The SWCCs of SRM with various soil-rock ratios have a double-step shape. With increasing compaction degree, the curves as a whole tend toward decreasing mass moisture content, with the changes mainly concentrated in the large-pore section.
Funding: Supported by grants from the Natural Science Foundation of Hubei Province (No. 2020CFB780) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ020).
Abstract: Objective Body fluid mixtures are complex biological samples that frequently occur at crime scenes and can provide important clues for criminal case analysis. DNA methylation assays have been applied to the identification of human body fluids and have exhibited excellent performance in predicting single-source body fluids. The present study aims to develop a methylation SNaPshot multiplex system for body fluid identification and to accurately predict mixture samples. In addition, the value of DNA methylation in the prediction of body fluid mixtures was further explored. Methods In the present study, 420 samples of body fluid mixtures and 250 samples of single body fluids were tested using an optimized multiplex methylation system. Each kind of body fluid sample presented specific methylation profiles of the 10 markers. Results Significant differences in methylation levels were observed between the mixtures and single body fluids. For all kinds of mixtures, Spearman's correlation analysis revealed a significantly strong correlation between the methylation levels and component proportions (1:20, 1:10, 1:5, 1:1, 5:1, 10:1 and 20:1). Two random forest classification models were trained, one for the prediction of mixture types and one for the prediction of the mixture proportion of two components, based on the methylation levels of the 10 markers. For mixture prediction, Model-1 presented outstanding prediction accuracy, reaching 99.3% on 427 training samples and a remarkable 100% on 243 independent test samples. For mixture proportion prediction, Model-2 demonstrated an excellent accuracy of 98.8% on 252 training samples and 98.2% on 168 independent test samples. The total prediction accuracy reached 99.3% for body fluid mixtures and 98.6% for the mixture proportions. Conclusion These results indicate the excellent capability and powerful value of the multiplex methylation system in the identification of forensic body fluid mixtures.
Abstract: Accurate classification and prediction of future traffic conditions are essential for developing effective congestion-mitigation strategies on highway systems. Speed distribution is one of the traffic stream parameters used to quantify traffic conditions. Previous studies have shown that multi-modal probability distributions of speed give excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these analytical studies do not incorporate influencing factors in characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study estimates the speed cut-point values of traffic states, which separate them into homogeneous groups, using the Bayesian change-point detection (BCD) technique. The study used one year of archived 2015 traffic data collected on Florida's Interstate 295 freeway corridor. Information criteria results revealed three traffic states, identified as free-flow, transitional flow (congestion onset/offset), and congested conditions. The DPM-GLM findings indicated that, in all estimated states, traffic speed decreases as traffic occupancy increases. Comparison of the influence of traffic occupancy between traffic states showed that occupancy has more impact on the free-flow and congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model revealed promising findings in characterizing levels of traffic congestion.
Abstract: Background: Workplace violence (WV) towards psychiatric staff has commonly been associated with posttraumatic stress disorder (PTSD). However, prospective studies have shown that not all psychiatric staff who experience workplace violence develop post-traumatic stress. Purpose: We examine the longitudinal trajectories of PTSD in this population to identify possible subgroups that might be more at risk, and investigate whether certain risk factors of PTSD might identify membership in these subgroups. Method: In a sample of psychiatric staff from 18 psychiatric wards in Denmark who had reported an incident of WV, we used Latent Growth Mixture Modelling (LGMM) and subsequent logistic regression analysis. Results: We found three separate PTSD trajectories: a recovering, a delayed-onset, and a moderate-stable trajectory. Higher social support and negative cognitive appraisals about oneself, the world, and self-blame predicted membership in the delayed-onset trajectory, while higher social support and lower acceptance coping predicted membership in the moderate-stable trajectory. Conclusion: Although most psychiatric staff go through a natural recovery, it is important to be aware of and identify staff members who might be struggling long-term. More focus on the factors that might predict these groups should be an important task for psychiatric departments in preventing posttraumatic symptomatology from work.
Abstract: Cyber losses, in terms of the number of records breached in cyber incidents, commonly feature a significant portion of zeros and distinct characteristics of mid-range and large losses, which make it hard to model the whole range of the losses using a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zero, moderate, and large losses and accommodate heterogeneous effects in the mixture components. To apply our proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis, and utilize a covariate-dependent probability to model zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combined with our frequency model (a generalized linear mixed model) for data breaches, aggregate loss distributions are investigated and applications to cyber insurance pricing and risk management are discussed.
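The three-component splice can be sketched as a simulator: a point mass at zero, a lognormal body capped at a splicing threshold, and a Pareto tail above it. All parameter values below are illustrative assumptions, not estimates fitted to the PRC data:

```python
import numpy as np

# Sketch of a three-component spliced severity distribution for cyber losses:
# zeros with probability p_zero, a lognormal body, and a Pareto tail above
# `threshold` with probability p_tail. Parameters are hypothetical.

def sample_spliced(n, p_zero=0.4, p_tail=0.1, mu=8.0, sigma=1.5,
                   threshold=1e5, tail_alpha=1.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.uniform(size=n)
    out = np.zeros(n)                      # component 1: zero losses
    body = (u >= p_zero) & (u < 1 - p_tail)
    tail = u >= 1 - p_tail
    # component 2: lognormal body, truncated at the splicing threshold
    out[body] = np.minimum(rng.lognormal(mu, sigma, body.sum()), threshold)
    # component 3: Pareto tail via inverse-CDF sampling above the threshold
    out[tail] = threshold * (1 - rng.uniform(size=tail.sum())) ** (-1 / tail_alpha)
    return out

losses = sample_spliced(10_000)
print((losses == 0).mean(), (losses > 1e5).mean())  # near p_zero and p_tail
```

In the paper's regression version, p_zero is covariate-dependent, the body is a finite mixture, and the tail index is estimated by extreme value methods; this sketch only illustrates the splicing structure itself.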
Funding: The authors acknowledge financial support from the Guangdong Provincial Department of Science and Technology (Grant No. 2022A0505030019) and the Science and Technology Development Fund, Macao SAR, China (File Nos. 0056/2023/RIB2 and SKL-IOTSC-2021-2023).
Abstract: Recently, the application of Bayesian updating to predict excavation-induced deformation has proven successful and has significantly improved prediction accuracy. However, updating the ground settlement profile, which is crucial for determining potential damage to nearby infrastructure, has received limited attention. To address this, this paper proposes a physics-guided simplified model combined with a Bayesian updating framework to accurately predict the ground settlement profile. The advantage of this model is that it eliminates the need for complex finite element modeling and makes the updating framework user-friendly. Furthermore, the model is physically interpretable, which can provide valuable references for construction adjustments. The effectiveness of the proposed method is demonstrated through two field case studies, showing that it can yield satisfactory predictions of the settlement profile.
Funding: Supported by the Guangdong Basic and Applied Basic Research Foundation (2023A1515011244).
Abstract: The state of in situ stress is a crucial parameter in subsurface engineering, especially for critical projects like nuclear waste repositories. As one of the two ISRM-suggested methods, the overcoring (OC) method is widely used to estimate full stress tensors in rocks by independent regression analysis of the data from each OC test. However, such customary independent analysis of individual OC tests, known as no pooling, is liable to yield unreliable test-specific stress estimates due to the various sources of uncertainty involved in the OC method. To address this problem, a practical and no-cost solution is considered: incorporating into the OC data analysis additional information implied within adjacent OC tests, which are usually available in OC measurement campaigns. Hence, this paper presents a Bayesian partial pooling (hierarchical) model for the combined analysis of adjacent OC tests. We performed five case studies using OC test data from a nuclear waste repository research site in Sweden. The results demonstrate that partial pooling of adjacent OC tests indeed allows borrowing of information across adjacent tests, and yields improved stress tensor estimates with simultaneously reduced uncertainties for all individual tests compared with independent no-pooling analysis, particularly for the unreliable no-pooling stress estimates. A further model comparison shows that the partial pooling model also gives better predictive performance, confirming that the information borrowed across adjacent OC tests is relevant and effective.
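The shrinkage mechanism behind partial pooling can be shown with a minimal empirical-Bayes sketch for a single stress component: each test-specific estimate is pulled toward the across-test mean in proportion to its sampling noise, so the noisiest tests borrow the most information. The numbers below are illustrative, not values from the Swedish site data, and the full model pools entire stress tensors rather than one scalar:

```python
import numpy as np

# Empirical-Bayes partial pooling under the normal-normal model:
# y_i ~ N(theta_i, se_i^2), theta_i ~ N(mu, tau^2). The posterior mean of
# theta_i is a precision-weighted compromise between y_i and mu.

def partial_pool(means, se, tau):
    mu = np.mean(means)                    # plug-in estimate of the group mean
    shrink = tau**2 / (tau**2 + se**2)     # weight on each test's own data
    return shrink * means + (1 - shrink) * mu

y = np.array([12.0, 15.5, 9.8, 14.1])      # no-pooling estimates (say, MPa)
se = np.array([2.0, 2.0, 4.0, 2.0])        # larger se -> stronger shrinkage
theta = partial_pool(y, se, tau=2.5)
print(theta)   # each estimate moves toward the grand mean, test 3 the most
```

Setting tau to infinity recovers no pooling (each test stands alone); tau = 0 gives complete pooling (one shared estimate); the hierarchical model estimates tau from the data and lands in between.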
Abstract: The multi-source passive localization problem is a problem of great interest in signal processing with many applications. In this paper, a sparse representation model based on the covariance matrix is constructed for the long-range localization scenario, and a sparse Bayesian learning algorithm based on a Laplace prior on the signal covariance is developed for the basis mismatch problem caused by targets deviating from the initial grid points. An adaptive grid sparse Bayesian learning target localization (AGSBL) algorithm is proposed. The AGSBL algorithm implements covariance-based sparse signal reconstruction and grid-adaptive localization dictionary learning. Simulation results show that the AGSBL algorithm outperforms traditional compressed-sensing localization algorithms across different signal-to-noise ratios and different numbers of targets in long-range scenes.
Abstract: Bayesian model averaging (BMA) is a popular and powerful statistical method for accounting for uncertainty about model form or assumptions. Usually the long-run (frequentist) performance of the resulting estimator is hard to derive. This paper proposes a mixture of priors and sampling distributions as the basis of a Bayes estimator. The frequentist properties of the new Bayes estimator follow automatically from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily Euro to US Dollar exchange rate.
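A standard BMA point estimate averages the competing models' estimates using approximate posterior model probabilities; BIC-based weights are a common large-sample approximation. The per-model estimates and BIC values below are illustrative, not results from the exchange-rate application:

```python
import numpy as np

# Sketch of a BMA point estimate: exp(-BIC/2) approximates the marginal
# likelihood of each model (up to a constant), giving approximate posterior
# model probabilities under equal prior model weights. Numbers are illustrative.

def bma_estimate(estimates, bics):
    d = np.asarray(bics) - np.min(bics)     # subtract min for numerical safety
    w = np.exp(-0.5 * d)
    w /= w.sum()                            # approximate model probabilities
    return w, float(np.dot(w, estimates))

w, est = bma_estimate(estimates=[1.082, 1.079, 1.091],   # e.g. EUR/USD forecasts
                      bics=[210.3, 212.1, 215.8])
print(w, est)   # the averaged estimate leans toward the lowest-BIC model
```

The paper's point is that when the averaging is instead built into a single mixture prior and sampling distribution, the frequentist risk of the resulting estimator follows directly from Bayesian decision theory, which plain BMA weights do not provide.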
Funding: Supported by the National Natural Science Foundation of China, No. 81572420 and No. 71871181; the Key Research and Development Program of Shaanxi Province, No. 2017ZDXM-SF-055; and the Multicenter Clinical Research Project of the School of Medicine, Shanghai Jiaotong University, No. DLY201807.
Abstract: BACKGROUND The factors affecting the prognosis, and the role of adjuvant therapy, in advanced gallbladder carcinoma (GBC) after curative resection remain unclear. AIM To provide a survival prediction model for patients with GBC and to identify the role of adjuvant therapy. METHODS Patients with curatively resected advanced gallbladder adenocarcinoma (T3 and T4) were selected from the Surveillance, Epidemiology, and End Results database between 2004 and 2015. A survival prediction model based on a Bayesian network (BN) was constructed using the tree-augmented naïve Bayes algorithm, and composite importance measures were applied to rank the influence of factors on survival. The dataset was randomly divided into a training dataset to establish the BN model and a testing dataset to test the model, at a ratio of 7:3. The confusion matrix and receiver operating characteristic curve were used to evaluate model accuracy. RESULTS A total of 818 patients met the inclusion criteria. The median survival time was 9.0 mo. The accuracy of the BN model was 69.67%, and the area under the curve value for the testing dataset was 77.72%. Adjuvant radiation, adjuvant chemotherapy (CTx), T stage, scope of regional lymph node surgery, and radiation sequence were ranked as the top five prognostic factors. A survival prediction table was established based on T stage, N stage, adjuvant radiotherapy (XRT), and CTx. The distribution of survival times (>9.0 mo) was affected by the different treatments, in the order adjuvant chemoradiotherapy (cXRT) > adjuvant radiation > adjuvant chemotherapy > surgery alone. For patients with node-positive disease, the larger benefit predicted by the model is from adjuvant chemoradiotherapy. The survival analysis showed a significant difference among the different adjuvant therapy groups (log rank, surgery alone vs CTx, P < 0.001; surgery alone vs XRT, P = 0.014; surgery alone vs cXRT, P < 0.001). CONCLUSION The BN-based survival prediction model can be used as a decision-making support tool for advanced GBC patients. Adjuvant chemoradiotherapy is expected to significantly improve survival for patients with node-positive disease.
Funding: Supported by the National Science and Technology Major Program of the Ministry of Science and Technology (No. 2018ZX03001031); the Key Program of the Beijing Municipal Natural Science Foundation (No. L172030); the Beijing Municipal Science & Technology Commission Project (No. Z171100005217001); the Key Project of the State Key Lab of Networking and Switching Technology (NST20170205); and the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2012BAF14B01).
Abstract: Cluster-based channel models are the mainstream in fifth-generation mobile communications; thus, the accuracy of the clustering algorithm is important. The traditional Gaussian mixture model (GMM) does not consider the power information, which is important for channel multipath clustering. In this paper, a normalized power-weighted GMM (PGMM) is introduced to model the channel multipath components (MPCs). With MPC power as a weighting factor, the PGMM can fit the MPCs in accordance with cluster-based channel models. First, the expectation-maximization (EM) algorithm is employed to optimize the PGMM parameters. Then, to further increase the searching ability of EM and to choose the optimal number of components without resorting to cross-validation, variational Bayesian (VB) inference is employed. Finally, 28 GHz indoor channel measurement data are used to demonstrate the effectiveness of the PGMM clustering algorithm.
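The power weighting enters EM by scaling each MPC's responsibilities with its normalized power, so strong paths dominate the mixture statistics. A minimal 1-D, two-component sketch on synthetic data (delays and powers are made up, not the 28 GHz measurements; the paper's VB model-selection step is omitted):

```python
import numpy as np

# Sketch of power-weighted EM for a 1-D two-component GMM over MPC delays:
# responsibilities are multiplied by normalized MPC power before the M-step,
# so mixture weights reflect power shares rather than path counts.

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def pgmm_em(delays, power, n_iter=50):
    p = power / power.sum()                        # normalized power weights
    mu = np.array([delays.min(), delays.max()])
    var = np.array([np.var(delays)] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        dens = pi * np.stack([gauss(delays, m, v) for m, v in zip(mu, var)], 1)
        r = dens / dens.sum(1, keepdims=True)      # E-step: responsibilities
        rw = r * p[:, None]                        # scale by MPC power
        nk = rw.sum(0)
        mu = (rw * delays[:, None]).sum(0) / nk    # M-step: weighted updates
        var = (rw * (delays[:, None] - mu) ** 2).sum(0) / nk
        pi = nk / nk.sum()
    return mu, var, pi

rng = np.random.default_rng(1)
delays = np.concatenate([rng.normal(10, 1, 40), rng.normal(30, 2, 40)])
power = np.concatenate([np.full(40, 1.0), np.full(40, 0.2)])
mu, var, pi = pgmm_em(delays, power)
print(mu, pi)   # equal path counts, but pi follows the power shares
```

With equal numbers of paths per cluster, an unweighted GMM would give pi near [0.5, 0.5]; the power weighting instead assigns the first cluster roughly its power share.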
Funding: Supported by the National Key R&D Program of China (2018YFC1508503).
Abstract: The stability of soil-rock mixtures (SRMs), which are widely distributed in slopes, is of significant concern for slope safety evaluation and disaster prevention. The failure behavior of SRM slopes under surface loading was investigated through a series of centrifuge model tests considering various volumetric gravel contents. The displacement field of the slope was determined with an image-based displacement measurement system to observe the deformation of the soil and the movement of the blocks during loading. The test results showed that the ultimate bearing capacity and the stiffness of SRM slopes increased evidently when the volumetric block content exceeded a threshold value. Moreover, there were more evident slips around the blocks in the SRM slope. The microscopic analysis of the block motion showed that the rotation of the blocks could aggravate deformation localization and facilitate the development of the slip surface. The high correlation between the rotation of the key blocks and the slope failure indicated that the blocks became the dominant load-bearing medium influencing slope failure. The blocks in the sliding body formed a chain to bear the load and changed the displacement distribution of the adjacent matrix sand through block rotation.