Funding: Funded by the National Pathogen Identification Network project and Research on Key Technologies of Intelligent Monitoring, Early Warning and Tracing of Infectious Diseases in Miyun.
Abstract: Objective: To explore the genotyping characteristics of human fecal Escherichia coli (E. coli) and the relationships between antibiotic resistance genes (ARGs) and multidrug resistance (MDR) of E. coli in Miyun District, Beijing, an area with a high incidence of infectious diarrheal cases but no related data. Methods: Over a period of 3 years, 94 E. coli strains were isolated from fecal samples collected from Miyun District Hospital, a surveillance hospital of the National Pathogen Identification Network. The antibiotic susceptibility of the isolates was determined by the broth microdilution method. ARGs, multilocus sequence typing (MLST), and polymorphism trees were analyzed using whole-genome sequencing (WGS) data. Results: MDR was found in 68.09% of the isolates; it was prevalent and distributed across different clades, with a relatively high rate and low pathogenicity. There was no difference in MDR between the diarrheal (49/70) and healthy (15/24) groups. Conclusion: We developed a random forest (RF) prediction model based on TEM.1 + baeR + mphA + mphB + QnrS1 + AAC.3-IId to identify MDR status, highlighting its potential for early resistance identification. MDR is likely caused by mobile units transmitting the ARGs. In the future, we will continue to strengthen the monitoring of ARGs and MDR, and increase the number of strains to further verify the accuracy of the MDR markers.
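The group comparison reported above (49/70 versus 15/24) can be checked with a short calculation. The sketch below applies a Pearson chi-square test without continuity correction to those counts. The abstract does not say which test the authors used, so the choice of test and the helper name `chi_square_2x2` are assumptions made for illustration only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts taken from the abstract: MDR isolates out of all isolates per group.
diarrheal_mdr, diarrheal_total = 49, 70
healthy_mdr, healthy_total = 15, 24

# Assumed test (not stated in the abstract): chi-square on MDR vs non-MDR counts.
stat, p_value = chi_square_2x2(
    diarrheal_mdr, diarrheal_total - diarrheal_mdr,
    healthy_mdr, healthy_total - healthy_mdr,
)
overall_rate = (diarrheal_mdr + healthy_mdr) / (diarrheal_total + healthy_total)

print(f"overall MDR rate: {overall_rate:.2%}")
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Under these assumptions the pooled rate reproduces the reported 68.09% (64/94), the statistic is small (about 0.46), and the p-value is roughly 0.5, which is consistent with the statement that the two groups did not differ.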
Abstract: Survival data with a multi-state structure are frequently observed in follow-up studies. An analytic approach based on a multi-state model (MSM) should be used in longitudinal health studies in which a patient experiences a sequence of clinical progression events. One main objective in the MSM framework is variable selection, where attempts are made to identify the risk factors associated with the transition hazard rates or probabilities of disease progression. The usual variable selection methods, including stepwise and penalized methods, do not provide information about the importance of variables. In this context, we present a two-step algorithm to evaluate the importance of variables for multi-state data. Three different machine learning approaches (random forest, gradient boosting, and neural network), as the most widely used methods, are considered to estimate the variable importance in order to identify the factors affecting disease progression and rank these factors according to their importance. The performance of our proposed methods is validated by simulation and applied to the COVID-19 data set. The results revealed that the proposed two-stage method has promising performance for estimating variable importance.
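To make the idea of ranking variables by importance concrete, here is a minimal sketch of permutation importance, a general model-agnostic technique. It is not the two-step algorithm of the paper, whose details the abstract does not give, and it uses a plain classifier rather than a multi-state model. The data, the stand-in `toy_model`, and all function names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (hypothetical): the label depends only on feature 0.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [r[0] > 0.5 for r in rows]

def toy_model(row):
    # Stand-in for a fitted model; it uses feature 0 and ignores feature 1.
    return row[0] > 0.5

importances = permutation_importance(toy_model, rows, labels)
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(importances, ranking)
```

In this toy setting the ignored feature gets an importance of zero and the informative feature ranks first. In practice the `predict` argument would be a model fitted by random forest, gradient boosting, or a neural network.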
Abstract: Magnetic Resonance Imaging (MRI) is one of the important resources for identifying abnormalities in the human brain. This work proposes an effective Multi-Class Classification (MCC) system using Binary Robust Invariant Scalable Keypoints (BRISK) as texture descriptors for effective classification. At first, the potential Regions Of Interest (ROIs) are detected using the Features from Accelerated Segment Test (FAST) algorithm. Then, non-maxima suppression is employed in scale space based on the information in the ROIs. The discriminating power of BRISK is examined using three machine learning classifiers: k-Nearest Neighbour (kNN), Support Vector Machine (SVM) and Random Forest (RF). An MCC system is developed which classifies the MRI images into normal, glioma, meningioma and pituitary. A total of 3264 MRI brain images are employed in this study to evaluate the proposed MCC system. Results show that the average accuracy of the proposed MCC-RF based system is 99.62%, with a sensitivity of 99.16% and a specificity of 99.75%. The average accuracy is 93.65% for the MCC-kNN system and 97.59% for the MCC-SVM based system.
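Average accuracy, sensitivity, and specificity for a four-class problem are commonly derived from a confusion matrix by treating each class one-vs-rest and averaging. The sketch below shows that calculation. The abstract does not state how its averages were computed, so the one-vs-rest macro average is an assumption, and the confusion matrix holds hypothetical counts rather than the study's results.

```python
def one_vs_rest_metrics(confusion, class_names):
    """Per-class accuracy, sensitivity, and specificity from a confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, name in enumerate(class_names):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        metrics[name] = {
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics

classes = ["normal", "glioma", "meningioma", "pituitary"]
# Hypothetical counts for illustration only; these are not the paper's results.
confusion = [
    [48, 1, 1, 0],
    [0, 47, 3, 0],
    [1, 2, 46, 1],
    [0, 0, 1, 49],
]
per_class = one_vs_rest_metrics(confusion, classes)
# Assumed averaging scheme: unweighted mean over the four classes.
average = {
    key: sum(m[key] for m in per_class.values()) / len(classes)
    for key in ("accuracy", "sensitivity", "specificity")
}
print(average)
```

Because each class is scored against all others, the true-negative count is large, which is one reason one-vs-rest accuracy and specificity tend to be higher than sensitivity.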
Abstract: This paper aims to build an employee attrition classification model based on the Stacking algorithm. An oversampling algorithm is applied to address data imbalance, and the random forest feature importance ranking method is used to resolve the overfitting problem after data cleaning and preprocessing. Then, different algorithms are used to establish classification models as control experiments, and R-squared indicators are used to compare them. Finally, the Stacking algorithm is used to establish the final classification model. This model has practical and significant implications for both human resource management and employee attrition analysis.
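The oversampling step can be illustrated with the simplest variant, random oversampling, which duplicates minority-class rows until the classes are balanced. The abstract does not name the oversampling algorithm the authors used (it may well be a synthetic method such as SMOTE), so this is an assumed stand-in, and the data and function name are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra rows with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Hypothetical imbalanced data: 8 employees stayed (0), 2 left (1).
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels), Counter(balanced_labels))
```

Oversampling is normally applied to the training split only; resampling before the train/test split lets duplicated rows leak into the test set and inflates the measured performance.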
Funding: National Basic Research Program of China (973 Program) (No. 2010CB951303 and No. 2009CB421106).
Abstract: Aims: Preserving and restoring Tamarix ramosissima is urgently required in the Tarim Basin, Northwest China. Species distribution models are regularly used to predict the biogeographical distribution of species in conservation and other management activities. However, the uncertainty in the data and models inevitably reduces their prediction power. The major purpose of this study is to assess the impacts of predictor variables and species distribution models on simulating T. ramosissima distribution, to explore the relationships between predictor variables and species distribution models, and to model the potential distribution of T. ramosissima in this basin. Methods: Three models, namely the generalized linear model (GLM), classification and regression tree (CART) and Random Forests, were selected and run on the BIOMOD platform. The presence/absence data of T. ramosissima in the Tarim Basin, which were calculated from vegetation maps, were used as response variables. Climate, soil and digital elevation model (DEM) data variables were divided into four datasets and then used as predictors. The four datasets were (i) climate variables, (ii) soil, climate and DEM variables, (iii) principal component analysis (PCA)-based climate variables and (iv) PCA-based soil, climate and DEM variables. Important Findings: The results indicate that predictive variables for species distribution models should be chosen carefully, because too many predictors can reduce the prediction power. The effectiveness of using PCA to reduce the correlation among predictors and enhance the modelling power depends on the chosen predictor variables and models. Our results implied that it is better to reduce the correlated predictors before model processing. The Random Forests model was more precise than the GLM and CART models. The best model for T. ramosissima was the Random Forests model with climate predictors alone. Soil variables considered in this study could not significantly improve the model's prediction accuracy for T. ramosissima. The potential distribution area of T. ramosissima in the Tarim Basin is approximately 3.57 × 10^4 km^2; restoring T. ramosissima in the Tarim Basin therefore has the potential to mitigate global warming and produce bioenergy.
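The recommendation to reduce correlated predictors before modelling can be sketched as follows. The study itself used PCA; this example swaps in a simpler greedy pairwise-correlation filter to show the idea without a linear algebra library. The 0.8 threshold, the predictor names, and all values are hypothetical and are not taken from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(predictors, threshold=0.8):
    """Greedily keep predictors whose |r| with every kept predictor is below threshold."""
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical predictor values at six sites (illustrative, not from the study).
predictors = {
    "annual_mean_temp": [8.1, 9.4, 10.2, 11.0, 12.3, 13.1],
    "warmest_month_temp": [22.0, 23.5, 24.1, 25.2, 26.4, 27.0],  # tracks annual mean
    "annual_precip": [60, 35, 80, 42, 75, 50],
    "elevation": [1400, 1250, 900, 1000, 950, 1350],
}
print(drop_correlated(predictors))
```

The filter keeps the first predictor of each correlated group, so the result depends on the order in which predictors are listed. PCA avoids that order dependence but produces components that are harder to interpret ecologically.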