Funding: Supported by the Discipline Advancement Program of Shanghai Fourth People’s Hospital, No. SY-XKZT-2020-2013.
Abstract: BACKGROUND Postoperative delirium, which is particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management.
AIM To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly patients with abdominal cancer.
METHODS In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent surgery for abdominal malignant tumors at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 d after surgery, and patients were divided into delirium and non-delirium groups according to whether postoperative delirium occurred. Multivariate logistic regression was used to identify risk factors and to develop a predictive model for postoperative delirium. SMOTE was then applied to oversample the delirium cases, and the predictive accuracy of the model was validated.
RESULTS The incidence of postoperative delirium was 22.91%. Multivariate logistic regression identified the following significant risk factors: Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The original predictive model (P1) had an area under the receiver operating characteristic curve (AUC) of 0.862. The SMOTE-based logistic early-warning model (P2) had a slightly lower but comparable AUC of 0.856, suggesting no significant difference in performance between the two approaches.
CONCLUSION This study shows that the SMOTE-enhanced model for predicting postoperative delirium in elderly patients with abdominal tumors performs comparably to the traditional model while effectively addressing data imbalance.
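To make the oversampling step concrete, the following is a minimal pure-Python sketch of the basic SMOTE interpolation rule: pick a minority sample, pick one of its k nearest minority-class neighbours, and place a synthetic point at a random position on the segment between them. It is not the authors' code; the function name smote, the default k = 5, and the use of plain lists as feature rows are assumptions for illustration. In practice a library implementation such as the SMOTE class in imbalanced-learn would normally be used.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new rows, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

As an illustration only: 22.91% of 611 patients is about 140 delirium cases versus 471 non-delirium cases, so balancing the classes 1:1 (a target ratio the abstract does not state) would mean calling smote(delirium_rows, 331). Oversampling should be applied to the training data only, so that synthetic cases never leak into the validation set.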
Abstract: In this editorial, we comment on the article by Hu et al entitled “Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique”. We aim to draw attention to the general features of postoperative delirium (POD), as well as to areas of uncertainty and contradiction. POD can be defined as acute neurocognitive dysfunction that occurs in the first week after surgery. It is a severe postoperative complication, especially in elderly oncology patients. Although the underlying pathophysiological mechanism is not fully understood, various neuroinflammatory mechanisms and neurotransmitters are thought to be involved. Various assessment scales and diagnostic methods have been proposed for the early diagnosis of POD. Because delirium is considered preventable in about half of cases, early prediction models built with machine learning have recently become an active research topic. Unfortunately, no model with both high sensitivity and high specificity for predicting POD has yet been reported. Therefore, all healthcare personnel caring for elderly patients should remain highly alert to POD throughout the perioperative period.
Abstract: Delirium, a complex neurocognitive syndrome, frequently develops after surgery, with diverse manifestations that pose considerable challenges, especially in elderly patients. This editorial examines postoperative delirium (POD), focusing on a study of POD in elderly patients undergoing surgery for abdominal malignancies. The study addresses pathophysiology and predictive determinants, offering valuable insights into this challenging clinical scenario. Using the synthetic minority oversampling technique, the authors developed a predictive model incorporating key risk factors such as comorbidity index, anesthesia grade, and surgical duration. Accurate identification of risk factors is urgently needed to reduce POD incidence. Although the findings are specific to elderly patients with abdominal malignancies, they contribute substantially to understanding the pathophysiology and prediction of delirium. Further research is warranted to establish standardized predictive models with greater generalizability.
Funding: Supported by the Japan Society for the Promotion of Science KAKENHI (Grant No. JP22H01580).
Abstract: During tunnel boring machine (TBM) excavation, lithology identification is important for understanding tunnelling performance and avoiding time-consuming excavation. However, site investigations generally lack ground samples, and the available information is subjective, heterogeneous, and imbalanced because of mixed ground conditions. In this study, a light gradient boosting machine (LightGBM) classifier guided by unsupervised clustering (K-means) and the synthetic minority oversampling technique (SMOTE) is proposed to classify soft-ground tunnel lithology and to address the class imbalance in tunnelling data. During tunnel excavation, an earth pressure balance (EPB) TBM recorded 18 operational parameters along with the three main tunnel lithologies. The proposed model was implemented with the low-code Python library PyCaret. Four decision tree-based classifiers were then trained in a short time with automatic hyperparameter tuning to select the best model for the clustering-guided SMOTE application. In addition, Shapley additive explanations (SHAP) were used to address the black-box problem of the model. The model was evaluated using accuracy, F1 score, precision, recall, and the receiver operating characteristic (ROC) curve to obtain reasonable results for the minority class. The results show that the proposed model can identify tunnel lithology effectively from EPB-TBM operational parameters. The method can be applied to heterogeneous tunnel formations with multiple TBM operational parameters to characterize tunnel lithologies for efficient tunnelling.
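The abstract does not specify exactly how K-means guides SMOTE. One common reading, sketched below in pure Python as an assumption rather than the authors' pipeline, is to cluster the minority-class samples first and then interpolate only between samples in the same cluster, so that synthetic points are not placed in the gap between distinct minority groups. The helper names (kmeans, cluster_guided_smote), the farthest-first seeding, and the choice to pair random same-cluster members instead of nearest neighbours are simplifications for illustration.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all chosen centroids
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_guided_smote(minority, n_synthetic, n_clusters=2, seed=0):
    """Interpolate only between minority samples that share a K-means cluster."""
    rng = random.Random(seed)
    labels = kmeans(minority, n_clusters, seed=seed)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    usable = [members for members in clusters.values() if len(members) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        members = rng.choice(usable)  # cluster chosen uniformly at random
        a, b = rng.sample(members, 2)
        gap = rng.random()
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic
```

Picking the cluster uniformly at random gives small clusters as many synthetic points as large ones; weighting clusters by size or density is a common alternative.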
Funding: Supported by the National Key R&D Program of China (No. 2021YFC2100100), the National Natural Science Foundation of China (No. 21901157), and the Shanghai Science and Technology Project (No. 21JC1403400).
Abstract: Precise regulation of two-dimensional materials has become an effective strategy for developing a wide range of catalytic applications. Introducing heterogeneous components significantly affects material performance, which makes it difficult to discover and understand structure-property relationships at the atomic level. Here, we developed a novel and efficient ensemble learning classifier combined with the synthetic minority oversampling technique (SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for the hydrogen evolution reaction (HER). A database of 850 doped arsenenes was compiled, and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising HER catalyst candidates, with a machine learning prediction accuracy of 81%. Based on these results, we propose 13 low-cost, easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to enable highly efficient HER. The proposed ensemble method achieved high prediction accuracy while predicting Gibbs free energies millions of times faster and requiring only a small amount of data. This study indicates that the ensemble learning classifier can screen highly efficient catalysts and can be extended to predict other precisely regulated two-dimensional catalysts.
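The abstract does not say how the ensemble combines its base learners. As a hedged illustration of the simplest option, hard (majority) voting, the sketch below takes the label each base model predicts for every sample and returns the most frequent one, together with the accuracy metric used above (the 81% figure is the paper's result, not something this code reproduces). The function names are hypothetical.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote across models.

    predictions[m][s] is the label model m predicts for sample s.
    Ties go to the label voted first (lowest model index), because
    Counter.most_common orders equal counts by first occurrence.
    """
    n_samples = len(predictions[0])
    voted = []
    for s in range(n_samples):
        votes = Counter(model_preds[s] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Soft voting (averaging predicted probabilities) and stacking are common alternatives when the base models expose calibrated scores.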
Abstract: In recent years, detecting fake job descriptions has become increasingly necessary because social networking has changed the way people access the rapidly growing volume of information online. Identifying fraudulent job descriptions can help jobseekers avoid many of the risks of job hunting. However, fake job description detection faces class imbalance, because genuine job postings outnumber fake ones, and this imbalance reduces the predictive performance of traditional machine learning models. We therefore present an efficient framework, FJD-OT (Fake Job Description Detection Using Oversampling Techniques), that uses oversampling to improve fake job description detection. In the first module, the text data are preprocessed, including stop-word removal and tokenization. In the second module, a bag-of-words model combined with term frequency-inverse document frequency (TF-IDF) weighting extracts features from the text to build the feature dataset. Next, the framework applies k-fold cross-validation, a commonly used technique for evaluating machine learning models, to split the experimental dataset [the Employment Scam Aegean (ESA) dataset in this study] into training and test sets. The training set is passed to the third module, an oversampling module in which the SVMSMOTE method balances the data before the classifiers are trained in the last module. The experimental results indicate that the proposed approach significantly improves fake job description detection on the ESA dataset across several popular performance metrics.
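The ordering described above, splitting with k-fold cross-validation first and oversampling only the training part, matters because oversampling before the split would let copies or interpolations of test samples leak into training. The pure-Python sketch below shows that ordering. It assumes binary labels (1 = fake, 0 = genuine, a hypothetical encoding) and, to stay short and dependency-free, uses plain random oversampling as a stand-in for SVMSMOTE, which instead synthesizes new minority samples near the class borderline identified by an SVM.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal.

    Stand-in for SVMSMOTE, which synthesizes new rows rather than copying.
    Assumes binary labels 0/1.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(X), list(y)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def cv_splits_with_oversampling(X, y, k=5, seed=0):
    """Yield (train_X, train_y, test_X, test_y) for each fold.

    Only the training portion is oversampled; the test fold keeps the
    original class distribution.
    """
    folds = k_fold_indices(len(y), k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        train_X, train_y = random_oversample(
            [X[i] for i in train_idx], [y[i] for i in train_idx], seed
        )
        yield train_X, train_y, [X[i] for i in test_idx], [y[i] for i in test_idx]
```

Each yielded training set would then go to the classifier, and metrics would be computed on the untouched test fold and averaged over the k folds.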
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class ahead of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During TBM tunnelling, a large amount of operational data is generated that reflects the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate rock mass quality. This study proposes a stacking ensemble classifier for real-time rock mass classification using TBM operational data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and their corresponding rock mass classes were obtained after data preprocessing. Ten key TBM operational parameters were then selected using a tree-based feature selection method, and the mean value of each selected feature over the stable phase, after outlier removal, was calculated as classifier input. The preprocessed data were divided into a training set (90%) and a test set (10%) by simple random sampling. In addition to the stacking ensemble classifier, seven individual classifiers were built for comparison: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), with the hyperparameters of each classifier optimised by grid search. The prediction results show that the stacking ensemble classifier outperforms the individual classifiers and has stronger learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set was obtained using the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on prediction performance is discussed.
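The feature-construction step (the per-cycle mean of each selected parameter over the stable phase, after removing outliers) can be sketched as follows. The abstract does not name the outlier rule; Tukey's fences (values outside Q1 - 1.5·IQR and Q3 + 1.5·IQR are discarded) are assumed here, and the parameter names used in the test are hypothetical placeholders, not the 10 parameters selected in the study.

```python
import statistics


def stable_phase_mean(values, k=1.5):
    """Mean of one parameter over a cycle's stable phase, after dropping
    points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences, an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if lo <= v <= hi]
    return statistics.fmean(kept)


def cycle_features(stable_phase, names):
    """Turn one cycle's stable-phase recordings {parameter: [samples]}
    into a single feature row, in the order given by `names`."""
    return [stable_phase_mean(stable_phase[name]) for name in names]
```

For example, stable_phase_mean([10, 11, 12, 11, 10, 12, 11, 100]) drops the spike at 100 and returns 11.0. One such row per tunnelling cycle, paired with its rock mass class, would form the classifier's input table.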
Funding: Supported by the National Natural Science Foundation of China (10979005) and the National Basic Research Program of China (2009CB918600).
Abstract: With the development of X-ray free-electron lasers (XFELs), high-quality diffraction patterns have been obtained from nanocrystals. Nanocrystals of different sizes and random orientations are injected into the XFEL beam, and diffraction patterns are recorded in the so-called "diffraction-and-destruction" mode. Recovering the orientations is one of the most critical steps in reconstructing the 3D structure of nanocrystals. An existing approach solves the orientation problem using automated indexing software from crystallography. However, this method cannot distinguish twin orientations when the symmetry of the Bravais lattice is higher than that of the point group. Here we propose a new method to solve this problem. The shape transforms of the nanocrystals can be determined from all of the intensities around the diffraction spots, from which the Fourier transform of a single unit cell is obtained. The actual orientations of the patterns can then be determined by comparing the values of the unit-cell Fourier transform at the intersections of all patterns. This so-called "multiple-common-line" method successfully distinguishes twin orientations in XFEL diffraction patterns.
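For background, the standard kinematic-diffraction relation below (textbook theory, not taken from the paper) shows why the intensities around each Bragg spot carry the crystal's shape transform. It assumes an idealized parallelepiped nanocrystal of N1 × N2 × N3 unit cells with edge vectors a_i, unit-cell structure factor F(q), and a scattering vector q defined so that Bragg peaks occur where every q·a_i is an integer.

```latex
I(\mathbf{q}) \propto \lvert F(\mathbf{q}) \rvert^{2}\,\lvert S(\mathbf{q}) \rvert^{2},
\qquad
\lvert S(\mathbf{q}) \rvert^{2} = \prod_{i=1}^{3}
  \frac{\sin^{2}\!\left(\pi N_i\,\mathbf{q}\cdot\mathbf{a}_i\right)}
       {\sin^{2}\!\left(\pi\,\mathbf{q}\cdot\mathbf{a}_i\right)}
```

Each factor reaches N_i² at a Bragg position, and the fringes between spots depend on crystal size, so once |S(q)|² has been estimated from the intensities around the spots, |F(q)|² = I(q)/|S(q)|² can in principle be recovered as the unit-cell quantity that different patterns share. Real nanocrystals with irregular shapes have more complicated shape transforms.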
Funding: Project supported by the National Natural Science Foundation of China (No. 61972261), the Natural Science Foundation of Guangdong Province, China (No. 2023A1515011667), the Key Basic Research Foundation of Shenzhen, China (No. JCYJ20220818100205012), and the Basic Research Foundation of Shenzhen, China (No. JCYJ20210324093609026).
Abstract: The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance when building classifiers, and it has received several enhancements over the past 20 years. SMOTE and its variants synthesize minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in areas where different classes overlap, which further complicates classifier training. To address this issue, this paper proposes a novel generalization-oriented, rather than imputation-oriented, minority-class sample point generation algorithm named overlapping minimization SMOTE (OM-SMOTE). The algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. It then applies a set of sophisticated minority-class imputation rules to generate synthetic sample points that are as far as possible from the overlapping areas between classes. Extensive experiments on 32 imbalanced datasets validate the effectiveness of OM-SMOTE. The results show that synthetic minority-class sample points generated by OM-SMOTE yield better training performance for naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for training high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is publicly available on GitHub at https://github.com/luxuan123123/OM-SMOTE/.
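OM-SMOTE itself works in a learned sample space, and its imputation rules are in the authors' repository at the URL above. As a much simpler illustration of the overlap problem it targets, the sketch below (not OM-SMOTE) scores each synthetic point by the share of majority-class samples among its k nearest original samples and discards points whose neighbourhood is dominated by the majority class. The threshold of 0.5, the value of k, the label encoding, and the function names are assumptions.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def majority_fraction(point, X, y, k=5, majority_label=0):
    """Share of majority-class samples among the k nearest original samples."""
    nearest = sorted(range(len(X)), key=lambda i: _sq_dist(point, X[i]))[:k]
    return sum(1 for i in nearest if y[i] == majority_label) / len(nearest)


def filter_overlapping(synthetic, X, y, k=5, max_fraction=0.5, majority_label=0):
    """Keep only synthetic points whose neighbourhood is not majority-dominated."""
    return [
        p for p in synthetic
        if majority_fraction(p, X, y, k, majority_label) <= max_fraction
    ]
```

Post-hoc filtering like this only removes badly placed points after generation; the abstract describes OM-SMOTE as steering generation away from overlapping areas in the first place.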