Background: Bacterial vaginosis is a polymicrobial syndrome in which the homeostasis maintained by the Lactobacillus species that protect the vaginal mucosa has been lost. This study explored the data balancing process with the intention of improving the quality of association rules. The article aimed to balance an unbalanced multiclass dataset to improve association rule creation. Methods: A preconstructed dataset with 201 observations and 58 variables was analyzed. The authors collected the data between August 2016 and October 2018 in Tabasco, Mexico. The study population comprised sexually active women aged 18 to 50 who underwent gynecological inspection at the infectious and metabolic diseases research laboratory at the Universidad Juarez Autonoma de Tabasco. To determine the best K value, the random forest algorithm was used, and balancing was performed with the synthetic minority over-sampling technique (SMOTE), random over-sampling examples (ROSE), and adaptive synthetic sampling approach for imbalanced learning (ADASYN) algorithms. The Apriori algorithm created the rules, and statistically significant rules were selected using the is.redundant(), is.significant(), and is.maximal() functions together with the Fisher's exact test quality metric. Biological validation was carried out by an expert (bacteriologist). Results: With the ADASYN algorithm at K = 9, the out-of-bag (OOB) error was zero, making this the best K value. ADASYN showed the best performance in the balancing process. From the dataset balanced with ADASYN, the Apriori algorithm created the association rules; after selection with the Fisher's exact test quality metric and biological validation, 13 rules were reported. Gram-bacteria Atopobium vaginae, Gardnerella vaginalis, Megasphaera phylotype 1, Mycoplasma hominis, and Ureaplasma parvum were detected by the Apriori algorithm from the balanced dataset. Conclusion: Balancing may improve the creation of association rules to efficiently model the bacteria that cause bacterial vaginosis.
Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques including sampling and cost-sensitive learning are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and how the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
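The gap this abstract describes, between overall error and a measure such as G-mean, can be illustrated with a small sketch. It assumes the usual binary definition of G-mean as the geometric mean of sensitivity and specificity; the function name `g_mean` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = minority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Plain fraction of correct predictions."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced problem: 2 minority cases among 10 (hypothetical data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                       # ignores the minority class
balanced_guess = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # finds 1 of 2 minority, 6 of 8 majority

acc_majority = accuracy(y_true, always_majority)
gm_majority = g_mean(y_true, always_majority)
gm_balanced = g_mean(y_true, balanced_guess)
```

In this toy case the always-majority predictor reaches 80% accuracy while its G-mean is zero, which is the kind of misleading overall error the abstract refers to.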
The synthetic minority oversampling technique (SMOTE) is a popular algorithm to reduce the impact of class imbalance in building classifiers, and has received several enhancements over the past 20 years. SMOTE and its variants synthesize a number of minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between different classes, which further complicates classifier training. To address this issue, this paper proposes a novel generalization-oriented rather than imputation-oriented minority-class sample point generation algorithm, named overlapping minimization SMOTE (OM-SMOTE). This algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. Then, OM-SMOTE employs a set of sophisticated minority-class sample point imputation rules to generate synthetic sample points that are as far as possible from overlapping areas between classes. Extensive experiments have been conducted on 32 imbalanced datasets to validate the effectiveness of OM-SMOTE. Results show that using OM-SMOTE to generate synthetic minority-class sample points leads to better classifier training performance for the naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for supporting the training of high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is shared publicly on the GitHub platform at https://github.com/luxuan123123/OM-SMOTE/.
Funding: Supported by the Discipline Advancement Program of Shanghai Fourth People's Hospital, No. SY-XKZT-2020-2013.
Abstract: BACKGROUND: Postoperative delirium, particularly prevalent in elderly patients after abdominal cancer surgery, presents significant challenges in clinical management. AIM: To develop a synthetic minority oversampling technique (SMOTE)-based model for predicting postoperative delirium in elderly abdominal cancer patients. METHODS: In this retrospective cohort study, we analyzed data from 611 elderly patients who underwent abdominal malignant tumor surgery at our hospital between September 2020 and October 2022. The incidence of postoperative delirium was recorded for 7 d post-surgery. Patients were divided into delirium and non-delirium groups according to whether postoperative delirium occurred. A multivariate logistic regression model was used to identify risk factors and develop a predictive model for postoperative delirium. The SMOTE technique was applied to enhance the model by oversampling the delirium cases. The model's predictive accuracy was then validated. RESULTS: In our study involving 611 elderly patients with abdominal malignant tumors, multivariate logistic regression analysis identified significant risk factors for postoperative delirium. These included the Charlson comorbidity index, American Society of Anesthesiologists classification, history of cerebrovascular disease, surgical duration, perioperative blood transfusion, and postoperative pain score. The incidence rate of postoperative delirium in our study was 22.91%. The original predictive model (P1) exhibited an area under the receiver operating characteristic curve of 0.862. In comparison, the SMOTE-based logistic early warning model (P2), which utilized the SMOTE oversampling algorithm, showed a slightly lower but comparable area under the curve of 0.856, suggesting no significant difference in performance between the two predictive approaches. CONCLUSION: This study confirms that the SMOTE-enhanced predictive model for postoperative delirium in elderly abdominal tumor patients shows performance equivalent to that of traditional methods, effectively addressing data imbalance.
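The two models in this abstract are compared by the area under the receiver operating characteristic curve (AUC). As a minimal sketch, and not the authors' code, AUC can be computed as the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The function name `auc_from_scores` and the toy labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for five patients (1 = delirium, 0 = no delirium).
labels = [1, 0, 1, 0, 0]
scores = [0.90, 0.40, 0.35, 0.20, 0.10]
auc = auc_from_scores(labels, scores)
```

This pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formula gives the same value more efficiently on larger data.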
Funding: National Key Research and Development Program of China (Grant No. 2019YFA0709003); National Natural Science Foundation of China (Grant No. 51905079); Liaoning Revitalization Talents Program (Grant No. XLYC1902082).
Abstract: Shallow subsurface defects are difficult to identify and quantify by ultrasonic time-of-flight diffraction (TOFD) because of the low resolution induced by pulse width and beam spreading. In this paper, Sparse-SAFT is proposed to improve the time resolution and lateral resolution in TOFD imaging by combining sparse deconvolution and the synthetic aperture focusing technique (SAFT). The mathematical model in the frequency domain is established based on the l1 and l2 norm constraints, and the optimization problem is solved to enhance time resolution. On this basis, SAFT is employed to improve lateral resolution by delay-and-sum beamforming. The simulated and experimental results indicate that the lateral wave and tip-diffracted waves can be decoupled with Sparse-SAFT. Shallow subsurface defects with a height of 3.0 mm at a depth of 3.0 mm were detected quantitatively, and the relative measurement errors of flaw heights and depths were no more than 10.3%. Compared to conventional SAFT, the time resolution and lateral resolution are enhanced by 72.5% and 56%, respectively, with Sparse-SAFT. Finally, the proposed method is also suitable for improving resolution when detecting defects beyond the dead zone.
Abstract: Delirium, a complex neurocognitive syndrome, frequently emerges following surgery, presenting diverse manifestations and considerable obstacles, especially among the elderly. This editorial delves into the intricate phenomenon of postoperative delirium (POD), shedding light on a study that explores POD in elderly individuals undergoing abdominal malignancy surgery. The study examines pathophysiology and predictive determinants, offering valuable insights into this challenging clinical scenario. Employing the synthetic minority oversampling technique, a predictive model is developed, incorporating critical risk factors such as comorbidity index, anesthesia grade, and surgical duration. There is an urgent need for accurate risk factor identification to mitigate POD incidence. While specific to elderly patients with abdominal malignancies, the findings contribute significantly to understanding delirium pathophysiology and prediction. Further research is warranted to establish standardized predictive models for enhanced generalizability.
Abstract: In this editorial, we comment on the article by Hu et al entitled "Predictive modeling for postoperative delirium in elderly patients with abdominal malignancies using synthetic minority oversampling technique". We wanted to draw attention to the general features of postoperative delirium (POD) as well as the areas where there are uncertainties and contradictions. POD can be defined as acute neurocognitive dysfunction that occurs in the first week after surgery. It is a severe postoperative complication, especially for elderly oncology patients. Although the underlying pathophysiological mechanism is not fully understood, various neuroinflammatory mechanisms and neurotransmitters are thought to be involved. Various assessment scales and diagnostic methods have been proposed for the early diagnosis of POD. As delirium is considered a preventable clinical entity in about half of the cases, various early prediction models developed with the support of machine learning have recently become a hot scientific topic. Unfortunately, a model with high sensitivity and specificity for the prediction of POD has not yet been reported. This situation indicates that all health personnel who provide health care services to elderly patients should approach patients with a high level of awareness of POD in the perioperative period.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large amount of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through the tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of the classifiers. The preprocessed data are randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison. These classifiers include support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier performs better than the individual classifiers, and it shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
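The stacking setup described in this abstract can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with `make_classification`, 10 features and 3 imbalanced classes), only three of the listed base learners are used, the logistic-regression meta-learner is an assumption, and the hyper-parameters are fixed rather than grid-searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the TBM data: 10 input features, 3 imbalanced classes.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)

# 90% / 10% random split, mirroring the proportions given in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Base learners feed cross-validated predictions to a meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)
test_accuracy = stack.score(X_test, y_test)
```

Because the meta-learner is trained on out-of-fold predictions of the base learners (`cv=5`), it does not see predictions made on data the base learners were fitted on.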
Funding: Supported by the National Key R&D Program of China (No. 2021YFC2100100), the National Natural Science Foundation of China (No. 21901157), and the Shanghai Science and Technology Project (No. 21JC1403400).
Abstract: Accurate regulation of two-dimensional materials has become an effective strategy to develop a wide range of catalytic applications. The introduction of heterogeneous components has a significant impact on the performance of materials, which makes it difficult to discover and understand the structure-property relationships at the atomic level. Here, we developed a novel and efficient ensemble learning classifier with the synthetic minority oversampling technique (SMOTE) to discover all possible arsenene catalysts with implanted heteroatoms for the hydrogen evolution reaction (HER). A total of 850 doped arsenenes were collected as a database, and 140 modified arsenene materials with different doping atoms and doping sites were identified as promising candidate catalysts for HER, with a machine learning prediction accuracy of 81%. Based on the results of machine learning, we proposed 13 low-cost and easily synthesized two-dimensional Fe-doped arsenene catalytic materials that are expected to contribute to highly efficient HER. The proposed ensemble method achieved high prediction accuracy while predicting Gibbs free energies millions of times faster and requiring only a small amount of data. This study indicates that the presented ensemble learning classifier is capable of screening highly efficient catalysts, and can be further extended to predict other two-dimensional catalysts with delicate regulation.
Funding: Supported by the Japan Society for the Promotion of Science KAKENHI (Grant No. JP22H01580).
Abstract: During tunnel boring machine (TBM) excavation, lithology identification is important for understanding tunnelling performance and avoiding time-consuming excavation. However, site investigation generally lacks ground samples, and the information is subjective, heterogeneous, and imbalanced due to mixed ground conditions. In this study, an unsupervised (K-means) and synthetic minority oversampling technique (SMOTE)-guided light-gradient boosting machine (LightGBM) classifier is proposed to identify the soft ground tunnel classification and address the imbalance issue in tunnelling data. During the tunnel excavation, an earth pressure balance (EPB) TBM recorded 18 different operational parameters along with the three main tunnel lithologies. The proposed model is applied using the Python low-code PyCaret library. Next, four decision tree-based classifiers were obtained in a short time period with automatic hyperparameter tuning to determine the best model for clustering-guided SMOTE application. In addition, the Shapley additive explanation (SHAP) was implemented to avoid the model black box problem. The proposed model was evaluated using different metrics such as accuracy, F1 score, precision, recall, and the receiver operating characteristic (ROC) curve to obtain a reasonable outcome for the minority class. The results show that the proposed model can provide significant tunnel lithology identification based on the operational parameters of the EPB-TBM. The proposed method can be applied to heterogeneous tunnel formations with several TBM operational parameters to describe the tunnel lithologies for efficient tunnelling.
Abstract: Developments in biomedical science and signal processing technologies have led electroencephalography (EEG) signals to be widely used in the diagnosis of brain disease and in the field of brain-computer interfaces (BCI). The collected EEG signals are processed using machine learning algorithms (Random Forest and Naive Bayes) and deep learning algorithms (Recurrent Neural Network (RNN), Neural Network (NN), and Long Short-Term Memory (LSTM)) to obtain the current mood of a person. These algorithms were applied to the data set to determine what the person is feeling at a particular moment. This thesis aims to identify one of six moods (happy, surprised, disgust, fear, anger, and sadness) of a person at a given instant, with the least possible time delay as the mood changes. The accuracy of the output varies with the algorithm used and the time taken to process the data, which makes it possible to compare the reliability and dependency of one algorithm against another prior to practical implementation. The data sets used were class-imbalanced, and overfitting occurred as a result. This problem was handled by generating artificial data with the SMOTE oversampling technique.
Funding: Supported by the National Key Research and Development Program of China (2018YFB1003700), the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776), the "333" Project of Jiangsu Province (BRA2017228, BRA2017401), and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Abstract: For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE can achieve better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
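The interpolation step that SMOTE relies on can be illustrated with a minimal sketch. This is not the DSMOTE method from the abstract, and it omits the DBSCAN-based grouping entirely; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (plain SMOTE-style interpolation, hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (illustrative values only).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the region spanned by the minority class. That property is also the source of the overgeneralization problem mentioned above when noise samples are used as interpolation endpoints.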
Abstract: Background: Bacterial vaginosis is a polymicrobial syndrome in which the homeostasis exerted by the Lactobacillus species that protect the vaginal mucosa has been lost. This study explored the data balancing process with the intention of improving the quality of association rules. The article aimed to balance an unbalanced multiclass dataset to improve association rule creation. Methods: A preconstructed dataset with 201 observations and 58 variables was analyzed. The authors collected the data between August 2016 and October 2018 in Tabasco, Mexico. The study population comprised sexually active women aged 18 to 50 who underwent gynecological inspection at the infectious and metabolic diseases research laboratory at the Universidad Juarez Autonoma de Tabasco. The random-forest algorithm was used to determine the best K value, and the balancing was performed with the synthetic minority over-sampling technique (SMOTE), random over-sampling examples (ROSE), and adaptive synthetic sampling approach for imbalanced learning (ADASYN) algorithms. The Apriori algorithm created the rules; to select rules with statistical significance, the is.redundant(), is.significant(), and is.maximal() functions and the Fisher’s exact test quality metric were used. Biological validation was carried out by an expert (bacteriologist). Results: With the ADASYN algorithm at K=9, the out-of-bag (OOB) error was zero, making this the best K value. In the balancing process, the ADASYN algorithm showed the best performance. From the dataset balanced with ADASYN, the Apriori algorithm created the association rules, and selection with the Fisher’s exact test quality metric together with biological validation yielded 13 rules. The bacteria Atopobium vaginae, Gardnerella vaginalis, Megasphaera phylotype 1, Mycoplasma hominis, and Ureaplasma parvum were detected by the Apriori algorithm from the balanced dataset. Conclusion: Balancing may improve the creation of association rules to efficiently model the bacteria that cause bacterial vaginosis.
Abstract: Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and how the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
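A small sketch may help show why overall error can mislead on imbalanced data while G-mean and F-measure do not. It uses the standard definitions (G-mean as the geometric mean of recall and specificity, F-measure as the harmonic mean of precision and recall); the function name `imbalance_metrics` and the toy labels are assumptions for illustration, not part of the paper.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, G-mean and F-measure for a binary problem,
    treating `positive` as the minority class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Toy data: 10 minority (1) and 90 majority (0) samples.
y_true = [1] * 10 + [0] * 90
always_majority = [0] * 100                       # ignores the minority class
trade_off = [1] * 8 + [0] * 2 + [1] * 18 + [0] * 72  # finds 8 of 10 positives

metrics_majority = imbalance_metrics(y_true, always_majority)
metrics_trade_off = imbalance_metrics(y_true, trade_off)
```

On this toy data, the classifier that always predicts the majority class has the higher accuracy, yet its G-mean and F-measure are both zero because it never detects a minority sample. This is the gap between evaluation measure and training objective that the abstract describes.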
Funding: Project supported by the National Natural Science Foundation of China (No. 61972261), the Natural Science Foundation of Guangdong Province, China (No. 2023A1515011667), the Key Basic Research Foundation of Shenzhen, China (No. JCYJ20220818100205012), and the Basic Research Foundation of Shenzhen, China (No. JCYJ20210324093609026).
Abstract: The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance in building classifiers, and has received several enhancements over the past 20 years. SMOTE and its variants synthesize a number of minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between different classes, which further complicates classifier training. To address this issue, this paper proposes a novel generalization-oriented rather than imputation-oriented minority-class sample point generation algorithm, named overlapping minimization SMOTE (OM-SMOTE). This algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. Then, OM-SMOTE employs a set of sophisticated minority-class sample point imputation rules to generate synthetic sample points that are as far as possible from overlapping areas between classes. Extensive experiments have been conducted on 32 imbalanced datasets to validate the effectiveness of OM-SMOTE. Results show that using OM-SMOTE to generate synthetic minority-class sample points leads to better classifier training performance for the naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for supporting the training of high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is shared publicly on the GitHub platform at https://github.com/luxuan123123/OM-SMOTE/.