A network intrusion detection system is critical for cyber security against llegitimate attacks.In terms of feature perspectives,network traffic may include a variety of elements such as attack reference,attack type,a...A network intrusion detection system is critical for cyber security against llegitimate attacks.In terms of feature perspectives,network traffic may include a variety of elements such as attack reference,attack type,a subcategory of attack,host information,malicious scripts,etc.In terms of network perspectives,network traffic may contain an imbalanced number of harmful attacks when compared to normal traffic.It is challenging to identify a specific attack due to complex features and data imbalance issues.To address these issues,this paper proposes an Intrusion Detection System using transformer-based transfer learning for Imbalanced Network Traffic(IDS-INT).IDS-INT uses transformer-based transfer learning to learn feature interactions in both network feature representation and imbalanced data.First,detailed information about each type of attack is gathered from network interaction descriptions,which include network nodes,attack type,reference,host information,etc.Second,the transformer-based transfer learning approach is developed to learn detailed feature representation using their semantic anchors.Third,the Synthetic Minority Oversampling Technique(SMOTE)is implemented to balance abnormal traffic and detect minority attacks.Fourth,the Convolution Neural Network(CNN)model is designed to extract deep features from the balanced network traffic.Finally,the hybrid approach of the CNN-Long Short-Term Memory(CNN-LSTM)model is developed to detect different types of attacks from the deep features.Detailed experiments are conducted to test the proposed approach using three standard datasets,i.e.,UNsWNB15,CIC-IDS2017,and NSL-KDD.An explainable AI approach is implemented to interpret the proposed method and develop a trustable model.展开更多
As a promising edge learning framework in future 6G networks,federated learning(FL)faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors.Data imbalance is o...As a promising edge learning framework in future 6G networks,federated learning(FL)faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors.Data imbalance is one of these challenges that can significantly degrade the learning efficiency.To deal with data imbalance issue,this work proposes a new learning framework,called clustered federated learning with weighted model aggregation(weighted CFL).Compared with traditional FL,our weighted CFL adaptively clusters the participating edge devices based on the cosine similarity of their local gradients at each training iteration,and then performs weighted per-cluster model aggregation.Therein,the similarity threshold for clustering is adaptive over iterations in response to the time-varying divergence of local gradients.Moreover,the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up the convergence rate.Experimental results show that the proposed weighted CFL achieves a faster model convergence rate and greater learning accuracy than benchmark methods under the imbalanced data scenario.展开更多
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced mach...Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced machine learning algorithm.To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM)based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. Vertical comparison with other cross-validation (CV) methods and lateral comparison with different fold times comprise the comparative approach. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed:one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBMHSCV model proposed here is effective at detecting incident causes. The improved model for imbalanced data categorization proposed may serve as a point of reference for similar data processing, and the model’s accurate identification of civil aviation incident causes can assist to improve civil aviation safety.展开更多
Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directl...Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directly in these big data streams.At the same time,streaming data from several applications results in two major problems such as class imbalance and concept drift.The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection(MOMBD-CDD)method on High-Dimensional Streaming Data.The presented MOMBD-CDD model has different operational stages such as pre-processing,CDD,and classification.MOMBD-CDD model overcomes class imbalance problem by Synthetic Minority Over-sampling Technique(SMOTE).In order to determine the oversampling rates and neighboring point values of SMOTE,Glowworm Swarm Optimization(GSO)algorithm is employed.Besides,Statistical Test of Equal Proportions(STEPD),a CDD technique is also utilized.Finally,Bidirectional Long Short-Term Memory(Bi-LSTM)model is applied for classification.In order to improve classification performance and to compute the optimum parameters for Bi-LSTM model,GSO-based hyperparameter tuning process is carried out.The performance of the presented model was evaluated using high dimensional benchmark streaming datasets namely intrusion detection(NSL KDDCup)dataset and ECUE spam dataset.An extensive experimental validation process confirmed the effective outcome of MOMBD-CDD model.The proposed model attained high accuracy of 97.45%and 94.23%on the applied KDDCup99 Dataset and ECUE Spam datasets respectively.展开更多
Traditional particle identification methods face timeconsuming,experience-dependent,and poor repeatability challenges in heavy-ion collisions at low and intermediate energies.Researchers urgently need solutions to the...Traditional particle identification methods face timeconsuming,experience-dependent,and poor repeatability challenges in heavy-ion collisions at low and intermediate energies.Researchers urgently need solutions to the dilemma of traditional particle identification methods.This study explores the possibility of applying intelligent learning algorithms to the particle identification of heavy-ion collisions at low and intermediate energies.Multiple intelligent algorithms,including XgBoost and TabNet,were selected to test datasets from the neutron ion multi-detector for reaction-oriented dynamics(NIMROD-ISiS)and Geant4 simulation.Tree-based machine learning algorithms and deep learning algorithms e.g.TabNet show excellent performance and generalization ability.Adding additional data features besides energy deposition can improve the algorithm’s performance when the data distribution is nonuniform.Intelligent learning algorithms can be applied to solve the particle identification problem in heavy-ion collisions at low and intermediate energies.展开更多
A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on t...A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, the algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Thus, the phenomenon of data imbalance can be significantly avoided and search and query efficiency can be enhanced.展开更多
With the advancement of network communication technology,network traffic shows explosive growth.Consequently,network attacks occur frequently.Network intrusion detection systems are still the primary means of detectin...With the advancement of network communication technology,network traffic shows explosive growth.Consequently,network attacks occur frequently.Network intrusion detection systems are still the primary means of detecting attacks.However,two challenges continue to stymie the development of a viable network intrusion detection system:imbalanced training data and new undiscovered attacks.Therefore,this study proposes a unique deep learning-based intrusion detection method.We use two independent in-memory autoencoders trained on regular network traffic and attacks to capture the dynamic relationship between traffic features in the presence of unbalanced training data.Then the original data is fed into the triplet network by forming a triplet with the data reconstructed from the two encoders to train.Finally,the distance relationship between the triples determines whether the traffic is an attack.In addition,to improve the accuracy of detecting unknown attacks,this research proposes an improved triplet loss function that is used to pull the distances of the same class closer while pushing the distances belonging to different classes farther in the learned feature space.The proposed approach’s effectiveness,stability,and significance are evaluated against advanced models on the Android Adware and General Malware Dataset(AAGM17),Knowledge Discovery and Data Mining Cup 1999(KDDCUP99),Canadian Institute for Cybersecurity Group’s Intrusion Detection Evaluation Dataset(CICIDS2017),UNSW-NB15,Network Security Lab-Knowledge Discovery and Data Mining(NSL-KDD)datasets.The achieved results confirmed the superiority of the proposed method for the task of network intrusion detection.展开更多
Cloud Computing(CC)is the preference of all information technology(IT)organizations as it offers pay-per-use based and flexible services to its users.But the privacy and security become the main hindrances in its achi...Cloud Computing(CC)is the preference of all information technology(IT)organizations as it offers pay-per-use based and flexible services to its users.But the privacy and security become the main hindrances in its achievement due to distributed and open architecture that is prone to intruders.Intrusion Detection System(IDS)refers to one of the commonly utilized system for detecting attacks on cloud.IDS proves to be an effective and promising technique,that identifies malicious activities and known threats by observing traffic data in computers,and warnings are given when such threatswere identified.The current mainstream IDS are assisted with machine learning(ML)but have issues of low detection rates and demanded wide feature engineering.This article devises an Enhanced Coyote Optimization with Deep Learning based Intrusion Detection System for Cloud Security(ECODL-IDSCS)model.The ECODL-IDSCS model initially addresses the class imbalance data problem by the use of Adaptive Synthetic(ADASYN)technique.For detecting and classification of intrusions,long short term memory(LSTM)model is exploited.In addition,ECO algorithm is derived to optimally fine tune the hyperparameters related to the LSTM model to enhance its detection efficiency in the cloud environment.Once the presented ECODL-IDSCS model is tested on benchmark dataset,the experimental results show the promising performance of the ECODL-IDSCS model over the existing IDS models.展开更多
The extent of the peril associated with cancer can be perceivedfrom the lack of treatment, ineffective early diagnosis techniques, and mostimportantly its fatality rate. Globally, cancer is the second leading cause of...The extent of the peril associated with cancer can be perceivedfrom the lack of treatment, ineffective early diagnosis techniques, and mostimportantly its fatality rate. Globally, cancer is the second leading cause ofdeath and among over a hundred types of cancer;lung cancer is the secondmost common type of cancer as well as the leading cause of cancer-relateddeaths. Anyhow, an accurate lung cancer diagnosis in a timely manner canelevate the likelihood of survival by a noticeable margin and medical imagingis a prevalent manner of cancer diagnosis since it is easily accessible to peoplearound the globe. Nonetheless, this is not eminently efficacious consideringhuman inspection of medical images can yield a high false positive rate. Ineffectiveand inefficient diagnosis is a crucial reason for such a high mortalityrate for this malady. However, the conspicuous advancements in deep learningand artificial intelligence have stimulated the development of exceedinglyprecise diagnosis systems. The development and performance of these systemsrely prominently on the data that is used to train these systems. A standardproblem witnessed in publicly available medical image datasets is the severeimbalance of data between different classes. This grave imbalance of data canmake a deep learning model biased towards the dominant class and unableto generalize. This study aims to present an end-to-end convolutional neuralnetwork that can accurately differentiate lung nodules from non-nodules andreduce the false positive rate to a bare minimum. To tackle the problem ofdata imbalance, we oversampled the data by transforming available images inthe minority class. The average false positive rate in the proposed method isa mere 1.5 percent. However, the average false negative rate is 31.76 percent.The proposed neural network has 68.66 percent sensitivity and 98.42 percentspecificity.展开更多
Nowadays aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This attracts the research community to look into aviation safety by applying data analysis technique...Nowadays aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This attracts the research community to look into aviation safety by applying data analysis techniques based on an advanced machine learning algorithm. An ensemble classification model based on Aviation Safety Reporting System(ASRS) has been proposed to analyze aviation safety targeting the people injured in the system.The ensemble classification model shall contain two modules: the data-driven module consisting of data cleaning, feature selection,and imbalanced data division and reorganization, and the modeldriven module stacked by Random Forest(RF), XGBoost(XGB),and Light Gradient Boosting Machine(LGBM) separately. The results indicate that the ensemble model could solve the data imbalance while vastly improving accuracy. LGBM illustrates higher accuracy and faster run in the analysis of a single model of the ASRS-based imbalanced data, while the ensemble model has the best performance in classification at the same time. The ensemble model proposed for imbalanced data classification can provide a certain reference for similar data processing while improving the safety of civil aviation.展开更多
A credit risk prediction model named KM-ADASYN-TL-FLLightGBM(KADT-FLightGBM)is proposed in this study.Firstly,to overcome the limitation of traditional sampling methods in dealing with imbalanced datasets,an improved ...A credit risk prediction model named KM-ADASYN-TL-FLLightGBM(KADT-FLightGBM)is proposed in this study.Firstly,to overcome the limitation of traditional sampling methods in dealing with imbalanced datasets,an improved ADASYN sampling with K-means clustering algorithm is constructed.Moreover,the Tomek Links method is used to filter the generated samples.Secondly,an utilized an optimized LightGBM algorithm with the Focal Loss is employed to training the model using the datasets obtained by the improved ADASYN sampling.Finally,the comparative analysis between the ensemble model and other different sampling methodologies is conducted on the Lending Club dataset.The results demonstrate that the proposed model effectively minimizes the misclassification of minority classes in credit risk prediction and can be used as a reference for similar studies.展开更多
As the use of blockchain for digital payments continues to rise,it becomes susceptible to various malicious attacks.Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in ...As the use of blockchain for digital payments continues to rise,it becomes susceptible to various malicious attacks.Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments.However,the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions.Although several studies have been conducted in the field,a limitation persists:the lack of explanations for the model’s predictions.This study seeks to overcome this limitation by integrating explainable artificial intelligence(XAI)techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions.The shapley additive explanation(SHAP)method is employed to measure the contribution of each feature,and it is compatible with ensemble models.Moreover,we present rules for interpreting whether a Bitcoin transaction is anomalous or not.Additionally,we introduce an under-sampling algorithm named XGBCLUS,designed to balance anomalous and non-anomalous transaction data.This algorithm is compared against other commonly used under-sampling and over-sampling techniques.Finally,the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers.Our experimental results demonstrate that:(i)XGBCLUS enhances true positive rate(TPR)and receiver operating characteristic-area under curve(ROC-AUC)scores compared to state-of-the-art under-sampling and over-sampling techniques,and(ii)our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy,TPR,and false positive rate(FPR)scores.展开更多
ICOs,the initial coin offerings,are a common way to raise funds for blockchain projects.Fraudulent ICO projects not only cause financial losses to investors but also cause a loss of confidence in the blockchain capita...ICOs,the initial coin offerings,are a common way to raise funds for blockchain projects.Fraudulent ICO projects not only cause financial losses to investors but also cause a loss of confidence in the blockchain capital market.Whitepapers are usually the most important information source,so it is feasible to identify fraudulent ICO programs by analyzing whitepapers.However,the fraud samples are difficult to collect,and the classes are imbalanced.In this study,we attempt to solve this problem by extracting linguistic features from the ICO whitepaper and using a variety of cutting-edge machine learning and deep learning algorithms to train the prediction model and attempt to resample,modify the weight and modify the loss function for imbalanced samples.Our optimal method achieves an AUC of 0.94 and an accuracy of 82%,which is better than other traditional standard methods,and the results provide important implications for ICO fraud detection.展开更多
Recently,rapid serial visual presentation(RSVP),as a new event-related potential(ERP)paradigm,has become one of the most popular forms in electroencephalogram signal processing technologies.Several improvement approac...Recently,rapid serial visual presentation(RSVP),as a new event-related potential(ERP)paradigm,has become one of the most popular forms in electroencephalogram signal processing technologies.Several improvement approaches have been proposed to improve the performance of RSVP analysis.In brain-computer interface systems based on RSVP,the family of approaches that do not depend on training specific parameters is essential.The participating teams proposed several effective training-free frameworks of algorithms in the ERP competition of the BCI Controlled Robot Contest in World Robot Contest 2021.This paper discusses the effectiveness of various approaches in improving the performance of the system without requiring training and suggests how to apply these approaches in a practical system.First,appropriate preprocessing techniques will greatly improve the results.Then,the non-deep learning algorithm may be more stable than the deep learning approach.Furthermore,ensemble learning can make the model more stable and robust.展开更多
文摘A network intrusion detection system is critical for cyber security against llegitimate attacks.In terms of feature perspectives,network traffic may include a variety of elements such as attack reference,attack type,a subcategory of attack,host information,malicious scripts,etc.In terms of network perspectives,network traffic may contain an imbalanced number of harmful attacks when compared to normal traffic.It is challenging to identify a specific attack due to complex features and data imbalance issues.To address these issues,this paper proposes an Intrusion Detection System using transformer-based transfer learning for Imbalanced Network Traffic(IDS-INT).IDS-INT uses transformer-based transfer learning to learn feature interactions in both network feature representation and imbalanced data.First,detailed information about each type of attack is gathered from network interaction descriptions,which include network nodes,attack type,reference,host information,etc.Second,the transformer-based transfer learning approach is developed to learn detailed feature representation using their semantic anchors.Third,the Synthetic Minority Oversampling Technique(SMOTE)is implemented to balance abnormal traffic and detect minority attacks.Fourth,the Convolution Neural Network(CNN)model is designed to extract deep features from the balanced network traffic.Finally,the hybrid approach of the CNN-Long Short-Term Memory(CNN-LSTM)model is developed to detect different types of attacks from the deep features.Detailed experiments are conducted to test the proposed approach using three standard datasets,i.e.,UNsWNB15,CIC-IDS2017,and NSL-KDD.An explainable AI approach is implemented to interpret the proposed method and develop a trustable model.
文摘As a promising edge learning framework in future 6G networks,federated learning(FL)faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors.Data imbalance is one of these challenges that can significantly degrade the learning efficiency.To deal with data imbalance issue,this work proposes a new learning framework,called clustered federated learning with weighted model aggregation(weighted CFL).Compared with traditional FL,our weighted CFL adaptively clusters the participating edge devices based on the cosine similarity of their local gradients at each training iteration,and then performs weighted per-cluster model aggregation.Therein,the similarity threshold for clustering is adaptive over iterations in response to the time-varying divergence of local gradients.Moreover,the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up the convergence rate.Experimental results show that the proposed weighted CFL achieves a faster model convergence rate and greater learning accuracy than benchmark methods under the imbalanced data scenario.
基金supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110)Research on the Dual Prevention Mechanism and Intelligent Management Technology f or Civil Aviation Safety Risks (YK23-03-05)。
文摘Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced machine learning algorithm.To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM)based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. Vertical comparison with other cross-validation (CV) methods and lateral comparison with different fold times comprise the comparative approach. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed:one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBMHSCV model proposed here is effective at detecting incident causes. The improved model for imbalanced data categorization proposed may serve as a point of reference for similar data processing, and the model’s accurate identification of civil aviation incident causes can assist to improve civil aviation safety.
文摘Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directly in these big data streams.At the same time,streaming data from several applications results in two major problems such as class imbalance and concept drift.The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection(MOMBD-CDD)method on High-Dimensional Streaming Data.The presented MOMBD-CDD model has different operational stages such as pre-processing,CDD,and classification.MOMBD-CDD model overcomes class imbalance problem by Synthetic Minority Over-sampling Technique(SMOTE).In order to determine the oversampling rates and neighboring point values of SMOTE,Glowworm Swarm Optimization(GSO)algorithm is employed.Besides,Statistical Test of Equal Proportions(STEPD),a CDD technique is also utilized.Finally,Bidirectional Long Short-Term Memory(Bi-LSTM)model is applied for classification.In order to improve classification performance and to compute the optimum parameters for Bi-LSTM model,GSO-based hyperparameter tuning process is carried out.The performance of the presented model was evaluated using high dimensional benchmark streaming datasets namely intrusion detection(NSL KDDCup)dataset and ECUE spam dataset.An extensive experimental validation process confirmed the effective outcome of MOMBD-CDD model.The proposed model attained high accuracy of 97.45%and 94.23%on the applied KDDCup99 Dataset and ECUE Spam datasets respectively.
基金This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences(No.XDB34030000)the National Key Research and Development Program of China(No.2022YFA1602404)+1 种基金the National Natural Science Foundation(No.U1832129)the Youth Innovation Promotion Association CAS(No.2017309).
文摘Traditional particle identification methods face timeconsuming,experience-dependent,and poor repeatability challenges in heavy-ion collisions at low and intermediate energies.Researchers urgently need solutions to the dilemma of traditional particle identification methods.This study explores the possibility of applying intelligent learning algorithms to the particle identification of heavy-ion collisions at low and intermediate energies.Multiple intelligent algorithms,including XgBoost and TabNet,were selected to test datasets from the neutron ion multi-detector for reaction-oriented dynamics(NIMROD-ISiS)and Geant4 simulation.Tree-based machine learning algorithms and deep learning algorithms e.g.TabNet show excellent performance and generalization ability.Adding additional data features besides energy deposition can improve the algorithm’s performance when the data distribution is nonuniform.Intelligent learning algorithms can be applied to solve the particle identification problem in heavy-ion collisions at low and intermediate energies.
基金Funded by the National 863 Program of China (No. 2005AA113150), and the National Natural Science Foundation of China (No.40701158).
文摘A novel Hilbert-curve is introduced for parallel spatial data partitioning, with consideration of the huge-amount property of spatial information and the variable-length characteristic of vector data items. Based on the improved Hilbert curve, the algorithm can be designed to achieve almost-uniform spatial data partitioning among multiple disks in parallel spatial databases. Thus, the phenomenon of data imbalance can be significantly avoided and search and query efficiency can be enhanced.
基金support of National Natural Science Foundation of China(U1936213)Yunnan Provincial Natural Science Foundation,“Robustness analysis method and coupling mechanism of complex coupled network system”(202101AT070167)Yunnan Provincial Major Science and Technology Program,“Construction and application demonstration of intelligent diagnosis and treatment system for childhood diseases based on intelligent medical platform”(202102AA100021).
文摘With the advancement of network communication technology,network traffic shows explosive growth.Consequently,network attacks occur frequently.Network intrusion detection systems are still the primary means of detecting attacks.However,two challenges continue to stymie the development of a viable network intrusion detection system:imbalanced training data and new undiscovered attacks.Therefore,this study proposes a unique deep learning-based intrusion detection method.We use two independent in-memory autoencoders trained on regular network traffic and attacks to capture the dynamic relationship between traffic features in the presence of unbalanced training data.Then the original data is fed into the triplet network by forming a triplet with the data reconstructed from the two encoders to train.Finally,the distance relationship between the triples determines whether the traffic is an attack.In addition,to improve the accuracy of detecting unknown attacks,this research proposes an improved triplet loss function that is used to pull the distances of the same class closer while pushing the distances belonging to different classes farther in the learned feature space.The proposed approach’s effectiveness,stability,and significance are evaluated against advanced models on the Android Adware and General Malware Dataset(AAGM17),Knowledge Discovery and Data Mining Cup 1999(KDDCUP99),Canadian Institute for Cybersecurity Group’s Intrusion Detection Evaluation Dataset(CICIDS2017),UNSW-NB15,Network Security Lab-Knowledge Discovery and Data Mining(NSL-KDD)datasets.The achieved results confirmed the superiority of the proposed method for the task of network intrusion detection.
基金The Deanship of Scientific Research(DSR)at King Abdulaziz University(KAU),Jeddah,Saudi Arabia has funded this project,under grant no.KEP-1-120-42.
文摘Cloud Computing(CC)is the preference of all information technology(IT)organizations as it offers pay-per-use based and flexible services to its users.But the privacy and security become the main hindrances in its achievement due to distributed and open architecture that is prone to intruders.Intrusion Detection System(IDS)refers to one of the commonly utilized system for detecting attacks on cloud.IDS proves to be an effective and promising technique,that identifies malicious activities and known threats by observing traffic data in computers,and warnings are given when such threatswere identified.The current mainstream IDS are assisted with machine learning(ML)but have issues of low detection rates and demanded wide feature engineering.This article devises an Enhanced Coyote Optimization with Deep Learning based Intrusion Detection System for Cloud Security(ECODL-IDSCS)model.The ECODL-IDSCS model initially addresses the class imbalance data problem by the use of Adaptive Synthetic(ADASYN)technique.For detecting and classification of intrusions,long short term memory(LSTM)model is exploited.In addition,ECO algorithm is derived to optimally fine tune the hyperparameters related to the LSTM model to enhance its detection efficiency in the cloud environment.Once the presented ECODL-IDSCS model is tested on benchmark dataset,the experimental results show the promising performance of the ECODL-IDSCS model over the existing IDS models.
基金supported this research through the National Research Foundation of Korea (NRF)funded by the Ministry of Science,ICT (2019M3F2A1073387)this work was supported by the Institute for Information&communications Technology Promotion (IITP) (NO.2022-0-00980Cooperative Intelligence Framework of Scene Perception for Autonomous IoT Device).
文摘The extent of the peril associated with cancer can be perceivedfrom the lack of treatment, ineffective early diagnosis techniques, and mostimportantly its fatality rate. Globally, cancer is the second leading cause ofdeath and among over a hundred types of cancer;lung cancer is the secondmost common type of cancer as well as the leading cause of cancer-relateddeaths. Anyhow, an accurate lung cancer diagnosis in a timely manner canelevate the likelihood of survival by a noticeable margin and medical imagingis a prevalent manner of cancer diagnosis since it is easily accessible to peoplearound the globe. Nonetheless, this is not eminently efficacious consideringhuman inspection of medical images can yield a high false positive rate. Ineffectiveand inefficient diagnosis is a crucial reason for such a high mortalityrate for this malady. However, the conspicuous advancements in deep learningand artificial intelligence have stimulated the development of exceedinglyprecise diagnosis systems. The development and performance of these systemsrely prominently on the data that is used to train these systems. A standardproblem witnessed in publicly available medical image datasets is the severeimbalance of data between different classes. This grave imbalance of data canmake a deep learning model biased towards the dominant class and unableto generalize. This study aims to present an end-to-end convolutional neuralnetwork that can accurately differentiate lung nodules from non-nodules andreduce the false positive rate to a bare minimum. To tackle the problem ofdata imbalance, we oversampled the data by transforming available images inthe minority class. The average false positive rate in the proposed method isa mere 1.5 percent. However, the average false negative rate is 31.76 percent.The proposed neural network has 68.66 percent sensitivity and 98.42 percentspecificity.
基金Supported by the Joint Fund of National Natural Science Foundation of China and Civil Aviation Administration of China (U1833110)。
文摘Nowadays aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This attracts the research community to look into aviation safety by applying data analysis techniques based on an advanced machine learning algorithm. An ensemble classification model based on Aviation Safety Reporting System(ASRS) has been proposed to analyze aviation safety targeting the people injured in the system.The ensemble classification model shall contain two modules: the data-driven module consisting of data cleaning, feature selection,and imbalanced data division and reorganization, and the modeldriven module stacked by Random Forest(RF), XGBoost(XGB),and Light Gradient Boosting Machine(LGBM) separately. The results indicate that the ensemble model could solve the data imbalance while vastly improving accuracy. LGBM illustrates higher accuracy and faster run in the analysis of a single model of the ASRS-based imbalanced data, while the ensemble model has the best performance in classification at the same time. The ensemble model proposed for imbalanced data classification can provide a certain reference for similar data processing while improving the safety of civil aviation.
基金supported by the National Natural Science Foundation of China(Nos.71503108 and 62077029)CCF-Huawei Innovation Research Program Grant(No.CCF-HuaweiFM202209)Research and Practice Innovation Project of Jiangsu Normal University(No.2022XKT1540).
文摘A credit risk prediction model named KM-ADASYN-TL-FLLightGBM(KADT-FLightGBM)is proposed in this study.Firstly,to overcome the limitation of traditional sampling methods in dealing with imbalanced datasets,an improved ADASYN sampling with K-means clustering algorithm is constructed.Moreover,the Tomek Links method is used to filter the generated samples.Secondly,an utilized an optimized LightGBM algorithm with the Focal Loss is employed to training the model using the datasets obtained by the improved ADASYN sampling.Finally,the comparative analysis between the ensemble model and other different sampling methodologies is conducted on the Lending Club dataset.The results demonstrate that the proposed model effectively minimizes the misclassification of minority classes in credit risk prediction and can be used as a reference for similar studies.
文摘As the use of blockchain for digital payments continues to rise,it becomes susceptible to various malicious attacks.Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments.However,the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions.Although several studies have been conducted in the field,a limitation persists:the lack of explanations for the model’s predictions.This study seeks to overcome this limitation by integrating explainable artificial intelligence(XAI)techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions.The shapley additive explanation(SHAP)method is employed to measure the contribution of each feature,and it is compatible with ensemble models.Moreover,we present rules for interpreting whether a Bitcoin transaction is anomalous or not.Additionally,we introduce an under-sampling algorithm named XGBCLUS,designed to balance anomalous and non-anomalous transaction data.This algorithm is compared against other commonly used under-sampling and over-sampling techniques.Finally,the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers.Our experimental results demonstrate that:(i)XGBCLUS enhances true positive rate(TPR)and receiver operating characteristic-area under curve(ROC-AUC)scores compared to state-of-the-art under-sampling and over-sampling techniques,and(ii)our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy,TPR,and false positive rate(FPR)scores.
基金supported by the National Natural Science Foundation of China under Grant No.61907042Beijing Natural Science Foundation under Grant No.4194090.
文摘ICOs,the initial coin offerings,are a common way to raise funds for blockchain projects.Fraudulent ICO projects not only cause financial losses to investors but also cause a loss of confidence in the blockchain capital market.Whitepapers are usually the most important information source,so it is feasible to identify fraudulent ICO programs by analyzing whitepapers.However,the fraud samples are difficult to collect,and the classes are imbalanced.In this study,we attempt to solve this problem by extracting linguistic features from the ICO whitepaper and using a variety of cutting-edge machine learning and deep learning algorithms to train the prediction model and attempt to resample,modify the weight and modify the loss function for imbalanced samples.Our optimal method achieves an AUC of 0.94 and an accuracy of 82%,which is better than other traditional standard methods,and the results provide important implications for ICO fraud detection.
基金This research was supported by the National Key Research and Development Program of China(Grant No.2021ZD0201303)the Technology Innovation Project of Hubei Province of China(Grant No.2019AEA171)the Hubei Province Funds for Distinguished Young Scholars(Grant No.2020CFA050).
文摘Recently,rapid serial visual presentation(RSVP),as a new event-related potential(ERP)paradigm,has become one of the most popular forms in electroencephalogram signal processing technologies.Several improvement approaches have been proposed to improve the performance of RSVP analysis.In brain-computer interface systems based on RSVP,the family of approaches that do not depend on training specific parameters is essential.The participating teams proposed several effective training-free frameworks of algorithms in the ERP competition of the BCI Controlled Robot Contest in World Robot Contest 2021.This paper discusses the effectiveness of various approaches in improving the performance of the system without requiring training and suggests how to apply these approaches in a practical system.First,appropriate preprocessing techniques will greatly improve the results.Then,the non-deep learning algorithm may be more stable than the deep learning approach.Furthermore,ensemble learning can make the model more stable and robust.