Funding: Supported by the National Key Research and Development Program of China (No. 2016YFF0102502), the Key Research Program of Frontier Sciences, CAS (No. QYZDJ-SSW-JSC037), the Youth Innovation Promotion Association, CAS, and the Liao Ning Revitalization Talents Program (No. XLYC1807110).
Abstract: In the spectral analysis of laser-induced breakdown spectroscopy, abundant characteristic spectral lines and severe interference coexist in the original spectral data. Here, a feature selection method called recursive feature elimination based on ridge regression (Ridge-RFE) is recommended for the original spectral data to make full use of the valid information in the spectra. In the Ridge-RFE method, the absolute value of the ridge regression coefficient is used as the criterion for screening spectral features: at each step, recursive feature elimination (RFE) removes the feature whose weight has the smallest absolute value in the current input subset, and the selected features are used as inputs to a partial least squares regression (PLS) model. The Ridge-RFE-based PLS model was used to measure Fe, Si, Mg, Cu, Zn and Mn in 51 aluminum alloy samples, and the results showed that the root mean square error of prediction decreased greatly compared with the PLS model that takes the full spectrum as input. The overall results demonstrate that the Ridge-RFE method removes redundant features more efficiently, gives the PLS model better quantitative analysis results, and improves model generalization ability.
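The elimination loop described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a closed-form ridge fit on centered data, synthetic data in place of real spectra, and hypothetical names (`ridge_coefficients`, `ridge_rfe`, `alpha`, `n_keep`). The PLS step that consumes the selected features is not shown.

```python
import numpy as np

def ridge_coefficients(X, y, alpha=1.0):
    """Closed-form ridge fit on centered data: (X^T X + alpha*I)^-1 X^T y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)

def ridge_rfe(X, y, n_keep, alpha=1.0):
    """Repeatedly refit and drop the feature with the smallest |coefficient|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef = ridge_coefficients(X[:, remaining], y, alpha)
        weakest = int(np.argmin(np.abs(coef)))  # position within `remaining`
        del remaining[weakest]
    return remaining

# Synthetic stand-in for spectra (assumption): 200 samples, 30 channels,
# of which only channels 2, 11 and 25 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 3.0 * X[:, 2] - 2.0 * X[:, 11] + 1.5 * X[:, 25] + 0.05 * rng.normal(size=200)

selected = ridge_rfe(X, y, n_keep=3)
print(sorted(selected))
```

Because the criterion is the coefficient magnitude, the columns should be on a comparable scale before ranking; real spectra would typically be standardized first, which this sketch skips since the synthetic columns already share unit variance.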
Abstract: In this study, the hourly directions of eight banking stocks in Borsa Istanbul were predicted using linear-based, deep-learning (LSTM) and ensemble learning (Light-GBM) models. These models were trained with four different feature sets, and their performance was evaluated in terms of accuracy and F-measure. The first experiments used each stock's own features directly as model inputs, while the second experiments used stock features reduced through Variational AutoEncoders (VAE). In the last experiments, to capture the effects of the other banking stocks on individual stock performance, the features belonging to other stocks were also given as inputs to the models. Other stock features were combined with both the own (named allstock_own) and the VAE-reduced (named allstock_VAE) stock features, and the expanded dimensions of these feature sets were reduced by Recursive Feature Elimination. The highest success rate rose to 0.685 with allstock_own and the LSTM with attention model, while the combination of allstock_VAE and the LSTM with attention model obtained an accuracy of 0.675. Although the classification results achieved with both feature types were close, allstock_VAE achieved them using nearly 16.67% fewer features than allstock_own. Across all experiments, the models trained with allstock_own and allstock_VAE achieved higher accuracy than those using individual stock features. It was also concluded that the results obtained with the VAE-reduced stock features were similar to those obtained with the own stock features.
Funding: Supported by the China 973 Program (No. 2002CB312200), the National Natural Science Foundation of China (No. 60574019 and No. 60474045), the Key Technologies R&D Program of Zhejiang Province (No. 2005C21087), and the Academician Foundation of Zhejiang Province (No. 2005A1001-13).
Abstract: Key variable identification for classification is related to many troubleshooting problems in the process industries. Recursive feature elimination based on support vector machines (SVM-RFE) was recently proposed for feature selection in cancer diagnosis. In this paper, SVM-RFE is applied to key variable selection in fault diagnosis, and an accelerated SVM-RFE procedure based on a heuristic criterion is proposed. Data from the Tennessee Eastman process (TEP) simulator are used to evaluate the effectiveness of key variable selection using accelerated SVM-RFE (A-SVM-RFE). A-SVM-RFE integrates computational rate and algorithm effectiveness into a consistent framework. It not only identifies the key variables correctly but also has a very good computational rate. A-SVM-RFE performs better than contribution charts combined with principal component analysis (PCA) and than two other SVM-RFE algorithms, and it is better suited to industrial application.
Funding: This work was financially supported by the National Key Research and Development Program of China (Grant No. 2021YFB3901300), the National Precision Agriculture Application Project (Grant No. JZNYYY001), and the National Innovation Training Project for University in China (Grant No. 202310019034).
Abstract: Field-road segmentation is one of the key tasks in processing the trajectories of agricultural machinery. To improve the accuracy of field-road segmentation, this study proposes an XGBoost model based on dual feature extraction and recursive feature elimination, called DR-XGBoost. DR-XGBoost takes only a small number of agricultural machine trajectory features as input. First, the model uses the dual feature extraction method designed by the authors to rapidly expand the number of features and then adequately extract local trajectory features by means of a time window and feature extraction operators. Second, the model applies the recursive feature elimination algorithm to eliminate redundant features from the perspective of the model's segmentation effect, thereby reducing the computational cost of model training. Third, it trains XGBoost to complete the trajectory segmentation. To evaluate the effectiveness of DR-XGBoost, a series of experiments was conducted on a real trajectory dataset of agricultural machines. The model achieves a Macro-F1 score of 98.2% on the dataset, which is 10.9% higher than the previous state of the art. DR-XGBoost fills the knowledge gap in trajectory feature extraction for agricultural machinery and provides a reasonable and effective feature selection scheme for the field-road segmentation problem.
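The abstract does not spell out the time window or the feature extraction operators, so the following is only a rough sketch of the general idea of window-based feature expansion. The centered window, the four operators (mean, standard deviation, minimum, maximum), the speed values and the function name `window_features` are all assumptions chosen for illustration.

```python
import numpy as np

def window_features(values, half_width, operators):
    """Apply each operator to a centered window around every trajectory point."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    out = np.empty((n, len(operators)))
    for i in range(n):
        lo = max(0, i - half_width)          # clip the window at the start
        hi = min(n, i + half_width + 1)      # clip the window at the end
        window = values[lo:hi]
        for j, op in enumerate(operators):
            out[i, j] = op(window)
    return out

# Hypothetical per-point speeds (m/s): slow in the field, fast on the road.
speed = [0.0, 1.0, 2.0, 8.0, 9.0, 8.5]
operators = [np.mean, np.std, np.min, np.max]

features = window_features(speed, half_width=1, operators=operators)
print(features.shape)  # (6, 4): one row per point, one column per operator
```

Each raw feature thus yields one new column per operator, which is how a small input set can grow quickly before recursive feature elimination prunes it again.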
Funding: Supported by the Research Program through the National Research Foundation of Korea, NRF-2018R1D1A1B07050864.
Abstract: This study was conducted to enable prompt classification of malware, which is becoming increasingly sophisticated. To do this, we analyzed the important features of malware and the relative importance of the selected features according to a learning model, in order to assess how those important features were identified. First, the analysis features were extracted using Cuckoo Sandbox, an open-source malware analysis tool, and the features were then divided into five categories using the extracted information. The 804 extracted features were reduced by 70% by selecting only those most suitable for malware classification with a learning-model-based feature selection method called recursive feature elimination. Next, these important features were analyzed, and the contribution of each one was assessed with a Random Forest classifier. The results showed that system call features made up most of the selected features. In the end, it was possible to accurately identify the malware type using only 36 to 76 features for each of the four malware types with the most analysis samples available: Trojan, Adware, Downloader, and Backdoor.
Abstract: At present, the prevalence of diabetes is increasing because the human body cannot metabolize glucose properly. Accurate prediction of diabetes in patients is an important research area, and many researchers have proposed techniques to predict this disease through data mining and machine learning methods. In prediction, feature selection is a key preprocessing step: only the features that are relevant to the disease are used, which improves prediction accuracy. Selecting the right features from the whole feature set is a complicated process, and many researchers are concentrating on it to produce predictive models with high accuracy. In this work, a wrapper-based feature selection method called recursive feature elimination is combined with ridge regression (L2) to form a hybrid L2-regularized feature selection algorithm that addresses overfitting on the data set. Overfitting is a major problem in feature selection, in which the model fits new data poorly because the training data are small. Ridge regression is mainly used to overcome the overfitting problem. Features are selected with the proposed method, and a random forest classifier is used to classify the data on the basis of the selected features. This work uses the Pima Indians Diabetes data set, and the results are compared with those of existing algorithms to assess the accuracy of the proposed algorithm. The reported accuracy of the proposed algorithm in predicting diabetes is 100%, and its area under the curve is 97%. The proposed algorithm outperforms existing algorithms.
Abstract: The recursive structure and quasi-recursive structure of the Adaptive Volterra Filter (AVF) are put forward, their algorithms are given, and their characteristics and applications are discussed. Introducing the recursive structure can remarkably reduce the number of parameters and the computational cost of the AVF.
Abstract: Software Defined Networking (SDN) has emerged as a promising and exciting option for the future growth of the internet. SDN has increased the flexibility and transparency of the managed, centralized, and controlled network. On the other hand, these advantages create a more vulnerable environment with substantial risks, culminating in network difficulties, system paralysis, online banking fraud, and robberies. These issues have a significant detrimental impact on organizations, enterprises, and even economies. Accurate, high-performance, real-time systems are needed to mitigate these risks. Using SDN to extend intelligent machine learning methodologies in an Intrusion Detection System (IDS) has attracted the interest of numerous researchers over the last decade. In this paper, a novel HFS-LGBM IDS is proposed for SDN. First, a Hybrid Feature Selection algorithm consisting of two phases is applied to reduce the data dimension and obtain an optimal feature subset. In the first phase, the Correlation-based Feature Selection (CFS) algorithm is used to obtain a feature subset. In the second phase, the optimal feature set is obtained by applying Random Forest Recursive Feature Elimination (RF-RFE). A LightGBM algorithm is then used to detect and classify different types of attacks. The experimental results on the NSL-KDD dataset show that the proposed system produces outstanding results compared with existing methods in terms of accuracy, precision, recall and F-measure.
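As an illustration of the first phase only, the sketch below scores feature subsets with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and grows the subset greedily. It is not the paper's implementation: the use of absolute Pearson correlation, the forward search, the numeric target, the synthetic data and the names `cfs_merit` and `cfs_forward` are assumptions. The RF-RFE phase and the LightGBM classifier are not shown.

```python
import numpy as np

def cfs_merit(corr_fc, corr_ff, subset):
    """CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    r_cf = np.mean([corr_fc[i] for i in subset])
    if k == 1:
        return r_cf
    pairs = [corr_ff[i, j] for a, i in enumerate(subset) for j in subset[a + 1:]]
    return k * r_cf / np.sqrt(k + k * (k - 1) * np.mean(pairs))

def cfs_forward(X, y):
    """Greedy forward search: add the feature that most improves the merit."""
    n_features = X.shape[1]
    corr_fc = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(n_features)])
    corr_ff = np.abs(np.corrcoef(X, rowvar=False))
    subset, best = [], 0.0
    while len(subset) < n_features:
        candidates = [i for i in range(n_features) if i not in subset]
        merit, feature = max(
            (cfs_merit(corr_fc, corr_ff, subset + [i]), i) for i in candidates
        )
        if merit <= best:   # stop when no candidate improves the merit
            break
        subset.append(feature)
        best = merit
    return subset, best

# Synthetic data (assumption): column 1 nearly duplicates column 0,
# column 2 is an independent signal, column 3 is pure noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), b, rng.normal(size=500)])
y = a + b

subset, merit = cfs_forward(X, y)
print(sorted(subset), round(float(merit), 3))
```

The merit rewards features that correlate with the target and penalizes features that correlate with each other, so only one of the two near-duplicate columns survives alongside the independent signal, and the noise column is left out.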