A search strategy based on the maximal information gain principle is presented for the cued search of phased array radars. First, the method for determining the cued search region, arranging beam positions, and calculating the prior probability distribution of each beam position is discussed. Then, two search algorithms based on information gain are proposed, using Shannon entropy and Kullback-Leibler entropy, respectively. With the proposed strategy, the information gain of each beam position is predicted before radar detection, and the observation is made in the beam position with the maximal information gain. Compared with the conventional sequential search and confirm-search methods, simulation results show that the proposed search strategy distinctly improves search performance and saves radar time resources at the same given detection probability.
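As an illustration of this selection rule, the sketch below scores each beam position by its expected Shannon-entropy reduction and observes the highest-scoring one. The per-beam priors, detection probability `pd`, and false-alarm probability `pfa` are hypothetical, and each beam's target-presence probability is treated as an independent Bernoulli variable:

```python
import numpy as np

def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) presence variable."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def expected_info_gain(p, pd, pfa):
    """Expected entropy reduction from one dwell on a beam position with
    prior presence probability p, detection probability pd, and
    false-alarm probability pfa."""
    q = p * pd + (1 - p) * pfa              # probability of a detection report
    post_d = p * pd / q                     # posterior after a detection
    post_n = p * (1 - pd) / (1 - q)         # posterior after a miss
    return binary_entropy(p) - (q * binary_entropy(post_d)
                                + (1 - q) * binary_entropy(post_n))

rng = np.random.default_rng(0)
priors = rng.uniform(0.05, 0.5, size=8)     # hypothetical priors for 8 beams
pd, pfa = 0.9, 1e-3
for step in range(5):
    gains = expected_info_gain(priors, pd, pfa)
    beam = int(np.argmax(gains))            # look where the predicted gain is largest
    print(f"step {step}: beam {beam}, predicted gain {gains[beam]:.3f} bits")
    # Assume a miss and update that beam's prior by Bayes' rule.
    q = priors[beam] * pd + (1 - priors[beam]) * pfa
    priors[beam] = priors[beam] * (1 - pd) / (1 - q)
```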
This paper addresses the problem of sensor search scheduling in the complicated space environment faced by a low-earth orbit constellation. Several search scheduling methods based on commonly used information gain measures are first compared via simulations. Then a novel search scheduling method for scenarios with uncertain observations is proposed, based on the global Shannon information gain and a beta-density-based uncertainty model. Simulation results indicate that the beta density model is a good option for solving the problem of target acquisition in complicated space environments.
Multi-sensor systems are becoming increasingly important in a variety of military and civilian applications. In general, a single-sensor system can only provide partial information about the environment, while a multi-sensor system provides a synergistic effect, which improves the quality and availability of information. Data fusion techniques can effectively combine this environmental information from similar and/or dissimilar sensors. Sensor management, which aims at improving data fusion performance by controlling sensor behavior, plays an important role in the data fusion process. This paper presents a method using a Fisher-information-gain-based sensor effectiveness metric for sensor assignment in multi-sensor, multi-target tracking applications. The Fisher information gain is computed for every sensor-target pairing on each scan. The advantage of this metric over others is that the Fisher information gain obtained for a target by multiple sensors equals the sum of the gains obtained by the individual sensors, so a standard transportation problem formulation can be used without introducing the concept of a pseudo sensor. Simulation results show the effectiveness of the method.
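The additivity property is what makes the scheduling step tractable. A minimal sketch of the simplest one-sensor-per-target case, with a hypothetical gain matrix and SciPy's linear assignment solver standing in for the full transportation formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical per-scan Fisher information gains: one row per sensor,
# one column per target (real values would come from the tracking filter).
gain = np.array([[2.1, 0.7, 1.4],
                 [0.9, 1.8, 0.6],
                 [1.2, 1.1, 2.3]])

# Because Fisher information adds across sensors, a schedule maximizing
# the summed gain is a linear assignment problem.
rows, cols = linear_sum_assignment(gain, maximize=True)
for s, t in zip(rows, cols):
    print(f"sensor {s} -> target {t} (gain {gain[s, t]:.1f})")
print("total information gain:", gain[rows, cols].sum())
```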
Considering two seismic parameters, the energy and the frequency of earthquakes, as a whole from the definition of information gain in entropy, we study the information gain of M≥6.0 earthquakes from the world earthquake catalogue during 1900-1992. The results show that the information gain decreases before strong earthquakes. Our study of the recent seismic tendency of large earthquakes shows that the probability of M≥8.5 earthquakes around the world is low in the near future. The information gain technique provides a new approach to tracing and predicting earthquakes from data on moderate and small earthquakes.
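The paper's exact gain definition combines event energy and frequency; as a rough illustration of an entropy-based catalogue statistic, the sketch below computes a Kullback-Leibler gain of a recent magnitude window relative to the long-term distribution, on synthetic data (the windowing and binning choices are assumptions, not the paper's):

```python
import numpy as np

def info_gain(window_mags, reference_mags, bins):
    """KL information gain of a recent magnitude window relative to the
    long-term catalogue distribution (smoothed relative frequencies)."""
    p, _ = np.histogram(window_mags, bins=bins)
    q, _ = np.histogram(reference_mags, bins=bins)
    p = (p + 1e-9) / (p + 1e-9).sum()
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
catalogue = 6.0 + rng.exponential(0.4, 5000)   # synthetic M>=6.0 magnitudes
bins = np.linspace(6.0, 9.0, 13)
window = catalogue[-200:]                      # most recent events
print(f"windowed information gain: {info_gain(window, catalogue, bins):.4f} nats")
```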
Sentiment analysis is the process of determining the intention or emotion behind an article. The subjective information in a text is analyzed to determine people's opinions. The analyzed data quantifies reactions or sentiments and reveals the information's contextual polarity. In social behavior, sentiment can be thought of as a latent variable, and measuring and comprehending this behavior could help us better understand social issues. Because sentiments are domain-specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is performed at the sentence, document, and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm for selecting highly correlated features while eliminating irrelevant and redundant ones. Extensive textual sentiment analysis at the sentence, document, and feature levels is performed using the proposed algorithm, based on datasets from the Cornell and Kaggle repositories. Compared to existing baseline classifiers, the proposed Information Gain based classifier achieved increased accuracies of 96% at the document level, 97.4% at the sentence level, and 98.5% at the feature level. The proposed method is also tested on the IMDB, Yelp 2013, and Yelp 2014 datasets. For these high-dimensional datasets, it achieves increased accuracies of 95%, 96%, and 98% at the document, sentence, and feature levels, respectively, compared to existing baseline classifiers.
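The core of any IG-based filter is the term-level gain IG(C; t) = H(C) − H(C | t). A self-contained sketch over a toy binary document-term matrix (the correlation/redundancy handling of IGbFS is not shown):

```python
import numpy as np

def information_gain(X, y):
    """IG(class; term) = H(class) - H(class | term present/absent) for a
    binary document-term matrix X (n_docs x n_terms) and labels y."""
    y = np.asarray(y)
    classes = np.unique(y)

    def entropy(labels):
        if len(labels) == 0:
            return 0.0
        p = np.array([(labels == c).mean() for c in classes])
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    h_class = entropy(y)
    gains = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        present = X[:, j] > 0
        h_cond = (present.mean() * entropy(y[present])
                  + (~present).mean() * entropy(y[~present]))
        gains[j] = h_class - h_cond
    return gains

# Toy sentiment corpus: rows are documents, columns are terms.
X = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
y = np.array([1, 1, 0, 0])        # 1 = positive, 0 = negative
print(information_gain(X, y))     # terms 0 and 1 are informative, term 2 is not
```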
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensionality, small sample size, and high noise of gene expression data. We thus propose a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG is initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes is performed using SVM to eliminate noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM serve as the input to the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance, as evaluated on five cancer gene expression datasets using a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes including CSRP1, MYLg, and GUCA2B.
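A sketch of the two-stage filter-then-wrapper idea using scikit-learn analogues: mutual information stands in for the IG filter, recursive feature elimination with a linear SVM for the SVM stage, and SVC (which wraps LIBSVM) as the final classifier. The synthetic dataset and all sizes are stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for a gene-expression matrix: many features, few samples.
X, y = make_classification(n_samples=60, n_features=500, n_informative=10,
                           random_state=0)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=50),                             # IG-style filter
    RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=3),  # SVM stage
    SVC(kernel="rbf"),                                                  # final classifier
)
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```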
Associative classification has attracted remarkable research attention for business analytics in recent years due to its merits in accuracy and understandability. It is deemed meaningful to construct an associative classifier with a compact set of rules (i.e., compactness), which is easy to understand and use in decision making. This paper presents a novel approach to fuzzy associative classification (namely Gain-based Fuzzy Rule-Covering classification, GFRC), which is a fuzzy extension of the effective classifier GARC. In GFRC, two desirable strategies are introduced to enhance compactness while maintaining accuracy. One strategy is fuzzy partitioning for data discretization to cope with the 'sharp boundary problem', in which simulated annealing is incorporated based on the information entropy measure; the other is a data-redundancy resolution coupled with the rule-covering treatment. Data experiments show that GFRC had good accuracy and was significantly advantageous over other classifiers in compactness. Moreover, GFRC is applied to a real-world case of predicting the growth of sellers in an electronic marketplace, illustrating the classification effectiveness of linguistic rules in business decision support.
In order to enhance the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering detection method for air traffic control network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining feature criticality. The expected information gain and entropy of the feature data are computed to determine the information gain of the feature data and reduce the interference of similar feature data. An autoencoder is introduced into the AI (artificial intelligence) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distance between clusters of attack behavior characteristics is determined, the clustering threshold is calculated, and the initial clustering centers are constructed. Then, all objects in each cluster of attack behavior feature data are traversed, and the new mean of the feature objects in each cluster is recomputed as the new cluster center. Finally, cluster detection of ATC network security attack behavior is completed by computing the objective functions. The experiments used three groups of attack behavior datasets as test objects, with detection rate, false detection rate, and recall rate as test indicators, and three similar methods were selected for comparative testing. The experimental results show that the detection rate of this method is about 98%, the false detection rate is below 1%, and the recall rate is above 97%. This research shows that the method can improve the detection performance for security attacks in air traffic control networks.
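A compressed sketch of the described clustering loop: a distance threshold seeds the initial centers, then cluster means are recomputed in a k-means-style loop. The synthetic data, the threshold rule (mean pairwise distance), and the fixed iteration count are simplifications of the paper's procedure:

```python
import numpy as np

def threshold_cluster(features, n_iter=10):
    # Clustering threshold: mean pairwise distance of the feature vectors.
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    threshold = d[np.triu_indices(len(features), k=1)].mean()

    # Initial centers: greedily keep points farther than the threshold
    # from every center chosen so far.
    centers = [features[0]]
    for x in features[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > threshold:
            centers.append(x)
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each feature vector to its nearest center, then recompute
        # each center as the mean of all objects in its cluster.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)
        centers = np.array([features[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (30, 4)), rng.normal(3, 0.3, (30, 4))])
labels, centers = threshold_cluster(X)
print("clusters found:", len(centers))
```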
The trend toward designing an intelligent distribution system based on students' individual differences and needs has taken precedence over the traditional dormitory distribution system, which neglects students' personality traits, causes dormitory disputes, and affects students' quality of life and academic performance. This paper collects freshmen's data on personal preferences and conducts a classification comparison, using the decision tree classification algorithm based on the information gain principle as the core algorithm of dormitory allocation. It determines the description rules of students' personal preferences and decision tree classification preferences, completes the conceptual design of the database of entity relations and data dictionaries, meets students' personality-based classification requirements for the dormitory, and lays the foundation for an intelligent dormitory allocation system.
Since previous research is not comprehensive and accurate enough for building a precise hypertension risk evaluation system, a total of 42 different neural network models are built and tested by ranking the significance of hypertension factors according to their information gains on 2,231 normotensive and 823 hypertensive samples. The prediction accuracy of a model whose inputs are 26 factors is found to be much higher than the 81.61% obtained by previous research. The prediction matching rates of the model for "hypertension or not", "systolic blood pressure", and "diastolic blood pressure" are 95.79%, 98.22%, and 98.41%, respectively. Based on this model and object-oriented techniques, an online hypertension risk evaluation system is developed that is able to gather new samples, learn from them, and improve its prediction accuracy automatically.
Landslides are considered one of the most severe threats to human life and property in the hilly areas of the world, and the number of landslides and the level of damage across the globe have been increasing over time. Therefore, landslide management is essential to maintain the natural and socio-economic dynamics of hilly regions. The Rorachu river basin, one of the most landslide-prone areas of Sikkim, was selected for the present study. The prime goal of the study is to prepare landslide susceptibility maps (LSMs) using computer-based advanced machine learning techniques and to compare the performance of the models. To properly understand the spatial relationships with landslides, twenty factors, including triggering and causative factors, were selected. A deep learning algorithm, the convolutional neural network (CNN), and three popular machine learning techniques, i.e., the random forest (RF), artificial neural network (ANN), and bagging models, were employed to prepare the LSMs. Two separate datasets, for training and validation, were designed from randomly selected landslide and non-landslide points, using a 70:30 ratio for the selection of training and validation points. Multicollinearity was assessed by tolerance and the variance inflation factor, and the role of individual conditioning factors was estimated using the information gain ratio. The results reveal that there is no severe multicollinearity among the landslide conditioning factors, and that the triggering factor rainfall appeared to be the leading cause of landslides. Based on the final prediction values of each model, the LSMs were constructed and successfully partitioned into five distinct classes: very low, low, moderate, high, and very high susceptibility. The susceptibility class-wise distribution of landslides shows that more than 90% of the landslide area falls under the higher landslide susceptibility grades. The precision of the models was examined using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and statistical measures such as root mean square error (RMSE) and mean absolute error (MAE). On the training and validation datasets, the CNN model achieved the maximum AUC values of 0.903 and 0.939, respectively. The lowest RMSE and MAE values also reveal the better performance of the CNN model. It can be concluded that all the models performed well, but the CNN model outperformed the others in terms of precision.
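For the factor-weighting step, the information gain ratio normalizes the plain gain by a factor's intrinsic (split) information, so factors with many categories are not unfairly favored. A minimal sketch over toy discretized data:

```python
import numpy as np

def entropy(v):
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(factor, label):
    """Information gain ratio of one discretized conditioning factor with
    respect to the landslide / non-landslide label."""
    factor, label = np.asarray(factor), np.asarray(label)
    h_cond = sum((factor == v).mean() * entropy(label[factor == v])
                 for v in np.unique(factor))
    split_info = entropy(factor)          # intrinsic information of the factor
    return (entropy(label) - h_cond) / split_info if split_info > 0 else 0.0

# Toy example: a three-class rainfall factor against binary landslide labels.
rainfall = [0, 0, 1, 1, 2, 2, 2, 2]
slide    = [0, 0, 0, 1, 1, 1, 1, 0]
print(f"gain ratio: {gain_ratio(rainfall, slide):.3f}")
```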
A cued search algorithm with uncertain detection performance is proposed for phased array radars. Firstly, a target search model based on the information gain criterion is presented with known detection performance, and the statistical characteristics of the detection probability are calculated using the fluctuation model of the target radar cross section (RCS). Secondly, when the detection probability is completely unknown, its probability density function is modeled with a beta distribution, and its posterior probability distribution given the radar observation is derived based on Bayesian theory. Finally, simulation results show that the cued search algorithm with a known RCS fluctuation model achieves the best performance, and that the algorithm with the detection probability modeled as a beta distribution is better than one with a randomly selected detection probability, because the model parameters can be updated by the radar observations to approach the true value of the detection probability.
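The beta model is convenient because it is conjugate to the detect/no-detect observation: each dwell simply increments one of the two shape parameters. A minimal sketch of that posterior update, with a hypothetical prior and simulated dwell outcomes:

```python
import numpy as np

a, b = 1.0, 1.0          # Beta(1, 1): uniform prior over the unknown Pd
true_pd = 0.7            # hidden value, used only to simulate dwells
rng = np.random.default_rng(3)

for dwell in range(1, 21):
    detected = rng.random() < true_pd     # outcome of one dwell on the target
    # Beta-Bernoulli conjugacy: the posterior is again a beta density.
    a += detected
    b += not detected
    if dwell % 5 == 0:
        print(f"after {dwell} dwells: posterior mean Pd = {a / (a + b):.3f}")
```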
Intrusion Detection Systems (IDSs) are of great interest these days for discovering complex attack events and protecting the critical infrastructures of Internet of Things (IoT) networks. Existing IDSs based on shallow and deep network architectures demand high computational resources and high volumes of data to establish an adaptive detection engine that discovers new families of attacks at the edge of IoT networks. However, attackers exploit network gateways at the edge using new attack scenarios (i.e., zero-day attacks), such as ransomware and Distributed Denial of Service (DDoS) attacks. This paper proposes a new IDS based on Few-Shot Deep Learning, named CNN-IDS, which can automatically identify zero-day attacks at the edge of a network and protect its IoT systems. The proposed system comprises two methodological stages: 1) a filtered Information Gain method selects the most useful features from network data, and 2) a one-dimensional Convolutional Neural Network (CNN) recognizes new attack types at the network's edge. The proposed model is trained and validated using two datasets, UNSW-NB15 and Bot-IoT. The experimental results showed that it improves the detection rate by about 3% and the false-positive rate by around 3%–4% on the UNSW-NB15 dataset, and the detection rate by about 8% on the BoT-IoT dataset.
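A minimal sketch of the second stage: a one-dimensional CNN over a selected-feature vector, written here in PyTorch with an assumed 32 features surviving the Information Gain filter; the layer sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class CNNIDS(nn.Module):
    def __init__(self, n_features=32, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                    # x: (batch, n_features)
        return self.net(x.unsqueeze(1))      # add the single input channel

model = CNNIDS()
logits = model(torch.randn(4, 32))           # 4 flow records, 32 selected features
print(logits.shape)                          # -> torch.Size([4, 2])
```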
We theoretically study the reversible process of a quantum entangled state by means of weak measurement and the corresponding reversing operation. We present a protocol for the reversing operation on two bodies, based on the theory of single-photon reversion, and then extend it to quantum communication channels. The theoretical results demonstrate that the protocol does not break the information transmission when a weak measurement and a reversing measurement, with the subsequent process, occur in the transmission path; it can reverse the perturbed evolution of the entanglement intensity back to its original state. Under different weak measurement intensities, the protocol can reverse the perturbed quantum entanglement system perfectly. In the process, we can extract classical information, described by the information gain, from the quantum system through the weak measurement operation. On the other hand, in order to realize complete reversibility, the classical information of the quantum entanglement system must lie within a limited range, which we present in this paper, during the reversal process.
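The single-qubit building block can be verified numerically: a null-outcome weak measurement M = diag(1, √(1−p)) is undone by the reversing operation R = diag(√(1−p), 1), at the cost of a success probability below one. A sketch (the state and measurement strength are arbitrary choices; the paper extends this idea to entangled pairs in a channel):

```python
import numpy as np

p = 0.6                                    # weak-measurement strength
psi = np.array([0.8, 0.6], dtype=complex)  # arbitrary normalized qubit state

M = np.diag([1.0, np.sqrt(1 - p)])         # null-outcome weak measurement
R = np.diag([np.sqrt(1 - p), 1.0])         # corresponding reversing operation

after = M @ psi
after /= np.linalg.norm(after)             # renormalize post-measurement state
recovered = R @ after
recovered /= np.linalg.norm(recovered)

print("fidelity with the original state:", abs(np.vdot(psi, recovered))**2)
# -> 1.0: on the null outcome, the reversal restores the state exactly
```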
The naïve Bayes classifier is one of the most commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption in the naïve Bayes classifier makes it a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate the violation of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some do so at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights into the conditional probability. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate the performance of our method, we used public benchmark datasets and compared our method with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves classification performance compared to both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the Chi-square test of independence.
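In the weighted form, each attribute's conditional probability is raised to its weight, so the log-posterior becomes log P(c) + Σᵢ wᵢ log P(xᵢ | c); setting every weight to 1 recovers standard naïve Bayes. A minimal sketch with hypothetical fitted probabilities and weights (the paper's particular weighting scheme is not reproduced here):

```python
import numpy as np

def weighted_nb_predict(x, priors, cond, weights):
    """Attribute-weighted naive Bayes: score(c) = log P(c)
    + sum_i weights[i] * log P(x[i] | c), then pick the best class."""
    scores = {}
    for c in priors:
        log_post = np.log(priors[c])
        for i, v in enumerate(x):
            log_post += weights[i] * np.log(cond[c][i][v])
        scores[c] = log_post
    return max(scores, key=scores.get)

# Toy model: 2 classes, 2 binary attributes, hypothetical fitted probabilities.
priors = {0: 0.5, 1: 0.5}
cond = {0: [{0: 0.8, 1: 0.2}, {0: 0.6, 1: 0.4}],
        1: [{0: 0.3, 1: 0.7}, {0: 0.5, 1: 0.5}]}
weights = [1.0, 0.2]   # attribute 1 down-weighted (e.g., by an IG-style score)
print(weighted_nb_predict([1, 0], priors, cond, weights))   # -> 1
```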
With the rapid growth of internet-based services, the data generated by these services attracts attackers who intrude on network services and information. Based on the characteristics of these intruders, many researchers have attempted to detect intrusions with the help of automated processes. Since large volumes of data are generated and transferred through networks, security and performance remain issues. Intrusion Detection Systems (IDSs) were developed to detect and prevent intruders and secure network systems. Performance and loss are still issues because the feature space grows while detecting intruders. In this paper, a deep-clustering-based CNN is used to detect intruders, with metaheuristic algorithms for feature selection and preprocessing. The proposed system includes three phases: preprocessing, feature selection, and classification. In the first phase, the KDD dataset is preprocessed using binning normalization and an Eigen-PCA-based discretization method. In the second phase, feature selection is performed using an Information Gain based Dragonfly Optimizer (IGDFO). Finally, a deep clustering based Convolutional Neural Network (CCNN) classifier optimized with Particle Swarm Optimization (PSO) identifies intrusion attacks efficiently; the clustering loss and network loss can be reduced with the optimization algorithm. We evaluate the proposed IDS model on the NSL-KDD dataset in terms of standard evaluation metrics. The experimental results show that the proposed system achieves better performance than existing systems in terms of accuracy, precision, recall, F-measure, and false detection rate.
As IoT devices become more ubiquitous, the security of IoT-based networks becomes paramount. Machine Learning-based cybersecurity enables autonomous threat detection and prevention. However, one of the challenges of applying Machine Learning-based cybersecurity to IoT devices is feature selection, as most IoT devices are resource-constrained. This paper studies two feature selection algorithms, Information Gain and PSO-based selection, to select a minimum number of attack features, with Decision Tree and SVM utilized for performance comparison. Consistently using the same metric in the feature selection and detection algorithms substantially enhances classification accuracy compared to inconsistent pairings, such as entropy-based Information Gain feature selection with a tree detection algorithm using a classification-based splitting criterion. Furthermore, the tree with consistent feature selection is comparable to an ensemble, which provides excellent performance at the cost of computational complexity.
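A sketch of the consistent pairing on synthetic data: entropy-style (mutual information) feature selection feeding a decision tree that also splits on entropy. The dataset, feature counts, and k are stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Stand-in for an IoT attack dataset with many candidate features.
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)

# Consistent pairing: entropy-based selection + entropy-split tree.
consistent = make_pipeline(
    SelectKBest(mutual_info_classif, k=8),
    DecisionTreeClassifier(criterion="entropy", random_state=0),
)
print("consistent IG + entropy tree:", cross_val_score(consistent, X, y, cv=5).mean())
```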
This research aims to develop a model to enhance lymphatic disease diagnosis using the random forest ensemble machine-learning method trained with a simple sampling scheme. The study was carried out in two major phases: feature selection and classification. In the first stage, a number of discriminative features out of 18 were selected using PSO and several feature selection techniques to reduce the feature dimension. In the second stage, we applied the random forest ensemble classification scheme to diagnose lymphatic diseases. While experimenting with the selected features, we used the original and resampled distributions of the dataset to train the random forest classifier. Experimental results demonstrate that the proposed method achieves a remarkable improvement in classification accuracy.
To identify key symptoms of two major syndromes in chronic hepatitis B (CHB), which can serve as clinical evidence for Chinese medicine (CM) doctors to make decisions, standardization scales for CHB diagnosis in CM were designed, covering physical symptoms and tongue and pulse appearance. A total of 695 CHB cases with dampness-heat (DH) syndrome or Pi (Spleen) deficiency (SD) syndrome were collected for feature selection and modeling, and another 275 CHB patients were collected at different locations for validation. Key symptoms were selected based on a modified information gain (IG), and 5 classifiers were applied to assist with model training and validation. Classification accuracy and the area under the receiver operating characteristic curve (AUC) were evaluated. (1) Thirteen DH syndrome key symptoms and 13 SD syndrome key symptoms were selected from the original 125 symptoms; (2) the key symptoms achieved similar or better diagnostic accuracy than the original full symptom set; (3) in the validation phase, the key symptoms identified the syndromes effectively, especially DH syndrome, for which the average prediction accuracy across the 5 classifiers reached 0.864 with an average AUC of 0.772. The selected key symptoms can serve as simple diagnostic elements for DH and SD syndromes, directly applicable in clinical practice. (Registration No.: ChiCTR-DCC-10000759)
The number of Internet users and the number of web pages added to the WWW increase dramatically every day. It is therefore necessary to classify web pages into web directories automatically and efficiently; this helps search engines provide users with relevant and fast retrieval results. As web pages are represented by thousands of features, feature selection helps web page classifiers resolve this large-scale dimensionality problem. This paper proposes a new feature selection method using Ward's minimum variance measure. This measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark dataset, WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
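A sketch of the cluster-then-keep-a-representative idea using SciPy's Ward linkage over the feature columns; the data is synthetic and the representative criterion (highest variance within each cluster) is an illustrative stand-in for the paper's rule:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(4)
# Synthetic document-term matrix: 4 noisy copies of 5 base features,
# so columns come in groups of mutually redundant features.
base = rng.normal(size=(200, 5))
X = np.hstack([base + 0.05 * rng.normal(size=(200, 5)) for _ in range(4)])

# Ward's minimum variance clustering over the features (columns).
Z = linkage(X.T, method="ward")
labels = fcluster(Z, t=5, criterion="maxclust")

# Keep one representative column per cluster; eliminate the rest.
kept = [int(np.flatnonzero(labels == k)[np.argmax(X[:, labels == k].var(axis=0))])
        for k in np.unique(labels)]
print("kept features:", sorted(kept))
```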