Abstract
Many misuse detection systems cannot detect unknown attacks. Anomaly detection systems can accurately detect unknown attacks, but intrusion data are inherently imbalanced: intrusion events are far rarer than normal events, which makes it difficult to apply machine learning methods to intrusion detection efficiently. To address this, an intrusion detection system based on information gain and a random forest classifier is proposed. To reduce the imbalance between classes, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training data set. A feature selection method based on information gain is proposed and used to construct a reduced feature subset of the data set. First, the random forest algorithm is used to build an intrusion model from the training set, yielding a misuse detection model that detects known attacks by matching the features of network connections. Then, using the features retained by the information-gain-based feature selection, the network connection data of uncertain attacks are clustered with the random forest, which enables the detection of unknown attacks. The experiments use the NSL-KDD intrusion detection data set, an enhanced version of the KDDCUP'99 data set. Owing to the inherent characteristics of intrusion detection, the classes in the NSL-KDD data set are highly imbalanced. The experimental results show that the random forest classifier, combined with SMOTE and information-gain-based feature selection, reaches a detection rate of up to 0.962 on the minority classes.
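The two preprocessing ideas named in the abstract, ranking features by information gain and oversampling the minority class by interpolation, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names (`entropy`, `information_gain`, `smote_like`), the parameter `k`, and the toy data are all assumptions made for illustration, and `smote_like` is a simplified variant of SMOTE rather than a reference version.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus label entropy conditioned on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up toy records, not taken from NSL-KDD.
labels = ["normal", "normal", "normal", "attack", "attack", "normal"]
features = {
    "protocol": ["tcp", "tcp", "udp", "icmp", "icmp", "udp"],  # separates the classes
    "flag": ["SF", "S0", "SF", "S0", "SF", "S0"],              # uninformative here
}

# Rank features by information gain; a reduced subset keeps the top ones.
ranking = sorted(features, key=lambda f: information_gain(features[f], labels),
                 reverse=True)
print(ranking)

# Oversample a hypothetical numeric minority class with three samples.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote_like(minority_points, n_new=4))
```

In a full pipeline the balanced, feature-reduced training set would then be passed to a random forest classifier; that step is omitted here to keep the sketch free of third-party dependencies.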
Source
Journal of North University of China (Natural Science Edition)
CAS
2018, No. 1, pp. 74-79, 88 (7 pages)
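Funding
National Natural Science Foundation of China (11404398)
Key Research Project of the Henan Provincial Department of Science and Technology (142102210097)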