Abstract
To address feature selection and dimensionality reduction in malicious URL detection systems, several feature subsets are proposed based on the results of feature selection optimization. Multiple feature selection algorithms each determine a feature subset: information gain, chi-square test, information gain ratio, Relief value, the OneR classifier, correlation rules, and correlation attribute evaluation. These subsets are evaluated with four machine learning classifiers (random forest, Bayesian network, J48, and random tree), using classifier-based performance metrics such as accuracy and recall. The results show that, except for the subset determined by correlation attribute evaluation, the feature subsets determined by all the other algorithms achieve good classification performance. In particular, the subset selected by the correlation-rule-based algorithm has a dimensionality of only 5, yet every classifier achieves a classification accuracy above 99% on it.
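The abstract lists several filter-style feature selection criteria without defining them. The following is a minimal Python sketch, not the authors' implementation, of three of them for discrete features: information gain, information gain ratio, and the chi-square statistic. It also includes a helper that ranks features by a chosen score and keeps the top k. The feature names (`has_ip`, `long_url`, `uses_https`) and the six labelled URLs are hypothetical toy values chosen only for illustration. Relief, OneR, and the correlation-based evaluators are not covered.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """H(labels) minus the conditional entropy H(labels | feature)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information H(feature)."""
    split_info = entropy(feature)
    return 0.0 if split_info == 0 else info_gain(feature, labels) / split_info


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    feature_counts, label_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for v, fv in feature_counts.items():
        for c, lc in label_counts.items():
            expected = fv * lc / n  # expected count if feature and class were independent
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def top_k(features, labels, score, k):
    """Rank named features by score (highest first) and keep the best k."""
    ranked = sorted(features, key=lambda name: score(features[name], labels), reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = malicious URL, 0 = benign URL.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "has_ip": [1, 1, 0, 0, 0, 0],      # host is a raw IP address (hypothetical)
    "long_url": [1, 0, 1, 0, 1, 0],    # URL longer than some threshold (hypothetical)
    "uses_https": [0, 0, 0, 1, 1, 1],  # scheme is https (hypothetical)
}

for name in features:
    print(name,
          round(info_gain(features[name], labels), 3),
          round(gain_ratio(features[name], labels), 3),
          round(chi_square(features[name], labels), 3))

print(top_k(features, labels, info_gain, 2))  # → ['uses_https', 'has_ip']
```

Each criterion scores features independently, so the subset size k is a separate choice. On this toy data all three criteria happen to produce the same ranking. On real URL features they can disagree, which is why comparing the resulting subsets with several classifiers, as the abstract describes, is informative.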
Authors
ZHANG Hui (张慧)
QIAN Liping (钱丽萍)
WANG Lidong (汪立东)
YUAN Chen (袁辰)
ZHANG Ting (张婷)
College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal
2019, No. 9, pp. 60-64 (5 pages)
Funding
National Natural Science Foundation of China (61571144)
Beijing University of Civil Engineering and Architecture Doctoral Fund (00331616014)
Keywords
network security
malicious URL detection
feature extraction
feature selection
feature subset
information security