Abstract
In passive network device identification based on network traffic analysis, the traffic data is often high-dimensional, and some features contribute little to device identification or even seriously degrade classification results and performance. To address this problem, this paper proposes FSSA, a network traffic feature selection algorithm that combines the Filter and Wrapper approaches and is based on symmetric uncertainty (SU) and the approximate Markov blanket (AMB). The method first uses SU to select the features that contribute to classifying each category and removes irrelevant feature attributes. It then applies the AMB criterion to delete redundant features from the candidate feature subset. Finally, a Wrapper step based on the C4.5 classification algorithm performs the final feature selection. Experiments show that the features selected by this method improve the precision of network device operating system type identification compared with classical feature selection methods, and that recall on small classes also improves.
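The two Filter stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' FSSA implementation: it assumes discrete feature values, uses the common definition SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), and uses the usual approximate Markov blanket test (a kept feature F_i covers F_j when SU(F_i, C) ≥ SU(F_j, C) and SU(F_i, F_j) ≥ SU(F_j, C)). The function names, the `threshold` value, and the feature names (`ttl`, `ttl_bucket`, `win_size`, `noise`) are hypothetical, and the final C4.5-based Wrapper stage is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means equivalent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def su_amb_filter(features, labels, threshold=0.1):
    """Return feature names kept after the SU relevance and AMB redundancy steps.

    features: dict mapping a feature name to its list of discrete values.
    threshold: assumed relevance cut-off (hypothetical; not taken from the paper).
    """
    relevance = {name: symmetric_uncertainty(col, labels)
                 for name, col in features.items()}
    # Step 1: keep only features relevant to the class, most relevant first.
    candidates = sorted((n for n in relevance if relevance[n] > threshold),
                        key=lambda n: relevance[n], reverse=True)
    # Step 2: drop a feature if an already kept (more relevant) feature
    # forms an approximate Markov blanket for it.
    selected = []
    for name in candidates:
        redundant = any(
            symmetric_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data: hypothetical traffic features for eight flows from two OS types.
labels = ["linux"] * 4 + ["windows"] * 4
features = {
    "ttl":        [64, 64, 64, 128, 128, 128, 128, 128],
    "ttl_bucket": ["low", "low", "low", "high", "high", "high", "high", "high"],
    "win_size":   [0, 0, 1, 0, 1, 1, 1, 1],
    "noise":      [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = su_amb_filter(features, labels)
print(selected)
```

In this toy data, `noise` is independent of the label and is removed by the relevance step, while `ttl_bucket` is a relabelled copy of `ttl` and is removed by the redundancy step. A Wrapper stage would then evaluate subsets of the remaining features with a classifier.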
Authors
庞玉林
李喜旺
PANG Yu-Lin; LI Xi-Wang (Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《计算机系统应用》
2022, Issue 4, pp. 281-287 (7 pages)
Computer Systems & Applications
Funding
Liaoning Revitalization Talents Program (XLYC2019019).