Abstract
This work aims to improve data type identification techniques, refine the identification method, and address the difficulty that current approaches have in identifying compound files. Through experiments on eight common data types, the author first shortlists several classification algorithms, including naive Bayes, and proposes a multi-aspect parameter selection method based on the support vector machine (SVM). The new data type identification method is then compared experimentally against traditional file type identification, and a function analysis method for data type identification is determined. The experiments show that the SVM-based data type identification method takes a long time to build its model but achieves a high recognition rate, so it is identified as the new machine-learning-based data type identification method to adopt going forward.
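As a rough illustration of the kind of classifier the abstract lists among its candidates, the following sketch trains a multinomial naive Bayes model on file fragments using only the Python standard library. It is not the author's SVM method: the abstract does not state which features were used, so the byte-frequency histogram, the toy training fragments, and the names `byte_histogram` and `FragmentNaiveBayes` are all assumptions made for the example.

```python
import math
from collections import Counter


def byte_histogram(fragment):
    """Count how often each of the 256 byte values occurs (assumed feature)."""
    counts = Counter(fragment)
    return [counts.get(value, 0) for value in range(256)]


class FragmentNaiveBayes:
    """Multinomial naive Bayes over byte histograms (hypothetical helper)."""

    def fit(self, fragments, labels):
        self.classes = sorted(set(labels))
        totals = {c: [0] * 256 for c in self.classes}
        docs_per_class = Counter(labels)
        for fragment, label in zip(fragments, labels):
            for value, n in enumerate(byte_histogram(fragment)):
                totals[label][value] += n
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(labels)) for c in self.classes
        }
        self.log_likelihood = {}
        for c in self.classes:
            # Laplace smoothing: add 1 to each of the 256 byte counts.
            denominator = sum(totals[c]) + 256
            self.log_likelihood[c] = [
                math.log((n + 1) / denominator) for n in totals[c]
            ]
        return self

    def predict(self, fragment):
        histogram = byte_histogram(fragment)

        def score(c):
            return self.log_prior[c] + sum(
                n * ll for n, ll in zip(histogram, self.log_likelihood[c])
            )

        return max(self.classes, key=score)


# Toy training data (made up for this sketch, not from the paper).
train_fragments = [
    b"the quick brown fox jumps over the lazy dog",
    b"pack my box with five dozen liquor jugs",
    bytes(range(128, 256)),
    bytes(range(255, 127, -1)),
]
train_labels = ["text", "text", "binary", "binary"]

model = FragmentNaiveBayes().fit(train_fragments, train_labels)
print(model.predict(b"hello world, plain text"))
print(model.predict(bytes([200, 201, 255, 130] * 10)))
```

A model like this is quick to build, which matches the trade-off the abstract reports: the SVM-based method takes longer to train but reaches a higher recognition rate.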
Author
李锐
LI Rui (School of Mathematics and Computational Science, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China)
Source
《信息与电脑》
2021, No. 16, pp. 150-153 (4 pages)
Information & Computer
Keywords
machine learning
file type
support vector machine algorithm
file fragments