Abstract
To address the problem of choosing the Eps and MinPts parameters of the DBSCAN clustering algorithm, this paper proposes a domain-independent method for selecting them dynamically. First, the dataset is coarsely clustered with the k-means algorithm, with the initial cluster centers determined by the max-min distance method. Second, for each k-means cluster, the distribution of pairwise distances between its samples is computed, the distance value shared by the largest number of sample pairs is taken as that cluster's Eps, and MinPts is then derived from Eps. Finally, DBSCAN is modified so that its working Eps adapts to the Eps of the k-means cluster to which the current core point belongs. The approach is applied to clustering bitstreams of unknown protocols. The results show that satisfactory clusters are obtained without the user specifying Eps or MinPts, which improves the applicability and accuracy of the algorithm.
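The parameter-selection steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of the first sample as the starting center, the histogram `bin_width`, and the MinPts rule (mean number of neighbours within Eps) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter
from itertools import combinations


def max_min_centers(points, k):
    """Pick k initial centers with the max-min distance rule."""
    centers = [points[0]]  # assumption: start from the first sample
    while len(centers) < k:
        # Choose the sample whose nearest chosen center is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def estimate_eps(cluster, bin_width):
    """Return the upper edge of the distance bin holding the most sample pairs."""
    bins = Counter(
        int(math.dist(a, b) // bin_width) for a, b in combinations(cluster, 2)
    )
    # Ties are broken in favour of the smaller distance (an assumption).
    best_bin = max(bins, key=lambda b: (bins[b], -b))
    return (best_bin + 1) * bin_width


def estimate_min_pts(cluster, eps):
    """Assumed heuristic: mean number of neighbours within eps, at least 1."""
    counts = [
        sum(
            1
            for j, q in enumerate(cluster)
            if j != i and math.dist(p, q) <= eps
        )
        for i, p in enumerate(cluster)
    ]
    return max(1, round(sum(counts) / len(counts)))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print(max_min_centers(data, 2))

    square = [(0, 0), (0, 1), (1, 0), (1, 1)]
    eps = estimate_eps(square, bin_width=0.25)
    print(eps, estimate_min_pts(square, eps))
```

In a full pipeline, these centers would seed k-means, and each resulting cluster would get its own Eps and MinPts for the adaptive DBSCAN pass. The bin width controls how coarsely distances are grouped and would need tuning for real data.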
Authors
WANG Zhaofeng
SHAN Ganlin
Electronics and Optics Engineering Department, Ordnance Engineering College, Shijiazhuang 050003, China
Source
Computer Engineering and Applications
CSCD
PKU Core (Peking University core journal list)
2017, No. 3, pp. 80-86 (7 pages)
Keywords
clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
parameter selection
k-means algorithm
unknown protocol