Abstract
Density-based spatial clustering of applications with noise (DBSCAN) scales poorly as the data size grows. This paper proposes an improved algorithm, fast density-based spatial clustering of applications with noise (F-DBSCAN). Objects in the neighborhood of a core object are only marked and are no longer checked for expansion. Clusters are merged by testing whether the neighborhood of a core object contains already-marked objects, and a border object is confirmed as noise or not by testing whether its neighborhood contains a core object. The method avoids the repeated processing of overlapping regions in the original DBSCAN and, without building a spatial index, has a time complexity approaching O(n log n). Clustering quality is validated on both artificial and real datasets, and efficiency is validated on two real datasets from different industries. The experiments indicate that F-DBSCAN achieves good clustering quality together with better efficiency and better scalability with data size.
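The marking, merging, and noise-checking steps described in the abstract can be sketched roughly as follows. This is only one reading of the abstract, not the authors' implementation: the function name `f_dbscan_sketch`, the parameters `eps` and `min_pts`, the union-find bookkeeping, and the second pass over non-core candidates are all assumptions introduced for illustration. Neighborhoods are found by brute force, so the sketch makes no attempt to reproduce the reported time complexity.

```python
import math


def f_dbscan_sketch(points, eps, min_pts):
    """Return one cluster label per point; -1 means noise (illustrative only)."""
    n = len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; it includes i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    parent = []  # union-find over provisional cluster ids

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    label = [None] * n   # provisional cluster id, None while unmarked
    candidates = []      # checked objects that turned out not to be core

    for i in range(n):
        if label[i] is not None:
            continue     # already marked by a core object: not checked again
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            candidates.append((i, nbrs))
            continue
        cid = len(parent)            # i is a core object: open a new cluster
        parent.append(cid)
        for j in nbrs:
            if label[j] is None:
                label[j] = cid       # mark only, no expansion check
            else:
                # A marked object lies in this neighborhood: merge clusters.
                parent[find(label[j])] = find(cid)

    # Assumed second pass: an unmarked non-core object joins a cluster only
    # if its neighborhood contains a core object; otherwise it stays noise.
    for i, nbrs in candidates:
        if label[i] is not None:
            continue
        for q in nbrs:
            if label[q] is not None and len(neighbors(q)) >= min_pts:
                label[i] = label[q]
                break

    roots = {}           # renumber merged clusters as 0, 1, 2, ...
    result = []
    for c in label:
        if c is None:
            result.append(-1)
        else:
            result.append(roots.setdefault(find(c), len(roots)))
    return result
```

Calling `f_dbscan_sketch(points, eps, min_pts)` with a list of coordinate tuples returns a list of integer labels in the same order. Because marked objects are never re-examined, the result can differ from classic DBSCAN on some inputs, which is the trade-off the abstract describes.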
Source
Journal of Data Acquisition and Processing (《数据采集与处理》)
CSCD
Peking University Core Journals (北大核心)
2015, No. 4, pp. 888-895 (8 pages)
Funding
Supported by the Jiangsu Province Social Development Project (BE2010638)