摘要
DBSCAN是一种简单、有效的基于密度的聚类算法,用于寻找被低密度区域分离的高密度区域。DBSCAN是最经常被使用、在科学文献中被引用最多的聚类算法之一。在数据维度比较高的情况下,DBSCAN的时间复杂度为0(n2)。然而,在现实世界中,数据集的大小已经增长到超大规模。对此,一个有效率的并行的DBSCAN算法被提出,并在MapRe-duce平台下实现它。首先,对已经预处理过的数据进行划分。接下来,局部的DBSCAN算法将对每一块划分好的数据空间实现聚类。最终,利用合并算法对上一阶段的聚类结果进行合并。实验结果验证了并行算法的有效性。
DBSCAN is an effective density-based clustering method which is designed to find high-density regions which are sep- arated by low-density regions. DBSCAN is one of the most common clustering algorithms and also most cited in scientific litera- ture. In the case of the data of high dimension, the computation complexity of DBSCAN is O(n2) . However, it is challenging due to the size of datasets has been growing rapidly to extra-large scale in the real world. In this paper, an efficient parallel density- based clustering algorithm is proposed and implemented by using MapReduce. Furthermore, we adopt a quick partitioning strate- gy for data which has been preprocessed is adopted. Then, Local DBSCAN process for each subspace divided by the partition pro- file is implemented to generate clusters. At last, the clusters which are generated in the previous phase are merged.
作者
林阿弟
陈晓锋
LIN A-di, CHEN Xiao-feng (Department of Computer Science, Xiamen University, Xiamen 361005, China)
出处
《电脑知识与技术》
2015年第4期161-164,共4页
Computer Knowledge and Technology