期刊文献+

基于MapReduce的DBSCAN聚类算法的并行实现

The Realization of MapReduce-based DBSCAN Density-base Clustering Method
下载PDF
导出
摘要 DBSCAN是一种简单、有效的基于密度的聚类算法,用于寻找被低密度区域分离的高密度区域。DBSCAN是最经常被使用、在科学文献中被引用最多的聚类算法之一。在数据维度比较高的情况下,DBSCAN的时间复杂度为0(n2)。然而,在现实世界中,数据集的大小已经增长到超大规模。对此,一个有效率的并行的DBSCAN算法被提出,并在MapRe-duce平台下实现它。首先,对已经预处理过的数据进行划分。接下来,局部的DBSCAN算法将对每一块划分好的数据空间实现聚类。最终,利用合并算法对上一阶段的聚类结果进行合并。实验结果验证了并行算法的有效性。 DBSCAN is an effective density-based clustering method which is designed to find high-density regions which are sep- arated by low-density regions. DBSCAN is one of the most common clustering algorithms and also most cited in scientific litera- ture. In the case of the data of high dimension, the computation complexity of DBSCAN is O(n2) . However, it is challenging due to the size of datasets has been growing rapidly to extra-large scale in the real world. In this paper, an efficient parallel density- based clustering algorithm is proposed and implemented by using MapReduce. Furthermore, we adopt a quick partitioning strate- gy for data which has been preprocessed is adopted. Then, Local DBSCAN process for each subspace divided by the partition pro- file is implemented to generate clusters. At last, the clusters which are generated in the previous phase are merged.
作者 林阿弟 陈晓锋 LIN A-di, CHEN Xiao-feng (Department of Computer Science, Xiamen University, Xiamen 361005, China)
出处 《电脑知识与技术》 2015年第4期161-164,共4页 Computer Knowledge and Technology
关键词 DBSCAN MAPREDUCE 聚类算法 并行算法:数据挖掘 DBSCAN mapreduce clustering algorithms parallel algorithms data mining
  • 相关文献

参考文献7

  • 1Ester M, Kriegel H P, Sander J, et al. A density based algo- rithm for discovering clusters in large spatial databases[C]. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Min- ing. Portland, 1996:226 - 231.
  • 2Bentley J L. Multidimensional binary search trees used for as- sociative searching[J]. Communications of the ACM, 1975, 18 (9): 509-517.
  • 3Guttman A. R-trees: a dynamic index structure for spatial searching[J]. ACM, 1984, 14(2): 47-57.
  • 4Beekmann N, Kriegel H P, Schneider R, et aL The R*-tree: an efficient and robust access method for points and rectangles [M]. ACM, 1990,19(2): 322-331.
  • 5Xu X, Jager J, Kriegel H P. A fast parallel clustering algo- rithm for large spatial databases[J]. Data Min. Knowl. Diseov, 1999(3): 263 - 290.
  • 6Yaobin He, Tan Haoyu, Luo Wuman, Huajian Mao, Di Ma, Shengzhong Feng, Jianping Fan.MR-DBSCAN: An Efficient Parallel Density-based Clustering Algorithm using MapReduce [J]. 2011 IEEE 17th International Conference on Parallel and Distributed Systems,2011: 473-480.
  • 7Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters[J]. Proceedings of the 6th conference on Sym- posium on Opearting Systems Design & Implementation - Vol- ume 6. Berkeley,CA, USA: USENIX Association, 2004: 10.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部