Application of Improved DBSCAN Algorithm on Spark Platform (cited by: 7)

Abstract To address the high memory usage of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm, this paper combines an improved DBSCAN clustering algorithm with the parallel clustering computation of the Spark platform and clusters massive data with a divide-and-conquer approach, which substantially reduces the algorithm's memory usage. Simulation results show that the proposed parallel method effectively alleviates memory shortage. The method can also be used to evaluate the clustering performance of DBSCAN on the Hadoop platform and to compare the two clustering approaches, yielding better computational performance. Its speedup is about 24% higher than on the Hadoop platform, so it can be used to assess the strengths and weaknesses of the DBSCAN clustering algorithm in clustering tasks.
Author DENG Ding-sheng (邓定胜), School of Science and Technology, Sichuan Minzu College, Kangding, Sichuan 626001, China
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2020, Issue S02, pp. 425-429, 443 (6 pages)
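Funding Key Natural Science Project of Sichuan Minzu College (XYZB19001ZA); Key Natural Science Project of the Sichuan Provincial Department of Education (17ZA0295); Sichuan Minzu College 2017 Applied Demonstration Course Project (sfkc201705); National Natural Science Foundation of China (11461058).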
Keywords Parallel computing; DBSCAN; Clustering algorithm; Spark; Clustering acceleration ratio
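
The divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's improved algorithm, whose details are not given on this page. It is plain DBSCAN applied independently to data partitions, under several assumptions made only for illustration: the data are split into equal-width strips along the first coordinate, clusters that cross a strip boundary are not merged, and the names `dbscan`, `partitioned_dbscan`, `eps`, `min_pts`, and `n_parts` are hypothetical. On Spark, the per-partition loop would presumably run on each partition in parallel; here it runs sequentially in pure Python so that only one partition needs to be examined at a time.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on a list of coordinate tuples; returns one label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nb)
        cluster += 1
    return labels


def partitioned_dbscan(points, eps, min_pts, n_parts):
    """Assumed scheme: split into strips along x, cluster each strip alone.

    Labels are (partition_id, local_cluster_id) tuples, or NOISE.
    Clusters spanning a strip boundary are NOT merged in this sketch.
    """
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_parts or 1.0  # avoid zero width if all x are equal
    parts = {}
    for idx, p in enumerate(points):
        k = min(int((p[0] - lo) / width), n_parts - 1)
        parts.setdefault(k, []).append(idx)

    labels = [NOISE] * len(points)
    for k, idxs in parts.items():
        # Only this partition's points are handed to DBSCAN at once.
        local = dbscan([points[i] for i in idxs], eps, min_pts)
        for i, lab in zip(idxs, local):
            labels[i] = NOISE if lab == NOISE else (k, lab)
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 50)]
    print(partitioned_dbscan(data, eps=1.5, min_pts=3, n_parts=2))
```

Because each partition is clustered on its own, the neighborhood search touches only that partition's points, which is where a memory saving of this kind would come from. A complete method would also need a merge step for clusters that straddle partition boundaries, which this sketch omits.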
