
An Efficient Distributed Clustering Algorithm Based on Peak Density (一种基于密度峰值的高效分布式聚类算法)

Cited by: 4
Abstract: Density peak clustering (DPC) is a recently proposed, efficient density-based clustering algorithm. It can cluster data with non-spherical distributions, has few parameters to tune, and runs quickly. However, computing each data object's density and its distance to the nearest higher-density object requires pairwise distance measurements, giving a time complexity of O(n^2). In the big-data era, and especially on massive high-dimensional data, this greatly reduces the algorithm's efficiency. To improve efficiency and scalability, we exploit Spark's strengths in in-memory and iterative computation and propose ELSDPC (an efficient distributed density peak clustering algorithm based on E2LSH partition with Spark). The algorithm exploits the local nature of DPC and introduces locality-sensitive hashing (LSH) to assign neighboring points to the same partition. Experimental analysis shows that the algorithm effectively improves the scalability and time efficiency of clustering while maintaining relatively high accuracy.
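The two ingredients named in the abstract can be illustrated with a small single-machine sketch. This is not the paper's ELSDPC implementation and it does not use Spark. It assumes the cutoff-kernel definitions from the original DPC formulation (density as a neighbor count within a cutoff distance `dc`) and the standard E2LSH hash family h(v) = floor((a·v + b) / w). The function names and the parameters `dc`, `k`, `w`, and `seed` are illustrative choices, not values taken from the paper.

```python
import math
import random
from collections import defaultdict


def dpc_rho_delta(points, dc):
    """Return (rho, delta) per point, assuming the cutoff-kernel form of DPC.

    rho[i]   = number of other points closer than dc to point i
    delta[i] = distance to the nearest point with strictly higher rho
               (for a point of maximal rho: its largest distance to any point)
    """
    n = len(points)
    # All pairwise Euclidean distances: this is the O(n^2) step.
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def e2lsh_partition(points, k=2, w=4.0, seed=0):
    """Group point indices by a k-part E2LSH key (hypothetical helper).

    Each hash is floor((a . v + b) / w) with a ~ N(0, 1)^dim and b in [0, w).
    Nearby points tend to share a key; distant points usually do not.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.random() * w)
             for _ in range(k)]
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, p)) + b) / w)
            for a, b in funcs
        )
        buckets[key].append(idx)
    return dict(buckets)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    rho, delta = dpc_rho_delta(pts, dc=1.5)
    print(rho, delta)
    print(e2lsh_partition(pts))
```

In a partitioned variant, one would presumably run `dpc_rho_delta` inside each bucket rather than over the whole data set, so the quadratic cost applies to bucket sizes instead of n. How ELSDPC merges or corrects results across partitions is not stated in the abstract and is not modeled here.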
Authors: HE Tong (何仝); XU Wei-hong (徐蔚鸿); MA Hong-hua (马红华); ZENG Shui-ling (曾水玲). Affiliations: School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 440114, China; Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha, Hunan 440114, China; Zixing City Science Bureau, Chenzhou, Hunan 23400, China.
Source: Computing Technology and Automation (《计算技术与自动化》), 2019, No. 2, pp. 64-71 (8 pages)
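Funding: National Natural Science Foundation of China (61363033); Hunan Provincial Science and Technology Service Platform Special Project (2012TP1001); Key Project of the Hunan Provincial Department of Education (17A007); Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation Project (2015TP1005); Changsha Science and Technology Plan Project (KQ1703018, KQ1706064)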
Keywords: clustering; density peak; big data; locality-sensitive hashing (LSH); Spark
