INCREMENTAL PARALLELIZATION OF FAST CLUSTERING BASED ON DBSCAN ALGORITHM UNDER LARGE-SCALE DATA SET

Cited by: 7
Abstract: Spatio-temporal trajectory data mining is an important way to discover the behavior patterns of moving objects. To meet the demand for processing massive trajectory data, an incremental, parallelized fast clustering algorithm is proposed. The algorithm partitions the spatial grid by bisection according to the number of data points and elastically regroups the partitions with a greedy algorithm, so that the data are divided reasonably. Local clustering is then performed on each partition to obtain a candidate set of clusters to be merged. The candidate clusters are indexed with an R*-tree, which is used to decide on and carry out the merges. Finally, an undirected acyclic graph model of the merged clusters is built and the data are globally relabeled. Experimental results show that the elastic partitioning effectively reduces noise points and improves the quality of local clustering, and that the merging strategy based on the R*-tree index effectively improves the time efficiency of clustering. The algorithm clusters well and supports online processing of large-scale data.
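Two of the steps named in the abstract, count-based bisection of the space and global relabeling of merged clusters, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `bisect_by_count` and `relabel_globally`, the alternating split axis, the `max_points` threshold, and the use of union-find to resolve the merge graph are all assumptions made for illustration. The greedy regrouping and the R*-tree merge test are not covered.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def bisect_by_count(points: Sequence[Point], max_points: int,
                    depth: int = 0) -> List[List[Point]]:
    """Split points at the median until each cell holds at most max_points.

    The split axis alternates between x and y (an assumption; the abstract
    only says that bisection is driven by the number of data points).
    """
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(points) <= max_points:
        return [list(points)]
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (bisect_by_count(ordered[:mid], max_points, depth + 1)
            + bisect_by_count(ordered[mid:], max_points, depth + 1))


def relabel_globally(local_labels: Sequence[Hashable],
                     merge_pairs: Iterable[Tuple[Hashable, Hashable]]
                     ) -> Dict[Hashable, int]:
    """Map each local cluster label to a global cluster id.

    merge_pairs are edges between local clusters that should be merged;
    every label in a pair must also appear in local_labels. Connected
    local clusters receive the same global id (union-find stand-in for
    the merge graph described in the abstract).
    """
    parent = {label: label for label in local_labels}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in merge_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    global_ids: Dict[Hashable, int] = {}
    result: Dict[Hashable, int] = {}
    for label in local_labels:
        root = find(label)
        if root not in global_ids:
            global_ids[root] = len(global_ids)
        result[label] = global_ids[root]
    return result
```

In this sketch, each cell returned by `bisect_by_count` would be clustered independently, for example with a standard DBSCAN run per cell, and the pairs of local clusters judged mergeable would then be passed to `relabel_globally` to obtain consistent global labels.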
Authors: Wang Xing; Wu Yi; Jiang Xinhua; Liao Lüchao (School of Information Science and Engineering, Central South University, Changsha 410075, Hunan, China; School of Math and Information, Fujian Normal University, Fuzhou 350108, Fujian, China; Fujian Key Laboratory of Automotive Electronic and Electrical Drive Technology, Fujian University of Technology, Fuzhou 350108, Fujian, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2018, No. 4, pp. 269-275, 280 (8 pages)
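Funding: National Natural Science Foundation of China (61304199, 41471333); Fujian Province University Outstanding Young Scientific Research Talent Program (JA14209); Fujian Provincial Department of Education Project (JA15325)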
Keywords: Big data; DBSCAN; Balanced partitioning; Increment; Parallelization
