Abstract
Spatio-temporal trajectory data mining is an important way to discover the behavior patterns of moving objects. To meet the demand for processing massive trajectory data, an incremental, parallelized fast clustering algorithm is proposed. Based on the number of data points, the algorithm partitions the spatial grid by bisection and uses a greedy algorithm to elastically regroup the partitions so that the data are divided reasonably. Local clustering is then performed within each partition to obtain a candidate set of clusters to merge. The candidate clusters are indexed with an R*-tree, which is used to decide and carry out the merges. Finally, an undirected acyclic graph model of the merged clusters is built and the data are globally relabeled. Experimental results show that the elastic partitioning effectively reduces noise points and improves the quality of local clustering, and that the R*-tree-based merging strategy effectively improves the time efficiency of clustering. The algorithm clusters well and can process large-scale data online.
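The final step described in the abstract, global relabeling of merged clusters, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the merge decisions have already been made (for example by the R*-tree step) and are available as pairs of local cluster identifiers, and it uses a union-find forest as one possible acyclic graph representation of the merged clusters. The names `build_global_labels`, `relabel_points`, and the `(partition, local_label)` identifier format are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical identifier for a local cluster: (partition index, local label).
ClusterId = Tuple[int, int]


def build_global_labels(
    clusters: Iterable[ClusterId],
    merge_pairs: Iterable[Tuple[ClusterId, ClusterId]],
) -> Dict[ClusterId, int]:
    """Map each local cluster to a global label, given pairs judged mergeable."""
    merge_pairs = list(merge_pairs)
    parent: Dict[ClusterId, ClusterId] = {c: c for c in clusters}
    # Make sure clusters that only appear in merge pairs are registered too.
    for a, b in merge_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(c: ClusterId) -> ClusterId:
        # Walk up to the root, halving the path along the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in merge_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Linking only distinct roots keeps the structure a forest (no cycles).
            parent[root_b] = root_a

    # Assign consecutive global labels, one per root, in a deterministic order.
    root_to_label: Dict[ClusterId, int] = {}
    labels: Dict[ClusterId, int] = {}
    for c in sorted(parent):
        root = find(c)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels[c] = root_to_label[root]
    return labels


def relabel_points(
    point_labels: List[Optional[ClusterId]],
    labels: Dict[ClusterId, int],
    noise: int = -1,
) -> List[int]:
    """Replace each point's local cluster id with its global label; None is noise."""
    return [noise if lab is None else labels[lab] for lab in point_labels]
```

For example, with local clusters `(0, 0)`, `(0, 1)`, `(1, 0)`, `(2, 0)` and merge pairs `((0, 1), (1, 0))` and `((1, 0), (2, 0))`, the last three clusters share one global label while `(0, 0)` keeps its own. Because each partition only reports merge pairs, new partitions can be folded in later by adding pairs, which fits the incremental setting the abstract describes.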
Authors
Wang Xing
Wu Yi
Jiang Xinhua
Liao Lüchao
Affiliations: School of Information Science and Engineering, Central South University, Changsha 410075, Hunan, China; School of Math and Information, Fujian Normal University, Fuzhou 350108, Fujian, China; Fujian Key Laboratory of Automotive Electronic and Electrical Drive Technology, Fujian University of Technology, Fuzhou 350108, Fujian, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal
2018, No. 4, pp. 269-275, 280 (8 pages)
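Funding
National Natural Science Foundation of China (61304199, 41471333)
Fujian Province Program for Outstanding Young Scientific Research Talents in Universities (JA14209)
Fujian Provincial Department of Education Project (JA15325)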
Keywords
Big data
DBSCAN
Balanced partitioning
Incremental
Parallelization