Abstract
To address the multidimensional clustering problem that traditional clustering algorithms face when mining spatio-temporal location data, a dynamic weighted clustering model is proposed. The model layers the classic k-means algorithm and the density-based DBSCAN algorithm, and determines a suitable number of clusters by maximizing the silhouette coefficient. It clusters large-volume, multidimensional spatio-temporal location data in three steps: partitioning the data into initial clusters, identifying and removing noise points, and correcting the coordinates of the cluster centers. A formula for computing dynamic weight coefficients is proposed to improve the similarity function of DBSCAN using suitable non-position attributes. With this dynamic weighted DBSCAN, noise points in each sub-cluster are identified and removed, and the corrected center coordinates of each sub-cluster are then recomputed by applying k-means again. The model was validated by simulation in a Python 3.7 environment on a check-in (point-of-interest) dataset from a social network. The experimental results show that, compared with a single traditional clustering algorithm, the model can make combined use of multidimensional non-position attributes when clustering spatio-temporal location data points, assigns data points to clusters more reasonably, and clearly improves the clustering quality.
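The noise-removal and center-correction steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's dynamic weight formula, so the example below assumes fixed, hand-picked weights in a weighted Euclidean distance. All function names, the sample points, and the parameters `eps`, `min_pts`, and `weights` are hypothetical choices for illustration, not the authors' implementation. The initial partition by k-means with silhouette-based selection of the cluster count is omitted, and the input is treated as one already-formed sub-cluster.

```python
import math

NOISE = -1
UNVISITED = None

def weighted_distance(p, q, weights):
    """Weighted Euclidean distance over position and non-position attributes.
    Assumption: fixed weights stand in for the paper's dynamic weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))

def region_query(points, i, eps, weights):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points)
            if weighted_distance(points[i], q, weights) <= eps]

def dbscan(points, eps, min_pts, weights):
    """Plain DBSCAN using the weighted distance; returns one label per point."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps, weights)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps, weights)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels

def corrected_center(points, labels, n_position_dims=2):
    """Mean position of the non-noise points, or None if every point is noise."""
    kept = [p for p, lab in zip(points, labels) if lab != NOISE]
    if not kept:
        return None
    return tuple(sum(p[d] for p in kept) / len(kept)
                 for d in range(n_position_dims))

# Hypothetical sub-cluster: (x, y, check-in hour).
points = [
    (0.0, 0.0, 12), (0.1, 0.0, 12), (0.0, 0.1, 13), (0.1, 0.1, 12),
    (0.05, 0.05, 23),  # spatially close, but very different check-in hour
    (5.0, 5.0, 12),    # spatially distant
]
weights = (1.0, 1.0, 0.04)  # assumed weights for x, y, hour

labels = dbscan(points, eps=0.5, min_pts=3, weights=weights)
center = corrected_center(points, labels)
print(labels, center)
```

In this toy data the fifth point sits in the middle of the group spatially but is flagged as noise because of its check-in hour, which shows how a non-position attribute can change which points are kept in a cluster.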
Authors
郭名静
边少锋
单潮龙
熊鑫
GUO Mingjing; BIAN Shaofeng; SHAN Chaolong; XIONG Xin (School of Science, East China University of Technology, Nanchang 330013, China; School of Electrical Engineering, Naval University of Engineering, Wuhan 430033, China)
Source
《测绘科学》
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 35-42 (8 pages)
Science of Surveying and Mapping
Funding
National Natural Science Foundation of China (41576105, 41604010)
Jiangxi Province Education Science "13th Five-Year Plan" Project, 2018 (18YB099)
Keywords
spatio-temporal data
data mining
k-means
density clustering