Abstract
To address the slow convergence of traditional large-scale data clustering algorithms, a large-scale data clustering analysis algorithm based on cloud computing technology is proposed. Data variables in the cloud space are defined and the density of each data point is calculated. Taking the calculated density as the criterion, the data points are integrated into two different data sets, and points whose local density is below the average density are repeatedly deleted from the sets as outliers to obtain the clustering centers. Points far from the clustering centers are screened out, the similarity coefficients between the data points and the clustering centers are calculated, and the remaining data points are divided into clustering regions. A hierarchical allocation method is then used to distribute the clustering data points into the clustering regions, completing the clustering analysis of large-scale data. The experimental results show that, compared with traditional clustering methods, the convergence rate of the proposed algorithm reaches up to 10 mm/s, which is faster, indicating that the algorithm has a better convergence effect.
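The pipeline the abstract describes (compute point densities, repeatedly drop points below the average density, take the densest survivors as centers, then assign the rest by similarity to a center) can be sketched as follows. This is a minimal illustration, not the paper's exact method: the cutoff radius `dc`, the Euclidean distance, the negative-distance "similarity", and all function names are assumptions introduced here.

```python
import numpy as np

def local_density(X, dc):
    # Number of neighbors within cutoff radius dc for each point
    # (the point itself is excluded from its own count).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d < dc).sum(axis=1) - 1

def select_centers(X, dc, k):
    # Iteratively delete points whose local density is below the
    # current average, then take the k densest survivors as centers.
    idx = np.arange(len(X))
    while True:
        rho = local_density(X[idx], dc)
        keep = rho >= rho.mean()
        if keep.all() or keep.sum() <= k:
            break
        idx = idx[keep]
    rho = local_density(X[idx], dc)
    return idx[np.argsort(rho)[-k:]]

def assign(X, centers):
    # Similarity here is the negative distance to each center;
    # every point joins the center it is most similar to.
    d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=-1)
    return d.argmin(axis=1)

# Two well-separated synthetic blobs as a toy example.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centers = select_centers(X, dc=0.5, k=2)
labels = assign(X, centers)
```

The outlier-removal loop implements the abstract's "repeatedly delete points with local density below the average"; the final hierarchical allocation step of the paper is simplified here to a single nearest-center assignment.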
Author
TA Na (School of Computer, Hulunbuir University, Hulunbuir 021008, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China)
Source
《现代电子技术》
Peking University core journal (北大核心)
2020, No. 15, pp. 123-126 (4 pages)
Modern Electronics Technique
Keywords
cloud computing technology
large-scale data
clustering center
similarity coefficient
data point density
convergence rate