Abstract
To address the problem of clustering in-cloud big data and the shortcomings of existing data clustering methods, this paper proposes a fuzzy clustering method for in-cloud big data based on a grid index. The data space is first partitioned into grids, and a grid index mechanism is built on this partition to identify and describe the distribution and location of data in the space. In practice, the grid partitioning level must be chosen to suit the actual requirements so that the index remains effective. Then, according to the distribution of data in each grid of the spatial index, neighboring high-density grids are merged and their boundaries are softened, which updates the data clusters and adjusts their boundaries. Finally, clustering information is obtained with a membership function, completing the fuzzy clustering of in-cloud big data. Experimental results show that the proposed method requires fewer clustering iterations than the comparison methods, effectively identifies the actual number of clusters in a data set, runs in less time, and remains robust even on noisy data sets.
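The pipeline described in the abstract (grid partitioning, grid index, merging of neighboring high-density grids, membership-based assignment) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption for illustration only: the function names, the `cell_size` and `min_pts` parameters, the rule that touching dense cells are merged, and the use of the standard fuzzy c-means membership formula as a stand-in for the paper's boundary softening and membership function.

```python
import math
from collections import defaultdict, deque
from itertools import product


def build_grid_index(points, cell_size):
    """Map each grid cell (a tuple of ints) to the indices of the points inside it."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(x / cell_size) for x in p)
        index[cell].append(i)
    return index


def merge_dense_cells(index, min_pts):
    """Merge neighboring dense cells into groups (assumed rule: cells that touch)."""
    dense = {c for c, idx in index.items() if len(idx) >= min_pts}
    seen, clusters = set(), []
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            cell = queue.popleft()
            group.append(cell)
            # Visit every adjacent cell, including diagonal neighbors.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(group)
    return clusters


def fuzzy_memberships(points, index, clusters, m=2.0):
    """Return cluster centers and one membership row per point.

    Uses the standard fuzzy c-means membership formula (an assumption,
    not necessarily the paper's function).
    """
    centers = []
    for group in clusters:
        members = [points[i] for cell in group for i in index[cell]]
        dim = len(members[0])
        centers.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            row = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(row)
        rows.append([v / total for v in row])
    return centers, rows
```

In this sketch, points in sparse cells (for example, isolated noise) never seed a cluster but still receive soft memberships in every cluster, which is one simple way to read the "softened boundary" idea. A smaller `cell_size` corresponds to a finer grid partitioning level.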
Authors
康耀龙
冯丽露
张景安
KANG Yao-long; FENG Li-lu; ZHANG Jing-an (College of Computer and Network Engineering, Shanxi Datong University, Datong, Shanxi 037009, China; College of Educational Science and Technology, Shanxi Datong University, Datong, Shanxi 037009, China; Shanxi Datong University Network Information Center, Datong, Shanxi 037009, China)
Source
《计算机仿真》
Peking University Core Journal
2019, No. 12, pp. 341-344, 441 (5 pages)
Computer Simulation
Funding
Special Fund Project of the Datong Municipal Commission of Economy and Information Technology (JXW2017001)
Keywords
Grid index
Big data
Data clustering
Fuzzy clustering
Membership function