Abstract
Data clustering algorithms that ignore the interaction between clustering grids tend to produce poor-quality clusters. To address this, a clustering algorithm for mixed-attribute big data based on grid coupling is proposed. Starting from definitions of the parameters involved in grid coupling, the distance between grid centroids during coupling is derived. The grids are then used to cluster the data, with the weighted influence of each grid on the others taken into account. The algorithm runs in two stages, online and offline. The online stage updates the grid feature vectors and partitions the grids according to dynamic changes in the attributes. The offline stage constructs an undirected graph whose vertices are the grid center points and whose edges are built from the centroid distance and the center-point spacing. A minimum spanning tree is obtained from this graph and its r-1 largest edges are cut, which finally yields the k clusters of the mixed-attribute big data and thus an accurate clustering of mixed attributes. Experimental results show that the algorithm clusters well when the centroid adjustment parameter and the centroid distance take moderate values, and that its clustering quality and efficiency are high.
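The offline stage described above (build an undirected graph over grid center points, take its minimum spanning tree, cut the largest edges) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that r equals k, so that cutting k-1 edges leaves k clusters; it uses plain Euclidean distance between numeric center points as a stand-in for the paper's edge weight, whose exact combination of centroid distance and center-point spacing is not given in the abstract; and the function name mst_clusters is hypothetical.

```python
import math
from itertools import combinations


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_clusters(centers, k):
    """Cluster grid centers by cutting the k-1 heaviest MST edges.

    centers: list of numeric coordinate tuples (assumed grid center points).
    k: desired number of clusters (assumed equal to r in the abstract).
    Returns one cluster label per center, numbered from 0 in order of
    first appearance.
    """
    n = len(centers)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of grid centers")

    # Complete undirected graph; Euclidean distance is an assumed stand-in
    # for the paper's edge weight.
    edges = sorted(
        (math.dist(centers[i], centers[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Kruskal's algorithm: accepted edges come out in ascending weight order.
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Drop the k-1 heaviest tree edges, then relabel connected components.
    kept = mst[: len(mst) - (k - 1)]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)

    roots = {}
    return [roots.setdefault(find(parent, i), len(roots)) for i in range(n)]
```

For example, with the centers (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0) and k = 3, the two longest tree edges are removed and the labels come out as [0, 0, 0, 1, 1, 2]. Categorical attributes, which the paper's mixed-attribute setting includes, would need a different distance and are not handled by this sketch.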
Authors
LI Jie (李洁)
XU Qing (许青)
ZHANG Lulu (张露露)
WANG Yingming (王英明)
Ma’anshan University, Ma’anshan 243000, China
Source
Journal of Information Engineering University (《信息工程大学学报》)
2022, No. 2, pp. 218-223 (6 pages)
Funding
Anhui Provincial University Natural Science Research Project, 2019 (KJ2019A0916).
Keywords
grid coupling
mixed attribute
big data
clustering algorithm
grid centroid
minimum spanning tree