Abstract
Conventional density clustering methods for unstructured big data are time-consuming and prone to assigning data densities incorrectly, which degrades clustering accuracy. This paper therefore proposes a fast density clustering method for unstructured big data based on deep learning. First, a data density function is used to compute the density value of each unstructured data item, and a proximity search technique is used to find the best center of each cluster. Next, AlexNet is used to build a learning framework for data clustering: feature vectors are extracted through a mapping, and pseudo-labels obtained from the loss function serve as the basis for backpropagation. To improve the clustering speed and accuracy of the model, mini-batch gradient descent is introduced to optimize the model parameters, completing the density clustering of unstructured big data. Experimental results show that the proposed method groups data of similar density tightly and keeps data with large density differences far apart, yielding good density clustering performance.
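The abstract does not give the density function or the center-selection rule, so the following is only a rough sketch of the first two steps (per-point density, then a proximity search for cluster centers). It assumes a density-peaks-style formulation with a Gaussian kernel; the names `local_density`, `nearest_denser`, `density_cluster`, and the parameters `cutoff` and `k` are hypothetical and do not come from the paper. The AlexNet feature extraction and pseudo-label training stages are not covered here.

```python
import math

def local_density(points, cutoff):
    """Gaussian-kernel density per point (assumed form, not taken from the paper)."""
    dens = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total += math.exp(-(math.dist(p, q) / cutoff) ** 2)
        dens.append(total)
    return dens

def nearest_denser(points, dens):
    """For each point, find the nearest point that ranks higher in density."""
    n = len(points)
    order = sorted(range(n), key=lambda i: -dens[i])  # densest first
    delta = [0.0] * n    # distance to the nearest denser point
    parent = [-1] * n    # index of that point (-1 for the densest point)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(math.dist(points[i], q) for q in points)
            continue
        best = min(order[:rank], key=lambda j: math.dist(points[i], points[j]))
        delta[i] = math.dist(points[i], points[best])
        parent[i] = best
    return delta, parent, order

def density_cluster(points, cutoff, k):
    """Pick k centers by density * delta, then label points via their denser neighbor."""
    dens = local_density(points, cutoff)
    delta, parent, order = nearest_denser(points, dens)
    # The densest point is always a center; the rest are ranked by density * delta.
    rest = sorted(order[1:], key=lambda i: -dens[i] * delta[i])[: k - 1]
    centers = [order[0]] + rest
    labels = [-1] * len(points)
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:  # descending density, so each parent is labeled before its child
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(density_cluster(blob_a + blob_b, cutoff=1.0, k=2))
```

This pure-Python version costs O(n^2) distance evaluations and is meant only to make the two steps concrete; it makes no attempt at the speed the paper targets.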
Authors
胡涛
王中杰
张连明
陈晓锁
HU Tao; WANG Zhong-jie; ZHANG Lian-ming; CHEN Xiao-suo (Electric and Information Engineering College, Hunan Institute of Traffic Engineering, Hengyang, Hunan 421001, China; College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan 410000, China)
Source
《计算机仿真》
2024, No. 5, pp. 501-505 (5 pages)
Computer Simulation
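Funding
Teaching Reform Research Project of the Education Department of Hunan Province: Programming Courses (HNJG-2021-1275)
Key Scientific Research Project of the Education Department of Hunan Province (22A0056)
Research on Industrial Internet of Things Models and Routing Optimization Based on Graph Neural Networks (2023.01-2025.12).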