Abstract
Big data is structurally high-dimensional and unordered, which makes it difficult to mine and extract information from it directly. To meet the need to analyze massive volumes of data in big data tasks, this paper proposes a fuzzy clustering algorithm for big data based on improved sparse representation. The algorithm first standardizes the data and extracts features from the standardized result. Building on an improved sparsity constraint, it then applies the BP algorithm to solve for the data features, obtaining the sparse coefficients and the collaborative representation coefficients of each feature. The joint coefficients of the different features are used to determine the data category labels and the fuzzy cluster centers, and a genetic algorithm then completes the fuzzy clustering. Experimental results show that the proposed algorithm needs fewer iterations on average to reach its best performance, keeps CPU time within 0.6 s, and accurately clusters a variety of big data, indicating stronger practical effectiveness.
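The abstract names the pipeline's stages without giving formulas. The sketch below illustrates common textbook versions of each stage under stated assumptions, not the authors' implementation: z-score standardization is assumed for the "standardized processing"; BP is assumed to mean basis pursuit, and its L1-relaxed form (basis pursuit denoising / Lasso) is solved with ISTA as a stand-in; collaborative representation coefficients use the usual ridge closed form; and standard alternating-update fuzzy c-means replaces the paper's genetic-algorithm optimization. The function names, the λ values, and the dictionary built from data samples are all hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each feature column (assumed form of the 'standardized processing')."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unscaled instead of dividing by 0
    return (X - mu) / sigma


def sparse_code_ista(D, y, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA.

    This L1-relaxed (basis pursuit denoising / Lasso) problem is a stand-in for
    the BP solver named in the abstract, not the authors' improved variant.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L  # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return a


def collaborative_code(D, y, lam=0.1):
    """Collaborative (ridge) representation: (D^T D + lam*I)^{-1} D^T y."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ y)


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard alternating-update FCM (the paper uses a genetic algorithm instead)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against a point sitting on a center
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # u_ij ∝ d_ij^(-2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy usage on synthetic data (two well-separated groups, not the paper's datasets).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 4)), rng.normal(3.0, 0.3, (50, 4))])
Xs = standardize(X)
D = Xs[::10].T  # hypothetical dictionary: every 10th sample as an atom (4 x 10)
D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms (new array, Xs is untouched)
y = Xs[1]
a = sparse_code_ista(D, y)      # sparse coefficients of one sample
b = collaborative_code(D, y)    # collaborative representation coefficients
centers, U = fuzzy_c_means(Xs, c=2)
labels = U.argmax(axis=1)       # hard labels taken from the fuzzy memberships
print(labels[:5], labels[-5:])
```

ISTA is used here because it needs only matrix products and a soft-threshold, while the ridge closed form shows why collaborative representation is cheap relative to L1 coding. How the paper fuses the two coefficient sets into "joint coefficients" and how it encodes the genetic-algorithm search are not stated in the abstract, so the sketch does not attempt either.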
Authors
E Jing-jing; YANG Li-hua; FENG Feng (Computer School, Hulunbuir College, Hulunbuir, Inner Mongolia 021000, China; School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China)
Source
Computer Simulation
Peking University Core Journal
2023, No. 1, pp. 479-483 (5 pages)
Funding
Key Project of the Ningxia Natural Science Foundation (2021AAC02004).
Keywords
Improved sparse representation
Big data fuzzy clustering
Sparse coefficient
BP algorithm
Feature extraction