Abstract
Big data clustering is a stochastic, nonlinear process with high uncertainty. Traditional methods require prior knowledge for learning, adapt poorly to the real-time changes of big data, and therefore cannot cluster big data effectively. This paper proposes a big data clustering algorithm based on chaotic correlation feature extraction. After the shortcomings of the traditional methods are analyzed, phase space reconstruction is used to build multidimensional state-space vectors and a chaotic trajectory. Many geometric invariants of the original system are preserved, which provides a sound basis for analyzing its chaotic characteristics. The time delay at which the average mutual information reaches its first minimum is taken as the optimal delay for reconstructing the phase space, and the false nearest neighbor algorithm is used to select the optimal embedding dimension. The extracted correlation dimension serves as the chaotic feature for clustering, and the big data is clustered according to this chaotic correlation dimension feature. Simulation results show that the proposed algorithm effectively improves clustering efficiency and reduces energy consumption, making it an effective data clustering method.
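The reconstruction and feature-extraction steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The average mutual information uses a simple histogram estimate with an assumed 16 bins, the false nearest neighbor test uses the distance-ratio criterion of Kennel et al. with an assumed tolerance of 15 and an assumed 1% stopping threshold, and the correlation dimension is the Grassberger-Procaccia slope over radii that the user must pick from the scaling region. The abstract does not say how the clustering itself is performed, so that step is left out.

```python
import math
from collections import Counter

# Illustrative sketch only. Defaults (bins=16, rtol=15.0, threshold=0.01)
# are assumptions, not values taken from the paper.


def embed(x, m, tau):
    """Delay-coordinate vectors (x[i], x[i+tau], ..., x[i+(m-1)*tau])."""
    n = len(x) - (m - 1) * tau
    return [tuple(x[i + j * tau] for j in range(m)) for i in range(n)]


def mutual_information(x, tau, bins=16):
    """Histogram estimate (in nats) of I(x_t; x_{t+tau})."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0          # guard against a constant series

    def to_bin(v):
        return min(int((v - lo) / width), bins - 1)

    a = [to_bin(v) for v in x[:-tau]]
    b = [to_bin(v) for v in x[tau:]]
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over joint bins of p_ij * log(p_ij / (p_i * p_j))
    return sum(k / n * math.log(k * n / (pa[i] * pb[j]))
               for (i, j), k in pab.items())


def first_ami_minimum(x, max_tau=20, bins=16):
    """Delay at the first local minimum of the AMI curve."""
    ami = [mutual_information(x, t, bins) for t in range(1, max_tau + 2)]
    for k in range(1, len(ami) - 1):          # ami[k] belongs to tau = k + 1
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1
    return max_tau                            # assumed fallback: no minimum found


def fnn_fraction(x, m, tau, rtol=15.0):
    """Fraction of false nearest neighbours when going from dimension m to m+1."""
    n = len(x) - m * tau                      # points that also exist in m+1 dims
    pts = embed(x, m, tau)[:n]
    false = 0
    for i in range(n):
        # brute-force nearest neighbour of point i in dimension m
        d, j = min((math.dist(pts[i], pts[k]), k) for k in range(n) if k != i)
        extra = abs(x[i + m * tau] - x[j + m * tau])   # growth along the new axis
        if d > 0 and extra / d > rtol:       # exact duplicates are not counted
            false += 1
    return false / n


def min_embedding_dim(x, tau, max_m=8, threshold=0.01):
    """Smallest m whose false-neighbour fraction drops below `threshold`."""
    for m in range(1, max_m + 1):
        if fnn_fraction(x, m, tau) < threshold:
            return m
    return max_m


def correlation_dimension(x, m, tau, radii):
    """Grassberger-Procaccia estimate: slope of log C(r) against log r."""
    pts = embed(x, m, tau)
    n = len(pts)
    dists = [math.dist(pts[i], pts[j]) for i in range(n) for j in range(i + 1, n)]
    logs = []
    for r in radii:
        c = sum(d < r for d in dists) / len(dists)   # correlation sum C(r)
        if c > 0:
            logs.append((math.log(r), math.log(c)))
    # least-squares slope through the (log r, log C) points
    mx = sum(u for u, _ in logs) / len(logs)
    my = sum(v for _, v in logs) / len(logs)
    return (sum((u - mx) * (v - my) for u, v in logs)
            / sum((u - mx) ** 2 for u, _ in logs))
```

As a usage example, a chaotic series such as the logistic map x_{n+1} = 4x_n(1 - x_n) can be passed through `first_ami_minimum`, then `min_embedding_dim`, then `correlation_dimension`, giving a single correlation-dimension value per series that could serve as a clustering feature. The neighbour searches are brute force and O(N²), which keeps the sketch short but would need a spatial index such as a k-d tree for genuinely large data.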
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journal
2016, Issue 6, pp. 229-232 (4 pages)
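Funding
Supported by the Natural Science Foundation of Shaanxi Province: Failure Behavior and Life Prediction Method of Lead-Free Solder Joints under Multi-Field Coupling (2015JM6345)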
Keywords
Chaos correlation dimension feature, Big data, Clustering