Efficient clustering algorithm for large data sets
Abstract: The traditional fuzzy C-means algorithm cannot cope with the large volume and redundant attributes of large-scale data sets. To address this, a hybrid clustering algorithm for large data sets is proposed. The large data set is partitioned into multiple subsets, each subset is clustered, and the final clustering result is obtained by merging the subset results. Each subset is clustered with a hybrid algorithm based on gene expression programming (GEP) and fuzzy C-means, which improves clustering quality and efficiency. Initial cluster centers are selected based on similarity, and information entropy is used to reflect the importance of each attribute, which further optimizes clustering performance. Simulation experiments and analysis show that the algorithm has good global convergence and yields better clustering results.
Author: 古凌岚 (Gu Linglan)
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2014, No. 6, pp. 2183-2187 (5 pages)
Keywords: large data sets; fuzzy C-means; gene expression programming; attribute information entropy; clustering
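
The partition, cluster, and merge pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the paper's method: the GEP component is omitted and plain fuzzy C-means is used on each subset; the entropy weighting (`entropy_weights`, histogram-based, weight proportional to 1 minus normalized entropy) is an assumed scheme; a maximin rule (`maximin_init`) stands in for the similarity-based selection of initial centers; and merging is done by re-clustering the subset centers. All function names and parameters are hypothetical.

```python
import numpy as np

def entropy_weights(X, bins=10):
    """Assumed scheme: weight of an attribute is 1 - normalized histogram entropy."""
    n, d = X.shape
    w = np.empty(d)
    for j in range(d):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts[counts > 0] / n
        h = -np.sum(p * np.log(p)) / np.log(bins)  # normalized to [0, 1]
        w[j] = max(0.0, 1.0 - h)
    if w.sum() <= 0:
        return np.full(d, 1.0 / d)  # fall back to uniform weights
    return w / w.sum()

def maximin_init(X, c, w):
    """Assumed stand-in for similarity-based selection: pick mutually distant points."""
    centers = [X[0]]
    for _ in range(c - 1):
        d2 = np.min([(((X - ctr) ** 2) * w).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers, dtype=float)

def fcm(X, c, w, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-means with attribute-weighted squared Euclidean distance."""
    centers = maximin_init(X, c, w)
    for _ in range(iters):
        # Weighted squared distances from every point to every center, shape (n, c).
        d2 = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # memberships, rows sum to 1
        um = U ** m
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers

def cluster_large(X, c, chunk_size=100):
    """Split X into subsets, cluster each, then merge by clustering the subset centers."""
    chunks = [X[i:i + chunk_size] for i in range(0, len(X), chunk_size)]
    weights, centers = [], []
    for chunk in chunks:  # assumes every chunk holds at least c points
        w = entropy_weights(chunk)
        weights.append(w)
        centers.append(fcm(chunk, c, w))
    w_mean = np.mean(weights, axis=0)
    return fcm(np.vstack(centers), c, w_mean)
```

Calling `cluster_large(X, c=2, chunk_size=100)` on a shuffled `(n, d)` float array returns a `(2, d)` array of merged cluster centers. Because each subset is processed independently, the per-subset step could also be run in parallel or streamed from disk.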