Abstract
Affinity propagation (AP) clustering has attracted wide attention in recent years. On large, multi-class datasets it produces good results in relatively little time, which gives it a clear advantage over traditional clustering methods. For datasets with complex cluster structure, however, it often fails to produce good clusterings. By analyzing the clustering properties of the data, this paper designs a kernel function whose parameters adjust automatically to the structure of the dataset, so that the data are linearly separable, or nearly so, in the induced kernel space. Running AP in that kernel space yields the kernel-adaptive affinity propagation clustering algorithm (KA-APC), which improves the accuracy and speed of AP clustering and handles large, multi-scale datasets effectively. An analysis of the algorithm's validity and simulation experiments show that, on large datasets with complex structure, the proposed algorithm outperforms the original AP algorithm.
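The pipeline the abstract describes (build a data-adaptive kernel, then run AP on similarities measured in the kernel space) can be sketched roughly as follows. The abstract does not give the paper's kernel, so this sketch assumes a Gaussian kernel with a per-point bandwidth set to the distance to the k-th nearest neighbour. That is a stand-in, not KA-APC itself. The function names `adaptive_kernel`, `kernel_similarity`, and `affinity_propagation`, the parameter `k`, and the median preference are likewise illustrative assumptions.

```python
import math
from statistics import median


def adaptive_kernel(points, k=3):
    """Gaussian kernel with a per-point bandwidth (ASSUMED, not the paper's kernel).

    sigma[i] is the distance from point i to its k-th nearest neighbour, so the
    bandwidth follows the local density of the data.
    """
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # sorted(row)[0] is the zero self-distance, so index k is the k-th neighbour.
    sigma = [math.sqrt(sorted(row)[min(k, n - 1)]) or 1e-12 for row in d2]
    return [[math.exp(-d2[i][j] / (sigma[i] * sigma[j])) for j in range(n)]
            for i in range(n)]


def kernel_similarity(K):
    """Negative squared distance in kernel space: -(K_ii + K_jj - 2 K_ij)."""
    n = len(K)
    return [[-(K[i][i] + K[j][j] - 2 * K[i][j]) for j in range(n)] for i in range(n)]


def affinity_propagation(S, damping=0.7, iterations=200):
    """Plain AP message passing on a similarity matrix S (needs at least 2 points)."""
    n = len(S)
    S = [row[:] for row in S]
    # ASSUMED preference: the median off-diagonal similarity.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            v = [A[i][j] + S[i][j] for j in range(n)]
            first = max(range(n), key=v.__getitem__)
            second = max(v[j] for j in range(n) if j != first)
            for j in range(n):
                new = S[i][j] - (second if j == first else v[first])
                R[i][j] = damping * R[i][j] + (1 - damping) * new
        for j in range(n):
            pos = [max(R[i][j], 0.0) for i in range(n)]
            total = sum(pos) - pos[j]  # positive responsibilities from i != j
            for i in range(n):
                new = total if i == j else min(0.0, R[j][j] + total - pos[i])
                A[i][j] = damping * A[i][j] + (1 - damping) * new
    exemplars = [j for j in range(n) if R[j][j] + A[j][j] > 0]
    if not exemplars:  # message passing did not settle on any exemplar
        return [], []
    labels = [max(exemplars, key=lambda j: S[i][j]) for i in range(n)]
    for e in exemplars:
        labels[e] = e  # an exemplar always represents itself
    return exemplars, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.5),
           (5.0, 5.0), (5.2, 4.7), (4.6, 5.3), (5.5, 5.4)]
    S = kernel_similarity(adaptive_kernel(pts, k=3))
    print(affinity_propagation(S))
```

Because every bandwidth is taken from the data, the only free choices left are `k` and the AP preference. A kernel with per-point bandwidths is not guaranteed to be positive semidefinite, so the "kernel-space distance" here is a heuristic reading rather than a strict one.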
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2012, No. 5, pp. 1644-1647, 1650 (5 pages)
Application Research of Computers
Funding
National High Technology Research and Development Program of China ("863" Program), grant 2009AA01A346
Keywords
affinity propagation (AP) clustering
kernel clustering
kernel-adaptive clustering
manifold learning