Abstract
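To address the sensitivity of the K-medoids clustering algorithm to its initial cluster centers, and the dependence of its results on those centers, a K-medoids clustering algorithm optimized by local variance is proposed. The aim is to place the initial centers of K-medoids in different dense regions of the samples so that the clustering converges as closely as possible to the global optimum. The algorithm introduces the concept of local variance, defining the local variance of a sample from the distribution of the samples around its position. Taking the local standard deviation of a sample as the neighborhood radius, it selects samples that have the smallest local variance and lie in different regions as the initial centers of K-medoids, making full use of the distribution information that variance provides. Experiments were run on UCI datasets of varying size and on synthetic datasets of varying size with different proportions of noise, and the results were evaluated with six clustering performance measures. The results show that the algorithm clusters well, is robust to noise, and is suitable for clustering large-scale datasets. The proposed Num-nearest-neighbor variance optimized K-medoids clustering algorithm outperforms the fast K-medoids clustering algorithm and the neighborhood-based improved K-medoids clustering algorithm.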
K-medoids is sensitive to its initial seeds, and its clustering result depends on them. To overcome these disadvantages, this paper proposes a new K-medoids algorithm that selects samples in different dense areas as the initial seeds, so that the clustering of K-medoids converges to the global optimal solution as far as possible. The new algorithm introduces the concept of local variance and defines it from the distribution pattern of the exemplars in a local area. The local standard deviation is then taken as the radius of the neighbourhood, so that the samples with the minimum local variance that lie in different areas are chosen as the initial seeds for K-medoids. The proposed algorithm was tested on real datasets of different sizes from the UCI machine learning repository and on synthetically generated datasets of varied sizes with different proportions of noise. Six widely used criteria for evaluating clustering algorithms were adopted to assess its performance. The experimental results demonstrate that the proposed K-medoids algorithm obtains good clustering, is robust to noise, and scales to large datasets. The proposed K-medoids clustering algorithm outperforms the fast K-medoids clustering algorithm and the improved K-medoids algorithm based on the neighbourhood.
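The seed selection described in the abstract can be sketched roughly as follows. The abstract does not state the formula for local variance, so this sketch assumes it is the mean squared distance from a sample to its `num` nearest neighbours (loosely following the "Num-nearest-neighbour variance" named in the abstract). The function names `local_variances` and `select_initial_medoids` are hypothetical, and the exclusion rule is one plausible reading of "lying in different areas", not the authors' confirmed procedure.

```python
import math


def local_variances(points, num):
    """Assumed definition (not given in the abstract): the mean squared
    distance from each sample to its `num` nearest neighbours."""
    variances = []
    for i, p in enumerate(points):
        # Squared Euclidean distances to every other sample, ascending.
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q))
            for j, q in enumerate(points) if j != i
        )
        nearest = sq_dists[:num]
        variances.append(sum(nearest) / len(nearest))
    return variances


def select_initial_medoids(points, k, num):
    """Return indices of up to k seeds: lowest local variance first,
    skipping samples inside the neighbourhood of an already chosen seed."""
    variances = local_variances(points, num)
    order = sorted(range(len(points)), key=lambda i: variances[i])
    seeds = []
    for i in order:
        if len(seeds) == k:
            break
        # Neighbourhood radius of a seed = its local standard deviation.
        if all(math.dist(points[i], points[s]) > math.sqrt(variances[s])
               for s in seeds):
            seeds.append(i)
    return seeds
```

On a toy input of two well-separated groups, `[(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]` with `k=2` and `num=2`, this sketch returns the indices `[0, 3]`, one seed per group. The seeds would then be handed to an ordinary K-medoids iteration; that step is unchanged and is not shown here.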
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2015, No. 1, pp. 30-34 (5 pages)
Application Research of Computers
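Funding
Shaanxi Provincial Science and Technology Research Program (2013K12-03-24)
National Natural Science Foundation of China (31372250)
Fundamental Research Funds for the Central Universities (GK201102007)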