Optimized K-medoids clustering algorithm by variance of Num-near neighbour (Num-近邻方差优化的K-medoids聚类算法) Cited by: 11
Abstract: K-medoids clustering is sensitive to its initial seeds, and its results depend on which seeds are chosen. To overcome this, the paper proposes a K-medoids algorithm optimized by local (Num-near neighbour) variance, which places the initial seeds in different dense regions of the sample space so that the clustering converges as closely as possible to the global optimum. The algorithm introduces the concept of local variance, defined for each sample from the distribution of the samples in its local area. Taking the local standard deviation of a sample as its neighbourhood radius, it selects samples that have the minimum local variance and lie in different regions as the initial seeds, making full use of the distribution information that variance provides. The algorithm was tested on real datasets of different sizes from the UCI machine learning repository and on synthetic datasets of different sizes with different proportions of noise, and was evaluated with six clustering performance criteria. The experimental results show that the proposed algorithm clusters well, is robust to noise, and scales to large datasets. It outperforms the fast K-medoids clustering algorithm and the neighbourhood-based improved K-medoids algorithm.
Authors: 谢娟英 (Xie Juanying), 高瑞 (Gao Rui)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 1, pp. 30-34 (5 pages)
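Funding: Shaanxi Province Key Science and Technology Program (2013K12-03-24); National Natural Science Foundation of China (31372250); Fundamental Research Funds for the Central Universities (GK201102007)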
Keywords: local variance; Num-near neighbour; neighbourhood; initial seeds; clustering
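The seed-selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formal definitions, so the code below rests on stated assumptions: the local variance of a sample is taken to be the mean squared Euclidean distance to its `num` nearest neighbours, the neighbourhood radius is the square root of that value, and all function names are hypothetical. It is not the authors' implementation, and the K-medoids loop is the plain alternating (assign, then update) variant.

```python
import math


def local_variances(points, num):
    """Assumed definition: mean squared distance to the `num` nearest neighbours."""
    variances = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:num]
        variances.append(sum(d * d for d in nearest) / len(nearest))
    return variances


def select_initial_medoids(points, k, num):
    """Pick up to k seeds with minimal local variance, each from a different region."""
    variances = local_variances(points, num)
    candidates = set(range(len(points)))
    seeds = []
    while len(seeds) < k and candidates:
        # Lowest local variance first; ties are broken by index for determinism.
        best = min(candidates, key=lambda i: (variances[i], i))
        seeds.append(best)
        # Exclude every sample inside the seed's neighbourhood, whose radius
        # is the seed's local standard deviation.
        radius = math.sqrt(variances[best])
        candidates -= {
            i for i in candidates if math.dist(points[i], points[best]) <= radius
        }
    return seeds  # may hold fewer than k seeds if the candidates run out


def k_medoids(points, medoids, max_iter=100):
    """Plain alternating K-medoids, started from the given medoid indices."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: attach each sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members,
                key=lambda c: sum(math.dist(points[c], points[o]) for o in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, list(clusters.values())


if __name__ == "__main__":
    # Two small dense groups plus one distant noise point (made-up data).
    data = [(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0),
            (10, 10), (10, 11), (11, 10), (10, 9), (9, 10),
            (5, 30)]
    seeds = select_initial_medoids(data, k=2, num=4)
    print(seeds, k_medoids(data, seeds))
```

Under these assumptions a noise point has a large local variance, so it is unlikely to be chosen as a seed, which is one plausible reading of the noise robustness the abstract reports.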