
A Fuzzy Cluster Validity Index in Consideration of Different Size and Density of Data Set (cited by: 2)
Abstract: Cluster validity indices are used to evaluate clustering quality and to determine the optimal number of clusters. For data sets whose clusters differ widely in size and density, this paper analyzes the shortcomings of traditional fuzzy cluster validity indices and proposes a new index, COS, that considers compactness, overlap, and separation together. Intra-cluster compactness is expressed as the ratio of the sum of membership degrees within a given threshold to the maximum intra-cluster distance. The overlap between two clusters is reflected by the difference, within a given threshold, between each sample's membership degrees in those two clusters. Inter-cluster separation is measured by the minimum distance between clusters. The number of clusters that maximizes the COS index is taken as the optimal number. Clustering experiments with the fuzzy c-means algorithm on four artificial data sets and the real-world iris data set show that the COS index can effectively discover small clusters and low-density clusters.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2013, No. 3, pp. 306-313 (8 pages)
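Funding: National High Technology Research and Development Program of China (863 Program) (No. 2011AA05A116); Key Program of the National Natural Science Foundation of China (No. 71131002)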
Keywords: fuzzy c-means clustering; cluster validity index; size and density; COS index
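The abstract names the three ingredients of COS but does not give its formula, so the following is only an illustrative sketch of how such an index could be assembled, not the authors' definition. Everything concrete here is an assumption: the function names, the threshold values `t_compact` and `t_overlap`, reading "within a threshold" as "membership at or above the threshold" for compactness and "membership difference at or below the threshold" for overlap, weighting overlap by the smaller of the two memberships, and combining the terms as compactness × separation / (1 + overlap). `U` is a membership matrix of shape (clusters, samples), as produced by fuzzy c-means, and `V` holds the cluster centers.

```python
import numpy as np

def compactness(X, U, V, t_compact=0.5):
    """Per cluster: sum of memberships >= t_compact divided by the largest
    distance from the center to a point assigned to that cluster (assumed form)."""
    labels = U.argmax(axis=0)  # hard assignment by largest membership
    total = 0.0
    for i in range(V.shape[0]):
        members = X[labels == i]
        if len(members) == 0:
            continue  # an empty cluster contributes nothing
        max_dist = np.linalg.norm(members - V[i], axis=1).max()
        strong = U[i][U[i] >= t_compact].sum()
        total += strong / max(max_dist, 1e-12)  # guard against zero radius
    return total

def overlap(U, t_overlap=0.3):
    """Per cluster pair: points whose two memberships differ by at most
    t_overlap are treated as shared; each adds its smaller membership (assumed)."""
    c = U.shape[0]
    total = 0.0
    for i in range(c):
        for k in range(i + 1, c):
            diff = np.abs(U[i] - U[k])
            shared = diff <= t_overlap
            total += np.minimum(U[i], U[k])[shared].sum()
    return total

def separation(V):
    """Minimum distance between any two cluster centers (needs >= 2 clusters)."""
    c = V.shape[0]
    return min(np.linalg.norm(V[i] - V[k])
               for i in range(c) for k in range(i + 1, c))

def cos_like_index(X, U, V, t_compact=0.5, t_overlap=0.3):
    """Hypothetical combination: larger is better, matching the abstract's
    statement that the best cluster number maximizes the index."""
    return compactness(X, U, V, t_compact) * separation(V) / (1.0 + overlap(U, t_overlap))

# Toy data: two tight, well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
V = np.array([[0.0, 0.5], [10.0, 0.5]])
U_crisp = np.array([[0.95, 0.95, 0.05, 0.05], [0.05, 0.05, 0.95, 0.95]])
U_fuzzy = np.array([[0.6, 0.6, 0.4, 0.4], [0.4, 0.4, 0.6, 0.6]])

print(cos_like_index(X, U_crisp, V), cos_like_index(X, U_fuzzy, V))
```

In practice one would run fuzzy c-means for each candidate cluster number, score every result with the index, and keep the cluster number with the largest score. On the toy data, the near-crisp partition scores higher than the ambiguous one because it has stronger memberships and no shared points.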
