Abstract
Cluster validity indices are used to evaluate clustering quality and to determine the optimal number of clusters. For data sets whose clusters differ widely in size and density, and based on an analysis of the shortcomings of traditional fuzzy cluster validity indices, a new index called COS is proposed that accounts for compactness, overlap, and separation at the same time. Intra-cluster compactness is expressed as the ratio of the sum of membership degrees within a given threshold to the maximum intra-cluster distance. The overlap between two clusters is reflected by the difference, within a given threshold, between each sample's membership degrees in the two clusters. Inter-cluster separation is measured by the minimum distance between clusters. The number of clusters that maximizes the COS index is taken as the optimal number of clusters. Experiments using the fuzzy c-means algorithm on four artificial data sets and the real iris data set show that the COS index can effectively discover small and low-density clusters.
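The three ingredients named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the thresholds `t_c` and `t_o`, the reading of "within a given threshold" as "membership at least `t_c`" (compactness) and "membership difference at most `t_o`" (overlap), the use of cluster centers for distances, and all function names are hypothetical. How the paper combines the three terms into the COS value is not stated in the abstract, so the sketch returns them separately and does not attempt a combined score.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compactness(X, U, V, t_c=0.5):
    """Assumed reading: for each cluster, the sum of memberships >= t_c
    divided by the largest center distance among those same points."""
    out = []
    for i, v in enumerate(V):
        members = [(U[i][j], dist(x, v)) for j, x in enumerate(X) if U[i][j] >= t_c]
        if not members:
            out.append(0.0)
            continue
        total_u = sum(u for u, _ in members)
        max_d = max(d for _, d in members)
        out.append(total_u / max_d if max_d > 0 else float("inf"))
    return out

def overlap(U, i, k, t_o=0.2):
    """Assumed reading: a point is ambiguous between clusters i and k when its
    two memberships differ by at most t_o; it then contributes 1 - difference."""
    total = 0.0
    for u_i, u_k in zip(U[i], U[k]):
        diff = abs(u_i - u_k)
        if diff <= t_o:
            total += 1.0 - diff
    return total

def separation(V):
    """Minimum distance between any two cluster centers (assumed proxy
    for the minimum inter-cluster distance)."""
    return min(dist(V[i], V[k]) for i in range(len(V)) for k in range(i + 1, len(V)))

# Toy 1-D data: two groups plus one ambiguous point at 5 (hand-made memberships,
# not the output of an actual fuzzy c-means run).
X = [(0,), (1,), (2,), (5,), (8,), (9,), (10,)]
V = [(1,), (9,)]
U = [
    [0.9, 1.0, 0.9, 0.45, 0.1, 0.0, 0.1],  # memberships in cluster 0
    [0.1, 0.0, 0.1, 0.55, 0.9, 1.0, 0.9],  # memberships in cluster 1
]

print("compactness per cluster:", compactness(X, U, V))
print("overlap(0, 1):", overlap(U, 0, 1))
print("separation:", separation(V))
```

In practice the membership matrix `U` and centers `V` would come from running fuzzy c-means for each candidate number of clusters, and the index would be evaluated once per candidate; the selection rule stated in the abstract is to keep the candidate with the largest COS value.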
Source
《情报学报》
CSSCI
Peking University Core Journals
2013, No. 3, pp. 306-313 (8 pages)
Journal of the China Society for Scientific and Technical Information
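Funding
National High Technology Research and Development Program of China (863 Program) (Grant No. 2011AA05A116)
Key Program of the National Natural Science Foundation of China (Grant No. 71131002)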