Abstract
Fuzzy clustering is an important research topic in pattern recognition, machine learning, and image processing. The fuzzy C-means (FCM) algorithm is the most commonly used fuzzy clustering algorithm, but it requires the number of clusters to be specified in advance. This paper proposes a new clustering validity index for validating clustering results. From the perspectives of partition entropy, membership degree, and geometric structure, the index defines three key feature measures: compactness, separation, and overlap. On this basis, the paper proposes a method for determining the optimal number of clusters. The new index and several traditional validity indices were compared experimentally on six artificial data sets and three real data sets. The results show that the proposed index and method evaluate clustering results effectively and are suitable for determining the optimal number of clusters in a data set.
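The abstract describes choosing the cluster number by running FCM for a range of candidate values and scoring each result with a validity index. Because the abstract does not give the formula of the proposed index, the following is only a minimal sketch of that general procedure. It substitutes the classical Xie-Beni index, a traditional compactness/separation index with no overlap term, which is not the authors' index. The function names, the fuzzifier m = 2, the candidate range 2 to 6, and the deterministic farthest-point seeding are all assumptions chosen for illustration.

```python
# Sketch only: classical fuzzy C-means (FCM) plus the traditional Xie-Beni
# validity index. This is NOT the index proposed in the paper, whose formula
# the abstract does not give. All function names are hypothetical.
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(data: List[Point], c: int) -> List[Point]:
    """Deterministic farthest-point seeding (assumed here for reproducibility;
    FCM is more often seeded with random memberships)."""
    centers = [data[0]]
    while len(centers) < c:
        # Pick the point whose nearest already-chosen center is farthest away.
        centers.append(max(data, key=lambda p: min(sq_dist(p, v) for v in centers)))
    return centers


def memberships(data: List[Point], centers: List[Point], m: float) -> List[List[float]]:
    """u[i][k] = degree to which point k belongs to cluster i; each column sums to 1."""
    exp = 1.0 / (m - 1.0)
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        # Floor the distance so a point lying exactly on a center does not divide by zero.
        d2 = [max(sq_dist(x, v), 1e-12) for v in centers]
        for i in range(len(centers)):
            u[i][k] = 1.0 / sum((d2[i] / d2[j]) ** exp for j in range(len(centers)))
    return u


def fcm(data: List[Point], c: int, m: float = 2.0,
        tol: float = 1e-6, max_iter: int = 300) -> Tuple[List[Point], List[List[float]]]:
    """Alternate membership and center updates until the centers stop moving."""
    centers = maximin_init(data, c)
    dim = len(data[0])
    for _ in range(max_iter):
        u = memberships(data, centers, m)
        new_centers = []
        for i in range(c):
            # Each center is the u^m-weighted mean of all points.
            w = [u[i][k] ** m for k in range(len(data))]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[k] * data[k][d] for k in range(len(data))) / total
                for d in range(dim)))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m)


def xie_beni(data: List[Point], centers: List[Point],
             u: List[List[float]], m: float = 2.0) -> float:
    """Xie-Beni index = compactness / (n * separation); smaller is better."""
    compact = sum(u[i][k] ** m * sq_dist(x, v)
                  for i, v in enumerate(centers) for k, x in enumerate(data))
    sep = min(sq_dist(centers[i], centers[j])
              for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return compact / (len(data) * max(sep, 1e-12))


def best_cluster_number(data: List[Point], c_min: int = 2, c_max: int = 6,
                        m: float = 2.0) -> Tuple[int, Dict[int, float]]:
    """Run FCM for every candidate c and keep the c with the smallest index value."""
    scores = {}
    for c in range(c_min, c_max + 1):
        centers, u = fcm(data, c, m)
        scores[c] = xie_beni(data, centers, u, m)
    return min(scores, key=scores.get), scores
```

Calling `best, scores = best_cluster_number(data, 2, 6)` on a list of coordinate tuples returns the selected c together with the index value for every candidate. To try a different index, such as partition entropy or the proposed index once its definition is known, replace `xie_beni` and leave the search loop unchanged. If the new index is meant to be maximized rather than minimized, the selection step must be reversed as well.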
Authors
Geng Jiayi (耿嘉艺), Qian Xuezhong (钱雪忠), Zhou Shibing (周世兵)
School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Application Research of Computers (《计算机应用研究》)
2019, No. 4, pp. 1001-1005 (5 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (61673193)
Fundamental Research Funds for the Central Universities (JUSRP11235, JUSRP51635B)
Keywords
fuzzy C-means clustering
number of clusters
clustering validity index
fuzzy clustering