Abstract
The traditional Gustafson-Kessel (GK) clustering algorithm cannot determine the number of clusters automatically and is sensitive to the initial cluster centers. To address these defects, an improved GK clustering algorithm is proposed. First, a new validity index, based on the weighted sum of inter-cluster separation and intra-cluster compactness, is used to determine the optimal number of clusters. Then, the idea of improved entropy clustering is used to determine the initial cluster centers. Finally, the data set is clustered according to the number of clusters given by the new index and the new cluster centers. The experimental results show that the new index accurately determines the optimal number of clusters for data sets with overlapping clusters, and that the improved algorithm achieves higher clustering accuracy.
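The abstract does not give the details of the improved entropy clustering step, so the following is only a minimal sketch of the classic entropy-based way of picking initial cluster centers, not the paper's improved variant. The function name `entropy_initial_centers`, the similarity threshold `beta`, and the heuristic for `alpha` (similarity 0.5 at the mean pairwise distance) are assumptions made for illustration.

```python
import math


def entropy_initial_centers(points, alpha=None, beta=0.7):
    """Pick initial cluster centers by classic entropy-based selection.

    Illustrative sketch only: `alpha`, `beta`, and this function name are
    assumptions, not taken from the paper.
    """
    n = len(points)
    if n == 0:
        return []

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    if alpha is None:
        # Heuristic (assumption): similarity equals 0.5 at the mean distance.
        pairs = n * (n - 1) / 2
        total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n))
        mean_d = total / pairs if pairs else 0.0
        alpha = math.log(2) / mean_d if mean_d > 0 else 1.0

    # Similarity in (0, 1]: close points have similarity near 1.
    sim = [[math.exp(-alpha * d) for d in row] for row in dist]

    def pair_entropy(s):
        # Binary entropy term; defined as 0 at the endpoints.
        if s <= 0.0 or s >= 1.0:
            return 0.0
        return -(s * math.log2(s) + (1 - s) * math.log2(1 - s))

    # Entropy of each point with respect to all other points.
    entropy = [
        sum(pair_entropy(sim[i][j]) for j in range(n) if j != i)
        for i in range(n)
    ]

    remaining = set(range(n))
    centers = []
    while remaining:
        # The remaining point with the lowest entropy becomes a center.
        best = min(remaining, key=lambda i: entropy[i])
        centers.append(points[best])
        # Drop every point that is sufficiently similar to the new center
        # (this always includes the center itself, so the loop terminates).
        remaining -= {j for j in remaining if sim[best][j] >= beta}
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    print(entropy_initial_centers(data))
```

In this sketch the number of centers falls out of the threshold `beta`, whereas the abstract describes fixing the cluster number first with the validity index; a practical variant would presumably keep only as many centers as that index prescribes.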
Source
Journal of Computer Applications (《计算机应用》)
CSCD
PKU Core Journal (北大核心)
2012, No. 9, pp. 2476-2479 (4 pages)
Funding
National Natural Science Foundation of China (61103129)
Jiangsu Province Science and Technology Support Program (BE2009009)
Keywords
cluster number
cluster validity index
initial cluster center
entropy clustering
Gustafson-Kessel (GK) clustering algorithm