Abstract
To analyze the structure of a data set simply and effectively, this paper proposes MSTCA, a clustering algorithm based on the minimal spanning tree. The basic idea is to partition a data set into subclasses by cutting every edge longer than a certain threshold in one of its minimal spanning trees, while merging the relatively small subclasses. The clustering result produced by MSTCA is unique up to the order of the subclasses, and calling it recursively yields a clustering structure of the data set at several levels of granularity. Computational experiments show that MSTCA adaptively chooses a good number of clusters for data sets whose clusters have various shapes, and that with only simple parameter selection it accurately detects the reasonable clusters and outliers present in the data.
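The threshold-cutting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mst_edges` and `mst_threshold_clusters`, the `min_size` parameter, and above all the merge rule (reconnect a subclass smaller than `min_size` through its shortest cut edge) are assumptions made for illustration, since the abstract does not say how small subclasses are merged or how the threshold is chosen.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the minimal spanning tree as a list of (length, i, j) edges.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest known distance from the tree to each point
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_threshold_clusters(points, threshold, min_size=2):
    """Cut MST edges longer than `threshold`, then merge small subclasses.

    ASSUMPTION: the merge rule below (restore the shortest cut edges that
    touch a subclass with fewer than `min_size` points) is hypothetical;
    the abstract does not specify MSTCA's actual rule.
    """
    n = len(points)
    root = list(range(n))
    size = [1] * n

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            if size[ri] < size[rj]:
                ri, rj = rj, ri
            root[rj] = ri
            size[ri] += size[rj]

    cut = []
    for length, i, j in sorted(mst_edges(points)):
        if length > threshold:
            cut.append((length, i, j))   # edge is cut: subclasses stay apart
        else:
            union(i, j)                  # edge is kept: same subclass

    # Hypothetical merge step, processing cut edges from shortest to longest.
    for length, i, j in cut:
        if size[find(i)] < min_size or size[find(j)] < min_size:
            union(i, j)

    # Relabel subclasses 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Calling `mst_threshold_clusters(points, threshold, min_size=1)` disables the assumed merge step, so isolated points remain as singleton subclasses that could be reported as outliers. Applying the function again to the points of one resulting subclass with a smaller threshold would give a finer level of granularity, in the spirit of the recursive use mentioned in the abstract.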
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2007, Issue 3, pp. 331-336 (6 pages)
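Funding
Beijing Natural Science Foundation (4052005)
Project funded by the "Training Program for Young and Middle-Aged Backbone Teachers" of Beijing municipal institutions of higher education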
Keywords
minimal spanning trees
threshold cutting
clustering algorithms
number of clusters
hierarchical clustering