Abstract
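Objective: To propose a clustering algorithm that can discover clusters distributed across different density levels, addressing the clustering of data sets with multiple density levels. Methods: The k-neighbor distances of the data objects are sorted, and linear regression analysis is used to find the boundaries where the density region changes; the points within the same density region are then clustered with the DBSCAN algorithm, yielding clusters at multiple density levels. Results: Tests on real and artificial data sets show that the algorithm can discover patterns that existing algorithms cannot. Conclusion: The algorithm has the same time efficiency as DBSCAN, its space requirement grows linearly with the number of input data points, and it is also applicable to high-dimensional data sets.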
Density-based clustering algorithms such as DBSCAN are important because they can find clusters of arbitrary shape. Existing density-based algorithms, however, cannot find clusters at multiple density levels. This paper proposes an algorithm for clustering data sets whose clusters have different densities. Because the k-neighbor distances of objects reflect the different densities in a data set, the k-distances of all objects are sorted, and the boundaries between density levels are found by linear regression. The algorithm is tested on both real and artificial data sets. The results show that its time complexity is equal to that of DBSCAN and that its space complexity increases linearly with the number of input points.
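The boundary-finding step described in the abstract can be illustrated roughly as follows. This is only a minimal sketch and not the authors' implementation: the abstract does not say how the regression is applied, so the rule used here (fit a line to the current run of sorted k-distances and start a new density level when the next value exceeds the line's prediction by more than a threshold) is an assumption. The names `k_distances`, `fit_line`, `density_boundaries`, `density_levels` and the parameters `min_len` and `jump` are hypothetical. Each returned level carries a candidate Eps value that could then be handed to a DBSCAN implementation for the points in that level.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def fit_line(ys):
    """Least-squares line y = a + b*x over x = 0..len(ys)-1; returns (a, b)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def density_boundaries(sorted_kd, min_len=4, jump=0.5):
    """Indices where the sorted k-distance curve jumps above the fitted line.

    Assumed rule (not taken from the paper): a line is fitted to the current
    segment, and a boundary is placed where the next value exceeds the line's
    prediction by more than `jump`.
    """
    boundaries, start = [], 0
    for i in range(len(sorted_kd)):
        if i - start < min_len:
            continue  # too few points in the segment to fit a line
        a, b = fit_line(sorted_kd[start:i])
        predicted = a + b * (i - start)
        if sorted_kd[i] - predicted > jump:
            boundaries.append(i)
            start = i
    return boundaries


def density_levels(points, k=2, min_len=4, jump=0.5):
    """Split points into density levels; returns [(eps, point_indices), ...]."""
    kd = k_distances(points, k)
    order = sorted(range(len(points)), key=kd.__getitem__)
    sorted_kd = [kd[i] for i in order]
    cuts = [0] + density_boundaries(sorted_kd, min_len, jump) + [len(points)]
    # The largest k-distance in a level serves as that level's candidate Eps.
    return [(sorted_kd[hi - 1], order[lo:hi]) for lo, hi in zip(cuts, cuts[1:])]


# Toy data: a dense 4x4 grid (spacing 0.1) and a sparse 3x3 grid (spacing 1).
dense = [(x * 0.1, y * 0.1) for x in range(4) for y in range(4)]
sparse = [(10 + x, 10 + y) for x in range(3) for y in range(3)]
levels = density_levels(dense + sparse, k=2)
for eps, members in levels:
    print(round(eps, 3), len(members))
```

The brute-force neighbour search keeps the sketch free of dependencies but costs quadratic time; reaching the time complexity of DBSCAN mentioned in the abstract would presumably require a spatial index, which is omitted here.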
Source
《沈阳建筑大学学报(自然科学版)》
CAS
2006, No. 2, pp. 329-333 (5 pages)
Journal of Shenyang Jianzhu University: Natural Science
Funding
Natural Science Foundation of Liaoning Province (20052006)
Key Research Program of the Education Department of Liaoning Province (05L354)