Abstract
Clustering is an important research direction in data mining, with wide application in statistical data analysis, pattern recognition, image processing, and other fields. Many clustering algorithms have been proposed for large-scale databases, and the density-based algorithm DBSCAN is a typical representative. Building on DBSCAN, this paper presents a fast density-based clustering algorithm. Unlike DBSCAN, the new algorithm uses only a small number of representative objects from a core object's neighborhood as seeds to expand a cluster, which reduces the number of region queries and therefore the I/O cost. Experiments on two-dimensional spatial data show that the fast algorithm clusters large-scale databases effectively and runs several times faster than the original DBSCAN.
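The seed-selection idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how representative objects are chosen, so the rule in `pick_representatives` (the neighbor lying farthest along each of the four axis directions) is an assumption made for illustration. The brute-force `region_query` and the names `density_cluster` and `use_representatives` are likewise hypothetical. The sketch also omits any step for merging clusters or recovering objects that the reduced seed set might miss, so on irregular data its output can differ from plain DBSCAN.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps, stats):
    """Brute-force eps-neighborhood of points[i]; every call is counted."""
    stats["queries"] += 1
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def pick_representatives(points, center, neighbors):
    """Assumed rule (not from the paper): for each axis direction, keep the
    neighbor farthest along it, preferring the one closest to that axis."""
    cx, cy = points[center]
    reps = set()
    for ux, uy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        def score(j):
            dx, dy = points[j][0] - cx, points[j][1] - cy
            along = dx * ux + dy * uy
            across = abs(dx * uy - dy * ux)
            return (along, -across)
        reps.add(max(neighbors, key=score))
    return reps


def density_cluster(points, eps, min_pts, use_representatives):
    """DBSCAN-style clustering of 2D points.

    With use_representatives=False every neighbor of a core object becomes
    a seed (plain DBSCAN); with True only the representatives do.
    Returns (labels, number_of_region_queries).
    """
    labels = [None] * len(points)
    queried = set()
    stats = {"queries": 0}

    def grow(center, neighbors, cid):
        # Assign unlabeled and noise neighbors to the current cluster.
        for k in neighbors:
            if labels[k] is None or labels[k] == NOISE:
                labels[k] = cid
        if use_representatives:
            candidates = pick_representatives(points, center, neighbors)
        else:
            candidates = neighbors
        # Only objects of this cluster that were never queried become seeds.
        return [k for k in candidates if labels[k] == cid and k not in queried]

    next_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps, stats)
        queried.add(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cid = next_id
        next_id += 1
        queue = deque(grow(i, neighbors, cid))
        while queue:
            j = queue.popleft()
            if j in queried:
                continue
            queried.add(j)
            nb = region_query(points, j, eps, stats)
            if len(nb) >= min_pts:  # only core objects extend the cluster
                queue.extend(grow(j, nb, cid))
    return labels, stats["queries"]
```

Calling `density_cluster` twice on the same points, once with `use_representatives=False` and once with `True`, and comparing the returned query counts gives a rough feel for the saving the abstract describes. In a disk-based database each region query is an index lookup, which is why fewer queries would translate into lower I/O cost.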
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journals (北大核心)
2000, No. 11, pp. 1287-1292 (6 pages)
Journal of Computer Research and Development
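Funding
National Natural Science Foundation of China (Grant No. 69743001)
Doctoral Program Education Fund of the State Education Commission of China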
Keywords
data mining, clustering, density, fast algorithm, database, spatial database, representative objects