A Density-based Clustering Algorithm for Multi-level Density


Abstract: Density-based clustering algorithms such as DBSCAN are an important class of clustering methods because they can find clusters of arbitrary shape, but existing density-based algorithms cannot find clusters at multiple density levels. Objective: to propose a clustering algorithm that discovers clusters distributed at different density levels, solving the clustering problem for data sets with multiple density levels. Methods: because the k-neighbor distances of objects reflect the different densities in a data set, the k-neighbor distances of all objects are sorted and linear regression analysis is used to find the boundaries between density regions; the points in each density region are then clustered with DBSCAN, yielding clusters at multiple density levels. Results: tests on real and artificial data sets show that the algorithm finds patterns that existing algorithms cannot. Conclusion: the algorithm's time efficiency is the same as that of DBSCAN, its space requirement grows linearly with the number of input points, and it is applicable to high-dimensional data sets.
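The boundary-finding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact regression procedure, so everything below is an assumption made for illustration: the helper names `k_distances`, `fit_line`, and `density_levels`, the parameters `min_run` and `jump`, and the rule that a new density level starts when the next sorted k-distance overshoots the least-squares line fitted to the current run. It is not the authors' implementation.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def fit_line(ys):
    """Least-squares line y = a + b*x through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b

def density_levels(sorted_kdist, min_run=3, jump=1.0):
    """Split a sorted k-distance curve into runs that one line fits well.

    Hypothetical rule: a new level starts when the next value exceeds the
    line's prediction by more than `jump` times the mean of the current run.
    """
    levels, start = [], 0
    for i in range(1, len(sorted_kdist)):
        run = sorted_kdist[start:i]
        if len(run) < min_run:
            continue
        a, b = fit_line(run)
        predicted = a + b * len(run)  # line extrapolated one step ahead
        if sorted_kdist[i] - predicted > jump * (sum(run) / len(run)):
            levels.append(run)
            start = i
    levels.append(sorted_kdist[start:])
    return levels

# Illustrative data: a dense 5x5 grid (spacing 0.1) and a sparse one (spacing 1).
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(100.0 + i, 100.0 + j) for i in range(5) for j in range(5)]
points = dense + sparse

k = 3
levels = density_levels(sorted(k_distances(points, k)))
# One candidate neighborhood radius per density level (largest k-distance in it).
eps_per_level = [max(level) for level in levels]
print(len(levels), eps_per_level)
```

Under these assumptions, each value in `eps_per_level` could serve as the neighborhood radius for a separate DBSCAN pass over the points of that density level, with `k` as the minimum point count, which matches the per-region clustering the abstract describes.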

Authors: Sun Huanliang (孙焕良), Bi Zhanju (毕占举), Liu Junling (刘俊岭), Zhou Xiangguo (周祥国), Xu Jingke (许景科)

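Affiliations: School of Information Science and Engineering, Shenyang Jianzhu University; Computing Center, Shenyang Pharmaceutical University; Computing Center, Shenyang Jianzhu University; Liaoning Administrators College of Police and Justice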

Source: Journal of Shenyang Jianzhu University: Natural Science (《沈阳建筑大学学报（自然科学版）》), CAS, 2006, No. 2, pp. 329-333 (5 pages)

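Funding: Natural Science Foundation of Liaoning Province (20052006); Key Research Program of the Education Department of Liaoning Province (05L354)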

Keywords: data mining; clustering; density-based algorithms; DBSCAN

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

