Application of density distribution function in clustering algorithms (Cited by: 8)
Abstract: This paper analyzes the characteristics and shortcomings of traditional density-based clustering methods, reviews the current state of research on density-based clustering, and proposes an improved clustering algorithm based on a density distribution function. The algorithm uses the K-nearest-neighbor (KNN) idea to measure density and to find the point of currently highest density, which serves as the center point. Using a local scale, the cluster is expanded outward from the center point; at each expansion step a radius scale factor is introduced to identify core points. The cluster is then expanded further through the KNN of each core point, and expansion stops once the density falls to a given ratio of the center point's density. Several worked examples are given, and the algorithm is compared experimentally with the grid-based shared nearest neighbor (GNN) clustering algorithm in terms of clustering accuracy and efficiency. The experiments indicate that the algorithm greatly reduces the sensitivity of density-based clustering to its parameters, improves the clustering of high-dimensional data sets with uneven density distribution, and raises clustering accuracy and efficiency.
Source: Control Theory & Applications (《控制理论与应用》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2011, No. 12, pp. 1791-1796 (6 pages)
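Funding: National Natural Science Foundation of China (60634020); Natural Science Foundation of Hunan Province (08JJ3132); Fundamental Research Funds for the Central Universities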
Keywords: clustering algorithms; KNN; GNN; density distribution function; OPTICS (ordering points to identify the clustering structure); DENCLUE (density-based clustering); local scale; radius scale factor
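
The abstract describes the procedure only in outline, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes density is the inverse of the distance to the k-th nearest neighbor, picks the densest unlabeled point as a center, and expands through KNN links until density falls below a given ratio of the center's density. The local scale and the radius scale factor are not defined in the abstract and are omitted here; the function names and the parameters `k` and `ratio` are hypothetical.

```python
import math
from collections import deque


def knn_lists(points, k):
    """Return, for each point, its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result


def knn_density(neighbours):
    """Assumed density measure: inverse of the distance to the k-th neighbour."""
    return [1.0 / (nbrs[-1][0] + 1e-12) for nbrs in neighbours]


def density_cluster(points, k=3, ratio=0.5):
    """Label points by expanding clusters outward from density peaks.

    ratio is the fraction of the centre's density below which expansion stops
    (a hypothetical parameter name for the 'given ratio' in the abstract).
    """
    neighbours = knn_lists(points, k)
    density = knn_density(neighbours)
    labels = [-1] * len(points)
    # Visit candidate centres from highest to lowest density.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    cluster_id = 0
    for centre in order:
        if labels[centre] != -1:
            continue  # already absorbed by an earlier cluster
        threshold = ratio * density[centre]
        labels[centre] = cluster_id
        queue = deque([centre])
        while queue:
            current = queue.popleft()
            # Expand through the KNN of each accepted point.
            for _, j in neighbours[current]:
                if labels[j] == -1 and density[j] >= threshold:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels
```

Calling `density_cluster(points, k=2, ratio=0.5)` on two well-separated groups of points should give each group its own label. This simplified version labels every point, so an isolated low-density point ends up as a singleton cluster; a practical variant would likely treat such small clusters as noise.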
