
k-LDCHD: A Local Density Based k-Neighborhood Clustering Algorithm for High Dimensional Space (cited by: 18)
Abstract: Clustering is an important topic in data mining. Clustering in high-dimensional space is particularly difficult because the data are sparsely distributed, noise points are numerous, and the gap between a data point's distances to its nearest and farthest neighbors tends to zero (the "vanishing distance gap" phenomenon). After analyzing the shortcomings of existing clustering algorithms, the paper introduces concepts such as the k-neighborhood point set and the k-neighborhood radius, and proposes k-PCLDHD, a single-parameter, local density based k-neighborhood clustering algorithm for high-dimensional space. To improve efficiency, it further defines concepts such as the reference distance and preprocesses the data objects using "double reference points", which reduces the cost of scanning the data set; the resulting optimized algorithm is k-LDCHD. Theoretical analysis and experimental results indicate that the algorithm solves the high-dimensional clustering problem effectively and is feasible in practice.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 5, pp. 784-791 (8 pages)
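Funding: National Natural Science Foundation of China (70371015); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20040286009)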
Keywords: k-neighbor radius; double reference point; reference radius; high dimensional space
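The abstract names a k-neighborhood radius and a preprocessing step based on two reference points, but it does not define them. The sketch below is one possible reading, not the paper's algorithm: it assumes the k-neighborhood radius of a point is the distance to its k-th nearest neighbor, and that the reference distance is the distance from a point to a fixed reference point. Under those assumptions, the triangle inequality gives a lower bound |d(p, r) - d(q, r)| <= d(p, q) that lets a scan skip some distance computations. All function names are hypothetical.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reference_distances(data, r1, r2):
    """Precompute each point's distance to two reference points (one pass)."""
    return [(dist(p, r1), dist(p, r2)) for p in data]


def k_radius_brute(data, i, k):
    """Assumed definition: distance from data[i] to its k-th nearest other point."""
    others = sorted(dist(data[i], q) for j, q in enumerate(data) if j != i)
    return others[k - 1]


def k_radius_pruned(data, i, k, ref):
    """Same radius as k_radius_brute, skipping points the bounds rule out.

    ref[j] holds the two precomputed reference distances of data[j].
    Requires 1 <= k <= len(data) - 1.
    Returns (radius, number of full distance computations performed).
    """
    best = []       # the k smallest distances seen so far, ascending
    computed = 0
    for j, q in enumerate(data):
        if j == i:
            continue
        # Triangle inequality: each reference point yields a lower bound on d(p, q).
        lower = max(abs(ref[i][0] - ref[j][0]), abs(ref[i][1] - ref[j][1]))
        if len(best) == k and lower >= best[-1]:
            continue  # q cannot shrink the current k-th smallest distance
        computed += 1
        bisect.insort(best, dist(data[i], q))
        del best[k:]  # keep only the k smallest
    return best[-1], computed


# Five collinear points; the reference points are an arbitrary illustrative choice.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
ref = reference_distances(points, (0.0, 0.0), (11.0, 0.0))
radius, computed = k_radius_pruned(points, 0, 1, ref)
print(radius, computed)
```

How much work the bound saves depends on the data and on where the reference points are placed. The sketch makes no claim about the speed-ups reported in the paper.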


