期刊文献+

Squeezer:An Efficient Algorithm for Clustering Categorical Data 被引量:32

原文传递
导出
摘要 This paper presents a new efficient algorithm for clustering categorical data,Squeezer, which can produce high quality clustering results and at the same time deservegood scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning tto an existing cluster (initially none), or creating t as a new cluster, which is determined bythe similarities between t and clusters. Due to its characteristics, the proposed algorithm isextremely suitable for clustering data streams, where given a sequence of points, the objective isto maintain consistently good clustering of the sequence so far, using a small amount of memoryand time. Outliers can also be handled efficiently and directly in Squeezer. Experimental resultson real-life and synthetic datasets verify the superiority of Squeezer.
出处 《Journal of Computer Science & Technology》 SCIE EI CSCD 2002年第5期611-624,共14页 计算机科学技术学报(英文版)
基金 国家自然科学基金,IBMAS/400 Research Fund
  • 相关文献

参考文献17

  • 1Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. ROCK: A robust clustering algorithm for categorical attributes. In Proc. 1999 Int. Conf. Data Engineering, Sydney, Australia, Mar., 1999, pp.512-521.
  • 2Alexandros Nanopoulos, Yannis Theodoridis, Yannis Manolopoulos. C2P: Clustering based on closest pairs. In Proc. 27th Int. Conf. Very Large Database, Rome, Italy, September, 2001, pp.331-340.
  • 3Ester M, Kriegel H P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases.In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, USA, Aug., 1996,pp.226-231.
  • 4Zhang T, Ramakrishnan R, Livny M. BIRTH: An efficient data clustering method for very large databases. In Proc.the ACM-SIGMOD Int. Conf. Management of Data, Montreal, Quebec, Canada, June, 1996, pp.103-114.
  • 5Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. CURE: A clustering algorithm for large databases. In Proc. the ACM SIGMOD Int. Conf. Management of Data, Seattle, Washington, USA, June, 1998, pp.73-84.
  • 6Karypis G, Han E-H, Kumar V. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer, 1999, 32(8): 68-75.
  • 7Sheikholeslami G, chatterjee S, Zhang A. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proc. 1998 Int. Conf. Very Large Databases, New York, August, 1998, pp.428-439.
  • 8Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. the 1998 ACM SIGMOD Int. Conf. Management of Data, Seattle, Washington,USA, June, 1998, pp.94-105.
  • 9Jiang M FI Tseng S S, Su C M. Two-phase clustering process for outliers detection. Pattern Recognition Letters,2001, 22(6/7): 691-700.
  • 10Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan. CACTUS-clustering categorical data using summaries.In Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining, August, 1999, pp.73-83.

同被引文献228

引证文献32

二级引证文献141

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部