Squeezer: An Efficient Algorithm for Clustering Categorical Data (cited by: 32)


Abstract: This paper presents Squeezer, a new efficient algorithm for clustering categorical data that produces high-quality clustering results while also scaling well. The Squeezer algorithm reads each tuple t in sequence and either assigns t to an existing cluster (initially there are none) or creates a new cluster from t, depending on the similarities between t and the existing clusters. Because of these characteristics, the proposed algorithm is extremely well suited to clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Squeezer can also handle outliers efficiently and directly. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
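The one-pass procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity measure or the decision rule, so the `similarity` function (the sum, over attributes, of the fraction of cluster members sharing the tuple's value), the `threshold` parameter, and the name `squeezer_sketch` are all assumptions made for this example.

```python
from collections import Counter


def similarity(counts, size, row):
    # Assumed measure (not given in the abstract): for each attribute, the
    # fraction of cluster members that share the tuple's value, summed up.
    return sum(c[v] / size for c, v in zip(counts, row))


def squeezer_sketch(rows, threshold):
    # Each cluster keeps only a size and one value-count table per attribute,
    # so memory grows with the number of clusters, not the number of tuples.
    clusters = []
    labels = []
    for row in rows:  # single sequential pass over the tuples
        best, best_sim = None, -1.0
        for idx, cl in enumerate(clusters):
            sim = similarity(cl["counts"], cl["size"], row)
            if sim > best_sim:
                best, best_sim = idx, sim
        # Hypothetical rule: open a new cluster when no existing cluster
        # is similar enough to the incoming tuple.
        if best is None or best_sim < threshold:
            clusters.append({"size": 0, "counts": [Counter() for _ in row]})
            best = len(clusters) - 1
        cl = clusters[best]
        cl["size"] += 1
        for c, v in zip(cl["counts"], row):
            c[v] += 1
        labels.append(best)
    return labels, clusters


rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("a", "x")]
labels, clusters = squeezer_sketch(rows, threshold=1.0)
print(labels)
```

Because each tuple is examined once and only per-cluster summaries are stored, a sketch of this shape fits the data-stream setting the abstract mentions. The result depends on the order of the tuples and on the chosen threshold.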

Authors: He Zengyou, Xu Xiaofei, Deng Shengchun

Affiliation: Department of Computer Science and Engineering

Source: Journal of Computer Science & Technology (English edition; indexed in SCIE, EI, CSCD), 2002, No. 5, pp. 611-624 (14 pages)

Funding: National Natural Science Foundation of China; IBM AS/400 Research Fund

Classification: TP274.2 [Automation and Computer Technology: Detection Technology and Automation Devices]



