A Tradeoff Between Accuracy and Speed for K-Means Seed Determination

Abstract: With the sharp increase in information volume, analyzing and retrieving this vast amount of data is more essential than ever. One of the main techniques that helps in this regard is clustering. Clustering aims to group objects so that all objects within a cluster have similar features, while objects in different clusters are as distinct as possible. One of the most widely used clustering algorithms, with well-established performance in different applications, is the k-means algorithm. The main problem of the k-means algorithm is that its performance is directly affected by the selection of the initial clusters. Neglecting this issue has consequences such as creating empty clusters and slowing convergence. In addition, selecting appropriate initial seeds can reduce cluster inconsistency. In this paper, we present a new method that determines the initial seeds of the k-means algorithm from the average distance between objects, in order to improve the accuracy and decrease the number of iterations of the algorithm. Our method attempts to provide a proper tradeoff between the accuracy and speed of the clustering algorithm. The experimental results show that the proposed approach outperforms the Chithra method by 1.7% and 2.1% in clustering accuracy on the Wine and Abalone detection data, respectively. Furthermore, the results indicate that, compared with the Reverse Nearest Neighbor (RNN) search approach, the proposed method converges faster.
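The abstract does not spell out the seeding rule, so the following is only a minimal sketch of one way an average-distance criterion could drive seed selection. It is not the authors' algorithm. The function names (`mean_pairwise_distance`, `pick_seeds`, `kmeans`), the greedy acceptance rule, and the farthest-point fallback are all assumptions made for illustration.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def pick_seeds(points, k):
    """Hypothetical seeding rule (an assumption, not the paper's method):
    accept a point as a seed only if it lies at least the mean pairwise
    distance away from every seed chosen so far."""
    threshold = mean_pairwise_distance(points)
    seeds = [points[0]]
    for p in points[1:]:
        if len(seeds) == k:
            break
        if all(math.dist(p, s) >= threshold for s in seeds):
            seeds.append(p)
    # Fallback: if too few points pass the threshold, add the point
    # farthest from the current seeds until k seeds exist.
    while len(seeds) < k:
        candidates = [p for p in points if p not in seeds]
        seeds.append(
            max(candidates, key=lambda p: min(math.dist(p, s) for s in seeds))
        )
    return seeds


def kmeans(points, seeds, max_iter=100):
    """Standard Lloyd iterations starting from the given seeds.
    Returns (centers, clusters, number of iterations performed)."""
    centers = [tuple(s) for s in seeds]
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            return centers, clusters, iteration
        centers = new_centers
    return centers, clusters, max_iter


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = pick_seeds(data, k=2)
    centers, clusters, iterations = kmeans(data, seeds)
    print(seeds, centers, iterations)
```

Because the seeds in this sketch are forced to be well separated, a run on two clearly separated groups starts with one seed in each group, which is the kind of start that avoids empty clusters and keeps the iteration count low.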

Authors: Farzaneh Khorasani, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, Qin Xin

Affiliation: Computer Engineering Department Faculty of Science and Technology

Source: Computer Systems Science & Engineering (SCIE, EI), 2022, No. 3, pp. 1085-1098 (14 pages)

Keywords: Data clustering, k-means algorithm, information retrieval, outlier detection, clustering accuracy, unsupervised learning

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
