
Research on Optimization of Clustering Algorithm Performance Based on Dataset Compression (cited by: 6)
Abstract: To address the excessive time cost of clustering analysis on large datasets with current clustering algorithms, this paper proposes a dataset compression algorithm based on nearest-neighbor similarity. The algorithm groups several mutually nearest (most similar) data points into a data cluster and randomly selects one cluster head from each cluster; the cluster heads form a new dataset, which greatly reduces the data size. The K-means algorithm and the AP algorithm are then each applied to the compressed dataset. Experimental results show that, compared with clustering the original dataset, clustering the compressed dataset keeps clustering accuracy essentially unchanged while effectively reducing clustering time and improving clustering performance, which demonstrates the effectiveness and reliability of the compression algorithm in cluster analysis.
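The abstract does not give the grouping procedure in detail, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Euclidean distance as the similarity measure, a fixed group size, and a greedy pass over the points; the names `compress_dataset`, `group_size`, and `seed` are hypothetical and chosen for illustration.

```python
import math
import random


def compress_dataset(points, group_size, seed=None):
    """Greedy nearest-neighbor grouping (an assumed scheme, not the paper's).

    Returns (heads, groups): groups[i] lists the point indices in group i,
    and heads[i] is the index of its randomly chosen cluster head.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    rng = random.Random(seed)
    unassigned = set(range(len(points)))
    heads, groups = [], []
    for i in range(len(points)):
        if i not in unassigned:
            continue
        unassigned.remove(i)
        # The group_size - 1 unassigned points closest to point i join its group.
        nearest = sorted(unassigned, key=lambda j: math.dist(points[i], points[j]))
        group = [i] + nearest[: group_size - 1]
        unassigned.difference_update(group)
        groups.append(group)
        # One member of each group is picked at random as the cluster head.
        heads.append(rng.choice(group))
    return heads, groups


# Toy data: two well-separated blobs of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
heads, groups = compress_dataset(points, group_size=3, seed=0)
compressed = [points[h] for h in heads]  # the reduced dataset to be clustered
```

The `compressed` list can then be passed to any K-means or AP implementation in place of the full dataset. Keeping `groups` alongside the heads would let each original point inherit the cluster label of its head, although whether the paper propagates labels this way is not stated in the abstract.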
Authors: Zhao Yanlong (赵延龙), Hua Nan (滑楠) (College of Information & Navigation, Air Force Engineering University, Xi'an 710077, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 5, pp. 1450-1453 (4 pages)
Keywords: clustering; data compression; clustering performance
