Abstract
The traditional k-means algorithm ignores how the data samples are distributed: samples at cluster edges, samples near cluster centers, and outliers are all assigned, by the minimum-distance principle, to the cluster of the nearest cluster center, without considering the relationship between a sample and the other clusters. If a sample's distance to another cluster center is close to its minimum distance, the sample is strongly related to both clusters, and assigning it directly by minimum distance is not reasonable. To address this problem, this paper proposes a nearest-neighbor-optimized k-means clustering algorithm (1NN-kmeans). Following the nearest-neighbor idea, samples that do not firmly belong to any one cluster are assigned to the cluster of their nearest-neighbor sample. Experimental results show that 1NN-kmeans effectively reduces the number of iterations, improves clustering accuracy, and achieves good clustering results.
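The assignment step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say how "not firmly belonging" is measured, so the sketch assumes a hypothetical `ratio` threshold on the nearest versus second-nearest center distance, and it assumes the nearest neighbor is searched among the firmly assigned samples only. The function name `assign_with_1nn` is likewise hypothetical.

```python
import math


def assign_with_1nn(points, centers, ratio=0.8):
    """Assign each point to a cluster index (requires at least 2 centers).

    Assumption (not from the paper): a point is "firm" when its distance
    to the nearest center is at most `ratio` times its distance to the
    second-nearest center. Every other point is treated as ambiguous.
    """
    labels = [None] * len(points)
    ambiguous = []  # (point index, nearest-center index used as fallback)

    for i, p in enumerate(points):
        # Center indices sorted by distance to this point.
        order = sorted(range(len(centers)),
                       key=lambda c: math.dist(p, centers[c]))
        d1 = math.dist(p, centers[order[0]])
        d2 = math.dist(p, centers[order[1]])
        if d1 <= ratio * d2:
            labels[i] = order[0]  # plain minimum-distance assignment
        else:
            ambiguous.append((i, order[0]))

    firm = [i for i, lab in enumerate(labels) if lab is not None]

    for i, fallback in ambiguous:
        if firm:
            # Assumption: take the label of the nearest firm sample.
            j = min(firm, key=lambda f: math.dist(points[i], points[f]))
            labels[i] = labels[j]
        else:
            # No firm sample exists, so fall back to the nearest center.
            labels[i] = fallback

    return labels
```

With `ratio=1.0` every point counts as firm and the function reduces to the ordinary minimum-distance assignment of k-means, which makes the two behaviours easy to compare on the same data. In a full algorithm this step would alternate with the usual center update until the assignments stop changing.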
Authors
林涛
赵璨
LIN Tao; ZHAO Can (School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2019, Issue S11, pp. 216-219 (4 pages)
Computer Science
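Funding
Key Project of the Tianjin Natural Science Foundation (13jczdjc34400)
Science and Technology Program of Hebei Province (17214304D)
Tianjin Major Science and Technology Project (14ZCDZGX00818)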
Keywords
K-means
Distribution
Relationship
Cluster
Nearest neighbor