Abstract
The classical k-means algorithm requires the number of clusters k to be fixed in advance, yet in practice k is rarely known precisely. The algorithm is also sensitive to the choice of initial cluster centers, so the result may become trapped in a local minimum, which makes it ineffective for some practical problems. This paper uses density-based spatial clustering of applications with noise (DBSCAN) and combines the selection of local representative points with the Bayesian information criterion (BIC), yielding a small number of BIC core points that accurately reflect the local data distribution. The BIC core points then serve as the initial cluster centers, and their number as the number of clusters k, for k-means clustering of the global data. Experimental results show that the optimized k-means algorithm is feasible and effective.
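The pipeline described in the abstract (density-based seed selection followed by k-means) can be illustrated with a minimal pure-Python sketch. The abstract does not state how BIC is applied when filtering core points, so this sketch does not implement that step: as a stand-in it keeps, for each density-connected group of DBSCAN core points, the core point with the most neighbors. The function names and the parameters `eps`, `min_pts`, and `max_iter` are hypothetical and chosen only for illustration.

```python
import math

def core_points(points, eps, min_pts):
    """Map each DBSCAN core point index to its eps-neighborhood (self included)."""
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    return {i: nb for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}

def seed_centers(points, eps, min_pts):
    """Pick one seed per density-connected group of core points.

    ASSUMPTION: the paper selects seeds with BIC; this stand-in simply keeps
    the core point with the largest neighborhood in each group.
    """
    cores = core_points(points, eps, min_pts)
    seen, seeds = set(), []
    for start in cores:
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first walk over mutually reachable core points
            i = stack.pop()
            group.append(i)
            for j in cores[i]:
                if j in cores and j not in seen:
                    seen.add(j)
                    stack.append(j)
        densest = max(group, key=lambda i: len(cores[i]))
        seeds.append(points[densest])
    return seeds

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means started from the given centers; k = len(centers)."""
    if not centers:
        raise ValueError("no seed centers: try a larger eps or smaller min_pts")
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Toy data: two 3x3 grids of points, far apart from each other.
grid = [(0.1 * x, 0.1 * y) for x in range(3) for y in range(3)]
points = grid + [(5 + px, 5 + py) for px, py in grid]

seeds = seed_centers(points, eps=0.15, min_pts=4)
centers, labels = kmeans(points, seeds)
print(len(seeds), centers)
```

In this sketch k is never passed explicitly; it falls out of the number of seeds, which mirrors the idea in the abstract that the number of selected core points determines the number of clusters.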
Source
Journal of Anhui University of Technology (Natural Science) (《安徽工业大学学报(自然科学版)》)
CAS
2010, No. 4, pp. 409-412 (4 pages)
Funding
Supported by the Natural Science Foundation of the Anhui Provincial Department of Education (KJ2008B103)