
A k-Means Clustering Algorithm Based on the Bayesian Information Criterion (cited by: 15)
Abstract: The typical k-means algorithm requires the number of clusters k to be fixed in advance, yet in practice k is difficult to determine precisely. In addition, because the algorithm depends on its initial cluster centers, the clustering result may become trapped in a local minimum, which makes the algorithm ineffective for some practical problems. The proposed method runs the density-based clustering algorithm DBSCAN (density-based spatial clustering of applications with noise) and applies the Bayesian information criterion (BIC) when selecting local representative points, yielding a small number of BIC core points that accurately reflect the local data distribution. k-means clustering is then performed on the global data, using the BIC core points as the initial cluster centers and their number as the number of clusters k. Experimental results show that the optimized k-means algorithm is effective and feasible.
Author: 储岳中 (Chu Yuezhong)
Source: Journal of Anhui University of Technology (Natural Science) 《安徽工业大学学报(自然科学版)》, indexed in CAS, 2010, No. 4, pp. 409-412 (4 pages)
Funding: Natural Science Foundation project of the Anhui Provincial Department of Education (KJ2008B103)
Keywords: spatial clustering; k-means clustering; Bayesian information criterion (BIC); density-based clustering algorithm (DBSCAN); core points
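The two-stage procedure in the abstract (DBSCAN plus BIC to pick core points, then global k-means seeded with them) can be sketched as below. The abstract does not say exactly how BIC is combined with DBSCAN, so this is a minimal sketch under a stated assumption, not the author's method: within each DBSCAN cluster it tries m = 1..max_sub sub-centers and keeps the m that minimizes BIC = -2 ln L + p ln n for a spherical Gaussian model with shared variance (an X-means-style rule). The function names `dbscan`, `bic_core_points`, `bic_kmeans`, the parameter `max_sub`, and the farthest-first seeding of the per-cluster k-means are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain O(n^2) DBSCAN: one cluster id per point, -1 for noise."""
    neighbors = [[j for j, q in enumerate(points) if sq_dist(p, q) <= eps * eps]
                 for p in points]
    labels: list = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claimed but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
        cluster += 1
    return labels


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(mean(members) if members else old)  # keep empty centers
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def farthest_first(points: Sequence[Point], m: int) -> List[Point]:
    """Deterministic seeding (assumption): first point, then farthest points."""
    centers = [points[0]]
    while len(centers) < m:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def bic(points: Sequence[Point], centers: List[Point], labels: List[int]) -> float:
    """BIC (lower is better) of a hard-assignment spherical Gaussian mixture."""
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # ML shared variance, floored to avoid log(0)
    counts = [labels.count(c) for c in range(k)]
    log_lik = (sum(nc * math.log(nc / n) for nc in counts if nc > 0)
               - n * d / 2 * math.log(2 * math.pi * var) - n * d / 2)
    n_params = (k - 1) + k * d + 1  # mixing weights, means, shared variance
    return -2 * log_lik + n_params * math.log(n)


def bic_core_points(points: Sequence[Point], eps: float, min_pts: int,
                    max_sub: int = 3) -> List[Point]:
    """Per DBSCAN cluster, keep the sub-centers whose count minimizes BIC."""
    labels = dbscan(points, eps, min_pts)
    cores: List[Point] = []
    for cid in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == cid]
        best = None
        for m in range(1, max(1, min(max_sub, len(members) - 1)) + 1):
            centers, sub = kmeans(members, farthest_first(members, m))
            score = bic(members, centers, sub)
            if best is None or score < best[0]:
                best = (score, centers)
        cores.extend(best[1])
    return cores


def bic_kmeans(points: Sequence[Point], eps: float, min_pts: int,
               max_sub: int = 3) -> Tuple[List[Point], List[int]]:
    """Global k-means with k = number of BIC core points, seeded by them."""
    cores = bic_core_points(points, eps, min_pts, max_sub)
    if not cores:
        raise ValueError("DBSCAN found no clusters; adjust eps or min_pts")
    return kmeans(points, cores)
```

For example, two unit squares (each with a center point) placed 10 units apart give two BIC core points, (0.5, 0.5) and (10.5, 10.5), when called as `bic_kmeans(data, eps=1.5, min_pts=3)`. In this sketch DBSCAN noise points take no part in choosing the core points, but they are still assigned to a cluster in the final global k-means pass. The values of eps, min_pts, and max_sub have to be tuned for real data.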