Algorithm for clustering of high-dimensional data mixed with numeric and categorical attributes
(Chinese title: 一种高维混合属性数据聚类算法)

Cited by: 6
Abstract: In many applications, datasets contain both numeric and categorical attributes. k-prototype, a clustering method built on k-means and k-modes, is one of the classic methods for clustering such data. After studying existing clustering methods for mixed-attribute data, this paper introduces a new algorithm for clustering mixed data. It improves how prototypes are selected and proposes a new similarity measure for mixed data. Based on that measure, it also proposes a way of assigning data objects to prototypes that differs from k-prototype, clustering with an idea similar to agglomerative clustering in hierarchical clustering. Tests on four real mixed datasets show that, compared with traditional algorithms, the proposed algorithm improves both the accuracy and the stability of clustering.
Affiliation: College of Engineering, Shantou University
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 8, pp. 128-133 (6 pages)
Funding: National Natural Science Foundation of China (No. 61170130)
Keywords: clustering; mixed data; similarity measure; hierarchical clustering
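
For context on the baseline that the abstract compares against, the following is a minimal sketch of the classic k-prototype style dissimilarity: squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical attributes. It does not reproduce the new similarity measure or the agglomerative assignment proposed in the paper, which the abstract does not specify. The function names, the record layout, and the weight `gamma` are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical record layout: (numeric attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(x: Record, prototype: Record, gamma: float = 1.0) -> float:
    """Baseline k-prototype style cost between a record and a prototype.

    gamma (an assumed tuning weight) balances the categorical mismatch
    count against the squared Euclidean distance on numeric attributes.
    """
    x_num, x_cat = x
    p_num, p_cat = prototype
    # Numeric part: squared Euclidean distance, as in k-means.
    numeric_cost = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    # Categorical part: simple matching (count of differing values), as in k-modes.
    categorical_cost = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric_cost + gamma * categorical_cost


def nearest_prototype(x: Record, prototypes: Sequence[Record], gamma: float = 1.0) -> int:
    """Return the index of the prototype with the smallest mixed cost."""
    return min(
        range(len(prototypes)),
        key=lambda i: mixed_dissimilarity(x, prototypes[i], gamma),
    )
```

In this sketch, a larger `gamma` makes categorical disagreement count for more relative to numeric distance, so the choice of weight directly affects which prototype a record is assigned to.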