
K-modes Algorithm Based on Interdependence Redundancy Measure
Cited by: 5
Abstract The distance measure is the foundation of a clustering algorithm and strongly affects its results, yet clustering categorical data remains an important and difficult problem among learning algorithms. The traditional k-modes algorithm defines the distance between the attribute values of two objects by 0-1 matching, which ignores the effect that relationships among attributes have on the distance. To address this problem, this paper defines a new distance based on the interdependence redundancy measure; the distance is determined by two parts, an internal distance and an external distance. The measure reflects both the difference within an attribute itself and the degree to which the other attributes influence that attribute. In experiments against k-modes algorithms based on other distance measures (the traditional k-modes algorithm and Hong Jia's k-modes algorithm), the k-modes algorithm based on the interdependence redundancy measure effectively improves clustering accuracy.
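Source Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core, 2016, No. 8, pp. 1790-1793 (4 pages)
Funding Supported by the National Natural Science Foundation of China (61472089), the Natural Science Foundation of Guangdong Province (2014A030308008), and the Open Project of the State Key Laboratory for Novel Software Technology (KFKT2014B23)
Keywords k-modes algorithm; categorical attributes; interdependence redundancy measure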
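
The abstract contrasts 0-1 matching with a distance built on the interdependence redundancy measure. The sketch below illustrates only those two ingredients; it is not the paper's method. It assumes the commonly used definition R(A_i, A_j) = I(A_i; A_j) / H(A_i, A_j), which this record does not state. The internal and external distance formulas are also not given here, so they are not reproduced. The function names and the zero-entropy convention are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def matching_distance(x, y):
    """0-1 matching distance: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def interdependence_redundancy(col_i, col_j):
    """Assumed definition: R = I(A_i; A_j) / H(A_i, A_j), a value in [0, 1]."""
    h_joint = entropy(list(zip(col_i, col_j)))
    if h_joint == 0:
        # Assumption: two constant attributes carry no shared information.
        return 0.0
    mutual_info = entropy(col_i) + entropy(col_j) - h_joint
    return mutual_info / h_joint
```

Under this assumed definition, R is near 1 when one attribute determines the other and near 0 when the two are independent. That is the kind of inter-attribute relationship that plain 0-1 matching cannot see.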

