Abstract
Distance measures are the foundation of clustering algorithms and strongly affect their performance. Clustering categorical data, however, is an important and difficult problem for learning algorithms. The traditional k-modes algorithm defines the distance between the attribute values of two objects by 0-1 matching, which ignores the effect that relationships among attributes have on the distance. To address this problem, this paper defines a new distance based on the interdependence redundancy measure; the distance is determined by two parts, an internal distance and an external distance. This measure reflects not only the difference within an attribute itself but also the degree to which the other attributes influence that attribute. In experiments comparing against k-modes algorithms based on other distance measures (the traditional k-modes algorithm and Hong Jia's k-modes algorithm), the results show that the k-modes algorithm based on the interdependence redundancy measure effectively improves clustering accuracy.
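The abstract does not give the formulas for the internal and external distances, so the following is only a minimal sketch of the two ingredients it names. `simple_matching_distance` is the 0-1 matching distance of traditional k-modes. `interdependence_redundancy` uses the commonly cited definition R(A_i, A_j) = I(A_i; A_j) / H(A_i, A_j), that is, mutual information normalized by joint entropy; whether the paper uses exactly this normalization is an assumption. The function names and the zero-entropy convention are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log2


def simple_matching_distance(x, y):
    """0-1 matching distance: number of attributes on which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def entropy(counts, n):
    """Shannon entropy (base 2) of an empirical distribution given as counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values())


def interdependence_redundancy(col_i, col_j):
    """R = I(A_i; A_j) / H(A_i, A_j), estimated from two attribute columns.

    Assumed definition (not stated in the abstract): 0 means the attributes
    are independent, 1 means each fully determines the other.
    """
    n = len(col_i)
    h_i = entropy(Counter(col_i), n)
    h_j = entropy(Counter(col_j), n)
    h_ij = entropy(Counter(zip(col_i, col_j)), n)
    if h_ij == 0:
        # Both attributes are constant; returning 0.0 is a convention chosen here.
        return 0.0
    mutual_information = h_i + h_j - h_ij
    return mutual_information / h_ij


if __name__ == "__main__":
    # Two objects described by three categorical attributes.
    print(simple_matching_distance(("red", "small", "round"),
                                   ("red", "large", "round")))
    # Dependence between two attribute columns of a toy data set.
    print(interdependence_redundancy(["a", "a", "b", "b"],
                                     ["x", "x", "y", "y"]))
```

The 0-1 distance treats every mismatch as equally large, whereas a measure such as R gives a way to weight how much one attribute tells us about another, which is the kind of inter-attribute relationship the abstract says the new distance takes into account.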
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal (北大核心)
2016, Issue 8, pp. 1790-1793 (4 pages)
Journal of Chinese Computer Systems
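Funding
Supported by the National Natural Science Foundation of China (61472089)
Supported by the Natural Science Foundation of Guangdong Province (2014A030308008)
Supported by the Open Project of the State Key Laboratory for Novel Software Technology (KFKT2014B23)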