K-modes Algorithm Based on Interdependence Redundancy Measure (cited by: 5)

Abstract: Distance measures underlie clustering algorithms and strongly affect their results. Clustering categorical data, however, remains an important and difficult problem for learning algorithms. The traditional k-modes algorithm defines the distance between the attribute values of two objects by 0-1 matching, which ignores how the relationships among attributes affect that distance. To address this problem, we define a new distance based on the interdependence redundancy measure; it consists of two parts, an internal distance and an external distance. The measure reflects both the difference within an attribute itself and the degree to which the other attributes influence that attribute. Experimental comparisons with the traditional k-modes algorithm and Hong Jia's k-modes algorithm show that the k-modes algorithm based on the interdependence redundancy measure effectively improves clustering accuracy.
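The abstract does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions. It shows the 0-1 matching dissimilarity that the traditional k-modes algorithm uses, together with a commonly cited definition of the interdependence redundancy measure between two attributes, R = I(A;B) / H(A,B) (mutual information divided by joint entropy). That definition is assumed here and is not taken from the paper; the function names are hypothetical, and the paper's internal and external distances are not reproduced.

```python
from collections import Counter
from math import log2


def matching_distance(x, y):
    """0-1 (simple matching) dissimilarity: number of attributes whose values differ."""
    return sum(a != b for a, b in zip(x, y))


def interdependence_redundancy(col_a, col_b):
    """Assumed definition R = I(A;B) / H(A,B), estimated from value frequencies.

    col_a and col_b are equal-length sequences holding the values of two
    categorical attributes over the same objects.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))

    mutual_info = 0.0
    joint_entropy = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mutual_info += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
        joint_entropy -= p_ab * log2(p_ab)

    # Both attributes constant: no uncertainty, so report zero dependence.
    return mutual_info / joint_entropy if joint_entropy > 0 else 0.0
```

With 0-1 matching, two objects that differ in one attribute are at distance 1 regardless of how strongly that attribute depends on the others. A measure such as R, which ranges from 0 for independent attributes to 1 for attributes that determine each other, is one way such dependence could be quantified before it is folded into a distance.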

Authors: Huang Yuanhua (黄苑华), Hao Zhifeng (郝志峰), Cai Ruichu (蔡瑞初), Xie Feng (谢峰)

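Affiliations: School of Applied Mathematics, Guangdong University of Technology; School of Computer Science, Guangdong University of Technology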

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 8, pp. 1790-1793 (4 pages)

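Funding: National Natural Science Foundation of China (61472089); Natural Science Foundation of Guangdong Province (2014A030308008); Open Project of the State Key Laboratory for Novel Software Technology (KFKT2014B23)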

Keywords: k-modes algorithm; categorical attributes; interdependence redundancy measure

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
