Entropy-based clustering algorithm for mixed attributes (cited by: 3)
Abstract To address the poor clustering results caused by the difference between the similarity measures used for numerical attributes and for categorical attributes in mixed-attribute data, the similarity-measurement problem in mixed-attribute clustering is analyzed and an entropy-based clustering algorithm for mixed attributes is proposed. Entropy-based discretization is introduced to discretize the numerical attributes, so that the similarity between mixed-attribute objects is measured using only a binary distance. During clustering, k initial cluster centers are selected at random, and each remaining object is assigned to the cluster whose center is nearest. Within each cluster, the most frequent value of each attribute is then selected to form a new cluster center, and the objects are reassigned. This step is iterated until the target condition is met, yielding the final clustering. Experimental results on UCI datasets verify the effectiveness of the algorithm.
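The clustering loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the numerical attributes have already been discretized (the entropy-based discretization step is not shown, since the abstract does not specify it), it assumes the stopping condition is "assignments no longer change", and the names `binary_distance` and `cluster_modes` are hypothetical.

```python
import random
from collections import Counter


def binary_distance(a, b):
    """Binary (0/1 per attribute) distance: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_modes(objects, k, max_iter=100, seed=0):
    """Assign objects to the nearest center, then rebuild centers from modes.

    `objects` is a list of equal-length tuples of discrete attribute values.
    The stopping rule (stable assignments) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centers = rng.sample(objects, k)  # k randomly chosen initial centers
    labels = None
    for _ in range(max_iter):
        # Assign each object to the center at minimum binary distance.
        new_labels = [
            min(range(k), key=lambda c: binary_distance(obj, centers[c]))
            for obj in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # New center: most frequent value of each attribute in the cluster.
        for c in range(k):
            members = [o for o, lab in zip(objects, labels) if lab == c]
            if not members:
                continue  # keep the previous center for an empty cluster
            centers[c] = tuple(
                Counter(column).most_common(1)[0][0] for column in zip(*members)
            )
    return labels, centers


data = [
    ("a", "x", "low"),
    ("a", "x", "low"),
    ("b", "y", "high"),
    ("b", "y", "high"),
    ("a", "y", "low"),
]
labels, centers = cluster_modes(data, k=2, seed=1)
print(labels, centers)
```

Because the initial centers are drawn at random, different seeds can give different partitions; in practice one would probably run several restarts and keep the best result under whatever objective the paper uses.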
Authors QIU Bao-zhi (邱保志); WANG Zhi-lin (王志林) (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)
Source Computer Engineering and Design (《计算机工程与设计》), Peking University core journal, 2021, No. 4, pp. 957-962 (6 pages)
Funding National Natural Science Foundation of China (61602154).
Keywords clustering; mixed attribute; entropy; discretization
