
K-means clustering of generally distributed interval symbolic data (cited by: 11)
Abstract: Most existing clustering methods for interval symbolic data assume that the individuals within an interval are uniformly distributed, which often does not hold in practice. To address this, the paper studies K-means clustering of interval symbolic data with a general distribution. It defines generally distributed interval symbolic data and derives their descriptive statistics from empirical distribution theory. Building on the Hausdorff distance, it proposes a new distance measure for interval symbolic data that takes into account the distribution of the individuals contained in each interval, and on this basis presents a K-means clustering algorithm for generally distributed interval symbolic data. A random simulation experiment evaluates the validity of the method. The results show that, under all the experimental conditions considered, K-means clustering that accounts for the general distribution is more effective than K-means clustering under the uniform-distribution assumption. Finally, the method is applied to a cluster analysis of automobiles, which further illustrates its advantages in practical problems.
Source: Journal of Management Sciences in China (《管理科学学报》), CSSCI, Peking University Core Journal, 2013, No. 3, pp. 21-28 (8 pages)
Funding: National Natural Science Foundation of China (grants 71271147, 71003072)
Keywords: interval symbolic data; general distribution; symbolic data analysis; clustering analysis
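
For orientation, the classical Hausdorff distance between two closed intervals, on which the abstract says the new measure is built, can be sketched as follows. This is only the baseline distance, which uses the interval bounds alone. It is not the paper's proposed measure, whose formula is not given in this record. The function names, the summation across variables, and the nearest-prototype assignment step are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def hausdorff_interval(x: Interval, y: Interval) -> float:
    """Classical Hausdorff distance between two closed intervals.

    For [a1, b1] and [a2, b2] it reduces to max(|a1 - a2|, |b1 - b2|).
    It ignores how individuals are distributed inside each interval.
    """
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def hausdorff_distance(u: Sequence[Interval], v: Sequence[Interval]) -> float:
    """Distance between two objects described by several interval variables.

    Summing the per-variable distances is an assumption made for this sketch.
    """
    return sum(hausdorff_interval(a, b) for a, b in zip(u, v))


def assign(objects: Sequence[Sequence[Interval]],
           prototypes: Sequence[Sequence[Interval]]) -> List[int]:
    """K-means style assignment step: index of the nearest prototype per object."""
    return [
        min(range(len(prototypes)),
            key=lambda k: hausdorff_distance(obj, prototypes[k]))
        for obj in objects
    ]
```

A distribution-aware variant, as described in the abstract, would replace `hausdorff_interval` with a measure that also uses the individuals observed inside each interval.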

