Clustering Algorithm Based on a K-Nearest-Neighbor Chained Similarity Measure (基于K近邻链式相似性度量的聚类算法) Cited by: 2

Chained Similarity Measurement Based on K Neighbors
Abstract Clustering is an important data mining method. Its goal is to partition a data set into classes, or clusters, according to some criterion, so that objects in the same class are as similar as possible and objects in different classes are as dissimilar as possible. Similarity measurement is therefore a key step in cluster analysis. Traditional clustering algorithms that use Euclidean distance as the similarity measure do not reflect the global consistency of non-convex data sets well. To address this, the paper proposes a distance measure built on Euclidean distance that uses density and nearest neighbors to construct a chain of nearest neighbors, and computes the distance between two points on a manifold along that chain. For data sets with manifold or non-convex structure, the measure reflects both local and global consistency well. To verify its effectiveness, clustering results obtained with different distance measures are compared on two-dimensional and three-dimensional data sets using the K-medoids and Affinity Propagation clustering algorithms, and good experimental results are achieved. Finally, some problems of the method and the follow-up research plan are summarized.
Authors LIU Jia-wei (刘佳伟); TANG Jin-ping (唐锦萍) (School of Data Science and Technology, Heilongjiang University, Harbin, Heilongjiang 150080, China)
Source Computer Simulation (《计算机仿真》), Peking University Core Journal (北大核心), 2023, No. 8, pp. 382-388, 420 (8 pages in total)
Funding National Natural Science Foundation of China (11701159).
Keywords Clustering; Distance; Density; Manifold; Non-convex dataset; Neighbors
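
The abstract's idea of measuring distance along a chain of nearest neighbors, rather than straight through space, can be illustrated with a small sketch. The abstract does not give the paper's density-based formula, so the code below does not reproduce the authors' measure. It swaps in a simpler, well-known stand-in: the shortest-path (geodesic-style) distance over a symmetric k-nearest-neighbor graph with Euclidean edge weights. The function names `knn_graph` and `chain_distance`, the choice `k=2`, and the half-circle test data are all assumptions made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Store the edge in both directions so chains can be walked either way.
            graph[i][j] = d
            graph[j][i] = d
    return graph


def chain_distance(graph, source, target):
    """Length of the shortest neighbor chain from source to target (Dijkstra)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry, a shorter chain to u was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target is not reachable through the neighbor graph


# Illustrative data (an assumption): 21 points on a half circle of radius 1.
points = [(math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)]
graph = knn_graph(points, k=2)
euclid = math.dist(points[0], points[20])   # straight line between the two ends
chained = chain_distance(graph, 0, 20)      # distance following the arc
print(f"Euclidean: {euclid:.3f}, along neighbor chain: {chained:.3f}")
```

On this arc the two end points look close under Euclidean distance but are far apart along the chain of neighbors, which is the kind of non-convex structure the abstract has in mind. A matrix of such chain distances could then be passed to a clustering method that accepts precomputed dissimilarities, such as K-medoids.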
