Research on the Similarity Measurement of High Dimensional Data (cited 18 times)
Abstract: When distance measures designed for low-dimensional spaces (such as the Lk-norm) are applied to high-dimensional spaces, the contrast between the distances of different objects disappears as the number of dimensions increases. Finding effective distance or similarity (dissimilarity) measures for high-dimensional data is therefore an important and challenging problem. After analyzing why traditional distance and similarity (dissimilarity) measures are ill-suited to high-dimensional spaces, and after surveying existing similarity measures for high-dimensional data, this paper proposes HDsim(X,Y), an improved version of the high-dimensional similarity function Hsim(X,Y). HDsim(X,Y) integrates similarity measures for different data types. It retains the strengths of the original Hsim(X,Y) on numerical data, of the Jaccard coefficient on binary data, and of the matching ratio on categorical data. Validity and case analyses demonstrate that HDsim(X,Y) is effective for computing the similarity between objects in high-dimensional space.
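The abstract makes two technical claims that a small experiment can make concrete. First, Lk distances lose contrast as dimensionality grows. Second, HDsim combines a numeric measure (Hsim), the Jaccard coefficient and the matching ratio. The Python sketch below is illustrative only. It assumes the form Hsim(X,Y) = (1/n) Σ 1/(1 + |x_i − y_i|), which is commonly cited for Hsim but is not defined in the abstract. Its `mixed_similarity` function is a hypothetical attribute-count-weighted average and is not the paper's HDsim formula, which the abstract does not give. All function names are illustrative.

```python
import random


def lk_distance(x, y, k=2.0):
    """Minkowski (Lk) distance between two equal-length numeric vectors."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(dim, n_points=200, k=2.0, seed=0):
    """(D_max - D_min) / D_min from one random query to n random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [lk_distance(query, [rng.random() for _ in range(dim)], k)
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


def hsim(x, y):
    """ASSUMED form of Hsim: mean of 1 / (1 + |x_i - y_i|) over numeric attributes."""
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def jaccard(x, y):
    """Jaccard coefficient for 0/1 vectors: f11 / (f01 + f10 + f11)."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    denom = f11 + mismatches
    # Two all-zero vectors are treated as identical (a convention, not from the paper).
    return f11 / denom if denom else 1.0


def matching_ratio(x, y):
    """Fraction of categorical attributes on which x and y take the same value."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def mixed_similarity(num, bin_, cat):
    """HYPOTHETICAL combination, not the paper's HDsim: per-type scores
    averaged with weights equal to the number of attributes of each type.

    Each argument is an (x_part, y_part) pair; empty parts are skipped.
    """
    total, weight = 0.0, 0
    for fn, (x, y) in ((hsim, num), (jaccard, bin_), (matching_ratio, cat)):
        if x:
            total += fn(x, y) * len(x)
            weight += len(x)
    if weight == 0:
        raise ValueError("no attributes to compare")
    return total / weight


if __name__ == "__main__":
    # Contrast between nearest and farthest neighbours shrinks as dim grows.
    for d in (2, 10, 100, 1000):
        print(d, round(relative_contrast(d), 3))
    s = mixed_similarity(
        num=([1.0, 2.0], [1.0, 3.0]),
        bin_=([1, 0, 1], [1, 1, 1]),
        cat=(["red", "small"], ["red", "large"]),
    )
    print(round(s, 4))  # → 0.6429
```

On uniform random data the printed contrast drops sharply between d = 2 and d = 1000, which is the loss of distance contrast that the abstract describes. In the mixed example, the per-type scores are 0.75 (2 numeric attributes), 2/3 (3 binary) and 0.5 (2 categorical). Weighting each score by its attribute count gives 4.5/7.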
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University core journal list, 2010, No. 5, pp. 92-96 (5 pages)
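Funding: National Key Technology R&D Program of China (2007BAH16B03); National High Technology Research and Development Program of China (863 Program) (2009AA12Z228)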
Keywords: high dimensional data; similarity measurement; attribute similarity; spatial similarity