A self-tuning method of LSH index (一种LSH索引的自动参数调整方法)

Cited by: 6

Abstract: To overcome an inherent drawback of locality-sensitive hashing (LSH), this paper proposes a method that automatically tunes the key parameters of the LSH index structure according to the data. The method is dataset-oriented: the index selects appropriate hash functions based on the statistical characteristics of each dataset, so the key parameters no longer have to be set by hand. This improves the accuracy of LSH without adding time or space overhead at query time. Simulation experiments show that, compared with the original LSH algorithm, nearest-neighbor queries using this method return result sets whose similarity is about 10% higher and whose similarity deviation (error) is about 8% lower. Because parameter tuning takes place before querying, the improved and original LSH algorithms have the same time and space performance when answering queries.
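The abstract does not spell out the tuning rule, so the following is only a minimal sketch of the general idea of choosing an LSH parameter from the data before any query is served. It assumes binary vectors and bit-sampling hash functions, with `k` sampled bits per hash key and `L` hash tables. The function names (`build_index`, `query`, `tune_k`), the `target` hit rate, and the selection rule itself are hypothetical stand-ins for illustration, not the authors' method.

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def build_index(points, k, L, rng):
    """Build L hash tables; each table keys a point by k sampled bit positions."""
    dim = len(points[0])
    projections = [rng.sample(range(dim), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for idx, p in enumerate(points):
        for proj, table in zip(projections, tables):
            table.setdefault(tuple(p[i] for i in proj), []).append(idx)
    return projections, tables

def query(index, points, q):
    """Return the index of the closest point found in q's buckets, or None."""
    projections, tables = index
    candidates = set()
    for proj, table in zip(projections, tables):
        candidates.update(table.get(tuple(q[i] for i in proj), ()))
    if not candidates:
        return None
    return min(candidates, key=lambda i: hamming(points[i], q))

def tune_k(points, sample_queries, candidate_ks, L, target=0.9, seed=0):
    """Hypothetical rule (an assumption, not from the paper): return the
    largest k whose hit rate on the sample queries reaches `target`,
    falling back to the smallest candidate if none does."""
    best = min(candidate_ks)
    for k in sorted(candidate_ks):
        index = build_index(points, k, L, random.Random(seed))
        hits = 0
        for q in sample_queries:
            found = query(index, points, q)
            true_dist = min(hamming(p, q) for p in points)
            if found is not None and hamming(points[found], q) == true_dist:
                hits += 1
        if hits / len(sample_queries) >= target:
            best = k
    return best

# Tuning runs once, before queries; the query path itself is unchanged.
rng = random.Random(42)
data = [tuple(rng.randint(0, 1) for _ in range(32)) for _ in range(200)]
samples = [tuple(rng.randint(0, 1) for _ in range(32)) for _ in range(20)]
chosen_k = tune_k(data, samples, candidate_ks=[4, 8, 12], L=5)
index = build_index(data, chosen_k, 5, random.Random(0))
nearest = query(index, data, samples[0])
```

In this sketch a larger `k` yields smaller buckets and fewer candidates to scan, but more missed neighbors, which is why the rule prefers the largest `k` that still meets the target. All tuning cost is paid while building the index, which mirrors the abstract's point that query-time overhead stays the same.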

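Authors: Lu Yansheng (卢炎生), Rao Qi (饶祺)

Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology (华中科技大学计算机科学与技术学院)

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报（自然科学版）》), 2006, No. 11, pp. 38-40, 57 (4 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).

Funding: Supported by the Natural Science Foundation of Hubei Province (ABA048)

Keywords: high dimensional data indexing; similarity search; approximate nearest neighbor search

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]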

Citation network
Related literature

References (6)

1. Edelsbrunner H. Algorithms in combinatorial geometry[M]. Berlin: Springer-Verlag, 1987.
2. Weber R, Schek H, Blott S. A quantitative analysis and performance study for similarity search methods in high dimensional spaces[C]// Gupta A, Shmueli O, Widom J. Proceedings of the 24th International Conference on Very Large Data Bases (VLDB). New York: Morgan Kaufmann Publishers Inc, 1998: 194-205.
3. Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing[C]// Atkinson M P, Orlowska M E, et al. Proc of VLDB. Edinburgh: Morgan Kaufmann Publishers Inc, 1999: 518-529.
4. Indyk P, Motwani R. Approximate nearest neighbors: towards removing the curse of dimensionality[C]// Laszio B. Proc of STOC. New York: ACM Press, 1998: 604-613.
5. Qamra A, Meng Y, Chang Y. Enhanced distance functions and indexing for image replica recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 375-391.
6. Li B, Chang E, Wu Y. Discovery of a perceptual distance function for measuring image similarity[J]. ACM Multimedia J, 2003, 8(6): 512-522.

Co-cited literature (93)

1. Wang Guoren, Huang Jianmei, Wang Bin, Han Donghong, Qiao Baiyou, Yu Ge. High-dimensional data indexing technique based on maximum-gap space mapping[J]. Journal of Software, 2007, 18(6): 1419-1428. Cited by: 9
2. Daswani N, Garcia-Molina H, Yang B. Open problems in data sharing peer-to-peer systems[C]. Heidelberg: Springer-Verlag, 2003: 1-15.
3. Li J, Loo B T, Hellerstein J, et al. On the feasibility of peer-to-peer web indexing and search[C]. Berkeley: Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS), 2003: 207-215.
4. Reynolds P, Vahdat A. Efficient peer-to-peer keyword searching[C]. Rio de Janeiro, Brazil: Middleware, 2003: 21-40.
5. Indyk P. Approximate nearest neighbor algorithms for Frechet distance via product metrics[C]. Barcelona: Symposium on Computational Geometry, 2002: 102-106.
6. Broder A Z, Charikar M, Frieze A M, et al. Min-wise independent permutations[J]. J Comput System Sci, 2000, 60(3): 630-659.
7. Smith M K. Web ontology issue status[EB/OL]. http://www.w3.org/2001/sw/WebOnt/webont-issues.html, 2003-11.
8. TREC: Text retrieval conference[EB/OL]. http://trec.nist.gov, 2006-05.
9. Schaffalitzky F, Zisserman A. Multi-view matching for unordered image sets, or "how do I organize my holiday snaps?"[C]// Proceedings of the 7th European Conference on Computer Vision, Copenhagen, 2002: 414-431.
10. Snavely N, Seitz S M, Szeliski R. Photo tourism: exploring photo collections in 3D[J]. ACM Transactions on Graphics, 2006, 25(3): 835-846.

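Citing literature (6)

1. Liu Wendi, Cai Ming. Effective information retrieval in structured P2P networks[J]. Computer Engineering and Design, 2009, 30(16): 3787-3789. Cited by: 1
2. Yang Heng, Wang Qing, He Zhoucan. A multiple randomized sub-vector quantization hashing algorithm for high-dimensional image feature matching[J]. Journal of Computer-Aided Design & Computer Graphics, 2010, 22(3): 494-502. Cited by: 9
3. He Zhoucan, Wang Qing, Yang Heng. An extended LSH algorithm for fast image matching[J]. Journal of Sichuan University (Natural Science Edition), 2010, 47(2): 269-274. Cited by: 8
4. Gao Haolin, Xu Xu, Li Bicheng. Approximate nearest neighbor search algorithm: locality-sensitive hashing[J]. Journal of Information Engineering University, 2013, 14(3): 332-340. Cited by: 8
5. Zhao Yuehua, Lin Juwei. Research on a family clustering method for massive virus samples[J]. Computer Engineering and Applications, 2014, 50(18): 118-121.
6. Cao Yudong, Liu Yanyang, Sun Fuming, Jia Xu. An LSH algorithm with low space complexity and its application to image retrieval[J]. Computer Engineering & Science, 2015, 37(2): 379-383. Cited by: 2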

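Second-level citing literature (26)

1. Dong Huiguo. Research and application of P2P-based network search technology[J]. Computer Knowledge and Technology, 2010(2): 824-825. Cited by: 1
2. Chen Huizhong, Chen Yongguang, Jing Ning, Chen Luo. Fast matching of high-dimensional features in remote sensing image retrieval[J]. Journal of Electronics & Information Technology, 2011, 33(9): 2144-2151.
3. Chen Huizhong, Chen Yongguang, Jing Ning, Chen Luo. PCPF: a parallel index structure for high-dimensional vector matching in multimedia databases[J]. Chinese Journal of Computers, 2011, 34(10): 2009-2017. Cited by: 3
4. Zhao Qiwei, Zhang Le, Zhu Beili, Liu Jing. LSH algorithms for high-dimensional data and their applications[J]. Fujian Computer, 2012, 28(4): 13-14. Cited by: 1
5. Shao Shouping, Han Chunyan, Xie Yong, Ju Shenggen. Remote sensing image matching with an improved sequential similarity detection algorithm[J]. Journal of Sichuan University (Natural Science Edition), 2013, 50(2): 288-292. Cited by: 3
6. Cao Yudong, Liu Fuying, Cai Xibiao. Research on high-dimensional image data indexing based on locality-sensitive hashing[J]. Journal of Liaoning University of Technology (Natural Science Edition), 2013, 33(1): 1-3. Cited by: 6
7. Gao Haolin, Xu Xu, Li Bicheng. Approximate nearest neighbor search algorithm: locality-sensitive hashing[J]. Journal of Information Engineering University, 2013, 14(3): 332-340. Cited by: 8
8. Lei Ting. MKd-Tree-based indexing of large-scale graph data in cloud environments[J]. Telecommunication Engineering, 2013, 53(7): 909-916.
9. Li Hongmei, Hao Wenning, Chen Gang. A collaborative filtering recommendation algorithm based on exact Euclidean locality-sensitive hashing[J]. Journal of Computer Applications, 2014, 34(12): 3481-3486. Cited by: 9
10. Tang Kun, Han Bin. A fast SIFT feature point matching algorithm with adaptive search range[J]. CAAI Transactions on Intelligent Systems, 2014, 9(6): 723-728.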

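1. Cao Yudong, Liu Yanyang, Sun Fuming, Jia Xu. An LSH algorithm with low space complexity and its application to image retrieval[J]. Computer Engineering & Science, 2015, 37(2): 379-383. Cited by: 2
2. Zhu Yunfeng. Application of the cosine distance algorithm to text similarity queries in a fixed-asset management system[J]. Journal of Wuxi Institute of Commerce, 2013, 13(6): 96-99. Cited by: 1
3. Xue Xiangyang, Luo Hangzai, Wu Lide. LIFT: an index structure for high-dimensional data[J]. Acta Electronica Sinica, 2001, 29(2): 192-195. Cited by: 5
4. Cao Yudong, Liu Fuying, Cai Xibiao. Research on high-dimensional image data indexing based on locality-sensitive hashing[J]. Journal of Liaoning University of Technology (Natural Science Edition), 2013, 33(1): 1-3. Cited by: 6
5. Liu Wan, Xu Wangming, Shi Hanlu. Image retrieval based on high-dimensional local features and LSH indexing[J]. Electronic Design Engineering, 2011, 19(20): 110-112. Cited by: 1
6. Tang Junhua, Yan Baoping. Fast image retrieval based on LSH indexing[J]. Computer Engineering and Applications, 2002, 38(24): 20-21. Cited by: 6
7. Cao Yudong, Liu Yanyang, Jia Xu, Wang Dongxia. Image spam filtering based on an improved locality-sensitive hashing algorithm[J]. Application Research of Computers, 2016, 33(6): 1693-1696. Cited by: 13
8. Yuan Peisen, Sha Chaofeng, Wang Xiaoling, Zhou Aoying. A learning-based c-approximate nearest neighbor query algorithm for high-dimensional data[J]. Journal of Software, 2012, 23(8): 2018-2031. Cited by: 18
9. Li Daxiang, Wu Qian, Li Na. Shoeprint image retrieval combining LBP features and LSH indexing[J]. Police Technology, 2016(3): 47-49. Cited by: 4
10. Liu Genping. A survey of locality-sensitive hashing algorithms in centralized environments[J]. Mobile Communications, 2015, 39(10): 46-51. Cited by: 1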
