摘要
近似k近邻查询的研究一直受到广泛关注,局部敏感散列(LSH)是解决此问题的主流方法之一。LSH及目前大部分改进版本都会面临以下问题:数据散列以后在桶里分布不均匀;无法准确计算对应参数k的查询范围建立索引。基于此,将支持动态数据索引的LSH和B-tree结合,构建新的SLSB-forest索引结构,使散列桶里的数据维持在一个合理的区间。针对SLSB-forest提出了两种查询算法:快速查找和准确率优先查找,并通过理论和实验证明查找过程中查询范围的动态变化。
The study of approximate k nearest neighbors query has attracted broad attention. Local sensitive hash is one of the mainstream ways to solve this problem. Local sensitive hash and its varients have noted the following problems: the uneven distribution of hashed data in the buckets, it cannot calculate the appropriate query range (for the parameter k) to build index. To tackle the above problem, a new data struct which called SLSB-forest was pre- sented. The SLSB-forest combined the LSH and the B-tree to maintain the amount of bucket's data in reasonable range. Two query algorithms were proposed: fast and accurate priority search. Theory and experimental results prove that query range can dynamic change during searching approximate k nearest neighbors.
出处
《电信科学》
北大核心
2017年第9期58-68,共11页
Telecommunications Science
基金
国家自然科学基金资助项目(No.61472194
No.61572266)
浙江省自然科学基金资助项目(No.LY16F020003)~~
关键词
近似k近邻
局部敏感散列
高维数据
approximate k nearest neighbor, locality sensitive hash, high dimensionality