摘要
在大规模视频、图像、文本检索等许多实际应用中,高维空间内海量数据的索引及近邻检索一直是难点和关键问题之一.传统的K-D树等树形索引方法在高维空间中容易陷入"维度灾难",而主流的哈希散列方法(如局部敏感哈希)空间复杂度较高,在大规模数据下难以应用.本文总结了近年来基于向量量化的检索算法的相关研究,提出了一种基于GPU优化的高维数据近似近邻检索算法,在组合量化算法的基础上融合双层索引树结构与局部子空间最优化思想,在提高算法准确率的同时针对GPU模型优化算法,极大改善了检索性能,在单张GPU上实现了十亿量级高维数据的高效近似近邻检索.
ANN( Approximate Nearest Neighbor) search for large-scale data in high-dimensional space have been one of the most difficult and key issues in many multimedia applications. Tradition methods based on tree partition,like K-D tree,falls into 'the curse of dimensionality'while hashing techniques such as LSH comes with high spatial complexity and requires unacceptable RAMusage. In this paper,we propose a GPU-based ANN search algorithm for billion-scale high-dimensional data based on composite quantization.We first combines the multi-layer inverted index structure with local subspace optimization in composite quantization to achieve high accuracy,then adjust the calculate process constrained with the GPU model to reduce GRAMusage and enhance retrieval performance.Finally,our methods successfully implement efficient ANN searching for billion scale high-dimensional data on one single GPU.
作者
邓理睿
包涵
陈靓
全成斌
赵有健
DENG Li-rui;BAO Han;CHEN Liang;QUAN Cheng-bin;ZHAO You-jian(Department of Computer Science and Technology,Tsinghua University,Beijing 100084,China;Research Institute of Petroleum Exploration&Development,PetroChina,Beijing 100083,China)
出处
《小型微型计算机系统》
CSCD
北大核心
2019年第2期390-394,共5页
Journal of Chinese Computer Systems
基金
国家自然科学基金项目(61472210
61233007)资助
关键词
近似近邻检索
组合量化
GPU
高维索引
approximate nearest neighbor
composite quantization
GPU
high-dimension indexing