Abstract
To overcome the inherent shortcomings of LSH (locality-sensitive hashing) indexing, an improved, dataset-oriented approach is presented that automatically tunes the key parameters of the LSH index structure according to the data. Instead of requiring these parameters to be set by hand, the approach selects appropriate hash functions based on the statistical features of each dataset. This improves the accuracy of the LSH algorithm without adding any time or space overhead at query time. Simulation experiments show that, compared with the original LSH method, nearest neighbor queries using the new approach improve the similarity within the result set by about 10% and reduce the similarity deviation (error) of the result set by about 8%. Because parameter tuning is completed before any query is processed, the improved and the original LSH algorithms have the same time and space performance during queries.
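The abstract does not state which LSH parameters are tuned or which dataset statistics drive the tuning. As a hypothetical illustration only, the Python sketch below uses the p-stable LSH scheme for Euclidean distance, not the paper's own method. It derives the bucket width `w` from the median of sampled pairwise distances once, before indexing and querying. The names `estimate_bucket_width` and `PStableLSH`, the scale factor 0.5, and the choices of `k` and `L` are all assumptions.

```python
import math
import random
from collections import defaultdict


def estimate_bucket_width(data, sample_pairs=200, scale=0.5, seed=0):
    """Hypothetical tuning step (assumption, not the paper's method):
    set w to `scale` times the median distance between sampled pairs."""
    rng = random.Random(seed)
    dists = []
    for _ in range(sample_pairs):
        i, j = rng.sample(range(len(data)), 2)
        dists.append(math.dist(data[i], data[j]))
    dists.sort()
    return max(scale * dists[len(dists) // 2], 1e-12)


class PStableLSH:
    """L hash tables; each key is k values h(v) = floor((a.v + b) / w)."""

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        self.w = w
        # a ~ N(0, 1)^dim (Gaussian is 2-stable), b ~ U[0, w)
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(L)
        ]
        self.tables = [defaultdict(list) for _ in range(L)]
        self.data = []

    def _key(self, t, v):
        # Concatenate the k hash values of table t into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[t]
        )

    def insert(self, v):
        idx = len(self.data)
        self.data.append(v)
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)

    def query(self, q, n=1):
        # Union of the buckets q falls into, ranked by exact distance.
        cand = set()
        for t, table in enumerate(self.tables):
            cand.update(table.get(self._key(t, q), []))
        return sorted(cand, key=lambda i: math.dist(self.data[i], q))[:n]


rng = random.Random(42)
points = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
w = estimate_bucket_width(points)  # tuning runs once, before any query
index = PStableLSH(dim=8, k=4, L=8, w=w, seed=1)
for p in points:
    index.insert(p)
print(index.query(points[10], n=1))  # → [10]
```

Because `w` is fixed before the tables are built, each query performs the same hashing and bucket lookups as an index with a hand-set `w`. This mirrors the abstract's point that tuning adds no query-time overhead.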
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》)
EI
CAS
CSCD
Peking University Core Journal
2006, Issue 11, pp. 38-40, 57 (4 pages)
Funding
Natural Science Foundation of Hubei Province (ABA048)
Keywords
high-dimensional data indexing
similarity search
approximate nearest neighbor search