Abstract
Indexing is a core component of content-based similarity search, and data partitioning is the key factor affecting index performance. This paper proposes a partition strategy for high-dimensional data spaces: a secondary, key-dimension-based partition applied on top of the traditional distance-based partition, together with a corresponding index technique. The key-dimension-based partition guarantees that twin sibling nodes do not overlap, and the index uses the selected key dimension to filter a second time between twin nodes, which strengthens its filtering ability. Together, the partition strategy and the index technique improve the filtering efficiency of the index severalfold. Experimental results show that the key dimension substantially improves the similarity search performance of the index, which is significant for accelerating content-based multimedia information retrieval.
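The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the single-level structure, the choice of the highest-variance dimension as the key dimension, the median split points, and the names `build`, `split_by_key_dimension`, and `range_query` are all assumptions made for illustration, and Euclidean distance is assumed so that a difference in one coordinate never exceeds the full distance.

```python
import math
from statistics import median, pvariance

def dist(a, b):
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_by_key_dimension(points):
    """Secondary split: choose a key dimension (here: highest variance,
    an assumption) and cut at its median, so the two twins occupy
    non-overlapping intervals on that dimension."""
    dims = range(len(points[0]))
    key = max(dims, key=lambda k: pvariance([p[k] for p in points]))
    cut = median(p[key] for p in points)
    left = [p for p in points if p[key] <= cut]
    right = [p for p in points if p[key] > cut]
    return key, [t for t in (left, right) if t]

def build(points, pivot):
    """Distance partition around a pivot, then a key-dimension split
    of each distance group into twins. Points are tuples."""
    d = {p: dist(p, pivot) for p in points}
    radius = median(d.values())
    groups = [[p for p in points if d[p] <= radius],
              [p for p in points if d[p] > radius]]
    index = []
    for group in groups:
        if not group:
            continue
        key, twins = split_by_key_dimension(group)
        for twin in twins:
            index.append({
                "dmin": min(d[p] for p in twin),   # distance range to pivot
                "dmax": max(d[p] for p in twin),
                "key": key,                        # key dimension of this twin
                "lo": min(p[key] for p in twin),   # interval on key dimension
                "hi": max(p[key] for p in twin),
                "points": twin,
            })
    return index

def range_query(index, pivot, q, r):
    """Return all indexed points within distance r of q."""
    dq = dist(q, pivot)
    hits = []
    for node in index:
        # First filter: triangle inequality on the distance to the pivot.
        if dq + r < node["dmin"] or dq - r > node["dmax"]:
            continue
        # Second filter: the query ball must reach the twin's key-dimension interval.
        k = node["key"]
        if q[k] + r < node["lo"] or q[k] - r > node["hi"]:
            continue
        hits.extend(p for p in node["points"] if dist(p, q) <= r)
    return hits
```

In this sketch the second test can discard a twin that the distance test alone would have to scan, which is the kind of additional filtering between twin nodes that the abstract describes.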
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core (北大核心)
2004, No. 9, pp. 1361-1374 (14 pages)
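Funding
National Natural Science Foundation of China
Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions, Ministry of Education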
Keywords
High-dimensional index
Metric space
Key dimension
Range query
Nearest neighbor query
Algorithms
Data processing
Indexing (of information)
Trees (mathematics)