Abstract
Managing high-dimensional, large-scale data imposes a very heavy computational load, and traditional centralized indexing techniques have become impractical. To meet the demands of rapidly growing, massive, high-dimensional data, MRC-Tree (Computation based on Map-Reduce on tree structures) is implemented: a high-level distributed framework for search and computation on tree index structures, built on Map-Reduce in the Hadoop environment. On top of the MRC-Tree framework, two methods for constructing MKd-Tree (Kd-Tree based on Map-Reduce) index structures are proposed: OMKd-Tree, which builds one distributed Kd-Tree with Map-Reduce, and MMKd-Tree, which builds multiple Kd-Trees with Map-Reduce after splitting the data equally. Theoretical analysis and experimental results show that the MKd-Tree construction methods based on the MRC-Tree framework scale well and achieve high retrieval efficiency for similarity search over high-dimensional data.
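The MMKd-Tree idea described above (split the data equally, build one Kd-Tree per split, then combine the results) can be illustrated with a minimal single-process sketch. It is not the paper's implementation and does not use Hadoop: the "map" and "reduce" steps are simulated with plain Python functions, and every name here (`split_equally`, `build_kdtree`, `map_search`, `reduce_merge`, `mm_search`) is hypothetical. The round-robin split and the k-nearest-neighbor query are assumptions chosen for illustration.

```python
import heapq
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """One Kd-Tree node: a point, its splitting axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a Kd-Tree by median split, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(node: Optional[Node], query: Point, k: int, heap: list) -> None:
    """Collect the k nearest points in `heap` as (-distance, point) entries."""
    if node is None:
        return
    d = sq_dist(node.point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node.point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node.point))
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best distance (or fewer than k points were found so far).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, query, k, heap)


def split_equally(points: List[Point], n_parts: int) -> List[List[Point]]:
    """Assumed partitioning: round-robin into n_parts near-equal splits."""
    return [points[i::n_parts] for i in range(n_parts)]


def map_search(tree: Optional[Node], query: Point, k: int) -> List[Tuple[float, Point]]:
    """Simulated map step: local k nearest neighbors within one split's tree."""
    heap: list = []
    knn(tree, query, k, heap)
    return [(-neg_d, p) for neg_d, p in heap]


def reduce_merge(partials: List[List[Tuple[float, Point]]], k: int) -> List[Tuple[float, Point]]:
    """Simulated reduce step: merge local results into the global k nearest."""
    merged = [item for part in partials for item in part]
    return heapq.nsmallest(k, merged)


def mm_search(points: List[Point], query: Point, k: int, n_parts: int) -> List[Tuple[float, Point]]:
    """Build one tree per split, search each, and merge the answers."""
    trees = [build_kdtree(part) for part in split_equally(points, n_parts)]
    return reduce_merge([map_search(t, query, k) for t in trees], k)
```

Because each split returns its own k best candidates, the merged result contains the true global k nearest neighbors; the cost is that every tree must be queried. A single distributed tree in the spirit of OMKd-Tree would trade that for a more involved construction, which this sketch does not attempt.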
Source
《电讯技术》
Peking University Core Journal (北大核心)
2013, No. 7, pp. 909-916 (8 pages)
Telecommunication Engineering