Abstract
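To address the low efficiency of index creation and maintenance, a distributed inverted index construction algorithm based on DHT (Distributed Hash Table) is designed. The algorithm uses distributed hash table technology built on an improved Chord network to spread the word-segmentation results across multiple index servers, which build the index in parallel. It also uses predecessor-list lookup and techniques that reduce server lookup latency, which substantially shortens index construction time. With a centrally scheduled, block-based incremental update strategy for the inverted index, existing index files no longer need to be moved during an update, which improves update efficiency. A periodic stabilization algorithm and predecessor-list lookup improve the system's stability and fault tolerance and the consistency of the index.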
A distributed inverted index construction method based on DHT (Distributed Hash Table) was adopted to improve the efficiency of index creation and updating. The algorithm, using DHT technology based on an improved Chord network, hashes the terms and their related information to the distributed index servers and builds the index in parallel. This method reduces index construction time by distributing the task across many nodes. Scheduling the index-building task through chained index management servers and using an incremental distributed inverted index update method ensure index consistency and update efficiency.
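The term-placement idea in the abstract can be illustrated with a minimal sketch, assuming a plain Chord-style ring rather than the paper's improved Chord network. The names `chord_id`, `Ring`, and `build_index`, the server labels, the 16-bit identifier space, and the use of SHA-1 are all assumptions made for illustration. Predecessor lists, stabilization, and block-based updates are not modeled here.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

M = 16  # identifier bits; an assumed value, not taken from the paper


def chord_id(key: str, m: int = M) -> int:
    """Map a string (term or server name) onto the 2**m identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class Ring:
    """A static ring of index servers (hypothetical; no joins or failures)."""

    def __init__(self, server_names):
        # Sort servers by their ring identifier.
        self.nodes = sorted((chord_id(name), name) for name in server_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key_id: int) -> str:
        """Return the first server whose id is >= key_id, wrapping around."""
        pos = bisect_left(self.ids, key_id)
        return self.nodes[pos % len(self.nodes)][1]


def build_index(ring, docs):
    """Route each term's postings to the server that owns the term's id.

    docs maps doc_id -> list of already segmented terms.
    Returns server -> term -> list of (doc_id, position) postings.
    """
    shards = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for position, term in enumerate(terms):
            server = ring.successor(chord_id(term))
            shards[server][term].append((doc_id, position))
    return shards
```

Because a term always hashes to the same identifier, all postings for that term land on a single server, so each server could build its shard independently of the others. In this sketch the shards are filled sequentially in one process; a real deployment would send the postings to the servers over the network.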
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2010, No. 2, pp. 65-70 (6 pages)
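Funding
Supported by the National Natural Science Foundation of China (60873225, 60773191, 70771043) and the National High Technology Research and Development Program of China (863 Program) (2007AA01Z403).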