Abstract
To address the inefficiency caused by the ASPSeek inverted index relying directly on the operating system's file-buffer access mechanism, this study used 1.25 million Chinese agricultural web pages as a sample, adopted an inverted index storage structure with variable block sizes, and designed a dedicated buffer management mechanism based on seven replacement strategies: LRU, MRU, LFU, MFU, Clock, Random, and FPA. Comparative tests of buffer hit rate and query access time across the seven strategies showed that Clock was the better replacement strategy when all terms were retrieved with equal probability, whereas the FPA algorithm proposed in this study was the better replacement strategy when terms were retrieved with specific, differing probabilities.
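As a rough illustration of how two of the listed strategies can be compared by hit rate, the following is a minimal sketch and not the authors' implementation. It assumes each term's posting list occupies a single buffer slot (the paper's variable block sizes are ignored) and that the buffer capacity is at least 1. The names `LRUBuffer`, `ClockBuffer`, and `hit_rate` are hypothetical. FPA is not described in this abstract, so it is not sketched.

```python
from collections import OrderedDict


class LRUBuffer:
    """Evicts the least recently used term when the buffer is full."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed >= 1
        self.blocks = OrderedDict()   # oldest entry first, newest last
        self.hits = 0
        self.accesses = 0

    def access(self, term):
        self.accesses += 1
        if term in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(term)    # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # drop least recently used
        self.blocks[term] = True
        return False


class ClockBuffer:
    """Second-chance (Clock) replacement with one reference bit per frame."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed >= 1
        self.frames = []          # term held in each frame
        self.ref = []             # reference bit of each frame
        self.index = {}           # term -> frame position
        self.hand = 0
        self.hits = 0
        self.accesses = 0

    def access(self, term):
        self.accesses += 1
        pos = self.index.get(term)
        if pos is not None:
            self.hits += 1
            self.ref[pos] = 1     # give the frame a second chance
            return True
        if len(self.frames) < self.capacity:
            self.index[term] = len(self.frames)
            self.frames.append(term)
            self.ref.append(1)
            return False
        # Sweep: clear reference bits until a frame with bit 0 is found.
        while self.ref[self.hand]:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.frames[self.hand]]
        self.frames[self.hand] = term
        self.ref[self.hand] = 1
        self.index[term] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False


def hit_rate(buffer, trace):
    """Replays a sequence of query terms and returns hits / accesses."""
    for term in trace:
        buffer.access(term)
    return buffer.hits / buffer.accesses if buffer.accesses else 0.0
```

Replaying the same query trace through each buffer, for example `hit_rate(LRUBuffer(2), trace)` and `hit_rate(ClockBuffer(2), trace)`, gives directly comparable hit rates. The equal-probability and differing-probability query workloads mentioned in the abstract would correspond to different ways of generating `trace`.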
Source
Journal of Xinjiang Agricultural University
CAS
PKU Core
2011, No. 2, pp. 161-164 (4 pages)
Funding
Science and Technology Research Project of Xinjiang Uygur Autonomous Region (200931103)
Keywords
agricultural search engine
inverted index (inverted file)
buffer replacement strategy