Abstract
Retrieving information on the Internet is hard, and search engines make it easier. The volume of data on the Internet is so large that retrieval techniques designed for conventional databases cannot meet the demand. To address this, Google applies technologies such as parallel processing, index barrels, data compression, and the PageRank algorithm. The result is a complex architecture with five parts: the crawler, the repository, the index system (including the indexer, barrels, file index, and so on), the sorter, and the searcher. Google's ranking system combines several factors: term frequency (count-weight), hit type (type-weight), proximity (prox-weight), and page importance. The most notable of these is the PageRank algorithm, which computes page importance by applying the citation theory of academic literature retrieval to the Web: a page is important if many pages point to it, or if some important pages (pages that themselves have a high PageRank) point to it. Applying PageRank greatly improves retrieval efficiency.
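The PageRank idea summarized in the abstract can be illustrated with a minimal sketch. This is not Google's implementation: the function name `pagerank`, the toy link graph, the normalized form of the update, the even spreading of rank from pages without outgoing links, and the damping factor of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative power-iteration PageRank (hypothetical helper).

    `links` maps each page to the list of pages it points to.
    `damping` and `iterations` are assumed values, not taken from the article.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base share regardless of its in-links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (made up for this example): A, B and D all point to C; C points to D.
toy_web = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": ["C"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, C ranks highest because many pages point to it, while D ranks above A and B even though only one page points to it, because that one page (C) is itself important. These are the two cases named in the abstract.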
Source
Journal of Harbin University of Commerce: Natural Sciences Edition
CAS
2006, No. 1, pp. 84-87 (4 pages)