Abstract
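Objective: University information resources are numerous and widely distributed. To meet the demand for search services over these resources, and the requirement that search results be personalized and timely, a vertical search engine for university information is designed. Methods: The Nutch web crawler and the Lucene indexing platform are combined with vertical search techniques such as Chinese word segmentation and topic prediction. With these, a vertical search engine with market potential is researched and implemented for the specific domain of university information resources. Results: On the Lucene platform and an improved version of the Nutch open-source crawler framework, a prototype vertical search engine for university information is designed and built. Its functions include web page crawling, web page parsing, data indexing, and data searching. Conclusion: The system can provide users with retrieval, query, and analysis services for information on a given topic at a university. It also achieves significantly higher query accuracy and efficiency than traditional search engines.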
University information resources are widely distributed. To meet the demand for search services over these resources, and the requirement for personalized and real-time search results, we designed a vertical search engine system oriented towards universities. Using the Nutch web crawler and the Lucene indexing platform, combined with vertical search engine techniques such as Chinese word segmentation and topic prediction, we researched and implemented a vertical search engine system with market potential for the specific domain of university information resources. Based on the Lucene platform and an improved Nutch open-source crawler framework, we designed and built a prototype vertical search engine for university information. Its functions include web page crawling, web page parsing, data indexing, and data searching. We conclude that the system can provide users with retrieval, query, analysis, and other services for information on a specific topic at a university. Its accuracy and efficiency are markedly improved compared with traditional search engines.
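The abstract outlines a crawl, parse, index, search pipeline built on Nutch and Lucene, with Chinese word segmentation and topic prediction. It does not describe the implementation. The sketch below is therefore a toy, standard-library Python illustration of that kind of pipeline. It is not the authors' method, and it does not use Nutch or Lucene APIs. The assumptions are as follows. Overlapping character bigrams stand in for a real Chinese word segmenter. A keyword-ratio threshold stands in for topic prediction. A simple TF-IDF score stands in for Lucene's ranking. `TOPIC_TERMS`, `tokenize`, `topic_score`, `TinyIndex`, `build_index`, and the example URLs and pages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical topic vocabulary for "university information" (an assumption,
# not the paper's topic model).
TOPIC_TERMS = {"高校", "大学", "招生", "专业"}

# Runs of CJK ideographs, or runs of ASCII letters/digits.
TOKEN_RUN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def tokenize(text):
    """Stand-in for Chinese word segmentation: overlapping character bigrams
    for CJK runs, lower-cased tokens for ASCII runs."""
    tokens = []
    for run in TOKEN_RUN.findall(text):
        if run.isascii():
            tokens.append(run.lower())
        elif len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def topic_score(tokens, topic_terms=TOPIC_TERMS):
    """Stand-in for topic prediction: fraction of tokens in the topic vocabulary."""
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}, scored with TF-IDF."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_len = {}

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        n_docs = len(self.doc_len)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term)
            if not docs:
                continue
            idf = math.log(1 + n_docs / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf / self.doc_len[doc_id] * idf
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def build_index(pages, threshold=0.05):
    """Parse each fetched page, keep it only if it looks on-topic, then index it."""
    index = TinyIndex()
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude HTML-to-text "parse" step
        if topic_score(tokenize(text)) >= threshold:
            index.add(url, text)
    return index


# Pages a crawler might have fetched (made-up URLs and content).
pages = {
    "http://example.edu/a": "<p>沈阳建筑大学 2012 年本科招生专业目录</p>",
    "http://example.edu/b": "<p>高校图书馆开放时间</p>",
    "http://example.com/c": "<p>today's weather forecast</p>",
}
index = build_index(pages)
print([url for url, _ in index.search("招生专业")])  # ['http://example.edu/a']
```

The off-topic weather page is filtered out before indexing. This is the role the topic-prediction step plays in a focused vertical crawler: it keeps the index limited to the target domain. Overlapping bigrams are a common baseline for CJK text because they need no dictionary. A production system would swap in a proper segmenter and a real index such as Lucene.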
Source
Journal of Shenyang Jianzhu University: Natural Science (《沈阳建筑大学学报(自然科学版)》)
Indexed in: CAS; Peking University Core Journals
2012, No. 3, pp. 555-562 (8 pages)
Funding
National Natural Science Foundation of China (60874103)
Education Department of Liaoning Province Fund Project (L2010449)