Research and Implementation of a Vertical Search Engine for University Information

Abstract. Objective: University information resources are numerous and widely distributed. Users need a search service for this domain and expect personalized, real-time results. To meet these needs, a vertical search engine system oriented toward university information was designed. Methods: The system combines the Nutch web crawler and the Lucene indexing platform with vertical search techniques such as Chinese word segmentation and topic prediction. Using these, a vertical search engine with market potential was researched and implemented for the specific domain of university information resources. Results: Based on the Lucene platform and an improved Nutch open-source crawler framework, a prototype vertical search engine for university information was designed and built. It provides web page crawling, web page parsing, data indexing, and data searching. Conclusion: The system can provide users with retrieval, querying, and analysis of information on a given topic related to a university. Both its query accuracy and its efficiency are significantly improved compared with traditional search engines.
Source: Journal of Shenyang Jianzhu University: Natural Science (《沈阳建筑大学学报(自然科学版)》), 2012, No. 3, pp. 555-562 (8 pages). Indexed in CAS; Peking University Core Journal.
Funding: National Natural Science Foundation of China (60874103); Foundation of the Education Department of Liaoning Province (L2010449).
Keywords: university information; Lucene; vertical search; Nutch
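The filter, index, and search stages summarized in the abstract can be illustrated with a small self-contained sketch. It is not the authors' system and does not use the Lucene or Nutch APIs. It rests on three stated assumptions. First, Chinese text is split into overlapping character bigrams rather than by a dictionary-based word segmenter. Second, the topic-prediction step is replaced by a hypothetical keyword-hit threshold (`onTopic`, with an illustrative topic term set). Third, ranking is a plain term-frequency sum over an in-memory inverted index. The class name `MiniVerticalIndex`, its methods, and the sample pages are all hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch of the topic-filter, index and search stages of a
 * vertical search engine. Not Lucene/Nutch code: Han text is split into
 * character bigrams, the topic check is a keyword-hit threshold, and
 * ranking is a term-frequency sum.
 */
public class MiniVerticalIndex {
    // Inverted index: term -> (document id -> term frequency in that document).
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Lowercased ASCII/digit words plus overlapping bigrams for each run of Han characters. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder ascii = new StringBuilder();
        StringBuilder han = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                flushWord(ascii, tokens);
                han.append(c);
            } else if (Character.isLetterOrDigit(c)) {
                flushHan(han, tokens);
                ascii.append(Character.toLowerCase(c));
            } else { // whitespace or punctuation ends both kinds of run
                flushWord(ascii, tokens);
                flushHan(han, tokens);
            }
        }
        flushWord(ascii, tokens);
        flushHan(han, tokens);
        return tokens;
    }

    private static void flushWord(StringBuilder run, List<String> out) {
        if (run.length() > 0) out.add(run.toString());
        run.setLength(0);
    }

    private static void flushHan(StringBuilder run, List<String> out) {
        if (run.length() == 1) out.add(run.toString()); // a lone character is its own token
        for (int i = 0; i + 1 < run.length(); i++) out.add(run.substring(i, i + 2));
        run.setLength(0);
    }

    /** Hypothetical topic filter: accept a page if at least minHits tokens are topic terms. */
    public static boolean onTopic(String text, Set<String> topicTerms, int minHits) {
        int hits = 0;
        for (String t : tokenize(text)) if (topicTerms.contains(t)) hits++;
        return hits >= minHits;
    }

    /** Indexes a page and returns its document id. */
    public int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String t : tokenize(text)) {
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** AND query: ids of documents containing every query term, highest total tf first. */
    public List<Integer> search(String query) {
        Map<Integer, Integer> scores = null;
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            Map<Integer, Integer> p = postings.getOrDefault(term, Collections.emptyMap());
            if (scores == null) {
                scores = new HashMap<>(p);
            } else {
                Map<Integer, Integer> next = new HashMap<>();
                for (Map.Entry<Integer, Integer> e : scores.entrySet()) {
                    Integer tf = p.get(e.getKey());
                    if (tf != null) next.put(e.getKey(), e.getValue() + tf);
                }
                scores = next;
            }
        }
        if (scores == null) return Collections.emptyList();
        final Map<Integer, Integer> s = scores;
        List<Integer> ids = new ArrayList<>(s.keySet());
        ids.sort((a, b) -> {
            int c = Integer.compare(s.get(b), s.get(a)); // higher score first
            return c != 0 ? c : Integer.compare(a, b);   // ties: lower id first
        });
        return ids;
    }

    public static void main(String[] args) {
        Set<String> topic = new HashSet<>(Arrays.asList("大学", "招生", "学院"));
        String[] pages = {"沈阳建筑大学 2012 年招生简章", "某大学信息学院招生 招生 咨询", "今日股票行情"};
        MiniVerticalIndex idx = new MiniVerticalIndex();
        for (String page : pages) {
            if (onTopic(page, topic, 1)) idx.add(page); // off-topic pages are never indexed
        }
        System.out.println(idx.search("招生"));
    }
}
```

Running `main` prints `[1, 0]`. The third page shares no token with the topic set, so it is never indexed. Page 1 ranks above page 0 because the bigram `招生` occurs in it twice. Bigram tokenization needs no dictionary, but it also produces spurious terms such as `阳建` (from 沈阳建筑), which a real Chinese word segmenter would avoid.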