Design and Implementation of a Biomedical Information Vertical Search Engine Using Nutch
Abstract When searching the massive volume of information on the web, medical intelligence research and information service organizations often need to build topic-oriented vertical search systems to meet the needs of specific user groups. This paper uses open-source software, including Nutch and Lucene, to design and implement a vertical search engine for biomedical information, and explains the key techniques involved: web page crawling, format processing, content indexing, and retrieval. By adding Chinese word segmentation and incremental crawling (re-crawl) modules, the search engine improves the recognition rate of Chinese keywords and shortens the information update cycle. The system is currently in online testing and returns relatively accurate and timely search results.
Source Beijing Biomedical Engineering (《北京生物医学工程》), 2010, No. 6, pp. 638-640, 644 (4 pages)
Keywords Nutch; web crawling; Lucene; Chinese word segmentation; incremental crawling (re-crawl)
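The abstract credits Chinese word segmentation with improving keyword recognition but does not say which segmentation method the system uses. As an illustration only, the sketch below shows forward maximum matching, one common dictionary-based approach. The function name `forward_max_match`, the toy vocabulary, and the `max_len` parameter are all assumptions made for this example and are not taken from the paper or from Nutch or Lucene.

```python
def forward_max_match(text, vocab, max_len=4):
    """Segment text greedily, taking the longest vocabulary word at each position.

    Illustrative sketch only: the paper does not state its segmentation method.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the vocabulary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan keeps advancing.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; a real system would load a much larger dictionary.
vocab = {"生物", "医学", "生物医学", "信息", "搜索", "引擎", "搜索引擎"}

tokens = forward_max_match("生物医学信息搜索引擎", vocab)
print(tokens)
```

With this vocabulary the query is split into whole terms such as "生物医学" and "搜索引擎" rather than isolated characters, which is the kind of effect that lets an index match multi-character Chinese keywords as single units.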