Abstract
With the rapid advance of informatization, online job hunting has become increasingly common, and general-purpose search engines and recruitment websites have become the main channels through which university students obtain employment information. To address the low precision of employment information returned by most general-purpose search engines and the cluttered content of recruitment websites, this paper studies an employment-themed search engine system for university students. The system is built through secondary development on the Heritrix crawler tool and the Solr full-text search engine: Heritrix's default crawling strategy and queue allocation strategy are optimized, and IK Analyzer is introduced to improve the accuracy of Solr's Chinese word segmentation. Tests of the system prototype show that the system achieves good crawling efficiency and precision.
Authors
郑燕娥
郑志明
ZHENG Yan-e; ZHENG Zhi-ming (College of Engineering and Technology, Yang'en University, Fujian Quanzhou 362014, China; Meizhouwan Vocational Technology College, Fujian Putian 351254, China)
Source
《齐齐哈尔大学学报(自然科学版)》
2018, No. 4, pp. 13-20 (8 pages)
Journal of Qiqihar University (Natural Science Edition)
Funding
Fujian Province 2016 Education and Research Project for Young and Middle-aged Teachers (JAT160591)