Abstract
This paper improves the method for selecting the value of K in the K-means algorithm used by the HMM crawler. Then, to address the low relevance of crawled page content to the target topic, it modifies the assumptions of the hidden Markov model and completes the design of the improved hidden Markov crawler.
Source
Henan Science and Technology (《河南科技》)
2016, No. 17, pp. 27-28 (2 pages)
Keywords
web crawler
algorithm
improvement