
The Design and Implementation of a Topic Crawler Based on JavaScript and Other Multi-link Analysis (基于JavaScript等多链接分析的主题爬虫设计实现). Cited by: 4
Abstract: To handle the large number of dynamic links found in web pages, this paper proposes extracting page links with a parsing approach that simulates a browser, and designs and implements a topic crawler system based on the analysis of JavaScript and other types of links.
Author: Liu Bing (刘兵)
Source: Journal of Xuchang University (《许昌学院学报》), CAS, 2010, No. 2, pp. 87-90 (4 pages)
Keywords: topic crawler; link analysis; correlation
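
The abstract does not say how the system finds dynamic links, so the following is only a minimal sketch of the general idea, not the author's implementation. A parser that truly simulates a browser would execute the page's scripts. This example uses static pattern matching instead, as a simplified stand-in. It collects ordinary `href` targets and also URL-like string literals inside `javascript:` links and `on*` event-handler attributes. The names `LinkExtractor` and `extract_links`, and both regular expressions, are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: link targets inside script snippets appear as quoted string literals.
_QUOTED = re.compile(r"""['"]([^'"]+)['"]""")
# Assumption: a literal is "URL-like" if it starts like a URL or path,
# or ends in a common page extension (optionally followed by a query string).
_URL_LIKE = re.compile(r"^(https?://|/|\.{1,2}/)|\.(html?|php|aspx?|jsp)(\?|$)", re.I)


class LinkExtractor(HTMLParser):
    """Collects static links plus URL-like literals found in inline scripts."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def _add(self, candidate):
        # Resolve relative targets against the page URL and skip duplicates.
        url = urljoin(self.base_url, candidate.strip())
        if url not in self.links:
            self.links.append(url)

    def _add_from_script(self, code):
        # Static approximation: scan string literals instead of running the script.
        for literal in _QUOTED.findall(code):
            if _URL_LIKE.search(literal):
                self._add(literal)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "href":
                if value.lower().startswith("javascript:"):
                    self._add_from_script(value[len("javascript:"):])
                elif not value.startswith("#"):
                    self._add(value)
            elif name.startswith("on"):  # onclick, onmouseover, ...
                self._add_from_script(value)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Static matching of this kind misses links that scripts assemble at run time (for example by string concatenation), which is the case a browser-simulating parser is meant to cover.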

References (5)

  • 1 Zhou Lizhu, Lin Ling. 聚焦爬虫技术研究综述 (A survey of focused crawler technology) [J]. 计算机应用 (Journal of Computer Applications), 2005, 25(9): 1965-1969. Cited by: 156
  • 2 Du Guangqin, Zhang Huaxiang, Zhao Ruidong. 主题Web挖掘研究 (Research on topic-based Web mining) [J]. 计算机技术与发展 (Computer Technology and Development), 2008, 18(2): 94-97. Cited by: 3
  • 3 Liu H Y, Milios E, Janssen J. Focused Crawling by Learning HMM from User's Topic-specific Browsing [C]. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. Washington DC, USA: IEEE Computer Society, 2004.
  • 4 Guo Q, Guo H, Zhang Z Q. Schema Driven Topic Specific Web Crawling [C]. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2005: 594-599.
  • 5 Soumen Chakrabarti, Martin van den Berg, Byron Dom. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery [J]. Computer Networks, 1999, 31(11): 1623-1640.



