
Information Extraction Method for Dynamic Web Pages Based on Similar Records Induction (SRI)
Cited by: 1

Abstract: Dynamic Web pages are pages generated automatically by programs, and it is estimated that most Web pages exist in this form. This paper proposes an information extraction method for dynamic Web pages based on similar records induction (SRI). The method uses a string edit distance algorithm and a DOM tree alignment algorithm to induce a wrapper tree for the record items. In extraction experiments on various types of dynamic Web pages, the method achieved a recall of 98.11% and a precision of 96.90%.
Source: Journal of Chongqing Institute of Technology (Natural Science Edition), 2009, No. 10, pp. 87-93 (7 pages)
Funding: National Natural Science Foundation of China (Grant Nos. 60873153 and 60803061)
Keywords: dynamic Web page; information extraction; wrapper; DOM tree
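The abstract names two building blocks: an edit distance between strings and an alignment of DOM trees. The paper's own algorithms are not given on this page, so the following is only a minimal Python sketch under stated assumptions: DOM nodes are reduced to tag names (the `Node` class is hypothetical), subtrees are compared by standard Levenshtein distance over their pre-order tag sequences, Simple Tree Matching (a well-known order-preserving tree alignment) stands in for the paper's tree alignment step, and the sibling-grouping rule and its `threshold` are illustrative choices, not the paper's.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def tag_sequence(node: Node) -> List[str]:
    """Pre-order list of tag names, used as a string-like signature of a subtree."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def similarity(a: Node, b: Node) -> float:
    """Similarity in [0, 1]: 1 minus edit distance normalized by the longer sequence."""
    sa, sb = tag_sequence(a), tag_sequence(b)
    longest = max(len(sa), len(sb))
    return 1.0 if longest == 0 else 1.0 - edit_distance(sa, sb) / longest


def simple_tree_matching(a: Node, b: Node) -> int:
    """Simple Tree Matching: size of the largest top-down, order-preserving
    matching between two trees (0 if the roots differ)."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1]
                           + simple_tree_matching(a.children[i - 1], b.children[j - 1]))
    return dp[m][n] + 1  # +1 for the matched roots


def similar_siblings(parent: Node, threshold: float = 0.7) -> List[List[Node]]:
    """Group runs of adjacent children whose similarity to the previous child
    reaches `threshold` (hypothetical rule); keep only groups of 2+ records."""
    groups: List[List[Node]] = []
    for child in parent.children:
        if groups and similarity(groups[-1][-1], child) >= threshold:
            groups[-1].append(child)
        else:
            groups.append([child])
    return [g for g in groups if len(g) > 1]
```

For example, two `<tr>` rows that differ only by one extra `<td>` have tag sequences at edit distance 1 and similarity 0.8, so they are grouped as candidate records, and Simple Tree Matching pairs all four nodes of the shorter row with nodes of the longer one. A full wrapper induction would then merge the aligned record trees into one generalized wrapper tree; that step is not shown here.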