
An Algorithm for Creating Regular Expression Parse Trees Without Rewriting

Cited by: 2
Abstract: Data extraction often uses regular expressions (REs) to describe data sources. To visualize such descriptions, an RE must be converted into a parse tree. However, existing rewriting-based methods for constructing RE parse trees destroy the inner structure of data objects, so they are not suitable for data extraction. This paper proposes an algorithm that constructs RE parse trees without rewriting. Experiments show that the algorithm outperforms existing RE parse tree construction algorithms in time and space performance as well as in practicality.
Author: 邓绪斌 (Deng Xubin)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journals, 2007, No. 12, pp. 65-66, 84 (3 pages)
Funding: Zhejiang Provincial Department of Education project: Research on Highly Automated Web Information Extraction Tools (20060144)
Keywords: regular expression; parse tree; data extraction; rewriting
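
Illustrative sketch: this record does not reproduce the paper's algorithm, so the following is not the author's method. It is a generic recursive-descent parser, written under the assumption of a small RE syntax (single-character literals, concatenation, `|`, `*`, `+`, `?`, and parentheses), that shows what "without rewriting" can mean in practice: operators such as `+` and `?` are kept as their own tree nodes instead of being rewritten into equivalent forms (for example `a+` into `aa*`), so the tree mirrors the expression as written. The names `Node`, `parse`, `show`, and `POSTFIX` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Postfix operators are mapped to node kinds and never rewritten.
POSTFIX = {"*": "star", "+": "plus", "?": "opt"}


@dataclass
class Node:
    kind: str                      # "lit", "cat", "alt", "star", "plus", "opt"
    value: str = ""                # only used by "lit" nodes
    children: List["Node"] = field(default_factory=list)


def parse(pattern: str) -> Node:
    """Build a parse tree that preserves the structure of `pattern`."""
    pos = 0

    def peek() -> Optional[str]:
        return pattern[pos] if pos < len(pattern) else None

    def alternation() -> Node:
        nonlocal pos
        branches = [concatenation()]
        while peek() == "|":
            pos += 1
            branches.append(concatenation())
        return branches[0] if len(branches) == 1 else Node("alt", children=branches)

    def concatenation() -> Node:
        items = []
        while peek() is not None and peek() not in "|)":
            items.append(repetition())
        if not items:
            raise ValueError(f"empty expression at position {pos}")
        return items[0] if len(items) == 1 else Node("cat", children=items)

    def repetition() -> Node:
        nonlocal pos
        node = atom()
        while peek() in POSTFIX:
            node = Node(POSTFIX[peek()], children=[node])
            pos += 1
        return node

    def atom() -> Node:
        nonlocal pos
        ch = peek()
        if ch is None or ch in POSTFIX:
            raise ValueError(f"operand expected at position {pos}")
        pos += 1
        if ch == "(":
            node = alternation()
            if peek() != ")":
                raise ValueError(f"')' expected at position {pos}")
            pos += 1
            return node
        return Node("lit", value=ch)

    tree = alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


def show(node: Node) -> str:
    """Render the tree as an S-expression for inspection."""
    if node.kind == "lit":
        return node.value
    return "(" + " ".join([node.kind] + [show(c) for c in node.children]) + ")"


print(show(parse("(ab)+c?")))  # (cat (plus (cat a b)) (opt c))
```

In this sketch the group `(ab)+` stays a single `plus` node over one `cat` subtree, which is the kind of structural fidelity the abstract argues data extraction requires.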

