Deep Web Data Source Discovery Based on Search Engine (cited by: 1)
Abstract: This paper proposes a method for discovering Deep Web data sources using a search engine. To submit high-quality keywords to the search engine, an ontology is introduced into the initial word construction process as a framework for organizing vocabulary hierarchically. All words are classified by how frequently they occur in the current domain, then reclassified according to the number of elements in the interface set returned by the search engine, which ensures that the selected keywords are those that contribute most to discovering data source query interfaces. Test results in different domains show that the method discovers a considerable number of query interfaces, which verifies its effectiveness.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 5, pp. 77-79, 82 (4 pages in total)
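Funding: National Natural Science Foundation of China (60970015); 2008 Jiangsu Province Major Science and Technology Support and Independent Innovation Program (BE2008044)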
Keywords: data source discovery; Deep Web; ontology

References (7)

  • 1 Chakrabarti S, Punera K, Subramanyam M. Accelerated Focused Crawling Through Online Relevance Feedback[C]//Proceedings of the 11th International Conference on World Wide Web. Honolulu, Hawaii, USA: [s. n.], 2002: 148-159.
  • 2 Lin Chao, Zhao Pengpeng, Cui Zhiming. Deep Web数据源聚焦爬虫 [Focused Crawler for Deep Web Data Sources][J]. Computer Engineering (计算机工程), 2008, 34(7): 56-58. Cited by: 11
  • 3 Bai He, Tang Dibin, Wang Jinlin. 分布式多主题网络爬虫系统的研究与实现 [Research and Implementation of a Distributed Multi-Topic Web Crawler System][J]. Computer Engineering (计算机工程), 2009, 35(19): 13-16. Cited by: 20
  • 4 Chang K C C, He B. Toward Large Scale Integration: Building a Meta Querier over Databases on the Web[C]//Proceedings of the 2nd Conference on Innovative Data Systems Research. Asilomar, California, USA: [s. n.], 2005: 44-55.
  • 5 Barbosa L, Freire J. Searching for Hidden Web Databases[C]//Proceedings of the 8th International Workshop on the Web and Database. Baltimore, Maryland, USA: [s. n.], 2005: 1-6.
  • 6 Barbosa L, Freire J. An Adaptive Crawler for Locating Hidden Web Entry Points[C]//Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta, Canada: [s. n.], 2007: 441-450.
  • 7 Raghavan S, Garcia-Molina H. Crawling the Hidden Web[C]//Proceedings of the 27th VLDB Conference. Roma, Italy: [s. n.], 2001: 129-138.

Second-level references (14)

  • 1 Qian Rong, Xu Xinhua, Zheng Ying, Yang Bingru. 智能专题化信息搜集Crawler [Intelligent Topic-Specific Information Gathering Crawler][J]. Computer Engineering (计算机工程), 2006, 32(3): 57-59. Cited by: 4
  • 2 Rungsawang A, Angkawattanawit N. Learnable Topic-specific Web Crawler[J]. Journal of Network and Computer Applications, 2005, 28(2): 97-114.
  • 3 Chakrabarti S, van den Berg M, Dom B. Focused Crawling: A New Approach to Topic-specific Web Resource Discovery[C]//Proceedings of the 8th International World-Wide Web Conference. Toronto, Canada: [s. n.], 1999.
  • 4 Liu Hongyu, Milios E, Janssen J. Probabilistic Models for Focused Web Crawling[C]//Proceedings of the 6th Annual ACM International Workshop on Web Information and Data Management. New York, USA: ACM Press, 2004.
  • 5 Florescu D, Levy A, Mendelzon A. Database Techniques for the World-Wide Web: A Survey[J]. SIGMOD Record, 1998, 27(3): 59-74.
  • 6 Wei Jiying, Wen Jirong. Instance-based Schema Matching for Web Databases by Domain-specific Query Probing[C]//Proceedings of the 30th International Conference on VLDB. Toronto, Canada: [s. n.], 2004.
  • 7 Kevin Chang Chenchuan. Structured Databases on the Web: Observations and Implications[J]. SIGMOD Record, 2004, 33(3): 61-65.
  • 8 Cho J, Garcia-Molina H, Page L. Efficient Crawling Through URL Ordering[J]. Computer Networks and ISDN Systems, 1998, 30(7): 161-172.
  • 9 Rennie J, McCallum A. Using Reinforcement Learning to Spider the Web Efficiently[C]//Proc. of the International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann Publishers Inc., 1999: 335-343.
  • 10 Diligenti M, Coetzee F M, Lawrence S, et al. Focused Crawling Using Context Graphs[C]//Proc. of the International Conference on Very Large Database. San Francisco, USA: Morgan Kaufmann Publishers Inc., 2000: 527-534.

Co-citing literature (29)

Co-cited literature (11)

Citing literature (1)

Second-level citing literature (1)
