Abstract
The rapid growth of the Internet has made a vast amount of information accessible. However, most of what today's search engines index belongs to the Surface Web; for technical reasons, they can index almost none of the information in the Deep Web. A query interface is the only entrance to a Deep Web database, but not every web form is a query interface. To make full use of the information held in Deep Web back-end databases, the entrance to those databases must first be found, so identifying query interfaces correctly is essential. This paper presents a method that uses the C4.5 decision tree classification algorithm to determine automatically whether a web form is a Deep Web query interface.
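As a rough illustration of the kind of classifier the abstract refers to, the sketch below computes the gain ratio, the attribute-selection criterion that C4.5 uses when choosing which feature to split on. The form features (`has_password`, `has_text_box`, `method`) and the six labelled forms are hypothetical and are not taken from the paper; the abstract does not say which features the authors actually used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5 gain ratio of a categorical feature: information gain / split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Expected entropy of the labels after partitioning on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features that fragment the data into many values.
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_feature(rows, labels):
    """Feature a C4.5-style tree would test at the root of this data set."""
    return max(rows[0].keys(), key=lambda f: gain_ratio(rows, labels, f))


# Hypothetical training data: one dict of form features per web form.
forms = [
    {"has_password": True,  "has_text_box": True,  "method": "post"},
    {"has_password": False, "has_text_box": True,  "method": "get"},
    {"has_password": False, "has_text_box": True,  "method": "get"},
    {"has_password": True,  "has_text_box": True,  "method": "post"},
    {"has_password": False, "has_text_box": False, "method": "get"},
    {"has_password": False, "has_text_box": True,  "method": "post"},
]
# Hypothetical labels: True means the form is a Deep Web query interface.
is_query_interface = [False, True, True, False, False, True]

if __name__ == "__main__":
    for name in forms[0]:
        print(name, round(gain_ratio(forms, is_query_interface, name), 3))
    print("root split:", best_feature(forms, is_query_interface))
```

A full C4.5 implementation would apply this selection recursively to each partition, handle continuous attributes and missing values, and prune the resulting tree; the sketch covers only the choice of a single split.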
Source
Computer & Digital Engineering (《计算机与数字工程》)
2009, Issue 3, pp. 131-134 (4 pages)