Single-choice question
One of the difficulties in building an SQL-like query language for the Web is the absence of a database {{U}} {{U}} 11 {{/U}} {{/U}} for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual schema from the implicit structure of these files. Thus, at the highest level of {{U}} {{U}} 12 {{/U}} {{/U}}, every such document is identified by its Uniform Resource Locator (URL), a {{U}} {{U}} 13 {{/U}} {{/U}}, and a text. Web servers also provide some additional information, such as the type, length, and last modification date of a document. So for data mining purposes, we can consider the set of all HTML documents as a relation:
Document (url, title, text, type, length, modif)
where all the {{U}} {{U}} 14 {{/U}} {{/U}} are character strings. In this framework, an individual document is identified with a {{U}} {{U}} 15 {{/U}} {{/U}} in this relation. Of course, if some optional information is missing from an HTML document, the associated fields will be left blank, but this is not uncommon in any database.
Single-choice question (11)
  • A. schema
  • B. platform
  • C. module
  • D. relation
【Correct answer】 A
【Explanation】
Single-choice question (12)
  • A. protocol
  • B. control
  • C. abstraction
  • D. presentation
【Correct answer】 C
【Explanation】
Single-choice question (13)
  • A. table
  • B. title
  • C. driver
  • D. event
【Correct answer】 B
【Explanation】
Single-choice question (14)
  • A. type
  • B. links
  • C. characteristics
  • D. attributes
【Correct answer】 D
【Explanation】
Single-choice question (15)
  • A. relation
  • B. field
  • C. script
  • D. tuple
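【Correct answer】 D
【Explanation】[Analysis] One of the difficulties in building an SQL-like query language for the Web is the lack of a database schema for this huge, heterogeneous information repository. However, if we care only about HTML documents, we can construct a virtual schema from the inherent structure of these files. Thus, at the highest level of abstraction, every such document can be identified by its URL, title, and text. In addition, Web servers provide some additional information, such as the type, length, and last modification date of a document. Therefore, from a data mining perspective, we can regard the set of all HTML documents as a relation: Document (url, title, text, type, length, modif), where all attributes are character strings. In this framework, an individual document is identified by a tuple in the relation. Of course, if an HTML document is missing some optional information, the associated fields will be null, but this is a common practice in any database.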
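As a rough illustration of the relation described in the passage (not part of the original question), the sketch below loads Document (url, title, text, type, length, modif) into an in-memory SQLite table using Python's standard `sqlite3` module and runs an SQL query against it. All six attributes are declared `TEXT` to mirror "character strings". The URLs and field values are hypothetical sample data, and representing a missing optional field as an empty string is an assumption made for this sketch.

```python
import sqlite3

# Virtual schema from the passage: one relation, six attributes,
# all stored as character strings (TEXT).
SCHEMA = """
CREATE TABLE Document (
    url    TEXT,
    title  TEXT,
    text   TEXT,
    type   TEXT,
    length TEXT,
    modif  TEXT
)
"""

def build_document_relation(rows):
    """Load (url, title, text, type, length, modif) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO Document VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn

# Hypothetical sample documents: each document is exactly one tuple.
# The second document has no title, so that field is left blank ("").
rows = [
    ("http://example.com/a.html", "Data Mining", "Sample body text.",
     "text/html", "1024", "2001-05-01"),
    ("http://example.com/b.html", "", "Another body text.",
     "text/html", "512", "2001-06-15"),
]

conn = build_document_relation(rows)

# An SQL query over the virtual schema: find documents with a blank title.
untitled = conn.execute("SELECT url FROM Document WHERE title = ''").fetchall()
print(untitled)  # [('http://example.com/b.html',)]
```

Because `length` and `modif` are kept as strings, comparisons on them are lexicographic; a numeric filter on `length` would need something like `CAST(length AS INTEGER)` in the query.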