Single-choice question
One of the difficulties in building an SQL-like query
language for the Web is the absence of a database{{U}} {{U}} 11
{{/U}} {{/U}}for this huge, heterogeneous repository of information. However,
if we are interested in HTML documents only, we can construct a virtual schema
from the implicit structure of these files. Thus, at the highest level
of{{U}} {{U}} 12 {{/U}} {{/U}}, every such document is
identified by its Uniform Resource Locator (URL), and has a{{U}} {{U}}
13 {{/U}} {{/U}}and a text. Also, Web servers provide some additional
information such as the type, length, and the last modification date of a
document. So for data mining purposes, we can consider the set of all HTML
documents as a relation: Document (url, title, text, type,
length, modif) where all the{{U}} {{U}} 14
{{/U}} {{/U}}are character strings. In this framework, an individual document
is identified with a{{U}} {{U}} 15 {{/U}} {{/U}}in this
relation. Of course, if some optional information is missing from the HTML
document, the associated fields will be left blank, but this is not uncommon in
any database.
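
The Document relation described above can be illustrated with a minimal sketch in Python. The sample URLs, the sample values, and the `select` helper are assumptions made for illustration only and are not part of the passage; the field names follow the relation as given, and every value is kept as a character string, with missing optional information left blank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Field names follow the relation Document(url, title, text, type, length, modif).
    # Every value is a character string; optional ones default to blank.
    url: str
    title: str = ""
    text: str = ""
    type: str = ""
    length: str = ""
    modif: str = ""


# Hypothetical sample data, invented for this sketch.
docs = [
    Document(
        url="http://example.org/a.html",
        title="Page A",
        text="hello web",
        type="text/html",
        length="9",
        modif="1998-01-01",
    ),
    # Optional information is missing here, so those fields stay blank.
    Document(url="http://example.org/b.html", text="no title here"),
]


def select(documents, predicate):
    """Hypothetical helper: keep the documents for which predicate is true."""
    return [d for d in documents if predicate(d)]


# Roughly: SELECT * FROM Document WHERE title = ''
untitled = select(docs, lambda d: d.title == "")
```

Here `untitled` holds only the second sample document, since it is the only one whose title was left blank.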