One of the difficulties in building an SQL-like query language for the Web is the absence of a database schema for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual (66) from the implicit structure of these files. Thus, at the highest level of (67), every such document is identified by its Uniform Resource Locator (URL) and has a title and a text. Also, Web servers provide some additional information, such as the type, length, and last modification date of a document. So, for data mining purposes, we can consider the set of all HTML documents as a relation:

Document (url, (68), text, type, length, modify)

where all the (69) are character strings. In this framework, an individual document is identified with a (70) in this relation. Of course, if some optional information is missing from the HTML document, the associated fields will be left blank, but this is not uncommon in any database.
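
As a rough illustration of the relation described above, the following sketch stores a few documents in an in-memory SQLite table and runs one SQL query over them. It is only one possible reading of the passage: the column named title stands in for the blanked second attribute (an assumption based on the earlier mention of a title), the helper names and sample rows are hypothetical, and every column is declared as a character string, as the passage states.

```python
import sqlite3


def build_document_relation(rows):
    """Create an in-memory Document table and load the given rows.

    Column names mirror the relation in the passage. The second column,
    'title', is an assumption standing in for the blanked attribute.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Document (
            url    TEXT PRIMARY KEY,
            title  TEXT,
            text   TEXT,
            type   TEXT,
            length TEXT,
            modify TEXT
        )
        """
    )
    conn.executemany("INSERT INTO Document VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def urls_with_blank_title(conn):
    """Return the URLs of documents whose optional title field was left blank."""
    cursor = conn.execute(
        "SELECT url FROM Document WHERE title = '' ORDER BY url"
    )
    return [row[0] for row in cursor]


# Made-up sample data: the second document lacks a title, length, and date,
# so those fields are stored as empty strings.
sample_rows = [
    ("http://example.org/a.html", "Page A", "Hello Web", "text/html", "1024", "1996-05-01"),
    ("http://example.org/b.html", "", "No title here", "text/html", "", ""),
]

conn = build_document_relation(sample_rows)
untitled = urls_with_blank_title(conn)
total = conn.execute("SELECT COUNT(*) FROM Document").fetchone()[0]
```

Using the URL as the primary key reflects the statement that each document is identified by its URL; storing missing optional information as empty strings is one way to model the blank fields the passage mentions.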