MIT OpenCourseWare

http://ocw.mit.edu

20.453J / 2.771J / HST.958J Biomedical Information Technology

Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Relational Databases for Querying XML Documents:
Limitations and Opportunities

Jayavel Shanmugasundaram Kristin Tufte Gang He

Chun Zhang David DeWitt Jeffrey Naughton
Department of Computer Sciences
University of Wisconsin-Madison
{jai, tufte, czhang, dewitt, naughton}@cs.wisc.edu, [email protected]

Abstract

XML is fast emerging as the dominant standard for representing data in
the World Wide Web. Sophisticated query engines that allow users to
effectively tap the data stored in XML documents will be crucial to
exploiting the full power of XML. While there has been a great deal of
activity recently proposing new semi-structured data models and query
languages for this purpose, this paper explores the more conservative
approach of using traditional relational database engines for processing
XML documents conforming to Document Type Descriptors (DTDs). To this
end, we have developed algorithms and implemented a prototype system
that converts XML documents to relational tuples, translates
semi-structured queries over XML documents to SQL queries over tables,
and converts the results to XML. We have qualitatively evaluated this
approach using several real DTDs drawn from diverse domains. It turns
out that the relational approach can handle most (but not all) of the
semantics of semi-structured queries over XML data, but is likely to be
effective only in some cases. We identify the causes for these
limitations and propose certain extensions to the relational model that
would make it more appropriate for processing queries over XML
documents.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct
commercial advantage, the VLDB copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Very Large Data Base Endowment. To copy otherwise, or
to republish, requires a fee and/or special permission from the
Endowment.
Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

1. Introduction

Extensible Markup Language (XML) is fast emerging as the dominant
standard for representing data on the Internet. Like HTML, XML is a
subset of SGML. However, whereas HTML tags serve the primary purpose of
describing how to display a data item, XML tags describe the data
itself. The importance of this simple distinction cannot be
overestimated: because XML data is self-describing, it is possible for
programs to interpret the data. This means that a program receiving an
XML document can interpret it in multiple ways, can filter the document
based upon its content, can restructure it to suit the application’s
needs, and so forth.

The initial impetus for XML may have been primarily to enhance this
ability of remote applications to interpret and operate on documents
fetched over the Internet. However, from a database point of view, XML
raises a different exciting possibility: with data stored in XML
documents, it should be possible to query the contents of these
documents. One should be able to issue queries over sets of XML
documents to extract, synthesize, and analyze their contents. But what
is the best way to provide this query capability over XML documents?

At first glance the answer is obvious. Since an XML document is an
example of a semi-structured data set (it is tree-structured, with each
node in the tree described by a label), why not use semi-structured
query languages and query evaluation techniques? This is indeed a viable
approach, and there is considerable activity in the semi-structured data
community focused upon exploiting this approach [5,14]. While
semi-structured techniques will clearly work, in this paper we ask the
question of whether this is the only or the best approach to take. The
downside of using semi-structured techniques is that this approach turns
its back on 20 years of work invested in relational
database technology. Is it really the case that we cannot use relational
technology, and must start afresh with new techniques? Or can we
leverage relational technology to provide query capability over XML
documents?

In this paper we demonstrate that it is indeed possible to use standard
commercial relational database systems to evaluate powerful queries over
XML documents. The key that makes this possible is the existence of
Document Type Descriptors (DTDs) [2] (or an equivalent, such as DCDs [4]
or XML Schemas [16]). A DTD is in effect a schema for a set of XML
documents. Without DTDs or their equivalent, XML will never reach its
full potential, because a tagged document is not very useful without
some agreement among inter-operating applications as to what the tags
mean. Put another way, the reason the Internet community is so excited
about XML is that there is the vision of a future in which the vast
majority of files on the web are XML files conforming to DTDs. An
application encountering such a file can interpret the file by
consulting the DTDs to which the document conforms.

Our approach to querying XML documents is the following. First, we
process a DTD to generate a relational schema. Second, we parse XML
documents conforming to DTDs and load them into tuples of relational
tables in a standard commercial DBMS (in our case, IBM DB2). Third, we
translate semi-structured queries (specified in a language similar to
XML-QL [9] or Lorel [1]) over XML documents into SQL queries over the
corresponding relational data. Finally, we convert the results back to
XML.

The good news is that this works. A main contribution of this paper is
the description of an approach that enables one to take the XML queries,
data sets, and schemas so foreign to the relational world and process
them in relational systems without any manual intervention. This means
that we are presented with a large opportunity: all of the power of
relational database systems can be brought to bear upon the XML query
problem.

However, the fact that something is possible does not necessarily imply
that it is a good idea. Our experience with implementing this system and
using it with over 30 different XML DTDs has revealed that there are a
number of limitations in current relational database systems that in
some instances make using relational technology for XML queries either
awkward or inefficient. Relational technology proves awkward for queries
that require complex XML constructs in their results, and may be
inefficient when fragmentation due to the handling of set-valued
attributes and sharing causes too many joins in the evaluation of simple
queries. Another contribution of this paper is the identification of
those limitations, and a discussion of how they might be removed. It is
an open question at this point whether the best approach is to start
with relational technology and try to remove those limitations, or to
start with a semi-structured system and try to add the power and
sophistication currently found in relational query processing systems.

1.1 Related Work

There has been a lot of work developing special purpose query engines
for semi-structured data [5,14]. Many of the abstracts submitted to the
XML query languages workshop use this approach [18]. Our goal in this
paper, however, is to investigate the use of relational database systems
to process queries on semi-structured documents. In this sense, our work
is similar to the work on STORED [10]. However, our approach differs in
important ways. The STORED approach uses a combination of relational and
semi-structured techniques to process any semi-structured documents. We
begin with the assumption that the document conforms to a schema and
store the document entirely within the relational system. Further, we
handle recursive queries, address the issue of constructing the result
in XML and evaluate the relational approach using real DTDs.

Oracle 8i provides some basic support for querying XML documents using a
relational engine [17]. However, the translation from document schemas
to relational schemas is manual and not automatic as in our approach. In
addition, Oracle 8i does not provide support for semi-structured queries
over XML documents and provides only primitive support for converting
results to XML.

There has also been work on processing SGML data using an OODBMS [6].
The conclusion was that this is feasible with some extensions to OO
query languages. Our work considers a more restricted set of documents
(XML, rather than SGML) and considers mapping to the relational model,
rather than a general OO model.

Our method of eliminating wild cards and alternations in path expression
queries to enable processing by a relational engine bears some
similarities to the work on compile time optimization of path
expressions in semi-structured query engines [12,15]. Our different
focus, however, results in modified techniques.

1.2 Roadmap

The rest of the paper is organized as follows. Section 2 gives an
overview of XML documents, schemas and query languages. The algorithms
for translating DTDs and XML documents to a relational format and an
evaluation of the algorithms using real DTDs are given in Section 3.
Section 4 describes the translation of queries over XML documents to SQL
queries. Section 5 deals with the conversion of the results to XML.
Section 6 concludes by proposing extensions to the relational model that
will make it more suitable for processing XML documents.

2. Overview of XML, XML Schemas and XML Query Languages

In this section, we give a very brief overview of XML, XML schemas and
XML query languages. Further details can be obtained from the
references.

2.1 Extensible Markup Language

Extensible Markup Language (XML) is a hierarchical data format for
information exchange in the World Wide Web. An XML document consists of
nested element structures, starting with a root element. Element data
can be in the form of attributes or sub-elements. Figure 1 shows an XML
document that contains information about a book. In this example, there
is a book element that has two sub-elements, booktitle and author. The
author element has an id attribute with value “dawkins” and is further
nested to provide name and address information. Further information on
XML can be found in [3,8].

<book>
  <booktitle> The Selfish Gene </booktitle>
  <author id="dawkins">
    <name>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </name>
    <address>
      <city> Timbuktu </city>
      <zip> 99999 </zip>
    </address>
  </author>
</book>
Figure 1

<!ELEMENT book (booktitle, author)>
<!ELEMENT article (title, author*, contactauthor)>
<!ELEMENT contactauthor EMPTY>
<!ATTLIST contactauthor authorID IDREF #IMPLIED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (monograph*)>
<!ATTLIST editor name CDATA #REQUIRED>
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>
Figure 2

2.2 DTDs and other XML Schemas

Document Type Descriptors (DTDs) [2] describe the structure of XML
documents and are like a schema for XML documents. A DTD specifies the
structure of an XML element by specifying the names of its sub-elements
and attributes. Sub-element structure is specified using the operators *
(set with zero or more elements), + (set with one or more elements), ?
(optional), and | (or). All values are assumed to be string values,
unless the type is ANY, in which case the value can be an arbitrary XML
fragment. There is a special attribute, id, which can occur once for
each element. The id attribute uniquely identifies an element within a
document and can be referenced through an IDREF field in another
element. IDREFs are untyped. Finally, there is no concept of a root of a
DTD: an XML document conforming to a DTD can be rooted at any element
specified in the DTD. Figure 2 shows a DTD specification, while Figure 1
gives an XML document that conforms to this DTD.

Document Content Descriptors (DCDs) [4] and XML Schemas [16] are
extensions to DTDs. For our purposes, the main difference between these
and DTDs is that they allow typing of values and set size specification.
If DCDs and XML Schemas become standard, the additional information
would aid in our translation process; for example, we could create
tables with integer attributes where appropriate instead of using just
strings. The types in the current DCD proposal are compatible with types
supported by current relational systems. More complex types will require
object-relational extensions.

2.3 XML Query Languages

SELECT X.author.lastname
FROM book X
WHERE X.booktitle = "The Selfish Gene"
Figure 3

WHERE <book>
        <booktitle> The Selfish Gene </booktitle>
        <author>
          <lastname> $l </lastname>
        </>
      </> IN a.xml, b.xml
CONSTRUCT <lastname> $l </lastname>
Figure 4

There are many semi-structured query languages that can be used to query
XML documents, including XML-QL [9], Lorel [1], UnQL [5] and XQL (from
Microsoft). All these query languages have a notion of path expressions
for navigating the nested structure of XML. XML-QL uses a nested
XML-like structure to specify the part of a document to be selected and
the structure of the result XML document.

Figure 4 shows an XML-QL query to determine the last name of an author
of a book having title “The Selfish Gene”, specified over a set of XML
documents conforming to the DTD in Figure 2. The last names thus
selected will be nested within a lastname tag, as specified in the
construct clause of the query. Lorel is more like SQL and its
representation of the same query is shown in Figure 3. In this paper, we
use a combination of XML-QL and Lorel (modified appropriately for our
purposes).
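
As a rough illustration of what the queries in Figures 3 and 4 ask for, the following Python sketch evaluates the same request directly over the document of Figure 1 using only the standard library. It is not the prototype described in this paper, and it is neither an XML-QL nor a Lorel engine; the function name last_names_for_title is hypothetical, and the path author//lastname is an assumption chosen so that lastname is found even though the DTD nests it under name.

```python
import xml.etree.ElementTree as ET

# The document from Figure 1.
FIGURE_1 = """
<book>
  <booktitle> The Selfish Gene </booktitle>
  <author id="dawkins">
    <name>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </name>
    <address>
      <city> Timbuktu </city>
      <zip> 99999 </zip>
    </address>
  </author>
</book>
"""


def last_names_for_title(documents, title):
    """Return new <lastname> elements for authors of books with this title."""
    results = []
    for text in documents:
        root = ET.fromstring(text)
        # A document may be rooted at any element; only book roots qualify.
        if root.tag != "book":
            continue
        if root.findtext("booktitle", default="").strip() != title:
            continue
        # Look for lastname at any depth below author.
        for node in root.findall("author//lastname"):
            out = ET.Element("lastname")  # mirrors the CONSTRUCT clause
            out.text = (node.text or "").strip()
            results.append(out)
    return results


matches = last_names_for_title([FIGURE_1], "The Selfish Gene")
for element in matches:
    print(ET.tostring(element, encoding="unicode"))
```

The loop over documents plays the role of the IN clause of Figure 4, and the freshly built lastname element plays the role of its CONSTRUCT clause.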

3. Storing XML Documents in a Relational Database System

In this section, we describe how to generate relational schemas from XML
DTDs. The main issues that must be addressed include (a) dealing with
the complexity of DTD element specifications, (b) resolving the conflict
between the two-level nature of relational schemas (table and attribute)
vs. the arbitrary nesting of XML DTD schemas, and (c) dealing with
set-valued attributes and recursion.

3.1 Simplifying DTDs

In general, DTDs can be complex and generating relational schemas that
capture this complexity would be unwieldy at best. Fortunately, one can
simplify the details of a DTD and still generate a relational schema
that can store and query documents conforming to that DTD. Note that it
is not necessary to be able to regenerate a DTD from the generated
relational schema. Rather, what is required is that (a) any document
conforming to the DTD can be stored in the relational schema, and (b)
any XML semi-structured query over a document conforming to the DTD can
be evaluated over the relational database instance.

Most of the complexity of DTDs stems from the complex specification of
the type of an element. For instance, we could specify an element a as
<!ELEMENT a ((b|c|e)?,(e?|(f?,(b,b)*))*)>, where b, c, e and f are other
elements. However, at the query language level, all that matters is the
position of an element in the XML document, relative to its siblings and
the parent-child relationship between elements in the XML document. We
now propose a set of transformations that can be used to “simplify” any
arbitrary DTD without undermining the effectiveness of queries over
documents conforming to that DTD. These transformations are a superset
of similar transformations presented in [10].

(e1, e2)* → e1*, e2*
(e1, e2)? → e1?, e2?
(e1|e2)   → e1?, e2?
Figure 5

e1** → e1*
e1*? → e1*
e1?* → e1*
e1?? → e1?
Figure 6

..., a*, ..., a*, ... → a*, ...
..., a*, ..., a?, ... → a*, ...
..., a?, ..., a*, ... → a*, ...
..., a?, ..., a?, ... → a*, ...
..., a, ..., a, ...   → a*, ...
Figure 7

The transformations are of three types: (a) flattening transformations,
which convert a nested definition into a flat representation (i.e., one
in which the binary operators “,” and “|” do not appear inside any
operator; see Figure 5), (b) simplification transformations, which
reduce many unary operators to a single unary operator (Figure 6), and
(c) grouping transformations that group sub-elements having the same
name (for example, two a* sub-elements are grouped into one a*
sub-element; see Figure 7). In addition, all “+” operators are
transformed to “*” operators. Our example specification would be
transformed to: <!ELEMENT a (b*, c?, e*, f*)>.

The transformations preserve the semantics of (a) one or many and (b)
null or not null. The astute reader may notice that we have lost some
information about relative orders of the elements. This is true;
fortunately, this information can be captured when a specific XML
document is loaded into this relational schema (e.g., by position fields
in the tuples representing some of the elements). We now explore
techniques for converting a simplified DTD to a relational schema.

3.2 Motivation for Special Schema Conversion Techniques

Traditionally, relational schemas have been derived from a data model
such as the Entity-Relationship model. This translation is
straightforward because there is a clear separation between entities and
their attributes. Each entity and its attributes are mapped to a
relation.

When converting an XML DTD to relations, it is tempting to map each
element in the DTD to a relation and map the attributes of the element
to attributes of the relation. However, there is no correspondence
between elements and attributes of DTDs and entities and attributes of
the ER-Model. What would be considered “attributes” in an ER-Model are
often most naturally represented as elements in a DTD. Figure 2 shows a
DTD that illustrates this point. In an ER-Model, author would be an
“entity” and firstname, lastname and address would be attributes of that
entity. In designing a DTD, there is no incentive to make author an
element and firstname, lastname and address attributes. In fact, in XML,
if firstname and lastname were attributes, they could not be nested
under name because XML attributes cannot have a nested structure.
Directly mapping elements to relations is thus likely to lead to
excessive fragmentation of the document.

3.3 The Basic Inlining Technique

The Basic Inlining Technique, hereafter referred to as Basic, solves the
fragmentation problem by inlining as many descendants of an element as
possible into a single relation. However, Basic creates relations for
every element because an XML document can be rooted at any element in a
DTD. For example, the author element in Figure 2 would be mapped to a
relation with attributes firstname, lastname and address. In addition,
relations would be created for firstname, lastname and address.

We must address two complications: set-valued attributes and recursion.
In the example DTD in Figure 2, when creating a relation for article, we
cannot inline the set of authors because the traditional relational
model
does not support set-valued attributes. Rather, we follow the standard
technique for storing sets in an RDBMS and create a relation for author
and link authors to articles using a foreign key. Just using inlining
(if we want the process to terminate) necessarily limits the level of
nesting in the recursion. Therefore, we express the recursive
relationship using the notion of relational keys and use relational
recursive processing to retrieve the relationship. In order to do this
in a general fashion, we introduce the notion of a DTD graph.

[Figure 8: DTD graph for the DTD in Figure 2. Node labels: book,
article, monograph, booktitle, title, contactauthor, editor, authorID,
author, name, address, authorid, firstname, lastname, and the operators
* and ?.]

[Figure 9: Element graph for the editor element. Node labels: editor,
*, name, monograph, title, author, name, address, authorid, ?,
firstname, lastname.]

A DTD graph represents the structure of a DTD. Its nodes are elements,
attributes and operators in the DTD. Each element appears exactly once
in the graph, while attributes and operators appear as many times as
they appear in the DTD. The DTD graph corresponding to the DTD in Figure
2 is given in Figure 8. Cycles in the DTD graph indicate the presence of
recursion.

The schema created for a DTD is the union of the sets of relations
created for each element. In order to determine the set of relations to
be created for a particular element, we create a graph structure called
the element graph. The element graph is constructed as follows.

Do a depth-first traversal of the DTD graph, starting at the element
node for which we are constructing relations. Each node is marked as
“visited” the first time it is reached and is unmarked once all its
children have been traversed.

If an unmarked node in the DTD graph is reached during depth-first
traversal, a new node bearing the same name is created in the element
graph. In addition, a regular edge is created from the most recently
created node in the element graph with the same name as the DFS parent
of the current DTD node to the newly created node.

If an attempt is made to traverse an already marked DTD node, then a
backpointer edge is added from the most recently created node in the
element graph to the most recently created node in the element graph
with the same name as the marked DTD node.

The element graph for the editor element in the DTD graph in Figure 8 is
shown in Figure 9. Intuitively, the element graph expands the relevant
part of the DTD graph into a tree structure.

Given an element graph, relations are created as follows. A relation is
created for the root element of the graph. All the element’s descendants
are inlined into that relation with the following two exceptions: (a)
children directly below a “*” node are made into separate relations
(this corresponds to creating a new relation for a set-valued child);
and (b) each node having a backpointer edge pointing to it is made into
a separate relation (this corresponds to creating a new relation to
handle recursion).

Figure 10 shows the relational schema that would be generated for the
DTD in Figure 2. There are several features to note in the schema.
Attributes in the relations are named by the path from the root element
of the relation. Each relation has an ID field that serves as the key of
that relation. All relations corresponding to element nodes having a
parent also have a parentID field that serves as a foreign key. For
instance, the article.author relation has a foreign key
article.author.parentID that joins authors with articles.

The XML document in Figure 1 would be converted to the following tuple
in the book relation:

(1, The Selfish Gene, Richard, Dawkins,
<city>Timbuktu</city><zip>99999</zip>, dawkins)

The ANY field, address, is stored as an uninterpreted string; thus the
nested structure is not visible to the database system without further
support for XML (see Section 6). Note that if the author Richard Dawkins
has authored many books, then the author information will be replicated
for each book because it is replicated in the corresponding XML
documents.

While Basic is good for certain types of queries, such as “list all
authors of books”, it is likely to be grossly inefficient for other
queries. For example, queries such as “list all authors having first
name Jack” will have to be executed as the union of 5 separate queries.
Another disadvantage of Basic is the large number of relations it
creates. Our next technique attempts to resolve these problems.

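
The conversion of the Figure 1 document into the book tuple given in Section 3.3 can be sketched in Python as follows. This is a minimal illustration using the standard library, not the loader of the prototype; the helper names text_at, inner_xml and book_row are hypothetical, and whitespace around text values is stripped here so that the result matches the tuple shown in the text.

```python
import xml.etree.ElementTree as ET

# The document from Figure 1.
FIGURE_1 = """
<book>
  <booktitle> The Selfish Gene </booktitle>
  <author id="dawkins">
    <name>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </name>
    <address>
      <city> Timbuktu </city>
      <zip> 99999 </zip>
    </address>
  </author>
</book>
"""


def text_at(element, path):
    """Return the stripped text at a path, or None if the path is absent."""
    value = element.findtext(path)
    return value.strip() if value is not None else None


def inner_xml(element):
    """Serialize an element's children as one uninterpreted string.

    Surrounding whitespace is removed in place (an assumption made here
    only to reproduce the compact form used in the text).
    """
    parts = []
    for child in element:
        for node in child.iter():
            node.text = (node.text or "").strip()
            node.tail = None
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def book_row(book_id, document):
    """Flatten a book document into the column order of the book relation."""
    book = ET.fromstring(document)
    author = book.find("author")
    return (
        book_id,                             # bookID
        text_at(book, "booktitle"),          # book.booktitle
        text_at(author, "name/firstname"),   # optional (firstname?), may be None
        text_at(author, "name/lastname"),    # book.author.name.lastname
        inner_xml(author.find("address")),   # ANY content kept as a string
        author.get("id"),                    # author.authorid
    )


row = book_row(1, FIGURE_1)
print(row)
```

Each column corresponds to one path from the book root, which is why the nested name element disappears from the tuple while its leaves survive as separate columns.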
book (bookID: integer, book.booktitle : string, book.author.name.firstname: string, book.author.name.lastname: string,
book.author.address: string, author.authorid: string)
booktitle (booktitleID: integer, booktitle: string)
article (articleID: integer, article.contactauthor.authorid: string, article.title: string)
article.author (article.authorID: integer, article.author.parentID: integer, article.author.name.firstname: string,
article.author.name.lastname: string, article.author.address: string, article.author.authorid: string)
contactauthor (contactauthorID: integer, contactauthor.authorid: string)
title (titleID: integer, title: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.title: string, monograph.editor.name: string,
monograph.author.name.firstname: string, monograph.author.name.lastname: string,
monograph.author.address: string, monograph.author.authorid: string)
editor (editorID: integer, editor.parentID: integer, editor.name: string)
editor.monograph (editor.monographID: integer, editor.monograph.parentID: integer, editor.monograph.title: string,
editor.monograph.author.name.firstname: string, editor.monograph.author.name.lastname: string,
editor.monograph.author.address: string, editor.monograph.author.authorid: string)
author (authorID: integer, author.name.firstname: string, author.name.lastname: string, author.address: string,
author.authorid: string)
name (nameID: integer, name.firstname: string, name.lastname: string)
firstname (firstnameID: integer, firstname: string)
lastname (lastnameID: integer, lastname: string)
address (addressID: integer, address: string)
Figure 10
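
Five relations in Figure 10 carry author names (book, article.author, monograph, editor.monograph and author), which is why a selection on an author's first name becomes a five-way union under Basic. The sketch below shows that union with Python's sqlite3 module. It is an illustration only: the prototype used IBM DB2, the dots in the relation and column names of Figure 10 are replaced by underscores so that no identifier quoting is needed, only the name columns are created, and the sample rows are invented for the example.

```python
import sqlite3

# (relation, firstname column, lastname column) for the five relations of
# Figure 10 that store author names, with "." replaced by "_".
AUTHOR_COLUMNS = [
    ("book", "book_author_name_firstname", "book_author_name_lastname"),
    ("article_author", "article_author_name_firstname",
     "article_author_name_lastname"),
    ("monograph", "monograph_author_name_firstname",
     "monograph_author_name_lastname"),
    ("editor_monograph", "editor_monograph_author_name_firstname",
     "editor_monograph_author_name_lastname"),
    ("author", "author_name_firstname", "author_name_lastname"),
]
COLUMNS = {table: (first, last) for table, first, last in AUTHOR_COLUMNS}


def create_tables(conn):
    """Create a cut-down version of each relation (ID and name columns only)."""
    for table, first, last in AUTHOR_COLUMNS:
        conn.execute(
            f"CREATE TABLE {table} "
            f"(id INTEGER PRIMARY KEY, {first} TEXT, {last} TEXT)"
        )


def add_author(conn, table, firstname, lastname):
    first, last = COLUMNS[table]
    conn.execute(
        f"INSERT INTO {table} ({first}, {last}) VALUES (?, ?)",
        (firstname, lastname),
    )


def authors_named(conn, firstname):
    """Build and run one SELECT per relation, combined with UNION."""
    branches = [
        f"SELECT {first} AS firstname, {last} AS lastname "
        f"FROM {table} WHERE {first} = ?"
        for table, first, last in AUTHOR_COLUMNS
    ]
    sql = " UNION ".join(branches) + " ORDER BY lastname"
    return conn.execute(sql, [firstname] * len(branches)).fetchall()


conn = sqlite3.connect(":memory:")
create_tables(conn)
add_author(conn, "book", "Jack", "London")            # invented sample rows
add_author(conn, "monograph", "Jack", "London")
add_author(conn, "article_author", "Jack", "Kerouac")
add_author(conn, "author", "Richard", "Dawkins")

jacks = authors_named(conn, "Jack")
print(jacks)
```

Because UNION removes duplicates, an author replicated across several relations appears once in the result, but every one of the five relations still has to be scanned.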

book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle : string)

article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer,
monograph.editor.isroot: boolean, monograph.editor.name: string)
title (titleID: integer, title.parentID: integer, title.parentCODE: integer, title: string)
author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean,
author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean,
author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)

Figure 11
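
The choice of relations in Figure 11 follows from the in-degree rule that Section 3.4 describes for Shared. A minimal Python sketch of that rule is given here under stated assumptions: the dictionary DTD_GRAPH is a hand-written simplification of the DTD graph for Figure 2 in which attribute nodes are omitted and each operator is folded into the edge it sits on, the function name shared_relations is hypothetical, and the additional rule for mutually recursive elements of in-degree one is left out because monograph is already selected by the “*” rule in this DTD.

```python
from collections import Counter

# Simplified DTD graph for Figure 2: parent -> [(child, operator)].
# "" means exactly one occurrence; attribute nodes are omitted.
DTD_GRAPH = {
    "book": [("booktitle", ""), ("author", "")],
    "article": [("title", ""), ("author", "*"), ("contactauthor", "")],
    "monograph": [("title", ""), ("author", ""), ("editor", "")],
    "editor": [("monograph", "*")],
    "author": [("name", ""), ("address", "")],
    "name": [("firstname", "?"), ("lastname", "")],
    "booktitle": [],
    "title": [],
    "contactauthor": [],
    "address": [],
    "firstname": [],
    "lastname": [],
}


def shared_relations(graph):
    """Return the elements that get their own relation under this rule."""
    # Number of incoming edges per element node.
    in_degree = Counter(
        child for children in graph.values() for child, _ in children
    )
    # Elements that sit directly below a "*" operator.
    below_star = {
        child
        for children in graph.values()
        for child, operator in children
        if operator == "*"
    }
    relations = set()
    for element in graph:
        degree = in_degree[element]
        # In-degree 0: not reachable from any other node.
        # In-degree > 1: shared by several parents.
        # Below "*": set-valued child.
        if degree == 0 or degree > 1 or element in below_star:
            relations.add(element)
    return relations


relations = shared_relations(DTD_GRAPH)
print(sorted(relations))
```

For this graph the rule selects book, article, monograph, title and author, the same five relations that appear in Figure 11; every other element is inlined into one of them.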

3.4 The Shared Inlining Technique

The Shared Inlining Technique, hereafter referred to as Shared, attempts
to avoid the drawbacks of Basic by ensuring that an element node is
represented in exactly one relation. The principal idea behind Shared is
to identify the element nodes that are represented in multiple relations
in Basic (such as the firstname, lastname and address elements in the
example) and to share them by creating separate relations for these
elements.

We must first decide what relations to create. In Shared, relations are
created for all elements in the DTD graph whose nodes have an in-degree
greater than one. These are precisely the nodes that are represented as
multiple relations in Basic. Nodes with an in-degree of one are inlined.
Element nodes having an in-degree of zero are also made separate
relations, because they are not reachable from any other node. As in
Basic, elements below a “*” node are made into separate relations.
Finally, of the mutually recursive elements all having in-degree one
(such as monograph and editor in Figure 8), one of them is made a
separate relation. We can find such mutually recursive elements by
looking for strongly connected components in the DTD graph.

Once we decide which element nodes are to be made into separate
relations, it is relatively easy to construct the relational schema.
Each element node X that is a separate relation inlines all the nodes Y
that are reachable from it such that the path from X to Y does not
contain a node (other than X) that is to be made a separate relation.
Figure 11 shows the schema derived from the DTD graph of Figure 8. One
striking feature is the small number of relations compared to the Basic
schema (Figure 10).

Inlining an element X into a relation corresponding to another element Y
creates problems when an XML document is rooted at the element X. To
facilitate queries on such elements we make use of isRoot fields.

The element sharing in Shared has query processing implications. For
example, a selection query over all authors accesses only one relation
in Shared compared to five relations in Basic. Despite the fact that
Shared addresses some of the shortcomings and shares some of
book (bookID: integer, book.booktitle.isroot: boolean, book.booktitle : string, author.name.firstname: string,
author.name.lastname: string, author.address: string, author.authorid: string)
article (articleID: integer, article.contactauthor.isroot: boolean, article.contactauthor.authorid: string,
article.title.isroot: boolean, article.title: string)
monograph (monographID: integer, monograph.parentID: integer, monograph.parentCODE: integer,
monograph.title: string, monograph.editor.isroot: boolean, monograph.editor.name: string,
author.name.firstname: string, author.name.lastname: string, author.address: string, author.authorid: string)
author (authorID: integer, author.parentID: integer, author.parentCODE: integer, author.name.isroot: boolean,
author.name.firstname.isroot: boolean, author.name.firstname: string, author.name.lastname.isroot: boolean,
author.name.lastname: string, author.address.isroot: boolean, author.address: string, author.authorid: string)

Figure 12

the strengths of Basic, Basic performs better in one about path expressions because we use a relational
important respect – reducing the number of joins starting at a particular element node. Thus we explore a hybrid approach that combines the join reduction properties of Basic with the sharing features of Shared.

3.5 The Hybrid Inlining Technique

The Hybrid Inlining Technique, or Hybrid, is the same as Shared except that it inlines some elements that are not inlined in Shared. In particular, Hybrid additionally inlines elements with in-degree greater than one that are not recursive or reached through a “*” node. Set sub-elements and recursive elements are treated as in Shared. Figure 12 shows the relational schema generated using this hybrid approach. Note how this schema combines features of both Basic and Shared – author is inlined with book and monograph even though it is shared, while monograph and editor are represented exactly once.

So far, we have implicitly assumed that the data model is unordered, i.e., the position of an element does not matter. Order could, however, be easily incorporated into our framework by storing a position field for each element.

3.6 A Qualitative Evaluation of the Basic, Shared and Hybrid Techniques

In this section we qualitatively evaluate our relation-conversion algorithms using 37 DTDs available from Robin Cover's SGML/XML Web page [8]. We did not pose any criterion for selecting DTDs except for availability for easy download and validity. Some DTDs were excluded because they did not pass our XML parser, the IBM alphaWorks xml4j.

3.6.1 Evaluation Metric

Our major concern in evaluating the algorithms is the efficiency of query processing. Our metric is the average number of SQL joins required to process path expressions of a certain length N. We use this metric because path expressions are at the heart of query languages proposed for semi-structured data. We are particularly concerned with this metric because the target is a relational database which uses joins to process path expressions.

This subsection logically contains “forward references” to Section 4, in which we describe how SQL queries are generated from semi-structured XML queries. However, the only point from Section 4 that is necessary to understand the results here is that a single semi-structured query could give rise to a union of several SQL queries, and that each of these queries may contain some number of joins. The use of Basic vs. Shared vs. Hybrid determines how many queries are generated, and how many joins are found in each query. Although Basic and Hybrid reduce the number of joins per SQL query, their higher degree of inlining could cause more SQL queries to be generated. For each algorithm, each DTD, and a variable number of path lengths, we make the following measurements:

• The average number of SQL queries generated for path expressions of length N.
• The average number of joins in each SQL query for path expressions of length N.
• The total average number of joins in order to process path expressions of length N (the product of the two previous measurements).

In Sections 3.6.2 and 3.6.3, we assume that path expressions start from an arbitrary element in the DTD. We relax this assumption in Section 3.6.4.

3.6.2 Evaluation Results for Expression Paths of Length 3

In this section we show the results for path expressions of length 3, which is the longest path length applicable to all 37 DTDs. We shall examine the results for other path lengths in the next section. In the interest of space, we show the results only for a subset of the DTDs and summarize the others.

First we consider whether the Basic approach is practical. For 11 of our 37 DTDs, Basic did not run to completion because it ran out of virtual memory. The reason for this is that Basic generates huge numbers of relations if DTDs have large strongly connected components. We can see this effect clearly on some of
the DTDs that Basic did run to completion. One 19-node DTD has a strongly connected component (SCC) of size 4, and Basic creates 204 times as many relations for it as Hybrid does, totalling 3462 relations. Due to this severe limitation of Basic, we concentrate on the comparisons between Shared and Hybrid.

Figure 13: Joins per SQL query for Shared and Hybrid (bar chart over 10 DTDs; y-axis: Joins/Query)

Figure 14: Number of SQL queries for Shared and Hybrid (bar chart over 10 DTDs; y-axis: Queries)

Figure 15: Total joins for Shared and Hybrid (bar chart over 10 DTDs; y-axis: Total Joins)

Figures 13, 14 and 15 show results for 10 of the DTDs. As shown in Figure 13, Hybrid eliminates a large number of joins for some DTDs, whereas for others, Hybrid and Shared produce about the same number of joins. Figure 14 shows that for some DTDs, querying over path expressions of length 3 using Hybrid requires more SQL queries than using Shared, while for other DTDs, the number of SQL queries is the same. Note that for any path expression, Shared always produces at least as many joins per SQL query as Hybrid, and Hybrid always produces at least as many SQL queries as Shared. Figure 15 shows the total number of joins.

Using the average total number of joins required to process path expressions of length 3, we can roughly categorize the 37 DTDs into four groups:

Group 1. DTDs for which Hybrid reduces a large percentage of joins per SQL query but incurs a smaller increase in the number of SQL queries. The net result is that Hybrid requires fewer joins than Shared. Example: DTD “ofx1516”.

Group 2. DTDs for which Hybrid reduces a large percentage of joins per SQL query and incurs a comparable increase in the number of SQL queries. The total number of joins is about the same. Example: DTD “vrml”.

Group 3. DTDs for which Hybrid reduces some joins per SQL query, but not enough to offset the increase in the number of SQL queries; therefore Hybrid generates more joins for a path expression than Shared. Example: DTD “saej”.

Group 4. DTDs for which both Shared and Hybrid produce about the same number of joins per SQL query, and about the same number of SQL queries, resulting in approximately the same total number of joins. Example: DTD “math”.

Hybrid inlines more than Shared in Groups 1, 2 and 3. This reduces the number of joins per SQL query but increases the number of SQL queries. The net increase or decrease in the total number of joins depends on the structure of the DTD. In Group 4, most of the shared nodes are either set nodes or involved in recursion. Since Shared and Hybrid treat set nodes and recursive nodes identically, there is no significant difference in their performance in Group 4.

            Group 1   Group 2   Group 3   Group 4
Num DTDs       13         2         6        16

The number of DTDs in each group from all 37 DTDs is summarized in the table above. We can infer that in a large number of DTDs (Group 4), most of the shared nodes are either set nodes or recursive nodes.

3.6.3 Results for Path Expressions of Other Lengths

In the previous section, we showed the results for path expressions of length 3. In order to see how the results carry over to other path lengths, let us examine how the number of joins scales with the path length. We found that for almost all the DTDs, the number of joins scales linearly with the path length; the only difference is the scaling factor, which is determined by the structure of the DTD. Furthermore, the gap between the performance of Shared and Hybrid typically widens as the path lengthens. Figure 16 and Figure 17 show the scaling for two DTDs in group 1 and group 3 respectively.
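The three measurements of Section 3.6.1 can be made concrete with a short sketch. The function name `join_metrics` and the join counts below are hypothetical and are not taken from the paper's experiments; the only thing carried over from the text is the definition of the metric, in which the total average number of joins is the product of the average number of SQL queries and the average number of joins per SQL query.

```python
def join_metrics(translations):
    """Compute the three measurements for one algorithm and one path length.

    `translations` holds one entry per path expression; each entry is a list
    with the join count of every SQL query generated for that path expression.
    """
    num_paths = len(translations)
    all_queries = [joins for queries in translations for joins in queries]
    avg_queries = len(all_queries) / num_paths
    avg_joins_per_query = sum(all_queries) / len(all_queries)
    # Product of the two previous measurements, as defined in the text.
    total_joins = avg_queries * avg_joins_per_query
    return avg_queries, avg_joins_per_query, total_joins


# Hypothetical translations of three path expressions (illustrative only).
shared = [[2], [2], [1]]        # one SQL query per path, more joins each
hybrid = [[1, 1], [0], [1]]     # more SQL queries, fewer joins each

shared_metrics = join_metrics(shared)
hybrid_metrics = join_metrics(hybrid)
print("Shared:", shared_metrics)
print("Hybrid:", hybrid_metrics)
```

With these made-up inputs Hybrid issues more SQL queries but fewer joins per query, and its total is lower, which is the pattern the text describes for Group 1.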
Figure 16: Total joins vs. path length (1 to 11) for Shared and Hybrid

Figure 17: Total joins vs. path length (1 to 11) for Shared and Hybrid

3.6.4 Evaluation Using Path Expressions Starting From the Document Root

So far, we have examined the performance of our algorithms assuming path expressions start from an arbitrary node in the DTD graph. What is different if the path expressions start from the root of a document? The real difference is in the total number of joins. A path expression starting from the root of a document is always converted to one SQL query; therefore the total number of joins is equivalent to the number of joins per SQL query. Since the Hybrid algorithm never produces more joins per SQL query than Shared, it is at least as good as Shared for path expressions that start from the document root.

For DTDs in groups 3 and 4 (the majority of DTDs), both Shared and Hybrid are practically the same. The main issue is the excessive fragmentation of the DTDs that leads to the number of joins being almost equal to the length of the path expression (Figure 17). This is likely to be very inefficient in the relational model, especially for long path lengths. The main cause of this fragmentation is the presence of set sub-elements. Section 6 includes a proposed extension to alleviate this problem.

4. Converting Semi-Structured Queries to SQL

Semi-structured query languages have a lot more flexibility than SQL. In particular, they allow path expressions with various operators and wild cards. The challenge is to rewrite these queries in SQL exploiting DTD information. In this section, we consider only queries with string values as results. Queries with more complex result formats are dealt with in Section 5. For ease of exposition, we present the translation algorithm only in the context of the Shared approach. The generalization to the other approaches is straightforward.

4.1 Converting Queries with Simple Path Expressions to SQL

Consider the following XML-QL query, and an equivalent Lorel-like query, over the DTD in Figure 2 that asks for the first and last name of the author of a book with title “The Selfish Gene”. Note that we have slightly extended the XML-QL syntax to query over all documents conforming to a DTD.

    WHERE <book>
            <booktitle> The Selfish Gene </booktitle>
            <author>
              <name>
                <firstname> $f </firstname>
                <lastname> $l </lastname>
              </name>
            </author>
          </book> IN * CONFORMING TO pubs.dtd
    CONSTRUCT <result> $f $l </result>

    Select Y.name.firstname, Y.name.lastname
    From book X, X.author Y
    Where X.booktitle = “The Selfish Gene”

As can be seen from the Lorel-like representation, this query essentially consists of five path expressions, namely, book, X.author, Y.name.firstname, Y.name.lastname and X.booktitle. Of these path expressions, book is the root path expression and the others are dependent path expressions. This query is translated into SQL as follows: (a) first, the relation(s) corresponding to the start of the root path expression(s) are identified and added to the from clause of the SQL query, then (b) if necessary, the path expressions are translated to joins among relations (when elements are inlined, joins are not necessary). The SQL query generated in this fashion for the example query above is shown in Figure 18. Note that a join condition has been added to the where clause to link the book and author, and a selection (A.parentCODE = 0, where 0 indicates that the parent of the author is a book) is performed on author to make sure that only authors reached through book are considered.
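The translation just described can be tried on a toy database. The following is a minimal sketch using Python's sqlite3 module rather than the system used in the paper; the table definitions, key values, and sample rows are assumptions made for illustration, and only the column names and the query shape come from the paper's Figure 18. The value 1 for parentCODE, standing for a non-book parent, is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
-- Hypothetical Shared-style relations; only the column names used in the
-- paper's Figure 18 are taken from the source.
CREATE TABLE book (bookID INTEGER PRIMARY KEY, "book.booktitle" TEXT);
CREATE TABLE author (
    authorID INTEGER PRIMARY KEY,
    parentID INTEGER,
    parentCODE INTEGER,          -- 0: parent is a book (as in the text)
    "author.name.firstname" TEXT,
    "author.name.lastname" TEXT
);
INSERT INTO book VALUES (1, 'The Selfish Gene');
INSERT INTO book VALUES (2, 'Origin of Species');
INSERT INTO author VALUES (10, 1, 0, 'Richard', 'Dawkins');
INSERT INTO author VALUES (11, 2, 0, NULL, 'Darwin');
-- Same parentID as book 1, but the parent is assumed not to be a book.
INSERT INTO author VALUES (12, 1, 1, NULL, 'Darwin');
''')

# Step (a): the root path expression "book" contributes relation book.
# Step (b): X.author becomes a join; the name fields are inlined in author.
rows = conn.execute('''
    SELECT A."author.name.firstname", A."author.name.lastname"
    FROM author A, book B
    WHERE B.bookID = A.parentID
      AND A.parentCODE = 0
      AND B."book.booktitle" = ?
''', ("The Selfish Gene",)).fetchall()

# The same join without the parentCODE selection, for comparison.
rows_without_code = conn.execute('''
    SELECT A."author.name.lastname"
    FROM author A, book B
    WHERE B.bookID = A.parentID
      AND B."book.booktitle" = ?
''', ("The Selfish Gene",)).fetchall()

print(rows)
print(rows_without_code)
```

Dropping the parentCODE selection also returns the author whose parent shares the same ID but is not a book, which is why the selection is needed when author is stored in a shared relation.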
    Select A."author.name.firstname", A."author.name.lastname"
    From author A, book B
    Where B.bookID = A.parentID
      AND A.parentCODE = 0
      AND B."book.booktitle" = 'The Selfish Gene'

Figure 18

4.2 Converting Simple Recursive Path Expressions to SQL

Consider a query that requires the names of all editors reachable directly or indirectly from the monograph with title “Subclass Cirripedia”. The corresponding XML-QL query (and an equivalent Lorel-like query) is shown below:

    WHERE <*.monograph>
            <editor.(monograph.editor)*>
              <name> $n </name>
            </>
            <title> Subclass Cirripedia </title>
          </> IN * CONFORMING TO pubs.dtd
    CONSTRUCT <result> $n </result>

    Select Y.name
    From *.monograph X, X.editor.(monograph.editor)* Y
    Where X.title = “Subclass Cirripedia”

There are two interesting features about this query. The first is the tag “*.monograph” which states that we are interested in monographs reachable from any path. The second is the tag “editor.(monograph.editor)*” that specifies all editors reachable directly or indirectly from a monograph. The trick in converting this to a least fix-point query such as that supported by IBM DB2 is to determine (a) the initialization of the recursion and (b) the actual recursive path expression. In the example above, the initialization of the recursion is the path expression *.monograph.editor with the selection condition monograph.title = “Subclass Cirripedia” and the recursive path expression is monograph.editor. Each can be converted to a SQL fragment just like a simple path expression. The final query is the union of the two SQL fragments within a least fix-point operator. The query generated in this fashion is shown in Figure 19, in IBM DB2 syntax. Note that the “with clause” is the equivalent of the least fix-point operator in DB2.

    With Q1 (monographID, name) AS
      (Select X.monographID, X."editor.name"
       From monograph X
       Where X.title = 'Subclass Cirripedia'
       UNION ALL
       Select Z.monographID, Z."editor.name"
       From Q1 Y, monograph Z
       Where Y.monographID = Z.parentID AND
             Z.parentCODE = 0
      )
    Select A.name
    From Q1 A

Figure 19

4.3 Converting Arbitrary Path Expressions to Simple Recursive Path Expressions

In general, path expressions can be of arbitrary complexity. For example, we could have a query that asks for all the name elements reachable directly or indirectly through monograph. This would be represented in a Lorel-like language as (an equivalent query can be expressed in XML-QL):

    Select X
    From monograph.(#)*.name X

We have a general technique that takes path expressions appearing in such queries (in this example “monograph.(#)*.name”) and translates them into possibly many simple (recursive) path expressions. SQL queries are then generated for each simple recursive path expression. This notion of splitting a path expression into many simple path expressions is crucial to processing queries having arbitrary path expressions in SQL. The details of the technique are tedious and we omit them here in the interest of space.

Our technique is general enough to handle path expressions with nested recursion (e.g., “(a.(b)*.c)*”). However, relational database systems such as IBM DB2 cannot currently handle these queries because they do not have support for nested recursive queries.

5. Converting Relational Results to XML

In the previous section, we assumed that the results of a query were string values. We relax this assumption in this section and explore how the tabular results returned by SQL queries can be converted to complex structured XML documents. This is perhaps the main drawback in using current relational technology to provide XML querying – constructing arbitrary XML result sets is difficult. In this section we give some examples, using XML-QL as the illustrative query language because it provides XML structuring constructs.

5.1 Simple Structuring

Consider the query in Figure 20 that asks for the first name and last name of all the authors of books, nested appropriately. Constructing such results from a relational system is natural and efficient, since it only requires attaching the appropriate tags for each tuple (Figure 21).

5.2 Tag Variables

A tag variable is one that ranges over the value of an XML tag. Some queries requiring tag variables in their results are naturally translated to the relational model. Consider the query in Figure 22 that asks for names of authors of all publications, nested under a tag specifying the type of publication. This can be handled by generating a relational query that contains the tag value as an element of the result tuple. Then at result generation
    WHERE <book>
            <author>
              <firstname> $f </firstname>
              <lastname> $l </lastname>
            </>
          </> IN * CONFORMS TO pubs.dtd
    CONSTRUCT <author>
                <firstname> $f </firstname>
                <lastname> $l </lastname>
              </author>

Figure 20

    (Richard, Dawkins)
    (NULL, Darwin)

    <author>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </author>
    <author>
      <lastname> Darwin </lastname>
    </author>

Figure 21

    WHERE <$p>
            <author>
              <firstname> $f </firstname>
              <lastname> $l </lastname>
            </>
          </> IN * CONFORMS TO pubs.dtd
    CONSTRUCT <$p>
                <author>
                  <firstname> $f </firstname>
                  <lastname> $l </lastname>
                </author>
              </>

Figure 22

    (book, Richard, Dawkins)
    (book, NULL, Darwin)
    (monograph, NULL, Darwin)

    <book>
      <author>
        <firstname> Richard </firstname>
        <lastname> Dawkins </lastname>
      </author>
    </book>
    <book>
      <author>
        <lastname> Darwin </lastname>
      </author>
    </book>
    <monograph>
      <author>
        <lastname> Darwin </lastname>
      </author>
    </monograph>

Figure 23
time, the tag attribute in the result tuple can be converted to the appropriate XML tag (Figure 23).

5.3 Grouping

Consider the query in Figure 24 that requires all the publications of an author (assuming an author is uniquely identified by his/her last name) to be grouped together, and within this structure, requires the titles of publications to be grouped by the type of the publication. The relational result from the translation of this query will be a set of tuples having fields corresponding to last name of author, title of publication and type of publication. However, we cannot use the relational group-by operator to group by last name and type of publication because the SQL group-by semantics implies that we should apply an aggregate function to title, which does not make sense. Thus, the options are either (a) have the relational engine order the result tuples first by last name and then by type and scan the result in order to construct the XML document or (b) get an unordered set of tuples and do a grouping operation, by last name and then by type, outside the relational engine. The first approach is illustrated in Figure 25.

Figure 25 illustrates several points. The first is that treating tag variables as attributes in the result relation provides a way of uniformly treating the contents of the result XML document. In this case, we are able to group by the tag variable just like any other attribute. The second observation is that some relational database functionality (hash-based group-by) is either not fully exploited or is duplicated outside.

5.4 Complex Element Construction

Unfortunately, returning tag values as tuple attributes cannot handle all result construction problems. In particular, queries that are required to return complex XML elements are problematic. Consider a query that asks for all article elements in the XML data set, and furthermore assume that an article may have multiple authors and multiple titles. In object-relational terminology, article has two set-valued attributes, authors and titles, corresponding to two set sub-elements in XML terminology.

    WHERE <book>
            <article> $a </article>
          </> IN * CONFORMS TO pubs.dtd
    CONSTRUCT <article> $a </>

To create the appropriate result, we must retrieve all authors and all titles for each article. This is difficult to do in the relational model because flattening multiple set-valued attributes into tuple format gives rise to a multi-valued dependency [11] and is likely to be very inefficient when the sets are large, for example, if papers have many authors and many titles. There appears to be no efficient way to tackle this problem in the traditional relational model. One solution would be to return separate relations, each flattening one set-valued attribute, and “join” these relations outside the database while constructing the XML document. However, this requires duplication of database functionality both in terms of execution and optimization. This solution would be particularly bad for an element with many set-valued attributes. A related problem occurs when reconstructing recursive elements. We return to these issues in Section 6.
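The cost of flattening two set-valued attributes, and the alternative of returning separate relations and combining them outside the database, can be illustrated with a small sketch. The relations article, author and title and their contents below are hypothetical and serve only to show the row counts; Python's sqlite3 module stands in for the relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
-- Hypothetical relations: one article with two set sub-elements.
CREATE TABLE article (articleID INTEGER PRIMARY KEY);
CREATE TABLE author (parentID INTEGER, name TEXT);
CREATE TABLE title (parentID INTEGER, title TEXT);
INSERT INTO article VALUES (1);
INSERT INTO author VALUES (1, 'A1'), (1, 'A2'), (1, 'A3');
INSERT INTO title VALUES (1, 'T1'), (1, 'T2');
''')

# Flattening both sets into one relation pairs every author with every title.
flattened = conn.execute('''
    SELECT R.articleID, A.name, T.title
    FROM article R
    JOIN author A ON A.parentID = R.articleID
    JOIN title T ON T.parentID = R.articleID
''').fetchall()

# Alternative from the text: one relation per set-valued attribute,
# combined outside the database while building the result.
authors = conn.execute(
    "SELECT parentID, name FROM author ORDER BY parentID, name").fetchall()
titles = conn.execute(
    "SELECT parentID, title FROM title ORDER BY parentID, title").fetchall()

articles = {}
for parent_id, name in authors:
    entry = articles.setdefault(parent_id, {"authors": [], "titles": []})
    entry["authors"].append(name)
for parent_id, title in titles:
    entry = articles.setdefault(parent_id, {"authors": [], "titles": []})
    entry["titles"].append(title)

print(len(flattened), "flattened rows vs", len(authors) + len(titles), "separate rows")
print(articles)
```

The flattened result grows with the product of the set sizes, while the separate relations grow with their sum; the merging loop at the end is the work that, as the text notes, duplicates database functionality outside the engine.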
    WHERE <$p>
            <(title|booktitle)> $t </>
            <author>
              <lastname> $l </lastname>
            </>
          </> IN * CONFORMS TO pubs.dtd
    CONSTRUCT <author ID=authorID($l)>
                <name> $l </name>
                <$p ID=pID($p)>
                  <title> $t </>
                </>
              </>

Figure 24

    (Darwin, book, Origin of Species)
    (Darwin, book, Descent of Man)
    (Darwin, monograph, Subclass Cirripedia)
    (Dawkins, book, The Selfish Gene)

    <author>
      <name> Darwin </name>
      <book>
        <title> Origin of Species </title>
        <title> The Descent of Man </title>
      </book>
      <monograph>
        <title> Subclass Cirripedia </title>
      </monograph>
    </author>
    <author>
      <name> Dawkins </name>
      <book>
        <title> The Selfish Gene </title>
      </book>
    </author>

Figure 25
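Option (a) of Section 5.3, ordering in the relational engine and then tagging in a single scan, might look roughly like the sketch below. The table name pubs and its columns are assumptions; the tuples follow Figure 25, the ID attributes of Figure 24 are left out, and no XML escaping is performed. Python's sqlite3 and itertools.groupby stand in for the engine and the scanning code.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
# Hypothetical result relation: (last name, publication type, title).
conn.execute("CREATE TABLE pubs (lastname TEXT, ptype TEXT, title TEXT)")
conn.executemany("INSERT INTO pubs VALUES (?, ?, ?)", [
    ("Dawkins", "book", "The Selfish Gene"),
    ("Darwin", "monograph", "Subclass Cirripedia"),
    ("Darwin", "book", "Origin of Species"),
    ("Darwin", "book", "The Descent of Man"),
])

# The engine orders by last name and then by type; rowid only keeps the
# insertion order of titles stable within a group.
rows = conn.execute(
    "SELECT lastname, ptype, title FROM pubs ORDER BY lastname, ptype, rowid"
).fetchall()

# One sequential scan over the ordered tuples; groupby relies on the ordering.
lines = []
for lastname, by_author in groupby(rows, key=lambda r: r[0]):
    lines.append("<author>")
    lines.append(f"  <name> {lastname} </name>")
    for ptype, by_type in groupby(by_author, key=lambda r: r[1]):
        lines.append(f"  <{ptype}>")      # tag variable used as element name
        for _, _, title in by_type:
            lines.append(f"    <title> {title} </title>")
        lines.append(f"  </{ptype}>")
    lines.append("</author>")

xml_result = "\n".join(lines)
print(xml_result)
```

The grouping itself happens in the scan, outside the engine, which matches the observation in the text that relational group-by functionality is not fully exploited here.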

5.5 Heterogeneous Results

Consider the following XML-QL query that creates a result document having both titles and authors as elements (this is the heterogeneous result). This is easily handled in our approach for translating queries because this query would be split into two queries, one for selecting titles and another for selecting authors. The results of the two queries can be handled in different ways, one constructing title elements and another constructing author elements. The results can then be merged together.

    WHERE <article>
            <$p> $y </>
          </article> IN * CONFORMING TO pubs.dtd
    CONSTRUCT <$p> $y </>

5.6 Nested Queries

XML-QL is structured in terms of query blocks and one query block can be nested under another. These nested queries can be rewritten in terms of SQL queries, using outer joins (and possibly skolem function ids) to construct the association between a query and a sub-query. The details are complex and we omit them in the interest of space.

6. Conclusions

With the growing importance of XML documents as a means to represent data in the World Wide Web, there has been a lot of effort on devising new technologies to process queries over XML documents. Our focus in this paper, however, has been to study the virtues and limitations of the traditional relational model for processing queries over XML documents conforming to a schema. The potential advantages of this approach are many – reusing a mature technology, using an existing high performance system, and seamlessly querying over data represented as XML documents or relations. We have shown that it is possible to handle most queries on XML documents using a relational database, barring certain types of complex recursion.

Our qualitative evaluation based on real DTDs from diverse domains raises some performance concerns – specifically, in many cases relatively simple XML queries require either many SQL queries, or a few SQL queries with many joins in them. It is an open question whether semi-structured query processing techniques can do this kind of work more efficiently. The fact that semi-structured models represent a sequence of joins as a path expression, or handle what is logically a union of queries by using wildcards and “or” operators, does not automatically imply more efficient evaluation strategies.

Our experience has shown that relational systems could more effectively handle XML query workloads with the following extensions:

Support for Sets: Set-valued attributes would be useful in two important ways. First, storing set sub-elements as set-valued attributes [19,21] would reduce fragmentation. This is likely to be a big win because most of the fragmentation we observed in real DTDs was due to sets. Second, set-valued attributes, along with support for nesting [13], would allow a relational system to perform more of the processing required for generating complex XML results.

Untyped/Variable-Typed References: IDREFs are not typed in XML. Therefore, queries that navigate through IDREFs cannot be handled in current relational systems without a proliferation of joins – one for each possible reference type.

Information Retrieval Style Indices: More powerful indices, such as Oracle8i’s ConText search engine for XML [17], that can index over the structure of string attributes would be useful in querying over ANY fields in a DTD. Further, under restricted query requirements, whole fragments of a document can be stored as an indexed text field, thus reducing fragmentation.

Flexible Comparison Operators: A DTD schema treats every value as a string. This often creates the need to compare a string attribute with, say, an integer value, after typecasting the string to an integer. The traditional relational model cannot support such comparisons. The
problem persists even in the presence of DCDs or XML Schemas because different DTDs may represent “comparable” values as different types. A related issue is that of flexible indices. Techniques for building such indices have been proposed in the context of semi-structured databases [14].

Multiple-Query Optimization/Execution: As outlined in Section 4, complex path expressions are handled in a relational database by converting them into many simple path expressions, each corresponding to a separate SQL query. Since these SQL queries are derived from a single regular path expression, they are likely to share many relational scans, selections and joins. Rather than treating them all as separate queries, it may be more efficient to optimize and execute them as a group [20].

More Powerful Recursion: As mentioned in Section 4, in order to fully support all recursive path expressions, support for fixed point expressions defined in terms of other fixed point expressions (i.e., nested fixed point expressions) is required.

These extensions are not by themselves new and have been proposed in other contexts. However, they gain new importance in light of our evaluation of the requirements for processing XML documents. Another important issue to be considered in the context of the World Wide Web is distributed query processing – taking advantage of queryable XML sources. Further research on these techniques in the context of processing XML documents will, we believe, facilitate the use of sophisticated relational data management techniques in handling the novel requirements of emerging XML-based applications.

7. Acknowledgements

Funding for this work was provided by DARPA through Rome Research Laboratory Contract No. F30602-97-2-0247 and NSF through NSF Award CDA-9623632.

8. References

1. S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener, “The Lorel Query Language for Semistructured Data”, International Journal on Digital Libraries, 1(1), pp. 68-88, April 1997.
2. J. Bosak, T. Bray, D. Connolly, E. Maler, G. Nicol, C. M. Sperberg-McQueen, L. Wood, J. Clark, “W3C XML Specification DTD”, http://www.w3.org/XML/1998/06/xmlspec-report-19980910.htm.
3. T. Bray, J. Paoli, C. M. Sperberg-McQueen, “Extensible Markup Language (XML) 1.0”, http://www.w3.org/TR/REC-xml.
4. T. Bray, C. Frankston, A. Malhotra, “Document Content Description for XML”, http://www.w3.org/TR/NOTE-dcd.
5. P. Buneman, S. Davidson, G. Hillebrand, D. Suciu, “A Query Language and Optimization Techniques for Unstructured Data”, Proceedings of the ACM SIGMOD Conference, Montreal, Canada, June 1996.
6. V. Christophides, S. Abiteboul, S. Cluet, M. Scholl, “From Structured Documents to Novel Query Facilities”, Proceedings of the ACM SIGMOD Conference, Minneapolis, Minnesota, May 1994.
7. G. Copeland, S. Khoshafian, “A Decomposition Storage Model”, Proceedings of the ACM SIGMOD Conference, Austin, Texas, May 1985.
8. R. Cover, “The SGML/XML Web Page”, http://www.oasis-open.org/cover/xml.html.
9. A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu, “XML-QL: A Query Language for XML”, http://www.w3.org/TR/NOTE-xml-ql.
10. A. Deutsch, M. Fernandez, D. Suciu, “Storing Semi-structured Data with STORED”, Proceedings of the ACM SIGMOD Conference, Philadelphia, Pennsylvania, May 1999.
11. R. Fagin, “Multi-valued Dependencies and a New Normal Form for Relational Databases”, ACM Transactions on Database Systems, 2(3), pp. 262-278, 1977.
12. M. Fernandez, D. Suciu, “Optimizing Regular Path Expressions Using Graph Schemas”, Proceedings of the Fourteenth ICDE Conference, Orlando, Florida, February 1998.
13. G. Jaeschke, H. J. Schek, “Remarks on the Algebra of Non First Normal Form Relations”, Proceedings of the ACM Symposium on Principles of Database Systems, Los Angeles, California, March 1982.
14. J. McHugh, S. Abiteboul, R. Goldman, D. Quass, J. Widom, “Lore: A Database Management System for Semistructured Data”, SIGMOD Record, 26(3), pp. 54-66, September 1997.
15. J. McHugh, J. Widom, “Compile-Time Path Expansion in Lore”, Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, Jerusalem, Israel, January 1999.
16. Microsoft Corporation, XML Schema, http://www.microsoft.com/xml/schema/reference/star.asp.
17. Oracle Corporation, “XML Support in Oracle 8 and beyond”, Technical white paper, http://www.oracle.com/xml/documents.
18. The Query Languages Workshop (QL’98), http://www.w3.org/TandS/QL/QL98/, December 1998.
19. K. Ramasamy, J. F. Naughton, D. Maier, “Storage Representations for Set-Valued Attributes”, Working Paper, Department of Computer Sciences, University of Wisconsin-Madison.
20. T. Sellis, “Multiple-Query Optimization”, ACM Transactions on Database Systems, 12(1), pp. 23-52, June 1990.
21. C. Zaniolo, “The Database Language GEM”, Proceedings of the ACM SIGMOD Conference, San Jose, California, May 1983.

No ratings yet
Introduction To XQuery in SQL Server 2005 From Microsoft
32 pages
Sushant's Resume
No ratings yet
Sushant's Resume
2 pages
Apple 12 Pro Specifications
No ratings yet
Apple 12 Pro Specifications
2 pages
Introduction To Java Programming
No ratings yet
Introduction To Java Programming
24 pages
Upp Co
No ratings yet
Upp Co
75 pages
2009 Website Localization
No ratings yet
2009 Website Localization
7 pages
2-ch3 Autoinstall
No ratings yet
2-ch3 Autoinstall
15 pages
Cyber Security Awareness As Critical Driver To National Security
No ratings yet
Cyber Security Awareness As Critical Driver To National Security
4 pages
Gangaram - Java - Sage IT
No ratings yet
Gangaram - Java - Sage IT
5 pages
Analysis On Handwriting Using Pen-Tablet For Identification of Person and Handedness
No ratings yet
Analysis On Handwriting Using Pen-Tablet For Identification of Person and Handedness
5 pages
Rubrik Quick Competitive Selling Guide
No ratings yet
Rubrik Quick Competitive Selling Guide
2 pages
QP of AI Grade IX Set B
No ratings yet
QP of AI Grade IX Set B
2 pages
Basics On SDH From STM-1 Up To
No ratings yet
Basics On SDH From STM-1 Up To
124 pages
Firmware Upgrade Tool Lite User Guide
No ratings yet
Firmware Upgrade Tool Lite User Guide
22 pages
Active Defence Through Deceptive IPS
No ratings yet
Active Defence Through Deceptive IPS
81 pages
Coding Interview Preparation Guide
No ratings yet
Coding Interview Preparation Guide
8 pages
Problems For Prmo
No ratings yet
Problems For Prmo
4 pages
Log
No ratings yet
Log
2 pages
Smart Wheelchair Design For Elderly and Disabled People: September 2023
No ratings yet
Smart Wheelchair Design For Elderly and Disabled People: September 2023
8 pages
Unit Two PPT Grade9
No ratings yet
Unit Two PPT Grade9
39 pages
Segment Routing Work Book by Orhan Ergun LLC. Orhan Ergun LLC
No ratings yet
Segment Routing Work Book by Orhan Ergun LLC. Orhan Ergun LLC
25 pages
376-18 Wenke Lee Rebuttal Opinion Summaries
No ratings yet
376-18 Wenke Lee Rebuttal Opinion Summaries
3 pages
Prabhu Raj Resume
No ratings yet
Prabhu Raj Resume
2 pages
Daftar Pustaka - (New)
No ratings yet
Daftar Pustaka - (New)
13 pages
SAP BW Useful Tables
No ratings yet
SAP BW Useful Tables
12 pages
Basic Computer Operation
33% (3)
Basic Computer Operation
4 pages
CLion Clang Installation
No ratings yet
CLion Clang Installation
8 pages
Document Management Plan - The Complete Guide
No ratings yet
Document Management Plan - The Complete Guide
15 pages
