
Improving Query Performance Using Materialized XML Views: A Learning-Based Approach

Ashish Shah and Rada Chirkova

Department of Computer Science
North Carolina State University
Campus Box 7535, Raleigh NC 27695-7535
{anshah,rychirko}@ncsu.edu

Abstract. We consider the problem of improving the efficiency of query processing
on an XML interface of a relational database, for predefined query workloads. The
main contribution of this paper is to show that selective materialization of data as
XML views reduces query-execution costs in relatively static databases. Our
learning-based approach precomputes and stores (materializes) parts of the answers
to the workload queries as clustered XML views. In addition, the data in the
materialized XML clusters are periodically incrementally refreshed and rearranged,
to respond to the changes in the query workload. Our experiments show that the
approach can significantly reduce processing costs for frequent and important queries
on relational databases with XML interfaces.

1 Introduction

The Extensible Markup Language (XML) [18] is a simple and flexible format that is playing
an increasingly important role in publishing and querying data on the World Wide Web. As
XML has become a de facto standard for business data exchange, it is imperative for
businesses to make their existing data available in XML for their partners. At the same
time, most business data are still stored in relational databases. A general way to publish
data stored in relational databases as XML is to provide XML interfaces over the stored
relations and to enable querying these interfaces using XML query languages. In response to the
demand for such frameworks, database systems with XML interfaces over non-XML data
are increasingly available, notably relational systems from Oracle, IBM, and Microsoft.
In this paper we consider the problem of improving the efficiency of evaluating XML
queries on relational databases with XML interfaces. When querying a data source using
its XML interface, an application issues a query in an XML query language and expects an
answer in XML. If the data source is a relational database, this way of interacting with the
database adds new dimensions to the old problem of efficiently evaluating queries on
relational data. In the standard scheme for evaluating queries on an XML interface of a
relational database, the relational query-processing engine computes a relation that is an
answer to the query on the stored relational data; see [9] for an overview. On top of this
process, the query-processing engine has to (1) translate the query from an XML query
language into SQL (the resulting query is then posed on the relational data), and (2)
translate the answer into XML. To efficiently process a query on an XML interface of a
relational database, the query-processing engine has to efficiently perform all three tasks.
We propose an approach to reducing the amount of time the query-processing engine
spends on answering queries on XML interfaces of relational databases. The idea of our
approach is to circumvent the standard query-answering scheme described above, by
precomputing and storing, or materializing, some of the relational data as XML views. If
the DBMS has chosen the “right” data to materialize, it can use these XML views to
answer some or most of the frequent and important queries on the data source without
accessing the relational data. We show that our approach can significantly reduce the time
to process frequent and important queries on relational databases with XML interfaces.
Our approach is not the first view-based approach to the problem of efficiently
computing XML data on relational databases. To clarify how our approach differs from
previous work, we use the terms (1) view definitions, which are data specifications given in
terms of stored data (or possibly in terms of other views), and (2) view answers, which are
the data that satisfy the definition of a view on the database. In past work, researchers have
looked into the problem of efficiently evaluating XML queries over XML view definitions
of relational data (e.g., SilkRoute [8] or XPERANTO [16]). We build on the past work by
adding a new component to this framework: We incrementally materialize XML view
answers to frequent and important XML queries on a relational database, using a learning
approach. To the best of our knowledge, we are the first to propose this approach.
The following are the contributions of this paper:
• We develop a learning-based approach to materializing relational data in XML.
• We propose a system architecture that takes advantage of the materialized XML
to reduce the total query-execution times for incoming query workloads.
• We show how to transform a purely relational database system to accommodate
materialized XML and our system architecture.
Using our approach may result in significant efficiency gains on relatively static
databases. Moreover, it is possible to combine our solution with the orthogonal approaches
described in [8,16], thus achieving the combined advantages of the two solutions.
The remainder of the paper is organized as follows. Section 1.1 discusses related work.
In Section 2 we formalize the problem and outline our approach. In Sections 3 and 4, we
describe the system architecture and the learning algorithm. Section 5 describes
experimental results. We discuss the approach in Section 6, and conclude with Section 7.

1.1 Related Work

The problem of XML query answering has recently received a lot of attention. [11, 13]
propose a logical foundation for the XML data model. [3] describes a system for data-
model management, with tools to map schemas between XML and relations. [6] looks into
developing XML documents in a normal form that guarantees some desirable properties of
the document format. [7] proposes an approach to efficiently representing and querying
semistructured Web data. [10] proposes an XML data model and a formal process, to map
Web information sources into commonly perceived logical models; the approach provides
for easy and efficient information extraction from the World-Wide Web.
[14] describes an approach to XML data integration, based on an object-oriented data
model. [15] proposes an XML data-management system that integrates relational DBMS,
Java and XSLT. [20] reports on a system that manages XML data based on a flexible
mapping strategy; given XML data, the system stores data in relations, for efficient
querying and manipulation. XCache [2] describes a web-based XML-querying system that
supports semantic caching; ACE-XQ [4] is a caching system for queries in XQuery; the
system uses sophisticated cache-management mechanisms in the XML context.
SilkRoute [8] is a framework for publishing relational data using XML view definitions.
The approach incorporates an algorithm for translating queries from XQuery into SQL and
an optimization algorithm for selecting an efficient evaluation plan for the SQL queries.
XPERANTO [16] is an XML-centric middleware layer that lets users query and structure
the contents of a relational database as XML data and thus allows them to ignore the
underlying relations. Using the XPERANTO query facility and the default XML view
definition of the underlying database, it is possible to specify custom XML view
definitions that better suit the needs of the applications.
The motivation for using views in query processing comes from information-integration
applications; one approach, called data warehousing [17], uses materialized views.
[1,5,19,21] propose a unified approach to the problem of view maintenance in data
warehouses. In our work, we use a learning method called concept, or rule, learning [12].

2 Problem Specification and Outline of the Proposed Approach

In this section we specify the problem of improving the efficiency of answering queries on
XML interfaces of relational databases, and outline our solution. An XML-relational data
source (“data source”) comprises a relational database system and an XML interface. For a
query in an XML query language, to evaluate the query on a data source means to obtain
an XML answer to the query via the XML interface of the source. Suppose there is a finite
set of important queries, with associated relative weights, that users or applications
frequently pose on the data source. We call these queries a query workload. In our cost
model, the cost of evaluating a query on a data source is the total time elapsed between
posing the query on the source and obtaining an answer to the query in XML. The total
cost of evaluating a query workload on a data source is the weighted sum of the costs of
evaluating all workload queries, using their relative weights. We consider the problem of
improving the efficiency of evaluating a query workload on a data source; the goal here is
to reduce the total cost of evaluating a given query workload on a given data source.
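In symbols (our notation for the definition above), with workload W = {q1, ..., qk}, weights
w1, ..., wk, and cost(qi, D) the time to evaluate query qi on data source D:

cost(W, D) = w1 * cost(q1, D) + ... + wk * cost(qk, D).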
To improve the efficiency of evaluating a query workload on a data source, we propose
an approach based on incrementally materializing XML views of workload-relevant data.
To materialize a view is to compute and store the answer to the view on the database. We
materialize views in XML rather than in relations, to reduce or eliminate the time required
to translate (1) the workload queries from an XML query language into SQL, and (2) the
relational answers to the queries into XML. In the proposed system architecture, when
answering a query, the query-processing engine first searches the materialized XML
views, rather than the relational tables; if the query can be answered using the views, there
is no need to access the underlying relations. Using this approach may result in significant
efficiency gains when the underlying relational data do not change very often.
In our approach, we need to decide which data to materialize in XML. We use a
learning-based approach to materialize only the data that is needed to answer the workload
queries on the data source. In database systems, it is common to maintain statistics on the
stored data, for the purposes of query optimization [9]. We maintain similar statistics on
access rates to the data in the stored relations, and materialize the most frequently accessed
tuples in XML. We use learning techniques combined with the access-rate statistics to
decide when and how to change, incrementally, the set of records materialized in XML.
We manage the materialized data using the concept of clustering. In our approach,
clustering means combining related XML records into a single materialized XML
structure. These XML structures are stored in a special relation and can be queried using
the data source’s XML query language. (In the remainder of the paper we assume that
XQuery is the language of choice.) Storing the most frequently accessed tuples in
materialized XML clusters increases the probability that future workload queries will be
satisfied by the clusters. To answer those queries that are not satisfied by the XML
clusters, we use the relational query-processing engine.
3 The System Architecture

We now discuss the architecture of the system. We describe the query-processing
subsystem, the required changes to the schema of the originally relational data source, and
the process of generating workload-related XML data from the stored relations.

3.1 The Query-Processing Subsystem

In this section we describe a typical query path taken by an input query; see Fig. 1.

Fig. 1. The Query-Processing subsystem

The solid lines in Fig. 1 show the primary query path, which is taken for all queries on
the data. If a workload query can be answered by the materialized XML clusters, then only
the primary path is taken. Otherwise, the query next follows the secondary query path,
shown in dotted lines in Fig. 1; here, the input query is pushed down to the relational level
and is answered using the stored relations, rather than the materialized XML.
The XML clusters are stored as values of an attribute in a special relation. The system
queries the relation in SQL to find the most relevant cluster, and then poses the XQuery
query on the cluster. The schema for the clusters is specified by the database administrator.
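To make the two query paths concrete, the following Java sketch (our illustration, not the
authors' code) shows how a middleware component might route a workload query: it first looks
up a relevant cluster with SQL and poses the XQuery query on it, and falls back to the
relational path only when no cluster can answer the query. Table and column names follow the
CDDB example of Section 5 (with xml_data standing for the attribute that stores a cluster);
the helpers evaluateXQuery and toXml stand for whatever XQuery processor and tuple-to-XML
converter the middleware uses, and are assumptions rather than parts of the paper.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryRouter {

    private final Connection conn;   // JDBC connection to the relational DBMS

    public QueryRouter(Connection conn) { this.conn = conn; }

    /** Primary path: the materialized XML clusters; secondary path: the stored relations. */
    public String answer(String keyword, String xquery) throws SQLException {
        // Primary path: find a relevant cluster with SQL (a crude keyword match, for illustration).
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT xml_data FROM XmlDiscTrack WHERE xml_data LIKE ?")) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return evaluateXQuery(rs.getString("xml_data"), xquery);  // pose XQuery on the cluster
                }
            }
        }
        // Secondary path: push the query down to the stored relations and convert the answer to XML.
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT d.cd_title, t.track_title FROM Disc d, Tracks t "
              + "WHERE d.cd_id = t.cd_id AND d.cd_title LIKE ?")) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                return toXml(rs);
            }
        }
    }

    // Hypothetical helpers: plug in an XQuery engine and a tuple-to-XML serializer here.
    private String evaluateXQuery(String clusterXml, String xquery) { return clusterXml; }
    private String toXml(ResultSet rs) throws SQLException { return "<result/>"; }
}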

3.2 Setting up Materialized XML Clusters

In this section we describe how to set up materialized XML clusters, by transforming the
relational-database schema to accommodate XML. For simplicity, we use a schema with
just two relations, R(A1,…, An) and S(B1,…, Bm). A1 is the primary key of the relation R.

3.2.1 Modifying the given relational schemas

In our approach, for tuples of certain relations we keep track of how many times each tuple
is accessed in answering the workload queries. To enable these access counts, we change
the schema of the relational data source, by adding an extra attribute to the schema of one
or more of the stored relations. The most likely candidates for this schema change are the
relations of interest, which are relations that have high access rates, primarily large
relations that are involved in expensive joins. For instance, suppose we have a query that
involves a join of the relations R and S. If the relation R is large, the query would be
expensive to evaluate, hence we consider R as a suitable candidate for the schema change.
(Alternatively, the database administrator can make the choice of the schema to modify.)
Suppose we decide to add an attribute A(n+1) to the schema of the relation R; we will
store access counts for the tuples in relation R as values of this attribute. R(A1,…,An, A(n+1))
is the schema of the modified relation. Initially, the value of A(n+1) is NULL in all tuples.

3.2.2 Creating the relations for the materialized XML clusters

We now define the schema of the relation T that will store the materialized XML clusters,
as T(A1, C). Recall that A1 is the primary key of the relation R; using this attribute in the
relation T helps us index the materialized XML clusters in the same way as the relation R.
The attribute C is used to store the materialized XML clusters in text format.
To summarize, we set up materialized XML clusters by doing the following:
1. Select a relation of interest (R in the example) to modify.
2. Add an access-count attribute to the schema of the selected relation.
3. Create a new relation (T in the example), to hold the materialized XML version
of the data in the selected relation of interest (R in the example).
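A minimal JDBC sketch of steps 2 and 3 for the example schema above (step 1 is a design
decision); the column name access_count, the Oracle-style types, and the connection string
are our assumptions, and any relational DBMS with a large character type would do:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetupClusters {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//host:1521/db", "user", "password");
             Statement stmt = conn.createStatement()) {

            // Step 2: add the access-count attribute A(n+1) to the relation of interest R.
            // Counts start out as NULL, as described in Section 3.2.1.
            stmt.executeUpdate("ALTER TABLE R ADD (access_count NUMBER)");

            // Step 3: create the relation T(A1, C) that holds the materialized XML clusters;
            // A1 mirrors R's primary key, and C stores a cluster in text format.
            stmt.executeUpdate("CREATE TABLE T (a1 NUMBER PRIMARY KEY, c CLOB)");
        }
    }
}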

4 The Learning Algorithm

In this section we describe a learning algorithm that populates and incrementally maintains
the XML clusters. We first describe how to select relational tuples for materialization, and
then explain our clustering strategy for building an XML tree of “interesting records.”
Our general approach is as follows. When answering queries, we first pose each query
on the materialized XML clusters in the relation T that we have added to the original
stored relations. Whenever a query cannot be answered using the materialized XML
clusters (or at system startup, see next paragraph), the query is translated into SQL and
pushed down to the stored relations. Each time this process is activated, the system
increments access counts for all tuples that contribute to the answer to the SQL query.
At system startup, the relation T that holds the materialized XML is empty. As a result,
all incoming queries have to be translated into SQL and pushed to the relational query-
processing engine. The materialization phase starts when the access counts in the relations
of interest exceed an empirically determined threshold value (see Section 4.3); all tuples
whose access counts are greater than the threshold value are materialized into XML. The
schema for the materialized XML is specified by the input XQuery workload.
(Alternatively, it can be specified by the database administrator.) As the learning algorithm
executes over an extended time period, the most frequently accessed tuples in the relations
of interest are materialized into XML and stored in the relation T.

4.1 Learning I: Discovering Access Patterns in the Relations of Interest

To incrementally materialize and maintain XML clusters of workload-relevant data, the
system periodically runs a learning process that translates frequently accessed relational
tuples into XML and reorders the resulting records in a hierarchy of clusters. We now
describe the first stage of the learning process, where the system discovers access patterns
in the relations of interest by using the access-count attribute. Once the access pattern is
established, the system translates the most frequently accessed tuples into XML. To obtain
the current access pattern, the system needs to execute the following steps.
1. (This step is executed during the system startup.) Input an expected query stream
and set up the desired output XML schema.
2. Pose the incoming workload queries on the stored relations; in answering the
queries, increment the access counts for those tuples in the relations of interest
that contribute to the answers to the queries.
During the system startup we use an expected, rather than real, query stream to
determine access patterns in the relations of interest. For example, if each workload query
may use one of the given 250K keywords with given frequencies, then for our expected
query stream we select the 1000 most-frequent keywords.
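A sketch of step 2 for the CDDB example of Section 5 (our illustration; how the real system
identifies answer tuples may differ): after the relational engine answers a pushed-down
keyword query, the middleware increments the access count of every Disc tuple that
contributed to the answer. COALESCE turns the initial NULL counts of Section 3.2.1 into 0.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccessCounter {

    /** Increment the access count of every Disc tuple that matched the workload keyword. */
    public static void recordAccess(Connection conn, String keyword) throws SQLException {
        String sql = "UPDATE Disc SET count = COALESCE(count, 0) + 1 WHERE cd_title LIKE ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "%" + keyword + "%");
            ps.executeUpdate();
        }
    }
}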

4.2 Learning II: Materializing XML and Forming Clusters

Once the first stage of the learning process has discovered the access patterns in the
relations of interest, the system performs, in several iterations, the following steps (see the sketch after this list):
1. To generate the materialized XML records, retrieve from the relations of interest
all tuples whose access counts are greater than the predefined threshold value.
2. Translate the data into XML and store in the materialized XML relation.
3. Form clusters (also see section 4.3):
a. Find all relational tuples that are related to the materialized XML, w.r.t.
the workload queries.
b. Select those of the tuples whose access counts exceed the threshold
value, and translate them into XML.
c. Cluster the tuples and materialized XML into a single XML tree.
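A minimal sketch of steps 1 and 2 for the CDDB example, with the related Tracks tuples folded
directly into each record (our illustration; the threshold, the column names, and the helper
buildClusterXml, which stands in for the schema-driven XML construction and for the fuller
cluster formation of step 3 and Section 4.3, are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Materializer {

    /** Materialize every Disc tuple whose access count exceeds the threshold (learning stage II). */
    public static void materialize(Connection conn, int threshold) throws SQLException {
        String pick  = "SELECT cd_id, cd_title, count FROM Disc WHERE count > ?";
        String store = "INSERT INTO XmlDiscTrack (cd_id, count, xml_data) VALUES (?, ?, ?)";
        try (PreparedStatement select = conn.prepareStatement(pick);
             PreparedStatement insert = conn.prepareStatement(store)) {
            select.setInt(1, threshold);
            try (ResultSet discs = select.executeQuery()) {
                while (discs.next()) {
                    String cdId = discs.getString("cd_id");
                    // Steps 1-2: translate the tuple and its Tracks tuples into one XML record.
                    String cluster = buildClusterXml(conn, cdId, discs.getString("cd_title"));
                    insert.setString(1, cdId);
                    insert.setInt(2, discs.getInt("count"));
                    insert.setString(3, cluster);
                    insert.executeUpdate();
                }
            }
        }
    }

    /** Hypothetical helper: joins one CD with its Tracks tuples and serializes the result as XML. */
    private static String buildClusterXml(Connection conn, String cdId, String title) throws SQLException {
        StringBuilder xml = new StringBuilder("<disc cd_title=\"" + title + "\">");  // no XML escaping, for brevity
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT track_title FROM Tracks WHERE cd_id = ?")) {
            ps.setString(1, cdId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    xml.append("<track>").append(rs.getString("track_title")).append("</track>");
                }
            }
        }
        return xml.append("</disc>").toString();
    }
}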

4.3 The Clustering Phase

In our selective materialization, we use clustering to increase the scope of materialized
XML beyond the relations of interest, by incrementally adding to the XML records
“interesting records” from other relations. The criterion for adding these interesting
records is the same as the criterion for materializing relational tuples in XML. More
precisely, the relations with the most frequently accessed records are selected in the
descending order of access frequency. For example, if there are three relations R1, R2, R3,
in descending order of tuple-access frequencies, then we can form clusters, starting with R1
and R2, then R2 and R3, and so on.
The relation T now contains a single XML structure, which holds related records with
high access rates. In each cluster, the records are sorted in the order of their access counts.
In the current implementation, the schema for the cluster is provided as an external input
(see Fig. 1). Choosing cluster schemas automatically is a direction of future work.
We now explain, using an example, how to form hierarchies of clusters. Consider a database
with four relations, R1-R4, in the descending order of tuple-access frequency. We first
modify the relation R1, to store the XML clusters generated from the tuples retrieved from
a join of R1 and R2 on some attribute. Similarly, we modify R2 to store a join of R2 and R3,
and so on. With every join of Rn and Rn+1, we form the most-frequently accessed clusters;
the clusters form a hierarchy w.r.t. their access rates: For example, the cluster formed from
R1 and R2 will have higher access rates than the cluster for R2 and R3. In our experiments,
we have explored the first level of clustering for simple queries; see Section 5. We are
working on implementing multiple levels of clustering for more complex queries.
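The pairwise scheme can be sketched as follows (our illustration; the shared join attribute,
the access_count column, and the SELECT * shape of the join are assumptions): given the
relations of interest in descending order of tuple-access frequency, each consecutive pair is
joined, and only the frequently accessed join results become the clusters of that level.

import java.util.ArrayList;
import java.util.List;

public class ClusterHierarchy {

    /**
     * Build one join query per level of the cluster hierarchy. The relations must be listed in
     * descending order of tuple-access frequency (e.g. R1, R2, R3, R4); level i is expected to
     * be accessed more often than level i + 1.
     */
    static List<String> levelQueries(List<String> relations, String joinAttr, int threshold) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i + 1 < relations.size(); i++) {
            String left  = relations.get(i);
            String right = relations.get(i + 1);
            // Only the most frequently accessed tuples of the pair end up in the level's cluster.
            queries.add("SELECT * FROM " + left + " l JOIN " + right + " r ON l." + joinAttr
                      + " = r." + joinAttr + " WHERE l.access_count > " + threshold);
        }
        return queries;
    }
}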
In our approach we determine the threshold value empirically: At system startup time,
we repeat the learning process several times to arrive at a suitable value. The choice of the
threshold value is a tradeoff between larger materialized views and better query-execution
times: A lower threshold value means that more tuples will be materialized as XML, so that
more queries can be satisfied by the XML views. A higher threshold value prevents most of the
relational data from being selected for materialization, which limits the number of queries
that can be answered using the views. The key is to strike a balance between the point at
which the system materializes tuples and the proportion of records to be materialized. In
our future work, we intend to make the choice of this threshold value dynamic.

5 Experimental Setup and Results

5.1 The Setup

The CDDB collection [22] is a database that stores information about CDs and CD tracks.
The CDDB schema comprises two relations, Disc(cd_id,cd_title,genre,num_of_tracks) and
Tracks(cd_id,track_title). (For simplicity, we omit other attributes of the relations in
CDDB.) The Disc relation has 250K tuples. Each CD has an average of 10 tracks, stored in
the Tracks relation. Fig. 2 shows some tuples in the two relations in CDDB.
In our experiments, we used Oracle 9.2 on a Dell Server P4600 with Intel Xeon CPU at
2GHz and 2GB of memory running on Microsoft Windows 2000. We implemented the
middleware interface in Java using Sun JDK 1.4, and ran it on an Intel Pentium II 333MHz
machine with 128MB of memory on Red Hat Linux 7.3. We conducted a significant
number of runs to ensure that the effect of network delays on our experiments is minimal.

Fig. 2. Some tuples in relations Disc and Track in the CDDB database

Fig. 3. Data in the Disc relation with the modified schema

Fig. 4. Schema for the relation that holds the materialized XML and an example of a simple cluster

To determine access patterns for the Disc relation, we added a new attribute, count, to
the schema; this attribute holds an access count for each CD record. The rest of the
database schema is unchanged. (Section 3.2.1 explains how to choose relations for the
schema change.) Fig. 3 shows the tuples in the Disc relation with the modified schema.
Fig. 4 shows the table XmlDiscTrack. This new relation holds materialized XML as text
data in record format. The process of defining this materialized table is explained in
Section 3.2.2. In the XmlDiscTrack relation that we create in the CDDB database,
attributes cd_id and count are the same as in the Disc relation. The value of the count
attribute in XmlDiscTrack equals the value of count in the corresponding tuple in the Disc
relation, at the point in time when that tuple was materialized as XML. The XML attribute
in XmlDiscTrack holds the materialized XML. For example, the value of the XML
attribute, in the tuple for the 'Air Supply' CD in XmlDiscTrack, is shown in Fig. 4.
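Under our reading of the schema, such a materialized record has roughly the following shape
(the element names and the '...' placeholders are illustrative assumptions, not taken from
Fig. 4):

<disc cd_id="..." cd_title="Air Supply">
  <track>...</track>
  <track>...</track>
</disc>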
Workload queries: The workload queries in our experiments use CD titles as keywords.
In our architecture, the query-processing engine first tries to answer each workload query
by searching the XML clusters in the relation XmlDiscTrack; if it fails to find an answer
there, the engine then searches the Disc table using SQL. The two query paths are shown
in Fig. 1 in Section 3. In learning stage I, whenever the system answers an input query
using the original stored relations in the CDDB database, it increments the access count for
each answer tuple in the Disc table. For learning stage II to be invoked, the access counts
have to reach the threshold value; see Section 4.2. In the second stage of learning, we
materialize in XML all the tuples in the Disc relation whose access counts exceed the
threshold value. The generation of materialized XML is explained in Section 4. Fig. 4
shows the materialized XML for the CD “Air Supply” generated from Disc and Track.
Cluster formation: This phase is invoked for every tuple in the relation XmlDiscTrack
that holds materialized XML. The XML shown in Fig. 4 is suitable only for answering queries
that ask for the tracks of that CD; this restriction limits the scope of the approach.
Hence, we form clusters. Clusters are formed by identifying related records. The algorithm
for selecting these related records is explained formally in Section 4. The tuples in the Disc
relation that match ‘Air Supply’ and that have their access counts above the threshold value
are chosen to form the clusters. These tuples are converted to XML and merged into the
original structure. An example of the merged structure is shown in Fig. 5. Once the
clustering phase is completed, the XML shown in Fig. 5 replaces the XML in Fig. 4.

Fig. 5. An example of materialized clustered XML

5.2 Experimental Results

In this section we show the results of our experiments on the feasibility of our learning-
based materialization approach.
• Comparing the efficiency of querying materialized XML to the efficiency of
getting answers to SQL queries on the stored relational data.
The objective of this experiment was to analyze whether XQuery-based querying is
effective on materialized XML views, as compared to using SQL on the stored relations.

Fig. 6. Comparison between using random queries on materialized XML and on relational data

Fig. 7. Average query times for a random set of 1000 repeated queries on relational data and an
analysis of the time required to convert this relational data to XML

Fig. 6 shows query-execution times for 5000 XML records, for the query SELECT *
FROM Disc, Track WHERE Disc.cd_id = Track.cd_id AND Disc.cd_title LIKE '%Eddie
Murphy%’. Interesting tuples in the join of Disc and Track are stored in XML. The
relational tables hold 2.5 million tuples (250K CDs times 10 tracks). The cluster records
are similar to the XML shown in Fig. 5. The graph is a plot of query-execution times for
XQuery queries based on the attribute cd_title of the Disc relation. The experiment shows
that processing a query on an XML view is faster than using SQL on the relations and then
converting the answer to XML. Fig. 6 shows that executing SQL queries is more time-
consuming than executing their XQuery counterparts on materialized XML.
In pushing XQuery queries to the relational data, converting the answers into XML is a
major overhead. We analyze the overhead in Fig. 7, which shows that the process of
converting answer tuples into XML is the most expensive part of answering queries.
Hence, it would be beneficial if such data were to be materialized.
• Analyzing the maximum time spent in converting query answers to XML.
The objective of this experiment was to analyze the time spent on translating
relational query answers into XML. The graph in Fig. 7 is a plot of query-execution times
for SQL queries based on the attribute cd_title of the Disc relation. The graph shows, as a
solid line, the mean execution times for relational queries plus the times to convert the
answers into XML. We see that of the total time of around 190 ms, converting relational
data into XML takes around 60 ms (the dotted line). The relational query itself thus takes
190 ms - 60 ms = 130 ms to execute, while converting the relational data to XML adds an
overhead of 60 ms. These results are the motivation for using materialization techniques.
• Simulation runs to show the decrease in total query-execution times when
querying the materialized XML alongside the stored relational data.

Fig. 8. Average query-execution times for a randomized set of 1000 repeated queries on a
combination of relational and materialized data

Fig. 8 shows 30 simulation runs for a query workload of 1000 randomly selected CD
titles. The X-axis shows the query ID, while the Y-axis shows the query-execution times.
The vertical dotted lines show the points at which XML materialization took place. (Recall
that the system periodically runs the learning algorithm.) It can be seen in Fig. 8 that after
every learning stage, the slope of the curve falls. Intuitively, after new learning has taken
place, the XML clusters can satisfy a higher number of queries, with higher efficiency.

6 Discussion

The proposed approach is to store materialized XML views in a relational database using
learning. One extreme of the approach is to materialize the entire relational database as
XML and then use a native XML engine to answer queries. This way, we would be able to
avoid the overhead of translating all possible queries on the data source into SQL, and of
translating the relational answers to the queries into XML. However, query performance
might degrade considerably, as XML query-answering techniques are generally slower than their
relational counterparts. In addition, the system would have to incur a significant overhead
of keeping the XML consistent with the underlying relations.
In Section 4 we described the process of grouping together related records in XML
clusters. This approach allows a database system to incrementally find an optimal
proportion of XML records that can be accessed faster than the relational tables. This
optimal proportion can be arrived at by varying the size of the clustered XML and the
threshold value. Additional improvements can be made when user applications maintain
local caches: It may be beneficial to prefetch the XML data in the application’s cache, so
that future queries from the application have a higher chance of being satisfied locally.
In our approach, the added counters and flags in the relations have to be updated
frequently and thus create an overhead. In our future work, we plan to reduce the overhead
by updating tuple-access counts offline or during periods of lower query-loads.
We materialize only frequently-accessed tuples; thus, only a fraction of the database is
materialized as XML at any given time. (The clusters are recomputed from scratch every
time the learning phase is invoked.) The advantage of the learning approach is to balance
the proportion of data in relations and XML, by materializing the tuples that are in the
answers to multiple queries.
As the materialized XML is generated based on the access counts of relational tuples,
there may be queries that need to access both the materialized XML and the relational database.
We plan to explore how to handle such queries in our future work.

7 Conclusions and Future Work

We have described a view- and learning-based solution to the problem of reducing total
query-execution times in relational data sources with XML interfaces. Our approach
combines learning techniques with selective materialization; our experiments show that it
can prove beneficial in improving query-execution speeds in relatively static databases.
This paper describes an implementation that is external to the database engine. We are
currently working on incorporating our approach inside a relational database-management
system. We are looking into automating schema definition for materialized XML clusters,
by using the information about past query workloads and the relations accessed by these
workloads. We are working on developing a learning approach to selecting “interesting
records” for XML clusters. We plan to implement a dynamic approach to selecting the
threshold value in XML materialization. We plan to devise better strategies for (1)
prioritizing XML records within clusters, and (2) automatically dematerializing obsolete
XML data. Finally, we plan to automate the choice of the relations of interest, given a
query workload.

References

1. J. Chen, S. Chen, and E.A. Rundensteiner. A transactional model for data warehouse
maintenance. In Proc. of the 21st Int’l Conference on Conceptual Modeling (ER), 2002.
2. L. Chen, E.A. Rundensteiner, and S. Wang. XCache: A semantic caching system for XML
queries. In Proc. 2002 ACM SIGMOD International Conference on Management of Data, 2002.
3. K.T. Claypool, E.A. Rundensteiner, X. Zhang, H. Su, H.A. Kuno, W.C. Lee, and G. Mitchell.
Gangam — a solution to support multiple data models, their mappings and maintenance. In Proc.
2001 ACM SIGMOD International Conference on Management of Data, 2001.
4. L. Chen, S. Wang, E. Cash, B. Ryder, I. Hobbs, and E.A. Rundensteiner. A fine-grained
replacement strategy for XML query cache. In Proc. Fourth ACM CIKM International Workshop
on Web Information and Data Management (WIDM 2002), pages 76–83, 2002.
5. J. Chen, X. Zhang, S. Chen, A. Koeller, and E.A. Rundensteiner. DyDa: Data warehouse
maintenance in fully concurrent environments. In Proc. ACM SIGMOD, 2001.
6. D.W. Embley and W.Y. Mok. Developing XML Documents with Guaranteed “Good” Properties.
In Proc. 20th International Conference on Conceptual Modeling (ER), pages 426–441, 2001.
7. I.M.R.E. Filha, A.S. da Silva, A.H.F. Laender, and D.W. Embley. Using nested tables for
representing and querying semistructured web data. In Proceedings of the Advanced Information
Systems Engineering, 14th International Conference (CAiSE 2002), 2002.
8. M. Fernandez, Y. Kadiyska, D. Suciu, A. Morishima, and W.C. Tan. SilkRoute: A framework for
publishing relational data in XML. ACM Trans. Database Systems, 27(4):438–493, 2002.
9. Yannis E. Ioannidis. Query optimization. In Allen B. Tucker, editor, The Computer Science and
Engineering Handbook, pages 1038–1057. CRC Press, 1997.
10. Z. Liu, F. Li, and W.K. Ng. Wiccap data model: Mapping physical websites to logical views. In
Proc. 21st International Conference on Conceptual Modeling (ER), 2002.
11. Mengchi Liu. A logical foundation for XML. In Proc. Advanced Information Systems
Engineering, 14th International Conference (CAiSE 2002), pages 568–583, 2002.
12. Tom M. Mitchell. Generalization as search. Artificial Intelligence, 18:203–226, 1982.
13. Mengchi Liu and Tok Wang Ling. Towards declarative XML querying. In Proc. 3rd International
Conference on Web Information Systems Engineering (WISE 2002), pages 127–138, 2002.
14. K. Passi, L. Lane, S.K. Madria, B.C. Sakamuri, M.K. Mohania, and S.S. Bhowmick. A model for
XML schema integration. In Proc. 3rd Int'l Conf. E-Commerce and Web Technologies, 2002.
15. Giuseppe Psaila. ERX: An experience in integrating entity-relationship models, relational
databases, and XML technologies. In Proc. XML-Based Data Management and Multimedia
Engineering EDBT workshop, 2002.
16. J. Shanmugasundaram, J. Kiernan, E. J. Shekita, C. Fan, and J. Funderburk. Querying XML
views of relational data. In Proc. 27th Int’l Conference on Very Large Data Bases, 2001.
17. Jennifer Widom. Research problems in data warehousing. In Proc. Fourth International
Conference on Information and Knowledge Management, pages 25–30, 1995.
18. Extensible Markup Language (XML). http://www.w3.org/XML.
19. X. Zhang, L. Ding, and E.A. Rundensteiner. Parallel multi-source view maintenance. VLDB
Journal: Very Large DataBases, 2003. (To appear).
20. X. Zhang, M. Mulchandani, S. Christ, B. Murphy, and E.A. Rundensteiner. Rainbow: mapping-
driven XQuery processing system. In Proc. ACM SIGMOD, 2002.
21. Xin Zhang and Elke A. Rundensteiner. Integrating the maintenance and synchronization of data
warehouses using a cooperative framework. Information Systems, 27:219–243, 2002.
22. The CDDB database. http://www.freedb.org.
