
Improving Query Performance Using Materialized XML Views: A Learning-Based Approach

Ashish Shah and Rada Chirkova

Department of Computer Science
North Carolina State University
Campus Box 7535, Raleigh NC 27695-7535
{anshah,rychirko}@ncsu.edu

Abstract. We consider the problem of improving the efficiency of query processing
on an XML interface of a relational database, for predefined query workloads. The
main contribution of this paper is to show that selective materialization of data as
XML views reduces query-execution costs in relatively static databases. Our
learning-based approach precomputes and stores (materializes) parts of the answers
to the workload queries as clustered XML views. In addition, the data in the
materialized XML clusters are periodically incrementally refreshed and rearranged,
to respond to the changes in the query workload. Our experiments show that the
approach can significantly reduce processing costs for frequent and important queries
on relational databases with XML interfaces.

1 Introduction

The Extensible Markup Language (XML) [18] is a simple and flexible format that is playing
an increasingly important role in publishing and querying data on the World Wide Web. As
XML has become a de facto standard for business data exchange, it is imperative for
businesses to make their existing data available in XML for their partners. At the same
time, most business data are still stored in relational databases. A general way to publish
data stored in relational databases as XML is to provide XML interfaces over the stored
relations and to enable querying these interfaces using XML query languages. In response to the
demand for such frameworks, database systems with XML interfaces over non-XML data
are increasingly available, notably relational systems from Oracle, IBM, and Microsoft.
In this paper we consider the problem of improving the efficiency of evaluating XML
queries on relational databases with XML interfaces. When querying a data source using
its XML interface, an application issues a query in an XML query language and expects an
answer in XML. If the data source is a relational database, this way of interacting with the
database adds new dimensions to the old problem of efficiently evaluating queries on
relational data. In the standard scheme for evaluating queries on an XML interface of a
relational database, the relational query-processing engine computes a relation that is an
answer to the query on the stored relational data; see [9] for an overview. On top of this
process, the query-processing engine has to (1) translate the query from an XML query
language into SQL (the resulting query is then posed on the relational data), and (2)
translate the answer into XML. To efficiently process a query on an XML interface of a
relational database, the query-processing engine has to efficiently perform all three tasks.
We propose an approach to reducing the amount of time the query-processing engine
spends on answering queries on XML interfaces of relational databases. The idea of our
approach is to circumvent the standard query-answering scheme described above, by
precomputing and storing, or materializing, some of the relational data as XML views. If
the DBMS has chosen the “right” data to materialize, it can use these XML views to
answer some or most of the frequent and important queries on the data source without
accessing the relational data. We show that our approach can significantly reduce the time
to process frequent and important queries on relational databases with XML interfaces.
Our approach is not the first view-based approach to the problem of efficiently
computing XML data on relational databases. To clarify how our approach differs from
previous work, we use the terms (1) view definitions, which are data specifications given in
terms of stored data (or possibly in terms of other views), and (2) view answers, which are
the data that satisfy the definition of a view on the database. In past work, researchers have
looked into the problem of efficiently evaluating XML queries over XML view definitions
of relational data (e.g., SilkRoute [8] or XPERANTO [16]). We build on the past work by
adding a new component to this framework: We incrementally materialize XML view
answers to frequent and important XML queries on a relational database, using a learning
approach. To the best of our knowledge, we are the first to propose this approach.
The following are the contributions of this paper:
• We develop a learning-based approach to materializing relational data in XML.
• We propose a system architecture that takes advantage of the materialized XML
to reduce the total query-execution times for incoming query workloads.
• We show how to transform a purely relational database system to accommodate
materialized XML and our system architecture.
Using our approach may result in significant efficiency gains on relatively static
databases. Moreover, it is possible to combine our solution with the orthogonal approaches
described in [8,16], thus achieving the combined advantages of the two solutions.
The remainder of the paper is organized as follows. Section 1.1 discusses related work.
In Section 2 we formalize the problem and outline our approach. In Sections 3 and 4, we
describe the system architecture and the learning algorithm. Section 5 describes
experimental results. We discuss the approach in Section 6, and conclude with Section 7.

1.1 Related Work

The problem of XML query answering has recently received a lot of attention. [11, 13]
propose a logical foundation for the XML data model. [3] describes a system for data-
model management, with tools to map schemas between XML and relations. [6] looks into
developing XML documents in a normal form that guarantees some desirable properties of
the document format. [7] proposes an approach to efficiently representing and querying
semistructured Web data. [10] proposes an XML data model and a formal process, to map
Web information sources into commonly perceived logical models; the approach provides
for easy and efficient information extraction from the World-Wide Web.
[14] describes an approach to XML data integration, based on an object-oriented data
model. [15] proposes an XML data-management system that integrates relational DBMS,
Java and XSLT. [20] reports on a system that manages XML data based on a flexible
mapping strategy; given XML data, the system stores data in relations, for efficient
querying and manipulation. XCache [2] describes a web-based XML-querying system that
supports semantic caching; ACE-XQ [4] is a caching system for queries in XQuery; the
system uses sophisticated cache-management mechanisms in the XML context.
SilkRoute [8] is a framework for publishing relational data using XML view definitions.
The approach incorporates an algorithm for translating queries from XQuery into SQL and
an optimization algorithm for selecting an efficient evaluation plan for the SQL queries.
XPERANTO [16] is an XML-centric middleware layer that lets users query and structure
the contents of a relational database as XML data and thus allows them to ignore the
underlying relations. Using the XPERANTO query facility and the default XML view
definition of the underlying database, it is possible to specify custom XML view
definitions that better suit the needs of the applications.
The motivation for using views in query processing comes from information-integration
applications; one approach, called data warehousing [17], uses materialized views.
[1,5,19,21] propose a unified approach to the problem of view maintenance in data
warehouses. In our work, we use a learning method called concept, or rule, learning [12].

2 Problem Specification and Outline of the Proposed Approach

In this section we specify the problem of improving the efficiency of answering queries on
XML interfaces of relational databases, and outline our solution. An XML-relational data
source (“data source”) comprises a relational database system and an XML interface. For a
query in an XML query language, to evaluate the query on a data source means to obtain
an XML answer to the query via the XML interface of the source. Suppose there is a finite
set of important queries, with associated relative weights, that users or applications
frequently pose on the data source. We call these queries a query workload. In our cost
model, the cost of evaluating a query on a data source is the total time elapsed between
posing the query on the source and obtaining an answer to the query in XML. The total
cost of evaluating a query workload on a data source is the weighted sum of the costs of
evaluating all workload queries, using their relative weights. We consider the problem of
improving the efficiency of evaluating a query workload on a data source; the goal here is
to reduce the total cost of evaluating a given query workload on a given data source.
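In symbols (our notation for the definition above), with workload W = {q1, ..., qk}, weights
w1, ..., wk, and cost(qi, D) the time to evaluate query qi on data source D:

cost(W, D) = w1 * cost(q1, D) + ... + wk * cost(qk, D).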
To improve the efficiency of evaluating a query workload on a data source, we propose
an approach based on incrementally materializing XML views of workload-relevant data.
To materialize a view is to compute and store the answer to the view on the database. We
materialize views in XML rather than in relations, to reduce or eliminate the time required
to translate (1) the workload queries from an XML query language into SQL, and (2) the
relational answers to the queries into XML. In the proposed system architecture, when
answering a query, the query-processing engine first searches the materialized XML
views, rather than the relational tables; if the query can be answered using the views, there
is no need to access the underlying relations. Using this approach may result in significant
efficiency gains when the underlying relational data do not change very often.
In our approach, we need to decide which data to materialize in XML. We use a
learning-based approach to materialize only the data that is needed to answer the workload
queries on the data source. In database systems, it is common to maintain statistics on the
stored data, for the purposes of query optimization [9]. We maintain similar statistics on
access rates to the data in the stored relations, and materialize the most frequently accessed
tuples in XML. We use learning techniques combined with the access-rate statistics to
decide when and how to change, incrementally, the set of records materialized in XML.
We manage the materialized data using the concept of clustering. In our approach,
clustering means combining related XML records into a single materialized XML
structure. These XML structures are stored in a special relation and can be queried using
the data source’s XML query language. (In the remainder of the paper we assume that
XQuery is the language of choice.) Storing the most frequently accessed tuples in
materialized XML clusters increases the probability that future workload queries will be
satisfied by the clusters. To answer those queries that are not satisfied by the XML
clusters, we use the relational query-processing engine.
3 The System Architecture

We now discuss the architecture of the system. We describe the query-processing
subsystem, the required changes to the schema of the originally relational data source, and
the process of generating workload-related XML data from the stored relations.

3.1 The Query-Processing Subsystem

In this section we describe a typical query path taken by an input query; see Fig. 1.

Fig. 1. The Query-Processing subsystem

The solid lines in Fig. 1 show the primary query path, which is taken for all queries on
the data. If a workload query can be answered by the materialized XML clusters, then only
the primary path is taken. Otherwise, the query next follows the secondary query path,
shown in dotted lines in Fig. 1; here, the input query is pushed down to the relational level
and is answered using the stored relations, rather than the materialized XML.
The XML clusters are stored as values of an attribute in a special relation. The system
queries the relation in SQL to find the most relevant cluster, and then poses the XQuery
query on the cluster. The schema for the clusters is specified by the database administrator.
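To make the two query paths concrete, the following Java sketch (our illustration, not the
authors' code) shows how a middleware component might route a workload query: it first looks
up a relevant cluster with SQL and poses the XQuery query on it, and falls back to the
relational path only when no cluster can answer the query. Table and column names follow the
CDDB example of Section 5 (with xml_data standing for the attribute that stores a cluster);
the helpers evaluateXQuery and toXml stand for whatever XQuery processor and tuple-to-XML
converter the middleware uses, and are assumptions rather than parts of the paper.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryRouter {

    private final Connection conn;   // JDBC connection to the relational DBMS

    public QueryRouter(Connection conn) { this.conn = conn; }

    /** Primary path: the materialized XML clusters; secondary path: the stored relations. */
    public String answer(String keyword, String xquery) throws SQLException {
        // Primary path: find a relevant cluster with SQL (a crude keyword match, for illustration).
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT xml_data FROM XmlDiscTrack WHERE xml_data LIKE ?")) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return evaluateXQuery(rs.getString("xml_data"), xquery);  // pose XQuery on the cluster
                }
            }
        }
        // Secondary path: push the query down to the stored relations and convert the answer to XML.
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT d.cd_title, t.track_title FROM Disc d, Tracks t "
              + "WHERE d.cd_id = t.cd_id AND d.cd_title LIKE ?")) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                return toXml(rs);
            }
        }
    }

    // Hypothetical helpers: plug in an XQuery engine and a tuple-to-XML serializer here.
    private String evaluateXQuery(String clusterXml, String xquery) { return clusterXml; }
    private String toXml(ResultSet rs) throws SQLException { return "<result/>"; }
}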

3.2 Setting up Materialized XML Clusters

In this section we describe how to set up materialized XML clusters, by transforming the
relational-database schema to accommodate XML. For simplicity, we use a schema with
just two relations, R(A1,…, An) and S(B1,…, Bm). A1 is the primary key of the relation R.

3.2.1 Modifying the given relational schemas

In our approach, for tuples of certain relations we keep track of how many times each tuple
is accessed in answering the workload queries. To enable these access counts, we change
the schema of the relational data source, by adding an extra attribute to the schema of one
or more of the stored relations. The most likely candidates for this schema change are the
relations of interest, which are relations that have high access rates, primarily large
relations that are involved in expensive joins. For instance, suppose we have a query that
involves a join of the relations R and S. If the relation R is large, the query would be
expensive to evaluate, hence we consider R as a suitable candidate for the schema change.
(Alternatively, the database administrator can make the choice of the schema to modify.)
Suppose we decide to add an attribute A(n+1) to the schema of the relation R; we will
store access counts for the tuples in relation R as values of this attribute. R(A1,…,An, A(n+1))
is the schema of the modified relation. Initially, the value of A(n+1) is NULL in all tuples.

3.2.2 Creating the relations for the materialized XML clusters

We now define the schema of the relation T that will store the materialized XML clusters,
as T(A1, C). Recall that A1 is the primary key of the relation R; using this attribute in the
relation T helps us index the materialized XML clusters in the same way as the relation R.
The attribute C is used to store the materialized XML clusters in text format.
To summarize, we set up materialized XML clusters by doing the following:
1. Select a relation of interest (R in the example) to modify.
2. Add an access-count attribute to the schema of the selected relation.
3. Create a new relation (T in the example), to hold the materialized XML version
of the data in the selected relation of interest (R in the example).
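A minimal JDBC sketch of steps 2 and 3 for the example schema above (step 1 is a design
decision); the column name access_count, the Oracle-style types, and the connection string
are our assumptions, and any relational DBMS with a large character type would do:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetupClusters {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//host:1521/db", "user", "password");
             Statement stmt = conn.createStatement()) {

            // Step 2: add the access-count attribute A(n+1) to the relation of interest R.
            // Counts start out as NULL, as described in Section 3.2.1.
            stmt.executeUpdate("ALTER TABLE R ADD (access_count NUMBER)");

            // Step 3: create the relation T(A1, C) that holds the materialized XML clusters;
            // A1 mirrors R's primary key, and C stores a cluster in text format.
            stmt.executeUpdate("CREATE TABLE T (a1 NUMBER PRIMARY KEY, c CLOB)");
        }
    }
}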

4 The Learning Algorithm

In this section we describe a learning algorithm that populates and incrementally maintains
the XML clusters. We first describe how to select relational tuples for materialization, and
then explain our clustering strategy for building an XML tree of “interesting records.”
Our general approach is as follows. When answering queries, we first pose each query
on the materialized XML clusters in the relation T that we have added to the original
stored relations. Whenever a query cannot be answered using the materialized XML
clusters (or at system startup, see next paragraph), the query is translated into SQL and
pushed down to the stored relations. Each time this process is activated, the system
increments access counts for all tuples that contribute to the answer to the SQL query.
At system startup, the relation T that holds the materialized XML is empty. As a result,
all incoming queries have to be translated into SQL and pushed to the relational query-
processing engine. The materialization phase starts when the access counts in the relations
of interest exceed an empirically determined threshold value (see Section 4.3); all tuples
whose access counts are greater than the threshold value are materialized into XML. The
schema for the materialized XML is specified by the input XQuery workload.
(Alternatively, it can be specified by the database administrator.) As the learning algorithm
executes over an extended time period, the most frequently accessed tuples in the relations
of interest are materialized into XML and stored in the relation T.

4.1 Learning I: Discovering Access Patterns in the Relations of Interest

To incrementally materialize and maintain XML clusters of workload-relevant data, the
system periodically runs a learning process that translates frequently accessed relational
tuples into XML and reorders the resulting records in a hierarchy of clusters. We now
describe the first stage of the learning process, where the system discovers access patterns
in the relations of interest by using the access-count attribute. Once the access pattern is
established, the system translates the most frequently accessed tuples into XML. To obtain
the current access pattern, the system needs to execute the following steps.
1. (This step is executed during the system startup.) Input an expected query stream
and set up the desired output XML schema.
2. Pose the incoming workload queries on the stored relations; in answering the
queries, increment the access counts for those tuples in the relations of interest
that contribute to the answers to the queries.
During the system startup we use an expected, rather than real, query stream to
determine access patterns in the relations of interest. For example, if each workload query
may use one of the given 250K keywords with given frequencies, then for our expected
query stream we select the 1000 most-frequent keywords.
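A sketch of step 2 for the CDDB example of Section 5 (our illustration; how the real system
identifies answer tuples may differ): after the relational engine answers a pushed-down
keyword query, the middleware increments the access count of every Disc tuple that
contributed to the answer. COALESCE turns the initial NULL counts of Section 3.2.1 into 0.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccessCounter {

    /** Increment the access count of every Disc tuple that matched the workload keyword. */
    public static void recordAccess(Connection conn, String keyword) throws SQLException {
        String sql = "UPDATE Disc SET count = COALESCE(count, 0) + 1 WHERE cd_title LIKE ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "%" + keyword + "%");
            ps.executeUpdate();
        }
    }
}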

4.2 Learning II: Materializing XML and Forming Clusters

Once the first stage of the learning process has discovered the access patterns in the
relations of interest, the system performs, in several iterations, the following steps (see the sketch after this list):
1. To generate the materialized XML records, retrieve from the relations of interest
all tuples whose access counts are greater than the predefined threshold value.
2. Translate the data into XML and store in the materialized XML relation.
3. Form clusters (also see section 4.3):
a. Find all relational tuples that are related to the materialized XML, w.r.t.
the workload queries.
b. Select those of the tuples whose access counts exceed the threshold
value, and translate them into XML.
c. Cluster the tuples and materialized XML into a single XML tree.
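A minimal sketch of steps 1 and 2 for the CDDB example, with the related Tracks tuples folded
directly into each record (our illustration; the threshold, the column names, and the helper
buildClusterXml, which stands in for the schema-driven XML construction and for the fuller
cluster formation of step 3 and Section 4.3, are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Materializer {

    /** Materialize every Disc tuple whose access count exceeds the threshold (learning stage II). */
    public static void materialize(Connection conn, int threshold) throws SQLException {
        String pick  = "SELECT cd_id, cd_title, count FROM Disc WHERE count > ?";
        String store = "INSERT INTO XmlDiscTrack (cd_id, count, xml_data) VALUES (?, ?, ?)";
        try (PreparedStatement select = conn.prepareStatement(pick);
             PreparedStatement insert = conn.prepareStatement(store)) {
            select.setInt(1, threshold);
            try (ResultSet discs = select.executeQuery()) {
                while (discs.next()) {
                    String cdId = discs.getString("cd_id");
                    // Steps 1-2: translate the tuple and its Tracks tuples into one XML record.
                    String cluster = buildClusterXml(conn, cdId, discs.getString("cd_title"));
                    insert.setString(1, cdId);
                    insert.setInt(2, discs.getInt("count"));
                    insert.setString(3, cluster);
                    insert.executeUpdate();
                }
            }
        }
    }

    /** Hypothetical helper: joins one CD with its Tracks tuples and serializes the result as XML. */
    private static String buildClusterXml(Connection conn, String cdId, String title) throws SQLException {
        StringBuilder xml = new StringBuilder("<disc cd_title=\"" + title + "\">");  // no XML escaping, for brevity
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT track_title FROM Tracks WHERE cd_id = ?")) {
            ps.setString(1, cdId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    xml.append("<track>").append(rs.getString("track_title")).append("</track>");
                }
            }
        }
        return xml.append("</disc>").toString();
    }
}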

4.3 The Clustering Phase

In our selective materialization, we use clustering to increase the scope of materialized
XML beyond the relations of interest, by incrementally adding to the XML records
“interesting records” from other relations. The criterion for adding these interesting
records is the same as the criterion for materializing relational tuples in XML. More
precisely, the relations with the most frequently accessed records are selected in the
descending order of access frequency. For example, if there are three relations R1, R2, R3,
in descending order of tuple-access frequencies, then we can form clusters, starting with R1
and R2, then R2 and R3, and so on.
The relation T now contains a single XML structure, which holds related records with
high access rates. In each cluster, the records are sorted in the order of their access counts.
In the current implementation, the schema for the cluster is provided as an external input
(see Fig. 1). Choosing cluster schemas automatically is a direction of future work.
We now explain, using an example, how to form hierarchies of clusters. Consider a database
with four relations, R1-R4, in the descending order of tuple-access frequency. We first
modify the relation R1, to store the XML clusters generated from the tuples retrieved from
a join of R1 and R2 on some attribute. Similarly, we modify R2 to store a join of R2 and R3,
and so on. With every join of Rn and Rn+1, we form the most-frequently accessed clusters;
the clusters form a hierarchy w.r.t. their access rates: For example, the cluster formed from
R1 and R2 will have higher access rates than the cluster for R2 and R3. In our experiments,
we have explored the first level of clustering for simple queries; see Section 5. We are
working on implementing multiple levels of clustering for more complex queries.
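The pairwise scheme can be sketched as follows (our illustration; the shared join attribute,
the access_count column, and the SELECT * shape of the join are assumptions): given the
relations of interest in descending order of tuple-access frequency, each consecutive pair is
joined, and only the frequently accessed join results become the clusters of that level.

import java.util.ArrayList;
import java.util.List;

public class ClusterHierarchy {

    /**
     * Build one join query per level of the cluster hierarchy. The relations must be listed in
     * descending order of tuple-access frequency (e.g. R1, R2, R3, R4); level i is expected to
     * be accessed more often than level i + 1.
     */
    static List<String> levelQueries(List<String> relations, String joinAttr, int threshold) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i + 1 < relations.size(); i++) {
            String left  = relations.get(i);
            String right = relations.get(i + 1);
            // Only the most frequently accessed tuples of the pair end up in the level's cluster.
            queries.add("SELECT * FROM " + left + " l JOIN " + right + " r ON l." + joinAttr
                      + " = r." + joinAttr + " WHERE l.access_count > " + threshold);
        }
        return queries;
    }
}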
In our approach we determine the threshold value empirically: At system startup time,
we repeat the learning process several times to arrive at a suitable value. The choice of the
threshold value is a tradeoff between larger materialized views and better query-execution
times: A lower threshold value means that more tuples will be materialized as XML, so that
more queries can be satisfied by the XML views. A higher threshold value prevents most of the
relational data from being selected for materialization, which limits the number of queries
that can be answered using the views. The key is to strike a balance between the point at
which the system materializes tuples and the proportion of records to be materialized. In
our future work, we intend to make the choice of this threshold value dynamic.

5 Experimental Setup and Results

5.1 The Setup

The CDDB collection [22] is a database that stores information about CDs and CD tracks.
The CDDB schema comprises two relations, Disc(cd_id,cd_title,genre,num_of_tracks) and
Tracks(cd_id,track_title). (For simplicity, we omit other attributes of the relations in
CDDB.) The Disc relation has 250K tuples. Each CD has an average of 10 tracks, stored in
the Tracks relation. Fig. 2 shows some tuples in the two relations in CDDB.
In our experiments, we used Oracle 9.2 on a Dell Server P4600 with Intel Xeon CPU at
2GHz and 2GB of memory running on Microsoft Windows 2000. We implemented the
middleware interface in Java using Sun JDK 1.4, and ran it on an Intel Pentium II 333MHz
machine with 128MB of memory on Red Hat Linux 7.3. We conducted a significant
number of runs to ensure that the effect of network delays on our experiments is minimal.

Fig. 2. Some tuples in relations Disc and Track in the CDDB database

Fig. 3. Data in the Disc relation with the modified schema

Fig. 4. Schema for the relation that holds the materialized XML and an example of a simple cluster

To determine access patterns for the Disc relation, we added a new attribute, count, to
the schema; this attribute holds an access count for each CD record. The rest of the
database schema is unchanged. (Section 3.2.1 explains how to choose relations for the
schema change.) Fig. 3 shows the tuples in the Disc relation with the modified schema.
Fig. 4 shows the table XmlDiscTrack. This new relation holds materialized XML as text
data in record format. The process of defining this materialized table is explained in
Section 3.2.2. In the XmlDiscTrack relation that we create in the CDDB database,
attributes cd_id and count are the same as in the Disc relation. The value of the count
attribute in XmlDiscTrack equals the value of count in the corresponding tuple in the Disc
relation, at the point in time when that tuple was materialized as XML. The XML attribute
in XmlDiscTrack holds the materialized XML. For example, the value of the XML
attribute, in the tuple for the 'Air Supply' CD in XmlDiscTrack, is shown in Fig. 4.
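Under our reading of the schema, such a materialized record has roughly the following shape
(the element names and the '...' placeholders are illustrative assumptions, not taken from
Fig. 4):

<disc cd_id="..." cd_title="Air Supply">
  <track>...</track>
  <track>...</track>
</disc>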
Workload queries: The workload queries in our experiments use CD titles as keywords.
In our architecture, the query-processing engine first tries to answer each workload query
by searching the XML clusters in the relation XmlDiscTrack; if it fails to find an answer
there, the engine then searches the Disc table using SQL. The two query paths are shown
in Fig. 1 in Section 3. In learning stage I, whenever the system answers an input query
using the original stored relations in the CDDB database, it increments the access count for
each answer tuple in the Disc table. For learning stage II to be invoked, the access counts
have to reach the threshold value; see Section 4.2. In the second stage of learning, we
materialize in XML all the tuples in the Disc relation whose access counts exceed the
threshold value. The generation of materialized XML is explained in Section 4. Fig. 4
shows the materialized XML for the CD “Air Supply” generated from Disc and Track.
Cluster formation: This phase is invoked for every tuple in the relation XmlDiscTrack
that holds materialized XML. The XML shown in Fig. 4 is suitable only for answering queries
that ask for the tracks of that CD; this restriction limits the scope of the approach.
Hence, we form clusters. Clusters are formed by identifying related records. The algorithm
for selecting these related records is explained formally in Section 4. The tuples in the Disc
relation that match ‘Air Supply’ and that have their access counts above the threshold value
are chosen to form the clusters. These tuples are converted to XML and merged into the
original structure. An example of the merged structure is shown in Fig. 5. Once the
clustering phase is completed, the XML shown in Fig. 5 replaces the XML in Fig. 4.

Fig. 5. An example of materialized clustered XML

5.2 Experimental Results

In this section we show the results of our experiments on the feasibility of our learning-
based materialization approach.
• Comparing the efficiency of querying materialized XML to the efficiency of
getting answers to SQL queries on the stored relational data.
The objective of this experiment was to analyze whether XQuery-based querying is
effective on materialized XML views, as compared to using SQL on the stored relations.

Fig. 6. Comparison between using random queries on materialized XML and on relational data

Fig. 7. Average query times for a random set of 1000 repeated queries on relational data and an
analysis of the time required to convert this relational data to XML

Fig. 6 shows query-execution times for 5000 XML records, for the query SELECT *
FROM Disc, Track WHERE Disc.cd_id = Track.cd_id AND Disc.cd_title LIKE '%Eddie
Murphy%’. Interesting tuples in the join of Disc and Track are stored in XML. The
relational tables hold 2.5 million tuples (250K CDs times 10 tracks). The cluster records
are similar to the XML shown in Fig. 5. The graph is a plot of query-execution times for
XQuery queries based on the attribute cd_title of the Disc relation. The experiment shows
that processing a query on an XML view is faster than using SQL on the relations and then
converting the answer to XML. Fig. 6 shows that executing SQL queries is more time-
consuming than executing their XQuery counterparts on materialized XML.
In pushing XQuery queries to the relational data, converting the answers into XML is a
major overhead. We analyze the overhead in Fig. 7, which shows that the process of
converting answer tuples into XML is the most expensive part of answering queries.
Hence, it would be beneficial if such data were to be materialized.
• Analyzing the maximum time spent in converting query answers to XML.
The objective of this experiment was to analyze the time spent on translating
relational query answers into XML. The graph in Fig. 7 is a plot of query-execution times
for SQL queries based on the attribute cd_title of the Disc relation. The graph shows, as a
solid line, the mean execution times for relational queries plus the times to convert the
answers into XML. We see that of the total time of around 190 ms, converting relational
data into XML takes around 60 ms (the dotted line). The relational query itself thus takes
190 ms - 60 ms = 130 ms to execute, while converting the relational data to XML adds an
overhead of 60 ms. These results are the motivation for using materialization techniques.
• Simulation runs to show the decrease in total query-execution times when
querying the materialized XML alongside the stored relational data.

Fig. 8. Average query-execution times for a randomized set of 1000 repeated queries on a
combination of relational and materialized data

Fig. 8 shows 30 simulation runs for a query workload of 1000 randomly selected CD
titles. The X-axis shows the query ID, while the Y-axis shows the query-execution times.
The vertical dotted lines show the points at which XML materialization took place. (Recall
that the system periodically runs the learning algorithm.) It can be seen in Fig. 8 that after
every learning stage, the slope of the curve falls. Intuitively, after new learning has taken
place, the XML clusters can satisfy a higher number of queries, with higher efficiency.

6 Discussion

The proposed approach is to store materialized XML views in a relational database using
learning. One extreme of the approach is to materialize the entire relational database as
XML and then use a native XML engine to answer queries. This way, we would be able to
avoid the overhead of translating all possible queries on the data source into SQL, and of
translating the relational answers to the queries into XML. However, query performance
might degrade considerably, as XML query-answering techniques are generally slower than their
relational counterparts. In addition, the system would have to incur a significant overhead
of keeping the XML consistent with the underlying relations.
In Section 4 we described the process of grouping together related records in XML
clusters. This approach allows a database system to incrementally find an optimal
proportion of XML records that can be accessed faster than the relational tables. This
optimal proportion can be arrived at by varying the size of the clustered XML and the
threshold value. Additional improvements can be made when user applications maintain
local caches: It may be beneficial to prefetch the XML data in the application’s cache, so
that future queries from the application have a higher chance of being satisfied locally.
In our approach, the added counters and flags in the relations have to be updated
frequently and thus create an overhead. In our future work, we plan to reduce the overhead
by updating tuple-access counts offline or during periods of lower query-loads.
We materialize only frequently-accessed tuples; thus, only a fraction of the database is
materialized as XML at any given time. (The clusters are recomputed from scratch every
time the learning phase is invoked.) The advantage of the learning approach is to balance
the proportion of data in relations and XML, by materializing the tuples that are in the
answers to multiple queries.
As the materialized XML is generated based on the access counts of relational tuples,
there may be queries that need to access both the materialized XML and the relational database.
We plan to explore how to handle such queries in our future work.

7 Conclusions and Future Work

We have described a view- and learning-based solution to the problem of reducing total
query-execution times in relational data sources with XML interfaces. Our approach
combines learning techniques with selective materialization; our experiments show that it
can prove beneficial in improving query-execution speeds in relatively static databases.
This paper describes an implementation that is external to the database engine. We are
currently working on incorporating our approach inside a relational database-management
system. We are looking into automating schema definition for materialized XML clusters,
by using the information about past query workloads and the relations accessed by these
workloads. We are working on developing a learning approach to selecting “interesting
records” for XML clusters. We plan to implement a dynamic approach to selecting the
threshold value in XML materialization. We plan to devise better strategies for (1)
prioritizing XML records within clusters, and (2) automatically dematerializing obsolete
XML data. Finally, we plan to automate the choice of the relations of interest, given a
query workload.

References

1. J. Chen, S. Chen, and E.A. Rundensteiner. A transactional model for data warehouse
maintenance. In Proc. of the 21st Int’l Conference on Conceptual Modeling (ER), 2002.
2. L. Chen, E.A. Rundensteiner, and S. Wang. XCache: A semantic caching system for XML
queries. In Proc. 2002 ACM SIGMOD International Conference on Management of Data, 2002.
3. K.T. Claypool, E.A. Rundensteiner, X. Zhang, H. Su, H.A. Kuno, W.C. Lee, and G. Mitchell.
Gangam — a solution to support multiple data models, their mappings and maintenance. In Proc.
2001 ACM SIGMOD International Conference on Management of Data, 2001.
4. L. Chen, S. Wang, E. Cash, B. Ryder, I. Hobbs, and E.A. Rundensteiner. A fine-grained
replacement strategy for XML query cache. In Proc. Fourth ACM CIKM International Workshop
on Web Information and Data Management (WIDM 2002), pages 76–83, 2002.
5. J. Chen, X. Zhang, S. Chen, A. Koeller, and E.A. Rundensteiner. DyDa: Data warehouse
maintenance in fully concurrent environments. In Proc. ACM SIGMOD, 2001.
6. D.W. Embley and W.Y. Mok. Developing XML Documents with Guaranteed “Good” Properties.
In Proc. 20th International Conference on Conceptual Modeling (ER), pages 426–441, 2001.
7. I.M.R.E. Filha, A.S. da Silva, A.H.F. Laender, and D.W. Embley. Using nested tables for
representing and querying semistructured web data. In Proceedings of the Advanced Information
Systems Engineering, 14th International Conference (CAiSE 2002), 2002.
8. M. Fernandez, Y. Kadiyska, D. Suciu, A. Morishima, and W.C. Tan. SilkRoute: A framework for
publishing relational data in XML. ACM Trans. Database Systems, 27(4):438–493, 2002.
9. Yannis E. Ioannidis. Query optimization. In Allen B. Tucker, editor, The Computer Science and
Engineering Handbook, pages 1038–1057. CRC Press, 1997.
10. Z. Liu, F. Li, and W.K. Ng. Wiccap data model: Mapping physical websites to logical views. In
Proc. 21st International Conference on Conceptual Modeling (ER), 2002.
11. Mengchi Liu. A logical foundation for XML. In Proc. Advanced Information Systems
Engineering, 14th International Conference (CAiSE 2002), pages 568–583, 2002.
12. Tom M. Mitchell. Generalization as search. Artificial Intelligence, 18:203–226, 1982.
13. Mengchi Liu and Tok Wang Ling. Towards declarative XML querying. In Proc. 3rd International
Conference on Web Information Systems Engineering (WISE 2002), pages 127–138, 2002.
14. K. Passi, L. Lane, S.K. Madria, B.C. Sakamuri, M.K. Mohania, and S.S. Bhowmick. A model for
XML schema integration. In Proc. 3rd Int'l Conf. E-Commerce and Web Technologies, 2002.
15. Giuseppe Psaila. ERX: An experience in integrating entity-relationship models, relational
databases, and XML technologies. In Proc. XML-Based Data Management and Multimedia
Engineering EDBT workshop, 2002.
16. J. Shanmugasundaram, J. Kiernan, E. J. Shekita, C. Fan, and J. Funderburk. Querying XML
views of relational data. In Proc. 27th Int’l Conference on Very Large Data Bases, 2001.
17. Jennifer Widom. Research problems in data warehousing. In Proc. Fourth International
Conference on Information and Knowledge Management, pages 25–30, 1995.
18. Extensible Markup Language (XML). http://www.w3.org/XML.
19. X. Zhang, L. Ding, and E.A. Rundensteiner. Parallel multi-source view maintenance. VLDB
Journal: Very Large DataBases, 2003. (To appear).
20. X. Zhang, M. Mulchandani, S. Christ, B. Murphy, and E.A. Rundensteiner. Rainbow: mapping-
driven XQuery processing system. In Proc. ACM SIGMOD, 2002.
21. Xin Zhang and Elke A. Rundensteiner. Integrating the maintenance and synchronization of data
warehouses using a cooperative framework. Information Systems, 27:219–243, 2002.
22. The CDDB database. http://www.freedb.org.
