Materialized Views
Materialized views have been available in Oracle for quite a while, under their alter ego
"Snapshots," and OLTP practitioners will be familiar with them through their replication
abilities. In the world of data warehousing they serve a rather different function, although in an
abstract sense they are doing exactly the same thing: making a set of information available in
multiple places.
It is important to start with a solid conceptual understanding of what a materialized view (MV)
actually is. There are three elements to an MV:
The definition of a query.
The storage of that query's result set.
A wide range of metadata that controls and helps to define the MV.
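To make those elements concrete, here is a minimal sketch using a hypothetical SALES fact
table. The defining query is wrapped in the CREATE statement, the result set is stored at build
time, and the metadata is visible through the USER_MVIEWS dictionary view:

    -- Element 1: the defining query
    CREATE MATERIALIZED VIEW sales_by_product_mv
    BUILD IMMEDIATE                 -- Element 2: store the result set now
    REFRESH COMPLETE ON DEMAND
    AS
    SELECT product_id, SUM(amount_sold) AS total_sold
    FROM   sales
    GROUP  BY product_id;

    -- Element 3: some of the metadata Oracle maintains about the MV
    SELECT mview_name, refresh_method, last_refresh_date, staleness
    FROM   user_mviews
    WHERE  mview_name = 'SALES_BY_PRODUCT_MV';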
It is also important to grasp the limitations of MVs, which vary widely by Oracle version. When
you have understood the nature of MVs it is but a short step to the incorrect assumption that a
given MV ought to be capable of a particular function. That function may very well be available
in the next major or minor release of Oracle, but in your version, it might not. Indeed, it might be
documented to be available in the next version, but in your particular environment there might be
a bug - previously known or of your own discovery - that prevents that functionality from
operating as expected. Or at all.
Maybe the most important realization is that MVs are enormously complex when compared to
other features, such as bitmap indexes, and in conformance with the previously outlined principle
that "simple equals good, and complex equals bad," they must be approached with some degree
of caution.
In essence, there are three situations in which we might want to make the same set of information
available in multiple places.
The first of these is in summary tables. We provide summary tables in a data warehouse because
not every query is going to need the very detailed information that our fact tables may provide,
and because allowing high-level queries to scan a much smaller pre-aggregated data set reduces
our I/O burden.
The second situation is in dimension tables. The dimension tables contain lists of all the values in
the key columns of the fact tables, and maybe some more information on those values such as
descriptive text for codes.
The third situation is in the fact tables themselves. I will admit this is a bit of a stretch for many
people, but the fact tables are generally the cleansed and transformed versions of some kind of
source data set. That data set may be a set of OLTP database tables, or a set of flat files, but the
fact tables are still in some ways a duplication of the original data set.
A summary can be a very simple object to define. At their simplest they provide a restricted
list of the key columns of a fact table and an aggregation -- generally a SUM() -- of some or all of
the metric columns. Therefore, a fact table with seven dimensional keys and six metrics can be
aggregated to a summary table of four dimensional keys plus two metrics. In the process, the
table becomes "shorter" as well as "narrower," and one million rows may be reduced to thirty
thousand -- an aggregation ratio of about 33.
A more advanced form of summary table might include an aggregation on one of the dimensional
keys. A fact table with a dimensional key at the "time" level (e.g., "03-Jan-2004 10:51:22") may be
summarized as in the example above, but with the "time" key aggregated to an "hour," "day," or
"month" level, thus increasing the aggregation ratio.
These two forms might even be combined, with an original "store" key included in the summary
with the addition of a higher "region" key to allow faster region-based access to the table.
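Pulling these forms together, here is a hedged sketch of such a summary, with hypothetical table
and column names (a SALES fact table whose "time" key is rolled up to the month level, and
which is assumed to carry both the store and region keys):

    CREATE MATERIALIZED VIEW sales_month_summary
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    AS
    SELECT TRUNC(sale_time, 'MM') AS sale_month,   -- "time" key aggregated to month
           store_id,                               -- original low-level key retained
           region_id,                              -- higher-level key for region-based access
           product_id,
           SUM(sale_amount)   AS total_amount,     -- aggregated metrics
           SUM(sale_quantity) AS total_quantity
    FROM   sales
    GROUP  BY TRUNC(sale_time, 'MM'), store_id, region_id, product_id;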
There are two main features of MVs that make them an attractive option for providing summary
tables -- query rewrite and fast refresh.
Query Rewrite for Summary Tables
Query rewrite allows the optimizer to redirect a SQL SELECT statement aimed at a fact table so
that it addresses a summary table instead. It does this by comparing the cost of providing the
result from the original table with the costs of providing the result from one of many available
MVs, so needless to say, the cost-based optimizer (CBO) is a must. In fact, the CBO is essential
for nearly all data warehouse-specific features, so if you are one of those rule-based optimizer
fans you will just have to suck it up and make the change.
Although there are many hoops to be jumped through in configuring query rewrite, at the instance
level as well as the schema level, once it is up and running the functionality is robust, and
unlikely to cause sudden "feature-unexpectedly-stopped-working" surprises on a busy Monday
morning, providing that you maintain good statistics on the fact and summary tables and
partitions.
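Those hoops, in sketch form (the exact clauses and parameter values vary by version; the MV
name follows the earlier hypothetical example):

    -- Session/instance level: enable rewrite and set the integrity mode.
    ALTER SESSION SET query_rewrite_enabled = TRUE;
    ALTER SESSION SET query_rewrite_integrity = TRUSTED;

    -- The MV itself must permit rewrite.
    ALTER MATERIALIZED VIEW sales_month_summary ENABLE QUERY REWRITE;

    -- A query written against the fact table ...
    SELECT TRUNC(sale_time, 'MM'), SUM(sale_amount)
    FROM   sales
    GROUP  BY TRUNC(sale_time, 'MM');
    -- ... can now be silently redirected to SALES_MONTH_SUMMARY,
    -- which EXPLAIN PLAN will confirm by showing the MV in the plan.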
Fast Refresh for Summary Tables
Fast refresh is an essential mechanism for the maintenance of MVs. It allows MVs to be kept
current with respect to the master fact table by making as few changes as possible to the MV, in
contrast to the complete refresh, which would require that the entire master fact table be scanned
and re-aggregated every time it is modified.
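For fast refresh, the master table needs a materialized view log in which its changes are
recorded. A minimal sketch, again with hypothetical names (the exact log clauses required
depend on the version and on the MV's query, and an aggregate MV generally also needs
COUNT(*) and a COUNT() of each aggregated column in its SELECT list to be fast-refreshable):

    -- Record changes to the fact table for use by fast refresh.
    CREATE MATERIALIZED VIEW LOG ON sales
      WITH ROWID, SEQUENCE (sale_time, store_id, region_id, product_id,
                            sale_amount, sale_quantity)
      INCLUDING NEW VALUES;

    -- Apply only the logged changes rather than re-aggregating the whole table.
    EXECUTE DBMS_MVIEW.REFRESH('SALES_MONTH_SUMMARY', 'F');   -- 'F' = fast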
The heavily partition-oriented methodology that Oracle suggests for maintaining fact tables
implies that MVs ought to be responsive to change at the partition level of the fact table, and
indeed Oracle offers Partition Change Tracking (PCT), at various levels of capability and
maturity dependent upon version, to allow this.
PCT offers the ability to associate summary table rows with the fact table partition from which
they were sourced, for the purposes of both fast refresh and query rewrite. Among other
restrictions (documented in the Oracle Data Warehousing Guide for your version), either
the partition key of the fact table or a column generated by the Oracle-supplied
DBMS_MVIEW.PMARKER function must be included in the SELECT list of the MV.
As for query rewrite, there are some hoops to be jumped through for an MV to be
fast-refreshable, and even more for it to be fast-refreshable through PCT.
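A hedged sketch of a PCT-trackable MV, once more with hypothetical names, using the
PMARKER approach:

    CREATE MATERIALIZED VIEW sales_pct_mv
    BUILD IMMEDIATE
    REFRESH FAST ON DEMAND
    ENABLE QUERY REWRITE
    AS
    SELECT DBMS_MVIEW.PMARKER(s.rowid) AS pmarker,  -- ties each row to its source partition
           s.product_id,
           SUM(s.sale_amount)   AS total_amount,
           COUNT(*)             AS row_count,       -- COUNT columns support fast refresh
           COUNT(s.sale_amount) AS amount_count
    FROM   sales s
    GROUP  BY DBMS_MVIEW.PMARKER(s.rowid), s.product_id;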
However ...
It's all very seductive stuff, this ability of your summary tables to be self-maintaining - change the
fact table, and they look after themselves. However, there are some issues that make it not quite
so clear-cut, and in my view, there are some disadvantages that need to be considered before we
leap headlong into this functionality.
In the next article, I will continue the topic of Materialized Views with some of their
disadvantages.
Performance Tuning
Performance tuning is a broad and somewhat complex topic area when it comes to
Oracle databases. Two of the biggest questions faced by your average DBA concern
where to start and what to do. All you may know is that someone (a user) reports a
problem about a slow or poorly performing application or query. Where do you even
begin when faced with this situation?
Priority   Description
First      Define the problem clearly and then formulate a tuning goal.
Fourth     Use the statistics gathered in the second step to get a conceptual picture of
           what might be happening on the system.
Fifth      Identify the changes to be made and then implement those changes.
Sixth      Determine whether the objectives identified in step one have been met. If they
           have, stop tuning. If not, repeat steps five and six until the tuning goal is met.
Reference: OCP: Oracle9i Performance Tuning Study Guide, SYBEX, Inc.
Interestingly, the emphasis on identifying which step an action falls under went away with
Oracle9i, and recitation of the principles is not a testable item. The title of the documentation
even changed between Releases 1 and 2, and that should send a clear signal that the art of
performance tuning (or, performance and tuning) is still just that - an art. When it comes to
instance tuning, the steps are even further reduced in Oracle10g.
The performance tuning guide for Oracle10g (Release 2) identifies the overall process as The
Oracle Performance Improvement Method. The steps have been expanded, but overall, remain
the same.
b. Get a full set of operating system, database, and application statistics from
the system when the performance is both good and bad. If these are not
available, then get whatever is available. Missing statistics are analogous to
missing evidence at a crime scene: they make detectives work harder and the
investigation more time-consuming.
2. Check for the top ten most common mistakes with Oracle, and determine if
any of these are likely to be the problem. List these as symptoms for later
analysis. These are included because they represent the most likely problems.
ADDM automatically detects and reports nine of these top ten issues. See
Chapter 6, "Automatic Performance Diagnostics" and "Top Ten Mistakes Found
in Oracle Systems".
5. Validate that the changes made have had the desired effect, and see if the
user's perception of performance has improved. Otherwise, look for more
bottlenecks, and continue refining the conceptual model until your
understanding of the application becomes more accurate.
6. Repeat the last three steps until performance goals are met or become
impossible due to other constraints.
These areas pretty much cover the Oracle RDBMS and instance from top to bottom. The
remainder of this article will focus on tuning SQL, or more precisely, preventing slow SQL
execution. Aren't these the same thing? Mostly yes, but a common approach in development is
making a statement perform well enough or fast enough. Not every statement has to be
optimal, but some thought has to go into coding each one. You do not have the time to
optimize hundreds or even thousands of SQL statements, but at the same time, there are
guidelines you can follow to avoid common mistakes and bad coding.
That is quite a list, and overall it is thorough and accurate. Step 9, referring to the use of the
rule-based optimizer, may create a reliance on a feature that Oracle has slated for
deprecation. You are eventually going to have to solve the problem
using the CBO, so you may as well start now and forget about the RBO. Step 14 should be
changed to something along the lines of "reduce I/O contention" instead of its currently stated
"separate index and table tablespaces" guidance.
In Closing
In the next article of this series, we will look at some of these tips in specific detail. For
example, advice given on many Web sites about how to improve a SQL statement's performance
typically includes "use bind variables." Well, I am sure many people have this question: "How,
exactly, do I do that?" It is actually pretty simple, as are many of the details of how to use
many of these tips.
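As a taste of what is to come, here is a hedged sketch of the bind-variable idea in SQL*Plus,
using the classic (and here hypothetical) EMP table:

    -- Without a bind variable, each distinct literal forces a fresh hard parse:
    SELECT ename FROM emp WHERE deptno = 10;
    SELECT ename FROM emp WHERE deptno = 20;

    -- With a bind variable, one shared cursor serves every value:
    VARIABLE dept_id NUMBER
    EXEC :dept_id := 10
    SELECT ename FROM emp WHERE deptno = :dept_id;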
A placeholder is a column for which you set the data type and value in PL/SQL that you define.
Placeholder columns are useful when you want to selectively set the value of a column (e.g.,
each time the nth record is fetched, or each time a record containing a specific value is
fetched). You can set the value of a placeholder column in the following places:
The Before Report trigger, if the placeholder is a report-level column.
A report-level formula column, if the placeholder is a report-level column.
A formula in the placeholder's group or a group below it (the value is set once for each record
of the group).
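For instance, here is a hedged sketch of a formula column's PL/SQL that sets a hypothetical
placeholder column, CP_HIGH_SAL_COUNT, as records are fetched (:sal is likewise a
hypothetical column reference):

    -- Formula column PL/SQL; increments the placeholder each time a
    -- record containing a qualifying value is fetched.
    function salary_formula return number is
    begin
      if :sal > 5000 then
        :cp_high_sal_count := nvl(:cp_high_sal_count, 0) + 1;
      end if;
      return :sal;
    end;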