Guide For Data Warehousing
Universal Database for Windows NT, up to enterprise-level systems running on enhanced parallel MPP architectures such as IBM SP2 with DB2 Universal Database. Database Management Systems play key roles in the long-term viability of Data Warehouses. Issues such as easy access to operational data, scalability, and
What is the problem at hand? Is it a problem
related to cycle time, customer satisfaction,
more cost-effective decision making, better
business intelligence, or a general lack of
information with which to make decisions on
any of the above?
Which of the departmental or enterprise goals
and responsibilities are directly related to the
problem at hand?
What are the Critical Success Factors: those things that must be done well to solve the problem?
Which of the organizational components
of the enterprise is best positioned to solve
the problem?
Which audience within this organization will best use this technology, and how: executives, financial analysts, scientists, engineers, clerical and administrative users, line managers, or others? And why do they need this (who will it benefit)?
How can this technology be used to solve the
problem? This is where the alignment of the
technology to the problem will occur.
Quantifying benefits
Once the benefits of the technology have been
aligned to the business objectives, they should
then be quantified. The reason for this is that
management must be able to answer the question: "How will we know if this project is successful?" The answer must fit the form: "This project will be successful if it allows us to achieve the following goals..."
In many organizations, quantifying benefits
takes the form of a financial analysis, specifi-
cally a Return on Investment (ROI) analysis.
A study by International Data Corporation
(IDC), co-sponsored by IBM, showed that
Data Warehousing could provide significant
and impressive ROI numbers. The study,
which included 62 participants, demonstrated
that the overall ROI on warehouse projects
was 401% with payback periods of two to
three years.
What was interesting about the study, however,
was that the smaller, departmental implemen-
tations, sometimes known as Data Marts, had
a 533% ROI, while the larger, enterprise
efforts showed an impressive 322% ROI.
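As a rough sketch, the arithmetic behind figures like these can be expressed as follows. The functions and dollar amounts are illustrative assumptions, not data from the IDC study:

```python
# Illustrative ROI and payback calculation (all figures hypothetical).
def roi_percent(total_benefits, total_costs):
    """ROI as net gain over the investment, expressed in percent."""
    return (total_benefits - total_costs) / total_costs * 100

def payback_years(total_costs, annual_benefit):
    """Years until cumulative benefits cover the initial investment."""
    return total_costs / annual_benefit

costs = 2_000_000          # assumed cost of a large warehouse project
benefits = 10_020_000      # assumed cumulative benefits over the period
print(f"ROI: {roi_percent(benefits, costs):.0f}%")        # ROI: 401%
print(f"Payback: {payback_years(costs, 800_000):.1f} years")
```

A 401% ROI thus means the project returned roughly four dollars of net benefit for every dollar invested, over and above recovering the investment itself.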
The IDC study identified three kinds of bene-
fits in the use of Data Warehouses.
Cost avoidance benefits. These are the benefits of no longer spending money that is presently spent on generating endless reports for end users. This included the resources expended by IT to generate answers to ad hoc
queries. In many ways, Data Warehousing rep-
resents a liberation for IT by providing users
with the tools they have needed over the years
to generate their own reports. By allowing the
users to do so, one eliminates the often end-
less loops of the user requesting a report from
IT, IT delivering the report, the user either not
approving the report due to some miscommu-
nication or changing their minds after seeing
what they asked for, etc.
Efficiency gains from increased productivity
among end-user professionals who gather and
analyze data. The analyst who must stop an
analysis to get information and who has to ask
someone else to get the information loses effi-
ciency in two ways. The first is in the loop
described above, where there might be sev-
eral iterations of request and response
between the analyst and IT before the analyst
is satisfied with the results of the request. The
second is in the interruption of the analysis,
and the inefficiencies associated with recover-
ing the thought processes that were
underway when the analysis was suspended.
Warehouse-dependent savings due to decisions based on analysis that could only come
from data in a warehouse. This is a quality-
of-decision issue that comes from the fact that
certain data associations may not exist in any
one operational system and therefore are not
available to the analyst. By building those
requisite associations and relations in the
Warehouse, a situation where the whole is
greater than the sum of its parts occurs and
those relationships allow the analyst to do
their work better, and become more effective.
This is not a case of people making bad deci-
sions prior to the advent of the system; this is
about giving people better tools to empower
them to do better work.
Not all benefit quantifiers are in terms of
ROI. A recent article described how firms
often eschew formal ROI analysis because
they consider the data warehouse a strategic
investment. In this case, these organizations
were convinced prima facie that the benefits
would be worth the costs. A cautionary word,
however: make sure that there is an under-
standing of the projected costs before starting
the project if there is no formal financial
measurement technique such as ROI.
Regardless of the culture of the organization,
whether it accepts soft benefits in its approval
process or not, be sure to quantify the antici-
pated benefits in business terms. Without this,
the organization will have no metrics to deter-
mine whether or not the project is successful.
[Fig. 2: bar chart of overall ROI for Data Warehouse vs. Data Mart implementations, on a scale of 0% to 600%.]
Perspectives on data
marts and warehouses
The IDC finding would lead one to believe that since the data marts delivered the highest ROI, a viable strategy could be to allow each department to implement its own data mart independently. The problem with this thinking is that it ignores the issue of cross-functional analysis. Let us examine what happens when Finance and Marketing each develop their own data marts independently of each other. Let us further assume that each keeps track of sales, but defines them differently: marketing defines a sale as a booking, while finance defines it as a payment. What are some of the potential ramifications of this situation? A primary consequence is that analyses of the same business dynamic (sales) may leave the two departments with significantly different views and conclusions. More importantly, suppose finance needs something from the marketing Data Mart, or vice versa. Without a unifying design or data standard, analysis of data that spans two or more organizational Data Marts will not be possible under this strategy. It is therefore important, in developing a Warehousing strategy, to understand the implications of an independent Data Mart strategy and factor these risks into its formulation. A more detailed discussion of independent vs. dependent Data Marts is presented in the Process Criteria section.
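The divergence between a booking-based and a payment-based definition of "sales" can be sketched in a few lines. The transaction fields and figures below are hypothetical; the point is that the same order log yields two very different totals under the two definitions:

```python
# Hypothetical order log: each order has a booking month and, once the
# customer pays, a payment month.
orders = [
    {"amount": 100, "booked": "1999-03", "paid": "1999-03"},
    {"amount": 250, "booked": "1999-03", "paid": "1999-04"},  # paid next month
    {"amount": 400, "booked": "1999-03", "paid": None},       # not yet paid
]

# Marketing's data mart: a sale is a booking.
marketing_sales = sum(o["amount"] for o in orders if o["booked"] == "1999-03")

# Finance's data mart: a sale is a payment.
finance_sales = sum(o["amount"] for o in orders if o["paid"] == "1999-03")

print(marketing_sales)  # 750
print(finance_sales)    # 100
```

Both departments are "right" by their own definitions, which is exactly why cross-mart analysis breaks down without a shared standard.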
"The heat is on to get warehouses up and running fast ... (but) the biggest benefit comes down the road, when you can support 20 different decision support applications with the same architecture." (GartnerGroup)
This quote from the GartnerGroup highlights
the need to develop an overall strategy and
architecture for the organization prior to or at
least in parallel with the development of the
first Warehouse project. At the beginning of
these projects, everyone is in a hurry to reap
the immediate benefits of the systems and
becomes impatient with the planning process,
which is sometimes seen to slow things down.
However, experience tells us the payoffs will
be that much greater, and subsequent
improvements will come that much sooner if
a sufficient amount of time is invested at the
front end of the project in defining an overall
enterprise architecture into which the Data
Marts can connect.
Understanding product
integration issues
The relative immaturity of the Data Warehousing industry has led to a proliferation of disconnected offerings in the marketplace. Many small vendors have joined the fray with products targeted at one or two elements of the Data Warehousing architecture. As a result, there are very few offerings on the market that answer all of the needs of a potential end user. This creates risk regarding how well different packages will integrate, which common platforms are supported, and so on. A trend is emerging in which a number of
large vendors form integrated product teams
with smaller vendors that hold a solid solution
in an important warehousing niche. For
example, if a large vendor has a strong DBMS solution, it might team with product vendors offering extract technology, data cleansing technology, or specialized OLAP tools. One
benefit of these vendor consortiums is to
agree on common approaches to sharing meta
data between tools and databases from differ-
ent vendors. Meta data, with its global signifi-
cance, is a key to product integration in
contemporary data warehousing solutions.
The investment these consortiums make in integrating their products benefits end users by resolving up front the tricky and risky issues associated with combining products from different vendors.
Analytical applications
One of the fastest growing trends in the Data
Warehousing market is the emergence of
packaged analytical applications. The ability
to script data analysis applications to derive
particular end results is paramount to the
evolution of effective data warehousing solu-
tions. In this regard, early data warehousing
solutions were all one-off customized applica-
tions built to deal with particular business
challenges. This continues to be an important
part of data warehousing activities. But in
addition, as applications and business needs
become more generalized and as custom
applications begin to be widely deployed,
there is a viable place for sets of off-the-shelf
analysis applications that can be deployed
quickly and effectively to meet many common
business problems. These off-the-shelf
applications also include the capacity for
semi-customization and are the trend of the
future. The costs of these applications should
continue to drop, while the real value will
continue to grow.
Dealing with cultural issues
Data Warehousing is about pooling resources
(data), which implies sharing, which in turn
can imply a loss of control, a concept some-
times inimical to many data owners. This
kind of organizational provincialism can
sometimes throw up impediments to a
Warehousing project and must be dealt with
in the early phases of the project.
Business unit and process
considerations
Technology is only useful to the extent that it
supports our ability to carry out our assign-
ments and achieve our corporate goals.
Therefore the introduction of any new
technology must be aligned with the business
units and processes which it is intended to
support. The IT organization may or may not
have all the requisite technical skills, but IT
will not successfully implement a warehouse
project without the involvement and commit-
ment of the business unit. Too often,
technology is developed independently of any
business process considerations, many times
with catastrophic results.
Choosing technical services
Very few, if any, IT organizations have the
requisite combination of skills and resources
required to perform all of the technology,
planning and implementation tasks required
for successful Data Warehouse projects. This
is not intended as an affront to IT organiza-
tions, but rather a simple observation that it
will take a broad spectrum of talent which
crosses many disciplines to make this work.
The chances that a single IT organization will
have all of these talents available for this pro-
ject at the same time are small. Therefore, at
some point in time, many organizations will
need to locate a partner to consult in the
technical planning for the Warehouse and
then eventually assist in the implementation
of the system. The partner selected must be able to operate within the constraints enumerated in the organization's Warehousing strategy, including the methodology chosen, implementation style chosen (see Process Criteria: Methods), and so on. Choosing
the wrong partner, for example one who has
Data Warehousing experience but not Data
Mart experience, or one who does not have
experience across the entire spectrum of
products and services, can increase the risks
associated with these systems. For example,
IBM offers specific services in conjunction
with its Visual Warehouse solution as
well as custom services for any scope of
warehouse implementation.
The state of standards
OMG committee for common
warehouse meta data
IBM, in conjunction with Oracle and Unisys,
is sponsoring an OMG (Object Management Group) subcommittee for the standardization
of Common Warehouse Meta Data. The objec-
tives of this committee are to establish an
industry standard for common warehouse
meta data interchange and to provide a
generic mechanism that can be used to trans-
fer a wide variety of warehouse meta data.
The intent is to define a rich set of warehouse
models to facilitate the sharing of meta data,
to adopt open APIs (Java and CORBA) for
direct tool access to meta data repositories,
and to adopt XML as the standard mecha-
nism for exchanging meta data between tools.
The subcommittee, chaired by IBM, is in the
process of accepting vendor proposals for the
above objectives.
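To illustrate the idea of tool-to-tool meta data exchange via XML (the element names below are invented for illustration; the real CWM and XMI specifications define their own schemas), here is a minimal sketch of one tool serializing a table description and a second tool parsing it back:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML meta data interchange: element names are illustrative,
# not the OMG-standardized schema.
def table_to_xml(table_name, columns):
    """Serialize a table description so another tool can re-read it."""
    root = ET.Element("WarehouseMetaData")
    table = ET.SubElement(root, "Table", name=table_name)
    for col_name, col_type in columns:
        ET.SubElement(table, "Column", name=col_name, type=col_type)
    return ET.tostring(root, encoding="unicode")

xml_doc = table_to_xml("SALES", [("ORDER_ID", "INTEGER"), ("AMOUNT", "DECIMAL")])

# A second tool parses the same stream back into its own catalog.
parsed = ET.fromstring(xml_doc)
print([c.get("name") for c in parsed.iter("Column")])  # ['ORDER_ID', 'AMOUNT']
```

The value of a standard lies precisely here: both tools must agree on the element names and structure, which is what the subcommittee's common warehouse models are meant to provide.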
OMG committee for XML/XMI
A related OMG subcommittee has been
formed to standardize XML Meta Data
Interchange (XMI). IBM, Unisys and other
industry leaders are also involved in this
work. IBM and Unisys have submitted a pro-
posal co-submitted by Oracle, DSTC, and
Platinum Technology and supported by
numerous other vendors. The proposal for an
XML Meta Data Interchange Format specifies
an open information interchange model that
is intended to give developers working with
object technology the ability to easily inter-
change meta data between modeling tools
and between tools and meta data repositories.
In a data-warehousing context, the proposal
defines a stream-based interchange format
for exchanging instances of UML models.
One of the inherent risks of a new technology
is the lack of standards, and Data Warehousing
is no exception. There are countless examples
of competing technologies that resolved them-
selves into one standard, and most of the time
the resolution creates winners and losers.
Eight-track tape owners were losers in the
technology battle with cassettes, Betamax
owners lost against VHS in video recording,
CP/M lost to MS-DOS, and the list goes on.
Obviously, one risk mitigation strategy in this
arena is to align the project with a big player
in the industry: one that will have an influence in determining the winning standards.
Another innovative mechanism which was
used in a major bank IT shop was to include
in the cost/benefit analysis an estimated cost
to bury the existing system and replace it with
a new one in case the wrong decision was
made. If the project still made sense after
including this cost, then the bank would go
ahead with it in spite of the standards risk.
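One way to fold that contingency into the cost/benefit analysis is sketched below. The dollar figures, and the probability weighting applied to the replacement cost, are hypothetical assumptions added for illustration:

```python
# Sketch of the bank's approach: include the estimated cost of scrapping
# and replacing the system in the cost/benefit analysis. All figures and
# the probability weight are hypothetical.
build_cost       = 1_500_000
expected_benefit = 4_000_000
replace_cost     = 1_200_000   # cost to "bury" the system and start over
p_wrong_standard = 0.25        # assumed chance the chosen technology loses

risk_adjusted_cost = build_cost + p_wrong_standard * replace_cost
net_benefit = expected_benefit - risk_adjusted_cost

# The project still makes sense if the benefits survive the standards risk.
print(net_benefit > 0)  # True
```

If the net benefit remains positive even after charging the project for a possible replacement, the standards risk has effectively been priced in.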
Lack of attention to training
For many years, exposure to the world of the database was limited to the inhabitants of the glass house: the IT department. Consequently, many end users are not familiar with the concepts behind navigating a data schema or unraveling the mysteries of joining tables via keys to get queries answered. A liberation of sorts will ensue from allowing users access to their own data. However, users must be prepared for life in this liberation and must be ready to accept responsibilities such as avoiding runaway queries. It is therefore incumbent on management to make sure that adequate training is provided so that users can use the system effectively. They have to use the system without getting so frustrated that they give up or, worse, poison the project by maligning it to others. They cannot be allowed to bring the system to its knees by constructing queries that run forever, or to base an analysis on faulty data because they were not familiar enough with the system to understand the information for which they were asking.
Process criteria
A number of elements of the Warehousing
strategy have to do with processes: processes
by which the strategy is implemented, and
processes which are supported by the
overall strategy.
Scope of effort: How big
should it be?
Many large warehouse projects have failed
because of an inability of the organization to
handle the size and scope of the project. It is
tempting to think of a single repository in which all of the enterprise's data problems can be solved in one fell swoop. And, if the organization can indeed come up with an integrated data model and solve all of the issues associated with such an architecture, the benefits are indeed significant. However, this is sometimes not realistic and not necessary for the problem at hand. Industry studies have estimated the average size of a large warehouse project to be nearly two million dollars1 with
a time to completion measured in years. No
doubt, some business problems require the
integration of data from many systems and
will require a global strategy. Many others
will not, and a simpler strategy, one of Data
Marts, will likely be less risky.
This question of project scope should be
readily answered from the exercise described
earlier on aligning the technology to the cor-
porate Vision and Mission. Once the questions
relating to who needs the technology and
which problems are being addressed are
answered, the scope should be relatively
straightforward to determine, which should
allow management to allocate appropriate
resources to manage it.
A strategy that entails an enterprise-wide scope has certain implications that should be understood. First and foremost, it requires integration and cooperation among multiple organizational elements. Many issues will arise regarding different definitions of similar or identical terms, competing objectives and agendas, data parochialism and an unwillingness to give up control. This is a situation that can be difficult to manage and successfully navigate. Oftentimes, change management is necessarily intertwined with this exercise, since the organization will have to wrestle with inter-departmental issues as described above. Management must assess whether or not the organization is ready to deal with these kinds of issues, or whether it would be better to wait until a more appropriate time.
A departmental, or Data Mart, approach is by definition smaller in scope, more focused in its outcome, quicker to achieve, and less costly.
However, there is a risk to developing Data
Marts in a vacuum, as described in more
detail in the method section below. Ideally
there is a need to think globally about future
integration with other departmental applica-
tions and Data Marts to avoid developing
Data Islands.
Data Marts have their place and present a
strong business case for starting with such an
implementation, but if an organization deter-
mines that an enterprise warehouse is the
appropriate strategy then there are many suc-
cessful models to emulate.
Implementation
approach options
Deciding whether a Data Warehouse or Data
Mart is right for the enterprise is an appropri-
ate beginning. It must, however, be followed
by a decision on whether to buy an integrated
package from a single vendor, engage a systems
integrator to bring together a collection of best
of breed products, or have the enterprise's IT
department create a best of breed solution.
The obvious advantage to dealing with an
organization which can offer a complete solu-
tion is, of course, faster realization of business
goals, usually at a reduced cost, and with a
good degree of certainty that the ultimate
solution will work (lower technological risk).
These benefits do come at a price, though,
and that price is the potential compromises
that have to be made in functionality and
performance by accepting the full suite of
products from a single vendor. Further, the
enterprise may have to adapt business
processes to fit the specific characteristics
of technology sourced from a single vendor.
The best of breed concept is certainly not new.
It has as its foundation the tenet that the
enterprise will be better off if it can somehow
bring together the best extraction tools, data-
bases, SMP/MPP hardware, disk drives,
analysis tools, network, etc. and get them to
function as a unified system. Aside from the
difficulty and religious wars that accompany attempts to define "best," the price paid for the anticipated exceptional performance is primarily the pain associated with integrating the disparate components. Different vendors value different architectures and functionality characteristics, and the best extraction tools, for example, may not integrate well with the best meta data repository, and so on. Many components that should integrate are advertised as compatible but require extensive work to get all the details working for an organization's application. This
kind of scenario is especially significant in
technologies which are young, and in which
standards are not yet entrenched. Data
Warehousing meta data standards are still
evolving, although IBM, in conjunction with
Oracle and Unisys, is sponsoring an OMG
(Object Management Group) subcommittee for
the standardization of Common Warehouse
Meta Data. This will result in a rich set of
warehouse models to facilitate the sharing of
meta data. Another OMG subcommittee has
also been formed to standardize XML Meta
Data Interchange (XMI). These standards ini-
tiatives will go a long way to resolving these
important issues. Most integrators find that
all projects require compromises to achieve
integration, although some level of custom fit
with the enterprise is generally achieved.
There is a second, subtler price associated
with this custom fit, and that is what to do in
the future about upgrading to later releases.
Assume, for example, that an integrator com-
bines extraction tool A with meta data reposi-
tory B, database C and analysis tools D.
Assume further that all of the products, (A, B,
C, and D) are at release 1.0. The integrator
finishes the job, and the customer is satisfied,
and some time later extraction tool A moves
to release 2.0, three months from then the
analysis tool moves to release 2.0, and so on,
each release bringing with it features and
functions the organization would like to
incorporate. The challenge is: how to do that?
Are these independent release updates com-
patible with each other? Will the resultant
system be backward compatible with the orig-
inal system? If the organization continues to
rely on the integrator to maintain a system at
the latest best-of-breed status, this is tanta-
mount to a full employment act for the inte-
grator, and certainly of dubious value to the
customer. On the other hand, a customer who
has purchased an integrated package from a
single vendor can put the onus of compatibil-
ity and upgrading on the shoulders of the
vendor, and upgrade the warehouse to the
next release by buying a single upgrade.
The third option, using the enterprise's own IT shop to integrate the package, is probably not a viable option except for very large
and sophisticated organizations. Few IT shops
have the expertise to integrate best of breed
components, and fewer still have the available
resources. Remember that one of the reasons
to build a Warehouse is because the IT
[Fig. 3: three independent departmental databases, each feeding its own data store, with no shared meta data between them.]
[Fig. 4: databases and data stores tied together by common meta data, a common data loading architecture, and a common access architecture.]
department is buried with requests from users
who are already frustrated with the inability
to get answers to their mission-critical plan-
ning questions. The idea of taking an entire
team of IT specialists out of the front lines for
months (or years for a global warehouse), and
simultaneously maintaining or reducing the
backlog could be a formula for disaster.
Top-down and bottom-up
approaches
How should an organization approach the
design and construction of the Warehouse?
Should it go for a top-down technique, setting
up an enterprise-wide architecture and then
constructing Warehouses or Data Marts that
conform to that architecture? Or perhaps
take a bottom-up philosophy, starting right in
with highly focused and targeted Data Mart
projects aimed at specific critical areas of the
business? The issues here have to do with
deciding between addressing short-term tac-
tical requirements to help individual depart-
ments and long term strategic planning issues
regarding data architectures that have to cut
across political and organizational boundaries.
The Top-Down approach can yield the best
long-term results, but it also invokes the most
angst within an organization. It is very diffi-
cult, expensive, and time consuming to
achieve consensus on a single, consistent,
accepted and valid view of the business, the
data it needs, etc. In the prior example of the
Marketing and Finance departments, two dif-
ferent operational definitions for sales were
developed. Which of the definitions will be
used in the Warehouse, or will there be two
terms that must now be defined? If so, they
cannot both be called "sales," and the meta data had better be clear as to what the final
determination became. It is clear that many
more change management issues will have to
be dealt with in this approach, as organiza-
tions grapple with the sins of the past in not
having promulgated an accepted data stan-
dardization program, etc.
The Bottom-Up approach favors the use of
smaller, more focused applications of
Warehouses that can avoid the pitfalls of the
Top-Down approach by simply limiting the
extent of the implementation. This approach
also exhibits simpler data archaeology prob-
lems: there are usually limited data sets, lim-
ited user views, a good understanding of the
data needs and how they relate to the business
problem. In its purest form, this approach
trades the near term pain of dealing with data
standardization issues for the longer term
inability to operate cross-organizationally.
In this approach, each department is respon-
sible for extracting whatever data they need,
defining their own meta data and using their
own private Warehouses for decision support
at the departmental level. Three obvious problems arise: (1) this architecture is difficult to scale up to an enterprise view; (2) the lack of standardization prevents analysts in one organization from accessing information in another organization's Warehouse that might be of use to them in their analysis; and (3) the departments may derive different answers to the same question (e.g., what were sales last month?). This is depicted by the brick walls inserted between departmental systems in Fig. 3, preventing interactions. The primary cause is the Warehouse "Tower of Babel" syndrome inherent in the underlying philosophy. Each mart can create confusing, overlapping and contradictory views of the business, like the proverbial six blind men trying to describe an elephant, each only able to relate to the elephant according to the portion of the animal he was feeling. What is a customer? A product? A sale?
This approach works if the organization has a
business problem with a single focus and the
data to solve that problem exists in only a few
places, with no political ownership issues.
Some organizations use a hybrid approach to gain the speed and cost advantages of the highly focused departmental approach while making sure that the implementation is consistent with the overall goals of the organization.
This approach uses the principles developed
in Rapid Application Development (RAD)
methodologies and intentionally delivers
iterations of the departmental Warehouse,
attempting at each iteration to come closer
to an overall enterprise data model and
data architecture.
This allows the organization to take advantage
of the speed and cost savings of the smaller
approach while at the same time mitigating
and eventually overcoming the lack of inte-
gration and islands of automation problems
inherent in this approach. Other organiza-
tions are using the smaller implementation as
a proof of concept and prototype/pilot instal-
lation. This assists in proving the benefits of
the technology on a smaller scale, and smaller
risk, and then scaling the solution into a more
global implementation, either by building
more integrated departmental Warehouses
(see Fig. 4) or by moving to a full global
enterprise Data Warehouse implementation.
Any of these approaches can work for your organization. What is important is that you are aware of the issues associated with each and actively mitigate the drawbacks.
Technology criteria
The technology dimension will of course play
a major role in the enterprise strategy.
Different strategies will require different tech-
nological characteristics and features. Just as
the technology must align itself with the busi-
ness mission, the strategy must also consider
the technology.
Scalability
Scalability refers to the ability of a system to
increase in capacity as users demand more, as
data stores grow, as more users are added to
the system and as more applications are devel-
oped against the Warehouse.
One of the challenges associated with intro-
ducing new technologies and new applications
is that of determining the actual system load
after implementation. JAD (Joint Application
Development) sessions attempt to mitigate the
risk by attempting to produce a picture of user
requirements, but the truth is that users who
have never had the opportunity to work with a
new technology don't really know what to ask for. Further, they don't know ahead of time what kinds of demands they will make on the system until they actually sit down and start
working with it. Therefore, as the users become
familiar with the query capabilities and the
navigational issues, their own success will
cause them to demand more from the system
as word spreads and more users exploit the
features of the system. In time, users will
become more sophisticated and begin explor-
ing with ideas of data mining and visualization
as their analyses become more complex. All of
these are factors that demand scalability in a
system. Companies acquiring warehousing
technology need to be assured that scalability
is built into the architecture.
Manageability
Data Warehouses require the development
and implementation of new processes, tools,
and work systems to manage the extraction/
transformation of operational data, the
administration of users (adding, deleting
users, changing access rights) and so forth. In
the course of deploying a Warehouse, a
number of operational management decisions
must be made and supported by the product
solution set:
What is the relationship and meaning of the
data being loaded to the intended business
use? How frequently should data loads and
transformations be made? Should they be
daily, weekly, monthly, quarterly?
How will the Warehouse reporting adapt to
the business processes as they change over
time? How will the system model and track
the business?
How much data needs to come in on each
load, and how long will it take for the system
to recompute all the indexes, meta data
updates, and other administrative details that
must be undertaken? How will the system
monitor database operations?
How will the system deal with backups/
restores and what should the process be to
administer a disaster recovery plan?
If the system is to be paid for by usage
chargebacks, is there a mechanism for the
systems administrator to keep track of usage
by account number or password, and are there
reports available to facilitate this feature?
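A minimal sketch of such a chargeback mechanism, with hypothetical field names and rates, might tally resource consumption per account so the administrator can bill departments for their warehouse use:

```python
from collections import defaultdict

# Hypothetical usage log: each query records the account that ran it and
# the CPU time it consumed.
usage_log = [
    {"account": "FINANCE",   "cpu_seconds": 120},
    {"account": "MARKETING", "cpu_seconds": 45},
    {"account": "FINANCE",   "cpu_seconds": 300},
]

RATE_PER_CPU_SECOND = 0.02  # assumed chargeback rate in dollars

# Tally charges per account for the billing report.
charges = defaultdict(float)
for entry in usage_log:
    charges[entry["account"]] += entry["cpu_seconds"] * RATE_PER_CPU_SECOND

for account, amount in sorted(charges.items()):
    print(f"{account}: ${amount:.2f}")
```

A real warehouse would capture such records in the DBMS's own accounting facilities, but the reporting requirement is the same: usage attributable to an account, priced at an agreed rate.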
Performance
How well the system performs will be the
ultimate arbiter in the success or failure of
the project. The intent of the Warehouse is to
help people do their jobs more effectively and
efficiently. If the response time is not ade-
quate, users will not use the system, the
enterprise will not derive any benefit from the
expenditure, and the project may wind up
with a negative ROI. Therefore the technol-
ogy dimensions of performance must be thor-
oughly considered in developing a strategy.
Over the years, there have been a number of
studies to determine the limits of human
patience in dealing with computer response
times. In general, users want to see some-
thing back in a timeframe that does not inter-
rupt their thought processes. Some have said
that two to five seconds is a good target
response time. But when we are dealing with
such gargantuan database sizes, it is difficult
to conceive of doing a table scan on a multi-
million row table in that timeframe, which
leads to a second human-factors point: the
amount of time a user will wait is proportional
to the perceived difficulty of the procedure
requested. Therefore, if users know they have
entered a particularly nasty query, they will be
more tolerant of delay. The best advice on this
subject is to work
with the user community in establishing
meaningful metrics and working cooperatively
to set and meet expectations on both sides.
Two areas directly impact the performance of
a system: the hardware architecture and the
database architecture.
The hardware dimension is the choice between
Symmetric Multiprocessing (SMP) and
Massively Parallel Processing (MPP) architec-
tures. The top end of MPP capacity does
outstrip the top end of SMP capabilities.
However, the applications where this kind of
performance is necessary are few and far
between. For this reason, many industry ana-
lysts are predicting that SMP will be the
hands-down winner in the Warehousing
market. This does not mean that every ware-
house implementation should be on a parallel
technology of one sort or another; many appli-
cations can perform satisfactorily on non-par-
allel systems.
Some database vendors have followed the
hardware architecture by introducing paral-
lelism into the database functions. For example,
IBM has continually enhanced products such
as DB2, which now provides capabilities such
as parallel query, load, join, scan, and utility
processing, delivering tremendous value.
Fig. 5: A translator module interposed between
an extraction tool (Format A) and a meta data
repository (Format B).
Fig. 6: Components with different internal
formats exchanging meta data through a
common interface.
Fig. 7: Fully integrated components sharing a
common format and a common interface.
Flexibility
If there is one thing that distinguishes market
conditions today, it is the pace of change. In
fact, as pointed out in the introduction, one of
the drivers leading enterprises to consider
Warehousing technology is the need to keep
up with that change. Deployment of a
Warehouse will not alter the pace of change or
the need to keep up with the change, which
means the Warehouse technology itself must
be flexible, allowing for rapid responses to
changing conditions. The Warehouse must be
adaptable to changes in the enterprise, such
as reorganizations, mergers, and acquisitions.
It may be necessary to implement new queries
in response to a competitor's product
introduction: queries not envisioned in the
original design. If the database needs to be
redesigned, reloaded, or reindexed, this could
mean a significant delay in responding to
the competition.
Ease/speed of implementation
The development and maintenance tools
available with the Warehouse will be key to
the success of the project. Are the tool sets
tightly integrated, as is usually the case in a
one-stop-shop solution, or will the develop-
ment team have to wrestle with the tools as
well as with the disparate products to make
them work together? Are the user interfaces
graphical, or do they use the old command-
line metaphor?
Do the development tools automatically gen-
erate the meta data content, or will there be a
second step to build the meta data and then a
third to reconcile the mistakes made in doing
this manually? These considerations apply not
only to the initial deployment of the Warehouse,
but also speak to the flexibility, since changes
to the Warehouse will likely at some point
involve the use of these tools again. A trade
article in Datamation put it best: "Having
tools that are well and deeply integrated with
each other is a necessity, not a luxury."
Tool integration
Integration deals with how well and how
smoothly the different architectural compo-
nents interact. Obviously these are subjective
terms, and therefore there are degrees of
integration in many different areas such as
administrative interfaces, Warehouse
management, problem determination, etc.
Tight meta data integration yields clear bene-
fits: meta data need be entered only once and,
once the system is operational, is prevented
from falling out of synchronization.
For example, an extraction tool and a meta
data repository could interact according to
several different models as follows:
In the first model, each component has its own
structures and syntax for the meta data, and a
third component is interposed to effect infor-
mation transfer between them. In Fig. 5, a
translation module converts the information
from one component's format to the other's.
In this model, therefore, there is no direct
integration between components.
A second option (see Fig. 6) is to have each
module maintain different internal structures
and syntax, but to have a common defined
interface, such as a meta data standard. In
this case, the accepted transfer mechanism
and translation is built into each module.
Compromises must still be made because the
internal structural differences may yet inter-
fere with some functions, but overall the
interaction between the components is facili-
tated by the acceptance of a common mecha-
nism for information interchange. This
situation is moderately well integrated; the
components share an accepted architecture
that each follows to effect the integration.
Finally, on the other end of the spectrum are
tools that use the same structures and syntax
internally as well as present a common inter-
face to each other (see Fig. 7). These compo-
nents are completely integrated. The more
vendors involved in the solution, the more
challenges exist in integrating them; some
single-vendor solutions make this option rela-
tively achievable.
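The second model, in which each tool keeps its own internal structures but exports and imports through a common defined interface, can be sketched as follows. This is an illustrative sketch only; the class and field names are invented, not drawn from any actual meta data standard:

```python
from dataclasses import dataclass

# Hypothetical common interchange record (the "common interface").
@dataclass
class MetaDataRecord:
    name: str
    source: str
    description: str

class ExtractionTool:
    """Keeps its own internal structure (tuples) but exports via the
    common format; the translation is built into the module."""
    def __init__(self):
        self._internal = []  # [(table, source_system, notes), ...]

    def register(self, table, system, notes):
        self._internal.append((table, system, notes))

    def export_metadata(self):
        return [MetaDataRecord(t, s, n) for (t, s, n) in self._internal]

class MetaDataRepository:
    """Different internal syntax (a dict), same common interface."""
    def __init__(self):
        self._catalog = {}

    def import_metadata(self, records):
        for r in records:
            self._catalog[r.name] = {"source": r.source,
                                     "desc": r.description}

tool = ExtractionTool()
tool.register("SALES_FACT", "ORDER_ENTRY", "daily order extract")
repo = MetaDataRepository()
repo.import_metadata(tool.export_metadata())
print(repo._catalog["SALES_FACT"]["source"])  # ORDER_ENTRY
```

Because both modules agree only on `MetaDataRecord`, either side can change its internal storage without breaking the other, which is the essence of the moderately integrated model.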
The degree of integration of the Warehouse is
an important consideration in developing an
architecture and an implementation strategy.
First of all, the Warehouse must integrate with
the existing architecture and infrastructure
of the organization. Returning to the three
models discussed earlier, if the Warehouse
can only integrate with the existing architec-
ture by means of extensive interposition of
custom code, the project will be very expensive,
lengthy, and complex. Questions must be
asked about the degree to which the proposed
Warehouse must conform to existing supported
operating systems, networks, data standards,
existing databases, existing application
development environments, and more.
If the Warehouse supports common, open
architectures, such as in the second model, the
likelihood of being able to add analysis func-
tionality later, such as data mining for exam-
ple, will be higher. Open architectures also
make customization of products easier due to
the more regular and predictable nature of the
interaction among architectural components.
With regard to the existing operational appli-
cations from which the Warehouse will extract
its information, the tighter the integration
with all data sources, the better. Remember
that some of the systems in the enterprise will
have flat files or older hierarchical or network
database architectures which can present a
challenge. Which of the three models of
integration is required for the enterprise?
Data can be extracted from the databases in
discrete batches or continuously, on a transac-
tional basis. Depending on the Warehouse
application, either of these may be preferable
or both may be required. Do extraction tools
exist for both cases, or will there need to be
some special code written? Finally, can the
existing Warehouse automatically extract por-
tions of the meta data from the existing opera-
tional systems, or will there need to be special
software or processes employed to do so?
How easy will it be to integrate the Warehouse
with existing operations, including adminis-
trative activities? Data extracts, cleansing,
transformation and loading should not be
done manually if it can be avoided. Not only
does manual work introduce more labor cost,
it also introduces too many opportunities for
error. Understanding all of the dimensions of
integration with your existing environment and
what is possible to automate is a must.
The Warehouse should also integrate with
enterprise standards, both de jure standards
(those promulgated by officially sanctioned
bodies such as ANSI, OMG, and the Meta Data
Council) and de facto standards (practices and
products generally accepted in the industry).
Completeness
The completeness of a solution refers to the
existence of all the architectural components.
This means that all of the parts an organiza-
tion needs to work are in place and functional,
from extraction and transformation to storage
and meta data, to the analytical, management,
and change processes required.
Many large vendors, particularly IBM, have
invested significantly in their core technology
and have evaluated the issues surrounding
Warehouse technology and implementation and
developed partnerships and processes that
address the issues discussed in this section. The
case studies that follow are good examples that
illustrate many of the points just described.
Implementation tactics
Following is a high level guideline for imple-
menting successful Data Warehousing projects.
Planning the project
Many organizations are in such a hurry to
install a system that they tend to gloss over
this vital step. To paraphrase what Lewis
Carroll's Cheshire Cat told Alice when she told
him she didn't know where she was going: if
you don't know where you're going, any road will
get you there. The corollary, of course, is that
if one does have a definite destination, then the
choice of roads is important, and one had best
spend some time planning the route.
Gaining commitment at top
Senior Management Sponsorship is crucial in
these projects, and the level of support is com-
mensurate with the scope of the project. If the
project is Departmental in scope, then the
senior Departmental leadership must be on
board. Obviously support even higher will only
help, but be careful of bypassing chain-of-
command positions which could turn a poten-
tial ally into a snubbed detractor.
Forming a user/IT partnership
In today's dynamic environment in which data
warehousing solutions are becoming key
ingredients to business success, users are
learning nearly as much about their data
requirements as OLTP users. Therefore, it is
critical to form a team where both technical
and end-user personnel can work together to
develop a mutually acceptable and technologi-
cally achievable set of specifications and
requirements. This should result in continuous
collaboration throughout the project as many
warehouse projects can be viewed as a discov-
ery process where the business perspective
must be weighed alongside the technical issues.
This is a good area to use RAD (Rapid
Application Development) techniques to allow
technicians to demonstrate to end-users the
potential capabilities of the system and to
allow users a tangible feedback mechanism.
One implementation of Data Warehousing
technology advocated the use of business
managers as well as end users. The business
managers can not only talk about what's done
today in the existing processes, but are in a
better position to articulate the business vision
to both the technicians and the end users.
Determining metrics for success of the project
How will the organization know that the pro-
ject was successful? If there is an intention of
using the original project as a proving
ground for an enterprise-wide rollout, how
can there be a rollout if there is no yardstick
for measuring the results of the pilot? As part
of the planning process it is critical that met-
rics be defined which clearly demonstrate that
the results of the project meet and are aligned
with the original business goals which drove
the project to begin with. It is also imperative
that the metrics be objective rather than sub-
jective.
The project plan
This step is anathema to many technologists,
but a project plan detailing the objectives,
approach, strategy, ownership, timeframe,
resources, and responsibilities is a must if the pro-
ject is to be managed with any degree of pro-
fessionalism and efficiency.
Identifying the areas of
expertise required
One of the outputs of a good plan is identifica-
tion of the resources required and an analysis
of whether they exist in-house and whether or
not they are available. Make sure they are rep-
resented on the team. Having senior manage-
ment support is a good prerequisite for being
able to get the people you need, and if they are
not available, for being able to get permission
ahead of time to go outside (hire service
providers) if necessary.
Developing a
communications plan
A Warehouse is only useful to the organization
if users exploit its abilities. It is foolhardy to
spend effort on developing such a project only
to spring a surprise onto a department or
enterprise which is not ready to take advan-
tage of it because they didn't know it was
coming, or when it would be ready, etc. A
communications plan is essential to dissemi-
nate information about the project.
Using benchmarking
techniques
If possible, investigate other organizations that
have successfully completed projects similar
to the one at hand. This can take the form of
literature research or actual trips to view opera-
tional systems and interview users, developers,
and management about the best practices they
have encountered in their projects. Remember
that not all practices that work in one envi-
ronment are universally portable, so make
sure that the context in which a particular
practice is said to work can be extended to
the target environment.
Selecting a methodology
A methodology that is known and accepted by
the organization will go a long way to smooth-
ing out many of the rough spots which any
project will hit. A methodology provides three
components to a project:
A logical series of activities to achieve a
desired end. This structure will tell the orga-
nization what has to be done and when in
order to produce a quality product.
A definition of deliverables associated with
each activity and progress reports on the busi-
ness benefits from each activity. This tells the
organization what the output is of each step.
Roles and responsibilities for actors in
the activities.
The Methodology will also help define a pro-
ject structure and what has to be done to
manage the project: when reviews are to be
held and what each review will cover. One
trap to avoid here is putting someone in
charge who is too narrow in his or her scope:
for example, a technologist who has no
appreciation for or understanding of the busi-
ness problem, or a functional specialist who
has no patience for technological issues.
There has to be a blending of business and
technology in order for these projects to
succeed, and therefore the lead people have
to be sensitive to the various cultural differ-
ences of the team members. As the Data
Warehousing Institute said in a recent publi-
cation: "Data Warehousing is a service
business, not a storage business."
Using service providers
Any organization will have areas where there
is either a lack of expertise or where there are
insufficient skilled resources available for the
project. The project manager should look for
areas of expertise which are not represented
on the team. In many IT shops, the most sig-
nificant missing element, for which consulting
services might be required, is the planning
element. On the business side, organizations
will need to adapt to the new opportunities
the technology provides. For example, report-
ing and analysis, prior to Warehouses, was
very flat-file oriented: run a report and look
at it on paper; run another report and get it
on paper six weeks later. With Warehouse
tools such as OLAP, reports become three-
dimensional, and you pivot and drill through
them searching for information. Designing
these reports with little or no prior experience
is very challenging in that you must think
differently. Users will need support in learn-
ing, specifying and evolving the output of
data warehouses.
Architecting the solution
Architecture is the definition of the components
of a solution and their interaction
(see Fig. 8). This guide has delved into several
areas where the benefits of a well-defined and
integrated architecture have been demon-
strated. Defining the components of the solu-
tion and how they interact is a critical step in
implementing a successful project. The solution
should be an end-to-end solution and allow for
all the characteristics described earlier, includ-
ing scalability, extensibility and manageability.
The choices available to IT range from tightly
coupled and integrated product suites
provided by some vendors (e.g., IBM Visual
Warehouse) to individual Warehouse building-
block products, or groups of them, that IT
takes the primary role in integrating to meet
its needs.
Designing a directory such as the IBM Visual
Warehouse Information Catalog into the
architecture will allow the system to draw
business meta data from dozens of other prod-
ucts, including DB2, Oracle, Sybase, Hyperion
Essbase, CASE tools, and more. Tools such as
the Information Catalog can help provide
users with the keystone component of consis-
tent, synchronized meta data that helps
developers, maintainers, and users alike.
Fig. 8: Architecture underpins each phase of
the solution lifecycle: Design, Develop, Pilot,
Deploy, Train, Operations.
Designing the system
Designs should be based on open, well-
understood architectures with well-defined
interfaces between the components. Use of
standards will help the scalability and flexi-
bility of the system over time. Navigational
tools for users and user interfaces in general
should be designed with the business prob-
lem in mind in a cooperative process with end
users and managers.
Departmental systems should be designed
within the context of an enterprise framework.
That is, where possible and practical, identify
data elements and concepts that span multiple
departments and try to define them in the
broadest possible terms so as to be able to
include other departments later on.
Developing the system
The development environment should be
capable of using RAD techniques to help end
users who might not be able to grasp
Warehousing and analysis concepts immedi-
ately. Use iterative prototyping with time
boxing and multiple releases where necessary
to gain consensus from the user community.
One of the challenges associated with build-
ing a system is the ongoing integration of
meta data. The value of integrated meta data
is that it reduces the time to implement a
Warehouse, as well as providing greater main-
tenance and management efficiencies. When
meta data is updated manually, the process
can introduce errors that reach end users
(e.g., tables defined incorrectly). One
solution is put forward by the Meta Data
Council, in concert with the OMG (Object
Management Group) subcommittee for the
standardization of Common Warehouse Meta
Data and the related OMG subcommittee to
standardize XMI. It is a federated approach
whereby all stores of meta data type informa-
tion (data dictionaries, passive repositories,
encyclopedias, data base catalogs, etc.) pass
through a metahub that changes the syntax
and other relevant information about the
data. This allows any tool that adopts this
approach to interoperate and interchange
meta data within and around the Warehouse
on an ongoing basis. This approach is not yet
widely available but holds great promise
for resolving one of the major issues in
meta management.
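The federated metahub idea can be illustrated with a short sketch: each meta data store registers translations to and from a canonical form, so any two stores can interchange records through the hub without knowing each other's syntax. All names and formats here are hypothetical; the actual Common Warehouse Meta Data work defines far richer models:

```python
# Hypothetical metahub: translators to/from a canonical dict representation.
class MetaHub:
    def __init__(self):
        self._to_canonical = {}
        self._from_canonical = {}

    def register(self, fmt, to_canonical, from_canonical):
        self._to_canonical[fmt] = to_canonical
        self._from_canonical[fmt] = from_canonical

    def transfer(self, record, src_fmt, dst_fmt):
        # Source syntax -> canonical form -> destination syntax.
        canonical = self._to_canonical[src_fmt](record)
        return self._from_canonical[dst_fmt](canonical)

hub = MetaHub()
# A database catalog that stores (name, type) tuples...
hub.register("catalog",
             to_canonical=lambda rec: {"name": rec[0], "type": rec[1]},
             from_canonical=lambda c: (c["name"], c["type"]))
# ...and a data dictionary that stores "name:type" strings.
hub.register("dictionary",
             to_canonical=lambda rec: dict(zip(("name", "type"),
                                               rec.split(":"))),
             from_canonical=lambda c: f'{c["name"]}:{c["type"]}')

print(hub.transfer(("CUST_ID", "INTEGER"), "catalog", "dictionary"))
# CUST_ID:INTEGER
```

Any tool that registers a translator pair interoperates with every other registered tool, which is the interchange property the federated approach promises.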
Piloting the implementation
Pilot systems, also known as prototype or
proof-of-concept systems, are among the
most misunderstood concepts in the industry.
There are three possible reasons for an orga-
nization to embark upon a pilot program:
The technology is foreign to the organization,
and there is a need to understand the benefits
of the technology to see how it might help in
the business problems at hand.
The technology is understood, but an appro-
priate application has not been found. The organiza-
tion wants to know if a particular application
of this technology to a business problem is
appropriate, which is to say, will this technol-
ogy solve the problem?
The technology is understood as well as the
application, but the cost/benefit equation is
not understood. The organization wants to
know if the cost of the solution is worth it.
Each of these motivations will result in differ-
ent pilots, in different places in the organiza-
tion and with different associated metrics.
Regardless of the motivation, however, a pri-
mary question that must be asked is: How will
we know if the pilot answered our questions?
The only way is to develop quantitative as well
as qualitative metrics and in addition develop
tools and techniques for capturing and ana-
lyzing the results at the end of the pilot program.
Deploying the system
If the strategy calls for replicating successful
projects in other departments, then a roll out
plan must be developed to identify the order
of the rollout as well as any integration efforts
that must be dealt with. These might include
process reengineering, especially if multiple
departments are today involved in a process
that is not automated, and one of these
departments will receive the automation prior
to the others.
Training users
One sure way to snatch defeat from the jaws
of victory is to develop a technically outstand-
ing Warehouse and then let users loose on it
with no training. That which is intuitive to a
technologist steeped in Warehousing concepts
is gibberish to an end user whose focus is run-
ning a business. Training programs must be
developed that target not the technological
niceties behind the screens, but rather
address the hows of using the system from a
business viewpoint. Focus on training the
user on understanding the meta data and the
navigation capabilities as well as how to use
those in analyzing a business problem.
Managing warehouse
operations
Developing a Warehouse is one thing; keep-
ing it operational is another. It requires man-
aging the timeliness of file transfer and load
processes as the system grows in size and
complexity. Daily operations create meta data
changes, such as the addition of a user, or
loading of a new star schema, which means
the meta data repository must be managed. In
addition, the organization will need feedback
reports on the Warehouse operations. Managers
will want to know that the quarterly data
extraction actually was started on schedule
and completed on time with all appropriate
indexes regenerated.
Some users, in spite of all best training
efforts, will construct queries that are capable
of bringing a system to its knees. In these
situations, organizations will need database
management tools that report on how much
system resources are being used by a query
and allow the interruption and subsequent
termination of the query.
Some reporting tools are also able to identify
which areas of the database are being hit
hardest, or most frequently, or even identify
which times of the month/quarter/year those
areas are most likely to be accessed. This will
allow managers to govern the extraction and
indexing processeschange table structures,
drop columns, define different aggregations
according to anticipated uses.
New demands by business units enabled by
technology or new business theory will
always tax a core business system such as a
Warehouse, but good architecture will deliver
flexibility downstream to deal with these
opportunities, such as today's web-enabled
marts or integrated business-process marts.
Summary
Data Warehousing can provide significant
new benefits to an organization. The technol-
ogy is moving from an early adopter stage to a
maturity phase. This has removed much of
the technology risk associated with the early-
adopter stage, but risks associated with imple-
mentation and strategy are still very much
factors. By considering the issues described in
this guide you should be able to reduce risk
associated with implementing data warehouse
solutions and deliver higher value solutions in
a reduced timeframe.
Appendix AElements of
a warehouse
Fig. 9 represents the basic elements of a Data
Warehouse architecture that responds to the
needs of enterprises today. This model is
slightly different from most of the ones being
promulgated throughout the industry in that
it combines the technological elements with
the process and planning elements mentioned
earlier. The human figures in the diagram
represent those elements that are primarily of
a service nature, such as planning, analysis,
processes and management.
In this section we will briefly touch on the
individual elements and their contribution to
the overall architecture.
Planning/analysis
Arguably the most important element of the
Data Warehouse architecture, the planning
and analysis constituent is the one in which
the enterprise determines which problems it
needs to solve, what the characteristics of the
problem are and how best to solve them. Once
the problem is clearly framed, then the team
(a joint IT/Business Unit) can move on to
identifying and resolving other dependent
questions such as:
What existing business processes are involved
in the defined problem?
How large a warehouse do we need to satisfy
our business problem?
What operational data systems (existing oper-
ational data sources) do we need to access to
get the data we need?
What transformations will need to be built to
convert the operational data into usable ware-
housed data?
What are the characteristics of the Data
Model we need to build to answer the antici-
pated queries?
Data acquisition
Data acquisition includes extraction of data
from operational systems, cleansing the data
(restructuring records, removal of operational
data, translating field values to a common
data dictionary) and checking data integrity
and consistency. It involves transformation
(adding time fields, summarization, and
derived fields), loading of the clean data into
the Warehouse database, updating the
Warehouse indexes, etc. It is the component of
the architecture that provides the raw data
into the warehouse repository that will then
be used for analysis. Clearly, it will be impor-
tant to understand how the data will be used
so that transformations, cleansings, etc. render
a suitable format with which to accomplish
our query goals.
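The cleanse/transform/load sequence just described can be sketched in a few lines. The field names, code mappings, and values below are invented purely for illustration; a real acquisition pipeline would be driven by the data dictionary and transformation rules defined in planning:

```python
from datetime import date

# Hypothetical common-data-dictionary mapping for an operational code.
CODE_MAP = {"M": "MALE", "F": "FEMALE"}

def cleanse(row):
    """Drop a purely operational field and translate coded values."""
    out = {k: v for k, v in row.items() if k != "lock_flag"}
    out["gender"] = CODE_MAP.get(out.get("gender"), "UNKNOWN")
    return out

def transform(row, load_date):
    """Add a time field and a derived field before loading."""
    row["load_date"] = load_date.isoformat()
    row["net_amount"] = row["gross_amount"] - row["discount"]
    return row

warehouse = []  # stand-in for the Warehouse table
operational_rows = [
    {"cust": "A17", "gender": "F", "gross_amount": 100.0,
     "discount": 7.5, "lock_flag": 1},
]
for r in operational_rows:
    warehouse.append(transform(cleanse(r), date(1999, 3, 31)))

print(warehouse[0]["net_amount"])  # 92.5
```

Even this toy version shows why the intended use must be understood first: the choice of which fields to drop, map, and derive is dictated by the queries the Warehouse must answer.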
Data storage
The Data Storage element represents the
database and accompanying structures that
are used in the analytical process. Relational
and multidimensional databases are primarily
used in architectures today, and while there is
a fair amount of debate in the industry as to
which provides the best results, relational
databases offer more flexibility, smaller over-
all size, and easier access to atomic data for
drill down operations. A trend emerging
within the industry is for database vendors to
increasingly add multidimensional functions
into relational products.
Meta data
Just as any other kind of warehouse needs to
keep an inventory of its holdings, a Data
Warehouse needs to keep track of what data it
is currently holding along with the pedigree
of that data. This is the role of the meta data
repository, to give users and technicians easy-
to-understand information about the data
such as where the data came from, which
rules were used in creating the data, what the
data elements mean, etc. Many systems divide
the meta data, keeping technical information
about the data in a separate directory from
the Business or End-User Directory, which
holds what the user requires to understand
the data from a business perspective. (For example,
Fig. 9: Elements of a Data Warehouse archi-
tecture: legacy systems and databases feed
Data Acquisition (cleanse, transform) into the
Data Store; Data Access, meta data, and the
Data Model serve users; Warehouse Manage-
ment and Control, Business Processes,
Methodology, and Planning surround the whole.
IBM Visual Warehouse Manager deals with
technical meta data while their Visual
Warehouse Information Catalog manages the
business meta data.)
Data access
Here the rubber meets the road: the tools
presented to the end user are the ones that
will ultimately drive users' perception of the
utility of the system. Access can be cat-
egorized using one of the following descriptors:
Standard Query, Data Interpretation,
Multidimensional Analysis, Data Mining, and
Enterprise Reporting.
Standard query tools allow users to develop
a hypothesis and create questions (queries)
to test its validity. This is sometimes called a
verification-driven approach. Examples
include business statistics and optimization
(linear programming). Multi-dimensional
analysis tools facilitate flexible investigation of
the data along various dimensions, applying
operations such as time series analysis, and
enabling interactive drill-down capabilities.
Data mining tools use a discovery-driven
approach where sophisticated data mining
algorithms are automatically applied to detect
trends, patterns, and correlation hidden in the
data. Enterprise Reporting is information dis-
tribution for the masses, usually via a Web
browser that provides collection and distribu-
tion of data to large numbers of people
throughout the enterprise, sometimes referred
to as "publish and subscribe."
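The multidimensional style of analysis described above can be sketched briefly: facts carry several dimensions, a roll-up aggregates along chosen dimensions, and drilling down simply adds a dimension to the roll-up. The data and dimension names below are illustrative only:

```python
from collections import defaultdict

# Toy fact table with region, quarter, and product dimensions.
facts = [
    {"region": "East", "quarter": "Q1", "product": "Widget", "sales": 120},
    {"region": "East", "quarter": "Q1", "product": "Gadget", "sales": 80},
    {"region": "West", "quarter": "Q1", "product": "Widget", "sales": 95},
]

def rollup(rows, dims):
    """Aggregate the sales measure along the named dimensions."""
    totals = defaultdict(int)
    for r in rows:
        key = tuple(r[d] for d in dims)
        totals[key] += r["sales"]
    return dict(totals)

print(rollup(facts, ["region"]))             # {('East',): 200, ('West',): 95}
print(rollup(facts, ["region", "product"]))  # drill down by product
```

An OLAP tool does exactly this at scale, with precomputed aggregations and interactive pivoting instead of explicit dimension lists.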
Data delivery
This element refers to how the resultant data
is presented to the end user, and it has two
dimensions: the physical and logical conduit
by which the data reaches the end user, and
the mechanisms by which the data may be
visualized. Geographic dispersion, security and
data volumes will dictate the use of various
conduits such as Local Area Networks,
Internet, Wide Area Networks (public and
private), etc. Query complexity and output
volume will dictate the rendering of the data.
Options range from simple tables such as
spreadsheets to simple two-dimensional
graphics such as bar graphs and pie charts to
very sophisticated visualization technologies
that utilize three-dimensional landscapes to
portray the results.
Management
Management of the Warehouse represents an
indispensable architectural element. Once
the Warehouse is set up, refreshes of the data
must be scheduled and efficiently and correctly
executed. Backups of the Warehouse must be
made at reasonable and predetermined fre-
quencies. Users must be added and deleted.
Security must be administered. The list goes
on. The larger the warehouse, the more diffi-
cult the task becomes, and therefore the more
important it is that this be done well. A recent
article in Computerworld stressed the fact
that some Warehouses have reached a size
such that data refreshes cannot be done with-
out taking the system down during opera-
tional hours, because of 18- and 20-hour
upload processes.
Methodology
Methodologies are key ingredients for any
complex project. A good methodology
will define:
A logical stepwise approach to solving a prob-
lem where each step builds on the
previous one.
A complete set of deliverables and measure-
ments against goals so that nothing is
forgotten in the project.
A project structure including roles and
responsibilities of each participant.
Use of a good methodology will greatly
increase the chances of success of any project,
but in Data Warehousing it is essential due to
the integrative (and therefore complex)
nature of the problem.
Project management and
administration
Although this technically falls under the
purview of a good methodology, it is impor-
tant to emphasize the need for solid project
management with both IT and Business units
involved in developing and setting up the
Warehouse. Assembling the team, setting and managing expectations, and working an effective communications plan are all part of good project management.
Appendix B: Case studies
IBM's principal solution for generating and
managing Data Warehouse and Data Mart
systems is Visual Warehouse. Other IBM
offerings such as the Data Replication family
and DB2 DataJoiner (for multi-vendor data-
base access) can complement Visual
Warehouse as data is moved from source to
target systems. IBM also partners with com-
panies such as Evolutionary Technologies
International for more complex extract capa-
bilities and Vality Technology Inc. for data
cleansing technology. In addition, IBM has
key partnering arrangements with Brio
Technology, Business Objects, and Cognos for
query and reporting as well as with Hyperion
for OLAP technology.
For the warehouse database, IBM offers
industry-leading DB2. The DB2 family spans
Netfinity systems, AS/400 systems, RISC
System/6000 hardware, IBM mainframes,
non-IBM machines from Hewlett-Packard
and Sun Microsystems, and operating systems
such as OS/2, Windows (9x & NT), UNIX,
OS/400, and OS/390. When DB2 DataJoiner is used in conjunction with Visual Warehouse, non-IBM databases, such as those from Oracle, Sybase, and Informix, can serve as the warehouse database.
More case studies and information are available on IBM's Web site at www.ibm.com/software/bi.
Visual Warehouse helps McDonald's Canada get to the meat of its marketing data
Behind every mouthful of a beefy Big Mac burger is a huge organization, working long hours to make sure customers get the quality and service that has become McDonald's hallmark. In Canada, more than 1,000 McDonald's restaurants do brisk business. Nevertheless, growing competitive pressures and customers' demand for new values have prompted McDonald's Canada to aggressively expand its market presence with a larger number of strategically located restaurants. At the same time, the company constantly strives to curtail its cost of operations. Oswald Edwards, manager for strategic & architectural planning at McDonald's Canada, explains,
"Extensive discussions with our business executives revealed that they needed detailed, accurate, and timely information for strategic planning and decision making."
Today, a new DB2-based data warehouse, cre-
ated and run by IBM Visual Warehouse, is
providing key transaction information to
market analysts at McDonald's Canada. It
includes information that is helping them
answer questions such as what combination of
products sells the most at a given time of day,
which day of the week a new campaign should
be launched, the success of promotional
campaigns linked with other leading brands,
and much more. Says Edwards, "We were able to convince our business users that IBM's data warehouse solution was the one for us," adding, "We've achieved enormous returns from our investment in Visual Warehouse. It will give us access to information that will help us to substantially increase restaurant sales and reduce operating expenses."
Simplifying development, driving down
maintenance costs
The data warehouse, in DB2 for AIX, resides
on a four-node RS/6000 SP server. Driving
this is Visual Warehouse. Running on a
Windows NT
IBM's approach to developing an architecture for a data warehouse that supports the implementation of either functional, centralized, or decentralized warehouses.
Informational applications: Applications written to analyze data from a Data Warehouse for Decision Support purposes.
Informational data: Data which has been extracted, summarized, and stored in a Data Warehouse for purposes of supporting Informational Applications.
Meta data: Data about data. For example, information about where the data is stored, who is responsible for maintaining the data, how often the data is refreshed, etc.
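As a concrete illustration, a single Meta data record for one warehouse table can be modeled as a plain mapping; the field names and values below are invented for illustration, not a standard:

```python
# A hypothetical meta data record describing one warehouse table:
# where the data comes from, who maintains it, and how often it is refreshed.
sales_metadata = {
    "table": "SALES_FACT",
    "source_system": "order_entry",   # where the data is extracted from
    "owner": "finance_dw_team",       # who is responsible for the data
    "refresh_frequency": "daily",     # how often the data is refreshed
}

def describe(meta):
    """Render a one-line summary of a meta data record."""
    return (f"{meta['table']}: owned by {meta['owner']}, "
            f"refreshed {meta['refresh_frequency']}")

print(describe(sales_metadata))
```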
Middleware: A communications layer that allows applications to interact across hardware and network environments.
Multi-dimensional analysis (MDA): Informational analysis on data which takes into account many different relationships, each of which represents a dimension. For example, a person doing an analysis of retail may want to understand the relationships among sales by region, by quarter, by demographic distribution (income, education level, gender), or by product. Multi-Dimensional Analysis will yield results for these complex relationships. Multi-Dimensional Analysis is sometimes referred to as On-Line Analytical Processing or OLAP.
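The retail example above can be sketched as a simple roll-up along two dimensions at once; the sample sales rows are invented for illustration:

```python
from collections import defaultdict

# Invented sample facts: (region, quarter, sales amount).
facts = [
    ("East", "Q1", 100), ("East", "Q2", 150),
    ("West", "Q1", 200), ("West", "Q2", 120),
    ("East", "Q1", 50),
]

# Roll the sales measure up along two dimensions at once: region x quarter.
cube = defaultdict(int)
for region, quarter, amount in facts:
    cube[(region, quarter)] += amount

print(cube[("East", "Q1")])  # 150
```

Real OLAP tools precompute and index such aggregations across many dimensions; the dictionary here just makes the idea of a dimension concrete.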
On-line analytical processing (OLAP): Processing that supports the analysis of business trends and projections. It is also known as Multi-Dimensional Analysis.
Operational applications: Applications which support the daily operations of the enterprise. Usually included in this class of applications are Order Entry, Accounts Payable, Accounts Receivable, etc.
Query: A request for information from the Data Warehouse posed by the user or a tool operated by the user.
Relational database management system (RDBMS): A database system built around the relational model, based on tables, columns, and views.
Replication: The process of keeping a copy of data.
Source database: The database from which data will be extracted or copied into the Data Warehouse.
Star schema: A modeling scheme that has a single object in the middle connected radially to a number of objects around it, hence the name star. A fact such as sales, compensation, payment, or invoices is qualified by one or more dimensions, such as by month, by product, or by geographical region. The fact is represented by a fact table and the dimensions are represented by dimension tables.
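A minimal star schema can be sketched in SQL (here via Python's built-in sqlite3 module); the table and column names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables radiate around the central fact table.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT)")
# The fact table holds the measure (amount) plus a foreign key per dimension.
cur.execute("""CREATE TABLE fact_sales (
                   product_id INTEGER, region_id INTEGER, amount REAL)""")

cur.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
cur.execute("INSERT INTO dim_region  VALUES (1, 'East')")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 100.0), (1, 1, 50.0)])

# A star join: qualify the sales fact by each of its dimensions.
cur.execute("""SELECT p.name, r.name, SUM(f.amount)
               FROM fact_sales f
               JOIN dim_product p ON f.product_id = p.product_id
               JOIN dim_region  r ON f.region_id  = r.region_id
               GROUP BY p.name, r.name""")
row = cur.fetchone()
print(row)  # ('Widget', 'East', 150.0)
```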
Target database: The database into which data will be loaded or inserted.
Technical directory: The portion of the Meta data Repository which deals with technical information about the data. Such information may include the field designation (alphanumeric, etc.), the number of characters, range checks, etc.
GC-26-9313-01
For more information about
IBM Visual Warehouse
please contact your IBM marketing
representative or IBM authorized software
reseller, or visit our Web site at
www.ibm.com/software/vw or
www.ibm.com/software/data