
Basic Elements of A Data Warehouse: Prof. Navneet Goyal Department of Computer Science BITS, Pilani

The document discusses the basic elements of a data warehouse, including source systems, data staging areas, and operational data stores, while comparing the Kimball and Inmon approaches to data warehousing. It highlights the advantages and disadvantages of data marts, the importance of pilot projects, and the role of operational data stores in data warehousing architecture. The document concludes that both Kimball's and Inmon's philosophies have merit, with many enterprises leaning towards Kimball's bottom-up approach due to its practicality.

Basic Elements of a Data Warehouse
Prof. Navneet Goyal
Department Of Computer Science
BITS, Pilani
Jun 17, 2025 1
Basic Elements of a DW
• Source Systems
• Data Staging Area
• Presentation Servers
• Data Mart/Super Marts
• Data Warehouse
• Operational Data Store
• OLAP
• Kimball vs. Inmon

Jun 17, 2025 © Prof. Navneet Goyal, Dept. of Comp. Sc. 2


Data Warehousing Architecture
[Figure: architecture diagram. Labels: Operational DBs and External Sources; Extract, Transform, Load, Refresh; Metadata Repository; Monitoring & Administration; OLAP servers; Serve; Analysis, Query/Reporting, Data Mining]

Data Marts
• What is a data mart?
• Advantages and disadvantages of
data marts
• Issues with the development and
management of data marts



Data Marts
• A subset of a data warehouse that
supports the requirements of a
particular department or business
process
• Characteristics include:
– Does not always contain detailed data
unlike data warehouses
– More easily understood and navigated
– Can be dependent or independent
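
The characteristics above can be illustrated with a minimal sketch of a dependent data mart, assuming a small in-memory SQLite database. The table and column names (dw_sales, mart_sales_monthly, and so on) are hypothetical and chosen only for illustration; the mart is sourced from the warehouse table, covers one department, and holds summaries rather than detailed rows.

```python
import sqlite3

# Hypothetical warehouse table holding detailed (per-sale) rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, department TEXT, "
    "product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales (department, product, sale_date, amount) "
    "VALUES (?, ?, ?, ?)",
    [
        ("sales", "widget", "2025-01-05", 100.0),
        ("sales", "widget", "2025-01-20", 150.0),
        ("sales", "gadget", "2025-02-03", 80.0),
        ("marketing", "flyer", "2025-01-11", 20.0),
    ],
)

# Dependent data mart: sourced from the warehouse, restricted to one
# department, and summarized by product and month (no per-sale detail).
conn.execute(
    """
    CREATE TABLE mart_sales_monthly AS
    SELECT product,
           substr(sale_date, 1, 7) AS month,
           SUM(amount)             AS total_amount,
           COUNT(*)                AS sale_count
    FROM dw_sales
    WHERE department = 'sales'
    GROUP BY product, substr(sale_date, 1, 7)
    """
)

mart_rows = conn.execute(
    "SELECT product, month, total_amount, sale_count "
    "FROM mart_sales_monthly ORDER BY product, month"
).fetchall()
print(mart_rows)
```

An independent data mart would be loaded directly from source systems instead of from dw_sales; the summarization step would look the same.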



Reasons for Creating Data Marts
• Proof of concept for the DW
• Can be developed quickly and are less
resource-intensive than a DW
• Give users access to the data they
need to analyze most often
• Improve query response time by
reducing the volume of data to
be accessed


Kimball vs Inmon
• Bill Inmon's paradigm: Data warehouse is one
part of the overall business intelligence
system. An enterprise has one data
warehouse, and data marts source their
information from the data warehouse. In
the data warehouse, information is
stored in 3rd normal form.
• Ralph Kimball's paradigm: Data warehouse is
the conglomerate of all data marts
within the enterprise. Information is
always stored in the dimensional model.
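
To make the storage difference concrete, here is a minimal sketch, assuming an in-memory SQLite database, that holds the same toy data in a normalized (3NF-style) layout and in a dimensional (star-schema) layout. All table and column names are hypothetical; neither layout is taken from Inmon's or Kimball's own material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized (3NF-style) layout: each attribute is stored once, so a
    -- product's category lives in its own table.
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                          category_id INTEGER REFERENCES category);
    CREATE TABLE order_line (order_line_id INTEGER PRIMARY KEY,
                             product_id INTEGER REFERENCES product,
                             quantity INTEGER);

    -- Dimensional (star-schema) layout: a fact table keyed to a
    -- denormalized dimension that repeats the category name per product.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              product_name TEXT, category_name TEXT);
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product,
                             quantity INTEGER);

    INSERT INTO category VALUES (1, 'tools'), (2, 'toys');
    INSERT INTO product VALUES (10, 'hammer', 1), (11, 'saw', 1), (12, 'kite', 2);
    INSERT INTO order_line (product_id, quantity) VALUES (10, 3), (11, 2), (12, 5);

    INSERT INTO dim_product VALUES
        (10, 'hammer', 'tools'), (11, 'saw', 'tools'), (12, 'kite', 'toys');
    INSERT INTO fact_sales VALUES (10, 3), (11, 2), (12, 5);
    """
)

# Quantity per category in the normalized layout: two joins.
normalized = conn.execute(
    """
    SELECT c.name, SUM(ol.quantity)
    FROM order_line ol
    JOIN product p ON p.product_id = ol.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.name ORDER BY c.name
    """
).fetchall()

# Same question in the dimensional layout: one join to the dimension.
dimensional = conn.execute(
    """
    SELECT d.category_name, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category_name ORDER BY d.category_name
    """
).fetchall()

print(normalized, dimensional)
```

Both queries return the same totals; the dimensional layout trades some redundancy in dim_product for a simpler query path.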



Kimball vs Inmon

• Bill Inmon: Endorses a Top-Down design
– Independent data marts cannot comprise an effective EDW
– Organizations must focus on building the EDW
• Ralph Kimball: Endorses a Bottom-Up design
– The EDW effectively grows up around many independent
data marts, such as those for sales, inventory, or
marketing


Kimball vs Inmon: War of Words
"...The data warehouse is nothing more than the
union of all the data marts...,"
Ralph Kimball, December 29, 1997.

"You can catch all the minnows in the ocean and stack
them together and they still do not make a whale,"
Bill Inmon, January 8, 1998.



Data Warehouse or Data Mart
First?
• Top-Down vs. Bottom-Up Approach
• Advantages of Top-Down
– A truly corporate effort, an enterprise view of
data
– Inherently architected, not a union of
disparate DMs
– Central rules and control
– May be developed quickly using an iterative
approach


Data Warehouse or Data Mart
First?
• Disadvantages of Top-Down
– Takes longer to build, even with an iterative
method
– High exposure to the risk of failure
– Needs a high level of cross-functional skills
– High outlay without proof of concept
– Difficult to sell this approach to senior
management and sponsors


Data Warehouse or Data Mart
First?
• Advantages of Bottom-Up Approach
– Faster and easier implementation of
manageable pieces
– Favorable ROI and proof of concept
– Less risk of failure
– Inherently incremental; can schedule
important DMs first
– Allows project team to learn and grow



Data Warehouse or Data Mart
First?
• Disadvantages of Bottom-Up Approach
– Each DM has its own narrow view of data
– Propagates redundant data across DMs
– Difficult to integrate if the overall
requirements are not considered in the
beginning
• Kimball’s approach is generally considered
Bottom-Up, but he disagrees with that label


The Bottom-Up Misnomer

Kimball encourages you to broaden your
perspective both “vertically” and
“horizontally” when gathering business
requirements for data marts


The Bottom-Up Misnomer
• Vertical
– Don’t just rely on the business data analyst
to determine requirements
– Inputs from senior managers about their
vision, objectives, and challenges are critical
– Ignoring this vertical span might cause failure
in understanding the organization’s direction
and likely future trends



The Bottom-Up Misnomer
• Horizontal
– Look horizontally across the departments
before designing the DW
– Critical in establishing the enterprise view
– Challenging to do if one particular department
is funding the project
– Ignoring horizontal span will create isolated,
department-centric databases that are
inconsistent and can’t be integrated
– Complete coverage in a large organization is
difficult
– One rep. from each dept. interacting with the
core development team can be of immense
help
Data Warehouse or Data Mart
First?
New Practical approach by Kimball
1. Plan and define requirements at the overall
corporate level
2. Create a surrounding architecture for a
complete warehouse
3. Conform and standardize the data content
4. Implement the Data Warehouse as a series
of Supermarts, one at a time



A Word about SUPERMARTS
• Totally monolithic approach vs. totally
stovepipe approach
• A step-by-step approach for building an EDW
from granular data
• A Supermart is a data mart that has been
carefully built with a disciplined architectural
framework
• A Supermart is naturally a complete subset of
the DW
• A Supermart is based on the most granular
data that can possibly be collected and stored
• Conformed dimensions and standardized fact
definitions
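
A minimal sketch of why conformed dimensions matter, assuming an in-memory SQLite database with two hypothetical marts (sales and inventory) that share one product dimension. Because both fact tables use the same product_key, their results can be combined in a single "drill-across" query. All names here are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One conformed dimension shared by both marts.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

    -- Two fact tables, one per mart, keyed to the same dimension.
    CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
    CREATE TABLE fact_inventory (product_key INTEGER, units_on_hand INTEGER);

    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO fact_sales VALUES (1, 4), (1, 6), (2, 3);
    INSERT INTO fact_inventory VALUES (1, 20), (2, 7);
    """
)

# Drill across: aggregate each fact table separately, then line the
# results up on the shared dimension key.
drill_across = conn.execute(
    """
    SELECT d.product_name, s.units_sold, i.units_on_hand
    FROM dim_product d
    JOIN (SELECT product_key, SUM(units_sold) AS units_sold
          FROM fact_sales GROUP BY product_key) s
      ON s.product_key = d.product_key
    JOIN (SELECT product_key, SUM(units_on_hand) AS units_on_hand
          FROM fact_inventory GROUP BY product_key) i
      ON i.product_key = d.product_key
    ORDER BY d.product_name
    """
).fetchall()
print(drill_across)
```

If each mart kept its own, differently keyed product table, this query would need a mapping step first, which is the integration problem that stovepipe marts run into.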
Pilot Projects: Risk vs. Reward
• Start with a pilot implementation as
the first rollout of the DW
• Pilot projects have the advantage of
being small and manageable
• Provide the organization with a “proof of
concept”


Pilot Projects: Risk vs. Reward
Functional scope of a pilot project
should be determined based on:
1. The degree of risk the enterprise is
willing to take
2. The potential for leveraging the
pilot project
• Avoid constructing a throwaway
prototype
• The pilot warehouse must have actual
value to the enterprise
Pilot Projects: Risk vs. Reward
[Figure: 2x2 matrix; vertical axis RISK, horizontal axis REWARD]
High Risk, Low Reward    |  High Risk, High Reward
Low Risk, Low Reward     |  Low Risk, High Reward
Kimball vs. Inmon
There is no right or wrong between
these two ideas, as they represent
different data warehousing
philosophies. In reality, the data
warehouses in most enterprises are
closer to Ralph Kimball's idea. This is
because most data warehouses started
out as a departmental effort, and hence
they originated as a data mart. Only
when more data marts are built later do
they evolve into a data warehouse.
Dependent Data Marts


Figure source unknown


Independent Data
Marts


Figure source unknown


References
1. The Bottom-Up Misnomer, Margy Ross and Ralph Kimball, http://www.intelligententerprise.com/030917/615warehouse1_1.shtml, September 2003.
2. Data Warehousing Fundamentals, Paulraj Ponniah, J Wiley, 2012.
3. The Data Warehouse Toolkit, 3e, Ralph Kimball, J Wiley, 2002.
4. Data Warehousing: Architecture and Implementation, Mark Humphries et al., Prentice Hall PTR, 1999.
5. Building the Data Warehouse, 4e, WH Inmon,


ODS

• An operational data store (ODS) is a type of
database often used as an interim area for a
data warehouse.
• An ODS is highly volatile
• An ODS is designed to quickly perform
relatively simple queries on small amounts
of data (such as finding the status of a
customer order)
• An ODS is similar to your short term
memory in that it stores only very recent
information; in comparison, the data
warehouse is more like long term memory
in that it stores relatively permanent
information.
ODS

Figure taken from The Operational Data Store: Designing the Operational Data Store, by Bill Inmon, DM Review, July 1998
ODS

Figure taken from The Operational Data Store, by Bill Inmon, INFO DB, 1995
ODS

• In Figure 1 the ODS is seen to be
an architectural structure that is
fed by integration and
transformation (i/t) programs.
These i/t programs can be the
same programs as the ones that
feed the data warehouse or they
can be separate programs.
• The ODS, in turn, feeds data to the
data warehouse.
ODS

• According to Inmon, an ODS is a
"subject-oriented, integrated,
volatile, current valued data store,
designed to serve operational
users as they do high performance
integrated processing."
• In the early 1990s, the original
ODS systems were developed as a
reporting tool for administrative
purposes
ODS

• Subject-oriented
– Customer, product, account, vendor, etc.
• Integrated
– Data is cleansed, standardized, and placed into
a consistent data model
• Volatile
– UPDATEs occur regularly, whereas data
warehouses are refreshed via INSERTs to firmly
preserve history
• Current valued
– Changes are applied with almost zero latency
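
The contrast between a volatile ODS and a history-preserving warehouse can be sketched as follows, assuming an in-memory SQLite database. The table names and the record_status helper are hypothetical, and feeding both tables from one function is a simplification for illustration; a real warehouse load would typically run as a separate ETL step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- ODS: one current row per order, overwritten in place.
    CREATE TABLE ods_order_status (order_id INTEGER PRIMARY KEY,
                                   status TEXT, updated_at TEXT);
    -- DW: append-only history, one row per status change.
    CREATE TABLE dw_order_status_history (order_id INTEGER,
                                          status TEXT, recorded_at TEXT);
    """
)


def record_status(order_id, status, timestamp):
    """Apply one status change to both stores (hypothetical helper)."""
    # ODS: UPDATE the current value; INSERT only if the order is new.
    cur = conn.execute(
        "UPDATE ods_order_status SET status = ?, updated_at = ? WHERE order_id = ?",
        (status, timestamp, order_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO ods_order_status (order_id, status, updated_at) "
            "VALUES (?, ?, ?)",
            (order_id, status, timestamp),
        )
    # DW: always INSERT, so earlier states are preserved.
    conn.execute(
        "INSERT INTO dw_order_status_history (order_id, status, recorded_at) "
        "VALUES (?, ?, ?)",
        (order_id, status, timestamp),
    )


record_status(42, "placed", "2025-06-17T09:00")
record_status(42, "shipped", "2025-06-17T15:30")

# "Was the order shipped?" is answered from the single current ODS row.
ods_rows = conn.execute("SELECT * FROM ods_order_status").fetchall()
history_rows = conn.execute(
    "SELECT status FROM dw_order_status_history ORDER BY recorded_at"
).fetchall()
print(ods_rows, history_rows)
```

After both calls the ODS holds only the latest status, while the warehouse table holds both states in order.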
Classification of ODS

Table source unknown


ODS
• ODS is also referred to as Generation 1 DW
• Separate system that sits between source
transactional system & DW
• Hot extract used for answering narrow
range of urgent operational questions like:
– Was the order shipped?
– Was the payment made?
• ODS is particularly useful when:
– The ETL process of the main DW delays the
availability of data
– Only aggregated data is available in the DW
ODS

• ODS plays a dual role:
– Serves as a source of data for the DW
– Supports querying
• Supports lower-latency reporting through
creation of a distinct architectural
construct & application separate from DW
• Half operational & half DSS
• A place where data was integrated & fed to
a downstream DW
• Extension of the DW ETL layer
ODS

• ODS has been absorbed by the DW
– Modern DWs now routinely extract data
on a daily basis
– Real-time techniques allow the DW to
always be completely current
– DWs have become far more operational
than in the past
– Footprints of conventional DW & ODS
now overlap so completely that it is not
fruitful to make a distinction between
the two kinds of systems
ODS
• Classification of ODS based on:
– Urgency
• Class I - IV
– Position in overall architecture
• Internal or External
A Word About ODS
• Urgency
– Class I – Updates of data from operational
systems to the ODS are synchronous
– Class II – Updates between the operational
environment & ODS occur within a 2-3
hour time frame
– Class III – Synchronization of updates
occurs overnight
A Word About ODS
• Urgency
– Class IV – Updates into the ODS from the
DW are unscheduled
• Data in the DW is analyzed, and periodically
placed in the ODS
• For example, customer profile data:
• Customer Name & ID
• Customer Volume – High/low
• Customer Profitability – High/low
• Customer frequency of activity – very frequent/very
infrequent
• Customer likes & dislikes
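
The four urgency classes can be summarized as a small lookup, sketched here under stated assumptions: the OdsClass enum, the classify_ods function, and the string labels for source and refresh are all hypothetical names introduced for illustration, not part of any standard API.

```python
from enum import Enum


class OdsClass(Enum):
    """Urgency classes of an ODS as listed on the slides."""
    CLASS_I = "synchronous updates from operational systems"
    CLASS_II = "updates arrive within a 2-3 hour time frame"
    CLASS_III = "updates are synchronized overnight"
    CLASS_IV = "unscheduled updates fed from the DW"


# Hypothetical labels for how an operationally fed ODS is refreshed.
_REFRESH_TO_CLASS = {
    "synchronous": OdsClass.CLASS_I,
    "within_hours": OdsClass.CLASS_II,
    "overnight": OdsClass.CLASS_III,
}


def classify_ods(source: str, refresh: str) -> OdsClass:
    """Map a feed description to an urgency class.

    source:  "operational" or "dw" (assumed labels)
    refresh: "synchronous", "within_hours", or "overnight"; ignored when
             the ODS is fed from the DW, since those loads are unscheduled.
    """
    if source == "dw":
        return OdsClass.CLASS_IV
    if source != "operational":
        raise ValueError(f"unknown source: {source!r}")
    try:
        return _REFRESH_TO_CLASS[refresh]
    except KeyError:
        raise ValueError(f"unknown refresh pattern: {refresh!r}") from None


print(classify_ods("operational", "overnight").name)
print(classify_ods("dw", "unscheduled").name)
```

The customer profile example above would fall under Class IV here, because the profile is derived by analysis in the DW and then placed in the ODS.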
ODS
ODS & Real-Time Data
Warehousing
• Which class of ODS can be used for
RTDWH?
• HOW?
• Let us first look at what we mean by
RTDWH
• Wait till we talk about RTDWH
Q&A



Thank You
