Ch4 - Data Warehousing
Data Warehousing
Outline
• Definition of Data Warehouse
• Star Schema
Definition of Data Warehouse
• It is a huge central database that accepts, stores and maintains
data from different sources and locations.
• Disparate sources may use different formats and technologies.
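Before loading, data from disparate sources has to be mapped into one common shape. The Python sketch below (standard library only) shows this for two hypothetical sources: a CSV export with dd/mm/yyyy dates and a JSON feed with amounts in cents. The source contents, field names and the `from_csv`/`from_json` helpers are illustrative assumptions. They are not part of any specific product.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export with its own column names and dd/mm/yyyy dates.
CSV_SOURCE = """cust_id,amount,sale_date
C1,120.50,03/01/2024
C2,80.00,15/02/2024
"""

# Hypothetical source 2: a JSON feed with different keys, amounts in cents, ISO dates.
JSON_SOURCE = '[{"customer": "C3", "total_cents": 4500, "date": "2024-03-10"}]'


def from_csv(text):
    """Map CSV rows onto the common shape: customer, amount, date as YYYY-MM-DD."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        day, month, year = r["sale_date"].split("/")
        rows.append({"customer": r["cust_id"],
                     "amount": float(r["amount"]),
                     "date": f"{year}-{month}-{day}"})
    return rows


def from_json(text):
    """Map JSON records onto the same common shape (cents converted to units)."""
    return [{"customer": r["customer"],
             "amount": r["total_cents"] / 100,
             "date": r["date"]}
            for r in json.loads(text)]


# One integrated collection that a single query engine could then work over.
integrated = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)

for row in integrated:
    print(row)
```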
Definition of Data Mart
• A data mart is a simple form of a data warehouse
that is focused on a single subject (or functional
area), such as sales, finance or marketing.
• Data marts are small slices of the data
warehouse.
• Data marts are often built and controlled by a
single department within an organization.
• Given their single-subject focus, data marts
usually draw data from only a few sources.
• The sources could be internal operational
systems, a central data warehouse, or external
data.
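A data mart that draws from a central data warehouse can be sketched as a table holding only one subject area's rows. This sketch uses Python's built-in `sqlite3` module. The `warehouse` table, its `area` column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for a central data warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (area TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [("sales", "laptop", 1200.0),
     ("sales", "phone", 650.0),
     ("finance", "audit fee", 300.0),
     ("marketing", "ad campaign", 900.0)],
)

# A data mart for a single subject: a physical copy of only the sales rows.
conn.execute(
    "CREATE TABLE sales_mart AS "
    "SELECT item, amount FROM warehouse WHERE area = 'sales'"
)

mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
print(mart_rows)  # [('laptop', 1200.0), ('phone', 650.0)]
```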
Reasons for creating a data mart
• Easy access to frequently needed data
• Creates a collective view for a group of users
• Improves end-user response time
• Ease of creation
• Lower cost than implementing a full data
warehouse
• Potential users are more clearly defined than
in a full data warehouse
• Contains only business-essential data and is
less cluttered.
Benefits of Data Warehouse
• Collect data from multiple sources into a single database so a
single query engine can be used to present data.
• Maintain data history, even if the source transaction systems
do not.
• Integrate data from multiple source systems, enabling a
central view across the enterprise.
• Improve data quality by flagging or even fixing bad data.
• Present the organization's information consistently and
reliably.
• Provide a single common data model for all data of interest
regardless of the data's source.
• Restructure the data so that it makes sense to the business
users.
• Make decision-support queries easier to write.
[Figure: Example of using a Data Warehouse]
Characteristics of Data Warehouse
• A data warehouse is a system used for reporting and
data analysis.
• Integrating data from one or more disparate sources
creates a central repository of data, a data
warehouse (DW).
• Data warehouses store current and historical data
and are used for creating trending reports for senior
management reporting such as annual and quarterly
comparisons.
• The data stored in the warehouse is uploaded from
the operational systems.
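The warehouse keeps history, so a quarterly comparison of the kind mentioned above reduces to grouping dated facts by year and quarter. The sketch below is a minimal Python version. It assumes a hypothetical list of (date, amount) sales rows already loaded from the operational systems.

```python
from collections import defaultdict
from datetime import date

# Hypothetical historical sales rows as they might sit in a warehouse: (sale date, amount).
history = [
    (date(2023, 2, 14), 100.0),
    (date(2023, 5, 3), 150.0),
    (date(2024, 1, 20), 130.0),
    (date(2024, 2, 28), 70.0),
    (date(2024, 4, 9), 180.0),
]


def quarter_of(d):
    """Return the calendar quarter (1-4) for a date."""
    return (d.month - 1) // 3 + 1


# Total sales per (year, quarter).
totals = defaultdict(float)
for d, amount in history:
    totals[(d.year, quarter_of(d))] += amount

# Year-over-year comparison for each quarter present in both years.
for q in range(1, 5):
    prev, curr = totals.get((2023, q)), totals.get((2024, q))
    if prev is not None and curr is not None:
        change = (curr - prev) / prev * 100
        print(f"Q{q}: 2023={prev:.0f} 2024={curr:.0f} change={change:+.1f}%")
```

With the sample rows this prints `Q1: 2023=100 2024=200 change=+100.0%` and `Q2: 2023=150 2024=180 change=+20.0%`.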
[Figure: Operational and Informational Systems]
[Figure: Data Warehouse Versus Data Mart]
Types of systems used (1)
Online Analytical Processing (OLAP)
• It is characterized by a low volume of transactions.
• Queries are often very complex and involve
aggregations.
• OLAP databases store aggregated, historical data in
multi-dimensional schemas (usually star schemas).
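The OLAP point about complex queries with aggregations can be made concrete by computing totals for every combination of dimensions. The result is similar in spirit to SQL's `GROUP BY CUBE`. This is plain Python over hypothetical region/product/year facts. Real OLAP engines typically precompute or index such aggregates rather than looping like this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts with three dimensions and one measure.
DIMENSIONS = ("region", "product", "year")
facts = [
    {"region": "North", "product": "TV", "year": 2023, "amount": 500},
    {"region": "North", "product": "Radio", "year": 2023, "amount": 120},
    {"region": "South", "product": "TV", "year": 2023, "amount": 300},
    {"region": "South", "product": "TV", "year": 2024, "amount": 450},
]


def cube(rows, dims, measure):
    """Aggregate `measure` for every subset of `dims`.

    Keys are tuples of (dimension, value) pairs in `dims` order;
    the empty tuple holds the grand total.
    """
    result = defaultdict(int)
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            for row in rows:
                key = tuple((d, row[d]) for d in group)
                result[key] += row[measure]
    return dict(result)


agg = cube(facts, DIMENSIONS, "amount")
print(agg[()])                                    # grand total
print(agg[(("region", "South"),)])                # total for one region
print(agg[(("product", "TV"), ("year", 2023))])   # TV sales in 2023
```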
Data Warehouse Architectures
• Generic Two-Level Architecture
• Independent Data Mart
• Dependent Data Mart and Operational Data
Store
• Logical Data Mart and Real-Time Data
Warehouse
• Three-Layer architecture
Generic two-level architecture
• ETL loads one, company-wide warehouse
Dependent data mart and operational data store
• Single ETL for the Enterprise Data Warehouse (EDW)
• Dependent data marts loaded from the EDW
• Simpler data access
Logical data mart and real-time warehouse architecture
• Near real-time ETL for the data warehouse
• Data marts are NOT separate databases, but logical views of the data warehouse
• Easier to create new data marts
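A logical data mart can be sketched with an ordinary SQL view. The mart is only a query over warehouse tables. Creating a new mart is therefore a single statement, and the mart always reflects the latest loaded data. The sketch uses Python's `sqlite3`. The table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table fed by (near real-time) ETL.
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("S1", "TV", 500.0), ("S2", "TV", 300.0), ("S1", "Radio", 120.0)])

# Logical data mart: a view over the warehouse, not a separate database or copied table.
conn.execute("""
    CREATE VIEW tv_mart AS
    SELECT store, SUM(amount) AS tv_sales
    FROM sales
    WHERE product = 'TV'
    GROUP BY store
""")

print(conn.execute("SELECT * FROM tv_mart ORDER BY store").fetchall())

# New warehouse rows appear in the mart immediately; nothing is reloaded.
conn.execute("INSERT INTO sales VALUES ('S2', 'TV', 200.0)")
print(conn.execute("SELECT * FROM tv_mart ORDER BY store").fetchall())
```

The first query returns `[('S1', 500.0), ('S2', 300.0)]`. After the extra insert it returns `[('S1', 500.0), ('S2', 500.0)]`.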
[Figure: Three-layer data architecture for a data warehouse]
[Figure: Data Characteristics: Status vs. Event Data]
Data Characteristics: Transient vs. Periodic Data
• Transient operational data: changes to existing records are written over previous records, thus destroying the previous data content.
• Periodic warehouse data: data are never physically altered or deleted once they have been added to the store.
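The difference between the two kinds of data can be sketched in a few lines of Python. A dictionary updated in place behaves like transient operational data. An append-only list of dated rows behaves like periodic warehouse data. The account key, status values and the `value_as_of` helper are hypothetical.

```python
from datetime import date

# Transient (operational) style: one current record per key, updated in place.
transient = {}


def update_transient(key, value):
    transient[key] = value  # the previous value is overwritten and lost


# Periodic (warehouse) style: append-only; each version is kept with its date.
periodic = []


def add_periodic(key, value, as_of):
    periodic.append((key, value, as_of))  # existing rows are never altered or deleted


def value_as_of(key, when):
    """Latest periodic value for `key` recorded on or before `when` (None if none)."""
    candidates = [(d, v) for k, v, d in periodic if k == key and d <= when]
    return max(candidates)[1] if candidates else None


for value, as_of in [("Open", date(2024, 1, 1)), ("Closed", date(2024, 3, 1))]:
    update_transient("ACC-1", value)
    add_periodic("ACC-1", value, as_of)

print(transient)  # only the latest status survives
print(periodic)   # the full history is preserved
```

Only the periodic store can answer "what was the status in February?": `value_as_of("ACC-1", date(2024, 2, 1))` returns `"Open"`.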
The Reconciled Data Layer
• Typical operational data is:
– Transient–not historical
– Not normalized (perhaps due to denormalization for
performance)
– Restricted in scope–not comprehensive
– Sometimes poor quality–inconsistencies and errors
Record-level functions:
• Selection: data partitioning
• Joining: data combining
• Aggregation: data summarization
Field-level functions:
• Single-field: from one field to one field
• Multi-field: from many fields to one, or from one field to many
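The three record-level functions can be sketched over in-memory rows in Python. The order and customer data, the field names and the "shipped only" selection rule are hypothetical.

```python
from collections import defaultdict

# Hypothetical operational extracts.
orders = [
    {"order_id": 1, "cust_id": "C1", "amount": 250.0, "status": "shipped"},
    {"order_id": 2, "cust_id": "C2", "amount": 40.0, "status": "cancelled"},
    {"order_id": 3, "cust_id": "C1", "amount": 75.0, "status": "shipped"},
]
customers = {"C1": {"name": "Acme"}, "C2": {"name": "Globex"}}

# Selection (partitioning): keep only the records the warehouse needs.
selected = [o for o in orders if o["status"] == "shipped"]

# Joining (combining): attach customer attributes from a second source.
joined = [{**o, "cust_name": customers[o["cust_id"]]["name"]} for o in selected]

# Aggregation (summarization): roll detail records up to one row per customer.
summary = defaultdict(float)
for row in joined:
    summary[row["cust_name"]] += row["amount"]

print(dict(summary))  # {'Acme': 325.0}
```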
Steps in data reconciliation (4)
• Load/Index: place transformed data into the warehouse and create indexes
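The load/index step can be sketched with Python's `sqlite3` module standing in for the warehouse DBMS. Transformed rows are bulk-inserted, and then indexes are built on the columns that queries are expected to filter or join on. The `sales_fact` table, its columns and the index names are hypothetical. Building indexes after the bulk load is a common choice because it avoids maintaining them row by row. The slides do not state it as a rule.

```python
import sqlite3

# Hypothetical transformed rows: (date_key, product_key, store_key, units, dollars).
transformed = [
    (20240101, 1, 10, 3, 29.97),
    (20240101, 2, 10, 1, 499.00),
    (20240102, 1, 11, 5, 49.95),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        date_key INTEGER, product_key INTEGER, store_key INTEGER,
        units INTEGER, dollars REAL
    )
""")

# Load: bulk-insert the transformed rows inside a single transaction.
with conn:
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", transformed)

# Index: create indexes on the columns that queries will filter or join on.
conn.execute("CREATE INDEX idx_sales_date ON sales_fact (date_key)")
conn.execute("CREATE INDEX idx_sales_product ON sales_fact (product_key)")

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
index_names = sorted(r[1] for r in conn.execute("PRAGMA index_list('sales_fact')"))
print(row_count, index_names)
```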
[Figure: Multifield transformation]
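Multi-field transformation works in two directions: many fields to one, and one field to many. Both can be sketched as small Python functions. The address fields and the "CATEGORY-SIZE-COLOR" product code format are hypothetical examples.

```python
# Many fields -> one field: combine separate source fields into one target field.
def full_address(street, city, postcode):
    """Merge three hypothetical address fields into a single address field."""
    return f"{street}, {city} {postcode}"


# One field -> many fields: split one source field into several target fields.
def split_product_code(code):
    """Split a hypothetical 'CATEGORY-SIZE-COLOR' code into separate fields."""
    category, size, color = code.split("-")
    return {"category": category, "size": size, "color": color}


print(full_address("12 High St", "Leeds", "LS1 4AP"))
print(split_product_code("SHIRT-M-BLUE"))
```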
Star Schema
• The star schema separates business
process data into facts and dimensions.
• Facts hold the measurable, quantitative data
about a business; dimensions are
descriptive attributes related to fact data.
• Examples of fact data include sales price,
sale quantity, and time, distance, speed, and
weight measurements.
Components of a star schema
• Fact tables contain factual or quantitative data
• Excellent for ad-hoc queries, but bad for online transaction processing
Star schema example
• The fact table provides statistics for sales broken down by product, period and store dimensions
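This example can be sketched as SQL run through Python's `sqlite3`. It has three dimension tables (product, period, store) and one sales fact table. The fact table's primary key is the combination of the dimension keys, which is a common design choice. The column names and sample rows are hypothetical. The query only illustrates the join-then-aggregate pattern that star schemas support well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: descriptive attributes, one row per product / period / store.
conn.executescript("""
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: quantitative measures plus a foreign key to each dimension.
    CREATE TABLE sales (
        product_key  INTEGER REFERENCES product (product_key),
        period_key   INTEGER REFERENCES period (period_key),
        store_key    INTEGER REFERENCES store (store_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_key, period_key, store_key)
    );

    INSERT INTO product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO period  VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO store   VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO sales   VALUES
        (1, 1, 1, 2, 2400.0), (2, 1, 1, 5, 3000.0),
        (1, 2, 2, 1, 1200.0), (2, 2, 1, 3, 1800.0);
""")

# Typical ad-hoc query: join the fact table to its dimensions, then aggregate.
query = """
    SELECT p.description, t.quarter, SUM(f.dollars_sold) AS revenue
    FROM sales AS f
    JOIN product AS p ON p.product_key = f.product_key
    JOIN period  AS t ON t.period_key  = f.period_key
    GROUP BY p.description, t.quarter
    ORDER BY p.description, t.quarter
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```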
[Figure: Star schema with sample data]