Chapter 13 - Data Warehousing

The document discusses key aspects of a data warehouse including that it is an integrated, subject-oriented, time-variant, non-volatile database that supports decision making. It provides details on how data is organized, including that it is integrated from multiple sources, arranged by subject area, represents data flow over time, and is never removed once added. The document also discusses data marts, OLAP, star schemas, and relational OLAP as techniques for organizing and analyzing multidimensional data to support decision making.
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 13 – Data Warehousing
Data Warehouse
 The data repository suited to a decision support system (DSS) is
the DATA WAREHOUSE

 Definition: Integrated, Subject-Oriented,
Time-Variant, Nonvolatile database that
provides support for decision making
Integrated
 The data warehouse is a centralized,
consolidated database that integrates data
derived from the entire organization
– Multiple Sources
– Diverse Sources
– Diverse Formats
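
As a rough illustration of integrating multiple sources in diverse formats, the sketch below maps a CSV extract and a JSON extract onto one common record layout. The sample data, the field names, and the `integrate` helper are all hypothetical and are not taken from the chapter.

```python
import csv
import io
import json

# Hypothetical extracts from two operational systems (illustrative data only).
CSV_SOURCE = "cust_id,amount\n101,25.50\n102,10.00\n"
JSON_SOURCE = '[{"customer": 103, "total": "7.25"}]'


def integrate(csv_text, json_text):
    """Map both sources onto one common record layout."""
    rows = []
    # Source 1: CSV with columns cust_id and amount.
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "customer_id": int(rec["cust_id"]),
            "amount": float(rec["amount"]),
            "source": "csv",
        })
    # Source 2: JSON with different field names for the same concepts.
    for rec in json.loads(json_text):
        rows.append({
            "customer_id": int(rec["customer"]),
            "amount": float(rec["total"]),
            "source": "json",
        })
    return rows


integrated = integrate(CSV_SOURCE, JSON_SOURCE)
```

Every record ends up with the same keys and types regardless of where it came from, which is the point of the integration step.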
Subject-Oriented
 Data is arranged and optimized to provide
answers to questions from diverse functional
areas
– Data is organized and summarized by topic
 Sales / Marketing / Finance / Distribution / Etc.
Time-Variant
 The Data Warehouse represents the flow of
data through time
 Can contain projected data from statistical
models
 Data is periodically uploaded, then
time-dependent data is recomputed
Nonvolatile
 Once data is entered it is NEVER removed
 Represents the company’s entire history
– Near term history is continually added to it
– Always growing
– Must support terabyte databases and
multiprocessors
 Read-Only database for data analysis and
query processing
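
One way to picture the nonvolatile, read-only behavior is an append-only table: batches are added with a snapshot date, and queries only read. The `WarehouseTable` class and its method names are assumptions made for this sketch, not an actual warehouse API.

```python
from datetime import date


class WarehouseTable:
    """Append-only table: rows can be added and read, never changed or removed."""

    def __init__(self):
        self._rows = []

    def load_batch(self, snapshot_date, records):
        # Periodic batch load: new history is appended, old history stays.
        for rec in records:
            self._rows.append((snapshot_date, dict(rec)))

    def as_of(self, when):
        # Read-only query: return copies of all rows loaded on or before `when`.
        return [dict(rec) for snap, rec in self._rows if snap <= when]


sales = WarehouseTable()
sales.load_batch(date(2024, 1, 31), [{"product": "A", "units": 10}])
sales.load_batch(date(2024, 2, 29), [{"product": "A", "units": 12}])
january_view = sales.as_of(date(2024, 1, 31))
```

The class deliberately offers no update or delete method, so the stored history can only grow.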
Data Marts
 Small Data Stores
 More manageable data sets
 Targeted to meet the needs of small groups
within the organization

 Small, Single-Subject data warehouse
subset that provides decision support to a
small group of people
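
A data mart can be sketched as a single-subject subset copied out of the warehouse. The example below uses Python's built-in `sqlite3` module; the `warehouse` and `sales_mart` tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (subject TEXT, period TEXT, amount REAL);
INSERT INTO warehouse VALUES
    ('sales',   '2024-01', 100.0),
    ('finance', '2024-01',  40.0),
    ('sales',   '2024-02', 120.0);
-- The mart holds only the rows for one subject (sales).
CREATE TABLE sales_mart AS
    SELECT period, amount FROM warehouse WHERE subject = 'sales';
""")

mart_rows = conn.execute(
    "SELECT period, amount FROM sales_mart ORDER BY period"
).fetchall()
```

The mart is smaller and easier to manage than the full warehouse table because it carries one subject only.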
OLAP
 Online Analytical Processing Tools
 DSS tools that use multidimensional data
analysis techniques
– Support for a DSS data store
– Data extraction and integration filter
– Specialized presentation interface
12 Rules of a Data Warehouse
 Data Warehouse and Operational
Environments are Separated
 Data is integrated
 Contains historical data over a long period
of time
 Data is snapshot data captured at a given
point in time
 Data is subject-oriented
12 Rules of a Data Warehouse
 Mainly read-only with periodic batch updates
 Development Life Cycle has a data-driven
approach versus the traditional
process-driven approach
 Data contains several levels of detail
– Current, Old, Lightly Summarized, Highly
Summarized
12 Rules of a Data Warehouse
 Environment is characterized by Read-only
transactions to very large data sets
 System that traces data sources, transformations,
and storage
 Metadata is a critical component
– Source, transformation, integration, storage,
relationships, history, etc.
 Contains a chargeback mechanism for resource
usage that enforces optimal use of data by end
users
OLAP
 Need for More Intensive Decision Support
 4 Main Characteristics
– Multidimensional data analysis
– Advanced Database Support
– Easy-to-use end-user interfaces
– Support Client/Server architecture
Multidimensional Data Analysis
Techniques
 Advanced Data Presentation Functions
– 3-D graphics, Pivot Tables, Crosstabs, etc.
– Compatible with Spreadsheets & Statistical
packages
– Advanced data aggregations, consolidation and
classification across time dimensions
– Advanced computational functions
– Advanced data modeling functions
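
A crosstab is one of the simplest multidimensional presentations: one dimension becomes the rows, another becomes the columns, and a measure is aggregated in each cell. The sketch below builds one in plain Python; the sample sales rows and the `crosstab` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, amount).
sales = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 80),
    ("West", "Q2", 120),
    ("East", "Q1", 50),
]


def crosstab(rows):
    """Sum amounts into a region-by-quarter table."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    # Convert the nested defaultdicts to plain dicts for display.
    return {region: dict(cols) for region, cols in table.items()}


pivot = crosstab(sales)
```

Here the two East/Q1 rows consolidate into a single cell, which is the aggregation step a pivot table performs.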
Advanced Database Support
 Advanced Data Access Features
– Access to many kinds of DBMS’s, flat files, and
internal and external data sources
– Access to aggregated data warehouse data
– Advanced data navigation (drill-downs and roll-
ups)
– Ability to map end-user requests to the
appropriate data source
– Support for Very Large Databases
Easy-to-Use End-User Interface
 Graphical User Interfaces
 Much more useful if access is kept simple
Client/Server Architecture
 Framework for the new systems to be
designed, developed and implemented
 Divide the OLAP system into several
components that define its architecture
– Same Computer
– Distributed among several computers
OLAP Architecture
 3 Main Modules
– GUI
– Analytical Processing Logic
– Data-processing Logic
OLAP Client/Server
Architecture
Relational OLAP
 Relational Online Analytical Processing
– OLAP functionality using relational database
and familiar query tools to store and analyze
multidimensional data
 Multidimensional data schema support
 Data access language & query performance
for multidimensional data
 Support for Very Large Databases
Multidimensional Data Schema
Support
 Decision Support Data tends to be
– Nonnormalized
– Duplicated
– Preaggregated
 Star Schema
– Special Design technique for multidimensional
data representations
– Optimize data query operations instead of data
update operations
Star Schemas
 Data Modeling Technique to map
multidimensional decision support data into
a relational database
 Current Relational modeling techniques do
not serve the needs of advanced data
requirements
Star Schema
 4 Components
– Facts
– Dimensions
– Attributes
– Attribute Hierarchies
Facts
 Numeric measurements (values) that represent a
specific business aspect or activity
 Stored in a fact table at the center of the star
schema
 Contains facts that are linked through their
dimensions
 Can be computed or derived at run time
 Updated periodically with data from operational
databases
Dimensions
 Qualifying characteristics that provide
additional perspectives to a given fact
– DSS data is almost always viewed in relation to
other data
 Dimensions are normally stored in
dimension tables
Attributes
 Dimension Tables contain Attributes
 Attributes are used to search, filter, or classify
facts
 Dimensions provide descriptive characteristics
about the facts through their attributes
 Must define common business attributes that will
be used to narrow a search, group information, or
describe dimensions. (ex.: Time / Location /
Product)
 No mathematical limit to the number of dimensions
(3-D makes it easy to model)
Attribute Hierarchies
 Provides a Top-Down data organization
– Aggregation
– Drill-down / Roll-Up data analysis
 Attributes from different dimensions can be
grouped to form a hierarchy
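
Roll-up and drill-down can be illustrated with a time hierarchy of year, quarter, and month: aggregating at a higher level rolls up, and aggregating at a lower level drills down. The sample facts, the `HIERARCHY` tuple, and the `aggregate` helper are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical facts: (year, quarter, month, sales).
facts = [
    (2023, "Q1", "Jan", 10),
    (2023, "Q1", "Feb", 20),
    (2023, "Q2", "Apr", 30),
    (2024, "Q1", "Jan", 40),
]

# Attribute hierarchy for the time dimension, from top to bottom.
HIERARCHY = ("year", "quarter", "month")


def aggregate(rows, level):
    """Sum sales at the given level of the time hierarchy."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        # The key is the hierarchy path down to the requested level.
        totals[row[:depth]] += row[-1]
    return dict(totals)


by_year = aggregate(facts, "year")        # roll-up to the top level
by_quarter = aggregate(facts, "quarter")  # drill-down one level
```

Moving from `by_year` to `by_quarter` splits each yearly total into its quarterly parts without changing the overall sum.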
Star Schema for Sales
[Figure: a central Fact Table surrounded by Dimension Tables]
Star Schema Representation
 Fact and Dimensions are represented by physical
tables in the data warehouse database
 Fact tables are related to each dimension table in
a Many to One relationship (Primary/Foreign Key
Relationships)
 Fact Table is related to many dimension tables
– The primary key of the fact table is a composite key
made up of the primary keys of the dimension tables
 Each fact table is designed to answer a specific
DSS question
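
The representation described above can be sketched with Python's built-in `sqlite3` module: three dimension tables, and a fact table whose composite primary key is built from foreign keys to them. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per time period, location, or product.
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, year INTEGER, month TEXT);
CREATE TABLE location_dim (loc_id  INTEGER PRIMARY KEY, region TEXT, city TEXT);
CREATE TABLE product_dim  (prod_id INTEGER PRIMARY KEY, category TEXT, name TEXT);

-- Fact table: many fact rows point to one row in each dimension.
CREATE TABLE sales_fact (
    time_id INTEGER REFERENCES time_dim(time_id),
    loc_id  INTEGER REFERENCES location_dim(loc_id),
    prod_id INTEGER REFERENCES product_dim(prod_id),
    units   INTEGER,
    revenue REAL,
    PRIMARY KEY (time_id, loc_id, prod_id)
);

INSERT INTO time_dim     VALUES (1, 2024, 'Jan'), (2, 2024, 'Feb');
INSERT INTO location_dim VALUES (1, 'East', 'Boston'), (2, 'West', 'Denver');
INSERT INTO product_dim  VALUES (1, 'Toys', 'Kite');
INSERT INTO sales_fact   VALUES (1, 1, 1, 5, 50.0), (2, 1, 1, 3, 30.0), (1, 2, 1, 2, 20.0);
""")

# A DSS-style question: what is the total revenue per region?
rows = conn.execute("""
    SELECT l.region, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.loc_id = f.loc_id
    GROUP BY l.region
    ORDER BY l.region
""").fetchall()
```

The query joins the fact table to a single dimension and groups by one of that dimension's attributes, which is the typical shape of a star schema query.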
Star Schema
 The fact table is always the largest table in
the star schema
 Each dimension record is related to
thousands of fact records
 Star Schema facilitates data retrieval
functions
 DBMS first searches the Dimension Tables
before the larger fact table
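
The idea of searching the dimension tables first can be pictured in two steps: find the matching keys in the small dimension table, then keep only the fact rows that carry those keys. This is a simplified illustration with hypothetical data, not a description of how any particular DBMS optimizer works.

```python
# Hypothetical dimension table: product key -> attributes.
product_dim = {
    1: {"category": "Toys"},
    2: {"category": "Books"},
    3: {"category": "Toys"},
}

# Hypothetical fact table: each row carries a foreign key into product_dim.
sales_fact = [
    {"prod_id": 1, "units": 5},
    {"prod_id": 2, "units": 7},
    {"prod_id": 3, "units": 4},
    {"prod_id": 2, "units": 1},
]


def units_for_category(category):
    # Step 1: search the small dimension table for matching keys.
    keys = {pid for pid, attrs in product_dim.items()
            if attrs["category"] == category}
    # Step 2: sum only the fact rows whose foreign key is in that set.
    return sum(row["units"] for row in sales_fact if row["prod_id"] in keys)


toy_units = units_for_category("Toys")
```

Because the dimension table is much smaller than the fact table, filtering on it first narrows the fact rows that need to be considered.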
Data Warehouse Implementation
 An Active Decision Support Framework
– Not a Static Database
– Always a Work in Process
– Complete Infrastructure for Company-Wide
decision support
– Hardware / Software / People / Procedures /
Data
– Data Warehouse is a critical component of the
Modern DSS – But not the Only critical
component
