Module8 DataWarehousing
Introduction
• Data warehousing’s goal is to make the right
information available at the right time
• Data warehousing involves both a data store (e.g., a database of
some sort) and a process for bringing together
disparate data from throughout an organization for
decision-support purposes
• Relational database management systems (RDBMS),
such as Oracle, DB2, Sybase, Informix, Focus, and SQL
Server, are often used for data warehousing
Definitions of a Data Warehouse
[Figure: operational data (Customers, Orders, Transactions, Vendors, etc.) is copied, organized, and summarized into the Data Warehouse. The Data Warehouse feeds several Data Marts, and each Data Mart supplies decision-support information.]
Data Miners:
• “Farmers” – they know
• “Explorers” – unpredictable
Generic Architecture of Data
• Describes the data in a way that is familiar to business
users, which might be different than the way the
logical model shows it
• Describes the data the way it will be stored in an
RDBMS
Metadata
• General definition: data about data
– Examples:
• A library’s card catalog (metadata) describes publications (data)
• A file system maintains permissions (metadata) about files (data)
• A form of system documentation including:
– Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)
– Description of the contents of each field (e.g., start date)
– Date when data were loaded
– Indication of currency of the data (last updated)
– Mappings between systems (e.g., A.this = B.that)
• Invaluable: without it, this information has to be researched every time it is needed
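The kinds of metadata listed above can be pictured as a small data dictionary. This is only a minimal sketch, not a real metadata repository: the `FieldMetadata` class, the `state` field, its description, and both dates are hypothetical and chosen just to mirror the bullets.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FieldMetadata:
    """Metadata (data about data) for one field. All names are illustrative."""
    name: str
    description: str                      # description of the field's contents
    allowed_values: Optional[frozenset]   # values legally allowed; None = unrestricted
    loaded_on: date                       # date when the data were loaded
    last_updated: date                    # indication of currency of the data
    source_mapping: str                   # mapping between systems


def is_valid(meta: FieldMetadata, value: str) -> bool:
    """Check a data value against the allowed values recorded in the metadata."""
    return meta.allowed_values is None or value in meta.allowed_values


state_meta = FieldMetadata(
    name="state",
    description="Two-letter state code",              # hypothetical description
    allowed_values=frozenset({"AZ", "CA", "OR", "UT", "WA"}),
    loaded_on=date(2024, 1, 15),                      # hypothetical load date
    last_updated=date(2024, 1, 31),                   # hypothetical update date
    source_mapping="A.this = B.that",
)

print(is_valid(state_meta, "CA"))
print(is_valid(state_meta, "TX"))
```

The data value (`"CA"`) and the metadata (`state_meta`) are kept separate, which is the point of the library card catalog analogy.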
Business Rules
• Highest level of abstraction from operational (transaction)
data
• Describes why relationships exist and how they are applied
• Examples:
– Need to have 3 forms of ID for credit
– Only allow a maximum daily withdrawal of $200
– After the 3rd log-in attempt, lock the log-in screen
– Accept no bills larger than $20
– Others???
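Three of the example rules above could be written down as small checks. This is a sketch under assumptions: the function names are hypothetical, the amounts are taken to be whole dollars, and "log-in attempt" is read as a failed attempt.

```python
MAX_DAILY_WITHDRAWAL = 200    # dollars, from the withdrawal rule
MAX_LOGIN_ATTEMPTS = 3        # from the log-in rule
LARGEST_BILL_ACCEPTED = 20    # dollars, from the bill rule


def can_withdraw(withdrawn_today: int, amount: int) -> bool:
    """Allow a withdrawal only if the day's total stays within $200."""
    return amount > 0 and withdrawn_today + amount <= MAX_DAILY_WITHDRAWAL


def is_locked(failed_attempts: int) -> bool:
    """Lock the log-in screen once the 3rd attempt has failed (assumed reading)."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS


def accepts_bill(denomination: int) -> bool:
    """Accept no bills larger than $20."""
    return 0 < denomination <= LARGEST_BILL_ACCEPTED


print(can_withdraw(150, 50), can_withdraw(150, 60))
print(is_locked(2), is_locked(3))
print(accepts_bill(20), accepts_bill(50))
```

Keeping the limits as named constants separates the rule itself from the code that applies it, so a change in policy touches one line.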
General Architecture for Data Warehousing
• Source systems
• Extraction, (Clean), Transformation, & Load (ETL)
• Central repository
• Metadata repository
• Data marts
• Operational feedback
• End users (business)
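The path from source systems through ETL into the central repository might look roughly like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database standing in for the repository. The source rows, the `orders` table, and the cleaning rule (drop rows with a missing amount) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system.
source_rows = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80"},
    {"order_id": "3", "region": "NORTH", "amount": ""},   # missing amount
]


def extract():
    """Extract: read raw rows from the source system (a list here)."""
    return list(source_rows)


def clean(rows):
    """Clean: drop rows whose amount is missing (an assumed rule)."""
    return [r for r in rows if r["amount"].strip()]


def transform(rows):
    """Transform: normalize region names and convert strings to numbers."""
    return [
        (int(r["order_id"]), r["region"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the prepared rows into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")   # stands in for the central repository
load(warehouse, transform(clean(extract())))

loaded = warehouse.execute(
    "SELECT order_id, region, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

A data mart could then be built from `orders` as a summarized table or view for one group of end users.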
Where does OLAP fit in?
OLAP Overview
• Interactive, exploratory analysis of
multidimensional data to discover patterns
[Figure: data cube with axes gender, age, and accidents]
OLAP Architecture
Server Options
• Single processor
• Symmetric multiprocessor (SMP)
• Massively parallel processor (MPP)
OLAP Server Options
• ROLAP (Relational)
• MOLAP (Multidimensional)
• HOLAP (Hybrid)
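Roughly speaking, a ROLAP server keeps the data in relational tables and aggregates with SQL at query time, a MOLAP server keeps pre-aggregated cells in a multidimensional structure, and a HOLAP server combines the two. The sketch below contrasts the first two on the same question. The fact rows, the `sales` table, and the dictionary used as a "cube" are hypothetical simplifications, not how any particular product works.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (region, year, sales amount).
facts = [
    ("North", 1996, 100.0),
    ("North", 1997, 150.0),
    ("South", 1997, 70.0),
]

# ROLAP-style: facts stay in a relational table; SQL aggregates at query time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ? AND year = ?",
    ("North", 1997),
).fetchone()[0]

# MOLAP-style: cells are aggregated up front, then simply looked up.
cube = defaultdict(float)
for region, year, amount in facts:
    cube[(region, year)] += amount
molap_total = cube[("North", 1997)]

print(rolap_total, molap_total)
```

Both approaches return the same total; the difference is whether the aggregation work happens when the question is asked or when the data is loaded.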
OLAP – Online Analytical Processing
• A definition:
OLAP Cube - 5
[Figure: three-dimensional display. Page: Region (North). Columns: Sales. Rows: Year (1997, Total).]
OLAP Cube - 6
Six-Dimensional Cube

Dimension           Example
Brand               Mt. Airy
Store               Atlanta
Customer segment    Business
Product group       Desks
Period              January
Variable            Units sold
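One way to picture a cell in such a cube is as a value addressed by one coordinate per dimension. The sketch below assumes a plain dictionary as the cube. The `Cell` type, the `slice_cube` helper, the February row, and the values 42 and 37 are all hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Coordinates of one cell: one value for each of the six dimensions."""
    brand: str
    store: str
    customer_segment: str
    product_group: str
    period: str
    variable: str


# Hypothetical cube contents; the numbers are made up.
cube = {
    Cell("Mt. Airy", "Atlanta", "Business", "Desks", "January", "Units sold"): 42,
    Cell("Mt. Airy", "Atlanta", "Business", "Desks", "February", "Units sold"): 37,
}


def slice_cube(cube, **fixed):
    """Return the cells whose coordinates match every fixed dimension value."""
    return {
        cell: value
        for cell, value in cube.items()
        if all(getattr(cell, dim) == val for dim, val in fixed.items())
    }


january = slice_cube(cube, period="January")
print(len(january))
```

Fixing one dimension (here `period`) and leaving the others free is the "slice" operation; fixing all six picks out a single cell.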
Rotation (Pivot Table)
Drill Down
Region           Sales variance
Africa           105%
Asia             57%
Europe           122%
North America    97%
Pacific          85%
South America    163%
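Rotation and drill-down can both be sketched with one small aggregation helper. The fact rows below (regions, countries, years, sales) are hypothetical and unrelated to the variance figures in the table; `pivot` is an illustrative function, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical facts: (region, country, year, sales).
facts = [
    ("Europe", "France", 1996, 10),
    ("Europe", "France", 1997, 12),
    ("Europe", "Spain", 1997, 8),
    ("Asia", "Japan", 1996, 20),
    ("Asia", "Japan", 1997, 15),
]


def pivot(facts, row_idx, col_idx, value_idx=3):
    """Sum the measure into a {row: {column: total}} table."""
    table = defaultdict(lambda: defaultdict(int))
    for fact in facts:
        table[fact[row_idx]][fact[col_idx]] += fact[value_idx]
    return {row: dict(cols) for row, cols in table.items()}


# Rows = region, columns = year.
by_region = pivot(facts, row_idx=0, col_idx=2)

# Rotation: the same data with rows and columns swapped.
rotated = pivot(facts, row_idx=2, col_idx=0)

# Drill down: expand Europe from the region level to the country level.
europe_only = [f for f in facts if f[0] == "Europe"]
drilled = pivot(europe_only, row_idx=1, col_idx=2)

print(by_region)
print(rotated)
print(drilled)
```

Rotation changes only which dimension is shown on which axis; drill-down moves to a finer level of one dimension, so the country totals for Europe add back up to the region total.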