Module8 DataWarehousing

Uploaded by

Bhumika Kukade
Module 8: Data Warehousing
Introduction
• Data warehousing’s goal is to make the right
information available at the right time
• Data warehousing is both a data store (e.g., a database of
some sort) and a process for bringing together
disparate data from throughout an organization for
decision-support purposes
• Relational database management systems (RDBMS),
such as Oracle, DB2, Sybase, Informix, Focus, SQL
Server, etc. are often used for data warehousing
Definitions of a Data Warehouse

1. “A subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process”
- W.H. Inmon

2. “A copy of transaction data, specifically structured for query
and analysis”
- Ralph Kimball
Data Warehouse
• For organizational learning to take place, data from
many sources must be gathered together and
organized in a consistent and useful way – hence,
Data Warehousing (DW)
• DW allows an organization (enterprise) to remember
what it has noticed about its data
• Data Mining techniques make use of the data in a
Data Warehouse
Data Warehouse
[Figure: an enterprise “database” (customers, orders, transactions,
vendors, etc.) is copied, organized, and summarized into the data
warehouse, which supports data mining.]
Data Miners:
• “Farmers” – they know
• “Explorers” - unpredictable
Data Warehouse
• A data warehouse is a copy of transaction data specifically
structured for querying, analysis, reporting, and more
rigorous data mining
• Note that the data warehouse contains a copy of the
transactions which are not updated or changed later by the
transaction system
• Also note that this data is specially structured, and may have
been transformed when it was copied into the data
warehouse
Data Mart

• A Data Mart is a smaller, more focused Data
Warehouse – a mini-warehouse.
• A Data Mart typically reflects the business
rules of a specific business unit within an
enterprise.
Data Warehouse to Data Mart

[Figure: the Data Warehouse feeds several Data Marts; each Data Mart
provides decision support information.]
Generic Architecture of Data

(synonym) Transaction data


Transaction (Operational) Data
• Operational (production) systems create massive numbers of
transactions, such as sales, purchases, deposits, withdrawals,
returns, refunds, phone calls, toll roads, web site “hits”, etc…
• Transactions are the base level of data – the raw material for
understanding customer behavior
• Unfortunately, operational systems change due to changing
business needs
• Fortunately, operational systems can usually be changed to
support changing business needs
• Data warehousing strategies need to be aware of operational
system changes
Operational Summary Data
• Summaries are for a specific time period and utilize the
transaction data for that time period
• Other examples???
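
As a rough illustration of a period summary, the sketch below rolls individual transactions up into per-month counts and totals. The transaction records, their layout, and the function name summarize_by_month are invented for the example and do not come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (date, kind, amount). Invented for illustration.
transactions = [
    (date(2024, 1, 5), "sale", 120.0),
    (date(2024, 1, 20), "refund", -20.0),
    (date(2024, 2, 3), "sale", 75.5),
    (date(2024, 2, 17), "sale", 24.5),
]

def summarize_by_month(rows):
    """Return {(year, month): {"count": n, "total": amount}} for the rows."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for day, _kind, amount in rows:
        bucket = summary[(day.year, day.month)]
        bucket["count"] += 1
        bucket["total"] += amount
    return dict(summary)

monthly = summarize_by_month(transactions)
for period, stats in sorted(monthly.items()):
    print(period, stats)
```

Each summary row depends only on the transactions that fall inside its period, so a summary can always be rebuilt from the transaction data.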
Decision Support Summary Data
• The data that are used to help make decisions about the
business
– Financial Data, such as:
• Income Statements (Profit & Loss)
• Balance Sheets (Assets – Liabilities = Net Worth)
– Sales summaries
– Other examples???
• Data warehouses maintain this type of data; however,
financial data “of record” (for audit purposes) usually comes
from the operational databases and not the data warehouse (confusing???)
• Generally, it is a bad idea to use the same system for analytic
and operational purposes
Database Schema
• Database schema defines the structure of data, not
the values of the data (e.g., first name, last name =
structure; Ron Norman = values of the data)
• In RDBMS:
– Columns = fields = attributes (A,B,C)
– Rows = records = tuples (1-7)
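
The split between structure and values can be seen with Python's built-in sqlite3 module. This is only a sketch: the table name person and the column names are assumptions chosen to match the slide's first name / last name example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: defines structure only (column names and types), no values yet.
# The table name "person" is a hypothetical choice for this example.
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT)")

# Values: one row (record, tuple) that conforms to the schema.
conn.execute("INSERT INTO person VALUES (?, ?)", ("Ron", "Norman"))

# Column names come from the schema, not from the stored rows.
columns = [info[1] for info in conn.execute("PRAGMA table_info(person)")]
rows = conn.execute("SELECT * FROM person").fetchall()

print(columns)
print(rows)
```

The column list is available even when the table is empty, which is the sense in which the schema is independent of the data values.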
Logical & Physical Database Schema

• Logical: Describes the data in a way that is familiar to business
users
• Physical: Describes the data the way it will be stored in an
RDBMS, which might be different than the way the
logical shows it
Metadata
• General definition: Data about data !!!
– Examples:
• A library’s card catalog (metadata) describes publications (data)
• A file system maintains permissions (metadata) about files (data)
• A form of system documentation including:
– Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)
– Description of the contents of each field (e.g., start date)
– Date when data were loaded
– Indication of currency of the data (last updated)
– Mappings between systems (e.g., A.this = B.that)
• Invaluable: without it, this information has to be researched each time
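
One possible way to hold such metadata is a small dictionary per field, as sketched below. The field name state, the description text, the load date, and the helper is_valid are all hypothetical; the allowed codes are only the ones named on the slide, whose list ends in "etc.".

```python
from datetime import date

# Hypothetical metadata for a "state" field; every entry is illustrative.
field_metadata = {
    "state": {
        "description": "Two-letter state code",
        # Only the codes named on the slide; the real list is longer.
        "allowed_values": {"AZ", "CA", "OR", "UT", "WA"},
        "loaded_on": date(2024, 1, 31),   # invented load date
        "maps_to": "B.that",              # mapping between systems
    }
}

def is_valid(field, value, metadata=field_metadata):
    """Check a value against the allowed values recorded in the metadata."""
    return value in metadata[field]["allowed_values"]

print(is_valid("state", "CA"))
print(is_valid("state", "ZZ"))
```

Keeping the allowed values in metadata rather than in program logic means a load process can validate incoming data without hard-coding the list.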
Business Rules
• Highest level of abstraction from operational (transaction)
data
• Describes why relationships exist and how they are applied
• Examples:
– Need to have 3 forms of ID for credit
– Only allow a maximum daily withdrawal of $200
– After the 3rd log-in attempt, lock the log-in screen
– Accept no bills larger than $20
– Others???
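
Three of the example rules above can be written as small checks. This is a sketch under assumptions: the function names are hypothetical, and "log-in attempt" is read here as a failed attempt.

```python
MAX_DAILY_WITHDRAWAL = 200  # dollars, from the slide's example
MAX_LOGIN_ATTEMPTS = 3      # lock after the 3rd attempt
MAX_BILL = 20               # largest bill accepted, in dollars

def withdrawal_allowed(withdrawn_today, amount):
    """True if the new withdrawal keeps the daily total within the limit."""
    return withdrawn_today + amount <= MAX_DAILY_WITHDRAWAL

def should_lock(failed_attempts):
    """True once the number of failed log-in attempts reaches the limit."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS

def bill_accepted(denomination):
    """True if the bill is no larger than the maximum accepted bill."""
    return denomination <= MAX_BILL

print(withdrawal_allowed(150, 50), withdrawal_allowed(150, 60))
print(should_lock(2), should_lock(3))
print(bill_accepted(20), bill_accepted(50))
```

The limits are named constants so that a change in the business rule is a change in one place.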
General Architecture for Data Warehousing

• Source systems
• Extraction, (Clean), Transformation, & Load (ETL)
• Central repository
• Metadata repository
• Data marts
• Operational feedback
• End users (business)
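
The ETL step in this list could look roughly like the sketch below, which uses an in-memory SQLite database as a stand-in for the central repository. The source rows, the sales table, and the four function names are assumptions made for illustration, not part of the slides.

```python
import sqlite3

# Hypothetical rows from a source system; names and values are invented.
source_rows = [
    {"customer": " alice ", "amount": "10.50"},
    {"customer": "BOB", "amount": "4.25"},
    {"customer": "", "amount": "3.00"},  # dirty record: missing customer
]

def extract():
    """Pull the raw rows from the (simulated) source system."""
    return list(source_rows)

def clean(rows):
    """Drop rows whose customer name is blank."""
    return [r for r in rows if r["customer"].strip()]

def transform(rows):
    """Normalize names and convert amounts from text to numbers."""
    return [(r["customer"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(extract())))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the four stages as separate functions mirrors the way the slide separates extraction, cleaning, transformation, and loading.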
Where does OLAP fit in?
OLAP Overview
• Interactive, exploratory analysis of
multidimensional data to discover patterns

[Figure: multidimensional view labeled gender, age, and accidents]
OLAP Architecture
Server Options

• Single processor
• Symmetric multiprocessor (SMP)
• Massively parallel processor (MPP)
OLAP Server Options

• ROLAP (Relational)

• MOLAP (Multidimensional)

• HOLAP (Hybrid)
OLAP – Online Analytical Processing

• A definition:

• Data representation is in the form of a CUBE


• OLAP goes beyond SQL with its analysis capabilities
• Key feature of OLAP: Relevant multi-dimensional views
such as products, time, geography
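
A cube can be pictured as cells keyed by one value per dimension, with each view produced by summing away the dimensions that are not of interest. The sketch below assumes three dimensions (product, time, geography) and invented unit counts; the function name view is hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "time", "geography")

# Hypothetical cube cells: (product, time, geography) -> units sold.
cube = {
    ("desk", "Jan", "North"): 10,
    ("desk", "Feb", "North"): 12,
    ("desk", "Jan", "South"): 7,
    ("chair", "Jan", "North"): 20,
    ("chair", "Feb", "South"): 5,
}

def view(cells, *keep):
    """Aggregate the cube down to the dimensions named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)

by_product = view(cube, "product")
by_time_geo = view(cube, "time", "geography")
print(by_product)
print(by_time_geo)
```

The same cells answer both questions; only the choice of dimensions to keep changes, which is what makes the views "multi-dimensional".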
OLAP Cube - 1
OLAP Cube - 2
OLAP Cube - 3
• Star Structure (quite common)
– Facts: Product, Region, Time, Channel, Revenue, Expenses, Units
– Product: Model, Type, Color
– Region: Nation, District, Dealer
– Time: Week, Year
– Channel
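
A star structure like this one might be laid out in SQL as a central fact table that references one table per dimension. The sketch below follows one reading of the figure labels; the _dim suffixes, the integer keys, the channel name column, and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE region_dim  (region_id  INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);
CREATE TABLE time_dim    (time_id    INTEGER PRIMARY KEY, week INTEGER, year INTEGER);
CREATE TABLE channel_dim (channel_id INTEGER PRIMARY KEY, name TEXT);
-- Fact table: one key per dimension plus the numeric measures.
CREATE TABLE facts (
    product_id INTEGER REFERENCES product_dim(product_id),
    region_id  INTEGER REFERENCES region_dim(region_id),
    time_id    INTEGER REFERENCES time_dim(time_id),
    channel_id INTEGER REFERENCES channel_dim(channel_id),
    revenue REAL, expenses REAL, units INTEGER
);
""")

# Invented sample rows, only to make the query below return something.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(1, "M1", "T1", "Red"), (2, "M2", "T1", "Blue")])
conn.execute("INSERT INTO region_dim VALUES (1, 'USA', 'West', 'Dealer A')")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 1, 1996), (2, 1, 1997)])
conn.execute("INSERT INTO channel_dim VALUES (1, 'Retail')")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 100.0, 60.0, 10),
    (2, 1, 1, 1, 50.0, 30.0, 5),
    (1, 1, 2, 1, 120.0, 70.0, 12),
])

# Revenue by product color and year: join the facts to two dimensions.
revenue_by_color_year = conn.execute("""
    SELECT p.color, t.year, SUM(f.revenue)
    FROM facts f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY p.color, t.year
    ORDER BY p.color, t.year
""").fetchall()
print(revenue_by_color_year)
```

Every analytic query has the same shape: join the fact table to the dimensions of interest, group by their attributes, and aggregate a measure.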
OLAP Cube - 4

[Figure: “The Cube” of Sales, with products (Red blob, Blue blob) as
rows and years (1996, 1997) as columns]
OLAP Cube - 5

Three-Dimensional Cube Display
Page:     Region: North
Columns:  Sales (Red blob, Blue blob, Total)
Rows:     Year (1996, 1997, Total)
OLAP Cube - 6

Six-Dimensional Cube

Dimension          Example
Brand              Mt. Airy
Store              Atlanta
Customer segment   Business
Product group      Desks
Period             January
Variable           Units sold
Rotation (Pivot Table)
Drill Down
Region Sales variance
Africa 105%
Asia 57%
Europe 122%
North America 97%
Pacific 85%
South America 163%

Nation Sales variance


China 123%
Japan 52%
India 87%
Singapore 95%
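
Drill down moves from a summary level (region) to the detail beneath it (nation). The sketch below shows the mechanics with invented sales amounts; they are not the variance percentages from the tables above, and the function names roll_up and drill_down are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, nation, sales). Figures are invented.
sales = [
    ("Asia", "China", 40),
    ("Asia", "Japan", 25),
    ("Asia", "India", 15),
    ("Europe", "France", 30),
    ("Europe", "Spain", 20),
]

def roll_up(records):
    """Summarize sales at the region level."""
    totals = defaultdict(int)
    for region, _nation, amount in records:
        totals[region] += amount
    return dict(totals)

def drill_down(records, region):
    """Show the nation-level detail behind one region's total."""
    totals = defaultdict(int)
    for rec_region, nation, amount in records:
        if rec_region == region:
            totals[nation] += amount
    return dict(totals)

by_region = roll_up(sales)
asia_detail = drill_down(sales, "Asia")
print(by_region)
print(asia_detail)
```

With additive measures such as sales, the drilled-down values sum back to the region total; ratios such as variance percentages do not, and have to be recomputed at each level.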
Results of Data Mining Include:
• Forecasting what may happen in the future
• Classifying people or things into groups by
recognizing patterns
• Clustering people or things into groups based
on their attributes
• Associating what events are likely to occur
together
• Sequencing what events are likely to lead to
later events
