CH-2 Data Warehouse and OLAP

Data Mining By Gidena 1

• Data warehouses have been defined in many different ways, but there is no rigorous definition.
◦ Loosely speaking, a data warehouse refers to a database that is maintained separately from the organization's operational databases.
• A short but comprehensive definition is given by Inmon:
◦ "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process."
• We will examine each of these defining elements one by one.

• Subject-oriented: organized around major subjects, such as customer, product, and sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.

• Integrated: constructed by integrating multiple, heterogeneous data sources
◦ relational databases, flat files, online transaction records
• Data cleaning and data integration techniques are applied.
◦ They ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources.
◦ When data is moved to the warehouse, it is converted into this consistent form.
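
The conversion step can be illustrated with a minimal sketch. It assumes two hypothetical sources that describe the same customers but disagree on field names (`cust_id` vs. `customer_number`), on the encoding of gender (`"M"/"F"` vs. `1/0`, where the 1 = M mapping is itself an assumption), and on the unit of the balance (dollars vs. cents). The sketch maps both onto one warehouse schema.

```python
# Hypothetical records from two heterogeneous sources describing the same subject (customers).
source_a = [
    {"cust_id": "C1", "gender": "M", "balance_usd": 120.50},
    {"cust_id": "C2", "gender": "F", "balance_usd": 80.00},
]
source_b = [
    {"customer_number": "C3", "sex": 1, "balance_cents": 4500},
    {"customer_number": "C4", "sex": 0, "balance_cents": 999},
]


def from_source_a(rec):
    """Map a source-A record onto the warehouse schema (it already uses M/F and dollars)."""
    return {"customer_id": rec["cust_id"], "gender": rec["gender"], "balance": rec["balance_usd"]}


def from_source_b(rec):
    """Map a source-B record: rename fields, decode 1/0 to M/F (assumed), convert cents to dollars."""
    return {
        "customer_id": rec["customer_number"],
        "gender": "M" if rec["sex"] == 1 else "F",
        "balance": rec["balance_cents"] / 100,
    }


def integrate(a_records, b_records):
    """Return one consistently named and encoded list of customer records."""
    return [from_source_a(r) for r in a_records] + [from_source_b(r) for r in b_records]


warehouse_customers = integrate(source_a, source_b)
for row in warehouse_customers:
    print(row)
```

Real integration also involves cleaning (missing values, duplicates), but the renaming and re-encoding shown here is the part that produces the naming and encoding consistency described above.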

• Time-variant: the time horizon of the data warehouse is significantly longer than that of operational systems.
– Operational database: current-value data.
– Data warehouse: provides information from a historical perspective (e.g., the past 5-10 years).
• Every key structure in the data warehouse
– contains an element of time, explicitly or implicitly,
– whereas the key of operational data may or may not contain a "time element".
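
A minimal sketch of the contrast, assuming a hypothetical account-balance attribute: the operational store overwrites the current value, while the warehouse keeps periodic snapshots whose key explicitly includes a date.

```python
from datetime import date

# Operational view: one current value per account, overwritten on every update.
operational_balance = {"A1": 100}

# Warehouse view: every snapshot is kept, and the key includes a time element.
warehouse_balance = {}  # (snapshot_date, account_id) -> balance


def take_snapshot(snapshot_date, operational):
    """Copy the current operational values into the warehouse under a dated key."""
    for account_id, balance in operational.items():
        warehouse_balance[(snapshot_date, account_id)] = balance


take_snapshot(date(2023, 1, 31), operational_balance)
operational_balance["A1"] = 250  # the operational system simply overwrites the value
take_snapshot(date(2023, 2, 28), operational_balance)

history = sorted((d, b) for (d, acct), b in warehouse_balance.items() if acct == "A1")
print(operational_balance["A1"])  # only the current value: 250
print(history)                    # both dated snapshots survive
```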

• Nonvolatile: a physically separate store of data, transformed from the operational environment.
• Operational updates of data do not occur in the data warehouse environment.
◦ The warehouse does not require transaction processing, recovery, or concurrency control mechanisms.
◦ It requires only two operations: initial loading of data and access of data.
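
A toy sketch of a nonvolatile store, assuming an in-memory list stands in for a warehouse table. The hypothetical `NonvolatileStore` class exposes only the two operations named above, loading and access; there is no update or delete method, and loaded rows are wrapped in read-only mappings so they cannot be modified in place.

```python
from types import MappingProxyType


class NonvolatileStore:
    """A toy warehouse table that supports only loading and reading (no updates or deletes)."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Bulk-load (append) transformed rows; rows already loaded are never modified."""
        self._rows.extend(MappingProxyType(dict(r)) for r in rows)

    def query(self, predicate=lambda row: True):
        """Read access only: return the rows that satisfy the predicate."""
        return [row for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"year": 2022, "sales": 500}, {"year": 2023, "sales": 650}])
recent = store.query(lambda row: row["year"] >= 2023)
print([dict(r) for r in recent])
```

Because nothing is ever updated concurrently, the class needs no locking or rollback logic, which mirrors why the warehouse can do without transaction processing and concurrency control.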

• Data warehousing is the process of constructing and using data warehouses.
• Constructing a data warehouse requires data cleaning, data integration, and data consolidation.
• Using a data warehouse usually requires a collection of decision support systems.
• These allow "knowledge workers" (such as managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data and to make sound decisions based on the information in the warehouse.

• OLTP (On-Line Transaction Processing)
◦ Major task of traditional relational DBMSs
◦ Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (On-Line Analytical Processing)
◦ Major task of data warehouse systems
◦ Data analysis and decision making

• Distinct features (OLTP vs. OLAP):
– User and system orientation:
• OLTP is a customer-oriented system used for transaction and query processing by clerks, clients, and information technology professionals.
• OLAP is a market-oriented system used for data analysis by knowledge workers, including managers, executives, and analysts.
– Data contents: OLTP systems contain current, detailed data, whereas OLAP systems contain large amounts of historical, consolidated data and provide facilities for summarization and aggregation.
– Database design: OLTP systems adopt an entity-relationship (ER) data model and an application-oriented database design, whereas OLAP systems typically use a star model and a subject-oriented database design.
– View: OLTP systems focus mainly on current data within an enterprise or department, whereas OLAP systems often span multiple versions of a database schema due to the evolutionary process of the enterprise.
– Access patterns: OLTP access consists mainly of short transactions that read and update data, whereas OLAP access is mostly read-only, although the queries are often complex.
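
The two access patterns can be contrasted in a small sketch. It uses Python's built-in `sqlite3` module purely for illustration (the slides do not name a DBMS), and the `account` table and its contents are hypothetical: the OLTP unit of work is a short transaction that updates two rows by primary key, while the OLAP query is read-only and aggregates over all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A1", "North", 100.0), ("A2", "North", 50.0), ("A3", "South", 75.0)],
)

# OLTP-style unit of work: a short transaction that updates a few rows found by primary key.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE account SET balance = balance - 20 WHERE id = 'A1'")
    conn.execute("UPDATE account SET balance = balance + 20 WHERE id = 'A3'")

# OLAP-style query: read-only, scans the rows and aggregates them per branch.
summary = conn.execute(
    "SELECT branch, COUNT(*), SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)  # [('North', 2, 130.0), ('South', 1, 95.0)]
```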

                     OLTP                                    OLAP
users                clerk, IT professional                  knowledge worker
function             day-to-day operations                   decision support
DB design            application-oriented                    subject-oriented
data                 current, up-to-date; detailed, flat     historical; summarized, multidimensional;
                     relational; isolated                    integrated, consolidated
usage                repetitive                              ad hoc
access               read/write; index/hash on primary key   lots of scans
unit of work         short, simple transaction               complex query
# records accessed   tens                                    millions
# users              thousands                               hundreds
DB size              100MB-GB                                100GB-TB
metric               transaction throughput                  query throughput, response time

• In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies.
• This organization provides users with the flexibility to view data from different perspectives.
• Several OLAP data cube operations exist for materializing these different views:
◦ Roll-up (drill-up)
◦ Drill-down (roll-down)
◦ Slice and dice
◦ Pivot (rotate)

• Roll-up (drill-up): summarizes data
◦ by climbing up a concept hierarchy (e.g., from day to week or year) or by dimension reduction
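
Both forms of roll-up can be sketched on a tiny cube stored as a Python dict. The data, the city names, and the day < month hierarchy are assumptions chosen for illustration (the slide's day < week < year hierarchy works the same way); the measure is summed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base cuboid: (day, city) -> units sold.
sales = {
    (date(2024, 1, 3), "Toronto"): 10,
    (date(2024, 1, 20), "Toronto"): 5,
    (date(2024, 1, 20), "Vancouver"): 7,
    (date(2024, 2, 1), "Vancouver"): 4,
}


def roll_up_time(cube):
    """Climb the time hierarchy from day to (year, month), summing the measure."""
    result = defaultdict(int)
    for (day, city), units in cube.items():
        result[((day.year, day.month), city)] += units
    return dict(result)


def roll_up_drop_city(cube):
    """Dimension reduction: remove the city dimension entirely."""
    result = defaultdict(int)
    for (day, _city), units in cube.items():
        result[day] += units
    return dict(result)


by_month = roll_up_time(sales)
by_day = roll_up_drop_city(sales)
print(by_month)
print(by_day)
```

Either way the grand total is unchanged; only the level of detail at which it is reported changes.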

• Drill-down (roll-down): the reverse of roll-up
◦ navigates from a higher-level summary to a lower-level summary (e.g., from region to town) or to detailed data, or introduces new dimensions
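
A minimal sketch of drilling down along a location hierarchy. The hierarchy (city within country, standing in for the slide's town within region) and the sales figures are hypothetical; drill-down here simply returns the lower-level cells that make up one higher-level total.

```python
from collections import defaultdict

# Hypothetical location hierarchy: city -> country (playing the role of town -> region).
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

# Detailed facts at the city level: city -> units sold.
city_sales = {"Toronto": 12, "Vancouver": 8, "Chicago": 5}


def country_summary(facts):
    """Higher-level summary: total units per country."""
    totals = defaultdict(int)
    for city, units in facts.items():
        totals[CITY_TO_COUNTRY[city]] += units
    return dict(totals)


def drill_down(facts, country):
    """Go from one country's summary to the lower-level, per-city figures."""
    return {city: units for city, units in facts.items() if CITY_TO_COUNTRY[city] == country}


print(country_summary(city_sales))        # {'Canada': 20, 'USA': 5}
print(drill_down(city_sales, "Canada"))   # {'Toronto': 12, 'Vancouver': 8}
```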

• Slice:
◦ The slice operation performs a selection on one dimension of a given cube, resulting in a sub-cube (e.g., time = Q1).
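
A sketch of slicing, assuming a hypothetical 3-D cube keyed by (quarter, city, item): fixing time = Q1 selects on a single dimension and leaves a 2-D (city, item) sub-cube.

```python
# Hypothetical 3-D cube: (quarter, city, item) -> dollars sold.
cube = {
    ("Q1", "Toronto", "phone"): 400,
    ("Q1", "Vancouver", "computer"): 825,
    ("Q2", "Toronto", "phone"): 380,
    ("Q2", "Vancouver", "phone"): 290,
}


def slice_cube(cube, quarter):
    """Select one value on the time dimension; the result is a 2-D (city, item) sub-cube."""
    return {(city, item): value for (q, city, item), value in cube.items() if q == quarter}


q1 = slice_cube(cube, "Q1")
print(q1)  # {('Toronto', 'phone'): 400, ('Vancouver', 'computer'): 825}
```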

• Dice:
◦ The dice operation defines a sub-cube by performing a selection on two or more dimensions.
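
Dicing differs from slicing in that it selects on several dimensions at once. The sketch below assumes the same hypothetical (quarter, city, item) key layout and keeps only the cells whose coordinates fall in the chosen set for every dimension.

```python
# Hypothetical 3-D cube: (quarter, city, item) -> dollars sold.
cube = {
    ("Q1", "Toronto", "phone"): 400,
    ("Q1", "Vancouver", "computer"): 825,
    ("Q2", "Toronto", "computer"): 512,
    ("Q3", "Vancouver", "computer"): 600,
    ("Q2", "Chicago", "computer"): 300,
}


def dice(cube, quarters, cities, items):
    """Keep only the cells whose quarter, city and item all fall in the given sets."""
    return {
        (q, city, item): value
        for (q, city, item), value in cube.items()
        if q in quarters and city in cities and item in items
    }


sub = dice(cube, {"Q1", "Q2"}, {"Toronto", "Vancouver"}, {"computer"})
print(sub)
```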

• Pivot (rotate):
◦ reorients the cube to give an alternative presentation of the data, e.g., rotating the axes of a 2D view, or transforming a 3D cube into a series of 2D planes.
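
A sketch of pivoting a 2-D view, assuming the view is a hypothetical nested dict with cities as rows and items as columns. Pivoting swaps the axes without changing any cell value, so pivoting twice returns the original view.

```python
# Hypothetical 2-D view (e.g., after slicing on time = "Q1"): rows are cities, columns are items.
view = {
    "Toronto": {"phone": 400, "computer": 512},
    "Vancouver": {"phone": 290, "computer": 825},
}


def pivot(table):
    """Rotate the axes: old column labels become rows and old row labels become columns."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


rotated = pivot(view)
print(rotated)
# {'phone': {'Toronto': 400, 'Vancouver': 290}, 'computer': {'Toronto': 512, 'Vancouver': 825}}
```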

• A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both.
◦ Top-down: starts with the overall design and planning
– Requires huge investment and commitment
– Appropriate when the technology is mature and well known
◦ Bottom-up: starts with experiments and prototypes
– Appropriate in the early stage of business modeling and technology development
– Enables the business to move forward at considerably less expense and to evaluate the benefits of the technology before making significant commitments
• From the software engineering point of view:
◦ Waterfall: structured and systematic analysis at each step before proceeding to the next
◦ Spiral: rapid generation of increasingly functional systems, with short turnaround times

• In general, the data warehouse design process consists of:
1. Choosing a business process to model, e.g., orders, invoices, sales, shipments, inventory, account administration, general ledger, etc.
2. Choosing the grain (atomic level of data) of the business process that will be represented in the fact table
3. Choosing the dimensions that will apply to each fact table record
4. Choosing the measures that will populate each fact table record
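
The four steps can be sketched as a star schema for a hypothetical retail-sales process. All table and column names are illustrative assumptions, and Python's built-in `sqlite3` module is used only so the DDL can actually be executed; any relational DBMS would do. The composite primary key of the fact table is what enforces the grain chosen in step 2.

```python
import sqlite3

# Step 1: business process to model (hypothetical): retail sales.
# Step 2: grain: one fact row per item, per store, per day.
# Step 3: dimensions that apply to each fact row: time, item, store.
# Step 4: measures that populate each fact row: units_sold, dollars_sold.
DDL = """
CREATE TABLE dim_time  (time_key  INTEGER PRIMARY KEY, day TEXT, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE dim_item  (item_key  INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, town TEXT, region TEXT);
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (time_key, item_key, store_key)  -- the composite key enforces the chosen grain
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

A second fact row for the same (day, item, store) combination is rejected, which is how the schema guards the grain.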

[Figure: Executive Information System / highly summarized data]

• Data warehouses often adopt a three-tier architecture.
• Warehouse database server (the bottom tier)
◦ Almost always a relational DBMS, rarely flat files
◦ Back-end tools and utilities are used to feed data into the bottom tier.
◦ These tools and utilities perform data extraction, cleaning, and transformation, as well as load and refresh functions to update the warehouse.
• OLAP servers (the middle tier)
◦ Implemented either as Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP)
◦ ROLAP: an extended relational DBMS that maps operations on multidimensional data to standard relational operators
◦ MOLAP: a special-purpose server that directly implements multidimensional data and operations
• Clients (the top tier)
◦ Query and reporting tools
◦ Data mining tools
◦ Analysis tools
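
To illustrate what "maps operations on multidimensional data to standard relational operators" can mean in a ROLAP server, the sketch below translates a roll-up (city to country) into a JOIN plus GROUP BY, and an optional slice on time into a WHERE clause. The schema and data are hypothetical, and `sqlite3` stands in for the relational back end; a real ROLAP engine generates much more elaborate SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (loc_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (loc_key INTEGER, quarter TEXT, dollars REAL);
INSERT INTO dim_location VALUES (1, 'Toronto', 'Canada'), (2, 'Vancouver', 'Canada'), (3, 'Chicago', 'USA');
INSERT INTO fact_sales VALUES (1, 'Q1', 400), (2, 'Q1', 825), (3, 'Q1', 300), (1, 'Q2', 512);
""")


def rollup_to_country(conn, quarter=None):
    """Roll-up (city -> country) becomes JOIN + GROUP BY; an optional time slice becomes WHERE."""
    sql = ("SELECT l.country, SUM(f.dollars) FROM fact_sales f "
           "JOIN dim_location l ON f.loc_key = l.loc_key")
    params = ()
    if quarter is not None:
        sql += " WHERE f.quarter = ?"
        params = (quarter,)
    sql += " GROUP BY l.country ORDER BY l.country"
    return conn.execute(sql, params).fetchall()


print(rollup_to_country(conn))        # [('Canada', 1737.0), ('USA', 300.0)]
print(rollup_to_country(conn, "Q1"))  # [('Canada', 1225.0), ('USA', 300.0)]
```

A MOLAP server would instead answer the same request from a precomputed multidimensional array rather than by generating SQL.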

[Figure: The Complete Data Warehouse System. Information sources (operational DBs, semistructured sources) are extracted, transformed, loaded, and refreshed into the data warehouse server (Tier 1) and data marts; OLAP servers (Tier 2, e.g., ROLAP, MOLAP) serve the clients (Tier 3): analysis, query/reporting, and data mining.]

• From the architecture point of view, there are three data warehouse models: the enterprise warehouse, the data mart, and the virtual warehouse.

• Enterprise warehouse
– collects all of the information about subjects spanning the entire organization (customers, products, sales, assets, personnel)
– requires extensive business modeling (may take years to design and build)

• Data mart
◦ A subset of corporate-wide data that is of value to a specific group of users
◦ Its scope is confined to specific, selected subjects.
◦ For example, a marketing data mart may confine its subjects to customer, product, and sales.
◦ Depending on the data source, data marts can be dependent or independent.
– Dependent data marts are sourced directly from the enterprise data warehouse.
– Independent data marts are sourced from operational systems, external information providers, or data generated locally within a particular department or geographic area.
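
A minimal sketch of a dependent marketing data mart, assuming hypothetical enterprise-warehouse tables `ew_sales` and `ew_personnel` and using `sqlite3` for illustration. The mart is derived directly from the warehouse and keeps only the customer, product, and sales subjects; personnel data never reaches it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise warehouse tables spanning several subjects.
conn.executescript("""
CREATE TABLE ew_sales (customer TEXT, product TEXT, region TEXT, amount REAL);
CREATE TABLE ew_personnel (employee TEXT, department TEXT, salary REAL);
INSERT INTO ew_sales VALUES ('C1', 'laptop', 'East', 900), ('C2', 'phone', 'West', 300), ('C1', 'phone', 'East', 250);
INSERT INTO ew_personnel VALUES ('E1', 'HR', 1200);
""")

# Dependent marketing data mart: built directly from the enterprise warehouse and
# confined to the customer, product and sales subjects (personnel data is left out).
conn.executescript("""
CREATE TABLE mart_marketing AS
SELECT customer, product, SUM(amount) AS total_amount
FROM ew_sales
GROUP BY customer, product;
""")

rows = conn.execute("SELECT * FROM mart_marketing ORDER BY customer, product").fetchall()
print(rows)
```

An independent mart would be loaded the same way, except that the `SELECT` would read from operational or external sources instead of the enterprise warehouse.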

• Virtual warehouse
◦ A set of views over operational databases
◦ Only some of the possible summary views may be materialized
◦ Easy to build, but requires excess capacity on operational database servers
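
A sketch of a virtual warehouse, assuming a hypothetical operational `orders` table in `sqlite3`. The summary is defined as a view, so every query recomputes it on the operational server (hence the need for spare capacity), while one chosen summary is materialized as a table and therefore stays frozen until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational table, still being updated by day-to-day transactions.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, qty INTEGER);
INSERT INTO orders (product, qty) VALUES ('phone', 2), ('laptop', 1), ('phone', 3);

-- Virtual warehouse: a summary view defined over the operational table (nothing is copied).
CREATE VIEW v_product_totals AS
SELECT product, SUM(qty) AS total_qty FROM orders GROUP BY product;

-- Materialize one chosen summary view as a real table (a snapshot).
CREATE TABLE m_product_totals AS SELECT * FROM v_product_totals;
""")

conn.execute("INSERT INTO orders (product, qty) VALUES ('phone', 5)")  # new operational activity

view_rows = conn.execute("SELECT * FROM v_product_totals ORDER BY product").fetchall()
snapshot_rows = conn.execute("SELECT * FROM m_product_totals ORDER BY product").fetchall()
print(view_rows)      # [('laptop', 1), ('phone', 10)]: recomputed from the operational table
print(snapshot_rows)  # [('laptop', 1), ('phone', 5)]: frozen when it was materialized
```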
