1 & 2 Data Warehousing_021052
Data Warehousing
Introduction
Typical Architecture of Data Warehouse
• Detailed data:
– This is the actual data which has been pulled in from
the various sources.
– Normally stored offline and aggregated into the next level
of data.
• Lightly/highly summarised data:
– Summarised data provides various views of the
detailed data, built to answer specific queries.
• It is the aggregated data generated by the warehouse
manager.
– It needs to be summarised because the volume of
detailed data is so large.
• The purpose is to speed up queries.
– Because these views can change, meta-data is also
needed to describe them.
• The summaries change on an ongoing basis, depending on
the types of queries being run.
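The relationship between the detailed, lightly summarised and highly summarised levels can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names, column names and sales figures are hypothetical and do not come from any particular warehouse product.

```python
import sqlite3

# Hypothetical detailed data; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detailed_sales (sale_date TEXT, store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO detailed_sales VALUES (?, ?, ?)",
    [
        ("2024-01-03", "North", 100.0),
        ("2024-01-03", "North", 50.0),
        ("2024-01-04", "South", 70.0),
        ("2024-02-10", "North", 30.0),
    ],
)

# Lightly summarised: one row per store per day.
conn.execute(
    """CREATE TABLE daily_sales AS
       SELECT sale_date, store, SUM(amount) AS total
       FROM detailed_sales
       GROUP BY sale_date, store"""
)

# Highly summarised: one row per month, built from the lightly summarised level.
conn.execute(
    """CREATE TABLE monthly_sales AS
       SELECT substr(sale_date, 1, 7) AS month, SUM(total) AS total
       FROM daily_sales
       GROUP BY substr(sale_date, 1, 7)"""
)

daily = conn.execute(
    "SELECT sale_date, store, total FROM daily_sales ORDER BY sale_date, store"
).fetchall()
monthly = conn.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()
print(daily)
print(monthly)
```

A query for monthly totals reads two rows from monthly_sales instead of scanning every detailed row, which is the speed-up the summaries are meant to provide.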
• Meta-data:
– A description of the data held in the warehouse.
– Changes whenever the structure of the data in the warehouse changes.
• Archive/backup:
– Since the data warehouse will always grow, some of
the older data can be archived, in a way that it can still
be included in queries if required.
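As a rough sketch of both ideas, the example below moves older rows into an archive table, keeps them reachable through a view, and then reads the database catalogue as a simple form of meta-data. The table names, the cutoff year and the use of SQLite's sqlite_master catalogue are assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_year INTEGER, amount REAL)")
conn.execute("CREATE TABLE sales_archive (sale_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2019, 10.0), (2020, 20.0), (2024, 40.0)],
)

CUTOFF_YEAR = 2021  # assumed archiving policy: older rows leave the active table

# Move old rows from the active table into the archive table.
conn.execute(
    "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_year < ?",
    (CUTOFF_YEAR,),
)
conn.execute("DELETE FROM sales WHERE sale_year < ?", (CUTOFF_YEAR,))

# A view over both tables lets a query include archived data when required.
conn.execute(
    "CREATE VIEW sales_all AS "
    "SELECT * FROM sales UNION ALL SELECT * FROM sales_archive"
)

active_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
full_total = conn.execute("SELECT SUM(amount) FROM sales_all").fetchone()[0]

# Meta-data: the catalogue describes the objects currently in the database,
# so it changes whenever the structure changes.
catalogue = conn.execute(
    "SELECT name, type FROM sqlite_master ORDER BY name"
).fetchall()
print(active_total, full_total, catalogue)
```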
Information Flows
[Figure: information flows in a data warehouse.
Components: operational data sources 1 to n, load manager,
warehouse manager, DBMS (meta-data, highly summarised data,
lightly summarised data, detailed data), archive/backup,
query manager, and end-user tools (reporting, query, application
development and EIS tools; OLAP tools; data-mining tools).
Flows: inflow, upflow, downflow, outflow and meta-flow.]
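[Figure: ROLAP and MOLAP architectures.
ROLAP: the ROLAP server sends SQL to the RDB server and passes
the result on to the presentation layer.
MOLAP: the MOLAP server loads data from the RDB server, then
answers requests from the presentation layer with results.
Layers shown: database/application logic layer, presentation layer.]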
Hybrid OLAP
• Hybrid OLAP (HOLAP) is a combination of both ROLAP and
MOLAP.
– It offers the higher scalability of ROLAP and the faster
computation of MOLAP.
• HOLAP servers can store large volumes of detailed
data.
• The aggregations are stored separately in a MOLAP
store.
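The split between the two stores can be sketched as follows. This is a toy illustration under stated assumptions: an in-memory SQLite table stands in for the relational store, a plain dictionary stands in for the MOLAP store, and the function names and data are hypothetical.

```python
import sqlite3

# Relational store (ROLAP side): holds the large volume of detailed rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 10.0), ("East", "B", 5.0), ("West", "A", 7.0)],
)

# Multidimensional store (MOLAP side): precomputed aggregations.
# A dict keyed by region stands in for a real cube here.
cube = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))


def total_for_region(region):
    """Summary query: answered from the precomputed aggregations."""
    return cube[region]


def detail_for_region(region):
    """Drill-down query: falls through to the relational store."""
    return conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY product",
        (region,),
    ).fetchall()


east_total = total_for_region("East")
east_detail = detail_for_region("East")
print(east_total, east_detail)
```

Summary requests never touch the detailed rows, while drill-down requests still have access to all of them.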
Specialized SQL Servers
• Specialized SQL servers provide advanced query
language and query processing support for SQL queries
over star and snowflake schemas in a read-only
environment.
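To make the star schema concrete, the sketch below builds one fact table with two dimension tables and runs a typical read-only aggregate query over them. SQLite is used only because it ships with Python; it is not a specialized SQL server, and all table names, column names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table holds the measures plus a foreign key per dimension.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 20.0);
    """
)

# Star join: the fact table is joined to each dimension, then aggregated.
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
    """
).fetchall()
print(rows)
```

In a snowflake schema the dimensions would be normalised further, for example by moving category out of dim_product into its own table, at the cost of one more join.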
Managed Query Environment (MQE)
• MQE is a newer technology.
– Data can be delivered either directly from the RDB or from a
MOLAP/ROLAP server in the form of a data cube.
– The data cube is stored and analysed locally, so MQE tools
are simple to install, and each user can build a custom data
cube.
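The idea of a locally stored, user-built cube can be sketched in plain Python. The rows below stand in for data delivered from the RDB or an OLAP server; the data, the two chosen dimensions and the function names are all hypothetical.

```python
from collections import defaultdict

# Rows as they might arrive from the server: (region, quarter, amount).
delivered_rows = [
    ("East", "Q1", 10.0),
    ("East", "Q2", 15.0),
    ("West", "Q1", 7.0),
    ("West", "Q2", 3.0),
]


def build_cube(rows):
    """Build a small two-dimensional cube keyed by (region, quarter)."""
    cube = defaultdict(float)
    for region, quarter, amount in rows:
        cube[(region, quarter)] += amount
    return dict(cube)


def roll_up(cube, axis):
    """Aggregate the cube along one dimension: 0 = region, 1 = quarter."""
    totals = defaultdict(float)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


# Once the cube is built, all analysis happens locally, with no server round trip.
local_cube = build_cube(delivered_rows)
by_region = roll_up(local_cube, 0)
by_quarter = roll_up(local_cube, 1)
print(by_region, by_quarter)
```

Another user could build a different cube from the same delivered rows, for example one keyed by product instead of region.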
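MQE
[Figure: MQE architecture.
End-user tools either send SQL to the RDB server and receive the
result, or send requests to a MOLAP server, which loads data from
the RDB server and returns the result.]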
Real-World Scenario
Casino
Understand the desired objectives
Step 1: Determine business objectives
• Improve customer experience
Step 2: Collect appropriate data to help achieve your
business objective
• Target the right customer
Step 3: Identify what success looks like
• Increase customer visits
• Collect the right data about your customer
• Store the data, e.g. in a data warehouse
• Analyse the data to better understand the customer
• Visualise the data to identify the target customer
Assignment
Enumerate the major differences between the
following:
• Data lake and Big Data
• Data mining and Business Intelligence
Further Reading
• Connolly and Begg, chapters 31 to 34.
• W H Inmon, Building the Data Warehouse, New
York, Wiley and Sons, 1993.
• Beynon-Davies P, Database Systems (2nd ed),
Macmillan Press, 2000, ch 34, 35 & 36.