Lecture 1 Introduction To Data Warehousing
Conducted by
Ms. Akila Brahmana
Department of ICT
Faculty of Technology
University of Ruhuna
Outline of Lecture
❑ Data Warehousing and Information Integration
❑ Brief History of Data Warehousing
❑ What is a Data Warehouse?
❑ Types of Data and Their Uses
❑ Data Warehouse Architectures
❑ Issues in Data Warehousing
Why Do We Need Data Analysis?
❑ To know your customers and yourself better
❑ To build effective business strategies
❑ To provide future directions to business organizations
This kind of data analysis has been going on for a long time, but there is now an urgency to get it done faster. The main obstacle has been that the data is spread across disparate, heterogeneous sources.
[Figure: query-driven architecture: an Integration System above a Wrapper for each Source]
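The query-driven (on-demand) approach can be sketched as a mediator that, each time a query arrives, asks every wrapper to translate the request for its own source and then merges the answers. This is a minimal illustrative sketch: the two sources, their formats, the `Wrapper` class and `mediator_query` are all hypothetical, and real wrappers would translate full queries rather than a single key lookup.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sources, each with its own native format.
CRM_ROWS = [("C1", "Alice", "Galle"), ("C2", "Bob", "Matara")]
BILLING_DOCS = [{"cust": "C1", "total": 120.0}, {"cust": "C2", "total": 75.5}]


@dataclass
class Wrapper:
    """Translates the common query into a source-specific lookup (illustrative)."""
    name: str
    fetch: Callable[[str], dict]


def crm_fetch(customer_id: str) -> dict:
    # Scan the tuple-based source and map the match to the common schema.
    for cid, name, city in CRM_ROWS:
        if cid == customer_id:
            return {"name": name, "city": city}
    return {}


def billing_fetch(customer_id: str) -> dict:
    # The same request, translated for a document-style source.
    for doc in BILLING_DOCS:
        if doc["cust"] == customer_id:
            return {"total": doc["total"]}
    return {}


WRAPPERS = [Wrapper("crm", crm_fetch), Wrapper("billing", billing_fetch)]


def mediator_query(customer_id: str) -> dict:
    """Integration system: contacts every source on each query and merges the results."""
    result = {"id": customer_id}
    for wrapper in WRAPPERS:
        result.update(wrapper.fetch(customer_id))  # live call to the source, nothing cached
    return result


print(mediator_query("C1"))
# {'id': 'C1', 'name': 'Alice', 'city': 'Galle', 'total': 120.0}
```

Because every call goes back to the sources, the answer is always current, but each query pays for the round trips and adds load to the sources themselves.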
Disadvantages of Query-Driven Approach
[Figure: warehousing approach, with an Extractor/Monitor on each Source]
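An Extractor/Monitor has to notice what changed at its source and pass those changes on to the warehouse. The sketch below assumes each source can be dumped as a dict keyed by primary key and detects changes by comparing two snapshots; snapshot differencing is only one possible technique (reading the source's log or using triggers are others), and `diff_snapshots`, `load_changes` and the `warehouse` list are hypothetical names.

```python
def diff_snapshots(old: dict, new: dict) -> list:
    """Monitor: compare two snapshots keyed by primary key and list the changes."""
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key, old[key]))
    return changes


warehouse = []  # hypothetical warehouse table: an append-only history of changes


def load_changes(changes: list, load_date: str) -> None:
    """Extractor side: copy each detected change into the warehouse with its load date."""
    for op, key, row in changes:
        warehouse.append({"load_date": load_date, "op": op, "key": key, **row})


yesterday = {"P1": {"price": 10}, "P2": {"price": 5}}
today = {"P1": {"price": 12}, "P3": {"price": 7}}
load_changes(diff_snapshots(yesterday, today), "2024-03-02")
for record in warehouse:
    print(record)
```

Since changes are appended instead of overwriting earlier rows, the warehouse keeps historical information, and analytical queries run against this copy rather than against the sources.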
Advantages of Warehousing Approach
❑ High query performance
   ❑ But not necessarily the most current information
❑ Doesn’t interfere with local processing at sources
   ❑ Complex queries at the warehouse
   ❑ OLTP at the information sources
❑ Information copied at the warehouse
   ❑ Can modify, annotate, summarize, restructure, etc.
   ❑ Can store historical information
   ❑ Security, no auditing
❑ Has caught on in industry
Query-Driven Approach
The query-driven approach is still better for:
❑ Rapidly changing information
❑ Rapidly changing information sources
❑ Clients with unpredictable needs
Data Warehouse Evolution
[Figure: timeline (TIME axis) of data warehouse evolution. Eras: “Prehistoric Times”, “Middle Ages”, “Data Revolution”. Milestones: Relational Databases, Company DWs, “Building the DW” (Inmon, 1992), Data Replication Tools, Information-Based Management.]
“A DW is a
❑ subject-oriented,
❑ integrated,
❑ time-varying,
❑ non-volatile
collection of data that is used primarily in organizational decision
making.”
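Two of the four properties, time-varying and non-volatile, can be illustrated with a tiny append-only store: each load adds rows stamped with a load date, nothing is updated in place, and any past state can be reconstructed. This is a hypothetical sketch (`SnapshotStore`, `load` and `as_of` are illustrative names, and the branch sales figures are invented); it does not model the subject-oriented or integrated properties.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are added with a load date and never updated or deleted."""

    def __init__(self) -> None:
        self._rows = []  # (load_date, key, value) tuples

    def load(self, load_date: date, records: dict) -> None:
        # Non-volatile: a new load only adds rows; earlier rows stay untouched.
        for key, value in records.items():
            self._rows.append((load_date, key, value))

    def as_of(self, when: date) -> dict:
        # Time-varying: rebuild the latest value of each key loaded on or before `when`.
        state = {}
        for load_date, key, value in sorted(self._rows, key=lambda r: r[0]):
            if load_date <= when:
                state[key] = value
        return state


# Hypothetical monthly sales per branch (one subject: sales).
store = SnapshotStore()
store.load(date(2024, 1, 31), {"Galle": 100, "Matara": 80})
store.load(date(2024, 2, 29), {"Galle": 130})
print(store.as_of(date(2024, 1, 31)))  # January view is still available after later loads
print(store.as_of(date(2024, 3, 1)))
```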
[Figure: Operational Systems feed the Data Warehouse through Data Warehouse Population]
Warehouse is a Specialized DB
Standard DB                      Warehouse
❑ Mostly updates                 ❑ Mostly reads
❑ Many small transactions        ❑ Queries are long and complex
❑ MB to GB of data               ❑ GB to TB of data
❑ Current snapshot               ❑ History
❑ Indexed access                 ❑ Lots of scans
❑ Warehouse: summarized, reconciled data, used by decision-makers and analysts
❑ Standard DB: raw data, used by clerical users
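The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module: an OLTP-style short transaction that changes one row through its primary key, next to a warehouse-style read-only query that scans and aggregates many rows. The `sales` table and its rows are invented, and a single in-memory SQLite database plays both roles only for illustration; in practice the two workloads run on separate systems, at very different data volumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, branch TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (branch, year, amount) VALUES (?, ?, ?)",
    [("Galle", 2023, 100.0), ("Matara", 2023, 80.0),
     ("Galle", 2024, 130.0), ("Matara", 2024, 90.0)],
)

# Standard DB style: a small update transaction located through the primary-key index.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = amount + 5 WHERE id = ?", (2,))

# Warehouse style: a read-only query that scans every row and summarizes it.
report = conn.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()
print(report)  # [('Galle', 230.0), ('Matara', 175.0)]
```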
Warehousing and Industry
Warehousing is big business:
❑ $2 billion in 1995
❑ $3.5 billion in early 1997
❑ $8 billion in 1998
❑ $13 billion in 2018
❑ Predicted to exceed $30 billion by 2025 [Global Market Insights, Inc.]
Types of Data
❑ Two-layer architecture
❑ Real-time + derived data
❑ Most commonly used approach in industry today
[Figure: two layers: Real-time data (operational systems) and Derived Data (informational systems)]
Three-layer Architecture: Conceptual View
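Transforming real-time data into derived data actually requires two steps: real-time data is first turned into reconciled data, and reconciled data is then turned into derived data.
[Figure: three layers, from bottom to top: Real-time data (operational systems); Reconciled Data (physical implementation of the data warehouse); Derived Data at the view level (“particular informational needs”, informational systems)]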
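The two steps of the three-layer view can be sketched as two functions: one that reconciles operational records from different systems into a single clean, detailed format, and one that derives a summary for one particular informational need. The two source systems, their field names and the cleaning rules are hypothetical; real reconciliation also covers deduplication, key matching and many more data-quality checks.

```python
from collections import defaultdict

# Real-time (operational) data from two hypothetical systems with different conventions.
POS_SYSTEM = [
    {"branch": "galle ", "amount": "1200.50", "day": "2024-03-01"},
    {"branch": "MATARA", "amount": "800", "day": "2024-03-01"},
]
WEB_SHOP = [
    {"store": "Galle", "amount_cents": 9950, "day": "2024-03-02"},
]


def reconcile(pos_rows: list, web_rows: list) -> list:
    """Step 1: clean and integrate the sources into one consistent, detailed format."""
    reconciled = []
    for r in pos_rows:
        reconciled.append({"branch": r["branch"].strip().title(),
                           "amount": float(r["amount"]),
                           "day": r["day"]})
    for r in web_rows:
        reconciled.append({"branch": r["store"].strip().title(),
                           "amount": r["amount_cents"] / 100,  # cents -> currency units
                           "day": r["day"]})
    return reconciled


def derive_sales_by_branch(reconciled: list) -> dict:
    """Step 2: summarize reconciled data for one particular informational need."""
    totals = defaultdict(float)
    for r in reconciled:
        totals[r["branch"]] += r["amount"]
    return dict(totals)


derived = derive_sales_by_branch(reconcile(POS_SYSTEM, WEB_SHOP))
print(derived)  # {'Galle': 1300.0, 'Matara': 800.0}
```

Keeping the reconciled layer separate means several derived views (by branch, by day, and so on) can be built from the same cleaned data without repeating the cleaning work.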
Data Warehousing: Two Distinct Issues
(1) How to get information into the warehouse (“data warehousing”)
(2) What to do with the data once it’s in the warehouse (“warehouse DBMS”)
❑ Data Mining
   ❑ Used to extract useful information and patterns from data.
   ❑ Data mining can be carried out on any traditional database, but because a data warehouse contains quality (cleaned, integrated) data, it is a good idea to run data mining over the data warehouse.
❑ Business Intelligence (BI)
   ❑ An environment in which business users conduct analyses that yield an overall understanding of:
      ❑ where the business has been,
      ❑ where it is now, and
      ❑ where it will be in the near future (i.e., planning).
❑ Data mining is a subset of business intelligence (BI).
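As one small example of extracting patterns from warehouse data, the sketch below counts how often pairs of items are bought together and keeps the pairs whose support (the fraction of baskets containing both items) reaches a threshold. This is only the counting step that frequent-itemset algorithms such as Apriori build on, not a full Apriori implementation; the baskets, `frequent_pairs` and the 0.4 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions pulled from the warehouse.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(baskets: list, min_support: float) -> dict:
    """Return item pairs whose support is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(BASKETS, min_support=0.4))
# {('bread', 'milk'): 0.6, ('bread', 'butter'): 0.4, ('butter', 'milk'): 0.4}
```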
Thank You!
Activity 1