Lecture_5 database

The document provides an overview of data warehousing, including its evolution, definitions, and architecture. It discusses the advantages of data warehousing, the concept of data marts, and operational data stores, highlighting the differences between global and local warehouses. Additionally, it addresses key issues in data warehousing and includes multiple-choice questions related to concurrency control and deadlock management.

Introduction to Data Warehousing

Web Sites

• Rkimball.com
• Iemagazine.com
• Orsoc.org.com/shop/
• DBMSMAG.com

DW 2
Ali El-Bastawissy
Data Warehouse Evolution

[Figure: timeline running from 1960 to 2000, with ticks at 1960, 1975, 1980, 1985, 1990, 1995, and 2000. Eras along the axis: “Prehistoric Times”, “Middle Ages”, Data Revolution, Information-Based Management. Milestones above the axis: Relational Databases, Company DWs, “Building the DW” (Inmon, 1992), Data Replication Tools. Milestones below the axis: PCs and Spreadsheets, End-user Interfaces, 1st DW Article, DW Confs., Vendor DW Frameworks.]

Problem: Heterogeneous Information Sources

“Heterogeneities are everywhere”

[Figure: four kinds of information sources: Personal Databases, Scientific Databases, Digital Libraries, and the World Wide Web.]

• Different interfaces
• Different data representations
• Duplicate and inconsistent information

The Warehousing Approach

Data is integrated in advance and stored in a DW for direct querying and analysis.

[Figure: layered architecture, top to bottom: Clients; Data Warehouse; Integration System (with Metadata); one Extractor/Monitor per source; Sources.]

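The flow in the warehousing approach (extract from each source, integrate in advance, then query the warehouse directly) can be sketched in a few lines. This is only an illustration under stated assumptions: the two sources, their field names, the extractor functions, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical sources that represent the same kind of fact differently.
SOURCE_A = [{"cust": "Ann", "amount_usd": "12.50"}, {"cust": "Bob", "amount_usd": "7.00"}]
SOURCE_B = [("ann", 300), ("Cid", 450)]  # (name, amount in cents)


def extract_a(rows):
    """Extractor for source A: dollar strings -> common (customer, cents) form."""
    return [(r["cust"].strip().title(), round(float(r["amount_usd"]) * 100)) for r in rows]


def extract_b(rows):
    """Extractor for source B: already in cents; only the names need normalizing."""
    return [(name.strip().title(), cents) for name, cents in rows]


def build_warehouse(conn):
    """Integrate all sources in advance and store the result for direct querying."""
    conn.execute("CREATE TABLE sales (customer TEXT, cents INTEGER, source TEXT)")
    for label, rows in (("A", extract_a(SOURCE_A)), ("B", extract_b(SOURCE_B))):
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)", [(c, v, label) for c, v in rows]
        )
    conn.commit()


def total_per_customer(conn):
    """A client query answered from the warehouse alone, without touching the sources."""
    return conn.execute(
        "SELECT customer, SUM(cents) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
totals = total_per_customer(conn)
```

Note that "ann" in source B and "Ann" in source A end up as one customer only because the extractors normalize names before loading; that normalization is the integration step in miniature.
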
Advantages of Warehousing Approach

• High query performance
– But not necessarily most current information
• Doesn’t interfere with local processing at sources
– Complex queries at warehouse
– OLTP at information sources
• Information copied at warehouse
– Can modify, annotate, summarize, restructure, etc.
– Can store historical information
– Security, no auditing

DW Definition

A collection of technologies aimed at enabling knowledge workers to make better and faster decisions.
Inmon & Codd, 1992

According to Inmon & Codd:
“Operational applications (OLTP) and decision support applications (OLAP) cannot coexist efficiently in the same database”

Warehouse is a Specialized DB

DB                                   DW
• Mostly updates                     • Mostly reads
• Many small transactions            • Queries are long & complex
• MB to GB of data                   • GB to TB to PB of data
• Current snapshot                   • History: multiple snapshots
• Index/hash on primary key          • Lots of advanced indices
• Raw data                           • Summarized / reconciled data
• Thousands of users (e.g.,          • Hundreds of users (e.g.,
  clerical users)                      decision-makers, analysts)
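The contrast between the two workloads can be made concrete with a small sketch. It is illustrative only: the `orders` table and both function names are hypothetical, and a single SQLite database plays both roles here, which a real deployment would keep separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2023, 100), ("North", 2024, 150), ("South", 2023, 80), ("South", 2024, 120)],
)


def oltp_update(conn, order_id, new_amount):
    """Operational style: a short transaction touching one row via its primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (new_amount, order_id))


def olap_summary(conn):
    """Warehouse style: a read-only query that scans and summarizes the history."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM orders "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()


oltp_update(conn, 1, 110)
summary = olap_summary(conn)
```
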

What is a Data Warehouse?
“A DW is a
– subject-oriented,
– integrated,
– time-varying,
– non-volatile
collection of data that is designed to support the
DSS functions”.
-- W.H. Inmon, Building the Data Warehouse, 1996

What is the corresponding DB definition?


What is a Data Warehouse? … Cont’d

• Stored collection of diverse data
– A solution to the data integration problem
– Single repository of information

• Subject-oriented
– Organized by subject, not by application
– Used for analysis, data mining, etc.

What is a Data Warehouse? … Cont’d

• Non-volatile / time-varying
– Historical
– Time attributes are important

• Examples
– All transactions ever at WalMart
– Complete client histories at an insurance firm
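One way to picture "non-volatile" and "time-varying" together is an append-only history table whose time attribute selects the snapshot that answers a query. The sketch below assumes a hypothetical `policy_history` table for the insurance example; the column and function names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy_history (client TEXT, premium INTEGER, valid_from TEXT)")


def record_snapshot(conn, client, premium, valid_from):
    """Non-volatile: new facts are appended; existing rows are never updated or deleted."""
    conn.execute("INSERT INTO policy_history VALUES (?, ?, ?)", (client, premium, valid_from))


def premium_as_of(conn, client, date):
    """Time-varying: the time attribute decides which snapshot answers the query."""
    row = conn.execute(
        "SELECT premium FROM policy_history WHERE client = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (client, date),
    ).fetchone()
    return row[0] if row else None


record_snapshot(conn, "Ann", 500, "2022-01-01")
record_snapshot(conn, "Ann", 550, "2023-01-01")
```

Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly; an operational database would typically overwrite the premium instead and lose the 2022 value.
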

Data Mart

• A small (local) DW that is a subset of the enterprise-wide (global) DW.
• It contains only the data relevant to one type of user or one area of the enterprise.
• It enables faster responses to queries.
• It uses either relational or multidimensional data structures.

Global Vs Local Warehouses

Global                                        Local
• Results from a complex extraction-          • Derived from the global DW by an
  integration-aggregation process               extraction-aggregation process
• Data is detailed, voluminous, and           • Data is highly aggregated and
  lightly aggregated                            less voluminous
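Deriving a local warehouse from the global one by extraction and aggregation might look like the following sketch. The `global_sales` table, the `mart_<dept>` naming scheme, and the function name are assumptions made for this example, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical global warehouse table: detailed, one row per sale.
conn.execute("CREATE TABLE global_sales (dept TEXT, product TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO global_sales VALUES (?, ?, ?, ?)",
    [
        ("toys", "kite", "2024-01", 20),
        ("toys", "kite", "2024-01", 30),
        ("toys", "ball", "2024-02", 10),
        ("books", "atlas", "2024-01", 40),
    ],
)


def build_data_mart(conn, dept):
    """Extract one department's rows and aggregate them into a smaller local table."""
    table = f"mart_{dept}"  # dept comes from trusted code here, never from user input
    conn.execute(f"CREATE TABLE {table} (month TEXT, total INTEGER)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT month, SUM(amount) FROM global_sales WHERE dept = ? GROUP BY month",
        (dept,),
    )
    return table


mart = build_data_mart(conn, "toys")
rows = conn.execute(f"SELECT month, total FROM {mart} ORDER BY month").fetchall()
```

The mart holds two rows where the global table holds three for the same department: less voluminous and more aggregated, at the cost of losing per-product detail.
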

Operational Data Store (ODS)

• An intermediate layer that may be introduced between the data sources and the global warehouse.

• The ODS contains the data resulting from the transformation-integration-aggregation process.

ODS Differences

• ODS contains only fresh and current data
• ODS is subject to change more frequently
• The aggregation in ODS is of small granularity (weakly summarized)
• ODS is a good support for:
– Collective operational decisions
– Immediate corporate information

A generic DW architecture

[Figure: layers from bottom to top: Sources (text file, DB) with their schema; extractors; ODS; Global DW (with the DW schema); aggregation & customization; Data Marts (local DWs).]

A generic DW architecture

[Figure: three columns labeled Any Source, Any Data, and Any Access. Sources: Applications and External data. A Load Management component feeds the data stores: Operational Data Store, Data Warehouse, and Data Marts. A Query Management component serves access over LAN/WAN and the WWW.]


Two Distinct DW Issues

“Data warehousing”
(How to get information into warehouse)

“Warehouse DBMS”
(What to do with data once it’s in warehouse)

• Both are rich research areas
• Industry has focused more on the 2nd issue

MCQ Exam

1) A protocol that ensures the system will never enter a deadlock state is
called
A. Deadlock elimination
B. Deadlock prevention
C. Deadlock recovery
D. Deadlock detection
2) The rigorous two-phase locking protocol permits releasing all locks at the
E. Beginning of transaction
F. During execution of transaction
G. End of transaction
H. Never in the life-time of transaction

3) The system must deal with the deadlocks that are not prevented by using
schemes of
A. Validation
B. Deadlock detection
C. Deadlock recovery
D. Both A and B
4) A two-phase locking protocol variant that requires that all locks be held
until the transaction commit is called
E. Lock-point two-phase locking protocol
F. Deadlock two-phase locking protocol
G. Strict two-phase locking protocol
H. Rigorous two-phase locking protocol

5) The deadlock prevention scheme requires each transaction to lock
all its data items before it begins
A. Initialization
B. Execution
C. Evaluation
D. Processing
6) A transaction that is inserting a new tuple into the database is given an
E. Shared lock
F. Mutual lock
G. Exclusive lock
H. No lock

7) Concurrency control is a challenging task for transactions that have
A. Application accesses
B. I/O activities
C. User interactions
D. Application interactions
8) For deadlock prevention, when we use an ordering of data items, locks
are requested in a sequence that is
E. Consistent with access
F. Consistent with relation
G. Consistent with ordering
H. Consistent with execution
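For readers who want to see the idea behind question 8 in practice, here is a minimal sketch. The three data items, the `LOCKS` table, and both function names are hypothetical; alphabetical order is an arbitrary choice of global ordering.

```python
import threading

# Hypothetical data items, each guarded by its own lock.
LOCKS = {name: threading.Lock() for name in ("A", "B", "C")}


def lock_order(items):
    """Return the items in the single global order that every transaction follows."""
    return sorted(set(items))


def run_transaction(items, work):
    """Acquire locks in the global order, run the work, then release in reverse."""
    ordered = lock_order(items)
    for name in ordered:
        LOCKS[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            LOCKS[name].release()
```

Two transactions that name the same items in opposite orders, such as `["A", "B"]` and `["B", "A"]`, both lock A first, so neither can hold B while waiting for A.
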

9) Cascading rollbacks can be avoided by
A. Strict two-phase locking protocol
B. Rigorous two-phase locking protocol
C. Deadlock two-phase locking protocol
D. Lock-point two-phase locking protocol
10) The two modes of locking a data item are termed 'shared' and
E. Composite
F. Compatible
G. Exclusive
H. Linear

11) Two-phase locking does not ensure freedom from
A. Obtain locks
B. Release locks
C. New locks
D. Deadlocks
12) A set of rules indicating when a transaction may lock and unlock each
of the data items in the database is known as a
E. Unlocking protocol
F. Locking protocol
G. Deadlock protocol
H. Validation protocol

13) For controlling preemption, each transaction can be assigned a unique
A. Order
B. Identifier
C. Locator
D. Timestamp
14) A protocol that permits a transaction to lock a new data item only if it
has not yet unlocked any data item is called
E. Two-phase unlocking protocol
F. One-phase locking protocol
G. Two-phase locking protocol
H. One-phase unlocking protocol
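The rule described in question 14 can be checked mechanically for a single transaction. The sketch below is illustrative only; the function name and the `(action, item)` schedule format are assumptions made for this example.

```python
def respects_rule(schedule):
    """Return True if the transaction never locks an item after its first unlock.

    `schedule` is a list of (action, item) pairs for ONE transaction,
    where action is either "lock" or "unlock".
    """
    has_unlocked = False
    for action, _item in schedule:
        if action == "unlock":
            has_unlocked = True
        elif action == "lock" and has_unlocked:
            return False  # a new lock was requested after an unlock
    return True
```

For example, locking A and B and then unlocking both satisfies the rule, while lock A, unlock A, lock B does not.
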

15) A timestamp-ordering scheme ensures
A. Serializability
B. Cascading
C. Atomicity
D. Consistency
16) The data item may be locked by
E. 2 modes
F. 3 modes
G. 4 modes
H. 5 modes

17) Deadlock prevention approaches are of
A. 2 types
B. 3 types
C. 4 types
D. 5 types
18) A deadlock can be broken by
E. Committing one or more transactions
F. Aborting one or more transactions
G. Rolling back one or more transactions
H. Terminating one or more transactions
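Deadlocks that were not prevented have to be found before they can be broken. A common technique, assumed here rather than taken from the lecture, is to look for a cycle in a wait-for graph. In this sketch the graph format, the transaction names, and both function names are hypothetical.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None.

    `wait_for` maps each transaction to the set of transactions it is waiting on.
    """
    visiting, done = [], set()

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            # Reached a transaction already on the current path: that is a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for other in sorted(wait_for.get(txn, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in sorted(wait_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


def remove_victim(wait_for, victim):
    """Return a new graph without the chosen victim transaction and its edges."""
    return {
        txn: {other for other in waits if other != victim}
        for txn, waits in wait_for.items()
        if txn != victim
    }
```

Choosing which transaction in the cycle becomes the victim is a separate policy decision that this sketch leaves to the caller.
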

19) Exclusive locks are released at the end of the transaction to ensure
A. Recoverability
B. Cascadelessness
C. Key-value locking
D. Both A and B
20) Various concurrency-control schemes are used to ensure
E. Serializability
F. Deadlock prevention
G. Timeouts
H. Locking states
