Data - Warehouse - Architectural Components part-III

This document discusses data warehouse architectures and the data reconciliation process. It describes a three-layer architecture consisting of operational, reconciled, and derived data. The data reconciliation process includes capturing data from source systems, cleansing the data to fix errors, transforming the data, and loading it into the data warehouse using either refresh or update modes. Meta data and an enterprise data model are also important components of the overall architecture.

Uploaded by

komal zia

Data Warehouse Architectures

Data Warehousing/Mining 1
Objectives of Today's Lecture

• Three-layer architecture
• Enterprise data model
• Meta data
• Data part in three-layer architecture
• Status vs. event data
• Transient vs. periodic data
• Extract and types of extract
• Loading: two modes

Three-Layer Data Architecture

• Three important terms
  1. Operational data
  2. Reconciled data
  3. Derived data

Three-Layer Data Architecture (contd.)

• Operational data
  – Data held in the various operational systems throughout the organization
• Reconciled data
  – Detailed, current data intended to be the single, authoritative source for all decision support applications
• Derived data
  – Data that have been selected, formatted, and aggregated for end-user decision support applications
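
The relationship between the three layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`billing_rows`, `web_shop_rows`), their field names, and the per-customer total are hypothetical and do not come from the lecture.

```python
from collections import defaultdict

# Operational data (hypothetical): two source systems with different conventions.
billing_rows = [
    {"cust": "C1", "amount_cents": 1250},
    {"cust": "C2", "amount_cents": 400},
]
web_shop_rows = [{"customer_id": "c1", "amount": 7.5}]


def reconcile(billing, web_shop):
    """Build the reconciled layer: detailed rows in one consistent format."""
    reconciled = []
    for row in billing:
        reconciled.append({
            "customer_id": row["cust"].upper(),
            "amount": row["amount_cents"] / 100,  # cents -> currency units
            "source": "billing",
        })
    for row in web_shop:
        reconciled.append({
            "customer_id": row["customer_id"].upper(),
            "amount": float(row["amount"]),
            "source": "web_shop",
        })
    return reconciled


def derive_totals(reconciled):
    """Build the derived layer: data selected and aggregated for end users."""
    totals = defaultdict(float)
    for row in reconciled:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


reconciled = reconcile(billing_rows, web_shop_rows)
derived = derive_totals(reconciled)
```

The reconciled list keeps every detailed row, while the derived dictionary holds only the aggregate an end-user report would need.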

Three-Layer Data Architecture (contd.)

• Two components play an important role in this architecture:
  – Enterprise data model
  – Meta data

Three-layer architecture

Role of Enterprise Data Model

• It presents a total picture of the data required by an organization
• It controls the phased evolution of the DWH
• Phasing is necessary because developing the enterprise data model in one step takes too long, and the dynamic needs of decision making would change before the warehouse is built

Role of Meta Data

• Meta data are data that describe the properties or characteristics of other data
• Operational meta data
  – Describe data in the various operational systems
• EDWH meta data
  – Derived from (or at least consistent with) the enterprise data model
  – Describe the reconciled data layer as well as the rules for transforming operational data into reconciled data
• Data mart meta data
  – Describe the derived data layer and the rules for transforming reconciled data into derived data
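
As a rough illustration of the three kinds of meta data, the sketch below stores one entry per layer as a plain dictionary. The field names, the `rule` layout, and the `apply_edw_rule` helper are all hypothetical; real warehouses usually keep this information in a meta data repository or catalog.

```python
# Hypothetical meta data entries; field names and rule format are illustrative only.
operational_meta = {
    "system": "billing",
    "field": "amount_cents",
    "type": "int",
    "description": "invoice amount in cents",
}

edw_meta = {
    "field": "amount",
    "type": "float",
    "description": "invoice amount in currency units",
    # Rule for transforming operational data into reconciled data.
    "rule": {"source_system": "billing", "source_field": "amount_cents", "divide_by": 100},
}

data_mart_meta = {
    "field": "total_amount",
    "type": "float",
    "description": "sum of amount per customer",
    # Rule for transforming reconciled data into derived data.
    "rule": {"source_field": "amount", "aggregate": "sum", "group_by": "customer_id"},
}


def apply_edw_rule(operational_row, meta):
    """Compute a reconciled value using the rule stored in the meta data."""
    rule = meta["rule"]
    return operational_row[rule["source_field"]] / rule["divide_by"]


reconciled_amount = apply_edw_rule({"amount_cents": 1250}, edw_meta)
```

Keeping the transformation rule inside the meta data, rather than hard-coding it, is one way to make the rule visible to anyone who later asks where a warehouse value came from.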

Data part in three-layer architecture

• Operational data
  – Data held in the various operational systems throughout the organization
• Reconciled data
  – Detailed, current data intended to be the single, authoritative source for all decision support applications
• Derived data
  – Data that have been selected, formatted, and aggregated for end-user decision support applications

Status Vs Event data

• Status data
  – Before and after images of the data, showing its state before and after a change
• Event data
  – Data describing the action (event) performed on the data
• Event
  – A database action (create, update, or delete) that results from a transaction
  – A transaction may lead to one or more events, as in the case of:
    • Withdrawal
    • Transfer
• In practice, most of the data stored in a DB are status data
• Both kinds of data are typically stored in DB logs for backup and recovery

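
A small sketch of the distinction, assuming a toy in-memory account table and a hypothetical log format: each log entry records the event (the update action) together with the status data (the before and after images). A withdrawal produces one event and a transfer produces two.

```python
accounts = {"A": 500, "B": 100}  # hypothetical current balances
log = []                         # hypothetical database log


def update_balance(account, delta, txn_id):
    """Apply one update event and log it with before and after images."""
    before = accounts[account]
    accounts[account] = before + delta
    log.append({
        "txn": txn_id,
        "action": "update",          # the event
        "account": account,
        "before": before,            # status: before image
        "after": accounts[account],  # status: after image
    })


def withdraw(account, amount, txn_id):
    """One transaction, one event."""
    update_balance(account, -amount, txn_id)


def transfer(src, dst, amount, txn_id):
    """One transaction, two events (debit and credit)."""
    update_balance(src, -amount, txn_id)
    update_balance(dst, amount, txn_id)


withdraw("A", 50, "T1")
transfer("A", "B", 200, "T2")
events_for_transfer = [entry for entry in log if entry["txn"] == "T2"]
```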
Transient Vs Periodic data

• Transient data
  – Data in which changes to existing records are written over previous records
  – Overwriting destroys the previous data content
• Periodic data
  – Data that are never physically altered or deleted once they have been added to the store

Example of Periodic and Transient Data
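
A minimal sketch of the two behaviours, with hypothetical store and function names: the transient store overwrites the previous value, while the periodic store only appends time-stamped rows and never alters existing ones.

```python
transient_store = {}  # key -> latest value only
periodic_store = []   # append-only list of time-stamped rows


def write_transient(key, value):
    """Overwrite in place: the previous value is lost."""
    transient_store[key] = value


def write_periodic(key, value, as_of):
    """Append a new row: existing rows are never altered or deleted."""
    periodic_store.append({"key": key, "value": value, "as_of": as_of})


# The same three balance changes are applied to both stores.
for as_of, balance in [("2024-01-01", 500), ("2024-01-05", 450), ("2024-01-09", 250)]:
    write_transient("account_A", balance)
    write_periodic("account_A", balance, as_of)

history = [row["value"] for row in periodic_store]
```

After the loop the transient store holds only the last balance, whereas the periodic store still contains the full history.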

Data reconciliation

• It can be visualized as a process consisting of four steps:
  – Capture
  – Scrub
  – Transformation
  – Loading
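
The four steps can be pictured as a simple pipeline. The sketch below is deliberately tiny and every function body is a placeholder assumption (for example, scrubbing here only drops rows with a missing name); it is meant to show the order of the steps, not a realistic implementation.

```python
def capture(source):
    """Capture: take a copy of the relevant source rows."""
    return [dict(row) for row in source]


def scrub(rows):
    """Scrub: placeholder rule that drops rows with a missing name."""
    return [row for row in rows if row.get("name")]


def transform(rows):
    """Transform: placeholder rule that normalizes the name format."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def load(warehouse, rows):
    """Load: write the transformed rows into the warehouse."""
    warehouse.extend(rows)
    return warehouse


def reconcile(source, warehouse):
    """Run capture, scrub, transform, and load in that order."""
    return load(warehouse, transform(scrub(capture(source))))


warehouse = reconcile([{"name": " ann lee "}, {"name": ""}], [])
```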

Extract

• Capturing the relevant data from the source files and DBs used to fill the EDW
• Types of extract
  – Static extract
  – Incremental extract

Extract (contd.)

• Static extract
  – A method of capturing a snapshot of the required source data at a point in time
  – Used to fill the DWH initially
• Incremental extract
  – A method of capturing only the changes that have occurred in the source data since the last capture
  – Used for ongoing warehouse maintenance
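
The two extract types can be sketched as follows. The sketch assumes each source row carries an `updated_at` value that increases with every change, which is only one possible way to detect changes (reading the database log is another) and is not stated in the lecture. Note that this simple approach would not notice deleted rows.

```python
# Hypothetical source table; updated_at is an assumed change marker.
source_rows = [
    {"id": 1, "name": "Ann", "updated_at": 10},
    {"id": 2, "name": "Bob", "updated_at": 20},
]


def static_extract(rows):
    """Snapshot of all required source rows at this point in time."""
    return [dict(row) for row in rows]


def incremental_extract(rows, last_capture):
    """Only the rows changed or added since the last capture."""
    return [dict(row) for row in rows if row["updated_at"] > last_capture]


# Initial fill of the warehouse.
initial = static_extract(source_rows)
last_capture = 20

# Later, the source changes: one row is updated and one is added.
source_rows[0].update(name="Anne", updated_at=30)
source_rows.append({"id": 3, "name": "Cy", "updated_at": 35})

# Ongoing maintenance picks up only those two rows.
changes = incremental_extract(source_rows, last_capture)
changed_ids = [row["id"] for row in changes]
```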

Data Scrubbing / cleansing

• A technique using pattern recognition and other artificial intelligence techniques to upgrade the quality of raw data before transforming and moving the data to the warehouse
• Which data need to be scrubbed
  – Misspelled names and addresses
  – Impossible or erroneous dates of birth
  – Fields used for purposes for which they were never intended
  – Missing data
  – Duplicate data
  – Mismatched addresses or area codes
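
A few of the problems listed above can be detected with plain rules, as in the sketch below. It uses simple hand-written checks rather than the pattern recognition or AI techniques the slide mentions, and the record layout, the 1900 cut-off year, and the duplicate rule (same normalized name and date of birth) are assumptions made for the example.

```python
from datetime import date


def find_problems(records, today):
    """Return (record index, problem) pairs for a few simple quality checks."""
    problems = []
    seen = set()
    for index, record in enumerate(records):
        name = (record.get("name") or "").strip()
        dob = record.get("dob")

        if not name:
            problems.append((index, "missing name"))
        if dob is None:
            problems.append((index, "missing date of birth"))
        elif dob > today or dob.year < 1900:
            problems.append((index, "impossible date of birth"))

        # Assumed duplicate rule: same normalized name and date of birth.
        key = (name.lower(), dob)
        if key in seen:
            problems.append((index, "duplicate"))
        else:
            seen.add(key)
    return problems


records = [
    {"name": "Ann Lee", "dob": date(1990, 5, 1)},
    {"name": "ann lee ", "dob": date(1990, 5, 1)},
    {"name": "Bob Ray", "dob": date(2190, 1, 1)},
    {"name": "", "dob": None},
]
problems = find_problems(records, today=date(2024, 1, 1))
```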

Steps in data reconciliation

Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

Static extract = capturing a snapshot of the source data at a point in time

Incremental extract = capturing changes that have occurred since the last static extract

Data Scrubbing / cleansing (contd.)

• The amount of cleansing effort depends on the quality of the data
  – The higher the quality, the less effort is needed for cleansing, and vice versa
• Common cleansing tasks are
  – Decoding data to make them understandable for DWH applications
  – Reformatting and changing data types
  – Converting between different measuring units
  – Finding missing data to complete the batch of data necessary for subsequent loading
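
Three of these tasks (decoding, changing data types, and converting units) can be shown on a single record. The code table, the field names, and the day/month/year input format are hypothetical choices for the example.

```python
from datetime import datetime

GENDER_CODES = {"M": "male", "F": "female"}  # hypothetical code table
LB_TO_KG = 0.45359237                        # pounds to kilograms


def clean_row(raw):
    """Decode, retype, and convert one raw source row."""
    return {
        # Decoding: replace a source code with a readable value.
        "gender": GENDER_CODES.get(raw["gender_code"], "unknown"),
        # Changing data type: text in day/month/year form becomes a date.
        "joined": datetime.strptime(raw["joined"], "%d/%m/%Y").date(),
        # Converting measuring units: pounds (as text) to kilograms.
        "weight_kg": round(float(raw["weight_lb"]) * LB_TO_KG, 2),
    }


row = clean_row({"gender_code": "F", "joined": "03/02/2021", "weight_lb": "150"})
```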

Steps in data reconciliation (continued)

Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality

Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Load and Index

• Two modes
  – Refresh mode
  – Update mode

Load and Index (contd.)

• Refresh mode
  – An approach to filling the DWH that employs bulk rewriting of the target data at periodic intervals
  – Replaces the previous contents
  – Less popular
  – Good for filling the DWH initially
  – Used in conjunction with static data capture
• Update mode
  – An approach in which only changes in the source data are written to the DWH
  – New records are written without overwriting previous records
  – Used in conjunction with incremental data capture

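
The two load modes can be contrasted with a list standing in for the warehouse table. The function names, the row layout, and the added `load_date` column are assumptions for the sketch.

```python
def refresh_load(warehouse, snapshot, load_date):
    """Refresh mode: discard the old contents and bulk-rewrite from a snapshot."""
    warehouse.clear()
    warehouse.extend({**row, "load_date": load_date} for row in snapshot)


def update_load(warehouse, changes, load_date):
    """Update mode: append only the changed rows; earlier rows are kept."""
    warehouse.extend({**row, "load_date": load_date} for row in changes)


warehouse = []

# Initial fill from a static extract.
refresh_load(
    warehouse,
    [{"id": 1, "city": "Lahore"}, {"id": 2, "city": "Karachi"}],
    "2024-01-01",
)

# Ongoing maintenance from an incremental extract.
update_load(warehouse, [{"id": 1, "city": "Islamabad"}], "2024-01-02")

history_for_1 = [row["city"] for row in warehouse if row["id"] == 1]
```

Because update mode never overwrites, the warehouse ends up holding both the old and the new row for `id` 1, which is the periodic behaviour described earlier.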
Steps in data reconciliation (continued)

Load/Index = place transformed data into the warehouse and create indexes

Refresh mode: bulk rewriting of target data at periodic intervals

Update mode: only changes in source data are written to the data warehouse

Thank You Very Much
