Lecture 6

THE ARCHITECTURAL COMPONENTS

Lecture # 06
Instructor: Mr. Sharjeel Ahmed
Slide Elements
• The architectural components
• Data Warehouse Architecture
• Distinguishing Characteristics
• Architectural Framework
• Technical Architecture
DATA WAREHOUSE ARCHITECTURE
Architecture: Definitions
• The structure that brings all the components of a data warehouse
together is known as the architecture.
• In your data warehouse, architecture includes a number of factors.
• Primarily, it includes the integrated data that is the centerpiece.
• The architecture includes everything that is needed to prepare the
data and store it.
• It also includes all the means for delivering information from your data
warehouse.
• The architecture is further composed of the rules, procedures, and
functions that enable your data warehouse to work and fulfill the
business requirements.
• Finally, the architecture is made up of the technology that empowers
your data warehouse.
General Purpose of The Architecture
• The architecture provides the overall framework for developing and
deploying your data warehouse.

• It is a comprehensive blueprint.

• The architecture defines the standards, measurements, general
design, and support techniques.
Architecture in Three Major Areas
As you already know, the three major areas in the data warehouse are:
• Data acquisition
• Data storage
• Information delivery
DISTINGUISHING CHARACTERISTICS
Distinguishing Characteristics
• Data warehouse architecture is wide, complex, and expansive.

• In a data warehouse, the architecture consists of distinct components.

• The architecture has distinguishing characteristics worth considering
in detail.

• Before moving on to discuss the architectural framework itself, let us
review the distinguishing characteristics of data warehouse
architecture:
Distinguishing Characteristics (Cont. )
Different Objectives and Scope
Defining the scope for a data warehouse is difficult. There are several
sets of factors to consider:
• The number and extent of the data sources. How many legacy systems are
you going to extract the data from? What are the external sources?
• Are you planning to include departmental files, spreadsheets, and private
databases? What about including the archived data?
• Scope of the architecture may again be measured in terms of the data
transformations and integration functions.
• Data granularity and data volumes are also important considerations.
• Yet another serious consideration is the impact of the data warehouse on the
existing operational systems. Because of the data extractions, comparisons,
and reconciliation, you have to determine how much negative impact the data
warehouse will have on the performance of operational systems.
• When will your batch extracts be run and how will they affect the production
source systems?
Distinguishing Characteristics (Cont. )
Data Content
• The “read-only” data in the data warehouse sits in the middle as the
primary component in the architecture.
• In your data warehouse, you keep data integrated from multiple sources.
• After extracting the data, which by itself is an elaborate process, you transform
the data, cleanse it, and integrate it in a staging area. Only then do you move the
integrated data into the data warehouse repository as read-only data.
Operational data is not “read-only” data.
• Further, your data warehouse architecture must support the storing of data
grouped by business subjects, not grouped by applications as in the case of
operational systems.
• The data in your data warehouse does not represent a snapshot containing
the values of the variables as they are at the current time. This is different and
distinct from most operational systems.
Distinguishing Characteristics (Cont. )
Complex Analysis and Quick Response
• Data warehouse architecture must support complex analysis of the
strategic information by the users.
• Users must be able to drill down, roll up, slice and dice data, and play with
“what-if ” scenarios.
• Users must have the capability to review the result sets in different output
options.
• Users are no longer content with textual result sets or results displayed in
tabular formats. Every result set in tabular format must be translated into
graphical charts.
• Your data warehouse architecture must make it easy to make strategic
decisions quickly.
• There must be appropriate components in the architecture to support quick
response by the users to deal with situations by using the information provided
by your data warehouse.
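The analysis operations named above can be illustrated with a minimal sketch in plain Python. The fact rows, the column names, and the helper functions `aggregate` and `slice_rows` are all hypothetical and exist only for this illustration; a real warehouse would do the same work through its OLAP or query tools.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales_amount)
SALES = [
    (2023, "Q1", "East", "Widget", 100.0),
    (2023, "Q1", "West", "Widget", 150.0),
    (2023, "Q2", "East", "Gadget", 200.0),
    (2024, "Q1", "East", "Widget", 120.0),
    (2024, "Q1", "West", "Gadget", 80.0),
]
COLUMNS = ("year", "quarter", "region", "product")


def aggregate(rows, group_by):
    """Sum sales_amount for each combination of the group_by columns."""
    positions = [COLUMNS.index(column) for column in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


def slice_rows(rows, column, value):
    """Keep only the rows where one dimension is fixed to a single value."""
    position = COLUMNS.index(column)
    return [row for row in rows if row[position] == value]


# Roll up: summarize at the coarse year level.
by_year = aggregate(SALES, ["year"])
# Drill down: add the quarter level beneath each year.
by_quarter = aggregate(SALES, ["year", "quarter"])
# Slice: fix region to "East", then summarize by year.
east_by_year = aggregate(slice_rows(SALES, "region", "East"), ["year"])
```

Rolling up and drilling down are the same aggregation at different levels of a hierarchy, which is why one helper serves both.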
Distinguishing Characteristics (Cont. )
Flexible and Dynamic
• You have to make sure your data warehouse architecture is flexible
enough to accommodate additional requirements as and when they
surface.
• Additional requirements surface to include the missed items in the
business requirements.
• Moreover, business conditions themselves change. In fact, they keep
on changing. Changing business conditions call for additional
business requirements to be included in the data warehouse.
• If the data warehouse architecture is designed to be flexible and
dynamic, then your data warehouse can cater to the supplemental
requirements as and when they arise.
Distinguishing Characteristics (Cont. )
Metadata-driven
• As the data moves from the source systems to the end-users as
useful, strategic information, metadata surrounds the entire
movement.
• The metadata component of the architecture holds data about every
phase of the movement, and, in a true sense, makes the movement
happen.
• In your data warehouse architecture, the metadata component
interleaves with and connects the other components.
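One way to picture "metadata makes the movement happen" is a mapping table that drives the conversion of source fields into warehouse columns. The sketch below is only an illustration under assumptions: the field names (`CUST_NO`, `CUST_NM`, `CR_LIM`), the target columns, and the function `apply_mapping` are invented for this example.

```python
# Hypothetical metadata: one entry per target column, naming the source
# field it comes from and the conversion to apply on the way.
MAPPING = [
    {"target": "customer_id", "source": "CUST_NO", "convert": int},
    {"target": "customer_name", "source": "CUST_NM", "convert": str.strip},
    {"target": "credit_limit", "source": "CR_LIM", "convert": float},
]


def apply_mapping(source_row, mapping):
    """Build a warehouse row by following the metadata, not hard-coded logic."""
    return {
        entry["target"]: entry["convert"](source_row[entry["source"]])
        for entry in mapping
    }


warehouse_row = apply_mapping(
    {"CUST_NO": "42", "CUST_NM": " Acme ", "CR_LIM": "1000.50"}, MAPPING
)
```

Changing the mapping entries changes the movement of data without touching the function, which is the sense in which the architecture is metadata-driven.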
ARCHITECTURAL FRAMEWORK
Architectural Framework
• Earlier (referring to Slide Page 6), we grouped the architectural
components as building blocks in the three distinct areas of data
acquisition, data storage, and information delivery.

• In each of these broad areas of the data warehouse, the architectural
components serve specific purposes.
1. Architecture Supporting Flow of Data
2. The Management and Control Module
Architecture Supporting Flow of Data
• Data that finally reaches the end-user as useful strategic information
begins as disparate data elements in the various data sources.
• This collection of data from the various sources moves to the staging
area.
• The extracted data then goes through a detailed preparation process
in the staging area before it is sent forward to the data warehouse to
be properly stored.
• From the data warehouse storage, data transformed into useful
information is retrieved by the users or delivered to the user desktops
as required.
• Let us now follow the flow of the data. At each stop along the passage,
let us identify the architectural components. Some of the architectural
components govern the flow of data from beginning to end.
Architecture Supporting Flow of Data (Cont. )
• What are the architectural components, and how do these components enable
the data flow?
At the Data Source
• Here the internal and external data sources form the source data architectural
component. Source data governs the extraction of data for preparation and
storage in the data warehouse. The data staging architectural component
governs the transformation, cleansing, and integration of data.
In the Data Warehouse Repository
• The data storage architectural component includes the loading of data from the
staging area and also storing the data in suitable formats for information
delivery. The metadata architectural component is also a storage mechanism to
contain data about the data at every point of the flow of data from beginning to
end.
At the User End
• The information delivery architectural component includes dependent data
marts, special multidimensional databases, and a full range of query and
reporting facilities.
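The flow through the components can be sketched as a chain of small functions, one per architectural component, with a metadata dictionary recording what happened at each stop. Everything here is a simplifying assumption: the record layout, the function names, and the idea that staging only trims names and drops duplicate ids.

```python
def extract(sources):
    """Source data component: gather raw records from every source."""
    return [record for source in sources for record in source]


def stage(raw_records):
    """Data staging component: cleanse (trim names) and integrate (drop duplicate ids)."""
    seen_ids, staged = set(), []
    for record in raw_records:
        if record["id"] not in seen_ids:
            seen_ids.add(record["id"])
            staged.append({"id": record["id"], "name": record["name"].strip()})
    return staged


def load(repository, staged_records):
    """Data storage component: append prepared records to the repository."""
    repository.extend(staged_records)
    return repository


def deliver(repository, name_prefix):
    """Information delivery component: answer a simple user query."""
    return [r["name"] for r in repository if r["name"].startswith(name_prefix)]


metadata = {}  # metadata component: data about each stop along the flow
internal_source = [{"id": 1, "name": " Ann "}, {"id": 2, "name": "Bob"}]
external_source = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Alice"}]

raw = extract([internal_source, external_source])
metadata["extracted"] = len(raw)
staged = stage(raw)
metadata["staged"] = len(staged)
warehouse = load([], staged)
metadata["loaded"] = len(warehouse)
report = deliver(warehouse, "A")
```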
Architecture Supporting Flow of Data (Cont. )
The Management and Control Module
• This architectural component is an overall module managing and
controlling the entire data warehouse environment.
• It is an umbrella component working at various levels and covering
all the operations.
• This component has two major functions: first to constantly monitor
all the ongoing operations, and next to step in and recover from
problems when things go wrong.
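The two functions of this module, monitoring and recovery, can be sketched as a wrapper that logs every attempt of a job and retries it after a failure. The names `run_monitored` and `flaky_load`, the retry count, and the simulated error are assumptions made for this illustration only.

```python
def run_monitored(job, max_attempts=3):
    """Run a job, log each attempt (monitoring), and retry on failure (recovery)."""
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
        except Exception as exc:
            log.append((attempt, f"failed: {exc}"))
        else:
            log.append((attempt, "ok"))
            return result, log
    raise RuntimeError(f"job failed after {max_attempts} attempts: {log}")


# Hypothetical load job that fails once before succeeding.
calls = {"count": 0}


def flaky_load():
    calls["count"] += 1
    if calls["count"] < 2:
        raise OSError("staging file locked")
    return "loaded"


result, log = run_monitored(flaky_load)
```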
TECHNICAL ARCHITECTURE
Technical Architecture
• The technical architecture of a data warehouse is the set of
functions and services provided within its components. The technical
architecture also includes the procedures and rules that are required
to perform the functions and provide the services. The technical
architecture also encompasses the data stores needed for each
component to provide the services.

• The architecture is not the set of tools needed to perform functions &
provide services. When we refer to the data extraction function within
one of the architectural components, we are simply mentioning the
function itself and the various tasks associated with it. Also, we are
relating the data store for the staging area to the data extraction
function because extracted data is moved to the staging area. Where
do the tools fit in? Tools are the means to implement the architecture.
• Architecture comes first and the tools follow.
Technical Architecture (Cont. )
• Let us now move on to consider the technical architecture in each of
the three major areas of the data warehouse:
1. Data Acquisition
2. Data Storage
3. Information Delivery

Detailed on Next Slides:

Data Acquisition
Data Acquisition: This area covers the entire process of extracting
data from the data sources, moving all the extracted data to the staging
area, and preparing the data for loading into the data warehouse
repository.

Two Major architectural components: Two major components of this
area are source data and data staging. The functions and services in
this area relate to these architectural components. The variations in the
data sources have a direct impact on the extent and scope of the
functions and services.

Data Flow: The data flow begins at the data sources and pauses at the
staging area. After transformation and integration, the data is ready for
loading into the data warehouse repository.
Data Acquisition (Cont. )
Data Sources: For the majority of data warehouses, the primary data
source consists of the enterprise’s operational systems.
• Many operational systems at several enterprises are still legacy systems
that reside on hierarchical or network databases. Use the appropriate
language of the particular DBMS to extract data.
• Some more recent operational systems run on the client/server
architecture. Usually, these systems are supported by relational DBMSs.
Here you use an SQL-based language for extracting data.
• A large number of companies have adopted ERP (enterprise resource
planning) systems. ERP data sources provide an advantage in that the
data from these sources is already consolidated and integrated. There
could, however, be a few drawbacks to using ERP. You will have to use the
ERP vendor’s proprietary tool for data extraction. Also, most of the ERP
offerings contain very large numbers of source data tables.
• For data from outside sources, you will have to create temporary files to
hold the data received. After reformatting and rearranging the data
elements, you will have to move the data to the staging area.
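For the relational case, SQL-based extraction might look like the following sketch. An in-memory SQLite database stands in for the operational system; the `orders` table, its columns, and the `extract_since` function are hypothetical, and selecting by an `updated_at` date is just one possible filter.

```python
import sqlite3

# In-memory SQLite database standing in for a relational operational system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 250.0, "2024-01-05"),
        (2, "Globex", 90.0, "2024-01-20"),
        (3, "Acme", 40.0, "2024-02-02"),
    ],
)


def extract_since(connection, last_extract_date):
    """Pull only the rows changed after the previous extract (ISO date strings)."""
    cursor = connection.execute(
        "SELECT order_id, customer, amount FROM orders "
        "WHERE updated_at > ? ORDER BY order_id",
        (last_extract_date,),
    )
    return cursor.fetchall()


changed_rows = extract_since(conn, "2024-01-15")
```

Filtering at the source keeps the extract small, which matters given the concern about the load placed on production systems.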
Data Acquisition (Cont. )
Intermediary Data Stores: As data gets extracted from data sources,
it moves through temporary files.
• Sometimes, extracts of homogeneous data from several source
applications are pulled into separate temporary files and then merged
into another temporary file before moving it to the staging area.
• The opposite process is also common. From each application, one or
two large flat files are created and then divided into smaller files and
merged appropriately before moving the data to the staging area.
• Typically, the general practice is to use flat files to extract data from
operational systems.
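Merging flat-file extracts from several applications can be sketched with the standard `csv` module. The two extracts below are held as strings rather than real files so the example is self-contained; the file contents, the `cust_id` and `name` columns, and the rule that later extracts overwrite earlier ones are assumptions for illustration.

```python
import csv
import io

# Hypothetical flat-file extracts from two source applications.
billing_extract = "cust_id,name\n1,Acme\n2,Globex\n"
orders_extract = "cust_id,name\n2,Globex\n3,Initech\n"


def merge_extracts(*extracts):
    """Merge several extracts into one flat file, keeping one row per cust_id."""
    merged = {}
    for text in extracts:
        for row in csv.DictReader(io.StringIO(text)):
            merged[row["cust_id"]] = row  # a later extract overwrites an earlier one
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=["cust_id", "name"], lineterminator="\n")
    writer.writeheader()
    for cust_id in sorted(merged):
        writer.writerow(merged[cust_id])
    return output.getvalue()


merged_file = merge_extracts(billing_extract, orders_extract)
```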
Staging Area:
• This is the place where all the extracted data is put together and
prepared for loading into the data warehouse. The staging area may
contain data at the lowest grain to populate tables containing
business measurements. Staging area data repositories are relational
databases containing the fully integrated and cleansed data.
Data Acquisition (Cont. )
• Functions and Services: This is a general list. It does not indicate
the extent or complexity of each function or service:
1. Data Extraction
• Select data sources and determine types of filters to be applied
• Generate automatic extract files from operational systems using
replication and other techniques
• Create intermediary files to store selected data to be merged later
• Transport extracted files from multiple platforms
• Provide automated job control services for creating extract files
• Reformat input from outside sources
• Reformat input from departmental data files, databases, and
spreadsheets
• Generate common application code for data extraction
• Resolve inconsistencies for common data elements from multiple
sources
Data Acquisition (Cont. )
2. Data Transformation
• Map input data to data for data warehouse repository.
• Clean data, de-duplicate, and merge/purge.
• De-normalize extracted data structures as required by the
dimensional model of the data warehouse.
• Convert data types.
• Calculate and derive attribute values.
• Check for referential integrity.
• Aggregate data as needed. Resolve missing values.
• Consolidate and integrate data
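Several of the transformation functions listed above (cleaning, de-duplication, data type conversion, deriving values, resolving missing values) can be sketched in one small function. The input rows, the field names, and the default of `"UNKNOWN"` for a missing country are assumptions chosen only for this example.

```python
# Hypothetical raw rows as they might arrive from an extract file (all strings).
RAW_ROWS = [
    {"cust": " acme ", "qty": "2", "unit_price": "10.50", "country": "US"},
    {"cust": "ACME", "qty": "2", "unit_price": "10.50", "country": "US"},
    {"cust": "Globex", "qty": "1", "unit_price": "99.00", "country": ""},
]


def transform(rows, default_country="UNKNOWN"):
    """Clean, convert, derive, resolve missing values, and de-duplicate."""
    seen, result = set(), []
    for row in rows:
        record = {
            "customer": row["cust"].strip().upper(),         # clean
            "quantity": int(row["qty"]),                     # convert data type
            "unit_price": float(row["unit_price"]),          # convert data type
            "country": row["country"] or default_country,    # resolve missing value
        }
        record["line_total"] = record["quantity"] * record["unit_price"]  # derive
        key = (record["customer"], record["quantity"],
               record["unit_price"], record["country"])
        if key not in seen:                                  # de-duplicate
            seen.add(key)
            result.append(record)
    return result


clean_rows = transform(RAW_ROWS)
```

The first two input rows become identical once the customer name is cleaned, so only one of them survives.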
Data Acquisition (Cont. )
3. Data Staging
• Provide backup and recovery.
• Sort and merge files.
• Create files as input to make changes to dimension tables
• If data staging storage is a relational database, create and populate
database.
• Preserve audit trail to relate each data item in the data warehouse to
input source.
• Resolve and create primary and foreign keys for load tables.
• Consolidate datasets and create flat files for loading through DBMS
utilities.
• If staging area storage is a relational database, extract load files
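Key resolution and the audit trail can be sketched together. In this hypothetical example, `assign_keys` looks up each natural customer code in a key map, creates a new surrogate key when the code is unseen, and stamps every load row with its input source. The field names and the sequential-integer key scheme are assumptions.

```python
def assign_keys(staged_rows, key_map, source_name):
    """Resolve natural keys to surrogate keys and tag each row with its source."""
    load_rows = []
    for row in staged_rows:
        natural_key = row["customer_code"]
        if natural_key not in key_map:
            key_map[natural_key] = len(key_map) + 1  # create the next surrogate key
        load_rows.append({
            "customer_key": key_map[natural_key],    # foreign key for the load table
            "amount": row["amount"],
            "source": source_name,                   # audit trail back to the input
        })
    return load_rows


key_map = {"C-100": 1}  # keys already assigned in an earlier load
staged = [
    {"customer_code": "C-100", "amount": 50.0},
    {"customer_code": "C-200", "amount": 75.0},
    {"customer_code": "C-200", "amount": 20.0},
]
load_rows = assign_keys(staged, key_map, source_name="orders_extract")
```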
Data Storage (Cont. )
Data Storage: This area covers the process of loading the data from
the staging area into the data warehouse repository.

Data Flow: For data storage, the data flow runs from the data staging
area to the data warehouse repository.
• If the data warehouse is an enterprise-wide data warehouse being
built in a top-down fashion, then there could be movements of data
from the enterprise-wide data warehouse repository to the
repositories of the dependent data marts.
• Alternatively, if the data warehouse is being built in a bottom-up manner,
then the data movements stop with the appropriate conformed data
marts.
Data Storage (Cont. )
Data Groups: Prepared data waiting in the data staging area falls into
two groups.
• The first group is the set of files or tables containing data for a full
refresh. This group of data is usually meant for the initial loading of
the data warehouse. Occasionally, some data warehouse tables may
be refreshed fully.
• The other group of data is the set of files or tables containing ongoing
incremental loads. Most of these relate to nightly loads. Some
incremental loads of dimension data may be performed at less
frequent intervals.
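The difference between the two groups can be sketched against a small relational table. SQLite is used here only as a convenient stand-in for the warehouse RDBMS; the `dim_product` table and both function names are hypothetical, and `INSERT OR REPLACE` is SQLite-specific syntax for overwriting a row with the same primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)


def full_refresh(connection, rows):
    """Replace the entire table contents, as in an initial load."""
    with connection:  # one transaction: delete and reload together
        connection.execute("DELETE FROM dim_product")
        connection.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", rows)


def incremental_load(connection, rows):
    """Apply only new or changed rows, as in a nightly load."""
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO dim_product VALUES (?, ?, ?)", rows
        )


full_refresh(conn, [(1, "Widget", 9.99), (2, "Gadget", 19.99)])
incremental_load(conn, [(2, "Gadget", 17.99), (3, "Gizmo", 4.50)])
table_rows = conn.execute("SELECT * FROM dim_product ORDER BY product_id").fetchall()
```

After the incremental load, product 1 is untouched, product 2 carries its new price, and product 3 has been added.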

The Data Repository: Almost all of today’s data warehouse databases
are relational databases. All the power, flexibility, and ease of use
capabilities of the RDBMS become available for the processing of data.
Data Storage (Cont. )
Functions and Services: This is a general list. It does not indicate the
extent or complexity of each function or service:
• Load data for full refreshes of data warehouse tables
• Perform incremental loads at regular prescribed intervals
• Support loading into multiple tables at detailed and summarized
levels
• Optimize the loading process
• Provide automated job control services for loading data warehouse
• Provide backup and recovery for the data warehouse database
• Provide security
• Monitor and fine-tune the database
• Periodically archive data from the database according to preset
conditions
Information Delivery (Cont. )
Information delivery: This area spans a broad spectrum of many
different methods of making information available to users.
• The information delivery component makes it easy for users to access
the information either directly from the enterprise-wide data
warehouse, from dependent data marts, or from the set of conformed
data marts.
• Most of the information access in a data warehouse is through online
queries and interactive analysis sessions. Nevertheless, your data
warehouse will also be producing regular and ad hoc reports.

• Data Flow: For information delivery, the data flow begins at the
enterprise-wide data warehouse and the dependent data marts when
the design is based on the top-down technique.
• When the design follows the bottom-up method, the data flow starts
at the set of conformed data marts.
• Data transformed into information flows to the user desktops during
query sessions.
Information Delivery (Cont. )
Service Locations: In your information delivery component, you may
provide query services from the user desktop, from an application
server, or from the database itself. This will be one of the critical
decisions for your architecture design.

Data Store: For information delivery, you may consider the following
intermediary data stores:
• Proprietary temporary stores to hold results of individual queries and
reports for repeated use
• Data stores for standard reporting
• Proprietary multidimensional databases

Functions and Services: This is a general list. It does not indicate the
extent or complexity of each function or service.
• Provide security to control information access
Information Delivery (Cont. )
• Monitor user access to improve service and for future enhancements
• Allow users to browse data warehouse content
• Simplify access by hiding internal complexities of storage from users
• Automatically reformat queries for optimal execution
• Enable queries to be aware of aggregate tables for faster results
• Govern queries and control runaway queries
• Provide self-service report generation for users, consisting of a
variety of flexible options to create, schedule, and run reports
• Store result sets of queries and reports for future use
• Provide multiple levels of data granularity
• Provide event triggers to monitor data loading
• Make provision for the users to perform complex analysis through
online analytical processing (OLAP)
• Enable data feeds to downstream, specialized decision support
systems such as EIS and data mining
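Making queries "aware of aggregate tables" can be sketched as a small router that picks the smallest table able to answer a request. The catalog below, its table names, the row counts, and the `choose_table` function are all hypothetical; real products implement this inside the query tool or the database optimizer.

```python
# Hypothetical catalog: table name -> (dimension columns available, approximate rows).
AGGREGATES = {
    "sales_by_year": ({"year"}, 10),
    "sales_by_year_region": ({"year", "region"}, 50),
    "sales_detail": ({"year", "quarter", "region", "product", "day"}, 1_000_000),
}


def choose_table(requested_dims, catalog=AGGREGATES):
    """Return the smallest table that contains every requested dimension."""
    candidates = [
        (row_count, name)
        for name, (dims, row_count) in catalog.items()
        if set(requested_dims) <= dims
    ]
    if not candidates:
        raise ValueError(f"no table can answer a query on {sorted(requested_dims)}")
    return min(candidates)[1]


table_for_year = choose_table(["year"])
table_for_region = choose_table(["region"])
table_for_product = choose_table(["product"])
```

A query by year is served from the 10-row summary, while a query by product has to fall back to the detailed table.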