
Lec2 - DWH Architecture

A three-tier data warehouse architecture typically consists of a bottom, middle, and top tier. The bottom tier is the data warehouse database which uses tools to extract, clean, and load data from source systems. The middle tier can be either a ROLAP or MOLAP server. The top tier contains query, reporting, and analysis tools for end users. Key components of a data warehouse also include operational data stores, a load manager for extraction and loading, a warehouse manager for data storage and organization, a query manager for access, and end user tools for analysis.

Uploaded by

sana faiz
Copyright
© All Rights Reserved

Data Warehousing

Data Warehouse Architecture

Three Tier Data Warehouse Architecture
• Generally, a data warehouse adopts a three-tier architecture.

• Bottom Tier - The bottom tier of the architecture is the
data warehouse database server, typically a relational
database system. Back-end tools and utilities feed data
into this tier by performing the extract, clean, load, and
refresh functions.
• Middle Tier - The middle tier is the OLAP server, which
can be implemented in either of the following ways:
– Relational OLAP (ROLAP): an extended relational
database management system that maps operations on
multidimensional data to standard relational operations.
– Multidimensional OLAP (MOLAP): a server that directly
implements multidimensional data and operations.
• Top Tier - This tier is the front-end client layer. It holds
the query tools, reporting tools, analysis tools, and data
mining tools.
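The difference between the two middle-tier options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SALES` facts, the table name `sales`, and the helper names are hypothetical, and an in-memory SQLite database stands in for the relational server. The ROLAP function answers a roll-up with SQL over a relational table, while the MOLAP functions pre-aggregate the facts into a small cube and roll it up directly.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales facts: (region, product, amount).
SALES = [
    ("North", "Laptop", 1200.0),
    ("North", "Phone", 800.0),
    ("South", "Laptop", 1500.0),
    ("South", "Phone", 600.0),
    ("North", "Laptop", 300.0),
]

def rolap_rollup(rows):
    """ROLAP style: keep facts in a relational table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    )
    conn.close()
    return result

def molap_cube(rows):
    """MOLAP style: pre-aggregate facts into a cube keyed by (region, product)."""
    cube = defaultdict(float)
    for region, product, amount in rows:
        cube[(region, product)] += amount
    return dict(cube)

def molap_rollup(cube):
    """Roll the cube up along the product dimension, leaving totals per region."""
    totals = defaultdict(float)
    for (region, _product), amount in cube.items():
        totals[region] += amount
    return dict(totals)

rolap_totals = rolap_rollup(SALES)
molap_totals = molap_rollup(molap_cube(SALES))
```

Both paths should give the same totals per region; they differ in where the multidimensional structure lives (SQL tables versus a dedicated cube).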

Figure: Typical architecture of a data warehouse
Data Warehouse Components
• Operational data store
• Load Manager
• Warehouse Manager
• Query Manager
• End-User Access Tools

Operational Data Stores (ODS)
• The data in a data warehouse comes from operational systems of
the organization as well as from other external sources. These
are collectively referred to as source systems.
• Data for the data warehouse is supplied from:
– Operational data held in network databases.
– Departmental data held in proprietary file systems such as
VSAM (Virtual Storage Access Method).
– Private data held on workstations and private servers.
– External systems such as the Internet and commercially
available databases.
– Databases associated with an organization’s suppliers or
customers.

Data Warehouse Components
• Load Manager
– This component performs the operations
required for the extract and load process.
– The size and complexity of the load manager
vary from one data warehouse solution to
another.

Load Manager Architecture

• The load manager extracts data from different sources, performs
simple transformations into a structure similar to the one in the
data warehouse, and loads the result into a temporary data store.
• The load manager performs the following functions:
– Identification of data.
– Validation of data for accuracy.
– Extraction of data from the original source.
– Cleansing of data by eliminating meaningless values and
making it usable.
– Data formatting.
– Data standardization by bringing values into a consistent form.
– Data merging by taking data from different sources and
consolidating it in one place.
– Establishing referential integrity.
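The functions listed above can be sketched as a tiny pipeline. This is a minimal illustration, not a real load manager: the two source extracts (`CRM_ROWS`, `BILLING_ROWS`), their field names, and the helper functions are all hypothetical. It shows cleansing and standardization of customer records, validation and formatting of payment records, and a merge step that rejects rows violating referential integrity.

```python
# Hypothetical source extracts: a CRM system and a billing system.
CRM_ROWS = [
    {"cust_id": "1", "name": " alice ", "country": "pk"},
    {"cust_id": "2", "name": "Bob", "country": "PK"},
    {"cust_id": "", "name": "???", "country": "PK"},  # meaningless record
]
BILLING_ROWS = [
    {"cust_id": "1", "amount": "250.50"},
    {"cust_id": "2", "amount": "n/a"},    # fails validation
    {"cust_id": "9", "amount": "75.00"},  # refers to an unknown customer
]

def cleanse_customers(rows):
    """Drop records without a key, then standardize name and country."""
    cleaned = {}
    for row in rows:
        if not row["cust_id"].strip():
            continue
        cust_id = int(row["cust_id"])
        cleaned[cust_id] = {
            "cust_id": cust_id,
            "name": row["name"].strip().title(),
            "country": row["country"].strip().upper(),
        }
    return cleaned

def validate_payments(rows):
    """Keep only rows whose amount parses as a number, with typed fields."""
    valid = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        valid.append({"cust_id": int(row["cust_id"]), "amount": amount})
    return valid

def load(customers, payments):
    """Merge both sources and enforce referential integrity on cust_id."""
    staged, rejected = [], []
    for pay in payments:
        if pay["cust_id"] in customers:
            staged.append({**customers[pay["cust_id"]], **pay})
        else:
            rejected.append(pay)
    return staged, rejected

customers = cleanse_customers(CRM_ROWS)
staged_rows, rejected_rows = load(customers, validate_payments(BILLING_ROWS))
```

Here `staged_rows` plays the role of the temporary data store, and `rejected_rows` collects payments whose customer does not exist in the cleansed CRM data.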

Data Warehouse Components
• The warehouse manager is the centre of the data
warehousing system. It organizes the data within the
warehouse so that it is easy to find and use, and so that
it can be refreshed frequently from its sources.
– A warehouse manager is responsible for the warehouse
management process.
– It consists of third-party system software, C
programs, and data management tools.
– The size and complexity of warehouse managers vary
between specific solutions.

Warehouse Manager Architecture

• Operations performed by the warehouse manager:
– Analysis of data to ensure consistency.
– Transformation and merging of source data
from temporary storage into data warehouse
tables.
– Creation of indexes and views on base tables.
– Generation of denormalizations (if necessary).
– Generation of aggregations (if necessary).
– Backing up and archiving data.
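Several of these operations map directly onto SQL statements. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table, index, and view names (`staging_sales`, `fact_sales`, `agg_sales_by_region`, and so on) are hypothetical, and a production warehouse manager would run on a full RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Temporary (staging) store filled by the load manager.
    CREATE TABLE staging_sales (sale_id INTEGER, region TEXT, amount REAL);
    INSERT INTO staging_sales VALUES
        (1, 'North', 100.0), (2, 'North', 50.0),
        (3, 'South', 70.0), (3, 'South', 70.0);

    -- Base warehouse table.
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
""")

# Consistency analysis: find sale_ids that occur more than once in staging.
duplicates = [row[0] for row in conn.execute(
    "SELECT sale_id FROM staging_sales GROUP BY sale_id HAVING COUNT(*) > 1"
)]

# Merge staging rows into the warehouse table, collapsing exact duplicates.
conn.execute(
    "INSERT INTO fact_sales SELECT DISTINCT sale_id, region, amount FROM staging_sales"
)

# Index and view on the base table.
conn.execute("CREATE INDEX idx_fact_sales_region ON fact_sales (region)")
conn.execute(
    "CREATE VIEW v_north_sales AS SELECT * FROM fact_sales WHERE region = 'North'"
)

# Aggregation (summary table) generated from the base table.
conn.execute("""
    CREATE TABLE agg_sales_by_region AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS n_sales
    FROM fact_sales GROUP BY region
""")
conn.commit()

summary = conn.execute(
    "SELECT region, total_amount, n_sales FROM agg_sales_by_region ORDER BY region"
).fetchall()
north_count = conn.execute("SELECT COUNT(*) FROM v_north_sales").fetchone()[0]

# Backup: copy the whole database into a second connection.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
backup_count = backup_conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The summary table trades storage for query speed: a query that only needs totals per region can read `agg_sales_by_region` instead of scanning `fact_sales`.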

Query Manager
• The query manager provides end users with access to
the stored warehouse information through specialized
end-user tools. These access tools fall into categories
such as query and reporting, online analytical
processing (OLAP), statistics, data discovery,
and graphical and geographical information
systems.
• The query manager is responsible for directing
queries to the suitable tables. Directing each query
to the most appropriate table speeds up querying
and response generation.
• The query manager is responsible for scheduling
the execution of the queries posed by users.
• This component is typically constructed using
end-user data access tools and data warehouse
monitoring tools.

Query Manager Architecture

• The query manager includes the following:
– Query redirection via C tool or RDBMS
– Stored procedures
– Query management tool
– Query scheduling via C tool or RDBMS
– Query scheduling via third-party software
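Query redirection and query scheduling can be illustrated with a small sketch. Everything here is hypothetical: the catalogue `SUMMARY_TABLES`, the table names, and the `QueryScheduler` class are invented for the example and do not correspond to any particular tool. Redirection picks the smallest summary table that covers the requested grouping columns and falls back to the base table; scheduling uses a priority queue in which a lower number means higher priority.

```python
import heapq
import itertools

# Hypothetical catalogue: which summary table covers which grouping columns.
SUMMARY_TABLES = {
    frozenset({"region"}): "agg_sales_by_region",
    frozenset({"region", "month"}): "agg_sales_by_region_month",
}
BASE_TABLE = "fact_sales"

def redirect(group_by):
    """Pick the smallest summary table that covers the requested columns."""
    wanted = frozenset(group_by)
    candidates = [
        (len(cols), table)
        for cols, table in SUMMARY_TABLES.items()
        if wanted <= cols
    ]
    return min(candidates)[1] if candidates else BASE_TABLE

class QueryScheduler:
    """Run queries in priority order; ties keep their submission order."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def submit(self, name, group_by, priority):
        table = redirect(group_by)
        heapq.heappush(self._queue, (priority, next(self._counter), name, table))

    def run_order(self):
        order = []
        while self._queue:
            _priority, _seq, name, table = heapq.heappop(self._queue)
            order.append((name, table))
        return order

scheduler = QueryScheduler()
scheduler.submit("daily_detail", ["customer"], priority=2)
scheduler.submit("exec_dashboard", ["region"], priority=1)
scheduler.submit("monthly_trend", ["month"], priority=2)
execution_order = scheduler.run_order()
```

In this run the dashboard query is executed first and is served from the regional summary table, while the detail query, which no summary table covers, goes to the base table.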

Metadata
• This area of the warehouse stores all the metadata
(data about data).
• Metadata is used for a variety of purposes
including:
– the extraction and loading processes – metadata is
used to map data sources to a common view of the data
within the warehouse
– the warehouse management process – metadata is
used to automate the production of summary tables
– as part of the query management process – metadata
is used to direct a query to the most appropriate data
source.
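As a rough illustration of the first two uses, metadata can be held as plain data structures that drive the loading and summarization code. The source names, column names, and summary definition below are assumptions made up for this sketch.

```python
# Hypothetical metadata: how each source's columns map onto the warehouse view.
SOURCE_MAPPINGS = {
    "crm": {"CustNo": "customer_id", "FullName": "customer_name"},
    "billing": {"customer": "customer_id", "amt": "amount"},
}

# Hypothetical metadata describing one summary table to be produced.
SUMMARY_DEFS = [
    {
        "name": "agg_amount_by_customer",
        "source": "fact_payments",
        "group_by": ["customer_id"],
        "measure": "amount",
    },
]

def to_common_view(source, record):
    """Rename a source record's fields using the metadata for that source."""
    mapping = SOURCE_MAPPINGS[source]
    return {
        mapping[field]: value
        for field, value in record.items()
        if field in mapping  # unmapped source fields are dropped
    }

def summary_sql(definition):
    """Generate the CREATE TABLE statement for a summary table from metadata."""
    keys = ", ".join(definition["group_by"])
    return (
        f"CREATE TABLE {definition['name']} AS "
        f"SELECT {keys}, SUM({definition['measure']}) AS total "
        f"FROM {definition['source']} GROUP BY {keys}"
    )

crm_row = to_common_view("crm", {"CustNo": 7, "FullName": "Sana", "Fax": None})
billing_row = to_common_view("billing", {"customer": 7, "amt": 120.0})
statements = [summary_sql(d) for d in SUMMARY_DEFS]
```

Because the mappings and summary definitions are data rather than code, adding a new source or summary table only requires a new metadata entry.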
End-User Access Tools
• The principal purpose of data warehousing
is to provide information to business users
for strategic decision-making.
• These users interact with the warehouse
using end-user access tools such as:
– Reporting and query tools.
– Online analytical processing (OLAP) tools.
– Data mining tools.

Data Warehouse Data Flows

• Inflow: Extraction, cleansing, and loading of the
source data.
• Upflow: Adding value to the data in the
warehouse through summarizing, packaging, and
distribution of the data.
• Downflow: Archiving and backing up the data
in the warehouse.
• Outflow: Making the data available to end
users.
• Metaflow: Managing the metadata.

Data Warehouse Models
• Virtual Warehouse
– A set of views over operational databases is
known as a virtual warehouse.
– Building a virtual warehouse is easy, but it requires
excess capacity on operational database servers.
• Data mart
• Enterprise Warehouse
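A virtual warehouse can be pictured as nothing more than views defined over operational tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an operational database; the `orders` table and the view name `vw_customer_revenue` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table, updated by day-to-day transactions.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'Acme', 100.0), (2, 'Acme', 40.0), (3, 'Zed', 10.0);

    -- The "virtual warehouse": an analytical view, no data is copied.
    CREATE VIEW vw_customer_revenue AS
        SELECT customer, SUM(amount) AS revenue FROM orders GROUP BY customer;
""")

query = "SELECT revenue FROM vw_customer_revenue WHERE customer = 'Acme'"
before = conn.execute(query).fetchone()[0]

# A new operational transaction is visible through the view immediately.
conn.execute("INSERT INTO orders VALUES (4, 'Acme', 60.0)")
after = conn.execute(query).fetchone()[0]
```

Every query against the view is computed on the operational server at query time, which is why this model needs spare capacity on those servers.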

Data Warehouse Models
• Virtual Warehouse
• Data mart
– A data mart contains a subset of organization-wide data.
– Data marts are small in size.
– Data marts are customized by department.
– The source of a data mart is a departmentally
structured data warehouse.
– Data marts are flexible.
• Enterprise Warehouse
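The idea of a data mart as a departmental subset can be sketched with a single `CREATE TABLE ... AS SELECT` statement. The example uses Python's built-in `sqlite3` module; the table names `warehouse_facts` and `mart_sales` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Organization-wide data held in the warehouse.
    CREATE TABLE warehouse_facts (department TEXT, item TEXT, amount REAL);
    INSERT INTO warehouse_facts VALUES
        ('Sales', 'Laptop', 1200.0),
        ('Sales', 'Phone', 800.0),
        ('HR', 'Training', 300.0),
        ('Finance', 'Audit', 500.0);

    -- Data mart for one department: a small, customized subset.
    CREATE TABLE mart_sales AS
        SELECT item, amount FROM warehouse_facts WHERE department = 'Sales';
""")

mart_rows = conn.execute(
    "SELECT item, amount FROM mart_sales ORDER BY item"
).fetchall()
warehouse_count = conn.execute("SELECT COUNT(*) FROM warehouse_facts").fetchone()[0]
```

Unlike a view, the mart here is a physical copy, so the department can index and reshape it without touching the warehouse tables.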

Data Warehouse Models
• Virtual Warehouse
• Data mart
• Enterprise Warehouse
– An enterprise warehouse collects all the information
about the subjects spanning an entire organization.
– It provides enterprise-wide data integration.
– The data is integrated from operational systems and
external information providers.
– Its size can range from a few gigabytes to
hundreds of gigabytes, terabytes, or beyond.

Data Warehousing Tools
• Amazon Redshift
• Teradata
• Oracle
• Informatica
• IBM InfoSphere
• Ab Initio Software
• ParAccel
• Cloudera
• AnalytiX DS
• MarkLogic

Assignment and Quiz #2
• Give a brief introduction to five data
warehousing tools
– Handwritten
– Deadline: Next lecture
• Quiz 2 next lecture

Project
• Submit the title and a brief introduction of the
business for which you want to build a data
warehouse or data mart.
• Also highlight the subject(s) you want to
analyze.
