Lecture 6

Uploaded by

samkh866n

THE ARCHITECTURAL COMPONENTS

Lecture # 06
Instructor: Mr. Sharjeel Ahmed
Slide Elements
• The architectural components
• Data Warehouse Architecture
• Distinguishing Characteristics
• Architectural Framework
• Technical Architecture
DATA WAREHOUSE ARCHITECTURE
Architecture: Definitions
• The structure that brings all the components of a data warehouse
together is known as the architecture.
• In your data warehouse, architecture includes a number of factors.
• Primarily, it includes the integrated data that is the centerpiece.
• The architecture includes everything that is needed to prepare the
data and store it.
• It also includes all the means for delivering information from your data
warehouse.
• The architecture is further composed of the rules, procedures, and
functions that enable your data warehouse to work and fulfill the
business requirements.
• Finally, the architecture is made up of the technology that empowers
your data warehouse.
General Purpose of The Architecture
• The architecture provides the overall framework for developing and
deploying your data warehouse.

• It is a comprehensive blueprint.
• The architecture defines the standards, measurements, general
design, and support techniques.
Architecture in Three Major Areas
As you already know, the three major areas in the data warehouse are:
• Data acquisition
• Data storage
• Information delivery
DISTINGUISHING CHARACTERISTICS
Distinguishing Characteristics
• Data warehouse architecture is wide, complex, and expansive.
• In a data warehouse, the architecture consists of distinct components.
• The architecture has distinguishing characteristics worth considering
in detail.
• Before moving on to discuss the architectural framework itself, let us
review the distinguishing characteristics of data warehouse
architecture:
Distinguishing Characteristics (Cont. )
Different Objectives and Scope
Defining the scope for a data warehouse is difficult. There are several
sets of factors to consider:
• The number and extent of the data sources. How many legacy systems are
you going to extract the data from? What are the external sources?
• Are you planning to include departmental files, spreadsheets, and private
databases? What about including the archived data?
• Scope of the architecture may again be measured in terms of the data
transformations and integration functions.
• Data granularity and data volumes are also important considerations.
• Yet another serious consideration is the impact of the data warehouse on the
existing operational systems. Because of the data extractions, comparisons,
and reconciliation, you have to determine how much negative impact the data
warehouse will have on the performance of operational systems.
• When will your batch extracts be run and how will they affect the production
source systems?
Distinguishing Characteristics (Cont. )
Data Content
• The “read-only” data in the data warehouse sits in the middle as the
primary component in the architecture.
• In your data warehouse, you keep data integrated from multiple sources.
• After extracting the data, which by itself is an elaborate process, you transform
the data, cleanse it, and integrate it in a staging area. Only then do you move the
integrated data into the data warehouse repository as read-only data.
Operational data, by contrast, is not read-only.
• Further, your data warehouse architecture must support the storing of data
grouped by business subjects, not grouped by applications as in the case of
operational systems.
• The data in your data warehouse does not represent a snapshot containing
the values of the variables as they are at the current time. Instead, it holds
historical data accumulated over time. This is different and distinct from most
operational systems.
Distinguishing Characteristics (Cont. )
Complex Analysis and Quick Response
• Data warehouse architecture must support complex analysis of the
strategic information by the users.
• Users must be able to drill down, roll up, slice and dice data, and play with
“what-if ” scenarios.
• Users must have the capability to review the result sets in different output
options.
• Users are no longer content with textual result sets or results displayed in
tabular formats. Every result set in tabular format must be translated into
graphical charts.
• Your data warehouse architecture must make it easy to make strategic
decisions quickly.
• The architecture must contain appropriate components so that users can
respond quickly to situations by using the information provided by your
data warehouse.
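To make the roll-up, drill-down, and slice operations concrete, here is a minimal sketch in Python. The sales rows, the column layout (region, city, quarter, amount), and the helper functions are all invented for illustration; in practice an OLAP tool or the database itself would do this work.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, quarter, amount)
SALES = [
    ("North", "Lahore", "Q1", 100),
    ("North", "Lahore", "Q2", 150),
    ("North", "Islamabad", "Q1", 80),
    ("South", "Karachi", "Q1", 200),
    ("South", "Karachi", "Q2", 120),
]

def aggregate(rows, key_indexes):
    """Sum the amount column grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)

def slice_rows(rows, index, value):
    """Slice: keep only rows where one dimension has a fixed value."""
    return [row for row in rows if row[index] == value]

# Roll up: summarize to the region level.
by_region = aggregate(SALES, [0])
# Drill down: break each region into its cities.
by_city = aggregate(SALES, [0, 1])
# Slice: look only at Q1, then summarize by region.
q1_by_region = aggregate(slice_rows(SALES, 2, "Q1"), [0])
```

Rolling up and drilling down use the same aggregation at a coarser or finer grouping, which is why the architecture has to keep data at more than one level of detail.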
Distinguishing Characteristics (Cont. )
Flexible and Dynamic
• You have to make sure your data warehouse architecture is flexible
enough to accommodate additional requirements as and when they
surface.
• Additional requirements surface when items were missed in the original
business requirements.
• Moreover, business conditions themselves change. In fact, they keep
on changing. Changing business conditions call for additional
business requirements to be included in the data warehouse.
• If the data warehouse architecture is designed to be flexible and
dynamic, then your data warehouse can cater to the supplemental
requirements as and when they arise.
Distinguishing Characteristics (Cont. )
Metadata-driven
• As the data moves from the source systems to the end-users as
useful, strategic information, metadata surrounds the entire
movement.
• The metadata component of the architecture holds data about every
phase of the movement, and, in a true sense, makes the movement
happen.
• In your data warehouse architecture, the metadata component
interleaves with and connects the other components.
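One way to picture "metadata-driven" is a mapping table that tells the load process what to do, instead of logic written separately for each field. The sketch below assumes a hypothetical source record and hypothetical column names; it only illustrates the idea and is not how any particular tool stores its metadata.

```python
# Hypothetical metadata: one entry per warehouse column, naming the
# source field it comes from and the conversion to apply.
COLUMN_METADATA = [
    {"target": "customer_name", "source": "CUST_NM", "convert": str.strip},
    {"target": "order_total", "source": "ORD_AMT", "convert": float},
    {"target": "order_year", "source": "ORD_DT", "convert": lambda d: int(d[:4])},
]

def apply_metadata(source_record, metadata):
    """Build a warehouse row by following the metadata, not hard-coded logic."""
    return {
        entry["target"]: entry["convert"](source_record[entry["source"]])
        for entry in metadata
    }

source_record = {"CUST_NM": "  Acme Ltd ", "ORD_AMT": "1250.50", "ORD_DT": "2023-04-17"}
warehouse_row = apply_metadata(source_record, COLUMN_METADATA)
```

Adding a new warehouse column then means adding one metadata entry, while the function that moves the data stays the same.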
ARCHITECTURAL FRAMEWORK
Architectural Framework
• Earlier (see slide 6), we grouped the architectural
components as building blocks in the three distinct areas of data
acquisition, data storage, and information delivery.
• In each of these broad areas of the data warehouse, the architectural
components serve specific purposes.
1. Architecture Supporting Flow of Data
2. The Management and Control Module
Architecture Supporting Flow of Data
• Data that finally reaches the end-user as useful strategic information
begins as disparate data elements in the various data sources.
• This collection of data from the various sources moves to the staging
area.
• The extracted data then goes through a detailed preparation process
in the staging area before it is sent forward to the data warehouse to
be properly stored.
• From the data warehouse storage, data transformed into useful
information is retrieved by the users or delivered to the user desktops
as required.
• Let us now follow the flow of the data. At each stop along the passage,
let us identify the architectural components. Some of the architectural
components govern the flow of data from beginning to end.
Architecture Supporting Flow of Data (Cont. )
• What are the architectural components, and how do these components enable
the data flow?
At the Data Source
• Here the internal and external data sources form the source data architectural
component. Source data governs the extraction of data for preparation and
storage in the data warehouse. The data staging architectural component
governs the transformation, cleansing, and integration of data.
In the Data Warehouse Repository
• The data storage architectural component includes the loading of data from the
staging area and also storing the data in suitable formats for information
delivery. The metadata architectural component is also a storage mechanism to
contain data about the data at every point of the flow of data from beginning to
end.
At the User End
• The information delivery architectural component includes dependent data
marts, special multidimensional databases, and a full range of query and
reporting facilities.
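The flow through the components can be sketched as four small Python functions, one per architectural component. Everything here (the function names, the sample product names, the trivial cleansing rule) is an assumption made for illustration; real components are far more elaborate.

```python
def extract(sources):
    """Source data component: pull raw records from every source."""
    return [record for source in sources for record in source]

def stage(raw_records):
    """Data staging component: cleanse and integrate (here: trim, drop duplicates)."""
    seen, prepared = set(), []
    for record in raw_records:
        cleaned = record.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            prepared.append(cleaned)
    return prepared

def load(repository, prepared_records):
    """Data storage component: append prepared records to the repository."""
    repository.extend(prepared_records)
    return repository

def deliver(repository, prefix):
    """Information delivery component: answer a simple user query."""
    return [record for record in repository if record.startswith(prefix)]

internal = [" Widget ", "gadget"]     # hypothetical internal source
external = ["WIDGET", "gizmo", ""]    # hypothetical external source
warehouse = load([], stage(extract([internal, external])))
answer = deliver(warehouse, "g")
```

Note that the user only ever touches the output of `deliver`; the source systems and the staging area stay hidden behind the repository.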
Architecture Supporting Flow of Data (Cont. )
The Management and Control Module
• This architectural component is an overall module managing and
controlling the entire data warehouse environment.
• It is an umbrella component working at various levels and covering
all the operations.
• This component has two major functions: first to constantly monitor
all the ongoing operations, and next to step in and recover from
problems when things go wrong.
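The two functions of this module, monitoring and recovery, can be illustrated with a small job wrapper. This is only a sketch under assumed names (`run_with_control`, `flaky_load`); it retries a failed job and keeps a log of every attempt.

```python
def run_with_control(job, max_attempts=3):
    """Run a job, log each attempt, and retry when it fails."""
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
            log.append((attempt, "ok"))
            return result, log
        except Exception as error:  # monitoring: record what went wrong
            log.append((attempt, f"failed: {error}"))
    return None, log  # recovery gave up after max_attempts

# A hypothetical load job that fails once, then succeeds.
calls = {"count": 0}

def flaky_load():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("source unavailable")
    return "loaded"

result, log = run_with_control(flaky_load)
```

The log is what the monitoring function would report on, and the retry loop stands in for the recovery function.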
TECHNICAL ARCHITECTURE
Technical Architecture
• The technical architecture of a data warehouse is the set of
functions and services provided within its components. The technical
architecture also includes the procedures and rules that are required
to perform the functions and provide the services. The technical
architecture also encompasses the data stores needed for each
component to provide the services.

• The architecture is not the set of tools needed to perform functions and
provide services. When we refer to the data extraction function within
one of the architectural components, we are simply mentioning the
function itself and the various tasks associated with it. Also, we are
relating the data store for the staging area to the data extraction
function because extracted data is moved to the staging area. Where
do the tools fit in? Tools are the means to implement the architecture.
• Architecture comes first and the tools follow.
Technical Architecture (Cont. )
• Let us now move on to consider the technical architecture in each of
the three major areas of the data warehouse:
1. Data Acquisition
2. Data Storage
3. Information Delivery

Each area is detailed on the following slides.


Data Acquisition
Data Acquisition: This area covers the entire process of extracting
data from the data sources, moving all the extracted data to the staging
area, and preparing the data for loading into the data warehouse
repository.

Two Major Architectural Components: The two major components of this
area are source data and data staging. The functions and services in
this area relate to these architectural components. The variations in the
data sources have a direct impact on the extent and scope of the
functions and services.

Data Flow: The data flow begins at the data sources and pauses at the
staging area. After transformation and integration, the data is ready for
loading into the data warehouse repository.
Data Acquisition (Cont. )
Data Sources: For the majority of data warehouses, the primary data
source consists of the enterprise’s operational systems.
• Many operational systems at several enterprises are still legacy systems
that reside on hierarchical or network databases. Use the appropriate
language of the particular DBMS to extract data.
• Some more recent operational systems run on the client/server
architecture. Usually, these systems are supported by relational DBMSs.
Here you use an SQL-based language for extracting data.
• A large number of companies have adopted ERP (enterprise resource
planning) systems. ERP data sources provide an advantage in that the
data from these sources is already consolidated and integrated. There
could, however, be a few drawbacks to using ERP. You will have to use the
ERP vendor’s proprietary tool for data extraction. Also, most of the ERP
offerings contain very large numbers of source data tables.
• For data from outside sources, you will have to create temporary files to
hold the data received. After reformatting and rearranging the data
elements, you will have to move the data to the staging area.
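For a relational source, extraction with SQL might look like the following sketch. An in-memory SQLite database stands in for the operational system; the `orders` table, its columns, and the "shipped orders only" filter are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational operational system.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 250.0, "SHIPPED"),
        (2, "Bolt", 90.0, "CANCELLED"),
        (3, "Acme", 40.0, "SHIPPED"),
    ],
)

def extract_shipped_orders(connection):
    """Extract only the rows the warehouse needs, using a SQL filter."""
    cursor = connection.execute(
        "SELECT order_id, customer, amount FROM orders "
        "WHERE status = ? ORDER BY order_id",
        ("SHIPPED",),
    )
    return cursor.fetchall()

staging_rows = extract_shipped_orders(operational)
```

Filtering at the source keeps the extract small, which matters because extraction competes with the operational workload.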
Data Acquisition (Cont. )
Intermediary Data Stores: As data gets extracted from data sources,
it moves through temporary files.
• Sometimes, extracts of homogeneous data from several source
applications are pulled into separate temporary files and then merged
into another temporary file before moving it to the staging area.
• The opposite process is also common. From each application, one or
two large flat files are created and then divided into smaller files and
merged appropriately before moving the data to the staging area.
• The general practice is to use flat files to extract data from
operational systems.
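Merging several flat-file extracts into one temporary file can be sketched as below. The two extracts are held as strings so the example is self-contained; the file contents, the `customer_id` key, and the "first row wins" rule are all assumptions.

```python
import csv
import io

# Two hypothetical extract files from different source applications.
billing_extract = "customer_id,name\n1,Acme\n2,Bolt\n"
shipping_extract = "customer_id,name\n2,Bolt\n3,Core\n"

def read_flat_file(text):
    """Parse one comma-separated extract into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_extracts(*extracts):
    """Merge several extracts into one, keeping the first row seen per customer_id."""
    merged = {}
    for text in extracts:
        for row in read_flat_file(text):
            merged.setdefault(row["customer_id"], row)
    return [merged[key] for key in sorted(merged)]

merged_rows = merge_extracts(billing_extract, shipping_extract)
```

With real files you would pass open file objects to `csv.DictReader` instead of wrapping strings in `io.StringIO`.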
Staging Area:
• This is the place where all the extracted data is put together and
prepared for loading into the data warehouse. The staging area may
contain data at the lowest grain to populate tables containing
business measurements. Staging area data repositories are relational
databases containing the fully integrated and cleansed data.
Data Acquisition (Cont. )
• Functions and Services: This is a general list. It does not indicate
the extent or complexity of each function or service:
1. Data Extraction
• Select data sources and determine types of filters to be applied
• Generate automatic extract files from operational systems using
replication and other techniques
• Create intermediary files to store selected data to be merged later
• Transport extracted files from multiple platforms
• Provide automated job control services for creating extract files
• Reformat input from outside sources
• Reformat input from departmental data files, databases, and
spreadsheets
• Generate common application code for data extraction
• Resolve inconsistencies for common data elements from multiple
sources
Data Acquisition (Cont. )
2. Data Transformation
• Map input data to data for data warehouse repository.
• Clean data, de-duplicate, and merge/purge.
• De-normalize extracted data structures as required by the
dimensional model of the data warehouse.
• Convert data types.
• Calculate and derive attribute values.
• Check for referential integrity.
• Aggregate data as needed.
• Resolve missing values.
• Consolidate and integrate data.
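A few of the transformation tasks listed above (de-duplication, data type conversion, resolving missing values, deriving an attribute) can be shown in one short function. The row layout, the `UNKNOWN` default, and the `line_total` attribute are assumptions chosen for the illustration.

```python
# Hypothetical extracted rows; None marks a missing value.
extracted = [
    {"id": "1", "qty": "2", "unit_price": "10.0", "country": "pk"},
    {"id": "1", "qty": "2", "unit_price": "10.0", "country": "pk"},  # duplicate
    {"id": "2", "qty": "5", "unit_price": "3.5", "country": None},
]

def transform(rows, default_country="UNKNOWN"):
    """De-duplicate, convert types, resolve missing values, derive a total."""
    seen_ids, result = set(), []
    for row in rows:
        if row["id"] in seen_ids:  # de-duplicate on the business key
            continue
        seen_ids.add(row["id"])
        qty = int(row["qty"])  # convert data types
        unit_price = float(row["unit_price"])
        country = (row["country"] or default_country).upper()  # resolve missing value
        result.append({
            "id": int(row["id"]),
            "qty": qty,
            "unit_price": unit_price,
            "country": country,
            "line_total": qty * unit_price,  # derived attribute
        })
    return result

transformed = transform(extracted)
```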
Data Acquisition (Cont. )
3. Data Staging
• Provide backup and recovery.
• Sort and merge files.
• Create files as input to make changes to dimension tables
• If data staging storage is a relational database, create and populate
database.
• Preserve audit trail to relate each data item in the data warehouse to
input source.
• Resolve and create primary and foreign keys for load tables.
• Consolidate datasets and create flat files for loading through DBMS
utilities.
• If staging area storage is a relational database, extract load files
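Two of the staging tasks, creating keys for the load tables and preserving an audit trail, might be sketched as follows. The idea of a surrogate key generated in the staging area is a common practice but is an assumption here, as are the field names and source system names.

```python
import itertools

def assign_surrogate_keys(staged_rows, key_map, counter):
    """Attach a warehouse key to each row and record where the row came from."""
    load_rows = []
    for row in staged_rows:
        natural_key = (row["source_system"], row["source_id"])
        if natural_key not in key_map:
            key_map[natural_key] = next(counter)  # new surrogate key
        load_rows.append({
            "customer_key": key_map[natural_key],
            "name": row["name"],
            # audit trail: keep the origin of every loaded row
            "audit_source": f'{row["source_system"]}:{row["source_id"]}',
        })
    return load_rows

staged = [
    {"source_system": "billing", "source_id": "C-17", "name": "Acme"},
    {"source_system": "crm", "source_id": "883", "name": "Bolt"},
    {"source_system": "billing", "source_id": "C-17", "name": "Acme"},
]
key_map = {}
load_rows = assign_surrogate_keys(staged, key_map, itertools.count(1))
```

Because `key_map` persists between calls, the same source record always receives the same warehouse key, and `audit_source` lets each loaded row be traced back to its input.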
Data Storage
Data Storage: This area covers the process of loading the data from
the staging area into the data warehouse repository.

Data Flow: For data storage, the data flows from the data staging
area to the data warehouse repository.
• If the data warehouse is an enterprise-wide data warehouse being
built in a top-down fashion, then there could be movements of data
from the enterprise-wide data warehouse repository to the
repositories of the dependent data marts.
• Alternatively, if the data warehouse is being built in a bottom-up manner,
then the data movements stop with the appropriate conformed data
marts.
Data Storage (Cont. )
Data Groups: Prepared data waiting in the data staging area falls into
two groups.
• The first group is the set of files or tables containing data for a full
refresh. This group of data is usually meant for the initial loading of
the data warehouse. Occasionally, some data warehouse tables may
be refreshed fully.
• The other group of data is the set of files or tables containing ongoing
incremental loads. Most of these relate to nightly loads. Some
incremental loads of dimension data may be performed at less
frequent intervals.
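The difference between the two groups can be seen in a small sketch. An in-memory SQLite database stands in for the warehouse repository; the `sales` table and the sample rows are assumptions, and the incremental load here simply appends new rows.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")

def full_refresh(connection, rows):
    """Full refresh: empty the table, then reload everything."""
    connection.execute("DELETE FROM sales")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def incremental_load(connection, rows):
    """Incremental load: append only the rows that arrived since the last load."""
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def all_sales(connection):
    """Return every row in key order so the effect of each load is visible."""
    query = "SELECT sale_id, amount FROM sales ORDER BY sale_id"
    return connection.execute(query).fetchall()

full_refresh(warehouse, [(1, 100.0), (2, 50.0)])   # initial load
incremental_load(warehouse, [(3, 75.0)])           # nightly load
after_incremental = all_sales(warehouse)

full_refresh(warehouse, [(1, 100.0)])              # occasional full refresh
after_refresh = all_sales(warehouse)
```

A full refresh discards whatever was in the table, so it is usually reserved for the initial load or for small tables, while incremental loads keep the existing history in place.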

The Data Repository: Almost all of today’s data warehouse databases
are relational databases. All the power, flexibility, and ease of use
capabilities of the RDBMS become available for the processing of data.
Data Storage (Cont. )
Functions and Services: This is a general list. It does not indicate the
extent or complexity of each function or service:
• Load data for full refreshes of data warehouse tables
• Perform incremental loads at regular prescribed intervals
• Support loading into multiple tables at detailed and summarized
levels
• Optimize the loading process
• Provide automated job control services for loading data warehouse
• Provide backup and recovery for the data warehouse database
• Provide security
• Monitor and fine-tune the database
• Periodically archive data from the database according to preset
conditions
Information Delivery
Information delivery: This area spans a broad spectrum of many
different methods of making information available to users.
• The information delivery component makes it easy for users to access
the information either directly from the enterprise-wide data
warehouse, from dependent data marts, or from the set of conformed
data marts.
• Most of the information access in a data warehouse is through online
queries and interactive analysis sessions. Nevertheless, your data
warehouse will also be producing regular and ad hoc reports.

• Data Flow: For information delivery, the data flow begins at the
enterprise-wide data warehouse and the dependent data marts when
the design is based on the top-down technique.
• When the design follows the bottom-up method, the data flow starts
at the set of conformed data marts.
• Data transformed into information flows to the user desktops during
query sessions.
Information Delivery (Cont. )
Service Locations: In your information delivery component, you may
provide query services from the user desktop, from an application
server, or from the database itself. This will be one of the critical
decisions for your architecture design.

Data Store: For information delivery, you may consider the following
intermediary data stores:
• Proprietary temporary stores to hold results of individual queries and
reports for repeated use
• Data stores for standard reporting
• Proprietary multidimensional databases

Functions and Services: This is a general list. It does not indicate the
extent or complexity of each function or service.
• Provide security to control information access
Information Delivery (Cont. )
• Monitor user access to improve service and for future enhancements
• Allow users to browse data warehouse content
• Simplify access by hiding internal complexities of storage from users
• Automatically reformat queries for optimal execution
• Enable queries to be aware of aggregate tables for faster results
• Govern queries and control runaway queries
• Provide self-service report generation for users, consisting of a
variety of flexible options to create, schedule, and run reports
• Store result sets of queries and reports for future use
• Provide multiple levels of data granularity
• Provide event triggers to monitor data loading
• Make provision for the users to perform complex analysis through
online analytical processing (OLAP)
• Enable data feeds to downstream, specialized decision support
systems such as executive information systems (EIS) and data mining
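Two items from this list, aggregate-aware queries and governing runaway queries, can be illustrated together. In the sketch below the detail rows, the precomputed monthly table, and the row-count limit are all invented; real query governors typically work on estimated cost or elapsed time rather than a simple row count.

```python
# Hypothetical detail rows and a precomputed monthly aggregate table.
DETAIL = [
    {"day": "2024-01-05", "month": "2024-01", "amount": 10},
    {"day": "2024-01-20", "month": "2024-01", "amount": 15},
    {"day": "2024-02-02", "month": "2024-02", "amount": 7},
]
MONTHLY_AGGREGATE = {"2024-01": 25, "2024-02": 7}

def total_sales(grain, value, max_rows_scanned=1000):
    """Answer from the aggregate table when the grain allows it; otherwise
    scan detail rows, refusing scans larger than the governor's limit."""
    if grain == "month":
        return MONTHLY_AGGREGATE.get(value, 0), "aggregate"
    if len(DETAIL) > max_rows_scanned:
        raise RuntimeError("query stopped by governor: scan too large")
    total = sum(row["amount"] for row in DETAIL if row[grain] == value)
    return total, "detail"

monthly = total_sales("month", "2024-01")   # served from the aggregate table
daily = total_sales("day", "2024-01-20")    # falls back to the detail rows
```

The user asks the same kind of question in both cases; the delivery component decides which table answers it, which is what hiding the internal complexities of storage means in practice.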