Data Warehousing Building Blocks

This document discusses the key components and concepts involved in building a data warehouse, including: 1) Data is extracted from multiple sources like databases, files, and transactions, then transformed through cleaning, standardization, and integration before being loaded into the data warehouse. 2) A data warehouse is subject-oriented, integrated, nonvolatile, and time-variant, with data organized around major subjects and providing an historical perspective. 3) The data warehouse contains data from production systems, internal sources, archived data, and external sources, which is prepared and stored separately from transaction systems.

Uploaded by

dsrawat
Copyright © All Rights Reserved

Data Warehousing: Building Blocks

Definition
- Bill Inmon, considered to be the father of data warehousing, provides the following definition:
  - “A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions.”
- The defining features are:
  - Subject-oriented
  - Integrated
  - Nonvolatile
  - Time-variant
  - Data granularity
Data Warehouse—Subject-Oriented

- Organized around major subjects, such as customer, product, and sales
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
- Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process

3 September 1,2012
Subject Oriented
Data Warehouse—Integrated
- Constructed by integrating multiple, heterogeneous data sources
  - Relational databases, flat files, online transaction records
- Data cleaning and data integration techniques are applied.
  - They ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources
  - E.g., hotel price: currency, tax, whether breakfast is covered, etc.
- When data is moved to the warehouse, it is converted.
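
The hotel price example can be made concrete with a minimal Python sketch. It assumes two hypothetical sources that quote prices in different currencies and with different tax conventions; the exchange rate, tax rate, and all field names are invented for illustration and are not taken from the slides.

```python
from dataclasses import dataclass

# Hypothetical conversion constants; real values would come from reference data.
USD_PER_EUR = 1.10
TAX_RATE = 0.10


@dataclass(frozen=True)
class HotelPrice:
    """Assumed warehouse convention: USD, tax included, explicit breakfast flag."""
    hotel: str
    price_usd_incl_tax: float
    breakfast_included: bool


def from_source_a(row: dict) -> HotelPrice:
    # Source A (hypothetical): EUR, tax already included, breakfast as "Y"/"N".
    return HotelPrice(
        hotel=row["name"].strip().title(),
        price_usd_incl_tax=round(row["price_eur"] * USD_PER_EUR, 2),
        breakfast_included=row["bfast"] == "Y",
    )


def from_source_b(row: dict) -> HotelPrice:
    # Source B (hypothetical): USD, tax excluded, breakfast as a boolean.
    return HotelPrice(
        hotel=row["hotel_name"].strip().title(),
        price_usd_incl_tax=round(row["net_usd"] * (1 + TAX_RATE), 2),
        breakfast_included=bool(row["breakfast"]),
    )


a = from_source_a({"name": "grand plaza ", "price_eur": 100.0, "bfast": "Y"})
b = from_source_b({"hotel_name": "Grand Plaza", "net_usd": 100.0, "breakfast": False})
print(a)
print(b)
```

Both records end up in the same naming, currency, and tax convention, so they can be compared directly once they are in the warehouse.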

Integrated Data
Data Warehouse—Nonvolatile
- A physically separate store of data transformed from the operational environment
- Operational updates of data do not occur in the data warehouse environment
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing: the initial loading of data and the access of data
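
As a rough illustration of the "load once, then only read" idea, the following Python sketch wraps loaded rows in read-only views. The `load_warehouse` helper and the sample rows are hypothetical; a real warehouse enforces this through its loading process and permissions rather than in application code.

```python
from types import MappingProxyType


def load_warehouse(rows):
    """Initial load: copy rows into a store that is read-only afterwards."""
    store = {row["id"]: MappingProxyType(dict(row)) for row in rows}
    return MappingProxyType(store)


warehouse = load_warehouse([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])

print(warehouse[1]["name"])          # access works

try:
    warehouse[1]["name"] = "Alicia"  # an operational-style update
except TypeError as exc:
    print("update rejected:", exc)
```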

Non Volatile
Data Warehouse—Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems
  - Operational database: current-value data
  - Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years)
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly
  - The key of operational data, by contrast, may or may not contain a time element
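
A small Python sketch may help show what "an element of time in the key" means in practice. The two dictionaries and the `record_balance` helper are hypothetical stand-ins for an operational table and a warehouse table.

```python
from datetime import date

# Operational store: one current value per customer (key has no time element).
operational = {}

# Warehouse store: the key includes a snapshot date, so history is retained.
warehouse = {}


def record_balance(customer_id: str, balance: float, as_of: date) -> None:
    operational[customer_id] = balance          # overwrites the previous value
    warehouse[(customer_id, as_of)] = balance   # adds a new dated row


record_balance("C001", 500.0, date(2011, 12, 31))
record_balance("C001", 650.0, date(2012, 6, 30))

print(operational)
print(sorted(warehouse.items()))
```

After two updates the operational store holds only the latest balance, while the warehouse store holds both dated values.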

Data Granularity
Approaches for Data Warehouse Design
- Top-down or bottom-up approach?
- Enterprise-wide or departmental?
- Which comes first: data warehouse or data mart?
- Build a pilot or go with a full-fledged implementation?
- Dependent or independent data marts?
Three Data Warehouse Models
- Enterprise warehouse
  - Collects all of the information about subjects spanning the entire organization
- Data mart
  - A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
  - Independent vs. dependent (sourced directly from the warehouse) data mart
- Virtual warehouse
  - A set of views over operational databases
  - Only some of the possible summary views may be materialized
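
The virtual warehouse idea can be sketched with Python's built-in `sqlite3` module. The `orders` table, its rows, and the `sales_by_customer` view are hypothetical; the point is only that the summary is defined as a view over operational data rather than stored separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('Acme', 120.0), ('Acme', 80.0), ('Globex', 50.0);

    -- The "virtual warehouse": a summary view computed on demand.
    CREATE VIEW sales_by_customer AS
        SELECT customer, SUM(amount) AS total_sales, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer;
""")

rows = conn.execute(
    "SELECT customer, total_sales, order_count FROM sales_by_customer ORDER BY customer"
).fetchall()
print(rows)
```

SQLite views are not materialized, so each query recomputes the summary from `orders`. Materializing a frequently used view would mean storing its result in a table, for example with `CREATE TABLE ... AS SELECT`.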

Data Warehouse vs Data Marts
Top Down Approach
- The advantages of this approach are:
  - A truly corporate effort, an enterprise view of data
  - Inherently architected, not a union of disparate data marts
  - Single, central storage of data about the content
  - Centralized rules and control
  - May see quick results if implemented with iterations
- The disadvantages are:
  - Takes longer to build even with an iterative method
  - High exposure to the risk of failure
  - Needs a high level of cross-functional skills
  - High outlay without proof of concept
Bottom Up Approach
- The advantages of this approach are:
  - Faster and easier implementation of manageable pieces
  - Favorable return on investment and proof of concept
  - Less risk of failure
  - Inherently incremental; can schedule important data marts first
  - Allows the project team to learn and grow
- The disadvantages are:
  - Each data mart has its own narrow view of data
  - Spreads redundant data across every data mart
  - Perpetuates inconsistent and irreconcilable data
  - Proliferates unmanageable interfaces
Practical Approach
- The steps in this practical approach are as follows:
  1. Plan and define requirements at the overall corporate level
  2. Create a surrounding architecture for a complete warehouse
  3. Conform and standardize the data content
  4. Implement the data warehouse as a series of supermarts, one at a time
Data Warehouse Development: A Recommended Approach

[Figure: diagram with the labels "Define a high-level corporate data model", "Model refinement" (twice), "Data Mart" (twice), "Enterprise Data Warehouse", "Distributed Data Marts", and "Multi-Tier Data Warehouse".]


Data Warehouse Building Blocks
Production Data
- This category of data comes from the various operational systems of the enterprise.
- Based on the information requirements in the data warehouse, you choose segments of data from the different operational systems.
- The data is in different formats and is taken from different hardware platforms.
- The significant and disturbing characteristic of production data is disparity.
- The challenge is to standardize and transform the disparate data from the various production systems, convert the data, and integrate the pieces into useful data for storage.
Internal Data
- In every organization, users keep their “private” spreadsheets, documents, customer profiles, and sometimes even departmental databases.
- This is the internal data, parts of which could be useful in a data warehouse.
- You cannot ignore the internal data held in private files in your organization.
- How much of the internal data should be included in the data warehouse is a collective judgment call.
- Internal data adds complexity to the process of transforming and integrating the data before it can be stored in the data warehouse.
Archived Data
- In every operational system, you periodically take the old data and store it in archived files.
- The circumstances in your organization dictate how often and which portions of the operational databases are archived for storage. Some data is archived after a year.
- Many different methods of archiving exist. There are staged archival methods.
  - At the first stage, recent data is archived that may still be online.
  - At the second stage, the older data is archived to flat files on disk storage.
  - At the next stage, the oldest data is archived to tape cartridges or microfilm and even kept off-site.
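
The staged archival idea can be sketched as a simple age-based rule in Python. The slides give no thresholds beyond "some data is archived after a year", so the two-year and five-year cutoffs below, along with the tier names, are assumptions made purely for illustration.

```python
from datetime import date


def archive_stage(record_date: date, today: date) -> str:
    """Pick an archive tier by record age. All thresholds are assumptions."""
    age_days = (today - record_date).days
    if age_days <= 365:
        return "operational"          # not yet archived
    if age_days <= 2 * 365:
        return "online archive"       # stage 1: recent, still online
    if age_days <= 5 * 365:
        return "flat files on disk"   # stage 2: older data
    return "off-site tape"            # stage 3: oldest data


today = date(2012, 9, 1)
for d in (date(2012, 3, 1), date(2011, 3, 1), date(2009, 3, 1), date(2004, 3, 1)):
    print(d, "->", archive_stage(d, today))
```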
External Data
- Most executives depend on data from external sources for a high percentage of the information they use.
- They use statistics relating to their industry produced by external agencies.
- They use market share data of competitors.
- They use standard values of financial indicators for performance.
- Usually, data from outside sources do not conform to your formats.
- You have to organize the data transmissions from the external sources.
Data Staging
- The extracted data coming from several disparate sources needs to be changed, converted, and made ready in a format that is suitable to be stored for querying and analysis.
- Three major functions need to be performed in this area:
  - Extraction
  - Transformation
  - Loading
- A separate staging area, therefore, is a necessity for preparing data for the data warehouse.
Data Extraction
- This function has to deal with numerous data sources. You have to employ the appropriate technique for each data source.
- Source data may come from different source machines in diverse data formats (relational databases, spreadsheets, flat files, etc.).
- After you extract the data, where do you keep the data for further preparation?
- Most frequently, data warehouse implementation teams extract the source data into a separate physical environment from which moving the data into the data warehouse is easy.
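
The following Python sketch illustrates using a different extraction technique per source and collecting the results in a separate staging structure. The flat-file content, the `customer` table, and the output field names are all hypothetical; the in-memory list merely stands in for a physical staging environment.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a flat-file export.
flat_file = io.StringIO("cust_id,name\n1,Alice\n2,Bob\n")

# Source 2 (hypothetical): a relational operational database.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE customer (id INTEGER, full_name TEXT)")
source_db.execute("INSERT INTO customer VALUES (3, 'Carol')")


def extract_flat_file(handle):
    # Technique for flat files: parse delimited text row by row.
    for row in csv.DictReader(handle):
        yield {"source": "flat_file", "id": int(row["cust_id"]), "name": row["name"]}


def extract_relational(conn):
    # Technique for relational sources: query the table directly.
    for cust_id, name in conn.execute("SELECT id, full_name FROM customer"):
        yield {"source": "relational", "id": cust_id, "name": name}


# The staging area: a separate place where extracted rows wait for transformation.
staging = list(extract_flat_file(flat_file)) + list(extract_relational(source_db))
for row in staging:
    print(row)
```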
Data Transformation
- A number of individual tasks need to be performed as part of data transformation.
- First, you clean the data extracted from each source. Cleaning may just be
  - correction of misspellings, or
  - resolution of conflicts between state codes and zip codes, or
  - providing default values for missing data elements, or
  - elimination of duplicates
- Standardization of data elements forms a large part of data transformation.
- Semantic standardization is another major task. You resolve synonyms and homonyms.
- Data transformation involves many forms of combining pieces of data from the different sources.
- Data transformation also involves purging source data that is not useful and separating out source records into new combinations.
- Sorting and merging of data takes place on a large scale in the data staging area.
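
A few of the cleaning tasks listed above (correcting misspellings, supplying defaults, eliminating duplicates, and sorting) can be sketched in Python as follows. The lookup table, the default country, and the record layout are hypothetical, and real cleaning rules are usually far more involved.

```python
# Hypothetical lookup tables; a real project would maintain these as reference data.
SPELLING_FIXES = {"Calfornia": "California", "Texsa": "Texas"}
DEFAULT_COUNTRY = "US"


def clean(record: dict) -> dict:
    state = record.get("state", "").strip()
    return {
        "name": record["name"].strip().title(),
        "state": SPELLING_FIXES.get(state, state),            # correct misspellings
        "country": record.get("country") or DEFAULT_COUNTRY,  # default for missing value
    }


def deduplicate(records: list) -> list:
    seen, unique = set(), []
    for rec in records:
        key = (rec["name"], rec["state"], rec["country"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


extracted = [
    {"name": "alice smith", "state": "Calfornia", "country": None},
    {"name": "Alice Smith ", "state": "California", "country": "US"},
    {"name": "bob jones", "state": "Texas"},
]
# Clean each record, drop duplicates, then sort by name.
cleaned = sorted(deduplicate([clean(r) for r in extracted]), key=lambda r: r["name"])
for rec in cleaned:
    print(rec)
```

The first two input records only become recognizable as duplicates after cleaning, which is why deduplication runs on the cleaned records rather than on the raw ones.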
Data Loading
Data Storage Component
- The data storage for the data warehouse is a separate repository.
- The data repository for a data warehouse keeps large volumes of historical data for analysis.
- It has to keep the data in structures suitable for analysis, and not for quick retrieval of individual pieces of information.
- Therefore, the data storage for the data warehouse is kept separate from the data storage for operational systems.
- Data extracted from the data warehouse storage is aggregated in many ways, and the summary data is kept in multidimensional databases (MDDBs).
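
To illustrate how detail data might be aggregated into summary cells, here is a minimal Python sketch. A dictionary keyed by dimension values stands in for a multidimensional database; the dimensions (product, region, year), the sample rows, and the `"ALL"` roll-up marker are assumptions, not a description of any particular MDDB product.

```python
from collections import defaultdict

# Hypothetical detail rows from warehouse storage: (product, region, year, sales).
detail = [
    ("Widget", "East", 2011, 100.0),
    ("Widget", "East", 2011, 50.0),
    ("Widget", "West", 2012, 70.0),
    ("Gadget", "East", 2012, 30.0),
]

# A tiny stand-in for a multidimensional store: cells keyed by dimension values.
cube = defaultdict(float)
for product, region, year, sales in detail:
    cube[(product, region, year)] += sales   # finest-grained summary cell
    cube[(product, "ALL", year)] += sales    # rolled up over region
    cube[(product, "ALL", "ALL")] += sales   # rolled up over region and year

print(cube[("Widget", "East", 2011)])
print(cube[("Widget", "ALL", "ALL")])
```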
Information Delivery Component
- Who are the users that need information from the data warehouse?
  - The novice user comes to the data warehouse with no training and therefore needs prefabricated reports and preset queries.
  - The casual user needs information once in a while, not regularly.
  - The business analyst looks for the ability to do complex analysis using the information.
  - The power user wants to be able to navigate throughout the data warehouse, pick up interesting data, format his or her own queries, drill through the data layers, and create custom reports and ad hoc queries.
Information Delivery Component Contd…
Metadata Component
- Metadata in a data warehouse is similar to the data dictionary or the data catalogue in a database management system.
- The metadata component is the data about the data in the data warehouse.
- Types of metadata:
  - Operational metadata
  - Extraction and transformation metadata
  - End-user metadata
Metadata Contd…
- Operational metadata
  - As you know, data for the data warehouse comes from several operational systems of the enterprise.
  - These source systems contain different data structures, field lengths and data types, different source files, and multiple coding schemes. Operational metadata contains all of this information about the operational data sources.
- Extraction and transformation metadata
  - Extraction and transformation metadata contains data about the extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and business rules for the data extraction.
  - This category of metadata also contains information about all the data transformations that take place in the data staging area.
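
One way to picture extraction and transformation metadata is as a record kept per source feed. The Python sketch below uses a hypothetical `ExtractionMetadata` dataclass whose fields mirror the items named above (frequency, method, business rules, transformations); the example values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionMetadata:
    """Hypothetical metadata record for one source-to-warehouse feed."""
    source_system: str
    extraction_frequency: str          # e.g. "daily", "weekly"
    extraction_method: str             # e.g. "full dump", "incremental"
    business_rules: list = field(default_factory=list)
    transformations: list = field(default_factory=list)


orders_feed = ExtractionMetadata(
    source_system="order_entry",
    extraction_frequency="daily",
    extraction_method="incremental",
    business_rules=["exclude cancelled orders"],
    transformations=["convert amounts to USD", "standardize state codes"],
)
print(orders_feed)
```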
Metadata Contd…
- End-user metadata
  - The end-user metadata is the navigational map of the data warehouse.
  - It enables the end-users to find information from the data warehouse.
  - The end-user metadata allows the end-users to use their own business terminology and look for information in those ways in which they normally think of the business.
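
The "navigational map" role of end-user metadata can be sketched as a glossary that translates business terms into warehouse locations. The table and column names and the `locate` helper below are hypothetical.

```python
# Hypothetical end-user metadata: business terms mapped to warehouse columns.
BUSINESS_GLOSSARY = {
    "revenue": ("sales_fact", "net_amount_usd"),
    "customer": ("customer_dim", "customer_name"),
    "region": ("geography_dim", "region_name"),
}


def locate(term: str) -> str:
    """Translate a business term into a table.column reference."""
    try:
        table, column = BUSINESS_GLOSSARY[term.strip().lower()]
    except KeyError:
        raise KeyError(f"no metadata entry for business term {term!r}") from None
    return f"{table}.{column}"


print(locate("Revenue"))
```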
Management And Control Component
- This component of the data warehouse architecture sits on top of all the other components.
- It coordinates the services and activities within the data warehouse.
- It controls the data transformation and the data transfer into the data warehouse storage.
- It works with the database management systems and enables data to be properly stored in the repositories.
- It monitors the movement of data into the staging area and from there into the data warehouse storage itself.
- It interacts with the metadata component to perform the management and control functions.
Thanks
