This document provides an overview of data warehouse and OLAP technology. It discusses key concepts such as the components of a data warehouse including operational data sources, operational data stores, load managers, warehouse managers, query managers, and end user access tools. It also covers ETL processes, data warehouse architectures including star schemas and snowflake schemas, differences between OLTP and OLAP systems, and considerations for data warehouse implementation and conceptual modeling.
Unit 2
Data Warehouse and OLAP Technology
• A data warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.
• A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision-making process.

Data warehouse - subject oriented
• Oriented to the major subject areas of the corporation that have been defined in the data model.
• For example, for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.

Data warehouse - integrated
• Data sources are heterogeneous, with no consistency in encoding or naming conventions among them.
• When data is moved into the warehouse, it is converted to a consistent representation.

Data warehouse - nonvolatile
• Operational data is regularly accessed and manipulated a record at a time, and updates are made to data in the operational environment; warehouse data, by contrast, is loaded and then read, not updated in place.

Data warehouse - time variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
• Operational database: current-value data.

Building blocks or components
• Metadata - good metadata is essential to the effective operation of a data warehouse; it is used in data collection, data transformation and data access.
• Metadata maps the translation of information from the operational system to the analytical system.

Data marts
• Data marts are smaller than data warehouses and generally contain information from a single department of a business or organisation. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for specific kinds of queries and reports.

Security
• As with any information system, the security of data is determined by the hardware, software and the procedures that created them.
• The reliability and authenticity of the data and information extracted from the warehouse will be a function of the reliability and authenticity of the warehouse and the various source systems.

Construction
• The steps in planning a data warehouse are identical to the steps for any other type of computer application. Users must be involved to determine the scope of the warehouse and what business requirements need to be met.

Why a warehouse?
• Two approaches:
• 1. Query-driven (lazy, on demand) - the traditional research approach.
• 2. Warehousing (eager).

Disadvantages of the query-driven approach
• Delay in query processing.
• Slow or unavailable information sources.
• Complex filtering and integration.
• Inefficient and potentially expensive for frequent queries.
• Competes with local processing at sources.
• Has not caught on in industry.

The warehousing approach
• Information is integrated in advance and stored in the warehouse for direct querying and analysis.

Advantages of the warehousing approach
• High query performance, but not necessarily the most current information.
• Does not interfere with local processing at sources.
• Complex queries can be run at the warehouse.

Data warehouse architectures
• 1. Single layer - every data element is stored once only; a virtual warehouse.
• 2. Two layer - real-time + derived data; the most commonly used approach in industry today.
• 3. Three-layer architecture - transformation of real-time data to derived data really requires two steps: a view level for 'particular informational needs', and the physical implementation of the data warehouse.

Data warehouse components
• 1. Operational data sources - data for the warehouse is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in file systems, private data held on workstations and private servers, and external systems such as the internet, commercially available databases, or databases associated with an organisation's suppliers or customers.
• 2. Operational data store (ODS) - a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
• 3. Load manager - also called the front-end component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
• 4. Warehouse manager - performs all the operations associated with the management of the data in the warehouse. These include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, and generation of denormalisations and aggregations.
• 5. Query manager - also called the back-end component, it performs all the operations associated with the management of user queries. These include directing queries to the appropriate tables and scheduling the execution of queries.
• 6. End-user access tools - can be categorised into five main groups: data reporting and query tools, application development tools, executive information system tools, online analytical processing tools, and data mining tools.
• Diagram in data warehouse slide.

Data warehouse implementation
• Includes loading data, implementing transformation programs, designing the user interface, developing standard queries and reports, and training warehouse users.

ETL in the data warehouse
• The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for:
• Extraction - to retrieve all the required data from the source system with as few resources as possible.
• Transformation - applies a set of rules to transform the data from the source to the target.
• This includes converting any measured data to the same dimension using the same units so that they can later be joined.
• It may also require joining data from several sources, generating aggregates, sorting, and deriving new calculated values.
• Loading - to ensure that the load is performed correctly and with as few resources as possible. The target of the load process is often a database. Referential integrity needs to be maintained by the ETL tool to ensure consistency.

Advantages of data warehouse implementation
• 1. Better data management and delivery - one of the most important advantages of using a data warehousing system in an organisation is efficient data management and delivery. It enables the storage of all types of data from different sources in a single base that can be used for analysis purposes.
• 2. Better decision making - with effective insights from business intelligence, the management of the organisation can take effective decisions based on solid data analysis.
• 3. Cost reduction - it helps avoid duplication of work, which ultimately reduces cost and increases the efficiency of the organisation.
• 4. Competitive advantage - as the organisation is able to make effective decisions, it can outperform its competitors, fully utilise its resources, and focus on activities in a better way.

Data processing models
• There are two basic data processing models:
• 1. OLTP - the main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency.
• 2. OLAP - the main aim of OLAP is efficient multidimensional processing of large data volumes.
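The extract-transform-load flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the source rows, units, and the target table `product_weights` are all hypothetical.

```python
import sqlite3

# Hypothetical rows extracted from two source systems; note the
# inconsistent units (kg vs g), as is typical before transformation.
source_a = [("P1", "12.5 kg"), ("P2", "3.0 kg")]
source_b = [("P3", "4000 g")]

def transform(rows):
    """Apply a simple rule set: convert every weight to kilograms."""
    out = []
    for product_id, weight in rows:
        value, unit = weight.split()
        kg = float(value) / 1000 if unit == "g" else float(value)
        out.append((product_id, kg))
    return out

# Load step: insert the cleaned rows into the (hypothetical) warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_weights (product_id TEXT PRIMARY KEY, weight_kg REAL)")
rows = transform(source_a) + transform(source_b)
conn.executemany("INSERT INTO product_weights VALUES (?, ?)", rows)
conn.commit()
```

Because both sources are now expressed in the same unit, the loaded rows can later be joined and aggregated without further conversion.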
Traditional OLTP
• Traditionally, DBMSs have been used for online transaction processing (OLTP).
• Order entry: pull up an order and update the status field.
• Banking: transfer one thousand rupees from account X to account Y.
• Critical data processing tasks; detailed, up-to-date data; structured, repetitive tasks.
• Short transactions are the unit of work: read and update a few records.
• Isolation, recovery and integrity are critical.

OLTP vs OLAP
• OLTP (online transaction processing) describes processing at operational sites.
• OLAP (online analytical processing) describes processing at the warehouse.

Comparison of OLTP systems and data warehousing systems

Conceptual modelling of the data warehouse
• Three basic conceptual schemas:
• Star schema
• Snowflake schema
• Fact constellation

Star schema
• A single fact table in the middle connected to a number of dimension tables.
• Basic notion: a measure (e.g. sales quantity).
• Given a collection of numeric measures, each measure depends on a set of dimensions (e.g. sales volume as a function of product, time and location).
• The relation which relates the dimensions to the measure of interest is called the fact table (e.g. sale).
• Information about the dimensions can be represented as a collection of relations called the dimension tables (Product, Customer, Store).
• Each dimension can have a set of associated attributes.
• Diagram in data warehouse slide.

Snowflake schema
• A refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalising the dimension tables.
• Diagram in data warehouse slide.

Fact constellation
• Multiple fact tables share dimension tables.

Database design methodology for the data warehouse
• 1. Choosing the process
• 2. Choosing the grain
• 3. Identifying and conforming the dimensions
• 4. Choosing the facts
• 5. Storing pre-calculations in the fact table
• 6. Rounding out the dimension tables
• 7. Choosing the duration of the database
• 8. Tracking slowly changing dimensions
• 9. Deciding the query priorities and the query modes

Choosing the process
• The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
• The best choice for the first data mart tends to be the one that is related to sales.

Choosing the grain
• Choosing the grain means deciding exactly what a fact table record represents.
• Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.
• The grain decision for the fact table also determines the grain of each of the dimension tables.

Identifying and conforming the dimensions
• Dimensions set the context for formulating queries about the facts in the fact table.
• We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain.

Choosing the facts
• The grain of the fact table determines which facts can be used in the data mart - all facts must be expressed at the level implied by the grain.

Storing pre-calculations in the fact table
• Once the facts have been selected, they should be re-examined to determine whether there are opportunities to use pre-calculations, e.g. a profit or loss statement.

Rounding out the dimension tables
• In this step we return to the dimension tables and add as many text descriptions to the dimensions as possible.
• The text descriptions should be as understandable to the users as possible.

Choosing the duration of the data warehouse
• The duration measures how far back in time the fact table goes.
• For some companies (e.g. insurance companies) there may be a legal requirement to retain data extending back five or more years.
Tracking slowly changing dimensions
• The slowly changing dimension problem means that the proper description of the old client and the old branch must be used with the old data warehouse history.

Deciding the query priorities and the query modes
• In this step we consider physical design issues:
• the presence of pre-stored summaries and aggregates;
• security issues;
• backup issues, etc.
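To make the star schema concrete, here is a minimal sketch in SQL (run through Python's sqlite3), using the sale fact table and the Product and Store dimension tables mentioned above. All column names and data values are illustrative, and the foreign-key clauses are shown only to indicate the fact-to-dimension links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes for each dimension.
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
            "name TEXT, category TEXT)")
cur.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT)")

# Fact table: one numeric measure (sales_qty) keyed by its dimensions.
cur.execute("""CREATE TABLE sale (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    sale_date  TEXT,
    sales_qty  INTEGER)""")

cur.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery")])
cur.executemany("INSERT INTO store VALUES (?, ?)",
                [(1, "Pune"), (2, "Mumbai")])
cur.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                [(1, 1, "2024-01-01", 10), (2, 1, "2024-01-01", 5),
                 (1, 2, "2024-01-02", 7)])

# A typical OLAP-style query: total quantity sold per city,
# joining the fact table to one dimension table.
rows = cur.execute("""
    SELECT s.city, SUM(f.sales_qty)
    FROM sale f JOIN store s ON f.store_id = s.store_id
    GROUP BY s.city ORDER BY s.city""").fetchall()
```

A snowflake schema would differ only in normalising the dimensions further, e.g. moving `category` out of `product` into its own table referenced by a key.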
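One common way to handle step 8 above (tracking slowly changing dimensions) is the so-called "Type 2" approach: instead of overwriting a dimension row when, say, a client moves branch, a new version of the row is inserted and validity dates record which version applies to old facts. The sketch below is illustrative only; the `ClientVersion` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ClientVersion:
    client_id: int            # business key, shared across versions
    branch: str               # the slowly changing attribute
    valid_from: date
    valid_to: Optional[date]  # None means "current version"

history = [ClientVersion(42, "Old Branch", date(2020, 1, 1), None)]

def change_branch(history, client_id, new_branch, when):
    """Close the current version and append a new one (Type 2 change)."""
    for v in history:
        if v.client_id == client_id and v.valid_to is None:
            v.valid_to = when
    history.append(ClientVersion(client_id, new_branch, when, None))

def branch_as_of(history, client_id, when):
    """Return the branch description that was valid on a given date."""
    for v in history:
        if (v.client_id == client_id and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.branch
    return None

change_branch(history, 42, "New Branch", date(2023, 6, 1))
```

Old facts dated before the change still resolve to "Old Branch", which is exactly the requirement stated above: the old description is used with the old history.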