Uploaded by

nafyjabesa1

Chapter - 1

Introduction to Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in
support of management's decision-making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
"sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and
source B may have different ways of identifying a product, but in a data warehouse, there will be only
a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts with a
transaction system, where often only the most recent data is kept. For example, a transaction system
may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses
associated with a customer.

Non-Volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
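
The contrast between a transaction system and a time-variant, non-volatile warehouse can be sketched with two small tables. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and addresses are assumptions made up for the example, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Transaction system (hypothetical): one row per customer, overwritten in place.
conn.execute("CREATE TABLE oltp_customer (customer_id INTEGER PRIMARY KEY, address TEXT)")
# Warehouse (hypothetical): one row per address version, appended and never updated.
conn.execute(
    "CREATE TABLE dw_customer_address (customer_id INTEGER, address TEXT, valid_from TEXT)"
)

def record_address(customer_id, address, valid_from):
    """Apply an address change to both systems."""
    # The transaction system keeps only the latest value.
    conn.execute("INSERT OR REPLACE INTO oltp_customer VALUES (?, ?)", (customer_id, address))
    # The warehouse only appends, so earlier addresses are preserved.
    conn.execute(
        "INSERT INTO dw_customer_address VALUES (?, ?, ?)", (customer_id, address, valid_from)
    )

record_address(1, "12 Oak St", "2023-01-10")
record_address(1, "98 Pine Ave", "2024-06-01")

current = conn.execute(
    "SELECT address FROM oltp_customer WHERE customer_id = 1"
).fetchall()
history = conn.execute(
    "SELECT address FROM dw_customer_address WHERE customer_id = 1 ORDER BY valid_from"
).fetchall()
```

After the two changes, `current` holds only the latest address, while `history` holds both addresses in chronological order.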

Data Warehouse Design Process:

A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of
both.
• The top-down approach starts with the overall design and planning. It is useful in cases
where the technology is mature and well known, and where the business problems that must be
solved are clear and well understood.
• The bottom-up approach starts with experiments and prototypes. This is useful in the early
stage of business modeling and technology development. It allows an organization to move
forward at considerably less expense and to evaluate the benefits of the technology before
making significant commitments.
• In the combined approach, an organization can exploit the planned and strategic nature of the
top-down approach while retaining the rapid implementation and opportunistic application of
the bottom-up approach.

The warehouse design process consists of the following steps:
• Choose a business process to model, for example, orders, invoices, shipments, inventory,
account administration, sales, or the general ledger. If the business process is organizational
and involves multiple complex object collections, a data warehouse model should be followed.
However, if the process is departmental and focuses on the analysis of one kind of business
process, a data mart model should be chosen.
• Choose the grain of the business process. The grain is the fundamental, atomic level of data to
be represented in the fact table for this process, for example, individual transactions, individual
daily snapshots, and so on.
• Choose the dimensions that will apply to each fact table record. Typical dimensions are time,
item, customer, supplier, warehouse, transaction type, and status.
• Choose the measures that will populate each fact table record. Typical measures are numeric
additive quantities like dollars sold and units sold.
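
The outcome of these four steps can be sketched as a record type. The example below assumes a sales process at the grain of one individual transaction, with three dimension keys and two additive measures; the class name, field names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    """One fact table record; the assumed grain is a single sales transaction."""
    # Dimension keys (assumed names)
    time_key: int
    item_key: int
    customer_key: int
    # Additive numeric measures
    dollars_sold: float
    units_sold: int

facts = [
    SalesFact(20240101, 1, 10, 20.0, 2),
    SalesFact(20240101, 2, 10, 5.0, 1),
    SalesFact(20240102, 1, 11, 10.0, 1),
]

def total_by(records, dimension):
    """Sum both measures over all records that share a value of one dimension key."""
    totals = {}
    for record in records:
        key = getattr(record, dimension)
        dollars, units = totals.get(key, (0.0, 0))
        totals[key] = (dollars + record.dollars_sold, units + record.units_sold)
    return totals

by_item = total_by(facts, "item_key")
by_day = total_by(facts, "time_key")
```

Because the measures are additive, the same records can be summed along any dimension: `by_item` groups the totals per item, and `by_day` groups them per day.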
A Three Tier Data Warehouse Architecture:

Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database system.
Back-end tools and utilities are used to feed data into the bottom tier from operational databases or
other external sources (such as customer profile information provided by external consultants). These
tools and utilities perform data extraction, cleaning, and transformation (e.g., to merge similar data
from different sources into a unified format), as well as load and refresh functions to update the data
warehouse. The data are extracted using application program interfaces known as gateways. A gateway
is supported by the underlying DBMS and allows client programs to generate SQL code to be executed
at a server. Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object
Linking and Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity). This tier
also contains a metadata repository, which stores information about the data warehouse and its
contents.
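
The extraction, cleaning, transformation, and load functions described for the bottom tier can be sketched in a few lines. This is a simplified illustration, not a real ETL tool: the two sources, their field names, and the rule for unifying product identifiers are all assumptions invented for the example.

```python
# Hypothetical raw extracts from two operational sources with different conventions.
source_a = [{"sku": "ab-100", "name": " Laptop ", "price_usd": "999.50"}]
source_b = [{"product_code": "AB100", "title": "laptop", "price_cents": 99950}]

def unified_key(raw_id):
    """Cleaning rule (assumed): uppercase the identifier and drop punctuation."""
    return "".join(ch for ch in raw_id.upper() if ch.isalnum())

def transform_a(row):
    """Map a source A row to the unified warehouse format."""
    return {
        "product_key": unified_key(row["sku"]),
        "name": row["name"].strip().title(),
        "price_cents": round(float(row["price_usd"]) * 100),
    }

def transform_b(row):
    """Map a source B row to the unified warehouse format."""
    return {
        "product_key": unified_key(row["product_code"]),
        "name": row["title"].strip().title(),
        "price_cents": row["price_cents"],
    }

def load(warehouse, rows):
    """Load or refresh unified rows, keyed by the single warehouse identifier."""
    for row in rows:
        warehouse[row["product_key"]] = row

warehouse = {}
load(warehouse, [transform_a(r) for r in source_a])
load(warehouse, [transform_b(r) for r in source_b])
```

Both sources describe the same product in different ways, yet the warehouse ends up with a single entry under the key `AB100`.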
Tier-2:
The middle tier is an OLAP server that is typically implemented using either a relational OLAP
(ROLAP) model or a multidimensional OLAP (MOLAP) model.
• A relational OLAP (ROLAP) model is an extended relational DBMS that maps operations on
multidimensional data to standard relational operations.
• A multidimensional OLAP (MOLAP) model is a special-purpose server that directly
implements multidimensional data and operations.

Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or
data mining tools (e.g., trend analysis, prediction, and so on).

Data Warehouse Models:

There are three data warehouse models.
1. Enterprise warehouse:
• An enterprise warehouse collects all of the information about subjects spanning the entire
organization.
• It provides corporate-wide data integration, usually from one or more operational systems or
external information providers, and is cross-functional in scope.
• It typically contains detailed data as well as summarized data, and can range in size from a few
gigabytes to hundreds of gigabytes, terabytes, or beyond.
• An enterprise data warehouse may be implemented on traditional mainframes, computer
superservers, or parallel architecture platforms. It requires extensive business modeling and may
take years to design and build.
2. Data mart:

• A data mart contains a subset of corporate-wide data that is of value to a specific group of
users. The scope is confined to specific selected subjects. For example, a marketing data mart
may confine its subjects to customer, item, and sales. The data contained in data marts tend to
be summarized.
• Data marts are usually implemented on low-cost departmental servers that are UNIX/Linux-based
or Windows-based. The implementation cycle of a data mart is more likely to be measured in
weeks rather than months or years. However, it may involve complex integration in the long
run if its design and planning were not enterprise-wide.
• Depending on the source of data, data marts can be categorized as independent or
dependent. Independent data marts are sourced from data captured from one or more
operational systems or external information providers, or from data generated locally within a
particular department or geographic area. Dependent data marts are sourced directly from
enterprise data warehouses.

3. Virtual warehouse:

• A virtual warehouse is a set of views over operational databases. For efficient query
processing, only some of the possible summary views may be materialized.
• A virtual warehouse is easy to build but requires excess capacity on operational database
servers.
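
The difference between a plain view and a materialized summary view can be sketched with sqlite3. SQLite has no materialized views, so the example emulates one with a snapshot table; that substitution, along with the table names and data, is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'East', 100.0), (2, 'East', 50.0), (3, 'West', 70.0);

-- Virtual summary: recomputed from the operational table on every query.
CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM orders GROUP BY region;

-- Emulated materialized summary: a stored snapshot of the view's result.
CREATE TABLE m_sales_by_region AS SELECT * FROM v_sales_by_region;
""")

# A new operational row arrives after the snapshot was taken.
conn.execute("INSERT INTO orders VALUES (4, 'West', 30.0)")

live = dict(conn.execute("SELECT region, total FROM v_sales_by_region"))
snapshot = dict(conn.execute("SELECT region, total FROM m_sales_by_region"))
```

The view reflects the new order immediately, at the cost of extra work on the operational server for each query. The snapshot answers without touching `orders` but stays stale until it is refreshed.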

Meta Data Repository:

Metadata are data about data. When used in a data warehouse, metadata are the data that define
warehouse objects. Metadata are created for the data names and definitions of the given warehouse.
Additional metadata are created and captured for time stamping any extracted data, the source of the
extracted data, and missing fields that have been added by data cleaning or integration processes.
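
As a rough sketch of the kind of metadata described above, the function below records the source, an extraction timestamp, and the fields that cleaning had to fill in. The function name, the metadata keys, and the default values are hypothetical; real metadata repositories store considerably more.

```python
from datetime import datetime, timezone

def clean_with_metadata(row, source, defaults):
    """Fill missing fields from defaults and describe what was done.

    Returns the cleaned row together with a metadata record holding the
    source, the extraction time stamp, and the fields added by cleaning.
    """
    filled = [field for field in defaults if row.get(field) is None]
    cleaned = {**row, **{field: defaults[field] for field in filled}}
    metadata = {
        "source": source,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "filled_fields": filled,
    }
    return cleaned, metadata

cleaned, metadata = clean_with_metadata(
    {"customer_id": 1, "country": None}, source="crm", defaults={"country": "unknown"}
)
```

Here `metadata["filled_fields"]` lists `country`, so a later analyst can tell that the value was supplied by cleaning rather than by the source system.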

Schema Design:

Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases

The entity-relationship data model is commonly used in the design of relational databases, where a
database schema consists of a set of entities and the relationships between them. Such a data model is
appropriate for online transaction processing. A data warehouse, however, requires a concise,
subject-oriented schema that facilitates online data analysis. The most popular data model for a data
warehouse is a multidimensional model. Such a model can exist in the form of a star schema, a
snowflake schema, or a fact constellation schema. Let's look at each of these schema types.

Star schema:

The most common modeling paradigm is the star schema, in which the data warehouse contains (1) a
large central table (fact table) containing the bulk of the data, with no redundancy, and (2) a set of
smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a
starburst, with the dimension tables displayed in a radial pattern around the central fact table.

A star schema for All Electronics sales is shown in Figure. Sales are considered along four
dimensions, namely, time, item, branch, and location. The schema contains a central fact table for sales
that contains keys to each of the four dimensions, along with two measures: dollars sold and units sold.
To minimize the size of the fact table, dimension identifiers (such as time key and item key) are
system-generated identifiers. Notice that in the star schema, each dimension is represented by only one
table, and each table contains a set of attributes. For example, the location dimension table contains the
attribute set {location key, street, city, province or state, country}. This constraint may introduce some
redundancy.

For example, “Vancouver” and “Victoria” are both cities in the Canadian province of British
Columbia. Entries for such cities in the location dimension table will create redundancy among the
attributes province or state and country, that is, (..., Vancouver, British Columbia, Canada) and (...,
Victoria, British Columbia, Canada). Moreover, the attributes within a dimension table may form
either a hierarchy (total order) or a lattice (partial order).
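
A star schema along these lines can be sketched in sqlite3. The fact table, its four dimension keys, the two measures, and the location attributes follow the description above; the columns of the time, item, and branch tables and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one table per dimension (non-location columns are assumed).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_state TEXT, country TEXT);

-- Central fact table: keys to the four dimensions plus two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO time_dim   VALUES (1, 2024, 'Q1');
INSERT INTO item_dim   VALUES (1, 'TV');
INSERT INTO branch_dim VALUES (1, 'B1');
-- Province and country repeat for each city: the redundancy noted in the text.
INSERT INTO location_dim VALUES
    (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, '2 Bay St',  'Victoria',  'British Columbia', 'Canada');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 100.0, 2), (1, 1, 1, 2, 50.0, 1);
""")

# One join from the fact table reaches every location attribute.
by_city = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city ORDER BY l.city
""").fetchall()
by_province = conn.execute("""
    SELECT l.province_or_state, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.province_or_state
""").fetchall()
```

Grouping by `city` and then by `province_or_state` uses the same single join, which is why the star layout is simple to query despite the repeated province and country values.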

Snowflake Schema:

A snowflake schema for All Electronics sales is given in Figure. Here, the sales fact table is identical to
that of the star schema in Figure. The main difference between the two schemas is in the definition of
dimension tables.
The single dimension table for item in the star schema is normalized in the snowflake schema,
resulting in new item and supplier tables. For example, the item dimension table now contains the
attributes item key, item name, brand, type, and supplier key, where supplier key is linked to the
supplier dimension table, containing supplier key and supplier type information. Similarly, the single
dimension table for location in the star schema can be normalized into two new tables: location and
city. The city key in the new location table links to the city dimension. Notice that further normalization
can be performed on province or state and country in the snowflake schema.
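
The normalization of the location dimension can be sketched as follows. Only the location and city tables are shown, and the fact table is cut down to one key and the two measures; the streets, amounts, and exact column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- City attributes are stored once per city instead of once per location.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim(city_key));
-- Simplified fact table (other dimension keys omitted for brevity).
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location_dim(location_key),
                         dollars_sold REAL, units_sold INTEGER);

INSERT INTO city_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO location_dim VALUES (1, '1 Main St', 1), (2, '9 Oak St', 1);
INSERT INTO sales_fact VALUES (1, 100.0, 2), (2, 40.0, 1);
""")

# Reaching the city name now takes two joins instead of one.
by_city = conn.execute("""
    SELECT c.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN city_dim     AS c ON c.city_key = l.city_key
    GROUP BY c.city
""").fetchall()
city_rows = conn.execute("SELECT COUNT(*) FROM city_dim").fetchone()[0]
```

Two locations share one `city_dim` row, so the province and country are no longer repeated, but every query on city attributes pays for an extra join.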

Fact Constellation:

A fact constellation schema is shown in Figure. This schema specifies two fact tables, sales and
shipping. The sales table definition is identical to that of the star schema. The shipping table has five
dimensions, or keys: item key, time key, shipper key, from location, and to location, and two
measures: dollars cost and units shipped.
A fact constellation schema allows dimension tables to be shared between fact tables. For example, the
dimension tables for time, item, and location are shared between the sales and shipping fact
tables.
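
Sharing a dimension between two fact tables can be sketched as below. To keep the example short, only the shared item dimension and one measure per fact table are modeled; the item names and quantities are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two simplified fact tables (other keys and measures omitted).
CREATE TABLE sales_fact    (item_key INTEGER REFERENCES item_dim(item_key),
                            units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key),
                            units_shipped INTEGER);

INSERT INTO item_dim VALUES (1, 'TV'), (2, 'Radio');
INSERT INTO sales_fact VALUES (1, 3), (1, 2), (2, 1);
INSERT INTO shipping_fact VALUES (1, 4), (2, 1);
""")

# Each fact table is aggregated separately, then lined up on the shared dimension.
# Joining the two fact tables directly would multiply rows and inflate the sums.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold)    FROM sales_fact    AS s WHERE s.item_key = i.item_key),
           (SELECT SUM(h.units_shipped) FROM shipping_fact AS h WHERE h.item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_key
""").fetchall()
```

Because both fact tables use the same `item_key`, units sold and units shipped can be compared item by item without any mapping between the two subjects.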

In data warehousing, there is a distinction between a data warehouse and a data mart.

A data warehouse collects information about subjects that span the entire organization, such as
customers, items, sales, assets, and personnel, and thus its scope is enterprise-wide. For data
warehouses, the fact constellation schema is commonly used, since it can model multiple, interrelated
subjects. A data mart, on the other hand, is a departmental subset of the data warehouse that focuses on
selected subjects, and thus its scope is department-wide. For data marts, the star or snowflake schema
is commonly used, since both are geared toward modeling single subjects, although the star schema
is more popular and efficient.

OLAP (Online Analytical Processing):

• OLAP is an approach to answering multidimensional analytical (MDA) queries swiftly.
• OLAP is part of the broader category of business intelligence, which also encompasses
relational databases, report writing, and data mining.
• OLAP tools enable users to analyze multidimensional data interactively from multiple
perspectives.

OLAP consists of three basic analytical operations:

• Consolidation (Roll-Up)
• Drill-Down
• Slicing and Dicing

• Consolidation involves the aggregation of data that can be accumulated and computed in one
or more dimensions. For example, all sales offices are rolled up to the sales
department or sales division to anticipate sales trends.
• Drill-down is a technique that allows users to navigate through the details. For
instance, users can view the sales by individual products that make up a region's sales.
• Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the
OLAP cube and view (dicing) the slices from different viewpoints.
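
The three operations can be sketched over a small in-memory data set. The records, dimension names, and helper functions below are hypothetical; a real OLAP server performs the same kind of grouping and filtering against a cube or a relational store.

```python
from collections import defaultdict

# Hypothetical sales records with four dimensions and one measure.
sales = [
    {"division": "North", "office": "Oslo",   "product": "TV",    "quarter": "Q1", "amount": 100},
    {"division": "North", "office": "Oslo",   "product": "Radio", "quarter": "Q1", "amount": 40},
    {"division": "North", "office": "Bergen", "product": "TV",    "quarter": "Q2", "amount": 60},
    {"division": "South", "office": "Rome",   "product": "TV",    "quarter": "Q1", "amount": 80},
]

def aggregate(rows, dims):
    """Sum the measure for every combination of values of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

def slice_rows(rows, dim, value):
    """Take out the part of the data where one dimension has a fixed value."""
    return [row for row in rows if row[dim] == value]

# Consolidation (roll-up): offices are aggregated up to their division.
rollup = aggregate(sales, ["division"])
# Drill-down: the same totals broken out by office within each division.
drilldown = aggregate(sales, ["division", "office"])
# Slicing: keep only Q1, then dicing: view that slice from two viewpoints.
q1 = slice_rows(sales, "quarter", "Q1")
q1_by_product = aggregate(q1, ["product"])
q1_by_division = aggregate(q1, ["division"])
```

Roll-up and drill-down differ only in how many dimensions are kept in the grouping, which is why a single `aggregate` helper covers both.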
Types of OLAP:

Relational OLAP (ROLAP):

ROLAP works directly with relational databases. The base data and the dimension tables are stored as
relational tables and new tables are created to hold the aggregated information. It depends on a
specialized schema design.

Multidimensional OLAP (MOLAP):

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP.

MOLAP stores data in optimized multidimensional array storage, rather than in a relational
database. Therefore, it requires the pre-computation and storage of information in the cube, an
operation known as processing.
MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube
contains all the possible answers to a given range of questions.
MOLAP tools have a very fast response time and the ability to quickly write back data into the data
set.
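
The idea of pre-computing a data cube can be sketched as follows: every combination of dimensions is aggregated once, so that later queries become lookups. This is a toy illustration with a dictionary standing in for array storage; the `"*"` marker for "all values", the function name, and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # assumed marker meaning "aggregated over every value of this dimension"

def build_cube(rows, dims, measure):
    """Pre-compute the sum of the measure for every subset of the dimensions."""
    cube = defaultdict(float)
    for row in rows:
        for size in range(len(dims) + 1):
            for kept in combinations(dims, size):
                # Dimensions not kept are collapsed to ALL in the cell address.
                cell = tuple(row[d] if d in kept else ALL for d in dims)
                cube[cell] += row[measure]
    return dict(cube)

rows = [
    {"item": "TV",    "city": "Vancouver", "dollars": 100.0},
    {"item": "TV",    "city": "Victoria",  "dollars": 50.0},
    {"item": "Radio", "city": "Vancouver", "dollars": 20.0},
]
cube = build_cube(rows, ["item", "city"], "dollars")

# Queries are now lookups rather than scans of the detail rows.
tv_total = cube[("TV", ALL)]
vancouver_total = cube[(ALL, "Vancouver")]
grand_total = cube[(ALL, ALL)]
```

The number of cells grows quickly with the number of dimensions and distinct values, which is the storage cost paid for fast response times.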

Hybrid OLAP (HOLAP):

There is no clear agreement across the industry as to what constitutes Hybrid OLAP, except that a
database will divide data between relational and specialized storage.
For example, for some vendors, a HOLAP database will use relational tables to hold the larger
quantities of detailed data, and use specialized storage for at least some aspects of the smaller
quantities of more-aggregate or less-detailed data.
HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities of both
approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.
