Unit 2-DATA WAREHOUSE

A Data Warehouse (DW) is a specialized relational database designed for query and analysis, integrating historical data from multiple sources to support decision-making. It offers advantages such as centralized data access and trend identification, but also has disadvantages including high implementation costs and complexity in data integration. Various schemas like Star, Snowflake, and Galaxy are used to organize data within a DW, each with unique characteristics and trade-offs.

Uploaded by

soumyachandu

DATA WAREHOUSE

UNIT 2
Data Warehouse
• A Data Warehouse (DW) is a relational database that is designed for query
and analysis rather than transaction processing.
• It includes historical data derived from transaction data from single and
multiple sources.
• A Data Warehouse provides integrated, enterprise-wide, historical data and
focuses on providing support for decision-makers for data modeling and
analysis.
• A Data Warehouse can be viewed as a data system with the following
attributes:
• It is a database designed for investigative tasks, using data from various applications.
• It supports a relatively small number of clients with relatively long interactions.
• It includes current and historical data to provide a historical perspective of information.
• Its usage is read-intensive.
• It contains a few large tables.
• Data warehousing is a technology that enables businesses to store,
manage, and analyze large volumes of data from various sources in a
centralized repository. The primary goal of data warehousing is to
provide a comprehensive and integrated view of an organization's
data to support informed decision-making.
Advantages
• It provides a central repository for critical data, making it easy for business
users to access information from various sources.
• By providing a consolidated view of data from different sources, data
warehouses enable organizations to make informed decisions based on
accurate and consistent data.
• It integrates multiple data sources, which reduces the load on production
systems and shortens the total turnaround time for analysis and reporting.
• Data warehouses provide historical data that can be used to identify trends and
patterns over time, leading to better decision-making and planning.
• Restructures and integrates data to make it easier for users to use for reporting
and analysis.
• Saves time by allowing users to access critical data from multiple sources in a
single place.
• Data warehouses can easily scale to meet the needs of growing organizations,
allowing them to store and analyze large volumes of data.
Disadvantages
• Implementing and maintaining a data warehouse can be expensive,
including hardware, software, and personnel costs.
• Not suitable for unstructured data.
• Not suitable for real-time or near-real-time data processing.
• Integrating data from multiple sources into a single data warehouse
can be complex and time-consuming.
• Data in the warehouse may become outdated quickly.
• Changes in data types and ranges, data source schema, indexes, and
queries can be challenging to implement.
Difference between Database & Data Warehouse
Aspect | Database | Data Warehouse
Purpose | Primarily for transactional operations | Primarily for analytical operations
Data Type | Handles structured data | Handles mainly structured data integrated from many sources
Schema | Generally follows a normalized schema | Often follows a denormalized schema
Data Volume | Usually handles smaller data volumes | Handles large volumes of historical data
Performance | Optimized for read and write operations | Optimized for complex queries and reporting
Data Freshness | Emphasizes real-time data | Emphasizes historical and periodic data
Query Complexity | Supports simpler, real-time queries | Supports complex, analytical queries
Data Transformations | Limited emphasis on data transformations | Involves significant data transformations
Usage | Used for day-to-day operations | Used for decision-making and analysis
Processing Model | OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing)
Multidimensional Data Model
• The multidimensional data model is a type of data model
used primarily in data warehousing that organizes data into
multiple dimensions, each representing a specific attribute
of the data.
• It typically uses a cube structure to organize this data and
supports high-performance querying for analytical reports.
• This structure helps simplify the analysis of large and
complex data sets.
• In a data warehouse, the multidimensional data model represents data
in the form of data cubes.
• It allows data to be modeled and viewed in multiple dimensions, and it
is defined by dimensions and facts.
• A multidimensional data model is generally organized around a
central theme (for example, sales), which is represented by a fact table.
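To make the cube idea concrete, here is a minimal sketch in plain Python. The records, the dimension names (product, region, quarter), and the `roll_up` helper are all hypothetical and chosen only for illustration. It shows how one measure (units sold) can be viewed along different combinations of dimensions.

```python
from collections import defaultdict

# Hypothetical fact records: (product, region, quarter, units_sold).
facts = [
    ("Car", "North", "Q1", 10),
    ("Car", "South", "Q1", 7),
    ("Car", "North", "Q2", 12),
    ("Truck", "North", "Q1", 4),
    ("Truck", "South", "Q2", 6),
]

# Names of the dimensions, in the same order as the record fields.
DIMENSIONS = ("product", "region", "quarter")


def roll_up(records, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    cube = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in positions)
        cube[key] += record[-1]  # the last field is the measure
    return dict(cube)


by_product = roll_up(facts, ["product"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
grand_total = roll_up(facts, [])

print(by_product)
print(by_product_quarter)
print(grand_total)
```

Keeping fewer dimensions corresponds to a roll-up (a coarser view of the cube); keeping more corresponds to a drill-down.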
Multidimensional Schema
• Multidimensional Schema is especially designed to model
data warehouse systems.
• The schemas are designed to address the unique needs of very
large databases designed for the analytical purpose (OLAP).
• Types of Data Warehouse Schema:
• The following are the three main types of multidimensional schema,
each with its own advantages:
1. Star Schema
2. Snowflake Schema
3. Galaxy Schema
Star Schema
• A Star Schema in a data warehouse is a schema in which one fact table
sits at the center of the star, surrounded by a number of
associated dimension tables.
• It is known as a star schema because its structure resembles a star.
• The Star Schema data model is the simplest type of Data
Warehouse schema. It is also known as the Star Join Schema
and is optimized for querying large data sets.
Example

In the above Star Schema example, the fact table at the center contains
keys to every dimension table, such as Dealer_ID, Model ID, Date_ID,
Product_ID, and Branch_ID, along with other attributes such as units sold
and revenue.
Fact Tables
• A Fact table in a star schema contains facts and is connected to
dimensions.
• A fact table has two types of columns:
• Columns that contain facts (measures)
• Foreign keys to the dimension tables
• Generally, the primary key of a fact table is a composite key made up
of all of its foreign keys.
• Fact tables can contain detail-level facts or aggregated facts. Fact
tables that include aggregated facts are often called summary
tables. Fact tables usually contain facts that have been aggregated
to some level.
Dimension Tables
• A dimension is a structure that categorizes data, often in a hierarchy.
• A dimension without hierarchies and levels is called a flat dimension
or list.
• Each dimension table’s primary key is part of the composite primary
key of the fact table.
• A dimension attribute is a descriptive, textual attribute that helps
describe a dimensional value.
• Fact tables are usually larger than dimension tables.
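The relationship between fact and dimension tables can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch only: the table names (`fact_sales`, `dim_dealer`, `dim_date`), the descriptive columns, and the sample rows are assumptions, loosely modeled on the Dealer_ID, Date_ID, units sold, and revenue columns mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one table per dimension, holding descriptive attributes.
CREATE TABLE dim_dealer (dealer_id INTEGER PRIMARY KEY, dealer_name TEXT, country TEXT);
CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: foreign keys to the dimensions plus numeric measures.
-- Its primary key is the composite of the foreign keys.
CREATE TABLE fact_sales (
    dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (dealer_id, date_id)
);

INSERT INTO dim_dealer VALUES (1, 'Alpha Motors', 'India'), (2, 'Beta Cars', 'Japan');
INSERT INTO dim_date   VALUES (10, 2024, 'Q1'), (11, 2024, 'Q2');
INSERT INTO fact_sales VALUES (1, 10, 5, 50000.0), (1, 11, 3, 30000.0), (2, 10, 4, 44000.0);
""")

# A typical star join: each dimension joins directly to the fact table.
rows = conn.execute("""
    SELECT d.country, t.quarter, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_dealer AS d ON d.dealer_id = f.dealer_id
    JOIN dim_date   AS t ON t.date_id   = f.date_id
    GROUP BY d.country, t.quarter
    ORDER BY d.country, t.quarter
""").fetchall()

for row in rows:
    print(row)
```

Note that `country` is stored directly in `dim_dealer` rather than in a separate lookup table, which is what makes the dimension denormalized.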
Characteristics of Star Schema
• Every dimension in a star schema is represented by exactly one
dimension table.
• Each dimension table contains the set of attributes that describe it.
• Each dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The Star schema is easy to understand. Because its dimension tables are
denormalized, it typically uses more disk space than a snowflake schema.
• The dimension tables are not normalized. For instance, in the above figure,
Country_ID does not have a Country lookup table, as an OLTP design would
have.
• The schema is widely supported by BI tools.
Advantages of Star Schema
• Star schema join logic is more straightforward than the join logic
needed to fetch data from a highly normalized transactional schema.
• Compared with highly normalized transactional schemas, the star schema
simplifies common business reporting logic, such as period-over-period
reporting.
• Star schemas are widely used by OLAP systems to design cubes
efficiently. A star schema can be used as a source without designing a
cube structure in most major OLAP systems.
• Query processors can apply performance optimizations that are specific
to star schema queries, which can lead to better execution plans.
Disadvantage of Star Schema
• Since the schema is highly denormalized, data integrity is not
enforced well.
• It is not flexible in terms of analytical needs.
• Star schemas do not easily support many-to-many relationships between
business entities.
Snowflake Schema
• A Snowflake Schema in a data warehouse is a logical arrangement of
tables in a multidimensional database such that the ER
diagram resembles a snowflake shape.
• A Snowflake Schema is an extension of a Star Schema that adds
additional dimension tables.
• The dimension tables are normalized, which splits their data into
additional tables.
Example
In the following Snowflake Schema example, Country is further normalized into an
individual table.
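A minimal sketch of this normalization, using Python's built-in `sqlite3` module, is shown below. The table and column names (`dim_country`, `dim_dealer`, `fact_sales`) and the sample rows are hypothetical. The point is only that the country attribute lives in its own lookup table, so a query by country needs one more join than it would if the country name were stored in the dealer dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Country is normalized out of the dealer dimension into a lookup table.
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE dim_dealer (
    dealer_id   INTEGER PRIMARY KEY,
    dealer_name TEXT,
    country_id  INTEGER REFERENCES dim_country(country_id)
);
CREATE TABLE fact_sales (
    dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
    units_sold INTEGER
);

INSERT INTO dim_country VALUES (1, 'India'), (2, 'Japan');
INSERT INTO dim_dealer  VALUES (1, 'Alpha Motors', 1), (2, 'Beta Cars', 2), (3, 'Gamma Autos', 1);
INSERT INTO fact_sales  VALUES (1, 5), (2, 4), (3, 6);
""")

# Fact -> dealer dimension -> country lookup: two joins to reach the country name.
units_by_country = conn.execute("""
    SELECT c.country_name, SUM(f.units_sold)
    FROM fact_sales  AS f
    JOIN dim_dealer  AS d ON d.dealer_id  = f.dealer_id
    JOIN dim_country AS c ON c.country_id = d.country_id
    GROUP BY c.country_name
    ORDER BY c.country_name
""").fetchall()

print(units_by_country)
```

Each country name is stored once, which saves space and avoids inconsistent spellings, at the cost of the extra join.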
Characteristics of Snowflake Schema
• The main benefit of the snowflake schema is that it uses less disk
space.
• It is easier to implement when a dimension is added to the schema.
• Because queries must join multiple tables, query performance is reduced.
• The primary challenge of using the snowflake schema is the extra
maintenance effort required by the larger number of lookup tables.
Advantage of Snowflake Schema
• The snowflake schema's primary advantage is that it reduces disk
storage requirements. Joins against its smaller lookup tables can also
speed up some queries.
• It provides greater scalability in the relationships between
components and dimension levels.
• Normalization reduces redundancy, so dimension data is easier to keep
consistent.
Disadvantage of Snowflake Schema
• A significant disadvantage of the snowflake schema is the increased
maintenance required.
• Complex queries are challenging to understand.
• A larger number of tables means more joins, so a longer query
execution time.
Galaxy Schema
• A Galaxy Schema contains two or more fact tables that share dimension
tables between them.
• It is also called Fact Constellation Schema.
• The schema is viewed as a collection of stars hence the name Galaxy
Schema.
Example

In the above example, there are two fact tables:

1. Revenue
2. Product

In a Galaxy schema, the shared dimensions are called conformed dimensions.
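The idea of two fact tables sharing a dimension can be sketched with Python's built-in `sqlite3` module. The fact table names follow the Revenue and Product example; everything else (the `dim_date` dimension, the column names, and the sample rows) is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared by both fact tables.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, quarter TEXT);

-- Two fact tables that reference the same dimension.
CREATE TABLE fact_revenue (date_id INTEGER REFERENCES dim_date(date_id), revenue REAL);
CREATE TABLE fact_product (date_id INTEGER REFERENCES dim_date(date_id), units_produced INTEGER);

INSERT INTO dim_date     VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO fact_revenue VALUES (1, 100.0), (1, 50.0), (2, 80.0);
INSERT INTO fact_product VALUES (1, 30), (2, 20), (2, 5);
""")

# Aggregate each fact table separately, then combine the results through
# the shared dimension. Joining the raw fact tables to each other would
# multiply rows and inflate the sums.
per_quarter = conn.execute("""
    SELECT t.quarter, r.total_revenue, p.total_units
    FROM dim_date AS t
    JOIN (SELECT date_id, SUM(revenue) AS total_revenue
          FROM fact_revenue GROUP BY date_id) AS r ON r.date_id = t.date_id
    JOIN (SELECT date_id, SUM(units_produced) AS total_units
          FROM fact_product GROUP BY date_id) AS p ON p.date_id = t.date_id
    ORDER BY t.quarter
""").fetchall()

print(per_quarter)
```

Because both fact tables use the same date keys, their measures can be compared quarter by quarter.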
Characteristics of Galaxy Schema
• The dimensions in this schema are split into separate dimensions
based on the various levels of hierarchy.
• For example, if geography has four levels of hierarchy (region,
country, state, and city), then the Galaxy schema should have four
dimensions.
• Moreover, this type of schema can be built by splitting one star
schema into several star schemas.
• The dimensions in this schema are large, because they must be built
according to the levels of hierarchy.
• This schema is helpful for aggregating fact tables for better
understanding.
Data Cleaning
• Data cleaning is the process of fixing or removing incorrect,
corrupted, incorrectly formatted, duplicate, or incomplete data within
a dataset.
• When combining multiple data sources, there are many opportunities
for data to be duplicated or mislabeled.
• There is no one absolute way to prescribe the exact steps in the data
cleaning process because the processes will vary from dataset to
dataset.
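As one illustration, a cleaning pass over a small set of records might look like the sketch below. The field names (`customer_id`, `country`), the sample rows, and the specific rules (trim whitespace, unify capitalization, drop incomplete rows, drop duplicates) are assumptions; a real dataset would need its own rules.

```python
def clean_records(records, required=("customer_id", "country")):
    """Return records with formatting fixed, incomplete rows and duplicates removed."""
    cleaned = []
    seen = set()
    for record in records:
        # Fix formatting: strip stray whitespace from every string field.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Unify capitalization so " india " and "India" compare equal.
        if isinstance(row.get("country"), str):
            row["country"] = row["country"].title()
        # Remove incomplete data: skip rows that lack a required field.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Remove duplicates: keep only the first row for each key.
        key = tuple(row[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"customer_id": 1, "country": " india "},
    {"customer_id": 1, "country": "India"},      # duplicate after normalization
    {"customer_id": 2, "country": ""},           # incomplete
    {"customer_id": 3, "country": "JAPAN"},
    {"customer_id": None, "country": "Japan"},   # incomplete
]

cleaned = clean_records(raw)
print(cleaned)
```

Normalizing the formatting before checking for duplicates matters here: the first two rows only become recognizable as duplicates once whitespace and capitalization are unified.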
Data Integration and Transformation
• Data integration is the process of combining data
from multiple sources, such as databases, files, web services, or APIs,
into a single data set or data warehouse.
• This involves resolving issues such as data quality, consistency,
compatibility, and security.
• Data transformation is the process of modifying the data to fit a
specific purpose, such as analysis, reporting, or visualization.
• This involves applying operations such as filtering, sorting,
aggregating, joining, or splitting the data.
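The two steps can be sketched together in plain Python. The two sources (`crm_orders`, `web_orders`), their field names, and their units are hypothetical. The sketch first integrates the sources into one consistent record layout, then transforms the result by filtering, aggregating, and sorting.

```python
from collections import defaultdict

# Two hypothetical sources that describe the same kind of data differently.
crm_orders = [{"cust": "C1", "amount_usd": 120.0}, {"cust": "C2", "amount_usd": 35.5}]
web_orders = [{"customer_id": "C1", "amount_cents": 2500}, {"customer_id": "C3", "amount_cents": 990}]


def integrate(crm, web):
    """Combine both sources into one list with a common layout and unit."""
    unified = [
        {"customer_id": o["cust"], "amount": o["amount_usd"], "source": "crm"}
        for o in crm
    ]
    unified += [
        {"customer_id": o["customer_id"], "amount": o["amount_cents"] / 100, "source": "web"}
        for o in web
    ]
    return unified


def total_per_customer(orders, min_amount=0.0):
    """Filter out small orders, aggregate by customer, and sort by total."""
    totals = defaultdict(float)
    for order in orders:
        if order["amount"] >= min_amount:
            totals[order["customer_id"]] += order["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


orders = integrate(crm_orders, web_orders)
report = total_per_customer(orders, min_amount=10.0)
print(report)
```

Resolving the differing key names and units in `integrate` is the compatibility and consistency work; `total_per_customer` applies the filtering, aggregating, and sorting operations.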
Data Reduction
• Data reduction is a method of reducing the size of original data so
that it may be represented in a much smaller space.
• While reducing data, data reduction techniques preserve data
integrity.
• The time spent on data reduction should not outweigh the time saved
by mining the smaller data set.
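One simple form of data reduction is aggregation, sketched below in plain Python. The daily sales figures and the `reduce_to_monthly` helper are hypothetical. Daily rows are collapsed into monthly totals, so the data set shrinks while the overall total stays the same.

```python
from collections import defaultdict

# Hypothetical daily sales: (ISO date, amount).
daily_sales = [
    ("2024-01-03", 100),
    ("2024-01-17", 150),
    ("2024-02-02", 80),
    ("2024-02-20", 120),
    ("2024-02-28", 50),
]


def reduce_to_monthly(rows):
    """Collapse daily rows into one total per month (the 'YYYY-MM' prefix)."""
    monthly = defaultdict(int)
    for day, amount in rows:
        monthly[day[:7]] += amount
    return dict(monthly)


monthly_sales = reduce_to_monthly(daily_sales)
print(len(daily_sales), "rows reduced to", len(monthly_sales))
print(monthly_sales)
```

The reduced data can still answer monthly questions exactly, but day-level detail is no longer available, so the level of aggregation should match the analysis that is planned.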
Discretization
• In a data warehouse, "discretization" refers to the process of
transforming continuous data (such as numerical values) into a set of
discrete categories or intervals, in other words "binning" the data.
• Binning simplifies analysis and makes the data compatible with data
mining algorithms that require discrete data.
• It is a common data preprocessing step used to reduce complexity and
improve the usability of large datasets within a data warehouse.
• Purpose: to make continuous data easier to interpret and analyze by
dividing it into manageable, discrete ranges.
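As a minimal sketch, equal-width binning (one of several possible discretization methods) can be written in a few lines of Python. The ages, the number of bins, and the labels are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 58, 60]
labels = ["young", "middle", "senior"]

bin_ids = equal_width_bins(ages, n_bins=3)
age_groups = [labels[i] for i in bin_ids]

print(bin_ids)
print(age_groups)
```

With a range of 18 to 60 and three bins, each interval is 14 years wide. Equal-width bins are simple to compute but can be unevenly populated when the data is skewed, as the four values in the first bin show.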
