Unit 2

Dimensional modeling is a technique used to optimize data storage in a data warehouse. It involves storing data in fact and dimension tables. The key steps of dimensional modeling are to identify the business process, the grain, the dimensions and the facts, and then to build a star or snowflake schema. Dimensions provide context for facts and include attributes. Facts are measures/metrics. A star schema, the simplest type, has one fact table in the center connected to multiple dimension tables. It is optimized for querying large datasets.

Uploaded by Binay Yadav

Unit -2 Data Warehousing

Dimensional Modeling in Data Warehouse


Dimensional Modeling (DM) is a data structure technique optimized for data storage in a
data warehouse. The purpose of dimensional modeling is to optimize the database for
faster retrieval of data. The concept of Dimensional Modelling was developed by Ralph
Kimball and consists of “fact” and “dimension” tables.
A dimensional model in a data warehouse is designed to read, summarize, and analyze numeric
information such as values, balances, counts, and weights.
Dimensional and relational models each have their own way of storing data, with its own
specific advantages.
For instance, in the relational model, normalization and ER modeling reduce redundancy in
data. In contrast, a dimensional model arranges data so that it is easier to retrieve
information and generate reports.
Hence, dimensional models are used in data warehouse systems and are not a good fit for
relational (transactional) systems.
Elements of Dimensional Data Model
Fact
Facts are the measurements/metrics from your business process. For a Sales business
process, a measurement would be the quarterly sales number.
Dimension
A dimension provides the context surrounding a business process event. In simple terms,
dimensions give the who, what and where of a fact. In the Sales business process, for the
fact quarterly sales number, the dimensions would be:
• Who – Customer Names
• Where – Location
• What – Product Name
In other words, a dimension is a window through which to view the information in the facts.
Attributes
The attributes are the various characteristics of a dimension in dimensional data
modeling.
In the Location dimension, the attributes can be:
• State
• Country
• Zipcode, etc.
Attributes are used to search, filter, or classify facts. Dimension tables contain attributes.
Fact Table
A fact table is the primary table in dimensional modeling.
A fact table contains:
1. Measurements/facts
2. Foreign keys to dimension tables
Dimension Table
• A dimension table contains the dimensions of a fact.
• Dimension tables are joined to the fact table via a foreign key.
• Dimension tables are denormalized tables.
• The dimension attributes are the various columns in a dimension table.
• Dimensions offer descriptive characteristics of the facts through their attributes.
• There is no set limit on the number of dimensions.
• A dimension can also contain one or more hierarchical relationships.
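The split between the two kinds of table can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the table and column names (dim_location, dim_product, fact_sales, sales_amount) and the sample rows are hypothetical, chosen to mirror the Sales and Location examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Dimension tables (hypothetical): descriptive attributes used to filter and group facts.
conn.execute("""
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        state       TEXT,
        country     TEXT,
        zipcode     TEXT
    )""")
conn.execute("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    )""")

# Fact table (hypothetical): foreign keys to the dimensions plus a numeric measure.
conn.execute("""
    CREATE TABLE fact_sales (
        location_id  INTEGER REFERENCES dim_location(location_id),
        product_id   INTEGER REFERENCES dim_product(product_id),
        sales_amount REAL
    )""")

conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?, ?)",
                 [(1, "California", "USA", "94105"), (2, "Texas", "USA", "73301")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 1200.0), (1, 2, 800.0), (2, 1, 1100.0)])

# A dimension attribute (state) classifies the facts: total sales per state.
rows = conn.execute("""
    SELECT l.state, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_location AS l ON f.location_id = l.location_id
    GROUP BY l.state
    ORDER BY l.state""").fetchall()
print(rows)  # [('California', 2000.0), ('Texas', 1100.0)]
```

The fact table holds only keys and numbers; every descriptive word (state, product name) lives in a dimension table and is reached through a join.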
Steps of Dimensional Modeling
The accuracy of your dimensional model determines the success of your data warehouse
implementation. Here are the steps to create a dimensional model:
1. Identify Business Process
2. Identify Grain (level of detail)
3. Identify Dimensions

Er.Binay Yadav Page 1


4. Identify Facts
5. Build Schema (star or snowflake)
The model should describe the Why, How much, When/Where/Who and What of your
business process.

Step 1) Identify the Business Process


Identify the actual business process the data warehouse should cover. This could be Marketing,
Sales, HR, etc., depending on the data analysis needs of the organization. The selection of the
business process also depends on the quality of data available for that process. It is the most
important step of the data modeling process, and a mistake here causes cascading defects that
are very hard to repair.
Step 2) Identify the Grain
The grain describes the level of detail for the business problem/solution. It is the process of
identifying the lowest level of information for any table in your data warehouse. If a table contains
sales data for every day, then it has daily granularity. If a table contains total sales data for
each month, then it has monthly granularity.
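The relationship between daily and monthly grain can be shown in a few lines of Python. This is a sketch with made-up sales figures: rows stored at daily grain can always be rolled up to monthly grain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at DAILY grain: one row per day.
daily_sales = [
    (date(2024, 1, 30), 100.0),
    (date(2024, 1, 31), 150.0),
    (date(2024, 2, 1), 80.0),
]

def roll_up_to_month(rows):
    """Aggregate daily-grain rows into monthly-grain totals keyed by (year, month)."""
    monthly = defaultdict(float)
    for day, amount in rows:
        monthly[(day.year, day.month)] += amount
    return dict(monthly)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)  # {(2024, 1): 250.0, (2024, 2): 80.0}
```

The reverse is impossible: once only the monthly totals are stored, the individual days cannot be recovered. This is why the grain is chosen at the lowest level of detail the business needs.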
Step 3) Identify the Dimensions
Dimensions are nouns like date, store, inventory, etc. The dimension tables store the
descriptive data about these nouns. For example, the date dimension may contain attributes
such as year, month and weekday.
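A date dimension is usually generated rather than loaded from a source system. The sketch below builds one row per calendar day with the year, month and weekday attributes mentioned above; the YYYYMMDD integer key and the column names are assumptions for illustration, and weekday names come from a fixed English list so the result does not depend on the system locale.

```python
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def build_date_dimension(start, end):
    """Return one dimension row per calendar day from start to end (inclusive).

    Column names and the YYYYMMDD key format are illustrative assumptions.
    """
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_id": int(day.strftime("%Y%m%d")),  # e.g. 20240101
            "full_date": day.isoformat(),
            "year": day.year,
            "month": day.month,
            "weekday": WEEKDAYS[day.weekday()],      # weekday() returns 0 for Monday
        })
        day += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 1, 7))
print(len(dim_date))  # 7
```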
Step 4) Identify the Fact
This step closely involves the business users of the system, because the facts are what they
actually query and analyze in the data warehouse. Most fact-table columns hold numerical values
such as price or cost per unit.
Step 5) Build Schema
In this step, you implement the dimensional model. A schema is simply the database structure
(the arrangement of tables). There are two popular schemas:
1. Star Schema
The star schema architecture is easy to design. It is called a star schema because its diagram
resembles a star, with points radiating from a center. The center of the star is the fact table,
and the points of the star are the dimension tables.
In a star schema, the fact table is in third normal form, whereas the dimension tables are
denormalized.
2. Snowflake Schema
The snowflake schema is an extension of the star schema. In a snowflake schema, each dimension
is normalized and connected to further dimension tables.

Rules for Dimensional Modelling


Following are the rules and principles of Dimensional Modeling:
• Load atomic data into dimensional structures.
• Build dimensional models around business processes.
• Ensure that every fact table has an associated date dimension table.
• Ensure that all facts in a single fact table are at the same grain or level of detail.
• Store report labels and filter domain values in dimension tables.
• Ensure that dimension tables use a surrogate key.
• Continuously balance requirements and realities to deliver a business solution that supports
decision-making.
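One common reason for the surrogate-key rule is that a source system's natural key can repeat in the warehouse, for example when the warehouse keeps history for a changing attribute. The sqlite3 sketch below assumes a hypothetical dim_customer table in which customer_key is the surrogate key assigned by the warehouse and customer_code is the business key from the source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_key: surrogate key, a meaningless integer assigned by the warehouse.
# customer_code: natural (business) key from the source system (hypothetical).
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_code TEXT NOT NULL,
        city          TEXT
    )""")

# The same source customer is stored twice (before and after moving city);
# the surrogate key keeps the two versions distinct.
conn.execute("INSERT INTO dim_customer (customer_code, city) VALUES (?, ?)", ("C-001", "Kathmandu"))
conn.execute("INSERT INTO dim_customer (customer_code, city) VALUES (?, ?)", ("C-001", "Pokhara"))

rows = conn.execute(
    "SELECT customer_key, customer_code, city FROM dim_customer ORDER BY customer_key"
).fetchall()
print(rows)  # [(1, 'C-001', 'Kathmandu'), (2, 'C-001', 'Pokhara')]
```

If customer_code were the primary key, the second row would be rejected and the history would be lost.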
What is a Multidimensional Data Model in a Data Warehouse?
A multidimensional data model in a data warehouse represents data in the form of data cubes.
It allows data to be modeled and viewed in multiple dimensions, and it is defined by dimensions
and facts. A multidimensional data model is generally organized around a central theme, which
is represented by a fact table.
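A data cube can be pictured as a mapping from one coordinate per dimension to a measure. This sketch uses a plain Python dictionary with hypothetical (product, location, quarter) coordinates, and shows two standard OLAP views of it: a slice (fix one dimension) and a roll-up (aggregate dimensions away). Real OLAP engines store cubes very differently; this only illustrates the idea.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, location, quarter) -> sales measure.
cube = {
    ("Laptop", "East", "Q1"): 100,
    ("Laptop", "West", "Q1"): 60,
    ("Phone", "East", "Q1"): 40,
    ("Phone", "East", "Q2"): 70,
}

def slice_cube(cube, quarter):
    """Slice: fix the quarter dimension and keep the remaining two dimensions."""
    return {(p, l): v for (p, l, q), v in cube.items() if q == quarter}

def roll_up_by_product(cube):
    """Roll-up: aggregate away the location and quarter dimensions."""
    totals = defaultdict(int)
    for (product, _location, _quarter), value in cube.items():
        totals[product] += value
    return dict(totals)

print(slice_cube(cube, "Q1"))    # {('Laptop', 'East'): 100, ('Laptop', 'West'): 60, ('Phone', 'East'): 40}
print(roll_up_by_product(cube))  # {'Laptop': 160, 'Phone': 110}
```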

Multidimensional schemas are designed specifically to model data warehouse systems. They
address the unique needs of very large databases built for analytical processing (OLAP).
Types of Data Warehouse Schema:
The following are the three main types of multidimensional schema, each with its own advantages:
• Star Schema
• Snowflake Schema
• Galaxy Schema
What is a Star Schema?
A star schema is a data warehouse schema in which the center of the star holds one fact table
surrounded by a number of associated dimension tables. It is called a star schema because its
structure resembles a star. The star schema is the simplest type of data warehouse schema. It
is also known as the Star Join Schema and is optimized for querying large data sets.
Example of Star Schema Data Modelling
In the following star schema example, the fact table at the center contains keys to every
dimension table, such as Dealer_ID, Model_ID, Date_ID, Product_ID and Branch_ID, as well as
measures such as units sold and revenue.

Example of Star Schema Diagram
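The star schema described above can be sketched in SQLite. The key names follow the example (Dealer_ID, Model_ID, Date_ID, Product_ID, Branch_ID, units sold, revenue); everything else, including the table names, the few attributes per dimension and the sample rows, is a hypothetical illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical, only a few attributes each for brevity).
    CREATE TABLE dealer   (dealer_id  INTEGER PRIMARY KEY, dealer_name  TEXT);
    CREATE TABLE model    (model_id   INTEGER PRIMARY KEY, model_name   TEXT);
    CREATE TABLE date_dim (date_id    INTEGER PRIMARY KEY, year         INTEGER);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE branch   (branch_id  INTEGER PRIMARY KEY, branch_name  TEXT);

    -- Fact table at the center: one key per dimension plus the measures.
    CREATE TABLE fact_sales (
        dealer_id  INTEGER REFERENCES dealer(dealer_id),
        model_id   INTEGER REFERENCES model(model_id),
        date_id    INTEGER REFERENCES date_dim(date_id),
        product_id INTEGER REFERENCES product(product_id),
        branch_id  INTEGER REFERENCES branch(branch_id),
        units_sold INTEGER,
        revenue    REAL
    );

    INSERT INTO dealer   VALUES (1, 'Dealer A');
    INSERT INTO model    VALUES (1, 'Model X');
    INSERT INTO date_dim VALUES (20240101, 2024);
    INSERT INTO product  VALUES (1, 'Sedan'), (2, 'SUV');
    INSERT INTO branch   VALUES (1, 'North');
    INSERT INTO fact_sales VALUES (1, 1, 20240101, 1, 1, 3, 90000.0),
                                  (1, 1, 20240101, 2, 1, 2, 80000.0);
""")

# Star join: every dimension is reached by a single join from the fact table.
rows = conn.execute("""
    SELECT p.product_name, d.year, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN product  AS p ON f.product_id = p.product_id
    JOIN date_dim AS d ON f.date_id    = d.date_id
    GROUP BY p.product_name, d.year
    ORDER BY p.product_name""").fetchall()
print(rows)  # [('SUV', 2024, 2, 80000.0), ('Sedan', 2024, 3, 90000.0)]
```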


Fact Tables
A fact table in a star schema contains facts and is connected to dimensions. A fact table has two
types of columns:
• Columns that contain facts (measures)
• Foreign keys to dimension tables
Generally, the primary key of a fact table is a composite key made up of all of its foreign
keys.
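A composite primary key made of the foreign keys also enforces the grain: only one row may exist per combination of dimension values. The sqlite3 sketch below uses a hypothetical fact_sales table (the dimension tables are omitted for brevity) and shows a duplicate key combination being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: its primary key is the combination of its dimension keys.
conn.execute("""
    CREATE TABLE fact_sales (
        date_id    INTEGER,
        product_id INTEGER,
        store_id   INTEGER,
        units_sold INTEGER,
        PRIMARY KEY (date_id, product_id, store_id)
    )""")
conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 10, 5)")

# A second row with the same (date, product, store) combination violates the key.
try:
    conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 10, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```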
Fact tables can contain detail-level facts or aggregated facts. Fact tables that include aggregated
facts are often called summary tables. Fact tables usually contain facts that have been
aggregated to some level.
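A summary table can be derived from a detail-level fact table with a single aggregate query. In this sqlite3 sketch the table names (fact_sales_detail, fact_sales_by_product) and sample rows are hypothetical; the summary keeps one row per product instead of one row per sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Detail-level fact table (hypothetical): one row per individual sale.
conn.execute("CREATE TABLE fact_sales_detail (sale_date TEXT, product_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales_detail VALUES (?, ?, ?)",
    [("2024-01-05", 1, 10.0), ("2024-01-20", 2, 15.0), ("2024-02-03", 1, 7.5)],
)

# Summary table: the same facts pre-aggregated to product level.
conn.execute("""
    CREATE TABLE fact_sales_by_product AS
    SELECT product_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS sale_count
    FROM fact_sales_detail
    GROUP BY product_id""")

rows = conn.execute("SELECT * FROM fact_sales_by_product ORDER BY product_id").fetchall()
print(rows)  # [(1, 17.5, 2), (2, 15.0, 1)]
```

Queries against the summary table scan fewer rows, at the cost of losing the per-sale detail in that table.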
Dimension Tables
A dimension is a structure that categorizes data, usually into one or more hierarchies. A dimension
without hierarchies and levels is called a flat dimension or list. Each dimension table’s primary key
is part of the composite primary key of the fact table. A dimension attribute is a descriptive, textual
attribute that helps describe a dimensional value. Fact tables are usually larger than dimension
tables.
Characteristics of Star Schema
• Every dimension in a star schema is represented by only one dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The star schema is easy to understand and is optimized for query performance, at some cost in
disk space.
• The dimension tables are not normalized. For instance, in the above figure, Country_ID
does not have a Country lookup table as an OLTP design would have.
• The schema is widely supported by BI tools.
Advantages of Star Schema
• Star schemas have simpler join logic for fetching data than highly normalized transactional
schemas.
• Compared with highly normalized transactional schemas, the star schema simplifies common
business reporting logic, such as period-over-period reporting.
• Star schemas are widely used by OLAP systems to build cubes efficiently. In most major OLAP
systems, a star schema can also be used directly as a source without designing a cube structure.
• Query optimizers can recognize a star schema and apply specific optimizations to it, so the
query processor can produce better execution plans.
Disadvantages of Star Schema
• Because the schema is highly denormalized, data integrity is not enforced well.
• It is not flexible in terms of analytical needs.
• Star schemas do not naturally support many-to-many relationships between business entities.
What is a Snowflake Schema?
A snowflake schema in a data warehouse is a logical arrangement of tables in a multidimensional
database such that the ER diagram resembles a snowflake. It is an extension of a star schema in
which the dimension tables are normalized, splitting their data into additional tables.
Snowflake Schema Example
In the following Snowflake Schema example, Country is further normalized into an individual table.

Example of Snowflake Schema
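The effect of normalizing Country out of a dimension can be sketched in SQLite. The tables and rows below (country, branch, fact_sales) are hypothetical; the point is that a country-level report now needs two joins (fact to branch, branch to country) where a star schema would need only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked dimension: country attributes moved out of the branch table.
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE branch  (branch_id  INTEGER PRIMARY KEY, branch_name TEXT,
                          country_id INTEGER REFERENCES country(country_id));
    CREATE TABLE fact_sales (branch_id INTEGER REFERENCES branch(branch_id), revenue REAL);

    INSERT INTO country VALUES (1, 'Nepal'), (2, 'India');
    INSERT INTO branch  VALUES (1, 'North', 1), (2, 'South', 1), (3, 'Delhi', 2);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 70.0);
""")

# Two joins are needed to reach the country name: fact -> branch -> country.
rows = conn.execute("""
    SELECT c.country_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN branch  AS b ON f.branch_id  = b.branch_id
    JOIN country AS c ON b.country_id = c.country_id
    GROUP BY c.country_name
    ORDER BY c.country_name""").fetchall()
print(rows)  # [('India', 70.0), ('Nepal', 150.0)]
```

Each country name is stored once instead of once per branch, which is where the snowflake schema's disk savings come from.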


Characteristics of Snowflake Schema
• The main benefit of the snowflake schema is that it uses less disk space.
• It is easier to implement when a dimension is added to the schema.
• Because of the multiple tables, query performance is reduced.
• The primary challenge of the snowflake schema is the additional maintenance effort required
by its extra lookup tables.
Advantages of Snowflake Schema
• The snowflake schema’s primary advantage is its reduced disk storage requirement; joining
smaller lookup tables can also improve query performance.
• It provides greater scalability in the interrelationships between components and dimension
levels.
• Redundancy is minimal, so the data is easier to keep consistent.
Disadvantages of Snowflake Schema
• A significant disadvantage of the snowflake schema is the increased maintenance
required.
• Complex queries are challenging to understand.
• A larger number of tables means more joins, and therefore longer query execution times.
What is a Galaxy Schema?
A galaxy schema contains two or more fact tables that share dimension tables between them. It is
also called a fact constellation schema. The schema can be viewed as a collection of stars, hence
the name galaxy schema.

Example of Galaxy Schema


As you can see in the example above, there are two fact tables:
1. Revenue
2. Product
In a galaxy schema, the shared dimensions are called conformed dimensions.
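Conformed dimensions are what let two fact tables be reported side by side. In this sqlite3 sketch, two hypothetical fact tables (fact_revenue and fact_product, loosely following the Revenue and Product facts above) share one hypothetical dim_store dimension, so their totals line up on the same store rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One conformed dimension shared by both fact tables (hypothetical).
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT);

    -- Two fact tables; names and measures are illustrative assumptions.
    CREATE TABLE fact_revenue (store_id INTEGER REFERENCES dim_store(store_id), revenue REAL);
    CREATE TABLE fact_product (store_id INTEGER REFERENCES dim_store(store_id), units_in_stock INTEGER);

    INSERT INTO dim_store VALUES (1, 'Store A'), (2, 'Store B');
    INSERT INTO fact_revenue VALUES (1, 500.0), (2, 700.0), (2, 300.0);
    INSERT INTO fact_product VALUES (1, 40), (2, 90);
""")

# Each fact table is aggregated separately, then aligned through the shared dimension.
rows = conn.execute("""
    SELECT s.store_name,
           (SELECT SUM(r.revenue)        FROM fact_revenue AS r WHERE r.store_id = s.store_id),
           (SELECT SUM(p.units_in_stock) FROM fact_product AS p WHERE p.store_id = s.store_id)
    FROM dim_store AS s
    ORDER BY s.store_name""").fetchall()
print(rows)  # [('Store A', 500.0, 40), ('Store B', 1000.0, 90)]
```

Aggregating each fact table on its own before combining avoids the double counting that a direct join of the two fact tables would cause.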
Characteristics of Galaxy Schema
• The dimensions in this schema are split into separate dimensions according to the
various levels of a hierarchy.
• For example, if geography has four levels of hierarchy, such as region, country, state, and city,
then the galaxy schema should have four dimensions.
• Moreover, it is possible to build this type of schema by splitting one star schema into
several star schemas.
• The dimensions in this schema are large, because they must be built according to the levels of
the hierarchy.
• This schema is helpful for aggregating fact tables for better understanding.
Difference between Star Schema and Snowflake Schema
The following are the key differences between the star schema and the snowflake schema:

| Star Schema | Snowflake Schema |
|---|---|
| Hierarchies for the dimensions are stored in the dimension table. | Hierarchies are divided into separate tables. |
| It contains a fact table surrounded by dimension tables. | One fact table is surrounded by dimension tables, which are in turn surrounded by other dimension tables. |
| Only a single join creates the relationship between the fact table and any dimension table. | Many joins are required to fetch the data. |
| Simple database design. | Very complex database design. |
| Denormalized data structure; queries also run faster. | Normalized data structure. |
| High level of data redundancy. | Very low level of data redundancy. |
| A single dimension table contains aggregated data. | Data is split into different dimension tables. |
| Cube processing is faster. | Cube processing might be slow because of the complex joins. |
| Offers higher-performing queries using star join query optimization. Tables may be connected with multiple dimensions. | Represented by a centralized fact table, which is unlikely to be connected with multiple dimensions. |

What is a Data Mart in a Data Warehouse?


A data mart is focused on a single functional area of an organization and contains a subset of the
data stored in a data warehouse. A data mart is a condensed version of a data warehouse and is
designed for use by a specific department, unit or set of users in an organization, e.g., marketing,
sales, HR or finance. It is often controlled by a single department in an organization.
A data mart usually draws data from fewer sources than a data warehouse. Data marts are small
in size and more flexible than a data warehouse.
Types of Data Mart
There are three main types of data mart:
1. Dependent: Dependent data marts are created by drawing data from a central data
warehouse.
2. Independent: An independent data mart is created without the use of a central data
warehouse, drawing data directly from operational sources, external sources, or both.
3. Hybrid: A hybrid data mart can take data from both data warehouses and operational
systems.

Dependent Data Mart


A dependent data mart sources an organization’s data from a single data warehouse. This type
of data mart offers the benefit of centralization. If you need to develop one or more physical data
marts, then you need to configure them as dependent data marts.
A dependent data mart can be built in two different ways: either a user can access both the data
mart and the data warehouse, depending on need, or access is limited to the data mart only. The
second approach is not optimal, as it produces what is sometimes referred to as a data junkyard.
In a data junkyard, all data begins with a common source, but it is scrapped and mostly junked.
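A dependent data mart can be sketched as a separate store that is populated only from the warehouse. In this sqlite3 illustration, the main database plays the central data warehouse and a second, attached in-memory database named mart plays the data mart; the table names, the department filter and the sample rows are all hypothetical.

```python
import sqlite3

# "main" plays the central data warehouse; "mart" is a separate, attached database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS mart")

conn.execute("CREATE TABLE main.fact_activity (department TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO main.fact_activity VALUES (?, ?, ?)",
    [("Sales", "East", 100.0), ("Sales", "West", 40.0), ("HR", "East", 999.0)],
)

# Dependent data mart: filled only from the warehouse, restricted to one department.
conn.execute("""
    CREATE TABLE mart.sales_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM main.fact_activity
    WHERE department = 'Sales'
    GROUP BY region""")

rows = conn.execute("SELECT * FROM mart.sales_mart ORDER BY region").fetchall()
print(rows)  # [('East', 100.0), ('West', 40.0)]
```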

Dependent Data Mart

Independent Data Mart


An independent data mart is created without the use of a central data warehouse. This kind of data
mart is an ideal option for smaller groups within an organization.
An independent data mart has no relationship with the enterprise data warehouse or with any
other data mart. In an independent data mart, the data is input separately, and its analyses are
also performed autonomously.
Implementing independent data marts runs counter to the motivation for building a data
warehouse, which is to have a consistent, centralized store of enterprise data that can be
analyzed by multiple users with different interests who want widely varying information.

Independent Data Mart


Hybrid Data Mart:
A hybrid data mart combines input from the data warehouse with input from other sources. This
can be helpful when you want ad-hoc integration, for example after a new group or product is
added to the organization. It is well suited to multiple-database environments and to
organizations that need a fast implementation turnaround. It also requires the least data-cleansing
effort. A hybrid data mart also supports large storage structures, and it is well suited to flexible,
smaller data-centric applications.

Hybrid Data Mart

What is a virtual warehouse?


A virtual warehouse is a process for collecting and managing data from different sources. It is
typically used to connect to and analyze data from heterogeneous sources. A virtual warehouse is
required to run queries and DML operations, including loading data into tables. A virtual
warehouse can be defined by its size and other properties, which help control and automate
warehouse activity.

Benefits of Virtual Warehousing


Three of the most notable benefits of virtual warehousing are:
• Faster fulfillment of customer orders
• More cost-efficient reservation and storage of inventory
• Real-time sharing of inventory data with other users
