Chapter Nine

A star schema organizes data into a central fact table linked to multiple dimension tables. A snowflake schema expands on this by normalizing dimension tables into multiple tables linked through foreign keys. This reduces data redundancy but increases the number of tables and joins needed for queries. Both schemas improve query performance over traditional OLTP schemas by separating facts and dimensions for faster retrieval and updates.

Brian Kathabasya BIT/pgdIT

Chapter Nine
What is Star Schema?

A star schema is the elementary form of a dimensional model, in which data
are organized into facts and dimensions. A fact is an event that is counted
or measured, such as a sale or a login. A dimension includes reference data
about the fact, such as date, item, or customer.

A star schema is a relational schema whose design represents a
multidimensional data model. It is the simplest data warehouse schema.
It is called a star schema because its entity-relationship diagram
resembles a star, with points radiating from a central table. The center
of the schema is a large fact table, and the points of the star are the
dimension tables.
Fact Tables

A fact table is the table in a star schema that contains the facts and is connected to the
dimensions. It has two types of columns: those that hold facts and those that are foreign keys
to the dimension tables. The primary key of a fact table is generally a composite key made up
of all of its foreign keys.

A fact table may hold either detail-level facts or facts that have been aggregated (fact tables
that hold aggregated facts are often called summary tables instead). A fact table generally
contains facts at the same level of aggregation.
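
The composite key described above can be sketched with Python's built-in sqlite3 module. This is
only an illustration: the table name sales_fact, its columns, and the sample values are
assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        time_key    INTEGER NOT NULL,   -- foreign key to a time dimension
        item_key    INTEGER NOT NULL,   -- foreign key to an item dimension
        units_sold  INTEGER NOT NULL,   -- fact (measure)
        amount_sold REAL    NOT NULL,   -- fact (measure)
        PRIMARY KEY (time_key, item_key) -- composite key built from the foreign keys
    )
""")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 3, 29.97)")
conn.execute("INSERT INTO sales_fact VALUES (1, 11, 1, 4.50)")  # same day, other item: allowed

# A second row with the same (time_key, item_key) pair violates the composite key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, 2, 19.98)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Each combination of dimension keys identifies at most one fact row, which is what keeps every
row at the same level of aggregation.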

Dimension Tables

A dimension is a structure, usually composed of one or more hierarchies, that categorizes
data. A dimension without hierarchies and levels is called a flat dimension or list.
The primary key of each dimension table is part of the composite primary key of the fact
table. Dimensional attributes help to describe the dimensional value. They are generally
descriptive, textual values. Dimension tables are usually smaller than fact tables.

For example, a fact table might store data about sales, while the dimension tables store
data about geographic regions (markets, cities), clients, products, times, and channels.

Characteristics of Star Schema

The star schema is well suited to data warehouse database design because of
the following features:

o It creates a denormalized database that can quickly provide query responses.
o It provides a flexible design that can be changed easily or added to throughout the
development cycle, and as the database grows.
o It parallels in design how end-users typically think of and use the data.
o It reduces the complexity of metadata for both developers and end-users.
Advantages of Star Schema

Star schemas are easy for end-users and applications to understand and navigate. With a
well-designed schema, users can quickly analyze large, multidimensional data sets.

The main advantages of star schemas in a decision-support environment are:

Query Performance

Because a star schema database has a limited number of tables and clear join paths, queries
run faster than they do against OLTP systems. Small single-table queries, frequently of a
dimension table, are almost instantaneous. Large join queries that involve multiple tables
take only seconds or minutes to run.

In a star schema database design, the dimensions are connected only through the central fact
table. When two dimension tables are used in a query, only one join path, intersecting the
fact table, exists between those two tables. This design feature helps ensure accurate and
consistent query results.
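
As a rough illustration of the single join path, the sketch below relates two dimensions
through the fact table only. The table names, column names, and sample rows are assumptions
made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE sales_fact   (
        location_key INTEGER REFERENCES location_dim(location_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        units_sold   INTEGER
    );
    INSERT INTO location_dim VALUES (1, 'Kampala'), (2, 'Gulu');
    INSERT INTO item_dim     VALUES (10, 'Acme'), (11, 'Zenith');
    INSERT INTO sales_fact   VALUES (1, 10, 5), (1, 11, 2), (2, 10, 4), (1, 10, 1);
""")

# The two dimensions never join to each other directly; the only path between
# them runs through sales_fact.
rows = conn.execute("""
    SELECT l.city, i.brand, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN item_dim     AS i ON i.item_key     = f.item_key
    GROUP BY l.city, i.brand
    ORDER BY l.city, i.brand
""").fetchall()
```

Because there is only one way to relate a city to a brand, two analysts writing this query
independently should arrive at the same totals.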

Load performance and administration

Structural simplicity also decreases the time required to load large batches of records into a
star schema database. By defining facts and dimensions and separating them into different
tables, the impact of a load operation is reduced. Dimension tables can be populated once and
occasionally refreshed. New facts can be added regularly and selectively by appending records
to a fact table.
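
A minimal sketch of this load pattern, assuming a hypothetical item_dim dimension and
sales_fact fact table (the helper append_facts and the sample data are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT)")
conn.execute("CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, units_sold INTEGER)")

# Dimension rows are loaded once and refreshed only occasionally.
conn.executemany("INSERT INTO item_dim VALUES (?, ?)", [(10, "pen"), (11, "notebook")])

def append_facts(batch):
    """Append one batch of new fact rows; existing rows are never rewritten."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", batch)

append_facts([(20240101, 10, 3), (20240101, 11, 1)])  # first load
append_facts([(20240102, 10, 7)])                     # a later load only appends

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
dim_rows = conn.execute("SELECT COUNT(*) FROM item_dim").fetchone()[0]
```

The fact table grows with every load while the dimension table stays the same size, which is
why routine loads touch only one table.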

Built-in referential integrity

A star schema has referential integrity built in when information is loaded. Referential
integrity is enforced because each record in a dimension table has a unique primary key, and
all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A
record in the fact table that is not related correctly to a dimension cannot be given the
correct key value to be retrieved.
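
In practice this integrity comes from declared key constraints or from checks in the load
process. The sketch below shows the constraint-based variant in SQLite, where foreign key
enforcement must be switched on explicitly; the branch_dim and sales_fact tables are
assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT)")
conn.execute("""
    CREATE TABLE sales_fact (
        branch_key INTEGER NOT NULL REFERENCES branch_dim(branch_key),
        units_sold INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO branch_dim VALUES (1, 'Main Street')")
conn.execute("INSERT INTO sales_fact VALUES (1, 5)")       # valid key: accepted

try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 5)")  # no branch 99 exists
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

A fact row whose key does not match any dimension row is refused at load time, so every stored
fact can be reached through its dimensions.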

Easily Understood

A star schema is simple to understand and navigate, with dimensions joined only through the
fact table. These joins are more meaningful to the end-user because they represent the
fundamental relationships between parts of the underlying business. Users can also browse
dimension table attributes before constructing a query.
Disadvantage of Star Schema

Some relationships cannot be modeled by a star schema. For example, the relationship between
users and bank accounts cannot be described as a star schema because it is many-to-many.

Example: Suppose a star schema is composed of a fact table, SALES, and several dimension
tables connected to it for time, branch, item, and geographic location.

The TIME table has columns for day, month, quarter, and year. The ITEM table has
columns for item_key, item_name, brand, type, and supplier_type. The BRANCH table has
columns for branch_key, branch_name, and branch_type. The LOCATION table has
columns of geographic data, including street, city, state, and country.
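
One possible way to write this example down as tables is sketched below. The _dim and _fact
suffixes, the key columns time_key and location_key, the column types, and the units_sold
measure are assumptions; only the descriptive column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY,          -- assumed key column
        day INTEGER, month INTEGER, quarter INTEGER, year INTEGER
    );
    CREATE TABLE item_dim (
        item_key INTEGER PRIMARY KEY,
        item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT
    );
    CREATE TABLE branch_dim (
        branch_key INTEGER PRIMARY KEY,
        branch_name TEXT, branch_type TEXT
    );
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY,      -- assumed key column
        street TEXT, city TEXT, state TEXT, country TEXT
    );
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        branch_key   INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        units_sold   INTEGER,                  -- assumed measure
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );
""")

def columns(table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

fact_columns = columns("sales_fact")
item_columns = columns("item_dim")
```

The fact table carries one key per dimension plus its measures, while all descriptive text
lives in the four dimension tables.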
What is Snowflake Schema?

A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if
one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables."

The snowflake schema is an expansion of the star schema in which each point of the star
explodes into more points. It is called a snowflake schema because its diagram
resembles a snowflake. Snowflaking is a method of normalizing the
dimension tables in a star schema. When all the dimension tables are normalized entirely,
the resultant structure resembles a snowflake with the fact table in the middle.

Snowflaking is used to improve the performance of specific queries. The schema is
diagrammed with each fact surrounded by its associated dimensions, and those dimensions are
related to other dimensions, branching out into a snowflake pattern.

The snowflake schema consists of one fact table linked to many dimension tables,
which can in turn be linked to other dimension tables through many-to-one relationships.
Tables in a snowflake schema are generally normalized to third normal form. Each dimension
table represents exactly one level in a hierarchy.

The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.

Example: The figure shows a snowflake schema with a Sales fact table and Store,
Location, Time, Product, Line, and Family dimension tables. The Market dimension has two
dimension tables, with Store as the primary dimension table and Location as the outrigger
dimension table. The Product dimension has three dimension tables, with Product as the
primary dimension table and Line and Family as the outrigger dimension tables.
A star schema stores all attributes for a dimension in one denormalized table. This needs
more disk space than a more normalized snowflake schema. Snowflaking normalizes
the dimension by moving attributes with low cardinality into separate dimension tables that
relate to the core dimension table through foreign keys. Snowflaking for the sole
purpose of minimizing disk space is not recommended, because it can adversely
impact query performance.
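
The trade-off can be seen in a small sketch that stores the same product data in both forms.
The table names follow the Product, Line, and Family example, while the columns and sample
rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star form: one denormalized product dimension.
    CREATE TABLE product_star (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT, line_name TEXT, family_name TEXT
    );
    INSERT INTO product_star VALUES
        (1, 'Cola 330ml', 'Soft drinks', 'Beverages'),
        (2, 'Cola 1l',    'Soft drinks', 'Beverages'),
        (3, 'Green tea',  'Teas',        'Beverages');

    -- Snowflake form: Line and Family become outrigger tables.
    CREATE TABLE family  (family_key INTEGER PRIMARY KEY, family_name TEXT);
    CREATE TABLE line    (line_key INTEGER PRIMARY KEY, line_name TEXT,
                          family_key INTEGER REFERENCES family(family_key));
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          line_key INTEGER REFERENCES line(line_key));
    INSERT INTO family  VALUES (1, 'Beverages');
    INSERT INTO line    VALUES (1, 'Soft drinks', 1), (2, 'Teas', 1);
    INSERT INTO product VALUES (1, 'Cola 330ml', 1), (2, 'Cola 1l', 1), (3, 'Green tea', 2);
""")

# Star: the full hierarchy is read from a single table.
star_rows = conn.execute(
    "SELECT product_name, line_name, family_name FROM product_star ORDER BY product_key"
).fetchall()

# Snowflake: the same answer needs two extra joins.
snowflake_rows = conn.execute("""
    SELECT p.product_name, l.line_name, f.family_name
    FROM product AS p
    JOIN line   AS l ON l.line_key   = p.line_key
    JOIN family AS f ON f.family_key = l.family_key
    ORDER BY p.product_key
""").fetchall()

family_names_stored = conn.execute("SELECT COUNT(*) FROM family").fetchone()[0]
```

The snowflake form stores the family name once instead of once per product, at the cost of two
additional joins for every query that needs it.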

In a snowflake schema, tables are normalized to remove redundancy: dimension tables are
broken down into multiple dimension tables.

The figure shows a simple star schema for sales in a manufacturing company. The sales fact
table includes quantity, price, and other relevant metrics. SALESREP,
CUSTOMER, PRODUCT, and TIME are the dimension tables.

The star schema for sales, as shown above, contains only five tables, whereas the
normalized version extends to eleven tables. Notice that in the snowflake
schema, the attributes with low cardinality in each original dimension table are removed to
form separate tables. These new tables are connected back to the original dimension table
through artificial keys.

A snowflake schema is designed for flexible querying across more complex dimensions
and relationships. It is suitable for many-to-many and one-to-many relationships between
dimension levels.

Advantages of Snowflake Schema

1. The primary advantage of the snowflake schema is the improvement in the performance of
some queries due to minimized disk storage requirements and joining smaller lookup
tables.
2. It provides greater scalability in the interrelationship between dimension levels
and components.
3. No redundancy, so it is easier to maintain.
Disadvantages of Snowflake Schema

o The primary disadvantage of the snowflake schema is the additional
maintenance effort required due to the increasing number of lookup tables.
o Queries are more complex and hence more difficult to understand.
o More tables mean more joins, and therefore longer query execution time.
In the SALES example, the SALES table contains only four columns with IDs from the dimension
tables TIME, ITEM, BRANCH, and LOCATION, instead of four columns for time data, four columns for
item data, three columns for branch data, and four columns for location data. Thus, the size
of the fact table is significantly reduced. When an item changes, only a single change is needed
in the dimension table, instead of many changes in the fact table.
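
The single-change property can be sketched as follows, assuming a hypothetical item_dim
dimension and a sales_fact table with three rows for the same item:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_dim   (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key),
                             units_sold INTEGER);
    INSERT INTO item_dim   VALUES (10, 'pen', 'OldBrand');
    INSERT INTO sales_fact VALUES (10, 3), (10, 5), (10, 2);
""")

# Renaming the brand touches exactly one dimension row, however many facts exist.
cursor = conn.execute("UPDATE item_dim SET brand = 'NewBrand' WHERE item_key = 10")
rows_changed = cursor.rowcount

# Every fact row picks up the new value through the join.
brands_seen = conn.execute("""
    SELECT DISTINCT i.brand
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
""").fetchall()
```

Had the brand been stored in the fact table itself, all three fact rows would have needed
updating.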

We can create even more complex star schemas by normalizing a dimension table into several
tables. The resulting structure is called a snowflake schema.

Difference between Star and Snowflake Schemas

Star Schema

o In a star schema, the fact table is at the center and is connected to the dimension tables.
o The tables are completely denormalized in structure.
o SQL query performance is good because fewer joins are involved.
o Data redundancy is high, so it occupies more disk space.

Snowflake Schema

o A snowflake schema is an extension of the star schema in which dimension tables are connected
to one or more further dimension tables.
o The dimension tables are normalized, so the schema is only partially denormalized in structure.
o SQL query performance is somewhat lower than in a star schema because more joins are involved.
o Data redundancy is low, so it occupies less disk space than a star schema.

The differences between star and snowflake schemas are compared below.
| Basis for Comparison | Star Schema | Snowflake Schema |
|---|---|---|
| Ease of Maintenance/Change | It has redundant data and hence is less easy to maintain/change | No redundancy and therefore easier to maintain and change |
| Ease of Use | Less complex queries that are simple to understand | More complex queries that are less easy to understand |
| Parent Table | A dimension table does not have any parent table | A dimension table has one or more parent tables |
| Query Performance | Fewer foreign keys and hence shorter query execution time | More foreign keys and thus longer query execution time |
| Normalization | It has denormalized tables | It has normalized tables |
| Type of Data Warehouse | Good for data marts with simple relationships (one-to-one or one-to-many) | Good for the data warehouse core, to simplify complex relationships (many-to-many) |
| Joins | Fewer joins | Higher number of joins |
| Dimension Table | It contains only a single dimension table for each dimension | It may have more than one dimension table for each dimension |
| Hierarchies | Hierarchies for the dimension are stored in the dimension table itself | Hierarchies are broken into separate tables. These hierarchies help to drill down from the topmost levels to the lowermost levels. |
| When to Use | When the dimension tables contain fewer rows | When the dimension tables store a huge number of rows with redundant information and space is an issue, a snowflake schema can be chosen to save space |
| Data Warehouse System | Works best in any data warehouse/data mart | Better for a small data warehouse/data mart |

What is Fact Constellation Schema?

A fact constellation consists of two or more fact tables sharing one or more dimensions. It is
also called a galaxy schema.

A fact constellation schema describes a logical structure of a data warehouse or data mart. It
can be designed as a collection of denormalized fact tables together with shared and conformed
dimension tables.

A fact constellation schema is a sophisticated database design in which it is difficult to
summarize information. It can be implemented between aggregate fact tables, or by decomposing
a complex fact table into independent simple fact tables.

Example: A fact constellation schema is shown in
the figure below.
This schema defines two fact tables, sales and shipping. Sales is analyzed along four
dimensions, namely time, item, branch, and location. The sales fact table includes keys to
each of the four dimensions, along with two measures: Rupee_sold and units_sold.
The shipping table has five dimensions, or keys (item_key, time_key, shipper_key, from_location,
and to_location), and two measures: Rupee_cost and units_shipped.
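
A rough sketch of the two fact tables sharing the item dimension is given below. The measure
and key names follow the example (in lower case); the table names, the column types, and the
sample rows are assumptions, and the other dimension tables are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE sales_fact (
        time_key INTEGER, item_key INTEGER REFERENCES item_dim(item_key),
        branch_key INTEGER, location_key INTEGER,
        rupee_sold REAL, units_sold INTEGER
    );
    CREATE TABLE shipping_fact (
        item_key INTEGER REFERENCES item_dim(item_key), time_key INTEGER,
        shipper_key INTEGER, from_location INTEGER, to_location INTEGER,
        rupee_cost REAL, units_shipped INTEGER
    );
    INSERT INTO item_dim VALUES (10, 'pen'), (11, 'notebook');
    INSERT INTO sales_fact VALUES (1, 10, 1, 1, 300.0, 3), (2, 10, 1, 1, 200.0, 2),
                                  (1, 11, 1, 1, 150.0, 1);
    INSERT INTO shipping_fact VALUES (10, 1, 7, 1, 2, 40.0, 6), (11, 1, 7, 1, 2, 25.0, 4);
""")

# Each fact table is aggregated on its own and the results are then matched on
# the shared dimension, so rows from one fact table do not multiply the other's.
rows = conn.execute("""
    SELECT i.item_name, s.units_sold, h.units_shipped
    FROM item_dim AS i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) AS h ON h.item_key = i.item_key
    ORDER BY i.item_key
""").fetchall()
```

Joining the two fact tables directly on item_key would pair every sales row with every
shipping row for that item and inflate the sums, which is one reason queries against this
design need more care.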

The primary disadvantage of the fact constellation schema is that it is a more challenging
design because many variants for specific kinds of aggregation must be considered and selected.

Thanks for listening
