Chapter Nine
What is Star Schema?
A fact table is the table in a star schema that contains facts and is connected to the
dimensions. A fact table has two types of columns: those that hold facts and those that are
foreign keys to the dimension tables. The primary key of a fact table is generally a composite
key made up of all of its foreign keys.
A fact table might hold either detail-level facts or facts that have been aggregated (fact tables
that hold aggregated facts are often called summary tables instead). A fact table generally
contains facts at the same level of aggregation.
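The two kinds of fact-table columns and the composite key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (product_dim, store_dim, sales_fact) and the measure columns (units_sold, amount) are assumptions made for the example, not names from this chapter.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);

    -- Fact table: foreign-key columns plus fact (measure) columns.
    CREATE TABLE sales_fact (
        product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
        store_key   INTEGER NOT NULL REFERENCES store_dim(store_key),
        units_sold  INTEGER,
        amount      REAL,
        PRIMARY KEY (product_key, store_key)  -- composite key built from the foreign keys
    );
""")

conn.execute("INSERT INTO product_dim VALUES (1, 'Widget')")
conn.execute("INSERT INTO store_dim VALUES (10, 'Downtown')")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 5, 49.95)")


def insert_duplicate():
    """Try to insert a second fact row with the same key combination."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (1, 10, 7, 69.93)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_result = insert_duplicate()
fact_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Because the primary key is the combination of the foreign keys, a second row for the same product and store is refused, so each combination of dimension values identifies at most one fact row.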
Dimension Tables
Fact tables store data about sales, while dimension tables store data about the geographic
region (markets, cities), clients, products, times, and channels.
The star schema is well suited to data warehouse database design because of
the following features:
o It creates a denormalized database that can quickly provide query responses.
o It provides a flexible design that can be changed easily or added to throughout the
development cycle, and as the database grows.
o It provides a parallel in design to how end-users typically think of and use the data.
o It reduces the complexity of metadata for both developers and end-users.
Advantages of Star Schema
Star schemas are easy for end-users and applications to understand and navigate. With a well-
designed schema, users can quickly analyze large, multidimensional data sets.
Because a star schema database has a limited number of tables and clear join paths, queries run
faster than they do against OLTP systems. Small single-table queries, frequently of a
dimension table, are almost instantaneous. Large join queries that involve multiple tables
take only seconds or minutes to run.
In a star schema database design, the dimensions are connected only through the central fact
table. When two dimension tables are used in a query, only one join path, passing through the
fact table, exists between those two tables. This design feature enforces accurate and
consistent query results.
Structural simplicity also decreases the time required to load large batches of records into a
star schema database. By defining facts and dimensions and separating them into
different tables, the impact of a load operation is reduced. Dimension tables can be populated
once and occasionally refreshed. We can add new facts regularly and selectively by appending
records to a fact table.
A star schema has referential integrity built in when information is loaded. Referential
integrity is enforced because each record in a dimension table has a unique primary key, and
all keys in the fact table are legitimate foreign keys drawn from the dimension tables. A record
in the fact table that is not related correctly to a dimension cannot be given the correct key
value to be retrieved.
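A minimal sketch of this load-time check, using Python's built-in sqlite3 module. The names customer_dim, sales_fact, and load_fact are hypothetical and chosen only for the example. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection; other database engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical dimension and fact tables for illustration only.
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact (
        customer_key INTEGER NOT NULL REFERENCES customer_dim(customer_key),
        amount       REAL
    );
    INSERT INTO customer_dim VALUES (1, 'Asha');
""")


def load_fact(customer_key, amount):
    """Append one fact row; return False if its key matches no dimension row."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (customer_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False


valid_loaded = load_fact(1, 120.0)    # customer_key 1 exists in customer_dim
orphan_loaded = load_fact(99, 75.0)   # customer_key 99 does not exist
fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

The fact row whose key is not drawn from the dimension table is rejected at load time, so only the correctly related row ends up in the fact table.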
Easily Understood
A star schema is simple to understand and navigate, with dimensions joined only through the
fact table. These joins are more meaningful to the end-user because they represent the
fundamental relationships between parts of the underlying business. Users can also browse
dimension table attributes before constructing a query.
Disadvantage of Star Schema
Some situations cannot be modeled by a star schema. For example, the relationship between
a user and a bank account cannot be described as a star schema, because the relationship
between them is many-to-many.
Example: Suppose a star schema is composed of a fact table, SALES, and several dimension
tables connected to it for time, branch, item, and geographic location.
The TIME table has columns for day, month, quarter, and year. The ITEM table has
columns for item_key, item_name, brand, type, and supplier_type. The BRANCH table has
columns for branch_key, branch_name, and branch_type. The LOCATION table has
columns of geographic data, including street, city, state, and country.
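The example above could be laid out roughly as follows with Python's built-in sqlite3 module. The dimension columns follow the description; the surrogate keys time_key and location_key, the "_dim" table suffixes, the measures units_sold and amount_sold, and all sample rows are assumptions added so that the sketch runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (time_key and location_key are assumed surrogate keys).
    CREATE TABLE time_dim (
        time_key INTEGER PRIMARY KEY,
        day INTEGER, month INTEGER, quarter INTEGER, year INTEGER
    );
    CREATE TABLE item_dim (
        item_key INTEGER PRIMARY KEY,
        item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT
    );
    CREATE TABLE branch_dim (
        branch_key INTEGER PRIMARY KEY,
        branch_name TEXT, branch_type TEXT
    );
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY,
        street TEXT, city TEXT, state TEXT, country TEXT
    );

    -- Central fact table: one foreign key per dimension plus assumed measures.
    CREATE TABLE sales (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        branch_key   INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        units_sold   INTEGER,
        amount_sold  REAL,
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );

    -- Made-up sample data.
    INSERT INTO time_dim VALUES (1, 15, 1, 1, 2024), (2, 20, 4, 2, 2024);
    INSERT INTO item_dim VALUES (1, 'Laptop', 'Acme', 'Electronics', 'Wholesale');
    INSERT INTO branch_dim VALUES (1, 'Central', 'Retail');
    INSERT INTO location_dim VALUES (1, '1 Main St', 'Pune', 'Maharashtra', 'India');
    INSERT INTO sales VALUES (1, 1, 1, 1, 3, 150000.0), (2, 1, 1, 1, 2, 100000.0);
""")

# Two dimensions meet only through the fact table: one join per dimension.
rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(s.units_sold), SUM(s.amount_sold)
    FROM sales AS s
    JOIN time_dim     AS t ON t.time_key     = s.time_key
    JOIN location_dim AS l ON l.location_key = s.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter
""").fetchall()
```

The query reaches each dimension with a single join from SALES, which is the only join path between TIME and LOCATION in this layout.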
What is Snowflake Schema?
A snowflake schema is a variant of the star schema. "A schema is known as a snowflake if
one or more dimension tables do not connect directly to the fact table but must join through
other dimension tables."
The snowflake schema is an expansion of the star schema where each point of the star
explodes into more points. It is called a snowflake schema because its diagram
resembles a snowflake. Snowflaking is a method of normalizing the
dimension tables in a star schema. When we normalize all the dimension tables entirely,
the resultant structure resembles a snowflake with the fact table in the middle.
The following diagram shows a snowflake schema with two dimensions, each having three
levels. A snowflake schema can have any number of dimensions, and each dimension can
have any number of levels.
Example: The figure shows a snowflake schema with a Sales fact table and Store,
Location, Time, Product, Line, and Family dimension tables. The Market dimension has two
dimension tables, with Store as the primary dimension table and Location as the outrigger
dimension table. The Product dimension has three dimension tables, with Product as the
primary dimension table and Line and Family as the outrigger dimension tables.
A star schema stores all attributes for a dimension in one denormalized table. This requires
more disk space than a more normalized snowflake schema. Snowflaking normalizes
the dimension by moving attributes with low cardinality into separate dimension tables that
relate to the core dimension table through foreign keys. Snowflaking for the sole
purpose of minimizing disk space is not recommended, because it can adversely
impact query performance.
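The trade-off can be sketched with Python's built-in sqlite3 module, using a product dimension with a line and a family level. The table layouts, key columns, and sample rows below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star form: one denormalized table; line and family names repeat per product.
    CREATE TABLE product_star (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT, line_name TEXT, family_name TEXT
    );
    INSERT INTO product_star VALUES
        (1, 'Espresso Beans', 'Coffee', 'Beverages'),
        (2, 'Decaf Beans',    'Coffee', 'Beverages'),
        (3, 'Green Tea',      'Tea',    'Beverages');

    -- Snowflaked form: low-cardinality attributes moved into their own tables.
    CREATE TABLE family (family_key INTEGER PRIMARY KEY, family_name TEXT);
    CREATE TABLE line (
        line_key INTEGER PRIMARY KEY, line_name TEXT,
        family_key INTEGER REFERENCES family(family_key)
    );
    CREATE TABLE product (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        line_key INTEGER REFERENCES line(line_key)
    );
    INSERT INTO family VALUES (1, 'Beverages');
    INSERT INTO line VALUES (1, 'Coffee', 1), (2, 'Tea', 1);
    INSERT INTO product VALUES
        (1, 'Espresso Beans', 1), (2, 'Decaf Beans', 1), (3, 'Green Tea', 2);
""")

# Star form: the family name is read straight from the dimension table.
star_rows = conn.execute(
    "SELECT product_name, family_name FROM product_star ORDER BY product_key"
).fetchall()

# Snowflaked form: the same answer needs two extra joins.
snowflake_rows = conn.execute("""
    SELECT p.product_name, f.family_name
    FROM product AS p
    JOIN line   AS l ON l.line_key   = p.line_key
    JOIN family AS f ON f.family_key = l.family_key
    ORDER BY p.product_key
""").fetchall()

family_name_copies = conn.execute("SELECT COUNT(*) FROM family").fetchone()[0]
```

Both queries return the same rows. The snowflaked form stores the family name once instead of once per product, at the cost of two additional joins on every query that needs it.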
The figure shows a simple star schema for sales in a manufacturing company. The sales fact
table includes quantity, price, and other relevant metrics. SALESREP,
CUSTOMER, PRODUCT, and TIME are the dimension tables.
The star schema for sales, as shown above, contains only five tables, whereas the
normalized version extends to eleven tables. Notice that in the snowflake
schema, the attributes with low cardinality in each original dimension table are removed to
form separate tables. These new tables are connected back to the original dimension table
through artificial keys.
A snowflake schema is designed for flexible querying across more complex dimensions
and relationships. It is suitable for many-to-many and one-to-many relationships between
dimension levels.
We can create even more complex star schemas by normalizing a dimension table into several tables.
The normalized dimension table is called a Snowflake.
Star Schema
o In a star schema, the fact table is at the center and is connected to the dimension tables.
o The tables are completely denormalized in structure.
o SQL query performance is good because fewer joins are involved.
o Data redundancy is high, so the schema occupies more disk space.
Snowflake Schema
o A snowflake schema is an extension of the star schema where the dimension tables are connected to
one or more further dimension tables.
o The tables are partially denormalized in structure.
o SQL query performance is somewhat lower than with a star schema because more
joins are involved.
o Data redundancy is low, so the schema occupies less disk space than a star schema.
Let's look at the differences between Star and Snowflake Schema.
Basis for Comparison | Star Schema | Snowflake Schema
Ease of maintenance/change | Has redundant data and is therefore less easy to maintain and change | No redundancy and therefore easier to maintain and change
Ease of use | Less complex queries that are simple to understand | More complex queries that are less easy to understand
Parent table | A dimension table does not have any parent table | A dimension table has one or more parent tables
Query performance | Fewer foreign keys and hence shorter query execution time | More foreign keys and thus longer query execution time
Dimension table | Contains only a single dimension table for each dimension | May have more than one dimension table for each dimension
Hierarchies | Hierarchies for a dimension are stored in the dimension table itself | Hierarchies are broken into separate tables. These hierarchies help to drill down from the topmost level to the lowermost level
When to use | When the dimension tables contain fewer rows | When the dimension tables store a huge number of rows with redundant information and space is an issue, a snowflake schema can be chosen to save space
Data warehouse system | Works best in any data warehouse or data mart | Better for a small data warehouse or data mart
What is Fact Constellation Schema?
A fact constellation means two or more fact tables sharing one or more dimensions. It is
also called a galaxy schema.
A fact constellation schema describes a logical structure of a data warehouse or data mart. It
can be designed with a collection of denormalized fact tables and shared, conformed
dimension tables.
A fact constellation schema is a sophisticated database design in which it is difficult to
summarize information. It can be implemented between aggregate fact tables, or by decomposing
a complex fact table into independent, simpler fact tables.
Example: A fact constellation schema is shown in the figure below.
This schema defines two fact tables, sales and shipping. Sales is analyzed along four
dimensions, namely time, item, branch, and location. The schema contains a fact table for sales that
includes keys to each of the four dimensions, along with two measures: Rupee_sold and units_sold.
The shipping table has five dimensions, or keys (item_key, time_key, shipper_key, from_location,
and to_location), and two measures: Rupee_cost and units_shipped.
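A rough sketch of this layout with Python's built-in sqlite3 module. The fact-table keys and measures follow the description (written in lower case); the "_dim" table names, the single descriptive column in each dimension, and all sample rows are assumptions made so that the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables, trimmed to one assumed descriptive column each.
    CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE item_dim     (item_key     INTEGER PRIMARY KEY, item_name TEXT);
    CREATE TABLE branch_dim   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE shipper_dim  (shipper_key  INTEGER PRIMARY KEY, shipper_name TEXT);

    -- First fact table: sales along time, item, branch, and location.
    CREATE TABLE sales (
        time_key     INTEGER REFERENCES time_dim(time_key),
        item_key     INTEGER REFERENCES item_dim(item_key),
        branch_key   INTEGER REFERENCES branch_dim(branch_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        rupee_sold   REAL,
        units_sold   INTEGER
    );

    -- Second fact table: shipping shares the time, item, and location dimensions.
    CREATE TABLE shipping (
        item_key      INTEGER REFERENCES item_dim(item_key),
        time_key      INTEGER REFERENCES time_dim(time_key),
        shipper_key   INTEGER REFERENCES shipper_dim(shipper_key),
        from_location INTEGER REFERENCES location_dim(location_key),
        to_location   INTEGER REFERENCES location_dim(location_key),
        rupee_cost    REAL,
        units_shipped INTEGER
    );

    -- Made-up sample data.
    INSERT INTO time_dim VALUES (1, 2024);
    INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO branch_dim VALUES (1, 'Central');
    INSERT INTO location_dim VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO shipper_dim VALUES (1, 'FastShip');
    INSERT INTO sales VALUES
        (1, 1, 1, 1, 100000.0, 2), (1, 2, 1, 1, 40000.0, 4), (1, 1, 1, 2, 50000.0, 1);
    INSERT INTO shipping VALUES
        (1, 1, 1, 1, 2, 500.0, 3), (2, 1, 1, 1, 2, 200.0, 4);
""")

# Each fact table is aggregated separately, then lined up on the shared item dimension.
rows = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(units_sold)    FROM sales    WHERE item_key = i.item_key),
           (SELECT SUM(units_shipped) FROM shipping WHERE item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_key
""").fetchall()
```

Aggregating each fact table on its own before combining the results avoids multiplying rows, which would happen if sales and shipping were joined to each other directly through the shared dimension.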
The primary disadvantage of the fact constellation schema is that it is a more challenging
design because many variants for specific kinds of aggregation must be considered and selected.
Thanks for listening