Unit 4 - Notes-1

Partitioning is essential for managing large fact tables in data warehouses, enhancing performance, and facilitating backup/recovery. Various partitioning strategies include horizontal partitioning by time, dimension-based partitioning, and vertical partitioning through normalization and row splitting. The document also discusses fact tables, dimension tables, types of facts, and design considerations for summary tables to optimize data access and query performance.

Uploaded by

smartbroad26

UNIT 4

PARTITIONING
Partitioning is done to enhance performance and facilitate easy management of data.
Partitioning also helps in balancing the various requirements of the system. It optimizes the
hardware performance and simplifies the management of data warehouse by partitioning each
fact table into multiple separate partitions. In this chapter, we will discuss different
partitioning strategies.

Why is it Necessary to Partition?

Partitioning is important for the following reasons −

∙ For easy management,
∙ To assist backup/recovery,
∙ To enhance performance.
For Easy Management
The fact table in a data warehouse can grow to hundreds of gigabytes in size. A fact table this
large is very hard to manage as a single entity, so it needs to be partitioned.
To Assist Backup/Recovery
If we do not partition the fact table, then we have to load the complete fact table with all the
data. Partitioning allows us to load only as much data as is required on a regular basis. It
reduces the time to load and also enhances the performance of the system.
Note − To cut down on the backup size, all partitions other than the current partition can be
marked as read-only, so that they cannot be modified. Each read-only partition is then backed
up once, after which only the current partition needs to be backed up regularly.
To Enhance Performance
By partitioning the fact table into sets of data, the query procedures can be enhanced. Query
performance is enhanced because now the query scans only those partitions that are relevant.
It does not have to scan the whole data.
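The effect of scanning only the relevant partitions can be sketched in a few lines of Python. This is a minimal illustration, not a database implementation: the sample rows, the monthly partition key and the helper names (month_key, build_partitions, total_qty) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

def month_key(d):
    """Partition key: the (year, month) a sales date falls in."""
    return (d.year, d.month)

def build_partitions(rows):
    """Group fact rows into one partition per calendar month."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[month_key(row["sales_date"])].append(row)
    return partitions

def total_qty(partitions, year, month):
    """Scan only the partition that matches the requested month."""
    relevant = partitions.get((year, month), [])
    return sum(row["qty"] for row in relevant), len(relevant)

# Hypothetical sample fact rows.
fact_rows = [
    {"product_id": 30, "qty": 5, "sales_date": date(2013, 8, 3)},
    {"product_id": 35, "qty": 4, "sales_date": date(2013, 9, 3)},
    {"product_id": 40, "qty": 5, "sales_date": date(2013, 9, 3)},
    {"product_id": 45, "qty": 7, "sales_date": date(2013, 9, 3)},
]

partitions = build_partitions(fact_rows)
qty, rows_scanned = total_qty(partitions, 2013, 9)
print(qty, rows_scanned)  # 16 3
```

The query for September touches three rows instead of four; on a real fact table the saving comes from skipping whole partitions.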

TYPES OF PARTITIONING

HORIZONTAL PARTITIONING

There are various ways in which a fact table can be partitioned. In horizontal partitioning, we
have to keep in mind the requirements for manageability of the data warehouse.
1. Partitioning by Time into Equal Segments
In this partitioning strategy, the fact table is partitioned on the basis of time period. Here each
time period represents a significant retention period within the business. For example, if the
user queries for month-to-date data, then it is appropriate to partition the data into monthly
segments. We can reuse the partitioned tables by removing the data in them.
2. Partition by Time into Different-sized Segments
This kind of partitioning is used where aged data is accessed infrequently. It is implemented
as a set of small partitions for relatively current data and larger partitions for inactive data.
Points to Note
∙ The detailed information remains available online.
∙ The number of physical tables is kept relatively small, which reduces the
operating cost.
∙ This technique is suitable where a mix of data dipping into recent history and data
mining through the entire history is required.
∙ This technique is not useful where the partitioning profile changes on a regular
basis, because repartitioning will increase the operating cost of the data
warehouse.
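One way to picture different-sized segments is a rule that maps each sales date to a partition name. The sketch below assumes monthly partitions for the last three months and yearly partitions for anything older; the three-month window and the sales_... naming scheme are hypothetical choices for illustration only.

```python
from datetime import date

def partition_name(sale_date, today, recent_months=3):
    """Monthly partitions for recent data, yearly partitions for older data."""
    months_old = (today.year - sale_date.year) * 12 + (today.month - sale_date.month)
    if months_old < recent_months:
        return f"sales_{sale_date.year}_{sale_date.month:02d}"
    return f"sales_{sale_date.year}"

today = date(2013, 9, 15)
print(partition_name(date(2013, 9, 3), today))    # sales_2013_09
print(partition_name(date(2013, 8, 3), today))    # sales_2013_08
print(partition_name(date(2012, 11, 20), today))  # sales_2012
```

Because the rule depends on today's date, data has to be moved from a small partition into a large one as it ages, which is the repartitioning cost mentioned in the points above.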

3. Partition on a Different Dimension

The fact table can also be partitioned on the basis of dimensions other than time such as
product group, region, supplier, or any other dimension. Let's have an example.
Suppose a marketing function has been structured into distinct regional departments, for
example on a state-by-state basis. If each region wants to query information captured within
its own region, it is more effective to partition the fact table into regional partitions. This
speeds up the queries because they do not have to scan information that is not
relevant.
Points to Note
∙ The query does not have to scan irrelevant data which speeds up the query
process.
∙ This technique is not appropriate where the dimension is likely to change in
future. So, it is worth confirming that the dimension will not change in
future.
∙ If the dimension changes, then the entire fact table would have to be
repartitioned.
Note − We recommend partitioning only on the basis of the time dimension, unless
you are certain that the suggested dimension grouping will not change within the life of the
data warehouse.
4. Partition by Size of Table
When there is no clear basis for partitioning the fact table on any dimension, we should
partition the fact table on the basis of its size. We can set a predetermined size as the
critical point. When the table exceeds the predetermined size, a new table partition is created.
Points to Note
∙ This partitioning is complex to manage.
∙ It requires metadata to identify what data is stored in each partition.
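The size-based scheme and its metadata requirement can be sketched as follows. This is an illustrative model only: the threshold MAX_ROWS, the id column and the min_id/max_id metadata fields are assumptions, not part of any particular database product.

```python
MAX_ROWS = 3  # hypothetical size threshold per partition

partitions = [[]]   # list of partitions, each a list of rows
metadata = []       # one entry per partition: lowest and highest id stored

def insert_row(row):
    """Append a row, opening a new partition when the current one is full."""
    if len(partitions[-1]) >= MAX_ROWS:
        partitions.append([])
    partitions[-1].append(row)
    index = len(partitions) - 1
    if index == len(metadata):
        # First row of a new partition: create its metadata entry.
        metadata.append({"partition": index, "min_id": row["id"], "max_id": row["id"]})
    else:
        entry = metadata[index]
        entry["min_id"] = min(entry["min_id"], row["id"])
        entry["max_id"] = max(entry["max_id"], row["id"])

def find_partition(row_id):
    """Use the metadata to pick the partitions that may hold row_id."""
    return [m["partition"] for m in metadata if m["min_id"] <= row_id <= m["max_id"]]

for i in range(1, 8):
    insert_row({"id": i, "qty": i * 2})

print(len(partitions))    # 3
print(find_partition(5))  # [1]
```

Without the metadata list, a query for a given id would have to scan every partition, which is why this strategy is harder to manage than partitioning on a dimension.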

5. Partitioning Dimensions
If a dimension contains a large number of entries, then the dimension itself may need to be
partitioned. Here we have to check the size of the dimension.
Consider a large dimension that changes over time. If we need to store all the variations in order
to apply comparisons, that dimension may become very large. This would definitely affect the
response time.
6. Round Robin Partitions
In the round robin technique, when a new partition is needed, the oldest one is archived. It uses
metadata to allow the user access tool to refer to the correct table partition.
This technique makes it easy to automate table management facilities within the data
warehouse.
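A rough sketch of round robin partitioning, assuming a fixed number of online partition slots and a metadata map from period to physical table. The class name, the slot count and the sales_part_N table names are hypothetical.

```python
from collections import OrderedDict

class RoundRobinPartitions:
    """Keep a fixed number of online partitions; archive the oldest on rollover."""

    def __init__(self, slots):
        self.slots = slots
        self.online = OrderedDict()   # metadata: period -> physical table name
        self.archived = []            # periods that have been moved to the archive
        self._next_slot = 0

    def add_period(self, period):
        if len(self.online) >= self.slots:
            # All slots in use: archive the oldest period and reuse its table.
            old_period, table = self.online.popitem(last=False)
            self.archived.append(old_period)
        else:
            table = f"sales_part_{self._next_slot}"
            self._next_slot += 1
        self.online[period] = table
        return table

    def table_for(self, period):
        """What an access tool would call to find the right table."""
        return self.online.get(period)

rr = RoundRobinPartitions(slots=3)
for period in ["2013-06", "2013-07", "2013-08", "2013-09"]:
    rr.add_period(period)

print(rr.table_for("2013-09"))  # sales_part_0
print(rr.archived)              # ['2013-06']
```

The same three physical tables are reused indefinitely, which is what makes the table management easy to automate.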

VERTICAL PARTITION

Vertical partitioning splits the data vertically, that is, by columns. The following image depicts
how vertical partitioning is done.

Vertical partitioning can be performed in the following two ways −

∙ Normalization
∙ Row Splitting

1. Normalization
Normalization is the standard relational method of database organization. In this method,
duplicate rows are collapsed into a single row, which reduces space. Take a look at the
following tables that show how normalization is performed.
Table before Normalization

Product_id  Qty  Value  sales_date  Store_id  Store_name  Location   Region
30          5    3.67   3-Aug-13    16        sunny       Bangalore  S
35          4    5.33   3-Sep-13    16        sunny       Bangalore  S
40          5    2.50   3-Sep-13    64        san         Mumbai     W
45          7    5.66   3-Sep-13    16        sunny       Bangalore  S

Table after Normalization

Store_id  Store_name  Location   Region
16        sunny       Bangalore  S
64        san         Mumbai     W

Product_id  Qty  Value  sales_date  Store_id
30          5    3.67   3-Aug-13    16
35          4    5.33   3-Sep-13    16
40          5    2.50   3-Sep-13    64
45          7    5.66   3-Sep-13    16
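The split can be reproduced with a short Python sketch that uses the rows of the "before" table. The variable names (flat_rows, stores, sales) are illustrative; a real warehouse would do this with two tables joined on Store_id.

```python
flat_rows = [
    # (product_id, qty, value, sales_date, store_id, store_name, location, region)
    (30, 5, 3.67, "3-Aug-13", 16, "sunny", "Bangalore", "S"),
    (35, 4, 5.33, "3-Sep-13", 16, "sunny", "Bangalore", "S"),
    (40, 5, 2.50, "3-Sep-13", 64, "san", "Mumbai", "W"),
    (45, 7, 5.66, "3-Sep-13", 16, "sunny", "Bangalore", "S"),
]

stores = {}   # store_id -> (store_name, location, region), one row per store
sales = []    # (product_id, qty, value, sales_date, store_id)

for product_id, qty, value, sales_date, store_id, name, location, region in flat_rows:
    stores[store_id] = (name, location, region)   # repeated store details collapse
    sales.append((product_id, qty, value, sales_date, store_id))

print(stores)
print(len(sales))
```

The store details, which appeared three times for store 16, are now held once, and each sales row carries only the Store_id needed to look them up.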

2. Row Splitting
Row splitting tends to leave a one-to-one map between partitions. The motive of row splitting
is to speed up access to a large table by reducing its size.
Note − While using vertical partitioning, make sure that there is no requirement to perform a
major join operation between two partitions.
What is a Fact Table?
A fact table is the primary table in a dimensional model.

A fact table contains

1. Measurements/facts
2. Foreign keys to dimension tables

What is a Dimension Table?

∙ A dimension table contains the dimensions of a fact.
∙ Dimension tables are joined to the fact table via a foreign key.
∙ Dimension tables are de-normalized tables.
∙ The dimension attributes are the various columns in a dimension table.
∙ Dimensions offer descriptive characteristics of the facts with the help of their
attributes.
∙ There is no set limit on the number of dimensions.
∙ A dimension can also contain one or more hierarchical relationships.
Fact Table vs Dimension Table

Definition
∙ Fact table: Measurements, metrics or facts about a business process.
∙ Dimension table: Companion table to the fact table; contains descriptive attributes used to
constrain queries.

Characteristic
∙ Fact table: Located at the center of a star or snowflake schema and surrounded by dimensions.
∙ Dimension table: Connected to the fact table and located at the edges of the star or snowflake
schema.

Design
∙ Fact table: Defined by its grain, that is, its most atomic level.
∙ Dimension table: Should be wordy, descriptive, complete, and quality assured.

Task
∙ Fact table: Records a measurable event for which dimension table data is collected; it is used
for analysis and reporting.
∙ Dimension table: Collection of reference information about a business.

Type of Data
∙ Fact table: Could contain information like sales against a set of dimensions like Product
and Date.
∙ Dimension table: Every dimension table contains attributes which describe the details of
the dimension. E.g., a Product dimension can contain Product ID, Product Category, etc.

Key
∙ Fact table: The primary key in the fact table is mapped as foreign keys to the dimensions.
∙ Dimension table: Has a primary key column that uniquely identifies each dimension row.

Storage
∙ Fact table: Holds the detailed atomic data loaded into the dimensional structures.
∙ Dimension table: Stores report labels and filter domain values.

Hierarchy
∙ Fact table: Does not contain hierarchies.
∙ Dimension table: Contains hierarchies. For example, Location could contain country, state,
city, pin code, etc.

Types of Facts Table

The fact table is the central table in a data warehouse schema. It is found in the centre of a
star schema or snowflake schema and is surrounded by dimension tables. It contains the facts
of a particular business process, such as sales revenue by month. Facts are also known as
measurements or metrics: each fact captures a measurement or a metric. It is an essential
concept for data warehousing and BI certification.

The fact table stores the quantitative information used for analysis. It is the primary table
in the dimensional model and contains measurements, metrics and other quantitative
information.
Types of Facts

There are three types of facts:

1. Summative (additive) facts: Summative facts can be used with aggregation functions such
as sum(), average(), etc.
2. Semi-summative (semi-additive) facts: Only a small number of aggregation functions apply
to these facts. For example, consider bank account details. We cannot usefully apply sum()
to a bank balance over time, because the result is meaningless, but the minimum() and
maximum() functions return useful information.
3. Non-additive facts: We cannot use numerical aggregation functions such as sum() or
average() on non-additive facts. Ratios and percentages are typical non-additive facts.
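The three kinds of fact can be contrasted with a small Python sketch. All figures below are made-up sample data, and the margin ratio is just one assumed example of a non-additive fact.

```python
# Hypothetical three days of data for one bank account and one product.
daily_sales = [10.0, 20.0, 30.0]       # additive: a flow that can be summed
daily_cost = [5.0, 10.0, 24.0]         # additive
daily_balance = [100.0, 120.0, 80.0]   # semi-additive: a level, not a flow

# Additive fact: the sum over time is meaningful.
total_sales = sum(daily_sales)

# Semi-additive fact: min and max over time are meaningful, the sum is not.
min_balance, max_balance = min(daily_balance), max(daily_balance)
summed_balance = sum(daily_balance)    # 300.0, but the account never held 300

# Non-additive fact: a ratio. Summing or averaging the daily ratios is wrong;
# the overall ratio has to be recomputed from the additive parts.
daily_margin = [(s - c) / s for s, c in zip(daily_sales, daily_cost)]
average_of_ratios = sum(daily_margin) / len(daily_margin)
overall_margin = (sum(daily_sales) - sum(daily_cost)) / sum(daily_sales)

print(total_sales, min_balance, max_balance)
print(round(average_of_ratios, 2), round(overall_margin, 2))  # 0.4 0.35
```

The averaged daily margins (0.4) differ from the true overall margin (0.35), which is why ratios are stored alongside, or derived from, their additive components.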

Types of Fact Table

1. Transaction Fact Table

The transaction fact table is the most basic way of recording business operations. These fact
tables represent an event that occurs at a single point in time. A row exists in the fact table
for the customer or product only when a transaction occurs.

Many rows in a fact table may connect to the same customer or product, because a customer or
product can be involved in multiple transactions. Transaction data is usually easy to
structure in a dimensional framework. This lowest-level data is the most naturally
dimensional data and supports analyses that cannot be done on summarized data.

2. Snapshot Fact Table

The snapshot fact table describes the state of things at a particular time and contains many
semi-additive and non-additive facts.

Example: A daily balance fact can be summed across the customer dimension but not across
the time dimension.

Periodic snapshots capture the performance of the business at regular, predictable time
intervals. Unlike a transaction fact table, where we load a row for each event, with periodic
snapshots we take a picture of the activity at the end of the day, week, or month, and then
another picture at the end of the next period.

Example: Performance summary of a salesman during the previous month.

3. Accumulating Fact Table

The accumulating fact table is used to show the activity of a process that has a beginning and
an end.

For example, consider the processing of an order. An order remains in process until it has
been completed. As each step towards completing the order is finished, the corresponding row
in the fact table is updated.
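The update-in-place behaviour can be sketched as below. The milestone names (placed, shipped, delivered), the order number and the dates are hypothetical; the point is only that one row per order is revisited as the order progresses.

```python
from datetime import date

# One row per order; milestone dates start empty and are filled in over time.
order_facts = {}

def place_order(order_id, placed_on):
    """Insert the single fact row for a new order."""
    order_facts[order_id] = {"placed": placed_on, "shipped": None, "delivered": None}

def record_milestone(order_id, milestone, when):
    """Update the existing row instead of inserting a new one."""
    order_facts[order_id][milestone] = when

place_order(1001, date(2013, 9, 3))
record_milestone(1001, "shipped", date(2013, 9, 5))
record_milestone(1001, "delivered", date(2013, 9, 9))

row = order_facts[1001]
days_to_deliver = (row["delivered"] - row["placed"]).days
print(len(order_facts), days_to_deliver)  # 1 6
```

This contrasts with a transaction fact table, where each of the three events would have been loaded as a separate row.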

Factless Fact Tables

There are also transaction fact tables which contain no measures. We call these factless fact
tables. These tables are used to capture the occurrence of an event in the business process.
For example, a criminal case is a simple fact with no measures, but it can have a lot of
dimensional attributes associated with it.

DESIGN SUMMARY TABLE

Summary tables store data that is aggregated and/or summarized for performance reasons
(i.e., to improve the performance of business queries). Most business queries (i.e.,
approximately 80%) will run against summary tables.
Data is aggregated by combining multiple concepts together and/or combining large amounts
of detailed data together. Most business queries analyze a summarization or aggregation of
data (i.e., facts) across one or more dimensions. Therefore, a summary table may use
multiple dimensions. For example, a table that analyzes accounts by region by customer by
service by month uses four dimensions.

Design Considerations

The main objective when designing summary tables is to minimize the amount of data being
accessed and the number of tables being joined. This is done by storing intermediate query
results, such as:

1. Summaries of large amounts of data (e.g., summing product inventory by quarter),
2. Combinations of multiple concepts (e.g., sales by customer by market),
3. Reference data (e.g., product description).

Identify What to Aggregate

Examine highly used business queries and problem business queries (i.e., queries that are
slow or consume a lot of resources) and identify aggregation requirements. Define:

Fact data to be summarized. For example, a fact table has the following three
dimensions: Service, Geographical Location, and Time. There are multiple queries that
aggregate the facts by month. The summary table created to meet this requirement has the
following dimensions: Service, Geographical Location, and Month. The summary table time
dimension contains a month (i.e., MM) instead of a date (i.e., YYMMDD). This summary
table reduces the number of rows to be read and the length of the rows.
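The Service, Geographical Location and Month example can be sketched in Python as follows. The fact rows are invented sample data, and the month is kept as MM exactly as described above; a summary table spanning more than one year would also need to keep the year.

```python
from collections import defaultdict

# Hypothetical fact rows: (service, location, date as YYMMDD, amount)
fact = [
    ("broadband", "south", "130803", 40.0),
    ("broadband", "south", "130817", 60.0),
    ("broadband", "south", "130903", 25.0),
    ("mobile", "west", "130903", 10.0),
]

# Summary table keyed by (service, location, month).
summary = defaultdict(float)
for service, location, yymmdd, amount in fact:
    month = yymmdd[2:4]   # keep MM, drop the day (and, as in the text, the year)
    summary[(service, location, month)] += amount

for key in sorted(summary):
    print(key, summary[key])
```

Four fact rows collapse into three summary rows here, and each summary key is shorter than the original date.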

Attributes (i.e., metrics) in the fact tables that should be aggregated. Examine problem
queries and identify the attributes that are aggregated by the queries. The aggregation will
most probably be across multiple dimensions, and multiple attributes from the same fact table
will also probably have to be aggregated.

Related facts to be aggregated into the same summary table. Examine problem business
queries and for each query identify the facts (i.e., from different fact tables) that are
aggregated by the query. Aggregating multiple facts into the same summary table improves
performance by replacing multiple queries or a complex query containing multiple unions
with a simple query.

Identify How Much to Aggregate

For each summary table, select the degree of aggregation required. Once facts are aggregated
to a certain level of detail, any finer detail is no longer available within that summary table.

To ensure greater flexibility, a rule of thumb is to aggregate to one level of detail greater than
what is required and aggregate up a level when the queries run. However, this approach
should not be used if the number of rows to be aggregated (when the queries are run) will be
large.
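As an illustration of this rule of thumb, suppose the queries need quarterly figures. The summary table is then stored one level finer, by month, and rolled up to quarters when a query runs. The data and the helper names (quarter_of, quarterly_totals) below are assumptions for the sketch.

```python
from collections import defaultdict

# Hypothetical summary table stored at month level: (service, month) -> amount
monthly_summary = {
    ("broadband", 7): 90.0,
    ("broadband", 8): 100.0,
    ("broadband", 9): 25.0,
    ("broadband", 10): 50.0,
}

def quarter_of(month):
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1

def quarterly_totals(summary):
    """Aggregate up one level (month -> quarter) when the query runs."""
    totals = defaultdict(float)
    for (service, month), amount in summary.items():
        totals[(service, quarter_of(month))] += amount
    return dict(totals)

print(quarterly_totals(monthly_summary))
# {('broadband', 3): 215.0, ('broadband', 4): 50.0}
```

Monthly questions can still be answered from the same table, at the price of a small aggregation step on every quarterly query.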

Select the Level of Denormalization

Summary tables are recreated on a regular basis, therefore including dimension data in
summary tables is not an issue. To limit the number of table joins (i.e., summary table to
dimension tables), a rule of thumb is to use real world keys in summary tables and not use
generated keys. This approach should not be used if the summarization level of the summary
table is low (i.e., the summary table contains a lot of rows). Another rule of thumb that
minimizes joins is to always store physical dates in summary tables.

Design Indexes

To maximize the performance (and use) of summary tables, the rule of thumb is to index all
access paths to a summary table. Summary tables (i.e., aggregations) will change regularly.
The design, creation, and maintenance of summary tables should be assigned to someone
who is end-user-oriented and doesn't mind change.

It is very easy to create a lot of summary tables. Too many summary tables will increase costs
(e.g., disk space, resources to create the tables, resources to maintain the tables).

Consider the following when deciding to create summary tables:

1. Store multiple occurrences within a level in one summary table instead of separate
summary tables for each level (e.g., store monthly transactions in one summary table instead
of creating a separate table for each month).

2. Aggregating multiple facts into one summary table considerably increases the row size of
the summary table; therefore only use this approach for highly used queries.

3. Aggregate summary tables to one level of detail greater than the level required.
