PostgreSQL Data Base Design Part 2

The document discusses database design focusing on star and snowflake schemas, highlighting their structure and differences. It explains normalization as a technique to reduce redundancy and improve data integrity, detailing various normal forms and their implications. Additionally, it addresses data anomalies that can arise from insufficient normalization and emphasizes the importance of balancing normalization with query complexity.

Uploaded by

sleonsalome

Star and snowflake schema
DATABASE DESIGN

Lis Sulmont
Curriculum Manager
Star schema
Dimensional modeling: star schema

Fact tables
Hold records of a metric
Change regularly
Connect to dimensions via foreign keys

Dimension tables
Hold descriptions of attributes
Do not change as often

Example:
Supply books to stores in USA and Canada
Keep track of book sales
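
The fact and dimension tables described above can be sketched for the book sales example as follows. This is a minimal illustration, not the course's actual schema: it runs on SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the title, country, and sales_id columns and all sample rows are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- Dimension tables: descriptive attributes that change rarely
CREATE TABLE dim_book_star  (book_id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE dim_store_star (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: one row per sale, linked to the dimensions by foreign keys
CREATE TABLE fact_booksales (
    sales_id INTEGER PRIMARY KEY,
    book_id  INTEGER NOT NULL REFERENCES dim_book_star(book_id),
    store_id INTEGER NOT NULL REFERENCES dim_store_star(store_id),
    quantity INTEGER NOT NULL
);

-- Made-up sample rows
INSERT INTO dim_book_star VALUES (1, 'Kindred', 'Octavia E. Butler');
INSERT INTO dim_store_star VALUES (10, 'Vancouver', 'Canada'), (11, 'Seattle', 'USA');
INSERT INTO fact_booksales (book_id, store_id, quantity)
    VALUES (1, 10, 3), (1, 10, 2), (1, 11, 4);
""")

# The metric lives in the fact table; the filter comes from a dimension
vancouver_total = conn.execute("""
    SELECT SUM(f.quantity)
    FROM fact_booksales AS f
    INNER JOIN dim_store_star AS s ON f.store_id = s.store_id
    WHERE s.city = 'Vancouver'
""").fetchone()[0]
print(vancouver_total)
```

The fact table grows with every sale, while the two dimension tables only change when a book or store is added.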

Star schema example

Snowflake schema (an extension)

Same fact table, different dimensions

Star schemas: extend one dimension (each dimension is a single table)
Snowflake schemas: extend more than one dimension (dimension tables link to further tables)

Because dimension tables are normalized

What is normalization?
Database design technique
Divides tables into smaller tables and connects them via relationships

Goal: reduce redundancy and increase data integrity

Identify repeating groups of data and create new tables for them

Book dimension of the star schema

Most likely to have repeating values:

Author
Publisher

Genre
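
As a rough sketch of what normalizing one of these repeating columns looks like, the example below moves the author out of the star-schema book dimension into its own table. The table names follow the queries used later in this deck; the title column and the placeholder rows are assumptions, and SQLite (via Python's sqlite3) stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-schema dimension: the author name repeats on every book row
CREATE TABLE dim_book_star (book_id INTEGER PRIMARY KEY, title TEXT, author TEXT);
INSERT INTO dim_book_star VALUES
    (1, 'Book A', 'Author X'), (2, 'Book B', 'Author X'), (3, 'Book C', 'Author Y');

-- One row per distinct author
CREATE TABLE dim_author_sf (author_id INTEGER PRIMARY KEY, author TEXT UNIQUE);
INSERT INTO dim_author_sf (author)
    SELECT DISTINCT author FROM dim_book_star ORDER BY author;

-- Books now point at the author row instead of repeating the name
CREATE TABLE dim_book_sf (
    book_id   INTEGER PRIMARY KEY,
    title     TEXT,
    author_id INTEGER REFERENCES dim_author_sf(author_id)
);
INSERT INTO dim_book_sf
    SELECT b.book_id, b.title, a.author_id
    FROM dim_book_star AS b
    INNER JOIN dim_author_sf AS a ON b.author = a.author;
""")

author_rows = conn.execute("SELECT COUNT(*) FROM dim_author_sf").fetchone()[0]
book_rows = conn.execute("SELECT COUNT(*) FROM dim_book_sf").fetchone()[0]
print(author_rows, book_rows)
```

Publisher and genre could be split out the same way, each into its own table referenced by an id.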

Book dimension of the snowflake schema

Store dimension of the star schema

City

State
Country

Store dimension of the snowflake schema

Normalized and denormalized databases

Back to our book store example
Denormalized: star schema
Normalized: snowflake schema

Denormalized query
Goal: get quantity of all Octavia E. Butler books sold in Vancouver in Q4 of 2018

SELECT SUM(quantity)
FROM fact_booksales
-- Join to get city
INNER JOIN dim_store_star ON fact_booksales.store_id = dim_store_star.store_id
-- Join to get author
INNER JOIN dim_book_star ON fact_booksales.book_id = dim_book_star.book_id
-- Join to get year and quarter
INNER JOIN dim_time_star ON fact_booksales.time_id = dim_time_star.time_id
WHERE dim_store_star.city = 'Vancouver'
  AND dim_book_star.author = 'Octavia E. Butler'
  AND dim_time_star.year = 2018
  AND dim_time_star.quarter = 4;

7600

Total of 3 joins

Normalized query
SELECT SUM(fact_booksales.quantity)
FROM fact_booksales
-- Join to get city
INNER JOIN dim_store_sf ON fact_booksales.store_id = dim_store_sf.store_id
INNER JOIN dim_city_sf ON dim_store_sf.city_id = dim_city_sf.city_id
-- Join to get author
INNER JOIN dim_book_sf ON fact_booksales.book_id = dim_book_sf.book_id
INNER JOIN dim_author_sf ON dim_book_sf.author_id = dim_author_sf.author_id
-- Join to get year and quarter
INNER JOIN dim_time_sf ON fact_booksales.time_id = dim_time_sf.time_id
INNER JOIN dim_month_sf ON dim_time_sf.month_id = dim_month_sf.month_id
INNER JOIN dim_quarter_sf ON dim_month_sf.quarter_id = dim_quarter_sf.quarter_id
INNER JOIN dim_year_sf ON dim_quarter_sf.year_id = dim_year_sf.year_id
WHERE dim_city_sf.city = 'Vancouver'
  AND dim_author_sf.author = 'Octavia E. Butler'
  AND dim_year_sf.year = 2018
  AND dim_quarter_sf.quarter = 4;

sum
7600

Total of 8 joins

So, why would we want to normalize a database?

Normalization saves space

Denormalized databases enable data redundancy


Normalization eliminates data redundancy

Normalization ensures better data integrity

1. Enforces data consistency

Must respect naming conventions because of referential integrity, e.g., 'California', not 'CA' or 'california'

2. Safer updating, removing, and inserting

Less data redundancy = fewer records to alter

3. Easier to redesign by extending

Smaller tables are easier to extend than larger tables
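
The consistency point can be illustrated with a foreign key: once state names live in their own table, a store row can only use a spelling that exists there. The sketch below is an assumption-laden illustration (dim_state and dim_store are hypothetical names, and SQLite via Python's sqlite3 stands in for PostgreSQL, where foreign keys are always enforced).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_state (state TEXT PRIMARY KEY);
CREATE TABLE dim_store (
    store_id INTEGER PRIMARY KEY,
    state    TEXT NOT NULL REFERENCES dim_state(state)
);
INSERT INTO dim_state VALUES ('California');
INSERT INTO dim_store VALUES (1, 'California');
""")

# 'CA' is not a row in dim_state, so the foreign key rejects the insert
try:
    conn.execute("INSERT INTO dim_store VALUES (2, 'CA')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

In a denormalized table with a free-text state column, the same insert would succeed and leave two spellings of one state.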

Database normalization
Advantages
Normalization eliminates data redundancy: save on storage

Better data integrity: accurate and consistent data

Disadvantages
Complex queries require more CPU

Remember OLTP and OLAP?
| OLTP | OLAP |
|------------------------------------------------|---------------------------------------|
| e.g., Operational databases | e.g., Data warehouses |
| Typically highly normalized | Typically less normalized |
| Write-intensive | Read-intensive |
| Prioritize quicker and safer insertion of data | Prioritize quicker queries for analytics |

Normal forms
Normalization
Identify repeating groups of data and create new tables for them

A more formal definition:

The goals of normalization are to:

Be able to characterize the level of redundancy in a relational schema

Provide mechanisms for transforming schemas in order to remove redundancy

1 Database Design, 2nd Edition by Adrienne Watt

Normal forms (NF)
Ordered from least to most normalized:

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Elementary key normal form (EKNF)
Boyce-Codd normal form (BCNF)
Fourth normal form (4NF)
Essential tuple normal form (ETNF)
Fifth normal form (5NF)
Domain-key normal form (DKNF)
Sixth normal form (6NF)

1 https://en.wikipedia.org/wiki/Database_normalization

1NF rules
Each record must be unique - no duplicate rows
Each cell must hold one value

Initial data

| Student_id | Student_Email | Courses_Completed |
|------------|-----------------|----------------------------------------------------------|
| 235 | [email protected] | Introduction to Python, Intermediate Python |
| 455 | [email protected] | Cleaning Data in R |
| 767 | [email protected] | Machine Learning Toolbox, Deep Learning in Python |

In 1NF
| Student_id | Student_Email |
|------------|-----------------|
| 235 | [email protected] |
| 455 | [email protected] |
| 767 | [email protected] |

| Student_id | Completed |
|------------|--------------------------|
| 235 | Introduction to Python |
| 235 | Intermediate Python |
| 455 | Cleaning Data in R |
| 767 | Machine Learning Toolbox |
| 767 | Deep Learning in Python |
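
One way to get from the initial data to the 1NF tables is to split each multi-valued cell into one row per value. The sketch below does the split in Python for clarity; the table names students_raw and completed are hypothetical, the email column is left out, and SQLite via sqlite3 stands in for PostgreSQL (where string_to_array with unnest could do the split in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Violates 1NF: one cell holds several course names
CREATE TABLE students_raw (student_id INTEGER PRIMARY KEY, courses_completed TEXT);
INSERT INTO students_raw VALUES
    (235, 'Introduction to Python, Intermediate Python'),
    (455, 'Cleaning Data in R'),
    (767, 'Machine Learning Toolbox, Deep Learning in Python');

-- 1NF target: one course per row, and no duplicate rows
CREATE TABLE completed (
    student_id INTEGER,
    course     TEXT,
    PRIMARY KEY (student_id, course)
);
""")

# Split each comma-separated cell into individual rows
for student_id, courses in conn.execute("SELECT * FROM students_raw").fetchall():
    for course in courses.split(","):
        conn.execute("INSERT INTO completed VALUES (?, ?)", (student_id, course.strip()))

rows = conn.execute(
    "SELECT student_id, course FROM completed ORDER BY student_id, course"
).fetchall()
print(rows)
```

The composite primary key on (student_id, course) covers the "no duplicate rows" rule, and the split covers "one value per cell".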

2NF
Must satisfy 1NF AND
If the primary key is one column, then the table automatically satisfies 2NF
If there is a composite primary key, then each non-key column must depend on all of the key columns

Initial data

| Student_id (PK) | Course_id (PK) | Instructor_id | Instructor | Progress |
|-----------------|----------------|---------------|---------------|----------|
| 235 | 2001 | 560 | Nick Carchedi | .55 |
| 455 | 2345 | 658 | Ginger Grant | .10 |
| 767 | 6584 | 999 | Chester Ismay | 1.00 |

In 2NF
| Student_id (PK) | Course_id (PK) | Percent_Completed |
|-----------------|----------------|-------------------|
| 235 | 2001 | .55 |
| 455 | 2345 | .10 |
| 767 | 6584 | 1.00 |

| Course_id (PK) | Instructor_id | Instructor |
|----------------|---------------|---------------|
| 2001 | 560 | Nick Carchedi |
| 2345 | 658 | Ginger Grant |
| 6584 | 999 | Chester Ismay |
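
The 2NF split can be sketched as two tables: instructor columns depend only on Course_id, so they move to a course table, while the progress value depends on the full (Student_id, Course_id) key. The table names courses and progress and the second student (id 300) are assumptions for illustration; SQLite via sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Depends on course_id alone
CREATE TABLE courses (
    course_id     INTEGER PRIMARY KEY,
    instructor_id INTEGER,
    instructor    TEXT
);
-- Depends on the whole composite key (student_id, course_id)
CREATE TABLE progress (
    student_id        INTEGER,
    course_id         INTEGER REFERENCES courses(course_id),
    percent_completed REAL,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO courses VALUES (2001, 560, 'Nick Carchedi');
-- Student 300 is a made-up second enrollment in the same course
INSERT INTO progress VALUES (235, 2001, 0.55), (300, 2001, 0.20);
""")

# The instructor is stored once, however many students take the course
stored = conn.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
joined = conn.execute("""
    SELECT p.student_id, c.instructor
    FROM progress AS p
    INNER JOIN courses AS c ON p.course_id = c.course_id
    ORDER BY p.student_id
""").fetchall()
print(stored, joined)
```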

3NF
Satisfies 2NF

No transitive dependencies: non-key columns can't depend on other non-key columns

Initial Data

| Course_id (PK) | Instructor_id | Instructor | Tech |
|----------------|---------------|---------------|--------|
| 2001 | 560 | Nick Carchedi | Python |
| 2345 | 658 | Ginger Grant | SQL |
| 6584 | 999 | Chester Ismay | R |

In 3NF
| Course_id (PK) | Instructor | Tech |
|----------------|---------------|--------|
| 2001 | Nick Carchedi | Python |
| 2345 | Ginger Grant | SQL |
| 6584 | Chester Ismay | R |

| Instructor_id | Instructor |
|---------------|---------------|
| 560 | Nick Carchedi |
| 658 | Ginger Grant |
| 999 | Chester Ismay |
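
A possible 3NF layout in SQL is sketched below. Unlike the tables above, which link the two tables by instructor name, this sketch keeps Instructor_id in the course table as the foreign key; that is a design choice of the sketch, not something the slide states. Course 2002 is a made-up row, and SQLite via sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- The instructor name depends only on instructor_id
CREATE TABLE instructors (instructor_id INTEGER PRIMARY KEY, instructor TEXT);
-- Course rows carry the id, not the name, so nothing depends on a non-key column
CREATE TABLE courses (
    course_id     INTEGER PRIMARY KEY,
    instructor_id INTEGER REFERENCES instructors(instructor_id),
    tech          TEXT
);
INSERT INTO instructors VALUES (560, 'Nick Carchedi'), (658, 'Ginger Grant');
INSERT INTO courses VALUES (2001, 560, 'Python'), (2345, 658, 'SQL'), (2002, 560, 'Python');

-- A rename touches exactly one row
UPDATE instructors SET instructor = 'N. Carchedi' WHERE instructor_id = 560;
""")

# Both of instructor 560's courses see the new name through the join
names = conn.execute("""
    SELECT DISTINCT i.instructor
    FROM courses AS c
    INNER JOIN instructors AS i ON c.instructor_id = i.instructor_id
    WHERE c.instructor_id = 560
""").fetchall()
print(names)
```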

Data anomalies
What do we risk if we don't normalize enough?

1. Update anomaly

2. Insertion anomaly

3. Deletion anomaly

Update anomaly
Data inconsistency caused by data redundancy when updating

| Student_ID | Student_Email | Enrolled_in | Taught_by |
|------------|-----------------|-------------------------|---------------------|
| 230 | [email protected] | Cleaning Data in R | Maggie Matsui |
| 367 | [email protected] | Data Visualization in R | Ronald Pearson |
| 520 | [email protected] | Introduction to Python | Hugo Bowne-Anderson |
| 520 | [email protected] | Arima Models in R | David Stoffer |

To update student 520's email:

Need to update more than one record; otherwise, there will be inconsistency
The user doing the update needs to know about the redundancy
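
The update anomaly can be reproduced in a few lines. In this sketch the email addresses are placeholders (the real ones are not recoverable from the slide), the enrollments table name is an assumption, and SQLite via sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollments (student_id INTEGER, student_email TEXT, enrolled_in TEXT);
-- Student 520's email is stored once per enrollment
INSERT INTO enrollments VALUES
    (520, 'old@example.com', 'Introduction to Python'),
    (520, 'old@example.com', 'Arima Models in R');
""")

# Careless update: touches only one of student 520's two rows
conn.execute("""
    UPDATE enrollments SET student_email = 'new@example.com'
    WHERE student_id = 520 AND enrolled_in = 'Introduction to Python'
""")

# The same student now has two different emails on record
emails = conn.execute(
    "SELECT COUNT(DISTINCT student_email) FROM enrollments WHERE student_id = 520"
).fetchone()[0]
print(emails)
```

With the email held in a separate student table, the update would touch a single row and the inconsistency could not arise.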

Insertion anomaly
Unable to add a record due to missing attributes

| Student_ID | Student_Email | Enrolled_in | Taught_by |

Unable to insert a student who has signed up but not enrolled in any courses

Deletion anomaly
Deletion of record(s) causes unintentional loss of data

| Student_ID | Student_Email | Enrolled_in | Taught_by |

If we delete student 230, what happens to the data on Cleaning Data in R?
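
The question can be answered by trying it. The sketch below uses two rows from the table shown for the update anomaly, with the email column omitted; the enrollments table name is an assumption, and SQLite via sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrollments (student_id INTEGER, enrolled_in TEXT, taught_by TEXT);
INSERT INTO enrollments VALUES
    (230, 'Cleaning Data in R', 'Maggie Matsui'),
    (367, 'Data Visualization in R', 'Ronald Pearson');

-- Student 230 was the only student enrolled in this course
DELETE FROM enrollments WHERE student_id = 230;
""")

# The course and its instructor disappeared together with the student
remaining = conn.execute(
    "SELECT COUNT(*) FROM enrollments WHERE enrolled_in = 'Cleaning Data in R'"
).fetchone()[0]
print(remaining)
```

Because the course and its instructor were recorded only on the student's row, deleting the student also deletes the only record that the course exists.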

Data anomalies

The more normalized the database, the less prone it will be to data anomalies

Don't forget the downsides of normalization from the last video
