0% found this document useful (0 votes)
69 views

Database Normalization

Normalization is the process of organizing data in a database to avoid redundancy and anomalies when data is inserted, updated, or deleted. It involves decomposing tables to eliminate redundant data and dependencies between non-key attributes. This is done through analyzing attribute dependencies and decomposing the table into smaller, normalized tables in at least first normal form with a single, unique key in each table. The normalized tables are then related through joins while ensuring no information is lost from the original table.

Uploaded by

Anabelle
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
69 views

Database Normalization

Normalization is the process of organizing data in a database to avoid redundancy and anomalies when data is inserted, updated, or deleted. It involves decomposing tables to eliminate redundant data and dependencies between non-key attributes. This is done through analyzing attribute dependencies and decomposing the table into smaller, normalized tables in at least first normal form with a single, unique key in each table. The normalized tables are then related through joins while ensuring no information is lost from the original table.

Uploaded by

Anabelle
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Normalization

COURSE 4: Databases
Normalization
when and why

Databases C5: Normal Forms


Normalization
• Informal:
• Organize data in a relational database in order to avoid redundancy and data
manipulation anomalies.

• Decompose a relation (table) without loosing information.

Databases C5: Normal Forms


Incorrect
Incorrect ER
table structure

INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY

Databases C5: Normal Forms


Incorrect
Incorrect ER
table structure

decomposition /
synthesis
----
normalization
INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY

Databases C5: Normal Forms


Normalization
• Avoid redundancy

Databases C5: Normal Forms


Normalization CUSTOMER
CUSTOMER_ID LAST_NAME … ….
1 Smith ... ….
• Avoid redundancy
2 Green … ….
3 Avery ... ….

LOAN

LOAN_ID CUSTOMER_ID AMOUNT DATE


101 1 125000 18/04/21
102 1 25000 14/04/22
103 2 12500 03/05/21
127 2 20000 …
389 3 75000 …

Databases C5: Normal Forms


Normalization
• Avoid redundancy

Databases C5: Normal Forms


Normalization
• Avoid redundancy
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000

Databases C5: Normal Forms


Normalization
• INSERT anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark ??? null null

Databases C5: Normal Forms


Normalization
• UPDATE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000

Databases C5: Normal Forms


Normalization
• DELETE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark 700 mortgage 45000

Databases C5: Normal Forms


Table with
redundancy/not in a
normal form

Analyze
dependencies

“good”
decomposition

table in normal form table in normal form

Databases C5: Normal Forms


Decomposition

Databases C5: Normal Forms


Decomposition Step 1: Projection
NAME LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝑁𝐴𝑀𝐸,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅 Smith 101 mortgage 125000
Smith 102 credit card 25000
Green 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
Smith 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith

Databases C5: Normal Forms


Decomposition Step 2: Join
CUSTOMER_ID NAME NAME LOAN_ID TYPE AMOUNT
1 Smith Smith 101 mortgage 125000
1 Smith Smith 102 credit card 25000
2 Green Green 103 credit card 12500
3 Smith Smith 389 mortgage 75000

CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT

• Lossy decomposition 1 Smith 101 mortgage 125000


1 Smith 102 credit card 25000
S1 ⋈ S2 ⊇ 𝑅
1 Smith 389 mortgage 75000
3 Smith 101 mortgage 125000
3 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500

Databases C5: Normal Forms


Decomposition Step 1: Projection
CUSTOMER_ID LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅1 101 mortgage 125000
1 102 credit card 25000
2 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
3 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith

Databases C5: Normal Forms


Decomposition Step 2: Join
CUSTOMER_ID NAME CUSTOMER_ID LOAN_ID TYPE AMOUNT
1 Smith 1 101 mortgage 125000
1 Smith 1 102 credit card 25000
2 Green 2 103 credit card 12500
3 Smith 3 389 mortgage 75000

CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT


• Lossless decomposition 1 Smith 101 mortgage 125000
S1 ⋈ S2 = 𝑅 1 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500

Databases C5: Normal Forms


Decomposition
• lossy decompositions and lossless decompositions.

• Lossy: R → decompose(R): S1, S2 → recompose(S1,S2) ⊇ R

lossy =/= less data, (less is more!)


lossy = lost information

• Lossless R → decompose(R): S1, S2 → recompose(S1,S2) = R

Databases C5: Normal Forms


Decomposition
• Lossy
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 ⊇ R

• Lossless
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 = R

Databases C5: Normal Forms


Functional dependencies

Databases C5: Normal Forms


Functional dependencies
CUSTOMER_ID NAME X Y Z T
1 Smith X1 Y1 Z1 T1
1 Smith X1 Y2 Z1 T2
2 Green X2 Y2 Z2 T2
3 Smith X2 Y3 Z2 T3
X3 Y3 Z2 T4

• CUSTOMER_ID -> NAME


• X -> Z
• Z --/--> X
• X -> X

Databases C5: Normal Forms


Normal Forms

Databases C5: Normal Forms


NF1 NF2 NF3 BCNF NF4 NF5

Databases C5: Normal Forms


NF1
ATOMIC ATTRIBUTES

Databases C5: Normal Forms


NF1
• Atomic attributes
• No multi-valued attributes

• The domain of each attribute contains only atomic values and each
attribute contains only a value of its domain.

• A relational database is at least in NF1

Databases C5: Normal Forms


NF1
EMP_ID NAME EMAIL
1 Williams [email protected] Optional:
Composed key
[email protected] Artificial key
2 Davis [email protected]
[email protected]
3 Miller [email protected] EMAIL_ID EMP_ID EMAIL
4 Stewart [email protected] 1 1 [email protected]
[email protected] 2 1 [email protected]
3 2 [email protected]
4 2 [email protected]
5 3 [email protected]
6 4 [email protected]
7 4 [email protected]

Databases C5: Normal Forms


NF2
NO PARTIAL DEPENDENCIES

Databases C5: Normal Forms


NF2
• Tables in NF1
• No non-key attributes (not part of the key) that depend on a subset of
the attributes forming the key.

• There are no partial dependencies.

Databases C5: Normal Forms


Functional dependencies
X Y Z T X Y Z T
X1 Y1 Z1 T1 X1 Y1 Z1 …
X2 Y1 Z1 T2 X2 Y1 Z1 …
X2 Y2 Z2 T3 X2 Y2 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …

• partial (X,Y) → Z
• Y →Z

Databases C5: Normal Forms


Functional dependencies
X Y Z T
X Y Z T … Y1 … T1
X1 Y1 … T1 … Y1 … T2
X2 Y1 … T2 … … … …
X2 Y2 … T3 … … … …
X2 Y3 … T3 … … … …
X2 Y3 … T3

X Y Z T

• total (X,Y) → T … … … …
X2 … … T2
• X -/-> T
X2 … … T3
• Y -/-> T
… … … …
… … … …

Databases C5: Normal Forms


Functional dependencies

AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE


1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

K1,K2 Y K1 X

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

K1 = AIRPLANE_ID
K2 = AIRPORT_ID, DEPARTURE K1,K2 Y K1 X
Y = BOARDING_GATE
X = AIRPLANE_MODEL

Databases C5: Normal Forms


AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23

AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID AIRPLANE_MODEL


1 101 30/03/20 17:00 42 101 Boeing 777
1 102 02/05/20 09:30 50 102 Airbus A320
2 201 06/08/20 10:45 35 201 Boeing 757
2 202 10/10/20 06:20 10 202 Airbus A320
1 101 06/04/20 16:35 23

Databases C5: Normal Forms


NF3
NO TRANSITIVE DEPENDENCIES

Databases C5: Normal Forms


NF3
• Tables in NF2
• Non-key attributes (not part of the key) depend on the entire key and
only on the key.

• There are no transitive dependencies.

Databases C5: Normal Forms


AIRPORT_ID AIRPLANE_ID DEPARTURE MODEL CAPACITY REVISION_DATE BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 451 01/01/2021 42
1 102 02/05/20 09:30 Airbus A320 150 01/03/2020 50
2 201 06/08/20 10:45 Boeing 757 295 03/05/2020 35
2 202 10/10/20 06:20 Airbus A320 150 04/06/2021 10
1 101 06/04/20 16:35 Boeing 777 451 08/09/2020 23

AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID MODEL CAPACITY REVISION_DATE


1 101 30/03/20 17:00 42
101 Boeing 777 451 01/01/2021
1 102 02/05/20 09:30 50
102 Airbus A320 150 01/03/2020
2 201 06/08/20 10:45 35
201 Boeing 757 259 03/05/2020
2 202 10/10/20 06:20 10
202 Airbus A320 150 04/06/2021
1 101 06/04/20 16:35 23

Databases C5: Normal Forms


AIRPLANE_ID MODEL CAPACITY REVISION_DATE

101 Boeing 777 451 01/01/2021


102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

K X, Z X Y

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

K = AIRPLANE_ID
X = AIRPLANE_MODEL K X, Z X Y
Y = CAPACITY
Z= REVISION_DATE

Databases C5: Normal Forms


AIRPLANE_ID MODEL CAPACITY REVISION_DATE

101 Boeing 777 451 01/01/2021


102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021

AIRPLANE_ID MODEL REVISION_DATE


MODEL CAPACITY
101 Boeing 777 01/01/2021
Boeing 777 451
102 Airbus A320 01/03/2020
Airbus A320 150
201 Boeing 757 03/05/2020
Boeing 757 259
202 Airbus A320 04/06/2021

Databases C5: Normal Forms

You might also like