0% found this document useful (0 votes)
2 views31 pages

Normalization

The document provides an overview of relational databases, focusing on the structure of relations, the importance of primary keys, and the concepts of normalization. It explains various types of keys, including super keys and candidate keys, and outlines the normalization process through its different forms to eliminate data anomalies. Additionally, it emphasizes the need to identify entities and their relationships for effective database design.

Uploaded by

RishabhRao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views31 pages

Normalization

The document provides an overview of relational databases, focusing on the structure of relations, the importance of primary keys, and the concepts of normalization. It explains various types of keys, including super keys and candidate keys, and outlines the normalization process through its different forms to eliminate data anomalies. Additionally, it emphasizes the need to identify entities and their relationships for effective database design.

Uploaded by

RishabhRao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 31

BAN 610 Database Management

Instructor:
Peng Xie

Department of
Management
Relation

A relation is a named, two-dimensional table of data which consists of:

- rows, also called records.


- named columns, also called attributes or fields.

Example:
EMPLOYEE_REG

employee_id last_name first_name department registration


123456789 Gruber Stacey IT Part time
111222333 Clayton John marketing Full time
Relation

A relation is a named, two-dimensional table of data which consists of:


Example:
EMPLOYEE_REG

employee_id last_name first_name department registration


123456789 Gruber Stacey IT Part time
111222333 Clayton John marketing Full time

1. The table must have a unique name


2. Every record (row) must be unique (can’t allow two rows with exactly the same values
for all their attributes)
3. Attributes (columns) in tables must have unique names
4. The order of the columns must be irrelevant
5. The order of the rows must be irrelevant
Relational Databases
Primary Key

- Primary Key
- An attribute
- Uniquely identify a record
- Single Key
- Composite Key

- Example
- Student ID
- SSN
Example of Primary Keys

CUSTOMERS

Single Key

SALES

Composite Key
More on keys

- Super keys
- The set of all combination of attributes that helps uniquely identify an entity
- Suppose we have the table

Emp_SSN EID Emp_name


1234556 332 A
3323234 232 B
3323232 223 A
1234145 321 C
- The super keys can be
- {Emp_SSN}
- {EID}
- {Emp_SSN, EID}
- {Emp_SSN,Emp_name}
- {EID, Emp_name}

6
More on keys

- Candidate keys
- The minimal super key with no redundant value
Emp_SSN EID Emp_name
1234556 332 A
3323234 232 B
3323232 223 A
1234145 321 C

- The candidate keys can be


- {Emp_SSN}
- {EID}

- Primary key
- A primary key is selected from candidate keys

7
More on keys

- Another example of super keys and candidate keys


- Consider the following table
Classroom Reserved Duration Type
1 10:00 1 Dissertation
1 11:00 2 Dissertation
1 14:00 3 Conference
2 12:00 1.5 Presentation
2 14:00 2.5 Presentation
2 17:00 3.5 Lecture
- The super keys are
- {Classroom, Reserved},
- {Classroom, Reserved, Type}
- {Classroom, Reserved, Duration}
- {Classroom, Reserved, Duration, Type}
- Why {Classroom, Duration} is not listed?

8
More on keys

- Another example of super keys and candidate keys


- Consider the following table
Classroom Reserved Duration Type
1 10:00 1 Dissertation
1 11:00 2 Dissertation
1 14:00 3 Conference
2 12:00 1.5 Presentation
2 14:00 2.5 Presentation
2 17:00 3.5 Lecture
- The candidate keys are
- {Classroom, Reserved}

9
Normalization

- The purpose of Normalization


- To fully get rid of the above-mentioned anomalies

- 5 forms of normalization
- 1 NF
- 2 NF
- 3 NF
- 3.5NF (BCNF)
- 4 NF
- 5 NF

- Each form is stricter than the last form


Relational Databases Normalization

- Normalization goal:
- Reduce redundancies to a minimum so that data updating
anomalies are removed without affecting information
retrievability.
Functional Dependency

- Notation
- A relation is denoted r(R), r is the name of the relation, R is the
set of all attributes.
- In the following example, the R is (A,B,C,D)

- Super key is a subset K of R, so that for any two tuples in r(R), 𝑡1


and 𝑡2 , if 𝑡1 ≠ 𝑡2 , we must have 𝑡1 𝐾 ≠ 𝑡2 [𝐾].
- e.g. in the above example, one super key is (A,B), so for any two
tuples such as 𝑡1 = (𝑎1 , 𝑏1 , 𝑐1 , 𝑑1 ), and 𝑡2 = (𝑎1 , 𝑏2 , 𝑐1 , 𝑑2 ). If 𝑡1 ≠
𝑡2 , we must have

𝑡1 𝐾 = 𝑡1 𝐴, 𝐵 = 𝑎1 , 𝑏1 ≠ 𝑡2 𝐾 = 𝑡1 𝐴, 𝐵 = (𝑎1 , 𝑏2 )
Functional Dependency

- Functional dependency
- For r(R), if it satisfies the functional dependency 𝛼 → 𝛽, then
for any pairs of tuples, 𝑡1 and 𝑡2 , we must have:
- if 𝑡1 𝛼 = 𝑡2 𝛼 , then 𝑡1 𝛽 = 𝑡2 𝛽
- In the previous example

- Functional dependency A → 𝐶 satisfied, since whenever column


A is the same, the column C must be the same as well.
- However, C → 𝐴 is not satisfied.
- Does the following hold?
- BC →AD?
- BD →AC?
Functional Dependency

- Functional dependency is logical


- Consider the following relation
Building Room number Capacity
A 121 120
B 203 20
C 302 10
D 121 120
D 231 130

- Just from the current instances in the relation, we have


- Room number-> capacity
- But in the future, when we add more data to this table, it is very
likely that the same room number in different building have
different capacity.
- So, logically, room number -> capacity doesn’t hold.
- Remember, functional dependency is always logical.
Normalization

- NF1
- Each column must be single value (atomic value)
- Each column should store the same type value
- Each column must have a unique name
- Order of rows are irrelevant
Emp_id Last_ name First_ name Dept Salary Hire_ date Training_ course Course_name
10020 Marvin John Mktg 65000 38047 101,205 A,B
10300 Carter Michael Sales 60000 38384 103,210 C,D
20139 Gates Susan Acctg 62000 38777 202 E
20040 Sanchez Jose Finance 58000 39295 210 D
21113 Li Ping Ops 55000 39692 207 F

- The Training_course and the Course_name contain double values


- Violating the rules
Normalization

- NF1
- We break down these columns into multiple columns to achieve NF1

Emp_id Last_ name First_ name Dept Salary Hire_ date Training_ course Course_date Course_name
10020 Marvin John Mktg 65000 38047 101 38078 A
10020 Marvin John Mktg 65000 38047 205 38231 B
10300 Carter Michael Sales 60000 38384 103 38565 C
10300 Carter Michael Sales 60000 38412 210 38930 D
20139 Gates Susan Acctg 62000 38777 202 39326 E
20040 Sanchez Jose Finance 58000 39295 210 39661 D
21113 Li Ping Ops 55000 39692 207 40057 F
Normalization

- NF2
- NF1 is satisfied
- Partial dependency does not exist
- But what is partial dependency?

- Example of partial dependency


- The primary key for this table is Emp_id and Training_course
- The non-key attribute first_name depends only on Emp_id
- Or, given the Emp_id, the first_name is known for sure
- But the first_name has nothing to do with Training_course
- Or, For the training_course 210, there are two different first_name.
- A non-key attribute only depend on one of the keys -> partial dependency
- Solution -> split table
Emp_id Last_ name First_ name Dept Salary Hire_ date Training_ course Course_name
10020 Marvin John Mktg 65000 38047 101 A
10020 Marvin John Mktg 65000 38047 205 B
10300 Carter Michael Sales 60000 38384 103 C
10300 Carter Michael Sales 60000 38412 210 D
20139 Gates Susan Acctg 62000 38777 202 E
20040 Sanchez Jose Finance 58000 39295 210 D
21113 Li Ping Ops 55000 39692 207 F
Normalization

- NF3
- NF2 satisfied
- No transitive dependency exists
- But what is transitive dependency?

- Example of transitive dependency


- In the following table, the primary key is Emp_id
- Since there is only one key in primary key, this table is in NF2
- However, we have Emp_ID -> Emp_state -> Emp -> Country
- So Emp_Country transitively dependant on the Emp_ID
- Solution -> split table
Emp_id name Emp_state Emp_country age
10020Marvin CA US 65000
10300Carter GA US 60000
20040Sanchez TX US 58000
21113Li NY US 55000
Normalization

- NF3.5 (Boyce-Codd normalization form)


- NF3 satisfied
- Non-prime attributes don’t determine one of the keys
- This is uncommon

- Example
- In the following table the primary key is Person and Shop Type
- The nearest shop depends on both person and shop type -> 2NF
- There is only one non-key attribute -> no transitive dependency -> 3NF
- But nearest shop determines the shop type -> not NF3.5
- Solution -> separating to {person, shop} and {shop, shop type}
- But we sacrifice some query functionality

Person Shop Type Nearest Shop


David Optician Eagle Eye
David Hairdresser Snippets
Wright Bookshop Merlin Books
Fuller Bakery Doughy’s
Fuller Hairdresser Sweeney Todd’s
Fuller Optician Eagle Eye
Normalization

- NF4
- NF3.5 satisfied
- No multi-value dependency
- Example
- In the following table, John is associated with a set of hobbies (Ball, Singing) and a
set of phone numbers (1121, 1122)
- The two sets have more than one element
- And the Hobby and the Phone number are independent with each other
- This created the multi-value dependency problem
Emp_id Last_ name First_ name Hobby Phone number
10020Marvin John Ball 1121
10020Marvin John Singing 1122
- Imagine you could easily imply the following redundant records in the same table
Emp_id Last_ name First_ name Hobby Phone Number
10020Marvin John Ball 1122
10020Marvin John Singing 1121

- Solution -> split table


Normalization

- NF5
- NF4 satisfied
- No join dependency exist
- Join dependency
- Nothing depend on others logically -> no multivalued dependency -> NF4 satisfied
- If we decompose the relation into multiple smaller relationships, and then join
them, the result relationship should be exactly the same as the original relation,
then join dependency exists->not NF5, need table splitting. Otherwise, join
dependency does not exist->NF5.
- Suppose we have the following relation
Buyer Seller Lender
Smith Jones BOA
Smith Wilson Chase
Nelson Jones Chase

- We can decompose the relation into three


Normalization

- The three smaller relations


- Buyer and Seller
Buyer Seller
Smith Jones
Smith Wilson
Nelson Jones

- Buyer and lender


Buyer Lender
Smith BOA
Smith Chase
Nelson Chase

- Seller and lender


Seller Lender
Jones BOA
Wilson Chase
Jones Chase
Normalization

- Join them together again.


- not the original relation -> no join dependency -> if we split into three relations,
we loss information -> NF5 is already satisfied.

Buyer Seller Lender


Smith Jones BOA
Smith Jones Chase
Smith Wilson Chase
Nelson Jones Chase
Normalization

- Example of Join Dependency.

Subject Professor Semester

Maths 2 S 2
Data structure M 3
Deep learning A 5
Data structure S 4
Deep learning T 5
Computer network K 2
Digital logical design A 5

- Please verify
Normalization

- If we follow these rules specifically to normalize a table, it is going to take a long time
- Follow these steps instead
- identify entities in the stored information
- An entity is a person, place, object, event, concept about whom/which
the organization/user wishes to maintain data
- An attribute is NOT an entity but a feature of an entity.
- Example: employees and training courses are entities. An employee name
or a training course number are attributes.
- Find out the relationship between these entities
- One-to-one capital city and country
- One-to-many customer and orders
- Many-to-many students and courses
- Split the tables according to entities
- Each table should only include the attributes regarding the entity
- Attributes of an instance of the entity should not change over time
- For many to many relationship, add an association entity
- The association entity will contain both primary keys of the two entities
and any attributes regarding to the interaction of the two entity
Example of Normalization

- The First normal form (1NF)


- A table in which the intersection of every column and record contains only
one value
Example of Normalization

- The First normal form (1NF)


- A table in which the intersection of every column and record contains only
one value
Example of Normalization

- The Second form (2NF)


Example of Normalization

- The Second form (2NF)


Example of Normalization

- The Second form (3NF)


- table that is in 1NF and 2NF and in which all non-primary-key column can be
worked out from only the primary key column(s) and no other columns.
Example of Normalization

- The Second form (3NF)

You might also like