Normalization
Instructor: Peng Xie
Department of Management
Relation
Example:
EMPLOYEE_REG
- Primary Key
- An attribute, or a set of attributes, that uniquely identifies a record
- Single key: one attribute
- Composite key: two or more attributes together
- Examples of single keys
- Student ID
- SSN
Example of Primary Keys
CUSTOMERS
Single Key
SALES
Composite Key
More on keys
- Super keys
- A super key is any set of attributes that uniquely identifies a record; a table usually has several super keys
- Suppose we have the table
More on keys
- Candidate keys
- A candidate key is a minimal super key: removing any attribute from it breaks uniqueness
Emp_SSN EID Emp_name
1234556 332 A
3323234 232 B
3323232 223 A
1234145 321 C
- Primary key
- A primary key is selected from candidate keys
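The definitions above can be checked mechanically on the sample table. The following is a minimal sketch, assuming the four rows shown are the whole table; the function names are illustrative. Uniqueness in sample data only suggests a key, because a real key is a rule about every row the table may ever hold.

```python
from itertools import combinations

# Rows copied from the Emp_SSN / EID / Emp_name table above.
COLUMNS = ("Emp_SSN", "EID", "Emp_name")
ROWS = [
    ("1234556", "332", "A"),
    ("3323234", "232", "B"),
    ("3323232", "223", "A"),
    ("1234145", "321", "C"),
]

def is_unique(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def super_keys():
    """Every non-empty attribute set that is unique in the sample rows."""
    return [
        set(combo)
        for size in range(1, len(COLUMNS) + 1)
        for combo in combinations(COLUMNS, size)
        if is_unique(combo)
    ]

def candidate_keys():
    """Super keys that contain no smaller super key."""
    keys = super_keys()
    return [k for k in keys if not any(other < k for other in keys)]

print(super_keys())
print(candidate_keys())
```

On these rows there are six super keys and two candidate keys, {Emp_SSN} and {EID}. Emp_name alone is not a super key because the value A appears twice.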
Normalization
- Normal forms
- 1NF
- 2NF
- 3NF
- BCNF (sometimes called 3.5NF)
- 4NF
- 5NF
- Normalization goal:
- Reduce redundancies to a minimum so that data updating
anomalies are removed without affecting information
retrievability.
Functional Dependency
- Notation
- A relation is denoted r(R), where r is the name of the relation and R is the
set of all its attributes.
- In the following example, R is (A, B, C, D)
$t_1[K] = t_1[A,B] = (a_1, b_1) \neq t_2[K] = t_2[A,B] = (a_1, b_2)$
Functional Dependency
- Functional dependency
- If r(R) satisfies the functional dependency $\alpha \to \beta$, then
for any pair of tuples $t_1$ and $t_2$ in r we must have:
- if $t_1[\alpha] = t_2[\alpha]$, then $t_1[\beta] = t_2[\beta]$
- In the previous example
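The definition of a functional dependency translates directly into a check over pairs of tuples. Below is a minimal sketch; the rows over R = (A, B, C, D) are hypothetical, since the slide's example table is not reproduced in the text, and `satisfies_fd` is an illustrative name.

```python
def satisfies_fd(rows, alpha, beta):
    """Return True if the rows satisfy alpha -> beta.

    rows is a list of dicts; alpha and beta are lists of attribute names.
    Whenever two rows agree on alpha they must also agree on beta.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in alpha)
        right = tuple(row[b] for b in beta)
        # Remember the first beta value seen for this alpha value;
        # any later row with the same alpha must match it.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical sample rows (not from the slides).
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
]

print(satisfies_fd(rows, ["A"], ["C"]))       # A -> C
print(satisfies_fd(rows, ["A"], ["B"]))       # A -> B
print(satisfies_fd(rows, ["A", "B"], ["D"]))  # (A, B) -> D
```

In this sample A -> C and (A, B) -> D hold, while A -> B fails because the two rows with a1 carry different B values.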
- 1NF
- Each column must hold a single (atomic) value
- Each column must store values of the same type
- Each column must have a unique name
- The order of rows is irrelevant
Emp_id  Last_name  First_name  Dept     Salary  Hire_date  Training_course  Course_name
10020   Marvin     John        Mktg     65000   38047      101,205          A,B
10300   Carter     Michael     Sales    60000   38384      103,210          C,D
20139   Gates      Susan       Acctg    62000   38777      202              E
20040   Sanchez    Jose        Finance  58000   39295      210              D
21113   Li         Ping        Ops      55000   39692      207              F
- 1NF
- We split each multi-valued column so that every row holds a single training course, which achieves 1NF
Emp_id  Last_name  First_name  Dept     Salary  Hire_date  Training_course  Course_date  Course_name
10020   Marvin     John        Mktg     65000   38047      101              38078        A
10020   Marvin     John        Mktg     65000   38047      205              38231        B
10300   Carter     Michael     Sales    60000   38384      103              38565        C
10300   Carter     Michael     Sales    60000   38384      210              38930        D
20139   Gates      Susan       Acctg    62000   38777      202              39326        E
20040   Sanchez    Jose        Finance  58000   39295      210              39661        D
21113   Li         Ping        Ops      55000   39692      207              40057        F
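The step from the first table to the second can be sketched in code. This is a minimal illustration that assumes the i-th course number pairs with the i-th course name, as in the slide data (101 with A, 205 with B). The function name is hypothetical, and only the columns needed for the split are included.

```python
def to_first_normal_form(rows, multi_valued, sep=","):
    """Expand comma-separated cells into one row per value.

    rows is a list of dicts; multi_valued names the columns to split.
    Values are paired by position, which is an assumption about the data.
    """
    result = []
    for row in rows:
        split = {col: row[col].split(sep) for col in multi_valued}
        lengths = {len(values) for values in split.values()}
        if len(lengths) != 1:
            raise ValueError(f"columns do not line up in row {row}")
        for i in range(lengths.pop()):
            new_row = dict(row)  # copy the single-valued columns unchanged
            for col, values in split.items():
                new_row[col] = values[i].strip()
            result.append(new_row)
    return result

# Two employees from the table before 1NF (subset of columns).
raw_rows = [
    {"Emp_id": "10020", "Training_course": "101,205", "Course_name": "A,B"},
    {"Emp_id": "20139", "Training_course": "202", "Course_name": "E"},
]

flat_rows = to_first_normal_form(raw_rows, ["Training_course", "Course_name"])
for r in flat_rows:
    print(r)
```

Employee 10020 becomes two rows and employee 20139 stays as one, so every cell now holds a single value. The employee columns are repeated on each row, which is the redundancy that 2NF and 3NF address next.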
Normalization
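- 2NF
- 1NF is satisfied
- No partial dependency exists
- Partial dependency: a non-key attribute depends on only part of a composite primary key
- 3NF
- 2NF is satisfied
- No transitive dependency exists
- Transitive dependency: a non-key attribute depends on the key only through another non-key attribute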
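To make the two kinds of dependency concrete, here is a minimal sketch that classifies a list of functional dependencies against a composite key. The dependencies are assumptions read off the column names of the 1NF employee table, and `Dept_head` is a hypothetical column added only to show a transitive case. Both tests are simplified and are not a full normal-form checker.

```python
# Assumed composite primary key of the 1NF employee/training table.
KEY = {"Emp_id", "Training_course"}

# Assumed functional dependencies as (left side, right side) pairs.
FDS = [
    ({"Emp_id"}, {"Last_name", "First_name", "Dept", "Salary", "Hire_date"}),
    ({"Training_course"}, {"Course_name"}),
    ({"Emp_id", "Training_course"}, {"Course_date"}),
    ({"Dept"}, {"Dept_head"}),  # hypothetical column, not in the slides
]

def partial_dependencies(key, fds):
    """FDs where a proper subset of the key determines a non-key attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if lhs < key and rhs - key]

def transitive_dependencies(key, fds):
    """FDs where only non-key attributes determine a non-key attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if not (lhs & key) and rhs - key]

for lhs, rhs in partial_dependencies(KEY, FDS):
    print("partial:", sorted(lhs), "->", sorted(rhs))
for lhs, rhs in transitive_dependencies(KEY, FDS):
    print("transitive:", sorted(lhs), "->", sorted(rhs))
```

Under these assumptions the dependencies on Emp_id alone and on Training_course alone are partial, so the table is not in 2NF. The Dept -> Dept_head dependency is the kind that 3NF removes.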
- Example
- In the following table, the primary key is (Person, Shop type)
- The nearest shop depends on both Person and Shop type -> 2NF
- There is only one non-key attribute -> no transitive dependency -> 3NF
- But Nearest shop determines Shop type, and Nearest shop is not a super key -> not BCNF
- Solution -> split into {Person, Shop} and {Shop, Shop type}
- But we sacrifice some query functionality
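The split and its cost can be sketched as follows. The rows are hypothetical because the slide's table is not reproduced in the text, and the helper names are illustrative. The trade-off shown here is one possible reading of it: after the split, neither table alone can enforce "one nearest shop per person and shop type".

```python
# Hypothetical rows: (Person, Shop type, Nearest shop).
rows = {
    ("Ann", "Bakery", "Crumbs"),
    ("Ann", "Optician", "ClearView"),
    ("Bob", "Bakery", "Loafers"),
    ("Bob", "Optician", "ClearView"),
}

def holds(rows, lhs, rhs):
    """True if the columns at positions lhs determine the columns at rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

def rejoin(person_shop, shop_type):
    """Natural join of the two smaller tables on the shop name."""
    return {(p, t, s) for p, s in person_shop for s2, t in shop_type if s == s2}

# Decompose into {Person, Shop} and {Shop, Shop type}.
person_shop = {(p, s) for p, _, s in rows}
shop_type = {(s, t) for _, t, s in rows}

print(holds(rows, (0, 1), (2,)))  # (Person, Shop type) -> Nearest shop
print(holds(rows, (2,), (1,)))    # Nearest shop -> Shop type
print(rejoin(person_shop, shop_type) == rows)  # the split loses no rows

# A second bakery for Ann is accepted by the {Person, Shop} table,
# yet the joined result no longer has one nearest bakery for Ann.
broken = rejoin(person_shop | {("Ann", "Loafers")}, shop_type)
print(holds(broken, (0, 1), (2,)))
```

The first three checks print True and the last prints False, so the original rule now has to be checked by joining the two tables back together.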
- 4NF
- BCNF is satisfied
- No multivalued dependency exists
- Example
- In the following table, John is associated with a set of hobbies (Ball, Singing) and a
set of phone numbers (1121, 1122)
- Each set has more than one element
- Hobby and Phone number are independent of each other
- This creates the multivalued dependency problem
Emp_id  Last_name  First_name  Hobby    Phone_number
10020   Marvin     John        Ball     1121
10020   Marvin     John        Singing  1122
- Because any hobby can be paired with any phone number, the same table could just as validly contain the following redundant records
Emp_id  Last_name  First_name  Hobby    Phone_number
10020   Marvin     John        Ball     1122
10020   Marvin     John        Singing  1121
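The usual remedy is to store the two independent facts in separate tables. The sketch below uses the two rows from the slide, trimmed to Emp_id, Hobby and Phone number, and shows that joining the split tables produces exactly the two extra pairings listed above. The variable names are illustrative.

```python
# (Emp_id, Hobby, Phone number) rows from the first table above.
rows = {
    ("10020", "Ball", "1121"),
    ("10020", "Singing", "1122"),
}

# One table per independent multi-valued fact.
hobbies = {(e, h) for e, h, _ in rows}
phones = {(e, p) for e, _, p in rows}

# Joining on Emp_id pairs every hobby with every phone number.
joined = {(e, h, p) for e, h in hobbies for e2, p in phones if e == e2}

print(sorted(joined - rows))
```

The difference contains (10020, Ball, 1122) and (10020, Singing, 1121). The single table either stores all four combinations or implies a pairing between hobby and phone number that does not exist, whereas the two smaller tables store each fact once.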
- 5NF
- 4NF is satisfied
- No join dependency exists
- Join dependency
- No attribute logically depends on another -> no multivalued dependency -> 4NF satisfied
- Decompose the relation into several smaller relations and then join them back. If the
result is exactly the same as the original relation, a join dependency exists -> not 5NF,
and the table needs splitting. Otherwise, no join dependency exists -> 5NF.
- Suppose we have the following relation
Buyer Seller Lender
Smith Jones BOA
Smith Wilson Chase
Nelson Jones Chase
Maths 2 S 2
Data structure M 3
Deep learning A 5
Data structure S 4
Deep learning T 5
Computer network K 2
Digital logical design A 5
- Please verify
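One way to carry out this verification for the Buyer / Seller / Lender relation is to project it onto its three attribute pairs, join the projections, and compare the result with the original. This is a minimal sketch over the three sample rows only; a join dependency is a rule about every legal state of the table, so sample data can refute it but not prove it.

```python
# (Buyer, Seller, Lender) rows from the relation above.
rows = {
    ("Smith", "Jones", "BOA"),
    ("Smith", "Wilson", "Chase"),
    ("Nelson", "Jones", "Chase"),
}

# Project onto each pair of attributes.
buyer_seller = {(b, s) for b, s, _ in rows}
seller_lender = {(s, l) for _, s, l in rows}
buyer_lender = {(b, l) for b, _, l in rows}

# Three-way natural join: keep (b, s, l) only if all three pairs exist.
joined = {
    (b, s, l)
    for b, s in buyer_seller
    for s2, l in seller_lender
    if s == s2 and (b, l) in buyer_lender
}

print(joined == rows)
print(sorted(joined - rows))
```

For these rows the join produces one extra tuple, (Smith, Jones, Chase), so the join is not identical to the original relation. By the rule above, this decomposition does not exhibit a join dependency on this data, and splitting the table three ways would lose information.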
Normalization
- Applying these rules one by one to normalize a table takes a long time
- Follow these steps instead
- Identify the entities in the stored information
- An entity is a person, place, object, event, or concept about which
the organization/user wishes to maintain data
- An attribute is NOT an entity but a feature of an entity.
- Example: employees and training courses are entities. An employee name
or a training course number is an attribute.
- Find the relationships between these entities
- One-to-one: capital city and country
- One-to-many: customer and orders
- Many-to-many: students and courses
- Split the tables according to entities
- Each table should include only the attributes of its entity
- Attributes of an instance of the entity should not change over time
- For a many-to-many relationship, add an association entity
- The association entity contains the primary keys of both entities
and any attributes of the interaction between the two entities
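The steps above can be sketched with Python's built-in sqlite3 module, using the employee and training course entities from the earlier tables. The table and column names are illustrative choices, only a few rows are loaded, and dates are kept as the serial numbers shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL,
    dept       TEXT
);
CREATE TABLE course (
    course_id   INTEGER PRIMARY KEY,
    course_name TEXT NOT NULL
);
-- Association entity: both primary keys plus the attribute that
-- belongs to the interaction (the date the course was taken).
CREATE TABLE training (
    emp_id      INTEGER NOT NULL REFERENCES employee(emp_id),
    course_id   INTEGER NOT NULL REFERENCES course(course_id),
    course_date INTEGER,
    PRIMARY KEY (emp_id, course_id)
);
""")

conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (10020, "Marvin", "John", "Mktg"),
    (10300, "Carter", "Michael", "Sales"),
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    (101, "A"), (205, "B"), (103, "C"), (210, "D"),
])
conn.executemany("INSERT INTO training VALUES (?, ?, ?)", [
    (10020, 101, 38078), (10020, 205, 38231),
    (10300, 103, 38565), (10300, 210, 38930),
])

# Rebuild the flat view by joining through the association table.
result = conn.execute("""
    SELECT e.last_name, c.course_name
    FROM training AS t
    JOIN employee AS e ON e.emp_id = t.emp_id
    JOIN course AS c ON c.course_id = t.course_id
    ORDER BY e.emp_id, c.course_id
""").fetchall()
print(result)
```

Each employee and each course is stored once, and the composite primary key on training prevents the same employee from being recorded twice for the same course.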
Example of Normalization