0% found this document useful (0 votes)
88 views71 pages

Normalization

The given functional dependency AB -> is incomplete. Functional dependencies must specify the attributes on the right hand side that are functionally determined by the attributes on the left hand side. Without the right hand side attributes, it is not possible to determine if this is a full functional dependency or what the candidate keys are. More information is needed to analyze this functional dependency.

Uploaded by

Anmol Omar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
88 views71 pages

Normalization

The given functional dependency AB -> is incomplete. Functional dependencies must specify the attributes on the right hand side that are functionally determined by the attributes on the left hand side. Without the right hand side attributes, it is not possible to determine if this is a full functional dependency or what the candidate keys are. More information is needed to analyze this functional dependency.

Uploaded by

Anmol Omar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 71

Normalization in DBMS

• Normalization is the process of organizing the


data in the database.
• Normalization is used to minimize the
redundancy from a relation or set of relations. It
is also used to eliminate the undesirable
characteristics like Insertion, Update and
Deletion Anomalies.
• Normalization divides the larger table into the
smaller table and links them using relationship.
• The normal form is used to reduce redundancy
from the database table.
Why Normalisation is needed?
• If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when
we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left with
old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere
else.
• Insert anomalies − We tried to insert data in a record that does not exist at
all.
Example of Anomalies

Sid Sname Cid Cname Fid Fname Salary

1 Vinit C1 DBMS F1 Sanskriti 60000

2 Kaustubh C2 JAVA F2 Samreen 50000

3 Mihir C1 DBMS F1 Sanskriti 60000

4 Vanshika C1 DBMS F1 Sanskriti 60000


Normal Forms
First Normal Form (1NF)

• A relation will be 1NF if it contains an atomic


value.
• It states that an attribute of a table cannot
hold multiple values. It must hold only single-
valued attribute.
• First normal form disallows the multi-valued
attribute, composite attribute, and their
combinations.
• There should be no multi-valued attribute
• Student table
• This table is not in first normal form
Roll no Name Course
1 Sai c/c++
2 Harsh JAVA
3 Onkar C/DBMS
How to convert to first normal form
Roll No Name Course
1 Sai C
1 Sai C++
2 Harsh Java
3 Onkar C
3 Onkar DBMS

Primary key= Roll No + Course ---- Composite Key


Second solution
Roll No Name Course1 Course2

1 Sai C C++

2 Harsh Java NULL

3 Onkar C DBMS

Primary Key: Roll No


Third Solution
• Divide the tables into number of tables
Roll No(Primary Key) Name
1 Sai
2 Harsh
3 Onkar

Roll No(Foreign Key) Course


1 C
1 C++
2 JAVA
3 C
3 DBMS
Example: Unnormalized table
Course Course Name Teacher Roll Name System Hourly Total
Code Name No Used Rate Hours
C1 Visual Basic ABC 100 A1 P–I 20 7
101 A2 P – II 30 3
102 A3 Celeron 10 6
103 A4 P – IV 40 1
C2 Oracle & Dev DEF 100 A1 P–I 20 7
104 A5 P – III 35 3
105 A6 P – II 30 1
101 A2 P – II 30 2
C3 C++ KJP 106 A7 P – IV 40 3
107 A8 P – IV 40 2
108 A9 P–I 20 1
C4 Java Kumar 109 A10 Cyrix 20 2
Approaches to normalize table
• In general, there are two basic approaches to
normalize tables.
STUDENT (Flattening)
(Normalized Table)
Course Course Name Teacher Roll Name System Hourly Total
Code Name No Used Rate Hours
C1 Visual Basic ABC 100 A1 P–I 20 7
C1 Visual Basic ABC 101 A2 P – II 30 3
C1 Visual Basic ABC 102 A3 Celeron 10 6
C1 Visual Basic ABC 103 A4 P – IV 40 1
C2 Oracle & Dev DEF 100 A1 P–I 20 7
C2 Oracle & Dev DEF 104 A5 P – III 35 3
C2 Oracle & Dev DEF 105 A6 P – II 30 1
C2 Oracle & Dev DEF 101 A2 P – II 30 2
C3 C++ KJP 106 A7 P – IV 40 3
C3 C++ KJP 107 A8 P – IV 40 2
C3 C++ KJP 108 A9 P–I 20 1
C4 Java Kumar 109 A10 Cyrix 20 2
Approaches to normalize table
STUDENT (Decomposition)
(Normalized Table)
COURSE Course Roll Name System Hourly Total
Code No Used Rate Hours
Course Course Name Teacher
Code Name C1 100 A1 P–I 20 7
C1 Visual Basic ABC C1 101 A2 P – II 30 3
C2 Oracle & Dev DEF C1 102 A3 Celeron 10 6
C3 C++ KJP C1 103 A4 P – IV 40 1
C4 Java Kumar C2 100 A1 P–I 20 7
C2 104 A5 P – III 35 3
C2 105 A6 P – II 30 1
COURSE_STUDENT C2 101 A2 P – II 30 2
C3 106 A7 P – IV 40 3
PK = (Course_code, C3 107 A8 P – IV 40 2
RollNo) C3 108 A9 P–I 20 1
C4 109 A10 Cyrix 20 2
19
Functional Dependency
• Method to describe relationship between attributes
• Like X -> Y --- X determines Y or Y is determined by X
• X- Determinant
• Y- Dependent

• Example: Sid -> Sname


• Like I have Sname – Alok,Alok
• How will I distinguish?
• Trivial FD: X-> Y then Y is subset of X
• Example : Sid -> Sid
• These are always true, as attribute to be
determined is subset.
• Another example: Sid Sname -> Sid. Its
intersection cannot be NULL
• Non trivial dependency
• X-> Y X intersection Y = null
• Sid -> sname
• Sid -> semester
• Sid -> Phone
• Intersection of all these is always NULL
Trivial functional dependency

• A → B has trivial functional dependency if B is a subset of A.


• The following dependencies are also trivial like: A → A, B → B
• Example:
• Consider a table with two columns Employee_Id and Employe
e_Name.  
• {Employee_id, Employee_Name}   →    Employee_Id is a trivial
 functional dependency as   
• Employee_Id is a subset of {Employee_Id, Employee_Name}.  
• Also, Employee_Id → Employee_Id and Employee_Name   →  
  Employee_Name are trivial dependencies too.  
Non-trivial functional dependency

• A → B has a non-trivial functional dependency


if B is not a subset of A.
• When A intersection B is NULL, then A → B is
called as complete non-trivial.
• Example:
• ID   →    Name,  
• Name   →    DOB  
Example
Employee
SSN Name JobType DeptName
557-78-6587 Lance Smith Accountant Salary

214-45-2398 Lance Smith Engineer Product

Note: Name is functionally dependent on SSN because an


employee’s name can be uniquely determined from their SSN.
Name does not determine SSN, because more than one employee
can have the same name..
Functional Dependence (FD)
Functional Dependence (FD)
Fully Functional Dependency
• Fully Functional Dependence(FFD) is defined
as attribute Y is FFD on attribute X, if it is a FD
on X and not FD on any proper subset of X.
• (X1,X2)->Y
X
It should dependent on combination.
• Closure method: Method to find all candidate keys in a
table
• R(ABCD)
• FD{A->B,B->C,C->D}

• A+ -> ABCD
• B+ -> BCD
• C+ -> CD
• D+ -> D So, candidate key (A)

• Transitive Property
• AB+ -> ABCD
• A candidate key, but AB cannot be
• So, A is only candidate key.
• If you have B with it then it is superkey(B).

• Prime attribute: A
• Non Prime attribute: B,C,D
Solve the following question:

Function dependency given as below for relation R(ABCD)


A->B
B->C
C->D
D->A
which of the following is/are the candidate keys?
a)A ONLY
b)B ONLY
c)C ONLY
d)D ONLY
e)ALL OF THE ABOVE
Solution
• R(ABCD)
• FD ={A->B,B->C,C->D,D->A}
• A+ -> {ABCD}
• B+ -> {BCDA}
• C+ -> {CDAB}
• D+ -> {ABCD}
• Candidate Key{A,B,C,D}
• Prime Attributes: attributes that help in making
primary key .So, {A,B,C,D} are all prime attributes
Consider the following functional dependencies

•AB -> C
•C -> D
•BD -> E for a relation R(A,B,C,D,E).

•Closure of AB={ A,B,C,D,E}


•CLOSURE OF C={C,D}
•CLOSURE OF BD={B,D,E}
Find closure of ACD, BCD, AB, CD.
Ques 1. Consider the following functional dependencies

•{AB->C, B->D, C->E, D->A}  for a relation R(A,B,C,D,E).

Find closure of B, CD, BC

Ques 2. Given R(E-ID, E-NAME, E-CITY, E-STATE)

•FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE }


Finding Candidate Keys and Super Keys of a Relation using FD set
Function dependency given as below for relation
R(ABCD)
FD: {A->B, B->C, C->D}
In the above case, choose the prime attributes?
a)A
b)B
c)A,B,C,D
d)C
Shortcut for identify Candidate Key
Consider a relation R(ABCDE)
FD={A->B,BC->D, E->C, D->A}
Step 1: Check the attributes on the right side which are getting determined
i.e. BCDA. E can’t be determined so, E will be used in candidate key
Step 2: E should be found which is {EC}
Step3 : Find combination with E starting from A
AE + =ABECD
And so on with other BE +,CE +,DE +
Find which others are candidate key?
a) BE b) CE c) DE d) BE and DE
Inference Rule (IR)

• The Armstrong's axioms are the basic inference rule.


• Armstrong's axioms are used to conclude functional
dependencies on a relational database.
• The inference rule is a type of assertion. It can apply to a set
of FD(functional dependency) to derive other FD.
• Using the inference rule, we can derive additional functional
dependency from the initial set.
• The Functional dependency has 6 types of inference rule:
 Reflexive Rule (IR1)

• In the reflexive rule, if Y is a subset of X, then


X determines Y.
• If X ⊇ Y then X  →    Y 
• Any attribute determining itself 
• Example:
• X = {a, b, c, d, e}  
• Y = {a, b, c}  
Augmentation Rule (IR2)

• The augmentation is also called as a partial dependency. In


augmentation, if X determines Y, then XZ determines YZ for
any Z.
• If X    →  Y then XZ   →   YZ   
• Example:
• For R(ABCD),  if A   →   B then AC  →   BC 
• Example
• Sid -> sname
• Sid Phoneno -> Sname Phoneno
 
Transitive Rule (IR3)

• In the transitive rule, if X determines Y and Y


determine Z, then X must also determine Z.
• If X   →   Y and Y  →  Z then X  →   Z    
Union Rule (IR4)

• Union rule says, if X determines Y and X determines Z, then X


must also determine Y and Z.
• If X    →  Y and X   →  Z then X  →    YZ     
• Proof:
• 1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX =
X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
Decomposition Rule (IR5)

• Decomposition rule is also known as project rule. It


is the reverse of union rule.
• This Rule says, if X determines Y and Z, then X
determines Y and X determines Z separately.
• If X   →   YZ then X   →   Y and X  →    Z   
• Proof:
• 1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
Pseudo transitive Rule (IR6)

• In Pseudo transitive Rule, if X determines Y and YZ determines


W, then XZ determines W.
• If X   →   Y and YZ   →   W then XZ   →   W   
• Proof:
• 1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
Second Normal Form
• Table must be in first normal form
• There should be no partial dependency
• All non prime attributes should be fully
dependent on candidate key
• Non prime attributes: Attributes that are not
participating in formation of candidate key
Second Normal Form (2NF)
• A relation R is in 2NF if and only if it is in 1NF
and every non-key attribute is fully functional
dependent on the primary key.
Name

RollNo
System_Used

Functional Dependence
Course_Code Hourly_Rate
Diagram
Course_Name

Total Hrs. Teacher_Name


Second Normal Form (2NF)
• A resultant database of 1NF Course_Code
does not satisfy above rule, because non-key
attributes Name, System_Used and
Hourly_Rate are not fully dependent on the
primary key (Course_Code, Rollno) because
Name, System_Used and Hourly_Rate are
functional dependent on Rollno and Rollno is a
subset of the primary key so it does not hold
the law of fully functional dependence.
Rule to convert 1NF to 2NF
• Consider a relation where a primary key
consists of attributes A and B. These two
attributes determine all other attributes.
• Attribute C is fully dependent on the key.
• Attribute D is partially dependent on the key
because we only need attribute A to
functionally determine it.
• Attributes C and D are non-key attributes.
Rule to convert 1NF to 2NF
• The rule is to replace the original relation by
two new relations. The first new relation has
three attributes: A, B and C. The primary key
of this relation is (A, B) i.e. the primary key of
the original relation. The second relation has A
and D as its only two attributes. Observe that
attribute A has been designated, as the
primary key of the second relation and that
attribute D is now fully dependent on the key.
Second Normal Form (2NF)
HOURS_ASSIGNED STUDENT_SYSTEM_CHARGE
Course Roll Total Roll Name System Hourly
Code No Hours No Used Rate
C1 100 7 100 A1 P–I 20
C1 101 3 101 A2 P – II 30 COURSE
102 A3 Celeron 10 Course Course Teacher
C1 102 6
Code Name Name
C1 103 1 103 A4 P – IV 40
Visual
C2 100 7 100 A1 P–I 20 C1 ABC
Basic
C2 104 3 104 A5 P – III 35
Oracle
105 A6 P – II 30 C2 DEF
C2 105 1 & Dev
C2 101 2 101 A2 P – II 30 C3 C++ KJP
C3 106 3 106 A7 P – IV 40 C4 Java Kumar
C3 107 2 107 A8 P – IV 40
C3 108 1 108 A9 P–I 20
C4 109 2 109 A10 Cyrix 20
Second Normal Form
Customer ID Store ID Location
1 1 Delhi
1 3 Mumbai
2 1 Delhi
3 2 Banglore
4 3 Mumbai

Prime attribute: CustomerID,StoreID


Non Prime Attribute: Location
Location is determined by store id
Convert to second normal form
• Divide the table
Customer id StoreID
1 1
1 3
2 1
3 2
4 3

Store id location
1 Delhi
2 Banglore
3 mumbai
Second Normal Form (2NF)

Example: Let's assume, a school can store the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one subject.
Question
A relation R(ABCDEF) where Function dependencies are given
below:
C->F, E->A, EC->D,A->B

i.Find Candidate Key


a) EC b) FC c)Both
ii. Find Prime Attribute
a)E,C b) C,E,A c) BOTH
iii. Find Non Prime attributes
a)A,B,D,E B) A,B,D,F c) NONE
Question

Consider following functional dependencies in


relation R (A, B, C, D )
AB -> C
BC -> D
Determine whether the relation is in Second
Normal Form or not?
Prime and nonprime attributes
• An attribute of a relation schema R is called
Prime attribute if it is a member of some
candidate key of R.
• An attribute is called non prime if it is not a
prime attribute i.e. it is not the member of any
candidate key.
Third Normal Form (3NF)
• A relation R is in 3NF if and only if the
following conditions are satisfied
simultaneously:
– R is already in 2NF
– No nonprime attribute functionally determines
any other nonprime attribute Or No nonprime
attribute is transitively dependent on the key

• The objective of transforming relations into


3NF is to remove all transitive dependencies.
Third Normal Form (3NF)
Name Functional Dependence
Diagram
RollNo System_Used RollNo  System_Used
System_Used  Hourly_Rate
It Means RollNo  Hourly_Rate
Hourly_Rate
Third Normal Form (3NF)
STUDENT_SYSTEM_CHARGE STUDENT_SYSTEM
Roll Name System Hourly Roll Name System
No Used Rate No Used
100 A1 P–I 20 100 A1 P–I CHARGES
101 A2 P – II 30 101 A2 P – II System Hourly
Used Rate
102 A3 Celeron 10 102 A3 Celeron
Celeron 10
103 A4 P – IV 40 103 A4 P – IV
Cyrix 20
100 A1 P–I 20 Convert To 100 A1 P–I
P–I 20
104 A5 P – III 35 104 A5 P – III
P – II 30
105 A6 P – II 30 105 A6 P – II
P – III 35
101 A2 P – II 30 101 A2 P – II
P – IV 40
106 A7 P – IV 40 106 A7 P – IV
107 A8 P – IV 40 107 A8 P – IV
108 A9 P–I 20 108 A9 P–I
109 A10 Cyrix 20 2NF 109 A10 Cyrix 3NF
• Super key in the table above:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so o
n  
• Candidate key: {EMP_ID}
• Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.
• Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP
dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY)
transitively dependent on super key(EMP_ID). It violates the rule of third
normal form.
• That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
Question
A relation R(ABCD) where Function dependencies are given
below:
AB->C, C->D

Candidate Key- AB
Prime Attributes- A,B
Non- Prime Attributes- C,D

Is the relation in 3NF?


Question
A relation R(ABCD) where Function dependencies are given
below:
AB->CD, D->A
Is the relation in 3NF?
Boyce Codd normal form (BCNF)

• BCNF is the advance version of 3NF. It is stricter than


3NF.
• A table is in BCNF if every functional dependency X
→ Y, X is the super key of the table.
• For BCNF, the table should be in 3NF, and for every
FD, LHS is super key.
• Example: Let's assume there is a company where
employees work in more than one department.
• In the above table Functional dependencies
are as follows:
• EMP_ID  →  EMP_COUNTRY  
• EMP_DEPT  →   {DEPT_TYPE, EMP_DEPT_NO}  
• Candidate key: {EMP-ID, EMP-DEPT}
• The table is not in BCNF because neither
EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we
decompose it into three tables:
• Functional dependencies:
• EMP_ID   →    EMP_COUNTRY  
• EMP_DEPT   →   {DEPT_TYPE, EMP_DEPT_NO}  
• Candidate keys:
• For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
• Now, this is in BCNF because left side part of both the
functional dependencies is a key.

You might also like