Lecture 11fjylkgey

in a separate relation. The document discusses normalization of relational databases. Normalization is the process of analyzing relations and removing anomalies like insertion, deletion and update anomalies. It involves decomposing relations into smaller relations while maintaining properties like lossless join and dependency preservation. The document covers concepts like functional dependencies, normal forms like 1NF, anomalies, and provides examples of converting an unnormalized relation to first normal form through techniques like removing repeating groups and creating separate relations for repeating attributes.

Uploaded by

Rifky Ahmed
Copyright
© © All Rights Reserved

Relational Database Management Systems

Normalization
Lecture 10
In this lecture you will
learn
• Mathematical notions behind relational model

• Normalization
Introduction
• Relations derived from ER model may be ‘faulty’
– Subjective process.
– May cause data redundancy, and insert/delete/update
anomalies.

• We use some mathematical (semantic?) properties of


relations to
– locate these faults and
– fix them

• Process is called Normalization.


Normalization (contd.)

• Relational database schema = set of relations

• Relation = set of attributes

• How we group the attributes to relations is important.


Normalization (contd.)
• Too many attributes in a relation
– Waste space
– Anomalies
• Insert anomaly
• Delete anomaly
• Update anomaly

• Decomposing the relation into too many smaller relations may lose the
– Lossless-join property
– Dependency-preserving property
Data Redundancy

• Major aim of relational database design is


– to group attributes into relations to minimize data redundancy
and
– to reduce file storage space required by base relations.

• Data redundancy is undesirable because of the following


anomalies
– ‘Insert’ anomalies
– ‘Delete’ anomalies
– ‘Update’ anomalies
Anomalies

Too many attributes…


For example,

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

Insertion Anomaly…
1. Inserting a new lecturer to the
LECTURER table
- Department information is
repeated (ensure that correct
department information is
inserted).

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

• Inserting a department with no employees


– (Impossible, because a null value for id is not allowed)
Anomalies (contd.)

Deletion Anomalies…

• Deleting the last lecturer


from the department will lose
information about the
department.

LECTURER (id, name, address, salary, department, building)


Anomalies (contd.)

Updating Anomalies…

• Updating the department’s


building needs to be done
for all lecturers working for
that department.

LECTURER (id, name, address, salary, department, building)
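The update and deletion anomalies described above can be reproduced on a few rows. This is a minimal sketch; the sample rows and the helper `buildings_per_department` are hypothetical and not part of the lecture.

```python
# Hypothetical sample rows for LECTURER (only the columns needed here).
lecturers = [
    {"id": 1, "name": "A", "department": "CS", "building": "B1"},
    {"id": 2, "name": "B", "department": "CS", "building": "B1"},
    {"id": 3, "name": "C", "department": "Maths", "building": "B2"},
]


def buildings_per_department(rows):
    """Map each department to the set of buildings recorded for it."""
    result = {}
    for row in rows:
        result.setdefault(row["department"], set()).add(row["building"])
    return result


# Update anomaly: the CS department moves, but only one row is updated.
lecturers[0]["building"] = "B9"
inconsistent = {
    dept: buildings
    for dept, buildings in buildings_per_department(lecturers).items()
    if len(buildings) > 1
}

# Deletion anomaly: removing the only Maths lecturer loses the Maths building.
remaining = [row for row in lecturers if row["id"] != 3]
known_departments = set(buildings_per_department(remaining))
```

After the partial update, CS is recorded in two different buildings, and after the deletion the Maths department is no longer known at all.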


Decomposition of
Relations
• Staff and Branch relations which are obtained by
decomposing StaffBranch do not suffer from these
anomalies.

• Two important properties of decomposition


– Lossless-join property enables us to find any instance
of original relation from corresponding instances in the
smaller relations.
– Dependency preservation property enables us to
enforce a constraint on original relation by enforcing
some constraint on each of the smaller relations.
Loss-less join property

Decomposing the relation into too many smaller relations…

• Loss-less join property: we might lose information if we


decompose relations…
Loss-less join property
(contd.)
For example,

S                 R1            R2
S    P    D       S    P        P    D
S1   P1   D1      S1   P1       P1   D1
S2   P2   D2      S2   P2       P2   D2
S3   P1   D3      S3   P1       P1   D3
Loss-less join property
(contd.)
Joining them together, we get spurious tuples…

R1 ⋈ R2
S    P    D
S1   P1   D1
S1   P1   D3   (spurious)
S2   P2   D2
S3   P1   D1   (spurious)
S3   P1   D3
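The spurious tuples can be checked mechanically. The sketch below projects S onto (S, P) and (P, D) and joins the projections on P; the variable names are chosen here for illustration.

```python
# Relation S(S, P, D) from the example.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Projections R1(S, P) and R2(P, D).
R1 = {(s, p) for (s, p, d) in S}
R2 = {(p, d) for (s, p, d) in S}

# Natural join of R1 and R2 on the shared attribute P.
joined = {(s, p, d) for (s, p) in R1 for (p2, d) in R2 if p == p2}

# Tuples that appear in the join but were never in S.
spurious = joined - S
```

The join contains five tuples, two of which were not in S, so this decomposition is not lossless.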
The Process of
Normalization
• Formal technique for analyzing a relation based on its primary key and
functional dependencies between its attributes.

• Often executed as a series of steps. Each step corresponds to a specific


normal form, which has known properties.

• As normalization proceeds, relations become progressively more


restricted (stronger) in format and also less vulnerable to update
anomalies.

• Given a relation, use the following cycle


– Find out what normal form it is in
– Transform the relation to the next higher form by decomposing it to
form simpler relations
– You may need to refine the relation further if decomposition resulted
in undesirable properties

Normalization is based on Functional Dependencies


Functional dependency

• A functional dependency, denoted by X → Y, is read as
• X functionally determines Y
• Y is functionally dependent on X

• where X and Y are sets of attributes in relation R. It specifies
the following constraint:
Let t1 and t2 be tuples of relation R for any given instance.
Whenever t1[X] = t2[X], then t1[Y] = t2[Y],

where ti[X] represents the values for X in tuple ti.


Functional dependency
(contd.)
TEACH
STUDENT COURSE TEACHER
Narayana Database ABC
Sumith Database ABC
Nalin Operating Systems Samantha
Kamal Mathematics Chandrika
Janith Database ABC
Ranil Operating Systems Samantha
Saman Mathematics Chandrika
Ruwan Database ABC

TEACHER → COURSE
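The definition can be tested against the TEACH instance. The helper `fd_holds` below is a hypothetical name, not from the lecture. Note that a single instance can only refute a functional dependency; it cannot prove that the dependency holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("STUDENT", "COURSE", "TEACHER")
TEACH = [dict(zip(COLUMNS, values)) for values in [
    ("Narayana", "Database", "ABC"),
    ("Sumith", "Database", "ABC"),
    ("Nalin", "Operating Systems", "Samantha"),
    ("Kamal", "Mathematics", "Chandrika"),
    ("Janith", "Database", "ABC"),
    ("Ranil", "Operating Systems", "Samantha"),
    ("Saman", "Mathematics", "Chandrika"),
    ("Ruwan", "Database", "ABC"),
]]

teacher_determines_course = fd_holds(TEACH, ["TEACHER"], ["COURSE"])
course_determines_student = fd_holds(TEACH, ["COURSE"], ["STUDENT"])
```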
Functional Dependency
• Diagrammatic representation:

• Determinant of a functional dependency refers to attribute or group of


attributes on left-hand side of the arrow.

• If removing any attribute from the determinant breaks the functional
dependency (that is, the determinant is minimal), then we call it a
full functional dependency.
Key Terms
Review of some terms…
• Superkey: Set of attributes S in relation R that can
be used to identify each tuple uniquely.

• Key: A key is a superkey with the additional
property that removing any attribute from it leaves a set
that is no longer a superkey (a minimal superkey).
Key Terms
• Candidate Key: Each key of a relation is called a
candidate key.

• Primary Key: A candidate key is chosen to be the


primary key.

• Prime Attribute: an attribute which is a member of


a candidate key.

• Nonprime Attribute: An attribute which is not


prime.
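These terms can be computed from a set of functional dependencies using attribute closure. The sketch below is one possible brute-force approach, assuming a relation (TEACHER, CAMPUS, COURSE, ADDRESS) with the two FDs {TEACHER, CAMPUS} → COURSE and CAMPUS → ADDRESS; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Enumerate minimal superkeys, smallest attribute sets first."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a superkey
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


# Assumed relation and FDs (illustrative only).
RELATION = {"TEACHER", "CAMPUS", "COURSE", "ADDRESS"}
FDS = [({"TEACHER", "CAMPUS"}, {"COURSE"}), ({"CAMPUS"}, {"ADDRESS"})]

keys = candidate_keys(RELATION, FDS)
prime = set().union(*keys)      # attributes that belong to some candidate key
nonprime = RELATION - prime     # all remaining attributes
```

The enumeration is exponential in the number of attributes, which is acceptable for classroom-sized relations only.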
Unnormalized Form
(UNF)
• A table that contains one or more repeating groups.

• To create an unnormalized table:


– Transform data from information source (e.g. form) into
table format with columns and rows.

Example 1 – address and name fields are composite

Name Address Phone

Sally Singer 123 Broadway New York, NY, 11234 (111) 222-3345

Jason Jumper 456 Jolly Jumper St. Trenton NJ, 11547 (222) 334-5566
Another example of UNF
Example 2 – repeating columns for each client &
composite name field

Rep ID  Representative    Client 1  Time 1  Client 2  Time 2  Client 3     Time 3
TS-89   Gilroy Gladstone  US Corp.  14 hrs  Taggarts  26 hrs  Kilroy Inc.  9 hrs
RK-56   Mary Mayhem       Italiana  67 hrs  Linkers   2 hrs


UNF to 1NF

• Remove repeating group by:


– entering appropriate data into the empty columns of rows
containing repeating data (‘flattening’ the table).

Or by
– placing repeating data along with copy of the original key
attribute(s) into a separate relation.
Normalization (contd.)

1st Normal Form


• A relation R is in first normal
form (1NF) if domains of all
attributes in the relation are
atomic (simple & indivisible).

• Avoid multivalued & composite


attributes.
Normalization (contd.)

For example…
DEPARTMENT (Dname,Dnumber, DMGRSSN, DLocation)

DEPARTMENT

DNAME DNUMBER DMGRSSN DLOCATIONS

Research 5 333445555 {Mathara, Kandy, Metro}


Administration 4 987654321 {Malabe}
Headquarters 1 888665555 {Metro}

• Department relation not in 1NF


• How to take into 1NF ?
Normalization (contd.)

• Solution 1: Repeat the same info (redundancy + new key


attribute).

• Solution 2: If max number of locations is known, create a


column for each location (may have lots of null values).

• Solution 3: Create a separate DLOCATION relation with


foreign key.
Normalization (contd.)
Solution 1:
DEPARTMENT

DNAME DNUMBER DMGRSSN DLOCATIONS

Research 5 333445555 Mathara


Research 5 333445555 Kandy
Research 5 333445555 Metro

Administration 4 987654321 Malabe


Headquarters 1 888665555 Metro

1NF relation with redundancy

• Expand the key so that there will be a separate tuple in the


original DEPARTMENT relation for each location of the
DEPARTMENT.
• This solution has the disadvantage of introducing
redundancy in the relation.
Normalization (contd.)
Solution 2:

DEPARTMENT

DNAME DNUMBER DMGRSSN DLOC1 DLOC2 DLOC3

Research 5 333445555 Mathara Kandy Metro

Administration 4 987654321 Malabe Null Null

Headquarters 1 888665555 Metro Null Null

• Need to know max number of locations.


• create a column for each location.
• may have lots of null values.
Normalization (contd.)
Solution 3:
• Remove the attribute DLOCATIONS and place it in a separate relation
DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT.
• The PK of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}.
• This decomposes the non-1NF relation into two 1NF relations.

DEPT_LOCATIONS
DNUMBER   DLOCATIONS
1         Metro
4         Malabe
5         Mathara
5         Kandy
5         Metro

DEPARTMENT
DNAME            DNUMBER   DMGRSSN
Research         5         333445555
Administration   4         987654321
Headquarters     1         888665555
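Solution 3 can be sketched in a few lines: the multivalued attribute is split off into its own relation, keyed by DNUMBER plus the location. The list-of-dicts representation and the column name `DLOCATION` for the single-valued attribute are assumptions made for this illustration.

```python
# Non-1NF DEPARTMENT: DLOCATIONS holds a set of values per tuple.
department_unf = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": ["Mathara", "Kandy", "Metro"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": ["Malabe"]},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DMGRSSN": "888665555",
     "DLOCATIONS": ["Metro"]},
]

# DEPARTMENT without the multivalued attribute.
department = [
    {k: v for k, v in row.items() if k != "DLOCATIONS"}
    for row in department_unf
]

# DEPT_LOCATIONS: one tuple per (department, location) pair.
dept_locations = [
    {"DNUMBER": row["DNUMBER"], "DLOCATION": location}
    for row in department_unf
    for location in row["DLOCATIONS"]
]

# {DNUMBER, DLOCATION} is unique in DEPT_LOCATIONS, so it can serve as the key.
key_values = {(r["DNUMBER"], r["DLOCATION"]) for r in dept_locations}
```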
Normalization (contd.)
• A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency no
longer holds
(i.e. (X – {A}) → Y does not hold for any A in X)

TEACH
STUDENT COURSE TEACHER CAMPUS
Narayan Database ABC Metro
Smith Database XYZ Malabe
Nalin Operating Systems Samantha Metro
Kamal Operating Systems ABC Malabe
Janith Database ABC Metro
Ranil Operating Systems Samantha Metro
Saman Operating Systems ABC Malabe
Ruwan Database XYZ Malabe
{Teacher, Campus} → Course
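The full-dependency test can be run against this TEACH instance. The sketch below uses hypothetical helpers `fd_holds` and `is_full_fd`; as before, the check says something only about this instance, not about all possible instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """lhs -> rhs holds, and it fails once any single attribute is dropped from lhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return all(
        not fd_holds(rows, [a for a in lhs if a != removed], rhs)
        for removed in lhs
    )


COLUMNS = ("STUDENT", "COURSE", "TEACHER", "CAMPUS")
TEACH = [dict(zip(COLUMNS, values)) for values in [
    ("Narayan", "Database", "ABC", "Metro"),
    ("Smith", "Database", "XYZ", "Malabe"),
    ("Nalin", "Operating Systems", "Samantha", "Metro"),
    ("Kamal", "Operating Systems", "ABC", "Malabe"),
    ("Janith", "Database", "ABC", "Metro"),
    ("Ranil", "Operating Systems", "Samantha", "Metro"),
    ("Saman", "Operating Systems", "ABC", "Malabe"),
    ("Ruwan", "Database", "XYZ", "Malabe"),
]]

full = is_full_fd(TEACH, ["TEACHER", "CAMPUS"], ["COURSE"])
not_full = is_full_fd(TEACH, ["STUDENT", "TEACHER"], ["COURSE"])
```

Teacher ABC teaches Database at Metro and Operating Systems at Malabe, so neither TEACHER nor CAMPUS alone determines COURSE, while STUDENT alone already determines COURSE in this data.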
Normalization (contd.)
2nd Normal Form:
• A relation R is in second normal form (2NF) if every
nonprime attribute A in R is not partially dependent on any
key of R.

Example: Not in 2NF

TEACHER CAMPUS COURSE ADDRESS


ABC Metro Database BoC Merchant Tower
XYZ Malabe Database Malabe Campus
Samantha Metro Operating Systems BoC Merchant Tower
ABC Malabe Operating Systems Malabe Campus
Normalization (contd.)
• Lossless join decomposition:
– Decomposition of R into X and Y is lossless-join w.r.t. a
set of FDs F if, for every instance r that satisfies F:
π_X(r) ⋈ π_Y(r) = r

Theorem:

This condition holds if the attributes common to
X and Y contain a key for either X or Y
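The theorem gives a test that needs only the schemas and the FDs, not the data: compute the closure of the common attributes and see whether it covers X or Y. The sketch below assumes the FDs {TEACHER, CAMPUS} → COURSE and CAMPUS → ADDRESS for the first case, and assumes S → {P, D} for the S/R1/R2 example; the function names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """True if the attributes shared by x and y determine all of x or all of y."""
    reachable = closure(set(x) & set(y), fds)
    return set(x) <= reachable or set(y) <= reachable


# Case 1: split off CAMPUS -> ADDRESS (assumed FDs).
fds_teach = [({"TEACHER", "CAMPUS"}, {"COURSE"}), ({"CAMPUS"}, {"ADDRESS"})]
good = is_lossless_binary(
    {"TEACHER", "CAMPUS", "COURSE"}, {"CAMPUS", "ADDRESS"}, fds_teach
)

# Case 2: R1(S, P) and R2(P, D), assuming S is the key of the original relation.
fds_s = [({"S"}, {"P", "D"})]
bad = is_lossless_binary({"S", "P"}, {"P", "D"}, fds_s)
```

In the second case the common attribute P determines nothing else, which matches the spurious tuples seen when R1 and R2 are joined.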
Removing partial
dependency

• Place the attributes that create the partial


dependency in a separate table.

• Make sure that the new table's primary key is left in


the original table.
Normalization (contd.)

TEACHER CAMPUS COURSE ADDRESS


ABC Metro Database BoC Merchant Tower
XYZ Malabe Database Malabe Campus
Samantha Metro Operating Systems BoC Merchant Tower
ABC Malabe Operating Systems Malabe Campus
Normalization (contd.)

Example: After normalized into 2NF

TEACHER CAMPUS COURSE

ABC Metro Database


XYZ Malabe Database
Samantha Metro Operating Systems
ABC Malabe Operating Systems

CAMPUS ADDRESS
Metro BoC Merchant Tower
Malabe Malabe Campus
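A quick way to gain confidence in this decomposition is to project the original rows onto the two new schemas and join them back. This sketch uses plain Python sets of tuples; the variable names are chosen for illustration.

```python
# Original relation (TEACHER, CAMPUS, COURSE, ADDRESS).
original = {
    ("ABC", "Metro", "Database", "BoC Merchant Tower"),
    ("XYZ", "Malabe", "Database", "Malabe Campus"),
    ("Samantha", "Metro", "Operating Systems", "BoC Merchant Tower"),
    ("ABC", "Malabe", "Operating Systems", "Malabe Campus"),
}

# Projections onto the two 2NF relations.
teaches = {(t, c, course) for (t, c, course, _) in original}
campus = {(c, address) for (_, c, _, address) in original}

# Natural join on CAMPUS.
rejoined = {
    (t, c, course, address)
    for (t, c, course) in teaches
    for (c2, address) in campus
    if c == c2
}
```

Each campus address is now stored once, and joining the two relations reproduces exactly the original four tuples.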
Another Example

EMP_PROJ

SSN PNUM HOURS ENAME PNAME PLOC

FD1: {SSN, PNUM} → HOURS
FD2: SSN → ENAME
FD3: PNUM → {PNAME, PLOC}
Normalization (contd.)

EP1 SSN PNUM HOURS

EP2 SSN ENAME

EP3 PNUM PNAME PLOC


Normalization (contd.)

3rd Normal Form:


• A relation R is in 3rd normal
form (3NF) if

– R is in 2NF, and
– No nonprime attribute is
transitively dependent on any
key.
Transitive dependency
Attribute is dependent on another attribute that is not
part of the primary key.
Requires the decomposition of the table containing
the transitive dependency.

EMP_DEPT

ENAME SSN BDATE ADD DNUM DNAME DMGR


Removing transitive
dependency

• Place the attributes that create the transitive


dependency in a separate table.

• Make sure that the new table's primary key attribute


is the foreign key in the original table.
ENAME SSN BDATE ADD DNUM DNAME DMGR

DNUM DNAME DMGR

ENAME SSN BDATE ADD DNUM


Is the table in 3NF? Why?

Remove the transitive dependency
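The question can be answered mechanically with the general 3NF condition: for every FD X → A, either X is a superkey or A is prime. The sketch below assumes the FDs SSN → {ENAME, BDATE, ADD, DNUM} and DNUM → {DNAME, DMGR}, which are implied by the decomposition shown but not spelled out on the slide; the function names are hypothetical. The same check also flags partial dependencies.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(relation, fds, keys):
    """Return (determinant, attribute) pairs that break the 3NF condition."""
    prime = set().union(*keys)
    found = set()
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(relation):
            continue  # the determinant is a superkey, so this FD is fine
        for attr in set(rhs) - set(lhs):
            if attr not in prime:
                found.add((tuple(sorted(lhs)), attr))
    return found


# Assumed FDs for EMP_DEPT (illustrative).
EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADD", "DNUM", "DNAME", "DMGR"}
FD_EMP = ({"SSN"}, {"ENAME", "BDATE", "ADD", "DNUM"})
FD_DEPT = ({"DNUM"}, {"DNAME", "DMGR"})

before = third_nf_violations(EMP_DEPT, [FD_EMP, FD_DEPT], [{"SSN"}])

# After decomposition, each relation keeps only the FD that applies to it.
after_emp = third_nf_violations(
    {"ENAME", "SSN", "BDATE", "ADD", "DNUM"}, [FD_EMP], [{"SSN"}]
)
after_dept = third_nf_violations({"DNUM", "DNAME", "DMGR"}, [FD_DEPT], [{"DNUM"}])
```

Under these assumptions EMP_DEPT is not in 3NF because DNAME and DMGR depend on the nonprime attribute DNUM, while both decomposed relations pass the check.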


Normalization (contd.)

PROPERTY_ID  COUNTY_NAME  LOT#  AREA  PRICE  TAX_RATE

FD1

FD2
FD3
FD4

Keys: PropertyID, (County_Name, Lot#)


Normalization (contd.)

• A relation can always be decomposed into 1NF, 2NF and 3NF
relations while preserving the lossless-join property.
Your Turn !!
Dependency diagram

Identify the dependencies shown in the above diagram

C1->C2 partial dependency


C4 ->C5 transitive dependency
C1,C3 -> C2,C4,C5 functional dependency
• Create a database whose tables are at
least in 2NF, showing the dependency
diagrams for each table.
Create a database whose tables
are at least in 3NF
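The labels in the dependency exercise follow a simple rule of thumb based on how the determinant relates to the key {C1, C3}. The sketch below encodes that rule; it is a simplification for dependency diagrams of this kind, and the function name `classify` is hypothetical.

```python
def classify(determinant, key):
    """Label a dependency by comparing its determinant with the primary key."""
    determinant, key = set(determinant), set(key)
    if determinant == key:
        return "full dependency on the key"
    if determinant < key:
        return "partial dependency"      # depends on only part of the key
    if determinant.isdisjoint(key):
        return "transitive dependency"   # determined by a non-key attribute
    return "mixed determinant"


KEY = {"C1", "C3"}
labels = {
    "C1 -> C2": classify({"C1"}, KEY),
    "C4 -> C5": classify({"C4"}, KEY),
    "C1,C3 -> C2,C4,C5": classify({"C1", "C3"}, KEY),
}
```

Moving the partial dependency C1 → C2 into its own table gives 2NF, and moving C4 → C5 into its own table as well gives 3NF.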
Normalization (contd.)
• Denormalization…

Sometimes, for performance
reasons, the database designer
may leave a relation in a
lower normal form. This
process is known as
denormalization.
Normalization

• Normalization complete 

• Any questions ???


Normalization Flow

UNF
Remove repeating groups

1NF
Remove partial dependencies

2NF

Remove transitive dependencies

3NF

More normalized forms


Your Turn!
Student Results Table
Course  Course            Student  Student  Date of   Tutor  Tutor    Grade  Result
Code    Title             Code     Name     Birth     Code   Name

SYA     Systems Analysis  A2345    Smith    20/08/69  1746   Jones    A      Dist.
                          A7423    Barker   03/04/59  1746   Jones    C      Pass
                          B3472    Green    23/02/70  1330   Jarvis   D      Pass
                          A3472    Harris   17/07/69  1746   Jones    F      Fail
                          B9843    Green    10/11/68  1330   Jarvis   B      Merit

COB     COBOL             A7423    Barker   03/04/69  1520   Hooper   E      Fail
                          A4217    Morris   17/01/68  1520   Hooper   B      Merit
                          B8238    Carter   09/12/69  1520   Hooper   C      Pass

PAS     Pascal            A4217    Morris   17/01/68  1520   Hooper   A      Dist.
                          B9843    Green    10/11/68  1283   Trotter  B      Merit
                          A3393    White    30/09/69  1283   Trotter  E      Fail
                          A4247    Cross    25/12/69  1520   Hooper   C      Pass
Relational Data Analysis

Possible Entities in System

• COURSE
• STUDENT
• TUTOR
• GRADE?
Relational Data
Analysis

Unnormalised Form

• Table made up of ROWS & COLUMNS


• Rows grouped together
• write table in unnormalised form
• choose unique KEY and underline
Relational Data
Analysis
Data Attributes
• Student Code • Tutor Code
• Student Name • Tutor Name
• Date of Birth • Grade
• Result
• All REPEAT for a given
value of COURSE CODE
Relational Data Analysis
Normalisation Table
UNF            LEVEL   1NF   2NF

Course Code    1
Course Title   1
Student Code   2
Student Name   2
Date of Birth  2
Tutor Code     2
Tutor Name     2
Grade          2
Result         2
Relational Data Analysis

First Normal Form

• Any relation is in First Normal Form when


it contains no repeating groups of data
Relational Data Analysis

Course Code enter name of non-


repeating group

SYA
enter values of non-
COB repeating attribute

PAS
Relational Data Analysis

Course Code Course Title

SYA Systems Analysis

COB COBOL

PAS Pascal
Relational Data Analysis
Course Student Student Date of Tutor Tutor Grade Result
Code Code Name Birth Code Name

SYA A2345 Smith 20/08/69 1746 Jones A Dist.


SYA A7423 Barker 03/04/59 1746 Jones C Pass
SYA B3472 Green 23/02/70 1330 Jarvis D Pass
SYA A3472 Harris 17/07/69 1746 Jones F Fail
SYA B9843 Green 10/11/68 1330 Jarvis B Merit

COB A7423 Barker 03/04/69 1520 Hooper E Fail


COB A4217 Morris 17/01/68 1520 Hooper B Merit
COB B8238 Carter 09/12/69 1520 Hooper C Pass

PAS A4217 Morris 17/01/68 1520 Hooper A Dist.


PAS B9843 Green 10/11/68 1283 Trotter B Merit
PAS A3393 White 30/09/69 1283 Trotter E Fail
PAS A4247 Cross 25/12/69 1520 Hooper C Pass
Relational Data Analysis

Normalisation Table
UNF            LEVEL   1NF            2NF   3NF

Course Code    1       Course Code
Course Title   1       Course Title
Student Code   2
Student Name   2       Course Code
Date of Birth  2       Student Code
Tutor Code     2       Student Name
Tutor Name     2       Date of Birth
Grade          2       Tutor Code
Result         2       Tutor Name
                       Grade
                       Result
Relational Data Analysis

Second Normal Form


• Any relation already in 1NF is also in 2NF if
EITHER the key is a single attribute OR the non-
key items are fully dependent on the WHOLE key

• In Second Normal Form, you remove data items


which depend on only part of a key
Relational Data Analysis

What attribute or attributes determine the TUTOR


CODE?

• Course Code
• Student Code
• Course Code + Student Code
Relational Data Analysis

Student Code Student Name

Student Code
Tutor Code
Course Code
Relational Data Analysis

Student Name
Date of Birth
Course Code Tutor Code
Student Code Tutor Name
Grade
Result
Relational Data Analysis

Normalisation Table - 2NF


UNF            LEVEL   1NF            2NF            3NF

Course Code    1       Course Code    Course Code
Course Title   1       Course Title   Course Title
Student Code   2
Student Name   2       Course Code    Course Code
Date of Birth  2       Student Code   Student Code
Tutor Code     2       Student Name   Tutor Code
Tutor Name     2       Date of Birth  Tutor Name
Grade          2       Tutor Code     Grade
Result         2       Tutor Name     Result
                       Grade
                       Result         Student Code
                                      Student Name
                                      Date of Birth
Relational Data Analysis

Third Normal Form


• Any relation in 2NF is also 3NF if all non-key
attributes are independent of all other non-key
attributes and all key attributes are independent
of all the other key attributes

• In Third Normal Form, you remove any attributes


which are not directly dependent upon the key
Relational Data Analysis
Normalisation Table - 3NF
UNF            LEVEL   1NF            2NF            3NF   (* = foreign key)

Course Code    1       Course Code    Course Code    Course Code
Course Title   1       Course Title   Course Title   Course Title
Student Code   2
Student Name   2       Course Code    Course Code    Course Code
Date of Birth  2       Student Code   Student Code   Student Code
Tutor Code     2       Student Name   Tutor Code     *Tutor Code
Tutor Name     2       Date of Birth  Tutor Name     *Grade
Grade          2       Tutor Code     Grade
Result         2       Tutor Name     Result         Student Code
                       Grade                         Student Name
                                                     …
Relational Data Analysis

Summary:
• choose a suitable key from a table of raw data
• identify repeating groups
• write the data in unnormalised form
• convert unnormalised data to first normal form
• convert first normal form to second normal form
• convert second normal form to third normal form
Another Example
• The following report is a User View

• This is a view of the data that is seen by a


particular person at a particular time for a
particular reason

• The example is a Project Cost Report for the


Construction Company

• A user view displays the data that is


necessary for the user to perform a particular
task, in this case it is to review the project cost
User view
Need for normalisation

The above structure matches the report format


Observations
• PROJ_NUM intended to be primary key

• Table entries invite data inconsistencies

• Table displays data anomalies


– Update
• Modifying JOB_CLASS
– Insertion
• New employee must be assigned project
– Deletion
• If employee deleted, other vital data lost
Conversion to 1NF
• Repeating groups must be eliminated
– Proper primary key developed
• Combination of PROJ_NUM and EMP_NUM
– Dependencies can be identified
• Desirable dependencies based on primary key
• Less desirable dependencies
– Partial
» based on part of composite primary key
– Transitive
» one nonprime attribute depends on another nonprime
attribute
Data Organisation: 1NF

Figure 4.3
1NF Summarised
• All key attributes defined

• No repeating groups in table

• All attributes dependent on


primary key
Alternate Method
A Short Cut : Conversion to
2NF
• Start with 1NF format:
• Write each key component on separate line
• Write original key on last line
• Each component is new table
• Write dependent attributes after each key

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

ASSIGN (PROJ_NUM, EMP_NUM, HOURS)


Dependency Diagram

Types of Dependencies
2NF Conversion Results

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM,
EMP_NAME,JOB_CLASS, CHG_HOUR)

ASSIGN (PROJ_NUM, EMP_NUM,


ASSIGN_HOURS)
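The three 2NF relations map directly onto table definitions. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column types, the NOT NULL constraints and the sample rows are assumptions for illustration and do not come from the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT,
    CHG_HOUR  REAL
);
CREATE TABLE ASSIGN (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT(PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE(EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO PROJECT VALUES (1, 'Demo project')")
conn.execute("INSERT INTO EMPLOYEE VALUES (101, 'Demo employee', 'Engineer', 50.0)")
conn.execute("INSERT INTO ASSIGN VALUES (1, 101, 10.0)")

# Rebuild one line of the cost report by joining the three tables.
rows = conn.execute("""
    SELECT p.PROJ_NAME, e.EMP_NAME, a.ASSIGN_HOURS * e.CHG_HOUR AS cost
    FROM ASSIGN a
    JOIN PROJECT p ON p.PROJ_NUM = a.PROJ_NUM
    JOIN EMPLOYEE e ON e.EMP_NUM = a.EMP_NUM
""").fetchall()
```

EMPLOYEE still holds JOB_CLASS and CHG_HOUR together, so any transitive dependency between them remains to be handled in the 3NF step.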
2NF Summarised
• In 1NF
• Includes no partial dependencies
– No attribute dependent on a portion of primary
key
• Still possible to exhibit transitive dependency
– Attributes may be functionally dependent on
nonkey attributes
What is Transitive
Dependency ?
An Example 2NF to 3NF
3NF Summarised

• In 2NF

• Contains no transitive
dependencies
Denormalisation
• Normalisation is one of many database design goals
• Normalised table requirements
– Additional processing
– Loss of system speed
• normalisation purity is difficult to sustain due to
conflict in:
– Design efficiency
– Information requirements
– Processing
Unnormalised Table Defects

• Data updates less efficient


• Indexing more cumbersome
• No simple strategies for creating views
More examples
• Given the following relation and example data:

PartNumber  Description      Supplier          SupplierAddress  Price ($)
10010       20 GB Disk       Seagate           Cupertino, CA    100
10010       20 GB Disk       IBM               Armonk, NY       90
10220       256 MB RAM card  Kensington        San Mateo, CA    220
10220       256 MB RAM card  IBM               Armonk, NY       290
10220       256 MB RAM card  Sun Microsystems  Palo Alto, CA    310
• List the functional dependencies and
normalise the data to 3NF.
Conclusion
• Quality of the relations derived from ER models is unknown.

• Normalization is a systematic process of either assessing or


converting these relations into progressively stricter normal
forms.

• Advanced normal forms such as Boyce-Codd normal form
(BCNF), 4NF and 5NF exist
