
Database Systems 7

The document discusses normalization and normal forms. It defines normalization as developing a faithful and flexible data model that reduces redundancy and anomalies like insertion, deletion, and update anomalies. It discusses first, second, third normal forms and beyond. Functional dependencies and keys are also defined. The goal of normalization is to organize data into tables and relations without anomalies.

Uploaded by

Tayseer

Database Systems

Lecture
Normalization
Instructor
Muhammad Alyas Shahid
Objectives of Normalization
• Develop a good description of the data, its
relationships and constraints
• Produce a stable set of relations that
• Is a faithful model of the enterprise
• Is highly flexible
• Reduces redundancy, which saves space and reduces
inconsistency in data
• Is free of update, insertion and deletion anomalies
Anomalies
• An anomaly is an inconsistent, incomplete, or
contradictory state of the database
• Insertion anomaly – user is unable to insert a new
record when it should be possible to do so
• Deletion anomaly – when a record is deleted, other
information that is tied to it is also deleted
• Update anomaly – a record is updated, but other
appearances of the same items are not updated
Anomaly Examples: NewClass Table
courseNo  stuId  stuLastName  facId  schedule  room  grade
ART103A   S1001  Smith        F101   MWF9      H221  A
ART103A   S1010  Burns        F101   MWF9      H221
ART103A   S1006  Lee          F101   MWF9      H221  B
CSC201A   S1003  Jones        F105   TUTHF10   M110  A
CSC201A   S1006  Lee          F105   TUTHF10   M110  C
HST205A   S1001  Smith        F202   MWF11     H221

Figure 5.1 The NewClass Table

NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade)

• Update anomaly: If the schedule of ART103A is updated in the first record
but not in the second and third, the data become inconsistent
• Deletion anomaly: If the records of student S1001 are deleted, the information
about the HST205A class is lost as well, because S1001 is its only student
• Insertion anomaly: It is not possible to add a new class, such as MTH101A,
even if its teacher, schedule, and room are known, unless a
student is registered for it, because the key contains stuId
Normal Forms
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• Fourth normal form (4NF)
• Fifth normal form (5NF)
• Domain/Key normal form (DKNF)

Each form is contained within the previous one, and each
has stricter rules than the previous form
Design objective: put the schema in the highest normal form
that is practical and appropriate for the data in
the database
Normal Forms (cont.)
[Figure: nested regions, from outermost to innermost: All Relations, 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF]
Functional Dependency
• A functional dependency (FD) is a type of
relationship between attributes
• If A and B are sets of attributes of relation R, we say B
is functionally dependent on A if each A value in R
has associated with it exactly one value of B in R.
• Alternatively, if two tuples have the same A values,
they must also have the same B values
• Write A→B, read A functionally determines B, or
B functionally dependent on A
• FD is actually a many-to-one relationship between
A and B
Example of FDs
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee CSC 15 Freshman 088520876
S1010 Burns Art 63 Junior 099320985
S1060 Jones CSC 25 Freshman 064624738
Figure 5.3 NewStudent Table (assume each student has only one major)

• Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)
• FDs in R include
{stuId}→{lastName}, but not the reverse
{stuId} →{lastName, major, credits, status, socSecNo, stuId}
{socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
{credits}→{status}, but not {status}→{credits}
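
The FDs listed above can be spot-checked against the rows of Figure 5.3. The following is a minimal sketch; the helper name fd_holds is an assumption made for illustration and is not from the lecture. A table instance can only refute an FD: a result of True means the sample rows are consistent with the FD, not that the FD is guaranteed by the design.

```python
# Rows of the NewStudent table from Figure 5.3 (values copied from the slide).
COLUMNS = ("stuId", "lastName", "major", "credits", "status", "socSecNo")
NEW_STUDENT = [
    dict(zip(COLUMNS, row))
    for row in [
        ("S1001", "Smith", "History", 90, "Senior", "100429500"),
        ("S1003", "Jones", "Math", 95, "Senior", "010124567"),
        ("S1006", "Lee", "CSC", 15, "Freshman", "088520876"),
        ("S1010", "Burns", "Art", 63, "Junior", "099320985"),
        ("S1060", "Jones", "CSC", 25, "Freshman", "064624738"),
    ]
]


def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on every lhs attribute but differ on rhs.

    fd_holds is a hypothetical helper name used only for this illustration.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(NEW_STUDENT, ["stuId"], ["lastName"]))  # True
print(fd_holds(NEW_STUDENT, ["lastName"], ["stuId"]))  # False: two students named Jones
print(fd_holds(NEW_STUDENT, ["credits"], ["status"]))  # True
print(fd_holds(NEW_STUDENT, ["status"], ["credits"]))  # False: Seniors with 90 and 95 credits
```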
Trivial Functional Dependency
• The FD X→Y is trivial if Y is a subset of X

Examples: If A and B are attributes of R,
{A}→{A}
{A,B}→{A}
{A,B}→{B}
{A,B}→{A,B}
are all trivial FDs
Keys
• Superkey – functionally determines all
attributes in a relation
– Superkeys of NewStudent: {stuId}, {stuId, lastName},
{stuId, any other attribute}, {socSecNo, any other
attribute}
• Candidate key - superkey that is a minimal
identifier (no extraneous attributes)
– Candidate keys of NewStudent: {stuId}, {socSecNo}
– If no two students are permitted to have the same
combination of name and major values, {lastName,
major} would be a candidate key
– A relation may have several candidate keys
Keys (cont)
• Primary key - candidate key actually used to
identify tuples in a relation
– Has no null values
– Values are unique within the relation
– {stuId} is the primary key
• Chosen because some international students may not
have a social security number, and there are some security
issues with social security numbers.
• Should also enforce uniqueness and no-null rule
for candidate keys
First Normal Form
• A relation is in 1NF if and only if every
attribute is single-valued for each tuple
– Each cell of the table has only one value in it
– Domains of attributes are atomic: no sets, lists,
repeating fields or groups allowed in domains
Counter-Example for 1NF
Stuid  lastName  major         credits  status    socSecNo
S1001  Smith     History       90       Senior    100429500
S1003  Jones     Math          95       Senior    010124567
S1006  Lee       Math, CSC     15       Freshman  088520876
S1010  Burns     English, Art  63       Junior    099320985
S1060  Jones     CSC           25       Freshman  064624738

Figure 5.4(a) NewStu Table (Assume students can have double majors)

• NewStu(stuId, lastName, major, credits, status, socSecNo),
where students can have more than one major
• The major attribute is not single-valued for each
tuple, so NewStu is not in 1NF
Ensuring 1NF
• Best solution: For each multi-valued attribute,
create a new table, in which you place the key
of the original table and the multi-valued
attribute. Keep the original table, with its key
Ex. NewStu2(stuId, lastName, credits, status, socSecNo)
Majors(stuId, major)
NewStu2
Stuid lastName credits status socSecNo
S1001 Smith 90 Senior 100429500
S1003 Jones 95 Senior 010124567
S1006 Lee 15 Freshman 088520876
S1010 Burns 63 Junior 099320985
S1060 Jones 25 Freshman 064624738

Majors
stuId  major
S1001  History
S1003  Math
S1006  CSC
S1006  Math
S1010  Art
S1010  English
S1060  CSC

Figure 5.4(b) NewStu2 Table and Majors Table
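
The split into NewStu2 and Majors can be sketched in a few lines. This is only an illustration: the in-memory layout (a Python list holding the majors of each student) is an assumption used to model the multi-valued cell of Figure 5.4(a).

```python
# NewStu from Figure 5.4(a); the list in position 2 models the multi-valued major cell.
new_stu = [
    ("S1001", "Smith", ["History"], 90, "Senior", "100429500"),
    ("S1003", "Jones", ["Math"], 95, "Senior", "010124567"),
    ("S1006", "Lee", ["Math", "CSC"], 15, "Freshman", "088520876"),
    ("S1010", "Burns", ["English", "Art"], 63, "Junior", "099320985"),
    ("S1060", "Jones", ["CSC"], 25, "Freshman", "064624738"),
]

# Original table without the multi-valued attribute; stuId remains its key.
new_stu2 = [
    (stu_id, last_name, credits, status, ssn)
    for stu_id, last_name, _majors, credits, status, ssn in new_stu
]

# New table: key of the original table plus one row per major value.
majors = sorted(
    (stu_id, major)
    for stu_id, _last_name, major_list, *_rest in new_stu
    for major in major_list
)

for row in majors:
    print(row)
```

Every cell in both result tables holds a single value, and the key of Majors is the pair (stuId, major).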
Another method for 1NF
• “Flatten” the original table by making the
multi-valued attribute part of the key
Student(stuId, lastName, major, credits,
status, socSecNo)
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee CSC 15 Freshman 088520876
S1006 Lee Math 15 Freshman 088520876
S1010 Burns Art 63 Junior 099320985
S1010 Burns English 63 Junior 099320985
S1060 Jones CSC 25 Freshman 064624738

Figure 5.4(d) NewStu Table Rewritten in 1NF, with {stuId, major} as primary key
Yet Another Method
• If the number of repeats is limited,
make additional columns for multiple
values
Student(stuId, lastName, major1, major2, credits, status, socSecNo)

Stuid  lastName  major1   major2   credits  status    socSecNo
S1001  Smith     History           90       Senior    100429500
S1003  Jones     Math              95       Senior    010124567
S1006  Lee       CSC      Math     15       Freshman  088520876
S1010  Burns     Art      English  63       Junior    099320985
S1060  Jones     CSC               25       Freshman  064624738

Figure 5.4(c) NewStu3 Table with two attributes for major


Full Functional Dependency
• In relation R, set of attributes B is fully
functionally dependent on set of attributes A
of R if B is functionally dependent on A but
not functionally dependent on any proper
subset of A
• This means every attribute in A is needed to
functionally determine B
Partial Functional Dependency Example
NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade)

FDs:
{courseNo,stuId} → {stuLastName}
{courseNo,stuId} → {facId}
{courseNo,stuId} → {schedule}
{courseNo,stuId} → {room}
{courseNo,stuId} → {grade}
courseNo → facId ** partial FD
courseNo → schedule ** partial FD
courseNo → room ** partial FD
stuId → stuLastName ** partial FD
…plus trivial FDs that are partial…
Second Normal Form
• A relation is in second normal form (2NF) if it
is in first normal form and all the non-key
attributes are fully functionally dependent on
the key.
• No non-key attribute is FD on just part of the
key
• If key has only one attribute, and R is 1NF, R is
automatically 2NF
Converting to 2NF
• Identify each partial FD
• Remove the attributes that depend on each of the
determinants so identified
• Place these determinants in separate relations along
with their dependent attributes
• In original relation keep the composite key and any
attributes that are fully functionally dependent on all
of it
• Even if the composite key has no dependent
attributes, keep that relation to connect logically the
others
2NF Example
NewClass(courseNo, stuId, stuLastName, facId, schedule, room, grade)

FDs grouped by determinant:
{courseNo} → {courseNo, facId, schedule, room}
{stuId} → {stuId, stuLastName}
{courseNo, stuId} → {courseNo, stuId, facId, schedule,
room, stuLastName, grade}

Create tables grouped by determinants:
Class2(courseNo, facId, schedule, room)
Stu(stuId, stuLastName)
Keep a relation with the original composite key, with the attributes that are FD on it, if any:
Register(courseNo, stuId, grade)
2NF Example
First Normal Form Relation

NewClass
courseNo  stuId  stuLastName  facId  schedule  room  grade
ART103A   S1001  Smith        F101   MWF9      H221  A
ART103A   S1010  Burns        F101   MWF9      H221
ART103A   S1006  Lee          F101   MWF9      H221  B
CSC201A   S1003  Jones        F105   TUTHF10   M110  A
CSC201A   S1006  Lee          F105   TUTHF10   M110  C
HST205A   S1001  Smith        F202   MWF11     H221

Second Normal Form Relations

Register
courseNo  stuId  grade
ART103A   S1001  A
ART103A   S1010
ART103A   S1006  B
CSC201A   S1003  A
CSC201A   S1006  C
HST205A   S1001

Stu
stuId  stuLastName
S1001  Smith
S1010  Burns
S1006  Lee
S1003  Jones

Class2
courseNo  facId  schedule  room
ART103A   F101   MWF9      H221
CSC201A   F105   TUTHF10   M110
HST205A   F202   MWF11     H221
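
The 2NF conversion is a set of projections of NewClass, one per determinant. Below is a minimal sketch; the helper name project is an assumption for illustration, and None stands for the empty grade cells.

```python
COLUMNS = ("courseNo", "stuId", "stuLastName", "facId", "schedule", "room", "grade")
NEW_CLASS = [
    ("ART103A", "S1001", "Smith", "F101", "MWF9", "H221", "A"),
    ("ART103A", "S1010", "Burns", "F101", "MWF9", "H221", None),
    ("ART103A", "S1006", "Lee", "F101", "MWF9", "H221", "B"),
    ("CSC201A", "S1003", "Jones", "F105", "TUTHF10", "M110", "A"),
    ("CSC201A", "S1006", "Lee", "F105", "TUTHF10", "M110", "C"),
    ("HST205A", "S1001", "Smith", "F202", "MWF11", "H221", None),
]


def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate rows."""
    positions = [columns.index(name) for name in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:  # a relation has no duplicate tuples
            result.append(projected)
    return result


register = project(NEW_CLASS, COLUMNS, ("courseNo", "stuId", "grade"))
stu = project(NEW_CLASS, COLUMNS, ("stuId", "stuLastName"))
class2 = project(NEW_CLASS, COLUMNS, ("courseNo", "facId", "schedule", "room"))

print(len(register), len(stu), len(class2))  # 6 4 3
```

Each course's faculty, schedule, and room are now stored once, so the update anomaly described for NewClass cannot occur in Class2.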


Transitive Dependency
• If A, B, and C are attributes of relation R, such that A
→ B, and B → C, then C is transitively dependent on
A

Example:
NewStudent (stuId, lastName, major, credits, status)
FD:
credits→status

By transitivity:
stuId→credits and credits→status imply stuId→status

Transitive dependencies cause update, insertion, deletion anomalies.


Third Normal Form
• A relation is in third normal form (3NF) if whenever
a non-trivial functional dependency X→A exists, then
either X is a superkey or A is a member of some
candidate key
• To be 3NF, relation must be 2NF and have no
transitive dependencies
• No non-key attribute determines another non-key
attribute. Here "key" means any candidate key
Making a relation 3NF
• For example,
NewStudent (stuId, lastName, major, credits, status)
with FD credits→status

• Remove the dependent attribute, status, from the relation


• Create a new table with the dependent attribute and its
determinant, credits
• Keep the determinant in the original table

NewStu2 (stuId, lastName, major, credits)


Stats (credits, status)
Example Transitive Dependency
NewStudent
Stuid lastName Major Credits Status
S1001 Smith History 90 Senior
S1003 Jones Math 95 Senior
S1006 Lee CSC 15 Freshman
S1010 Burns Art 63 Junior
S1060 Jones CSC 25 Freshman

Transitive Dependency

NewStu2
Stuid  lastName  Major    Credits
S1001  Smith     History  90
S1003  Jones     Math     95
S1006  Lee       CSC      15
S1010  Burns     Art      63
S1060  Jones     CSC      25

Stats
Credits  Status
15       Freshman
25       Freshman
63       Junior
90       Senior
95       Senior

Removed Transitive Dependency


Boyce-Codd Normal Form
• A relation is in Boyce-Codd Normal Form
(BCNF) if whenever a non-trivial functional
dependency X→A exists, then X is a superkey
• Stricter than 3NF, which allows A to be part of
a candidate key
• If there is just one single candidate key, the
forms are equivalent
Example
NewFac (facName, dept, office, rank, dateHired)

FDs:
office → dept
facName,dept → office, rank, dateHired
facName,office → dept, rank, dateHired

• NewFac is not BCNF because office → dept holds and office is not a superkey
• To make it BCNF, move the dependent attribute to a new relation, with the determinant as
the key
• Project into
Fac1 (office, dept)
Fac2 (facName, office, rank, dateHired)

Note that a functional dependency has been lost: {facName, dept} → office can no longer be checked
within a single relation, since facName and dept are now in different relations
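
The BCNF test can be automated with attribute closure: a non-trivial FD violates BCNF when the closure of its determinant does not cover the whole relation. The sketch below uses hypothetical helper names (fd, closure, bcnf_violations) and checks only the FDs that are listed, which is enough for this example.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Listed non-trivial FDs whose determinant is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


NEW_FAC = frozenset("facName dept office rank dateHired".split())
FDS = [
    fd("office", "dept"),
    fd("facName dept", "office rank dateHired"),
    fd("facName office", "dept rank dateHired"),
]

for lhs, rhs in bcnf_violations(NEW_FAC, FDS):
    print(sorted(lhs), "->", sorted(rhs))  # ['office'] -> ['dept']

# After projection, neither Fac1 nor Fac2 has a violating FD.
print(bcnf_violations(frozenset({"office", "dept"}), [fd("office", "dept")]))  # []
print(bcnf_violations(
    frozenset("facName office rank dateHired".split()),
    [fd("facName office", "rank dateHired")],
))  # []
```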
Example Boyce-Codd Normal Form
Faculty
facName dept office rank dateHired
Adams Art A101 Professor 1975
Byrne Math M201 Assistant 2000
Davis Art A101 Associate 1992
Gordon Math M201 Professor 1982
Hughes Math M203 Associate 1990
Smith CSC C101 Professor 1980
Smith History H102 Associate 1990
Tanaka CSC C101 Instructor 2001
Vaughn CSC C101 Associate 1995

Fac1
office  dept
A101    Art
C101    CSC
C105    CSC
H102    History
M201    Math
M203    Math

Fac2
facName  office  rank        dateHired
Adams    A101    Professor   1975
Byrne    M201    Assistant   2000
Davis    A101    Associate   1992
Gordon   M201    Professor   1982
Hughes   M203    Associate   1990
Smith    C101    Professor   1980
Smith    H102    Associate   1990
Tanaka   C101    Instructor  2001
Vaughn   C101    Associate   1995
Normalization Example
• Relation that stores information about projects in large
business
– Work (projName, projMgr, empId, hours, empName, budget,
startDate, salary, empMgr, empDept, rating)
projName  projMgr  empId  hours  empName  budget  startDate  salary  empMgr  empDept  rating
Jupiter   Smith    E101   25     Jones    100000  01/15/04   60000   Levine  10       9
Jupiter   Smith    E105   40     Adams    100000  01/15/04   55000   Jones   12
Jupiter   Smith    E110   10     Rivera   100000  01/15/04   43000   Levine  10       8
Maxima    Lee      E101   15     Jones    200000  03/01/04   60000   Levine  10
Maxima    Lee      E110   30     Rivera   200000  03/01/04   43000   Levine  10
Maxima    Lee      E120   15     Tanaka   200000  03/01/04   45000   Jones   15
Normalization Example (cont)
1. Each project has a unique name.
2. Although project names are unique, names of employees and managers
are not.
3. Each project has one manager, whose name is stored in projMgr.
4. Many employees can be assigned to work on each project, and an
employee can be assigned to more than one project. The attribute hours
tells the number of hours per week a particular employee is assigned to
work on a particular project.
5. budget stores the amount budgeted for a project, and startDate gives
the starting date for a project.
6. salary gives the annual salary of an employee.
7. empMgr gives the name of the employee’s manager, who might not be the
same as the project manager.
8. empDept gives the employee’s department. Department names are
unique. The employee’s manager is the manager of the employee’s
department.
9. rating gives the employee’s rating for a particular project. The project
manager assigns the rating at the end of the employee’s work on the
project.
Normalization Example (cont)
• Functional dependencies
– projName → projMgr, budget, startDate
– empId → empName, salary, empMgr, empDept
– projName, empId → hours, rating
– empDept → empMgr
– empMgr does not functionally determine empDept, since
people's names are not unique (different managers may have
the same name and manage different departments, or a manager may
manage more than one department)
– projMgr does not determine projName
• Primary Key
– {projName, empId}, since every attribute is functionally dependent on
that combination
Normalization Example (cont)
• First Normal Form
– Each cell is single-valued and the relation has a primary key,
so Work is in 1NF
• Second Normal Form
– Partial dependencies
• projName → projMgr, budget, startDate
• empId → empName, salary, empMgr, empDept
– Transform to
• Proj (projName, projMgr, budget, startDate)
• Emp (empId, empName, salary, empMgr, empDept)
• Work1 (projName, empId, hours, rating)
Normalization Example (cont)
Second Normal Form
Proj
projName  projMgr  budget  startDate
Jupiter   Smith    100000  01/15/04
Maxima    Lee      200000  03/01/04

Work1
projName  empId  hours  rating
Jupiter   E101   25     9
Jupiter   E105   40
Jupiter   E110   10     8
Maxima    E101   15
Maxima    E110   30
Maxima    E120   15

Emp
empId  empName  salary  empMgr  empDept
E101   Jones    60000   Levine  10
E105   Adams    55000   Jones   12
E110   Rivera   43000   Levine  10
E120   Tanaka   45000   Jones   15
Normalization Example (cont)
• Third Normal Form
– Proj in 3NF – no non-key attribute functionally
determines another non-key attribute
– Work1 in 3NF – no transitive dependency
involving hours or rating
– Emp not in 3NF – transitive dependency
• empDept → empMgr and empDept is not a
superkey, nor is empMgr part of a candidate key
• Need two relations
– Emp1 (empId, empName, salary, empDept)
– Dept (empDept, empMgr)
Normalization Example (cont)
Third Normal Form

Emp1
empId  empName  salary  empDept
E101   Jones    60000   10
E105   Adams    55000   12
E110   Rivera   43000   10
E120   Tanaka   45000   15

Dept
empDept  empMgr
10       Levine
12       Jones
15       Jones

Proj
projName  projMgr  budget  startDate
Jupiter   Smith    100000  01/15/04
Maxima    Lee      200000  03/01/04

Work1
projName  empId  hours  rating
Jupiter   E101   25     9
Jupiter   E105   40
Jupiter   E110   10     8
Maxima    E101   15
Maxima    E110   30
Maxima    E120   15

This is also BCNF since the only determinant
in each relation is the primary key
Properties of Decompositions
• Starting with a universal relation that contains all the
attributes, we can decompose into relations by
projection
• A decomposition of a relation R is a set of relations
{R1,R2,...,Rn} such that each Ri is a subset of R and the
union of all of the Ri is R.
• Desirable properties of decompositions
– Attribute preservation - every attribute is in some relation
– Dependency preservation - see previous example
– Lossless decomposition - discussed later
Dependency Preservation
• If R is decomposed into {R1, R2, …, Rn} so that
for each functional dependency X→Y all the
attributes in X ∪ Y appear in the same
relation, Ri, then all FDs are preserved
• Allows DBMS to check each FD constraint by
checking just one table for each
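
The condition above is sufficient but not necessary: an FD whose attributes are split across relations may still follow from FDs that can be checked inside single relations. The sketch below uses the standard closure-based preservation test, which goes slightly beyond the criterion stated on the slide; the helper names fd, closure, and preserved are assumptions for illustration. It is applied to the earlier NewFac decomposition.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(dependency, fds, decomposition):
    """True if dependency follows from the FDs checkable within single relations."""
    lhs, rhs = dependency
    reachable = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            # Attributes of this relation determined by what is already reachable in it.
            gained = (closure(reachable & relation, fds) & relation) - reachable
            if gained:
                reachable |= gained
                changed = True
    return rhs <= reachable


FDS = [
    fd("office", "dept"),
    fd("facName dept", "office rank dateHired"),
    fd("facName office", "dept rank dateHired"),
]
FAC1 = frozenset({"office", "dept"})
FAC2 = frozenset({"facName", "office", "rank", "dateHired"})

for dependency in FDS:
    print(sorted(dependency[0]), preserved(dependency, FDS, [FAC1, FAC2]))
# ['office'] True
# ['dept', 'facName'] False
# ['facName', 'office'] True
```

Only the FD with determinant {facName, dept} is lost, which matches the note on the BCNF example slide.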
Multi-valued Dependency
• In R(A,B,C) if each A value has associated with it a
set of B values and a set of C values such that the B
and C values are independent of each other, then A
multi-determines B and A multi-determines C
• Multi-valued dependencies occur in pairs
• Example: JointAppoint(facId, dept, committee)
assuming a faculty member can belong to more than
one department and belong to more than one
committee
• Table must list all combinations of values of
department and committee for each facId
4NF
• A table is 4NF if it is BCNF and has no non-trivial
multi-valued dependencies
• Example: remove MVDs in JointAppoint
Appoint1(facId,dept)
Appoint2(facId,committee)
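
The "all combinations" requirement can be checked on a table instance. The sketch below uses made-up sample rows (the lecture gives no data for JointAppoint) and a hypothetical helper name, mvd_holds; it then shows the 4NF projections.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample rows for JointAppoint(facId, dept, committee).
JOINT_APPOINT = [
    ("F101", "CSC", "Budget"),
    ("F101", "CSC", "Curriculum"),
    ("F101", "Math", "Budget"),
    ("F101", "Math", "Curriculum"),
    ("F221", "Art", "Library"),
]


def mvd_holds(rows):
    """Check A multi-determines B (and C) for rows of the form (a, b, c).

    For each a value, the (b, c) pairs present must be exactly the
    Cartesian product of its b values and its c values.
    """
    pairs_by_a = defaultdict(set)
    for a, b, c in rows:
        pairs_by_a[a].add((b, c))
    return all(
        pairs == set(product({b for b, _ in pairs}, {c for _, c in pairs}))
        for pairs in pairs_by_a.values()
    )


print(mvd_holds(JOINT_APPOINT))      # True
print(mvd_holds(JOINT_APPOINT[:3]))  # False: (Math, Curriculum) is missing for F101

# 4NF projections: each fact is stored once instead of once per combination.
appoint1 = sorted({(fac, dept) for fac, dept, _ in JOINT_APPOINT})
appoint2 = sorted({(fac, committee) for fac, _, committee in JOINT_APPOINT})
print(appoint1)
print(appoint2)
```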
Example of Lossy Projection
Original EmpRoleProj table:

EmpName role projName


Smith designer Nile
Smith programmer Amazon
Smith designer Amazon
Jones designer Amazon

Project into
Table a
EmpName  role
Smith    designer
Smith    programmer
Jones    designer

Table b
role        projName
designer    Nile
programmer  Amazon
designer    Amazon

Joining Table a and Table b produces

EmpName  role        projName
Smith    designer    Nile
Smith    designer    Amazon
Smith    programmer  Amazon
Jones    designer    Nile      ← spurious tuple
Jones    designer    Amazon
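
The lossy projection can be reproduced directly with set operations. This is a small sketch using the rows of the EmpRoleProj example; the variable names are chosen for illustration.

```python
EMP_ROLE_PROJ = {
    ("Smith", "designer", "Nile"),
    ("Smith", "programmer", "Amazon"),
    ("Smith", "designer", "Amazon"),
    ("Jones", "designer", "Amazon"),
}

# Project onto (EmpName, role) and (role, projName).
table_a = {(emp, role) for emp, role, _proj in EMP_ROLE_PROJ}
table_b = {(role, proj) for _emp, role, proj in EMP_ROLE_PROJ}

# Natural join on the common attribute, role.
joined = {
    (emp, role_a, proj)
    for emp, role_a in table_a
    for role_b, proj in table_b
    if role_a == role_b
}

spurious = joined - EMP_ROLE_PROJ
print(len(joined))  # 5
print(spurious)     # {('Jones', 'designer', 'Nile')}
```

The join returns every original tuple plus one that was never in EmpRoleProj, so the original relation cannot be recovered from the two projections.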
Lossless Decomposition
• A decomposition of R into {R1, R2,....,Rn} is
lossless if the natural join of R1, R2,...,Rn
produces exactly the relation R
• No spurious tuples are created when the
projections are joined.
• Always possible to find a BCNF decomposition
that is lossless
Lossless Projections
• Lossless property guaranteed if for each pair of relations that
will be joined, the set of common attributes is a superkey of
one of the relations
• Binary decomposition of R into {R1,R2} lossless iff one of these
holds
R1 ∩ R2 → R1 - R2
or
R1 ∩ R2 → R2 - R1
• For n relations in decomposition, test by general Algorithm to
Test for Lossless Join, Section 5.6.3
• If projection is done by successive binary projections, can
apply binary decomposition test repeatedly
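
The binary test can be written with attribute closure: the common attributes must determine everything that is in only one of the two relations. A minimal sketch follows; the helper names fd, closure, and lossless_binary are assumptions for illustration.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) → (R1 - R2) or (R1 ∩ R2) → (R2 - R1) follows from fds."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


# NewFac split into Fac1 and Fac2: the common attribute office determines dept.
fac_fds = [
    fd("office", "dept"),
    fd("facName dept", "office rank dateHired"),
    fd("facName office", "dept rank dateHired"),
]
fac1 = frozenset({"office", "dept"})
fac2 = frozenset({"facName", "office", "rank", "dateHired"})
print(lossless_binary(fac1, fac2, fac_fds))  # True

# EmpRoleProj split on role, with no FDs: role determines neither side.
table_a = frozenset({"empName", "role"})
table_b = frozenset({"role", "projName"})
print(lossless_binary(table_a, table_b, []))  # False
```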
Algorithm to Test for Lossless Join
• Given a relation R(A1, A2, …, An), a set of functional dependencies, F, and
a decomposition of R into relations R1, R2, …, Rm, determine
whether the decomposition has a lossless join:
– Construct an m by n table, S, with a column for each of the n attributes in
R and a row for each of the m relations in the decomposition
– For each cell S(i,j) of S,
• If the attribute for the column, Aj, is in the relation for the row, Ri, then place
the symbol a(j) in the cell; otherwise place the symbol b(i,j) there
– Repeat the following process until no more changes can be made
to S: for each FD X → Y in F
• For all rows in S that have the same symbols in the columns corresponding to
the attributes of X, make the symbols for the columns that represent attributes
of Y equal by the following rule:
– If any row has an a value, a(j), then set the value of that column in all the other rows equal to a(j)
– If no row has an a value, then pick any one of the b values, say b(i,j), and set all the other rows
equal to b(i,j)

– If, after all possible changes have been made to S, a row is made up
entirely of a symbols, a(1), a(2), …, a(n), then the join is lossless. If
there is no such row, the join is lossy.
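
The table-based test can be sketched as follows. The function and helper names are assumptions for illustration. One detail differs from the wording on the slide: when two symbols are made equal, this sketch replaces the old symbol everywhere in that column, not only in the matching rows, which is the usual way the procedure is implemented.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def lossless_join(attributes, decomposition, fds):
    """Table test for a lossless join of an m-way decomposition."""
    # Cell is ("a", attr) if the attribute is in relation i, else ("b", i, attr).
    table = [
        {a: ("a", a) if a in rel else ("b", i, a) for a in attributes}
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that carry the same symbols in all lhs columns.
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in rows}
                    if len(symbols) > 1:
                        # Prefer the a symbol; otherwise pick one b symbol.
                        target = ("a", a) if ("a", a) in symbols else min(symbols)
                        for row in table:
                            if row[a] in symbols:
                                row[a] = target
                        changed = True
    return any(all(row[a] == ("a", a) for a in attributes) for row in table)


WORK = "projName projMgr empId hours empName budget startDate salary empMgr empDept rating".split()
WORK_FDS = [
    fd("projName", "projMgr budget startDate"),
    fd("empId", "empName salary empMgr empDept"),
    fd("projName empId", "hours rating"),
    fd("empDept", "empMgr"),
]
PROJ = {"projName", "projMgr", "budget", "startDate"}
EMP = {"empId", "empName", "salary", "empMgr", "empDept"}
WORK1 = {"projName", "empId", "hours", "rating"}

print(lossless_join(WORK, [PROJ, EMP, WORK1], WORK_FDS))  # True
print(lossless_join(
    ["empName", "role", "projName"],
    [{"empName", "role"}, {"role", "projName"}],
    [],
))  # False
```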
5NF and DKNF
• A relation is 5NF if there are no remaining
non-trivial lossless projections
• A relation is in Domain-Key Normal Form
(DKNF) if every constraint is a logical
consequence of domain constraints or key
constraints
– "Every constraint" covers the usual
constraints (functional dependencies, etc.) as
well as general constraints specific to the
data, e.g., the first digit of stuId indicates student
status (1 for <30 hours, 2 for 30 to 59, …)
De-normalization
• When to stop the normalization process
– When applications require too many joins
– When you cannot get a non-loss decomposition
that preserves dependencies
Inference Rules for FDs
• Armstrong’s Axioms
– Reflexivity If B is a subset of A, then A → B
– Augmentation If A → B, then AC → BC.
– Transitivity If A → B and B → C, then A → C
Additional rules that follow:
– Additivity If A → B and A → C, then A → BC
– Projectivity If A → BC, then A → B and A → C
– Pseudotransitivity If A → B and CB → D, then AC → D
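
As a worked derivation, each additional rule follows from the three axioms in a few steps. The layout below is one possible presentation, with concatenation such as AB standing for the union of the attribute sets.

```latex
\begin{align*}
\textbf{Additivity:}\quad
  & A \to B \;\Rightarrow\; A \to AB && \text{(augment with } A\text{, since } AA = A\text{)} \\
  & A \to C \;\Rightarrow\; AB \to BC && \text{(augment with } B\text{)} \\
  & A \to AB,\; AB \to BC \;\Rightarrow\; A \to BC && \text{(transitivity)} \\[6pt]
\textbf{Projectivity:}\quad
  & BC \to B,\; BC \to C && \text{(reflexivity)} \\
  & A \to BC,\; BC \to B \;\Rightarrow\; A \to B && \text{(transitivity)} \\
  & A \to BC,\; BC \to C \;\Rightarrow\; A \to C && \text{(transitivity)} \\[6pt]
\textbf{Pseudotransitivity:}\quad
  & A \to B \;\Rightarrow\; AC \to BC && \text{(augment with } C\text{)} \\
  & AC \to BC,\; CB \to D \;\Rightarrow\; AC \to D && \text{(transitivity)}
\end{align*}
```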
Closure of Set of FDs
• If F is a set of functional dependencies for a
relation R, then the set of all functional
dependencies that can be derived from F, F+, is
called the closure of F
• Could compute closure by applying
Armstrong’s Axioms repeatedly
Closure of an Attribute
• If A is an attribute or set of attributes of relation R,
all the attributes in R that are functionally
dependent on A in R form the closure of A, A+
• Computed by Closure Algorithm for A, Section
5.7.3
result ← A;
while (result changes) do
for each functional dependency B → C in F
if B is contained in result then result ← result ∪ C;
end;
A+ ← result;
Uses of Attribute Closure
• Can determine if A is a superkey: it is one if every
attribute in R is functionally dependent on A, i.e. A+ contains all of R
• Can determine whether a given FD X→Y is in
the closure of the set of FDs. (Find X+, see if it
includes Y)
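
The closure algorithm and its two uses can be sketched as follows, using the FDs of the Work relation from the normalization example. The helper names fd, closure, is_superkey, and implies are assumptions for illustration.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """A+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:  # "while (result changes)" in the pseudocode
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)


def implies(fds, lhs, rhs):
    """X → Y is in F+ exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, fds)


WORK = "projName projMgr empId hours empName budget startDate salary empMgr empDept rating".split()
WORK_FDS = [
    fd("projName", "projMgr budget startDate"),
    fd("empId", "empName salary empMgr empDept"),
    fd("projName empId", "hours rating"),
    fd("empDept", "empMgr"),
]

print(sorted(closure({"empId"}, WORK_FDS)))
# ['empDept', 'empId', 'empMgr', 'empName', 'salary']
print(is_superkey({"projName", "empId"}, WORK, WORK_FDS))  # True
print(is_superkey({"empId"}, WORK, WORK_FDS))              # False
print(implies(WORK_FDS, {"empMgr"}, {"empDept"}))          # False
```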
Redundant FDs and Covers
• Given a set of FDs, can determine if any of
them is redundant, i.e. can be derived from
the remaining FDs, by a simple algorithm –
see Section 5.7.4
• If a relation R has two sets of FDs, F and G
– then F is a cover for G if every FD in G is also in
F+
– F and G are equivalent if F is a cover for G and
G is a cover for F (i.e. F+ = G+)
Minimal Set of FDs
• A set of FDs, F, is minimal if
– The right side of every FD in F has a single
attribute (called standard or canonical form)
– No attribute in the left side of any FD is
extraneous
– F has no redundant FDs
Minimal Cover for Set of FDs
• A minimal cover for a set of FDs is a cover such
that no proper subset of itself is also a cover
• A set of FDs may have several minimal covers
• See Algorithm for Finding a Minimal Cover,
Section 5.7.7
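
A common way to find a minimal cover follows the three conditions for a minimal set: split right sides, drop extraneous left-side attributes, then drop redundant FDs. The sketch below is one such procedure and is not necessarily the algorithm of Section 5.7.7; all function names are assumptions for illustration. A different processing order can produce a different minimal cover.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on each right side (canonical form), no duplicates.
    cover = list(dict.fromkeys(
        (lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)
    ))
    # Step 2: remove extraneous attributes from left sides.
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))
    # Step 3: remove FDs that follow from the remaining ones.
    for dependency in list(cover):
        rest = [other for other in cover if other != dependency]
        if dependency[1] <= closure(dependency[0], rest):
            cover = rest
    return cover


WORK_FDS = [
    fd("projName", "projMgr budget startDate"),
    fd("empId", "empName salary empMgr empDept"),
    fd("projName empId", "hours rating"),
    fd("empDept", "empMgr"),
]

result = minimal_cover(WORK_FDS)
for lhs, rhs in result:
    print(sorted(lhs), "->", sorted(rhs))
print(len(result))  # 9
```

Here empId → empMgr is dropped as redundant, because it follows from empId → empDept and empDept → empMgr.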
Decomposition Algorithm for BCNF

• Can always find a lossless decomposition that is
BCNF
– Find a FD that is a violation of BCNF and remove it by
decomposition process
– Repeat this process until all violations are removed
– See algorithm, Section 5.7.8
• No need to go through 1NF, 2NF, 3NF process
• Not always possible to preserve all FDs
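
The repeat-until-no-violation loop can be sketched recursively. This is a simplified illustration, not the algorithm of Section 5.7.8: it only tries determinants that appear in the given FD list, so it can miss violations that arise only from FDs projected onto a sub-relation. The function names are assumptions.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(schema, fds):
    """Split schema on a violating determinant until none is found (simplified)."""
    schema = frozenset(schema)
    for lhs, _rhs in fds:
        if lhs <= schema:
            determined = frozenset(closure(lhs, fds)) & schema
            # Violation: lhs determines something new but not the whole schema.
            if lhs < determined < schema:
                left = bcnf_decompose(determined, fds)
                right = bcnf_decompose(lhs | (schema - determined), fds)
                return left + right
    return [schema]


NEW_FAC = "facName dept office rank dateHired".split()
FDS = [
    fd("office", "dept"),
    fd("facName dept", "office rank dateHired"),
    fd("facName office", "dept rank dateHired"),
]

for relation in bcnf_decompose(NEW_FAC, FDS):
    print(sorted(relation))
# ['dept', 'office']
# ['dateHired', 'facName', 'office', 'rank']
```

Each split keeps the determinant in both parts, so every step is a lossless binary decomposition.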
Synthesis Algorithm for 3NF
• Can always find 3NF decomposition that is
lossless and that preserves all FDs
• 3NF Algorithm uses synthesis
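– Begin with the universal relation R and a set of FDs, G
– Find a minimal cover for G
– Combine FDs that have the same determinant, and form one relation for each group
– If no resulting relation contains a candidate key of R, include a relation whose attributes form a key of R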
– See algorithm, Section 5.7.7
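
The synthesis steps can be sketched as follows. To keep the example short it assumes the FDs passed in already form a minimal cover, written with right sides grouped by determinant; the function names are assumptions for illustration. It is applied to the Work relation from the normalization example.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists (illustrative helper)."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize_3nf(schema, cover):
    """3NF synthesis, assuming cover is already a minimal cover of the FDs."""
    schema = set(schema)
    # One relation per determinant: the determinant plus everything it determines.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    relations = [frozenset(attrs) for attrs in groups.values()]
    # Add a key relation if no relation already contains a key of the schema.
    if not any(closure(rel, cover) >= schema for rel in relations):
        key = set(schema)
        for a in sorted(schema):
            if closure(key - {a}, cover) >= schema:
                key.discard(a)
        relations.append(frozenset(key))
    # Drop any relation that is strictly contained in another one.
    return [r for r in relations if not any(r < other for other in relations)]


WORK = "projName projMgr empId hours empName budget startDate salary empMgr empDept rating".split()
COVER = [  # minimal cover of the Work FDs (empId → empMgr removed as redundant)
    fd("projName", "projMgr budget startDate"),
    fd("empId", "empName salary empDept"),
    fd("projName empId", "hours rating"),
    fd("empDept", "empMgr"),
]

for relation in synthesize_3nf(WORK, COVER):
    print(sorted(relation))
```

The four relations produced correspond to Proj, Emp1, Work1, and Dept from the example; no separate key relation is needed because Work1 already contains the key {projName, empId}.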
