05 Normalisation


Master in Information Technology
ICT - 209
Database Applications
Normalization

Normalisation in Relational Databases

• We know what a relation is.
– Or do we?
• One of the hardest things is to take an informally set-up table containing some information and convert it into a database appropriately. Suppose we have some information about the newspaper-reading habits of various persons:
– Smith reads the Record and the Mail,
– Lee reads the Herald,
– etc.
• Let’s represent this relation in a table:

Unnormalised Relations

• Here’s a first try:

READS
Name   PaperList
Smith  Record, Mail
Lee    Herald

• This is not ideal. Each person is associated with an unspecified number of papers. The items in the PaperList column do not have a consistent form.
• Generally, an RDBMS can’t cope with relations like this. Each entry in a table needs to have a single data item in it.
• This is an unnormalised relation.
• All RDBMS require relations not to be like this: no column may hold multiple values (i.e. no repeating groups).

First Normal Form (1NF)

• Here is another try:

READS2
Name   Paper
Smith  Record
Smith  Mail
Lee    Herald

• This clearly contains the same information.
• And it has the property that we sought. It is in First Normal Form (1NF).
– A relation is in 1NF if no entry consists of more than one value (i.e. it does not have repeating groups)
• So this will be the first requirement in designing our databases:
– our relations must be in 1NF.
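The step from READS to READS2 can be sketched in a few lines of Python. This is only an illustration: representing a relation as a list of dictionaries and the helper name `to_1nf` are assumptions made here, not part of the slides.

```python
# Unnormalised READS relation: the PaperList entry holds several values.
reads = [
    {"Name": "Smith", "PaperList": "Record, Mail"},
    {"Name": "Lee", "PaperList": "Herald"},
]

def to_1nf(rows):
    """Return one (Name, Paper) row per paper, so every entry is a single value."""
    result = []
    for row in rows:
        for paper in row["PaperList"].split(","):
            result.append({"Name": row["Name"], "Paper": paper.strip()})
    return result

reads2 = to_1nf(reads)
for row in reads2:
    print(row["Name"], row["Paper"])
```

Each row of `reads2` now holds exactly one paper, which is the one-table route to 1NF.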

Achieving 1NF

• In general, to achieve 1NF we need to get rid of repeating groups in our tables. There are two alternative ways of doing this.
• The one-table approach: we extend the table rows by replicating the non-repeated columns for each repeated item value. This is what we did in the last slide.
• The two-table approach: split the repeating and non-repeating data into separate tables (Non-loss Decomposition)
– We must then choose a primary key for the non-repeating data table
– …and insert this as a foreign key in the repeating data table
• The two-table approach is often better as it takes up less space and leads us to 2nd Normal Form… More later!
• Next, we will look at problems with 1NF…

Recap: A Relation in 1NF

• Getting into 1NF is just the start.
• Relations that are in 1NF can still have considerable problems.
• Example: Staff borrowers in a Library.
– their staff number functions as a Library number
– and someone has had the ingenious idea of adding details of books borrowed (in the same relation/table)
– Here’s a first go:

StaffBorrower
Sno  Sname  Sdept      Grade  Salary  Bno  Date_out
1    Smith  Computing  2.7    26813   1    30/06/2002
2    Black  Marketing  1.5    17278   8    08/07/2002

• Primary key: (Sno,Bno)
Problems with a 1NF Relation: Duplication

• Now suppose that Smith borrows another book.
• To ensure 1NF, we shall have to have a complete new row:

StaffBorrower
Sno  Sname  Sdept      Grade  Salary  Bno  Date_out
1    Smith  Computing  2.7    26813   1    30/06/2002
2    Black  Marketing  1.5    17278   8    08/07/2002
1    Smith  Computing  2.7    26813   53   12/07/2002

• We have stored all the other details about this member of staff again, in the new row.
• Not only is information about Smith duplicated, but
– the fact that staff on grade 2.7 earn £26,813 is duplicated

Problems with a 1NF Relation: Update Anomalies

• Such repetition means that updates can be difficult.
• Suppose that Smith goes on to a new grade.
– Changes would be required to all records for Smith.
– (And there is a danger that we may miss some.)
• Suppose that the salary for grade 2.7 is changed.
– All records for all staff members on grade 2.7 would have to be changed.
• A fact should be stored only once. Updates are then problem-free.
• This example relation is poorly structured, being subject to update anomalies.
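To make the update anomaly concrete, here is a minimal Python sketch using the three StaffBorrower rows shown above. The list-of-dictionaries representation, the helper `salaries_by_grade` and the new salary figure `NEW_SALARY` are hypothetical, chosen only for illustration.

```python
staff_borrower = [
    {"Sno": 1, "Sname": "Smith", "Sdept": "Computing", "Grade": "2.7",
     "Salary": 26813, "Bno": 1, "Date_out": "30/06/2002"},
    {"Sno": 2, "Sname": "Black", "Sdept": "Marketing", "Grade": "1.5",
     "Salary": 17278, "Bno": 8, "Date_out": "08/07/2002"},
    {"Sno": 1, "Sname": "Smith", "Sdept": "Computing", "Grade": "2.7",
     "Salary": 26813, "Bno": 53, "Date_out": "12/07/2002"},
]

def salaries_by_grade(rows):
    """Map each grade to the set of salaries recorded for it."""
    found = {}
    for row in rows:
        found.setdefault(row["Grade"], set()).add(row["Salary"])
    return found

# A careless update: the new salary for grade 2.7 reaches only the
# first matching row, and the second row for Smith is missed.
NEW_SALARY = 27000  # hypothetical figure, not from the slides
for row in staff_borrower:
    if row["Grade"] == "2.7":
        row["Salary"] = NEW_SALARY
        break

# Any grade with more than one recorded salary is now inconsistent.
inconsistent = {grade: pay for grade, pay in salaries_by_grade(staff_borrower).items()
                if len(pay) > 1}
print(inconsistent)
```

Because the fact "grade 2.7 earns a given salary" is stored once per loan, a single missed row leaves the relation contradicting itself.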

More Problems in a 1NF Relation: Insertion and Deletion Anomalies

• Suppose that a new scale point is created (2.8 earns £27,491)
– and as yet there is no-one on that scale point.
• How do we record this? The following violates entity integrity:

StaffBorrower
Sno   Sname  Sdept  Grade  Salary  Bno   Date_out
Null  Null   Null   2.8    27491   Null  Null

• We call this an insertion anomaly.
• Also: suppose that Smith returns all her books
– do we delete all the corresponding rows, removing all trace of Smith,
– or do we remove the Bno and Date_out entries from all the rows, leaving duplicated information about Smith?
• These are deletion anomalies.

Solution: Non-loss decomposition (NLD)

• How do we deal with these problems?
• We carry out a non-loss decomposition (NLD): we replace the relation by projections from which the original relation can be re-created (by joining)
• The re-creation must give no less and no more than we started with
• See some examples next

An example of an NLD - I

SUC
Student  Unit  Coordinator
Gary     3131  Hamilton
Tracy    3131  Hamilton
Sinead   3133  Clark
Sean     3133  Clark

– decomposes into

SU
Student  Unit
Gary     3131
Tracy    3131
Sinead   3133
Sean     3133

UC
Unit  Coordinator
3131  Hamilton
3133  Clark

• If we Join SU and UC (over Unit, obviously) we get back exactly the four rows of SUC we started with
– so this is an NLD

An example of an NLD - II

• Similarly, we can carry out an NLD of the StaffBorrower relation

STAFF_BORROWER2
Sno  Sname  Sdept  Grade  Salary
1    Smith  CS&M   2.7    21123

LOAN
Sno  Bno  Date_out
1    1    30/6/2004
1    7    1/7/2004

• At first it looks as though the new scheme will take more space (but actually it takes less, because we remove repetitions)
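The SUC example can be checked mechanically. The following Python sketch represents each relation as a set of tuples (an assumption made here for brevity), takes the two projections, and joins them back over Unit.

```python
# SUC as a set of (Student, Unit, Coordinator) tuples.
suc = {
    ("Gary", 3131, "Hamilton"),
    ("Tracy", 3131, "Hamilton"),
    ("Sinead", 3133, "Clark"),
    ("Sean", 3133, "Clark"),
}

# Projections: SU keeps (Student, Unit), UC keeps (Unit, Coordinator).
# Using sets means duplicate rows collapse, as they do in a relation.
su = {(student, unit) for student, unit, _ in suc}
uc = {(unit, coordinator) for _, unit, coordinator in suc}

# Natural join of SU and UC over the shared Unit column.
rejoined = {
    (student, unit, coordinator)
    for student, unit in su
    for unit2, coordinator in uc
    if unit == unit2
}

# Non-loss means the join gives no less and no more than the original.
print(rejoined == suc)
print(len(su), len(uc))
```

Note that UC has only two rows: the fact that each unit has a coordinator is now stored once per unit rather than once per student.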
A formal apparatus

• We need a method of analysing relations to detect and prevent these problems
• We need a set of definitions and procedures to
– diagnose whether relations have a ‘silly’ design
– turn them into other, better designed, relations in a systematic way
• We begin by reminding ourselves what a relation “means”
– a relation’s interpretation is not always obvious from the names of its columns, e.g.

SUML
Stu   UCode  Lect      Mark
Gary  3131   Hamilton  64

The predicate of a relation

• Any relation has a predicate - a definition of what any row means
• This will usually just be a statement in natural language, e.g.
– SUML1 : “the student Stu took unit UCode and obtained a mark of Mark. Unit UCode is coordinated by lecturer Lect.”
• Or perhaps
– SUML2 : “the student Stu took unit UCode and obtained a mark of Mark. In that unit, their tutorial was taken by lecturer Lect.”
• Or even
– SUML3 : “the student Stu took unit UCode and obtained a mark of Mark. In that unit, lecturer Lect was one of the lecturers”

Relations: Tuples represent true statements

SUML
Stu   UCode  Lect      Mark
Gary  3131   Hamilton  64

• Each row (“tuple”) is a true statement: so, in SUML1, “Gary took 3131 and got 64. 3131 is coordinated by Dr Hamilton”
• In SUML2 that would become “Gary took 3131 and got 64. His tutorial was taken by Dr Hamilton”
• The Closed World Assumption: if a row isn’t in the relation, the corresponding statement is false
– so if we don’t see

Stu    UCode  Lect  Mark
Fiona  3131   Null  Null

– then Fiona didn’t do 3131

Normalisation: the big picture

• The basic idea is as follows:
• Some earlier (ER) analysis has given a database design consisting of one or more relations: for each relation, ...
• We inspect the relation’s predicate to find
– which attribute(s) is/are the candidate key
– what functional dependencies exist (see the Functional Dependency slide)
• From these, we decide what normal form the relation is in
• And hence whether the relation should be decomposed
• And, if so, how to do that decomposition

P.S. When inspecting the relation’s predicate, one also needs to look at what multi-valued dependencies exist (but we will not cover this in this course, as it is required only for Fourth Normal Form; see Other/Higher Normal Forms)

Recap: Candidate Keys

• As you will know, sometimes there can be several possible candidate keys for a relation
– suppose departments have unique names and also unique ids

DEPT
ID    Name                         HoD
Mkt   Marketing                    Prof Burt
CS&M  Computing Science and Maths  Dr Clark

– For the above relation, ID and Name are candidate keys
• We can choose which we designate as the primary key of the relation
• A key field is a field (column) that is [part of] a candidate key
– ID and Name are key fields
• A non-key field is a field that isn’t part of any candidate key
– HoD is a non-key field
• Every relation has a candidate/primary key (rows are unique by definition)

Functional Dependency

• In any relation, a column (or set of columns) Y is functionally dependent on a column (or set of columns) X if at any one time exactly one Y value is associated with any X value.
– We write X → Y.
• For example, at any one time a single surname (Y) is associated with a particular registration-number (X)
– note that the surname may change
– and that the dependency is only one-way (i.e. directional): registration-number determines surname, but the reverse is not true
• We call X the determinant of the FD X → Y
– by definition, the determinant is the attribute (or attributes) on the left of a functional dependency
• To know the FDs, you must know the predicate
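The definition of X → Y can be turned into a small consistency check. The sketch below is illustrative only: the helper name `consistent_with_fd` and the cut-down StaffBorrower rows are assumptions. A check like this can refute an FD from sample data, but it cannot prove one, because an FD is a statement about the predicate and not about one particular set of rows.

```python
# A few columns of the three StaffBorrower rows used earlier.
rows = [
    {"Sno": 1, "Sname": "Smith", "Grade": "2.7", "Salary": 26813, "Bno": 1},
    {"Sno": 2, "Sname": "Black", "Grade": "1.5", "Salary": 17278, "Bno": 8},
    {"Sno": 1, "Sname": "Smith", "Grade": "2.7", "Salary": 26813, "Bno": 53},
]

def consistent_with_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

print(consistent_with_fd(rows, ["Sno"], ["Sname"]))     # the data agrees with Sno → Sname
print(consistent_with_fd(rows, ["Grade"], ["Salary"]))  # the data agrees with Grade → Salary
print(consistent_with_fd(rows, ["Sno"], ["Bno"]))       # refuted: Sno 1 has two Bno values
print(consistent_with_fd(rows, ["Sname"], ["Sno"]))     # agrees here, but only by accident
```

The last call shows why the predicate matters: these rows happen to be consistent with Sname → Sno, yet nothing stops two members of staff sharing a surname.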
Functional Dependency - More

• Here are some examples of FDs from StaffBorrower:
Sno → Sname
Sno → Sdept
Sno → Grade
Grade → Salary
(Sno,Bno) → Date_out
(Sno,Bno) → Sname
…(there are many others)
• We note that (Sno,Bno) is the primary key of StaffBorrower
• Non-key columns, by definition, are FD on primary key columns
• Definition: X → Y is a full FD (FFD) if Y is dependent on the whole of X and not on any proper subset of X.
– Note that here Sname is not fully FD on the primary key (Sno,Bno), because Sno alone determines Sname
• When normalising a database, we are only interested in full FDs.

Second Normal Form (2NF)

• A relation is in Second Normal Form (2NF) if
– it is in 1NF and
– every non-key column is fully FD (FFD) on the primary key.
• The relation StaffBorrower is not in 2NF. Why not?
– Because a non-key column (such as Sname) is not fully FD on the primary key (Sno,Bno)
• We can achieve 2NF by splitting our 1NF relation into two or more relations in 2NF.
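The 2NF test for StaffBorrower can be sketched as a search for non-key columns that are determined by only part of the primary key. The FDs below are the ones listed on the slide; the function names `closure` and `partial_dependencies` are assumptions introduced for this illustration.

```python
from itertools import combinations

# FDs from the slide, written as (determinant, dependent column).
FDS = [
    ({"Sno"}, "Sname"),
    ({"Sno"}, "Sdept"),
    ({"Sno"}, "Grade"),
    ({"Grade"}, "Salary"),
    ({"Sno", "Bno"}, "Date_out"),
]
PRIMARY_KEY = {"Sno", "Bno"}
NON_KEY = {"Sname", "Sdept", "Grade", "Salary", "Date_out"}

def closure(attrs, fds):
    """Return every column determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def partial_dependencies(key, non_key, fds):
    """Map each non-key column to the proper key subsets that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for column in closure(subset, fds) & non_key:
                found.setdefault(column, []).append(subset)
    return found

partial = partial_dependencies(PRIMARY_KEY, NON_KEY, FDS)
in_2nf = not partial
for column in sorted(partial):
    print(column, "depends on only part of the key:", partial[column])
print("In 2NF:", in_2nf)
```

Only Date_out needs the whole of (Sno,Bno); every other non-key column is reachable from Sno alone, which is why StaffBorrower fails the 2NF test.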

2NF Decomposition

• We can split any relation in 1NF into two or more relations in 2NF.
• For example, we saw previously (in the second NLD example) that we could decompose StaffBorrower into

StaffBorrower2
Sno  Sname  Sdept      Grade  Salary
1    Smith  Computing  2.7    26813

Loan
Sno  Bno  Date_out
1    1    30/06/2002
1    53   12/07/2002

• Both of the above are in 2NF.
• Recap: At first it looks as though the new scheme will require more space (but actually it takes less, because we avoid repetitions)
• And the 2NF decomposition solves many of the problems
– but not all…

But ...

• The scheme above does not solve all problems: the Grade/Salary problem remains.
• We may observe that the dependency of Salary on Sno has a special property. It is indirect (or “transitive”), as it is achieved via a third, non-key attribute:
Sno → Grade → Salary
• We can fix the problem by insisting that our relations have a stronger property. This will lead us from 2NF to 3NF.
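The transitive chain Sno → Grade → Salary in StaffBorrower2 can be found by looking for a key column that determines a non-key column which in turn determines another column. This is a simplified sketch that assumes single-column determinants; the function name `transitive_chains` is hypothetical.

```python
# FDs of StaffBorrower2 as (determinant, dependent); single columns only.
KEY = {"Sno"}
FDS = [
    ("Sno", "Sname"),
    ("Sno", "Sdept"),
    ("Sno", "Grade"),
    ("Grade", "Salary"),
]

def transitive_chains(key, fds):
    """Return (key column, intermediate, dependent) chains through a non-key column."""
    chains = []
    for first_lhs, middle in fds:
        # The first link must run from a key column to a non-key column.
        if first_lhs not in key or middle in key:
            continue
        for second_lhs, dependent in fds:
            if second_lhs == middle and dependent not in key:
                chains.append((first_lhs, middle, dependent))
    return chains

chains = transitive_chains(KEY, FDS)
for a, b, c in chains:
    print(f"{a} → {b} → {c}")
```

The single chain reported is the Grade/Salary problem described above: Salary depends on the key only by way of Grade.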

Third Normal Form

• A relation is in Third Normal Form (3NF) if
– it is in 1NF and
– every non-key column is non-transitively fully FD on the primary key.
• We can always find a decomposition into 3NF.
• In our decomposition above, the Loan relation is already in 3NF:
– the only non-key column is Date_out
– the primary key is (Sno,Bno), and (Sno,Bno) → Date_out (i.e. FFD)
• But (as we have seen) StaffBorrower2 is not in 3NF.
• It is easy to see what to do:

StaffBorrower3
Sno  Sname  Sdept      Grade
1    Smith  Computing  2.7

PayScale
Grade  Salary
2.7    26813

Note: More practice in Tutorial & Practical on Normalization (Decomposition)!

Other/Higher Normal Forms

• So we have a hierarchy of normal forms (1NF, 2NF, 3NF).
– To be useful, a database structure really must be in 3NF (and we can always find an NLD into 3NF)
– It is the highest normal form with no disadvantages
– The result may look as though it would take more space but actually takes less
• 3NF solves most problems, but there are some rare anomalies that can still arise (especially where there are overlapping candidate keys).
• Hence, there are further normal forms:
– Boyce-Codd normal form (BCNF)
– Fourth normal form
– Fifth normal form
• And algorithms to carry out decompositions…
• … but we shall NOT deal with these in this course…☺
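The 3NF decomposition of StaffBorrower2 into StaffBorrower3 and PayScale can be sketched as follows. The dictionary representation is an assumption, and the row for Black is carried over from the original StaffBorrower table so that there is more than one grade to work with.

```python
staff_borrower2 = [
    {"Sno": 1, "Sname": "Smith", "Sdept": "Computing", "Grade": "2.7", "Salary": 26813},
    {"Sno": 2, "Sname": "Black", "Sdept": "Marketing", "Grade": "1.5", "Salary": 17278},
]

# StaffBorrower3: project away Salary, which depends on Grade rather than on Sno directly.
staff_borrower3 = [
    {col: row[col] for col in ("Sno", "Sname", "Sdept", "Grade")}
    for row in staff_borrower2
]

# PayScale: one entry per grade, so each salary fact is stored once.
pay_scale = {row["Grade"]: row["Salary"] for row in staff_borrower2}

def rejoin(staff, scale):
    """Join StaffBorrower3 with PayScale over Grade."""
    return [{**row, "Salary": scale[row["Grade"]]} for row in staff]

# Non-loss: the join re-creates StaffBorrower2 exactly.
print(rejoin(staff_borrower3, pay_scale) == staff_borrower2)

# The insertion anomaly disappears: a scale point with no staff on it
# is simply a new PayScale entry, with no Null key values needed.
pay_scale["2.8"] = 27491
print(rejoin(staff_borrower3, pay_scale) == staff_borrower2)
```

A salary change for a grade is now a single update to PayScale, however many members of staff are on that grade.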
A Practical Strategy for Normalizing Relations

• A very commonly used approach (also recommended for your assignment!) is:
– design the database using E/R diagrams
– specify the relations from the diagrams
– check the normal form of every relation
– if any relation isn’t 3NF, the E/R analysis was wrong, and we should repeat the process

END of topic!
