Norm DBAlg

The document discusses database normalization. Some key points:
◦ Normalization is the process of structuring tables to remove redundant data and dependencies on non-key attributes. This reduces data anomalies.
◦ There are various levels of normalization from 1NF to 5NF that successively reduce redundancy. Most databases should be at least 3NF or BCNF.
◦ The levels impose restrictions on table design based on functional dependencies between attributes. Higher normal forms have less redundancy and fewer anomalies.

Database Normalization

LECTURE NOTES
Definition

Normalization is the process that allows you to winnow out redundant data within your database.
It involves restructuring the tables so that they successively meet higher forms of normalization.
A properly normalized database should have the following characteristics:
◦ Scalar values in each field
◦ Absence of redundancy
◦ Minimal use of null values
◦ Minimal loss of information
Levels of Normalization

Levels of normalization are based on the amount of redundancy in the database.
The various levels of normalization are:
◦ First Normal Form (1NF)
◦ Second Normal Form (2NF)
◦ Third Normal Form (3NF)
◦ Boyce-Codd Normal Form (BCNF)
◦ Fourth Normal Form (4NF)
◦ Fifth Normal Form (5NF)
◦ Domain Key Normal Form (DKNF)
(Figure: arrows beside the list labeled Redundancy, Number of Tables, and Complexity.)
Most databases should be in 3NF or BCNF in order to avoid database anomalies.
Levels of Normalization

(Figure: nested regions labeled 1NF, 2NF, 3NF, 4NF, 5NF, DKNF.)

Each higher level is a subset of the lower level.
First Normal Form (1NF)

A table is considered to be in 1NF if all the fields contain only scalar values (as opposed to list of values).

Example (Not 1NF)

ISBN           Title         AuName                  AuPhone                                   PubName      PubPhone      Price
0-321-32132-1  Balloon       Sleepy, Snoopy, Grumpy  321-321-1111, 232-234-1234, 665-235-6532  Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Jones, Smith            123-333-3333, 654-223-3455                Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Joyce                   666-666-6666                              Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Roman                   444-444-4444                              Big House    123-456-7890  $25.00

The author (AuName) and AuPhone columns are not scalar.
1NF - Decomposition

1. Place all items that appear in the repeating group in a new table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from which the
repeating group was extracted or vice versa.
Example (1NF)

ISBN           Title         PubName      PubPhone      Price
0-321-32132-1  Balloon       Small House  714-000-0000  $34.00
0-55-123456-9  Main Street   Small House  714-000-0000  $22.95
0-123-45678-0  Ulysses       Alpha Press  999-999-9999  $34.00
1-22-233700-0  Visual Basic  Big House    123-456-7890  $25.00

ISBN           AuName  AuPhone
0-321-32132-1  Sleepy  321-321-1111
0-321-32132-1  Snoopy  232-234-1234
0-321-32132-1  Grumpy  665-235-6532
0-55-123456-9  Jones   123-333-3333
0-55-123456-9  Smith   654-223-3455
0-123-45678-0  Joyce   666-666-6666
1-22-233700-0  Roman   444-444-4444
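
The three decomposition steps can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `to_1nf` and the dictionary layout are assumptions made for the example, and only two of the books are included to keep it short.

```python
# Rows from the "Not 1NF" example: AuName and AuPhone hold lists of values.
books_unnormalized = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"]},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"],
     "AuPhone": ["123-333-3333", "654-223-3455"]},
]


def to_1nf(rows):
    """Move the repeating author group into its own table (hypothetical helper)."""
    books, book_authors = [], []
    for row in rows:
        # Scalar columns stay in the original table.
        books.append({"ISBN": row["ISBN"], "Title": row["Title"]})
        # Each (name, phone) pair becomes one row that carries the book's key.
        for name, phone in zip(row["AuName"], row["AuPhone"]):
            book_authors.append(
                {"ISBN": row["ISBN"], "AuName": name, "AuPhone": phone}
            )
    return books, book_authors


books, book_authors = to_1nf(books_unnormalized)
print(len(books), len(book_authors))
```

Every field in both result tables is now a single scalar value, and the duplicated ISBN links each author row back to its book.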
Functional Dependencies

If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes.

Example 1
ISBN           Title         Price
0-321-32132-1  Balloon       $34.00
0-55-123456-9  Main Street   $22.95
0-123-45678-0  Ulysses       $34.00
1-22-233700-0  Visual Basic  $25.00

Table Scheme: {ISBN, Title, Price}
Functional Dependencies: {ISBN} → {Title}
                         {ISBN} → {Price}
Functional Dependencies
Example 2
PubID  PubName      PubPhone
1      Big House    999-999-9999
2      Small House  123-456-7890
3      Alpha Press  111-111-1111

Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies: {PubID} → {PubPhone}
                         {PubID} → {PubName}
                         {PubName, PubPhone} → {PubID}

Example 3
AuID  AuName  AuPhone
1     Sleepy  321-321-1111
2     Snoopy  232-234-1234
3     Grumpy  665-235-6532
4     Jones   123-333-3333
5     Smith   654-223-3455
6     Joyce   666-666-6666
7     Roman   444-444-4444

Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies: {AuID} → {AuPhone}
                         {AuID} → {AuName}
                         {AuName, AuPhone} → {AuID}
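
A functional dependency can be tested against sample data with a few lines of Python. The sketch below is only an illustration: `fd_holds` is a hypothetical helper name, and a table instance can refute a dependency but can never prove that it holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # The first right-hand value seen for a left-hand value must never change.
        if seen.setdefault(left, right) != right:
            return False
    return True


# The {ISBN, Title, Price} table from the first functional dependency example.
books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]

print(fd_holds(books, ["ISBN"], ["Title"]))   # True
print(fd_holds(books, ["Price"], ["Title"]))  # False: $34.00 maps to two titles
```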
FD – Example

Database to track reviews of papers submitted to an academic conference. Prospective authors submit papers for review and possible acceptance in the published conference proceedings. Details of the entities:
◦ Author information includes a unique author number, a name, a mailing
address, and a unique (optional) email address.
◦ Paper information includes the primary author, the paper number, the title,
the abstract, and review status (pending, accepted, rejected)
◦ Reviewer information includes the reviewer number, the name, the mailing
address, and a unique (optional) email address
◦ A completed review includes the reviewer number, the date, the paper
number, comments to the authors, comments to the program chairperson,
and ratings (overall, originality, correctness, style, clarity)
FD – Example

Functional Dependencies
◦ AuthNo → AuthName, AuthEmail, AuthAddress
◦ AuthEmail → AuthNo
◦ PaperNo → Primary-AuthNo, Title, Abstract, Status
◦ RevNo → RevName, RevEmail, RevAddress
◦ RevEmail → RevNo
◦ RevNo, PaperNo → AuthComm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5
Second Normal Form (2NF)

For a table to be in 2NF, there are two requirements:
◦ The table is in first normal form
◦ All non-key attributes in the table must be functionally dependent on the entire primary key

Note: Remember that we are dealing with non-key attributes.

Example 1 (Not 2NF)
Scheme → {Title, PubID, AuID, Price, AuAddress}
1. Key → {Title, PubID, AuID}
2. {Title, PubID, AuID} → {Price}
3. {AuID} → {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuID, which is a proper subset of the key
Second Normal Form (2NF)
Example 2 (Not 2NF)
Scheme → {City, Street, HouseNumber, HouseColor, CityPopulation}
1. Key → {City, Street, HouseNumber}
2. {City, Street, HouseNumber} → {HouseColor}
3. {City} → {CityPopulation}
4. CityPopulation does not belong to any key
5. CityPopulation is functionally dependent on City, which is a proper subset of the key

Example 3 (Not 2NF)
Scheme → {studio, movie, budget, studio_city}
1. Key → {studio, movie}
2. {studio, movie} → {budget}
3. {studio} → {studio_city}
4. studio_city is not a part of a key
5. studio_city functionally depends on studio, which is a proper subset of the key
2NF - Decomposition

1. If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table.
2. If other data items are functionally dependent on the same part of the key, place them in the new table also.
3. Make the partial primary key copied from the original table the primary key for the new table.
Example 1 (Convert to 2NF)
Old Scheme → {Title, PubID, AuID, Price, AuAddress}
New Scheme → {Title, PubID, AuID, Price}
New Scheme → {AuID, AuAddress}
2NF - Decomposition

Example 2 (Convert to 2NF)
Old Scheme → {Studio, Movie, Budget, StudioCity}
New Scheme → {Movie, Studio, Budget}
New Scheme → {Studio, StudioCity}

Example 3 (Convert to 2NF)
Old Scheme → {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme → {City, Street, HouseNumber, HouseColor}
New Scheme → {City, CityPopulation}
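
The 2NF decomposition steps can be written as a short Python sketch. It assumes a single candidate key and a list of dependencies supplied by hand (no closure computation); `decompose_2nf` is a hypothetical name used only for this illustration.

```python
def decompose_2nf(scheme, key, fds):
    """Split off attributes that depend on a proper subset of the key.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    Returns the reduced original scheme followed by the new schemes.
    """
    key = frozenset(key)
    partial = {}
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        if lhs < key:  # proper subset of the key: a partial dependency
            # Items depending on the same part of the key share one new table.
            partial.setdefault(lhs, set()).update(set(rhs) - key)
    remaining = set(scheme)
    new_schemes = []
    for lhs, moved in partial.items():
        remaining -= moved
        new_schemes.append(set(lhs) | moved)
    return [remaining] + new_schemes


schemes = decompose_2nf(
    scheme={"City", "Street", "HouseNumber", "HouseColor", "CityPopulation"},
    key={"City", "Street", "HouseNumber"},
    fds=[
        ({"City", "Street", "HouseNumber"}, {"HouseColor"}),
        ({"City"}, {"CityPopulation"}),
    ],
)
for s in schemes:
    print(sorted(s))
```

For the house example this yields the two new schemes listed above: the original scheme without CityPopulation, and {City, CityPopulation}.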
Third Normal Form (3NF)

This form dictates that all non-key attributes of a table must be functionally dependent on a candidate key, i.e. there can be no interdependencies among non-key attributes.
For a table to be in 3NF, there are two requirements:
◦ The table should be in second normal form
◦ No non-key attribute is transitively dependent on the primary key

Example (Not in 3NF)
Scheme → {Title, PubID, PageCount, Price}
1. Key → {Title, PubID}
2. {Title, PubID} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on the entire key, hence 2NF
5. {Title, PubID} → {Price} holds only transitively, through PageCount, hence not in 3NF
Third Normal Form (3NF)

Example 2 (Not in 3NF)
Scheme → {Studio, StudioCity, CityTemp}
1. Primary Key → {Studio}
2. {Studio} → {StudioCity}
3. {StudioCity} → {CityTemp}
4. {Studio} → {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key hence 2NF
6. CityTemp transitively depends on Studio hence violates 3NF

Example 3 (Not in 3NF)
Scheme → {BuildingID, Contractor, Fee}
1. Primary Key → {BuildingID}
2. {BuildingID} → {Contractor}
3. {Contractor} → {Fee}
4. {BuildingID} → {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key hence 2NF

BuildingID  Contractor  Fee
100         Randolph    1200
150         Ingersoll   1100
200         Randolph    1200
250         Pitkin      1100
300         Randolph    1200
3NF - Decomposition

1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the original entity.
Example 1 (Convert to 3NF)
Old Scheme → {Title, PubID, PageCount, Price}
New Scheme → {PageCount, Price}
New Scheme → {Title, PubID, PageCount}
3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme → {Studio, StudioCity, CityTemp}
New Scheme → {Studio, StudioCity}
New Scheme → {StudioCity, CityTemp}

Example 3 (Convert to 3NF)
Old Scheme → {BuildingID, Contractor, Fee}
New Scheme → {BuildingID, Contractor}
New Scheme → {Contractor, Fee}

BuildingID  Contractor
100         Randolph
150         Ingersoll
200         Randolph
250         Pitkin
300         Randolph

Contractor  Fee
Randolph    1200
Ingersoll   1100
Pitkin      1100
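
The building example can be checked in Python: project the original rows onto the two new schemes, then join them back on Contractor. This is a small sketch using plain tuples and sets; the variable names are chosen for the illustration only.

```python
# Rows of the original {BuildingID, Contractor, Fee} table.
buildings = [
    (100, "Randolph", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# Project onto the two 3NF schemes; sets remove the repeated contractor fees.
building_contractor = {(b, c) for b, c, _ in buildings}
contractor_fee = {(c, f) for _, c, f in buildings}

# Join the two tables back together on Contractor.
rejoined = {
    (b, c, f)
    for b, c in building_contractor
    for c2, f in contractor_fee
    if c == c2
}

print(len(contractor_fee))         # 3: each fee is now stored once
print(rejoined == set(buildings))  # True: no information was lost
```

Randolph's fee is stored once instead of three times, and the join reproduces exactly the original rows.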
Boyce-Codd Normal Form (BCNF)
BCNF does not allow dependencies between attributes that belong to candidate keys.
BCNF is a refinement of third normal form: it drops 3NF's restriction to non-key attributes, so the rule applies to key attributes as well.
Third normal form and BCNF can differ when all of the following conditions are true:
◦ The table has two or more candidate keys
◦ At least two of the candidate keys are composed of more than one attribute
◦ The keys are not disjoint, i.e. the composite candidate keys share some attributes

Example 1 - Address (Not in BCNF)
Scheme → {City, Street, ZipCode}
1. Key1 → {City, Street}
2. Key2 → {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key
Boyce Codd Normal Form (BCNF)

Example 2 - Movie (Not in BCNF)
Scheme → {MovieTitle, MovieID, PersonName, Role, Payment}
1. Key1 → {MovieTitle, PersonName}
2. Key2 → {MovieID, PersonName}
3. Both Role and Payment functionally depend on both candidate keys thus 3NF
4. {MovieID} → {MovieTitle}
5. Dependency between MovieID and MovieTitle violates BCNF

Example 3 - Consulting (Not in BCNF)
Scheme → {Client, Problem, Consultant}
1. Key1 → {Client, Problem}
2. Key2 → {Client, Consultant}
3. No non-key attribute hence 3NF
4. {Client, Problem} → {Consultant}
5. {Client, Consultant} → {Problem}
6. Dependency between attributes belonging to keys violates BCNF
BCNF - Decomposition

1. Place the two candidate primary keys in separate entities.
2. Place each of the remaining data items in one of the resulting entities according to its dependency on the primary key.
Example 1 (Convert to BCNF)
Old Scheme → {City, Street, ZipCode}
New Scheme1 → {ZipCode, Street}
New Scheme2 → {City, Street}
Loss of relation {ZipCode} → {City}
Alternate New Scheme1 → {ZipCode, Street}
Alternate New Scheme2 → {ZipCode, City}
Decomposition – Loss of Information

1. If decomposition does not cause any loss of information it is called a lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it is called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a collection of smaller schemas that are in BCNF form. However the dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal form that also preserves the dependencies.
• 3NF may be better than BCNF in some cases

Use your own judgment when decomposing schemas
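
One way to see what "lossless" means is to decompose a small table, join the pieces back together, and compare the result with the original. The sketch below does this for the two address decompositions from the BCNF example. The two address rows are made-up sample data, and `project`, `natural_join`, and `same_rows` are hypothetical helper names.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]


def natural_join(r1, r2):
    """Combine rows that agree on every attribute the two tables share."""
    return [
        {**a, **b}
        for a in r1
        for b in r2
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


def same_rows(x, y):
    """Compare two tables as sets of rows, ignoring row and column order."""
    canon = lambda rows: {tuple(sorted(r.items())) for r in rows}
    return canon(x) == canon(y)


# Hypothetical data: two cities that each have a "Main St".
addresses = [
    {"City": "Springfield", "Street": "Main St", "ZipCode": "11111"},
    {"City": "Shelbyville", "Street": "Main St", "ZipCode": "22222"},
]

first = natural_join(
    project(addresses, ["ZipCode", "Street"]),
    project(addresses, ["City", "Street"]),
)
alternate = natural_join(
    project(addresses, ["ZipCode", "Street"]),
    project(addresses, ["ZipCode", "City"]),
)

print(len(first), same_rows(first, addresses))          # 4 False
print(len(alternate), same_rows(alternate, addresses))  # 2 True
```

On this sample the {ZipCode, Street} / {City, Street} split produces two spurious rows when rejoined, while the alternate split on ZipCode reproduces the original table.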
BCNF - Decomposition

Example 2 (Convert to BCNF)
Old Scheme → {MovieTitle, MovieID, PersonName, Role, Payment}
New Scheme → {MovieID, PersonName, Role, Payment}
New Scheme → {MovieTitle, PersonName}

Loss of relation {MovieID} → {MovieTitle}

New Scheme → {MovieID, PersonName, Role, Payment}
New Scheme → {MovieID, MovieTitle}

We got the {MovieID} → {MovieTitle} relationship back

Example 3 (Convert to BCNF)
Old Scheme → {Client, Problem, Consultant}
New Scheme → {Client, Consultant}
New Scheme → {Client, Problem}
Fourth Normal Form (4NF)
Fourth normal form eliminates independent many-to-one relationships between columns.
To be in Fourth Normal Form,
◦ a relation must first be in Boyce-Codd Normal Form.
◦ a given relation may not contain more than one multi-valued attribute.

Example (Not in 4NF)
Scheme → {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence BCNF
2. Many movies can have the same genre
3. Many cities can have the same movie
4. Violates 4NF

Movie             ScreeningCity  Genre
Hard Code         Los Angeles    Comedy
Hard Code         New York       Comedy
Bill Durham       Santa Cruz     Drama
Bill Durham       Durham         Drama
The Code Warrior  New York       Horror
Fourth Normal Form (4NF)

Example 2 (Not in 4NF)
Scheme → {Manager, Child, Employee}
1. Primary Key → {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF violated

Manager  Child  Employee
Jim      Beth   Alice
Mary     Bob    Jane
Mary     NULL   Adam

Example 3 (Not in 4NF)
Scheme → {Employee, Skill, ForeignLanguage}
1. Primary Key → {Employee, Skill, ForeignLanguage}
2. Each employee can speak multiple languages
3. Each employee can have multiple skills
4. Thus violates 4NF

Employee  Skill      ForeignLanguage
1234      Cooking    French
1234      Cooking    German
1453      Carpentry  Spanish
1453      Cooking    Spanish
2345      Cooking    Spanish
4NF - Decomposition

1. Move the two multi-valued relations to separate tables.
2. Identify a primary key for each new entity.
Example 1 (Convert to 4NF)
Old Scheme → {MovieName, ScreeningCity, Genre}
New Scheme → {MovieName, ScreeningCity}
New Scheme → {MovieName, Genre}

Movie             Genre
Hard Code         Comedy
Bill Durham       Drama
The Code Warrior  Horror

Movie             ScreeningCity
Hard Code         Los Angeles
Hard Code         New York
Bill Durham       Santa Cruz
Bill Durham       Durham
The Code Warrior  New York
4NF - Decomposition

Example 2 (Convert to 4NF)
Old Scheme → {Manager, Child, Employee}
New Scheme → {Manager, Child}
New Scheme → {Manager, Employee}

Manager  Child
Jim      Beth
Mary     Bob

Manager  Employee
Jim      Alice
Mary     Jane
Mary     Adam

Example 3 (Convert to 4NF)
Old Scheme → {Employee, Skill, ForeignLanguage}
New Scheme → {Employee, Skill}
New Scheme → {Employee, ForeignLanguage}

Employee  Skill
1234      Cooking
1453      Carpentry
1453      Cooking
2345      Cooking

Employee  ForeignLanguage
1234      French
1234      German
1453      Spanish
2345      Spanish
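
The employee example can be reproduced in Python by projecting the three-column table onto the two new schemes and joining them back on Employee. This is a small sketch with plain tuples; the variable names are chosen for the illustration.

```python
# Rows of the original {Employee, Skill, ForeignLanguage} table.
employee_facts = [
    (1234, "Cooking", "French"),
    (1234, "Cooking", "German"),
    (1453, "Carpentry", "Spanish"),
    (1453, "Cooking", "Spanish"),
    (2345, "Cooking", "Spanish"),
]

# One table per independent multi-valued fact about an employee.
employee_skill = sorted({(e, s) for e, s, _ in employee_facts})
employee_language = sorted({(e, lang) for e, _, lang in employee_facts})

# Joining on Employee pairs every skill with every language of that employee.
rejoined = {
    (e, s, lang)
    for e, s in employee_skill
    for e2, lang in employee_language
    if e == e2
}

print(employee_skill)
print(employee_language)
print(rejoined == set(employee_facts))  # True
```

Each skill and each language is now recorded once per employee, and the join still reproduces the five original rows.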


Fifth Normal Form (5NF)

Fifth normal form is satisfied when all tables are broken into
as many tables as possible in order to avoid redundancy.
Once it is in fifth normal form it cannot be broken into
smaller relations without changing the facts or the meaning.
Domain Key Normal Form (DKNF)

The relation is in DKNF when there can be no insertion or deletion anomalies in the database.
DBMS Architecture
The SQL engine works as follows:
SQL query → relational algebra plan
Relational algebra plan → optimized plan
Execute each operator of the plan
Relational Algebra
Formalism for creating new relations from existing ones
Its place in the big picture:
Declarative query language (SQL, relational calculus) → Algebra (relational algebra, relational bag algebra) → Implementation
Relational Algebra
Five operators:
◦ Union: ∪
◦ Difference: −
◦ Selection: σ
◦ Projection: Π
◦ Cartesian Product: ×

Derived or auxiliary operators:
◦ Intersection, complement
◦ Joins (natural, equi-join, theta join, semi-join)
◦ Renaming: ρ
1. Union and 2. Difference
R1 ∪ R2
Example:
◦ ActiveEmployees ∪ RetiredEmployees
R1 − R2
Example:
◦ AllEmployees − RetiredEmployees
What about Intersection?
It is a derived operator
R1 ∩ R2 = R1 − (R1 − R2)
Also expressed as a join (will see later)
Example
◦ UnionizedEmployees ∩ RetiredEmployees
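
Python's built-in sets behave like relations under set semantics, so the identity for intersection can be checked directly. The employee names below are made-up sample data.

```python
# Hypothetical relations, each a set of employee names.
active_employees = {"Ann", "Bob"}
retired_employees = {"Cara", "Dev"}
unionized_employees = {"Bob", "Cara"}

all_employees = active_employees | retired_employees      # union
still_working = all_employees - retired_employees         # difference

# Intersection derived from difference alone: R1 - (R1 - R2).
derived = unionized_employees - (unionized_employees - retired_employees)

print(sorted(all_employees))  # ['Ann', 'Bob', 'Cara', 'Dev']
print(sorted(still_working))  # ['Ann', 'Bob']
print(derived == (unionized_employees & retired_employees))  # True
```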
3. Selection
Returns all tuples which satisfy a condition
Notation: σc(R)
Examples
◦ σ Salary > 40000 (Employee)
◦ σ name = "Smith" (Employee)

The condition c can use =, <, ≤, >, ≥, <>

SSN      Name   Salary
1234545  John   20000
5423341  Smith  600000
4352342  Fred   500000

σ Salary > 40000 (Employee)

SSN      Name   Salary
5423341  Smith  600000
4352342  Fred   500000
4. Projection
Eliminates columns, then removes duplicates
Notation: Π A1,…,An (R)
Example: project social-security number and names:
◦ Π SSN, Name (Employee)
◦ Output schema: Answer(SSN, Name)

SSN      Name  Salary
1234545  John  200000
5423341  John  600000
4352342  John  200000

Π Name,Salary (Employee)

Name  Salary
John  200000
John  600000
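
Selection and projection can each be written as a small Python function over a list of rows. This is a sketch under set semantics (projection removes duplicates). `select` and `project` are hypothetical helper names, and the 500000 threshold is chosen only so that the selection filters something out of this table.

```python
# The Employee table from the projection example.
employee = [
    {"SSN": 1234545, "Name": "John", "Salary": 200000},
    {"SSN": 5423341, "Name": "John", "Salary": 600000},
    {"SSN": 4352342, "Name": "John", "Salary": 200000},
]


def select(rows, condition):
    """Selection: keep the rows that satisfy the condition."""
    return [row for row in rows if condition(row)]


def project(rows, attrs):
    """Projection: keep only attrs, then remove duplicate rows."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


high_paid = select(employee, lambda row: row["Salary"] > 500000)
name_salary = project(employee, ["Name", "Salary"])

print(high_paid)    # only the row with SSN 5423341
print(name_salary)  # two rows: the two (John, 200000) rows collapse into one
```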
5. Cartesian Product
Each tuple in R1 with each tuple in R2
Notation: R1 × R2
Example:
◦ Employee × Dependents
Very rare in practice; mainly used to express joins
Cartesian Product Example

Employee
Name SSN
John 999999999
Tony 777777777

Dependents
EmployeeSSN Dname
999999999 Emily
777777777 Joe

Employee × Dependents
Name SSN EmployeeSSN Dname
John 999999999 999999999 Emily
John 999999999 777777777 Joe
Tony 777777777 999999999 Emily
Tony 777777777 777777777 Joe
Renaming
Changes the schema, not the instance
Notation: ρ B1,…,Bn (R)
Example:
◦ ρ LastName, SocSocNo (Employee)
◦ Output schema: Answer(LastName, SocSocNo)
Renaming Example
Employee
Name SSN
John 999999999
Tony 777777777

ρ LastName, SocSocNo (Employee)

LastName SocSocNo
John 999999999
Tony 777777777
Natural Join
Notation: R1 || R2
Meaning: R1 || R2 = PA(sC(R1  R2))
Where:
◦ The selection sC checks equality of all common attributes
◦ The projection eliminates the duplicate common attributes
Natural Join Example
Employee
Name SSN
John 999999999
Tony 777777777

Dependents
SSN Dname
999999999 Emily
777777777 Joe
Employee ⋈ Dependents =
Π Name, SSN, Dname (σ SSN=SSN2 (Employee × ρ SSN2, Dname (Dependents)))
Name SSN Dname
John 999999999 Emily
Tony 777777777 Joe
Natural Join
R =
A  B
X  Y
X  Z
Y  Z
Z  V

S =
B  C
Z  U
V  W
Z  V

R ⋈ S =
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
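
The R ⋈ S example can be reproduced with a short Python function that pairs rows agreeing on all shared attribute names. This is a nested-loop sketch for illustration only, and `natural_join` is a hypothetical helper name.

```python
def natural_join(r1, r2):
    """Pair rows that agree on every attribute name the two relations share."""
    out = []
    for a in r1:
        for b in r2:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                # Merging the dicts keeps a single copy of each shared attribute.
                out.append({**a, **b})
    return out


R = [
    {"A": "X", "B": "Y"},
    {"A": "X", "B": "Z"},
    {"A": "Y", "B": "Z"},
    {"A": "Z", "B": "V"},
]
S = [
    {"B": "Z", "C": "U"},
    {"B": "V", "C": "W"},
    {"B": "Z", "C": "V"},
]

result = sorted((t["A"], t["B"], t["C"]) for t in natural_join(R, S))
for row in result:
    print(*row)
```

The row (X, Y) of R disappears because no row of S has B = Y. When two relations share no attribute names, the equality test is vacuously true and this function returns every pairing of rows.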
Natural Join
1. Given the schemas R(A, B, C, D), and S(A, C, E), what is the schema of R ⋈ S?
2. Given R(A, B, C), S(D, E), what is R ⋈ S?
3. Given R(A, B), S(A, B), what is R ⋈ S?
Theta Join
A join that involves a predicate
R1 || q R2 = s q (R1  R2)
Here q can be any condition
Equi-join
A theta join where θ is an equality
R1 ⋈ A=B R2 = σ A=B (R1 × R2)
Example:
◦ Employee ⋈ SSN=SSN Dependents

Most useful join in practice


Semijoin
R | S = P A1,…,An (R || S)
Where A1, …, An are the attributes in R
Example:
◦ Employee | Dependents
Semijoins in Distributed Databases
Semijoins are used in distributed databases.

(Figure: Employee(SSN, Name, ...) and Dependents(SSN, Dname, Age, ...) connected by a network.)

Query: Employee ⋈ ssn=ssn (σ age>71 (Dependents))
T = Π SSN (σ age>71 (Dependents))
R = Employee ⋉ T
Answer = R ⋈ σ age>71 (Dependents)
Complex RA Expressions
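(Figure: expression tree. Root: Π name. Joins, from top to bottom: ⋈ buyer-ssn=ssn, ⋈ pid=pid, ⋈ seller-ssn=ssn. Subexpressions: Π ssn over σ name=fred, and Π pid over σ name=gizmo. Leaf relations, left to right: Person, Purchase, Person, Product.)
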
Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
{a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
{a,b,b,b,c,c} − {b,c,c,c,d} = {a,b,b}
σ C (R): preserve the number of occurrences
Π A (R): no duplicate elimination
Cartesian product, join: no duplicate elimination
Important! Relational engines work on bags, not sets!
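
Bag union and bag difference can be tried out with `collections.Counter` from the Python standard library, which stores an occurrence count per element. This is only an illustration of the arithmetic, not how a relational engine is implemented.

```python
from collections import Counter

# Bag union adds the occurrence counts of each element.
left = Counter("abbc")        # the bag {a,b,b,c}
right = Counter("abbbeff")    # the bag {a,b,b,b,e,f,f}
bag_union = left + right
print(sorted(bag_union.elements()))

# Bag difference subtracts counts and drops anything at zero or below,
# so d, which is not in the first bag, cannot appear in the result.
minuend = Counter("abbbcc")   # the bag {a,b,b,b,c,c}
subtrahend = Counter("bcccd")  # the bag {b,c,c,c,d}
bag_difference = minuend - subtrahend
print(sorted(bag_difference.elements()))  # ['a', 'b', 'b']
```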

Reading assignment: 5.3 – 5.4


Finally: RA has Limitations!
Cannot compute "transitive closure"
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister

Find all direct and indirect relatives of Fred

Cannot express in RA! Need to write a C program
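
Such a program is short in any general-purpose language. The sketch below uses Python rather than C and a breadth-first search; it assumes that being a relative is symmetric, so each pair is followed in both directions, and `relatives_of` is a hypothetical helper name.

```python
from collections import deque


def relatives_of(person, pairs):
    """Return everyone reachable from person through any chain of pairs."""
    neighbors = {}
    for a, b in pairs:
        # Assumption: the relationship is symmetric, so store both directions.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {person}, deque([person])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {person}


# (Name1, Name2) pairs from the table above; the Relationship column is ignored.
pairs = [("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")]
print(sorted(relatives_of("Fred", pairs)))  # ['Bill', 'Joe', 'Mary']
```

The loop runs for as many steps as the longest chain requires, which is exactly what a fixed relational algebra expression cannot do.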
