DBMS Session 6 Notes

The document discusses normalization of databases. It covers the goals of normalization as arranging data logically to minimize duplication, allow for quick access and modification while maintaining integrity. Various normal forms are introduced from 1NF to 5NF with increasing strictness of rules. Types of dependencies like functional, multi-valued and join dependencies are explained which can cause problems. First normal form and achieving it by removing multi-valued attributes into new tables or additional columns is specifically outlined.

CS 623 – Database Management Systems

Session 6 Agenda

• Review of Chapter 3: Individual Assignments
• Review of Chapter 4: Individual and Team Assignments
• Oracle Installation Questions
• Review Midterm requirements
• Chapter 6: Normalization and Denormalization

Note: Midterm Exam - 11/2/21


Review of Chapter 3: Individual Assignments
Review of Chapter 4: Individual and Team Assignments

Midterm Exam – November 2, 2021

• The Midterm Exam will be held on Tuesday, November 2.
• Midterm Exam duration: 90 minutes.
• Once you begin the Midterm Exam, you must complete it in one sitting.
• The Midterm Exam will consist of 50 multiple choice questions and
  will cover Chapter 1 - Chapter 7 (approximately 7 questions per
  chapter).
• A study guide has been uploaded to the Content section of Pace
  Classes.

Midterm Exam – November 2, 2021 (Cont'd)

• The Midterm Exam will be conducted during the Zoom call.
• Students will use LockDown Browser to take the exam.
• LockDown Browser allows the instructor to do live proctoring via
  Zoom.
• Therefore, all students must turn on their webcam (with microphone
  muted) during the exam, which will allow the professor to view the
  student and surrounding area.
• If any students have questions during the exam, use the chat facility
  on Zoom to communicate with the professor.
• If any students need to leave the computer during the exam, use the
  chat facility on Zoom to request permission from the instructor.

Midterm Exam – November 2, 2021 (Cont'd)

• Students are NOT allowed to use any books, materials, or technology,
  seek help from other people, or collaborate with other students.
• Academic Honesty is highly valued at Pace. Students are expected
  to know and comply with Pace provisions on academic honesty, and
  the consequences of academic dishonesty, which can include failing the
  class or being expelled from the University.
Chapter 6: Normalization and Denormalization

Results of a Poorly Designed Database

A poorly designed database may lead to redundant data and anomalies.
• Redundant data is unnecessary reoccurring data (repeating groups
  of data).
• An anomaly is any occurrence that weakens the integrity of your
  data due to irregular or inconsistent storage (insert, update, and
  delete irregularities that generate inconsistent data).

Anomalies

An anomaly is an inconsistent, incomplete, or contradictory state of the
database.
• If anomalies are present:
   - we would be unable to represent some information
   - we might lose information when certain updates were performed
   - we would run the risk of having data become inconsistent over time

Types of Anomalies

• Insertion anomaly: the user is unable to insert a new record when it
  should be possible to do so
• Update anomaly: a record is updated, but other appearances of the
  same items are not updated
• Deletion anomaly: when a record is deleted, other information that
  is tied to it is also deleted

Anomalies Example - Combined Student-Enroll Table

• Insertion anomaly: It is not possible to add a new class, such as
  MTH101A, even if its faculty, schedule, and room are known, unless
  there is a student registered for it, because stuId is part of the
  primary key.
• Update anomaly: If the schedule of ART103A is updated in the first
  record but not in the second and third, the data is inconsistent.
• Deletion anomaly: If the record of student S1001 is deleted,
  information about the HST205A class is also lost.
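
The three anomalies can be reproduced on a small in-memory table. This is only a sketch: the slide's table is not reproduced in these notes, so the rows, the schedule and room values, and the helper names (schedules, can_insert) are hypothetical and chosen to match the bullets above.

```python
# Hypothetical rows for a combined Student-Enroll table.
# Assumed primary key: (stuId, classNumber).
rows = [
    {"stuId": "S1001", "classNumber": "ART103A", "schedule": "MWF9", "room": "H221"},
    {"stuId": "S1002", "classNumber": "ART103A", "schedule": "MWF9", "room": "H221"},
    {"stuId": "S1010", "classNumber": "ART103A", "schedule": "MWF9", "room": "H221"},
    {"stuId": "S1001", "classNumber": "HST205A", "schedule": "MWF11", "room": "H221"},
]


def schedules(table, class_number):
    """Return the set of schedules recorded for one class."""
    return {r["schedule"] for r in table if r["classNumber"] == class_number}


# Update anomaly: change the schedule in the first ART103A record only.
updated = [dict(r) for r in rows]
updated[0]["schedule"] = "TuTh10"
inconsistent = schedules(updated, "ART103A")  # now holds two schedules

# Deletion anomaly: removing S1001 also removes the only HST205A record.
after_delete = [r for r in rows if r["stuId"] != "S1001"]
classes_left = {r["classNumber"] for r in after_delete}


# Insertion anomaly: a class without students has no stuId, and no part
# of a primary key may be empty.
def can_insert(row):
    return row.get("stuId") is not None and row.get("classNumber") is not None


new_class = {"stuId": None, "classNumber": "MTH101A", "schedule": "MTuTh8", "room": "H225"}

print(inconsistent, classes_left, can_insert(new_class))
```

Splitting class information into its own table, keyed by classNumber, would remove all three problems in this sketch.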

Normalization / Objectives

Normalization is the process of efficiently organizing data in a
database, which reduces the amount of space a database
consumes and ensures that data is logically stored.

The main objectives of the normalization process:
• Eliminating redundant data (storing the same data in more than one
  table)
• Ensuring data dependencies make sense (only storing related data
  in a table)
• Keeping the design free from insert, delete, and update anomalies
• Model flexibility (allowing the model to be extended when needed to
  account for new attributes, entity sets, and relationships)

The Goals of Normalization

When normalizing a database you should achieve four goals:
• Arranging data into logical groups such that each group describes a
  small part of the whole
• Minimizing the amount of duplicated data stored in a database
• Building a database in which you can access and manipulate the
  data quickly and efficiently without compromising the integrity of the
  data storage
• Organizing the data such that, when you modify it, you make the
  changes in only one place

Normal Forms

We use the normalization process to design efficient and functional
databases. By normalizing, we store data where it logically and uniquely
belongs. The normalization process proceeds in steps, and each step
is called a normal form.
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain-Key Normal Form (DKNF)
Each form is contained within the previous form: each form has stricter
rules than the previous form.

Types of Dependencies involved with Normalization

The following all cause problems in relational design:

• Functional dependencies
• Multi-valued dependencies
• Join dependencies

Functional Dependencies

A functional dependency is a relationship between two attributes,
typically between the primary key and other non-key attributes within
a table. For a relational table R, attribute B is functionally
dependent on attribute A (usually the primary key) if each value of A
is associated with exactly one value of B.
Written: A→B
Read: A functionally determines B
or: B is functionally dependent on A
Example: Student table: StuId and StuLastName
StuId is the primary key
StuId uniquely identifies the StuLastName attribute, because if we know
the student id, we can tell the student name associated with it.
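
As a rough illustration, a candidate dependency A→B can be tested against sample rows: it fails as soon as one value of A appears with two different values of B. The function name holds and the sample students are assumptions made for this sketch. A check like this can only refute a dependency on the data at hand; whether the dependency really holds is a statement about the meaning of the attributes.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs is not violated by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two different students share a last name.
students = [
    {"StuId": "S1001", "StuLastName": "Smith"},
    {"StuId": "S1002", "StuLastName": "Chin"},
    {"StuId": "S1003", "StuLastName": "Smith"},
]

id_determines_name = holds(students, ["StuId"], ["StuLastName"])
name_determines_id = holds(students, ["StuLastName"], ["StuId"])
print(id_determines_name, name_determines_id)
```

Here StuId→StuLastName is consistent with the sample, while StuLastName→StuId is refuted by the two students named Smith.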

Multi-Valued Dependencies

A multi-valued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute. That
is, a table has attributes A, B, and C, and B and C are multi-valued
facts of A.
Written: A →→ B, A →→ C
Read: A multi-determines attribute B
And: A multi-determines attribute C
Example: Student table: StuLastName, Major, Sport

Multi-Valued Dependencies (Cont'd)

Attributes Major and Sport are independent of each other but
dependent on StuLastName.
Problem: The table must list all combinations of values of Major and
Sport for each StuLastName to avoid implying relationships that do
not exist.
A table with a multi-valued dependency violates the normalization
standard of Fourth Normal Form (4NF) because it creates
unnecessary redundancies and can contribute to inconsistent data.
To bring this up to 4NF, it is necessary to break the multi-valued
dependency information into two tables:
StudentMajors table: StuLastName, Major
StudentSports table: StuLastName, Sport

Join Dependencies

A join dependency is a generalization of a multi-valued dependency.
A relation is said to have a join dependency if it can be recreated by
joining multiple sub-relations, where each of these sub-relations has a
subset of the attributes of the original relation.
Basically, the table can be recreated by joining multiple smaller tables.

First Normal Form (1NF)

A table is in First Normal Form (1NF) if and only if the following
conditions are satisfied:
• Each attribute contains only one value (single-valued)
• All attribute values are atomic, meaning they cannot be broken down
  any further (there are no repeating groups)
• There are no duplicated rows in the table

Example of a table that does not satisfy 1NF - NewStu2 Table

NewStu2 (New Student table):
• Assume stuId is the primary key
• Assume students can have more than one major
• The major attribute is not single-valued for each tuple. For a given
  stuId, there may be more than one value for major

Ideal Method to First Normal Form (1NF)

Best solution: For each multi-valued attribute, create a new table, in
which you place the key of the original table and the multi-valued
attribute. Keep the original table, with its primary key.
Example: NewStu2 (stuId, lastName, credits, status, socSecNo)
Majors (stuId, major)
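
A minimal sketch of this split, with hypothetical rows in which major holds a list of values. The helper name to_1nf and the sample students are assumptions for illustration, and only some of the NewStu2 columns are shown.

```python
# Hypothetical NewStu2 rows; "major" is multi-valued, so this is not 1NF.
new_stu2 = [
    {"stuId": "S1001", "lastName": "Smith", "major": ["History"]},
    {"stuId": "S1003", "lastName": "Jones", "major": ["Math", "CSC"]},
]


def to_1nf(rows, key, multi):
    """Split one multi-valued attribute out into its own (key, value) table."""
    # Original table: every column except the multi-valued one.
    base = [{k: v for k, v in r.items() if k != multi} for r in rows]
    # New table: one row per (key, single value) pair.
    detail = [{key: r[key], multi: v} for r in rows for v in r[multi]]
    return base, detail


students, majors = to_1nf(new_stu2, "stuId", "major")
print(students)
print(majors)
```

The Majors rows use (stuId, major) together as their key, so a student with two majors simply has two rows.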

Second Method to First Normal Form (1NF)

If the number of repeats is limited, make additional columns for multiple
values. Ex: major1 and major2

Drawback: You must know the maximum number of repeats, and queries
become more complex.

Third Method to First Normal Form (1NF)

Flatten the original table by making the multi-valued attributes part of
the primary key:
Ex: Student (stuId, major, lastName, credits, status, socSecNo)

Second Normal Form (2NF)

A table is in Second Normal Form (2NF) if and only if the following
conditions are satisfied:
• Table is in First Normal Form (1NF)
• And all non-primary key attributes are fully functionally dependent on
  the primary key (no partial dependency)
Note: If the primary key has only one attribute and the table is in 1NF,
then the table is automatically in 2NF.

Converting to Second Normal Form (2NF)

• Identify each partial functional dependency
• Remove the attributes that depend on each of the determinants so
  identified
• Place these determinants in separate tables along with their
  dependent attributes
• In the original table, keep the composite key and any attributes that
  are fully functionally dependent on all of it
• Even if the composite key has no dependent attributes, keep that
  relation to connect logically the others

Second Normal Form (2NF) Example

StuId   classNumber   Cost
S1001   ART103A       1000
S1002   HST205A       1500
S1001   MTH101B       2000
S1004   MTH103C       1000
S1004   ART103A       1000
S1002   CSC201A       2000

Note: There are many classes with the same cost
(ART103A and MTH103C both cost 1000).

Second Normal Form (2NF) Example (Cont'd)

2NF tries to reduce the redundant data stored on disk.
For example, if there are 100 students taking ART103A, we do not need to
store its cost of 1000 in all 100 records. Instead, we can store it once
in the second table, as the cost of ART103A.

Table 1
StuId   classNumber
S1001   ART103A
S1002   HST205A
S1001   MTH101B
S1004   MTH103C
S1004   ART103A
S1002   CSC201A

Table 2
classNumber   Cost
ART103A       1000
HST205A       1500
MTH101B       2000
MTH103C       1000
CSC201A       2000
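
The same decomposition can be sketched in a few lines, using the rows from the example. The variable names are chosen for illustration. Cost depends only on classNumber, which is part of the composite key (StuId, classNumber), so it moves to its own table and is stored once per class.

```python
# Original table: (StuId, classNumber, Cost).
enroll = [
    ("S1001", "ART103A", 1000),
    ("S1002", "HST205A", 1500),
    ("S1001", "MTH101B", 2000),
    ("S1004", "MTH103C", 1000),
    ("S1004", "ART103A", 1000),
    ("S1002", "CSC201A", 2000),
]

# Table 1: keep only the composite key.
table1 = [(stu, cls) for stu, cls, _ in enroll]

# Table 2: classNumber -> Cost, one entry per class.
table2 = {}
for _, cls, cost in enroll:
    table2[cls] = cost

# Joining the two tables on classNumber rebuilds the original rows.
rejoined = [(stu, cls, table2[cls]) for stu, cls in table1]
print(len(table2), rejoined == enroll)
```

The cost of ART103A now appears once in Table 2 instead of once per enrolled student.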

Third Normal Form (3NF)

A table is in Third Normal Form (3NF) if and only if the following
conditions are satisfied:
• Table is in Second Normal Form (2NF)
• And no non-primary key attribute is transitively dependent on the
  primary key

Stated another way, a table is in 3NF if, for every non-trivial
functional dependency X→Y, at least one of the following conditions
holds:
• X is a superkey
• Y is a prime attribute (each element of Y is part of some candidate
  key)

Making a Relation Third Normal Form (3NF)

Example: Student (StuId, StuName, StuCity, StuState, StuZip)
• StuCity and StuState are dependent on StuZip
• StuZip is dependent on StuId
• The non-prime attributes (StuCity, StuState) are transitively dependent
  on the key (StuId), which violates the 3NF rule.

To fix this we:
• Remove the dependent attributes from the table
• Create a new table with the dependent attributes and their determinant
• Keep the determinant in the original table:
NewStudent (StuId, StuName, StuZip)
zipCode (StuZip, StuCity, StuState)
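
A small sketch of this decomposition, assuming hypothetical student rows (the names, cities, and zip codes are illustrative only). City and state are stored once per zip code, and a lookup on StuZip rebuilds the original rows.

```python
# Hypothetical Student rows before decomposition.
students = [
    {"StuId": "S1001", "StuName": "Smith", "StuCity": "New York",
     "StuState": "NY", "StuZip": "10038"},
    {"StuId": "S1002", "StuName": "Chin", "StuCity": "New York",
     "StuState": "NY", "StuZip": "10038"},
    {"StuId": "S1003", "StuName": "Lee", "StuCity": "Pleasantville",
     "StuState": "NY", "StuZip": "10570"},
]

# NewStudent (StuId, StuName, StuZip): the determinant StuZip stays.
new_student = [
    {k: s[k] for k in ("StuId", "StuName", "StuZip")} for s in students
]

# zipCode (StuZip, StuCity, StuState): one entry per zip code.
zip_code = {
    s["StuZip"]: {"StuCity": s["StuCity"], "StuState": s["StuState"]}
    for s in students
}

# Looking up each student's zip code rebuilds the original rows.
rebuilt = [{**s, **zip_code[s["StuZip"]]} for s in new_student]
print(len(zip_code), rebuilt == students)
```

If a city name has to be corrected, the change is now made in one zipCode row rather than in every student row with that zip code.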

Boyce-Codd Normal Form (BCNF)

A table is in Boyce-Codd Normal Form (BCNF) if and only if the
following conditions are satisfied:
• Table is in Third Normal Form (3NF)
• And for any non-trivial functional dependency A→B, A is a
  superkey

Therefore, to check for BCNF, we simply identify all the determinants
and verify that they are superkeys. If they are not, we break up the
relational table by projection until we have a set of relational tables
all in BCNF.
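
The superkey check can be sketched with the standard attribute-closure algorithm: a determinant is a superkey when its closure contains every attribute of the table. The function names and the dependencies below are assumptions for illustration, for an enrollment table where (stuId, classNumber) → facName and facName → classNumber.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we can add all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]


schema = {"stuId", "classNumber", "facName"}
fds = [
    ({"stuId", "classNumber"}, {"facName"}),
    ({"facName"}, {"classNumber"}),
]

violations = bcnf_violations(schema, fds)
print(violations)
```

Under these assumed dependencies, facName → classNumber is reported, because the closure of facName does not include stuId.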

Boyce-Codd Normal Form (BCNF) - Example

stuId   classNumber   facName
S1001   ART103A       Adams
S1001   HST205A       Tanaka
S1002   ART103A       Byrne
S1003   MTH101B       Smith
S1004   ART103A       Adams

- Primary key: (stuId, classNumber)
- One student can enroll in many classes: S1001 (ART103A, HST205A)
- For each classNumber, a faculty member is assigned to the student
- A classNumber can be taught by many faculty: ART103A (Adams, Byrne)
- There is a dependency between classNumber and facName, where
  classNumber is dependent on facName (facName → classNumber)

Boyce-Codd Normal Form (BCNF) – Example (Cont'd)

• This table satisfies the First Normal Form (1NF) because all the
  values are atomic, column names are unique, and all the values
  stored in a particular column are of the same domain.
• This table also satisfies the Second Normal Form (2NF), as there is
  no partial dependency.
• This table also satisfies the Third Normal Form (3NF), as there is
  no transitive dependency.
• But this table is not in Boyce-Codd Normal Form, because classNumber
  is dependent on facName. While classNumber is part of the primary
  key, facName is a non-primary key attribute and not a superkey, and
  BCNF does not allow such an attribute to be a determinant.

Boyce-Codd Normal Form (BCNF) – Example (Cont'd)

How to fix this: we move the dependent attributes to a new relational
table, with the determinant as the primary key.

Student
stuId   FacID
S1001   F101
S1001   F102

Faculty
FacID   FacName   ClassNum
F101    Adams     ART103A
F102    Tanaka    HST205A

Fourth Normal Form (4NF)

A table is in Fourth Normal Form (4NF) if and only if the following
conditions are satisfied:
• The table is in Boyce-Codd Normal Form (BCNF)
• The table does not have any multi-valued dependency
A table with a multi-valued dependency violates the normalization
standard of Fourth Normal Form (4NF) because it creates
unnecessary redundancies and can contribute to inconsistent data.
To bring this up to 4NF, it is necessary to break this information into
two tables.

Fourth Normal Form (4NF) Example

StuId   classNumber   Hobby
S1001   ART103A       Cricket
S1001   MTH101B       Hockey

S1001 is taking two courses and has two hobbies.
The problem is that there is no relationship between classNumber and
Hobby: they are independent of each other.
Therefore, there is a multi-valued dependency, which leads to unnecessary
repetition of data and other anomalies as well. To avoid implying that a
class is tied to a hobby, the table must list every combination:

StuId   classNumber   Hobby
S1001   ART103A       Cricket
S1001   ART103A       Hockey
S1001   MTH101B       Cricket
S1001   MTH101B       Hockey

Fourth Normal Form (4NF) Example (Cont'd)

To fix this, we can decompose the table into 2 tables.

StuId   classNumber
S1001   ART103A
S1001   MTH101B

StuId   Hobby
S1001   Cricket
S1001   Hockey
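
As a quick check of this decomposition, joining the two small tables on StuId should give back every class and hobby combination for the student. The sketch below uses the rows from the example, with variable names chosen for illustration.

```python
# The two 4NF tables from the example.
classes = [("S1001", "ART103A"), ("S1001", "MTH101B")]
hobbies = [("S1001", "Cricket"), ("S1001", "Hockey")]

# Natural join on StuId: pair every class with every hobby of the student.
joined = sorted(
    (stu, cls, hobby)
    for stu, cls in classes
    for stu2, hobby in hobbies
    if stu == stu2
)

for row in joined:
    print(row)
```

The four joined rows match the combined table, while each class and each hobby is stored only once. Adding a third hobby would take one new row in the hobby table instead of one new row per class.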

Fifth Normal Form (5NF)

Fifth Normal Form (5NF) is also known as Project-Join Normal Form
(PJ/NF).
A table is in Fifth Normal Form (5NF) if and only if the following conditions
are satisfied:
• Table is in Fourth Normal Form (4NF)
• It cannot be losslessly decomposed any further (it has no remaining
  join dependency)

A lossless (non-loss) join decomposition is a decomposition of a table
into smaller tables such that a natural join of the smaller tables yields
back the original table. This is central in removing redundancy safely
from databases while preserving the original data.

Note: 5NF is satisfied when all the tables are broken into as many tables
as possible in order to avoid redundancy.

Fifth Normal Form (5NF) - Example

FacID   Subject   Semester
F101    History   1
F101    Math      1
F102    Math      2

F101 teaches History and Math for Semester 1 but does not teach
Math for Semester 2.
The primary key is the combination of all three columns.
Assume we want to add Semester 3 but do not know who will be
teaching what subject. We cannot leave the attributes null, since they
are part of the primary key.

Fifth Normal Form (5NF) – Example (Cont'd)

To fix this, we need to decompose the table into the following 3 tables.
Each table is a projection of the original, so a repeated row such as
(F101, 1) is stored only once:

Table 1             Table 2              Table 3
FacID   Subject     FacID   Semester     Semester   Subject
F101    History     F101    1            1          History
F101    Math        F102    2            1          Math
F102    Math                             2          Math
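
A short sketch can confirm that the three projections join back to the original rows. The variable names are assumptions for illustration, and sets are used so that duplicate projected rows collapse automatically.

```python
# Original table: (FacID, Subject, Semester).
original = {("F101", "History", 1), ("F101", "Math", 1), ("F102", "Math", 2)}

# The three projections (Table 1, Table 2, Table 3).
fac_subject = {(f, s) for f, s, _ in original}
fac_semester = {(f, t) for f, _, t in original}
semester_subject = {(t, s) for _, s, t in original}

# Three-way natural join: keep a row only if all three tables agree.
rejoined = {
    (f, s, t)
    for f, s in fac_subject
    for f2, t in fac_semester
    if f == f2 and (t, s) in semester_subject
}

# Joining only Table 1 and Table 3 on Subject adds rows that were never
# in the original table.
two_way = {
    (f, s, t)
    for f, s in fac_subject
    for t, s2 in semester_subject
    if s == s2
}

print(rejoined == original, len(two_way))
```

For this data the three-way join is lossless, whereas the two-way join on Subject alone yields five rows, including pairings such as F102 teaching Math in Semester 1 that do not exist in the original.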

Domain-Key Normal Form (DKNF)

A table is in Domain-Key Normal Form (DKNF) when every constraint
is a logical consequence of domain constraints or key constraints.

• Domain constraints specify the possible values of the attribute.
  Ex: Colors are only black and white.
  Ex: GPA of Student is between 0 and 4.
• Key constraints specify keys of some table.

The basic idea behind the DKNF is to specify the normal form that
takes into account all the possible dependencies and constraints.
In other words, DKNF requires that the database contains no
constraints other than domain constraints and key constraints.

Denormalization

• Denormalization is the reverse process of normalization
• When to stop the normalization process:
   - When applications require too many joins
   - When you cannot get a non-loss decomposition that preserves
     dependencies
• Denormalization means deliberately choosing a lower normal form



Problems with a Normalized Database

• Best used when the source data is relatively simple
• Stored data does not resemble the original documents from which it
  is taken, but instead is shredded into separate tables
• Usually stores only the most current information, not historical data
• Useful for OLTP (online transaction processing)
• Optimized for write operations; read operations may be slow if joins
  of the tables are required

Non-Normalized Databases

• OLAP systems
   - Used for planning and decision-making
   - Require historical data, not just current data
   - Updates are rare; optimized for reading
   - Data stored in denormalized form
• Object-based systems
   - Needed for advanced applications
   - Objects are not normalized

Non-Normalized Databases (Cont'd)

• Big data systems: XML, Google's Bigtable, HBase, Cassandra,
  and Hadoop
   - Capture data in the format of its source
   - Store data in a denormalized, usually duplicated, form
   - Allow multiple versions of items to be stored
   - Provide efficient and scalable read access to data
   - Facilitate data transmission
Chapter 6 Questions?
Chapter 6: Team Assignment Review

Normalization

Chapter 6 – Team Project: Normalizing the Relational Model for the
Team Project and Creating a Normalized Oracle Database
Read the sample project steps for this chapter and apply the same
techniques to the team project that you are developing. For the team
project, do the following:

Step 6.1 - Begin with the list of the tables that the entities and
relationships from the E-R diagram mapped to naturally, from
the sample project section at the end of chapter 4.
For each table on the list, identify functional dependencies and
normalize the relation to Boyce-Codd Normal Form (BCNF). Then
decide whether the resulting tables should be implemented in that
form. If not, explain why.

Normalization (Cont'd)

Step 6.2 - Update the data dictionary and list of assumptions as
needed.

Step 6.3 - For each table, write the table name and write out the
names, data types, and sizes of all the data items.
Identify any constraints, using the conventions of the DBMS you will use for
implementation.

Step 6.4 - Write and execute SQL statements to create all the tables
needed to implement the design.

Step 6.5 - Write and execute SQL statements to create indexes for
foreign keys and any other columns that will be used most often for
queries. (primary key, foreign key, check constraints)
Note: Step 6.4 and Step 6.5 can be combined.

Normalization (Cont'd)

Step 6.6 - Write and execute SQL statements to insert at least five
records in each table, preserving all constraints.
Put in enough data to demonstrate how the database will function.

Step 6.7 - Write and execute SQL statements that will process five
non-routine requests for information from the database just
created.
Note: Make sure to write 5 different SQL statements. Also use a
WHERE clause or join tables.
Do not write select * from <table_name>;

Normalization (Cont'd)

Step 6.8 - Write and execute SQL statements to create at least one
trigger.

Step 6.9 - Write and execute SQL statements to demonstrate that
the trigger is working as expected.
To demonstrate that the trigger is working as expected, provide a
screenshot of the data before and after the trigger is executed.
