
6.CSI2004-ADBMS Normalization

Uploaded by

roshika.s2022

CSI2004 - Advanced Database Management Systems
Relational Database Model

1/10/2025 10:26:03 AM
Relational Database Model

• In a relational database management system (RDBMS), data is represented as a set of relations.

Relations:
• A relation appears as a two-dimensional table.
• The RDBMS organizes the data so that its
external view is a set of relations or tables.
• This does not mean that data is stored as tables:
the physical storage of the data is independent
of the way in which the data is logically
organized.
Relational Database Model
A relation in an RDBMS has the following features:

• Name. Each relation in a relational database should have a name that is unique among other relations.
• Attributes. Each column in a relation is called an attribute. The attributes are the column headings in the table.
• Tuples. Each row in a relation is called a tuple. A tuple defines a collection of attribute values. The total number of rows in a relation is called the cardinality of the relation. Note that the cardinality of a relation changes when tuples are added or deleted. This makes the database dynamic.
Relational Database Model
Basic Structure
• Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 × D2 × … × Dn.
  Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di.
• Example: If
  customer_name = {Jones, Smith, Curry, Lindsay, …}   /* set of all customer names */
  customer_street = {Main, North, Park, …}            /* set of all street names */
  customer_city = {Harrison, Rye, Pittsfield, …}      /* set of all city names */
  then r = { (Jones, Main, Harrison),
             (Smith, North, Rye),
             (Curry, North, Rye),
             (Lindsay, Park, Pittsfield) }
  is a relation over customer_name × customer_street × customer_city.
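The definition above can be checked directly with Python sets. This is a minimal sketch: the three domains are truncated to the names listed in the example (the "…" elisions are simply dropped), so the sizes printed here describe only this toy version.

```python
from itertools import product

# Domains from the example above, truncated to the listed values (assumption).
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# The Cartesian product contains every possible 3-tuple over the domains.
cartesian = set(product(customer_name, customer_street, customer_city))

# The relation r is one particular subset of that product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

print(r <= cartesian)          # True: r is a subset of the product
print(len(cartesian), len(r))  # 36 possible tuples, 4 of them in r
```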
Relation Schema
• A1, A2, …, An are attributes
• R = (A1, A2, …, An) is a relation schema
  Example: Customer_schema = (customer_name, customer_street, customer_city)
• r(R) denotes a relation r on the relation schema R
  Example: customer (Customer_schema)
Relation Schema
• Di is called the domain of Ai and is denoted by dom(Ai).
• A relation schema is used to describe a relation; R is called the name of this relation.
• The degree (or arity) of a relation is the number of attributes n of its relation schema.

Example:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)

With attribute types:
STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string, Office_phone: string, Age: integer, Gpa: float)

dom(Name) = Names
dom(Ssn) = Social_security_numbers
dom(Home_phone) = USA_phone_numbers
dom(Office_phone) = USA_phone_numbers
dom(Gpa) = Grade_point_averages
Relational Database Design, Schema Refinement

Database Design

How to design a relational database schema based on the conceptual schema design
Relational Database Design
• Relational database design: the grouping of attributes to form "good" relation schemas.
• Relational database design requires that we find a "good" collection of relation schemas.
• A bad design may lead to
  - Repetition of information.
  - Inability to represent certain information.
• Criteria for "good" relations:
  - Informal guidelines for good relational design
  - Formal concepts of functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF)
Informal Design Guidelines for Relational Databases

1. Semantics of the Relation Attributes
2. Redundant Information in Tuples and Update Anomalies
3. Null Values in Tuples
4. Spurious Tuples
• GUIDELINE 1: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.
• GUIDELINE 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any present, then note them so that applications can be made to take them into account.
• GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
• GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural join of any relations.
1. Semantics of the Relation Attributes

• Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
• Only foreign keys should be used to refer to other entities.
• Entity and relationship attributes should be kept apart as much as possible.
Redundant Information in Tuples and Update Anomalies

• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly, wasting storage
• Problems with update anomalies
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies
• Update Anomaly: Changing the name of project number P1 from “ProductX” to “ProductY” requires this update to be made in the tuple of every employee working on project P1.
• Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.
• Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
Null Values in Tuples

• Attributes that are frequently NULL could be placed in separate relations (with the primary key)
• Reasons for nulls:
  - attribute not applicable or invalid
  - attribute value unknown (may exist)
  - value known to exist, but unavailable
Spurious Tuples

• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results for join operations
• There are two important properties of decompositions:
  (a) the non-additive (lossless) property of the corresponding join
  (b) preservation of the functional dependencies
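To make spurious tuples concrete, here is a small sketch in Python. The relation r(employee, location, project) and its two rows are hypothetical, chosen only so that a decomposition on a non-key attribute (location) produces extra tuples when the pieces are joined back.

```python
# Hypothetical relation r(employee, location, project); illustration only.
r = {
    ("Smith", "Houston", "ProductX"),
    ("Wong", "Houston", "ProductY"),
}

# Decompose r by projection onto (employee, location) and (location, project).
r1 = {(emp, loc) for (emp, loc, proj) in r}
r2 = {(loc, proj) for (emp, loc, proj) in r}

# Natural join of the two projections on the shared attribute "location".
joined = {(emp, loc1, proj) for (emp, loc1) in r1 for (loc2, proj) in r2 if loc1 == loc2}

# Tuples that appear in the join but were never in r are spurious.
spurious = joined - r
print(sorted(spurious))
# [('Smith', 'Houston', 'ProductY'), ('Wong', 'Houston', 'ProductX')]
```

The join is "lossy" here in the sense that it adds tuples: the original information about who works on which project can no longer be recovered.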
Functional Dependency

• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
FUNCTIONAL DEPENDENCIES

• Functional dependency is a constraint between two attributes or two sets of attributes
• Assume X and Y are two attributes. Functional dependency between X and Y is represented by X → Y
• The attribute on the left-hand side of the arrow in a functional dependency is called the determinant.
• The attribute on the right-hand side of the arrow in a functional dependency is called the dependent.
• X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y
• Example: sno → sname
• Example: social security number determines employee name
  SSN → ENAME
• Formally, X → Y holds in R if, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
• X is called the left-hand side of the FD
• Y is called the right-hand side of the FD
• X → Y is read as "X functionally determines Y in R"
• A candidate key is always a determinant, but a determinant doesn't need to be a key.
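The tuple-based definition translates almost word for word into a check on a relation instance. The sketch below is illustrative only: the function name fd_holds and the sample employee tuples are assumptions, not part of the slides. Note that a single instance can refute an FD but cannot prove it, since an FD is a constraint on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated in this relation instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names.
    """
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical employee tuples, used only for illustration.
emp = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 1},
]

print(fd_holds(emp, ["SSN"], ["ENAME"]))      # True
print(fd_holds(emp, ["PNUMBER"], ["ENAME"]))  # False: project 1 has two names
```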
Student

SNo  | SName      | CNo | CName  | Addr           | Instr.   | Office
5425 | Susan Ross | 102 | Calc I | …San Jose, CA  | P. Smith | B42 Room 112
7845 | Dave Turco | 541 | Bio 10 | …San Diego, CA | L. Talip | B24 Room 210

SNo → SName    CNo → CName    Instr → Office
SNo → Addr     CNo → Instr
• Project number determines project name and location
  PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project
  {SSN, PNUMBER} → HOURS
Inference Rules for FDs

• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
• Armstrong's inference rules (Armstrong's Axioms):
  - IR1. (Reflexive) If Y ⊆ X, then X → Y
  - IR2. (Augmentation) If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)
  - IR3. (Transitive) If X → Y and Y → Z, then X → Z
• IR1, IR2, IR3 form a sound and complete set of inference rules
• Some additional inference rules that are useful:
  - Decomposition: If X → YZ, then X → Y and X → Z
  - Union: If X → Y and X → Z, then X → YZ
  - Pseudotransitivity: If X → Y and WY → Z, then WX → Z
  - Self-determination: A → A
  - Composition: If A → B and C → D, then AC → BD
  - General unification theorem: If A → B and C → D, then A ∪ (C − B) → BD
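In practice the inference rules are usually applied through the attribute closure X+, the set of all attributes determined by X under a given set of FDs. The following is a minimal sketch of the standard fixed-point closure computation, not something taken from the slides. The function names are illustrative, and the FD set combines the PNUMBER, {SSN, PNUMBER} and SSN dependencies used as examples earlier in these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already determined,
            # everything on the right-hand side is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)

fds = [
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
]

print(sorted(closure({"SSN"}, fds)))
# ['ENAME', 'SSN']
print(sorted(closure({"SSN", "PNUMBER"}, fds)))
# ['ENAME', 'HOURS', 'PLOCATION', 'PNAME', 'PNUMBER', 'SSN']
print(implies(fds, {"SSN", "PNUMBER"}, {"PNAME"}))  # True
```

Because X → Y follows from F exactly when Y is contained in X+, one closure computation replaces a manual chain of reflexivity, augmentation and transitivity steps.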
Normalization
Need for Normalization
Normalization
Need:
• Normalization is a technique of organizing the data into multiple related tables to minimize data redundancy.
• Redundancy:
  - Repetition of the same data at multiple places
  - Increases the size of the database
  - Creates issues like insertion, deletion and update anomalies
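Example

ROLL NO | NAME     | BRANCH | HOD   | OFFICE_CONTACT NO
1       | DINESH   | CSE    | Mr. M | 43377
2       | ASHOK    | CSE    | Mr. M | 43377
3       | RANI     | CSE    | Mr. M | 43377
4       | SASIKALA | CSE    | Mr. M | 43377

Redundancy: the BRANCH, HOD and OFFICE_CONTACT NO values are repeated in every row.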
Issues due to Redundancy

• Insertion anomaly
  - The same branch data must be inserted again with every new student row
  - Two different but related kinds of data (student and branch) are stored in the same table
• Deletion anomaly
  - Deleting student information also deletes the branch information
  - When a department has no students, the department details are not stored anywhere in the database
• Update anomaly
  - When Mr. M leaves and Mr. N joins as HOD of CSE, the HOD data must be modified in every student record
  - If even a single record misses the modification, data inconsistency occurs
Solution to these problems: NORMALIZATION

How? Split the Student table into a Student table and a Branch table.

Student Table
ROLL NO | NAME     | BRANCH
1       | DINESH   | CSE
2       | ASHOK    | CSE
3       | RANI     | CSE
4       | SASIKALA | CSE

Branch Table
BRANCH | HOD   | OFFICE_CONTACT NO
CSE    | Mr. M | 43377

Insertion, deletion and update anomalies are solved.
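The effect of the split can be sketched in a few lines of Python. This is only an illustration using the four example rows; the variable names and the use of tuples and a dict are assumptions, not a prescribed implementation.

```python
# Rows of the unnormalized table from the example:
# (roll_no, name, branch, hod, office_contact_no)
rows = [
    (1, "DINESH", "CSE", "Mr. M", "43377"),
    (2, "ASHOK", "CSE", "Mr. M", "43377"),
    (3, "RANI", "CSE", "Mr. M", "43377"),
    (4, "SASIKALA", "CSE", "Mr. M", "43377"),
]

# Student table keeps only student data plus the branch as a foreign key.
student = [(roll, name, branch) for (roll, name, branch, _, _) in rows]

# Branch table stores each branch exactly once, keyed by branch name.
branch_table = {b: (hod, contact) for (_, _, b, hod, contact) in rows}

# Changing the HOD is now a single update instead of one per student.
branch_table["CSE"] = ("Mr. N", branch_table["CSE"][1])

# Joining on the branch column reconstructs the wide view when needed.
rejoined = [(roll, name, b, *branch_table[b]) for (roll, name, b) in student]
print(rejoined[0])  # (1, 'DINESH', 'CSE', 'Mr. N', '43377')
```

Branch details also survive if every student row is deleted, and a new branch can be added before any student joins it.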
Normal Forms
NORMALIZATION
• A step-by-step process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
• The different stages of normalization are known as “Normal Forms”
• To accomplish normalization we need to understand the concept of Functional Dependencies and Keys.
Keys

• A key is an attribute or combination of attributes that uniquely identifies a row in a relation.
Definitions of Keys & Attributes Participating in Keys

• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
• A key is a minimal superkey: removing any attribute from it leaves a set that is no longer a superkey.
• If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A prime attribute must be a member of some candidate key
• A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
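These definitions can be exercised with a brute-force search: a set of attributes is a superkey when its closure under the FDs is the whole schema, and a candidate key is a superkey with no smaller superkey inside it. The sketch below uses a hypothetical schema R(A, B, C) with A → B, B → A and A → C. The schema, the FDs and the function names are assumptions made for illustration, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the full schema."""
    keys = []
    # Trying smaller sets first guarantees that every key found is minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys

# Hypothetical schema and FDs, for illustration only.
attributes = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"A"}), ({"A"}, {"C"})]

keys = candidate_keys(attributes, fds)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['A'], ['B']]
print(sorted(prime))              # ['A', 'B']
print(sorted(attributes - prime)) # ['C']
```

Here A and B are prime attributes because each is a candidate key on its own, while C is nonprime.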
NORMALIZATION
• Normal Forms:
  - First Normal Form
  - Second Normal Form
  - Third Normal Form
  - Boyce-Codd Normal Form
  - Fourth Normal Form
  - Fifth Normal Form
NORMALIZATION
Table with multivalued attributes
  ↓ Remove multivalued attributes
1NF
  ↓ Remove partial dependencies
2NF
  ↓ Remove transitive dependencies
3NF
  ↓ Remove remaining anomalies resulting from FDs
BCNF
  ↓ Remove multivalued dependencies
4NF
  ↓ Remove remaining anomalies (join dependencies)
5NF
1NF
First Normal Form
• Each attribute should have atomic values
• A column should contain values from the same domain
• Each column should have a unique name
• No ordering of tuples and columns
• No duplicate rows
• Only simple and indivisible values in the domain of attributes
• Disallows composite attributes, multivalued attributes, and nested relations
First Normal Form

Approach 1
• A table with multivalued attributes is converted to a relation in 1NF by extending the data in each column to fill cells that are empty because of the multivalued attributes.

Anomalies in 1NF
• Insert anomalies
• Delete anomalies
• Update anomalies

Duplication of values in the 1NF table.
Example of 1NF
Approach 2: Decompose into smaller schemas in 1NF

Not in 1NF: DEPARTMENT

Decompose the DEPARTMENT schema into DEPT1 and DEPT2 (both in 1NF):
DEPT1 = (Dname, Dnumber, Dmgr_ssn)
DEPT2 = (Dnumber, Dlocations)
Example of 1NF

Not in 1NF: STUDENT (sid, sname, credits, dname, building, roomno)

Decompose the STUDENT schema into STUD1 and STUD2 (both in 1NF):
STUD1 = (sid, sname, credits, dname)
STUD2 = (dname, building, roomno)
Example of 1NF

• Composite attributes → Create a separate attribute for each component attribute
• Multivalued attributes → Create a separate schema and map each value of the multivalued attribute to a separate tuple of the relation
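Both 1NF approaches can be sketched on a small department table. The two department rows below are hypothetical sample data, and Dmgr_ssn is left out for brevity; only the attribute names Dname, Dnumber and Dlocations come from the DEPARTMENT example.

```python
# Hypothetical DEPARTMENT rows; Dlocations is multivalued, so not in 1NF.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": ["Houston"]},
]

# Approach 1: one row per location, repeating the other columns.
flat = [
    {"Dname": d["Dname"], "Dnumber": d["Dnumber"], "Dlocation": loc}
    for d in department
    for loc in d["Dlocations"]
]

# Approach 2: move the multivalued attribute into its own relation,
# carrying Dnumber along so the two relations can be joined again.
dept1 = [{"Dname": d["Dname"], "Dnumber": d["Dnumber"]} for d in department]
dept2 = [(d["Dnumber"], loc) for d in department for loc in d["Dlocations"]]

print(len(flat))   # 4 rows, with Dname repeated for every location
print(len(dept1))  # 2 rows, one per department
print(len(dept2))  # 4 (Dnumber, Dlocation) pairs
```

Approach 1 reaches 1NF at the cost of duplicated values, while Approach 2 stores each department name once.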
2NF
Second Normal Form

• A relation is said to be in Second Normal Form if and only if:
  - It is in First Normal Form and
  - No nonkey attribute is partially dependent on the primary key
• Partial functional dependency: an FD in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.
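A partial dependency can be spotted mechanically by looking for an FD whose left-hand side is a proper subset of the primary key. The sketch below assumes a hypothetical relation with primary key {SSN, PNUMBER} and the SSN, PNUMBER and HOURS dependencies used as FD examples in these notes. The function name is illustrative, and it inspects only the FDs it is given, not every FD that could be inferred from them.

```python
def partial_dependencies(key, fds):
    """List (lhs, attribute) pairs where a nonkey attribute depends on
    a proper subset of the key. Only the listed FDs are inspected."""
    found = []
    for lhs, rhs in fds:
        if lhs < key:  # lhs is a proper subset of the primary key
            for attr in sorted(rhs - key):
                found.append((sorted(lhs), attr))
    return found

# Assumed primary key and FDs for illustration.
key = {"SSN", "PNUMBER"}
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

for lhs, attr in partial_dependencies(key, fds):
    print(lhs, "->", attr)
# ['SSN'] -> ENAME
# ['PNUMBER'] -> PLOCATION
# ['PNUMBER'] -> PNAME
```

Each group of partial dependencies suggests one relation of the 2NF decomposition: (SSN, ENAME), (PNUMBER, PNAME, PLOCATION), and (SSN, PNUMBER, HOURS) for the attribute that depends on the full key.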
Exercise: Is EMP_PROJ in 2NF? If not, show the 2NF decomposition of EMP_PROJ.
Third Normal Form
• A relation R is said to be in Third Normal Form (3NF) if and only if
  - It is in 2NF and
  - No nonkey attribute is transitively dependent on the primary key.
• Transitive functional dependency: Assume X, Y and Z are three attributes. If X → Y and Y → Z, then X → Z.

Example: Let R = {a, b, c} with PK = {a}, where b → c holds.
R is decomposed into R1 and R2:
R1 = {b, c}, PK = {b}
R2 = {a, b}, PK = {a}, FK = {b}
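The R = {a, b, c} example can be tried out on a small instance. The three tuples below are hypothetical, chosen so that a → b and b → c hold; the sketch projects R onto R1 and R2 and checks that joining them on b gives back exactly the original tuples.

```python
# Hypothetical instance of R(a, b, c) in which a -> b and b -> c hold.
r = {(1, "x", 10), (2, "x", 10), (3, "y", 20)}

# R1(b, c) stores each b -> c fact once; R2(a, b) keeps b as a foreign key.
r1 = {(b, c) for (_, b, c) in r}
r2 = {(a, b) for (a, b, _) in r}

# Natural join of R2 and R1 on b.
rejoined = {(a, b, c) for (a, b) in r2 for (b1, c) in r1 if b == b1}

print(rejoined == r)  # True: the decomposition is lossless for this instance
print(sorted(r1))     # [('x', 10), ('y', 20)]
```

The value of c for b = "x" is stored once in R1 rather than once per matching tuple of R, which removes the redundancy caused by the transitive dependency.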
Normalization
Example Problem for Practice
Consider the pet health history report table given below and normalize it to 3NF