
UNIT3 Functional Dependency and Normalization

Uploaded by Siddharth Patil
Copyright © All Rights Reserved

Unit 3

Functional Dependency and Normalization
Unit 3.1

Functional Dependency
Introduction
• Integrity constraints ensure that changes made to the database by authorized users do not result in a loss of data consistency.
• Integrity constraints guard against accidental damage to the database.
• We have already seen two forms of integrity constraints for the E-R model:
• Key declarations: certain attributes form a candidate key for a given entity set.
• Form of a relationship: many-to-many, one-to-many, one-to-one.
• Here we consider constraints that can be tested with minimal overhead, and the design theory built on them:
• Domain constraints
• Referential Integrity
• Functional Dependency
• Normalization
Domain Constraints
• A domain of possible values must be associated with every attribute.
• Standard domain types, such as the integer, character, and date/time types, are defined in SQL.
• Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take.
• Domain constraints are the most elementary form of integrity constraint.
• They are tested easily by the system whenever a new data item is entered into the database.
• Several attributes may have the same domain.
• For example, the attributes customer-name and employee-name might have the same domain, i.e. the set of all person names.
• It is less clear whether customer-name and branch-name should have the same domain.
• At the implementation level, both customer names and branch names are character strings.
• However, at the conceptual rather than the physical level, customer-name and branch-name should have distinct domains.
• Equivalent statement in Oracle:
CREATE TABLE gender_domain
  (gender VARCHAR2(1) PRIMARY KEY,
   CONSTRAINT ch_gen CHECK (gender IN ('M', 'F')));
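A domain constraint like this can be tried out without an Oracle instance. The following minimal sketch uses Python's standard sqlite3 module as a stand-in for Oracle. SQLite has no VARCHAR2, so TEXT is used instead. It puts the same CHECK clause on a column of a hypothetical person table and shows that a value outside the domain is rejected when it is inserted.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle; TEXT replaces VARCHAR2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " name TEXT NOT NULL,"
    " gender TEXT CHECK (gender IN ('M', 'F'))"
    ")"
)

def try_insert(name, gender):
    """Insert a row; return False if the CHECK (domain) constraint rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (name, gender))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Adams", "M"))  # True
print(try_insert("Jones", "X"))  # False: 'X' is outside the domain {'M', 'F'}
```

The check runs on every insert, which is why domain constraints are cheap for the system to enforce.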
Referential Integrity

• Often we wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.
• This condition is called referential integrity.
Referential Integrity in SQL
• Foreign keys can be specified as part of the SQL
create table statement by using the foreign key
clause
create table account
(...
foreign key (branch-name) references branch(name)
on delete cascade
on update cascade,
...)
• Example (Oracle syntax):
create table suppliers1 (
  id number,
  name varchar2(50),
  CONSTRAINT FK_SP1 foreign key (id) references consumers(p_id) on delete cascade
);
• On delete cascade:
• If a delete of a tuple in branch results in this referential-integrity constraint being violated, the system does not reject the delete.
• Instead, the delete “cascades” to the account relation, deleting the tuple that refers to the branch that was deleted.

• Similarly, with on update cascade, the system does not reject an update to the field referenced by the constraint;
• instead, the system updates the field branch-name in the referencing tuples in account to the new value as well.
• SQL also permits actions other than cascade when the constraint is violated:
• The referencing field (branch-name) can be set to null by using set null in place of cascade.

• Example of on delete set null (Oracle syntax):
create table suppliers2 (
  id number,
  name varchar2(50),
  CONSTRAINT FK_SP2 foreign key (id) references consumers(p_id) on delete set null
);
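Both referential actions can be observed with a small experiment. This minimal sketch again uses Python's sqlite3 module as a stand-in for Oracle. The consumers table contents and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.executescript("""
CREATE TABLE consumers (p_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE suppliers1 (
    id INTEGER, name TEXT,
    CONSTRAINT fk_sp1 FOREIGN KEY (id) REFERENCES consumers (p_id) ON DELETE CASCADE
);
CREATE TABLE suppliers2 (
    id INTEGER, name TEXT,
    CONSTRAINT fk_sp2 FOREIGN KEY (id) REFERENCES consumers (p_id) ON DELETE SET NULL
);
INSERT INTO consumers VALUES (1, 'c1'), (2, 'c2');
INSERT INTO suppliers1 VALUES (1, 's1a'), (2, 's1b');
INSERT INTO suppliers2 VALUES (1, 's2a'), (2, 's2b');
""")

# The delete is not rejected: it cascades in suppliers1 and sets NULL in suppliers2.
conn.execute("DELETE FROM consumers WHERE p_id = 1")
print(conn.execute("SELECT * FROM suppliers1").fetchall())                # [(2, 's1b')]
print(conn.execute("SELECT * FROM suppliers2 ORDER BY name").fetchall())  # [(None, 's2a'), (2, 's2b')]
```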
Database Modification
• Database modifications can cause violations of referential integrity.
Pitfalls in Relational-Database Design
• Repetition of information
• Complications when updating data
• Inability to represent certain information

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

For a tuple t of a relation on Lending-schema:
• t[assets] is the asset figure for the branch named t[branch-name].
• t[branch-city] is the city in which the branch named t[branch-name] is located.
• t[loan-number] is the number assigned to a loan made by the branch named t[branch-name] to the customer named t[customer-name].
• t[amount] is the amount of the loan whose number is t[loan-number].


Pitfalls in Relational-Database Design
• Suppose we add a new loan to our database:
• the loan is made by the Perryridge branch to Adams in the amount of $1500, with loan-number L-31.
• We must repeat the asset and city data for the Perryridge branch:
• (Perryridge, Horseneck, 1700000, Adams, L-31, 1500)
• Now suppose the assets of the Downtown branch change from 17000 to 19000.
• We would expect only one tuple to need changing.
• In the Lending-schema design, however, every tuple for a Downtown loan must be changed, which is costly and risks leaving the data inconsistent.
Pitfalls in Relational-Database Design

• Another problem with the Lending-schema design is that we cannot directly represent the information concerning a branch (branch-name, branch-city, assets) unless there exists at least one loan at the branch.
Functional Dependencies
• The goal of relational-database design is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows us to retrieve information easily.
• One approach is to design schemas that are in an appropriate normal form.
• Functional dependencies play a key role in
differentiating good database designs from bad
database designs.
• A functional dependency is a type of constraint that is a generalization of the notion of a key.
• Functional dependencies are constraints on the set of legal relations.
• Definition:
• A functional dependency occurs when one attribute
in a relation uniquely determines another attribute.
• This can be written A -> B
• which would be the same as stating "B is functionally
dependent upon A."
• Let R be a relation, and
• let X and Y be arbitrary subsets of the set of attributes of R.
• Then we say that Y is functionally dependent on X, in symbols
• X → Y
• (read "X functionally determines Y"),
• if and only if each X value in R has associated with it precisely one Y value in R.
• In other words,
• whenever two tuples of R agree on their X value, they also agree on their Y value.
• In a table listing employee characteristics including
Social Security Number (SSN) and name,
• it can be said that name is functionally dependent
upon SSN (or SSN -> name)
• because an employee's name can be uniquely
determined from their SSN.
• However, the reverse statement (name -> SSN) is
not true because more than one employee can have
the same name but different SSNs.
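The "whenever two tuples agree on X, they agree on Y" condition can be checked mechanically against a relation instance. In the following sketch, the function name satisfies and the sample rows are hypothetical. For each X value it remembers the first Y value seen and reports a violation if a different Y value appears. An instance can only show that an FD is violated. Whether X → Y holds on the schema is a statement about all legal relations.

```python
def satisfies(rows, X, Y):
    """True if no two tuples in `rows` agree on X but differ on Y."""
    seen = {}  # X value -> the Y value first observed with it
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical employee rows: two different employees share the name Smith.
employees = [
    {"SSN": "111-11-1111", "name": "Smith"},
    {"SSN": "222-22-2222", "name": "Jones"},
    {"SSN": "333-33-3333", "name": "Smith"},
]
print(satisfies(employees, ["SSN"], ["name"]))  # True
print(satisfies(employees, ["name"], ["SSN"]))  # False
```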
Basic Concepts
• Consider a relation schema R, and
• let α ⊆ R and β ⊆ R.
• The functional dependency α → β holds on schema R if,
• in any legal relation r(R),
• for all pairs of tuples t1 and t2 in r such that
• t1[α] = t2[α], it is also the case that
• t1[β] = t2[β].

• Each value of the left-hand side attribute(s) of a dependency is associated with exactly one value of the right-hand side, and this must hold for all time (in every legal state of the relation).
• X → Y means
• given any two tuples in r, if the X values are the same,
• then the Y values must also be the same
• (but not vice versa).
• Read “→” as “determines”.
• If K → all attributes of R, then K is a superkey for R.

• FDs are a generalization of keys.


Prof. V. V. Kheradkar
• Loan-info-schema = (loan-number, branch-name,
customer-name, amount)
• The set of functional dependencies that we expect to
hold on this relation schema is
• loan-number →amount
• loan-number →branch-name
• We would not, however, expect the functional
dependency
• loan-number →customer-name
• in general, a given loan can be made to more than
one customer (for example, to both members of a
husband–wife pair).
• We shall use functional dependencies in two ways:
• 1. To test relations to see whether they are legal under a given set of functional dependencies.
• If a relation r is legal under a set F of functional
dependencies, we say that r satisfies F.
• 2. To specify constraints on the set of legal relations.
• Here we are concerned with only those relations that satisfy a given set of functional dependencies.
• If we wish to constrain ourselves to relations on
schema R that satisfy a set F of functional
dependencies,
• we say that F holds on R.
• Some functional dependencies are said to be trivial
• because they are satisfied by all relations.
• For example, A → A is satisfied by all relations
involving attribute A.
• for all tuples t1 and t2 such that t1[A] = t2[A],
• it is the case that t1[A] = t2[A].
• Similarly, AB → A is satisfied by all relations involving
attribute A.
• In general, a functional dependency of the form α →
β is trivial if β ⊆ α.
• An FD is trivial if and only if
• the right hand side is a subset of the left hand side.
• e.g. <S#, P#> → <S#>. (Trivial)
• Nontrivial dependencies are those that are not trivial.

• When we design a relational database, we first list those functional dependencies that must always hold.
• In the banking example, list of dependencies
includes the following:
• Branch-schema = (branch-name, branch-city, assets)
• Customer-schema = (customer-name, customer-
street, customer-city)
• Loan-schema = (loan-number, branch-name,
amount)
• Borrower-schema = (customer-name, loan-number)
• Account-schema = (account-number, branch-name,
balance)
• Depositor-schema = (customer-name, account-
number)
• On Branch-schema:
• branch-name →branch-city
• branch-name →assets

• On Customer-schema:
• customer-name→ customer-city
• customer-name→ customer-street

• On Loan-schema:
• loan-number → amount
• loan-number → branch-name
• On Borrower-schema:
• No functional dependencies

• On Account-schema:
• account-number → branch-name
• account-number → balance

• On Depositor-schema:
• No functional dependencies
Types of functional dependencies:
• Full Functional dependency
• Partial Functional dependency
• Transitive dependency
Closure of a Set of Functional Dependencies
• The set of all FDs that are implied by a given set S of
FDs
• is called the closure of S, denoted by S+
• It is not sufficient to consider the given set of functional dependencies.
• We need to consider all functional dependencies that hold, including those logically implied by the given set.
• Given a relation schema R = (A, B, C, G, H, I)
• and the set of functional dependencies
• A→B
• A→C
• CG→ H
• CG→ I
• B→H
• The functional dependency A→ H is logically implied.
• Suppose
• t1 and t2 are tuples such that
• t1[A] = t2[A]
• Since we are given that A→B,
• it follows from the definition of functional dependency
that
• t1[B] = t2[B]
• Then, since we are given that B → H,
• it follows from the definition of functional dependency
that
• t1[H] = t2[H]
• Let F be a set of functional dependencies.
• The closure of F, denoted by F+,
• is the set of all functional dependencies logically
implied by F.
• Given F, we can compute F+ directly from the formal
definition of functional dependency.
• If F is large, this process would be lengthy and
difficult.
Armstrong’s Axioms (Rules of inference)
• Axioms, or rules of inference, provide a simpler
technique for reasoning about functional
dependencies.

• We can use the following three rules to find logically implied functional dependencies.
• By applying these rules repeatedly, we can find all of F+, given F.
• This collection of rules is called Armstrong’s axioms, in honour of the person who first proposed it.
1. Reflexivity rule.
• If α is a set of attributes and β ⊆ α, then α →β holds

2. Augmentation rule.
• If α → β holds and γ is a set of attributes, then γα →
γβ holds.

3. Transitivity rule.
• If α →β holds and β → γ holds, then α → γ holds.
• Armstrong’s axioms are sound,
• because they do not generate any incorrect
functional dependencies.
• They are complete, because, for a given set F of
functional dependencies, they allow us to generate
all F+.
• Although Armstrong’s axioms are complete,
• it is tiresome to use them directly for the computation
of F+.
• To simplify matters further, we list additional rules.
• Union rule.
• If α → β holds and α → γ holds, then α →βγ holds.

• Decomposition rule.
• If α →βγ holds, then α → β holds and α →γ holds.

• Pseudotransitivity rule.
• If α→β holds and γβ →δ holds, then αγ →δ holds.
1. Reflexivity: if B is a subset of A, then A → B.
2. Augmentation: if A → B, then AC → BC.
3. Transitivity: if A → B and B → C, then A → C.
4. Self-determination: A → A.
5. Decomposition: if A → BC, then A → B and A → C.
6. Union: if A → B and A → C, then A → BC.
7. Composition: if A → B and C → D, then AC → BD.
8. If A → B and C → D, then A ∪ (C − B) → BD.
• Let us apply our rules to the example of schema
• R = (A, B, C, G, H, I) and
• the set F of functional dependencies
• {A → B, A → C, CG → H, CG → I, B → H}.
• We list several members of F+ here:

• A → H. Since A → B and B → H hold, we apply the transitivity rule.
• CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.
• AG → I. Since A → C and CG → I, the
pseudotransitivity rule implies that AG → I holds.
• Another way of finding that AG → I holds is as
follows. We use the augmentation rule on A → C to
infer AG → CG.
• Applying the transitivity rule to this dependency and
CG → I, we infer AG → I.
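The derivations above can be replayed by treating each rule as a function on FDs. In this sketch the names fd, augmentation, transitivity, union and show are hypothetical helpers. An FD is modelled as a pair of frozensets. Transitivity is applied only when the middle sets match exactly. A fuller derivation engine would also use reflexivity and decomposition to make them match.

```python
def fd(lhs, rhs):
    """An FD alpha -> beta as a pair (alpha, beta) of frozensets of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def augmentation(f, gamma):
    """If alpha -> beta holds, then gamma alpha -> gamma beta holds."""
    alpha, beta = f
    g = frozenset(gamma)
    return (alpha | g, beta | g)

def transitivity(f1, f2):
    """If alpha -> beta and beta -> gamma hold, then alpha -> gamma holds."""
    (alpha, beta), (beta2, gamma) = f1, f2
    if beta != beta2:
        raise ValueError("right side of f1 must equal left side of f2")
    return (alpha, gamma)

def union(f1, f2):
    """If alpha -> beta and alpha -> gamma hold, then alpha -> beta gamma holds."""
    (alpha, beta), (alpha2, gamma) = f1, f2
    if alpha != alpha2:
        raise ValueError("both FDs must have the same left side")
    return (alpha, beta | gamma)

def show(f):
    return "".join(sorted(f[0])) + " -> " + "".join(sorted(f[1]))

# AG -> I as in the text: augment A -> C with G, then use transitivity with CG -> I.
AG_CG = augmentation(fd("A", "C"), "G")    # AG -> CG
AG_I = transitivity(AG_CG, fd("CG", "I"))  # AG -> I
print(show(AG_I))                                      # AG -> I
print(show(union(fd("CG", "H"), fd("CG", "I"))))       # CG -> HI
print(show(transitivity(fd("A", "B"), fd("B", "H"))))  # A -> H
```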
Closure of Attribute Sets
• To test whether a set α is a superkey,
• we must devise an algorithm for computing the set of
attributes functionally determined by α.
• One way of doing this is to compute F+,
• take all functional dependencies with α as the left-
hand side, and take the union of the right-hand sides
of all such dependencies.
• However, doing so can be expensive, since F+ can
be large.
Closure of Attribute Sets
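• Algorithm to compute α+, the closure of α under F:
• result := α
• repeat: for each functional dependency β → γ in F, if β ⊆ result, then result := result ∪ γ
• until result does not change
• Example: compute (AG)+ with the functional dependencies {A → B, A → C, CG → H, CG → I, B → H}.
• We start with result = AG.
• A → B causes us to include B in result:
• we observe that A → B is in F and A ⊆ result (which is AG), so result := result ∪ B.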
• A→ C causes result to become ABCG.
• CG→H causes result to become ABCGH.
• CG→I causes result to become ABCGHI.
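The closure algorithm translates almost line for line into Python. In this sketch, FDs are (lhs, rhs) pairs of one-letter attribute strings for brevity, which is an assumption of the example. The dependencies are scanned in the same order as in the trace above. The helper is_superkey applies the rule that K is a superkey when K+ contains every attribute of R.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds, repeating until result stops changing."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # beta -> gamma with beta ⊆ result: add gamma to result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey of schema iff attrs+ contains every attribute."""
    return closure(attrs, fds) >= set(schema)

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
print(is_superkey("AG", "ABCGHI", F))     # True
print(is_superkey("A", "ABCGHI", F))      # False: A+ = ABCH
```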
Canonical Cover
• Suppose that we have a set of functional
dependencies F on a relation schema.
• Whenever a user performs an update on the relation,
• the database system must ensure that the update
does not violate any functional dependencies,
• that is, all the functional dependencies in F are
satisfied in the new database state.
• The system must roll back the update if it violates
any functional dependencies in the set F.
• We can reduce the effort spent in checking for
violations by testing a simplified set of functional
dependencies that has the same closure as the given
set.
• Any database that satisfies the simplified set of
functional dependencies will also satisfy the original
set, and vice versa,
• since the two sets have the same closure.
• However, the simplified set is easier to test.
• An attribute of a functional dependency is said to be
extraneous
• if we can remove it without changing the closure of
the set of functional dependencies.
• A formal test for extraneous attributes is given under "Testing if an Attribute is Extraneous".
• For example, suppose we have the functional dependencies AB → C and A → C in F.
• Then B is extraneous in AB → C, since A → C already holds.
• A canonical cover Fc for F is a set of dependencies
such that F logically implies all dependencies in Fc,
• and Fc logically implies all dependencies in F.
• Furthermore, Fc must have the following properties:
• No functional dependency in Fc contains an
extraneous attribute.
• Each left side of a functional dependency in Fc is
unique.
• That is, there are no two dependencies α1 → β1 and
α2 → β2 in Fc such that α1 = α2.
• Consider the following set F of functional
dependencies on schema (A,B,C):
• A→BC
• B→C
• A→B
• AB → C

• Let us compute the canonical cover for F.


• There are two functional dependencies with the
same set of attributes on the left side of the arrow:
• A→BC
• A→B
• We combine these functional dependencies into A→
BC.

• A is extraneous in AB → C because F logically implies (F − {AB → C}) ∪ {B → C}.
• This assertion is true because B → C is already in
our set of functional dependencies.
• C is extraneous in A → BC, since A→ BC is logically
implied by A → B and B →C.

• Thus, our canonical cover is


• A→B
• B→C

• A canonical cover might not be unique.
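The hand computation can be automated. This sketch uses hypothetical function names. It alternates the union rule with the removal of one extraneous attribute at a time and stops when nothing changes. It may remove attributes in a different order than the steps above, but on this F it reaches the same cover. On other inputs, a different removal order can yield a different, equally valid canonical cover.

```python
def closure(attrs, fds):
    """attrs+ under fds; each FD is a pair (lhs, rhs) of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def merge_same_lhs(fds):
    """Union rule: replace alpha -> beta1 and alpha -> beta2 by alpha -> beta1 beta2."""
    merged = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return [(lhs, rhs) for lhs, rhs in merged.items() if rhs]  # drop empty right sides

def drop_one_extraneous(fds):
    """Remove one extraneous attribute and return the new list, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fds):
        others = fds[:i] + fds[i + 1:]
        for a in sorted(lhs):
            # a is extraneous in lhs if (lhs - a)+ under F still contains rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fds):
                return others + [(lhs - {a}, rhs)]
        for a in sorted(rhs):
            # a is extraneous in rhs if a is in lhs+ under F' = (F - {lhs -> rhs}) ∪ {lhs -> rhs - a}.
            f_prime = others + [(lhs, rhs - {a})]
            if a in closure(lhs, f_prime):
                return f_prime
    return None

def canonical_cover(fds):
    fds = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        fds = merge_same_lhs(fds)
        reduced = drop_one_extraneous(fds)
        if reduced is None:
            return fds
        fds = reduced

def show(fds):
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds)

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(show(canonical_cover(F)))  # ['A->B', 'B->C']
```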


Testing if an Attribute is Extraneous
Consider a set F of functional dependencies and the functional dependency α → β in F.
To test if attribute A ∈ α is extraneous in α:
1. compute ({α} − A)+ using the dependencies in F
2. check that ({α} − A)+ contains β; if it does, A is extraneous in α
To test if attribute A ∈ β is extraneous in β:
1. compute α+ using only the dependencies in
F′ = (F − {α → β}) ∪ {α → (β − A)},
2. check that α+ contains A; if it does, A is extraneous in β
Computing a Canonical Cover
1. Let R = (A, B, C) and F = {AB → C, A → C}. Find the canonical cover.

Check whether A is extraneous in AB → C:
compute (AB − A)+ under F; if it contains C, A is extraneous.
Check whether B is extraneous in AB → C:
compute (AB − B)+ under F; if it contains C, B is extraneous.

2. Let R = (A, B, C, D, E) and F = {AB → CD, A → E, E → C}. Find the canonical cover.

Check whether A is extraneous in AB → CD.
Check whether B is extraneous in AB → CD.
Decomposition
• Lending-schema = (branch-name, branch-city,
assets, customer-name, loan-number, amount)
• The set F of functional dependencies that we require to hold on Lending-schema is
• branch-name → branch-city, assets
• loan-number → amount, branch-name
• Lending-schema is an example of a bad database
design.
• Assume that we decompose it into the following three relations:
• Branch-schema = (branch-name, branch-city, assets)
• Loan-schema = (loan-number, branch-name, amount)
• Borrower-schema = (customer-name, loan-number)
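A quick way to see that this decomposition loses nothing on sample data is to project a Lending relation onto the three schemas and natural-join the pieces back. The first row below is the Perryridge loan from the text. The other two rows are hypothetical. The helper names are also hypothetical. Agreement on one sample does not by itself prove that the decomposition is lossless for every legal relation.

```python
def project(rows, attrs):
    """Projection onto attrs; each tuple is a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(r1, r2):
    """Natural join of two projected relations on their common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

cols = ["branch-name", "branch-city", "assets", "customer-name", "loan-number", "amount"]
lending = [dict(zip(cols, v)) for v in [
    ("Perryridge", "Horseneck", 1700000, "Adams", "L-31", 1500),  # tuple from the text
    ("Perryridge", "Horseneck", 1700000, "Smith", "L-16", 1300),  # hypothetical
    ("Downtown", "Brooklyn", 9000000, "Jones", "L-17", 1000),     # hypothetical
]]

branch = project(lending, ["branch-name", "branch-city", "assets"])
loan = project(lending, ["loan-number", "branch-name", "amount"])
borrower = project(lending, ["customer-name", "loan-number"])

rejoined = natural_join(natural_join(loan, borrower), branch)
print(len(branch))                                              # 2: Perryridge's data stored once
print(rejoined == {tuple(sorted(r.items())) for r in lending})  # True
```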
Unit 3.2

Normalization
Why Database Normalization Matters

Database normalization is a crucial process in designing efficient, organized databases.

We'll explore the primary purpose behind normalization.
Data Anomalies

Without normalization, databases can suffer from data anomalies.

These anomalies include redundancy, inconsistency, and other issues that hinder data accuracy.
The Core Goal of Normalization

At its core, normalization seeks to eliminate unnecessary data redundancy and ensure data integrity.

This process prevents the same data from being stored in multiple places, reducing inconsistencies.
Ensuring Data Accuracy

By eliminating redundancy and organizing data efficiently, normalization promotes data accuracy.

Because each fact is stored in only one place, a change made there is reflected consistently throughout the database.
The Relevance of Normalization

Normalization might seem complex, but its relevance lies in maintaining well-structured databases.

Its primary purpose is to ensure that data remains accurate and consistent, making the database a valuable resource.
The Foundation of Database Normalization

First Normal Form (1NF) is the initial step in database normalization.

It ensures that each column contains atomic (indivisible) values.
The Problem

Before 1NF, repeating groups of values can lead to redundancy and inconsistencies.

Consider a table with repeating groups (several values packed into one field); this is where 1NF comes into play.
Definition of 1NF

What Is First Normal Form (1NF)?

1NF requires that a table have no repeating groups or arrays.

Each column should store only individual, atomic values.
1NF in Equation

A table is in 1NF if every attribute, key or non-key, contains only atomic values.

Equation: R(A1, A2, ..., An) is in 1NF if, for each attribute Ai, the values of Ai are atomic.
Breaking It Down
Understanding Atomic Values

Atomic values cannot be divided further.

For example, a "Name" column should not contain both first and last names in a single cell.
1NF in Action

Let's consider a table to illustrate 1NF.

Table: “Students”

Columns: "StudentID", "Name", "Courses"


Before 1NF Transformation

Restructuring for 1NF
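The Students tables themselves are not reproduced here. The following sketch therefore uses hypothetical rows in which Courses holds a list, which is a repeating group. The restructuring produces one row per (student, course) pair, so every field holds a single atomic value. The function name to_1nf is also hypothetical.

```python
# Hypothetical "Students" rows in which Courses holds a list (a repeating group).
students = [
    {"StudentID": 1, "Name": "Asha", "Courses": ["DBMS", "OS"]},
    {"StudentID": 2, "Name": "Ravi", "Courses": ["DBMS"]},
]

def to_1nf(rows):
    """Emit one row per (student, course) pair so every field is atomic."""
    return [
        {"StudentID": r["StudentID"], "Name": r["Name"], "Course": c}
        for r in rows
        for c in r["Courses"]
    ]

for row in to_1nf(students):
    print(row)
# {'StudentID': 1, 'Name': 'Asha', 'Course': 'DBMS'}
# {'StudentID': 1, 'Name': 'Asha', 'Course': 'OS'}
# {'StudentID': 2, 'Name': 'Ravi', 'Course': 'DBMS'}
```

After this step, the key becomes (StudentID, Course), and Name is repeated for each course. The higher normal forms address that redundancy.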


Why 1NF Matters

1NF removes repeating groups, so every field holds a single value that can be queried and constrained directly.

It simplifies database operations and is the foundation for the higher normal forms, which remove redundancy.
Second Normal Form (2NF)
Building on Data Organization

Second Normal Form (2NF) refines data organization in databases.

It goes beyond 1NF to eliminate partial dependencies.
The Problem

Before 2NF, partial dependencies can cause data anomalies.

Consider a table with a composite primary key; this is where 2NF comes into play.
What Is Second Normal Form (2NF)

2NF requires that a table be in 1NF and that all non-key attributes be fully functionally dependent on the entire primary key.
2NF in Equation

A table is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key.

Equation: R(A1, A2, ..., An) is in 2NF if, for each non-key attribute Ai, Ai is fully functionally dependent on the entire primary key.

Let R be a relation (table) with attribute sets X and Y.

Y is fully functionally dependent on X if X → Y holds and Y is not functionally dependent on any proper subset of X.
Breaking It Down

Understanding Full Functional Dependencies:

A non-key attribute should depend on the entire primary key.

Partial dependencies, in which a non-key attribute depends on only part of a composite key, are not allowed.

2NF in Action

Let's consider a table to illustrate 2NF.

Table: “Orders”

Columns: "OrderID", "ProductID", "Product Name", "CustomerID"
Before 2NF

After 2NF
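Partial dependencies can be found with attribute closure. This sketch checks every proper subset of the key and reports the non-key attributes that the subset already determines. The Orders tables are not reproduced here, so the key (OrderID, ProductID) and the FDs ProductID → Product Name and OrderID → CustomerID are assumptions. The function names are hypothetical. The sketch also assumes the table has a single candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """attrs+ under fds; each FD is a pair (lhs, rhs) of sets of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attrs, fds):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    non_key = set(attrs) - set(key)
    found = {}
    for k in range(1, len(key)):
        for subset in combinations(sorted(key), k):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = sorted(determined)
    return found

attrs = ["OrderID", "ProductID", "ProductName", "CustomerID"]
key = ["OrderID", "ProductID"]
fds = [({"ProductID"}, {"ProductName"}),  # assumed FD
       ({"OrderID"}, {"CustomerID"})]     # assumed FD
print(partial_dependencies(key, attrs, fds))
# {('OrderID',): ['CustomerID'], ('ProductID',): ['ProductName']}
```

Under these assumptions, one 2NF decomposition is (OrderID, ProductID), (ProductID, Product Name), and (OrderID, CustomerID).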
Why 2NF Matters

2NF reduces data redundancy and eliminates partial dependencies.

It ensures efficient database management and maintains data consistency.

2NF builds upon 1NF to refine data organization and promote data integrity.
Exploring Third Normal Form (3NF)

Definition: 3NF is a level of database normalization that reduces data redundancy and ensures data integrity.

It plays a critical role in structuring databases efficiently and maintaining accurate data.
The Problem with Data Redundancy

Data Redundancy: the same data is stored in multiple places, increasing the risk of inconsistencies.

Challenges: difficulty in updating data consistently and efficiently.
The Transitive Dependency Issue

Transitive Dependency: a non-key attribute depends on another non-key attribute, which in turn depends on the key (key → B → C).

Impact: it can lead to inefficient storage and potential data anomalies.
3NF

A relation (table) is in 3NF if it is in 2NF and every non-prime attribute (an attribute that is not part of any candidate key) is non-transitively dependent on every candidate key.

In simpler terms, 3NF ensures that non-key attributes are functionally dependent only on the key and not on other non-key attributes within the same table. This reduces data anomalies and improves data integrity.
TEACHER_DETAILS table
Why 3NF Matters
The Third Normal Form is usually considered an adequate target for database design, as tables in 3NF avoid most insert, update, and delete anomalies.

The Third Normal Form removes redundancy effectively, so the data stays consistent and data integrity is maintained.

As redundancy is reduced, the database becomes smaller and duplication of data is reduced, which can also improve performance.
BCNF
BCNF (Boyce-Codd Normal Form) is a stricter version of the third normal form (3NF), and it is often also known as 3.5 Normal Form.

3NF does not remove all redundancy: it still allows a functional dependency (say, A->B) in which A is not a superkey of the table, as long as B is a prime attribute. BCNF was introduced to deal with such situations.
Rules for BCNF
It should satisfy all the conditions of the Third Normal Form (3NF).

For every nontrivial functional dependency (A->B), A should be a superkey (a candidate key is a minimal superkey).

In simple words, the determinant A must be a superkey; unlike 3NF, BCNF makes no exception when B is a prime attribute.
Example
In this example, we have to find the highest normal form. We are given a relation R(A, B, C, D, E) with functional dependencies { BC->D, AC->BE, B->E }.

As we can see, (AC)+ = {A, C, B, E, D}, and none of its subsets can determine all the attributes of the relation.

Note also that neither A nor C can be derived from any other attribute of the relation; therefore there is only one candidate key, {AC}.
Example

Prime attributes are those that are part of some candidate key; for this relation R,

the prime attributes are {A, C}, while the non-prime attributes are {B, E, D}.
Example 2
In this example, we again have to find the highest normal form. We are given a relation R(A, B, C) with functional dependencies {AB ->C, C ->B, AB ->B} and the candidate key {AB}.

Because C ->B, (AC)+ = {A, B, C} as well, so {AC} is also a candidate key. Hence all three attributes A, B, and C are prime, and there are no non-prime attributes.

For this example, let us start from the most restrictive normal form and check for BCNF first.
Example 2
{AB->C} and {AB->B} satisfy BCNF because the candidate key AB is the left-hand side of both dependencies ({AB->B} is also trivial).

The second dependency, {C->B}, however, violates BCNF because C is not a superkey.

C->B does, however, satisfy 3NF because B is a prime attribute. Hence the highest normal form of relation R is 3NF.
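The checks used in both examples can be combined into a small classifier. This is a brute-force sketch for small schemas, and its function names are hypothetical. It finds all candidate keys and then tests BCNF, 3NF, and 2NF in that order. It examines only the given FDs, with right-hand sides split into single attributes and trivial parts ignored. For deciding BCNF and 3NF of a single schema, this is a standard shortcut that is sufficient. Atomic values (1NF) are assumed.

```python
from itertools import combinations

def closure(attrs, fds):
    """attrs+ under fds; FDs are (lhs, rhs) strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for k in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), k):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(key <= s for key in keys):
                keys.append(s)
    return keys

def highest_normal_form(schema, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF'."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)

    def is_superkey(lhs):
        return closure(lhs, fds) >= set(schema)

    singles = [(lhs, a) for lhs, rhs in fds for a in rhs if a not in lhs]
    if all(is_superkey(lhs) for lhs, _ in singles):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in singles):
        return "3NF"
    # 2NF fails if a proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for k in range(1, len(key)):
            for part in combinations(sorted(key), k):
                if closure(part, fds) - prime:
                    return "1NF"
    return "2NF"

ex1 = [("BC", "D"), ("AC", "BE"), ("B", "E")]
ex2 = [("AB", "C"), ("C", "B"), ("AB", "B")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", ex1)])  # ['AC']
print(highest_normal_form("ABCDE", ex1))                           # 2NF
print(["".join(sorted(k)) for k in candidate_keys("ABC", ex2)])    # ['AB', 'AC']
print(highest_normal_form("ABC", ex2))                             # 3NF
```

For the first example, BC->D has a non-superkey on the left and a non-prime attribute on the right, so 3NF fails. No part of the key AC alone determines anything, so the relation is still in 2NF.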
Example 2
Suppose each professor teaches only one subject, but one subject may be taught by multiple professors.

This shows that there is a dependency between subject and professor: the subject is always determined by the professor (professor -> subject).

Here the professor column is a non-prime attribute, while subject is a prime attribute. This satisfies 3NF but is not allowed in BCNF: for BCNF, the determining attribute (professor here) must be a superkey.
How to satisfy BCNF?
We will decompose the table into two tables, the Student table and the Professor table, to satisfy the conditions of BCNF.
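The split can be produced by the usual BCNF decomposition procedure. While some schema has a nontrivial α → β with α not a superkey, that schema is replaced by (R − β) and (α ∪ β). The table is not reproduced here, so this sketch makes assumptions about it. It assumes the columns student, subject, and professor. It also assumes {student, subject} → professor in addition to professor → subject. Violations are found by brute force over subsets, which only suits small schemas. The function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """attrs+ under fds; each FD is a pair (lhs, rhs) of sets of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(rel, fds):
    """Return (alpha, beta): nontrivial alpha -> beta on rel with alpha not a superkey, else None."""
    for k in range(1, len(rel)):
        for alpha in combinations(sorted(rel), k):
            alpha_plus = closure(alpha, fds)
            beta = (alpha_plus & rel) - set(alpha)
            if beta and not alpha_plus >= rel:
                return set(alpha), beta
    return None

def bcnf_decompose(schema, fds):
    """Replace a violating schema R by (R - beta) and (alpha ∪ beta) until none remains."""
    result = [set(schema)]
    i = 0
    while i < len(result):
        violation = bcnf_violation(result[i], fds)
        if violation is None:
            i += 1
            continue
        alpha, beta = violation
        rel = result.pop(i)
        result += [rel - beta, alpha | beta]
    return result

schema = {"student", "subject", "professor"}
fds = [({"student", "subject"}, {"professor"}),  # assumed FD
       ({"professor"}, {"subject"})]             # each professor teaches one subject
print(sorted(sorted(r) for r in bcnf_decompose(schema, fds)))
# [['professor', 'student'], ['professor', 'subject']]
```

The result is not dependency preserving. The dependency {student, subject} → professor can no longer be checked within a single table, which is a known cost of reaching BCNF.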
4NF
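A relation is in 4NF if it is in Boyce-Codd Normal Form and

has no nontrivial multi-valued dependency whose left-hand side is not a superkey.

What is a Multi-valued Dependency?

A multi-valued dependency X →→ Y holds when each X value is associated with a set of Y values that is independent of the values of the other attributes. Let's look at an example to understand multi-valued dependency.
4NF

As you can see in the above table, Employee E901 is interested in two departments, HR and Sales, and has two hobbies, Badminton and Reading.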
4NF

This will result in multiple records for E901, as shown in the table.
4NF
In the above table, you can see that for Employee E901 multiple records exist in the DEPARTMENT and HOBBY attributes.

Hence the multi-valued dependencies are

EMPLOYEE_ID →→ DEPARTMENT and

EMPLOYEE_ID →→ HOBBY
Also, the DEPARTMENT and HOBBY attributes are independent of each other, which is what leads to the multi-valued dependencies in the above table.
4NF
To satisfy the fourth normal form, we can decompose the relation into two tables: (EMPLOYEE_ID, DEPARTMENT) and (EMPLOYEE_ID, HOBBY).
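Here is a sketch of this decomposition on the E901 data. It assumes the original table stores every department/hobby combination for E901 (four rows), which is what the two multi-valued dependencies require. Each projection holds only two rows, and joining them on EMPLOYEE_ID rebuilds the original four rows exactly.

```python
from itertools import product

# Assumed original table: every department of E901 paired with every hobby.
emp = {("E901", d, h) for d, h in product(["HR", "Sales"], ["Badminton", "Reading"])}

emp_dept = {(e, d) for e, d, _ in emp}   # (EMPLOYEE_ID, DEPARTMENT)
emp_hobby = {(e, h) for e, _, h in emp}  # (EMPLOYEE_ID, HOBBY)

# Natural join of the two 4NF tables on EMPLOYEE_ID.
rejoined = {(e, d, h) for e, d in emp_dept for e2, h in emp_hobby if e == e2}

print(len(emp), len(emp_dept), len(emp_hobby))  # 4 2 2
print(rejoined == emp)                          # True
```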
5NF
A relation is in 5NF if it is in 4NF and every nontrivial join dependency in it is implied by its candidate keys; in other words, any lossless decomposition of it must follow its keys.

Informally, 5NF is reached when a table has been decomposed as far as it can be without losing information when the pieces are joined back together.

5NF is also known as Project-join normal form (PJ/NF).


5NF

• In the above table, John takes both Computer and Math class for
Semester 1 but he doesn't take Math class for Semester 2.
• In this case, the combination of all three fields is required to identify a valid row.

5NF
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be taking it, so we leave Lecturer and Subject as NULL.

• However, all three columns together act as the primary key, so we cannot leave the other two columns blank.

• So to bring the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
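This sketch assumes that P1, P2, and P3 are the projections on (Semester, Subject), (Subject, Lecturer), and (Semester, Lecturer). The original table is not reproduced here, so the rows below are hypothetical and chosen to fit the description of John. In this sample, joining any two projections produces a spurious row, such as a Math/John/Semester 2 row. Joining all three recovers the table exactly, which is what a join dependency expresses.

```python
# Hypothetical (Subject, Lecturer, Semester) rows: John has Computer and Math
# in Semester 1 but not Math in Semester 2.
rows = {
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Computer", "John", "Semester 2"),
    ("Computer", "Akash", "Semester 1"),
}

p1 = {(sem, subj) for subj, lec, sem in rows}  # P1(Semester, Subject)
p2 = {(subj, lec) for subj, lec, sem in rows}  # P2(Subject, Lecturer)
p3 = {(sem, lec) for subj, lec, sem in rows}   # P3(Semester, Lecturer)

# Pairwise joins, rebuilt as (Subject, Lecturer, Semester) triples.
j12 = {(subj, lec, sem) for sem, subj in p1 for subj2, lec in p2 if subj == subj2}
j13 = {(subj, lec, sem) for sem, subj in p1 for sem2, lec in p3 if sem == sem2}
j23 = {(subj, lec, sem) for subj, lec in p2 for sem, lec2 in p3 if lec == lec2}
# Three-way join: P1 ⋈ P2, keeping only combinations that also appear in P3.
j123 = {(subj, lec, sem) for subj, lec, sem in j12 if (sem, lec) in p3}

print(len(rows), len(j12), len(j13), len(j23))  # 4 5 5 5
print(("Math", "John", "Semester 2") in j23)    # True: a spurious row
print(j123 == rows)                             # True
```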
