
Module 3

Uploaded by

ajith Pb
Copyright
© © All Rights Reserved

MODULE 3

Relational Database Integrity


Integrity Constraints

o Integrity constraints are a set of rules used to maintain the quality of information.
o Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraint

1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data type of a domain can be string, character, integer, time, date, currency, etc.
The value of the attribute must belong to the corresponding domain.

Example:

2. Entity integrity constraints
o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation,
and if the primary key has a null value, then we can't identify those rows.
o A table can contain null values in fields other than the primary key field.

Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be null
or be available in Table 2.

Example:

4. Key constraints
o A key is an attribute or set of attributes that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, out of which one key will be the primary
key. A primary key must contain unique values and cannot contain a null value in the relational table.

Example:
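As a combined illustration of the four constraint types above, the following is a minimal sketch using Python's built-in sqlite3 module. The college and student tables, their columns, and the permitted age range are assumptions made up for this illustration; they are not taken from the original examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

# Hypothetical schema, chosen only to illustrate each constraint type.
conn.executescript("""
CREATE TABLE college (
    college_id TEXT PRIMARY KEY NOT NULL,              -- key + entity integrity
    name       TEXT NOT NULL
);
CREATE TABLE student (
    student_id TEXT PRIMARY KEY NOT NULL,              -- key + entity integrity
    age        INTEGER CHECK (age BETWEEN 16 AND 100), -- domain constraint
    college_id TEXT REFERENCES college(college_id)     -- referential integrity
);
""")


def rejected(sql, params):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


INSERT_STUDENT = "INSERT INTO student VALUES (?, ?, ?)"
conn.execute("INSERT INTO college VALUES (?, ?)", ("C1", "ABC College"))

results = {
    "valid row": rejected(INSERT_STUDENT, ("S1", 20, "C1")),
    "age outside domain": rejected(INSERT_STUDENT, ("S2", 5, "C1")),
    "null primary key": rejected(INSERT_STUDENT, (None, 20, "C1")),
    "duplicate primary key": rejected(INSERT_STUDENT, ("S1", 21, "C1")),
    "unknown college": rejected(INSERT_STUDENT, ("S3", 20, "C9")),
    "null foreign key": rejected(INSERT_STUDENT, ("S4", 20, None)),
}
for case, was_rejected in results.items():
    print(f"{case}: {'rejected' if was_rejected else 'accepted'}")
```

Only the first and last rows are accepted. The last one shows that a foreign key may be null, as stated in the referential integrity rule.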

The Problem of redundancy in Database

Redundancy means having multiple copies of the same data in the database. This problem arises
when a database is not normalized. Suppose a table of student details has the attributes: student
Id, student name, college name, college rank, course opted.

In such a table, the values of the attributes college name, college rank and course are
repeated for every student of the same college, which can lead to problems. Problems caused by redundancy are: Insertion
anomaly, Deletion anomaly, and Updation anomaly.

1. Insertion Anomaly –
If the details of a student whose course has not yet been decided have to be inserted, then insertion
will not be possible until the course is decided for the student.

This problem happens when the insertion of a data record is not possible without adding
some additional unrelated data to the record.
2. Deletion Anomaly –
If the details of the students in this table are deleted, then the details of the college will also get
deleted, which should not happen.
This anomaly happens when deletion of a data record results in losing some unrelated
information that was stored as part of the record that was deleted from a table.
3. Updation Anomaly –
If the rank of the college changes, then the change will have to be made in every row that stores it,
which is time-consuming and computationally costly.
If the update does not occur at all places, then the database will be in an inconsistent state.

Normalization

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of relations.
It is also used to eliminate the undesirable characteristics like Insertion, Update and
Deletion Anomalies.

o Normalization divides a larger table into smaller tables and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.

Types of Normal Forms


This module covers six normal forms: 1NF, 2NF, 3NF, BCNF, 4NF and 5NF.

First Normal Form (1NF)

o A relation is in 1NF if every attribute contains only atomic values.
o It states that an attribute of a table cannot hold multiple values. It must hold only
a single value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.

EMPLOYEE table:
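The conversion to 1NF can be sketched in Python as follows. The rows are hypothetical sample data (the employee names and phone numbers are assumptions, not the original table); only the attribute name EMP_PHONE comes from the text above.

```python
# Hypothetical EMPLOYEE rows; EMP_PHONE holds several values, which violates 1NF.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": ["7272826385", "9064738238"]},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": ["8574783832"]},
]


def to_1nf(rows, multi_valued):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            flat.append({**row, multi_valued: value})
    return flat


employee_1nf = to_1nf(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row)
```

After the conversion every EMP_PHONE cell holds a single value, and a row is identified by the combination (EMP_ID, EMP_PHONE) rather than by EMP_ID alone.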

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.

o In the second normal form, all non-key attributes are fully functionally dependent on the primary
key.

Table: StudentCourse
StudentID CourseID StudentName Grade

101 CS101 Alice A

101 MATH101 Alice B

102 CS101 Bob A

103 MATH101 Charlie C

Candidate Key:

The combination of StudentID and CourseID uniquely identifies each row.

Partial Functional Dependency

● Dependency: StudentID → StudentName


o Example: Knowing StudentID = 101 is enough to determine StudentName =
Alice.
o This is a partial dependency because StudentName depends only on part of the
composite key (StudentID) and not on the entire composite key (StudentID,
CourseID).

Fully Functional Dependency

● Dependency: (StudentID, CourseID) → Grade


o Example: Knowing both StudentID = 101 and CourseID = CS101 is required to
determine Grade = A.
o This is a fully functional dependency because Grade depends on the entire
composite key and not on just one part of it.

Normalization:

To remove partial dependency and achieve 2NF, you would split the table as follows:

Table 1: Student
StudentID StudentName

101 Alice

102 Bob

103 Charlie

Table 2: CourseEnrollment
StudentID CourseID Grade

101 CS101 A

101 MATH101 B

102 CS101 A

103 MATH101 C

7
Now, both tables satisfy 2NF since all non-prime attributes are fully functionally dependent
on their respective primary keys.
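The 2NF example can be checked mechanically. The sketch below uses the StudentCourse rows from the table above; the helper names project and fd_holds are hypothetical, introduced only for this illustration.

```python
COLUMNS = ["StudentID", "CourseID", "StudentName", "Grade"]
student_course = [
    (101, "CS101", "Alice", "A"),
    (101, "MATH101", "Alice", "B"),
    (102, "CS101", "Bob", "A"),
    (103, "MATH101", "Charlie", "C"),
]


def project(rows, columns, keep):
    """Relational projection: keep the named columns and drop duplicate rows."""
    idx = [columns.index(c) for c in keep]
    return sorted({tuple(row[i] for i in idx) for row in rows})


def fd_holds(rows, columns, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs."""
    return len(project(rows, columns, lhs)) == len(project(rows, columns, lhs + rhs))


# Partial dependency: StudentName is determined by StudentID alone.
partial = fd_holds(student_course, COLUMNS, ["StudentID"], ["StudentName"])
# Grade is not determined by StudentID alone, only by the whole key.
grade_by_student = fd_holds(student_course, COLUMNS, ["StudentID"], ["Grade"])
grade_by_key = fd_holds(student_course, COLUMNS, ["StudentID", "CourseID"], ["Grade"])

# The 2NF decomposition is a pair of projections.
student = project(student_course, COLUMNS, ["StudentID", "StudentName"])
enrollment = project(student_course, COLUMNS, ["StudentID", "CourseID", "Grade"])
print(student)
print(enrollment)
```

Note that a check on sample rows can only show that a dependency is violated or not violated in that data; whether it is a rule of the schema is a design decision.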

Third Normal Form (3NF)



A table is in Third Normal Form (3NF) if:

1. It is in Second Normal Form (2NF).


2. It has no transitive dependency, i.e., no non-prime attribute depends on another non-prime
attribute.

Example Table (Before 3NF):


StudentID CourseID StudentName InstructorID InstructorName

101 CS101 Alice 201 Dr. Smith

101 MATH101 Alice 202 Dr. Johnson

102 CS101 Bob 201 Dr. Smith

103 MATH101 Charlie 202 Dr. Johnson

Primary Key:

The composite key (StudentID, CourseID) uniquely identifies each row.

Transitive Dependency:

● InstructorName depends on InstructorID, and InstructorID depends on (StudentID,
CourseID).
o Hence, InstructorName indirectly depends on the primary key (StudentID,
CourseID). This is a transitive dependency, violating 3NF.

Decomposed Tables (After 3NF):

To eliminate the transitive dependency, split the table into two:

Table 1: StudentCourse
StudentID CourseID StudentName InstructorID

101 CS101 Alice 201

101 MATH101 Alice 202

102 CS101 Bob 201

103 MATH101 Charlie 202

Table 2: Instructor
InstructorID InstructorName

201 Dr. Smith

202 Dr. Johnson

Explanation:

● Table 1: The transitive dependency is removed by storing only the InstructorID in the
StudentCourse table.
● Table 2: The Instructor table contains the details about the instructor, ensuring no
redundant data.

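Now, the transitive dependency is removed from the StudentCourse table:

1. InstructorName is no longer stored in StudentCourse, so no non-prime attribute there depends on
InstructorID.
2. In the Instructor table, InstructorName depends directly on the key InstructorID.

Note that StudentName still depends only on StudentID, which is a partial dependency. For StudentCourse to be fully in 2NF (and therefore 3NF), StudentName must also be moved to a separate Student table, as in the 2NF example.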

Boyce Codd normal form (BCNF)

BCNF (Boyce-Codd Normal Form) is a type of database normalization used to eliminate
redundancy and dependency anomalies. A table is in BCNF if it satisfies the following
criteria:

1. It is in 3rd Normal Form (3NF).


2. Every determinant is a candidate key.

Key Terms:

● Determinant: An attribute or a set of attributes that uniquely determines another attribute.
● Candidate Key: An attribute or a minimal set of attributes that can uniquely identify a tuple
in the table.

Why BCNF?

BCNF addresses anomalies that remain in 3NF when:

● A determinant is not a candidate key, for example when part of a candidate key depends on a non-key attribute.
● There are overlapping candidate keys, causing redundancy.

Example Before BCNF:

Consider a table Course_Enrollment:

CourseID Instructor StudentID StudentName

C101 Dr. Smith S1 Alice

C102 Dr. Brown S2 Bob

C101 Dr. Smith S2 Bob

Functional Dependencies:

1. CourseID → Instructor (Each course has one instructor.)
2. StudentID → StudentName (Each student ID has one name.)
3. CourseID, StudentID → Instructor, StudentName (Composite key: CourseID and StudentID
uniquely identify each record.)

Problem:

The dependency CourseID → Instructor violates BCNF because its determinant, CourseID, is not a candidate
key. The same holds for StudentID → StudentName.

Decomposing into BCNF:

To achieve BCNF, we split the table into two:

1. Course Table (to resolve CourseID→Instructor)

CourseID Instructor

C101 Dr. Smith

C102 Dr. Brown

2. Enrollment Table (remaining attributes form a valid candidate key):

CourseID StudentID StudentName

C101 S1 Alice

C102 S2 Bob

C101 S2 Bob

Advantages of BCNF:

● Eliminates redundancy.
● Avoids anomalies in data insertion, deletion, or updates.

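This decomposition removes the violation caused by CourseID → Instructor. The Enrollment table still contains StudentID → StudentName, whose determinant is not a candidate key, so StudentName must similarly be moved to a separate Student (StudentID, StudentName) table before every determinant is a candidate key and BCNF is achieved.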
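The BCNF condition can be tested with attribute closures. The following is a minimal sketch, assuming each relation is checked against the functional dependencies that apply to its own attributes; the function names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]


COURSE_ENROLLMENT = ("CourseID", "Instructor", "StudentID", "StudentName")
FDS = [
    (("CourseID",), ("Instructor",)),
    (("StudentID",), ("StudentName",)),
]

before = bcnf_violations(COURSE_ENROLLMENT, FDS)
course = bcnf_violations(("CourseID", "Instructor"), FDS[:1])
enrollment = bcnf_violations(("CourseID", "StudentID", "StudentName"), FDS[1:])
student = bcnf_violations(("StudentID", "StudentName"), FDS[1:])

print("Course_Enrollment:", before)
print("Course:", course)
print("Enrollment:", enrollment)
print("Student:", student)
```

Under these assumptions the Course table has no violations, while Enrollment (CourseID, StudentID, StudentName) is still flagged for StudentID → StudentName until StudentName is moved into its own Student table.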

Fourth normal form (4NF)

o A relation will be in 4NF if it is in Boyce Codd normal form and has no non-trivial multi-valued
dependency.
o A multi-valued dependency A →→ B exists if, for a single value of A, there is a set of values of B
that is independent of the other attributes of the relation.

Example

Fifth normal form (5NF)

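A relation is in 5NF if it is in 4NF and does not contain any join dependency that is not implied by its
candidate keys; any decomposition must be lossless when the parts are joined back.

5NF is satisfied when the tables have been broken into as many tables as possible, without losing
information, in order to avoid redundancy.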

5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math classes for Semester 1 but he doesn't
take the Math class for Semester 2. In this case, a combination of all these fields is required to
identify valid data.

Suppose we add a new semester, Semester 3, but do not know the subject or who
will be taking that subject, so we would leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
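That the three projections P1, P2 and P3 together reproduce the original table can be verified with a short sketch. The data is the table from the example above; the helper names project, natural_join and same_relation are hypothetical.

```python
rows = [
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
]
original = [dict(zip(("SUBJECT", "LECTURER", "SEMESTER"), r)) for r in rows]


def project(relation, attrs):
    """Projection onto attrs, dropping duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in l.keys() & r.keys())
    ]


def same_relation(x, y):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(relation):
        return {tuple(sorted(row.items())) for row in relation}
    return canon(x) == canon(y)


p1 = project(original, ["SEMESTER", "SUBJECT"])
p2 = project(original, ["SUBJECT", "LECTURER"])
p3 = project(original, ["SEMESTER", "LECTURER"])

two_way = natural_join(p1, p2)      # contains spurious rows
three_way = natural_join(two_way, p3)

print(len(two_way), "rows after joining P1 and P2")
print("P1, P2, P3 rebuild the original:", same_relation(three_way, original))
```

Joining only P1 and P2 produces extra rows such as (Semester 1, Math, Akash); it is the third projection P3 that filters them out, which is why all three relations are needed.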

Some Important Points About Normal Forms

The above diagram also implies-
● BCNF is stricter than 3NF.
● 3NF is stricter than 2NF.
● 2NF is stricter than 1NF.

While determining the normal form of any given relation,


● Start checking from BCNF.
● This is because if it is found to be in BCNF, then it will surely be in all other normal forms.
● If the relation is not in BCNF, then start moving towards the outer circles and check for
other normal forms in the order they appear.

● In a relational database, a relation is always in First Normal Form (1NF) at least.

Functional dependency in DBMS

What is Functional Dependency


A functional dependency in DBMS, as the name suggests, is a relationship between attributes of
a table in which one attribute depends on another. Introduced by E. F. Codd, it helps in preventing data
redundancy and in identifying bad designs.
To understand the concept, let us consider a relation P with attributes A and B.
A functional dependency is represented by -> (arrow sign).
The functional dependency of B on A is then written as:
A -> B
This means that any two tuples of P that have the same value of A also have the same value of B.

Example
The following is an example that would make it easier to understand functional dependency:
We have a <Department> table with two attributes: DeptId and DeptName.
DeptId = Department ID
DeptName = Department Name

The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute.
This is because if you want to know the department name, then at first you need to have
the DeptId.

Therefore, the above functional dependency between DeptId and DeptName can be
stated as DeptName is functionally dependent on DeptId:
DeptId -> DeptName

Types of Functional Dependency


Functional Dependency has three forms:

● Trivial Functional Dependency


● Non-Trivial Functional Dependency
● Completely Non-Trivial Functional Dependency
Let us begin with Trivial Functional Dependency:
Trivial Functional Dependency
● A functional dependency X → Y is said to be trivial if and only if Y ⊆ X.
● Thus, if RHS of a functional dependency is a subset of LHS, then it is called as a trivial
functional dependency.

Examples-

The examples of trivial functional dependencies are-


● AB → A
● AB → B
● AB → AB

Example
We are considering the same <Department> table with two attributes to understand the
concept of trivial dependency.
The following is a trivial functional dependency since DeptId is a subset
of { DeptId, DeptName }:
{ DeptId, DeptName } -> DeptId

Non-Trivial Functional Dependency
● A functional dependency X → Y is said to be non-trivial if and only if Y ⊄ X.
● Thus, if there exists at least one attribute in the RHS of a functional dependency that is not
a part of LHS, then it is called as a non-trivial functional dependency.

Examples-

The examples of non-trivial functional dependencies are-


● AB → BC
● AB → CD

Example
DeptId -> DeptName

The above is a non-trivial functional dependency since DeptName is not a subset of DeptId.
Completely Non-Trivial Functional Dependency
A functional dependency A -> B is completely non-trivial when the intersection of A and B is empty,
i.e., A and B have no attribute in common.
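The three forms can be told apart with two set operations. The sketch below is illustrative; the function name classify_fd is hypothetical, and attribute sets written as strings such as "AB" are assumed to consist of single-letter attribute names.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y as trivial, non-trivial, or completely non-trivial."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "non-trivial"             # Y not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


examples = [
    ("AB", "A"),
    ("AB", "AB"),
    ("AB", "BC"),
    ("AB", "CD"),
    (["DeptId"], ["DeptName"]),
]
for lhs, rhs in examples:
    print(lhs, "->", rhs, ":", classify_fd(lhs, rhs))
```

Every completely non-trivial dependency is also non-trivial; the function simply reports the most specific label.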

Inference Rules-

Reflexivity-
If B is a subset of A, then A → B always holds.

Transitivity-
If A → B and B → C, then A → C always holds.
Augmentation-
If A → B, then AC → BC always holds.

Decomposition-
If A → BC, then A → B and A → C always holds.

Composition-
If A → B and C → D, then AC → BD always holds.

Additive-
If A → B and A → C, then A → BC always holds.
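In practice the inference rules are usually applied through the attribute closure: the set of all attributes that a given set determines. The sketch below is one common way to compute it; the set F of dependencies is a hypothetical example, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Repeatedly apply the FDs until no new attribute can be added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y follows from fds exactly when Y is inside the closure of X."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical set of dependencies: A -> B, B -> C, C -> DE
F = [("A", "B"), ("B", "C"), ("C", "DE")]

print(sorted(closure("A", F)))
print("A -> C   (transitivity): ", implies(F, "A", "C"))
print("AD -> BD (augmentation): ", implies(F, "AD", "BD"))
print("C -> D   (decomposition):", implies(F, "C", "D"))
print("A -> BC  (additive):     ", implies(F, "A", "BC"))
print("B -> A   (not implied):  ", implies(F, "B", "A"))
```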
Rules for Functional Dependency-

Rule-01:

A functional dependency X → Y will always hold if all the values of X are unique (different)
irrespective of the values of Y.

Example-

Consider the following table-

A B C D E

5 4 3 2 2

8 5 3 2 1

1 9 3 3 5

4 7 3 3 8

The following functional dependencies will always hold since all the values of attribute ‘A’
are unique-
● A→B
● A → BC
● A → CD
● A → BCD
● A → DE
● A → BCDE
In general, we can say following functional dependency will always hold-

A → Any combination of attributes A, B, C, D, E

Similar will be the case for attributes B and E.

Rule-02:

A functional dependency X → Y will always hold if all the values of Y are same irrespective
of the values of X.

Example-
Consider the following table-

A B C D E

5 4 3 2 2

8 5 3 2 1

1 9 3 3 5

4 7 3 3 8

The following functional dependencies will always hold since all the values of attribute ‘C’
are same-
● A→C
● AB → C
● ABDE → C
● DE → C
● AE → C

In general, we can say following functional dependency will always hold true-

Any combination of attributes A, B, C, D, E → C

Combining Rule-01 and Rule-02 we can say-

In general, a functional dependency α → β always holds-


If either all values of α are unique or if all values of β are same or both.

Rule-03:

For a functional dependency X → Y to hold, if two tuples in the table agree on the value of
attribute X, then they must also agree on the value of attribute Y.

Rule-04:

For a functional dependency X → Y, violation will occur only when for two or more same
values of X, the corresponding Y values are different.
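Rule-03 and Rule-04 translate directly into a check over the rows of a table. The sketch below applies it to the five-column table used for Rule-01 and Rule-02; the function name fd_holds is hypothetical.

```python
COLUMNS = "ABCDE"
ROWS = [
    (5, 4, 3, 2, 2),
    (8, 5, 3, 2, 1),
    (1, 9, 3, 3, 5),
    (4, 7, 3, 3, 8),
]


def fd_holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Rule-03: tuples that agree on lhs must also agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[columns.index(a)] for a in lhs)
        y = tuple(row[columns.index(a)] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # Rule-04: same X value, different Y values
    return True


print("A -> BCDE:", fd_holds("A", "BCDE"))  # Rule-01: all values of A are unique
print("DE -> C:  ", fd_holds("DE", "C"))    # Rule-02: all values of C are the same
print("D -> E:   ", fd_holds("D", "E"))     # violated: D = 2 appears with E = 2 and E = 1
print("C -> D:   ", fd_holds("C", "D"))     # violated: C = 3 appears with D = 2 and D = 3
```

A check like this can show that a dependency is violated by the given rows, but rows that happen to satisfy a dependency do not prove that it is a rule of the schema.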
Equivalence of Two Sets of Functional Dependencies-
In DBMS,
● Two different sets of functional dependencies for a given relation may or may not be
equivalent.
● If F and G are the two sets of functional dependencies, then following 3 cases are possible-

Case-01: F covers G (F ⊇ G)
Case-02: G covers F (G ⊇ F)
Case-03: Both F and G cover each other (F = G)

Case-01: Determining Whether F Covers G-

Following steps are followed to determine whether F covers G or not-

Step-01:

● Take the functional dependencies of set G into consideration.


● For each functional dependency X → Y, find the closure of X using the functional
dependencies of set G.

Step-02:

● Take the functional dependencies of set G into consideration.


● For each functional dependency X → Y, find the closure of X using the functional
dependencies of set F.

Step-03:

● Compare the results of Step-01 and Step-02.
● If the functional dependencies of set F have determined all those attributes that were
determined by the functional dependencies of set G, then it means F covers G.
● Thus, we conclude F covers G (F ⊇ G) otherwise not.

Case-02: Determining Whether G Covers F-

Following steps are followed to determine whether G covers F or not-

Step-01:

● Take the functional dependencies of set F into consideration.


● For each functional dependency X → Y, find the closure of X using the functional
dependencies of set F.

Step-02:

● Take the functional dependencies of set F into consideration.


● For each functional dependency X → Y, find the closure of X using the functional
dependencies of set G.

Step-03:

● Compare the results of Step-01 and Step-02.
● If the functional dependencies of set G have determined all those attributes that were
determined by the functional dependencies of set F, then it means G covers F.
● Thus, we conclude G covers F (G ⊇ F) otherwise not.

Case-03: Determining Whether Both F and G Cover Each Other-

● If F covers G and G covers F, then both F and G cover each other.


● Thus, if both the above cases hold true, we conclude both F and G cover each other (F =
G).
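The three cases can be sketched with attribute closures. This is a compact variant of the steps above: instead of comparing two closures, it checks that the right-hand side of each dependency in one set lies inside the closure computed with the other set. The sets F, G and H are hypothetical examples, and attributes are assumed to be single letters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every X -> Y in G follows from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Case-03: both sets cover each other."""
    return covers(f, g) and covers(g, f)


# Hypothetical sets of dependencies.
F = [("A", "B"), ("B", "C")]
G = [("A", "B"), ("B", "C"), ("A", "C")]   # A -> C is redundant
H = [("A", "B")]

print("F and G equivalent:", equivalent(F, G))
print("F covers H:", covers(F, H))
print("H covers F:", covers(H, F))
```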

Decomposition of a Relation-

The process of breaking up or dividing a single relation into two or
more sub relations is called decomposition of a relation.

Properties of Decomposition-

The following two properties must be followed when decomposing a given relation-

1. Lossless decomposition-

Lossless decomposition ensures-


● No information is lost from the original relation during decomposition.
● When the sub relations are joined back, the same relation is obtained that was decomposed.
Every decomposition must always be lossless.

2. Dependency Preservation-

Dependency preservation ensures-


● None of the functional dependencies that holds on the original relation are lost.
● The sub relations still hold or satisfy the functional dependencies of the original relation.

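A decomposition D = { R1, R2, R3, …, Rn } of R is dependency preserving with respect to a set F
of functional dependencies if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+,
where Fi is the set of functional dependencies in F+ that involve only the attributes of Ri.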
Consider a relation R with a set of functional dependencies F.
If R is decomposed into R1 with FDs { f1 } and R2 with FDs { f2 }, then
there can be three cases:
f1 U f2 is equivalent to F -----> Decomposition is dependency preserving.
f1 U f2 implies only some of the FDs of F -----> Not dependency preserving.
f1 U f2 implies more than F -----> This case is not possible.
Problem: Let a relation R (A, B, C, D) have the functional dependencies {AB –> C, C –>
D, D –> A}.
Relation R is decomposed into R1 (A, B, C) and R2 (C, D). Check whether the
decomposition is dependency preserving or not.
Solution:
R1(A, B, C) and R2(C, D)
Let us find the closures for F1 and F2.
To find F1, consider all combinations of
ABC, i.e., find the closures of A, B, C, AB, BC and AC.
Note ABC is not considered as its closure restricted to R1 is always ABC.
closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D} but D can't be in closure as D is not present in R1.
= {C, A}
C --> A // Removing C from right side as it is a trivial attribute
closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from right side as these are trivial attributes
closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from right side as these are trivial attributes
closure(AC) = {A, C, D}
= {A, C} // D is not present in R1, so only trivial attributes remain
F1 {C --> A, AB --> C, BC --> A}.
Similarly F2 { C --> D }
In the original relation, the dependencies are { AB --> C , C --> D , D --> A}.
AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved.
F1 U F2 does not imply every dependency of F. So the given decomposition is not dependency
preserving.
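The solution above can be reproduced with a short sketch. It projects F onto each sub relation by taking the closure of every subset of its attributes (exponential in the number of attributes, so only suitable for small examples), then tests each original dependency against the union of the projections. The function names are hypothetical; attributes are assumed to be single letters.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, sub_relation):
    """Non-trivial FDs implied by fds that use only attributes of sub_relation."""
    attrs = sorted(sub_relation)
    projected = []
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(attrs)) - set(lhs)
            if rhs:
                projected.append(("".join(lhs), "".join(sorted(rhs))))
    return projected


def preserved(fds, decomposition):
    """Map each original FD to True if the projected FDs still imply it."""
    union = [fd for r in decomposition for fd in project_fds(fds, r)]
    return {(lhs, rhs): set(rhs) <= closure(lhs, union) for lhs, rhs in fds}


F = [("AB", "C"), ("C", "D"), ("D", "A")]
F1 = project_fds(F, "ABC")
F2 = project_fds(F, "CD")
status = preserved(F, ["ABC", "CD"])

print("F1 =", F1)
print("F2 =", F2)
for (lhs, rhs), ok in status.items():
    print(f"{lhs} -> {rhs}:", "preserved" if ok else "not preserved")
```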

Types of Decomposition-

Decomposition of a relation can be completed in the following two ways-

1. Lossless Join Decomposition-

● Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.


● This decomposition is called lossless join decomposition when the join of the sub relations
results in the same relation R that was decomposed.
● For lossless join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R

where ⋈ is a natural join operator

Example-

Consider the following relation R( A , B , C )-

A B C

1 2 1

2 5 3

3 3 3

R( A , B , C )

Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-

The two sub relations are-

A B

1 2

2 5

3 3

R1( A , B )

B C

2 1

5 3

3 3

R2( B , C )

Now, let us check whether this decomposition is lossless or not.


For lossless decomposition, we must have-
R1 ⋈ R2 = R

Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-

A B C

1 2 1

2 5 3

3 3 3

This relation is same as the original relation R.


Thus, we conclude that the above decomposition is lossless join decomposition.

NOTE-

● Lossless join decomposition is also known as non-additive join decomposition.


● This is because the resultant relation after joining the sub relations is same as the
decomposed relation.
● No extraneous tuples appear after joining of the sub-relations.

2. Lossy Join Decomposition-

● Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.


● This decomposition is called lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
● The natural join of the sub relations is always found to have some extraneous tuples.
● For lossy join decomposition, we always have-

R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R

where ⋈ is a natural join operator

Example-

Consider the following relation R( A , B , C )-

A B C

1 2 1

2 5 3

3 3 3

R( A , B , C )

Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C )-

The two sub relations are-

A C

1 1

2 3

3 3

R1( A , C )

B C

2 1

5 3

3 3

R2( B , C )

Now, let us check whether this decomposition is lossy or not.


For lossy decomposition, we must have-
R1 ⋈ R2 ⊃ R

Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get-

A B C

1 2 1

2 5 3

2 3 3

3 5 3

3 3 3

This relation is not same as the original relation R and contains some extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is lossy join decomposition.
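Both join examples can be replayed on the relation R( A , B , C ) from the text. The sketch below is illustrative; the helper names project, natural_join and same_relation are hypothetical.

```python
R = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 5, "C": 3},
    {"A": 3, "B": 3, "C": 3},
]


def project(rows, attrs):
    """Projection onto attrs, dropping duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in l.keys() & r.keys())
    ]


def same_relation(x, y):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rows):
        return {tuple(sorted(row.items())) for row in rows}
    return canon(x) == canon(y)


# R1(A, B) joined with R2(B, C): the common attribute B is unique in R.
lossless = natural_join(project(R, "AB"), project(R, "BC"))
# R1(A, C) joined with R2(B, C): the common attribute C repeats the value 3.
lossy = natural_join(project(R, "AC"), project(R, "BC"))
extraneous = [row for row in lossy if row not in R]

print("R1(A,B) with R2(B,C) gives R back:", same_relation(lossless, R))
print("R1(A,C) with R2(B,C) gives R back:", same_relation(lossy, R))
print("extraneous tuples:", extraneous)
```

The second join returns five tuples instead of three because the value C = 3 matches two rows on each side.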

NOTE-

● Lossy join decomposition is also known as careless decomposition.


● This is because extraneous tuples get introduced in the natural join of the sub-relations.
● Extraneous tuples make the identification of the original tuples difficult.

Determining Whether Decomposition Is Lossless Or Lossy-

Consider a relation R is decomposed into two sub relations R1 and R2.


Then,
● If all the following conditions satisfy, then the decomposition is lossless.
● If any of these conditions fail, then the decomposition is lossy.

Condition-01:

Union of both the sub relations must contain all the attributes that are present in the original
relation R.
Thus,

R1 ∪ R2 = R

Condition-02:

● Intersection of both the sub relations must not be null.


● In other words, there must be some common attribute which is present in both the sub
relations.
Thus,

R1 ∩ R2 ≠ ∅

Condition-03:

Intersection of both the sub relations must be a super key of either R1 or R2 or both.
Thus,

R1 ∩ R2 = Super key of R1 or R2

PRACTICE PROBLEMS BASED ON DETERMINING WHETHER DECOMPOSITION IS
LOSSLESS OR LOSSY-

Problem-01:

Consider a relation schema R ( A , B , C , D ) with the functional dependencies A → B and
C → D. Determine whether the decomposition of R into R1 ( A , B ) and R2 ( C , D ) is
lossless or lossy.

Solution-

To determine whether the decomposition is lossless or lossy,


● We will check all the conditions one by one.
● If any of the conditions fail, then the decomposition is lossy otherwise lossless.

Condition-01:

According to condition-01, union of both the sub relations must contain all the attributes of
relation R.

So, we have-
R1 ( A , B ) ∪ R2 ( C , D )
=R(A,B,C,D)
Clearly, union of the sub relations contain all the attributes of relation R.
Thus, condition-01 satisfies.

Condition-02:

According to condition-02, intersection of both the sub relations must not be null.
So, we have-
R1 ( A , B ) ∩ R2 ( C , D )
= ∅
Clearly, intersection of the sub relations is null.
So, condition-02 fails.
Thus, we conclude that the decomposition is lossy.
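The three conditions can be combined into one function for a decomposition into two sub relations. This is a sketch under stated assumptions: the function name is_lossless is hypothetical, attributes are single letters, and the second call uses a made-up schema R ( A , B , C ) with the assumed dependency B → C purely for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply Condition-01, Condition-02 and Condition-03 in order."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # Condition-01: union must give back all attributes
        return False
    common = r1 & r2
    if not common:            # Condition-02: intersection must not be empty
        return False
    determined = closure(common, fds)
    # Condition-03: the common attributes form a super key of R1 or of R2
    return r1 <= determined or r2 <= determined


# Problem-01 from the text.
problem_01 = is_lossless("ABCD", "AB", "CD", [("A", "B"), ("C", "D")])
# Hypothetical contrast case: R(A, B, C) with the assumed dependency B -> C.
contrast = is_lossless("ABC", "AB", "BC", [("B", "C")])

print("Problem-01 lossless:", problem_01)
print("Contrast case lossless:", contrast)
```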
