DBMS Unit-3


UNIT-3

Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition

Lossless-join decomposition
Lossless-join decomposition is a process in which a relation is decomposed into two or more
relations.
“This property guarantees that no information is lost from the original relation during the
decomposition.”
It is also known as non-additive join decomposition.
When the sub-relations are joined again, the result must be exactly the same as the original
relation before decomposition.

Consider a relation R that is decomposed into two sub-relations R1 and R2.

The decomposition is lossless when it satisfies all of the following conditions −
o The union of the attributes of R1 and R2 must contain every attribute of the original
relation R, i.e. R1 ∪ R2 = R.
o The intersection of R1 and R2 must not be empty. The sub-relations must share at least
one common attribute, i.e. R1 ∩ R2 ≠ ∅.
o The common attribute(s) must be a super key of at least one of the sub-relations R1 or R2.
Here,
R = (A, B, C)
R1 = (A, B)
R2 = (B, C)
The relation R has three attributes A, B, and C. It is decomposed into two relations
R1 and R2 with two attributes each. The common attribute is B.
The values in column B must be unique in at least one of R1 or R2, so that B is a key there.
If B contains duplicate values in both sub-relations, the join can produce spurious tuples and
the decomposition is not lossless.
The relation R with sample data −
R (A, B, C)
A B C

12 25 34

10 36 09

12 42 30
It decomposes into the two sub relations −

R1 (A, B)
A B

12 25

10 36

12 42

R2 (B, C)
B C

25 34

36 09

42 30
Now, we can check the conditions for lossless-join decomposition.
1. The union of the attributes of R1 and R2 is the same as the attributes of R.
R1 ∪ R2 = R
2. The common attribute B exists, and its values are unique, so B is a key of the sub-relations.
3. The natural join of R1 and R2 on B gives the following result −
A B C

12 25 34

10 36 09

12 42 30
The result is the same as the original relation R. Hence, the above decomposition is a
lossless-join decomposition.
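The check above can be sketched in Python. This is a minimal illustration, not part of the original notes: the helper names `project`, `natural_join` and `same_rows` are hypothetical, and the second split (on attribute A) is an assumed contrast case that shows what happens when the common attribute is not unique.

```python
def project(rel, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    out = []
    for row in rel:
        small = {a: row[a] for a in attrs}
        if small not in out:
            out.append(small)
    return out


def natural_join(r1, r2):
    """Natural join: pair rows that agree on every shared attribute."""
    out = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out


def same_rows(x, y):
    """Compare two relations while ignoring row order."""
    canon = lambda rel: sorted(tuple(sorted(r.items())) for r in rel)
    return canon(x) == canon(y)


R = [
    {"A": 12, "B": 25, "C": 34},
    {"A": 10, "B": 36, "C": 9},
    {"A": 12, "B": 42, "C": 30},
]

# Split on the common attribute B, whose values are unique in the data.
split_on_b = natural_join(project(R, ["A", "B"]), project(R, ["B", "C"]))
# Assumed contrast case: split on A, where the value 12 appears twice.
split_on_a = natural_join(project(R, ["A", "B"]), project(R, ["A", "C"]))

print(same_rows(split_on_b, R))  # True: the join gives back R
print(same_rows(split_on_a, R))  # False: the join adds spurious rows
```

Joining on A pairs each of the two rows with A = 12 in the first projection with both matching rows in the second, which is where the extra tuples come from.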
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every functional dependency of the main table must be
enforceable on the decomposed tables without joining them.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must
either be a part of R1 or R2, or be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with the functional dependency set
{A->BC}.
o The relation R is decomposed into R1 (A, B, C) and R2 (A, D), which is dependency
preserving because the FD A->BC of R is a part of relation R1 (A, B, C).
o That is, R1 (A, B, C) contains all the attributes that appear in the functional dependency
A->BC of R.

Problems Related to Decomposition

1. Loss of Information
 Non-loss decomposition: When a relation is decomposed into two or more smaller
relations, and the original relation can be perfectly reconstructed by taking the
natural join of the decomposed relations, then it is termed as lossless
decomposition. If not, it is termed "lossy decomposition."
 Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you
decompose it into `R1(A, B)` and `R2(B, C)`, it would be lossy because you can't
recreate the original table using natural joins.
Example: Consider a relation R(A,B,C) with the following data:

| A | B | C |
|----|----|----|
| 1 | X | P |
| 1 | Y | P |
| 2 | Z | Q |

Suppose we decompose R into R1(A,B) and R2(A,C).


R1(A, B):

| A | B |
|----|----|
| 1 | X |
| 1 | Y |
| 2 | Z |

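R2(A, C):

| A | C |
|----|----|
| 1 | P |
| 2 | Q |

The projection onto (A, C) produces the row (1, P) twice, but a relation is a set, so the
duplicate is dropped. Now, if we take the natural join of R1 and R2 on attribute A, we get
back the original relation R. Therefore, this is a lossless decomposition: A is a key of R2,
because each value of A determines a single value of C.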

2. Loss of Functional Dependency


 Once tables are decomposed, certain functional dependencies might not be
preserved, which can lead to the inability to enforce specific integrity constraints.
 Example: If you have the functional dependency `A → B` in the original table, but in
the decomposed tables, there is no table with both `A` and `B`, this functional
dependency can't be preserved.
Example: Let's consider a relation R with attributes A,B, and C and the following
functional dependencies:
A → B
B→C
Now, suppose we decompose R into two relations:
R1(A,B) with FD A → B
R2(B,C) with FD B → C
In this case, the decomposition is dependency-preserving because all the
functional dependencies of the original relation R can be found in the
decomposed relations R1 and R2. We do not need to join R1 and R2 to enforce
or check any of the functional dependencies.
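The dependency A → C, which follows from A → B and B → C by transitivity, is
also preserved, because it can be derived from the FDs of R1 and R2.
However, if we had decomposed R into R1(A,B) and R2(A,C) instead, the
dependency B → C could not be checked in either table without joining them, and
the decomposition would not be dependency-preserving for that specific FD.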
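A dependency-preservation test can be sketched with attribute closures. The code below is an illustrative sketch under assumptions: the function names `closure` and `preserves` are hypothetical, attributes are single letters so that a string such as "AB" stands for the attribute set {A, B}, and the split into (A, B) and (A, C) is an assumed contrast case added here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, parts):
    """True if every FD can be enforced using the decomposed relations only."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes reachable while staying inside one sub-relation.
                gained = closure(result & set(part), fds) & set(part)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("B", "C")]

print(preserves(F, ["AB", "BC"]))  # True: A -> B lives in R1, B -> C in R2
print(preserves(F, ["AB", "AC"]))  # False: B -> C cannot be checked locally
```

The inner loop only ever follows dependencies that fit inside a single sub-relation, which mirrors the idea that no join should be needed to check an FD.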

3. Increased Complexity
 Decomposition leads to an increase in the number of tables, which can complicate
queries and maintenance tasks. While tools and ORM (Object-Relational Mapping)
libraries can mitigate this to some extent, it still adds complexity.

4. Redundancy
 Incorrect decomposition might not eliminate redundancy, and in some cases, can
even introduce new redundancies.

5. Performance Overhead
 An increased number of tables, while aiding normalization, can also lead to more
complex SQL queries involving multiple joins, which can introduce performance
overheads.

Functional Dependency (FD)


In relational database management, a functional dependency is a concept that specifies the
relationship between two sets of attributes, where one set of attributes determines the value
of the other.
It typically exists between the primary key and a non-key attribute within a table. It is denoted
as X → Y, where the attribute set on the left side of the arrow, X, is called the Determinant,
and Y is called the Dependent. It can be represented as FD: X → Y
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
1. FD: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
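Whether a functional dependency holds in a given set of rows can be checked mechanically. The sketch below is illustrative only: `fd_holds` is a hypothetical helper, and the three employee rows are made-up sample data, not taken from the notes.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Made-up sample rows for illustration.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Agra"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False
```

A check like this can only refute a dependency on the data at hand. Whether Emp_Id → Emp_Name holds in general is a design decision about the table, not something the sample rows can prove.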
Types of Functional dependency
1. Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.

2. Non-trivial functional dependency


o A → B is a non-trivial functional dependency if B is not a subset of A.
o When A ∩ B is empty, A → B is called a completely non-trivial functional
dependency.
Example:
1. FD: ID → Name
The intersection of ID and Name is empty.
2. FD: Name → DOB
The intersection of Name and DOB is empty.
Multivalued Dependency
1. Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
2. A multivalued dependency involves at least two attributes that are dependent on a third
attribute, which is why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors (white and
black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black


Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other.
In this case, these two columns are said to be multivalued dependent on
BIKE_MODEL.
The representation of these dependencies is shown below:
1. BIKE_MODEL →→ MANUF_YEAR, COLOR
OR
2. BIKE_MODEL →→ MANUF_YEAR
3. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
OR
This can be read as "COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other".
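A multivalued dependency can be tested on data by checking that, for each value of the determining attribute, every combination of the two dependent values is present. The sketch below is an illustration with a hypothetical helper `mvd_holds`. The first data set copies the BIKE table as printed, and the extra row in the second data set is an assumption added to show a violation.

```python
from itertools import product


def mvd_holds(triples):
    """Check X ->> Y for rows given as (x, y, z) triples."""
    groups = {}
    for x, y, z in triples:
        ys, zs, pairs = groups.setdefault(x, (set(), set(), set()))
        ys.add(y)
        zs.add(z)
        pairs.add((y, z))
    # For each x, the (y, z) pairs must be the full cross product.
    return all(pairs == set(product(ys, zs))
               for ys, zs, pairs in groups.values())


# (BIKE_MODEL, MANUF_YEAR, COLOR), copied from the table above.
bikes = [
    ("M2011", 2008, "White"),
    ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"),
    ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"),
    ("M4006", 2017, "Black"),
]

# Assumed extra row: M3001 in 2014 appears in White only.
broken = bikes + [("M3001", 2014, "White")]

print(mvd_holds(bikes))   # True
print(mvd_holds(broken))  # False: (2014, Black) is missing for M3001
```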
Normalization
1. Normalization is the process of organizing the data in the database.
2. Normalization is used to minimize or reduce the redundancy from a relation or set of
relations.
3. It is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies (problems).
4. Normalization divides the larger table into smaller tables and links them using
relationships.
Types of Normal Forms
1. First Normal Form(1NF)
2. Second Normal Form(2NF)
3. Third Normal Form(3NF)
4. Boyce Codd Normal Form(BCNF)
5. Fourth Normal Form(4NF)
6. Fifth Normal Form(5NF)

1.First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o It states that an attribute of a table cannot hold multiple values. It must hold only a
single value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP
14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
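The conversion to 1NF shown above amounts to emitting one row per value of the multi-valued attribute. A minimal sketch, assuming the phone numbers are held as a Python list and using a hypothetical helper `to_1nf`:

```python
def to_1nf(rows, attr):
    """Emit one row per value of the multi-valued attribute attr."""
    flat = []
    for row in rows:
        for value in row[attr]:
            flat.append({**row, attr: value})
    return flat


employees = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry",
     "EMP_PHONE": ["8574783832"], "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam",
     "EMP_PHONE": ["7390372389", "8589830302"], "EMP_STATE": "Punjab"},
]

for row in to_1nf(employees, "EMP_PHONE"):
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

The output has five rows, one per phone number, which matches the 1NF table above.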

2. Second Normal Form (2NF)


o In 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent on the
primary key, i.e. no non-key attribute depends on only a part of a composite key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach.

Note: In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

The above table is not in 2NF: the key is {TEACHER_ID, SUBJECT}, but the non-key attribute
TEACHER_AGE depends on TEACHER_ID alone, which is only a part of the key (a partial dependency).
To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
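The 2NF decomposition above can be reproduced by projecting the TEACHER table onto the two attribute sets and then confirming that joining them on TEACHER_ID gives back the original rows. This is a sketch only, and the variable names are illustrative.

```python
# (TEACHER_ID, SUBJECT, TEACHER_AGE), copied from the TEACHER table.
TEACHER = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projections; building a set removes the repeated (id, age) pairs.
teacher_detail = sorted({(tid, age) for tid, _, age in TEACHER})
teacher_subject = sorted({(tid, subj) for tid, subj, _ in TEACHER})

# Natural join on TEACHER_ID rebuilds the original table.
rejoined = sorted(
    (tid, subj, age)
    for tid, subj in teacher_subject
    for tid2, age in teacher_detail
    if tid == tid2
)

print(teacher_detail)               # three rows, one per teacher
print(rejoined == sorted(TEACHER))  # True: nothing lost or added
```

Each teacher's age is now stored once, so changing it touches a single row of TEACHER_DETAIL.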

3. Third Normal Form (3NF)


1. In 3NF, the relation must be in First and Second Normal Form.
2. If no non-primary-key attribute is transitively dependent on the
primary key, then the relation is in Third Normal Form (3NF).
Note – If A->B and B->C are two FDs then A->C is called a transitive dependency.
Example
Consider a relation student (rollno, game, feestructure)
Rollno Game Feestructure

1 Basketball 500

2 Basketball 500

3 Basketball 500

4 Cricket 600

5 Cricket 600

6 Cricket 600

7 Tennis 400

FD − {rollno -> game, game -> feestructure, rollno -> feestructure}

The above student table is in 1NF because there are no multivalued attributes.

The student table is also in 2NF because all non-key attributes are fully functionally dependent
on the primary key (rollno).

But the table is not in 3NF because a transitive dependency exists in it: rollno -> game and
game -> feestructure.

So divide the student table into R1 (rollno, game) and R2 (game, feestructure).

Table:R1

Rollno Game

1 Basketball

2 Basketball

3 Basketball

4 Cricket

5 Cricket

6 Cricket

7 Tennis

Table:R2
Game Feestructure

Basketball 500

Cricket 600

Tennis 400

In the above two tables no transitive dependency exists, so both tables follow the
3NF rule.

Boyce-Codd Normal Form (BCNF)

Rule 1: The table should be in the 3rd Normal Form.


Rule 2: X should be a super key for every non-trivial functional dependency (FD) X−>Y in a
given relation.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into the following tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

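Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
Now, both tables are in BCNF because the left side of each functional dependency is a key of
its table. Note that these two tables alone do not record which employee works in which
department. A further table over (EMP_ID, EMP_DEPT) is needed to keep that association, so
that the original table can be rebuilt by joining.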
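Rule 2 can be tested with attribute closures: an FD violates BCNF when the closure of its left side does not cover all attributes of the table. The following is a sketch under assumptions, with hypothetical helper names `closure` and `bcnf_violations`, applied to the EMPLOYEE example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)
        and not set(attributes) <= closure(lhs, fds)
    ]


employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    (("EMP_ID",), ("EMP_COUNTRY",)),
    (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO")),
]

# Both FDs violate BCNF in the original table.
print(bcnf_violations(employee, fds))
# Each decomposed table, checked with the FD that applies to it.
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, fds[:1]))               # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, fds[1:]))  # []
```

The closure of {EMP_ID, EMP_DEPT} covers all five attributes, which is why that pair is a key of the original table.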

Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued
dependency.
o For a dependency A →→ B, if a single value of A is associated with multiple values of B,
independently of the other attributes, then the dependency is a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey
The given STUDENT table is in 3NF and also in BCNF, but COURSE and HOBBY are two
independent attributes. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued
dependency on STU_ID, which leads to unnecessary repetition of data.

So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics
STUDENT_HOBBY
STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.
Types of Relational operation

1. Selection(σ): It is used to select required tuples of the relations.

Example:
A B C

1 2 4

2 2 3

3 2 3

4 3 4

For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C

1 2 4

4 3 4

Note: The selection operator only filters tuples; every selected tuple keeps all of its
attributes. To restrict the result to particular columns, the projection operator is used.

2. Projection(π): It is used to project required column data from a relation.


Example: Consider the relation R above. Suppose we want columns B and C from relation R.
π(B,C)R will show following columns.
B C

2 4

2 3

3 4

Note: By Default, projection removes duplicate data.

3. Union(U): Union operation in relational algebra is the same as union operation in set theory.
Example:
FRENCH
Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17

GERMAN
Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25

Consider the above tables of students having different optional subjects in their course.
π(Student_Name)FRENCH U π(Student_Name)GERMAN
Student_Name

Ram

Mohan

Vivek

Geeta

Shyam

Rohan

Note: The only constraint in the union of two relations is that both relations must have the
same set of Attributes.

4. Set Difference(-): Set Difference in relational algebra is the same set difference operation as
in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name

Ram

Mohan

5. Set Intersection(∩): Set Intersection in relational algebra is the same set intersection
operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is used as
follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name

Vivek

Geeta

Note: As with union, the only constraint on set difference and set intersection is that both
relations must have the same set of attributes.
6. Rename(ρ): Rename is a unary operation used for renaming attributes of a relation.
ρ(a/b)R will rename the attribute 'b' of the relation R to 'a'.

7. Cross Product(X): The cross product is taken between two relations, say A and B. The result
of A X B contains all the attributes of A followed by all the attributes of B.
Each record of A is paired with every record of B.

Example:
A
Name Age Gender

Ram 14 M

Sona 15 F

Kim 20 M

B
ID Course

1 DS

2 DBMS

AXB
Name Age Gender ID Course

Ram 14 M 1 DS

Ram 14 M 2 DBMS

Sona 15 F 1 DS

Sona 15 F 2 DBMS

Kim 20 M 1 DS

Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
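The operators above map closely onto Python comprehensions and set operations. The sketch below reuses the example relations from this section; representing relations as lists or sets of tuples is a choice made here for illustration.

```python
from itertools import product

# Relation R(A, B, C) from the selection example.
R = [(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)]

# Selection: keep tuples with C > 3.
selected = [t for t in R if t[2] > 3]

# Projection onto (B, C); the set removes duplicate rows.
projected = sorted({(b, c) for _, b, c in R})

# Projections of FRENCH and GERMAN onto Student_Name.
french = {"Ram", "Mohan", "Vivek", "Geeta"}
german = {"Vivek", "Geeta", "Shyam", "Rohan"}
union = french | german
difference = french - german
intersection = french & german

# Cross product of A(Name, Age, Gender) and B(ID, Course).
A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]
B = [(1, "DS"), (2, "DBMS")]
cross = [a + b for a, b in product(A, B)]

print(selected)              # [(1, 2, 4), (4, 3, 4)]
print(projected)             # [(2, 3), (2, 4), (3, 4)]
print(sorted(difference))    # ['Mohan', 'Ram']
print(sorted(intersection))  # ['Geeta', 'Vivek']
print(len(cross))            # 6, i.e. 3 * 2 tuples
```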

Relational Calculus
1. Relational calculus is a non-procedural query language.
2. In a non-procedural query language, the user is not concerned with the details
of how to obtain the end results.
3. Relational calculus states what to retrieve but never explains how to retrieve it.

Types of Relational calculus:

Tuple Relational Calculus (TRC)


It is a non-procedural query language used in relational database management
systems (RDBMS) to retrieve data from tables. TRC is based on the concept of
tuples, which are ordered sets of attribute values that represent a single row or
record in a database table.

Syntax: The basic syntax of TRC is as follows:

{ t | P(t) }

Where t is a tuple variable and P (t) is a logical formula that describes the
conditions that the tuples in the result must satisfy. The curly braces {} are
used to indicate that the expression is a set of tuples.
For example, let’s say we have a table called “Employees” with the
following attributes:

Employee ID

Name

Salary

Department ID

To retrieve the names of all employees who earn more than $50,000 per year,
we can use the following TRC query:
{ t.Name | Employees(t) ∧ t.Salary > 50000 }
In this query, the “Employees(t)” expression specifies that the tuple variable t
represents a row in the “Employees” table. The “∧” symbol is the logical AND
operator, which is used to combine the condition “t.Salary > 50000” with the
table selection. Writing t.Name before the bar restricts the output to the Name
attribute.
The result of this query will be a set of tuples, where each tuple contains the
Name attribute of an employee who earns more than $50,000 per year.
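A TRC expression reads much like a set comprehension: a tuple variable ranges over the table, a condition filters it, and the part before the bar says what to return. The sketch below is only an analogy, and the employee rows are made-up sample data.

```python
from collections import namedtuple

Employee = namedtuple("Employee", "employee_id name salary department_id")

# Made-up sample rows for illustration.
employees = {
    Employee(1, "Asha", 62000, 10),
    Employee(2, "Ravi", 48000, 10),
    Employee(3, "Meena", 75000, 20),
}

# Names of employees t in Employees whose salary exceeds 50000.
high_earners = {t.name for t in employees if t.salary > 50000}

print(sorted(high_earners))  # ['Asha', 'Meena']
```

As in the calculus, the comprehension states which tuples qualify and leaves the evaluation order to the language.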

Domain Relational Calculus (DRC)


Domain Relational Calculus is similar to Tuple Relational Calculus, except that its
variables range over attribute values (domains) rather than whole tuples. A query
lists the attributes that are to be chosen from the relations as per the conditions.
{<a1,a2,a3,.....an> | P(a1,a2,a3,.....an)}
Where a1,a2,…an are the attributes of the relation and P is the condition.
Example:
{<article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database'}

Output: This query will yield the article, page, and subject from the
relation javatpoint, where the subject is 'database'.

Here javatpoint is the table name, and article, page and subject are the
attributes of the table javatpoint.

The Problem of redundancy in Database


Redundancy means having multiple copies of the same data in
the database. This problem arises when a database is not normalized.
Suppose a table of student details has the attributes: student ID, name,
contact, college, course, and college rank.
Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU B.Tech 1

101 Ankit 7900734858 GEU B.Tech 1

102 Ayush 7300936759 GEU B.Tech 1

103 Ravi 7300901556 GEU B.Tech 1

As can be observed, the values of the attributes college, course, and
rank are repeated, which can lead to problems.
Problems caused due to redundancy are:
 Insertion anomaly
 Deletion anomaly
 Updation anomaly
Insertion Anomaly
If the details of a student whose course has not yet been decided have
to be inserted, the insertion will not be possible until a course is
decided for the student.

Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU 1

Deletion Anomaly
If the details of all students in this table are deleted, then the details
of the college will also be deleted, which should not happen. This
anomaly occurs when the deletion of a data record results in losing
some unrelated information that was stored as part of the deleted
record.
Updation Anomaly
Suppose the rank of the college changes. The change will then have to
be made in every row that stores it, which is time-consuming and
computationally costly.

Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU B.Tech 1

101 Ankit 7900734858 GEU B.Tech 1

102 Ayush 7300936759 GEU B.Tech 1

103 Ravi 7300901556 GEU B.Tech 1

All occurrences should be updated. If the update does not occur at all
places, then the database will be in an inconsistent state.
Problems Caused Due to Redundancy
 Data Inconsistency: Redundancy can lead to data inconsistencies,
where the same data is stored in multiple locations, and changes to
one copy of the data are not reflected in the other copies. This can
result in incorrect data being used in decision-making processes and
can lead to errors and inconsistencies in the data.
 Storage Requirements: Redundancy increases the storage
requirements of a database. If the same data is stored in multiple
places, more storage space is required to store the data. This can
lead to higher costs and slower data retrieval.
 Update Anomalies: Redundancy can lead to update anomalies,
where changes made to one copy of the data are not reflected in
the other copies. This can result in incorrect data being used in
decision-making processes and can lead to errors and
inconsistencies in the data.
 Performance Issues: Redundancy can also lead to performance
issues, as the database must spend more time updating multiple
copies of the same data. This can lead to slower data retrieval and
slower overall performance of the database.
 Security Issues: Redundancy can also create security issues, as
multiple copies of the same data can be accessed and manipulated
by unauthorized users. This can lead to data breaches and
compromise the confidentiality, integrity, and availability of the
data.
 Maintenance Complexity: Redundancy can increase the
complexity of database maintenance, as multiple copies of the same
data must be updated and synchronized. This can make it more
difficult to troubleshoot and resolve issues and can require more
time and resources to maintain the database.
 Data Duplication: Redundancy can lead to data duplication, where
the same data is stored in multiple locations, resulting in wasted
storage space and increased maintenance complexity. This can also
lead to confusion and errors, as different copies of the data may
have different values or be out of sync.
 Data Integrity: Redundancy can also compromise data integrity,
as changes made to one copy of the data may not be reflected in
the other copies. This can result in inconsistencies and errors and
can make it difficult to ensure that the data is accurate and up-to-
date.
 Usability Issues: Redundancy can also create usability issues, as
users may have difficulty accessing the correct version of the data
or may be confused by inconsistencies and errors. This can lead to
frustration and decreased productivity, as users spend more time
searching for the correct data or correcting errors.
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF, contains no non-trivial join dependency, and every
join of its decomposed tables is lossless.
o 5NF is satisfied when the tables have been broken into as many tables as possible
in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1,
but he doesn't take the Math class for Semester 2. In this case, the combination of
all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not know the subject or who
will be taking it, so we leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2
SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
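The three projections can be checked against the original table by joining them back together. The sketch below builds P1, P2 and P3 from the example data and compares a two-way join with the full three-way join. The variable names are illustrative.

```python
# (SUBJECT, LECTURER, SEMESTER), copied from the example table.
R = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

P1 = {(sem, sub) for sub, lec, sem in R}  # SEMESTER, SUBJECT
P2 = {(sub, lec) for sub, lec, sem in R}  # SUBJECT, LECTURER
P3 = {(sem, lec) for sub, lec, sem in R}  # SEMESTER, LECTURER

# Join P1 and P2 on SUBJECT only.
two_way = {
    (sub, lec, sem)
    for sem, sub in P1
    for sub2, lec in P2
    if sub == sub2
}

# Keep only rows whose (SEMESTER, LECTURER) pair also appears in P3.
three_way = {t for t in two_way if (t[2], t[1]) in P3}

print(len(two_way))    # 7: two spurious rows for Math
print(three_way == R)  # True: all three projections rebuild R
```

Joining only P1 and P2 wrongly pairs Akash with Semester 1 and John with Semester 2 for Math. P3 filters those rows out, which is why all three relations are needed for a lossless join.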
