
UNIT III Redundancy

The document discusses schema refinement in databases, focusing on the problems caused by data redundancy, such as insertion, deletion, and updation anomalies. It explains how redundancy leads to data inconsistencies, increased storage requirements, and performance issues, and emphasizes the importance of normalization to eliminate redundancy. Additionally, it covers decomposition in DBMS, detailing lossless and lossy decompositions, and the significance of dependency preservation in maintaining data integrity.

Uploaded by

M. Madhusudhan M
Copyright © All Rights Reserved

UNIT-III

Schema Refinement
Problems caused by redundancy:

Redundancy means having multiple copies of the same data in the database. This problem arises when a database is not normalized. Suppose a table of student details has the attributes student ID, student name, contact number, college name, course opted, and college rank.

|Student_ID |Name |Contact |College |Course |Rank |
|----|----|----|----|----|----|
|100 |Himanshu |7300934851 |GEU |B.Tech |1 |
|101 |Ankit |7900734858 |GEU |B.Tech |1 |
|102 |Ayush |7300936759 |GEU |B.Tech |1 |
|103 |Ravi |7300901556 |GEU |B.Tech |1 |

It can be observed that the values of the attributes college name, college rank, and course are repeated, which can lead to problems. The problems caused by redundancy are:

• Insertion anomaly
• Deletion anomaly
• Updation anomaly

Insertion Anomaly

• If the details of a student whose course has not yet been decided have to be inserted, the insertion will not be possible until the course is decided for the student.

|Student_ID |Name |Contact |College |Course |Rank |
|----|----|----|----|----|----|
|100 |Himanshu |7300934851 |GEU | |1 |

• This problem happens when a data record cannot be inserted without adding some additional, unrelated data to the record.

Deletion Anomaly
If the details of the students in this table are deleted, the details of the college are also deleted, which should not happen. This anomaly occurs when deleting a data record results in losing unrelated information that was stored as part of the deleted record. It is not possible to delete some information without also losing other information in the table.

Updation Anomaly
Suppose the rank of the college changes. The change then has to be made in every row that stores the rank, which is time-consuming and computationally costly.

|Student_ID |Name |Contact |College |Course |Rank |
|----|----|----|----|----|----|
|100 |Himanshu |7300934851 |GEU |B.Tech |1 |
|101 |Ankit |7900734858 |GEU |B.Tech |1 |
|102 |Ayush |7300936759 |GEU |B.Tech |1 |
|103 |Ravi |7300901556 |GEU |B.Tech |1 |

All of these places must be updated; if the update is not applied everywhere, the database will be in an inconsistent state.
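To make the updation anomaly concrete, here is a minimal Python sketch (an illustration, not part of the original notes). It stores the student table above as a list of dictionaries with the Contact column omitted, applies a careless update that changes the rank in only two rows, and then reports the colleges whose copies of the rank disagree. The new rank value 2 and the helper names are hypothetical.

```python
# In-memory copy of the student table above (Contact omitted): every row
# repeats the college's rank, so the rank is stored once per student.
students = [
    {"Student_ID": 100, "Name": "Himanshu", "College": "GEU", "Course": "B.Tech", "Rank": 1},
    {"Student_ID": 101, "Name": "Ankit", "College": "GEU", "Course": "B.Tech", "Rank": 1},
    {"Student_ID": 102, "Name": "Ayush", "College": "GEU", "Course": "B.Tech", "Rank": 1},
    {"Student_ID": 103, "Name": "Ravi", "College": "GEU", "Course": "B.Tech", "Rank": 1},
]


def ranks_per_college(rows):
    """Collect every distinct rank value stored for each college."""
    ranks = {}
    for row in rows:
        ranks.setdefault(row["College"], set()).add(row["Rank"])
    return ranks


def inconsistent_colleges(rows):
    """Colleges whose stored copies of the rank disagree (the anomaly)."""
    return sorted(c for c, r in ranks_per_college(rows).items() if len(r) > 1)


# A careless update changes the rank in only the first two rows.
for row in students[:2]:
    row["Rank"] = 2

print(inconsistent_colleges(students))  # ['GEU']
```

The check can only detect the inconsistency after the fact; removing the repeated rank from the table is what prevents it.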
Redundancy in a database occurs when the same data is stored in
multiple places. Redundancy can cause various problems such as
data inconsistencies, higher storage requirements, and slower data
retrieval.
Problems Caused Due to Redundancy:

• Data Inconsistency: Redundancy can lead to data inconsistencies, where the same data is stored in multiple locations, and changes to one copy of the data are not reflected in the other copies. This can result in incorrect data being used in decision-making processes and can lead to errors and inconsistencies in the data.
• Storage Requirements: Redundancy increases the storage requirements of a database. If the same data is stored in multiple places, more storage space is required to store the data. This can lead to higher costs and slower data retrieval.
• Update Anomalies: Redundancy can lead to update anomalies, where changes made to one copy of the data are not reflected in the other copies. This can result in incorrect data being used in decision-making processes and can lead to errors and inconsistencies in the data.
• Performance Issues: Redundancy can also lead to performance issues, as the database must spend more time updating multiple copies of the same data. This can lead to slower data retrieval and slower overall performance of the database.
• Security Issues: Redundancy can also create security issues, as multiple copies of the same data can be accessed and manipulated by unauthorized users. This can lead to data breaches and compromise the confidentiality, integrity, and availability of the data.
• Maintenance Complexity: Redundancy can increase the complexity of database maintenance, as multiple copies of the same data must be updated and synchronized. This can make it more difficult to troubleshoot and resolve issues and can require more time and resources to maintain the database.
• Data Duplication: Redundancy can lead to data duplication, where the same data is stored in multiple locations, resulting in wasted storage space and increased maintenance complexity. This can also lead to confusion and errors, as different copies of the data may have different values or be out of sync.
• Data Integrity: Redundancy can also compromise data integrity, as changes made to one copy of the data may not be reflected in the other copies. This can result in inconsistencies and errors and can make it difficult to ensure that the data is accurate and up-to-date.
• Usability Issues: Redundancy can also create usability issues, as users may have difficulty accessing the correct version of the data or may be confused by inconsistencies and errors. This can lead to frustration and decreased productivity, as users spend more time searching for the correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can
be used. Normalization is the process of organizing data in a
database to eliminate redundancy and improve data
integrity. Normalization involves breaking down a larger table into
smaller tables and establishing relationships between them. This
reduces redundancy and makes the database more efficient and
reliable.
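A minimal Python sketch of this idea, assuming one possible split of the student table (the notes do not prescribe a specific one): student facts go into a Student table and each college's rank into a College table, both held as sets of tuples. It shows that the rank is then stored once, that joining on the college name rebuilds the original rows, and that a rank change touches a single tuple. The table and function names are hypothetical.

```python
# Rows of the student table above (Contact column omitted for brevity).
student_rows = [
    (100, "Himanshu", "GEU", "B.Tech", 1),
    (101, "Ankit", "GEU", "B.Tech", 1),
    (102, "Ayush", "GEU", "B.Tech", 1),
    (103, "Ravi", "GEU", "B.Tech", 1),
]

# Hypothetical split: student facts in one table, college facts in another.
# Sets mirror the relational model, where duplicate tuples collapse.
student = {(sid, name, college, course) for sid, name, college, course, _ in student_rows}
college = {(name, rank) for _, _, name, _, rank in student_rows}


def rejoin(student, college):
    """Join the two tables on the college name to rebuild the wide rows."""
    rank_of = dict(college)
    return {(sid, name, c, course, rank_of[c]) for sid, name, c, course in student}


print(len(student_rows), len(college))  # 4 1  (rank stored once, not four times)
print(rejoin(student, college) == set(student_rows))  # True

# Changing the rank now touches a single tuple, so no copy can be missed.
college = {(name, 2 if name == "GEU" else rank) for name, rank in college}
print(college)  # {('GEU', 2)}
```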
Advantages of data redundancy in DBMS

o Provides Data Security: Data redundancy can enhance data security, as it is difficult for cyber attackers to attack data that is stored in different locations.
o Provides Data Reliability: Redundant copies improve accuracy, because organizations can cross-check the copies to confirm whether the data is correct.
o Creates Data Backup: Data redundancy helps in backing up the data.

Disadvantages of data redundancy in DBMS

o Data corruption: Redundant data increases the chance of data corruption.
o Wastage of storage: Redundant data requires more space, leading to a need for more storage.
o High cost: Large storage is required to store and maintain redundant data, which is costly.

How to reduce data redundancy in DBMS

We can reduce data redundancy using the following methods:

o Database Normalization: In normalization, the data is broken down into pieces: a large table is divided into two or more smaller tables to remove redundancy. Normalization removes the insertion, update, and deletion anomalies.
o Deleting Unused Data: Unused or duplicate data adds to redundancy in the DBMS, so it is good practice to remove unwanted data from the database.
o Master Data: The data administrator shares master data across multiple systems. Although this does not remove data redundancy, it updates the redundant data whenever the data is changed.

Decomposition

Decomposition refers to dividing a table into multiple tables to produce consistency in the data. This section explains the definition of decomposition in DBMS, the types of decomposition, and their properties.

What is Decomposition in DBMS?


When we divide a table into multiple tables or divide a relation into
multiple relations, then this process is termed Decomposition in
DBMS. We perform decomposition in DBMS when we want to
process a particular data set. It is performed in a database
management system when we need to ensure consistency and remove
anomalies and duplicate data present in the database. When we
perform decomposition in DBMS, we must try to ensure that no
information or data is lost.

[Figure: Decomposition in DBMS]

Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition

[Figure: Types of Decomposition]

Lossless Decomposition
If no information is lost from the relation that is decomposed, then the decomposition is lossless.

A lossless decomposition guarantees that joining the decomposed relations gives back exactly the relation that was decomposed. A decomposition of R into R1 and R2 is lossless if the natural join of the decomposed relations gives the original relation, i.e. R1 ⋈ R2 = R.
Example:
EMPLOYEE_DEPARTMENT table:

|EMP_ID |EMP_NAME |EMP_AGE |EMP_CITY |DEPT_ID |DEPT_NAME |
|----|----|----|----|----|----|
|22 |Denim |28 |Mumbai |827 |Sales |
|33 |Alina |25 |Delhi |438 |Marketing |
|46 |Stephan |30 |Bangalore |869 |Finance |
|52 |Katherine |36 |Mumbai |575 |Production |
|60 |Jack |40 |Noida |678 |Testing |

The above relation is decomposed into two relations, EMPLOYEE and DEPARTMENT.

EMPLOYEE table:

|EMP_ID |EMP_NAME |EMP_AGE |EMP_CITY |
|----|----|----|----|
|22 |Denim |28 |Mumbai |
|33 |Alina |25 |Delhi |
|46 |Stephan |30 |Bangalore |
|52 |Katherine |36 |Mumbai |
|60 |Jack |40 |Noida |

DEPARTMENT table:

|DEPT_ID |EMP_ID |DEPT_NAME |
|----|----|----|
|827 |22 |Sales |
|438 |33 |Marketing |
|869 |46 |Finance |
|575 |52 |Production |
|678 |60 |Testing |

Now, when these two relations are joined on the common column
"EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

|EMP_ID |EMP_NAME |EMP_AGE |EMP_CITY |DEPT_ID |DEPT_NAME |
|----|----|----|----|----|----|
|22 |Denim |28 |Mumbai |827 |Sales |
|33 |Alina |25 |Delhi |438 |Marketing |
|46 |Stephan |30 |Bangalore |869 |Finance |
|52 |Katherine |36 |Mumbai |575 |Production |
|60 |Jack |40 |Noida |678 |Testing |

Hence, the decomposition is Lossless join decomposition.
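The join above can be checked mechanically. This Python sketch assumes relations are modeled as sets of tuples with a separate column list; `natural_join` is a hypothetical helper written for this illustration, not a library function.

```python
def natural_join(r1, cols1, r2, cols2):
    """Join two relations (sets of tuples) on all columns they share."""
    common = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    result = set()
    for t1 in r1:
        row1 = dict(zip(cols1, t1))
        for t2 in r2:
            row2 = dict(zip(cols2, t2))
            # Keep the pair only if it agrees on every shared column.
            if all(row1[c] == row2[c] for c in common):
                merged = {**row1, **row2}
                result.add(tuple(merged[c] for c in out_cols))
    return out_cols, result


employee_cols = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"]
employee = {
    (22, "Denim", 28, "Mumbai"),
    (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"),
    (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
}
department_cols = ["DEPT_ID", "EMP_ID", "DEPT_NAME"]
department = {
    (827, 22, "Sales"),
    (438, 33, "Marketing"),
    (869, 46, "Finance"),
    (575, 52, "Production"),
    (678, 60, "Testing"),
}
original_cols = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
original = {
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
}

cols, joined = natural_join(employee, employee_cols, department, department_cols)
print(cols == original_cols and joined == original)  # True: lossless for this data
```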

Example:

There is a relation R(A, B, C):

|A |B |C |
|----|----|----|
|55 |16 |27 |
|48 |52 |89 |

Now we decompose this relation into two subrelations, R1 and R2.

R1(A, B):
|A |B |
|----|----|
|55 |16 |
|48 |52 |

R2(B, C):
|B |C |
|----|----|
|16 |27 |
|52 |89 |

Now, if we take the natural join of R1 and R2 on their common attribute B, we get back the original relation R:

|A |B |C |
|----|----|----|
|55 |16 |27 |
|48 |52 |89 |

Therefore, this is a lossless decomposition.

Example: Consider a table R(A, B, C) with the dependency A → B. Decomposing it into R1(A, B) and R2(B, C) is lossy in general: the common attribute B is not a key of either R1 or R2, so the natural join of R1 and R2 can produce spurious tuples, and the original table cannot always be recreated.
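A small hypothetical instance shows how the spurious tuples arise. The values below are invented for illustration; they satisfy A → B, yet joining the projections on B returns tuples that were never in R.

```python
# Hypothetical instance of R(A, B, C); it satisfies A -> B (A=1 and A=2 both map to "x").
r = {(1, "x", "p"), (2, "x", "q")}

# Project R onto R1(A, B) and R2(B, C); sets drop duplicate tuples.
r1 = {(a, b) for a, b, _ in r}
r2 = {(b, c) for _, b, c in r}

# Natural join of R1 and R2 on their common attribute B.
joined = {(a, b1, c) for a, b1 in r1 for b2, c in r2 if b1 == b2}

print(sorted(joined))      # [(1, 'x', 'p'), (1, 'x', 'q'), (2, 'x', 'p'), (2, 'x', 'q')]
print(sorted(joined - r))  # [(1, 'x', 'q'), (2, 'x', 'p')]  <- spurious tuples
```

The join never loses the original tuples; what is lost is the ability to tell them apart from the extra ones.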

Example: Consider a relation R(A,B,C) with the following data:


|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).

R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):
|A |C |
|----|----|
|1 |P |
|2 |Q |
(The tuple (1, P) appears only once, because a relation does not contain duplicate tuples.)

Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation R. Therefore, this is a lossless decomposition.

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one decomposed table, or be derivable from the dependencies of the decomposed tables.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with the functional dependency set {A → BC}. R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A → BC is part of relation R1(ABC).
o Dependency Preservation: A decomposition D = {R1, R2, R3, …, Rn} of R is dependency preserving with respect to a set F of functional dependencies if
o (F1 ∪ F2 ∪ … ∪ Fn)+ = F+, where Fi is the set of dependencies in F+ that involve only attributes of Ri.
o Consider a relation R with a set F of functional dependencies.
o If R is decomposed into R1 with FDs f1 and R2 with FDs f2, then there can be three cases:
o (f1 ∪ f2)+ = F+ -----> Decomposition is dependency preserving.
o (f1 ∪ f2)+ is a proper subset of F+ -----> Not dependency preserving.
o (f1 ∪ f2)+ is a proper superset of F+ -----> This case is not possible.

Problem:
Let R(A, B, C, D) be a relation with the functional dependencies F = {AB –> C, C –> D, D –> A}. Relation R is decomposed into R1(A, B, C) and R2(C, D). Check whether the decomposition is dependency preserving or not.

Solution:

R1(A, B, C) and R2(C, D)

Let us find F1 and F2, the dependencies that hold on R1 and R2.

To find F1, consider all combinations of A, B and C, i.e. find the closures of A, B, C, AB, BC and AC under F, keeping only the attributes that are in R1. ABC is not considered because its closure restricted to R1 is always ABC.

closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D}, but D can't be in the closure as D is not present in R1.
= {C, A}
C --> A // Removing C from the right side as it is a trivial attribute

closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from the right side as these are trivial attributes

closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from the right side as these are trivial attributes

closure(AC) = {A, C, D}
= {A, C} // Only trivial attributes remain, so no new FD (NULL SET)

F1 = {C --> A, AB --> C, BC --> A}.


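Similarly, for R2(C, D): closure(C) = {C, D, A}, which restricted to R2 gives C --> D, and closure(D) = {D, A}, which restricted to R2 gives only {D} (trivial). So F2 = { C --> D }.

In the original relation, the dependencies are { AB --> C, C --> D, D --> A }.
AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved: D and A never appear together in R1 or R2, and the closure of D under F1 ∪ F2 is only {D}.

(F1 ∪ F2)+ is therefore a proper subset of F+, so the given decomposition is not dependency preserving.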
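The closure computations above can be automated. The Python sketch below uses a common textbook test (assumed here; the notes compute F1 and F2 by hand instead): for each FD X → Y in F, it repeatedly grows a set of attributes using closures restricted to each Ri, and reports the FD as preserved if Y ends up inside that set. This avoids listing F1 and F2 explicitly. The helper names and the single-letter attribute encoding are assumptions of this sketch.

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attributes, e.g. fd("AB", "C")."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(dep, fds, decomposition):
    """True if dep can be enforced using only the FDs that hold on each Ri."""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Attributes of Ri determined by what we already have, within Ri only.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [fd("AB", "C"), fd("C", "D"), fd("D", "A")]
decomposition = [set("ABC"), set("CD")]

for dep in F:
    lhs, rhs = dep
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), is_preserved(dep, F, decomposition))
# AB -> C True
# C -> D True
# D -> A False
```

The output agrees with the hand computation: D → A is the one dependency that is not preserved.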

Problems Related to Decomposition

1. Loss of Information (lossless decomposition or lossy decomposition)

2. Loss of Functional Dependency

• Once tables are decomposed, certain functional dependencies might not be preserved, which can make it impossible to enforce specific integrity constraints.
• Example: If the original table has the functional dependency `A → B`, but no decomposed table contains both `A` and `B`, this functional dependency cannot be checked within a single table, and it is lost unless it can be derived from the dependencies that are preserved.

Example: Let's consider a relation R with attributes A, B, and C and the following functional dependencies:

A → B
B → C

Now, suppose we decompose R into two relations:

R1(A, B) with FD A → B
R2(B, C) with FD B → C

In this case, the decomposition is dependency-preserving because all the functional dependencies of the original relation R can be found in the decomposed relations R1 and R2. We do not need to join R1 and R2 to enforce or check any of the functional dependencies.

However, if R were instead decomposed into R1(A, B) and R2(A, C), then B → C could not be checked in either relation without joining them, and it cannot be derived from A → B and A → C, so that decomposition would not be dependency-preserving for B → C. (An FD such as A → C causes no problem in the first decomposition, because it follows from A → B and B → C by transitivity.)
3. Increased Complexity

• Decomposition leads to an increase in the number of tables, which can complicate queries and maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate this to some extent, it still adds complexity.

4. Redundancy

• Incorrect decomposition might not eliminate redundancy, and in some cases can even introduce new redundancies.

5. Performance Overhead

• An increased number of tables, while aiding normalization, can also lead to more complex SQL queries involving multiple joins, which can introduce performance overheads.

Functional Dependency

A functional dependency is a relationship that exists between two sets of attributes. It typically exists between the primary key and a non-key attribute within a table.

X → Y

This means that any two tuples that agree on X must also agree on Y. The left side of an FD is known as the determinant, and the right side is known as the dependent.
Example:

Assume we have an employee table with the attributes Emp_Id, Emp_Name, and Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
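A table instance can be checked against an FD by grouping rows on the determinant. This Python sketch uses invented employee rows (hypothetical data, not from the notes). Sample data can only show that an FD is violated; the FD itself is a rule about every valid state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the expected rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical employee rows, invented for illustration.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Chennai"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False
```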

Types of Functional dependency

1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:
Consider a table with two columns, Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non-trivial functional dependency

A → B is a non-trivial functional dependency if B is not a subset of A.

When A ∩ B is empty, A → B is called a completely non-trivial functional dependency.

Example:

ID → Name,
Name → DOB
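These cases can be expressed directly with set operations. A minimal Python sketch; the attribute names come from the examples above, except the FD ID → {ID, Name}, which is a hypothetical extra case added to show a non-trivial but not completely non-trivial dependency.

```python
def classify(lhs, rhs):
    """Classify the FD lhs -> rhs, where both sides are sets of attribute names."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if not (lhs & rhs):
        return "completely non-trivial"  # A and B share no attribute
    return "non-trivial"                 # B is not a subset of A, but they overlap


print(classify({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify({"ID"}, {"Name"}))                                   # completely non-trivial
print(classify({"ID"}, {"ID", "Name"}))                             # non-trivial
```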
Reasoning about Functional Dependency:
Inference Rule (IR):

1. Armstrong's axioms are the basic inference rules.

2. Armstrong's axioms are used to infer functional dependencies on a relational database.

3. An inference rule is a type of assertion. It can be applied to a set of FDs (functional dependencies) to derive other FDs.

4. Using the inference rules, we can derive additional functional dependencies from the initial set.

There are six inference rules for functional dependencies. The first three (reflexive, augmentation, transitive) are Armstrong's axioms; the other three can be derived from them, as the proofs below show:

1. Reflexive Rule (IR1):

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}
Since Y ⊆ X, X → Y.

2. Augmentation Rule (IR2):

In augmentation, if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ
Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3):

In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4):

The union rule says that if X determines Y and X determines Z, then X must also determine YZ.

If X → Y and X → Z then X → YZ
Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1, augmenting with X, where XX = X)
4. XY → YZ (using IR2 on 2, augmenting with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5):

The decomposition rule is also known as the project rule. It is the reverse of the union rule.

This rule says that if X determines YZ, then X determines Y and X determines Z separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
X → Z follows in the same way, using YZ → Z.
6. Pseudo-transitive Rule (IR6):

In the pseudo-transitive rule, if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
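Each rule above can be checked with the attribute closure X+ (the set of all attributes that X determines): a dependency X → Y follows from a set of FDs exactly when Y ⊆ X+. This standard closure algorithm is not part of these notes; the Python sketch below assumes single-letter attribute names, so a string such as "XZ" can stand for a set of attributes.

```python
def closure(attrs, fds):
    """All attributes determined by attrs, applying the FDs until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds exactly when rhs is inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


print(implies([], "XY", "X"))                         # reflexive: True
print(implies([("X", "Y")], "XZ", "YZ"))              # augmentation: True
print(implies([("X", "Y"), ("Y", "Z")], "X", "Z"))    # transitive: True
print(implies([("X", "Y"), ("X", "Z")], "X", "YZ"))   # union: True
print(implies([("X", "YZ")], "X", "Y"))               # decomposition: True
print(implies([("X", "Y"), ("YZ", "W")], "XZ", "W"))  # pseudo-transitive: True
print(implies([("X", "Y")], "Y", "X"))                # not derivable: False
```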
