DBMS Unit2 Print


UNIT: II: DATABASE DESIGN

Syllabus
Entity-Relationship model - E-R Diagrams - Enhanced-ER Model - ER-to-Relational Mapping
- Functional Dependencies - Non-loss Decomposition - First, Second, Third Normal Forms,
Dependency Preservation - Boyce/Codd Normal Form - Multi-valued Dependencies and Fourth
Normal Form - Join Dependencies and Fifth Normal Form.
Part I: Entity Relationship Model
Introduction to Entity Relationship Model
The Entity-Relationship (E-R) model is a model for identifying the entities to be represented in
the database and for representing how those entities are related.
ER Model
The ER data model specifies an enterprise schema that represents the overall logical structure of
a database.
The E-R model is very useful in mapping the meanings and interactions of real-world entities
onto a conceptual schema.
The ER model consists of three basic concepts -
1) Entity Sets
• Entity: An entity is an object that exists and is distinguishable from other objects. An entity
can be concrete or abstract. Examples of concrete entities are Person, Book and Bank. Examples of
abstract entities are a holiday or a concept. In an E-R diagram an entity is represented as a box.

• Entity set: An entity set is a set of entities of the same type. Every entity in an entity set has
the same set of attributes, and this set of attributes distinguishes it from other entity sets.
2) Relationship Sets
A relationship is an association among two or more entities.
A relationship set is a collection of similar relationships. Example - the relationship
"works for" associates the two entities Employee and Department.
The association between entity sets is called participation. That is, the entity sets E1,
E2,..., En participate in relationship set R. The function that an entity plays in a relationship is
called that entity's role.
3) Attributes
Attributes define the properties of an entity. For example, if Student is an
entity, then ID, name, address, date of birth and class are its attributes. The attributes help in
uniquely identifying an entity. In an E-R diagram an entity is shown by a rectangular box and
attributes are shown in ovals. The primary key is underlined.

Types of Attributes
1) Simple and Composite Attributes:
• Simple attributes are attributes that are drawn from atomic value domains.
For example - Name = {Parth}; Age = {23}
• Composite attributes: Attributes that consist of a hierarchy of attributes. For example -
Address may consist of "Number", "Street" and "Suburb" → Address = {59 + 'JM Road' +
'Shivaji Nagar'}

2) Single valued and multivalued:
• Single valued attributes: Attributes that can be represented using a single value. For example -
the StudentID attribute of a Student holds exactly one student ID.
• Multivalued attributes: Attributes that have a set of values for each entity. They are represented by
concentric ovals. For example - Degrees of a person: 'BSc', 'MTech', 'PhD'

3) Derived attribute:
Derived attributes are attributes whose values are calculated from other
attributes. A derived attribute is represented by a dashed (dotted) ellipse. For
example, Age can be derived from the attribute DateOfBirth. In this situation, DateOfBirth is
called a Stored Attribute.

Mapping Cardinality
Mapping cardinality represents the number of entities to which another entity can be
associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets. The types
of mapping cardinalities are -
1) One to One: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
2) One to Many: An entity in A is associated with any number of entities in B. An entity in B,
however, can be associated with at most one entity in A.

3) Many to One: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number of entities in A.

4) Many to many: An entity in A is associated with any number (zero or more) of entities in
B, and an entity in B is associated with any number (zero or more) of entities in A.

ER DIAGRAMS
An E-R diagram can express the overall logical structure of a database graphically.
E-R diagrams are used to model real-world objects like a person, a car or a company and the
relationships between these real-world objects.

Various Components used in ER Model are-


Features of ER model
i) E-R diagrams are used to represent the E-R model of a database, which makes them easy to
convert into relations (tables).
ii) E-R diagrams require no technical knowledge and no hardware support.
iii) These diagrams are easy to understand and easy to create, even for a naive user.

MAPPING CARDINALITY REPRESENTATION USING ER DIAGRAM


There are four types of relationships that are considered for key constraints.
i) One to one relation: When an entity of A is associated with at most one entity of B, and vice
versa, they share a one to one relation. For example - There is one project manager who manages
only one project.

ii) One to many: When one entity is associated with more than one entity at a time, there
is a one to many relation. For example - One customer places many orders.

iii) Many to one: When more than one entity is associated with only one entity, there
is a many to one relation. For example – Many students take a ComputerSciCourse.

iv) Many to many: When more than one entity is associated with more than one entity.
For example - Many teachers can teach many students.

TERNARY RELATIONSHIP
A relationship in which three entities are involved is called a ternary relationship.

Binary and Ternary Relationships

A binary relationship connects two entities. If a relationship connects three entities, it is
called ternary or "3-ary".
• Ternary relationships are required when binary relationships are not sufficient to accurately
describe the semantics of an association among three entities.
• For example - Suppose, you have a database for a company that contains the entities,
PRODUCT, SUPPLIER, and CUSTOMER.
The usual relationships might be PRODUCT/ SUPPLIER where the company buys products
from a supplier - a normal binary relationship. The intersection attribute for
PRODUCT/SUPPLIER is wholesale_price
• Now consider the CUSTOMER entity, and that the customer buys products. If all customers
pay the same price for a product, regardless of supplier, then you have a simple binary
relationship between CUSTOMER and PRODUCT. For the CUSTOMER/PRODUCT
relationship, the intersection attribute is retail_price.

• Single ternary relation: Now consider a different scenario. Suppose the customer buys
products but the price depends not only on the product, but also on the supplier. Suppose you
needed a customerID, a productID, and a supplierID to identify a price. The price is then an
attribute of a single ternary relationship among CUSTOMER, PRODUCT and SUPPLIER.
Weak Entity Set
• A weak entity is an entity that cannot be uniquely identified by its attributes alone. An entity
set which does not have sufficient attributes to form a primary key is called a weak entity set.

Strong Entity Set

An entity set that has a primary key is called a strong entity set.
Example: Player is a weak entity set because each player needs a team.

Difference between Strong and Weak Entity Set


ENHANCED ER MODEL
Specialization and Generalization AU: Dec- 19, Marks 7
• In these relationship hierarchies, some entities can act as superclasses and some other entities can
act as subclasses.
• Superclass: An entity type that represents a general concept at a high level, is called
superclass.
• Subclass: An entity type that represents a specific concept at lower levels, is called subclass.
• The subclass is said to inherit from superclass. When a subclass inherits from one or more
super classes, it inherits all their attributes. In addition to the inherited attributes, a subclass can
also define its own specific attributes.
Specialization
• The process of making subclasses from a general concept is called specialization. This is top-
down process.
Generalization
• The process of making superclass from subclasses is called generalization. This is a bottom
up process. In this process multiple sets are synthesized into high level entities.
• The symbol used for specialization/ Generalization is,

• For example - There can be two subclass entities, namely Hourly_Emps and Contract_Emps,
which are subclasses of the Employees class. We might have attributes hours_worked and
hourly_wage defined for Hourly_Emps and an attribute contractid defined for Contract_Emps.
Therefore, the attributes defined for an Hourly_Emps entity are the attributes of
Employees plus those of Hourly_Emps. We say that the attributes for the entity set Employees are
inherited by the entity set Hourly_Emps and that Hourly_Emps ISA (read "is a") Employees.
CONSTRAINTS ON SPECIALIZATION/GENERALIZATION
There are four types of constraints on a specialization/generalization relationship. These are -
1) Membership constraints: This kind of constraint determines which
entities can be members of a given lower-level entity set. There are two types of membership
constraints –
i) Condition defined: In condition-defined lower-level entity sets, membership is evaluated
on the basis of whether or not an entity satisfies an explicit condition or predicate.
ii) User defined: In user-defined lower-level entity sets, membership is assigned manually.
2) Disjoint constraints: The disjoint constraint only applies when a superclass has more than
one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only
one of the subclasses. For example - a Student entity is either a Postgraduate Student or an
Undergraduate Student, but not both.

3) Overlapping: When an entity can be a member of more than one subclass. For example
- a Person can be both a Student and a Staff member. The keyword "And" can be used to represent
this constraint.

4) Completeness: It specifies whether or not an entity in the higher-level entity set must belong
to at least one of the lower-level entity sets within the generalization/specialization.
i) Total generalization or specialization: Each higher-level entity must belong to a lower-
level entity set. For example - an Account in the bank must be either a Savings Account or a
Current Account. The keyword "Mandatory" can be used to represent this constraint.
ii) Partial generalization or specialization: Some higher-level entities may not belong to
any lower-level entity set.

Aggregation
Aggregation is a feature of the entity relationship model that allows a relationship set to
participate in another relationship set. This is indicated on an ER diagram by drawing a dashed
box around the aggregation.
Review Question
1. Explain with suitable example, the constraints of specialization and generalization in ER
modeling. AU: Dec. 19, Marks 7
Examples based on ER Diagram
Example 2.5.1 Draw the ER diagram for banking systems (home loan applications). AU: Dec.-17, Marks 8
OR Draw an ER diagram corresponding to customers and loans. AU: May-14, Marks 8
OR Write short notes on: E-R diagram for banking system. AU: Dec.-14
Solution:
Construct an ER model for the car rental company database. AU: Dec.-15, Marks 16
Solution:

ER to Relational Mapping
AU: May-17, Dec.-19, Marks 13
In this section we will discuss how to map various ER model constructs to relational model
constructs.
Mapping of Entity Set to Relation
• An entity set is mapped to a relation in a straightforward way.
• Each attribute of the entity set becomes an attribute of the table.
• The primary key attribute of the entity set becomes the primary key of the table.
• For example - Consider following ER diagram.

The converted employee table is as follows –


The following SQL statement captures the information in the above ER diagram -
CREATE TABLE Employee (
    EmpID  CHAR(11),
    EName  CHAR(30),
    Salary INTEGER,
    PRIMARY KEY (EmpID)
);

Mapping Relationship Sets(Without Constraints) to Tables


• Create a table for the relationship set.
• Add all primary keys of the participating entity sets as fields of the table.
• Add a field for each attribute of the relationship.
• Declare a primary key using all key fields from the entity sets.
• Declare foreign key constraints for all these fields from the entity sets. Example

The following SQL statement captures the information for the relationship present in the above
ER diagram. Only the key fields of the participating entity sets are stored; attributes such as
EName or DeptName stay in the Employee and Department tables -
CREATE TABLE Works_In (
    EmpID  CHAR(11),
    DeptID CHAR(11),
    PRIMARY KEY (EmpID, DeptID),
    FOREIGN KEY (EmpID) REFERENCES Employee,
    FOREIGN KEY (DeptID) REFERENCES Department
);
Mapping Relationship Sets (With Constraints) to Tables
• If a relationship set involves n entity sets and some m of them are linked via arrows in the ER
diagram, the key for any one of these m entity sets constitutes a key for the relation to which
the relationship set is mapped.
• Hence we have m candidate keys, and one of these should be designated as the primary key.
• There are two approaches used to convert a relationship set with key constraints into a table.
• Approach 1:
• In this approach the relationship set is represented by a separate table of its own.
For example - Consider the following ER diagram. Each Dept has at
most one manager, according to the key constraint on Manages.

Here the constraint is that each department has at most one manager to manage it. Hence no two
tuples can have the same DeptID, and there can be a separate table named Manages with DeptID
as primary key. The table can be defined using the following SQL statement -
CREATE TABLE Manages (
    EmpID  CHAR(11),
    DeptID INTEGER,
    Since  DATE,
    PRIMARY KEY (DeptID),
    FOREIGN KEY (EmpID) REFERENCES Employees,
    FOREIGN KEY (DeptID) REFERENCES Departments
);
• Approach 2:
• In this approach, the relationship set with key constraints is folded into the table of one
of the participating entity sets.
• It is usually the preferred approach because it avoids creating a distinct table for the
relationship set.
• The idea is to include the information about the relationship set in the table corresponding
to the entity set with the key, taking advantage of the key constraint.
• This approach eliminates the need for a separate Manages relation, and queries asking for
a department's manager can be answered without combining information from two relations.
• The only drawback to this approach is that space could be wasted if several departments
have no managers.
• The following SQL statement, defining a Dep_Mgr relation that captures the information
in both Departments and Manages, illustrates the second approach to translating relationship
sets with key constraints:
CREATE TABLE Dep_Mgr (
    DeptID INTEGER,
    DName  CHAR(20),
    Budget REAL,
    EmpID  CHAR(11),
    Since  DATE,
    PRIMARY KEY (DeptID),
    FOREIGN KEY (EmpID) REFERENCES Employees
);

Mapping Weak Entity Sets to Relational Mapping


• A weak entity can be identified uniquely only by considering the primary key of another
(owner) entity. The following steps are used for mapping a weak entity set to a relation:
• Create a table for the weak entity set.
• Make each attribute of the weak entity set a field of the table.
• Add fields for the primary key attributes of the identifying owner.
• Declare a foreign key constraint on these identifying owner fields.
• Instruct the system to automatically delete any tuples in the table for which there is no
owner. For example - Consider the following ER model,

The following SQL statement illustrates this mapping -
CREATE TABLE Department (
    DeptID   CHAR(11),
    DeptName CHAR(20),
    Bldg_No  CHAR(5),
    PRIMARY KEY (DeptID, Bldg_No),
    FOREIGN KEY (Bldg_No) REFERENCES Buildings ON DELETE CASCADE
);
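The cascading delete described in the last step can be tried out with a minimal sketch in Python, using the standard sqlite3 module. The Bldg_Name column and the sample rows are assumptions made only for illustration; SQLite also needs foreign key enforcement switched on explicitly, which other database systems may not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Owner (strong) entity set. Bldg_Name is a hypothetical attribute.
conn.execute(
    "CREATE TABLE Buildings (Bldg_No CHAR(5) PRIMARY KEY, Bldg_Name CHAR(20))"
)
# Weak entity set: its key includes the key of the owner.
conn.execute(
    """
    CREATE TABLE Department (
        DeptID   CHAR(11),
        DeptName CHAR(20),
        Bldg_No  CHAR(5),
        PRIMARY KEY (DeptID, Bldg_No),
        FOREIGN KEY (Bldg_No) REFERENCES Buildings (Bldg_No) ON DELETE CASCADE
    )
    """
)

# Hypothetical sample data.
conn.executemany("INSERT INTO Buildings VALUES (?, ?)",
                 [("B1", "Main"), ("B2", "Annex")])
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [("D1", "CSE", "B1"), ("D2", "Physics", "B2")])

# Deleting the owner removes the dependent weak-entity tuples as well.
conn.execute("DELETE FROM Buildings WHERE Bldg_No = 'B1'")
remaining = conn.execute(
    "SELECT DeptID, DeptName, Bldg_No FROM Department ORDER BY DeptID"
).fetchall()
print(remaining)
```

After the delete, only the department located in the surviving building is left, which is the behaviour ON DELETE CASCADE is meant to provide.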
Mapping of Specialization / Generalization (EER Construct)to Relational Mapping
The specialization/generalization relationship (Enhanced ER construct) can be
mapped to database tables (relations) using three methods. To demonstrate the methods, we will
take the entities InventoryItem, Book and DVD, where Book and DVD are subclasses of InventoryItem.

Method 1: All the entities in the relationship are mapped to individual tables
InventoryItem(ID, name)
Book(ID,Publisher)
DVD(ID, Manufacturer)
Method 2: Only subclasses are mapped to tables. The attributes in the superclass are duplicated
in all subclasses. For example -
Book(ID,name, Publisher)
DVD(ID, name, Manufacturer)
Method 3: Only the superclass is mapped to a table. The attributes in the subclasses are taken
to the superclass. For example -
InventoryItem(ID, name, Publisher, Manufacturer)
This method will introduce null values. When we insert a Book record in the table, the
Manufacturer column value will be null. In the same way, when we insert a DVD record in the
table, the Publisher value will be null.
Part II: Relational Database Design
Concept of Relational Database Design AU: Dec.-19, Marks 7
• There are two primary goals of relational database design -
i) To store information without unnecessary redundancy,
ii) To allow us to retrieve information easily.
• For achieving these goals, the database design needs to be normalized. That means we have to
check whether the schema is in normal form or not.
• For checking the normal form of the schema, it is necessary to check the functional
dependencies and other data dependencies that exist within the schema.
Functional Dependencies
Definition: Let P and Q be sets of columns. Then P functionally determines Q, written P→Q,
if and only if any two rows that are equal on (all the attributes in) P must also be equal on
(all the attributes in) Q.
In other words, the functional dependency holds if, for any two rows T1 and T2, T1.P = T2.P
implies T1.Q = T2.Q.
For example: Consider a relation in which the roll number (R) of the student and his/her name (N)
are stored as follows:

Here, R->N is true. That means the functional dependency holds here, because for every
assigned roll number there is exactly one student name.
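The definition can be checked mechanically on a relation instance. The following Python sketch is only an illustration: the function name fd_holds and the sample rows are assumptions, not data from the notes. It tests whether any two rows that agree on the left side also agree on the right side.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the value every later row must match.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance: R = roll number, N = name.
students = [
    {"R": 1, "N": "Asha"},
    {"R": 2, "N": "Ravi"},
    {"R": 3, "N": "Asha"},
]

r_determines_n = fd_holds(students, ["R"], ["N"])
n_determines_r = fd_holds(students, ["N"], ["R"])
print(r_determines_n, n_determines_r)
```

In this sample, R->N holds, while N->R does not, because two different roll numbers share the same name. Note that an instance can only refute a dependency; whether it holds in general is a statement about the schema.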

Computing Closure Set of Functional Dependency


The closure of a set F of functional dependencies is the set of all functional dependencies
implied by F. It is denoted by F+.
The closure can be computed using three basic rules, which are
also called Armstrong's Axioms.
These are as follows -
i) Reflexivity: If Y ⊆ X, then X → Y
ii) Augmentation: If X → Y, then XZ → YZ for any Z
iii) Transitivity: If X → Y and Y → Z, then X → Z
In addition to the above axioms, some additional rules for computing the closure are as follows -
• Union: If X → Y and X → Z, then X → YZ
• Decomposition: If X → YZ, then X → Y and X → Z
Example 2.8.1 Compute the closure of the following set of functional dependencies for a
relation scheme R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A}
Solution: Consider F as follows
A->BC
CD->E
B->D
E->A
The closure can be written for each attribute of the relation as follows:
• (A)+ = Step 1: {A} -> the attribute itself
Step 2: {ABC} as A->BC
Step 3: {ABCD} as B->D
Step 4: {ABCDE} as CD->E
Step 5: {ABCDE} as E->A and A is already present
Hence (A)+ = {ABCDE}
• (B)+ = Step 1: {B}
Step 2: {BD} as B->D
Step 3: {BD} as neither D nor BD appears on the LHS of F
Hence (B)+ = {BD}
• (C)+ = Step 1: {C}
Step 2: {C} as there is no single C on the LHS of F
Hence (C)+ = {C}
• (D)+ = Step 1: {D}
Step 2: {D} as there is no single D on the LHS of F
Hence (D)+ = {D}
• (E)+ = Step 1: {E}
Step 2: {EA} as E->A
Step 3: {EABC} as A->BC
Step 4: {EABCD} as B->D
Step 5: {EABCD} as CD->E and E is already present
By rearranging we get {ABCDE}
Hence (E)+ = {ABCDE}
• (CD)+ = Step 1: {CD}
Step 2: {CDE} as CD->E
Step 3: {CDEA} as E->A
Step 4: {CDEAB} as A->BC
By rearranging we get {ABCDE}
Hence (CD)+ = {ABCDE}
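The step-by-step procedure used above can be sketched in Python as a simple fixed-point loop. This is a minimal sketch under the assumption that every attribute is a single character, so that an attribute set can be written as a string such as "CD"; the function name closure is chosen only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when the whole left side is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F from Example 2.8.1.
F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

closures = {x: "".join(sorted(closure(x, F)))
            for x in ["A", "B", "C", "D", "E", "CD"]}
print(closures)
```

The loop stops as soon as a full pass over F adds no new attribute, which mirrors the "already present" steps in the worked example.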
Example 2.8.2 Compute the closure of the following set of functional dependencies for a
relation scheme R(A,B,C,D,E), F={A->BC, CD->E, B->D, E->A} and find the candidate keys.
Solution:
We can identify the candidate keys of the given relation schema with the help of the functional
dependencies. For that purpose, we compute the closure of attribute sets and look for the
minimal sets whose closure is the complete relation R(A,B,C,D,E).
(A)+ = {ABCDE}
(B)+ = {BD}
(C)+ = {C}
(D)+ = {D}
(E)+ = {ABCDE}
(BC)+ = {ABCDE}
(CD)+ = {ABCDE}
Clearly, (A)+, (E)+, (BC)+ and (CD)+ give us {ABCDE}, i.e. the complete relation R, and no proper
subset of BC or CD does so. Hence A, E, BC and CD are the candidate keys.
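A brute-force search for candidate keys can be sketched in Python as follows. It is an illustrative sketch, assuming single-character attribute names, and it simply tries attribute sets in increasing size, keeping those whose closure is all of R and that do not contain a smaller key. On the FDs of this example it also reports BC, since (BC)+ = {ABCDE} via B->D, CD->E and E->A.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # Skip supersets of a key found earlier: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
key_names = sorted("".join(sorted(k)) for k in candidate_keys("ABCDE", F))
print(key_names)
```

The search is exponential in the number of attributes, so it is suitable only for small textbook schemas like this one.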

Canonical Cover or Minimal Cover


Definition: A minimal cover for a set F of FDs is a set G of FDs, equivalent to F, that is
obtained as follows:
1) Write each functional dependency in such a way that its right side contains one attribute. Ex:
X->A, where A is a single attribute.
2) Find the closure of attributes and delete the redundant functional dependencies.
3) Check whether any left side attribute can be removed.
Concept of Extraneous Attributes
Definition: An attribute of a functional dependency is said to be extraneous if we can remove
it without changing the closure of the set of functional dependencies.
Algorithm for computing Canonical Cover for set of functional Dependencies F
Fc = F
repeat
    Use the union rule to replace any dependencies in Fc of the form
    α1→β1 and α1→β2 with α1→β1β2
    Find a functional dependency α→β in Fc with an extraneous attribute either in α or in β
    /* The test for extraneous attributes is done using Fc, not F */
    If an extraneous attribute is found, delete it from α→β in Fc.
until (Fc does not change)
Example 2.8.3 Consider the following functional dependencies over the attribute set
R(ABCDE) for finding the minimal cover: FD = {A->C, AC->D, B->ADE}
Solution: Step 1: Split the FDs such that the R.H.S. contains a single attribute. Hence we get
A->C
AC->D
B->A
B->D
B->E
Step 2: Find the redundant entries and delete them. This can be done as follows
• For A->C: We find (A)+ by assuming that we delete A->C temporarily. We get
(A)+ = {A}. Thus it is not possible to obtain C from A after deleting A->C. This means we
cannot delete A->C.
• For AC->D: We find (AC)+ by assuming that we delete AC->D temporarily. We get
(AC)+ = {AC}. Thus after such a deletion it is not possible to obtain D. This means we cannot
delete AC->D.
• For B->A: We find (B)+ by assuming that we delete B->A temporarily. We get (B)+ = {BDE}.
Thus after such a deletion it is not possible to obtain A. This means we cannot delete B->A.
• For B->D: We find (B)+ by assuming that we delete B->D temporarily. We get
(B)+ = {BEACD}. This shows clearly that even if we delete B->D we can still obtain D. This means
we can delete B->D. Thus it is redundant.
• For B->E: We find (B)+ by assuming that we delete B->E temporarily. We get (B)+ = {BDAC}.
Thus after such a deletion it is not possible to obtain E. This means we cannot delete B->E.
To summarize we get now
A->C
AC->D
B->A
B->E
Thus the R.H.S. gets simplified.
Step 3: Now we will simplify the L.H.S.
Consider AC->D. Here we check whether A or C is extraneous. For that we find the closures of
A and C.
Since A->C, (A)+ contains C
(C)+ = {C}
Thus C can be obtained from A alone, whereas A cannot be obtained from C. That means we need not
have AC on the L.H.S. Instead, A alone is sufficient and C can be eliminated. Thus after
simplification we get
A->D
To summarize we get now
A->C
A->D
B->A
B->E
Thus the L.H.S. gets simplified.
Step 4: The simplified FDs with the same L.H.S. can be combined together to form
A->CD
B->AE
This is a minimal cover or canonical cover of the functional dependencies.
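The computation of a minimal cover can be sketched in Python. This is a hedged sketch rather than the exact procedure of the worked example: it assumes single-character attribute names, and it removes extraneous left-side attributes before deleting redundant FDs, which is the order many textbooks use. The helper names minimal_cover and combine are chosen only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every right side into single attributes.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # Step 2: drop left-side attributes that are not needed to derive the right side.
    reduced = []
    for lhs, a in cover:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, cover):
                lhs = smaller
        reduced.append((lhs, a))
    cover = list(dict.fromkeys(reduced))  # remove duplicates, keep order

    # Step 3: delete every FD that still follows from the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


def combine(cover):
    """Union rule: merge FDs that share the same left side."""
    grouped = {}
    for lhs, a in cover:
        grouped.setdefault("".join(sorted(lhs)), []).append(a)
    return {lhs: "".join(sorted(rhs)) for lhs, rhs in grouped.items()}


F = [("A", "C"), ("AC", "D"), ("B", "ADE")]
result = combine(minimal_cover(F))
print(result)
```

For the FDs of Example 2.8.3 the sketch arrives at the same cover as the manual steps, A->CD and B->AE. A minimal cover is not unique in general, so a different processing order may give a different but equivalent answer for other inputs.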

Concept of Redundancy and Anomalies


Definition: Redundancy is a condition created in a database in which the same piece of data is
held at two different places.
Redundancy is at the root of several problems associated with relational schemas.
Problems caused by redundancy: The following problems can be caused by redundancy -
i) Redundant storage: Some information is stored repeatedly.
ii) Update anomalies: If one copy of such repeated data is updated, then an inconsistency is
created unless all other copies are similarly updated.
iii) Insertion anomalies: Due to the insertion of a new record, repeated information gets added
to the relation.
iv) Deletion anomalies: Due to the deletion of a particular record, some other important
information associated with the deleted record also gets deleted, and thus we may lose some
other important information from the relation.
Example: The following example illustrates the above discussed anomalies or redundancy
problems.
Consider the following schema in which all possible information about Employee is stored.

1) Redundant storage: Note that the information about DeptID, DeptName and DeptLoc is
repeated.
2) Update anomalies: In the above table, if we change the DeptLoc from Pune to Chennai in only
one tuple, it will result in inconsistency, as for DeptID 101 the DeptLoc is still Pune in the
other tuples. Otherwise, we need to update multiple copies of DeptLoc from Pune to Chennai.
Hence this is an update anomaly.
3) Insertion anomalies: For the above table, if we want to add a new tuple, say (5, EEE, 50000),
for DeptID 101, then the information (101, XYZ, Pune) will be repeated.
4) Deletion anomalies: For the above table, if we delete the record for EmpID 4, then
the information about DeptID 102, DeptName PQR and DeptLoc Mumbai will automatically get deleted
and one may no longer be aware of DeptID 102. This causes a deletion anomaly.
Decomposition
• Decomposition is the process of breaking down one table into multiple tables.
• The formal definition of decomposition is -
• A decomposition of a relation schema R consists of replacing the relation schema by two
(or more) relation schemas that each contain a subset of the attributes of R and together
include all attributes of R. The data is stored as projections of the original instance.
• For example - Consider the following Employee_Department table -
We can decompose the above relation schema into two relation schemas, Employee (Eid,
Ename, Age, City, Salary) and Department (Deptid, Eid, DeptName), as follows –
Employee Table

Department Table

• Decomposition is used for eliminating redundancy.
• For example: Consider the following relation schema R in which we assume that the grade
determines the salary. This dependency causes redundancy.
Schema R
• Hence, the above table can be decomposed into two schemas S and T as follows:

Properties Associated with Decomposition


There are two properties associated with decomposition and those are -
1) Loss-less Join or Non-loss Decomposition: When all information found in the original
database is preserved after decomposition, we call it a lossless or non-loss decomposition.
2) Dependency Preservation: This is a property in which the constraints on the original table
can be maintained by simply enforcing some constraints on each of the smaller relations.

Non-loss Decomposition or Loss-less Join


A decomposition of R into R1 and R2 is a lossless join if the following three conditions hold:
i) The union of the attributes of R1 and R2 must be equal to the attributes of R. Each attribute
of R must be either in R1 or in R2.
Att(R1) ∪ Att(R2) = Att(R)
ii) The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
iii) The common attributes must be a key for at least one relation (R1 or R2), i.e.
Att(R1) ∩ Att(R2) -> Att(R1) or
Att(R1) ∩ Att(R2) -> Att(R2)
Example 2.10.1 Consider the relation R(A,B,C,D) with the FD A->BC, decomposed into
R1(A,B,C) and R2(A,D). Check whether the decomposition is a lossless join or not.
Solution:
Step 1: Here Att(R1) ∪ Att(R2) = Att(R), i.e. (A,B,C) ∪ (A,D) = (A,B,C,D), i.e. R.
Step 2: Here R1 ∩ R2 = {A}. Thus Att(R1) ∩ Att(R2) ≠ Φ, and the second condition is
satisfied.
Step 3: Att(R1) ∩ Att(R2) = {A}. Now (A)+ = {A,B,C} = attributes of R1. Thus the third
condition is satisfied.
This shows that the given decomposition is a lossless join.
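The three conditions can be tested with a short Python sketch. It assumes single-character attribute names and a decomposition into exactly two relations; the function name is_lossless_binary and the second test schema are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Check the three lossless-join conditions for R split into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:            # condition i: no attribute is lost
        return False
    common = r1 & r2
    if not common:              # condition ii: intersection is not empty
        return False
    reach = closure(common, fds)
    # condition iii: the common attributes determine all of R1 or all of R2
    return r1 <= reach or r2 <= reach


# Example 2.10.1: R(A,B,C,D), A->BC, R1(A,B,C), R2(A,D).
example_ok = is_lossless_binary("ABCD", "ABC", "AD", [("A", "BC")])

# Hypothetical lossy split: R(A,B,C), A->B, R1(A,B), R2(B,C).
example_lossy = is_lossless_binary("ABC", "AB", "BC", [("A", "B")])
print(example_ok, example_lossy)
```

In the second case the common attribute B determines neither A nor C, so joining the two projections can produce tuples that were not in the original relation.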
Example 2.10.2 Consider the relation R(A,B,C,D,E,F) with FDs A->BC, C->A, D->E, F->A,
E->D, decomposed into R1(A,C,D), R2(B,C,D) and R3(E,F,D). Check whether the decomposition
is lossless.
Solution:
Step 1: R1 ∪ R2 ∪ R3 = R. Here the first condition for checking lossless join is satisfied as
(A,C,D) ∪ (B,C,D) ∪ (E,F,D) = {A,B,C,D,E,F}, which is nothing but R.
Step 2: Consider R1 ∩ R2 = {CD} and R2 ∩ R3 = {D}. Hence the second condition, that the
intersection is not empty, is satisfied.
Step 3: Now, consider R1(A,C,D) and R2(B,C,D). We find R1 ∩ R2 = {CD}.
(CD)+ = {ABCDE}, which contains all attributes of R1, i.e. {A,C,D}. Hence condition 3 for
checking lossless join for R1 and R2 is satisfied.
Step 4: Now, consider R2(B,C,D) and R3(E,F,D). We find R2 ∩ R3 = {D}. (D)+ = {D,E}, which
contains neither the complete set of attributes of R2 nor that of R3. [Note that F is missing
from the attributes of R3.]
Hence it is not a lossless join decomposition. In other words, it is a lossy
decomposition.

Dependency Preservation
• Definition: A decomposition D = {R1, R2, R3, ..., Rn} of R is dependency preserving for a
set F of functional dependencies if (F1 ∪ F2 ∪ ... ∪ Fn)+ = F+, where Fi is the set of
dependencies in F+ that involve only attributes of Ri.
• If a decomposition is not dependency preserving, some dependency is lost in the
decomposition.
Example 2.10.4 Consider the relation R (A, B, C) with functional dependency set {A->B,
B->C}, which is decomposed into two relations R1 = (A, C) and R2 = (B, C). Check whether this
decomposition is dependency preserving or not.
Solution: This can be solved in the following steps:
Step 1: For checking whether the decomposition is dependency preserving or not we need to
check the following condition
F+ = (F1 ∪ F2)+
Step 2: We have F = {A->B, B->C}
Step 3: Let us find (F1)+ for relation R1 and (F2)+ for relation R2

Step 4: We will eliminate all the trivial and useless dependencies. Hence we obtain
F1 = {A->C} for R1 (since A->B and B->C give A->C) and F2 = {B->C} for R2.

(F1 ∪ F2)+ = {A->C, B->C}+ ≠ {A->B, B->C}+ i.e. F+, because A->B cannot be derived.

Thus the condition specified in step 1, i.e. F+ = (F1 ∪ F2)+, is not true. Hence it is not a
dependency preserving decomposition.
Example 2.10.5 Let relation R(A,B,C,D) be a relational schema with the following functional
dependencies {A->B, B->C, C->D, D->B}. R is decomposed into (A,B), (B,C) and
(B,D). Check whether this decomposition is dependency preserving or not.
Solution:
Step 1: Let F = {A->B, B->C, C->D, D->B}.
Step 2: We will find (F1)+, (F2)+, (F3)+ for relations R1(A,B), R2(B,C) and R3(B,D).
Step 3: We will eliminate all the trivial and useless dependencies. Hence we obtain
F1 = {A->B}, F2 = {B->C, C->B} and F3 = {B->D, D->B}.

Step 4: From the above FDs, A->B, B->C and D->B appear directly, and C->D follows from
C->B and B->D.

Step 5: This proves that F+ = (F1 ∪ F2 ∪ F3)+. Hence the given decomposition is dependency
preserving.
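Both examples can be replayed with a Python sketch that projects F onto each relation of the decomposition and then checks whether every FD of F follows from the union of the projections. It assumes single-character attribute names; the function names project and is_dependency_preserving are illustrative, and the projection enumerates all attribute subsets, so it is meant for small schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, attrs):
    """Non-trivial FDs implied by fds that mention only attributes in attrs."""
    attrs = set(attrs)
    projected = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            rhs = (closure(combo, fds) & attrs) - set(combo)
            if rhs:
                projected.append((set(combo), rhs))
    return projected


def is_dependency_preserving(fds, decomposition):
    union = [fd for part in decomposition for fd in project(fds, part)]
    # Every original FD must be derivable from the projected FDs alone.
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)


# Example 2.10.4: A->B is lost.
ex_4 = is_dependency_preserving([("A", "B"), ("B", "C")], ["AC", "BC"])

# Example 2.10.5: all dependencies survive.
ex_5 = is_dependency_preserving(
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")], ["AB", "BC", "BD"]
)
print(ex_4, ex_5)
```

The check compares closures of attribute sets instead of materialising F+ and (F1 ∪ F2 ∪ ...)+, which would be much larger but leads to the same verdict.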

Part –III
NORMALIZATION IN DBMS
Normalization in DBMS is a technique using which you can organize the data in the database
tables so that:
• There is less repetition of data,
• A large set of data is structured into a bunch of smaller tables,
• and the tables have a proper relationship between them.
DBMS Normalization is a systematic approach to decompose (break down) tables to
eliminate data redundancy (repetition) and undesirable characteristics like Insertion anomaly,
Update anomaly, and Delete anomaly.
It is a multi-step process that puts data into tabular form, removes duplicate data, and sets up
the relationship between tables.

Why we need Normalization in DBMS?


Normalization is required for,
• Eliminating redundant (repeated) data and thereby protecting data integrity, because if data
is repeated it increases the chances of inconsistent data.
• Keeping data consistent by storing the data in one table and
referencing it everywhere else.
• Storage optimization, although this matters less these days because database storage
is cheap.
• Breaking down large tables into smaller tables with relationships, which makes the
database structure more scalable and adaptable.
• Ensuring data dependencies make sense, i.e. data is logically stored.

Problems without Normalization in DBMS


If a table is not properly normalized and has data redundancy (repetition), then it will not
only eat up extra storage space but will also make it difficult for you to handle and update
the data in the database without losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not
normalized. To understand these anomalies let us take an example of a Student table.

Rollno   Name   Branch   Hod     office_tel
401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337

In the table above, we have data for three Computer Science students.
As we can see, the values of the fields Branch, Hod (Head of Department), and office_tel are
repeated for the students who are in the same branch of the college. This is Data Redundancy.

1. Insertion Anomaly in DBMS

• For a new admission, the student's data cannot be inserted until the student opts for a
branch, or else we will have to set the branch information to NULL.
• Also, if we have to insert data for 100 students of the same branch, the branch
information will be repeated for all those 100 students.
• These scenarios are Insertion anomalies.
• If the same data has to be repeated in every row, it is better to keep that data
separately and reference it from each row.
• So for the above table, we can keep the branch information separately and store just
a branch_id in the Student table, where branch_id can be used to look up the branch
information.
2. Updation Anomaly in DBMS

• What if Mr. X leaves the college, or is no longer the HOD of the Computer Science
department? In that case, all the student records will have to be updated, and if
we miss any record by mistake, it will lead to data inconsistency.
• This is an Updation anomaly: every record in the table has to be updated just because
one piece of information changed.

3. Deletion Anomaly in DBMS

• In our Student table, two different pieces of information are kept together: the Student
information and the Branch information.
• So if only a single student is enrolled in a branch, and that student leaves the college
or the student's entry is deleted for some reason, we will lose the branch
information too.
• So in a DBMS we should never keep two different entities together in one table, which
in the above example are Student and Branch.
The solution for all three anomalies described above is to keep the student information and
the branch information in two different tables, and to use the branch_id in the Student table
to reference the branch.
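
The two-table design can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table and column names (branch, student, branch_id), the Mechanical branch row and the replacement HOD name 'Mr. Y' are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (
    branch_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    hod        TEXT NOT NULL,
    office_tel TEXT NOT NULL
);
CREATE TABLE student (
    rollno    INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    branch_id INTEGER REFERENCES branch(branch_id)
);
INSERT INTO branch VALUES (1, 'CSE', 'Mr. X', '53337');
INSERT INTO student VALUES (401, 'Akon', 1), (402, 'Bkon', 1), (403, 'Ckon', 1);
""")

# Insertion: a branch can be recorded before any student joins it (hypothetical row).
conn.execute("INSERT INTO branch VALUES (2, 'Mechanical', 'Mr. Z', '53338')")

# Updation: a change of HOD touches exactly one row, however many students exist.
hod_rows_changed = conn.execute(
    "UPDATE branch SET hod = 'Mr. Y' WHERE name = 'CSE'"
).rowcount

# Deletion: removing every CSE student does not remove the CSE branch itself.
conn.execute("DELETE FROM student WHERE branch_id = 1")
branches_after_delete = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]

print(hod_rows_changed, branches_after_delete)   # prints 1 2
```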

PRIMARY KEY AND NON-KEY ATTRIBUTES


Before we move on to the different Normal Forms in DBMS, let's first understand
what a primary key is and what non-key attributes are.

In a Student table, a column such as student_id (Rollno in the table above) is a primary key
because its value uniquely identifies each row of data; the remaining columns then become
the non-key attributes.
TYPES OF DBMS NORMAL FORMS

Normalization rules are divided into the following normal forms:


1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
6. Fifth Normal Form

1. First Normal Form (1NF)


For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. And the order in which data is stored should not matter.
Example.
If we have an Employee table in which we store the employee information along with
the employee skillset, the table will look like this:
emp_id   emp_name       emp_mobile   emp_skills
1        John Tick      9999957773   Python, JavaScript
2        Darth Trader   8888853337   HTML, CSS, JavaScript
3        Rony Shark     7777720008   Java, Linux, C++



The above table has 4 columns:
• All the columns have different names.
• All the columns hold values of the same type: emp_name holds all the
names, emp_mobile holds all the contact numbers, etc.
• The order in which we save data doesn't matter.
• But the emp_skills column holds multiple comma-separated values, while as per the
First Normal Form, each column should hold a single value.
Hence the above table is not in the First Normal Form.
To convert the above table to 1NF, we can do one of the following:
i) Remove the emp_skills column from the Employee table and keep it in a separate
table.
ii) Or add multiple rows for each employee, with each row linked to one skill.

i) Create Separate tables for Employee and Employee Skills

The Employee table and the new Employee_Skill table will look like this:

Employee table:
emp_id   emp_name       emp_mobile
1        John Tick      9999957773
2        Darth Trader   8888853337
3        Rony Shark     7777720008

Employee_Skill table:
emp_id   emp_skill
1        Python
1        JavaScript
2        HTML
2        CSS
2        JavaScript
3        Java
3        Linux
3        C++

ii) Add Multiple rows for Multiple skills
We can also simply add multiple rows to add multiple skills. This will lead to repetition
of the data, but that can be handled as you further Normalize your data using the
Second Normal form and the Third Normal form.
emp_id   emp_name       emp_mobile   emp_skill
1        John Tick      9999957773   Python
1        John Tick      9999957773   JavaScript
2        Darth Trader   8888853337   HTML
2        Darth Trader   8888853337   CSS
2        Darth Trader   8888853337   JavaScript
3        Rony Shark     7777720008   Java
3        Rony Shark     7777720008   Linux
3        Rony Shark     7777720008   C++
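
Option (i), splitting the comma-separated emp_skills column into a separate table, can be sketched in a few lines of Python. The function name to_1nf and the tuple layout are assumptions made for this illustration.

```python
# Rows of the original Employee table: (emp_id, emp_name, emp_mobile, emp_skills)
employees = [
    (1, "John Tick", "9999957773", "Python, JavaScript"),
    (2, "Darth Trader", "8888853337", "HTML, CSS, JavaScript"),
    (3, "Rony Shark", "7777720008", "Java, Linux, C++"),
]

def to_1nf(rows):
    """Split the multi-valued skills column into one (emp_id, emp_skill) row per skill."""
    employee = [(emp_id, name, mobile) for emp_id, name, mobile, _ in rows]
    employee_skill = [
        (emp_id, skill.strip())
        for emp_id, _, _, skills in rows
        for skill in skills.split(",")
    ]
    return employee, employee_skill

employee, employee_skill = to_1nf(employees)
for row in employee_skill:
    print(row)   # one atomic skill per row, e.g. (1, 'Python')
```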

2. Second Normal Form (2NF)


For a table to be in the Second Normal Form,
1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.
Example to understand Partial dependency and the Second Normal Form.
What is Partial Dependency?
When a table has a primary key that is made up of two or more columns, then all the
columns (not included in the primary key) in that table should depend on the entire primary
key and not on a part of it. If any column (which is not in the primary key) depends on a part
of the primary key, then we say we have Partial dependency in the table.
Example:
Suppose we have two tables, Student and Subject, to store student information and information
related to subjects.

Student table:
student_id   student_name   branch
1            Akon           CSE
2            Bkon           Mechanical

Subject table:
subject_id   subject_name
1            C Language
2            DSA
3            Operating System

And we have another table Score to store the marks scored by students in any subject like this,

student_id   subject_id   Marks   teacher_name
1            1            70      Miss. C
1            2            82      Mr. D
2            1            65      Miss. C

In the above table, the primary key is student_id + subject_id, because both pieces of
information are required to identify any row of data.
But the Score table also has a column teacher_name, which depends only on the subject,
i.e. on subject_id alone, which is just a part of the primary key. This is a Partial dependency,
so we should not keep that information in the Score table.
The column teacher_name should be in the Subject table. Then the entire system
will be normalized as per the Second Normal Form.
Updated Subject table:
subject_id   subject_name       teacher_name
1            C Language         Miss. C
2            DSA                Mr. D
3            Operating System   Mr. Op

Updated Score table:
student_id   subject_id   Marks
1            1            70
1            2            82
2            1            65
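
Whether a column depends on only part of the key can be tested against sample rows. The sketch below uses a hypothetical helper, holds, that reports whether a functional dependency is consistent with the given rows. It assumes that subject 1 is taught by Miss. C in every Score row, as recorded in the updated Subject table. Sample data can only refute a dependency, never prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Score table before 2NF (assumed data: subject 1 is always taught by Miss. C).
score = [
    {"student_id": 1, "subject_id": 1, "marks": 70, "teacher_name": "Miss. C"},
    {"student_id": 1, "subject_id": 2, "marks": 82, "teacher_name": "Mr. D"},
    {"student_id": 2, "subject_id": 1, "marks": 65, "teacher_name": "Miss. C"},
]

# teacher_name is determined by subject_id alone: a partial dependency.
print(holds(score, ["subject_id"], ["teacher_name"]))        # prints True
# marks needs the whole key student_id + subject_id.
print(holds(score, ["subject_id"], ["marks"]))               # prints False
print(holds(score, ["student_id"], ["marks"]))               # prints False
print(holds(score, ["student_id", "subject_id"], ["marks"])) # prints True
```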

3. Third Normal Form (3NF)


A table is said to be in the Third Normal Form when,
1. It satisfies the First Normal Form and the Second Normal form.
2. And, it doesn't have Transitive Dependency.
What is Transitive Dependency?
In a table, some column (or set of columns) acts as the primary key and the other columns
depend on it. But what if a column that is not part of the primary key depends on another
column that is also not part of the primary key? Then we have a Transitive dependency in our table.
Let's take an example. We had the Score table in the Second Normal Form above. Suppose we have
to store some extra information in it:
1. exam_type
2. total_marks
These store the type of exam and the total marks in the exam, so that we can later calculate the
percentage of marks scored by each student.
The Score table will look like this:

student_id   subject_id   Marks   exam_type   total_marks
1            1            70      Theory      100
1            2            82      Theory      100
2            1            42      Practical   50

• In the table above, the column exam_type depends on
both student_id and subject_id, because
o a student can be in the CSE branch or the Mechanical branch,
o and based on that they may have different exam types for different subjects.
o The CSE students may have both Practical and Theory exams for Compiler Design,
o whereas Mechanical branch students may only have Theory exams for Compiler
Design.
• But the column total_marks depends only on the exam_type column, and
exam_type is not part of the primary key, which is student_id + subject_id.
Hence we have a Transitive dependency here:
student_id + subject_id → exam_type → total_marks.
How to remove Transitive Dependency?
We create a separate table for ExamType and use it in the Score table.
New ExamType table:

exam_type_id   exam_type   total_marks   duration
1              Practical   50            45
2              Theory      100           180
3              Workshop    150           300

We have created a new table ExamType and added more related information
to it, such as duration (duration of the exam in minutes), and now we can use the
exam_type_id in the Score table.
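
As a rough illustration of how the two tables work together, the sketch below stores them in an in-memory SQLite database and computes the percentage of marks with a join. The SQL table names score and exam_type and the mapping of each score row to an exam_type_id are assumptions based on the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exam_type (
    exam_type_id INTEGER PRIMARY KEY,
    exam_type    TEXT NOT NULL,
    total_marks  INTEGER NOT NULL,
    duration     INTEGER NOT NULL      -- duration of the exam in minutes
);
CREATE TABLE score (
    student_id   INTEGER,
    subject_id   INTEGER,
    marks        INTEGER,
    exam_type_id INTEGER REFERENCES exam_type(exam_type_id),
    PRIMARY KEY (student_id, subject_id)
);
INSERT INTO exam_type VALUES
    (1, 'Practical', 50, 45), (2, 'Theory', 100, 180), (3, 'Workshop', 150, 300);
INSERT INTO score VALUES (1, 1, 70, 2), (1, 2, 82, 2), (2, 1, 42, 1);
""")

# total_marks is stored once per exam type and looked up through exam_type_id.
percentages = conn.execute("""
    SELECT s.student_id, s.subject_id, 100.0 * s.marks / e.total_marks
    FROM score AS s JOIN exam_type AS e USING (exam_type_id)
    ORDER BY s.student_id, s.subject_id
""").fetchall()

print(percentages)   # prints [(1, 1, 70.0), (1, 2, 82.0), (2, 1, 84.0)]
```

If the total marks of the Practical exam ever change, only one row of exam_type has to be updated.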
4. Boyce-Codd Normal Form (BCNF)

• Boyce-Codd Normal Form or BCNF is an extension of the Third Normal Form, and is
also known as 3.5 Normal Form.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:

• It should be in the Third Normal Form.
• And, for any non-trivial dependency A → B, A should be a super key.
In particular, in a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Example:
We have a college enrolment table with columns student_id, subject and professor.

student_id   Subject   Professor
101          Java      P.Java
101          C++       P.Cpp
102          Java      P.Java2
103          C#        P.Chash
104          Java      P.Java

In the table above:

• One student can enrol for multiple subjects. For example, the student with student_id 101
has opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• And there can be multiple professors teaching one subject, as we have for Java.
In the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table.
One more important point here is that one professor teaches only one subject, but one subject
may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on
the professor name.
This table satisfies the 1st Normal Form because all the values are atomic, column names are
unique and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.

But this table is not in Boyce-Codd Normal Form.

In the table above, student_id and subject form the primary key, which
means the subject column is a prime attribute.
But there is one more dependency, professor → subject.
Here professor on its own is not a super key (a professor name does not identify a student),
yet it determines the prime attribute subject, which is not allowed by BCNF.

How to satisfy BCNF?


To make this relation (table) satisfy BCNF, we decompose it into two
tables, a Student table and a Professor table.

Below we have the structure of both tables.

Student table:
student_id   p_id
101          1
101          2
and so on...

Professor table:
p_id   professor   subject
1      P.Java      Java
2      P.Cpp       C++
and so on...

Now both relations satisfy the Boyce-Codd Normal Form.
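
The BCNF rule (the left side of every non-trivial dependency must be a super key) can be checked with an attribute-closure routine. The sketch below is illustrative only: the helper names closure and bcnf_violations are assumptions, and for the decomposed Professor table it additionally assumes that professor names are unique, so that professor → p_id holds.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial dependencies whose left side is not a super key of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]

enrolment = {"student_id", "subject", "professor"}
enrolment_fds = [
    ({"student_id", "subject"}, {"professor"}),
    ({"professor"}, {"subject"}),
]
print(bcnf_violations(enrolment, enrolment_fds))

professor_table = {"p_id", "professor", "subject"}
professor_fds = [
    ({"p_id"}, {"professor", "subject"}),
    ({"professor"}, {"p_id"}),       # assumption: professor names are unique
    ({"professor"}, {"subject"}),
]
print(bcnf_violations(professor_table, professor_fds))   # prints []
```

The first call reports the single offending dependency, professor → subject; the second reports none because, under the stated assumption, professor is itself a key of the Professor table.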

5. Fourth Normal Form (4NF)

Rules for 4th Normal Form


For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

1. It should be in the Boyce-Codd Normal Form.


2. And, the table should not have any Multi-valued Dependency.

What is Multi-valued Dependency?


A table is said to have a multi-valued dependency if the following conditions are true:

1. For a dependency A →→ B (A multi-determines B), a single value of A is associated with
multiple values of B.
2. The table has at least 3 columns.
3. For a relation R(A,B,C), if there is a multi-valued dependency between A and B,
then B and C are independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.

Example
Below we have a college enrolment table with columns s_id, course and hobby.

s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
2      C#        Cricket
2      Php       Hockey

As you can see in the table above, the student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.

Because courses and hobbies are unrelated, the two records for the student with s_id 1 give
rise to two more records, as shown below:

s_id   course    hobby
1      Science   Cricket
1      Maths     Hockey
1      Science   Hockey
1      Maths     Cricket

In the table above, there is no relationship between the columns course and hobby. They
are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and other
anomalies as well.

How to satisfy 4th Normal Form?


To make the above relation satisfy the 4th Normal Form, we can decompose the table into 2
tables.

CourseOpted table:
s_id   course
1      Science
1      Maths
2      C#
2      Php

Hobbies table:
s_id   hobby
1      Cricket
1      Hockey
2      Cricket
2      Hockey

Now these relations satisfy the Fourth Normal Form.
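
The effect of the multi-valued dependency can be reproduced with plain Python sets. In this sketch (the helper names decompose and join_on_s_id are assumptions), joining the CourseOpted and Hobbies projections of the two original records for s_id 1 yields all four course/hobby combinations, which is exactly the repetition described above.

```python
def decompose(rows):
    """Project (s_id, course, hobby) rows onto CourseOpted and Hobbies."""
    course_opted = {(s, c) for s, c, _ in rows}
    hobbies = {(s, h) for s, _, h in rows}
    return course_opted, hobbies

def join_on_s_id(course_opted, hobbies):
    """Natural join of the two projections on s_id."""
    return {(s, c, h) for s, c in course_opted for s2, h in hobbies if s == s2}

two_rows = {(1, "Science", "Cricket"), (1, "Maths", "Hockey")}
four_rows = two_rows | {(1, "Science", "Hockey"), (1, "Maths", "Cricket")}

# Two records for s_id 1 give rise to all four course/hobby combinations.
print(join_on_s_id(*decompose(two_rows)) == four_rows)    # prints True
# The four-row table is reproduced exactly by its two projections (lossless).
print(join_on_s_id(*decompose(four_rows)) == four_rows)   # prints True
```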

6. Fifth Normal Form (5NF)


Fifth Normal Form is rarely applied explicitly in real-life database design, but you should
know what it is.
Concept of Fifth Normal Form
A database is said to be in 5NF if -
i) It is in the 4th Normal Form.
ii) It cannot be decomposed any further without loss. That is, whenever a table is decomposed
to eliminate redundancy and anomalies, rejoining the parts must neither lose original data nor
produce new records (the Join Dependency principle).
The Fifth Normal Form is also called Project-Join Normal Form.
For example, consider the following table.

Here we assume the key to be {Seller, Company, Product}.


The above table has a multi-valued dependency,
Seller →→ {Company, Product}. Hence the table is not in the 4th Normal Form. To bring the above
table into the 4th Normal Form, we decompose it into two tables as

The above tables are in the 4th Normal Form as there is no multi-valued dependency. But they are
not in the 5th Normal Form, because if we join the above two tables we may get new records that
were not present in the original table.
To avoid the above problem, we can decompose the table into three tables:
Seller_Company, Seller_Product, and Company_Product.

Thus the tables are in the 5th Normal Form.
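
The join-dependency idea can be illustrated with a small set-based sketch in Python. The rows below are hypothetical values (sellers S1 and S2, companies C1 and C2, products P1 and P2) chosen only for illustration; they are not the data of the example table. With this data, joining just Seller_Company and Seller_Product produces a record that was never in the original table, while also checking against Company_Product recovers the original exactly.

```python
# Hypothetical (seller, company, product) rows, assumed for illustration only.
original = {
    ("S1", "C1", "P1"),
    ("S1", "C1", "P2"),
    ("S1", "C2", "P2"),
    ("S2", "C1", "P2"),
}

# The three projections used in the 5NF decomposition.
seller_company = {(s, c) for s, c, _ in original}
seller_product = {(s, p) for s, _, p in original}
company_product = {(c, p) for _, c, p in original}

# Joining only two projections on seller may create records that never existed.
two_way = {(s, c, p) for s, c in seller_company for s2, p in seller_product if s == s2}
spurious = two_way - original

# Joining all three projections keeps only rows whose (company, product) pair exists.
three_way = {row for row in two_way if (row[1], row[2]) in company_product}

print(spurious)                # prints {('S1', 'C2', 'P1')}
print(three_way == original)   # prints True
```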
