
Unit-2 DBMS Notes

The document discusses database design and the entity-relationship (ER) model. It defines database design as the process of designing, developing, implementing, and maintaining enterprise data management systems. The objectives are to produce logical and physical design models. The ER model is a conceptual data model that represents real-world entities and relationships between entities. It uses entities, attributes, and relationships. ER diagrams visually show entities as rectangles, attributes as ovals, and relationships as diamonds. The document provides examples and definitions of entities, attributes, relationships, and cardinalities.

Uploaded by

Amrith Madhira
Copyright
© All Rights Reserved

Unit-2

Database Design and ER-Model

Overview of the Design Process

Database design is a collection of processes that facilitate the
design, development, implementation and maintenance of enterprise
data management systems. A properly designed database is easy to
maintain, improves data consistency and is cost-effective in terms of
disk storage space. The database designer decides how the data
elements correlate and what data must be stored.

The main objectives of database design in DBMS are to produce logical
and physical design models of the proposed database system.

The logical model concentrates on the data requirements and the data
to be stored, independent of physical considerations. It does not
concern itself with how or where the data will be stored physically.

The physical data design model involves translating the logical
design of the database onto physical media using hardware resources
and software systems. It helps produce database systems that:

1. Meet the requirements of the users.
2. Have high performance.

The database design process in DBMS is crucial for a high-performance
database system.

What is ER Model?
The ER Model (Entity Relationship Model) is a high-level conceptual
data model. It helps to systematically analyze data requirements in
order to produce a well-designed database. The ER Model represents
real-world entities and the relationships between them. Creating an
ER Model is considered a best practice before implementing a database.

What is ER Diagram?
An ER Diagram (Entity Relationship Diagram, or ERD) is a diagram that
displays the relationships between the entity sets stored in a
database. In other words, ER diagrams help to explain the logical
structure of databases. ER diagrams are built on three basic
concepts: entities, attributes and relationships.

ER Diagrams use rectangles to represent entities, ovals to represent
attributes and diamond shapes to represent relationships.

ER Diagram Symbols & Notations

Entity Relationship Diagram notation mainly contains three basic
symbols, the rectangle, the oval and the diamond, which represent
entities, attributes and relationships respectively. Some
sub-elements are based on these main elements. An ER Diagram is a
visual representation of data that describes how data items are
related to each other using these symbols and notations.

Following are the main components and their symbols in ER Diagrams:

• Rectangles: represent entity types
• Ellipses: represent attributes
• Diamonds: represent relationship types
• Lines: link attributes to entity types and entity types to
relationship types
• Underlined attributes: represent primary keys
• Double Ellipses: represent multi-valued attributes

Figure: ER Diagram Symbols

Components of the ER Diagram

This model is based on three basic concepts:

• Entities
• Attributes
• Relationships

ER Diagram Examples

For example, in a University database, we might have entities for
Students, Courses, and Lecturers. The Students entity can have
attributes like Rollno, Name, and DeptID. Students might have
relationships with Courses and Lecturers.

WHAT IS ENTITY?
An entity is a real-world thing, either living or non-living, that can
be distinguished from other things. It is anything in the enterprise
that is to be represented in our database. It may be a physical thing,
a fact about the enterprise, or an event that happens in the real world.

An entity can be a place, person, object, event or concept about which
data is stored in the database. Every entity must have attributes and
a unique key. The attributes of an entity are the properties that
describe it.

Examples of entities:

• Person: Employee, Student, Patient
• Place: Store, Building
• Object: Machine, Product, Car
• Event: Sale, Registration, Renewal
• Concept: Account, Course

Notation of an Entity

Entity set:
Student

An entity set is a group of entities of a similar kind. Different
entities in the set may share the same values for some attributes.
Entities are described by their properties, which are also called
attributes. Each attribute has its own value for each entity. For
example, a student entity may have name, age and class as attributes.

Example of Entities:

A university may have some departments. All these departments
employ various lecturers and offer several programs.

Some courses make up each program. Students register in a particular
program and enroll in various courses. A lecturer from a specific
department takes each course, and each lecturer teaches a different
group of students.

Relationship
A relationship is an association among two or more entities.
E.g., Tom works in the Chemistry department.

Entities take part in relationships. We can often identify relationships
with verbs or verb phrases.
For example:

• You are attending this lecture
• I am giving the lecture

Just like entities, we can classify relationships according to
relationship types:

• A student attends a lecture
• A lecturer is giving a lecture

Weak Entities
A weak entity is a type of entity that does not have a key attribute of
its own. It can be identified uniquely only by considering the primary
key of another (owner) entity. For that, a weak entity set needs to
have total participation in its identifying relationship.

In the ER Diagram example, “Trans No” is a discriminator within a
group of transactions in an ATM.

Let’s learn more about a weak entity by comparing it with a strong
entity:

Strong Entity Set                     Weak Entity Set
------------------------------------  ------------------------------------
Always has a primary key.             Does not have enough attributes to
                                      build a primary key.

Represented by a rectangle symbol.    Represented by a double rectangle
                                      symbol.

Contains a primary key, shown with    Contains a partial key, shown with
an underline.                         a dashed underline.

A member is called a dominant         A member is called a subordinate
entity.                               entity.

The primary key is one of its         Members are identified by combining
attributes and identifies its         the primary key of the strong
members.                              entity set with the partial key of
                                      the weak entity set.

The relationship between two strong   The relationship between a strong
entity sets is shown by a diamond     and a weak entity set is shown by
symbol.                               a double diamond symbol.

Attributes
An attribute is a property of either an entity type or a relationship
type.

For example, a lecture might have attributes: time, date, duration,
place, etc.

In an ER Diagram, an attribute is represented by an ellipse.

Types of Attributes    Description
---------------------  ----------------------------------------------
Simple attribute       Simple attributes can’t be divided any
                       further. For example, a student’s contact
                       number. It is also called an atomic value.

Composite attribute    A composite attribute can be broken down. For
                       example, a student’s full name may be further
                       divided into first name, second name, and
                       last name.

Derived attribute      This type of attribute is not stored in the
                       physical database. Its value is derived from
                       other attributes present in the database. For
                       example, age should not be stored directly.
                       Instead, it should be derived from the DOB of
                       that employee.

Multivalued attribute  Multivalued attributes can have more than one
                       value. For example, a student can have more
                       than one mobile number, email address, etc.
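
The derived-attribute row above can be illustrated with a small sketch. It assumes only the date of birth is stored; the function name age_on and the sample record are hypothetical and are not part of the notes.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Return the number of completed years between dob and today."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


# Hypothetical stored record: only the DOB is kept, age is computed on demand.
student = {"name": "A", "dob": date(2000, 6, 15)}

print(age_on(student["dob"], date(2024, 6, 14)))  # 23
print(age_on(student["dob"], date(2024, 6, 15)))  # 24
```

Computing the age at read time avoids having to update a stored value every year.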

Cardinality
Cardinality defines the number of entities in one entity set that can
be associated with entities in another entity set through a
relationship.

The different types of cardinal relationships are:

• One-to-One Relationships
• One-to-Many Relationships
• Many-to-One Relationships
• Many-to-Many Relationships

1. One-to-one:
One entity from entity set X can be associated with at most one entity
of entity set Y and vice versa.

Example: One student is issued one ID card, and each ID card belongs
to exactly one student.

2. One-to-many:
One entity from entity set X can be associated with multiple entities of
entity set Y, but an entity from entity set Y can be associated with at
most one entity of entity set X.

For example, one class consists of multiple students.

3. Many-to-one:
More than one entity from entity set X can be associated with at most
one entity of entity set Y. However, an entity from entity set Y may or
may not be associated with more than one entity from entity set X.

For example, many students belong to the same class.

Constraints are used for modeling limitations on the relationships
between entities.
There are two types of constraints in the Entity Relationship (ER)
model −
• Mapping cardinality or cardinality ratio.
• Participation constraints.

Mapping Cardinality
Mapping cardinality expresses the number of entities to which another
entity can be associated via a relationship set.
For a binary relationship set between entity sets A and B, the
mapping cardinality can be one of the following −
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many

One-to-one relationship
An entity in A is associated with at most one entity in B, and an entity
in B is associated with at most one entity in A.

One-to-many relationship
An entity in A is associated with any number of entities in B (possibly
zero), and an entity in B is associated with at most one entity in A.

Many-to-one relationship
An entity in A is associated with at most one entity in B, and an entity
in B can be associated with any number of entities in A (possibly zero).

Many-to-many relationship
An entity in A is associated with any number of entities in B (possibly
zero), and an entity in B is associated with any number of entities in A
(possibly zero).
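
The four cases above can be sketched as a check on sample data. This is only an illustration: a mapping cardinality is a constraint declared on the schema, so the hypothetical function mapping_cardinality below reports the tightest case that a given set of (a, b) pairs is consistent with, not the constraint itself.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    b_per_a = defaultdict(set)  # entities of B linked to each entity of A
    a_per_b = defaultdict(set)  # entities of A linked to each entity of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical data: one class has several students (A = class, B = student).
class_student = [("c1", "s1"), ("c1", "s2"), ("c2", "s3")]
print(mapping_cardinality(class_student))                        # one-to-many
print(mapping_cardinality([(s, c) for c, s in class_student]))   # many-to-one
```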
Participation Constraints
Participation constraints are of two types, as mentioned below −
• Total participation
• Partial participation
The participation constraints are illustrated by the relationship
borrower between the entity sets customer and loan −

Here, the participation of customer in borrower is partial and the
participation of loan in borrower is total.

Total participation
The participation of an entity set E in a relationship set R is said to be
total if every entity in E participates in at least one relationship in R.
For example − Participation of loan in the relationship borrower is total
participation.

Partial participation
If only some of the entities in E participate in relationship R, then the
participation of E in R is said to be partial participation.
For example − Participation of customers in the relationship borrower is
partial participation.
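
As a minimal sketch of the two definitions, the check below tests whether every entity of a set appears in at least one pair of a relationship instance. The function name participation and the sample customers and loans are hypothetical.

```python
def participation(entities, pairs, side):
    """Return 'total' if every entity occurs in some pair, else 'partial'.

    side is 0 or 1: the position of this entity set within each pair.
    """
    participating = {pair[side] for pair in pairs}
    return "total" if set(entities) <= participating else "partial"


# Hypothetical instance of the borrower relationship (customer, loan).
customers = {"cust1", "cust2", "cust3"}
loans = {"L1", "L2"}
borrower = {("cust1", "L1"), ("cust2", "L2")}

print(participation(customers, borrower, 0))  # partial: cust3 has no loan
print(participation(loans, borrower, 1))      # total: every loan has a borrower
```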
Reduction of ER schemas to relational schemas
An Entity-Relationship (ER) diagram is a diagrammatic representation of
the data in a database; it shows how the data is related.
Note: This section assumes that you already know what an ER diagram is
and how to draw one.

1. When there are Many to One cardinalities in the ER diagram.
For example, a student can be enrolled in only one course, but a
course can have many students enrolled in it.

For Student(SID, Name), SID is the primary key. For Course(CID,
C_name), CID is the primary key.

Student              Course
(SID  Name)          (CID  C_name)
--------------       -----------------
1     A              c1    Z
2     B              c2    Y
3     C              c3    X
4     D

Enroll
(SID  CID)
----------
1     c1
2     c1
3     c3
4     c2

Now the question is, what should be the primary key for Enroll?
Should it be SID, CID, or both combined? We can’t have CID
as the primary key because one CID can be paired with multiple SIDs.
(SID, CID) identifies each row uniquely, but it is not minimal. So SID
is the primary key for the relation Enroll.
For the above ER diagram, we considered three tables in the database:

Student
Enroll
Course

But because Student and Enroll have the same primary key (SID), we can
combine them into one table, renamed Student_Enroll.

Student_Enroll
(SID  Name  CID)
---------------------
1     A     c1
2     B     c1
3     C     c3
4     D     c2

The Student and Enroll tables are merged now.
So a minimum of two tables, Student_Enroll and Course, is required.
Note: In a Many to One (or One to Many) relationship, a minimum of two
tables is required.
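
The two-table result above can be sketched with Python's built-in sqlite3 module. This is an illustrative setup rather than a prescribed one: the column types and the foreign-key clause are assumptions, while the table names and rows follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Course (
    CID    TEXT PRIMARY KEY,
    C_name TEXT NOT NULL
);
CREATE TABLE Student_Enroll (
    SID  INTEGER PRIMARY KEY,           -- SID alone is the key
    Name TEXT NOT NULL,
    CID  TEXT REFERENCES Course(CID)    -- at most one course per student
);
""")
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("c1", "Z"), ("c2", "Y"), ("c3", "X")])
conn.executemany("INSERT INTO Student_Enroll VALUES (?, ?, ?)",
                 [(1, "A", "c1"), (2, "B", "c1"), (3, "C", "c3"), (4, "D", "c2")])

# Join the two tables to list each student with the name of the course.
rows = conn.execute("""
    SELECT s.SID, s.Name, c.C_name
    FROM Student_Enroll AS s JOIN Course AS c ON c.CID = s.CID
    ORDER BY s.SID
""").fetchall()
print(rows)
```

Because CID is a single column in Student_Enroll, each student row can point to only one course, which is exactly the many-to-one constraint.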

2. When there are Many to Many cardinalities in the ER diagram.

Let us consider the above example with the change that now a student
can enroll in more than one course.

Student              Course
(SID  Name)          (CID  C_name)
--------------       -----------------
1     A              c1    Z
2     B              c2    Y
3     C              c3    X
4     D

Enroll
(SID  CID)
----------
1     c1
1     c2
2     c1
2     c2
3     c3
4     c2

Now the same question arises: what is the primary key of the Enroll
relation? If we analyze carefully, the primary key for the Enroll table
is (SID, CID), because neither attribute alone identifies a row.
But in this case, we can’t merge the Enroll table with either Student
or Course. If we try to merge Enroll with either of them, it will
create redundant data.
Note: A minimum of three tables is required for a Many to Many
relationship.
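
A minimal sqlite3 sketch of the three-table design follows. The column types and constraints are assumptions made for illustration; the rows are taken from the example, and the final insert shows the composite key (SID, CID) rejecting a repeated enrollment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Course  (CID TEXT PRIMARY KEY, C_name TEXT NOT NULL);
CREATE TABLE Enroll (
    SID INTEGER NOT NULL REFERENCES Student(SID),
    CID TEXT    NOT NULL REFERENCES Course(CID),
    PRIMARY KEY (SID, CID)              -- composite key
);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("c1", "Z"), ("c2", "Y"), ("c3", "X")])
conn.executemany("INSERT INTO Enroll VALUES (?, ?)",
                 [(1, "c1"), (1, "c2"), (2, "c1"), (2, "c2"), (3, "c3"), (4, "c2")])

# The same (SID, CID) pair cannot be inserted twice.
try:
    conn.execute("INSERT INTO Enroll VALUES (1, 'c1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

courses_of_1 = [r[0] for r in conn.execute(
    "SELECT CID FROM Enroll WHERE SID = 1 ORDER BY CID")]
print(duplicate_rejected, courses_of_1)
```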

3. One to One Relationship

There are two possibilities:

A) A One to One relationship with total participation at at least
one end.
For example, consider an ER diagram in which entity sets E1 and E2 are
connected by a one-to-one relationship R.
A1 and B1 are the primary keys of E1 and E2 respectively.
In this diagram, we have total participation at the E1 end.
Only a single table is required in this case, having the primary key of
E1 as its primary key.
Since E1 is in total participation, each entry in E1 is related to
exactly one entry in E2, but not all entries in E2 are related to an
entry in E1.
The primary key of E1 should be used as the primary key of the
reduced table, since if the primary key of E2 were used, it might have
null values for many of its entries in the reduced table.
Note: Only 1 table is required.

B) A One to One relationship with no total participation.

A1 and B1 are the primary keys of E1 and E2 respectively.
The primary key of R can be A1 or B1, but we still can’t combine all
three tables into one. If we do so, some entries in the combined table
may have NULL values. So the idea of merging all three tables into
one is not good.
But we can merge R into E1 or E2. So a minimum of 2 tables is
required.

ER Design Issues
The basic design issues of an ER database schema are described in the
following points:

1) Use of Entity Set vs Attributes

The use of an entity set or attribute depends on the structure of the
real-world enterprise that is being modelled and the semantics
associated with its attributes. A common mistake is to use the primary
key of one entity set as an attribute of another entity set instead of
using a relationship.

2) Use of Entity Set vs. Relationship Sets

It is difficult to decide whether an object is best expressed by an
entity set or a relationship set. One guideline is to designate a
relationship set to describe an action that occurs between entities.

3) Use of Binary vs n-ary Relationship Sets

Generally, the relationships described in databases are binary
relationships. However, non-binary relationships can be represented by
several binary relationships. For example, a ternary relationship
'parent' may relate a child to his father and his mother. Such a
relationship can also be represented by two binary relationships,
'mother' and 'father', each relating one parent to the child. Thus, it
is possible to represent a non-binary relationship by a set of distinct
binary relationships.

4) Placing Relationship Attributes

The cardinality ratio can be an effective guide for the placement of
relationship attributes. The attributes of a one-to-one relationship set
can be associated with either participating entity set, and the
attributes of a one-to-many relationship set can be associated with the
entity set on the "many" side, instead of with the relationship set.

Thus, designing and modelling an ER diagram requires overall knowledge
of each part that is involved. The basic requirement is to analyse the
real-world enterprise and the connectivity of each entity or attribute
with the others.

Extended Entity-Relationship (EE-R) Model

EER is a high-level data model that incorporates extensions to the
original ER model. Enhanced ER diagrams are high-level models that
represent the requirements and complexities of a complex database.
In addition to ER model concepts, EE-R includes −

• Subclasses and super classes.
• Specialization and generalization.
• Category or union type.
• Aggregation.

These concepts are used to create EE-R diagrams.

Subclasses and Super class
A super class is an entity type that can be divided into further
subtypes.
For example − consider a Shape super class.
The super class Shape has the sub classes Triangle, Square and Circle.
Sub classes are groups of entities with some unique attributes. A sub
class inherits the properties and attributes of its super class.

Specialization and Generalization
Generalization is the process of combining several lower-level entities
that share common attributes or properties into one higher-level entity.

It is a bottom-up process. For example, consider three sub entities Car,
Truck and Motorcycle. These three entities can be generalized into one
super class named Vehicle.
Specialization is the process of identifying subsets of an entity that
share some distinguishing characteristic. It is a top-down approach in
which one entity is broken down into lower-level entities.
In the above example, the Vehicle entity can be specialized into Car,
Truck or Motorcycle.
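
The Vehicle example can be loosely mirrored with class inheritance. This is only an analogy for how a sub class inherits the attributes of its super class; the attribute names (reg_no, price, seats, load_capacity_kg) are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:              # super class: attributes common to all vehicles
    reg_no: str
    price: float


@dataclass
class Car(Vehicle):         # sub class: inherits reg_no and price
    seats: int


@dataclass
class Truck(Vehicle):       # sub class with its own specific attribute
    load_capacity_kg: float


car = Car(reg_no="KA-01", price=9000.0, seats=5)
car_attrs = [f.name for f in fields(car)]
print(car_attrs)  # ['reg_no', 'price', 'seats']
```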

Aggregation
Aggregation represents a relationship between a whole object and its
components. In the ER model, it allows a relationship set to be treated
as a higher-level entity set.

Consider a ternary relationship Works_On between Employee, Branch
and Job, and suppose we also want to record the manager of each such
task. The best way to model this situation is to use aggregation: the
relationship set Works_On is treated as a higher-level entity set, in
the same manner as any other entity set. We can then create a binary
relationship, Manages, between Works_On and Manager to represent who
manages what tasks.

Alternative Notations for Modeling Data
A diagrammatic representation of the data model of an application is a
very important part of designing a database schema. Creation of a
database schema requires not only data modeling experts, but also
domain experts who know the requirements of the application but may
not be familiar with data modeling. An intuitive diagrammatic
representation is particularly important since it eases
communication of information between these groups of experts.

A number of alternative notations for modeling data have been
proposed, of which E-R diagrams and UML class diagrams are the most
widely used. There is no universal standard for E-R diagram notation,
and different books and E-R diagram software use different notations.
We have chosen a particular notation in this sixth edition of this book
which actually differs from the notation we used in earlier editions,
for reasons that we explain later in this section.

Figure 7.24 Symbols used in the E-R notation.

In the rest of this section, we study some of the alternative E-R diagram
notations, as well as the UML class diagram notation. To aid in
comparison of our notation with these alternatives, Figure 7.24
summarizes the set of symbols we have used in our E-R diagram notation.

Alternative E-R Notations
Figure 7.25 indicates some of the alternative E-R notations that are
widely used. One alternative representation of attributes of entities is
to show them in ovals connected to the box representing the entity;
primary key attributes are indicated by underlining them. The above
notation is shown at the top of the figure. Relationship attributes can
be similarly represented, by connecting the ovals to the diamond
representing the relationship.

Cardinality constraints on relationships can be indicated in several
different ways, as shown in Figure 7.25. In one alternative, shown on
the left side of the figure, labels ∗ and 1 on the edges out of the
relationship are used for depicting many-to-many, one-to-one, and
many-to-one relationships. The case of one-to-many is symmetric to
many-to-one, and is not shown.

In another alternative notation shown on the right side of the figure,
relationship sets are represented by lines between entity sets, without
diamonds; only binary relationships can be modeled thus. Cardinality
constraints in such a notation are shown by “crow’s-foot” notation, as
in the figure. In a relationship R between E1 and E2, crow’s feet on
both sides indicates a many-to-many relationship, while crow’s feet on
just the E1 side indicates a many-to-one relationship from E1 to E2.
Total participation is specified in this notation by a vertical bar.
Note however, that in a relationship R between entities E1 and E2, if
the participation of E1 in R is total, the vertical bar is placed on the
opposite side, adjacent to entity E2. Similarly, partial participation is
indicated by using a circle, again on the opposite side.

Characteristics of a good database are:

1. We should be able to store all kinds of data that exist in the real
world. Since we need to work with all kinds of data and
requirements, the database should be strong enough to store all
kinds of data that are present around us.
2. We should be able to relate the entities/tables in the database by
means of relations, i.e., any two tables should be related. Let us
say an employee works for a department. This implies that an
Employee is related to a particular department. We should be
able to define such a relationship between any two entities in the
database. There should not be any table lying without any
mapping.
3. Data and applications should be isolated. The database is the
system that provides the platform to store the data, and the data
is what allows the database to work. Hence there should be a clear
differentiation between them.
4. There should not be any duplication of data in the database. Data
should be stored in such a way that it is not repeated in
multiple tables. If repeated, it would be an unnecessary waste of
DB space, and maintaining such data becomes chaotic.
5. The DBMS should have a strong query language. Once the database
is designed, this helps the user to retrieve and manipulate the data.
If a particular user wants to see any specific data, he can apply as
many filtering conditions as he wants and pull the data that he
needs.
6. Multiple users should be able to access the same database
without affecting the other users, i.e., if teachers want to update
students’ marks in the Results table at the same time, then they
should be allowed to update the marks for their subjects without
modifying other subject marks. A good database should support
this feature.
7. It should support multiple views for the user, depending on his
role. In a school database, students will be able to see only their
own reports and their access would be read-only. At the same time,
teachers will have access to all the students with modification
rights. But the database is the same. Hence a single database
provides different views to different users.
8. The database should also provide security, i.e., when
multiple users are accessing the database, each user will have
their own level of rights to see the database. Some of them will
be allowed to see the whole database, and some will have only
partial rights. For example, an instructor who is teaching Physics
will have access to see and update marks of his subject. He will
not have access to other subjects. But the HOD will have full
access to all the subjects.
9. The database should also support the ACID properties (atomicity,
consistency, isolation and durability), i.e., while performing
transactions such as insert, update and delete, the database makes
sure that the data remains correct and consistent. For example, if
a student’s address is updated, then it should make sure that no
duplicate data is created and there is no data mismatch for that
student.


Functional Dependency
A functional dependency is a relationship that exists between two
attributes (or sets of attributes). It typically exists between the
primary key and a non-key attribute within a table.

X → Y

The left side of the FD is known as the determinant, and the right side
is known as the dependent.

For example:

Assume we have an employee table with attributes Emp_Id, Emp_Name
and Emp_Address. Here the Emp_Id attribute uniquely identifies
the Emp_Name attribute of the employee table, because if we know the
Emp_Id, we can tell the employee name associated with it.

The functional dependency can be written as

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
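
As a rough illustration, the definition can be tested against sample rows: X → Y fails as soon as two rows agree on X but differ on Y. The helper fd_holds and the sample employees are hypothetical. Note that sample data can only refute a dependency; whether it holds in general is a property of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical employee rows.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Ann", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Raj", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Ann", "Emp_Address": "Delhi"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Address"]))  # False
```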

Types of Functional dependency

1. Trivial functional dependency
o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional
dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name
are trivial dependencies.

2. Non-trivial functional dependency
o A → B is a non-trivial functional dependency if B is not a subset
of A.
o When the intersection of A and B is empty, A → B is called
completely non-trivial.

Example:

ID → Name,
Name → DOB
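
The three cases above reduce to two set tests, sketched below. The function name classify_fd is hypothetical; the attribute names come from the examples.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:                 # B is a subset of A
        return "trivial"
    if lhs.isdisjoint(rhs):        # A and B share no attribute
        return "completely non-trivial"
    return "non-trivial"


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))             # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))  # non-trivial
```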
Normalization
o Normalization is the process of organizing the data in the
database.
o Normalization divides a larger table into smaller tables and links
them using relationships.
o Normal forms are used to reduce redundancy in database tables.

Why do we need Normalization?

The main reason for normalizing relations is to remove insertion,
deletion and update anomalies. Failure to eliminate anomalies leads to
data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that
help in creating a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when one cannot
insert a new tuple into a relation due to lack of data.
o Deletion Anomaly: A deletion anomaly refers to the situation
where the deletion of data results in the unintended loss of some
other important data.
o Update Anomaly: An update anomaly occurs when an update of a
single data value requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms.
The normal forms apply to individual relations. A relation is said to be
in a particular normal form if it satisfies the constraints of that form.

Following are the various types of normal forms:

Normal Form  Description
-----------  ---------------------------------------------------
1NF          A relation is in 1NF if every attribute contains
             only atomic values.
2NF          A relation is in 2NF if it is in 1NF and all
             non-key attributes are fully functionally dependent
             on the primary key.
3NF          A relation is in 3NF if it is in 2NF and no
             transitive dependency exists.
BCNF         A stronger definition of 3NF is known as Boyce
             Codd's normal form.
4NF          A relation is in 4NF if it is in Boyce Codd's
             normal form and has no multi-valued dependency.

Advantages of Normalization

o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization

o You cannot start building the database before knowing what the
user needs.
o It is very time-consuming and difficult to normalize relations of a
higher degree.

o Careless decomposition may lead to a bad database design,
leading to serious problems.

1. First Normal Form (1NF)

o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It
must hold only a single value.
o First normal form disallows multi-valued attributes, composite
attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued
attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
------  --------  ----------------------  ---------
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF is shown
below.

EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
------  --------  ----------  ---------
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
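
The step from the first EMPLOYEE table to the second can be sketched as follows. It assumes the multi-valued EMP_PHONE is held as a list; the helper name to_1nf is hypothetical.

```python
def to_1nf(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute multi_attr."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multi_attr] = value  # keep exactly one value per row
            flat.append(new_row)
    return flat


employee = [
    {"EMP_ID": 14, "EMP_NAME": "John",
     "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry",
     "EMP_PHONE": ["8574783832"], "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam",
     "EMP_PHONE": ["7390372389", "8589830302"], "EMP_STATE": "Punjab"},
]

employee_1nf = to_1nf(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

After flattening, EMP_ID alone no longer identifies a row, so the key of the 1NF table becomes (EMP_ID, EMP_PHONE).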

2. Second Normal Form (2NF)

o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully
functionally dependent on the primary key.

Example: Let's assume a school stores the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one
subject.

TEACHER table

TEACHER_ID  SUBJECT    TEACHER_AGE
----------  ---------  -----------
25          Chemistry  30
25          Biology    30
47          English    35
83          Math       38
83          Computer   38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The
non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a
proper subset of the candidate key. That's why it violates the rule
for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID  TEACHER_AGE
----------  -----------
25          30
47          35
83          38

TEACHER_SUBJECT table:

TEACHER_ID  SUBJECT
----------  ---------
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
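
The decomposition above can be reproduced as two projections of the TEACHER table, and joining them back on TEACHER_ID recovers the original rows. This is a minimal sketch; the helper name project is hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples but keeping order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])

# Join the two tables back on TEACHER_ID to confirm that nothing was lost.
age_of = {d["TEACHER_ID"]: d["TEACHER_AGE"] for d in teacher_detail}
rejoined = [
    {"TEACHER_ID": s["TEACHER_ID"], "SUBJECT": s["SUBJECT"],
     "TEACHER_AGE": age_of[s["TEACHER_ID"]]}
    for s in teacher_subject
]
print(len(teacher_detail), len(teacher_subject), rejoined == teacher)
```

Each teacher's age is now stored once, so changing it requires updating a single row.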
3. Third Normal Form (3NF)

For a table to be in third normal form, it needs to satisfy the following
conditions:

• It should be in second normal form
• It should not have any transitive dependencies for non-prime
attributes

A transitive dependency is an indirect functional dependency: if X → Y
and Y → Z hold, then X → Z holds transitively through Y.
For a table to not have any transitive dependencies, we need to ensure
that no non-prime attribute determines another non-prime attribute, as
only candidate keys (or super keys) can determine non-prime
attributes for a table in 3NF.

Figure: an example of a transitive dependency.

o A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to
achieve data integrity.
o If a relation is in 2NF and there is no transitive dependency for
non-prime attributes, then the relation is in third normal form.

A relation is in third normal form if it holds at least one of the
following conditions for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
------  ---------  -------  ---------  --------
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except
EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID), which violates the
rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key. The resulting
tables are as follows:

EMPLOYEE table:

EMP_ID   EMP_NAME    EMP_ZIP
222      Harry       201010
333      Stephan     02228
444      Lan         60007
555      Katharine   06389
666      John        462007

EMPLOYEE_ZIP table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston
60007     US          Chicago
06389     UK          Norwich
462007    MP          Bhopal
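
As a rough check of the dependencies used in this example, the sketch below tests whether a functional dependency is consistent with the sample rows of EMPLOYEE_DETAIL. The helper name fd_holds is hypothetical. Sample data can only refute a dependency, never prove it, so a True result means "not contradicted by these rows".

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # Remember the first rhs value seen for this lhs value; any later
        # mismatch contradicts the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

columns = ["EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"]
data = [
    (222, "Harry", "201010", "UP", "Noida"),
    (333, "Stephan", "02228", "US", "Boston"),
    (444, "Lan", "60007", "US", "Chicago"),
    (555, "Katharine", "06389", "UK", "Norwich"),
    (666, "John", "462007", "MP", "Bhopal"),
]
rows = [dict(zip(columns, r)) for r in data]

print(fd_holds(rows, ["EMP_ID"], ["EMP_ZIP"]))                  # True
print(fd_holds(rows, ["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"]))   # True
print(fd_holds(rows, ["EMP_STATE"], ["EMP_CITY"]))              # False: US maps to Boston and Chicago
```

The first two results match the chain EMP_ID → EMP_ZIP → {EMP_STATE, EMP_CITY} that motivates moving the state and city into EMPLOYEE_ZIP.
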

4.Boyce Codd normal form (BCNF)

o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a
super key.

Example: Let's assume there is a company where employees work in
more than one department.

EMPLOYEE table:

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549

In the above table, the functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}



The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone
is a key.

To convert the given table into BCNF, we decompose it into three
tables:

EMP_COUNTRY table:

EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table:

EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549


EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left side of both functional
dependencies is a key.
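
The BCNF test in this example can be sketched with an attribute-closure computation. This is a minimal illustration, not a full normalization tool: the function names closure and bcnf_violations are assumptions, and only the functional dependencies listed above are checked.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]

fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

# {EMP_ID, EMP_DEPT} determines every attribute, so it is a key.
print(closure({"EMP_ID", "EMP_DEPT"}, fds) == employee)   # True

# Both FDs violate BCNF in the original table.
print(len(bcnf_violations(employee, fds)))                # 2

# Each decomposed table, checked against the FDs that apply to it, is clean.
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, fds[:1]))                # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, fds[1:])) # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                        # []
```
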
5.Fourth normal form (4NF)
o A relation is in 4NF if it is in Boyce Codd normal form and has
no multi-valued dependency.
o For a dependency A →→ B, if a single value of A is associated with
multiple values of B, independently of the other attributes, then the
dependency is a multi-valued dependency.

Example:

STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are independent
of each other. Hence, there is no relationship between COURSE and
HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses,
Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.

So, to bring the above table into 4NF, we can decompose it into two
tables:

STUDENT_COURSE

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STUDENT_HOBBY

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey
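
To see why independent multi-valued facts cause repetition, the following sketch joins the two 4NF tables back on STU_ID. The variable names are illustrative. Because courses and hobbies are unrelated, a single combined table has to pair every course of a student with every hobby of that student.

```python
student_course = [
    (21, "Computer"), (21, "Math"), (34, "Chemistry"),
    (74, "Biology"), (59, "Physics"),
]
student_hobby = [
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"),
    (74, "Cricket"), (59, "Hockey"),
]

# Natural join on STU_ID: every course of a student meets every hobby.
combined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for hid, hobby in student_hobby
    if sid == hid
)

rows_for_21 = [row for row in combined if row[0] == 21]
print(len(rows_for_21))  # 4  (2 courses x 2 hobbies)
print(len(combined))     # 7
```

Student 21 needs 2 × 2 = 4 rows in the combined form, and the count grows multiplicatively with each extra course or hobby, while the separate tables grow by one row per new fact.
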

DATA ANOMALIES AND THEIR TYPES

The normalization process was created largely in order to reduce the
negative effects of creating tables that will introduce anomalies into the
database.

There are three types of Data Anomalies: Update Anomalies, Insertion
Anomalies, and Deletion Anomalies.

Update Anomalies happen when the same fact is stored redundantly and a
change is applied to only some of the copies. For example, the person
charged with keeping all the records current and accurate is asked to
change an employee’s title due to a promotion. If the title is stored
redundantly in the same table and the person misses any of the copies,
then there will be multiple titles associated with the employee. The end
user has no way of knowing which is the correct title.

Insertion Anomalies happen when inserting vital data into the
database is not possible because other data is not already there. For
example, if a system is designed to require that a customer be on file
before a sale can be made to that customer, but you cannot add a
customer until they have bought something, then you have an insertion
anomaly. It is the classic "catch-22" situation.

Deletion Anomalies happen when the deletion of unwanted
information causes desired information to be deleted as well. For
example, if a single database record contains information about a
particular product along with information about a salesperson for the
company, and the salesperson quits and the record is deleted, then the
information about the product is deleted along with the salesperson
information.
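
The update and deletion anomalies described above can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the table names, column names, and rows are made up for illustration and are not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical unnormalized table: the title repeats on every project row.
conn.execute("CREATE TABLE assignment (emp_name TEXT, title TEXT, project TEXT)")
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?, ?)",
    [("Asha", "Engineer", "Billing"), ("Asha", "Engineer", "Payroll")],
)

# Update anomaly: the promotion reaches only one of the redundant rows.
conn.execute(
    "UPDATE assignment SET title = 'Senior Engineer' "
    "WHERE emp_name = 'Asha' AND project = 'Billing'"
)
titles = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT title FROM assignment WHERE emp_name = 'Asha'"
    )
)
print(titles)  # ['Engineer', 'Senior Engineer']

# Hypothetical table mixing product facts with salesperson facts.
conn.execute("CREATE TABLE product_sales (product TEXT, price REAL, salesperson TEXT)")
conn.execute("INSERT INTO product_sales VALUES ('Widget', 9.5, 'Ravi')")

# Deletion anomaly: removing the salesperson also removes the only product row.
conn.execute("DELETE FROM product_sales WHERE salesperson = 'Ravi'")
remaining_products = conn.execute("SELECT COUNT(*) FROM product_sales").fetchone()[0]
print(remaining_products)  # 0
```

Storing titles in a separate employee table and products in a separate product table would remove both problems, which is what normalization aims for.
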

Database Design Methodologies


A database design methodology divides the design process into phases
that guide the designer. It provides a structured approach to the
design process.
The following are the phases/models:

Conceptual Phase
The conceptual phase identifies the entities and the relationships
between them. It describes the conceptual schema, in which the entities
and relationships are defined.
Logical Phase
The logical data model provides details about the data to the physical
phase. The logical phase produces the ER diagram, data dictionary,
schema, etc., which act as the source for the physical design process.
Physical Phase
The physical database design allows the designer to decide how the
database will be implemented.

---------XXX---------
