DBMS Unitiii

Uploaded by

sirisha
Entity Relationship Diagram – ER Diagram in DBMS
An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as Entity Relationship Diagram (ER Diagram).
An ER model is a design or blueprint of a database that can later be implemented as a
database. The main components of E-R model are: entity set and relationship set.

What is an Entity Relationship Diagram (ER Diagram)?
An ER diagram shows the relationships among entity sets. An entity set is a group of
similar entities, and these entities can have attributes. In DBMS terms, an entity set
corresponds to a table and its attributes to the columns of that table, so by showing the
relationships among tables and their attributes, an ER diagram shows the complete logical
structure of a database. Let’s have a look at a simple ER diagram to understand this concept.

A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their
relationship. The relationship between Student and College is many-to-one, as a college
can have many students but a student cannot study in multiple colleges at the
same time. The Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr, and the
College entity has attributes such as Col_ID & Col_Name.

Here are the geometric shapes and their meanings in an E-R diagram. We will discuss
these terms in detail in the next section (Components of an ER Diagram) of this guide, so
don’t worry too much about these terms now; just go through them once.

Rectangle: Entity sets
Ellipses: Attributes
Diamonds: Relationship sets
Lines: Link attributes to entity sets and entity sets to relationship sets
Double Ellipses: Multivalued attributes
Dashed Ellipses: Derived attributes
Double Rectangles: Weak entity sets
Double Lines: Total participation of an entity in a relationship set

Components of an ER Diagram

As shown in the above diagram, an ER diagram has three main components:


1. Entity
2. Attribute
3. Relationship

1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an
ER diagram.
For example: In the following ER diagram we have two entities, Student and College, and
these two entities have a many-to-one relationship, as many students study in a single
college. We will read more about relationships later; for now, focus on entities.

Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its
relationship with another entity is called a weak entity. A weak entity is represented by a
double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so bank account is a weak entity.

2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an
ER diagram. There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute:
A key attribute can uniquely identify an entity in an entity set. For example, a student
roll number can uniquely identify a student in a set of students. A key attribute is
represented by an oval, the same as other attributes, but the text of a key attribute is
underlined.

2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For
example, in the Student entity, the student address is a composite attribute, as an address is
composed of other attributes such as pin code, state and country.

3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented with double ovals in an ER diagram. For example, a person can have
more than one phone number, so the phone number attribute is multivalued.

4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It
is represented by a dashed oval in an ER diagram. For example, a person’s age is a
derived attribute, as it changes over time and can be derived from another attribute
(date of birth).
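
As a rough illustration of a derived attribute, the Python sketch below stores only the date of birth and computes the age on demand. The `Person` class and its `age` method are hypothetical names chosen for this example; they are not part of any ER notation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    name: str
    date_of_birth: date  # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years
```

Because the age is recomputed from the stored date each time it is asked for, it can never go stale the way a stored age column could.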

E-R diagram with multivalued and derived attributes:

3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship
among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship


When a single instance of an entity is associated with a single instance of another entity
then it is called one to one relationship. For example, a person has only one passport
and a passport is given to one person.

2. One to Many Relationship


When a single instance of an entity is associated with more than one instance of
another entity, it is called a one-to-many relationship. For example, a customer can
place many orders but an order cannot be placed by many customers.

3. Many to One Relationship


When more than one instance of an entity is associated with a single instance of
another entity, it is called a many-to-one relationship. For example, many students
can study in a single college but a student cannot study in many colleges at the same
time.

4. Many to Many Relationship


When more than one instance of an entity is associated with more than one instance
of another entity, it is called a many-to-many relationship. For example, a student can be
assigned to many projects and a project can be assigned to many students.

Total Participation of an Entity set


Total participation of an entity set means that each entity in the entity set must take
part in at least one relationship in the relationship set. For example, in the diagram below
each college must have at least one associated student.

Structural Constraints
Cardinality Constraint-

Cardinality constraint defines the maximum number of relationship instances in which an entity can
participate.

Types of Cardinality Ratios-

There are 4 types of cardinality ratios-

1. Many-to-Many cardinality (m:n)
2. Many-to-One cardinality (m:1)
3. One-to-Many cardinality (1:n)
4. One-to-One cardinality (1:1)


1. Many-to-Many Cardinality-
By this cardinality constraint,

• An entity in set A can be associated with any number (zero or more) of entities in set B.
• An entity in set B can be associated with any number (zero or more) of entities in set A.

Symbol Used-

Example-

Consider the following ER diagram-

Here,
• One student can enroll in any number (zero or more) of courses.
• One course can have any number (zero or more) of students enrolled.

2. Many-to-One Cardinality-

By this cardinality constraint,

• An entity in set A can be associated with at most one entity in set B.
• An entity in set B can be associated with any number (zero or more) of entities in set A.

Symbol Used-

Example-

Consider the following ER diagram-

Here,
• One student can enroll in at most one course.
• One course can have any number (zero or more) of students enrolled.

3. One-to-Many Cardinality-

By this cardinality constraint,

• An entity in set A can be associated with any number (zero or more) of entities in set B.
• An entity in set B can be associated with at most one entity in set A.

Symbol Used-

Example-

Consider the following ER diagram-

Here,
• One student can enroll in any number (zero or more) of courses.
• One course can have at most one student enrolled.

4. One-to-One Cardinality-
By this cardinality constraint,

• An entity in set A can be associated with at most one entity in set B.
• An entity in set B can be associated with at most one entity in set A.

Symbol Used-

Example-

Consider the following ER diagram-

Here,
• One student can enroll in at most one course.
• One course can have at most one student enrolled.
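
The four cardinality ratios above can be checked against sample data. The sketch below is a minimal Python illustration, assuming a relationship instance is given as a list of (a, b) pairs; the function name `observed_cardinality` and the student and course identifiers in the usage note are made up for the example. Data can only show the tightest ratio it happens to satisfy, whereas the actual constraint is a design decision.

```python
from collections import defaultdict


def observed_cardinality(pairs):
    """Classify a relationship instance of (a, b) pairs as 1:1, 1:n, m:1 or m:n."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity in A relate to several in B, and the other way round?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "m:n"
    if a_has_many:
        return "1:n"  # each B has at most one A
    if b_has_many:
        return "m:1"  # each A has at most one B
    return "1:1"
```

For instance, `observed_cardinality([("s1", "c1"), ("s2", "c1")])` classifies two students enrolled in the same single course as many-to-one.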

Participation Constraints-

Participation constraints define the minimum number of relationship instances in which an entity must
participate.

Types of Participation Constraints-

There are two types of participation constraints-

1. Total participation
2. Partial participation

1. Total Participation-

• It specifies that each entity in the entity set must participate in at least one
relationship instance in that relationship set.
• That is why it is also called mandatory participation.
• Total participation is represented using a double line between the entity set and relationship set.

Example-

Here,
• The double line between the entity set “Student” and relationship set “Enrolled in” signifies total
participation.
• It specifies that each student must be enrolled in at least one course.

2. Partial Participation-

• It specifies that each entity in the entity set may or may not participate in a relationship instance
in that relationship set.
• That is why it is also called optional participation.
• Partial participation is represented using a single line between the entity set and relationship set.

Example-
Here,
• The single line between the entity set “Course” and relationship set “Enrolled in” signifies partial
participation.
• It specifies that there might exist some courses for which no enrollments are made.

Relationship between Cardinality and Participation Constraints-

Minimum cardinality tells whether the participation is partial or total.

• If minimum cardinality = 0, then it signifies partial participation.
• If minimum cardinality = 1, then it signifies total participation.
Maximum cardinality tells the maximum number of relationship instances in which an entity can participate.
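
To make the link between minimum cardinality and participation concrete, here is a small Python sketch. It assumes an entity set is a plain set of identifiers and a relationship instance is a list of pairs; the `participation` function and the sample students and courses are hypothetical.

```python
def participation(entity_set, participating):
    """Return 'total' if every entity takes part in the relationship, else 'partial'."""
    missing = set(entity_set) - set(participating)
    return "total" if not missing else "partial"


students = {"s1", "s2"}
courses = {"c1", "c2", "c3"}
enrolled_in = [("s1", "c1"), ("s2", "c1"), ("s2", "c2")]

# Every student appears in some pair, so the minimum cardinality is 1.
student_side = participation(students, (s for s, _ in enrolled_in))
# Course c3 has no enrollment, so the minimum cardinality is 0.
course_side = participation(courses, (c for _, c in enrolled_in))
```

In this sample, the Student side comes out as total and the Course side as partial, matching the double-line and single-line cases described above.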

Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)

Candidate Key: A minimal set of attributes that can uniquely identify a tuple is known as a
candidate key. For example, STUD_NO in the STUDENT relation.

The value of a candidate key is unique and non-null for every tuple.

• There can be more than one candidate key in a relation. For example, STUD_NO and
STUD_PHONE are both candidate keys for the relation STUDENT.
• A candidate key can be simple (having only one attribute) or composite. For
example, {STUD_NO, COURSE_NO} is a composite candidate key for the relation
STUDENT_COURSE.
Note – In SQL Server, a unique constraint on a nullable column allows the value NULL in
that column only once. That is why the STUD_PHONE attribute can be a candidate key here, but
NULL values are not allowed in a primary key attribute.

Super Key: A set of attributes that can uniquely identify a tuple is known as a super key. For
example, STUD_NO, (STUD_NO, STUD_NAME), etc.
• Adding zero or more attributes to a candidate key generates a super key.
• Every candidate key is a super key, but not every super key is a candidate key.

Primary Key: There can be more than one candidate key in a relation, out of which one is
chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate
keys for the relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of many
candidate keys).

Alternate Key: A candidate key other than the primary key is called an alternate key. For
example, STUD_NO and STUD_PHONE are both candidate keys for the relation STUDENT;
if STUD_NO is chosen as the primary key, STUD_PHONE is an alternate key.
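
The difference between super keys and candidate keys can be explored by brute force on a small table. The Python sketch below assumes rows are dictionaries; the function names and the sample STUDENT rows (including the phone values) are invented for illustration. A check on sample data can only suggest keys, since a key is a property of the schema and must hold for every legal instance.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal super keys of this instance, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A set that contains an already-found key is not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample data for the STUDENT relation.
student = [
    {"STUD_NO": 1, "STUD_NAME": "RAM", "STUD_PHONE": "555-0101"},
    {"STUD_NO": 2, "STUD_NAME": "RAM", "STUD_PHONE": "555-0102"},
    {"STUD_NO": 3, "STUD_NAME": "SUJIT", "STUD_PHONE": "555-0103"},
]
keys = candidate_keys(student, ["STUD_NO", "STUD_NAME", "STUD_PHONE"])
```

On this sample, `keys` holds the two single-attribute keys STUD_NO and STUD_PHONE; either could be picked as the primary key, leaving the other as the alternate key, while (STUD_NO, STUD_NAME) is a super key but not a candidate key.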

Foreign Key: A foreign key is a key used to link two tables together. It is
sometimes also called a referencing key.
A foreign key is a column or a combination of columns whose values match
a primary key in a different table.
The relationship between two tables matches the primary key in one
of the tables with a foreign key in the second table.
If a table has a primary key defined on any field(s), then you cannot have
two records with the same value in those field(s).

For example, STUD_NO in the STUDENT_COURSE relation is not unique. It is repeated
in the first and third tuples. However, STUD_NO in the STUDENT relation is a primary key, so it
must always be unique and cannot be null.

Problems with E-R Model


The E-R model can result in problems due to limitations in the way the entities
are related in relational databases. These problems are called
connection traps. They often occur due to a misinterpretation of
the meaning of certain relationships.
The two main types of connection traps are called fan traps and chasm traps.
Fan Trap. It occurs when a model represents a relationship between entity
types, but the pathway between certain entity occurrences is ambiguous.
Chasm Trap. It occurs when a model suggests the existence of a
relationship between entity types, but the pathway does not exist between
certain entity occurrences.

Fan Trap
A fan trap occurs when two or more one-to-many relationships fan out from a single
entity.
For example: Consider a database of Department, Site and Staff, where
one site can contain a number of departments, but a department is situated
at only a single site. Multiple staff members work at a single
site, and a staff member can work at only a single site. This case is
represented in the E-R diagram shown.

The problem with the above E-R diagram is that the question of which staff work in a
particular department remains unanswered. The solution is to restructure the original E-R
model to represent the correct association, as shown.

In other words, the two entities should have a direct relationship between
them to provide the necessary information.

There is another way to solve the problem of the E-R diagram in the figure:
introducing a direct relationship between DEPT and STAFF, as shown in the
figure.

Another example: Let us consider another case, where one branch
contains multiple staff members and cars, as represented.

The problem with the above E-R diagram is that it is unable to tell which member
of staff uses a particular car. It is not possible to tell which
member of staff uses car SH34.

The solution is to show the relationship between STAFF and CAR, as
shown.

With this relationship the fan trap is resolved, and it is now possible to tell that
car SH34 is used by S1500, as shown in the figure. It is now possible to
tell which car is used by which staff member.

Chasm Trap
As discussed earlier, a chasm trap occurs when a model suggests the
existence of a relationship between entity types, but the pathway does not
exist between certain entity occurrences.
It occurs where there is a relationship with partial participation that
forms part of the pathway between entities that are related.
For example: Let us consider a database where a single branch is
allocated many staff who handle the management of properties for rent.
Not all staff members handle a property, and not all properties are managed
by a member of staff. This case is represented in the E-R diagram.

Now, the above E-R diagram is not able to represent which properties are
available at a branch. The partial participation of Staff and Property in the
SP relationship means that some properties cannot be associated with a branch
office through a member of staff.
We need to add the missing relationship, called BP, between the
Branch and Property entities, as shown.

Another example: Consider another case, where a branch has multiple
cars but a car can be associated with only a single branch. A car is handled by
a single staff member, and a staff member can use only a single car. Some staff members
have no car available for their use. This case is represented in the E-R
diagram with appropriate connectivity and cardinality.

The problem with the above E-R diagram is that it is not possible to tell at which
branch staff member S0003 works, as shown.

It means the above E-R diagram is not able to represent the relationship
between BRANCH and STAFF, due to the partial participation of the CAR and
STAFF entities. We need to add the missing relationship, called BS,
between the BRANCH and STAFF entities, as shown.

With this relationship the chasm trap is resolved, and it is now possible to
represent the branch at which each member of staff works, as for our
example of staff S003, as shown.

Enhanced Entity Relationship Model (EER Model)


EER Model
EER is a high-level data model that incorporates the extensions to the original ER
model.

It is a diagrammatic technique for displaying the following concepts:

• Sub Class and Super Class
• Specialization and Generalization
• Union or Category
• Aggregation
These concepts are used in an EER schema, and the resulting schema
diagrams are called EER diagrams.

Features of EER Model

• EER creates a design more accurate to database schemas.
• It reflects the data properties and constraints more precisely.
• It includes all modeling concepts of the ER model.
• A diagrammatic technique helps in displaying the EER schema.
• It includes the concepts of specialization and generalization.
• It is used to represent a collection of objects that is the union of objects of
different entity types.
A. Sub Class and Super Class
• The sub class and super class relationship leads to the concept of inheritance.

• The relationship between sub class and super class is denoted with symbol.
1. Super Class
• A super class is an entity type that has a relationship with one or more subtypes.
• An entity cannot exist in the database merely by being a member of a sub class; it must
also be a member of the super class.
For example: The Shape super class has the sub groups Square, Circle and Triangle.
2. Sub Class
• A sub class is a group of entities with unique attributes.
• A sub class inherits properties and attributes from its super class.
For example: Square, Circle and Triangle are sub classes of the Shape super class.
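
The Shape example maps naturally onto class inheritance in a programming language. The Python sketch below is only an analogy for the super class and sub class idea; the attribute names (`side`, `radius`) and the `describe` method are assumptions made for the example.

```python
import math


class Shape:
    """Super class: holds what every shape has in common."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"

    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        super().__init__("Square")
        self.side = side  # attribute specific to this sub class

    def area(self):
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("Circle")
        self.radius = radius  # attribute specific to this sub class

    def area(self):
        return math.pi * self.radius ** 2
```

Each sub class inherits `name` and `describe` from `Shape` and adds its own attribute, and every `Square` or `Circle` object is also a `Shape`, just as every sub class member is also a member of its super class.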

B. Specialization and Generalization

1. Generalization
• Generalization is the process of forming a general entity that contains the properties
common to all the generalized entities.
• It is a bottom-up approach, in which two or more lower level entities combine to form a
higher level entity.
• Generalization is the reverse process of specialization.
• It defines a general entity type from a set of specialized entity types.
• It minimizes the difference between the entities by identifying their common features.
For example:

In the above example, Tiger, Lion and Elephant can all be generalized as Animals.

2. Specialization
• Specialization is a process that divides a group of entities into sub groups
based on their characteristics.
• It is a top-down approach, in which one higher level entity can be broken down into two
or more lower level entities.
• It maximizes the difference between the members of an entity by identifying the
unique characteristics or attributes of each member.
• It defines one or more sub classes for the super class and also forms the
superclass/subclass relationship.
For example:

In the above example, Employee can be specialized as Developer or Tester, based on
the role they play in an organization.

C. Category or Union
• A category represents a single superclass/subclass relationship with more than one
super class.
• It can have total or partial participation.
For example, in car booking, a car owner can be a person, a bank (which holds a possession on a
car) or a company. The category (sub class) Owner is a subset of the union of the three
super classes Company, Bank and Person. A category member must exist in at least
one of its super classes.

D. Aggregation
• Aggregation is a process that represents a relationship between a whole object and its
component parts.
• It abstracts a relationship between objects and views the relationship as an object.
• It is a process in which two entities are treated as a single entity.

In the above example, the relation between College and Course acts as an entity in a
relation with Student.

DBMS Functional Dependency: Transitive, Trivial, Multivalued (Example)
What is a functional dependency?
A functional dependency exists when one attribute (or set of attributes) determines
another attribute in a relation. Functional dependencies play a vital role in telling
good database design from bad.

Example:

Employee number  Employee Name  Salary  City
1                Dana           50000   San Francisco
2                Francis        38000   London
3                Andrew         25000   Tokyo

In this example, if we know the value of Employee number, we can obtain Employee
Name, City, Salary, etc.

By this, we can say that City, Employee Name and Salary are functionally
dependent on Employee number.

A functional dependency is denoted by an arrow →

The functional dependency of Y on X is represented by X → Y
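
Whether a functional dependency holds in a given table can be tested mechanically: any two rows that agree on the left side must agree on the right side. The Python sketch below assumes rows are dictionaries; `fd_holds` and the short column names are hypothetical, and the data mirrors the employee example above.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones.
        if seen.setdefault(key, value) != value:
            return False
    return True


employees = [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "San Francisco"},
    {"emp_no": 2, "name": "Francis", "salary": 38000, "city": "London"},
    {"emp_no": 3, "name": "Andrew", "salary": 25000, "city": "Tokyo"},
]
```

Here `fd_holds(employees, ["emp_no"], ["name", "salary", "city"])` succeeds, while adding a second row for employee 1 with a different city would make the same check fail.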

Key terms
Here are some key terms for functional dependency:

Key Terms      Description

Axiom          Axioms are a set of inference rules used to infer all the functional dependencies on a
               relational database.

Decomposition  A rule that suggests that if you have a table that appears to contain two entities which
               are determined by the same primary key, then you should consider breaking them up
               into two different tables.

Dependent      It is displayed on the right side of the functional dependency diagram.

Determinant    It is displayed on the left side of the functional dependency diagram.

Union          It suggests that if two tables are separate and the PK is the same, you should consider
               putting them together.

Rules of Functional Dependencies

Given below are the three most important rules for functional dependency:

• Reflexive rule: If X is a set of attributes and Y is a subset of X, then X -> Y holds.
• Augmentation rule: When x -> y holds and c is a set of attributes, then xc -> yc
also holds. That is, adding the same attributes to both sides does not change the basic
dependencies.
• Transitivity rule: This rule is very similar to the transitive rule in algebra:
if x -> y holds and y -> z holds, then x -> z also holds. x -> y is read as
x functionally determines y.
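
One common way to apply these rules in practice is the attribute closure: start from a set of attributes and keep adding whatever the known dependencies allow. The Python sketch below is a minimal version of that standard algorithm, not something taken from this text; the `closure` function and the representation of dependencies as (left, right) pairs of sets are assumptions.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)  # reflexive rule: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, the right follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# x -> y and y -> z, as in the transitivity rule.
fds = [({"x"}, {"y"}), ({"y"}, {"z"})]
```

With these two dependencies, `closure({"x"}, fds)` contains z as well as y, which is exactly what the transitivity rule states.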

Types of Functional Dependencies

• Multivalued dependency
• Trivial functional dependency
• Non-trivial functional dependency
• Transitive dependency

Multivalued dependency in DBMS

Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table. A multivalued dependency is a
complete constraint between two sets of attributes in a relation. It requires that
certain tuples be present in a relation.

Example:

Car_model Maf_year Color

H001 2017 Metallic

H001 2017 Green

H005 2018 Metallic

H005 2018 Blue

H010 2015 Metallic

H033 2012 Gray

In this example, maf_year and color are independent of each other but dependent
on car_model. These two columns are said to be multivalued
dependent on car_model.

This dependence can be represented like this:

car_model ->> maf_year

car_model ->> color

Trivial Functional dependency:

A functional dependency is called trivial if the set of attributes on its right-hand
side is included in the set of attributes on its left-hand side.

So, X -> Y is a trivial functional dependency if Y is a subset of X.

For example:

Emp_id Emp_name

AS555 Harry

AS811 George

AS999 Kevin

Consider this table with two columns Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a
subset of {Emp_id, Emp_name}.

Non trivial functional dependency in DBMS


A functional dependency A -> B is known as a nontrivial dependency when B is not a
subset of A.

Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Apple      Tim Cook       57

Example:

{Company} -> {CEO} (if we know the company, we know the CEO’s name)

But CEO is not a subset of Company, and hence it is a non-trivial functional
dependency.

Transitive dependency:
A transitive dependency is a type of functional dependency which happens when it is indirectly
formed by two functional dependencies.

Example:

Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Alibaba    Jack Ma        54

{Company} -> {CEO} (if we know the company, we know its CEO’s name)

{CEO} -> {Age} (if we know the CEO, we know the age)

Therefore, according to the rule of transitive dependency:

{Company} -> {Age} should hold. That makes sense, because if we know the
company name, we can know its CEO’s age.

Note: Remember that a transitive dependency can only occur in a relation
of three or more attributes.

What is Normalization?
Normalization is a method of organizing the data in the database which helps you to
avoid data redundancy, insertion, update & deletion anomaly. It is a process of
analyzing the relation schemas based on their different functional dependencies and
primary key.

Normalization is inherent to relational database theory. It usually results in the
creation of additional tables, which may in turn duplicate some data (such as key values)
across those tables.

Advantages of Functional Dependency

• Functional dependency avoids data redundancy, so the same data does not
repeat at multiple locations in the database.
• It helps you to maintain the quality of data in the database.
• It helps you to define the meanings and constraints of databases.
• It helps you to identify bad designs.
• It helps you to find the facts regarding the database design.

Summary
• A functional dependency exists when one attribute determines another attribute in
a DBMS.
• Axiom, Decomposition, Dependent, Determinant and Union are key terms for
functional dependency.
• The four types of functional dependency are 1) Multivalued 2) Trivial 3) Non-trivial
4) Transitive.
• Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table.
• A trivial dependency X -> Y occurs when Y is a subset of X.
• A nontrivial dependency occurs when A -> B holds true where B is not a subset
of A.
• A transitive dependency is a type of functional dependency which happens when it is
indirectly formed by two functional dependencies.
• Normalization is a method of organizing the data in the database which helps
you to avoid data redundancy.
• Functional dependency helps you to maintain the quality of data in the
database.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database
BY CHAITANYA SINGH | FILED UNDER: DBMS

Normalization is a process of organizing the data in a database to avoid data
redundancy, insertion anomaly, update anomaly & deletion anomaly. Let’s discuss
anomalies first, then we will discuss normal forms with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized.
These are the insertion, update and deletion anomalies. Let’s take an example to
understand this.

Example: Suppose a manufacturing company stores employee details in a table
named employee that has four attributes: emp_id for storing the employee’s id, emp_name
for storing the employee’s name, emp_address for storing the employee’s address and
emp_dept for storing the department in which the employee works. At some
point in time the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a table
is not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he
belongs to two departments of the company. If we want to update the address of Rick,
then we have to update it in both rows or the data will become inconsistent. If
the correct address gets updated in one row but not in the other, then as
per the database Rick would have two different addresses, which is not correct
and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training
and currently not assigned to any department then we would not be able to insert the
data into the table if emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose, if at a point of time the company closes the department
D890 then deleting the rows that are having emp_dept as D890 would also delete the
information of employee Maggie since she is assigned only to this department.
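
The update and delete anomalies can be reproduced on the employee table with a few lines of Python. This is only a sketch using a list of dictionaries in place of a real database table, and the new address "Mumbai" is a made-up value.

```python
employee = [
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Maggie", "emp_address": "Agra", "emp_dept": "D890"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D900"},
    {"emp_id": 166, "emp_name": "Glenn", "emp_address": "Chennai", "emp_dept": "D004"},
]

# Update anomaly: change Rick's address in only one of his two rows.
for row in employee:
    if row["emp_id"] == 101:
        row["emp_address"] = "Mumbai"  # hypothetical new address
        break
rick_addresses = {r["emp_address"] for r in employee if r["emp_id"] == 101}

# Delete anomaly: closing department D890 removes every trace of Maggie.
remaining = [r for r in employee if r["emp_dept"] != "D890"]
maggie_still_known = any(r["emp_name"] == "Maggie" for r in remaining)
```

After the partial update Rick has two different addresses on record, and after the department is removed no row mentions Maggie at all.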

To overcome these anomalies we need to normalize the data. In the next section we will
discuss about normalization.

Normalization
Here are the most commonly used normal forms:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:

emp_id  emp_name  emp_address  emp_mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212
                               9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123
                               8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.

This table is not in 1NF: the rule says “each attribute of a table must have atomic
(single) values”, and the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:

emp_id  emp_name  emp_address  emp_mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212
102     Jon       Kanpur       9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123
104     Lester    Bangalore    8123450987
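
The conversion to 1NF shown above amounts to emitting one row per value of the multivalued column. The Python sketch below assumes the non-atomic column is held as a list; the `to_1nf` function name is hypothetical.

```python
def to_1nf(rows, multivalued):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            # Copy the row, replacing the list with a single atomic value.
            flat.append({**row, multivalued: value})
    return flat


employees = [
    {"emp_id": 101, "emp_name": "Herschel", "emp_address": "New Delhi",
     "emp_mobile": ["8912312390"]},
    {"emp_id": 102, "emp_name": "Jon", "emp_address": "Kanpur",
     "emp_mobile": ["8812121212", "9900012222"]},
    {"emp_id": 103, "emp_name": "Ron", "emp_address": "Chennai",
     "emp_mobile": ["7778881212"]},
    {"emp_id": 104, "emp_name": "Lester", "emp_address": "Bangalore",
     "emp_mobile": ["9990000123", "8123450987"]},
]
flat = to_1nf(employees, "emp_mobile")
```

The four original rows become six rows, each with exactly one mobile number, at the cost of repeating the name and address for Jon and Lester.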

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:

• The table is in 1NF (first normal form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they
teach. Since a teacher can teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks like this:

teacher_id  subject    teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, as the rule says “no
non-prime attribute is dependent on a proper subset of any candidate key of the table”.

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id  teacher_age
111         38
222         38
333         40

teacher_subject table:

teacher_id  subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry

Now the tables comply with Second normal form (2NF).
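
The 2NF decomposition above is a pair of projections, and joining them back on teacher_id should return the original rows. The Python sketch below illustrates this with lists of dictionaries; the helper names `project` and `natural_join` are assumptions, not a database API.

```python
def project(rows, attrs):
    """Projection with duplicate rows removed, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right, on):
    """Join two row lists on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


teacher = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]

teacher_details = project(teacher, ["teacher_id", "teacher_age"])
teacher_subject = project(teacher, ["teacher_id", "subject"])
rejoined = natural_join(teacher_subject, teacher_details, "teacher_id")
```

Each teacher's age is now stored once in `teacher_details`, and `rejoined` contains the same five rows as the original table, so no information is lost by the split.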

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:

• The table must be in 2NF.
• Transitive functional dependency of a non-prime attribute on any super key should be
removed.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for
each functional dependency X -> Y at least one of the following conditions holds:

• X is a super key of the table
• Y is a prime attribute of the table

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Example: Suppose a company wants to store the complete address of each employee.
It creates a table named employee_details that looks like this:
emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
1001    John      282005   UP         Agra      Dayal Bagh
1002    Ajeet     222008   TN         Chennai   M-City
1006    Lora      282007   TN         Chennai   Urrapakkam
1101    Lilly     292008   UK         Pauri     Bhagwan
1201    Steve     222999   MP         Gwalior   Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on

Candidate keys: {emp_id}

Non-prime attributes: all attributes except emp_id are non-prime, as they are not part
of any candidate key.

Here, emp_state, emp_city and emp_district are dependent on emp_zip, and emp_zip is
dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city and
emp_district) transitively dependent on the super key (emp_id). This violates the rule of
3NF.
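The 3NF test stated earlier (X is a super key, or Y is a prime attribute) can be sketched in Python using attribute closure. This is a minimal illustration, not a complete normalization tool: the function names are hypothetical, and the dependency emp_id -> emp_name is an assumption that follows from emp_id being the candidate key.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(all_attrs, fds, candidate_keys):
    """List pairs (X, A) where X -> A holds, X is not a super key and A is not prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == all_attrs:
            continue  # X is a super key, so the dependency is allowed
        for attr in sorted(rhs - lhs):  # ignore the trivial part of the dependency
            if attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


attrs = {"emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"}
fds = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),  # assumed: emp_id is the key
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]
keys = [{"emp_id"}]

violations = third_nf_violations(attrs, fds, keys)
for lhs, attr in violations:
    print(lhs, "->", attr)
```

Every reported violation has emp_zip on the left side, which is why the address attributes are the ones that need a table of their own.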

To make this table comply with 3NF, we have to break it into two tables to
remove the transitive dependency:

employee table:
emp_id  emp_name  emp_zip
1001    John      282005
1002    Ajeet     222008
1006    Lora      282007
1101    Lilly     292008
1201    Steve     222999

employee_zip table:

emp_zip  emp_state  emp_city  emp_district
282005   UP         Agra      Dayal Bagh
222008   TN         Chennai   M-City
282007   TN         Chennai   Urrapakkam
292008   UK         Pauri     Bhagwan
222999   MP         Gwalior   Ratan

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial
functional dependency X->Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

emp_id  emp_nationality  emp_dept                      dept_type  dept_no_of_emp
1001    Austrian         Production and planning       D001       200
1001    Austrian         stores                        D001       250
1002    American         design and technical support  D134       100
1002    American         Purchasing department         D134       600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF because neither emp_id nor emp_dept alone is a key, yet each
appears on the left side of a functional dependency.

To make the table comply with BCNF, we can break it into three tables like this:
emp_nationality table:

emp_id  emp_nationality
1001    Austrian
1002    American

emp_dept table:

emp_dept                      dept_type  dept_no_of_emp
Production and planning       D001       200
stores                        D001       250
design and technical support  D134       100
Purchasing department         D134       600

emp_dept_mapping table:

emp_id  emp_dept
1001    Production and planning
1001    stores
1002    design and technical support
1002    Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

The design is now in BCNF because in both functional dependencies the left side is a
key of its table.
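A decomposition is only useful if the original rows can be rebuilt from the smaller tables. The following Python sketch rejoins the three tables of this example and compares the result with the original table. Storing the tables as dictionaries keyed by their candidate keys is a choice made for this illustration only.

```python
# Original table from the example, one tuple per row.
original = {
    (1001, "Austrian", "Production and planning", "D001", 200),
    (1001, "Austrian", "stores", "D001", 250),
    (1002, "American", "design and technical support", "D134", 100),
    (1002, "American", "Purchasing department", "D134", 600),
}

# emp_nationality table, keyed by emp_id.
emp_nationality = {1001: "Austrian", 1002: "American"}

# emp_dept table, keyed by emp_dept; values are (dept_type, dept_no_of_emp).
emp_dept = {
    "Production and planning": ("D001", 200),
    "stores": ("D001", 250),
    "design and technical support": ("D134", 100),
    "Purchasing department": ("D134", 600),
}

# emp_dept_mapping table: its key is the pair (emp_id, emp_dept).
emp_dept_mapping = [
    (1001, "Production and planning"),
    (1001, "stores"),
    (1002, "design and technical support"),
    (1002, "Purchasing department"),
]

# Natural join: follow each mapping row into the other two tables.
rejoined = {
    (emp_id, emp_nationality[emp_id], dept, *emp_dept[dept])
    for emp_id, dept in emp_dept_mapping
}

print(rejoined == original)  # True
```

Because the join reproduces exactly the original rows, no information was lost or invented by splitting the table.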

DATABASE DESIGN METHODOLOGY

Designing the database is the most important responsibility of the software
professionals who deal with database-related projects. For this
they follow a design methodology. It helps the designer to plan, manage,
control, and evaluate database development projects.
Design methodology: A structured approach that uses procedures,
techniques, tools, and documentation aids to support and facilitate the
process of design.
A design methodology consists of phases each containing a number of steps,
which guide the designer in the techniques appropriate at each stage of the
project.

Phases of Design Methodology


The database design methodology is divided into three main phases. These
are:

• Conceptual database design
• Logical database design
• Physical database design

Conceptual database design


The process of constructing a model of the information used in an
enterprise, independent of all physical considerations.
The conceptual database design phase begins with the creation of a
conceptual data model of the enterprise, which is entirely independent of
implementation details such as the target DBMS, application programs,
programming languages, hardware platform, performance issues, or any
other physical considerations.

Logical database design


It is the process of constructing a model of the information used in an
enterprise based on a specific data model, but independent of a particular
DBMS and other physical considerations.
The logical database design phase maps the conceptual model onto a
logical model, which is influenced by the data model for the target database
(for example, the relational model). The logical data model is a source of
information for the physical design phase.
The output of this process is a global logical data model consisting of an
Entity-Relationship diagram, relational schema, and supporting
documentation that describes this model, such as a data dictionary.
Together, these represent the sources of information for the physical design
process, and they give the physical database designer a basis for
making the tradeoffs that an efficient database design requires.

Physical database design


It is a description of the implementation of the database on secondary
storage; it describes the base relations, file organizations, and indexes used
to achieve efficient access to the data, and any associated integrity
constraints and security measures.
Whereas logical database design is concerned with the what, physical
database design is concerned with the how. The physical database design
phase allows the designer to make decisions on how the database is to be
implemented. Therefore, physical design is tailored to a specific DBMS.
There is feedback between physical and logical design, because decisions
taken during physical design to improve performance may affect the
logical data model.
For example, merging relations together to improve performance might change
the structure of the logical data model, which will have an associated effect
on the application design.

Steps of physical database design methodology


After the logical database model has been designed, the steps of the physical
database design methodology are as follows:
Step 1: Translate global logical data model for target DBMS
It includes operations such as the design of base relations, derived data,
and enterprise constraints.
Step 2: Design physical representation
It includes operations such as analyzing transactions, selecting file
organizations, selecting indexes, and estimating the disk space
requirements.
Step 2 is the most important part of the physical design of a database. It
is used to determine the optimal file organizations to store the base
relations and the indexes that are required to achieve acceptable
performance, that is, the way in which relations and tuples will be held on
secondary storage.
One of the main objectives of physical database design is to store data in an
efficient way. There are a number of factors that we may use to measure
efficiency:
• Transaction throughput
This is the number of transactions that can be processed in a given time
interval.
In some systems, such as airline reservations, high transaction throughput
is critical to the overall success of the system.
• Disk storage
This is the amount of disk space required to store the database files. The
designer may wish to minimize the amount of disk storage used.
• Response time
It is the time required for the completion of a single transaction. From a
user's point of view, we want to minimize response time as much as
possible.
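The three measures above reduce to simple arithmetic. The Python sketch below shows one way to compute them. All numbers are hypothetical, the function names are invented for this illustration, and the disk estimate is deliberately rough: it ignores indexes, page overhead, and free space, which a real estimate for a target DBMS would have to include.

```python
def throughput(transactions, seconds):
    """Transactions processed per second over a measurement interval."""
    return transactions / seconds


def mean_response_time(durations):
    """Average completion time of the measured transactions, in seconds."""
    return sum(durations) / len(durations)


def disk_storage_bytes(row_count, avg_row_bytes):
    """Rough data size of one relation: number of rows times average row size."""
    return row_count * avg_row_bytes


# Hypothetical measurements for illustration only.
tps = throughput(transactions=1200, seconds=60)
avg_response = mean_response_time([0.5, 1.0, 1.5])
size = disk_storage_bytes(row_count=1_000_000, avg_row_bytes=120)

print(tps)           # 20.0
print(avg_response)  # 1.0
print(size)          # 120000000
```

These measures often pull in different directions: an extra index may shorten response time while increasing disk storage, so the designer has to weigh them against each other.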

First normal form (1NF)

First normal form:

A relation is in first normal form if every attribute in every row can contain only one single (atomic)
value.

A university uses the following relation:

Student(Surname, Name, Skills)

The attribute Skills can contain multiple values and therefore the relation is not in the first normal form.

But the attributes Name and Surname are atomic attributes that can contain only one value.
Example: First normal form

To get to the first normal form (1NF), we must create a separate tuple for each value of the multivalued
attribute.
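Creating one tuple per value can be sketched in a few lines of Python. The sample rows and the helper name to_1nf are hypothetical; only the relation Student(Surname, Name, Skills) comes from the text.

```python
# Hypothetical rows: the Skills attribute holds several values per student.
students = [
    {"Surname": "Meier", "Name": "Anna", "Skills": ["SQL", "Java"]},
    {"Surname": "Keller", "Name": "Tom", "Skills": ["Python"]},
]


def to_1nf(rows, multivalued):
    """Emit one row per value of the multivalued attribute (hypothetical helper)."""
    result = []
    for row in rows:
        for value in row[multivalued]:
            flat = dict(row)           # copy the atomic attributes
            flat[multivalued] = value  # replace the list by a single value
            result.append(flat)
    return result


flat_students = to_1nf(students, "Skills")
for row in flat_students:
    print(row)
```

After the conversion every attribute in every row holds a single value, at the cost of repeating Surname and Name for each skill.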

Second normal form (2NF)

Second normal form:

A relation is in second normal form if it is in 1NF and every non key attribute is fully functionally
dependent on the primary key.

A university uses the following relation:

Student(IDSt, StudentName, IDProf, ProfessorName, Grade)

The attributes IDSt and IDProf together form the identification key.

All attributes are single valued (1NF).

The following functional dependencies exist:

1. The attribute ProfessorName is functionally dependent on attribute IDProf (IDProf --> ProfessorName)

2. The attribute StudentName is functionally dependent on IDSt (IDSt --> StudentName)

3. The attribute Grade is fully functionally dependent on IDSt and IDProf (IDSt, IDProf --> Grade)

Example Second normal form

The table in this example is in first normal form (1NF) since all attributes are single valued. But it is not
yet in 2NF. If student 1 leaves university and the tuple is deleted, then we lose all information about
professor Schmid, because the professor's name is stored only in tuples identified by the key (IDSt, IDProf),
although it depends on IDProf alone. To solve this
problem, we must create a new table Professor with the attribute Professor (the name) and the key
IDProf. The third table Grade is necessary for combining the two relations Student and Professor and to
manage the grades. Besides the grade it contains only the two IDs of the student and the professor. If
a student is now deleted, we do not lose the information about the professor.
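The deletion problem described above can be reproduced in a small Python sketch. The sample rows are hypothetical; only "student 1" and "professor Schmid" are taken from the text, and the IDs and grades are invented for illustration.

```python
# Hypothetical rows of Student(IDSt, StudentName, IDProf, ProfessorName, Grade).
combined = [
    {"IDSt": 1, "StudentName": "Mueller", "IDProf": 3, "ProfessorName": "Schmid", "Grade": 5},
    {"IDSt": 2, "StudentName": "Meier", "IDProf": 2, "ProfessorName": "Borner", "Grade": 4},
    {"IDSt": 3, "StudentName": "Tobler", "IDProf": 1, "ProfessorName": "Bernasconi", "Grade": 6},
]

# 2NF decomposition: each non-key attribute moves to the table whose key determines it.
student = {r["IDSt"]: r["StudentName"] for r in combined}
professor = {r["IDProf"]: r["ProfessorName"] for r in combined}
grade = {(r["IDSt"], r["IDProf"]): r["Grade"] for r in combined}

# Single table: deleting student 1 also removes the only row that mentions Schmid.
combined_after = [r for r in combined if r["IDSt"] != 1]
lost = "Schmid" not in {r["ProfessorName"] for r in combined_after}

# Decomposed tables: deleting student 1 touches Student and Grade only.
student.pop(1)
grade = {key: value for key, value in grade.items() if key[0] != 1}
kept = "Schmid" in professor.values()

print(lost)  # True
print(kept)  # True
```

The Professor table survives the deletion unchanged, which is exactly the behaviour the 2NF split is meant to achieve.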

Third normal form (3NF)

Third normal form:

A relation is in third normal form if it is in 2NF and no non key attribute is transitively dependent on the
primary key.

A bank uses the following relation:

Vendor(ID, Name, Account_No, Bank_Code_No, Bank)

The attribute ID is the identification key. All attributes are single valued (1NF). The table is also in 2NF.

The following dependencies exist:

1. Name, Account_No, Bank_Code_No are functionally dependent on ID (ID --> Name, Account_No,
Bank_Code_No)

2. Bank is functionally dependent on Bank_Code_No (Bank_Code_No --> Bank)

Example Third normal form

The table in this example is in 1NF and in 2NF. But there is a transitive dependency: ID determines
Bank_Code_No, and Bank_Code_No determines Bank, although Bank_Code_No is not the primary key of this
relation. To get to the third normal form (3NF), we have to put the bank name in a separate table together
with the bank code number (clearing number) that identifies it.
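One practical effect of this split can be sketched in Python: once the bank name lives in its own table, renaming a bank is a single update instead of one update per vendor. The vendor rows, bank names, and the helper bank_of are all hypothetical and serve only as an illustration.

```python
# Hypothetical rows of Vendor(ID, Name, Account_No, Bank_Code_No, Bank).
vendors = [
    {"ID": 1, "Name": "Acme", "Account_No": "1001", "Bank_Code_No": 700, "Bank": "First Bank"},
    {"ID": 2, "Name": "Bolt", "Account_No": "1002", "Bank_Code_No": 700, "Bank": "First Bank"},
    {"ID": 3, "Name": "Core", "Account_No": "2001", "Bank_Code_No": 800, "Bank": "City Bank"},
]

# 3NF decomposition: Vendor keeps the bank code, Bank maps each code to a name.
vendor = [
    {k: r[k] for k in ("ID", "Name", "Account_No", "Bank_Code_No")}
    for r in vendors
]
bank = {r["Bank_Code_No"]: r["Bank"] for r in vendors}

# Renaming a bank now changes exactly one entry.
bank[700] = "First National Bank"


def bank_of(vendor_id):
    """Look up a vendor's bank name through its bank code (hypothetical helper)."""
    row = next(r for r in vendor if r["ID"] == vendor_id)
    return bank[row["Bank_Code_No"]]


print(bank_of(1))  # First National Bank
print(bank_of(2))  # First National Bank
print(bank_of(3))  # City Bank
```

In the original single table the same rename would have to be applied to every vendor row with that bank code, and missing one row would leave the data inconsistent.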
