DBMS Unit III
A simple ER Diagram:
The following diagram has two entities, Student and College, and the relationship
between them. The relationship between Student and College is many-to-one: a college
can have many students, but a student cannot study in multiple colleges at the
same time. The Student entity has the attributes Stu_Id, Stu_Name and Stu_Addr, and
the College entity has the attributes Col_ID and Col_Name.
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss
these terms in detail in the next section (Components of an ER Diagram) of this guide, so
don't worry too much about these terms now; just go through them once.
Components of an ER Diagram
1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an
ER diagram.
For example: In the following ER diagram we have two entities Student and College and
these two entities have many to one relationship as many students study in a single
college. We will read more about relationships later, for now focus on entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its
relationship with another entity is called a weak entity. A weak entity is represented by a
double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so bank account is a weak entity.
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an
ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity within an entity set. For example, a student
roll number can uniquely identify a student within a set of students. A key attribute is
represented by an oval like other attributes, but the text of a key attribute is
underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For
example, in the Student entity, the student address is a composite attribute, as an address is
itself made up of other attributes.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented with double ovals in an ER Diagram. For example, a person can have
more than one phone number, so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It
is represented by a dashed oval in an ER Diagram. For example, a person's age is a
derived attribute, as it changes over time and can be derived from another attribute
(date of birth).
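The four attribute types can be sketched in code. The following Python example is only an illustration: the class names, the fields of Address and the sample values are assumptions, not part of the ER notation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    # Composite attribute: built from several simpler attributes (assumed fields).
    street: str
    city: str
    pin_code: str


@dataclass
class Person:
    person_id: int        # key attribute: unique for every person
    address: Address      # composite attribute
    date_of_birth: date   # ordinary stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        # Derived attribute: computed from date_of_birth instead of being stored.
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years
```

Passing the current date as a parameter keeps the derived value reproducible; a stored age column would go stale as time passes.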
3. Relationship
Structural Constraints
Cardinality Constraint-
Cardinality constraint defines the maximum number of relationship instances in which an entity can
participate.
1. Many-to-Many Cardinality-
Symbol Used-
Example-
Here,
One student can enroll in any number (zero or more) of courses.
One course can have any number (zero or more) of students enrolled in it.
2. Many-to-One Cardinality-
Symbol Used-
Example-
Here,
One student can enroll in at most one course.
One course can have any number (zero or more) of students enrolled in it.
3. One-to-Many Cardinality-
Symbol Used-
Example-
Here,
One student can enroll in any number (zero or more) of courses.
One course can have at most one student enrolled in it.
4. One-to-One Cardinality-
Symbol Used-
Example-
Here,
One student can enroll in at most one course.
One course can have at most one student enrolled in it.
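The four cardinalities can be told apart by counting how often each entity takes part in the relationship. The Python sketch below is illustrative only: the function name and the sample pairs are assumptions. Cardinality is a constraint on the schema, so the function reports only the tightest cardinality consistent with a given set of instances.

```python
from collections import Counter


def cardinality(pairs):
    """Classify a Student-Course relationship from its (student, course) instances."""
    pairs = set(pairs)
    courses_per_student = Counter(s for s, _ in pairs)
    students_per_course = Counter(c for _, c in pairs)
    student_max_one = all(n <= 1 for n in courses_per_student.values())
    course_max_one = all(n <= 1 for n in students_per_course.values())
    if student_max_one and course_max_one:
        return "one-to-one"
    if student_max_one:
        return "many-to-one"   # many students may share one course
    if course_max_one:
        return "one-to-many"   # one student may take many courses
    return "many-to-many"
```

For instance, `cardinality([("s1", "c1"), ("s2", "c1")])` yields `"many-to-one"`, because each student appears once while course c1 appears twice.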
Participation Constraints-
Participation constraints define the least number of relationship instances in which an entity must
compulsorily participate.
1. Total participation
2. Partial participation
1. Total Participation-
It specifies that each entity in the entity set must compulsorily participate in at least one
relationship instance in that relationship set.
That is why it is also called mandatory participation.
Total participation is represented using a double line between the entity set and relationship set.
Example-
Here,
Double line between the entity set “Student” and relationship set “Enrolled in” signifies total
participation.
It specifies that each student must be enrolled in at least one course.
2. Partial Participation-
It specifies that each entity in the entity set may or may not participate in the relationship instance
in that relationship set.
That is why it is also called optional participation.
Partial participation is represented using a single line between the entity set and relationship set.
Example-
Here,
Single line between the entity set “Course” and relationship set “Enrolled in” signifies partial
participation.
It specifies that there might exist some courses for which no enrollments are made.
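Whether a given set of instances satisfies total participation can be checked by comparing the entity set with the entities that appear in the relationship. This is a minimal Python sketch; the function name and the sample students and courses are hypothetical.

```python
def participation(entities, participants):
    """Return 'total' if every entity appears in the relationship, else 'partial'."""
    return "total" if set(entities) <= set(participants) else "partial"


# Hypothetical entity sets and "Enrolled in" relationship instances.
students = {"s1", "s2"}
courses = {"c1", "c2", "c3"}
enrolled_in = {("s1", "c1"), ("s2", "c1"), ("s2", "c2")}

student_side = participation(students, (s for s, _ in enrolled_in))
course_side = participation(courses, (c for _, c in enrolled_in))
```

Here every student is enrolled in some course, so the Student side is total, while course c3 has no enrollments, so the Course side is partial.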
Candidate Key: A minimal set of attributes that can uniquely identify a tuple is known as a
candidate key. For example, STUD_NO in the STUDENT relation.
The value of a candidate key is unique and non-null for every tuple.
There can be more than one candidate key in a relation. For example, STUD_NO and
STUD_PHONE are both candidate keys for the relation STUDENT.
A candidate key can be simple (having only one attribute) or composite. For
example, {STUD_NO, COURSE_NO} is a composite candidate key for the relation
STUDENT_COURSE.
Note: In SQL Server, a unique constraint on a nullable column allows the value NULL in
that column only once. That is why STUD_PHONE can be treated as a candidate key here; a
primary key attribute, however, can never contain NULL values.
Super Key: A set of attributes that can uniquely identify a tuple is known as a super key. For
example, STUD_NO, (STUD_NO, STUD_NAME), etc.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but the converse is not true.
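The difference between super keys and candidate keys can be checked mechanically on sample data. The Python sketch below is an illustration under assumptions: the function names and the three STUDENT rows are hypothetical, and the result describes only this instance, since a real key is a constraint on every possible instance of the relation.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys of this instance, smallest first."""
    attributes = sorted(rows[0])
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical STUDENT rows; two students share a name.
student = [
    {"STUD_NO": 1, "STUD_NAME": "Ram", "STUD_PHONE": "9000000001"},
    {"STUD_NO": 2, "STUD_NAME": "Ram", "STUD_PHONE": "9000000002"},
    {"STUD_NO": 3, "STUD_NAME": "Sita", "STUD_PHONE": "9000000003"},
]
```

On these rows (STUD_NO, STUD_NAME) is a super key but not a candidate key, because STUD_NO alone already identifies every row.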
Primary Key: A relation can have more than one candidate key, out of which one is
chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate
keys for the relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of many
candidate keys).
Alternate Key: A candidate key other than the primary key is called an alternate key. For
example, STUD_NO and STUD_PHONE are both candidate keys for the relation STUDENT;
if STUD_NO is the primary key, STUD_PHONE is an alternate key.
Foreign Key: A foreign key is a key used to link two tables together. It is
sometimes also called a referencing key.
A foreign key is a column or a combination of columns whose values match
a primary key in a different table.
The relationship between two tables matches the primary key in one
of the tables with a foreign key in the second table.
If a table has a primary key defined on any field(s), then it cannot have
two records with the same value in those field(s).
For example, STUD_NO in the STUDENT_COURSE relation is not unique; it is repeated
in the first and third tuples. However, STUD_NO in the STUDENT relation is a primary key, so it
must always be unique and cannot be null.
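The behaviour of primary and foreign keys can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types and sample values are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, STUD_NAME TEXT)")
conn.execute(
    """CREATE TABLE STUDENT_COURSE (
           STUD_NO   INTEGER NOT NULL REFERENCES STUDENT (STUD_NO),
           COURSE_NO TEXT    NOT NULL,
           PRIMARY KEY (STUD_NO, COURSE_NO)
       )"""
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'Ram')")
# The same STUD_NO may repeat in the referencing table.
conn.executemany("INSERT INTO STUDENT_COURSE VALUES (?, ?)", [(1, "C1"), (1, "C2")])


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A second student with STUD_NO 1 violates the primary key.
duplicate_pk = try_insert("INSERT INTO STUDENT VALUES (1, 'Sita')")
# STUD_NO 99 does not exist in STUDENT, so the foreign key is violated.
dangling_fk = try_insert("INSERT INTO STUDENT_COURSE VALUES (99, 'C1')")
```

Both of the last two inserts are rejected, while the repeated STUD_NO in STUDENT_COURSE is accepted because only the pair (STUD_NO, COURSE_NO) has to be unique there.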
Fan Trap
A fan trap occurs when two or more one-to-many relationships fan out from a single
entity.
For example: Consider a database of Department, Site and Staff, where
one site can contain a number of departments, but a department is situated
at only a single site. Multiple staff members work at a single
site, and a staff member can work from only a single site. This case is
represented in the E-R diagram shown.
The problem with this E-R diagram is that the question of which staff work in a particular
department remains unanswered. The solution is to restructure the original E-R
model to represent the correct association, as shown.
In other words, the two entities should have a direct relationship between
them to provide the necessary information.
There is another way to solve the problem in the E-R diagram of the figure:
introducing a direct relationship between DEPT and STAFF, as shown in the
figure.
The problem with the above E-R diagram is that it cannot tell which member
of staff uses a particular car. For example, it is not possible to tell which
member of staff uses car SH34.
With this relationship the fan trap is resolved, and it is now possible to tell
that car SH34 is used by S1500, as shown in the figure. That is, it is now possible to
tell which car is used by which staff member.
Chasm Trap
As discussed earlier, a chasm trap occurs when a model suggests the
existence of a relationship between entity types, but the pathway does not
exist between certain entity occurrences.
It occurs where a relationship with partial participation
forms part of the pathway between related entities.
For example: Consider a database where a single branch is
allocated many staff, who handle the management of properties for rent.
Not all staff members handle a property, and not every property is managed
by a member of staff. This case is represented in the E-R diagram.
The above E-R diagram cannot represent which properties are
available at a branch. The partial participation of Staff and Property in the
SP relationship means that some properties cannot be associated with a branch
office through a member of staff.
We need to add the missing relationship, called BP, between the
Branch and Property entities, as shown.
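The Branch, Staff and Property chasm trap can be reproduced with a few dictionaries. The Python sketch below is illustrative only: the identifiers B1, S1, S2, P1 and P2 and the function names are hypothetical.

```python
# Hypothetical occurrences; the identifiers are illustrative only.
staff_branch = {"S1": "B1", "S2": "B1"}      # each staff member belongs to a branch
property_staff = {"P1": "S1", "P2": None}    # SP: property P2 is managed by nobody
property_branch = {"P1": "B1", "P2": "B1"}   # the added direct BP relationship


def branch_via_staff(prop):
    """Follow Property -> Staff -> Branch; fails when no staff member manages prop."""
    staff = property_staff.get(prop)
    return staff_branch.get(staff) if staff is not None else None


def branch_direct(prop):
    """Use the direct BP relationship instead of going through Staff."""
    return property_branch.get(prop)
```

For P2 the path through Staff returns nothing, which is the chasm, while the direct BP lookup still finds branch B1.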
The problem with the above E-R diagram is that it is not possible to tell at which
branch staff member S0003 works, as shown.
That is, the above E-R diagram cannot represent the relationship
between BRANCH and STAFF because of the partial participation of the CAR and
STAFF entities. We need to add the missing relationship, called BS,
between the BRANCH and STAFF entities, as shown.
With this relationship the chasm trap is resolved, and it is now possible to
tell at which branch each member of staff works, as for
staff member S003 in our example, as shown.
The relationship between sub class and super class is denoted with symbol.
1. Super Class
A super class is an entity type that has a relationship with one or more subtypes.
An entity cannot exist in the database merely by being a member of a super class.
For example: the Shape super class has the sub groups Square, Circle and Triangle.
2. Sub Class
A sub class is a group of entities with unique attributes.
A sub class inherits properties and attributes from its super class.
For example: Square, Circle and Triangle are sub classes of the Shape super class.
1. Generalization
Generalization is the process of forming a general entity that contains the properties
common to all the entities being generalized.
It is a bottom-up approach, in which two or more lower-level entities combine to form a higher-level
entity.
Generalization is the reverse of specialization.
It defines a general entity type from a set of specialized entity types.
It minimizes the differences between the entities by identifying their common features.
For example:
In the above example, Tiger, Lion, Elephant can all be generalized as Animals.
2. Specialization
Specialization is a process that divides a group of entities into sub groups
based on their characteristics.
It is a top-down approach, in which one higher-level entity is broken down into two
or more lower-level entities.
It maximizes the differences between the members of an entity by identifying the
unique characteristics or attributes of each member.
It defines one or more sub classes for the super class and also forms the
superclass/subclass relationship.
For example
3. Category or Union
A category represents a single superclass/subclass relationship with more than one
super class.
Its participation can be total or partial.
For example, in car booking, a car owner can be a person, a bank (which holds possession of a
car) or a company. The category (sub class) Owner is a subset of the union of the three
super classes Company, Bank and Person. A category member must exist in at least
one of its super classes.
4. Aggregation
Aggregation is a process that represents a relationship between a whole object and its
component parts.
It abstracts a relationship between objects by viewing the relationship itself as an object.
In aggregation, a relationship between two entities is treated as a single entity.
In the above example, the relationship between College and Course acts as an entity in
a relationship with Student.
Example:
In this example, if we know the value of Employee number, we can obtain Employee
Name, city, salary, etc.
From this, we can say that city, Employee Name and salary are functionally
dependent on Employee number.
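A functional dependency can be tested against sample rows: whenever two rows agree on the determinant, they must also agree on the dependent attributes. The Python sketch below is illustrative; the function name, the column names and the employee rows are hypothetical, and passing the check on one instance does not prove that the dependency holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical employee rows.
employees = [
    {"emp_no": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"emp_no": 2, "name": "Ravi", "city": "Pune", "salary": 60000},
    {"emp_no": 3, "name": "Asha", "city": "Delhi", "salary": 50000},
]
```

On these rows emp_no determines name, city and salary, whereas city does not determine salary, because the two Pune rows carry different salaries.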
Key terms
Here are some key terms for functional dependency:
Axiom: Axioms are a set of inference rules used to infer all the functional dependencies on a
relational database.
Decomposition: A rule that suggests that if a table appears to contain two entities which
are determined by the same primary key, you should consider breaking them up
into two different tables.
Union: It suggests that if two tables are separate and the PK is the same, you should consider
putting them together.
Example:
In this example, maf_year and color are independent of each other but dependent
on car_model. These two columns are therefore said to be multivalued
dependent on car_model.
car_model -> colour
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Company    CEO              Age
Microsoft  Satya Nadella    51
Example:
{Company} -> {CEO} (if we know the Company, we know the CEO's name)
But CEO is not a subset of Company, and hence it is a non-trivial functional
dependency.
Transitive dependency:
A transitive dependency is a type of functional dependency that arises when it is indirectly
formed by two functional dependencies.
Example:
Company    CEO              Age
Microsoft  Satya Nadella    51
Alibaba    Jack Ma          54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, by transitivity, {Company} -> {Age} should hold. That makes sense because if we know the
company name, we can know its CEO's age.
Note: You need to remember that transitive dependency can only occur in a relation
of three or more attributes.
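Transitive dependencies can be derived mechanically by computing the closure of an attribute set under the known functional dependencies. The sketch below is a minimal Python illustration; the function name is an assumption, and {CEO} -> {Age} is assumed as the second dependency, in line with the Age column of the example table.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once every attribute on the left is known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Each dependency is a (determinant, dependent) pair of attribute sets.
fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]
company_closure = closure({"Company"}, fds)
```

The closure of {Company} contains Age even though no listed dependency has Company directly determining Age, which is exactly the transitive dependency {Company} -> {Age}.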
What is Normalization?
Normalization is a method of organizing the data in the database which helps you to
avoid data redundancy and insertion, update and deletion anomalies. It is a process of
analyzing relation schemas based on their functional dependencies and
primary keys.
Summary
Functional Dependency is when one attribute determines another attribute in
a DBMS system.
Axiom, Decomposition, Dependent, Determinant, Union are key terms for
functional dependency
Four types of functional dependency are 1) Multivalued 2) Trivial 3) Non-trivial
4) Transitive
Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table
A trivial dependency is one in which the dependent set of attributes is
included in the determinant, i.e. A->B is trivial when B is a subset of A
Nontrivial dependency occurs when A->B holds true where B is not a subset
of A
A transitive dependency is a type of functional dependency which arises when it is
indirectly formed by two functional dependencies
Normalization is a method of organizing the data in the database which helps
you to avoid data redundancy
Functional dependency helps you to maintain the quality of data in the
database
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized.
These are – Insertion, update and deletion anomaly. Let’s take an example to
understand this.
The above table is not normalized. We will see the problems that we face when a table
is not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he
belongs to two departments of the company. If we want to update Rick's address,
we have to update it in both rows or the data will become inconsistent. If
the correct address somehow gets updated in one department's row but not in the other, then
according to the database Rick would have two different addresses, which is not correct
and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training
and currently not assigned to any department then we would not be able to insert the
data into the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department
D890 then deleting the rows that are having emp_dept as D890 would also delete the
information of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will
discuss normalization.
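The update and delete anomalies can be reproduced on a small unnormalized table. In the Python sketch below only the names Rick, Maggie, D890 and emp_dept come from the text; every other column name and value is hypothetical.

```python
# Hypothetical unnormalized table: one row per (employee, department) pair.
employee = [
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 101, "emp_name": "Rick", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Maggie", "emp_address": "Agra", "emp_dept": "D890"},
]

# Update anomaly: the address is changed in only one of Rick's two rows.
employee[0]["emp_address"] = "Mumbai"
rick_addresses = {r["emp_address"] for r in employee if r["emp_name"] == "Rick"}

# Delete anomaly: closing department D890 also removes every trace of Maggie.
remaining = [r for r in employee if r["emp_dept"] != "D890"]
maggie_known = any(r["emp_name"] == "Maggie" for r in remaining)
```

After the partial update Rick has two different addresses on record, and after the delete the table no longer knows that Maggie exists.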
Normalization
Here are the most commonly used normal forms:
Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:
emp_id  emp_name  emp_address  emp_mobile
9900012222
9990000123
Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.
This table is not in 1NF, as the rule says “each attribute of a table must have atomic
(single) values”; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
emp_id  emp_name  emp_address  emp_mobile
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they
teach. They create a table that looks like the one below. Since a teacher can teach more than one
subject, the table can have multiple rows for the same teacher.
teacher_id  subject    teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age depends on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says
“no non-prime attribute is dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id  teacher_age
111         38
222         38
333         40
teacher_subject table:
teacher_id  subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry
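The split into teacher_details and teacher_subject loses no information: joining the two tables on teacher_id gives back the original rows. A minimal Python sketch using the teacher data from this example (the variable names are assumptions):

```python
# Rows of the original table: (teacher_id, subject, teacher_age).
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Project onto the two smaller schemas; sets drop the repeated ages.
teacher_details = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _ in teacher}

# A natural join on teacher_id rebuilds the original rows.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_details
    if tid == tid2
}
```

Each teacher's age is now stored once in teacher_details, rather than once per subject taught.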
An attribute that is not part of any candidate key is known as non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for
each functional dependency X -> Y at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Example: Suppose a company wants to store the complete address of each employee,
they create a table named employee_details that looks like this:
emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
To make this table comply with 3NF we have to break the table into two tables to
remove the transitive dependency:
employee table:
emp_id  emp_name  emp_zip
employee_zip table:
emp_zip  emp_state  emp_city  emp_district
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:
emp_id  emp_nationality  emp_dept  dept_type  dept_no_of_emp
The table is not in BCNF, as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:
emp_id  emp_nationality
1001    Austrian
1002    American
emp_dept table:
emp_dept  dept_type  dept_no_of_emp
emp_dept_mapping table:
emp_id  emp_dept
1001    stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
This is now in BCNF, as in both functional dependencies the left-hand side is a key.
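The BCNF condition (the left-hand side of every functional dependency must be a super key) can be checked with an attribute closure. The Python sketch below is a simplified illustration: the function names are assumptions, and it tests only the listed dependencies against each table rather than projecting the full set of implied dependencies.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Listed FDs that apply to this table but whose left side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        # The FD applies if its left side and some new right-side attribute are in the table.
        applies = lhs <= attributes and bool((rhs & attributes) - lhs)
        if applies and not attributes <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]
original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
```

Calling `bcnf_violations(original, fds)` reports both dependencies, while each of the three decomposed tables yields an empty list.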
A relation is in first normal form if every attribute in every row can contain only one single (atomic)
value.
The attribute Skills can contain multiple values and therefore the relation is not in the first normal form.
But the attributes Name and Surname are atomic attributes that can contain only one value.
To get to the first normal form (1NF) we must create a separate tuple for each value of the multivalued
attribute.
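Creating one tuple per value can be sketched in a few lines of Python. The function name and the sample people are hypothetical; only the attribute names Name, Surname and Skills come from the example.

```python
def to_1nf(rows, multivalued):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            # Copy the row, replacing the list of values with a single value.
            flat.append({**row, multivalued: value})
    return flat


# Hypothetical rows in which Skills holds several values.
people = [
    {"Name": "Anna", "Surname": "Meier", "Skills": ["SQL", "Python"]},
    {"Name": "Ben", "Surname": "Kurz", "Skills": ["Java"]},
]
flat = to_1nf(people, "Skills")
```

Anna now appears in two tuples, one per skill, so every Skills value in the result is atomic.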
A relation is in second normal form if it is in 1NF and every non key attribute is fully functionally
dependent on the primary key.
1. The attribute ProfessorName is functionally dependent on attribute IDProf (IDProf --> ProfessorName)
3. The attribute Grade is fully functionally dependent on IDSt and IDProf (IDSt, IDProf --> Grade)
The table in this example is in first normal form (1NF) since all attributes are single valued. But it is not
yet in 2NF. If student 1 leaves the university and the tuple is deleted, then we lose all information about
professor Schmid, since this attribute is fully functionally dependent on the primary key IDSt. To solve this
problem, we must create a new table Professor with the attribute Professor (the name) and the key
IDProf. The third table Grade is necessary for combining the two relations Student and Professor and to
manage the grades. Besides the grade it contains only the two IDs of the student and the professor. If
a student is now deleted, we do not lose the information about the professor.
A relation is in third normal form if it is in 2NF and no non key attribute is transitively dependent on the
primary key.
The attribute ID is the identification key. All attributes are single valued (1NF). The table is also in 2NF.
1. Name, Account_No, Bank_Code_No are functionally dependent on ID (ID --> Name, Account_No,
Bank_Code_No)
The table in this example is in 1NF and in 2NF. But there is a transitive dependency between
Bank_Code_No and Bank, because Bank_Code_No is not the primary key of this relation. To get to the
third normal form (3NF), we have to put the bank name in a separate table together with the clearing
number to identify it.