Unit 2 Adbms
DATABASE DESIGN
Functional Dependencies, Non-loss Decomposition, First, Second, Third Normal Forms, Dependency Preservation,
Boyce/Codd Normal Form, Multi-valued Dependencies and Fourth Normal Form, Join Dependencies and Fifth Normal Form
Entity:
An entity is a real-world object that can be easily identified.
Example:
In a school database, students, teachers, classes, and courses offered can be considered
as entities.
All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities.
An entity set may contain entities whose attributes share similar values.
Example:
A Students set may contain all the students of a school; likewise a Teachers set may
contain all the teachers of a school from all faculties.
Attributes:
Entities are represented by means of their properties, called attributes.
All attributes have values.
Example:
A student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes.
Example:
A student's name cannot be a numeric value; it has to be alphabetic. A student's age
cannot be negative, etc.
Types of Attributes:
Simple attribute: Simple attributes are atomic values, which cannot be divided further.
Example: a student's phone number is an atomic value of 10 digits.
Composite attribute: Composite attributes are made of more than one simple attribute.
Example: a student's complete name may have first_name and last_name.
Derived attribute: Derived attributes are attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database.
Example: average_salary in a department should not be saved
directly in the database; instead, it can be derived.
Example: age can be derived from date_of_birth.
Single-value attribute: Single-value attributes contain a single value.
Example: Social_Security_Number.
Multi-value attribute: Multi-value attributes may contain more than one value.
Example: a person can have more than one phone number,
email_address, etc.
These attribute types can be combined as follows:
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
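As a rough illustration of these attribute types, here is a minimal Python sketch. The class and field names (`Student`, `Name`, `phone_numbers`, `age`) and the sample values are hypothetical; the point is only that a composite attribute is a nested structure, a multi-valued attribute is a collection, and a derived attribute is computed rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:
    # Composite attribute: built from simpler component attributes.
    first_name: str
    last_name: str


@dataclass
class Student:
    roll_number: int          # simple, single-valued attribute
    name: Name                # composite, single-valued attribute
    date_of_birth: date       # stored attribute from which age is derived
    phone_numbers: list[str] = field(default_factory=list)  # simple, multi-valued

    def age(self, today: date) -> int:
        # Derived attribute: computed on demand from date_of_birth, never stored.
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


s = Student(1, Name("Asha", "Rao"), date(2004, 8, 15), ["9876543210", "9123456780"])
print(s.age(date(2024, 8, 14)))  # 19
```

Passing `today` explicitly (instead of calling `date.today()`) keeps the derived value reproducible.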
Keys:
A key is an attribute or a collection of attributes that uniquely identifies an entity within an entity
set.
Example: the roll_number of a student makes him/her identifiable among students.
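Super Key: A set of one or more attributes that collectively identifies an entity in an
entity set.
Example: in the relation Student(rollno, Name, Age), {rollno}, {rollno, Name} and
{rollno, Name, Age} are all super keys.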
Candidate Key: A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
Example: consider the car relation
car(license_no, engine_serial_no, make, model, year)
The candidate keys are license_no and engine_serial_no.
Primary Key: A primary key is one of the candidate keys chosen by the database designer
to uniquely identify the entities in an entity set.
Example: consider the employee relation
Employee(eno, ename, doj, sal, job, dno); here eno is the primary key.
Foreign key: An attribute in one relation whose value matches the primary key in some
other relation is called a foreign key.
Example: consider the two relations employee and dept,
Employee(eno,ename,doj,sal,job,dno)
Dept(dno,dname,dloc)
In the above relations, dno, the primary key of Dept, also appears in the Employee
relation, so dno in Employee is a foreign key referencing Dept.
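Keys are fixed by the meaning of the data, but a sample instance can still be used to test candidate key choices: a set that repeats a value in the instance cannot be a super key. The sketch below finds the minimal unique attribute sets of a car instance; the three rows are hypothetical, and on real data this only suggests keys, it does not prove them.

```python
from itertools import combinations


def unique_in(rows, attrs):
    """True if no two rows agree on all of attrs (super key test on this instance)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys_in(rows, attributes):
    """Minimal attribute sets that are unique in this particular instance."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a key already found are super keys but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if unique_in(rows, combo):
                keys.append(combo)
    return keys


attributes = ["license_no", "engine_serial_no", "make", "model", "year"]
cars = [  # hypothetical sample rows
    {"license_no": "KA01", "engine_serial_no": "E100", "make": "Maruti", "model": "Swift", "year": 2019},
    {"license_no": "KA02", "engine_serial_no": "E200", "make": "Maruti", "model": "Swift", "year": 2019},
    {"license_no": "KA03", "engine_serial_no": "E300", "make": "Honda", "model": "City", "year": 2020},
]
print(candidate_keys_in(cars, attributes))
# [('license_no',), ('engine_serial_no',)]
```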
Relationship:
The association among entities is called a relationship.
Example:
An employee works_at a department, a student enrolls in a course. Here, Works_at and
Enrolls are called relationships.
Relationship Set:
A set of relationships of similar type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.
Degree of Relationship:
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities:
Cardinality defines the number of entities in one entity set that can be associated with
the number of entities of another entity set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity of entity
set B, and vice versa.
One-to-many: One entity from entity set A can be associated with more than one entity
of entity set B; however, an entity from entity set B can be associated with at most one
entity.
Many-to-one: More than one entity from entity set A can be associated with at most one
entity of entity set B; however, an entity from entity set B can be associated with more than
one entity from entity set A.
Many-to-many: One entity from A can be associated with more than one entity from B,
and vice versa.
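The four mapping cardinalities can be read off a relationship set by counting how many partners each entity has on the other side. This is a minimal sketch that classifies a binary relationship set given as (a, b) pairs; the sample instructor and course identifiers are hypothetical, and an instance only shows what the data currently contains, not the rule the designer intends.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set, given as (a, b) pairs, from one instance."""
    pairs = set(pairs)                       # a relationship set has no duplicates
    per_a = Counter(a for a, _ in pairs)     # number of B entities each A is linked to
    per_b = Counter(b for _, b in pairs)     # number of A entities each B is linked to
    a_many = any(n > 1 for n in per_a.values())
    b_many = any(n > 1 for n in per_b.values())
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:N"
    if b_many:
        return "N:1"
    return "1:1"


teaches = [("I1", "DBMS"), ("I1", "OS"), ("I2", "CN")]   # (instructor, course)
print(mapping_cardinality(teaches))                        # 1:N
print(mapping_cardinality([(c, i) for i, c in teaches]))   # N:1
```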
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree like structure. Every node is
then connected to its attribute. That is, composite attributes are represented by ellipses that are
connected with an ellipse.
Multivalued attributes are depicted by double ellipse.
Relationship
Relationships are represented by diamond-shaped box. Name of the relationship is written
inside the diamond-box. All the entities (rectangles) participating in a relationship are connected to
it by a line.
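One-to-one: When only one instance of an entity is associated with the relationship, it is
marked as '1:1'.
One-to-many: When one instance of the entity on the left can be associated with more than one
instance of the entity on the right, it is marked as '1:N'.
Many-to-one: When more than one instance of the entity on the left can be associated with one
instance of the entity on the right, it is marked as 'N:1'.
Many-to-many: When more than one instance of an entity on the left and more than one instance
of an entity on the right can be associated with the relationship, it depicts a many-to-many
relationship.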
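Participation Constraints:
Total participation: Each entity in the entity set participates in the relationship. Total
participation is represented by double lines.
Partial participation: Not all entities participate in the relationship. Partial participation is
represented by single lines.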
2.3 ENHANCED-ER MODEL (EER Model):
The EER model includes all the basic modeling concepts of the ER model.
In addition, it includes the concepts of subclass and superclass and the related concepts of
specialization and generalization.
Another concept included in the EER model is that of a category or union type, which is
used to represent a collection of objects (entities) that is the union of objects of different
entity types.
EER is used to model concepts more accurately than the ER diagram.
Specialization:
Specialization is the process of defining a set of subclasses of an entity type; this entity type
is called the superclass of the specialization.
Example:
the set of subclasses {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of
the superclass EMPLOYEE that distinguishes among employee entities based on the job
type of each employee entity.
There may be several specializations of the same entity type based on different
distinguishing characteristics.
Example: another specialization of the EMPLOYEE entity type may yield the set of
subclasses {SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}; this specialization
distinguishes among employees based on the method of pay.
The subclasses that define a specialization are attached by lines to a circle that represents the
specialization, which is connected in turn to the superclass.
The subset symbol on each line connecting a subclass to the circle indicates the direction of
the superclass/subclass relationship.
Attributes that apply only to entities of a particular subclass such as TypingSpeed of
SECRETARY are attached to the rectangle representing that subclass.
These are called specific attributes (or local attributes) of the subclass.
A subclass can participate in specific relationship types, such as the HOURLY_EMPLOYEE
subclass participating in the BELONGS_TO relationship.
Generalization:
The generalization process can be viewed as being functionally the inverse of the
specialization process.
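For example, the entity types CAR and TRUCK share common attributes and can be generalized
into the superclass VEHICLE. Viewed the other way, we can regard {CAR, TRUCK} as a
specialization of VEHICLE, rather than viewing VEHICLE as a generalization of CAR and TRUCK.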
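In some specializations, the members of each subclass can be determined by a condition on the
value of some attribute of the superclass. Such subclasses are called predicate-defined (or
condition-defined) subclasses.
Example:
If the EMPLOYEE entity type has an attribute Job_type, we can specify the condition of
membership in the SECRETARY subclass by the condition (Job_type = 'Secretary'),
which we call the defining predicate of the subclass.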
A predicate-defined subclass is indicated by writing the predicate condition next to the
line that connects the subclass to the specialization circle.
If all subclasses in a specialization have their membership condition on the same attribute of
the superclass, the specialization itself is called an attribute-defined specialization, and the
attribute is called the defining attribute of the specialization.
Two other constraints may apply to a specialization.
1. Disjointness (or disjointedness) constraint, which means that an entity can be a
member of at most one of the subclasses of the specialization. If the subclasses are not
disjoint, they are overlapping: an entity may belong to more than one subclass.
2. Completeness (or totalness) constraint, which may be total or partial.
A total specialization constraint specifies that every entity in the superclass must be a
member of at least one subclass in the specialization. A partial specialization allows an
entity not to belong to any of the subclasses.
There are four possible constraints on specialization:
Disjoint, total
Disjoint, partial
Overlapping, total
Overlapping, partial
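Which of the four combinations a specialization satisfies can be checked by counting, for each superclass entity, how many subclasses it belongs to. A minimal sketch, assuming entities are plain identifiers; the employee ids and subclass memberships are hypothetical, and a real constraint is declared by the designer rather than inferred from one instance.

```python
def specialization_constraints(superclass, subclasses):
    """Return (disjointness, completeness) as observed in one instance.

    superclass: set of entity ids; subclasses: dict of subclass name -> set of ids.
    """
    count = {e: 0 for e in superclass}   # number of subclasses each entity is in
    for members in subclasses.values():
        for e in members:
            if e not in count:
                raise ValueError(f"{e!r} is not a member of the superclass")
            count[e] += 1
    disjoint = all(n <= 1 for n in count.values())   # at most one subclass each
    total = all(n >= 1 for n in count.values())      # at least one subclass each
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")


employees = {"e1", "e2", "e3", "e4"}
by_job = {"SECRETARY": {"e1"}, "ENGINEER": {"e2"}, "TECHNICIAN": {"e3"}}
by_pay = {"SALARIED_EMPLOYEE": {"e1", "e2"}, "HOURLY_EMPLOYEE": {"e3", "e4"}}
print(specialization_constraints(employees, by_job))  # ('disjoint', 'partial')
print(specialization_constraints(employees, by_pay))  # ('disjoint', 'total')
```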
2.4 ER-TO-RELATIONAL MAPPING:
When conceptualized into diagrams, the ER model gives a good overview of entities and their
relationships that is easy to understand.
ER diagrams can be mapped to a relational schema; that is, it is possible to create a relational
schema from an ER diagram.
There are several processes and algorithms available to convert ER diagrams into
relational schemas. Some of them are automated and some are manual. Here we
focus on mapping the diagram contents to relational basics.
5. One course is taught by only one instructor, but one instructor teaches many courses. Hence the
cardinality between Course and Instructor is many-to-one (N:1).
Step 3: Identify the key attributes
1. "Department_Name" can identify a department uniquely. Hence Department_Name is the key
attribute for the Entity "Department".
2. Course_ID is the key attribute for "Course" Entity.
3. Student_ID is the key attribute for "Student" Entity.
4. Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
1. For the Department entity, the other attribute is location.
2. For the Course entity, the other attributes are course_name and duration.
3. For the Instructor entity, the other attributes are first_name, last_name and phone.
4. For the Student entity, the other attributes are first_name, last_name and phone.
Step 5: Draw the complete ER diagram.
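As one way to see where these steps lead, the sketch below turns the entities and key attributes above into SQLite tables using Python's built-in `sqlite3` module. Only the Course N:1 Instructor relationship survives in the text, so only it is mapped: the table on the N side (Course) receives the foreign key. The column types, the `duration` unit and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
    Department_Name TEXT PRIMARY KEY,
    location        TEXT
);
CREATE TABLE Instructor (
    Instructor_ID INTEGER PRIMARY KEY,
    first_name    TEXT,
    last_name     TEXT,
    phone         TEXT
);
CREATE TABLE Student (
    Student_ID INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    phone      TEXT
);
-- Course is on the N side of Course N:1 Instructor, so it holds the foreign key.
CREATE TABLE Course (
    Course_ID     TEXT PRIMARY KEY,
    course_name   TEXT,
    duration      INTEGER,  -- unit assumed (e.g. weeks)
    Instructor_ID INTEGER NOT NULL REFERENCES Instructor(Instructor_ID)
);
""")

conn.execute("INSERT INTO Instructor VALUES (1, 'Ravi', 'Kumar', '9000000000')")
conn.execute("INSERT INTO Course VALUES ('CS301', 'ADBMS', 6, 1)")

try:
    # Instructor 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO Course VALUES ('CS302', 'OS', 6, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```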
In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on
attribute X if each value of X determines exactly one value of Y, which is represented as
X → Y
Example: Marks → Grade.
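Consider the relation schema EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation); from the
semantics of the attributes and the relation, we know that the following functional dependencies
should hold:
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours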
Types of functional dependency:
(a) Full functional dependency
(b) Partial functional dependency
(c) Transitive functional dependency
{Student_no, Course_no} → Marks
In the above example, Marks is fully functionally dependent on Student_no and Course_no
together, and not on any proper subset of {Student_no, Course_no}.
This means marks cannot be determined either by student_no or course_no alone. It can be
determined only using student_no and course_no together.
Hence marks is fully functionally dependent on {student_no, course_no}.
Example (transitive dependency):
Grade depends on Marks, and Marks in turn depends on {Student_no, Course_no}; hence
Grade depends transitively on {Student_no, Course_no}.
William W. Armstrong established a set of rules which can be used to infer the functional
dependencies in a relational database:
Reflexivity rule: If A is a set of attributes and B is a set of attributes that is completely
contained in A, then A implies B.
Augmentation rule: If A implies B and C is a set of attributes, then AC implies BC.
Transitivity rule: If A implies B and B implies C, then A implies C.
These can be simplified if we also use:
Union rule: If A implies B and A implies C, then A implies BC.
Decomposition rule: If A implies BC, then A implies B and A implies C.
Pseudotransitivity rule: If A implies B and CB implies D, then AC implies D.
Example:
Consider the schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C,
CG → H, CG → I, B → H}. Some members of F+ are:
A → H. Since A → B and B → H hold, we apply the transitivity rule.
CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.
AG → I. Since A → C and CG → I, the pseudotransitivity rule implies that AG → I holds.
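Applying the rules by hand gets tedious. The usual shortcut is the attribute closure X+: keep adding the right-hand side of any FD whose left-hand side is already covered, and then X → Y is in F+ exactly when Y ⊆ X+. A minimal sketch using single-letter attributes as in the example above:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    # X -> Y is in F+ exactly when Y is contained in X+.
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("A", F)))                                       # ['A', 'B', 'C', 'H']
print(implies(F, "A", "H"), implies(F, "CG", "HI"), implies(F, "AG", "I"))  # True True True
```

Since AG+ turns out to contain every attribute of R, AG is also a super key of R.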
Anomalies:
Anomalies can be classified into insertion anomalies, deletion anomalies, and modification
anomalies.
Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the following
examples based on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute
values for the department that the employee works for, or NULLs. For example, to insert a new
tuple for an employee who works in department number 5, we must enter all the attribute values of
department 5 correctly so that they are consistent with the corresponding values for department 5 in
other tuples in EMP_DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee. This
violates the entity integrity for EMP_DEPT because Ssn is its primary key.
Deletion Anomalies:
The problem of deletion anomalies is related to the second insertion anomaly situation.
If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department is lost
from the database.
This problem does not occur in a database where employee and department tuples are stored in separate relations.
Modification Anomalies:
In EMP_DEPT, if we change the value of one of the attributes of a particular department
(say, the manager of department 5), we must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent.
If we fail to update some tuples, the same department will be shown to have two different
values for manager in different employee tuples, which would be wrong.
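The modification anomaly can be reproduced on a few rows: because the manager is repeated in every employee tuple of a department, updating only one of those tuples leaves the department with two managers. A minimal sketch; the rows, names and the `Dmgr_ssn` column are hypothetical stand-ins for EMP_DEPT.

```python
from collections import defaultdict

# Hypothetical EMP_DEPT rows; Dmgr_ssn (the department's manager) repeats per employee.
emp_dept = [
    {"Ssn": "111", "Ename": "Kumar", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ssn": "222", "Ename": "Priya", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ssn": "444", "Ename": "Arun", "Dnumber": 4, "Dmgr_ssn": "987"},
]


def inconsistent_departments(rows):
    """Departments whose tuples disagree on the manager."""
    managers = defaultdict(set)
    for row in rows:
        managers[row["Dnumber"]].add(row["Dmgr_ssn"])
    return sorted(d for d, mgrs in managers.items() if len(mgrs) > 1)


print(inconsistent_departments(emp_dept))  # []
emp_dept[0]["Dmgr_ssn"] = "888"            # update only one of department 5's tuples
print(inconsistent_departments(emp_dept))  # [5]
```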
2.6 DECOMPOSITION
A single universal relation schema R = {A1, A2, ..., An} includes all the attributes of the
database, and every attribute name is unique. Using the FDs, the design algorithms decompose the universal
relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational
database schema; D is called a decomposition of R.
We must make sure that each attribute in R appears in at least one relation schema Ri in
the decomposition so that no attributes are lost; formally, R1 ∪ R2 ∪ ... ∪ Rm = R.
Properties of Decomposition
(1) Dependency preservation.
(2) Lossless (or non additive) join property.
Dependency Preservation
Informally, a decomposition is dependency preserving if each functional dependency X → Y specified
in F either appears directly in one of the relation schemas Ri in the decomposition D or can be
inferred from the dependencies that appear in some Ri. This is the dependency preservation condition.
Definition
Given a set of dependencies F on R, the projection of F on Ri, denoted by πRi(F), where
Ri is a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all
contained in Ri.
Hence the projection of F on each relation schema Ri in the decomposition D is the set of FDs
in F+ such that all their LHS and RHS attributes are in Ri.
A decomposition D = {R1, R2, ..., Rm} of R is dependency preserving with respect to F if the
union of the projections of F on each Ri in D is equivalent to F, i.e.,
(πR1(F) ∪ πR2(F) ∪ ... ∪ πRm(F))+ = F+
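Computing F+ to apply this definition is expensive. A standard test avoids it: for each X → Y in F, grow a set starting from X by repeatedly adding (closure of (result ∩ Ri)) ∩ Ri for every Ri, which is exactly what πRi(F) can derive; the FD is preserved if Y ends up inside. A minimal sketch with single-letter attributes; the small R = (A, B, C) example is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """True if X -> Y follows from the union of the projections of F on the Ri."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in map(set, decomposition):
            # Restricting the closure back to Ri gives exactly what pi_Ri(F) derives.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)


F = [("A", "B"), ("B", "C")]                      # R = (A, B, C)
print(is_dependency_preserving(["AB", "BC"], F))  # True
print(is_dependency_preserving(["AB", "AC"], F))  # False: B -> C is lost
```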
2.7 NORMALIZATION:
Normalization is a process of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data redundancy.
It was developed by E. F. Codd.
Normalization is a multi-step process that puts the data into a tabular form by removing the
duplicate data from the relation tables.
It is a step-by-step decomposition of complex records into simple records.
It is also called canonical synthesis.
It is the technique of building database structures to store data.
Definition of Normalization
Normalization is a process of designing a consistent database by minimizing redundancy and
ensuring data integrity through decomposition that is lossless.
Features of Normalization
Normalization avoids the data redundancy.
It is a formal process of developing data structures.
It promotes the data integrity.
It ensures data dependencies make sense that means data is logically stored.
It eliminates undesirable characteristics such as insertion, update and deletion anomalies.
Types of Normalization
Following are the types of Normalization:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Fourth Normal Form
5. Fifth Normal Form
6. BCNF (Boyce Codd Normal Form)
7. DKNF (Domain Key Normal Form)
1) The domain of DLOCATIONS contains atomic values, but some tuples can have a set of these
values. In this case, DLOCATIONS is not functionally dependent on DNUMBER.
2) The domain of DLOCATIONS contains sets of values and hence is non-atomic. In this case,
DNUMBER → DLOCATIONS holds, because each set is treated as a single member of the domain.
DNUMBER DLOCATIONS
5 Bellaire
5 Alaska
5 Newyork
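Bringing the relation into 1NF means replacing the set-valued DLOCATIONS with one row per (DNUMBER, location) pair, as in the table above. A minimal sketch; the DNAME column and its value "Research" are hypothetical extras included only to show that the remaining atomic attributes stay in the base relation.

```python
def flatten_multivalued(rows, key, multi_attr):
    """Split a non-1NF relation into a base relation and one row per (key, value)."""
    base = [{a: v for a, v in row.items() if a != multi_attr} for row in rows]
    pairs = [(row[key], value) for row in rows for value in row[multi_attr]]
    return base, pairs


department = [  # DNAME is a hypothetical extra column
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Alaska", "Newyork"]},
]
base, dept_locations = flatten_multivalued(department, "DNUMBER", "DLOCATIONS")
print(base)            # [{'DNUMBER': 5, 'DNAME': 'Research'}]
print(dept_locations)  # [(5, 'Bellaire'), (5, 'Alaska'), (5, 'Newyork')]
```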
E305 P27 10 Raju Finance EFG
The attributes Ename, Pname and PLocation violate 2NF, because FD2 and FD3 make them partially
dependent on the primary key.
Remedy:
Decompose and set up a new relation for each partial key with its dependent attribute(s).
Make sure to keep a relation with the original primary key and any attributes that are fully
functionally dependent on it.
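The remedy can be sketched mechanically: every FD whose left-hand side is a proper subset of the key becomes its own relation, and the key stays with the attributes that depend on all of it. This minimal sketch assumes a single composite key and the column names ECode, PCode, Hours, Ename, Pname, PLocation for the EMP_PROJ-style table (its header is not shown here); it does not handle several overlapping candidate keys.

```python
def remove_partial_dependencies(attributes, key, fds):
    """Split off each FD whose left side is a proper subset of the key (2NF remedy)."""
    key = frozenset(key)
    relations, moved = [], set()
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        dependents = set(rhs) - key   # non-key attributes on the right-hand side
        if lhs < key and dependents:  # '<' on sets means proper subset
            relations.append(set(lhs) | dependents)
            moved |= dependents
    # Keep the original key with every attribute still fully dependent on it.
    relations.append({a for a in attributes if a not in moved})
    return relations


attributes = ["ECode", "PCode", "Hours", "Ename", "Pname", "PLocation"]
fds = [
    ({"ECode", "PCode"}, {"Hours"}),       # FD1: full dependency
    ({"ECode"}, {"Ename"}),                # FD2: partial dependency
    ({"PCode"}, {"Pname", "PLocation"}),   # FD3: partial dependency
]
relations = remove_partial_dependencies(attributes, {"ECode", "PCode"}, fds)
for r in relations:
    print(sorted(r))
```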
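A functional dependency X → Z is transitive if it is formed indirectly by two functional
dependencies X → Y and Y → Z, where Y does not determine X.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. A 3NF table that does not have
multiple overlapping candidate keys is said to be in BCNF.
For a table to be in BCNF, the following conditions must be satisfied:
R must be in 3rd Normal Form.
For every non-trivial functional dependency X → Y, X must be a super key of R.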
E4 Susan P3 250
E4 Susan P5 75
E1 Veronica P5 40
ECode+ProjCode is the primary key. You will notice that Name+ProjCode could be
chosen as the primary key and hence, is a candidate key.
* Hours is functionally dependent on ECode+ProjCode.
* Hours is also functionally dependent on Name+ProjCode.
* Name is functionally dependent on Ecode.
* ECode is functionally dependent on Name.
You will notice that this table has:
Multiple candidate keys, that is ECode+ProjCode and Name+ProjCode.
The candidate keys are composite.
The candidate keys overlap since the attribute - ProjCode is common.
This is a case for Boyce-Codd Normal Form. The table is in 3NF: the only non-key attribute
is Hours, which depends on the whole key, that is, ECode+ProjCode or Name+ProjCode.
ECode and Name are determinants, since each is functionally dependent on the other.
However, neither is a candidate key by itself, and BCNF requires every determinant to be a
candidate key. So the table is not in BCNF and is decomposed into Employee and Project:
Employee
ECode Name
E1 Veronica
E2 Anthony
E3 Mac
E4 Susan
Project
ECode ProjCode Hours
E1 P2 48
E2 P5 100
E3 P6 15
E4 P3 250
E4 P5 75
E1 P5 40
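The BCNF condition can be tested exactly on a small schema: for every subset X of the schema, if X determines some other attribute of the schema but X+ does not cover the whole schema, then X is a determinant that is not a super key. The sketch below applies this to the original table and to the two decomposed tables, using the four FDs listed above; trying every subset is exponential, which is acceptable only for schemas this small.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List (X, Y) where X determines Y inside schema but X is not a super key of it."""
    schema = set(schema)
    violations = []
    for size in range(1, len(schema)):        # the full schema is always a super key
        for x in combinations(sorted(schema), size):
            x_plus = closure(x, fds)
            y = (x_plus & schema) - set(x)     # non-trivial part determined within schema
            if y and not schema <= x_plus:
                violations.append((set(x), y))
    return violations


F = [
    ({"ECode", "ProjCode"}, {"Hours"}),
    ({"Name", "ProjCode"}, {"Hours"}),
    ({"ECode"}, {"Name"}),
    ({"Name"}, {"ECode"}),
]
print(bcnf_violations({"ECode", "Name", "ProjCode", "Hours"}, F)[:2])
print(bcnf_violations({"ECode", "Name"}, F))              # []  (Employee)
print(bcnf_violations({"ECode", "ProjCode", "Hours"}, F))  # []  (Project)
```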
Multi-valued Dependency:
A table is said to have a multi-valued dependency if the following conditions are true:
1. For a dependency A →→ B, if for a single value of A multiple values of B exist, then the table
may have a multi-valued dependency.
2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A, B, C), if there is a multi-valued dependency between A and B, then
B and C should be independent of each other.
Pizza Delivery:
Restaurant         Pizza Variety   Delivery Area
A1 Pizza           Thick Crust     Springfield
A1 Pizza           Thick Crust     Shelbyville
A1 Pizza           Thick Crust     Capital City
A1 Pizza           Stuffed Crust   Springfield
A1 Pizza           Stuffed Crust   Shelbyville
A1 Pizza           Stuffed Crust   Capital City
Elite Pizza        Thin Crust      Capital City
Elite Pizza        Stuffed Crust   Capital City
Vincenzo's Pizza   Thick Crust     Springfield
Vincenzo's Pizza   Thick Crust     Shelbyville
Vincenzo's Pizza   Thin Crust      Springfield
Vincenzo's Pizza   Thin Crust      Shelbyville
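Pizza Variety and Delivery Area each depend on Restaurant, independently of each other: a
restaurant offers all of its pizza varieties in every area it delivers to. The multi-valued
dependencies are:
{Restaurant} →→ {Pizza Variety}
{Restaurant} →→ {Delivery Area}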
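Varieties By Restaurant
Restaurant         Pizza Variety
A1 Pizza           Thick Crust
A1 Pizza           Stuffed Crust
Elite Pizza        Thin Crust
Elite Pizza        Stuffed Crust
Vincenzo's Pizza   Thick Crust
Vincenzo's Pizza   Thin Crust
Delivery Areas By Restaurant
Restaurant         Delivery Area
A1 Pizza           Springfield
A1 Pizza           Shelbyville
A1 Pizza           Capital City
Elite Pizza        Capital City
Vincenzo's Pizza   Springfield
Vincenzo's Pizza   Shelbyville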
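Both points can be checked on the pizza data: Restaurant →→ Pizza Variety holds when, for every restaurant, the (variety, area) pairs present are the full cross product of its varieties and its areas; and because the dependency holds, joining the two projections on Restaurant gives back exactly the original 12 rows. A minimal sketch; in a three-column relation, Restaurant →→ Pizza Variety and Restaurant →→ Delivery Area hold or fail together, so one check covers both.

```python
from collections import defaultdict

pizza = [
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
]


def restaurant_mvd_holds(rows):
    """Restaurant ->> Variety on this instance: pairs must form a full cross product."""
    pairs = defaultdict(set)
    for restaurant, variety, area in rows:
        pairs[restaurant].add((variety, area))
    for combos in pairs.values():
        varieties = {v for v, _ in combos}
        areas = {a for _, a in combos}
        if combos != {(v, a) for v in varieties for a in areas}:
            return False
    return True


# 4NF decomposition: project onto the two independent facts, then rejoin on Restaurant.
varieties_by_restaurant = {(r, v) for r, v, _ in pizza}
areas_by_restaurant = {(r, a) for r, _, a in pizza}
rejoined = {
    (r, v, a)
    for r, v in varieties_by_restaurant
    for r2, a in areas_by_restaurant
    if r == r2
}
print(restaurant_mvd_holds(pizza))                             # True
print(len(varieties_by_restaurant), len(areas_by_restaurant))  # 6 6
print(rejoined == set(pizza))                                  # True
```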
Product Types By Traveling Salesman
Traveling Salesman  Product Type
Jack Schneider      Vacuum Cleaner
Jack Schneider      Breadbox
Willy Loman         Pruning Shears
Willy Loman         Vacuum Cleaner
Willy Loman         Breadbox
Willy Loman         Umbrella Stand
Louis Ferguson      Telescope
Louis Ferguson      Vacuum Cleaner
Louis Ferguson      Lava Lamp
Louis Ferguson      Tie Rack
Brands By Traveling Salesman
Traveling Salesman  Brand
Jack Schneider      Acme
Willy Loman         Robusto
Louis Ferguson      Robusto
Louis Ferguson      Acme
Louis Ferguson      Nimbus
Product Types By Brand
Brand     Product Type
Acme      Vacuum Cleaner
Acme      Breadbox
Acme      Lava Lamp
Robusto   Pruning Shears
Robusto   Vacuum Cleaner
Robusto   Breadbox
Robusto   Umbrella Stand
Robusto   Telescope
Nimbus    Tie Rack
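These three tables are projections of a single (salesman, brand, product type) relation. Assuming the join dependency that motivates this decomposition (a salesman sells a product type of a brand whenever he sells that brand, sells that type, and the brand makes that type), the original relation, which is not shown in this section, can be rebuilt by a natural join of the three tables. A minimal sketch:

```python
types_by_salesman = {
    ("Jack Schneider", "Vacuum Cleaner"), ("Jack Schneider", "Breadbox"),
    ("Willy Loman", "Pruning Shears"), ("Willy Loman", "Vacuum Cleaner"),
    ("Willy Loman", "Breadbox"), ("Willy Loman", "Umbrella Stand"),
    ("Louis Ferguson", "Telescope"), ("Louis Ferguson", "Vacuum Cleaner"),
    ("Louis Ferguson", "Lava Lamp"), ("Louis Ferguson", "Tie Rack"),
}
brands_by_salesman = {
    ("Jack Schneider", "Acme"), ("Willy Loman", "Robusto"),
    ("Louis Ferguson", "Robusto"), ("Louis Ferguson", "Acme"),
    ("Louis Ferguson", "Nimbus"),
}
types_by_brand = {
    ("Acme", "Vacuum Cleaner"), ("Acme", "Breadbox"), ("Acme", "Lava Lamp"),
    ("Robusto", "Pruning Shears"), ("Robusto", "Vacuum Cleaner"),
    ("Robusto", "Breadbox"), ("Robusto", "Umbrella Stand"),
    ("Robusto", "Telescope"), ("Nimbus", "Tie Rack"),
}

# Natural join of the three projections: keep (salesman, brand, type) only if
# each of its three pairs appears in the corresponding table.
sales = {
    (s, b, t)
    for s, b in brands_by_salesman
    for s2, t in types_by_salesman
    if s2 == s and (b, t) in types_by_brand
}
print(len(sales))  # 11
print(sorted(x for x in sales if x[0] == "Jack Schneider"))
# [('Jack Schneider', 'Acme', 'Breadbox'), ('Jack Schneider', 'Acme', 'Vacuum Cleaner')]
```

Jack Schneider does not get an Acme Lava Lamp row, because Lava Lamp is not one of his product types.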