Dbms Notes 1

The document outlines the fundamentals of Database Management Systems (DBMS), covering key concepts such as database characteristics, architecture, and user roles. It details the syllabus for a course on DBMS, including topics like SQL, normalization, and transaction properties. Additionally, it discusses the advantages and drawbacks of using DBMS compared to traditional file systems, and the various types of database users and their responsibilities.


Fundamental of

Database Management System


BCAC0020
Lecture- 1

Presented by:
Anil Kr. Chanchal
Assistant Professor
Computer Engineering & Applications Department,
GLA University, Mathura
Syllabus: Module 1
Basic Concepts:
– Characteristics of the Database,
– Database & Database users, DBA
– Schema & Instances,
– DBMS Architecture & Data Independence,
– Data Base Languages,
– Data Models- Relational, Network, Hierarchical.
– Data Modeling using the Entity Relationship Approach.
Relational Model concepts:
– Relational Data Model Concepts, Relational Algebra
File Organization Techniques:
– Sequential file organization,
– Index File Organization,
– Random file organization.
Syllabus: Module 2
Introduction on SQL:
– Data definition and data manipulation commands in SQL,
– Views and queries in SQL,
– Specifying Constraints & index in SQL.
Normalization:
– Functional dependencies, lossless join & dependency-preserving decomposition.
– Normal forms based on keys (1NF, 2NF, 3NF & BCNF),
– Denormalization
Transaction:
– Introduction,
– Properties (Atomicity, Consistency, Isolation, Durability),
– Transaction State.
Types of Databases
– Concepts of object-oriented databases, distributed databases and client-server databases.
Books to Follow
Textbooks:
• Database System Concepts, 7th Edition, Avi Silberschatz, Henry F. Korth, S. Sudarshan, 2019, McGraw-Hill.
• Fundamentals of Database Systems, 7th Edition, Ramez Elmasri, Shamkant B. Navathe, 2016, Pearson.
Reference Books:
• An Introduction to Database Systems, Bipin Desai, 2006, West Pub. Co.
• Teach Yourself SQL in 14 Days, Jeff Perkins and Bryan Morgan.
Data
• Known facts that can be recorded and that have implicit meaning.
Lecture- 2

Definitions
• Data:
– Known facts that can be recorded and have an implicit meaning.
• Database:
– A collection of related data.
• Database Management System (DBMS):
A DBMS contains information about a particular enterprise and consists of:
– A collection of interrelated data
– A set of programs to access the data
It provides an environment that is both convenient and efficient to use.
• Mini-world:
– Some part of the real world about which data is stored in a database.
For example, student grades and transcripts at a university.
University Database Example
• Application program examples
– Add new students, instructors, and courses
– Register students for courses, and generate class rosters
– Assign grades to students, compute grade point averages (GPA) and generate transcripts
• In the early days, database applications were built directly on top of file systems.
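The application programs listed above can be sketched against a small in-memory database. The following is a minimal sketch in Python using the standard-library `sqlite3` module; the table layout (`student`, `course`, `takes`), the function names, the course codes, the credit values and the grade-point scale are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical grade-point scale, used only for the GPA calculation below.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE course  (id TEXT PRIMARY KEY, title TEXT NOT NULL,
                              credits INTEGER NOT NULL);
        CREATE TABLE takes   (student_id TEXT, course_id TEXT, grade TEXT,
                              PRIMARY KEY (student_id, course_id));
    """)
    return conn


def add_student(conn, sid, name):
    conn.execute("INSERT INTO student VALUES (?, ?)", (sid, name))


def add_course(conn, cid, title, credits):
    conn.execute("INSERT INTO course VALUES (?, ?, ?)", (cid, title, credits))


def register(conn, sid, cid):
    # The grade stays NULL until it is assigned.
    conn.execute("INSERT INTO takes VALUES (?, ?, NULL)", (sid, cid))


def assign_grade(conn, sid, cid, grade):
    conn.execute("UPDATE takes SET grade = ? WHERE student_id = ? AND course_id = ?",
                 (grade, sid, cid))


def class_roster(conn, cid):
    rows = conn.execute("SELECT s.name FROM takes t JOIN student s ON s.id = t.student_id "
                        "WHERE t.course_id = ? ORDER BY s.name", (cid,))
    return [name for (name,) in rows]


def gpa(conn, sid):
    # Credit-weighted average over graded courses only.
    rows = conn.execute("SELECT t.grade, c.credits FROM takes t JOIN course c "
                        "ON c.id = t.course_id "
                        "WHERE t.student_id = ? AND t.grade IS NOT NULL", (sid,)).fetchall()
    total_credits = sum(credits for _, credits in rows)
    if total_credits == 0:
        return 0.0
    return sum(GRADE_POINTS[g] * credits for g, credits in rows) / total_credits


conn = make_db()
add_student(conn, "S1", "Akshay")
add_student(conn, "S2", "Rahul")
add_course(conn, "C1", "DBMS", 4)
add_course(conn, "C2", "Networks", 2)
register(conn, "S1", "C1")
register(conn, "S2", "C1")
register(conn, "S1", "C2")
assign_grade(conn, "S1", "C1", "A")
assign_grade(conn, "S1", "C2", "B")
print(class_roster(conn, "C1"))   # ['Akshay', 'Rahul']
print(round(gpa(conn, "S1"), 2))  # (4.0 * 4 + 3.0 * 2) / 6 = 3.67
```

A real application would also call `conn.commit()` and validate its inputs; both are left out to keep the sketch short.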
Database Applications
• Database Applications:
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized
recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax
deductions
• Databases can be very large.
• Databases touch all aspects of our lives
Database Management System (DBMS)
• A DBMS is a collection of programs that enables users to create and maintain a database.
• The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient.
• A DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, manipulating and sharing databases among various users and applications.
• Defining: specifying the data types, structures, and constraints for the data.
• Constructing: storing the data itself on some storage medium.
• Manipulating: querying the database to retrieve specific data, updating the data to reflect changes, and generating reports from the data.
• Sharing: allowing multiple users to access the database concurrently.
Source: https://www.c-sharpcorner.com/article/what-is-the-most-popular-database-in-the-world/
Characteristics and Benefits of a Database
• Self-describing nature of a database system
• Support for multiple views of data
• Sharing of data and multiuser system
• Control of data redundancy
• Enforcement of integrity constraints
• Restriction of unauthorized access
• Data independence
• Backup and recovery facilities
Drawbacks of using file systems to store data
• Data redundancy and inconsistency
– Multiple file formats, duplication of information in different files
• Difficulty in accessing data
– Need to write a new program to carry out each new task
• Data isolation
– Multiple files and formats
• Integrity problems
– Integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly
– Hard to add new constraints or change existing ones
Drawbacks of using file systems to store data (Cont.)

• Atomicity of updates
– Failures may leave the database in an inconsistent state with partial updates carried out
– Example: a transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
– Concurrent access needed for performance
– Uncontrolled concurrent accesses can lead to inconsistencies
• Example: two people read a balance (say 100) at the same time and each withdraws 50; if each writes back its own result, the final balance is 50 instead of 0
• Security problems
– Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems.
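The atomicity and integrity problems above are what a DBMS transaction and a declared constraint address. A minimal sketch, assuming SQLite via Python's `sqlite3` module: the `account` table and the `transfer` and `balances` functions are hypothetical, and the constraint is written as `balance >= 0` (a non-negative variant of the `balance > 0` example) so that an empty account is allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity rule is declared once, in the schema, instead of being
# repeated in every program that updates balances.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failing debit leaves a partial
            # update that the rollback must undo.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK (balance >= 0) failed; the credit was rolled back


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "A", "B", 50), balances(conn))  # True {'A': 50, 'B': 50}
print(transfer(conn, "A", "B", 80), balances(conn))  # False {'A': 50, 'B': 50}
```

The second transfer would drive A's balance negative, so the whole transaction, including the credit already applied to B, is undone.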


DBMS Disadvantages
• DBMS may involve unnecessary overhead costs that would
not be incurred in traditional file processing.

• The overhead costs of using a DBMS are due to the following:


– High initial investment in hardware, software, and training
– The generality that a DBMS provides for defining and processing data
– Overhead for providing security, concurrency control, recovery, and
integrity functions
When not to use a DBMS
• Main inhibitors (costs) of using a DBMS:
– High initial investment and possible need for additional
hardware.
– Overhead for providing generality, security, concurrency control,
recovery, and integrity functions.
• When a DBMS may be unnecessary:
– If the database and applications are simple, well defined, and
not expected to change.
– If there are stringent real-time requirements that may not be
met because of DBMS overhead.
– If access to data by multiple users is not required.
When not to use a DBMS (Cont.)

• When no DBMS may suffice:


– If the database system is not able to handle the complexity
of data because of modeling limitations
– If the database users need special operations not
supported by the DBMS.
Lecture- 3

Database Users
1. Database Administrator (DBA)
• A person who has central control over the system is called a
database administrator (DBA).
• Functions of a DBA include:
– Schema definition
– Storage structure and access-method definition
– Schema and physical-organization modification
– Granting of authorization for data access
– Routine maintenance
– Periodically backing up the database
– Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required
– Monitoring jobs running on the database
Database Users

2. Database designers
• Responsible for identifying the type of data to be stored in
the database and for choosing appropriate structures to
represent and store this data.
• Communicate with all prospective database users in order to
understand their requirements and to create a design that
meets these requirements.
• In many cases, the designers are on the staff of the DBA and
may be assigned other staff responsibilities after the database
design is completed.
Database Users

3. End Users
a) Casual end users
b) Naive or parametric end users
c) Sophisticated end users
Database Users

a) Casual End User


• Occasionally access the database, but they may need different
information each time.
• They use a sophisticated database query language to specify
their requests.
• Middle- or high-level managers or other occasional browsers.
Database Users

b) Naïve End User


• Their main job function revolves around
– constantly querying and updating the database, using standard types
of queries and updates—called canned transactions—that have been
carefully programmed and tested.
Examples:
• Bank tellers check account balances and post withdrawals and
deposits.
• Reservation agents for airlines, hotels, and car rental
companies check availability for a given request and make
reservations
Database Users

c) Sophisticated End User


• Engineers, scientists, business analysts, and others
• Familiar with the facilities of the DBMS in order to implement
their own applications
• They have complex requirements.
Database Users

4. System Analysts and Application Programmers


• System analysts
– Determine the requirements of end users, especially naive and
parametric end users.
– Develop specifications for standard canned transactions that meet
these requirements.
• Application programmers
– implement these specifications as programs;
– then they test, debug, document, and maintain these canned
transactions.
– Such analysts and programmers are commonly referred to as software developers or software engineers.

BCAC0020 Fundamental of DBMS


Lecture- 4 / Schema & Instance

View of Data
• A major purpose of a database system is to provide users with
an abstract view of the data.
– Data models
• A collection of conceptual tools for describing data,
data relationships, data semantics, and consistency
constraints.
– Data abstraction
• Hide the complexity of data structures to represent
data in the database from users through several levels
of data abstraction.



Levels of Abstraction
• Physical level: describes how a record (e.g., instructor) is stored.
• Logical level: describes data stored in database, and the
relationships among the data.
type instructor = record
    ID : string;
    name : string;
    dept_name : string;
    salary : integer;
end;
• View level: application programs hide details of data types. Views
can also hide information (such as an employee’s salary) for security
purposes.
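The view level can be illustrated with an SQL view that hides the salary attribute of the record type above. This is a sketch assuming SQLite through Python's `sqlite3` module; the view name `instructor_public` and the sample rows are hypothetical. SQLite has no user accounts, so the access-control half (letting a user query the view but not the base table) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full instructor record, mirroring the record type above.
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, "
             "dept_name TEXT, salary INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("I1", "Instructor One", "Comp. Sci.", 65000),
                  ("I2", "Instructor Two", "Physics", 87000)])

# View level: expose every attribute except salary.
conn.execute("CREATE VIEW instructor_public AS "
             "SELECT ID, name, dept_name FROM instructor")

cur = conn.execute("SELECT * FROM instructor_public ORDER BY ID")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)  # ['ID', 'name', 'dept_name']
print(rows)     # [('I1', 'Instructor One', 'Comp. Sci.'), ('I2', 'Instructor Two', 'Physics')]
```

The physical level (pages, B-trees, files) is handled entirely inside SQLite and is not visible in this code.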



Three Schema Architecture
(Figure: three levels, from top to bottom.)
• External level: END users work with several external views through forms, login pages or a command prompt.
• Conceptual schema (logical level): what data are stored?
• Internal schema (physical level): how are the data actually stored?
Instances and Schemas
• Logical Schema – the overall logical structure of the database
– Example: The database consists of information about a set
of customers and accounts in a bank and the relationship
between them
– Analogous to type information of a variable in a program
• Physical schema – the overall physical structure of the database
• Instance – the actual content of the database at a particular point in time
– Analogous to the value of a variable
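The schema/instance distinction can be seen directly in SQLite, which keeps the schema as metadata in its `sqlite_master` catalog table. A minimal sketch, assuming Python's `sqlite3` module and reusing two rows of the Student table from these notes: inserting rows changes the instance, while the stored schema stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


def schema(conn):
    # The schema is stored as metadata in SQLite's catalog table.
    return conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]


def instance(conn):
    # The instance is whatever rows the table holds right now.
    return conn.execute("SELECT * FROM student ORDER BY roll_no").fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Akshay", 20), (2, "Rahul", 19)])
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)  # True: adding rows does not change the schema
print(instance_before)                # []
print(instance_after)                 # [(1, 'Akshay', 20), (2, 'Rahul', 19)]
```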



Instances and Schemas

Source: https://selfstudynote.blogspot.com/2016/05/database-schema-and-instance.html
Logical Data Independence
• Logical data is data about the database, that is, information about how the data is organized and managed inside it.
• For example, a table (relation) stored in the database and all the constraints applied on that relation.
• Logical data independence is the ability to change the logical (conceptual) schema without having to change the external schemas (user views) or the application programs written against them.
• If we make some changes to a table's format, the user views and programs that do not depend on the changed part should not have to change.


Logical Data Independence
• Example: if we add some new columns to a table or remove some columns from it, the user views and programs that do not use those columns should not have to change.
• For example, consider two users A & B.
• Both are selecting the fields "EmployeeNumber" and "EmployeeName".
• If user B adds a new column (e.g. salary) to the table, it will not affect the external view for user A, even though the conceptual (logical) schema of the database has changed for both users A & B.
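The user A / user B example can be sketched with an SQL view standing in for user A's external view. This assumes SQLite via Python's `sqlite3` module; the table name `employee`, the view name `user_a_view` and the sample rows are hypothetical. A program that instead ran `SELECT *` on the base table would see the new column, which illustrates why logical data independence is harder to achieve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EmployeeNumber INTEGER PRIMARY KEY, "
             "EmployeeName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Aarti"), (2, "Pooja")])

# External view for user A: only the two fields A needs.
conn.execute("CREATE VIEW user_a_view AS "
             "SELECT EmployeeNumber, EmployeeName FROM employee")


def user_a_program(conn):
    # User A's program is written against the view, not the base table.
    return conn.execute("SELECT EmployeeNumber, EmployeeName FROM user_a_view "
                        "ORDER BY EmployeeNumber").fetchall()


before = user_a_program(conn)
# Change to the logical (conceptual) schema made on behalf of user B.
conn.execute("ALTER TABLE employee ADD COLUMN salary INTEGER")
after = user_a_program(conn)
print(before == after)  # True: user A's view and program are unaffected
```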



Logical Data Independence
• Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data that they access.


Physical Data Independence
• The ability to modify the physical schema without changing the logical schema (for example, adding an index or changing the file organization)
– Applications depend on the logical schema
– In general, the interfaces between the various levels and
components should be well defined so that changes in
some parts do not seriously influence others.
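A common physical-level change is adding an index. The sketch below, assuming SQLite via Python's `sqlite3` module and the Student rows from these notes (the index name `idx_student_age` is hypothetical), checks that a query returns the same answer before and after the index is created; only the access path reported by `EXPLAIN QUERY PLAN` may change, and its exact wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Akshay", 20), (2, "Rahul", 19), (3, "Pooja", 20), (4, "Aarti", 19)])

QUERY = "SELECT name FROM student WHERE age = 20 ORDER BY name"


def plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


before_rows, before_plan = conn.execute(QUERY).fetchall(), plan(conn)
conn.execute("CREATE INDEX idx_student_age ON student(age)")  # physical-level change
after_rows, after_plan = conn.execute(QUERY).fetchall(), plan(conn)

print(before_rows == after_rows)  # True: same answer, logical schema untouched
print(before_plan)  # typically a full scan of student
print(after_plan)   # typically a search using idx_student_age
```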

Lecture- 5 / DBMS Architecture

DBMS Architecture
• A database architect develops and implements software to
meet the needs of users.
• The design of a DBMS depends on its architecture.
• It can be centralized or decentralized or hierarchical.
• The architecture of a DBMS can be classified into:
1-tier architecture
2-tier architecture
3-tier architecture

Source: https://medium.com/oceanize-geeks/concepts-of-database-architecture-dfdc558a93e4
1-tier architecture
(Figure: application program + database on a single tier.)
• The database is directly available to the user.
• Any changes made here are applied directly to the database itself.
• It does not provide a handy tool for end users.
• The 1-tier architecture is used for developing local applications, where programmers can communicate directly with the database for a quick response.
2-tier architecture
• Same as basic client-server.
• Client side (the user interfaces and application programs)
• Server side (query processing and transaction management over the database)
• Applications on the client end can directly communicate with the database at the server side.
• To communicate with the DBMS, the client-side application establishes a connection with the server side.
• For this interaction, APIs such as ODBC and JDBC are used.
2-tier architecture
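(Figure: the client side runs the user interfaces + application programs; the server side holds the database and performs query processing and transaction management; the client connects to the server over the network through ODBC/JDBC.)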
2-tier architecture
• Advantage
– Maintenance and understanding are easier
• Disadvantage
– Poor performance when there are a large number of users.
3-tier architecture
(Figure: server side: the database, with query processing and transaction management, plus an application server; client side: an application client running the user interfaces + application programs; the two sides communicate over the network.)
3-tier architecture
• Application layer (business logic layer) between the user and
the DBMS
• The application layer is responsible for communicating the user's request to the DBMS and sending the response from the DBMS back to the user.
• The application layer processes
– functional logic, constraints, and rules
before passing data up to the user or down to the DBMS.
• Three-tier architecture is the most popular DBMS architecture.
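The separation of tiers can be sketched in a single Python file, with each tier as a group of functions. This is only an illustration under stated assumptions: in a real deployment the tiers run as separate processes or machines that talk over a network, whereas here they are plain function calls; SQLite stands in for the DBMS, and the `booking` table and the six-seat limit are hypothetical business rules.

```python
import sqlite3


# --- Data tier: the DBMS (here an in-memory SQLite database) ---
def make_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE booking (id INTEGER PRIMARY KEY, "
                 "passenger TEXT NOT NULL, seats INTEGER NOT NULL)")
    return conn


# --- Application tier: business rules live here, not in the UI ---
MAX_SEATS_PER_BOOKING = 6  # hypothetical business rule


def book_seats(conn, passenger, seats):
    if not passenger.strip():
        raise ValueError("passenger name is required")
    if not 1 <= seats <= MAX_SEATS_PER_BOOKING:
        raise ValueError(f"seats must be between 1 and {MAX_SEATS_PER_BOOKING}")
    with conn:  # commit the insert as one transaction
        cur = conn.execute("INSERT INTO booking (passenger, seats) VALUES (?, ?)",
                           (passenger.strip(), seats))
    return cur.lastrowid


# --- Presentation tier: only turns form input into requests and replies ---
def handle_form(conn, form):
    try:
        booking_id = book_seats(conn, form.get("name", ""), int(form.get("seats", "0")))
        return f"Booking {booking_id} confirmed"
    except ValueError as err:
        return f"Error: {err}"


conn = make_data_tier()
print(handle_form(conn, {"name": "Pooja", "seats": "2"}))  # Booking 1 confirmed
print(handle_form(conn, {"name": "Rahul", "seats": "9"}))  # Error: seats must be between 1 and 6
```

Keeping the rule in the middle tier means every user interface that calls `book_seats` gets the same check.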
3-tier architecture
The goal of Three-tier architecture is:
• To separate the user applications and physical database
• Proposed to support DBMS characteristics
• Program-data independence
• Support of multiple views of the data
• This type of architecture is used in case of large web
applications.
Question?
Where do we use 1-tier, 2-tier and 3-tier architecture?
Answer
1-tier:
Development of local applications
2-tier:
Attendance management system
Library management system
3-tier: Any large website
IRCTC, Facebook, etc.
Lecture- 6 / Database Languages

Database Languages
1. Data Definition Language
• Defines the database structure.
• It is used to create schemas, tables, indexes, constraints, etc. in the database.
• It is used to store metadata, such as:
– the number of tables and schemas,
– their names,
– indexes,
– columns in each table,
– constraints, etc.
Tasks that come under DDL
• These commands are used to update the database schema, which is why they come under Data Definition Language.
– Create: It is used to create objects in the database.
– Alter: It is used to alter the structure of the database.
– Drop: It is used to delete objects from the database (removes the structure).
– Truncate: It is used to remove all records from a table (the structure remains the same).
– Rename: It is used to rename an object.
– Comment: It is used to add comments to the data dictionary.
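The DDL commands above can be tried in SQLite through Python's `sqlite3` module. SQLite is an assumption made so the sketch runs without a server, and its dialect differs from the list: there is no `TRUNCATE` (an unqualified `DELETE` is used as a stand-in), no `COMMENT` statement, and renaming is written `ALTER TABLE ... RENAME TO`. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tables(conn):
    # sqlite_master is SQLite's catalog (data dictionary) table.
    return [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


# CREATE: define a new object in the schema.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
# RENAME: written in SQLite as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO learner")
columns = [row[1] for row in conn.execute("PRAGMA table_info(learner)")]
print(tables(conn), columns)  # ['learner'] ['roll_no', 'name', 'age']

# TRUNCATE stand-in: an unqualified DELETE removes every row but keeps the table.
conn.execute("INSERT INTO learner VALUES (1, 'Akshay', 20)")
conn.execute("DELETE FROM learner")
remaining = conn.execute("SELECT COUNT(*) FROM learner").fetchone()[0]
print(tables(conn), remaining)  # ['learner'] 0

# DROP: remove the table definition together with its data.
conn.execute("DROP TABLE learner")
print(tables(conn))  # []
```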
2. Data Manipulation Language
• It is used for accessing and manipulating data in a database.
• It handles user requests.
– Select: It is used to retrieve data from a database.
– Insert: It is used to insert data into a table.
– Update: It is used to update existing data within a table.
– Delete: It is used to delete some (based on a condition) or all records from a table (the structure remains the same).
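A minimal sketch of the four DML commands, assuming SQLite via Python's `sqlite3` module and using the Student table rows that appear later in these notes; the particular update and delete conditions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add rows.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Akshay", 20), (2, "Rahul", 19), (3, "Pooja", 20), (4, "Aarti", 19)])
# UPDATE: change existing rows that match a condition.
conn.execute("UPDATE student SET age = age + 1 WHERE name = ?", ("Rahul",))
# DELETE: remove only the rows that match the WHERE clause; the table remains.
conn.execute("DELETE FROM student WHERE age = 19")
# SELECT: retrieve data.
rows = conn.execute("SELECT roll_no, name, age FROM student ORDER BY roll_no").fetchall()
print(rows)  # [(1, 'Akshay', 20), (2, 'Rahul', 20), (3, 'Pooja', 20)]
```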
3. Data Control Language
• The DCL execution is transactional.
• Tasks that come under DCL:
– Grant: It is used to give user access privileges to a
database.
– Revoke: It is used to take back permissions from the user.
4. Transaction Control Language

• TCL is used to manage the changes made by DML statements.
• DML statements can be grouped into a logical transaction.
• Tasks that come under TCL:
– Commit: It is used to save the transaction on the database.
– Rollback: It is used to restore the database to its state as of the last Commit.
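The two TCL commands can be issued explicitly in SQLite. This sketch assumes Python's `sqlite3` module opened with `isolation_level=None`, so that no transaction is started implicitly and every `BEGIN`, `COMMIT` and `ROLLBACK` appears in the code; the `account` table is hypothetical.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so the
# transaction boundaries below are exactly the ones written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("COMMIT")    # the insert is now permanent

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 0 WHERE id = 'A'")
conn.execute("DELETE FROM account")
conn.execute("ROLLBACK")  # undo everything since the last COMMIT

print(conn.execute("SELECT * FROM account").fetchall())  # [('A', 100)]
```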
Question ?

• Drop vs Delete vs Truncate?


Drop vs Delete vs Truncate

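Feature       Delete                Truncate               Drop
Command type  DML                   DDL                    DDL
Removes       Few or all records    All records            The whole table
Structure     No change             No change              Removed
WHERE clause  May be used           Not used               Not used
Rollback      Can be rolled back    Cannot be rolled back* Cannot be rolled back*
Speed         Slower than Truncate  Faster than Delete     Faster than Delete

* In DBMSs such as Oracle, DDL statements commit implicitly, so TRUNCATE and DROP cannot be rolled back; some other DBMSs (e.g. PostgreSQL) allow them to be rolled back inside a transaction.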
Lecture- 7 / Database Model

Database model
• It shows the logical structure of a database, including the relationships and constraints.
• It determines how data can be stored and accessed.
• Most data models can be represented by an accompanying database diagram.
Types of database models
– Hierarchical database model
– Network model
– Relational model
– Object-oriented database model
Hierarchical model
• The hierarchical model organizes data into a tree-like
structure.
• Each record has a single parent or root.
• Sibling records are sorted in a particular order.
• That order is used as the physical order for storing the
database.
• This model is good for describing many real-world
relationships.
• This model was primarily used by IBM's Information Management System (IMS) in the 1960s and 1970s, but hierarchical databases are rarely seen today due to certain operational inefficiencies.
Hierarchical model
Network Model
• This is an extension of the Hierarchical model.
• In this model, data is organized more like a graph, and records are allowed to have more than one parent node.
• Because more relationships can be established between records, the data is more closely related.
• As a result, accessing related data is also easier and faster.
• This database model was used to map many-to-many data
relationships.
• This was the most widely used database model, before
Relational Model was introduced.
Network Model
Relational model
• The model was introduced by E.F. Codd in 1970.
• The most common model, the relational model, stores data in tables.
• Tables are also known as relations, each of which consists of columns and rows.
• Each column lists an attribute of the entity, such as price, zip code, or birth date.
• The set of permitted values of an attribute is called its domain.
• A particular attribute or combination of attributes is chosen as the primary key; when it is referred to in another table, the referring attribute is called a foreign key.
• Each row, also called a tuple, includes data about a specific instance of the entity, such as a particular employee.
• The model also accounts for the types of relationships between those tables, including one-to-one, one-to-many, and many-to-many relationships.
• Relational databases are typically defined and queried using Structured Query Language (SQL).
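Primary and foreign keys can be sketched in SQL. This assumes SQLite via Python's `sqlite3` module, which checks foreign keys only after `PRAGMA foreign_keys = ON`; the `department` and `employee` tables and their rows are hypothetical. The insert that refers to a non-existent department is rejected, and the foreign key is what the join uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, "
             "dept_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                    -- primary key
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    )""")
conn.execute("INSERT INTO department VALUES (10, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 'Aarti', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Pooja', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

# The foreign key is what lets the two relations be joined meaningfully.
joined = conn.execute("SELECT e.name, d.dept_name FROM employee e "
                      "JOIN department d USING (dept_id)").fetchall()
print(rejected, joined)  # True [('Aarti', 'Accounts')]
```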
Object Oriented Data model
• An object database is a database management system in which information is represented in the form of objects, as used in object-oriented programming.
• Object databases are different from relational databases, which are table-oriented.
• Object databases have been considered since the early 1980s.
Question?
• Which database model is used in your university?

Answer
• Relational model


Lecture- 8 / Database Design

Design Phases
1. Conceptual Design
Once all the requirements have been collected and analyzed, the next step is to
create a conceptual schema for the database, using a high level conceptual data
model. The result of this phase is an Entity-Relationship (ER) diagram or UML
class diagram. It describes how different entities (objects, items) are related to
each other. It also describes what attributes (features) each entity has. It includes
the definitions of all the concepts (entities, attributes) of the
application area. During or after the conceptual schema design, the basic data
model operations can be used to specify the high-level user operations identified
during the functional analysis. This also serves to confirm that the conceptual
schema meets all the identified functional requirements.
Design Phases
2. Logical Design
The result of the logical design phase (or data model mapping
phase) is a set of relation schemas. The ER diagram or class
diagram is the basis for these relation schemas. Creating the
relation schemas is a fairly mechanical operation: there are
rules for how the ER model or class diagram is transferred to
relation schemas.
table definitions. In this phase (if not done in previous phase)
the primary keys and foreign keys are defined.
Design Phases
Normalization
Normalization is the last part of the logical design. The goal of
normalization is to eliminate redundancy and potential
update anomalies. Redundancy means that the same data is
saved more than once in a database. Update anomaly is a
consequence of redundancy. If a piece of data is saved in
more than one place, the same data must be updated in
more than one place. Normalization is a technique by which
one can modify the relation schema to reduce the
redundancy. Each normalization step typically decomposes a relation
into smaller ones, adding more relations (tables) to the database.
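The update anomaly described above can be reproduced with a small, deliberately unnormalized table. This is a sketch assuming SQLite via Python's `sqlite3` module; the table names, employees and locations are hypothetical. After a partial update the same fact has two values, while the decomposed design stores it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized design: the department's location is repeated in every employee row.
conn.execute("CREATE TABLE emp_dept (emp_id INTEGER PRIMARY KEY, name TEXT, "
             "dept TEXT, dept_location TEXT)")
conn.executemany("INSERT INTO emp_dept VALUES (?, ?, ?, ?)",
                 [(1, "Aarti", "Sales", "Delhi"), (2, "Pooja", "Sales", "Delhi")])

# Update anomaly: the department moves, but only one of the copies is updated.
conn.execute("UPDATE emp_dept SET dept_location = 'Mathura' WHERE emp_id = 1")
before = conn.execute("SELECT DISTINCT dept_location FROM emp_dept "
                      "WHERE dept = 'Sales' ORDER BY dept_location").fetchall()
print(before)  # [('Delhi',), ('Mathura',)]: two answers for one fact

# Normalized design: the location is stored once, in its own relation.
conn.executescript("""
    CREATE TABLE dept (dept TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept TEXT REFERENCES dept(dept));
    INSERT INTO dept VALUES ('Sales', 'Delhi');
    INSERT INTO emp VALUES (1, 'Aarti', 'Sales'), (2, 'Pooja', 'Sales');
    UPDATE dept SET location = 'Mathura' WHERE dept = 'Sales';
""")
after = conn.execute("SELECT DISTINCT d.location FROM emp e "
                     "JOIN dept d USING (dept)").fetchall()
print(after)  # [('Mathura',)]: a single update changes the one stored copy
```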
Design Phases
Physical Design
The goal of the last phase of database design, physical design,
is to implement the database. At this phase one must know
which database management system (DBMS) is used. For example,
different DBMSs use different names for data types and support
different data types. The SQL statements that create the database
are written. The indexes, the integrity constraints (rules) and the
users' access rights are defined. Finally, test data is added to
the database.
In parallel with these activities, application programs are
designed. The implementation of the programs can start
when the database is created and data has been added in.
Design Phases
• Initial phase -- characterize fully the data needs of the
prospective database users.
• Second phase -- choosing a data model
– Applying the concepts of the chosen data model
– Translating these requirements into a conceptual schema
of the database.
– A fully developed conceptual schema indicates the
functional requirements of the enterprise.
• Describe the kinds of operations (or transactions) that
will be performed on the data.
Design Phases (Cont.)
• Final Phase -- Moving from an abstract data model to the
implementation of the database
– Logical Design – Deciding on the database schema.
• Database design requires that we find a “good” collection
of relation schemas.
– Business decision – What attributes should we record in
the database?
– Computer Science decision – What relation schemas
should we have and how should the attributes be
distributed among the various relation schemas?
– Physical Design – Deciding on the physical layout of the
database
Design Alternatives
• In designing a database schema, we must ensure that we avoid
two major pitfalls:
• Redundancy: a bad design may result in repeated
information.
– Redundant representation of information may lead to
data inconsistency among the various copies of
information.
• Incompleteness: a bad design may make certain aspects of
the enterprise difficult or impossible to model.
• Avoiding bad designs is not enough. There may be a large
number of good designs from which we must choose.
Design Approaches
• Entity Relationship Model
– Models an enterprise as a collection of entities and
relationships
– Entity: a “thing” or “object” in the enterprise that is
distinguishable from other objects
• Described by a set of attributes
– Relationship: an association among several entities
– Represented diagrammatically by an entity-relationship
diagram
• Normalization Theory
– Formalize what designs are bad, and test for them
Outline of the ER Model
ER diagram
• ER diagram or Entity Relationship diagram is a conceptual
model that gives the graphical representation of the logical
structure of the database.
• It shows all the constraints and relationships that exist among
the different components.
Components of ER diagram

• An ER diagram is mainly composed of the following three components:
– Entity Sets
– Attributes
– Relationship Sets
Example-
• This complete table is referred to as “Student Entity Set”, and
• Each row represents an “entity”.
Student Table
Roll_no   Name     Age
1         Akshay   20
2         Rahul    19
3         Pooja    20
4         Aarti    19
Representation as ER Diagram

ER Diagram Symbols- Entity Sets

1. For Entity Sets
• An entity set is a set of entities of the same type.
• An entity refers to any object having-
– Either a physical existence such as a particular person, office, house or car.
– Or a conceptual existence such as a school or a company.
• An entity set may be of the following two types-
– Strong entity set
– Weak entity set
ER Diagram Symbols- Entity Sets

1. Strong Entity Set-
• A strong entity set possesses its own primary key.
• It is represented using a single rectangle.
2. Weak Entity Set-
• A weak entity set does not possess its own primary key.
• It is represented using a double rectangle.
ER Diagram Symbols-Relationship Sets

• A relationship defines an association among several entities.
• A relationship set is a set of relationships of the same type.
• A relationship set may be of the following two types-
– Strong relationship set
– Weak relationship set
ER Diagram Symbols-Relationship Sets

1. Strong Relationship Set-
• A strong relationship exists between two strong entity sets.
• It is represented using a diamond symbol.
2. Weak Relationship Set-
• A weak relationship exists between a strong and a weak entity set.
• It is represented using a double diamond symbol.
ER Diagram Symbols - Attributes

• Attributes are the properties that describe the entities of an entity set.
• There are several types of attributes.
ER Diagram Symbols- Participation Constraints

• A participation constraint defines the minimum number of relationship instances in which an entity must participate.
• There are two types of participation constraints-
– Partial participation
– Total participation
ER Diagram Symbols- Participation Constraints

1. Partial Participation-
• Partial participation is represented using a single line
between the entity set and relationship set.
2. Total Participation-
• Total participation is represented using a double line
between the entity set and relationship set.
ER Diagram Symbols –
Specialization and Generalization

• Generalization is the process of forming a generalized superclass by extracting the common characteristics of two or more classes.
• Specialization is the reverse process of generalization, where a superclass is divided into subclasses by assigning the specific characteristics of the subclasses to them.
ER Diagram Symbols –
Cardinality Constraints / Ratios
• Cardinality constraint defines the maximum number of
relationship instances in which an entity can participate.
• There are 4 types of cardinality ratios-
– Many-to-many cardinality (m:n)
– Many-to-one cardinality (m:1)
– One-to-many cardinality (1:n)
– One-to-one cardinality (1:1)
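One common way to enforce these cardinality ratios when an ER design is mapped to tables is with keys. A sketch, assuming SQLite via Python's `sqlite3` module; the tables are hypothetical and this is one possible mapping, not the only one: a foreign key gives 1:n / m:1, a foreign key that is also `UNIQUE` gives 1:1, and a separate relationship table with a composite primary key gives m:n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    -- many-to-one (m:1) / one-to-many (1:n): each employee row points at one
    -- department, and many rows may point at the same department
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY,
                           dept_id INTEGER REFERENCES department(dept_id));
    -- one-to-one (1:1): a department has at most one manager (primary key)
    -- and an employee manages at most one department (UNIQUE)
    CREATE TABLE manages (dept_id INTEGER PRIMARY KEY REFERENCES department(dept_id),
                          emp_id  INTEGER UNIQUE REFERENCES employee(emp_id));
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY);
    -- many-to-many (m:n): a separate relationship table keyed on both sides
    CREATE TABLE works_on (emp_id  INTEGER REFERENCES employee(emp_id),
                           proj_id INTEGER REFERENCES project(proj_id),
                           PRIMARY KEY (emp_id, proj_id));
    INSERT INTO department VALUES (1), (2);
    INSERT INTO employee VALUES (100, 1), (101, 1);
    INSERT INTO project VALUES (7), (8);
    INSERT INTO works_on VALUES (100, 7), (100, 8), (101, 7);
    INSERT INTO manages VALUES (1, 100);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO manages VALUES (2, 100)"))   # True: 100 already manages a department
print(rejected("INSERT INTO manages VALUES (1, 101)"))   # True: department 1 already has a manager
print(rejected("INSERT INTO works_on VALUES (101, 8)"))  # False: m:n allows another pairing
```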
ER Diagram Symbols –
Cardinality Constraints / Ratios
Entity Sets in DBMS
• An entity refers to any object having-
– Either a physical existence such as a particular person, office, house or
car.
– Or a conceptual existence such as a school, a university, a company or
a job.

• In ER diagram,
– Attributes are associated with an entity set.
– Attributes describe the properties of entities in the entity set.
– Based on the values of certain attributes, an entity can be identified
uniquely.
Types of Entity Sets
1. Strong Entity Set
• A strong entity set is an entity set that contains sufficient
attributes to uniquely identify all its entities.
• In other words, a primary key exists for a strong entity set.
• Primary key of a strong entity set is represented by
underlining it.
1. Strong Entity Set
Symbols Used-
• A single rectangle is used for representing a strong entity set.
• A diamond symbol is used for representing the relationship
that exists between two strong entity sets.
• A single line is used for representing the connection of the
strong entity set with the relationship set.
• A double line is used for representing the total participation
of an entity set with the relationship set.
• Total participation may or may not exist in the relationship.
In this ER diagram,
• Two strong entity sets “Student” and “Course” are related to each other.
• Student ID and Student name are the attributes of entity set “Student”.
• Student ID is the primary key using which any student can be identified
uniquely.
• Course ID and Course name are the attributes of entity set “Course”.
• Course ID is the primary key using which any course can be identified
uniquely.
• Double line between Student and relationship set signifies total
participation.
• It suggests that each student must be enrolled in at least one course.
• Single line between Course and relationship set signifies partial
participation.
• It suggests that there might exist some courses for which no enrollments
are made.
2. Weak Entity Set
• A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities.
• In other words, a primary key does not exist for a weak entity set.
• However, it contains a partial key called a discriminator.
• The discriminator distinguishes among the weak entities that depend on the same strong (owner) entity.
• The discriminator is represented by underlining it with a dashed line.
NOTE-
• The combination of the discriminator and the primary key of the strong entity set makes it possible to uniquely identify all entities of the weak entity set.
• Thus, this combination serves as the primary key of the weak entity set.
• Clearly, this primary key is not formed entirely from the weak entity set's own attributes.
Symbols Used-
• A double rectangle is used for representing a weak entity set.
• A double diamond symbol is used for representing the
relationship that exists between the strong and weak entity
sets and this relationship is known as identifying relationship.
• A double line is used for representing the connection of the
weak entity set with the relationship set.
• Total participation always exists in the identifying relationship.
• In this ER diagram,
• One strong entity set “Building” and one weak entity set “Apartment” are related to each other.
• Strong entity set “Building” has building number as its primary key.
• Door number is the discriminator of the weak entity set “Apartment”.
• This is because door number alone cannot identify an apartment uniquely, as several other buildings may have the same door number.
• Double line between Apartment and relationship set signifies total participation.
• It suggests that each apartment must be present in at least one building.
• Single line between Building and relationship set signifies partial participation.
• It suggests that there might exist some buildings which have no apartments.
• To uniquely identify any apartment,
– First, building number is required to identify the particular building.
– Secondly, door number of the apartment is required to uniquely
identify the apartment.

• Thus, Primary key of Apartment
= Primary key of Building + Its own discriminator
= Building number + Door number
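A minimal sketch of how the Building/Apartment weak entity could be stored, using Python's built-in sqlite3 module as a stand-in RDBMS. The table and column names (Building, Apartment, building_no, door_no) are assumptions chosen for illustration; the point is that the weak entity's primary key is the owner's key plus the discriminator.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Strong entity set: Building has a primary key of its own.
conn.execute("CREATE TABLE Building (building_no INTEGER PRIMARY KEY)")

# Weak entity set: Apartment. door_no is only a discriminator, so the
# primary key borrows building_no from the owner (strong) entity.
conn.execute("""
    CREATE TABLE Apartment (
        building_no INTEGER NOT NULL REFERENCES Building(building_no),
        door_no     INTEGER NOT NULL,
        PRIMARY KEY (building_no, door_no)
    )
""")

conn.executemany("INSERT INTO Building VALUES (?)", [(1,), (2,)])
# The same door number may appear in different buildings...
conn.executemany("INSERT INTO Apartment VALUES (?, ?)", [(1, 101), (2, 101)])

# ...but not twice in the same building.
try:
    conn.execute("INSERT INTO Apartment VALUES (1, 101)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

apartment_count = conn.execute("SELECT COUNT(*) FROM Apartment").fetchone()[0]
print(apartment_count, duplicate_rejected)  # 2 True
```

Door number 101 is accepted in two different buildings but rejected a second time in the same building, which is what the composite key Building number + Door number expresses.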
Strong entity set VS Weak entity set
Strong entity set | Weak entity set
A single rectangle is used for the representation of a strong entity set. | A double rectangle is used for the representation of a weak entity set.
It contains sufficient attributes to form its primary key. | It does not contain sufficient attributes to form its primary key.
A diamond symbol is used for the representation of the relationship that exists between the two strong entity sets. | A double diamond symbol is used for the representation of the identifying relationship that exists between the strong and weak entity set.
A single line is used for the representation of the connection between the strong entity set and the relationship. | A double line is used for the representation of the connection between the weak entity set and the relationship set.
Total participation may or may not exist in the relationship. | Total participation always exists in the identifying relationship.
Important Note
• In ER diagram, weak entity set is always present in total
participation with the identifying relationship set.
• So, we always have the picture like shown here-
Relationship in DBMS
• A relationship is defined as an association among several
entities.
Example
• ‘Enrolled in’ is a relationship that exists between
entities Student and Course.
Relationship Set-
• A relationship set is a set of relationships of the same type.
Example
• Set representation of above ER diagram is-
Degree of a Relationship Set
• The number of entity sets that participate in a relationship set
is termed as the degree of that relationship set.

• Thus, Degree of a relationship set
= Number of entity sets participating in a relationship set
Types of Relationship Sets-

• On the basis of degree of a relationship set, a relationship set can be classified into the following types-
– Unary relationship set
– Binary relationship set
– Ternary relationship set
1. Unary Relationship Set-
• Unary relationship set is a relationship set where only one
entity set participates in a relationship set.
Example-
Recursive Relationship
• It is possible for the same entity set to participate in a relationship more than once, in different roles.
• This is termed a recursive relationship.
Employee entity
• Employee no
• Employee surname
• Employee forename
• Employee DOB
• Employee dept number
• Manager no * (this is the employee no of the employee's manager)
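A minimal sketch of the recursive Employee relationship, again using Python's sqlite3 module. Only a subset of the listed attributes is kept, and the column names and sample surnames are hypothetical; Manager no becomes a foreign key that points back into the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# manager_no refers back to the same table: a recursive (unary) relationship.
conn.execute("""
    CREATE TABLE Employee (
        employee_no INTEGER PRIMARY KEY,
        surname     TEXT NOT NULL,
        manager_no  INTEGER REFERENCES Employee(employee_no)
    )
""")

# The top-level manager has no manager, so manager_no is NULL.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Sharma", None), (2, "Verma", 1), (3, "Gupta", 1)],
)

# A self-join uses the table in two roles: employee (e) and manager (m).
rows = conn.execute("""
    SELECT e.surname, m.surname
    FROM Employee AS e
    JOIN Employee AS m ON e.manager_no = m.employee_no
    ORDER BY e.employee_no
""").fetchall()
print(rows)  # [('Verma', 'Sharma'), ('Gupta', 'Sharma')]
```

The two aliases e and m correspond to the two roles the same entity set plays in the relationship.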
2. Binary Relationship Set-

• Binary relationship set is a relationship set where two entity sets participate in a relationship set.
3. Ternary Relationship Set-
• Ternary relationship set is a relationship set where three
entity sets participate in a relationship set.
Cardinality in ER Diagram
• Cardinality constraint defines the maximum number of
relationship instances in which an entity can participate.

• There are 4 types of cardinality ratios-
– Many-to-Many cardinality (m:n)
– Many-to-One cardinality (m:1)
– One-to-Many cardinality (1:n)
– One-to-One cardinality (1:1)
1. Many-to-Many Cardinality
• By this cardinality constraint,
• An entity in set A can be associated with any number (zero or
more) of entities in set B.
• An entity in set B can be associated with any number (zero or
more) of entities in set A.
• One student can enroll in any number (zero or more) of
courses.
• One course can have any number (zero or more) of students enrolled in it.
2. Many-to-One Cardinality
• An entity in set A can be associated with at most one entity in
set B.
• An entity in set B can be associated with any number (zero or
more) of entities in set A.
• One student can enroll in at most one course.
• One course can have any number (zero or more) of students enrolled in it.
3. One-to-Many Cardinality
• An entity in set A can be associated with any number (zero or
more) of entities in set B.
• An entity in set B can be associated with at most one entity in
set A.
• One student can enroll in any number (zero or more) of
courses.
• One course can have at most one student enrolled in it.
4. One-to-One Cardinality
• An entity in set A can be associated with at most one entity in
set B.
• An entity in set B can be associated with at most one entity in
set A.
• One student can enroll in at most one course.
• One course can have at most one student enrolled in it.
Participation Constraints
• Participation constraints define the least number of
relationship instances in which an entity must compulsorily
participate.

• There are two types of participation constraints-


– Total participation
– Partial participation
1. Total Participation-
• It specifies that each entity in the entity set must
compulsorily participate in at least one relationship instance
in that relationship set.
• That is why it is also called mandatory participation.
• Total participation is represented using a double line
between the entity set and relationship set.
• Double line between the entity set “Student” and relationship
set “Enrolled in” signifies total participation.
• It specifies that each student must be enrolled in at least one
course.
2. Partial Participation
• It specifies that each entity in the entity set may or may not
participate in the relationship instance in that relationship set.
• That is why it is also called optional participation.
• Partial participation is represented using a single line between
the entity set and relationship set.
• Single line between the entity set “Course” and relationship
set “Enrolled in” signifies partial participation.
• It specifies that there might exist some courses for which no
enrollments are made.
Relationship between Cardinality and Participation Constraints-

• Minimum cardinality tells whether the participation is partial or total.
– If minimum cardinality = 0, then it signifies partial participation.
– If minimum cardinality = 1, then it signifies total participation.
• Maximum cardinality tells the maximum number of relationship instances in which an entity can participate.
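The rule above can be illustrated by counting, for one snapshot of relationship instances, how many instances each entity takes part in. This is a sketch with hypothetical sample data (students S1 to S3, courses C1 to C3); real constraints are declared on the schema, so a single snapshot can only illustrate them, not prove them.

```python
from collections import Counter

def participation(entities, instances, side):
    """Return (min, max) number of relationship instances per entity.

    entities  -- every entity of one entity set
    instances -- relationship instances as (left, right) pairs
    side      -- 0 if `entities` is the left set, 1 if it is the right set
    """
    counts = Counter(pair[side] for pair in instances)
    per_entity = [counts[e] for e in entities]  # Counter returns 0 for absent keys
    return min(per_entity), max(per_entity)

students = ["S1", "S2", "S3"]
courses = ["C1", "C2", "C3"]
enrolled_in = [("S1", "C1"), ("S1", "C2"), ("S2", "C1"), ("S3", "C2")]

s_min, s_max = participation(students, enrolled_in, 0)
c_min, c_max = participation(courses, enrolled_in, 1)
print("Student:", "total" if s_min >= 1 else "partial", "max", s_max)  # total max 2
print("Course:", "total" if c_min >= 1 else "partial", "max", c_max)   # partial max 2
```

Every student appears at least once (minimum 1, total participation), while course C3 has no enrollments (minimum 0, partial participation).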
Types of Attributes
• Attributes are the descriptive properties which are owned by
each entity of an Entity Set.
• There exists a specific domain, or set of values, for each attribute from which the attribute takes its values.
• In ER diagram, attributes associated with an entity set may be
of the following types-
1. Simple Attributes-
• Simple attributes are those attributes which cannot be divided further.
Example
• Here, all the attributes are simple attributes as they cannot be divided further.
2. Composite Attributes
• Composite attributes are those attributes which are
composed of many other simple attributes.

Example-

• Here, the attributes “Name” and “Address” are composite attributes as they are composed of many other simple attributes.
3. Single Valued Attributes
• Single valued attributes are those attributes which can take
only one value for a given entity from an entity set.

• Here, all the attributes are single valued attributes as they can
take only one specific value for each entity.
4. Multi Valued Attributes
• Multi valued attributes are those attributes which can take
more than one value for a given entity from an entity set.

Example-

• Here, the attributes “Mob_no” and “Email_id” are multi valued attributes as they can take more than one value for a given entity.
5. Derived Attributes
• Derived attributes are those attributes which can be derived
from other attribute(s).

Example-

• Here, the attribute “Age” is a derived attribute as it can be derived from the attribute “DOB”.
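To show why Age need not be stored, here is a small sketch that derives it from DOB on demand. The function name age_on and the sample dates are hypothetical; age is taken as full years elapsed.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive Age from DOB: full years elapsed as of `today`."""
    years = today.year - dob.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

print(age_on(date(2000, 8, 15), date(2024, 8, 14)))  # 23
print(age_on(date(2000, 8, 15), date(2024, 8, 15)))  # 24
```

Because the value changes every year, computing it from the stored DOB keeps it correct without updates.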
6. Key Attributes-
• Key attributes are those attributes which can identify an
entity uniquely in an entity set.

Example-

• Here, the attribute “Roll_no” is a key attribute as it can identify any student uniquely.
Steps to Create an ERD
• In a university, a Student enrolls in Courses.
• A student must be assigned to at least one Course.
• Each course is taught by a single Professor.
• To maintain instruction quality, a Professor can deliver only one course.
Step 1) Entity Identification
• We have three entities
– Student
– Course
– Professor
Step 2) Relationship Identification
• We have the following two relationships
– The student is assigned a course
– Professor delivers a course
Step 3) Cardinality Identification
• From the problem statement, we know that,
– A student can be assigned multiple courses
– A Professor can deliver only one course
Step 4) Identify Attributes
• You need to study the files, forms, reports, data currently
maintained by the organization to identify attributes.
• You can also conduct interviews with various stakeholders to
identify entities.
• Initially, it's important to identify the attributes without
mapping them to a particular entity.
• Once you have a list of attributes, you need to map them to the identified entities.
• Once the mapping is done, identify the primary Keys.
• If a unique key is not readily available, create one.
Entity | Primary Key | Attribute
Student | Student_ID | StudentName
Professor | Employee_ID | ProfessorName
Course | Course_ID | CourseName
Step 5) Create the ERD
• A more modern representation of the ER diagram
ER diagram example
• Suppose you are given the following requirements for a simple database for the National Hockey League (NHL):
– the NHL has many teams,
– each team has a name, a city, a coach, a captain, and a set of
players,
– each player belongs to only one team,
– each player has a name, a position (such as left wing or goalie),
a skill level, and a set of injury records,
– a team captain is also a player,
– a game is played between two teams (referred to as host_team
and guest_team) and has a date (such as May 11th, 1999) and a
score (such as 4 to 2).
• Entities:
– Team(t_name, city, coach )
– Player(p_name, position, skill_level)
– Injury record (weak entity, depends on Player)
• Relationships:
– Each team has a captain, who is also a player
– Each team has many players
– A game is played between two teams (host and guest),
and has a date and a score (attributes)
– A player has injury records
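[Figure: ER diagram of the NHL database. Entities: Team (T_name, City, Coach), Player (P_no, P_name, Position, Skill_level) and the weak entity Injury Record (Description). Relationships: Game (Host, Guest; attributes Date, Score), Captain, Belongs_To and Has, with cardinality labels 1, m and n.]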
Example 2
• A university registrar’s office maintains data about the following
entities:
– courses, including number, title, credits, syllabus, and prerequisites;
– course offerings, including course number, year, semester, section
number, instructor(s), timings, and classroom;
– students, including student-id, name, and program;
– instructors, including identification number, name, department, and
title.
• Further, the enrollment of students in courses and grades awarded
to students in each course they are enrolled for must be
appropriately modeled.
• Construct an E-R diagram for the registrar’s office.
• Document all assumptions that you make about the mapping
constraints.
Entities
• Student(sid, name, program)
• Course(C_number, title, credits, syllabus)
• Course offering(c_number, year, semester, section_number, timings, classroom)
• Instructor(iid, name, department, title)

Relationships
• Students enroll in course offerings, and a grade is awarded for each enrollment.
• Instructors teach course offerings.
• A course is offered as course offerings.
• A course may require other courses as prerequisites.
Example 3
• Construct an E-R diagram for a car-insurance company whose
customers own one or more cars each.
• Each car has associated with it zero to any number of
recorded accidents.
• Construct appropriate tables for the above ER diagram.
• Car insurance tables:
– person (driver-id, name, address)
– car (license, year, model)
– accident (report-number, date, location)
– participated (driver-id, license, report-number, damage-amount)
Example 4
• Construct an E-R diagram for a hospital with a set of patients
and a set of medical doctors.
• Associate with each patient a log of the various tests and
examinations conducted.
• Construct appropriate tables for the above ER
Diagram :
– Patient(SS#, name, insurance)
– Physician ( name, specialization)
– Test-log( SS#, test-name, date, time)
– Doctor-patient (physician-name, SS#)
– Patient-history(SS#, test-name, date)
Example 5
• Draw the E-R diagram which models an online
bookstore.
Converting ER Diagrams to Tables

• After designing an ER diagram, the ER diagram is converted into tables in the relational model.
• This is because relational models can be easily implemented by RDBMSs like MySQL, Oracle, etc.
Rule-01: For Strong Entity Set With Only Simple Attributes
• A strong entity set with only simple attributes will require only one
table in relational model.
• Attributes of the table will be the attributes of the entity set.
• The primary key of the table will be the key attribute of the entity
set.

Schema : Student ( Roll_no , Gender , Age )

Roll_no Gender Age


Rule-02: For Strong Entity Set With Composite Attributes
• A strong entity set with any number of composite attributes
will require only one table in relational model.
• During conversion, the simple attributes that make up each composite attribute become columns; the composite attribute itself does not.
Roll_no First_name Last_name House_no Street City
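Rule-02 can be sketched as a small function that replaces every composite attribute by its simple components. The nested-dictionary representation and the sample values are assumptions for illustration, and component names are assumed to be unique across composites.

```python
def flatten(entity: dict) -> dict:
    """Replace each composite attribute (a nested dict) by its simple components."""
    row = {}
    for name, value in entity.items():
        if isinstance(value, dict):
            row.update(flatten(value))  # keep only the component attributes
        else:
            row[name] = value
    return row

student = {
    "Roll_no": 1,
    "Name": {"First_name": "Asha", "Last_name": "Rao"},
    "Address": {"House_no": "12", "Street": "MG Road", "City": "Mathura"},
}
print(list(flatten(student)))
# ['Roll_no', 'First_name', 'Last_name', 'House_no', 'Street', 'City']
```

The resulting column list matches the single table shown for Rule-02; "Name" and "Address" themselves do not appear as columns.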
Rule-03: For Strong Entity Set With Multi Valued Attributes
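• A strong entity set with a multi valued attribute will require two tables in relational model.
• One table will contain all the simple attributes with the primary key.
• The other table will contain the primary key and the multi valued attribute.
• If there are several multi valued attributes, each one normally gets its own such table, so that the values of one are not paired with every value of another.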
Table 1: Roll_no | City
Table 2: Roll_no | Mobile_No
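A minimal sketch of Rule-03 with Python's sqlite3 module. The table names Student and Student_Mobile and the phone numbers are hypothetical; the multi-valued attribute gets one row per value, keyed by (Roll_no, Mobile_No).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Table 1: the simple attributes together with the primary key.
conn.execute("CREATE TABLE Student (Roll_no INTEGER PRIMARY KEY, City TEXT)")

# Table 2: the primary key plus the multi-valued attribute, one row per value.
conn.execute("""
    CREATE TABLE Student_Mobile (
        Roll_no   INTEGER REFERENCES Student(Roll_no),
        Mobile_No TEXT,
        PRIMARY KEY (Roll_no, Mobile_No)
    )
""")

conn.execute("INSERT INTO Student VALUES (1, 'Mathura')")
conn.executemany(
    "INSERT INTO Student_Mobile VALUES (?, ?)",
    [(1, "9000000001"), (1, "9000000002")],
)

mobiles = [m for (m,) in conn.execute(
    "SELECT Mobile_No FROM Student_Mobile WHERE Roll_no = 1 ORDER BY Mobile_No")]
print(mobiles)  # ['9000000001', '9000000002']
```

One student with two mobile numbers occupies one row in Student and two rows in Student_Mobile.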
Rule-04: Translating Relationship Set into a Table
• A relationship set will require one table in the relational
model.
• Attributes of the table are-
• Primary key attributes of the participating entity sets
• Its own descriptive attributes if any.
• Set of non-descriptive attributes will be the primary key.
Emp_no Dept_id Since

Schema : Works in ( Emp_no , Dept_id , since )
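A sketch of Rule-04 using Python's sqlite3 module: the relationship set Works in becomes its own table holding the keys of both participants plus its descriptive attribute. The Employee and Department tables are reduced to their keys, "Works in" is written Works_in because of the space, and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Employee (Emp_no INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Department (Dept_id INTEGER PRIMARY KEY)")

# Relationship set -> table: keys of both entity sets + descriptive attribute.
# The non-descriptive attributes (Emp_no, Dept_id) form the primary key.
conn.execute("""
    CREATE TABLE Works_in (
        Emp_no  INTEGER REFERENCES Employee(Emp_no),
        Dept_id INTEGER REFERENCES Department(Dept_id),
        since   TEXT,
        PRIMARY KEY (Emp_no, Dept_id)
    )
""")
conn.executemany("INSERT INTO Employee VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Department VALUES (?)", [(10,), (20,)])
conn.executemany(
    "INSERT INTO Works_in VALUES (?, ?, ?)",
    [(1, 10, "2020-01-01"), (1, 20, "2021-06-01"), (2, 10, "2022-03-15")],
)

try:
    conn.execute("INSERT INTO Works_in VALUES (1, 10, '2023-01-01')")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

depts_of_emp1 = [d for (d,) in conn.execute(
    "SELECT Dept_id FROM Works_in WHERE Emp_no = 1 ORDER BY Dept_id")]
print(depts_of_emp1, duplicate_rejected)  # [10, 20] True
```

An employee may work in several departments, but the same (Emp_no, Dept_id) pair can be recorded only once.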


Rule-05: For Binary Relationships With Cardinality Ratios
The following four cases are possible-
– Case-01: Binary relationship with cardinality ratio m:n
– Case-02: Binary relationship with cardinality ratio 1:n
– Case-03: Binary relationship with cardinality ratio m:1
– Case-04: Binary relationship with cardinality ratio 1:1
Case-01: For Binary Relationship With Cardinality Ratio m:n

Here, three tables will be required-
• A ( a1 , a2 )
• R ( a1 , b1 )
• B ( b1 , b2 )
Case-02: For Binary Relationship With Cardinality Ratio 1:n

Here, two tables will be required-
• A ( a1 , a2 )
• BR ( a1 , b1 , b2 )
NOTE- Here, combined table will be drawn for the entity set B
and relationship set R.
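A sketch of Case-02 with Python's sqlite3 module, using the generic names A, B, R, a1, a2, b1, b2 from the case above. Because B is on the many side, R is merged into B's table: each B row stores the key of the single A it relates to, or NULL if it relates to none. The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 TEXT)")
conn.execute("""
    CREATE TABLE BR (
        a1 INTEGER REFERENCES A(a1),  -- relationship R stored on the many side
        b1 INTEGER PRIMARY KEY,
        b2 TEXT
    )
""")
conn.execute("INSERT INTO A VALUES (1, 'x')")
conn.executemany(
    "INSERT INTO BR VALUES (?, ?, ?)",
    [(1, 100, "p"), (1, 101, "q"), (None, 102, "r")],
)

# One A row is related to many B rows; each B row names at most one A.
related_to_1 = conn.execute("SELECT COUNT(*) FROM BR WHERE a1 = 1").fetchone()[0]
unrelated = conn.execute("SELECT COUNT(*) FROM BR WHERE a1 IS NULL").fetchone()[0]
print(related_to_1, unrelated)  # 2 1
```

A single a1 column per B row is exactly what limits each B to at most one A, so no separate R table is needed.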
Case-03: For Binary Relationship With Cardinality Ratio m:1

Here, two tables will be required-
• AR ( a1 , a2 , b1 )
• B ( b1 , b2 )
NOTE- Here, combined table will be drawn for the entity set A
and relationship set R.
Case-04: For Binary Relationship With Cardinality Ratio 1:1

Here, two tables will be required.
Either combine ‘R’ with ‘A’ or ‘B’

Way-01:
AR ( a1 , a2 , b1 )
B ( b1 , b2 )

Way-02:
A ( a1 , a2 )
BR ( a1 , b1 , b2 )
Thumb Rules to Remember
• While determining the minimum number of tables required
for binary relationships with given cardinality ratios, following
thumb rules must be kept in mind-
– For binary relationship with cardinality ratio m : n , separate and
individual tables will be drawn for each entity set and relationship.
– For binary relationship with cardinality ratio either m : 1 or 1 : n ,
always remember “many side will consume the relationship” i.e. a
combined table will be drawn for many side entity set and relationship
set.
– For binary relationship with cardinality ratio 1 : 1 , two tables will be
required. You can combine the relationship set with any one of the
entity sets.
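The thumb rules above can be encoded as a small lookup, using the generic schemas A(a1, a2), B(b1, b2) and R from Rule-05. This is a sketch only: it ignores participation constraints (Rule-06), and for 1 : 1 it arbitrarily merges R into A, although either side would do.

```python
def binary_relationship_tables(ratio):
    """Return a minimal set of table schemas for entity sets A(a1, a2) and
    B(b1, b2) linked by a binary relationship R with the given ratio."""
    if ratio == "m:n":
        # Separate tables for A, B and R.
        return ["A(a1, a2)", "R(a1, b1)", "B(b1, b2)"]
    if ratio == "1:n":
        # B is the many side, so B absorbs R.
        return ["A(a1, a2)", "BR(a1, b1, b2)"]
    if ratio in ("m:1", "1:1"):
        # A is the many side (for 1:1 either side may absorb R; A is chosen here).
        return ["AR(a1, a2, b1)", "B(b1, b2)"]
    raise ValueError("unknown cardinality ratio: " + ratio)

for ratio in ("m:n", "1:n", "m:1", "1:1"):
    tables = binary_relationship_tables(ratio)
    print(ratio, len(tables), tables)
```

Only the m : n case needs three tables; every other ratio needs two, because the many side (or either side, for 1 : 1) consumes the relationship.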
Rule-06: For Binary Relationship With Both Cardinality
Constraints and Participation Constraints

• Cardinality constraints will be implemented as discussed in Rule-05.
• Because of the total participation constraint, the foreign key acquires a NOT NULL constraint, i.e. the foreign key cannot be null.
• Case-01: For Binary Relationship With Cardinality Constraint
and Total Participation Constraint From One Side
• Because cardinality ratio = 1 : n , so we will combine the entity set B
and relationship set R.
• Then, two tables will be required-
A ( a1 , a2 )
BR ( a1 , b1 , b2 )
• Because of total participation, foreign key a1 has acquired NOT
NULL constraint, so it can’t be null now.
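A sketch of this case with Python's sqlite3 module, reusing the generic names A, BR, a1, b1, b2. Total participation of B is modelled by declaring the foreign key a1 NOT NULL, so a B row that is not related to any A is rejected. The sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE A (a1 INTEGER PRIMARY KEY, a2 TEXT)")
# Total participation of B: every B must be related to some A,
# so the foreign key a1 is declared NOT NULL.
conn.execute("""
    CREATE TABLE BR (
        a1 INTEGER NOT NULL REFERENCES A(a1),
        b1 INTEGER PRIMARY KEY,
        b2 TEXT
    )
""")
conn.execute("INSERT INTO A VALUES (1, 'x')")
conn.execute("INSERT INTO BR VALUES (1, 100, 'p')")  # accepted: related to A 1

try:
    conn.execute("INSERT INTO BR VALUES (NULL, 101, 'q')")  # B with no A
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
print(null_rejected)  # True
```

Without NOT NULL the same insert would succeed, which would correspond to partial participation.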
• Case-02: For Binary Relationship With Cardinality Constraint
and Total Participation Constraint From Both Sides
• If there is a key constraint from both sides (cardinality ratio 1 : 1) and both entity sets have total participation, then that binary relationship is represented using only a single table.
• Here, Only one table is required.
ARB ( a1 , a2 , b1 , b2 )
Rule-07: For Binary Relationship With Weak Entity Set
• Weak entity set always appears in association with identifying
relationship with total participation constraint.
• Here, two tables will be required-
A ( a1 , a2 )
BR ( a1 , b1 , b2 )
Problem-01
• Find the minimum number of tables required for the following
ER diagram in relational model-
Solution
• Applying the rules, minimum 3 tables will be required-
MR1 (M1 , M2 , M3 , P1)
P (P1 , P2)
NR2 (P1 , N1 , N2)
Problem-02
• Find the minimum number of tables required to represent the
given ER diagram in relational model
Solution
• Applying the rules, minimum 4 tables will be required-
– AR1R2 (a1 , a2 , b1 , c1)
– B (b1 , b2)
– C (c1 , c2)
– R3 (b1 , c1)
Problem-03
Solution
• BR1R4R5 (b1 , b2 , a1 , c1 , d1)
• A (a1 , a2)
• R2 (a1 , c1)
• CR3 (c1 , c2 , d1)
• D (d1 , d2)
Problem-04
• Applying the rules, minimum 3 tables will be required
• E1 (a1 , a2)
• E2R1R2 (b1 , b2 , a1 , c1 , b3)
• E3 (c1 , c2)
Problem-05
Solution
• Applying the rules that we have learnt, minimum 6 tables will
be required-
Account (Ac_no , Balance , b_name)
Branch (b_name , b_city , Assets)
Loan (L_no , Amt , b_name)
Borrower (C_name , L_no)
Customer (C_name , C_street , C_city)
Depositor (C_name , Ac_no)
ER model
• An Entity-relationship model (ER model) describes the structure of a database with the help of a diagram.
• This diagram is known as Entity Relationship Diagram (ER
Diagram).
• An ER model is a design or blueprint of a database that can
later be implemented as a database.
• The main components of E-R model are: entity set and
relationship set.
Entity Relationship Diagram (ER Diagram)

• An ER diagram shows the relationship among entity sets.
• An entity set is a group of similar entities and these entities can have attributes.
• In terms of DBMS, an entity set corresponds to a table in the database, and each of its attributes corresponds to a column of that table.
• By showing the relationships among tables and their attributes, an ER diagram shows the complete logical structure of a database.
A simple ER Diagram

•In the following diagram we have two entities Student and College
and their relationship.
•The relationship between Student and College is many to one as a
college can have many students however a student cannot study in
multiple colleges at the same time.
•Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr
and College entity has attributes such as Col_ID & Col_Name.
Component of ER Diagram
1. Strong Entity:
• An entity may be any object, class, person or place.
• In the ER diagram, an entity can be represented as rectangles.
• Consider an organization as an example- manager, product,
employee, department etc. can be taken as an entity.
a. Weak Entity
• An entity that depends on another entity is called a weak entity.
• The weak entity doesn't have a key attribute of its own; it has only a partial key (discriminator).
•The weak entity is represented by a double rectangle.
Ex. Installment, Dependent of employee, etc.
2. Attribute
• The attribute is used to describe the property of an entity.
• Ellipse is used to represent an attribute.
Example, id, age, contact number, name, etc. can be attributes
of a student.
a. Key Attribute
• The key attribute is used to represent the main characteristics
of an entity.
• It represents a primary key.
• The key attribute is represented by an ellipse with the text
underlined.
b. Composite Attribute
• An attribute that is composed of many other attributes is known as a composite attribute.
• The composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to it.
c. Multi-valued Attribute
• An attribute that can have more than one value is known as a multi-valued attribute.
• The double oval is used to represent multi-valued attribute.
Example, a student can have more than one phone number.
d. Derived Attribute
• An attribute that can be derived from other attribute is known
as a derived attribute.
• It can be represented by a dashed ellipse.
Example, A person's age changes over time and can be
derived from another attribute like Date of birth.
3. Relationship
• A relationship is used to describe the relation between
entities. Diamond or rhombus is used to represent the
relationship.
Mapping Constraints(Cardinality)
• A mapping constraint is a data constraint that expresses the
number of entities to which another entity can be related via
a relationship set.
• It is most useful in describing binary relationship sets, although it can also help describe relationship sets that involve more than two entity sets.
• For binary relationship set R on an entity set A and B, there
are four possible mapping cardinalities.
• These are as follows:
– One to one (1:1)
– One to many (1:M)
– Many to one (M:1)
– Many to many (M:M)
a. One-to-One Relationship
• When each instance of an entity is associated through the relationship with at most one instance of the other entity, and vice versa, it is known as a one-to-one relationship.
• For example, a person has only one passport and a passport is
given to one person.
b. One-to-many relationship
• When only one instance of the entity on the left, and more
than one instance of an entity on the right associates with the
relationship then this is known as a one-to-many relationship.
• For example, a scientist can invent many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship
• When more than one instance of the entity on the left, and
only one instance of an entity on the right associates with the
relationship then it is known as a many-to-one relationship.
• For example, Student enrolls for only one course, but a course
can have many students.
d. Many-to-many relationship
• When more than one instance of the entity on the left, and
more than one instance of an entity on the right associates
with the relationship then it is known as a many-to-many
relationship.
• For example, an employee can be assigned to many projects, and a project can have many employees.
Notation of ER diagram
• In ER diagram, many notations are used to express the cardinality.
• Cardinality specifies how many instances of an entity relate to one instance of another entity.
Purchase order
Keys
• Keys play an important role in the relational database.
• It is used to uniquely identify any record or row of data from
the table. It is also used to establish and identify relationships
between tables.
• For example: In Student table, ID is used as a key because it is
unique for each student. In PERSON table, passport_number,
license_number, SSN are keys since they are unique for each
person.
Types of key
1. Primary key
• It is the key chosen to identify one and only one instance of an entity uniquely.
• An entity can contain multiple keys as we saw in PERSON
table.
• The most suitable key from that list becomes the primary key.
• In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. License_Number or Passport_Number could also be selected as the primary key, since they are also unique.
• For each entity, the primary key is selected based on the requirements and the designer's choice.
2. Candidate key
• A candidate key is a minimal set of attributes (one or more) that can uniquely identify a tuple.
• The primary key is chosen from among the candidate keys; the remaining candidate keys are called alternate keys.
• The candidate keys are as strong as the primary key.
• For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique attributes, such as SSN, Passport_Number and License_Number, are also candidate keys.
3. Super Key
• A super key is a set of attributes that can uniquely identify a tuple. Every candidate key is a super key, and any superset of a candidate key is also a super key.
• For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two employees can be the same, but their EMPLOYEE_ID values cannot. Hence, this combination can also act as a key.
• Super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
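The difference between super keys and candidate keys can be sketched in a few lines of Python. The sample EMPLOYEE rows are hypothetical, and the functions only test uniqueness on the data given; a real key is a property of the schema, so sample data can rule a key out but never prove one.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on all of `attrs` (checked on this data only)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal super keys: super keys that contain no smaller super key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Ravi", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Meena", "SSN": "333"},
]
print(is_super_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))  # True
print(is_super_key(employees, ("EMPLOYEE_NAME",)))                # False
print(candidate_keys(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME", "SSN"]))
# [('EMPLOYEE_ID',), ('SSN',)]
```

(EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because it contains the smaller key EMPLOYEE_ID.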
4. Foreign key
• A foreign key is a column (or set of columns) of a table that refers to the primary key of another table.
• In a company, every employee works in a specific department, and employee and department are two different entities.
• So the department's information should not be stored in the employee table itself.
• Instead, we link these two tables through the primary key of one table.
• We add the primary key of the DEPARTMENT table,
Department_Id as a new attribute in the EMPLOYEE table.
• Now in the EMPLOYEE table, Department_Id is the foreign
key, and both the tables are related.
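A minimal sketch of the EMPLOYEE/DEPARTMENT link with Python's sqlite3 module. The Name columns and sample rows are hypothetical. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON; other RDBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        ID            INTEGER PRIMARY KEY,
        Name          TEXT,
        Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id)
    )
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Accounts')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ravi', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Meena', 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us reach the department's details through a join.
dept = conn.execute("""
    SELECT d.Name
    FROM EMPLOYEE AS e JOIN DEPARTMENT AS d ON e.Department_Id = d.Department_Id
    WHERE e.ID = 1
""").fetchone()[0]
print(orphan_rejected, dept)  # True Accounts
```

The department's name is stored once in DEPARTMENT; EMPLOYEE holds only Department_Id, and a reference to a non-existent department is refused.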
Steps to Create an ERD
• In a university, a Student enrolls in Courses.
• A student must be assigned to at least one or
more Courses.
• Each course is taught by a single Professor.
• To maintain instruction quality, a Professor can
deliver only one course
Step 1) Entity Identification
• We have three entities
– Student
– Course
– Professor
Step 2) Relationship Identification
• We have the following two relationships
– The student is assigned a course
– Professor delivers a course
Step 3) Cardinality Identification
• For them problem statement we know that,
– A student can be assigned multiple courses
– A Professor can deliver only one course
Step 4) Identify Attributes
• You need to study the files, forms, reports, data currently
maintained by the organization to identify attributes.
• You can also conduct interviews with various stakeholders to
identify entities.
• Initially, it's important to identify the attributes without
mapping them to a particular entity.
• Once, you have a list of Attributes, you need to map them to
the identified entities.
• Once the mapping is done, identify the primary Keys.
• If a unique key is not readily available, create one.
Entity Primary Key Attribute
Student Student_ID StudentName
Professor Employee_ID ProfessorName
Course Course_ID CourseName
Step 5) Create the ERD
• A more modern representation of ERD
Diagram
ER diagram example
• Suppose you are given the following requirements for a
simple database for the National
• Hockey League (NHL):
– the NHL has many teams,
– each team has a name, a city, a coach, a captain, and a set of
players,
– each player belongs to only one team,
– each player has a name, a position (such as left wing or goalie),
a skill level, and a set of injury records,
– a team captain is also a player,
– a game is played between two teams (referred to as host_team
and guest_team) and has a date (such as May 11th, 1999) and a
score (such as 4 to 2).
• Entities:
– Team(t_name, city, coach )
– Player(p_name, position, skill_level)
– Injury record (Weak entity, depend on player)
• Relationships:
– Each team has a captain which is also a player
– Each team has many player
– A game is played between two teams(host and guest),
and has date and score(attributes)
– A player has injury record
Date
Score

Game Captain Has


1 1 m
1 Host Guest 1 Injury Record
1
n Player
Team 1 Belongs
_To
P_no

Coach Description
T_name

City
Position
P_name
P_no Skill_level
Example 2
• A university registrar’s office maintains data about the following
entities:
– courses, including number, title, credits, syllabus, and prerequisites;
– course offerings, including course number, year, semester, section
number, instructor(s), timings, and classroom;
– students, including student-id, name, and program;
– instructors, including identification number, name, department, and
title.
• Further, the enrollment of students in courses and grades awarded
to students in each course they are enrolled for must be
appropriately modeled.
• Construct an E-R diagram for the registrar’s office.
• Document all assumptions that you make about the mapping
constraints.
Entities
• Student(sid, name, program)
• Course(c_number, title, credits, syllabus)
• Course_Offering(c_number, year, semester, section_number, timings, classroom)
• Instructor(iid, name, department, title)

Relationships
• A student enrolls in course offerings; a grade is awarded for each enrollment.
• An instructor teaches course offerings.
• A course is offered as one or more course offerings.
• A course may require other courses as prerequisites.
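Since the exercise asks for documented assumptions, here is one possible table design, sketched with sqlite3. The assumptions are: a course offering is identified by (c_number, year, semester, section_number), Teaches is many-to-many because an offering may have several instructors, and the grade is stored on the Enrolls relationship. The sample student and course code in the usage lines are hypothetical.

```python
import sqlite3

# One possible relational mapping of the registrar ER model (a sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student    (sid TEXT PRIMARY KEY, name TEXT, program TEXT);
CREATE TABLE Instructor (iid TEXT PRIMARY KEY, name TEXT, department TEXT, title TEXT);
CREATE TABLE Course     (c_number TEXT PRIMARY KEY, title TEXT, credits INTEGER, syllabus TEXT);

-- Prerequisite: a recursive many-to-many relationship on Course.
CREATE TABLE Prerequisite (
    main_course TEXT REFERENCES Course(c_number),
    prereq      TEXT REFERENCES Course(c_number),
    PRIMARY KEY (main_course, prereq)
);

-- Course_Offering: identified by its course plus year, semester and section.
CREATE TABLE Course_Offering (
    c_number TEXT REFERENCES Course(c_number),
    year INTEGER, semester TEXT, section_number INTEGER,
    timings TEXT, classroom TEXT,
    PRIMARY KEY (c_number, year, semester, section_number)
);

-- Teaches: an offering may have several instructors.
CREATE TABLE Teaches (
    iid TEXT REFERENCES Instructor(iid),
    c_number TEXT, year INTEGER, semester TEXT, section_number INTEGER,
    PRIMARY KEY (iid, c_number, year, semester, section_number),
    FOREIGN KEY (c_number, year, semester, section_number)
        REFERENCES Course_Offering (c_number, year, semester, section_number)
);

-- Enrolls: the grade is an attribute of the enrollment relationship.
CREATE TABLE Enrolls (
    sid TEXT REFERENCES Student(sid),
    c_number TEXT, year INTEGER, semester TEXT, section_number INTEGER,
    grade TEXT,
    PRIMARY KEY (sid, c_number, year, semester, section_number),
    FOREIGN KEY (c_number, year, semester, section_number)
        REFERENCES Course_Offering (c_number, year, semester, section_number)
);
""")

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}

# Hypothetical rows: enrolling in an offering that does not exist is rejected.
conn.execute("INSERT INTO Student VALUES ('s1', 'Asha', 'BCA')")
try:
    conn.execute("INSERT INTO Enrolls VALUES ('s1', 'BCAC0020', 2024, 'odd', 1, 'A')")
    enrolled = True
except sqlite3.IntegrityError:
    enrolled = False
print(sorted(tables), enrolled)
```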
Example 3
• Construct an E-R diagram for a car-insurance
company whose customers own one or more
cars each.
• Each car has associated with it zero to any
number of recorded accidents.
• Construct appropriate tables for the above ER diagram.
• Car insurance tables:
– person (driver-id, name, address)
– car (license, year, model)
– accident (report-number, date, location)
– participated (driver-id, license, report-number, damage-amount)
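The four tables above can be sketched in SQL as follows (hyphens in the names become underscores, since SQL identifiers cannot contain them). The sample rows are hypothetical; the query shows how "zero to any number of accidents per car" looks in practice: a LEFT JOIN keeps cars that have no accident at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person   (driver_id TEXT PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE car      (license TEXT PRIMARY KEY, year INTEGER, model TEXT);
CREATE TABLE accident (report_number INTEGER PRIMARY KEY, date TEXT, location TEXT);
-- participated links a driver, a car and an accident, and carries the
-- relationship attribute damage_amount.
CREATE TABLE participated (
    driver_id     TEXT    REFERENCES person(driver_id),
    license       TEXT    REFERENCES car(license),
    report_number INTEGER REFERENCES accident(report_number),
    damage_amount REAL,
    PRIMARY KEY (driver_id, license, report_number)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO car VALUES (?, ?, ?)",
                 [("CAR-1", 2018, "Sedan"), ("CAR-2", 2020, "Hatchback")])
conn.execute("INSERT INTO person VALUES ('D1', 'Asha', 'Mathura')")
conn.execute("INSERT INTO accident VALUES (1001, '2021-03-04', 'Agra')")
conn.execute("INSERT INTO participated VALUES ('D1', 'CAR-1', 1001, 2500.0)")

# Accidents per car: the LEFT JOIN keeps cars with zero recorded accidents.
rows = conn.execute("""
    SELECT c.license, COUNT(p.report_number)
    FROM car c LEFT JOIN participated p ON p.license = c.license
    GROUP BY c.license ORDER BY c.license
""").fetchall()
print(rows)  # [('CAR-1', 1), ('CAR-2', 0)]
```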
Example 4
• Construct an E-R diagram for a hospital with a set of patients
and a set of medical doctors.
• Associate with each patient a log of the various tests and
examinations conducted.
• Construct appropriate tables for the above ER diagram:
– Patient(SS#, name, insurance)
– Physician ( name, specialization)
– Test-log( SS#, test-name, date, time)
– Doctor-patient (physician-name, SS#)
– Patient-history(SS#, test-name, date)
Example 5
• Draw the E-R diagram which models an online
bookstore.
Fundamentals of database
management system

BCAC0020
Module-2
File Organization Techniques
– Sequential file organization
– Index file organization, Random file organization
Relational model concepts
– Relational data model concept
– Relational Algebra (Select operation, Project operation)
– Union operation, Set difference, Cartesian product
– Joins (Natural Join, Outer Join (Left, Right, Full))
SQL
– Introduction of RDBMS & Codd rules
– Data definition in SQL (CREATE, ALTER, DROP, TRUNCATE, RENAME)
– DML Queries (SELECT, INSERT, UPDATE, DELETE)
– Views in SQL
– Specifying Constraints (Primary key, Unique, Foreign key, Null)
– Group By and Having clause
– Index in SQL
File Organization Techniques

• Storing the records of a file in a certain order is called file organization.


• The main objectives of file organization are:
– Optimal selection of records, i.e., records should be accessed as fast as possible.
– Any insert, update or delete operation on records should be easy and quick, and should not harm other records.
– No duplicate records should be introduced as a result of an insert, update or delete.
– Records should be stored efficiently so that the cost of storage is minimal.
Types of file organization
– Sequential File Organization
– Indexed Sequential Access Method
– Heap(random) File Organization
– Hash/Direct File Organization
– B+ Tree File Organization
– Cluster File Organization
1. Sequential File Organization
• Here records are stored one after the other in a sequential manner.
• This can be achieved in two ways.
• In the first method (pile file):
– Records are stored one after the other as they are inserted into the table.
– When a new record is inserted, it is placed at the end of the file.
– To modify or delete a record, the record is first searched for in the memory blocks.
– Once it is found, it is marked as deleted; for an update, the new version of the record is entered as a new record.
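The first (pile) method can be sketched in a few lines. This is a minimal in-memory model, assuming each record is a dict with a hypothetical "id" key; a real DBMS works with disk blocks rather than a Python list.

```python
class PileFile:
    """Sketch of the first sequential method: records kept in insertion order."""

    def __init__(self):
        self.records = []  # each entry: [record, deleted_flag]

    def insert(self, record):
        self.records.append([record, False])  # new records always go at the end

    def _find(self, key):
        for entry in self.records:            # linear search from the start
            if not entry[1] and entry[0]["id"] == key:
                return entry
        return None

    def delete(self, key):
        entry = self._find(key)
        if entry is not None:
            entry[1] = True                    # only marked; space is not reclaimed

    def update(self, key, **changes):
        entry = self._find(key)
        if entry is not None:
            entry[1] = True                    # old version marked as deleted
            self.insert({**entry[0], **changes})  # new version appended at the end

    def scan(self):
        return [rec for rec, deleted in self.records if not deleted]


f = PileFile()
for rid, name in [(3, "R3"), (1, "R1"), (2, "R2")]:
    f.insert({"id": rid, "name": name})
f.update(1, name="R1-new")
ids = [r["id"] for r in f.scan()]
print(ids)  # [3, 2, 1]: the updated record R1 now sits at the end of the file
```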
Sequential File Organization

• In the diagram above, R1, R2, R3, etc. are the records.
• Each record contains all the attributes of a row; e.g., a student record holds the student's id, name, address, course, DOB, etc.
[Figure: Sequential File Organization, inserting a new record]
Sequential File Organization

• In the second method (sorted file method):
– Records are kept sorted (in ascending or descending order) each time a record is inserted.
– Sorting may be based on the primary key or on any other column.
– When a new record is inserted, it is placed at the end of the file, and the file is then sorted on the key value so that the record moves to its correct position.
– On an update, the record is modified and the file is re-sorted to place the updated record in the right position.
– The same applies to deletion.
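A minimal sketch of the sorted file method, assuming integer keys held in memory. The text describes "append, then sort"; the sketch instead uses the standard bisect module to insert directly at the correct position, which produces the same ordering without re-sorting the whole file. It also shows the payoff of keeping the file ordered: binary search instead of a full scan.

```python
import bisect


class SortedFile:
    """Sketch of the sorted-file method: records are kept ordered by key."""

    def __init__(self):
        self.keys = []     # sorted list of keys
        self.records = []  # records, in the same order as self.keys

    def insert(self, key, record):
        # Find the correct position directly instead of appending and re-sorting.
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.records.insert(pos, record)

    def delete(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]
            del self.records[pos]

    def update(self, key, new_key, record):
        # An update that changes the key moves the record to its new position.
        self.delete(key)
        self.insert(new_key, record)

    def search(self, key):
        # Ordering allows binary search instead of a full scan.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.records[pos]
        return None


f = SortedFile()
for k in [30, 10, 20]:
    f.insert(k, f"R{k}")
f.update(10, 40, "R40")
print(f.keys)  # [20, 30, 40]
```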
2. Indexed Sequential Access Method (ISAM)
• This is an advanced sequential file organization method.
• Here records are stored in the file in order of their primary key.
• For each primary key value, an index value is generated and mapped to the record.
• This index value is simply the address of the record in the file.
• To retrieve a record, its index value is used to fetch the data block address, and the record is then read from that block.
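The lookup path described above (key, then index entry, then data block, then record) can be sketched as follows. This is a simplified in-memory model under stated assumptions: a hypothetical block size of two records, and one index entry (key, block address) per record, kept sorted so it can be binary-searched. The range query illustrates the "range retrieval" advantage listed below.

```python
import bisect

BLOCK_SIZE = 2  # records per data block: an illustrative assumption


def build_isam(records):
    """Store records in primary-key order, in fixed-size data blocks, and build
    an index entry (key, block address) for every record."""
    records = sorted(records, key=lambda r: r[0])
    blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
    index = [(rec[0], addr) for addr, block in enumerate(blocks) for rec in block]
    return blocks, index


def fetch(blocks, addr, key):
    # Read one data block and pick the record with the wanted key.
    return next((rec for rec in blocks[addr] if rec[0] == key), None)


def lookup(blocks, index, key):
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, key)       # binary search on the index
    if pos < len(keys) and keys[pos] == key:
        return fetch(blocks, index[pos][1], key)
    return None


def range_query(blocks, index, lo, hi):
    # Range retrieval: locate lo in the index, then walk forward while key <= hi.
    keys = [k for k, _ in index]
    pos = bisect.bisect_left(keys, lo)
    out = []
    while pos < len(index) and index[pos][0] <= hi:
        key, addr = index[pos]
        out.append(fetch(blocks, addr, key))
        pos += 1
    return out


blocks, index = build_isam([(104, "D"), (101, "A"), (103, "C"), (102, "B")])
print(lookup(blocks, index, 103))             # (103, 'C')
print(range_query(blocks, index, 102, 103))   # [(102, 'B'), (103, 'C')]
```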
Indexed Sequential Access Method (ISAM)

Advantages of ISAM
• Since each record has its data block address, searching for a record in a large database is easy and quick.
• Any column can be used as the key field, with the index generated on that column. Besides the primary key and its index, indexes can be generated on other fields too.
• It supports range retrieval and partial retrieval of records: since the index is based on the key value, the data for a given range of key values can be retrieved.
Indexed Sequential Access Method (ISAM)

Disadvantages of ISAM
• Maintaining the index has an extra cost: extra disk space is needed to store the index values, and this space grows further when there are multiple key-index combinations.
• As new records are inserted, the files have to be restructured to maintain the sequence.
• Similarly, when a record is deleted, the space it used needs to be released; otherwise, database performance degrades.
3. Random (Heap) file organization technique
• Here records are added at the end of the file as they are inserted.
• There is no sorting or ordering of the records.
• Once a data block is full, the next record is stored in a new block.
• This new block need not be the very next block: this method can select any block in memory to store new records.
Random(Heap) file organization technique

• It is similar to the pile file of the sequential method, but here data blocks are not selected sequentially; they can be any data blocks in memory.
• It is the responsibility of the DBMS to store and manage the records.
Random(Heap) file organization technique

• To retrieve a record in this method, the file has to be traversed from the beginning until the requested record is found.
• Hence fetching records from very large tables is time consuming.
• To delete or update a record, the record must first be searched for.
• Searching is the same as retrieving: start from the beginning of the file and continue until the record is found.
• A small file can be searched quickly, but the larger the file, the more time fetching takes.
Random(Heap) file organization technique

• In addition, when a record is deleted, it is removed from the data block, but its space is not freed and cannot be reused.
• Hence, as the number of records grows, the memory used also grows and efficiency decreases.
• For the database to perform well, the DBA has to free this unused memory periodically.
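The behaviour described above (any free block may be chosen, lookups are linear scans, deleted slots are not reused until a clean-up) can be sketched as a toy model. The block capacity, the number of blocks and the seeded random choice of the next free block are illustrative assumptions, not how a particular DBMS allocates blocks.

```python
import random

BLOCK_CAPACITY = 2  # records per block: an illustrative assumption


class HeapFile:
    """Toy model of heap (random) file organization."""

    def __init__(self, n_blocks=8, seed=0):
        self.blocks = {}                      # block address -> list of slots
        self.free_addrs = list(range(n_blocks))
        self.rng = random.Random(seed)
        self.current = None                   # block currently being filled

    def insert(self, record):
        if self.current is None or len(self.blocks[self.current]) == BLOCK_CAPACITY:
            # Block full: pick any free block, not necessarily the next one.
            addr = self.rng.choice(self.free_addrs)
            self.free_addrs.remove(addr)
            self.blocks[addr] = []
            self.current = addr
        self.blocks[self.current].append(record)

    def find(self, key):
        # No ordering, so every block may have to be scanned (linear search).
        for slots in self.blocks.values():
            for rec in slots:
                if rec is not None and rec[0] == key:
                    return rec
        return None

    def delete(self, key):
        for slots in self.blocks.values():
            for i, rec in enumerate(slots):
                if rec is not None and rec[0] == key:
                    slots[i] = None           # slot emptied but not reused
                    return True
        return False

    def reorganize(self):
        # Periodic clean-up (the DBA's job in the text): drop the empty slots.
        for addr in self.blocks:
            self.blocks[addr] = [r for r in self.blocks[addr] if r is not None]


h = HeapFile()
for k in range(1, 6):
    h.insert((k, f"R{k}"))
h.delete(3)
slots_before = sum(len(s) for s in h.blocks.values())
h.reorganize()
slots_after = sum(len(s) for s in h.blocks.values())
print(h.find(4), slots_before, slots_after)  # (4, 'R4') 5 4
```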
Random(Heap) file organization technique

Advantages of Heap File Organization
• It is suited for very small files, since fetching records from them is fast. As the file grows, the linear search for a record becomes time consuming.
Disadvantages of Heap File Organization
• This method is inefficient for large databases, since searching for or modifying a record takes time.
• Proper memory management is required to keep performance up; otherwise many unused memory blocks accumulate and the storage size simply keeps growing.
Lecture - 2
Relational model concepts

• The relational data model is the primary data model, used widely around the world for data storage and processing.
• This model is simple, and it has all the properties and capabilities required to process data with storage efficiency.
Basic concepts of relational data model

• Tables − In the relational data model, relations are saved in the form of tables. A table stores the relation among entities; its rows represent records and its columns represent attributes.
• Tuple − A single row of a table, which contains a single record of that relation, is called a tuple.
• Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
• Relation schema − A relation schema describes the relation name (table name), its attributes, and their names.
• Relation key − Each row has one or more attributes, known as the relation key, which identify the row in the relation (table) uniquely.
• Attribute domain − Every attribute has a pre-defined set of permitted values, known as its attribute domain.
Constraints in DBMS
• Relational constraints are the restrictions imposed on the
database contents and operations.
• They ensure the correctness of data in the database.
Types of Constraints in DBMS
1. Domain Constraint
• Domain constraint defines the domain or set of values for an
attribute.
• It specifies that the value taken by the attribute must be an atomic value from its domain.
Example:
STU_ID  Name      Age
S001    Akshay    20
S002    Abhishek  21
S003    Shashank  20
S004    Rahul     A
Here, the value 'A' is not allowed, since the Age attribute can take only integer values.
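A domain constraint can be declared in SQL. The sketch below uses sqlite3; because SQLite uses flexible typing (a text value such as 'A' can be stored in an INTEGER column), the integer domain of Age is enforced explicitly with a CHECK constraint. Other DBMSs reject 'A' through the column type itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK enforces the integer domain of Age, which SQLite's flexible typing
# would not enforce on its own.
conn.execute("""
    CREATE TABLE Student (
        STU_ID TEXT NOT NULL PRIMARY KEY,
        Name   TEXT,
        Age    INTEGER CHECK (typeof(Age) = 'integer')
    )
""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [("S001", "Akshay", 20), ("S002", "Abhishek", 21),
                  ("S003", "Shashank", 20)])
try:
    conn.execute("INSERT INTO Student VALUES ('S004', 'Rahul', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # 'A' is outside the domain of Age
print(rejected)  # True
```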
2. Tuple Uniqueness Constraint

• The tuple uniqueness constraint specifies that all the tuples in any relation must be unique.
Valid relation (all tuples are distinct):
STU_ID  Name      Age
S001    Akshay    20
S002    Abhishek  21
S003    Shashank  20
S004    Rahul     20
Invalid relation (the tuple (S001, Akshay, 20) appears twice):
STU_ID  Name      Age
S001    Akshay    20
S001    Akshay    20
S003    Shashank  20
S004    Rahul     20
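Since a relation instance is a set of tuples, a Python set is a natural way to model it: a set cannot hold the same tuple twice. A minimal sketch using the invalid relation above:

```python
rows = [
    ("S001", "Akshay", 20),
    ("S001", "Akshay", 20),   # exact duplicate of the first tuple
    ("S003", "Shashank", 20),
    ("S004", "Rahul", 20),
]
relation = set(rows)          # the duplicate collapses into a single tuple


def has_duplicate_tuples(rows):
    """True if the list of rows violates tuple uniqueness."""
    return len(set(rows)) != len(rows)


print(has_duplicate_tuples(rows), len(relation))  # True 3
```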
3. Key Constraint
• The key constraint specifies that in any relation:
– All the values of the primary key must be unique.
– The value of the primary key must not be null.
Example (violates the key constraint, because STU_ID S001 appears twice):
STU_ID  Name      Age
S001    Akshay    20
S001    Abhishek  21
S003    Shashank  20
S004    Rahul     20
4. Entity Integrity Constraint
• The entity integrity constraint specifies that no attribute of the primary key may contain a null value in any relation.
• This is because a null value in the primary key cannot identify a tuple, which violates the uniqueness property.
Example (violates entity integrity, because the last tuple has no STU_ID):
STU_ID  Name      Age
S001    Akshay    20
S002    Abhishek  21
S003    Shashank  20
NULL    Rahul     20
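Both the key constraint and entity integrity come from declaring STU_ID as the primary key. A sketch with sqlite3 follows; note that NOT NULL is written out explicitly because SQLite, for historical reasons, allows NULL in a non-INTEGER PRIMARY KEY column unless told otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        STU_ID TEXT NOT NULL PRIMARY KEY,   -- unique and never null
        Name   TEXT,
        Age    INTEGER
    )
""")
conn.execute("INSERT INTO Student VALUES ('S001', 'Akshay', 20)")


def try_insert(row):
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


r_dup = try_insert(("S001", "Abhishek", 21))   # key constraint: duplicate STU_ID
r_null = try_insert((None, "Rahul", 20))       # entity integrity: NULL STU_ID
r_ok = try_insert(("S003", "Shashank", 20))
print(r_dup, r_null, r_ok, sep="\n")
```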
5. Referential Integrity Constraint
• This constraint is enforced when a foreign key references the primary key of a relation.
• It specifies that every value taken by the foreign key must either be present in the primary key of the referenced relation or be null.

Important Results
– We cannot insert a record into a referencing relation if the corresponding record does not exist in the referenced relation.
– We cannot delete or update a record of the referenced relation if a corresponding record exists in the referencing relation.
• Here, the relation 'Student' references the relation 'Department'.
Student
STU_ID  Name      Dept_no
S001    Akshay    D10
S002    Abhishek  D10
S003    Shashank  D11
S004    Rahul     D14
Department
Dept_no  Dept_name
D10      ASET
D11      ALS
D12      ASFL
D13      ASHS
• The tuple (S004, Rahul, D14) violates referential integrity, because D14 does not appear in Department.
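The Student/Department example can be checked with a foreign key in sqlite3. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The extra student S005 with a NULL Dept_no is hypothetical and shows the "or be null" part of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (Dept_no TEXT PRIMARY KEY, Dept_name TEXT);
CREATE TABLE Student (
    STU_ID  TEXT PRIMARY KEY,
    Name    TEXT,
    Dept_no TEXT REFERENCES Department(Dept_no)   -- foreign key
);
INSERT INTO Department VALUES ('D10','ASET'), ('D11','ALS'), ('D12','ASFL'), ('D13','ASHS');
INSERT INTO Student VALUES ('S001','Akshay','D10'), ('S002','Abhishek','D10'),
                           ('S003','Shashank','D11');
""")


def insert_student(row):
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok_d14 = insert_student(("S004", "Rahul", "D14"))   # D14 is not in Department
ok_null = insert_student(("S005", "Neha", None))    # a NULL foreign key is allowed
print(ok_d14, ok_null)  # False True
```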
Referential Integrity Constraint Violation

• There are three possible causes of violation of the referential integrity constraint:
• Cause-01: Insertion in a referencing relation
• Cause-02: Deletion from a referenced relation
• Cause-03: Update in a referenced relation
Cause-01: Insertion in a Referencing Relation

• It is allowed to insert only those values in the referencing attribute which are already present in the referenced attribute.
Referencing relation
Roll_no  Name    Age  Branch_Code
1        Rahul   22   CS
2        Anjali  21   CS
3        Teena   20   IT
Referenced relation
Branch_Code  Branch_Name
CS           Computer Science
EE           Electronics Engineering
IT           Information Technology
CE           Civil Engineering
• Inserting the tuple (4, James, 23, ME) into the referencing relation is not allowed, because ME is not a Branch_Code in the referenced relation.
Cause-02: Deletion from a Referenced Relation

• It is not allowed to delete a row from the referenced relation if the referencing attribute uses the value of the referenced attribute of that row.
Referencing relation
Roll_no  Name    Age  Branch_Code
1        Rahul   22   CS
2        Anjali  21   CS
3        Teena   20   IT
Referenced relation
Branch_Code  Branch_Name
CS           Computer Science
EE           Electronics Engineering
IT           Information Technology
CE           Civil Engineering
• For example, deleting the CS row from the referenced relation is not allowed, because Rahul and Anjali use the value CS.
• To handle this, we can simultaneously delete those tuples from the referencing relation whose referencing attribute uses the value of the referenced attribute being deleted.
• This method of handling the violation is called On Delete Cascade.
OR
• The deletion request on the referenced relation is aborted (rejected) if the value is used by the referencing relation.
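Both ways of handling a delete on the referenced relation can be tried out with sqlite3, using the Branch data above. ON DELETE RESTRICT rejects the deletion; ON DELETE CASCADE removes the referencing tuples along with it. The table name Student for the referencing relation is an assumption.

```python
import sqlite3


def make_db(on_delete):
    # on_delete is a fixed SQL keyword chosen below, never user input.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE Branch (Branch_Code TEXT PRIMARY KEY, Branch_Name TEXT);
    CREATE TABLE Student (
        Roll_no INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
        Branch_Code TEXT REFERENCES Branch(Branch_Code) ON DELETE {on_delete}
    );
    INSERT INTO Branch VALUES ('CS','Computer Science'), ('EE','Electronics Engineering'),
                              ('IT','Information Technology'), ('CE','Civil Engineering');
    INSERT INTO Student VALUES (1,'Rahul',22,'CS'), (2,'Anjali',21,'CS'), (3,'Teena',20,'IT');
    """)
    return conn


# Option 1: reject the deletion while the value is still referenced.
restrict = make_db("RESTRICT")
try:
    restrict.execute("DELETE FROM Branch WHERE Branch_Code = 'CS'")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True

# Option 2: On Delete Cascade also deletes the referencing tuples.
cascade = make_db("CASCADE")
cascade.execute("DELETE FROM Branch WHERE Branch_Code = 'CS'")
left = [r[0] for r in cascade.execute("SELECT Name FROM Student")]
print(restrict_blocked, left)  # True ['Teena']
```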
Cause-03: Update in a Referenced Relation
• It is not allowed to update a row of the referenced relation if the referencing attribute uses the value of the referenced attribute of that row.
Referencing relation
Roll_no  Name    Age  Branch_Code
1        Rahul   22   CS
2        Anjali  21   CS
3        Teena   20   IT
Referenced relation, after changing CS to CSE
Branch_Code  Branch_Name
CSE          Computer Science
EE           Electronics Engineering
IT           Information Technology
CE           Civil Engineering
• Changing CS to CSE is not allowed on its own, because Rahul and Anjali still refer to CS.
• To handle this, we can simultaneously update those tuples of the referencing relation whose referencing attribute uses the referenced attribute value being updated (On Update Cascade).
OR
• The update request on the referenced relation is aborted (rejected) if the value is used by the referencing relation.
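The cascading update can be sketched the same way. Only two of the four branches are loaded, and the table name Student is an assumption; ON UPDATE CASCADE propagates the change CS to CSE into every referencing tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (Branch_Code TEXT PRIMARY KEY, Branch_Name TEXT);
CREATE TABLE Student (
    Roll_no INTEGER PRIMARY KEY, Name TEXT, Age INTEGER,
    Branch_Code TEXT REFERENCES Branch(Branch_Code) ON UPDATE CASCADE
);
INSERT INTO Branch VALUES ('CS','Computer Science'), ('IT','Information Technology');
INSERT INTO Student VALUES (1,'Rahul',22,'CS'), (2,'Anjali',21,'CS'), (3,'Teena',20,'IT');
""")
# Changing the key CS -> CSE in the referenced relation...
conn.execute("UPDATE Branch SET Branch_Code = 'CSE' WHERE Branch_Code = 'CS'")
# ...is propagated to every referencing tuple.
codes = [r[0] for r in conn.execute("SELECT Branch_Code FROM Student ORDER BY Roll_no")]
print(codes)  # ['CSE', 'CSE', 'IT']
```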
Relational Algebra
• Relational algebra
– A basic set of operations for the relational model
• Relational algebra expression
– A sequence of relational algebra operations
• Each relation is defined to be a set of tuples in the formal relational model
The SELECT Operation
• Selects the subset of the tuples from a relation R that satisfy a selection condition:
\( \sigma_{\langle selection\ condition \rangle}(R) \)
• The Boolean expression <selection condition> contains clauses of the form:
<attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
The SELECT Operation
• <attribute name> is the name of an attribute of R,
• <comparison op> is normally one of the operators =, <, ≤, >, ≥, ≠, and
• <constant value> is a constant value from the attribute domain.
The SELECT Operation
• The <selection condition> is applied independently to each individual tuple t in R
– If the condition evaluates to TRUE, tuple t is selected
• Clauses can be combined with the Boolean operators AND, OR, and NOT
• SELECT is unary
– It is applied to a single relation
The SELECT Operation
Example
• To select the EMPLOYEE tuples whose department is 4:
\( \sigma_{Dno=4}(EMPLOYEE) \)
• To select the EMPLOYEE tuples whose salary is greater than $30,000:
\( \sigma_{Salary>30000}(EMPLOYEE) \)
The SELECT Operation
Example
• To select all employees who either work in department 4 and make over $25,000 per year, or work in department 5 and make over $30,000:
\( \sigma_{(Dno=4 \text{ AND } Salary>25000) \text{ OR } (Dno=5 \text{ AND } Salary>30000)}(EMPLOYEE) \)
The SELECT Operation
• The degree of the relation resulting from a SELECT operation (its number of attributes) is the same as the degree of R.
• The number of tuples in the resulting relation is always less than or equal to the number of tuples in R.
The SELECT Operation
• Notice that the SELECT operation is commutative; that is,
\( \sigma_{\langle cond1 \rangle}(\sigma_{\langle cond2 \rangle}(R)) = \sigma_{\langle cond2 \rangle}(\sigma_{\langle cond1 \rangle}(R)) \)
• Hence, a sequence of SELECTs can be applied in any order.

The SELECT Operation
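• In addition, we can always combine a cascade (or sequence) of SELECT operations into a single SELECT operation with a conjunctive (AND) condition; that is,
\( \sigma_{\langle cond1 \rangle}(\sigma_{\langle cond2 \rangle}(\ldots(\sigma_{\langle condn \rangle}(R))\ldots)) = \sigma_{\langle cond1 \rangle \text{ AND } \langle cond2 \rangle \text{ AND } \ldots \text{ AND } \langle condn \rangle}(R) \)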
The SELECT Operation
• In SQL, the SELECT condition is typically specified in the
WHERE clause of a query.

σ Dno=4 AND Salary>25000 (EMPLOYEE)


• would correspond to the following SQL query:
SELECT *
FROM EMPLOYEE
WHERE Dno=4 AND Salary>25000;
Selection
\( \sigma_{rating>8}(S2) \)
S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Result
sid  sname   rating  age
28   yuppy   9       35.0
58   rusty   10      35.0
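Selection is simply a filter over tuples. A minimal sketch, modelling S2 as a list of dicts (the function name select is illustrative); the last lines check, on this instance, the commutativity and cascade rules for SELECT stated earlier, using two example conditions.

```python
S2 = [
    {"sid": 28, "sname": "yuppy",  "rating": 9,  "age": 35.0},
    {"sid": 31, "sname": "lubber", "rating": 8,  "age": 55.5},
    {"sid": 44, "sname": "guppy",  "rating": 5,  "age": 35.0},
    {"sid": 58, "sname": "rusty",  "rating": 10, "age": 35.0},
]


def select(condition, relation):
    """σ_condition(relation): keep the tuples for which condition is true.
    The attributes are unchanged, so the degree stays the same."""
    return [t for t in relation if condition(t)]


result = select(lambda t: t["rating"] > 8, S2)
print([t["sname"] for t in result])  # ['yuppy', 'rusty']

# Commutativity and cascading of SELECT on this instance.
def c1(t):
    return t["rating"] > 5

def c2(t):
    return t["age"] < 50

assert (select(c1, select(c2, S2)) == select(c2, select(c1, S2))
        == select(lambda t: c1(t) and c2(t), S2))
```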
QUIZ: Guess the output.
The PROJECT Operation
• Selects columns from a table and discards the other columns:
\( \pi_{\langle attribute\ list \rangle}(R) \)
• Duplicate elimination
– The result of the PROJECT operation is a set of distinct tuples
Projection
S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
\( \pi_{sname,rating}(S2) \)
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10
\( \pi_{age}(S2) \)
age
35.0
55.5
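Projection keeps some columns and removes duplicate tuples, which is why \( \pi_{age}(S2) \) has only two rows. A minimal sketch with tuples and an explicit schema (the names project and SCHEMA are illustrative); the last lines compose it with a selection, as in \( \pi_{sname,rating}(\sigma_{rating>8}(S2)) \).

```python
S2 = [
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
]
SCHEMA = ("sid", "sname", "rating", "age")


def project(attrs, relation, schema=SCHEMA):
    """π_attrs(relation): keep only the listed columns, removing duplicate tuples."""
    idx = [schema.index(a) for a in attrs]
    seen, out = set(), []
    for t in relation:
        row = tuple(t[i] for i in idx)
        if row not in seen:          # duplicate elimination
            seen.add(row)
            out.append(row)
    return out


print(project(["sname", "rating"], S2))  # four (sname, rating) pairs
print(project(["age"], S2))              # [(35.0,), (55.5,)]

high = [t for t in S2 if t[2] > 8]       # σ rating>8 (S2)
print(project(["sname", "rating"], high))  # [('yuppy', 9), ('rusty', 10)]
```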
Selection & Projection
S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
\( \pi_{sname,rating}(\sigma_{rating>8}(S2)) \)
sname  rating
yuppy  9
rusty  10
QUIZ
• Which of the following relational algebra expressions are syntactically correct? What do they mean?
1. STUDENTS
2. \( \sigma_{MAXPT=10}(EXERCISES) \)
3. \( \pi_{FIRST}(\pi_{LAST}(STUDENTS)) \)
4. \( \sigma_{POINTS \le 5}(\sigma_{POINTS \ge 1}(RESULTS)) \)
5. \( \sigma_{POINTS}(\pi_{POINTS=10}(RESULTS)) \)
Answer
1. Correct: the whole STUDENTS relation.
2. Correct: the EXERCISES tuples with MAXPT = 10.
3. Wrong: after projecting onto LAST, the attribute FIRST is no longer available.
4. Correct: the RESULTS tuples with 1 ≤ POINTS ≤ 5.
5. Wrong: the subscripts are swapped; σ needs a condition and π needs an attribute list.
Union
S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
\( S1 \cup S2 \)
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0
Intersection
(S1 and S2 are the relations shown under Union.)
\( S1 \cap S2 \)
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
Set-Difference
(S1 and S2 are the relations shown under Union.)
\( S1 - S2 \)
sid  sname   rating  age
22   dustin  7       45.0
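Because relations are sets of tuples and S1 and S2 share the schema (sid, sname, rating, age), union, intersection and set difference map directly onto Python's set operators. A minimal sketch reproducing the three results above:

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}

union = S1 | S2          # S1 ∪ S2: duplicates (lubber, rusty) appear once
intersection = S1 & S2   # S1 ∩ S2
difference = S1 - S2     # S1 − S2

print(sorted(t[0] for t in union))         # [22, 28, 31, 44, 58]
print(sorted(t[0] for t in intersection))  # [31, 58]
print(sorted(t[0] for t in difference))    # [22]
```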
Cross-Product (Cartesian Product) X
• It is also called “cross product”
• R × S concatenates each tuple from R with each tuple from S.
• If the relation R contains n tuples, and the relation S contains
m tuples, then R × S contains n ∗ m tuples.
• R × S is written in SQL as
SELECT *
FROM R, S
Cross-Product (Cartesian Product)
• Each row of S1 is paired with each row of R1.
• The result schema has one field per field of S1 and R1, with field names 'inherited' if possible.
– Conflict: both S1 and R1 have a field called sid.
– In practice, the Cartesian product is rarely used on its own.
Cross-Product (Cartesian Product)
S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96
\( S1 \times R1 \)
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
First relation
Pname     Price
Laptop    1500
Car       20000
Airplane  3000000
Second relation
Pname   Cname   Cost
Laptop  CPU     500
Laptop  HDD     300
Laptop  CASE    700
Car     Wheels  1000
Cartesian product (3 × 4 = 12 tuples)
Pname     Price    Pname   Cname   Cost
Laptop    1500     Laptop  CPU     500
Laptop    1500     Laptop  HDD     300
Laptop    1500     Laptop  CASE    700
Laptop    1500     Car     Wheels  1000
Car       20000    Laptop  CPU     500
Car       20000    Laptop  HDD     300
Car       20000    Laptop  CASE    700
Car       20000    Car     Wheels  1000
Airplane  3000000  Laptop  CPU     500
Airplane  3000000  Laptop  HDD     300
Airplane  3000000  Laptop  CASE    700
Airplane  3000000  Car     Wheels  1000
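The product table above can be generated with itertools.product, which pairs every tuple of the first relation with every tuple of the second. The relation names Product and Component are hypothetical labels for the two unnamed relations.

```python
from itertools import product

Product = [("Laptop", 1500), ("Car", 20000), ("Airplane", 3000000)]
Component = [("Laptop", "CPU", 500), ("Laptop", "HDD", 300),
             ("Laptop", "CASE", 700), ("Car", "Wheels", 1000)]

# R × S: concatenate every tuple of R with every tuple of S.
cross = [r + s for r, s in product(Product, Component)]

print(len(cross))  # 12, i.e. 3 * 4
print(cross[0])    # ('Laptop', 1500, 'Laptop', 'CPU', 500)
```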
Renaming

• An operator \( \rho_R(S) \) that prepends "R." to all attribute names is sometimes useful.
• This is only an abbreviation for an application of the projection:
\( \pi_{R.A \leftarrow A,\ R.B \leftarrow B}(S) \)
• Otherwise, attribute names in relational algebra do not automatically contain the relation name.
Joins: used to combine relations
• Condition Join: \( R \bowtie_c S = \sigma_c(R \times S) \)
\( S1 \bowtie_{S1.sid < R1.sid} R1 \)
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
• The result schema is the same as that of the cross-product.
• There are fewer tuples than in the cross-product, so it might be possible to compute the join more efficiently.
• Sometimes called a theta-join.
Join
• Equi-Join: a special case of condition join where the condition c contains only equalities.
\( S1 \bowtie_{S1.sid = R1.sid} R1 \)
S1.sid  sname   rating  age   R1.sid  bid  day
22      dustin  7       45.0  22      101  10/10/96
58      rusty   10      35.0  58      103  11/12/96
• Natural Join: an equijoin on all common fields, keeping only one copy of each field for which equality is specified.
• Theta (θ) join, equijoin and natural join are called inner joins.
S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96
Note
• A theta join allows for arbitrary comparison relationships
(such as ≥).
• An equijoin is a theta join using the equality operator.
• A natural join is an equijoin on attributes that have the same name in both relations.
• Additionally, a natural join removes the duplicate columns involved in the equality comparison, so only one of each compared column remains.
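The three inner joins can be sketched over the S1 and R1 tuples shown above. The theta join follows the definition \( \sigma_c(R \times S) \) directly; the natural join finds the common attribute names (here only sid) and keeps a single copy of them. Function names are illustrative.

```python
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S1_SCHEMA = ("sid", "sname", "rating", "age")
R1_SCHEMA = ("sid", "bid", "day")


def theta_join(r, s, cond):
    """R ⋈_c S = σ_c(R × S): keep the concatenated pairs that satisfy cond."""
    return [x + y for x in r for y in s if cond(x, y)]


def natural_join(r, r_schema, s, s_schema):
    """Equijoin on all common attributes, keeping one copy of each."""
    common = [a for a in r_schema if a in s_schema]
    keep = [i for i, a in enumerate(s_schema) if a not in common]
    out = []
    for x in r:
        for y in s:
            if all(x[r_schema.index(a)] == y[s_schema.index(a)] for a in common):
                out.append(x + tuple(y[i] for i in keep))
    return r_schema + tuple(s_schema[i] for i in keep), out


cond_join = theta_join(S1, R1, lambda x, y: x[0] < y[0])   # S1.sid < R1.sid
equi_join = theta_join(S1, R1, lambda x, y: x[0] == y[0])  # S1.sid = R1.sid
nat_schema, nat_rows = natural_join(S1, S1_SCHEMA, R1, R1_SCHEMA)
print(cond_join, equi_join, nat_schema, nat_rows, sep="\n")
```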
Outer Joins
• An inner join includes only those tuples with matching
attributes and the rest are discarded in the resulting relation.
• Therefore, we need to use outer joins to include all the tuples
from the participating relations in the resulting relation.
• There are three kinds of outer joins − left outer join, right
outer join, and full outer join.
Left Outer Join (R ⟕ S)
Courses
A    B
100  Database
101  Mechanics
102  Electronics
HoD
C    D
100  Alex
102  Maya
104  Mira
Courses ⟕ HoD (join condition A = C; --- denotes NULL)
A    B            C    D
100  Database     100  Alex
101  Mechanics    ---  ---
102  Electronics  102  Maya

Right Outer Join (R ⟖ S)
Courses ⟖ HoD
A    B            C    D
100  Database     100  Alex
102  Electronics  102  Maya
---  ---          104  Mira

Full Outer Join (R ⟗ S)
Courses ⟗ HoD
A    B            C    D
100  Database     100  Alex
101  Mechanics    ---  ---
102  Electronics  102  Maya
---  ---          104  Mira
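The three outer joins can be sketched in plain Python, with None standing for the --- (NULL) entries. Plain Python is used rather than SQL because older SQLite versions (before 3.39.0) support only LEFT JOIN, not RIGHT or FULL OUTER JOIN.

```python
Courses = [(100, "Database"), (101, "Mechanics"), (102, "Electronics")]  # (A, B)
HoD = [(100, "Alex"), (102, "Maya"), (104, "Mira")]                       # (C, D)
NULL = None  # shown as --- in the tables above


def left_outer(r, s):
    """Every tuple of r, padded with NULLs when no tuple of s matches (A = C)."""
    out = []
    for a, b in r:
        matches = [(c, d) for c, d in s if c == a]
        out += [(a, b, c, d) for c, d in matches] or [(a, b, NULL, NULL)]
    return out


def right_outer(r, s):
    # Mirror of the left outer join, with columns put back in (A, B, C, D) order.
    return [(a, b, c, d) for c, d, a, b in left_outer(s, r)]


def full_outer(r, s):
    left = left_outer(r, s)
    # Add the tuples of s that found no partner in r.
    unmatched = [(NULL, NULL, c, d) for c, d in s if all(c != a for a, _ in r)]
    return left + unmatched


for row in full_outer(Courses, HoD):
    print(row)
```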

Division
• Not supported as a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats.
• Let A have 2 fields, x and y; B have only field y:
– \( A/B = \{ \langle x \rangle \mid \forall \langle y \rangle \in B:\ \langle x, y \rangle \in A \} \)
– i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.
A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A.
A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4
B1: pno = {p2}
B2: pno = {p2, p4}
B3: pno = {p1, p2, p4}
A/B1: sno = {s1, s2, s3, s4}
A/B2: sno = {s1, s4}
A/B3: sno = {s1}
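Division can be sketched in two ways on the A, B1, B2, B3 example above: directly from the definition, and using only projection, Cartesian product and set difference via the standard identity \( A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A) \). The identity is why division need not be a primitive operator.

```python
def divide(A, B):
    """A / B for A(x, y) and B(y): x values paired in A with every y in B."""
    B_ys = {y for (y,) in B}
    xs = {x for x, _ in A}
    return {x for x in xs if B_ys <= {y for x2, y in A if x2 == x}}


def divide_basic(A, B):
    """Same result using only projection, cross product and difference:
    A/B = π_x(A) − π_x((π_x(A) × B) − A)."""
    pi_x = {(x,) for x, _ in A}
    all_pairs = {(x, y) for (x,) in pi_x for (y,) in B}
    disqualified = {(x,) for x, _ in all_pairs - A}   # x missing some y of B
    return {x for (x,) in pi_x - disqualified}


A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

print(sorted(divide(A, B1)))  # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, B2)))  # ['s1', 's4']
print(sorted(divide(A, B3)))  # ['s1']
```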
Division Operation – Example
[Figure: relations r(A, B) and s(B) with s = {1, 2}, and the result r ÷ s with schema (A)]
Another Division Example
[Figure: relations r(A, B, C, D, E) and s(D, E) with s = {(a, 1), (b, 1)}, and the result r ÷ s with schema (A, B, C)]
Example of Division

• Find all customers who have an account at all branches located in Chicago
– Branch (bname, assets, bcity)
– Account (bname, acct#, cname, balance)
Example of Division
r1: Find all branches in Chicago
\( r1 = \pi_{bname}(\sigma_{bcity='Chicago'}(Branch)) \)
r2: Find (bname, cname) pairs from Account
\( r2 = \pi_{bname,cname}(Account) \)
r3: Customers in r2 with every branch name in r1
\( r3 = r2 \div r1 \)
Exercise 1
Given relational schema:
Sailors (sid, sname, rating, age)
Reserves (sid, bid, date)
Boats (bid, bname, color)

1) Find names of sailors who’ve reserved boat #103


2) Find names of sailors who’ve reserved a red boat
3) Find sailors who’ve reserved a red or a green boat
4) Find sailors who’ve reserved a red and a green boat
5) Find the names of sailors who’ve reserved all boats
1) Find names of sailors who’ve reserved boat #103
• Solution 1: \( \pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors) \)
• Solution 2: \( \rho(Temp1, \sigma_{bid=103}\,Reserves) \)
\( \rho(Temp2, Temp1 \bowtie Sailors) \)
\( \pi_{sname}(Temp2) \)
• Solution 3: \( \pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors)) \)

2) Find names of sailors who’ve reserved a red boat
• Boats (bid, bname, color)
• Information about boat color is only available in Boats, so an extra join is needed:
\( \pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors) \)
• A more efficient solution (why more efficient?):
\( \pi_{sname}(\pi_{sid}((\pi_{bid}\,\sigma_{color='red'}\,Boats) \bowtie Reserves) \bowtie Sailors) \)
A query optimizer can find this, given the first solution!
3) Find sailors who’ve reserved a red or a green boat
• Identify all red or green boats, then find the sailors who’ve reserved one of these boats:
\( \rho(Tempboats, \sigma_{color='red' \lor color='green'}\,Boats) \)
\( \pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors) \)
4) Find sailors who’ve reserved a red and a green boat
• Previous approach won’t work! Why? Replacing ∨ by ∧ in the selection would look for a single boat that is both red and green, and no such boat exists.
• Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
\( \rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves)) \)
\( \rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves)) \)
\( \pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors) \)
5) Find the names of sailors who’ve reserved all boats
• Uses division; the schemas of the input relations to division (/) must be carefully chosen:
\( \rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}\,Boats)) \)
\( \pi_{sname}(Tempsids \bowtie Sailors) \)
• To find sailors who’ve reserved all 'Interlake' boats:
\( \ldots / (\pi_{bid}\,\sigma_{bname='Interlake'}\,Boats) \)
Exercise 2
• Student(sID, surName, firstName, campus, email, cgpa)
• Course(dept, cNum, name, breadth)
• Offering(oID, dept, cNum, term, instructor)
• Took(sID, oID, grade)
• Student number of all students who have
taken cNum= 343 from dept = csc.
• Student number of all students who have
taken csc343 and earned an A+ in it.
Exercise 3
• employee (person-name, street, city)
• works (person-name, company-name, salary)
• company (company-name, city)
• manages (person-name, manager-name)
a. Find the names of all employees who work for First Bank Corporation.
\( \Pi_{\text{person-name}}(\sigma_{\text{company-name} = \text{“First Bank Corporation”}}(works)) \)
Find the names and cities of residence of all employees who work for First Bank Corporation.
\( \Pi_{\text{person-name, city}}(employee \bowtie (\sigma_{\text{company-name} = \text{“First Bank Corporation”}}(works))) \)
Find the names, street address, and cities of residence of all employees who work for First Bank Corporation and earn more than $10,000 per annum.
\( \Pi_{\text{person-name, street, city}}(\sigma_{\text{company-name} = \text{“First Bank Corporation”} \land salary > 10000}(works) \bowtie employee) \)
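Find the names of all employees in this database who live in the same city as the company for which they work.
\( \Pi_{\text{person-name}}(employee \bowtie works \bowtie company) \)
(The natural join equates the common attribute city of employee and company, so only employees living in their company's city remain.)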
Assume the companies may be located in several cities. Find all companies located in every city in which Small Bank Corporation is located.
\( \Pi_{\text{company-name}}(company \div \Pi_{city}(\sigma_{\text{company-name} = \text{“Small Bank Corporation”}}(company))) \)
Exercise 4
• Suppliers(sid: integer, sname: string, address: string)
• Parts(pid: integer, pname: string, color: string)
• Catalog(sid: integer, pid: integer, cost: real)
Find the names of suppliers who supply some red part.
\( \pi_{sname}(\pi_{sid}((\pi_{pid}\,\sigma_{color='red'}\,Parts) \bowtie Catalog) \bowtie Suppliers) \)
Find the sids of suppliers who supply some red or green part.
\( \pi_{sid}(\pi_{pid}(\sigma_{color='red' \lor color='green'}\,Parts) \bowtie Catalog) \)
Find the sids of suppliers who supply some red part or are at 221 Packer Street.
\( \rho(R1, \pi_{sid}((\pi_{pid}\,\sigma_{color='red'}\,Parts) \bowtie Catalog)) \)
\( \rho(R2, \pi_{sid}(\sigma_{address=\text{'221 Packer Street'}}\,Suppliers)) \)
\( R1 \cup R2 \)
Find the sids of suppliers who supply some red part and some green part.
\( \rho(R1, \pi_{sid}((\pi_{pid}\,\sigma_{color='red'}\,Parts) \bowtie Catalog)) \)
\( \rho(R2, \pi_{sid}((\pi_{pid}\,\sigma_{color='green'}\,Parts) \bowtie Catalog)) \)
\( R1 \cap R2 \)
Find the sids of suppliers who supply every part.
\( (\pi_{sid,pid}\,Catalog) / (\pi_{pid}\,Parts) \)
Find the sids of suppliers who supply every red part.
\( (\pi_{sid,pid}\,Catalog) / (\pi_{pid}\,\sigma_{color='red'}\,Parts) \)
Find the sids of suppliers who supply every red or green part.
\( (\pi_{sid,pid}\,Catalog) / (\pi_{pid}\,\sigma_{color='red' \lor color='green'}\,Parts) \)
Functional Dependency in DBMS
• In any relation, a functional dependency α → β holds if two tuples having the same value of attribute α also have the same value of attribute β.
Mathematically,
• If α and β are two sets of attributes in a relational table R, where
α ⊆ R
β ⊆ R
• then the functional dependency α → β exists if, for all tuples t1 and t2,
\( t_1[\alpha] = t_2[\alpha] \Rightarrow t_1[\beta] = t_2[\beta] \)
Types Of Functional Dependencies
1. Trivial Functional Dependencies
• A functional dependency X → Y is said to be trivial if and only if Y ⊆ X.
• Thus, if the RHS of a functional dependency is a subset of its LHS, it is called a trivial functional dependency.

Examples

• AB → A
• AB → B
• AB → AB
2. Non-Trivial Functional Dependencies
• A functional dependency X → Y is said to be non-trivial if and
only if Y ⊄ X.
• Thus, if there exists at least one attribute in the RHS of a
functional dependency that is not a part of LHS, then it is
called as a non-trivial functional dependency.

Examples

• AB → BC
• AB → CD
Inference Rules
Reflexivity
• If B is a subset of A, then A → B always holds.
Transitivity
• If A → B and B → C, then A → C always holds.
Augmentation
• If A → B, then AC → BC always holds.
Decomposition
• If A → BC, then A → B and A → C always holds.
Composition
• If A → B and C → D, then AC → BD always holds.
Additive
• If A → B and A → C, then A → BC always holds.
Rules for Functional Dependency

Rule-01:
• A functional dependency X → Y will always hold in a relation instance if all the values of X are unique (different), irrespective of the values of Y.
A  B  C  D  E
5  4  3  2  2
8  5  3  2  1
1  9  3  3  5
4  7  3  3  8
• Since all values of A are distinct, A → B, A → BC, A → CD, A → BCD, A → DE and A → BCDE all hold.
Rule-02:
• A functional dependency X → Y will always hold in a relation instance if all the values of Y are the same, irrespective of the values of X.
A  B  C  D  E
5  4  3  2  2
8  5  3  2  1
1  9  3  3  5
4  7  3  3  8
• Since all values of C are the same, A → C, AB → C, ABDE → C, DE → C and AE → C all hold.
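Whether an FD holds in a given instance can be checked mechanically by grouping tuples on the LHS and verifying that each group has a single RHS value. A minimal sketch on the table above (the function name fd_holds is illustrative). Note that an instance can only refute an FD; whether it holds for the schema in general is a design decision.

```python
ATTRS = "ABCDE"
rows = [
    (5, 4, 3, 2, 2),
    (8, 5, 3, 2, 1),
    (1, 9, 3, 3, 5),
    (4, 7, 3, 3, 8),
]


def fd_holds(lhs, rhs, rows, attrs=ATTRS):
    """Check X -> Y on this instance: tuples that agree on X must agree on Y."""
    seen = {}
    for t in rows:
        x = tuple(t[attrs.index(a)] for a in lhs)
        y = tuple(t[attrs.index(a)] for a in rhs)
        if seen.setdefault(x, y) != y:   # same X value, different Y value
            return False
    return True


print(fd_holds("A", "BCDE", rows))  # True  (Rule-01: all A values are distinct)
print(fd_holds("DE", "C", rows))    # True  (Rule-02: all C values are equal)
print(fd_holds("C", "D", rows))     # False (C = 3 in every row, but D varies)
```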
Closure of an Attribute Set
• The set of all those attributes which can be functionally
determined from an attribute set is called as a closure of that
attribute set.
• Closure of attribute set {X} is denoted as {X}+
Steps to Find Closure of an Attribute Set

Step-01:
• Add the attributes contained in the attribute set for which
closure is being calculated to the result set.
Step-02
• Recursively add the attributes to the result set which can be
functionally determined from the attributes already contained
in the result set.
Example
• Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies:
A → BC
BC → DE
D → F
CF → G
• Now, let us find the closure of some attributes and attribute sets.
Closure of attribute A
A+ = { A }
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
• A+ = { A , B , C , D , E , F , G }
Closure of attribute D
D+ = { D }
= { D , F } ( Using D → F )
• We cannot determine any other attribute using the attributes D and F contained in the result set. Thus,
D+ = { D , F }
Closure of attribute set {B, C}
{ B , C }+ = { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
• Thus,
{ B , C }+ = { B , C , D , E , F , G }
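The two steps for computing a closure translate into a short fixed-point loop: start from the given attributes and keep applying every FD whose left-hand side is already in the result until nothing changes. A minimal sketch, with FDs written as (LHS, RHS) strings; the last lines apply it to the functional dependencies of the Problem that follows.

```python
def closure(attrs, fds):
    """Attribute closure {X}+: start from X (Step-01), then repeatedly add the
    RHS of every FD whose LHS is already contained in the result (Step-02)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [("A", "BC"), ("BC", "DE"), ("D", "F"), ("CF", "G")]
print(sorted(closure("A", FDS)))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(sorted(closure("D", FDS)))   # ['D', 'F']
print(sorted(closure("BC", FDS)))  # ['B', 'C', 'D', 'E', 'F', 'G']

PROBLEM = [("AB", "CD"), ("AF", "D"), ("DE", "F"), ("C", "G"), ("F", "E"), ("G", "A")]
print(sorted(closure("AF", PROBLEM)))  # ['A', 'D', 'E', 'F']
print(sorted(closure("AB", PROBLEM)))  # ['A', 'B', 'C', 'D', 'G']
```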
Finding the Keys Using Closure

Super Key
• If the closure of an attribute set contains all the attributes of the relation, then that attribute set is called a super key of that relation.
• Thus, we can say:
• “The closure of a super key is the entire relation schema.”
Candidate Key
• If an attribute set is a super key and no proper subset of it is a super key (that is, no proper subset has a closure containing all the attributes of the relation), then that attribute set is called a candidate key of that relation.
Problem
Consider the following functional dependencies:
• AB → CD
• AF → D
• DE → F
• C → G
• F → E
• G → A

Which of the following options is/are false?


A. { CF }+ = { A , C , D , E , F , G }
B. { BG }+ = { A , B , C , D , G }
C. { AF }+ = { A , C , D , E , F , G }
D. { AB }+ = { A , C , D , F ,G }
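Answer
• Options (C) and (D) are false.
– { AF }+ = { A , D , E , F } ( using AF → D and F → E ), so C and G are not in the closure.
– { AB }+ = { A , B , C , D , G } ( using AB → CD, C → G and G → A ), so F is not in the closure, while B is.
– Options (A) and (B) are correct.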
Keys in DBMS
• A key is a set of attributes that can identify each tuple
uniquely in the given relation.
1. Super Key

• A super key is a set of attributes that can identify each tuple uniquely in the given relation.
• A super key is not restricted to have any specific number of
attributes.
• Thus, a super key may consist of any number of attributes.
EXAMPLE:

Student ( class_roll , name , age , address , course , section )

• Given below are examples of super keys, since each set can uniquely identify each student in the Student table:
( class_roll , name , age , address , course , section )
( course , section , class_roll )
( name , address )
NOTE-
• All the attributes in a super key are definitely sufficient to
identify each tuple uniquely in the given relation but all of
them may not be necessary.
2. Candidate Key

• A minimal set of attribute(s) that can identify each tuple uniquely in the given relation is called a candidate key.
Example
Student ( class_roll , name , age , address , course , section )

• Given below are examples of candidate keys, since each set consists of the minimal attributes required to identify each student uniquely in the Student table:
( course , section , class_roll )
( name , address )
NOTES
• All the attributes in a candidate key are sufficient as well as
necessary to identify each tuple uniquely.
• Removing any attribute from the candidate key fails in
identifying each tuple uniquely.
• The value of candidate key must always be unique.
• The value of candidate key can never be NULL.
• It is possible to have multiple candidate keys in a relation.
• Attributes that appear in some candidate key are called prime attributes.
3. Primary Key
• A primary key is a candidate key that the database designer
selects while designing the database.
OR
• The candidate key that the database designer implements is called the primary key.
NOTES
• The value of primary key can never be NULL.
• The value of primary key must always be unique.
• The value of a primary key should never be changed, i.e. it should not be updated.
• The value of primary key must be assigned when inserting a
record.
• A relation is allowed to have only one primary key.
4. Alternate Key
• Candidate keys that are left unimplemented or unused after implementing the primary key are called alternate keys.
OR
• Unimplemented candidate keys are called alternate keys.
5. Foreign Key
• An attribute ‘X’ is called a foreign key to some other attribute ‘Y’ when its values depend on the values of attribute ‘Y’.
• The attribute ‘X’ can take only those values which are taken by the attribute ‘Y’.
• Here, the relation in which attribute ‘Y’ is present is called the referenced relation.
• The relation in which attribute ‘X’ is present is called the referencing relation.
• The attribute ‘Y’ might be present in the same table or in
some other table.

Here, t_dept can take only those values which are present in dept_no in the Department table, since only those departments actually exist.
NOTES
• A foreign key references the primary key of the referenced table.
• A foreign key can take only those values which are present in the primary key of the referenced relation.
• A foreign key may have a name other than that of the primary key it references.
• A foreign key can take the NULL value.
• There is no restriction on a foreign key to be unique.
• In fact, a foreign key is not unique most of the time.
• The referenced relation may also be called the master table or primary table.
• The referencing relation may also be called the foreign table.
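These rules can be tried out with SQLite through Python's standard `sqlite3` module. This is a sketch only: the `Teacher` table, its `t_id` and `t_name` columns, the `dept_name` column and all data values are hypothetical, chosen to match the `t_dept` / `dept_no` names in the example above. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Department (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE Teacher ("
    " t_id INTEGER PRIMARY KEY,"
    " t_name TEXT,"
    " t_dept INTEGER REFERENCES Department(dept_no))"  # foreign key -> referenced relation
)
conn.executemany("INSERT INTO Department VALUES (?, ?)", [(10, "CSE"), (20, "ECE")])

conn.execute("INSERT INTO Teacher VALUES (1, 'Ravi', 10)")     # dept 10 exists
conn.execute("INSERT INTO Teacher VALUES (2, 'Meena', 10)")    # foreign key values may repeat
conn.execute("INSERT INTO Teacher VALUES (3, 'Kiran', NULL)")  # foreign key may be NULL

rejected = False
try:
    conn.execute("INSERT INTO Teacher VALUES (4, 'Asha', 30)")  # dept 30 does not exist
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM Teacher").fetchone()[0])  # 3
```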
6. Partial Key
• A partial key is a key using which all the records of the table cannot be identified uniquely.
• However, a group of related tuples can be selected from the table using the partial key.
Dependent ( Emp_no , Dependent_name , Relation )

Here, using the partial key Emp_no, we cannot identify a tuple uniquely, but we can select a group of tuples from the table.

Emp_no  Dependent_name  Relation
E1      Suman           Mother
E1      Ajay            Father
E2      Vijay           Father
E2      Ankush          Son
7. Composite Key
• A primary key comprising multiple attributes, not just a single attribute, is called a composite key.
8. Unique Key
• Unique key is a key with the following properties-
– It is unique for all the records of the table.
– Once assigned, its value cannot be changed, i.e. it is non-updatable.
– It may have a NULL value.
Example
• A good example of a unique key is the Aadhaar card number.
• The Aadhaar card number is unique for all the citizens (tuples) of India (table).
• If the card is lost and a duplicate copy is issued, the duplicate copy always has the same number as before.
• Thus, it is non-updatable.
• A few citizens may not have received their Aadhaar cards, so for them its value is NULL.
9. Surrogate Key
• Surrogate key is a key with the following properties-
• It is unique for all the records of the table.
• It is updatable.
• It cannot be NULL, i.e. it must have some value.

Example
• Mobile Number of students in a class where every student
owns a mobile phone.
10. Secondary Key
• A secondary key is used for indexing, for better and faster searching.
Finding Candidate Keys
• A minimal set of attribute(s) that can identify each tuple uniquely in the given relation is called a candidate key.
OR
• A minimal super key is called a candidate key.
Step-01
• Determine all essential attributes of the given relation.

• Essential attributes are those attributes which are not present on the RHS of any functional dependency.
• Essential attributes are always a part of every candidate key.
• This is because they cannot be determined by other attributes.
Example
• Let R(A, B, C, D, E, F) be a relation scheme with the following functional dependencies:
A → B
C → D
D → E
• Here, the attributes which are not present on the RHS of any functional dependency are A, C and F.
• So, the essential attributes are A, C and F.
Step-02
• The remaining attributes of the relation are non-essential attributes.
• Each of them appears on the RHS of some functional dependency, so it can possibly be determined from other attributes.
• Now, the following two cases are possible:
Case-01
• If all essential attributes together can determine all
remaining non-essential attributes, then-
– The combination of essential attributes is the candidate key.
– It is the only possible candidate key.
Case-02
• If all essential attributes together can not determine all
remaining non-essential attributes, then-
– The set of essential attributes and some non-essential attributes
will be the candidate key(s).
– In this case, multiple candidate keys are possible.
– To find the candidate keys, we check different combinations of
essential and non-essential attributes.
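Step-01 and Step-02 can be combined into a small search. The sketch below, in Python, assumes FDs are (LHS, RHS) pairs of single-letter attribute strings; `closure`, `candidate_keys` and `show` are hypothetical helper names. It starts from the essential attributes and then adds non-essential attributes in combinations of increasing size, skipping any set that already contains a key found earlier, so every reported key is minimal. The first two calls reproduce Problem-01 and Problem-02 below; the third uses the relation from the 2NF example later in these notes, where the essential attribute W alone is not enough (Case-02).

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure {attrs}+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    schema = set(schema)
    on_rhs = set().union(*(set(rhs) for _, rhs in fds))
    essential = schema - on_rhs           # Step-01: attributes on no RHS
    others = sorted(schema - essential)   # non-essential attributes
    keys = []
    # Step-02: essential attributes plus 0, 1, 2, ... non-essential ones.
    # Smaller combinations are tried first, so every key found is minimal.
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            cand = essential | set(extra)
            if any(key <= cand for key in keys):
                continue                  # contains a smaller key: only a super key
            if closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def show(keys):
    return sorted("".join(sorted(k)) for k in keys)


print(show(candidate_keys("ABCDEF", [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")])))
# ['CE']
print(show(candidate_keys("ABCDE", [("AB", "C"), ("C", "D"), ("B", "E")])))
# ['AB']
print(show(candidate_keys("VWXYZ", [("VW", "XY"), ("Y", "V"), ("WX", "YZ")])))
# ['VW', 'WX', 'WY']
```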
FINDING CANDIDATE KEYS
Problem-01
• Let R = (A, B, C, D, E, F) be a relation scheme with the following dependencies:
C → F
E → A
EC → D
A → B
Which of the following is a key for R?
A. CD
B. EC
C. AE
D. AC
Solution
• The essential attributes (not present on the RHS of any dependency) are C and E.
{ CE }+
= { C , E }
= { C , E , F } ( Using C → F )
= { A , C , E , F } ( Using E → A )
= { A , C , D , E , F } ( Using EC → D )
= { A , B , C , D , E , F } ( Using A → B )
• We conclude that CE can determine all the attributes of the given relation.
• So, by Case-01, CE is the only possible candidate key of the relation, and the answer is option B (EC).
Problem-02
• Let R = (A, B, C, D, E) be a relation scheme with the following dependencies:
AB → C
C → D
B → E
• Find the candidate keys and super keys.
Solution
• The essential attributes are A and B.
{ AB }+
= { A , B }
= { A , B , C } ( Using AB → C )
= { A , B , C , D } ( Using C → D )
= { A , B , C , D , E } ( Using B → E )
Hence AB is the only candidate key.
Any superset of AB (for example ABC or ABDE) is a super key.
Problem-03
• Consider the relation scheme R(E, F, G, H, I, J, K, L, M, N) and the set of functional dependencies:
{ E, F } → { G }
{ F } → { I, J }
{ E, H } → { K, L }
{ K } → { M }
{ L } → { N }
What is the key for R?
A. { E, F }
B. { E, F, H }
C. { E, F, H, K, L }
D. { E }
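Solution
• E, F and H are the essential attributes.
{ EFH }+
= { E , F , H }
= { E , F , G , H } ( Using EF → G )
= { E , F , G , H , I , J } ( Using F → IJ )
= { E , F , G , H , I , J , K , L } ( Using EH → KL )
= { E , F , G , H , I , J , K , L , M } ( Using K → M )
= { E , F , G , H , I , J , K , L , M , N } ( Using L → N )
• Since { EFH }+ contains all the attributes, { E, F, H } is the key for R: option (B).
• Option (C) is a super key but not minimal, and options (A) and (D) cannot determine H.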
Decomposition of a Relation
• The process of dividing a single relation into two or more sub relations is called decomposition of a relation.
Properties of Decomposition
1. Lossless decomposition
2. Dependency preserving decomposition
1. Lossless decomposition
• Lossless decomposition ensures that:
– No information is lost from the original relation during decomposition.
– When the sub relations are joined back, the same relation that was decomposed is obtained.
• Every decomposition must always be lossless.
2. Dependency Preservation
• Dependency preservation ensures that:
– None of the functional dependencies that hold on the original relation are lost.
– The sub relations still hold or satisfy the functional dependencies of the original relation.
Types of Decomposition
1. Lossless Join Decomposition
• Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
• This decomposition is called lossless join decomposition when the
join of the sub relations results in the same relation R that was
decomposed.
• For a lossless join decomposition, we always have
R1 ⋈ R2 ⋈ R3 ⋈ … ⋈ Rn = R
where ⋈ is the natural join operator.
Example
R(A, B, C) is decomposed into R1( A , B ) and R2( B , C ).
R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
Now, let us check whether this decomposition is lossless or not.
R1( A , B )    R2( B , C )
A B            B C
1 2            2 1
2 5            5 3
3 3            3 3
For lossless decomposition, we must have:
R1 ⋈ R2 = R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2, we get:
A B C
1 2 1
2 5 3
3 3 3
This relation is the same as the original relation R.
Thus, we conclude that the above decomposition is a lossless join decomposition.
2. Lossy Join Decomposition
• Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
• This decomposition is called lossy join decomposition when
the join of the sub relations does not result in the same
relation R that was decomposed.
• The natural join of the sub relations is always found to have
some extraneous tuples.
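• For a lossy join decomposition, we always have
R1 ⋈ R2 ⋈ R3 ⋈ … ⋈ Rn ⊃ R
where ⋈ is the natural join operator.
Example
R( A , B , C ) is decomposed into R1( A , C ) and R2( B , C ).
R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
R1( A , C )    R2( B , C )
A C            B C
1 1            2 1
2 3            5 3
3 3            3 3
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2, we get:
R1 ⋈ R2
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
The join contains the extraneous tuples (2, 3, 3) and (3, 5, 3), so R1 ⋈ R2 ⊃ R and the decomposition is lossy.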
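Both examples can be reproduced with a small sketch of projection and natural join in Python. It assumes a relation is a set of tuples, each tuple stored as a frozenset of (attribute, value) pairs so that column order does not matter; `relation`, `project` and `natural_join` are hypothetical helper names. The lossy case uses the sub relations R1(A, C) and R2(B, C), whose columns appear in the lossy example's tables.

```python
def relation(attrs, *rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, keep):
    """Projection onto the attributes in keep (duplicates vanish because rel is a set)."""
    return {frozenset((a, v) for a, v in t if a in keep) for t in rel}


def natural_join(r1, r2):
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # Combine two tuples only if they agree on every common attribute.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(t1 | t2)
    return out


R = relation("ABC", (1, 2, 1), (2, 5, 3), (3, 3, 3))

lossless = natural_join(project(R, "AB"), project(R, "BC"))
print(lossless == R)          # True: R1(A, B) and R2(B, C) give back R

lossy = natural_join(project(R, "AC"), project(R, "BC"))
print(len(lossy), lossy > R)  # 5 True: two extraneous tuples appear
```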
Determining Whether Decomposition Is Lossless Or Lossy

Condition-01
• The union of both the sub relations must contain all the attributes that are present in the original relation R.
R1 ∪ R2 = R
Condition-02
• The intersection of both the sub relations must not be empty.
• In other words, there must be some common attribute which is present in both the sub relations.
R1 ∩ R2 ≠ ∅
Condition-03
• The intersection of both the sub relations must be a super key of either R1 or R2 or both, i.e. (R1 ∩ R2)+ must contain all the attributes of R1 or all the attributes of R2.
R1 ∩ R2 = Super key of R1 or R2
If any of these conditions fails, then the decomposition is lossy.
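The three conditions translate directly into code for a decomposition into two sub relations. This Python sketch assumes single-letter attribute names and FDs as (LHS, RHS) pairs; `closure` and `is_lossless` are hypothetical helpers. The first call is Problem-01 below; the second, a decomposition into R1(A, B) and R2(A, C, D) under the same FDs, is an extra illustration not taken from the notes.

```python
def closure(attrs, fds):
    """Closure {attrs}+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Check Condition-01 to Condition-03 for decomposing r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:                       # Condition-01: no attribute is lost
        return False
    common = r1 & r2
    if not common:                         # Condition-02: some common attribute exists
        return False
    common_plus = closure(common, fds)     # Condition-03: the common attributes are
    return r1 <= common_plus or r2 <= common_plus  # a super key of R1 or of R2


fds = [("A", "B"), ("C", "D")]
print(is_lossless("ABCD", "AB", "CD", fds))   # False: no common attribute
print(is_lossless("ABCD", "AB", "ACD", fds))  # True: A -> B makes A a super key of R1(A, B)
```

For three or more sub relations, as in Problem-02 below, the same test can be applied step by step: first to R1 and R2, then to R1 ∪ R2 and R3.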
Problem-01
• Consider a relation schema R ( A , B , C , D )
with the functional dependencies A → B and C → D.
Determine whether the decomposition of R into R1 ( A , B )
and R2 ( C , D ) is lossless or lossy.
Solution
Condition-01
• According to condition-01, the union of both the sub relations must contain all the attributes of relation R.
R1 ( A , B ) ∪ R2 ( C , D ) = R ( A , B , C , D )
• Clearly, the union of the sub relations contains all the attributes of relation R.
• Thus, condition-01 is satisfied.
Condition-02
• According to condition-02, the intersection of both the sub relations must not be empty.
R1 ( A , B ) ∩ R2 ( C , D ) = ∅
• Clearly, the intersection of the sub relations is empty.
• So, condition-02 fails.
• Thus, we conclude that the decomposition is lossy.
Problem-02
• Consider a relation schema R ( A , B , C , D ) with the following functional dependencies:
A → B
B → C
C → D
D → B
• Determine whether the decomposition of R into R1 ( A , B ), R2 ( B , C ) and R3 ( B , D ) is lossless or lossy.
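Solution
(The test is applied twice: first to R1 and R2, then to their join R1 ∪ R2 and R3.)
Condition-01
• R1 ( A , B ) ∪ R2 ( B , C ) ∪ R3 ( B , D ) = R ( A , B , C , D ), so condition-01 is satisfied.
Condition-02
• R1 ∩ R2 = { B } ≠ ∅
• ( R1 ∪ R2 ) ∩ R3 = { B } ≠ ∅
Condition-03
• R1 ∩ R2 = { B }, and { B }+ = { B , C , D } contains all the attributes of R2, so B is a super key of R2.
• ( R1 ∪ R2 ) ∩ R3 = { B }, and { B }+ = { B , C , D } contains all the attributes of R3, so B is a super key of R3.
• Hence the decomposition is lossless.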
Normalization in DBMS
• Normalization is the process of organizing relations to reduce redundancy.
• It ensures the integrity of data through lossless decomposition.
• Normalization is carried out through a series of normal forms.
First Normal Form
• A given relation is said to be in First Normal Form (1NF):
– if each cell of the table contains only an atomic value.
OR
– if every attribute of every tuple is either single-valued or NULL.
Example
Student_id  Name    Subjects
100         Akshay  Computer Networks, Designing
101         Aman    Database Management System
102         Anjali  Automata, Compiler Design
The relation is not in 1NF, since a Subjects cell can hold more than one value.

Relation in 1NF:
Student_id  Name    Subjects
100         Akshay  Computer Networks
100         Akshay  Designing
101         Aman    Database Management System
102         Anjali  Automata
102         Anjali  Compiler Design
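The conversion above is mechanical: each multi-valued cell becomes one row per value. A minimal Python sketch, assuming the multi-valued Subjects cell is stored as a comma-separated string (an assumption about representation only; `students` and `students_1nf` are illustrative names):

```python
# Rows of the table that is not in 1NF: the Subjects cell holds a comma-separated list.
students = [
    (100, "Akshay", "Computer Networks, Designing"),
    (101, "Aman", "Database Management System"),
    (102, "Anjali", "Automata, Compiler Design"),
]

# One output row per (student, subject) pair, so every cell is atomic.
students_1nf = [
    (student_id, name, subject.strip())
    for student_id, name, subjects in students
    for subject in subjects.split(",")
]

for row in students_1nf:
    print(row)
```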
NOTE
• By default, every relation is in 1NF.
• This is because the formal definition of a relation requires the values of all attributes to be atomic.
Second Normal Form
• A given relation is said to be in Second Normal Form (2NF) if and only if:
– Relation already exists in 1NF.
– No partial dependency exists in the relation.
Partial Dependency
• A partial dependency is a dependency in which a proper subset of a candidate key determines non-prime attribute(s).

• In other words, A → B is called a partial dependency if and only if:
– A is a proper subset of some candidate key, and
– B is a non-prime attribute.
• If either condition fails, it is not a partial dependency.
Example
• Consider a relation R ( V , W , X , Y , Z ) with the functional dependencies:
VW → XY
Y → V
WX → YZ
• The possible candidate keys for this relation are VW, WX and WY.
• Prime attributes = { V , W , X , Y }
• Non-prime attributes = { Z }
• Now, observe the given dependencies: no proper subset of a candidate key determines Z, since V+ = { V }, W+ = { W }, X+ = { X } and Y+ = { V , Y }.
• Hence there is no partial dependency.
• Thus, we conclude that the given relation is in 2NF.
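The partial-dependency test can be automated. In this Python sketch (hypothetical helper names; FDs as (LHS, RHS) pairs of single-letter attributes), candidate keys are found by brute force over attribute subsets, smallest first, which is fine for the small relations in these notes. Then every proper, non-empty subset of every candidate key is checked for whether its closure contains a non-prime attribute. The second call uses the relation from Question 1 below.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure {attrs}+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    schema, keys = set(schema), []
    for size in range(1, len(schema) + 1):
        for cand in map(set, combinations(sorted(schema), size)):
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def partial_dependencies(schema, fds):
    """(part, attrs) pairs where a proper subset of a candidate key determines non-prime attrs."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    non_prime = set(schema) - prime
    found = set()
    for key in keys:
        for size in range(1, len(key)):                 # proper, non-empty subsets
            for part in combinations(sorted(key), size):
                determined = (closure(part, fds) - set(part)) & non_prime
                if determined:
                    found.add(("".join(part), "".join(sorted(determined))))
    return sorted(found)


print(partial_dependencies("VWXYZ", [("VW", "XY"), ("Y", "V"), ("WX", "YZ")]))
# []  -> the example relation is in 2NF
print(partial_dependencies("ABCD", [("AB", "CD"), ("B", "C")]))
# [('B', 'C')]  -> not in 2NF
```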
Third Normal Form
• A given relation is said to be in Third Normal Form (3NF) if and only if:
– The relation is already in 2NF.
– No transitive dependency exists for non-prime attributes.

If A → B and B → C are two FDs, then A → C is called a transitive dependency, where A is a candidate key (so its attributes are prime) and B and C are non-prime attributes.
• Equivalently, for every non-trivial functional dependency X → Y, at least one of the following must hold:
– X is a super key.
– Y is a prime attribute (each element of Y is part of some candidate key).
Example
• Consider a relation R ( A , B , C , D , E ) with the functional dependencies:
A → BC
CD → E
B → D
E → A
• The possible candidate keys for this relation are A, E, CD and BC.
• Prime attributes = { A , B , C , D , E }
• There are no non-prime attributes in the relation, so no transitive dependency on a non-prime attribute can exist.
• Thus, we conclude that the given relation is in 3NF.
Boyce-Codd Normal Form
• A given relation is said to be in BCNF if and only if:
– The relation is already in 3NF.
– For each non-trivial functional dependency A → B, A is a super key of the relation.
Example
• Consider a relation R ( A , B , C ) with the functional dependencies:
A → B
B → C
C → A
• The possible candidate keys for this relation are A, B and C.
• The LHS of every functional dependency (A, B and C) is a super key, hence relation R is in BCNF.
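Both the 3NF condition (for every non-trivial X → Y, X is a super key or Y is prime) and the BCNF condition can be checked with attribute closures. A Python sketch, assuming FDs are (LHS, RHS) pairs of single-letter attributes; all helper names are hypothetical and candidate keys are found by brute force. When checking the original relation it is enough to test the given FDs themselves; this shortcut is not valid for sub relations produced by a decomposition. Run on the 3NF example above, the sketch reports that relation as in 3NF but not in BCNF, because the LHS of B → D is not a super key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure {attrs}+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    schema, keys = set(schema), []
    for size in range(1, len(schema) + 1):
        for cand in map(set, combinations(sorted(schema), size)):
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def violations_3nf(schema, fds):
    """(X, A) pairs from X -> Y where A is in Y - X, X is not a super key and A is not prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_super_key = closure(lhs, fds) >= schema
        for a in sorted(set(rhs) - set(lhs)):   # only the non-trivial part of the FD
            if not lhs_is_super_key and a not in prime:
                bad.append((lhs, a))
    return bad


def violations_bcnf(schema, fds):
    """Non-trivial FDs X -> Y whose LHS X is not a super key."""
    schema = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= schema]


fds_3nf = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]   # 3NF example
print(violations_3nf("ABCDE", fds_3nf), violations_bcnf("ABCDE", fds_3nf))
# [] [('B', 'D')]  -> in 3NF, but not in BCNF

fds_bcnf = [("A", "B"), ("B", "C"), ("C", "A")]                # BCNF example
print(violations_3nf("ABC", fds_bcnf), violations_bcnf("ABC", fds_bcnf))
# [] []
```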
Question 1
• Given a relation R( A, B, C, D) and functional dependency set FD = { AB → CD, B → C }, determine whether the given R is in 2NF. If not, convert it into 2NF.
R( A, B, C, D)
FD = { AB → CD, B → C }

{AB}+ = {ABCD}, hence AB is the candidate key.
Prime attributes: A, B
Non-prime attributes: C, D
Definition of 2NF: no non-prime attribute should be partially dependent on a candidate key.

B → C is a partial dependency (B is a proper subset of the key AB and C is non-prime), hence relation R is not in 2NF.
Convert the table R(A, B, C, D) into 2NF:
• Because of the FD B → C, the table is not in 2NF, so let us decompose it:
R1(B, C)
• Since the key is AB, from the FD AB → CD we could create R2(A, B, C, D), but this would again have the partial dependency B → C; hence R2(A, B, D).
• Finally, the decomposed tables which are in 2NF are:
a) R1( B, C)
b) R2(A, B, D)
Question 2
• Given a relation R( P, Q, R, S, T) and functional dependency set FD = { PQ → R, S → T }, determine whether the given R is in 2NF. If not, convert it into 2NF.
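R( P, Q, R, S, T)
FD = { PQ → R, S → T }

{PQS}+ = {PQRST}, hence PQS is the candidate key.

PQ → R and S → T are partial functional dependencies (PQ and S are proper subsets of the key PQS, and R and T are non-prime), hence R( P, Q, R, S, T) is not in 2NF.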
Convert the table R( P, Q, R, S, T) into 2NF:
• Because of the FDs PQ → R and S → T, the table is not in 2NF, so let us decompose it:
• R1(P, Q, R) (in R1, PQ → R is a full functional dependency, hence R1 is in 2NF)
• R2(S, T) (in R2, S → T is a full functional dependency, hence R2 is in 2NF)
• Also create one table for the key PQS:
• R3(P, Q, S)

• Finally, the decomposed tables which are in 2NF are:
a) R1( P, Q, R)
b) R2(S, T)
c) R3(P, Q, S)
Question 3
• Given a relation R( P, Q, R, S, T, U, V, W, X, Y) and functional dependency set FD = { PQ → R, PS → VW, QS → TU, P → X, W → Y }, determine whether the given R is in 2NF. If not, convert it into 2NF.
R( P, Q, R, S, T, U, V, W, X, Y)
Functional dependency set FD = { PQ → R, PS → VW, QS → TU, P → X, W → Y }

{PQS}+ = {PQRSTUVWXY}, hence PQS is the candidate key.

Prime attributes (part of the candidate key) are {P, Q, S}.
Non-prime attributes are {R, T, U, V, W, X, Y}.

PQ → R, PS → VW, QS → TU and P → X are partial functional dependencies.
Convert the table R( P, Q, R, S, T, U, V, W, X, Y) into 2NF:
• Because of the FDs PQ → R, PS → VW, QS → TU and P → X, the table is not in 2NF, so let us decompose it:
• R1 (P, Q, R) (in R1, PQ → R is a full functional dependency, hence R1 is in 2NF)
• R2 (P, S, V, W) (in R2, PS → VW is a full functional dependency, hence R2 is in 2NF)
• R3 (Q, S, T, U) (in R3, QS → TU is a full functional dependency, hence R3 is in 2NF)
• R4 (P, X) (in R4, P → X is a full functional dependency, hence R4 is in 2NF)
• R5 (W, Y) (keeping Y out of R2 removes the partial dependency PS → Y that follows from PS → W and W → Y; in R5, W → Y is a full functional dependency, hence R5 is in 2NF)
• Also create one table for the key PQS:
• R6 (P, Q, S)
• Finally, the decomposed tables which are in 2NF are:
R1(P, Q, R)
R2(P, S, V, W)
R3(Q, S, T, U)
R4(P, X)
R5(W, Y)
R6(P, Q, S)
Question 4
• Given a relation R( A, B, C, D, E) and functional dependency set FD = { A → B, B → E, C → D }, determine whether the given R is in 2NF. If not, convert it into 2NF.
R( A, B, C, D, E)
FD = { A → B, B → E, C → D }
{AC}+ = {ABCDE}, hence AC is the candidate key.
Prime attributes: A, C
Non-prime attributes: B, D, E

The FDs A → B and C → D do not satisfy the definition of 2NF, since A and C are proper subsets of the key AC and B and D are non-prime.

Hence, because of A → B and C → D, the table R( A, B, C, D, E) is not in 2NF.
Convert the table R(A, B, C, D, E) into 2NF:
• Because of the FDs A → B and C → D, the table is not in 2NF, so let us decompose it:
• R1(A, B, E) (from A → B and B → E; in R, both A → B and A → E, derived through B, are partial dependencies)
• R2( C, D) (in R2, C → D is a full functional dependency, hence R2 is in 2NF)
• Also create one table for the candidate key AC:
• R3 ( A, C)
Finally, the decomposed tables which are in 2NF are:
R1( A, B, E)
R2( C, D)
R3( A, C)
• Question 1: Given a relation R( X, Y, Z) and functional dependency set FD = { X → Y, Y → Z }, determine whether the given R is in 3NF. If not, convert it into 3NF.
R( X, Y, Z)
FD = { X → Y, Y → Z }

{X}+ = {X, Y, Z}, hence X is the candidate key.
From X → Y and Y → Z, we can derive X → Z:
X → Y → Z, where X is a prime attribute and Y and Z are non-prime attributes.
X → Z is therefore a transitive dependency, hence the relation is not in 3NF.
• Now check whether the above table is in 2NF.
• X → Y satisfies 2NF (the key X is a single attribute, so Y is fully functionally dependent on it).
• Y → Z also satisfies 2NF (Y is not part of any candidate key, so this is not a partial dependency).
• Hence the table R( X, Y, Z ) is in 2NF but not in 3NF.
Convert the table R( X, Y, Z) into 3NF:
• Because of the FD Y → Z, the table is not in 3NF, so let us decompose it:
• Y → Z causes the problem, hence one table R1(Y, Z).
• Create one table for the key X: R2(X, Y), since X → Y.
• Hence the decomposed tables which are in 3NF are:
R1(Y, Z)
R2(X, Y)
• Question 2: Given a relation R( X, Y, Z, W, P) and functional dependency set FD = { X → Y, Y → P, Z → W }, determine whether the given R is in 3NF. If not, convert it into 3NF.
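R( X, Y, Z, W, P) and FD = { X → Y, Y → P, Z → W }

{XZ}+ = {X, Z, Y, P, W}, hence XZ is the candidate key.

X → Y → P, where X is a prime attribute and Y and P are non-prime attributes, so X → P is a transitive dependency.
Hence the relation is not in 3NF.
(X → Y and Z → W are also partial dependencies, since X and Z are proper subsets of the key XZ, so R is not even in 2NF.)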
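Question 2 also asks for a conversion into 3NF. One standard way to obtain it is 3NF synthesis: create one relation (LHS together with RHS) per distinct LHS, add a relation holding a candidate key if none of them contains one, and drop any relation contained in another. The result is in 3NF, lossless and dependency preserving. The Python sketch below assumes the FD set is already a minimal cover (true for both questions here) and uses hypothetical helper names with brute-force key search. For Question 2 it yields R(X, Y), R(Y, P), R(Z, W) and R(X, Z), and for Question 1 it gives the two tables { X, Y } and { Y, Z }.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure {attrs}+ under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    schema, keys = set(schema), []
    for size in range(1, len(schema) + 1):
        for cand in map(set, combinations(sorted(schema), size)):
            if not any(k <= cand for k in keys) and closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def synthesize_3nf(schema, fds):
    """3NF synthesis; assumes fds is already a minimal cover."""
    groups = {}
    for lhs, rhs in fds:                      # one relation (LHS union RHS) per distinct LHS
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    relations = {frozenset(attrs) for attrs in groups.values()}
    keys = candidate_keys(schema, fds)
    if not any(key <= rel for key in keys for rel in relations):
        relations.add(frozenset(keys[0]))     # make sure some relation holds a candidate key
    # Drop any relation that is a proper subset of another one.
    kept = [rel for rel in relations if not any(rel < other for other in relations)]
    return sorted("".join(sorted(rel)) for rel in kept)


print(synthesize_3nf("XYZWP", [("X", "Y"), ("Y", "P"), ("Z", "W")]))
# ['PY', 'WZ', 'XY', 'XZ']  -> R(X, Y), R(Y, P), R(Z, W), R(X, Z)
print(synthesize_3nf("XYZ", [("X", "Y"), ("Y", "Z")]))
# ['XY', 'YZ']
```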
Transaction
• A transaction is a set of operations that are all logically related.

OR

• A transaction is a single logical unit of work formed by a set of operations.
Operations in Transaction
1. Read Operation
• Read operation reads the data from the database and then
stores it in the buffer in main memory.
• For example- Read(A) instruction will read the value of A from
the database and will store it in the buffer in main memory.
2. Write Operation
• Write operation writes the updated data value back to the
database from the buffer.
• For example- Write(A) will write the updated value of A from
the buffer to the database.
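The effect of Read and Write can be shown with a toy model that follows the simplified description above (in a real DBMS, the buffer manager decides when modified pages actually reach the disk). In this Python sketch the `Transaction` class, its `db` and `buffer` dictionaries, and the transfer of 50 from A to B are all illustrative assumptions:

```python
class Transaction:
    """Toy model of Read/Write: values move between a shared database and a private buffer."""

    def __init__(self, database):
        self.db = database   # the "database" (a shared dict)
        self.buffer = {}     # this transaction's buffer in main memory

    def read(self, item):
        self.buffer[item] = self.db[item]   # Read(X): database -> buffer

    def write(self, item):
        self.db[item] = self.buffer[item]   # Write(X): buffer -> database


db = {"A": 100, "B": 200}
t = Transaction(db)

t.read("A")
t.buffer["A"] -= 50        # A := A - 50 changes only the buffered copy
print(db["A"])             # 100: the database still holds the old value
t.write("A")

t.read("B")
t.buffer["B"] += 50        # B := B + 50
t.write("B")
print(db)                  # {'A': 50, 'B': 250}
```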
Transaction States
• A transaction goes through many different states throughout
its life cycle.
• Transaction states are as follows-
– Active state
– Partially committed state
– Committed state
– Failed state
– Aborted state
– Terminated state
1. Active State
• This is the first state in the life cycle of a transaction.
• A transaction is called in an active state as long as its
instructions are getting executed.
• At this stage, all the changes made by the transaction are stored in the buffer in main memory.
2. Partially Committed State
• After the last instruction of transaction has executed, it enters
into a partially committed state.
• After entering this state, the transaction is considered to be
partially committed.
• It is not considered fully committed because all the changes
made by the transaction are still stored in the buffer in main
memory.
3. Committed State
• After all the changes made by the transaction have been
successfully stored into the database, it enters into
a committed state.
• Now, the transaction is considered to be fully committed.
4. Failed State
• When a transaction is getting executed in the active state or
partially committed state and some failure occurs due to
which it becomes impossible to continue the execution, it
enters into a failed state.
5. Aborted State
• After the transaction has failed and entered into a failed state,
all the changes made by it have to be undone.
• To undo the changes made by the transaction, it becomes
necessary to roll back the transaction.
• After the transaction has rolled back completely, it enters into
an aborted state.
6. Terminated State
• This is the last state in the life cycle of a transaction.
• After entering the committed state or aborted state, the
transaction finally enters into a terminated state where its life
cycle finally comes to an end.
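The allowed transitions can be written down as a small state machine. This Python sketch encodes only the moves described above; `State`, `NEXT` and `TransactionLifecycle` are hypothetical names, not part of any DBMS API:

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"
    TERMINATED = "terminated"


# Legal moves between the states described above.
NEXT = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.ABORTED},        # after the rollback completes
    State.ABORTED: {State.TERMINATED},
    State.TERMINATED: set(),              # end of the life cycle
}


class TransactionLifecycle:
    def __init__(self):
        self.state = State.ACTIVE         # every transaction starts here
        self.history = [self.state]

    def move_to(self, new_state):
        if new_state not in NEXT[self.state]:
            raise ValueError(f"illegal move: {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


ok = TransactionLifecycle()
for s in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    ok.move_to(s)
print([s.value for s in ok.history])
# ['active', 'partially committed', 'committed', 'terminated']

bad = TransactionLifecycle()
try:
    bad.move_to(State.COMMITTED)          # cannot skip the partially committed state
except ValueError as err:
    print(err)                            # illegal move: active -> committed
```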
ACID Properties OF Transaction

• It is important to ensure that the database remains consistent before and after the transaction.
• To ensure the consistency of the database, certain properties are followed by all the transactions occurring in the system.
• These properties are called the ACID properties of a transaction.
1. Atomicity
• This property ensures that either the transaction occurs completely or it does not occur at all.
• In other words, it ensures that no transaction occurs partially.
• That is why it is also referred to as the “All or nothing rule”.
• It is the responsibility of the Transaction Control Manager to ensure atomicity of the transactions.
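From an application's point of view, atomicity can be observed with SQLite through Python's standard `sqlite3` module. Used as a context manager, a connection commits when the block succeeds and rolls back when it raises. In this sketch the `account` table, the `transfer` and `balances` helpers, and the amounts are hypothetical; the CHECK constraint forces the second transfer to fail after its first UPDATE has already run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:  # commit the initial rows
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])


def transfer(conn, src, dst, amount):
    # `with conn` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


transfer(conn, "A", "B", 30)
print(balances(conn))  # {'A': 70, 'B': 80}

try:
    transfer(conn, "A", "B", 500)  # credit to B runs, then the debit violates the CHECK
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'A': 70, 'B': 80}: the credit to B was rolled back as well
```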
2. Consistency
• This property ensures that integrity constraints are
maintained.
• In other words, it ensures that the database remains
consistent before and after the transaction.
• It is the responsibility of the DBMS and the application programmer to ensure consistency of the database.
3. Isolation
• Transactions can occur simultaneously without causing any inconsistency.
• During execution, each transaction behaves as if it were executing alone in the system.
• A transaction does not realize that other transactions are being executed in parallel.
• Changes made by a transaction become visible to other transactions only after the transaction commits.
• The resultant state of the system after executing all the transactions is the same as the state that would be achieved if the transactions were executed serially, one after the other.
• It is the responsibility of the concurrency control manager to ensure isolation for all the transactions.
4. Durability
• This property ensures that all the changes made by a
transaction after its successful execution are written
successfully to the disk.
• It also ensures that these changes exist permanently and are
never lost even if there occurs a failure of any kind.
• It is the responsibility of the recovery manager to ensure durability in the database.
