Dbms Notes


UNIT – 1

Database
A database is a collection of inter-related data organized so that the data can
be retrieved, inserted, and deleted efficiently. The data is organized in the form
of tables, schemas, views, reports, etc.

For example: The college Database organizes the data about the admin, staff,
students and faculty etc.

Using the database, you can easily retrieve, insert, and delete the information.

Database Management System


o A database management system (DBMS) is software used to manage a
database. For example, MySQL and Oracle are popular database systems
used in many different applications.

o A DBMS provides an interface to perform operations such as database
creation, storing data in it, updating data, creating a table in the database,
and a lot more.

o It provides protection and security to the database. In the case of multiple
users, it also maintains data consistency.

A DBMS allows users to perform the following tasks:

o Data Definition: It is used for the creation, modification, and removal of the
definitions that describe the organization of data in the database.

o Data Updation: It is used for the insertion, modification, and deletion of the
actual data in the database.

o Data Retrieval: It is used to retrieve the data from the database which can
be used by applications for various purposes.

o User Administration: It is used for registering and monitoring users,
maintaining data integrity, enforcing data security, dealing with concurrency
control, monitoring performance, and recovering information corrupted by
unexpected failure.

Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the
information.

o It can provide a clear and logical view of the process that manipulates data.

o DBMS contains automatic backup and recovery procedures.

o It supports the ACID properties, which keep data in a consistent state in case
of failure.

o It can reduce the complex relationship between data.

o It is used to support manipulation and processing of data.

o It is used to provide security of data.

o It can present the database from different viewpoints according to the
requirements of the user.

Advantages of DBMS
o Controls data redundancy: It can control data redundancy because all the
data is stored in one single database instead of in separate copies.

o Data sharing: In a DBMS, the authorized users of an organization can share
the data among multiple users.

o Easy maintenance: It is easy to maintain due to the centralized nature of
the database system.

o Reduces time: It reduces development time and maintenance needs.

o Backup: It provides backup and recovery subsystems which create automatic

Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high-speed processor and a
large memory size to run DBMS software.

o Size: It occupies a large amount of disk space and memory to run
efficiently.

o Complexity: A database system creates additional complexity and
requirements.

o Higher impact of failure: A failure has a large impact because, in most
organizations, all the data is stored in a single database. If the database is
damaged due to electric failure or database corruption, the data may be
lost forever.

Types of Database :

1) Centralized Database
It is the type of database that stores data at a centralized database system. It
lets users access the stored data from different locations through several
applications. These applications include an authentication process so that users
access data securely. An example of a centralized database is a central library
that holds a central database of each library in a college/university.

Advantages of Centralized Database


o It decreases the risk of data management, i.e., manipulation of data will
not affect the core data.

o Data consistency is maintained as it manages data in a central repository.

o It provides better data quality, which enables organizations to establish data
standards.

o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database


o The size of the centralized database is large, which increases the response
time for fetching the data.

o It is not easy to update such an extensive database system.

o If the server fails, the entire data may be lost, which could be a huge
loss.

2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed
among different database systems of an organization. These database systems are
connected via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite,
etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Database systems that run on the same operating
system, use the same application process, and run on the same hardware
devices.

o Heterogeneous DDB: Database systems that run on different operating
systems, under different application procedures, and on different hardware
devices.

3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows (tuples) and columns (attributes) that together form a table (relation). A
relational database uses SQL for storing, manipulating, and maintaining the
data. E.F. Codd proposed the relational model in 1970. Each table in the database
has a key that makes each row unique from the others. Examples of relational
databases are MySQL, Microsoft SQL Server, Oracle, etc.
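
As a minimal sketch of the idea, the following Python code uses the built-in sqlite3 module to create one relation with a key. The student table, its columns, and the sample rows are hypothetical and only serve as an illustration.

```python
import sqlite3

# In-memory database; the student table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_number INTEGER PRIMARY KEY,"  # the key that makes each row unique
    " name TEXT NOT NULL,"
    " age INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', 20)")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', 21)")

# A second row with the same key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 30)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT roll_number, name FROM student ORDER BY roll_number"
).fetchall()
```

Each row is a tuple, each column an attribute, and the primary key is what stops two rows from being confused with each other.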

4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of
data sets. It is not a relational database as it stores data not only in tabular form but
in several different ways. It came into existence when the demand for building
modern applications increased. Thus, NoSQL presented a wide variety of database
technologies in response to the demands. We can further divide a NoSQL database
into the following four types:

a. Key-value storage: It is the simplest type of database storage, where
every single item is stored as a key (or attribute name) together with its
value.

b. Document-oriented Database: A type of database used to store data as
JSON-like documents. It helps developers store data using the same
document-model format as used in the application code.

c. Graph Databases: Used for storing vast amounts of data in a graph-like
structure. Most commonly, social networking websites use graph
databases.

d. Wide-column stores: Similar to the data represented in relational
databases. Here, data is stored in large columns together, instead of
in rows.
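
The four shapes can be pictured with plain Python data structures. This is only a rough sketch of how the data is laid out, not how any particular NoSQL product stores it; all keys and values below are made up.

```python
# Key-value: each item is a key holding one opaque value.
key_value = {"session:42": "user=asha;theme=dark"}

# Document: a JSON-like nested record.
document = {"_id": 1, "name": "Asha", "phones": ["555-0100", "555-0101"]}

# Graph: nodes plus the edges between them (adjacency lists).
graph = {"asha": ["ravi", "meena"], "ravi": ["asha"], "meena": []}

# Wide-column: row key -> column family -> columns.
# Different rows may carry different columns.
wide_column = {
    "row1": {"profile": {"name": "Asha"}, "stats": {"logins": 12}},
    "row2": {"profile": {"name": "Ravi", "city": "Pune"}},
}
```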

5) Cloud Database
A type of database where data is stored in a virtual environment and executes over
the cloud computing platform. It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous
cloud platforms; some well-known options are:

o Amazon Web Services(AWS)

o Microsoft Azure

o Kamatera

o phoenixNAP

o ScienceSoft

o Google Cloud SQL, etc.

6) Object-oriented Databases
This type of database uses the object-based data model approach for storing
data in the database system. The data is represented and stored as objects, which
are similar to the objects used in object-oriented programming languages.

7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship
nodes. Here, it organizes data in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record
in the tree has exactly one parent. On the other hand, each parent record can
have multiple child records.

8) Network Databases
It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them.
Unlike the hierarchical database, it allows each record to have multiple children and
parent nodes to form a generalized graph structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.

10) Operational Database


This type of database creates and updates data in real time. It is
designed for executing and handling the daily data operations of a
business. For example, an organization uses an operational database for managing
its daily transactions.

11) Enterprise Database


Large organizations or enterprises use this database for managing a massive
amount of data. It helps organizations to increase and improve their efficiency. Such
a database allows simultaneous access by many users.

What is File System?
A file management system allows access to single files or tables at a time.
In a file system, data is stored directly in a set of files. It contains flat
files that have no relation to other files (when only one table is stored in a
single file, that file is known as a flat file).

Advantages of DBMS over File system :

• Data redundancy and inconsistency –
Redundancy is the repetition of data, i.e. each piece of data may have
more than one copy. The file system cannot control redundancy because
each user defines and maintains the files needed for a specific
application. Two users may be maintaining the same data in separate
files for different applications. Changes made by one user are then not
reflected in the files used by the other, which leads to inconsistency
of data. A DBMS controls redundancy by maintaining a single repository
of data that is defined once and accessed by many users. As there is
little or no redundancy, data remains consistent.
• Data sharing –
A file system does not allow sharing of data, or sharing is too complex.
In a DBMS, data can be shared easily because the system is centralized.
• Data concurrency –
Concurrent access to data means more than one user is accessing the
same data at the same time. Anomalies occur when changes made by
one user are lost because of changes made by another user. A file system
does not provide any procedure to stop such anomalies, whereas a DBMS
provides a locking system to prevent them.
• Data searching –
For every search operation performed on a file system, a different
application program has to be written. A DBMS provides built-in
searching operations: the user only has to write a small query to
retrieve data from the database.
• Data integrity –
Some constraints may need to be applied to the data before it is
inserted into the database. The file system does not provide any
procedure to check these constraints automatically, whereas a DBMS
maintains data integrity by enforcing user-defined constraints on the
data itself.
• System crashing –
Systems may crash for various reasons. This is a serious problem for
file systems because once the system crashes, the lost data cannot be
recovered. A DBMS has a recovery manager that restores the data, which
is another advantage over file systems.

• Data security –
A file system provides a password mechanism to protect the data, but
nobody can guarantee how long a password stays protected. A DBMS has
specialized security features that help shield its data.
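
Two of the points above, data integrity and data searching, can be sketched with Python's built-in sqlite3 module. The account table, its CHECK constraint, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table: the CHECK constraint is enforced by the DBMS itself.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 'Asha', 500)")
conn.execute("INSERT INTO account VALUES (2, 'Ravi', 50)")

# Data integrity: an invalid row is refused without any application-side check.
try:
    conn.execute("INSERT INTO account VALUES (3, 'Meena', -10)")
    negative_rejected = False
except sqlite3.IntegrityError:
    negative_rejected = True

# Data searching: one short query instead of a purpose-built program.
rich = conn.execute("SELECT owner FROM account WHERE balance > 100").fetchall()
```

With plain files, both the balance check and the search would have to be written by hand in every application that touches the file.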

Data Abstraction
Database systems are made up of complex data structures. To ease user
interaction with the database, developers hide irrelevant internal details
from users. This process of hiding irrelevant details from the user is called
data abstraction.

We have three levels of abstraction:


Physical level: This is the lowest level of data abstraction. It describes how
data is actually stored in database. You can get the complex data structure
details at this level.

Logical level: This is the middle level of 3-level data abstraction architecture.
It describes what data is stored in database.

View level: Highest level of data abstraction. This level describes the user
interaction with database system.

Example: Let's say we are storing customer information in a customer table.
At the physical level, these records can be described as blocks of storage
(bytes, gigabytes, terabytes, etc.) in memory. These details are often hidden
from the programmers.

Database Language
o A DBMS has appropriate languages and interfaces to express database
queries and updates.

o Database languages can be used to read, store and update the data in the
database.

Types of Database Language


1. Data Definition Language
o DDL stands for Data Definition Language. It is used to define database
structure or pattern.

o It is used to create schema, tables, indexes, constraints, etc. in the database.

o Using the DDL statements, you can create the skeleton of the database.

o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.
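
A minimal sketch of DDL in action, using Python's built-in sqlite3 module. The course table and index names are hypothetical, and PRAGMA table_info is specific to SQLite; other systems expose the same metadata through their own catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements build the skeleton; no rows are stored yet.
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE)"
)
conn.execute("CREATE INDEX idx_course_title ON course (title)")
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

# The column list is metadata that the DBMS keeps about the table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]
row_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
```

After these statements the table has a structure of three columns but still holds zero rows, which is what "skeleton of the database" means here.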

2. Data Manipulation Language


DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.

o Insert: It is used to insert data into a table.

o Update: It is used to update existing data within a table.

o Delete: It is used to delete records from a table.

o Merge: It performs an UPSERT operation, i.e., an insert or an update.

o Call: It is used to call a stored subprogram, for example a PL/SQL or Java
subprogram in Oracle.

o Explain Plan: It shows the execution plan the DBMS would use for a statement.

o Lock Table: It controls concurrency.
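
The four basic DML statements can be sketched with Python's built-in sqlite3 module. The student table and its rows are hypothetical. Merge, Call, Explain Plan, and Lock Table are left out because their syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# INSERT: add rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 21), (3, "Meena", 22)],
)
# UPDATE: change the existing rows that match the WHERE clause.
conn.execute("UPDATE student SET age = 23 WHERE roll_number = 3")
# DELETE: remove only the matching rows; without WHERE, every row would be removed.
conn.execute("DELETE FROM student WHERE roll_number = 2")
# SELECT: read the result back.
remaining = conn.execute(
    "SELECT roll_number, name, age FROM student ORDER BY roll_number"
).fetchall()
```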

3. Data Control Language


o DCL stands for Data Control Language. It is used to control access to the
stored data by granting and revoking privileges.

o The DCL execution is transactional. It also has rollback parameters.

(But in Oracle database, the execution of data control language cannot be
rolled back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.

o Revoke: It is used to take back permissions from the user.

The following privileges can be granted and revoked:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language


TCL is used to manage the changes made by DML statements. It allows DML
statements to be grouped into a logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction to the database.

o Rollback: It is used to restore the database to its state at the last Commit.
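
A minimal sketch of Commit and Rollback using Python's built-in sqlite3 module, where conn.commit() and conn.rollback() issue the corresponding TCL commands. The account table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # COMMIT: the inserted row is now permanent

conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()  # ROLLBACK: undo everything since the last commit

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The update is discarded, so the balance read back is the committed value of 100.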

Data Independence
o Data independence can be explained using the three-schema architecture.

o Data independence is the ability to modify the schema at one level of the
database system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence


o Logical data independence is the ability to change the conceptual schema
without having to change the external schema.

o Logical data independence is used to separate the external level from the
conceptual view.

o If we make any changes in the conceptual view of the data, the user view
of the data is not affected.

o Logical data independence occurs at the user interface level.
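
One common way to get this effect in practice is a view. The sketch below, using Python's built-in sqlite3 module with hypothetical table and view names, splits one table into two while the application's query stays the same; only the view definition is rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Pune')")
# External schema: the application only ever queries this view.
conn.execute("CREATE VIEW student_view AS SELECT id, name, city FROM student")
app_query = "SELECT name, city FROM student_view WHERE id = 1"
before = conn.execute(app_query).fetchall()

# Conceptual schema change: split student into two tables.
conn.execute("CREATE TABLE student_core (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE student_address (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO student_core SELECT id, name FROM student")
conn.execute("INSERT INTO student_address SELECT id, city FROM student")
conn.execute("DROP VIEW student_view")
conn.execute("DROP TABLE student")
# Only the external/conceptual mapping (the view definition) is rewritten.
conn.execute(
    "CREATE VIEW student_view AS "
    "SELECT c.id, c.name, a.city "
    "FROM student_core c JOIN student_address a ON a.id = c.id"
)
after = conn.execute(app_query).fetchall()
```

The application query string never changes, and it returns the same rows before and after the split.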

2. Physical Data Independence


o Physical data independence can be defined as the capacity to change the
internal schema without having to change the conceptual schema.

o If we make any changes in the storage of the database system server, the
conceptual structure of the database is not affected.

o Physical data independence is used to separate the conceptual level from
the internal level.

o Physical data independence occurs at the logical interface level.
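
As a small illustration, adding an index changes how the data is stored and located but not what a query returns. The sketch uses Python's built-in sqlite3 module; the table, index, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meena", "Pune")],
)
query = "SELECT name FROM student WHERE city = 'Pune' ORDER BY id"
before = conn.execute(query).fetchall()

# Internal-level change: a new index alters the storage structures,
# but the table definition and the query stay exactly the same.
conn.execute("CREATE INDEX idx_student_city ON student (city)")
after = conn.execute(query).fetchall()
```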


DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers,
database servers and other components that are connected with networks.

o The client/server architecture consists of many PCs and a workstation which
are connected via the network.

o DBMS architecture depends upon how users are connected to the database to
get their request done.

Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. Logically,
however, database architecture is of two types: 2-tier architecture and 3-tier
architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user. It means
the user can work directly on the DBMS and use it.

o Any changes done here are made directly on the database itself. It doesn't
provide a handy tool for end users.

o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick
response.

2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture.
In the two-tier architecture, applications on the client end can directly
communicate with the database at the server side. For this interaction, APIs
like ODBC and JDBC are used.

o The user interfaces and application programs run on the client side.

o The server side is responsible for providing functionality such as query
processing and transaction management.

o To communicate with the DBMS, the client-side application establishes a
connection with the server side.

Fig: 2-tier Architecture


3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and server.
In this architecture, the client can't directly communicate with the server.

o The application on the client end interacts with an application server, which
in turn communicates with the database system.

o The end user has no idea about the existence of the database beyond the
application server. The database also has no idea about any other user
beyond the application.

o The 3-Tier architecture is used for large web applications.

Fig: 3-tier Architecture
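
The three tiers can be sketched in a few lines of Python. This is a toy model in a single process: an in-memory SQLite database stands in for the database server, and the function names get_product and client_view are hypothetical. In a real deployment each tier would run on its own machine and talk over the network.

```python
import sqlite3

# Tier 3: database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
db.execute("INSERT INTO product VALUES (1, 'Pen', 10)")

# Tier 2: application server; the only code that talks to the database.
def get_product(product_id):
    row = db.execute(
        "SELECT name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return {"status": 404}
    return {"status": 200, "name": row[0], "price": row[1]}

# Tier 1: client; it sees only the application server's response, never SQL.
def client_view(product_id):
    response = get_product(product_id)
    return response["name"] if response["status"] == 200 else "not found"
```

The client never holds a database connection, which mirrors the point that the end user has no idea about the database beyond the application server.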

Three schema Architecture


o The three schema architecture is also called ANSI/SPARC architecture or
three-level architecture.

o This framework is used to describe the structure of a specific database
system.

o The three schema architecture is also used to separate the user
applications and the physical database.

o The three schema architecture contains three levels. It breaks the
database down into three different categories.

The three-schema architecture (three view of data)

In the above diagram:

o It shows the DBMS architecture.

o Mapping is used to transform requests and responses between the various
levels of the database architecture.

o Mapping adds overhead, so it is less suitable for a small DBMS.

o In External / Conceptual mapping, the request is transformed from the
external level to the conceptual schema.

o In Conceptual / Internal mapping, the DBMS transforms the request from the
conceptual to the internal level.

1. Internal Level
o The internal level has an internal schema which describes the physical
storage structure of the database.

o The internal schema is also known as a physical schema.

o It uses the physical data model. It defines how the data will be stored in
a block.

o The physical level is used to describe complex low-level data structures in
detail.

2. Conceptual Level
o The conceptual schema describes the design of a database at the
conceptual level. Conceptual level is also known as logical level.

o The conceptual schema describes the structure of the whole database.

o The conceptual level describes what data are to be stored in the database
and also describes what relationship exists among those data.

o At the conceptual level, internal details such as the implementation of the
data structures are hidden.

o Programmers and database administrators work at this level.

3. External Level
o At the external level, a database contains several schemas that are sometimes
called subschemas. A subschema is used to describe a particular view of the
database.

o An external schema is also known as a view schema.

o Each view schema describes the part of the database that a particular user
group is interested in and hides the remaining database from that user group.

o The view schema describes the end user's interaction with the database
system.

UNIT – 2

Data Models
Data models define how the logical structure of a database is modeled. Data Models
are fundamental entities to introduce abstraction in a DBMS. Data models define how
data is connected to each other and how they are processed and stored inside the
system.

1) Relational Data Model: This type of model organizes the data in the form of rows
and columns within a table. Thus, a relational model uses tables for representing
data and the relationships among them. Tables are also called relations. This model
was initially described by Edgar F. Codd in 1969. The relational data model is a
widely used model, primarily used by commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of
data as objects and relationships among them. These objects are known as entities,
and a relationship is an association among these entities. This model was designed by
Peter Chen and published in a 1976 paper. It has been widely used in database design.
A set of attributes describes an entity. For example, student_name and student_id
describe the 'student' entity. A set of the same type of entities is known as an
'entity set', and a set of the same type of relationships is known as a 'relationship
set'.

3) Object-based Data Model: An extension of the ER model with notions of
functions, encapsulation, and object identity. This model supports a rich
type system that includes structured and collection types. In the 1980s, various
database systems following the object-oriented approach were developed. Here, the
objects are the data together with its properties.

4) Semistructured Data Model: This type of data model is different from the
other three data models (explained above). The semistructured data model allows
data specifications in which individual data items of the same type
may have different sets of attributes. The Extensible Markup Language, also known as
XML, is widely used for representing semistructured data. Although XML was
initially designed for adding markup information to text documents, it gained
importance because of its application in the exchange of data.

ER Model
The ER model defines the conceptual view of a database. It works around real-world
entities and the associations among them. At view level, the ER model is considered
a good option for designing databases.

Entity
An entity can be a real-world object, either animate or inanimate, that can be easily
identifiable. For example, in a school database, students, teachers, classes, and
courses offered can be considered as entities. All these entities have some attributes
or properties that give them their identity. Entities are represented by means of
rectangles. Rectangles are named with the entity set they represent.

Attributes
Entities are represented by means of their properties, called attributes. All attributes
have values. For example, a student entity may have name, class, and age as
attributes.
Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity
(rectangle).
If the attributes are composite, they are further divided in a tree like structure. Every
node is then connected to its attribute. That is, composite attributes are represented
by ellipses that are connected with an ellipse.

Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be
divided further. For example, a student's phone number is an atomic value of
10 digits.
• Composite attribute − Composite attributes are made of more than one
simple attribute. For example, a student's complete name may have
first_name and last_name.

• Derived attribute − Derived attributes are attributes that do not exist in the
physical database; their values are derived from other attributes present in
the database. For example, average_salary in a department should not be
saved directly in the database; instead, it can be derived. As another example,
age can be derived from date_of_birth.

• Single-value attribute − Single-value attributes contain a single value. For
example − Social_Security_Number.
• Multi-value attribute − Multi-value attributes may contain more than one
value. For example, a person can have more than one phone number,
email_address, etc.
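
The derived attribute from the example above can be sketched as a small Python function: only date_of_birth is stored, and age is computed when needed. The function name age_on is hypothetical.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Derive age in whole years from the stored date_of_birth."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

For example, age_on(date(2000, 6, 15), date(2024, 6, 14)) gives 23, and one day later it gives 24. Storing age directly would go stale; deriving it does not.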

Entity-Set and Keys


A key is an attribute or collection of attributes that uniquely identifies an entity
within an entity set.
For example, the roll_number of a student makes him/her identifiable among
students.
• Super Key − A set of attributes (one or more) that collectively identifies an
entity in an entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set
may have more than one candidate key.
• Primary Key − A primary key is one of the candidate keys chosen by the
database designer to uniquely identify the entity set.
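
The difference between a super key and a candidate key can be sketched in Python. Note the assumption: a key is really a property of the schema (it must hold for every possible set of entities), while the helper functions below, which are hypothetical, only check one sample of distinct rows.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, attributes):
    """Minimal super keys: super keys that contain no smaller super key."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

# Hypothetical sample of the student entity set.
students = [
    {"roll_number": 1, "email": "asha@example.com", "name": "Asha"},
    {"roll_number": 2, "email": "ravi@example.com", "name": "Ravi"},
    {"roll_number": 3, "email": "asha2@example.com", "name": "Asha"},
]
keys = candidate_keys(students, ["roll_number", "email", "name"])
```

Here (roll_number, name) is a super key but not a candidate key, because roll_number alone already identifies each student. The designer would pick one of the candidate keys, roll_number or email, as the primary key.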

Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here, Works_at
and Enrolls are called relationships.

Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive
attributes.

Degree of Relationship
The number of participating entities in a relationship defines the degree of the
relationship.

• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n

Participation Constraints
• Total Participation − Each entity is involved in the relationship. Total
participation is represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial
participation is represented by single lines.

Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of
entities to which another entity can be related via a relationship set.

o It is most useful in describing binary relationship sets, although it can
also help describe relationship sets that involve more than two entity sets.

o For a binary relationship set R on entity sets E1 and E2, there are four
possible mapping cardinalities. These are as follows:

1. One to one (1:1)

2. One to many (1:M)

3. Many to one (M:1)

4. Many to many (M:M)

One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2,
and an entity in E2 is associated with at most one entity in E1.

One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in
E2, and an entity in E2 is associated with at most one entity in E1.

Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2,
and an entity in E2 is associated with any number of entities in E1.

Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities
in E2, and an entity in E2 is associated with any number of entities in E1.
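
The four definitions above can be turned into a small Python check. This is a sketch under one assumption: a mapping cardinality is a constraint on the schema, while the hypothetical function below only classifies what a given set of (E1, E2) pairs actually exhibits.

```python
from collections import Counter

def mapping_cardinality(pairs):
    """Classify a relationship set given as (e1, e2) pairs."""
    pairs = set(pairs)
    per_e1 = Counter(e1 for e1, _ in pairs)  # number of E2 partners of each E1 entity
    per_e2 = Counter(e2 for _, e2 in pairs)  # number of E1 partners of each E2 entity
    e1_to_many = any(n > 1 for n in per_e1.values())
    e2_to_many = any(n > 1 for n in per_e2.values())
    if e1_to_many and e2_to_many:
        return "M:M"
    if e1_to_many:
        return "1:M"
    if e2_to_many:
        return "M:1"
    return "1:1"
```

For example, a department related to two employees, [("d1", "e1"), ("d1", "e2")], is classified as 1:M, and the same pairs with the positions swapped are classified as M:1.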

ER Diagram Symbols and Notations

Entity
An entity can be a person, place, event, or object that is relevant to a given system. For
example, a school system may include students, teachers, major courses, subjects, fees, and
other items. Entities are represented in ER diagrams by a rectangle and named using singular
nouns.

Weak Entity-
A weak entity is an entity that depends on the existence of another entity. In
more technical terms, it can be defined as an entity that cannot be identified by
its own attributes. It uses a foreign key combined with its own attributes to form
the primary key. An entity like order item is a good example of this. The order item
is meaningless without an order, so it depends on the existence of the order.
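
The order item example can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, and ON DELETE CASCADE is one possible design choice for expressing the existence dependency, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_item ("
    " order_id INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,"
    " line_no INTEGER NOT NULL,"  # partial key: unique only within one order
    " product TEXT,"
    " PRIMARY KEY (order_id, line_no))"  # foreign key + own attribute
)
conn.execute("INSERT INTO orders VALUES (100)")
conn.execute("INSERT INTO order_item VALUES (100, 1, 'Pen')")

# An item cannot exist without its owning order.
try:
    conn.execute("INSERT INTO order_item VALUES (999, 1, 'Ink')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the order removes its items as well.
conn.execute("DELETE FROM orders WHERE order_id = 100")
items_left = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
```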

Strong Entity Set-

• A strong entity set is an entity set that contains sufficient attributes to uniquely identify
all its entities.
• In other words, a primary key exists for a strong entity set.
• Primary key of a strong entity set is represented by underlining it.
• Difference between Strong and Weak Entity:

S.NO  Strong Entity                         Weak Entity

1.    Strong entity has a primary key.      Weak entity has a partial
                                            discriminator key.

2.    Strong entity is not dependent on     Weak entity depends on a strong
      any other entity.                     entity.

3.    Strong entity is represented by a     Weak entity is represented by a
      single rectangle.                     double rectangle.

4.    The relationship between two strong   The relationship between one strong
      entities is represented by a single   and one weak entity is represented
      diamond.                              by a double diamond.

5.    Strong entity may or may not have     Weak entity always has total
      total participation.                  participation.

What is an Entity Relationship Diagram (ER Diagram)?
An ER diagram shows the relationships among entity sets. An entity set is a
group of similar entities, and these entities can have attributes. In terms of
DBMS, an entity is a table or an attribute of a table in the database, so by
showing the relationships among tables and their attributes, an ER diagram shows
the complete logical structure of a database. Let's have a look at a simple ER
diagram to understand this concept.

UNIT – 3
The relational data model is the primary data model, used widely around the
world for data storage and processing. This model is simple and has all the
properties and capabilities required to process data with storage efficiency.

Concepts
Tables − In the relational data model, relations are saved in the format of tables.
This format stores the relation among entities. A table has rows and columns, where
rows represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation, is
called a tuple.
Relation instance − A finite set of tuples in the relational database system
represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name),
attributes, and their names.
Relation key − Each row has one or more attributes, known as the relation key, which
can identify the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as
attribute domain.
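
These concepts map loosely onto plain Python values. The sketch below is only an analogy: the Student schema and the age domain check are hypothetical, and a Python set is used because a relation instance, being a set of tuples, cannot hold duplicates.

```python
from collections import namedtuple

# Relation schema: relation name plus attribute names (hypothetical example).
Student = namedtuple("Student", ["roll_number", "name", "age"])

# Attribute domain: the allowed values of one attribute.
def in_age_domain(value):
    return isinstance(value, int) and 0 <= value <= 150

# Relation instance: a finite set of tuples, so a repeated tuple collapses to one.
instance = {
    Student(1, "Asha", 20),
    Student(2, "Ravi", 21),
    Student(1, "Asha", 20),  # duplicate of the first tuple
}
```

Here roll_number plays the role of the relation key, and the instance ends up with two tuples, not three.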

CODD'S RULES
Dr Edgar F. Codd, after his extensive research on the relational model of database
systems, came up with twelve rules which, according to him, a database must
obey in order to be regarded as a true relational database.

Rule 1: Information Rule


The data stored in a database, whether user data or metadata, must be a value of
some table cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule


Every single data element (value) is guaranteed to be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column
value). No other means, such as pointers, can be used to access data.

Rule 3: Systematic Treatment of NULL Values


The NULL values in a database must be given a systematic and uniform treatment.
This is a very important rule because a NULL can be interpreted as one of the
following − data is missing, data is not known, or data is not applicable.
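
How SQL treats NULL differently from ordinary values can be sketched with Python's built-in sqlite3 module. The student table is hypothetical; Python's None is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "555-0100"), (2, None), (3, "")],
)

# NULL is not the empty string and is not equal to anything, even itself.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM student WHERE phone = NULL"
).fetchone()[0]
is_null = conn.execute("SELECT id FROM student WHERE phone IS NULL").fetchall()
# Aggregates skip NULLs: COUNT(phone) counts only the non-NULL values.
counted = conn.execute("SELECT COUNT(*), COUNT(phone) FROM student").fetchone()
```

The comparison with = matches no rows, IS NULL finds only the row with the missing phone, and the empty string in row 3 is counted as a real value.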

Rule 4: Active Online Catalog


The structure description of the entire database must be stored in an online catalog,
known as data dictionary, which can be accessed by authorized users. Users can
use the same query language to access the catalog which they use to access the
database itself.

Rule 5: Comprehensive Data Sub-Language Rule


A database can only be accessed using a language having linear syntax that supports
data definition, data manipulation, and transaction management operations. This
language can be used directly or by means of some application. If the database
allows access to data without any help of this language, then it is considered as a
violation.

Rule 6: View Updating Rule


All the views of a database, which can theoretically be updated, must also be
updatable by the system.

Rule 7: High-Level Insert, Update, and Delete Rule


A database must support high-level insertion, updation, and deletion. This must not
be limited to a single row, that is, it must also support union, intersection and minus
operations to yield sets of data records.

Rule 8: Physical Data Independence


The data stored in a database must be independent of the applications that access
the database. Any change in the physical structure of a database must not have any
impact on how the data is being accessed by external applications.

Rule 9: Logical Data Independence


The logical data in a database must be independent of its user’s view (application).
Any change in logical data must not affect the applications using it. For example, if
two tables are merged or one is split into two different tables, there should be no
impact or change on the user application. This is one of the most difficult rules to apply.

Rule 10: Integrity Independence


A database must be independent of the application that uses it. All its integrity
constraints can be independently modified without the need of any change in the
application. This rule makes a database independent of the front-end application and
its interface.

Rule 11: Distribution Independence


The end-user must not be able to see that the data is distributed over various
locations. Users should always get the impression that the data is located at one site
only. This rule has been regarded as the foundation of distributed database systems.

Rule 12: Non-Subversion Rule


If a system has an interface that provides access to low-level records, then the
interface must not be able to subvert the system and bypass security and integrity
constraints.

Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations
as input and yields instances of relations as output. It uses operators to perform
queries. An operator can be either unary or binary. They accept relations as their
input and yield relations as their output. Relational algebra is performed recursively
on a relation and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −

• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
We will discuss all these operations in the following sections.

Select Operation (σ)


It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for the selection operator, p is the selection predicate, and r stands
for the relation. p is a propositional logic formula which may use connectors like and,
or, and not. Its terms may use relational operators like − =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or
those books published after 2010.
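The three selections above can be sketched in Python. This is only an illustration: the rows of Books and the helper name select are assumptions, not part of the notes.

```python
# Hypothetical Books relation: each tuple is a dict of attribute -> value.
books = [
    {"title": "DB Basics", "subject": "database", "price": 450, "year": 2008},
    {"title": "SQL Deep Dive", "subject": "database", "price": 600, "year": 2012},
    {"title": "Graph Theory", "subject": "math", "price": 450, "year": 2015},
]

def select(relation, predicate):
    """Return the tuples of `relation` for which `predicate` is true (sigma)."""
    return [t for t in relation if predicate(t)]

by_subject = select(books, lambda t: t["subject"] == "database")
by_subject_and_price = select(
    books, lambda t: t["subject"] == "database" and t["price"] == 450
)
# Python's `and` binds tighter than `or`, which matches the reading
# "(subject and price) or year" given in the third example.
mixed = select(
    books,
    lambda t: t["subject"] == "database" and t["price"] == 450 or t["year"] > 2010,
)
```

Selection never changes the columns of a relation; it only filters rows.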

Project Operation (∏)


It projects the listed column(s) of a relation and discards the rest.
Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.


For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.
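A minimal Python sketch of projection, assuming a small made-up Books relation. The helper name project is hypothetical; the result is built as a set so that duplicate rows disappear, as described above.

```python
# Hypothetical Books relation; two books share the same subject and author.
books = [
    {"title": "DB Basics", "subject": "database", "author": "Rao"},
    {"title": "DB Advanced", "subject": "database", "author": "Rao"},
    {"title": "Graph Theory", "subject": "math", "author": "Iyer"},
]

def project(relation, attributes):
    """Keep only `attributes` from each tuple; the set drops duplicate rows."""
    return {tuple(t[a] for a in attributes) for t in relation}

result = project(books, ["subject", "author"])
```

Three input rows give two output rows here, because the two database books by the same author collapse into one tuple.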

Union Operation (∪)


It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
Where r and s are either database relations or relation result sets (temporary relations).
For a union operation to be valid, the following conditions must hold −

• r and s must have the same number of attributes.
• Attribute domains must be compatible.
Duplicate tuples are automatically eliminated from the result.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an
article or both.
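The union of two projected author lists can be sketched with Python sets. The author names are made up, and the arity check is a simplified stand-in for the compatibility conditions listed above (it does not check domains).

```python
# Hypothetical single-attribute relations, stored as sets of tuples.
book_authors = {("Rao",), ("Iyer",)}
article_authors = {("Iyer",), ("Sen",)}

def union(r, s):
    """r ∪ s for two relations that have the same number of attributes."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same number of attributes")
    return r | s

all_authors = union(book_authors, article_authors)
```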

Set Difference (−)


The result of set difference operation is tuples, which are present in one relation but
are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
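A short Python sketch of set difference on made-up author relations. It also shows that r − s and s − r are generally different.

```python
# Hypothetical single-attribute relations, stored as sets of tuples.
book_authors = {("Rao",), ("Iyer",)}
article_authors = {("Iyer",), ("Sen",)}

# Authors who have written books but not articles.
only_books = book_authors - article_authors
# Reversing the operands answers a different question.
only_articles = article_authors - book_authors
```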
Cartesian Product (Χ)
Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by
tutorialspoint.
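The Cartesian product followed by a selection can be sketched as below. The rows are made up, and the attribute names are prefixed (b_ and a_) as an assumption, because both relations would otherwise have a column called author.

```python
from itertools import product

# Hypothetical relations with prefixed attribute names to avoid a name clash.
books = [
    {"b_title": "DB Basics", "b_author": "tutorialspoint"},
    {"b_title": "Graph Theory", "b_author": "Iyer"},
]
articles = [
    {"a_title": "Indexing", "a_author": "tutorialspoint"},
    {"a_title": "Sorting", "a_author": "Sen"},
]

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [{**q, **t} for q, t in product(r, s)]

pairs = cartesian_product(books, articles)
# Selection over the product: keep pairs where both authors match.
by_tp = [
    p for p in pairs
    if p["b_author"] == "tutorialspoint" and p["a_author"] == "tutorialspoint"
]
```

The product of two relations with 2 tuples each has 2 × 2 = 4 tuples, which is why a selection usually follows it.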

Rename Operation (ρ)


The results of relational algebra are also relations but without any name. The rename
operation allows us to rename the output relation. 'rename' operation is denoted with
small Greek letter rho ρ.
Notation − ρ x (E)
Where the result of expression E is saved under the name x.
Additional operations are −

• Set intersection
• Assignment
• Natural join

Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query
language, that is, it tells what to do but never explains how to do it.
Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfy a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns the 'name' of every tuple in Author whose article is on
'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.
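Python set comprehensions read much like tuple relational calculus, so the two queries above can be sketched as follows. The Authors rows are made up for illustration.

```python
# Hypothetical Authors relation.
authors = [
    {"name": "Rao", "article": "database"},
    {"name": "Sen", "article": "networks"},
    {"name": "Iyer", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
names = {t["name"] for t in authors if t["article"] == "database"}

# { R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
# `any` plays the role of the existential quantifier.
candidates = {t["name"] for t in authors}
names_exists = {
    r for r in candidates
    if any(t["article"] == "database" and t["name"] == r for t in authors)
}
```

Both forms state which tuples are wanted and leave the evaluation order to the system, which is the sense in which the calculus is non-procedural.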
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple
values (as done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }

Where a1, a2, ..., an are attributes and P stands for formulae built by inner attributes.
For example −
{< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject =
'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where
subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers.
DRC also involves relational operators.
The expressive power of Tuple Relational Calculus and Domain Relational Calculus is
equivalent to Relational Algebra.

ER Model to Relational Model


ER diagrams mainly consist of −

• Entity and its attributes


• Relationship, which is association among entities.

Mapping Entity
An entity is a real-world object with some attributes.

Mapping Process (Algorithm)

• Create table for each entity.


• Entity's attributes should become fields of tables with their respective data types.
• Declare primary key.
Mapping Relationship
A relationship is an association among entities.

Mapping Process

• Create table for a relationship.


• Add the primary keys of all participating Entities as fields of table with their respective
data types.
• If relationship has any attribute, add each attribute as field of table.
• Declare a primary key composing all the primary keys of participating entities.
• Declare all foreign key constraints
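The two mapping processes can be sketched with Python's built-in sqlite3 module. The entities student and course, the relationship enrolls, and all column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per entity, attributes become fields, primary key declared.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- One table per relationship: primary keys of the participating entities,
-- the relationship's own attribute, a composite primary key, foreign keys.
CREATE TABLE enrolls (
    student_id  INTEGER NOT NULL,
    course_id   INTEGER NOT NULL,
    enrolled_on TEXT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES student(student_id),
    FOREIGN KEY (course_id)  REFERENCES course(course_id)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The last query reads the table names back from SQLite's catalog table sqlite_master using ordinary SQL.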

Constraints on Relational database model


Mainly Constraints on the relational database are of 4 types:

1. Domain constraints
2. Key constraints
3. Entity Integrity constraints
4. Referential integrity constraints
Let us discuss each of the above constraints in detail.
1. Domain constraints :
1. Every domain must contain atomic values (smallest indivisible units), which means
composite and multi-valued attributes are not allowed.
2. We perform a datatype check here, which means that when we assign a data type to a
column we limit the values that it can contain. E.g. if we assign the datatype of
attribute age as int, we can't give it values other than the int datatype.
3. Example:

2. Key Constraints or Uniqueness Constraints :


These are called uniqueness constraints since they ensure that every tuple in the
relation is unique.

Example:

1. A key is an attribute or set of attributes that is used to identify an entity
within its entity set uniquely.

2. An entity set can have multiple keys, out of which one key will be the
primary key. A primary key must contain unique values and cannot contain a
null value in the relational table.

Example

3. A relation can have multiple keys or candidate keys (minimal
superkeys), out of which we choose one as the primary key. There is
no restriction on which candidate key is chosen, but it is suggested
to go with the candidate key with the fewest attributes.
4. Null values are not allowed in the primary key, hence the Not Null
constraint is also a part of the key constraint.
3. Entity Integrity Constraints :
1. The entity integrity constraint says that no primary key can take a NULL value, since
we use the primary key to identify each tuple uniquely in a relation.
2. Example:

4. Referential Integrity Constraints :


1. The referential integrity constraint is specified between two relations or tables
and is used to maintain consistency among the tuples in the two relations.
2. This constraint is enforced through a foreign key: when the attributes in the foreign
key of relation R1 have the same domain(s) as the primary key of relation R2, then
the foreign key of R1 is said to reference or refer to the primary key of relation R2.
3. The values of the foreign key in a tuple of relation R1 can either take the values of
the primary key for some tuple in relation R2, or can be NULL; no other value is
allowed.
4. Example:
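A referential integrity constraint can be sketched with Python's built-in sqlite3 module. The tables department (playing the role of R2) and employee (playing the role of R1) are assumptions made for illustration. SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1)")
conn.execute("INSERT INTO employee VALUES (10, 1)")     # matches a department
conn.execute("INSERT INTO employee VALUES (11, NULL)")  # NULL is permitted

try:
    # Department 99 does not exist, so this row violates the constraint.
    conn.execute("INSERT INTO employee VALUES (12, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the two valid rows are stored; the row that points at a missing department is refused.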
UNIT 4
Relational database design (RDD) models information and data into a
set of tables with rows and columns. Each row of a relation/table
represents a record, and each column represents an attribute
of data. The Structured Query Language (SQL) is used to
manipulate relational databases.

Functional Dependency
A functional dependency is a relationship that exists between two attributes (or sets
of attributes). It typically exists between the primary key and a non-key attribute
within a table.

X → Y

The left side of the FD is known as the determinant, and the right side is known as
the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table because if we know the Emp_Id, we can tell the employee name associated
with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency


1. Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A.

o The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are
trivial dependencies too.

2. Non-trivial functional dependency


o A → B has a non-trivial functional dependency if B is not a subset of A.

o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name
Name → DOB
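Whether a functional dependency holds in a given table, and whether it is trivial, can be checked mechanically. The sketch below uses made-up employee rows; the helper names holds and is_trivial are hypothetical.

```python
def holds(relation, lhs, rhs):
    """True if every pair of tuples that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_trivial(lhs, rhs):
    """A → B is trivial when B is a subset of A."""
    return set(rhs) <= set(lhs)

# Hypothetical employee rows; two employees share the name Asha.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Goa"},
]

id_to_name = holds(employees, ["Emp_Id"], ["Emp_Name"])
name_to_address = holds(employees, ["Emp_Name"], ["Emp_Address"])
```

A check like this can only refute a dependency on the data at hand. Whether a dependency is meant to hold for all possible data is a design decision.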

Normalization
1) Normalization is the process of organizing the data in the database.
2) Normalization is used to minimize the redundancy from a relation or
set of relations. It is also used to eliminate the undesirable
characteristics like Insertion, Update and Deletion Anomalies.
3) Normalization divides the larger table into the smaller table and links
them using relationship.
4) The normal form is used to reduce redundancy from the database
table.

Purpose of Normalization
Normalization is the process of structuring the data and the relationships between
data so as to minimize redundancy in relational tables and avoid insertion, update,
and deletion anomalies. It divides large database tables into smaller tables and links
them through relationships. This removes redundant data and makes it easier to add,
modify, or delete table fields.

Types of Normal Forms


The following normal forms are discussed below: 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF.

First Normal Form (1NF)


1. A relation is in 1NF if every attribute contains only atomic values.
2. It states that an attribute of a table cannot hold multiple values. It must hold
only a single value.
3. First normal form disallows the multi-valued attribute, composite attribute,
and their combinations.

Example 1 – Relation STUDENT in table 1 is not in 1NF because of the
multi-valued attribute STUD_PHONE. Its decomposition into 1NF has
been shown in table 2.
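The decomposition into 1NF can be sketched in Python. The rows below are made up for illustration and are not the contents of table 1 or table 2; the helper name to_1nf is hypothetical.

```python
# Hypothetical STUDENT rows; STUD_PHONE holds a list, so the relation is not in 1NF.
students = [
    {"STUD_NO": 1, "STUD_NAME": "Ram", "STUD_PHONE": ["9716271721", "9871717178"]},
    {"STUD_NO": 2, "STUD_NAME": "Suresh", "STUD_PHONE": ["9898297281"]},
]

def to_1nf(relation, multi_valued):
    """Emit one row per value of the multi-valued attribute."""
    rows = []
    for t in relation:
        for value in t[multi_valued]:
            rows.append({**t, multi_valued: value})
    return rows

students_1nf = to_1nf(students, "STUD_PHONE")
```

After the conversion STUD_NO alone no longer identifies a row; the pair (STUD_NO, STUD_PHONE) does.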

Second Normal Form (2NF)


a. In 2NF, the relation must be in 1NF.
b. In the second normal form, all non-key attributes are fully functionally
dependent on the primary key.

Example 1 – Consider table-3 below.

STUD_NO   COURSE_NO   COURSE_FEE
1         C1          1000
2         C2          1500
1         C4          2000
4         C3          1000
4         C1          1000
2         C5          2000
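The notes do not state the key of table-3. Assuming it is {STUD_NO, COURSE_NO}, the sketch below checks on the rows shown that COURSE_FEE depends on COURSE_NO alone, which is a partial dependency, and then splits the table. The helper name holds and the names of the two result tables are hypothetical.

```python
# Rows of table-3 as (STUD_NO, COURSE_NO, COURSE_FEE).
table3 = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

def holds(rows, lhs, rhs):
    """True if column positions `lhs` determine column positions `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

fee_by_course = holds(table3, [1], [2])   # COURSE_NO → COURSE_FEE
fee_by_student = holds(table3, [0], [2])  # STUD_NO → COURSE_FEE

# Decomposition that removes the partial dependency.
student_course = sorted({(s, c) for s, c, _ in table3})
course_fee = sorted({(c, f) for _, c, f in table3})
```

In the split version the fee of course C1 is stored once instead of once per enrolled student.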
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency.

o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.

o If there is no transitive dependency for non-prime attributes, then the
relation is in third normal form.

A relation is in third normal form if it holds at least one of the following conditions for
every non-trivial functional dependency X → Y.

1. X is a super key.

2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID   EMP_NAME   EMP_ZIP   EMP_STATE
222      Ram        201010    UP
333      Krishna    02228     Bhopal
444      Shyam      60007     Amritsar
555      Raghav     06389     Goa
666      John       462007    MP

Super key in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP} ... and so on

Candidate key: {EMP_ID}
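Super keys can be found by computing attribute closures. The sketch below assumes two functional dependencies that the notes do not list: EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → EMP_STATE. The helper name closure is hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

attributes = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE"}
# Assumed dependencies, not stated in the notes.
fds = [
    (("EMP_ID",), ("EMP_NAME", "EMP_ZIP")),
    (("EMP_ZIP",), ("EMP_STATE",)),
]

emp_id_is_super_key = closure({"EMP_ID"}, fds) == attributes
emp_zip_closure = closure({"EMP_ZIP"}, fds)
```

Under these assumed dependencies EMP_ZIP is not a super key and EMP_STATE is not a prime attribute, so EMP_ZIP → EMP_STATE meets neither 3NF condition. It is a transitive dependency through EMP_ZIP.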

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF
is stricter than 3NF. A table complies with BCNF if it is in 3NF and for
every functional dependency X->Y, X should be the super key of the table.

Example: Suppose there is a company wherein employees work in more
than one department. They store the data like this:

emp_id   emp_nationality   emp_dept                       dept_type   dept_no_of_emp
1001     Austrian          Production and planning        D001        200
1001     Austrian          stores                         D001        250
1002     American          design and technical support   D134        100
1002     American          Purchasing department          D134        600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF because neither emp_id nor emp_dept alone is a key, yet each
is the left side of a functional dependency.

To make the table comply with BCNF we can break the table into three tables
like this:
emp_nationality table:

emp_id   emp_nationality
1001     Austrian
1002     American

emp_dept table:

emp_dept                       dept_type   dept_no_of_emp
Production and planning        D001        200
stores                         D001        250
design and technical support   D134        100
Purchasing department          D134        600

emp_dept_mapping table:

emp_id   emp_dept
1001     Production and planning
1001     stores
1002     design and technical support
1002     Purchasing department

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

The tables are now in BCNF, as the left side of each functional dependency is a
key of its table.
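The BCNF condition can be checked mechanically with attribute closures. The sketch below uses the two functional dependencies listed above; the helper names closure and bcnf_violations are hypothetical, and the dependencies passed for each decomposed table were picked by hand.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]

fd_nat = (("emp_id",), ("emp_nationality",))
fd_dept = (("emp_dept",), ("dept_type", "dept_no_of_emp"))

original = ["emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"]
before = bcnf_violations(original, [fd_nat, fd_dept])
after = [
    bcnf_violations(["emp_id", "emp_nationality"], [fd_nat]),
    bcnf_violations(["emp_dept", "dept_type", "dept_no_of_emp"], [fd_dept]),
    bcnf_violations(["emp_id", "emp_dept"], []),
]
```

Both dependencies violate BCNF in the original table, and none does in the three decomposed tables.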

Fourth normal form (4NF)


1. A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.

2. For a dependency A → B, if for a single value of A, multiple values of B exist,
then the dependency is a multi-valued dependency.

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing

3. The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and
HOBBY.
4. In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So
there is a multi-valued dependency on STU_ID, which leads to unnecessary
repetition of data.
5. So to make the above table into 4NF, we can decompose it into two tables:
6. STUDENT_COURSE

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry

7. STUDENT_HOBBY

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
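The 4NF decomposition above is a pair of projections, which can be sketched with Python sets. The variable names are hypothetical.

```python
# STUDENT rows as (STU_ID, COURSE, HOBBY), taken from the table above.
student = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
}

student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join of the two projections on STU_ID.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}
```

The join pairs every course of a student with every hobby of that student, so student 21 gets four combinations. Since COURSE and HOBBY are independent, storing them in separate tables avoids having to keep all of these combinations in one table.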

Fifth normal form (5NF)


1. A relation is in 5NF if it is in 4NF, contains no join dependency that is not
implied by its candidate keys, and any decomposition of it is lossless.
2. A relation in 5NF cannot be broken into smaller tables any further
without losing information.

3. 5NF is also known as Project-join normal form (PJ/NF).

SUBJECT    LECTURER   SEMESTER
Computer   Anshika    Semester 1
Computer   Rohit      Semester 1
Math       Rohit      Semester 1

4. In the above table, Rohit takes both Computer and Math classes for Semester 1
but he doesn't take the Math class for Semester 2. In this case, the combination
of all these fields is required to identify valid data.
5. Suppose we add a new semester, Semester 3, but do not know the
subject or who will be taking that subject, so we leave Lecturer and Subject
as NULL. But all three columns together act as the primary key, so we can't
leave the other two columns blank.
6. So to make the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
7. P1

SEMESTER     SUBJECT
Semester 1   Computer
Semester 1   Math

8. P2

SUBJECT    LECTURER
Computer   Anshika
Computer   Rohit
Math       Rohit

9. P3

SEMESTER     LECTURER
Semester 1   Anshika
Semester 1   Rohit
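Whether the three projections P1, P2 and P3 join back to the original table can be checked with Python sets. This sketch uses the three rows shown above; the variable names are hypothetical.

```python
# Original rows as (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "Rohit", "Semester 1"),
    ("Math", "Rohit", "Semester 1"),
}

p1 = {(sem, sub) for sub, lec, sem in original}  # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, sem in original}  # SUBJECT, LECTURER
p3 = {(sem, lec) for sub, lec, sem in original}  # SEMESTER, LECTURER

# Join P1 and P2 on SUBJECT, then keep rows whose (SEMESTER, LECTURER) is in P3.
rejoined = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2 and (sem, lec) in p3
}
lossless = rejoined == original
```

Because the projections are sets, the pair (Semester 1, Rohit) appears only once in P3. For these rows the three-way join returns exactly the original table, so the decomposition is lossless on this data.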
