DBMS Notes For B.TECH

A database is a structured collection of inter-related data managed by a Database Management System (DBMS), which provides functionalities for data retrieval, insertion, and deletion. The document outlines the differences between DBMS and traditional file systems, emphasizing aspects like data sharing, security, and data manipulation techniques. It also discusses the three-schema architecture for database structure, various database languages, and different types of database models including hierarchical, network, and relational models.

Uploaded by

Aftaz Hussain

What is Database

A database is a collection of inter-related data, organized so that the data
can be retrieved, inserted, and deleted efficiently. The data is organized in
the form of tables, schemas, views, reports, etc.

Database Management System


o A database management system (DBMS) is software used to manage a
database. For example, MySQL and Oracle are very popular database
systems used in many different applications.
o A DBMS provides an interface for operations such as creating a
database, storing data in it, updating data, creating tables, and
much more.
o It provides protection and security for the database. When there are
multiple users, it also maintains data consistency.

The differences between the DBMS approach and the file system approach are
as follows:

Meaning
   DBMS approach: A DBMS manages a collection of data. The user is not
   required to write procedures for managing it.
   File system approach: The file system is a collection of data. The user
   has to write the procedures for managing the data.

Sharing of data
   DBMS approach: Due to the centralized approach, data sharing is easy.
   File system approach: Data is distributed across many files, which may
   be in different formats, so it isn't easy to share data.

Data Abstraction
   DBMS approach: The DBMS gives an abstract view of data that hides the
   details.
   File system approach: The file system exposes the details of data
   representation and storage.

Security and Protection
   DBMS approach: The DBMS provides a good protection mechanism.
   File system approach: It isn't easy to protect a file under the file
   system.

Recovery Mechanism
   DBMS approach: The DBMS provides a crash recovery mechanism, i.e., it
   protects the user from system failure.
   File system approach: The file system doesn't have a crash recovery
   mechanism, i.e., if the system crashes while some data is being entered,
   the content of the file may be lost.

Manipulation Techniques
   DBMS approach: The DBMS contains a wide variety of sophisticated
   techniques to store and retrieve data.
   File system approach: The file system can't store and retrieve data
   efficiently.

Concurrency Problems
   DBMS approach: The DBMS takes care of concurrent access to data using
   some form of locking.
   File system approach: Concurrent access causes many problems, for
   example when one user reads a file while another is deleting or updating
   information in it.

Where to use
   DBMS approach: Used in large systems which interrelate many files.
   File system approach: Used in small systems with few, largely
   independent files.

Cost
   DBMS approach: The database system is expensive to design.
   File system approach: The file system approach is cheaper to design.

Data Redundancy and Inconsistency
   DBMS approach: Due to the centralization of the database, the problems
   of data redundancy and inconsistency are controlled.
   File system approach: The files and application programs are created by
   different programmers, so a lot of data is duplicated, which may lead to
   inconsistency.

Structure
   DBMS approach: The database structure is complex to design.
   File system approach: The file system approach has a simple structure.

Data Independence
   DBMS approach: Data independence exists, and it can be of two types:
      o Logical Data Independence
      o Physical Data Independence
   File system approach: There is no data independence.

Integrity Constraints
   DBMS approach: Integrity constraints are easy to apply.
   File system approach: Integrity constraints are difficult to implement.

Data Models
   DBMS approach: Three types of data models exist:
      o Hierarchical data models
      o Network data models
      o Relational data models
   File system approach: There is no concept of data models.

Flexibility
   DBMS approach: Changes to the content of the stored data are often
   necessary in any system, and these changes are made more easily with a
   database approach.
   File system approach: The system is less flexible than the DBMS approach.

Examples
   DBMS approach: Oracle, SQL Server, Sybase, etc.
   File system approach: File-processing programs written in languages such
   as COBOL, C++, etc.

2. Traditional File System Vs DBMS

File System:
A file system is a way of arranging files on a storage medium such as a
hard disk. It organizes the files and helps in retrieving them when they
are required. A file system consists of files grouped into directories,
and the directories can contain further directories and files. The file
system performs basic operations such as file management, file naming,
and applying access rules.
Example: NTFS (New Technology File System), EXT (Extended File System).

Three schema Architecture


o The three schema architecture is also called the ANSI/SPARC architecture or
three-level architecture.
o This framework is used to describe the structure of a specific database
system.
o The three schema architecture separates the user applications from the
physical database.
o The three schema architecture contains three levels (external, conceptual,
and internal). It breaks the database down into three different categories.

The three-schema architecture is as follows:

[Figure: three-schema architecture]

In the above diagram:

o It shows the DBMS architecture.
o Mapping is used to transform requests and responses between the
various levels of the database architecture.
o Mapping adds processing time, so it is less suitable for a small DBMS.
o In External / Conceptual mapping, the request is transformed from the
external level to the conceptual schema.
o In Conceptual / Internal mapping, the DBMS transforms the request from
the conceptual level to the internal level.

Objectives of Three schema Architecture


The main objective of the three-level architecture is to enable multiple
users to access the same data with a personalized view while storing the
underlying data only once. It thus separates the user's view from the
physical structure of the database. This separation is desirable for the
following reasons:

o Different users need different views of the same data.
o The way in which a particular user needs to see the data may
change over time.
o The users of the database should not have to worry about the physical
implementation and internal workings of the database, such as data
compression and encryption techniques, hashing, optimization of the
internal structures, etc.
o All users should be able to access the same data according to their
requirements.
o The DBA should be able to change the conceptual structure of the database
without affecting the users' views.
o The internal structure of the database should be unaffected by changes to
physical aspects of the storage.

Mapping between Views


The three levels of the DBMS architecture don't exist independently of each
other. There must be a correspondence between the three levels, i.e. a
definition of how they relate to each other. The DBMS is responsible for the
correspondence between the three types of schema. This correspondence is
called Mapping.

There are two types of mapping in the database architecture:

o Conceptual / Internal Mapping
o External / Conceptual Mapping

Conceptual / Internal Mapping

The Conceptual / Internal Mapping lies between the conceptual level and the
internal level. Its role is to define the correspondence between the records
and fields of the conceptual level and the files and data structures of the
internal level.

External / Conceptual Mapping

The External / Conceptual Mapping lies between the external level and the
conceptual level. Its role is to define the correspondence between a
particular external view and the conceptual view.

Database Languages in DBMS


o A DBMS has appropriate languages and interfaces to express database
queries and updates.
o Database languages can be used to read, store and update the data in the
database.

Types of Database Languages

1. Data Definition Language (DDL)

o DDL stands for Data Definition Language. It is used to define the database
structure or pattern.
o It is used to create schemas, tables, indexes, constraints, etc. in the database.
o Using DDL statements, you can create the skeleton of the database.
o DDL statements record metadata such as the number of tables and schemas,
their names, indexes, the columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.

These commands change the database schema, which is why they come under
Data Definition Language.
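
The DDL tasks listed above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table name student, its columns, and the new name learner are made up for illustration, and SQLite has no TRUNCATE statement, so that task is not shown.

```python
import sqlite3

# In-memory SQLite database; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new table (an object in the schema).
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# ALTER: change the structure of the table by adding a column.
cur.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# RENAME: give the table a new name.
cur.execute("ALTER TABLE student RENAME TO learner")

# The metadata recorded by the DDL statements can be read back.
tables_after_create = [row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [row[1] for row in cur.execute("PRAGMA table_info(learner)")]

# DROP: remove the table together with its definition.
cur.execute("DROP TABLE learner")
tables_after_drop = [row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
conn.close()
```

Note that none of these statements touch rows of data; they only change what the schema looks like.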

2. Data Manipulation Language (DML)


DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records, or only
those that satisfy a condition).
o Merge: It performs an UPSERT operation, i.e., an insert or an update.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It describes the access path (execution plan) the DBMS
will use for a statement.
o Lock Table: It controls concurrency.
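
As a rough illustration of the four basic DML tasks, the sketch below runs INSERT, UPDATE, DELETE, and SELECT against an in-memory SQLite database. The student table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# INSERT: add rows to the table.
cur.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 22), (3, "Meena", 21)],
)

# UPDATE: change existing rows that match a condition.
cur.execute("UPDATE student SET age = 23 WHERE id = 2")

# DELETE: remove only the rows that match the WHERE clause.
cur.execute("DELETE FROM student WHERE id = 3")

# SELECT: retrieve the remaining rows.
rows = cur.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
conn.close()
```

Leaving out the WHERE clause in the DELETE statement would remove every row, which is why the condition matters.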

3. Data Control Language (DCL)


o DCL stands for Data Control Language. It is used to control access to
the data stored in the database by granting and revoking privileges.
o In some DBMSs, DCL statements are transactional and can be rolled back.

(In Oracle Database, however, DCL statements cannot be rolled back.)

Here are some tasks that come under DCL:

o Grant: It is used to give a user access privileges to a database.
o Revoke: It is used to take back permissions from the user.

The privileges that can be granted or revoked include:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.


4. Transaction Control Language (TCL)

TCL is used to manage the changes made by DML statements. It allows
statements to be grouped into logical transactions.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction to the database.
o Rollback: It is used to restore the database to its state as of the last
Commit.
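
The effect of Commit and Rollback can be sketched with Python's sqlite3 module, which opens a transaction implicitly before a DML statement in its default mode. The account table and the balances are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

# First transaction: insert a row and make it permanent.
cur.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # COMMIT

# Second transaction: the change is visible inside the transaction ...
cur.execute("UPDATE account SET balance = 0 WHERE id = 1")
balance_before_rollback = cur.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]

# ... but ROLLBACK restores the state as of the last commit.
conn.rollback()
balance_after_rollback = cur.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
conn.close()
```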

DBMS Database Models


A database model defines the logical design and structure of a database. It
defines how data will be stored, accessed, and updated in a database
management system.

o Depending on your application's requirements, you can choose a database
model to define your database.
o The database model sets the rules, relationships, constraints, etc. that
define how data is stored in the database.
o It's like creating a blueprint of your database.
o There are different types of database models, and each one has its own
set of features.
o You can define how you want to structure the application data using a
database model.

In this tutorial you will learn about the 7 database models that are
popularly used.

Types of Database models

There are several different database model types. Some of them are old,
while others are new and cater to newer requirements. Here is a
list of the 7 popular database models:

1. Hierarchical Model
2. Network Model
3. Entity-relationship Model
4. Relational Model
5. Object-oriented Model
6. NoSQL Model
7. Graph Model

Let's learn about the different types of database models, along with their
main features and when you should use them.

1. Hierarchical Model

o The hierarchical database model organizes data into a tree-like
structure, with a single root to which all the other data is linked.
o The hierarchy starts from the root data and expands like a tree,
adding child nodes to the parent nodes.
o In this model, a child node has only a single parent node.
o This model efficiently describes many real-world relationships, such as
the index of a book.
o IBM's Information Management System (IMS) is based on this model.
o Data is organized into a tree-like structure with a one-to-many
relationship between two different types of data. For example,
one department can have many courses, many teachers, and of
course many students.

Advantages/Disadvantages of the Hierarchical Model

Here are a few points to mark the advantages and disadvantages of the
Hierarchical database model:

1. Because it has one-to-many relationships between different types of
data, fetching data is easy and fast.

2. But the Hierarchical model is less flexible.

3. And it doesn't support many-to-many relationships.
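
The single-parent rule can be sketched in a few lines of Python. The node names below, including the "Course: DBMS" leaf, are hypothetical and loosely follow the department example above.

```python
# Hierarchical model as a tree: each node stores exactly one parent
# (None marks the root). All node names are illustrative.
parent = {
    "Department": None,
    "Courses": "Department",
    "Teachers": "Department",
    "Students": "Department",
    "Course: DBMS": "Courses",
}

def path_to_root(node):
    """Follow the single parent link upward until the root is reached."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def children(node):
    """One parent can have many children (one-to-many)."""
    return sorted(n for n, p in parent.items() if p == node)

dbms_path = path_to_root("Course: DBMS")
department_children = children("Department")
```

Because every node has one slot for its parent, a many-to-many relationship (a course shared by two departments, say) cannot be stored without duplicating the node.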

2. Network Model

o The Network Model is an extension of the Hierarchical model.
o In this model, data is organized more like a graph, and a node is allowed
to have more than one parent node.
o In the network database model, data is more closely related because more
relationships are established.
o Because the data is more closely related, accessing the data is also
easier and faster.
o This database model uses many-to-many data relationships.
o Integrated Data Store (IDS) is based on this database model.
o This was the most widely used database model before the Relational Model
was introduced.
o The implementation of the Network model is complex, and it is very
difficult to maintain.
o The Network model is also difficult to modify.
o You may want to explore this model if you are developing a social
networking application, although the Graph Database model is newer
and far better suited than the Network Database model.

Advantages of the Network Model

1. It supports complex relationships.

2. It allows more flexibility.

3. Entity-relationship Model

o In this database model, objects of interest are divided into entities,
and their characteristics into attributes.
o Different entities are related using relationships.
o ER Models represent the relationships in pictorial form to
make them easier for different stakeholders to understand.
o This model is well suited to designing a database, which can then be
turned into tables in a relational model (explained below).
o For example, if we have to design a School Database, then Student will be
an entity with attributes name, age, address, etc. As an address is
generally complex, Address can be another entity with attributes street,
pincode, city, etc., and there will be a relationship between them.
o Relationships can also be of different types. You can learn about ER
Diagrams in detail if you want to learn about entities and relationships.

Advantages of the ER Model

1. It is easy to understand and design.

2. Using the ER model we can represent data structures easily.

3. As the ER model cannot be directly implemented into a database
model, it is just a step toward designing the relational database model.

ER (Entity Relationship) Diagram in DBMS


o ER model stands for Entity-Relationship model. It is a high-level data
model. This model is used to define the data elements and relationships for a
specified system.
o It develops a conceptual design for the database. It also gives a very
simple and easy-to-design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.

For example, suppose we design a school database. In this database, the
student will be an entity with attributes like address, name, id, age, etc. The
address can be another entity with attributes like city, street name, pin code,
etc., and there will be a relationship between them.

Components of ER Diagram

1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an
entity is represented as a rectangle.

Consider an organization as an example: manager, product, employee,
department, etc. can be taken as entities.

a. Weak Entity

An entity that depends on another entity is called a weak entity. The weak
entity doesn't contain any key attribute of its own. The weak entity is
represented by a double rectangle.

2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used
to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a
student.

Key Attribute

The key attribute is used to represent the main characteristic of an entity. It
represents a primary key. The key attribute is represented by an ellipse with
the text underlined.

Multivalued Attribute

An attribute can have more than one value. Such an attribute is known as a
multivalued attribute. A double oval is used to represent a multivalued
attribute.

For example, a student can have more than one phone number.

4. Relational Model

o In this model, data is organized in two-dimensional tables, and
relationships are maintained by storing a common field.
o This model was introduced by E. F. Codd in 1970, and since then it has
been the most widely used database model.
o The basic structure of data in the relational model is the table. All the
information related to a particular type is stored in rows of that table.
o Hence, tables are also known as relations in the relational model.
o You can design tables, normalize them to reduce data redundancy,
and use Structured Query Language (SQL) to access data from the
tables.
o Some of the most popular databases are based on this database
model. For example, Oracle, MySQL, etc.

Advantages of the Relational Model

1. It's simple and easy to implement.

2. Popular database software is available for this database model.

3. It supports SQL, with which you can easily query the data.

Relational Algebra is a procedural query language. Relational Calculus is a
non-procedural or declarative query language. Relational Algebra targets how
to obtain the result. Relational Calculus targets what result to obtain.

Difference between Hierarchical and Network Model


1. Hierarchical Data Model:
The hierarchical data model is the oldest type of data model. It was
developed by IBM in 1968. It organizes data in a tree-like structure.
The hierarchical model has the following characteristics:
o It contains nodes which are connected by branches.
o The topmost node is called the root node.
o If multiple nodes appear at the top level, they are called root
segments.
o Each node other than the root has exactly one parent.
o One parent may have many children.

In the above figure, Electronics is the root node, which has two children,
i.e. Televisions and Portable Electronics. These two have further children,
for which they act as parents. For example, Televisions has the children
Tube, LCD and Plasma, and acts as the parent of these three. It
follows a one-to-many relationship.

2. Network Data Model:
It is the advanced version of the hierarchical data model. To organize
data it uses directed graphs instead of a tree structure. In this model a
child can have more than one parent. It uses the concept of two data
structures, i.e. Records and Sets.

In the above figure, Project is the root node, which has two children, i.e.
Project 1 and Project 2. Project 1 has 3 children and Project 2 has 2
children. In total there are 5 parent-child links to 3 children, i.e.
Department A, Department B and Department C. They are network-related
children because, as stated above, a child in this model can have more than
one parent. So Department B and Department C each have two parents, i.e.
Project 1 and Project 2.
Difference between Hierarchical Data Model and Network Data Model:

1.  Hierarchical: To store data, a hierarchy method is used.
    Network: You can create a network that shows how data is related
    to each other.

2.  Hierarchical: It implements 1:1 and 1:n relations.
    Network: It implements 1:1, 1:n and also many-to-many relations.

3.  Hierarchical: To organize records, it uses a tree structure.
    Network: To organize records, it uses graphs.

4.  Hierarchical: Records are linked with the help of pointers.
    Network: Records are linked with the help of linked lists.

5.  Hierarchical: An insertion anomaly exists in this model, i.e. a child
    node cannot be inserted without the parent node.
    Network: There is no insertion anomaly.

6.  Hierarchical: A deletion anomaly exists in this model, i.e. it is
    difficult to delete the parent node.
    Network: There is no deletion anomaly.

7.  Hierarchical: It is used to access data which is complex and
    asymmetric.
    Network: It is used to access data which is complex and symmetric.

8.  Hierarchical: When an update operation is performed, it suffers from an
    inconsistency problem because of the existence of multiple instances of
    child records.
    Network: No such problem exists, because only a single occurrence of a
    record is updated.

9.  Hierarchical: This model lacks data independence.
    Network: There is partial data independence in this model.

10. Hierarchical: It is less flexible in comparison to the relational model.
    Network: It is flexible.

11. Hierarchical: When you are searching for a record, you first need to
    visit the parent record before retrieving a child record.
    Network: Searching for a record is easy because of the availability of
    multiple access paths to reach a data item.

12. Hierarchical: Example: IBM's IMS (Information Management System)
    implements this model.
    Network: Example: Integrated Data Store (IDS) implements this model.

Types of keys

I) Super Key – An attribute or a combination of attributes
that is used to identify the records uniquely is known as a
Super Key. A table can have many Super Keys.
E.g. of Super Key:
   1. ID
   2. ID, Name
   3. ID, Address
   4. ID, Department_ID
   5. ID, Salary
   6. Name, Address
   7. Name, Address, Department_ID
   ... and so on, as any combination which can identify the records
   uniquely will be a Super Key.

II) Candidate Key – It can be defined as a minimal Super Key
or irreducible Super Key. In other words, it is an attribute or a
combination of attributes that identifies the record
uniquely, while none of its proper subsets can identify the
records uniquely.
E.g. of Candidate Key:
   1. Code
   2. Name, Address
For the above table we have only two Candidate Keys (i.e. irreducible
Super Keys) used to identify the records from the table
uniquely. The Code key can identify the record uniquely, and
similarly the combination of Name and Address can identify
the record uniquely, but neither Name nor Address alone can
be used to identify the records uniquely, as we might have
two employees with the same name
or two employees from the same house.

III) Primary Key – A Candidate Key that is used by the
database designer for unique identification of each row in
a table is known as the Primary Key. A Primary Key can
consist of one or more attributes of a table.
E.g. of Primary Key – The database designer can use one of
the Candidate Keys as the Primary Key. In this case we have
“Code” and “Name, Address” as Candidate Keys. We will
consider “Code” as the Primary Key, as the other key is
a combination of more than one attribute.

IV) Foreign Key – A foreign key is an attribute or combination
of attributes in one base table that points to a candidate
key (generally the primary key) of another table. The
purpose of the foreign key is to ensure referential
integrity of the data, i.e. only values that are supposed to
appear in the database are permitted.
E.g. of Foreign Key – Consider another table,
the Department table, with attributes “Department_ID”,
“Department_Name”, “Manager_ID”, “Location_ID”, with
Department_ID as the Primary Key. Now the
Department_ID attribute of the Employee table (the dependent
or child table) can be defined as a Foreign Key, as it
references the Department_ID attribute of the
Department table (the referenced or parent table). A
Foreign Key value must match an existing value in the
parent table or be NULL.

V) Composite Key – If we use multiple attributes to create a
Primary Key, then that Primary Key is called a Composite
Key (also called a Compound Key or Concatenated Key).
E.g. of Composite Key – If we have used “Name, Address”
as the Primary Key, then it will be our Composite Key.

VI) Alternate Key – An Alternate Key can be any of the
Candidate Keys except for the Primary Key.
E.g. of Alternate Key – “Name, Address”, as it is the only
other Candidate Key which is not the Primary Key.

VII) Secondary Key – Attributes that are not even a
Super Key but can still be used for identification of
records (not uniquely) are known as a Secondary Key.
E.g. of Secondary Key – Name, Address, Salary,
Department_ID, etc., as they can identify the records but
they might not be unique.
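
The difference between a Super Key and a Candidate Key can be checked mechanically on sample data. The sketch below uses three made-up employee rows; keep in mind that a key is really a rule about every possible state of the table, so a test on one sample can only suggest, not prove, that a set of attributes is a key.

```python
from itertools import combinations

# Hypothetical employee rows, chosen so that Code is unique and
# (Name, Address) is unique, while Name or Address alone is not.
rows = [
    {"Code": 1, "Name": "Anil", "Address": "12 Park St", "Department_ID": 10},
    {"Code": 2, "Name": "Anil", "Address": "7 Lake Rd", "Department_ID": 10},
    {"Code": 3, "Name": "Bina", "Address": "12 Park St", "Department_ID": 10},
]

def is_super_key(attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def candidate_keys():
    """Super keys with no smaller super key inside them (minimal super keys)."""
    attrs = list(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_super_key(combo) and not contains_smaller_key:
                keys.append(combo)
    return keys

found = candidate_keys()
```

Here ("Code", "Name") is a Super Key but not a Candidate Key, because dropping Name still leaves a unique identifier.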

Types of Integrity Constraint

1. Domain constraints
o A domain constraint defines the valid set of values for an attribute.
o The data type of a domain can be string, character, integer, time, date,
currency, etc. The value of the attribute must belong to the
corresponding domain.

2. Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in
a relation, and if the primary key has a null value, then we can't identify
those rows.
o A table can contain null values in fields other than the primary key field.

3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o If a foreign key in Table 1 refers to the Primary Key of Table 2, then
every value of the Foreign Key in Table 1 must either be null or be
available in Table 2.

4. Key constraints
o A key is an attribute (or set of attributes) that is used to identify an
entity uniquely within its entity set.
o An entity set can have multiple keys, out of which one key will be the
primary key. A primary key must contain unique values and cannot contain
a null value in the relational table.
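
The four kinds of constraint can be seen in action with a small SQLite sketch. The department and employee tables, the age range 18 to 65, and the sample rows are all assumptions made for the illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the key column is declared NOT NULL explicitly because SQLite does not otherwise reject nulls in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  TEXT PRIMARY KEY NOT NULL,                      -- key + entity integrity
        age     INTEGER CHECK (age BETWEEN 18 AND 65),          -- domain constraint
        dept_id INTEGER REFERENCES department(dept_id)          -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES ('E1', 30, 1)")

def violates(sql):
    """Run a statement and report whether the DBMS rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

domain_violation = violates("INSERT INTO employee VALUES ('E2', 200, 1)")
entity_violation = violates("INSERT INTO employee VALUES (NULL, 30, 1)")
referential_violation = violates("INSERT INTO employee VALUES ('E3', 30, 99)")
key_violation = violates("INSERT INTO employee VALUES ('E1', 40, 1)")
null_fk_allowed = not violates("INSERT INTO employee VALUES ('E4', 25, NULL)")
```

The last statement succeeds because a foreign key value may be null; only a non-null value has to exist in the parent table.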
Unit-2

Relational Algebra
Relational algebra is a procedural query language. It gives a step by step
process to obtain the result of the query. It uses operators to perform
queries.

Types of Relational operation

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p (r)

Where:

σ denotes the selection operation
r is the relation
p is the selection predicate, a propositional logic formula which may use
connectors like AND, OR and NOT. Its terms can use relational operators
like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME   LOAN_NO   AMOUNT
Downtown      L-17      1000
Redwood       L-23      2000
Perryride     L-15      1500
Downtown      L-14      1500
Mianus        L-13      500
Roundhill     L-11      900
Perryride     L-16      1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME   LOAN_NO   AMOUNT
Perryride     L-15      1500
Perryride     L-16      1300
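
The select operation is essentially a filter over tuples. A minimal Python sketch using the LOAN relation above follows; the function name select and the dictionary representation of tuples are choices made for this example.

```python
# LOAN relation from the example; each tuple is stored as a dictionary.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(relation, predicate):
    """σ predicate (relation): keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]

# σ BRANCH_NAME="Perryride" (LOAN)
perryride = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")

# A predicate can combine comparisons with AND / OR / NOT.
big_downtown = select(
    LOAN, lambda t: t["BRANCH_NAME"] == "Downtown" and t["AMOUNT"] > 1200)
```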

2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME      STREET    CITY
Jones     Main      Harrison
Smith     North     Rye
Hays      Main      Harrison
Curry     North     Rye
Johnson   Alma      Brooklyn
Brooks    Senator   Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME      CITY
Jones     Harrison
Smith     Rye
Hays      Harrison
Curry     Rye
Johnson   Brooklyn
Brooks    Brooklyn
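
Projection can be sketched as picking columns out of each tuple. In the sketch below the relation is a list of tuples and the result is a Python set, so duplicate rows collapse automatically, as they do in relational algebra. The helper name project and the ATTRS tuple are assumptions of the example.

```python
# CUSTOMER relation from the example, with its attribute names in order.
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]

def project(relation, attrs, wanted):
    """∏ wanted (relation): keep only the wanted columns, dropping duplicates."""
    positions = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in positions) for t in relation}

# ∏ NAME, CITY (CUSTOMER)
name_city = project(CUSTOMER, ATTRS, ("NAME", "CITY"))

# ∏ CITY (CUSTOMER): six input tuples collapse to three distinct cities.
cities = project(CUSTOMER, ATTRS, ("CITY",))
```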

3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the
tuples that are either in R or S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must satisfy the following conditions:

o R and S must have the same number of attributes.
o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME   ACCOUNT_NO
Johnson         A-101
Smith           A-121
Mayes           A-321
Turner          A-176
Johnson         A-273
Jones           A-472
Lindsay         A-284

BORROW RELATION

CUSTOMER_NAME   LOAN_NO
Jones           L-17
Smith           L-23
Hayes           L-15
Jackson         L-14
Curry           L-93
Smith           L-11
Williams        L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
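
A small sketch of the union example, with relations stored as Python sets of tuples. The union helper and its arity check are illustrative; they only approximate the condition that both relations must have the same number of attributes.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

def union(r, s):
    """R ∪ S for relations stored as sets of tuples of equal length."""
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("relations must have the same number of attributes")
    return r | s  # a set keeps each tuple once, so duplicates disappear

# ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
borrower_names = {(name,) for name, _ in BORROW}
depositor_names = {(name,) for name, _ in DEPOSITOR}
all_customers = union(borrower_names, depositor_names)
```

Smith and Jones appear in both relations (and Smith twice in BORROW), yet each name occurs only once in the result.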

4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R & S.
o It is denoted by ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Smith
Jones

5. Set Difference:
o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (-).

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
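
The union, intersection and difference examples above map directly onto Python's built-in set type. The sketch below first projects CUSTOMER_NAME out of each relation and then applies the three operators; the variable names are chosen for this illustration only.

```python
# DEPOSITOR and BORROW relations from the examples above.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection on CUSTOMER_NAME; a set drops duplicates such as the second Smith.
depositors = {name for name, _account in DEPOSITOR}
borrowers = {name for name, _loan in BORROW}

union = borrowers | depositors          # in BORROW, in DEPOSITOR, or in both
intersection = borrowers & depositors   # in both relations
difference = borrowers - depositors     # in BORROW but not in DEPOSITOR

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Both operands have a single attribute of the same type, so they are union-compatible, as the conditions for the union operation require.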
6. Cartesian product
o The Cartesian product combines each row of one table with each row of the
other table. It is also known as a cross product.
o It is denoted by X.

Notation: E X D

Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

EMPLOYEE X DEPARTMENT

Output:
EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
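
A minimal sketch of the Cartesian product, using itertools.product from the Python standard library. The follow-up filter on EMP_DEPT = DEPT_NO is an addition for illustration; it shows how a selection over the product yields only the meaningful pairs.

```python
from itertools import product

# EMPLOYEE(EMP_ID, EMP_NAME, EMP_DEPT) and DEPARTMENT(DEPT_NO, DEPT_NAME).
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: every employee row paired with every department row.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

# Selection on the product: keep rows where EMP_DEPT (index 2) = DEPT_NO (index 3).
matched = [row for row in cross if row[2] == row[3]]

for row in cross:
    print(*row)
```

With 3 rows in each relation the product has 3 x 3 = 9 rows, in the same order as the output table above.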

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted
by rho (ρ).

Example: We can use the rename operator to rename the STUDENT relation to
STUDENT1.

ρ(STUDENT1, STUDENT)

Introduction

Relational calculus in a database management system (DBMS) is all about "what you want".
It does not tell us how to get the results from the database; it only describes what we
want.
The theory of relational calculus was introduced by the computer scientist and
mathematician Edgar Codd. Let's dive in and try to understand relational calculus.

What is Relational Calculus?

Before understanding relational calculus in DBMS, we need to understand procedural and
declarative languages.

1. Procedural Language - A language that specifies how to get the required results
from the database is called a procedural language. Relational algebra is a procedural language.
2. Declarative Language - A language that only specifies what to get from the database,
without describing how to get it, is called a declarative language. Relational
calculus is a declarative language.

So relational calculus is a declarative language that uses predicate logic (first-order logic)
to describe the results wanted from the database.

Types of Relational Calculus in DBMS

Relational Calculus is of two types:

1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC)

Tuple Relational Calculus in DBMS uses a tuple variable (t) that ranges over each row of the
table and checks whether the predicate is true or false for that row. Depending on the given
predicate condition, it returns the row or part of the row.

The Tuple Relational Calculus expression Syntax

{t | P(t)}

Where t is the tuple variable that runs over every row, and P(t) is the predicate logic expression
or condition.

Let's take an example of a Customer Database and try to see how TRC expressions work.

Customer Table
Customer_id Name Zip code

1 Rohit 12345

2 Rahul 13245

3 Rohit 56789

4 Amit 12345

Example 1: Write a TRC query to get all the data of customers whose zip code is 12345.

TRC Query: {t | t ∈ Customer ∧ t.Zipcode = 12345} or TRC Query: {t | Customer(t) ∧ t[Zipcode] = 12345}

Workflow of query - The tuple variable "t" goes through every tuple of the Customer table.
For each row, the query checks whether Zipcode is 12345 and returns only the rows that
satisfy the predicate.

The TRC expression above can be read as "Return all the tuples that belong to the
Customer table and whose Zipcode is equal to 12345."

Result of the TRC expression above:

Customer_id Name Zip code

1 Rohit 12345

4 Amit 12345

Example 2: Write a TRC query to get the customer id of all the Customers.

TRC query: { t | ∃s (s ∈ Customer ∧ s.Customer_id = t.Customer_id) }

Result of the TRC Query:

Customer_id

1

2

3

4
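
A TRC expression reads much like a Python comprehension: the loop variable plays the role of the tuple variable and the if clause plays the role of the predicate P(t). The sketch below mirrors the two TRC examples; the namedtuple field names are assumptions chosen for this illustration.

```python
from collections import namedtuple

Customer = namedtuple("Customer", ["customer_id", "name", "zipcode"])

CUSTOMER = [
    Customer(1, "Rohit", 12345),
    Customer(2, "Rahul", 13245),
    Customer(3, "Rohit", 56789),
    Customer(4, "Amit", 12345),
]

# Example 1: {t | t in Customer and t.Zipcode = 12345}
example_1 = [t for t in CUSTOMER if t.zipcode == 12345]

# Example 2: the result tuple carries only Customer_id; a matching s must
# exist in Customer for every id that is returned.
example_2 = [s.customer_id for s in CUSTOMER]

print(example_1)
print(example_2)
```

The comprehension states what rows are wanted; how Python iterates is incidental, which is the declarative flavour the calculus is meant to capture.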
Domain Relational Calculus (DRC)

Domain Relational Calculus uses domain variables to get the column values required from the
database, based on the predicate expression or condition.

The Domain relational calculus expression syntax:

{<x1,x2,x3,x4...> | P(x1,x2,x3,x4...)}

where,

<x1,x2,x3,x4...> are domain variables used to get the column values required,
and P(x1,x2,x3...) is the predicate expression or condition.

Let's take the example of Customer Database and try to understand DRC queries with some
examples.

Customer Table
Customer_id Name Zip code

1 Rohit 12345

2 Rahul 13245

3 Rohit 56789

4 Amit 12345

Example 1: Write a DRC query to get the data of all customers with Zip code 12345.

DRC query: {<x1,x2,x3> | <x1,x2,x3> ∈ Customer ∧ x3 = 12345 }

Workflow of Query: In the above query, x1, x2 and x3 (in order) refer to the attributes or
columns that we need in the result. The predicate requires that the triple <x1,x2,x3> is a
row of the Customer table and that the third domain variable x3 is equal to 12345.

Result of the DRC query will be:

Customer_id Name Zip code

1 Rohit 12345

4 Amit 12345

Example 2: Write a DRC query to get the customer id of all the customers.

DRC Query: { <x1> | ∃ x2,x3(<x1,x2,x3> ∈ Customer ) }

Result of the above Query will be:

Customer_id

1

2

3

4
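
In DRC the variables range over column values rather than whole rows. A hedged Python sketch of the two DRC examples unpacks each row into the domain variables x1, x2 and x3; storing the Customer table as a set of triples is an assumption made for this illustration.

```python
# Customer table as a set of (Customer_id, Name, Zipcode) triples.
CUSTOMER = {
    (1, "Rohit", 12345),
    (2, "Rahul", 13245),
    (3, "Rohit", 56789),
    (4, "Amit", 12345),
}

# Example 1: {<x1,x2,x3> | <x1,x2,x3> in Customer and x3 = 12345}
drc_example_1 = {(x1, x2, x3) for (x1, x2, x3) in CUSTOMER if x3 == 12345}

# Example 2: {<x1> | there exist x2, x3 such that <x1,x2,x3> in Customer}
# x2 and x3 are bound by the existential quantifier and do not appear in the result.
drc_example_2 = {x1 for (x1, x2, x3) in CUSTOMER}

print(sorted(drc_example_1))
print(sorted(drc_example_2))
```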

Types of Functional dependencies in DBMS

In relational database management, a functional dependency is a
constraint between two sets of attributes in which one set determines
the value of the other. It is denoted as X → Y, where the attribute set
X on the left side of the arrow is called the determinant and Y is
called the dependent.
Functional dependencies are used to express relationships among
database attributes mathematically. They are very important for
understanding advanced concepts in relational database systems and for
solving problems in competitive exams such as GATE.
Example:
roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

From the above table we can conclude some valid functional
dependencies:
o roll_no → {name, dept_name, dept_building}: roll_no determines
the values of the fields name, dept_name and dept_building,
hence it is a valid functional dependency.
o roll_no → dept_name: since roll_no determines the whole set
{name, dept_name, dept_building}, it also determines its
subset dept_name.
o dept_name → dept_building: every row with the same dept_name
has the same dept_building, so dept_name identifies the
building. (Two departments may still share a building, as EC
and ME do.)
o More valid functional dependencies: roll_no → name, {roll_no,
name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
o name → dept_name: students with the same name can have
different dept_name values (xyz appears in both CO and IT),
hence this is not a valid functional dependency.
o dept_building → dept_name: there can be multiple
departments in the same building. For example, in the above
table departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional
dependency.
o More invalid functional dependencies: name → roll_no,
dept_building → roll_no, etc. {name, dept_name} → roll_no is
also invalid in general, because two students with the same
name can study in the same department, even though no rows in
this small sample violate it.
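
The checks above can be automated for a sample table. The sketch below uses a hypothetical helper named holds that reports whether a set of rows violates X → Y. A sample can only refute a dependency; it cannot prove that the dependency holds for every possible state of the table.

```python
# Sample rows from the table above.
STUDENTS = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(STUDENTS, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(holds(STUDENTS, ["dept_name"], ["dept_building"]))
print(holds(STUDENTS, ["name"], ["dept_name"]))
print(holds(STUDENTS, ["dept_building"], ["dept_name"]))
```

Calling holds(STUDENTS, ["name", "dept_name"], ["roll_no"]) returns True for these six rows, which illustrates why a dependency should be justified by the meaning of the data and not only by one sample.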
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity
rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ
is also valid by the augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence
{roll_no, name, dept_name} → {dept_building, dept_name} is
also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies,
then X→Z is also valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building,
then roll_no → dept_building is also valid.
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
5. Fully functional dependency
6. Partial functional dependency
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset
of the determinant, i.e. if X → Y and Y is a subset of X, then it is
called a trivial functional dependency.
Example:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency,
since the dependent name is a subset of the determinant set {roll_no,
name}. Similarly, roll_no → roll_no is also an example of a trivial
functional dependency.
2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset
of the determinant, i.e. if X → Y and Y is not a subset of X,
then it is called a non-trivial functional dependency.
Example:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the
dependent name is not a subset of the determinant roll_no. Similarly,
{roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
In these notes, a dependency a → {b, c} is called a multivalued
functional dependency when the attributes of the dependent set do not
depend on each other, i.e. there exists no functional dependency
between b and c. (This is different from the multivalued dependency
X ↠ Y that is used to define the fourth normal form.)
For example,
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

45 abc 19

Here, roll_no → {name, age} is a multivalued functional
dependency, since the dependents name and age are not
dependent on each other (i.e. neither name → age nor age → name
holds).
4. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly
dependent on the determinant, i.e. if a → b and b → c, then according
to the axiom of transitivity, a → c. This is a transitive functional
dependency.
For example,
enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to
the axiom of transitivity, enrol_no → building_no is a valid functional
dependency. This is an indirect functional dependency, hence it is
called a transitive functional dependency.
5. Fully Functional Dependency
In a full functional dependency X → Y, Y depends on the whole of X and
not on any proper subset of X: removing any attribute from X breaks
the dependency. If a relation R has attributes X, Y, Z with the
dependencies X -> Y and X -> Z, where X is a single attribute, then
both dependencies are fully functional.
6. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on a
part of the composite key rather than on the whole key. If a relation R
has attributes X, Y, Z where {X, Y} is the composite key and Z is a
non-key attribute, then X -> Z is a partial functional dependency.
Advantages of Functional Dependencies
Functional dependencies have numerous applications in the field of
database management systems. Some of them are listed below:

1. Data Normalization

Data normalization is the process of organizing data in a database in
order to minimize redundancy and increase data integrity. Functional
dependencies play an important part in data normalization. With their
help we can identify the candidate keys and the primary key of a
table, which in turn helps in normalization.

2. Query Optimization

With the help of functional dependencies we can decide how tables are
connected and which attributes need to be projected to retrieve the
required data. This helps in query optimization and improves
performance.

3. Consistency of Data

Designing tables around their functional dependencies helps keep the
data consistent, because it removes redundancies that could otherwise
lead to inconsistencies. A change made to one attribute then does not
leave another set of attributes in an inconsistent state.

4. Data Quality Improvement

Enforcing functional dependencies helps keep the data in the database
accurate, complete and up to date. This improves the overall quality
of the data and reduces the errors and inaccuracies that might
otherwise occur during data analysis and decision making.
GATE Question: In a schema with attributes A, B, C, D and E, the
following set of functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied
by the above set? (GATE IT 2005)
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Answer: Using the FD set given in the question,
(CD)+ = {C, D, E, A, B}, which means CD -> AC holds.
(BD)+ = {B, D}, which means BD -> CD cannot hold. So this FD is not
implied by the FD set, and (B) is the required option.
The others can be checked in the same way.
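
The closure computation used in the answer can be sketched as a fixed-point loop: keep adding the right-hand side of every FD whose left-hand side is already covered, until nothing changes. The function names closure and implied are choices made for this illustration, and each FD is written as a pair of attribute strings.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implied(lhs, rhs, fds):
    """X -> Y is implied by the FD set exactly when Y is inside the closure of X."""
    return set(rhs) <= closure(lhs, fds)


# {A -> B, A -> C, CD -> E, B -> D, E -> A}
FDS = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]

for option, (lhs, rhs) in zip("ABCD", [("CD", "AC"), ("BD", "CD"), ("BC", "CD"), ("AC", "BC")]):
    print(option, lhs, "->", rhs, implied(lhs, rhs, FDS))
```

Only option B is reported as not implied, which agrees with the answer above.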

Prime and non-prime attributes

Attributes that are part of any candidate key of a relation are called
prime attributes; the others are non-prime attributes. For example,
STUD_NO in the STUDENT relation is a prime attribute, and the others
are non-prime attributes.
GATE Question: Consider a relation scheme R = (A, B, C, D, E,
H) on which the following functional dependencies hold:
{A -> B, BC -> D, E -> C, D -> A}. What are the candidate keys of R?
[GATE 2005]
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Answer: (AE)+ = {A, B, E, C, D}, which is not the set of all
attributes. So AE is not a candidate key. Hence options (a) and (b)
are wrong.
(AEH)+ = {A, B, C, D, E, H}
(BEH)+ = {B, E, H, C, D, A}
(BCH)+ = {B, C, H, D, A}, which is not the set of all attributes. So
BCH is not a candidate key. Hence option (c) is wrong.
So the correct answer is (d).
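
For a small schema, the candidate keys can be found by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole schema and which contain no smaller key. The sketch below is exponential in the number of attributes and is meant only for exam-sized examples; the function names are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure covers every attribute."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys


# R = (A, B, C, D, E, H) with {A -> B, BC -> D, E -> C, D -> A}
FDS = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
keys = ["".join(key) for key in candidate_keys("ABCDEH", FDS)]
print(keys)
```

E and H never appear on the right-hand side of any dependency, so every key must contain both, which is why all three keys end in EH.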

SQL JOIN
As the name shows, JOIN means to combine something. In case of SQL, JOIN
means "to combine two or more tables".

In SQL, JOIN clause is used to combine the records from two or more tables
in a database.

Types of SQL JOIN


1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN

Sample Table
EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE

1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los Angeles 200000 36

6 Marry Canada 600000 48

PROJECT

PROJECT_NO EMP_ID DEPARTMENT

101 1 Testing
102 2 Development

103 3 Designing

104 4 Development

1. INNER JOIN
In SQL, INNER JOIN selects the records that have matching values in both
tables. It returns each combination of rows from the two tables for which
the join condition is satisfied.

Syntax

SELECT table1.column1, table1.column2, table2.column1, ....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;

Query

SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development
2. LEFT JOIN
The SQL LEFT JOIN returns all the rows from the left table and the matching
values from the right table. Where a left-table row has no match, the
right-table columns are returned as NULL.

Syntax

SELECT table1.column1, table1.column2, table2.column1, ....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;

Query

SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
LEFT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the rows of the right table and the matching
values from the left table. Where a right-table row has no match in the
left table, the left-table columns are returned as NULL.

Syntax

SELECT table1.column1, table1.column2, table2.column1, ....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;

Query

SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
RIGHT JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

4. FULL JOIN
In SQL, FULL JOIN combines the results of the left and right outer joins.
The joined table contains all the records from both tables, and NULL is put
in place of the values for which no match is found.

Syntax
SELECT table1.column1, table1.column2, table2.column1, ....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;

Query

SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL
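
The join queries above can be tried out with Python's built-in sqlite3 module and an in-memory database. This is a sketch under some assumptions: the column types are guesses, an ORDER BY is added so the row order is predictable, and only INNER JOIN and LEFT JOIN are run because older SQLite builds do not support RIGHT JOIN or FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT,
                       CITY TEXT, SALARY INTEGER, AGE INTEGER);
CREATE TABLE PROJECT (PROJECT_NO INTEGER PRIMARY KEY, EMP_ID INTEGER,
                      DEPARTMENT TEXT);
INSERT INTO EMPLOYEE VALUES
  (1, 'Angelina', 'Chicago', 200000, 30),
  (2, 'Robert', 'Austin', 300000, 26),
  (3, 'Christian', 'Denver', 100000, 42),
  (4, 'Kristen', 'Washington', 500000, 29),
  (5, 'Russell', 'Los Angeles', 200000, 36),
  (6, 'Marry', 'Canada', 600000, 48);
INSERT INTO PROJECT VALUES
  (101, 1, 'Testing'), (102, 2, 'Development'),
  (103, 3, 'Designing'), (104, 4, 'Development');
""")

QUERY = """
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
{join} JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
ORDER BY EMPLOYEE.EMP_ID;
"""

inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(inner_rows)
print(left_rows)  # SQL NULL is returned to Python as None
```

Because every PROJECT row refers to an existing employee, a RIGHT JOIN on this data returns the same four rows as the INNER JOIN, and a FULL JOIN returns the same six rows as the LEFT JOIN.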

What is Database Normalization?

In database management systems (DBMS), normalization is the process of
organizing the design of a database so that it is efficient, organized,
and free from data anomalies. There are several levels of
normalization, each with its own set of guidelines, known as normal
forms.
Important Points Regarding Normal Forms in DBMS
o First Normal Form (1NF): This is the most basic level of
normalization. In 1NF, each table cell should contain only a
single value, and each column should have a unique name.
The first normal form removes repeating groups and
simplifies queries.
o Second Normal Form (2NF): 2NF eliminates redundant
data by requiring that each non-key attribute depend on the
whole of every candidate key, and not on just a part of a
composite key.
o Third Normal Form (3NF): 3NF builds on 2NF by requiring
that no non-key attribute depend transitively on the key. Each
non-key column should depend directly on the key, and not on
other non-key columns in the same table.
o Boyce-Codd Normal Form (BCNF): BCNF is a stricter form
of 3NF that requires the determinant of every non-trivial
functional dependency in a table to be a super key.
o Fourth Normal Form (4NF): 4NF is a further refinement of
BCNF that ensures that a table does not contain any
non-trivial multi-valued dependency whose determinant is
not a super key.
o Fifth Normal Form (5NF): 5NF is the highest level of
normalization that is commonly discussed. It decomposes a
table into smaller tables to remove the redundancy caused
by join dependencies, without losing information.

First Normal Form

If a relation contains a composite or multi-valued attribute, it
violates first normal form. In other words, a relation is in first
normal form if every attribute in that relation is a single-valued
attribute.
o Example 1: Relation STUDENT in table 1 is not in 1NF because
of the multi-valued attribute STUD_PHONE. Its decomposition into
1NF is shown in table 2.

o Example 2:
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
o In the above table Courses is a multi-valued attribute, so the
table is not in 1NF. The table below is in 1NF, as it has no
multi-valued attribute.
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
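
The conversion in Example 2 amounts to emitting one row per value of the multi-valued attribute. A minimal Python sketch, with the unnormalized rows written as tuples whose third element is a list:

```python
# Unnormalized rows: the third field holds several course values at once.
unnormalized = [
    (1, "A", ["c1", "c2"]),
    (2, "E", ["c3"]),
    (3, "M", ["c2", "c3"]),
]

# 1NF: repeat ID and Name once for every course, so each cell is single-valued.
first_nf = [
    (student_id, name, course)
    for student_id, name, courses in unnormalized
    for course in courses
]

for row in first_nf:
    print(*row)
```

After the conversion, ID alone no longer identifies a row; the key of the 1NF table is the pair (ID, Course).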
Second Normal Form
To be in second normal form, a relation must be in first normal form
and must not contain any partial dependency. A relation is in
2NF if it has no partial dependency, i.e., no non-prime attribute
(an attribute that is not part of any candidate key) is dependent on
any proper subset of any candidate key of the table. Partial
dependency: if a proper subset of a candidate key determines a
non-prime attribute, it is called a partial dependency.
o Example 1: Consider table 3 below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
o Note that many courses have the same course fee. Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or
STUD_NO; COURSE_FEE together with STUD_NO cannot decide the
value of COURSE_NO; and COURSE_FEE together with COURSE_NO
cannot decide the value of STUD_NO. Hence COURSE_FEE is a
non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}. But COURSE_NO ->
COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which
is a proper subset of the candidate key. A non-prime attribute
that depends on a proper subset of the candidate key is a
partial dependency, so this relation is not in 2NF. To convert
the relation to 2NF, we split it into two tables: Table 1
(STUD_NO, COURSE_NO) and Table 2 (COURSE_NO, COURSE_FEE).
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
o NOTE: 2NF reduces the amount of redundant data that is stored.
For instance, if 100 students take course C1, we do not need
to store its fee of 1000 in all 100 records. Instead, we
store it once in the second table as the course fee for C1.
o Example 2: Consider the following functional dependencies in
relation R (A, B, C, D)
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no
partial dependency, i.e., no proper subset of AB determines any
non-prime attribute, so the relation is in 2NF.
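
The decomposition in Example 1 can be checked with a short Python sketch: project the original table onto the two new schemas, then join them back on COURSE_NO and confirm that the original rows are recovered. The variable names are chosen for this illustration.

```python
# Table 3: (STUD_NO, COURSE_NO, COURSE_FEE)
ENROLMENT = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

# Table 1: projection on (STUD_NO, COURSE_NO)
student_course = sorted({(stud, course) for stud, course, _fee in ENROLMENT})

# Table 2: projection on (COURSE_NO, COURSE_FEE); each fee is stored once.
course_fee = sorted({(course, fee) for _stud, course, fee in ENROLMENT})

# Natural join on COURSE_NO rebuilds the original relation.
rejoined = sorted(
    (stud, course, fee)
    for stud, course in student_course
    for fee_course, fee in course_fee
    if course == fee_course
)

print(course_fee)
print(rejoined == sorted(ENROLMENT))
```

The join gives back exactly the six original rows, so no information is lost by the split, while the fee of C1 is now stored only once.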
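The third normal form examples below use the following test. For every
non-trivial functional dependency X -> Y, at least one of these must
hold:
X is a super key.
Y is a prime attribute (each element of Y is part of some
candidate key).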
Example 1: In relation STUDENT given in Table 4, FD set: {STUD_NO ->
STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. It violates
the third normal form.
To convert it to third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D,
E -> A. All possible candidate keys of this relation are {A, E, CD, BC}.
All attributes on the right sides of the functional dependencies are
prime, so the relation is in 3NF.
Example 2: Find the highest normal form of a relation R(A,B,C,D,E)
with FD set {BC -> D, AC -> BE, B -> E}
Step 1: (AC)+ = {A, C, B, E, D}, but none of its proper subsets can
determine all attributes of the relation, so AC is a candidate key. A
and C cannot be derived from any other attribute of the relation, so
there is only one candidate key, {AC}.
Step 2: Prime attributes are the attributes that are part of a
candidate key, {A, C} in this example. The others, {B, D, E}, are
non-prime.
Step 3: The relation R is in 1st normal form, as a relational DBMS does
not allow multi-valued or composite attributes. The relation is in 2nd
normal form because BC -> D does not violate it (BC is not a proper
subset of candidate key AC), AC -> BE does not violate it (AC is the
candidate key), and B -> E does not violate it (B is not a proper
subset of candidate key AC).
The relation is not in 3rd normal form because in BC -> D neither is BC
a super key nor is D a prime attribute, and in B -> E neither is B a
super key nor is E a prime attribute. To satisfy 3rd normal form,
either the LHS of an FD should be a super key or the RHS should be a
prime attribute. So the highest normal form of the relation is 2nd
normal form.
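
The 3NF test used in these examples can be sketched in Python. The function third_nf_violations below is a hypothetical helper: it takes the candidate keys as input, checks only the FDs that are listed, and returns those for which the left side is not a super key and the right side contains a non-prime attribute.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """List the FDs X -> Y where X is no super key and Y is not all prime."""
    all_attrs = set(attributes)
    prime = set().union(*candidate_keys)  # attributes of any candidate key
    violations = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) == all_attrs
        dependents = set(rhs) - set(lhs)  # ignore the trivial part of the FD
        if not is_super_key and not dependents <= prime:
            violations.append((lhs, rhs))
    return violations


# Example 2: R(A,B,C,D,E), {BC -> D, AC -> BE, B -> E}, candidate key AC
example_2 = third_nf_violations(
    "ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")], ["AC"])

# Earlier relation: A -> BC, CD -> E, B -> D, E -> A, keys {A, E, CD, BC}
all_prime = third_nf_violations(
    "ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")],
    ["A", "E", "CD", "BC"])

print(example_2)
print(all_prime)
```

An empty list means that none of the listed dependencies violates 3NF, as is the case for the relation in which every attribute is prime.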
For example, consider relation R(A, B, C) with A -> BC and B -> A. A and
B are both super keys, so the relation is in BCNF.
Third Normal Form
A relation is said to be in third normal form if it has no
transitive dependency for non-prime attributes. The basic condition
for the third normal form is that the relation must already be in
second normal form.
In addition, at least one of the following conditions must hold for
every non-trivial functional dependency X -> Y:
o X is a super key.
o Y is a prime attribute (this means that each element of Y is
part of some candidate key).

Unit - 3

Transaction
o A transaction is a set of logically related operations. It contains a group
of tasks.
o A transaction is an action or series of actions performed by a single user
to access or modify the contents of the database.

Example: Suppose a bank employee transfers Rs 800 from X's account
to Y's account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
States of Transaction
In a database, a transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the
transaction is being executed.
o For example: insertion, deletion or updating of a record is done here, but
the changes are not yet saved to the database.

Partially committed
o In the partially committed state, a transaction has executed its final
operation, but the data is still not saved to the database.
o In the total mark calculation example, the final step that displays the
total marks is executed in this state.

Committed
A transaction is said to be in the committed state if it executes all its
operations successfully. In this state, all the effects are permanently
saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire
a query to fetch the marks, then the transaction will fail to execute.

Aborted
o If the transaction has reached the failed state, the database recovery
system makes sure that the database is returned to its previous consistent
state by aborting (rolling back) the transaction.
o If the transaction fails midway, all the operations it has already executed
are rolled back, so the database is restored to the consistent state it had
before the transaction started.
o After aborting the transaction, the database recovery module will select one
of the two operations:
1. Re-start the transaction
2. Kill the transaction
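
The states above form a small state machine. The sketch below encodes the allowed moves as a table and checks whether a sequence of states is legal. The names State, TRANSITIONS and is_valid_path are assumptions, and a restarted transaction is treated here as a new transaction that begins again in the active state.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Allowed moves between states, following the descriptions above.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),  # terminal state
    State.ABORTED: set(),    # terminal state (restart = new transaction)
}


def is_valid_path(path):
    """Check that a transaction starts active and only makes allowed moves."""
    if not path or path[0] is not State.ACTIVE:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


print(is_valid_path([State.ACTIVE, State.PARTIALLY_COMMITTED, State.COMMITTED]))
print(is_valid_path([State.ACTIVE, State.FAILED, State.ABORTED]))
print(is_valid_path([State.ACTIVE, State.COMMITTED]))
```

The last path is rejected because a transaction cannot jump from active to committed without passing through the partially committed state.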

ACID Properties in DBMS

A DBMS must keep its data intact whenever changes are made to it,
because if the integrity of the data is affected, the whole data set
can become corrupted. To maintain the integrity of the data, the database
management system defines four properties, known as the ACID properties.
They apply to a transaction, which works through a group of tasks.

In this section, we will learn what these properties stand for and what
each property is used for, with the help of some examples.
ACID Properties
ACID stands for Atomicity, Consistency, Isolation and Durability:

1) Atomicity
Atomicity means that a transaction is treated as a single, indivisible
unit: either all of its operations are executed completely, or none of
them is executed at all. An operation must not break off in between or
execute partially.

Example: Remo has account A with $30 in it, from which he wishes to send
$10 to Sheero's account B. Account B already holds $100, so when $10 is
transferred to it, the sum will become $110. Two operations take place:
the $10 that Remo wants to transfer is debited from his account A, and the
same amount is credited to account B, i.e., to Sheero's account. Now
suppose the debit operation executes successfully, but the credit
operation fails. Then the value in Remo's account A becomes $20, while
Sheero's account B remains at $100, as it was before.
In this case $10 has been debited from account A, but the amount in
account B is still $100. So it is not an atomic transaction.

When both the debit and the credit operations are done successfully, the
transaction is atomic.

A loss of atomicity is a serious problem in banking systems, which is why
atomicity is a main focus there.
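
The transfer example can be reproduced with Python's built-in sqlite3 module. In this sketch the table name account, the function transfer and the fail_after_debit switch are assumptions made for the illustration. The with conn: block commits if the body finishes and rolls back if an exception escapes, so a failure between the debit and the credit leaves both balances unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount, fail_after_debit=False):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_after_debit:
                raise RuntimeError("simulated failure before the credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except (RuntimeError, sqlite3.Error):
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


failed_result = transfer(conn, "A", "B", 10, fail_after_debit=True)
after_failure = balances(conn)   # the debit was rolled back
ok_result = transfer(conn, "A", "B", 10)
after_success = balances(conn)   # both debit and credit are applied

print(failed_result, after_failure)
print(ok_result, after_success)
```

Without the rollback, the failed run would leave A at $20 and B at $100, which is exactly the non-atomic outcome described above.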
2) Consistency
The word consistency means that the data must always remain correct. In
DBMS, the integrity of the data must be maintained: every transaction
must take the database from one consistent state to another, so that the
database is consistent both before and after the transaction.

Example:

There are three accounts, A, B, and C, and A makes a transaction T to B
and then to C. Two operations take place, i.e., debit and credit.
Account A initially holds $300. It first transfers $50 to account B; B
reads the amount in account A as $300 before the transaction, and after
the successful transaction T the available amount in B becomes $150.
Next, A transfers $20 to account C, and this time the value read by C is
$250 (which is correct, as the debit of $50 to B has already been done
successfully). The debit and credit operation from account A to C is also
done successfully. The transaction is done successfully and each value is
read correctly, so the data is consistent. If the value read by both B
and C had been $300, the data would be inconsistent, because C's read
would not reflect the debit that had already been executed.

3) Isolation
The term 'isolation' means separation. In DBMS, isolation is the property
that transactions running concurrently do not affect one another. Each
transaction should behave as if it were the only one running, so that the
result is the same as if the transactions had been executed one after
another. When two or more transactions occur simultaneously, consistency
must be maintained. Changes made by a transaction are not visible to other
transactions until those changes are committed.

Example: If two operations are running concurrently on two different
accounts, then neither should affect the value of the other account. The
values should remain correct. Here, account A is making transactions T1
and T2 to accounts B and C, and both execute independently without
affecting each other. This is known as isolation.

4) Durability
Durability means permanence. In DBMS, durability ensures that once an
operation has executed successfully and been committed, its changes become
permanent in the database. Committed data must survive even if the system
fails or crashes. If committed changes are lost in a failure, the recovery
manager is responsible for restoring them and so ensuring the durability of
the database. For committing the values, the COMMIT command must be used
every time we make changes.
Therefore, the ACID properties of a DBMS play a vital role in maintaining the
consistency and availability of data in the database.

This was a brief introduction to the ACID properties in DBMS. These
properties are also discussed in the transaction section.

Concurrency Control
Concurrency control is the mechanism for controlling and managing the
concurrent execution of database operations so that inconsistencies in the
database are avoided. To maintain the consistency of the database under
concurrent access, we use concurrency control protocols.

Concurrent Execution in DBMS


o In a multi-user system, multiple users can access and use the same database
at one time, which is known as concurrent execution. It means that
transactions from different users run against the same database
simultaneously.
o While working with database transactions, multiple users often need to use
the database to perform different operations at once, and in that case the
transactions are executed concurrently.
o The simultaneous execution should be done in an interleaved manner, and no
operation should affect the other executing operations, so that the
consistency of the database is maintained. Interleaving the operations of
concurrent transactions, however, raises several challenging problems that
need to be solved.

Problems with Concurrent Execution


In a database transaction, the two main operations are READ and WRITE. These
two operations must be managed during the concurrent execution of
transactions, because if they are interleaved without control, the data may
become inconsistent. The following problems occur with the concurrent
execution of operations:

Lost Update Problem (W-W Conflict)


The problem occurs when two different database transactions perform the
read/write operations on the same database items in an interleaved manner
(i.e., concurrent execution) that makes the values of the items incorrect
hence making the database inconsistent.

For example:

Consider the below diagram where two transactions, TX and TY, are
performed on the same account A where the balance of account A is
$300.

o At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
o At time t2, transaction TX deducts $50 from account A, giving $250 (only
computed locally, not yet written).
o At time t3, transaction TY reads the value of account A, which is still $300
because TX has not written its update yet.
o At time t4, transaction TY adds $100 to account A, giving $400 (only computed
locally, not yet written).
o At time t6, transaction TX writes the value of account A, which is updated
to $250, as TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value it computed at time
t4, i.e., $400. This overwrites the value written by TX, so the $250 update
is lost.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
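
The interleaving above can be replayed in a few lines. The following is a minimal sketch in Python, not DBMS code: the dictionary db and the variables tx_local and ty_local are illustrative names standing in for the stored balance and each transaction's private working copy.

```python
# Replay of the lost-update schedule: each transaction works on a private
# copy of the balance and writes it back later.
db = {"A": 300}              # shared database item

tx_local = db["A"]           # t1: TX reads A (300)
tx_local -= 50               # t2: TX deducts 50 locally (250), not written yet
ty_local = db["A"]           # t3: TY reads A, still 300
ty_local += 100              # t4: TY adds 100 locally (400), not written yet
db["A"] = tx_local           # t6: TX writes 250
db["A"] = ty_local           # t7: TY writes 400, overwriting TX's update

final_balance = db["A"]
serial_balance = 300 - 50 + 100   # result of running TX and TY one after another
print(final_balance, serial_balance)   # prints: 400 350
```

The $50 deduction never reaches the final balance, which is exactly the update that was lost.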

Dirty Read Problems (W-R Conflict)


The dirty read problem occurs when one transaction updates a database item and
then fails, but before the update is rolled back, another transaction reads
the updated item. This is a Write-Read conflict between the two transactions.

For example:

Consider two transactions TX and TY in the below diagram performing
read/write operations on account A where the available balance in account A
is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, giving $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the
value changes back to $300 (as initially).
o But transaction TY has already read $350 for account A, a value that was
never committed. This is a dirty read, and the situation is known as the
Dirty Read Problem.
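
The same schedule can be replayed as a small Python sketch. It only imitates the sequence of steps; the names db, before_image, and ty_seen are illustrative assumptions, and a real DBMS would perform the rollback from its own log.

```python
# Replay of the dirty-read schedule on a single shared item.
db = {"A": 300}
before_image = db["A"]       # value saved so that TX can be rolled back

tx_local = db["A"]           # t1: TX reads 300
tx_local += 50               # t2: TX adds 50
db["A"] = tx_local           # t3: TX writes 350 (not committed)
ty_seen = db["A"]            # t4: TY reads the uncommitted 350
db["A"] = before_image       # t5: TX rolls back, A is 300 again

print(ty_seen, db["A"])      # prints: 350 300
```

TY is now working with a balance that never existed in any committed state of the database.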

Unrepeatable Read Problem (R-W Conflict)


Also known as the Inconsistent Retrievals Problem, it occurs when a
transaction reads two different values for the same database item.

For example:

Consider two transactions, TX and TY, performing read/write operations on
account A, which has an available balance of $300. The diagram is shown
below:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to
the available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A,
and that will be read as $400.
o It means that within the same transaction TX, two different values of
account A are read, i.e., $300 initially and $400 after the update made by
transaction TY. This is an unrepeatable read, and the situation is known as
the Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid such
problems in concurrent execution, management is needed, and that is where
Concurrency Control comes into play.

Types of Serializability
There are two ways to check whether any non-serial schedule is
serializable.

Types of Serializability – Conflict & View


1. Conflict serializability
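A schedule is conflict serializable if it can be transformed into a serial
schedule by swapping adjacent non-conflicting operations. Two operations
conflict when they belong to different transactions, access the same data
item, and at least one of them is a write. Conflict serializability is
tested with a precedence graph: the schedule is conflict serializable
exactly when the graph has no cycle.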
Example
Three transactions—t1, t2, and t3—are active on a schedule “S” at
once. Let’s create a graph of precedence.
Transaction – 1    Transaction – 2    Transaction – 3
(t1)               (t2)               (t3)
R(a)
                   R(b)
                                      R(b)
                   W(b)
W(a)
                                      W(a)
                   R(a)
                   W(a)

It is a conflict serializable schedule because its precedence graph is a
DAG, i.e., it has no cycles. The schedule itself is not serial, since the
operations of the three transactions are interleaved, but a topological order
of the graph gives an equivalent serial schedule.
DAG of transactions

As there is no incoming edge on t1, t1 will be executed first. t3 will run
second because it only depends on t1. Due to its dependence on both t1 and
t3, t2 will be executed last.
Therefore, the serial schedule’s equivalent order is: t1 –> t3 –> t2
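
The precedence-graph test can be sketched in Python as follows. This is an illustration under stated assumptions, not a DBMS routine: the assignment of each operation to t1, t2, or t3 is an assumption chosen to be consistent with the order t1 –> t3 –> t2 stated above, and the function names precedence_edges and serial_order are hypothetical. The sketch relies on graphlib from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# Schedule as (transaction, action, item) in execution order.
# ASSUMPTION: which transaction issues each operation is inferred so that
# it agrees with the serial order given in the text.
schedule = [
    ("t1", "R", "a"), ("t2", "R", "b"), ("t3", "R", "b"), ("t2", "W", "b"),
    ("t1", "W", "a"), ("t3", "W", "a"), ("t2", "R", "a"), ("t2", "W", "a"),
]

def precedence_edges(schedule):
    """Return edges (ti, tj): an operation of ti conflicts with a later one of tj."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def serial_order(schedule):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter({t: set() for t, _, _ in schedule})
    for before, after in precedence_edges(schedule):
        sorter.add(after, before)          # 'after' must run later than 'before'
    try:
        return list(sorter.static_order())
    except CycleError:
        return None                        # not conflict serializable

print(sorted(precedence_edges(schedule)))
print(serial_order(schedule))
```

A schedule whose graph contains a cycle, such as the lost-update interleaving, makes serial_order return None.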

2. View Serializability
A schedule is view serializable if it is view equivalent to some serial
schedule. Two schedules are view equivalent when, for every data item, the
same transactions read the initial value, every read sees the value written
by the same transaction in both schedules, and the same transaction performs
the final write. View serializability, like conflict serializability, is
concerned with avoiding database inconsistency, but it accepts some
schedules that conflict serializability rejects. Checking view
serializability means determining whether a schedule is view equivalent to a
serial one.
Example
We have a schedule “S” with two concurrently running transactions,
“t1” and “t2.”
Schedule – S:
Transaction-1    Transaction-2
(t1)             (t2)
R(a)
W(a)
                 R(a)
                 W(a)
R(b)
W(b)
                 R(b)
                 W(b)

By rearranging the interleaved read-write operations of the two transactions
so that t1 runs entirely before t2, let’s create its view equivalent
schedule (S’).
Schedule – S’:
Transaction-1    Transaction-2
(t1)             (t2)
R(a)
W(a)
R(b)
W(b)
                 R(a)
                 W(a)
                 R(b)
                 W(b)

It is a view serializable schedule, since it is view equivalent to the
serial schedule S’.
Note: A conflict serializable schedule is always view serializable, but the
converse is not always true.
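
View equivalence can be checked mechanically. The sketch below is a simplified illustration in Python: the function names view_profile and view_equivalent are hypothetical, the two schedules assume that t1 and t2 each read and then write both a and b, and the check assumes that a transaction does not read the same item from the same writer more than once.

```python
def view_profile(schedule):
    """Summarise what view equivalence compares: which write each read sees
    (None means the initial value) and which transaction writes each item last."""
    last_writer = {}          # item -> transaction that wrote it most recently
    reads_from = set()        # (reader, item, writer or None)
    for txn, action, item in schedule:
        if action == "R":
            reads_from.add((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    """Two schedules are view equivalent when their profiles match."""
    return view_profile(s1) == view_profile(s2)

# Interleaved schedule S and the serial schedule S' (t1 before t2).
S = [("t1", "R", "a"), ("t1", "W", "a"), ("t2", "R", "a"), ("t2", "W", "a"),
     ("t1", "R", "b"), ("t1", "W", "b"), ("t2", "R", "b"), ("t2", "W", "b")]
S_serial = [op for op in S if op[0] == "t1"] + [op for op in S if op[0] == "t2"]
S_reversed = [op for op in S if op[0] == "t2"] + [op for op in S if op[0] == "t1"]

print(view_equivalent(S, S_serial))     # prints: True
print(view_equivalent(S, S_reversed))   # prints: False
```

S matches the serial order t1 then t2 but not t2 then t1, because in the reversed order t2 would read the initial values instead of the values written by t1.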

Concurrency Control Protocols


Concurrency control protocols are sets of rules that are followed in order
to solve the concurrency control problems in the database. They ensure that
concurrent transactions can execute properly while maintaining database
consistency. The protocols support the ACID properties of concurrent
transactions, chiefly isolation and consistency, by keeping their execution
serializable.
• Lock-based concurrency control protocol
• Timestamp-based concurrency control protocol

Lock-based Protocol

In a lock-based protocol, each transaction needs to acquire a lock on a data
item before it reads or modifies that item. There are two types of locks
used in databases.
• Shared Lock: A shared lock is also known as a read lock. It allows
  multiple transactions to read the data item simultaneously. A transaction
  holding a shared lock can only read the data item; it cannot modify it.
• Exclusive Lock: An exclusive lock is also known as a write lock. It
  allows a transaction to update a data item. Only one transaction can hold
  the exclusive lock on a data item at a time. While a transaction holds an
  exclusive lock on a data item, no other transaction is allowed to acquire
  a shared or exclusive lock on the same data item.
There are two kinds of lock-based protocols mostly used in databases:
• Two Phase Locking Protocol: Two phase locking is a widely used technique
  that strictly orders lock acquisition and release. It works in two phases.
    o Growing Phase: In this phase, the transaction may acquire locks on
      the data items it needs but may not release any lock.
    o Shrinking Phase: In this phase, the transaction releases the locks it
      has acquired. Once the transaction starts releasing locks, it cannot
      acquire any further locks.
• Strict Two Phase Locking Protocol: It is almost the same as the two phase
  locking protocol. The only difference is that in two phase locking a
  transaction can release its locks before it commits, whereas in strict two
  phase locking a transaction may release its locks only when it commits or
  aborts.
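
The lock compatibility rules and the two-phase rule can be sketched as below. This is a toy illustration in Python, not how any particular DBMS implements locking: the class names LockManager, Transaction, and TwoPhaseError are hypothetical, and a refused lock simply returns False where a real system would make the transaction wait.

```python
SHARED, EXCLUSIVE = "S", "X"

class TwoPhaseError(Exception):
    """Raised when a transaction tries to lock after it has started unlocking."""

class LockManager:
    def __init__(self):
        self.locks = {}   # item -> {transaction name: lock mode}

    def can_grant(self, txn, item, mode):
        others = {t: m for t, m in self.locks.get(item, {}).items() if t != txn}
        if mode == SHARED:
            # Shared locks are compatible only with other shared locks.
            return all(m == SHARED for m in others.values())
        return not others   # exclusive needs the item free of other holders

class Transaction:
    def __init__(self, name, manager):
        self.name, self.manager = name, manager
        self.shrinking = False          # False while in the growing phase

    def lock(self, item, mode):
        if self.shrinking:
            raise TwoPhaseError(f"{self.name} cannot lock after unlocking")
        if not self.manager.can_grant(self.name, item, mode):
            return False                # a real DBMS would block here
        self.manager.locks.setdefault(item, {})[self.name] = mode
        return True

    def unlock(self, item):
        self.shrinking = True           # the first release ends the growing phase
        self.manager.locks.get(item, {}).pop(self.name, None)

manager = LockManager()
t1, t2 = Transaction("T1", manager), Transaction("T2", manager)
granted = [
    t1.lock("A", SHARED),      # granted
    t2.lock("A", SHARED),      # granted: shared locks are compatible
    t2.lock("A", EXCLUSIVE),   # refused: T1 still holds a shared lock
]
t1.unlock("A")                           # T1 enters its shrinking phase
granted.append(t2.lock("A", EXCLUSIVE))  # granted: T2 is now the only holder
print(granted)                           # prints: [True, True, False, True]
```

After its first unlock, T1 is in the shrinking phase, so any further call to t1.lock raises TwoPhaseError.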
Timestamp-based Protocol
• In this protocol, each transaction has a timestamp attached to it. The
  timestamp is the time at which the transaction enters the system.
• The timestamp ordering protocol resolves conflicting pairs of operations
  by using the timestamp values of the transactions, guaranteeing that
  conflicting operations execute in timestamp order.
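
One common way to apply timestamps is the basic timestamp-ordering rule found in standard textbooks, which these notes do not spell out. The Python sketch below shows that rule under this assumption; the names Item, read, and write are illustrative, and a return value of False stands for "the transaction must be rolled back and restarted".

```python
class Item:
    """A data item that remembers the youngest reader and writer so far."""
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # largest timestamp of any transaction that wrote it

def read(item, ts):
    """Transaction with timestamp ts reads item; False means it must abort."""
    if ts < item.write_ts:
        return False                    # a younger transaction already wrote it
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Transaction with timestamp ts writes item; False means it must abort."""
    if ts < item.read_ts or ts < item.write_ts:
        return False                    # a younger transaction already used it
    item.write_ts = ts
    return True

a = Item()
results = [
    read(a, 2),    # transaction 2 reads a
    write(a, 1),   # older transaction 1 writes too late and is rejected
    write(a, 2),   # transaction 2 writes a
    read(a, 1),    # transaction 1 would read a value from its future
    read(a, 3),    # younger transaction 3 reads a
]
print(results)     # prints: [True, False, True, False, True]
```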

Unit-4
1. DAC (Discretionary Access Control):
• Definition: DAC is a model of access control where the owner of an object
  (e.g., a file or a database table) has discretion over who can access and
  manipulate that object. In other words, the owner of the resource can
  grant or revoke permissions to other users or entities.
• Key Features:
    o Users have the discretion to control access to their own resources.
    o Typically, permissions are associated with individual users or
      groups, and the owner decides who gets what level of access.
    o Common implementations include file-level permissions on operating
      systems and some database systems that allow users to grant
      privileges to others.

Example: Consider a shared folder on a file server in a small company. The
folder contains project files that several employees are collaborating on. In a
DAC system:

• User A, who created the folder, can decide to grant read and write
  access to User B and read-only access to User C.
• User B, if the granted permissions include the right to share, can
  further decide to allow or deny access to User D for specific files
  within the folder.
• User C may decide to restrict User A's access to a subfolder that User C
  owns.
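
The owner-controlled nature of DAC can be sketched in Python. This is an illustrative model only: the class Resource, the right names "read", "write", and "grant", and the rule that only holders of "grant" may pass rights on are assumptions made for the example, loosely modelled on the shared folder above.

```python
class Resource:
    """A resource with an owner and an access control list (ACL)."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        # The owner starts with every right, including the right to grant.
        self.acl = {owner: {"read", "write", "grant"}}

    def grant(self, grantor, user, *rights):
        """Let grantor give rights to user, if the grantor may do so."""
        if "grant" not in self.acl.get(grantor, set()):
            raise PermissionError(f"{grantor} may not grant access to {self.name}")
        self.acl.setdefault(user, set()).update(rights)

    def allowed(self, user, right):
        return right in self.acl.get(user, set())

folder = Resource("project-files", owner="A")
folder.grant("A", "B", "read", "write", "grant")  # B may read, write, and share
folder.grant("A", "C", "read")                    # C is read-only
folder.grant("B", "D", "read")                    # B passes read access on to D

print(folder.allowed("D", "read"), folder.allowed("C", "write"))  # prints: True False
```

In this sketch a call such as folder.grant("C", "D", "write") raises PermissionError, because C holds no "grant" right on the folder.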

2. MAC (Mandatory Access Control):

• Definition: MAC is a more rigid access control model where access is
  controlled by a central authority or security policy. Users and objects
  are assigned security labels, and access decisions are based on these
  labels. Users can access objects only if they have the necessary security
  clearance.
• Key Features:
    o Access decisions are determined by system-wide security policies
      rather than the discretion of individual users.
    o Users and objects are assigned security labels, such as security
      levels or categories.
    o The Bell-LaPadula model and the Biba model are well-known examples of
      MAC models used for access control in various security-conscious
      environments.

Example: Imagine a highly secure government database containing classified
information. In a MAC system:
• Each user, such as an agent or analyst, is assigned a security clearance
  level, such as "Top Secret," "Secret," or "Unclassified."
• Each document or record in the database is labeled with its
  classification level, such as "Top Secret," "Secret," or "Unclassified."
• Access to the database is controlled by a strict policy that only allows
  users with a matching or higher clearance level to access documents.

In this example, access decisions are not at the discretion of individual users
but are determined by the security labels and the security policy, making it a
MAC system.
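
The label comparison in this example can be sketched in a few lines of Python. The numeric ranking in LEVELS and the function name can_read are assumptions made for illustration; the sketch covers only the read rule described above, not a complete MAC model.

```python
# Security levels ordered from lowest to highest (ranking is an assumption).
LEVELS = {"Unclassified": 0, "Secret": 1, "Top Secret": 2}

def can_read(clearance, classification):
    """A user may read a document only at or below their clearance level."""
    return LEVELS[clearance] >= LEVELS[classification]

print(can_read("Secret", "Unclassified"))  # prints: True
print(can_read("Secret", "Secret"))        # prints: True
print(can_read("Secret", "Top Secret"))    # prints: False
```

The decision depends only on the two labels and the policy, never on a choice made by the document's author.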

3. RBAC (Role-Based Access Control):

• Definition: RBAC is a model of access control that is based on the roles
  and responsibilities of users within an organization. Access permissions
  are associated with roles, and users are assigned to roles based on their
  job functions.
• Key Features:
    o Users are assigned to roles, and roles are associated with specific
      permissions.
    o Access control decisions are based on the roles that users have,
      rather than their individual identities.
    o RBAC simplifies access management by organizing permissions and users
      into logical groups.

Example: Consider a hospital's patient records system. In an RBAC system:

• Roles are defined, such as "Doctor," "Nurse," "Administrator," and
  "Clerk."
• Permissions are associated with these roles. For instance, the "Doctor"
  role may have permissions to view and update patient records, while the
  "Clerk" role may only have permission to view basic patient information.
• Users, such as Dr. Smith and Nurse Johnson, are assigned roles based on
  their job titles.
• Dr. Smith can access and modify patient records because they have the
  "Doctor" role, while Nurse Johnson can only view records with her "Nurse"
  role.

It's important to note that these access control models can sometimes be
used in combination within a DBMS, depending on the security requirements
of the organization and the sensitivity of the data being managed. For
example, an organization might use MAC for highly classified data, DAC for
less sensitive data, and RBAC for managing access to various applications
and systems within the organization.
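
To make the role-based model concrete, the hospital example above can be sketched in Python. The permission names, the user identifiers, and the function is_allowed are hypothetical and chosen only for this illustration; the "Administrator" role is left out because the notes do not say what it may do.

```python
# Permissions are attached to roles, never directly to users.
ROLE_PERMISSIONS = {
    "Doctor": {"view_records", "update_records"},
    "Nurse": {"view_records"},
    "Clerk": {"view_basic_info"},
}

# Users are assigned roles according to their job function.
USER_ROLES = {
    "dr_smith": {"Doctor"},
    "nurse_johnson": {"Nurse"},
}

def is_allowed(user, permission):
    """A user holds a permission if any of their roles grants it."""
    return any(permission in ROLE_PERMISSIONS[role]
               for role in USER_ROLES.get(user, ()))

print(is_allowed("dr_smith", "update_records"))       # prints: True
print(is_allowed("nurse_johnson", "update_records"))  # prints: False
```

Changing what nurses may do means editing one entry in ROLE_PERMISSIONS rather than updating every nurse individually.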

The choice of access control model depends on factors such as the
organization's security policies, regulatory requirements, and the need for
centralized control versus user discretion in managing access to data.

Intrusion Detection:

Intrusion detection in a Database Management System (DBMS) involves
monitoring and analyzing activities within a database to identify and respond
to unauthorized or suspicious access, queries, or modifications. There are
two main types of intrusion detection systems (IDS) in DBMS:

1. Host-Based Intrusion Detection System (HIDS): This type of IDS focuses
on monitoring activities on the database server itself.
2. Network-Based Intrusion Detection System (NIDS): This type of
IDS examines network traffic to and from the database server.

Scenario: You're securing a sensitive financial database.

Steps:

1. Define Normal Behavior: Understand typical database access patterns.
2. Install and Configure HIDS: Use Host-Based Intrusion Detection
System (HIDS) to monitor activities.
3. Define Alerts: Set alerts for unusual activities (e.g., failed logins,
late-night access).
4. Monitoring: HIDS continuously monitors the DBMS.
5. Incident Response: Investigate and take action when alerts are
triggered.

Example Alert: Unusual 3:00 AM access to sensitive financial data triggers
an alert, prompting an investigation to determine if it's legitimate or a
potential intrusion.
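
The alert rules from step 3 can be sketched as a small Python function. This is a toy illustration, not the interface of any real HIDS product: the event format, the thresholds max_failed and night_hours, and the user names in the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime

def alerts(events, max_failed=3, night_hours=range(0, 5)):
    """Scan audit events and return human-readable alerts.

    Each event is a dict with 'user', 'time' (datetime), 'action', and 'ok'.
    """
    found = []
    # Rule 1: too many failed logins by the same user.
    failed = Counter(e["user"] for e in events
                     if e["action"] == "login" and not e["ok"])
    for user, count in failed.items():
        if count >= max_failed:
            found.append(f"{count} failed logins for {user}")
    # Rule 2: queries issued during the night.
    for e in events:
        if e["action"] == "query" and e["time"].hour in night_hours:
            found.append(f"off-hours access by {e['user']} at {e['time']:%H:%M}")
    return found

# Hypothetical sample data for the financial-database scenario.
sample = [
    {"user": "mallory", "time": datetime(2024, 1, 5, 22, 0), "action": "login", "ok": False},
    {"user": "mallory", "time": datetime(2024, 1, 5, 22, 1), "action": "login", "ok": False},
    {"user": "mallory", "time": datetime(2024, 1, 5, 22, 2), "action": "login", "ok": False},
    {"user": "alice", "time": datetime(2024, 1, 6, 3, 0), "action": "query", "ok": True},
    {"user": "bob", "time": datetime(2024, 1, 6, 14, 0), "action": "query", "ok": True},
]
for line in alerts(sample):
    print(line)
```

Each alert is only a prompt for investigation; deciding whether the 3:00 AM access is legitimate remains the job of the incident response step.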

SQL Injection:
SQL injection is a specific type of cyberattack that targets vulnerabilities in a DBMS by injecting
malicious SQL code into user inputs. This code can exploit vulnerabilities to gain unauthorized
access to the database or manipulate its contents. SQL injection attacks often occur when user
inputs are not properly validated or sanitized before being used in SQL queries.
Here's a simplified example of an SQL injection attack:

Suppose a website has a search box that allows users to search for products by name. The
website's code might construct an SQL query like this:

SELECT * FROM products WHERE name = 'user_input';

If an attacker enters the following text into the search box:

' OR 1=1 --

The SQL query becomes:

SELECT * FROM products WHERE name = '' OR 1=1 -- ';

In this case, the attacker has injected SQL code that always evaluates to true (1=1), and the
trailing -- turns the rest of the line into a comment. The name filter is effectively bypassed, so
the query returns all product records. The same technique used against a login query can bypass
authentication.

To prevent SQL injection, developers should use parameterized queries or prepared statements,
which keep user input separate from the SQL text so that the input is always treated as data
and never as SQL code.
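
The difference between the two approaches can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory products table and its three rows are made up for the demonstration, and the input string is the one inferred from the resulting query shown above.

```python
import sqlite3

# In-memory database with a small, made-up products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("pen", 1.5), ("notebook", 3.0), ("stapler", 7.25)])

user_input = "' OR 1=1 -- "

# Vulnerable: the input is pasted into the SQL text and changes the query logic.
unsafe_sql = "SELECT * FROM products WHERE name = '" + user_input + "';"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and treated purely as a value.
safe_rows = conn.execute("SELECT * FROM products WHERE name = ?",
                         (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))   # prints: 3 0
```

The concatenated query returns every row, while the parameterized query looks for a product literally named ' OR 1=1 -- and finds none.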

Intrusion detection systems can help detect SQL injection attempts by monitoring database
activity for unusual SQL queries, unexpected patterns, or signs of unauthorized access. When
suspicious activity is detected, the IDS can trigger alerts or take action to mitigate the threat.

In summary, intrusion detection systems are crucial for identifying and responding to security
threats in a DBMS, including SQL injection attacks, which are a common and potentially
damaging form of attack that targets vulnerabilities in database queries.

Data Warehousing:

Imagine you have a big store, and you want to keep track of all the items
you sell and how much money you make. You collect this information every
day and store it in a special place called a "warehouse" just for this data.

• Data Warehouse: It's like a giant storage room for all your business
  data, where you organize and keep records of everything that happens in
  your store. It's designed to make it easy to find and use this
  information whenever you need it.

Data Mining:

Now that you have this treasure trove of data in your warehouse, you want
to do something useful with it. Data mining is like having a team of
detectives go through all the information to find hidden patterns, valuable
insights, and answers to important questions.
• Data Mining: It's the process of digging through your stored data to
  discover things you might have missed. It helps you find trends, make
  predictions, and uncover valuable information, kind of like solving
  mysteries hidden in your data.

Aspect                 Data Warehousing                                       Data Mining
Purpose                Stores and manages data.                               Analyzes and extracts insights from data.
Main Function          Data storage and organization.                         Data analysis and pattern discovery.
Data Source            Collects data from various sources.                    Utilizes data from data warehouse or other sources.
Data Structure         Organized and structured data.                         Analyzes structured and unstructured data.
Timeframe              Historical and current data.                           Focuses on historical data for pattern discovery.
Queries and Reporting  Supports querying and reporting.                       Focuses on advanced analytics and modeling.
Goal                   To provide a centralized repository for data.          To uncover hidden insights and knowledge.
Example Use Case       Storing sales data from multiple stores in one place.  Discovering which products are often bought together.
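
The data mining use case in the last row, finding products that are often bought together, can be illustrated with a very small Python sketch. The baskets list is invented sample data, and counting pairs is only the simplest form of this kind of analysis, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Made-up sales baskets, as they might be pulled from a data warehouse.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)   # prints: ('bread', 'butter') 3
```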
