Database Management System
• Types of DBMS
• The four types of DBMS are:
• Hierarchical database
• Network database
• Relational database
• Object-Oriented database
• In the Hierarchical database model, data is organized in a tree-like structure. Data is stored in a hierarchical (top-down or bottom-up) format and is represented using parent-child relationships. In a Hierarchical DBMS, a parent may have many children, but a child has only one parent.
• Advantages
• The design of the hierarchical model is simple.
• It provides data integrity, since it is based on the parent/child relationship.
• Data sharing is feasible, since the data is stored in a single database.
• The model works well even for large volumes of data.
• Disadvantages
• Implementation is complex.
• The model has to deal with insert, update and delete anomalies.
• Maintenance is difficult, since a change in the database may require changes to the entire database structure.
Network Model
• Disadvantages of the Object-Oriented database
• Object databases are not widely adopted.
• In some situations, the high complexity can cause
performance problems.
Advantages of DBMS
Users view data in the form of rows and columns. Tables and relations are used to store data. Multiple views of the same database may exist. Users simply view and interact with the data; storage and implementation details are hidden from them.
Data Independence
The main purpose of data abstraction is achieving data independence
in order to save time and cost required when the database is modified
or altered.
We have two levels of data independence arising from these levels of abstraction:
Physical level data independence: It refers to the ability to modify the physical schema without any alteration to the conceptual or logical schema, done for optimization purposes. For example, the conceptual structure of the database is not affected by a change in the storage size of the database system server. Changing from sequential to random access files is one such example.
Logical level data independence: It refers to the ability to modify the conceptual schema without having to change the external schemas or application programs.
• Sophisticated Users:
Sophisticated users can be engineers, scientists or business analysts who are familiar with the database. They can develop their own database applications according to their requirements. They do not write program code; instead, they interact with the database by writing SQL queries directly through the query processor.
• Database Designers:
Database designers are the users who design the structure of the database, which includes tables, indexes, views, constraints, triggers and stored procedures. They control what data must be stored and how the data items are to be related.
• Application Programmers:
Application programmers are the back-end programmers who write the code for application programs. They are computer professionals. These programs can be written in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL, etc.
• DDL Interpreter –
It processes DDL statements into a set of tables containing metadata (data about data).
• Embedded DML Pre-compiler –
It processes DML statements embedded in an
application program into procedural calls.
• Query Evaluation Engine –
It executes the low-level instructions generated by the DML compiler.
• 2. Storage Manager :
Storage Manager is a program that provides an interface between the data stored
in the database and the queries received. It is also known as Database Control
System. It maintains the consistency and integrity of the database by applying the
constraints and executes the DCL statements. It is responsible for updating,
storing, deleting, and retrieving data in the database.
It contains the following components –
• Authorization Manager –
It ensures role-based access control, i.e., it checks whether a particular user is privileged to perform the requested operation or not.
• Integrity Manager –
It checks the integrity constraints when the database is modified.
• Transaction Manager –
It controls concurrent access by performing the operations
in a scheduled way that it receives the transaction. Thus, it
ensures that the database remains in the consistent state
before and after the execution of a transaction.
• File Manager –
It manages the file space and the data structure used to
represent information in the database.
• Buffer Manager –
It is responsible for cache memory and the transfer of data
between the secondary storage and main memory.
• 3. Disk Storage :
It contains the following components –
• Data Files –
It stores the data.
• Data Dictionary –
It contains the information about the structure of
any database object. It is the repository of
information that governs the metadata.
Database
ER model
a. Weak Entity
• An entity that depends on another entity is called a weak entity. A weak entity does not contain any key attribute of its own. A weak entity is represented by a double rectangle.
2. Attribute
• An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
• For example, id, age, contact number, name, etc. can be
attributes of a student.
• a. Key Attribute
• The key attribute is used to represent the main characteristic of an entity. It represents a primary key. The key attribute is represented by an ellipse with the text underlined.
• b. Composite Attribute
• An attribute that is composed of many other attributes is known as a composite attribute. The composite attribute is represented by an ellipse, and the ellipses of its component attributes are connected to that ellipse.
• c. Multivalued Attribute
• An attribute can have more than one value. Such attributes are known as multivalued attributes. A double oval is used to represent a multivalued attribute.
• For example, a student can have more than one phone number.
• d. Derived Attribute
• An attribute that can be derived from other attributes is known as a derived attribute. It is represented by a dashed ellipse.
• For example, a person's age changes over time and can be derived from another attribute, Date of Birth.
• 3. Relationship
• A relationship is used to describe the relation between entities. A diamond (rhombus) is used to represent a relationship.
• Types of relationship are as follows:
• a. One-to-One Relationship
• When only one instance of an entity is associated with the relationship, it is known as a one-to-one relationship.
• For example, a female can marry one male, and a male can marry one female.
• b. One-to-Many Relationship
• When only one instance of the entity on the left and more than one instance of the entity on the right are associated with the relationship, it is known as a one-to-many relationship.
• For example, a scientist can make many inventions, but each invention is made by one specific scientist.
• 1. Primary Key
• It is the key which is used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which is most suitable from that list becomes the primary key.
• In the EMPLOYEE table, ID can be the primary key, since it is unique for each employee. In the EMPLOYEE table, we could even select License_Number or Passport_Number as the primary key, since they are also unique.
• For each entity, the selection of the primary key is based on requirements and the developer's choice.
• 2. Candidate Key
• A candidate key is an attribute or set of attributes which can uniquely identify a tuple.
• The remaining such attributes, except for the primary key, are considered candidate keys. Candidate keys are as strong as the primary key.
• For example: in the EMPLOYEE table, ID is best suited for the primary key. The rest of the attributes, like SSN, Passport_Number and License_Number, are considered candidate keys.
• 3. Super Key
• A super key is a set of attributes that can uniquely identify an entity. It is a superset of a candidate key.
n-ary Relationship –
When there are n entity sets participating in a relationship set, the relationship is called an n-ary relationship.
• Cardinality:
The number of times an entity of an entity set
participates in a relationship set is known as
cardinality. Cardinality can be of different types:
• One to one – When each entity in each entity set can take part only once in the relationship, the cardinality is one to one. Let us assume that a male can marry one female and a female can marry one male. So the relationship will be one to one.
• Many to one – When entities in one entity set can take part only once in the relationship set and entities in the other entity set can take part more than once in the relationship set, the cardinality is many to one. Let us assume that a student can take only one course, but one course can be taken by many students. So the cardinality will be n to 1. It means that for one course there can be n students, but for one student there will be only one course.
• Many to many – When entities in all entity sets can take part more than once in the relationship, the cardinality is many to many. Let us assume that a student can take more than one course and one course can be taken by many students. So the relationship will be many to many.
• Participation Constraint:
A participation constraint is applied on an entity participating in the relationship set.
1. Domain Constraints
Domain constraints specify the valid set of values for an attribute. The data type of a domain includes string, character, integer, time, date, currency, etc. The value of the attribute must come from the corresponding domain.
EXAMPLE
2. Entity integrity constraints
The entity integrity constraint states that primary key value can't be
null.
This is because the primary key value is used to identify individual
rows in relation and if the primary key has a null value, then we
can't identify those rows.
A table can contain a null value in any field other than the primary key field.
Example:
3. Referential Integrity Constraints
A referential integrity constraint is specified between two tables: a foreign key in the referencing table must either match a primary key value of the referenced table or be null.
Example:
4. Key Constraints
Keys are the attributes that are used to identify an entity within its entity set uniquely.
An entity set can have multiple keys, but one of them will be the primary key. A primary key must hold a unique value in the relational table.
Example:
SQL Statements
Show databases;
Create database db_name;
Use db_name;
Select database();
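For illustration, a minimal session using these statements (the database name college is hypothetical):
CREATE DATABASE college;   -- create a new database
USE college;               -- make it the current database
SELECT DATABASE();         -- returns: college
SHOW DATABASES;            -- list all databases on the server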
Creating Table
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype
);
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
Syntax
CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
column3 datatype constraint,
....
);
SQL Constraints
SQL constraints are used to specify rules for the data in a table.
Constraints are used to limit the type of data that can go into a table.
This ensures the accuracy and reliability of the data in the table. If there
is any violation between the constraint and the data action, the action is
aborted.
Constraints can be column level or table level. Column level
constraints apply to a column, and table level constraints apply
to the whole table.
The following constraints are commonly used in SQL:
•NOT NULL - Ensures that a column cannot have a NULL value
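The other constraints commonly listed alongside NOT NULL are UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK and DEFAULT. A minimal sketch reusing the Persons table from above with column-level constraints (the Age and City columns here are illustrative additions):
CREATE TABLE Persons (
PersonID int NOT NULL PRIMARY KEY,    -- must be present and unique
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int CHECK (Age >= 0),             -- reject negative ages
City varchar(255) DEFAULT 'Unknown'   -- used when no city is supplied
);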
Updating a table:
UPDATE tutorials_tbl SET tutorial_title = 'Learning JAVA' WHERE tutorial_id = 3;
SQL can perform various tasks like creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.
There are five types of SQL commands: DDL, DML, DCL, TCL, and
DQL.
1. Data Definition Language (DDL)
DDL changes the structure of the table like creating a table, deleting a
table, altering a table, etc.
CREATE
ALTER
DROP
TRUNCATE
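A brief sketch of how the remaining DDL commands are used (the Persons table and Email column are assumed for illustration):
ALTER TABLE Persons ADD Email varchar(255);  -- add a new column
TRUNCATE TABLE Persons;                      -- delete all rows, keep the table structure
DROP TABLE Persons;                          -- delete the table itself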
• CREATE TABLE tname (c1 dt1, c2 dt2, c3 dt3);
• UPDATE Customers
SET ContactName = 'Alfred Schmidt',
City= 'Frankfurt'
WHERE CustomerID = 1;
• UPDATE Customers
SET ContactName='Juan';
(Without a WHERE clause, every record in the table is updated.)
DCL commands are used to grant and take back authority from any
database user.
Grant
Revoke
• GRANT SELECT ON employees TO
bob@localhost;
• GRANT INSERT, UPDATE, DELETE ON
employees TO bob@localhost;
• GRANT DELETE ON classicmodels.employees
TO bob@localhost;
• GRANT INSERT ON classicmodels.* TO
bob@localhost;
• REVOKE INSERT, UPDATE ON classicmodels.*
FROM rfc@localhost;
TCL commands can only be used with DML commands like INSERT, DELETE and UPDATE.
COMMIT
ROLLBACK
SAVEPOINT
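A sketch of how these TCL commands combine, assuming a hypothetical Accounts table:
START TRANSACTION;
UPDATE Accounts SET balance = balance - 100 WHERE acc_no = 1;
SAVEPOINT s1;      -- mark a point we can roll back to
UPDATE Accounts SET balance = balance + 100 WHERE acc_no = 2;
ROLLBACK TO s1;    -- undo only the work done after s1
COMMIT;            -- make the remaining changes permanent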
5. Data Query Language
SELECT
To create the view, we can select the fields from one or more tables
present in the database.
A view can either have specific rows based on certain condition or all
the rows of a table.
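A minimal sketch of creating and querying a view, using the Customers table that appears in later examples:
CREATE VIEW BrazilCustomers AS
SELECT CustomerName, ContactName
FROM Customers
WHERE Country = 'Brazil';
SELECT * FROM BrazilCustomers;   -- the view is queried like an ordinary table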
Select
Project
Union
Set difference
Cartesian product
Rename
Types of Relational operation
Select Operation (σ)
It selects tuples that satisfy the given predicate
from a relation.
Notation − σp(r)
For example − σ subject = "database" (Books) selects the tuples from Books whose subject is 'database'.
Union Operation (∪)
r ∪ s = { t | t ∈ r or t ∈ s }  Notation − r U s
Output − Projects the names of the authors who have either written a book or an article or both.
Set Difference Operation (−)
The result of the set difference operation is tuples which are present in one relation but not in the second relation.
Notation − r − s
Cartesian Product Operation (Χ)
Where r and s are relations, their output is defined as −
r Χ s = { q t | q ∈ r and t ∈ s }
For example, σ author = 'tutorialspoint' (Books Χ Articles)
Output − Yields a relation which shows all the books and articles written by tutorialspoint.
Rename Operation (ρ)
The results of relational algebra are also relations but without any
name. The rename operation allows us to rename the output relation.
'rename' operation is denoted with small Greek letter rho ρ.
Notation − ρ x (E)
ρ(STUDENT1, STUDENT)
Relational Calculus
Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results.
Notation:
{T | P (T)} or {T | Condition (T)}
Example schemas:
Cust(cname, street, city)
Branch(br-name, br-city)
Depo(c-name, accno)
Query: display the details of the customers having an account.
Queries-1: Find the loan number, branch, amount of loans of greater
than or equal to 10000 amount.
Queries-2: Find the loan number for each loan of an amount greater or
equal to 10000.
Queries-3: Find the names of all customers who have a loan and an
account at the bank.
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]) ∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Find the names of all the customers who have a loan from the Hyderabad branch, and find the loan amount.
Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ... ,cn)}
For example, {< name, age > | < name, age > ∈ Student ∧ age > 17}
The above query will return the names and ages of the students in the table Student who are older than 17.
UNIT-3
The SQL UNION Operator
Every SELECT statement within UNION must have the same number
of columns
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
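For example, assuming Customers and Suppliers tables that both have a City column:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;   -- each distinct city appears once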
• UNION ALL Syntax
• The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL:
• SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;
• The IS NOT NULL Operator
• SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
• The SQL UPDATE Statement
• UPDATE Syntax
• UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• The SELECT TOP Clause
• SELECT TOP 3 * FROM Customers;
• MIN() Syntax
• SELECT MIN(column_name)
FROM table_name
WHERE condition;
• MAX() Syntax
• SELECT MAX(column_name)
FROM table_name
WHERE condition;
• COUNT() Syntax
• SELECT COUNT(column_name)
FROM table_name
WHERE condition;
• SELECT AVG(column_name)
FROM table_name
WHERE condition;
• The SUM() function returns the total sum of a
numeric column.
• SUM() Syntax
• SELECT SUM(column_name)
FROM table_name
WHERE condition;
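For instance, assuming the Products table used in the examples below has ProductID and Price columns:
SELECT MIN(Price) FROM Products;                   -- cheapest product
SELECT COUNT(ProductID) FROM Products;             -- number of products
SELECT SUM(Price) FROM Products WHERE Price < 20;  -- total of the cheaper items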
• The SQL LIKE Operator
• LIKE Syntax
• SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
• The SQL IN Operator
• IN Syntax
• SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1, value2, ...);
• SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
• BETWEEN Syntax
• SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
• SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;
• SQL JOIN
• SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
• SELECT Orders.OrderID,
Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID
= Customers.CustomerID;
• SQL LEFT JOIN Keyword
• SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
• SQL RIGHT JOIN Keyword
• SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
• SELECT Orders.OrderID, Employees.LastName,
Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID
= Employees.EmployeeID
ORDER BY Orders.OrderID;
• SQL FULL OUTER JOIN Keyword
• SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
• SELECT Customers.CustomerName,
Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
• SQL Self Join
• SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;
• SELECT A.CustomerName AS CustomerName1,
B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
• GROUP BY Syntax
• SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
• SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
• HAVING Syntax
• SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
• SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
• The SQL EXISTS Operator
• EXISTS Syntax
• SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);
• SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT ProductName FROM Products WHERE Products.SupplierID = Suppliers.SupplierID AND Price < 20);
• The SQL ANY and ALL Operators
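• ANY returns TRUE if any of the subquery values meets the condition, while ALL returns TRUE only if all of them do. A sketch of the standard syntax (operator stands for a comparison such as =, > or <):
• SELECT column_name(s)
FROM table_name
WHERE column_name operator ANY
(SELECT column_name FROM table_name WHERE condition);
• SELECT column_name(s)
FROM table_name
WHERE column_name operator ALL
(SELECT column_name FROM table_name WHERE condition);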
• So, we need to avoid these types of anomalies in the tables and maintain the integrity and accuracy of the database. Therefore, we use the normalization concept in the database management system.
Types of Normal Forms
There are four main types of normal forms:
• Closure Set of an Attribute
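• The closure of an attribute set X, written X+, is the set of all attributes functionally determined by X; X is a candidate key when X+ covers the whole relation. A quick worked illustration using the relation and FDs listed just below: for R(A B C D) with AB->D and B->C, start with (AB)+ = {A, B}; AB->D adds D, and B->C adds C, so (AB)+ = {A, B, C, D} and AB is a candidate key.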
• The first normal form expects you to follow a few simple rules while designing your database: each attribute must hold only atomic (single) values, and a relation must not contain repeating groups.
• R(A B C D)
• AB->D
• B->C
• Prime attributes: A, B
• Non-prime attributes: C, D
• R(A B C D)
• AB->D
• B->C
• Decomposition: R1(A B D), R2(B C)
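• A brief worked check of this decomposition, assuming AB is the only candidate key (as computed above): the prime attributes are A and B, and the non-prime attributes are C and D. In B->C, the determinant B is a proper part of the key AB and C is non-prime, so B->C is a partial dependency and R is not in 2NF. Decomposing into R1(A B D), which keeps AB->D, and R2(B C), which keeps B->C, removes the partial dependency.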
• R(A B C D E)
• AB->C
• D->E
• AB->C is a partial dependency (PD)
• D->E is a partial dependency (PD)
• So R is not in 2NF
• So decompose R(A B C D E) with AB->C and D->E accordingly.
Try This
• R(ABCDE)
• A->B
• B->E
• C->D
• Not in 2NF, so decomposition is required.
R(A B C D E F G H I J)
AB->C
AD->GH
BD->EF
A->I
H->J
Not in 2NF, so decomposition is required.
Third Normal Form (3NF)
R(A B C)
A->B
B->C
Here A is the candidate key; B->C is a transitive dependency (non-prime B determines non-prime C), so R is not in 3NF.
R(A B C D E)
A->B
B->E
C->D
AC: essential attributes
(AC)+ = ABCDE, so AC is the candidate key
Check for 3NF.
R(A B C D E F G H I J)
AB->C
A->DE
B->F
F->GH
D->IJ
AB: essential attributes
(AB)+ = ABCDEFGHIJ, so AB is the candidate key
R(A B C D E)
AB->C
B->D
D->E
AB: essential attributes
(AB)+ = ABCDE, so AB is the candidate key
BCNF (Boyce-Codd Normal Form)
R(A B C)
AB->C
C->B
(A)+ = A; (AB)+ = ABC; (AC)+ = ABC, so AB and AC are candidate keys and A, B, C are all prime attributes.
AB->C -- no partial dependency; C->B -- no partial dependency, because C and B are prime. So R is in 2NF.
AB->C -- no transitive dependency; C->B -- no transitive dependency. So R is in 3NF.
But in C->B the determinant C is not a super key, so R is not in BCNF.
R(A B C D E) ---- CHECK THIS
AB->CD
D->A
BC->DE

R(A B C D E) ---- CHECK THIS
BC->ADE
D->B
FUNCTIONAL DEPENDENCY
R(A B C D E F G H)
AB->C
A->DE
B->F
F->GH
Essential attributes: AB
(AB)+ = ABCDEFGH
Candidate key: AB
Check BCNF for R(A B C D E F G H):
AB->C -- OK (AB is a super key)
A->DE -- NO (A is not a super key), so R is not in BCNF
B->F
F->GH
Check 3NF for R(A B C D E F G H):
AB->C -- OK
A->DE -- NO (A is not a super key and D, E are non-prime), so R is not in 3NF
B->F
F->GH
Check 2NF for R(A B C D E F G H):
AB->C -- OK
A->DE -- partial dependency, so R is not in 2NF
B->F
F->GH
So R is only in 1NF.
2 . R(A B C D E F)
AB->C
DC->AE
E->F
3. R(A B C D E)
CE->D
D->B
C->A
4. R(A B C D E F G H I)
AB->C
BD->EF
AD->GH
A->I
5. R(A B C D E )
AB->CD
D->A
BC->DE
6. R(A B C D E)
BC->ADE
D->B
7. R(V W X Y Z)
X->YV
Y->Z
Z->Y
VW->X
8. R(A B C D E F)
ABC->D
ABD->E
CD->F
CDF->B
BF->D
9. R(A B C )
A->B
B->C
C->A
Fourth normal form (4NF):
It builds on the first three normal forms (1NF, 2NF and 3NF) and the
Boyce-Codd Normal Form (BCNF).
If R is decomposed into R1, R2, R3, the decomposition must be lossless. For example, if R(A B C D) is decomposed as R1(A B) and R2(D), then there is loss of data, because attribute C is in neither R1 nor R2.
R1(A, B)      R2(B, C)
A  B          B  C
1  A          A  P
2  B          B  Q
3  A          A  R

Joining R1 and R2 back on B gives:
A  B  C
1  A  P
1  A  R
2  B  Q
3  A  P
3  A  R
R
A  B    C  D  E
A  122  1  P  W
B  234  2  Q  X
A  568  1  R  Y
C  347  3  S  Z
Isolation − There may be many transactions processing the same data set at the same time. Each transaction should be isolated from the others to prevent data corruption.
• Abort: If a transaction aborts then all the changes made are not visible.
• Commit: If a transaction commits then all the changes made are visible.
• Example: Let's assume that transaction T consists of T1 and T2. A holds Rs 600 and B holds Rs 300, and we transfer Rs 100 from account A to account B.
• T1: Read(A); A := A - 100; Write(A)
• T2: Read(B); B := B + 100; Write(B)
• If transaction T fails after the completion of T1 but before the completion of T2, then the amount will be deducted from A but not added to B, leaving the database in an inconsistent state. To ensure the correctness of the database state, the transaction must be executed in its entirety.
Consistency
The integrity constraints are maintained so that the database is consistent before and
after the transaction.
The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
The consistent property of database states that every transaction sees a consistent
database instance.
The transaction is used to transform the database from one consistent state to another
consistent state.
For example: The total amount must be maintained before or after the transaction.
Total before T occurs = 600+300=900
Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.
Isolation
It shows that the data which is used at the time of execution of a
transaction cannot be used by the second transaction until the first one
is completed.
Durability
The durability property states that once a transaction commits, its changes are permanent: they survive subsequent system failures.
• Read/Access data (R).
• Write/Change data (W).
• Commit.
• Example –
Transfer of 50₹ from Account A to Account B.
• Initially A= 500₹, B= 800₹.
• This data is brought to RAM from Hard Disk.
• R(A) -- 500 // Accessed from RAM.
• A = A-50 // Deducting 50₹ from A.
• W(A)--450 // Updated in RAM.
• R(B) -- 800 // Accessed from RAM.
• B=B+50 // 50₹ is added to B's Account.
• W(B) --850 // Updated in RAM.
• commit // The data in RAM is taken back to Hard Disk.
• Note –
The updated value of Account A = 450₹ and Account B = 850₹.
• All instructions before commit come under a partially committed
state and are stored in RAM. When the commit is read the data is
fully accepted and is stored in Hard Disk.
• If a failure occurs anywhere before the commit, we have to go back and start from the beginning; we cannot continue from the same state. This is known as rollback.
• Uses of Transaction Management :
• The DBMS is used to schedule concurrent access to data. It means that users can access multiple data items from the database without interfering with each other. Transactions are used to manage concurrency.
• It is also used to satisfy ACID properties.
• It is used to solve Read/Write Conflict.
• It is used to implement Recoverability, Serializability, and Cascading.
• Transaction Management is also used for Concurrency Control
Protocols and Locking of data.
• Transaction States:
Transactions can be implemented using SQL queries and the server. The diagram below shows how the transaction states work.
Active state
The active state is the first state of every transaction. In this state, the
transaction is being executed.
Partially committed
In the partially committed state, a transaction executes its final
operation, but the data is still not saved to the database.
In the total mark calculation example, a final display of the total marks
step is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its
operations successfully. In this state, all the effects are now
permanently saved on the database system.
Failed state
If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail to execute.
Aborted
If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous
consistent state. If not then it will abort or roll back the transaction to bring the
database into a consistent state.
If the transaction fails in the middle of execution, then all the operations executed so far are rolled back, returning the database to the consistent state it was in before the transaction started.
After aborting the transaction, the database recovery module will select one of the two operations: restart the transaction, or kill the transaction.
In the given figure (a), Schedule A shows the serial schedule where T1 is followed by T2.
In the given figure (b), Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
It contains many possible orders in which the system can execute the
individual operations of the transactions.
In the given figure (c) and (d), Schedule C and Schedule D are the
non-serial schedules. It has interleaving of operations.
3. Serializable schedule
Serializability of schedules is used to find non-serial schedules that allow transactions to execute concurrently without interfering with one another.
It identifies which schedules are correct when the executions of the transactions have interleaving of their operations.
A non-serial schedule is serializable if its result is equal to the result of its transactions executed serially.
We cannot change the sequence of statement execution within a transaction, but we can go for context switching, as shown above. Multiple instructions are not executed simultaneously, because the processor does not execute more than one instruction at the same time.
Serial scheduling: no context switching occurs.
Non-serial scheduling: context switching happens.
If we can convert a non-serial schedule to a serial schedule, then we can say it is a consistent schedule.
But if we cannot convert a non-serial schedule to a serial one, that does not mean it is inconsistent, just as a student who does not pass EAMCET is not necessarily a poor student.
• If we can swap adjacent instructions of two transactions without changing the result, we call them non-conflicting instructions; otherwise they are called conflicting instructions.
If no problem arises, the instructions are non-conflicting; if a problem exists, the statements are conflicting, so no swapping can happen.
[Worked precedence-graph example (original slide layout lost): a schedule over T1, T2, T3 with the operations R(X), R(Y), R(Y), W(Y), W(X), W(X), R(X), W(X). Steps 1-4: draw the precedence graph from the conflicting operations. If the graph is non-cyclic, the schedule is conflict serializable and hence consistent; if it is cyclic, the schedule is not conflict serializable, the non-serial schedule cannot be converted to a serial schedule, and it may be inconsistent.]
[Exercise schedules Ex: 1 and Ex: 2 (original column layout lost): two interleaved schedules over T1, T2, T3 built from the operations R(A), W(A), R(B), W(B), R(C), W(C). Test each one for conflict serializability by drawing its precedence graph.]
What is concurrency in DBMS?
• Database concurrency is the ability of a database to allow multiple users to run multiple transactions at the same time. This is one of the main properties that separates a database from other forms of data storage, like spreadsheets, where other users can read a file but may not edit its data concurrently.
• If we run only one transaction at a time, then the ACID properties are sufficient; but when multiple transactions are executed concurrently, the database may become inconsistent.
• Concurrency is the tendency for things to happen at the same time in a system. Parallel activities that do not interact have simple concurrency issues; it is when parallel activities interact or share the same resources that concurrency issues become important.
• Purpose of Concurrency
• 1. To enforce isolation
• 2. To preserve DB consistency
• 3. To resolve R-W and W-W conflict
• Concurrency control techniques
• 1.Lock based protocol
• 2. Two-phase locking Protocol
• 3. Time stamp ordering Protocol
• 4. Multi version concurrency control
• 5. Validation concurrency control
• 1. Lock-based protocol
• In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of locks:
• 1. Shared lock:
• It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
• It can be shared between transactions, because while a transaction holds a shared lock it cannot update the data item.
• 2. Exclusive lock:
• Under an exclusive lock, the data item can be both read and written by the transaction.
• The lock is exclusive: multiple transactions cannot modify the same data simultaneously.
• If all the locks are granted, then this protocol allows the transaction to begin. When the transaction is completed, it releases all its locks.
• If all the locks are not granted, then this protocol allows the transaction to roll back and wait until all the locks are granted.
Lock compatibility matrix:
      S    X
S     OK   NO
X     NO   NO

T1                         T2
X(B)   -- exclusive lock
R(B)
B := B - 50
W(B)
Unlock(B)
                           S(B)   -- shared lock
                           R(B)
                           Unlock(B)
• Note: Any number of transactions can hold a shared lock on one data item.
Conversion of Locks
• Upgrading: read lock → write lock
• Downgrading: write lock → read lock
Two-phase locking (2PL)
The two-phase locking protocol divides the execution phase of the
transaction into three parts.
• In the first part, when the execution of the transaction starts, it seeks
permission for the lock it requires.
• In the second part, the transaction acquires all the locks. The third phase is
started as soon as the transaction releases its first lock.
• In the third phase, the transaction cannot demand any new locks. It only
releases the acquired locks.
• Two common variants of 2PL are:
• Strict 2PL: exclusive locks are not released until the transaction commits (shared locks may be released earlier). Deadlock may still occur.
• Rigorous 2PL: neither shared nor exclusive locks are released until the transaction commits; locks are acquired as and when required. Deadlock may still occur.
Timestamp Ordering Protocol
• The Timestamp Ordering Protocol is used to order the transactions based
on their Timestamps. The order of transaction is nothing but the ascending
order of the transaction creation.
• The priority of the older transaction is higher that's why it executes first. To
determine the timestamp of the transaction, this protocol uses system time
or logical counter.
If TS(Ti) < W_TS(Q), Ti needs to read a value of Q that was already overwritten. Hence the request must be rejected and Ti must roll back.
If TS(Ti) >= W_TS(Q), the operation can be allowed.
Worked example: let Ti have timestamp 5 and Tx have timestamp 10.
Tx performs W(Q), so W_TS(Q) = 10. If Ti then requests R(Q), we have TS(Ti) = 5 < W_TS(Q) = 10, so the read is rejected and Ti must roll back; allowing it would amount to swapping the conflicting W-R instructions on Q.
Likewise, if Ti requests W(Q) after Tx has already written Q, then TS(Ti) < W_TS(Q) and the write is not allowed.
Write rule: W(Q) by Ti is allowed only if TS(Ti) >= R_TS(Q) and TS(Ti) >= W_TS(Q).
So we can say that the timestamp protocol allows the juniors (younger transactions) to proceed, but does not allow the seniors (older transactions) to make mistakes: an older transaction that arrives after a younger one has already used the data is rolled back.
Example: you went to the college gate and started doing something, and the other students ignored it.
Suppose student Y copied all his answers from student X. When the results are announced, Y fails the examination and X passes with good marks.
Y wonders why he got fewer marks when he copied all the answers from X. When he asks X, X explains: after you left the examination hall, I went through all my answers and modified most of them; hence the difference.
So what was Y's mistake? He should have submitted his answer sheet only after X submitted, but that is not what happened.
Dirty read problem. Initial value of A in the DB = 10.
T1: R(A)           -- reads A = 10
T1: A = A + 1 = 11
T1: W(A)
T2: R(A)           -- T2 reads the value of A as 11, which is not a permanent (uncommitted) value of A
T2: COMMIT
T1: ...            -- T1 performs some operations and then commits the transaction (COMMIT)
If T1 instead rolls back the transaction, the value of A goes back to 10, and T2 has read a value that never became permanent.
One day a young guy sees that his parents are going out. Feeling very happy, and thinking that he is alone in the house, he decides to cook Maggi and watch Cartoon Network.
He puts the Maggi in a cooker on the gas stove and goes off to watch Cartoon Network. When he comes back after 10 minutes and takes the lid off the cooker, he finds that the cooker contains rice with rajma, not Maggi.
In anger he tries to throw the rice bowl out, and as he does so he hears a voice from someone sitting on the fan saying, "It is me, and I want the rice with rajma."
In the same way, when we run any transaction, every transaction feels that it alone is being executed and that no other transactions are in progress.
The boy's parents explain to him that there is nothing like a ghost; it was all in his head.
Later, the boy again sees his parents going out, and he again tries something in the kitchen: he places a cooker with a pasta packet on the gas stove, with a lingering doubt that there may be a ghost in the house. He again goes to watch Cartoon Network. When he goes back to the kitchen after some time, he finds there is nothing in the kitchen at all.
Phantom Read Problem
Initial value of X = 10.
T1: R(X)       -- T1 reads the value of X, which is 10
T2: R(X)       -- T2 reads the value of X, which is 10
T1: Delete(X)  -- T1 then deletes X
T2: R(X)       -- T2 now tries to read the value of X and finds that the variable does not exist
This is called the phantom read problem.
Validation phase: In this phase, the temporary variable values are validated against the actual data to see whether serializability is violated.
Write phase: If the transaction passes validation, the temporary results are written to the database; otherwise, the transaction is rolled back.
This protocol is also called the Optimistic protocol, because it continues executing statements on the assumption that everything will finish without any problem.
Recovery is the method of restoring the database to its correct state in the event of a failure at the time of the transaction or after the end of a process.
Dependability refers both to the resilience of the DBMS to various kinds of failure and to its ability to recover from those failures.
The storage of data usually includes four types of media with an increasing
amount of reliability: the main memory, the magnetic disk, the magnetic
tape, and the optical disk. Many different forms of failure can affect
database processing and/or transaction, and each of them has to be dealt
with differently. Some data failures can affect the main memory only, while
others involve non-volatile or secondary storage also. Among the sources of
failure are:
Due to hardware or software errors, the system crashes, ultimately resulting in the loss of main memory.
Failures of media, such as head crashes or unreadable media, that result in the loss of portions of secondary storage.
There can be application software errors, such as logical errors in a program accessing the database, that can cause one or more transactions to abort or fail.
Natural physical disasters can also occur, such as fires, floods, earthquakes, or power failures.
Carelessness or unintentional destruction of data or directories by operators or users.
Damage, intentional corruption, or hampering of data (using malicious software or files), or of hardware or software facilities.
Whatever the grounds of the failure are, there are two principal things that
you have to consider:
Failure of main memory, including the database buffers.
Failure of the disk copy of the database.
Recovery Facilities
Every DBMS should offer the following facilities to help out with the recovery
mechanism:
Backup mechanism makes backup copies at a specific interval for the
database.
Logging facilities keep tracing the current state of transactions and any
changes made to the database.
Checkpoint facility allows updates to the database that are in progress to be made permanent, keeping them safe in case of failure.
Recovery manager allows the database system for restoring the database to a
reliable and steady-state after any failure occurs.
Log-Based Recovery
1. Deferred database modification:
In this method, all the logs are created and stored in stable storage, and the database is updated only when a transaction commits.
2. Immediate database modification:
The recovery system reads the log files from the end to the start, i.e., from T4 back to T1.
The recovery system maintains two lists: a redo-list and an undo-list.
A transaction is put into the redo-list if the recovery system sees a log with both <Tn, Start> and <Tn, Commit>; its operations are redone. A transaction is put into the undo-list if the log contains <Tn, Start> but no <Tn, Commit>; its operations are undone.
• Deadlock in DBMS
• A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in
DBMS as no task ever gets finished and is in waiting state forever.
• For example: transaction T1 holds a lock on some rows in the Student table and needs to update some rows in the Grade table. Simultaneously, transaction T2 holds locks on some rows in the Grade table and needs to update the rows in the Student table held by transaction T1.
• Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and
similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt
state and remain at a standstill. It will remain in a standstill until the DBMS detects the
deadlock and aborts one of the transactions.
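• A minimal sketch of how such a deadlock arises in SQL, assuming hypothetical Student and Grade tables and two separate connections:
-- Session 1:
START TRANSACTION;
UPDATE Student SET name = 'X' WHERE id = 1;  -- locks the Student row
-- Session 2:
START TRANSACTION;
UPDATE Grade SET mark = 90 WHERE id = 9;     -- locks the Grade row
UPDATE Student SET name = 'Y' WHERE id = 1;  -- blocks, waiting for Session 1
-- Session 1:
UPDATE Grade SET mark = 80 WHERE id = 9;     -- blocks, waiting for Session 2: deadlock
The DBMS detects the cycle and aborts one of the two transactions so that the other can proceed.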
• Concurrency control means that multiple transactions can be executed at the same time, so interleaved logs occur. But transaction results may change, so the order of execution of those transactions must be maintained.
• During recovery, it would be very difficult for the recovery system to backtrack all the logs and then start recovering.
• Recovery with concurrent transactions can be done in the following four ways:
• Interaction with concurrency control
• Transaction rollback
• Checkpoints
• Restart recovery
• In this scheme, the recovery procedure depends greatly on the concurrency control scheme that is used. To roll back a failed transaction, we must undo the updates performed by the transaction.
• Transaction rollback :
• The system scans the log backward for a failed transaction; for every log record found, the system restores the data item to its old value.
• Checkpoints :
• Checkpointing is the process of saving a snapshot of the application's state so that it can restart from that point in case of failure.
• A checkpoint is a point in time at which a record is written onto the database from the buffers.
• When a checkpoint is reached, the transactions completed so far are reflected in the database, and the log file up to that point can be discarded. The log file is then updated with new transaction steps until the next checkpoint, and so on.
• The checkpoint is used to declare the point before which the DBMS was in the consistent state, and
all the transactions were committed.
• To ease this situation, ‘Checkpoints‘ Concept is used by the most DBMS.
• In this scheme, we used checkpoints to reduce the number of log records that the system
must scan when it recovers from a crash.
• In a concurrent transaction processing system, we require that the checkpoint log record be
of the form <checkpoint L>, where ‘L’ is a list of transactions active at the time of the
checkpoint.
• A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates even
while buffer blocks are being written out.
• Restart recovery :
• The undo-list consists of transactions to be undone, and the redo-list consists of transaction
to be redone.
• The system constructs the two lists as follows: Initially, they are both empty. The system
scans the log backward, examining each record, until it finds the first <checkpoint> record.
• UNIT- 5
File Organization
o The File is a collection of records. Using the primary key, we can access
the records. The type and frequency of access can be determined by the
type of file organization which was used for a given set of records.
o File organization is a logical relationship among various records. This
method defines how file records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are
stored in terms of blocks, and the blocks are placed on the storage
medium.
o The first approach to mapping the database to files is to use several files and store only fixed-length records in any given file. An alternative approach is to structure our files so that they can contain records of multiple lengths.
o Files of fixed length records are easier to implement than the files of
variable length records.
Sequential File Organization
This is the easiest method of file organization. In this method, files are stored sequentially. It can be implemented in two ways:
o In this method, the new record is always inserted at the file's end, and
then it will sort the sequence in ascending or descending order. Sorting
of records is based on any primary key or any other key.
o In the case of modification of any record, it will update the record and
then sort the file, and lastly, the updated record is placed in the right
place.
o It is a fast and efficient method for huge amounts of data.
o In this method, files can easily be stored on cheaper storage media like magnetic tape.
o It is simple in design. It requires little effort to store the data.
o This method is used when most of the records have to be accessed like
grade calculation of a student, generating the salary slip, etc.
o This method is used for report generation or statistical calculations.
Heap File Organization
o It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the records are inserted at the end of the file. When records are inserted, no sorting or ordering of the records is required.
o When the data block is full, the new record is stored in some other block. This new data block need not be the very next data block; the DBMS can select any data block in memory to store the new record. A heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It is the DBMS's responsibility to store and manage the new records.
Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert a new record R2. If data block 3 is full, then R2 will be inserted into any data block selected by the DBMS, let's say data block 1.
If we want to search, update or delete data in a heap file organization, we need to traverse the data from the start of the file until we get the requested record.
If the database is very large then searching, updating or deleting of record will
be time-consuming because there is no sorting or ordering of records. In the
heap file organization, we need to check all the data until we get the requested
record.
o This method is inefficient for large databases, because it takes time to search for or modify a record.
Hash File Organization
Hash File Organization uses the computation of a hash function on some fields of the records. The hash function's output determines the location of the disk block where the records are to be placed.
When a record has to be retrieved using the hash key columns, the address is generated, and the whole record is retrieved using that address. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is directly inserted. The same process applies to delete and update.
In this method, there is no effort for searching and sorting the entire file. In
this method, each record will be stored randomly in the memory.
B+ File Organization
o In this method, searching becomes very easy, as all the records are stored only in the leaf nodes and sorted in a sequential linked list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records
can increase or decrease and the B+ tree structure can also grow or
shrink.
o It is a balanced tree structure, and any insert/update/delete does not
affect the performance of tree.
ISAM (Indexed Sequential Access Method)
If any record has to be retrieved based on its index value, the address of the data block is fetched and the record is retrieved from memory.
Pros of ISAM:
o In this method, each record has the address of its data block, so searching a record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records.
Since the index is based on the primary key values, we can retrieve the
data for the given range of value. In the same way, the partial value can
also be easily searched, i.e., the student name starting with 'JA' can be
easily searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be
reconstructed to maintain the sequence.
o When the record is deleted, then the space used by it needs to be
released. Otherwise, the performance of the database will slow down.
o When two or more records are stored in the same file, it is known as clustering. These files will have two or more tables in the same data block, and the key attributes which are used to map these tables together are stored only once.
o This method reduces the cost of searching for various records in
different files.
o The cluster file organization is used when there is a frequent need for
joining the tables with the same condition. These joins will give only a
few records from both tables. In the given example, we are retrieving
the record for only particular departments. This method can't be used
to retrieve the record for the entire department.
In this method, we can directly insert, update or delete any record. Data is
sorted based on the key with which searching is done. Cluster key is a type
of key with which joining of the table is performed.
1. Indexed Clusters:
In an indexed cluster, records are grouped based on the cluster key and stored together. The above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster: all the records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:
It is similar to the indexed cluster. In a hash cluster, instead of storing the records based on the cluster key, we generate the value of a hash key for the cluster key and store together the records with the same hash key value.
o This method has low performance for very large databases.
o If there is any change in the joining condition, this method cannot be used; if we change the condition of joining, traversing the file takes a lot of time.
o This method is not suitable for a table with a 1:1 relationship.
o Indexing in DBMS
Index structure:
o The first column of the index is the search key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
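Indexes of this kind are created with SQL. A minimal sketch (the index and table names are illustrative; the DROP syntax shown is MySQL's):
CREATE INDEX idx_lastname ON Persons (LastName);  -- speeds up lookups on LastName
DROP INDEX idx_lastname ON Persons;               -- removes the index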
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which
are sorted are known as ordered indices.
Primary Index
o If the index is created on the basis of the primary key of the table,
then it is known as primary indexing. These primary keys are unique
to each record and contain 1:1 relation between the records.
o As primary keys are stored in sorted order, the performance of the
searching operation is quite efficient.
o The primary index can be classified into two types: Dense index and
Sparse index.
Dense index
o The dense index contains an index record for every search key value in the data file. It makes searching faster.
o In this, the number of records in the index table is the same as the number of records in the main table.
o It needs more space to store the index records themselves. The index records contain the search key and a pointer to the actual record on the disk.
Sparse index
o In the data file, an index record appears only for a few items. Each item points to a block.
o In this, instead of pointing to each record in the main table, the index points to records in the main table with gaps.
Clustering Index
The previous scheme is a little confusing, because one disk block is shared by records which belong to different clusters. If we use a separate disk block for each cluster, it is a better technique.
Secondary Index
In the sparse indexing, as the size of the table grows, the size of mapping also
grows. These mappings are usually kept in the primary memory so that
address fetch should be faster. Then the secondary memory searches the
actual data based on the address got from mapping. If the mapping size grows
then fetching the address itself becomes slower. In this case, the sparse index
will not be efficient. To overcome this problem, secondary indexing is
introduced.
o If you want to find the record of roll 111 in the diagram, then it will
search the highest entry which is smaller than or equal to 111 in the
first level index. It will get 100 at this level.
o Then at the second index level, it again finds the largest entry <= 111 and gets 110. Using the address 110, it goes to the data block and starts searching each record until it gets 111.
o This is how a search is performed in this method. Inserting, updating
or deleting is also done in the same manner.
B+ Tree
Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node.
The B+ tree is of the order n where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.
Internal node
o An internal node of the B+ tree can contain at least n/2 record pointers, except the root node.
o At most, an internal node of the tree contains n pointers.
Leaf node
o A leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to the next leaf node.
In this case, we have to split the leaf node, so that it can be inserted into tree
without affecting the fill factor, balance and order.
The 3rd leaf node has the values (50, 55, 60, 65, 70), and its current root node is 50. We will split the leaf node in the middle so that its balance is not altered. So we can group (50, 55) and (60, 65, 70) into two leaf nodes.
If these two have to be leaf nodes, the intermediate node cannot branch from 50. It should have 60 added to it, and then we can have pointers to the new leaf node.
Suppose we want to delete 60 from the above example. In this case, we have
to remove 60 from the intermediate node as well as from the 4th leaf node too.
If we remove it from the intermediate node, then the tree will not satisfy the
rule of the B+ tree. So we need to modify it to have a balanced tree.
After deleting node 60 from above B+ tree and re-arranging the nodes, it will
show as follows:
Hashing
In this technique, data is stored at the data blocks whose address is generated
by using the hashing function. The memory location where these records are
stored is known as data bucket or data blocks.
In this, a hash function can choose any of the column values to generate the address. Most of the time, the hash function uses the primary key to generate the address of the data block. A hash function can be anything from a simple mathematical function to a complex one. We can even consider the primary key itself as the address of the data block; that means each row is stored at the address equal to its primary key value. The diagram above shows data block addresses that are the same as the primary key values.
This hash function can also be a simple mathematical function like exponential, mod, cos, sin, etc. Suppose we have a mod(5) hash function to determine the address of the data block. In this case, it applies mod(5) to the primary keys and generates 3, 3, 1, 4 and 2 respectively, and the records are stored at those data block addresses.
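A quick arithmetic check of that example with hypothetical primary keys: 103 mod 5 = 3, 108 mod 5 = 3, 106 mod 5 = 1, 109 mod 5 = 4 and 107 mod 5 = 2, matching the addresses 3, 3, 1, 4 and 2 above (note that 103 and 108 collide on bucket 3).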
Types of Hashing:
o Static Hashing
o Dynamic Hashing
Static Hashing
In static hashing, the resultant data bucket address will always be the same. That means if we generate an address for EMP_ID = 103 using the hash function mod(5), it will always result in the same bucket address 3 (103 mod 5 = 3). The bucket address does not change.
Hence in this static hashing, the number of data buckets in memory remains
constant throughout. In this example, we will have five data buckets in the
memory used to store the data.
Operations of Static Hashing
o Searching a record
When a record needs to be searched, then the same hash function retrieves
the address of the bucket where the data is stored.
o Insert a Record
When a new record is inserted into the table, an address is generated for the record based on the hash key, and the record is stored at that location.
o Delete a Record
To delete a record, we first fetch it using the hash function, and then the record at that address is deleted.
o Update a Record
To update a record, we first search for it using the hash function, and then the data record is updated.
If we want to insert a new record into the file, but the data bucket address generated by the hash function is not empty (data already exists at that address), this situation in static hashing is known as bucket overflow. It is a critical situation for this method.
To overcome this situation, there are various methods. Some commonly used
methods are as follows:
1. Open Hashing
When buckets are full, then a new data bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is known
as Overflow chaining.
Dynamic Hashing
o Firstly, you have to follow the same procedure for retrieval, ending up
in some bucket.
o If there is still space in that bucket, then place the record in it.
o If the bucket is full, then we will split the bucket and redistribute the
records.
For example:
Consider the following grouping of keys into buckets, depending on the prefix of their hash address:
The last two bits of 2 and 4 are 00. So it will go into bucket B0. The last two
bits of 5 and 6 are 01, so it will go into bucket B1. The last two bits of 1 and
3 are 10, so it will go into bucket B2. The last two bits of 7 are 11, so it will
go into B3.
Insert key 9 with hash address 10001 into the above structure:
o Since key 9 has hash address 10001 (last two bits 01), it must go into bucket B1. But bucket B1 is full, so it will get split.
o The splitting separates 5 and 9 from 6: the last three bits of 5 and 9 are 001, so they go into bucket B1, while the last three bits of 6 are 101, so it goes into bucket B5.
o Keys 2 and 4 are still in B0. The records in B0 are pointed to by the 000 and 100 entries, because the last two bits of both entries are 00.
o Keys 1 and 3 are still in B2. The records in B2 are pointed to by the 010 and 110 entries, because the last two bits of both entries are 10.
o Key 7 is still in B3. The record in B3 is pointed to by the 111 and 011 entries, because the last two bits of both entries are 11.
Advantages of dynamic hashing
o In this method, the performance does not decrease as the data grows in
the system. It simply increases the size of memory to accommodate the
data.
o In this method, memory is well utilized, as it grows and shrinks with the data. There will not be any unused memory lying around.
o This method is good for the dynamic database where data grows and
shrinks frequently.
Disadvantages of dynamic hashing
o In this method, if the data size increases, then the bucket size is also increased. The addresses of the data are maintained in the bucket address table, because the data addresses keep changing as buckets grow and shrink. If there is a huge increase in data, maintaining the bucket address table becomes tedious.
o Bucket overflow can also occur in this case, but it may take longer to reach that situation than in static hashing.