Advanced Database Chapter 456

1/9/2024

Chapter Four
CONCURRENCY CONTROL

• Concurrency Control is the management procedure required for controlling the
concurrent execution of operations on a database.
• Before looking at concurrency control, we should first understand concurrent
execution.
Concurrent Execution in DBMS
• In a multi-user system, multiple users can access and use the same database at the
same time; this is known as concurrent execution on the database. It means the same
database is used simultaneously by different users on a multi-user system.
• While working with database transactions, multiple users may need to use the
database to perform different operations at the same time, in which case the
transactions execute concurrently.
• This simultaneous execution is performed in an interleaved manner, and no operation
should affect the other executing operations, so that the consistency of the database
is maintained. Concurrent execution of transaction operations therefore raises several
challenging problems that need to be solved.


Problems with Concurrent Execution


The two main operations in a database transaction are READ and WRITE.
• These two operations must be managed carefully during the concurrent execution of
transactions: if they are not interleaved correctly, the data may become inconsistent.
The following problems occur with concurrent execution of operations:
Problem 1: Lost Update Problems (W - W Conflict)
• This problem occurs when two different transactions perform read/write operations
on the same database item in an interleaved manner (i.e., concurrent execution),
leaving the item with an incorrect value and the database inconsistent.

For example:
• Consider the diagram below, where two transactions TX and TY are performed on
the same account A, whose balance is $300.


• At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
• At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
• Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
• At time t4, transaction TY adds $100 to account A that becomes $400 (only added
but not updated/write).
• At time t6, transaction TX writes the value of account A that will be updated as
$250 only, as TY didn't update the value yet.
• Similarly, at time t7, transaction TY writes the values of account A, so it will write
as done at time t4 that will be $400. It means the value written by TX is lost, i.e.,
$250 is lost.
Hence the data becomes incorrect, and the database becomes inconsistent.
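The interleaving above can be sketched as a single-threaded simulation in Java (illustrative only: the class and variable names are made up, and a real DBMS would interleave actual transactions rather than local variables):

```java
// Hypothetical simulation of the lost-update interleaving (names are illustrative).
public class LostUpdateDemo {
    static int accountA = 300;          // shared data item A

    public static int run() {
        int txLocal = accountA;         // t1: TX reads A -> 300
        txLocal -= 50;                  // t2: TX deducts $50 locally -> 250 (not yet written)
        int tyLocal = accountA;         // t3: TY reads A -> still 300 (TX has not written)
        tyLocal += 100;                 // t4: TY adds $100 locally -> 400 (not yet written)
        accountA = txLocal;             // t6: TX writes 250
        accountA = tyLocal;             // t7: TY writes 400, overwriting TX's update
        return accountA;                // TX's $50 deduction is lost
    }

    public static void main(String[] args) {
        System.out.println("Final balance: " + run());
    }
}
```

Running this prints a final balance of $400, whereas the correct serial result would be $350 (300 - 50 + 100): TX's update is lost.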

Problem 2: Dirty Read Problems (W-R Conflict)


• The dirty read problem occurs when one transaction updates a database item and
then fails, and before the update is rolled back, the updated item is read by another
transaction. This creates a write-read conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is $300:


• At time t1, transaction TX reads the value of account A, i.e., $300.


• At time t2, transaction TX adds $50 to account A that becomes $350.
• At time t3, transaction TX writes the updated value in account A, i.e.,
$350.
• Then at time t4, transaction TY reads account A that will be read as $350.
• Then at time t5, transaction TX rolls back due to a server problem, and the value
changes back to $300 (as initially).
• But transaction TY has already read and committed with $350 for account A. This
stale value is a dirty read, and hence this is known as the Dirty Read Problem.
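The dirty-read timeline can likewise be sketched as a small simulation (illustrative only: `committedA` stands in for the rollback mechanism, and the names are invented):

```java
// Hypothetical simulation of the dirty-read timeline (names are illustrative).
public class DirtyReadDemo {
    static int accountA = 300;      // current (possibly uncommitted) value of A
    static int committedA = 300;    // last committed value, used on rollback

    public static int run() {
        accountA = accountA + 50;   // t2-t3: TX adds $50 and writes $350 (uncommitted)
        int tyRead = accountA;      // t4: TY reads the uncommitted $350 -- a dirty read
        accountA = committedA;      // t5: TX rolls back; A returns to $300
        return tyRead;              // TY proceeds with the stale $350
    }

    public static void main(String[] args) {
        int tyValue = run();
        System.out.println("TY read: " + tyValue + ", committed A: " + accountA);
    }
}
```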

Problem 3: Unrepeatable Read Problem (R-W Conflict)


• Also known as the Inconsistent Retrievals Problem, which occurs when a transaction
reads two different values for the same database item.
• For example:
• Consider two transactions, TX and TY, performing the read/write operations on
account A, having an available balance = $300.


• At time t1, transaction TX reads the value from account A, i.e., $300.
• At time t2, transaction TY reads the value from account A, i.e., $300.
• At time t3, transaction TY updates the value of account A by adding
$100 to the available balance, and then it becomes $400.
• At time t4, transaction TY writes the updated value, i.e., $400.
• After that, at time t5, transaction TX reads the available value of
account A, and that will be read as $400.
• This means that within the same transaction, TX reads two different values of
account A: $300 initially, and $400 after the update made by transaction TY.
This is an unrepeatable read, and it is therefore known as the Unrepeatable
Read Problem.
• Thus, in order to maintain consistency in the database and avoid such
problems during concurrent execution, management is needed; this is
where the concept of Concurrency Control comes into play.

Concurrency Control
• Concurrency Control is the mechanism required for controlling
and managing the concurrent execution of database operations,
thereby avoiding inconsistencies in the database.
• To maintain the concurrency of the database, concurrency
control protocols are used.


Concurrency Control Protocols


• The concurrency control protocols ensure the atomicity,
consistency, isolation, durability and serializability of the
concurrent execution of the database transactions.
• Therefore, these protocols are categorized as:
Lock Based Concurrency Control Protocol
Time Stamp Concurrency Control Protocol
Validation Based Concurrency Control Protocol

Lock-Based Protocol
• In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it.
• A lock is a variable that denotes which operations can be executed on a particular
data item.
• There are two types of lock:
1. Shared lock (read-only lock, or lock-S)
• Under a shared lock, the data item can only be read by the transaction.
• It can be shared between transactions, because a transaction holding a shared lock cannot
update the data item.
• So, any number of transactions can hold lock-S (a shared lock) on the same item.
2. Exclusive lock (lock-X):
• Under an exclusive lock, the data item can be both read and written by the transaction.
• This lock is exclusive: multiple transactions cannot modify the same data
simultaneously.
• Lock-X (an exclusive lock) is held by only one transaction at a time.
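A minimal sketch of these two lock modes might look like the following (the `LockTable` class and its method names are invented for illustration; a real lock manager would also queue waiting transactions instead of simply refusing):

```java
import java.util.*;

// Illustrative sketch of shared/exclusive lock compatibility (not a real DBMS API).
public class LockTable {
    private final Map<String, Set<String>> sharedHolders = new HashMap<>();
    private final Map<String, String> exclusiveHolder = new HashMap<>();

    // Grant lock-S unless another transaction holds lock-X on the item.
    public boolean lockS(String tx, String item) {
        String x = exclusiveHolder.get(item);
        if (x != null && !x.equals(tx)) return false;
        sharedHolders.computeIfAbsent(item, k -> new HashSet<>()).add(tx);
        return true;
    }

    // Grant lock-X only if no other transaction holds any lock on the item.
    public boolean lockX(String tx, String item) {
        String x = exclusiveHolder.get(item);
        if (x != null && !x.equals(tx)) return false;
        Set<String> s = sharedHolders.getOrDefault(item, Set.of());
        if (!s.isEmpty() && !(s.size() == 1 && s.contains(tx))) return false;
        exclusiveHolder.put(item, tx);
        return true;
    }

    public static void main(String[] args) {
        LockTable lt = new LockTable();
        System.out.println(lt.lockS("T1", "A")); // many readers may share lock-S
        System.out.println(lt.lockS("T2", "A"));
        System.out.println(lt.lockX("T3", "A")); // refused: readers are present
        System.out.println(lt.lockX("T1", "B"));
        System.out.println(lt.lockS("T2", "B")); // refused: T1 holds lock-X on B
    }
}
```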


Lesson- 2

There are four types of lock protocols available:


1. Simplistic lock protocol
• It is the simplest way of locking data during a
transaction.
• Simplistic lock-based protocols require every
transaction to obtain a lock on the data before
inserting, deleting, or updating it.
• A transaction may unlock the data item after
completing its write operations.


2. Pre-claiming Lock Protocol


• Pre-claiming lock protocols evaluate the transaction to list all the data items
on which it needs locks.
• Before initiating execution, the transaction requests the DBMS for locks on
all those data items.
• If all the locks are granted, the protocol allows the transaction to begin;
when the transaction completes, it releases all the locks.
• If any lock is not granted, the transaction rolls back and waits until all the
locks are granted.

3. Two-phase locking (2PL): locking and unlocking in two phases

• The two-phase locking protocol divides the execution of a transaction into
three parts.
• In the first part, when the transaction starts executing, it seeks permission
for the locks it requires.
• In the second part, the transaction acquires all the locks.
• The third part starts as soon as the transaction releases its first lock. In this
part, the transaction cannot demand any new locks; it only releases the locks
it has acquired.


There are two phases of 2PL:

 2PL has two phases: a growing phase (locks are obtained but not
released) and a shrinking phase (locks are released but no new locks
are acquired).
Growing phase: in the growing phase, new locks on data items may be
acquired by the transaction, but none can be released.
Shrinking phase: in the shrinking phase, existing locks held by the
transaction may be released, but no new locks can be acquired.
 In the growing phase, the transaction reaches a point where all the locks
it may need have been acquired.
 This point is called the LOCK POINT.
 After the lock point has been reached, the transaction enters the
shrinking phase.

Example:
In this example, if lock conversion is allowed, then the following can happen:
1. Upgrading a lock (from S(A) to X(A)) is allowed in the growing phase.
2. Downgrading a lock (from X(A) to S(A)) must be done in the shrinking phase.

Transaction T1:
• Growing phase: from step 1-3
• Shrinking phase: from step 5-7
• Lock point: at 3
Transaction T2:
• Growing phase: from step 2-6
• Shrinking phase: from step 8-9
• Lock point: at 6
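The two-phase rule can be sketched as a guard on a transaction's lock requests (an illustrative class, assuming locks are tracked per transaction; a real implementation would also coordinate with a lock table):

```java
import java.util.*;

// Illustrative sketch of the two-phase rule: once a transaction releases any
// lock (shrinking phase begins), it may not acquire new ones.
public class TwoPhaseTx {
    private final Set<String> held = new HashSet<>();
    private boolean shrinking = false;   // becomes true at the first unlock

    public boolean lock(String item) {
        if (shrinking) return false;     // growing phase is over: reject new locks
        held.add(item);
        return true;
    }

    public void unlock(String item) {
        held.remove(item);
        shrinking = true;                // first release starts the shrinking phase
    }

    public static void main(String[] args) {
        TwoPhaseTx t1 = new TwoPhaseTx();
        System.out.println(t1.lock("A"));   // growing phase
        System.out.println(t1.lock("B"));   // lock point reached here
        t1.unlock("A");                     // shrinking phase begins
        System.out.println(t1.lock("C"));   // rejected: no new locks after a release
    }
}
```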


4. Strict Two-phase locking (Strict-2PL)


o The first phase of Strict-2PL is the same as in 2PL: after acquiring all the
locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not
release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and only then releases
all the locks at once.
o The Strict-2PL protocol therefore has no gradual shrinking phase of lock release.

Unlike 2PL, it does not suffer from cascading aborts.


Timestamp Ordering Protocol


The Timestamp Ordering Protocol orders transactions based on their
timestamps: the order of the transactions is simply the ascending order of
their creation times.
An older transaction has higher priority, so it executes first. To determine
the timestamp of a transaction, this protocol uses the system clock or a
logical counter.
Lock-based protocols manage the order between conflicting transactions at
execution time, whereas timestamp-based protocols start working as soon
as a transaction is created.
For example, suppose transaction T1 entered the system at time 007 and
transaction T2 entered at time 009. T1 has the higher priority, so it executes
first, because it entered the system first.
The timestamp ordering protocol also maintains the timestamps of the last
'read' and 'write' operations on each data item.

The basic timestamp ordering protocol works as follows:

1. Whenever a transaction Ti issues a Read(X) operation, check:
• If W_TS(X) > TS(Ti), the operation is rejected and Ti is rolled back.
• If W_TS(X) <= TS(Ti), the operation is executed and R_TS(X) is updated.
2. Whenever a transaction Ti issues a Write(X) operation, check:
• If TS(Ti) < R_TS(X), the operation is rejected and Ti is rolled back.
• If TS(Ti) < W_TS(X), the operation is rejected and Ti is rolled back;
otherwise the operation is executed and W_TS(X) is updated.
Where:
 TS(Ti) denotes the timestamp of transaction Ti.
 R_TS(X) denotes the read timestamp of data item X.
 W_TS(X) denotes the write timestamp of data item X.
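The read and write rules above can be sketched directly (illustrative only: timestamps are plain integers, and returning false stands in for rejecting the operation and rolling back Ti):

```java
import java.util.*;

// Illustrative sketch of the basic timestamp-ordering checks.
public class BasicTO {
    static class Item { int rTS = 0; int wTS = 0; }    // R_TS(X), W_TS(X)
    private final Map<String, Item> items = new HashMap<>();

    private Item item(String x) { return items.computeIfAbsent(x, k -> new Item()); }

    // Read rule: reject if a younger transaction already wrote X.
    public boolean read(int ts, String x) {
        Item it = item(x);
        if (it.wTS > ts) return false;          // W_TS(X) > TS(Ti): reject
        it.rTS = Math.max(it.rTS, ts);          // execute and update R_TS(X)
        return true;
    }

    // Write rule: reject if a younger transaction already read or wrote X.
    public boolean write(int ts, String x) {
        Item it = item(x);
        if (ts < it.rTS || ts < it.wTS) return false;  // reject, roll back Ti
        it.wTS = ts;                                   // execute and update W_TS(X)
        return true;
    }

    public static void main(String[] args) {
        BasicTO to = new BasicTO();
        System.out.println(to.read(7, "A"));    // T(ts=7) reads A: allowed
        System.out.println(to.write(9, "A"));   // T(ts=9) writes A: allowed
        System.out.println(to.write(7, "A"));   // rejected: 7 < W_TS(A) = 9
        System.out.println(to.read(7, "A"));    // rejected: W_TS(A) = 9 > 7
    }
}
```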


Advantages and Disadvantages of TO protocol:


• The TO protocol ensures serializability, since every conflict edge in the
precedence graph goes from an older to a younger transaction, so the
graph is acyclic.

• The TO protocol ensures freedom from deadlock, since no transaction
ever waits.
• But the schedule may not be recoverable, and may not even be cascade-
free.

Validation Based Protocol


The validation-based protocol is also known as the optimistic
concurrency control technique. In the validation-based protocol,
a transaction is executed in the following three phases:
1. Read phase: in this phase, the transaction T reads the values
of the various data items and stores them in temporary local
variables. It performs all its write operations on these temporary
variables, without updating the actual database.
2. Validation phase: in this phase, the temporary values are
validated against the actual data to check whether committing
them would violate serializability.
3. Write phase: if the transaction passes validation, the
temporary results are written to the database; otherwise the
transaction is rolled back.


Each transaction has three timestamps, one per phase:

• Start(Ti): the time when Ti started its execution.
• Validation(Ti): the time when Ti finished its read phase and
started its validation phase.
• Finish(Ti): the time when Ti finished its write phase.
• The protocol uses the timestamp of the validation phase as the transaction's
timestamp for serialization, since validation is the phase that determines
whether the transaction commits or rolls back.
• Hence TS(T) = Validation(T).
• Serializability is determined during the validation process.
• This approach allows a greater degree of concurrency and fewer conflicts while
transactions execute.
• Thus transactions suffer fewer rollbacks.
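A simplified sketch of the validation test, under the assumption that Ti fails validation if some transaction that committed after Start(Ti) wrote an item in Ti's read set (this is only one of the standard validation conditions, and the class and method names are illustrative):

```java
import java.util.*;

// Simplified, illustrative sketch of optimistic validation (not a full protocol).
public class ValidationDemo {
    static class Tx {
        final int start;                             // Start(Ti)
        final Set<String> readSet = new HashSet<>();
        final Set<String> writeSet = new HashSet<>();
        Tx(int start) { this.start = start; }
    }

    // committed: (Finish(Tj), write set of Tj) for already-committed transactions.
    public static boolean validate(Tx ti, List<Map.Entry<Integer, Set<String>>> committed) {
        for (var tj : committed) {
            if (tj.getKey() <= ti.start) continue;   // Tj finished before Ti started: safe
            for (String x : tj.getValue())
                if (ti.readSet.contains(x)) return false; // Ti read an item Tj overwrote
        }
        return true;                                 // Ti may enter its write phase
    }

    public static void main(String[] args) {
        Tx ti = new Tx(5);
        ti.readSet.add("A");
        // Tj wrote A and finished at time 8, after Ti started at 5: Ti must roll back.
        var committed = List.of(Map.entry(8, Set.of("A")));
        System.out.println(validate(ti, committed));
    }
}
```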

END OF
CHAPTER FOUR


Chapter Five

Recoverability of Schedule
and
Database Recovery

We have discussed that non-serial schedules which are not
serializable are called non-serializable schedules.
Non-serializable schedules may be recoverable or
irrecoverable.


Recoverable Schedules
If, in a schedule,
a transaction performs a dirty read from an
uncommitted transaction, and its commit operation
is delayed until the uncommitted transaction either
commits or rolls back, then such a schedule is
called a Recoverable Schedule.

Types of Recoverable Schedules

• A recoverable schedule may be any one of


these kinds
Cascading Schedule
Cascadeless Schedule
Strict Schedule


• Cascading Schedule-
• If, in a schedule, the failure of one transaction causes several other dependent
transactions to roll back or abort, then such a schedule is called
a Cascading Schedule (also cascading rollback or cascading abort).
• This leads to wasted CPU time.

• Here,
• Transaction T2 depends on transaction T1.
• Transaction T3 depends on transaction T2.
• Transaction T4 depends on transaction T3.
• In this schedule,
• The failure of transaction T1 causes the transaction T2 to rollback.
• The rollback of transaction T2 causes the transaction T3 to rollback.
• The rollback of transaction T3 causes the transaction T4 to rollback.
• Such a rollback is called a Cascading Rollback.
• NOTE-
• If transactions T2, T3 and T4 had committed before the
failure of transaction T1, then the schedule would have been
irrecoverable.


Cascadeless Schedule
• If, in a schedule, a transaction is not allowed to read a data item
until the last transaction that wrote it has committed or aborted,
then such a schedule is called a Cascadeless Schedule.
• In other words,
• a cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading rollback and thus saves CPU time.

Example

NOTE-
• Cascadeless schedule allows only committed read operations.
• However, it allows uncommitted write operations.


Strict Schedule
• If, in a schedule, a transaction is neither allowed to read
nor to write a data item until the last transaction that
wrote it has committed or aborted, then such a schedule
is called a Strict Schedule.
• In other words,
strict schedules allow only committed read and write
operations.
This means a strict schedule imposes more restrictions
than a cascadeless schedule.

Example


Remember-
 Strict schedules are stricter than cascadeless schedules.
All strict schedules are cascadeless schedules.
Not all cascadeless schedules are strict schedules.

Database Recovery
Purpose of Database Recovery
To bring the database into the last consistent state, which existed prior
to the failure.
To preserve transaction properties (Atomicity, Consistency, Isolation
and Durability).
Example:
If the system crashes before a fund transfer transaction completes its
execution, then either one or both accounts may have incorrect value.
Thus, the database must be restored to the state before the
transaction modified any of the accounts.


Types of Failure
• The database may become unavailable for use due to
 Transaction failure: Transactions may fail because of
incorrect input, deadlock, incorrect synchronization.
System failure: System may fail because of addressing
error, application error, operating system fault, RAM
failure, etc.
Media failure: Disk head crash, power interruption, etc.

Transaction Log
 For recovery from any type of failure, the data value prior
to modification (BFIM - BeFore IMage) and the new
value after modification (AFIM - AFter IMage) are
required.
BFIM: the old value of the data item before the
update.
AFIM: the new value of the data item after the
update.
These values, together with other information, are stored
in a sequential file called the transaction log.
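A sketch of a log of BFIM/AFIM pairs, and how they drive undo (restore the BFIM) and redo (reapply the AFIM), might look like this (the record layout and method names are invented for illustration):

```java
import java.util.*;

// Illustrative sketch of a transaction log holding BFIM/AFIM pairs.
public class TransactionLog {
    static class LogRecord {
        final String tx, item;
        final int bfim, afim;
        LogRecord(String tx, String item, int bfim, int afim) {
            this.tx = tx; this.item = item; this.bfim = bfim; this.afim = afim;
        }
    }

    private final List<LogRecord> log = new ArrayList<>();

    // Append a log record (the log is a sequential, append-only file).
    public void write(String tx, String item, int bfim, int afim) {
        log.add(new LogRecord(tx, item, bfim, afim));
    }

    // UNDO: scan the log backwards, restoring the BFIMs of a failed transaction.
    public void undo(String tx, Map<String, Integer> db) {
        for (int i = log.size() - 1; i >= 0; i--) {
            LogRecord r = log.get(i);
            if (r.tx.equals(tx)) db.put(r.item, r.bfim);
        }
    }

    // REDO: scan the log forwards, reapplying the AFIMs of a committed transaction.
    public void redo(String tx, Map<String, Integer> db) {
        for (LogRecord r : log)
            if (r.tx.equals(tx)) db.put(r.item, r.afim);
    }

    public static void main(String[] args) {
        Map<String, Integer> db = new HashMap<>();
        db.put("A", 300);
        TransactionLog tl = new TransactionLog();
        tl.write("T1", "A", 300, 350);   // BFIM = 300, AFIM = 350
        db.put("A", 350);                // apply the update
        tl.undo("T1", db);               // T1 fails: restore the before image
        System.out.println("A after undo: " + db.get("A"));
    }
}
```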


Data Update
• Immediate update: as soon as a data item is modified in the
cache, the disk copy is updated.
• Deferred update: all modified data items in the cache are
written out either after the transaction ends its execution or
after a fixed number of transactions have completed their
execution.
• Shadow update: the modified version of a data item does
not overwrite its disk copy but is written to a separate disk
location.

Recovery Scheme:
1. Deferred Update (No-Undo/Redo)
The data update goes as follows:
• A set of transactions records their updates in the log.
• At the commit point, under the WAL scheme, these updates are saved to
the database disk.
• Write-Ahead Logging (WAL) is a standard method for ensuring data
integrity: the log records describing a modification are written to stable
storage before the modification itself, providing durability, consistency,
and recovery in the event of failures.
• After reboot from a failure, the log is used to redo all the
transactions affected by the failure.
• No undo is required, because no AFIM is flushed to the disk
before a transaction commits.


2. Recovery Techniques Based on Immediate Update


Undo/No-Redo Algorithm
 In this algorithm, the AFIMs of a transaction are flushed to
the database disk under WAL before it commits.
For this reason, the recovery manager undoes all
uncommitted transactions during recovery.
No transaction is redone.
Even a transaction that has completed execution and is
ready to commit, but has not yet committed, is undone.

3. Shadow Paging
The AFIM does not overwrite the BFIM but is recorded at
another place on the disk.
Thus, at any time, a data item has its AFIM and its BFIM
(the shadow copy of the data item) at two different places
on the disk.

X and Y: shadow copies of the data items

X' and Y': current copies of the data items


4. The ARIES Recovery Algorithm


It stands for Algorithm for Recovery and Isolation Exploiting
Semantics.
ARIES uses logs to record the progress of transactions and
the actions that change recoverable data objects.
The ARIES recovery algorithm is based on:
WAL (Write-Ahead Logging): the DBMS must write the log
records associated with a particular modification before it writes the
modified page to disk.
Repeating history during redo:
• ARIES retraces all actions of the database system prior to the crash to
reconstruct the database state at the moment of the crash.
Logging changes during undo:
• This prevents ARIES from repeating completed undo operations if a
failure occurs during recovery and causes a restart of the recovery
process.

The ARIES Recovery Algorithm Consists of Three Steps:


1. Analysis: identifies the dirty (updated) pages in the buffer
and the set of transactions active at the time of the crash.
2. Redo: the necessary redo operations are applied.
3. Undo: the log is scanned backwards, and the operations of
transactions active at the time of the crash are undone in
reverse order.


Recovery in Multi database Systems


An MDBS integrates a set of autonomous and heterogeneous
local DBSs, such as:
CAD (computer-aided design)
CASE (computer-aided software engineering)
GIS (geographic information systems)
WFMS (workflow management systems)
In turn, each local DBS consists of a local DBMS and a
database.
Users can access information from multiple sources
through global transactions.
Operations belonging to global transactions are executed
by the local DBMSs.
Besides global transactions, local transactions also exist
in a multidatabase environment.

Cont.…
• The Global Transaction Manager (GTM) is responsible for
recoverability in an MDBS.
• The GTM consists of a set of interface servers (servers, for
short) that work with multiple LDBSs.
• Each LDBS has an associated server.
• An LDBS consists of a DBMS and at least one database.
• The GTM comprises three modules:
1. Global Transaction Interface (GTI)
2. Global Scheduler (GS)
3. Global Recovery Manager (GRM)


Group Assignment
Group 1: Data Fragmentation Techniques in Distributed
database Design
Group 2: Replication Allocation Techniques for Distributed
database Design
Group 3: Types of Distributed Database Systems
Group 4: Query Processing in Distributed Databases

End of Chapter-5

Chapter 6

Data Structures
The Set
 A set in data structure is a collection of unique elements, often
referred to as distinct values or items.
 It is an abstract data type that can store a collection of data
elements without any particular order.
 Each element in the set is unique, meaning that no two elements
can have the same value.
 Sets are commonly used in computer science and programming
to solve problems that involve storing and manipulating distinct
elements.
 They provide operations such as inserting an element into the
set, removing an element from the set, checking if an element is
present in the set, and performing operations like union,
intersection, and difference between sets.
 It does not allow duplicate elements. When an element is inserted into a set, the set
checks if the element already exists, and if it does, the insertion is ignored.
 Sets are also often implemented using data structures such as arrays, linked lists,
or binary trees.
 The choice of implementation depends on the specific requirements of the
problem at hand, such as the expected size of the set, the expected number of
operations, and the desired time complexity for the operations.
Set Implementation Classes
 Implementing a set using the HashSet class in Java:

import java.util.HashSet;
import java.util.Set;

public class SetExample {
    public static void main(String[] args) {
        // Create a set of integers
        Set<Integer> numberSet = new HashSet<>();
        // Add elements to the set
        numberSet.add(10);
        numberSet.add(20);
        numberSet.add(30);
        numberSet.add(40);
        // Print the set
        System.out.println("Number Set: " + numberSet);
        // Check if an element exists in the set
        boolean contains20 = numberSet.contains(20);
        System.out.println("Contains 20: " + contains20);
        // Remove an element from the set
        boolean removed = numberSet.remove(30);
        System.out.println("Removed 30: " + removed);
        // Print the set after removal
        System.out.println("Number Set after removal: " + numberSet);
        // Perform set operations
        Set<Integer> otherSet = new HashSet<>();
        otherSet.add(20);
        otherSet.add(50);
        otherSet.add(60);
        // Union of two sets
        Set<Integer> unionSet = new HashSet<>(numberSet);
        unionSet.addAll(otherSet);
        System.out.println("Union Set: " + unionSet);
        // Intersection of two sets
        Set<Integer> intersectionSet = new HashSet<>(numberSet);
        intersectionSet.retainAll(otherSet);
        System.out.println("Intersection Set: " + intersectionSet);
        // Difference between two sets
        Set<Integer> differenceSet = new HashSet<>(numberSet);
        differenceSet.removeAll(otherSet);
        System.out.println("Difference Set: " + differenceSet);
    }
}
The List

 A list in data structure is a collection of elements where each element is


assigned a unique position or index within the list.
 It is an ordered collection, meaning that the elements are stored sequentially
in the list, and each element can be accessed using its index.
 Lists can be implemented using various data structures, such as arrays,
linked lists, or dynamic arrays.
 The main advantage of a list is its ability to store elements in a specific
order.
 This allows for easy traversal of the elements in a predictable manner, either
sequentially or by accessing specific positions.
 Some common operations performed on lists include
 inserting an element at a specific position

 removing an element from a specific position

 updating the value of an element at a specific position

 accessing an element by its index and

 finding the number of elements in the list.

 Additionally, lists can support various features such as sorting, searching,


merging, splitting, and reversing the order of elements.
 In Java, the "List" interface is part of the Java Collections framework and provides
an abstraction for implementing lists.
 The most commonly used implementation classes for the List interface are:
 1. ArrayList:
 It is an implementation of a dynamic array, meaning it automatically resizes itself
when elements are added or removed.
 It provides fast access to elements using their indexes but may have slower
insertion or removal operations at the beginning or middle of the list.
 2. LinkedList:
 It is an implementation of a doubly-linked list, where each element is connected
to its previous and next elements.
 It provides fast insertion and removal operations anywhere in the list but may
have slower access to elements using their indexes.
List Implementation Classes
 Implementing a List using ArrayList in Java:

import java.util.ArrayList;
import java.util.List;

public class ArrListExample {
    public static void main(String[] args) {
        // Create an ArrayList of Strings
        List<String> names = new ArrayList<>();
        // Add elements to the list
        names.add("Abebe");
        names.add("Chala");
        names.add("Mike");
        // Access elements by index
        String first = names.get(0);
        System.out.println("First name: " + first);
        // Remove an element
        boolean removed = names.remove("Chala");
        System.out.println("Removed: " + removed);
        // Print the list
        System.out.println("Names: " + names);
    }
}
Implementing a List using LinkedList in Java:

import java.util.LinkedList;
import java.util.List;

public class LinkListExample {
    public static void main(String[] args) {
        // Create a LinkedList of Integers
        List<Integer> numbers = new LinkedList<>();
        // Add elements to the list
        numbers.add(10);
        numbers.add(20);
        numbers.add(30);
        // Access elements by index
        int first = numbers.get(0);
        System.out.println("First number: " + first);
        // Remove an element (by value, not by index)
        boolean removed = numbers.remove(Integer.valueOf(20));
        System.out.println("Removed: " + removed);
        // Print the list
        System.out.println("Numbers: " + numbers);
    }
}
The Queue
 A queue is a data structure that follows the First-In-First-Out (FIFO) principle.
 It is an ordered collection of elements where new elements are added at one end,
called the rear (the enqueue operation), and elements are removed at the other end,
called the front (the dequeue operation).
 The element that has been in the queue the longest is the first one to be removed.
 The main operations supported by a queue are:
 Enqueue: Adding an element to the rear of the queue.
 Dequeue: Removing and returning the element from the front of the queue.
 Peek/Front: Returning the element at the front of the queue without removing
it.
 isEmpty: Checking if the queue is empty.
 Size: Returning the current number of elements in the queue.
 Queues can be implemented using various data structures such as:
 arrays
 linked lists or
 circular buffers.
 Each implementation has its own advantages and trade-offs in terms of time
complexity and memory usage.
Queue Implementation Classes
import java.util.LinkedList;
import java.util.Queue;

public class QueueExample {
    public static void main(String[] args) {
        Queue<String> queue = new LinkedList<>();
        // Enqueue elements
        queue.add("Chala");
        queue.add("Abebe");
        queue.add("Mike");
        // Print the current queue
        System.out.println("Queue: " + queue);
        // Dequeue an element
        String dequeued = queue.remove();
        System.out.println("Dequeued: " + dequeued);
        // Get the front element without dequeueing
        String front = queue.peek();
        System.out.println("Front: " + front);
        // Check if the queue is empty
        boolean empty = queue.isEmpty();
        System.out.println("Is Empty? " + empty);
        // Print the updated queue
        System.out.println("Updated Queue: " + queue);
    }
}
