Unit 4 dbms LNU
Unit-4
Transaction
A transaction can be defined as a group of tasks, where a single task is the minimum processing
unit that cannot be divided further.
o A transaction is a set of logically related operations; it contains a group of tasks.
o A transaction is an action, or series of actions, performed by a single user to access the
contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account. This single
transaction consists of several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
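The low-level tasks above can be sketched as one atomic unit in Python. Everything here is a hypothetical illustration: the dictionary stands in for the database, and the opening balances (5000 for X, 2000 for Y) are assumed.

```python
# Minimal sketch of the Rs 800 transfer as one atomic transaction.
db = {"X": 5000, "Y": 2000}          # assumed opening balances

def transfer(db, src, dst, amount):
    """Debit src and credit dst as one unit: all steps succeed or none do."""
    snapshot = dict(db)              # saved state, used if we must abort
    try:
        old_balance = db[src]        # Old_Balance = X.balance
        if old_balance < amount:
            raise ValueError("insufficient funds")
        db[src] = old_balance - amount   # X.balance = New_Balance
        db[dst] = db[dst] + amount       # Y.balance = Y.balance + amount
    except Exception:
        db.clear()
        db.update(snapshot)          # abort: restore the original state
        raise

transfer(db, "X", "Y", 800)
print(db)  # {'X': 4200, 'Y': 2800}
```

If any step raises, the snapshot is restored, so the transfer either happens completely or not at all.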
Operations of Transaction
Following are the main operations of a transaction:
Read(X): the read operation reads the value of X from the database and stores it in a buffer in
main memory.
Write(X): the write operation writes the value of X from the buffer back to the database.
Consider a debit transaction on an account, which consists of the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Assume that X's balance in the database is 4000. If the debit transaction fails after executing
operation 2, the new value exists only in the main-memory buffer, so X's value remains 4000 in the
database. The debit is lost, which is not acceptable to the bank.
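The lost-update failure just described can be simulated with a toy buffer model. The helpers below and the starting balance of 4000 are assumptions made only for illustration:

```python
# Toy model of Read(X)/Write(X): R copies from the database into a
# main-memory buffer, W copies from the buffer back to the database.
database = {"X": 4000}   # durable storage; X starts at 4000 as in the text
buffer = {}              # main-memory working copy

def read(item):          # R(X)
    buffer[item] = database[item]

def write(item):         # W(X)
    database[item] = buffer[item]

read("X")                # operation 1: R(X)
buffer["X"] -= 500       # operation 2: X = X - 500 (changes the buffer only)
# -- transaction fails here, before operation 3: W(X) --
print(database["X"])     # 4000: the debit never reached the database
```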
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as the ACID properties − in order to ensure accuracy,
completeness, and data integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none of them are. There must be no state in the
database where a transaction is left partially completed. The database state must be defined either
as it was before the execution of the transaction or as it is after the execution, abortion, or
failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be
updated once the system springs back into action.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction
will be carried out and executed as if it were the only transaction in the system. No
transaction will affect the existence of any other transaction.
States of Transactions
A transaction in a database can be in one of the following states −
Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
Partially Committed − When a transaction executes its final operation, it is said to be in
a partially committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select
one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
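The state diagram described above can be sketched as a small state machine. The class and transition table below are a hypothetical illustration, not part of any real DBMS API:

```python
# Allowed transitions between the five transaction states described above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),   # terminal (a restart creates a new transaction)
    "committed":           set(),   # terminal
}

class Transaction:
    def __init__(self):
        self.state = "active"       # every transaction starts in the active state

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

t = Transaction()
t.move_to("partially committed")    # final operation executed
t.move_to("committed")              # effects are now permanent
print(t.state)  # committed
```

Illegal moves, such as jumping straight from active to committed, raise an error, mirroring the rule that a transaction must pass through the partially committed state first.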
Transaction Properties
A transaction has four properties. These are used to maintain consistency in the database
before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that all operations of the transaction take place at once; if not, the transaction is
aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either runs to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Consistency
o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistency property states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to another
consistent state.
Isolation
o It means that the data being used during the execution of one transaction cannot be
used by a second transaction until the first one is completed.
o In isolation, if transaction T1 is executing and using the data item X, then that
data item cannot be accessed by any other transaction T2 until T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property guarantees that once a transaction completes, the changes it
made are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system
failure. When a transaction is completed, the database reaches a state known as
the consistent state. That consistent state cannot be lost, even in the event of a
system failure.
o The recovery subsystem of the DBMS is responsible for enforcing durability.
What is serializability in DBMS?
Serializability is the property that a schedule of concurrent transactions produces the same
result as some serial execution of those transactions.
Example
Let's take two transactions, T1 and T2. If both transactions are performed without interfering
with each other, the schedule is called a serial schedule. It can be represented as follows −
T1              T2
READ1(A)
WRITE1(A)
READ1(B)
C1
                READ2(B)
                WRITE2(B)
                READ2(B)
                C2
Non-serial schedule − a schedule in which the operations of transactions T1 and T2
are interleaved (overlapped).
Example
Consider the following example −
T1              T2
READ1(A)
WRITE1(A)
                READ2(B)
                WRITE2(B)
READ1(B)
WRITE1(B)
READ1(B)
Types of serializability
There are two types of serializability –
View serializability
Conflict serializability
Conflict serializability
A schedule is conflict serializable if it orders all conflicting operations in the same way as
some serial execution. A pair of operations is said to conflict if they belong to different
transactions, operate on the same data item, and at least one of them is a write operation.
Precedence graph
Conflict serializability is tested with a precedence graph: one node for each transaction, and
an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj. A schedule
is conflict serializable if and only if its precedence graph contains no cycle.
Example 1
A cycle is present in the precedence graph, so the schedule is not conflict serializable.
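The precedence-graph test can be sketched in Python. The schedule encoding (tuples of transaction, operation, data item) and the function name are assumptions chosen for this sketch:

```python
# Build a precedence graph from a schedule and test it for cycles.
# Two operations conflict when they come from different transactions,
# touch the same data item, and at least one of them is a write.
def conflict_serializable(schedule):
    txns = {t for t, _, _ in schedule}
    edges = {t: set() for t in txns}
    for i, (ti, opi, xi) in enumerate(schedule):
        for tj, opj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (opi, opj):
                edges[ti].add(tj)               # edge Ti -> Tj
    # conflict serializable iff the graph is acyclic (DFS with colours)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}
    def has_cycle(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)

# T1 and T2 write A in interleaved order: the graph is cyclic
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T2", "R", "A")]
print(conflict_serializable(s))  # False
```

Running it on the interleaved schedule reports False, matching the rule that a cycle in the precedence graph rules out conflict serializability.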
A transaction can fail for reasons such as the following:
1. Logical errors: a logical error occurs when a transaction cannot complete because of a
code error or an internal error condition.
2. System errors: a system error occurs when the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.
Recoverability in DBMS
The characteristics of non-serializable schedules are as follows −
Irrecoverable schedules
If a transaction performs a dirty read from an uncommitted transaction and commits before the
transaction from which it read the value, such a schedule is called an irrecoverable
schedule.
Example
T1              T2
Read(A)
Write(A)
                Read(A)    // dirty read
                Write(A)
                Commit
Rollback
Recoverable Schedules
If a transaction performs a dirty read from an uncommitted transaction and its commit
operation is delayed until the uncommitted transaction has either committed or rolled back,
such a schedule is called a recoverable schedule.
Example
T1              T2
Read(A)
Write(A)
                Read(A)    // dirty read
                Write(A)
Commit
                Commit     // delayed
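The distinction between the two schedule types can be sketched as a checker. The event encoding and the function name below are assumptions made for illustration, and rollback records are simply ignored to keep the sketch short:

```python
# Classify a schedule as recoverable (True) or irrecoverable (False).
# Each event is (transaction, action, item); action is R, W, COMMIT or ROLLBACK.
def is_recoverable(schedule):
    writers = {}       # item -> last uncommitted transaction that wrote it
    reads_from = {}    # Tj -> set of transactions Tj dirty-read from
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            writers[item] = txn
        elif action == "R":
            src = writers.get(item)
            if src is not None and src != txn and src not in committed:
                reads_from.setdefault(txn, set()).add(src)   # dirty read
        elif action == "COMMIT":
            # recoverable only if every dirty-read source already committed
            if any(src not in committed for src in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

irrecoverable = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
                 ("T2", "COMMIT", None), ("T1", "ROLLBACK", None)]
recoverable   = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
                 ("T1", "COMMIT", None), ("T2", "COMMIT", None)]
print(is_recoverable(irrecoverable), is_recoverable(recoverable))  # False True
```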
Log-Based Recovery
Let's assume there is a transaction to modify the City of a student. The following log records are
written for this transaction.
o When the transaction is initiated, it writes a 'start' log record:
  <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', another log record
is written to the file:
  <Tn, City, 'Noida', 'Bangalore'>
o When the transaction is finished, it writes another log record to indicate the end of
the transaction:
  <Tn, Commit>
Recovery using Checkpoint
The recovery system reads the log file from the end towards the start, i.e., from transaction T4
back to T1.
o The recovery system maintains two lists: a redo-list and an undo-list.
o A transaction is put into the redo-list if the recovery system sees a log with both
<Tn, Start> and <Tn, Commit>, or with just <Tn, Commit>. All transactions in the redo-list
are redone.
o For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn, Commit>.
Transaction T1 has only <Tn, Commit>, because it started before the checkpoint and
committed after the checkpoint was crossed. Hence transactions T1, T2 and T3 are put into
the redo-list.
o A transaction is put into the undo-list if the recovery system sees a log with <Tn, Start>
but no commit or abort record. All transactions in the undo-list are undone, and their logs
are removed.
o For example: transaction T4 has only <Tn, Start>, so T4 is put into the undo-list, since the
transaction is not yet complete and failed in between.
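The redo/undo classification above can be sketched as follows. The log encoding and function name are assumed, and checkpoint handling is simplified: the list below is taken to be the portion of the log that the recovery system scans.

```python
# Sketch of building the redo-list and undo-list from the scanned log.
def build_lists(log):
    started, committed = set(), set()
    for txn, rec in log:
        if rec == "START":
            started.add(txn)
        elif rec == "COMMIT":
            committed.add(txn)
    redo = committed                 # <Tn, Commit> seen (with or without start)
    undo = started - committed       # <Tn, Start> seen, but no commit/abort
    return redo, undo

log = [("T1", "COMMIT"),             # T1: only its commit record is in the scan
       ("T2", "START"), ("T2", "COMMIT"),
       ("T3", "START"), ("T3", "COMMIT"),
       ("T4", "START")]              # T4: started but never committed
redo, undo = build_lists(log)
print(sorted(redo), sorted(undo))  # ['T1', 'T2', 'T3'] ['T4']
```

T1, T2 and T3 land in the redo-list and T4 in the undo-list, matching the example above.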
Types of Checkpoints
There are basically two main types of Checkpoints:
1. Automatic Checkpoint
2. Manual Checkpoint
1. Automatic Checkpoint: These checkpoints occur at regular intervals, such as every hour or
every day. The intervals are set by the database administrator. They are generally used by
heavy, frequently updated databases, since regular checkpoints make it easy to recover the
data in case of failure.
2. Manual Checkpoint: These are checkpoints that are set manually by the database
administrator. Manual checkpoints are generally used for smaller databases and are taken
far less frequently, only when the administrator triggers them.
Advantages of Checkpoints
Checkpoints help us in recovering the transaction of the database in case of a random
shutdown of the database.
They enhance the consistency of the database when multiple transactions are
executing in the database simultaneously.
They speed up the data recovery process.
Checkpoints work as a synchronization point between the database and the transaction log
file in the database.
Checkpoint records in the log file are used to prevent unnecessary redo operations.
Since dirty pages are flushed out continuously in the background, it has a very low overhead
and can be done frequently.
Checkpoints provide the baseline information needed for the restoration of the lost state in
the event of a system failure.
A database checkpoint keeps track of change information and enables incremental database
backup.
A database storage checkpoint can be mounted, allowing regular file system operations to
be performed.
Database checkpoints can be used for application solutions which include backup, recovery
or database modifications.
Disadvantages of Checkpoints
1. Database storage checkpoints can only be used to restore from logical errors (e.g., a
human error).
2. Because all the data blocks are on the same physical device, database storage checkpoints
cannot be used to restore files after a media failure.
Deadlock
A deadlock arises when each transaction in a set is waiting for a data item locked by another
transaction in the same set, so none of them can proceed. For example, in a wait-for graph,
transaction T1 is waiting for data item X, which is locked by T3; T3 is waiting for Y, which is
locked by T2; and T2 is waiting for Z, which is locked by T1. Hence a waiting cycle is formed,
and none of the transactions can proceed with its execution.
Deadlock Handling
There are three classical approaches for deadlock handling, namely −
Deadlock prevention.
Deadlock avoidance.
Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a distributed
database system.
Deadlock Prevention
The deadlock prevention approach does not allow any transaction to acquire locks that will
lead to deadlocks. The convention is that when more than one transaction requests a lock on
the same data item, only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all the locks. In
this method, a transaction acquires all the locks before starting to execute and retains the
locks for the entire duration of transaction. If another transaction needs any of the already
acquired locks, it has to wait until all the locks it needs are available. Using this approach, the
system is prevented from being deadlocked since none of the waiting transactions are holding
any lock.
Deadlock Avoidance
The deadlock avoidance approach handles deadlocks before they occur. It analyzes the
transactions and the locks to determine whether or not waiting leads to a deadlock.
The method can be briefly stated as follows. Transactions start executing and request data
items that they need to lock. The lock manager checks whether the lock is available. If it is
available, the lock manager allocates the data item and the transaction acquires the lock.
However, if the item is locked by some other transaction in incompatible mode, the lock
manager runs an algorithm to test whether keeping the transaction in waiting state will cause
a deadlock or not. Accordingly, the algorithm decides whether the transaction can wait or one
of the transactions should be aborted.
There are two algorithms for this purpose, namely wait-die and wound-wait. Let us assume
that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already
locked by T2. The algorithms are as follows –
Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2,
T1 is aborted and later restarted.
Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is
younger than T2, T1 is allowed to wait.
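The two decisions can be written out as a pair of functions. Timestamps are assumed here as the measure of age (a smaller timestamp means an older transaction), and the function names are illustrative:

```python
# Wait-die and wound-wait decisions for a lock request.
# The requester asks for a lock held by the holder; smaller timestamp = older.
def wait_die(ts_requester, ts_holder):
    """Older requester waits; younger requester dies (aborts, restarts later)."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Older requester wounds (aborts) the holder; younger requester waits."""
    return "wound holder" if ts_requester < ts_holder else "wait"

# T1 older (timestamp 5) than T2 (timestamp 9):
print(wait_die(5, 9))     # wait          (T1 is allowed to wait)
print(wound_wait(5, 9))   # wound holder  (T2 is aborted and restarted)
# T1 younger than T2:
print(wait_die(9, 5))     # die           (T1 is aborted and restarted)
print(wound_wait(9, 5))   # wait          (T1 is allowed to wait)
```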
Before discussing concurrency control, we should understand concurrent execution: multiple
transactions running on the database at the same time.
Concurrency Control
Concurrency control is the mechanism required for controlling and managing the concurrent
execution of database operations, and thus avoiding inconsistencies in the database. To
maintain the concurrency of the database, we have the concurrency control protocols.
Lock management
Lock-Based Protocol
In this type of protocol, no transaction can read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by
the transaction.
o It can be shared between transactions, because while a transaction holds a shared lock, it
cannot update the data on the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data item
simultaneously.
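A minimal lock table illustrating the compatibility rule (only shared + shared is compatible) might look like this; the class and method names are assumptions of this sketch, and lock upgrades and re-entrant requests are left out:

```python
# Minimal lock table granting shared (S) and exclusive (X) locks.
class LockManager:
    def __init__(self):
        self.table = {}   # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        held = self.table.get(item)
        if held is None:
            self.table[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # only S + S is compatible; any combination involving X conflicts
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

lm = LockManager()
print(lm.acquire("T1", "A", "S"))  # True: shared lock granted
print(lm.acquire("T2", "A", "S"))  # True: shared locks can coexist
print(lm.acquire("T3", "A", "X"))  # False: exclusive conflicts with shared
```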
Database Security
Security for databases must cover and safeguard the following aspects:
o The database containing data.
o Database management systems (DBMS)
o Any applications that are associated with it.
o Physical database servers or the database server virtual, and the hardware that runs it.
o The infrastructure for computing or network that is used to connect to the database.
Security of databases is a complicated and challenging task that requires all aspects of security
practices and technologies. It is inherently at odds with the accessibility of databases: the
more usable and accessible the database is, the more susceptible it is to security threats;
conversely, the more it is hardened against attacks and threats, the more difficult it becomes
to access and utilize.
Distributed Databases
A distributed database is basically a database that is not limited to one system; it is spread
over different sites, i.e., on multiple computers or over a network of computers. A distributed
database system is located on various sites that don't share physical components. This may
be required when a particular database needs to be accessed by various users globally. It
needs to be managed such that, to its users, it looks like one single database.
Object-Oriented Databases
1. Object Structure:
The structure of an object refers to the properties that an object is made up of. These
properties of an object are referred to as attributes. Thus, an object is a real-world entity
with certain attributes that make up the object structure. An object also encapsulates data
and code into a single unit, which in turn provides data abstraction by hiding the
implementation details from the user.
The object structure is further composed of three types of components: Messages, Methods,
and Variables. These are explained below.
1. Messages –
A message provides an interface or acts as a communication medium between an object
and the outside world. A message can be of two types:
Read-only message: If the invoked method does not change the value of a variable,
then the invoking message is said to be a read-only message.
Update message: If the invoked method changes the value of a variable, then the
invoking message is said to be an update message.
2. Methods –
When a message is passed then the body of code that is executed is known as a method.
Whenever a method is executed, it returns a value as output. A method can be of two
types:
Read-only method: When the value of a variable is not affected by a method, then it is
known as the read-only method.
Update method: When the value of a variable is changed by a method, then it is known
as an update method.
3. Variables –
Variables store the data of an object. The data stored in the variables distinguishes
one object from another.
2. Object Classes:
An object which is a real-world entity is an instance of a class. Hence first we need to define a
class and then the objects are made which differ in the values they store but share the same
class definition. The objects in turn correspond to various messages and variables stored in
them.
Example –
class CLERK
{
    // variables
    string name;
    string address;
    int id;
    int salary;
    // messages
    string get_name();
    string get_address();
    int annual_salary();
};
In the above example, we can see, CLERK is a class that holds the object variables and
messages.
Features of ODBMS:
Object-oriented data model: ODBMS uses an object-oriented data model to store and
manage data. This allows developers to work with data in a more natural way, as objects are
similar to the objects in the programming language they are using.
Complex data types: ODBMS supports complex data types such as arrays, lists, sets, and
graphs, allowing developers to store and manage complex data structures in the database.
Automatic schema management: ODBMS automatically manages the schema of the
database, as the schema is defined by the classes and objects in the application code. This
eliminates the need for a separate schema definition language and simplifies the
development process.
High performance: ODBMS can provide high performance, especially for applications that
require complex data access patterns, as objects can be retrieved with a single query.
Data integrity: ODBMS provides strong data integrity, as the relationships between objects
are maintained by the database. This ensures that data remains consistent and correct, even
in complex applications.
Concurrency control: ODBMS provides concurrency control mechanisms that ensure that
multiple users can access and modify the same data without conflicts.
Scalability: ODBMS can scale horizontally by adding more servers to the database cluster,
allowing it to handle large volumes of data.
Support for transactions: ODBMS supports transactions, which ensure that multiple
operations on the database are atomic and consistent.
Advantages:
Supports Complex Data Structures: ODBMS is designed to handle complex data structures,
such as inheritance, polymorphism, and encapsulation. This makes it easier to work with
complex data models in an object-oriented programming environment.
Improved Performance: ODBMS provides improved performance compared to traditional
relational databases for complex data models. ODBMS can reduce the amount of mapping
and translation required between the programming language and the database, which can
improve performance.
Reduced Development Time: ODBMS can reduce development time since it eliminates the
need to map objects to tables and allows developers to work directly with objects in the
database.
Supports Rich Data Types: ODBMS supports rich data types, such as audio, video, images, and
spatial data, which can be challenging to store and retrieve in traditional relational databases.
Scalability: ODBMS can scale horizontally and vertically, which means it can handle larger
volumes of data and can support more users.
Disadvantages:
Limited Adoption: ODBMS is not as widely adopted as traditional relational databases, which
means it may be more challenging to find developers with experience working with ODBMS.
Lack of Standardization: ODBMS lacks standardization, which means that different vendors
may implement different features and functionality.
Cost: ODBMS can be more expensive than traditional relational databases since it requires
specialized software and hardware.
Integration with Other Systems: ODBMS can be challenging to integrate with other systems,
such as business intelligence tools and reporting software.
Scalability Challenges: ODBMS may face scalability challenges due to the complexity of the
data models it supports, which can make it challenging to partition data across multiple
nodes.