
DBMS Unit4

Uploaded by

Mohammad Aabid
Copyright
© © All Rights Reserved

Relational Algebra

Relational algebra is a procedural query language. It gives a step-by-step process to
obtain the result of a query. It uses operators to perform queries.

Types of Relational operation

1. Select Operation:

o The select operation selects tuples that satisfy a given predicate.


o It is denoted by sigma (σ).

1. Notation: σ p(r)

Where:

σ denotes the selection operation
r is the relation
p is the selection predicate, a propositional logic formula which may use connectives like AND, OR, and NOT.
The predicate can also use comparison operators like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500


Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

1. σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
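
The selection above can be sketched in Python. This is only an illustration, assuming a relation is stored as a list of dictionaries; the helper name select is hypothetical and not part of any library.

```python
# LOAN relation from the example, one dict per tuple.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """Hypothetical helper for sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# sigma BRANCH_NAME = "Perryride" (LOAN)
perryride_loans = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
for row in perryride_loans:
    print(row["BRANCH_NAME"], row["LOAN_NO"], row["AMOUNT"])
```

Because the predicate is an ordinary function, conditions joined by AND, OR, and NOT can be written with Python's and, or, and not.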

2. Project Operation:

o The project operation lists only those attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table.
o Duplicate rows in the result are eliminated.
o It is denoted by ∏.

1. Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye


Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

1. ∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
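
A minimal Python sketch of projection, assuming a relation is a list of tuples plus a separate tuple of attribute names. The helper project is hypothetical; it also removes duplicate rows, which matters when only CITY is projected.

```python
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]


def project(rows, attrs, wanted):
    """Hypothetical helper for pi: keep only the wanted columns, drop duplicates."""
    positions = [attrs.index(a) for a in wanted]
    seen = set()
    result = []
    for row in rows:
        reduced = tuple(row[i] for i in positions)
        if reduced not in seen:  # relations are sets, so repeats are dropped
            seen.add(reduced)
            result.append(reduced)
    return result


name_city = project(CUSTOMER, ATTRS, ("NAME", "CITY"))
cities = project(CUSTOMER, ATTRS, ("CITY",))
print(name_city)
print(cities)
```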

3. Union Operation:

o Suppose there are two relations R and S. The union operation contains all the
tuples that are either in R or S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.

1. Notation: R ∪ S

A union operation must hold the following conditions:

o R and S must have the same number of attributes (with compatible domains).
o Duplicate tuples are eliminated automatically.

Example:
DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO
Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner
Jones

Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:

o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R & S.
o It is denoted by ∩.

1. Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:

o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (-).

1. Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table


Input:

1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
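
The union, intersection, and difference examples can be checked with Python's built-in set type. This is a sketch under the assumption that the projected CUSTOMER_NAME columns are modelled as sets of strings.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection on CUSTOMER_NAME; building a set removes duplicates.
depositor_names = {name for name, _ in DEPOSITOR}
borrow_names = {name for name, _ in BORROW}

union = borrow_names | depositor_names          # BORROW names UNION DEPOSITOR names
intersection = borrow_names & depositor_names   # names present in both
difference = borrow_names - depositor_names     # borrowers who are not depositors

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```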

6. Cartesian product

o The Cartesian product is used to combine each row in one table with each
row in the other table. It is also known as a cross product.
o It is denoted by X.

1. Notation: E X D

Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales
C Legal

Input:

1. EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
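
As a sketch, the Cartesian product can be reproduced with itertools.product from the Python standard library, assuming each relation is a list of tuples. The extra filter at the end is an illustration of how a join condition narrows the product.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: every employee tuple paired with every department tuple.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]
for row in cross:
    print(*row)

# Keeping only rows where EMP_DEPT equals DEPT_NO leaves one row per employee.
matched = [row for row in cross if row[2] == row[3]]
```

The product has 3 × 3 = 9 rows, which matches the output table above.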

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.

1. ρ(STUDENT1, STUDENT)

Join Operations:

A Join operation combines related tuples from different relations, if and only if a
given join condition is satisfied. It is denoted by ⋈.

Example:
EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry

SALARY

EMP_CODE SALARY

101 50000

102 30000

103 25000

1. Operation: (EMPLOYEE ⋈ SALARY)

Result:

EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000

Types of Join operations:


1. Natural Join:

o A natural join is the set of tuples of all combinations in R and S that are equal
on their common attribute names.
o It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

1. ∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)

Output:

EMP_NAME SALARY

Stephan 50000

Jack 30000
Harry 25000
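
A minimal sketch of a natural join in Python, assuming each relation is a list of dictionaries keyed by attribute name. The function natural_join is a hypothetical nested-loop helper, not an optimized implementation.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]


def natural_join(r, s):
    """Hypothetical helper: combine tuples that agree on all common attributes."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    joined = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})  # common attributes appear once
    return joined


joined = natural_join(EMPLOYEE, SALARY)
# Projection on EMP_NAME, SALARY
result = [(t["EMP_NAME"], t["SALARY"]) for t in joined]
print(result)
```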

2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with
missing information.

Example:

EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad

FACT_WORKERS

EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000

Input:

1. (EMPLOYEE ⋈ FACT_WORKERS)

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000


Hari Nehru nagar Hyderabad TCS 50000

An outer join is basically of three types:

a. Left outer join


b. Right outer join
c. Full outer join

a. Left outer join:

o Left outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names.
o In addition, it keeps the tuples in R that have no matching tuples in S, with the attributes of S set to NULL.
o It is denoted by ⟕.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

1. EMPLOYEE ⟕ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

b. Right outer join:

o Right outer join contains the set of tuples of all combinations in R and S that
are equal on their common attribute names.
o In addition, it keeps the tuples in S that have no matching tuples in R, with the attributes of R set to NULL.
o It is denoted by ⟖.

Example: Using the above EMPLOYEE table and FACT_WORKERS Relation

Input:
1. EMPLOYEE ⟖ FACT_WORKERS

Output:

EMP_NAME BRANCH SALARY STREET CITY

Ram Infosys 10000 Civil line Mumbai

Shyam Wipro 20000 Park street Kolkata

Hari TCS 50000 Nehru nagar Hyderabad

Kuber HCL 30000 NULL NULL

c. Full outer join:

o Full outer join is like a left or right join except that it contains all rows from both
tables.
o In full outer join, the result also includes the tuples in R that have no matching tuples in S and the tuples in S
that have no matching tuples in R on their common attribute name, padded with NULL.
o It is denoted by ⟗.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

1. EMPLOYEE ⟗ FACT_WORKERS

Output:

EMP_NAME STREET CITY BRANCH SALARY

Ram Civil line Mumbai Infosys 10000

Shyam Park street Kolkata Wipro 20000

Hari Nehru nagar Hyderabad TCS 50000

Ravi M.G. Street Delhi NULL NULL

Kuber NULL NULL HCL 30000
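
The three outer joins can be sketched with one Python function. This assumes EMP_NAME is unique within each relation, so each relation can be stored as a dictionary keyed by EMP_NAME; None stands in for NULL, and outer_join with its two flags is a hypothetical helper.

```python
EMPLOYEE = {
    "Ram": ("Civil line", "Mumbai"),
    "Shyam": ("Park street", "Kolkata"),
    "Ravi": ("M.G. Street", "Delhi"),
    "Hari": ("Nehru nagar", "Hyderabad"),
}
FACT_WORKERS = {
    "Ram": ("Infosys", 10000),
    "Shyam": ("Wipro", 20000),
    "Kuber": ("HCL", 30000),
    "Hari": ("TCS", 50000),
}


def outer_join(left, right, keep_left, keep_right):
    """Rows are (EMP_NAME, STREET, CITY, BRANCH, SALARY); None plays the role of NULL."""
    rows = []
    for name, (street, city) in left.items():
        if name in right:
            branch, salary = right[name]
            rows.append((name, street, city, branch, salary))
        elif keep_left:  # unmatched tuple of the left relation
            rows.append((name, street, city, None, None))
    if keep_right:
        for name, (branch, salary) in right.items():
            if name not in left:  # unmatched tuple of the right relation
                rows.append((name, None, None, branch, salary))
    return rows


inner = outer_join(EMPLOYEE, FACT_WORKERS, False, False)
left_outer = outer_join(EMPLOYEE, FACT_WORKERS, True, False)
right_outer = outer_join(EMPLOYEE, FACT_WORKERS, False, True)
full_outer = outer_join(EMPLOYEE, FACT_WORKERS, True, True)
```

Note that this sketch always emits columns in the order of the left relation first, whereas the right outer join table above lists the FACT_WORKERS columns first.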

3. Equi join:
It is the most common form of inner join. It is based on matched
data as per the equality condition. The equi join uses only the comparison operator (=).

Example:

CUSTOMER RELATION

CLASS_ID NAME

1 John

2 Harry

3 Jackson

PRODUCT

PRODUCT_ID CITY

1 Delhi

2 Mumbai

3 Noida

Input:

1. CUSTOMER ⋈ CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID PRODUCT

Output:

CLASS_ID NAME PRODUCT_ID CITY

1 John 1 Delhi

2 Harry 2 Mumbai

3 Jackson 3 Noida

Functional Dependency
The functional dependency is a relationship that exists between two attributes. It
typically exists between the primary key and non-key attribute within a table.

1. X → Y

The left side of the FD is known as the determinant, and the right side of the FD is
known as the dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee
table because if we know the Emp_Id, we can tell the employee name associated
with it.

Functional dependency can be written as:

1. Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
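
Whether a functional dependency holds in a given table instance can be tested mechanically. The sketch below assumes rows are dictionaries; the function fd_holds and the sample rows are hypothetical and only serve as an illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the employee table.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Mumbai"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Pune"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_address = fd_holds(employees, ["Emp_Name"], ["Emp_Address"])
print(id_determines_name, name_determines_address)
```

A check like this can only refute a dependency on the data at hand. Whether Emp_Id → Emp_Name holds for all valid data is a design decision about the schema.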

Types of Functional dependency

1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

1. Consider a table with two columns Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
3. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

1. ID → Name,
2. Name → DOB

Inference Rule (IR):


o The Armstrong's axioms are the basic inference rule.
o Armstrong's axioms are used to conclude functional dependencies on a
relational database.
o The inference rule is a type of assertion. It can apply to a set of FD(functional
dependency) to derive other FD.
o Using the inference rule, we can derive additional functional dependency from
the initial set.
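
Deriving additional FDs from an initial set is usually done by computing the closure of an attribute set. The sketch below is one common way to do this, assuming single-letter attribute names and FDs written as pairs of strings; closure, implies, and the sample FD set are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical initial set: A -> B and B -> C.
FDS = [("A", "B"), ("B", "C")]

a_closure = closure("A", FDS)
transitive = implies(FDS, "A", "C")    # follows by the transitive rule
augmented = implies(FDS, "AD", "BD")   # follows by the augmentation rule
not_derivable = implies(FDS, "C", "A")
print(sorted(a_closure), transitive, augmented, not_derivable)
```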

There are 6 inference rules for functional dependencies:

1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
Since Y is a subset of X, X → Y holds.
2. Augmentation Rule (IR2)
In augmentation, if X determines Y,
then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z
separately.
If X → YZ then X → Y and X → Z

Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
The same argument with YZ → Z gives X → Z.

6. Pseudo transitive Rule (IR6)


In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)

Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:

o Making relations very large.


o It isn't easy to maintain and update data as it would involve searching many
records in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyse and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure
to eliminate anomalies leads to data redundancy and can cause data integrity and
other problems as the database grows. Normalization consists of a series of
guidelines that helps to guide you in creating a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a
new tuple into a relation due to lack of data.
o Deletion Anomaly: A deletion anomaly refers to the situation where the
deletion of data results in the unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single
data value requires multiple rows of data to be updated.

Types of Normal Forms:


Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. The relation is said to be in particular normal form
if it satisfies constraints.

Following are the various types of Normal forms:


Normal Form   Description

1NF           A relation is in 1NF if every attribute contains only atomic values.

2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.

3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF          A stronger definition of 3NF is known as Boyce Codd's normal form.

4NF           A relation is in 4NF if it is in Boyce Codd's normal form and has no
              multi-valued dependency.

5NF           A relation is in 5NF if it is in 4NF and does not contain any join dependency;
              joining should be lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.

What is lossless join decomposition in DBMS?

Lossless-join decomposition is a process in which a relation is decomposed into two or
more relations. This property guarantees that rejoining the sub relations generates neither
extra (spurious) tuples nor fewer tuples, so no information is lost from the original relation during the
decomposition. It is also known as non-additive join decomposition.

When the sub relations are joined again, the new relation must be the same as the
original relation was before decomposition.

Consider a relation R that is decomposed into sub relations R1 and R2.

The decomposition is lossless when it satisfies the following conditions −

o The union of the attributes of R1 and R2 must contain all the attributes that
are available in the original relation R before decomposition.
o The intersection of the attributes of R1 and R2 cannot be empty. The sub relations must contain a
common attribute.
o The common attribute must be a super key of at least one of the sub relations R1 or R2,
that is, it must contain unique data in that sub relation.

Here,

R = (A, B, C)

R1 = (A, B)

R2 = (B, C)

The relation R has three attributes A, B, and C. The relation R is decomposed into two
relations R1 and R2. R1 and R2 have two attributes each. The common attribute is B.
The values in column B must be unique. If it contains a duplicate value, then the
Lossless-join decomposition is not possible.

Draw a table of Relation R with Raw Data −

R (A, B, C)
A B C
12 25 34
10 36 09
12 42 30

It decomposes into the two sub relations −

R1 (A, B)
A B
12 25
10 36
12 42
R2 (B, C)
B C
25 34
36 09
42 30

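Now, we can check the conditions for Lossless-join decomposition.

The union of the attributes of sub relations R1 and R2 is the same as the attributes of relation R.

R1 ∪ R2 = (A, B) ∪ (B, C) = (A, B, C) = R

The common attribute B is not empty, and its values are unique, so B is a key of R2.

Joining R1 and R2 on B, we get the following result −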

A B C
12 25 34
10 36 09
12 42 30

The relation is the same as the original relation R. Hence, the above decomposition is
Lossless-join decomposition.
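
The check above can be reproduced in a few lines of Python. This is a sketch that assumes each relation is a set of tuples and that the value 09 is stored as the integer 9. The second decomposition, on the non-unique column A, is an added illustration of what goes wrong when the common attribute is not a key.

```python
R = {(12, 25, 34), (10, 36, 9), (12, 42, 30)}

# Decompose R(A, B, C) into R1(A, B) and R2(B, C).
R1 = {(a, b) for a, b, c in R}
R2 = {(b, c) for a, b, c in R}

# Natural join of R1 and R2 on the common attribute B.
rejoined = {(a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}
lossless_on_b = rejoined == R

# Decompose instead into S1(A, B) and S2(A, C); A holds the duplicate value 12.
S1 = {(a, b) for a, b, c in R}
S2 = {(a, c) for a, b, c in R}
rejoined_on_a = {(a, b, c) for (a, b) in S1 for (a2, c) in S2 if a == a2}
spurious = rejoined_on_a - R  # tuples that were never in R

print(lossless_on_b, sorted(spurious))
```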
Transaction Management
A transaction is a set of operations that performs a logical unit of work.
It is the bundle of all the instructions of a logical operation. A transaction
usually means that the data in the database has changed. One of the
major uses of DBMS is to protect the user’s data from system failures. It is
done by ensuring that all the data is restored to a consistent state when
the computer is restarted after a crash. The transaction is any one
execution of the user program in a DBMS. One of the important properties
of the transaction is that it contains a finite number of steps. Executing the
same program multiple times will generate multiple transactions.
Example: Consider the following example of transaction operations to be
performed to withdraw cash from an ATM vestibule.
Steps for ATM Transaction
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
A transaction can include the following basic database access
operations.
o Read/Access data (R): Accessing the database item from disk
(where the database stores data) into a memory variable.
o Write/Change data (W): Write the data item from the memory
variable to the disk.
o Commit: Commit is a transaction control command that is used to
permanently save the changes done in a transaction.
Example: Transfer of ₹50 from Account A to Account B. Initially A = ₹500,
B = ₹800. This data is brought to RAM from Hard Disk.

R(A) -- 500 // Accessed from RAM.
A = A-50 // Deducting ₹50 from A.
W(A) -- 450 // Updated in RAM.
R(B) -- 800 // Accessed from RAM.
B = B+50 // ₹50 is added to B's Account.
W(B) -- 850 // Updated in RAM.
commit // The data in RAM is taken back to Hard Disk.

[Figure: Stages of Transaction]

Note: The updated value of Account A = ₹450 and Account B = ₹850.

All instructions before committing come under a partially committed state
and are stored in RAM. When the commit is executed, the data is fully
accepted and is stored on the Hard Disk.
If the transaction fails anywhere before committing, we have to go back
and start from the beginning. We can’t continue from the same state. This
is known as Roll Back.
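
The read, write, commit, and roll back steps can be sketched in Python. This is a simplified model under stated assumptions: a dictionary named disk stands in for the hard disk, a working copy stands in for RAM, and the insufficient-funds rule is a hypothetical reason for failure that does not come from the example.

```python
class InsufficientFunds(Exception):
    """Hypothetical failure used to trigger a roll back."""


def transfer(disk, src, dst, amount):
    ram = dict(disk)               # R(A), R(B): bring the data into memory
    try:
        ram[src] -= amount         # A = A - amount, W(A) in RAM
        if ram[src] < 0:
            raise InsufficientFunds(src)
        ram[dst] += amount         # B = B + amount, W(B) in RAM
    except InsufficientFunds:
        return False               # roll back: discard RAM, disk is untouched
    disk.update(ram)               # commit: make the changes permanent
    return True


disk = {"A": 500, "B": 800}
committed = transfer(disk, "A", "B", 50)
rolled_back = transfer(disk, "A", "B", 10_000)
print(committed, rolled_back, disk)
```

The second call fails after A has already been changed in the working copy, yet the values on disk keep the result of the first transfer only.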
Desirable Properties of Transaction (ACID Properties)
For a transaction to be performed in DBMS, it must possess several
properties often called ACID properties.
o A – Atomicity
o C – Consistency
o I – Isolation
o D – Durability
Transaction States
Transactions can be implemented using SQL queries and servers.
[Figure: Transaction States]

The transaction has four properties. These are used to maintain
consistency in a database, before and after the transaction.
Property of Transaction:
o Atomicity
o Consistency
o Isolation
o Durability
Atomicity
o States that either all operations of the transaction take place or,
if not, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially.
Each transaction is treated as one unit and either runs to
completion or is not executed at all.
o Atomicity involves the following two operations:
  o Abort: If a transaction aborts, then none of the changes made are
  visible.
  o Commit: If a transaction commits, then all the changes made are
  visible.
Consistency
o The integrity constraints are maintained so that the database is
consistent before and after the transaction.
o The execution of a transaction will leave a database in either its
prior stable state or a new stable state.
o The consistency property of the database states that every transaction
sees a consistent database instance.
o The transaction is used to transform the database from one
consistent state to another consistent state.
Isolation
o It states that the data which is used at the time of execution of a
transaction cannot be used by a second transaction until the
first one is completed.
o In isolation, if the transaction T1 is being executed and using the
data item X, then that data item can’t be accessed by any other
transaction T2 until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the
isolation property.
Durability
o The durability property indicates the permanence of the
database’s consistent state. It states that the changes made by a
committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty
transaction or by a system failure. When a transaction is
completed, the database reaches a state known as the
consistent state. That consistent state cannot be lost, even in the
event of a system failure.
o The recovery subsystem of the DBMS is responsible for the
durability property.
Implementation of Atomicity and Durability
The recovery-management component of a database system can support
atomicity and durability by a variety of schemes.
E.g. the shadow-database scheme:
Shadow copy
o In the shadow-copy scheme, a transaction that wants to update
the database first creates a complete copy of the database.
o All updates are done on the new database copy, leaving the
original copy, the shadow copy, untouched. If at any point the
transaction has to be aborted, the system merely deletes the new
copy. The old copy of the database has not been affected.
o This scheme, which is based on making copies of the database called
shadow copies, assumes that only one transaction is active at a
time.
o The scheme also assumes that the database is simply a file on
disk. A pointer called db pointer is maintained on disk; it points to
the current copy of the database.
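
The shadow-copy idea can be sketched in memory with Python. This is only a model under stated assumptions: the class ShadowDB is hypothetical, dictionaries stand in for database files, and an integer stands in for the db pointer that a real system would update atomically on disk.

```python
import copy


class ShadowDB:
    """Hypothetical in-memory model of the shadow-copy scheme."""

    def __init__(self, data):
        self.copies = {0: data}   # copy id -> database contents
        self.db_pointer = 0       # points to the current copy

    def current(self):
        return self.copies[self.db_pointer]

    def run(self, update):
        """Run one transaction; only one transaction is active at a time."""
        new_id = self.db_pointer + 1
        self.copies[new_id] = copy.deepcopy(self.current())
        try:
            update(self.copies[new_id])   # all updates go to the new copy
        except Exception:
            del self.copies[new_id]       # abort: delete the new copy
            return False
        old_id = self.db_pointer
        self.db_pointer = new_id          # commit: switch the pointer
        del self.copies[old_id]           # the old copy is no longer needed
        return True


def good_transfer(db):
    db["A"] -= 50
    db["B"] += 50


def failing_transfer(db):
    db["A"] -= 50
    raise RuntimeError("simulated failure before commit")


db = ShadowDB({"A": 500, "B": 800})
committed = db.run(good_transfer)
aborted = db.run(failing_transfer)
print(committed, aborted, db.current())
```

Switching the pointer is the single step that makes the transaction visible, which is why an abort never has to undo anything.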
Transaction Isolation Levels in DBMS
Some other transaction may also have used a value produced by the failed
transaction, so we also have to roll back those transactions.
The SQL standard defines four isolation levels:
o Read Uncommitted: Read Uncommitted is the lowest isolation
level. In this level, one transaction may read not yet committed
changes made by other transactions, thereby allowing dirty reads.
In this level, transactions are not isolated from each other.
o Read Committed: This isolation level guarantees that any data
read is committed at the moment it is read. Thus it does not
allow dirty reads. The transaction holds a read or write lock on
the current row, and thus prevents other transactions from
reading, updating or deleting it.
o Repeatable Read: This is a more restrictive isolation level than
Read Committed. The transaction holds read locks on all rows it
references and write locks on all rows it inserts, updates, or
deletes. Since other transactions cannot update or delete these rows,
it avoids non-repeatable reads.
o Serializable: This is the highest isolation level. The
execution is guaranteed to be serializable. Serializable execution
is defined to be an execution of operations in which concurrently
executing transactions appear to be serially executing.
Failure Classification
To find where the problem has occurred, we classify a failure into
the following categories:
o Transaction failure
o System crash
o Disk failure

1. Transaction failure
A transaction failure occurs when a transaction fails to execute or when it reaches a
point from where it can’t go any further. If a few transactions or processes are
affected, then this is called a transaction failure.
Reasons for a transaction failure could be –
1. Logical errors: If a transaction cannot complete due to some
code error or an internal error condition, then a logical error
occurs.
2. System error: It occurs where the DBMS itself terminates an
active transaction because the database system is not able to
execute it. For example, the system aborts an active transaction
in case of deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or
software failure. Example: Operating system error.
o Fail-stop assumption: In the system crash, non-volatile storage
is assumed not to be corrupted.
3. Disk Failure
o It occurs where hard-disk drives or storage drives fail.
It was a common problem in the early days of
technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head
crash, unreachability of the disk, or any other failure which
destroys all or part of disk storage.

Uses of Transaction Management
o The DBMS is used to schedule concurrent access to data. It
means that multiple users can access data from the database
without interfering with each other. Transactions are
used to manage concurrency.
o It is also used to satisfy ACID properties.
o It is used to solve Read/Write Conflicts.
o It is used to implement Recoverability, Serializability, and
Cascading.
o Transaction Management is also used for Concurrency Control
Protocols and the Locking of data.
Advantages of using a Transaction
o Maintains a consistent and valid database after each transaction.
o Makes certain that updates to the database don’t affect its
dependability or accuracy.
o Enables simultaneous use by numerous users without sacrificing
data consistency.
Disadvantages of using a Transaction
o It may be difficult for end-users to change the information within the
transaction database.
o We always need to roll back and start from the beginning rather
than continue from the previous state.
Conclusion
In DBMSs, transaction management is crucial to preserving data integrity.
To guarantee dependable operations, it upholds the ACID (Atomicity,
Consistency, Isolation, Durability) qualities. A key component of reliable
database systems, transactions enable the grouping of several processes
into a single unit while providing data consistency and security against
concurrent access.
Schedule
A schedule is a series of operations from one or more transactions, listed in the
order in which they are executed. It preserves the order of the operations within each individual
transaction.

1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when the first
transaction completes its cycle, then the next transaction is executed.

For example: Suppose there are two transactions T1 and T2 which have some
operations. If there is no interleaving of operations, then there are the following two
possible outcomes:
1. Execute all the operations of T1, followed by all the operations of
T2.
2. Execute all the operations of T2, followed by all the operations of
T1.

o In the given (a) figure, Schedule A shows the serial schedule where T1
followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2
followed by T1.

2. Non-serial Schedule
o If interleaving of operations is allowed, then there will be non-serial schedule.
o It contains many possible orders in which the system can execute the
individual operations of the transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial
schedules. It has interleaving of operations.

3. Serializable schedule
o The serializability of schedules is used to find non-serial schedules that allow
the transaction to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction
have interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Here,

Schedule A and Schedule B are serial schedule.

Schedule C and Schedule D are Non-serial schedule.

Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as a precedence graph.
This graph is a pair G = (V, E), where V is a set of vertices and E is a
set of edges. The set of vertices contains all the transactions participating in
the schedule. The set of edges contains all edges Ti → Tj for which one of
the three conditions holds:

1. Create an edge Ti → Tj if Ti executes write (Q) before Tj executes read (Q).
2. Create an edge Ti → Tj if Ti executes read (Q) before Tj executes write (Q).
3. Create an edge Ti → Tj if Ti executes write (Q) before Tj executes write (Q).

o If a precedence graph contains a single edge Ti → Tj, then all the instructions
of Ti are executed before the first instruction of Tj is executed.
o If a precedence graph for schedule S contains a cycle, then S is non-
serializable. If the precedence graph has no cycle, then S is known as
serializable.
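
The precedence-graph test can be sketched in Python. This assumes a schedule is a list of (transaction, operation, data item) triples with "R" for read and "W" for write; the two sample schedules are hypothetical and are not the schedules S1 and S2 from the figures.

```python
def precedence_edges(schedule):
    """Add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: a cycle exists
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


# Hypothetical schedules for illustration.
cyclic = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
acyclic = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]

cyclic_edges = precedence_edges(cyclic)
cyclic_is_serializable = not has_cycle(cyclic_edges)
acyclic_is_serializable = not has_cycle(precedence_edges(acyclic))
print(sorted(cyclic_edges), cyclic_is_serializable, acyclic_is_serializable)
```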

For example:
Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges


Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges

Precedence graph for schedule S1:


The precedence graph for schedule S1 contains a cycle that's why Schedule S1 is
non-serializable.
Explanation:

Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges

Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle, which is why Schedule S2 is
serializable.

Conflict Serializable Schedule


o A schedule is called conflict serializable if, after swapping of non-conflicting
operations, it can be transformed into a serial schedule.
o The schedule will be conflict serializable if it is conflict equivalent to a serial
schedule.

Conflicting Operations
Two operations conflict if all of the following conditions are satisfied:

1. Both belong to separate transactions.


2. They have the same data item.
3. They contain at least one write operation.
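
The three conditions translate directly into a small Python predicate. This sketch assumes an operation is a (transaction, kind, data item) triple with kind "R" or "W"; the function name conflicts is hypothetical.

```python
def conflicts(op1, op2):
    """True if the two operations satisfy all three conflict conditions."""
    txn1, kind1, item1 = op1
    txn2, kind2, item2 = op2
    return (
        txn1 != txn2                  # 1. separate transactions
        and item1 == item2            # 2. same data item
        and "W" in (kind1, kind2)     # 3. at least one write
    )


read_read = conflicts(("T1", "R", "A"), ("T2", "R", "A"))
read_write = conflicts(("T1", "R", "A"), ("T2", "W", "A"))
different_items = conflicts(("T1", "W", "A"), ("T2", "W", "B"))
same_transaction = conflicts(("T1", "R", "A"), ("T1", "W", "A"))
print(read_read, read_write, different_items, same_transaction)
```

Two adjacent operations in a schedule can be swapped without changing the outcome exactly when this predicate is False.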

Example:
Swapping is possible only if S1 and S2 are logically equal.

Here, S1 = S2. That means it is non-conflict.


Here, S1 ≠ S2. That means it is conflict.

Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into the other by swapping non-
conflicting operations. In the given example, S2 is conflict equivalent to S1 (S1 can
be converted to S2 by swapping non-conflicting operations).

Two schedules are said to be conflict equivalent if and only if:

1. They contain the same set of transactions.
2. Each pair of conflicting operations is ordered in the same way.

Example:
Schedule S2 is a serial schedule because all operations of T1 are performed before
any operation of T2 starts. Schedule S1 can be transformed into a serial
schedule by swapping its non-conflicting operations.

After swapping the non-conflicting operations, schedule S1 becomes:

T1            T2

Read(A)
Write(A)
Read(B)
Write(B)
              Read(A)
              Write(A)
              Read(B)
              Write(B)

Hence, S1 is conflict serializable.

View Serializability
o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind
writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:

1. Initial Read
The initial read of each data item must be the same in both schedules. Suppose there
are two schedules S1 and S2. If transaction T1 reads the initial value of data item A
in S1, then T1 must also read the initial value of A in S2.

The above two schedules are view equivalent because the initial read operation in S1
is done by T1 and in S2 it is also done by T1.

2. Updated Read
In schedule S1, if Ti reads a value of A that was written by Tj, then in S2 Ti must
also read the value of A written by Tj.

The above two schedules are not view equivalent because, in S1, T3 reads A updated by
T2, whereas in S2, T3 reads A updated by T1.
3. Final Write
The final write on each data item must be the same in both schedules. If transaction
T1 performs the last write on A in S1, then in S2 the final write on A must also be
done by T1.

The above two schedules are view equivalent because the final write operation in S1
is done by T3 and in S2 the final write operation is also done by T3.

Example:

Schedule S

With 3 transactions, the total number of possible serial schedules = 3! = 6:

S1 = <T1 T2 T3>
S2 = <T1 T3 T2>
S3 = <T2 T3 T1>
S4 = <T2 T1 T3>
S5 = <T3 T1 T2>
S6 = <T3 T2 T1>

Taking first schedule S1:


Schedule S1

Step 1: Updated Read

In both schedules S and S1, there is no read other than the initial read, so this
condition does not need to be checked.

Step 2: Initial Read

The initial read operation in S is done by T1 and in S1, it is also done by T1.

Step 3: Final Write

The final write operation in S is done by T3 and in S1, it is also done by T3. So, S
and S1 are view equivalent.

The first schedule S1 satisfies all three conditions, so we don't need to check
the other schedules.

Hence, the view equivalent serial schedule is:

T1 → T2 → T3
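The search over the 3! serial orders can be sketched in Python. Everything here is illustrative: the tuple encoding, the helper names, and the sample schedule S are assumptions. The schedule was chosen only to match the description above (one initial read by T1, blind writes, final write by T3); it is not taken from the figure.

```python
from itertools import permutations


def view_summary(schedule):
    """Collect who each read reads from, and the final writer of each item."""
    last_writer = {}
    reads_from = []
    for txn, action, item in schedule:
        if action == "R":
            # "<initial>" marks a read of the value present before the schedule.
            reads_from.append((txn, item, last_writer.get(item, "<initial>")))
        else:
            last_writer[item] = txn
    return sorted(reads_from), last_writer


def serial_schedule(schedule, order):
    """Run the transactions one after another in the given order."""
    return [op for txn in order for op in schedule if op[0] == txn]


def view_equivalent_serial_orders(schedule):
    """Try every serial order and keep those that are view equivalent."""
    txns = sorted({op[0] for op in schedule})
    target = view_summary(schedule)
    return [order for order in permutations(txns)
            if view_summary(serial_schedule(schedule, order)) == target]


# Hypothetical schedule S with blind writes by T2 and T3.
S = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

print(view_equivalent_serial_orders(S))  # [('T1', 'T2', 'T3')]
```

The reads_from list covers both the initial-read and updated-read conditions, and last_writer covers the final-write condition. Trying all n! orders is only practical for a handful of transactions.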

Concurrency Control Techniques in DBMS


What is Concurrency Control in DBMS?
Concurrency control is a crucial component of a Database Management System
(DBMS). It manages simultaneous operations so that they do not conflict
with each other. The primary aim is to maintain consistency, integrity, and isolation
when multiple users or applications access the database simultaneously.

In a multi-user database environment, it’s common for numerous users to want to
access and modify the database simultaneously. This is what we call concurrent
execution. Imagine a busy library where multiple librarians are updating book records
simultaneously. Just as multiple librarians shouldn’t try to update the same record
simultaneously, database users shouldn’t interfere with each other’s operations.
Executing transactions concurrently offers many benefits, like improved system
resource utilization and increased throughput. However, these simultaneous
transactions mustn’t interfere with each other. The ultimate goal is to ensure the
database remains consistent and correct. For instance, if two people try to book the
last seat on a flight at exactly the same moment, the system must ensure that only one
person gets the seat.
But concurrent execution can lead to various challenges:
1. Lost Updates: Consider two users trying to update the same data. If both
users read the same data item and then each writes back a value based on
what it read, the later write overwrites the earlier one, and the earlier
update is lost without either user being aware of it.
2. Uncommitted Data: If one user accesses data that another user has
updated but not yet committed (finalized), and then the second user
decides to abort (cancel) their transaction, the first user has invalid data.
3. Inconsistent Retrievals: A transaction reads several values from the
database, but another transaction modifies some of those values in the
middle of its operation.
To address these challenges, the DBMS employs concurrency control
techniques. Think of it like traffic rules. Just as traffic rules ensure vehicles don’t
collide, concurrency control ensures transactions don’t conflict.
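The lost update problem listed above can be reproduced with a deterministic replay of read and write steps. This is a toy sketch: the balance of 100, the two deltas, and the run helper are all made-up values and names for illustration.

```python
def run(interleaving, balance=100):
    """Replay read/write steps; each transaction works on its own local copy."""
    db = {"balance": balance}
    local = {}                       # value each transaction last read
    deltas = {"T1": +50, "T2": -30}  # hypothetical updates
    for txn, step in interleaving:
        if step == "read":
            local[txn] = db["balance"]
        else:  # "write": store the locally computed value
            db["balance"] = local[txn] + deltas[txn]
    return db["balance"]


serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial))       # 120: both updates applied
print(run(interleaved))  # 70: T1's +50 was overwritten and lost
```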
Why is Concurrency Control Needed?
From the discussion above, concurrency control is needed for the following
reasons:
1. Ensure Database Consistency: Without concurrency control,
simultaneous transactions could interfere with each other, leading to
inconsistent database states. Proper concurrency control ensures the
database remains consistent even after numerous concurrent
transactions.
2. Avoid Conflicting Updates: When two transactions attempt to update
the same data simultaneously, one update might overwrite the other
without proper control. Concurrency control ensures that updates don’t
conflict and cause unintended data loss.
3. Prevent Dirty Reads: Without concurrency control, one transaction might
read data that another transaction is in the middle of updating (but hasn’t
finalized). This can lead to inaccurate or “dirty” reads, where the data
doesn’t reflect the final, committed state.
4. Enhance System Efficiency: By managing concurrent access to the
database, concurrency control allows multiple transactions to be
processed in parallel. This improves system throughput and makes
optimal use of resources.
5. Protect Transaction Atomicity: For a series of operations within a
transaction, it’s crucial that all operations succeed (commit) or none do
(abort). Concurrency control ensures that transactions are atomic and
treated as a single indivisible unit, even when executed concurrently with
others.

Concurrency Control Techniques in DBMS


The various concurrency control techniques are:
1. Two-Phase Locking Protocol
2. Timestamp Ordering Protocol
3. Multiversion Concurrency Control
4. Validation Concurrency Control
Let’s look at these techniques in detail.
1. Two-phase locking Protocol
Two-phase locking (2PL) is a protocol used in database management systems to
control concurrency and ensure transactions are executed in a way that preserves
the consistency of a database. It’s called “two-phase” because, during each
transaction, there are two distinct phases: the Growing phase and the Shrinking
phase.

Breakdown of the Two-Phase Locking protocol


1. Phases:
o Growing Phase: During this phase, a transaction can obtain
(acquire) any number of locks as required but cannot release
any. This phase continues until the transaction has acquired all the
locks it needs and requests no more.
o Shrinking Phase: Once the transaction releases its first lock,
the Shrinking phase starts. During this phase, the transaction
can release locks but cannot acquire any more.
2. Lock Point: The moment at which the transaction acquires its final lock,
that is, the end of the Growing phase, is termed the lock point. The
Shrinking phase begins later, when the transaction releases its first lock.
The primary purpose of the Two-Phase Locking protocol is to ensure conflict-
serializability, as the protocol ensures a transaction does not interfere with others in
ways that produce inconsistent results.
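The two-phase rule can be expressed as a simple check over one transaction's lock and unlock requests. The sketch below assumes a list of ("lock" | "unlock", item) pairs; the function name check_two_phase and the sample sequences are hypothetical.

```python
def check_two_phase(actions):
    """Return (obeys_rule, index_of_last_lock) for one transaction.

    The rule is violated as soon as a lock is requested after any unlock.
    """
    shrinking = False
    last_lock = None
    for i, (kind, _item) in enumerate(actions):
        if kind == "lock":
            if shrinking:
                return False, None  # acquiring during the Shrinking phase
            last_lock = i           # still in the Growing phase
        elif kind == "unlock":
            shrinking = True        # first release starts the Shrinking phase
        else:
            raise ValueError(f"unknown action: {kind}")
    return True, last_lock


good = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
bad = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(check_two_phase(good))  # (True, 1): growing ends at the lock on B
print(check_two_phase(bad))   # (False, None): lock on B follows an unlock
```

The check covers only the ordering rule for a single transaction; a real lock manager would also handle lock modes, waiting, and deadlocks.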
2. Timestamp Ordering Protocol
The Timestamp Ordering Protocol is a concurrency control method used in
database management systems to maintain the serializability of transactions. This
method uses a timestamp for each transaction to determine its order in relation to
other transactions. Instead of using locks, it ensures transaction order based on their
timestamps.
Breakdown of the Timestamp Ordering Protocol
1. Read Timestamp (RTS):
o This is the largest (most recent) timestamp of any transaction that
has read the data item.
o Every time a data item X is read by a transaction T with
timestamp TS, the RTS of X is updated to TS if TS is more
recent than the current RTS of X.
2. Write Timestamp (WTS):
o This is the largest (most recent) timestamp of any transaction that
has written or updated the data item.
o Whenever a data item X is written by a transaction T with
timestamp TS, the WTS of X is updated to TS if TS is more
recent than the current WTS of X.
The timestamp ordering protocol uses these timestamps to determine whether a
transaction’s request to read or write a data item should be granted. The protocol
ensures that conflicting operations are executed in timestamp order, so the
precedence graph cannot contain a cycle and the schedule is serializable. Because
transactions never wait for locks, deadlocks cannot occur either.
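How RTS and WTS decide whether a request is granted can be sketched as follows. The accept/reject conditions are the usual textbook rules for basic timestamp ordering, which the notes above do not spell out, so treat them as an assumption; the Item class and the try_read and try_write names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    rts: int = 0  # largest timestamp that has read the item
    wts: int = 0  # largest timestamp that has written the item


def try_read(item, ts):
    """Grant the read unless a younger transaction already wrote the item."""
    if ts < item.wts:
        return False  # request rejected; the transaction would be rolled back
    item.rts = max(item.rts, ts)
    return True


def try_write(item, ts):
    """Grant the write unless a younger transaction already read or wrote it."""
    if ts < item.rts or ts < item.wts:
        return False  # request rejected; the transaction would be rolled back
    item.wts = ts
    return True


x = Item()
print(try_read(x, 5))   # True:  RTS(X) becomes 5
print(try_write(x, 3))  # False: TS 3 is older than RTS 5
print(try_write(x, 7))  # True:  WTS(X) becomes 7
print(try_read(x, 6))   # False: TS 6 is older than WTS 7
```

A rejected transaction is normally restarted with a new timestamp; that restart logic is left out of the sketch.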

3. Validation concurrency control


Validation (or Optimistic) Concurrency Control (VCC) is an advanced database
concurrency control technique. Instead of acquiring locks on data items, as is done
in most traditional (pessimistic) concurrency control techniques, validation
concurrency control allows transactions to work on private copies of database items
and validates the transactions only at the time of commit.
The central idea behind optimistic concurrency control is that conflicts between
transactions are rare, and it’s better to let transactions run to completion and only
check for conflicts at commit time.
Breakdown of Validation Concurrency Control (VCC):
1. Phases: Each transaction in VCC goes through three distinct phases:
o Read Phase: The transaction reads values from the database
and makes changes to its private copy without affecting the
actual database.
o Validation Phase: Before committing, the transaction checks if
the changes made to its private copy can be safely written to the
database without causing any conflicts.
o Write Phase: If validation succeeds, the transaction updates the
actual database with the changes made to its private copy.
2. Validation Criteria: During the validation phase, the system checks for
potential conflicts with other transactions. If a conflict is found, the system
can either roll back the transaction or delay it for a retry, depending on the
specific strategy implemented.
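The three phases can be illustrated with a small in-memory store. This is a simplified sketch, not a full validation protocol: it assumes a per-item version counter as the conflict test, assumes every key already exists in the store, and uses hypothetical Store and Transaction classes and a made-up seat key.

```python
class Store:
    """Shared database: values plus a version counter per item."""
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # item -> version seen in the read phase
        self.private = {}        # private copy of pending writes

    def read(self, key):
        """Read phase: remember which version of the item was seen."""
        if key in self.private:
            return self.private[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        """Read phase: changes go to the private copy only."""
        self.private[key] = value

    def commit(self):
        # Validation phase: fail if any item read has changed since.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False
        # Write phase: publish the private copy and bump versions.
        for key, value in self.private.items():
            self.store.data[key] = value
            self.store.version[key] += 1
        return True


store = Store({"seat_12": "free"})
t1, t2 = Transaction(store), Transaction(store)
if t1.read("seat_12") == "free":
    t1.write("seat_12", "alice")
if t2.read("seat_12") == "free":
    t2.write("seat_12", "bob")

print(t1.commit())            # True: nothing changed since t1 read
print(t2.commit())            # False: seat_12 changed after t2 read it
print(store.data["seat_12"])  # alice
```

Here a failed validation simply returns False; whether the caller rolls back and retries is the strategy choice mentioned above.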
Real-Life Example
Scenario: A world-famous band, “The Algorithmics,” is about to release tickets for
their farewell concert. Given their massive fan base, the ticketing system is expected
to face a surge in access requests.
The ticketing platform, EventBriteMax, must ensure that ticket sales are processed
smoothly without double bookings or system failures.
1. Two-Phase Locking Protocol (2PL):
o Usage: Mainly for premium ticket pre-sales to fan club members. These sales
occur a day before the general ticket release.
o Real-Life Example: When a fan club member logs in to buy a ticket, the
system uses 2PL. It locks the specific seat they choose during the
transaction. Once the transaction completes, the lock is released. This
ensures that no two fan club members can book the same seat at the same
time.
2. Timestamp Ordering Protocol:
o Usage: For general ticket sales.
o Real-Life Example: As thousands rush to book their tickets, each transaction
gets a timestamp. If two fans try to book the same seat simultaneously, the
one with the earlier timestamp gets priority. The other fan receives a
message suggesting alternative seats.
3. Validation Concurrency Control:
o Usage: For group bookings where multiple seats are booked in a single
transaction.
o Real-Life Example: A group of friends tries to book 10 seats together. They
choose their seats and proceed to payment. Before finalizing, the system
validates that all 10 seats are still available (i.e., no seat was booked by
another user in the meantime). If there’s a conflict, the group is prompted to
choose a different set of seats. If not, their booking is confirmed.
The concert ticket sales go off without a hitch. Fans rave about the smooth
experience, even with such high demand. Behind the scenes, EventBriteMax’s
effective implementation of these three concurrency control protocols played a crucial
role in ensuring that every fan had a fair chance to purchase their ticket and no seats
were double-booked. The Algorithmics go on to have a fantastic farewell concert,
with not a single problem in the ticketing process.
