UNIT I & UNIT II Material (M.Tech)
Database Objects. Normalization Techniques: Functional Dependency, 1NF, 2NF, 3NF, BCNF; Multivalued
Dependency; Lossless Join and Dependency Preservation.
Relational Models:
The relational model represents how data is stored in relational databases. A relational database consists of a
collection of tables, each of which is assigned a unique name. Consider, for example, a relation STUDENT with
attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE.
Data Modeling
Data modeling is the process of defining the structure of a database to ensure efficient data organization,
storage, and retrieval. It involves conceptualizing real-world entities, their attributes, and relationships in a
structured format.
Query Languages
Query languages allow users to interact with databases to retrieve, manipulate, and manage data. The most
widely used query language is SQL (Structured Query Language).
Types of SQL Statements
Create Table
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT
);
Insert Data
INSERT INTO Students (StudentID, Name, Age) VALUES (1, 'Alice', 22);
Retrieve Data
SELECT * FROM Students WHERE Age > 20;
Join Tables
SELECT Students.Name, Courses.CourseName
FROM Students
JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
JOIN Courses ON Enrollments.CourseID = Courses.CourseID;
MongoDB (Document-based)
db.students.find({ "age": { "$gt": 20 } })
Neo4j (Graph-based)
MATCH (s:Student)-[:ENROLLED_IN]->(c:Course) RETURN s.name, c.name;
Database Objects
Database objects are the components of a database that store, manage, and manipulate data. These objects help
in organizing data efficiently and ensuring its integrity. The most common database objects include tables,
views, indexes, stored procedures, triggers, sequences, and more.
1. Tables
Tables are the core database objects where data is stored in a structured format.
Example:
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT,
CourseID INT,
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
2. Views
A view is a virtual table that represents a saved SQL query. It does not store data but provides a way to simplify
complex queries.
Example:
CREATE VIEW StudentDetails AS
SELECT Students.Name, Courses.CourseName
FROM Students
JOIN Courses ON Students.CourseID = Courses.CourseID;
3. Indexes
Indexes improve the speed of data retrieval operations by creating an efficient lookup mechanism.
Example:
CREATE INDEX idx_student_name ON Students(Name);
Types of Indexes: common types include clustered, non-clustered (secondary), unique, and composite indexes.
4. Stored Procedures
Stored procedures are predefined SQL statements that execute complex tasks efficiently and securely.
Example:
CREATE PROCEDURE GetStudentDetails(@StudentID INT)
AS
BEGIN
SELECT * FROM Students WHERE StudentID = @StudentID;
END;
Execution:
EXEC GetStudentDetails 1;
5. Triggers
Triggers are automated database actions that execute in response to specific events, such as inserts, updates, or
deletes.
Example:
CREATE TRIGGER StudentInsertTrigger
ON Students
AFTER INSERT
AS
BEGIN
PRINT 'New student record added!';
END;
6. Sequences
A sequence is used to generate unique values, often for auto-incrementing primary keys.
Example:
CREATE SEQUENCE StudentID_Seq
START WITH 1
INCREMENT BY 1;
Usage:
INSERT INTO Students (StudentID, Name, Age)
VALUES (NEXT VALUE FOR StudentID_Seq, 'Alice', 22);
7. Synonyms
A synonym is an alternative name (an alias) for an existing database object, such as a table or view.
Example:
CREATE SYNONYM StudentSyn FOR Students;
SELECT * FROM StudentSyn;
Problems Caused by Redundancy
Storing the same information redundantly, that is, in more than one place within a database, can lead to
several problems.
1. Redundant Storage
Some information is stored repeatedly.
2. Update anomalies
If copies of a data item are scattered across the database and are not properly linked to each other,
then an update to one copy may succeed while other copies are left with their old values.
This leaves the database in an inconsistent state.
This anomaly is caused by data redundancy. Redundant information makes updates more difficult
since, for example, changing the name of student 501 would require that all tuples containing 501 in
Regno be updated. If for some reason all tuples are not updated, we might have a database that gives
two names for the same student, which is inconsistent information. This problem is called an update
anomaly. An update anomaly results in data inconsistency.
3. Insertion anomalies
An insertion anomaly is the inability to represent certain information without inserting unrelated data.
Let the primary key of the above relation be (Regno, course code). Any new tuple to be inserted into the
relation must have a value for the primary key, since the entity integrity constraint requires that a key may
not be totally or partially NULL. However, in the given relation, if one wanted to insert the code and name
of a new subject into the database, it would not be possible until a student enrols in that course. Similarly,
information about a new student cannot be inserted into the database until the student enrols in a course.
These problems are called insertion anomalies.
4. Deletion anomalies
A deletion anomaly is the loss of useful information when a tuple is deleted, because data that is still
needed was stored nowhere else. For example, if we delete the tuple corresponding to student 502
enrolled in CS-104, we lose the relevant information about the course, i.e., the course name. This is
called a deletion anomaly.
Decomposition:
Decomposition is the process of breaking down a relation R into two or more relation schemas that
each contain a subset of the attributes of R and together include all attributes of R.
Lossy Decomposition
The decomposition of R into R1 and R2 is lossy when the join of R1 and R2 does not yield the same
relation as R.
One disadvantage of such a decomposition is that some information is lost when the original relation
is reconstructed.
e.g. if we have a table STUDENT (Rollno, sname, dept) and we decompose it into
Student_info (Rollno, sname) and Student_dept (sname, dept), then when we join the two tables we
may get spurious (extra) tuples, which makes the data inconsistent, as the sketch below shows.
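A minimal SQL sketch of this lossy decomposition (the sample data is hypothetical; two students happen to share the name 'Ravi'):
CREATE TABLE Student_info (Rollno INT, sname VARCHAR(30));
CREATE TABLE Student_dept (sname VARCHAR(30), dept VARCHAR(20));
INSERT INTO Student_info VALUES (1, 'Ravi'), (2, 'Ravi');
INSERT INTO Student_dept VALUES ('Ravi', 'CSE'), ('Ravi', 'ECE');
-- The join on the non-key column sname returns four rows instead of the
-- original two: (1, Ravi, ECE) and (2, Ravi, CSE) are spurious tuples.
SELECT i.Rollno, i.sname, d.dept
FROM Student_info i
JOIN Student_dept d ON i.sname = d.sname;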
Lossless Join Decomposition (Non-additive Join)
The decomposition of R into R1 and R2 is lossless when the join of R1 and R2 yields the same
relation as R.
e.g. for the same table STUDENT (Rollno, sname, dept), decompose it instead into
Student_info (Rollno, sname) and Student_dept (Rollno, dept).
When we join the two tables we get back exactly the relation STUDENT; no spurious or extra tuples
are generated.
Hence, care must be taken before decomposing a relation into parts.
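Continuing the hypothetical data above, the lossless version joins on the key Rollno (the table is named Student_dept2 only to avoid clashing with the earlier sketch):
CREATE TABLE Student_dept2 (Rollno INT, dept VARCHAR(20));
INSERT INTO Student_dept2 VALUES (1, 'CSE'), (2, 'ECE');
-- Joining on the key returns exactly the two original rows.
SELECT i.Rollno, i.sname, d.dept
FROM Student_info i
JOIN Student_dept2 d ON i.Rollno = d.Rollno;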
Functional dependency
Definition
A functional dependency is a constraint between two sets of attributes from the database
A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets
of R specifies a constraint on the possible tuples that can form a relation state r of R.
The constraint is that, for any two tuples t1 and t2, if t1[X] = t2[X], then we must also have
t1[Y] = t2[Y].
This means that the values of the Y component of a tuple depend on, or are determined by, the values
of the X component; or alternatively, the values of the X component of a tuple uniquely (or
functionally) determine the values of the Y component.
We also say that there is a functional dependency from X to Y or that Y is functionally dependent on
X.
The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-hand side.
Functional dependency is represented by an arrow sign (→); that is, A → B means that A functionally
determines B.
Determinant: Attribute or set of attributes on the left hand side of the arrow.
Dependent: Attribute or set of attributes on the right hand side of the arrow.
If F is a set of functional dependencies, then the closure of F, denoted as F+, is the set of all functional
dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied repeatedly,
generate the closure of F:
1. Reflexivity rule: if Y ⊆ X, then X → Y.
2. Augmentation rule: if X → Y holds, then XZ → YZ also holds, for any set of attributes Z.
3. Transitivity rule: as with the transitive rule in algebra, if X → Y holds and Y → Z holds, then X → Z
also holds; X is said to functionally determine Y.
For example, if Regno → Dept and Dept → HOD hold, then Regno → HOD follows by transitivity.
Normalization
If a database design is not perfect, it may contain anomalies, which lead to inconsistency of the database
itself. Managing a database with anomalies is next to impossible.
Normalization is the process of efficiently organizing data in a database. There are two goals of the
normalization process:
Eliminate redundant data (for example, storing the same data in more than one table), and
Ensure data dependencies make sense (only storing related data in a table).
Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is
logically stored.
Normal Forms Based on FDs:
The normalization process, as first proposed by Codd (1972a), takes a relation schema through a
series of tests to "certify" whether it satisfies a certain normal form.
The normal forms based on FDs are first normal form (1NF), second normal form (2NF), third normal
form (3NF), and Boyce-Codd normal form (BCNF).
Normalization of data can hence be looked upon as a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing
redundancy and (2) minimizing the insertion, deletion, and update anomalies.
ADVANTAGES OF NORMALIZATION
A more flexible data structure, i.e., we are able to add new rows and data values easily.
DISADVANTAGES OF NORMALIZATION
You cannot start building the database before you know what the user needs.
On normalizing relations to higher normal forms, i.e., 4NF and 5NF, performance may degrade.
Normalizing relations of higher degree is a very time-consuming and difficult process.
Careless decomposition may lead to a bad database design, which may lead to serious problems.
EXAMPLE DATA:
A company obtains parts from a number of suppliers. Each supplier is located in one city. A city can
have more than one supplier located there and each city has a status code associated with it. Each supplier
may provide many parts. The company creates a simple relational table to store this information that can be
expressed in relational notation as:
FIRST (s#, status, city, p#, qty)
where
s#: supplier identification number (this is the primary key)
status: status code assigned to the city
city: name of the city where the supplier is located
p#: part number of the part supplied
qty: quantity of parts supplied to date
Definition: A relation is said to be in 1NF if it contains only atomic values and each row contains a unique
combination of values.
For example, if all the fields of a table are atomic, with single values, and each row contains a unique
combination of values, then the table is in 1NF.
1NF does not allow multivalued attributes. If multivalued attributes are present, the relation needs to
be decomposed.
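As an illustrative sketch (the table and column names are hypothetical, not from the text), a multivalued PHONE attribute is moved into its own table so that every field stays atomic:
-- Instead of storing several phone numbers in one column,
-- each number gets its own row in a separate table.
CREATE TABLE Student (
Rollno INT PRIMARY KEY,
Sname VARCHAR(50)
);
CREATE TABLE Student_Phone (
Rollno INT REFERENCES Student(Rollno),
Phone VARCHAR(15),
PRIMARY KEY (Rollno, Phone)
);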
Update anomalies are problems that arise when information is inserted, deleted, or updated.
INSERT. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added
until that supplier supplies a part.
DELETE. If a row is deleted, then not only is the information about quantity and part lost but also
information about the supplier.
UPDATE. If supplier s1 moved from London to New York, then six rows would have to be updated
with this new information.
Definition: A relation is said to be in 2NF if it is already in 1NF and every non-key attribute is fully
functionally dependent on the primary key of the relation.
Speaking inversely, if a table has attributes that are not dependent on the whole primary key of that
table, then it is not in 2NF.
That is, every non-key column must be dependent upon the entire primary key. FIRST is in 1NF but
not in 2NF because status and city are functionally dependent only upon the column s# of the
composite primary key (s#, p#). The process of transforming a table into 2NF is:
1. Identify any determinants other than the composite key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it determines.
3. Move the determined columns from the original table to the new table. The determinant becomes the
primary key of the new table.
4. Delete the columns you just moved from the original table, except for the determinant, which will
serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.
To transform FIRST into 2NF we move the columns s#, status, and city to a new table called SECOND. The
column s# becomes the primary key of this new table; the remaining columns form the table PARTS with the
composite key (s#, p#), as sketched below.
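In SQL, this 2NF decomposition might look as follows (a sketch: s# and p# are written sno and pno to keep the identifiers portable, and the column types are assumed):
CREATE TABLE SECOND (
sno CHAR(3) PRIMARY KEY,
status INT,
city VARCHAR(30)
);
CREATE TABLE PARTS (
sno CHAR(3) REFERENCES SECOND(sno),
pno CHAR(3),
qty INT,
PRIMARY KEY (sno, pno)
);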
INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be
inserted until there is a supplier in the city.
DELETE. Deleting any row in SECOND destroys the status information about the city as well as
the association between supplier and city.
Definition: A relation is said to be in 3NF if it is already in 2NF and there exists no transitive
dependency in that relation.
Speaking inversely, if a table contains a transitive dependency, then it is not in 3NF, and the table must
be split to bring it into 3NF.
Table PARTS is already in 3NF: the non-key column, qty, is fully dependent upon the primary key
(s#, p#). SECOND is in 2NF but not in 3NF because it contains a transitive dependency. A transitive
dependency occurs when a non-key column that is determined by the primary key is itself the determinant of
other columns. The concept of a transitive dependency can be illustrated by showing the functional
dependencies in SECOND:
s# → city
city → status
Then,
s# → status
Note that status is determined both by the primary key s# and by the non-key column city. The
process of transforming a table into 3NF is:
To transform SECOND into 3NF, we create a new table called CITY_STATUS and move the
columns city and status into it. Status is deleted from the original table, city is left behind to serve as a
foreign key to CITY_STATUS, and the original table is renamed to SUPPLIER_CITY to reflect its semantic
meaning. The resulting tables are shown below.
Putting the original table into 3NF has created three tables. These can be represented in
"pseudo-SQL" as:
PARTS (s#, p#, qty)
Primary Key (s#, p#)
Foreign Key (s#) references SUPPLIER_CITY.s#
SUPPLIER_CITY (s#, city)
Primary Key (s#)
Foreign Key (city) references CITY_STATUS.city
CITY_STATUS (city, status)
Primary Key (city)
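The same schema as runnable SQL (a sketch with assumed column types; s# and p# again written sno and pno; PARTS is unchanged from the 2NF sketch except that its foreign key would now reference SUPPLIER_CITY):
CREATE TABLE CITY_STATUS (
city VARCHAR(30) PRIMARY KEY,
status INT
);
CREATE TABLE SUPPLIER_CITY (
sno CHAR(3) PRIMARY KEY,
city VARCHAR(30) REFERENCES CITY_STATUS(city)
);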
The advantage of having relational tables in 3NF is that it eliminates redundant data, which in turn
saves space and reduces manipulation anomalies.
INSERT. Facts about the status of a city (Rome has a status of 50) can be added even though there is
no supplier in that city. Likewise, facts about new suppliers can be added even though they have not
yet supplied parts.
DELETE. Information about parts supplied can be deleted without destroying information about a
supplier or a city.
UPDATE. Changing the location of a supplier or the status of a city requires modifying only one
row.
Boyce-Codd Normal Form (BCNF)
Definition: A relation is in BCNF if, for every nontrivial functional dependency X → Y that holds on it,
X is a superkey.
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that
Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and
deleting Head of Dept. from the given relation. The normalized relations are sketched below.
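A sketch of the decomposed schema in SQL (the table names, column names, and types are assumed for illustration):
-- Head of Dept. is now stored exactly once per department.
CREATE TABLE Dept_Head (
Dept VARCHAR(30) PRIMARY KEY,
Head VARCHAR(30)
);
-- The original relation keeps only Dept, as a foreign key.
CREATE TABLE Professor_Dept (
Professor VARCHAR(30),
Dept VARCHAR(30) REFERENCES Dept_Head(Dept),
PRIMARY KEY (Professor, Dept)
);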
In general, a multivalued dependency occurs when a relation R has attributes A, B, and C such that A
determines a set of values for B, A determines a set of values for C, and B and C are independent of
each other.
Definition: A relation is said to be in 4NF if it is in BCNF/3NF and contains no multivalued dependencies.
(Or)
Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of
dependencies that includes both FDs and MVDs. R is in 4NF if, for every nontrivial MVD X →→ Y that
holds over R, X is a superkey.
(A trivial multivalued dependency is an MVD X →→ Y that is satisfied whenever Y ⊆ X or X ∪ Y = R.)
Now consider another example table involving Course, Student_name, and Text_book, where Student_name
and Text_book are independent multivalued facts about Course; the employee table below is analogous.
The attributes Projname and Childname are multivalued facts about the attribute Ename. However, since a
project has no influence over a child's name, these multivalued facts are independent of each other; thus the
table contains multivalued dependencies (Ename →→ Projname and Ename →→ Childname).
Because projects and children are independent of each other, all the anomalies described above occur in this
table. This problem with MVDs is handled by Fourth Normal Form. To put the table into 4NF, two separate
tables are formed, as shown below:
Emp_Project (Ename, Projname)
Emp_Children (Ename, Childname)
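A sketch of this 4NF decomposition in SQL (column types are assumed):
CREATE TABLE Emp_Project (
Ename VARCHAR(30),
Projname VARCHAR(30),
PRIMARY KEY (Ename, Projname)
);
CREATE TABLE Emp_Children (
Ename VARCHAR(30),
Childname VARCHAR(30),
PRIMARY KEY (Ename, Childname)
);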
Definition: A relation is said to be in 5NF if and only if every join dependency in the relation is implied by the
candidate keys of the relation.
Or
Simply, we can say "a table is in fifth normal form (5NF) if it is in 4NF and it cannot have a lossless
decomposition into any number of smaller tables."
SNO →→ JNO
PNO →→ JNO
where SNO is the supplier number, PNO is the part number, and JNO is the project number.
The above table is not in 5NF. To make it 5NF, the table is decomposed into three tables:
Supplier_Project, Supplier_Part, and Project_Part, as sketched below.
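A sketch of the 5NF decomposition in SQL (column types are assumed):
-- Each table records one binary relationship; joining all three
-- reconstructs the original relation without spurious tuples.
CREATE TABLE Supplier_Project (SNO CHAR(3), JNO CHAR(3), PRIMARY KEY (SNO, JNO));
CREATE TABLE Supplier_Part (SNO CHAR(3), PNO CHAR(3), PRIMARY KEY (SNO, PNO));
CREATE TABLE Project_Part (JNO CHAR(3), PNO CHAR(3), PRIMARY KEY (JNO, PNO));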
Dependency preservation:
Dependency preservation is a property of decomposition: the FDs that hold on the relation R should be
preserved, directly or indirectly, even after decomposition.
That is, we should decompose the relation R in such a way that the FDs that hold on R can be derived
from, or hold on, the decomposed relations. For example, decomposing R(A, B, C) with A → B and
B → C into R1(A, B) and R2(B, C) preserves both dependencies, whereas decomposing it into
R1(A, B) and R2(A, C) loses B → C.
Candidate Keys
A candidate key is a minimal set of attributes whose values uniquely identify every tuple in a relation. A
relation may have more than one candidate key; one of them is chosen as the primary key, and the remaining
ones are alternate keys.
What is a Transaction…?
A transaction, in the context of a database, is a logical unit that is independently executed for data retrieval
or updates.
A logical unit of work that must be either entirely completed or aborted.
Any action that reads from and/or writes to a database may consist of:
o Simple SELECT statement to generate a list of table contents
o A series of related UPDATE statements to change the values of attributes in
various tables
o A series of INSERT statements to add rows to one or more tables
o A combination of SELECT, UPDATE, and INSERT statements
A successful transaction changes the database from one consistent state to another.
Most real-world database transactions are formed by two or more database requests.
A transaction has four properties: atomicity, consistency, isolation, and durability (the ACID properties).
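As a sketch of a transaction in SQL (the accounts table, its columns, and the amounts are hypothetical, chosen only to illustrate atomicity):
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If anything fails before COMMIT, issuing ROLLBACK undoes both updates,
-- so the database never shows money leaving one account without arriving
-- in the other.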
Single-User versus Multiuser Systems:-
o One criterion for classifying a database system is according to the number of users who can use the system
concurrently—that is, at the same time
o A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users
can use the system—and hence users can access the database—concurrently.
Single Users:
o A single-user database supports one user at a time; only one user can access the database at a given
point of time. These systems are optimized for a personal desktop experience, not for multiple
simultaneous users, so all resources are always available to that user.
o Example: Stand-alone personal computers, Microsoft Access, etc.
Multi Users:
o Most other DBMSs are multiuser.
o Example: an airline reservations system is used by hundreds of travel agents and reservation clerks
concurrently.
o Systems in banks, insurance agencies, stock exchanges, supermarkets, and the like are also operated on
by many users who submit transactions concurrently to the system
o Multiple users can access databases—and use computer systems—simultaneously because of the
concept of multiprogramming, which allows the computer to execute multiple programs or processes at
the same time.
o If only a single central processing unit (CPU) exists, it can actually execute at most one process at a time.
o However, multiprogramming operating systems execute some commands from one process, then
suspend that process and execute some commands from the next process, and so on. A process is
resumed at the point where it was suspended whenever it gets its turn to use the CPU again. Hence,
concurrent execution of processes is actually interleaved which shows two processes A and B executing
concurrently in an interleaved fashion.
Interleaving keeps the CPU busy when a process requires an input or output (I/O) operation, such as
reading a block from disk.
The CPU is switched to execute another process rather than remaining idle during I/O time.
Interleaving also prevents a long process from delaying other processes.
o A transaction is an atomic unit of work that should either be completed in its entirety or not done at all.
o For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and
commits or aborts
Therefore, the recovery manager of the DBMS needs to keep track of the following operations:
BEGIN_TRANSACTION; READ or WRITE; END_TRANSACTION; COMMIT_TRANSACTION; and
ROLLBACK (or ABORT).
o A transaction goes into an active state immediately after it starts execution, where it can execute its
READ and WRITE operations
o When the transaction ends, it moves to the partially committed state
o At this point, some recovery protocols need to ensure that a system failure will not result in an inability
to record the changes of the transaction permanently
o Once this check is successful, the transaction is said to have reached its commit point and enters the
committed state.
o When a transaction is committed, it has concluded its execution successfully and all its changes must be
recorded permanently in the database, even if a system failure occurs.
o However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted
during its active state.
o The transaction may then have to be rolled back to undo the effect of its WRITE operations on the
database.
o The terminated state corresponds to the transaction leaving the system.
o The transaction information that is maintained in system tables while the transaction has been running
is removed when the transaction terminates.
o Failed or aborted transactions may be restarted later, either automatically or after being resubmitted
by the user, as brand new transactions.
The System Log
o To be able to recover from failures that affect transactions, the system maintains a log to keep track of
all transaction operations that affect the values of database items, as well as other transaction
information that may be needed to permit recovery from failures.
o The log is a sequential, append-only file that is kept on disk, so it is not affected by any type of failure
except for disk or catastrophic failure.
o Typically, one (or more) main memory buffers hold the last part of the log file, so that log entries are
first added to the main memory buffer.
o When the log buffer is filled, or when certain other conditions occur, the log buffer is appended to the
end of the log file on disk. In addition, the log file from disk is periodically backed up to archival storage
(tape) to guard against catastrophic failures.
The following are the types of entries—called log records—that are written to the log file and the
corresponding action for each log record. In these entries, T refers to a unique transaction-id that is generated
automatically by the system for each transaction and that is used to identify each transaction:
1. [start_transaction, T]: Indicates that transaction T has started execution.
2. [write_item, T, X, old_value, new_value] : Indicates that transaction T has changed the value of
database item X from old_value to new_value.
3. [read_item, T, X] : Indicates that transaction T has read the value of database item X.
4. [commit, T]: Indicates that transaction T has completed successfully, and affirms that its effect can be
committed (recorded permanently) to the database.
5. [abort, T] : Indicates that transaction T has been aborted.
o It is possible to undo the effect of these WRITE operations of a transaction T by tracing backward
through the log and resetting all items changed by a WRITE operation of T to their old_values.
o Redo of an operation may also be necessary if a transaction has its updates recorded in the log but a
failure occurs before the system can be sure that all these new_values have been written to the actual
database on disk from the main memory buffers.
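As an illustration using transaction T1 from the schedule example later in this unit (N = 3, initial values X = 90 and Y = 90), the log would contain entries such as:
[start_transaction, T1]
[write_item, T1, X, 90, 87]
[write_item, T1, Y, 90, 93]
[commit, T1]
If a crash occurs before [commit, T1] reaches the disk, recovery undoes the writes by restoring the old_values 90; if it occurs after, recovery redoes them using the new_values 87 and 93.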
Commit Point of a Transaction
o A transaction T reaches its commit point when all its operations that access the database have been
executed successfully and the effect of all the transaction operations on the database have been recorded
in the log.
o Beyond the commit point, the transaction is said to be committed, and its effect must be permanently
recorded in the database.
o The transaction then writes a commit record [commit, T] into the log. If a system failure occurs, we can
search back in the log for all transactions T that have written a [start_transaction, T] record into the log
but have not written their [commit, T] record yet; these transactions may have to be rolled back to undo
their effect on the database during the recovery process.
o Transactions that have written their commit record in the log must also have recorded all their WRITE
operations in the log, so their effect on the database can be redone from the log records.
o It is common to keep one or more blocks of the log file in main memory buffers, called the log buffer,
until they are filled with log entries and then to write them back to disk only once, rather than writing to
disk every time a log entry is added.
o This saves the overhead of multiple disk writes of the same log file buffer.
o At the time of a system crash, only the log entries that have been written back to disk are considered in
the recovery process because the contents of main memory may be lost.
o Hence, before a transaction reaches its commit point, any portion of the log that has not been written to
the disk yet must now be written to the disk. This process is called force-writing the log buffer before
committing a transaction.
Desirable Properties of Transactions
Transactions should possess several properties, often called the ACID properties; they should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the ACID
properties:
Atomicity. A transaction is an atomic unit of processing; it should either be performed in its entirety or
not performed at all.
Consistency preservation. A transaction should be consistency preserving, meaning that if it is
completely executed from beginning to end without interference from other transactions, it should take
the database from one consistent state to another.
Isolation. A transaction should appear as though it is being executed in isolation from other
transactions, even though many transactions are executing concurrently. That is, the execution of a
transaction should not be interfered with by any other transactions executing concurrently.
Durability or permanency. The changes applied to the database by a committed transaction must persist in
the database. These changes must not be lost because of any failure.
Serializability
When more than one transaction is executed by the operating system in a multiprogramming environment,
there is the possibility that instructions of one transaction are interleaved with those of another transaction.
Schedule: A schedule (or history) S of transactions (T1, T2, ..., Tn) is an interleaving of the actions of
T1, T2, ..., Tn such that the actions of any single transaction appear in order.
Ex. If T1 has (a, b, c) operations and T2 has (p, q, r, s) operations then possible schedules are,
S1: (a, b, p, c, q, r, s)
S2: (p, a, q, b, c, r, s)
Serial Schedule: A schedule in which transactions are aligned in such a way that one transaction is executed
first; when the first transaction completes its cycle, the next transaction is executed, and so on. Transactions are
ordered one after the other. This is called a serial schedule because the transactions execute in a serial manner.
Figure: Examples of serial and nonserial schedules involving transactions T1 and T2. (a) Serial schedule A: T1
followed by T2. (b) Serial schedule B: T2 followed by T1. (c) Two nonserial schedules C and D with
interleaving of operations.
o Assume that the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2. After
executing transactions T1 and T2, we would expect the database values to be X = 89 and Y = 93,
according to the meaning of the transactions.
o Sure enough, executing either of the serial schedules A or B gives the correct results.
o Now consider the nonserial schedules C and D. Schedule C gives the results X = 92 and Y = 93, in which
the X value is erroneous, whereas schedule D gives the correct results.
o There is a simple algorithm for determining whether a particular schedule is conflict serializable or not.
o Most concurrency control methods do not actually test for serializability.
o Rather protocols, or rules, are developed that guarantee that any schedule that follows these rules will be
serializable.
o The algorithm looks at only the read_item and write_item operations in a schedule to construct a
precedence graph (or serialization graph), which is a directed graph G = (N, E) that consists of a set of
nodes N = {T1, T2, ..., Tn } and a set of directed edges E = {e1, e2, ..., em }.
o There is one node in the graph for each transaction Ti in the schedule. Each edge ei in the graph is of the
form (Tj → Tk ), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei and Tk is the ending node of ei .
o Such an edge from node Tj to node Tk is created by the algorithm if one of the operations in Tj appears
in the schedule before some conflicting operation in Tk.
Algorithm. Testing Conflict Serializability of a Schedule S
1) For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
2) For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti
→ Tj ) in the precedence graph.
3) For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge
(Ti → Tj ) in the precedence graph.
4) For each case in S where Tj executes a write_item(X) after Ti executes write_item(X), create an edge (Ti
→ Tj ) in the precedence graph.
5) The schedule S is serializable if and only if the precedence graph has no cycles.
o If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S
is serializable
o If there is no cycle in the precedence graph, we can create an equivalent serial schedule S` that is
equivalent to S, by ordering the transactions that participate in S as follows: Whenever an edge exists in
the precedence graph from Ti to Tj , Ti must appear before Tj in the equivalent serial schedule S`.
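As a short worked example (a hypothetical schedule, not one of the figures): let S be r1(X); w2(X); r2(Y); w1(Y). Rule 3 creates the edge T1 → T2, because T2 writes X after T1 reads it, and it also creates the edge T2 → T1, because T1 writes Y after T2 reads it. The precedence graph contains the cycle T1 → T2 → T1, so S is not conflict serializable.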
o A serial schedule represents inefficient processing because no interleaving of operations from different
transactions is permitted.
o This can lead to low CPU utilization while a transaction waits for disk I/O, or for another transaction to
terminate, thus slowing down processing considerably.
o A serializable schedule gives the benefits of concurrent execution. In practice, it is quite difficult to test
for the serializability of a schedule.
o The interleaving of operations from concurrent transactions—which are usually executed as processes
by the operating system—is typically determined by the operating system scheduler which allocates
resources to all processes.
o Factors such as system load, time of transaction submission, and priorities of processes contribute to the
ordering of operations in a schedule.
o Hence, it is difficult to determine how the operations of a schedule will be interleaved beforehand to
ensure serializability.
Fig: The read and write operations of three transactions T1, T2, and T3 (another example of serializability
testing): (d) precedence graph for schedule E; (e) precedence graph for schedule F; (f) precedence graph with
two equivalent serial schedules.
Two-Phase Locking Techniques for Concurrency Control
o A lock is a variable associated with a data item that describes the status of the item with respect to
possible operations that can be applied to it.
o Generally, there is one lock for each data item in the database.
o Locks are used as a means of synchronizing the access by concurrent transactions to the database items.
o Two problems are associated with the use of locks, deadlock and starvation; we will see how these
problems are handled in concurrency control protocols.
Types of Locks and System Lock Tables
1. Binary Locks.
2. Shared/Exclusive (or Read/Write) Locks.
3. Conversion of Locks.
Binary Locks
A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A distinct
lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item. If
the value of the lock on X is 0, the item can be accessed when requested, and the lock value is changed
to 1.
Two operations, lock_item and unlock_item, are used with binary locking.
A transaction requests access to an item X by first issuing a lock_item(X) operation.
If LOCK(X) = 1, the transaction is forced to wait.
If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item
X.
When the transaction is through using the item, it issues an unlock_item(X) operation, which sets
LOCK(X) back to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary lock
enforces mutual exclusion on the data item.
A description of the lock_item(X) and unlock_item(X) operations is shown below
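The original figure is not reproduced here; the following sketch reconstructs the two operations in textbook-style pseudocode from the rules just described:
lock_item(X):
B: if LOCK(X) = 0 (* item is unlocked *)
then LOCK(X) := 1 (* lock the item *)
else begin
wait (until LOCK(X) = 0 and the lock manager wakes up the transaction);
go to B
end;
unlock_item(X):
LOCK(X) := 0; (* unlock the item *)
if any transactions are waiting
then wake up one of the waiting transactions;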
The DBMS has a lock manager subsystem to keep track of and control access to locks.
If the simple binary locking scheme described here is used, every transaction must obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X)
operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X)
operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.
Between the lock_item(X) and unlock_item(X) operations in transaction T, T is said to hold the lock on
item X. At most one transaction can hold the lock on a particular item. Thus no two transactions can access the
same item concurrently.
Shared/Exclusive (or Read/Write) Locks: The preceding binary locking scheme is too restrictive for
database items because at most, one transaction can hold a lock on a given item. We should allow several
transactions to access the same item X if they all access X for reading purposes only.
This is because read operations on the same item by different transactions are not conflicting. However,
if a transaction is to write an item X, it must have exclusive access to X.
For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme called
shared/exclusive or read/write locks there are three locking operations: read_lock(X), write_lock(X), and
unlock(X).
A lock associated with an item X, LOCK(X), now has three possible states: read-locked, write-locked, or
unlocked.
A read-locked item is also called share-locked, because other transactions are allowed to read the item,
whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock
on the item.
Each record in the lock table has four fields: <data_item_name, LOCK, no_of_reads,
locking_transaction(s)>. Again, to save space, the system maintains lock records only for locked items in the
lock table.
The value (state) of LOCK is either read-locked or write-locked, suitably coded (if we assume no records
are kept in the lock table for unlocked items).
Fig: Locking and unlocking operations for two-mode (read/write or shared/exclusive) locks.
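When the shared/exclusive locking scheme is used, every transaction must obey the following rules (this is the list whose conditions 4 and 5 are relaxed below):
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) or write (exclusive) lock on item X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read (shared) or write (exclusive) lock on item X.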
o Sometimes it is desirable to relax conditions 4 and 5 in the preceding list in order to allow lock
conversion; that is, a transaction that already holds a lock on item X is allowed under certain conditions
to convert the lock from one locked state to another.
o For example, it is possible for a transaction T to issue a read_lock(X) and then later to upgrade the lock
by issuing a write_lock(X) operation.
o If T is the only transaction holding a read lock on X at the time it issues the write_lock(X) operation, the
lock can be upgraded; otherwise, the transaction must wait.
o It is also possible for a transaction T to issue a write_lock(X) and then later to downgrade the lock by
issuing a read_lock(X) operation.
o When upgrading and downgrading of locks is used, the lock table must include transaction identifiers in
the record structure for each lock (in the locking_transaction(s) field) to store the information on which
transactions hold locks on the item.
o The descriptions of the read_lock(X) and write_lock(X) operations must be changed appropriately to
allow for lock upgrading and downgrading.
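In SQL terms, many systems expose shared and exclusive row locks directly; the sketch below uses PostgreSQL syntax on a hypothetical accounts table:
BEGIN;
-- Shared (read) lock: other readers are allowed, writers must wait.
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;
-- Upgrade to an exclusive (write) lock before modifying the row.
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;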
Concurrency Control Based on Timestamp Ordering
o The use of locks, combined with the 2PL protocol, guarantees serializability of schedules.
o The serializable schedules produced by 2PL have their equivalent serial schedules based on the order in
which executing transactions lock the items they acquire.
o If a transaction needs an item that is already locked, it may be forced to wait until the item is released.
o Some transactions may be aborted and restarted because of the deadlock problem.
o A different approach that guarantees serializability involves using transaction timestamps to order
transaction execution for an equivalent serial schedule
Timestamps
o Typically, timestamp values are assigned in the order in which the transactions are submitted to the
system, so a timestamp can be thought of as the transaction start time.
o Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlocks cannot
occur.
o Timestamps can be generated in several ways.
o One possibility is to use a counter that is incremented each time its value is assigned to a transaction.
o The transaction timestamps are numbered 1, 2, 3, ... in this scheme.
o A computer counter has a finite maximum value, so the system must periodically reset the counter to
zero when no transactions are executing for some short period of time.
o Another way to implement timestamps is to use the current date/time value of the system clock and
ensure that no two timestamp values are generated during the same tick of the clock.
The Timestamp Ordering Algorithm
o The idea for this scheme is to order the transactions based on their timestamps.
o A schedule in which the transactions participate is then serializable, and the only equivalent serial
schedule permitted has the transactions in order of their timestamp values.
o This is called timestamp ordering (TO).
o The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the
order in which the item is accessed does not violate the timestamp order.
o To do this, the algorithm associates with each database item X two timestamp (TS) values:
1. read_TS(X). The read timestamp of item X is the largest timestamp among all the timestamps of
transactions that have successfully read item X—that is, read_TS(X) = TS(T), where T is the youngest
transaction that has read X successfully.
2. write_TS(X). The write timestamp of item X is the largest of all the timestamps of transactions that have
successfully written item X—that is, write_TS(X) = TS(T), where T is the youngest transaction that has written
X successfully.
Basic Timestamp Ordering (TO): Whenever some transaction T tries to issue a read_item(X) or a write_item(X)
operation, the basic TO algorithm compares the timestamp of T with read_TS(X) and write_TS(X) to ensure
that the timestamp order of transaction execution is not violated.
If this order is violated, then transaction T is aborted and resubmitted to the system as a new transaction
with a new timestamp.
If T is aborted and rolled back, any transaction T1 that may have used a value written by T must also be
rolled back
Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so
on. This effect is known as cascading rollback and is one of the problems associated with basic TO, since the
schedules produced are not guaranteed to be recoverable. An additional protocol must be enforced to ensure that
the schedules are recoverable, cascade less, or strict.
The concurrency control algorithm must check whether conflicting operations violate the timestamp ordering in
the following two cases:
1. Whenever a transaction T issues a write_item(X) operation: if read_TS(X) > TS(T) or
write_TS(X) > TS(T), then abort and roll back T and reject the operation; otherwise, execute the
write_item(X) operation of T and set write_TS(X) to TS(T).
2. Whenever a transaction T issues a read_item(X) operation: if write_TS(X) > TS(T), then abort and roll
back T and reject the operation; otherwise, execute the read_item(X) operation of T and set read_TS(X)
to the larger of TS(T) and the current read_TS(X).
o A variation of basic TO called strict TO ensures that the schedules are both strict (for easy
recoverability) and (conflict) serializable.
o In this variation, a transaction T that issues a read_item(X) or write_item(X) such that TS(T) >
write_TS(X) has its read or write operation delayed until the transaction T′ that wrote the value of X
(hence TS(T′) = write_TS(X)) has committed or aborted.
o To implement this algorithm, it is necessary to simulate the locking of an item X that has been written
by transaction T′ until T′ is either committed or aborted. This algorithm does not cause deadlock, since T
waits for T′ only if TS(T) > TS(T′).
Thomas’s Write Rule:
A modification of the basic TO algorithm, known as Thomas’s write rule, does not enforce conflict
serializability, but it rejects fewer write operations by modifying the checks for the write_item(X) operation as
follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue processing. This is because
some transaction with timestamp greater than TS(T)—and hence after T in the timestamp ordering—has
already written the value of X. Thus, we must ignore the write_item(X) operation of T because it is already
outdated and obsolete. Notice that any conflict arising from this situation would be detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then execute the write_item(X)
operation of T and set write_TS(X) to TS(T).
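As a hypothetical illustration of case (2): let TS(T1) = 1 and TS(T2) = 2, and suppose T2 has already written X, so write_TS(X) = 2. If T1 now issues write_item(X), then write_TS(X) = 2 > TS(T1) = 1, so T1's write is simply skipped rather than aborting T1: in the equivalent serial order (T1 before T2), T2's value would have overwritten T1's anyway. (This assumes read_TS(X) ≤ 1, so case (1) does not apply.)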