Record Notebook
E (INFORMATION TECHNOLOGY)
UNIT–IV
Active State
As the diagram above shows, a transaction moves from the active state to the
“partially committed” state once the read and write operations present in the
transaction have been executed.
A transaction contains a number of read and write operations. Once the whole
transaction has executed successfully, it enters the partially committed state,
where all of its read and write operations have been performed in main memory
(local memory) rather than on the actual database.
This state exists because a transaction can fail during execution. If the
changes were made directly to the actual database instead of local memory, a
failure could leave the database in an inconsistent state. Keeping the changes
in local memory lets us roll them back if a failure occurs during execution.
Committed State
If a transaction completes its execution successfully, all the changes made in
local memory during the partially committed state are permanently stored in the
database. The diagram above also shows that a transaction goes from the
partially committed state to the committed state when everything is successful.
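The difference between rolling back and committing can be tried out with a small sketch using Python's built-in sqlite3 module. The table name account, the amounts and the simulated failure are assumptions made up for this illustration; they are not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B; roll back if the transaction fails."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure during execution")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()      # changes become permanent in the database
        return "committed"
    except RuntimeError:
        conn.rollback()    # undo the partial changes
        return "aborted"

def balances(conn):
    """Return the current balances as a dict, e.g. {'A': 100, 'B': 50}."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(conn, 30, fail_midway=True) leaves both balances unchanged, while transfer(conn, 30) makes the change permanent.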
Aborted State
A schedule determines the exact order in which operations are performed on the
database. In this example, all the instructions of transaction T1 are executed
before the instructions of transaction T2. This is not always necessary,
however, and there are various types of schedules, which are discussed below.
T1      T2
----    ----
R(X)
W(X)
R(Y)
        R(Y)
        R(X)
        W(Y)
We have various types of schedules in DBMS. Let’s discuss them one by one.
Serial Schedule
T1      T2
----    ----
R(A)
R(B)
W(A)
commit
        R(B)
        R(A)
        W(B)
        commit
Strict Schedule
Cascadeless Schedule
Ta      Tb
-----   -----
R(X)
W(X)
        W(X)
commit
        R(X)
        W(X)
        commit
Recoverable Schedule
Types of Serializability
1. Conflict Serializability
2. View Serializability
Conflicting operations
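Two operations are said to be in conflict if they satisfy all of the following
three conditions:
1. Both operations belong to different transactions.
2. Both operations work on the same data item.
3. At least one of the operations is a write operation.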
T1      T2
-----   ------
R(A)
R(B)
        R(A)
        R(B)
        W(B)
W(A)
To convert this schedule into a serial schedule, we would have to swap the R(A)
operation of transaction T2 with the W(A) operation of transaction T1. However,
we cannot swap these two operations because they are conflicting operations,
so the given schedule is not Conflict Serializable.
T1      T2
-----   ------
R(A)
        R(A)
        R(B)
        W(B)
R(B)
W(A)
After swapping R(A) of T1 and R(A) of T2 we get:
T1      T2
-----   ------
        R(A)
R(A)
        R(B)
        W(B)
R(B)
W(A)
After swapping R(A) of T1 and R(B) of T2 we get:
T1      T2
-----   ------
        R(A)
        R(B)
R(A)
        W(B)
R(B)
W(A)
After swapping R(A) of T1 and W(B) of T2 we get:
T1      T2
-----   ------
        R(A)
        R(B)
        W(B)
R(A)
R(B)
W(A)
We finally got a serial schedule after swapping all the non-conflicting
operations so we can say that the given schedule is Conflict Serializable.
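The same result can be checked mechanically. The sketch below does not swap operations as we did above; it uses the precedence-graph test, a standard equivalent check: draw an edge Ti -> Tj for every pair of conflicting operations where Ti's operation comes first, and the schedule is conflict serializable exactly when the graph has no cycle. The function name and the tuple layout are assumptions chosen for this illustration, and the two schedules are written with the operations assigned to T1 and T2 as the surrounding text describes.

```python
def conflict_serializable(schedule):
    """schedule: list of (transaction, operation, item), e.g. ("T1", "R", "A")."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))   # t1 must come before t2

    # Repeatedly remove transactions that have no incoming edge.
    # If none can be removed while some remain, the graph has a cycle.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True

# First example: R(A) of T2 conflicts with the later W(A) of T1,
# and R(B) of T1 conflicts with the later W(B) of T2.
first = [("T1", "R", "A"), ("T1", "R", "B"), ("T2", "R", "A"),
         ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "W", "A")]

# Second example: every conflict points from T2 to T1.
second = [("T1", "R", "A"), ("T2", "R", "A"), ("T2", "R", "B"),
          ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "A")]
```

Here conflict_serializable(first) is False and conflict_serializable(second) is True, which agrees with the two examples worked through by hand.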
T1      T2
-----   ------
R(X)
W(X)
        R(X)
        W(X)
R(Y)
W(Y)
        R(Y)
        W(Y)
Serial schedule of the schedule given above:
As we know, in a serial schedule a transaction starts only when the currently
running transaction has finished. So the serial schedule of the schedule given
above would look like this:
T1      T2
-----   ------
R(X)
W(X)
R(Y)
W(Y)
        R(X)
        W(X)
        R(Y)
        W(Y)
If we can prove that the given schedule is view equivalent to its serial
schedule, then the given schedule is called view serializable.
We know that a serial schedule never leaves the database in an inconsistent
state because no transactions execute concurrently. However, a non-serial
schedule can leave the database in an inconsistent state because multiple
transactions run concurrently. By checking that a given non-serial schedule
is view serializable, we make sure that it is a consistent schedule.
Let’s learn how to check whether two schedules are view equivalent.
Two schedules S1 and S2 are said to be view equivalent if they satisfy all the
following conditions:
1. Initial Read: The initial read of each data item must match in both
schedules. For example, if transaction T1 reads a data item X before transaction
T2 in schedule S1, then in schedule S2 T1 should also read X before T2.
Read vs Initial Read: You may be confused by the term initial read. Here,
initial read means the first read operation on a data item. For example, a data
item X can be read multiple times in a schedule, but the first read operation on
X is called the initial read. This will become clearer in the example further
below.
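2. Final Write: The final write operation on each data item must match in both
schedules. For example, if a data item X is last written by transaction T1 in
schedule S1, then in S2 the last write operation on X should also be performed
by transaction T1.
3. Update Read: If in schedule S1 transaction T2 reads a data item that was
updated by T1, then in schedule S2 T2 should also read that value after T1’s
write operation on the same data item.
If a schedule is view equivalent to its serial schedule, then the given schedule
is said to be view serializable. Let’s take an example.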
Initial Read
In schedule S1, transaction T1 first reads the data item X. In S2 also,
transaction T1 first reads the data item X.
Let’s check for Y. In schedule S1, transaction T1 first reads the data item Y.
In S2 also, the first read operation on Y is performed by T1.
We checked both data items X and Y, and the initial read condition is
satisfied in S1 and S2.
Final Write
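In schedule S1, the final write operation on X is done by transaction T2. In
S2 also, transaction T2 performs the final write on X.
Let’s check for Y. In schedule S1, the final write operation on Y is done by
transaction T2. In schedule S2, the final write on Y is also done by T2.
We checked both data items X and Y, and the final write condition is satisfied
in S1 and S2.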
Update Read
In S1, transaction T2 reads the value of X written by T1. In S2, the same
transaction T2 reads X after it is written by T1.
In S1, transaction T2 reads the value of Y written by T1. In S2, the same
transaction T2 reads the value of Y after it is updated by T1.
The update read condition is also satisfied for both schedules.
Result: Since all three conditions that check whether two schedules are view
equivalent are satisfied in this example, S1 and S2 are view equivalent. Also,
since schedule S2 is the serial schedule of S1, we can say that schedule S1 is
a view serializable schedule.
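The three checks we just did by hand can be sketched in code. This is a simplified illustration, not a full textbook algorithm: it records which reads see the original value of an item, which reads see a value written by another transaction, and who writes each item last, then compares the two schedules. The function names and the tuple layout are assumptions for this sketch.

```python
def view_profile(schedule):
    """Return (initial reads, update reads, final writes) for a schedule.

    schedule: list of (transaction, operation, item) tuples.
    """
    last_writer = {}        # item -> transaction that wrote it most recently
    initial_reads = set()   # (transaction, item): read of the original value
    update_reads = set()    # (reader, item, writer): read of an updated value
    for txn, op, item in schedule:
        if op == "R":
            writer = last_writer.get(item)
            if writer is None:
                initial_reads.add((txn, item))
            elif writer != txn:
                update_reads.add((txn, item, writer))
        else:
            last_writer[item] = txn
    # After the loop, last_writer holds the final write of every item.
    return initial_reads, update_reads, last_writer

def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)

S1 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X"),
      ("T1", "R", "Y"), ("T1", "W", "Y"), ("T2", "R", "Y"), ("T2", "W", "Y")]
S2 = [("T1", "R", "X"), ("T1", "W", "X"), ("T1", "R", "Y"), ("T1", "W", "Y"),
      ("T2", "R", "X"), ("T2", "W", "X"), ("T2", "R", "Y"), ("T2", "W", "Y")]
```

With these definitions view_equivalent(S1, S2) is True, matching the result above. Running T2 entirely before T1 gives a different profile, so that serial order is not view equivalent to S1.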
Deadlock in DBMS
A deadlock is a condition in which two or more tasks are waiting for each other
to finish, but none of the tasks is willing to give up the resources that the
other tasks need. In this situation no task ever finishes, and each remains in
a waiting state forever.
Coffman conditions
Deadlock Handling
One way to handle a deadlock is simply to ignore it. Did that make you laugh?
You may be wondering how ignoring a deadlock can count as deadlock handling. In
fact, the Windows system on your PC uses this approach, which is one reason it
sometimes hangs and has to be rebooted to get it working. Not only Windows but
UNIX also uses this approach.
The question is why. Why do these systems ignore a deadlock instead of dealing
with it, and why is this called the Ostrich algorithm?
Let me answer the second question first. It is known as the Ostrich algorithm
because in this approach we ignore the deadlock and pretend that it will never
occur, just like the proverbial ostrich behavior “to stick one’s head in the
sand and pretend there is no problem.”
Now for why we ignore it: when deadlocks are believed to be very rare and the
cost of handling them is high, ignoring them is a better solution than handling
them. Take the operating system example: if the time required to handle a
deadlock is higher than the time required to reboot Windows, then rebooting is
the preferred choice, considering that deadlocks are very rare in Windows.
Deadlock detection
The resource scheduler keeps track of the resources allocated to and requested
by processes. Thus, if there is a deadlock, it is known to the resource
scheduler. This is how a deadlock is detected.
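One common way to turn that bookkeeping into a detection step is a wait-for graph: each process points to the processes holding the resources it has requested, and a cycle in the graph means a deadlock. The notes do not say how the scheduler does this internally, so the sketch below is only an assumption-based illustration; the function name and the dictionary layout are made up for it.

```python
def find_deadlock(waits_for):
    """waits_for: dict mapping a process to the set of processes it waits on.

    Returns a list of processes that form a cycle, or None if there is none.
    """
    visiting = []   # processes on the current search path
    done = set()    # processes already known not to lead to a cycle

    def visit(p):
        if p in visiting:
            return visiting[visiting.index(p):]   # the cycle itself
        if p in done:
            return None
        visiting.append(p)
        for q in sorted(waits_for.get(p, ())):
            cycle = visit(q)
            if cycle:
                return cycle
        visiting.pop()
        done.add(p)
        return None

    for p in sorted(waits_for):
        cycle = visit(p)
        if cycle:
            return cycle
    return None
```

For example, find_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}) returns ["P1", "P2", "P3"], while a chain with no cycle returns None.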
Deadlock prevention
We have learnt that a deadlock can occur only if all four Coffman conditions
hold, so preventing one or more of them prevents the deadlock.
Deadlock Avoidance
Deadlock can be avoided if resources are allocated in a way that never lets a
deadlock form. There are two algorithms for deadlock avoidance.
• Wait/Die
• Wound/Wait
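Here is the table representation of resource allocation for each algorithm. Both
of these algorithms take process age into consideration while determining the
best possible way of resource allocation for deadlock avoidance.
Situation                                  Wait/Die               Wound/Wait
-----------------------------------------  ---------------------  ---------------------
Older process requests a resource held     Older process waits    Younger process dies
by a younger process
Younger process requests a resource held   Younger process dies   Younger process waits
by an older process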
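The two rules can be written as a pair of small functions. This is a minimal sketch, assuming each process carries a timestamp where a smaller value means an older process; the function names and return strings are made up for the illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits, a younger requester dies (aborts)."""
    if requester_ts < holder_ts:      # requester is older
        return "requester waits"
    return "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds (aborts) the holder,
    a younger requester waits."""
    if requester_ts < holder_ts:      # requester is older
        return "holder dies"
    return "requester waits"
```

In both schemes it is always the younger process that gets aborted; they differ only in whether the older process waits or forces its way through.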
Conflict Example
You and your brother have a joint bank account from which you can both
withdraw money. Suppose you both go to different branches of the same bank at
the same time and each try to withdraw 5000 INR, while the joint account has a
balance of only 6000 INR. Without concurrency control, you could both receive
5000 INR at the same time, but once both transactions finish the account
balance would be -4000 INR, which is not possible and leaves the database in an
inconsistent state.
We need something that controls the transactions so that they can run
concurrently while the consistency of the data is maintained.
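The joint-account story can be replayed in a few lines. This is a sketch under stated assumptions: the Account class and both functions are hypothetical, the uncontrolled case is written out step by step so that the bad interleaving always happens, and a simple threading.Lock stands in for real concurrency control.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def uncontrolled(account, amount):
    """Both branches check the balance first, then both pay out."""
    approved = [account.balance >= amount for _ in range(2)]  # two checks
    for ok in approved:
        if ok:
            account.balance -= amount
    return approved

def withdraw(account, amount):
    """Check and update under one lock, so the two steps cannot interleave."""
    with account.lock:
        if account.balance >= amount:
            account.balance -= amount
            return True
        return False

joint = Account(6000)
uncontrolled(joint, 5000)          # joint.balance is now -4000

safe = Account(6000)
results = []
threads = [threading.Thread(target=lambda: results.append(withdraw(safe, 5000)))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one withdrawal succeeds and safe.balance ends at 1000.
```

With the lock in place, whichever request arrives second sees the updated balance and is refused.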
1. Shared Lock(S)
2. Exclusive Lock(X)
1. Shared Lock(S): A shared lock is placed when we are reading the data.
Multiple shared locks can be placed on the data, but while a shared lock is
in place no exclusive lock can be placed.
For example, when two transactions are reading Steve’s account balance, let
them read by placing shared locks. If another transaction wants to update
Steve’s account balance at the same time by placing an exclusive lock, do not
allow it until the reading is finished.
2. Exclusive Lock(X): An exclusive lock is placed when we want to read and write
the data. This lock allows both read and write operations. Once this lock is
placed on the data, no other lock (shared or exclusive) can be placed on the
data until the exclusive lock is released.
For example, when a transaction wants to update Steve’s account balance, let it
do so by placing an X lock on it. If a second transaction wants to read the
data (S lock), don’t allow it, and if another transaction wants to write the
data (X lock), don’t allow that either.
-------------------------
|     |   S    |   X    |
|-----------------------|
| S   | True   | False  |
|-----------------------|
| X   | False  | False  |
-------------------------
How to read this matrix:
There are two rows. The first row says that when an S lock is held, another S
lock can be acquired, so it is marked True, but no exclusive lock can be
acquired, so it is marked False.
In the second row, when an X lock is held, neither an S nor an X lock can be
acquired, so both are marked False.
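The matrix translates directly into a lookup table. In this minimal sketch the names COMPATIBLE and can_grant are assumptions made for the illustration; the first element of each key is the lock already held and the second is the lock being requested.

```python
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held):
    """held: list of lock modes other transactions already hold on the item."""
    return all(COMPATIBLE[(h, requested)] for h in held)
```

For example, can_grant("S", ["S", "S"]) is True, while can_grant("X", ["S"]) and can_grant("S", ["X"]) are both False. When nothing is held, any lock can be granted.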
Concurrency control is used to address such conflicts, which mostly occur in a
multi-user system. It helps you make sure that database transactions are
performed concurrently without violating the data integrity of the respective
databases.
Therefore, concurrency control is an essential element for the proper
functioning of a system in which two or more database transactions that require
access to the same data are executed simultaneously.
Here are some issues that you are likely to face when transactions run
concurrently without proper control:
• Lost Updates occur when multiple transactions select the same row and
update the row based on the value selected.
• Uncommitted dependency issues occur when a second transaction selects a
row that has been updated by another transaction but not yet committed
(dirty read).
• Non-Repeatable Read occurs when a transaction accesses the same row
several times and reads different data each time.
• Incorrect Summary issue occurs when one transaction computes a summary
over the values of all the instances of a repeated data item while a
second transaction updates a few instances of that data item. In that
situation, the resulting summary does not reflect a correct result.
Assume that two people go to electronic kiosks at the same time to buy a movie
ticket for the same movie and the same show time.
However, there is only one seat left for that show in that particular theatre.
Without concurrency control, it is possible that both moviegoers will end up
purchasing a ticket. A concurrency control method does not allow this to
happen. Both moviegoers can still access the information written in the movie
seating database, but concurrency control provides a ticket only to the buyer
who completes the transaction process first.
• Lock-Based Protocols
• Two-Phase Locking Protocol
• Timestamp-Based Protocols
• Validation-Based Protocols
Lock-based Protocols
A lock is a data variable associated with a data item. The lock signifies
which operations can be performed on the data item. Locks help synchronize
access to the database items by concurrent transactions.
Binary Locks: A binary lock on a data item can be in either the locked or the
unlocked state.
A shared lock is also called a read-only lock. With a shared lock, the data
item can be shared between transactions, because a transaction holding it never
has permission to update the data item.
For example, consider a case where two transactions are reading the account
balance of a person. The database will let them read by placing a shared lock.
However, if another transaction wants to update that account's balance, the
shared lock prevents it until the reading process is over.
With an exclusive lock, a data item can be read as well as written. The lock is
exclusive and can't be held by more than one transaction on the same data item
at a time. An X-lock is requested using the lock-x instruction. Transactions
may unlock the data item after finishing the 'write' operation.
4. Pre-claiming Locking
Starvation
Deadlock refers to a specific situation where two or more processes are waiting
for each other to release a resource or more than two processes are waiting for
the resource in a circular chain.
The two-phase locking (2PL) protocol divides the execution phase of a
transaction into three different parts.
• Growing Phase: In this phase, a transaction may obtain locks but may not
release any locks.
• Shrinking Phase: In this phase, a transaction may release locks but may
not obtain any new lock.
It is true that the 2PL protocol offers serializability. However, it does not ensure
that deadlocks do not happen.
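Whether one transaction respects the growing and shrinking phases can be checked with a short function. This is a minimal sketch; the function name and the list-of-pairs representation of lock actions are assumptions for the illustration.

```python
def follows_2pl(actions):
    """actions: list of ("lock" | "unlock", item) pairs for one transaction."""
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True      # the first release starts the shrinking phase
        elif kind == "lock" and shrinking:
            return False          # acquiring a lock after a release breaks 2PL
    return True
```

For example, locking A and B and then unlocking both follows the protocol, while lock A, unlock A, lock B does not.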
In the diagram above, you can see local and global deadlock detectors
searching for deadlocks and resolving them by returning transactions to their
initial states.
The strict two-phase locking (Strict-2PL) system is very similar to 2PL. The
only difference is that Strict-2PL does not release a lock right after using
it. It holds all its locks until the commit point and releases them all in one
go when the process is over.
Centralized 2PL
In the primary copy 2PL mechanism, many lock managers are distributed to
different sites. A particular lock manager is then responsible for managing the
locks for a set of data items. When the primary copy has been updated, the
change is propagated to the slaves.
Distributed 2PL
Timestamp-based Protocols
The older transaction is always given priority in this method. It uses the
system time to determine the timestamp of a transaction. This is the most
commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions at the
time they execute. Timestamp-based protocols manage conflicts as soon as an
operation is created.
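The notes do not spell out the rules here, so the following sketch follows the usual textbook form of basic timestamp ordering and should be read as an assumption-based illustration. Each data item remembers the largest timestamp that has read it and the largest that has written it, and an operation arriving "too late" forces its transaction to abort. The class and function names are made up for the sketch.

```python
class Item:
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item

def read(item, ts):
    if ts < item.write_ts:
        return "abort"      # a younger transaction already overwrote the value
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"      # a younger transaction already read or wrote it
    item.write_ts = ts
    return "ok"
```

For example, after a transaction with timestamp 5 has read an item, a write by a transaction with timestamp 3 is rejected, while a write with timestamp 7 is accepted.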
Example:
Advantages:
Disadvantages:
Summary
Example
Granting Dino permission to create articles.
Subject: Dino
Action: Create
Object: Article
The controls are discretionary in the sense that a subject with a certain access
permission is capable of passing that permission (perhaps indirectly) on to any
other subject.
Example
Granting Dino permission to create articles.
Subject: Dino
Action: Create
Object: Article
Dino can create articles now, and can give this permission to others.
Dino grants James permission to create articles.
Subject: James
Action: Create
Object: Article
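The Dino and James example can be modelled as a set of (subject, action, object) triples. This is a minimal sketch of the discretionary idea only; the DacStore class and its methods are hypothetical, and using granter=None for the first grant by an administrator is an assumption of the sketch.

```python
class DacStore:
    def __init__(self):
        self.grants = set()   # (subject, action, object) triples

    def grant(self, granter, subject, action, obj):
        # granter=None models the initial grant by an owner or administrator.
        # Otherwise the granter may only pass on a permission it holds itself.
        if granter is not None and (granter, action, obj) not in self.grants:
            raise PermissionError(f"{granter} cannot pass on {action} {obj}")
        self.grants.add((subject, action, obj))

    def allowed(self, subject, action, obj):
        return (subject, action, obj) in self.grants

store = DacStore()
store.grant(None, "Dino", "Create", "Article")      # Dino receives the permission
store.grant("Dino", "James", "Create", "Article")   # Dino passes it on to James
```

After these two calls both Dino and James are allowed to create articles, while a subject that was never granted the permission cannot hand it to anyone else.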
Example
Granting Dino article created permission.
Subject: Dino
Action: Create
Object: Article

Subject: Article
Action: Created
Object: Dino
RBAC differs from access control lists (ACLs), used in traditional discretionary
access-control systems, in that it assigns permissions to specific operations
with meaning in the organization, rather than to low level data objects. For
example, an access control list could be used to grant or deny write access to a
particular system file, but it would not dictate how that file could be changed. In
an RBAC-based system, an operation might be to ‘create a credit account’
transaction in a financial application or to ‘populate a blood sugar level test’
record in a medical application.
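A role-based check can be sketched with two small tables, one mapping roles to the operations they may perform and one mapping users to roles. The role names, user names and operation identifiers below are hypothetical; only the two operations themselves come from the text above.

```python
ROLE_PERMISSIONS = {
    "teller": {"create_credit_account"},
    "lab_technician": {"populate_blood_sugar_test"},
}

USER_ROLES = {
    "dino": {"teller"},
    "james": {"lab_technician"},
}

def can_perform(user, operation):
    """A user may perform an operation if any of the user's roles allows it."""
    return any(operation in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))
```

Permissions are never attached to a user directly: changing what a teller may do means editing one entry in ROLE_PERMISSIONS, and every user with that role is affected at once.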
Group vs Role
SQL Injection
SQL injection is a code injection technique that might destroy your database.
SQL injection is the placement of malicious code in SQL statements, via web
page input.
SQL injection usually occurs when you ask a user for input, like their
username/userid, and instead of a name/id, the user gives you an SQL statement
that you will unknowingly run on your database.
Example
txtUserId = getRequestString("UserId");
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
The rest of this chapter describes the potential dangers of using user input in
SQL statements.
Look at the example above again. The original purpose of the code was to
create an SQL statement to select a user, with a given user id.
If there is nothing to prevent a user from entering "wrong" input, the user can
enter some "smart" input like this:
UserId: 105 OR 1=1
Then, the SQL statement will look like this:
SELECT * FROM Users WHERE UserId = 105 OR 1=1;
The SQL above is valid and will return ALL rows from the "Users" table,
since OR 1=1 is always TRUE.
Does the example above look dangerous? What if the "Users" table contains
names and passwords?
SELECT UserId, Name, Password FROM Users WHERE UserId = 105 or 1=1;
A hacker might get access to all the user names and passwords in a database, by
simply inserting 105 OR 1=1 into the input field.
Username: John Doe
Password: myPass
Example
uName = getRequestString("username");
uPass = getRequestString("userpassword");
sql = 'SELECT * FROM Users WHERE Name ="' + uName + '" AND Pass ="'
+ uPass + '"'
Result
SELECT * FROM Users WHERE Name ="John Doe" AND Pass ="myPass"
A hacker might get access to user names and passwords in a database by simply
inserting " OR ""=" into the user name or password text box:
User Name: " or ""="
Password: " or ""="
The code at the server will create a valid SQL statement like this:
Result
SELECT * FROM Users WHERE Name ="" or ""="" AND Pass ="" or ""=""
The SQL above is valid and will return all rows from the "Users" table,
since OR ""="" is always TRUE.
The SQL statement below will return all rows from the "Users" table, then
delete the "Suppliers" table.
Example
SELECT * FROM Users; DROP TABLE Suppliers
Example
txtUserId = getRequestString("UserId");
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
Result
SELECT * FROM Users WHERE UserId = 105; DROP TABLE Suppliers;
Use SQL Parameters for Protection
To protect a web site from SQL injection, you can use SQL parameters.
SQL parameters are values that are added to an SQL query at execution time, in
a controlled manner.
The SQL engine checks each parameter to ensure that it is correct for its
column, and each is treated literally, not as part of the SQL to be executed.
Another Example
txtNam = getRequestString("CustomerName");
txtAdd = getRequestString("Address");
txtCit = getRequestString("City");
txtSQL = "INSERT INTO Customers (CustomerName,Address,City) Values(@0,@1,@2)";
db.Execute(txtSQL,txtNam,txtAdd,txtCit);
Examples
// SELECT statement with one parameter
txtUserId = getRequestString("UserId");
sql = "SELECT * FROM Customers WHERE CustomerId = @0";
command = new SqlCommand(sql);
command.Parameters.AddWithValue("@0",txtUserId);
command.ExecuteReader();

// INSERT statement with three parameters
txtNam = getRequestString("CustomerName");
txtAdd = getRequestString("Address");
txtCit = getRequestString("City");
txtSQL = "INSERT INTO Customers (CustomerName,Address,City) Values(@0,@1,@2)";
command = new SqlCommand(txtSQL);
command.Parameters.AddWithValue("@0",txtNam);
command.Parameters.AddWithValue("@1",txtAdd);
command.Parameters.AddWithValue("@2",txtCit);
command.ExecuteNonQuery();
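The effect of parameters can be tried out locally with Python's built-in sqlite3 module, which uses ? as its placeholder instead of @0. This is a minimal sketch: the in-memory Users table and its two rows are made up for the illustration, and the input is the same 105 OR 1=1 string discussed earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [(105, "John Doe"), (106, "Jane Roe")])

user_input = "105 OR 1=1"

# Vulnerable: the input is pasted into the SQL text and becomes part of it.
unsafe_rows = conn.execute(
    "SELECT * FROM Users WHERE UserId = " + user_input).fetchall()

# Safer: the input is bound as a parameter and treated as a plain value.
safe_rows = conn.execute(
    "SELECT * FROM Users WHERE UserId = ?", (user_input,)).fetchall()

# A legitimate id still works when passed as a parameter.
one_row = conn.execute(
    "SELECT * FROM Users WHERE UserId = ?", (105,)).fetchall()
```

The concatenated query returns every row in the table, while the parameterized query returns none, because no user id equals the literal text 105 OR 1=1.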