1 - Dbms-Module-5 FULL
▫ Copy that disk block into a buffer in main memory (if that disk
block is not already in some main memory buffer).
▫ Copy item X from the program variable named X into its correct
location in the buffer.
▫ Store the updated block from the buffer back to disk (either
immediately or at some later point in time).
Schedule S1

T1        T2
R(A)
W(A)
          R(A)
          W(A)
R(B)
W(B)
          R(B)
          W(B)

Precedence graph: T1 → T2 (on A and B).
No cycle in the precedence graph, so S1 is conflict serializable.
Check if S2 is conflict serializable
S2

T1        T2
R(A)
          W(A)
          R(B)
W(B)
          R(B)
R(A)

Precedence graph: T1 → T2 (on A) and T2 → T1 (on B).
Cycle in the precedence graph, so S2 is NOT conflict serializable.
Check if S3 is conflict serializable and find its
serializing order

S3

T1        T2        T3
R(A)
          W(B)
R(B)
          R(A)
W(A)
                    R(A)

Precedence graph: T2 → T1 (on A and B) and T1 → T3 (on A).
No cycle, so S3 is conflict serializable.
SERIALIZING ORDER = T2 T1 T3
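The precedence-graph test above can be sketched in code. This is my own illustration (the schedule encoding and function names are not from the slides): build an edge Ti → Tj for every pair of conflicting operations, then look for a topological order.

```python
# Sketch (not from the slides): build the precedence graph of a schedule and
# test conflict serializability. Schedule format is my own choice:
# a list of (transaction, op, item) with op in {'R', 'W'}.
from collections import defaultdict

def precedence_graph(schedule):
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if x == y and ti != tj and ('W' in (op_i, op_j)):
                edges[ti].add(tj)
    return edges

def serial_order(schedule):
    """Topological order of the precedence graph, or None if it has a cycle."""
    edges = precedence_graph(schedule)
    txns = {t for t, _, _ in schedule}
    indeg = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    order, ready = [], [t for t in txns if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None

# S3 from the slides: serializing order should come out as T2, T1, T3.
S3 = [('T1', 'R', 'A'), ('T2', 'W', 'B'), ('T1', 'R', 'B'),
      ('T2', 'R', 'A'), ('T1', 'W', 'A'), ('T3', 'R', 'A')]
```

A cyclic schedule (such as S2 above) makes `serial_order` return None.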
View Equivalence
• Two schedules S1 and S2 are said to be view equivalent if,
for each data item,
▫ If the initial read in S1 is done by Ti, then the initial read in
S2 should also be done by Ti
▫ If the final write in S1 is done by Ti, then in S2 also Ti should
perform the final write
▫ If Ti reads the item written by Tj in S1, then in S2 also Ti should
read the item written by Tj
View Serializable Schedule
• A non-serial schedule is view serializable if it is
view equivalent to some serial schedule
Check if S1 is view serializable

S1                        S2: T1 → T2 (serial)

T1        T2              T1        T2
R(A)                      R(A)
W(A)                      W(A)
          R(A)            R(B)
          W(A)            W(B)
R(B)                                R(A)
W(B)                                W(A)
          R(B)                      R(B)
          W(B)                      W(B)

Initial read of each item in S1 is by T1, and in S2 also by T1.
Final write in S1 is by T2, and in S2 also by T2.
The writer-reader (reads-from) relationship for each data item is the same in both S1 and S2.
So S1 is view equivalent to the serial schedule S2, hence view serializable.
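The three view-equivalence conditions can be checked mechanically. A sketch (helper names and schedule encoding are my own): the initial-read and reads-from conditions collapse into one set of (reader, item, writer) triples, where writer = None marks an initial read.

```python
# Sketch: check view equivalence of two schedules given as lists of
# (transaction, op, item) with op in {'R', 'W'}.
def reads_from(schedule):
    """Set of (reader, item, writer) triples; writer None = initial read.
    Covers view-equivalence conditions 1 and 3."""
    last_writer, rf = {}, set()
    for txn, op, item in schedule:
        if op == 'R':
            rf.add((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return rf

def final_writes(schedule):
    """item -> transaction performing the final write (condition 2)."""
    fw = {}
    for txn, op, item in schedule:
        if op == 'W':
            fw[item] = txn
    return fw

def view_equivalent(s1, s2):
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)

# S1 and the serial schedule T1;T2 from the slides:
S1 = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A'),
      ('T1', 'R', 'B'), ('T1', 'W', 'B'), ('T2', 'R', 'B'), ('T2', 'W', 'B')]
S2 = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T1', 'R', 'B'), ('T1', 'W', 'B'),
      ('T2', 'R', 'A'), ('T2', 'W', 'A'), ('T2', 'R', 'B'), ('T2', 'W', 'B')]
```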
Important Points
• Every Conflict serializable schedule is View Serializable but
vice versa is not true
• Every Conflict Serializable schedule is Serializable but vice
versa is not true
• Every View Serializable schedule is serializable and vice
versa is also true
Shortcut to check View Serializable or not
Step 1: All conflict Serializable schedule are View Serializable
Step 2: For those schedules that are not conflict serializable:
If there does not exist any blind write, then the
schedule is surely not view serializable.
If there exists a blind write, then the schedule may or may not be view
serializable; then follow the normal procedure.
Characterizing Schedule Based on
Recoverability
• Once a transaction is committed, it cannot be aborted or rolled back
• If a transaction has failed, it cannot commit
1. Irrecoverable and Recoverable Schedule
2. Cascadeless Schedule
3. Strict Schedule
Irrecoverable Schedule
• If a transaction Tj reads a data item written by transaction
Ti, and the commit of Tj happens before the commit of Ti,
then such schedules are irrecoverable

S (initially A = 10)

T1                      T2
R(A)
A = A - 20  // A = -10
W(A)
                        R(A)     // A = -10 (dirty read)
                        Commit;  // A = -10 is permanent
R(B)
W(B)
<failure>

Here T2 performs a dirty read, and the schedule is irrecoverable:
if T1 fails here, it should roll A back to 10, but A's value
is already committed by T2. It cannot be rolled back; A = -10 is an invalid value.
Recoverable Schedules
• If a transaction Tj reads a data item written by transaction
Ti, then the commit of Tj should happen only
after the commit of Ti; such schedules are recoverable
• That is, the writer transaction should commit first, then
the reader transaction commits (Recoverable Schedules)

S

T1                T2
R(A)
W(A)  // A = 10
                  R(A)
R(B)
W(B)
Commit;
                  Commit;

T1 has the first write, so it must commit first; here it does.
So T2 is recoverable in case there is a failure in T1.
S

T1          T2
W(A)
            R(A)
            Commit;
Commit;

Irrecoverable
S

T1          T2
R(A)
W(A)
            W(A)  // blind write
Commit;
            Commit;

Recoverable (T2 does not read from T1)
S

T1        T2        T3
R(A)
W(A)
          R(A)
          W(A)
                    R(A)
                    W(A)
C1;
          C2;
                    C3;

Recoverable. The problem with the above
schedule is cascading rollbacks/aborts.
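The recoverability rule (writer commits before its readers) can be tested directly. A sketch under my own event encoding, which adds a commit event ('C') to the (transaction, op, item) format used earlier:

```python
# Sketch: a schedule is recoverable if every transaction that read another
# transaction's write commits only AFTER that writer commits.
# Events: ('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T1', 'C', None).
def is_recoverable(schedule):
    last_writer = {}      # item -> transaction with the latest write on it
    reads_from = set()    # (reader, writer) pairs
    committed = []        # commit order
    for txn, op, item in schedule:
        if op == 'W':
            last_writer[item] = txn
        elif op == 'R':
            w = last_writer.get(item)
            if w is not None and w != txn:
                reads_from.add((txn, w))
        elif op == 'C':
            committed.append(txn)
    pos = {t: i for i, t in enumerate(committed)}
    # Every writer must appear before its readers in the commit order.
    return all(t in pos and w in pos and pos[w] < pos[t]
               for t, w in reads_from)
```

Applied to the two small examples above: the schedule where T2 reads T1's write and commits first is rejected, while the one where T1 commits first is accepted.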
Cascading Rollbacks
• Because of a single failure in a particular transaction, if all
dependent transactions need to be rolled back, it is called
cascading rollback
• Such schedules are recoverable, but can be time consuming
since numerous transactions can be rolled back
• Complete waste of work, time and resources
• To avoid cascading rollbacks, we need Cascadeless
schedules
Cascadeless Schedules
• Schedules which are free from cascading rollbacks
• If every transaction in the schedule reads only items that
were written by committed transactions, then such a
schedule is called a cascadeless schedule
S

T1        T2        T3
R(A)
W(A)
C1;
          R(A)
          W(A)
          C2;
                    R(A)
                    W(A)
                    C3;

Recoverable and cascadeless. Every cascadeless
schedule is recoverable, but vice versa is not true.
S

T1        T2
R(A)
          R(B)
          W(B)
W(A)
R(B)    // reads B written by the uncommitted T2
          C;
C;

Recoverable, NOT cascadeless: cascading aborts/rollbacks are possible.
Strict Schedule
• More restrictive type of schedule
• In a strict schedule, a transaction is neither allowed to read nor write
a data item until the transaction that last wrote it has
committed or aborted

S

T1        T2
W(A)
          W(A)
C;

Recoverable, Cascadeless, Not Strict
S

T1        T2
W(A)
C;
          R(A)/W(A)
          C;

Recoverable, Cascadeless, Strict
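The cascadeless and strict properties can be classified with one pass over the schedule. A sketch (my own encoding, matching the recoverability checker above): a schedule fails *cascadeless* on any dirty read, and fails *strict* on any read or write of an item whose last writer has not yet committed.

```python
# Sketch: classify a schedule as cascadeless and/or strict.
# Events: ('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T1', 'C', None).
def classify(schedule):
    active_writer = {}     # item -> transaction with an uncommitted write on it
    cascadeless = strict = True
    for txn, op, item in schedule:
        if op == 'C':
            # txn commits: its writes are no longer dirty
            active_writer = {x: t for x, t in active_writer.items() if t != txn}
            continue
        w = active_writer.get(item)
        if w is not None and w != txn:
            strict = False              # touching a dirty item at all
            if op == 'R':
                cascadeless = False     # dirty read
        if op == 'W':
            active_writer[item] = txn
    return {'cascadeless': cascadeless, 'strict': strict}
```

On the two examples above: W(A) by T2 over T1's uncommitted W(A) is cascadeless but not strict; waiting for T1's commit makes the schedule strict as well.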
Unrepeatable Read Problem
• It happens when one transaction reads the same data item twice
and another transaction updates that item between the
two reads.

S

T1            T2
R(A)  //10
              R(A)        //10
              A = A - 20  // A = -10
              W(A)
R(A)  //-10
S
T1 T2
Select * from Employee
where Id between 1 and 3
// 2 rows
Case 1
• Multiple transactions can hold a shared lock on the same
data item at the same time.

T1        T2        ...        Tn
S(A)
R(A)
          S(A)
          R(A)
Case 2
• If any transaction wants to apply an exclusive lock on a data
item on which a shared lock is already held by some
other transaction, it is not possible

T1        T2        Tn
S(A)
R(A)
          S(A)
          R(A)
                    X(A) // Not allowed
                    W(A)
Case 2... cont..
• But Tn can acquire this lock after the shared locks are unlocked

T1        T2        Tn
S(A)
R(A)
          S(A)
          R(A)
U(A)
          U(A)
                    X(A) // allowed
                    W(A)
Multiple shared locks on the same data item are allowed,
but all other combinations are not allowed.
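The two cases reduce to the standard shared/exclusive lock compatibility matrix, which can be written out directly (a small sketch, names my own):

```python
# The S/X lock compatibility matrix from the slides:
# only S-S is compatible; every combination involving X is not.
COMPATIBLE = {
    ('S', 'S'): True,
    ('S', 'X'): False,
    ('X', 'S'): False,
    ('X', 'X'): False,
}

def can_grant(requested, held_modes):
    """Can `requested` ('S' or 'X') be granted, given the lock modes
    already held by OTHER transactions on the same item?"""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

Case 1 is `can_grant('S', ['S', 'S'])` (granted); Case 2 is `can_grant('X', ['S'])` (refused until the shared locks are released).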
Qn 1. Whether the following is lock compatible or not

T1        T2
S(A)
R(A)
          S(B)
          R(B)
U(A)
          X(A)
          R(A)
X(B)   // B is not unlocked by T2, so this exclusive lock is not possible
R(B)

So NOT lock compatible.
Qn 2. Whether the following is lock compatible or not

T1        T2
S(A)
R(A)
U(A)
          X(A)
S(B)
R(B)
          W(A)
          S(B)
          R(B)
          U(A)
U(B)
          U(B)

So Lock Compatible. This is called the Simple Locking Protocol.
Qn 3. Given a non-serial schedule

T1        T2
S(A)
R(A)
U(A)
          X(A)
          W(A)
          U(A)
          X(B)
          W(B)
          U(B)
S(B)
R(B)
U(B)

Lock compatible, but it is not serializable:
the precedence graph has a cycle (T1 → T2 on A, T2 → T1 on B).
Problems with Simple Locking Protocol
I. Sometimes does not guarantee Serializability
II. May lead to deadlock
Irrecoverable schedules are possible

T1            T2
X(A) //LP
R(A)
W(A)
U(A)
              S(A) //LP
              R(A)
              C;
C;

Lock compatible, 2PL, but irrecoverable: T2 reads A written by the
uncommitted T1 and commits first.
Cascading rollbacks are possible
T1        T2        T3
X(A) //LP
R(A)
W(A)
U(A)
          X(A) //LP
          R(A)
          W(A)
          U(A)
C;
                    S(A) //LP
                    R(A)
          C;
                    C;

Lock compatible, 2PL, Recoverable, NOT cascadeless.
Deadlocks are possible
T1                      T2
X(A)                    X(B)
R(A)                    R(B)
W(A)                    W(B)
wants to update B       wants to update A

Here neither transaction can continue, and neither will release
its locks: under 2PL, once a transaction releases any lock, it is
no longer allowed to lock any data item.
This leads to deadlock.
• These drawbacks can be removed by using the following
variations of basic 2PL
• We have to solve Recoverability, Cascading Rollbacks and
Deadlock
Variations of 2PL
1. Strict 2PL
2. Rigorous 2PL
3. Conservative 2PL
Strict 2PL
• Follow basic 2PL + all exclusive locks are released only
after the commit operation
• Strict Schedule
▫ A transaction is neither allowed to read nor write a data item
until the transaction that has written it is committed
T1 T2
W(A)
C;
R(A)/W(A) //Strict Schedule
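A minimal lock-manager sketch shows the strict-2PL release rule. This is my own single-threaded model (a real lock manager would block and queue waiters); for simplicity it holds every lock to commit, which satisfies strict 2PL (and in fact rigorous 2PL, described next):

```python
# Sketch: a toy lock manager where locks are released only at commit.
class StrictLockManager:
    def __init__(self):
        self.locks = {}                     # item -> (mode, {owner txns})

    def acquire(self, txn, item, mode):
        """mode is 'S' or 'X'. Returns True if granted, False if the
        caller would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, owners = held
        if mode == 'S' and held_mode == 'S':
            owners.add(txn)                 # shared locks coexist
            return True
        return owners == {txn}              # re-entrant for the same txn

    def commit(self, txn):
        # Strict 2PL: txn's locks are released only now.
        for item in list(self.locks):
            mode, owners = self.locks[item]
            owners.discard(txn)
            if not owners:
                del self.locks[item]
```

Usage mirrors the schedule above: T2's request on A is refused until T1 commits.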
Benefits and Drawbacks of Strict 2PL
• Benefits
▫ Generates conflict serializable schedules (variation of basic
2PL)
▫ Produces strict schedules (hence recoverable and
cascadeless)
• Drawback
▫ Deadlock is possible
Example

T1        T2
X(A)
R(A)
S(B)
R(B)
          S(B)
          R(B)
W(A)
U(B)
C;
U(A)      // exclusive lock on A unlocked after commit (unlock not mandatory)
          X(A)
          R(A)
          W(A)
          C;

Lock compatible, 2PL, Strict 2PL.
Rigorous 2PL
• Stronger than Strict 2PL
• Follow basic 2PL + all locks (both exclusive and shared
locks) are released only after the commit operation
• Every Rigorous 2PL schedule is Strict 2PL, but vice versa is not
true
Benefits and Drawbacks of Rigorous 2PL
(Same as Strict 2PL)
• Benefits
▫ Generates conflict serializable schedules (variation of basic
2PL)
▫ Produces strict schedules (hence recoverable and
cascadeless)
• Drawback
▫ Deadlock is possible
Example

T1        T2
S(A)
R(A)
S(B)
R(B)
X(C)
R(C)
W(C)
C
U(A);
U(B);
U(C)
          S(C)
          R(C)
          C;

Rigorous 2PL & Strict 2PL.
Conservative 2PL (C2PL)
• No DEADLOCK
• Follow basic 2PL + a transaction must obtain all the locks
it needs before it begins
• Release all locks after commit
• Recoverable, Serializable, Cascadeless, Deadlock free
• Not practical to implement
Steps for Conservative 2PL (C2PL)
Step 1: Acquire all locks at the beginning (before
transaction execution begins)
Step 2: Perform operations (read/write)
Step 3: On completion, release all locks
Drawbacks of Conservative 2PL
• Poor resource utilization
• Concurrency is limited
• Each transaction needs to declare all the data items that
it needs to read/write at the beginning, which is not always
possible
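The acquire-all-or-nothing step that makes C2PL deadlock free can be sketched as follows (my own simplification: exclusive locks only, and a failed attempt acquires nothing so the transaction can simply retry later, never holding a partial lock set):

```python
# Sketch of conservative 2PL lock acquisition: a transaction atomically
# acquires ALL the locks it declared up front, or none of them.
def try_acquire_all(lock_table, txn, items):
    """lock_table: item -> owner transaction (exclusive locks only)."""
    if any(lock_table.get(x) not in (None, txn) for x in items):
        return False                 # some item is held: acquire nothing
    for x in items:
        lock_table[x] = txn
    return True

def release_all(lock_table, txn):
    """Step 3: on completion, release all of txn's locks."""
    for x in [x for x, t in lock_table.items() if t == txn]:
        del lock_table[x]
```

Because no transaction ever waits while holding a partial set of locks, the circular wait needed for deadlock cannot arise.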
Log-based recovery
• A Log is the most widely used structure for recording database
modifications.
• Update log record: It describes a single database write. It has
following fields-
▫ Transaction identifier is the unique identifier of the transaction
that performed the write operation.
▫ Data item identifier is the unique identifier of the data item written.
▫ Old value is the value of the data item prior to the write.
▫ New value is the value of the data item that it will have after the
write.
• Various types of log records are
represented as:
• <Ti start>: Transaction Ti has started.
• <Ti, X, V1, V2>: Transaction Ti has performed a write
on data item X; X had value V1 before the write, and will
have value V2 after the write.
• <Ti commit> : Transaction Ti has committed.
• <Ti abort> : Transaction Ti has aborted.
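As a concrete illustration (record layout and values are my own, reusing the A = 10 → -10 example from earlier), a log for one committed transaction might look like:

```python
# Sketch: the log record types above, encoded as tuples.
log = [
    ('start',  'T1'),
    ('update', 'T1', 'A', 10, -10),   # <T1, A, V1=10, V2=-10>
    ('commit', 'T1'),
]

def old_value(update_record):
    """V1 field of an update record -- used by undo."""
    return update_record[3]

def new_value(update_record):
    """V2 field of an update record -- used by redo."""
    return update_record[4]
```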
• Two techniques that use the log:
▫ Deferred database modification
▫ Immediate database modification
Deferred Database Modification
• The deferred database modification scheme records all
modifications to the log, but defers all the writes to after partial
commit.
• Assume that transactions execute serially
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in a log record <Ti, X, V> being
written, where V is the new value for X
▫ Note: old value is not needed for this scheme
• The write is not performed on X at this time, but is
deferred.
• When Ti partially commits, <Ti commit> is written to the
log
• Finally, the log records are read and used to actually execute the
previously deferred writes.
• During recovery after a crash, a transaction needs to be
redone if and only if both <Ti start> and<Ti commit> are
there in the log.
• Redoing a transaction Ti (redo Ti) sets the value of all data
items updated by the transaction to the new values.
• Crashes can occur while
▫ the transaction is executing the original updates, or
▫ while recovery action is being taken
Example: Transactions T0 and T1 (T0 executes before T1)
• If the log on stable storage at the time of the crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed, since <T0 commit> is
present
(c) redo(T0) must be performed, followed by redo(T1),
since <T0 commit> and <T1 commit> are present
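Deferred-modification recovery is redo-only, and fits in a few lines. A sketch (record encoding and the T0/T1 values are my own illustration, not from the slides): a transaction is redone if and only if both its start and commit records are in the log, and update records carry only the new value.

```python
# Sketch: deferred-modification recovery (redo only).
# Records: ('start', T), ('update', T, item, new_value), ('commit', T).
def recover_deferred(log, db):
    committed = {r[1] for r in log if r[0] == 'commit'}
    for r in log:
        if r[0] == 'update' and r[1] in committed:
            db[r[2]] = r[3]          # redo committed writes in log order
    return db                        # uncommitted transactions: do nothing
```

With T0 committed and T1 still active at the crash (case (b)), T0's writes are redone and T1's are simply ignored, since no deferred write ever reached the database.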
Immediate Database Modification
• The immediate database modification technique allows database
modifications to be output to the database while the transaction is still in
the active state.
• Update log record must be written before database item is written
▫ We assume that the log record is output directly to stable storage
▫ Can be extended to postpone log record output, so long as, prior to execution
of an output(B) operation for a data block B, all log records corresponding to
items in B are flushed to stable storage
• Output of updated blocks can take place at any time before or after
transaction commit
• Order in which blocks are output can be different from the order in which
they are written.
Immediate Database Modification Example
Checkpointing Protocol
• All transactions committed in the log file before the checkpoint
are already permanently saved on disk, so nothing needs to be done for them
• All transactions committed after the checkpoint should be
redone (redo list)
• All uncommitted transactions (before and after the checkpoint)
should be undone (undo list)
• Recovery procedure has two operations instead of one:
▫ undo(Ti) restores the value of all data items updated by Ti
to their old values, going backwards from the last log record for Ti
▫ redo(Ti) sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
• Both operations must be idempotent
▫ That is, even if the operation is executed multiple times, the effect is
the same as if it were executed once
Needed since operations may get re-executed during recovery
• When recovering after failure:
▫ Transaction Ti needs to be undone if the log contains the record <Ti
start>, but does not contain the record <Ti commit>.
▫ Transaction Ti needs to be redone if the log contains both the record
<Ti start> and the record <Ti commit>.
• Undo operations are performed first, then redo operations.
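The undo-then-redo rule for immediate modification can be sketched directly from the record format above (encoding is my own; update records now carry both the old and the new value):

```python
# Sketch: immediate-modification recovery.
# Records: ('start', T), ('update', T, item, old, new), ('commit', T).
def recover_immediate(log, db):
    started = {r[1] for r in log if r[0] == 'start'}
    committed = {r[1] for r in log if r[0] == 'commit'}
    # Undo first: backwards over the log, restoring OLD values for
    # transactions with <start> but no <commit>.
    for r in reversed(log):
        if r[0] == 'update' and r[1] in started - committed:
            db[r[2]] = r[3]
    # Then redo: forwards over the log, applying NEW values for
    # transactions with both <start> and <commit>.
    for r in log:
        if r[0] == 'update' and r[1] in committed:
            db[r[2]] = r[4]
    return db
```

Both passes are idempotent: running `recover_immediate` twice on the same log leaves the database unchanged, as the recovery procedure requires.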
Immediate DB Modification Recovery Example
Recovery With Concurrent Transactions
• We modify the log-based recovery schemes to allow multiple transactions
to execute concurrently.
▫ All transactions share a single disk buffer and a single log
▫ A buffer block can have data items updated by one or more transactions
• We assume concurrency control using strict two-phase locking;
▫ i.e. the updates of uncommitted transactions should not be visible to other
transactions
Otherwise how to perform undo if T1 updates A, then T2 updates A and
commits, and finally T1 has to abort?
• Logging is done as described earlier.
▫ Log records of different transactions may be interspersed in the log.
• The checkpointing technique and actions taken on recovery have to be
changed
▫ since several transactions may be active when a checkpoint is
performed.
NoSQL
• NoSQL database stands for “Not Only SQL” or “Not SQL.”
• It is a non-relational Data Management System, that does
not require a fixed schema.
• It avoids joins, and is easy to scale. The major purpose of
using a NoSQL database is for distributed data stores
with humongous data storage needs.
• NoSQL is used for Big data and real-time web apps.
• For example, companies like Twitter, Facebook and Google
collect terabytes of user data every single day
• The concept of NoSQL databases became popular with Internet
giants like Google, Facebook, Amazon, etc. who deal with huge
volumes of data.
• The system response time becomes slow when you use RDBMS for
massive volumes of data.
• To resolve this problem, we could “scale up” our systems by
upgrading our existing hardware.
• This process is expensive.
• The alternative for this issue is to distribute database load on
multiple hosts whenever the load increases. This method is known
as “scaling out.”
• NoSQL database is non-relational, so it scales out better than
relational databases as they are designed with web applications in
mind.
Brief History of NoSQL Databases
• 1998 - Carlo Strozzi used the term NoSQL for his lightweight,
open-source relational database
• 2000- Graph database Neo4j is launched
• 2004- Google BigTable is launched
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008 - Facebook open-sources the Cassandra project
• 2009- The term NoSQL was reintroduced
Features of NoSQL
• Non-relational
▫ NoSQL databases never follow the relational model
▫ Never provide tables with flat fixed-column records
▫ Work with self-contained aggregates or BLOBs
▫ Doesn’t require object-relational mapping and data normalization
▫ No complex features like query languages, query planners, referential
integrity, or joins
• Schema-free
▫ NoSQL databases are either schema-free or have relaxed schemas
▫ Do not require any sort of definition of the schema of the data
▫ Offer heterogeneous structures of data in the same domain
Types of NoSQL
• NoSQL Databases are mainly categorized into four types:
▫ Key-value Pair Based
▫ Column-oriented
▫ Graphs based
▫ Document-oriented
Key Value Database
• A key-value database (sometimes called a key-value store)
uses a simple key-value method to store data.
• These databases contain a simple string (the key) that is
always unique and an arbitrarily large data field (the value).
• They are easy to design and implement.
• As the name suggests, this type of NoSQL database implements a
hash table to store unique keys along with the pointers to the
corresponding data values.
• The values can be of scalar data types such as integers or complex
structures such as JSON, lists, BLOB, and so on.
• A value can be stored as an integer, a string, JSON, or an array—
with a key used to reference that value.
• It typically offers excellent performance and can be optimized to
fit an organization’s needs.
• Key-value stores have no query language but they do provide a
way to add and remove key-value pairs.
• Values cannot be queried or searched upon. Only the key can be
queried
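The model described above (a hash table from unique keys to opaque values, with no query over the values) can be sketched in a few lines; the class and keys below are my own illustration, not any particular product's API:

```python
# Sketch of the key-value model: put/get/delete on unique keys.
# Values are opaque -- they can be searched only by key, never by content.
class KeyValueStore:
    def __init__(self):
        self._data = {}                   # the underlying hash table

    def put(self, key, value):
        self._data[key] = value           # value: int, string, JSON-like dict, ...

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

# Typical uses from the slides: shopping-cart contents, session data.
store = KeyValueStore()
store.put('cart:42', {'items': ['pen', 'book'], 'total': 12.5})
store.put('session:alice', 'token-abc')
```

Real key-value stores such as Redis or Riak follow the same contract but add persistence, expiry, and distribution.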
When to use a key value database
• When your application needs to handle lots of small,
continuous reads and writes that may be volatile.
Key-value databases offer fast in-memory access.
• When storing basic information, such as customer details;
storing webpages with the URL as the key and the
webpage as the value; storing shopping-cart contents,
product categories, e-commerce product details
• For applications that don’t require frequent updates or
need to support complex queries
Use Cases for key value database
• Session management on a large scale.
• Using cache to accelerate application responses.
• Storing personal data on specific users.
• Product recommendations, storing personalized lists of
items for individual customers.
• Managing each player’s session in massive multiplayer
online games.
• Redis, Dynamo, Riak are some NoSQL examples of key-
value store DataBases.
Column Oriented
• While a relational database stores data in rows and reads data row
by row, a column store is organized as a set of columns.
• When you want to run analytics on a small number of columns,
you can read those columns directly without consuming
memory with the unwanted data.
• Columns are often of the same type and benefit from more
efficient compression, making reads even faster.
• Columnar databases can quickly aggregate the value of a given
column (adding up the total sales for the year, for example).
Use cases include analytics.
• Column databases use the concept of a keyspace, which is
similar to a schema in the relational model; the keyspace contains
the column families.
• The Row Key is exactly that: the specific identifier of that row and is always
unique.
• The column contains the name, value, and timestamp, so that's
straightforward. The name/value pair is also straightforward, and the
timestamp is the date and time the data was entered into the database.
• Some examples of column-store databases include Cassandra, Cosmos DB,
Bigtable, and HBase.
Use Cases
• Developers mainly use column databases in:
▫ Content management systems
▫ Blogging platforms
▫ Systems that maintain counters
▫ Services that have expiring usage
▫ Systems that require heavy write requests (like log
aggregators)
Benefits of Column Databases
• There are several benefits that go along with columnar databases:
• Column stores are excellent at compression and therefore are
efficient in terms of storage.
• You can reduce disk resources while holding massive amounts of
information in a single column
• Since a majority of the information is stored in a column,
aggregation queries are quite fast, which is important for projects
that require large amounts of queries in a small amount of time.
• Scalability is excellent with column-store databases.
▫ They can be expanded nearly infinitely, and are often spread across
large clusters of machines, even numbering in thousands.
▫ That also means that they are great for Massive Parallel Processing
• Load times are similarly excellent, as you can easily load a
billion-row table in a few seconds.
▫ You can load and query nearly instantly.
• Large amounts of flexibility as columns do not necessarily
have to look like each other.
▫ You can add new and different columns without disrupting
the whole database.
Document Oriented Database
• A document-oriented database is a modernized way of storing data
as JSON rather than basic columns/rows, i.e. storing data in its
native form.
• This storage system lets you retrieve, store, and manage
document oriented information
• It’s a very popular category of modern NoSQL databases,
used by the likes of MongoDB, Cosmos DB,
DocumentDB, SimpleDB, PostgreSQL, OrientDB,
Elasticsearch and RavenDB.
PREPARED BY SHARIKA T R, SNGCE