
File Organization, Record Organization

and Storage Access


File Organization

 The database is stored as a collection of


files. Each file is a sequence of records.
A record is a sequence of fields.
 One approach:
 assume record size is fixed
 each file has records of one particular type only
 different files are used for different relations
 This case is easiest to implement; we will consider variable-length records later.
Fixed-Length Records
 Simple approach:
 Store record i starting from byte n × (i – 1), where n is the size of
each record.
 Record access is simple but records may cross blocks
 Modification: do not allow records to cross block boundaries

 Deletion of record i:
alternatives:
 move records i + 1, . . ., n
to i, . . . , n – 1
 move record n to i
 do not move records, but
link all free records on a
free list
Deleting record 3 and compacting
Deleting record 3 and moving last record
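To make the byte arithmetic concrete, here is a minimal Python sketch (the 40-byte record size, the bytearray representation, and the function names are assumptions for illustration, not from the slides); it locates record i at byte n × (i – 1) and deletes by moving the last record into the freed slot:

```python
# Sketch: a fixed-length record file held in a bytearray.
# Record i (1-based) starts at byte n * (i - 1), where n is the record size.
RECORD_SIZE = 40  # assumed record size n

def read_record(buf: bytearray, i: int) -> bytes:
    start = RECORD_SIZE * (i - 1)
    return bytes(buf[start:start + RECORD_SIZE])

def delete_record(buf: bytearray, i: int, num_records: int) -> int:
    """Delete record i using the 'move record n to i' alternative;
    returns the new record count."""
    dst = RECORD_SIZE * (i - 1)
    src = RECORD_SIZE * (num_records - 1)
    if dst != src:
        buf[dst:dst + RECORD_SIZE] = buf[src:src + RECORD_SIZE]
    del buf[src:src + RECORD_SIZE]
    return num_records - 1
```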
Variable-Length Records
 Variable-length records arise in database systems in several ways:
 Storage of multiple record types in a file.
 Record types that allow variable lengths for one or more fields such as
strings (varchar)
 Record types that allow repeating fields (used in some older data
models).
 Attributes are stored in order
 Variable length attributes represented by fixed size (offset, length),
with actual data stored after all fixed length attributes
 Null values represented by null-value bitmap
Variable-Length Records: Slotted Page Structure

 Slotted page header contains:


 number of record entries
 end of free space in the block
 location and size of each record
 Records can be moved around within a page to keep them
contiguous with no empty space between them; entry in the
header must be updated.
 Pointers should not point directly to record — instead they
should point to the entry for the record in header.
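A minimal Python sketch of a slotted page (the class layout, names, and 4 KB block size are assumptions for illustration): the header holds the entry count (the length of the slot list), the end of free space, and one (offset, length) entry per record, and external pointers refer to slot numbers rather than byte positions:

```python
# Sketch: slotted page with records packed from the end of the block.
class SlottedPage:
    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.slots = []              # header: (offset, length) per record
        self.free_end = block_size   # header: end of free space
        self.data = bytearray(block_size)

    def insert(self, record: bytes) -> int:
        """Insert a record; returns its slot number (what pointers refer to)."""
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def get(self, slot: int) -> bytes:
        off, length = self.slots[slot]   # indirection: a pointer names the slot,
        return bytes(self.data[off:off + length])  # not the record's byte offset
```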
Organization of Records in Files

 Heap – a record can be placed anywhere in the file where there


is space
 Sequential – store records in sequential order, based on the
value of the search key of each record
 Hashing – a hash function computed on some attribute of each
record; the result specifies in which block of the file the record
should be placed
 Records of each relation may be stored in a separate file. In a
multitable clustering file organization records of several
different relations can be stored in the same file
 Motivation: store related records on the same block to minimize I/O
Sequential File Organization
 Suitable for applications that require sequential processing of
the entire file
 The records in the file are ordered by a search-key
Sequential File Organization (Cont.)
 Deletion – use pointer chains
 Insertion –locate the position where the record is to be inserted
 if there is free space insert there
 if no free space, insert the record in an overflow block
 In either case, pointer chain must be updated
 Need to reorganize the file
from time to time to restore
sequential order
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering
file organization

[Figures: the department relation, the instructor relation, and a multitable clustering of department and instructor in one file]
Multitable Clustering File Organization (cont.)
 good for queries involving the join of department and instructor, and for
queries involving one single department and its instructors
 bad for queries involving only department
 results in variable size records
 Can add pointer chains to link records of a particular relation
Data Dictionary Storage
The Data dictionary (also called system catalog) stores
metadata; that is, data about data, such as
 Information about relations
 names of relations
 names, types and lengths of attributes of each relation
 names and definitions of views
 integrity constraints
 User and accounting information, including passwords
 Statistical and descriptive data
 number of tuples in each relation
 Physical file organization information
 How relation is stored (sequential/hash/…)
 Physical location of relation

 Information about indices (Chapter 11)


Relational Representation of System Metadata

 Relational representation on disk
 Specialized data structures designed for efficient access, in memory
Storage Access

 A database file is partitioned into fixed-length storage


units called blocks. Blocks are units of both storage
allocation and data transfer.
 Database system seeks to minimize the number of
block transfers between the disk and memory. We can
reduce the number of disk accesses by keeping as
many blocks as possible in main memory.
 Buffer – portion of main memory available to store
copies of disk blocks.
 Buffer manager – subsystem responsible for
allocating buffer space in main memory.
Buffer Manager

 Programs call on the buffer manager when they


need a block from disk.
1. If the block is already in the buffer, buffer manager
returns the address of the block in main memory
2. If the block is not in the buffer, the buffer manager
1. Allocates space in the buffer for the block
1. Replacing (throwing out) some other block, if required, to
make space for the new block.
2. Replaced block written back to disk only if it was modified
since the most recent time that it was written to/fetched from
the disk.
2. Reads the block from the disk to the buffer, and returns
the address of the block in main memory to requester.
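A compact Python sketch of this two-case logic (the class, method names, and the choice of LRU eviction are assumptions for illustration; replacement policies are discussed on the next slide):

```python
from collections import OrderedDict

class BufferManager:
    """Sketch: keeps copies of disk blocks in memory, evicting the least
    recently used block when the buffer is full."""
    def __init__(self, capacity: int, disk):
        self.capacity = capacity
        self.disk = disk                  # assumed object with read/write(block_id)
        self.buffer = OrderedDict()       # block_id -> (data, dirty_flag)

    def get_block(self, block_id):
        if block_id in self.buffer:       # 1. block already in the buffer: return it
            self.buffer.move_to_end(block_id)
            return self.buffer[block_id][0]
        if len(self.buffer) >= self.capacity:    # 2a. make space if required
            victim_id, (victim, dirty) = self.buffer.popitem(last=False)
            if dirty:                     # write back only if it was modified
                self.disk.write(victim_id, victim)
        data = self.disk.read(block_id)   # 2b. read the block into the buffer
        self.buffer[block_id] = (data, False)
        return data
```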
Buffer-Replacement Policies

 Most operating systems replace the block least recently


used (LRU strategy)
 Idea behind LRU – use past pattern of block references as a
predictor of future references
 Queries have well-defined access patterns (such as
sequential scans), and a database system can use the
information in a user's query to predict future references
 LRU can be a bad strategy for certain access patterns involving
repeated scans of data
 For example: when computing the join of 2 relations r and s by a
nested loops
for each tuple tr of r do
for each tuple ts of s do
if the tuples tr and ts match …
 Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
Buffer-Replacement Policies (Cont.)

 Pinned block – memory block that is not allowed to be


written back to disk.
 Toss-immediate strategy – frees the space occupied by a
block as soon as the final tuple of that block has been
processed
 Most recently used (MRU) strategy – system must pin
the block currently being processed. After the final tuple of
that block has been processed, the block is unpinned, and
it becomes the most recently used block.
 Buffer manager can use statistical information regarding
the probability that a request will reference a particular
relation
 E.g., the data dictionary is frequently accessed. Heuristic:
keep data-dictionary blocks in main memory buffer
 Buffer managers also support forced output of blocks for
the purpose of recovery (more in Chapter 16)
Indexing and Hashing
Indexing and Hashing
 Basic Concepts
 Ordered Indices
 B+-Tree Index Files
 B+ Tree extensions
 Multiple key access
 Static Hashing
 Dynamic Hashing
 Comparison of Ordered Indexing and Hashing
 Bitmap indices
Basic Concepts

 Indexing mechanisms used to speed up access to desired data.


 E.g., author catalog in library
 Search Key - attribute or set of attributes used to look up records in a
file.
 An index file consists of records (called index entries) of the form

search-key pointer
 Index files are typically much smaller than the original file
 Two basic kinds of indices:
 Ordered indices: search keys are stored in sorted order
 Hash indices: search keys are distributed uniformly across
"buckets" using a "hash function".
Index Evaluation Metrics
 Access types supported efficiently. E.g.,
 records with a specified value in the attribute
 or records with an attribute value falling in a specified range of
values.
 Access time
 Insertion time
 Deletion time
 Space overhead
Ordered Indices

 In an ordered index, index entries are stored sorted on the search key
value. E.g., author catalog in library.
 Primary index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
 Also called clustering index
 The search key of a primary index is usually but not necessarily the
primary key.
 Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
 Index-sequential file: ordered sequential file with a primary index.
Dense Index Files
 Dense index — Index record appears for every search-key
value in the file.
 E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
 Dense index on dept_name, with instructor file sorted on
dept_name
Sparse Index Files
 Sparse Index: contains index records for only some search-key
values.
 Applicable when records are sequentially ordered on search-key
 To locate a record with search-key value K we:
 Find index record with largest search-key value < K
 Search file sequentially starting at the record to which the index
record points
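A minimal Python sketch of this lookup (the list-based index and file representations are assumptions; the sketch starts from the last index entry whose key does not exceed K, which also handles the case where K itself appears in the index):

```python
import bisect

def sparse_index_lookup(index, records, key):
    """index: sorted list of (search_key, start_position) entries, one per block.
    records: list of (search_key, record) pairs in search-key order."""
    keys = [k for k, _ in index]
    # find the last index entry whose search-key does not exceed `key`
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None                      # key is smaller than every indexed value
    # scan the file sequentially from the record the index entry points to
    for k, record in records[index[pos][1]:]:
        if k == key:
            return record
        if k > key:
            break
    return None
```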
Sparse Index Files (Cont.)
 Compared to dense indices:
 Less space and less maintenance overhead for insertions and
deletions.
 Generally slower than dense index for locating records.
 Good tradeoff: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
Secondary Indices Example

Secondary index on salary field of instructor

 Index record points to a bucket that contains pointers to all the


actual records with that particular search-key value.
 Secondary indices have to be dense
Primary and Secondary Indices
 Indices offer substantial benefits when searching for records.
 BUT: Updating indices imposes overhead on database
modification --when a file is modified, every index on the file
must be updated,
 Sequential scan using primary index is efficient, but a
sequential scan using a secondary index is expensive
 Each record access may fetch a new block from disk
 Block fetch requires about 5 to 10 milliseconds, versus
about 100 nanoseconds for memory access
Multilevel Index
 If primary index does not fit in memory, access becomes
expensive.
 Solution: treat primary index kept on disk as a sequential file
and construct a sparse index on it.
 outer index – a sparse index of primary index
 inner index – the primary index file
 If even outer index is too large to fit in main memory, yet
another level of index can be created, and so on.
 Indices at all levels must be updated on insertion or deletion
from the file.
Multilevel Index (Cont.)
Index Update: Deletion

 If deleted record was the


only record in the file with its
particular search-key value,
the search-key is deleted
from the index also.
 Single-level index entry deletion:
 Dense indices – deletion of search-key is similar to file record
deletion.
 Sparse indices –
 if an entry for the search key exists in the index, it is
deleted by replacing the entry in the index with the next
search-key value in the file (in search-key order).
 If the next search-key value already has an index entry, the
entry is deleted instead of being replaced.
Index Update: Insertion
 Single-level index insertion:
 Perform a lookup using the search-key value appearing in
the record to be inserted.
 Dense indices – if the search-key value does not appear in
the index, insert it.
 Sparse indices – if index stores an entry for each block of
the file, no change needs to be made to the index unless a
new block is created.
 If a new block is created, the first search-key value
appearing in the new block is inserted into the index.
 Multilevel insertion and deletion: algorithms are simple
extensions of the single-level algorithms
Secondary Indices
 Frequently, one wants to find all the records whose values in
a certain field (which is not the search-key of the primary
index) satisfy some condition.
 Example 1: In the instructor relation stored sequentially by
ID, we may want to find all instructors in a particular
department
 Example 2: as above, but where we want to find all
instructors with a specified salary or with salary in a
specified range of values
 We can have a secondary index with an index record for
each search-key value
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
 Disadvantage of indexed-sequential files
 performance degrades as file grows, since many overflow blocks
get created.
 Periodic reorganization of entire file is required.
 Advantage of B+-tree index files:
 automatically reorganizes itself with small, local, changes, in the
face of insertions and deletions.
 Reorganization of entire file is not required to maintain
performance.
 (Minor) disadvantage of B+-trees:
 extra insertion and deletion overhead, space overhead.
 Advantages of B+-trees outweigh disadvantages
 B+-trees are used extensively
 A B+ tree consists of a root, internal nodes and leaves. ... This is
primarily because unlike binary search trees, B+ trees have
very high fanout (number of pointers to child nodes in a node,
typically on the order of 100 or more), which reduces the number
of I/O operations required to find an element in the tree.
Example of B+-Tree
B+-Tree Index Files (Cont.)

A B+-tree is a rooted tree satisfying the following properties:

 All paths from root to leaf are of the same length


 Each node that is not a root or a leaf has between ⌈n/2⌉ and
n children.
 A leaf node has between ⌈(n–1)/2⌉ and n–1 values
 Special cases:
 If the root is not a leaf, it has at least 2 children.
 If the root is a leaf (that is, there are no other nodes in
the tree), it can have between 0 and (n–1) values.
B+-Tree Node Structure
 Typical node

 Ki are the search-key values


 Pi are pointers to children (for non-leaf nodes) or pointers to
records or buckets of records (for leaf nodes).
 The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys, address duplicates later)
Leaf Nodes in B+-Trees

Properties of a leaf node:


 For i = 1, 2, . . ., n–1, pointer Pi points to a file record with
search-key value Ki,
 If Li, Lj are leaf nodes and i < j, Li's search-key values are less
than or equal to Lj's search-key values
 Pn points to next leaf node in search-key order
Non-Leaf Nodes in B+-Trees
 Non leaf nodes form a multi-level sparse index on the leaf
nodes. For a non-leaf node with m pointers:
 All the search-keys in the subtree to which P1 points are
less than K1
 For 2 ≤ i ≤ n – 1, all the search-keys in the subtree to which
Pi points have values greater than or equal to Ki–1 and less
than Ki
 All the search-keys in the subtree to which Pn points have
values greater than or equal to Kn–1
Example of B+-tree

B+-tree for instructor file (n = 6)

 Leaf nodes must have between 3 and 5 values


(⌈(n–1)/2⌉ and n–1, with n = 6).
 Non-leaf nodes other than root must have between 3
and 6 children (⌈n/2⌉ and n, with n = 6).
 Root must have at least 2 children.
Observations about B+-trees
 Since the inter-node connections are done by pointers,
"logically" close blocks need not be "physically" close.
 The non-leaf levels of the B+-tree form a hierarchy of sparse
indices.
 The B+-tree contains a relatively small number of levels
 Level below root has at least 2 * ⌈n/2⌉ values
 Next level has at least 2 * ⌈n/2⌉ * ⌈n/2⌉ values
 .. etc.
 If there are K search-key values in the file, the tree height is
no more than ⌈log⌈n/2⌉(K)⌉
 thus searches can be conducted efficiently.
 Insertions and deletions to the main file can be handled
efficiently, as the index can be restructured in logarithmic time
(as we shall see).
Queries on B+-Trees (Cont.)

 If there are K search-key values in the file, the height of the tree is no
more than ⌈log⌈n/2⌉(K)⌉.
 A node is generally the same size as a disk block, typically 4
kilobytes
 and n is typically around 100 (40 bytes per index entry).
 With 1 million search key values and n = 100
 at most ⌈log50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
 Contrast this with a balanced binary tree with 1 million search key
values — around 20 nodes are accessed in a lookup
 above difference is significant since every node access may need
a disk I/O, costing around 20 milliseconds
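The descent whose cost this height bound describes can be sketched as follows (the node objects with keys, pointers, and is_leaf are an assumed representation, not from the slides):

```python
def bplus_find(node, key):
    """Sketch: descend from the root to the leaf that may contain `key`.
    Each node is assumed to have .keys (sorted), .pointers, and .is_leaf."""
    while not node.is_leaf:
        # follow P1 if key < K1, Pi if K(i-1) <= key < Ki, Pn if key >= K(n-1)
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.pointers[i]
    # in the leaf, pointer Pi points to a record with search-key value Ki
    for i, k in enumerate(node.keys):
        if k == key:
            return node.pointers[i]
    return None
```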
Updates on B+-Trees: Insertion
1. Find the leaf node in which the search-key value would appear
2. If the search-key value is already present in the leaf node
1. Add record to the file
2. If necessary add a pointer to the bucket.
3. If the search-key value is not present, then
1. add the record to the main file (and create a bucket if
necessary)
2. If there is room in the leaf node, insert (key-value, pointer)
pair in the leaf node
3. Otherwise, split the node (along with the new (key-value,
pointer) entry) as discussed in the next slide.
Updates on B+-Trees: Insertion (Cont.)
 Splitting a leaf node:
 take the n (search-key value, pointer) pairs (including the one
being inserted) in sorted order. Place the first ⌈n/2⌉ in the original
node, and the rest in a new node.
 let the new node be p, and let k be the least key value in p. Insert
(k,p) in the parent of the node being split.
 If the parent is full, split it and propagate the split further up.
 Splitting of nodes proceeds upwards till a node that is not full is found.
 In the worst case the root node may be split increasing the height
of the tree by 1.

Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri,pointer-to-new-node) into parent
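A minimal Python sketch of the leaf-split step above (the Leaf dataclass is an assumed representation; duplicate keys are assumed absent, as stated earlier):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Leaf:                               # assumed minimal leaf-node layout
    keys: list = field(default_factory=list)
    pointers: list = field(default_factory=list)
    next: "Leaf" = None                   # Pn: next leaf in search-key order

def split_leaf(leaf: Leaf, key, pointer):
    """Insert (key, pointer) into a full leaf and split it; returns
    (new_leaf, k) where k is the least key in the new node --
    the pair to be inserted into the parent."""
    # the n (search-key value, pointer) pairs, including the new one, sorted
    pairs = sorted(list(zip(leaf.keys, leaf.pointers)) + [(key, pointer)])
    half = math.ceil(len(pairs) / 2)      # first ceil(n/2) stay in the old node
    leaf.keys = [k for k, _ in pairs[:half]]
    leaf.pointers = [p for _, p in pairs[:half]]
    new_leaf = Leaf([k for k, _ in pairs[half:]],
                    [p for _, p in pairs[half:]], leaf.next)
    leaf.next = new_leaf                  # keep the leaf chain intact
    return new_leaf, new_leaf.keys[0]
```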
B+-Tree Insertion

B+-Tree before and after insertion of "Adams"


B+-Tree Insertion

B+-Tree before and after insertion of "Lamport"


Insertion in B+-Trees (Cont.)
 Splitting a non-leaf node: when inserting (k,p) into an already full
internal node N
 Copy N to an in-memory area M with space for n+1 pointers and n
keys
 Insert (k,p) into M
 Copy P1, K1, …, K⌈n/2⌉–1, P⌈n/2⌉ from M back into node N
 Copy P⌈n/2⌉+1, K⌈n/2⌉+1, …, Kn, Pn+1 from M into newly allocated node
N′
 Insert (K⌈n/2⌉, N′) into the parent of N
 Read pseudocode in book!

[Figure: the leaf containing Adams, Brandt, Califieri, Crick is split into (Adams, Brandt) and (Califieri, Crick); Califieri becomes the separating key inserted into the parent]

Characteristics of B+ Tree
 The following are some of the main characteristics of a B+ tree:
Operations (insert, delete) on the tree keep it balanced.
 A minimum occupancy of 50 percent is guaranteed for each node
except the root if the deletion algorithm is implemented.
 However, deletion is often implemented by simply locating the data
entry and removing it, without adjusting the tree as needed to
guarantee the 50 percent occupancy, because files typically grow
rather than shrink.
 Searching for a record requires just a traversal from the root to the
appropriate leaf. We will refer to the length of a path from the root to
a leaf—any leaf, because the tree is balanced—as the height of the tree.
For example, a tree with only a leaf level and a single index level has
height 1. Because of high fan-out, the height of a B+
tree is rarely more than 3 or 4.
 B+ trees in which every node contains m entries, where d ≤ m ≤ 2d.
The value d is a parameter of the B+ tree, called the order of the tree,
and is a measure of the capacity of a tree node. The root node is the
only exception to this requirement on the number of entries; for the
root it is simply required that 1 ≤ m ≤ 2d.
Examples of B+-Tree Deletion

Before and after deleting "Srinivasan"

 Deleting "Srinivasan" causes merging of under-full leaves


Examples of B+-Tree Deletion (Cont.)

Deletion of "Singh" and "Wu" from result of previous example

 Leaf containing Singh and Wu became underfull, and borrowed a value


Kim from its left sibling
 Search-key value in the parent changes as a result
Example of B+-tree Deletion (Cont.)

Before and after deletion of "Gold" from earlier example

 Node with Gold and Katz became underfull, and was merged with its sibling
 Parent node becomes underfull, and is merged with its sibling
Value separating two nodes (at the parent) is pulled down when merging
 Root node then has only one child, and is deleted
Updates on B+-Trees: Deletion

 Find the record to be deleted, and remove it from the main file and
from the bucket (if present)
 Remove (search-key value, pointer) from the leaf node if there is no
bucket or if the bucket has become empty
 If the node has too few entries due to the removal, and the entries in
the node and a sibling fit into a single node, then merge siblings:
 Insert all the search-key values in the two nodes into a single node
(the one on the left), and delete the other node.
 Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted
node, from its parent, recursively using the above procedure.
Updates on B+-Trees: Deletion
 Otherwise, if the node has too few entries due to the removal, but the
entries in the node and a sibling do not fit into a single node, then
redistribute pointers:
 Redistribute the pointers between the node and a sibling such that
both have more than the minimum number of entries.
 Update the corresponding search-key value in the parent of the
node.
 The node deletions may cascade upwards till a node which has ⌈n/2⌉
or more pointers is found.
 If the root node has only one pointer after deletion, it is deleted and
the sole child becomes the root.
Non-Unique Search Keys
 Alternatives to scheme described earlier
 Buckets on separate block (bad idea)
 List of tuple pointers with each key
 Extra code to handle long lists
 Deletion of a tuple can be expensive if there are many
duplicates on search key (why?)
 Low space overhead, no extra cost for queries
 Make search key unique by adding a record-identifier
 Extra storage overhead for keys
 Simpler code for insertion/deletion
 Widely used
B+-Tree File Organization

 Index file degradation problem is solved by using B+-Tree indices.


 Data file degradation problem is solved by using B+-Tree File
Organization.
 The leaf nodes in a B+-tree file organization store records, instead of
pointers.
 Leaf nodes are still required to be half full
 Since records are larger than pointers, the maximum number of
records that can be stored in a leaf node is less than the number of
pointers in a nonleaf node.
 Insertion and deletion are handled in the same way as insertion and
deletion of entries in a B+-tree index.
B+-Tree File Organization (Cont.)

Example of B+-tree File Organization


 Good space utilization important since records use more space than
pointers.
 To improve space utilization, involve more sibling nodes in redistribution
during splits and merges
 Involving 2 siblings in redistribution (to avoid split / merge where
possible) results in each node having at least 2n / 3 entries
Other Issues in Indexing

 Record relocation and secondary indices


 If a record moves, all secondary indices that store record pointers
have to be updated
 Node splits in B+-tree file organizations become very expensive
 Solution: use primary-index search key instead of record pointer in
secondary index
 Extra traversal of primary index to locate record
– Higher cost for queries, but node splits are cheap
 Add record-id if primary-index search key is non-unique
Indexing Strings
 Variable length strings as keys
 Variable fanout
 Use space utilization as criterion for splitting, not number of
pointers
 Prefix compression
 Key values at internal nodes can be prefixes of full key
 Keep enough characters to distinguish entries in the
subtrees separated by the key value
– E.g. "Silas" and "Silberschatz" can be separated by
"Silb"
 Keys in leaf node can be compressed by sharing common
prefixes
Bulk Loading and Bottom-Up Build
 Inserting entries one-at-a-time into a B+-tree requires ≥ 1 I/O per entry
 assuming leaf level does not fit in memory
 can be very inefficient for loading a large number of entries at a time
(bulk loading)
 Efficient alternative 1:
 sort entries first (using efficient external-memory sort algorithms
discussed later in Section 12.4)
 insert in sorted order
 insertion will go to existing page (or cause a split)
 much improved IO performance, but most leaf nodes half full
 Efficient alternative 2: Bottom-up B+-tree construction
 As before sort entries
 And then create tree layer-by-layer, starting with leaf level
 details as an exercise
 Implemented as part of bulk-load utility by most database systems
B-Tree Index Files

 Similar to B+-tree, but B-tree allows search-key values to


appear only once; eliminates redundant storage of search
keys.
 Search keys in nonleaf nodes appear nowhere else in the B-
tree; an additional pointer field for each search key in a
nonleaf node must be included.
 Generalized B-tree leaf node

 Nonleaf node – pointers Bi are the bucket or file record


pointers.
B-Tree Index File Example

B-tree (above) and B+-tree (below) on same data


B-Tree Index Files (Cont.)

 Advantages of B-Tree indices:


 May use fewer tree nodes than a corresponding B+-Tree.
 Sometimes possible to find search-key value before reaching leaf
node.
 Disadvantages of B-Tree indices:
 Only small fraction of all search-key values are found early
 Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees
typically have greater depth than corresponding B+-Tree
 Insertion and deletion more complicated than in B+-Trees
 Implementation is harder than B+-Trees.
 Typically, advantages of B-Trees do not outweigh disadvantages.
Multiple-Key Access
 Use multiple indices for certain types of queries.
 Example:
select ID
from instructor
where dept_name = "Finance" and salary = 80000
 Possible strategies for processing query using indices on
single attributes:
1. Use index on dept_name to find instructors with
department name Finance; test salary = 80000
2. Use index on salary to find instructors with a salary of
$80000; test dept_name = "Finance".
3. Use dept_name index to find pointers to all records
pertaining to the "Finance" department. Similarly use index
on salary. Take intersection of both sets of pointers
obtained.
Indices on Multiple Keys
 Composite search keys are search keys containing more
than one attribute
 E.g. (dept_name, salary)
 Lexicographic ordering: (a1, a2) < (b1, b2) if either
 a1 < b1, or
 a1=b1 and a2 < b2
Indices on Multiple Attributes

Suppose we have an index on combined search-key


(dept_name, salary).

 With the where clause


where dept_name = "Finance" and salary = 80000
the index on (dept_name, salary) can be used to fetch only records
that satisfy both conditions.
 Using separate indices is less efficient — we may fetch many
records (or pointers) that satisfy only one of the conditions.
 Can also efficiently handle
where dept_name = "Finance" and salary < 80000
 But cannot efficiently handle
where dept_name < "Finance" and salary = 80000
 May fetch many records that satisfy the first but not the second
condition
Other Features
 Covering indices
 Add extra attributes to index so (some) queries can avoid fetching
the actual records
 Particularly useful for secondary indices
– Why?
 Can store extra attributes only at leaf
Hashing
Static Hashing

 A bucket is a unit of storage containing one or more records (a


bucket is typically a disk block).
 In a hash file organization we obtain the bucket of a record directly
from its search-key value using a hash function.
 Hash function h is a function from the set of all search-key values K
to the set of all bucket addresses B.
 Hash function is used to locate records for access, insertion as well
as deletion.
 Records with different search-key values may be mapped to the
same bucket; thus entire bucket has to be searched sequentially to
locate a record.
Example of Hash File Organization

Hash file organization of instructor file, using dept_name as key


(See figure in next slide.)

 There are 10 buckets,


 The binary representation of the ith character is assumed to be the
integer i.
 The hash function returns the sum of the binary representations of
the characters modulo 10
 E.g. h(Music) = 1 h(History) = 2
h(Physics) = 3 h(Elec. Eng.) = 3
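A small Python sketch of this example hash function (using ord() for the character codes is an assumption; the slide instead treats the ith character as the integer i, so the bucket numbers differ, but the scheme is the same):

```python
NUM_BUCKETS = 10  # the example uses 10 buckets

def dept_hash(dept_name: str) -> int:
    """Sum of the characters' integer codes, modulo the number of buckets."""
    return sum(ord(c) for c in dept_name) % NUM_BUCKETS

# e.g. dept_hash("Music"), dept_hash("History") each select one of buckets 0..9
```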
Example of Hash File Organization

Hash file organization of instructor file, using dept_name as key


(see previous slide for details).
Hash Functions

 Worst hash function maps all search-key values to the same bucket;
this makes access time proportional to the number of search-key
values in the file.
 An ideal hash function is uniform, i.e., each bucket is assigned the
same number of search-key values from the set of all possible values.
 Ideal hash function is random, so each bucket will have the same
number of records assigned to it irrespective of the actual distribution of
search-key values in the file.
 Typical hash functions perform computation on the internal binary
representation of the search-key.
 For example, for a string search-key, the binary representations of
all the characters in the string could be added and the sum modulo
the number of buckets could be returned.
Handling of Bucket Overflows

 Bucket overflow can occur because of


 Insufficient buckets
 Skew in distribution of records. This can occur due to two
reasons:
 multiple records have same search-key value
 chosen hash function produces non-uniform distribution of key
values
 Although the probability of bucket overflow can be reduced, it cannot
be eliminated; it is handled by using overflow buckets.
Handling of Bucket Overflows (Cont.)

 Overflow chaining – the overflow buckets of a given bucket are


chained together in a linked list.
 Above scheme is called closed hashing.
 An alternative, called open hashing, which does not use overflow
buckets, is not suitable for database applications.
Hash Indices
 Hashing can be used not only for file organization, but also for index-
structure creation.
 A hash index organizes the search keys, with their associated record
pointers, into a hash file structure.
 Strictly speaking, hash indices are always secondary indices
 if the file itself is organized using hashing, a separate primary
hash index on it using the same search-key is unnecessary.
 However, we use the term hash index to refer to both secondary
index structures and hash organized files.
Example of Hash Index

hash index on instructor, on attribute ID


Deficiencies of Static Hashing

 In static hashing, function h maps search-key values to a fixed set of B


of bucket addresses. Databases grow or shrink with time.
 If initial number of buckets is too small, and file grows, performance
will degrade due to too much overflows.
 If space is allocated for anticipated growth, a significant amount of
space will be wasted initially (and buckets will be underfull).
 If database shrinks, again space will be wasted.
 One solution: periodic re-organization of the file with a new hash
function
 Expensive, disrupts normal operations
 Better solution: allow the number of buckets to be modified dynamically.
Dynamic Hashing
 Good for database that grows and shrinks in size
 Allows the hash function to be modified dynamically
 Extendable hashing – one form of dynamic hashing
 Hash function generates values over a large range — typically b-bit
integers, with b = 32.
 At any time use only a prefix of the hash function to index into a
table of bucket addresses.
 Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
 Bucket address table size = 2^i. Initially i = 0
Value of i grows and shrinks as the size of the database grows

and shrinks.
 Multiple entries in the bucket address table may point to a bucket
(why?)
 Thus, actual number of buckets is < 2^i
 The number of buckets also changes dynamically due to
coalescing and splitting of buckets.
General Extendable Hash Structure

In this structure, i2 = i3 = i, whereas i1 = i – 1 (see next


slide for details)
Use of Extendable Hash Structure

 Each bucket j stores a value ij


 All the entries that point to the same bucket have the same values on
the first ij bits.
 To locate the bucket containing search-key Kj:
1. Compute h(Kj) = X
2. Use the first i high order bits of X as a displacement into bucket
address table, and follow the pointer to appropriate bucket
 To insert a record with search-key value Kj
 follow same procedure as look-up and locate the bucket, say j.
 If there is room in the bucket j insert record in the bucket.
 Else the bucket must be split and insertion re-attempted (next slide.)
 Overflow buckets used instead in some cases (will see shortly)
Insertion in Extendable Hash Structure (Cont)
To split a bucket j when inserting record with search-key value Kj:
 If i > ij (more than one pointer to bucket j)
 allocate a new bucket z, and set ij = iz = (ij + 1)
 Update the second half of the bucket address table entries originally
pointing to j, to point to z
 remove each record in bucket j and reinsert (in j or z)
 recompute new bucket for Kj and insert record in the bucket (further
splitting is required if the bucket is still full)
 If i = ij (only one pointer to bucket j)
 If i reaches some limit b, or too many splits have happened in this
insertion, create an overflow bucket
 Else
 increment i and double the size of the bucket address table.
 replace each entry in the table by two entries that point to the
same bucket.
 recompute new bucket address table entry for Kj
Now i > ij so use the first case above.
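A minimal Python sketch of the lookup and of directory doubling (the use of Python's built-in hash, the 32-bit mask, and the list-of-buckets directory are assumptions for illustration):

```python
HASH_BITS = 32        # b = 32, as in the slides

def locate_bucket(directory, i, key):
    """directory: bucket address table with 2**i entries (entries may repeat).
    Uses the first i high-order bits of the b-bit hash value as the index."""
    h = hash(key) & ((1 << HASH_BITS) - 1)    # b-bit hash value X
    return directory[h >> (HASH_BITS - i)] if i > 0 else directory[0]

def double_directory(directory):
    """When i must grow: replace each entry by two entries pointing to the
    same bucket, doubling the bucket address table (i becomes i + 1)."""
    return [bucket for bucket in directory for _ in range(2)]
```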
Deletion in Extendable Hash Structure

 To delete a key value,


 locate it in its bucket and remove it.
 The bucket itself can be removed if it becomes empty (with
appropriate updates to the bucket address table).
 Coalescing of buckets can be done (can coalesce only with a
"buddy" bucket having same value of ij and same ij –1 prefix, if it is
present)
 Decreasing bucket address table size is also possible
 Note: decreasing bucket address table size is an expensive
operation and should be done only if number of buckets becomes
much smaller than the size of the table
Use of Extendable Hash Structure: Example
Example (Cont.)

 Initial Hash structure; bucket size = 2


Example (Cont.)

 Hash structure after insertion of "Mozart", "Srinivasan",


and "Wu" records
Example (Cont.)

 Hash structure after insertion of Einstein record


Example (Cont.)
 Hash structure after insertion of Gold and El Said records
Example (Cont.)
 Hash structure after insertion of Katz record
Example (Cont.)

And after insertion of


eleven records
Example (Cont.)

And after insertion of


Kim record in previous
hash structure
Extendable Hashing vs. Other Schemes
 Benefits of extendable hashing:
Hash performance does not degrade with growth of file
 Minimal space overhead
 Disadvantages of extendable hashing
 Extra level of indirection to find desired record
 Bucket address table may itself become very big (larger than
memory)
 Cannot allocate very large contiguous areas on disk either

 Solution: B+-tree structure to locate desired record in bucket


address table
 Changing size of bucket address table is an expensive operation
 Linear hashing is an alternative mechanism
 Allows incremental growth of its directory (equivalent to bucket
address table)
 At the cost of more bucket overflows
Comparison of Ordered Indexing and Hashing

 Cost of periodic re-organization


 Relative frequency of insertions and deletions
 Is it desirable to optimize average access time at the expense of
worst-case access time?
 Expected type of queries:
 Hashing is generally better at retrieving records having a
specified value of the key.
 If range queries are common, ordered indices are to be
preferred
 In practice:
 PostgreSQL supports hash indices, but discourages use due to
poor performance
 Oracle supports static hash organization, but not hash indices
 SQLServer supports only B+-trees
Bitmap Indices
 Bitmap indices are a special type of index designed for efficient
querying on multiple keys
 Records in a relation are assumed to be numbered sequentially
from, say, 0
 Given a number n it must be easy to retrieve record n
 Particularly easy if records are of fixed size
 Applicable on attributes that take on a relatively small number
of distinct values
 E.g. gender, country, state, …
 E.g. income-level (income broken up into a small number of
levels such as 0-9999, 10000-19999, 20000-50000, 50000-
infinity)
 A bitmap is simply an array of bits
Bitmap Indices (Cont.)
 In its simplest form a bitmap index on an attribute has a bitmap for
each value of the attribute
 Bitmap has as many bits as records
 In a bitmap for value v, the bit for a record is 1 if the record has the
value v for the attribute, and is 0 otherwise
Bitmap Indices (Cont.)

 Bitmap indices are useful for queries on multiple attributes


 not particularly useful for single attribute queries
 Queries are answered using bitmap operations
 Intersection (and)
 Union (or)
 Complementation (not)
 Each operation takes two bitmaps of the same size and applies the
operation on corresponding bits to get the result bitmap
 E.g. 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
 Males with income level L1: 10010 AND 10100 = 10000
 Can then retrieve required tuples.
 Counting number of matching tuples is even faster
Bitmap Indices (Cont.)

 Bitmap indices generally very small compared with relation size


 E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space
used by relation.
 If number of distinct attribute values is 8, bitmap is only 1% of
relation size
 Deletion needs to be handled properly
 Existence bitmap to note if there is a valid record at a record location
 Needed for complementation
 not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap
 Should keep bitmaps for all values, even null value
 To correctly handle SQL null semantics for NOT(A=v):
 intersect above result with (NOT bitmap-A-Null)
Efficient Implementation of Bitmap Operations

 Bitmaps are packed into words; a single word and (a basic CPU
instruction) computes and of 32 or 64 bits at once
 E.g. 1-million-bit maps can be and-ed with just 31,250 instructions
 Counting number of 1s can be done fast by a trick:
 Use each byte to index into a precomputed array of 256 elements
each storing the count of 1s in the binary representation
 Can use pairs of bytes to speed up further at a higher memory
cost
 Add up the retrieved counts
 Bitmaps can be used instead of Tuple-ID lists at leaf levels of
B+-trees, for values that have a large number of matching records
 Worthwhile if > 1/64 of the records have that value, assuming a
tuple-id is 64 bits
 Above technique merges benefits of bitmap and B+-tree indices
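A short Python sketch of the byte-table counting trick described above (packing bitmaps into bytes objects is an assumption; a real implementation would operate on machine words):

```python
# Precomputed array of 256 elements: count of 1s in each possible byte value
ONES_IN_BYTE = [bin(b).count("1") for b in range(256)]

def bitmap_and(a: bytes, b: bytes) -> bytes:
    """AND two equal-length bitmaps, one byte (8 records) at a time."""
    return bytes(x & y for x, y in zip(a, b))

def count_matches(bitmap: bytes) -> int:
    """Count matching records: index each byte into the precomputed table."""
    return sum(ONES_IN_BYTE[byte] for byte in bitmap)

# e.g. count_matches(bitmap_and(males, income_level_L1)) counts the tuples
# satisfying both conditions without retrieving them
```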
Index Definition in SQL
 Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch_name)
 Use create unique index to indirectly specify and enforce the
condition that the search key is a candidate key.
 Not really required if SQL unique integrity constraint is supported
 To drop an index
drop index <index-name>
 Most database systems allow specification of type of index, and
clustering.
Transactions
Chapter 14: Transactions
 Transaction Concept
 Transaction State
 Concurrent Executions
 Serializability
 Recoverability
 Implementation of Isolation
 Transaction Definition in SQL
 Testing for Serializability.
Transaction Concept
 A transaction is a unit of program execution that accesses and
possibly updates various data items.
 E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Two main issues to deal with:
 Failures of various kinds, such as hardware failures and system
crashes
 Concurrent execution of multiple transactions
Example of Fund Transfer
 Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Atomicity requirement
 if the transaction fails after step 3 and before step 6, money will be ―lost‖
leading to an inconsistent database state
 Failure could be due to software or hardware
 the system should ensure that updates of a partially executed transaction
are not reflected in the database
 Durability requirement — once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the updates to the
database by the transaction must persist even if there are software or
hardware failures.
Example of Fund Transfer (Cont.)
 Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Consistency requirement in above example:
 the sum of A and B is unchanged by the execution of the transaction
 In general, consistency requirements include
 Explicitly specified integrity constraints such as primary keys and foreign
keys
 Implicit integrity constraints
– e.g. sum of balances of all accounts, minus sum of loan amounts
must equal value of cash-in-hand
 A transaction must see a consistent database.
 During transaction execution the database may be temporarily inconsistent.
 When the transaction completes successfully the database must be
consistent
 Erroneous transaction logic can lead to inconsistency
Example of Fund Transfer (Cont.)
 Isolation requirement — if between steps 3 and 6, another
transaction T2 is allowed to access the partially updated database, it
will see an inconsistent database (the sum A + B will be less than it
should be).
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
 Isolation can be ensured trivially by running transactions serially
 that is, one after the other.
 However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items.To preserve the integrity of data the database
system must ensure:
 Atomicity. Either all operations of the transaction are properly reflected
in the database or none are.
 Consistency. Execution of a transaction in isolation preserves the
consistency of the database.
 Isolation. Although multiple transactions may execute concurrently,
each transaction must be unaware of other concurrently executing
transactions. Intermediate transaction results must be hidden from other
concurrently executed transactions.
 That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution
after Ti finished.
 Durability. After a transaction completes successfully, the changes it
has made to the database persist, even if there are system failures.
Transaction State
 Active – the initial state; the transaction stays in this state while it is
executing
 Partially committed – after the final statement has been executed.
 Failed -- after the discovery that normal execution can no longer
proceed.
 Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the transaction.
Two options after it has been aborted:
 restart the transaction
 can be done only if no internal logical error
 kill the transaction
 Committed – after successful completion.
Transaction State (Cont.)
Concurrent Executions
 Multiple transactions are allowed to run concurrently in the system.
Advantages are:
 increased processor and disk utilization, leading to better
transaction throughput
 E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
 reduced average response time for transactions: short
transactions need not wait behind long ones.
 Concurrency control schemes – mechanisms to achieve isolation
 that is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
 Will study in Chapter 16, after studying notion of correctness
of concurrent executions.
Schedules
 Schedule – a sequence of instructions that specify the chronological
order in which instructions of concurrent transactions are executed
 a schedule for a set of transactions must consist of all instructions
of those transactions
 must preserve the order in which the instructions appear in each
individual transaction.
 A transaction that successfully completes its execution will have a
commit instruction as the last statement
 by default transaction assumed to execute commit instruction as its
last step
 A transaction that fails to successfully complete its execution will have
an abort instruction as the last statement
Schedule 1
 Let T1 transfer $50 from A to B, and T2 transfer 10% of the
balance from A to B.
 A serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3
 Let T1 and T2 be the transactions defined previously. The
following schedule is not a serial schedule, but it is equivalent
to Schedule 1.

In Schedules 1, 2 and 3, the sum A + B is preserved.


Schedule 4
 The following concurrent schedule does not preserve the
value of (A + B ).
Serializability
 Basic Assumption – Each transaction preserves database
consistency.
 Thus serial execution of a set of transactions preserves
database consistency.
 A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule. Different forms of schedule
equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Simplified view of transactions

 We ignore operations other than read and write


instructions
 We assume that transactions may perform arbitrary
computations on data in local buffers in between reads
and writes.
 Our simplified schedules consist of only read and write
instructions.
Conflicting Instructions
 Instructions li and lj of transactions Ti and Tj respectively, conflict
if and only if there exists some item Q accessed by both li and lj,
and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
 Intuitively, a conflict between li and lj forces a (logical) temporal
order between them.
 If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they had
been interchanged in the schedule.
Conflict Serializability
 If a schedule S can be transformed into a schedule S´ by a series of
swaps of non-conflicting instructions, we say that S and S´ are
conflict equivalent.
 We say that a schedule S is conflict serializable if it is conflict
equivalent to a serial schedule
Conflict Serializability (Cont.)
 Schedule 3 can be transformed into Schedule 6, a serial
schedule where T2 follows T1, by a series of swaps of non-
conflicting instructions. Therefore Schedule 3 is conflict
serializable.

Schedule 3 Schedule 6
Conflict Serializability (Cont.)

 Example of a schedule that is not conflict serializable:

 We are unable to swap instructions in the above schedule to


obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >.
View Serializability
 Let S and S´ be two schedules with the same set of transactions. S
and S´ are view equivalent if the following three conditions are met,
for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S’ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value
was produced by transaction Tj (if any), then in schedule S’ also
transaction Ti must read the value of Q that was produced by the
same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation
in schedule S must also perform the final write(Q) operation in
schedule S’.
As can be seen, view equivalence is also based purely on reads and
writes alone.
View Serializability (Cont.)
 A schedule S is view serializable if it is view equivalent to a serial
schedule.
 Every conflict serializable schedule is also view serializable.
 Below is a schedule which is view-serializable but not conflict
serializable.

 What serial schedule is above equivalent to?


 Every view serializable schedule that is not conflict serializable has
blind writes.
Other Notions of Serializability
 The schedule below produces same outcome as the serial
schedule < T1, T5 >, yet is not conflict equivalent or view
equivalent to it.

 Determining such equivalence requires analysis of operations


other than read and write.
Testing for Serializability
 Consider some schedule of a set of transactions T1, T2, ..., Tn
 Precedence graph — a directed graph where the vertices
are the transactions (names).
 We draw an arc from Ti to Tj if the two transactions conflict,
and Ti accessed the data item on which the conflict arose
earlier.
 We may label the arc by the item that was accessed.
 Example 1
Test for Conflict Serializability
 A schedule is conflict serializable if and only
if its precedence graph is acyclic.
 Cycle-detection algorithms exist which take
order n^2 time, where n is the number of
vertices in the graph.
 (Better algorithms take order n + e
where e is the number of edges.)
 If precedence graph is acyclic, the
serializability order can be obtained by a
topological sorting of the graph.
 This is a linear order consistent with the
partial order of the graph.
 For example, a serializability order for
Schedule A would be
T5 → T1 → T3 → T2 → T4
 Are there others?
Test for View Serializability
 The precedence graph test for conflict serializability cannot be used
directly to test for view serializability.
 Extension to test for view serializability has cost exponential in the
size of the precedence graph.
 The problem of checking if a schedule is view serializable falls in the
class of NP-complete problems.
 Thus existence of an efficient algorithm is extremely unlikely.
 However practical algorithms that just check some sufficient
conditions for view serializability can still be used.
Recoverable Schedules
Need to address the effect of transaction failures on concurrently
running transactions.
 Recoverable schedule — if a transaction Tj reads a data item
previously written by a transaction Ti , then the commit operation of Ti
appears before the commit operation of Tj.
 The following schedule (Schedule 11) is not recoverable if T9 commits
immediately after the read

 If T8 should abort, T9 would have read (and possibly shown to the user)
an inconsistent database state. Hence, database must ensure that
schedules are recoverable.
Cascading Rollbacks
 Cascading rollback – a single transaction failure leads to a
series of transaction rollbacks. Consider the following schedule
where none of the transactions has yet committed (so the
schedule is recoverable)

If T10 fails, T11 and T12 must also be rolled back.


 Can lead to the undoing of a significant amount of work
Cascadeless Schedules
 Cascadeless schedules — cascading rollbacks cannot occur; for
each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
 Every cascadeless schedule is also recoverable
 It is desirable to restrict the schedules to those that are cascadeless
Concurrency Control
 A database must provide a mechanism that will ensure that all possible
schedules are
 either conflict or view serializable, and
 are recoverable and preferably cascadeless
 A policy in which only one transaction can execute at a time generates
serial schedules, but provides a poor degree of concurrency
 Are serial schedules recoverable/cascadeless?
 Testing a schedule for serializability after it has executed is a little too
late!
 Goal – to develop concurrency control protocols that will assure
serializability.
Concurrency Control (Cont.)
 Schedules must be conflict or view serializable, and recoverable,
for the sake of database consistency, and preferably cascadeless.
 A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency.
 Concurrency-control schemes tradeoff between the amount of
concurrency they allow and the amount of overhead that they
incur.
 Some schemes allow only conflict-serializable schedules to be
generated, while others allow view-serializable schedules that are
not conflict-serializable.
Concurrency Control vs. Serializability Tests

 Concurrency-control protocols allow concurrent schedules, but ensure


that the schedules are conflict/view serializable, and are recoverable
and cascadeless .
 Concurrency control protocols generally do not examine the
precedence graph as it is being created
 Instead a protocol imposes a discipline that avoids nonseralizable
schedules.
 We study such protocols in Chapter 16.
 Different concurrency control protocols provide different tradeoffs
between the amount of concurrency they allow and the amount of
overhead that they incur.
 Tests for serializability help us understand why a concurrency control
protocol is correct.
Weak Levels of Consistency
 Some applications are willing to live with weak levels of consistency,
allowing schedules that are not serializable
 E.g. a read-only transaction that wants to get an approximate total
balance of all accounts
 E.g. database statistics computed for query optimization can be
approximate (why?)
 Such transactions need not be serializable with respect to other
transactions
 Tradeoff accuracy for performance
Levels of Consistency in SQL-92
 Serializable — default
 Repeatable read — only committed records to be read, repeated
reads of same record must return same value. However, a
transaction may not be serializable – it may find some records
inserted by a transaction but not find others.
 Read committed — only committed records can be read, but
successive reads of record may return different (but committed)
values.
 Read uncommitted — even uncommitted records may be read.

 Lower degrees of consistency useful for gathering approximate


information about the database
 Warning: some database systems do not ensure serializable
schedules by default
 E.g. Oracle and PostgreSQL by default support a level of
consistency called snapshot isolation (not part of the SQL
standard)
Transaction Definition in SQL
 Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
 In SQL, a transaction begins implicitly.
 A transaction in SQL ends by:
 Commit work commits current transaction and begins a new
one.
 Rollback work causes current transaction to abort.
 In almost all database systems, by default, every SQL statement
also commits implicitly if it executes successfully
 Implicit commit can be turned off by a database directive
 E.g. in JDBC, connection.setAutoCommit(false);
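A minimal Python sketch of the fund-transfer transaction with explicit commit and rollback (sqlite3 and the account table layout are assumptions for illustration; the JDBC call above is the analogous interface):

```python
import sqlite3

def transfer(conn: sqlite3.Connection, from_acct, to_acct, amount):
    """Run the 6-step transfer as one transaction: all steps or none (atomicity)."""
    try:
        cur = conn.cursor()
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, from_acct))   # steps 1-3: read(A), A := A - 50, write(A)
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, to_acct))     # steps 4-6: read(B), B := B + 50, write(B)
        conn.commit()        # commit work: make the updates durable
    except Exception:
        conn.rollback()      # rollback work: abort, undoing partial updates
        raise
```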
Review
Types of Schedules
Serial Schedules

 In serial schedules, all the transactions


execute serially one after the other.
 When one transaction executes, no other
transaction is allowed to execute.
Characteristics-
 Consistent
 Recoverable
Non-Serial Schedules

 In non-serial schedules,
multiple transactions
execute concurrently.
 Operations of all the
transactions are interleaved
or mixed with each other
Finding Number Of Schedules-
 Consider there are n number of transactions T1, T2, T3 ….
, Tn with N1, N2, N3 …. , Nn number of operations
respectively.
Total Number of Schedules-
 Total number of possible schedules (serial + non-serial) is
given by-
(N1 + N2 + N3 + … + Nn)! / (N1! × N2! × N3! × … × Nn!)
Total Number of Serial Schedules-

Total number of serial schedules
= Number of different ways of arranging n transactions
= n!
Total Number of Non-Serial Schedules-
 Total number of non-serial schedules
= Total number of schedules – Total number of serial schedules
= (N1 + N2 + … + Nn)! / (N1! × N2! × … × Nn!) – n!
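As a worked check of the formulas above (an illustrative example, not from the slides): for two transactions T1 and T2 with N1 = 2 and N2 = 2 operations, the total number of schedules is 4! / (2! × 2!) = 6, the number of serial schedules is 2! = 2, and the number of non-serial schedules is 6 – 2 = 4.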
Serializability
 Some non-serial schedules may lead to
inconsistency of the database.
 Serializability is a concept that helps to identify
which non-serial schedules are correct and will
maintain the consistency of the database.
Serializable Schedules-
 If a given non-serial schedule of 'n' transactions is
equivalent to some serial schedule of 'n'
transactions, then it is called as a serializable
schedule.
 Serializable schedules behave exactly the same as
serial schedules.
Serial Schedules Vs Serializable Schedules-
Serial Schedules:
 No concurrency is allowed. Thus, all the transactions necessarily
execute serially one after the other.
 Serial schedules lead to less resource utilization and CPU
throughput.
 Serial schedules are less efficient as compared to serializable
schedules (due to the above reason).
Serializable Schedules:
 Concurrency is allowed. Thus, multiple transactions can execute
concurrently.
 Serializable schedules improve both resource utilization and CPU
throughput.
 Serializable schedules are always better than serial schedules
(due to the above reason).
Types of Serializability-
Serializability is mainly of two types-
1. Conflict Serializability- If a given non-serial
schedule can be converted into a serial
schedule by swapping its non-conflicting
operations, then it is called as a conflict
serializable schedule.
2. View Serializability - If a given schedule
is found to be view equivalent to some
serial schedule, then it is called as a view
serializable schedule.
Conflicting Operations-
Two operations are called as conflicting operations if all
the following conditions hold true for them-
 Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed
by both li and lj, and at least one of these instructions wrote
Q.
1. li = read(Q), lj = read(Q). li and lj don't conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
 Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
 If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they
had been interchanged in the schedule.
 If a schedule S can be transformed into
a schedule S´ by a series of swaps of
non-conflicting instructions, we say that
S and S´ are conflict equivalent.
 We say that a schedule S is conflict
serializable if it is conflict equivalent to
a serial schedule
Example-
Consider the following schedule-
In this schedule,
•W1 (A) and R2 (A) are called as conflicting
operations.
Checking Whether a Schedule is Conflict Serializable Or
Not
Follow the following steps to check whether a given non-
serial schedule is conflict serializable or not-
Step-01: Find and list all the conflicting operations.
Step-02: Start creating a precedence graph by drawing one
node for each transaction.
Step-03: Draw an edge for each conflict pair such that if Xi
(V) and Yj (V) form a conflict pair, then draw an edge
from Ti to Tj.
 This ensures that Ti gets executed before Tj.
Step-04: Check if there is any cycle formed in the graph.
 If there is no cycle found, then the schedule is conflict
serializable otherwise not.
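The four steps above translate directly into code. Below is a minimal, illustrative sketch (the tuple representation and names are assumptions, not from the slides): a schedule is an ordered list of (transaction, operation, item) tuples with operations 'R' and 'W'; the function builds the precedence graph and looks for a cycle with a depth-first search.

def conflict_serializable(schedule):
    # Step-01/03: add an edge Ti -> Tj for every conflicting pair where Ti comes first.
    edges = set()
    for i, (ti, opi, xi) in enumerate(schedule):
        for tj, opj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and 'W' in (opi, opj):
                edges.add((ti, tj))

    # Step-04: detect a cycle in the precedence graph with a depth-first search.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)

    visiting, done = set(), set()
    def has_cycle(node):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    acyclic = not any(has_cycle(t) for t in list(graph) if t not in done)
    return acyclic, edges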
Problem-01:
 Check whether the given schedule S is
conflict serializable or not-
 S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B)
, W1(A) , W2(B)
Solution-
Step-01:
 List all the conflicting operations and determine
the dependency between the transactions-
1. R2(A) , W1(A) (T2 → T1)
2. R1(B) , W2(B) (T1 → T2)
3. R3(B) , W2(B) (T3 → T2)
Step-02:
 Draw the precedence graph-
• Clearly, there exists a cycle in the precedence
graph.
• Therefore, the given schedule S is not conflict
serializable.
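For instance, feeding Problem-01's schedule S to the sketch above reproduces this conclusion:

S = [('T1', 'R', 'A'), ('T2', 'R', 'A'), ('T1', 'R', 'B'), ('T2', 'R', 'B'),
     ('T3', 'R', 'B'), ('T1', 'W', 'A'), ('T2', 'W', 'B')]

ok, edges = conflict_serializable(S)
print(ok)       # False: the T1/T2 cycle makes S non-conflict-serializable
print(edges)    # three edges: T2 -> T1, T1 -> T2, T3 -> T2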
Problem-02:
Check whether the given schedule S is conflict serializable
and recoverable or not.
View Serializability - If a given schedule is
found to be view equivalent to some serial
schedule, then it is called as a view serializable
schedule.
View Equivalent Schedules-
 Consider two schedules S1 and S2 each consisting
of two transactions T1 and T2.
 Schedules S1 and S2 are called view equivalent if
the following three conditions hold true for them.
Condition-01:
 For each data item X, if transaction Ti reads X
from the database initially in schedule S1, then in
schedule S2 also, Ti must perform the initial read
of X from the database.
“Initial readers must be same for all the data items”.
Condition-02:
 If transaction Ti reads a data item that has been
updated by the transaction Tj in schedule S1, then in
schedule S2 also, transaction Ti must read the same
data item that has been updated by the transaction
Tj.
“Write-read sequence must be same.”
Condition-03:
 For each data item X, if X has been updated at last
by transaction Ti in schedule S1, then in schedule
S2 also, X must be updated at last by transaction Ti.
“Final writers must be same for all the data items”.
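These three conditions can be checked mechanically. The following is a small, hedged sketch (helper names are invented for illustration) that compares two schedules over the same transactions, each given as a list of (transaction, operation, item) tuples with operations 'R' and 'W'.

def read_sources(schedule):
    # For the k-th read of each transaction, record which item it reads and which
    # transaction wrote that item last (None = the initial database value).
    last_writer, read_index, sources = {}, {}, {}
    for txn, op, item in schedule:
        if op == 'R':
            k = read_index.get(txn, 0)
            read_index[txn] = k + 1
            sources[(txn, k)] = (item, last_writer.get(item))   # conditions 1 and 2
        elif op == 'W':
            last_writer[item] = txn
    return sources

def final_writers(schedule):
    # Condition 3: the last transaction to write each data item.
    last = {}
    for txn, op, item in schedule:
        if op == 'W':
            last[item] = txn
    return last

def view_equivalent(s1, s2):
    return read_sources(s1) == read_sources(s2) and final_writers(s1) == final_writers(s2)

A schedule is then view serializable if it is view equivalent to at least one of the n! serial orderings of its transactions; since this brute-force test is expensive (view-serializability testing is NP-complete in general), the simpler checks described next are usually tried first.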
Checking Whether a Schedule is View Serializable Or Not
Method-01:
 Check whether the given schedule is conflict serializable
or not.
 If the given schedule is conflict serializable, then it is
surely view serializable. Stop and report your answer.
 If the given schedule is not conflict serializable, then it
may or may not be view serializable. Go and check using
other methods.
1. “All conflict serializable schedules are view serializable.”
2. “All view serializable schedules may or may not be conflict
serializable.”
Method-02:
 Check if there exists any blind write operation.
(Writing without reading is called as a blind write).
 If there does not exist any blind write, then the schedule
(already known not to be conflict serializable) is surely not
view serializable. Stop and report your answer.
 If there exists any blind write, then the schedule may
or may not be view serializable. Go and check using
other methods.
 “No blind write (in a non-conflict-serializable schedule)
means not a view serializable schedule.”
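Detecting blind writes needs only a single scan. A small illustrative sketch, reusing the same tuple representation as the earlier examples:

def blind_writes(schedule):
    # A write is "blind" if the transaction writes an item it has not read before.
    seen_reads = set()                 # (txn, item) pairs already read
    blind = []
    for txn, op, item in schedule:
        if op == 'R':
            seen_reads.add((txn, item))
        elif op == 'W' and (txn, item) not in seen_reads:
            blind.append((txn, item))
    return blind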
Method-03:
 In this method, try finding a view equivalent
serial schedule.
 By using the above three conditions, write all the
dependencies.
 Then, draw a graph using those dependencies.
 If there exists no cycle in the graph, then the
schedule is view serializable otherwise not.
 Check whether the given schedule S is view
serializable or not
Step-01:
 List all the conflicting operations and
determine the dependency between the
transactions-
 W1(B) , W2(B) (T1 → T2)
 W1(B) , W3(B) (T1 → T3)
 W1(B) , W4(B) (T1 → T4)
 W2(B) , W3(B) (T2 → T3)
 W2(B) , W4(B) (T2 → T4)
 W3(B) , W4(B) (T3 → T4)
 Step-02: Draw the precedence graph-
 Clearly, there exists no cycle in the precedence
graph.
 Therefore, the given schedule S is conflict
serializable.
 Thus, we conclude that the given schedule is
also view serializable.
Check whether the given schedule S is
view serializable or not
Step-01: List all the conflicting operations and
determine the dependency between the
transactions-
 R1(A) , W3(A) (T1 → T3)
 R2(A) , W3(A) (T2 → T3)
 R2(A) , W1(A) (T2 → T1)
 W3(A) , W1(A) (T3 → T1)
Step-02:Draw the precedence graph-
•Since the given schedule S is not conflict serializable,
it may or may not be view serializable.
•To check whether S is view serializable or not, let us
use another method.
•Let us check for blind writes.
Checking for Blind Writes
 There exists a blind write W3 (A) in the given
schedule S.
 Therefore, the given schedule S may or may not
be view serializable.
Now,
 To check whether S is view serializable or not,
let us use another method.
 Let us derive the dependencies and then draw a
dependency graph.
Drawing a Dependency Graph-
 T1 firstly reads A and T3 firstly updates A.
 So, T1 must execute before T3.
 Thus, we get the dependency T1 → T3.
 Final update on A is made by the transaction T1.
 So, T1 must execute after all other transactions.
 Thus, we get the dependency (T2, T3) → T1.
 There exists no write-read sequence.
 Now, let us draw a dependency graph using these dependencies-
•Clearly, there exists a cycle in the
dependency graph.
•Thus, we conclude that the given schedule S
is not view serializable.

Check view serializability
Transaction Isolation and Atomicity
• What if a transaction fails during concurrent execution?
• If a transaction Ti fails, we need to undo the effect of
this transaction to ensure the atomicity of the
transaction.
• In a system that allows concurrent execution, the
atomicity property requires that any transaction Tj that
is dependent on Ti is also aborted.
• To achieve this we need to place restrictions on the type
of schedules permitted in the system.
Non-Serializable Schedules
 A non-serial schedule which is not serializable is
called as a non-serializable schedule.
 A non-serializable schedule is not guaranteed to
produce the same effect as produced by some serial
schedule on any consistent database.
Characteristics-
Non-serializable schedules
 may or may not be consistent
 may or may not be recoverable
Irrecoverable Schedules

If in a schedule,
 A transaction performs a dirty read operation
from an uncommitted transaction and commits
before the transaction from which it has read
the value, then such a schedule is known as
an Irrecoverable Schedule.
Consider the following schedule

Here,
 T2 performs a dirty read
operation.
 T2 commits before T1.
 T1 fails later and rolls
back.
 The value that T2 read
now stands to be
incorrect.
 T2 can not recover since
it has already committed.
Recoverable Schedules-
If in a schedule,
 A transaction performs a dirty read operation from
an uncommitted transaction and its commit
operation is delayed till the uncommitted
transaction either commits or rolls back, then such
a schedule is known as a Recoverable Schedule.

Here,
 The commit operation of the transaction that
performs the dirty read is delayed.
 This ensures that it still has a chance to recover if
the uncommitted transaction fails later.
Consider the following schedule

Here,
 T2 performs a dirty read
operation.
 The commit operation of
T2 is delayed till T1
commits or rolls back.
 T1 commits later.
 T2 is now allowed to
commit.
 If T1 had failed, T2
would have had a chance
to recover by rolling
back.
Checking Whether a Schedule is Recoverable
or Irrecoverable

Method-01:
 Check whether the given schedule is conflict
serializable or not.
 If the given schedule is conflict serializable,
then it is surely recoverable. Stop and report
your answer.
 If the given schedule is not conflict
serializable, then it may or may not be
recoverable. Go and check using other
methods.
Method-02:
 Check if there exists any dirty read operation.
 (Reading from an uncommitted transaction is
called as a dirty read)
 If there does not exist any dirty read operation,
then the schedule is surely recoverable. Stop and
report your answer.
 If there exists any dirty read operation, then the
schedule may or may not be recoverable.
If there exists a dirty read operation, then follow the following cases-
Case-01:
 If the commit operation of the transaction performing the dirty read
occurs before the commit or abort operation of the transaction
which updated the value, then the schedule is irrecoverable.
Case-02:
 If the commit operation of the transaction performing the dirty read
is delayed till the commit or abort operation of the transaction
which updated the value, then the schedule is recoverable.
 No dirty read means a recoverable schedule.
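Both cases can be checked with one pass over the schedule once the commit/abort positions are known. A rough sketch (the tuple representation and function name are assumptions, not from the slides), followed by a schedule matching the earlier irrecoverable example:

def is_recoverable(schedule):
    # schedule: list of (txn, op, item) tuples; op in {'R', 'W', 'C', 'A'},
    # item is None for commit ('C') and abort ('A').
    finish = {}                                    # txn -> (position, 'C' or 'A')
    for pos, (txn, op, _) in enumerate(schedule):
        if op in ('C', 'A'):
            finish[txn] = (pos, op)

    last_write = {}                                # item -> (writer, write position)
    for pos, (txn, op, item) in enumerate(schedule):
        if op == 'W':
            last_write[item] = (txn, pos)
        elif op == 'R' and item in last_write:
            writer, _ = last_write[item]
            if writer == txn:
                continue
            w_finish = finish.get(writer)
            dirty = w_finish is None or w_finish[0] > pos   # writer still active at read time
            r_finish = finish.get(txn)
            if dirty and r_finish and r_finish[1] == 'C':
                # Irrecoverable: the reader commits before the writer commits or aborts.
                if w_finish is None or r_finish[0] < w_finish[0]:
                    return False
    return True

S = [('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'C', None), ('T1', 'A', None)]
print(is_recoverable(S))   # False: T2 dirty-read A from T1 and committed before T1 finished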
Even if a schedule is recoverable, to recover correctly from
the failure of a transaction Ti, we may have to roll back
several transactions.
A recoverable schedule may be any one of these kinds:
1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule
Cascading Schedule-
 If in a schedule, failure of one transaction causes several
other dependent transactions to rollback or abort, then such
a schedule is called as a Cascading Schedule or
Cascading Rollback or Cascading Abort.
 It simply leads to the wastage of CPU time.
Here,
1.Transaction T2
depends on
transaction T1.
2. Transaction
T3 depends on
transaction T2.
3.Transaction T4
depends on
transaction T3.
In this schedule,
 The failure of transaction T1 causes the transaction T2
to rollback.
 The rollback of transaction T2 causes the transaction
T3 to rollback.
 The rollback of transaction T3 causes the transaction
T4 to rollback.
 Such a rollback is called as a Cascading Rollback.
NOTE-
 If the transactions T2, T3 and T4 had committed
before the failure of transaction T1, then
the schedule would have been irrecoverable.
Cascadeless Schedule-
 If in a schedule, a transaction is not allowed to
read a data item until the last transaction that
has written it is committed or aborted, then
such a schedule is called as a Cascadeless
Schedule.
In other words,
 Cascadeless schedule allows only committed
read operations.
 Therefore, it avoids cascading roll back and
thus saves CPU time.
NOTE-
•Cascadeless schedule allows only
committed read operations.
•However, it allows uncommitted write
operations.
Strict Schedule-
 If in a schedule, a transaction is neither allowed
to read nor write a data item until the last
transaction that has written it is committed or
aborted, then such a schedule is called as a
Strict Schedule.
In other words,
 Strict schedule allows only committed read and
write operations.
 Clearly, strict schedule implements more
restrictions than cascadeless schedule.
Remember-
 Strict schedules are
more strict than
cascadeless schedules.
 All strict schedules are
cascadeless schedules.
 Not all cascadeless
schedules are strict
schedules.
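Cascadeless and strict schedules can be tested with the same kind of single scan; the only difference is which operations are forbidden while the last writer of an item is still active. A minimal sketch under the same assumed tuple representation:

def is_cascadeless(schedule):
    # No transaction reads an item until its last writer has committed or aborted.
    return _no_dirty_access(schedule, blocked_ops=('R',))

def is_strict(schedule):
    # No transaction reads or overwrites an item until its last writer has
    # committed or aborted.
    return _no_dirty_access(schedule, blocked_ops=('R', 'W'))

def _no_dirty_access(schedule, blocked_ops):
    active_writer = {}                 # item -> transaction with an uncommitted write on it
    for txn, op, item in schedule:
        if op in ('C', 'A'):
            # The transaction finishes; its writes stop being "dirty".
            for it in [k for k, v in active_writer.items() if v == txn]:
                del active_writer[it]
        elif op in blocked_ops and item in active_writer and active_writer[item] != txn:
            return False
        if op == 'W':
            active_writer[item] = txn
    return True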
Transaction Isolation Levels in DBMS
 Isolation determines how the effects of a transaction
are visible to other users and systems. Ideally, a
transaction should take place in a system in such a way
that it appears to be the only transaction accessing the
resources in the database system.
 Isolation levels define the degree to which a
transaction must be isolated from the data
modifications made by any other transaction in
the database system.
Transaction isolation levels are defined by the following
phenomena:
 Dirty Read – A dirty read is the situation when a
transaction reads data that has not yet been committed.
For example, say transaction 1 updates a row and
leaves it uncommitted; meanwhile, transaction 2 reads the
updated row. If transaction 1 rolls back the change,
transaction 2 will have read data that is considered never
to have existed.
 Non-Repeatable Read – A non-repeatable read occurs
when a transaction reads the same row twice and gets a
different value each time. For example, suppose
transaction T1 reads a data item. Due to concurrency, another
transaction T2 updates the same data item and commits. Now if
transaction T1 rereads the same data item, it will retrieve a
different value.
Based on these phenomena, The SQL standard
defines four isolation levels :
1. Read Uncommitted – Read Uncommitted is the
lowest isolation level. In this level, one
transaction may read not yet committed changes
made by other transaction, thereby allowing
dirty reads. In this level, transactions are not
isolated from each other.
2. Read Committed – This isolation level allows only
committed data to be read, but doesn't require
repeatable reads. Thus it does not allow dirty
reads. For instance, between two reads of a data
item by a transaction, another transaction may
have updated the data item and committed.
3. Repeatable Read – This is a more restrictive
isolation level. It allows only committed data to
be read and further requires that, between two
reads of a data item by a transaction, no other
transaction is allowed to update it.
4. Serializable – This is the highest isolation level.
It guarantees serializable execution, which is
defined to be an execution of operations in which
concurrently executing transactions appear to
be serially executing.
 All the isolation levels above additionally
disallow dirty writes.
 Many database systems run, by default, at
the read-committed isolation level.
 In SQL, it is possible to set the isolation level
explicitly.
Example: set transaction isolation level
serializable