
Unit 5

Recovery and Indexing


Updating the database
• There are two ways of updating the database:
• Deferred database update: the database is updated only after the
transaction commits.
• Immediate database update: the database is updated immediately after
each instruction executes, without waiting for the commit.
Updating the database
Implementation of shadow paging
Shadow Paging: In this recovery technique, the database is considered to be
made up of fixed-size logical units of storage called pages. Pages are mapped
to physical blocks of storage with the help of a page table, which has one
entry for each logical page of the database.

This method uses two page tables: the current page table and the shadow
page table. The entries in the current page table point to the most recent
database pages on disk.

The shadow page table is created when the transaction starts, by copying the
current page table. The shadow page table is then saved on disk, and the
current page table is used for the transaction.
Implementation of shadow paging
Implementation of Shadow paging
• The main purpose of this technique is to maintain data consistency if a
failure occurs during a transaction. The technique is also known as
out-of-place updating, since updated pages are written to new locations.

• Recovery after a crash is inexpensive and fast, because no undo or redo of
the log is required.

• Shadow paging can cause data fragmentation, since updated pages end up
scattered across the disk.
Implementation of shadow paging
To commit a transaction:
• Ensure that all buffer pages in main memory that have been changed by
the transaction are output to disk.

• Output the current page table to disk.

• Output the disk address of the current page table to the fixed location in
stable storage that holds the address of the shadow page table. The
current page table thereby becomes the new shadow page table and the
transaction is committed. A minimal in-memory sketch of this scheme
follows.
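
Below is a minimal in-memory sketch of the shadow paging idea, in Python.
The class and names (ShadowPagingDB, the dictionary "disk") are assumptions
made purely for illustration; the point is the copy-at-start, update-out-of-place,
switch-on-commit pattern described above, not a real disk-based protocol.

import copy

class ShadowPagingDB:
    def __init__(self, num_pages):
        # Simulated disk: physical blocks addressed by the page tables.
        self.disk = {i: f"page-{i}-v0" for i in range(num_pages)}
        # Shadow page table: the last committed mapping (logical -> physical).
        self.shadow_page_table = {i: i for i in range(num_pages)}
        self.current_page_table = None
        self.next_free_block = num_pages

    def begin(self):
        # Copy the shadow page table to create the current page table.
        self.current_page_table = copy.deepcopy(self.shadow_page_table)

    def write(self, logical_page, data):
        # Out-of-place update: write the new version to a fresh block and
        # point only the *current* page table at it.
        block = self.next_free_block
        self.next_free_block += 1
        self.disk[block] = data
        self.current_page_table[logical_page] = block

    def commit(self):
        # Switch the root pointer: the current table becomes the shadow table.
        self.shadow_page_table = self.current_page_table
        self.current_page_table = None

    def abort_or_crash(self):
        # Recovery is trivial: discard the current table; the shadow table
        # still points at the consistent pre-transaction pages.
        self.current_page_table = None

db = ShadowPagingDB(num_pages=4)
db.begin()
db.write(2, "page-2-v1")
db.abort_or_crash()                          # the update to page 2 disappears
print(db.disk[db.shadow_page_table[2]])      # -> "page-2-v0"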
ARIES algorithm
The ARIES algorithm is also used for recovery, but it is more complex than
the techniques discussed earlier.

Data structures needed to implement the algorithm:

1. PageLSN – whenever an update operation is applied to a page, it stores
the LSN (log sequence number) of its log record in the page's PageLSN field.
2. PrevLSN – each log record also contains the LSN of the previous log
record of the same transaction.
3. Compensation log records (CLRs) – special redo-only log records written
when updates are undone.
ARIES algorithm
4. UndoNextLSN – a field in each CLR that records the LSN of the next log
record that needs to be undone for that transaction.

5. DirtyPageTable – contains the list of pages that have been updated in the
database buffer. It has an entry for each dirty page in the buffer, holding
the page ID and the RecLSN, the LSN of the earliest update to that page.

6. Transaction Table – contains an entry for each active transaction, with
information such as the transaction ID, the transaction status, and the LSN
of the most recent log record for the transaction (see the sketch below).
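
The following is a minimal Python sketch of this bookkeeping. The field names
follow the slides, but the exact record layout (and the use of dataclasses) is an
assumption made purely for illustration.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                       # log sequence number of this record
    txn_id: int
    prev_lsn: Optional[int]        # PrevLSN: previous record of the same txn
    page_id: Optional[int]
    kind: str                      # "update", "commit", "CLR", ...
    undo_next_lsn: Optional[int] = None   # only set on CLRs
    redo_info: Optional[str] = None
    undo_info: Optional[str] = None

@dataclass
class DirtyPageEntry:
    page_id: int
    rec_lsn: int                   # LSN of the earliest update that dirtied it

@dataclass
class TransactionEntry:
    txn_id: int
    status: str                    # e.g. "active", "committed", "aborting"
    last_lsn: int                  # LSN of the most recent record of the txn

dirty_page_table: dict[int, DirtyPageEntry] = {}
transaction_table: dict[int, TransactionEntry] = {}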
ARIES algorithm
Checkpoint log record – a checkpoint log record contains the DirtyPageTable
and a list of active transactions. For each transaction, the checkpoint log
record also notes LastLSN, the LSN of the last record written by the
transaction.

The recovery algorithm consists of three phases (passes) for system recovery
after a crash:
1. Analysis pass
2. Redo pass
3. Undo pass
ARIES algorithm
ARIES works with a steal, no-force buffer management approach:
• Steal: if a frame is dirty and chosen for replacement, the page it contains is
written to disk even if the modifying transaction is still active.
• No-force: pages in the buffer pool that are modified by a transaction are not
forced to disk when the transaction commits.
Steps involved in ARIES algorithm:
• Analysis
• The analysis pass identifies the dirty (updated) pages in the buffer and the set of transactions that were
active at the time of the crash. It also determines the appropriate point in the log (the RedoLSN) where the
REDO pass should start.

• Redo
• The REDO pass reapplies updates from the log to the database. Starting from the RedoLSN determined
during analysis, REDO operations are applied until the end of the log is reached. Information stored by
ARIES in the log and in the data pages (the PageLSN) allows ARIES to determine whether an operation to be
redone has already been applied to the database and hence need not be reapplied. Thus only the necessary
REDO operations are applied during recovery.

• Undo
• During the UNDO pass, the log is scanned backwards and the operations of transactions that were active at
the time of the crash are undone in reverse order. The information needed for ARIES to accomplish its
recovery procedure includes the log, the Transaction Table, and the Dirty Page Table. In addition,
checkpointing is used: these two tables are maintained by the transaction manager and written to the log
during checkpointing. A simplified sketch of the three passes follows.
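
The sketch below is a highly simplified illustration of the three passes, assuming
the log is an in-memory list of dict records such as
{"lsn": 1, "txn": 1, "page": 7, "kind": "update", "redo": "new", "undo": "old"}.
Real ARIES also starts from the last checkpoint, writes CLRs during undo, and
compares PageLSNs against pages on disk; those details are elided or approximated.

def analysis(log):
    dirty_page_table, transaction_table = {}, {}
    for rec in log:                                    # forward scan
        if rec["kind"] == "update":
            transaction_table[rec["txn"]] = rec["lsn"]         # LastLSN
            dirty_page_table.setdefault(rec["page"], rec["lsn"])  # RecLSN
        elif rec["kind"] == "commit":
            transaction_table.pop(rec["txn"], None)
    redo_lsn = min(dirty_page_table.values(), default=None)
    return dirty_page_table, transaction_table, redo_lsn

def redo(log, pages, redo_lsn):
    if redo_lsn is None:
        return
    for rec in log:                                    # forward scan from RedoLSN
        if rec["kind"] == "update" and rec["lsn"] >= redo_lsn:
            page = pages.setdefault(rec["page"], {"value": None, "page_lsn": -1})
            if page["page_lsn"] < rec["lsn"]:          # not yet applied: reapply
                page["value"], page["page_lsn"] = rec["redo"], rec["lsn"]

def undo(log, pages, transaction_table):
    losers = set(transaction_table)                    # txns active at the crash
    for rec in reversed(log):                          # backward scan
        if rec["kind"] == "update" and rec["txn"] in losers:
            pages[rec["page"]]["value"] = rec["undo"]

log = [
    {"lsn": 1, "txn": 1, "page": 7, "kind": "update", "redo": "A'", "undo": "A"},
    {"lsn": 2, "txn": 2, "page": 9, "kind": "update", "redo": "B'", "undo": "B"},
    {"lsn": 3, "txn": 1, "page": 7, "kind": "commit"},
]
pages = {}
dpt, tt, redo_lsn = analysis(log)
redo(log, pages, redo_lsn)
undo(log, pages, tt)
print(pages[9]["value"])                               # -> "B": T2 was rolled back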
File organization
Storing records in a file in a certain order is called file organization. The main
objectives of file organization are:
i. Records should be accessed as fast as possible.
ii. Any insert, update or delete operation on records should be easy and
quick, and should not harm other records.
iii. No duplicate records should be introduced as a result of insert, update
or delete operations.
iv. Records should be stored efficiently so that the cost of storage is
minimal.
Types of file organizations:
• 1. Sequential file organization
• This technique stores the data elements in sequence, organized one after
another.

• This file organization supports the usual data operations such as insert,
update, delete, and retrieve.

• The file stores a unique data attribute (key) for identification, which helps to
place each data element in the sequence. It is a logical sequencing in computer
memory of the data elements for the database management system.
Sequential file organizations
• 1. Pile file method
• This is a basic method of sequential file organization in which data
elements are stored one after another, in the order in which they are inserted.
• When a new record is inserted, it is placed at the end of the file, i.e. after
the last inserted data element or record.

• For an update or delete operation, the particular data element is searched
sequentially through the memory blocks, and once it is found, the update or
delete operation is applied to that element.
Sequential file organization
Illustration of insertion in pile file method using an example:
Sequential file organization
2. Sorted file method
• In this method, the data elements are arranged and stored in sorted
order, either ascending or descending, based on the primary key or any
other key reference.

• In the sorted file method, a new data element or record is first inserted
at the end of the file. After the insert step, the file is re-sorted into
ascending or descending order based on the key.
Sequential file organizations
2. Sorted file method
• For an update, the data element is searched for and modified according to
the condition. After the update completes, the sorting process rearranges
the data elements, so that the updated element is placed at its correct
position in the sequential file structure.

• For a delete operation, the data item is searched for in the sorted sequence
and marked as deleted once it is found. After the delete completes, the
remaining data elements are sorted and rearranged again in the original
ascending or descending order. A small sketch of these operations follows.
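
A minimal sketch of the pile and sorted file methods described above, using
in-memory Python lists in place of disk blocks (an illustration under those
assumptions, not an actual storage engine).

def pile_insert(records, record):
    records.append(record)                    # new record goes at the end

def sorted_insert(records, record, key="id"):
    records.append(record)                    # insert at the end ...
    records.sort(key=lambda r: r[key])        # ... then re-sort on the key

def sorted_delete(records, key_value, key="id"):
    for r in records:
        if r[key] == key_value:
            r["deleted"] = True               # mark as deleted
    records[:] = sorted((r for r in records if not r.get("deleted")),
                        key=lambda r: r[key]) # rearrange the remaining records

emp = [{"id": 1, "name": "Rhea"}, {"id": 3, "name": "Amit"}]
sorted_insert(emp, {"id": 2, "name": "Dev"})
sorted_delete(emp, 3)
print([r["id"] for r in emp])                 # -> [1, 2]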
Sequential file organization
Illustration of insertion in sorted file method using an example:
Heap File Organization
When a file is created using heap file organization, the operating system allocates
a memory area to that file without any further accounting details, and file records
can be placed anywhere in that area. Heap file organization works with data
blocks: records are inserted at the end of the file, into the data blocks, and no
sorting or ordering is required. If a data block is full, the new record is stored in
some other block; the other data block need not be the very next data block, but
can be any block in memory. It is the responsibility of the DBMS to store and
manage the new records.
Heap File Organization
• If we want to search, delete or update data in heap file organization, we have to
traverse the data from the beginning of the file until we find the requested record.
Thus, if the database is very large, searching, deleting or updating a record will
take a lot of time.

• Suppose we have five records in the heap, R1, R5, R6, R4 and R3, and a new
record R2 has to be inserted. Since the last data block, i.e. data block 3, is full,
R2 will be inserted into any of the data blocks selected by the DBMS, let us say
data block 1 (see the sketch below).
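
A minimal sketch of this heap file behaviour, assuming fixed-capacity blocks
modelled as Python lists (block size and record layout are assumptions).

BLOCK_CAPACITY = 2

def heap_insert(blocks, record):
    # Place the record in any block with free space; no ordering is kept.
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])                   # all blocks full: allocate a new one

def heap_search(blocks, key_value, key="id"):
    # Linear scan from the beginning of the file until the record is found.
    for block in blocks:
        for record in block:
            if record[key] == key_value:
                return record
    return None

blocks = [[{"id": 1}], [{"id": 5}, {"id": 6}], [{"id": 4}, {"id": 3}]]
heap_insert(blocks, {"id": 2})                # block 3 is full; lands in block 1
print(heap_search(blocks, 2))                 # -> {'id': 2}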
Hash file organization
• Hash file organization uses a hash function computed on some field(s) of
the records. The output of the hash function determines the location of the
disk block where the record is to be placed.
• In this method of file organization, the hash function is used to calculate
the address of the block in which a record is stored.
• The hash function can be any simple or complex mathematical function.

• Hence each record is stored at an apparently random location, irrespective of
the order in which records arrive; this method is therefore also known as
direct or random file organization, as illustrated below.
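
A minimal sketch of hash file organization: the block address is computed from
a key field of the record (the block count and choice of hash are assumptions).

NUM_BLOCKS = 8

def block_address(key):
    # Any simple or complex function works; here, modulo on the key's hash.
    return hash(key) % NUM_BLOCKS

def hashed_insert(blocks, record, key="id"):
    blocks[block_address(record[key])].append(record)

def hashed_search(blocks, key_value, key="id"):
    # Only the one block computed from the key needs to be examined.
    for record in blocks[block_address(key_value)]:
        if record[key] == key_value:
            return record
    return None

blocks = [[] for _ in range(NUM_BLOCKS)]
hashed_insert(blocks, {"id": 42, "name": "Asha"})
print(hashed_search(blocks, 42))              # -> {'id': 42, 'name': 'Asha'}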
Clustered file organization
• In this mechanism, related records from one or more relations are kept
in the same disk block; that is, the ordering of records is not based on a
primary key or search key.

• In this method, two or more tables that are frequently joined to produce
results are stored in the same file, called a cluster. These files keep two or
more tables in the same data block, and the key columns that map the
tables to each other are stored only once. This method therefore reduces
the cost of searching for related records in different files: all the records
are found in one place, making the search efficient.
RAID (Redundant Array of Independent Disks)
• Redundant Array of Independent Disks is a technology to connect
multiple secondary storage devices and use them as a single storage
medium.

• Key evaluation points for a RAID system:
• Reliability
• Performance
• Capacity
RAID levels
• RAID-0 (Striping): Blocks are "striped" across disks.
RAID levels
• RAID-1 (Mirroring)
• More than one copy of each block is stored, each copy on a separate disk.
Thus, every block has two (or more) copies, lying on different disks.
RAID levels
• In RAID-2, errors in the data are checked at the bit level, using the
Hamming code parity method to detect and correct errors.
• It uses designated drives to store the error-correcting code.
• The structure of RAID-2 is complex because it uses two groups of disks:
one group stores the bits of each data word, and the other group stores the
error-correction code for that word.
RAID levels
• RAID-3 (Byte-Level Striping with Dedicated Parity)
• It consists of byte-level striping with dedicated parity.
• At this level, parity information is computed for each stripe and written to
a dedicated parity drive.
RAID levels
RAID-4 (Block-Level Striping with Dedicated Parity)
• Instead of duplicating data, this level adopts a parity-based approach.

• In the figure, one column (disk) is dedicated to parity.

• Parity is calculated using a simple XOR function. If the data bits are
0, 0, 0, 1, the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0, 1, 1, 0,
the parity bit is XOR(0,1,1,0) = 0. In other words, an even number of ones
results in parity 0, and an odd number of ones results in parity 1.
RAID levels
• Assume that in the above figure, C3 is lost due to some disk failure.
Then, we can recompute the data bit stored in C3 by looking at the
values of all the other columns and the parity bit. This allows us to
recover lost data. 
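
A short sketch of this RAID-4/5 style parity in Python, assuming one "block"
per disk is a single integer and XOR is applied bitwise (purely illustrative).

from functools import reduce

def parity(blocks):
    # Parity block = XOR of all data blocks in the stripe.
    return reduce(lambda a, b: a ^ b, blocks, 0)

def recover(surviving_blocks, parity_block):
    # A lost block equals the XOR of the surviving blocks and the parity.
    return reduce(lambda a, b: a ^ b, surviving_blocks, parity_block)

stripe = [0b0110, 0b1010, 0b0001, 0b1111]     # data blocks C1..C4
p = parity(stripe)                            # stored on the parity disk

lost = stripe[2]                              # pretend C3's disk fails
rebuilt = recover(stripe[:2] + stripe[3:], p)
assert rebuilt == lost                        # recovered from parity + others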
RAID levels
• RAID-5 (Block-Level Stripping with Distributed Parity)
• This is a slight modification of the RAID-4 system where the only
difference is that the parity rotates among the drives. 
RAID levels
• RAID-6 is an extension of level 5. In this level, two independent
parities are generated and stored in a distributed fashion among multiple
disks. The two parities provide additional fault tolerance. This level
requires at least four disk drives to implement.
Indexing
• Indexing is a data structure technique used to efficiently retrieve records from the
database files based on the attributes on which the indexing has been done.
Indexing in database systems is similar to the index we see in books.

• Indexing is defined based on its indexing attributes. Indexing can be of the
following types −
• Primary index − a primary index is defined on an ordered data file. The data file is
ordered on a key field, generally the primary key of the relation.
• Secondary index − a secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or from a non-key field with
duplicate values.
• Clustering index − a clustering index is defined on an ordered data file. The data
file is ordered on a non-key field.
Indexing
• Dense index:
• In a dense index, there is an index record for every search key value in the database.
This makes searching faster but requires more space to store the index records
themselves. Each index record contains a search key value and a pointer to the actual
record on disk.
Indexing
• Sparse index:
• In a sparse index, index records are not created for every search key. An index record here contains
a search key and a pointer to the data on disk. To search for a record, we first use the index record to
reach an actual location in the data. If the data we are looking for is not at the location we reach by
following the index, the system starts a sequential search from there until the desired data is found.
A small sketch contrasting dense and sparse lookups follows.
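
The sketch below contrasts dense and sparse index lookups over a sorted file,
modelled as a Python list of (key, record) pairs; the block size and index
layout are assumptions made for illustration.

from bisect import bisect_right

data = [(k, f"record-{k}") for k in range(1, 21)]     # sorted data file
BLOCK_SIZE = 4                                        # records per block

# Dense index: one entry per search key -> position of the record.
dense_index = {key: pos for pos, (key, _) in enumerate(data)}

# Sparse index: one entry per block -> (first key in block, block start).
sparse_index = [(data[i][0], i) for i in range(0, len(data), BLOCK_SIZE)]

def dense_lookup(key):
    pos = dense_index.get(key)
    return data[pos][1] if pos is not None else None

def sparse_lookup(key):
    # Find the last index entry whose first key <= search key ...
    i = bisect_right([k for k, _ in sparse_index], key) - 1
    if i < 0:
        return None
    # ... then scan sequentially from that block's start.
    for k, record in data[sparse_index[i][1]:]:
        if k == key:
            return record
        if k > key:
            break
    return None

print(dense_lookup(7), sparse_lookup(7))              # -> record-7 record-7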
Multilevel index
• Index records comprise search-key values and data pointers. The index is
stored on disk along with the actual database files. As the size of the database
grows, so does the size of the index. There is a strong need to keep the index
records in main memory so as to speed up search operations. If a single-level
index is used, a large index cannot be kept in memory, which leads to multiple
disk accesses.

• A multilevel index helps by breaking the index down into several smaller
indices, making the outermost level so small that it can be saved in a single
disk block, which can easily be accommodated anywhere in main memory.
Multilevel index
Indexing data structures
• ISAM (Indexed Sequential Access Method) is a file management system developed
at IBM that allows records to be accessed either sequentially (in the order they
were entered) or randomly (with an index).
• ISAM is a static index structure: it is effective when the file is not frequently
updated, and not suitable for files that grow and shrink frequently.
• In ISAM, leaf pages contain data entries and non-leaf pages contain index
entries of the form (search key value, page id).
One level ISAM Tree
Multi level ISAM Tree
Multi level ISAM Tree
• If we want to insert records into the database, overflow pages might have
to be created to hold the data. The ISAM tree remains static, so insertions
and deletions in the data file do not affect the index layers.

• If a new record is to be inserted and there is no space left on the
associated leaf page, we create an overflow page and chain it to that leaf
page, as in the sketch below.
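
A tiny sketch of ISAM leaf pages with overflow chains: the primary leaf pages
are fixed at build time, and later inserts that do not fit go into overflow
pages chained to the leaf (capacities and names here are assumptions).

LEAF_CAPACITY = 2

class LeafPage:
    def __init__(self, keys):
        self.keys = list(keys)            # entries placed when the tree is built
        self.overflow = []                # overflow pages created by later inserts

def isam_insert(leaf, key):
    if len(leaf.keys) < LEAF_CAPACITY:
        leaf.keys.append(key)
        return
    # No space on the leaf: append to the last overflow page, or start one.
    if leaf.overflow and len(leaf.overflow[-1]) < LEAF_CAPACITY:
        leaf.overflow[-1].append(key)
    else:
        leaf.overflow.append([key])       # the static tree layers are untouched

leaf = LeafPage([10, 15])                 # leaf built full
isam_insert(leaf, 12)                     # goes to an overflow page
print(leaf.keys, leaf.overflow)           # -> [10, 15] [[12]]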
Multi level ISAM structure with overflow
pages
Example of data record insertion
B+ Trees
• A B+ tree is a balanced tree in which every path from the root of the tree to a leaf is
of the same length, and each non-leaf node of the tree has between ⌈m/2⌉ and m
children, where m is fixed for a particular tree.

• Internal nodes contain only search keys (no data).

• All leaves are at the same depth.


B+ Trees : insertion
• Find the correct leaf for insertion.
• If the leaf node has enough space, insert the new element and we are done;
otherwise, split the leaf, move the middle element up into the parent, and
then insert the new element into the appropriate node.
• This splitting can propagate recursively up the tree, depending on the
situation; a sketch of a single leaf split is shown below.
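
A small sketch of the leaf-split step described above, using the bisect module
on a sorted Python list. The order m and the flat list representation are
assumptions; a full B+ tree would also link the leaves and split internal nodes
recursively.

from bisect import insort

ORDER = 4                                   # max keys a leaf can hold

def insert_into_leaf(leaf, key):
    # Insert key; if the leaf overflows, split it and return
    # (middle key to copy up, new right leaf), else return None.
    insort(leaf, key)                       # keep the leaf sorted
    if len(leaf) <= ORDER:
        return None                         # enough space: done
    mid = len(leaf) // 2
    right = leaf[mid:]                      # new sibling leaf
    del leaf[mid:]                          # old leaf keeps the left half
    return right[0], right                  # first key of right goes to the parent

leaf = [5, 12, 20, 31]
result = insert_into_leaf(leaf, 17)
print(leaf, result)                         # -> [5, 12] (17, [17, 20, 31])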
B+ Trees : Deletion
• Search for the node containing the key to be deleted.
• Before removing, make sure the node will still satisfy the minimum
occupancy requirement (about half full, on the order of n/2 entries);
otherwise, borrow from or merge with a sibling.
• If the key also appears in an index (internal) node, remove or replace it
there as well.
Introduction to NO SQL
• NoSQL is a type of database management system (DBMS) that is
designed to handle and store large volumes of unstructured and semi-
structured data.
• Unlike traditional relational databases that use tables with pre-defined
schemas to store data, NoSQL databases use flexible data models that
can adapt to changes in data structures and are capable of scaling
horizontally to handle growing amounts of data.
• The term NoSQL originally referred to "non-SQL" or "non-relational"
databases; it is now often interpreted as "not only SQL".
Types of NO SQL databases
• NoSQL databases are generally classified into four main categories; a short
sketch of the first two data models follows the list:
1. Document databases: These databases store data as semi-structured
documents, such as JSON or XML, and can be queried using document-
oriented query languages.
2. Key-value stores: These databases store data as key-value pairs, and are
optimized for simple and fast read/write operations.
3. Column-family stores: These databases store data as column families, which
are sets of columns that are treated as a single entity. They are optimized for
fast and efficient querying of large amounts of data.
4. Graph databases: These databases store data as nodes and edges, and are
designed to handle complex relationships between data.
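
A small illustration, not tied to any specific product, of how the same user
data might look in the first two data models; the keys, IDs and field names
are invented purely for the example.

import json

# Document model: the whole record is one semi-structured JSON document.
user_document = {
    "_id": "u42",
    "name": "Asha",
    "orders": [{"item": "book", "qty": 2}],   # nested data lives in the document
}
print(json.dumps(user_document))

# Key-value model: the application addresses opaque values by key.
key_value_store = {
    "user:u42:name": "Asha",
    "user:u42:orders": json.dumps([{"item": "book", "qty": 2}]),
}
print(key_value_store["user:u42:name"])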
Key features of NoSQL
1. Dynamic schema: NoSQL databases do not have a fixed schema and can
accommodate changing data structures without the need for migrations or
schema alterations.
2. Horizontal scalability: NoSQL databases are designed to scale out by
adding more nodes to a database cluster, making them well-suited for
handling large amounts of data and high levels of traffic.
3. Document-based: Some NoSQL databases, such as MongoDB, use a
document-based data model, where data is stored in a semi-structured
format such as JSON or BSON.
4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value
data model, where data is stored as a collection of key-value pairs.
Key features of NoSQL (continued)
5. Column-based: Some NoSQL databases, such as Cassandra, use a column-
based data model, where data is organized into columns instead of rows.
6. Distributed and highly available: NoSQL databases are often designed to
be highly available and to automatically handle node failures and data
replication across multiple nodes in a database cluster.
7. Flexibility: NoSQL databases allow developers to store and retrieve data in a
flexible and dynamic manner, with support for multiple data types and
changing data structures.

8. Performance: NoSQL databases are optimized for high performance and
can handle a high volume of reads and writes, making them suitable for big
data and real-time applications.
