Storage System - RAID Levels

Uploaded by Ankit Dahiya

Database Management System (BTIT502-18)

Basics: Storage and RAID Levels
In this topic…

• An overview of the various types of
  – storage devices that are used for accessing and storing data.

• Types of Data Storage
  – Primary Memory: RAM/ROM (BIOS/OS)
  – Secondary Storage
  – Tertiary/Auxiliary Storage

• These types differ from one another in speed and capacity.
Primary Memory (Storage)
• Main Memory: volatile; usually too small to hold the entire database.
• Cache: costliest and fastest; maintained by the computer hardware; holds the programs/data frequently used by the CPU.
• Within the CPU: registers.

Secondary (HD) Storage
• Allows the user to save and store data permanently.
• Online (within the system), non-volatile storage.
• Examples:
  – Flash Memory: plugged into USB slots; also used in server systems for caching frequently used data.
  – Magnetic Disk Storage (HD): stores data for a long time and ensures availability of the data; persistent storage.
Tertiary Storage
• External to the computer system.
• Slowest in speed.
• Can store a large amount of data.
• Offline storage.
• Used for data backup.
• Examples:
  – Optical Storage (CDs, DVDs)
  – Tape Storage (magnetic tapes)
  – Pen drives
Storage Hierarchy
• Moving up the hierarchy, speed and cost increase; moving down, capacity increases.
Redundant Array of Independent Disks (RAID)
• A technology that connects multiple secondary storage devices (disks) and uses them as a single storage medium.
• RAID consists of an array of disks in which multiple disks are connected together to achieve different goals.
RAID levels
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
• RAID 6
RAID 0
• A striped array of disks is implemented.
• The data is broken down into blocks (of multiple bytes) and the blocks are distributed among the disks.
• Each disk receives a block of data, so blocks can be written/read in parallel.
• This enhances the speed and performance of the storage device.
• There is no parity (data check) and no backup in Level 0.
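The striping idea can be sketched in a few lines of Python. This is a toy model, not a disk driver; the disk count and block size are illustrative:

```python
def stripe(data: bytes, num_disks: int, block_size: int = 4):
    """RAID 0: split data into fixed-size blocks and distribute them
    round-robin across the disks. No parity, no redundancy."""
    disks = [[] for _ in range(num_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for idx, block in enumerate(blocks):
        disks[idx % num_disks].append(block)  # block idx goes to disk idx mod n
    return disks

disks = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=4)
# Each disk holds one block, so all three can be read in parallel:
# disks == [[b"ABCD"], [b"EFGH"], [b"IJKL"]]
```

Because consecutive blocks land on different disks, a large sequential read touches all disks at once, which is where the speed gain comes from.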
RAID 1
• Uses mirroring techniques.
• When data is sent to a RAID controller, it sends a copy of data
to all the disks in the array.
• RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
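Mirroring can be sketched the same way. In this toy model the "disks" are in-memory lists and a failed disk is modelled as `None`:

```python
def mirror_write(disks, block):
    """RAID 1: the controller writes a copy of every block to all disks."""
    for disk in disks:
        if disk is not None:
            disk.append(block)

def mirror_read(disks, index):
    """Any surviving mirror can serve the read (100% redundancy)."""
    for disk in disks:
        if disk is not None:
            return disk[index]
    raise IOError("all mirrors failed")

disks = [[], []]
mirror_write(disks, b"DATA")
disks[0] = None                 # simulate failure of disk 0
mirror_read(disks, 0)           # the copy on disk 1 still serves the read
```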
RAID 2
• RAID 2 records ECC (error-correcting code) using Hamming distance for its data, striped across different disks.
• Like Level 0, each data bit of a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks.
• Provides backup as well as data error correction.
• Due to its complex structure and high cost, RAID 2 is not commercially available.
RAID 3
• RAID 3 stripes the data onto multiple disks. The parity bit (used to detect errors) generated for each data word is stored on a different disk.
• This technique makes it possible to recover from single disk failures.
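Parity in RAID 3-5 is commonly computed with bytewise XOR. The sketch below (toy byte strings, not real disk words) shows how the parity block is generated and how a lost block is rebuilt:

```python
from functools import reduce

def parity(blocks):
    """XOR the corresponding bytes of each block to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def recover(surviving, parity_block):
    """XOR of the parity with all surviving blocks rebuilds the lost block."""
    return parity(surviving + [parity_block])

d0, d1, d2 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
p = parity([d0, d1, d2])            # stored on the dedicated parity disk
assert recover([d0, d2], p) == d1   # disk 1 failed: rebuild its block
```

This works because XOR is its own inverse: XOR-ing the parity with every surviving block cancels them out, leaving exactly the missing block.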
RAID 4
• In this level, an entire block (multiple words) of data is written onto the data disks, and then the parity is generated and stored on a different disk.
• Level 3 uses byte-level striping, whereas Level 4 uses block-level striping. Both Level 3 and Level 4 require at least three disks to implement RAID.
RAID 5
• RAID 5 writes whole data blocks onto different disks,
• but the parity blocks generated for the data-block stripes are distributed among all the disks rather than stored on a separate dedicated disk.
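The rotating-parity placement can be sketched as a layout table. The specific rotation pattern below is an assumption for illustration; real controllers use several variants:

```python
def raid5_layout(num_stripes, num_disks):
    """Return layout[stripe][disk]: 'P' marks the parity block, integers
    number the data blocks. The parity position rotates across stripes,
    so no single disk becomes a parity bottleneck."""
    layout, block = [], 0
    for stripe in range(num_stripes):
        parity_disk = stripe % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(block)
                block += 1
        layout.append(row)
    return layout

# With 3 disks and 3 stripes, parity visits every disk exactly once:
# [['P', 0, 1], [2, 'P', 3], [4, 5, 'P']]
```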
RAID 6
• RAID 6 is an extension of Level 5.
• In this level, two independent parities are generated and stored in a distributed fashion among multiple disks.
• The two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
Indexing
• Indexing is a data structure technique used to locate and access the data in a database table (retrieve the data) quickly.
• It is used to optimize the performance of a database (minimum time / maximum speed)
  – by minimizing the number of disk accesses required when a query is processed.
• "A data structure technique to efficiently retrieve records from the database based on some attributes."
Index structure
• Search key: contains a copy of the primary key or a candidate key of the table. The key values are stored in sorted order so that the corresponding data can be accessed easily.
• Data reference: contains a set of pointers holding the address of the disk block where the value of the particular key can be found.

Example: each index entry is a pair (Search Key, Data Reference).

Types of Indexes
1. Primary index
2. Clustered Index
3. Secondary Index
Indexing Methods/Techniques
• Primary/Ordered Index
– Dense Index
– Sparse Index

• Multilevel Index
1. Primary/Ordered Index
• Based on an ordered data file, ordered on a key field.
• The key field is generally the primary key of the relation.
• The indices are usually sorted to make searching faster (ordered indices).
• Primary/ordered indexing is of two types:
  – Dense Index
  – Sparse Index
Dense Index
• Contains an index record for every search-key value in the data file.
• Makes searching faster.
• The number of records in the index table is the same as the number of records in the main table.
• Needs more space to store the index records themselves. Each index record holds the search key and a pointer to the actual record on the disk.
Sparse Index
• In a sparse index, index records are not created for every search key.
• An index record appears only for a few items in the data file; each such record points to a block.
• Instead of pointing to every record in the main table, the index points to records in the main table at intervals (gaps).
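The sparse-index lookup can be sketched with a toy sorted data file (the block size and keys are made up): the index holds one entry per block, and the search finishes with a scan inside the chosen block.

```python
import bisect

# Hypothetical sorted data file of (key, row) pairs, grouped into blocks of 3.
data_file = [(k, f"row-{k}") for k in range(10, 100, 10)]
BLOCK = 3
blocks = [data_file[i:i + BLOCK] for i in range(0, len(data_file), BLOCK)]

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [blk[0][0] for blk in blocks]     # [10, 40, 70]

def sparse_lookup(key):
    """Binary-search the index to pick the block, then scan the block."""
    b = bisect.bisect_right(sparse_index, key) - 1
    for k, row in blocks[b]:
        if k == key:
            return row
    return None

assert sparse_lookup(50) == "row-50"   # index picks block 1, scan finds 50
```

A dense index would instead keep one `(key, pointer)` entry per record: faster lookups, but an index as long as the table itself.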
2. Clustering Index
• A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
• In this case, to identify records faster, we group two or more columns to obtain a unique value and create an index out of them. This method is called a clustering index.
• Records which have similar characteristics are grouped together, and indexes are created for these groups.
3. Secondary Index
• A two-level index.
• As the size of the table grows, the size of the mapping also grows. These mappings are usually kept in primary memory so that address fetches are fast; the secondary memory then locates the actual data based on the address obtained from the mapping. If the mapping itself grows too large, fetching the address becomes slow, and a sparse index is no longer efficient. To overcome this problem, secondary indexing is introduced.
• In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. Initially, large ranges of the column are selected so that the first-level mapping stays small; each range is then further divided into smaller ranges. The first-level mapping is stored in primary memory, so address fetches are fast. The second-level mapping and the actual data are stored in secondary memory (hard disk).
B+ Tree
• A balanced search tree that follows a multi-level index format.
• The leaf nodes hold the actual data pointers.
• All leaf nodes remain at the same height (balanced tree).
• In a B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support sequential access as well as random access.
Example: B+ Tree
• Every leaf node is at an equal distance from the root node (balanced tree).
B+ Tree Insertion
• B+ trees are filled from the bottom, and each entry is made at a leaf node.
• If a leaf node overflows:
  – Split the leaf node into two parts.
  – Partition at i = ⌊(m+1)/2⌋.
  – The first i entries are stored in one node.
  – The rest of the entries (i+1 onwards) are moved to a new node.
  – The i-th key (the first key of the new node) is duplicated at the parent of the leaf.
• If a non-leaf node overflows:
  – Split the node into two parts.
  – Partition the node at i = ⌈(m+1)/2⌉.
  – Entries up to i are kept in one node.
  – The rest of the entries are moved to a new node.
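The leaf-split rule above can be sketched directly. This is a standalone helper, not a full B+ tree implementation; m is the order of the tree:

```python
from math import floor

def split_leaf(keys, m):
    """Split an overflowing leaf holding m+1 keys: keep the first
    i = floor((m+1)/2) keys, move the rest to a new leaf, and return
    the separator key (the new leaf's first key) to copy into the parent."""
    i = floor((m + 1) / 2)
    left, right = keys[:i], keys[i:]
    return left, right, right[0]

left, right, sep = split_leaf([50, 55, 60, 65, 70], m=4)
# left = [50, 55], right = [60, 65, 70], separator = 60
```

Note that the separator is *copied* up, not moved: 60 remains in the new leaf, which is what distinguishes a B+ tree leaf split from a B-tree split.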
Example: B+ Tree Insertion
• Suppose we want to insert a record with key 60 into the structure below.
• Here m = 4, so i = ⌊(4+1)/2⌋ = ⌊2.5⌋ = 2.
• Key 60 belongs in the 3rd leaf node, after 55, but that leaf node is already full, so we cannot insert 60 there.
• So we have to split the leaf node.
• After insertion, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its current separator in the parent is 50. We split the leaf node in the middle so that the tree's balance is not altered, grouping (50, 55) and (60, 65, 70) into two leaf nodes.
• If these two are to be leaf nodes, the intermediate node cannot branch only at 50. It should have 60 added to it, and then we can have a pointer to the new leaf node.
B+ Tree Deletion
• B+ tree entries are deleted at the leaf nodes.
• The target entry is searched for and deleted.
  – If it also appears in an internal node, delete it there and replace it with the entry from the left position.
• After deletion, the node is tested for underflow.
  – If underflow occurs, redistribute entries from the node to its left.
• If redistribution is not possible from the left, then
  – redistribute from the node to its right.
• If redistribution is not possible from either the left or the right, then
  – merge the node with its left or right sibling.
• Example: to delete 60 from the example above,
• we have to remove 60 from the intermediate node as well as from the 4th leaf node. If we removed it only from the intermediate node, the tree would no longer satisfy the rules of a B+ tree, so we modify the tree to keep it balanced.
Searching a Record in a B+ Tree
• To search for 55 in the B+ tree structure below:
• First fetch the intermediary node, which will direct us to the leaf node that can contain the record for 55.
• In the intermediary node we find the branch between 50 and 75. Following it, we are redirected to the third leaf node, where the DBMS performs a sequential search to find 55.
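The search steps above can be sketched with hardcoded nodes. The separator and leaf values here are illustrative, not the slide's exact figure:

```python
import bisect

separators = [50, 75]                          # keys in the intermediary node
leaves = [[30, 40], [50, 55, 65], [75, 80]]    # leaf i holds keys in its range

def bplus_search(key):
    """Pick the child by comparing against the separators, then do a
    sequential search inside the chosen leaf (as the DBMS does)."""
    child = bisect.bisect_right(separators, key)
    return key in leaves[child]

assert bplus_search(55) is True    # branch between 50 and 75, scan the leaf
assert bplus_search(60) is False
```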
Hashing
• Why is indexing inefficient for huge databases? Because searching through very large indexes is still sequential and slow.
• The hashing technique calculates the direct location of a data record on the disk without using an index structure.
• Data is stored in data blocks/buckets (of multiple bytes),
• and the address of a data block is generated by using a hashing function.
• The memory location where these records are stored is known as a data bucket or data block.
Hashing Function
• A hash function can be anything from a simple mathematical function to a complex one (e.g. mod, sin, cos).
• The hash function mostly uses the primary key to generate the address of the data block.
• The primary key itself can even be used as the address of the data block.
Hash Organization
• Bucket: a hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
• Hash Function: a hash function, h, is a mapping function that maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.
• Example: the data block addresses are the same as the primary key values (e.g. the record with key 98 is stored at address 98).
Other Hash Functions
• The hash function can also be a simple mathematical function such as exponential, mod, cos, sin, etc.
• Suppose we use a mod-5 hash function to determine the address of a data block.
• Applying mod 5 to the primary keys generates the addresses 3, 3, 1, 4 and 2 respectively, and the records are stored at those data block addresses.
• E.g. 98 mod 5 = 3 and 103 mod 5 = 3, so the records with key values 98 and 103 will both be stored at address 3 in memory, chained as a linked list.
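The mod-5 example can be run directly; colliding keys are chained in the same bucket (toy in-memory buckets; the key values are illustrative):

```python
def hash_address(key, n=5):
    """mod-n hash function: maps a primary key to a bucket address."""
    return key % n

table = {addr: [] for addr in range(5)}        # five data buckets
for key in [98, 103, 101, 104, 102]:
    table[hash_address(key)].append(key)       # collisions chain in a list

# 98 % 5 == 103 % 5 == 3, so both records share bucket 3:
assert table[3] == [98, 103]
```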
Types of Hashing
1. Static Hashing
• In static hashing, the resultant data bucket address is always the same. That means if we generate an address for EMP_ID = 103 using the hash function mod 5, it will always result in the same bucket address, 3. There is no change in the bucket address.
• Hence, in static hashing, the number of data buckets in memory remains constant throughout.
Disadvantage of Static Hashing
• If new record needs to be added and we want to generate an
address of the data bucket, but data already exists in that data
address. This situation is BUCKET OVERFLOW.
2. Dynamic Hashing
• The dynamic hashing method is used to overcome the problems of static hashing, such as bucket overflow.
• In this method, data buckets grow or shrink as the number of records increases or decreases. This method is also known as the extendible hashing method.
• It makes hashing dynamic, i.e., it allows insertion and deletion without resulting in poor performance.
To insert a new record using (dynamic) hashing
• First, follow the same procedure as for retrieval, ending up in some bucket.
• If there is still space in that bucket, place the record in it.
• If the bucket is full, split the bucket and redistribute the records.
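These steps can be sketched as a toy extendible hash table. One simplification is assumed: on overflow the whole directory is doubled and every key is rehashed, whereas real extendible hashing splits only the overflowing bucket:

```python
MAX_PER_BUCKET = 2

class ExtendibleHash:
    """Toy dynamic hashing: the directory is indexed by the last
    `global_depth` bits of the key; a full bucket triggers a split."""
    def __init__(self):
        self.global_depth = 1
        self.buckets = [[], []]                  # directory of size 2

    def _index(self, key):
        return key % (2 ** self.global_depth)    # last global_depth bits

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if len(bucket) < MAX_PER_BUCKET:
            bucket.append(key)                   # space left: just place it
            return
        # Bucket full: grow the directory and redistribute all records.
        self.global_depth += 1
        keys = [k for b in self.buckets for k in b] + [key]
        self.buckets = [[] for _ in range(2 ** self.global_depth)]
        for k in keys:
            self.buckets[self._index(k)].append(k)

h = ExtendibleHash()
for k in [4, 6, 1, 3, 5]:
    h.insert(k)     # inserting 5 overflows a bucket and doubles the directory
```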
Insert a Key into the Data Buckets
• Records are copied into the data buckets; the last bits of the hash address decide the bucket:

  Last bits   Bucket
  00          B0
  01          B1
  10          B2
  11          B3

• Insert key 9 with hash address 10001 into the above structure. The maximum number of records in each bucket is 2.
Insert key 9 with hash address 10001 into the previous structure
• Since key 9 has hash address 10001, it must go into bucket B1. But bucket B1 is full, so it will get split.
• After the split, the last 3 bits are taken to decide the bucket:

  Key   Hash Address   Rule (last 3 bits → bucket)
  1     11010          000 → B0
  2     00000          001 → B1
  3     11110          010 → B2
  4     00000          011 → B3
  5     01001          100 → B4
  6     10101          101 → B5
  7     10111          110 → B6
  9     10001          111 → B7
Advantages of Dynamic Hashing
• Performance does not decrease as the data in the system grows; the method simply increases the memory size to accommodate the data.
• Memory is well utilized, as it grows and shrinks with the data; no memory is left lying unused.
• Good for dynamic databases where the data grows and shrinks frequently.
Disadvantages of Dynamic Hashing
• In this method, if the data size increases, the number of buckets also increases, and with a huge increase in data, maintaining the bucket address table becomes tedious.
• Bucket overflow can still occur, although it takes longer to reach this situation than with static hashing.
• Thank You

• Any Queries?
