Storage System - RAID Levels
(BTIT502-18)
Basics-Storage
RAID levels
In this topic…
• Examples …
– Flash Memory: plugs into USB slots; used in server
systems for caching frequently accessed data.
– Magnetic Disk Storage (HDD): stores data for a long time, ensures
availability of the data; persistent storage.
[Storage hierarchy diagram: moving up from tertiary storage, speed and cost increase; moving down, capacity increases.]
Redundant Array of Independent Disks
(RAID)
• Technology to connect multiple secondary storage devices (disks)
and use them as a single storage medium.
• The multiple disks appear to the system as one logical drive.
RAID levels
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
• RAID 6
RAID 0
• A striped array of disks is implemented.
• The data is broken down into blocks (multiple bytes) and the
blocks are distributed among the disks.
• Each disk receives a block of data to write/read in parallel.
• This enhances the speed and performance of the storage device.
• There is no parity (data check) and no backup in Level 0 (a short striping sketch follows).
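• A minimal sketch of RAID 0 striping (Python; the disk count, block size and data are
illustrative assumptions, not a real driver):

    # RAID 0 sketch: blocks are distributed round-robin across the disks,
    # so reads/writes of a large file can proceed in parallel.
    NUM_DISKS = 4
    BLOCK_SIZE = 4  # bytes per block (tiny, for illustration)

    def stripe(data: bytes, num_disks: int = NUM_DISKS):
        """Split data into blocks and send block i to disk i % num_disks."""
        disks = [[] for _ in range(num_disks)]
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for i, block in enumerate(blocks):
            disks[i % num_disks].append(block)
        return disks

    for d, blocks in enumerate(stripe(b"ABCDEFGHIJKLMNOP")):
        print(f"Disk {d}: {blocks}")
    # No parity and no copy: losing any one disk loses part of every striped file.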
RAID 1
• Uses mirroring techniques.
• When data is sent to the RAID controller, it sends a copy of the data
to every disk in the array (a short sketch follows).
• RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.
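• A minimal sketch of RAID 1 mirroring (Python; the names and the simulated failure are
illustrative):

    # RAID 1 sketch: the controller writes every block to all disks,
    # so any surviving disk can serve a read (100% redundancy).
    NUM_DISKS = 2
    disks = [[] for _ in range(NUM_DISKS)]

    def mirrored_write(block):
        """Append the same block to every disk in the array."""
        for disk in disks:
            disk.append(block)

    def mirrored_read(index):
        """Read from the first disk that still holds the block."""
        for disk in disks:
            if index < len(disk):
                return disk[index]
        raise IOError("block lost on all mirrors")

    mirrored_write(b"block-0")
    disks[0] = []                    # simulate a failure of disk 0
    print(mirrored_read(0))          # still readable from the mirror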
RAID 2
• RAID 2 records ECC (error correction code) using Hamming distance for
its data, striped across different disks.
• Like level 0, each data bit of a word is recorded on a separate disk, and the
ECC codes of the data words are stored on a different set of disks.
• Provides backup as well as data error correction.
• Due to its complex structure and high cost, RAID 2 is not commercially
available.
RAID 3
• RAID 3 stripes the data onto multiple disks. The parity bit
(used to detect errors) generated for each data word is stored on a
dedicated parity disk.
• This technique makes it possible to recover from single-disk failures.
RAID 4
• In this level, an entire block (multiple words) of data is written onto the
data disks, and then the parity is generated and stored on a different
disk.
• Level 3 uses byte-level striping, whereas level 4 uses block-level
striping. Both level 3 and level 4 require at least three disks to
implement RAID.
RAID 5
• RAID 5 writes whole data blocks onto different disks,
• but the parity bits generated for each block stripe are
distributed among all the data disks rather than stored
on a dedicated parity disk (a short parity sketch follows).
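• A minimal sketch of the XOR parity behind RAID 3/4/5 (Python; the block contents are
illustrative). It shows that a single lost block can be rebuilt by XOR-ing the surviving
blocks with the parity block:

    def xor_blocks(blocks):
        """Byte-wise XOR of equally sized blocks."""
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                result[i] ^= b
        return bytes(result)

    stripe = [b"AAAA", b"BBBB", b"CCCC"]        # data blocks of one stripe
    parity = xor_blocks(stripe)                 # stored on another disk
    lost = stripe[1]                            # pretend the disk holding block 1 failed
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == lost                      # single-disk failure recovered
    print("recovered:", rebuilt)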
RAID 6
• RAID 6 is an extension of level 5.
• In this level, two independent parities are generated and
stored in a distributed fashion among multiple disks.
• Two parities provide additional fault tolerance. This
level requires at least four disk drives to implement RAID.
Indexing
• An index is a data structure used to locate and access (retrieve) data in
a database table quickly (a small sketch follows below).
• Search key - contains a copy of the primary key or a candidate key of the
table.
• The values of the search key are stored in sorted order so that the
corresponding data can be accessed easily.
• Data reference - contains a set of pointers holding the address of the disk
block where the value of the particular key can be found.
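• A minimal sketch of an index entry (Python; the keys and block names are illustrative):
each search-key value maps to a data reference, so a lookup avoids scanning the whole table.

    index = {
        101: "block-7",     # search key (e.g. primary key) -> disk block address
        102: "block-7",
        205: "block-12",
        310: "block-20",
    }

    def lookup(key):
        """Return the disk block that holds the record with this key."""
        return index.get(key, "not found")

    print(lookup(205))      # -> block-12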
Example
• Multilevel Index
1. Primary/Ordered Index
• Based on an ordered data file, ordered on a key field
• The key field is generally the primary key of the relation
• The indices are usually sorted to make searching faster (Ordered
Indices)
• As the size of the table grows, the size of the mapping also grows. These mappings are
usually kept in primary memory so that the address fetch is fast; the actual data is then
read from secondary memory using the address obtained from the mapping. If the mapping
itself grows too large, fetching the address becomes slow, and in that case a sparse index
is no longer efficient. To overcome this problem, secondary indexing is introduced.
• In secondary indexing, another level of indexing is introduced to reduce the size of the
mapping. Wide key ranges are chosen initially so that the first-level mapping stays small;
each range is then further divided into smaller ranges. The first-level mapping is stored in
primary memory, so the address fetch is fast. The second-level mapping and the actual data
are stored in secondary memory (the hard disk). A small two-level sketch follows.
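• A minimal sketch of the two-level idea described above (Python; the ranges, block names
and keys are illustrative): the small first level stays in primary memory, the second level
and the data live on disk.

    first_level = {             # key-range start -> second-level index block
        0:    "idx-block-A",
        1000: "idx-block-B",
    }
    second_level = {            # second-level index: key -> data block on disk
        "idx-block-A": {105: "data-block-3", 740: "data-block-9"},
        "idx-block-B": {1350: "data-block-17"},
    }

    def find_data_block(key):
        # pick the largest first-level range start that is <= key (in memory)
        start = max(s for s in first_level if s <= key)
        inner = second_level[first_level[start]]   # one disk read: second level
        return inner.get(key)                      # one more disk read: the data

    print(find_data_block(740))     # -> data-block-9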
Secondary Index
B+ Tree
• B+ trees are filled from bottom and each entry is done at the leaf node.
• If a leaf node overflows −
– Split leaf node into two parts.
– Partition at i = ⌊(m+1)/2⌋.
– First i entries are stored in one node.
– Rest of the entries (i+1 onwards) are moved to a new node.
– ith key is duplicated at the parent of the leaf.
• If a non-leaf node overflows −
– Split node into two parts.
– Partition the node at i = ⌈(m+1)/2⌉.
– Entries up to i are kept in one node.
– Rest of the entries are moved to a new node.
Example - B+ Tree Insertion
• Values = 4 (i.e. m = 4), so i = ⌊(4+1)/2⌋ = ⌊2.5⌋ = 2
• 60 will go to the 3rd leaf node, after 55, but that leaf node of this tree is already
full, so we cannot insert 60 there.
• So we have to split the leaf node.
• The 3rd leaf node now holds the values (50, 55, 60, 65, 70) and its current parent key
is 50. We split the leaf node in the middle so that the tree's balance
is not altered: group (50, 55) and (60, 65, 70) into 2 leaf nodes.
• If these two are to be leaf nodes, the intermediate node cannot branch
from 50 alone. It must have 60 added to it, and then we can have a pointer to the
new leaf node (a short split sketch follows).
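• A minimal sketch of that leaf split (Python; m = 4 and the keys are taken from the
example above):

    import math

    def split_leaf(keys, m):
        """Split an overflowing B+ tree leaf at i = floor((m+1)/2)."""
        i = math.floor((m + 1) / 2)
        left, right = keys[:i], keys[i:]
        copied_up = right[0]        # first key of the new leaf is duplicated in the parent
        return left, right, copied_up

    left, right, up = split_leaf([50, 55, 60, 65, 70], m=4)
    print(left, right, up)          # [50, 55] [60, 65, 70] 60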
B+ Tree Deletion
• B+ tree entries are deleted at the leaf nodes.
• The target entry is searched and deleted.
– If it is an internal node, delete and replace with the entry from the left
position.
• After deletion, underflow is tested:
– If underflow occurs, redistribute entries from the node to its left.
• If distribution from the left is not possible, then
– distribute from the node to its right.
• If distribution is possible from neither the left nor the right, then
– merge the node with the node to its left or right.
• To delete 60 from the above example:
• In this case, we have to remove 60 from the intermediate node as
well as from the 4th leaf node. If we simply remove it from the
intermediate node, the tree will no longer satisfy the rules of the B+
tree, so we need to modify it to keep the tree balanced.
Searching a record in B+ Tree
• To search for 55 in the B+ tree structure below:
• First fetch the intermediate node, which will direct us to the leaf node that can
contain the record for 55.
• In the intermediate node, we find the branch between the keys 50 and 75.
We are then redirected to the third leaf node, where the DBMS
performs a sequential search to find 55 (a short search sketch follows).
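• A minimal sketch of that search (Python; the node contents are illustrative and only
roughly match the figure):

    internal_keys = [25, 50, 75]                            # keys in the intermediate node
    leaves = [[10, 20], [25, 40], [50, 55, 65], [75, 80]]   # leaf nodes, left to right

    def search(key):
        # choose the child pointer: number of internal keys <= key
        child = sum(1 for k in internal_keys if k <= key)
        return key in leaves[child]        # sequential search inside the leaf

    print(search(55))    # True - found in the third leaf, between the 50 and 75 branches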
Hashing
• Indexing is inefficient for huge databases. Why? Searching the index is still sequential
and slow.
• The hashing technique is used to calculate the direct location of a data
record on the disk without using an index structure.
• Here, data is stored in data blocks / buckets (multiple bytes),
• and the address of a data block is generated by using a hashing
function.
• The memory location where these records are stored is known as a
data bucket or data block.
Hashing Function
• A hash function can be anything from a simple mathematical function to a
complex mathematical function (e.g. mod, sin, cos).
• The hash function mostly uses the primary key to generate the
address of the data block.
Other Hash Functions..
• The hash function can also be a simple mathematical function like exponential, mod, cos, sin,
etc.
• Suppose we use a mod(5) hash function to determine the address of the data block.
• In this case, it applies mod(5) to the primary keys and generates 3, 3, 1, 4 and
2 respectively, and the records are stored at those data block addresses.
• E.g. 98 mod 5 = 3 and 103 mod 5 = 3, so the records with key values 98 and 103 will both be
stored at address 3 in memory, chained as a linked list; other keys such as 104 and 106 map
to addresses 4 and 1 (a short sketch follows).
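• A minimal sketch of that mod(5) hashing (Python; keys 98, 103, 104 and 106 come from
the slide, while the fifth key 107 is an assumed example that maps to address 2). Colliding
keys are chained at the same bucket address:

    BUCKETS = 5
    table = {address: [] for address in range(BUCKETS)}   # address -> chain of keys

    def insert(key):
        address = key % BUCKETS          # the hash function: mod(5)
        table[address].append(key)       # collisions join the same chain (linked list)

    for key in (98, 103, 106, 104, 107):
        insert(key)

    print(table[3])     # [98, 103] - both records stored at address 3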
Types of Hashing
1. Static Hashing
• In static hashing, the resultant data bucket address always remains the same.
That means if we generate the address for EMP_ID = 103 using the hash
function mod(5), it will always result in the same bucket address, 3. There
is no change in the bucket address.
• Hence, in static hashing, the number of data buckets in memory remains
constant throughout.
Disadvantage of Static Hashing
• If a new record needs to be added and we generate the address of its
data bucket, but data already exists at that address (the bucket is full),
the situation is called BUCKET OVERFLOW.
2. Dynamic Hashing
• The dynamic hashing method is used to overcome the problems
of static hashing like bucket overflow.
Directory (last 2 bits of the hash address → bucket):
00 → B0
01 → B1
10 → B2
11 → B3
Insert key 9 with hash address 10001 into the above structure.
The maximum number of records in each bucket is 2.
Insert key 9 with hash address 10001 into the previous
structure
• Since key 9 has hash address 10001, its last two bits (01) send it to bucket B1.
But bucket B1 is already full, so it will get split.
• After the split, take the last 3 bits of the hash address to decide the bucket.
Key → Hash address:
1 → 11010
2 → 00000
3 → 11110
4 → 00000
5 → 01001
6 → 10101
7 → 10111
9 → 10001

Rule (last 3 bits → bucket):
000 → B0, 001 → B1, 010 → B2, 011 → B3,
100 → B4, 101 → B5, 110 → B6, 111 → B7
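• A minimal sketch of that bucket split (Python; the bucket capacity of 2 and the hash
addresses are taken from the table above): buckets are chosen by the last i bits of the
hash address, and i grows from 2 to 3 when a bucket overflows.

    CAPACITY = 2
    hashes = {                           # key -> hash address, from the table above
        1: "11010", 2: "00000", 3: "11110", 4: "00000",
        5: "01001", 6: "10101", 7: "10111", 9: "10001",
    }

    def distribute(bits):
        """Assign each key to the bucket named by the last `bits` bits of its hash."""
        buckets = {}
        for key, h in hashes.items():
            buckets.setdefault(h[-bits:], []).append(key)
        return buckets

    layout = distribute(2)                                  # 2-bit directory: B0..B3
    if any(len(keys) > CAPACITY for keys in layout.values()):
        layout = distribute(3)                              # overflow -> 3-bit directory
    for suffix in sorted(layout):
        print(suffix, layout[suffix])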
Advantages of dynamic hashing
• The performance does not decrease as the data grows in the
system. It simply increases the size of memory to accommodate the
data.
• Memory is well utilized as it grows and shrinks with the data. There
will not be any unused memory lying idle.
• Good for the dynamic database where data grows and shrinks
frequently.
Disadvantages of dynamic hashing
• In this method, if the data size increases then the bucket size also
increases. If there is a huge increase in data, maintaining the
bucket address table becomes tedious.
• In this case, a bucket overflow situation can also occur, but it
takes longer to reach this situation than with static hashing.
• Thank You
• Any Queries?