Spring 2022
Module 13
Redundant Arrays of Inexpensive Disks
(RAID)
John Zahorjan
Background
• We start with very static formatting
• Superblock/inode/free block map locations are absolute positions on disk
[Figure: a single disk, "Disk 1 (C drive)", holding one file system, with /etc and /etc/passwd at fixed locations]
Background (cont.)
• One file system = one disk was too rigid
• If the file system was corrupted, you lost everything
• You could have only one file system, and so only one set of parameters
• One block size, for instance
• Backups are often done per file system, so the backup schedule for the most
important and least important data would be the same
• This is a logical backup – a backup of the ADT that is the file system
• Disk backups (copying the disk at the block level) involved the entire disk
Disk Partitions
[Figure: two file systems, each with its own root (/); one holds /etc/passwd, the other /home/jz; each lives on a separate disk or partition]
/etc/fstab
# /etc/fstab
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
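For illustration, a single fstab entry might look like the line below (the UUID and field values here are made up, not from the lecture):

# <device>                                  <mount point>  <type>  <options>  <dump>  <pass>
UUID=3e6be9de-8139-11d1-9106-a43f08d823a6   /              ext4    defaults   1       1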
• Bytes being cheap means you can afford to waste some of them if it helps
you achieve other goals
[Figure: two file systems, each rooted at /; one holds /etc/passwd, the other /home/jz and bigdata.db]
• Can we do better?
Improving “Single Threaded” Performance
[Figure: bigdata.db striped across two disks, so both disks can transfer its blocks in parallel]
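• As a rough worked example (disk speeds assumed for illustration): if each
disk sustains about 150 MB/s and bigdata.db is striped across 4 disks, a large
sequential read of that one file can approach 4 × 150 MB/s = 600 MB/s, because
all four disks transfer in parallel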
Reliability through Redundancy
• The issue: disk failure
• not software failure (it’s not journaling/crash tolerance)
• not user error (it’s not backup)
• At the scales we’re currently considering (tens of disks), it’s typically enough
to be resilient to the failure of a single disk
• What are the chances that a second disk will fail before you’ve replaced the first
one?
• Er, it has happened to us!
• So:
• Obtain performance from striping
• Obtain reliability from redundancy
RAID
• Disks are cheap, so it’s easy to put lots of disks (10s, say) in one box
for increased storage, performance, and availability
• Option 1: hardware
• A hardware RAID controller sits in front of the disks and presents them to
the OS as a single (bigger, faster, more reliable) disk
• Option 2: software
• A low-level layer of the OS knows there are multiple disks, but presents
them to upper layers as a single block device
• That is, it does what the hw RAID controller does
Some RAID tradeoffs
• Granularity
• fine-grained: stripe each file over all disks
• high throughput for the file
• limits transfer to 1 file at a time
• coarse-grained: stripe each file over only a few disks
• limits throughput for 1 file
• allows concurrent access to multiple files
• Redundancy
• uniformly distribute redundancy information on disks
• avoids load-balancing problems
• concentrate redundancy information on a small number of disks
• partition the disks into data disks and redundancy disks
RAID Level 0: Non-Redundant Striping
• RAID Level 0 is a non-redundant disk array
• Files/blocks are striped across disks, no redundant info
• High (single-file) read throughput
• Best write throughput (no redundant info to write)
• Maximum use of disk capacity
• Any disk failure results in data loss
[Figure: blocks striped round-robin across the data disks]
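A minimal sketch of RAID 0's address math (my illustration, not from the slides), assuming fixed-size blocks striped round-robin:

def raid0_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, physical block on that disk),
    assuming round-robin striping: block 0 -> disk 0, block 1 -> disk 1, ..."""
    disk = logical_block % num_disks
    physical_block = logical_block // num_disks
    return disk, physical_block

# e.g., with 4 disks, logical block 10 lands on disk 2, physical block 2
assert raid0_map(10, 4) == (2, 2)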
RAID Level 1: Mirrored Disks
• Files are striped across half the disks, and mirrored to the other half
• 2x space expansion
• Reads: Read from either copy
• read time is fastest read among copies
• Writes: Write both copies
• write time is slowest write among copies
• On single drive failure, just use the surviving disk during repair
• If two disks fail, you rely on luck…
[Figure: two identical copies of each block, one on each half of the mirrored disks]
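A toy sketch of the mirrored-pair behavior described above (my illustration; in-memory lists stand in for disks):

class MirroredPair:
    """Toy RAID 1 pair: every write goes to both copies; reads may use either."""

    def __init__(self, num_blocks: int):
        self.copies = [[None] * num_blocks, [None] * num_blocks]

    def write(self, block: int, data) -> None:
        # The write completes only when BOTH copies are written,
        # so write time is the slower of the two disks.
        for copy in self.copies:
            copy[block] = data

    def read(self, block: int):
        # Either copy is valid; a real array would pick the less busy disk.
        return self.copies[0][block]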
Prelude to RAID Levels 2-5: A parity refresher
data bits: 1 0 1 1 0 1 1 0    parity bit: 1    (six 1's in total – even)
• To each byte, add a bit whose value is set so that the total number
of 1’s is even
• Can detect any odd number of bit errors
• If an odd number of bits have their values flipped, the overall parity won’t
be even, so you’ll know something is wrong
• Can correct a single error if the error is that a bit “goes missing”
• (next slide)
• More sophisticated schemes, called ECC (error correcting codes),
can correct multiple bit errors at the cost of requiring more “extra
bits”
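A small sketch of both uses of parity (my illustration; byte-granularity XOR, matching the refresher above):

from functools import reduce

def parity(values: list[int]) -> int:
    """XOR of all data values: makes the number of 1's in each bit
    position even across the data plus the parity."""
    return reduce(lambda a, b: a ^ b, values, 0)

def reconstruct(survivors: list[int], parity_value: int) -> int:
    """Recover one missing value: XOR the survivors with the parity."""
    return parity(survivors) ^ parity_value

data = [0b10110110, 0b01101001, 0b11110000]
p = parity(data)                                      # contents of the parity "disk"
assert reconstruct([data[0], data[2]], p) == data[1]  # pretend disk 1 failed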
RAID Levels 2, 3, and 4: Striping + Parity Disk
• RAID levels 2, 3, and 4 use parity or ECC disks
• e.g., each byte on the parity disk is a parity function of the corresponding
bytes on all the other disks
• the differences between the levels have to do with the kind of ECC used, and
whether it is applied at the bit, byte, or block level
• To recover from a single disk failure, read the remaining disks (including
the parity disk) to compute the missing data
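One standard detail worth making concrete (not spelled out on the slide): a small write doesn't have to reread the whole stripe, because XOR parity can be patched incrementally:

def updated_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """Small-write parity update: P_new = P_old xor D_old xor D_new.
    Only the target data block and the parity block are read and rewritten."""
    return old_parity ^ old_data ^ new_data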
RAID Level 5: Striping + Rotating Parity
• Like RAID 4, but the parity blocks rotate across all the disks, so no single
parity disk becomes a bottleneck
• File block numbers shown below; Pn is the parity block for stripe n

          Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
Stripe 0:    0       1       2       3      P0
Stripe 1:    5       6       7      P1       4
Stripe 2:   10      11      P2       8       9
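A sketch of the block placement implied by the layout above (assuming this "left-symmetric" scheme, where parity rotates one disk to the left each stripe):

def raid5_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, stripe number) for left-symmetric
    RAID 5: each stripe holds num_disks - 1 data blocks plus one parity block."""
    data_per_stripe = num_disks - 1
    stripe = logical_block // data_per_stripe
    offset = logical_block % data_per_stripe
    parity_disk = (num_disks - 1 - stripe) % num_disks
    # Data blocks start just after the parity disk and wrap around.
    return (parity_disk + 1 + offset) % num_disks, stripe

# Reproduces the table above (disk indices are 0-based):
assert raid5_map(4, 5) == (4, 1)    # block 4 on Disk 5, stripe 1
assert raid5_map(10, 5) == (0, 2)   # block 10 on Disk 1, stripe 2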
RAID Level 6
• Basically like RAID 5, but with a second, independent parity block per
stripe, so that it can survive two disk failures
• Useful for larger disk arrays, where multiple failures are more likely
RAID Summary
• Why use multiple disks (vs. one bigger disk)?
• Is there any realistic situation in which you might lose “too many” disks at
once?
• For example, all of them?