Physical Design
Database System Concepts - 7th Edition 12.2 ©Silberschatz, Korth and Sudarshan
Storage Hierarchy
Storage Hierarchy (Cont.)
Storage Interfaces
Magnetic Hard Disk Mechanism
Magnetic Disks (Cont.)
§ Disk controller – interfaces between the computer system and the disk
drive hardware.
• accepts high-level commands to read or write a sector
• initiates actions such as moving the disk arm to the right track and
actually reading or writing the data
• computes and attaches a checksum to each sector to verify that
data is read back correctly
§ If data is corrupted, then with very high probability the stored
checksum will not match the recomputed checksum
• ensures successful writing by reading back each sector after writing it
• performs remapping of bad sectors
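The checksum check described above can be illustrated with a minimal sketch. It assumes CRC-32 (via Python's `zlib.crc32`) as a stand-in for whatever code a real controller uses, and a hypothetical 512-byte sector; the function names are illustrative only.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size in bytes (illustrative)

def write_sector(data: bytes) -> tuple[bytes, int]:
    """Return the payload together with the checksum stored alongside it."""
    return data, zlib.crc32(data)

def read_sector(stored: tuple[bytes, int]) -> bytes:
    """Recompute the checksum on read and compare it with the stored one."""
    data, stored_checksum = stored
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: sector is corrupted")
    return data

# A clean sector reads back unchanged.
sector = write_sector(b"A" * SECTOR_SIZE)
clean = read_sector(sector)

# Flip one bit of the payload to simulate corruption on the medium;
# reading this sector raises IOError.
corrupted = (bytes([sector[0][0] ^ 0x01]) + sector[0][1:], sector[1])
```

Calling `read_sector(corrupted)` raises `IOError`, which is the point at which a controller would retry the read or report the sector as bad.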
Performance Measures of Disks
§ Access time – the time it takes from when a read or write request is
issued to when data transfer begins. Consists of:
• Seek time – time it takes to reposition the arm over the correct track.
§ Average seek time is about 1/2 the worst-case seek time.
• It would be 1/3 if all tracks had the same number of sectors
and we ignored the time to start and stop arm movement
§ 4 to 10 milliseconds on typical disks
• Rotational latency – time it takes for the sector to be accessed to
appear under the head.
§ A full rotation takes 4 to 11 milliseconds on typical disks (5400 to
15000 r.p.m.)
§ Average latency is 1/2 of the time for a full rotation.
• Overall latency (seek time plus rotational latency) is 5 to 20
milliseconds, depending on disk model
§ Data-transfer rate – the rate at which data can be retrieved from or stored
to the disk.
• 25 to 200 MB per second max rate, lower for inner tracks
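The latency figures above follow from simple arithmetic, sketched below. The function names are illustrative, and the model assumes that average access time is just average seek time plus half a rotation.

```python
def full_rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm

def average_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a rotation for the sector."""
    return full_rotation_ms(rpm) / 2

def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average seek time plus average rotational latency."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)

fast = average_access_time_ms(avg_seek_ms=4, rpm=15000)   # 4 + 2 ms
slow = average_access_time_ms(avg_seek_ms=10, rpm=5400)   # 10 + about 5.6 ms
```

At 15000 r.p.m. a rotation takes 4 ms, and at 5400 r.p.m. about 11.1 ms, matching the range quoted above.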
Performance Measures (Cont.)
Performance Measures (Cont.)
§ Mean time to failure (MTTF) – the average time the disk is expected to
run continuously without any failure.
• Typically 3 to 5 years
• Probability of failure of new disks is quite low, corresponding to a
“theoretical MTTF” of 500,000 to 1,200,000 hours for a new disk
§ E.g., an MTTF of 1,200,000 hours for a new disk means that, given
1000 relatively new disks, on average one will fail every 1200
hours
• MTTF decreases as disk ages
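The 1000-disk example above is a single division, shown here as a small sketch. The function name is illustrative, and the calculation assumes independent failures at a constant rate.

```python
def hours_between_failures(mttf_hours: float, num_disks: int) -> float:
    """Expected time between failures somewhere in a population of disks,
    assuming independent failures at a constant rate."""
    return mttf_hours / num_disks

interval = hours_between_failures(mttf_hours=1_200_000, num_disks=1000)
```

Here `interval` is 1200 hours, i.e. roughly one failure every 50 days across the whole population.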
Flash Storage
Flash Storage (Cont.)
§ Erase happens in units of an erase block
• Takes 2 to 5 milliseconds
• Erase block is typically 256 KB to 1 MB (128 to 256 pages)
§ Remapping of logical page addresses to physical page addresses avoids
waiting for erase
§ Flash translation table tracks mapping
• also stored in a label field of flash page
• remapping carried out by flash translation layer
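The remapping idea can be sketched as a toy flash translation layer. This is a simplified illustration under stated assumptions, not how any particular device implements it: the class and attribute names are hypothetical, pages are modelled as list slots, and erasing stale pages is left out.

```python
class FlashTranslationLayer:
    """Toy model: a logical page write goes to a fresh erased physical page."""

    def __init__(self, num_physical_pages: int):
        self.pages = [None] * num_physical_pages      # physical page contents
        self.mapping = {}                             # logical -> physical page
        self.free = list(range(num_physical_pages))   # erased pages ready to write
        self.invalid = set()                          # stale pages awaiting erase

    def write(self, logical: int, data: bytes) -> None:
        if not self.free:
            raise RuntimeError("no erased pages left; an erase is needed first")
        physical = self.free.pop(0)
        self.pages[physical] = data
        old = self.mapping.get(logical)
        if old is not None:
            self.invalid.add(old)     # old copy becomes stale, erased later
        self.mapping[logical] = physical

    def read(self, logical: int) -> bytes:
        return self.pages[self.mapping[logical]]

ftl = FlashTranslationLayer(num_physical_pages=4)
ftl.write(7, b"v1")
ftl.write(7, b"v2")   # rewritten without waiting for an erase
```

After the second write, logical page 7 maps to physical page 1 and physical page 0 is marked stale, so the update never had to wait for an erase.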
SSD Performance Metrics
RAID
Improvement of Reliability via Redundancy
§ Redundancy – store extra information that can be used to rebuild
information lost in a disk failure
§ E.g., Mirroring (or shadowing)
• Duplicate every disk. Logical disk consists of two physical disks.
• Every write is carried out on both disks
§ Reads can take place from either disk
• If one disk in a pair fails, data still available in the other
§ Data loss would occur only if a disk fails, and its mirror disk also
fails before the system is repaired
• Probability of combined event is very small
§ Except for dependent failure modes such as fire or building
collapse or electrical power surges
§ Mean time to data loss depends on mean time to failure,
and mean time to repair
• E.g., MTTF of 100,000 hours, mean time to repair of 10 hours gives
mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a
mirrored pair of disks (ignoring dependent failure modes)
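The figure in this example can be reproduced with the usual approximation for a mirrored pair, MTTF² / (2 × MTTR), which assumes independent failures. The sketch below uses illustrative function and constant names.

```python
HOURS_PER_YEAR = 24 * 365

def mean_time_to_data_loss_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for a mirrored pair,
    assuming the two disks fail independently."""
    return mttf_hours ** 2 / (2 * mttr_hours)

mttdl_hours = mean_time_to_data_loss_hours(mttf_hours=100_000, mttr_hours=10)
mttdl_years = mttdl_hours / HOURS_PER_YEAR
```

With these inputs `mttdl_hours` is 500 × 10^6 and `mttdl_years` is about 57,000, in line with the numbers above.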
Improvement in Performance via Parallelism
RAID Levels
RAID Levels (Cont.)
§ Parity blocks: Parity block j stores XOR of bits from block j of each disk
• When writing data to a block j, parity block j must also be computed
and written to disk
§ Can be done by using old parity block, old value of current block
and new value of current block (2 block reads + 2 block writes)
§ Or by recomputing the parity value using the new values of blocks
corresponding to the parity block
• More efficient for writing large amounts of data sequentially
• To recover data for a block, compute XOR of bits from all other
blocks in the set including the parity block
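The parity operations above can be sketched with byte-wise XOR. This is a minimal illustration that assumes three data blocks of two bytes each; the helper name `xor_blocks` is hypothetical.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Block j on each of three data disks, and the parity block for that set.
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(*data)

# Small write: new parity from old parity, old block value and new block value
# (2 block reads + 2 block writes).
new_block = b"\xaa\x55"
new_parity = xor_blocks(parity, data[1], new_block)
data[1] = new_block

# Alternative: recompute parity from the current values of all data blocks.
recomputed = xor_blocks(*data)

# Recovery: rebuild the block of a failed disk from the others plus parity.
recovered = xor_blocks(data[1], data[2], new_parity)
```

Both ways of updating parity give the same block, and `recovered` equals the lost block `data[0]`.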
RAID Levels (Cont.)
RAID Levels (Cont.)
RAID Levels (Cont.)
Choice of RAID Level
Choice of RAID Level (Cont.)
Hardware Issues
Hardware Issues (Cont.)
Optimization of Disk-Block Access
Optimization of Disk-Block Access
[Figure: block requests labeled R6, R3, R1, R5, R2, R4]
Magnetic Tapes
§ Hold large volumes of data and provide high transfer rates
• Few GB for DAT (Digital Audio Tape) format, 10-40 GB with DLT
(Digital Linear Tape) format, 100 GB+ with Ultrium format, and 330 GB
with Ampex helical scan format
• Transfer rates range from a few to tens of MB/s
§ Tapes are cheap, but cost of drives is very high
§ Very slow access time in comparison to magnetic and optical disks
• limited to sequential access.
• Some formats (Accelis) provide faster seek (10s of seconds) at cost of
lower capacity
§ Used mainly for backup, for storage of infrequently used information, and
as an off-line medium for transferring information from one system to
another.
§ Tape jukeboxes used for very large capacity storage
• Multiple petabytes (10^15 bytes)