Secondary Storage Devices: Magnetic Disks Optical Disks Floppy Disks Magnetic Tapes
Magnetic Disks
Optical Disks
Floppy Disks
Magnetic Tapes
Jatinder singh
Secondary Storage Devices
Two major types of secondary storage
devices:
1. Direct Access Storage Devices (DASDs)
• Magnetic Disks
Hard disks (high capacity, low cost, fast)
Floppy disks (low capacity, lower cost,
slow)
• Optical Disks
CD-ROM (compact disc, read-only
memory)
2. Serial Devices
• Magnetic tapes (very fast sequential access)
Storage and Files
Storage has major implications for DBMS design!
• READ: transfer data from disk to main memory (RAM).
• WRITE: transfer data from RAM to disk.
• Both operations are high-cost operations, relative to in-
memory operations, so DB must be planned carefully!
Why Not Store Everything in Main Memory?
• Costs too much: RAM costs about 100 times as much as
the same amount of disk space, so main memory is relatively small.
• Main memory is volatile.
• Typical storage hierarchy:
Main memory (RAM) (primary storage) for currently
used data.
Disk for the main database (secondary storage).
Tapes for archiving older versions of the data (tertiary
storage).
Storage Hierarchy
Secondary storage: magnetic disk/
optical devices/ tape systems
• typical capacity: several hundred GB for
fixed media; no figure is given for removable media
• cost per MB: $0.01 for fixed media,
more for removable
• typical access time: 8 ms to 12 ms for
fixed media, longer for removable
Magnetic Disks
Bits of data (0’s and 1’s) are stored on
circular magnetic platters called disks.
A disk rotates rapidly (& never stops).
A disk head reads and writes bits of
data as they pass under the head.
Often, several platters are organized
into a disk pack (or disk drive).
A Disk Drive
[Figure: a disk drive. Labels: surfaces, spindle, boom, read/write heads, tracks, sector. Caption: Surface of disk showing tracks and sectors.]
Organization of Disks
Disk contains concentric tracks.
Tracks are divided into sectors.
A sector is the smallest addressable
unit in a disk.
Components of a Disk
[Figure: disk drive components. Labels: spindle, tracks, sector, disk head, platters, arm assembly, arm movement.]
The platters spin (say, 90 rps).
The arm assembly is moved in or out to position
a head on a desired track.
Tracks under heads make a cylinder (imaginary!).
Only one head reads/writes at any one time.
Block size is a multiple of sector size (which is often fixed).
Disk Controller
A disk controller, typically embedded in the disk
drive, acts as an interface between the
CPU and the disk hardware.
Accessing Data
When a program reads a byte from the
disk, the operating system locates the
surface, track and sector containing that
byte, and reads the entire sector into a
special area in main memory called a
buffer.
The bottleneck of a disk access is moving
the read/write arm.
• So it makes sense to store a file in tracks that
are below/above each other on different
surfaces, rather than in several tracks on the
same surface.
Cylinders
A cylinder is the set of tracks at a
given radius of a disk pack.
• i.e. a cylinder is the set of tracks that
can be accessed without moving the
disk arm.
All the information on a cylinder can
be accessed without moving the
read/write arm.
Estimating Capacities
Exercise
Consider a block-addressable disk with the
following characteristics:
• Size of track 20,000 bytes.
• Nondata overhead per block = 300 bytes.
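• Record size = 100 bytes.
Q) How many records can be stored
per track if the blocking factor is 10 or
60?
a) Blocking factor 10: block size = 10*100 + 300 = 1300 bytes; floor(20000/1300) = 15 blocks per track; 15*10 = 150 records
b) Blocking factor 60: block size = 60*100 + 300 = 6300 bytes; floor(20000/6300) = 3 blocks per track; 3*60 = 180 records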
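The exercise arithmetic can be checked with a minimal sketch. The function name and parameter names are illustrative choices, not part of the original exercise; the only assumption is that a block that does not fit entirely on the track is not stored.

```python
def records_per_track(track_bytes, overhead_bytes, record_bytes, blocking_factor):
    """Whole records that fit on one track of a block-addressable disk."""
    # Each block holds `blocking_factor` records plus fixed non-data overhead.
    block_bytes = blocking_factor * record_bytes + overhead_bytes
    # Only whole blocks count; the leftover space on the track is wasted.
    blocks_per_track = track_bytes // block_bytes
    return blocks_per_track * blocking_factor


if __name__ == "__main__":
    for bf in (10, 60):
        print(bf, records_per_track(20_000, 300, 100, bf))
```

With the exercise values this gives 150 records for a blocking factor of 10 and 180 for a blocking factor of 60: the larger blocking factor spends less of the track on per-block overhead.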
The Cost of a Disk Access
The time to access a sector in a track on a surface is divided
into 3 components:

Time component                  Action
Seek time                       Time to move the read/write arm to the correct cylinder
Rotational delay (or latency)   Time it takes for the disk to rotate so that the desired sector is under the read/write head
Transfer time                   Time it takes to transfer the data once the read/write head is positioned over it
Seek time
Seek time is the time required to move
the arm to the correct cylinder.
It is usually the largest of the three cost components.
Typically:
• 5 ms (milliseconds) to move from one track to
the next (track-to-track)
• 50 ms maximum (from inside track to outside
track)
• 30 ms average (from one random track to
another random track)
Average Seek Time (s)-1
It is usually impossible to know exactly how many
tracks will be traversed in every seek, so
• we usually try to determine the average seek time
(s) required for a particular file operation.
Note on rotational latency:
• Min latency = 0
• Max latency = Time for one disk revolution
• Average latency (r) = (min + max) / 2
= max / 2
= time for ½ disk revolution
Typically 4 to 6 ms on average
Rotational Latency-2
Given the following:
• R = the rotational speed of the spindle
(in rotations per second)
• θ = the number of radians through
which the track must rotate
• then the rotational latency is:
Latency = (θ/2π)*(1000/R), in ms
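The latency formula is the fraction of a revolution the disk must turn (the angle divided by 2π radians) multiplied by the time for one revolution (1000/R ms). A minimal sketch follows; the function names are illustrative, and the 90 rotations per second figure is borrowed from the "say, 90rps" example earlier in these notes.

```python
import math


def rotational_latency_ms(theta_radians, rotations_per_second):
    """Time in ms for the disk to rotate through theta_radians."""
    fraction_of_revolution = theta_radians / (2 * math.pi)
    ms_per_revolution = 1000 / rotations_per_second
    return fraction_of_revolution * ms_per_revolution


def average_latency_ms(rotations_per_second):
    """Average latency: half a revolution (pi radians)."""
    return rotational_latency_ms(math.pi, rotations_per_second)


if __name__ == "__main__":
    # At 90 rotations per second, half a revolution takes about 5.56 ms.
    print(round(average_latency_ms(90), 2))
```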
Transfer Time-1
Transfer time is the time for the read/write head to pass
over a block.
The transfer time is given by the formula:
Transfer time = (number of sectors transferred / track capacity in number of sectors) x rotation time
Transfer Time-2
The transfer time depends only on the speed at which the
spindle rotates, and the number of sectors that must be
read.
Given:
• St = the total number of sectors per track
• R = the rotational speed of the spindle (in rotations per second)
the transfer time for n contiguous sectors on the same
track is:
Transfer Time = (n/St)*(1000/R), in ms
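The transfer-time formula can be written as a short function. This is a sketch only: the function name is illustrative, and the sample values (8 sectors, 64 sectors per track, 100 rotations per second) are hypothetical numbers chosen for easy arithmetic, not figures from these notes.

```python
def transfer_time_ms(n_sectors, sectors_per_track, rotations_per_second):
    """Time in ms to transfer n contiguous sectors on one track."""
    fraction_of_track = n_sectors / sectors_per_track   # n / St
    ms_per_revolution = 1000 / rotations_per_second     # 1000 / R
    return fraction_of_track * ms_per_revolution


if __name__ == "__main__":
    # 8 of 64 sectors is 1/8 of a track; one revolution takes 10 ms at 100 rps.
    print(transfer_time_ms(8, 64, 100))
```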
Fast Sequential Reading
We assume that blocks are arranged so that
there is no rotational delay in transferring
from one track to another within the same
cylinder. This is possible if consecutive track
beginnings are staggered (like the staggered starts
of races on circular tracks).
We also assume that consecutive blocks
are arranged so that when the next block is on
an adjacent cylinder, there is no rotational
delay after the arm is moved to the new cylinder.
Fast sequential reading: no rotational delay
after finding the first block.
Assuming Fast Reading,
Consequently
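Reading b blocks (s = average seek time, r = average rotational latency, btt = block transfer time):
i. Sequentially:
s + r + b * btt
s + r is insignificant for large files, where b is very large, leaving approximately:
b * btt
ii. Randomly:
b * (s + r + btt)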
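The gap between the two cases is easiest to see with numbers. The sketch below assumes s is the average seek time, r the average rotational latency, and btt the block transfer time, all in ms. The function names are illustrative. The value s = 30 ms is the average seek time quoted earlier in these notes; r = 5 ms and btt = 1 ms are hypothetical.

```python
def sequential_read_ms(b, s, r, btt):
    """One seek and one rotational delay, then b back-to-back block transfers."""
    return s + r + b * btt


def random_read_ms(b, s, r, btt):
    """Every block pays its own seek, rotational delay, and transfer."""
    return b * (s + r + btt)


if __name__ == "__main__":
    b, s, r, btt = 1000, 30, 5, 1
    print(sequential_read_ms(b, s, r, btt))  # 1035
    print(random_read_ms(b, s, r, btt))      # 36000
```

Under these assumed values, reading 1000 blocks randomly takes roughly 35 times as long as reading them sequentially, which is why file placement matters so much.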
Secondary Storage Devices:
Magnetic Tapes
Characteristics
No direct access, but very fast
sequential access.
Resistant to different environmental
conditions.
Easy to transport, store, cheaper than
disk.
In the past, tape was widely used to store
application data; nowadays, it is mostly
used for backups or archives.
MT Characteristics-2
A sequence of bits is stored on
magnetic tape.
For storage, the tape is wound on a
reel.
To access the data, the tape is
unwound from one reel to another.
As the tape passes the head, bits of
data are read from or written onto the
tape.
[Figure: tape passing from Reel 1 to Reel 2 across the read/write head.]
Tracks
Typically data on tape is stored in 9
separate bit streams, or tracks.
Each track is a sequence of bits.
Recording density = # of bits per
inch (bpi). Typically 800 or 1600 bpi.
30000 bpi on some recent devices.
MT recording in detail
[Figure: bits recorded in parallel tracks across a ½-inch tape. Each column across the tape holds 8 data bits (1 byte) plus a parity bit.]
Tape Organization
[Figure: layout of a 2400-foot tape. Labels: BOT marker, header block (describes data blocks), data blocks, logical record, interblock gap (for acceleration & deceleration of tape), EOT marker.]
Data Blocks and Records
Each data block is a sequence of
contiguous records.
A record is the unit of data that a user’s
program deals with.
The tape drive reads an entire block of
records at once.
Unlike a disk, a tape starts and stops.
When stopped, the read/write head is over
an interblock gap.
Secondary Storage Devices:
CD-ROM
Physical Organization of CD-ROM
Compact Disc, Read-Only Memory; write-once and
read/write (R/W) versions are also available.
Data is encoded and read optically with a
laser.
Can store roughly 600 MB of data or more.
Digital data is represented as a series of
Pits and Lands:
• Pit = a little depression, forming a lower level
in the track
• Land = the flat part between pits, or the
upper levels in the track
Organization of data
Reading a CD is done by shining a laser at the
disc and detecting changing reflection patterns.
• 1 = change in height (land to pit or pit to land)
• 0 = a “fixed” amount of time between 1’s
LAND PIT LAND PIT LAND
...------+ +-------------+ +---...
|_____| |_______|
..0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 ..
Note that with CLV, the linear speed of the spiral passing under
the R/W head remains constant.
CD-ROM
Note that since 0's are represented by the length of time
between transitions, the track must pass under the head at
constant linear velocity (CLV).
Addressing
1 second of play time is divided up into 75
sectors.
Each sector holds 2 KB.
60 min CD:
60 min * 60 sec/min * 75 sectors/sec =
270,000 sectors; 270,000 sectors * 2 KB = 540,000 KB ≈ 540 MB
A sector is addressed by:
Minute:Second:Sector
e.g. 16:22:34
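A Minute:Second:Sector address can be converted to a linear sector number with the figures above (75 sectors per second, 2 KB per sector). This is a minimal sketch: the function names are illustrative, and numbering sectors from 0 at address 0:00:00 is an assumption made here, not something these notes specify.

```python
SECTORS_PER_SECOND = 75
SECTOR_KB = 2


def sector_number(address):
    """Convert 'Minute:Second:Sector' to a linear sector index (0-based, assumed)."""
    minute, second, sector = (int(part) for part in address.split(":"))
    if not (0 <= second < 60 and 0 <= sector < SECTORS_PER_SECOND):
        raise ValueError(f"invalid address: {address}")
    return (minute * 60 + second) * SECTORS_PER_SECOND + sector


def capacity_kb(minutes):
    """Capacity in KB of a disc with the given play time in minutes."""
    return minutes * 60 * SECTORS_PER_SECOND * SECTOR_KB


if __name__ == "__main__":
    print(sector_number("16:22:34"))  # 73684
    print(capacity_kb(60))            # 540000
```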
File Structures for CD-ROM
Solution
• One approach is to place the entire directory structure in
one file, organized as a left-child right-sibling
structure, so that any file can be accessed.
File Structures for CD-ROM
The second approach is to create an index
to the file locations by hashing the full
path name of each file.
This method will not work for generic file
or directory searches.
A third method may combine the two methods
above: it keeps the advantage of the
Unix-like one-file-per-directory scheme while
also allowing indexes to be built for
the subdirectories.
File Structures for CD-ROM
A fourth method treats directories as files
as well and uses a special index that
organizes the directories and the files into
a hierarchy, where a simple parent index
indicates the relationship between all
entries.
Rec Number   File or dir name   Parent
0            Root
1            Subdir1            0
2            Subdir11           1
3            Subdir12           1
4            File11             1
5            File               0
6            Subdir2            0
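The parent index in the table can be walked in either direction: upward to rebuild a full path, or by scanning to list the children of a directory. The sketch below copies the table into a list whose position is the record number; the function names and the "/" path separator are illustrative choices, not part of the original scheme.

```python
# (name, parent record number); list position is the record number.
ENTRIES = [
    ("Root", None),
    ("Subdir1", 0),
    ("Subdir11", 1),
    ("Subdir12", 1),
    ("File11", 1),
    ("File", 0),
    ("Subdir2", 0),
]


def full_path(rec, entries=ENTRIES):
    """Follow parent links up to the root and join the names."""
    parts = []
    while rec is not None:
        name, parent = entries[rec]
        parts.append(name)
        rec = parent
    return "/".join(reversed(parts))


def children(rec, entries=ENTRIES):
    """Record numbers whose parent is `rec` (a linear scan of the index)."""
    return [i for i, (_, parent) in enumerate(entries) if parent == rec]


if __name__ == "__main__":
    print(full_path(4))   # Root/Subdir1/File11
    print(children(1))    # [2, 3, 4]
```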
Representation of individual files on CD-ROM
B+ tree type data structures are appropriate for
organizing the files on CD-ROMs.
Build once, read many times: this allows attempting to
achieve 100% utilization of blocks or buckets.
Packing the internal nodes so that all of them can
be kept in memory during data
fetches is important.
Secondary indexes can be formed so that the
records are pinned to the indexes on a CD-ROM, as
the file will never be reorganized…
Representation of individual files on CD-ROM
This may force the files on the source disks and
their copies on the CD-ROM to be organized
differently, because of efficiency concerns.
It is possible to use hashing on the CD-ROM,
provided that overflow either does not exist
or is minimized. This becomes possible when the
address space is kept large.
Remember that the files to be put on a CD-ROM
are final, so the hashing function can be chosen
to perform the best, i.e. with no collisions.
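Because the set of keys is fixed before the disc is written, a collision-free hash function can be searched for offline. The sketch below is one simple way to do this, not a method prescribed by these notes: it tries successive seeds of a hypothetical polynomial string hash (`hash_path`) until every key lands in a distinct slot. The function names, the multiplier 31, the modulus, and the seed limit are all illustrative assumptions.

```python
def hash_path(path, seed, table_size):
    """Hypothetical seeded polynomial hash of a path name."""
    h = seed
    for byte in path.encode("utf-8"):
        h = (h * 31 + byte) % 2_147_483_647
    return h % table_size


def find_collision_free_seed(keys, table_size, max_seed=10_000):
    """Return the first seed that maps all keys to distinct slots, or None."""
    for seed in range(max_seed):
        slots = {hash_path(key, seed, table_size) for key in keys}
        if len(slots) == len(keys):
            return seed
    return None


if __name__ == "__main__":
    keys = ["Root/Subdir1/File11", "Root/File", "Root/Subdir2"]
    # A large address space relative to the number of keys makes success likely.
    print(find_collision_free_seed(keys, table_size=64))
```

A larger table makes a collision-free seed easier to find, which matches the point above about keeping the address space large; if the table has fewer slots than keys, no seed can succeed and the function returns None.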