Chapter 13: Disk Storage and Basic File Structures

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe

Disk Storage and Basic File Structures
Storage hierarchy includes two main categories:
 Primary storage.
 This category includes storage media that can be operated on directly by the computer's central processing unit (CPU), such as main memory and cache memory.
 Secondary storage.
 This category includes magnetic disks, optical disks,
and tapes.
Storage of Databases
 Most databases are stored permanently on magnetic disk secondary storage, for the following reasons:
 Databases are too large to fit entirely in main memory.
 The circumstances that cause permanent loss of stored
data arise less frequently for disk secondary storage than
for primary storage.
 The cost of storage per unit of data is less for disk than for primary storage.
Secondary Storage Devices: Disk Devices
 The most basic unit of data on the disk is a single bit of information.
 To code information, bits are grouped into bytes.
 The capacity of a disk is the number of bytes it can store.
 Data is stored as magnetized areas on magnetic disk surfaces.
 A disk is single-sided if it stores information on only one of its surfaces and double-sided if both surfaces are used.
 A disk pack contains several magnetic disks connected
to a rotating spindle.
Secondary Storage Devices: Disk Devices (contd.)
 Disks are divided into concentric circular tracks on each disk
surface.
 Typical track capacities range from 4 to 50 Kbytes or more.
 Because a track usually contains a large amount of information, it is divided into smaller blocks or sectors.
 The division of a track into sectors is hard-coded on the disk
surface and cannot be changed.
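As a rough worked example of disk capacity, the sketch below multiplies out the geometry described above. It assumes every track holds the same number of sectors (many real drives record more sectors on outer tracks than on inner ones), and the function name and all parameter values are hypothetical.

```python
def disk_capacity_bytes(surfaces: int, tracks_per_surface: int,
                        sectors_per_track: int, bytes_per_sector: int) -> int:
    """Capacity of a disk pack, assuming every track has the same number of sectors."""
    track_capacity = sectors_per_track * bytes_per_sector   # bytes on one track
    return surfaces * tracks_per_surface * track_capacity    # bytes on all recorded surfaces


# Hypothetical pack: 4 double-sided disks (8 recorded surfaces),
# 20,000 tracks per surface, 100 sectors of 512 bytes per track.
track_kb = 100 * 512 / 1024               # 50 Kbytes per track
cap = disk_capacity_bytes(8, 20_000, 100, 512)
print(f"track capacity: {track_kb:.0f} Kbytes, disk capacity: {cap:,} bytes")
```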
Secondary Storage Devices: Disk Devices (contd.)
 The operating system divides each track into equal-sized blocks (or pages) when the disk is formatted.
 The block size B is fixed for each system.
 Typical block sizes range from B = 512 bytes to B = 4096 bytes.
 Whole blocks are transferred between disk and main memory for
processing.
Secondary Storage Devices: Disk Devices (contd.)
 A read-write head moves to the track that contains the block
to be transferred.
 Disk rotation moves the block under the read-write head for
reading or writing.
 A physical disk block (hardware) address consists of:
 a cylinder number (an imaginary collection of tracks of the same radius from all recorded surfaces),
 a track number or surface number (within the cylinder),
 and a block number (within the track).
 In many modern disk drives, a single number called the LBA (Logical Block Address), ranging from 0 to n for a disk of n + 1 blocks, is used instead and is mapped automatically to the right block by the disk drive controller.
 The total time needed to locate and transfer an arbitrary block, given its address, is the sum of the seek time, rotational delay, and block transfer time.
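The three-part physical address and a single block number can be related by the classic cylinder-head-sector (CHS) convention from legacy PC disk addressing, in which cylinders and heads (surfaces) count from 0 and sectors count from 1. Modern drives do their own internal mapping, so this is only a sketch of the idea; the geometry values and function names are hypothetical.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    heads_per_cylinder: int   # recorded surfaces, i.e. tracks per cylinder
    sectors_per_track: int


def chs_to_lba(c: int, h: int, s: int, g: Geometry) -> int:
    """Classic CHS -> LBA mapping: cylinders and heads count from 0, sectors from 1."""
    if not (0 <= h < g.heads_per_cylinder and 1 <= s <= g.sectors_per_track):
        raise ValueError("head or sector out of range for this geometry")
    return (c * g.heads_per_cylinder + h) * g.sectors_per_track + (s - 1)


def lba_to_chs(lba: int, g: Geometry) -> tuple:
    """Inverse mapping: recover (cylinder, head, sector) from an LBA."""
    track, s0 = divmod(lba, g.sectors_per_track)   # global track index, 0-based sector
    c, h = divmod(track, g.heads_per_cylinder)
    return c, h, s0 + 1


geo = Geometry(heads_per_cylinder=16, sectors_per_track=63)   # hypothetical geometry
lba = chs_to_lba(2, 3, 5, geo)
print(lba, lba_to_chs(lba, geo))
```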
Secondary Storage Devices: Disk Devices (contd.)
 Reading or writing a disk block is time consuming because of the seek time s and the rotational delay (latency) rd.
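To make the cost concrete, the sketch below adds up the seek time s, the average rotational delay rd (half a revolution), and the block transfer time btt = B / tr, where tr is the transfer rate. It then compares reading k scattered blocks with reading k consecutive blocks, for which s and rd are paid only once. The drive parameters and function names are hypothetical, and the consecutive case assumes all k blocks lie on the same track or cylinder.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay rd: on average the block is half a revolution away."""
    return (60_000 / rpm) / 2          # ms per minute / revolutions per minute = ms per revolution


def block_transfer_time_ms(block_bytes: int, rate_bytes_per_ms: float) -> float:
    """Block transfer time btt = B / tr."""
    return block_bytes / rate_bytes_per_ms


def random_blocks_ms(k: int, s: float, rd: float, btt: float) -> float:
    """k blocks at arbitrary addresses: every block pays seek + rotational delay."""
    return k * (s + rd + btt)


def consecutive_blocks_ms(k: int, s: float, rd: float, btt: float) -> float:
    """k consecutive blocks on one track/cylinder: seek + rotational delay paid once."""
    return s + rd + k * btt


# Hypothetical drive: 8 ms average seek, 7200 rpm, 100 MB/s transfer, 4096-byte blocks.
s = 8.0
rd = avg_rotational_delay_ms(7200)
btt = block_transfer_time_ms(4096, 100_000)   # 100 MB/s = 100,000 bytes per ms
print(f"one block: {s + rd + btt:.2f} ms")
print(f"100 random blocks: {random_blocks_ms(100, s, rd, btt):.2f} ms")
print(f"100 consecutive blocks: {consecutive_blocks_ms(100, s, rd, btt):.2f} ms")
```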
Typical Disk Parameters
[Table: typical disk parameters (Courtesy of Seagate Technology)]
Placing File Records on Disk
Records and Record Types
 Records usually describe entities and their attributes.
 A collection of field names and their corresponding data types constitutes a record type.
 Fields themselves may be fixed length or variable length.
 Variable-length fields can be mixed with fixed-length fields in one record:
 Separator characters or length fields are needed so that the record can be "parsed."
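A minimal sketch of separator-based parsing, in the spirit of a variable-field record with three types of separator characters: one character separates a field name from its value, another separates fields, and a third terminates the record. The three separator characters, the function names, and the sample values are hypothetical, and the sketch assumes field names and values never contain a separator character.

```python
NAME_VALUE_SEP = "="   # separates a field name from its value (hypothetical choice)
FIELD_SEP = ";"        # separates one field from the next (hypothetical choice)
RECORD_END = "#"       # terminates a record (hypothetical choice)


def encode_record(fields: dict) -> str:
    """Serialize one record as name=value pairs followed by the record terminator."""
    for name, value in fields.items():
        for ch in (NAME_VALUE_SEP, FIELD_SEP, RECORD_END):
            if ch in name or ch in value:
                raise ValueError(f"separator {ch!r} appears in field {name!r}")
    body = FIELD_SEP.join(f"{n}{NAME_VALUE_SEP}{v}" for n, v in fields.items())
    return body + RECORD_END


def parse_records(data: str) -> list:
    """Split a stream back into records, then fields, then name/value pairs."""
    records = []
    for chunk in data.split(RECORD_END):
        if not chunk:                  # empty piece after the final terminator
            continue
        record = {}
        for field in chunk.split(FIELD_SEP):
            name, value = field.split(NAME_VALUE_SEP, 1)
            record[name] = value
        records.append(record)
    return records


# The second record simply omits the Dept field: variable-field records allow this.
stream = (encode_record({"Name": "Smith, John", "Ssn": "123456789", "Dept": "Computer"})
          + encode_record({"Name": "Wong, Frank", "Ssn": "333445555"}))
print(stream)
print(parse_records(stream))
```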
Placing File Records on Disk
Record Blocking and Spanned Versus Unspanned Records
 Blocking:
 Refers to storing a number of records in one block on the disk.
 The blocking factor (bfr) refers to the number of records per block.
 There may be empty space in a block if an integral number of records does not fit in one block.
 Spanned records:
 Refers to records that are allowed to span a number of blocks; this is required when a record is larger than one block.
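For fixed-length records of R bytes and unspanned blocking, the blocking factor is bfr = floor(B / R), each block leaves B - bfr × R bytes unused, and a file of r records needs b = ceil(r / bfr) blocks. The sketch below computes these quantities; the function names and sample values are hypothetical.

```python
def blocking_factor(B: int, R: int) -> int:
    """bfr = floor(B / R): whole fixed-length records that fit in one block."""
    if R > B:
        raise ValueError("record larger than a block: it must be spanned")
    return B // R


def unused_per_block(B: int, R: int) -> int:
    """Bytes left empty in each block: B - bfr * R."""
    return B - blocking_factor(B, R) * R


def blocks_needed(r: int, B: int, R: int) -> int:
    """b = ceil(r / bfr), computed with integer ceiling division."""
    return -(-r // blocking_factor(B, R))


B, R, r = 4096, 100, 30_000    # hypothetical block size, record size, record count
print(blocking_factor(B, R), unused_per_block(B, R), blocks_needed(r, B, R))
```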
Placing File Records on Disk
Allocating File Blocks on Disk
 A file is a sequence of records, where each record is a collection of data values (or data items).
 A file descriptor (or file header) includes information that
describes the file, such as the field names and their data types,
and the addresses of the file blocks on disk.
 Records are stored on disk blocks.
 The blocking factor bfr for a file is the (average) number of
file records stored in a disk block.
 A file can have fixed-length records or variable-length
records.
Placing File Records on Disk (contd.)
 Three record storage formats are shown below:
 (a) A fixed-length record with six fields and a size of 71 bytes.
 (b) A record with two variable-length fields and three fixed-length fields.
 (c) A variable-field record with three types of separator characters.
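The sketch below illustrates why fixed-length records are convenient: every field sits at a fixed byte offset, so record i of a file starts at byte i × R. It uses Python's struct module with a hypothetical six-field EMPLOYEE layout; the field widths (75 bytes in total), function names, and sample values are illustrative assumptions, not the layout of the 71-byte record in the figure.

```python
import struct

# Hypothetical EMPLOYEE layout (widths are illustrative, not the figure's):
# name 30 bytes, ssn 9 bytes, salary 4-byte int, job_code 4-byte int,
# department 20 bytes, hire_date 8 bytes (YYYYMMDD).
# "<" selects little-endian standard sizes with no alignment padding.
EMPLOYEE = struct.Struct("<30s9sii20s8s")


def pack_employee(name: str, ssn: str, salary: int, job_code: int,
                  department: str, hire_date: str) -> bytes:
    # For "s" fields, struct pads short values with NUL bytes and
    # truncates long ones, so every record is exactly EMPLOYEE.size bytes.
    return EMPLOYEE.pack(name.encode(), ssn.encode(), salary, job_code,
                         department.encode(), hire_date.encode())


def _text(raw: bytes) -> str:
    return raw.rstrip(b"\x00").decode()


def read_record(file_bytes: bytes, i: int) -> tuple:
    """Record i starts at byte i * R, so no scanning or parsing is needed."""
    R = EMPLOYEE.size
    name, ssn, salary, job, dept, hired = EMPLOYEE.unpack(file_bytes[i * R:(i + 1) * R])
    return _text(name), _text(ssn), salary, job, _text(dept), _text(hired)


data = (pack_employee("Smith, John", "123456789", 55000, 7, "Research", "20070115")
        + pack_employee("Wong, Frank", "333445555", 40000, 3, "Sales", "20060301"))
print(EMPLOYEE.size, len(data))
print(read_record(data, 1))
```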
Placing File Records on Disk (contd.)
[Figure: the three record storage formats (a), (b), and (c)]
Placing File Records on Disk
Allocating File Blocks on Disk (contd.)
 File records can be unspanned or spanned:
 Unspanned: no record can span two blocks.
 Spanned: a record can be stored in more than one block.
 The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked, or indexed.
 In a file of fixed-length records, all records have the same
format. Usually, unspanned blocking is used with such files.
 Files of variable-length records require additional information
to be stored in each record, such as separator characters and
field types.
 Usually spanned blocking is used with such files.
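The difference between unspanned and spanned blocking shows up when the same variable-length records are packed both ways. In this sketch (function names and record sizes hypothetical), the unspanned layout starts a new block whenever the next record does not fit in the current one, while the spanned layout lets a record continue in the next block, so only the final block can have free space. For simplicity it ignores the pointer a spanned record needs to locate its continuation block.

```python
def blocks_unspanned(record_sizes: list, block_size: int) -> int:
    """Pack records in order; a record that does not fit starts a new block."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("a record larger than a block must be spanned")
        if size > free:            # leftover space in the current block is wasted
            blocks += 1
            free = block_size
        free -= size
    return blocks


def blocks_spanned(record_sizes: list, block_size: int) -> int:
    """Records may continue in the next block, so space is wasted only at the end.
    (Ignores the continuation pointer a real spanned layout stores.)"""
    return -(-sum(record_sizes) // block_size)   # integer ceiling division


sizes = [300, 700, 500, 900, 200]   # hypothetical variable record lengths in bytes
print(blocks_unspanned(sizes, 1024), blocks_spanned(sizes, 1024))
```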
Placing File Records on Disk
[Figure: Types of record organization: (a) unspanned and (b) spanned]
Operations on Files
 Typical file operations include:
 OPEN: Readies the file for access, and associates a pointer that will refer to a
current file record at each point in time.
 FIND: Searches for the first file record that satisfies a certain condition, and makes
it the current file record.
 FINDNEXT: Searches for the next file record (from the current record) that
satisfies a certain condition, and makes it the current file record.
 READ: Reads the current file record into a program variable.
 INSERT: Inserts a new record into the file and makes it the current file record.
 DELETE: Removes the current file record from the file, usually by marking the
record to indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of the current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records.
 For example, the records marked deleted are physically removed from the file or a
new organization of the file records is created.
 READ_ORDERED: Reads the file blocks in order of a specific field of the file.
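The record-at-a-time operations above can be sketched as a small in-memory class with a current-record pointer. The class and method names (RecordFile, find_next, read_ordered, and so on) and the sample records are hypothetical; DELETE only sets a deletion marker and REORGANIZE physically removes marked records, as described above. Disk blocks are not modelled, only the sequence of records.

```python
from typing import Callable, Optional

Record = dict
Condition = Callable[[Record], bool]


class RecordFile:
    """Hypothetical in-memory file with a current-record pointer."""

    def __init__(self) -> None:
        self._slots: list = []             # each slot is [deleted_flag, record]
        self._current: Optional[int] = None
        self._is_open = False

    def open(self) -> None:
        self._is_open, self._current = True, None

    def close(self) -> None:
        self._is_open, self._current = False, None

    def find(self, cond: Condition) -> bool:
        """Make the first live record satisfying cond the current record."""
        return self._scan_from(0, cond)

    def find_next(self, cond: Condition) -> bool:
        """Continue the search from the record after the current one."""
        start = 0 if self._current is None else self._current + 1
        return self._scan_from(start, cond)

    def read(self) -> Record:
        return dict(self._slots[self._need_current()][1])   # copy to a program variable

    def insert(self, record: Record) -> None:
        self._need_open()
        self._slots.append([False, dict(record)])           # append at the end
        self._current = len(self._slots) - 1

    def delete(self) -> None:
        self._slots[self._need_current()][0] = True          # mark only, do not remove

    def modify(self, **changes) -> None:
        self._slots[self._need_current()][1].update(changes)

    def reorganize(self) -> None:
        self._need_open()
        self._slots = [slot for slot in self._slots if not slot[0]]
        self._current = None

    def read_ordered(self, field: str) -> list:
        self._need_open()
        live = [dict(rec) for deleted, rec in self._slots if not deleted]
        return sorted(live, key=lambda rec: rec[field])

    def _scan_from(self, start: int, cond: Condition) -> bool:
        self._need_open()
        for i in range(start, len(self._slots)):
            deleted, rec = self._slots[i]
            if not deleted and cond(rec):
                self._current = i
                return True
        return False

    def _need_open(self) -> None:
        if not self._is_open:
            raise RuntimeError("file is not open")

    def _need_current(self) -> int:
        self._need_open()
        if self._current is None:
            raise RuntimeError("no current record")
        return self._current


f = RecordFile()
f.open()
for name, dno in [("Smith", 5), ("Wong", 5), ("Zelaya", 4)]:
    f.insert({"name": name, "dno": dno})
f.find(lambda r: r["dno"] == 5)        # current record: Smith
f.find_next(lambda r: r["dno"] == 5)   # current record: Wong
f.delete()                             # Wong is only marked as deleted
f.reorganize()                         # now Wong is physically removed
names = [r["name"] for r in f.read_ordered("name")]
print(names)
f.close()
```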
Files of Unordered Records (Heap Files)
 Also called a heap or a pile file.
 New records are inserted at the end of the file.
 A linear search through the file records is
necessary to search for a record.
 On average, this requires reading and searching half of the file blocks ($b/2$ for a file of $b$ blocks), and is hence quite expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular field
requires sorting the file records.
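The cost of linear search in a heap file can be checked by counting block reads. In the sketch below (function names, keys, and bfr are hypothetical), records are appended in arrival order; searching for every existing key in turn averages (b + 1) / 2 block reads, roughly half the file, while a search for a key that is not present must read all b blocks.

```python
import random


def make_heap_file(keys: list, bfr: int) -> list:
    """Heap file: records are appended in arrival order, filling one block at a time."""
    return [keys[i:i + bfr] for i in range(0, len(keys), bfr)]


def linear_search(blocks: list, key: int) -> tuple:
    """Read blocks one by one until the key is found; return (found, blocks read)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:
            return True, reads
    return False, len(blocks)


random.seed(42)
keys = random.sample(range(1_000_000), 2_000)   # hypothetical unique keys
heap = make_heap_file(keys, bfr=10)
b = len(heap)
avg_reads = sum(linear_search(heap, k)[1] for k in keys) / len(keys)
print(b, avg_reads, linear_search(heap, -1))
```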
Files of Ordered Records (Sorted Files)
 Also called a sequential file.
 File records are kept sorted by the values of an ordering field.
 Insertion is expensive: records must be inserted in the correct order.
 It is common to keep a separate unordered overflow (or transaction)
file for new records to improve insertion efficiency; this is periodically
merged with the main ordered file.
 A binary search can be used to search for a record on its ordering field value.
 This requires reading and searching about $\log_2 b$ of the file's $b$ blocks on average, an improvement over linear search.
 Reading the records in order of the ordering field is quite efficient.
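Binary search on the ordering field works at the block level: each probe reads the middle block of the remaining range and compares the search key with that block's first and last records. The sketch below (function name and data hypothetical) counts block reads; for b = 1000 blocks, no search needs more than 10 reads, which is ceil(log2 1000).

```python
import math


def binary_search_blocks(blocks: list, key: int) -> tuple:
    """Binary search over the blocks of a file sorted on its ordering field.
    Returns (found, number of blocks read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1                      # one disk block access
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads  # key can only be in this block
    return False, reads


bfr = 10
sorted_keys = list(range(0, 20_000, 2))     # 10,000 even keys, already ordered
ordered = [sorted_keys[i:i + bfr] for i in range(0, len(sorted_keys), bfr)]
b = len(ordered)                            # 1000 blocks
worst = max(binary_search_blocks(ordered, k)[1] for k in sorted_keys)
print(b, worst, math.ceil(math.log2(b)))
```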
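Average Access Times
 The following table shows the average access time, measured in block accesses, needed to find a specific record in a file of $b$ blocks for each type of file organization:

Type of Organization   Access/Search Method              Average Blocks to Access a Specific Record
Heap (unordered)       Sequential scan (linear search)   $b/2$
Ordered                Sequential scan                   $b/2$
Ordered                Binary search                     $\log_2 b$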
Parallelizing Disk Access Using RAID Technology
 Secondary storage technology must take steps to
keep up in performance and reliability with
processor technology.
 A major advance in secondary storage technology is
represented by the development of RAID, which
originally stood for Redundant Arrays of
Inexpensive Disks.
 The main goal of RAID is to even out the widely
different rates of performance improvement of disks
against those in memory and microprocessors.
RAID Technology (contd.)
 A natural solution is a large array of small independent disks acting as a single higher-performance logical disk.
 A concept called data striping is used, which
utilizes parallelism to improve disk performance.
 Data striping distributes data transparently over
multiple disks to make them appear as a single
large, fast disk.
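Block-level striping can be sketched as a round-robin mapping: with n disks, logical block i is stored on disk i mod n at offset i div n, so consecutive blocks land on different disks and, in a real array, a large read can be served by all disks at once. This is a minimal sketch with hypothetical names and data; real arrays often use a stripe unit larger than a single block.

```python
def stripe_location(logical_block: int, num_disks: int) -> tuple:
    """Round-robin block-level striping: (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def stripe(blocks: list, num_disks: int) -> list:
    """Distribute a sequence of blocks across num_disks disks."""
    disks = [[] for _ in range(num_disks)]
    for i, blk in enumerate(blocks):
        disk, _offset = stripe_location(i, num_disks)
        disks[disk].append(blk)         # blocks arrive in order, so offsets line up
    return disks


def unstripe(disks: list, total_blocks: int) -> list:
    """Reassemble the logical sequence of blocks from the disks."""
    n = len(disks)
    return [disks[d][o] for d, o in (stripe_location(i, n) for i in range(total_blocks))]


data = [f"B{i}".encode() for i in range(10)]   # hypothetical blocks B0..B9
disks = stripe(data, 4)
for d, contents in enumerate(disks):
    print(d, contents)
```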
RAID Technology (contd.)
 Different RAID organizations were defined based on different combinations of two factors: the granularity of data interleaving (striping) and the pattern used to compute redundant information.
 RAID level 0 has no redundant data and hence has the best write performance, at the risk of data loss.
 RAID level 1 uses mirrored disks.
 RAID level 2 uses memory-style redundancy by using Hamming codes, which contain parity bits for distinct overlapping subsets of components. Level 2 includes both error detection and correction.
 RAID level 3 uses a single parity disk, relying on the disk controller to figure out which disk has failed.
 RAID levels 4 and 5 use block-level data striping, with level 5 distributing data and parity information across all disks.
 RAID level 6 applies the so-called P + Q redundancy scheme using Reed-Solomon codes to protect against up to two disk failures by using just two redundant disks.
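The single parity used by RAID levels 3, 4, and 5 is typically a bytewise XOR across the data blocks of a stripe. Because x XOR x = 0, any single lost block (data or parity) equals the XOR of the surviving blocks of that stripe. The sketch below (hypothetical names and data) shows this, plus one common way of rotating the parity block across disks in level 5; the rotation convention is an assumption, since implementations differ. Level 6's second, Reed-Solomon-based check block is not shown.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if len({len(blk) for blk in blocks}) > 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def parity_block(data_blocks: list) -> bytes:
    """Parity for one stripe: P = D0 xor D1 xor ... xor Dk-1."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list) -> bytes:
    """A single lost block is the XOR of every surviving block (data and parity)."""
    return xor_blocks(surviving_blocks)


def raid5_parity_disk(stripe_no: int, num_disks: int) -> int:
    """Assumed rotation: parity moves one disk to the left on each successive stripe,
    so no single disk holds all the parity."""
    return (num_disks - 1) - (stripe_no % num_disks)


stripe_data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # hypothetical 2-byte blocks
p = parity_block(stripe_data)
lost = stripe_data[1]
recovered = rebuild_missing([stripe_data[0], stripe_data[2], p])
print(p.hex(), recovered == lost)
print([raid5_parity_disk(s, 4) for s in range(6)])
```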
Use of RAID Technology (contd.)
 Different RAID organizations are used in different situations:
 RAID level 1 (mirrored disks) is the easiest for rebuilding a disk from the other disks.
 It is used for critical applications such as logs.
 RAID level 2 uses memory-style redundancy by using Hamming codes, which contain parity bits for distinct overlapping subsets of components.
 Level 2 includes both error detection and correction.
 RAID level 3 (a single parity disk, relying on the disk controller to figure out which disk has failed) and level 5 (block-level data striping) are preferred for large-volume storage, with level 3 giving higher transfer rates.
 The most popular uses of RAID technology currently are:
 Level 0 (with striping), level 1 (with mirroring), and level 5 with an extra drive for parity.
 Design decisions for RAID include:
 Level of RAID, number of disks, choice of parity schemes, and grouping of disks for block-level striping.
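One input to these design decisions is how much raw capacity each level gives up for redundancy. The sketch below (hypothetical function name, identical disks assumed) computes usable capacity for levels 0, 1, 5, and 6, treating level 1 as mirrored pairs and levels 5 and 6 as spending one and two disks' worth of space on redundancy, respectively.

```python
def usable_capacity(level: int, num_disks: int, disk_capacity: float) -> float:
    """Usable capacity of an array of identical disks for a few common RAID levels."""
    minimum = {0: 1, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError("only levels 0, 1, 5 and 6 are modelled here")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return num_disks * disk_capacity          # striping only, no redundancy
    if level == 1:
        if num_disks % 2:
            raise ValueError("RAID 1 is modelled here as mirrored pairs")
        return (num_disks // 2) * disk_capacity   # every block stored twice
    if level == 5:
        return (num_disks - 1) * disk_capacity    # one disk's worth of parity
    return (num_disks - 2) * disk_capacity        # level 6: P and Q redundancy


for level in (0, 1, 5, 6):
    print(level, usable_capacity(level, num_disks=6, disk_capacity=2.0), "TB")
```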
Trends in Disk Technology
[Table: trends in disk technology]
Storage Area Networks
 The demand for storage capacity has risen considerably in recent times.
 Organizations have a need to move from a static fixed data
center oriented operation to a more flexible and dynamic
infrastructure for information processing.
 Thus they are moving to a concept of Storage Area Networks
(SANs).
 In a SAN, online storage peripherals are configured as nodes on
a high-speed network and can be attached and detached from
servers in a very flexible manner.
 This allows storage systems to be placed at longer distances from the servers and to provide different performance and connectivity options.
Storage Area Networks (contd.)
 Advantages of SANs are:
 Flexible many-to-many connectivity among servers and storage devices using Fibre Channel hubs and switches.
 Up to 10 km separation between a server and a storage system using appropriate fiber-optic cables.
 Better isolation capabilities allowing non-disruptive addition of
new peripherals and servers.
 SANs face the problem of combining storage options from
multiple vendors and dealing with evolving standards of
storage management software and hardware.
