
Database Management System Chapter 1

Uploaded by

mfalturki

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 16

Disk Storage, Basic File Structures, Hashing, and Modern Storage Architectures



16.1 Introduction
 Databases typically stored on magnetic disks
 Accessed using physical database file structures
 Storage hierarchy
 Primary storage
 Main memory, cache memory
 Secondary storage
 Magnetic disks, flash memory, solid-state drives
 Tertiary storage
 Removable media



Memory Hierarchies and Storage Devices
 Cache memory
 CPU cache
 DRAM
 Mass storage
 Magnetic disks
 CD-ROM, DVD, tape drives
 Flash memory
 Nonvolatile



Storage Hierarchy



Access Times



Storage Types and Characteristics

Table 16.1 Types of Storage with Capacity, Access Time, Max Bandwidth (Transfer Speed), and Commodity Cost



Storage Organization of Databases
 Persistent data
 Most databases
 Transient data
 Exists only during program execution
 File organization
 Determines how records are physically placed on the disk
 Determines how records are accessed



16.2 Secondary Storage Devices
 Hard disk drive
 Bits (ones and zeros)
 Grouped into bytes or characters
 Disk capacity measures storage size
 Disks may be single or double-sided
 Concentric circles called tracks
 Tracks divided into blocks or sectors
 Disk packs
 Cylinder



Single-Sided Disk and Disk Pack

Figure 16.1 (a) A single-sided disk with read/write hardware (b) A disk pack with read/write hardware

Sectors on a Disk

Figure 16.2 Different sector organizations on disk (a) Sectors subtending a fixed angle (b) Sectors maintaining a uniform recording density



Secondary Storage Devices (cont’d.)
 Formatting
 Divides tracks into equal-sized disk blocks
 Blocks separated by interblock gaps
 Data transfer in units of disk blocks
 Hardware address supplied to disk I/O hardware
 Buffer
 Used in read and write operations
 Read/write head
 Hardware mechanism for read and write operations

Secondary Storage Devices (cont’d.)
 Disk controller
 Interfaces disk drive to computer system
 Standard interfaces
 SCSI
 SATA
 SAS



System Design Goals
 Allow the DBMS to manage databases that exceed the amount of memory available.
 Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.



Sequential vs. Random
 Random access on an HDD is much slower than sequential access.
 Traditional DBMSs are designed to maximize sequential access.



Secondary Storage Devices (cont’d.)
 Techniques for efficient data access
 Data buffering
 Proper organization of data on disk
 Reading data ahead of request
 Proper scheduling of I/O requests
 Use of log disks to temporarily hold writes
 Use of SSDs or flash memory for recovery purposes



Solid State Device Storage
 Sometimes called flash storage
 Main component: controller
 Set of interconnected flash memory cards
 No moving parts
 Data less likely to be fragmented
 More costly than HDDs
 DRAM-based SSDs available
 Faster access times compared with flash



Magnetic Tape Storage Devices
 Sequential access
 Must scan preceding blocks
 Tape is mounted and scanned until required block is under read/write head
 Important functions
 Backup
 Archive



16.3 Buffering of Blocks
 Buffering is most useful when processes can run concurrently in parallel

Figure 16.3 Interleaved concurrency versus parallel execution



Buffering of Blocks (cont’d.)
 Double buffering can be used to read a continuous stream of blocks

Figure 16.4 Use of two buffers, A and B, for reading from disk
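
As a rough illustration of double buffering, the sketch below fills one buffer in a background thread while the other is being processed. The function `read_block` and the in-memory list standing in for the disk are hypothetical; a real system would issue asynchronous disk I/O instead.

```python
import threading

def read_block(disk, i):
    """Hypothetical stand-in for a disk read: returns block i."""
    return disk[i]

def process_all(disk, process):
    """Process blocks in order, loading block i+1 while block i is processed."""
    buffers = [None, None]              # buffers A and B
    if not disk:
        return
    buffers[0] = read_block(disk, 0)    # the first read cannot be overlapped
    for i in range(len(disk)):
        cur, nxt = i % 2, (i + 1) % 2
        loader = None
        if i + 1 < len(disk):
            # Fill the other buffer in the background.
            def fill(slot=nxt, idx=i + 1):
                buffers[slot] = read_block(disk, idx)
            loader = threading.Thread(target=fill)
            loader.start()
        process(buffers[cur])           # work on the current buffer meanwhile
        if loader:
            loader.join()               # next buffer must be full before swapping

results = []
process_all(["b0", "b1", "b2", "b3"], results.append)
```

The two buffers simply alternate roles on every iteration, which is the idea behind buffers A and B in the figure.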



Buffer Management and Replacement Strategies
 Buffer management information
 Pin count
 Dirty bit
 Buffer replacement strategies
 Least recently used (LRU)
 Clock policy
 First-in-first-out (FIFO)
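
To show how the pin count and dirty bit interact with a replacement policy, here is a minimal sketch of a buffer pool using LRU. The class name `BufferPool`, its methods, and the dictionary standing in for the disk are all assumptions made for illustration, not any particular DBMS's interface.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool: LRU replacement that skips pinned frames."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                      # hypothetical dict: page_id -> data
        self.frames = OrderedDict()           # page_id -> [data, pin_count, dirty]

    def fetch(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # now the most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
        self.frames[page_id][1] += 1          # pin the page for the caller
        return self.frames[page_id][0]

    def unpin(self, page_id, new_data=None):
        frame = self.frames[page_id]
        frame[1] -= 1
        if new_data is not None:
            frame[0], frame[2] = new_data, True   # page modified: set dirty bit

    def _evict(self):
        for victim, (data, pins, dirty) in self.frames.items():
            if pins == 0:                     # least recently used unpinned page
                if dirty:
                    self.disk[victim] = data  # write back before reusing the frame
                del self.frames[victim]
                return
        raise RuntimeError("all frames are pinned")

disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(2, disk)
pool.fetch(1); pool.unpin(1, "A")   # page 1 is modified, so it becomes dirty
pool.fetch(2); pool.unpin(2)
pool.fetch(3); pool.unpin(3)        # evicts page 1 and writes "A" back
```

A pinned page (pin count above zero) is never chosen as a victim, and a dirty page is written back before its frame is reused.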



LRU vs. FIFO
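
The slide's figure is not reproduced here. As a small stand-in, the sketch below counts buffer hits under each policy for a made-up page reference string; the function name `count_hits` and the reference string are illustrative assumptions.

```python
from collections import OrderedDict

def count_hits(refs, capacity, policy):
    """Count buffer hits for a page reference string under 'LRU' or 'FIFO'."""
    frames = OrderedDict()
    hits = 0
    for page in refs:
        if page in frames:
            hits += 1
            if policy == "LRU":
                frames.move_to_end(page)    # only LRU refreshes a page on a hit
        else:
            if len(frames) >= capacity:
                frames.popitem(last=False)  # evict the page at the front
            frames[page] = True
    return hits

refs = [1, 2, 3, 1, 4, 1, 2]
lru_hits = count_hits(refs, 3, "LRU")    # 2 hits
fifo_hits = count_hits(refs, 3, "FIFO")  # 1 hit
```

With three frames, FIFO evicts page 1 when page 4 arrives even though page 1 was just used, while LRU keeps it.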



16.4 Placing File Records on Disk
 Record: collection of related data values or items
 Values correspond to record field
 Data types
 Numeric
 String
 Boolean
 Date/time
 Binary large objects (BLOBs)
 Unstructured objects



Placing File Records on Disk (cont’d.)
 Reasons for variable-length records
 One or more fields have variable length
 One or more fields are repeating
 One or more fields are optional
 File contains records of different types



Record Blocking and Spanned Versus Unspanned Records
 File records allocated to disk blocks
 Spanned records
 Larger than a single block
 Pointer at end of first block points to block containing remainder of record
 Unspanned
 Records not allowed to cross block boundaries



Record Blocking and Spanned Versus Unspanned Records (cont’d.)
 Blocking factor
 Average number of records per block for the file

Figure 16.6 Types of record organization (a) Unspanned (b) Spanned
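
For fixed-length records in an unspanned organization, the blocking factor is the block size divided by the record size, rounded down, and the number of blocks needed is the record count divided by the blocking factor, rounded up. The sketch below works this through; the block size, record size, and record count are made-up example values.

```python
import math

def blocking_factor(block_size, record_size):
    """Records per block when records may not cross block boundaries."""
    return block_size // record_size

def blocks_needed(num_records, bfr):
    """Number of blocks required to hold the file."""
    return math.ceil(num_records / bfr)

B, R, r = 512, 100, 1000          # assumed: bytes per block, bytes per record, records
bfr = blocking_factor(B, R)       # 5 records per block
unused = B - bfr * R              # 12 bytes left unused in each block
b = blocks_needed(r, bfr)         # 200 blocks
```

A spanned organization could use the leftover bytes in each block, at the cost of records that cross block boundaries.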



Record Blocking and Spanned Versus Unspanned Records (cont’d.)
 Allocating file blocks on disk
 Contiguous allocation
 Linked allocation
 Indexed allocation



Record Blocking and Spanned Versus Unspanned Records (cont’d.)
 File header (file descriptor)
 Contains file information needed by system programs
 Disk addresses of blocks
 Format descriptions



16.5 Operations on Files
 Retrieval operations
 No change to file data
 Update operations
 File change by insertion, deletion, or modification
 Records selected based on selection condition



Operations on Files (cont’d.)
 Examples of operations for accessing file records
 Open
 Find
 Read
 FindNext
 Delete
 Insert
 Close
 Scan



Storage Manager
 The DBMS stores a database as one or more files on disk.
 It organizes the files as a collection of pages.



Database Pages
 A page is a fixed-size block of data.
 It can contain tuples, meta-data, indexes, log records
 Each page is given a unique identifier.
 The DBMS uses an indirection layer to map page ids to physical locations.
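
One way to picture the indirection layer is a table from page id to a file name and byte offset. The sketch below is a simplified assumption of such a layer; the class `PageTable`, the 4096-byte page size, and the file names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class PageTable:
    """Maps page ids to (file name, byte offset); the layout is illustrative."""
    def __init__(self):
        self.locations = {}
        self.next_id = 0

    def allocate(self, file_name, slot_in_file):
        page_id = self.next_id            # each page gets a unique identifier
        self.next_id += 1
        self.locations[page_id] = (file_name, slot_in_file * PAGE_SIZE)
        return page_id

    def locate(self, page_id):
        return self.locations[page_id]

    def move(self, page_id, file_name, slot_in_file):
        # The page can be relocated on disk without changing its id.
        self.locations[page_id] = (file_name, slot_in_file * PAGE_SIZE)

table = PageTable()
pid = table.allocate("data.db", 2)
first_location = table.locate(pid)        # ("data.db", 8192)
table.move(pid, "other.db", 0)
```

Because callers only hold the page id, moving the page changes one table entry rather than every reference to it.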



Storage Architecture
 File Organization
 Page Layout
 Tuple Layout



File Organization
 Different DBMSs manage pages in files on disk in different ways.
 Heap File Organization
 Sequential / Sorted File Organization
 Hashing File Organization



16.6 Files of Unordered Records (Heap Files)
 Heap (or pile) file
 Records placed in file in order of insertion
 Inserting a new record is very efficient
 Searching for a record requires linear search
 Deletion techniques
 Rewrite the block
 Use deletion marker
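
The points above can be sketched as a toy heap file held in memory: inserts append to the last block, search is a linear scan, and deletion sets a marker instead of rewriting the block. The class `HeapFile` and its representation of blocks as Python lists are assumptions for illustration.

```python
class HeapFile:
    """Toy heap file: records are kept in insertion order."""
    def __init__(self, records_per_block):
        self.records_per_block = records_per_block
        self.blocks = [[]]                 # each block holds [record, deleted] pairs

    def insert(self, record):
        if len(self.blocks[-1]) == self.records_per_block:
            self.blocks.append([])         # last block is full: start a new one
        self.blocks[-1].append([record, False])

    def find(self, predicate):
        """Linear search; returns (block_no, slot_no) or None."""
        for b, block in enumerate(self.blocks):
            for s, (record, deleted) in enumerate(block):
                if not deleted and predicate(record):
                    return b, s
        return None

    def delete(self, predicate):
        pos = self.find(predicate)
        if pos is None:
            return False
        self.blocks[pos[0]][pos[1]][1] = True   # set the deletion marker
        return True

heap = HeapFile(records_per_block=2)
for key in (10, 20, 30):
    heap.insert({"id": key})
```

Marked records still occupy space, so a file using deletion markers typically needs periodic reorganization to reclaim it.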



Database Heap: Linked List



Database Heap: Page Directory
 The DBMS maintains special pages that track the location of data pages in the database files.
 The directory also records the number of free slots per page.
 The DBMS has to make sure that the directory pages are in sync with the data pages.
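
A minimal sketch of a page directory follows, assuming every data page has the same number of slots. The class `PageDirectory` and its method names are hypothetical.

```python
class PageDirectory:
    """Toy page directory: tracks free slots per data page (layout is assumed)."""
    def __init__(self, slots_per_page):
        self.slots_per_page = slots_per_page
        self.free_slots = {}                  # page_id -> free slot count

    def add_page(self, page_id):
        self.free_slots[page_id] = self.slots_per_page

    def find_page_with_space(self):
        for page_id, free in self.free_slots.items():
            if free > 0:
                return page_id
        return None                           # caller must allocate a new page

    def note_insert(self, page_id):
        # Keeps the directory in sync with the data page after a tuple is added.
        if self.free_slots[page_id] == 0:
            raise ValueError("page is full")
        self.free_slots[page_id] -= 1

directory = PageDirectory(slots_per_page=2)
directory.add_page(0)
directory.add_page(1)
directory.note_insert(0)
directory.note_insert(0)
```

If `note_insert` were skipped after a write, the directory would report free space that no longer exists, which is the synchronization problem the slide mentions.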



Database Heap: Page Directory



16.7 Files of Ordered Records (Sorted Files)
 Ordered (sequential) file
 Records sorted by ordering field
 Called ordering key if ordering field is a key field
 Advantages
 Reading records in order of ordering key value is extremely efficient
 Finding next record
 Binary search technique
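
Binary search on an ordered file works on blocks rather than individual records, so it touches roughly log2(b) of the b blocks. The sketch below assumes each block is a non-empty sorted list of key values; the function name and the sample data are illustrative.

```python
def find_in_sorted_file(blocks, key):
    """Binary search over blocks; returns (block_no or None, blocks_read)."""
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]               # one block access
        reads += 1
        if key < block[0]:
            high = mid - 1                # key lies in an earlier block
        elif key > block[-1]:
            low = mid + 1                 # key lies in a later block
        else:
            return (mid if key in block else None), reads
    return None, reads

sorted_blocks = [[1, 3], [5, 7], [9, 11], [13, 15]]
hit = find_in_sorted_file(sorted_blocks, 9)    # (2, 2)
miss = find_in_sorted_file(sorted_blocks, 4)   # (None, 2)
```

Once the right block is in memory, finding the next record in key order usually needs no further block access.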



Access Times for Various File Organizations

Table 16.3 Average access times for a file of b blocks under basic file organizations



16.8 Hashing Techniques
 Hash function (randomizing function)
 Applied to hash field value of a record
 Yields address of the disk block of stored record
 Organization called hash file
 Search condition is equality condition on the hash field
 Hash field typically key field
 Hashing also internal search structure
 Used when group of records accessed exclusively by one field value



Hash Example

Ex: key = 105, Hash(k) = k mod 10, result = 5
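
The same example written as a small Python function; the name `h` and the default table size of 10 follow the slide's example and are otherwise arbitrary.

```python
def h(key, m=10):
    """Hash function from the example: the remainder of key divided by m."""
    return key % m

address = h(105)   # 5
```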



Hashing Techniques (cont’d.)
 Internal hashing
 Hash table
 Collision
 Hash field value for inserted record hashes to address already containing a different record
 Collision resolution
 Open addressing
 Chaining
 Multiple hashing
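
Two of these collision resolution methods can be sketched as follows, assuming a table of 10 positions and the hash function k mod 10. Open addressing is shown with linear probing, and chaining is simplified to a Python list per position rather than overflow locations linked by pointers.

```python
M = 10  # assumed table size, with h(k) = k mod M

def insert_open_addressing(table, key):
    """Linear probing: try h(k), h(k)+1, ... until a free position is found."""
    for step in range(M):
        pos = (key % M + step) % M
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")

def insert_chaining(table, key):
    """Chaining: each position keeps every key that hashes to it."""
    pos = key % M
    table[pos].append(key)
    return pos

open_table = [None] * M
chained_table = [[] for _ in range(M)]

first = insert_open_addressing(open_table, 105)    # position 5
second = insert_open_addressing(open_table, 25)    # collides, placed at 6
insert_chaining(chained_table, 105)
insert_chaining(chained_table, 25)                 # both kept at position 5
```

Multiple hashing would instead apply a second hash function when the first one collides.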



Hashing Techniques (cont’d.)
 External hashing for disk files
 Target address space made of buckets
 Bucket: one disk block or contiguous blocks
 Hashing function maps a key into relative bucket
 Table in file header converts bucket number to disk block address
 Collision problem less severe with buckets
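
A rough sketch of external hashing with buckets follows. The class `HashFile`, the made-up block addresses starting at 1000, and the per-bucket overflow list are assumptions for illustration.

```python
class HashFile:
    """Toy external hash file: each bucket is one block holding several records."""
    def __init__(self, num_buckets, bucket_capacity):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity
        # Stand-in for the header table: bucket number -> disk block address.
        self.bucket_to_block = {b: 1000 + b for b in range(num_buckets)}
        self.blocks = {addr: [] for addr in self.bucket_to_block.values()}
        self.overflow = {b: [] for b in range(num_buckets)}

    def insert(self, key, record):
        bucket = key % self.num_buckets           # relative bucket number
        block = self.blocks[self.bucket_to_block[bucket]]
        if len(block) < self.bucket_capacity:
            block.append((key, record))
        else:
            self.overflow[bucket].append((key, record))   # bucket is full

    def find(self, key):
        bucket = key % self.num_buckets
        block = self.blocks[self.bucket_to_block[bucket]]
        for k, record in block + self.overflow[bucket]:
            if k == key:
                return record
        return None

hash_file = HashFile(num_buckets=10, bucket_capacity=2)
for key, rec in ((5, "a"), (15, "b"), (25, "c")):
    hash_file.insert(key, rec)
```

Keys 5 and 15 share bucket 5 without any collision handling, which is why collisions are less severe with buckets; only the third key spills into overflow.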



Storage Architecture
 File Organization
 Page Layout
 Tuple Layout



Page Layout
 Every page contains a header of metadata about the page's contents.
 Page Size
 Checksum
 DBMS Version
 Compression Information
 How to organize data in a page?
 Tuple-oriented
 Log-structured



Tuple-Oriented
 The most common layout for storing tuples in a page is called slotted pages.
 The slot array maps "slots" to the tuples' starting position offsets.
 The header keeps track of:
 The # of used slots
 The offset of the starting location of the last slot used.
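
A simplified sketch of a slotted page follows, with tuples growing backward from the end of the page. The class `SlottedPage` and the 64-byte page size are hypothetical, and the slot array is kept as a Python list rather than being stored inside the page bytes.

```python
class SlottedPage:
    """Toy slotted page: slot array entries point at tuples packed from the end."""
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []            # slot array: slot number -> (offset, length)
        self.free_end = size       # start offset of the last tuple written

    def insert(self, tuple_bytes):
        # A real page would also reserve room for the slot array itself.
        start = self.free_end - len(tuple_bytes)
        if start < 0:
            raise ValueError("page full")
        self.data[start:self.free_end] = tuple_bytes
        self.free_end = start
        self.slots.append((start, len(tuple_bytes)))
        return len(self.slots) - 1     # slot number for the new tuple

    def read(self, slot_no):
        offset, length = self.slots[slot_no]
        return bytes(self.data[offset:offset + length])

page = SlottedPage()
s0 = page.insert(b"alice")
s1 = page.insert(b"bob")
```

Here `len(page.slots)` plays the role of the used-slot count and `page.free_end` the starting offset of the last slot used.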



Log-Structured
 Instead of storing tuples in pages, the DBMS only stores log records.
 The system appends log records to the file describing how the database was modified.
 How to Read?
 How to Write?
 How to improve Reads?
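
One possible way to answer these questions is sketched below: writes append a record to the log, a read scans the log backward for the newest record of a key, and an in-memory index from key to log position avoids the scan. The class `LogStructuredStore` and the PUT/DELETE record format are assumptions, not a description of any specific system.

```python
class LogStructuredStore:
    """Toy log-structured store: the log is the only copy of the data."""
    def __init__(self):
        self.log = []              # append-only list of (op, key, value)
        self.index = {}            # key -> position of its newest log record

    def put(self, key, value):
        self.log.append(("PUT", key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        self.log.append(("DELETE", key, None))
        self.index[key] = len(self.log) - 1

    def read_by_scan(self, key):
        # Scan from newest to oldest; the first match is the current state.
        for op, k, value in reversed(self.log):
            if k == key:
                return value if op == "PUT" else None
        return None

    def read_by_index(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        op, _, value = self.log[pos]
        return value if op == "PUT" else None

store = LogStructuredStore()
store.put("x", 1)
store.put("x", 2)
store.put("y", 7)
store.delete("y")
```

Both read paths return the same answers; the index just trades memory for fewer log records examined.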



Storage Architecture
 File Organization
 Page Layout
 Tuple Layout



Tuple Layout
 A tuple is essentially a sequence of bytes.
 It's the job of the DBMS to interpret those bytes into attribute types and values.
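
To make the byte-sequence idea concrete, the sketch below packs and unpacks a tuple with Python's `struct` module. The schema (a 32-bit integer id, a 64-bit float balance, and an 8-byte name) is a made-up example, and the little-endian unpadded format is an assumption.

```python
import struct

# Hypothetical schema: id INT32, balance FLOAT64, name CHAR(8).
# "<" means little-endian with no padding between fields.
FORMAT = "<id8s"

def encode(id_, balance, name):
    """Turn attribute values into the tuple's raw bytes."""
    return struct.pack(FORMAT, id_, balance, name.encode().ljust(8, b"\x00"))

def decode(raw):
    """Interpret raw bytes using the schema's types."""
    id_, balance, name = struct.unpack(FORMAT, raw)
    return id_, balance, name.rstrip(b"\x00").decode()

raw = encode(7, 12.5, "ann")
values = decode(raw)          # (7, 12.5, "ann")
```

The bytes carry no type information of their own; the same 20 bytes would decode differently under a different format string.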



Tuple Layout
 Each tuple is prefixed with a header that contains meta-data about it.
 Visibility info (concurrency control)
 Bit Map for NULL values.
 We do not need to store meta-data about the schema. Why?



Tuple Data
 Attributes are typically stored in the order that you specify them when you create the table.



Record ID
 The DBMS needs a way to keep track of individual tuples.
 Each tuple is assigned a unique record identifier.
 Most common: page_id + offset/slot
 Can also contain file location info.
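
A record identifier of the page_id + slot form can be sketched as a small named tuple. The type `RecordId`, the `pages` dictionary, and the `fetch_tuple` helper are hypothetical names used only for this illustration.

```python
from typing import NamedTuple

class RecordId(NamedTuple):
    """Record identifier made of a page id and a slot number."""
    page_id: int
    slot: int

# Hypothetical storage: page_id -> list of tuples indexed by slot number.
pages = {3: ["t0", "t1", "t2"]}

def fetch_tuple(rid):
    """Follow a record id to the tuple it names."""
    return pages[rid.page_id][rid.slot]

rid = RecordId(page_id=3, slot=1)
found = fetch_tuple(rid)      # "t1"
```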
