Block Diagram of A DBMS: (R&G Chapter 9)
"Yea, from the table of my memory I'll wipe away all trivial fond records." -- Shakespeare, Hamlet
Layers, top to bottom:
  Query Optimization and Execution
  Relational Operators
  Files and Access Methods
  Buffer Management
  Disk Space Management
  DB
Main memory costs too much. For ~$1000, PCConnection will sell you either:
  ~80GB of RAM (unrealistic)
  ~400GB of Flash USB keys (unrealistic)
  ~180GB of Flash solid-state disk (serious)
  ~7.7TB of disk (serious)
[Figure: storage hierarchy, from smaller and faster to bigger and slower. Source: Operating System Concepts, 5th Edition]
2/3/09
Jim Gray's Storage Latency Analogy: How Far Away is the Data?
  10^9  Tape/Optical Robot: Andromeda, 2,000 years
  10^6  Disk: Pluto, 2 years
Disks
Still the secondary storage device of choice.
Main advantage over tape: random access vs. sequential.
Time to retrieve a block depends on its location.
Relative placement of blocks on disk has major impact on DBMS performance!
Components of a Disk
[Figure: disk components: spindle, platters, tracks, block, disk head, arm assembly, arm movement]
The platters spin (say, 120 rps).
The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
Only one head reads/writes at any one time.
Time to access (read/write) a disk block:
  seek time (moving arms to position disk head on track)
  rotational delay (waiting for block to rotate under head)
  transfer time (actually moving data to/from disk surface)
Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
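A rough worked example of why seek and rotational delay dominate. This is only a sketch: the 120 rps spin rate comes from the slide, while the seek time and per-block transfer time are assumed figures, and the sequential case ignores track and cylinder switches.

```python
# Only the 120 rps spin rate is from the slides; the other figures are assumptions.
ROTATIONS_PER_SEC = 120       # platters spin at, say, 120 rps
AVG_SEEK_MS = 9.0             # assumed average seek time
TRANSFER_MS_PER_BLOCK = 0.1   # assumed transfer time for one block


def avg_rotational_delay_ms(rps: float = ROTATIONS_PER_SEC) -> float:
    """On average the head waits half a rotation for the block."""
    return 0.5 * 1000.0 / rps


def random_read_ms(n_blocks: int) -> float:
    """Every block pays seek + rotational delay + transfer."""
    per_block = AVG_SEEK_MS + avg_rotational_delay_ms() + TRANSFER_MS_PER_BLOCK
    return n_blocks * per_block


def sequential_read_ms(n_blocks: int) -> float:
    """One seek and one rotational delay, then the blocks stream past the head."""
    return AVG_SEEK_MS + avg_rotational_delay_ms() + n_blocks * TRANSFER_MS_PER_BLOCK


if __name__ == "__main__":
    n = 1000
    print(f"random:     {random_read_ms(n):10.1f} ms")
    print(f"sequential: {sequential_read_ms(n):10.1f} ms")
```

Under these assumptions the transfer time is a small fraction of each random access, which is why placing related blocks next to each other pays off.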
Blocks in a file should be arranged sequentially on disk (by "next"), to minimize seek and rotational delay.
For a sequential scan, pre-fetching several pages at a time is a big win!
A request for a sequence of pages is best satisfied by pages stored sequentially on disk!
This is the responsibility of the disk space manager.
Higher levels don't know how this is done, or how free space is managed, though they may make performance assumptions!
Hence the disk space manager should do a decent job.
[Figure: BUFFER POOL in main memory; DB on disk]
Data must be in RAM for DBMS to operate on it!
BufMgr hides the fact that not all data is in RAM.
If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!
An approximation of LRU.
Arrange frames into a cycle, store one reference bit per frame.
  Can think of this as the 2nd chance bit.
When pin count reduces to 0, turn on ref. bit.
When replacement necessary:
  do for each page in cycle {
    if (pincount == 0 && ref bit is on)
      turn off ref bit;                  // 2nd chance
    else if (pincount == 0 && ref bit is off)
      choose this page for replacement;
  } until a page is chosen;
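The loop above is commonly known as the clock (second-chance) policy. A minimal runnable sketch follows; the `Frame` and `ClockReplacer` names, the `hand` pointer, and the two-sweep bound are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    page_id: Optional[int] = None
    pin_count: int = 0
    ref_bit: bool = False


class ClockReplacer:
    """Hypothetical helper: frames in a cycle, one reference bit per frame."""

    def __init__(self, n_frames: int) -> None:
        self.frames: List[Frame] = [Frame() for _ in range(n_frames)]
        self.hand = 0  # current position in the cycle

    def unpin(self, idx: int) -> None:
        frame = self.frames[idx]
        if frame.pin_count <= 0:
            raise ValueError("frame is not pinned")
        frame.pin_count -= 1
        if frame.pin_count == 0:
            frame.ref_bit = True  # pin count reduced to 0: turn on ref bit

    def choose_victim(self) -> int:
        # Two full sweeps suffice: the first may only clear reference bits.
        for _ in range(2 * len(self.frames)):
            idx = self.hand
            frame = self.frames[idx]
            self.hand = (self.hand + 1) % len(self.frames)
            if frame.pin_count == 0:
                if frame.ref_bit:
                    frame.ref_bit = False  # 2nd chance
                else:
                    return idx  # choose this frame for replacement
        raise RuntimeError("all frames are pinned")


if __name__ == "__main__":
    clock = ClockReplacer(3)
    for f in clock.frames:
        f.pin_count = 1
    clock.unpin(0)
    clock.unpin(2)
    print("victim frame:", clock.choose_victim())
```

Pinned frames are skipped entirely, so a frame is evicted only after it has been unpinned and the hand has passed it once without it being referenced again.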
Files of Records
Blocks are the interface for I/O, but higher levels of DBMS operate on records, and files of records.
FILE: A collection of pages, each containing a collection of records. Must support:
  insert/delete/modify record
  fetch a particular record (specified using record id)
  scan all records (possibly with some conditions on the records to be retrieved)
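The three required operations can be sketched as a tiny in-memory file of records. Everything here is an assumption for illustration: the `HeapFile` class, the fixed number of slots per page, and the use of a (page id, slot number) pair as the record id. A real DBMS would go through the buffer manager instead of Python lists.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Rid = Tuple[int, int]         # (page id, slot number): an assumed rid layout
Record = Tuple[object, ...]


class HeapFile:
    """Hypothetical file of records: a list of pages, each a list of slots."""

    def __init__(self, slots_per_page: int = 4) -> None:
        self.slots_per_page = slots_per_page
        self.pages: List[List[Optional[Record]]] = []

    def insert(self, record: Record) -> Rid:
        # Reuse the first free slot; otherwise allocate a new page.
        for page_id, page in enumerate(self.pages):
            for slot, existing in enumerate(page):
                if existing is None:
                    page[slot] = record
                    return (page_id, slot)
        new_page: List[Optional[Record]] = [None] * self.slots_per_page
        new_page[0] = record
        self.pages.append(new_page)
        return (len(self.pages) - 1, 0)

    def get(self, rid: Rid) -> Record:
        record = self.pages[rid[0]][rid[1]]
        if record is None:
            raise KeyError(rid)
        return record

    def modify(self, rid: Rid, record: Record) -> None:
        self.get(rid)  # raises KeyError if the slot is empty
        self.pages[rid[0]][rid[1]] = record

    def delete(self, rid: Rid) -> None:
        self.get(rid)  # raises KeyError if the slot is empty
        self.pages[rid[0]][rid[1]] = None

    def scan(
        self, predicate: Callable[[Record], bool] = lambda r: True
    ) -> Iterator[Tuple[Rid, Record]]:
        # Visit every page and slot, yielding records that satisfy the condition.
        for page_id, page in enumerate(self.pages):
            for slot, record in enumerate(page):
                if record is not None and predicate(record):
                    yield (page_id, slot), record
```

For example, `hf.scan(lambda r: r[1] == "CS")` would yield only the records whose second field is "CS", at the cost of touching every page.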
[Figure: heap file implemented as linked lists of data pages, including a list of full pages]
The header page id and heap file name must be stored someplace: the database catalog.
[Figure: heap file using a page directory: DIRECTORY entries point to Data Page 1 ... Data Page N]
The entry for a page can include the number of free bytes on the page.
The directory is a collection of pages; linked list implementation is just one alternative.
  Much smaller than linked list of all HF pages!
Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g.,
  Find all students in the "CS" department
  Find all students with a gpa > 3
Indexes are file structures that enable us to answer such value-based queries efficiently.
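To make the idea of a value-based lookup concrete, here is a small in-memory sketch of the two example queries. The student data, the dictionary keyed by department, and the sorted list searched with `bisect` are all stand-ins chosen for illustration; they are not the index structures a DBMS would use on disk.

```python
import bisect
from collections import defaultdict

# rid -> (name, dept, gpa); made-up data for illustration
records = {
    (0, 0): ("Ann", "CS", 3.6),
    (0, 1): ("Bob", "EE", 2.9),
    (1, 0): ("Cho", "CS", 3.1),
    (1, 1): ("Dee", "ME", 3.8),
}

# Equality lookups: department value -> list of rids
dept_index = defaultdict(list)
for rid, (_, dept, _) in records.items():
    dept_index[dept].append(rid)

# Range lookups: (gpa, rid) pairs kept sorted by gpa
gpa_index = sorted((gpa, rid) for rid, (_, _, gpa) in records.items())
gpa_keys = [gpa for gpa, _ in gpa_index]


def students_in(dept):
    """Find all students in the given department without scanning every record."""
    return [records[rid] for rid in dept_index.get(dept, [])]


def students_with_gpa_above(threshold):
    """Find all students with gpa strictly greater than threshold."""
    start = bisect.bisect_right(gpa_keys, threshold)
    return [records[rid] for _, rid in gpa_index[start:]]
```

Both lookups return rids first and fetch the records second, which mirrors how an index points into the file rather than replacing it.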
Record format (fixed length):
  Field:   F1  F2  F3  F4
  Length:  L1  L2  L3  L4
  With base address B, the address of F3 = B + L1 + L2.
Information about field types same for all records in a file; stored in system catalogs.
Finding ith field done via arithmetic.
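The arithmetic can be spelled out in a few lines. The field lengths and base address below are assumed values chosen only to make the sums easy to follow; in a real system the lengths would come from the system catalogs.

```python
from typing import Sequence

FIELD_LENGTHS = [4, 20, 8, 2]   # L1..L4 in bytes; assumed values


def field_address(base: int, i: int, lengths: Sequence[int] = FIELD_LENGTHS) -> int:
    """Address of the i-th field (1-based): B + L1 + ... + L(i-1)."""
    if not 1 <= i <= len(lengths):
        raise IndexError(i)
    return base + sum(lengths[: i - 1])


def read_field(record: bytes, i: int, lengths: Sequence[int] = FIELD_LENGTHS) -> bytes:
    """Slice the i-th field out of a record stored as raw bytes."""
    start = field_address(0, i, lengths)
    return record[start : start + lengths[i - 1]]


if __name__ == "__main__":
    # Third field of a record at base address 1000: 1000 + 4 + 20
    print(field_address(1000, 3))
```

No per-record metadata is needed, since every record in the file shares the same lengths.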
Second alternative (array of field offsets) offers direct access to the ith field and efficient storage of nulls (a special "don't know" value); small directory overhead.
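One way to picture the array of field offsets is the sketch below. The layout is an assumption made for illustration: a header of n+1 two-byte offsets, followed by the field bytes, with a null stored as two equal consecutive offsets. A side effect of that simplification is that an empty field cannot be told apart from a null.

```python
import struct
from typing import List, Optional


def encode(fields: List[Optional[bytes]]) -> bytes:
    """Header of n+1 little-endian 2-byte offsets, then the field data."""
    n = len(fields)
    header_size = 2 * (n + 1)
    offsets = [header_size]
    for f in fields:
        offsets.append(offsets[-1] + (len(f) if f is not None else 0))
    header = struct.pack(f"<{n + 1}H", *offsets)
    body = b"".join(f for f in fields if f is not None)
    return header + body


def get_field(record: bytes, i: int) -> Optional[bytes]:
    """Direct access to field i (0-based) using two adjacent offsets."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return record[start:end] if end > start else None  # equal offsets mean null
```

Reading field i touches only two header entries, and a null costs no data bytes at all; the price is the small offset directory in front of each record.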
[Figure: page formats: PACKED page with Slot 1 ... Slot N and the record count N; slot directory with entries N ... 1 and # slots]
Record id = <page id, slot #>.
In the first alternative (PACKED), moving records for free space management changes the rid; this may not be acceptable.
With a SLOT DIRECTORY, records can be moved on the page without changing the rid; so it is attractive for fixed-length records too.
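A minimal sketch of why a slot directory keeps rids stable. The `SlottedPage` class and its methods are hypothetical, and the directory is held as a Python list rather than inside the page bytes, so the space it would occupy on a real page is not accounted for.

```python
from typing import List, Optional, Tuple

Rid = Tuple[int, int]  # (page id, slot #)


class SlottedPage:
    """Hypothetical page: record bytes plus a directory of (offset, length)."""

    def __init__(self, page_id: int, size: int = 64) -> None:
        self.page_id = page_id
        self.data = bytearray(size)
        self.slots: List[Optional[Tuple[int, int]]] = []
        self.free_start = 0  # first unused byte

    def insert(self, record: bytes) -> Rid:
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full (compaction might help)")
        entry = (self.free_start, len(record))
        self.data[entry[0] : entry[0] + entry[1]] = record
        self.free_start += len(record)
        for slot, existing in enumerate(self.slots):
            if existing is None:  # reuse a freed slot number
                self.slots[slot] = entry
                return (self.page_id, slot)
        self.slots.append(entry)
        return (self.page_id, len(self.slots) - 1)

    def delete(self, rid: Rid) -> None:
        self.slots[rid[1]] = None

    def get(self, rid: Rid) -> bytes:
        entry = self.slots[rid[1]]
        if entry is None:
            raise KeyError(rid)
        offset, length = entry
        return bytes(self.data[offset : offset + length])

    def compact(self) -> None:
        """Move live records to the front; only directory offsets change."""
        packed = bytearray(len(self.data))
        pos = 0
        for slot, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            packed[pos : pos + length] = self.data[offset : offset + length]
            self.slots[slot] = (pos, length)
            pos += length
        self.data = packed
        self.free_start = pos
```

After `compact()`, every surviving record sits at a new offset, yet callers holding an rid still find it, because the rid names a slot and not a byte position.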
System Catalogs
For each relation:
  name, file location, file structure (e.g., Heap file)
  attribute name and type, for each attribute
  index name, for each index
  integrity constraints
Example: pg_attribute (a PostgreSQL system catalog)
Summary
Disks provide cheap, non-volatile storage.
  Better random access than tape, worse than RAM.
  Arrange data to minimize seek and rotation delays.
Depends on workload!
Summary (Contd.)
DBMS vs. OS File Support
  DBMS needs non-default features.
  Careful timing of writes, control over prefetch.
Summary (Contd.)
DBMS File tracks collection of pages, records within each.
Pages with free space identified using linked list or directory structure
Indexes support efficient retrieval of records based on the values in some fields.
Catalog relations store information about relations, indexes and views.