Lecture 14

The document discusses database storage concepts including file organization using pages and records, indexing, memory hierarchy, and buffer management. File organization involves storing records in pages with various options like heap files and sorted files. Buffer management caches frequently used pages in memory using policies like LRU to reduce disk I/O.


CSE 412 Database Management

Lecture 14 Database Storage


Jia Zou
Arizona State University

Overview
• File Organization
• Files of records
• Page Formats
• Record Formats
• Indexing
• Memory hierarchy
• Buffer management
Files
• FILE: A collection of pages, each containing a collection of records.
• Must support:
• insert/delete/modify record
• read a particular record (specified using record id)
• scan all records (possibly with some conditions on the records to be retrieved)
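This record-level interface can be sketched in Python. This is a toy illustration, not any real DBMS's code; the names and the 4-records-per-page limit are arbitrary assumptions:

```python
# Toy heap file: a list of pages, each page a dict of record_id -> record.
# All names are illustrative; real pages are fixed-size byte arrays on disk.
class HeapFile:
    def __init__(self):
        self.pages = []
        self.next_rid = 0

    def insert(self, record):
        # Naive: always append to the last page (a real heap file
        # tracks free space, e.g. via a page directory).
        if not self.pages or len(self.pages[-1]) >= 4:  # 4 records/page for demo
            self.pages.append({})
        rid = self.next_rid
        self.next_rid += 1
        self.pages[-1][rid] = record
        return rid

    def read(self, rid):
        # Linear search for the demo; a real rid encodes the page directly.
        for page in self.pages:
            if rid in page:
                return page[rid]
        raise KeyError(rid)

    def delete(self, rid):
        for page in self.pages:
            if rid in page:
                del page[rid]
                return
        raise KeyError(rid)

    def scan(self, predicate=lambda r: True):
        # Full scan with an optional condition on the records retrieved.
        for page in self.pages:
            for rid, rec in page.items():
                if predicate(rec):
                    yield rid, rec
```

Insert, read-by-rid, delete, and conditional scan are exactly the operations the file layer must support.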
Alternative File Organization
• Several options (w/ trade-offs):
• Heap files: suitable when typical access is a file scan retrieving all records.
• Sorted files: covered later.
• Index file organizations: covered later.
Heap File using Lists
• The header page id and heap file name must be stored someplace.
• Each page contains 2 'pointers' plus data.

Any problems?
Heap File Using a Page Directory
• The entry for a page can include the number of free bytes on the page.
• The directory is a collection of pages; linked list implementation is just one alternative.
• Much smaller than linked list of all Heap File pages!
The Problem
• How would you store records on a page/file, such that
• you can point to them
• you can insert/delete records with few disk accesses
Fixed-Length Records
• A Packed approach
Fixed-Length Records
• Insertion?
Fixed-Length Records
• How about deletes?
Fixed-Length Records
• How about deletes?

Bad - we have too much to reorganize/update
Another Solution for Fixed-Length Records
• Slots+Bitmaps

✔ insertions
✔ deletions
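A sketch of the bitmap approach (illustrative; a real page stores the bitmap in a header region of the page itself, not as a Python list):

```python
# Fixed-length record slots managed by a free-slot bitmap.
# Deletion just clears a bit: no data movement, so other record ids stay valid.
class FixedPage:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.bitmap = [False] * num_slots  # True = slot occupied

    def insert(self, record):
        for i, used in enumerate(self.bitmap):
            if not used:                   # first free slot
                self.slots[i] = record
                self.bitmap[i] = True
                return i                   # slot number serves as a stable record id
        raise RuntimeError("page full")

    def delete(self, slot):
        self.bitmap[slot] = False          # slot becomes reusable
```

Note how a later insertion can reuse a freed slot without disturbing any existing record id, which is exactly what the packed approach could not do.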
Variable Length Records
• Slotted Page

• pack them
• keep ptrs to them
• rec-id = <page-id, slot#>
• mark start of free space
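A toy slotted page along these lines (the 256-byte page size is an arbitrary assumption; deletion and free-space compaction are omitted for brevity):

```python
# Slotted page: records packed from the front, a slot directory of
# (offset, length) entries, and a free-space pointer.
class SlottedPage:
    PAGE_SIZE = 256  # assumed size for the demo

    def __init__(self, page_id):
        self.page_id = page_id
        self.data = bytearray(self.PAGE_SIZE)
        self.free_start = 0   # marks start of free space
        self.slots = []       # slot# -> (offset, length)

    def insert(self, record: bytes):
        off = self.free_start
        self.data[off:off + len(record)] = record
        self.free_start += len(record)
        self.slots.append((off, len(record)))
        return (self.page_id, len(self.slots) - 1)  # rec-id = <page-id, slot#>

    def read(self, rid):
        page_id, slot = rid
        off, length = self.slots[slot]
        return bytes(self.data[off:off + length])
```

Because callers hold only `<page-id, slot#>`, records can later be moved within the page (e.g. to compact free space) by updating the slot directory alone.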
Record Formats: Fixed Length
• Information about field types same for all records in a file; stored in
system catalogs.
• Finding the i'th field is done via arithmetic.
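For example, with field sizes taken from a (hypothetical) catalog entry:

```python
# Fixed-length record: field i starts at the sum of the preceding field sizes.
field_sizes = [4, 8, 2, 4]  # bytes per field, as recorded in the system catalog

def offset_of(i):
    return sum(field_sizes[:i])

# Field 2 starts 4 + 8 = 12 bytes into every record of this file.
```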
Record Formats
• Fixed length records: straightforward - store info in catalog
• Variable length records: encode the length of each field?
• Store the length
• Use delimiter
Variable Length Records
• Two alternative formats (# fields is fixed):
• Alternative 1: store each field's length or a delimiter along with the data
• Alternative 2: keep an array of field offsets at the start of the record

Pros and Cons?

The offset-array format is more popular: it gives direct access to the i'th field without scanning the earlier fields.
Overview
• File Organization
• Files of records
• Page Formats
• Record Formats
• Indexing
• Memory hierarchy
• Buffer management
41
The Storage Hierarchy
• Main memory (RAM) for currently used data.
• Disk for the main database (secondary storage).
• Tapes for archiving older versions of the data (tertiary storage).
Overview
• File Organization
• Files of records
• Page Formats
• Record Formats
• Indexing
• Memory hierarchy
• Buffer management
Motivation of Initial RDBMS Architecture: Disk is Relatively VERY Slow
• READ: disk -> main memory (RAM)
• WRITE: main memory (RAM) -> disk
• Both are high-cost operations, relative to in-memory operations, so must be planned carefully
Rules of Thumb
• Memory access is much faster than disk I/O (~1000x)
• "Sequential" I/O is faster than "random" I/O (~10x)
• seek time: moving arms to position disk head on track (dominating)
• rotational delay: waiting for block to rotate under head (dominating)
• transfer time: actually moving data to/from disk surface
• SSD?
• Similar sequential and random performance
• Reading is much faster than writing
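These rules of thumb follow from a back-of-envelope calculation. The timings below are illustrative assumptions, not measurements of any particular disk:

```python
# Assumed timings: 10 ms seek, 4 ms rotational delay, 0.1 ms per-page transfer.
seek_ms, rotational_ms, transfer_ms = 10.0, 4.0, 0.1

pages = 1000
# Random I/O: pay seek + rotational delay for every page.
random_ms = pages * (seek_ms + rotational_ms + transfer_ms)
# Sequential I/O: position the head once, then stream the pages.
sequential_ms = seek_ms + rotational_ms + pages * transfer_ms

print(random_ms, sequential_ms)  # seek + rotation dominate the random case
```

With these numbers the positioning cost, not the transfer, accounts for nearly all of the random-access time, which is why DBMSs try hard to lay out related pages sequentially.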
Disk Arrays: RAID (Redundant Array of Inexpensive Disks)
• Redundancy across disks improves the mean time to failure.
Why not store it all in memory?
• Costs too much.
• disk: ~$0.1/GB; memory: ~$5-10/GB
• High-end Databases today in the 10-100 TB range
• Approx 60% of the cost of a production system is in the disks
• Main memory is volatile.
• Note: some specialized systems do store entire database in main
memory.
Can we leverage OS for DB storage
management?

OS virtual memory
OS file system
Can we leverage OS for DB storage
management?
• Unfortunately, OS often gets in the way of DBMS
• DBMS needs to do things “its own way”
• Control over buffer replacement policy
• LRU not always best (sometimes worst!)
• Control over flushing data to disk
• Write-ahead logging (WAL) protocol requires flushing log entries to disk
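The WAL requirement can be stated as a simple invariant: before a dirty page goes to disk, the log must be flushed at least up to that page's last log record. A minimal sketch, with all names illustrative:

```python
# Toy write-ahead log: appending returns an LSN (log sequence number);
# the WAL rule says flushed_lsn must reach a page's LSN before the page
# itself may be written out.
class Log:
    def __init__(self):
        self.records = []
        self.flushed_lsn = -1

    def append(self, rec):
        self.records.append(rec)
        return len(self.records) - 1  # this record's LSN

    def flush_to(self, lsn):
        self.flushed_lsn = max(self.flushed_lsn, lsn)

def write_page(page, log):
    # WAL: log first, then data.
    if log.flushed_lsn < page["page_lsn"]:
        log.flush_to(page["page_lsn"])
    # ... now it is safe to write the page to disk
```

This is why the DBMS needs control over flushing: the OS cannot know that a particular data page must wait for particular log bytes.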
Overview
• File Organization
• Files of records
• Page Formats
• Record Formats
• Indexing
• Memory hierarchy
• Buffer management
Organize Disk Space into Pages
• A table is stored as one or more files, a file contains one or more
pages
• Higher levels call upon this layer to:
• allocate/de-allocate a page
• read/write a page
• Best if requested pages are stored sequentially on disk! Higher levels don't need to know if/how this is done, nor how free space is managed.
Buffer Management
• Each frame in the buffer pool is either pinned (in use) or unpinned.
Buffer Management
• Data must be in RAM for DBMS to operate on it!
• Buffer Mgr hides the fact that not all data is in RAM
When a Page is Requested ...
• Look the page up in the buffer pool information table: <frame#, pageId, pin count, dirty bit>
• If the requested page is in the pool: pin it and return its address
• If the requested page is not in the pool and the pool is not full:
• Read requested page into a free frame
• Pin the page and return its address
• If the requested page is not in the pool and the pool is full:
• Choose an (un-pinned) frame for replacement
• If the frame is "dirty", write it to disk
• Read requested page into chosen frame
• Pin the page and return its address
• Unpin the page when you finish using it


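The request logic above can be sketched as follows (an illustrative toy that ignores concurrency; the replacement choice here is simply the first unpinned frame, standing in for a real policy):

```python
# Toy buffer manager: frames map page_id -> {pin count, dirty bit},
# io_count tallies disk reads and dirty-page writes.
class BufferManager:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk          # stand-in for disk: page_id -> contents
        self.frames = {}          # page_id -> {"pin": n, "dirty": bool}
        self.io_count = 0

    def request(self, page_id):
        if page_id not in self.frames:                    # not in pool
            if len(self.frames) >= self.num_frames:       # pool full
                # choose an un-pinned frame for replacement
                victim = next(p for p, f in self.frames.items() if f["pin"] == 0)
                if self.frames[victim]["dirty"]:
                    self.io_count += 1                    # write dirty victim out
                del self.frames[victim]
            self.frames[page_id] = {"pin": 0, "dirty": False}
            self.io_count += 1                            # read the page in
        self.frames[page_id]["pin"] += 1                  # pin it
        return self.disk[page_id]

    def release(self, page_id, dirty=False):
        self.frames[page_id]["pin"] -= 1                  # unpin when finished
        self.frames[page_id]["dirty"] |= dirty
```

A request for a page already in the pool costs no I/O, which is the entire point of the buffer pool.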
Buffer Replacement Policy
• Frame is chosen for replacement by a replacement policy:
• Least-recently-used (LRU), MRU, Clock, etc.
• Policy -> big impact on # of I/Os; depends on the access pattern
LRU Replacement Policy
• Least Recently Used (LRU)
• for each page in buffer pool, keep track of time last unpinned
• replace the frame which has the oldest (earliest) time
• very common policy: intuitive and simple
• Problems?
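The LRU bookkeeping described above can be sketched with an ordered map (a hypothetical helper, not any particular DBMS's code):

```python
from collections import OrderedDict

# LRU bookkeeping: pages are re-inserted at the back on every unpin,
# so the front of the map always holds the oldest (earliest-unpinned) page.
class LRU:
    def __init__(self):
        self.order = OrderedDict()

    def touch(self, page):
        # Called when a page is unpinned: record it as most recently used.
        self.order.pop(page, None)
        self.order[page] = True

    def victim(self):
        # Replace the page with the oldest unpin time.
        page, _ = self.order.popitem(last=False)
        return page
```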
LRU Replacement Policy
• Problem: Sequential Flooding
• LRU + repeated sequential scans.
• # buffer frames < # pages in file means each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
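A small simulation makes the effect concrete. The sizes (5 pages, 4 frames, 3 scans) are assumed for illustration: under LRU every access misses, while MRU keeps most of the file resident after the first scan:

```python
# Repeated sequential scans of num_pages pages through num_frames < num_pages
# frames. LRU evicts the least recently used page, MRU the most recent.
def simulate(policy, num_pages=5, num_frames=4, scans=3):
    buffer, recency, misses = set(), [], 0  # recency: least -> most recent
    for _ in range(scans):
        for page in range(num_pages):
            if page in buffer:
                recency.remove(page)
            else:
                misses += 1
                if len(buffer) >= num_frames:
                    victim = recency.pop(0) if policy == "LRU" else recency.pop()
                    buffer.remove(victim)
                buffer.add(page)
            recency.append(page)  # page is now the most recently used
    return misses

print(simulate("LRU"), simulate("MRU"))
```

With LRU the page about to be needed is always the one just evicted, so all 15 accesses miss; MRU sacrifices the most recent page instead and misses only 7 times.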
Sequential Flooding – Illustration
• How does LRU work?
• How will MRU work?
Advanced Paging Algorithms
• Greedy-dual
• Locality Set
• Clock
Summary
• Buffer manager brings pages into RAM.
• Very important for performance
• Page stays in RAM until released by requestor.
• Written to disk when frame is chosen for replacement (which is sometime after the requestor releases the page).
• Choice of frame to replace based on replacement policy.
Conclusions
• Memory hierarchy
• Disks (>1000x slower than RAM), thus:
• pack info in blocks
• try to fetch nearby blocks (sequentially)
• Buffer management: very important
• LRU, MRU, etc
• Record organization: Slotted page
