04 Storage2

The document outlines the course structure for Database Systems (15-445/645) taught by Prof. Andy Pavlo in Fall 2024, including important dates for homework and projects. It discusses various database storage techniques, particularly focusing on tuple organization, slotted pages, and log-structured storage. Additionally, it highlights upcoming database talks and events, as well as the challenges associated with tuple-oriented storage.

Database Systems
Database Storage: Tuple Organization
15-445/645 FALL 2024 PROF. ANDY PAVLO

ADMINISTRIVIA
Homework #1 is due September 8th @ 11:59pm

Project #0 is due September 8th @ 11:59pm

Project #1 will be released on September 10th

15-445/645 (Fall 2024)

UPCOMING DATABASE TALKS

Databricks
→ Tuesday Sept 10th @ 6:00pm
→ GHC 4401

Snowflake
→ Thursday Sept 12th @ 12:00pm
→ GHC 9115

Apache DataFusion (DB Seminar)
→ Monday Sept 23rd @ 4:30pm
→ Zoom


UPCOMING DATABASE EVENTS

CMU-DB Industry Affiliates Retreat
→ Monday Sept 16th: Research Talks + Poster Session
→ Tuesday Sept 17th: Company Info Sessions
→ All events are open to the public.

Sign-up for Company Info Sessions (@61)

Add your Resume if You Want to Make $$$ (@92)


LAST CLASS
We presented a disk-oriented architecture where the DBMS assumes that the primary storage location of the database is on non-volatile disk.

We then discussed a page-oriented storage scheme for organizing tuples across heap files.


SLOTTED PAGES
The most common layout scheme is called slotted pages.

The slot array maps "slots" to the tuples' starting position offsets.

The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.

[Figure: slotted page layout showing the Header, the Slot Array (slots 1 to 7), and the Fixed- and Var-length Tuple Data region holding Tuple #1 through Tuple #4.]
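To make the layout concrete, here is a minimal sketch of a slotted page in Python. It assumes a 4096-byte page, a header of two 16-bit fields, and slot entries of (offset, length); none of these sizes come from the slides, and the class and method names are hypothetical. The slot array grows forward from the header while tuple data is written backward from the end of the page.

```python
import struct

PAGE_SIZE = 4096      # assumed page size; the slides do not fix one
HEADER_FMT = "<HH"    # (slots allocated, start offset of the last tuple written)
SLOT_FMT = "<HH"      # (tuple offset, tuple length); offset 0 marks an empty slot
HEADER_SIZE = struct.calcsize(HEADER_FMT)
SLOT_SIZE = struct.calcsize(SLOT_FMT)


class SlottedPage:
    """Hypothetical slotted page: header, then slot array, tuples at the end."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)
        self._write_header(0, PAGE_SIZE)

    def _read_header(self):
        return struct.unpack_from(HEADER_FMT, self.buf, 0)

    def _write_header(self, num_slots, last_offset):
        struct.pack_into(HEADER_FMT, self.buf, 0, num_slots, last_offset)

    def _slot_pos(self, slot_id):
        return HEADER_SIZE + slot_id * SLOT_SIZE

    def insert(self, data):
        """Store a tuple and return its slot number, or None if it does not fit."""
        num_slots, last_offset = self._read_header()
        slot_array_end = self._slot_pos(num_slots + 1)
        new_offset = last_offset - len(data)
        if new_offset < slot_array_end:
            return None  # slot array and tuple data would collide
        self.buf[new_offset:last_offset] = data
        struct.pack_into(SLOT_FMT, self.buf, self._slot_pos(num_slots),
                         new_offset, len(data))
        self._write_header(num_slots + 1, new_offset)
        return num_slots

    def get(self, slot_id):
        """Return the tuple bytes for a slot, or None if the slot is empty."""
        num_slots, _ = self._read_header()
        if not 0 <= slot_id < num_slots:
            return None
        offset, length = struct.unpack_from(SLOT_FMT, self.buf,
                                            self._slot_pos(slot_id))
        if offset == 0:
            return None
        return bytes(self.buf[offset:offset + length])

    def delete(self, slot_id):
        """Clear the slot; the tuple bytes stay behind as unusable space."""
        num_slots, _ = self._read_header()
        if 0 <= slot_id < num_slots:
            struct.pack_into(SLOT_FMT, self.buf, self._slot_pos(slot_id), 0, 0)


page = SlottedPage()
s0 = page.insert(b"tuple-1")
s1 = page.insert(b"tuple-2")
page.delete(s0)
```

Because readers reach a tuple through its slot, the page could later move tuples around internally (for example, to compact free space) without changing slot numbers. This sketch does not compact, so deleted space is simply lost.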

SLOTTED PAGES
[Figure: the slotted page from the previous slide with Tuple #3 no longer present; Tuple #4, Tuple #2, and Tuple #1 remain.]

RECORD IDS
The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
→ File Id, Page Id, Slot #
→ Most DBMSs do not store ids in tuple.
→ SQLite uses ROWID as the true primary key and stores it as a hidden attribute.

Applications should never rely on these IDs to mean anything.

Record id sizes by system:
→ PostgreSQL: CTID (6-bytes)
→ SQLite: ROWID (8-bytes)
→ SQL Server: %%physloc%% (8-bytes)
→ Oracle: ROWID (10-bytes)
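As an illustration of how a record id can encode a physical location, the sketch below packs a page id and a slot number into 6 bytes. The 4-byte/2-byte split and the `RecordId` name are assumptions chosen to match the 6-byte size listed above; this is not a description of any particular system's on-disk format.

```python
import struct
from typing import NamedTuple


class RecordId(NamedTuple):
    """Hypothetical record id: (page id, slot number)."""
    page_id: int   # which page holds the tuple
    slot: int      # index into that page's slot array

    def pack(self) -> bytes:
        # Assumed layout: 4-byte unsigned page id + 2-byte unsigned slot = 6 bytes.
        return struct.pack(">IH", self.page_id, self.slot)

    @classmethod
    def unpack(cls, raw: bytes) -> "RecordId":
        return cls(*struct.unpack(">IH", raw))


rid = RecordId(page_id=42, slot=3)
raw = rid.pack()
```

The id only says where the tuple currently lives. If the tuple moves to another page, its id changes, which is one reason applications should not attach meaning to these values.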

TUPLE-ORIENTED STORAGE
Insert a new tuple:
→ Check page directory to find a page with a free slot.
→ Retrieve the page from disk (if not in memory).
→ Check slot array to find empty space in page that will fit.

Update an existing tuple using its record id:
→ Check page directory to find location of page.
→ Retrieve the page from disk (if not in memory).
→ Find offset in page using slot array.
→ If new data fits, overwrite existing data. Otherwise, mark existing tuple as deleted and insert new version in a different page.
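The insert and update steps above can be sketched as follows. This is a simplified in-memory model, not a real storage engine: pages are plain Python objects, the "page directory" is a linear scan, fetching from disk is omitted, and `PAGE_CAPACITY`, `Page`, and `HeapFile` are hypothetical names.

```python
PAGE_CAPACITY = 64  # bytes of tuple space per page; tiny on purpose (assumed)


class Page:
    def __init__(self):
        self.slots = []            # slot number -> tuple bytes, or None if deleted
        self.free = PAGE_CAPACITY  # bytes not yet handed out to any tuple


class HeapFile:
    def __init__(self):
        self.pages = []  # stands in for pages on disk or in memory

    def _find_page(self, size, exclude=None):
        # Page directory lookup: first page with enough free space.
        for page_id, page in enumerate(self.pages):
            if page_id != exclude and page.free >= size:
                return page_id
        self.pages.append(Page())  # no page fits, so allocate a new one
        return len(self.pages) - 1

    def insert(self, data, exclude=None):
        """Insert a tuple and return its record id as (page_id, slot)."""
        if len(data) > PAGE_CAPACITY:
            raise ValueError("tuple larger than a page")
        page_id = self._find_page(len(data), exclude)
        page = self.pages[page_id]
        page.slots.append(data)
        page.free -= len(data)
        return (page_id, len(page.slots) - 1)

    def update(self, rid, data):
        """Update the tuple at rid and return its (possibly new) record id."""
        page_id, slot = rid
        page = self.pages[page_id]
        old = page.slots[slot]
        if old is None:
            raise KeyError("tuple was deleted")
        if len(data) <= len(old):
            page.slots[slot] = data   # new data fits: overwrite in place
            return rid
        page.slots[slot] = None       # mark the old version as deleted
        return self.insert(data, exclude=page_id)  # new version on another page


heap = HeapFile()
rid = heap.insert(b"alice,20")
same = heap.update(rid, b"bob,21")                 # shorter: stays in place
moved = heap.update(rid, b"alice,20,pittsburgh")   # longer: moves to a new page
```

Note that the space left behind by the shrunken and deleted tuples is never handed out again in this sketch; that wasted space is the fragmentation the next slide describes.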


TUPLE-ORIENTED STORAGE
Problem #1: Fragmentation
→ Pages are not fully utilized (unusable space, empty slots).
Problem #2: Useless Disk I/O
→ DBMS must fetch entire page to update one tuple.
Problem #3: Random Disk I/O
→ Worst-case scenario when updating multiple tuples is that
each tuple is on a separate page.
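A rough way to see Problems #2 and #3 is to count pages fetched against bytes actually changed. The sketch below assumes 4096-byte pages and 100-byte tuples; both numbers and the `update_cost` helper are illustrative assumptions, not figures from the lecture.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def update_cost(record_ids, tuple_size):
    """Return (pages fetched, bytes fetched, bytes actually changed)."""
    pages = {page_id for page_id, _slot in record_ids}
    return len(pages), len(pages) * PAGE_SIZE, len(record_ids) * tuple_size


clustered = [(7, slot) for slot in range(10)]   # ten tuples on one page
scattered = [(page, 0) for page in range(10)]   # ten tuples on ten pages

print(update_cost(clustered, 100))  # (1, 4096, 1000)
print(update_cost(scattered, 100))  # (10, 40960, 1000)
```

Both workloads change the same 1000 bytes, but the scattered one fetches ten separate pages to do it.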

What if the DBMS cannot overwrite data in pages and could only create new pages?
→ Examples: Some object stores, HDFS, Google Colossus


TODAY'S AGENDA
Log-Structured Storage
Index-Organized Storage
Data Representation


LOG-STRUCTURED STORAGE
Instead of storing tuples in pages and updating them in-place, the DBMS maintains a log that records changes to tuples.
→ Each log entry represents a tuple PUT/DELETE operation.
→ Originally proposed as log-structured merge trees (LSM Trees) in 1996.

The DBMS applies changes to an in-memory data structure (MemTable) and then writes out the changes sequentially to disk (SSTable).
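A minimal sketch of this write path, under stated assumptions: the MemTable is a plain dictionary, an SSTable is modeled as an immutable key-sorted tuple kept in a list rather than a real file, and a flush is triggered after three distinct keys. The class name, the `memtable_limit` parameter, and the `TOMBSTONE` marker are all hypothetical.

```python
TOMBSTONE = object()  # marker recorded in place of a value for a DELETE


class LogStructuredStore:
    def __init__(self, memtable_limit=3):
        self.memtable = {}   # key -> latest value (or TOMBSTONE)
        self.sstables = []   # flushed runs, oldest first; stands in for disk
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # a newer PUT replaces the older entry
        self._maybe_flush()

    def delete(self, key):
        self.memtable[key] = TOMBSTONE  # record the delete instead of erasing
        self._maybe_flush()

    def _maybe_flush(self):
        if len(self.memtable) >= self.memtable_limit:
            # Write the entries out in key order as one immutable run.
            run = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
            self.sstables.append(run)
            self.memtable = {}


store = LogStructuredStore()
store.put("key101", "a1")
store.put("key102", "b1")
store.put("key101", "a2")   # overwrites a1 in the MemTable
store.put("key103", "c1")   # third distinct key triggers a flush
store.delete("key102")      # lands in the new, empty MemTable
```

After the flush, the run on "disk" holds key101→a2, key102→b1, key103→c1, and it is never modified again; the later delete of key102 exists only as a newer entry in the MemTable.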


LOG-STRUCTURED STORAGE
[Figure: a MemTable in Memory above the Disk. The frames apply these operations to the MemTable in order:]
→ PUT (key101,a1)
→ PUT (key102,b1)
→ PUT (key101,a2)
→ PUT (key103,c1)
[Final frame: the MemTable and an SSTable, with the entries PUT (key101,a2), PUT (key102,b1), PUT (key103,c1).]