Database Systems
Database Storage: Tuple Organization
15-445/645 FALL 2024 PROF. ANDY PAVLO

ADMINISTRIVIA
Homework #1 is due September 8th @ 11:59pm

Project #0 is due September 8th @ 11:59pm

Project #1 will be released on September 10th

UPCOMING DATABASE TALKS

Databricks
→ Tuesday Sept 10th @ 6:00pm
→ GHC 4401

Snowflake
→ Thursday Sept 12th @ 12:00pm
→ GHC 9115

Apache DataFusion (DB Seminar)
→ Monday Sept 23rd @ 4:30pm
→ Zoom

UPCOMING DATABASE EVENTS

CMU-DB Industry Affiliates Retreat
→ Monday Sept 16th: Research Talks + Poster Session
→ Tuesday Sept 17th: Company Info Sessions
→ All events are open to the public.

Sign up for Company Info Sessions (@61)

Add your Resume if You Want to Make $$$ (@92)

LAST CLASS
We presented a disk-oriented architecture where
the DBMS assumes that the primary storage
location of the database is on non-volatile disk.

We then discussed a page-oriented storage scheme for organizing tuples across heap files.

SLOTTED PAGES
The most common layout scheme is called slotted pages.

The slot array maps "slots" to the tuples' starting position offsets.

The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.

[Figure: a page containing a Header, a Slot Array (slots 1 to 7), and Fixed- and Var-length Tuple Data (Tuple #1 to Tuple #4); later builds show the page after Tuple #3 is deleted.]
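To make the slot array concrete, here is a minimal in-memory sketch of a slotted page. It assumes a 4 KB page, an 8-byte header, and 4-byte slot entries (a 2-byte offset plus a 2-byte length); `SlottedPage`, its methods, and these sizes are hypothetical, not any particular DBMS's on-disk format. For brevity the header fields and slot entries live in C++ members, but their sizes are still charged against the page's free space, so the slot array and the tuple data grow toward each other as in the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical slotted page: the slot array grows from the front of the page,
// tuple data grows backward from the end, and free space sits in between.
class SlottedPage {
 public:
  static constexpr std::size_t kPageSize = 4096;  // assumed page size
  static constexpr std::size_t kHeaderSize = 8;   // assumed header size
  static constexpr std::size_t kSlotSize = 4;     // 2-byte offset + 2-byte length

  // Copies the tuple into the page; returns its slot number, or nullopt if
  // the tuple plus a new slot entry does not fit in the remaining free space.
  std::optional<uint16_t> Insert(const std::string& tuple) {
    if (tuple.size() + kSlotSize > FreeSpace()) return std::nullopt;
    data_start_ -= tuple.size();
    std::memcpy(bytes_ + data_start_, tuple.data(), tuple.size());
    slots_.push_back({static_cast<uint16_t>(data_start_),
                      static_cast<uint16_t>(tuple.size()), true});
    return static_cast<uint16_t>(slots_.size() - 1);
  }

  // Uses the slot array to find the tuple's offset within the page.
  std::optional<std::string> Get(uint16_t slot) const {
    if (slot >= slots_.size() || !slots_[slot].live) return std::nullopt;
    const Slot& s = slots_[slot];
    return std::string(reinterpret_cast<const char*>(bytes_ + s.offset), s.length);
  }

  // Marks the slot as unused; the tuple's bytes are not reclaimed here.
  bool Delete(uint16_t slot) {
    if (slot >= slots_.size() || !slots_[slot].live) return false;
    slots_[slot].live = false;
    return true;
  }

  // Bytes between the end of the slot array and the start of tuple data.
  std::size_t FreeSpace() const {
    return data_start_ - (kHeaderSize + slots_.size() * kSlotSize);
  }

 private:
  struct Slot {
    uint16_t offset;  // where the tuple starts in bytes_
    uint16_t length;  // tuple length in bytes
    bool live;        // false once the tuple is deleted
  };

  uint8_t bytes_[kPageSize] = {};
  std::size_t data_start_ = kPageSize;  // header field: start of the last tuple placed
  std::vector<Slot> slots_;             // header field "# of used slots" == slots_.size()
};
```

Deleting a tuple here only marks its slot as unused; reclaiming the bytes would be the job of a later compaction step, which the sketch leaves out.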

RECORD IDS
The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
→ File Id, Page Id, Slot #
→ Most DBMSs do not store ids in the tuple.
→ SQLite uses ROWID as the true primary key and stores it as a hidden attribute.

Applications should never rely on these IDs to mean anything.

[Figure: record identifier sizes by system: PostgreSQL CTID (6 bytes), SQLite ROWID (8 bytes), SQL Server %%physloc%% (8 bytes), Oracle ROWID (10 bytes).]
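A record id is just a few integers packed together. The sketch below assumes a 32-bit page id and a 16-bit slot number, which together fit in 6 bytes, similar in spirit to the (block, offset) pair behind a Postgres CTID; the file id is assumed to be implied by the table. `RecordId`, `Pack`, and `Unpack` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record id: 32-bit page id + 16-bit slot number (48 bits total).
struct RecordId {
  uint32_t page_id;
  uint16_t slot;
};

// Packs the record id into the low 48 bits of a 64-bit integer:
// bits 16..47 hold the page id and bits 0..15 hold the slot number.
constexpr uint64_t Pack(RecordId rid) {
  return (static_cast<uint64_t>(rid.page_id) << 16) | rid.slot;
}

// Inverse of Pack.
constexpr RecordId Unpack(uint64_t packed) {
  return RecordId{static_cast<uint32_t>(packed >> 16),
                  static_cast<uint16_t>(packed & 0xFFFF)};
}

// A page id and slot survive a round trip (checked at compile time).
static_assert(Unpack(Pack(RecordId{42, 7})).page_id == 42, "page id round-trips");
static_assert(Unpack(Pack(RecordId{42, 7})).slot == 7, "slot round-trips");
```

Because the id encodes a physical location, it changes whenever the tuple moves to another page or slot, which is one reason applications should not treat it as meaningful.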

TUPLE-ORIENTED STORAGE
Insert a new tuple:
→ Check page directory to find a page with a free slot.
→ Retrieve the page from disk (if not in memory).
→ Check slot array to find empty space in page that will fit.

Update an existing tuple using its record id:
→ Check page directory to find location of page.
→ Retrieve the page from disk (if not in memory).
→ Find offset in page using slot array.
→ If new data fits, overwrite existing data. Otherwise, mark existing tuple as deleted and insert new version in a different page.
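The insert and update steps above can be sketched with a toy heap file. Assumptions: each page has a fixed byte budget for tuple data, the vector of pages doubles as the page directory, disk I/O is not modeled, and "fits" is taken to mean the new version fits in the page's free space plus the space held by the old version. `HeapFile`, `RecordId`, and their methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record id: page number plus slot number within that page.
struct RecordId {
  std::size_t page;
  std::size_t slot;
};

// Toy heap file. Each page has a fixed byte budget for tuple data, and the
// vector of pages doubles as the page directory. Disk I/O is not modeled.
class HeapFile {
 public:
  explicit HeapFile(std::size_t page_capacity) : capacity_(page_capacity) {}

  RecordId Insert(const std::string& tuple) {
    assert(tuple.size() <= capacity_ && "tuple must fit in an empty page");
    // Check the page directory for a page with enough free space.
    for (std::size_t p = 0; p < pages_.size(); ++p) {
      if (pages_[p].free >= tuple.size()) return Place(p, tuple);
    }
    pages_.push_back(Page{capacity_, {}});  // none found: allocate a new page
    return Place(pages_.size() - 1, tuple);
  }

  RecordId Update(RecordId rid, const std::string& tuple) {
    Page& page = pages_[rid.page];
    assert(page.slots[rid.slot].has_value() && "cannot update a deleted tuple");
    std::string& old = *page.slots[rid.slot];
    if (tuple.size() <= old.size() + page.free) {
      page.free = page.free + old.size() - tuple.size();  // overwrite in place
      old = tuple;
      return rid;
    }
    // Does not fit: mark the old version deleted and insert the new version
    // in a different page (the old bytes are reclaimed later, not here).
    page.slots[rid.slot].reset();
    return Insert(tuple);
  }

  std::optional<std::string> Get(RecordId rid) const {
    if (rid.page >= pages_.size() || rid.slot >= pages_[rid.page].slots.size()) {
      return std::nullopt;
    }
    return pages_[rid.page].slots[rid.slot];  // nullopt if deleted
  }

 private:
  struct Page {
    std::size_t free;                               // unused bytes
    std::vector<std::optional<std::string>> slots;  // nullopt == deleted
  };

  RecordId Place(std::size_t p, const std::string& tuple) {
    pages_[p].free -= tuple.size();
    pages_[p].slots.push_back(tuple);
    return RecordId{p, pages_[p].slots.size() - 1};
  }

  std::size_t capacity_;
  std::vector<Page> pages_;
};
```

Note how a growing tuple changes its record id once it moves, and how the deleted version leaves unusable space behind in its old page: the fragmentation problem described next.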

TUPLE-ORIENTED STORAGE
Problem #1: Fragmentation
→ Pages are not fully utilized (unusable space, empty slots).
Problem #2: Useless Disk I/O
→ DBMS must fetch entire page to update one tuple.
Problem #3: Random Disk I/O
→ Worst-case scenario when updating multiple tuples is that each tuple is on a separate page.

What if the DBMS could not overwrite data in pages and could only create new pages?
→ Examples: Some object stores, HDFS, Google Colossus

TODAY'S AGENDA
Log-Structured Storage
Index-Organized Storage
Data Representation

LOG-STRUCTURED STORAGE
Instead of storing tuples in pages and updating them in-place, the DBMS maintains a log that records changes to tuples.
→ Each log entry represents a tuple PUT/DELETE operation.
→ Originally proposed as log-structured merge trees (LSM Trees) in 1996.

The DBMS applies changes to an in-memory data structure (MemTable) and then writes out the changes sequentially to disk (SSTable).
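A minimal sketch of this write path, assuming string keys and values. An ordered `std::map` stands in for the MemTable (real systems typically use a skip list or another sorted in-memory structure), and a DELETE is recorded as a tombstone with no value. `MemTable`, `Record`, and `SSTable` are hypothetical names, and writing the SSTable to an actual file is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One log record: a PUT carries the tuple contents; a DELETE has no value.
struct Record {
  std::string key;
  std::optional<std::string> value;  // nullopt == DELETE (tombstone)
};

using SSTable = std::vector<Record>;  // immutable, sorted by key (low to high)

class MemTable {
 public:
  void Put(const std::string& key, const std::string& value) { entries_[key] = value; }
  void Delete(const std::string& key) { entries_[key] = std::nullopt; }
  std::size_t Size() const { return entries_.size(); }

  // Writes the MemTable out as a new SSTable and resets it. Because std::map
  // iterates in key order, the records come out sorted and can be written to
  // disk in one sequential pass; only the latest change per key survives.
  SSTable Flush() {
    SSTable out;
    out.reserve(entries_.size());
    for (const auto& [key, value] : entries_) out.push_back({key, value});
    entries_.clear();
    return out;
  }

 private:
  std::map<std::string, std::optional<std::string>> entries_;
};
```

Replaying the figure's operations (PUT key101 a1, PUT key102 b1, PUT key101 a2, PUT key103 c1) and flushing yields an SSTable with key101 → a2, key102 → b1, key103 → c1, in key order.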

LOG-STRUCTURED STORAGE
[Figure, built up over several slides: PUT (key101,a1), PUT (key102,b1), PUT (key101,a2), and PUT (key103,c1) are applied to the MemTable in memory. The MemTable is then written to disk as an SSTable sorted by key (Key Low→High) that holds only the latest entry per key: PUT (key101,a2), PUT (key102,b1), PUT (key103,c1). On disk, SSTables accumulate in Level #0 (ordered Newest→Oldest) and are merged into SSTables in Level #1 and later Level #2. For GET (key101), the DBMS consults an in-memory SummaryTable that holds the Min/Max Key per SSTable and a Key Filter per level.]
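The GET path in the figure can be sketched as follows, assuming each SSTable's min/max key is kept in memory as its summary entry. The per-level key filter (often a Bloom filter) is omitted, and `Get`, `SSTable`, `Record`, and `MemTable` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One log record; value == nullopt marks a DELETE (tombstone).
struct Record {
  std::string key;
  std::optional<std::string> value;
};

// An SSTable's records (sorted by key) plus its in-memory summary entry.
struct SSTable {
  std::vector<Record> records;
  std::string min_key;
  std::string max_key;
};

using MemTable = std::map<std::string, std::optional<std::string>>;

// Returns the latest value for `key`, or nullopt if it is absent or deleted.
std::optional<std::string> Get(const MemTable& memtable,
                               const std::vector<SSTable>& newest_first,
                               const std::string& key) {
  // 1. The MemTable holds the most recent changes.
  if (auto it = memtable.find(key); it != memtable.end()) return it->second;

  // 2. Probe SSTables from newest to oldest; the first match wins.
  for (const SSTable& table : newest_first) {
    // The summary's min/max key lets us skip this SSTable without reading it.
    if (key < table.min_key || key > table.max_key) continue;
    auto it = std::lower_bound(
        table.records.begin(), table.records.end(), key,
        [](const Record& r, const std::string& k) { return r.key < k; });
    if (it != table.records.end() && it->key == key) {
      return it->value;  // a tombstone returns nullopt and ends the search
    }
  }
  return std::nullopt;
}
```

A tombstone in a newer SSTable must stop the search; otherwise an older PUT for the same key would resurrect a deleted tuple.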

LOG-STRUCTURED STORAGE
Key-value storage that appends log records on disk to represent changes to tuples (PUT, DELETE).
→ Each log record must contain the tuple's unique identifier.
→ Put records contain the tuple contents.
→ Deletes mark the tuple as deleted.

As the application makes changes to the database, the DBMS appends log records to the end of the file without checking previous log records.

[Figure: an SSTable sorted by key (Key Low→High): DEL (key100), PUT (key101,a3), PUT (key102,b2), PUT (key103,c1)]

LOG-STRUCTURED COMPACTION
Periodically compact SSTables to reduce wasted space and speed up reads.
→ Only keep the "latest" values for each key using a sort-merge algorithm.

[Figure: two SSTables, ordered Newest→Oldest, are merged into one.
Newer SSTable: DEL (key100), PUT (key101,a3), PUT (key102,b2), PUT (key103,c1)
Older SSTable: PUT (key101,a2), PUT (key102,b1), DEL (key103), PUT (key104,d2)
Merged SSTable: DEL (key100), PUT (key101,a3), PUT (key102,b2), PUT (key103,c1), PUT (key104,d2)]
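Compaction can be sketched as a two-way sort-merge of sorted SSTables, assuming each SSTable holds at most one record per key and the caller says which input is newer. `Compact`, `Record`, and `SSTable` are hypothetical names. Tombstones are kept in the output because an older SSTable outside this merge may still contain the key; a real system can drop them only once no older data for that key remains.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Record {
  std::string key;
  std::optional<std::string> value;  // nullopt == DEL (tombstone)
};

using SSTable = std::vector<Record>;  // sorted by key, one record per key

// Sort-merges two SSTables. For a key present in both, only the record from
// the newer SSTable is kept. Tombstones are kept as well, because an older
// SSTable that is not part of this merge may still hold the key.
SSTable Compact(const SSTable& newer, const SSTable& older) {
  SSTable out;
  std::size_t i = 0, j = 0;
  while (i < newer.size() || j < older.size()) {
    if (j == older.size() || (i < newer.size() && newer[i].key < older[j].key)) {
      out.push_back(newer[i++]);  // key only in newer
    } else if (i == newer.size() || older[j].key < newer[i].key) {
      out.push_back(older[j++]);  // key only in older
    } else {
      out.push_back(newer[i++]);  // same key: newest version wins
      ++j;
    }
  }
  return out;
}
```

With the figure's two SSTables as input, the result is DEL (key100), PUT (key101,a3), PUT (key102,b2), PUT (key103,c1), PUT (key104,d2), matching the merged SSTable. Every input record is read and rewritten, which is where the write amplification discussed next comes from.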

DISCUSSION
Log-structured storage managers are more common
today than in previous decades.
→ This is partly due to the proliferation of RocksDB.

What are some downsides of this approach?
→ Write-Amplification
→ Compaction is Expensive

OBSERVATION
The two table storage approaches we've discussed
so far rely on indexes to find individual tuples.
→ Such indexes are necessary because the tables are
inherently unsorted.

But what if the DBMS could keep tuples sorted automatically using an index?

INDEX-ORGANIZED STORAGE
The DBMS stores a table's tuples as the value of an index data structure.
→ Still use a page layout that looks like a slotted page.
→ Tuples are typically sorted in page based on key.

B+Tree pays maintenance costs upfront, whereas LSMs pay for them later.

[Figure: a B+Tree with Inner Nodes and Leaf Nodes; a node page holds a Header and key→offset entries, and the leaf level stores the tuples themselves (Tuple #3, Tuple #2, Tuple #6).]
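A sketch of one leaf node in an index-organized table, assuming integer keys and tuples stored as opaque strings. Because the tuples themselves are kept sorted by key inside the leaf, a point lookup is a binary search, and each insert pays upfront to keep that order (the B+Tree side of the trade-off above). `LeafNode` and its methods are hypothetical; inner nodes, node splits, and the slotted on-page layout are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical leaf node of an index-organized table. The tuples themselves
// are the index values and are kept sorted by key.
class LeafNode {
 public:
  // Inserts a new tuple or overwrites the tuple with the same key.
  void Upsert(int64_t key, const std::string& tuple) {
    auto it = std::lower_bound(entries_.begin(), entries_.end(), key, KeyLess);
    if (it != entries_.end() && it->key == key) {
      it->tuple = tuple;
    } else {
      entries_.insert(it, Entry{key, tuple});  // shifts later entries to stay sorted
    }
  }

  // Binary search over the sorted entries.
  std::optional<std::string> Find(int64_t key) const {
    auto it = std::lower_bound(entries_.begin(), entries_.end(), key, KeyLess);
    if (it != entries_.end() && it->key == key) return it->tuple;
    return std::nullopt;
  }

  // Keys in storage order (always ascending).
  std::vector<int64_t> Keys() const {
    std::vector<int64_t> keys;
    for (const Entry& e : entries_) keys.push_back(e.key);
    return keys;
  }

 private:
  struct Entry {
    int64_t key;
    std::string tuple;
  };
  static bool KeyLess(const Entry& e, int64_t key) { return e.key < key; }

  std::vector<Entry> entries_;
};
```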

TUPLE STORAGE
A tuple is essentially a sequence of bytes prefixed with a header that contains meta-data about it.

It is the job of the DBMS to interpret those bytes into attribute types and values.

The DBMS's catalogs contain the schema information about tables that the system uses to figure out the tuple's layout.

DATA LAYOUT
CREATE TABLE foo (
  id INT PRIMARY KEY,
  value BIGINT
);

[Figure: the tuple is an unsigned char[] laid out as header | id | value. To read id, the DBMS casts the attribute's address: reinterpret_cast<int32_t*>(address)]
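A sketch of how the DBMS could read `id` and `value` out of the raw bytes, assuming a 4-byte header and no padding between attributes (both assumptions; the offsets would normally be derived from the catalog). `WriteFoo`, `ReadId`, and `ReadValue` are hypothetical. The sketch uses `std::memcpy` in place of the slide's `reinterpret_cast`, because in C++ dereferencing the cast pointer is only well-defined when a suitably aligned `int32_t` object actually lives at that address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout for a foo tuple: [4-byte header][INT id][BIGINT value].
// A real DBMS derives these offsets from the schema stored in its catalog.
constexpr std::size_t kHeaderSize = 4;
constexpr std::size_t kIdOffset = kHeaderSize;
constexpr std::size_t kValueOffset = kIdOffset + sizeof(int32_t);   // 8
constexpr std::size_t kTupleSize = kValueOffset + sizeof(int64_t);  // 16

void WriteFoo(unsigned char* tuple, int32_t id, int64_t value) {
  std::memcpy(tuple + kIdOffset, &id, sizeof id);
  std::memcpy(tuple + kValueOffset, &value, sizeof value);
}

// Same idea as reinterpret_cast<int32_t*>(address), but memcpy does not depend
// on the address being suitably aligned and does not break C++ aliasing rules;
// optimizing compilers typically turn it into a single load.
int32_t ReadId(const unsigned char* tuple) {
  int32_t id;
  std::memcpy(&id, tuple + kIdOffset, sizeof id);
  return id;
}

int64_t ReadValue(const unsigned char* tuple) {
  int64_t value;
  std::memcpy(&value, tuple + kValueOffset, sizeof value);
  return value;
}
```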

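WORD-ALIGNED TUPLES
All attributes in a tuple must be word aligned to enable the CPU to access them without any unexpected behavior or additional work.

CREATE TABLE foo (
  id INT PRIMARY KEY,   -- 32 bits
  cdate TIMESTAMP,      -- 64 bits
  color CHAR(2),        -- 16 bits
  zipcode INT           -- 32 bits
);

[Figure: the tuple as an unsigned char[] spanning 64-bit words, with the attributes packed back to back (id, cdate, c, zipc); because id is only 32 bits, cdate straddles the boundary between the first and second words.]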

WORD-ALIGNMENT: PADDING
Add empty bits after attributes to ensure that the tuple is word aligned. Essentially, round up the storage size of each type to the next multiple of the word size.

CREATE TABLE foo (
  id INT PRIMARY KEY,   -- 32 bits
  cdate TIMESTAMP,      -- 64 bits
  color CHAR(2),        -- 16 bits
  zipcode INT           -- 32 bits
);

[Figure: the same tuple over four 64-bit words (id, cdate, c, zipc), with zero bits inserted as padding so that each attribute starts on a word boundary.]
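Following the slide's rule of rounding each attribute up to whole 64-bit words, a minimal sketch can compute where each attribute of foo lands. `Attr`, `Placement`, and `LayoutWithPadding` are hypothetical names, and the sizes come from the slide (32, 64, 16, and 32 bits). A C/C++ compiler's own struct padding follows a different rule (align each field to its own alignment requirement), so it would choose other offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical catalog entry: attribute name and storage size in bytes.
struct Attr {
  std::string name;
  std::size_t size;
};

// Where an attribute ended up inside the tuple.
struct Placement {
  std::string name;
  std::size_t offset;  // byte offset from the start of the tuple data
};

constexpr std::size_t kWordSize = 8;  // 64-bit words, as in the figure

// Rounds n up to the next multiple of the word size.
constexpr std::size_t RoundUpToWord(std::size_t n) {
  return (n + kWordSize - 1) / kWordSize * kWordSize;
}

// Padding: keep the declared attribute order, but give every attribute a whole
// number of words so each one starts on a word boundary. Returns the tuple size.
std::size_t LayoutWithPadding(const std::vector<Attr>& attrs,
                              std::vector<Placement>* out) {
  std::size_t offset = 0;
  for (const Attr& a : attrs) {
    out->push_back({a.name, offset});
    offset += RoundUpToWord(a.size);  // attribute bytes + trailing zero padding
  }
  return offset;
}
```

For foo this gives offsets 0, 8, 16, and 24 and a 32-byte tuple, of which 14 bytes are padding.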

WORD-ALIGNMENT: REORDERING
Switch the order of attributes in the tuples' physical layout to make sure they are aligned.
→ May still have to use padding to fill remaining space.

CREATE TABLE foo (
  id INT PRIMARY KEY,   -- 32 bits
  cdate TIMESTAMP,      -- 64 bits
  color CHAR(2),        -- 16 bits
  zipcode INT           -- 32 bits
);

[Figure: the reordered tuple over 64-bit words: id and zipc share the first word, cdate fills the second, and c is followed by zero padding in the third.]
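Reordering can be sketched with a simple heuristic (an assumption, not necessarily what any DBMS does): place larger attributes first, then pack smaller ones into the remainder of the current word, padding only when the next attribute would straddle a word boundary. `Attr`, `Placement`, and `LayoutWithReordering` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Attr {
  std::string name;
  std::size_t size;  // storage size in bytes
};

struct Placement {
  std::string name;
  std::size_t offset;  // byte offset from the start of the tuple data
};

constexpr std::size_t kWordSize = 8;  // 64-bit words

constexpr std::size_t RoundUpToWord(std::size_t n) {
  return (n + kWordSize - 1) / kWordSize * kWordSize;
}

// Reordering heuristic: place larger attributes first, then pack smaller ones
// into the rest of the current word. Padding is added only when the next
// attribute would straddle a word boundary, plus at the end of the tuple.
std::size_t LayoutWithReordering(std::vector<Attr> attrs,
                                 std::vector<Placement>* out) {
  std::stable_sort(attrs.begin(), attrs.end(),
                   [](const Attr& a, const Attr& b) { return a.size > b.size; });
  std::size_t offset = 0;
  for (const Attr& a : attrs) {
    std::size_t used = offset % kWordSize;  // bytes already used in this word
    if (used != 0 && used + a.size > kWordSize) {
      offset = RoundUpToWord(offset);  // pad to the next word boundary
    }
    out->push_back({a.name, offset});
    offset += a.size;
  }
  return RoundUpToWord(offset);  // pad the tail so the tuple ends on a word
}
```

For foo, the heuristic produces cdate at 0, id at 8, zipcode at 12, and color at 16, for a 24-byte tuple with 6 bytes of padding. That is a different order from the figure's (id, zipc, cdate, c) but the same three-word size.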

DATA REPRESENTATION
INTEGER/BIGINT/SMALLINT/TINYINT
→ Same as in C/C++.
FLOAT/REAL vs. NUMERIC/DECIMAL
→ IEEE-754 Standard / Fixed-point Decimals.
VARCHAR/VARBINARY/TEXT/BLOB
→ Header with length, followed by data bytes OR pointer to
another page/offset with data.
→ Need to worry about collations / sorting.
TIME/DATE/TIMESTAMP/INTERVAL
→ 32/64-bit integer of (micro/milli)-seconds since Unix
epoch (January 1st, 1970).

VARIABLE PRECISION NUMBERS
Inexact, variable-precision numeric type that uses the "native" C/C++ types.
Store directly as specified by IEEE-754.
→ Example: FLOAT, REAL/DOUBLE

These types are typically faster than fixed precision numbers because CPU ISAs (e.g., x86, Arm) have instructions / registers to support them.

But they do not guarantee exact values…

VARIABLE PRECISION NUMBERS

Rounding Example
#include <stdio.h>

int main(int argc, char* argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x+y = %f\n", x+y);
    printf("0.3 = %f\n", 0.3);
}

Output
x+y = 0.300000
0.3 = 0.300000

VARIABLE PRECISION NUMBERS

Rounding Example
#include <stdio.h>

int main(int argc, char* argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x+y = %.20f\n", x+y);
    printf("0.3 = %.20f\n", 0.3);
}

Output
x+y = 0.30000001192092895508
0.3 = 0.29999999999999998890

FIXED PRECISION NUMBERS
Numeric data types with (potentially) arbitrary precision and scale. Used when rounding errors are unacceptable.
→ Example: NUMERIC, DECIMAL

Many different implementations.
→ Example: Store in an exact, variable-length binary representation with additional meta-data.
→ Can be less expensive if the DBMS does not provide arbitrary precision (i.e., if the decimal point cannot be in a different position per value).
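A minimal sketch of the cheaper, non-arbitrary-precision case: every value shares one fixed scale and is stored exactly as an integer count of hundredths. `Decimal`, `kScale`, `Add`, and `ToString` are hypothetical names; overflow checks and rounding for multiplication and division are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Every value shares one fixed scale: it is stored exactly as an integer
// number of 1/100 units (kScale = 2 digits after the decimal point).
constexpr int kScale = 2;
constexpr int64_t kUnit = 100;  // 10^kScale

struct Decimal {
  int64_t units;  // value * 10^kScale, e.g. 0.10 is stored as 10
};

// Exact addition: integer arithmetic introduces no rounding error
// (overflow checking is omitted in this sketch).
constexpr Decimal Add(Decimal a, Decimal b) { return {a.units + b.units}; }

std::string ToString(Decimal d) {
  const int64_t abs_units = d.units < 0 ? -d.units : d.units;
  std::string frac = std::to_string(abs_units % kUnit);
  frac = std::string(kScale - frac.size(), '0') + frac;  // left-pad to kScale digits
  return (d.units < 0 ? "-" : "") + std::to_string(abs_units / kUnit) + "." + frac;
}
```

Unlike the float rounding example earlier, adding 0.10 and 0.20 here yields exactly 0.30, because no binary fraction is involved.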

POSTGRES: NUMERIC
typedef unsigned char NumericDigit;
typedef struct {
    int ndigits;           /* # of Digits */
    int weight;            /* Weight of 1st Digit */
    int scale;             /* Scale Factor */
    int sign;              /* Positive/Negative/NaN */
    NumericDigit *digits;  /* Digit Storage */
} numeric;
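To show how `weight` and `scale` locate the digits, the sketch below formats the slide's struct as text, treating the value as the sum of digits[i] × 10^(weight − i). It assumes one decimal digit per `NumericDigit`, most significant first, plus hypothetical sign codes (`kPos`, `kNeg`, `kNaN`); the actual PostgreSQL source stores base-10000 digits and uses its own sign flags, so treat this only as a reading aid. Digits beyond `scale` are truncated, not rounded.

```cpp
#include <cassert>
#include <string>

// The slide's struct, reproduced as-is.
typedef unsigned char NumericDigit;
typedef struct {
  int ndigits;           // # of digits stored in `digits`
  int weight;            // power of 10 that the 1st digit multiplies
  int scale;             // # of digits shown after the decimal point
  int sign;              // positive / negative / NaN
  NumericDigit *digits;  // digit storage, most significant first (assumed)
} numeric;

// Hypothetical sign codes for this sketch only.
enum { kPos = 0, kNeg = 1, kNaN = 2 };

// Prints `scale` fractional digits; positions with no stored digit print as 0.
std::string NumericToString(const numeric& n) {
  if (n.sign == kNaN) return "NaN";
  std::string out = (n.sign == kNeg) ? "-" : "";
  const int top = n.weight > 0 ? n.weight : 0;  // always print the 10^0 digit
  for (int pos = top; pos >= -n.scale; --pos) {
    if (pos == -1) out += '.';
    const int i = n.weight - pos;  // index of the digit at 10^pos, if stored
    const int d = (i >= 0 && i < n.ndigits) ? n.digits[i] : 0;
    out += static_cast<char>('0' + d);
  }
  return out;
}
```

For example, 123.45 would be stored as ndigits = 5, weight = 2, scale = 2, digits = {1, 2, 3, 4, 5}.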

NULL DATA TYPES
Choice #1: Null Column Bitmap Header
→ Store a bitmap in a centralized header that specifies what attributes are null.
→ This is the most common approach in row-stores.

Choice #2: Special Values
→ Designate a placeholder value to represent NULL for a data type (e.g., INT32_MIN). More common in column-stores.

Choice #3: Per Attribute Null Flag (Don't Do This!)
→ Store a flag that marks that a value is null.
→ Must use more space than just a single bit because a single bit would mess up word alignment.
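A minimal sketch of Choice #1, assuming one bit per attribute in declaration order; `NullBitmap` is a hypothetical name, and padding the header for word alignment is left out. The bitmap costs one bit per attribute, whereas Choice #3 spends at least a byte per attribute, and more once alignment is taken into account.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of Choice #1: a null bitmap in the tuple header with one bit per
// attribute (bit i set means attribute i is NULL). The bitmap is rounded up
// to whole bytes; word-alignment padding of the header is not modeled.
class NullBitmap {
 public:
  explicit NullBitmap(std::size_t num_attrs) : bits_((num_attrs + 7) / 8, 0) {}

  void SetNull(std::size_t attr, bool is_null) {
    const uint8_t mask = static_cast<uint8_t>(1u << (attr % 8));
    if (is_null) {
      bits_[attr / 8] |= mask;
    } else {
      bits_[attr / 8] &= static_cast<uint8_t>(~mask);
    }
  }

  bool IsNull(std::size_t attr) const {
    return (bits_[attr / 8] >> (attr % 8)) & 1u;
  }

  std::size_t SizeInBytes() const { return bits_.size(); }

 private:
  std::vector<uint8_t> bits_;
};
```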

LARGE VALUES
Most DBMSs do not allow a tuple to exceed the size of a single page.

To store values that are larger than a page, the DBMS uses separate overflow storage pages.
→ Postgres: TOAST (>2KB)
→ MySQL: Overflow (>½ size of page)
→ SQL Server: Overflow (>size of page)

Lots of potential optimizations:
→ Overflow Compression, German Strings

CREATE TABLE foo (
  id INT PRIMARY KEY,
  data INT,
  contents TEXT
);

[Figure: the tuple is laid out as Header | INT | INT | TEXT; in later builds the TEXT field stores only a size and a location that point to an Overflow Page holding the VARCHAR DATA.]
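A sketch of the overflow idea in the figure: a small TEXT value stays inline, while a large one is moved to overflow storage and the tuple keeps only its size and location. The 2048-byte threshold, `OverflowRef`, `TextField`, and `OverflowStore` are assumptions for illustration; a vector of strings stands in for overflow pages.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Assumed threshold: values larger than this leave the tuple (cf. TOAST's 2KB above).
constexpr std::size_t kInlineLimit = 2048;

// What the tuple keeps for an out-of-line value: its size and location.
struct OverflowRef {
  std::size_t size;      // length of the value in bytes
  std::size_t location;  // index of the overflow page holding it
};

// A TEXT attribute is either stored inline or replaced by an OverflowRef.
using TextField = std::variant<std::string, OverflowRef>;

class OverflowStore {
 public:
  TextField Store(const std::string& value) {
    if (value.size() <= kInlineLimit) return value;  // small: keep inline
    pages_.push_back(value);  // a real DBMS may split it across several pages
    return OverflowRef{value.size(), pages_.size() - 1};
  }

  std::string Load(const TextField& field) const {
    if (const auto* inline_value = std::get_if<std::string>(&field)) return *inline_value;
    const OverflowRef& ref = std::get<OverflowRef>(field);
    return pages_[ref.location];  // one extra lookup to fetch the value
  }

 private:
  std::vector<std::string> pages_;  // stand-in for overflow storage pages
};
```

The trade-off shows up in `Load`: reading an overflowed value costs an extra lookup, which is why systems only move values out of the tuple above a size threshold.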

EXTERNAL VALUE STORAGE
Some systems allow you to store a large value in an external file. Treated as a BLOB type.
→ Oracle: BFILE data type
→ Microsoft: FILESTREAM data type

The DBMS cannot manipulate the contents of an external file.
→ No durability protections.
→ No transaction protections.

[Figure: a tuple (Header, a, b, c, d, e) whose attribute refers to Data stored in an External File.]

SYSTEM CATALOGS
A DBMS stores meta-data about databases in its
internal catalogs.
→ Tables, columns, indexes, views
→ Users, permissions
→ Internal statistics

Almost every DBMS stores the database's catalog inside itself (i.e., as tables).
→ Wrap object abstraction around tuples.
→ Specialized code for "bootstrapping" catalog tables.

SYSTEM CATALOGS
You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get info about the database.
→ ANSI standard set of read-only views that provide info about all the tables, views, columns, and procedures in a database.

DBMSs also have non-standard shortcuts to retrieve this information.

ACCESSING TABLE SCHEMA
List all the tables in the current database:

SQL-92:
SELECT *
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_catalog = '<db name>';

Postgres:
\d

MySQL:
SHOW TABLES;

SQLite:
.tables

ACCESSING TABLE SCHEMA
List all the columns in the student table:

SQL-92:
SELECT *
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE table_name = 'student';

Postgres:
\d student

MySQL:
DESCRIBE student;

SQLite:
.schema student

SCHEMA CHANGES
ADD COLUMN:
→ NSM: Copy tuples into new region in memory.
→ DSM: Just create the new column segment on disk.
DROP COLUMN:
→ NSM #1: Copy tuples into new region of memory.
→ NSM #2: Mark column as "deprecated", clean up later.
→ DSM: Just drop the column and free memory.
CHANGE COLUMN:
→ Check whether the conversion is allowed to happen.
Depends on default values.

INDEXES
CREATE INDEX:
→ Scan the entire table and populate the index.
→ Have to record changes made by txns that modified the
table while another txn was building the index.
→ When the scan completes, lock the table and resolve
changes that were missed after the scan started.
DROP INDEX:
→ Just drop the index logically from the catalog.
→ It only becomes "invisible" when the txn that dropped it
commits. All existing txns will still have to update it.

CONCLUSION
Log-structured storage is an alternative approach to
the tuple-oriented architecture.
→ Ideal for write-heavy workloads because it maximizes
sequential disk I/O.

The storage manager is not entirely independent from the rest of the DBMS.

NEXT CLASS
Breaking your preconceived notion that a DBMS
stores everything as rows…
