03 Storage1

The document outlines the course structure for Database Systems (15-445/645) taught by Prof. Andy Pavlo in Fall 2024, including important dates for assignments and projects. It covers topics related to database storage, file organization, and the architecture of disk-based databases, emphasizing the importance of efficient data access and management. Key concepts discussed include the storage hierarchy, access times, page storage architectures such as heap files, and page layouts such as tuple-oriented storage.


Database Systems
Database Storage: Files & Pages
15-445/645 FALL 2024 PROF. ANDY PAVLO


ADMINISTRIVIA
Homework #1 is due September 8th @ 11:59pm

Project #0 is due September 8th @ 11:59pm

Project #1 will be released on September 9th

15-445/645 (Fall 2024)

LAST CLASS
We now understand what a database looks like at a
logical level and how to write queries to read/write
data (e.g., using SQL).

We will next learn how to build software that manages a database (i.e., a DBMS).


COURSE OUTLINE
→ Relational Databases
→ Storage
→ Query Execution
→ Concurrency Control
→ Database Recovery
→ Distributed Databases
→ Potpourri

Figure: DBMS layers, from top to bottom: Application, SQL, Query Planning, Operator Execution, Access Methods, Buffer Pool Manager, Disk Manager.


TODAY'S AGENDA
Background
File Storage
Page Layout
Tuple Layout
DB Flash Talk: Neon


DISK-BASED ARCHITECTURE
The DBMS assumes that the primary storage
location of the database is on non-volatile disk.

The DBMS's components manage the movement of data between non-volatile and volatile storage.


STORAGE HIERARCHY
Faster, smaller, and more expensive toward the top; slower, larger, and cheaper toward the bottom.
→ CPU Registers
→ CPU Caches
→ DRAM
→ Persistent Memory / CXL Type 3
→ SSD
→ Fast Network Storage
→ HDD
→ Network Storage

Groupings: CPU (registers, caches), Memory (DRAM), Disk (SSD, HDD).
DRAM and above: volatile, random access, byte-addressable.
SSD and below: non-volatile, sequential access, block-addressable.

ACCESS TIMES
Latency Numbers Every Programmer Should Know

Latency            Access            If 1 ns were 1 sec
1 ns               L1 Cache Ref      1 sec
4 ns               L2 Cache Ref      4 sec
100 ns             DRAM              100 sec
16,000 ns          SSD               4.4 hours
2,000,000 ns       HDD               3.3 weeks
~50,000,000 ns     Network Storage   1.5 years
1,000,000,000 ns   Tape Archives     31.7 years

Source: Colin Scott
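The right-hand column can be re-derived with a few lines of arithmetic. This sketch stretches 1 ns to 1 second and expresses each latency in the largest unit that fits, truncating to one decimal place, which reproduces the table's figures. The unit sizes (a 365-day year, a 7-day week) and the names `LATENCIES_NS` and `scaled` are assumptions of the sketch.

```python
# Latencies from the table above, in nanoseconds.
LATENCIES_NS = {
    "L1 Cache Ref": 1,
    "L2 Cache Ref": 4,
    "DRAM": 100,
    "SSD": 16_000,
    "HDD": 2_000_000,
    "Network Storage": 50_000_000,
    "Tape Archives": 1_000_000_000,
}

# Largest unit first; a year is assumed to be 365 days.
UNITS = [
    ("years", 365 * 24 * 3600),
    ("weeks", 7 * 24 * 3600),
    ("hours", 3600),
    ("sec", 1),
]


def scaled(ns: int) -> str:
    """Pretend 1 ns lasts 1 second and express the result in the largest unit that fits."""
    seconds = ns
    for unit, size in UNITS:
        if seconds >= size:
            tenths = seconds * 10 // size  # truncate (not round) to one decimal place
            whole, frac = divmod(tenths, 10)
            return f"{whole}.{frac} {unit}" if frac else f"{whole} {unit}"
    return f"{seconds} sec"  # only reached for 0 ns


if __name__ == "__main__":
    for name, ns in LATENCIES_NS.items():
        print(f"{ns:>15,} ns  {name:<16} {scaled(ns)}")
```

Truncating rather than rounding is why ~50,000,000 ns shows up as 1.5 years instead of 1.6.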


SEQUENTIAL VS. RANDOM ACCESS
Random access on non-volatile storage is almost always much slower than sequential access.

The DBMS will want to maximize sequential access.
→ Algorithms try to reduce the number of writes to random pages so that data is stored in contiguous blocks.
→ A group of contiguous pages allocated at the same time is called an extent.
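A minimal sketch of extent-based allocation, assuming 4KB pages and a hypothetical extent size of 8 pages. `ExtentAllocator` and `byte_offset` are illustrative names, not a real DBMS API. Each object reserves a whole extent up front, so its pages stay contiguous on disk even when allocations for different objects interleave.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096   # assumption: 4KB database pages
EXTENT_PAGES = 8   # hypothetical extent size, in pages


@dataclass
class ExtentAllocator:
    """Hands out page IDs so that each object's pages come from contiguous extents."""
    next_free_page: int = 0
    extents: dict = field(default_factory=dict)    # object -> [(first_page, num_pages), ...]
    remaining: dict = field(default_factory=dict)  # object -> unused pages in its newest extent

    def allocate_page(self, obj: str) -> int:
        if self.remaining.get(obj, 0) == 0:
            # Reserve a whole extent at once instead of a single page.
            start = self.next_free_page
            self.next_free_page += EXTENT_PAGES
            self.extents.setdefault(obj, []).append((start, EXTENT_PAGES))
            self.remaining[obj] = EXTENT_PAGES
        first, count = self.extents[obj][-1]
        page_id = first + (count - self.remaining[obj])
        self.remaining[obj] -= 1
        return page_id


def byte_offset(page_id: int) -> int:
    return page_id * PAGE_SIZE


alloc = ExtentAllocator()
foo_pages = [alloc.allocate_page("foo") for _ in range(3)]
bar_page = alloc.allocate_page("bar")      # interleaved allocation for another table
foo_pages.append(alloc.allocate_page("foo"))
print(foo_pages, bar_page)                 # foo stays contiguous (pages 0-3); bar starts at 8
```

Without extents, the interleaved `bar` allocation would have landed between `foo`'s pages and broken up a sequential scan of `foo`.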


SYSTEM DESIGN GOALS
Allow the DBMS to manage databases that exceed the amount of memory available.

Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation.

Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access.


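DISK-ORIENTED DBMS
Figure: the layers involved in reading and updating a page.
→ Disk (Lectures #3-5): the database file holds a directory followed by pages 1, 2, 3, 4, 5, …, each starting with its own header.
→ Memory, Buffer Pool (Lecture #6): holds copies of the directory and of pages fetched from disk.
→ Execution Engine (Lectures #13-14): asks the buffer pool to "Get Page #2", receives a pointer to Page #2 in memory, interprets Page #2's layout, and updates Page #2.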

DATABASE STORAGE
Problem #1: How the DBMS represents the database in files on disk. ← Today

Problem #2: How the DBMS manages its memory and moves data back-and-forth from disk.


FILE STORAGE
The DBMS stores a database as one or more files on disk, typically in a proprietary format.
→ The OS does not know anything about the contents of these files.
→ We will discuss portable file formats next week…

Early systems in the 1980s used custom filesystems on raw block storage.
→ Some "enterprise" DBMSs still support this.
→ Most newer DBMSs do not do this.


STORAGE MANAGER
The storage manager is responsible for maintaining a database's files.
→ Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages.

It organizes the files as a collection of pages.
→ Tracks data read/written to pages.
→ Tracks the available space.

A DBMS typically does not maintain multiple copies of a page on disk.
→ Assume this happens above/below the storage manager.

DATABASE PAGES
A page is a fixed-size block of data.
→ It can contain tuples, meta-data, indexes, log records…
→ Most systems do not mix page types.
→ Some systems require a page to be self-contained.

Each page is given a unique identifier (page ID).
→ A page ID could be unique per DBMS instance, per database, or per table.
→ The DBMS uses an indirection layer to map page IDs to physical locations.
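A minimal sketch of such an indirection layer, assuming a page ID is an integer and a physical location is a (file, byte offset) pair. `PageTable`, `PhysicalLocation`, and the file names are hypothetical. Because callers only ever hold page IDs, the storage manager can move a page by changing a single mapping entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    file_name: str  # hypothetical file name, for illustration only
    offset: int     # byte offset of the page within that file


class PageTable:
    """Indirection layer mapping logical page IDs to physical locations."""

    def __init__(self) -> None:
        self._locations: dict[int, PhysicalLocation] = {}

    def register(self, page_id: int, loc: PhysicalLocation) -> None:
        if page_id in self._locations:
            raise ValueError(f"page id {page_id} already exists")
        self._locations[page_id] = loc

    def locate(self, page_id: int) -> PhysicalLocation:
        if page_id not in self._locations:
            raise KeyError(f"unknown page id {page_id}")
        return self._locations[page_id]

    def relocate(self, page_id: int, new_loc: PhysicalLocation) -> None:
        self.locate(page_id)  # raises if the page does not exist
        self._locations[page_id] = new_loc


table = PageTable()
table.register(7, PhysicalLocation("db_file_1", 3 * 4096))
table.relocate(7, PhysicalLocation("db_file_2", 0))  # callers still just ask for page 7
print(table.locate(7))
```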


DATABASE PAGES
There are three different notions of "pages" in a DBMS:
→ Hardware Page (usually 4KB)
→ OS Page (usually 4KB; x64 also supports 2MB/1GB huge pages)
→ Database Page (512B-32KB)

A hardware page is the largest block of data for which the storage device can guarantee failsafe writes.

DBMSs that specialize in read-only workloads have larger page sizes.

Figure: default DB page sizes of common systems (4KB, 8KB, 16KB).

PAGE STORAGE ARCHITECTURE
Different DBMSs manage pages in files on disk in different ways.
→ Heap File Organization
→ Tree File Organization
→ Sequential / Sorted File Organization (ISAM)
→ Hashing File Organization

At this point in the hierarchy, we do not need to know anything about what is inside of the pages.


HEAP FILE
A heap file is an unordered collection of pages with tuples that are stored in random order.
→ Create / Get / Write / Delete Page
→ Must also support iterating over all pages.

The DBMS needs additional meta-data to track the location of files and free space availability.

Figure (single file): to Get Page #2 from a database file laid out as Page0, Page1, Page2, Page3, Page4, the DBMS computes Offset = Page# × PageSize.

Figure (multiple files): to Get Page #23, the DBMS consults a page directory, which yields the file location and the Page# × PageSize offset.
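A minimal single-file heap file, assuming 4KB pages and an in-memory file object (`io.BytesIO`) in place of a real file on disk. `HeapFile` and its method names are hypothetical; they mirror the Create / Get / Write / Delete / iterate operations listed above. The free-page list lives only in memory here, whereas a real DBMS would persist that meta-data.

```python
import io
from typing import BinaryIO, Iterator

PAGE_SIZE = 4096  # assumption: 4KB database pages


class HeapFile:
    """Single-file heap file: page N is stored at byte offset N * PAGE_SIZE."""

    def __init__(self, f: BinaryIO) -> None:
        self.f = f
        self.f.seek(0, io.SEEK_END)
        self.num_pages = self.f.tell() // PAGE_SIZE
        self.free_pages: list[int] = []  # deleted pages available for reuse (not persisted)

    @staticmethod
    def offset(page_id: int) -> int:
        return page_id * PAGE_SIZE  # Offset = Page# x PageSize

    def _check(self, page_id: int) -> None:
        if not 0 <= page_id < self.num_pages:
            raise IndexError(f"page {page_id} does not exist")

    def create_page(self) -> int:
        if self.free_pages:
            return self.free_pages.pop()  # reuse a deleted page
        page_id = self.num_pages
        self.num_pages += 1
        self.write_page(page_id, bytes(PAGE_SIZE))  # extend the file by one zeroed page
        return page_id

    def get_page(self, page_id: int) -> bytes:
        self._check(page_id)
        self.f.seek(self.offset(page_id))
        return self.f.read(PAGE_SIZE)

    def write_page(self, page_id: int, data: bytes) -> None:
        self._check(page_id)
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self.f.seek(self.offset(page_id))
        self.f.write(data)

    def delete_page(self, page_id: int) -> None:
        self.write_page(page_id, bytes(PAGE_SIZE))
        self.free_pages.append(page_id)

    def __iter__(self) -> Iterator[int]:
        """Iterate over the IDs of all live (non-deleted) pages."""
        free = set(self.free_pages)
        return (p for p in range(self.num_pages) if p not in free)


heap = HeapFile(io.BytesIO())
ids = [heap.create_page() for _ in range(5)]  # Page0 .. Page4
heap.write_page(2, b"x" * PAGE_SIZE)          # lands at byte offset 2 * 4096
heap.delete_page(3)
print(list(heap))
```

With multiple files, the same offset arithmetic applies once a page directory has resolved which file holds the page.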


HEAP FILE: PAGE DIRECTORY
The DBMS maintains special pages that track the location of data pages in the database files.
→ One entry per database object.
→ Must make sure that the directory pages are in sync with the data pages.

The DBMS also keeps meta-data about pages' contents:
→ Amount of free space per page.
→ List of free / empty pages.
→ Page type (data vs. meta-data).

Figure: a directory page with entries for Table X, Index Y, Table Z, …, pointing to data pages (Page0, Page1, …) in File 1 and File 2.
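A minimal in-memory model of the directory meta-data for one table, assuming every page starts with `page_size` free bytes (header space is ignored). `PageDirectory`, `DirectoryEntry`, and their methods are hypothetical names. The key point is that every change to a data page must also update its directory entry, otherwise free-space lookups go stale.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    file_id: int
    page_type: str   # "data" or "meta-data"
    free_bytes: int  # free space remaining on the page


@dataclass
class PageDirectory:
    """Hypothetical in-memory copy of the directory entries for one table."""
    page_size: int = 4096  # assumption: header space is ignored for simplicity
    entries: dict = field(default_factory=dict)  # page_id -> DirectoryEntry

    def add_page(self, page_id: int, file_id: int, page_type: str = "data") -> None:
        self.entries[page_id] = DirectoryEntry(file_id, page_type, self.page_size)

    def record_insert(self, page_id: int, nbytes: int) -> None:
        """Call whenever a data page changes, to keep the directory in sync."""
        entry = self.entries[page_id]
        if nbytes > entry.free_bytes:
            raise ValueError(f"page {page_id} has only {entry.free_bytes} free bytes")
        entry.free_bytes -= nbytes

    def find_page_with_space(self, nbytes: int) -> Optional[int]:
        for page_id in sorted(self.entries):
            entry = self.entries[page_id]
            if entry.page_type == "data" and entry.free_bytes >= nbytes:
                return page_id
        return None  # caller must allocate a new page

    def empty_pages(self) -> list:
        return [p for p in sorted(self.entries) if self.entries[p].free_bytes == self.page_size]


directory = PageDirectory()
for pid, fid in [(0, 1), (1, 1), (2, 2)]:
    directory.add_page(pid, fid)
directory.record_insert(0, 4000)
print(directory.find_page_with_space(200))  # page 0 has only 96 bytes left, so page 1
```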

TODAY'S AGENDA
File Storage
Page Layout
Tuple Layout


PAGE HEADER
Every page contains a header of meta-data about the page's contents.
→ Page Size
→ Checksum
→ DBMS Version
→ Transaction Visibility Data
→ Compression / Encoding Meta-data
→ Schema Information
→ Data Summary / Sketches

Some systems require pages to be self-contained (e.g., Oracle).
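A minimal sketch of a page header that carries a page size, a DBMS version, a flags field, and a CRC-32 checksum over the rest of the page. The field order and widths are a hypothetical layout, not any specific DBMS's on-disk format, and the visibility, schema, and summary fields from the list above are left out. The checksum lets the DBMS detect a corrupted or partially written page when it reads the page back.

```python
import struct
import zlib

# Hypothetical header layout (little-endian):
#   checksum  uint32  CRC-32 of every byte after this field
#   page_size uint32
#   version   uint16  DBMS version that wrote the page
#   flags     uint16  e.g., compression / encoding bits
HEADER = struct.Struct("<IIHH")


def build_page(payload: bytes, page_size: int = 4096, version: int = 1, flags: int = 0) -> bytes:
    body_len = page_size - HEADER.size
    if len(payload) > body_len:
        raise ValueError("payload does not fit in the page")
    rest = struct.pack("<IHH", page_size, version, flags) + payload.ljust(body_len, b"\x00")
    return struct.pack("<I", zlib.crc32(rest)) + rest


def read_header(page: bytes) -> dict:
    checksum, page_size, version, flags = HEADER.unpack_from(page, 0)
    if zlib.crc32(page[4:]) != checksum:
        raise ValueError("checksum mismatch: page is corrupt or torn")
    return {"page_size": page_size, "version": version, "flags": flags}


page = build_page(b"some tuples")
print(read_header(page))  # {'page_size': 4096, 'version': 1, 'flags': 0}

damaged = bytearray(page)
damaged[100] ^= 0xFF      # simulate a corrupted write
try:
    read_header(bytes(damaged))
except ValueError as err:
    print(err)
```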

PAGE LAYOUT
For any page storage architecture, we now need to decide how to organize the data inside of the page.
→ We are still assuming that we are only storing tuples in a row-oriented storage model (storage models: Lecture #5).

Approach #1: Tuple-oriented Storage ← Today
Approach #2: Log-structured Storage ← Lecture #4
Approach #3: Index-organized Storage ← Lecture #4

TUPLE-ORIENTED STORAGE
How to store tuples in a page?

Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuple to the end.
→ What happens if we delete a tuple?
→ What happens if we have a variable-length attribute?

Figure: the page header stores Num Tuples. After Tuple #1, #2, and #3 are appended it reads 3; deleting Tuple #2 drops it to 2 and leaves a hole; Tuple #4 then goes into that hole and the count returns to 3 (page order: Tuple #1, Tuple #4, Tuple #3).
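The sketch below shows why the strawman breaks, assuming fixed-length 8-byte tuples and a page whose header holds only a tuple count. `StrawmanPage` is a hypothetical name. After a delete, the count no longer points at the end of the data, so a naive append overwrites a live tuple; the figure's placement of Tuple #4 into the hole requires knowing where the holes are, which the count alone cannot tell us. Variable-length tuples would break the `index * TUPLE_SIZE` arithmetic entirely.

```python
TUPLE_SIZE = 8  # assumption: fixed-length 8-byte tuples


class StrawmanPage:
    """Header holds only a tuple count; tuple i is assumed to live at i * TUPLE_SIZE."""

    def __init__(self, page_size: int = 64) -> None:
        self.data = bytearray(page_size)
        self.num_tuples = 0

    def append(self, tup: bytes) -> None:
        if len(tup) != TUPLE_SIZE:
            raise ValueError("strawman only handles fixed-length tuples")
        pos = self.num_tuples * TUPLE_SIZE  # "just append a new tuple to the end"
        self.data[pos:pos + TUPLE_SIZE] = tup
        self.num_tuples += 1

    def delete(self, index: int) -> None:
        pos = index * TUPLE_SIZE
        self.data[pos:pos + TUPLE_SIZE] = bytes(TUPLE_SIZE)  # leaves a hole
        self.num_tuples -= 1

    def read(self, index: int) -> bytes:
        pos = index * TUPLE_SIZE
        return bytes(self.data[pos:pos + TUPLE_SIZE])


page = StrawmanPage()
for t in (b"tuple#1\x00", b"tuple#2\x00", b"tuple#3\x00"):
    page.append(t)
page.delete(1)                # delete Tuple #2: count drops to 2, hole at index 1
page.append(b"tuple#4\x00")   # count says "append at 2 * 8 = 16"...
print(page.read(2))           # ...which is where Tuple #3 lived
print(page.read(1))           # the hole is still empty
```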


SLOTTED PAGES
The most common layout scheme is called slotted pages.

The slot array maps "slots" to the tuples' starting position offsets.

The header keeps track of:
→ The # of used slots
→ The offset of the starting location of the last slot used.

Figure: the header and the slot array (slots 1-7) sit at the beginning of the page, and fixed- and variable-length tuple data (Tuple #1, #2, #3, #4) is packed from the end of the page. The slot array grows forward and the tuple data grows backward toward it. In the later build, Tuple #3 has been deleted.
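A minimal slotted page, assuming 16-bit offsets (so pages up to 64KB), a 4-byte header of (number of used slots, start offset of the last tuple written), and 4-byte slots of (offset, length) where offset 0 marks an empty slot. `SlottedPage` and its methods are hypothetical, and slot numbers here are 0-based while the figure numbers them from 1. Deletion only clears the slot; reclaiming the tuple's bytes would need page compaction, which this sketch omits.

```python
import struct

HEADER = struct.Struct("<HH")  # (number of used slots, offset where the last tuple starts)
SLOT = struct.Struct("<HH")    # (tuple offset, tuple length); offset 0 marks an empty slot


class SlottedPage:
    """Slot array grows forward after the header; tuple data grows backward from the end."""

    def __init__(self, page_size: int = 4096) -> None:
        if page_size > 0xFFFF:
            raise ValueError("16-bit offsets limit this sketch to pages under 64KB")
        self.buf = bytearray(page_size)
        HEADER.pack_into(self.buf, 0, 0, page_size)  # no slots; tuple data starts at the end

    def _slot_pos(self, slot: int) -> int:
        return HEADER.size + slot * SLOT.size

    def free_space(self) -> int:
        num_slots, data_start = HEADER.unpack_from(self.buf, 0)
        return data_start - self._slot_pos(num_slots)

    def insert(self, tup: bytes) -> int:
        num_slots, data_start = HEADER.unpack_from(self.buf, 0)
        if len(tup) + SLOT.size > self.free_space():
            raise ValueError("not enough free space on page")
        start = data_start - len(tup)
        self.buf[start:data_start] = tup  # tuple data grows toward the slot array
        SLOT.pack_into(self.buf, self._slot_pos(num_slots), start, len(tup))
        HEADER.pack_into(self.buf, 0, num_slots + 1, start)
        return num_slots

    def get(self, slot: int) -> bytes:
        num_slots, _ = HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot < num_slots:
            raise IndexError(f"slot {slot} out of range")
        offset, length = SLOT.unpack_from(self.buf, self._slot_pos(slot))
        if offset == 0:
            raise KeyError(f"slot {slot} is empty")
        return bytes(self.buf[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.get(slot)  # validates the slot number
        # Only the slot is cleared; the tuple bytes stay until the page is compacted.
        SLOT.pack_into(self.buf, self._slot_pos(slot), 0, 0)


page = SlottedPage()
for t in (b"tuple#1", b"tuple#2 (longer)", b"t3", b"tuple#4"):
    page.insert(t)
page.delete(2)
print(page.get(3), page.free_space())
```

Because other tuples are reached through their slots rather than by position, a variable-length tuple can be deleted without disturbing the IDs of its neighbors.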

RECORD IDS
The DBMS assigns each logical tuple a unique record identifier that represents its physical location in the database.
→ File Id, Page Id, Slot #
→ Most DBMSs do not store ids in the tuple.
→ SQLite uses ROWID as the true primary key and stores them as a hidden attribute.

Applications should never rely on these IDs to mean anything.

Figure: record ID sizes in real systems: CTID (6 bytes, PostgreSQL), ROWID (8 bytes, SQLite), %%physloc%% (8 bytes, SQL Server), ROWID (10 bytes, Oracle).
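Because a record ID is just a physical address, it packs into a few bytes. This sketch assumes a 4-byte page number and a 2-byte slot number, 6 bytes in total, which is the same size as PostgreSQL's CTID (a block number paired with a tuple index inside that block). The file ID is assumed to be implied by the table. `RecordId`, `pack_rid`, and `unpack_rid` are hypothetical names.

```python
import struct
from typing import NamedTuple


class RecordId(NamedTuple):
    page_id: int  # 32-bit page (block) number
    slot: int     # 16-bit slot number within the page


_RID = struct.Struct(">IH")  # 4-byte page number + 2-byte slot = 6 bytes, big-endian


def pack_rid(rid: RecordId) -> bytes:
    return _RID.pack(rid.page_id, rid.slot)


def unpack_rid(raw: bytes) -> RecordId:
    return RecordId(*_RID.unpack(raw))


rid = RecordId(page_id=3, slot=7)
raw = pack_rid(rid)
print(raw.hex(), unpack_rid(raw))  # 000000030007 RecordId(page_id=3, slot=7)
```

Since the ID encodes where the tuple physically sits, it changes whenever the DBMS moves the tuple (for example, when it compacts a page), which is one reason applications should not treat it as meaningful.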

TODAY'S AGENDA
File Storage
Page Layout
Tuple Layout


TUPLE LAYOUT
A tuple is essentially a sequence of bytes.
→ These bytes do not have to be contiguous.

It is the job of the DBMS to interpret those bytes into attribute types and values.
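A minimal sketch of that interpretation step, using Python's `struct` module as a stand-in for the DBMS's type system. It assumes the table `foo(a INT, b INT, c INT, d DOUBLE, e FLOAT)` from the TUPLE DATA slide, packed with no padding; `SCHEMA`, `encode`, and `decode` are hypothetical names. The bytes carry no type information of their own; only the schema says where each attribute starts and how to read it.

```python
import struct

# Hypothetical catalog entry for foo (a INT, b INT, c INT, d DOUBLE, e FLOAT).
# Struct codes: "i" = 4-byte int, "d" = 8-byte double, "f" = 4-byte float.
SCHEMA = [("a", "i"), ("b", "i"), ("c", "i"), ("d", "d"), ("e", "f")]
FMT = "<" + "".join(code for _, code in SCHEMA)  # little-endian, no padding


def encode(row: dict) -> bytes:
    return struct.pack(FMT, *(row[name] for name, _ in SCHEMA))


def decode(raw: bytes) -> dict:
    """The raw bytes carry no types; the schema says how to slice and interpret them."""
    return dict(zip((name for name, _ in SCHEMA), struct.unpack(FMT, raw)))


raw = encode({"a": 1, "b": 2, "c": 3, "d": 4.5, "e": 0.25})
print(len(raw), decode(raw))  # 24 bytes decode back to the original values
```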


TUPLE HEADER
Each tuple is prefixed with a header that contains meta-data about it.
→ Visibility info (concurrency control)
→ Bit Map for NULL values.

We do not need to store meta-data about the schema in each tuple.

Figure: Tuple = [Header | Attribute Data]
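A minimal sketch of a tuple header with a NULL bitmap, assuming a hypothetical layout of one visibility byte (a placeholder for concurrency-control info), then one bit per column rounded up to whole bytes, then the non-NULL attribute values. As a design choice of this sketch, NULL attributes take no data bytes. The column types are passed in from the schema rather than stored in the tuple, matching the point above. `encode_tuple` and `decode_tuple` are hypothetical names.

```python
import struct

# Column types come from the schema (catalog), not from the tuple itself.
FORMATS = ("i", "i", "i", "d", "f")  # foo: a INT, b INT, c INT, d DOUBLE, e FLOAT


def encode_tuple(values: list, formats: tuple = FORMATS) -> bytes:
    """Hypothetical layout: [visibility byte][NULL bitmap][non-NULL attribute values]."""
    if len(values) != len(formats):
        raise ValueError("one value per column required")
    bitmap = bytearray((len(formats) + 7) // 8)  # one bit per column, rounded up to bytes
    body = bytearray()
    for i, (value, fmt) in enumerate(zip(values, formats)):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)  # mark column i as NULL; it takes no data bytes
        else:
            body += struct.pack("<" + fmt, value)
    visibility = 0  # placeholder for concurrency-control info
    return bytes([visibility]) + bytes(bitmap) + bytes(body)


def decode_tuple(raw: bytes, formats: tuple = FORMATS) -> list:
    nbitmap = (len(formats) + 7) // 8
    bitmap = raw[1:1 + nbitmap]
    pos = 1 + nbitmap
    values = []
    for i, fmt in enumerate(formats):
        if bitmap[i // 8] & (1 << (i % 8)):
            values.append(None)
        else:
            (value,) = struct.unpack_from("<" + fmt, raw, pos)
            pos += struct.calcsize("<" + fmt)
            values.append(value)
    return values


row_bytes = encode_tuple([1, 2, None, 4.5, None])
print(len(row_bytes), decode_tuple(row_bytes))
```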


TUPLE DATA
Attributes are typically stored in the order that you specify them when you create the table.

This is done for software engineering reasons (i.e., simplicity).

However, it might be more efficient to lay them out differently.

  CREATE TABLE foo (
    a INT PRIMARY KEY,
    b INT NOT NULL,
    c INT,
    d DOUBLE,
    e FLOAT
  );

Figure: Tuple = [Header | a | b | c | d | e]
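One reason a different order can be more efficient is alignment padding. This sketch assumes INT and FLOAT are 4 bytes, DOUBLE is 8 bytes, and each attribute must start at a multiple of its own size, as a C compiler would lay out a struct; the `layout` helper is hypothetical and ignores trailing padding. Under those assumptions the declared order of `foo` wastes 4 bytes before `d`, while moving `d` first removes the gap.

```python
# Assumption: INT and FLOAT are 4 bytes, DOUBLE is 8 bytes, and every
# attribute must start at an offset that is a multiple of its own size.
SIZES = {"INT": 4, "FLOAT": 4, "DOUBLE": 8}


def layout(columns):
    """Return ({column: offset}, bytes used) for columns stored in the given order.

    Trailing padding after the last attribute is not counted."""
    offsets, pos = {}, 0
    for name, typ in columns:
        size = SIZES[typ]
        pos = (pos + size - 1) // size * size  # round up to the next aligned offset
        offsets[name] = pos
        pos += size
    return offsets, pos


declared = [("a", "INT"), ("b", "INT"), ("c", "INT"), ("d", "DOUBLE"), ("e", "FLOAT")]
reordered = [("d", "DOUBLE"), ("a", "INT"), ("b", "INT"), ("c", "INT"), ("e", "FLOAT")]
print(layout(declared))   # d must skip from offset 12 to 16: 28 bytes
print(layout(reordered))  # no padding needed: 24 bytes
```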


DENORMALIZED TUPLE DATA
The DBMS can physically denormalize (e.g., "pre-join") related tuples and store them together in the same page.
→ Potentially reduces the amount of I/O for common workload patterns.
→ Can make updates more expensive.

  CREATE TABLE foo (
    a INT PRIMARY KEY,
    b INT NOT NULL
  );
  CREATE TABLE bar (
    c INT PRIMARY KEY,
    a INT REFERENCES foo (a)
  );

  SELECT * FROM foo JOIN bar
    ON foo.a = bar.a;

Figure: normally a foo tuple is stored as [Header | a | b] and each matching bar tuple separately as [Header | c | a]. Pre-joined, the foo tuple carries its bar tuples' c values inline: [Header | a | b | c | c | c | …].

Not a new idea.
→ IBM System R did this in the 1970s.
→ Several NoSQL DBMSs do this without calling it physical denormalization.
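A minimal sketch of the pre-join idea using plain Python lists and dicts in place of pages, with hypothetical sample rows for `foo` and `bar`. `prejoin` nests each foo record's matching bar `c` values inside it, so answering the join is a single pass over the foo records with no lookups into a separate bar table.

```python
from collections import defaultdict

# Hypothetical rows for the foo/bar schema above.
foo_rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
bar_rows = [{"c": 100, "a": 1}, {"c": 101, "a": 1}, {"c": 102, "a": 2}]


def prejoin(foo, bar):
    """Store each foo record together with the c values of its matching bar rows."""
    children = defaultdict(list)
    for row in bar:
        children[row["a"]].append(row["c"])
    return [{"a": f["a"], "b": f["b"], "bar_c": children.get(f["a"], [])} for f in foo]


def join_from_prejoined(records):
    """SELECT * FROM foo JOIN bar ON foo.a = bar.a: columns (foo.a, foo.b, bar.c, bar.a)."""
    return [(r["a"], r["b"], c, r["a"]) for r in records for c in r["bar_c"]]


records = prejoin(foo_rows, bar_rows)
print(join_from_prejoined(records))
```

The update cost shows up here too: changing which foo row a bar row references means rewriting two foo records instead of one bar tuple.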


CONCLUSION
The database is organized in pages.
Different ways to track pages.
Different ways to store pages.
Different ways to store tuples.


NEXT CLASS
Log-Structured Storage
Index-Organized Storage
Value Representation
Catalogs
