
Database Systems
Storage Models & Data Compression
15-445/645 FALL 2024 PROF. ANDY PAVLO


ADMINISTRIVIA
Homework #2 is due Sept 22nd @ 11:59pm

Project #1 is due Sept 29th @ 11:59pm
→ Recitation on Wed Sept 18th @ 6:00pm


UPCOMING DATABASE EVENTS


CMU-DB Industry Affiliates Visit Day
→ Monday Sept 16th: Research Talks + Poster Session
→ Tuesday Sept 17th: Company Info Sessions
→ All events are open to the public.

Sign up for Company Info Sessions (@61)


Add your Resume if You Want to Make $$$ (@92)


LAST CLASS
We discussed storage architecture alternatives to the
tuple-oriented scheme:
→ Log-structured storage
→ Index-organized storage

These approaches are ideal for write-heavy
(INSERT/UPDATE/DELETE) workloads.
But for some workloads the most important concern
may be read (SELECT) performance…


TODAY'S AGENDA
Database Workloads
Storage Models
Data Compression
DB Flash Talk: StarTree


DATABASE WORKLOADS
On-Line Transaction Processing (OLTP)
→ Fast operations that only read/update a small amount of
  data each time.

On-Line Analytical Processing (OLAP)
→ Complex queries that read a lot of data to compute
  aggregates.

Hybrid Transaction + Analytical Processing (HTAP)
→ OLTP + OLAP together on the same database instance.


DATABASE WORKLOADS

[Figure: a chart plotting workloads by Operation Complexity (Simple → Complex)
against Workload Focus (Write-Heavy → Read-Heavy). OLTP sits in the simple,
write-heavy corner; OLAP sits in the complex, read-heavy corner; HTAP spans
the middle between them. Source: Mike Stonebraker. Photo: Jim Gray.]

WIKIPEDIA EXAMPLE

CREATE TABLE useracct (
  userID INT PRIMARY KEY,
  userName VARCHAR UNIQUE,
  ⋮
);

CREATE TABLE pages (
  pageID INT PRIMARY KEY,
  title VARCHAR UNIQUE,
  latest INT REFERENCES revisions (revID)
);

CREATE TABLE revisions (
  revID INT PRIMARY KEY,
  userID INT REFERENCES useracct (userID),
  pageID INT REFERENCES pages (pageID),
  content TEXT,
  updated DATETIME
);

OBSERVATION
The relational model does not specify that the
DBMS must store all of a tuple's attributes
together in a single page.

This may not actually be the best layout for some
workloads…


OLTP
On-line Transaction Processing:
→ Simple queries that read/update a small amount of data
  that is related to a single entity in the database.

This is usually the kind of application that people build first.

SELECT P.*, R.*
  FROM pages AS P
 INNER JOIN revisions AS R
    ON P.latest = R.revID
 WHERE P.pageID = ?

UPDATE useracct
   SET lastLogin = NOW(),
       hostname = ?
 WHERE userID = ?

INSERT INTO revisions
VALUES (?,?,…,?)

OLAP
On-line Analytical Processing:
→ Complex queries that read large portions of the database
  spanning multiple entities.

You execute these workloads on the data you have collected
from your OLTP application(s).

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)


STORAGE MODELS
A DBMS's storage model specifies how it
physically organizes tuples on disk and in memory.
→ Can have different performance characteristics based on
the target workload (OLTP vs. OLAP).
→ Influences the design choices of the rest of the DBMS.

Choice #1: N-ary Storage Model (NSM)
Choice #2: Decomposition Storage Model (DSM)
Choice #3: Hybrid Storage Model (PAX)


N-ARY STORAGE MODEL (NSM)


The DBMS stores (almost) all attributes for a single
tuple contiguously in a single page.
→ Also commonly known as a "row store".

Ideal for OLTP workloads where queries are more likely
to access individual entities and execute write-heavy
operations.

NSM database page sizes are typically some constant
multiple of 4 KB hardware pages.
→ See Lecture #03


NSM: PHYSICAL ORGANIZATION

A disk-oriented NSM system stores a tuple's fixed-length and
variable-length attributes contiguously in a single slotted page.

The tuple's record id (page#, slot#) is how the DBMS uniquely
identifies a physical tuple.

Logical table (Col A, Col B, Col C):
  Row #0: a0 b0 c0
  Row #1: a1 b1 c1
  Row #2: a2 b2 c2
  Row #3: a3 b3 c3
  Row #4: a4 b4 c4
  Row #5: a5 b5 c5

[Figure: a slotted database page with a page header and slot array at the
front, filling up one complete record at a time:
header | a0 b0 c0, header | a1 b1 c1, …, header | a5 b5 c5.]
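To make the slotted-page layout and the (page#, slot#) record id concrete, here is a minimal Python sketch (illustrative class and field names, not any real DBMS's implementation) that stores all of a tuple's fixed-length attributes contiguously in one page:

# Minimal sketch (illustrative, not a real DBMS): an NSM slotted page that
# stores each tuple's attributes contiguously and identifies tuples by
# record id = (page#, slot#).
import struct

PAGE_SIZE = 4096

class SlottedPage:
    def __init__(self, page_id):
        self.page_id = page_id
        self.data = bytearray(PAGE_SIZE)
        self.slots = []               # slot array: (offset, length) per tuple
        self.free_end = PAGE_SIZE     # tuple data grows backwards from the end

    def insert(self, values):
        """Store all attributes of one tuple contiguously; return its record id."""
        record = struct.pack("<iii", *values)         # three fixed-length attributes
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return (self.page_id, len(self.slots) - 1)    # (page#, slot#)

    def get(self, slot):
        offset, length = self.slots[slot]
        return struct.unpack("<iii", self.data[offset:offset + length])

page = SlottedPage(page_id=0)
rid = page.insert((0, 10, 100))       # tuple (a0, b0, c0)
print(rid, page.get(rid[1]))          # (0, 0) (0, 10, 100)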

NSM: OLTP EXAMPLE

SELECT * FROM useracct
 WHERE userName = ?
   AND userPass = ?

INSERT INTO useracct
VALUES (?,?,…,?)

[Figure: an index (see Lectures #8 + #9) maps the userName key to a tuple's
location in an NSM disk page within the database file. Each page stores
complete rows: header | userID | userName | userPass | hostname | lastLogin.
The SELECT follows the index to a single page and reads the whole tuple; the
INSERT writes a new complete tuple into an empty slot of a page.]

NSM: OLAP EXAMPLE

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)

[Figure: the query scans every NSM disk page in the database file. Each page
stores complete tuples (header | userID | userName | userPass | hostname |
lastLogin), but the query only needs hostname and lastLogin, so most of what
the DBMS fetches from disk is useless data!]

NSM: SUMMARY
Advantages
→ Fast inserts, updates, and deletes.
→ Good for queries that need the entire tuple (OLTP).
→ Can use index-oriented physical storage for clustering.

Disadvantages
→ Not good for scanning large portions of the table and/or a
subset of the attributes.
→ Terrible memory locality in access patterns.
→ Not ideal for compression because of multiple value
domains within a single page.


DECOMPOSITION STORAGE MODEL (DSM)


Store a single attribute for all tuples
contiguously in a block of data.
→ Also known as a "column store".

Ideal for OLAP workloads where read-only queries
perform large scans over a subset of the table's
attributes.

The DBMS is responsible for combining/splitting a
tuple's attributes when reading/writing.

DSM: PHYSICAL ORGANIZATION

Store attributes and meta-data (e.g., nulls) in separate
arrays of fixed-length values.
→ Most systems identify unique physical tuples using offsets
  into these arrays.
→ Need to handle variable-length values…

Maintain separate pages per attribute with a dedicated
header area for meta-data about the entire column.

[Figure: the same logical table (rows #0–#5 of Col A, Col B, Col C) split
into one page per column. Page #1: header + null bitmap + a0 a1 a2 a3 a4 a5;
Page #2: header + null bitmap + b0 b1 b2 b3 b4 b5;
Page #3: header + null bitmap + c0 c1 c2 c3 c4 c5.]
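A minimal Python sketch of the same idea (illustrative names, not a real system): each attribute lives in its own fixed-length array with a null bitmap, a tuple is identified by its offset, and the DBMS splits tuples on insert and stitches them back together on read:

# Minimal sketch of DSM column storage (illustrative names): one fixed-length
# array per attribute plus a null bitmap; a tuple is identified by the same
# offset into every array.
from array import array

class Column:
    def __init__(self):
        self.values = array("i")    # fixed-length 32-bit values
        self.nulls = []             # null bitmap: one flag per row

    def append(self, value):
        self.nulls.append(value is None)
        self.values.append(0 if value is None else value)

class ColumnTable:
    def __init__(self, names):
        self.columns = {name: Column() for name in names}

    def insert(self, row):
        # Split the tuple across the per-attribute arrays.
        for name, col in self.columns.items():
            col.append(row.get(name))
        any_col = next(iter(self.columns.values()))
        return len(any_col.values) - 1               # offset into the arrays = tuple id

    def fetch(self, offset):
        # Stitch the tuple back together from every column.
        return {name: (None if col.nulls[offset] else col.values[offset])
                for name, col in self.columns.items()}

t = ColumnTable(["a", "b", "c"])
tid = t.insert({"a": 1, "b": 10, "c": None})
print(tid, t.fetch(tid))    # 0 {'a': 1, 'b': 10, 'c': None}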

DSM: OLAP EXAMPLE

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)

[Figure: the same table stored column-at-a-time in DSM disk pages. The query
reads only the pages holding the hostname and lastLogin columns; the userID,
userName, and userPass columns are never fetched from disk.]

DSM: TUPLE IDENTIFICATION


Choice #1: Fixed-length Offsets
→ Each value is the same length for an attribute.

Choice #2: Embedded Tuple Ids
→ Each value is stored with its tuple id in a column.
→ Don't do this!

[Figure: with fixed-length offsets, columns A–D are parallel arrays indexed
by position 0–3; with embedded ids, every column redundantly stores the tuple
id next to each value.]

DSM: VARIABLE-LENGTH DATA


Padding variable-length fields to ensure they are
fixed-length is wasteful, especially for large
attributes.

A better approach is to use dictionary compression to
convert repetitive variable-length data into fixed-
length values (typically 32-bit integers).
→ More on this later in this lecture…


DECOMPOSITION STORAGE MODEL (DSM)


Advantages
→ Reduces the amount of wasted I/O per query because the
  DBMS only reads the data that it needs.
→ Faster query processing because of increased locality and
  cached data reuse (Lecture #13).
→ Better data compression.

Disadvantages
→ Slow for point queries, inserts, updates, and deletes
  because of tuple splitting/stitching/reorganization.


OBSERVATION
OLAP queries almost never access a single column
in a table by itself.
→ At some point during query execution, the DBMS must get
  other columns and stitch the original tuple back together.

But we still need to store data in a columnar format
to get the storage + execution benefits.

We need a columnar scheme that still stores
attributes separately but keeps the data for each
tuple physically close to each other…


PAX STORAGE MODEL


Partition Attributes Across (PAX) is a hybrid storage
model that vertically partitions attributes within a
database page.
→ Examples: Parquet, ORC, and Arrow.

The goal is to get the benefit of faster processing on
columnar storage while retaining the spatial locality
benefits of row storage.


PAX: PHYSICAL ORGANIZATION

Horizontally partition data into row groups. Then vertically
partition their attributes into column chunks.

Global meta-data directory contains offsets to the file's
row groups.
→ This is stored in the footer if the file is immutable
  (Parquet, ORC).

Each row group contains its own meta-data header about
its contents.

[Figure: a PAX file. The logical table (rows #0–#5 of Col A, Col B, Col C)
is split into two row groups of three rows each. Row Group 1 = row-group
meta-data + column chunks (a0 a1 a2 | b0 b1 b2 | c0 c1 c2); Row Group 2 =
row-group meta-data + (a3 a4 a5 | b3 b4 b5 | c3 c4 c5). The file meta-data
sits at the end of the file.]
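The sketch below (Python; an illustrative toy layout, not the actual Parquet/ORC format) shows the PAX idea: buffer rows into row groups, write each row group column chunk by column chunk, and put the row-group offsets in a footer at the end of the immutable file:

# Minimal sketch of a PAX-style layout (not real Parquet/ORC): buffer rows
# into row groups, write each row group as column chunks, and record the
# row-group offsets in a footer at the end of the file.
import io, json, struct

def write_pax(rows, columns, group_size=3):
    buf = io.BytesIO()
    offsets = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]          # horizontal partition
        offsets.append(buf.tell())
        for col in columns:                             # vertical partition
            for row in group:                           # one column chunk
                buf.write(struct.pack("<i", row[col]))
    footer = json.dumps({"columns": columns,
                         "group_size": group_size,
                         "row_group_offsets": offsets}).encode()
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))           # footer length stored last
    return buf.getvalue()

rows = [{"a": i, "b": 10 * i, "c": 100 * i} for i in range(6)]
data = write_pax(rows, ["a", "b", "c"])
footer_len = struct.unpack("<I", data[-4:])[0]
print(json.loads(data[-4 - footer_len:-4]))   # read the meta-data from the footer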

OBSERVATION
I/O is the main bottleneck if the DBMS fetches data
from disk during query execution.

The DBMS can compress pages to increase the utility
of the data moved per I/O operation.

Key trade-off is speed vs. compression ratio.
→ Compressing the database reduces DRAM requirements.
→ It may decrease CPU costs during query execution.


DATABASE COMPRESSION
Goal #1: Must produce fixed-length values.
→ Only exception is var-length data stored in a separate pool.

Goal #2: Postpone decompression for as long as
possible during query execution.
→ Also known as late materialization.

Goal #3: Must be a lossless scheme.
→ People (typically) don't like losing data.
→ Any lossy compression must be performed by the application.


COMPRESSION GRANULARITY
Choice #1: Block-level
→ Compress a block of tuples for the same table.
Choice #2: Tuple-level
→ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
→ Compress a single attribute within one tuple (overflow).
→ Can target multiple attributes for the same tuple.
Choice #4: Column-level
→ Compress multiple values for one or more attributes stored
for multiple tuples (DSM-only).


NAÏVE COMPRESSION
Compress data using a general-purpose algorithm.
Scope of compression is only based on the data
provided as input.
→ LZO (1996), LZ4 (2011), Snappy (2011),
Oracle OZIP (2014), Zstd (2015)

Considerations
→ Computational overhead
→ Compress vs. decompress speed.
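As a rough illustration, the sketch below uses zlib from Python's standard library as a stand-in general-purpose compressor (the systems above ship their own codecs); the key point is that the codec sees only opaque bytes and must fully decompress before the data can be read:

# Naïve page compression with a general-purpose codec (zlib as a stand-in):
# the compressor only sees opaque bytes, not tuples, types, or columns.
import zlib

page = ("header|1|andy|secret|cmu.edu|2024-09-16\n" * 100).encode()

compressed = zlib.compress(page, 6)      # higher level: better ratio, slower compress
restored = zlib.decompress(compressed)   # must decompress before the data is readable

assert restored == page
print(len(page), "->", len(compressed), "bytes")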


MYSQL INNODB COMPRESSION

[Figure: InnoDB keeps compressed pages ([1,2,4,8] KB) in the database file,
each with a "mod log" of recent changes. Writes are appended to a page's mod
log in the buffer pool without decompressing the page; when the DBMS needs to
read the page, it decompresses it into a separate 16 KB uncompressed copy in
the buffer pool. Source: MySQL 5.7 Documentation]

NAÏVE COMPRESSION
The DBMS must decompress data first before it can
be read and (potentially) modified.
→ This limits the "scope" of the compression scheme.

These schemes also do not consider the high-level
meaning or semantics of the data.


OBSERVATION
Ideally, we want the DBMS to operate on
compressed data without decompressing it first.

Original:                          Compressed (Database Magic!):
SELECT * FROM users                SELECT * FROM users
WHERE name = 'Andy'                WHERE name = XX

NAME      SALARY                   NAME  SALARY
Andy      99999                    XX    AA
Jignesh   88888                    YY    BB


COMPRESSION GRANULARITY
Choice #1: Block-level
→ Compress a block of tuples for the same table.
Choice #2: Tuple-level
→ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
→ Compress a single attribute within one tuple (overflow).
→ Can target multiple attributes for the same tuple.
Choice #4: Column-level
→ Compress multiple values for one or more attributes stored
for multiple tuples (DSM-only).


COLUMNAR COMPRESSION
Run-length Encoding
Bit-Packing Encoding
Bitmap Encoding
Delta / Frame-of-Reference Encoding
Incremental Encoding
Dictionary Encoding


RUN-LENGTH ENCODING
Compress runs of the same value in a single column
into triplets:
→ The value of the attribute.
→ The start position in the column segment.
→ The # of elements in the run.

Requires the columns to be sorted intelligently to
maximize compression opportunities.


RUN-LENGTH ENCODING

Original Data          Compressed Data
id  isDead             id                 isDead
1   Y                  1,2,3,4,6,7,8,9    (Y,0,3)
2   Y                                     (N,3,1)
3   Y                                     (Y,4,1)
4   N                                     (N,5,1)
6   Y                                     (Y,6,2)
7   N
8   Y                  RLE Triplet = (Value, Offset, Length)
9   Y


RUN-LENGTH ENCODING

SELECT isDead, COUNT(*)
  FROM users
 GROUP BY isDead

Compressed Data:
id                 isDead
1,2,3,4,6,7,8,9    (Y,0,3) (N,3,1) (Y,4,1) (N,5,1) (Y,6,2)

RLE Triplet = (Value, Offset, Length)



RUN-LENGTH ENCODING

Sorted Data            Compressed Data
id  isDead             id                 isDead
1   Y                  1,2,3,6,8,9,4,7    (Y,0,6)
2   Y                                     (N,6,2)
3   Y
6   Y
8   Y
9   Y
4   N
7   N
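A minimal Python sketch of the triplet encoding (0-based offsets, matching the examples above; not any particular DBMS's implementation):

# Minimal sketch of column RLE: compress a column into
# (value, start offset, run length) triplets and expand it back.
def rle_encode(column):
    triplets = []
    for offset, value in enumerate(column):
        if triplets and triplets[-1][0] == value:
            v, start, length = triplets[-1]
            triplets[-1] = (v, start, length + 1)   # extend the current run
        else:
            triplets.append((value, offset, 1))     # start a new run
    return triplets

def rle_decode(triplets):
    return [value for value, _, length in triplets for _ in range(length)]

sorted_col = ["Y", "Y", "Y", "Y", "Y", "Y", "N", "N"]
print(rle_encode(sorted_col))      # [('Y', 0, 6), ('N', 6, 2)]
assert rle_decode(rle_encode(sorted_col)) == sorted_col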


BIT PACKING

If the values for an integer attribute are smaller than the
range of its given data type, then reduce the number of bits
used to represent each value.

Use bit-shifting tricks to operate on multiple values in a
single word.

Original Data (int32)                          Bit-Packed (int8)
 13   00000000 00000000 00000000 00001101      00001101
191   00000000 00000000 00000000 10111111      10111111
 56   00000000 00000000 00000000 00111000      00111000
 92   00000000 00000000 00000000 01011100      01011100
 81   00000000 00000000 00000000 01010001      01010001
120   00000000 00000000 00000000 01111000      01111000
231   00000000 00000000 00000000 11100111      11100111
172   00000000 00000000 00000000 10101100      10101100

Original: 8 × 32-bits = 256 bits      Compressed: 8 × 8-bits = 64 bits
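A minimal Python sketch of the idea (illustrative, not a real DBMS kernel): pack the 8-bit values into a byte array and use shifts and masks to pull several values out of one 32-bit word:

# Minimal sketch of bit packing: store 8-bit values from an int32 column in a
# packed byte array, and use shifts/masks to read several values per word.
import struct

values = [13, 191, 56, 92, 81, 120, 231, 172]    # all fit in 8 bits

packed = bytes(values)                           # 8 bytes instead of 32
assert len(packed) == 8

# Operate on 4 packed values at once by loading them into one 32-bit word.
word = struct.unpack_from("<I", packed, 0)[0]
first_four = [(word >> (8 * i)) & 0xFF for i in range(4)]
print(first_four)                                # [13, 191, 56, 92]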

PATCHING / MOSTLY ENCODING

A variation of bit packing for when an attribute's values are
"mostly" less than the largest size: store them with a smaller
data type.
→ The remaining values that cannot be compressed are stored
  in their raw form in a separate look-up table.

Original Data (int32)    Compressed Data (mostly8)    offset  value
13                       13                           3       99999999
191                      191
99999999                 XXX
92                       92
81                       81
120                      120
231                      231
172                      172

Original: 8 × 32-bits = 256 bits
Compressed: (8 × 8-bits) + 16-bits + 32-bits = 112 bits

Source: Redshift Documentation
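A rough Python sketch of the idea, treating mostly8 as an unsigned byte for simplicity (Redshift's actual encoding differs): values that fit go into the packed column and outliers go into a separate patch table keyed by offset (0-based here):

# Rough sketch of "mostly" encoding (mostly8 treated as an unsigned byte for
# simplicity): values that fit in 8 bits go into the packed column; outliers
# are kept in raw 32-bit form in a separate patch table keyed by offset.
def mostly8_encode(column):
    packed, patches = [], {}
    for offset, v in enumerate(column):
        if 0 <= v <= 255:                 # fits in one byte
            packed.append(v)
        else:
            packed.append(0)              # placeholder byte in the packed column
            patches[offset] = v           # raw value in the patch table
    return bytes(packed), patches

def mostly8_get(packed, patches, offset):
    return patches.get(offset, packed[offset])

col = [13, 191, 99999999, 92, 81, 120, 231, 172]
packed, patches = mostly8_encode(col)
print(len(packed), patches)               # 8 {2: 99999999}  (0-based offset)
assert [mostly8_get(packed, patches, i) for i in range(len(col))] == col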


BITMAP ENCODING
Store a separate bitmap for each unique value for an
attribute where an offset in the vector corresponds
to a tuple.
→ The ith position in the Bitmap corresponds to the ith tuple
in the table.
→ Typically segmented into chunks to avoid allocating large
blocks of contiguous memory.

Only practical if the value cardinality is low.


Some DBMSs provide bitmap indexes.


BITMAP ENCODING

Original Data          Compressed Data (one bitmap per unique value)
id  isDead             id      Y   N
1   Y                  1       1   0
2   Y                  2       1   0
3   Y                  3       1   0
4   N                  4       0   1
6   Y                  6       1   0
7   N                  7       0   1
8   Y                  8       1   0
9   Y                  9       1   0

Original: isDead = 8 × 8-bits = 64 bits
Compressed: two bitmaps (Y and N) = 8 × 2-bits = 16 bits
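A minimal Python sketch of the encoding (illustrative only): one bitmap per distinct value, where bit i says whether tuple i has that value; equality predicates and counts then become bitwise operations:

# Minimal sketch of bitmap encoding: one bitmap per distinct value; the i-th
# bit of a bitmap says whether the i-th tuple has that value.
def bitmap_encode(column):
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps.setdefault(value, 0)
        bitmaps[value] |= 1 << i          # set the i-th bit
    return bitmaps

isDead = ["Y", "Y", "Y", "N", "Y", "N", "Y", "Y"]
bitmaps = bitmap_encode(isDead)
print({v: format(bm, "08b") for v, bm in bitmaps.items()})
# {'Y': '11010111', 'N': '00101000'}   (bit i = tuple i, least-significant first)

# A predicate like isDead = 'N' is just the 'N' bitmap; COUNT(*) is a popcount.
print(bin(bitmaps["N"]).count("1"))      # 2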


BITMAP ENCODING: EXAMPLE


Assume we have 10 million tuples and 43,000 zip codes
in the US.
→ zip_code stored as int32: 10000000 × 32-bits = 40 MB
→ Stored as bitmaps: 10000000 × 43000 bits = 53.75 GB

Every time the application inserts a new tuple, the DBMS
must extend 43,000 different bitmaps.

There are compressed data structures for sparse data sets:
→ Roaring Bitmaps

CREATE TABLE customer (
  id INT PRIMARY KEY,
  name VARCHAR(32),
  email VARCHAR(64),
  address VARCHAR(64),
  zip_code INT
);


DELTA ENCODING

Recording the difference between values that follow each
other in the same column.
→ Store base value in-line or in a separate look-up table.
→ Combine with RLE to get even better compression ratios.

Frame-of-Reference Variant: Use the global min value as the base.

Original Data       Delta Encoded            Delta + RLE
time64  temp        time64  temp             time64   temp
12:00   99.5        12:00   99.5             12:00    99.5
12:01   99.4        +1      -0.1             (+1,4)   -0.1
12:02   99.5        +1      +0.1                      +0.1
12:03   99.6        +1      +0.1                      +0.1
12:04   99.4        +1      -0.2                      -0.2

time64 column sizes:
5 × 64-bits         64-bits + (4 × 16-bits)  64-bits + (2 × 16-bits)
= 320 bits          = 128 bits               = 96 bits
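A minimal Python sketch of delta encoding for the time64 column (illustrative; a frame-of-reference variant would store offsets from the global minimum instead of from the previous value):

# Minimal sketch of delta encoding: keep the first value as the base and
# store each value as the difference from the previous one.
def delta_encode(column):
    base = column[0]
    deltas = [column[i] - column[i - 1] for i in range(1, len(column))]
    return base, deltas

def delta_decode(base, deltas):
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out

times = [720, 721, 722, 723, 724]        # minutes since midnight: 12:00 … 12:04
base, deltas = delta_encode(times)
print(base, deltas)                       # 720 [1, 1, 1, 1]  -> then RLE: (1, 4)
assert delta_decode(base, deltas) == times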

DICTIONARY COMPRESSION
Replace frequent values with smaller fixed-length
codes and then maintain a mapping (dictionary)
from the codes to the original values.
→ Typically, one code per attribute value.
→ Most widely used native compression scheme in DBMSs.

The ideal dictionary scheme supports fast encoding
and decoding for both point and range queries.
→ Encode/Locate: For a given uncompressed value, convert
  it into its compressed form.
→ Decode/Extract: For a given compressed value, convert it
  back into its original form.
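A minimal Python sketch of the basic scheme (illustrative names): map each distinct value to a fixed-length 32-bit code and keep the dictionary around for decoding:

# Minimal sketch of dictionary compression: replace each value with a
# fixed-length 32-bit code and keep the dictionary for decoding.
from array import array

def dict_encode(column):
    dictionary = {}                        # value -> code
    codes = array("i")                     # fixed-length codes
    for value in column:
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return codes, dictionary

names = ["Andy", "Jignesh", "Andy", "Mr.Pickles", "Andy"]
codes, dictionary = dict_encode(names)
decode = {code: value for value, code in dictionary.items()}

print(list(codes))                         # [0, 1, 0, 2, 0]
print([decode[c] for c in codes])          # round-trips to the original column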

DICTIONARY: ORDER-PRESERVING
The encoded values need to support the same collation as
the original values.

SELECT * FROM users               SELECT * FROM users
WHERE name LIKE 'And%'      →     WHERE name BETWEEN 10 AND 20

Original Data     Compressed Data     Dictionary (sorted)
name              name                value        code
Andrea            10                  Andrea       10
Mr.Pickles        40                  Andy         20
Andy              20                  Jignesh      30
Jignesh           30                  Mr.Pickles   40
Mr.Pickles        40

ORDER-PRESERVING ENCODING
SELECT name FROM users
 WHERE name LIKE 'And%'
→ Still must perform a scan on the column.

SELECT DISTINCT name
  FROM users
 WHERE name LIKE 'And%'
→ Only needs to access the dictionary.

Original Data     Compressed Data     Dictionary (sorted)
name              name                value        code
Andrea            10                  Andrea       10
Mr.Pickles        40                  Andy         20
Andy              20                  Jignesh      30
Jignesh           30                  Mr.Pickles   40
Mr.Pickles        40
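A minimal Python sketch of an order-preserving dictionary (illustrative; assumes the prefix matches at least one dictionary entry): because codes are assigned in sorted order, a prefix predicate such as name LIKE 'And%' can be rewritten into a code range and evaluated on the compressed column:

# Minimal sketch of an order-preserving dictionary: codes are assigned in
# sorted order, so a prefix predicate becomes a range predicate over codes
# and can be evaluated without decompressing the column.
import bisect

values = sorted(["Andrea", "Andy", "Jignesh", "Mr.Pickles"])
code_of = {v: (i + 1) * 10 for i, v in enumerate(values)}    # Andrea=10 … Mr.Pickles=40
codes_sorted = [code_of[v] for v in values]

def prefix_to_code_range(prefix):
    # Assumes the prefix matches at least one dictionary entry.
    lo = bisect.bisect_left(values, prefix)
    hi = bisect.bisect_left(values, prefix + "\uffff")        # just past the prefix
    return codes_sorted[lo], codes_sorted[hi - 1]             # inclusive code range

column_codes = [code_of[v] for v in ["Andrea", "Mr.Pickles", "Andy", "Jignesh"]]
lo, hi = prefix_to_code_range("And")                          # LIKE 'And%'
print(lo, hi)                                                 # 10 20
print([c for c in column_codes if lo <= c <= hi])             # [10, 20] without decoding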

CONCLUSION
It is important to choose the right storage model for
the target workload:
→ OLTP = Row Store
→ OLAP = Column Store

DBMSs can combine different approaches for even
better compression.

Dictionary encoding is probably the most useful
scheme because it does not require pre-sorting.


DATABASE STORAGE
Problem #1: How the DBMS represents the
database in files on disk.

Problem #2: How the DBMS manages its memory
and moves data back-and-forth from disk. ← Next
