05 Storage3
05 Storage3
Systems
Storage Models &
Data Compression
15-445/645 FALL 2024 PROF. ANDY PAVLO
ADMINISTRIVIA
Homework #2 is due Sept 22nd @ 11:59pm
LAST CLASS
We discussed storage architecture alternatives to
tuple-oriented scheme.
→ Log-structured storage
→ Index-organized storage
TODAY'S AGENDA
Database Workloads
Storage Models
Data Compression
DB Flash Talk: StarTree
DATABASE WORKLOADS
On-Line Transaction Processing (OLTP)
→ Fast operations that only read/update a small amount of
data each time.
DATABASE WORKLOADS
Complex
Operation Complexity
OLAP
Jim Gray
Jim Gray
OLTP
Simple
Write-Heavy Read-Heavy
DATABASE WORKLOADS
Complex
Operation Complexity
OLAP
HTAP
OLTP
Simple
Write-Heavy Read-Heavy
WIKIPEDIA EXAMPLE
OBSERVATION
The relational model does not specify that the
DBMS must store all a tuple's attributes together in
a single page.
OLTP
On-line Transaction Processing:
SELECT P.*, R.*
→ Simple queries that read/update a small FROM pages AS P
amount of data that is related to a single INNER JOIN revisions AS R
entity in the database. ON P.latest = R.revID
WHERE P.pageID = ?
This is usually the kind of application
that people build first. UPDATE useracct
SET lastLogin = NOW(),
hostname = ?
WHERE userID = ?
OLAP
On-line Analytical Processing: SELECT COUNT(U.lastLogin),
→ Complex queries that read large portions EXTRACT(month FROM
of the database spanning multiple entities. U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
You execute these workloads on the GROUP BY
EXTRACT(month FROM U.lastLogin)
data you have collected from your
OLTP application(s).
STORAGE MODELS
A DBMS's storage model specifies how it
physically organizes tuples on disk and in memory.
→ Can have different performance characteristics based on
the target workload (OLTP vs. OLAP).
→ Influences the design choices of the rest of the DBMS.
Database Page
header
header a0 b0 c0
Database Page
header
header a1
b1 c1 header a0 b0 c0
Database Page
header
header a5 b5 c5 header a4
b4 c4 header a3 b3 c3
header a2 b2 c2 header a1
b1 c1 header a0 b0 c0
header - - - - -
Disk
5-445/645 (Fall 2024)
header userID
- userName
- userPass
- hostname
- lastLogin
-
Disk
5-445/645 (Fall 2024)
Disk
5-445/645 (Fall 2024)
Disk
5-445/645 (Fall 2024)
Disk
5-445/645 (Fall 2024)
Disk
5-445/645 (Fall 2024)
Useless Data!
15-445/645 (Fall 2024)
24
NSM: SUMMARY
Advantages
→ Fast inserts, updates, and deletes.
→ Good for queries that need the entire tuple (OLTP).
→ Can use index-oriented physical storage for clustering.
Disadvantages
→ Not good for scanning large portions of the table and/or a
subset of the attributes.
→ Terrible memory locality in access patterns.
→ Not ideal for compression because of multiple value
domains within a single page.
length values.
Row #1 a1 b1 c1
Row #2 a2 b2 c2
→ Most systems identify unique physical Row #3 a3 b3 c3
tuples using offsets into these arrays. Row #4 a4 b4 c4
→ Need to handle variable-length values… Row #5 a5 b5 c5
Page #1
header null bitmap
length values.
Row #1 a1 b1 c1
Row #2 a2 b2 c2
→ Most systems identify unique physical Row #3 a3 b3 c3
tuples using offsets into these arrays. Row #4 a4 b4 c4
→ Need to handle variable-length values… Row #5 a5 b5 c5
Page #2 Page #1
header null bitmap
length values.
Row #1 a1 b1 c1
Row #2 a2 b2 c2
→ Most systems identify unique physical Row #3 a3 b3 c3
tuples using offsets into these arrays. Row #4 a4 b4 c4
→ Need to handle variable-length values… Row #5 a5 b5 c5
Page #2 Page #1
header null bitmap
Page #3
c0 c1 c2 c3 c4
5-445/645 (Fall 2024)
c5
15-445/645 (Fall 2024)
29
Disadvantages
→ Slow for point queries, inserts, updates, and deletes
because of tuple splitting/stitching/reorganization.
OBSERVATION
OLAP queries almost never access a single column
in a table by itself.
→ At some point during query execution, the DBMS must get
other columns and stitch the original tuple back together.
But we still need to store data in a columnar format
to get the storage + execution benefits.
Row Group
Chunk
Row Group Meta-Data
immutable (Parquet, Orc). a0 a1 a2 b0 b1 b2
PAX File
c0 c1 c2
Row Group
Chunk
Row Group Meta-Data
immutable (Parquet, Orc). a0 a1 a2 b0 b1 b2
PAX File
c0 c1 c2
Row Group
Row Group Meta-Data
File Meta-Data
5-445/645 (Fall 2024)
Row Group
Chunk
Row Group Meta-Data
immutable (Parquet, Orc). a0 a1 a2 b0 b1 b2
PAX File
c0 c1 c2
Row Group
Row Group Meta-Data
File Meta-Data
5-445/645 (Fall 2024)
OBSERVATION
I/O is the main bottleneck if the DBMS fetches data
from disk during query execution.
DATABASE COMPRESSION
Goal #1: Must produce fixed-length values.
→ Only exception is var-length data stored in separate pool.
COMPRESSION GRANULARITY
Choice #1: Block-level
→ Compress a block of tuples for the same table.
Choice #2: Tuple-level
→ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
→ Compress a single attribute within one tuple (overflow).
→ Can target multiple attributes for the same tuple.
Choice #4: Column-level
→ Compress multiple values for one or more attributes stored
for multiple tuples (DSM-only).
NAÏVE COMPRESSION
Compress data using a general-purpose algorithm.
Scope of compression is only based on the data
provided as input.
→ LZO (1996), LZ4 (2011), Snappy (2011),
Oracle OZIP (2014), Zstd (2015)
Considerations
→ Computational overhead
→ Compress vs. decompress speed.
mod log
Compressed Page2
mod log
Compressed Page1
mod log
Compressed Page2
mod log
Uncompressed Compressed Page1
Page0 16 KB
mod log
Compressed Page2
NAÏVE COMPRESSION
The DBMS must decompress data first before it can
be read and (potentially) modified.
→ This limits the "scope" of the compression scheme.
OBSERVATION
Ideally, we want the DBMS to operate on
compressed data without decompressing it first.
Database Magic!
SELECT * FROM users SELECT * FROM users
WHERE name = 'Andy' WHERE name = XX
COMPRESSION GRANULARITY
Choice #1: Block-level
→ Compress a block of tuples for the same table.
Choice #2: Tuple-level
→ Compress the contents of the entire tuple (NSM-only).
Choice #3: Attribute-level
→ Compress a single attribute within one tuple (overflow).
→ Can target multiple attributes for the same tuple.
Choice #4: Column-level
→ Compress multiple values for one or more attributes stored
for multiple tuples (DSM-only).
COLUMNAR COMPRESSION
Run-length Encoding
Bit-Packing Encoding
Bitmap Encoding
Delta / Frame-of-Reference Encoding
Incremental Encoding
Dictionary Encoding
RUN-LENGTH ENCODING
Compress runs of the same value in a single column
into triplets:
→ The value of the attribute.
→ The start position in the column segment.
→ The # of elements in the run.
RUN-LENGTH ENCODING
RUN-LENGTH ENCODING
Compressed Data
id isDead
1 (Y,0,3)
2 (N,3,1)
SELECT isDead, COUNT(*)
3 (Y,4,1)
FROM users
GROUP BY isDead 4 (N,5,1)
6 (Y,6,2)
7 RLE Triplet
8 - Value
9 - Offset
- Length
RUN-LENGTH ENCODING
RUN-LENGTH ENCODING
BIT PACKING
Original Data Original:
If the values for an integer attribute is 8 × 32-bits =
smaller than the range of its given int32 256 bits
data type size, then reduce the 13 00000000 00000000 00000000 00001101
BIT PACKING
Original Data Original:
If the values for an integer attribute is 8 × 32-bits =
smaller than the range of its given int32 256 bits
data type size, then reduce the 13 00000000 00000000 00000000 00001101
BIT PACKING
Original Data Original:
If the values for an integer attribute is 8 × 32-bits =
smaller than the range of its given int32 256 bits
data type size, then reduce the 13 00001101
value. 56 00111000
92 01011100
81 01010001
172 10101100
Compressed:
8 × 8-bits =
64 bits
5-445/645 (Fall 2024)
BITMAP ENCODING
Store a separate bitmap for each unique value for an
attribute where an offset in the vector corresponds
to a tuple.
→ The ith position in the Bitmap corresponds to the ith tuple
in the table.
→ Typically segmented into chunks to avoid allocating large
blocks of contiguous memory.
BITMAP ENCODING
BITMAP ENCODING
Compressed:
Original Data Compressed Data 16 bits + 16 bits =
32 bits
id isDead
isDead
id Y N
2 × 8-bits =
1 Y 16 bits
2 Y 1 1 0
3 Y 2 1 0
4 N Original: 3 1 0
6 Y
8 × 8-bits = 4 0 1 8 × 2-bits =
64 bits 6 1 0 16 bits
7 N
8 Y 7 0 1
9 Y 8 1 0
9 1 0
DELTA ENCODING
Recording the difference between values that follow
each other in the same column.
→ Store base value in-line or in a separate look-up table.
Original Data
time64 temp
12:00 99.5
12:01 99.4
12:02 99.5
12:03 99.6
12:04 99.4
DELTA ENCODING
Recording the difference between values that follow
each other in the same column.
→ Store base value in-line or in a separate look-up table.
DELTA ENCODING
Recording the difference between values that follow
each other in the same column.
→ Store base value in-line or in a separate look-up table.
→ Combine with RLE to get even better compression ratios.
DELTA ENCODING
Recording the difference between values that follow
each other in the same column.
→ Store base value in-line or in a separate look-up table.
→ Combine with RLE to get even better compression ratios.
DELTA ENCODING
Recording the difference between values that follow
each other in the same column.
→ Store base value in-line or in a separate look-up table.
→ Combine with RLE to get even better compression ratios.
Frame-of-Reference Variant: Use global min value.
Original Data Compressed Data Compressed Data
time64 temp time64 temp time64 temp
12:00 99.5 12:00 99.5 12:00 99.5
12:01 99.4 +1 -0.1 (+1,4) -0.1
12:02 99.5 +1 +0.1 +0.1
12:03 99.6 +1 +0.1 +0.1
12:04 99.4 +1 -0.2 -0.2
5 × 64-bits 64-bits + (4 × 16-bits) 64-bits + (2 × 16-bits)
= 320 bits = 128 bits = 96 bits
5-445/645 (Fall 2024)
DICTIONARY COMPRESSION
Replace frequent values with smaller fixed-length
codes and then maintain a mapping (dictionary)
from the codes to the original values
→ Typically, one code per attribute value.
→ Most widely used native compression scheme in DBMSs.
DICTIONARY: ORDER-PRESERVING
The encoded values need to support the same
collation as the original values.
Dictionary
name name value code
Sorted
Andrea 10 Andrea 10
Mr.Pickles 40 Andy 20
Andy 20 Jignesh 30
Jignesh 30 Mr.Pickles 40
Mr.Pickles 40
5-445/645 (Fall 2024)
DICTIONARY: ORDER-PRESERVING
The encoded values need to support the same
collation as the original values.
Dictionary
name name value code
Sorted
Andrea 10 Andrea 10
Mr.Pickles 40 Andy 20
Andy 20 Jignesh 30
Jignesh 30 Mr.Pickles 40
Mr.Pickles 40
5-445/645 (Fall 2024)
ORDER-PRESERVING ENCODING
SELECT name FROM users
WHERE name LIKE 'And%' Still must perform scan on column
Dictionary
name name value code
Sorted
Andrea 10 Andrea 10
Mr.Pickles 40 Andy 20
Andy 20 Jignesh 30
Jignesh 30 Mr.Pickles 40
Mr.Pickles 40
5-445/645 (Fall 2024)
CONCLUSION
It is important to choose the right storage model for
the target workload:
→ OLTP = Row Store
→ OLAP = Column Store
DATABASE STORAGE
Problem #1: How the DBMS represents the
database in files on disk.