Notes 03 - Database Storage - II
Yousef M. Elmehdwi
Slides: adapted from courses taught by Andy Pavlo, Carnegie Mellon University, Hector
Garcia-Molina, Stanford, & Shun Yan Cheung, Emory University
Reading
Database Storage
Today’s Agenda
Data Representation
How the system stores the actual binary data for individual attributes
(columns) within the database.
System Catalogs
Internal metadata that the DBMS maintains to describe the data that is
actually stored and how to interpret the bytes within the tuples.
Storage Models
How data is organized and stored within the database system.
Modification of Tuples
Tuple Storage
What are the data items we want to store?
a salary
a name
a date
a picture
⇒ What we have available: Bytes
Data Representation
IEEE-754 Standard (https://en.wikipedia.org/wiki/IEEE_754)
Native C/C++ types are the fundamental data types supported directly by the C/C++ programming languages, without any additional
libraries or custom data types. They are typically used to store and manipulate data efficiently in these languages.
Data Representation: Integers
C/C++ Representation
Most DBMSs store integers using their “native” C/C++ integer types.
These values are fixed length.
Examples: INTEGER/BIGINT/SMALLINT/TINYINT
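As a rough sketch of what "native, fixed-length" can mean in practice, the snippet below maps the four SQL integer types to the fixed-width types from <stdint.h> and copies a value into a tuple's byte buffer. The type-to-width mapping and the helper names store_integer and load_integer are illustrative assumptions; actual widths and byte order depend on the DBMS and platform.

```c
#include <stdint.h>
#include <string.h>

/* Assumed mapping of SQL integer types to C fixed-width types. */
typedef int8_t  sql_tinyint;   /* 1 byte  */
typedef int16_t sql_smallint;  /* 2 bytes */
typedef int32_t sql_integer;   /* 4 bytes */
typedef int64_t sql_bigint;    /* 8 bytes */

/* Hypothetical helper: copy an INTEGER into a tuple buffer at `offset`
 * and return the offset just past it. The value is stored in the
 * host's native byte order. */
size_t store_integer(uint8_t *tuple, size_t offset, sql_integer value) {
    memcpy(tuple + offset, &value, sizeof value);
    return offset + sizeof value;
}

/* Hypothetical helper: read the INTEGER back from the same offset. */
sql_integer load_integer(const uint8_t *tuple, size_t offset) {
    sql_integer value;
    memcpy(&value, tuple + offset, sizeof value);
    return value;
}
```

Because every INTEGER occupies exactly sizeof(sql_integer) bytes, the position of the next attribute can be computed without inspecting the value itself.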
Variable Precision Numbers
Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations.
Rounding Example
#include <stdio.h>
Output
x + y = 0.300000
0.3 = 0.300000
Rounding Example (printing 20 decimal places)
Output
x + y = 0.30000001192092895508
0.3 = 0.29999999999999998890
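A minimal sketch of code that produces values of this form, assuming the operands are x = 0.1f and y = 0.2f stored as 32-bit floats and that 0.3 is printed as a double literal. The function names are hypothetical, and the exact digits assume a printf implementation that prints the full decimal expansion (glibc does).

```c
#include <stdio.h>

/* Format 0.1f + 0.2f with the requested number of decimal places.
 * The operands are assumptions chosen for illustration. */
void format_float_sum(char *buf, size_t len, int decimals) {
    float x = 0.1f;
    float y = 0.2f;
    float sum = x + y;   /* rounded to the nearest 32-bit float */
    snprintf(buf, len, "%.*f", decimals, sum);
}

/* Format the double literal 0.3 the same way, for comparison. */
void format_point_three(char *buf, size_t len, int decimals) {
    snprintf(buf, len, "%.*f", decimals, 0.3);
}
```

With 6 decimal places both values print as 0.300000; the difference only becomes visible once more digits are requested, which is why a default "%f" can hide the rounding error.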
Rounding Example (Java)
public class RoundingError {
    public static void main(String[] args) {
        // Subtract 0.1f ten times; exact arithmetic would leave 0.
        float x = 1.0f;
        for (int i = 0; i < 10; i++) {
            x -= 0.1f;
        }
        System.out.printf("Result w precision 1: %.1f\n", x);
        System.out.printf("Result w precision 10: %.10f\n", x);
    }
}
Output
Result w precision 1: -0.0
Result w precision 10: -0.0000000745
Data Representation: Fixed Point Precision Numbers
PostgreSQL: NUMERIC
PostgreSQL Source Code, numeric.c
Data Representation: Variable Length Data
Large Values
Most DBMSs don’t allow a tuple to exceed the size of a single page.
Handling tuples that exceed the size of a single page in a DBMS is a
common challenge.
DBMSs typically have strategies to deal with this situation to ensure data
integrity and efficient storage.
Two common approaches to handle such cases are:
overflow page
external storage
Large Values: Overflow Page
To store values that are larger than a page, the DBMS uses separate overflow
storage pages and has the tuple contain a reference to them.
The main part of the tuple, which fits within a single page, is stored in the
primary page, while the overflowed part is stored in one or more additional
overflow pages.
Overflow pages are linked to the primary page, forming a chain of pages that
together represent the complete tuple.
When querying the data, the DBMS follows these chains of pages to
reconstruct the complete tuple.
Overflow pages can themselves contain pointers to additional overflow pages
until all the data has been stored.
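The chain-following step can be sketched as below. This is only an illustration under simplifying assumptions: pages live in an in-memory array, a page is identified by its array index, and the OverflowPage layout, PAGE_PAYLOAD size, and read_overflow_chain name are hypothetical rather than taken from any particular DBMS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_PAYLOAD 8      /* tiny payload so the chain is easy to see */
#define NO_PAGE      (-1)   /* marks the end of the chain */

/* Hypothetical overflow page: a chunk of the value plus a link. */
typedef struct {
    uint8_t data[PAGE_PAYLOAD];
    size_t  used;   /* bytes of data[] that are valid */
    int     next;   /* index of the next overflow page, or NO_PAGE */
} OverflowPage;

/* Follow the chain starting at `first` and copy the pieces into `out`.
 * Returns the number of bytes reassembled (at most `out_len`). */
size_t read_overflow_chain(const OverflowPage *pages, int first,
                           uint8_t *out, size_t out_len) {
    size_t total = 0;
    for (int p = first; p != NO_PAGE; p = pages[p].next) {
        size_t n = pages[p].used;
        if (total + n > out_len) {
            n = out_len - total;   /* truncate rather than overflow `out` */
        }
        memcpy(out + total, pages[p].data, n);
        total += n;
        if (total == out_len) {
            break;
        }
    }
    return total;
}
```

In this sketch the tuple in the primary page would store only the index of the first overflow page; every additional page in the chain costs one more page access when the value is read.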
Different DBMSs use different names and size thresholds for this mechanism:
Postgres: TOAST (The Oversized-Attribute Storage Technique) (> 2 KB)
MySQL: Overflow (> 1/2 size of page)
SQL Server: Overflow (> size of page)
External Value Storage
Some systems allow large data values, such as files or binary objects, to be
stored in an external file rather than directly within the database; the
tuple then contains a pointer to that file.
Example:
If the database is storing photo information, the DBMS can store the
photos in external files rather than having them take up large amounts
of space in the DBMS.
The Unix epoch is a reference point in time: January 1, 1970, at 00:00:00 UTC (Coordinated Universal Time).
Bootstrapping is the process of initializing a DBMS’s catalog during system setup or database creation. During this phase, the
DBMS uses low-level access methods or internal mechanisms to create the catalog tables and populate them with initial data.
System Catalog
You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get
info about the database.
It is an ANSI-standard set of read-only views that provide info about all of
the tables, views, columns, and procedures in a database.
DBMSs also have non-standard shortcuts to retrieve this information.
Accessing Table Schema
-- SQL-92
SELECT *
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_catalog = '<db name>';
\d           -- Postgres
SHOW TABLES; -- MySQL
.tables      -- SQLite
Accessing Table Schema
-- SQL-92
SELECT *
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_name = 'student';
\d student        -- Postgres
DESCRIBE student; -- MySQL
.schema student   -- SQLite
Today’s Agenda
Data Representation
System Catalogs
Storage Models
Ways to store tuples in pages
Observation
The relational model does not specify that we have to store all of a
tuple’s attributes together in a single page.
This may not actually be the best layout for some workloads.
There are many different workloads for database systems.
By workload, we mean the general nature of the requests a system will have
to handle: the types of queries, transactions, and operations it is expected
to process.
Different workloads have different requirements for data storage and
access patterns.
Wikipedia Example
OLTP
On-line Transaction Processing:
Simple queries that read/update a small amount of data that is related to a
single entity in the database.
This is usually the kind of application that people build first.

SELECT P.*, R.*
  FROM pages AS P
 INNER JOIN revisions AS R
    ON P.latest = R.revID
 WHERE P.pageID = ?

UPDATE useracct
   SET lastLogin = NOW(),
       hostname = ?
 WHERE userID = ?

INSERT INTO revisions
VALUES (?,?...,?)
OLTP: On-line Transaction Processing
OLAP
OLAP: On-line Analytical Processing
HTAP: Hybrid Transaction + Analytical Processing
Data Storage Model
N-ary Storage Model (NSM)
The DBMS stores all attributes for a single tuple contiguously in a single
page.
Ideal for OLTP workloads where requests are insert-heavy and
transactions tend to operate on only an individual entity, because
it takes only one fetch to get all of the attributes for a single
tuple.
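A minimal sketch of a row-oriented page, assuming fixed-length tuples and ignoring page headers and slot arrays. The UserTuple schema, TUPLES_PER_PAGE, and the row_page_* helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-length tuple: all attributes sit side by side. */
typedef struct {
    int32_t user_id;
    char    user_name[16];
    int32_t last_login;
} UserTuple;

#define TUPLES_PER_PAGE 4

/* A row-store page is simply an array of whole tuples. */
typedef struct {
    UserTuple tuples[TUPLES_PER_PAGE];
    int       count;
} RowPage;

/* Append a tuple; returns its slot number, or -1 if the page is full. */
int row_page_insert(RowPage *page, const UserTuple *t) {
    if (page->count == TUPLES_PER_PAGE) return -1;
    page->tuples[page->count] = *t;
    return page->count++;
}

/* One page access yields every attribute of the requested tuple. */
const UserTuple *row_page_get(const RowPage *page, int slot) {
    if (slot < 0 || slot >= page->count) return NULL;
    return &page->tuples[slot];
}
```

An insert writes one contiguous region of one page, and a lookup by slot returns the whole tuple at once, which is the access pattern OLTP queries tend to have.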
N-ary Storage Model (NSM)
Advantages
Fast inserts, updates, and deletes.
Good for queries that need the entire tuple.
Disadvantages
Not good for scanning large portions of the table and/or a subset of the
attributes, because it pollutes the buffer pool by fetching data that is not
needed for processing the query.
Decomposition Storage Model (DSM)
The DBMS stores the values of a single attribute (column) for all tuples
contiguously in a block of data.
Vertically partition a database into a collection of individual columns that
are stored separately
Also known as a “column store”.
Ideal for OLAP workloads where read-only queries perform large scans
over a subset of the table’s attributes.
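A minimal sketch of the column-oriented layout, assuming fixed-length integer attributes held in memory. The UserColumns schema and the sum_page_views function are hypothetical and only illustrate the access pattern.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TUPLES 4

/* Hypothetical column-store table: one array per attribute. Index i in
 * every array belongs to the same logical tuple. */
typedef struct {
    int32_t user_id[NUM_TUPLES];
    int32_t last_login[NUM_TUPLES];
    int32_t page_views[NUM_TUPLES];
} UserColumns;

/* Scan a single column: only page_views is read; the other attributes
 * are never touched. */
int64_t sum_page_views(const UserColumns *table, size_t count) {
    int64_t total = 0;
    for (size_t i = 0; i < count && i < NUM_TUPLES; i++) {
        total += table->page_views[i];
    }
    return total;
}
```

An aggregate over one attribute walks one contiguous array, so the bytes brought into memory are exactly the bytes the query needs.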
Decomposition Storage Model: Tuple Identification
To put the tuples back together when we are using a column store, we can
use:
Choice #1: Fixed-length Offsets (most commonly used approach)
Choice #2: Embedded Tuple Ids (less common approach)
Choice #1: Fixed-length Offsets
Each column of the table is stored separately, and for each column, a
fixed-length offset is used to locate the position of each tuple within that
column.
Assuming the attributes are all fixed-length, the DBMS can compute the
offset of the attribute for each tuple.
When the system wants the attribute for a specific tuple, it knows how to
jump to that spot in the file from the offset.
To accommodate the variable-length fields, the system can either pad
fields so that they are all the same length or use a dictionary that takes a
fixed-size integer and maps the integer to the value.
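Both ideas can be sketched in a few lines: the offset is just the tuple's position multiplied by the value width, and a dictionary turns variable-length strings into fixed-size codes. The names column_offset, read_int32_column, Dictionary, dict_encode, and dict_decode are hypothetical, and the linear-search dictionary is a simplification chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of tuple `tuple_idx` inside a column whose values are
 * all `value_width` bytes wide. */
size_t column_offset(size_t tuple_idx, size_t value_width) {
    return tuple_idx * value_width;
}

/* Read the 4-byte value for one tuple out of a raw column buffer. */
int32_t read_int32_column(const uint8_t *column, size_t tuple_idx) {
    int32_t value;
    memcpy(&value, column + column_offset(tuple_idx, sizeof value),
           sizeof value);
    return value;
}

/* Dictionary encoding for a variable-length attribute: the column
 * stores fixed-size codes, and the dictionary maps code -> string. */
#define DICT_CAPACITY 8

typedef struct {
    const char *values[DICT_CAPACITY];
    uint32_t    size;
} Dictionary;

/* Return the code for `s`, adding it to the dictionary if it is new.
 * Returns UINT32_MAX when the dictionary is full. The caller keeps
 * ownership of the string, which must outlive the dictionary. */
uint32_t dict_encode(Dictionary *d, const char *s) {
    for (uint32_t code = 0; code < d->size; code++) {
        if (strcmp(d->values[code], s) == 0) return code;
    }
    if (d->size == DICT_CAPACITY) return UINT32_MAX;
    d->values[d->size] = s;
    return d->size++;
}

/* Map a fixed-size code back to the original value. */
const char *dict_decode(const Dictionary *d, uint32_t code) {
    return code < d->size ? d->values[code] : NULL;
}
```

Once a string column is replaced by its 4-byte codes, the same offset arithmetic used for integer columns applies to it as well.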
Choice #2: Embedded Tuple Ids
Decomposition Storage Model (DSM)
Advantages
Reduces the amount of wasted I/O because the DBMS only reads the data
that it needs.
Enables better query processing and data compression because all of the
values for the same attribute are stored contiguously.
Disadvantages
Slow for point queries, inserts, updates, and deletes because of tuple
splitting/stitching.
Modification of Tuples
1) Insertion
2) Deletion
Options
Trade-offs
How expensive is immediate reclaim?
How expensive is it to move valid tuples into the freed space for immediate reclaim?
How much space is wasted (e.g., deleted tuples, deleted fields, ...)?
Concern with deletions
Example
Record Y can be referenced by other tuples (e.g., tuples X1 & X2)
When tuple Y is deleted, the references stored in X1 and X2 no longer point to a valid tuple.
Techniques to handle tuple deletion
Using logical addresses makes deletion easy to handle.
Before deleting tuple Y: tuples X1 and X2 reference Y through its logical address in the map table.
After deleting tuple Y:
The logical address used by tuple Y must remain in the map table.
Furthermore, the logical address used by tuple Y cannot be re-used.
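A minimal sketch of such a map table, assuming that a logical address is an index into an array and that a deleted entry is marked with a sentinel physical address. The MapTable layout, the NO_ADDRESS sentinel, and the map_* helpers are hypothetical.

```c
#include <stddef.h>

#define MAP_CAPACITY 8
#define NO_ADDRESS   (-1)

/* Hypothetical map table: logical address = index into physical[]. */
typedef struct {
    long physical[MAP_CAPACITY];  /* physical location, or NO_ADDRESS once deleted */
    int  next_free;               /* logical addresses are handed out once, in order */
} MapTable;

/* Assign a fresh logical address to a tuple stored at `physical`.
 * Returns NO_ADDRESS if the table is full. */
int map_insert(MapTable *m, long physical) {
    if (m->next_free == MAP_CAPACITY) return NO_ADDRESS;
    m->physical[m->next_free] = physical;
    return m->next_free++;
}

/* Delete: keep the entry in the table, but make it resolve to nothing. */
void map_delete(MapTable *m, int logical) {
    if (logical >= 0 && logical < m->next_free) {
        m->physical[logical] = NO_ADDRESS;
    }
}

/* Resolve a logical address; NO_ADDRESS means the tuple was deleted
 * (or the address was never assigned). */
long map_lookup(const MapTable *m, int logical) {
    if (logical < 0 || logical >= m->next_free) return NO_ADDRESS;
    return m->physical[logical];
}
```

Because map_insert never hands out an index a second time, a stale reference held by X1 or X2 can only ever resolve to NO_ADDRESS, never to an unrelated tuple.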
Tombstones
A tombstone is a marker left in the slot of a deleted tuple to record that the tuple no longer exists.
Example: tuple Y before and after deletion (figures omitted).
When you insert a new tuple, you cannot reuse the space of a tombstone
(the tombstone must be preserved), because existing references to the
deleted tuple would then refer to the newly inserted tuple.
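The rule can be sketched with a small page of fixed-size slots, assuming each slot carries a state flag. The SlottedPage layout, the SlotState values, and the page_* helpers are hypothetical, not a description of any specific DBMS.

```c
#include <stdint.h>

#define SLOTS_PER_PAGE 4

typedef enum { SLOT_FREE, SLOT_LIVE, SLOT_TOMBSTONE } SlotState;

typedef struct {
    SlotState state[SLOTS_PER_PAGE];
    int32_t   value[SLOTS_PER_PAGE];
} SlottedPage;

/* Deleting leaves a tombstone so that old references can detect it. */
void page_delete(SlottedPage *p, int slot) {
    if (slot >= 0 && slot < SLOTS_PER_PAGE && p->state[slot] == SLOT_LIVE) {
        p->state[slot] = SLOT_TOMBSTONE;
    }
}

/* Inserting only considers never-used slots; tombstones are skipped.
 * Returns the slot used, or -1 if no free slot exists. */
int page_insert(SlottedPage *p, int32_t value) {
    for (int slot = 0; slot < SLOTS_PER_PAGE; slot++) {
        if (p->state[slot] == SLOT_FREE) {
            p->state[slot] = SLOT_LIVE;
            p->value[slot] = value;
            return slot;
        }
    }
    return -1;
}

/* Following a reference: returns 1 and fills *out for a live tuple,
 * 0 if the slot is free or holds a tombstone. */
int page_read(const SlottedPage *p, int slot, int32_t *out) {
    if (slot < 0 || slot >= SLOTS_PER_PAGE || p->state[slot] != SLOT_LIVE) {
        return 0;
    }
    *out = p->value[slot];
    return 1;
}
```

The cost of this design is visible in page_insert: a tombstoned slot stays unavailable, so space is traded for the guarantee that stale references never resolve to the wrong tuple.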
Update
Conclusion
The storage manager is not entirely independent from the rest of the
DBMS.
A DBMS encodes and decodes the tuple’s bytes into a set of attributes
based on its schema.
It is important to choose the right storage model for the target workload:
OLTP = Row Store
OLAP = Column Store
Database Storage: Next