
Notes 03 - Database Storage - II

The document discusses advanced database storage concepts, focusing on how DBMS represents databases in files, manages memory, and organizes data. It covers data representation for various types, including integers, floating-point numbers, and variable-length data, as well as the importance of system catalogs for metadata management. Additionally, it highlights different storage models and their implications for OLTP and OLAP workloads.

CS525: Advanced Database Organization

Notes 3: Database Storage


Part II

Yousef M. Elmehdwi

Department of Computer Science

Illinois Institute of Technology



September 6th 2023

Slides: adapted from courses taught by Andy Pavlo, Carnegie Mellon University, Hector
Garcia-Molina, Stanford, & Shun Yan Cheung, Emory University

1 / 74
Reading

Database Systems: The Complete Book, 2nd Edition,


Chapter 2: Data Storage
Database System Concepts (6th/7th Edition)
Chapter 10 (6th)/ Chapter 13 (7th)

2 / 74
Database Storage

Problem #1: How the DBMS represents the database in files on disk,
i.e., how to lay out data on disk.
Problem #2: How the DBMS manages its memory and moves data
back and forth from disk.

3 / 74
Today’s Agenda

Data Representation
How the system stores the actual binary data for individual attributes
(columns) within the database.
System Catalogs
Internal metadata is maintained by the database to understand both the
data that is actually stored and how to interpret the bytes within the tuples
Storage Models
How data is organized and stored within the database system.
Modification of Tuples

4 / 74
Tuple Storage

A tuple is essentially a sequence of bytes (byte arrays).


It is up to the DBMS to know how to interpret those bytes to derive the
values for attributes.
The DBMS’s catalogs contain the schema information about tables that
the system uses to figure out the tuple’s layout.
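
The idea that a tuple is only bytes until the schema gives them meaning can be sketched in C. This is a minimal illustration, not any particular DBMS's layout: the `Column` struct, the `ColType` tags, and the helper functions are hypothetical names, and it assumes fixed-length attributes stored in native byte order.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical catalog entry: one fixed-length attribute of a table. */
typedef enum { COL_INT32, COL_FLOAT64 } ColType;

typedef struct {
    const char *name;   /* attribute name */
    ColType     type;   /* how to interpret the bytes */
    size_t      offset; /* byte offset of the attribute inside the tuple */
} Column;

/* Copy bytes out with memcpy so unaligned tuple data is read safely. */
int32_t read_int32(const uint8_t *tuple, const Column *col) {
    int32_t v;
    memcpy(&v, tuple + col->offset, sizeof v);
    return v;
}

double read_float64(const uint8_t *tuple, const Column *col) {
    double v;
    memcpy(&v, tuple + col->offset, sizeof v);
    return v;
}

/* Build a tuple for the assumed schema (id INT32, salary FLOAT64). */
void write_employee(uint8_t *tuple, int32_t id, double salary) {
    memcpy(tuple, &id, sizeof id);
    memcpy(tuple + sizeof id, &salary, sizeof salary);
}
```

Without the two `Column` entries (offset 0 for `id`, offset 4 for `salary`), the 12-byte buffer is uninterpretable, which is why the catalog has to be consulted before any attribute can be read.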

5 / 74
What are the data items we want to store?

a salary
a name
a date
a picture
⇒ What we have available: Bytes

6 / 74
Data Representation

How does a DBMS store the bytes for a value?
How does the DBMS store the binary data for different types of values or
attributes?
There are five high-level data types that can be stored in tuples:
integers,
variable precision numbers,
fixed point precision numbers,
variable length values, and
dates/times.

7 / 74
IEEE-754 Standard [1]

This standard defines the binary representation of floating-point numbers
(such as float and double) and the arithmetic operations on them.
It ensures that floating-point numbers are stored and manipulated
consistently across computer architectures and programming languages.
It specifies the format for representing real numbers: a sign bit, an
exponent, and a mantissa (the fractional part).
Example: the 32-bit (single-precision) format has 1 sign bit, 8 exponent
bits, and 23 mantissa bits.

[1] https://en.wikipedia.org/wiki/IEEE_754
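
The three fields of the 32-bit format can be pulled apart with a few shifts and masks. The sketch below assumes that the platform's `float` is an IEEE-754 single-precision value (true on mainstream hardware, but not guaranteed by the C standard); `Float32Fields` and `decode_float32` are names made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE-754 single-precision (32-bit) value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 bits, fraction without the implicit leading 1 */
} Float32Fields;

Float32Fields decode_float32(float f) {
    uint32_t bits;
    /* Reinterpret the bytes via memcpy to avoid aliasing problems. */
    memcpy(&bits, &f, sizeof bits);

    Float32Fields out;
    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For example, 1.0f decodes to sign 0, biased exponent 127 (that is, 2^0), and mantissa 0.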
8 / 74
Data Representation

How a DBMS stores the bytes for a value


INTEGER/BIGINT/SMALLINT/TINYINT
C/C++ Representation
All integers are stored in their “native” C/C++ types [1] within the database.
FLOAT/REAL vs. NUMERIC/DECIMAL
IEEE-754 Standard/Fixed-point Decimals
VARCHAR/VARBINARY/TEXT/BLOB
Header with length, followed by data bytes.
TIME/DATE/TIMESTAMP
32/64-bit integer of (micro)seconds since Unix epoch

[1] Native types are the fundamental data types that C/C++ support directly, without additional libraries or custom data types. They are typically used to store and manipulate data efficiently in these languages.
9 / 74
Data Representation: Integers

C/C++ Representation
Most DBMSs store integers using their “native” C/C++ types, typically in
two's-complement form. (The IEEE-754 standard covers floating-point
numbers, not integers.)
These values are fixed length.
Examples: INTEGER/BIGINT/SMALLINT/TINYINT

10 / 74
Variable Precision Numbers

These are inexact [1], variable-precision numeric types that use the
“native” C/C++ types.
They are stored directly as specified by the IEEE-754 standard.
These values are also fixed length.
Example: FLOAT, REAL/DOUBLE
They are typically faster than arbitrary precision numbers because the CPU
can execute instructions on them directly.
However, computations can have rounding errors [2] because some numbers
cannot be represented precisely in binary floating-point format.
As a result, calculations may yield slightly inaccurate results.
To avoid this issue, we use Fixed-Point Precision Numbers.

[1] Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that
storing and retrieving a value might show slight discrepancies.

[2] Rounding error is the difference between the result produced by a given algorithm using exact arithmetic and the result produced
by the same algorithm using finite-precision, rounded arithmetic.


11 / 74
Variable Precision Numbers

Rounding Example
#include <stdio.h>

int main(int argc, char *argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x + y = %f\n", x + y);
    printf("0.3 = %f\n", 0.3);
}

Output
x + y = 0.300000
0.3 = 0.300000

12 / 74
Variable Precision Numbers

Rounding Example
#include <stdio.h>

int main(int argc, char *argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x + y = %.20f\n", x + y);
    printf("0.3 = %.20f\n", 0.3);
}

Output
x + y = 0.30000001192092895508
0.3 = 0.29999999999999998890

13 / 74
Variable Precision Numbers

Rounding Example
public class RoundingError {
    public static void main(String[] args) {
        float x = 1.0f;
        for (int i = 0; i < 10; i++) {
            x -= 0.1f;
        }
        System.out.printf("Result w precision 1: %.1f\n", x);
        System.out.printf("Result w precision 10: %.10f\n", x);
    }
}

Output
Result w precision 1: -0.0
Result w precision 10: -0.0000000745

14 / 74
Data Representation: Fixed Point Precision Numbers

Numeric data types with arbitrary precision and scale.
Used when rounding errors are unacceptable.
Example: NUMERIC, DECIMAL
Typically stored in an exact, variable-length binary representation with
additional meta-data that specifies the length of the data and the position
of the decimal point.
Like a VARCHAR, but not stored as a string.
The DBMS pays a performance penalty to get this accuracy:
calculations on fixed-point precision numbers may be slower than
operations on native numeric types like FLOAT or DOUBLE, which use
hardware-based floating-point arithmetic.
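
To see why an exact representation avoids the rounding problem, here is a deliberately simplified sketch. It is not the variable-length layout described above: it assumes a value fits in a single 64-bit integer plus a scale (the number of digits after the decimal point) and ignores overflow. The `Decimal` type and its functions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical fixed-point value: unscaled * 10^(-scale). */
typedef struct {
    int64_t unscaled; /* all digits as one integer */
    int     scale;    /* number of digits after the decimal point */
} Decimal;

/* Multiply by ten until the value has the requested (larger) scale. */
Decimal decimal_rescale(Decimal d, int scale) {
    while (d.scale < scale) {
        d.unscaled *= 10;
        d.scale++;
    }
    return d;
}

/* Exact addition: align the scales, then add the integers. */
Decimal decimal_add(Decimal a, Decimal b) {
    int scale = a.scale > b.scale ? a.scale : b.scale;
    a = decimal_rescale(a, scale);
    b = decimal_rescale(b, scale);
    Decimal sum = { a.unscaled + b.unscaled, scale };
    return sum;
}

/* Equality after aligning scales, so 0.3 equals 0.30. */
int decimal_equal(Decimal a, Decimal b) {
    int scale = a.scale > b.scale ? a.scale : b.scale;
    return decimal_rescale(a, scale).unscaled ==
           decimal_rescale(b, scale).unscaled;
}
```

With this representation, 0.1 is `{1, 1}` and 0.2 is `{2, 1}`, and their sum is exactly `{3, 1}`. The cost is that every operation runs in software (here a loop and integer arithmetic) instead of a single floating-point instruction.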

15 / 74
PostgreSQL: NUMERIC

16 / 74
PostgreSQL Source Code, numeric.c
17 / 74
Data Representation: Variable Length Data

These represent data types of arbitrary length.


An array of bytes of arbitrary length.
Has a header that keeps track of the length of the string to make it easy
to jump to the next value. It may also contain a checksum for the data.
Example: VARCHAR, VARBINARY, TEXT, BLOB.
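
A length header followed by the data bytes can be sketched as below. The 4-byte native-endian header and the `varlen_*` function names are assumptions for illustration; real systems differ in header width and may add a checksum, which this sketch omits.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Write a 4-byte length header followed by the data bytes.
   Returns the number of bytes written to dst. */
size_t varlen_write(uint8_t *dst, const void *data, uint32_t len) {
    memcpy(dst, &len, sizeof len);
    memcpy(dst + sizeof len, data, len);
    return sizeof len + len;
}

/* Read the length header of the value stored at src. */
uint32_t varlen_length(const uint8_t *src) {
    uint32_t len;
    memcpy(&len, src, sizeof len);
    return len;
}

/* Pointer to the data bytes of the value stored at src. */
const uint8_t *varlen_data(const uint8_t *src) {
    return src + sizeof(uint32_t);
}

/* Use the header to jump over this value and reach the next one. */
const uint8_t *varlen_next(const uint8_t *src) {
    return src + sizeof(uint32_t) + varlen_length(src);
}
```

Because the length is stored up front, `varlen_next` can skip a value without scanning its contents for a terminator.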

18 / 74
Large Values

Most DBMSs don’t allow a tuple to exceed the size of a single page.
Handling tuples that exceed the size of a single page in a DBMS is a
common challenge.
DBMSs typically have strategies to deal with this situation to ensure data
integrity and efficient storage.
Two common approaches to handle such cases are:
overflow page
external storage

19 / 74
Large Values: Overflow Page
To store values that are larger than a page, the DBMS uses separate overflow
storage pages and has the tuple contain a reference to that page.
The main part of the tuple, which fits within a single page, is stored in the
primary page, while the overflowed part is stored in one or more additional
overflow pages.
Overflow pages are linked to the primary page, forming a chain of pages that
together represent the complete tuple.
When querying the data, the DBMS follows these chains of pages to
reconstruct the complete tuple.
These overflow pages can contain pointers to additional overflow pages
until all the data can be stored.
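
The chain of overflow pages can be sketched with in-memory structs standing in for disk pages. Everything here is a hypothetical simplification: the `OverflowPage` layout, the tiny 8-byte payload, and the use of C pointers where a real DBMS would store page IDs.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define PAGE_PAYLOAD 8  /* tiny payload, chosen only to keep the example small */

/* Hypothetical overflow page: a payload chunk plus a link to the next page. */
typedef struct OverflowPage {
    uint8_t data[PAGE_PAYLOAD];
    size_t  used;               /* bytes of data[] in use */
    struct OverflowPage *next;  /* NULL marks the end of the chain */
} OverflowPage;

/* Split value into chunks across the caller-supplied pages and link them.
   Returns the number of pages used, or 0 if there are not enough pages. */
size_t overflow_store(OverflowPage *pages, size_t npages,
                      const uint8_t *value, size_t len) {
    size_t used_pages = 0;
    while (len > 0) {
        if (used_pages == npages) return 0;
        OverflowPage *p = &pages[used_pages];
        p->used = len < PAGE_PAYLOAD ? len : PAGE_PAYLOAD;
        memcpy(p->data, value, p->used);
        p->next = NULL;
        if (used_pages > 0) pages[used_pages - 1].next = p;
        value += p->used;
        len   -= p->used;
        used_pages++;
    }
    return used_pages;
}

/* Follow the chain from head and copy the chunks back into out.
   Returns the number of bytes copied (stops early if out is too small). */
size_t overflow_load(const OverflowPage *head, uint8_t *out, size_t cap) {
    size_t total = 0;
    for (const OverflowPage *p = head; p != NULL; p = p->next) {
        if (total + p->used > cap) break;
        memcpy(out + total, p->data, p->used);
        total += p->used;
    }
    return total;
}
```

Reading the value back costs one page access per link in the chain, which is one reason systems set a size threshold before moving a value out of the primary page.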

20 / 74
Large Values: Overflow Page
Different DBMSs use different names and size thresholds for this mechanism:
Postgres: TOAST (The Oversized-Attribute Storage Technique) (> 2KB)
MySQL: Overflow (> 1/2 size of page)
SQL Server: Overflow (> size of page)

21 / 74
External Value Storage
Some systems allow storing large data values, such as files or binary
objects, in an external file rather than directly within the database;
the tuple then contains a pointer to that file.
Example:
If the database stores photo information, the DBMS can keep the
photos in external files rather than having them take up large amounts
of space in the DBMS.

The external value is treated as a BLOB type:
Oracle: BFILE data type
Contains a locator pointing to a large
binary file stored outside the database.
Microsoft: FILESTREAM data type
The DBMS cannot manipulate the contents of an external file.

Reading: a paper that explains the trade-offs between these two options:
To BLOB or Not To BLOB: Large Object Storage in a Database or a
Filesystem
22 / 74
Data Representation: Dates and Times

Representations vary widely across database systems.
However, a common approach is to represent dates and times as the
number of (micro/milli)seconds since the Unix epoch [1].
Example: TIME, DATE, TIMESTAMP.

[1] The Unix epoch is a reference point in time: January 1, 1970, at 00:00:00 UTC (Coordinated Universal Time). It
is widely used as a starting point for measuring time intervals.
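
As an illustration of the epoch-based representation, the sketch below formats a timestamp stored as a 64-bit count of microseconds. It assumes a non-negative value and a platform where `time_t` counts seconds since the Unix epoch (as on POSIX systems); `format_timestamp` is a hypothetical helper, not a DBMS API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a TIMESTAMP stored as microseconds since the Unix epoch (UTC).
   Assumes ts_micros >= 0. Returns 0 on success, -1 on failure. */
int format_timestamp(int64_t ts_micros, char *out, size_t cap) {
    time_t secs   = (time_t)(ts_micros / 1000000);
    int    micros = (int)(ts_micros % 1000000);

    /* Break the whole seconds down into calendar fields in UTC. */
    struct tm *utc = gmtime(&secs);
    if (utc == NULL) return -1;

    char date[32];
    if (strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", utc) == 0) return -1;

    /* Append the sub-second part as six zero-padded digits. */
    int n = snprintf(out, cap, "%s.%06d", date, micros);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}
```

The stored value stays a fixed-length integer, so comparisons and range scans on timestamps are plain integer comparisons; calendar formatting happens only at the edges.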


23 / 74
System Catalog

To decipher the contents of tuples, the DBMS maintains an internal catalog
that holds meta-data about the databases.
The meta-data records which tables and columns the databases have,
along with their types and the orderings of the values:
Tables, columns, indexes, views
Users, permissions
Internal statistics
Almost every DBMS stores a database’s catalog inside the database itself,
in the same format that it uses for its tables.
They use special code to bootstrap [1] these catalog tables (wrapping low-level
access methods to access the catalog).

[1] Bootstrapping is the process of initializing a DBMS’s catalog during system setup or database creation. During this phase, the
DBMS uses low-level access methods or internal mechanisms to create the catalog tables and populate them with initial data.
24 / 74
System Catalog

You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get
info about the database.
It is an ANSI standard set of read-only views that provide info about all of the
tables, views, columns, and procedures in a database.
DBMSs also have non-standard shortcuts to retrieve this information.

25 / 74
Accessing Table Schema

List all of the tables in the current database:

-- SQL-92
SELECT *
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_catalog = '<db name>';

\d              -- Postgres
SHOW TABLES;    -- MySQL
.tables         -- SQLite

26 / 74
Accessing Table Schema

List all of the columns in the student table:

-- SQL-92
SELECT *
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE table_name = 'student';

\d student          -- Postgres
DESCRIBE student;   -- MySQL
.schema student     -- SQLite

27 / 74
Today’s Agenda

Data Representation
System Catalogs
Storage Models
Ways to store tuples in pages

28 / 74
Observation

The relational model does not specify that we have to store all of a
tuple’s attributes together in a single page.
This may not actually be the best layout for some workloads
There are many different workloads for database systems.
By workload [1], we mean the general nature of the requests a system
will have to handle.
Different workloads have different requirements for data storage and
access patterns.

[1] A workload is the set of query, transaction, and operation types the system is expected to handle.
29 / 74
Wikipedia Example

30 / 74
OLTP

On-line Transaction Processing:
Simple queries that read/update a small amount of data that is
related to a single entity in the database.
This is usually the kind of application that people build first.

SELECT P.*, R.*
  FROM pages AS P
 INNER JOIN revisions AS R
    ON P.latest = R.revID
 WHERE P.pageID = ?

UPDATE useracct
   SET lastLogin = NOW(),
       hostname = ?
 WHERE userID = ?

INSERT INTO revisions
VALUES (?, ?, ..., ?)
31 / 74
OLTP: On-line Transaction Processing

Fast, short running operations


Simple queries that operate on a single entity at a time
Typically handle more writes than reads
Repetitive operations
Usually the kind of application that people build first
Example
User invocations of Amazon (Amazon storefront).
Users can add things to their cart,
they can make purchases,
but the actions only affect their accounts.

32 / 74
OLAP

On-line Analytical Processing:
Complex queries that read large portions of the database
spanning multiple entities.
You execute these workloads on the data you have collected from
your OLTP application(s).

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)

33 / 74
OLAP: On-line Analytical Processing

Long running, more complex queries


Reads large portions of the database
Analyzing and deriving new data from existing data collected on the
OLTP side
Example
Amazon computing the five most bought items over a one month period
for these geographical locations.

34 / 74
HTAP: Hybrid Transaction + Analytical Processing

HTAP is a newer type of workload that has recently become popular. It
combines OLTP and OLAP on the same database.

Watch: HTAP Databases: What is New and What is Next -
SIGMOD22-HTAP-Tutorial - June 2022

35 / 74
Data Storage Model

There are different ways to store tuples in pages.


The DBMS can store tuples in different ways that are better for either
OLTP or OLAP workloads
We have been assuming the n-ary storage model (aka “row storage”)
so far this semester.

36 / 74
N-ary Storage Model (NSM)

The DBMS stores all attributes for a single tuple contiguously in a single
page.
Ideal for OLTP workloads where requests are insert-heavy and
transactions tend to operate on only an individual entity:
it takes only one fetch to get all of the attributes for a single
tuple.

37 / 74
N-ary Storage Model (NSM)

SELECT * FROM useracct
 WHERE userName = ? AND userPass = ?

INSERT INTO useracct
VALUES (?, ?, ..., ?)

41 / 74
N-ary Storage Model (NSM)

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)

44 / 74
N-ary Storage Model (NSM)

Advantages
Fast inserts, updates, and deletes.
Good for queries that need the entire tuple.
Disadvantages
Not good for scanning large portions of the table and/or a subset of the
attributes, because the scan pollutes the buffer pool by fetching data
that is not needed for processing the query.

49 / 74
Decomposition Storage Model (DSM)

The DBMS stores the values of a single attribute (column) for all tuples
contiguously in a block of data.
Vertically partition a database into a collection of individual columns that
are stored separately
Also known as a “column store”.
Ideal for OLAP workloads where read-only queries perform large scans
over a subset of the table’s attributes.
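
The difference between the two layouts maps onto the familiar array-of-structs versus struct-of-arrays distinction. The sketch below is only an in-memory analogy under assumed names (`UserRow`, `UserColumns`, and a fixed `NUM_USERS`); real systems apply the same idea to pages on disk.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_USERS 4

/* NSM (row store): all attributes of one tuple sit next to each other. */
typedef struct {
    int32_t user_id;
    int32_t last_login_month;  /* 1..12 */
    char    hostname[24];
} UserRow;

/* DSM (column store): one array per attribute, all tuples side by side. */
typedef struct {
    int32_t user_id[NUM_USERS];
    int32_t last_login_month[NUM_USERS];
    char    hostname[NUM_USERS][24];
} UserColumns;

/* Row-store scan: walks over whole rows just to read one field of each. */
size_t count_logins_rows(const UserRow *rows, size_t n, int32_t month) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].last_login_month == month) count++;
    return count;
}

/* Column-store scan: reads only the last_login_month array. */
size_t count_logins_columns(const UserColumns *cols, size_t n, int32_t month) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (cols->last_login_month[i] == month) count++;
    return count;
}
```

Both functions return the same answer, but the column version touches only 4 bytes per tuple while the row version strides over every attribute, including the 24-byte hostname it never uses.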

50 / 74
Decomposition Storage Model (DSM)

The DBMS stores the values of a single attribute for all tuples
contiguously in a page.
Also known as a “column store”.

51 / 74
Decomposition Storage Model (DSM)

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
  FROM useracct AS U
 WHERE U.hostname LIKE '%.gov'
 GROUP BY EXTRACT(month FROM U.lastLogin)

53 / 74
Decomposition Storage Model (DSM)

If a value matches on one column's page, how do we find the values of the
same tuple on another column's page?

54 / 74
Decomposition Storage Model: Tuple Identification

To put the tuples back together when we are using a column store, we can
use:
Choice #1: Fixed-length Offsets (most commonly used approach)
Choice #2: Embedded Tuple Ids (less common approach)

When decomposing a relational database into a column store model, it’s


essential to have a mechanism to identify and reconstruct the original
tuples when needed.

55 / 74
Choice #1: Fixed-length Offsets

Each column of the table is stored separately, and for each column, a
fixed-length offset is used to locate the position of each tuple within that
column.
Assuming the attributes are all fixed-length, the DBMS can compute the
offset of the attribute for each tuple.
When the system wants the attribute for a specific tuple, it knows how to
jump to that spot in the file from the offset.
To accommodate the variable-length fields, the system can either pad
fields so that they are all the same length or use a dictionary that takes a
fixed-size integer and maps the integer to the value.
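
The offset arithmetic and the dictionary option can be sketched as follows. The function names are hypothetical, columns are modeled as raw byte arrays of 4-byte values, and the dictionary is a plain array indexed by code; no bounds checking is done.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* With fixed-length values, tuple i's value starts at byte i * width. */
size_t column_offset(size_t tuple_index, size_t width) {
    return tuple_index * width;
}

/* Read the int32 value of tuple i from a column stored as raw bytes. */
int32_t column_get_int32(const uint8_t *column, size_t i) {
    int32_t v;
    memcpy(&v, column + column_offset(i, sizeof v), sizeof v);
    return v;
}

/* Dictionary encoding for a variable-length attribute: the column stores
   fixed-size codes, and the dictionary maps each code to its string. */
const char *column_get_string(const uint8_t *codes, size_t i,
                              const char *const *dictionary) {
    int32_t code = column_get_int32(codes, i);
    return dictionary[code];
}
```

Because every column uses the same tuple index, the value at index i in one column and the value at index i in another belong to the same tuple, with no per-value identifier stored.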

56 / 74
Choice #2: Embedded Tuple Ids

A less common approach


Each column is still stored separately, but instead of using fixed-length
offsets, each column contains embedded tuple IDs or pointers that
indicate the position or identity of the tuple to which each value belongs.
These embedded IDs link the values across columns, allowing the DBMS
to reconstruct tuples by following the tuple IDs across columns.
Note that this method has a large storage overhead because it needs to
store a tuple id for every attribute entry.

57 / 74
Decomposition Storage Model (DSM)

Advantages
Reduces the amount of wasted I/O because the DBMS only reads the data
that it needs.
Enables better query processing and data compression
because all of the values for the same attribute are stored contiguously.
Disadvantages
Slow for point queries, inserts, updates, and deletes because of tuple
splitting/stitching.

58 / 74
Modification of Tuples

How do we handle the following operations at the tuple level?
1) Insertion
2) Deletion
3) Update

59 / 74
1) Insertion

Easy case: tuples are fixed length and not in sequence (unordered)
Insert the new tuple at the end of the file
or in a deleted slot
A little harder: tuples are variable size
May not be able to reuse space (fragmentation)
Difficult case: tuples in sequence (ordered)
Find the position and slide the following tuples
If tuples are sequenced by linking, insert overflow blocks

60 / 74
2) Deletion

61 / 74
Options

(a) Delete and immediately reclaim space by shifting other tuples or
removing overflows
(b) Mark deleted and list as free for re-use
May need a chain of deleted tuples (for re-use)
Need a way to mark deleted tuples

Trade-offs
How expensive is immediate reclaim?
How expensive is it to move a valid tuple to free space for immediate reclaim?
How much space is wasted?
e.g., deleted tuples, deleted fields, ...

62 / 74
Concern with deletions

A caveat when using physical addresses to reference a block/record

Example
Record Y can be referenced by other tuples (e.g., tuples X1 & X2)

63 / 74
Concern with deletions

A caveat when using physical addresses to reference a block/record

Example
When the tuple Y is deleted

the physical addresses will reference an incorrect tuple

64 / 74
Techniques to handle tuple deletion
Using logical addresses is easy
Before deleting tuple Y that is referenced by tuples X1 and X2

65 / 74
Techniques to handle tuple deletion
Using logical addresses is easy
After deleting tuple Y

Deleted tuple is identified by a NULL physical address in the Map table


66 / 74
Very important

The logical address used by tuple Y must remain in the map table
Furthermore:
The logical address used by tuple Y cannot be re-used
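
The two rules above (the entry stays in the map table, and its logical address is never handed out again) can be sketched with a small in-memory map table. The `MapTable` type, its fixed capacity, and the use of C pointers as "physical addresses" are assumptions made for the example.

```c
#include <stddef.h>

#define MAP_CAPACITY 8

/* Hypothetical map table: index = logical address, value = physical address.
   A NULL physical address marks a deleted tuple. */
typedef struct {
    void  *physical[MAP_CAPACITY];
    size_t next_logical;  /* logical addresses are handed out once, never reused */
} MapTable;

void map_init(MapTable *m) {
    for (size_t i = 0; i < MAP_CAPACITY; i++) m->physical[i] = NULL;
    m->next_logical = 0;
}

/* Register a tuple and return its logical address, or -1 if the table is full. */
long map_insert(MapTable *m, void *tuple) {
    if (m->next_logical == MAP_CAPACITY) return -1;
    m->physical[m->next_logical] = tuple;
    return (long)m->next_logical++;
}

/* Resolve a logical address; NULL means the tuple was deleted or never existed. */
void *map_lookup(const MapTable *m, long logical) {
    if (logical < 0 || (size_t)logical >= m->next_logical) return NULL;
    return m->physical[logical];
}

/* Delete: keep the entry in the table but null out the physical address. */
void map_delete(MapTable *m, long logical) {
    if (logical >= 0 && (size_t)logical < m->next_logical)
        m->physical[logical] = NULL;
}
```

After tuple Y is deleted, tuples X1 and X2 that still hold Y's logical address resolve it to NULL rather than to whatever tuple is inserted next, because the next insert always receives a fresh logical address.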

67 / 74
Techniques to handle tuple deletion

Deleting a tuple when physical addresses are used: use a tombstone record
Tombstone record: a (very small) special-purpose tuple used to indicate a
deleted tuple
When a tuple is deleted, it is replaced by the tombstone record
This tombstone is permanent: it must exist until the entire database is
reconstructed
Note: If we are using a map table, then the tombstone can be a null
pointer in place of the physical address.

68 / 74
Tombstones

Example
Before deleting tuple Y

69 / 74
Tombstones

Example
After deleting tuple Y

70 / 74
Tombstones
When you insert a new tuple, you cannot use the space of a tombstone
tuple (the tombstone tuple must be preserved)
Because: existing references to the deleted tuple would then
refer to the newly inserted tuple.

71 / 74
3) Update

If the new tuple is shorter than the previous one, easy
If it is longer, need to shift tuples or create overflow blocks
Note: We will never create a tombstone tuple in an update operation

72 / 74
Conclusion

The storage manager is not entirely independent from the rest of the
DBMS.
A DBMS encodes and decodes the tuple’s bytes into a set of attributes
based on its schema.
It is important to choose the right storage model for the target workload:
OLTP = Row Store
OLAP = Column Store

73 / 74
Database Storage: Next

Problem #1: How the DBMS represents the database in files on disk,
i.e., how to lay out data on disk.
Problem #2: How the DBMS manages its memory and moves data
back and forth from disk.

74 / 74
