Lecture #04: Database Storage (Part II)

15-445/645 Database Systems (Fall 2024)

https://15445.courses.cs.cmu.edu/fall2024/
Carnegie Mellon University
Andy Pavlo

1 Log-Structured Storage
There are several problems associated with the Slotted-Page tuple-oriented architecture discussed in the
previous lecture:
• Fragmentation: Deletion of tuples can leave gaps in the pages, making them not fully utilized.
• Useless Disk I/O: Due to the block-oriented nature of non-volatile storage, the whole block needs
to be fetched to update a tuple.
• Random Disk I/O: The disk reader could have to jump to 20 different places to update 20 different
tuples, which can be very slow.
What if we were working on a system that only allows the creation of new pages and no overwrites (e.g.,
HDFS, Google Colossus, some object stores)? The log-structured storage model works with this assumption
and addresses some of the problems listed above.

Log-Structured Storage Overview

Log-structured storage is based on log-structured file systems (LSFS)¹ and log-structured merge trees (LSM
Trees)². Instead of storing tuples in pages and updating them in place, the DBMS only stores the log records
of changes to the tuples. The DBMS applies changes to an in-memory data structure (MemTable) and then
writes out the changes sequentially to disk (SSTable). Each record contains the tuple’s unique identifier, the
type of operation (PUT/DELETE), and, for a PUT, the contents of the tuple. Effectively, the DBMS only needs
to keep track of the latest value for each key (the most recent PUT/DELETE). Notably, in-place updates are
applied only to the in-memory data structure, where they are fast, while disk writes are sequential and
existing pages are immutable, which reduces random disk I/O. This is a good fit for append-only storage.
The DBMS also sorts each SSTable by key, from low to high, before writing it to disk.
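The write path described above can be sketched in Python as follows. This is only a toy illustration under stated assumptions: the class name LogStructuredStore, the memtable_limit parameter, and the TOMBSTONE marker are hypothetical names made up for the example, and the "disk" is just an in-memory list of immutable sorted runs.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a DELETE record


class LogStructuredStore:
    """Toy write path: in-memory MemTable, flushed to immutable sorted SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}     # key -> latest value (updated in place)
        self.sstables = []     # oldest first; each run is a sorted tuple of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        self._maybe_flush()

    def delete(self, key):
        # A delete is logged as a change; older SSTables are never modified.
        self.memtable[key] = TOMBSTONE
        self._maybe_flush()

    def _maybe_flush(self):
        if len(self.memtable) >= self.memtable_limit:
            # Sort by key (low to high) and append the run as one sequential write.
            run = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
            self.sstables.append(run)
            self.memtable = {}


store = LogStructuredStore(memtable_limit=2)
store.put("a", 1)
store.put("a", 2)    # in-place update inside the MemTable, nothing is flushed
store.put("b", 3)    # MemTable now holds two keys and is flushed as one SSTable
store.delete("a")    # the tombstone sits in the new, empty MemTable
```

Note that the first value written for "a" never reaches an SSTable, because only the latest value per key is kept in the MemTable.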
To read a record, the DBMS first checks the MemTable to see whether it exists. If the key does not exist in
the MemTable, then the DBMS has to check the SSTables at each level. A brute force solution is to scan
down the SSTables from newest to oldest and perform binary search within each SSTable to find the
most recent contents of the tuple, which can be slow. To avoid this, the DBMS can maintain an in-memory
SummaryTable to track additional metadata like min/max key per SSTable and key filter (e.g., Bloom
filters) per level.
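The read path above might look roughly like the following Python sketch. It assumes each SSTable is a key-sorted list of (key, value) pairs and models only the min/max part of the SummaryTable (no Bloom filter); the function names and the TOMBSTONE marker are hypothetical.

```python
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical marker for a DELETE record


def search_sstable(sstable, key):
    """Binary search a key-sorted list of (key, value) pairs."""
    # (key,) sorts just before any (key, value), so bisect lands on the key if present.
    i = bisect_left(sstable, (key,))
    if i < len(sstable) and sstable[i][0] == key:
        return True, sstable[i][1]
    return False, None


def get(key, memtable, sstables, summary):
    """Check the MemTable first, then the SSTables from newest to oldest."""
    if key in memtable:
        hit, value = True, memtable[key]
    else:
        hit, value = False, None
        for i in range(len(sstables) - 1, -1, -1):
            lo, hi = summary[i]
            if not (lo <= key <= hi):
                continue  # min/max metadata rules this SSTable out without reading it
            hit, value = search_sstable(sstables[i], key)
            if hit:
                break     # the newest record for the key wins
    return None if (not hit or value is TOMBSTONE) else value


sstables = [
    [("a", 1), ("c", 3), ("d", 4)],      # oldest
    [("a", TOMBSTONE), ("b", 20)],       # newest
]
summary = [(t[0][0], t[-1][0]) for t in sstables]  # (min key, max key) per SSTable
memtable = {"c": 30}
```

In this example, a lookup for "a" stops at the tombstone in the newest SSTable and never returns the stale value 1 from the older one.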

Compaction
In a write-heavy workload, the DBMS will accumulate a large number of SSTables on disk. Thus, the DBMS
can periodically use a sort-merge algorithm to compact the log, keeping only the most recent change for
each tuple across several pages. Compaction reduces wasted space and speeds up reads.
¹ https://doi.org/10.1145/146941.146943
² https://doi.org/10.1007/s002360050048
Fall 2024 – Lecture #04 Database Storage (Part II)

In Universal Compaction, any log files can be compacted together. In Level Compaction, the smallest
files are level 0. Level 0 files can be compacted to create a bigger level 1 file, level 1 files can be compacted
to a level 2 file, etc. Tiering is another log compaction method that will not be covered in this course.
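The sort-merge step of compaction can be sketched in Python as below. It assumes the input SSTables are listed oldest to newest and are each sorted by key; the function name compact and the TOMBSTONE marker are hypothetical. Dropping tombstones in the output is only safe under the further assumption that the inputs include every older run that could still hold the deleted key.

```python
import heapq
from itertools import groupby

TOMBSTONE = object()  # hypothetical marker for a DELETE record


def compact(sstables):
    """Merge key-sorted SSTables (oldest first) into one sorted run."""
    # Tag every record with the negated age of its SSTable so that, among
    # records with the same key, the newest one sorts first.
    tagged = [
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)  # age 0 is the oldest
    ]
    merged = heapq.merge(*tagged, key=lambda rec: (rec[0], rec[1]))
    out = []
    for key, group in groupby(merged, key=lambda rec: rec[0]):
        _, _, value = next(group)   # first record in the group is the most recent
        if value is not TOMBSTONE:  # deleted keys are dropped (see assumption above)
            out.append((key, value))
    return out


old = [("a", 1), ("b", 2), ("c", 3)]
new = [("a", 10), ("b", TOMBSTONE), ("d", 4)]
compacted = compact([old, new])
```

Because every input is already sorted, the merge reads each run once from front to back, which keeps the I/O sequential.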

Tradeoffs
The tradeoffs of using Log-Structured Storage can be summarized as follows:
• Fast sequential writes, good for append-only storage
• Reads may be slow
• Compaction is expensive
• Subject to write amplification (for each logical write, there could be multiple physical writes).

2 Index-Organized Storage
Observe that both page-oriented storage and log-structured storage rely on an additional index to find
individual tuples because the tables are inherently unsorted. In the index-organized storage scheme, the
DBMS directly stores a table’s tuples as the values of an index data structure. The DBMS uses a page
layout that looks like a slotted page, and tuples are typically sorted within a page by key.

3 Data Representation
The data in a tuple is essentially just a byte array prefixed with a header that contains metadata about it.
The tuple itself does not keep track of the types of its attributes. It is up to the DBMS to keep
track of that and interpret those bytes. A data representation scheme is how a DBMS stores the bytes for a
value.
DBMSs want to make sure that tuples are word-aligned so that the CPU can access them without any unexpected
behavior or additional work. Two approaches are usually taken:
• Padding: Add empty bits after attributes to ensure that the tuple is word-aligned.
• Reordering: Switch the order of attributes in the physical layout to make sure they are aligned.
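The effect of the two approaches can be illustrated with a small layout calculator in Python. This is a sketch under assumptions that are not from the lecture: each attribute is aligned to its own size (sizes are powers of two), the word size is 8 bytes, and the attribute names are made up.

```python
def layout(attrs, word=8):
    """Return (offsets, total_size) for a list of (name, size_in_bytes).

    Assumes each attribute is aligned to its own size (capped at the word
    size) and the tuple is padded out to a multiple of the word size.
    """
    offsets, pos = {}, 0
    for name, size in attrs:
        align = min(size, word)
        pos += (-pos) % align   # padding bytes inserted before this attribute
        offsets[name] = pos
        pos += size
    pos += (-pos) % word        # trailing padding so the next tuple is word-aligned
    return offsets, pos


# Hypothetical schema: a 1-byte flag, an 8-byte timestamp, a 2-byte and a 4-byte integer.
attrs = [("flag", 1), ("created", 8), ("small", 2), ("id", 4)]

padded_offsets, padded_size = layout(attrs)

# Reordering: place the largest attributes first so less padding is needed.
reordered = sorted(attrs, key=lambda a: -a[1])
reordered_offsets, reordered_size = layout(reordered)
```

With padding alone, this schema occupies 24 bytes per tuple; after reordering, it fits in 16 bytes.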
There are five high-level data types that can be stored in tuples: integers, variable-precision numbers,
fixed-point precision numbers, variable-length values, and dates/times.

Integers
Most DBMSs store integers using their “native” C/C++ integer types. These values are fixed length.
Examples: INTEGER, BIGINT, SMALLINT, TINYINT.

Variable Precision Numbers

These are inexact, variable-precision numeric types that use the “native” C/C++ types specified by the
IEEE-754 standard. These values are also fixed length.
Operations on variable-precision numbers are faster to compute than on arbitrary-precision numbers because
the CPU can execute instructions on them directly. However, there may be rounding errors when performing
computations because some numbers cannot be represented precisely.
Examples: FLOAT, REAL.
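The rounding problem is easy to reproduce. The Python snippet below is an illustration only, relying on the fact that Python floats are IEEE-754 double-precision values on common platforms.

```python
import math

x = 0.1 + 0.2

# Neither 0.1 nor 0.2 has an exact binary representation, so the sum is
# slightly off and an exact comparison with 0.3 fails.
exactly_equal = (x == 0.3)

# Comparing within a tolerance is the usual workaround for inexact types.
close_enough = math.isclose(x, 0.3)
```

Here exactly_equal is False while close_enough is True, which is why inexact types are a poor fit for values such as monetary amounts.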

Fixed-Point Precision Numbers

These are numeric data types with arbitrary precision and scale. They are typically stored in an exact,
variable-length binary representation (almost like a string) with additional metadata that tells the system
things like the length of the data and where the decimal point should be.
These data types are used when rounding errors are unacceptable, but the DBMS pays a performance
penalty to get this accuracy.
Examples: NUMERIC, DECIMAL.
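As an analogy (not a description of how any DBMS stores NUMERIC internally), Python's decimal module behaves like an exact fixed-point type: it keeps the digits together with an exponent that records where the decimal point goes.

```python
from decimal import Decimal

# Exact decimal arithmetic: no binary rounding error.
total = Decimal("0.1") + Decimal("0.2")
exact = (total == Decimal("0.3"))

# as_tuple() exposes the stored digits plus an exponent that says where the
# decimal point belongs, similar in spirit to the metadata described above.
sign, digits, exponent = Decimal("12.345").as_tuple()
```

The price of this exactness is that arithmetic is carried out in software on the digit sequence rather than by a single CPU instruction.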

Variable-Length Data
These represent data types of arbitrary length. They are typically stored with a header that keeps track of
the length of the string to make it easy to jump to the next value. It may also contain a checksum for the
data.
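A length-prefixed encoding of this kind can be sketched in Python as follows. The 4-byte little-endian length header and the function names are assumptions made for the example; real systems choose their own header formats, and the optional checksum is omitted.

```python
import struct

HEADER = struct.Struct("<I")  # assumed header: 4-byte little-endian length


def encode_varlen(data: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(data)) + data


def decode_varlen(buf: bytes, offset: int = 0):
    """Return (payload, offset_of_next_value)."""
    (length,) = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    return buf[start:start + length], start + length


# Two values stored back to back; the header lets the reader jump to the second one.
buf = encode_varlen(b"hello") + encode_varlen(b"db")
first, next_offset = decode_varlen(buf)
second, end_offset = decode_varlen(buf, next_offset)
```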
Most DBMSs do not allow a tuple to exceed the size of a single page. The ones that do store the data on
a special “overflow” page and have the tuple contain a reference to that page. These overflow pages can
contain pointers to additional overflow pages until all the data can be stored.
Some systems will let you store these large values in an external file, and then the tuple will contain a
pointer to that file. For example, if the database is storing photo information, the DBMS can store the
photos in the external files rather than having them take up large amounts of space in the DBMS. One
downside of this is that the DBMS cannot manipulate the contents of this file. Thus, there are no durability
or transaction protections.
Examples: VARCHAR, VARBINARY, TEXT, BLOB.

Dates and Times

Representations for dates and times vary across systems. Typically, they are stored as the number of
time units (microseconds or milliseconds) since the Unix epoch.
Examples: TIME, DATE, TIMESTAMP.
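One possible encoding, sketched in Python, stores a timestamp as an integer count of microseconds since the Unix epoch. The choice of microseconds and UTC, and the function names, are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Encode a timezone-aware datetime as microseconds since the Unix epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_micros(us: int) -> datetime:
    """Decode microseconds since the Unix epoch back into a UTC datetime."""
    return EPOCH + timedelta(microseconds=us)


ts = datetime(2024, 9, 9, 12, 0, 0, 123456, tzinfo=timezone.utc)
stored = to_micros(ts)          # a plain fixed-length integer
restored = from_micros(stored)
```

Storing the value as an integer keeps it fixed length and makes comparisons and range scans as cheap as integer comparisons.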

Null Data Types

There are three common approaches to represent nulls in a DBMS.
• Null Column Bitmap Header: Store a bitmap in a centralized header that specifies which attributes
are null. This is the most common approach.
• Special Values: Designate a value to represent NULL for a data type (e.g., INT32_MIN).
• Per Attribute Null Flag: Store a flag that marks that a value is null. This approach is NOT recommended
because it is not memory-efficient. For each value, the DBMS has to use more than just a
single bit to avoid breaking word alignment.
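The null column bitmap approach can be sketched in Python for a row of nullable 32-bit integers. The layout here is an assumption made for illustration: one bit per column in the header, and null attributes take no space in the body (some systems keep a fixed-size slot instead).

```python
import struct


def encode_row(values):
    """Encode optional 32-bit ints as a null bitmap followed by the non-null values."""
    bitmap = bytearray((len(values) + 7) // 8)   # one bit per column
    body = b""
    for i, v in enumerate(values):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)       # mark column i as NULL
        else:
            body += struct.pack("<i", v)
    return bytes(bitmap) + body


def decode_row(buf, num_columns):
    """Inverse of encode_row."""
    nbytes = (num_columns + 7) // 8
    bitmap, pos, out = buf[:nbytes], nbytes, []
    for i in range(num_columns):
        if bitmap[i // 8] & (1 << (i % 8)):
            out.append(None)
        else:
            out.append(struct.unpack_from("<i", buf, pos)[0])
            pos += 4
    return out


row = [7, None, -1, None]
encoded = encode_row(row)
decoded = decode_row(encoded, len(row))
```

Four nullable columns cost a single header byte here, whereas a per-attribute flag would typically cost at least one byte per column once alignment is taken into account.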

4 System Catalogs
In order for the DBMS to be able to decipher the contents of tuples, it maintains an internal catalog that
holds metadata about the databases.
Metadata Contents:
• The tables and columns the database has as well as any indexes on those tables.
• Users of the database and what permissions they have.

• Statistics about the tables and the contents they contain (e.g., the max value of an attribute).
Most DBMSs store their catalog inside of themselves in the format that they use for their tables. They use
special code to “bootstrap” these catalog tables.
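SQLite is one concrete example of a catalog stored as an ordinary table: its schema lives in a table named sqlite_master that can be queried with plain SQL. The Python sketch below uses the standard sqlite3 module; the student table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# The catalog is itself a table: one row per table or index in the database.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# Column metadata for one table; each row is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

conn.close()
```

Other systems expose similar information through their own catalog tables or the standard INFORMATION_SCHEMA views.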
