04-Storage2 2
1 Log-Structured Storage
There are several problems associated with the Slotted-Page tuple-oriented architecture discussed in the
previous lecture:
• Fragmentation: Deletion of tuples can leave gaps in the pages, making them not fully utilized.
• Useless Disk I/O: Due to the block-oriented nature of non-volatile storage, the whole block needs
to be fetched to update a tuple.
• Random Disk I/O: The disk head may have to jump to 20 different places to update 20 different
tuples, which can be very slow.
What if we were working on a system that only allows the creation of new pages and no overwrites (e.g.,
HDFS, Google Colossus, some object stores)? The log-structured storage model works with this
assumption and addresses some of the problems listed above.
Compaction
In a write-heavy workload, the DBMS will accumulate a large number of SSTables on disk. The DBMS
can therefore periodically use a sort-merge algorithm to compact the log, keeping only the most recent
change for each tuple across several pages. Compaction reduces wasted space and speeds up reads.
Fall 2024 – Lecture #04 Database Storage (Part II)
In Universal Compaction, any log files can be compacted together. In Level Compaction, the smallest
files are level 0. Level 0 files can be compacted to create a bigger level 1 file, level 1 files can be compacted
to a level 2 file, etc. Tiering is another log compaction method that will not be covered in this course.
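The sort-merge step behind compaction can be sketched as follows. This is a minimal illustration, not any particular system's implementation: the function name compact and the in-memory representation of an SSTable as a key-sorted list of (key, value) pairs are assumptions made for the example.

```python
import heapq

def compact(sstables):
    """Merge key-sorted runs into one run, keeping the newest value per key.

    Assumptions for this sketch: `sstables` is ordered newest first, each run
    is a list of (key, value) pairs sorted by key, and keys are unique within
    a run.
    """
    # Tag every entry with the age of its run (0 = newest) so that, for equal
    # keys, the entry from the newest run comes out of the merge first.
    tagged = (
        [(key, age, value) for key, value in run]
        for age, run in enumerate(sstables)
    )
    merged = []
    last_key = object()  # sentinel that compares unequal to every real key
    for key, age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # an older version of a key we already emitted
        last_key = key
        merged.append((key, value))
    return merged

newest = [("a", "a2"), ("c", "c2")]
older = [("a", "a1"), ("b", "b1")]
result = compact([newest, older])
```

Here result keeps "a2" for key "a" and discards the stale "a1", which is where the space savings come from. A real system would also have to handle deleted tuples, which this sketch leaves out.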
Tradeoffs
The tradeoffs of using Log-Structured Storage are summarized below:
• Fast sequential writes, good for append-only storage
• Reads may be slow
• Compaction is expensive
• Subject to write amplification (for each logical write, there could be multiple physical writes).
2 Index-Organized Storage
Observe that both page-oriented storage and log-structured storage rely on an additional index to find
individual tuples because the tables are inherently unsorted. In the index-organized storage scheme, the
DBMS directly stores a table’s tuples as the values of an index data structure. The DBMS would use a page
layout that looks like a slotted page, and tuples are typically sorted within each page by key.
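As a rough illustration of keeping tuples sorted by key inside a page, the sketch below models one leaf page of an index. The class name LeafPage and its methods are hypothetical, and a real page would hold bytes in a slotted layout rather than Python lists.

```python
import bisect

class LeafPage:
    """Hypothetical leaf page that stores whole tuples, ordered by key."""

    def __init__(self):
        self.keys = []    # sorted keys
        self.tuples = []  # tuples[i] is the tuple stored under keys[i]

    def insert(self, key, tup):
        # Binary search for the position that keeps the page sorted.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.tuples[i] = tup  # key already present: replace its tuple
        else:
            self.keys.insert(i, key)
            self.tuples.insert(i, tup)

    def lookup(self, key):
        # The tuple is found in the index itself; no separate heap lookup.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.tuples[i]
        return None

page = LeafPage()
for key, name in [(30, "carol"), (10, "alice"), (20, "bob")]:
    page.insert(key, (key, name))
```

Because the page is sorted, a lookup is a binary search rather than a scan of every slot.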
3 Data Representation
The data in a tuple is essentially just a byte array prefixed with a header that contains meta-data about it.
The tuple itself does not record what kinds of values its attributes hold. It is up to the DBMS to keep
track of that and to interpret the bytes accordingly. A data representation scheme is how a DBMS stores
the bytes for a value.
DBMSs want to make sure that tuples are word-aligned so that the CPU can access them without any
unexpected behavior or additional work. Two approaches are usually taken:
• Padding: Add empty bits after attributes to ensure that the tuple is word-aligned.
• Reordering: Switch the order of attributes in the physical layout to make sure they are aligned.
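The effect of the two approaches can be sketched by computing attribute offsets by hand. This is a simplified model under stated assumptions: attribute sizes are powers of two, each attribute is aligned to its own size (capped at an 8-byte word), and the attribute names and helper functions are hypothetical.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def padded_layout(attrs, word=8):
    """Return ({name: offset}, total_size) for attrs given as (name, size)."""
    offsets = {}
    offset = 0
    for name, size in attrs:
        # Insert padding so the attribute starts on its natural boundary.
        offset = align_up(offset, min(size, word))
        offsets[name] = offset
        offset += size
    # Pad the tail so the next tuple also starts on a word boundary.
    return offsets, align_up(offset, word)

def reordered_layout(attrs, word=8):
    """Place the largest attributes first, then apply the same padding rule."""
    ordered = sorted(attrs, key=lambda attr: attr[1], reverse=True)
    return padded_layout(ordered, word)

attrs = [("a", 1), ("b", 8), ("c", 1), ("d", 4)]
padded_offsets, padded_size = padded_layout(attrs)
reordered_offsets, reordered_size = reordered_layout(attrs)
```

For this hypothetical schema, padding alone yields a 24-byte tuple, while reordering first shrinks it to 16 bytes, since the small attributes fill what would otherwise be padding.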
There are five high level datatypes that can be stored in tuples: integers, variable-precision numbers, fixed-
point precision numbers, variable length values, and dates/times.
Integers
Most DBMSs store integers using the “native” fixed-width C/C++ integer types. (The IEEE-754 standard
governs floating-point numbers, not integers.) These values are fixed length.
Examples: INTEGER, BIGINT, SMALLINT, TINYINT.
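A small sketch of fixed-length integer storage, using Python's struct module. The byte widths chosen for each SQL type (1, 2, 4, and 8 bytes) and the little-endian byte order are common choices but are assumptions here; actual widths and byte order vary by DBMS and platform.

```python
import struct

# Assumed mapping from SQL type to a fixed-width, little-endian signed format.
INT_FORMATS = {
    "TINYINT": "<b",   # 1 byte
    "SMALLINT": "<h",  # 2 bytes
    "INTEGER": "<i",   # 4 bytes
    "BIGINT": "<q",    # 8 bytes
}

def encode_int(sql_type, value):
    """Serialize value into the fixed number of bytes for sql_type."""
    return struct.pack(INT_FORMATS[sql_type], value)

def decode_int(sql_type, data):
    """Interpret the bytes as an integer of the given SQL type."""
    return struct.unpack(INT_FORMATS[sql_type], data)[0]

sizes = {name: struct.calcsize(fmt) for name, fmt in INT_FORMATS.items()}
```

Every value of a given type occupies the same number of bytes regardless of its magnitude, so the DBMS can compute an attribute's offset without reading the data.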
Variable-Length Data
These represent data types of arbitrary length. They are typically stored with a header that keeps track of
the length of the string to make it easy to jump to the next value. It may also contain a checksum for the
data.
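The length header can be sketched as a fixed-size prefix in front of the payload. The 4-byte little-endian prefix and the function names below are assumptions for illustration; real systems pick their own header width and may add a checksum.

```python
import struct

HEADER = struct.Struct("<I")  # assumed 4-byte unsigned length prefix

def encode_varlen(value: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(value)) + value

def decode_varlen(buf: bytes, offset: int = 0):
    """Return (payload, offset of the next value) for the value at offset."""
    (length,) = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    return buf[start:start + length], start + length

buf = encode_varlen(b"hello") + encode_varlen(b"db")
first, next_offset = decode_varlen(buf)
second, end_offset = decode_varlen(buf, next_offset)
```

Reading the header is enough to find where the next value begins, so the DBMS can skip over a value without scanning its contents.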
Most DBMSs do not allow a tuple to exceed the size of a single page. The ones that do store the data on
a special “overflow” page and have the tuple contain a reference to that page. These overflow pages can
contain pointers to additional overflow pages until all the data can be stored.
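The chain of overflow pages can be sketched as a linked list of chunks. This is a toy model: the dictionary standing in for the page store, the sequential page ids, and the tiny default page size are all assumptions made for the example.

```python
def store_overflow(value: bytes, pages: dict, page_size: int = 8):
    """Split value across chained overflow pages; return the first page id.

    `pages` maps page id -> (chunk, next_page_id). Ids are assigned
    sequentially, which assumes pages are only ever added by this function.
    """
    chunks = [value[i:i + page_size] for i in range(0, len(value), page_size)]
    next_id = None
    # Build the chain back to front so each page can record its successor.
    for chunk in reversed(chunks):
        page_id = len(pages)
        pages[page_id] = (chunk, next_id)
        next_id = page_id
    return next_id  # the tuple would store this reference (None if empty)

def load_overflow(first_id, pages: dict) -> bytes:
    """Follow the chain of overflow pages and reassemble the value."""
    out = bytearray()
    page_id = first_id
    while page_id is not None:
        chunk, page_id = pages[page_id]
        out += chunk
    return bytes(out)

pages = {}
value = bytes(range(20))
first_page = store_overflow(value, pages, page_size=8)
restored = load_overflow(first_page, pages)
```

With a page size of 8, the 20-byte value occupies three overflow pages, and the tuple only needs to hold the reference to the first one.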
Some systems will let you store these large values in an external file, and then the tuple will contain a
pointer to that file. For example, if the database is storing photo information, the DBMS can store the
photos in the external files rather than having them take up large amounts of space in the DBMS. One
downside of this is that the DBMS cannot manipulate the contents of this file. Thus, there are no durability
or transaction protections.
Examples: VARCHAR, VARBINARY, TEXT, BLOB.
4 System Catalogs
In order for the DBMS to be able to decipher the contents of tuples, it maintains an internal catalog that
holds meta-data about the databases.
Metadata Contents:
• The tables and columns the database has as well as any indexes on those tables.
• Users of the database and what permissions they have.
• Statistics about the tables and the contents contained within them (e.g., the max value of an
attribute).
Most DBMSs store their catalog inside of themselves in the format that they use for their tables. They use
special code to “bootstrap” these catalog tables.
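As one concrete illustration, SQLite exposes its catalog as an ordinary table named sqlite_master that can be queried with SQL. The sketch below uses Python's built-in sqlite3 module; the table and index names are hypothetical, and other DBMSs expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# The catalog is itself a table, so it is read with a normal SELECT.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

for entry_type, name, table in catalog:
    print(entry_type, name, table)
```

The query returns one row for the table and one for the index, both created by the statements above.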