CS525: Advanced Database Organization

Notes 3: Database Storage

Part II

Yousef M. Elmehdwi

Department of Computer Science

Illinois Institute of Technology

[email protected]

September 6th 2023

Slides: adapted from courses taught by Andy Pavlo, Carnegie Mellon University, Hector
Garcia-Molina, Stanford, & Shun Yan Cheung, Emory University

1 / 74
Reading

Database Systems: The Complete Book, 2nd Edition,

Chapter 2: Data Storage
Database System Concepts (6th/7th Edition)
Chapter 10 (6th)/ Chapter 13 (7th)

2 / 74
Database Storage

Problem#1: How the DBMS represents the database in files on disk

i.e., how to lay out data on disk.
Problem#2: How the DBMS manages its memory and moves data
back-and-forth from disk.

3 / 74
Today’s Agenda

Data Representation
How the system stores the actual binary data for individual attributes
(columns) within the database.
System Catalogs
Internal metadata is maintained by the database to understand both the
data that is actually stored and how to interpret the bytes within the tuples
Storage Models
How data is organized and stored within the database system.
Modification of Tuples

4 / 74
Tuple Storage

A tuple is essentially a sequence of bytes (byte arrays).

It is up to the DBMS to know how to interpret those bytes to derive the
values for attributes.
The DBMS’s catalogs contain the schema information about tables that
the system uses to figure out the tuple’s layout.
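
As a minimal sketch of this idea, the snippet below uses Python's standard struct module to decode a byte array with the help of a schema. The student schema, the column names, and the fixed 16-byte name field are assumptions made for illustration; a real DBMS reads this layout from its catalog.

```python
import struct

# Hypothetical catalog entry: column name -> struct format code.
# "<" = little-endian without padding, "i" = 32-bit int,
# "16s" = 16 raw bytes, "d" = 64-bit float.
STUDENT_SCHEMA = [("id", "i"), ("name", "16s"), ("gpa", "d")]
LAYOUT = struct.Struct("<" + "".join(fmt for _, fmt in STUDENT_SCHEMA))

def encode_tuple(values):
    """Serialize attribute values into the tuple's byte array."""
    return LAYOUT.pack(*values)

def decode_tuple(raw):
    """Use the schema to turn raw bytes back into named attributes."""
    fields = LAYOUT.unpack(raw)
    row = dict(zip((name for name, _ in STUDENT_SCHEMA), fields))
    row["name"] = row["name"].rstrip(b"\x00").decode("utf-8")  # strip padding
    return row

raw = encode_tuple((42, b"Ada", 3.9))
print(len(raw), decode_tuple(raw))
```

Without the schema, the 28 bytes in raw carry no meaning on their own; the same bytes could be read as seven 32-bit integers.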

5 / 74
What are the data items we want to store?

a salary
a name
a date
a picture
⇒ What we have available: Bytes

6 / 74
Data Representation

How does a DBMS store the bytes for a value?

That is, how does the DBMS store the binary data for different types of values or
attributes?
There are five high level data types that can be stored in tuples:
integers,
variable precision numbers,
fixed point precision numbers,
variable length values, and
dates/times.

7 / 74
IEEE-754 Standard1

This is a specific standard that defines the binary representation of

floating-point numbers (like float and double) and their arithmetic
operations in computing.
Ensures consistency in how floating-point numbers are stored and
manipulated across different computer architectures and programming
languages.
It specifies the format for representing real numbers, including the sign
bit, exponent, and mantissa (fractional part).
Example: 32 bit in Standard IEEE 754:
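
The 32-bit format uses 1 sign bit, 8 exponent bits (biased by 127), and 23 mantissa bits. A small sketch, assuming Python's standard struct module and a hypothetical helper name float32_fields, that pulls these three fields out of a value:

```python
import struct

def float32_fields(value):
    """Split a number's IEEE-754 single-precision encoding into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits (fraction after the implicit leading 1)
    return sign, exponent, mantissa

# -0.15625 = -1.25 * 2**-3, so the stored exponent is 127 - 3 = 124
# and the fraction .25 occupies the top two mantissa bits.
print(float32_fields(-0.15625))
```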

1 https://en.wikipedia.org/wiki/IEEE_754
8 / 74
Data Representation

How a DBMS stores the bytes for a value

INTEGER/BIGINT/SMALLINT/TINYINT
C/C++ Representation
All integers are stored in their “native” C/C++ types1 within the database.
FLOAT/REAL vs. NUMERIC/DECIMAL
IEEE-754 Standard/Fixed-point Decimals
VARCHAR/VARBINARY/TEXT/BLOB
Header with length, followed by data bytes.
TIME/DATE/TIMESTAMP
32/64-bit integer of (micro)seconds since Unix epoch

1 Native types are the fundamental data types that the C/C++ programming languages support directly, without any additional

libraries or custom data types. These native data types are typically used for storing and manipulating data efficiently in these
languages.
9 / 74
Data Representation: Integers

C/C++ Representation
Most DBMSs store integers using their “native” C/C++ fixed-width integer types
(in practice two's complement; IEEE-754 covers only floating-point numbers).
These values are fixed length.
Examples: INTEGER/BIGINT/SMALLINT/TINYINT
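
A rough sketch of what "fixed length" means on disk, using Python's struct module. The mapping from SQL type names to byte widths is an assumption that matches common systems; individual DBMSs may differ (for example in signedness of TINYINT or in byte order).

```python
import struct

# SQL type -> struct format code; "<" means little-endian, standard sizes.
INT_TYPES = {
    "TINYINT": "<b",   # 1 byte
    "SMALLINT": "<h",  # 2 bytes
    "INTEGER": "<i",   # 4 bytes
    "BIGINT": "<q",    # 8 bytes
}

def encode_int(sql_type, value):
    """Return the fixed-length byte representation of value."""
    return struct.pack(INT_TYPES[sql_type], value)

def decode_int(sql_type, raw):
    """Interpret raw bytes as an integer of the given SQL type."""
    return struct.unpack(INT_TYPES[sql_type], raw)[0]

for sql_type in INT_TYPES:
    raw = encode_int(sql_type, -2)
    print(sql_type, len(raw), raw.hex(), decode_int(sql_type, raw))
```

Every value of a given type occupies the same number of bytes regardless of its magnitude, which is what lets the DBMS compute attribute offsets inside a tuple.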

10 / 74
Variable Precision Numbers

These are inexact1, variable-precision numeric types that use the

“native” C/C++ types.
They are stored directly as specified by the IEEE-754 standard.
These values are also fixed length.
Typically faster than arbitrary precision numbers because the CPU can
execute instructions on them directly.
Example: FLOAT, REAL/DOUBLE
However, computations on them can have rounding errors2 because
some numbers cannot be represented precisely in binary
floating-point format.
As a result, calculations may yield slightly inaccurate results.
To avoid this issue, use Fixed-Point Precision Numbers.

1 Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that

storing and retrieving a value might show slight discrepancies.

2 Rounding error is the difference between the result produced by a given algorithm using exact arithmetic and the result produced

by the same algorithm using finite-precision, rounded arithmetic.

11 / 74
Variable Precision Numbers

Rounding Example
#include <stdio.h>

int main(int argc, char *argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x + y = %f\n", x + y);
    printf("0.3 = %f\n", 0.3);
}

Output
x + y = 0.300000
0.3 = 0.300000

12 / 74
Variable Precision Numbers

Rounding Example
#include <stdio.h>

int main(int argc, char *argv[]) {
    float x = 0.1;
    float y = 0.2;
    printf("x + y = %.20f\n", x + y);
    printf("0.3 = %.20f\n", 0.3);
}

Output
x + y = 0.30000001192092895508
0.3 = 0.29999999999999998890

13 / 74
Variable Precision Numbers

Rounding Example
public class RoundingError {
    public static void main(String[] args) {
        float x = 1.0f;
        for (int i = 0; i < 10; i++) {
            x -= 0.1f;
        }
        System.out.printf("Result w precision 1: %.1f\n", x);
        System.out.printf("Result w precision 10: %.10f\n", x);
    }
}

Output
Result w precision 1: -0.0
Result w precision 10: -0.0000000745

14 / 74
Data Representation: Fixed Point Precision Numbers

Numeric data types with arbitrary precision and scale.

Used when rounding errors are unacceptable.
Example: NUMERIC, DECIMAL
Typically stored in an exact, variable-length binary representation with
additional meta-data that specifies the length of the data and the position
of the decimal point.
Like a VARCHAR, but not stored as a string.
The DBMS pays a performance penalty to get this accuracy:
calculations involving fixed-point precision numbers may be slower
than operations on native numeric types like FLOAT or
DOUBLE, which use hardware-based floating-point arithmetic.
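
The difference can be illustrated with Python's decimal module. This is a stand-in chosen for illustration, not the representation any particular DBMS uses; it shares the key idea of storing base-10 digits together with the position of the decimal point.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation.
float_sum = 0.1 + 0.2

# Decimal keeps base-10 digits plus an exponent, so these values are exact.
exact_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)             # False
print(exact_sum == Decimal("0.3"))  # True
print(exact_sum.as_tuple())         # sign, digits, and exponent (decimal point position)
```

The arithmetic on Decimal values is done in software rather than by a single CPU instruction, which is the performance penalty described above.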

15 / 74
PostgreSQL: NUMERIC

16 / 74
PostgreSQL Source Code, numeric.c
17 / 74
Data Representation: Variable Length Data

These represent data types of arbitrary length.

An array of bytes of arbitrary length.
Has a header that keeps track of the length of the string to make it easy
to jump to the next value. It may also contain a checksum for the data.
Example: VARCHAR, VARBINARY, TEXT, BLOB.
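
A minimal sketch of the header-plus-data layout, assuming a 4-byte little-endian length prefix and no checksum (both are choices made for this example; real systems pick their own header format):

```python
import struct

HEADER = struct.Struct("<I")  # assumed 4-byte length prefix

def encode_varlen(data: bytes) -> bytes:
    """Prefix the data bytes with their length."""
    return HEADER.pack(len(data)) + data

def decode_values(buf: bytes):
    """Walk back-to-back values, using each header to jump to the next one."""
    values, pos = [], 0
    while pos < len(buf):
        (length,) = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        values.append(buf[pos:pos + length])
        pos += length
    return values

buf = encode_varlen(b"Ada") + encode_varlen(b"") + encode_varlen(b"Grace Hopper")
print(decode_values(buf))
```

Because the length comes first, the reader never has to scan the data bytes for a terminator to find where the next value starts.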

18 / 74
Large Values

Most DBMSs don’t allow a tuple to exceed the size of a single page.
Handling tuples that exceed the size of a single page in a DBMS is a
common challenge.
DBMSs typically have strategies to deal with this situation to ensure data
integrity and efficient storage.
Two common approaches to handle such cases are:
overflow page
external storage

19 / 74
Large Values: Overflow Page
To store values that are larger than a page, the DBMS uses separate overflow
storage pages and has the tuple contain a reference to that page.
The main part of the tuple, which can fit within a single page, is stored in the
primary page, while the overflowed part is stored in one or more additional
overflow pages.
Overflow pages are linked to the primary page, forming a chain of pages that
together represent the complete tuple.
When querying the data, the DBMS follows these chains of pages to
reconstruct the complete tuple.
These overflow pages can contain pointers to additional overflow pages
until all the data can be stored.
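
The chain of overflow pages can be sketched as a linked list of pages. The PageStore class, its method names, and the 8-byte page size are all hypothetical and chosen only to keep the example small:

```python
PAGE_SIZE = 8  # tiny page size, only to keep the demo readable

class PageStore:
    """Toy page store: page id -> (payload bytes, id of next overflow page or None)."""

    def __init__(self):
        self.pages = {}
        self.next_id = 0

    def write_value(self, value: bytes) -> int:
        """Split value across a chain of pages; return the id of the first page."""
        chunks = [value[i:i + PAGE_SIZE]
                  for i in range(0, len(value), PAGE_SIZE)] or [b""]
        ids = list(range(self.next_id, self.next_id + len(chunks)))
        self.next_id += len(chunks)
        for i, chunk in enumerate(chunks):
            next_page = ids[i + 1] if i + 1 < len(ids) else None
            self.pages[ids[i]] = (chunk, next_page)
        return ids[0]

    def read_value(self, first_page: int) -> bytes:
        """Follow the overflow pointers and reassemble the value."""
        out, page_id = b"", first_page
        while page_id is not None:
            chunk, page_id = self.pages[page_id]
            out += chunk
        return out

store = PageStore()
ref = store.write_value(b"a value longer than one page")  # the tuple would store ref
print(len(store.pages), store.read_value(ref))
```

The tuple itself holds only ref; reading the full value costs one page access per link in the chain.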

20 / 74
Large Values: Overflow Page
To store values that are larger than a page, the DBMS uses separate overflow
storage pages and has the tuple contain a reference to that page.
Different DBMSs use different names and size thresholds for this:
Postgres: TOAST (The Oversized-Attribute Storage Technique) (> 2KB)
MySQL: Overflow (> 1/2 size of page)
SQL Server: Overflow (> size of page)
These overflow pages can contain pointers to additional overflow pages
until all the data can be stored.

21 / 74
External Value Storage
Some systems allow large data values, such as files or binary
objects, to be stored in an external file rather than directly within the database;
the tuple then contains a pointer to that file.
Example:
if the database is storing photo information, the DBMS can store the
photos in external files rather than having them take up large amounts
of space in the DBMS.

Treated as a BLOB type

Oracle: BFILE data type
Contains a locator pointing to a large
binary file stored outside the database.
Microsoft: FILESTREAM data type
The DBMS cannot manipulate the
contents of an external file.

Reading: A paper explains the trade-offs between these two options:

To BLOB or Not To BLOB: Large Object Storage in a Database or a
Filesystem
22 / 74
Data Representation: Dates and Times

The representation varies widely across different database systems.

However, a common approach is to represent dates and times as the
number of (micro/milli)seconds since the Unix epoch1 .
Example: TIME, DATE, TIMESTAMP.
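
A sketch of this approach, assuming a signed 64-bit count of microseconds and timezone-aware UTC datetimes (the function names are illustrative only):

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_timestamp(ts: datetime) -> bytes:
    """Store a timestamp as a signed 64-bit count of microseconds since the epoch."""
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    return struct.pack("<q", micros)

def decode_timestamp(raw: bytes) -> datetime:
    """Turn the stored integer back into a UTC timestamp."""
    (micros,) = struct.unpack("<q", raw)
    return EPOCH + timedelta(microseconds=micros)

ts = datetime(2023, 9, 6, 12, 0, 0, tzinfo=timezone.utc)
raw = encode_timestamp(ts)
print(len(raw), decode_timestamp(raw) == ts)
```

Storing an integer keeps the value fixed length and makes comparisons and interval arithmetic plain integer operations.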

1 The Unix epoch is a reference point in time, representing January 1, 1970, at 00:00:00 UTC (Coordinated Universal Time). It

is widely used as a starting point for measuring time intervals.

23 / 74
System Catalog

In order for the DBMS to be able to decipher the contents of tuples, it

maintains an internal catalog that holds meta-data about the databases.
A DBMS stores meta-data about databases in its internal catalogs.
The meta-data describes what tables and columns the databases have,
along with their types and the orderings of the values:
Tables, columns, indexes, views
Users, permissions
Internal statistics
Almost every DBMS stores a database’s catalog inside the database itself, in the
same format that it uses for its tables.
They use special code to bootstrap1 these catalog tables (wrap low-level
access methods to access the catalog)

1 Bootstrapping is the process of initializing a DBMS’s catalog during system setup or database creation. During this phase, the

DBMS uses low-level access methods or internal mechanisms to create the catalog tables and populate them with initial data
24 / 74
System Catalog

You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get
info about the database.
ANSI standard set of read-only views that provide info about all of the
tables, views, columns, and procedures in a database.
DBMSs also have non-standard shortcuts to retrieve this information.

25 / 74
Accessing Table Schema

List all of the tables in the current database:

-- SQL-92
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE table_catalog = '<db name>';

\d -- Postgres
SHOW TABLES; -- MySQL
.tables -- SQLite

26 / 74
Accessing Table Schema

List all of the columns in the student table:

-- SQL-92
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'student';

\d student -- Postgres
DESCRIBE student; -- MySQL
.schema student -- SQLite
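
The same catalog lookups can be tried from Python's built-in sqlite3 module. SQLite does not provide INFORMATION_SCHEMA, so this sketch uses SQLite's own catalog table sqlite_master and the table_info pragma; the student table definition is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# SQLite keeps its catalog in an ordinary table named sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```

This also shows the point made earlier about catalogs: the metadata is itself stored in a table and read back with ordinary queries.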

27 / 74
Today’s Agenda

Data Representation
System Catalogs
Storage Models
Ways to store tuples in pages

28 / 74
Observation

The relational model does not specify that we have to store all of a
tuple’s attributes together in a single page.
This may not actually be the best layout for some workloads
There are many different workloads for database systems.
By workload1 , we are referring to the general nature of requests a system
will have to handle.
Different workloads have different requirements for data storage and
access patterns.

1 refers to the types of queries, transactions, and operations the system is expected to handle.
29 / 74
Wikipedia Example

30 / 74
OLTP

On-line Transaction Processing:
Simple queries that read/update a small amount of data that is
related to a single entity in the database.

This is usually the kind of application that people build first.

SELECT P.*, R.*
FROM pages AS P
INNER JOIN revisions AS R
ON P.latest = R.revID
WHERE P.pageID = ?

UPDATE useracct
SET lastLogin = NOW(),
    hostname = ?
WHERE userID = ?

INSERT INTO revisions
VALUES (?,?...,?)

31 / 74
OLTP: On-line Transaction Processing

Fast, short running operations

Simple queries that operate on single entity at a time
Typically handle more writes than reads
Repetitive operations
Usually the kind of application that people build first
Example
User invocations of Amazon (Amazon storefront).
Users can add things to their cart,
they can make purchases,
but the actions only affect their accounts.

32 / 74
OLAP

On-line Analytical Processing:

Complex queries that read large portions of the database
spanning multiple entities.

You execute these workloads on the data you have collected from
your OLTP application(s).

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)

33 / 74
OLAP: On-line Analytical Processing

Long running, more complex queries

Reads large portions of the database
Analyzing and deriving new data from existing data collected on the
OLTP side
Example
Amazon computing the five most bought items over a one month period
for these geographical locations.

34 / 74
HTAP: Hybrid Transaction + Analytical Processing

A newer type of workload that has recently become popular is HTAP,

which tries to run OLTP and OLAP together on
the same database.

Watch HTAP Databases: What is New and What is Next -

SIGMOD22-HTAP-Tutorial- June 2022

35 / 74
Data Storage Model

There are different ways to store tuples in pages.

The DBMS can store tuples in different ways that are better for either
OLTP or OLAP workloads
We have been assuming the n-ary storage model (aka “row storage”)
so far this semester.

36 / 74
N-ary Storage Model (NSM)

The DBMS stores all attributes for a single tuple contiguously in a single
page.
Ideal for OLTP workloads where requests are insert-heavy and
transactions tend to operate on only an individual entity,
because it takes only one fetch to get all of the attributes for a single
tuple.

37 / 74
N-ary Storage Model (NSM)

SELECT * FROM useracct
WHERE userName = ? AND userPass = ?

INSERT INTO useracct
VALUES (?,?,..,?)

41 / 74
N-ary Storage Model (NSM)

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)

44 / 74
N-ary Storage Model (NSM)

Advantages
Fast inserts, updates, and deletes.
Good for queries that need the entire tuple.
Disadvantages
Not good for scanning large portions of the table and/or a subset of the
attributes.
This is because it pollutes the buffer pool by fetching data that is not
needed for processing the query.

49 / 74
Decomposition Storage Model (DSM)

The DBMS stores the values of a single attribute (column) for all tuples
contiguously in a block of data.
Vertically partition a database into a collection of individual columns that
are stored separately
Also known as a “column store”.
Ideal for OLAP workloads where read-only queries perform large scans
over a subset of the table’s attributes.
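
A rough comparison of the two layouts for a scan over one attribute. The three-column useracct schema, the field widths, and the row count are assumptions; the sketch counts raw bytes only and ignores page headers and buffering.

```python
import struct

ROW = struct.Struct("<i16sq")  # assumed schema: userID, hostname, lastLogin
rows = [(i, f"host{i}.gov".encode(), 1_700_000_000 + i) for i in range(1000)]

# NSM (row store): every tuple's attributes are stored contiguously.
nsm = b"".join(ROW.pack(*r) for r in rows)

# DSM (column store): one contiguous array per attribute.
dsm = {
    "userID": b"".join(struct.pack("<i", r[0]) for r in rows),
    "hostname": b"".join(struct.pack("<16s", r[1]) for r in rows),
    "lastLogin": b"".join(struct.pack("<q", r[2]) for r in rows),
}

# A scan over lastLogin alone has to read whole tuples in NSM,
# but only the lastLogin array in DSM.
nsm_bytes_read = len(nsm)
dsm_bytes_read = len(dsm["lastLogin"])
print(nsm_bytes_read, dsm_bytes_read)
```

Both layouts hold the same total number of bytes; the column layout simply lets the scan skip the attributes the query never touches.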

50 / 74
Decomposition Storage Model (DSM)

The DBMS stores the values of a single attribute for all tuples
contiguously in a page.
Also known as a “column store”.

51 / 74
Decomposition Storage Model (DSM)

SELECT COUNT(U.lastLogin),
       EXTRACT(month FROM U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY EXTRACT(month FROM U.lastLogin)

53 / 74
Decomposition Storage Model (DSM)

If there is a match on one page, how can we figure out a match on
another page?

54 / 74
Decomposition Storage Model: Tuple Identification

To put the tuples back together when we are using a column store, we can
use:
Choice #1: Fixed-length Offsets (most commonly used approach)
Choice #2: Embedded Tuple Ids (less common approach)

When decomposing a relational database into a column store model, it’s

essential to have a mechanism to identify and reconstruct the original
tuples when needed.

55 / 74
Choice #1: Fixed-length Offsets

Each column of the table is stored separately, and for each column, a
fixed-length offset is used to locate the position of each tuple within that
column.
Assuming the attributes are all fixed-length, the DBMS can compute the
offset of the attribute for each tuple.
When the system wants the attribute for a specific tuple, it knows how to
jump to that spot in the file from the offset.
To accommodate the variable-length fields, the system can either pad
fields so that they are all the same length or use a dictionary that takes a
fixed-size integer and maps the integer to the value.
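
A small sketch of fixed-length offsets combined with dictionary encoding for a variable-length column. The column names, the sample values, and the 4-byte code width are assumptions for illustration.

```python
import struct

INT = struct.Struct("<i")

def column_value(column: bytes, tuple_index: int) -> int:
    """Jump straight to a tuple's value: offset = index * fixed width."""
    return INT.unpack_from(column, tuple_index * INT.size)[0]

# Fixed-length column: stored as-is.
ages = b"".join(INT.pack(a) for a in (31, 27, 45))

# Variable-length column: replace each string with a fixed-size dictionary code.
cities = ["Chicago", "Boston", "Chicago"]
dictionary = sorted(set(cities))  # code -> value
codes = {value: code for code, value in enumerate(dictionary)}
city_col = b"".join(INT.pack(codes[c]) for c in cities)

def reconstruct(tuple_index: int):
    """Stitch a tuple back together using the same index in every column."""
    return (column_value(ages, tuple_index),
            dictionary[column_value(city_col, tuple_index)])

print(reconstruct(1))
```

The tuple's position is the only identifier needed: entry 1 of every column belongs to the same tuple, so nothing extra is stored per value.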

56 / 74
Choice #2: Embedded Tuple Ids

A less common approach

Each column is still stored separately, but instead of using fixed-length
offsets, each column contains embedded tuple IDs or pointers that
indicate the position or identity of the tuple to which each value belongs.
These embedded IDs link the values across columns, allowing the DBMS
to reconstruct tuples by following the tuple IDs across columns.
Note that this method has a large storage overhead because it needs to
store a tuple id for every attribute entry.

57 / 74
Decomposition Storage Model (DSM)

Advantages
Reduces the amount of wasted I/O because the DBMS only reads the data
that it needs.
Enables better query processing and data compression
because all of the values for the same attribute are stored contiguously.
Disadvantages
Slow for point queries, inserts, updates, and deletes because of tuple
splitting/stitching.

58 / 74
Modification of Tuples

How to handle the following operations on the tuple level?

1 Insertion
2 Deletion
3 Update

59 / 74
1) Insertion

Easy case: tuples are fixed length and not in sequence (unordered)

Insert new tuple at end of file
or in a deleted slot
A little harder
If records are variable size, not as easy
may not be able to reuse space - fragmentation
A difficult case: tuples in sequence (ordered)
Find position and slide following tuples
If tuples are sequenced by linking, insert overflow blocks

60 / 74
2) Deletion

61 / 74
Options

(a) Delete and immediately reclaim space by shifting other tuples or

removing overflows
(b) Mark deleted and list as free for re-use
May need chain of deleted tuples (for re-use)
Need a way to mark

Trade-offs
How expensive is immediate reclaim?
How expensive is it to move a valid tuple to free space for immediate reclaim?
How much space is wasted?
e.g., deleted tuples, deleted fields, ...

62 / 74
Concern with deletions

A caveat when using physical addresses to reference a block/record

Example
Record Y can be referenced by other tuples (e.g., tuples X1 & X2)

63 / 74
Concern with deletions

A caveat when using physical addresses to reference a block/record

Example
When the tuple Y is deleted

the physical addresses will reference an incorrect tuple

64 / 74
Techniques to handle tuple deletion
Using logical addresses is easy
Before deleting tuple Y that is referenced by tuples X1 and X2

65 / 74
Techniques to handle tuple deletion
Using logical addresses is easy
After deleting tuple Y

Deleted tuple is identified by a NULL physical address in the Map table

66 / 74
Very important

The logical address used by tuple Y must remain in the map table
Furthermore:
The logical address used by tuple Y cannot be re-used

67 / 74
Techniques to handle tuple deletion

Deleting a tuple using physical address: use a tombstone record

Tombstone record: a (very small) special purpose tuple used to indicate a
deleted tuple
When a tuple is deleted, it is replaced by the tombstone record
This tombstone is permanent: it must exist until the entire database is
reconstructed
Note: If we are using a map table, then the tombstone can be a null
pointer in place of the physical address.
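
The map-table variant can be sketched as follows. The MapTable class, its method names, and the example addresses are hypothetical; the sketch only models the two rules stated earlier, that a deleted tuple's entry stays in the table and that its logical address is never reused.

```python
class MapTable:
    """Toy map table: logical address -> physical address (None marks a deleted tuple)."""

    def __init__(self):
        # Index = logical address; entries are never removed or reused.
        self.physical = []

    def insert(self, physical_addr: int) -> int:
        self.physical.append(physical_addr)
        return len(self.physical) - 1  # always a fresh logical address

    def delete(self, logical_addr: int) -> None:
        # Tombstone: keep the entry, null out the pointer.
        self.physical[logical_addr] = None

    def resolve(self, logical_addr: int):
        return self.physical[logical_addr]

table = MapTable()
y = table.insert(physical_addr=4096)   # tuple Y
x1_ref = x2_ref = y                    # tuples X1 and X2 both reference Y
table.delete(y)
z = table.insert(physical_addr=8192)   # a new tuple gets a new logical address
print(table.resolve(x1_ref), table.resolve(z), z != y)
```

After the delete, X1 and X2 resolve to None and can detect that Y is gone, instead of silently reading whatever tuple is inserted next.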

68 / 74
Tombstones

Example
Before deleting tuple Y

69 / 74
Tombstones

Example
After deleting tuple Y

70 / 74
Tombstones
When you insert a new tuple, you cannot use the space of a tombstone
tuple (tombstone tuple must be preserved)
Because: Existing references to the deleted tuple would then
refer to the newly inserted tuple.

71 / 74
3) Update

If new tuple is shorter than previous, easy

If it is longer, need to shift tuples, create overflow blocks
Note: We will never create a tombstone tuple in an update operation

72 / 74
Conclusion

The storage manager is not entirely independent from the rest of the
DBMS.
A DBMS encodes and decodes the tuple’s bytes into a set of attributes
based on its schema.
It is important to choose the right storage model for the target workload:
OLTP = Row Store
OLAP = Column Store

73 / 74
Database Storage: Next

Problem#1: How the DBMS represents the database in files on disk

i.e., how to lay out data on disk.
Problem#2: How the DBMS manages its memory and moves data
back-and-forth from disk.

74 / 74
