0% found this document useful (0 votes)
50 views28 pages

Lecture3 PDF

This document summarizes a lecture on representing data elements in a database. It discusses storing fixed and variable length tuples, dealing with pointers, and issues with updates. It covers storing records in blocks, using offset tables and structured addresses, and managing pointers when blocks are moved between main memory and secondary storage using techniques like pointer swizzling.

Uploaded by

john
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views28 pages

Lecture3 PDF

This document summarizes a lecture on representing data elements in a database. It discusses storing fixed and variable length tuples, dealing with pointers, and issues with updates. It covers storing records in blocks, using offset tables and structured addresses, and managing pointers when blocks are moved between main memory and secondary storage using techniques like pointer swizzling.

Uploaded by

john
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

Advanced Database Technology

Rasmus Pagh and S. Srinivasa Rao


IT University of Copenhagen
Spring 2006

Representing data elements

February 13, 2006

Based on Chapter 12 in GUW, [Pagh03] Sec. 1, and [CLRS01] pp. 405-409


This lecture: Representing data elements.

In this lecture we ask: How does one store relations in a blocked memory?

title year length filmType


Star Wars 1977 124 color
Mighty Ducks 1991 104 color
Wayne’s World 1992 95 color

Schema:Movie; Star Wars; Schema:Movie; Wayne’s


1977; 124; color; Mighty World; 1992; 95; color;
Ducks; 1991; 104; color ...

Problems with updates:


What if we want to add “Episode 4” to “Star Wars” but there is not
sufficient space in the block?

Representing data elements 1


Overview of this lecture

• Storing fixed sized tuples


• Dealing with pointers
• Variable length tuples
• Updates
• Queues, stacks, and linked lists (I/O model and amortized analysis)
• Index structures (separate slide set)

Representing data elements 2


Some terminology

• Attributes are stored as a sequence of bytes, called fields.


• Tuples are stored as a collection of fields, called records.
• Records are put together and are stored in blocks.
• A relation is a collection of records stored in blocks, called a file.

Representing data elements 3


Attributes stored in fields

The schema of a relation specifies the type of attributes. This decides how
much space is needed to store a relation. (It may be of variable size.)
• CHAR(7) a string of length 7 is stored in 7 bytes.
• BIT(2) is two bits, can be stored in two bits, but often a whole byte is
used.
• {RED, GREEN, BLUE, YELLOW} is an enumerated type that can be
stored as 00, 01, 10, 11, i.e. two bits is enough.

Representing data elements 4


Variable sized data

Some attributes may not have fixed sized. If the size varies a lot for
different tuples, then we do not want to allocate memory for all tuples to be
able to store the maximum sized attributes.
However, that is what VARCHAR(n) in SQL does. n + 1 is the number of
bytes allocated for the string, even if it may be shorter.

Two solutions:
• Length + content: n + 1 bytes allocated for a string of length n.
6 S t r i n g (assuming n < 256)
• Null-terminated string: n + 1 bytes allocated for a string of length n.
S t r i n g ⊥

Representing data elements 5


Records

A tuple is stored in a record. The size is the sum of sizes of the fields in the
record. A record often also stores a ‘header’.

A record header might store information such as


• the schema for the record (or a pointer to it)
• Size of the record
• Timestamps (last read, last updated)
The schema is used to access specific fields in the record.

Representing data elements 6


Schema information

Tells us how the fields (attributes) are stored within a record (tuple).
It contains
• the attributes of the relation
• their types
• the order in which attributes appear in a tuple
• constraints on the attributes and the relation

Representing data elements 7


Fixed-length records in blocks

Records are stored in blocks. Typically a block only contains one kind of
records.

The block may have a header with info.:


• Index information, often in form of pointers. (More on indexes later
today.)
• Type of tuples in the block.
• Offset table for the records in the block. Needed if records are of
variable length.
• Block ID.
• Timestamps.

Representing data elements 8


Problem session: Packing fields

Read the box on page 573 in the book and discuss:


• Do you agree with their conclusion?
• When is it a good/bad idea to pack fields?

Representing data elements 9


Block and record addresses

Addresses (pointers) to fields, records and blocks are often part of records
and we have to deal with them in a special way. E.g., pointers to schemas
and pointers used in index structures are stored in records.

Why addresses are different from other kind of data:


• Blocks are moved from secondary memory to main memory when they
are used.
• Records may move, both within a block and from one block to another.
• Records may be deleted.
• Attribute values may change size, i.e. data move within a record.

Representing data elements 10


Block addresses in main and secondary memory

Block address for blocks in main memory:


The block has an internal memory address when it is loaded into a buffer in
main memory.

Block address for blocks in secondary memory:


The physical address has to be used. The physical address describes the
physical location of the block.

Representing data elements 11


Physical and logical addresses

Physical address
Describes physical location, i.e. which disk, which cylinder, which track etc.
Typical size is 8-16 bytes.

Logical address
A fixed length arbitrary string for each record. A map table is used to map
logical addresses to physical addresses.
Useful when records are moved, since only the map table has to be updated,
and not the references to the record.

Representing data elements 12


Structured addresses

Structured address
A combination of physical and logical addresses. E.g., only the physical
address for the block. To find a record, an offset table in the block or
another kind of search in the block is needed.

Reasons why structured addresses are useful:


• A record can move within a block and still have the same structured
address.
• When a record is removed it can be replaced by a tombstone that
marks it deleted. The structured address can still be unchanged. When
the record is looked up we know it is deleted.

Representing data elements 13


Offset tables

How to organize an offset table:


• Grow the offset table from left to right and insert records from right to
left (since we do not know the size of the offset table when dealing with
variable length records and when using tombstones).
• If the entries of the offset table are large enough references to other
blocks can be stored. Useful when records are moved and we do not
want to update the address.
• The tombstone can be stored in the offset table and the space used by
the deleted record can be reused by another record.

Representing data elements 14


Pointer swizzling

How to manage pointers when blocks are moved between main memory
(memory addresses) and secondary memory (database addresses).
• When in secondary memory, database addresses are used.
• When in main memory, database or memory addresses may be used.
Using memory addresses is more efficient. Otherwise translation is needed.
A translation table is used to map database addresses to memory addresses.

Pointer swizzling
When a block is moved from secondary to main memory, pointers in the
block can be swizzled (translated) from database addresses to memory
addresses. A bit indicates the type of address.

Representing data elements 15


Swizzling strategies

Automatic Swizzling
When a block is moved into main memory, all pointers in the block are
swizzled if possible. All addresses to blocks currently in memory are stored
in the translation table.

Swizzling on Demand
Pointers are swizzled when they are followed. When a block is moved into
memory only the translation table is updated.

No Swizzling
Pointers are never swizzled. The translation table is used all the time.

Representing data elements 16


Problem session: Swizzling

• Discuss the pros and cons of the three swizzling strategies:


– Automatic Swizzling,
– Swizzling on Demand, and
– No Swizzling.
When is it a good/bad idea to use them?
• What are the problems when a block is written back to disk? And how
can they be solved?

Representing data elements 17


Variable-length data and records

Reasons why records not always have the same size:


• Fields of variable length. Attribute content vary in size.
• Repeating fields. An attribute that appears several times, but how
many times is not specified by the schema.
• Records of variable format. When different tuples in a relation have
different sets of attributes. E.g., if many attributes have no content.
• Enormous fields. Data like movies and pictures in the relation. The
record may not fit into one block.

Representing data elements 18


Fields of variable length

When a field has variable size we still have to be able to find all fields in the
record. Since the offset cannot be read from the relation schema some extra
information is stored in the record header.

Example of how it can be solved:


• Store fixed length fields first in the record.
• Store the total size of the record.
• Store offsets for variable sized fields (except the first).

Representing data elements 19


Repeating fields
When a record contains a variable number of a field.
Store information in the record header to locate all occurrences of the field
in the record.
A method to deal with fields of variable size and variable number of
occurrences:
• Keep the record fixed size.
• Store variable length data in a separate block and use a pointer to it.
• Fixed sized records can be searched more efficiently. Less information is
needed in the header. Moving records is easier.
• The number of I/O’s increase, since a pointer has to be followed.
Mixed strategies may be a good solution.

Representing data elements 20


Spanned records

A record is called spanned record if it is split between two or more blocks.

Reasons for spanned records:


• Space utilization.
• Records larger than a block.
For each fragment of a record, extra information on where to find next and
previous fragment is needed.

Representing data elements 21


BLOBS
Binary, Large OBjectS = BLOBS
BLOBS can be images, movies, audio files and other very large values that
can be stored in fields.
Storing BLOBS
• Stored in several blocks.
• Preferable to store them consecutively on a cylinder for efficient
retrieval.
Retrieving BLOBS
• A client retrieving a movie may not want it all at the same time.
• Retrieving a specific part of the large data requires an index structure
to make it efficient.

Representing data elements 22


Problem session: Updates

We will look at three types of updates:


• Insertions of new tuples
• Deletions of tuples
• Tuple updates
What problems may arise when updates are performed on the database?
Think of the different situations where we have:
• fixed length vs. variable length tuples
• no order vs. sorted tuples

Representing data elements 23


Updates
Insert
No order: No problem, just find a block with enough space or use a new
block.
Fixed order: May be a problem if there is not enough room in the correct
block. Solutions:
1. Find space in nearby block and rearrange
2. Create an overflow block
Delete
Pack data in the block to prepare for new inserts. Remove overflow blocks
if possible. Leave a tombstone if there may be pointers to the record.
Update
Fixed length: No problem.
Variable length: Same as for insert and delete. (But no tombstones.)

Representing data elements 24


Stacks and Queues

A stack maintains a collection of items in which only the most recently


added item may be removed.
A queue maintains a collection of items in which only the earliest added
item may be accessed/removed.
How can we maintain a stack or queue in external memory?
– use buffering

“macroscopic view” in external memory is same as “microscopic view” in


internal memory.

Representing data elements 25


Problem session on linked lists

Representing data elements 26


Summary

• Storing fixed sized tuples


• Variable length tuples
– offset tables
– overflow blocks
• Dealing with pointers
– logical and physical addresses
– database and memory addresses
– pointer swizzling
• Updates
• stacks, queues and linked lists in external memory

Representing data elements 27

You might also like