Advanced Database Technology

Rasmus Pagh and S. Srinivasa Rao

IT University of Copenhagen
Spring 2006

Representing data elements

February 13, 2006

Based on Chapter 12 in GUW, [Pagh03] Sec. 1, and [CLRS01] pp. 405-409

This lecture: Representing data elements.

In this lecture we ask: How does one store relations in a blocked memory?

title          year  length  filmType
Star Wars      1977  124     color
Mighty Ducks   1991  104     color
Wayne’s World  1992  95      color

Block 1: Schema:Movie; Star Wars; 1977; 124; color; Mighty Ducks; 1991; 104; color
Block 2: Schema:Movie; Wayne’s World; 1992; 95; color; ...

Problems with updates:

What if we want to add “Episode 4” to “Star Wars” but there is not
sufficient space in the block?

Overview of this lecture

• Storing fixed-size tuples
• Dealing with pointers
• Variable-length tuples
• Updates
• Queues, stacks, and linked lists (I/O model and amortized analysis)
• Index structures (separate slide set)

Some terminology

• Attributes are stored as a sequence of bytes, called fields.

• Tuples are stored as a collection of fields, called records.
• Records are put together and are stored in blocks.
• A relation is a collection of records stored in blocks, called a file.

Attributes stored in fields

The schema of a relation specifies the types of its attributes. This determines how
much space is needed to store each attribute value. (The space may be of variable size.)
• CHAR(7): a string of length 7, stored in 7 bytes.
• BIT(2): two bits; can be stored in two bits, but often a whole byte is
used.
• {RED, GREEN, BLUE, YELLOW}: an enumerated type whose values can be
stored as 00, 01, 10, 11, i.e. two bits are enough.
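
The field encodings above can be sketched in a few lines of Python. This is only an illustration: the helper names (`encode_char7`, `pack_colors`, `unpack_colors`) are hypothetical, and padding short CHAR(7) values with spaces is an assumption, not something the slides specify.

```python
COLORS = ["RED", "GREEN", "BLUE", "YELLOW"]  # 4 values, so 2 bits per value suffice


def encode_char7(s: str) -> bytes:
    """CHAR(7): always exactly 7 bytes (assumption: pad short values with spaces)."""
    raw = s.encode("ascii")
    if len(raw) > 7:
        raise ValueError("value too long for CHAR(7)")
    return raw.ljust(7, b" ")


def pack_colors(names) -> bytes:
    """Pack up to four enumerated values into one byte, 2 bits each."""
    if len(names) > 4:
        raise ValueError("at most four 2-bit values fit in one byte")
    byte = 0
    for i, name in enumerate(names):
        byte |= COLORS.index(name) << (2 * i)
    return bytes([byte])


def unpack_colors(packed: bytes, count: int):
    """Read back `count` 2-bit values from the packed byte."""
    return [COLORS[(packed[0] >> (2 * i)) & 0b11] for i in range(count)]
```

Packing four enumerated values into one byte saves space, but every access now needs shifting and masking, which is the trade-off behind using a whole byte per small field.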

Variable sized data

Some attributes may not have a fixed size. If the size varies a lot between
tuples, then we do not want to allocate enough memory in every tuple to
store the maximum-sized attribute value.
However, a simple implementation of VARCHAR(n) in SQL does exactly that: n + 1
bytes are allocated for the string, even if the actual value is shorter.

Two solutions:
• Length + content: n + 1 bytes allocated for a string of length n.
  Example: 6 S t r i n g (assuming n < 256, so the length fits in one byte)
• Null-terminated string: n + 1 bytes allocated for a string of length n.
  Example: S t r i n g ⊥
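
A minimal sketch of the two encodings, assuming ASCII strings and a one-byte length field (function names are hypothetical):

```python
def encode_length_prefixed(s: str) -> bytes:
    """Length + content: one length byte followed by the characters."""
    raw = s.encode("ascii")
    if len(raw) > 255:
        raise ValueError("length does not fit in one byte")
    return bytes([len(raw)]) + raw


def decode_length_prefixed(buf: bytes) -> str:
    n = buf[0]                      # first byte holds the string length
    return buf[1:1 + n].decode("ascii")


def encode_null_terminated(s: str) -> bytes:
    """Null-terminated: the characters followed by a zero byte."""
    raw = s.encode("ascii")
    if b"\x00" in raw:
        raise ValueError("string must not contain the terminator")
    return raw + b"\x00"


def decode_null_terminated(buf: bytes) -> str:
    end = buf.index(b"\x00")        # scan for the terminator
    return buf[:end].decode("ascii")
```

Both encodings use n + 1 bytes for a string of length n. The length prefix gives the size without scanning, while the terminator has no 255-character limit but cannot represent strings containing the terminator byte.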

Records

A tuple is stored in a record. The size of the record is the sum of the sizes of
its fields, plus the size of a ‘header’, which a record often also stores.

A record header might store information such as

• the schema for the record (or a pointer to it)
• the size of the record
• timestamps (last read, last updated)
The schema is used to access specific fields in the record.

Schema information

Tells us how the fields (attributes) are stored within a record (tuple).
It contains
• the attributes of the relation
• their types
• the order in which attributes appear in a tuple
• constraints on the attributes and the relation

Fixed-length records in blocks

Records are stored in blocks. Typically a block contains only one kind of
record.

The block may have a header with information such as:

• Index information, often in form of pointers. (More on indexes later
today.)
• Type of tuples in the block.
• Offset table for the records in the block. Needed if records are of
variable length.
• Block ID.
• Timestamps.

Problem session: Packing fields

Read the box on page 573 in the book and discuss:

• Do you agree with their conclusion?
• When is it a good/bad idea to pack fields?

Block and record addresses

Addresses (pointers) to fields, records and blocks are often part of records
and we have to deal with them in a special way. E.g., pointers to schemas
and pointers used in index structures are stored in records.

Why addresses are different from other kinds of data:

• Blocks are moved from secondary memory to main memory when they
are used.
• Records may move, both within a block and from one block to another.
• Records may be deleted.
• Attribute values may change size, i.e. data move within a record.

Block addresses in main and secondary memory

Block address for blocks in main memory:

The block has an internal memory address when it is loaded into a buffer in
main memory.

Block address for blocks in secondary memory:

The physical address has to be used. The physical address describes the
physical location of the block.

Physical and logical addresses

Physical address
Describes physical location, i.e. which disk, which cylinder, which track etc.
Typical size is 8-16 bytes.

Logical address
An arbitrary fixed-length string that identifies each record. A map table is used
to map logical addresses to physical addresses.
Useful when records are moved, since only the map table has to be updated,
and not the references to the record.
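
A small sketch of the map-table idea. The `PhysicalAddress` fields and the `MapTable` class are hypothetical names chosen for illustration; a real system would keep the table on disk and choose its own address format.

```python
from typing import Dict, NamedTuple


class PhysicalAddress(NamedTuple):
    """Assumed physical location: which disk, cylinder, track and block."""
    disk: int
    cylinder: int
    track: int
    block: int


class MapTable:
    """Maps logical addresses (arbitrary fixed-length strings) to physical ones."""

    def __init__(self) -> None:
        self._table: Dict[str, PhysicalAddress] = {}

    def register(self, logical: str, physical: PhysicalAddress) -> None:
        self._table[logical] = physical

    def resolve(self, logical: str) -> PhysicalAddress:
        return self._table[logical]

    def move(self, logical: str, new_physical: PhysicalAddress) -> None:
        # Only this one entry changes; references holding `logical` stay valid.
        self._table[logical] = new_physical
```

The price of this indirection is an extra lookup on every access through a logical address.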

Structured addresses

Structured address
A combination of physical and logical addresses. E.g., the physical address
is given only for the block, and the record is identified logically within it.
To find a record, an offset table in the block or another kind of search in the
block is needed.

Reasons why structured addresses are useful:

• A record can move within a block and still have the same structured
address.
• When a record is removed it can be replaced by a tombstone that
marks it deleted. The structured address can still be unchanged. When
the record is looked up we know it is deleted.

Offset tables

How to organize an offset table:

• Grow the offset table from left to right and insert records from right to
left (since we do not know the size of the offset table when dealing with
variable length records and when using tombstones).
• If the entries of the offset table are large enough, references to other
blocks can be stored. Useful when records are moved and we do not
want to update the address.
• The tombstone can be stored in the offset table and the space used by
the deleted record can be reused by another record.
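
The following sketch illustrates a block with an offset table, under simplifying assumptions: the offset table is kept as a Python list instead of being serialized into the left end of the block, each entry is assumed to cost `SLOT_BYTES` bytes, and a tombstone is represented by `None`. All names are hypothetical.

```python
SLOT_BYTES = 4  # assumed on-disk size of one offset-table entry


class Block:
    def __init__(self, size: int = 64) -> None:
        self.data = bytearray(size)
        self.slots = []        # offset table: (offset, length), or None as tombstone
        self.free_end = size   # records are inserted from right to left

    def free_space(self) -> int:
        # Space between the end of the offset table and the leftmost record.
        return self.free_end - len(self.slots) * SLOT_BYTES

    def insert(self, record: bytes) -> int:
        if len(record) + SLOT_BYTES > self.free_space():
            raise ValueError("not enough space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # slot number, stable part of the address

    def read(self, slot: int):
        entry = self.slots[slot]
        if entry is None:            # tombstone: the record was deleted
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None      # leave a tombstone in the offset table
        self._compact()

    def _compact(self) -> None:
        """Slide live records to the right; slot numbers do not change."""
        live = [(i, self.read(i)) for i, e in enumerate(self.slots) if e is not None]
        self.free_end = len(self.data)
        for i, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[i] = (self.free_end, len(record))
```

Because callers hold only the slot number, records can be moved inside the block during compaction without invalidating any address.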

Pointer swizzling

How to manage pointers when blocks are moved between main memory
(memory addresses) and secondary memory (database addresses).
• When in secondary memory, database addresses are used.
• When in main memory, database or memory addresses may be used.
Using memory addresses is more efficient. Otherwise translation is needed.
A translation table is used to map database addresses to memory addresses.

Pointer swizzling
When a block is moved from secondary to main memory, pointers in the
block can be swizzled (translated) from database addresses to memory
addresses. A bit indicates the type of address.

Swizzling strategies

Automatic Swizzling
When a block is moved into main memory, all pointers in the block are
swizzled if possible. All addresses to blocks currently in memory are stored
in the translation table.

Swizzling on Demand
Pointers are swizzled when they are followed. When a block is moved into
memory only the translation table is updated.

No Swizzling
Pointers are never swizzled. The translation table is used all the time.
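
The three strategies can be compared in a toy model. This is a sketch under strong assumptions: the "disk" is a dictionary from database addresses to the addresses a block points to, Python object references stand in for memory addresses, and the names `Pointer`, `Block`, `Buffer` and `table_lookups` are hypothetical.

```python
class Pointer:
    def __init__(self, db_addr):
        self.swizzled = False   # the bit indicating the type of address
        self.target = db_addr   # database address, or in-memory block once swizzled


class Block:
    def __init__(self, db_addr, pointer_addrs):
        self.db_addr = db_addr
        self.pointers = [Pointer(a) for a in pointer_addrs]


class Buffer:
    def __init__(self, disk):
        self.disk = disk            # db address -> list of db addresses pointed to
        self.table = {}             # translation table: db address -> Block in memory
        self.table_lookups = 0

    def load(self, db_addr, automatic=False):
        block = Block(db_addr, self.disk[db_addr])
        self.table[db_addr] = block
        if automatic:
            # Automatic swizzling: swizzle every pointer whose target is in memory.
            for p in block.pointers:
                if p.target in self.table:
                    p.target, p.swizzled = self.table[p.target], True
        return block

    def follow(self, ptr, swizzle=True):
        if ptr.swizzled:
            return ptr.target       # direct memory reference, no translation
        self.table_lookups += 1
        block = self.table.get(ptr.target)
        if block is None:
            block = self.load(ptr.target)
        if swizzle:                 # swizzling on demand
            ptr.target, ptr.swizzled = block, True
        return block                # with swizzle=False this is "no swizzling"
```

The sketch leaves out the step of translating swizzled pointers back to database addresses before a block is written to disk.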

Problem session: Swizzling

• Discuss the pros and cons of the three swizzling strategies:

– Automatic Swizzling,
– Swizzling on Demand, and
– No Swizzling.
When is it a good/bad idea to use them?
• What are the problems when a block is written back to disk? And how
can they be solved?

Variable-length data and records

Reasons why records do not always have the same size:

• Fields of variable length. Attribute content varies in size.
• Repeating fields. An attribute that appears several times, but how
many times is not specified by the schema.
• Records of variable format. When different tuples in a relation have
different sets of attributes. E.g., if many attributes have no content.
• Enormous fields. Data like movies and pictures in the relation. The
record may not fit into one block.

Fields of variable length

When a field has variable size we still have to be able to find all fields in the
record. Since the offsets cannot be read from the relation schema, some extra
information is stored in the record header.

Example of how it can be solved:

• Store fixed length fields first in the record.
• Store the total size of the record.
• Store offsets for variable sized fields (except the first).
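
As a sketch of this layout, consider the Movie relation with `year` and `length` as fixed-length fields and `title` and `filmType` as variable-length fields. The byte layout below (2-byte little-endian integers, UTF-8 strings) and the function names are assumptions made for illustration.

```python
import struct

HEADER = struct.Struct("<HH")  # total record size, offset of the 2nd variable field
FIXED = struct.Struct("<HH")   # year, length (the fixed-length fields come first)


def encode_movie(title: str, year: int, length: int, film_type: str) -> bytes:
    t = title.encode("utf-8")
    f = film_type.encode("utf-8")
    title_off = HEADER.size + FIXED.size   # first variable field: offset is implicit
    type_off = title_off + len(t)          # stored in the header
    total = type_off + len(f)              # stored in the header
    return HEADER.pack(total, type_off) + FIXED.pack(year, length) + t + f


def decode_movie(rec: bytes):
    total, type_off = HEADER.unpack_from(rec, 0)
    year, length = FIXED.unpack_from(rec, HEADER.size)
    title_off = HEADER.size + FIXED.size
    title = rec[title_off:type_off].decode("utf-8")
    film_type = rec[type_off:total].decode("utf-8")
    return title, year, length, film_type
```

No offset is stored for the first variable-length field because it always starts right after the fixed-length part, and the total size marks where the last field ends.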

Repeating fields
A record may contain a variable number of occurrences of a field.
Store information in the record header to locate all occurrences of the field
in the record.
A method to deal with fields of variable size and variable number of
occurrences:
• Keep the record fixed size.
• Store variable length data in a separate block and use a pointer to it.
Advantages: Fixed sized records can be searched more efficiently. Less information is
needed in the header. Moving records is easier.
Disadvantage: The number of I/Os increases, since a pointer has to be followed.
Mixed strategies may be a good solution.

Spanned records

A record is called a spanned record if it is split between two or more blocks.

Reasons for spanned records:

• Space utilization.
• Records larger than a block.
For each fragment of a record, extra information on where to find the next and
previous fragments is needed.
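
A minimal sketch of fragment bookkeeping, assuming each fragment gets a block of its own and blocks are numbered consecutively. The `Fragment` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Fragment:
    prev: Optional[int]   # block number of the previous fragment, if any
    next: Optional[int]   # block number of the next fragment, if any
    payload: bytes


def span(record: bytes, payload_per_block: int, first_block: int = 0) -> Dict[int, Fragment]:
    """Split a record into fragments of at most payload_per_block bytes."""
    if not record or payload_per_block <= 0:
        raise ValueError("need a non-empty record and a positive payload size")
    chunks = [record[i:i + payload_per_block]
              for i in range(0, len(record), payload_per_block)]
    blocks = {}
    for k, chunk in enumerate(chunks):
        blocks[first_block + k] = Fragment(
            prev=first_block + k - 1 if k > 0 else None,
            next=first_block + k + 1 if k < len(chunks) - 1 else None,
            payload=chunk,
        )
    return blocks


def reassemble(blocks: Dict[int, Fragment], first_block: int) -> bytes:
    """Follow the next-pointers from the first fragment to rebuild the record."""
    out = bytearray()
    current: Optional[int] = first_block
    while current is not None:
        out += blocks[current].payload
        current = blocks[current].next
    return bytes(out)
```

Reading a spanned record costs one I/O per fragment, so the space saved has to be weighed against the extra block accesses.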

BLOBS
Binary, Large OBjectS = BLOBS
BLOBS can be images, movies, audio files and other very large values that
can be stored in fields.
Storing BLOBS
• Stored in several blocks.
• Preferable to store them consecutively on a cylinder for efficient
retrieval.
Retrieving BLOBS
• A client retrieving a movie may not want it all at the same time.
• Retrieving a specific part of the large data requires an index structure
to make it efficient.

Problem session: Updates

We will look at three types of updates:

• Insertions of new tuples
• Deletions of tuples
• Tuple updates
What problems may arise when updates are performed on the database?
Think of the different situations where we have:
• fixed length vs. variable length tuples
• no order vs. sorted tuples

Updates
Insert
No order: No problem, just find a block with enough space or use a new
block.
Fixed order: May be a problem if there is not enough room in the correct
block. Solutions:
1. Find space in nearby block and rearrange
2. Create an overflow block
Delete
Pack data in the block to prepare for new inserts. Remove overflow blocks
if possible. Leave a tombstone if there may be pointers to the record.
Update
Fixed length: No problem.
Variable length: Same as for insert and delete. (But no tombstones.)

Stacks and Queues

A stack maintains a collection of items in which only the most recently
added item may be removed.
A queue maintains a collection of items in which only the earliest added
item may be accessed/removed.
How can we maintain a stack or queue in external memory?
– use buffering

The “macroscopic view” in external memory is the same as the “microscopic view” in
internal memory.
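
One common way to realize the buffering idea for a stack is sketched below. It assumes a block holds B items and keeps a main-memory buffer of up to 2B items. The class name and the `ios` counter are hypothetical, and the "disk" is simply a Python list of blocks.

```python
class ExternalStack:
    def __init__(self, B: int) -> None:
        self.B = B
        self.buffer = []   # main memory: the topmost items, at most 2B of them
        self.disk = []     # external memory: blocks of exactly B items each
        self.ios = 0       # number of block transfers performed

    def push(self, x) -> None:
        if len(self.buffer) == 2 * self.B:
            # Buffer full: write the B oldest buffered items as one block.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.ios += 1
        self.buffer.append(x)

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Buffer empty: read the most recently written block back.
            self.buffer = self.disk.pop()
            self.ios += 1
        return self.buffer.pop()
```

After every block transfer the buffer holds roughly B items, so at least B further operations are needed before the next transfer. This gives about 1/B I/Os per operation in the amortized sense. A buffer of only B items would not give this bound, since alternating pushes and pops at a block boundary could then trigger an I/O on every operation.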

Problem session on linked lists

Summary

• Storing fixed-size tuples
• Variable-length tuples
– offset tables
– overflow blocks
• Dealing with pointers
– logical and physical addresses
– database and memory addresses
– pointer swizzling
• Updates
• Stacks, queues and linked lists in external memory
