
CS525-04/05: Advanced Database Organization

Notes 1: Introduction to DBMS Implementation

Yousef M. Elmehdwi
Department of Computer Science

Illinois Institute of Technology

[email protected]

August 23rd 2023

Slides: adapted from a course taught by Hector Garcia-Molina, Stanford

Core Terminology Review
Data
Data refers to any piece of information that holds value and is worth
keeping.
It’s often stored in electronic form and can range from numbers and text
to images and videos.
Database
An organized collection of interrelated data that models some aspect of the
real world.
Query
An operation that retrieves specific data from a database based on certain
criteria or conditions.
Queries allow users to extract relevant information.
Relation
The organization of data into a two-dimensional table, where rows
(tuples) represent basic entities or facts of some sort, and columns
(attributes) represent properties of those entities.
Schema
A description of the structure of the data in a database, often called
“metadata”.
It’s like a blueprint that outlines how the data is organized, what types of
data are stored, and how they are related.
Database Management System (DBMS)

A DBMS is software that allows applications to store and analyze
information in a database.
A general-purpose DBMS is designed to allow the definition, creation,
querying, update, and administration of databases.

Advanced Database Organization?

= Database Implementation
= How to implement a database system
and have fun doing it ;-)

What do you want from a DBMS?

Keep data around (persistent)
Answer questions (queries) about data
Update data

Isn’t Implementing a Database System Simple?

Relation ⇒ Statements ⇒ Results

Introducing the Megatron 3000
Database Management System

“Imaginary” database System


The latest from Megatron Labs
Incorporates latest relational technology
UNIX compatible
Lightweight & cheap!

Megatron 3000 Implementation Details

Megatron 3000 uses the file system to store its relations
Relations stored in files (ASCII)
Use a separate file per entity/relation.
The application has to parse the files each time it wants to read/update
records.
e.g., relation Students(name, id, dept) is in /usr/db/Students
The file Students has one line for each tuple.
Values of components of a tuple are stored as a character string, separated
by special marker character #

Smith # 123 # CS
Jonson # 522 # EE
...
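
To illustrate the parsing burden described above, here is a minimal Python sketch that splits one line of such a file into its components. The function name `parse_tuple` is hypothetical and not part of the notes, and the sketch assumes that `#` never occurs inside a value, since the format shown has no escaping mechanism.

```python
def parse_tuple(line: str) -> list[str]:
    """Split one line of a relation file into its component strings.

    Assumption: '#' never appears inside a value, because the format
    has no way to escape the separator.
    """
    return [field.strip() for field in line.rstrip("\n").split("#")]


# The two sample lines from the slide, as they would be read from the file.
rows = [parse_tuple(l) for l in ["Smith # 123 # CS\n", "Jonson # 522 # EE\n"]]
print(rows)
```

Every value comes back as a string; turning `123` into an integer needs the schema, which is stored separately.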

Megatron 3000 Implementation Details

The database schema is stored in a special file
Schema file (ASCII) in /usr/db/schema
For each relation, the file schema has a line beginning with that relation
name, in which attribute names alternate with types.
The character # separates elements of these lines.

Students # name # STR # id # INT # dept ...
Depts # C # STR # A # INT ...
...
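
A schema line in this format could be decoded roughly as follows. This is a sketch only: `parse_schema` is a hypothetical helper, and because the slide elides the tail of each line, the example input either fills it with an assumed value (`dept # STR`) or drops it.

```python
def parse_schema(lines):
    """Map each relation name to a list of (attribute, type) pairs."""
    schema = {}
    for line in lines:
        parts = [p.strip() for p in line.split("#")]
        name, rest = parts[0], parts[1:]
        if len(rest) % 2 != 0:
            raise ValueError(f"malformed schema line for {name!r}")
        # Attribute names sit at even positions, their types at odd ones.
        schema[name] = list(zip(rest[0::2], rest[1::2]))
    return schema


# Assumed complete lines: the type of dept is a guess, and the elided
# tail of the Depts line is simply left out.
example = [
    "Students # name # STR # id # INT # dept # STR",
    "Depts # C # STR # A # INT",
]
schema = parse_schema(example)
print(schema["Students"])
```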

Megatron 3000 Sample Sessions

We are now talking to the Megatron 3000 user interface, to which we
can type SQL queries in response to the Megatron prompt (&).

Megatron 3000 Sample Sessions

A # ends a query

Megatron 3000 Sample Sessions

Execute a query and send the result to printer

Result sent to LPR (printer).

Megatron 3000 Sample Sessions

Execute a query and store the result in a new file

New relation LowId created.

How Megatron 3000 Executes Queries

To execute

SELECT * FROM R WHERE <condition>

1 Read schema to get attributes of R
2 Check validity of condition
3 Display attributes of R as the header
4 Read file R; for each line:
  a Check condition
  b If TRUE, display the line as tuple
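
The four steps above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the Megatron code: the WHERE clause is reduced to a single hypothetical `(attribute, op, constant)` triple, the relation is passed in as a list of lines rather than read from `/usr/db`, and `select_all` is an invented name.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def select_all(schema_attrs, lines, condition):
    """Yield the header, then every matching tuple, as '#'-joined strings.

    schema_attrs: list of (attribute, type) pairs for relation R
    lines:        the lines of R's file
    condition:    (attribute, op, constant), a simplified WHERE clause
    """
    attr, op, const = condition
    names = [a for a, _ in schema_attrs]
    # Steps 1-2: use the schema to check that the condition is valid.
    if attr not in names or op not in OPS:
        raise ValueError(f"invalid condition: {condition!r}")
    pos = names.index(attr)
    cast = int if schema_attrs[pos][1] == "INT" else str
    # Step 3: the attribute names form the header.
    yield " # ".join(names)
    # Step 4: scan every line and keep those satisfying the condition.
    for line in lines:
        fields = [f.strip() for f in line.split("#")]
        if OPS[op](cast(fields[pos]), const):
            yield " # ".join(fields)


students = [("name", "STR"), ("id", "INT"), ("dept", "STR")]
data = ["Smith # 123 # CS", "Jonson # 522 # EE"]
result = list(select_all(students, data, ("id", "<", 500)))
print(result)
```

Note that the loop in step 4 touches every line of the file regardless of how selective the condition is.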

Megatron 3000 Query Execution

To execute

SELECT * FROM R WHERE <condition> | T

1 Process select as before but omit Step 3 (displaying the header)
2 Write results to new file T
3 Append a new line for T to the dictionary /usr/db/schema
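
A rough Python sketch of these steps follows. It is an assumption-laden illustration: a temporary directory stands in for `/usr/db`, the condition is a plain Python predicate, `select_into` is a hypothetical name, and the schema line for `Students` uses an assumed type for `dept`.

```python
import os
import tempfile


def select_into(db_dir, rel, new_rel, predicate):
    """Copy the tuples of `rel` that satisfy `predicate` into `new_rel`."""
    schema_path = os.path.join(db_dir, "schema")
    # Find the schema line for `rel` so its attributes can be reused.
    with open(schema_path) as f:
        entry = next(l for l in f if l.split("#")[0].strip() == rel)
    attrs = entry.split("#", 1)[1].strip()
    # Steps 1-2: filter without printing a header, writing to the new file.
    with open(os.path.join(db_dir, rel)) as src, \
            open(os.path.join(db_dir, new_rel), "w") as dst:
        for line in src:
            fields = [x.strip() for x in line.split("#")]
            if predicate(fields):
                dst.write(line)
    # Step 3: register the new relation in the schema file.
    with open(schema_path, "a") as f:
        f.write(f"{new_rel} # {attrs}\n")


db = tempfile.mkdtemp()  # stand-in for /usr/db
with open(os.path.join(db, "schema"), "w") as f:
    f.write("Students # name # STR # id # INT # dept # STR\n")
with open(os.path.join(db, "Students"), "w") as f:
    f.write("Smith # 123 # CS\nJonson # 522 # EE\n")
select_into(db, "Students", "LowId", lambda t: int(t[1]) < 500)
```

If the process stops between writing the new file and appending the schema line, the directory is left with a file that the schema does not describe.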

Megatron 3000 Query Execution

Consider a more complicated query, one involving a join of two relations
R, S
To execute

SELECT A, B FROM R, S WHERE <condition>

1 Read schema to get R,S attributes
2 Read R file, for each line r:
  a Read S file, for each line s:
    1 Create join tuple r & s
    2 Check condition
    3 If TRUE, display r,s[A,B]
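
The procedure above is a nested-loop join. A minimal Python sketch, assuming tuples have already been parsed into dictionaries and using hypothetical attribute names and data:

```python
def nested_loop_join(r_rows, s_rows, condition, project):
    """Brute-force join: test every pair (r, s), output projected matches.

    r_rows, s_rows: lists of dicts, one per tuple
    condition:      function (r, s) -> bool
    project:        function (r, s) -> output tuple
    """
    out = []
    for r in r_rows:            # outer loop: one pass over R
        for s in s_rows:        # inner loop: S is re-read for every r
            if condition(r, s):
                out.append(project(r, s))
    return out


R = [{"A": 1, "C": "x"}, {"A": 2, "C": "y"}]
S = [{"A": 1, "B": 10}, {"A": 1, "B": 20}, {"A": 3, "B": 30}]
pairs = nested_loop_join(R, S, lambda r, s: r["A"] == s["A"],
                         lambda r, s: (r["A"], s["B"]))
print(pairs)
```

The condition is evaluated once for every pair, so the work grows with the product of the two relation sizes, and in Megatron 3000 each pass of the inner loop is a fresh read of the S file.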

What’s wrong with Megatron 3000 DBMS?

A real DBMS is not implemented like our imaginary Megatron 3000
The described implementation is inadequate for applications involving
a significant amount of data or multiple users of the data
A partial list of problems follows

What’s wrong with Megatron 3000 DBMS?

Tuple layout on disk is inadequate, with no flexibility when the database
is modified
e.g., to change the string CS to CSDept in one Students tuple, we have to
rewrite the entire file
ASCII storage is expensive
Deletions are expensive

What’s wrong with Megatron 3000 DBMS?

Search expensive; no indexes
e.g., cannot find tuple with given key quickly
Always have to read full relation
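
For contrast, here is a sketch of what even a very simple index buys. It assumes unique keys and uses an in-memory dictionary from key to byte offset; the notes do not prescribe any index structure, and `build_index` and `lookup` are hypothetical names. An `io.BytesIO` object stands in for the Students file.

```python
import io


def build_index(f, key_pos):
    """Scan the file once, recording the byte offset of each tuple's key."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b"#")[key_pos].strip()
        index[key] = offset
    return index


def lookup(f, index, key):
    """Jump straight to the tuple with the given key, or return None."""
    offset = index.get(key)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().decode().strip()


# BytesIO stands in for the on-disk Students file.
students = io.BytesIO(b"Smith # 123 # CS\nJonson # 522 # EE\n")
idx = build_index(students, key_pos=1)
found = lookup(students, idx, b"522")
print(found)
```

Building the index still costs one full scan, but each later lookup is a single seek and read instead of a pass over the whole relation.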

What’s wrong with Megatron 3000 DBMS?

Brute force query processing
e.g.,

SELECT * FROM R, S WHERE R.A = S.A AND S.B > 1000

Much better to use an index to select the tuples that satisfy the condition
(do the selection S.B > 1000 first)
More efficient join (sort both relations on A and merge)
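
Both ideas can be sketched together in Python. The sample data and the name `sort_merge_join` are hypothetical, and the selection here is a plain scan standing in for the index lookup mentioned above; the point is only the order of operations (select first, then a sort-based join).

```python
def sort_merge_join(r_rows, s_rows, key):
    """Join two lists of dicts on `key` by sorting both and merging."""
    r_sorted = sorted(r_rows, key=lambda t: t[key])
    s_sorted = sorted(s_rows, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_sorted[i][key], s_sorted[j][key]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the run of S tuples sharing this key value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key] == rk:
                j_end += 1
            # Pair every R tuple with this key against that run.
            while i < len(r_sorted) and r_sorted[i][key] == rk:
                for s in s_sorted[j:j_end]:
                    out.append((r_sorted[i], s))
                i += 1
            j = j_end
    return out


R = [{"A": 2, "C": "y"}, {"A": 1, "C": "x"}, {"A": 1, "C": "z"}]
S = [{"A": 1, "B": 2000}, {"A": 2, "B": 500}, {"A": 1, "B": 1500}]
# Apply the selection S.B > 1000 first, so fewer tuples reach the join.
s_filtered = [s for s in S if s["B"] > 1000]
joined = sort_merge_join(R, s_filtered, "A")
print([(r["C"], s["B"]) for r, s in joined])
```

After sorting, each relation is traversed once during the merge, rather than S being rescanned for every tuple of R.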

What’s wrong with Megatron 3000 DBMS?

No buffer manager
There is no way for useful data to be buffered in main memory; all data
comes off the disk, all the time
e.g., need caching.
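
As a hedged illustration of what caching would add, the sketch below keeps a fixed number of blocks in memory and evicts the least recently used one. The class name `BufferPool`, the LRU policy, and the dictionary standing in for the disk are all assumptions made for the example; the notes do not specify a replacement policy.

```python
from collections import OrderedDict


class BufferPool:
    """Keep up to `capacity` blocks in memory, evicting the least recently used."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block   # function: block_id -> bytes (the "disk")
        self.capacity = capacity
        self.frames = OrderedDict()    # block_id -> bytes, oldest first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        self.disk_reads += 1
        data = self.read_block(block_id)
        self.frames[block_id] = data
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)     # evict least recently used
        return data


disk = {0: b"block0", 1: b"block1", 2: b"block2"}  # hypothetical disk
pool = BufferPool(disk.__getitem__, capacity=2)
for b in [0, 1, 0, 0, 2, 1]:
    pool.get(b)
print(pool.disk_reads)
```

Six requests cause four disk reads here; without any buffering, all six would go to disk.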

What’s wrong with Megatron 3000 DBMS?

No concurrency control
Several users can modify a file at the same time with unpredictable results.
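
One such unpredictable result is a lost update. The following Python sketch replays a single interleaving deterministically, with a list standing in for the shared file; the specific edits made by the two users are hypothetical.

```python
file_contents = ["Smith # 123 # CS", "Jonson # 522 # EE"]  # shared "file"

# Both users read the whole file before either writes it back.
copy_a = list(file_contents)
copy_b = list(file_contents)

# User A changes Smith's department; user B changes Jonson's.
copy_a[0] = "Smith # 123 # EE"
copy_b[1] = "Jonson # 522 # CS"

# Each writes back the entire file; B's write lands last.
file_contents = copy_a
file_contents = copy_b

print(file_contents)
```

User A's change to the first tuple is silently overwritten, even though the two users edited different tuples, because each write replaces the whole file.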

What’s wrong with Megatron 3000 DBMS?

No reliability
e.g., an error or crash (say, a power failure) can leave operations half done
Can lose data

What’s wrong with Megatron 3000 DBMS?

No security
e.g., file system security is coarse
Unable to restrict access, say, to some fields of a relation and not others

What’s wrong with Megatron 3000 DBMS?

No application program interface (API)
e.g., how can a payroll program get at the data?

What’s wrong with Megatron 3000 DBMS?

Cannot interact with other DBMSs.

What’s wrong with Megatron 3000 DBMS?

No GUI

This Course

Introduce students to better ways of building database management
systems.

Reading assignment

Refresh your memory about basics of the relational model and SQL
from your earlier course notes
from some textbook
http://cs.iit.edu/~cs425/schedule.html

Reading

Course Blackboard: Assignments\Reading subfolder
Chapter 1: “Introduction to DBMS Implementation”

Next

Notes 2: Hardware
