CSE-4411

Database Management Systems

York University
Parke Godfrey

Fall 2010

CSE-4411—Database Management Systems—Godfrey – p. 1/16


CSE-3421 vs CSE-4411
CSE-4411 is a continuation of CSE-3421, right?
More of the same, eh?

Ha! No way.
In this class, we focus on how to build a database system.
In CSE-3421, we focused on what functionality a database
system provides, and how to use it.



Data Independence
• Do not need to know how a compiler works to write a
program.
• Do not need to know how an operating system is built to
use one.
• Don’t need to know how a car works to drive one.
• Don’t need to know how a database system is built to use it.

• physical data independence: how the data is logically
organized is independent of how it is physically organized.
(There is also logical data independence...)
• Codd’s law: Can only access and update the database via
the “query language” (SQL).
• SQL is a declarative language.
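A minimal sketch of physical data independence, using Python's built-in sqlite3 module. The table, its rows, and the index name are made up for illustration. The physical organization changes (an index is added), but the declarative query and its answer do not.

```python
import sqlite3

# Hypothetical toy table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table Supplier (A text, C text)")
conn.executemany(
    "insert into Supplier values (?, ?)",
    [("s1", "Toronto"), ("s2", "Ottawa"), ("s3", "Toronto")],
)

query = "select A from Supplier where C = 'Toronto' order by A"

# Answer before any index exists.
before = conn.execute(query).fetchall()

# Change the physical organization only: add an index on C.
conn.execute("create index Supplier_C on Supplier (C)")

# The same query text, unchanged, is run again.
after = conn.execute(query).fetchall()

print(before == after)
```

The query says what is wanted, not how to find it, so the system is free to use the new index or to ignore it.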

How to build a Database System?
Okay, more specifically, a relational database management
system (RDBMS).
E.g., Oracle, IBM DB2, Microsoft SQL Server, Informix,
MySQL, & Postgres.

In this class, we’re going to build our own system!



How to build a Database System?
What is involved?
What functionality do we need to support?
E.g., SQL
What are our design criteria?
Should be fast. (At what?)
Must handle updates to the database and read-only
queries efficiently.
(Trade-offs involved!)
What are our design choices? Our design constraints?
How will the available technology affect our design
(architecture)?
E.g., Main memory technologies (like CMOS) are
volatile.



I. The Physical Database
Storage & Access
Ensure that data is permanent and safe.

Goals:
• permanence
• fast, random access
• fault tolerance (to support crash recovery)

Design questions:
• What devices / technology do we use?
• What data-structures do we use?
How do we access given pieces of data quickly?
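One possible answer to the data-structure question, as a rough sketch only: fixed-size records packed into fixed-size pages, plus an index from key to (page, slot). The page size, record layout, and class name are assumptions for illustration. The pages here are in-memory bytearrays standing in for disk blocks, so nothing in this sketch is actually permanent.

```python
import struct

PAGE_SIZE = 64                     # assumed tiny page size, for illustration
RECORD = struct.Struct("<i12s")    # assumed layout: int key, 12-byte name
SLOTS_PER_PAGE = PAGE_SIZE // RECORD.size


class HeapFile:
    """Fixed-size records stored in fixed-size pages."""

    def __init__(self):
        self.pages = []    # stand-in for disk blocks
        self.count = 0     # number of records stored
        self.index = {}    # key -> (page number, slot number)

    def insert(self, key, name):
        page_no, slot = divmod(self.count, SLOTS_PER_PAGE)
        if page_no == len(self.pages):
            self.pages.append(bytearray(PAGE_SIZE))
        RECORD.pack_into(self.pages[page_no], slot * RECORD.size,
                         key, name.encode())
        self.index[key] = (page_no, slot)
        self.count += 1

    def read(self, page_no, slot):
        key, raw = RECORD.unpack_from(self.pages[page_no], slot * RECORD.size)
        return key, raw.rstrip(b"\0").decode()

    def scan_lookup(self, key):
        """Sequential access: examine records in order until the key is found."""
        for n in range(self.count):
            record = self.read(*divmod(n, SLOTS_PER_PAGE))
            if record[0] == key:
                return record
        return None

    def index_lookup(self, key):
        """Random access: the index names the page and slot directly."""
        location = self.index.get(key)
        return None if location is None else self.read(*location)


f = HeapFile()
for k, n in [(7, "Acme"), (3, "Bolt"), (9, "Cog"), (1, "Dyne"), (5, "Ergo")]:
    f.insert(k, n)
```

Both lookups return the same record; the index version touches one page, while the scan may touch every page.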



II. The Query Processor
How to evaluate (SQL) queries efficiently? We need a
• query parser
• plan generator (and query optimizer)
Turns a valid SQL query into a “program” that answers
the query.
• query plan evaluator
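To make the "program" idea concrete, here is a toy sketch of a query plan evaluator: each operator is a Python generator that pulls rows from its child. The tables, the operator names, and the hand-built plan are assumptions for illustration; parsing and plan generation are skipped entirely.

```python
# Hypothetical toy relations, as lists of dict rows.
R = [{"B": "r1", "C": "Toronto"}, {"B": "r1", "C": "Ottawa"},
     {"B": "r2", "C": "Toronto"}]
S = [{"A": "s1", "C": "Toronto"}, {"A": "s1", "C": "Ottawa"},
     {"A": "s2", "C": "Toronto"}]


def scan(table):
    """Leaf operator: produce every row of a stored table."""
    yield from table


def select(pred, child):
    """Selection: keep the rows that satisfy the predicate."""
    return (row for row in child if pred(row))


def project(cols, child):
    """Projection: keep only the named columns (duplicates are kept)."""
    return ({c: row[c] for c in cols} for row in child)


def nested_loop_join(left, right_table, col):
    """Join on one shared column, by the simplest possible algorithm."""
    for l in left:
        for r in scan(right_table):
            if l[col] == r[col]:
                yield {**l, **r}


# A hand-built plan for:
#   select S.A, R.B from R, S where R.C = S.C and R.C = 'Ottawa'
plan = project(
    ["A", "B"],
    nested_loop_join(
        select(lambda row: row["C"] == "Ottawa", scan(R)), S, "C"),
)

result = list(plan)
print(result)
```

Here the selection is applied before the join. Applying it after the join would give the same answer but push more rows through the join.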

Problems:
• SQL is reasonably complex.
• Not all (equivalent) queries are equal.
Some queries / query plans will evaluate inherently much
faster.
Big issue:
• How to “pick”, or design, a good query plan for a query?

A “Complex” Query

Supplier S: A (name), C (city)
Retailer R: B (name), C (city)
Query: Which supplier has a location in every city of a
retailer? Show such supplier (A) / retailer (B) pairs.

{⟨A, B⟩ | ∀C (⟨B, C⟩ ∈ R → ⟨A, C⟩ ∈ S)}

π_{A,B}(R × S) − π_{A,B}(π_{A,B,C}(π_A(S) × R) − R ⋈ S)
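The algebra expression can be checked step by step on a small instance. This is a sketch using Python sets; the rows and the variable names are invented for illustration.

```python
from itertools import product

# Hypothetical instance: S holds (A, C) pairs, R holds (B, C) pairs.
S = {("s1", "Toronto"), ("s1", "Ottawa"), ("s2", "Toronto")}
R = {("r1", "Toronto"), ("r1", "Ottawa"), ("r2", "Toronto")}

# Projection on A,B of R x S: every supplier / retailer pair.
all_pairs = {(a, b) for (b, _), (a, _) in product(R, S)}

# (Projection of S on A) x R: every supplier against every
# (retailer, city) tuple, as (A, B, C) triples.
required = {(a, b, c) for (a, _), (b, c) in product(S, R)}

# R joined with S on C: triples where supplier A is located in
# retailer B's city C.
covered = {(a, b, c) for (a, cs), (b, c) in product(S, R) if cs == c}

# A triple that is required but not covered witnesses a missing city.
failing_pairs = {(a, b) for (a, b, _) in required - covered}

answer = all_pairs - failing_pairs

# Cross-check against the calculus definition, read literally.
direct = {(a, b) for (a, b) in all_pairs
          if all((a, c) in S for (b2, c) in R if b2 == b)}

print(sorted(answer))
```

On this instance s1 covers both retailers, while s2 covers only r2 because it has no location in Ottawa.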



A “Complex” Query
in SQL
select A, B from R, S
except
select A, B from (
select S.A, R.B, R.C from R, S
except
select S.A, R.B, R.C
from R, S
where R.C = S.C) as Z;

Any problems?



A “Complex” Query
Better?
select A, B
from R, S
where R.C = S.C
except
select A, B from (
select S.A, R1.B, R2.C
from R as R1, R as R2, S
where R1.C = S.C and R1.B = R2.B
except
select S.A, R.B, R.C from R, S
where R.C = S.C
) as Z;



A “Complex” Query
cleaned up
with
J (A, B, C) as (
select S.A, R.B, R.C
from R, S
where R.C = S.C)
select distinct A, B from J
except
select J.A, J.B
from J, R
where J.B = R.B and
(J.A, J.B, R.C) not in
(select A, B, C from J);



A “Complex” Query
via COUNT
select J.A, J.B
from (select S.A, R.B, count(*) as Cs
from R, S
where R.C = S.C
group by S.A, R.B) as J,
(select B, count(*) as Cs
from R
group by B) as K
where J.B = K.B and
J.Cs = K.Cs;
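The four formulations can be run side by side. The sketch below uses Python's sqlite3 module with invented rows, and assumes an SQLite version that supports row values in NOT IN. It shows only that the queries agree on this one small instance; it says nothing about how fast each one runs. The COUNT version also relies on R and S having no duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table S (A text, C text);   -- Supplier: name, city
    create table R (B text, C text);   -- Retailer: name, city
    insert into S values ('s1','Toronto'), ('s1','Ottawa'), ('s2','Toronto');
    insert into R values ('r1','Toronto'), ('r1','Ottawa'), ('r2','Toronto');
""")

queries = {
    "first": """
        select A, B from R, S
        except
        select A, B from (
            select S.A, R.B, R.C from R, S
            except
            select S.A, R.B, R.C from R, S where R.C = S.C) as Z""",
    "better": """
        select A, B from R, S where R.C = S.C
        except
        select A, B from (
            select S.A, R1.B, R2.C
            from R as R1, R as R2, S
            where R1.C = S.C and R1.B = R2.B
            except
            select S.A, R.B, R.C from R, S where R.C = S.C) as Z""",
    "cleaned up": """
        with J (A, B, C) as (
            select S.A, R.B, R.C from R, S where R.C = S.C)
        select distinct A, B from J
        except
        select J.A, J.B from J, R
        where J.B = R.B and
              (J.A, J.B, R.C) not in (select A, B, C from J)""",
    "count": """
        select J.A, J.B
        from (select S.A, R.B, count(*) as Cs
              from R, S where R.C = S.C
              group by S.A, R.B) as J,
             (select B, count(*) as Cs from R group by B) as K
        where J.B = K.B and J.Cs = K.Cs""",
}

# Run each query and collect its answer as a set of (A, B) pairs.
results = {name: set(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

for name, rows in results.items():
    print(name, sorted(rows))
```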



The Query Optimizer

1. Rewrite
– Rewrite the query into something “simpler”, but that
means the same thing.

2. Cost-based
a. Determine a “best” over-all query tree.
b. Pick the best method for each operator in the query tree.
1) Pick the best access path for each table involved.
2) Assign the “best” algorithm to each operator
(⋈, π, σ, ...).
c. Do a. & b. simultaneously!
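A toy sketch of the cost-based idea, covering only the choice of join order (part of step a). Every number here is invented, and the cost model (sum of estimated intermediate result sizes, left-deep plans only) is an assumption made for illustration, not a description of any real optimizer.

```python
from itertools import permutations

# All statistics below are invented for illustration.
cardinality = {"R": 1000, "S": 100, "T": 10}

# Assumed selectivity of the join predicate between two tables.
# Pairs not listed have no predicate (cross product, selectivity 1.0).
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}


def estimate(order):
    """Return (cost, final size) of a left-deep plan joining in this order.

    Assumed cost model: the sum of the estimated sizes of all
    intermediate results.
    """
    joined = [order[0]]
    size = cardinality[order[0]]
    cost = 0.0
    for table in order[1:]:
        size *= cardinality[table]
        for earlier in joined:
            size *= selectivity.get(frozenset((earlier, table)), 1.0)
        joined.append(table)
        cost += size
    return cost, size


# Enumerate every join order and keep the cheapest one.
plans = {order: estimate(order) for order in permutations(cardinality)}
best = min(plans, key=lambda order: plans[order][0])

for order, (cost, size) in sorted(plans.items(), key=lambda p: p[1][0]):
    print(" ⋈ ".join(order), "cost", round(cost), "rows", round(size))
```

Every order produces the same estimated final size, but the orders that start with the R × T cross product pay for a much larger intermediate result.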



III. Database Management
• transaction management
– How do we ensure updates are made to the database
correctly?
• concurrency control
– How do we ensure that multiple transactions occurring
“simultaneously” are treated correctly?
• crash recovery
– How do we recover from failures? (E.g., ARIES)

Properties:
• Atomicity
• Consistency
• Isolation
• Durability
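Atomicity can be seen from the user's side with a small sketch in Python's sqlite3 module. The Account table, the CHECK constraint, and the transfer function are made up for illustration. This shows the behaviour a transaction manager must provide, not how one is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Account ("
    " name text primary key,"
    " balance integer check (balance >= 0))"
)
conn.executemany("insert into Account values (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from a to b as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update Account set balance = balance + ? where name = 'b'",
                (amount,))
            conn.execute(
                "update Account set balance = balance - ? where name = 'a'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(60)       # succeeds
failed = transfer(60)   # would overdraw a, so the credit to b is undone too
balances = dict(conn.execute("select name, balance from Account"))
print(ok, failed, balances)
```

The second transfer credits b before the debit on a fails, yet b's balance is unchanged afterwards.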



Building a Database System
Anything we miss?
• host language support
e.g., JDBC
• data definition language (DDL)
e.g., CREATE TABLE . . .
• administrative functions (for DBA’s) & security
e.g., GRANT . . .
• ...

What pieces / modules do we need to implement all this?
What’s our architecture?
Need
• a query optimizer
• a transaction manager
– a lock manager for concurrency control
• a crash recovery mechanism
• ...



Building a Database System
Why study this?!
It’s fun!
Some will get a job building RDBMSs.
E.g., at IBM Toronto Laboratory (for DB2)
Cannot be a good DB Administrator without understanding
how the system works.
Can be a better DB programmer when you understand how
the system works.
Lots of places are building database-like systems.
Can reuse the techniques and technologies from RDBMSs.
