Lecture 01

This document outlines the structure and content of the CSE544 Introduction to Databases course, including information about instructors, meeting times, goals of the course, grading structure, projects, textbooks, and an overview of topics to be covered such as data modeling, transactions, query execution, and database security. The course will use a lecture and paper discussion format and aims to teach foundational database concepts as well as current research topics.

CSE544

Introduction

Monday, March 27, 2006


Staff
• Instructor: Dan Suciu
– CSE 662, [email protected]
– Office hours: Wednesdays, 12pm-1pm
• TA: Bhushan Mandhani
– Office hours: TBA
• Mailing list: [email protected]
– http://mailman.cs.washington.edu/mailman/private/cse544
• Web page (a lot of stuff already there): http://www.cs.washington.edu/544
Course Times
• Mon, Wed, 10:30-12

• Final:
– 8:30-10:20 a.m. Monday, Jun. 5, 2006
– In this room
Goals of the Course
• Using database systems

• Foundations of data management.

• Issues in building database systems.

• Current research topics in databases.


Format
Basic structure:
• Lectures on Wednesdays
• Paper discussions on Mondays (reviews !)
Content
• Data modeling basics (3 weeks):
– SQL/XQuery with homeworks on Postgres/Galax
– Logical foundations of databases
• Transactions (2 weeks):
– concurrency control (locks, timestamps)
– recovery (undo, redo, undo/redo)
• Topics in query execution/optimization (3 weeks)
• Database security (1 week)
• Databases + IR, Probabilistic databases (1 week)
Textbooks

Won’t follow any book, but you may want to consult these if you need more details to understand a topic:
• Database Management Systems, Ramakrishnan
• Database Systems: The Complete Book, Garcia-Molina, Ullman, Widom
• XQuery from the Experts, Howard Katz, Ed.
• Data on the Web, Abiteboul, Buneman, Suciu
• Theory of database systems, Abiteboul, Hull, Vianu
Grading
• Homework: 20%
• Paper reviews: 20%
• Participation in the discussions: 10%
• Project: 30%
• Final: 20%
Homework: 20%
HW1:
minor programming in SQL and XQuery

HW2:
problem sets, no programming
theory, optimizations, query execution, transactions
Project: 30%
• Choose from a list of mini-research topics, or come up with your own
• Open ended
• Write short research paper (2-3 pages)
• Conference-style presentation
Project: 30%
• Goals: apply database principles to a new problem
– Understand and model the problem
– Research and understand related work (1-2 papers)
– Propose some new approach (creativity will be evaluated)
– Implement some part

• NOT intended to be a major software development effort

• Amount of work may vary widely between groups
Project: 30%
Milestones:
• Groups of 1-3 assembled by 4/5

• Proposals due by 4/10

• Short research papers (2-3 pages) due by 5/30

• Presentations on 5/31 in class (MAY TAKE LONGER THAN 12pm)
Paper Reviews: 20%
• There will be reading assignments

• Papers are discussed Mondays

• You have to write the reviews by Sunday night
Final: 20%

• June 5, 8:30-10:20, same room

• Challenging and fun


Database
What is a database ?
• A collection of files storing related data

Give examples of databases
• Accounts database; payroll database; UW’s students database; Amazon’s products database; airline reservation database
Database Management System
What is a DBMS ?
• A big C program written by someone else that allows us to manage a large database efficiently and allows it to persist over long periods of time

Give examples of DBMS
• DB2 (IBM), SQL Server (MS), Oracle, Sybase
• MySQL, Postgres, …
Market Shares
From 2004 www.computerworld.com

• IBM: 35% market with $2.5BN in sales

• Oracle: 33% market with $2.3BN in sales

• Microsoft: 19% market with $1.3BN in sales


An Example
The Internet Movie Database
http://www.imdb.com

• Entities:
Actors (800k), Movies (400k), Directors, …

• Relationships:
who played where, who directed what, …
Want to store and process locally; what functions do we need ?
Functionality
1. Create/store large datasets
2. Search/query/update
3. Change the structure
4. Concurrent access by many users
5. Recover from crashes
6. Security (not here, but in other apps)
Possible Organizations
• Files

• Spreadsheets

• DBMS
1. Create/store Large Datasets
• Files: Yes, but…
• Spreadsheets: Not really…
• DBMS: Yes
2. Search/Query/Update
• Simple query:
– In what year was ‘Rain man’ produced ?
• Multi-table query:
– Find all movies by ‘Coppola’
• Complex query:
– For each actor, count her/his movies
• Updating
– Insert a new movie; add an actor to a movie; etc
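As a rough illustration of the "complex query" above, here is a minimal sketch that anticipates the SQL introduced later in the lecture and runs it through Python's built-in `sqlite3` module. The `Actors` and `Casts` tables, their columns, and the sample rows are hypothetical, invented only for this example; they are not the IMDB schema.

```python
import sqlite3

# In-memory database; the schema and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actors (id INTEGER, name TEXT);
CREATE TABLE Casts  (actor_id INTEGER, mid INTEGER);  -- who played where
INSERT INTO Actors VALUES (1, 'Actor One'), (2, 'Actor Two');
INSERT INTO Casts  VALUES (1, 100), (1, 101), (2, 100);
""")

# "For each actor, count her/his movies": join, then group by actor.
counts = conn.execute("""
    SELECT Actors.name, COUNT(*)
    FROM Actors, Casts
    WHERE Actors.id = Casts.actor_id
    GROUP BY Actors.id, Actors.name
    ORDER BY Actors.name
""").fetchall()

print(counts)  # [('Actor One', 2), ('Actor Two', 1)]
```

Note that this form of the query leaves out actors who have no movies at all.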
2. Search/Query/Update
• Files: Simple queries
• Spreadsheets: Multi-table queries (maybe)
• DBMS: All

Updates: generally OK
3. Change the Structure
Add Address to each Actor

• Files: Very hard
• Spreadsheets: Yes
• DBMS: Yes
4. Concurrent Access
Multiple users access/update the data concurrently

• What can go wrong ?
– Lost update; resulting in inconsistent data
• How do we protect against that in OS ?
– Locks
• This is insufficient in databases; why ?
4. Concurrent Access
Transfer $100 from              Find total amount
account A to B:                 in A and B:

X = Read(Accounts, A);
X.amount = X.amount - 100;
Write(Accounts, A, X);
                                X = Read(Accounts, A);
                                Y = Read(Accounts, B);
                                S = X.amount + Y.amount;
                                return S
Y = Read(Accounts, B);
Y.amount = Y.amount + 100;
Write(Accounts, B, Y);

What can go wrong ? Do locks help ?
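One possible answer to the first question can be replayed by hand. The sketch below is only an illustration: a Python dictionary stands in for the Accounts table, the starting balances are made up, and the "find total" program is placed between the transfer's two writes.

```python
def run_interleaved():
    # Hypothetical starting balances; the dict stands in for the Accounts table.
    accounts = {"A": 500, "B": 200}

    # Transfer, first half: withdraw $100 from A.
    x = accounts["A"]
    accounts["A"] = x - 100

    # The "find total" program runs here, between the transfer's two writes.
    total_seen = accounts["A"] + accounts["B"]

    # Transfer, second half: deposit $100 into B.
    y = accounts["B"]
    accounts["B"] = y + 100

    return total_seen, accounts["A"] + accounts["B"]


seen, actual = run_interleaved()
print(seen, actual)  # 600 700
```

The reader sees a total that is $100 short. A lock held only for the duration of each individual Read or Write would not change this outcome, since every step above is already executed one at a time; the whole transfer has to be protected as a unit.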


5. Recover from crashes
X = Read(Accounts, A);
X.amount = X.amount - 100;
Write(Accounts, A, X);
CRASH !
Y = Read(Accounts, B);
Y.amount = Y.amount + 100;
Write(Accounts, B, Y);

What is the problem ?
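The problem can be made concrete with a small simulation. This is a sketch under stated assumptions: the `Crash` exception, the `transfer` helper, and the balances are hypothetical, and a Python dictionary plays the role of the data that survives the crash.

```python
class Crash(Exception):
    """Stands in for a system failure in the middle of the transfer."""


def transfer(accounts, crash_after_first_write=False):
    x = accounts["A"]
    accounts["A"] = x - 100          # Write(Accounts, A, X)
    if crash_after_first_write:
        raise Crash()                # CRASH !
    y = accounts["B"]
    accounts["B"] = y + 100          # Write(Accounts, B, Y)


accounts = {"A": 500, "B": 200}      # made-up balances, total 700
try:
    transfer(accounts, crash_after_first_write=True)
except Crash:
    pass                             # the "system" comes back up here

total_after_crash = accounts["A"] + accounts["B"]
print(accounts, total_after_crash)   # {'A': 400, 'B': 200} 600
```

After the restart, A has been debited but B was never credited, so $100 has disappeared.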


Enter a DBMS
“Two tier system” or “client-server”

Data files <-> Database server (someone else’s C program) <-> connection (ODBC, JDBC) <-> Applications
DBMS = Collection of Tables
Directors:
id      fName          lName
15901   Francis Ford   Coppola
...

Movie_Directors:
id      mid
15901   130128
...

Movies:
mid      Title           Year
130128   The Godfather   1972
...

Still implemented as files, but behind the scenes can be quite complex
“data independence”
1. Create/store Large Datasets
Use SQL to create and populate tables:
CREATE TABLE Actors (
    Name CHAR(30),
    DateOfBirth CHAR(20)
)

INSERT INTO Actors
VALUES('Tom Hanks', . . .)
. . .

Size and physical organization is handled by DBMS


We focus on modeling the database

Will study data modeling in this course
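To try the CREATE TABLE / INSERT pattern without installing a server, one option is Python's built-in `sqlite3` module. This is a minimal sketch, not the course setup (the homeworks use Postgres); the date string is an illustrative value filled in only so the example runs.

```python
import sqlite3

# In-memory SQLite database; SQLite stands in for a full DBMS here.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE Actors (Name CHAR(30), DateOfBirth CHAR(20))")

# Parameter placeholders (?) keep the values separate from the SQL text.
conn.execute("INSERT INTO Actors VALUES (?, ?)", ("Tom Hanks", "1956-07-09"))

rows = conn.execute("SELECT Name, DateOfBirth FROM Actors").fetchall()
print(rows)  # [('Tom Hanks', '1956-07-09')]
```

Nothing in the code says how or where the rows are laid out on disk; that is left to the DBMS.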


2. Searching/Querying/Updating
• Find all movies by ‘Coppola’

SELECT title
FROM Movies, Directors, Movie_Directors
WHERE Directors.lname = 'Coppola' and
      Movies.mid = Movie_Directors.mid and
      Movie_Directors.id = Directors.id

• What happens behind the scene ?

We will study SQL in gory details in this course

We will discuss the query optimizer in class.
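The Coppola query can be run end to end on the three sample rows from the tables slide. The sketch below assumes SQLite (via Python's `sqlite3`) rather than the Postgres system used in the homeworks, and the column types are guesses. `EXPLAIN QUERY PLAN` is SQLite's way of showing what its optimizer decided; the exact text of the plan varies between SQLite versions, so it is printed but not interpreted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Directors (id INTEGER, fName TEXT, lName TEXT);
CREATE TABLE Movies (mid INTEGER, Title TEXT, Year INTEGER);
CREATE TABLE Movie_Directors (id INTEGER, mid INTEGER);
INSERT INTO Directors VALUES (15901, 'Francis Ford', 'Coppola');
INSERT INTO Movies VALUES (130128, 'The Godfather', 1972);
INSERT INTO Movie_Directors VALUES (15901, 130128);
""")

query = """
    SELECT Title
    FROM Movies, Directors, Movie_Directors
    WHERE Directors.lName = 'Coppola' AND
          Movies.mid = Movie_Directors.mid AND
          Movie_Directors.id = Directors.id
"""

titles = conn.execute(query).fetchall()
print(titles)  # [('The Godfather',)]

# Ask SQLite how it intends to evaluate the three-way join.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)
```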


3. Changing the Structure
Add Address to each Actor

ALTER TABLE Actors
ADD address CHAR(50)
DEFAULT 'unknown'

Lots of cleverness goes on behind the scenes
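A quick way to see the effect on rows that already exist is to run the statement against a populated table. This sketch assumes SQLite through Python's `sqlite3` module and uses the `ADD COLUMN` spelling; the existing row is an illustrative value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actors (Name CHAR(30), DateOfBirth CHAR(20))")
conn.execute("INSERT INTO Actors VALUES ('Tom Hanks', '1956-07-09')")

# Change the structure after data has been loaded.
conn.execute("ALTER TABLE Actors ADD COLUMN address CHAR(50) DEFAULT 'unknown'")

# The pre-existing row now reports the default for the new column.
row = conn.execute("SELECT Name, address FROM Actors").fetchone()
print(row)  # ('Tom Hanks', 'unknown')
```

The application did not have to rewrite any stored rows itself; how the DBMS makes old rows appear to have the new column is its own business.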


4 & 5. Concurrency & Recovery: Transactions
• A transaction = sequence of statements that either all succeed, or all fail
• E.g. Transfer $100

BEGIN TRANSACTION;
UPDATE Accounts
SET amount = amount - 100
WHERE number = 4662;
UPDATE Accounts
SET amount = amount + 100
WHERE number = 7199;
COMMIT;
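The all-or-nothing behaviour can be observed by forcing a failure between the two updates. The following is a sketch under assumptions: SQLite via Python's `sqlite3`, a hypothetical `transfer` helper with a `fail_midway` switch that simulates an error, and made-up starting balances for accounts 4662 and 7199.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst inside one transaction."""
    try:
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE Accounts SET amount = amount - ? WHERE number = ?",
            (amount, src),
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE Accounts SET amount = amount + ? WHERE number = ?",
            (amount, dst),
        )
        conn.execute("COMMIT")
    except RuntimeError:
        conn.execute("ROLLBACK")  # undo the first update as well


# isolation_level=None: the module issues no implicit BEGIN, we control it.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (number INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(4662, 500), (7199, 200)])

transfer(conn, 4662, 7199, 100, fail_midway=True)
balances_after_failure = dict(conn.execute("SELECT number, amount FROM Accounts"))
print(balances_after_failure)  # {4662: 500, 7199: 200}

transfer(conn, 4662, 7199, 100)
balances_after_success = dict(conn.execute("SELECT number, amount FROM Accounts"))
print(balances_after_success)  # {4662: 400, 7199: 300}
```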
Transactions
• Transactions have the ACID properties:
A = atomicity
C = consistency
I = isolation
D = durability
4. Concurrent Access
• Serializable execution of transactions
– The I (=isolation) in ACID

We study three techniques in this course:
• Locks
• Timestamps
• Validation
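As a very coarse illustration of the first technique, the sketch below holds a single lock for the entire duration of each "transaction", so the transfer and the total can never interleave. It is an assumption-laden toy (one global `threading.Lock`, a dictionary instead of a table, made-up balances), not the locking protocols a real DBMS uses, which lock at a much finer granularity.

```python
import threading

accounts = {"A": 500, "B": 200}   # made-up balances, total 700
lock = threading.Lock()           # one global lock, held per whole transaction
totals = []


def transfer():
    with lock:                    # both writes happen under the same lock
        accounts["A"] -= 100
        accounts["B"] += 100


def find_total():
    with lock:                    # both reads happen under the same lock
        totals.append(accounts["A"] + accounts["B"])


threads = [threading.Thread(target=transfer)]
threads += [threading.Thread(target=find_total) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals)  # every entry is 700, whichever order the threads ran in
```

Each reader runs either entirely before or entirely after the transfer, which is what a serializable execution requires; the price is that nothing runs concurrently at all.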
5. Recovery from crashes
• Every transaction either executes completely, or doesn’t execute at all
– The A (=atomicity) in ACID

We study three types of log files in this course:
• Undo log file
• Redo log file
• Undo/Redo log file
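The idea behind the first kind of log can be sketched in a few lines: record the old value before overwriting it, and after a crash restore the old values written by transactions that never committed. The record layout, the `recover` function, and the transaction names below are all hypothetical, and the sketch ignores when log records and data must reach disk, which is where the real subtlety lies.

```python
def recover(db, log):
    """Undo the writes of every transaction that has no COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                 # scan the log backwards
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _txn, key, old_value = rec
            db[key] = old_value               # put the old value back
    return db


# T0 finished earlier: it changed B from 100 to 200 and committed.
db = {"A": 500, "B": 200}
log = [("START", "T0"), ("WRITE", "T0", "B", 100), ("COMMIT", "T0")]

# T1 starts the $100 transfer, logging A's old value before overwriting it.
log.append(("START", "T1"))
log.append(("WRITE", "T1", "A", db["A"]))
db["A"] -= 100
# CRASH happens here: T1 never credited B and never committed.

recover(db, log)
print(db)  # {'A': 500, 'B': 200}
```

T1's partial work is undone while T0's committed write is left alone.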