CSE 414: Intro to Data Management

Introduction

March 25, 2024 Introduction 1


Outline

1. Administrivia

2. Databases, DBMS

3. The Relational Data Model



414 Staff
Instructor:
§ Dan Suciu (suciu@cs)

TAs:
§ Zareef Amyeen
§ Eden Chmielewski
§ Cindy Fu
§ Arjun Jagnani
§ Moe Kayali
§ Aaron D Kim
§ Madrona Kelly Maling
§ Qirui Wang
§ Emi Kamaleiokekua Yoshikawa
§ Andrew Mingwei Zhang



Course Format
§ Lectures: in person, in this room
• Attend. Arrive on time. Pay attention.

§ Sections: in person, see locations at my.uw.edu
• Bring your laptop

§ Several homework assignments
• First assignment published on Gradescope

§ Two exams:
• Midterm: Friday, April 26, 10:30-11:20, in class
• Final: Monday, June 3, 8:30-10:20, same room



Communication
§ Website:
• https://cs.uw.edu/414, same as
• https://courses.cs.washington.edu/courses/cse414/24sp/

§ Ed message board (link on website)
• All course-related questions
• Log in today, enable email notifications

§ Class mailing list
• Very low traffic, only important announcements




Textbook

Main textbook, available at the bookstore or as a PDF:
§ Database Systems: The Complete Book, second edition,
  by Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom

Also useful:
§ Database Management Systems (3rd Edition)



Grading
§ Grading:
• Homework 50%, exams 20% + 30%

§ Late days:
• 6 in total, max 2 per assignment, in 24-hour chunks

§ Collaboration:
• Do complete homework individually
• Do discuss concepts, but see previous item
• Don’t show your work
• Don’t post it on the Web
• Don’t look at other people’s work
• Don’t use AI tools to produce your work




Questions?

Let’s get started!



Database
What is a database?
• A collection of files storing related data

Give examples of databases:
• Accounts database
• Payroll database
• UW’s student database
• Amazon’s products database
• Airline reservation database



Database Management System
What is a DBMS?
§ “A big program written by someone else that allows us to
manage efficiently a large database and allows it to persist
over long periods of time”

Examples of DBMSs
§ Oracle, IBM DB2, Microsoft SQL Server, Vertica, Teradata
§ Cloud: Snowflake, Redshift, BigQuery, SQL Azure
§ Open source: MySQL (Sun/Oracle), PostgreSQL, DuckDB
§ Open source library: SQLite

A DBMS needs a Data Model


Data Models



Example
Database of patients, their names, their health status…
How do we describe information?

Medical Records
PatientID  Name  Status    Notes
123        Alex  Healthy   …
345        Bob   Critical  …

Data Model
A Data Model is a mathematical formalism to describe data. It is how we can talk
about data conceptually without having to think about implementation.



3 Parts of a Data Model
The 3 parts of any data model
§ Instance
• The actual data
§ Schema
• A description of what data is being stored
§ Query Language
• How to retrieve and manipulate data

Medical Records
PatientID  Name  Status    Notes
123        Alex  Healthy?  …
345        Bob   Critical  …

“Which patients are critical?”
SELECT * FROM records
WHERE status = 'critical'
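
The three parts can be tried out in a few lines. This is a minimal sketch using Python's built-in sqlite3 module, not the course's setup: the column types are assumptions, the Notes values are left empty because the slide elides them, and the Status value for Alex is taken as "Healthy". In SQLite, `=` on text is case-sensitive by default, so the literal is written to match the stored value 'Critical'.

```python
import sqlite3

# Hypothetical in-memory database; table and column names follow the slide.
conn = sqlite3.connect(":memory:")

# Schema: a description of what data is being stored (types are assumed).
conn.execute(
    "CREATE TABLE records (PatientID INTEGER, Name TEXT, Status TEXT, Notes TEXT)"
)

# Instance: the actual data (Notes left as NULL, since the slide elides them).
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(123, "Alex", "Healthy", None), (345, "Bob", "Critical", None)],
)

# Query language: "Which patients are critical?"
critical = conn.execute(
    "SELECT PatientID, Name FROM records WHERE Status = 'Critical'"
).fetchall()
print(critical)  # [(345, 'Bob')]
```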



Data Models
There are lots of models out there!
§ Relational
§ Semi-structured
§ Key-value pairs
§ Graph
§ OO
§ …

See https://db-engines.com/en/ranking

And the winner is:

The Relational Data Model



Relational Data Model



What is the Relational Model?



The Relational Model

Ted Codd, Turing Award 1981



The Relational Model

§ Data is stored in simple, flat relations


We start here

§ Is retrieved via a set-at-a-time query language

§ No prescription for the physical representation



Components of the Relational Model

Schema, describes data:

Payroll (UserID, Name, Job, Salary)

Instance of actual data:

UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000
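
To make the schema/instance split concrete, here is a small sketch using Python's built-in sqlite3 module. The column types (INTEGER, TEXT) are assumptions, since the slide gives only attribute names. The schema is read back with SQLite's `PRAGMA table_info`, and the instance is whatever rows the table currently holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: Payroll (UserID, Name, Job, Salary); the types are assumed.
conn.execute(
    "CREATE TABLE Payroll (UserID INTEGER, Name TEXT, Job TEXT, Salary INTEGER)"
)

# Instance: the rows shown on the slide.
rows = [
    (123, "Jack", "TA", 50000),
    (345, "Allison", "TA", 60000),
    (567, "Magda", "Prof", 90000),
    (789, "Dan", "Prof", 100000),
]
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", rows)

# The schema can be inspected without touching the data:
# each table_info row is (cid, name, type, notnull, dflt_value, pk).
schema = [col[1] for col in conn.execute("PRAGMA table_info(Payroll)")]

# The instance is the current contents of the table.
instance = conn.execute("SELECT * FROM Payroll").fetchall()

print(schema)         # ['UserID', 'Name', 'Job', 'Salary']
print(len(instance))  # 4
```

Inserting or deleting rows changes the instance but leaves the schema as it is.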


Components of the Relational Model

UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

§ Table / Relation: the whole grid
§ Columns / Attributes / Fields: UserID, Name, Job, Salary
§ Rows / Tuples / Records: one per line, e.g. (123, Jack, TA, 50000)



Characteristics of the Relational Model
§ Set semantics
§ Order doesn’t matter

UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

is the same relation as

UserID  Name     Job   Salary
567     Magda    Prof  90000
123     Jack     TA    50000
789     Dan      Prof  100000
345     Allison  TA    60000



Characteristics of the Relational Model
§ Set semantics
§ Order doesn’t matter
§ Duplicates not allowed

UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000
789     Dan      Prof  100000

The repeated row for Dan violates set semantics!



Characteristics of the Relational Model
§ Set semantics
§ Order doesn’t matter
§ Duplicates not allowed
§ …but systems do allow them

UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000
789     Dan      Prof  100000

The repeated row for Dan is allowed by systems, but a bad idea
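
A short sketch of the gap between the model and real systems, using Python's built-in sqlite3 module (one system among many, and the column types are assumptions). Python sets stand in for the model's set semantics. SQLite, like most SQL systems, keeps both copies of a repeated row unless asked to remove them, here with `SELECT DISTINCT`.

```python
import sqlite3

# Set semantics: a relation is a set of tuples, so row order is irrelevant.
order_a = [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000)]
order_b = [(345, "Allison", "TA", 60000), (123, "Jack", "TA", 50000)]
same_relation = set(order_a) == set(order_b)

# A real system: inserting the same row twice is accepted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Payroll (UserID INTEGER, Name TEXT, Job TEXT, Salary INTEGER)"
)
dan = (789, "Dan", "Prof", 100000)
conn.executemany("INSERT INTO Payroll VALUES (?, ?, ?, ?)", [dan, dan])

# Number of stored rows, duplicates included.
bag_count = conn.execute("SELECT COUNT(*) FROM Payroll").fetchone()[0]

# Number of distinct rows, i.e. what set semantics would give.
set_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM Payroll)"
).fetchone()[0]

print(same_relation, bag_count, set_count)  # True 2 1
```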



Characteristics of the Relational Model
§ Attributes are typed and static
• INTEGER, FLOAT, VARCHAR(n), DATETIME, …

UserID  Name     Job   Salary
123     Jack     TA    banana
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

"banana" violates the attribute type, assuming Salary is INT
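
How strictly a type is enforced depends on the system. The sketch below uses Python's built-in sqlite3 module, and SQLite is unusually permissive: by default it would store 'banana' in an INTEGER column as text. To get the behavior the slide describes, this example adds a `CHECK (typeof(Salary) = 'integer')` constraint. That constraint is an illustration chosen here, not something from the slides, and many other systems reject the bad value without it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Salary is declared INTEGER; the CHECK makes SQLite actually enforce it.
conn.execute(
    "CREATE TABLE Payroll ("
    " UserID INTEGER, Name TEXT, Job TEXT,"
    " Salary INTEGER CHECK (typeof(Salary) = 'integer'))"
)

# A well-typed row is accepted.
conn.execute("INSERT INTO Payroll VALUES (345, 'Allison', 'TA', 60000)")

# A row whose Salary is not an integer is rejected.
try:
    conn.execute("INSERT INTO Payroll VALUES (123, 'Jack', 'TA', 'banana')")
    banana_rejected = False
except sqlite3.IntegrityError:
    banana_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM Payroll").fetchone()[0]
print(banana_rejected, stored)  # True 1
```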



Characteristics of the Relational Model
§ Attributes are typed and static
• INTEGER, FLOAT, VARCHAR(n), DATETIME, …
§ Tables are flat

No sub-tables allowed! Here the Job value for Jack is itself a table:

UserID  Name     Job                  Salary
123     Jack     JobName  HasBananas  50000
                 TA       0
                 farmer   1
345     Allison  TA                   60000
567     Magda    Prof                 90000
789     Dan      Prof                 100000
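
One common way to keep tables flat is to move the nested rows into a second table that repeats the UserID. The slides do not show this step, so treat the sketch below as an assumption: the table name UserJobs and the column types are hypothetical, and the join is only there to show that Jack's two jobs can still be recovered. It uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two flat tables instead of one table with a sub-table in the Job column.
conn.execute("CREATE TABLE Payroll (UserID INTEGER, Name TEXT, Salary INTEGER)")
conn.execute(
    "CREATE TABLE UserJobs (UserID INTEGER, JobName TEXT, HasBananas INTEGER)"
)

conn.execute("INSERT INTO Payroll VALUES (123, 'Jack', 50000)")
conn.executemany(
    "INSERT INTO UserJobs VALUES (?, ?, ?)",
    [(123, "TA", 0), (123, "farmer", 1)],  # the former sub-table rows
)

# Recombine the two flat tables on UserID.
jack_jobs = conn.execute(
    "SELECT p.Name, j.JobName, j.HasBananas"
    " FROM Payroll p JOIN UserJobs j ON p.UserID = j.UserID"
).fetchall()
print(sorted(jack_jobs))  # [('Jack', 'TA', 0), ('Jack', 'farmer', 1)]
```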



The Relational Model

§ Data is stored in simple, flat relations


We saw this

§ Is retrieved via a set-at-a-time query language

What does this mean?

§ No prescription for the physical representation



Characteristics of the Relational Model
But how is this data ACTUALLY stored?
Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

“123\tJack\tTA\t50000\t345\tAllison…” or maybe
“123\t345\t567\t789\tJack\tAllison…”

No prescription for physical storage: system decides

Physical Data Independence
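
The two strings above correspond to a row-by-row and a column-by-column layout. The sketch below, in plain Python, is only a toy: real systems use far more elaborate storage formats, and the function names are hypothetical. It shows that both layouts decode to the same logical table, so a query written against the table gives the same answer either way.

```python
rows = [
    (123, "Jack", "TA", 50000),
    (345, "Allison", "TA", 60000),
    (567, "Magda", "Prof", 90000),
    (789, "Dan", "Prof", 100000),
]

# Layout 1: one row after another ("123\tJack\tTA\t50000\t345\tAllison...").
row_store = "\t".join(str(v) for row in rows for v in row)

# Layout 2: one column after another ("123\t345\t567\t789\tJack\tAllison...").
col_store = "\t".join(str(v) for col in zip(*rows) for v in col)


def decode_row_store(blob, width=4):
    """Rebuild the logical table from the row-by-row layout."""
    vals = blob.split("\t")
    return [tuple(vals[i:i + width]) for i in range(0, len(vals), width)]


def decode_col_store(blob, width=4):
    """Rebuild the logical table from the column-by-column layout."""
    vals = blob.split("\t")
    n = len(vals) // width  # number of rows
    cols = [vals[i * n:(i + 1) * n] for i in range(width)]
    return list(zip(*cols))


def names_of_profs(table):
    """A 'query' that only sees the logical table, not the layout."""
    return sorted(name for (_, name, job, _) in table if job == "Prof")


same_table = decode_row_store(row_store) == decode_col_store(col_store)
profs = names_of_profs(decode_col_store(col_store))
print(same_table, profs)  # True ['Dan', 'Magda']
```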



The Relational Model
We discussed this…

§ Data is stored in simple, flat relations

§ Is retrieved via a set-at-a-time query language

§ No prescription for the physical representation

…and this



The Relational Model

§ Data is stored in simple, flat relations


Next Lectures: SQL

§ Is retrieved via a set-at-a-time query language

§ No prescription for the physical representation
