
Intro to Database Systems (15-445/645)

01 Course Intro & Relational Model
Fall 2022
Andy Pavlo

15-445/645 (Fall 2022)


TODAY'S AGENDA

Course Logistics
Relational Model
Relational Algebra

WAITLIST

I do not control the wait list.

I do not take bribes.

Admins will move students off the wait list as new spots become available. If you are not currently enrolled, the likelihood that you will get in is unfortunately very low.

15-445/645 will be offered in Spring 2023!


LECTURE RULES

Please interrupt me for the following reasons:
→ I am speaking too fast.
→ You don't understand what I am talking about.
→ You have a database-related question.

Do not interrupt me for the following reasons:
→ Whether you can use the bathroom.
→ Questions about blockchains.

I will not answer questions about the lecture immediately after class.

COURSE OVERVIEW

This course is about the design/implementation of database management systems (DBMSs).

This is not a course about how to use a DBMS to build applications or how to administer a DBMS.
→ See CMU 95-703 (Heinz College)

PROJECTS

All projects will use the CMU DB Group BusTub academic DBMS.
→ Each project builds on the previous one.
→ We will not teach you how to write/debug C++17.

Total of four late days the entire semester for projects only.

You must complete Project #0 before Sept 11th.

COURSE LOGISTICS

Course Policies + Schedule: Course Web Page
Discussion + Announcements: Piazza
Homeworks + Projects: Gradescope
Final Grades: Canvas

Non-CMU students will be able to complete all assignments using Gradescope (PXWVR5).
→D
→ Somebody needs to finish my Wikipedia article.


PLAGIARISM WARNING

The homework and projects must be your own original work. They are not group assignments. You may not copy source code from other people or the web.

Plagiarism is not tolerated. You will get lit up.
→ Please ask me if you are unsure.

See CMU's Policy on Academic Integrity for additional information.

DATABASE RESEARCH

¡Databases! – A Database Seminar Series
Mondays @ 4:30pm (starting on 9/12)
→ Live on Zoom. Published to YouTube afterwards.
→ https://db.cs.cmu.edu/seminar2022

Databases

DATABASE

Organized collection of inter-related data that models some aspect of the real-world.

Databases are the core component of most computer applications.

DATABASE EXAMPLE

Create a database that models a digital music store to keep track of artists and albums.

Things we need for our store:
→ Information about Artists
→ What Albums those Artists released

FLAT FILE STRAWMAN

Store our database as comma-separated value (CSV) files that we manage ourselves in our application code.
→ Use a separate file per entity.
→ The application must parse the files each time it wants to read/update records.

FLAT FILE STRAWMAN

Create a database that models a digital music store.

Artist(name, year, country)
"Wu-Tang Clan",1992,"USA"
"Notorious BIG",1992,"USA"
"GZA",1990,"USA"

Album(name, artist, year)
"Enter the Wu-Tang","Wu-Tang Clan",1993
"St.Ides Mix Tape","Wu-Tang Clan",1994
"Liquid Swords","GZA",1990

FLAT FILE STRAWMAN

Example: Get the year that GZA went solo.

Artist(name, year, country)
"Wu-Tang Clan",1992,"USA"
"Notorious BIG",1992,"USA"
"GZA",1990,"USA"

for line in file.readlines():
    record = parse(line)
    if record[0] == "GZA":
        print(int(record[1]))

FLAT FILES: DATA INTEGRITY

How do we ensure that the artist is the same for each album entry?

What if somebody overwrites the album year with an invalid string?

What if there are multiple artists on an album?

What happens if we delete an artist that has albums?

FLAT FILES: IMPLEMENTATION

How do you find a particular record?

What if we now want to create a new application that uses the same database?

What if two threads try to write to the same file at the same time?
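
Two of the flat-file questions above (finding a record, and an invalid string in a year column) can be made concrete with a small sketch. It assumes the CSV layout from the strawman slides and keeps the "files" in memory as strings; `find_year` and `album_year` are hypothetical helpers written for this illustration, and the value `oops` is made up to stand in for an overwritten year.

```python
import csv
import io

# In-memory stand-ins for the Artist and Album CSV files.
ARTISTS = '"Wu-Tang Clan",1992,"USA"\n"Notorious BIG",1992,"USA"\n"GZA",1990,"USA"\n'
# Assumption for illustration: somebody overwrote an album year with "oops".
ALBUMS = '"Enter the Wu-Tang","Wu-Tang Clan",1993\n"Liquid Swords","GZA",oops\n'


def find_year(text, name):
    """Linear scan: every lookup re-parses the whole file."""
    for record in csv.reader(io.StringIO(text)):
        if record[0] == name:
            return int(record[1])
    return None


def album_year(text, album):
    """Same scan over the album file; int() fails on a non-numeric year."""
    for record in csv.reader(io.StringIO(text)):
        if record[0] == album:
            return int(record[2])
    return None


gza_year = find_year(ARTISTS, "GZA")

# Nothing stopped the bad write; the error only surfaces when a reader
# tries to use the value.
try:
    album_year(ALBUMS, "Liquid Swords")
    bad_year_detected = False
except ValueError:
    bad_year_detected = True
```

Every application that touches these files would have to repeat this parsing and error handling itself, which is part of what the questions above are pointing at.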

FLAT FILES: DURABILITY

What if the machine crashes while our program is updating a record?

What if we want to replicate the database on multiple machines for high availability?

DATABASE MANAGEMENT SYSTEM

A database management system (DBMS) is software that allows applications to store and analyze information in a database.

A general-purpose DBMS supports the definition, creation, querying, update, and administration of databases in accordance with some data model.

DATA MODELS

A data model is a collection of concepts for describing the data in a database.

A schema is a description of a particular collection of data, using a given data model.

DATA MODELS

Relational ← This Course
Key/Value
Graph
Document / Object
Wide-Column / Column-family
Array / Matrix / Vectors
Hierarchical
Network
Multi-Value

EARLY DBMSs

Early database applications were difficult to build and maintain on available DBMSs in the 1960s.
→ Examples: IDS, IMS, CODASYL
→ Computers were expensive, humans were cheap.

Tight coupling between logical and physical layers.

Programmers had to (roughly) know what queries the application would execute before they could deploy the database.

EARLY DBMSs

Ted Codd (Edgar F. Codd) was a mathematician working at IBM Research in the late 1960s.

He saw IBM's developers spending their time rewriting database programs every time the database's schema or layout changed.

Devised the relational model in 1969.



RELATIONAL MODEL

The relational model defines a database abstraction based on relations to avoid maintenance overhead.

Key tenets:
→ Store database in simple data structures (relations).
→ Physical storage left up to the DBMS implementation.
→ Access data through high-level language, DBMS figures out best execution strategy.

RELATIONAL MODEL

Structure: The definition of the database's relations and their contents.

Integrity: Ensure the database's contents satisfy constraints.

Manipulation: Programming interface for accessing and modifying a database's contents.

RELATIONAL MODEL

A relation is an unordered set that contains the relationship of attributes that represent entities.

A tuple is a set of attribute values (also known as its domain) in the relation.
→ Values are (normally) atomic/scalar.
→ The special value NULL is a member of every domain (if allowed).

n-ary Relation = Table with n columns

Artist(name, year, country)
name            year   country
Wu-Tang Clan    1992   USA
Notorious BIG   1992   USA
GZA             1990   USA

RELATIONAL MODEL: PRIMARY KEYS

A relation's primary key uniquely identifies a single tuple.

Some DBMSs automatically create an internal primary key if a table does not define one.

Auto-generation of unique integer primary keys:
→ SEQUENCE (SQL:2003)
→ AUTO_INCREMENT (MySQL)

Artist(name, year, country)
name            year   country
Wu-Tang Clan    1992   USA
Notorious BIG   1992   USA
GZA             1990   USA

Artist(id, name, year, country)
id    name            year   country
123   Wu-Tang Clan    1992   USA
456   Notorious BIG   1992   USA
789   GZA             1990   USA
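
A minimal sketch of auto-generated integer primary keys, assuming SQLite accessed through Python's standard `sqlite3` module. SQLite is a stand-in chosen for this example: it uses neither `SEQUENCE` nor `AUTO_INCREMENT` as named on the slide, but an `INTEGER PRIMARY KEY` column that is assigned a unique integer when the insert omits it. The lowercase table name `artist` is also a choice made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artist ("
    " id INTEGER PRIMARY KEY,"  # SQLite fills this in when the insert omits id
    " name TEXT NOT NULL,"
    " year INTEGER,"
    " country TEXT)"
)
rows = [
    ("Wu-Tang Clan", 1992, "USA"),
    ("Notorious BIG", 1992, "USA"),
    ("GZA", 1990, "USA"),
]
conn.executemany("INSERT INTO artist (name, year, country) VALUES (?, ?, ?)", rows)

ids = [r[0] for r in conn.execute("SELECT id FROM artist ORDER BY id")]

# Reusing an existing key violates the primary-key constraint.
try:
    conn.execute("INSERT INTO artist (id, name) VALUES (?, ?)", (ids[0], "Duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The DBMS, not the application, guarantees that no two tuples share a key.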

RELATIONAL MODEL: FOREIGN KEYS

A foreign key specifies that an attribute from one relation has to map to a tuple in another relation.

Artist(id, name, year, country)
id    name            year   country
123   Wu-Tang Clan    1992   USA
456   Notorious BIG   1992   USA
789   GZA             1990   USA

ArtistAlbum(artist_id, album_id)
artist_id   album_id
123         11
123         22
789         22
456         22

Album(id, name, year)
id   name                year
11   Enter the Wu-Tang   1993
22   St.Ides Mix Tape    1994
33   Liquid Swords       1995
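
The foreign-key rule above can be sketched with SQLite through Python's `sqlite3` module (an assumption of this example, not something the slides prescribe). The table names are lowercased versions of the slide's relations, only a subset of the rows is loaded, and `try_sql` is a hypothetical helper. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, year INTEGER, country TEXT);
CREATE TABLE album  (id INTEGER PRIMARY KEY, name TEXT, year INTEGER);
CREATE TABLE artist_album (
    artist_id INTEGER REFERENCES artist(id),
    album_id  INTEGER REFERENCES album(id),
    PRIMARY KEY (artist_id, album_id)
);
INSERT INTO artist VALUES (123, 'Wu-Tang Clan', 1992, 'USA'), (789, 'GZA', 1990, 'USA');
INSERT INTO album  VALUES (11, 'Enter the Wu-Tang', 1993), (22, 'St.Ides Mix Tape', 1994);
INSERT INTO artist_album VALUES (123, 11), (123, 22), (789, 22);
""")


def try_sql(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# There is no artist 999, so this link would dangle.
dangling_link_ok = try_sql("INSERT INTO artist_album VALUES (999, 11)")
# Artist 123 is still referenced from artist_album.
delete_parent_ok = try_sql("DELETE FROM artist WHERE id = 123")
```

Both statements are rejected, which answers the flat-file question about deleting an artist that still has albums: the DBMS refuses unless the schema says otherwise.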

DATA MANIPULATION LANGUAGES (DML)

Methods to store and retrieve information from a database.

Procedural: ← Relational Algebra
→ The query specifies the (high-level) strategy to find the desired result based on sets / bags.

Non-Procedural (Declarative): ← Relational Calculus
→ The query specifies only what data is wanted and not how to find it.

RELATIONAL ALGEBRA

Fundamental operations to retrieve and manipulate tuples in a relation.
→ Based on set algebra.

Each operator takes one or more relations as its inputs and outputs a new relation.
→ We can "chain" operators together to create more complex operations.

σ   Select
π   Projection
∪   Union
∩   Intersection
–   Difference
×   Product
⋈   Join

RELATIONAL ALGEBRA: SELECT

Choose a subset of the tuples from a relation that satisfies a selection predicate.
→ Predicate acts as a filter to retain only tuples that fulfill its qualifying requirement.
→ Can combine multiple predicates using conjunctions / disjunctions.

Syntax: σ_predicate(R)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a2     103
a3     104

σ_{a_id='a2'}(R)
a_id   b_id
a2     102
a2     103

σ_{a_id='a2' ∧ b_id>102}(R)
a_id   b_id
a2     103

SELECT * FROM R
 WHERE a_id='a2' AND b_id>102;
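
A minimal sketch of the select operator, assuming a relation is modelled as a Python set of `(a_id, b_id)` tuples and a predicate as a function on one tuple. The function name `select` is a choice made for this illustration.

```python
# R(a_id, b_id) from the slide.
R = {("a1", 101), ("a2", 102), ("a2", 103), ("a3", 104)}


def select(relation, predicate):
    """σ_predicate(relation): keep only the tuples for which predicate is true."""
    return {t for t in relation if predicate(t)}


# σ_{a_id='a2'}(R)
only_a2 = select(R, lambda t: t[0] == "a2")

# σ_{a_id='a2' ∧ b_id>102}(R): a conjunction of two predicates
a2_above_102 = select(R, lambda t: t[0] == "a2" and t[1] > 102)
```

The output is again a relation with the same attributes, so it can feed another operator.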

RELATIONAL ALGEBRA: PROJECTION

Generate a relation with tuples that contain only the specified attributes.
→ Can rearrange attributes' ordering.
→ Can manipulate the values.

Syntax: π_{A1,A2,…,An}(R)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a2     103
a3     104

π_{b_id-100,a_id}(σ_{a_id='a2'}(R))
b_id-100   a_id
2          a2
3          a2

SELECT b_id-100, a_id
  FROM R WHERE a_id = 'a2';
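
The chained expression above can be sketched as follows, again assuming relations are Python sets of tuples. `select` and `project` are hypothetical helper names; each output attribute of `project` is given as a function of the input tuple, which covers both reordering and computed values.

```python
R = {("a1", 101), ("a2", 102), ("a2", 103), ("a3", 104)}


def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *exprs):
    """π: build each output tuple by applying every expression to an input tuple."""
    return {tuple(e(t) for e in exprs) for t in relation}


# π_{b_id-100, a_id}(σ_{a_id='a2'}(R))
result = project(
    select(R, lambda t: t[0] == "a2"),
    lambda t: t[1] - 100,  # computed attribute b_id-100
    lambda t: t[0],        # a_id, moved to the second position
)
```

Because the result is a set, tuples that become identical after projection collapse into one; SQL keeps such duplicates unless `DISTINCT` is used.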

RELATIONAL ALGEBRA: UNION

Generate a relation that contains all tuples that appear in either only one or both input relations.

Syntax: (R ∪ S)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a3     103

S(a_id,b_id)
a_id   b_id
a3     103
a4     104
a5     105

(R ∪ S)
a_id   b_id
a1     101
a2     102
a3     103
a3     103
a4     104
a5     105

(SELECT * FROM R)
 UNION ALL
(SELECT * FROM S);
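
The result table above lists (a3, 103) twice, which is what `UNION ALL` (bag semantics) produces. A small sketch, assuming relations are held as Python lists of tuples, contrasts this with set semantics; the variable names are choices made here.

```python
R = [("a1", 101), ("a2", 102), ("a3", 103)]
S = [("a3", 103), ("a4", 104), ("a5", 105)]

# Bag semantics (SQL UNION ALL): concatenate and keep duplicates.
union_all = sorted(R + S)

# Set semantics (SQL UNION): each tuple appears at most once.
union_set = sorted(set(R) | set(S))
```

`union_all` has six tuples, `union_set` five.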

RELATIONAL ALGEBRA: INTERSECTION

Generate a relation that contains only the tuples that appear in both of the input relations.

Syntax: (R ∩ S)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a3     103

S(a_id,b_id)
a_id   b_id
a3     103
a4     104
a5     105

(R ∩ S)
a_id   b_id
a3     103

(SELECT * FROM R)
 INTERSECT
(SELECT * FROM S);

RELATIONAL ALGEBRA: DIFFERENCE

Generate a relation that contains only the tuples that appear in the first and not the second of the input relations.

Syntax: (R – S)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a3     103

S(a_id,b_id)
a_id   b_id
a3     103
a4     104
a5     105

(R – S)
a_id   b_id
a1     101
a2     102

(SELECT * FROM R)
 EXCEPT
(SELECT * FROM S);
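
Intersection and difference map directly onto Python's built-in set operators when a relation is modelled as a set of tuples (an assumption of this sketch):

```python
R = {("a1", 101), ("a2", 102), ("a3", 103)}
S = {("a3", 103), ("a4", 104), ("a5", 105)}

intersection = R & S        # (R ∩ S): tuples present in both inputs
difference = R - S          # (R – S): tuples in R that are not in S
reverse_difference = S - R  # (S – R): difference is not symmetric
```

Swapping the inputs changes the result of difference but not of intersection.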

RELATIONAL ALGEBRA: PRODUCT

Generate a relation that contains all possible combinations of tuples from the input relations.

Syntax: (R × S)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a3     103

S(a_id,b_id)
a_id   b_id
a3     103
a4     104
a5     105

(R × S)
R.a_id   R.b_id   S.a_id   S.b_id
a1       101      a3       103
a1       101      a4       104
a1       101      a5       105
a2       102      a3       103
a2       102      a4       104
a2       102      a5       105
a3       103      a3       103
a3       103      a4       104
a3       103      a5       105

SELECT * FROM R CROSS JOIN S;

SELECT * FROM R, S;
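
A short sketch of the product using `itertools.product` from the Python standard library, assuming relations are lists of tuples. Each output tuple is the R tuple followed by the S tuple, matching the column order R.a_id, R.b_id, S.a_id, S.b_id above.

```python
from itertools import product

R = [("a1", 101), ("a2", 102), ("a3", 103)]
S = [("a3", 103), ("a4", 104), ("a5", 105)]

# (R × S): concatenate every tuple of R with every tuple of S.
cross = [r + s for r, s in product(R, S)]
```

The output size is |R| × |S|, here 3 × 3 = 9 tuples.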

RELATIONAL ALGEBRA: JOIN

Generate a relation that contains all tuples that are a combination of two tuples (one from each input relation) with a common value(s) for one or more attributes.

Syntax: (R ⋈ S)

R(a_id,b_id)
a_id   b_id
a1     101
a2     102
a3     103

S(a_id,b_id)
a_id   b_id
a3     103
a4     104
a5     105

(R ⋈ S)
a_id   b_id
a3     103

SELECT * FROM R NATURAL JOIN S;

SELECT * FROM R JOIN S USING (a_id, b_id);
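
A minimal sketch of a natural join, assuming each tuple is a Python dict keyed by attribute name so that shared attributes can be found by name. `natural_join` is a hypothetical helper and uses a simple nested loop; this is only one of several ways a DBMS might execute a join.

```python
def natural_join(r_rows, s_rows):
    """Join two lists of dicts on all attribute names they share."""
    out = []
    for r in r_rows:
        for s in s_rows:
            shared = r.keys() & s.keys()
            if all(r[k] == s[k] for k in shared):
                out.append({**r, **s})  # shared attributes appear once
    return out


R = [{"a_id": "a1", "b_id": 101}, {"a_id": "a2", "b_id": 102}, {"a_id": "a3", "b_id": 103}]
S = [{"a_id": "a3", "b_id": 103}, {"a_id": "a4", "b_id": 104}, {"a_id": "a5", "b_id": 105}]

joined = natural_join(R, S)
```

Here R and S share both attributes, so only the tuple (a3, 103) survives. If the inputs shared no attribute names at all, the condition would hold for every pair and the function would degenerate into the product.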


RELATIONAL ALGEBRA: EXTRA OPERATORS

Rename (ρ)
Assignment (R←S)
Duplicate Elimination (δ)
Aggregation (γ)
Sorting (τ)
Division (R÷S)

OBSERVATION

Relational algebra still defines the high-level steps of how to compute a query.
→ σ_{b_id=102}(R ⋈ S) vs. (R ⋈ (σ_{b_id=102}(S)))

A better approach is to state the high-level answer that you want the DBMS to compute.
→ Retrieve the joined tuples from R and S where b_id equals 102.
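
The two expressions above return the same tuples but do different amounts of work. The sketch below illustrates this under stated assumptions: the contents of R and S are made up for the example (so that a tuple with b_id = 102 actually joins), the join is a naive nested loop, and "cost" is simply the number of tuple pairs compared.

```python
# Hypothetical contents, chosen so that one tuple with b_id = 102 joins.
R = {("a1", 101), ("a2", 102), ("a3", 103)}
S = {("a2", 102), ("a3", 103), ("a4", 104)}


def join(r, s):
    """Nested-loop natural join of R(a_id, b_id) and S(a_id, b_id).

    Both relations share all attributes, so two tuples join when they are equal.
    Returns the result and the number of pairs compared.
    """
    comparisons = 0
    out = set()
    for rt in r:
        for st in s:
            comparisons += 1
            if rt == st:
                out.add(rt)
    return out, comparisons


def select_b(rel, value):
    """σ_{b_id=value}(rel)"""
    return {t for t in rel if t[1] == value}


# Plan 1: σ_{b_id=102}(R ⋈ S)
joined, cost1 = join(R, S)
plan1 = select_b(joined, 102)

# Plan 2: R ⋈ σ_{b_id=102}(S)
plan2, cost2 = join(R, select_b(S, 102))
```

Both plans yield {("a2", 102)}, but plan 1 compares 9 pairs and plan 2 only 3. With a declarative query, choosing between such plans is the DBMS's job rather than the programmer's.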

RELATIONAL MODEL: QUERIES

The relational model is independent of any query language implementation.

SQL is the de facto standard (many dialects).

for line in file.readlines():
    record = parse(line)
    if record[0] == "GZA":
        print(int(record[1]))

SELECT year FROM artists
 WHERE name = 'GZA';
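
The two snippets above can be run side by side. This sketch assumes Python's standard `csv` and `sqlite3` modules, an in-memory CSV buffer standing in for the flat file, and an in-memory SQLite database standing in for the DBMS; the `artists` table name comes from the query on the slide.

```python
import csv
import io
import sqlite3

ROWS = [("Wu-Tang Clan", 1992, "USA"), ("Notorious BIG", 1992, "USA"), ("GZA", 1990, "USA")]

# Procedural: the application parses the file and scans it record by record.
flat_file = io.StringIO()
csv.writer(flat_file).writerows(ROWS)
flat_file.seek(0)
scan_year = None
for record in csv.reader(flat_file):
    if record[0] == "GZA":
        scan_year = int(record[1])

# Declarative: state the answer wanted and let the DBMS decide how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, year INTEGER, country TEXT)")
conn.executemany("INSERT INTO artists VALUES (?, ?, ?)", ROWS)
sql_year = conn.execute("SELECT year FROM artists WHERE name = 'GZA'").fetchone()[0]
```

Both paths return 1990. The SQL version says nothing about file layout or scan order, so the storage can change without touching the query.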

DATA MODELS

Relational
Key/Value
Graph
Document / Object ← Leading Alternative
Wide-Column / Column-family
Array / Matrix / Vectors
Hierarchical
Network
Multi-Value

DOCUMENT DATA MODEL

Embed data hierarchy into a single object.

Artist        R1(id,…)
ArtistAlbum   R2(artist_id,album_id)
Album         R3(id,…)


Application Code

class Artist {
  int id;
  String name;
  int year;
  Album albums[];
}
class Album {
  int id;
  String name;
  int year;
}

{
  "name": "GZA",
  "year": 1990,
  "albums": [
    {
      "name": "Liquid Swords",
      "year": 1995
    },
    {
      "name": "Beneath the Surface",
      "year": 1999
    }
  ]
}
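
A short sketch of working with the document above, assuming Python's standard `json` module. It shows the nested albums being read without a join, and then the same data flattened into relational-style rows; using the artist name as the key in `album_rows` is purely for illustration.

```python
import json

# The artist document from the slide: albums are nested inside the artist.
doc = json.loads("""
{
  "name": "GZA",
  "year": 1990,
  "albums": [
    {"name": "Liquid Swords", "year": 1995},
    {"name": "Beneath the Surface", "year": 1999}
  ]
}
""")

# Reading an artist's albums needs no join: they travel with the parent object.
album_names = [album["name"] for album in doc["albums"]]

# The same data flattened into (artist, album, year) rows.
album_rows = [(doc["name"], a["name"], a["year"]) for a in doc["albums"]]
```

The nesting fits one artist per album; an album shared by several artists, as in the ArtistAlbum relation, would have to be duplicated inside each artist's document or referenced by id.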

CONCLUSION

Databases are ubiquitous.

Relational algebra defines the primitives for processing queries on a relational database.

We will see relational algebra again when we talk about query optimization + execution.


NEXT CLASS

Modern SQL
→ Make sure you understand basic SQL before the lecture.
