01-relationalmodel
Systems (15-445/645)
Lecture #01
Relational Model & Algebra
FALL 2023 Prof. Andy Pavlo Prof. Jignesh Patel
15-445/645 (Fall 2023)
COURSE LOGISTICS
LECTURE RULES
TODAY’S AGENDA
DATABASE
DATABASE EXAMPLE
FLAT FILE STRAWMAN
FLAT FILES: DATA INTEGRITY
FLAT FILES: IMPLEMENTATION
FLAT FILES: DURABILITY
DATABASE MANAGEMENT SYSTEM
DATA MODELS
Relational
Key/Value ← NoSQL
Graph ← NoSQL
Document / XML / Object ← NoSQL
Wide-Column / Column-family ← NoSQL
Array / Matrix / Vectors ← Machine Learning
Hierarchical ← Obsolete / Legacy / Rare
Network ← Obsolete / Legacy / Rare
Multi-Value ← Obsolete / Legacy / Rare
EARLY DBMSs
Edgar F. Codd
RELATIONAL MODEL
Key tenets:
→ Store database in simple data structures (relations).
→ Physical storage left up to the DBMS implementation.
→ Access data through high-level language, DBMS figures out best execution strategy.
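The second and third tenets can be illustrated with a small sketch using Python's built-in sqlite3 module. It is only an illustration under assumptions: the table reuses the R(a_id, b_id) example that appears in the relational algebra slides, and the index name `idx_r_a_id` is made up here. The same declarative query is run before and after the physical layout changes, and the answer stays the same.

```python
import sqlite3

# In-memory database holding the relation R(a_id, b_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a_id TEXT, b_id INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [("a1", 101), ("a2", 102), ("a2", 103), ("a3", 104)],
)

# The query states what is wanted, not how to find it.
query = "SELECT a_id, b_id FROM R WHERE a_id = 'a2' ORDER BY b_id"
before = conn.execute(query).fetchall()

# Change the physical design by adding an index (hypothetical name).
conn.execute("CREATE INDEX idx_r_a_id ON R (a_id)")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

The application code does not change when the index is added; whether the DBMS uses the index is its own decision.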
RELATIONAL MODEL: PRIMARY KEYS
RELATIONAL MODEL: FOREIGN KEYS
RELATIONAL MODEL: CONSTRAINTS
DATA MANIPULATION LANGUAGES (DML)
Procedural: ← Relational Algebra
→ The query specifies the (high-level) strategy to find the desired result based on sets / bags.
RELATIONAL ALGEBRA
RELATIONAL ALGEBRA: SELECT
Choose a subset of the tuples from a relation that satisfies a selection predicate.
→ Predicate acts as a filter to retain only tuples that fulfill its qualifying requirement.
→ Can combine multiple predicates using conjunctions / disjunctions.

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a2    103
a3    104

σ_{a_id='a2'}(R)
a_id  b_id
a2    102
a2    103

σ_{a_id='a2' ∧ b_id>102}(R)
a_id  b_id
a2    103
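The two selections on this slide can be reproduced with a short Python sketch. It is only an illustration, not how a DBMS implements σ: the relation is modelled as a list of dictionaries, and the helper name `select` is an assumption introduced here.

```python
# Relation R(a_id, b_id) from the slide, modelled as a list of dicts.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a2", "b_id": 103},
    {"a_id": "a3", "b_id": 104},
]


def select(relation, predicate):
    """Keep only the tuples of `relation` for which `predicate` is true."""
    return [t for t in relation if predicate(t)]


# sigma_{a_id='a2'}(R)
only_a2 = select(R, lambda t: t["a_id"] == "a2")

# sigma_{a_id='a2' AND b_id>102}(R): two predicates joined by a conjunction
a2_above_102 = select(R, lambda t: t["a_id"] == "a2" and t["b_id"] > 102)

print(only_a2)
print(a2_above_102)
```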
RELATIONAL ALGEBRA: PROJECTION
Generate a relation with tuples that contain only the specified attributes.
→ Rearrange attributes’ ordering.
→ Remove unwanted attributes.
→ Manipulate values to create derived attributes.

Syntax: Π_{A1,A2,…,An}(R)

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a2    103
a3    104

Π_{b_id-100,a_id}(σ_{a_id='a2'}(R))
b_id-100  a_id
2         a2
3         a2
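A minimal sketch of the projection example, assuming the same list-of-dictionaries encoding of R. The helper `project` and its `columns` argument are hypothetical names: each output attribute is given as a function of the input tuple, which covers reordering, dropping, and deriving attributes.

```python
# Relation R(a_id, b_id) from the slide.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a2", "b_id": 103},
    {"a_id": "a3", "b_id": 104},
]


def project(relation, columns):
    """`columns` maps each output attribute name to a function of the input tuple."""
    return [{name: fn(t) for name, fn in columns.items()} for t in relation]


# Inner selection: sigma_{a_id='a2'}(R)
a2_rows = [t for t in R if t["a_id"] == "a2"]

# Pi_{b_id-100, a_id}: derived attribute first, then a_id (reordered)
result = project(
    a2_rows,
    {"b_id-100": lambda t: t["b_id"] - 100, "a_id": lambda t: t["a_id"]},
)

print(result)
```

The list keeps duplicates (bag semantics); under strict set semantics a projection would also remove duplicate output tuples.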
RELATIONAL ALGEBRA: UNION
Generate a relation that contains all tuples that appear in either one or both of the input relations.

Syntax: (R ∪ S)

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a3    103

S(a_id,b_id)
a_id  b_id
a3    103
a4    104
a5    105

(R ∪ S)
a_id  b_id
a1    101
a2    102
a3    103
a4    104
a5    105

(SELECT * FROM R)
UNION
(SELECT * FROM S);
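The union result above follows set semantics: the tuple (a3, 103) appears in both inputs but only once in the output. A small Python sketch makes the contrast with bag semantics explicit. The variable names are illustrative; the bag version corresponds to what SQL's UNION ALL returns.

```python
from collections import Counter

# R and S from the slide, as lists of (a_id, b_id) tuples.
R = [("a1", 101), ("a2", 102), ("a3", 103)]
S = [("a3", 103), ("a4", 104), ("a5", 105)]

# Set union: duplicates across the inputs collapse (SQL UNION).
set_union = sorted(set(R) | set(S))

# Bag union: multiplicities add up (SQL UNION ALL).
bag_union = sorted((Counter(R) + Counter(S)).elements())

print(set_union)
print(bag_union)
```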
RELATIONAL ALGEBRA: INTERSECTION
Generate a relation that contains only the tuples that appear in both of the input relations.

Syntax: (R ∩ S)

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a3    103

S(a_id,b_id)
a_id  b_id
a3    103
a4    104
a5    105

(R ∩ S)
a_id  b_id
a3    103

(SELECT * FROM R)
INTERSECT
(SELECT * FROM S);
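As a rough check, the INTERSECT query can be run against an in-memory SQLite database from Python and compared with a plain set intersection. This sketch makes one adjustment: the query is written without the parentheses shown on the slide, because SQLite's grammar does not accept them around the operands of a compound query.

```python
import sqlite3

r_rows = [("a1", 101), ("a2", 102), ("a3", 103)]
s_rows = [("a3", 103), ("a4", 104), ("a5", 105)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a_id TEXT, b_id INTEGER)")
conn.execute("CREATE TABLE S (a_id TEXT, b_id INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", r_rows)
conn.executemany("INSERT INTO S VALUES (?, ?)", s_rows)

# SQL version of (R ∩ S)
sql_result = conn.execute(
    "SELECT * FROM R INTERSECT SELECT * FROM S"
).fetchall()

# Same operation on Python sets of tuples
set_result = set(r_rows) & set(s_rows)

print(sql_result)
print(set_result)
```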
RELATIONAL ALGEBRA: DIFFERENCE
Generate a relation that contains only the tuples that appear in the first and not the second of the input relations.

Syntax: (R – S)

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a3    103

S(a_id,b_id)
a_id  b_id
a3    103
a4    104
a5    105

(R – S)
a_id  b_id
a1    101
a2    102

(SELECT * FROM R)
EXCEPT
(SELECT * FROM S);
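Unlike union and intersection, difference depends on the order of its operands. A minimal sketch with Python sets of tuples, using the slide's data, shows both directions:

```python
# R and S from the slide, as sets of (a_id, b_id) tuples.
R = {("a1", 101), ("a2", 102), ("a3", 103)}
S = {("a3", 103), ("a4", 104), ("a5", 105)}

# (R – S): tuples in R that do not appear in S
r_minus_s = R - S

# (S – R): swapping the operands gives a different relation
s_minus_r = S - R

print(sorted(r_minus_s))
print(sorted(s_minus_r))
```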
RELATIONAL ALGEBRA: PRODUCT
Generate a relation that contains all possible combinations of tuples from the input relations.

Syntax: (R × S)

R(a_id,b_id)
a_id  b_id
a1    101
a2    102
a3    103

S(a_id,b_id)
a_id  b_id
a3    103
a4    104
a5    105

(R × S)
R.a_id  R.b_id  S.a_id  S.b_id
a1      101     a3      103
a1      101     a4      104
a1      101     a5      105
a2      102     a3      103
a2      102     a4      104
a2      102     a5      105
a3      103     a3      103
a3      103     a4      104
a3      103     a5      105

SELECT * FROM R CROSS JOIN S;

SELECT * FROM R, S;
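The nine-row result can be reproduced with `itertools.product` from Python's standard library. This is a sketch of the operator's semantics only; each output tuple is the concatenation (R.a_id, R.b_id, S.a_id, S.b_id).

```python
from itertools import product

# R and S from the slide, as lists of (a_id, b_id) tuples.
R = [("a1", 101), ("a2", 102), ("a3", 103)]
S = [("a3", 103), ("a4", 104), ("a5", 105)]

# (R × S): pair every tuple of R with every tuple of S.
cross = [r + s for r, s in product(R, S)]

for row in cross:
    print(row)
```

The output size is |R| × |S|, which is why a product is rarely useful on its own and usually appears together with a filtering predicate.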
RELATIONAL ALGEBRA: JOIN
Generate a relation that contains all tuples that are a combination of two tuples (one from each input relation) with a common value(s) for one or more attributes.

R(a_id,b_id)
a_id  b_id
a1    101
a2    102

S(a_id,b_id,val)
a_id  b_id  val
a3    103   XXX
a4    104   YYY

(R ⋈ S)
a_id  b_id  val
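A natural join can be sketched as a nested loop that keeps a pair of tuples only when they agree on every attribute name they share. This is a naive illustration, not the join algorithm a DBMS would choose, and `natural_join` is a hypothetical helper name. One assumption about the data: R is extended with the (a3, 103) row it carries on the set-operator slides, so that the join has a match to show.

```python
# R(a_id, b_id); the third row is an assumption borrowed from earlier slides.
R = [
    {"a_id": "a1", "b_id": 101},
    {"a_id": "a2", "b_id": 102},
    {"a_id": "a3", "b_id": 103},
]

# S(a_id, b_id, val) with the two rows shown on the slide.
S = [
    {"a_id": "a3", "b_id": 103, "val": "XXX"},
    {"a_id": "a4", "b_id": 104, "val": "YYY"},
]


def natural_join(r_rows, s_rows):
    """Combine tuples that agree on all attributes common to both inputs."""
    out = []
    for r in r_rows:
        for s in s_rows:
            common = r.keys() & s.keys()
            if all(r[k] == s[k] for k in common):
                # Shared attributes appear once in the output tuple.
                out.append({**r, **s})
    return out


joined = natural_join(R, S)
print(joined)
```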
RELATIONAL ALGEBRA: EXTRA OPERATORS
Rename (ρ)
Assignment (R←S)
Duplicate Elimination (δ)
Aggregation (γ)
Sorting (τ)
Division (R÷S)
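Three of these operators are easy to sketch on a small bag of tuples in Python. The data below is an assumption: it is the R(a_id, b_id) relation from the SELECT slide with one row deliberately duplicated, and the aggregation shown (a COUNT per a_id) is just one example of γ.

```python
from collections import defaultdict

# Bag of (a_id, b_id) tuples; ("a2", 103) is duplicated on purpose.
R = [("a1", 101), ("a2", 102), ("a2", 103), ("a2", 103), ("a3", 104)]

# Duplicate elimination (δ): keep the first occurrence of each tuple.
deduped = list(dict.fromkeys(R))

# Aggregation (γ): count tuples per a_id after deduplication.
counts = defaultdict(int)
for a_id, _b_id in deduped:
    counts[a_id] += 1

# Sorting (τ): order tuples by b_id, descending.
by_b_id_desc = sorted(deduped, key=lambda t: t[1], reverse=True)

print(deduped)
print(dict(counts))
print(by_b_id_desc)
```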
OBSERVATION
RELATIONAL MODEL: QUERIES
DATA MODELS
Relational
Key/Value
Graph
Document / XML / Object ← Leading Alternative
Wide-Column / Column-family
Array / Matrix / Vectors ← Current Hotness
Hierarchical
Network
Multi-Value
DOCUMENT DATA MODEL
Artist R1(id,…)
⨝
ArtistAlbum R2(artist_id,album_id)
⨝
Album R3(id,…)
DOCUMENT DATA MODEL
Application Code
class Artist {
  int id;
  String name;
  int year;
  Album albums[];
}
class Album {
  int id;
  String name;
  int year;
}

{
  "name": "GZA",
  "year": 1990,
  "albums": [
    {
      "name": "Liquid Swords",
      "year": 1995
    },
    {
      "name": "Beneath the Surface",
      "year": 1999
    }
  ]
}
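The relationship between the two representations can be sketched in Python: three relational tables (Artist, ArtistAlbum, Album) are combined into one nested document per artist. The names and years come from the slide; the id values and the helper `to_document` are assumptions made for illustration.

```python
import json

# Relational form. The id values are hypothetical.
artists = [{"id": 1, "name": "GZA", "year": 1990}]
albums = [
    {"id": 10, "name": "Liquid Swords", "year": 1995},
    {"id": 20, "name": "Beneath the Surface", "year": 1999},
]
artist_album = [
    {"artist_id": 1, "album_id": 10},
    {"artist_id": 1, "album_id": 20},
]


def to_document(artist):
    """Embed the artist's albums directly inside the artist record."""
    album_ids = {
        link["album_id"]
        for link in artist_album
        if link["artist_id"] == artist["id"]
    }
    nested = [
        {"name": a["name"], "year": a["year"]}
        for a in albums
        if a["id"] in album_ids
    ]
    return {"name": artist["name"], "year": artist["year"], "albums": nested}


doc = to_document(artists[0])
print(json.dumps(doc, indent=2))
```

The document form answers "give me an artist with all albums" without a join, at the cost of duplicating album data if an album belongs to several artists.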
VECTOR DATA MODEL
Album(id, name, year)
id  name               year
11  Enter the Wu-Tang  1993
22  St.Ides Mix Tape   1994

Transformer

Embeddings
Id1 → [0.32, 0.78, 0.30, ...]
Id2 → [0.99, 0.19, 0.81, ...]
Id3 → [0.01, 0.18, 0.85, ...]

Vector Index
HNSW, IVFFlat
Meta Faiss, Spotify Annoy
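The query a vector index answers is a nearest-neighbor search over the embeddings. The sketch below uses an exhaustive scan with cosine similarity, which is not HNSW or IVFFlat; those index structures exist to approximate this result without comparing against every vector. The embeddings are cut to the three components visible on the slide, and the query vector is made up.

```python
import math

# Embeddings truncated to the three components shown on the slide.
embeddings = {
    "Id1": [0.32, 0.78, 0.30],
    "Id2": [0.99, 0.19, 0.81],
    "Id3": [0.01, 0.18, 0.85],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, k=1):
    """Return the ids of the k embeddings most similar to `query`."""
    ranked = sorted(
        embeddings,
        key=lambda key: cosine_similarity(query, embeddings[key]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical query embedding.
query = [0.0, 0.2, 0.9]
print(nearest(query, k=3))
```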
CONCLUSION
NEXT CLASS
Modern SQL
→ Make sure you understand basic SQL before the lecture.