
S4 - RM Ra SQL

The document discusses the relational model, relational algebra, and SQL. It begins by introducing the relational model for modeling and representing data using relations. It then covers relational algebra for evaluating queries on relational data using operations like selection, projection, joins, and more. Finally, it mentions that SQL allows users to ask queries on relational data. Key concepts covered include the relational model, relations, schemas, instances, keys, integrity constraints, and the basic operations of relational algebra.


Relational Model, Relational Algebra, & SQL

CS165 Sections - Fall 2016


Today
How we model and represent our data: the relational model
How we evaluate queries on our data: relational algebra
What queries we can ask on our data: SQL

RELATIONAL  MODEL

Why  Study  the  Relational  Model?  

Simple  yet  expressive

Widely  used
Vendors:  IBM,  Microsoft,  Oracle,  etc.
Relational  Database:  Definitions
Relational database: a set of relations.
Relation: made up of 2 parts:
Schema: specifies the name of the relation, plus the name and type of each column.
E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
Instance: a table, with rows and columns.
#rows = cardinality
#fields = degree / arity
Can think of a relation as a set of rows or tuples,
i.e., all rows are distinct, with no order among rows.
Example:  Students  relation
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8

Cardinality = 3, arity = 5, all rows distinct.

Do all values in each column of a relation instance have to be distinct?
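
A minimal Python sketch of the definitions above, assuming a relation instance is modelled as a set of named tuples (the `Student` type is an illustrative choice, not part of the slides). It shows that a duplicate row is absorbed, and that a single column may still repeat values:

```python
from collections import namedtuple

# Schema: relation name plus the name of each column.
Student = namedtuple("Student", ["sid", "name", "login", "age", "gpa"])

# Instance: a set of tuples, so rows are distinct and unordered.
students = {
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53688", "Smith", "smith@cs", 18, 3.2),
    Student("53650", "Smith", "smith@math", 19, 3.8),
    Student("53666", "Jones", "jones@cs", 18, 3.4),  # exact duplicate, absorbed by the set
}

cardinality = len(students)         # number of rows
arity = len(Student._fields)        # number of fields
names = [s.name for s in students]  # one column, kept as a list so repeats survive

print(cardinality, arity)           # 3 5
print(len(names), len(set(names)))  # 3 2: "Smith" appears twice in the name column
```

So whole rows must be distinct, but the values within one column need not be.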
Keys
Keys are a way to associate tuples in different relations.
Keys are one form of integrity constraint (IC).

Enrolled (sid is a FOREIGN key)
sid    cid     grade
53666  15-101  C
53666  18-203  B
53650  15-112  A
53666  15-105  B

Students (sid is the PRIMARY key)
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8

Primary  Keys
A set of fields is a superkey if:
No two distinct tuples can have the same values in all key fields.
A set of fields is a key for a relation if:
It is a superkey.
No proper subset of the fields is a superkey.
What if there is more than one key for a relation?
Each such key is a candidate key.
One of the candidate keys is chosen (by the DBA) to be the primary key.

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8
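
The two definitions can be checked mechanically against an instance. The sketch below is only illustrative (the helper names `is_superkey` and `is_key` are made up here), and it carries one caveat: a single instance can refute a key, but it cannot prove one, because a key is a constraint on all legal instances.

```python
from itertools import combinations

FIELDS = ("sid", "name", "login", "age", "gpa")
STUDENTS = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@cs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]

def is_superkey(rows, fields, attrs):
    """True if no two distinct rows agree on every attribute in attrs."""
    idx = [fields.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(set(rows))

def is_key(rows, fields, attrs):
    """A key is a superkey none of whose proper, non-empty subsets is a superkey."""
    if not is_superkey(rows, fields, attrs):
        return False
    return not any(
        is_superkey(rows, fields, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(STUDENTS, FIELDS, ["sid"]))          # True
print(is_superkey(STUDENTS, FIELDS, ["name"]))         # False: two rows share "Smith"
print(is_superkey(STUDENTS, FIELDS, ["sid", "name"]))  # True
print(is_key(STUDENTS, FIELDS, ["sid", "name"]))       # False: {sid} alone is a superkey
```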
Referential  Integrity
Can we insert {50000, 18-203, A} in Enrolled?

What should happen if we delete Jones from Students?

What should happen if we delete a row from Enrolled?

Enrolled
sid    cid     grade
53666  15-101  C
53666  18-203  B
53650  15-112  A
53666  15-105  B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8
RELATIONAL  ALGEBRA

Relational  Algebra
Relational  Query  Languages
Selection  &  Projection
Union,  Set  Difference  &  Intersection
Cross  product  &  Joins
Examples

Relational  Query  Languages
Query languages: allow manipulation and retrieval of data from a database.
Relational  model  supports  simple,  powerful  QLs:
Strong  formal  foundation  based  on  logic.
Allows  for  much  optimization.
Query  Languages  !=  programming  languages!
QLs  not  expected  to  be  “Turing  complete”.
QLs  not  intended  to  be  used  for  complex  calculations.
QLs  support  easy,  efficient  access  to  large  data  sets.

Relational  Algebra
Preliminaries
A query is applied to relation instances, and the result of a query is also a relation instance.
Schemas of input relations for a query are fixed (but the query will run over any legal instance).
The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs.
Positional vs. named-field notation:
Positional notation is easier for formal definitions; named-field notation is more readable.
Both are used in SQL.
Relational  Algebra:  5  Basic  Operations
Selection (σ): selects a subset of rows from a relation (horizontal).
Projection (π): retains only wanted columns from a relation (vertical).
Cross-product (×): allows us to combine two relations.
Set-difference (−): tuples in R1, but not in R2.
Union (∪): tuples in R1 and/or in R2.

Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
Example Instances

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Boats
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Projection  (π)
Examples: πage(S2), πsname,rating(S2)
Retains only attributes that are in the “projection list”.
Schema of result: exactly the fields in the projection list, with the same names that they had in the input relation.
The projection operator has to eliminate duplicates. (How do they arise? Why remove them?)
Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)
Projection (π): examples

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

πsname,rating(S2)
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

πage(S2)
age
35.0
55.5
Selection (σ)
Selects rows that satisfy a selection condition.
Result is a relation.
Schema of result is the same as that of the input relation.
Do we need to do duplicate elimination?

σrating>8(S2)
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

πsname,rating(σrating>8(S2))
sname  rating
yuppy  9
rusty  10
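
The two operators can be sketched in a few lines of Python, assuming relations are sets of plain tuples with a separate list of field names (the helpers `select` and `project` are illustrative names). Building the result as a set performs the duplicate elimination that projection requires, which is why πage(S2) has only two rows:

```python
S2_FIELDS = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate; the schema is unchanged."""
    return {row for row in rows if predicate(row)}

def project(rows, fields, wanted):
    """Projection: keep only the wanted columns; the set removes duplicate rows."""
    idx = [fields.index(name) for name in wanted]
    return {tuple(row[i] for i in idx) for row in rows}

ages = project(S2, S2_FIELDS, ["age"])
high_rated = select(S2, lambda row: row[S2_FIELDS.index("rating")] > 8)
names_ratings = project(high_rated, S2_FIELDS, ["sname", "rating"])

print(sorted(ages))           # [(35.0,), (55.5,)]
print(sorted(names_ratings))  # [('rusty', 10), ('yuppy', 9)]
```

Selection never needs duplicate elimination: it returns a subset of rows that were already distinct.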


Union and Set-Difference

All of these operations take two input relations, which must be union-compatible:
Same number of fields.
“Corresponding” fields have the same type.

For which, if any, is duplicate elimination required?


Union

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

S1 ∪ S2
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0
Set Difference

S1 − S2
sid  sname   rating  age
22   dustin  7       45.0

S2 − S1
sid  sname  rating  age
28   yuppy  9       35.0
44   guppy  5       35.0
Compound  Operator:  Intersection
In addition to the 5 basic operators, there are several additional “Compound Operators”.
These add no computational power to the language, but are useful shorthands.
Can be expressed solely with the basic ops.

Intersection takes two input relations, which must be union-compatible.
Q: How to express it using basic operators?
R ∩ S = R − (R − S)
Intersection

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

S1 ∩ S2
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
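
A quick check of the set operators on S1 and S2, assuming relations are Python sets of tuples (so union-compatibility is simply taken for granted here). It confirms that intersection is expressible with set difference alone:

```python
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

union = S1 | S2                        # the shared rows appear only once
s1_minus_s2 = S1 - S2
s2_minus_s1 = S2 - S1
intersection_via_diff = S1 - (S1 - S2)  # R ∩ S = R − (R − S)

print(len(union))                        # 5
print(s1_minus_s2)                       # {(22, 'dustin', 7, 45.0)}
print(intersection_via_diff == S1 & S2)  # True
```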
Cross-Product
S1 × R1: each row of S1 is paired with each row of R1.
Q: How many rows are in the result?
Result schema has one field per field of S1 and R1, with field names “inherited” if possible.
May have a naming conflict: both S1 and R1 have a field with the same name.
In this case, can use the renaming operator:

ρ(C(1 → sid1, 5 → sid2), S1 × R1)


Cross Product Example

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 × R1 =
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
Compound Operator: Join
Joins are compound operators involving cross product, selection, and (sometimes) projection.
Most common type of join is a “natural join” (often just called “join”). R ⋈ S conceptually is:
1. Compute R × S.
2. Select rows where attributes that appear in both relations have equal values.
3. Project all unique attributes and one copy of each of the common ones.
Note: usually done much more efficiently than this.
Useful for putting “normalized” relations back together.
Natural Join Example

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 ⋈ R1 =
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96
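
The three conceptual steps of the natural join can be written out literally. This is a deliberately naive sketch (the function `natural_join` and its field-list arguments are illustrative), not how a real system evaluates joins:

```python
from itertools import product

S1_FIELDS = ("sid", "sname", "rating", "age")
R1_FIELDS = ("sid", "bid", "day")
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def natural_join(left, left_fields, right, right_fields):
    """Cross product, then equality on shared attributes, then drop the duplicate copies."""
    common = [f for f in left_fields if f in right_fields]
    keep_right = [i for i, f in enumerate(right_fields) if f not in common]
    out_fields = tuple(left_fields) + tuple(right_fields[i] for i in keep_right)
    result = set()
    for l, r in product(left, right):  # step 1: cross product
        if all(l[left_fields.index(f)] == r[right_fields.index(f)] for f in common):  # step 2
            result.add(tuple(l) + tuple(r[i] for i in keep_right))  # step 3
    return out_fields, result

cross_size = len(list(product(S1, R1)))
fields, joined = natural_join(S1, S1_FIELDS, R1, R1_FIELDS)

print(cross_size)  # 6
print(fields)      # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
print(sorted(joined))
```

The cross product has 3 × 2 = 6 rows; only the two rows with matching sid survive the selection.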
Natural Join Example

Step 1: compute S1 × R1.
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96

Step 2 (σ): keep the rows in which the two sid fields are equal (the first and the last row).

Step 3 (π): keep one copy of sid.

S1 ⋈ R1 =
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96
Other  Types  of  Joins
Condition Join (or “theta-join”):
R ⋈c S = σc(R × S)

S1 ⋈_{S1.sid<R1.sid} R1 =
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96

Result schema same as that of cross-product.
May have fewer tuples than cross-product.
Equi-Join: condition c contains only a conjunction of equalities.
Examples

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Boats
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red
Examples
1.  Find  names  of  sailors  who  have  reserved  boat  #103

2.  Find  names  of  sailors  who  have  reserved  a  red  boat

3.  Find  sailors  who  have  reserved  a  red  or  a  green  boat

4. Find sailors who have reserved a red and a green boat

Answers
1.  Find  names  of  sailors  who  have  reserved  boat  #103

Solution  1:       πsname((σbid=103Reserves)⋈ Sailors)

Solution  2: πsname(σbid=103(Reserves⋈ Sailors))
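
Both solutions can be evaluated on the example instance to see that they agree. The sketch below assumes rows are Python dicts keyed by attribute name; `select`, `project`, and `natural_join` are illustrative helpers, not a library API:

```python
SAILORS = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
RESERVES = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]

def select(rows, predicate):
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(left, right):
    """Pair up rows that agree on every attribute name they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

# Solution 1: select first, so the join sees one Reserves row instead of two.
solution1 = project(
    natural_join(select(RESERVES, lambda r: r["bid"] == 103), SAILORS), ["sname"]
)
# Solution 2: join first, then select.
solution2 = project(
    select(natural_join(RESERVES, SAILORS), lambda r: r["bid"] == 103), ["sname"]
)

print(solution1, solution2)  # {('rusty',)} {('rusty',)}
```

The answers are identical; the difference is only in the size of the intermediate results, which is what a query optimizer tries to minimize.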


Answers
2.  Find  names  of  sailors  who  have  reserved  a  red  boat

Information about boat color is only available in Boats, so we need an extra join:

πsname((σcolor=‘red’Boats) ⋈ Reserves ⋈ Sailors)

A more efficient solution:
πsname(πsid(πbid(σcolor=‘red’Boats) ⋈ Reserves) ⋈ Sailors)

A query optimizer can find this given the first solution!


Answers
3.  Find  sailors  who  have  reserved  a  red  or  a  green  boat

Can  identify  all  red  or  green  boats,  then  find  sailors  
who have  reserved  one  of  these  boats:

ρ(Tempboats,(σcolor=‘red’  ∨ color=‘green’Boats))

πsname(Tempboats ⋈ Reserves  ⋈ Sailors)


Answers
4.  Find  sailors  who  have  reserved  a  red  and a  green  boat

Previous approach won’t work! Must identify sailors who have reserved red boats, sailors who have reserved green boats, then find the intersection (note that sid is a key for Sailors):

ρ(Tempred, πsid((σcolor=‘red’Boats) ⋈ Reserves))

ρ(Tempgreen, πsid((σcolor=‘green’Boats) ⋈ Reserves))

πsname((Tempred ∩ Tempgreen) ⋈ Sailors)


More  examples  …
1. Find (the name of) all sailors whose rating is above 9
2. Find all sailors who reserved a boat prior to November 1, 1996
3. Find (the names of) all boats that have been reserved at least once
4. Find all pairs of sailors with the same rating
5. Find all pairs of sailors in which the older sailor has a lower rating
Answers  …
1. Find (the name of) all sailors whose rating is above 9

πsname(σrating>9(Sailors))

Answers …
2. Find all sailors who reserved a boat prior to November 1, 1996

πsname(Sailors ⋈ σday<'11/1/96'(Reserves))

Answers …
3. Find (the names of) all boats that have been reserved at least once

πbname(Boats ⋈ Reserves)

Answers …
4. Find all pairs of sailors with the same rating

ρ(S1(1 → sid1, 2 → sname1, 3 → rating1, 4 → age1), Sailors)

ρ(S2(1 → sid2, 2 → sname2, 3 → rating2, 4 → age2), Sailors)

πsname1,sname2(S1 ⋈_{rating1=rating2 ∧ sid1≠sid2} S2)

Answers …
5. Find all pairs of sailors in which the older sailor has a lower rating

πsname1,sname2(S1 ⋈_{age1>age2 ∧ rating1<rating2} S2)
Relational  Algebra
Relational  Query  Languages
Selection  &  Projection
Union,  Set  Difference  &  Intersection
Cross  product  &  Joins
Examples
Division  (additional  material)

Last Compound  Operator:  Division
Useful for expressing “for all” queries like:
Find sids of sailors who have reserved all boats.
For A/B, the attributes of B must be a subset of the attributes of A.
May need to “project” to make this happen.
E.g., let A have 2 fields, x and y; B have only field y:

A/B = { ⟨x⟩ | ∀ ⟨y⟩ ∈ B (∃ ⟨x, y⟩ ∈ A) }

A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A.
Examples of Division A/B

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1: pno = p2
B2: pno = p2, p4
B3: pno = p1, p2, p4

A/B1: sno = s1, s2, s3, s4
A/B2: sno = s1, s4
A/B3: sno = s1
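
The definition of division translates almost word for word into a set comprehension. A minimal sketch, assuming A is a set of (sno, pno) pairs and each B is a set of pno values (`divide` is an illustrative name):

```python
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

def divide(a, b):
    """All x such that, for every y in b, the pair (x, y) is in a."""
    xs = {x for x, _ in a}
    return {x for x in xs if all((x, y) in a for y in b)}

print(sorted(divide(A, B1)))  # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, B2)))  # ['s1', 's4']
print(sorted(divide(A, B3)))  # ['s1']
```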
Expressing  A/B  Using  Basic  Operators
Division is not an essential op; just a useful shorthand.
(Also true of joins, but joins are so common that systems implement joins specially.)
Idea: for A/B, compute all x values that are not “disqualified” by some y value in B.
An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.

Disqualified x values: πx((πx(A) × B) − A)

A/B: πx(A) − Disqualified x values
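
The same computation, using only projection, cross product, and set difference. This is a sketch under the assumption that relations are sets of tuples, with B = {p1, p2, p4}; the names `project_x`, `t1`, and `disqualified` are illustrative:

```python
from itertools import product

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B = {("p1",), ("p2",), ("p4",)}

def project_x(rows):
    """π_x: keep only the first field of each (x, y) tuple."""
    return {(x,) for x, _ in rows}

all_x = project_x(A)
t1 = {x + y for x, y in product(all_x, B)}  # π_x(A) × B: every pairing that would be needed
disqualified = project_x(t1 - A)            # x values with at least one pairing missing from A
quotient = all_x - disqualified             # A/B

print(sorted(t1 - A))        # [('s2', 'p4'), ('s3', 'p1'), ('s3', 'p4'), ('s4', 'p1')]
print(sorted(disqualified))  # [('s2',), ('s3',), ('s4',)]
print(quotient)              # {('s1',)}
```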
Expressing A/B: πsno(A) − πsno((πsno(A) × B) − A)

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B: pno = p1, p2, p4

T1 = πsno(A) × B
sno  pno
s1   p1
s1   p2
s1   p4
s2   p1
s2   p2
s2   p4
s3   p1
s3   p2
s3   p4
s4   p1
s4   p2
s4   p4

T1 − A
sno  pno
s2   p4
s3   p1
s3   p4
s4   p1

T2 = πsno(T1 − A): sno = s2, s3, s4

A/B = πsno(A) − T2 = {s1, s2, s3, s4} − {s2, s3, s4} = {s1}


Example  of  Division
Find the names of sailors who have reserved all boats

Uses division; schemas of the input relations to / must be carefully chosen:

ρ(Tempsids, (πsid,bid Reserves) / (πbid Boats))

πsname(Tempsids ⋈ Sailors)

To find sailors who have reserved all “Interlake” boats:

..... / πbid(σbname='Interlake' Boats)
SQL

Moving  on  to  SQL
Database  Management  Systems  (DBMS)  store  and  
manage  large  quantities  of  data
We  want  an  intuitive  way  to  ask  questions  (queries)
You have been taught procedural languages (C, Java), which specify how to solve a problem (or answer a question).
Now,  we  talk  about  SQL
SQL  is  a  declarative  query  language
We  ask  what  we  want  and  the  DBMS  is  going  to  deliver!
SQL  -­‐ A  language  for  Relational  DBs
SQL* (a.k.a.  “Sequel”),  standard  language
Data  Definition  Language  (DDL)
create,  modify,  delete  relations
specify  constraints
administer  users,  security,  etc.
Data  Manipulation  Language  (DML)
Specify  queries  to  find  tuples  that  satisfy  criteria
add,  modify,  remove  tuples

*  Structured  Query  Language


Reiterate  some  terminology
sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

Relation (or table): the whole table above.
Row (or tuple): one line of the table, e.g. (53666, Jones, jones@cs, 18, 3.4).
Column (or attribute): one named field of every row, e.g. name.
Reiterate  some  terminology
Primary Key (PK)

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

The PK of a relation is the column (or the group of columns) that uniquely identifies a row.

In other words:
Two rows cannot have the same PK.
SQL  Overview
CREATE TABLE <name> ( <field> <domain>, … )

INSERT INTO <name> (<field names>)
VALUES (<field values>)

DELETE FROM <name>
WHERE <condition>

UPDATE <name>
SET <field name> = <value>
WHERE <condition>

SELECT <fields>
FROM <name>
WHERE <condition>
GROUP BY <fields>
HAVING <condition>
ORDER BY <fields>
Creating  Relations  in  SQL
Creates the Students relation.
Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa FLOAT)
Table  Creation  (continued)
Another example: the Enrolled table holds information about courses students take.

CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2))
Primary  and  Candidate  Keys  in  SQL
Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
Keys must be used carefully!

“For a given student and course, there is a single grade.”

CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid))

vs.

“Students can take only one course, and no two students in a course receive the same grade.”

CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid),
UNIQUE (cid, grade))
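
The difference between the two declarations shows up as soon as rows are inserted. The sketch below runs both versions in SQLite through Python's standard `sqlite3` module. SQLite is an assumption made here for convenience (the slides do not name a system), and `try_inserts` and the sample rows are illustrative:

```python
import sqlite3

PAIR_KEY = """CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid))"""
SID_KEY = """CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid),
     UNIQUE (cid, grade))"""

ROWS = [
    ("53666", "15-101", "C"),
    ("53666", "18-203", "B"),  # same student, second course
    ("53650", "15-101", "C"),  # different student, same course and grade
    ("53666", "15-101", "A"),  # same student and course, second grade
]

def try_inserts(ddl, rows):
    """Create Enrolled in a fresh in-memory database; return the rows a key constraint rejects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    conn.close()
    return rejected

rejected_pair_key = try_inserts(PAIR_KEY, ROWS)
rejected_sid_key = try_inserts(SID_KEY, ROWS)

print(rejected_pair_key)      # only the last row: a second grade for the same (sid, cid)
print(len(rejected_sid_key))  # 3: every row after the first violates a key
```

With PRIMARY KEY (sid, cid) a student may take several courses; with PRIMARY KEY (sid) the second row is already rejected, which is rarely what was intended.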
Foreign  Keys,  Referential  Integrity
Foreign key: set of fields in one relation that is used to “refer” to a tuple in another relation.
Must correspond to the primary key of the other relation.
Like a “logical pointer”.

If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references).
Foreign  Keys  in  SQL
Example: only students listed in the Students relation should be allowed to enroll for courses.
sid is a foreign key referring to Students:
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students)

Enrolled
sid    cid     grade
53666  15-101  C
53666  18-203  B
53650  15-112  A
53666  15-105  B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8
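
The earlier referential-integrity questions can be tried out directly. This sketch again assumes SQLite via Python's `sqlite3` module, where foreign keys are only enforced after `PRAGMA foreign_keys = ON`. The referenced column is spelled out as `Students(sid)`, and no ON DELETE action is declared, so the default behaviour (reject the change) applies. The helper `violates` is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless this is set
conn.execute("""CREATE TABLE Students
    (sid CHAR(20) PRIMARY KEY, name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)""")
conn.execute("""CREATE TABLE Enrolled
    (sid CHAR(20), cid CHAR(20), grade CHAR(2),
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (sid) REFERENCES Students(sid))""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@cs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "15-101", "C"),
    ("53666", "18-203", "B"),
    ("53650", "15-112", "A"),
    ("53666", "15-105", "B"),
])

def violates(sql, params=()):
    """Run one statement; report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# Insert an enrollment for a student who does not exist.
dangling_insert = violates("INSERT INTO Enrolled VALUES (?, ?, ?)", ("50000", "18-203", "A"))
# Delete Jones, who is still referenced from Enrolled.
delete_referenced = violates("DELETE FROM Students WHERE name = ?", ("Jones",))
# Delete a row from Enrolled: nothing refers to it, so this is always allowed.
delete_enrolled = violates("DELETE FROM Enrolled WHERE cid = ?", ("15-112",))

print(dangling_insert, delete_referenced, delete_enrolled)  # True True False
```

Rejecting the delete is only one possible policy; SQL also allows cascading the delete or setting the referencing field to a default or null value.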
Adding  and  Deleting  Tuples
Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@cs', 18, 3.2)

Can delete all tuples satisfying some condition (e.g., name = Smith):
DELETE
FROM Students S
WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!
The  simplest  SQL  query
“Find all contents of a table”
In this example: “Find all info for all students”

SELECT *
FROM Students S

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2
53777  White  white@cs  19   4.0

To find just names and logins, replace the first line:

SELECT S.name, S.login
Show  specific  columns
“Find name and login for all students”

SELECT S.name, S.login
FROM Students S

name   login
Jones  jones@cs
Smith  smith@ee
White  white@cs

This is called: “Project name and login from table Students”
Show  specific  rows
“Find all 18 year old students”

SELECT *
FROM Students S
WHERE S.age=18

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

This is called: “Select students with age 18.”


Querying  Multiple  Relations
Can specify a join over two tables as follows:
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'

Enrolled
sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

result =
S.name  E.cid
Jones   History105
Basic SQL Query

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

relation-list: a list of relation names, possibly with a range-variable after each name.
target-list: a list of attributes of tables in relation-list.
qualification: comparisons combined using AND, OR and NOT.
Comparisons are Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠.
DISTINCT: optional keyword indicating that the answer should not contain duplicates.
In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”.)
Query  Semantics
Conceptually, a SQL query can be computed:
1. FROM: compute the cross-product of tables (e.g., Students and Enrolled).
2. WHERE: check conditions, discard tuples that fail (called “selection”).
3. SELECT: delete unwanted fields (called “projection”).
4. If DISTINCT is specified, eliminate duplicate rows.
Probably the least efficient way to compute a query!
Query Optimization helps us find more efficient strategies to get the same answer.
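
The conceptual steps above can be followed literally in Python for the Students/Enrolled query (names joined with courses graded 'B'). This is a sketch of the semantics only, assuming rows are plain tuples in schema order, and it is exactly the inefficient strategy the slide warns about:

```python
from itertools import product

# Students(sid, name, login, age, gpa) and Enrolled(sid, cid, grade)
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
]
enrolled = [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

# 1. FROM: cross product; positions 0-4 come from Students, 5-7 from Enrolled.
cross = [s + e for s, e in product(students, enrolled)]
# 2. WHERE S.sid = E.sid AND E.grade = 'B': discard the rows that fail.
kept = [row for row in cross if row[0] == row[5] and row[7] == "B"]
# 3. SELECT S.name, E.cid: drop the unwanted fields.
result = [(row[1], row[6]) for row in kept]
# 4. DISTINCT (not requested in this query) would remove duplicate rows here.

print(len(cross), len(kept))  # 8 1
print(result)                 # [('Jones', 'History105')]
```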
Step  1  – Cross  Product
Combine with cross-product all tables of the FROM clause.

S.sid  S.name  S.login   S.age  S.gpa  E.sid  E.cid        E.grade
53666  Jones   jones@cs  18     3.4    53831  Carnatic101  C
53666  Jones   jones@cs  18     3.4    53831  Reggae203    B
53666  Jones   jones@cs  18     3.4    53650  Topology112  A
53666  Jones   jones@cs  18     3.4    53666  History105   B
53688  Smith   smith@ee  18     3.2    53831  Carnatic101  C
53688  Smith   smith@ee  18     3.2    53831  Reggae203    B
53688  Smith   smith@ee  18     3.2    53650  Topology112  A
53688  Smith   smith@ee  18     3.2    53666  History105   B

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'
Step  2  -­‐ Discard  tuples  that  fail  predicate
Make sure the WHERE clause is true!
Of the eight rows of the cross product, only one has S.sid=E.sid and E.grade='B':

S.sid  S.name  S.login   S.age  S.gpa  E.sid  E.cid       E.grade
53666  Jones   jones@cs  18     3.4    53666  History105  B

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'
Step  3  -­‐ Discard  Unwanted  Columns
Show only what is in the SELECT clause.

S.name  E.cid
Jones   History105

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'
Was this a fast way to evaluate:
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='B'

If not, what is an efficient way?

Now the Details…
We will use these instances of relations in our examples.

Reserves
sid  bid  day
22   101  10/10/96
95   103  11/12/96

Sailors
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
95   Bob     3       63.5

Boats
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Example  Schemas
CREATE TABLE Sailors (sid INTEGER,
sname CHAR(20), rating INTEGER, age REAL,
PRIMARY KEY (sid))

CREATE TABLE Boats (bid INTEGER,
bname CHAR(20), color CHAR(10),
PRIMARY KEY (bid))

CREATE TABLE Reserves (sid INTEGER,
bid INTEGER, day DATE,
PRIMARY KEY (sid, bid, day),
FOREIGN KEY (sid) REFERENCES Sailors,
FOREIGN KEY (bid) REFERENCES Boats)
Another  Join  Query
SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid
AND bid=103

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  95     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  95     103  11/12/96
95     Bob     3       63.5  22     101  10/10/96
95     Bob     3       63.5  95     103  11/12/96
Range  Variables  (1/2)
Can associate “range variables” with the tables in the FROM clause.
Saves writing, makes queries easier to understand.
Needed when ambiguity could arise,
for example, if the same table is used multiple times in the same FROM (called a “self-join”).

SELECT sname
FROM Sailors, Reserves
WHERE Sailors.sid=Reserves.sid AND bid=103

Can be rewritten using range variables as:

SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND bid=103
Range  Variables  (2/2)
Here  is  an  example  where  range  variables  are  required  
(self-­‐join  example):
SELECT x.sname, x.age, y.sname, y.age
FROM Sailors x, Sailors y
WHERE x.age > y.age

Note  that  target  list  can  be  replaced  by  “*”  if  you  don’t  
want  to  do  a  projection:
SELECT *
FROM Sailors x
WHERE x.age > 20
Find  sailors  who  have  reserved  at  least  one  boat

SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Would  adding  DISTINCT  to  this  query  make  a  
difference?
What  is  the  effect  of  replacing  S.sid by  S.sname in  the  
SELECT  clause?    
Would  adding  DISTINCT  to  this  variant  of  the  query  make  a  
difference?
Expressions
Can  use  arithmetic  expressions  in  SELECT  clause  (plus  
other  operations  we’ll  discuss  later)
Use AS to  provide  column  names
SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname = 'dustin'
Can  also  have  expressions  in  WHERE  clause:
SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1
String  operations
SQL  also  supports  some  string  operations
“LIKE” is used for string matching.

SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname LIKE 'B_%B'

'_' stands for any one character.
'%' stands for 0 or more arbitrary characters.
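
To see which strings the pattern 'B_%B' accepts, the query can be run on a few made-up sailor names. The sketch assumes SQLite through Python's `sqlite3` module, and the four names are hypothetical test data, not the course instance. SQLite's LIKE is case-insensitive for ASCII letters by default, so the names are written in upper case to keep the outcome independent of that detail:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(20), rating INTEGER, age REAL)"
)
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (1, "BOB", 3, 63.5),    # B, one character, nothing, B: matches
    (2, "BB", 5, 20.0),     # too short: '_' needs one character between the two Bs
    (3, "BARB", 7, 30.0),   # B, one character, one more character, B: matches
    (4, "BOBBY", 9, 40.0),  # does not end in B
])

rows = conn.execute("""
    SELECT S.sname, S.age-5 AS age1, 2*S.age AS age2
    FROM Sailors S
    WHERE S.sname LIKE 'B_%B'
    ORDER BY S.sid
""").fetchall()

print(rows)  # [('BOB', 58.5, 127.0), ('BARB', 25.0, 60.0)]
```

So the pattern means: starts with B, ends with B, and is at least three characters long.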
Logical  Operations
SQL queries produce new tables.
If the results of two queries are set-compatible (same number and types of columns), then we can apply logical (set) operations:
UNION
INTERSECT
Set difference (called EXCEPT or MINUS)
Find  sids of  sailors  who  have  reserved  a  red  or a  green  boat
UNION: can be used to compute the union of any two union-compatible sets of tuples (which are themselves the result of SQL queries).

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND
(B.color='red' OR B.color='green')

vs.

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND B.color='red'
UNION
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND B.color='green'
Find  sids of  sailors  who  have  reserved  a  red  and a  green  boat

If we simply replace OR by AND in the previous query, we get the wrong answer. (Why?)

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid=B.bid AND
(B.color='red' AND B.color='green')

Instead, could use a self-join:

SELECT R1.sid
FROM Boats B1, Reserves R1,
Boats B2, Reserves R2
WHERE R1.sid=R2.sid
AND R1.bid=B1.bid
AND R2.bid=B2.bid
AND (B1.color='red' AND B2.color='green')
AND  Continued…
INTERSECT: discussed in the book. Can be used to compute the intersection of any two union-compatible sets of tuples.
Also in text: EXCEPT (sometimes called MINUS)
Included in the SQL/92 standard, but some systems don't support them.

Key field!
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid
AND R.bid=B.bid
AND B.color='red'
INTERSECT
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid
AND R.bid=B.bid
AND B.color='green'
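One plausible reading of the "Key field!" note is that INTERSECT only gives the intended answer when the selected column identifies a sailor uniquely. The sketch below (Python's sqlite3 module, made-up illustrative rows, and a hypothetical helper `red_and_green`) contrasts intersecting on sid with intersecting on sname when two different sailors share a name.

```python
import sqlite3

# Illustrative data (an assumption): two distinct sailors are both named horatio.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (22, 'dustin'), (64, 'horatio'), (74, 'horatio');
INSERT INTO Boats VALUES (102, 'Interlake', 'red'), (103, 'Clipper', 'green');
INSERT INTO Reserves VALUES (22, 102), (22, 103), (64, 102), (74, 103);
""")


def red_and_green(column):
    """Intersect the red and green reservers on the given Sailors column.

    `column` is one of the fixed strings used below, never user input.
    """
    query = f"""
    SELECT S.{column} FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
    INTERSECT
    SELECT S.{column} FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'green'
    """
    return sorted(row[0] for row in conn.execute(query))


by_sid = red_and_green("sid")      # key column: only sailor 22 qualifies
by_sname = red_and_green("sname")  # non-key column: horatio slips in
print(by_sid, by_sname)
```

Sailor 64 reserved only a red boat and sailor 74 only a green one, yet the shared name appears in both operands of the name-based intersection.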
Your  turn  …
1. Find (the names of) all sailors who are over 50 years old
2. Find (the names of) all boats that have been reserved at least once
3. Find all sailors who have not reserved a red boat (hint: use "EXCEPT")
4. Find all pairs of same-color boats
5. Find all pairs of sailors in which the older sailor has a lower rating
Answers  …
1. Find  (the  names  of)  all  sailors  who  are  over  
50  years  old

SELECT S.sname
FROM Sailors S
WHERE S.age > 50
Answers  …
2. Find  (the  names  of)  all  boats  that  have  
been  reserved  at  least  once

SELECT DISTINCT B.bname
FROM Boats B, Reserves R
WHERE R.bid=B.bid
Answers  …
3. Find  all  sailors  who  have  not reserved  a  
red  boat
SELECT S.sid
FROM Sailors S
EXCEPT
SELECT R.sid
FROM Boats B,Reserves R
WHERE R.bid=B.bid
AND B.color='red'
Answers  …
4. Find all pairs of same-color boats

SELECT B1.bname, B2.bname
FROM Boats B1, Boats B2
WHERE B1.color = B2.color
AND B1.bid < B2.bid
Answers  …
5. Find  all  pairs  of  sailors  in  which  the  older
sailor  has  a  lower rating

SELECT S1.sname, S2.sname
FROM Sailors S1, Sailors S2
WHERE S1.age > S2.age
AND S1.rating < S2.rating
Queries  With  GROUP  BY  and  HAVING

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
[HAVING group-qualification]

Group rows by columns in grouping-list
Use the HAVING clause to restrict which group-rows are returned in the result set
Conceptual  Evaluation
1. Compute the cross-product of relation-list
2. Keep only tuples that satisfy the WHERE clause (qualification)
3. Partition rows by the value of attributes in grouping-list
4. Keep only groups that satisfy the group-qualification
   Expressions in group-qualification must have a single value per group! That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list.
5. One answer tuple is generated per qualifying group.
Find the age of the youngest sailor with age ≥ 18,
for each rating with at least 2 such sailors

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1

Sailors:
sid  sname    rating  age
22   dustin   7       45.0
31   lubber   8       55.5
71   zorba    10      16.0
64   horatio  7       35.0
29   brutus   1       33.0
58   rusty    10      35.0

After the WHERE clause, keeping only rating and age:
rating  age
1       33.0
7       45.0
7       35.0
8       55.5
10      35.0

After GROUP BY rating:
rating  m-age  count
1       33.0   1
7       35.0   2
8       55.5   1
10      35.0   1

After HAVING COUNT (*) > 1 (answer):
rating  m-age
7       35.0
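The conceptual evaluation can be traced by hand on the six Sailors rows of this example and checked against the SQL query. This is a sketch in Python with the sqlite3 module; the plain-Python steps mirror the conceptual order only, and a real query engine is free to evaluate the query differently.

```python
import sqlite3
from collections import defaultdict

# The Sailors instance from the example: (sid, sname, rating, age).
sailors = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

# Steps 1-2: a single relation needs no cross-product; apply the WHERE clause.
qualifying = [s for s in sailors if s[3] >= 18]

# Step 3: partition the remaining rows by rating.
groups = defaultdict(list)
for sid, sname, rating, age in qualifying:
    groups[rating].append(age)

# Steps 4-5: keep groups with COUNT(*) > 1, emit one tuple per group.
by_hand = sorted(
    (rating, min(ages)) for rating, ages in groups.items() if len(ages) > 1
)

# The same query evaluated by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)
by_sql = conn.execute("""
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
""").fetchall()

print(by_hand, by_sql)
```

Rating 10 drops out because zorba (age 16.0) is filtered by WHERE before grouping, which leaves rusty alone in that group.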
Sorting  the  Results  of  a  Query
ORDER  BY  column   [  ASC  |  DESC]  [,  ...]
SELECT S.rating, S.sname, S.age
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color='red'
ORDER BY S.rating, S.sname;

Extra reporting power obtained by combining with aggregation.
SELECT S.sid, COUNT (*) AS redrescnt
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid
AND B.color='red'
GROUP BY S.sid
ORDER BY redrescnt DESC;
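A sketch of the aggregation-plus-sorting query on a small instance, using Python's sqlite3 module and made-up illustrative rows (an assumption). The extra sort key `S.sid` is not on the slide; it is added here only so that sailors with equal counts come back in a predictable order.

```python
import sqlite3

# Illustrative data (an assumption, not taken from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (22, 'dustin', 7, 45.0), (31, 'lubber', 8, 55.5),
                           (64, 'horatio', 7, 35.0);
INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
                         (104, 'Marine', 'red');
INSERT INTO Reserves VALUES (22, 101), (22, 102), (22, 104),
                            (31, 102), (31, 104), (64, 101), (64, 102);
""")

# Count red-boat reservations per sailor, largest count first.
red_counts = conn.execute("""
    SELECT S.sid, COUNT(*) AS redrescnt
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
    GROUP BY S.sid
    ORDER BY redrescnt DESC, S.sid
""").fetchall()
print(red_counts)
```

Without a tie-breaker, rows with the same redrescnt may appear in any order, so a report that must be reproducible usually needs a second sort key.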
TPCH  on  PostgreSQL

DEMO

Getting  PostgreSQL
Part of several Linux distributions
[https://www.postgresql.org/download/linux/]

For Ubuntu 14.04:
$ sudo apt-get install postgresql-9.3

TPC-H: a decision support benchmark
a suite of business-oriented ad-hoc queries
queries and data:
  broad industry-wide relevance while maintaining ease of implementation
relevant systems:
  examine large volumes of data
  execute highly complex queries
  answer critical business questions

TPC-H schema

1.2 Database Entities, Relationships, and Characteristics
The components of the TPC-H database are defined to consist of eight separate and individual tables (the Base
Tables). The relationships between columns of these tables are illustrated in Figure 2: The TPC-H Schema.

Figure 2: The TPC-H Schema

PART (P_)
SF*200,000
PARTKEY, NAME, MFGR, BRAND, TYPE, SIZE, CONTAINER, RETAILPRICE, COMMENT

SUPPLIER (S_)
SF*10,000
SUPPKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, COMMENT

PARTSUPP (PS_)
SF*800,000
PARTKEY, SUPPKEY, AVAILQTY, SUPPLYCOST, COMMENT

CUSTOMER (C_)
SF*150,000
CUSTKEY, NAME, ADDRESS, NATIONKEY, PHONE, ACCTBAL, MKTSEGMENT, COMMENT

ORDERS (O_)
SF*1,500,000
ORDERKEY, CUSTKEY, ORDERSTATUS, TOTALPRICE, ORDERDATE, ORDER-PRIORITY, CLERK, SHIP-PRIORITY, COMMENT

LINEITEM (L_)
SF*6,000,000
ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT

NATION (N_)
25
NATIONKEY, NAME, REGIONKEY, COMMENT

REGION (R_)
5
REGIONKEY, NAME, COMMENT

Legend:
The parentheses following each table name contain the prefix of the column names for that table;
The arrows point in the direction of the one-to-many relationships between tables;
The number/formula below each table name represents the cardinality (number of rows) of the table. Some
TPCH  Q6
select sum(l_extendedprice*l_discount) as revenue
from lineitem
where l_shipdate >= date '[DATE]' and
l_shipdate < date '[DATE]' + interval '1' year and
l_discount between [DISCOUNT] - 0.01 and
[DISCOUNT] + 0.01 and l_quantity < [QUANT];

parameters:
DATE
DISCOUNT
QUANT
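To see how the three parameters plug into the template, here is a hedged sketch that runs a Q6-shaped query on a tiny hand-made lineitem table through Python's sqlite3 module. Everything concrete in it is an assumption for illustration: the five rows, the parameter values, and the reduced four-column table. SQLite has no `interval` syntax, so the one-year window is written with its `date(..., '+1 year')` function instead of the form shown above.

```python
import sqlite3

# Reduced lineitem table with only the columns Q6 touches (illustrative rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_quantity REAL, l_extendedprice REAL, "
    "l_discount REAL, l_shipdate TEXT)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [
        (10, 1000.0, 0.06, "1994-03-15"),  # qualifies
        (20, 2000.0, 0.06, "1994-12-31"),  # qualifies
        (10, 1000.0, 0.06, "1995-01-01"),  # shipped after the one-year window
        (10, 1000.0, 0.10, "1994-06-01"),  # discount outside the range
        (30, 1000.0, 0.06, "1994-06-01"),  # quantity too large
    ],
)

# Q6 with named placeholders standing in for [DATE], [DISCOUNT], [QUANT].
q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= :date
  AND l_shipdate < date(:date, '+1 year')
  AND l_discount BETWEEN :discount - 0.01 AND :discount + 0.01
  AND l_quantity < :quant
"""
params = {"date": "1994-01-01", "discount": 0.06, "quant": 24}
(revenue,) = conn.execute(q6, params).fetchone()
print(revenue)
```

Dates are stored as ISO-8601 text here, which compares correctly as strings. Discounts that sit exactly on a range boundary can be affected by floating-point rounding, so the sample rows avoid them.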

Getting  TPCH
Transaction Processing Performance Council (TPC)
[http://www.tpc.org/tpch/]

More details about all 22 queries:
[http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf]

Download Specification & Tools from
[http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp]
