02-modernsql

The lecture introduces the history and evolution of SQL, beginning with IBM's creation of the first relational query language in 1971 and culminating in the current SQL:2023 standard. It covers various SQL functionalities, including data manipulation, aggregation, and string operations, as well as the importance of the Relational Model. Today's agenda includes topics such as aggregations, group by operations, and nested queries.
Intro to Database Systems (15-445/645)

Lecture #02: Modern SQL
FALL 2023 Prof. Andy Pavlo Prof. Jignesh Patel

LAST CLASS

We introduced the Relational Model as the superior data model for databases.

We then showed how Relational Algebra provides the building blocks that allow us to query and modify a relational database.

15-445/645 (Fall 2023)



SQL HISTORY

In 1971, IBM created its first relational query language called SQUARE.

IBM then created "SEQUEL" in 1972 for IBM System R prototype DBMS.
→ Structured English Query Language

IBM releases commercial SQL-based DBMSs:
→ System/38 (1979), SQL/DS (1981), and DB2 (1983).


SQL HISTORY

ANSI Standard in 1986. ISO in 1987.
→ Structured Query Language

Current standard is SQL:2023
→ SQL:2023 → Property Graph Queries, Multi-Dim. Arrays
→ SQL:2016 → JSON, Polymorphic tables
→ SQL:2011 → Temporal DBs, Pipelined DML
→ SQL:2008 → Truncation, Fancy Sorting
→ SQL:2003 → XML, Windows, Sequences, Auto-Gen IDs.
→ SQL:1999 → Regex, Triggers, OO

The minimum language syntax a system needs to say that it supports SQL is SQL-92.

RELATIONAL LANGUAGES

Data Manipulation Language (DML)
Data Definition Language (DDL)
Data Control Language (DCL)

Also includes:
→ View definition
→ Integrity & Referential Constraints
→ Transactions

Important: SQL is based on bags (duplicates), not sets (no duplicates).
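The bag-versus-set distinction can be seen with a small sketch. It assumes Python's built-in sqlite3 module as the DBMS (the slides do not prescribe one) and loads the enrolled rows from the lecture's example database.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for "a DBMS".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# Bag semantics: projecting cid yields one output row per input row.
bag = [r[0] for r in conn.execute("SELECT cid FROM enrolled ORDER BY cid")]

# Set semantics have to be requested explicitly with DISTINCT.
distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT cid FROM enrolled ORDER BY cid")]

print(bag)
print(distinct)
```

The first list keeps the repeated course ids, while the second collapses them.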

TODAY'S AGENDA

Aggregations + Group By
String / Date / Time Operations
Output Control + Redirection
Window Functions
Nested Queries
Lateral Joins
Common Table Expressions


EXAMPLE DATABASE

student(sid,name,login,age,gpa)
sid    name    login       age  gpa
53666  RZA     rza@cs      44   4.0
53688  Bieber  jbieber@cs  27   3.9
53655  Tupac   shakur@cs   25   3.5

enrolled(sid,cid,grade)
sid    cid     grade
53666  15-445  C
53688  15-721  A
53688  15-826  B
53655  15-445  B
53666  15-721  C

course(cid,name)
cid     name
15-445  Database Systems
15-721  Advanced Database Systems
15-826  Data Mining
15-799  Special Topics in Databases


AGGREGATES

Functions that return a single value from a bag of tuples:
→ AVG(col) → Return the average col value.
→ MIN(col) → Return minimum col value.
→ MAX(col) → Return maximum col value.
→ SUM(col) → Return sum of values in col.
→ COUNT(col) → Return # of values for col.


AGGREGATES

Aggregate functions can (almost) only be used in the SELECT output list.

Get # of students with a "@cs" login:

SELECT COUNT(login) AS cnt
  FROM student WHERE login LIKE '%@cs'

SELECT COUNT(*) AS cnt
  FROM student WHERE login LIKE '%@cs'

SELECT COUNT(1) AS cnt
  FROM student WHERE login LIKE '%@cs'

SELECT COUNT(1+1+1) AS cnt
  FROM student WHERE login LIKE '%@cs'
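A quick way to check that the four COUNT forms agree is to run them against the example student table. This is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as the DBMS; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
""")

# Run each COUNT variant from the slide and record its result.
counts = {}
for expr in ("COUNT(login)", "COUNT(*)", "COUNT(1)", "COUNT(1+1+1)"):
    sql = f"SELECT {expr} AS cnt FROM student WHERE login LIKE '%@cs'"
    counts[expr] = conn.execute(sql).fetchone()[0]

print(counts)
```

All four agree here because the WHERE clause already discards rows whose login is NULL; COUNT(login) would differ from COUNT(*) only if NULL logins could reach the aggregate.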


MULTIPLE AGGREGATES

Get the number of students and their average GPA that have a "@cs" login.

SELECT AVG(gpa), COUNT(sid)
  FROM student WHERE login LIKE '%@cs'

AVG(gpa)  COUNT(sid)
3.8       3


AGGREGATES

Output of other columns outside of an aggregate is undefined.

Get the average GPA of students enrolled in each course.

SELECT AVG(s.gpa), e.cid
  FROM enrolled AS e JOIN student AS s
    ON e.sid = s.sid

AVG(s.gpa)  e.cid
3.86        ???


GROUP BY

Project tuples into subsets and calculate aggregates against each subset.

SELECT AVG(s.gpa), e.cid
  FROM enrolled AS e JOIN student AS s
    ON e.sid = s.sid
 GROUP BY e.cid

Joined input:
e.sid  s.sid  s.gpa  e.cid
53435  53435  2.25   15-721
53439  53439  2.70   15-721
56023  56023  2.75   15-826
59439  59439  3.90   15-826
53961  53961  3.50   15-826
58345  58345  1.89   15-445

Output:
AVG(s.gpa)  e.cid
2.48        15-721
3.38        15-826
1.89        15-445
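The grouping step can be tried on the lecture's example database. This is a sketch, assuming SQLite via Python's sqlite3 module; ROUND and the ORDER BY are added here only to make the printed output stable and are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# One output row per distinct e.cid; AVG is computed inside each group.
rows = conn.execute("""
    SELECT e.cid, ROUND(AVG(s.gpa), 2) AS avg_gpa
      FROM enrolled AS e JOIN student AS s ON e.sid = s.sid
     GROUP BY e.cid
     ORDER BY e.cid
""").fetchall()

for cid, avg_gpa in rows:
    print(cid, avg_gpa)
```

Each course id appears once in the result, paired with the average GPA of the students enrolled in it.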


GROUP BY

Non-aggregated values in SELECT output clause must appear in GROUP BY clause.

This query is invalid because s.name is neither aggregated nor listed in GROUP BY:

SELECT AVG(s.gpa), e.cid, s.name
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid


The query becomes valid once s.name is also listed in the GROUP BY clause:

SELECT AVG(s.gpa), e.cid, s.name
  FROM enrolled AS e JOIN student AS s
    ON e.sid = s.sid
 GROUP BY e.cid, s.name


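HAVING

Filters results based on aggregation computation.
Like a WHERE clause for a GROUP BY.

This query is invalid because WHERE is evaluated before the aggregate is computed, so avg_gpa cannot be referenced there:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
   AND avg_gpa > 3.9
 GROUP BY e.cid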

Moving the aggregate predicate into a HAVING clause applies it after grouping:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid
HAVING avg_gpa > 3.9;

Some DBMSs (e.g., Postgres) do not accept the output alias in HAVING, so the portable form repeats the aggregate expression:

SELECT AVG(s.gpa) AS avg_gpa, e.cid
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid
 GROUP BY e.cid
HAVING AVG(s.gpa) > 3.9;

Before HAVING:
AVG(s.gpa)  e.cid
3.75        15-445
3.950000    15-721
3.900000    15-826

After HAVING:
avg_gpa     e.cid
3.950000    15-721
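The effect of HAVING can be reproduced on the example database. The sketch below assumes SQLite via Python's sqlite3 module and uses the form that repeats the aggregate expression; the ORDER BY is an addition for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# HAVING filters whole groups after AVG has been computed for each one.
rows = conn.execute("""
    SELECT e.cid, AVG(s.gpa) AS avg_gpa
      FROM enrolled AS e JOIN student AS s ON e.sid = s.sid
     GROUP BY e.cid
    HAVING AVG(s.gpa) > 3.9
     ORDER BY e.cid
""").fetchall()

print([cid for cid, _ in rows])
```

Only 15-721 survives: 15-445 averages 3.75, and 15-826 averages exactly 3.9, which does not satisfy the strict comparison.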

STRING OPERATIONS

          String Case  String Quotes
SQL-92    Sensitive    Single Only
Postgres  Sensitive    Single Only
MySQL     Insensitive  Single/Double
SQLite    Sensitive    Single/Double
MSSQL     Sensitive    Single Only
Oracle    Sensitive    Single Only

SQL-92:
WHERE UPPER(name) = UPPER('TuPaC')

MySQL:
WHERE name = "TuPaC"


STRING OPERATIONS

LIKE is used for string matching.

String-matching operators
→ '%' Matches any substring (including empty strings).
→ '_' Matches any one character.

SELECT * FROM enrolled AS e
 WHERE e.cid LIKE '15-%'

SELECT * FROM student AS s
 WHERE s.login LIKE '%@c_'


STRING OPERATIONS

SQL-92 defines string functions.
→ Many DBMSs also have their own unique functions.
Can be used in both output and predicates:

SELECT SUBSTRING(name,1,5) AS abbrv_name
  FROM student WHERE sid = 53688

SELECT * FROM student AS s
 WHERE UPPER(s.name) LIKE 'KAN%'


STRING OPERATIONS

SQL standard defines the || operator for concatenating two or more strings together.

SQL-92:
SELECT name FROM student
 WHERE login = LOWER(name) || '@cs'

MSSQL:
SELECT name FROM student
 WHERE login = LOWER(name) + '@cs'

MySQL:
SELECT name FROM student
 WHERE login = CONCAT(LOWER(name), '@cs')
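The || form can be run as written on SQLite, which follows the standard operator here. The sketch below is an illustration under that assumption, using the example student table through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
""")

# Keep students whose login is exactly their lower-cased name plus '@cs'.
rows = conn.execute("""
    SELECT name FROM student
     WHERE login = LOWER(name) || '@cs'
""").fetchall()

print(rows)
```

Only RZA matches, because 'bieber@cs' differs from 'jbieber@cs' and 'tupac@cs' differs from 'shakur@cs'.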


DATE/TIME OPERATIONS

Operations to manipulate and modify DATE/TIME attributes.
Can be used in both output and predicates.
Support/syntax varies wildly…

Demo: Get the # of days since the beginning of the year.
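One possible version of this demo, sketched for SQLite only: it relies on SQLite's julianday function and its 'start of year' modifier, which other DBMSs do not share. The fixed date 2023-09-10 is an assumption chosen so the result is reproducible; passing 'now' instead would use the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Difference between the Julian day of the date and of January 1st
# of the same year. SQLite-specific syntax.
days = conn.execute(
    """
    SELECT CAST(julianday(:d) - julianday(:d, 'start of year') AS INTEGER)
    """,
    {"d": "2023-09-10"},
).fetchone()[0]

print(days)  # 252 full days have elapsed since 2023-01-01
```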


OUTPUT REDIRECTION

Store query results in another table:
→ Table must not already be defined.
→ Table will have the same # of columns with the same types as the input.

SQL-92:
SELECT DISTINCT cid INTO CourseIds
  FROM enrolled;

MySQL:
CREATE TABLE CourseIds (
SELECT DISTINCT cid FROM enrolled);

Postgres:
SELECT DISTINCT cid
  INTO TEMPORARY CourseIds
  FROM enrolled;


OUTPUT REDIRECTION

Insert tuples from query into another table:
→ Inner SELECT must generate the same columns as the target table.
→ DBMSs have different options/syntax on what to do with integrity violations (e.g., invalid duplicates).

SQL-92:
INSERT INTO CourseIds
(SELECT DISTINCT cid FROM enrolled);
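Both forms of redirection can be sketched in SQLite, whose syntax differs from the variants shown above: it uses CREATE TABLE ... AS SELECT to create a table from a query, and INSERT OR IGNORE as its option for skipping rows that violate a constraint. The table name UniqueCourseIds is hypothetical and introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# Store a query result in a table that does not exist yet (SQLite syntax).
conn.execute("CREATE TABLE CourseIds AS SELECT DISTINCT cid FROM enrolled")
created = conn.execute("SELECT COUNT(*) FROM CourseIds").fetchone()[0]

# Insert a query result into an existing table that has a uniqueness constraint.
conn.execute("CREATE TABLE UniqueCourseIds (cid TEXT PRIMARY KEY)")
conn.execute("INSERT INTO UniqueCourseIds SELECT DISTINCT cid FROM enrolled")

# Re-inserting the same rows would violate the primary key;
# OR IGNORE tells SQLite to skip the offending rows instead of failing.
conn.execute(
    "INSERT OR IGNORE INTO UniqueCourseIds SELECT DISTINCT cid FROM enrolled")
unique = conn.execute("SELECT COUNT(*) FROM UniqueCourseIds").fetchone()[0]

print(created, unique)
```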


OUTPUT CONTROL

ORDER BY <column*> [ASC|DESC]
→ Order the output tuples by the values in one or more of their columns.

SELECT sid, grade FROM enrolled
 WHERE cid = '15-721'
 ORDER BY grade

sid    grade
53123  A
53334  A
53650  B
53666  D

The sort column can also be referenced by its position in the SELECT list:

SELECT sid, grade FROM enrolled
 WHERE cid = '15-721'
 ORDER BY 2

SELECT sid FROM enrolled
 WHERE cid = '15-721'
 ORDER BY grade DESC, sid ASC

sid
53666
53650
53123
53334

OUTPUT CONTROL

FETCH {FIRST|NEXT} <count> ROWS
OFFSET <count> ROWS
→ Limit the # of tuples returned in output.
→ Can set an offset to return a "range".

SELECT sid, name FROM student
 WHERE login LIKE '%@cs'
 FETCH FIRST 10 ROWS ONLY;

SELECT sid, name FROM student
 WHERE login LIKE '%@cs'
 ORDER BY gpa
OFFSET 10 ROWS
 FETCH FIRST 10 ROWS WITH TIES;
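SQLite does not implement the FETCH FIRST syntax, so the sketch below swaps in its LIMIT ... OFFSET clause to show the same idea of sorting and then returning a range. It runs against the example student table through Python's sqlite3 module; the limit and offset values are chosen to fit the three-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
""")

# Sort by GPA (highest first), skip the first row, return the next two.
# LIMIT/OFFSET is SQLite's counterpart to OFFSET ... FETCH FIRST ... ROWS ONLY.
rows = conn.execute("""
    SELECT sid, name FROM student
     WHERE login LIKE '%@cs'
     ORDER BY gpa DESC
     LIMIT 2 OFFSET 1
""").fetchall()

print(rows)
```

Without the ORDER BY, the rows that fall inside the range would not be well defined, which is why limiting and sorting usually appear together.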

WINDOW FUNCTIONS

Performs a "sliding" calculation across a set of tuples that are related.
Like an aggregation, but tuples are not grouped into a single output tuple.

SELECT ... FUNC-NAME(...) OVER (...)
  FROM tableName

→ FUNC-NAME: Aggregation Functions, Special Functions
→ OVER: How to "slice" up data. Can also sort.

WINDOW FUNCTIONS

Aggregation functions:
→ Anything that we discussed earlier

Special window functions:
→ ROW_NUMBER() → # of the current row
→ RANK() → Order position of the current row.

SELECT *, ROW_NUMBER() OVER () AS row_num
  FROM enrolled

sid    cid     grade  row_num
53666  15-445  C      1
53688  15-721  A      2
53688  15-826  B      3
53655  15-445  B      4
53666  15-721  C      5


WINDOW FUNCTIONS

The OVER keyword specifies how to group together tuples when computing the window function.
Use PARTITION BY to specify group.

SELECT cid, sid,
       ROW_NUMBER() OVER (PARTITION BY cid)
  FROM enrolled
 ORDER BY cid

cid     sid    row_number
15-445  53666  1
15-445  53655  2
15-721  53688  1
15-721  53666  2
15-826  53688  1


WINDOW FUNCTIONS

You can also include an ORDER BY in the window grouping to sort entries in each group.

SELECT *,
       ROW_NUMBER() OVER (ORDER BY cid)
  FROM enrolled
 ORDER BY cid


WINDOW FUNCTIONS

Find the student with the second highest grade for each course.
→ Group tuples by cid
→ Then sort by grade

SELECT * FROM (
  SELECT *, RANK() OVER (PARTITION BY cid
                         ORDER BY grade ASC) AS rank
    FROM enrolled) AS ranking
 WHERE ranking.rank = 2
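The ranking query can be tried on the example enrolled table. This sketch assumes SQLite 3.25 or newer (the first release with window functions) via Python's sqlite3 module; the alias rnk and the outer ORDER BY are changes made here for clarity and stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# Rank students inside each course by grade ('A' sorts first),
# then keep only the rows ranked second.
rows = conn.execute("""
    SELECT sid, cid, grade FROM (
      SELECT *, RANK() OVER (PARTITION BY cid
                             ORDER BY grade ASC) AS rnk
        FROM enrolled) AS ranking
     WHERE ranking.rnk = 2
     ORDER BY cid
""").fetchall()

print(rows)
```

15-826 has a single enrolled student, so no row in that partition receives rank 2 and the course is absent from the result.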


NESTED QUERIES

Invoke a query inside of another query to compose more complex computations.
→ They are often difficult for the DBMS to optimize due to correlations.
→ Inner queries can appear (almost) anywhere in query.

SELECT name FROM student WHERE               ← Outer Query
  sid IN (SELECT sid FROM enrolled)          ← Inner Query


NESTED QUERIES

Get the names of students in '15-445'

SELECT name FROM student
 WHERE ...
       "sid in the set of people that take 15-445"


Written as a nested query:

SELECT name FROM student
 WHERE sid IN (
   SELECT sid FROM enrolled
    WHERE cid = '15-445'
 )
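Running the nested query on the example database shows the inner query producing a set of sids that the outer query then filters on. This is a minimal sketch assuming SQLite via Python's sqlite3 module; the ORDER BY is added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# Inner query: sids enrolled in 15-445. Outer query: names for those sids.
names = [r[0] for r in conn.execute("""
    SELECT name FROM student
     WHERE sid IN (
       SELECT sid FROM enrolled
        WHERE cid = '15-445'
     )
     ORDER BY name
""")]

print(names)
```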


NESTED QUERIES

ALL → Must satisfy expression for all rows in the sub-query.

ANY → Must satisfy expression for at least one row in the sub-query.

IN → Equivalent to '=ANY()'.

EXISTS → At least one row is returned without comparing it to an attribute in outer query.


NESTED QUERIES

Get the names of students in '15-445'

SELECT name FROM student
 WHERE sid = ANY(
   SELECT sid FROM enrolled
    WHERE cid = '15-445'
 )


NESTED QUERIES

Find student record with the highest id that is enrolled in at least one course.

SELECT MAX(e.sid), s.name
  FROM enrolled AS e, student AS s
 WHERE e.sid = s.sid;

This won't work in SQL-92. It runs in SQLite, but not Postgres or MySQL (v8 with strict mode).


NESTED QUERIES

Find student record with the highest id that is enrolled in at least one course.

SELECT sid, name FROM student
 WHERE ...
       "Is the highest enrolled sid"

SELECT sid, name FROM student
 WHERE sid IN (
   SELECT MAX(sid) FROM enrolled
 )

SELECT sid, name FROM student
 WHERE sid IN (
   SELECT sid FROM enrolled
    ORDER BY sid DESC FETCH FIRST 1 ROW ONLY
 )

SELECT student.sid, name
  FROM student
  JOIN (SELECT MAX(sid) AS sid
          FROM enrolled) AS max_e
    ON student.sid = max_e.sid;


NESTED QUERIES

Find all courses that have no students enrolled in it.

SELECT * FROM course
 WHERE ...
       "with no tuples in the enrolled table"

SELECT * FROM course
 WHERE NOT EXISTS(
   SELECT * FROM enrolled
    WHERE course.cid = enrolled.cid
 )

cid     name
15-799  Special Topics in Databases
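The correlated NOT EXISTS query can be checked against the example course and enrolled tables. This sketch assumes SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (cid TEXT, name TEXT);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO course VALUES
  ('15-445', 'Database Systems'),
  ('15-721', 'Advanced Database Systems'),
  ('15-826', 'Data Mining'),
  ('15-799', 'Special Topics in Databases');
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# The inner query is re-evaluated for each course row (it references
# course.cid); a course is kept only when that inner query returns no rows.
rows = conn.execute("""
    SELECT * FROM course
     WHERE NOT EXISTS(
       SELECT * FROM enrolled
        WHERE course.cid = enrolled.cid
     )
""").fetchall()

print(rows)
```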


LATERAL JOINS

The LATERAL operator allows a nested query to reference attributes in other nested queries that precede it.
→ You can think of it like a for loop that allows you to invoke another query for each tuple in a table.

SELECT * FROM
  (SELECT 1 AS x) AS t1,
  LATERAL (SELECT t1.x+1 AS y) AS t2;

t1.x  t2.y
1     2


LATERAL JOIN

Calculate the number of students enrolled in each course and the average GPA. Sort by enrollment count in descending order.

SELECT * FROM course AS c,
  For each course:
  ⮕ Compute the # of enrolled students
  For each course:
  ⮕ Compute the average gpa of enrolled students

SELECT * FROM course AS c,
  LATERAL (SELECT COUNT(*) AS cnt FROM enrolled
            WHERE enrolled.cid = c.cid) AS t1,
  LATERAL (SELECT AVG(gpa) AS avg FROM student AS s
             JOIN enrolled AS e ON s.sid = e.sid
            WHERE e.cid = c.cid) AS t2;

cid     name                         cnt  avg
15-445  Database Systems             2    3.75
15-721  Advanced Database Systems    2    3.95
15-826  Data Mining                  1    3.9
15-799  Special Topics in Databases  0    null
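SQLite has no LATERAL operator, so the sketch below is not the slide's query: it swaps in correlated scalar subqueries in the SELECT list, which express the same "for each course, run another query" idea. It also adds the ORDER BY that the task statement asks for. The column alias avg_gpa and the cid tie-breaker are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
CREATE TABLE course (cid TEXT, name TEXT);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
INSERT INTO course VALUES
  ('15-445', 'Database Systems'),
  ('15-721', 'Advanced Database Systems'),
  ('15-826', 'Data Mining'),
  ('15-799', 'Special Topics in Databases');
""")

# Each scalar subquery references c.cid, so it is evaluated once per course,
# like the body of a for loop over the course table.
rows = conn.execute("""
    SELECT c.cid, c.name,
           (SELECT COUNT(*) FROM enrolled AS e
             WHERE e.cid = c.cid) AS cnt,
           (SELECT AVG(s.gpa) FROM student AS s
              JOIN enrolled AS e ON s.sid = e.sid
             WHERE e.cid = c.cid) AS avg_gpa
      FROM course AS c
     ORDER BY cnt DESC, c.cid
""").fetchall()

for row in rows:
    print(row)
```

The course with no enrollments still appears, with a count of 0 and a NULL average (None in Python), matching the last row of the slide's output.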


COMMON TABLE EXPRESSIONS

Provides a way to write auxiliary statements for use in a larger query.
→ Think of it like a temp table just for one query.
Alternative to nested queries and views.

WITH cteName AS (
  SELECT 1
)
SELECT * FROM cteName


COMMON TABLE EXPRESSIONS

You can bind/alias output columns to names before the AS keyword.

WITH cteName (col1, col2) AS (
  SELECT 1, 2
)
SELECT col1 + col2 FROM cteName

WITH cteName (colXXX, colXXX) AS (
  SELECT 1, 2
)
SELECT colXXX + colXXX FROM cteName

Postgres:
WITH cteName (colXXX, colXXX) AS (
  SELECT 1, 2
)
SELECT * FROM cteName

COMMON TABLE EXPRESSIONS

Find student record with the highest id that is enrolled in at least one course.

WITH cteSource (maxId) AS (
  SELECT MAX(sid) FROM enrolled
)
SELECT name FROM student, cteSource
 WHERE student.sid = cteSource.maxId
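The CTE version runs unchanged on SQLite. The sketch below, which assumes Python's sqlite3 module and the example database, shows the CTE acting as a one-row temporary table that the main query joins against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INT, name TEXT, login TEXT, age INT, gpa REAL);
CREATE TABLE enrolled (sid INT, cid TEXT, grade TEXT);
INSERT INTO student VALUES
  (53666, 'RZA', 'rza@cs', 44, 4.0),
  (53688, 'Bieber', 'jbieber@cs', 27, 3.9),
  (53655, 'Tupac', 'shakur@cs', 25, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

# cteSource holds a single row with the highest enrolled sid;
# the main query looks up the matching student.
rows = conn.execute("""
    WITH cteSource (maxId) AS (
      SELECT MAX(sid) FROM enrolled
    )
    SELECT name FROM student, cteSource
     WHERE student.sid = cteSource.maxId
""").fetchall()

print(rows)
```

The highest sid in enrolled is 53688, so the query returns Bieber.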


CONCLUSION

SQL is not a dead language.

You should (almost) always strive to compute your answer as a single SQL statement.


HOMEWORK #1

Write SQL queries to perform basic data analysis.
→ Write the queries locally using SQLite + DuckDB.
→ Submit them to Gradescope.
→ You can submit multiple times and use your best score.

Due: Sunday Sept 10th @ 11:59pm

https://15445.courses.cs.cmu.edu/fall2023/homework1


NEXT CLASS

Storage Management
