0% found this document useful (0 votes)
15 views

03 Advanced SQL Annotateddi

Gagahbs

Uploaded by

divakarmass444
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

03 Advanced SQL Annotateddi

Gagahbs

Uploaded by

divakarmass444
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 77

Advanced SQL

Lecture 3: Advanced SQL

1 / 76
Advanced SQL

Today’s Agenda

Advanced SQL
1.1 Recap
1.2 Relational Language
1.3 Aggregates
1.4 Grouping
1.5 String and Date/Time Functions
1.6 Output Control
1.7 Nested Queries
1.8 Window Functions
1.9 Common Table Expressions
1.10 Joins

2 / 76
Advanced SQL Recap

Recap

3 / 76
Advanced SQL Recap

Relational Model

Proposed in 1970 by Ted Codd (IBM Almaden).


Data model to avoid this maintenance.

• Store database in simple data structures


• Access data through high-level language
• Physical storage left up to implementation

4 / 76
Advanced SQL Recap

Core Operators

• These operators take in relations (i.e., tables) as input and return a relation as output.
• We can “chain” operators together to create more complex operations.

• Selection (σ)
• Projection (Π)
• Union (∪)
• Intersection (∩)
• Difference (−)
• Product (×)
• Join (1)

5 / 76
Advanced SQL Relational Language

Relational Language

6 / 76
Advanced SQL Relational Language

Relational Language

• User only needs to specify the answer that they want, not how to compute it.
• The DBMS is responsible for efficient evaluation of the query.
▶ Query optimizer: re-orders operations and generates query plan

7 / 76
Advanced SQL Relational Language

SQL History

• Originally “SEQUEL" from IBM’s System R prototype.


▶ Structured English Query Language
▶ Adopted by Oracle in the 1970s.
▶ IBM releases DB2 in 1983.
▶ ANSI Standard in 1986. ISO in 1987
▶ Structured Query Language

8 / 76
Advanced SQL Relational Language

SQL History

• Current standard is SQL:2016


▶ SQL:2016 −→ JSON, Polymorphic tables
▶ SQL:2011 −→ Temporal DBs, Pipelined DML
▶ SQL:2008 −→ TRUNCATE, Fancy sorting
▶ SQL:2003 −→ XML, windows, sequences, auto-gen IDs.
▶ SQL:1999 −→ Regex, triggers, OO
• Most DBMSs at least support SQL-92
• Comparison of different SQL implementations

9 / 76
Advanced SQL Relational Language

Relational Language

• Data Manipulation Language (DML)


• Data Definition Language (DDL)
• Data Control Language (DCL)
• Also includes:
▶ View definition
▶ Integrity & Referential Constraints
▶ Transactions
• Important: SQL is based on bag semantics (duplicates) not set semantics (no
duplicates).

10 / 76
Advanced SQL Relational Language

List of SQL Features

• Aggregations + Group By
• String / Date / Time Operations
• Output Control + Redirection
• Nested Queries
• Join
• Common Table Expressions
• Window Functions

11 / 76
Advanced SQL Relational Language

Example Database

sid name login age gpa sid cid grade


1 Maria maria@cs 19 3.8 1 1 B
students 2 Rahul rahul@cs 22 3.5 enrolled 1 2 A
3 Shiyi shiyi@cs 26 3.7 2 3 B
4 Peter peter@ece 35 3.8 4 2 C

cid name
1 Computer Architecture
courses 2 Machine Learning
3 Database Systems
4 Programming Languages

12 / 76
Advanced SQL Aggregates

Aggregates

13 / 76
Advanced SQL Aggregates

Aggregates

• Functions that return a single value from a bag of tuples:


▶ AVG(col)−→ Return the average col value.
▶ MIN(col)−→ Return minimum col value.
▶ MAX(col)−→ Return maximum col value.
▶ SUM(col)−→ Return sum of values in col.
▶ COUNT(col)−→ Return number of values for col.

14 / 76
Advanced SQL Aggregates

Aggregates

• Aggregate functions can only be used in the SELECT output list.


• Task: Get number of students with a "@cs" login:
SELECT COUNT(login) AS cnt
FROM students WHERE login LIKE '%@cs'
SELECT COUNT(*) AS cnt
FROM students WHERE login LIKE '%@cs'
SELECT COUNT(1) AS cnt
FROM students WHERE login LIKE '%@cs'

CNT
3

15 / 76
Advanced SQL Aggregates

Multiple Aggregates

• Task: Get the number of students and their average GPA that have a "@cs" login.
SELECT AVG(gpa), COUNT(sid)
FROM students WHERE login LIKE '%@cs'

AVG CNT
3.6666 3

16 / 76
Advanced SQL Aggregates

Distinct Aggregates

• COUNT, SUM, AVG support DISTINCT


• Task: Get the number of unique students that have an "@cs" login.
SELECT COUNT(DISTINCT login)
FROM students WHERE login LIKE '%@cs'

COUNT
3

17 / 76
Advanced SQL Aggregates

Aggregates

• Output of columns outside of an aggregate.


• Task: Get the average GPA of students enrolled in each course.
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
AVG e.cid
3.5 ???

18 / 76
Advanced SQL Aggregates

Aggregates

• Output of columns outside of an aggregate.


• Task: Get the average GPA of students enrolled in each course.
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
AVG e.cid
3.5 ???
• column "e.cid" must appear in the GROUP BY clause or be used in an aggregate
function

18 / 76
Advanced SQL Grouping

Grouping

19 / 76
Advanced SQL Grouping

Group By

• Project tuples into subsets and calculate aggregates of each subset.


• Task: Get the average GPA of students enrolled in each course.
SELECT e.cid, AVG(s.gpa)
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
GROUP BY e.cid
e.cid AVG
1 3.8
3 3.5
2 3.8

20 / 76
Advanced SQL Grouping

Group By

• Non-aggregated values in SELECT output clause must appear in GROUP BY clause.


SELECT e.cid, AVG(s.gpa), s.name
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
GROUP BY e.cid
SELECT e.cid, AVG(s.gpa), s.name
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
GROUP BY e.cid, s.name

21 / 76
Advanced SQL Grouping

Having

• Filters results based on aggregate value.


• Predicate defined over a group (WHERE clause for a GROUP BY)
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid AND avg_gpa > 3.9
GROUP BY e.cid
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa > 3.9

22 / 76
Advanced SQL Grouping

Having

• Filters results based on aggregate value.


• Predicate defined over a group (WHERE clause for a GROUP BY)
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING AVG(s.gpa) > 3.9

e.cid AVG
1 3.8
2 3.8

23 / 76
Advanced SQL String and Date/Time Functions

String and Date/Time Functions

24 / 76
Advanced SQL String and Date/Time Functions

String Operations

String Case String Quotes


SQL-92 Sensitive Single Only
Postgres Sensitive Single Only
MySQL Insensitive Single/Double
SQLite Sensitive Single/Double
DB2 Sensitive Single Only
Oracle Sensitive Single Only
WHERE UPPER(name) = UPPER('MaRiA') // SQL-92
WHERE name = 'MaRiA' // MySQL

25 / 76
Advanced SQL String and Date/Time Functions

String Operations

• LIKE is used for string matching.


• String-matching operators
▶ % : Matches any substring (including empty strings).
▶ _ : Match any one character
SELECT * FROM student AS s
WHERE s.login LIKE '%@%'
SELECT * FROM student AS s
WHERE s.login LIKE '%@c_'

26 / 76
Advanced SQL String and Date/Time Functions

String Operations

• SQL-92 defines string functions.


▶ Many DBMSs also have their own unique functions
• These functions can be used in any expression (projection, predicates, e.t.c.)
SELECT SUBSTRING(name,0,5) AS abbrv_name
FROM students WHERE sid = 1
SELECT * FROM students AS s
WHERE UPPER(e.name) LIKE 'M%'

27 / 76
Advanced SQL String and Date/Time Functions

String Operations

• SQL standard says to use || operator to concatenate two or more strings together.
SQL-92
SELECT name FROM students WHERE login = LOWER(name) || '@cs'
MSSQL
SELECT name FROM students WHERE login = LOWER(name) + '@cs'
MySQL
SELECT name FROM students WHERE login = CONCAT(LOWER(name), '@cs')

28 / 76
Advanced SQL String and Date/Time Functions

Date/Time Operations

• Operations to manipulate and modify DATE/TIME attributes.


• Can be used in any expression.
• Support/syntax varies wildly!
• Task: Get the number of days since 2000.
• Demo Time!
PostgreSQL
SELECT (now()::date - '2000-01-01'::date) AS days;
MySQL
SELECT DATEDIFF(CURDATE(), '2000-01-01') AS days;
SQL Server
SELECT DATEDIFF(day, '2000/01/01', GETDATE()) AS days;

29 / 76
Advanced SQL Output Control

Output Control

30 / 76
Advanced SQL Output Control

Output Redirection

• Store query results in another table:


▶ Table must not already be defined.
▶ Table will have the same number of columns with the same types as the input.
SQL-92
SELECT DISTINCT cid INTO CourseIds
FROM enrolled;
MySQL
CREATE TABLE CourseIds (
SELECT DISTINCT cid FROM enrolled
);

31 / 76
Advanced SQL Output Control

Output Redirection

• Insert tuples from query into another table:


▶ Inner SELECT must generate the same columns as the target table.
▶ DBMSs have different options/syntax on what to do with duplicates.
SQL-92
INSERT INTO CourseIds
(SELECT DISTINCT cid FROM enrolled);

32 / 76
Advanced SQL Output Control

Output Control

• ORDER BY <column*> [ASC|DESC]


▶ Order the output tuples by the values in one or more of their columns.
SELECT sid, grade FROM enrolled
WHERE cid = 2
ORDER BY grade
SELECT sid, grade FROM enrolled
WHERE cid = 2
ORDER BY grade DESC, sid ASC

sid grade
1 A
4 A

33 / 76
Advanced SQL Output Control

Output Control

• LIMIT <count> [offset]


▶ Limit the number of tuples returned in output.
▶ Can set an offset to return a "range"
SELECT sid, name FROM students
WHERE login LIKE '%@cs'
LIMIT 10
SELECT sid, name FROM students
WHERE login LIKE '%@cs'
LIMIT 20 OFFSET 10

34 / 76
Advanced SQL Nested Queries

Nested Queries

35 / 76
Advanced SQL Nested Queries

Nested Queries

• Queries containing other queries.


• They are often difficult to optimize.
• Inner queries can appear (almost) anywhere in query.
SELECT name FROM students --- Outer Query
WHERE sid IN
(SELECT sid FROM enrolled) --- Inner Query

36 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT name FROM students
WHERE ...

37 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT name FROM students
WHERE ...
SELECT sid FROM enrolled
WHERE cid = 2

38 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT name FROM students
WHERE sid IN (
SELECT sid FROM enrolled
WHERE cid = 2
)

name
Maria
Peter

39 / 76
Advanced SQL Nested Queries

Nested Queries

• ALL −→ Must satisfy expression for all rows in sub-query


• ANY −→ Must satisfy expression for at least one row in sub-query.
• IN −→ Equivalent to ’=ANY()’.
• EXISTS −→ Returns true if the subquery returns one or more records.

40 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT name FROM students
WHERE sid = ANY (
SELECT sid FROM enrolled
WHERE cid = 2
)

41 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT name FROM students AS s
WHERE EXISTS ( --- EXISTS operator
SELECT sid FROM enrolled AS e
WHERE cid = 2 and s.sid = e.sid
)

42 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students in course 2


SELECT (SELECT s.name --- Inner query in projection expression
FROM students AS s
WHERE s.sid = e.sid) AS sname
FROM enrolled AS e
WHERE cid = 2

43 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students not in course 2


SELECT name FROM students
WHERE sid ...

44 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Get the names of students not in course 2


SELECT name FROM students
WHERE sid != ALL (
SELECT sid FROM enrolled
WHERE cid = 2
)

name
Rahul
Shiyi

45 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Find students record with the highest id that is enrolled in at least one course.
--- Won't work in SQL-92
SELECT MAX(e.sid), s.name
FROM enrolled AS e, students AS s
WHERE e.sid = s.sid;

46 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Find students record with the highest id that is enrolled in at least one course.
--- "Is greater than every other sid"
SELECT sid, name
FROM students
WHERE ...
--- "Is greater than every other sid"
SELECT sid, name
FROM students
WHERE sid >= ALL(
SELECT sid FROM enrolled
)

sid name
4 Peter

47 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Find students record with the highest id that is enrolled in at least one course.
SELECT sid, name FROM students
FROM students
WHERE sid IN (
SELECT MAX(sid) FROM enrolled
)
SELECT sid, name FROM students
WHERE sid IN (
SELECT sid FROM enrolled
ORDER BY sid DESC LIMIT 1
)

48 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Find all courses that has no students enrolled in it.


SELECT * FROM courses
WHERE ...
--- "with no tuples in the 'enrolled' table"

49 / 76
Advanced SQL Nested Queries

Nested Queries

• Task: Find all courses that has no students enrolled in it.


SELECT * FROM courses
WHERE NOT EXISTS(
SELECT * FROM enrolled
WHERE course.cid = enrolled.cid
)

cid name
4 Peter

50 / 76
Advanced SQL Window Functions

Window Functions

51 / 76
Advanced SQL Window Functions

Window Functions

• Performs a “sliding” calculation across a set of related tuples.


• Unlike GROUP BY, tuples do not collapse into a group
• So needed if must refer back to individual tuples
SELECT ... FUNC-NAME(...) --- Special Window Functions, Aggregation Functions
OVER(...) --- How to slice up data? Can also sort.
FROM tableName

52 / 76
Advanced SQL Window Functions

Window Functions

• Special window functions:


▶ ROW_NUMBER() −→ Number of the current row
▶ RANK() −→ Order position of the current row.
• Aggregation functions:
▶ All the functions that we discussed earlier (e.g., MIN, MAX, AVG)
SELECT *, ROW_NUMBER()
OVER () AS row_num
FROM enrolled
sid cid grade row_num
1 1 B 1
1 2 A 2
2 3 B 3
4 2 A 4
53 / 76
Advanced SQL Window Functions

Window Functions

• The OVER keyword specifies how to group together tuples when computing the
window function.
• Use PARTITION BY to specify group.
SELECT cid, sid, ROW_NUMBER()
OVER (PARTITION BY cid) --- Note the row numbering
FROM enrolled
ORDER BY cid
cid sid row_number
1 1 1
2 1 1
2 4 2
3 2 1

54 / 76
Advanced SQL Window Functions

Window Functions

• You can also include an ORDER BY in the window grouping to sort entries in each
group.
SELECT cid, sid, ROW_NUMBER()
OVER (ORDER BY cid) --- Note the row numbering
FROM enrolled
ORDER BY cid
cid sid row_number
1 1 1
2 1 2
2 4 3
3 2 4

55 / 76
Advanced SQL Window Functions

Window Functions

• Task: Find the students with the highest grade for each course.
SELECT cid, sid, grade, rank FROM (
SELECT *, RANK() -- Group tuples by cid and then sort by grade
OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
FROM enrolled
) AS ranking
WHERE ranking.rank = 1

cid sid grade rank


1 1 B 1
2 1 A 1
3 2 B 1

56 / 76
Advanced SQL Window Functions

Window Functions

• Task: Get the name of the students with the second highest grade for each course.
SELECT cid, sid, grade, rank FROM (
SELECT *, RANK()
OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
FROM enrolled
) AS ranking
WHERE ranking.rank = 2 --- Update rank

cid sid grade rank


2 4 C 2

57 / 76
Advanced SQL Window Functions

Window Functions

• Task: Get the name of the students with the second highest grade for each course.
SELECT * FROM (
SELECT C.name, S.name, E.grade, RANK()
OVER (PARTITION BY E.cid ORDER BY E.grade ASC) AS grade_rank
FROM students S, courses C, enrolled E
WHERE S.sid = E.sid AND C.cid = E.cid --- Connect with students
) AS ranking
WHERE ranking.grade_rank = 2

name name grade rank


Machine Learning Peter C 2

58 / 76
Advanced SQL Common Table Expressions

Common Table Expressions

59 / 76
Advanced SQL Common Table Expressions

Common Table Expressions

• Provides a way to write auxiliary statements for use in a larger query.


▶ Think of it like a temp table just for one query.
• Alternative to nested queries and materialized views.
WITH cteName AS (
SELECT 1
)
SELECT * FROM cteName
column
1

60 / 76
Advanced SQL Common Table Expressions

Common Table Expressions

• You can bind output columns to names before the AS keyword.


WITH cteName (col1, col2) AS (
SELECT 1, 2
)
SELECT col1 + col2 FROM cteName
column
3

61 / 76
Advanced SQL Common Table Expressions

Common Table Expressions

• Task: Find students record with the highest id that is enrolled in at least one course.
WITH cteSource (maxId) AS (
SELECT MAX(sid) FROM enrolled
)
SELECT name FROM students, cteSource
WHERE students.sid = cteSource.maxId

62 / 76
Advanced SQL Common Table Expressions

Common Table Expressions – Recursion

• Task: Print the sequence of numbers from 1 to 10.


WITH RECURSIVE cteSource (counter) AS (
(SELECT 1)
UNION ALL
(SELECT counter + 1 FROM cteSource WHERE counter < 10)
)
SELECT * FROM cteSource

63 / 76
Advanced SQL Joins

Joins

64 / 76
Advanced SQL Joins

Types of Join

• Types of Join
▶ (INNER) JOIN (1) −→ Returns records that have matching values in both tables
▶ LEFT OUTER JOIN (=1) −→ Returns all records from the left table, and the matched
records from the right table
▶ RIGHT OUTER JOIN (1<) −→ Returns all records from the right table, and the matched
records from the left table
▶ FULL OUTER JOIN (=1<) −→ Returns all records when there is a match in either left or
right table

65 / 76
Advanced SQL Joins

Example Database

SQL Fiddle: Link

sid name sid hobby


1 Maria 1 Stars
students 2 Rahul hobbies 1 Climbing
3 Shiyi 2 Coding
4 Peter 5 Rugby

66 / 76
Advanced SQL Joins

Types of Join: Inner Join

• Task: List the hobbies of students.


SELECT name, hobby
FROM students JOIN hobbies
ON students.id = hobbies.user_id;

name grade
Maria Stars
Maria Climbing
Rahul Coding

67 / 76
Advanced SQL Joins

Types of Join: Left Outer Join

• Task: List the hobbies of all students.


SELECT name, hobby
FROM students LEFT OUTER JOIN hobbies
ON students.id = hobbies.user_id;

name grade
Maria Stars
Maria Climbing
Rahul Coding
Peter NULL
Shiyi NULL

68 / 76
Advanced SQL Joins

Types of Join: Right Outer Join

• Task: List all the hobbies of students.


SELECT name, hobby
FROM students RIGHT OUTER JOIN hobbies
ON students.id = hobbies.user_id;

name grade
Maria Stars
Maria Climbing
Rahul Coding
NULL Rugby

69 / 76
Advanced SQL Joins

Types of Join: Full Outer Join

• Task: List all the hobbies of all students.


SELECT name, hobby
FROM students FULL OUTER JOIN hobbies
ON students.id = hobbies.user_id;

name grade
Maria Stars
Maria Climbing
Rahul Coding
NULL Rugby
Peter NULL
Shiyi NULL

70 / 76
Advanced SQL Joins

More Types of Join

• SEMI JOIN (⋉)


▶ Returns record from the left table if there is a matching record in the right table
▶ Unlike regular JOIN, only returns columns from the left table and no duplicates.
▶ We do not care about the values of other columns in the right table’s record
▶ Used to execute queries with EXISTS or IN operators
• ANTI JOIN (▷)
▶ Opposite of a SEMI JOIN
▶ Returns record from the left table if there is no matching record in the right table
▶ Used to execute queries with NOT EXISTS or NOT IN operators
• LATERAL JOIN (▶◁) (a.k.a., Dependent Join, CROSS APPLY)
▶ Subqueries appearing in FROM clause can be preceded by the key word LATERAL
▶ Table functions appearing in FROM clause can also be preceded by the key word
LATERAL

71 / 76
Advanced SQL Joins

Types of Join: Semi Join

• Task: List the names of students with hobbies.


SELECT name
FROM students
WHERE sid IN
(SELECT sid
FROM hobbies);

name
Maria
Rahul

72 / 76
Advanced SQL Joins

Types of Join: Anti Join

• Task: List the names of students without hobbies.


SELECT name
FROM students
WHERE sid NOT IN
(SELECT sid
FROM hobbies);

name
Shiyi
Peter

73 / 76
Advanced SQL Joins

Types of Join: Lateral Join

• Task: List the names of students with hobbies.


SELECT name
FROM students, LATERAL (SELECT sid FROM hobbies
WHERE students.sid = hobbies.sid) ss;

name
Maria
Maria
Rahul

74 / 76
Advanced SQL Joins

Conclusion

• SQL is not a dead language.


• You should (almost) always strive to compute your answer as a single SQL statement.

75 / 76
Advanced SQL Joins

Next Class

• Storage Management

76 / 76

You might also like