Lecture7 Fall
SQL Intermediate
Jia Zou
Arizona State University
Today’s Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
Practice
• HW2-Problem 11: Return the distinct names of suppliers that have
each supplied at least two parts. (Use a self-join in both the SQL
query and the relational algebra.)
Practice
Simplified Question:
• Return the distinct keys of suppliers that have each supplied
at least two parts
We need a self-join on the PartSupp table.
Key Idea:
1. Rename the two copies of the PartSupp table (e.g., to PS1 and PS2, respectively)
2. A self join on PS1.ps_suppkey = PS2.ps_suppkey AND PS1.ps_partkey <>
PS2.ps_partkey
3. Project on ps_suppkey
Idea 1: Renaming
ρ_{ps1}(PartSupp)    ρ_{ps2}(PartSupp)
Idea 2: Self-Join
ρ_{ps1}(PartSupp) ⨝_{ps1.ps_suppkey = ps2.ps_suppkey ∧ ps1.ps_partkey <> ps2.ps_partkey} ρ_{ps2}(PartSupp)
Idea 3: Projection
Π_{ps1.ps_suppkey}(ρ_{ps1}(PartSupp) ⨝_{ps1.ps_suppkey = ps2.ps_suppkey ∧ ps1.ps_partkey <> ps2.ps_partkey} ρ_{ps2}(PartSupp))
Now that we have solved the simplified problem,
how do we solve the original problem?
• Idea: Join with the Supplier Table, which contains the names of suppliers
Join with the Supplier table:
Supplier ⨝_{Supplier.s_suppkey = ps1.ps_suppkey}
  Π_{ps1.ps_suppkey}(ρ_{ps1}(PartSupp) ⨝_{ps1.ps_suppkey = ps2.ps_suppkey ∧ ps1.ps_partkey <> ps2.ps_partkey} ρ_{ps2}(PartSupp))
Is that sufficient?
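The user wants supplier names, so we need to run a final projection
Π_{s_name} on top of the join ⨝_{Supplier.s_suppkey = ps1.ps_suppkey},
whose inputs are Supplier and the self-join of ρ_{ps1}(PartSupp) and ρ_{ps2}(PartSupp).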
Final Answer to HW2-Problem 11
Relational Algebra:
Π_{s_name}(Supplier ⨝_{s_suppkey = ps1.ps_suppkey}
  (ρ_{ps1}(PartSupp) ⨝_{ps1.ps_partkey <> ps2.ps_partkey ∧ ps1.ps_suppkey = ps2.ps_suppkey} ρ_{ps2}(PartSupp)))
SQL Query
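A minimal sketch of the self-join query in SQL, run through Python's built-in sqlite3 module on made-up rows. The column names s_suppkey, s_name, ps_partkey, and ps_suppkey follow the relational algebra above; the table contents are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (s_suppkey INTEGER PRIMARY KEY, s_name TEXT);
    CREATE TABLE PartSupp (ps_partkey INTEGER, ps_suppkey INTEGER);
    -- Made-up rows: supplier 1 supplies two parts, supplier 2 only one.
    INSERT INTO Supplier VALUES (1, 'Supplier#1'), (2, 'Supplier#2');
    INSERT INTO PartSupp VALUES (10, 1), (11, 1), (10, 2);
""")

# ps1 and ps2 are the two renamed copies of PartSupp.
query = """
    SELECT DISTINCT s.s_name
    FROM Supplier s, PartSupp ps1, PartSupp ps2
    WHERE s.s_suppkey = ps1.ps_suppkey
      AND ps1.ps_suppkey = ps2.ps_suppkey
      AND ps1.ps_partkey <> ps2.ps_partkey
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

DISTINCT matters here: every qualifying supplier matches at least twice, once for each ordering of a pair of its parts.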
HW2-Problem 12 will be similar!
Today’s Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
Missing Information
• Example: User (uid, name, age, pop)
• Value unknown
• We do not know Nelson’s age
• Value not applicable
• Suppose pop is based on interactions with others on our social networking site
• Nelson is new to our site; what is his pop?
Solution 1
• Dedicate a value from each domain (type)
• pop cannot be −1, so use −1 as a special value to
indicate a missing or invalid pop
• Leads to incorrect answers if not careful
• SELECT AVG(pop) FROM User;
• Complicates applications
• SELECT AVG(pop) FROM User WHERE pop <> -1;
• Perhaps the value is not as special as you
think!
• Ever heard of the Y2K bug? “00” was used as a
missing or invalid year value
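A small sketch of how the −1 sentinel skews an average, using Python's built-in sqlite3 module. The rows are made up for illustration; only the User schema and the two queries come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER, name TEXT, age INTEGER, pop REAL);
    -- Made-up rows; Nelson's pop is missing, encoded as the sentinel -1.
    INSERT INTO User VALUES
        (142, 'Bart', 10, 0.5),
        (857, 'Lisa', 8, 0.25),
        (789, 'Nelson', 12, -1);
""")

# The naive average silently includes the sentinel.
naive = conn.execute("SELECT AVG(pop) FROM User").fetchone()[0]
# Every query must remember to filter the sentinel out.
filtered = conn.execute(
    "SELECT AVG(pop) FROM User WHERE pop <> -1"
).fetchone()[0]
print(naive, filtered)
```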
Solution 2
• A valid-bit for every column
• User (uid, name, name_is_valid,
age, age_is_valid,
pop, pop_is_valid)
• Complicates schema and queries
• SELECT AVG(pop) FROM User WHERE pop_is_valid;
Solution 3
• Decompose the table; missing row = missing value
• UserName (uid, name)
• UserAge (uid, age)
• UserPop (uid, pop)
• UserID (uid)
• Still complicates schema and queries
• How to get all information about users in a table?
• Natural join doesn’t work: a user with no row in any one of the tables drops out of the result!
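A sketch of the problem with the decomposed schema, using Python's built-in sqlite3 module. The rows are made up; Nelson has a name but no age or pop row, so the natural join loses him entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserID (uid INTEGER);
    CREATE TABLE UserName (uid INTEGER, name TEXT);
    CREATE TABLE UserAge (uid INTEGER, age INTEGER);
    CREATE TABLE UserPop (uid INTEGER, pop REAL);
    INSERT INTO UserID VALUES (142), (789);
    INSERT INTO UserName VALUES (142, 'Bart'), (789, 'Nelson');
    INSERT INTO UserAge VALUES (142, 10);   -- no row for 789: age missing
    INSERT INTO UserPop VALUES (142, 0.5);  -- no row for 789: pop missing
""")

rows = conn.execute("""
    SELECT uid, name, age, pop
    FROM UserID NATURAL JOIN UserName NATURAL JOIN UserAge NATURAL JOIN UserPop
""").fetchall()
print(rows)  # Nelson (uid 789) is absent from the result
```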
SQL’s solution
• A special value NULL
• For every domain
• Special rules for dealing with NULLs
• Example: User (uid, name, age, pop)
• <789, “Nelson”, NULL, NULL>
Computing with NULLs
• When we operate on a NULL and another value (including another
NULL) using +, −, etc., the result is NULL
• Aggregate functions ignore NULL, except COUNT(*) (since it counts
rows)
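Both rules can be checked with a short sketch using Python's built-in sqlite3 module (SQL NULL surfaces as None in Python). The rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES
        (142, 'Bart', 10, 0.5),
        (857, 'Lisa', 8, 0.25),
        (789, 'Nelson', NULL, NULL);
""")

# Arithmetic on NULL yields NULL.
null_sum = conn.execute("SELECT NULL + 1").fetchone()[0]

# COUNT(*) counts rows; COUNT(pop), AVG(pop), and SUM(pop) skip the NULL.
stats = conn.execute(
    "SELECT COUNT(*), COUNT(pop), AVG(pop), SUM(pop) FROM User"
).fetchone()
print(null_sum, stats)
```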
Three-valued Logic
• TRUE = 1, FALSE = 0, UNKNOWN = 0.5
• x AND y = min(x, y)
• x OR y = max(x, y)
• NOT x = 1 − x
• When we compare a NULL with another value (including another
NULL) using = , >, etc., the result is UNKNOWN
• WHERE and HAVING clauses only select rows for output if the
condition evaluates to TRUE
• UNKNOWN is not enough
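The min/max encoding above can be written out directly. This is a plain-Python sketch of the rules, not how any database implements them; the function names are made up for illustration.

```python
# Numeric encoding from the slide.
TRUE, FALSE, UNKNOWN = 1.0, 0.0, 0.5

def and3(x, y):
    """Three-valued AND: the minimum of the two truth values."""
    return min(x, y)

def or3(x, y):
    """Three-valued OR: the maximum of the two truth values."""
    return max(x, y)

def not3(x):
    """Three-valued NOT: 1 minus the truth value."""
    return 1 - x

def passes_where(cond):
    """WHERE/HAVING keep a row only if the condition is TRUE."""
    return cond == TRUE

print(and3(UNKNOWN, FALSE))   # FALSE wins over UNKNOWN
print(or3(UNKNOWN, TRUE))     # TRUE wins over UNKNOWN
print(not3(UNKNOWN))          # stays UNKNOWN
print(passes_where(UNKNOWN))  # row is filtered out
```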
IS NULL/IS NOT NULL
• Example: Who has NULL pop values?
• SELECT * FROM User WHERE pop = NULL;
• Does not work; never returns anything
• SQL introduced special, built-in predicates IS NULL and IS NOT NULL
• SELECT * FROM User WHERE pop IS NULL;
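A quick check of the two predicates, using Python's built-in sqlite3 module on made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES
        (142, 'Bart', 10, 0.5),
        (789, 'Nelson', NULL, NULL);
""")

# pop = NULL evaluates to UNKNOWN for every row, so nothing is selected.
rows_eq = conn.execute("SELECT name FROM User WHERE pop = NULL").fetchall()
# IS NULL is a real predicate that evaluates to TRUE or FALSE.
rows_is = conn.execute("SELECT name FROM User WHERE pop IS NULL").fetchall()
print(rows_eq, rows_is)
```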
Outerjoin motivation
First let’s see: what is the natural join result?
Optional: Left Outer Natural Join
• Symbol: ⟕ (keeps every row of the left input, padding unmatched rows with NULLs)
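A sketch contrasting the natural join with the left outer natural join, using Python's built-in sqlite3 module. The User and Member tables and their rows are made up for illustration; Nelson belongs to no group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER, name TEXT);
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    INSERT INTO User VALUES (142, 'Bart'), (789, 'Nelson');
    INSERT INTO Member VALUES (142, 'dps');
""")

# Natural join: Nelson has no Member row, so he disappears.
inner = conn.execute(
    "SELECT uid, name, gid FROM User NATURAL JOIN Member ORDER BY uid"
).fetchall()

# Left outer natural join: Nelson is kept, with gid padded to NULL (None).
outer = conn.execute(
    "SELECT uid, name, gid FROM User NATURAL LEFT OUTER JOIN Member ORDER BY uid"
).fetchall()
print(inner)
print(outer)
```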
Today’s Agenda
• Data Definition Language
• Data Manipulation Language
• Basic Queries (SELECT-FROM-WHERE)
• ORDER BY
• Set Operations
• Null Values
• Aggregation
Aggregates
• Standard SQL aggregate functions: COUNT, SUM, AVG, MIN, MAX
• Example: number of users under 18, and their average popularity
• SELECT COUNT(*), AVG(pop)
FROM User
WHERE age < 18;
• COUNT(*) counts the number of rows
Aggregates with Distinct
• Example: How many users are in some group?
• SELECT COUNT(DISTINCT uid)
FROM Member;
is equivalent to:
• SELECT COUNT(*)
FROM (SELECT DISTINCT uid FROM Member);
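The two forms can be compared on made-up Member rows with Python's built-in sqlite3 module. The subquery alias t is an addition, since some systems require a derived table to be named.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    -- Made-up rows: users 142 and 857 each belong to two groups.
    INSERT INTO Member VALUES
        (142, 'dps'), (142, 'gov'),
        (857, 'abc'), (857, 'gov'),
        (456, 'abc');
""")

direct = conn.execute("SELECT COUNT(DISTINCT uid) FROM Member").fetchone()[0]
nested = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT uid FROM Member) AS t"
).fetchone()[0]
print(direct, nested)
```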
Practice
• Return the total number of parts whose retail price is
higher than 2000.
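One possible answer, sketched with Python's built-in sqlite3 module. The Part table and the column names p_partkey, p_name, and p_retailprice are assumed (TPC-H style) and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (p_partkey INTEGER, p_name TEXT, p_retailprice REAL);
    INSERT INTO Part VALUES
        (1, 'part a', 1500.0),
        (2, 'part b', 2000.0),   -- not strictly higher than 2000
        (3, 'part c', 2001.5),
        (4, 'part d', 2500.0);
""")

num_parts = conn.execute(
    "SELECT COUNT(*) FROM Part WHERE p_retailprice > 2000"
).fetchone()[0]
print(num_parts)
```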
Practice
• I want to purchase all available instances of the part
named 'blush thistle blue yellow saddle' in the world.
How much supply cost in total should I pay for these
parts? (Each supplier supplies a different available
quantity of the part, and each supplier supplies the part
at a different supply cost)
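One possible answer, sketched with Python's built-in sqlite3 module. It assumes the total is the sum, over all suppliers of the part, of available quantity times unit supply cost. The column names p_partkey, p_name, ps_availqty, and ps_supplycost are assumed (TPC-H style) and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (p_partkey INTEGER, p_name TEXT);
    CREATE TABLE PartSupp (ps_partkey INTEGER, ps_suppkey INTEGER,
                           ps_availqty INTEGER, ps_supplycost REAL);
    INSERT INTO Part VALUES
        (1, 'blush thistle blue yellow saddle'),
        (2, 'some other part');
    INSERT INTO PartSupp VALUES
        (1, 10, 100, 2.0),   -- 100 units at 2.0 each
        (1, 11, 50, 3.0),    -- 50 units at 3.0 each
        (2, 10, 10, 1.0);    -- different part, must not be counted
""")

total = conn.execute("""
    SELECT SUM(ps.ps_availqty * ps.ps_supplycost)
    FROM Part p JOIN PartSupp ps ON p.p_partkey = ps.ps_partkey
    WHERE p.p_name = 'blush thistle blue yellow saddle'
""").fetchone()[0]
print(total)
```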
Grouping
• SELECT … FROM … WHERE …
GROUP BY list_of_columns
Example of Grouping By
• SELECT age, AVG(pop) FROM User GROUP BY age;
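A sketch of this query on made-up rows, using Python's built-in sqlite3 module. ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES
        (142, 'Bart', 10, 0.5),
        (123, 'Milhouse', 10, 0.25),
        (857, 'Lisa', 8, 0.75),
        (456, 'Ralph', 8, 0.25);
""")

# One output row per distinct age; AVG is computed within each group.
groups = conn.execute(
    "SELECT age, AVG(pop) FROM User GROUP BY age ORDER BY age"
).fetchall()
print(groups)
```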
Example of Aggregates (with no Group By)
• An aggregate query with no GROUP BY clause = all rows go into one
group
• SELECT AVG(pop) AS avg_pop FROM User;
Having
• Used to filter groups based on the group properties (e.g., aggregate
values, GROUP BY column values)
• SELECT … FROM … WHERE … GROUP BY …
HAVING condition;
Having examples
• List the average popularity for each age group with more than a
hundred users
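One way to write this query, sketched with Python's built-in sqlite3 module. The rows are generated for illustration: 101 users aged 20 and 3 users aged 30, so only the first group passes the HAVING filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (uid INTEGER, name TEXT, age INTEGER, pop REAL)")

# Made-up data: a large age group and a small one.
rows = [(i, f"u{i}", 20, 0.5) for i in range(101)]
rows += [(1000 + i, f"v{i}", 30, 0.9) for i in range(3)]
conn.executemany("INSERT INTO User VALUES (?, ?, ?, ?)", rows)

# HAVING filters whole groups after grouping, here by group size.
result = conn.execute("""
    SELECT age, AVG(pop)
    FROM User
    GROUP BY age
    HAVING COUNT(*) > 100
""").fetchall()
print(result)
```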
Practice
• For each supplier whose account balance is below 0, return the
total number of parts it supplies. The results should
contain two columns:
• Supplier key: SupplierKey
• Number of different types of parts supplied by this supplier: NumParts
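One possible answer, sketched with Python's built-in sqlite3 module. The column names s_suppkey, s_acctbal, ps_partkey, and ps_suppkey are assumed (TPC-H style) and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (s_suppkey INTEGER, s_name TEXT, s_acctbal REAL);
    CREATE TABLE PartSupp (ps_partkey INTEGER, ps_suppkey INTEGER);
    INSERT INTO Supplier VALUES
        (1, 'Supplier#1', -50.0),
        (2, 'Supplier#2', 100.0),   -- positive balance, excluded
        (3, 'Supplier#3', -1.0);
    INSERT INTO PartSupp VALUES (10, 1), (11, 1), (10, 2), (12, 3);
""")

# WHERE filters rows before grouping; one group per remaining supplier.
per_supplier = conn.execute("""
    SELECT s.s_suppkey AS SupplierKey,
           COUNT(DISTINCT ps.ps_partkey) AS NumParts
    FROM Supplier s JOIN PartSupp ps ON s.s_suppkey = ps.ps_suppkey
    WHERE s.s_acctbal < 0
    GROUP BY s.s_suppkey
    ORDER BY s.s_suppkey
""").fetchall()
print(per_supplier)
```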
Practice
• Please return the total number of parts supplied by
suppliers whose supply cost is below 100. The results should
contain two columns:
• Supplier key: SupplierKey
• Number of different types of parts supplied by this supplier: NumParts
Practice
• For each supplier whose account balance is below 0, return the
total number of parts it supplies. The results should
contain two columns:
• Supplier Name
• Number of different types of parts supplied by this supplier: NumParts
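One possible answer for the name-based variant, sketched with Python's built-in sqlite3 module. Column names are assumed (TPC-H style) and the rows are made up; two suppliers deliberately share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (s_suppkey INTEGER, s_name TEXT, s_acctbal REAL);
    CREATE TABLE PartSupp (ps_partkey INTEGER, ps_suppkey INTEGER);
    INSERT INTO Supplier VALUES
        (1, 'Acme', -50.0),
        (2, 'Acme', -10.0),    -- same name, different supplier
        (3, 'Other', 5.0);     -- positive balance, excluded
    INSERT INTO PartSupp VALUES (10, 1), (11, 1), (10, 2), (12, 3);
""")

# Grouping by the key as well as the name keeps same-named suppliers apart.
by_name = conn.execute("""
    SELECT s.s_name, COUNT(DISTINCT ps.ps_partkey) AS NumParts
    FROM Supplier s JOIN PartSupp ps ON s.s_suppkey = ps.ps_suppkey
    WHERE s.s_acctbal < 0
    GROUP BY s.s_suppkey, s.s_name
    ORDER BY s.s_suppkey
""").fetchall()
print(by_name)
```

Grouping by s_name alone would merge the two 'Acme' suppliers into one row, which is why the key is included in the GROUP BY list in this sketch.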