Advanced SQL II: CS 564 - Fall 2020

The document covers advanced SQL topics: aggregation, GROUP BY, HAVING, NULL values, and outer joins. Aggregate operators such as SUM, COUNT, and AVG compute a single value over a column. GROUP BY groups rows by column values, and HAVING filters the resulting groups. NULL values follow 3-valued logic in comparisons and propagate through arithmetic. Outer joins, unlike inner joins, keep rows that have no match.


ADVANCED

SQL II

CS 564 - Fall 2020

ACKs: Dan Suciu, Jignesh Patel, AnHai Doan


WHAT IS THIS LECTURE ABOUT

• SQL: Aggregation
– Aggregate operators
– GROUP BY
– HAVING
• SQL: Nulls
• SQL: Outer Joins

CS 564 [Fall 2020] - Paris Koutris 2


AGGREGATION

• SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to compute that
aggregate over the column
• COUNT(*) simply counts the number of tuples

SELECT AVG(Population)
FROM Country
WHERE Continent = 'Europe';
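
A minimal sketch of the query above, run through Python's built-in sqlite3 module. The Country rows are made-up toy data, not the course database; only the column names Continent and Population come from the slide.

```python
import sqlite3

# In-memory database with a toy Country table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Name TEXT, Continent TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO Country VALUES (?, ?, ?)",
    [("A", "Europe", 10), ("B", "Europe", 30), ("C", "Asia", 50)],
)

# AVG aggregates Population over the tuples that pass the WHERE filter.
avg_population = conn.execute(
    "SELECT AVG(Population) FROM Country WHERE Continent = 'Europe'"
).fetchone()[0]

# COUNT(*) counts tuples, regardless of their column values.
num_tuples = conn.execute("SELECT COUNT(*) FROM Country").fetchone()[0]

print(avg_population, num_tuples)
```

Only the two European rows contribute to the average, while COUNT(*) sees all three tuples.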



AGGREGATION: ELIMINATE DUPLICATES

We can use COUNT(DISTINCT <attribute>) to
remove duplicate values of the attribute before counting!

SELECT COUNT(DISTINCT Language)
FROM CountryLanguage;
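
To see what DISTINCT changes, the sketch below compares COUNT(Language) with COUNT(DISTINCT Language) on a small CountryLanguage table. The rows are hypothetical and chosen so that each language appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT)")
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?)",
    [("X", "English"), ("Y", "English"), ("Y", "French"), ("Z", "French")],
)

# Without DISTINCT, every non-NULL Language value is counted.
count_all = conn.execute(
    "SELECT COUNT(Language) FROM CountryLanguage"
).fetchone()[0]

# With DISTINCT, duplicate Language values are collapsed before counting.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT Language) FROM CountryLanguage"
).fetchone()[0]

print(count_all, count_distinct)
```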



GROUP BY

• We may follow a SELECT-FROM-WHERE expression
by GROUP BY and a list of attributes
• The relation is then grouped according to the
values of those attributes, and any aggregation is
applied only within each group

SELECT Continent, COUNT(*)
FROM Country
GROUP BY Continent;



GROUP BY: EXAMPLE
SELECT A, SUM(B * C)
FROM R
GROUP BY A;

R            after grouping       after the SELECT clause
A  B  C      A  B  C              A  SUM(B*C)
a  2  0      a  2  0              a  5    (5 = 2*0 + 5*1)
a  5  1         5  1              b  7
b  7  1      b  7  1              c  4
b  6  0         6  0
c  4  1      c  4  1
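
The example can be reproduced with Python's sqlite3 module, using the five tuples of R from the slide. The ORDER BY A is an addition made here only to give the output a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [("a", 2, 0), ("a", 5, 1), ("b", 7, 1), ("b", 6, 0), ("c", 4, 1)],
)

# SUM(B * C) is computed separately inside each group of equal A values.
group_sums = conn.execute(
    "SELECT A, SUM(B * C) FROM R GROUP BY A ORDER BY A"
).fetchall()

print(group_sums)
```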



RESTRICTIONS
If any aggregation is used, then each element of the
SELECT list must be either:
– aggregated, or
– an attribute on the GROUP BY list

This query is wrong: Continent is neither aggregated
nor on the GROUP BY list!

SELECT Continent, COUNT(Code)
FROM Country
GROUP BY Code;


GROUP BY + HAVING

The HAVING clause always follows a GROUP BY
clause in a SQL query
• it applies to each group, and groups not satisfying the
condition are removed
• it can refer only to attributes of relations in the FROM
clause, as long as the attribute makes sense within a group
(it is aggregated or it is a grouping attribute)

The HAVING condition is checked once per group, not
once per tuple!



HAVING: EXAMPLE

SELECT Language, COUNT(CountryCode) AS N
FROM CountryLanguage
WHERE Percentage >= 50
GROUP BY Language
HAVING N > 2
ORDER BY N DESC;
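
A small sketch of this query on hypothetical data, using Python's sqlite3 module. One deliberate difference: the HAVING clause repeats COUNT(CountryCode) instead of using the alias N, because not every SQL engine accepts a SELECT alias inside HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, Percentage REAL)"
)
# Toy rows: English is spoken by >= 50% in 3 countries, Spanish in 4, French in 2.
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?, ?)",
    [
        ("C1", "English", 90), ("C2", "English", 80), ("C3", "English", 60),
        ("C4", "English", 10),
        ("C5", "Spanish", 95), ("C6", "Spanish", 85), ("C7", "Spanish", 75),
        ("C8", "Spanish", 55),
        ("C9", "French", 70), ("C10", "French", 65),
    ],
)

# WHERE filters tuples before grouping; HAVING filters whole groups afterwards.
majority_languages = conn.execute(
    """
    SELECT Language, COUNT(CountryCode) AS N
    FROM CountryLanguage
    WHERE Percentage >= 50
    GROUP BY Language
    HAVING COUNT(CountryCode) > 2
    ORDER BY N DESC
    """
).fetchall()

print(majority_languages)
```

French is dropped by HAVING (only 2 qualifying countries), and the 10% English row is dropped earlier by WHERE.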



PUTTING IT ALL TOGETHER

SELECT [DISTINCT] S
FROM R, S, T, …
WHERE C1
GROUP BY attributes
HAVING C2
ORDER BY attribute ASC/DESC
LIMIT N ;



CONCEPTUAL EVALUATION

1. Compute the FROM-WHERE part, obtain a table
with all attributes in R, S, T, …
2. Group the tuples by the attributes in the GROUP BY
3. Compute the aggregates and keep only groups
satisfying condition C2 in the HAVING clause
4. Compute the aggregates in the SELECT list S
5. Order by the attributes specified in ORDER BY
6. Limit the output if necessary
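
The six steps can be mimicked in plain Python, as a sketch of the conceptual order only (a real engine is free to evaluate the query differently). The rows, and the choice of query (COUNT per Language with Percentage >= 50, more than 2 countries, ordered descending, LIMIT 1), are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical CountryLanguage tuples (no NULLs, so len() matches COUNT).
rows = [
    {"CountryCode": "C1", "Language": "English", "Percentage": 90},
    {"CountryCode": "C2", "Language": "English", "Percentage": 80},
    {"CountryCode": "C3", "Language": "English", "Percentage": 60},
    {"CountryCode": "C4", "Language": "English", "Percentage": 10},
    {"CountryCode": "C5", "Language": "Spanish", "Percentage": 95},
    {"CountryCode": "C6", "Language": "Spanish", "Percentage": 85},
    {"CountryCode": "C7", "Language": "Spanish", "Percentage": 75},
    {"CountryCode": "C8", "Language": "Spanish", "Percentage": 55},
    {"CountryCode": "C9", "Language": "French", "Percentage": 70},
]

# 1. FROM-WHERE: keep the tuples satisfying condition C1.
filtered = [r for r in rows if r["Percentage"] >= 50]

# 2. GROUP BY: partition the tuples by the grouping attribute.
groups = defaultdict(list)
for r in filtered:
    groups[r["Language"]].append(r)

# 3. HAVING: compute the aggregate and keep groups satisfying condition C2.
kept = {lang: members for lang, members in groups.items() if len(members) > 2}

# 4. SELECT: compute the output aggregates for each remaining group.
selected = [(lang, len(members)) for lang, members in kept.items()]

# 5. ORDER BY N DESC.
ordered = sorted(selected, key=lambda pair: pair[1], reverse=True)

# 6. LIMIT 1.
limited = ordered[:1]

print(ordered, limited)
```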



NULL VALUES

• Tuples in SQL relations can have NULL as a value
for one or more attributes
• The meaning depends on context:
– Missing value: e.g. we know that Greece has
some population, but we don’t know what it is
– Inapplicable: e.g. the value of attribute spouse
for an unmarried person



NULL PROPAGATION

• When we do arithmetic operations using NULL, the
result is again a NULL
– (10 * x) + 5 returns NULL if x is NULL
– NULL / 0 also returns NULL!
• String concatenation also results in NULL when
one of the operands is NULL
– 'Wisconsin' || NULL || '-Madison' returns NULL
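
These three expressions can be checked directly in SQLite through Python's sqlite3 module, which maps SQL NULL to Python's None. This is a quick check under SQLite's behavior; other engines may differ, for example in how they treat division by zero on non-NULL operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression has a NULL operand, so each result is NULL.
arithmetic, division, concatenation = conn.execute(
    "SELECT (10 * NULL) + 5, NULL / 0, 'Wisconsin' || NULL || '-Madison'"
).fetchone()

print(arithmetic, division, concatenation)
```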



COMPARISONS WITH NULL

• The logic of conditions in SQL is 3-valued logic:
– TRUE = 1
– FALSE = 0
– UNKNOWN = 0.5
• When any value is compared with a NULL, the
result is UNKNOWN
– e.g. x > 5 is UNKNOWN if x is NULL
• A query produces a tuple in the answer only if
the WHERE clause evaluates to TRUE (1) for that tuple


3-VALUED LOGIC

The truth value of a WHERE clause is computed
using the following rules:

• C1 AND C2 ----> min{ value(C1), value(C2) }
• C1 OR C2 ----> max{ value(C1), value(C2) }
• NOT C ----> 1 - value(C)
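
The min/max/1-x rules translate directly into a few Python functions. This is a sketch of the numeric encoding only: the helper names (compare, and3, or3, not3) are made up here, and None stands in for NULL. It evaluates the condition (A > 0) AND ((B < 5) OR (NOT C = 3)) on the tuple (1, NULL, NULL).

```python
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0

def compare(value, predicate):
    """Truth value of a comparison; any comparison with NULL (None) is UNKNOWN."""
    if value is None:
        return UNKNOWN
    return TRUE if predicate(value) else FALSE

def and3(x, y):
    return min(x, y)

def or3(x, y):
    return max(x, y)

def not3(x):
    return 1 - x

# Tuple (A, B, C) = (1, NULL, NULL).
a, b, c = 1, None, None

truth = and3(
    compare(a, lambda v: v > 0),                      # 1
    or3(
        compare(b, lambda v: v < 5),                  # 0.5
        not3(compare(c, lambda v: v == 3)),           # 1 - 0.5 = 0.5
    ),                                                # max{0.5, 0.5} = 0.5
)                                                     # min{1, 0.5} = 0.5

# Only tuples whose condition is TRUE are returned by a query.
is_returned = truth == TRUE

print(truth, is_returned)
```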



3-VALUED LOGIC: EXAMPLE
SELECT *
FROM R
WHERE (R.A>0) AND ((R.B<5) OR (NOT R.C=3));

For the tuple (A, B, C) = (1, NULL, NULL):

R.A>0                        1
R.B<5                        0.5
R.C=3                        0.5
NOT R.C=3                    0.5  (1-0.5)
(R.B<5) OR (NOT R.C=3)       0.5  (max{0.5, 0.5})
whole WHERE condition        0.5  (min{1, 0.5})

The expression is UNKNOWN!


COMPLICATIONS

What will happen in the following query?

SELECT COUNT(*)
FROM Country
WHERE IndepYear > 1990 OR IndepYear <= 1990 ;

It will not count the rows where IndepYear is NULL!



TESTING FOR NULL

We can test for NULL explicitly:
– x IS NULL
– x IS NOT NULL

SELECT COUNT(*)
FROM Country
WHERE IndepYear > 1990 OR IndepYear <= 1990
OR IndepYear IS NULL;
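
A short sketch comparing the two counts on a toy Country table, run with Python's sqlite3 module. The four rows are hypothetical; one of them has a NULL IndepYear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Name TEXT, IndepYear INTEGER)")
conn.executemany(
    "INSERT INTO Country VALUES (?, ?)",
    [("A", 1991), ("B", 1800), ("C", None), ("D", 2000)],
)

# For the NULL row both comparisons are UNKNOWN, so the row is skipped.
count_without_null_test = conn.execute(
    "SELECT COUNT(*) FROM Country "
    "WHERE IndepYear > 1990 OR IndepYear <= 1990"
).fetchone()[0]

# The explicit IS NULL test is TRUE for that row, so it is counted.
count_with_null_test = conn.execute(
    "SELECT COUNT(*) FROM Country "
    "WHERE IndepYear > 1990 OR IndepYear <= 1990 OR IndepYear IS NULL"
).fetchone()[0]

print(count_without_null_test, count_with_null_test)
```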



OUTER JOINS



INNER JOINS

The joins we have seen so far are inner joins

SELECT C.Name AS Country, MAX(T.Population) AS N
FROM Country C, City T
WHERE C.Code = T.CountryCode
GROUP BY C.Name;

Alternative syntax:

SELECT C.Name AS Country, MAX(T.Population) AS N
FROM Country C
INNER JOIN City T ON C.Code = T.CountryCode
GROUP BY C.Name;

We can also simply write JOIN instead of INNER JOIN


LEFT OUTER JOINS

A left outer join includes tuples from the left relation
even if there’s no match on the right! It fills the
remaining attributes with NULL

SELECT C.Name AS Country, MAX(T.Population)
FROM Country C
LEFT OUTER JOIN City T
ON C.Code = T.CountryCode
GROUP BY C.Name;



LEFT OUTER JOIN: EXAMPLE

SELECT A, C
FROM R LEFT OUTER JOIN S
ON R.B = S.B

R            S              result
A  B         B  C           A  C
a  2         2  100         a  100
a  5         3  200         a  300
b  5         5  300         b  300
c  6         7  400         c  NULL
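
The same example can be run with Python's sqlite3 module, using the tuples of R and S from the slide. The inner join is included for contrast, and the ORDER BY clauses are additions made here only to fix the output order. SQL NULL appears as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("a", 2), ("a", 5), ("b", 5), ("c", 6)])
conn.executemany("INSERT INTO S VALUES (?, ?)", [(2, 100), (3, 200), (5, 300), (7, 400)])

# Inner join: the R tuple (c, 6) has no match in S and disappears.
inner_rows = conn.execute(
    "SELECT A, C FROM R JOIN S ON R.B = S.B ORDER BY A, C"
).fetchall()

# Left outer join: (c, 6) is kept and its C attribute is filled with NULL.
left_rows = conn.execute(
    "SELECT A, C FROM R LEFT OUTER JOIN S ON R.B = S.B ORDER BY A, C"
).fetchall()

print(inner_rows)
print(left_rows)
```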



OTHER OUTER JOINS

• Left outer join:
– include the left tuple even if there is no match
• Right outer join:
– include the right tuple even if there is no match
• Full outer join:
– include both the left and right tuples even if
there is no match
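
A sketch of the right and full variants on two small tables R(A, B) and S(B, C), with hypothetical rows. Because older SQLite builds do not support RIGHT or FULL OUTER JOIN, both are emulated here with LEFT OUTER JOIN: the right outer join by swapping the operands, and the full outer join by appending the unmatched S tuples with UNION ALL. On engines that support the keywords, RIGHT OUTER JOIN and FULL OUTER JOIN can be written directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("a", 2), ("a", 5), ("b", 5), ("c", 6)])
conn.executemany("INSERT INTO S VALUES (?, ?)", [(2, 100), (3, 200), (5, 300), (7, 400)])

# Right outer join of R with S, written as a left outer join of S with R:
# every S tuple is kept, with A = NULL where no R tuple matches.
right_rows = conn.execute(
    "SELECT R.A, S.C FROM S LEFT OUTER JOIN R ON R.B = S.B ORDER BY S.B, R.A"
).fetchall()

# Full outer join: the left outer join plus the S tuples without any match in R.
full_rows = conn.execute(
    """
    SELECT R.A, S.C FROM R LEFT OUTER JOIN S ON R.B = S.B
    UNION ALL
    SELECT NULL, S.C FROM S
    WHERE NOT EXISTS (SELECT 1 FROM R WHERE R.B = S.B)
    """
).fetchall()

print(right_rows)
print(full_rows)
```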

