01.murachs MySQL 2019 Chapter 06

This document provides an overview of advanced SQL skills, focusing on summarizing data using aggregate functions and creating summary queries. It explains how to use the GROUP BY and HAVING clauses to group and filter results, as well as the syntax and application of various aggregate functions like COUNT, AVG, SUM, MIN, and MAX. The content is structured in independent modules, allowing readers to learn and apply new SQL skills as needed.

More SQL skills as you need them


In section 1, you learned a professional subset of SQL skills that you can
use to work with data in an existing database. Now, in this section, you
can add to those skills by learning new skills whenever you need them. To
make that possible, each chapter in this section has been designed as an
independent module. As a result, you can read these chapters in whatever
sequence you prefer.
In chapter 6, you'll learn how to summarize the data that you retrieve.
In chapter 7, you'll learn more about coding subqueries. In chapter 8,
you'll learn more about the types of data that MySQL supports. And
in chapter 9, you'll learn how to use MySQL functions in your SQL
statements.
How to code summary queries

In this chapter, you'll learn how to code queries that summarize data. For
example, you can use summary queries to report sales totals by vendor or
state. Similarly, you can use summary queries to get a count of the number
of invoices that were processed each day of the month. But first, you'll learn
how to use a special type of function called an aggregate function. Aggregate
functions allow you to do jobs like calculate averages, summarize totals, or find
the highest value for a given column, and you'll use them in summary queries.

How to work with aggregate functions
  How to code aggregate functions
  Queries that use aggregate functions
How to group and summarize data
  How to code the GROUP BY and HAVING clauses
  Queries that use the GROUP BY and HAVING clauses
  How the HAVING clause compares to the WHERE clause
  How to code compound search conditions
  How to use the WITH ROLLUP operator
  How to use the GROUPING function
How to code aggregate window functions
  How the aggregate window functions work
  How to use frames
  How to use named windows
Perspective
Section 2 More SQL skills as you need them

How to work with aggregate functions


In chapter 3, you learned how to use scalar functions, which operate on a
single value and return a single value. In this chapter, you'll learn how to use
aggregate functions, which operate on a series of values and return a single
summary value. Because aggregate functions typically operate on the values
in columns, they are sometimes referred to as column functions. A query that
contains one or more aggregate functions is typically referred to as a summary
query.

How to code aggregate functions


Figure 6-1 presents the syntax of the most common aggregate functions.
Most of these functions operate on an expression. Typically, the expression is
just a column name. For example, you could get the average of all values in the
invoice_total column like this:
AVG(invoice_total)
However, an expression can also be more complex. In this figure, for example,
the expression that's coded for the SUM function calculates the balance due of
an invoice using the invoice_total, payment_total, and credit_total columns. The
result is a single value that represents the total amount due for all the selected
invoices. In this case, the WHERE clause selects only those invoices with a
balance due.
When you use these functions, you can also code the ALL or DISTINCT
keyword. The ALL keyword is the default, which means that all values are
included in the calculation. The exceptions are null values, which are excluded
from all these functions except for COUNT(*).
If you don't want duplicate values included, you can code the DISTINCT
keyword. In most cases, you'll use DISTINCT only with the COUNT function
as shown in the next figure. You won't use it with MIN or MAX because it has
no effect on those functions. And it doesn't usually make sense to use it with the
AVG and SUM functions.
Unlike the other aggregate functions, you can't use the ALL or DISTINCT
keywords or an expression with COUNT(*). Instead, you code this function
exactly as shown in the syntax. The value returned by this function is the number
of rows in the base table that satisfy the search condition of the query, including
rows with null values. In this figure, for example, the COUNT(*) function in the
query indicates that the Invoices table contains 11 invoices with a balance due.
Chapter 6 How to code summary queries

The syntax of the aggregate functions


Function syntax                    Result
AVG([ALL|DISTINCT] expression)     The average of the non-null values in the expression.
SUM([ALL|DISTINCT] expression)     The total of the non-null values in the expression.
MIN([ALL|DISTINCT] expression)     The lowest non-null value in the expression.
MAX([ALL|DISTINCT] expression)     The highest non-null value in the expression.
COUNT([ALL|DISTINCT] expression)   The number of non-null values in the expression.
COUNT(*)                           The number of rows selected by the query.

A summary query that counts unpaid invoices and calculates the total due
SELECT COUNT(*) AS number_of_invoices,
    SUM(invoice_total - payment_total - credit_total) AS total_due
FROM invoices
WHERE invoice_total - payment_total - credit_total > 0

number_of_invoices  total_due
11                  32020.42

Description
• Aggregate functions, also called column functions, perform a calculation on the
values in a set of selected rows.
• A summary query is a SELECT statement that includes one or more aggregate
functions.
• The expression you specify for the AVG and SUM functions must result in a
numeric value. The expression for the MIN, MAX, and COUNT functions can
result in a numeric, date, or string value.
• By default, all values are included in the calculation regardless of whether they're
duplicated. If you want to omit duplicate values, code the DISTINCT keyword.
This keyword is typically used with the COUNT function.
• All of the aggregate functions except for COUNT(*) ignore null values.
• If you code an aggregate function in the SELECT clause, that clause can include
a non-aggregate column from the base table if that column is functionally dependent
on an aggregate column. See figure 6-3 for more information.

Figure 6-1 How to code aggregate functions
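
The null-handling rules in this figure can be checked with a small experiment. What follows is a minimal sketch rather than one of the book's examples: it assumes Python's built-in sqlite3 module as a stand-in for a MySQL server, and the four-row invoices table is made up for illustration. For these particular functions, SQLite and MySQL treat null values the same way.

```python
import sqlite3

# Hypothetical sample data (not the book's Invoices table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 300.0), (4, None)],  # invoice 4 has a null total
)

row = conn.execute(
    """
    SELECT COUNT(*)             AS row_count,        -- counts every row
           COUNT(invoice_total) AS non_null_totals,  -- skips the null total
           SUM(invoice_total)   AS total,            -- skips the null total
           AVG(invoice_total)   AS average           -- divides by 3, not 4
    FROM invoices
    """
).fetchone()

print(row)
```

Because AVG ignores the null value, it divides the total by the three non-null values instead of the four rows that COUNT(*) reports.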



Queries that use aggregate functions


This figure presents four more queries that use aggregate functions. The
first two queries use the COUNT(*) function to count the number of rows in the
Invoices table that satisfy the search condition. In both cases, only those invoices
with invoice dates after 1/1/2018 are included in the count. In addition, the first
query uses the AVG function to calculate the average amount of those invoices
and the SUM function to calculate the total amount of those invoices. In contrast,
the second query uses the MIN and MAX functions to get the minimum and
maximum invoice amounts.
Although the MIN, MAX, and COUNT functions are typically used on
columns that contain numeric data, they can also be used on columns that
contain character or date data. In the third query, for example, they're used
on the vendor_name column in the Vendors table. Here, the MIN function
returns the name of the vendor that's lowest in the sort sequence, the MAX
function returns the name of the vendor that's highest in the sort sequence, and
the COUNT function returns the total number of vendors. Note that since the
vendor_name column can't contain null values, the COUNT(*) function would
have returned the same result.
The fourth query shows how using the DISTINCT keyword can affect
the result of a COUNT function. Here, the first COUNT function uses the
DISTINCT keyword to count the number of vendors that have invoices dated
after 1/1/2018 in the Invoices table. To do that, it looks for distinct values in
the vendor_id column. In contrast, since the second COUNT function doesn't
include the DISTINCT keyword, it counts every invoice that's dated after
1/1/2018. Although you could use the COUNT(*) function instead, this example
uses COUNT(vendor_id) to clearly show the difference between coding and not
coding the DISTINCT keyword.
With two exceptions, a SELECT clause that contains an aggregate function
can contain only aggregate functions. The first exception is if the column
specification results in a literal value. This is shown by the first column in the first two
queries in figure 6-2. The second exception is if the query includes a GROUP
BY clause. Then, the SELECT clause can include any columns specified in the
GROUP BY clause as shown in the next two figures.

A summary query that uses the COUNT(*), AVG, and SUM functions
SELECT 'After 1/1/2018' AS selection_date,
    COUNT(*) AS number_of_invoices,
    ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
    SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2018-01-01'

selection_date  number_of_invoices  avg_invoice_amt  total_invoice_amt
After 1/1/2018  114                 1879.74          214290.51

A summary query that uses the MIN and MAX functions


SELECT 'After 1/1/2018' AS selection_date,
    COUNT(*) AS number_of_invoices,
    MAX(invoice_total) AS highest_invoice_total,
    MIN(invoice_total) AS lowest_invoice_total
FROM invoices
WHERE invoice_date > '2018-01-01'

selection_date  number_of_invoices  highest_invoice_total  lowest_invoice_total
After 1/1/2018  114                 37966.19               6.00

A summary query that works on non-numeric columns


SELECT MIN(vendor_name) AS first_vendor,
    MAX(vendor_name) AS last_vendor,
    COUNT(vendor_name) AS number_of_vendors
FROM vendors

first_vendor              last_vendor   number_of_vendors
Abbey Office Furnishings  Zylka Design  122

A summary query that uses the DISTINCT keyword


SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,
    COUNT(vendor_id) AS number_of_invoices,
    ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
    SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2018-01-01'

number_of_vendors  number_of_invoices  avg_invoice_amt  total_invoice_amt
34                 114                 1879.74          214290.51

Description
• To count all of the selected rows, you typically use the COUNT(*) function.
Alternately, you can use the COUNT function with the name of any column that
can't contain null values.
• To count only the rows with unique values in a specified column, you can code
the COUNT function with the DISTINCT keyword followed by the name of the
column.

Figure 6-2 Queries that use aggregate functions
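
To see the effect of the DISTINCT keyword and of MIN and MAX on character data in isolation, here is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, and a made-up three-row table that stores the vendor name alongside each invoice purely to keep the example short. The vendor names are borrowed from the figure, but the rows and amounts are hypothetical.

```python
import sqlite3

# Hypothetical, denormalized sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "vendor_id INTEGER NOT NULL, vendor_name TEXT NOT NULL, invoice_total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        (1, "Abbey Office Furnishings", 50.0),
        (1, "Abbey Office Furnishings", 75.0),
        (2, "Zylka Design", 120.0),
    ],
)

row = conn.execute(
    """
    SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,   -- unique vendors
           COUNT(vendor_id)          AS number_of_invoices,  -- one per row
           MIN(vendor_name)          AS first_vendor,        -- lowest in sort sequence
           MAX(vendor_name)          AS last_vendor          -- highest in sort sequence
    FROM invoices
    """
).fetchone()

print(row)
```

Since vendor_id is declared NOT NULL here, COUNT(vendor_id) and COUNT(*) would return the same value, which matches the note in the description above.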



How to group and summarize data


Now that you understand how aggregate functions work, you're ready to
learn how to group data and use aggregate functions to summarize the data in
each group. To do that, you can use two new clauses of the SELECT statement:
GROUP BY and HAVING.

How to code the GROUP BY and HAVING clauses


Figure 6-3 shows the syntax of the SELECT statement with the GROUP BY
and HAVING clauses. The GROUP BY clause determines how the selected rows
are grouped, and the HAVING clause determines which groups are included in
the final results. These clauses are coded after the WHERE clause but before the
ORDER BY clause. That makes sense because the WHERE clause is applied
before the rows are grouped, and the ORDER BY clause is applied after the rows
are grouped.
In the GROUP BY clause, you list one or more columns or expressions
separated by commas. Then, the rows in the result set are grouped by those
columns or expressions in ascending sequence. That means that a single row is
returned for each unique set of values in the GROUP BY columns. In this figure,
for instance, the first example groups the results by a single column. In the next
figure, you can see examples that group by multiple columns.
This example calculates the average invoice amount for each vendor who has
invoices in the Invoices table. To do that, it uses a GROUP BY clause to group
the invoices by vendor_id. As a result, the AVG function calculates the average
of the invoice_total column for each group rather than for the entire result set.
The example in this figure also includes a HAVING clause. The search
condition in this clause specifies that only those vendors with invoices that
average over $2,000 should be included. Note that this condition must be applied
after the rows are grouped and the average for each group has been calculated.
In addition to the AVG function, the SELECT clause includes the vendor_id
column. That's usually what you want since the rows are grouped by this
column. However, if you don't want to include the columns used in the GROUP
BY clause in the SELECT clause, you don't have to.
In most cases, the SELECT clause for a statement that includes a GROUP
BY clause will only include the columns that are used for grouping, along with
the aggregate functions. However, you can also include expressions that result
in a constant value as well as columns that are functionally dependent on a
column that's used for grouping. For a column to be functionally dependent on a
grouping column, the grouping column must be a primary key for the column in
the SELECT clause or it must be unique and not allow null values.
The second example illustrates how this works. This example gets the
vendor name, vendor state, and the average invoice total for each vendor.
To do that, it joins the Vendors and Invoices tables and groups the rows by
vendor_name. Although the vendor_state column isn't included in the GROUP
BY clause, it can be included in the SELECT clause because the vendor_name
column is a unique column that can't contain null values. Because of that, the
vendor_state column is functionally dependent on it.

The syntax of a SELECT statement with GROUP BY and HAVING clauses


SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_list]
[HAVING search_condition]
[ORDER BY order_by_list]

A summary query that calculates the average invoice amount by vendor


SELECT vendor_id, ROUND(AVG(invoice_total), 2) AS average_invoice_amount
FROM invoices
GROUP BY vendor_id
HAVING AVG(invoice_total) > 2000
ORDER BY average_invoice_amount DESC

vendor_id  average_invoice_amount
110        23978.48
72         10963.66
104        7125.34
99         6940.25
119        4901.26
122        2575.33
86         2433.00
100        2184.50

(8 rows)

A summary query that includes a functionally dependent column


SELECT vendor_name, vendor_state,
    ROUND(AVG(invoice_total), 2) AS average_invoice_amount
FROM vendors JOIN invoices ON vendors.vendor_id = invoices.vendor_id
GROUP BY vendor_name
HAVING AVG(invoice_total) > 2000
ORDER BY average_invoice_amount DESC

Description
• The GROUP BY clause groups the rows of a result set based on one or more columns or
expressions. To include two or more columns or expressions, separate them by commas.
• If you include aggregate functions in the SELECT clause, the aggregate is calculated for
each group specified by the GROUP BY clause.
• If you include two or more columns or expressions in the GROUP BY clause, they form
a hierarchy where each column or expression is subordinate to the previous one.
• The HAVING clause specifies a search condition for a group or an aggregate. MySQL
applies this condition after it groups the rows that satisfy the search condition in the
WHERE clause.
• When a SELECT statement includes a GROUP BY clause, the SELECT clause can
include the columns used for grouping, aggregate functions, and expressions that
result in a constant value.
• The SELECT clause can also include columns that are functionally dependent on a
column used for grouping. To be functionally dependent, the grouping column must
be a primary key of the table that contains the column in the SELECT clause or it
must be unique and not allow null values.

Figure 6-3 How to code the GROUP BY and HAVING clauses
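
The first example in this figure can be tried on a tiny data set. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL and five made-up invoice rows; only the shape of the query comes from the figure.

```python
import sqlite3

# Hypothetical sample data: three vendors with different average invoice amounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor_id INTEGER, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        (1, 1000.0), (1, 4000.0),  # vendor 1 averages 2500
        (2, 500.0), (2, 700.0),    # vendor 2 averages 600
        (3, 3000.0),               # vendor 3 averages 3000
    ],
)

rows = conn.execute(
    """
    SELECT vendor_id, ROUND(AVG(invoice_total), 2) AS average_invoice_amount
    FROM invoices
    GROUP BY vendor_id                -- one result row per vendor
    HAVING AVG(invoice_total) > 2000  -- applied after the averages are calculated
    ORDER BY average_invoice_amount DESC
    """
).fetchall()

print(rows)
```

Vendor 2 is dropped by the HAVING clause because its group average is below 2000, even though the clause never looks at an individual invoice.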



Queries that use the GROUP BY and HAVING clauses

Figure 6-4 presents three more queries that group data. The first query in this
figure groups the rows in the Invoices table by vendor_id and returns a count of
the number of invoices for each vendor.
The second query shows how you can group by more than one column.
Here, a join is used to combine the vendor_state and vendor_city columns from
the Vendors table with a count and average of the invoices in the Invoices table.
Because the rows are grouped by both state and city, a row is returned for each
state and city combination.
The third query is identical to the second query except that it includes a
HAVING clause. This clause uses the COUNT function to limit the state and city
groups that are included in the result set to those that have two or more invoices.
In other words, it excludes groups that have only one invoice.
With MySQL 8.0.12 and earlier, the GROUP BY clause sorted the columns
in ascending sequence by default. Then, to change that sequence, you could
code the DESC keyword after the column name in the GROUP BY clause. You
could also code the ASC keyword to make it clear that the rows were sorted in
ascending sequence. And you could improve the performance of a query by
coding an ORDER BY NULL clause so the result set wasn't sorted at all.
With MySQL 8.0.13 and later, the columns in a GROUP BY clause are
no longer sorted by default. In addition, you can no longer specify the ASC or
DESC keywords on this clause. Instead, you must code an ORDER BY clause
to sort the rows in a result set. Otherwise, MySQL doesn't guarantee the sort
sequence.

A summary query that counts the number of invoices by vendor


SELECT vendor_id, COUNT(*) AS invoice_qty
FROM invoices
GROUP BY vendor_id

vendor_id  invoice_qty
34         2
37         3
48         1
72         2

(34 rows)

A summary query that calculates the number of invoices and the average invoice amount for the vendors in each state and city
SELECT vendor_state, vendor_city, COUNT(*) AS invoice_qty,
    ROUND(AVG(invoice_total), 2) AS invoice_avg
FROM invoices JOIN vendors
    ON invoices.vendor_id = vendors.vendor_id
GROUP BY vendor_state, vendor_city
ORDER BY vendor_state, vendor_city

vendor_state  vendor_city  invoice_qty  invoice_avg
AZ            Phoenix      1            662.00
CA            Fresno       19           1208.75
CA            Los Angeles  1            503.20
CA            Oxnard       3            188.00

(20 rows)

A summary query that limits the groups to those with two or more invoices
SELECT vendor_state, vendor_city, COUNT(*) AS invoice_qty,
    ROUND(AVG(invoice_total), 2) AS invoice_avg
FROM invoices JOIN vendors
    ON invoices.vendor_id = vendors.vendor_id
GROUP BY vendor_state, vendor_city
HAVING COUNT(*) >= 2
ORDER BY vendor_state, vendor_city

vendor_state  vendor_city  invoice_qty  invoice_avg
CA            Fresno       19           1208.75
CA            Oxnard       3            188.00
CA            Pasadena     5            196.12
CA            Sacramento   7            253.00

(12 rows)

Description
• With MySQL 8.0.12 and earlier, the GROUP BY clause sorted the columns in ascending
sequence by default. Then, to change the sort sequence, you could code the DESC
keyword after the column name in the GROUP BY clause. In addition, to get your results
faster, you could code an ORDER BY NULL clause to prevent MySQL from sorting the
rows in the GROUP BY clause.
• With MySQL 8.0.13 and later, the columns in a GROUP BY clause are no longer sorted by
default, and you can't code the ASC or DESC keyword on this clause. Instead, you must
use the ORDER BY clause to specify the sort sequence.

Figure 6-4 Queries that use the GROUP BY and HAVING clauses

How the HAVING clause compares to the WHERE clause

As you've seen, you can limit the groups included in a result set by coding
a search condition in the HAVING clause. In addition, you can apply a search
condition to each row before it's included in a group. To do that, you code the
search condition in the WHERE clause just as you would for any SELECT
statement. To make sure you understand the differences between search conditions
coded in the HAVING and WHERE clauses, figure 6-5 presents two examples.
The first example groups the invoices in the Invoices table by vendor name
and calculates a count and average invoice amount for each group. Then, the
HAVING clause limits the groups in the result set to those that have an average
invoice total greater than $500.
In contrast, the second example includes a WHERE clause that limits the
invoices included in the groups to those that have an invoice total greater than
$500. In other words, the search condition in this example is applied to every
row. In the previous example, it was applied to each group of rows. As a result,
these examples show that there are eight invoices for Zylka Design in the
Invoices table, but only seven of them are over $500.
Beyond this, there are two differences in the expressions that you can
include in the WHERE and HAVING clauses. First, the HAVING clause can
include aggregate functions as shown in the first example, but the WHERE
clause can't. That's because the search condition in a WHERE clause is applied
before the rows are grouped. Second, although the WHERE clause can refer to
any column in the base tables, the HAVING clause can only refer to columns
included in the SELECT clause. That's because it filters the summarized result
set that's defined by the SELECT, FROM, WHERE, and GROUP BY clauses. In
other words, it doesn't filter the base tables.

A summary query with a search condition in the HAVING clause


SELECT vendor_name,
    COUNT(*) AS invoice_qty,
    ROUND(AVG(invoice_total), 2) AS invoice_avg
FROM vendors JOIN invoices
    ON vendors.vendor_id = invoices.vendor_id
GROUP BY vendor_name
HAVING AVG(invoice_total) > 500
ORDER BY invoice_qty DESC

vendor_name               invoice_qty  invoice_avg
United Parcel Service     9            2575.33
Zylka Design              8            867.53
Malloy Lithographing Inc  5            23978.48
IBM                       2            600.06

(19 rows)

A summary query with a search condition in the WHERE clause


SELECT vendor_name,
    COUNT(*) AS invoice_qty,
    ROUND(AVG(invoice_total), 2) AS invoice_avg
FROM vendors JOIN invoices
    ON vendors.vendor_id = invoices.vendor_id
WHERE invoice_total > 500
GROUP BY vendor_name
ORDER BY invoice_qty DESC

vendor_name               invoice_qty  invoice_avg
United Parcel Service     9            2575.33
Zylka Design              7            9~.67
Malloy Lithographing Inc  5            23978.48
Ingram                    2            1on.21

(20 rows)

Description
• When you include a WHERE clause in a SELECT statement that uses grouping
and aggregates, MySQL applies the search condition before it groups the rows and
calculates the aggregates.
• When you include a HAVING clause in a SELECT statement that uses grouping
and aggregates, MySQL applies the search condition after it groups the rows and
calculates the aggregates.
• A WHERE clause can refer to any column in the base tables.
• A HAVING clause can only refer to a column included in the SELECT clause.
• A WHERE clause can't contain aggregate functions.
• A HAVING clause can contain aggregate functions.

Figure 6-5 How the HAVING clause compares to the WHERE clause
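
The difference between filtering rows and filtering groups is easiest to see when both queries run against the same small table. The sketch below is an illustration only, assuming Python's built-in sqlite3 module in place of MySQL; the vendor names and amounts are made up.

```python
import sqlite3

# Hypothetical sample data: Vendor A has one small and one large invoice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor_name TEXT, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("Vendor A", 200.0), ("Vendor A", 1000.0),
        ("Vendor B", 300.0), ("Vendor B", 400.0),
        ("Vendor C", 800.0),
    ],
)

# HAVING: group first, then keep the groups whose average exceeds 500.
having_rows = conn.execute(
    """
    SELECT vendor_name, COUNT(*) AS invoice_qty,
           ROUND(AVG(invoice_total), 2) AS invoice_avg
    FROM invoices
    GROUP BY vendor_name
    HAVING AVG(invoice_total) > 500
    ORDER BY vendor_name
    """
).fetchall()

# WHERE: discard invoices of 500 or less first, then group what is left.
where_rows = conn.execute(
    """
    SELECT vendor_name, COUNT(*) AS invoice_qty,
           ROUND(AVG(invoice_total), 2) AS invoice_avg
    FROM invoices
    WHERE invoice_total > 500
    GROUP BY vendor_name
    ORDER BY vendor_name
    """
).fetchall()

print(having_rows)
print(where_rows)
```

Vendor A appears in both results, but with two invoices averaging 600 under HAVING and with one invoice of 1000 under WHERE, which mirrors the Zylka Design difference described above.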

How to code compound search conditions


You can code compound search conditions in a HAVING clause just as you
can in a WHERE clause. The first example in figure 6-6 shows how this works.
This query groups invoices by invoice date and calculates a count of the invoices
and the sum of the invoice totals for each date. In addition, the HAVING clause
specifies three conditions. First, the invoice date must be between 5/1/2018 and
5/31/2018. Second, the invoice count must be greater than 1. And third, the sum
of the invoice totals must be greater than $100.
In the HAVING clause of this query, the second and third conditions include
aggregate functions. As a result, they must be coded in the HAVING clause. The
first condition, however, doesn't include an aggregate function, so it could be
coded in either the HAVING or WHERE clause. The second example shows this
condition coded in the WHERE clause. Either way, both queries return the same
result set.
So, where should you code your search conditions? In general, I think
queries are easier to read when they include all the search conditions in the
HAVING clause. However, if you prefer to code non-aggregate search conditions
in the WHERE clause, that's OK too.

A summary query with a compound condition in the HAVING clause


SELECT
    invoice_date,
    COUNT(*) AS invoice_qty,
    SUM(invoice_total) AS invoice_sum
FROM invoices
GROUP BY invoice_date
HAVING invoice_date BETWEEN '2018-05-01' AND '2018-05-31'
    AND COUNT(*) > 1
    AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC

The same query coded with a WHERE clause


SELECT
    invoice_date,
    COUNT(*) AS invoice_qty,
    SUM(invoice_total) AS invoice_sum
FROM invoices
WHERE invoice_date BETWEEN '2018-05-01' AND '2018-05-31'
GROUP BY invoice_date
HAVING COUNT(*) > 1
    AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC

The result set returned by both queries


invoice_date  invoice_qty  invoice_sum
2018-05-31    2            453.75
2018-05-25    3            2201.15
2018-05-23    2            347.75
2018-05-21    2            8078.44
2018-05-13    3            1888.95
2018-05-11    2            5009.51
2018-05-03    2            866.87

(7 rows)

Description
• You can use the AND and OR operators to code compound search conditions in a
HAVING clause just as you can in a WHERE clause.
• If a search condition includes an aggregate function, it must be coded in the
HAVING clause. Otherwise, it can be coded in either the HAVING or the WHERE
clause.

Figure 6-6 How to code compound search conditions
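
The claim that both forms return the same result set can be checked directly. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a handful of made-up invoices stored with ISO-format date strings; only the two query shapes come from the figure.

```python
import sqlite3

# Hypothetical sample data covering each condition in the compound search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_date TEXT, invoice_total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("2018-05-03", 400.0), ("2018-05-03", 500.0),  # passes all three conditions
        ("2018-05-11", 50.0),                           # only one invoice on this date
        ("2018-05-21", 30.0), ("2018-05-21", 40.0),    # sum is not over 100
        ("2018-06-02", 500.0), ("2018-06-02", 600.0),  # outside the date range
    ],
)

# All three conditions in the HAVING clause.
having_rows = conn.execute(
    """
    SELECT invoice_date, COUNT(*) AS invoice_qty, SUM(invoice_total) AS invoice_sum
    FROM invoices
    GROUP BY invoice_date
    HAVING invoice_date BETWEEN '2018-05-01' AND '2018-05-31'
       AND COUNT(*) > 1
       AND SUM(invoice_total) > 100
    ORDER BY invoice_date DESC
    """
).fetchall()

# The non-aggregate condition moved to the WHERE clause.
where_rows = conn.execute(
    """
    SELECT invoice_date, COUNT(*) AS invoice_qty, SUM(invoice_total) AS invoice_sum
    FROM invoices
    WHERE invoice_date BETWEEN '2018-05-01' AND '2018-05-31'
    GROUP BY invoice_date
    HAVING COUNT(*) > 1
       AND SUM(invoice_total) > 100
    ORDER BY invoice_date DESC
    """
).fetchall()

print(having_rows == where_rows, having_rows)
```

The date condition can move freely because invoice_date is the grouping column, so every row in a group shares the same value for it.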



How to use the WITH ROLLUP operator


So far, this chapter has discussed standard SQL keywords and functions.
However, MySQL provides an extension to standard SQL that's useful for
summarizing data: the WITH ROLLUP operator.
You can use the WITH ROLLUP operator to add one or more summary rows
to a result set that uses grouping and aggregates. The two examples in figure 6-7
show how this works.
The first example shows how the WITH ROLLUP operator works when you
group by a single column. This statement groups the invoices by vendor_id and
calculates an invoice count and invoice total for each vendor group. In addition,
since the GROUP BY clause includes the WITH ROLLUP operator, this query
adds a summary row to the end of the result set. This row summarizes all of the
aggregate columns in the result set. In this case, it summarizes the invoice_count
and invoice_total columns. Since the vendor_id column can't be summarized,
it's assigned a null value.
The second query in this figure shows how the WITH ROLLUP operator
works when you group by two columns. This query groups vendors by state
and city and counts the number of vendors in each group. Then, this query adds
summary rows for each state, and it adds a final summary row at the end of the
result set.
Before MySQL 8.0.13, you couldn't use an ORDER BY clause to
sort a result set if the GROUP BY clause included the WITH ROLLUP operator.
Instead, you had to sort the individual columns by coding the ASC or DESC
keyword after the column name in the GROUP BY clause. With MySQL 8.0.13
and later, however, you can't code the ASC or DESC keyword on the GROUP
BY clause. Instead, if you want to sort the result set, you can now use an
ORDER BY clause. Keep in mind, though, that when you use WITH ROLLUP,
the result set is sorted by the columns in the GROUP BY clause in ascending
sequence by default. So you'll only code an ORDER BY clause if you want to
change this sequence.

A summary query that includes a final summary row


SELECT vendor_id, COUNT(*) AS invoice_count,
    SUM(invoice_total) AS invoice_total
FROM invoices
GROUP BY vendor_id WITH ROLLUP

vendor_id  invoice_count  invoice_total
119        1              4901.26
121        8              6940.25
122        9              23177.96
123        47             4378.02
NULL       114            214290.51

(35 rows)

A summary query that includes a summary row for each grouping level
SELECT vendor_state, vendor_city, COUNT(*) AS qty_vendors
FROM vendors
WHERE vendor_state IN ('IA', 'NJ')
GROUP BY vendor_state, vendor_city WITH ROLLUP

vendor_state  vendor_city     qty_vendors
IA            Fairfield       1
IA            Washington      1
IA            NULL            2
NJ            East Brunswick  2
NJ            Fairfield       1
NJ            Washington      1
NJ            NULL            4
NULL          NULL            6

Description
• You can use the WITH ROLLUP operator in the GROUP BY clause to add
summary rows to the final result set.
• The WITH ROLLUP operator adds a summary row for each group specified in the
GROUP BY clause. It also adds a summary row to the end of the result set that
summarizes the entire result set.
• If the GROUP BY clause specifies a single group, the WITH ROLLUP operator
only adds the final summary row.
• With MySQL 8.0.12 and earlier, you couldn't use the ORDER BY clause with the
WITH ROLLUP operator. Instead, to sort individual columns, you coded the ASC
or DESC keyword after the column in the GROUP BY clause.
• With MySQL 8.0.13 and later, you can't code the ASC or DESC keyword on the
GROUP BY clause. However, you can now include an ORDER BY clause to sort
the result set when the GROUP BY clause includes WITH ROLLUP.
• With MySQL 8.0.12 and earlier, you couldn't use the DISTINCT keyword in any
of the aggregate functions when you used the WITH ROLLUP operator. With
MySQL 8.0.13 and later, you can use the DISTINCT keyword.

Figure 6-7 How to use the WITH ROLLUP operator
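
To make clear which rows WITH ROLLUP adds, the sketch below builds the same three grouping levels by hand. It is an emulation, not the MySQL operator: it assumes Python's built-in sqlite3 module, which has no WITH ROLLUP, so the summary rows are produced with UNION ALL and then ordered so each state's summary follows its cities. The six vendor rows are made up to match the counts in the second example of this figure.

```python
import sqlite3

# Hypothetical vendor rows chosen to reproduce the counts shown in the figure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_state TEXT, vendor_city TEXT)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [
        ("IA", "Fairfield"), ("IA", "Washington"),
        ("NJ", "East Brunswick"), ("NJ", "East Brunswick"),
        ("NJ", "Fairfield"), ("NJ", "Washington"),
    ],
)

rows = conn.execute(
    """
    SELECT vendor_state, vendor_city, qty_vendors
    FROM (
        -- detail level: one row per state and city
        SELECT vendor_state, vendor_city, COUNT(*) AS qty_vendors
        FROM vendors GROUP BY vendor_state, vendor_city
        UNION ALL
        -- summary row for each state (city is null)
        SELECT vendor_state, NULL, COUNT(*)
        FROM vendors GROUP BY vendor_state
        UNION ALL
        -- final summary row (state and city are null)
        SELECT NULL, NULL, COUNT(*) FROM vendors
    ) AS rolled
    -- sort nulls last so each summary row follows the rows it summarizes
    ORDER BY vendor_state IS NULL, vendor_state, vendor_city IS NULL, vendor_city
    """
).fetchall()

for r in rows:
    print(r)
```

In MySQL, adding WITH ROLLUP to the GROUP BY clause produces these summary rows without the extra SELECT statements.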



How to use the GROUPING function


When you group by a column that can contain null values, the result of the
grouping can be a null value. In addition, when you use the WITH ROLLUP
operator to summarize a column that can contain null values, the summary row
will contain a null value in that column. Because of that, it can be difficult to
distinguish between the null values due to grouping and the null values due to
summarizing.
The first query in figure 6-8 illustrates how this works. Here, the query
includes the invoice date and payment date from the Invoices table, as well as
the sum of the invoice totals and the sum of the invoice balances. The first five
rows in the result set are for the same invoice date. Both the first and the last of
those five rows contain a null value in the payment_date column. The first row
contains a null value because one or more of the invoices for that invoice date
have a null payment date. The last row contains a null value because it is a summary
row for all of the invoices for that invoice date. Without studying this result set
carefully, though, it's difficult to tell which null values are for summary rows
and which aren't.
To help distinguish between these null values, you can use the GROUPING
function that was introduced with MySQL 8.0. This function evaluates the
expression you specify to determine if the expression results in a null value
because it's in a summary row. If it does, the GROUPING function returns a
value of 1. Otherwise, it returns a value of 0.
This is illustrated by the second query in this figure. This query is the
same as the first query except that it uses IF and GROUPING functions for the
invoice_date and payment_date columns in the SELECT clause. You'll learn
about the IF function in chapter 9. For now, just realize that it evaluates the
expression in the first argument and returns the second argument if the expres-
sion is true or the third argument if it's not.
In this case, the first argument of each IF function is a GROUPING function.
These GROUPING functions test if the invoice_date or payment_date column
contains a null value because it's in a summary row. If so, the IF function returns
the literal value that's specified by the second argument. Otherwise, it returns
the value of the column grouping that's specified by the third argument. If
you compare the results of this query with the results of the first query, you'll
see that it's now obvious which rows are summary rows because they contain
literal values instead of null values. This is a common use for the GROUPING
function.
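
Before wrapping GROUPING in an IF function, it can help to look at the raw values it returns. The following sketch assumes MySQL 8.0 and a hypothetical temporary table named grouping_demo. The column aliases is_grand_total and is_date_total are made up for illustration.

```sql
-- Hypothetical sample data (a few rows modeled on the Invoices table).
CREATE TEMPORARY TABLE grouping_demo (
    invoice_date  DATE,
    payment_date  DATE,            -- NULL means the invoice hasn't been paid
    invoice_total DECIMAL(9,2)
);

INSERT INTO grouping_demo (invoice_date, payment_date, invoice_total) VALUES
    ('2018-07-24', NULL,         503.20),
    ('2018-07-24', '2018-08-19', 3689.99),
    ('2018-07-28', NULL,         90.36);

-- Each GROUPING column is 1 only when the null in the tested column
-- was generated by WITH ROLLUP for a summary row.
SELECT invoice_date, payment_date,
       SUM(invoice_total) AS invoice_total,
       GROUPING(invoice_date) AS is_grand_total,
       GROUPING(payment_date) AS is_date_total
FROM grouping_demo
GROUP BY invoice_date, payment_date WITH ROLLUP;
```

In the result, a row with a null payment date and a 0 in the is_date_total column represents unpaid invoices, while a row with a null payment date and a 1 in that column is a summary row.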

The basic syntax of the GROUPING function


GROUPING(expression)

A summary query that uses WITH ROLLUP on a table with null values
SELECT invoice_date, payment_date,
    SUM(invoice_total) AS invoice_total,
    SUM(invoice_total - credit_total - payment_total) AS balance_due
FROM invoices
WHERE invoice_date BETWEEN '2018-07-24' AND '2018-07-31'
GROUP BY invoice_date, payment_date WITH ROLLUP

invoice_date  payment_date  invoice_total  balance_due
2018-07-24    NULL          503.20         503.20
2018-07-24    2018-08-19    3689.99        0.00
2018-07-24    2018-08-23    67.00          0.00
2018-07-24    2018-08-27    23517.58       0.00
2018-07-24    NULL          27777.77       503.20
2018-07-25    2018-08-22    1000.46        0.00
2018-07-25    NULL          1000.46        0.00
2018-07-28    NULL          90.36          90.36
2018-07-28    NULL          90.36          90.36
2018-07-30    2018-09-03    22.57          0.00
2018-07-30    NULL          22.57          0.00
2018-07-31    NULL          10976.06       10976.06
2018-07-31    NULL          10976.06       10976.06
NULL          NULL          39867.22       11569.62

A query that substitutes literals for nulls in summary rows


SELECT IF(GROUPING(invoice_date) = 1, 'Grand totals', invoice_date)
        AS invoice_date,
    IF(GROUPING(payment_date) = 1, 'Invoice date totals', payment_date)
        AS payment_date,
    SUM(invoice_total) AS invoice_total,
    SUM(invoice_total - credit_total - payment_total) AS balance_due
FROM invoices
WHERE invoice_date BETWEEN '2018-07-24' AND '2018-07-31'
GROUP BY invoice_date, payment_date WITH ROLLUP

invoice_date  payment_date         invoice_total  balance_due
2018-07-24    NULL                 503.20         503.20
2018-07-24    2018-08-19           3689.99        0.00
2018-07-24    2018-08-23           67.00          0.00
2018-07-24    2018-08-27           23517.58       0.00
2018-07-24    Invoice date totals  27777.77       503.20
2018-07-25    2018-08-22           1000.46        0.00
2018-07-25    Invoice date totals  1000.46        0.00
2018-07-28    NULL                 90.36          90.36
2018-07-28    Invoice date totals  90.36          90.36
2018-07-30    2018-09-03           22.57          0.00
2018-07-30    Invoice date totals  22.57          0.00
2018-07-31    NULL                 10976.06       10976.06
2018-07-31    Invoice date totals  10976.06       10976.06
Grand totals  Invoice date totals  39867.22       11569.62

Description
• The GROUPING function returns 1 if the expression is null because it's in a
summary row. Otherwise, it returns 0.

Figure 6-8 How to use the GROUPING function (part 1 of 2)



Part 2 of figure 6-8 shows another common use for the GROUPING
function. The query in this example is identical to the second one in part 1
of this figure, except that it includes a HAVING clause. This clause uses the
GROUPING function to filter the result set so only the summary rows are
included. To do that, it checks if this function returns a value of 1 for the
invoice_date or payment_date column.

A query that displays only summary rows


SELECT IF(GROUPING(invoice_date) = 1, 'Grand totals', invoice_date)
        AS invoice_date,
    IF(GROUPING(payment_date) = 1, 'Invoice date totals', payment_date)
        AS payment_date,
    SUM(invoice_total) AS invoice_total,
    SUM(invoice_total - credit_total - payment_total) AS balance_due
FROM invoices
WHERE invoice_date BETWEEN '2018-07-24' AND '2018-07-31'
GROUP BY invoice_date, payment_date WITH ROLLUP
HAVING GROUPING(invoice_date) = 1 OR GROUPING(payment_date) = 1

invoice_date  payment_date         invoice_total  balance_due
2018-07-24    Invoice date totals  27777.77       503.20
2018-07-25    Invoice date totals  1000.46        0.00
2018-07-28    Invoice date totals  90.36          90.36
2018-07-30    Invoice date totals  22.57          0.00
2018-07-31    Invoice date totals  10976.06       10976.06
Grand totals  Invoice date totals  39867.22       11569.62

Description
• The GROUPING function is commonly used to replace the nulls that are generated
by WITH ROLLUP with literal values. To do that, you can use it with the IF
function as shown in this figure.
• The IF function evaluates the expression in the first argument and returns the
second argument if the expression is true and the third argument if the expression is
false. See chapter 9 for more information on this function.
• If you want to display just the summary rows produced by the WITH ROLLUP
operator, you can include one or more GROUPING functions in the HAVING
clause.
• In addition to the SELECT and HAVING clauses, you can code the GROUPING
function in the ORDER BY clause.

Figure 6-8 How to use the GROUPING function (part 2 of 2)
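
The figure doesn't show the GROUPING function in an ORDER BY clause, so here is a minimal sketch of one way that might look. It assumes MySQL 8.0.13 or later and a hypothetical temporary table named grouping_sort_demo with made-up rows.

```sql
-- Hypothetical sample data (a few rows modeled on the Invoices table).
CREATE TEMPORARY TABLE grouping_sort_demo (
    invoice_date  DATE,
    payment_date  DATE,
    invoice_total DECIMAL(9,2)
);

INSERT INTO grouping_sort_demo (invoice_date, payment_date, invoice_total) VALUES
    ('2018-07-28', NULL,         90.36),
    ('2018-07-24', '2018-08-19', 3689.99),
    ('2018-07-24', NULL,         503.20);

-- GROUPING returns 0 for detail rows and 1 for summary rows, so sorting
-- by it first keeps each summary row after the rows it summarizes.
SELECT invoice_date, payment_date,
       SUM(invoice_total) AS invoice_total
FROM grouping_sort_demo
GROUP BY invoice_date, payment_date WITH ROLLUP
ORDER BY GROUPING(invoice_date), invoice_date,
         GROUPING(payment_date), payment_date;
```

Without the GROUPING functions in the ORDER BY clause, the null values in the summary rows would sort ahead of the other values in ascending sequence, which would move the summary rows to the top of each group.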



How to code aggregate window functions

In the topics that follow, you'll learn how to use the aggregate window
functions that were introduced with MySQL 8.0. You can use the window
functions with any of the aggregate functions you learned about in this chapter.

How the aggregate window functions work


Earlier in this chapter, you learned how to use some of the aggregate
functions with the GROUP BY clause to group and summarize data. When
you use GROUP BY, a single row is returned for each unique set of values in
the grouped columns. If a result set is grouped by the vendor_id column, for
example, only one row is returned for each vendor, and that vendor is summa-
rized by the aggregate functions that are included in the SELECT clause.
Aggregate window functions are similar except that the groups, or partitions,
aren't collapsed to a single row. Instead, all of the rows in the result set are
returned.
Figure 6-9 illustrates how this works. To start, you code an aggregate
window function by including the OVER clause. This clause defines the window
that's used by the aggregate function. A window consists of all of the rows that
are needed to evaluate the function for the current row. You'll learn more about
how this works as you review the examples that follow.
The first example in this figure shows a SELECT statement that includes
two aggregate window functions. Both of these functions use the SUM function
to calculate a total of the invoice_total column. However, the OVER clause for
the first function is empty, which means that all of the rows in the result set
are included in a single partition. Because of that, the total_invoices column
contains the same value for each row, which is the total of all of the invoices
in the result set. In this case, to calculate the total of all invoices, the SUM
function for each row needs a window into all of the other rows in the result set.
By contrast, the second window function in this query uses the PARTITION
BY clause to partition the result set by the vendor_id column. That way, the sum
of the invoice totals is calculated for each vendor instead of for all vendors. You
can see the result of this function in the vendor_total column. In this case, to
calculate the total of all invoices for each vendor, the SUM function for each row
needs a window into all the other rows for the same vendor. That means that if
the first row for vendor 110 is the current row, it needs a window into the other
four rows for that vendor.
If you want to sort the rows within each partition, you can code the ORDER
BY clause on the OVER clause. This is illustrated by the second example in this
figure. Here, the second aggregate window function indicates that the invoices
for each vendor should be sorted by the invoice_total column. If you compare
the sequence of the invoices for vendor 110 in this result set with the sequence in
the first result set, you shouldn't have any trouble understanding how this works.

The basic syntax of the OVER clause


OVER([PARTITION BY expression1 [, expression2]...]
     [ORDER BY expression1 [ASC|DESC] [, expression2 [ASC|DESC]]...])

A SELECT statement with two aggregate window functions


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER() AS total_invoices,
    SUM(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_total
FROM invoices
WHERE invoice_total > 5000

vendor_id  invoice_date  invoice_total  total_invoices  vendor_total
72         2018-06-01    21842.00       155800.00       21842.00
99         2018-06-18    6940.25        155800.00       6940.25
104        2018-05-21    7125.34        155800.00       7125.34
110        2018-05-28    37966.19       155800.00       119892.41
110        2018-07-19    26881.40       155800.00       119892.41
110        2018-07-23    20551.18       155800.00       119892.41
110        2018-07-24    23517.58       155800.00       119892.41
110        2018-07-31    10976.06       155800.00       119892.41

A SELECT statement that includes a cumulative total


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER() AS total_invoices,
    SUM(invoice_total) OVER(PARTITION BY vendor_id
        ORDER BY invoice_total) AS vendor_total
FROM invoices
WHERE invoice_total > 5000

vendor_id  invoice_date  invoice_total  total_invoices  vendor_total
72         2018-06-01    21842.00       155800.00       21842.00
99         2018-06-18    6940.25        155800.00       6940.25
104        2018-05-21    7125.34        155800.00       7125.34
110        2018-07-31    10976.06       155800.00       10976.06
110        2018-07-23    20551.18       155800.00       31527.24
110        2018-07-24    23517.58       155800.00       55044.82
110        2018-07-19    26881.40       155800.00       81926.22
110        2018-05-28    37966.19       155800.00       119892.41

Description
• The window functions can be used with all of the aggregate functions listed in
figure 6-1, as well as others.
• Unlike aggregate functions that use GROUP BY, the groups, or partitions, in an
aggregate window function are not collapsed to a single row.
• A window consists of all of the rows that are needed to calculate the aggregate
value for the current row.
• To treat an aggregate function as a window function, you include an OVER clause
that indicates how to partition the rows in the result set.
• If you code an empty OVER clause, the entire result set is treated as a single partition.
• If you code an OVER clause with a PARTITION BY clause, the aggregate function
is performed on each partition.
• If you code an ORDER BY clause on the OVER clause, the rows within each parti-
tion are sorted and the values from one row to the next are cumulative.

Figure 6-9 How the aggregate window functions work
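
To make the contrast between GROUP BY and an aggregate window function concrete, the following sketch runs both against the same rows. It assumes MySQL 8.0 and a hypothetical temporary table named window_demo that holds four invoice totals taken from the result sets in this figure.

```sql
-- Hypothetical table holding a few of the rows shown in the figure.
CREATE TEMPORARY TABLE window_demo (
    vendor_id     INT,
    invoice_total DECIMAL(9,2)
);

INSERT INTO window_demo (vendor_id, invoice_total) VALUES
    (72,  21842.00),
    (110, 37966.19),
    (110, 26881.40),
    (110, 10976.06);

-- GROUP BY collapses each vendor to a single row (two rows in all).
SELECT vendor_id, SUM(invoice_total) AS vendor_total
FROM window_demo
GROUP BY vendor_id;

-- The window function keeps all four rows and repeats the vendor's
-- total on each row of that vendor's partition.
SELECT vendor_id, invoice_total,
       SUM(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_total
FROM window_demo;
```

Both queries calculate the same totals. The difference is only in how many rows are returned and whether the detail columns, such as invoice_total, are still available.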



Another difference between these two result sets is the values in the
vendor_total column for vendor 110. That's because, when you code the
ORDER BY clause, the frame includes all of the rows from the start of the
partition through the current row. You'll learn more about defining frames explicitly
in the next figure. For now, just realize that a frame consists of one or more rows
within a partition relative to the current row.
For the SUM function, this means that the column contains a cumulative
total for each vendor. To see how this works, you can compare the values in
the vendor_total column with the values in the invoice_total column for vendor
110. Here, the values for the first row are the same. However, the second row
in the vendor_total column contains the value of the first row plus the value of
the second row in the invoice_total column. The third row in the vendor_total
column contains the value of the second row in that column plus the value of the
third row in the invoice_total column. And so on.
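
The accumulation described above can be reproduced with just the five invoice totals for vendor 110. This is a minimal sketch that assumes MySQL 8.0 and a hypothetical temporary table named cumulative_demo. The values in the comments are the cumulative sums of those five totals, which should match the vendor_total column in the second result set of figure 6-9.

```sql
-- Hypothetical table holding the five invoice totals for vendor 110.
CREATE TEMPORARY TABLE cumulative_demo (
    invoice_total DECIMAL(9,2)
);

INSERT INTO cumulative_demo (invoice_total) VALUES
    (37966.19), (26881.40), (20551.18), (23517.58), (10976.06);

-- With ORDER BY on the OVER clause, each row's sum covers the rows
-- from the start of the partition through the current row.
SELECT invoice_total,
       SUM(invoice_total) OVER(ORDER BY invoice_total) AS running_total
FROM cumulative_demo
ORDER BY invoice_total;

-- invoice_total   running_total
-- 10976.06        10976.06
-- 20551.18        31527.24     (10976.06 + 20551.18)
-- 23517.58        55044.82     (31527.24 + 23517.58)
-- 26881.40        81926.22     (55044.82 + 26881.40)
-- 37966.19        119892.41    (81926.22 + 37966.19)
```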

How to use frames


In addition to partitioning the rows in the result set for an aggregate
function, you can create a frame that defines a subset of the current partition.
Because a frame is relative to the current row, it can move within a partition as
the current row changes. As you'll see, that makes it easy to calculate cumulative
totals and moving averages.
Figure 6-10 shows the syntax for defining a frame. To start, you can code the
ROWS or RANGE keyword. If you use ROWS, the frame is determined by the
number of rows before and after the current row. If you use RANGE, the frame
is determined by the value of the rows before and after the current row. In some
cases, you can get the same result with either ROWS or RANGE. In other cases,
though, you'll need to use one or the other to get the result you want. You'll see
examples of that in a minute.
Following the ROWS or RANGE keyword, you can specify just the starting
row for the frame or both the starting and ending rows. If you specify just the
starting row, the ending row is the current row. To specify both a starting and
ending row, you code a BETWEEN clause.
To indicate where a frame starts or ends, you can code any of the values
in the table shown in this figure. To illustrate, the first example in this figure
shows how to define a frame that includes the first row in the partition up to
and including the current row. To do that, it uses the ROWS keyword followed
by a BETWEEN clause that specifies the starting and ending rows. In this case,
UNBOUNDED PRECEDING indicates that the frame starts at the first row in
the partition, and CURRENT ROW indicates that the frame ends at the current
row. Then, because the rows in the partitions are sorted, the column contains
cumulative values.
Note that, because the frame ends at the current row, you could also define
the frame like this:
ROWS UNBOUNDED PRECEDING

The syntax for defining a frame


{ROWS | RANGE} {frame_start | BETWEEN frame_start AND frame_end}

Possible values for frame_start and frame_end


Value                 Description
CURRENT ROW           The frame starts or ends with the current row.
UNBOUNDED PRECEDING   The frame starts or ends with the first row in the partition.
UNBOUNDED FOLLOWING   The frame starts or ends with the last row in the partition.
expr PRECEDING        With ROWS, the frame starts expr rows before the current row. With
                      RANGE, the frame starts with the first row before the current row
                      whose value is expr less than the value of the current row.
expr FOLLOWING        With ROWS, the frame starts expr rows after the current row. With
                      RANGE, the frame starts with the last row after the current row whose
                      value is expr greater than the value of the current row.

A SELECT statement that defines a frame


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER() AS total_invoices,
    SUM(invoice_total) OVER(PARTITION BY vendor_id ORDER BY invoice_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS vendor_total
FROM invoices
WHERE invoice_date BETWEEN '2018-04-01' AND '2018-04-30'

vendor_id  invoice_date  invoice_total  total_invoices  vendor_total
89         2018-04-24    95.00          5828.18         95.00
95         2018-04-30    16.33          5828.18         16.33
96         2018-04-26    662.00         5828.18         662.00
121        2018-04-24    601.95         5828.18         601.95
122        2018-04-08    3813.33        5828.18         3813.33
123        2018-04-10    40.20          5828.18         40.20
123        2018-04-13    138.75         5828.18         178.95
123        2018-04-16    144.70         5828.18         323.65
123        2018-04-16    15.50          5828.18         339.15
123        2018-04-16    42.75          5828.18         381.90
123        2018-04-21    172.50         5828.18         554.40
123        2018-04-24    42.67          5828.18         597.07
123        2018-04-25    42.50          5828.18         639.57

Description
• A frame can be defined as the number of rows before and after the current row
(ROWS) or a range of values based on the value of the current row (RANGE).
• If you specify just the starting row for a frame, the ending row is the current row.
To specify both a starting and ending row, you use the BETWEEN clause. When
you use BETWEEN, the starting row for a frame must not come after the ending
row.
• If an ORDER BY clause is included in the OVER clause and you use the ROWS
keyword, values are accumulated up to and including the current row as shown above.

Figure 6-10 How to use frames (part 1 of 2)
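
The example in this figure starts its frame at the first row of the partition. To illustrate the expr PRECEDING value from the table instead, the following sketch sums just the current row and the row before it. It assumes MySQL 8.0 and a hypothetical temporary table named rows_frame_demo that holds four of the invoices for vendor 123. The alias two_row_total is made up for illustration.

```sql
-- Hypothetical table holding four invoices with unique dates.
CREATE TEMPORARY TABLE rows_frame_demo (
    invoice_date  DATE,
    invoice_total DECIMAL(9,2)
);

INSERT INTO rows_frame_demo (invoice_date, invoice_total) VALUES
    ('2018-04-10', 40.20),
    ('2018-04-13', 138.75),
    ('2018-04-21', 172.50),
    ('2018-04-24', 42.67);

-- The frame is the previous row (if there is one) plus the current row.
SELECT invoice_date, invoice_total,
       SUM(invoice_total) OVER(ORDER BY invoice_date
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS two_row_total
FROM rows_frame_demo
ORDER BY invoice_date;

-- invoice_date   invoice_total   two_row_total
-- 2018-04-10     40.20           40.20
-- 2018-04-13     138.75          178.95    (40.20 + 138.75)
-- 2018-04-21     172.50          311.25    (138.75 + 172.50)
-- 2018-04-24     42.67           215.17    (172.50 + 42.67)
```

Because the frame moves with the current row, the total here isn't cumulative. Each value covers at most two rows.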



In that case, the ending row defaults to the current row. You can also omit the
frame definition entirely. When you include ORDER BY on the OVER clause, the
default frame is RANGE UNBOUNDED PRECEDING, which gives the same result as
long as the sort column has no duplicate values within a partition. You saw how
that worked in the previous figure.
Part 2 of figure 6-10 presents two more queries that use frames. The first
query is almost identical to the one in part 1 of this figure. The only difference is
that it uses the RANGE keyword instead of the ROWS keyword. Because of that,
the frame includes all of the rows from the start of the partition through the
current row, along with any peers of the current row. In this case, a peer is a row
that has the same value as the current row in the sort column. In this example, for
instance, the result set includes three invoices dated 2018-04-16 for vendor 123.
If you look at the vendor_total column for these rows, you'll see that they all
contain the same value. That's because the value of the invoice_total column for
all three of these rows is included in the accumulation for each of the rows.
The second example in this figure illustrates a common use for frames.
Here, a moving average is calculated for the invoice totals. A moving average is
an average that's calculated on the current row plus a specified number of rows
before and after the current row. It's particularly useful when working with data
over a period of time to eliminate short-term fluctuations so long-term trends
become more obvious.
In this example, a three-month average is calculated for the sum of invoice
totals. To do that, the RANGE keyword is coded with a BETWEEN clause
that indicates that the invoice total for the current month, one month before the
current month, and one month after the current month should be used to calculate
the average. The three-month average for month 5, for example, is calculated
by adding the values in the invoice_total column for months 4, 5, and 6 and
dividing by 3.
Note that when you calculate a moving average, there isn't a row before
the first row to include in the average. Because of that, the average for that row
includes just the invoice totals for the current row and the next row. Similarly, the
average for the last row includes just the invoice totals for the current row and
the previous row.
This query also uses the MONTH function in the SELECT clause, the
ORDER BY clause for the OVER clause, and the GROUP BY clause. This
function extracts the numeric month from a date. You'll learn about this function
as well as other functions for working with dates in chapter 9.
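
If you want to check the moving-average arithmetic described above by hand, you can recompute a few of the averages from the monthly totals in the result set in part 2 of figure 6-10. This sketch assumes only that the monthly totals are 5828.18, 58597.10, 54417.73, 95095.75, and 351.75 for months 4 through 8. The column aliases are made up for illustration.

```sql
-- Month 4 has no preceding month, so its frame holds two rows.
-- Month 5 has a full three-row frame (months 4, 5, and 6).
-- Month 8 has no following month, so its frame holds two rows.
SELECT ROUND((5828.18 + 58597.10) / 2, 2)            AS month_4_avg,
       ROUND((5828.18 + 58597.10 + 54417.73) / 3, 2) AS month_5_avg,
       ROUND((95095.75 + 351.75) / 2, 2)             AS month_8_avg;

-- month_4_avg   month_5_avg   month_8_avg
-- 32212.64      39614.34      47723.75
```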

A SELECT statement that creates peer groups


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER() AS total_invoices,
    SUM(invoice_total) OVER(PARTITION BY vendor_id ORDER BY invoice_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS vendor_total
FROM invoices
WHERE invoice_date BETWEEN '2018-04-01' AND '2018-04-30'

vendor_id  invoice_date  invoice_total  total_invoices  vendor_total
89         2018-04-24    95.00          5828.18         95.00
95         2018-04-30    16.33          5828.18         16.33
96         2018-04-26    662.00         5828.18         662.00
121        2018-04-24    601.95         5828.18         601.95
122        2018-04-08    3813.33        5828.18         3813.33
123        2018-04-10    40.20          5828.18         40.20
123        2018-04-13    138.75         5828.18         178.95
123        2018-04-16    144.70         5828.18         381.90        (peer group)
123        2018-04-16    15.50          5828.18         381.90        (peer group)
123        2018-04-16    42.75          5828.18         381.90        (peer group)
123        2018-04-21    172.50         5828.18         554.40
123        2018-04-24    42.67          5828.18         597.07
123        2018-04-25    42.50          5828.18         639.57

A SELECT statement that calculates moving averages


SELECT MONTH(invoice_date) AS month, SUM(invoice_total) AS total_invoices,
    ROUND(AVG(SUM(invoice_total)) OVER(ORDER BY MONTH(invoice_date)
        RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING), 2) AS 3_month_avg
FROM invoices
GROUP BY MONTH(invoice_date)

month  total_invoices  3_month_avg
4      5828.18         32212.64
5      58597.10        39614.34
6      54417.73        69370.19
7      95095.75        49955.08
8      351.75          47723.75

Description
• If an ORDER BY clause is included in the OVER clause and you use the RANGE keyword,
values are accumulated up to and including the current row as well as its peer rows. A peer
is a row that has the same value in the sort column as the current row, as shown by the first
example above.
• You can use a frame to calculate a moving average. A moving average is calculated by adding
the value of the current row to the values of zero or more preceding and following rows and
dividing the sum by the number of rows in the frame.
• Because there are no preceding rows for the first row in a partition, the moving average
for that row consists of the average of the value of the current row plus the values of the
following rows. Similarly, the moving average for the last row consists of the average of
the value of the current row plus the values of the previous rows.

Figure 6-10 How to use frames (part 2 of 2)
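
To compare ROWS and RANGE directly, you can code both frames in the same SELECT statement. The following sketch assumes MySQL 8.0 and a hypothetical temporary table named peer_demo with four made-up invoices, two of which share the same date. The aliases rows_total and range_total are made up for illustration.

```sql
-- Hypothetical table where two rows have the same invoice date.
CREATE TEMPORARY TABLE peer_demo (
    invoice_date  DATE,
    invoice_total DECIMAL(9,2)
);

INSERT INTO peer_demo (invoice_date, invoice_total) VALUES
    ('2018-04-10', 40.20),
    ('2018-04-16', 144.70),
    ('2018-04-16', 15.50),
    ('2018-04-21', 172.50);

-- rows_total stops at the current row.
-- range_total also includes every peer of the current row, so both
-- rows dated 2018-04-16 get the same value (40.20 + 144.70 + 15.50).
SELECT invoice_date, invoice_total,
       SUM(invoice_total) OVER(ORDER BY invoice_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
       SUM(invoice_total) OVER(ORDER BY invoice_date
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
FROM peer_demo
ORDER BY invoice_date;
```

For the two rows dated 2018-04-16, the range_total column contains 200.40 in both rows. The rows_total values for those two rows depend on the sequence in which MySQL happens to process the tied rows, which the ORDER BY clause in this sketch doesn't determine. For the rows with unique dates, both columns contain the same value.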



How to use named windows


In some cases, you'll need to code a SELECT statement with two or more
aggregate functions that use the same window. Then, you may want to use a
named window so you don't have to repeat the definition for the window for
each function. Figure 6-11 shows how to define and use a named window.
The first example in this figure shows a SELECT statement that includes
four of the aggregate functions. Each function includes an OVER clause that
partitions the rows in the result set by the vendor_id column. To do that, the
PARTITION BY clause is repeated on each OVER clause.
An easier way to do this is to name the window by coding a WINDOW
clause as shown in the second example. Here, the window is named
vendor_window, and it's defined with the PARTITION BY clause. Then,
the four aggregate functions include just the window name on the OVER
clause. In other words, they don't have to repeat the PARTITION BY
clause. Note that when you code just a window name, you don't enclose it
in parentheses.
If you review the code in this example, you might wonder why you would
use a named window. After all, the code isn't any simpler since the window
definition consists of only a PARTITION BY clause. The answer is that, if you
wanted to change the window definition, you would only need to do it in one
place. Of course, window names provide even more of an advantage as your
window definitions get more complex.
The third example in this figure shows how you can modify a window
definition when you use it. To do that, you can add a PARTITION BY or
ORDER BY clause or a frame definition. In this example, the named window
partitions the rows in the result set by the vendor_id column just like the second
example. Then, the SELECT clause includes two columns that sum the invoice
totals for each vendor. Both columns use the named window, but they sort the
totals in a different sequence. To do that, the window name is followed by an
ORDER BY clause, and both the name and clause are enclosed in parentheses.
When you use named windows, you should know that you can't modify any
of the clauses that are included in the window definition. For example, because
the window in the third example includes a PARTITION BY clause, you can't
include that clause on an OVER clause that uses the named window. Instead,
you can only add to the window definition.

The syntax for naming a window


WINDOW window_name AS ([partition_clause] [order_clause] [frame_clause])

A SELECT statement with four functions that use the same window
SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_total,
    ROUND(AVG(invoice_total) OVER(PARTITION BY vendor_id), 2) AS vendor_avg,
    MAX(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_max,
    MIN(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_min
FROM invoices
WHERE invoice_total > 5000

A SELECT statement with a named window


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER vendor_window AS vendor_total,
    ROUND(AVG(invoice_total) OVER vendor_window, 2) AS vendor_avg,
    MAX(invoice_total) OVER vendor_window AS vendor_max,
    MIN(invoice_total) OVER vendor_window AS vendor_min
FROM invoices
WHERE invoice_total > 5000
WINDOW vendor_window AS (PARTITION BY vendor_id)

The result set for both statements


vendor_id  invoice_date  invoice_total  vendor_total  vendor_avg  vendor_max  vendor_min
72         2018-06-01    21842.00       21842.00      21842.00    21842.00    21842.00
99         2018-06-18    6940.25        6940.25       6940.25     6940.25     6940.25
104        2018-05-21    7125.34        7125.34       7125.34     7125.34     7125.34
110        2018-05-28    37966.19       119892.41     23978.48    37966.19    10976.06
110        2018-07-19    26881.40       119892.41     23978.48    37966.19    10976.06
110        2018-07-23    20551.18       119892.41     23978.48    37966.19    10976.06
110        2018-07-24    23517.58       119892.41     23978.48    37966.19    10976.06
110        2018-07-31    10976.06       119892.41     23978.48    37966.19    10976.06

A SELECT statement that adds to the specification for a named window


SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER (vendor_window ORDER BY invoice_date ASC)
        AS invoice_date_asc,
    SUM(invoice_total) OVER (vendor_window ORDER BY invoice_date DESC)
        AS invoice_date_desc
FROM invoices
WHERE invoice_total > 5000
WINDOW vendor_window AS (PARTITION BY vendor_id)

Description
• To define a named window, you code a WINDOW clause. This clause should be coded
after the HAVING clause and before the ORDER BY clause, if those clauses are included.
• To use a named window, you code it on the OVER clause. If you code just the window
name, you don't include parentheses.
• If a WINDOW clause doesn't include a PARTITION BY or ORDER BY clause or a
frame definition, you can add it to the window when you use it. To do that, you code
the window name and the additional clause in parentheses after the OVER keyword.

Figure 6-11 How to use named windows
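
The examples in this figure add only an ORDER BY clause to the named window. As a further sketch, the statement below uses the window name by itself for one column and adds both an ORDER BY clause and a frame definition for another. It assumes MySQL 8.0 and a hypothetical temporary table named named_window_demo that holds four of the rows from the result set above. The alias running_total is made up for illustration.

```sql
-- Hypothetical table holding a few of the rows shown in the figure.
CREATE TEMPORARY TABLE named_window_demo (
    vendor_id     INT,
    invoice_date  DATE,
    invoice_total DECIMAL(9,2)
);

INSERT INTO named_window_demo (vendor_id, invoice_date, invoice_total) VALUES
    (72,  '2018-06-01', 21842.00),
    (110, '2018-05-28', 37966.19),
    (110, '2018-07-19', 26881.40),
    (110, '2018-07-31', 10976.06);

-- vendor_total uses the named window as is (no parentheses).
-- running_total adds an ORDER BY clause and a frame to the same window,
-- so the window name and the additions are coded in parentheses.
SELECT vendor_id, invoice_date, invoice_total,
       SUM(invoice_total) OVER vendor_window AS vendor_total,
       SUM(invoice_total) OVER (vendor_window ORDER BY invoice_date
           ROWS UNBOUNDED PRECEDING) AS running_total
FROM named_window_demo
WINDOW vendor_window AS (PARTITION BY vendor_id)
ORDER BY vendor_id, invoice_date;
```

If you later need to partition by a different column, you only have to change the WINDOW clause, and both columns pick up the change.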



Perspective
In this chapter, you learned how to code queries that group and summarize
data. In most cases, you'll be able to use the techniques presented here to get
the summary information you need.

Terms
scalar function
aggregate function
column function
summary query
functionally dependent column
aggregate window function
partition
window
frame
peer
moving average
named window

Exercises
1. Write a SELECT statement that returns one row for each vendor in the
Invoices table that contains these columns:
The vendor_id column from the Invoices table
The sum of the invoice_total columns in the Invoices table for that vendor
This should return 34 rows.
2. Write a SELECT statement that returns one row for each vendor that contains
these columns:
The vendor_name column from the Vendors table
The sum of the payment_total columns in the Invoices table for that vendor
Sort the result set in descending sequence by the payment total sum for each
vendor.
3. Write a SELECT statement that returns one row for each vendor that contains
three columns:
The vendor_name column from the Vendors table
The count of the invoices in the Invoices table for each vendor
The sum of the invoice_total columns in the Invoices table for each vendor
Sort the result set so the vendor with the most invoices appears first.
4. Write a SELECT statement that returns one row for each general ledger
account number that contains three columns:
The account_description column from the General_Ledger_Accounts table
The count of the items in the Invoice_Line_Items table that have the same
account_number
The sum of the line_item_amount columns in the Invoice_Line_Items table
that have the same account_number
Return only those rows where the count of line items is greater than 1. This
should return 10 rows.
Group the result set by the account_description column.
Sort the result set in descending sequence by the sum of the line item
amounts.
5. Modify the solution to exercise 4 so it returns only invoices dated in the
second quarter of 2018 (April 1, 2018 to June 30, 2018). This should still
return 10 rows but with some different line item counts for each vendor. Hint:
Join to the Invoices table to code a search condition based on invoice_date.
6. Write a SELECT statement that answers this question: What is the total
amount invoiced for each general ledger account number? Return these
columns:
The account_number column from the Invoice_Line_Items table
The sum of the line_item_amount columns from the Invoice_Line_Items
table
Use the WITH ROLLUP operator to include a row that gives the grand total.
This should return 22 rows.
7. Write a SELECT statement that answers this question: Which vendors are
being paid from more than one account? Return these columns:
The vendor_name column from the Vendors table
The count of distinct general ledger accounts that apply to that vendor's
invoices
This should return 2 rows.
8. Write a SELECT statement that answers this question: What are the last
payment date and total amount due for each vendor with each terms_id?
Return these columns:
The terms_id column from the Invoices table
The vendor_id column from the Invoices table
The last payment date for each combination of terms_id and vendor_id in the
Invoices table
The sum of the balance due (invoice_total - payment_total - credit_total)
for each combination of terms_id and vendor_id in the Invoices table
Use the WITH ROLLUP operator to include rows that give a summary for
each terms_id as well as a row that gives the grand total. This should return 40
rows.
Use the IF and GROUPING functions to replace the null values in the terms_id
and vendor_id columns with literal values if they're for summary rows.
9. Write a SELECT statement that uses aggregate window functions to calculate
the total due for all vendors and the total due for each vendor. Return these
columns:
The vendor_id from the Invoices table
The balance due (invoice_total - payment_total - credit_total) for each
invoice in the Invoices table with a balance due greater than 0
The total balance due for all vendors in the Invoices table
The total balance due for each vendor in the Invoices table
Modify the column that contains the balance due for each vendor so it
contains a cumulative total by balance due. This should return 11 rows.
10. Modify the solution to exercise 9 so it includes a column that calculates the
average balance due for each vendor in the Invoices table. This column should
contain a cumulative average by balance due.
Modify the SELECT statement so it uses a named window for the last two
aggregate window functions.
11. Write a SELECT statement that uses an aggregate window function to calcu-
late a moving average of the sum of invoice totals. Return these columns:
The month of the invoice date from the Invoices table
The sum of the invoice totals from the Invoices table
The moving average of the invoice totals sorted by invoice month
The result set should be grouped by invoice month and the frame for the
moving average should include the current row plus three rows before the
current row.
