01.murachs MySQL 2019 Chapter 06
A summary query that counts unpaid invoices and calculates the total due
SELECT COUNT(*) AS number_of_invoices,
    SUM(invoice_total - payment_total - credit_total) AS total_due
FROM invoices
WHERE invoice_total - payment_total - credit_total > 0
number_of_invoices  total_due
11                  32020.42
Description
• Aggregate functions, also called column functions, perform a calculation on the
values in a set of selected rows.
• A summary query is a SELECT statement that includes one or more aggregate
functions.
• The expression you specify for the AVG and SUM functions must result in a
numeric value. The expression for the MIN, MAX, and COUNT functions can
result in a numeric, date, or string value.
• By default, all values are included in the calculation regardless of whether they're
duplicated. If you want to omit duplicate values, code the DISTINCT keyword.
This keyword is typically used with the COUNT function.
• All of the aggregate functions except for COUNT(*) ignore null values.
• If you code an aggregate function in the SELECT clause, that clause can include
a non-aggregate column from the base table if that column is functionally dependent
on an aggregate column. See figure 6-3 for more information.
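To see how the null-handling rule above plays out, here is a minimal sketch. It assumes MySQL 8.0 or later and builds three hypothetical rows inline with a common table expression, so demo_invoices and its values are illustrative only and aren't part of the chapter's database.

```sql
-- Hypothetical sample rows; demo_invoices is not a table from the chapter.
WITH demo_invoices AS (
    SELECT 1 AS invoice_id, 100.00 AS invoice_total, DATE '2018-08-01' AS payment_date
    UNION ALL SELECT 2, 250.00, NULL
    UNION ALL SELECT 3, 250.00, DATE '2018-08-15'
)
SELECT COUNT(*)            AS row_count,       -- counts every row, so 3
       COUNT(payment_date) AS paid_count,      -- ignores the null, so 2
       MIN(payment_date)   AS first_payment,   -- MIN accepts a date expression
       SUM(invoice_total)  AS total_invoiced,  -- SUM requires a numeric expression
       AVG(invoice_total)  AS avg_invoice      -- duplicates (250.00 twice) are both included
FROM demo_invoices;
```

Because the second row has a null payment date, COUNT(*) and COUNT(payment_date) return different values for the same set of rows.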
A summary query that uses the COUNT(*), AVG, and SUM functions
SELECT 'After 1/1/2018' AS selection_date,
    COUNT(*) AS number_of_invoices,
    ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
    SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2018-01-01'
selection_date  number_of_invoices  avg_invoice_amt  total_invoice_amt
After 1/1/2018  114                 1879.74          214290.51
number_of_vendors  number_of_invoices  avg_invoice_amt  total_invoice_amt
34                 114                 1879.74          214290.51
Description
• To count all of the selected rows, you typically use the COUNT(*) function.
Alternately, you can use the COUNT function with the name of any column that
can't contain null values.
• To count only the rows with unique values in a specified column, you can code
the COUNT function with the DISTINCT keyword followed by the name of the
column.
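The difference between the three forms of COUNT described above can be sketched as follows. The rows are hypothetical and are defined inline, so this example assumes only MySQL 8.0 or later.

```sql
-- Hypothetical sample rows: three invoices from two vendors.
WITH demo_invoices AS (
    SELECT 1 AS invoice_id, 34 AS vendor_id
    UNION ALL SELECT 2, 34
    UNION ALL SELECT 3, 72
)
SELECT COUNT(*)                  AS number_of_invoices,   -- 3: every selected row
       COUNT(vendor_id)          AS non_null_vendor_ids,  -- 3: no nulls in this column
       COUNT(DISTINCT vendor_id) AS number_of_vendors     -- 2: duplicates are omitted
FROM demo_invoices;
```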
110  23978.48
72   10963.66
104  7125.34
99   6940.25
119  4901.26
122  2575.33
86   2433.00
100  2184.50
(8 rows)
Description
• The GROUP BY clause groups the rows of a result set based on one or more columns or
expressions. To include two or more columns or expressions, separate them by commas.
• If you include aggregate functions in the SELECT clause, the aggregate is calculated for
each group specified by the GROUP BY clause.
• If you include two or more columns or expressions in the GROUP BY clause, they form
a hierarchy where each column or expression is subordinate to the previous one.
• The HAVING clause specifies a search condition for a group or an aggregate. MySQL
applies this condition after it groups the rows that satisfy the search condition in the
WHERE clause.
• When a SELECT statement includes a GROUP BY clause, the SELECT clause can
include the columns used for grouping, aggregate functions, and expressions that
result in a constant value.
• The SELECT clause can also include columns that are functionally dependent on a
column used for grouping. To be functionally dependent, the grouping column must
be a primary key of the table that contains the column in the SELECT clause or it
must be unique and not allow null values.
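The following sketch pulls these points together: it groups on two columns, calculates aggregates for each group, and then filters the groups with HAVING. The demo_vendors rows are hypothetical and are defined inline, so the statement assumes only MySQL 8.0 or later.

```sql
-- Hypothetical sample rows; one row per invoice with the vendor's state and city.
WITH demo_vendors AS (
    SELECT 'CA' AS vendor_state, 'Fresno' AS vendor_city, 120.00 AS invoice_total
    UNION ALL SELECT 'CA', 'Fresno', 80.00
    UNION ALL SELECT 'CA', 'Oxnard', 50.00
    UNION ALL SELECT 'NV', 'Reno',   300.00
)
SELECT vendor_state, vendor_city,               -- grouping columns are allowed here
       COUNT(*) AS invoice_qty,
       ROUND(AVG(invoice_total), 2) AS invoice_avg
FROM demo_vendors
GROUP BY vendor_state, vendor_city              -- city is subordinate to state
HAVING COUNT(*) >= 2                            -- applied after the rows are grouped
ORDER BY vendor_state, vendor_city;
```

With these rows, only the CA/Fresno group has two or more invoices, so the result set contains a single row with an invoice_qty of 2 and an invoice_avg of 100.00.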
34  2
37  3
48  1
72  2
(34 rows)
(20 rows)
CA  Oxnard      3  188.00
CA  Pasadena    5  196.12
CA  Sacramento  7  253.00
(12 rows)
Description
• With MySQL 8.0.12 and earlier, the GROUP BY clause sorted the columns in ascending
sequence by default. Then, to change the sort sequence, you could code the DESC
keyword after the column name in the GROUP BY clause. In addition, to get your results
faster, you could code an ORDER BY NULL clause to prevent MySQL from sorting the
rows in the GROUP BY clause.
• With MySQL 8.0.13 and later, the columns in a GROUP BY clause are no longer sorted by
default, and you can't code the ASC or DESC keyword on this clause. Instead, you must
use the ORDER BY clause to specify the sort sequence.
Figure 6-4 Queries that use the GROUP BY and HAVING clauses
178 Section 2 More SQL skills as you need them
(19 rows)
(20 rows)
Description
• When you include a WHERE clause in a SELECT statement that uses grouping
and aggregates, MySQL applies the search condition before it groups the rows and
calculates the aggregates.
• When you include a HAVING clause in a SELECT statement that uses grouping
and aggregates, MySQL applies the search condition after it groups the rows and
calculates the aggregates.
• A WHERE clause can refer to any column in the base tables.
• A HAVING clause can only refer to a column included in the SELECT clause.
• A WHERE clause can't contain aggregate functions.
• A HAVING clause can contain aggregate functions.
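To make the timing difference concrete, this sketch runs the same threshold of 420 through a WHERE clause and through a HAVING clause and stacks the two results. The rows and the names demo_invoices, where_filtered, and having_filtered are hypothetical, and the statement assumes MySQL 8.0 or later.

```sql
-- Hypothetical sample rows: vendor 1 averages 400.00, vendor 2 averages 450.00.
WITH demo_invoices AS (
    SELECT 1 AS vendor_id, 600.00 AS invoice_total
    UNION ALL SELECT 1, 200.00
    UNION ALL SELECT 2, 450.00
    UNION ALL SELECT 2, 450.00
),
where_filtered AS (
    -- Rows are filtered first, then the remaining rows are grouped.
    SELECT 'WHERE' AS filter_used, vendor_id,
           COUNT(*) AS invoice_qty, AVG(invoice_total) AS invoice_avg
    FROM demo_invoices
    WHERE invoice_total > 420
    GROUP BY vendor_id
),
having_filtered AS (
    -- All rows are grouped first, then the groups are filtered.
    SELECT 'HAVING' AS filter_used, vendor_id,
           COUNT(*) AS invoice_qty, AVG(invoice_total) AS invoice_avg
    FROM demo_invoices
    GROUP BY vendor_id
    HAVING AVG(invoice_total) > 420
)
SELECT * FROM where_filtered
UNION ALL
SELECT * FROM having_filtered
ORDER BY filter_used DESC, vendor_id;
```

Vendor 1 appears in the WHERE result because one of its invoices is over 420, but it doesn't appear in the HAVING result because the average of all its invoices is only 400.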
Figure 6-5 How the HAVING clause compares to the WHERE clause
2018-05-31  2  453.75
2018-05-25  3  2201.15
2018-05-23  2  347.75
2018-05-21  2  8078.44
2018-05-13  3  1888.95
2018-05-11  2  5009.51
2018-05-03  2  866.87
(7 rows)
Description
• You can use the AND and OR operators to code compound search conditions in a
HAVING clause just as you can in a WHERE clause.
• If a search condition includes an aggregate function, it must be coded in the
HAVING clause. Otherwise, it can be coded in either the HAVING or the WHERE
clause.
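A compound condition of this kind might be split as shown below, with the non-aggregate test in the WHERE clause and the aggregate tests joined by AND in the HAVING clause. The rows are hypothetical and are defined inline, so the example assumes only MySQL 8.0 or later.

```sql
-- Hypothetical sample rows.
WITH demo_invoices AS (
    SELECT DATE '2018-05-03' AS invoice_date, 100.00 AS invoice_total
    UNION ALL SELECT DATE '2018-05-03', 150.00
    UNION ALL SELECT DATE '2018-05-09', 75.00
    UNION ALL SELECT DATE '2018-06-01', 900.00
    UNION ALL SELECT DATE '2018-06-01', 900.00
)
SELECT invoice_date,
       COUNT(*) AS invoice_qty,
       SUM(invoice_total) AS invoice_sum
FROM demo_invoices
WHERE invoice_date BETWEEN '2018-05-01' AND '2018-05-31'  -- no aggregate, so WHERE works
GROUP BY invoice_date
HAVING COUNT(*) > 1 AND SUM(invoice_total) > 100;          -- aggregates must go in HAVING
```

Only 2018-05-03 satisfies all three conditions here. The date test could also be moved into the HAVING clause, but filtering in the WHERE clause means fewer rows have to be grouped.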
(35 rows)
A summary query that includes a summary row for each grouping level
SELECT vendor_state, vendor_city, COUNT(*) AS qty_vendors
FROM vendors
WHERE vendor_state IN ('IA', 'NJ')
GROUP BY vendor_state, vendor_city WITH ROLLUP
vendor_state  vendor_city     qty_vendors
IA            Fairfield       1
IA            Washington      1
IA            NULL            2
NJ            East Brunswick  2
NJ            Fairfield       1
NJ            Washington      1
NJ            NULL            4
NULL          NULL            6
Description
• You can use the WITH ROLLUP operator in the GROUP BY clause to add
summary rows to the final result set.
• The WITH ROLLUP operator adds a summary row for each group specified in the
GROUP BY clause. It also adds a summary row to the end of the result set that
summarizes the entire result set.
• If the GROUP BY clause specifies a single group, the WITH ROLLUP operator
only adds the final summary row.
• With MySQL 8.0.12 and earlier, you couldn't use the ORDER BY clause with the
WITH ROLLUP operator. Instead, to sort individual columns, you coded the ASC
or DESC keyword after the column in the GROUP BY clause.
• With MySQL 8.0.13 and later, you can't code the ASC or DESC keyword on the
GROUP BY clause. However, you can now include an ORDER BY clause to sort
the result set when the GROUP BY clause includes WITH ROLLUP.
• With MySQL 8.0.12 and earlier, you couldn't use the DISTINCT keyword in any
of the aggregate functions when you used the WITH ROLLUP operator. With
MySQL 8.0.13 and later, you can use the DISTINCT keyword.
A summary query that uses WITH ROLLUP on a table with null values
SELECT invoice_date, payment_date,
    SUM(invoice_total) AS invoice_total,
    SUM(invoice_total - credit_total - payment_total) AS balance_due
FROM invoices
WHERE invoice_date BETWEEN '2018-07-24' AND '2018-07-31'
GROUP BY invoice_date, payment_date WITH ROLLUP
invoice_date  payment_date  invoice_total  balance_due
2018-07-24    NULL          503.20         503.20
2018-07-24    2018-08-19    3689.99        0.00
2018-07-24    2018-08-23    67.00          0.00
2018-07-24    2018-08-27    23517.58       0.00
2018-07-24    NULL          27777.77       503.20
2018-07-25    2018-08-22    1000.46        0.00
2018-07-25    NULL          1000.46        0.00
2018-07-28    NULL          90.36          90.36
2018-07-28    NULL          90.36          90.36
2018-07-30    2018-09-03    22.57          0.00
2018-07-30    NULL          22.57          0.00
2018-07-31    NULL          10976.06       10976.06
2018-07-31    NULL          10976.06       10976.06
NULL          NULL          39867.22       11569.62
Description
• The GROUPING function returns 1 if the expression is null because it's in a
summary row. Otherwise, it returns 0.
Part 2 of figure 6-8 shows another common use for the GROUPING
function. The query in this example is identical to the second one in part 1
of this figure, except that it includes a HAVING clause. This clause uses the
GROUPING function to filter the result set so only the summary rows are
included. To do that, it checks if this function returns a value of 1 for the
invoice_date or payment_date column.
Chapter 6 How to code summary queries 187
Description
• The GROUPING function is commonly used to replace the nulls that are generated
by WITH ROLLUP with literal values. To do that, you can use it with the IF
function as shown in this figure.
• The IF function evaluates the expression in the first argument and returns the
second argument if the expression is true and the third argument if the expression is
false. See chapter 9 for more information on this function.
• If you want to display just the summary rows produced by the WITH ROLLUP
operator, you can include one or more GROUPING functions in the HAVING
clause.
• In addition to the SELECT and HAVING clauses, you can code the GROUPING
function in the ORDER BY clause.
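Both uses of the GROUPING function described above can be sketched in one statement. The rows, the literal 'All', and the aliases state_label and city_label are hypothetical, and the statement assumes a MySQL 8.0 release that supports GROUPING.

```sql
-- Hypothetical sample rows: two vendors in IA and three in NJ.
WITH demo_vendors AS (
    SELECT 'IA' AS vendor_state, 'Fairfield' AS vendor_city
    UNION ALL SELECT 'IA', 'Washington'
    UNION ALL SELECT 'NJ', 'Fairfield'
    UNION ALL SELECT 'NJ', 'East Brunswick'
    UNION ALL SELECT 'NJ', 'East Brunswick'
)
SELECT IF(GROUPING(vendor_state) = 1, 'All', vendor_state) AS state_label,
       IF(GROUPING(vendor_city) = 1, 'All', vendor_city)   AS city_label,
       COUNT(*) AS qty_vendors
FROM demo_vendors
GROUP BY vendor_state, vendor_city WITH ROLLUP
-- Keep only the summary rows that WITH ROLLUP adds.
HAVING GROUPING(vendor_state) = 1 OR GROUPING(vendor_city) = 1;
```

With these rows, the result set contains three summary rows: IA with a count of 2, NJ with a count of 3, and a grand total of 5. If you remove the HAVING clause, the detail rows are returned as well.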
vendor_id  invoice_date  invoice_total  total_invoices  vendor_total
72         2018-06-01    21842.00       155800.00       21842.00
99         2018-06-18    6940.25        155800.00       6940.25
104        2018-05-21    7125.34        155800.00       7125.34
110        2018-07-31    10976.06       155800.00       10976.06
110        2018-07-23    20551.18       155800.00       31527.24
110        2018-07-24    23517.58       155800.00       55044.82
110        2018-07-19    26881.40       155800.00       81926.22
110        2018-05-28    37966.19       155800.00       119892.41
Description
• The window functions can be used with all of the aggregate functions listed in
figure 6-1, as well as others.
• Unlike aggregate functions that use GROUP BY, the groups, or partitions, in an
aggregate window function are not collapsed to a single row.
• A window consists of all of the rows that are needed to calculate the aggregate
value for the current row.
• To treat an aggregate function as a window function, you include an OVER clause
that indicates how to partition the rows in the result set.
• If you code an empty OVER clause, the entire result set is treated as a single partition.
• If you code an OVER clause with a PARTITION BY clause, the aggregate function
is performed on each partition.
• If you code an ORDER BY clause on the OVER clause, the rows within each partition
are sorted and the values from one row to the next are cumulative.
Another difference between these two result sets is the values in the
vendor_total column for vendor 110. That's because, when you code the
ORDER BY clause, the frame includes all of the rows from the start of the partition
through the current row. You'll learn more about defining frames explicitly
in the next figure. For now, just realize that a frame consists of one or more rows
within a partition relative to the current row.
For the SUM function, this means that the column contains a cumulative
total for each vendor. To see how this works, you can compare the values in
the vendor_total column with the values in the invoice_total column for vendor
110. Here, the values for the first row are the same. However, the second row
in the vendor_total column contains the value of the first row plus the value of
the second row in the invoice_total column (10976.06 + 20551.18 = 31527.24).
The third row in the vendor_total column contains the vendor_total value of the
second row plus the value of the third row in the invoice_total column. And so on.
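The three forms of the OVER clause discussed above can be compared side by side in a minimal sketch. The rows are hypothetical and are defined inline, and the alias running_total isn't from the chapter. Window functions require MySQL 8.0 or later.

```sql
-- Hypothetical sample rows: one invoice for vendor 72 and three for vendor 110.
WITH demo_invoices AS (
    SELECT 72 AS vendor_id, DATE '2018-06-01' AS invoice_date, 500.00 AS invoice_total
    UNION ALL SELECT 110, DATE '2018-05-28', 100.00
    UNION ALL SELECT 110, DATE '2018-07-19', 200.00
    UNION ALL SELECT 110, DATE '2018-07-31', 300.00
)
SELECT vendor_id, invoice_date, invoice_total,
       -- Empty OVER clause: one partition, so every row shows 1100.00.
       SUM(invoice_total) OVER() AS total_invoices,
       -- PARTITION BY: 500.00 for vendor 72 and 600.00 for each row of vendor 110.
       SUM(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_total,
       -- ORDER BY: cumulative within the partition (100.00, 300.00, 600.00 for vendor 110).
       SUM(invoice_total) OVER(PARTITION BY vendor_id ORDER BY invoice_date) AS running_total
FROM demo_invoices
ORDER BY vendor_id, invoice_date;
```

Note that all four rows are returned. Unlike a GROUP BY clause, the partitions aren't collapsed to a single row.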
Description
• A frame can be defined as the number of rows before and after the current row
(ROWS) or a range of values based on the value of the current row (RANGE).
• If you specify just the starting row for a frame, the ending row is the current row.
To specify both a starting and ending row, you use the BETWEEN clause. When
you use BETWEEN, the starting row for a frame must not come after the ending
row.
• If an ORDER BY clause is included in the OVER clause and you use the ROWS
keyword, values are accumulated up to and including the current row as shown above.
In that case, the ending row defaults to the current row. You can also omit the
frame definition entirely, since a cumulative frame that ends with the current row
(and any of its peers) is the default when you include ORDER BY on
the OVER clause. You saw how that worked in the previous figure.
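As a rough illustration of how ROWS and RANGE frames differ, the sketch below codes the same cumulative frame both ways. The rows are hypothetical, and two of them deliberately share an invoice date. The statement assumes MySQL 8.0 or later.

```sql
-- Hypothetical sample rows; the two 2018-07-19 invoices have the same sort value.
WITH demo_invoices AS (
    SELECT 110 AS vendor_id, DATE '2018-05-28' AS invoice_date, 100.00 AS invoice_total
    UNION ALL SELECT 110, DATE '2018-07-19', 200.00
    UNION ALL SELECT 110, DATE '2018-07-19', 300.00
    UNION ALL SELECT 110, DATE '2018-07-31', 400.00
)
SELECT invoice_date, invoice_total,
       -- ROWS: the frame ends at the current row itself.
       SUM(invoice_total) OVER(PARTITION BY vendor_id ORDER BY invoice_date
                               ROWS UNBOUNDED PRECEDING) AS rows_total,
       -- RANGE: the frame also includes rows with the same invoice_date.
       SUM(invoice_total) OVER(PARTITION BY vendor_id ORDER BY invoice_date
                               RANGE UNBOUNDED PRECEDING) AS range_total
FROM demo_invoices
ORDER BY invoice_date;
```

Here, range_total is 600.00 for both of the 2018-07-19 rows, while rows_total differs between them. Which of those two rows is accumulated first isn't guaranteed, because they have the same value in the sort column.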
Part 2 of figure 6-10 presents two more queries that use frames. The first
query is almost identical to the one in part 1 of this figure. The only difference is
that it uses the RANGE keyword instead of the ROWS keyword. Because of that,
the frame includes all of the rows from the start of the partition through the current
row, along with any of its peers. In this case, a peer is a row that has the same value as other
rows in the sort column. In this example, for instance, the result set includes
three invoices dated 2018-04-16 for vendor 123. If you look at the vendor_total
column for these rows, you'll see that they all contain the same value. That's
because the value of the invoice_total column for all three of these rows is
included in the accumulation for the rows.
The second example in this figure illustrates a common use for frames.
Here, a moving average is calculated for the invoice totals. A moving average is
an average that's calculated on the current row plus a specified number of rows
before and after the current row. It's particularly useful when working with data
over a period of time to eliminate short-term fluctuations so long-term trends
become more obvious.
In this example, a three-month average is calculated for the sum of invoice
totals. To do that, the RANGE keyword is coded with a BETWEEN clause
that indicates that the invoice total for the current month, one month before the
current month, and one month after the current month should be used to calculate
the average. The three-month average for month 5, for example, is calculated
by adding the values in the invoice_total column for months 4, 5, and 6 and
dividing by 3.
Note that when you calculate a moving average, there isn't a row before
the first row to include in the average. Because of that, the average for that row
includes just the invoice totals for the current row and the next row. Similarly, the
average for the last row includes just the invoice totals for the current row and
the previous row.
This query also uses the MONTH function in the SELECT clause, the
ORDER BY clause for the OVER clause, and the GROUP BY clause. This
function extracts the numeric month from a date. You'll learn about this function
as well as other functions for working with dates in chapter 9.
4  5828.18   32212.64
5  58597.10  39614.34
6  54417.73  69370.19
7  95095.75  49955.08
8  351.75    47723.75
Description
• If an ORDER BY clause is included in the OVER clause and you use the RANGE keyword,
values are accumulated up to and including the current row as well as its peer rows. A peer
is a row that's in the same sort sequence as other rows in the partition as shown by the first
example above.
• You can use a frame to calculate a moving average. A moving average is calculated by adding
the value of the current row to the values of zero or more preceding and following rows
and dividing the sum by the number of rows that are included.
• Because there are no preceding rows for the first row in a partition, the moving average
for that row consists of the average of the value of the current row plus the values of the
following rows. Similarly, the moving average for the last row consists of the average of
the value of the current row plus the values of the previous rows.
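The moving average described above can be sketched with a RANGE frame that spans one month before and one month after the current month. This example takes the five monthly totals from the result set shown above as its starting point rather than summing the Invoices table, so the names monthly_totals, month_num, month_total, and three_month_avg are hypothetical. It assumes MySQL 8.0 or later.

```sql
-- Monthly totals copied from the result set above; table and column names are hypothetical.
WITH monthly_totals AS (
    SELECT 4 AS month_num, 5828.18 AS month_total
    UNION ALL SELECT 5, 58597.10
    UNION ALL SELECT 6, 54417.73
    UNION ALL SELECT 7, 95095.75
    UNION ALL SELECT 8, 351.75
)
SELECT month_num, month_total,
       -- The frame holds the rows whose month_num is within 1 of the current row's.
       ROUND(AVG(month_total) OVER(ORDER BY month_num
                                   RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING), 2)
           AS three_month_avg
FROM monthly_totals
ORDER BY month_num;
```

For month 5, the frame contains months 4, 5, and 6, so the average is (5828.18 + 58597.10 + 54417.73) / 3, or 39614.34. For month 4, the frame contains only months 4 and 5, so the average is 32212.64.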
A SELECT statement with four functions that use the same window
SELECT vendor_id, invoice_date, invoice_total,
    SUM(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_total,
    ROUND(AVG(invoice_total) OVER(PARTITION BY vendor_id), 2) AS vendor_avg,
    MAX(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_max,
    MIN(invoice_total) OVER(PARTITION BY vendor_id) AS vendor_min
FROM invoices
WHERE invoice_total > 5000
Description
• To define a named window, you code a WINDOW clause. This clause should be coded
after the HAVING clause and before the ORDER BY clause, if those clauses are included.
• To use a named window, you code it on the OVER clause. If you code just the window
name, you don't include parentheses.
• If a WINDOW clause doesn't include a PARTITION BY or ORDER BY clause or a
frame definition, you can add it to the window when you use it. To do that, you code
the window name and the additional clause in parentheses after the OVER keyword.
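A named window might be coded along the following lines. The window name vendor_window, the alias running_total, and the sample rows are hypothetical, and the statement assumes MySQL 8.0 or later.

```sql
-- Hypothetical sample rows; the 4000.00 invoice is removed by the WHERE clause.
WITH demo_invoices AS (
    SELECT 72 AS vendor_id, DATE '2018-06-01' AS invoice_date, 6000.00 AS invoice_total
    UNION ALL SELECT 110, DATE '2018-05-28', 7000.00
    UNION ALL SELECT 110, DATE '2018-07-19', 9000.00
    UNION ALL SELECT 110, DATE '2018-07-31', 4000.00
)
SELECT vendor_id, invoice_date, invoice_total,
       -- Window name only, so no parentheses after OVER.
       SUM(invoice_total) OVER vendor_window AS vendor_total,
       ROUND(AVG(invoice_total) OVER vendor_window, 2) AS vendor_avg,
       MAX(invoice_total) OVER vendor_window AS vendor_max,
       -- Window name plus an added ORDER BY clause, so parentheses are required.
       SUM(invoice_total) OVER (vendor_window ORDER BY invoice_date) AS running_total
FROM demo_invoices
WHERE invoice_total > 5000
WINDOW vendor_window AS (PARTITION BY vendor_id)   -- coded before the ORDER BY clause
ORDER BY vendor_id, invoice_date;
```

Defining the partition once in the WINDOW clause keeps the four functions consistent, and a change to the partitioning only has to be made in one place.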
Perspective
In this chapter, you learned how to code queries that group and summarize
data. In most cases, you'll be able to use the techniques presented here to get
the summary information you need.
Terms
scalar function partition
aggregate function window
column function frame
summary query peer
functionally dependent column moving average
aggregate window function named window
Exercises
1. Write a SELECT statement that returns one row for each vendor in the
Invoices table that contains these columns:
The vendor_id column from the Invoices table
The sum of the invoice_total columns in the Invoices table for that vendor
This should return 34 rows.
2. Write a SELECT statement that returns one row for each vendor that contains
these columns:
The vendor_name column from the Vendors table
The sum of the payment_total columns in the Invoices table for that vendor
Sort the result set in descending sequence by the payment total sum for each
vendor.
3. Write a SELECT statement that returns one row for each vendor that contains
three columns:
The vendor_name column from the Vendors table
The count of the invoices in the Invoices table for each vendor
The sum of the invoice_total columns in the Invoices table for each vendor
Sort the result set so the vendor with the most invoices appears first.
4. Write a SELECT statement that returns one row for each general ledger
account number that contains three columns:
The account_description column from the General_Ledger_Accounts table
The count of the items in the Invoice_Line_Items table that have the same
account_number
The sum of the line_item_amount columns in the Invoice_Line_Items table
that have the same account_number
Return only those rows where the count of line items is greater than 1. This
should return 10 rows.
Group the result set by the account_description column.
Sort the result set in descending sequence by the sum of the line item
amounts.
5. Modify the solution to exercise 4 so it returns only invoices dated in the
second quarter of 2018 (April 1, 2018 to June 30, 2018). This should still
return 10 rows but with some different line item counts for each vendor. Hint:
Join to the Invoices table to code a search condition based on invoice_date.
6. Write a SELECT statement that answers this question: What is the total
amount invoiced for each general ledger account number? Return these
columns:
The account_number column from the Invoice_Line_Items table
The sum of the line_item_amount columns from the Invoice_Line_Items
table
Use the WITH ROLLUP operator to include a row that gives the grand total.
This should return 22 rows.
7. Write a SELECT statement that answers this question: Which vendors are
being paid from more than one account? Return these columns:
The vendor_name column from the Vendors table
The count of distinct general ledger accounts that apply to that vendor's
invoices
This should return 2 rows.
8. Write a SELECT statement that answers this question: What are the last
payment date and total amount due for each vendor with each terms_id?
Return these columns:
The terms_id column from the Invoices table
The vendor_id column from the Invoices table
The last payment date for each combination of terms_id and vendor_id in the
Invoices table
The sum of the balance due (invoice_total - payment_total - credit_total)
for each combination of terms_id and vendor_id in the Invoices table
Use the WITH ROLLUP operator to include rows that give a summary for
each terms_id as well as a row that gives the grand total. This should return 40
rows.
Use the IF and GROUPING functions to replace the null values in the terms_id
and vendor_id columns with literal values if they're for summary rows.