
Aggregate Window Functions and Frames

This document discusses how to calculate moving averages and totals in PostgreSQL using window functions. It provides examples of source data on gold medals earned by the USA in the Summer Olympics from 1984 to 2012. It then shows how to use aggregate window functions such as AVG and SUM with a frame to calculate a moving average and a moving total of medals over the last three Olympic Games. This allows analyzing trends and performance while smoothing out the variability between Games.

Uploaded by dieko
Copyright © All Rights Reserved

Aggregate window functions
POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
Source table
Query

SELECT
  Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
  Country = 'BRA'
  AND Medal = 'Gold'
  AND Year >= 1992
GROUP BY Year
ORDER BY Year ASC;

Result

| Year | Medals |
|------|--------|
| 1992 | 13     |
| 1996 | 5      |
| 2004 | 18     |
| 2008 | 14     |
| 2012 | 14     |

Aggregate functions
MAX query

WITH Brazil_Medals AS (...)

SELECT MAX(Medals) AS Max_Medals
FROM Brazil_Medals;

MAX result: 18

SUM query

WITH Brazil_Medals AS (...)

SELECT SUM(Medals) AS Total_Medals
FROM Brazil_Medals;

SUM result: 64
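The sketch below lets you try the two aggregate queries without the course database. It rebuilds the elided Brazil_Medals CTE from the five rows of the source table. It assumes Python's standard sqlite3 module, with an in-memory SQLite database standing in for PostgreSQL. The SQL matches the slide except that the CTE body is a VALUES list.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Brazil_Medals is rebuilt from the source table's result via VALUES,
# because the original Summer_Medals table is not available here.
BRAZIL_CTE = """
WITH Brazil_Medals(Year, Medals) AS (
  VALUES (1992, 13), (1996, 5), (2004, 18), (2008, 14), (2012, 14)
)
"""

# Plain aggregates collapse all rows into a single value.
max_medals = conn.execute(
    BRAZIL_CTE + "SELECT MAX(Medals) AS Max_Medals FROM Brazil_Medals;"
).fetchone()[0]
total_medals = conn.execute(
    BRAZIL_CTE + "SELECT SUM(Medals) AS Total_Medals FROM Brazil_Medals;"
).fetchone()[0]

print(max_medals, total_medals)  # 18 64
```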

What if you want to see the maximum medals earned so far, or calculate a cumulative sum of the medals earned?

MAX window function
Query

WITH Brazil_Medals AS (...)

SELECT
  Year, Medals,
  MAX(Medals)
    OVER (ORDER BY Year ASC) AS Max_Medals
FROM Brazil_Medals;

Result

| Year | Medals | Max_Medals |
|------|--------|------------|
| 1992 | 13     | 13         |
| 1996 | 5      | 13         |
| 2004 | 18     | 18         |
| 2008 | 14     | 18         |
| 2012 | 14     | 18         |
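Running the slide's query end to end requires a concrete Brazil_Medals. The sketch below rebuilds it from the source table and uses Python's sqlite3 module (SQLite 3.25 or newer, for window functions) as a stand-in for PostgreSQL. The OVER clause has an ORDER BY but no explicit frame, so each row's MAX covers the rows from the first year up to the current one.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Brazil_Medals(Year, Medals) AS (
  VALUES (1992, 13), (1996, 5), (2004, 18), (2008, 14), (2012, 14)
)
SELECT
  Year, Medals,
  -- Default frame: from the first row up to the current row
  MAX(Medals) OVER (ORDER BY Year ASC) AS Max_Medals
FROM Brazil_Medals
ORDER BY Year ASC;
""").fetchall()

for row in rows:
    print(row)
```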

SUM window function
Query

WITH Brazil_Medals AS (...)

SELECT
  Year, Medals,
  SUM(Medals) OVER (ORDER BY Year ASC) AS Medals_RT
FROM Brazil_Medals;

Result

| Year | Medals | Medals_RT |
|------|--------|-----------|
| 1992 | 13     | 13        |
| 1996 | 5      | 18        |
| 2004 | 18     | 36        |
| 2008 | 14     | 50        |
| 2012 | 14     | 64        |
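The sketch below reproduces the running total using Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL, with Brazil_Medals rebuilt from the source table. It then cross-checks the result against itertools.accumulate. With the default frame, SUM OVER (ORDER BY Year) is simply a cumulative sum.

```python
import sqlite3
from itertools import accumulate

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Brazil_Medals(Year, Medals) AS (
  VALUES (1992, 13), (1996, 5), (2004, 18), (2008, 14), (2012, 14)
)
SELECT
  Year, Medals,
  SUM(Medals) OVER (ORDER BY Year ASC) AS Medals_RT
FROM Brazil_Medals
ORDER BY Year ASC;
""").fetchall()

# Cross-check: the running total equals a plain cumulative sum of Medals.
medals = [m for _, m, _ in rows]
running = list(accumulate(medals))
assert running == [rt for _, _, rt in rows]
print(running)  # [13, 18, 36, 50, 64]
```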

Partitioning with aggregate window functions
Query without partitioning

WITH Medals AS (...)

SELECT Year, Country, Medals,
  SUM(Medals) OVER (...) AS Medals_RT
FROM Medals;

Result

| Year | Country | Medals | Medals_RT |
|------|---------|--------|-----------|
| 2004 | BRA     | 18     | 18        |
| 2008 | BRA     | 14     | 32        |
| 2012 | BRA     | 14     | 46        |
| 2004 | CUB     | 31     | 77        |
| 2008 | CUB     | 2      | 79        |
| 2012 | CUB     | 5      | 84        |

Query with PARTITION BY

WITH Medals AS (...)

SELECT Year, Country, Medals,
  SUM(Medals) OVER (PARTITION BY Country ...) AS Medals_RT
FROM Medals;

Result

| Year | Country | Medals | Medals_RT |
|------|---------|--------|-----------|
| 2004 | BRA     | 18     | 18        |
| 2008 | BRA     | 14     | 32        |
| 2012 | BRA     | 14     | 46        |
| 2004 | CUB     | 31     | 31        |
| 2008 | CUB     | 2      | 33        |
| 2012 | CUB     | 5      | 38        |

Partitioning by Country restarts the running total for each country.
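Below is a runnable sketch of both partitioning queries. It uses Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL and rebuilds the Medals CTE from the six rows shown in the results. The slide elides the OVER clauses, so this sketch makes two assumptions. The unpartitioned running total uses ORDER BY Country ASC, Year ASC, which is the order its result table implies. The partitioned version uses ORDER BY Year ASC inside each country.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

MEDALS_CTE = """
WITH Medals(Year, Country, Medals) AS (
  VALUES (2004, 'BRA', 18), (2008, 'BRA', 14), (2012, 'BRA', 14),
         (2004, 'CUB', 31), (2008, 'CUB', 2), (2012, 'CUB', 5)
)
"""

# Assumed OVER clauses (the slide shows them as "..."):
unpartitioned = conn.execute(MEDALS_CTE + """
SELECT Year, Country, Medals,
       SUM(Medals) OVER (ORDER BY Country ASC, Year ASC) AS Medals_RT
FROM Medals
ORDER BY Country ASC, Year ASC;
""").fetchall()

partitioned = conn.execute(MEDALS_CTE + """
SELECT Year, Country, Medals,
       SUM(Medals) OVER (PARTITION BY Country ORDER BY Year ASC) AS Medals_RT
FROM Medals
ORDER BY Country ASC, Year ASC;
""").fetchall()

# PARTITION BY restarts the running total for each country.
print([r[3] for r in unpartitioned])  # [18, 32, 46, 77, 79, 84]
print([r[3] for r in partitioned])    # [18, 32, 46, 31, 33, 38]
```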

Frames
Motivation
LAST_VALUE

LAST_VALUE(City) OVER (
ORDER BY Year ASC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS Last_City

Frame: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

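Without the frame, LAST_VALUE would return the current row's own value in the City column

By default (when OVER includes an ORDER BY), a frame starts at the beginning of the table or partition and ends at the current row, including any rows that tie with it in the ORDER BY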
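The effect of the frame on LAST_VALUE can be checked with a small hypothetical Hosts table. It holds the host city of each Summer Games from 1996 to 2012. The table name and shape are assumptions, since the slide does not show its source table. The sketch uses Python's sqlite3 module (SQLite 3.25+) in place of PostgreSQL. It computes LAST_VALUE twice: once with the default frame and once with the slide's explicit frame.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Hosts(Year, City) AS (  -- hypothetical table of Summer Games hosts
  VALUES (1996, 'Atlanta'), (2000, 'Sydney'), (2004, 'Athens'),
         (2008, 'Beijing'), (2012, 'London')
)
SELECT
  Year, City,
  -- Default frame ends at the current row, so this just echoes City
  LAST_VALUE(City) OVER (ORDER BY Year ASC) AS Last_City_Default,
  -- Explicit frame spans the whole table, so this is the final city
  LAST_VALUE(City) OVER (
    ORDER BY Year ASC
    RANGE BETWEEN
      UNBOUNDED PRECEDING AND
      UNBOUNDED FOLLOWING
  ) AS Last_City
FROM Hosts
ORDER BY Year ASC;
""").fetchall()

for row in rows:
    print(row)
```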

ROWS BETWEEN
ROWS BETWEEN [START] AND [FINISH]
n PRECEDING : n rows before the current row

CURRENT ROW : the current row

n FOLLOWING : n rows after the current row

Examples

ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
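The three example frames can be compared on Brazil's gold medal counts from the first section. The sketch below assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL. The column aliases Last_4, Neighbors and Prior_5 are hypothetical names chosen for illustration. Frames are clipped at the table edges. A frame that contains no rows, such as 5 PRECEDING AND 1 PRECEDING on the first row, makes SUM return NULL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Brazil_Medals(Year, Medals) AS (
  VALUES (1992, 13), (1996, 5), (2004, 18), (2008, 14), (2012, 14)
)
SELECT
  Year, Medals,
  SUM(Medals) OVER (ORDER BY Year
    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS Last_4,
  SUM(Medals) OVER (ORDER BY Year
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS Neighbors,
  SUM(Medals) OVER (ORDER BY Year
    ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS Prior_5
FROM Brazil_Medals
ORDER BY Year ASC;
""").fetchall()

# An empty frame (the first row of Prior_5) yields NULL, i.e. None in Python.
for row in rows:
    print(row)
```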

Source table
Query

SELECT
  Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
  Country = 'RUS'
  AND Medal = 'Gold'
GROUP BY Year
ORDER BY Year ASC;

Result

| Year | Medals |
|------|--------|
| 1996 | 36     |
| 2000 | 66     |
| 2004 | 47     |
| 2008 | 43     |
| 2012 | 47     |

MAX without a frame
Query

WITH Russia_Medals AS (...)

SELECT
  Year, Medals,
  MAX(Medals)
    OVER (ORDER BY Year ASC) AS Max_Medals
FROM Russia_Medals
ORDER BY Year ASC;

Result

| Year | Medals | Max_Medals |
|------|--------|------------|
| 1996 | 36     | 36         |
| 2000 | 66     | 66         |
| 2004 | 47     | 66         |
| 2008 | 43     | 66         |
| 2012 | 47     | 66         |

MAX with a frame
Query

WITH Russia_Medals AS (...)

SELECT
  Year, Medals,
  MAX(Medals)
    OVER (ORDER BY Year ASC) AS Max_Medals,
  MAX(Medals)
    OVER (ORDER BY Year ASC
          ROWS BETWEEN
          1 PRECEDING AND CURRENT ROW)
    AS Max_Medals_Last
FROM Russia_Medals
ORDER BY Year ASC;

Result

| Year | Medals | Max_Medals | Max_Medals_Last |
|------|--------|------------|-----------------|
| 1996 | 36     | 36         | 36              |
| 2000 | 66     | 66         | 66              |
| 2004 | 47     | 66         | 66              |
| 2008 | 43     | 66         | 47              |
| 2012 | 47     | 66         | 47              |
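The sketch below computes the unframed and framed maximums side by side. It assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL and rebuilds Russia_Medals from the source table. The 2008 row shows the difference. The running maximum still remembers 2000's 66, while the two-row frame only sees 2004 and 2008.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Russia_Medals(Year, Medals) AS (
  VALUES (1996, 36), (2000, 66), (2004, 47), (2008, 43), (2012, 47)
)
SELECT
  Year, Medals,
  -- Default frame: every row up to the current one
  MAX(Medals) OVER (ORDER BY Year ASC) AS Max_Medals,
  -- Explicit frame: only the previous row and the current row
  MAX(Medals) OVER (ORDER BY Year ASC
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS Max_Medals_Last
FROM Russia_Medals
ORDER BY Year ASC;
""").fetchall()

for row in rows:
    print(row)
```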

Current and following rows
Query

WITH Russia_Medals AS (...)

SELECT
  Year, Medals,
  MAX(Medals)
    OVER (ORDER BY Year ASC
          ROWS BETWEEN
          CURRENT ROW AND 1 FOLLOWING)
    AS Max_Medals_Next
FROM Russia_Medals
ORDER BY Year ASC;

Result

| Year | Medals | Max_Medals_Next |
|------|--------|-----------------|
| 1996 | 36     | 66              |
| 2000 | 66     | 66              |
| 2004 | 47     | 47              |
| 2008 | 43     | 47              |
| 2012 | 47     | 47              |
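To make the frame boundaries concrete, the sketch below re-implements a ROWS BETWEEN frame in plain Python. The helper rows_frame is hypothetical and is not a PostgreSQL API. For each position, it takes the values from `preceding` rows before to `following` rows after, clipping at the table edges the way the database does, and then applies the aggregate. It only covers frames that contain the current row (CURRENT ROW is 0). Run on the Russia data with preceding=0 and following=1, it reproduces the Max_Medals_Next column.

```python
from typing import Callable, List, Sequence


def rows_frame(
    values: Sequence[int],
    preceding: int,
    following: int,
    agg: Callable[[Sequence[int]], int],
) -> List[int]:
    """Hypothetical helper mimicking
    ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.
    Pass 0 for CURRENT ROW; both offsets must be >= 0."""
    out: List[int] = []
    for i in range(len(values)):
        start = max(0, i - preceding)              # clip at the first row
        end = min(len(values), i + following + 1)  # clip at the last row
        out.append(agg(values[start:end]))
    return out


russia_medals = [36, 66, 47, 43, 47]  # 1996 to 2012, from the source table
max_next = rows_frame(russia_medals, preceding=0, following=1, agg=max)
print(max_next)  # [66, 66, 47, 47, 47]
```

With preceding=1 and following=0, the same helper gives the Max_Medals_Last column from the previous slide.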

Moving averages and totals
Moving averages
Overview
Moving average (MA): Average of the last n periods
Example: The 10-day MA of units sold is the average of the units sold over the last 10 days

Used to indicate momentum and trends: if a day's units sold are higher than its moving average, then more units are likely to be sold the next day

Also useful in smoothing out seasonality

Moving total: Sum of the last n periods

Example: Sum of the medals from the last 3 Olympic Games

Used to indicate performance: if the sum is going down, overall performance is going down
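Both definitions are illustrated in plain Python below. This is a sketch and not how PostgreSQL computes them internally. The functions sum or average the last `n` values and use however many periods are available at the start of the series, which mirrors a ROWS BETWEEN n-1 PRECEDING AND CURRENT ROW frame. The sample input is Brazil's gold medal counts from the first section, following the "last 3 Olympic Games" example.

```python
from typing import List, Sequence


def moving_total(values: Sequence[float], n: int) -> List[float]:
    """Sum of the last n periods (fewer at the start of the series)."""
    return [sum(values[max(0, i - n + 1): i + 1]) for i in range(len(values))]


def moving_average(values: Sequence[float], n: int) -> List[float]:
    """Average of the last n periods (fewer at the start of the series)."""
    windows = (values[max(0, i - n + 1): i + 1] for i in range(len(values)))
    return [sum(w) / len(w) for w in windows]


brazil_gold = [13, 5, 18, 14, 14]  # 1992, 1996, 2004, 2008, 2012
print(moving_total(brazil_gold, 3))  # [13, 18, 36, 37, 46]
print(moving_average(brazil_gold, 3))
```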

Source table
Query

SELECT
  Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
  Country = 'USA'
  AND Medal = 'Gold'
  AND Year >= 1980
GROUP BY Year
ORDER BY Year ASC;

Result

| Year | Medals |
|------|--------|
| 1984 | 168    |
| 1988 | 77     |
| 1992 | 89     |
| 1996 | 160    |
| 2000 | 130    |
| 2004 | 116    |
| 2008 | 125    |
| 2012 | 147    |

Moving average
Query

WITH US_Medals AS (...)

SELECT
  Year, Medals,
  AVG(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_Medals
ORDER BY Year ASC;

Result

| Year | Medals | Medals_MA |
|------|--------|-----------|
| 1984 | 168    | 168.00    |
| 1988 | 77     | 122.50    |
| 1992 | 89     | 111.33    |
| 1996 | 160    | 108.67    |
| 2000 | 130    | 126.33    |
| 2004 | 116    | 135.33    |
| 2008 | 125    | 123.67    |
| 2012 | 147    | 129.33    |
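The sketch below runs the moving-average query end to end. It rebuilds US_Medals from the source table and uses Python's sqlite3 module (SQLite 3.25+) in place of PostgreSQL. The first two rows average over only one and two Games, because the frame is clipped at the start of the table. SQLite's AVG returns a floating-point number, whereas PostgreSQL's returns numeric, so the values are formatted to two decimals for display.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH US_Medals(Year, Medals) AS (
  VALUES (1984, 168), (1988, 77), (1992, 89), (1996, 160),
         (2000, 130), (2004, 116), (2008, 125), (2012, 147)
)
SELECT
  Year, Medals,
  -- Average of this Games and the two before it
  AVG(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM US_Medals
ORDER BY Year ASC;
""").fetchall()

# Format to two decimals, as on the slide.
for year, medals, ma in rows:
    print(year, medals, f"{ma:.2f}")
```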

Moving total
Query

WITH US_Medals AS (...)

SELECT
  Year, Medals,
  SUM(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MT
FROM US_Medals
ORDER BY Year ASC;

Result

| Year | Medals | Medals_MT |
|------|--------|-----------|
| 1984 | 168    | 168       |
| 1988 | 77     | 245       |
| 1992 | 89     | 334       |
| 1996 | 160    | 326       |
| 2000 | 130    | 379       |
| 2004 | 116    | 406       |
| 2008 | 125    | 371       |
| 2012 | 147    | 388       |
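The moving total uses the same frame with SUM instead of AVG. The sketch below reproduces the Medals_MT column. It assumes Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL and rebuilds US_Medals from the source table. From 1992 onward, each value is the sum of exactly three Games, so it is three times the corresponding moving average.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH US_Medals(Year, Medals) AS (
  VALUES (1984, 168), (1988, 77), (1992, 89), (1996, 160),
         (2000, 130), (2004, 116), (2008, 125), (2012, 147)
)
SELECT
  Year, Medals,
  -- Sum of this Games and the two before it
  SUM(Medals) OVER
    (ORDER BY Year ASC
     ROWS BETWEEN
     2 PRECEDING AND CURRENT ROW) AS Medals_MT
FROM US_Medals
ORDER BY Year ASC;
""").fetchall()

for row in rows:
    print(row)
```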

ROWS vs RANGE
RANGE BETWEEN [START] AND [FINISH]
Works much the same as ROWS BETWEEN

RANGE treats duplicates in the OVER clause's ORDER BY subclause as a single entity

In the table below, both running totals are ordered by Medals. The Range_RT column treats rows with duplicate Medals values as a single entity: it sums them first, then displays that sum on each duplicate row.

| Year | Medals | Rows_RT | Range_RT |
|------|--------|---------|----------|
| 1992 | 10     | 10      | 10       |
| 1996 | 50     | 60      | 110      |
| 2000 | 50     | 110     | 110      |
| 2004 | 60     | 170     | 230      |
| 2008 | 60     | 230     | 230      |
| 2012 | 70     | 300     | 300      |

ROWS BETWEEN is almost always used over RANGE BETWEEN
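The sketch below reproduces the Rows_RT and Range_RT columns under two assumptions. First, the six rows are placed in a hypothetical Sample_Medals CTE. Second, both running totals are ordered by Medals, the column that contains the duplicates. Year is added as a tie-breaker for the ROWS version only, so its result does not depend on the order in which tied rows happen to be read. Adding Year to the RANGE version would remove the ties and make the two columns identical. The sketch uses Python's sqlite3 module (SQLite 3.25+) as a stand-in for PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; window functions need 3.25+.
assert sqlite3.sqlite_version_info >= (3, 25, 0), "window functions need SQLite 3.25+"
conn = sqlite3.connect(":memory:")

rows = conn.execute("""
WITH Sample_Medals(Year, Medals) AS (  -- hypothetical name for the slide's table
  VALUES (1992, 10), (1996, 50), (2000, 50),
         (2004, 60), (2008, 60), (2012, 70)
)
SELECT
  Year, Medals,
  -- ROWS counts physical rows; Year breaks ties deterministically
  SUM(Medals) OVER (ORDER BY Medals, Year
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Rows_RT,
  -- RANGE includes every row that ties with the current row on Medals
  SUM(Medals) OVER (ORDER BY Medals
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Range_RT
FROM Sample_Medals
ORDER BY Year ASC;
""").fetchall()

for row in rows:
    print(row)
```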
