
Introduction

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
Motivation
USA total and running total of Summer Olympics gold medals since 2004

| Year | Medals | Medals_RT |
|------|--------|-----------|
| 2004 | 116    | 116       |
| 2008 | 125    | 241       |
| 2012 | 147    | 388       |

Discus throw reigning champion status

| Year | Champion | Last_Champion | Reigning_Champion |
|------|----------|---------------|-------------------|
| 1996 | GER      | null          | false             |
| 2000 | LTU      | GER           | false             |
| 2004 | LTU      | LTU           | true              |
| 2008 | EST      | LTU           | false             |
| 2012 | GER      | EST           | false             |

Course outline
1. Introduction to window functions
2. Fetching, ranking, and paging
3. Aggregate window functions and frames
4. Beyond window functions

Summer Olympics dataset
Each row represents a medal awarded in the Summer Olympic Games

Columns
Year, City
Sport, Discipline, Event
Athlete, Country, Gender
Medal

Window functions
Perform an operation across a set of rows that are somehow related to the current row
Similar to GROUP BY aggregate functions, but all rows remain in the output

Uses

Fetching values from preceding or following rows (e.g. fetching the previous row's value)
  Determining reigning champion status
  Calculating growth over time
Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values' positions in a sorted list
Running totals, moving averages
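
The contrast with GROUP BY can be sketched as follows. This is a minimal example, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for PostgreSQL. The three-row table is a hypothetical reduction of Summer_Medals built from rows shown in this deck, and the Country_Medals alias is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?, ?)",
    [
        (1896, "100M Freestyle", "HUN"),
        (1896, "100M Freestyle For Sailors", "GRE"),
        (1896, "1200M Freestyle", "HUN"),
    ],
)

# GROUP BY collapses the table to one row per country.
grouped = conn.execute(
    "SELECT Country, COUNT(*) FROM Summer_Medals GROUP BY Country ORDER BY Country"
).fetchall()

# The window version keeps every row and attaches the per-country count to it.
windowed = conn.execute(
    """
    SELECT Event, Country,
           COUNT(*) OVER (PARTITION BY Country) AS Country_Medals
    FROM Summer_Medals
    ORDER BY Event
    """
).fetchall()

print(grouped)   # two rows, one per country
print(windowed)  # three rows, one per medal
```

The grouped query returns two rows, while the window query returns all three input rows, each carrying its country's count.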

Row numbers
Query

SELECT
  Year, Event, Country
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event                      | Country |
|------|----------------------------|---------|
| 1896 | 100M Freestyle             | HUN     |
| 1896 | 100M Freestyle For Sailors | GRE     |
| 1896 | 1200M Freestyle            | HUN     |
| ...  | ...                        | ...     |

Enter ROW_NUMBER
Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event                      | Country | Row_N |
|------|----------------------------|---------|-------|
| 1896 | 100M Freestyle             | HUN     | 1     |
| 1896 | 100M Freestyle For Sailors | GRE     | 2     |
| 1896 | 1200M Freestyle            | HUN     | 3     |
| ...  | ...                        | ...     | ...   |

Anatomy of a window function
Query

SELECT
Year, Event, Country,
ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
Medal = 'Gold';

FUNCTION_NAME() OVER (...)

Subclauses that can appear inside OVER (...):
ORDER BY
PARTITION BY
ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
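
The simplest form of this anatomy, a function name followed by an empty OVER (), can be tried out as below. This is a sketch assuming Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL. The table holds three gold-medal rows shown in this deck plus one placeholder non-gold row, which is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT, Medal TEXT)"
)
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?, ?, ?)",
    [
        (1896, "100M Freestyle", "HUN", "Gold"),
        (1896, "100M Freestyle For Sailors", "GRE", "Gold"),
        (1896, "1200M Freestyle", "HUN", "Gold"),
        # Placeholder non-gold row (hypothetical) so the WHERE filter has work to do.
        (1896, "100M Freestyle", "XXX", "Silver"),
    ],
)

rows = conn.execute(
    """
    SELECT
      Year, Event, Country,
      ROW_NUMBER() OVER () AS Row_N  -- function name, then the OVER clause
    FROM Summer_Medals
    WHERE
      Medal = 'Gold'
    """
).fetchall()

for row in rows:
    print(row)
```

With an empty OVER (), the numbering follows whatever order the engine happens to return the rows in, so only the set of numbers (1 to 3 here) is guaranteed. The WHERE filter is applied before the rows are numbered.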

Let's practice!
ORDER BY
Row numbers
Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event                      | Country | Row_N |
|------|----------------------------|---------|-------|
| 1896 | 100M Freestyle             | HUN     | 1     |
| 1896 | 100M Freestyle For Sailors | GRE     | 2     |
| 1896 | 1200M Freestyle            | HUN     | 3     |
| ...  | ...                        | ...     | ...   |

Enter ORDER BY
ORDER BY in OVER orders the rows related to the current row
Example: Ordering by Year in descending order in ROW_NUMBER's OVER clause assigns the lowest row numbers, starting at 1, to the most recent year's rows
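
A small sketch of this behavior, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL and a four-row hypothetical reduction of Summer_Medals built from rows shown in this deck:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?, ?)",
    [
        (1896, "100M Freestyle", "HUN"),
        (2008, "50M Freestyle", "BRA"),
        (2012, "Wg 96 KG", "IRI"),
        (2012, "4X100M Medley", "USA"),
    ],
)

rows = conn.execute(
    """
    SELECT Year, Event, Country,
           ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
    FROM Summer_Medals
    """
).fetchall()

# Map each event to the row number it received.
row_n_by_event = {event: row_n for _, event, _, row_n in rows}
print(row_n_by_event)
```

The two 2012 rows share numbers 1 and 2, but which one gets 1 is not determined, because Year alone does not break the tie. Adding a second ordering column resolves that.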

Ordering by Year in descending order
Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event         | Country | Row_N |
|------|---------------|---------|-------|
| 2012 | Wg 96 KG      | IRI     | 1     |
| 2012 | 4X100M Medley | USA     | 2     |
| 2012 | Wg 84 KG      | RUS     | 3     |
| ...  | ...           | ...     | ...   |
| 2008 | 50M Freestyle | BRA     | 637   |
| 2008 | 96 - 120KG    | CUB     | 638   |
| ...  | ...           | ...     | ...   |

Ordering by multiple columns
Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER
    (ORDER BY Year DESC, Event ASC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';

Result

| Year | Event   | Country | Row_N |
|------|---------|---------|-------|
| 2012 | + 100KG | FRA     | 1     |
| 2012 | +67KG   | SRB     | 2     |
| 2012 | + 78KG  | CUB     | 3     |
| ...  | ...     | ...     | ...   |

Ordering in- and outside OVER
Query

SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER
    (ORDER BY Year DESC, Event ASC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold'
ORDER BY Country ASC, Row_N ASC;

Result

| Year | Event | Country | Row_N |
|------|-------|---------|-------|
| 2012 | 1500M | ALG     | 36    |
| 2000 | 1500M | ALG     | 1998  |
| 1996 | 1500M | ALG     | 2662  |
| ...  | ...   | ...     | ...   |

ORDER BY inside OVER takes effect before ORDER BY outside OVER
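
The two-stage ordering can be checked on a small scale. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL and a five-row hypothetical reduction of Summer_Medals built from rows shown in this deck, so the row numbers are much smaller than on the slide.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Summer_Medals VALUES (?, ?, ?)",
    [
        (1996, "1500M", "ALG"),
        (2000, "1500M", "ALG"),
        (2012, "1500M", "ALG"),
        (2012, "+ 100KG", "FRA"),
        (2012, "+67KG", "SRB"),
    ],
)

rows = conn.execute(
    """
    SELECT Year, Event, Country,
           ROW_NUMBER() OVER (ORDER BY Year DESC, Event ASC) AS Row_N
    FROM Summer_Medals
    ORDER BY Country ASC, Row_N ASC
    """
).fetchall()

for row in rows:
    print(row)
```

The row numbers are assigned first, by Year and Event, and only then are the numbered rows sorted by Country. That is why the ALG rows come first in the output even though none of them holds row number 1.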

Reigning champion
A reigning champion is a champion who's won both the previous and current years' competitions
The previous and current years' champions need to be in the same row (in two different columns)

Enter LAG

LAG(column, n) OVER (...) returns column's value at the row n rows before the current row, or null if there is no such row
LAG(column, 1) OVER (...) returns the previous row's value
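
Putting the two pieces together, reigning champion status can be derived by comparing each row's champion with the LAG value. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL. Discus_Gold is created here as a plain table holding the five champions shown in this deck, and the CASE expression is one possible way to produce the Reigning_Champion column, not necessarily the course's own.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Discus_Gold (Year INTEGER, Champion TEXT)")
conn.executemany(
    "INSERT INTO Discus_Gold VALUES (?, ?)",
    [(1996, "GER"), (2000, "LTU"), (2004, "LTU"), (2008, "EST"), (2012, "GER")],
)

raw = conn.execute(
    """
    SELECT Year, Champion,
           LAG(Champion, 1) OVER (ORDER BY Year ASC) AS Last_Champion,
           CASE
             WHEN Champion = LAG(Champion, 1) OVER (ORDER BY Year ASC) THEN 1
             ELSE 0  -- also covers the first row, where LAG returns null
           END AS Reigning_Champion
    FROM Discus_Gold
    ORDER BY Year ASC
    """
).fetchall()

# SQLite has no boolean type, so convert the 0/1 flag for readability.
rows = [(year, champ, last, bool(flag)) for year, champ, last, flag in raw]
for row in rows:
    print(row)
```

Only 2004 is flagged, since LTU won in both 2000 and 2004. The first row has no previous row, so its Last_Champion is null (None in Python).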

Current champions
Query

SELECT
  Year, Country AS Champion
FROM Summer_Medals
WHERE
  Year IN (1996, 2000, 2004, 2008, 2012)
  AND Gender = 'Men' AND Medal = 'Gold'
  AND Event = 'Discus Throw';

Result

| Year | Champion |
|------|----------|
| 1996 | GER      |
| 2000 | LTU      |
| 2004 | LTU      |
| 2008 | EST      |
| 2012 | GER      |

Current and last champions
Query

WITH Discus_Gold AS (
  SELECT
    Year, Country AS Champion
  FROM Summer_Medals
  WHERE
    Year IN (1996, 2000, 2004, 2008, 2012)
    AND Gender = 'Men' AND Medal = 'Gold'
    AND Event = 'Discus Throw')

SELECT
  Year, Champion,
  LAG(Champion, 1) OVER
    (ORDER BY Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Year ASC;

Result

| Year | Champion | Last_Champion |
|------|----------|---------------|
| 1996 | GER      | null          |
| 2000 | LTU      | GER           |
| 2004 | LTU      | LTU           |
| 2008 | EST      | LTU           |
| 2012 | GER      | EST           |

Let's practice!
PARTITION BY
Motivation
Query

WITH Discus_Gold AS (
  SELECT
    Year, Event, Country AS Champion
  FROM Summer_Medals
  WHERE
    Year IN (2004, 2008, 2012)
    AND Gender = 'Men' AND Medal = 'Gold'
    AND Event IN ('Discus Throw', 'Triple Jump'))

SELECT
  Year, Event, Champion,
  LAG(Champion) OVER
    (ORDER BY Event ASC, Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Event ASC, Year ASC;

Result

| Year | Event        | Champion | Last_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | null          |
| 2008 | Discus Throw | EST      | LTU           |
| 2012 | Discus Throw | GER      | EST           |
| 2004 | Triple Jump  | SWE      | GER           |
| 2008 | Triple Jump  | POR      | SWE           |
| 2012 | Triple Jump  | USA      | POR           |

When Event changes from Discus Throw to Triple Jump, LAG fetched Discus Throw's last champion as opposed to a null

Enter PARTITION BY
PARTITION BY splits the table into partitions based on a column's unique values
  Unlike GROUP BY, the results aren't rolled up: every row stays in the output
  Each partition is operated on separately by the window function
ROW_NUMBER will reset for each partition
LAG will only fetch a row's previous value if its previous row is in the same partition
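
The effect on LAG can be seen by running the same query with and without PARTITION BY. This sketch assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL. Discus_Gold is created as a plain table holding the six champions shown in this deck, and the QUERY template with its window placeholder is a helper invented for the comparison.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Discus_Gold (Year INTEGER, Event TEXT, Champion TEXT)")
conn.executemany(
    "INSERT INTO Discus_Gold VALUES (?, ?, ?)",
    [
        (2004, "Discus Throw", "LTU"),
        (2008, "Discus Throw", "EST"),
        (2012, "Discus Throw", "GER"),
        (2004, "Triple Jump", "SWE"),
        (2008, "Triple Jump", "POR"),
        (2012, "Triple Jump", "USA"),
    ],
)

# Hypothetical template: only the contents of OVER (...) differ between runs.
QUERY = """
    SELECT Year, Event, Champion,
           LAG(Champion) OVER ({window}) AS Last_Champion
    FROM Discus_Gold
    ORDER BY Event ASC, Year ASC
"""

unpartitioned = conn.execute(
    QUERY.format(window="ORDER BY Event ASC, Year ASC")
).fetchall()
partitioned = conn.execute(
    QUERY.format(window="PARTITION BY Event ORDER BY Year ASC")
).fetchall()

print(unpartitioned[3])  # first Triple Jump row, LAG leaks across events
print(partitioned[3])    # first Triple Jump row, LAG stays within the event
```

Without partitioning, the 2004 Triple Jump row picks up GER from the last Discus Throw row. With PARTITION BY Event, that row has no predecessor in its partition and gets null (None in Python).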

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS


Partitioning by one column
Query

WITH Discus_Gold AS (...)

SELECT
  Year, Event, Champion,
  LAG(Champion) OVER
    (PARTITION BY Event
     ORDER BY Event ASC, Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Event ASC, Year ASC;

Result

| Year | Event        | Champion | Last_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | null          |
| 2008 | Discus Throw | EST      | LTU           |
| 2012 | Discus Throw | GER      | EST           |
| 2004 | Triple Jump  | SWE      | null          |
| 2008 | Triple Jump  | POR      | SWE           |
| 2012 | Triple Jump  | USA      | POR           |

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS


More complex partitioning
| Year | Country | Event                | Row_N |
|------|---------|----------------------|-------|
| 2008 | CHN     | + 78KG (Heavyweight) | 1     |
| 2008 | CHN     | -49KG                | 2     |
| ...  | ...     | ...                  | ...   |
| 2008 | JPN     | 48 - 55KG            | 27    |
| 2008 | JPN     | 48 - 55KG            | 28    |
| ...  | ...     | ...                  | ...   |
| 2012 | CHN     | +75KG                | 32    |
| 2012 | CHN     | -49KG                | 33    |
| ...  | ...     | ...                  | ...   |
| 2012 | JPN     | +75KG                | 51    |
| 2012 | JPN     | -49KG                | 52    |
| ...  | ...     | ...                  | ...   |

Row number should reset per Year and Country

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS


Partitioning by multiple columns
Query

WITH Country_Gold AS (
  SELECT
    DISTINCT Year, Country, Event
  FROM Summer_Medals
  WHERE
    Year IN (2008, 2012)
    AND Country IN ('CHN', 'JPN')
    AND Gender = 'Women' AND Medal = 'Gold')

SELECT
  Year, Country, Event,
  ROW_NUMBER() OVER (PARTITION BY Year, Country) AS Row_N
FROM Country_Gold;

Result

| Year | Country | Event                | Row_N |
|------|---------|----------------------|-------|
| 2008 | CHN     | + 78KG (Heavyweight) | 1     |
| 2008 | CHN     | -49KG                | 2     |
| ...  | ...     | ...                  | ...   |
| 2008 | JPN     | 48 - 55KG            | 1     |
| 2008 | JPN     | 48 - 55KG            | 2     |
| ...  | ...     | ...                  | ...   |
| 2012 | CHN     | +75KG                | 1     |
| 2012 | CHN     | -49KG                | 2     |
| ...  | ...     | ...                  | ...   |
| 2012 | JPN     | +75KG                | 1     |
| 2012 | JPN     | -49KG                | 2     |
| ...  | ...     | ...                  | ...   |
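
Resetting the row number per Year and Country can be reproduced on a handful of rows. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later) in place of PostgreSQL. Country_Gold is created as a plain table from rows shown in this deck, and the ORDER BY Event inside OVER is an addition made here so that the numbering within each partition is deterministic.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite stands in for PostgreSQL (assumption); needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Gold (Year INTEGER, Country TEXT, Event TEXT)")
conn.executemany(
    "INSERT INTO Country_Gold VALUES (?, ?, ?)",
    [
        (2008, "CHN", "+ 78KG (Heavyweight)"),
        (2008, "CHN", "-49KG"),
        (2008, "JPN", "48 - 55KG"),
        (2012, "CHN", "+75KG"),
        (2012, "CHN", "-49KG"),
        (2012, "JPN", "+75KG"),
        (2012, "JPN", "-49KG"),
    ],
)

rows = conn.execute(
    """
    SELECT Year, Country, Event,
           ROW_NUMBER() OVER
             (PARTITION BY Year, Country ORDER BY Event ASC) AS Row_N
    FROM Country_Gold
    ORDER BY Year ASC, Country ASC, Row_N ASC
    """
).fetchall()

# Collect the row numbers seen in each (Year, Country) partition.
numbers = defaultdict(list)
for year, country, _, row_n in rows:
    numbers[(year, country)].append(row_n)

print(dict(numbers))
```

Each (Year, Country) pair forms its own partition, so the numbering restarts at 1 four times in this data.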

Let's practice!