POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
The four functions
Relative
LAG(column, n) returns column's value at the row n rows before the current row

LEAD(column, n) returns column's value at the row n rows after the current row

Absolute
FIRST_VALUE(column) returns the first value in the table or partition

LAST_VALUE(column) returns the last value in the table or partition


LEAD
Query

WITH Hosts AS (
  SELECT DISTINCT Year, City
  FROM Summer_Medals)

SELECT
  Year, City,
  LEAD(City, 1) OVER (ORDER BY Year ASC)
    AS Next_City,
  LEAD(City, 2) OVER (ORDER BY Year ASC)
    AS After_Next_City
FROM Hosts
ORDER BY Year ASC;

Result

| Year | City      | Next_City | After_Next_City |
|------|-----------|-----------|-----------------|
| 1896 | Athens    | Paris     | St Louis        |
| 1900 | Paris     | St Louis  | London          |
| 1904 | St Louis  | London    | Stockholm       |
| 1908 | London    | Stockholm | Antwerp         |
| 1912 | Stockholm | Antwerp   | Paris           |
| ...  | ...       | ...       | ...             |
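LAG mirrors LEAD by looking backwards instead of forwards. The sketch below is a minimal illustration, not the course's own code: it loads only the first five hosts from the LEAD result into an in-memory SQLite database (used here as a stand-in for PostgreSQL, since both accept this window syntax), and the column names Previous_City and Next_City are chosen for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; LAG/LEAD syntax is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hosts (Year INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Hosts VALUES (?, ?)",
    [(1896, "Athens"), (1900, "Paris"), (1904, "St Louis"),
     (1908, "London"), (1912, "Stockholm")],
)

# LAG looks one row back, LEAD one row ahead, both in Year order.
lag_rows = conn.execute("""
    SELECT
      Year, City,
      LAG(City, 1)  OVER (ORDER BY Year ASC) AS Previous_City,
      LEAD(City, 1) OVER (ORDER BY Year ASC) AS Next_City
    FROM Hosts
    ORDER BY Year ASC
""").fetchall()

for row in lag_rows:
    print(row)
```

Because this sample stops at 1912, the last row has no following row and its Next_City is NULL (None in Python), just as the first row has no Previous_City.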

FIRST_VALUE and LAST_VALUE
Query

SELECT
  Year, City,
  FIRST_VALUE(City) OVER
    (ORDER BY Year ASC) AS First_City,
  LAST_VALUE(City) OVER (
    ORDER BY Year ASC
    RANGE BETWEEN
      UNBOUNDED PRECEDING AND
      UNBOUNDED FOLLOWING
  ) AS Last_City
FROM Hosts
ORDER BY Year ASC;

Result

| Year | City      | First_City | Last_City |
|------|-----------|------------|-----------|
| 1896 | Athens    | Athens     | London    |
| 1900 | Paris     | Athens     | London    |
| 1904 | St Louis  | Athens     | London    |
| 1908 | London    | Athens     | London    |
| 1912 | Stockholm | Athens     | London    |

By default, when the window has an ORDER BY, it starts at the beginning of the table or partition and ends at the current row
The RANGE BETWEEN ... clause extends the window to the end of the table or partition
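The effect of the frame clause is easiest to see by running LAST_VALUE with and without it. This is a sketch under stated assumptions: the five-row Hosts sample and the column names Default_Frame and Full_Frame are illustrative, and in-memory SQLite stands in for PostgreSQL (both use the same default frame when ORDER BY is present).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hosts (Year INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Hosts VALUES (?, ?)",
    [(1896, "Athens"), (1900, "Paris"), (1904, "St Louis"),
     (1908, "London"), (1912, "Stockholm")],
)

frame_rows = conn.execute("""
    SELECT
      Year, City,
      -- Default frame: ends at the current row
      LAST_VALUE(City) OVER (ORDER BY Year ASC) AS Default_Frame,
      -- Explicit frame: extends to the end of the table
      LAST_VALUE(City) OVER (
        ORDER BY Year ASC
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      ) AS Full_Frame
    FROM Hosts
    ORDER BY Year ASC
""").fetchall()

for row in frame_rows:
    print(row)
```

With the default frame, LAST_VALUE simply echoes the current row's city; only the explicit frame returns the last city of the sample (Stockholm here, because the sample ends in 1912).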
Partitioning with LEAD
LEAD(Champion, 1) without PARTITION BY

| Year | Event        | Champion | Next_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | EST           |
| 2008 | Discus Throw | EST      | GER           |
| 2012 | Discus Throw | GER      | SWE           |
| 2004 | Triple Jump  | SWE      | POR           |
| 2008 | Triple Jump  | POR      | USA           |
| 2012 | Triple Jump  | USA      | null          |

LEAD(Champion, 1) with PARTITION BY Event

| Year | Event        | Champion | Next_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | EST           |
| 2008 | Discus Throw | EST      | GER           |
| 2012 | Discus Throw | GER      | null          |
| 2004 | Triple Jump  | SWE      | POR           |
| 2008 | Triple Jump  | POR      | USA           |
| 2012 | Triple Jump  | USA      | null          |
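The two result tables above can be reproduced with one query that computes LEAD both ways. The slides do not show the underlying query, so this is a sketch: the table name Champions, the column aliases, and the ORDER BY Event, Year ordering of the unpartitioned window are assumptions, and in-memory SQLite stands in for PostgreSQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; Champions is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Champions (Year INTEGER, Event TEXT, Champion TEXT)")
conn.executemany(
    "INSERT INTO Champions VALUES (?, ?, ?)",
    [(2004, "Discus Throw", "LTU"), (2008, "Discus Throw", "EST"),
     (2012, "Discus Throw", "GER"), (2004, "Triple Jump", "SWE"),
     (2008, "Triple Jump", "POR"), (2012, "Triple Jump", "USA")],
)

lead_rows = conn.execute("""
    SELECT
      Year, Event, Champion,
      -- One window over the whole table: the next row may belong to another event
      LEAD(Champion, 1) OVER (ORDER BY Event ASC, Year ASC) AS Next_Unpartitioned,
      -- One window per event: LEAD never crosses an event boundary
      LEAD(Champion, 1) OVER (PARTITION BY Event ORDER BY Year ASC) AS Next_Partitioned
    FROM Champions
    ORDER BY Event ASC, Year ASC
""").fetchall()

for row in lead_rows:
    print(row)
```

The only row that differs is Discus Throw 2012: without partitioning it picks up SWE from the Triple Jump rows, while with PARTITION BY Event it is NULL.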
Partitioning with FIRST_VALUE
FIRST_VALUE(Champion) without PARTITION BY Event

| Year | Event        | Champion | First_Champion |
|------|--------------|----------|----------------|
| 2004 | Discus Throw | LTU      | LTU            |
| 2008 | Discus Throw | EST      | LTU            |
| 2012 | Discus Throw | GER      | LTU            |
| 2004 | Triple Jump  | SWE      | LTU            |
| 2008 | Triple Jump  | POR      | LTU            |
| 2012 | Triple Jump  | USA      | LTU            |

FIRST_VALUE(Champion) with PARTITION BY Event

| Year | Event        | Champion | First_Champion |
|------|--------------|----------|----------------|
| 2004 | Discus Throw | LTU      | LTU            |
| 2008 | Discus Throw | EST      | LTU            |
| 2012 | Discus Throw | GER      | LTU            |
| 2004 | Triple Jump  | SWE      | SWE            |
| 2008 | Triple Jump  | POR      | SWE            |
| 2012 | Triple Jump  | USA      | SWE            |
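For absolute functions, partitioning changes which row counts as "first". A minimal sketch, assuming a hypothetical Champions table holding the six rows shown above and using in-memory SQLite as a stand-in for PostgreSQL:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; Champions is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Champions (Year INTEGER, Event TEXT, Champion TEXT)")
conn.executemany(
    "INSERT INTO Champions VALUES (?, ?, ?)",
    [(2004, "Discus Throw", "LTU"), (2008, "Discus Throw", "EST"),
     (2012, "Discus Throw", "GER"), (2004, "Triple Jump", "SWE"),
     (2008, "Triple Jump", "POR"), (2012, "Triple Jump", "USA")],
)

first_rows = conn.execute("""
    SELECT
      Year, Event, Champion,
      -- First row of the whole table
      FIRST_VALUE(Champion) OVER (ORDER BY Event ASC, Year ASC) AS First_Overall,
      -- First row of each event's partition
      FIRST_VALUE(Champion) OVER (PARTITION BY Event ORDER BY Year ASC) AS First_In_Event
    FROM Champions
    ORDER BY Event ASC, Year ASC
""").fetchall()

for row in first_rows:
    print(row)
```

First_Overall is LTU on every row, while First_In_Event resets to SWE once the Triple Jump partition begins.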
Ranking
The ranking functions
ROW_NUMBER() always assigns unique numbers, even if two rows' values are the same

RANK() assigns the same number to rows with identical values, skipping over the next numbers in such cases

DENSE_RANK() also assigns the same number to rows with identical values, but doesn't skip over the next numbers

Source table
Query

SELECT
  Country, COUNT(DISTINCT Year) AS Games
FROM Summer_Medals
WHERE
  Country IN ('GBR', 'DEN', 'FRA',
              'ITA', 'AUT', 'BEL',
              'NOR', 'POL', 'ESP')
GROUP BY Country
ORDER BY Games DESC;

Result

| Country | Games |
|---------|-------|
| GBR     | 27    |
| DEN     | 26    |
| FRA     | 26    |
| ITA     | 25    |
| AUT     | 24    |
| BEL     | 24    |
| NOR     | 22    |
| POL     | 20    |
| ESP     | 18    |

Different ranking functions - ROW_NUMBER
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N |
|---------|-------|-------|
| GBR     | 27    | 1     |
| DEN     | 26    | 2     |
| FRA     | 26    | 3     |
| ITA     | 25    | 4     |
| AUT     | 24    | 5     |
| BEL     | 24    | 6     |
| NOR     | 22    | 7     |
| POL     | 20    | 8     |
| ESP     | 18    | 9     |

Different ranking functions - RANK
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N,
  RANK()
    OVER (ORDER BY Games DESC) AS Rank_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N | Rank_N |
|---------|-------|-------|--------|
| GBR     | 27    | 1     | 1      |
| DEN     | 26    | 2     | 2      |
| FRA     | 26    | 3     | 2      |
| ITA     | 25    | 4     | 4      |
| AUT     | 24    | 5     | 5      |
| BEL     | 24    | 6     | 5      |
| NOR     | 22    | 7     | 7      |
| POL     | 20    | 8     | 8      |
| ESP     | 18    | 9     | 9      |

Different ranking functions - DENSE_RANK
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N,
  RANK()
    OVER (ORDER BY Games DESC) AS Rank_N,
  DENSE_RANK()
    OVER (ORDER BY Games DESC) AS Dense_Rank_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N | Rank_N | Dense_Rank_N |
|---------|-------|-------|--------|--------------|
| GBR     | 27    | 1     | 1      | 1            |
| DEN     | 26    | 2     | 2      | 2            |
| FRA     | 26    | 3     | 2      | 2            |
| ITA     | 25    | 4     | 4      | 3            |
| AUT     | 24    | 5     | 5      | 4            |
| BEL     | 24    | 6     | 5      | 4            |
| NOR     | 22    | 7     | 7      | 5            |
| POL     | 20    | 8     | 8      | 6            |
| ESP     | 18    | 9     | 9      | 7            |

ROW_NUMBER and RANK will have the same last rank, the count of rows (as long as the last rows are not tied)
DENSE_RANK's last rank is the count of unique values being ranked
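The three ranking functions can be compared side by side on the nine-country source table. This is a sketch rather than the course's code: the Country_Games rows are loaded directly instead of being computed from Summer_Medals, in-memory SQLite stands in for PostgreSQL, and Country is added as a tie-breaker to the ROW_NUMBER window (an assumption made here so the numbering of tied rows is deterministic).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Games (Country TEXT, Games INTEGER)")
conn.executemany(
    "INSERT INTO Country_Games VALUES (?, ?)",
    [("GBR", 27), ("DEN", 26), ("FRA", 26), ("ITA", 25), ("AUT", 24),
     ("BEL", 24), ("NOR", 22), ("POL", 20), ("ESP", 18)],
)

rank_rows = conn.execute("""
    SELECT
      Country, Games,
      -- Tie-breaker on Country keeps ROW_NUMBER deterministic for tied Games
      ROW_NUMBER() OVER (ORDER BY Games DESC, Country ASC) AS Row_N,
      RANK()       OVER (ORDER BY Games DESC) AS Rank_N,
      DENSE_RANK() OVER (ORDER BY Games DESC) AS Dense_Rank_N
    FROM Country_Games
    ORDER BY Games DESC, Country ASC
""").fetchall()

for row in rank_rows:
    print(row)

# The last dense rank equals the number of distinct Games values.
distinct_games = len({games for _, games, *_ in rank_rows})
print(distinct_games, rank_rows[-1][4])
```

The tied pairs (DEN/FRA and AUT/BEL) share a RANK and a DENSE_RANK; RANK then skips a number after each tie, while DENSE_RANK continues without gaps.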

Ranking without partitioning - Source table
Query

SELECT
  Country, Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
  Country IN ('CHN', 'RUS')
  AND Year = 2012
GROUP BY Country, Athlete
HAVING COUNT(*) > 1
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals |
|---------|-------------------|--------|
| CHN     | SUN Yang          | 4      |
| CHN     | Guo Shuang        | 3      |
| CHN     | WANG Hao          | 3      |
| ...     | ...               | ...    |
| RUS     | MUSTAFINA Aliya   | 4      |
| RUS     | ANTYUKH Natalya   | 2      |
| RUS     | ISHCHENKO Natalia | 2      |
| ...     | ...               | ...    |

Ranking without partitioning
Query

WITH Country_Medals AS (...)

SELECT
  Country, Athlete, Medals,
  DENSE_RANK()
    OVER (ORDER BY Medals DESC) AS Rank_N
FROM Country_Medals
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals | Rank_N |
|---------|-------------------|--------|--------|
| CHN     | SUN Yang          | 4      | 1      |
| CHN     | Guo Shuang        | 3      | 2      |
| CHN     | WANG Hao          | 3      | 2      |
| ...     | ...               | ...    | ...    |
| RUS     | MUSTAFINA Aliya   | 4      | 1      |
| RUS     | ANTYUKH Natalya   | 2      | 3      |
| RUS     | ISHCHENKO Natalia | 2      | 3      |
| ...     | ...               | ...    | ...    |

Ranking with partitioning
Query

WITH Country_Medals AS (...)

SELECT
  Country, Athlete, Medals,
  DENSE_RANK()
    OVER (PARTITION BY Country
          ORDER BY Medals DESC) AS Rank_N
FROM Country_Medals
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals | Rank_N |
|---------|-------------------|--------|--------|
| CHN     | SUN Yang          | 4      | 1      |
| CHN     | Guo Shuang        | 3      | 2      |
| CHN     | WANG Hao          | 3      | 2      |
| ...     | ...               | ...    | ...    |
| RUS     | MUSTAFINA Aliya   | 4      | 1      |
| RUS     | ANTYUKH Natalya   | 2      | 2      |
| RUS     | ISHCHENKO Natalia | 2      | 2      |
| ...     | ...               | ...    | ...    |
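Running both DENSE_RANK variants in one query shows where partitioning changes the result. This sketch uses only the six athletes visible in the tables above (the rows hidden behind "..." are not reconstructed), loads them into a Country_Medals table directly, and uses in-memory SQLite as a stand-in for PostgreSQL; the aliases Rank_Overall and Rank_In_Country are chosen for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; only the visible rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Medals (Country TEXT, Athlete TEXT, Medals INTEGER)")
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?, ?)",
    [("CHN", "SUN Yang", 4), ("CHN", "Guo Shuang", 3), ("CHN", "WANG Hao", 3),
     ("RUS", "MUSTAFINA Aliya", 4), ("RUS", "ANTYUKH Natalya", 2),
     ("RUS", "ISHCHENKO Natalia", 2)],
)

medal_rows = conn.execute("""
    SELECT
      Country, Athlete, Medals,
      -- Ranks every athlete against all others
      DENSE_RANK() OVER (ORDER BY Medals DESC) AS Rank_Overall,
      -- Restarts the ranking for each country
      DENSE_RANK() OVER (PARTITION BY Country ORDER BY Medals DESC) AS Rank_In_Country
    FROM Country_Medals
    ORDER BY Country ASC, Medals DESC, Athlete ASC
""").fetchall()

for row in medal_rows:
    print(row)
```

The Russian athletes with two medals rank third overall, because China's three-medal athletes sit between them and the leaders, but second within their own country.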

Paging
What is paging?
Paging: Splitting data into (approximately) equal chunks
Uses
Many APIs return data in "pages" to reduce data being sent
Separating data into quartiles or thirds (top, middle, and bottom thirds) to judge performance
Enter NTILE

NTILE(n) splits the data into n approximately equal pages

Paging - Source table
Query

SELECT
  DISTINCT Discipline
FROM Summer_Medals;

Result

| Discipline          |
|---------------------|
| Wrestling Freestyle |
| Archery             |
| Baseball            |
| Lacrosse            |
| Judo                |
| Athletics           |
| ...                 |

Split the data into 15 approximately equally sized pages
The result has 67 rows and 67/15 ≃ 4, so each page will contain four or five rows
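The page sizes follow from integer division: 67 = 15 × 4 + 7, so seven pages hold five rows and the remaining eight hold four, with NTILE placing the larger pages first. The sketch below works this out and checks it against NTILE on 67 placeholder rows; the Items table and its integer column n are hypothetical, and in-memory SQLite stands in for PostgreSQL.

```python
import sqlite3

rows, pages = 67, 15

# Quotient = minimum rows per page, remainder = number of pages with one extra row.
base, extra = divmod(rows, pages)
expected_sizes = [base + 1] * extra + [base] * (pages - extra)
print(base, extra, expected_sizes)

# Assumption: in-memory SQLite stands in for PostgreSQL; Items is a placeholder table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (n INTEGER)")
conn.executemany("INSERT INTO Items VALUES (?)", [(i,) for i in range(rows)])

# Count how many rows NTILE(15) assigns to each page.
ntile_sizes = [
    count for (count,) in conn.execute("""
        SELECT COUNT(*)
        FROM (SELECT NTILE(15) OVER (ORDER BY n) AS Page FROM Items) AS Paged
        GROUP BY Page
        ORDER BY Page
    """)
]
print(ntile_sizes)
```

This matches the paging result for the disciplines, where the first page holds five rows.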


Paging
Query

WITH Disciplines AS (
  SELECT
    DISTINCT Discipline
  FROM Summer_Medals)

SELECT
  Discipline, NTILE(15) OVER () AS Page
FROM Disciplines
ORDER BY Page ASC;

Result

| Discipline          | Page |
|---------------------|------|
| Wrestling Freestyle | 1    |
| Archery             | 1    |
| Baseball            | 1    |
| Lacrosse            | 1    |
| Judo                | 1    |
| Athletics           | 2    |
| ...                 | ...  |

Top, middle, and bottom thirds
Query

WITH Country_Medals AS (
  SELECT
    Country, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Country)

SELECT
  Country, Medals,
  NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Country_Medals;

Result

| Country | Medals | Third |
|---------|--------|-------|
| USA     | 4585   | 1     |
| URS     | 2049   | 1     |
| GBR     | 1720   | 1     |
| ...     | ...    | ...   |
| CZE     | 56     | 2     |
| LTU     | 55     | 2     |
| ...     | ...    | ...   |
| DOM     | 6      | 3     |
| BWI     | 5      | 3     |
| ...     | ...    | ...   |

Thirds averages
Query

WITH Country_Medals AS (...),

Thirds AS (
  SELECT
    Country, Medals,
    NTILE(3) OVER (ORDER BY Medals DESC) AS Third
  FROM Country_Medals)

SELECT
  Third,
  ROUND(AVG(Medals), 2) AS Avg_Medals
FROM Thirds
GROUP BY Third
ORDER BY Third ASC;

Result

| Third | Avg_Medals |
|-------|------------|
| 1     | 598.74     |
| 2     | 22.98      |
| 3     | 2.08       |
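The NTILE-then-aggregate pattern can be tried on a small sample. This sketch loads only the seven countries visible in the thirds table into a Country_Medals table, so its averages differ from the full-data figures in the result above; in-memory SQLite stands in for PostgreSQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; only seven sample rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Medals (Country TEXT, Medals INTEGER)")
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?)",
    [("USA", 4585), ("URS", 2049), ("GBR", 1720), ("CZE", 56),
     ("LTU", 55), ("DOM", 6), ("BWI", 5)],
)

thirds_avg = conn.execute("""
    WITH Thirds AS (
      SELECT
        Country, Medals,
        NTILE(3) OVER (ORDER BY Medals DESC) AS Third
      FROM Country_Medals)

    SELECT
      Third,
      COUNT(*) AS Countries,
      ROUND(AVG(Medals), 2) AS Avg_Medals
    FROM Thirds
    GROUP BY Third
    ORDER BY Third ASC
""").fetchall()

for row in thirds_avg:
    print(row)
```

Seven rows split into thirds of three, two, and two countries, so the extra row lands in the top third. The window function has to run in a CTE (or subquery) first because its output cannot be used directly in GROUP BY at the same query level.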
