Chapter 2

Fetching

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
The four functions
Relative
LAG(column, n) returns column's value at the row n rows before the current row

LEAD(column, n) returns column's value at the row n rows after the current row

Absolute
FIRST_VALUE(column) returns the first value in the table or partition

LAST_VALUE(column) returns the last value in the table or partition
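
A minimal sketch of the relative functions and FIRST_VALUE, run from Python. Two assumptions: SQLite (3.25 or later, via the standard sqlite3 module) stands in for PostgreSQL, and Hosts is a five-row sample built from the host cities named in this chapter rather than the full Summer_Medals data. LAST_VALUE is left out because it needs a frame clause, which the FIRST_VALUE and LAST_VALUE slide covers.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL here.
# Hosts is a five-row sample built from the host cities named on the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hosts (Year INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Hosts VALUES (?, ?)",
    [(1896, "Athens"), (1900, "Paris"), (1904, "St Louis"),
     (1908, "London"), (1912, "Stockholm")],
)

fetch_rows = conn.execute("""
    SELECT
      Year, City,
      LAG(City, 1)      OVER (ORDER BY Year ASC) AS Prev_City,
      LEAD(City, 1)     OVER (ORDER BY Year ASC) AS Next_City,
      FIRST_VALUE(City) OVER (ORDER BY Year ASC) AS First_City
    FROM Hosts
    ORDER BY Year ASC
""").fetchall()

for row in fetch_rows:
    print(row)  # None marks a row with no neighbour at that offset
```

The first row has no previous city and the last row has no next city, so LAG and LEAD return NULL (None in Python) there.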

LEAD
Query

WITH Hosts AS (
  SELECT DISTINCT Year, City
  FROM Summer_Medals)

SELECT
  Year, City,
  LEAD(City, 1) OVER (ORDER BY Year ASC)
    AS Next_City,
  LEAD(City, 2) OVER (ORDER BY Year ASC)
    AS After_Next_City
FROM Hosts
ORDER BY Year ASC;

Result

| Year | City      | Next_City | After_Next_City |
|------|-----------|-----------|-----------------|
| 1896 | Athens    | Paris     | St Louis        |
| 1900 | Paris     | St Louis  | London          |
| 1904 | St Louis  | London    | Stockholm       |
| 1908 | London    | Stockholm | Antwerp         |
| 1912 | Stockholm | Antwerp   | Paris           |
| ...  | ...       | ...       | ...             |

FIRST_VALUE and LAST_VALUE
Query

SELECT
  Year, City,
  FIRST_VALUE(City) OVER
    (ORDER BY Year ASC) AS First_City,
  LAST_VALUE(City) OVER (
    ORDER BY Year ASC
    RANGE BETWEEN
      UNBOUNDED PRECEDING AND
      UNBOUNDED FOLLOWING
  ) AS Last_City
FROM Hosts
ORDER BY Year ASC;

Result

| Year | City      | First_City | Last_City |
|------|-----------|------------|-----------|
| 1896 | Athens    | Athens     | London    |
| 1900 | Paris     | Athens     | London    |
| 1904 | St Louis  | Athens     | London    |
| 1908 | London    | Athens     | London    |
| 1912 | Stockholm | Athens     | London    |

By default, a window with ORDER BY starts at the beginning of the table or partition and ends at the current row, so LAST_VALUE would just return the current row's value

The RANGE BETWEEN ... clause extends the window to the end of the table or partition
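
To see why the frame clause matters, here is a small sketch comparing LAST_VALUE with the default frame against the extended frame. It assumes SQLite (3.25 or later) as a stand-in for PostgreSQL and a five-row Hosts sample, so the last city is Stockholm here rather than London as in the full table.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL; Hosts is a five-row sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hosts (Year INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Hosts VALUES (?, ?)",
    [(1896, "Athens"), (1900, "Paris"), (1904, "St Louis"),
     (1908, "London"), (1912, "Stockholm")],
)

frame_rows = conn.execute("""
    SELECT
      Year, City,
      -- Default frame: the window ends at the current row
      LAST_VALUE(City) OVER (ORDER BY Year ASC) AS Default_Frame,
      -- Extended frame: the window runs to the end of the table
      LAST_VALUE(City) OVER (
        ORDER BY Year ASC
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
      ) AS Full_Frame
    FROM Hosts
    ORDER BY Year ASC
""").fetchall()

for row in frame_rows:
    print(row)
```

With the default frame every row simply gets its own city back; only the extended frame returns the same final city on every row.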

Partitioning with LEAD
LEAD(Champion, 1) without PARTITION BY

| Year | Event        | Champion | Next_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | EST           |
| 2008 | Discus Throw | EST      | GER           |
| 2012 | Discus Throw | GER      | SWE           |
| 2004 | Triple Jump  | SWE      | POR           |
| 2008 | Triple Jump  | POR      | USA           |
| 2012 | Triple Jump  | USA      | null          |

LEAD(Champion, 1) with PARTITION BY Event

| Year | Event        | Champion | Next_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU      | EST           |
| 2008 | Discus Throw | EST      | GER           |
| 2012 | Discus Throw | GER      | null          |
| 2004 | Triple Jump  | SWE      | POR           |
| 2008 | Triple Jump  | POR      | USA           |
| 2012 | Triple Jump  | USA      | null          |
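
The slide shows only the results, so the following is a sketch of queries that could produce both tables. The Champions table name, the ORDER BY clauses, and the output column names are assumptions, and SQLite (3.25 or later) stands in for PostgreSQL; the six rows are the ones shown in the tables.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL.
# Champions is a hypothetical table holding the six rows from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Champions (Year INTEGER, Event TEXT, Champion TEXT)")
conn.executemany(
    "INSERT INTO Champions VALUES (?, ?, ?)",
    [(2004, "Discus Throw", "LTU"), (2008, "Discus Throw", "EST"),
     (2012, "Discus Throw", "GER"), (2004, "Triple Jump", "SWE"),
     (2008, "Triple Jump", "POR"), (2012, "Triple Jump", "USA")],
)

lead_rows = conn.execute("""
    SELECT
      Year, Event, Champion,
      -- One window over the whole table: values leak across events
      LEAD(Champion, 1) OVER (ORDER BY Event ASC, Year ASC) AS Next_Overall,
      -- One window per event: the last row of each event gets NULL
      LEAD(Champion, 1) OVER (PARTITION BY Event ORDER BY Year ASC)
        AS Next_In_Event
    FROM Champions
    ORDER BY Event ASC, Year ASC
""").fetchall()

for row in lead_rows:
    print(row)
```

The 2012 Discus Throw row is where the two columns differ: SWE without partitioning, NULL with it.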

Partitioning with FIRST_VALUE
FIRST_VALUE(Champion) without PARTITION BY Event

| Year | Event        | Champion | First_Champion |
|------|--------------|----------|----------------|
| 2004 | Discus Throw | LTU      | LTU            |
| 2008 | Discus Throw | EST      | LTU            |
| 2012 | Discus Throw | GER      | LTU            |
| 2004 | Triple Jump  | SWE      | LTU            |
| 2008 | Triple Jump  | POR      | LTU            |
| 2012 | Triple Jump  | USA      | LTU            |

FIRST_VALUE(Champion) with PARTITION BY Event

| Year | Event        | Champion | First_Champion |
|------|--------------|----------|----------------|
| 2004 | Discus Throw | LTU      | LTU            |
| 2008 | Discus Throw | EST      | LTU            |
| 2012 | Discus Throw | GER      | LTU            |
| 2004 | Triple Jump  | SWE      | SWE            |
| 2008 | Triple Jump  | POR      | SWE            |
| 2012 | Triple Jump  | USA      | SWE            |

Let's practice!
Ranking
The ranking functions
ROW_NUMBER() always assigns unique numbers, even if two rows' values are the same

RANK() assigns the same number to rows with identical values, skipping over the next
numbers in such cases

DENSE_RANK() also assigns the same number to rows with identical values, but doesn't skip
over the next numbers
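
A short sketch of the three functions side by side, assuming SQLite (3.25 or later) as a stand-in for PostgreSQL. The rows are the Country and Games values from this chapter's source table; the extra Country ASC tie-breaker inside ROW_NUMBER is an addition so that the numbering of tied rows is reproducible.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Games (Country TEXT, Games INTEGER)")
conn.executemany(
    "INSERT INTO Country_Games VALUES (?, ?)",
    [("GBR", 27), ("DEN", 26), ("FRA", 26), ("ITA", 25), ("AUT", 24),
     ("BEL", 24), ("NOR", 22), ("POL", 20), ("ESP", 18)],
)

rank_rows = conn.execute("""
    SELECT
      Country, Games,
      -- Country ASC is an added tie-breaker for reproducible numbering
      ROW_NUMBER() OVER (ORDER BY Games DESC, Country ASC) AS Row_N,
      RANK()       OVER (ORDER BY Games DESC) AS Rank_N,
      DENSE_RANK() OVER (ORDER BY Games DESC) AS Dense_Rank_N
    FROM Country_Games
    ORDER BY Games DESC, Country ASC
""").fetchall()

for row in rank_rows:
    print(row)
```

DEN and FRA tie on 26 games: ROW_NUMBER gives them 2 and 3, RANK gives both 2 and then jumps to 4, and DENSE_RANK gives both 2 and continues with 3.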

Source table
Query

SELECT
  Country, COUNT(DISTINCT Year) AS Games
FROM Summer_Medals
WHERE
  Country IN ('GBR', 'DEN', 'FRA',
              'ITA', 'AUT', 'BEL',
              'NOR', 'POL', 'ESP')
GROUP BY Country
ORDER BY Games DESC;

Result

| Country | Games |
|---------|-------|
| GBR     | 27    |
| DEN     | 26    |
| FRA     | 26    |
| ITA     | 25    |
| AUT     | 24    |
| BEL     | 24    |
| NOR     | 22    |
| POL     | 20    |
| ESP     | 18    |

Different ranking functions - ROW_NUMBER
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N |
|---------|-------|-------|
| GBR     | 27    | 1     |
| DEN     | 26    | 2     |
| FRA     | 26    | 3     |
| ITA     | 25    | 4     |
| AUT     | 24    | 5     |
| BEL     | 24    | 6     |
| NOR     | 22    | 7     |
| POL     | 20    | 8     |
| ESP     | 18    | 9     |

Different ranking functions - RANK
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N,
  RANK()
    OVER (ORDER BY Games DESC) AS Rank_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N | Rank_N |
|---------|-------|-------|--------|
| GBR     | 27    | 1     | 1      |
| DEN     | 26    | 2     | 2      |
| FRA     | 26    | 3     | 2      |
| ITA     | 25    | 4     | 4      |
| AUT     | 24    | 5     | 5      |
| BEL     | 24    | 6     | 5      |
| NOR     | 22    | 7     | 7      |
| POL     | 20    | 8     | 8      |
| ESP     | 18    | 9     | 9      |

Different ranking functions - DENSE_RANK
Query

WITH Country_Games AS (...)

SELECT
  Country, Games,
  ROW_NUMBER()
    OVER (ORDER BY Games DESC) AS Row_N,
  RANK()
    OVER (ORDER BY Games DESC) AS Rank_N,
  DENSE_RANK()
    OVER (ORDER BY Games DESC) AS Dense_Rank_N
FROM Country_Games
ORDER BY Games DESC, Country ASC;

Result

| Country | Games | Row_N | Rank_N | Dense_Rank_N |
|---------|-------|-------|--------|--------------|
| GBR     | 27    | 1     | 1      | 1            |
| DEN     | 26    | 2     | 2      | 2            |
| FRA     | 26    | 3     | 2      | 2            |
| ITA     | 25    | 4     | 4      | 3            |
| AUT     | 24    | 5     | 5      | 4            |
| BEL     | 24    | 6     | 5      | 4            |
| NOR     | 22    | 7     | 7      | 5            |
| POL     | 20    | 8     | 8      | 6            |
| ESP     | 18    | 9     | 9      | 7            |

ROW_NUMBER's last number is the count of rows; RANK's last rank is the same unless the final rows are tied

DENSE_RANK's last rank is the count of unique values being ranked
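
The last-rank observations can be checked with a small sketch. It assumes SQLite (3.25 or later) as a stand-in for PostgreSQL and uses the Country and Games values from the source table; last_ranks is a hypothetical helper name. The second call keeps only the first six rows, so the final two rows (AUT and BEL) are tied, which shows the case where RANK's last rank falls below the row count.

```python
import sqlite3

GAMES = [("GBR", 27), ("DEN", 26), ("FRA", 26), ("ITA", 25), ("AUT", 24),
         ("BEL", 24), ("NOR", 22), ("POL", 20), ("ESP", 18)]


def last_ranks(rows):
    """Return (row count, distinct values, last ROW_NUMBER, last RANK,
    last DENSE_RANK) for the given (Country, Games) rows."""
    # Assumption: SQLite 3.25+ stands in for PostgreSQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Country_Games (Country TEXT, Games INTEGER)")
    conn.executemany("INSERT INTO Country_Games VALUES (?, ?)", rows)
    return conn.execute("""
        WITH Ranked AS (
          SELECT
            Games,
            ROW_NUMBER() OVER (ORDER BY Games DESC) AS Row_N,
            RANK()       OVER (ORDER BY Games DESC) AS Rank_N,
            DENSE_RANK() OVER (ORDER BY Games DESC) AS Dense_Rank_N
          FROM Country_Games)
        SELECT COUNT(*), COUNT(DISTINCT Games),
               MAX(Row_N), MAX(Rank_N), MAX(Dense_Rank_N)
        FROM Ranked
    """).fetchone()


full_table = last_ranks(GAMES)      # last row (ESP) is not tied
tied_tail = last_ranks(GAMES[:6])   # last two rows (AUT, BEL) are tied
print(full_table)  # (9, 7, 9, 9, 7)
print(tied_tail)   # (6, 4, 6, 5, 4)
```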

Ranking without partitioning - Source table
Query

SELECT
  Country, Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
  Country IN ('CHN', 'RUS')
  AND Year = 2012
GROUP BY Country, Athlete
HAVING COUNT(*) > 1
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals |
|---------|-------------------|--------|
| CHN     | SUN Yang          | 4      |
| CHN     | Guo Shuang        | 3      |
| CHN     | WANG Hao          | 3      |
| ...     | ...               | ...    |
| RUS     | MUSTAFINA Aliya   | 4      |
| RUS     | ANTYUKH Natalya   | 2      |
| RUS     | ISHCHENKO Natalia | 2      |
| ...     | ...               | ...    |

Ranking without partitioning
Query

WITH Country_Medals AS (...)

SELECT
  Country, Athlete, Medals,
  DENSE_RANK()
    OVER (ORDER BY Medals DESC) AS Rank_N
FROM Country_Medals
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals | Rank_N |
|---------|-------------------|--------|--------|
| CHN     | SUN Yang          | 4      | 1      |
| CHN     | Guo Shuang        | 3      | 2      |
| CHN     | WANG Hao          | 3      | 2      |
| ...     | ...               | ...    | ...    |
| RUS     | MUSTAFINA Aliya   | 4      | 1      |
| RUS     | ANTYUKH Natalya   | 2      | 3      |
| RUS     | ISHCHENKO Natalia | 2      | 3      |
| ...     | ...               | ...    | ...    |

Ranking with partitioning
Query

WITH Country_Medals AS (...)

SELECT
  Country, Athlete, Medals,
  DENSE_RANK()
    OVER (PARTITION BY Country
          ORDER BY Medals DESC) AS Rank_N
FROM Country_Medals
ORDER BY Country ASC, Medals DESC;

Result

| Country | Athlete           | Medals | Rank_N |
|---------|-------------------|--------|--------|
| CHN     | SUN Yang          | 4      | 1      |
| CHN     | Guo Shuang        | 3      | 2      |
| CHN     | WANG Hao          | 3      | 2      |
| ...     | ...               | ...    | ...    |
| RUS     | MUSTAFINA Aliya   | 4      | 1      |
| RUS     | ANTYUKH Natalya   | 2      | 2      |
| RUS     | ISHCHENKO Natalia | 2      | 2      |
| ...     | ...               | ...    | ...    |
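
A sketch that puts both rankings in one result, assuming SQLite (3.25 or later) as a stand-in for PostgreSQL. Country_Medals holds only the six athletes listed on the slides, and the Athlete ASC tie-breaker in the final ORDER BY is an addition to make the row order reproducible.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL.
# Only the six athletes listed on the slides are loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Country_Medals (Country TEXT, Athlete TEXT, Medals INTEGER)"
)
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?, ?)",
    [("CHN", "SUN Yang", 4), ("CHN", "Guo Shuang", 3), ("CHN", "WANG Hao", 3),
     ("RUS", "MUSTAFINA Aliya", 4), ("RUS", "ANTYUKH Natalya", 2),
     ("RUS", "ISHCHENKO Natalia", 2)],
)

medal_rows = conn.execute("""
    SELECT
      Country, Athlete, Medals,
      -- Ranks all athletes together
      DENSE_RANK() OVER (ORDER BY Medals DESC) AS Rank_Overall,
      -- Restarts the ranking for each country
      DENSE_RANK() OVER (PARTITION BY Country ORDER BY Medals DESC)
        AS Rank_In_Country
    FROM Country_Medals
    ORDER BY Country ASC, Medals DESC, Athlete ASC
""").fetchall()

for row in medal_rows:
    print(row)
```

The Russian athletes with 2 medals rank 3rd overall, behind the Chinese athletes with 3, but 2nd within their own country.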

Let's practice!
Paging
What is paging?
Paging: Splitting data into (approximately) equal chunks
Uses
Many APIs return data in "pages" to reduce data being sent

Separating data into quartiles or thirds (top, middle, and bottom thirds) to judge
performance

Enter NTILE

NTILE(n) splits the data into n approximately equal pages

Paging - Source table
Query

SELECT
  DISTINCT Discipline
FROM Summer_Medals;

Result

| Discipline          |
|---------------------|
| Wrestling Freestyle |
| Archery             |
| Baseball            |
| Lacrosse            |
| Judo                |
| Athletics           |
| ...                 |

(67 rows)

Split the data into 15 approx. equally sized pages

67/15 ≈ 4.47, so each page will contain four or five rows
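
The page sizes can be worked out with a quick sketch. It assumes SQLite (3.25 or later) as a stand-in for PostgreSQL, and the 67 discipline names are hypothetical placeholders ("Discipline 01" to "Discipline 67"), since only the row count matters here.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL.
# The 67 names are placeholders; only the row count matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Disciplines (Discipline TEXT)")
conn.executemany(
    "INSERT INTO Disciplines VALUES (?)",
    [(f"Discipline {i:02d}",) for i in range(1, 68)],
)

page_sizes = [
    size for (size,) in conn.execute("""
        WITH Paged AS (
          SELECT Discipline,
                 NTILE(15) OVER (ORDER BY Discipline ASC) AS Page
          FROM Disciplines)
        SELECT COUNT(*) FROM Paged GROUP BY Page ORDER BY Page ASC
    """)
]

print(page_sizes)  # [5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4]
```

Since 67 = 15 × 4 + 7, the first seven pages take the extra rows and hold five each, and the remaining eight hold four.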

Paging
Query

WITH Disciplines AS (
  SELECT
    DISTINCT Discipline
  FROM Summer_Medals)

SELECT
  Discipline, NTILE(15) OVER () AS Page
FROM Disciplines
ORDER BY Page ASC;

Result

| Discipline          | Page |
|---------------------|------|
| Wrestling Freestyle | 1    |
| Archery             | 1    |
| Baseball            | 1    |
| Lacrosse            | 1    |
| Judo                | 1    |
| Athletics           | 2    |
| ...                 | ...  |

Top, middle, and bottom thirds
Query

WITH Country_Medals AS (
  SELECT
    Country, COUNT(*) AS Medals
  FROM Summer_Medals
  GROUP BY Country)

SELECT
  Country, Medals,
  NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Country_Medals;

Result

| Country | Medals | Third |
|---------|--------|-------|
| USA     | 4585   | 1     |
| URS     | 2049   | 1     |
| GBR     | 1720   | 1     |
| ...     | ...    | ...   |
| CZE     | 56     | 2     |
| LTU     | 55     | 2     |
| ...     | ...    | ...   |
| DOM     | 6      | 3     |
| BWI     | 5      | 3     |
| ...     | ...    | ...   |

Thirds averages
Query

WITH Country_Medals AS (...),

  Thirds AS (
  SELECT
    Country, Medals,
    NTILE(3) OVER (ORDER BY Medals DESC) AS Third
  FROM Country_Medals)

SELECT
  Third,
  ROUND(AVG(Medals), 2) AS Avg_Medals
FROM Thirds
GROUP BY Third
ORDER BY Third ASC;

Result

| Third | Avg_Medals |
|-------|------------|
| 1     | 598.74     |
| 2     | 22.98      |
| 3     | 2.08       |
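
The NTILE-then-aggregate pattern can be tried on a small sample. This sketch assumes SQLite (3.25 or later) as a stand-in for PostgreSQL and loads only the seven countries visible in the thirds table, so the averages it prints differ from the full-data figures above.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL.
# Only the seven countries visible on the thirds slide are loaded,
# so these averages are not the full-data figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Medals (Country TEXT, Medals INTEGER)")
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?)",
    [("USA", 4585), ("URS", 2049), ("GBR", 1720), ("CZE", 56),
     ("LTU", 55), ("DOM", 6), ("BWI", 5)],
)

third_avgs = conn.execute("""
    WITH Thirds AS (
      SELECT
        Country, Medals,
        NTILE(3) OVER (ORDER BY Medals DESC) AS Third
      FROM Country_Medals)
    SELECT
      Third,
      ROUND(AVG(Medals), 2) AS Avg_Medals
    FROM Thirds
    GROUP BY Third
    ORDER BY Third ASC
""").fetchall()

for third, avg_medals in third_avgs:
    print(third, avg_medals)
```

With seven rows, NTILE(3) puts three countries in the first third and two in each of the others, so the grouped averages are taken over 3, 2, and 2 rows.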

Let's practice!