Summchpt 4

Uploaded by

Rahul Bhole
Pivoting

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
Transforming tables
Before: gold medals awarded to China, Russia, and the USA

| Country | Year | Awards |
|---------|------|--------|
| CHN     | 2008 | 74     |
| CHN     | 2012 | 56     |
| RUS     | 2008 | 43     |
| RUS     | 2012 | 47     |
| USA     | 2008 | 125    |
| USA     | 2012 | 147    |

After: pivoted by Year

| Country | 2008 | 2012 |
|---------|------|------|
| CHN     | 74   | 56   |
| RUS     | 43   | 47   |
| USA     | 125  | 147  |

Easier to scan, especially if pivoted by a chronologically ordered column
Enter CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  source_sql TEXT
$$) AS ct (column_1 DATA_TYPE_1,
           column_2 DATA_TYPE_2,
           ...,
           column_n DATA_TYPE_N);

Queries
Before

SELECT
  Country, Year, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Country IN ('CHN', 'RUS', 'USA')
  AND Year IN (2008, 2012)
  AND Medal = 'Gold'
GROUP BY Country, Year
ORDER BY Country ASC, Year ASC;

After

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  SELECT
    Country, Year, COUNT(*) :: INTEGER AS Awards
  FROM Summer_Medals
  WHERE
    Country IN ('CHN', 'RUS', 'USA')
    AND Year IN (2008, 2012)
    AND Medal = 'Gold'
  GROUP BY Country, Year
  ORDER BY Country ASC, Year ASC;
$$) AS ct (Country VARCHAR, "2008" INTEGER, "2012" INTEGER)

ORDER BY Country ASC;
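
The pivot above can be tried without the Summer_Medals table. The following is a minimal sketch, assuming the tablefunc extension can be installed in the current database; the inline VALUES rows are copied from the "Before" table on the earlier slide, and the lowercase column names are placeholders chosen for this example.

```sql
-- Requires the tablefunc extension (installing it may need elevated privileges).
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  -- Source rows: (row name, category, value), sorted by row name then category.
  -- The values are the gold medal counts shown in the "Before" table.
  SELECT country, year, awards
  FROM (VALUES
    ('CHN', 2008, 74),  ('CHN', 2012, 56),
    ('RUS', 2008, 43),  ('RUS', 2012, 47),
    ('USA', 2008, 125), ('USA', 2012, 147)
  ) AS t (country, year, awards)
  ORDER BY country ASC, year ASC
$$) AS ct (country TEXT, "2008" INTEGER, "2012" INTEGER)
ORDER BY country ASC;
-- CHN | 74  | 56
-- RUS | 43  | 47
-- USA | 125 | 147
```

The one-argument form of CROSSTAB fills the output columns from left to right in the order the source rows arrive, which is why the source query sorts by country and then year.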

Source query
WITH Country_Awards AS (
  SELECT
    Country, Year, COUNT(*) AS Awards
  FROM Summer_Medals
  WHERE
    Country IN ('CHN', 'RUS', 'USA')
    AND Year IN (2004, 2008, 2012)
    AND Medal = 'Gold' AND Sport = 'Gymnastics'
  GROUP BY Country, Year
  ORDER BY Country ASC, Year ASC)

SELECT
  Country, Year,
  RANK() OVER
    (PARTITION BY Year ORDER BY Awards DESC) :: INTEGER
    AS rank
FROM Country_Awards
ORDER BY Country ASC, Year ASC;

Source result
| Country | Year | Rank |
|---------|------|------|
| CHN     | 2004 | 3    |
| CHN     | 2008 | 1    |
| CHN     | 2012 | 1    |
| RUS     | 2004 | 1    |
| RUS     | 2008 | 2    |
| RUS     | 2012 | 2    |
| USA     | 2004 | 2    |
| USA     | 2008 | 3    |
| USA     | 2012 | 3    |

Pivot query
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  ...
$$) AS ct (Country VARCHAR,
           "2004" INTEGER,
           "2008" INTEGER,
           "2012" INTEGER)

ORDER BY Country ASC;

Pivot result
| Country | 2004 | 2008 | 2012 |
|--------- |------|------|------ |
| CHN | 3 | 1 | 1 |
| RUS | 1 | 2 | 2 |
| USA | 2 | 3 | 3 |
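
For comparison, the same pivot result can be reached without the tablefunc extension by using conditional aggregation. This is a different technique from CROSSTAB, shown only as a sketch; the rows in the inline CTE are copied from the source result slide, and the names country_ranks and rank_in_year are placeholders for this example.

```sql
-- Pivot with aggregate FILTER clauses instead of CROSSTAB.
WITH country_ranks (country, year, rank_in_year) AS (
  VALUES
    ('CHN', 2004, 3), ('CHN', 2008, 1), ('CHN', 2012, 1),
    ('RUS', 2004, 1), ('RUS', 2008, 2), ('RUS', 2012, 2),
    ('USA', 2004, 2), ('USA', 2008, 3), ('USA', 2012, 3)
)
SELECT
  country,
  -- Each output column keeps only the rows of one year.
  MAX(rank_in_year) FILTER (WHERE year = 2004) AS "2004",
  MAX(rank_in_year) FILTER (WHERE year = 2008) AS "2008",
  MAX(rank_in_year) FILTER (WHERE year = 2012) AS "2012"
FROM country_ranks
GROUP BY country
ORDER BY country ASC;
-- CHN | 3 | 1 | 1
-- RUS | 1 | 2 | 2
-- USA | 2 | 3 | 3
```

Here every output column has to be spelled out by hand, whereas CROSSTAB only needs the column definition list; which form reads better depends on how many columns are pivoted.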

Let's practice!
ROLLUP and CUBE
Group-level totals
Chinese and Russian medals in the 2008 Summer Olympics per medal class

| Country | Medal  | Awards |
|---------|--------|--------|
| CHN     | Bronze | 57     |
| CHN     | Gold   | 74     |
| CHN     | Silver | 53     |
| CHN     | Total  | 184    |
| RUS     | Bronze | 56     |
| RUS     | Gold   | 43     |
| RUS     | Silver | 44     |
| RUS     | Total  | 143    |

The old way
SELECT
  Country, Medal, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country, Medal

UNION ALL

SELECT
  Country, 'Total', COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country
-- A single ORDER BY at the end sorts the combined result
ORDER BY Country ASC, Medal ASC;

Enter ROLLUP
SELECT
  Country, Medal, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country, ROLLUP(Medal)
ORDER BY Country ASC, Medal ASC;

ROLLUP is a GROUP BY subclause that includes extra rows for group-level aggregations

GROUP BY Country, ROLLUP(Medal) will count all Country- and Medal-level totals, then
count only Country-level totals and fill in Medal with nulls for these rows
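
A self-contained sketch of the partial rollup described above. It assumes pre-aggregated rows copied from the group-level totals table, so it uses SUM(awards) where the slide query uses COUNT(*) on the raw table; medal_counts and its column names are placeholders for this example.

```sql
WITH medal_counts (country, medal, awards) AS (
  VALUES
    ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
    ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT country, medal, SUM(awards) AS awards
FROM medal_counts
-- Same grouping as GROUP BY GROUPING SETS ((country, medal), (country))
GROUP BY country, ROLLUP(medal)
ORDER BY country ASC, medal ASC;
-- Adds one row per country with medal = NULL:
-- CHN | NULL | 184
-- RUS | NULL | 143
```

Because country sits outside ROLLUP, it is never rolled up, so no grand-total row appears.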

ROLLUP - Query
SELECT
  Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

ROLLUP is hierarchical: the leftmost column is the top level, and subtotals are produced by dropping columns from right to left

ROLLUP(Country, Medal) includes Country-level totals

ROLLUP(Medal, Country) includes Medal-level totals

Both include grand totals
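
To see the effect of column order, here is a sketch of the reversed rollup, ROLLUP(Medal, Country). It assumes the same pre-aggregated counts as the group-level totals table, hence SUM(awards) rather than COUNT(*); medal_counts and its column names are placeholders.

```sql
WITH medal_counts (country, medal, awards) AS (
  VALUES
    ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
    ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT medal, country, SUM(awards) AS awards
FROM medal_counts
-- Grouping sets produced: (medal, country), (medal), ()
GROUP BY ROLLUP(medal, country)
ORDER BY medal ASC, country ASC;
-- Subtotal rows now carry a medal and a NULL country:
-- Bronze | NULL | 113
-- Gold   | NULL | 117
-- Silver | NULL | 97
-- NULL   | NULL | 327
```

No country-level totals appear in this variant, mirroring how ROLLUP(Country, Medal) produces no medal-level totals.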

ROLLUP - Result
| Country | Medal | Awards |
|--------- |--------|-------- |
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | null | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | null | 143 |
| null | null | 327 |

Group-level totals contain nulls; the row with all nulls is the grand total
Notice that it didn't include Medal-level totals, since it's ROLLUP(Country, Medal) and not
ROLLUP(Medal, Country)

Enter CUBE
SELECT
  Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY CUBE(Country, Medal)
ORDER BY Country ASC, Medal ASC;

CUBE is a non-hierarchical ROLLUP

It generates all possible group-level aggregations

CUBE(Country, Medal) counts Country-level, Medal-level, and grand totals
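
One way to read "all possible group-level aggregations" is to write the grouping sets out by hand. The sketch below assumes pre-aggregated counts copied from the earlier tables (so it sums awards instead of counting rows) and uses placeholder names; it should return the same rows as GROUP BY CUBE(country, medal).

```sql
WITH medal_counts (country, medal, awards) AS (
  VALUES
    ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
    ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT country, medal, SUM(awards) AS awards
FROM medal_counts
GROUP BY GROUPING SETS (
  (country, medal),  -- detail rows
  (country),         -- country-level totals
  (medal),           -- medal-level totals
  ()                 -- grand total
)
ORDER BY country ASC, medal ASC;
```

Dropping the (medal) line from the list gives the grouping sets of ROLLUP(country, medal), which is the whole difference between the two subclauses for two columns.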

CUBE - Result
| Country | Medal | Awards |
|--------- |--------|-------- |
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | null | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | null | 143 |
| null | Bronze | 113 |
| null | Gold | 117 |
| null | Silver | 97 |
| null | null | 327 |

Notice that Medal-level totals are included

ROLLUP vs CUBE
Source

| Year | Quarter | Sales |
|------|---------|-------|
| 2008 | Q1      | 12    |
| 2008 | Q2      | 15    |
| 2009 | Q1      | 21    |
| 2009 | Q2      | 27    |

ROLLUP(Year, Quarter)

| Year | Quarter | Sales |
|------|---------|-------|
| 2008 | null    | 27    |
| 2009 | null    | 48    |
| null | null    | 75    |

CUBE(Year, Quarter): above rows + the following

| Year | Quarter | Sales |
|------|---------|-------|
| null | Q1      | 33    |
| null | Q2      | 42    |

Use ROLLUP when you have hierarchical data (e.g., date parts) and don't want all possible group-level aggregations

Use CUBE when you want all possible group-level aggregations
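
The comparison can be reproduced with a small temporary table. This is a sketch: quarterly_sales is a placeholder name, and the four rows are the ones in the Source table.

```sql
CREATE TEMP TABLE quarterly_sales (year INTEGER, quarter TEXT, sales INTEGER);
INSERT INTO quarterly_sales VALUES
  (2008, 'Q1', 12), (2008, 'Q2', 15),
  (2009, 'Q1', 21), (2009, 'Q2', 27);

-- ROLLUP: 4 source rows + 2 year totals + 1 grand total = 7 rows
SELECT year, quarter, SUM(sales) AS sales
FROM quarterly_sales
GROUP BY ROLLUP(year, quarter)
ORDER BY year ASC, quarter ASC;

-- CUBE: the 7 rows above + 2 quarter totals (Q1 = 33, Q2 = 42) = 9 rows
SELECT year, quarter, SUM(sales) AS sales
FROM quarterly_sales
GROUP BY CUBE(year, quarter)
ORDER BY year ASC, quarter ASC;
```

Whether the cross-year quarter totals are meaningful is a judgment about the data; when they are not, ROLLUP avoids computing them.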
Let's practice!
A survey of useful functions
Nulls ahoy
Query

SELECT
  Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

Result

| Country | Medal  | Awards |
|---------|--------|--------|
| CHN     | Bronze | 57     |
| CHN     | Gold   | 74     |
| CHN     | Silver | 53     |
| CHN     | null   | 184    |
| RUS     | Bronze | 56     |
| RUS     | Gold   | 43     |
| RUS     | Silver | 44     |
| RUS     | null   | 143    |
| null    | null   | 327    |

nulls signify group totals

Enter COALESCE
COALESCE() takes a list of values and returns the first non-null value, going from left to right

COALESCE(null, null, 1, null, 2) → 1

Useful when using SQL operations that return nulls: ROLLUP and CUBE, pivoting, LAG and LEAD
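
A short sketch of COALESCE in isolation and applied to the null that LAG returns for the first row. The chn_gold rows are China's gold medal counts from the pivoting slides; the CTE name and the fallback value 0 are choices made for this example only.

```sql
-- First non-null value, scanning left to right
SELECT COALESCE(NULL, NULL, 1, NULL, 2) AS first_non_null;
-- 1

-- LAG has no previous row for the earliest year, so it returns NULL there
WITH chn_gold (year, awards) AS (
  VALUES (2008, 74), (2012, 56)
)
SELECT
  year,
  awards,
  LAG(awards) OVER (ORDER BY year ASC) AS previous_awards,
  COALESCE(LAG(awards) OVER (ORDER BY year ASC), 0) AS previous_or_zero
FROM chn_gold
ORDER BY year ASC;
-- 2008 | 74 | NULL | 0
-- 2012 | 56 | 74   | 74
```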

Annihilating nulls
Query

SELECT
  COALESCE(Country, 'Both countries') AS Country,
  COALESCE(Medal, 'All medals') AS Medal,
  COUNT(*) AS Awards
FROM summer_medals
WHERE
  Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

Result

| Country        | Medal      | Awards |
|----------------|------------|--------|
| Both countries | All medals | 327    |
| CHN            | All medals | 184    |
| CHN            | Bronze     | 57     |
| CHN            | Gold       | 74     |
| CHN            | Silver     | 53     |
| RUS            | All medals | 143    |
| RUS            | Bronze     | 56     |
| RUS            | Gold       | 43     |
| RUS            | Silver     | 44     |

Compressing data
Before

| Country | Rank |
|---------|------|
| CHN     | 1    |
| RUS     | 2    |
| USA     | 3    |

Rank is redundant because the ranking is implied

After

CHN, RUS, USA

Succinct and provides all information needed because the ranking is implied
Enter STRING_AGG
STRING_AGG(column, separator) takes all the values of a column and concatenates them, with
separator in between each value

STRING_AGG(Letter, ', ') transforms this...

| Letter |
|--------|
| A |
| B |
| C |

...into this

A, B, C
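
The letter example can be run as is. In this sketch the inline table and its name are placeholders, and the ORDER BY inside the aggregate is an addition that pins the concatenation order, since the order is otherwise not guaranteed.

```sql
WITH letters (letter) AS (
  VALUES ('A'), ('B'), ('C')
)
SELECT STRING_AGG(letter, ', ' ORDER BY letter ASC) AS joined
FROM letters;
-- A, B, C
```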

Query and result
Before

WITH Country_Medals AS (
  SELECT
    Country, COUNT(*) AS Medals
  FROM Summer_Medals
  WHERE Year = 2012
    AND Country IN ('CHN', 'RUS', 'USA')
    AND Medal = 'Gold'
    AND Sport = 'Gymnastics'
  GROUP BY Country)

SELECT
  Country,
  RANK() OVER (ORDER BY Medals DESC) AS Rank
FROM Country_Medals
ORDER BY Rank ASC;

After

WITH Country_Medals AS (...),

Country_Ranks AS (...)

SELECT STRING_AGG(Country, ', ')
FROM Country_Medals;

Result

CHN, RUS, USA
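
Since the compressed string relies on the countries appearing in rank order, one option is to state that order inside the aggregate. This is a sketch, not the course's query: the inline rows are copied from the "Compressing data" table, and country_ranks and medal_rank are placeholder names.

```sql
WITH country_ranks (country, medal_rank) AS (
  VALUES ('CHN', 1), ('RUS', 2), ('USA', 3)
)
SELECT STRING_AGG(country, ', ' ORDER BY medal_rank ASC) AS ranked_countries
FROM country_ranks;
-- CHN, RUS, USA
```

Without the ORDER BY inside STRING_AGG, the concatenation follows whatever order the rows happen to arrive in.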

Let's practice!