S03-Window Functions Within SQLite
A window function is a special SQL function whose input values are taken from a
“window” of one or more rows in the result set of a SELECT statement.
SQLite provides the following built-in window functions:
row_number()
rank()
dense_rank()
percent_rank()
cume_dist()
ntile(N)
lag(expr), lag(expr, offset), lag(expr, offset, default)
lead(expr), lead(expr, offset), lead(expr, offset, default)
first_value(expr)
last_value(expr)
nth_value(expr, N)
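As a quick illustration of a few of the functions listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table `scores` and its values are made up for this example and are not part of the tutorial's database; the sketch also assumes the SQLite library bundled with Python is version 3.25.0 or newer.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 85), ("c", 85), ("d", 70)],
)

ranking_rows = conn.execute(
    """
    SELECT
        name,
        score,
        row_number() OVER (ORDER BY score DESC, name) AS row_num,
        rank()       OVER (ORDER BY score DESC)       AS rnk,
        dense_rank() OVER (ORDER BY score DESC)       AS dense_rnk,
        lag(score)   OVER (ORDER BY score DESC, name) AS prev_score
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

# Tied scores share a rank; rank() leaves a gap after the tie,
# dense_rank() does not. lag() returns NULL (None) for the first row.
for row in ranking_rows:
    print(row)
# ('a', 90, 1, 1, 1, None)
# ('b', 85, 2, 2, 2, 90)
# ('c', 85, 3, 2, 2, 85)
# ('d', 70, 4, 4, 3, 85)
```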
The previous tutorials were written for somewhat older SQLite versions, in which window
functions were not yet available (they were added in SQLite 3.25.0). In this tutorial, I am
going to touch on some of them in the following 5 sections:
Firstly, compare SQL window functions with SQL aggregate functions.
Secondly, go through a few window functions based on regular aggregate functions, such as
AVG, MIN/MAX, COUNT, and SUM.
Fourthly, talk about generating statistics (e.g., percentiles, quartiles, median, etc.) with the
NTILE function, a common task for a data scientist.
Fifthly, focus on LAG and LEAD, two functions that are very important if you are interviewing
for a role that requires dealing with time-series data.
https://colab.research.google.com/drive/15x1e1H_D-7M4TjgT1i3lp6f3oC8jmu_z#printMode=true 1/15
7/25/24, 4:13 PM s03-Window Functions within SQLite.ipynb - Colab
In the previous tutorials, we presented some use cases of aggregate functions. What are
the similarities and differences between aggregate functions and window functions? Let's have a
quick walkthrough.
By using the GROUP BY clause, you can calculate an aggregate value for several groups in one
query. That is to say, aggregate functions collapse the individual rows and present the aggregate
value for all the rows in the group.
The window frame (or simply window) is defined using the OVER() clause. Inside this clause,
PARTITION BY defines windows based on the values of a specific column (similar to GROUP BY).
To calculate the returned values, window functions may use aggregate functions, but they will use
them with the OVER() clause.
Aggregate functions with GROUP BY differ from window functions in that they collapse the rows
of each group into a single output row.
Window functions differ from aggregate functions used with GROUP BY in that they:
May use many functions other than aggregates (e.g. RANK(), LAG(), or LEAD()).
May group rows by the row’s rank, percentile, etc., as well as by its column value.
Do not collapse rows.
May use a sliding window frame (which depends on the current row).
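The difference between collapsing and not collapsing rows can be sketched with Python's built-in sqlite3 module. The table `watershed_demo` and its four rows are invented for this illustration (they only mimic the column names used later in this tutorial), and the sketch assumes SQLite 3.25.0 or newer.

```python
import sqlite3

# Hypothetical miniature table; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watershed_demo (YR INTEGER, MO INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_demo VALUES (?, ?, ?)",
    [(1981, 1, 100.0), (1982, 1, 50.0), (1981, 2, 160.0), (1982, 2, 80.0)],
)

# Aggregate function with GROUP BY: one output row per month.
grouped = conn.execute(
    "SELECT MO, AVG(PREC_mm) FROM watershed_demo GROUP BY MO ORDER BY MO"
).fetchall()

# Same aggregate used as a window function: every input row is kept,
# and each row carries the average of its own month.
windowed = conn.execute(
    """
    SELECT YR, MO, PREC_mm,
           AVG(PREC_mm) OVER (PARTITION BY MO) AS avg_PREC_mm
    FROM watershed_demo
    ORDER BY MO, YR
    """
).fetchall()

print(grouped)   # [(1, 75.0), (2, 120.0)]
print(windowed)  # 4 rows, e.g. (1981, 1, 100.0, 75.0)
```

With four input rows, the GROUP BY query returns two rows, while the window query still returns four.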
%load_ext sql
%sql sqlite:///data/demo.db3
'Connected: @data/demo.db3'
Window functions are functions that perform calculations across a set of rows related to the
current row.
It is comparable to the type of calculation done with an aggregate function, but unlike regular
aggregate functions, window functions do not group several rows into a single output row — the
rows retain their own identities.
Behind the scenes, a window function can access more than just the current row of the query result.
2.1.1 AVG
%%sql sqlite://
SELECT
    YR,
    MO,
    PREC_mm,
    ROUND(AVG(PREC_mm) OVER(PARTITION BY MO)) AS avg_PREC_mm
FROM watershed_monthly
ORDER BY MO
Done.
YR MO PREC_mm avg_PREC_mm
1981 1 96.2901611328125 114.0
1982 1 59.744789123535156 114.0
1983 1 94.48113250732422 114.0
1984 1 111.43696594238281 114.0
1985 1 78.43978881835938 114.0
1986 1 45.433536529541016 114.0
1987 1 146.7594757080078 114.0
1988 1 106.8199234008789 114.0
1989 1 97.1476821899414 114.0
1990 1 84.20606994628906 114.0
1991 1 120.2723388671875 114.0
1992 1 114.5992660522461 114.0
1993 1 139.7407684326172 114.0
1994 1 117.82371520996094 114.0
1995 1 99.79764556884766 114.0
1996 1 171.72898864746094 114.0
1997 1 109.2258529663086 114.0
1998 1 165.39779663085938 114.0
1999 1 129.19338989257812 114.0
2000 1 81.89866638183594 114.0
2001 1 66.61447143554688 114.0
2002 1 125.24539184570312 114.0
2003 1 137.62448120117188 114.0
2004 1 116.2384262084961 114.0
2005 1 178.54624938964844 114.0
2006 1 99.33370971679688 114.0
2007 1 206.89495849609375 114.0
2008 1 77.4709701538086 114.0
2009 1 122.1943359375 114.0
2010 1 107.7326431274414 114.0
1981 2 160.22804260253906 119.0
1982 2 87.51470184326172 119.0
1983 2 52.12006378173828 119.0
1984 2 163.99578857421875 119.0
1985 2 130.32330322265625 119.0
1986 2 110.43309020996094 119.0
1987 2 184.14231872558594 119.0
1988 2 183.34051513671875 119.0
1989 2 117.50041961669922 119.0
1990 2 116.64447021484375 119.0
1991 2 113.96415710449219 119.0
1992 2 189.6724853515625 119.0
1993 2 136.2637939453125 119.0
1994 2 158.59536743164062 119.0
In the final output, every row carries the average PREC_mm of its month (MO). Because of the
ORDER BY MO clause, rows of the same month appear together, so you can see that the average
value is the same within each month.
Let’s take a look at a more complicated example, where we calculate a running sum with a window
function.
%%sql sqlite://
SELECT
    YR,
    MO,
    PREC_mm,
    SUM(PREC_mm) OVER(ORDER BY MO) AS running_total,
    SUM(PREC_mm) OVER() AS overall,
    ROUND(SUM(PREC_mm) OVER(ORDER BY MO) * 100.0 / SUM(PREC_mm) OVER(), 2) AS running_percentage
FROM watershed_monthly
Done.
YR MO PREC_mm running_total overall running_percentage
1981 1 96.2901611328125 3408.3335914611816 24084.3432366848 14.15
1982 1 59.744789123535156 3408.3335914611816 24084.3432366848 14.15
1983 1 94.48113250732422 3408.3335914611816 24084.3432366848 14.15
1984 1 111.43696594238281 3408.3335914611816 24084.3432366848 14.15
1985 1 78.43978881835938 3408.3335914611816 24084.3432366848 14.15
1986 1 45.433536529541016 3408.3335914611816 24084.3432366848 14.15
1987 1 146.7594757080078 3408.3335914611816 24084.3432366848 14.15
1988 1 106.8199234008789 3408.3335914611816 24084.3432366848 14.15
1989 1 97.1476821899414 3408.3335914611816 24084.3432366848 14.15
1990 1 84.20606994628906 3408.3335914611816 24084.3432366848 14.15
1991 1 120.2723388671875 3408.3335914611816 24084.3432366848 14.15
1992 1 114.5992660522461 3408.3335914611816 24084.3432366848 14.15
1993 1 139.7407684326172 3408.3335914611816 24084.3432366848 14.15
1994 1 117.82371520996094 3408.3335914611816 24084.3432366848 14.15
1995 1 99.79764556884766 3408.3335914611816 24084.3432366848 14.15
1996 1 171.72898864746094 3408.3335914611816 24084.3432366848 14.15
1997 1 109.2258529663086 3408.3335914611816 24084.3432366848 14.15
1998 1 165.39779663085938 3408.3335914611816 24084.3432366848 14.15
1999 1 129.19338989257812 3408.3335914611816 24084.3432366848 14.15
2000 1 81.89866638183594 3408.3335914611816 24084.3432366848 14.15
2001 1 66.61447143554688 3408.3335914611816 24084.3432366848 14.15
2002 1 125.24539184570312 3408.3335914611816 24084.3432366848 14.15
2003 1 137.62448120117188 3408.3335914611816 24084.3432366848 14.15
2004 1 116.2384262084961 3408.3335914611816 24084.3432366848 14.15
2005 1 178.54624938964844 3408.3335914611816 24084.3432366848 14.15
2006 1 99.33370971679688 3408.3335914611816 24084.3432366848 14.15
2007 1 206.89495849609375 3408.3335914611816 24084.3432366848 14.15
2008 1 77.4709701538086 3408.3335914611816 24084.3432366848 14.15
2009 1 122.1943359375 3408.3335914611816 24084.3432366848 14.15
2010 1 107.7326431274414 3408.3335914611816 24084.3432366848 14.15
1981 2 160.22804260253906 6970.13357925415 24084.3432366848 28.94
1982 2 87.51470184326172 6970.13357925415 24084.3432366848 28.94
1983 2 52.12006378173828 6970.13357925415 24084.3432366848 28.94
1984 2 163.99578857421875 6970.13357925415 24084.3432366848 28.94
1985 2 130.32330322265625 6970.13357925415 24084.3432366848 28.94
1986 2 110.43309020996094 6970.13357925415 24084.3432366848 28.94
1987 2 184.14231872558594 6970.13357925415 24084.3432366848 28.94
1988 2 183.34051513671875 6970.13357925415 24084.3432366848 28.94
1989 2 117.50041961669922 6970.13357925415 24084.3432366848 28.94
1990 2 116.64447021484375 6970.13357925415 24084.3432366848 28.94
1991 2 113.96415710449219 6970.13357925415 24084.3432366848 28.94
1992 2 189.6724853515625 6970.13357925415 24084.3432366848 28.94
1993 2 136.2637939453125 6970.13357925415 24084.3432366848 28.94
1994 2 158.59536743164062 6970.13357925415 24084.3432366848 28.94
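Note that running_total is identical for all rows of the same month. When a window has an ORDER BY but no explicit frame, SQLite uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, in which rows with equal ORDER BY values (peers) are all included together. The sketch below contrasts this with a row-by-row running total ordered by year and month. It uses Python's built-in sqlite3 module with a made-up table `watershed_demo` (hypothetical values, not the tutorial's data) and assumes SQLite 3.25.0 or newer.

```python
import sqlite3

# Hypothetical miniature table; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watershed_demo (YR INTEGER, MO INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_demo VALUES (?, ?, ?)",
    [(1981, 1, 100.0), (1981, 2, 160.0), (1982, 1, 50.0), (1982, 2, 80.0)],
)

running_rows = conn.execute(
    """
    SELECT
        YR,
        MO,
        PREC_mm,
        -- Rows with the same MO are peers, so they share one running total.
        SUM(PREC_mm) OVER (ORDER BY MO)     AS total_by_month,
        -- Each (YR, MO) pair is unique here, so the total grows row by row.
        SUM(PREC_mm) OVER (ORDER BY YR, MO) AS total_by_row
    FROM watershed_demo
    ORDER BY YR, MO
    """
).fetchall()

for row in running_rows:
    print(row)
# (1981, 1, 100.0, 150.0, 100.0)
# (1981, 2, 160.0, 390.0, 260.0)
# (1982, 1, 50.0, 150.0, 310.0)
# (1982, 2, 80.0, 390.0, 390.0)
```

Which of the two totals is appropriate depends on the question being asked: a cumulative share by calendar month, as in the query above, or a chronological running sum.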