SQL
WINDOW
FUNCTIONS
What is a Window Function ?
AWindow Function in SQL is apowerful toolthat allows you to perform
calculations across a set of table rows related to the current row. Unlike
aggregate functions that return a single result for a group, window
functions return a value for each row while still being able to access related
rows' data. This makes them ideal for complex analytics, ranking, running
totals, and more.
Syntax for Window Functions
The basic syntax for a window function is:
<window_function>(): This is the function you are applying, like
ROW_NUMBER(), RANK(), etc.
PARTITION BY: Divides the result set into partitions to which the
window function is applied. (optional)
ORDER BY: Specifies the order of rows in each partition. It’s mandatory
for ranking functions.(optional)
window_frame_clause: Defines a subset of rows within the partition
that the function operates on, like ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW.(optional)
Sample Table : Sales
Let’s consider a sample Sales table:
Different Types of Window Functions
1. Aggregate Functions 2. Ranking Functions 3. Value Functions
SUM() AVG() ROW_NUMBER() LEAD()
MIN() and MAX() RANK() LAG()
COUNT() DENSE_RANK(): FIRST_VALUE()
NTILE(n) LAST_VALUE()
NTH_VALUE()
1. Aggregate Functions
An aggregatewindow function performs aggregation across a specified
set of rows defined by the window frame. This window frame is determined
by the PARTITION BY, ORDER BY, and ROWS or RANGE clauses.
Common Aggregate Window Functions
SUM(): Calculates the sum of values.
AVG(): Computes the average of values.
COUNT(): Counts the number of rows.
MIN(): Finds the minimum value.
MAX(): Finds the maximum value.
Syntax :
Example Query :
Let’s calculate the running total and average sale price using aggregate
window functions.
a. Average Sale Price :
Here, AVG(sale_price) OVER () calculates the average of sale_price across
all rows.
Output :
Explanation:
Global Average: Computes the average sale price across all rows in the
dataset.
Uniform Value: Provides the same average value for each row,
reflecting the overall average price.
b.Using ORDER BY
Here, SUM(quantity) OVER (ORDER BY order_date) calculates the
cumulative sum of quantity, ordered by order_date.
Example : Running Total of Quantity
Output :
Explanation:
Cumulative Sum: Calculates a cumulative sum of the quantity column,
adding up all quantities from the start up to the current row, based on
the order of order_date. Preservation of Rows: Maintains the original
rows and their data while computing the running total.
c.Using PARTITION BY
You can also use PARTITION BY to compute aggregates within specific
partitions.
Example: Running Total by Product Code
Output :
Explanation:
Partitioned Running Total: Calculates the running total of quantities
separately for each product_code, based on the order of order_date.
Product-wise Calculation: Only sums quantities within the same
product code partition, keeping totals distinct by product.
d.Query with All Aggregate Window Functions
Here’s a query that demonstrates the use of several aggregate window
functions on the sales dataset:
Output :
These output tables illustrate how each aggregate window function
processes the data, providing various analytical insights while maintaining
the detail of each row.
2. Ranking Window Functions
1. RANK()
The RANK() function assigns a unique rank to each row within a partition,
with gaps in ranking values if there are ties. If two or more rows have the
same rank, the next rank(s) are skipped.
2. DENSE_RANK()
The DENSE_RANK() function also assigns ranks to rows within a partition but
does not leave gaps in the ranking values for ties. Each distinct rank is
assigned consecutively without skipping.
3. ROW_NUMBER()
The ROW_NUMBER() function assigns a unique sequential integer to rows
within a partition, with no gaps. It is the simplest ranking function and does
not handle ties.
Example Query
Let’s use the RANK(), DENSE_RANK(), and ROW_NUMBER() functions to rank
orders based on sale_price.
Explanation
RANK(): ,
Assigns a rank to each row with ties receiving the same rank
and gaps in the ranking sequence .
DENSE_RANK(): ,
Assigns a rank to each row with ties receiving the same
rank and no gaps in the ranking sequence .
ROW_NUMBER(): ,
Assigns a unique sequential integer to each row with
,
no gaps regardless of ties .
Output Table
,
Based on the above query the output will be :
Summary
RANK (): Rows with the same sale_price receive the same rank. The next
rank after a tie has gaps (e.g., rank 1, 1, 3).
DENSE_RANK(): Rows with the same sale_price receive the same rank,
with no gaps in ranks (e.g., rank 1, 1, 2).
ROW_NUMBER(): Each row receives a unique, sequential number (e.g., 1,
2, 3).
4. PERCENT _RANK ()
The PERCENT_RANK() function calculates the relative rank of a row within a
partition as a percentage. It measures the position of a row in a sorted
dataset as a percentage of the total number of rows.
Formula:
Where:
Rank of current row: The rank assigned to the current row.
Total number of rows: The total number of rows in the partition.
5. NTILE()
The NTILE() function divides the dataset into a specified number of
approximately equal-sized buckets or tiles. It assigns a bucket number to
each row.
Formula:
Where:
n: Number of buckets or tiles to divide the dataset into.
Example Query
Explanation
_
PERCENT RANK () OVER (ORDER BY sale_price DESC): Calculates the
percentage rank of each row based on sale_price in descending order.
NTILE(4) OVER (ORDER BY sale_price DESC): Divides the rows into 4
buckets based on sale_price in descending order.
Output Table
Summary
_ :
percent rank Shows the relative position of each sale price as a
.
percentage of the total number of rows For instance the highest sale,
price (200) _
has a percent rank of 0.00, indicating it is at the top of the
distribution .
_ :
ntile bucket Divides the data into 4 _
buckets based on sale price For.
,
instance the highest sale prices fall into bucket 1, while the lower sale
prices fall into buckets 2 and 3.
3. Value Window Functions
Value window functions in SQL are used to retrieve values from a set of
rows related to the current row, based on specific criteria. These functions
are essential for performing calculations that involve looking at the data
around the current row.
Value window functions include:
LEAD(): Provides access to the value of a column in a subsequent row.
LAG(): Provides access to the value of a column in a preceding row.
FIRST_VALUE(): Returns the first value in a window frame.
LAST_VALUE(): Returns the last value in a window frame.
NTH_VALUE(): Returns the value of a specific row within the window
frame, where N is the row number.
1. LEAD( )
The LEAD() function provides access to a value from a subsequent row in
the result set. It's useful for comparing a row with future rows.
Syntax:
column_name: The column whose value you want to retrieve .
offset: The number of rows forward from the current row default is ( 1).
default: The value to return if the offset goes beyond the end of the
(
result set default is NULL ).
Example Query 1:
Retrieve the next sale price for each order :
Output Table :
Explanation :
( _ ) _
LEAD sale price retrieves the sale price from the next row ordered by
order_date.
Example Query 2:
Retrieve the sale price of the row two steps ahead :
Output Table :
Explanation :
( _
LEAD sale price , 2) retrieves the sale_price two rows ahead.
2. LAG ()
The LAG () function provides access to a value from a preceding row in the
result set. It's useful for comparing a row with previous rows.
Syntax :
column_name: .
The column whose value you want to retrieve
offset: (
The number of rows backward from the current row default is
1).
default: The value to return if the offset goes before the start of the
(
result set default is NULL ).
Example Query 1:
Retrieve the previous sale price for each order :
Output Table :
Explanation :
( _ ) _
LAG sale price retrieves the sale price from the previous row ordered by
_
order date .
Example Query 2:
Retrieve the sale price of the row two steps back :
Output Table :
Explanation :
( _
LAG sale price , 2) retrieves the sale_price two rows back.
3. FIRST_ VALUE()
The FIRST_VALUE() function returns the first value in the window frame, as
defined by the ORDER BY clause.
Syntax:
Example Query:
Retrieve the first sale price in the result set:
Output Table:
Explanation:
FIRST_VALUE(sale_price) returns the first sale_price according to the
ORDER BY clause.
4. NTH _VALUE ()
The NTH VALUE _
() function returns the value of a specific row within the
window frame, where N is the row number.
Syntax:
Example Query:
Retrieve the first sale price in the result set :
Output Table:
Explanation:
_ (
NTH VALUE sale price _ , 2) returns the second sale_price in the result set.
4. _
LAST ()
VALUE
The LAST_VALUE() function returns the last value in the window frame, as
defined by the ORDER BY clause. It's important to specify the correct
window frame to get the expected results.
Syntax:
Example Query:
Retrieve the last sale price in the window frame:
Output Table:
Why This Output is Incorrect:
In SQL, if you do not explicitly specify a RANGE frame, the default frame
is usually RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW. This means that the window function considers all rows from the
start of the partition up to the current row by default.
Issue with LAST_VALUE() Function: Because the default frame does not
extend beyond the current row, LAST_VALUE() will return the value of
the CURRENT ROW if the frame is not specified correctly. To get the
actual last value of the window frame, you need to explicitly define the
frame to include all rows from the current row to the end of the
partition.
Correct Usage of LAST_VALUE() with Explicit Frame Specification
To retrieve the last value of the partition or window frame, you should use
ROWS to define the frame explicitly. For example, to get the last value from
the current row to the end of the partition, you can use:
Corrected Query:
Output Table with Correct Frame Specification
Explanation:
By specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING, you ensure that the LAST_VALUE() function considers all rows
from the current row to the end of the partition. This way, the
last_sale_price will accurately reflect the last sale_price value within the
defined window frame.
SQL Frame Clause
The SQL frame clause is used with window functions to define the subset of
rows within the partition that the function operates on. It controls the
"window" or range of rows that are considered for the calculation of each
row's result.
Frame Clause Syntax
PARTITION BY: Divides the result set into partitions to which the
window function is applied independently.
ORDER BY: Determines the order of rows within each partition.
ROWS and RANGE: Define the window frame for the function.
Default Frame Clause
If no frame specification is provided, the default frame clause depends on
the function and the SQL database system, but generally, it is:
For Aggregate Functions: ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW is often the default, meaning the window includes
all rows from the start of the partition up to the current row. For Non-
Aggregate Functions: Some window functions may default to a frame
that encompasses all rows in the partition.
Frame Specifications
1. ROWS Clause
The ROWS clause specifies a range of rows relative to the current row,
allowing more control over the frame.
Syntax:
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING :
Includes all rows from the current row to the end of the partition.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW :
Includes all rows from the start of the partition up to the current row .
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: Includes one row
before and one row after the current row.
Example Query with ROWS:
Compute the average sale_price for the current row and the two
preceding rows.
Output Table:
Explanation:
, (
For the first row the window includes just the current row since there
are not enough preceding rows ).
,
For the second row the window includes the current row and the 1
preceding row .
,
For the third row the window includes the current row and the 2
preceding rows .
The same logic follows for subsequent rows
2. RANGE Clause
The RANGE clause specifies a range of values relative to the current row’s
value, rather than row numbers. This is used with ordered columns and is
generally applied to numeric or date columns.
Syntax:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW :
Includes all rows from the start of the partition up to the current row,
based on the ordering column’s value.
RANGE BETWEEN CURRENT ROW AND INTERVAL '1 DAY'
FOLLOWING : Includes rows within the range of the current row s value
’
up to one day ahead.
Example Query with RANGE:
Compute the total sale_price for the current row and rows with order
dates within one day prior.
Output Table:
Explanation:
,
For the first row there are no rows within the specified range (1 day
)
preceding so the total is the same as the sale price_ .
,
For the second row the window includes both the current row and the
previous row .
,
For the third row the window includes the current row and rows within
the specified range of 1 day before.
The same logic follows for subsequent rows .
When to Use ROWS vs RANGE
ROWS: Use ROWS when you need a fixed number of rows relative to
. ’
the current row It s ideal for operations that are sensitive to the actual
,
row count such as moving averages or sum calculations .
RANGE: Use RANGE when you need to consider a range of values rather
. ’
than a fixed number of rows It s useful for operations that are sensitive
,
to the value ranges such as calculating sums or averages over a time
period or numeric range .
Common Use Cases for SQL Window Functions
1.Running Totals: Calculate cumulative sums over a partitioned dataset,
such as total sales over time.
2.Moving Averages: Compute averages over a sliding window of rows to
smooth out fluctuations in data.
3.Ranking: Assign ranks to rows within a partition, such as ranking
employees based on performance.
4.Row Numbering: Assign unique sequential numbers to rows within a
partition or the entire result set.
5.Lag and Lead Analysis: Compare a row's value to a previous or
subsequent row's value, useful for trend analysis.
6.Percentiles: Determine the percentile rank of a value within a set, often
used in statistical analysis.