SQL Window Functions
What is a Window Function?
A window function in SQL performs a calculation across a set of table rows
related to the current row. Unlike aggregate functions used with GROUP BY,
which collapse each group into a single result row, window functions return a
value for every row while still reading data from related rows. This makes
them well suited to analytics such as rankings, running totals, and moving
averages.
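Syntax:
function_name(arguments) OVER (
    [PARTITION BY partition_expression, ...]
    [ORDER BY sort_expression [ASC | DESC], ...]
    [frame_clause]
)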
Example Query:
Let’s calculate the overall average sale price and a running total of
quantity using aggregate window functions.
Output:
Explanation:
Global Average: Computes the average sale price across all rows in the
dataset.
Uniform Value: Provides the same average value for each row,
reflecting the overall average price.
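The global average described above can be sketched as follows. This is a minimal illustration rather than the original query: it assumes a hypothetical `orders` table with made-up values, and it runs the SQL through Python's built-in `sqlite3` module (which assumes SQLite 3.25 or newer for window function support).

```python
import sqlite3

# Hypothetical sample data; the table name `orders` and its values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100.0), (2, "2024-01-02", 200.0), (3, "2024-01-03", 300.0)],
)

# An empty OVER () treats the whole result set as one window,
# so every row receives the same overall average.
rows = conn.execute(
    """
    SELECT order_id, sale_price, AVG(sale_price) OVER () AS avg_sale_price
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

Each input row is still present in the result; only the extra column carries the aggregated value.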
Output:
Explanation:
Cumulative Sum: Calculates a cumulative sum of the quantity column,
adding up all quantities from the start up to the current row, based on
the order of order_date.
Preservation of Rows: Maintains the original rows and their data while
computing the running total.
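A running total of this kind might look like the sketch below. The `orders` table and its rows are hypothetical, and the SQL is executed with Python's `sqlite3` module purely for convenience.

```python
import sqlite3

# Hypothetical sample data; names and values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 2), (2, "2024-01-02", 1), (3, "2024-01-03", 3)],
)

# ORDER BY inside OVER makes the sum cumulative: each row adds its own
# quantity to the total of all rows with an earlier order_date.
rows = conn.execute(
    """
    SELECT order_id, order_date, quantity,
           SUM(quantity) OVER (ORDER BY order_date) AS running_total
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
```

The sample dates are all distinct on purpose. With the default frame, rows sharing the same order_date are peers and would all receive the same running total.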
Output:
Explanation:
Partitioned Running Total: Calculates the running total of quantities
separately for each product_code, based on the order of order_date.
Product-wise Calculation: Only sums quantities within the same
product code partition, keeping totals distinct by product.
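The partitioned running total can be sketched in the same way. Again, the `orders` table and its contents are assumptions made up for this example, and `sqlite3` is only used as a handy engine to run the query.

```python
import sqlite3

# Hypothetical sample data with two product codes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, product_code TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", "A", 2),
        (2, "2024-01-02", "B", 1),
        (3, "2024-01-03", "A", 3),
        (4, "2024-01-04", "B", 4),
    ],
)

# PARTITION BY restarts the running total for every product_code.
rows = conn.execute(
    """
    SELECT product_code, order_date, quantity,
           SUM(quantity) OVER (
               PARTITION BY product_code
               ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY product_code, order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Product A and product B each accumulate their own totals, so quantities never leak across partitions.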
Output:
Example Query
Let’s use the RANK(), DENSE_RANK(), and ROW_NUMBER() functions to rank
orders based on sale_price.
Output Table
Based on the above query, the output will be:
Summary
RANK(): Rows with the same sale_price receive the same rank. The rank after
a tie skips ahead, leaving gaps (e.g., rank 1, 1, 3).
DENSE_RANK(): Rows with the same sale_price receive the same rank,
with no gaps in ranks (e.g., rank 1, 1, 2).
ROW_NUMBER(): Each row receives a unique, sequential number (e.g., 1,
2, 3), even when sale_price values tie.
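The three ranking functions summarized above can be compared side by side with a sketch like this one. The `orders` table and prices are hypothetical, chosen so that two rows tie on sale_price.

```python
import sqlite3

# Hypothetical sample data: orders 1 and 2 tie on sale_price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 300.0), (2, 300.0), (3, 200.0), (4, 100.0)],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           RANK()       OVER (ORDER BY sale_price DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY sale_price DESC) AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY sale_price DESC) AS row_num
    FROM orders
    ORDER BY order_id
    """
).fetchall()

# Columns: order_id, sale_price, rnk, dense_rnk, row_num
for row in rows:
    print(row)
```

The tied rows share rank 1 under both RANK() and DENSE_RANK(), but the next row is ranked 3 by RANK() and 2 by DENSE_RANK(). Which of the two tied rows gets row number 1 is not defined by the query, so add a tie-breaking column to the ORDER BY if a stable numbering matters.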
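4. PERCENT_RANK()
The PERCENT_RANK() function returns the relative rank of a row within its
partition as a value between 0 and 1.
Formula:
PERCENT_RANK = (Rank of current row - 1) / (Total number of rows - 1)
Where:
Rank of current row: The rank assigned to the current row.
Total number of rows: The total number of rows in the partition.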
5. NTILE()
The NTILE() function divides the dataset into a specified number of
approximately equal-sized buckets or tiles. It assigns a bucket number to
each row.
Formula:
NTILE(n) OVER (ORDER BY column_name)
Where:
n: Number of buckets or tiles to divide the dataset into.
Explanation:
PERCENT_RANK() OVER (ORDER BY sale_price DESC): Calculates the relative
rank of each row, a value from 0 to 1, based on sale_price in descending order.
NTILE(4) OVER (ORDER BY sale_price DESC): Divides the rows into 4
buckets of approximately equal size based on sale_price in descending order.
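The two expressions explained above can be tried out with a small sketch. The `orders` table and its five prices are hypothetical; five rows were chosen so that NTILE(4) has to produce buckets of unequal size.

```python
import sqlite3

# Hypothetical sample data: five distinct prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 500.0), (2, 400.0), (3, 300.0), (4, 200.0), (5, 100.0)],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           PERCENT_RANK() OVER (ORDER BY sale_price DESC) AS pct_rank,
           NTILE(4)       OVER (ORDER BY sale_price DESC) AS bucket
    FROM orders
    ORDER BY sale_price DESC
    """
).fetchall()

# Columns: order_id, sale_price, pct_rank, bucket
for row in rows:
    print(row)
```

With five rows, PERCENT_RANK() is (rank - 1) / 4, so it steps from 0.0 to 1.0 in increments of 0.25. NTILE(4) cannot split five rows evenly, so the first bucket receives two rows and the remaining buckets one each.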
Output Table
1. LEAD()
The LEAD() function provides access to a value from a subsequent row in
the result set. It's useful for comparing a row with later rows.
Syntax:
LEAD(expression [, offset [, default_value]]) OVER ([PARTITION BY ...] ORDER BY ...)
Example Query 1:
Retrieve the next sale price for each order:
Output Table:
Explanation:
LEAD(sale_price) retrieves the sale_price from the next row ordered by
order_date.
Example Query 2:
Retrieve the sale price of the row two steps ahead:
Output Table:
Explanation:
LEAD(sale_price, 2) retrieves the sale_price two rows ahead.
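Both LEAD() variants can be seen together in the following sketch. The `orders` table and its values are assumptions for illustration, not the data from the original example.

```python
import sqlite3

# Hypothetical sample data, one order per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100.0),
        (2, "2024-01-02", 200.0),
        (3, "2024-01-03", 300.0),
        (4, "2024-01-04", 400.0),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           LEAD(sale_price)    OVER (ORDER BY order_date) AS next_price,
           LEAD(sale_price, 2) OVER (ORDER BY order_date) AS price_two_ahead
    FROM orders
    ORDER BY order_date
    """
).fetchall()

# Rows near the end have no following row, so LEAD() yields NULL (None in Python).
for row in rows:
    print(row)
```

The offset defaults to 1. When no row exists at the requested offset, the result is NULL unless a third argument supplies a default value.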
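2. LAG()
The LAG() function provides access to a value from a preceding row in
the result set. It's useful for comparing a row with earlier rows.
Syntax:
LAG(expression [, offset [, default_value]]) OVER ([PARTITION BY ...] ORDER BY ...)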
Example Query 1:
Retrieve the previous sale price for each order:
Output Table:
Explanation:
LAG(sale_price) retrieves the sale_price from the previous row ordered by
order_date.
Example Query 2:
Retrieve the sale price of the row two steps back:
Output Table:
Explanation:
LAG(sale_price, 2) retrieves the sale_price two rows back.
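The two LAG() examples can be combined into one sketch. As before, the `orders` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data, one order per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100.0),
        (2, "2024-01-02", 200.0),
        (3, "2024-01-03", 300.0),
        (4, "2024-01-04", 400.0),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           LAG(sale_price)    OVER (ORDER BY order_date) AS prev_price,
           LAG(sale_price, 2) OVER (ORDER BY order_date) AS price_two_back
    FROM orders
    ORDER BY order_date
    """
).fetchall()

# The earliest rows have no preceding row, so LAG() yields NULL (None in Python).
for row in rows:
    print(row)
```

A common use is subtracting the LAG() value from the current value to get the change since the previous row.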
Tarini Prasad Das
3. FIRST_VALUE()
The FIRST_VALUE() function returns the first value in the window frame, as
defined by the ORDER BY clause.
Syntax:
FIRST_VALUE(expression) OVER ([PARTITION BY ...] ORDER BY ...)
Example Query:
Retrieve the first sale price in the result set:
Output Table:
Explanation:
FIRST_VALUE(sale_price) returns the first sale_price according to the
ORDER BY clause.
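A minimal sketch of FIRST_VALUE(), assuming a hypothetical `orders` table whose earliest order is not the cheapest one, so the effect of ORDER BY is visible:

```python
import sqlite3

# Hypothetical sample data; the earliest order has sale_price 150.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 150.0), (2, "2024-01-02", 100.0), (3, "2024-01-03", 300.0)],
)

# "First" means first according to the window's ORDER BY, here order_date.
rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           FIRST_VALUE(sale_price) OVER (ORDER BY order_date) AS first_sale_price
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Every row reports 150.0, the price of the earliest order. Ordering the window by sale_price instead would return the lowest price.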
4. NTH_VALUE()
The NTH_VALUE() function returns the value from the N-th row of the
window frame, where N is the row number counted from 1.
Syntax:
NTH_VALUE(expression, N) OVER ([PARTITION BY ...] ORDER BY ...)
Example Query:
Retrieve the second sale price in the result set:
Output Table:
Explanation:
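NTH_VALUE(sale_price, 2) returns the second sale_price in the result set.
With the default frame, which ends at the current row, rows that come before
the N-th row return NULL because the frame does not yet contain N rows.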
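The sketch below shows NTH_VALUE() with the default frame and with a frame that spans the whole partition. The `orders` table, its values, and the column aliases are hypothetical.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 150.0), (2, "2024-01-02", 100.0), (3, "2024-01-03", 300.0)],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           -- Default frame: ends at the current row.
           NTH_VALUE(sale_price, 2) OVER (ORDER BY order_date) AS second_default,
           -- Explicit frame: covers every row in the partition.
           NTH_VALUE(sale_price, 2) OVER (
               ORDER BY order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS second_full
    FROM orders
    ORDER BY order_date
    """
).fetchall()

# Columns: order_id, sale_price, second_default, second_full
for row in rows:
    print(row)
```

On the first row the default frame holds only one row, so `second_default` is NULL there, while `second_full` is 100.0 on every row.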
5. LAST_VALUE()
The LAST_VALUE() function returns the last value in the window frame.
Example Query:
Retrieve the last sale price in the window frame:
Output Table:
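Note: With the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW), the frame ends at the current row, so LAST_VALUE() typically
returns the current row's own sale_price rather than the last one in the
partition.
Corrected Query: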
Explanation:
By specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING, you ensure that the LAST_VALUE() function considers all rows
from the current row to the end of the partition. This way, the
last_sale_price will accurately reflect the last sale_price value within the
defined window frame.
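The difference between the default frame and the corrected frame can be reproduced with a sketch like this. The `orders` table, its values, and the aliases `last_default` and `last_fixed` are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 150.0), (2, "2024-01-02", 100.0), (3, "2024-01-03", 300.0)],
)

rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           -- Default frame stops at the current row.
           LAST_VALUE(sale_price) OVER (ORDER BY order_date) AS last_default,
           -- Corrected frame runs from the current row to the end.
           LAST_VALUE(sale_price) OVER (
               ORDER BY order_date
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS last_fixed
    FROM orders
    ORDER BY order_date
    """
).fetchall()

# Columns: order_id, sale_price, last_default, last_fixed
for row in rows:
    print(row)
```

`last_default` simply echoes each row's own sale_price, while `last_fixed` is 300.0 on every row, the price of the latest order.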
PARTITION BY: Divides the result set into partitions to which the
window function is applied independently.
ORDER BY: Determines the order of rows within each partition.
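ROWS and RANGE: Define the window frame for the function. ROWS counts
physical rows relative to the current row, while RANGE selects rows whose
ORDER BY value lies within a given distance of the current row's value.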
Syntax:
ROWS BETWEEN frame_start AND frame_end
Example Query:
Compute the average sale_price for the current row and the two
preceding rows.
Explanation:
For the first row, the window includes just the current row (since there
are not enough preceding rows).
For the second row, the window includes the current row and the 1
preceding row.
For the third row, the window includes the current row and the 2
preceding rows.
The same logic follows for subsequent rows.
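The row-by-row behaviour described above can be checked with a small moving-average sketch. The `orders` table and its four prices are hypothetical.

```python
import sqlite3

# Hypothetical sample data, one order per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100.0),
        (2, "2024-01-02", 200.0),
        (3, "2024-01-03", 300.0),
        (4, "2024-01-04", 400.0),
    ],
)

# The frame holds at most three physical rows: the current one and two before it.
rows = conn.execute(
    """
    SELECT order_id,
           sale_price,
           AVG(sale_price) OVER (
               ORDER BY order_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
```

The first row averages only itself (100.0), the second averages two rows (150.0), and from the third row on the frame always holds three rows (200.0, then 300.0).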
Syntax:
RANGE BETWEEN frame_start AND frame_end
Example Query:
Compute the total sale_price for the current row and rows with order
dates up to one day earlier.
Explanation:
For the first row, there are no earlier rows within the specified range (1 day
preceding), so the total equals that row's sale_price.
For the second row, the window includes the current row and the previous
row, because the previous order date is within one day.
For the third row, the window includes the current row and any rows whose
order date falls within 1 day before it.
The same logic follows for subsequent rows.
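A value-based frame like this can be sketched as follows. Two assumptions apply: the `orders` table and its rows are hypothetical, and because SQLite (3.28 or newer) only accepts numeric RANGE offsets, the dates are converted to day numbers with `julianday()`. Engines with interval arithmetic, such as PostgreSQL, can instead order by the date column and write `RANGE BETWEEN INTERVAL '1 day' PRECEDING AND CURRENT ROW`.

```python
import sqlite3

# Hypothetical sample data; note the gap before the last order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100.0),
        (2, "2024-01-02", 200.0),
        (3, "2024-01-03", 300.0),
        (4, "2024-01-06", 400.0),
    ],
)

# julianday() turns each date into a day number, so "1 PRECEDING"
# means "up to one day before the current row's date".
rows = conn.execute(
    """
    SELECT order_id,
           order_date,
           sale_price,
           SUM(sale_price) OVER (
               ORDER BY julianday(order_date)
               RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS total_within_1_day
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
```

Unlike ROWS, the frame size here depends on the dates themselves: the order on 2024-01-06 has no neighbour within one day, so its total is just its own sale_price.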