SQL Window Functions
What is a Window Function?
A window function in SQL is a powerful tool that allows you to perform
calculations across a set of table rows related to the current row. Unlike
aggregate functions used with GROUP BY, which collapse rows into a single
result, window functions return a value for each row while still being able to
access related rows' data. This makes them ideal for complex analytics,
ranking, running totals, and similar calculations.
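The difference from GROUP BY can be sketched with a small runnable example. It uses Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window function support); the sales table and its three rows are hypothetical illustration data, not the dataset from the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, product_code TEXT, quantity INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "A", 2), (2, "B", 1), (3, "A", 3)])

# GROUP BY collapses the rows: one output row per product_code.
grouped = conn.execute("""
    SELECT product_code, SUM(quantity)
    FROM sales
    GROUP BY product_code
    ORDER BY product_code
""").fetchall()

# The window version keeps all three rows and attaches the
# per-product total to each of them.
windowed = conn.execute("""
    SELECT order_id, product_code, quantity,
           SUM(quantity) OVER (PARTITION BY product_code) AS product_total
    FROM sales
    ORDER BY order_id
""").fetchall()

print(grouped)   # [('A', 5), ('B', 1)]
print(windowed)  # [(1, 'A', 2, 5), (2, 'B', 1, 1), (3, 'A', 3, 5)]
```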
PARTITION BY: Divides the result set into partitions to which the
window function is applied independently.
ORDER BY: Specifies the order of rows in each partition. It’s mandatory
for order-dependent functions, such as ranking functions, in many databases.
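Types of window functions covered here:
Aggregate functions: SUM(), AVG(), and similar aggregates used with OVER
Ranking functions: RANK(), DENSE_RANK(), ROW_NUMBER(), PERCENT_RANK(), NTILE(n)
Value functions: LEAD(), LAG(), FIRST_VALUE(), NTH_VALUE(), LAST_VALUE()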
1. Aggregate Functions
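An aggregate window function performs aggregation across a specified
set of rows defined by the window frame. This window frame is determined
by the OVER clause.
Syntax:
aggregate_function(column_name) OVER ([PARTITION BY ...] [ORDER BY ...] [frame_clause])
Example Query: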
Let’s calculate the running total and average sale price using aggregate
window functions.
a. Average Sale Price :
Here, AVG(sale_price) OVER () calculates the average of sale_price across
all rows.
Output :
Explanation:
Global Average: Computes the average sale price across all rows in the
dataset.
Uniform Value: Provides the same average value for each row,
reflecting the overall average price.
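As a minimal sketch of the global average, the query below runs AVG(sale_price) OVER () through Python's sqlite3 module. The sales table and its four rows are assumed sample data, since the original dataset is not reproduced in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, sale_price REAL)")
# Hypothetical prices; their average is 150.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 100.0), (2, 200.0), (3, 150.0), (4, 150.0)])

# An empty OVER () makes the whole result set one window,
# so every row receives the same average.
rows = conn.execute("""
    SELECT order_id, sale_price,
           AVG(sale_price) OVER () AS avg_sale_price
    FROM sales
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, 150.0)
# (2, 200.0, 150.0)
# (3, 150.0, 150.0)
# (4, 150.0, 150.0)
```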
b. Using ORDER BY
Here, SUM(quantity) OVER (ORDER BY order_date) calculates the
cumulative sum of quantity, ordered by order_date.
Example : Running Total of Quantity
Output :
Explanation:
Cumulative Sum: Calculates a cumulative sum of the quantity column,
adding up all quantities from the start up to the current row, based on
the order of order_date.
Preservation of Rows: Maintains the original rows and their data while
computing the running total.
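A running total of this kind can be sketched as follows, again with Python's sqlite3 module. The table layout and the quantities are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, quantity INTEGER)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 2),
    (2, "2024-01-02", 1),
    (3, "2024-01-03", 3),
    (4, "2024-01-04", 4),
])

# ORDER BY inside OVER turns SUM into a cumulative sum:
# each row sees every row up to and including itself.
rows = conn.execute("""
    SELECT order_date, quantity,
           SUM(quantity) OVER (ORDER BY order_date) AS running_total
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 2, 2)
# ('2024-01-02', 1, 3)
# ('2024-01-03', 3, 6)
# ('2024-01-04', 4, 10)
```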
c. Using PARTITION BY
You can also use PARTITION BY to compute aggregates within specific
partitions.
Example: Running Total by Product Code
Output :
Explanation:
Partitioned Running Total: Calculates the running total of quantities
separately for each product_code, based on the order of order_date.
Product-wise Calculation: Only sums quantities within the same
product code partition, keeping totals distinct by product.
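The partitioned running total can be sketched the same way. The product codes and quantities below are hypothetical; only the column names quantity, product_code, and order_date come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER, order_date TEXT, product_code TEXT, quantity INTEGER)"
)
# Hypothetical orders for two products.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01", "A", 2),
    (2, "2024-01-02", "B", 1),
    (3, "2024-01-03", "A", 3),
    (4, "2024-01-04", "B", 4),
])

# PARTITION BY restarts the running total for each product_code.
rows = conn.execute("""
    SELECT order_id, product_code, quantity,
           SUM(quantity) OVER (
               PARTITION BY product_code
               ORDER BY order_date
           ) AS running_total
    FROM sales
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'A', 2, 2)
# (2, 'B', 1, 1)
# (3, 'A', 3, 5)
# (4, 'B', 4, 5)
```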
d. Query with All Aggregate Window Functions
Here’s a query that demonstrates the use of several aggregate window
functions on the sales dataset:
Output :
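The original query is not reproduced in the text, so the following is only a sketch of what such a combined query might look like. It assumes a hypothetical sales table and uses a named window (the WINDOW clause, which SQLite supports) to avoid repeating the same OVER definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, product_code TEXT, sale_price REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "A", 100.0),
    (2, "B", 200.0),
    (3, "A", 150.0),
    (4, "B", 150.0),
])

# Five aggregates computed over the same per-product window.
rows = conn.execute("""
    SELECT order_id, product_code, sale_price,
           COUNT(*)        OVER w AS cnt,
           SUM(sale_price) OVER w AS total,
           AVG(sale_price) OVER w AS avg_price,
           MIN(sale_price) OVER w AS min_price,
           MAX(sale_price) OVER w AS max_price
    FROM sales
    WINDOW w AS (PARTITION BY product_code)
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'A', 100.0, 2, 250.0, 125.0, 100.0, 150.0)
# (2, 'B', 200.0, 2, 350.0, 175.0, 150.0, 200.0)
# (3, 'A', 150.0, 2, 250.0, 125.0, 100.0, 150.0)
# (4, 'B', 150.0, 2, 350.0, 175.0, 150.0, 200.0)
```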
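1. RANK()
The RANK() function assigns ranks to rows within a partition,
with gaps in ranking values if there are ties. If two or more rows have the
same value, they receive the same rank.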
2. DENSE_RANK()
The DENSE_RANK() function also assigns ranks to rows within a partition but
does not leave gaps in the ranking values for ties. Each distinct rank is
consecutive.
3. ROW_NUMBER()
The ROW_NUMBER() function assigns a unique sequential integer to rows
within a partition, with no gaps. It is the simplest ranking function and does
not give tied rows the same number.
Example Query
RANK(): Assigns a rank to each row, with ties receiving the same rank.
DENSE_RANK(): Assigns ranks with no gaps, regardless of ties.
Output Table
Based on the above query, the output will be:
Summary
RANK(): Rows with the same sale_price receive the same rank. The next
rank after a tie has gaps (e.g., rank 1, 1, 3).
DENSE_RANK(): Rows with the same sale_price receive the same rank,
and the next rank after a tie has no gaps (e.g., rank 1, 1, 2, 3).
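The three ranking functions can be compared side by side in a short sketch. The prices are hypothetical and include one tie at 150; order_id is added as a tiebreaker for ROW_NUMBER() only, so that the example output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, sale_price REAL)")
# Hypothetical prices with a tie at 150.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 100.0), (2, 200.0), (3, 150.0), (4, 150.0)])

rows = conn.execute("""
    SELECT order_id, sale_price,
           RANK()       OVER (ORDER BY sale_price DESC)           AS rnk,
           DENSE_RANK() OVER (ORDER BY sale_price DESC)           AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY sale_price DESC, order_id) AS row_num
    FROM sales
    ORDER BY sale_price DESC, order_id
""").fetchall()

for row in rows:
    print(row)
# (2, 200.0, 1, 1, 1)
# (3, 150.0, 2, 2, 2)
# (4, 150.0, 2, 2, 3)
# (1, 100.0, 4, 3, 4)
```

After the tie, RANK() jumps from 2 to 4, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a value.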
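4. PERCENT_RANK()
The PERCENT_RANK() function calculates the relative rank of a row within a
partition, as a value between 0 and 1.
Formula:
PERCENT_RANK = (Rank of current row - 1) / (Total number of rows in the partition - 1)
Where:
Rank of current row: The rank assigned to the current row.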
5. NTILE()
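The NTILE() function divides the dataset into a specified number of
roughly equal buckets and assigns a bucket number to each row.
Formula:
Rows per bucket = Total number of rows / n (if the rows do not divide evenly, the earlier buckets receive one extra row)
Where:
n: Number of buckets or tiles to divide the dataset into.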
Example Query
Explanation
PERCENT_RANK() OVER (ORDER BY sale_price DESC): Calculates the
percentage rank of each row based on sale_price in descending order.
Output Table
Summary
percent_rank: Shows the relative position of each sale price as a
percentage of the total number of rows. For instance, the highest sale
price (200) has a percent_rank of 0.00, indicating it is at the top of the
distribution.
ntile_bucket: Divides the data into 4 buckets based on sale_price. For
instance, the highest sale prices fall into bucket 1, while the lower sale
prices fall into the later buckets.
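Both functions can be tried on a small hypothetical dataset. The five prices below are assumptions chosen so that the percent ranks come out as exact quarters; the alias pct_rank is used in place of percent_rank purely to keep the column name distinct from the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, sale_price REAL)")
# Five hypothetical, distinct prices.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 200.0), (2, 180.0), (3, 150.0), (4, 120.0), (5, 100.0)])

rows = conn.execute("""
    SELECT sale_price,
           PERCENT_RANK() OVER (ORDER BY sale_price DESC) AS pct_rank,
           NTILE(4)       OVER (ORDER BY sale_price DESC) AS ntile_bucket
    FROM sales
    ORDER BY sale_price DESC
""").fetchall()

for row in rows:
    print(row)
# (200.0, 0.0, 1)
# (180.0, 0.25, 1)
# (150.0, 0.5, 2)
# (120.0, 0.75, 3)
# (100.0, 1.0, 4)
```

With 5 rows, the percent rank is (rank - 1) / 4, and since 5 rows do not divide evenly into 4 buckets, the first bucket holds two rows.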
Value functions give access to data from
rows related to the current row, based on specific criteria. These functions
are essential for performing calculations that involve looking at the data
in other rows.
1. LEAD()
The LEAD() function provides access to a value from a subsequent row in
the result set. It's useful for comparing a row with future rows.
Syntax:
LEAD(column_name, offset, default) OVER ([PARTITION BY ...] ORDER BY ...)
column_name: The column whose value you want to retrieve.
offset: The number of rows forward from the current row (default is 1).
default: The value to return if the offset goes beyond the end of the
result set (default is NULL).
Example Query 1:
Retrieve the next sale price for each order:
Output Table:
Explanation:
LEAD(sale_price) retrieves the sale_price from the next row, ordered by
order_date.
Example Query 2:
Retrieve the sale price of the row two steps ahead:
Output Table:
Explanation:
LEAD(sale_price, 2) retrieves the sale_price two rows ahead.
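Both LEAD() examples can be combined into one sketch. The sales table and its prices are hypothetical sample data; rows with no following row get NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 120.0),
])

rows = conn.execute("""
    SELECT order_id, sale_price,
           LEAD(sale_price)    OVER (ORDER BY order_date) AS next_price,
           LEAD(sale_price, 2) OVER (ORDER BY order_date) AS price_two_ahead
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, 200.0, 150.0)
# (2, 200.0, 150.0, 120.0)
# (3, 150.0, 120.0, None)
# (4, 120.0, None, None)
```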
2. LAG()
The LAG() function provides access to a value from a preceding row in the
result set. It's useful for comparing a row with previous rows.
Syntax:
LAG(column_name, offset, default) OVER ([PARTITION BY ...] ORDER BY ...)
column_name: The column whose value you want to retrieve.
offset: The number of rows backward from the current row (default is 1).
default: The value to return if the offset goes before the start of the
result set (default is NULL).
Example Query 1:
Retrieve the previous sale price for each order:
Output Table:
Explanation:
LAG(sale_price) retrieves the sale_price from the previous row, ordered by
order_date.
Example Query 2:
Retrieve the sale price of the row two steps back:
Output Table:
Explanation:
LAG(sale_price, 2) retrieves the sale_price two rows back.
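A matching sketch for LAG(), using the same hypothetical data. The price_change column is an addition for illustration: it shows the typical use of comparing each row with the one before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 120.0),
])

rows = conn.execute("""
    SELECT order_id, sale_price,
           LAG(sale_price)    OVER (ORDER BY order_date) AS prev_price,
           LAG(sale_price, 2) OVER (ORDER BY order_date) AS price_two_back,
           sale_price - LAG(sale_price) OVER (ORDER BY order_date) AS price_change
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, None, None, None)
# (2, 200.0, 100.0, None, 100.0)
# (3, 150.0, 200.0, 100.0, -50.0)
# (4, 120.0, 150.0, 200.0, -30.0)
```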
3. FIRST_VALUE()
The FIRST_VALUE() function returns the first value in the window frame, as
determined by the ORDER BY clause.
Syntax:
Example Query:
Output Table:
Explanation:
FIRST_VALUE(sale_price) returns the first sale_price, as determined by the
ORDER BY clause.
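Since the original query is not shown, here is a minimal sketch of FIRST_VALUE() on hypothetical data. Every row reports the sale_price of the earliest order, because the frame always starts at the first row of the ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 120.0),
])

rows = conn.execute("""
    SELECT order_id, sale_price,
           FIRST_VALUE(sale_price) OVER (ORDER BY order_date) AS first_sale_price
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, 100.0)
# (2, 200.0, 100.0)
# (3, 150.0, 100.0)
# (4, 120.0, 100.0)
```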
4. NTH_VALUE()
The NTH_VALUE() function returns the value of a specific row within the
window frame, where N is the row number.
Syntax:
Example Query:
Output Table:
Explanation:
NTH_VALUE(sale_price, 2) returns the second sale_price in the result set.
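One detail worth checking with NTH_VALUE() is the frame. In the sketch below (hypothetical data, SQLite via Python's sqlite3), the default frame ends at the current row, so the first row has no second value yet and gets NULL; extending the frame to the whole partition returns the second sale_price for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 120.0),
])

rows = conn.execute("""
    SELECT order_id, sale_price,
           NTH_VALUE(sale_price, 2) OVER (ORDER BY order_date) AS second_default_frame,
           NTH_VALUE(sale_price, 2) OVER (
               ORDER BY order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS second_full_frame
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, None, 200.0)
# (2, 200.0, 200.0, 200.0)
# (3, 150.0, 200.0, 200.0)
# (4, 120.0, 200.0, 200.0)
```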
5. LAST_VALUE()
The LAST_VALUE() function returns the last value in the window frame, as
determined by the ORDER BY clause.
Syntax:
Example Query:
Output Table:
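Why This Output is Incorrect:
In SQL, if you do not explicitly specify a frame and the window has an
ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW. This means that the window function considers all rows from the
start of the partition up to the current row.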
Issue with LAST_VALUE() Function: Because the default frame does not
extend beyond the current row, LAST_VALUE() will return the value of
the CURRENT ROW if the frame is not specified correctly. To get the
actual last value of the window frame, you need to explicitly define the
frame to include all rows from the current row to the end of the
partition.
To retrieve the last value of the partition or window frame, you should use
ROWS to define the frame explicitly. For example, to get the last value from
the current row to the end of the partition, you can use:
Corrected Query:
Output Table with Correct Frame Specification
Explanation:
By specifying ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING, you ensure that the LAST_VALUE() function considers all rows
from the current row to the end of the partition. This way, the
last_sale_price will accurately reflect the last sale_price value within the
partition.
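The default-frame pitfall and its fix can be reproduced in a short sketch. The data is hypothetical; the two columns differ only in the frame clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 120.0),
])

rows = conn.execute("""
    SELECT order_id, sale_price,
           -- Default frame: ends at the current row, so this
           -- simply echoes the current row's sale_price.
           LAST_VALUE(sale_price) OVER (ORDER BY order_date) AS last_default_frame,
           -- Explicit frame: runs to the end of the partition.
           LAST_VALUE(sale_price) OVER (
               ORDER BY order_date
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
           ) AS last_sale_price
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# (1, 100.0, 100.0, 120.0)
# (2, 200.0, 200.0, 120.0)
# (3, 150.0, 150.0, 120.0)
# (4, 120.0, 120.0, 120.0)
```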
PARTITION BY: Divides the result set into partitions to which the
window function is applied independently.
ORDER BY: Determines the order of rows within each partition.
ROWS and RANGE: Define the window frame for the function.
1. ROWS Clause
The ROWS clause specifies a range of rows relative to the current row,
counted by physical row position.
Syntax:
ROWS BETWEEN <start> AND <end>
Example:
Compute the average sale_price for the current row and the two
preceding rows.
Output Table:
Explanation:
For the first row, the window includes just the current row (since there
are no preceding rows).
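A three-row moving average of this kind can be sketched as follows. The prices are hypothetical and chosen so that the averages come out as round numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders, one per day.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-04", 250.0),
])

# The frame is the current row plus at most two rows before it.
rows = conn.execute("""
    SELECT order_date, sale_price,
           AVG(sale_price) OVER (
               ORDER BY order_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 100.0, 100.0)   window: 100
# ('2024-01-02', 200.0, 150.0)   window: 100, 200
# ('2024-01-03', 150.0, 150.0)   window: 100, 200, 150
# ('2024-01-04', 250.0, 200.0)   window: 200, 150, 250
```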
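2. RANGE Clause
The RANGE clause specifies a range of values relative to the current row’s
value, rather than row numbers. This is used with ordered columns and is
useful for value-based intervals such as dates.
Syntax:
RANGE BETWEEN <start> AND <end>
Example:
Compute the total sale_price for the current row and rows with order
dates up to 1 day before the current row's order_date.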
Explanation:
For the first row, there are no rows within the specified range (1 day
preceding), so the total is the same as the sale_price.
For the second row, the window includes both the current row and the
previous row.
For the third row, the window includes the current row and rows within
the 1-day range before it.
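The RANGE behaviour can be sketched in SQLite as well, with two assumptions stated up front: SQLite (3.28 or later) only accepts numeric RANGE offsets, so the text dates are converted with julianday(), and the data is hypothetical, with a deliberate one-day gap before the last order. In PostgreSQL the same frame would usually be written with an interval offset on a date column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_date TEXT, sale_price REAL)")
# Hypothetical orders; note that 2024-01-04 is missing.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2024-01-01", 100.0),
    (2, "2024-01-02", 200.0),
    (3, "2024-01-03", 150.0),
    (4, "2024-01-05", 250.0),
])

# The frame covers rows whose date is between (current date - 1 day)
# and the current date, however many rows that turns out to be.
rows = conn.execute("""
    SELECT order_date, sale_price,
           SUM(sale_price) OVER (
               ORDER BY julianday(order_date)
               RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS total_last_2_days
    FROM sales
    ORDER BY order_date
""").fetchall()

for row in rows:
    print(row)
# ('2024-01-01', 100.0, 100.0)
# ('2024-01-02', 200.0, 300.0)
# ('2024-01-03', 150.0, 350.0)
# ('2024-01-05', 250.0, 250.0)   no order on 2024-01-04, so only this row
```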
ROWS: Use ROWS when you need a fixed number of rows relative to
the current row. It’s ideal for operations that are sensitive to the actual
row count, such as moving averages or sum calculations.
RANGE: Use RANGE when you need to consider a range of values rather
than a fixed number of rows. It’s useful for operations that are sensitive
to the value ranges, such as calculating sums or averages over a time
period.
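The difference between the two frame types is easiest to see when the ordering column contains duplicates. In this sketch, the order_day column and its values are hypothetical; the ROWS window adds order_id as a tiebreaker so that its row order is deterministic, while the RANGE window treats both day-2 rows as peers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, order_day INTEGER, quantity INTEGER)")
# Hypothetical data: two orders share order_day 2.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 2, 20), (3, 2, 30), (4, 3, 40)])

rows = conn.execute("""
    SELECT order_id, order_day, quantity,
           -- ROWS counts physical rows, one at a time.
           SUM(quantity) OVER (
               ORDER BY order_day, order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_total,
           -- RANGE works on values: rows with the same order_day
           -- are included together.
           SUM(quantity) OVER (
               ORDER BY order_day
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_total
    FROM sales
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 10, 10, 10)
# (2, 2, 20, 30, 60)
# (3, 2, 30, 60, 60)
# (4, 3, 40, 100, 100)
```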