Window Functions
Most SQL users are familiar with regular aggregate functions, which perform calculations
over the rows of a table and work with a GROUP BY clause. Far fewer users take advantage
of window functions, which also operate on a group of rows but return a value for every
row instead of collapsing the group into one. This article discusses the window
functions in SQL Server in detail.
The pictorial representation below illustrates the difference between an aggregate
function and a window function in SQL Server:
Syntax
The following is the basic syntax for using a window function:
window_function_name([ALL] expression)
OVER (
    [partition_definition]
    [order_definition]
)
Parameter Explanation
ALL: It is an optional keyword, and the default, that makes the function consider all
values, including duplicates. We cannot use the DISTINCT keyword in window functions.
OVER: It defines the window over which the function is evaluated. It mainly contains two
clauses, PARTITION BY and ORDER BY, and it always has an opening and closing
parenthesis, even when neither clause is present.
PARTITION BY: This clause divides the rows into partitions, and the window function is
then applied to each partition separately. We list the partitioning columns after the
PARTITION BY keyword, separating multiple columns with commas. When this clause is not
specified, SQL Server treats the entire result set as a single partition and aggregates
the values accordingly.
ORDER BY: It specifies the logical order of rows within each partition. Ranking functions
such as RANK() and ROW_NUMBER() require it. When it is omitted for an aggregate function,
the aggregate is computed over the whole partition. When it is present, the aggregate
defaults to a running calculation from the first row of the partition up to the current
row (including any rows that tie with the current row in the ordering).
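The effect of each part of the OVER clause can be sketched with a tiny example. The snippet below is only an illustration: it runs the SQL through Python's built-in sqlite3 module as a stand-in for SQL Server (assuming SQLite 3.25 or later, which supports window functions), and the table demo with its columns grp, seq, and amount is hypothetical.

```python
import sqlite3

# Hypothetical demo table: grp is the partition key, seq defines the order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (grp TEXT, seq INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [("A", 1, 10), ("A", 2, 20), ("B", 1, 5), ("B", 2, 15), ("B", 3, 30)],
)

query = """
SELECT grp, seq, amount,
       SUM(amount) OVER ()                              AS overall_total,
       SUM(amount) OVER (PARTITION BY grp)              AS group_total,
       SUM(amount) OVER (PARTITION BY grp ORDER BY seq) AS running_total
FROM demo
ORDER BY grp, seq
"""

# Every input row is returned once, with three differently scoped sums.
over_rows = conn.execute(query).fetchall()
for row in over_rows:
    print(row)
```

An empty OVER () sums the whole table, PARTITION BY restricts the sum to the row's group, and adding ORDER BY turns the sum into a running total within the group.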
Example
Let us understand the concept of window function through an example. First, we will
create a table named "product_sales" using the following statement:
CREATE TABLE Product_Sales(
    Emp_Name VARCHAR(45) NOT NULL,
    Year INT NOT NULL,
    Country VARCHAR(45) NOT NULL,
    Prod_name VARCHAR(45) NOT NULL,
    Sales DECIMAL(12,2) NOT NULL,
    PRIMARY KEY(Emp_Name, Year)
);
Next, we will fill records into this table using the INSERT statement as below:
INSERT INTO Product_Sales(Emp_Name, Year, Country, Prod_name, Sales)
VALUES('Mike Johnson', 2017, 'Britain', 'Laptop', 10000),
('Mike Johnson', 2018, 'Britain', 'Laptop', 15000),
('Mike Johnson', 2019, 'Britain', 'TV', 20000),
('Mary Greenspan', 2017, 'Australia', 'Computer', 15000),
('Mary Greenspan', 2018, 'Australia', 'Computer', 10000),
('Mary Greenspan', 2019, 'Australia', 'TV', 20000),
('Nancy Jackson', 2017, 'Canada', 'Mobile', 20000),
('Nancy Jackson', 2018, 'Canada', 'Calculator', 1500),
('Nancy Jackson', 2019, 'Canada', 'Mobile', 25000);
We can verify the inserted records using the SELECT statement. We will see the below
output:
Now we will demonstrate the window functions using this table.
SUM()
It is an aggregate function that adds up the values of the specified column for each
group, or for the entire table when no group is specified. Here we will examine this
function both as a regular aggregate function and as a window aggregate function.
The below statement shows the regular aggregate function that adds up the sales
amount for each country:
SELECT Country, SUM(Sales) AS total_amount
FROM Product_Sales GROUP BY Country;
Executing the statement, we see that this function groups multiple rows into a single
output row. It causes individual rows to lose their identity.
The below statement explains the window aggregate function that maintains the row
identity. It also displays the aggregated value for each row.
SELECT Emp_Name, Year, Country, Prod_name, Sales, SUM(Sales)
OVER(PARTITION BY Country) AS grand_total
FROM Product_Sales;
Executing the query returns one output row for every input row. It aggregates the data
for each country and adds a column named grand_total that holds the total sales of the
row's country, so each row retains its identity.
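The difference between the two forms of SUM() can be checked with a short script. This is a sketch rather than the article's own setup: it loads the same nine rows into an in-memory SQLite database through Python's sqlite3 module (assuming SQLite 3.25 or later) instead of SQL Server, and stores Sales as an integer for simplicity.

```python
import sqlite3

# The nine rows from the INSERT statement in this article.
SALES = [
    ("Mike Johnson", 2017, "Britain", "Laptop", 10000),
    ("Mike Johnson", 2018, "Britain", "Laptop", 15000),
    ("Mike Johnson", 2019, "Britain", "TV", 20000),
    ("Mary Greenspan", 2017, "Australia", "Computer", 15000),
    ("Mary Greenspan", 2018, "Australia", "Computer", 10000),
    ("Mary Greenspan", 2019, "Australia", "TV", 20000),
    ("Nancy Jackson", 2017, "Canada", "Mobile", 20000),
    ("Nancy Jackson", 2018, "Canada", "Calculator", 1500),
    ("Nancy Jackson", 2019, "Canada", "Mobile", 25000),
]

conn = sqlite3.connect(":memory:")
# Assumption: Sales is stored as INTEGER here instead of DECIMAL(12,2).
conn.execute(
    "CREATE TABLE Product_Sales (Emp_Name TEXT, Year INTEGER, Country TEXT,"
    " Prod_name TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Product_Sales VALUES (?, ?, ?, ?, ?)", SALES)

# Regular aggregate: one row per country.
grouped = conn.execute(
    "SELECT Country, SUM(Sales) FROM Product_Sales GROUP BY Country"
).fetchall()

# Window aggregate: one row per input row, each carrying its country's total.
windowed = conn.execute(
    "SELECT Emp_Name, Year, Country, Sales,"
    " SUM(Sales) OVER (PARTITION BY Country) AS grand_total"
    " FROM Product_Sales"
).fetchall()

print(len(grouped), "grouped rows;", len(windowed), "windowed rows")
```

The grouped query collapses the table to three rows, while the windowed query keeps all nine and repeats the country total on each of them.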
AVG()
This function returns the average value of the specified column. It works in exactly the
same way with a window function.
The below example produces the average sales for each country. The partition list can
hold several comma-separated columns, but in this table every combination of Country and
Year contains a single row, so partitioning by both would simply return each row's own
Sales value. Note also that Year is an INT column, so it must not be wrapped in the
YEAR() date function.
SELECT Emp_Name, Year, Country, Prod_name, Sales, AVG(Sales)
OVER(PARTITION BY Country) AS avg_sales_amount
FROM Product_Sales;
Executing the statement returns the average next to every row. For Australia, for
example, the average sale amount is 15000.
MIN()
This function returns the minimum value for a specified group. When we have not
defined the group, it will return the minimum value for the entire table.
The below example will return the smallest sales amount for each country:
SELECT Emp_Name, Year, Country, Prod_name, Sales, MIN(Sales)
OVER(PARTITION BY Country) AS minimum_sales_amount
FROM Product_Sales;
Executing the query will produce the below output where we can see the minimum sales
amount for each country:
MAX()
This function returns the maximum value for a specified group. When we have not
defined the group, it will return the maximum value for the entire table.
The below example will return the highest sales amount for each country:
SELECT Emp_Name, Year, Country, Prod_name, Sales, MAX(Sales)
OVER(PARTITION BY Country) AS maximum_sales_amount
FROM Product_Sales;
Executing the query will produce the below output where we can see the highest sales
amount for each country:
COUNT()
The COUNT function returns the number of rows or records present in the table or
group. A regular aggregate can use the DISTINCT keyword to avoid counting duplicate
values, but the window COUNT function does not support this keyword. If we use it with
the OVER clause, SQL Server throws an error.
Suppose we want to see how many employees made sales in each country. We cannot simply
count the rows, because the same employee appears once for every year.
For example, the regular aggregate returns the number of distinct employees per country:
SELECT Country, COUNT(DISTINCT Emp_name) AS number_of_employees
FROM Product_Sales
GROUP BY Country;
The same query written as a window function fails, because DISTINCT is not allowed with
the OVER clause:
SELECT Country, COUNT(DISTINCT Emp_name)
OVER(PARTITION BY Country) AS number_of_employees
FROM Product_Sales;
Without DISTINCT, the window COUNT function works as expected. The below statement
counts the products sold in each country while keeping every row:
SELECT Emp_Name, Year, Country, Prod_name, Sales, COUNT(Prod_name)
OVER(PARTITION BY Country) AS total_product
FROM Product_Sales;
RANK()
It is used to assign a rank to each row in a table based on the specified value. Ranks
are not necessarily unique: if two records have the same value, the function assigns the
same rank to both and skips the ranks that follow. For example, if two records tie at
rank 2, both receive rank 2, rank 3 is skipped, and the next record receives rank 4.
The below query ranks the rows of the rank_demo table based on the city:
SELECT first_name, last_name, city,
RANK() OVER (ORDER BY city) AS Rank_No
FROM rank_demo;
This query returns an output in which the same rank (2) is assigned to two records with
equal city names. The next rank is the previous rank plus the number of tied records,
i.e. 4.
DENSE_RANK()
It works the same as the RANK() function except that it does not skip any rank. It always
assigns ranks in consecutive order. When two records are found equal, this function
assigns the same rank to both records, and the next record receives the next sequential
number.
The below query uses this function to assign a rank number to each row based on the
city:
SELECT first_name, last_name, city,
DENSE_RANK() OVER (ORDER BY city) AS Rank_No
FROM rank_demo;
This query returns an output in which the duplicate values have the same rank, and the
next rank is given to the next record without skipping a rank value.
ROW_NUMBER()
It is used to assign a unique sequential number to each record within the partition. It
always starts with one and increases by one until the last record in the partition is
reached. The numbering resets to one when the function moves on to the next partition.
The below query assigns a number to each row based on the city:
SELECT first_name, last_name, city,
ROW_NUMBER() OVER (ORDER BY city) AS Rank_No
FROM rank_demo;
With a PARTITION BY clause, the numbering restarts for every city:
SELECT first_name, last_name, city,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY first_name) AS Rank_No
FROM rank_demo;
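The three ranking functions are easiest to tell apart side by side. The sketch below assumes a rank_demo table with first_name, last_name, and city columns, as the queries above imply; the five rows are made up for illustration, and the SQL runs on SQLite (3.25 or later) through Python's sqlite3 module rather than on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rank_demo (first_name TEXT, last_name TEXT, city TEXT)")
# Hypothetical sample rows; two of them share the city 'California'.
conn.executemany(
    "INSERT INTO rank_demo VALUES (?, ?, ?)",
    [
        ("Paul", "Ward", "Alaska"),
        ("Diego", "Cox", "California"),
        ("Peter", "Bennett", "California"),
        ("Rose", "Hughes", "Florida"),
        ("Carlos", "Patterson", "New York"),
    ],
)

query = """
SELECT first_name, city,
       RANK()       OVER (ORDER BY city)             AS rank_no,
       DENSE_RANK() OVER (ORDER BY city)             AS dense_rank_no,
       ROW_NUMBER() OVER (ORDER BY city, first_name) AS row_no
FROM rank_demo
ORDER BY city, first_name
"""

ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Both California rows share rank 2. RANK() then jumps to 4 for Florida, DENSE_RANK() continues with 3, and ROW_NUMBER() never repeats a number. The extra first_name key in the ROW_NUMBER() ordering is there only to make the numbering of tied rows deterministic.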
NTILE()
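This function distributes the rows of an ordered partition into the specified number of
groups, which are as equal in size as possible, and returns the group number of each
row. When the rows cannot be divided evenly, the earlier groups receive one extra row.
The following statement divides the table into three groups based on the city column:
SELECT first_name, last_name, city,
NTILE(3) OVER (ORDER BY city) AS Rank_No
FROM rank_demo;
Executing the statement returns every row together with the number (1, 2, or 3) of the
group it falls into.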
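How NTILE() handles a row count that is not a multiple of the group count can be seen in a small sketch. The eight rank_demo rows below are hypothetical, the table is reduced to two columns, and SQLite (3.25 or later, via Python's sqlite3) stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rank_demo (first_name TEXT, city TEXT)")
# Hypothetical sample rows: eight rows cannot be split evenly into three groups.
conn.executemany(
    "INSERT INTO rank_demo VALUES (?, ?)",
    [
        ("Paul", "Alaska"), ("Diego", "California"), ("Peter", "California"),
        ("Rose", "Florida"), ("Antonio", "New York"), ("Carlos", "New York"),
        ("Luisa", "Texas"), ("Marielia", "Texas"),
    ],
)

# first_name is added to the ordering so tied cities are assigned predictably.
query = """
SELECT first_name, city,
       NTILE(3) OVER (ORDER BY city, first_name) AS group_no
FROM rank_demo
ORDER BY city, first_name
"""

tiles = conn.execute(query).fetchall()
for row in tiles:
    print(row)
```

With eight rows and three groups, the first two groups hold three rows each and the last group holds two.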
PERCENT_RANK()
This function evaluates a percentile rank (relative rank) for rows within a partition of a
result set. It returns a value between 0 and 1. NULL values are treated as the lowest
possible values.
This function evaluates the rank with the help of the below formula for each record:
(rank - 1) / (total_rows - 1)
Here, rank is the number that the RANK() function returns for the row, and total_rows
is the total number of rows in the partition.
The following example calculates the percentile rank of each row within its year, ordered
by country name:
SELECT Year, Prod_name, Country, Sales,
PERCENT_RANK() OVER(PARTITION BY Year ORDER BY Country) AS my_rank
FROM Product_Sales;
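The formula can be verified by computing it by hand next to the built-in function. The following sketch does this on the article's sample rows, again using SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server; the column name by_formula is made up for the illustration.

```python
import sqlite3

SALES = [
    ("Mike Johnson", 2017, "Britain", "Laptop", 10000),
    ("Mike Johnson", 2018, "Britain", "Laptop", 15000),
    ("Mike Johnson", 2019, "Britain", "TV", 20000),
    ("Mary Greenspan", 2017, "Australia", "Computer", 15000),
    ("Mary Greenspan", 2018, "Australia", "Computer", 10000),
    ("Mary Greenspan", 2019, "Australia", "TV", 20000),
    ("Nancy Jackson", 2017, "Canada", "Mobile", 20000),
    ("Nancy Jackson", 2018, "Canada", "Calculator", 1500),
    ("Nancy Jackson", 2019, "Canada", "Mobile", 25000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_Sales (Emp_Name TEXT, Year INTEGER, Country TEXT,"
    " Prod_name TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Product_Sales VALUES (?, ?, ?, ?, ?)", SALES)

# by_formula recomputes (rank - 1) / (total_rows - 1); the * 1.0 forces
# floating-point division. Every partition here has three rows, so the
# denominator is never zero.
query = """
SELECT Year, Country,
       PERCENT_RANK() OVER (PARTITION BY Year ORDER BY Country) AS my_rank,
       (RANK() OVER (PARTITION BY Year ORDER BY Country) - 1) * 1.0
           / (COUNT(*) OVER (PARTITION BY Year) - 1)            AS by_formula
FROM Product_Sales
ORDER BY Year, Country
"""

pct_rows = conn.execute(query).fetchall()
for row in pct_rows:
    print(row)
```

Each year contains three countries, so the ranks 1, 2, and 3 map to 0, 0.5, and 1, and the hand-computed column matches PERCENT_RANK() on every row.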
LEAD() and LAG()
The LEAD and LAG functions are used to get the succeeding and preceding values,
respectively, relative to the current row within its partition.
The following example returns the sales and next sales detail for each row. It splits
the result set by year and sorts each partition by country:
SELECT Year, Prod_name, Country, Sales,
LEAD(Sales, 1) OVER (PARTITION BY Year ORDER BY Country) AS Next_Sale
FROM Product_Sales;
The following example returns the sales and previous sales detail for each row. It first
splits the result set based on the year and then sorts each partition using the country
column. After that, it applies the LAG() function to each partition to get the previous
sales detail.
SELECT Year, Prod_name, Country, Sales,
LAG(Sales, 1) OVER (PARTITION BY Year ORDER BY Country) AS Previous_Sale
FROM Product_Sales;
Executing the statement shows each row's sales next to the previous row's sales within
the same year. The first row of each partition has no previous row, so its Previous_Sale
is NULL.
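Both functions can be run side by side to see how they behave at the partition boundaries. This sketch loads the article's sample rows into SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server. The optional third argument, which supplies a default instead of NULL, and the column name previous_or_zero are additions for illustration.

```python
import sqlite3

SALES = [
    ("Mike Johnson", 2017, "Britain", "Laptop", 10000),
    ("Mike Johnson", 2018, "Britain", "Laptop", 15000),
    ("Mike Johnson", 2019, "Britain", "TV", 20000),
    ("Mary Greenspan", 2017, "Australia", "Computer", 15000),
    ("Mary Greenspan", 2018, "Australia", "Computer", 10000),
    ("Mary Greenspan", 2019, "Australia", "TV", 20000),
    ("Nancy Jackson", 2017, "Canada", "Mobile", 20000),
    ("Nancy Jackson", 2018, "Canada", "Calculator", 1500),
    ("Nancy Jackson", 2019, "Canada", "Mobile", 25000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_Sales (Emp_Name TEXT, Year INTEGER, Country TEXT,"
    " Prod_name TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Product_Sales VALUES (?, ?, ?, ?, ?)", SALES)

query = """
SELECT Year, Country, Sales,
       LAG(Sales, 1)    OVER (PARTITION BY Year ORDER BY Country) AS previous_sale,
       LEAD(Sales, 1)   OVER (PARTITION BY Year ORDER BY Country) AS next_sale,
       LAG(Sales, 1, 0) OVER (PARTITION BY Year ORDER BY Country) AS previous_or_zero
FROM Product_Sales
ORDER BY Year, Country
"""

# NULL values come back from sqlite3 as Python None.
lead_lag = conn.execute(query).fetchall()
for row in lead_lag:
    print(row)
```

Within each year, the first country has no previous sale and the last country has no next sale. Passing a third argument replaces the NULL with a chosen default.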
FIRST_VALUE() and LAST_VALUE()
These functions are used to find the first and last value in the table, or in each
partition if the PARTITION BY clause is specified. Both functions require the ORDER BY
clause. Because the default window frame ends at the current row when ORDER BY is
present, LAST_VALUE() needs an explicit frame that extends to the end of the partition.
Let us see how these functions work in SQL Server through a practical example.
The following example finds the first and last sales of each country, ordered by
year:
SELECT Year, Prod_name, Country, Sales,
FIRST_VALUE(Sales) OVER(PARTITION BY Country ORDER BY Year) first_sale,
LAST_VALUE(Sales) OVER(PARTITION BY Country ORDER BY Year
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_sale
FROM Product_Sales;
Executing the query returns, for every row, the earliest and the latest sales amount of
its country.
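LAST_VALUE() is sensitive to the window frame, which is easy to overlook. The sketch below contrasts the default frame with an explicit full-partition frame on the article's sample rows. It orders each country's rows by Year, uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server, and the column name last_default_frame is made up for the comparison.

```python
import sqlite3

SALES = [
    ("Mike Johnson", 2017, "Britain", "Laptop", 10000),
    ("Mike Johnson", 2018, "Britain", "Laptop", 15000),
    ("Mike Johnson", 2019, "Britain", "TV", 20000),
    ("Mary Greenspan", 2017, "Australia", "Computer", 15000),
    ("Mary Greenspan", 2018, "Australia", "Computer", 10000),
    ("Mary Greenspan", 2019, "Australia", "TV", 20000),
    ("Nancy Jackson", 2017, "Canada", "Mobile", 20000),
    ("Nancy Jackson", 2018, "Canada", "Calculator", 1500),
    ("Nancy Jackson", 2019, "Canada", "Mobile", 25000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product_Sales (Emp_Name TEXT, Year INTEGER, Country TEXT,"
    " Prod_name TEXT, Sales INTEGER)"
)
conn.executemany("INSERT INTO Product_Sales VALUES (?, ?, ?, ?, ?)", SALES)

query = """
SELECT Country, Year, Sales,
       FIRST_VALUE(Sales) OVER (PARTITION BY Country ORDER BY Year) AS first_sale,
       LAST_VALUE(Sales)  OVER (PARTITION BY Country ORDER BY Year) AS last_default_frame,
       LAST_VALUE(Sales)  OVER (PARTITION BY Country ORDER BY Year
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING)   AS last_sale
FROM Product_Sales
ORDER BY Country, Year
"""

first_last = conn.execute(query).fetchall()
for row in first_last:
    print(row)
```

With the default frame, LAST_VALUE() simply echoes the current row's Sales, because the frame stops at the current row. Only the explicit frame returns the final sale of the partition, for example 25000 for every Canada row.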
Conclusion
This article explained the window functions available in SQL Server, which work on a set
of rows and return a value for every row instead of collapsing the rows into one.