SQL - 05 - Window Functions

The document discusses window functions in SQL, which allow calculations like ranking and averaging to be performed on partitions of rows rather than the entire table at once. It explains how various window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE() can be used to rank or bucket ordered partitions of rows. The document also discusses how aggregated window functions like AVG() can be used to compare row values to averages calculated within partitions.

Uploaded by

Arun Jith


SQL - 05 - Window Functions and Subqueries

Problem Statement:
You are a Data Analyst at the Food Corporation of India (FCI). You have been tasked to
study the Farmer’s market - Mandis.

Dataset: Farmer’s Market Database

Setting context - What does Group By do to the output? - Groups / Collapses rows.
● All the functions covered, like ROUND(), return one value in each row of the
results dataset.
● When GROUP BY is used, the functions operate on multiple values in an
aggregated group of records, summarizing across multiple rows in the
underlying dataset, like AVG(), but each value returned is associated with a
single row in the results.
● The output is always grouped.

What if we want to operate on multiple rows but don’t want the records to be
grouped in the output?

Let’s look at an example use case:

Question: Get the price of the most expensive item per vendor?
This is pretty simple:
1. Group records by vendor_id in the vendor_inventory table.
2. Return the MAX original_price value

SELECT
vendor_id,
MAX(original_price) AS highest_price
FROM farmers_market.vendor_inventory
GROUP BY vendor_id

This returns the highest price per vendor, but only one row per vendor: the item-level rows are collapsed.

What if the question changes:

New Question: Rank the products in each vendor’s inventory. More expensive products get a lower rank number (the most expensive is rank 1).
In this question:
- You don’t want to group the rows by vendor here, because we want to rank every product row within each vendor’s inventory.
- So, we need a technique to maintain the row-level information you would
otherwise lose by using Group By.

Window Functions

A window function gives you the ability to put the values from one row of data into
context compared to a group of rows, or partition.

We can answer questions like

● Where would this row land in the results if the dataset were sorted? - Rank
● How does a value in this row compare to the prior row? - Accessing preceding /
following row.
● How does a current row value compare to the group or partition(in window
function’s context) average value?

So, window functions return group aggregate calculations alongside individual row-level information for items in that group or partition.

So, in our question,

We need a function that allows us to rank rows by a value—in our case, ranking
products per vendor by price—called ROW_NUMBER().

Syntax : ROW_NUMBER() OVER (<partition_definition> <order_definition>)

SELECT
vendor_id,
market_date,
product_id,
original_price,
ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price
DESC) AS price_rank
FROM farmers_market.vendor_inventory

Syntax breakdown:

● I would interpret the ROW_NUMBER() line as “number the inventory rows per
vendor, sorted by original price, in descending order.”

● OVER() - tells the DBMS to apply the function over a window of rows.

● The part inside the parentheses says how to apply the ROW_NUMBER()
function.

● We’re going to PARTITION BY vendor_id (you can think of this like a GROUP
BY without actually combining the rows, so we’re telling it how to split the rows
into groups without aggregating).

● The ORDER BY indicates how to sort the rows. So, we’ll sort the rows by price,
high to low, within each vendor_id partition, and then number each row as per
their price.

● The highest-priced item per vendor will be assigned row number 1.

Output explanation:

● For each vendor, the products are sorted by original_price, high to low, and
the row numbering column is called price_rank.

● The row numbering restarts when you get to the next vendor_id, so the most
expensive item per vendor has a price_rank of 1.
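
To see the numbering restart per vendor, here is a minimal sketch that runs the same ROW_NUMBER() pattern on a handful of made-up rows, using Python's built-in sqlite3 module (SQLite supports window functions from version 3.25). The row values and the unqualified table name are assumptions for illustration, not the real Farmer's Market data.

```python
import sqlite3

# Toy stand-in for farmers_market.vendor_inventory (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_inventory (vendor_id INT, product_id INT, original_price INT)"
)
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [(7, 1, 4), (7, 2, 6), (7, 3, 5), (8, 4, 18), (8, 5, 12)],
)

rows = conn.execute(
    """
    SELECT vendor_id, product_id, original_price,
           ROW_NUMBER() OVER (PARTITION BY vendor_id
                              ORDER BY original_price DESC) AS price_rank
    FROM vendor_inventory
    ORDER BY vendor_id, price_rank
    """
).fetchall()

for row in rows:
    print(row)  # (vendor_id, product_id, original_price, price_rank)
```

Every input row is still present in the result; each one simply gains a price_rank that starts again at 1 for the next vendor_id.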

Subquery - You can also return the record of the highest-priced item per vendor by
querying the results of the query we’ve just written:

SELECT * FROM
(
SELECT
vendor_id,
market_date,
product_id,
original_price,
ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS price_rank
FROM farmers_market.vendor_inventory
) x
WHERE x.price_rank = 1
ORDER BY x.vendor_id

Query Breakdown

● You’ll notice that the preceding query has a different structure than the queries
we have written so far.

● The concept of subqueries comes again. There is one query embedded inside
the other! This is also called “querying from a derived table.”

● We’re treating the results of the “inner” SELECT statement like a table, here
given the table alias x, selecting all columns from it, and filtering to only the rows
with a particular ROW_NUMBER.

● Our ROW_NUMBER column is aliased price_rank, and we’re filtering to
price_rank = 1, because we numbered the rows by original_price in
descending order, so the most expensive item will have the lowest row number.

Why not put the WHERE clause in the main query itself? - Execution Order

● If we didn’t use a subquery and had attempted to filter based on the values in the
price_rank field by adding a WHERE clause to the first query with the
ROW_NUMBER function, we would get an error.

● The price_rank value is unknown at the time the WHERE clause conditions
are evaluated per row because the window functions have not yet had a chance
to check the entire dataset to determine the ranking.

● If we tried to put the ROW_NUMBER function in the WHERE clause, instead of
referencing the price_rank alias, we would get a different error, but for the same
reason.

[In the CHEATSHEET as well]: All query elements are processed in a very strict
order:

● FROM - the database gets the data from tables in FROM clause and if
necessary, performs the JOINs,
● WHERE - the data are filtered with conditions specified in the WHERE clause,
● GROUP BY - the data are grouped by the columns specified in the GROUP BY
clause,
● Aggregate functions - the aggregate functions are applied to the groups
created in the GROUP BY phase,
● HAVING - the groups are filtered with the given condition,
● Window functions,
● SELECT - the database selects the given columns,
● DISTINCT - repeated values are removed,
● UNION/INTERSECT/EXCEPT - the database applies set operations,
● ORDER BY - the results are sorted,
● OFFSET - the first rows are skipped,
● LIMIT/FETCH/TOP - only the first rows are selected
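
The effect of this processing order can be checked directly. The sketch below, again on made-up rows in an in-memory SQLite database (an assumption; the course itself uses MySQL, whose error message differs), first tries to filter on the window alias in the same query level and then applies the filter from an outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_inventory (vendor_id INT, product_id INT, original_price INT)"
)
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?)",
    [(7, 1, 4), (7, 2, 6), (8, 4, 18), (8, 5, 12)],
)

ranked = """
    SELECT vendor_id, product_id, original_price,
           ROW_NUMBER() OVER (PARTITION BY vendor_id
                              ORDER BY original_price DESC) AS price_rank
    FROM vendor_inventory
"""

# WHERE is evaluated before window functions, so filtering on price_rank
# at the same query level is expected to be rejected.
try:
    conn.execute(ranked + " WHERE price_rank = 1").fetchall()
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# In a derived table, price_rank is an ordinary column by the time
# the outer WHERE runs.
top_rows = conn.execute(
    "SELECT vendor_id, product_id, original_price FROM (" + ranked + ") x "
    "WHERE x.price_rank = 1 ORDER BY x.vendor_id"
).fetchall()
print(top_rows)
```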

Note : You can also use ROW_NUMBER without a PARTITION BY clause to number
every record across the whole result (instead of numbering per partition).

Go back to the output of the ROW_NUMBER query.

Transition -> The problem with the ROW_NUMBER() output is that rows with the same
value still get different numbers. What if you want the same rank to be assigned to
equal values?

RANK and DENSE_RANK

To return all products with the highest price per vendor when there is more than one
with the same price, use the RANK function

The RANK function numbers the results just like ROW_NUMBER does, but gives
rows with the same value the same ranking.

Replace ROW_NUMBER with RANK in the original query.

SELECT
vendor_id,
market_date,
product_id,
original_price,
RANK() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) AS
price_rank
FROM farmers_market.vendor_inventory
ORDER BY vendor_id, original_price DESC

Output Breakdown

● Notice that the ranking for vendor_id 1 goes from 1 to 2 to 4, skipping 3. That’s
because there’s a tie for second place, so there’s no third place.

● If you don’t want to skip numbers like this in your ranking when there is a tie, use
the DENSE_RANK function.

● And if you don’t want ties at all, use the ROW_NUMBER function.
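
A side-by-side comparison makes the difference concrete. This is a small sketch on four made-up prices with one tie, run through Python's sqlite3 module; the table layout is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor_inventory (product_id INT, original_price INT)")
# Products 2 and 3 are tied on price.
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?)",
    [(1, 10), (2, 8), (3, 8), (4, 5)],
)

rows = conn.execute(
    """
    SELECT original_price,
           ROW_NUMBER() OVER (ORDER BY original_price DESC) AS row_num,
           RANK()       OVER (ORDER BY original_price DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY original_price DESC) AS dense_rnk
    FROM vendor_inventory
    ORDER BY original_price DESC, row_num
    """
).fetchall()

for row in rows:
    print(row)  # (original_price, row_num, rnk, dense_rnk)
```

The tied prices get row numbers 2 and 3, a shared RANK of 2 followed by 4, and a shared DENSE_RANK of 2 followed by 3.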

The ROW_NUMBER() and RANK() functions can help answer a question that asks
something like
● “What are the top 10 items sold at the farmer’s market, by price?” (by filtering the
results to rows numbered less than or equal to 10).

Transition
But what if you were asked to return the “top tenth” of the inventory when sorted by
price?

● You could start by running a query that used the COUNT() function,
● dividing the number returned by 10,
● then writing another query that numbers the rows, and
● filtering to those with a row number less than or equal to the number you just
determined.

But that isn’t a dynamic solution; you’d have to modify it as the number of rows
in the database changed.

NTILE function

NTILE() is a window function that distributes the rows of an ordered
partition into a specified number of approximately equal groups, or buckets. It numbers
the buckets starting from one and assigns each row the number of the bucket it
belongs to.

The dynamic solution is to use the NTILE function.

SELECT
vendor_id,
market_date,
product_id,
original_price,
NTILE(10) OVER (ORDER BY original_price DESC) AS price_ntile
FROM farmers_market.vendor_inventory
ORDER BY original_price DESC

Important points about NTILE

● If the number of rows in the results set can be divided evenly, the results will be
broken up into n equally sized groups, labeled 1 to n.
● If they can’t be divided up evenly, some groups will end up with one more row
than others.
● Note that the NTILE is only using the count of rows to split the groups, and
is not using a field value to determine where to make the splits.
● Therefore, it’s possible that two rows with the same value specified in
ORDER BY clause will end up in two different NTILE groups.
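
The last two points are easy to verify on a small case. The sketch below splits seven made-up prices into three buckets with NTILE(3) in an in-memory SQLite database; the data and the choice of three buckets are assumptions chosen so that the row count does not divide evenly and a tie straddles a bucket boundary.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor_inventory (original_price INT)")
# Seven rows, with the price 50 appearing twice.
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?)",
    [(70,), (60,), (50,), (50,), (40,), (30,), (20,)],
)

rows = conn.execute(
    """
    SELECT original_price,
           NTILE(3) OVER (ORDER BY original_price DESC) AS price_ntile
    FROM vendor_inventory
    ORDER BY price_ntile, original_price DESC
    """
).fetchall()

# Number of rows that landed in each bucket.
bucket_sizes = Counter(ntile for _, ntile in rows)
print(rows)
print(dict(bucket_sizes))
```

Seven rows cannot be split evenly into three buckets, so the first bucket gets three rows and the others two each, and the two rows priced 50 end up in buckets 1 and 2.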

Aggregated Window Functions

Question: As a farmer, you want to figure out which of your products were above the average price per product on each market date?
Solution:
We can use the AVG() function as a window function, partitioned by market_date, and
compare each product’s price to that value.

First, let’s try using AVG() as a window function.

SELECT
vendor_id,
market_date,
product_id,
original_price,
AVG(original_price) OVER (PARTITION BY market_date) AS
average_cost_product_by_market_date
FROM farmers_market.vendor_inventory

Breakdown
● The AVG() function in this query is structured as a window function, meaning it
has “OVER (PARTITION BY __ ORDER BY __)” syntax, so instead of returning a
single row per group with the average for that group, like you would get with
GROUP BY, this function displays the average for each partition in every row
within the partition.
● When you get to a new market_date value in the results dataset, the
average_cost_product_by_market_date value changes.

Follow-up Question: Extract the farmer’s products with prices above the market date’s average product cost.

● Using a subquery, we can filter the results to a single vendor, with vendor_id 8,
and
● only display products that have prices above the market date’s average
product cost.
● Here we will also format the average_cost_product_by_market_date to two
digits after the decimal point using the ROUND() function:

SELECT * FROM
(
SELECT
vendor_id,
market_date,
product_id,
original_price,
ROUND(AVG(original_price) OVER (PARTITION BY market_date), 2) AS average_cost_product_by_market_date
FROM farmers_market.vendor_inventory
) x
WHERE x.vendor_id = 8
AND x.original_price > x.average_cost_product_by_market_date
ORDER BY x.market_date, x.original_price DESC
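
As a quick check of this pattern, the sketch below computes the per-date average over all vendors and then keeps only vendor 8's above-average products. It uses Python's sqlite3 module with invented rows and a shortened alias (avg_price), both assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_inventory "
    "(market_date TEXT, vendor_id INT, product_id INT, original_price INT)"
)
conn.executemany(
    "INSERT INTO vendor_inventory VALUES (?, ?, ?, ?)",
    [
        ("2019-07-06", 8, 1, 10),
        ("2019-07-06", 8, 2, 4),
        ("2019-07-06", 7, 3, 7),
        ("2019-07-13", 8, 1, 6),
        ("2019-07-13", 7, 3, 2),
    ],
)

rows = conn.execute(
    """
    SELECT market_date, product_id, original_price, avg_price
    FROM (
        SELECT market_date, vendor_id, product_id, original_price,
               ROUND(AVG(original_price) OVER (PARTITION BY market_date), 2)
                   AS avg_price
        FROM vendor_inventory
    ) x
    WHERE x.vendor_id = 8
      AND x.original_price > x.avg_price
    ORDER BY x.market_date, x.original_price DESC
    """
).fetchall()
print(rows)
```

The average is computed in the inner query, before the vendor filter, so it reflects every vendor's products on that date rather than only vendor 8's.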

Another Example
● Another use of an aggregate window function is to count how many items are in
each partition.

Question: Count how many different products each vendor brought to market on each date and display that count on each row.

The answer to this question would help you identify that the row you’re looking at
represents just one of the products in a counted set:

SELECT
vendor_id,
market_date,
product_id,
original_price,
COUNT(product_id) OVER (PARTITION BY market_date, vendor_id) AS
vendor_product_count_per_market_date
FROM farmers_market.vendor_inventory
ORDER BY vendor_id, market_date, original_price DESC

Output:
● You can see that even if I’m only looking at one row for vendor 7 on July 6, 2019,
I would know that it is one of 4 products that vendor had in their inventory on
that market date.

Question: Calculate the running total of the cost of items purchased by each customer, sorted by the date and time and the product_id

● We need to partition by customer_id

● Aggregate the total cost i.e. quantity of items multiplied by cost per qty.
● Order by market_date, transaction_time and product_id.

SELECT customer_id,
market_date,
vendor_id,
product_id,
quantity * cost_to_customer_per_qty AS price,
SUM(quantity * cost_to_customer_per_qty) OVER (PARTITION BY
customer_id ORDER BY market_date, transaction_time, product_id) AS
customer_spend_running_total
FROM farmers_market.customer_purchases

● This SUM functions as a running total because of the combination of the
PARTITION BY and ORDER BY clauses in the window function.

● We showed what happens when there is only an ORDER BY clause, and when both
clauses are present.

● What do you expect to happen when there is only a PARTITION BY clause (and
no ORDER BY clause)?

SELECT customer_id,
market_date,
vendor_id,
product_id,
ROUND(quantity * cost_to_customer_per_qty, 2) AS price,
ROUND(SUM(quantity * cost_to_customer_per_qty) OVER (PARTITION BY
customer_id), 2) AS customer_spend_total
FROM farmers_market.customer_purchases

● This version, with no in-partition sorting, calculates the total spent by the
customer and displays that summary total on every row.

● So, without the ORDER BY, the SUM is calculated across the entire partition
instead of as a per-row running total.

● We also added the ROUND() function so this final output displays the prices with
two numbers after the decimal point.
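
Both behaviours can be placed next to each other in one query. The following sketch uses a simplified, made-up purchases table (a single amount column instead of quantity times cost, which is an assumption to keep the numbers easy to follow) in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_purchases (customer_id INT, market_date TEXT, amount INT)"
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?)",
    [
        (1, "2019-07-06", 5),
        (1, "2019-07-13", 3),
        (1, "2019-07-20", 2),
        (2, "2019-07-06", 4),
        (2, "2019-07-13", 6),
    ],
)

rows = conn.execute(
    """
    SELECT customer_id, market_date, amount,
           -- PARTITION BY + ORDER BY: running total within each customer
           SUM(amount) OVER (PARTITION BY customer_id
                             ORDER BY market_date) AS running_total,
           -- PARTITION BY only: the customer's grand total on every row
           SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM customer_purchases
    ORDER BY customer_id, market_date
    """
).fetchall()

for row in rows:
    print(row)
```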

LAG and LEAD

Now we can see how SQL can calculate the changes over time.

Question: Using the vendor_booth_assignments table in the Farmer’s Market database, display each vendor’s booth assignment for each market_date alongside their previous booth assignments.

For this, we are going to use the LAG() function.

● LAG retrieves data from a row that is a selected number of rows back in the
dataset. You can set the number of rows (offset) to any integer value x to count x
rows backward, following the sort order specified in the ORDER BY section of
the window function.
● Partition by vendor_id.

Syntax of LEAD / LAG:

LEAD(expr, N, default)
OVER (Window_specification | Window_name)

LAG takes the same arguments.

Parameters used:

● expr: It can be a column or any built-in function.

● N: A non-negative integer that determines how many rows before (LAG) or after
(LEAD) the current row to look. If it is omitted, it defaults to 1.
● default: The value returned when no row exists N rows before/after the
current row. If it is omitted, it defaults to NULL.
● OVER(): It defines how rows are partitioned and ordered. If OVER() is empty, the
function computes its result using all rows.
● Window_specification: It consists of the partition clause and order clause, which
determine how the query rows are partitioned and ordered.
● Window_name: If a window is defined elsewhere in the query, it is referenced
using this Window_name.

SELECT
market_date,
vendor_id,
booth_number,
LAG(booth_number,1) OVER (PARTITION BY vendor_id ORDER BY
market_date, vendor_id) AS previous_booth_number
FROM farmers_market.vendor_booth_assignments
ORDER BY market_date, vendor_id, booth_number

The LAG() function gets a value from a row that precedes the current row.

Output:
● In this case, for each vendor_id for each market_date, we’re pulling the
booth_number the vendor had 1 market date in the past.
● The values are all NULL for the first market date because there is no prior market
date to pull values from.

Using this as a subquery

Question: The Market manager may want to filter these query results to a specific
market date to determine which vendors are new or changing booths that day, so
we can contact them and ensure setup goes smoothly.
Check it for date: 2019-04-10

Breakdown:
● We will create this report by wrapping the query with the LAG function in another
query,
● we can use this inner query results to filter the results to a market_date and
vendors whose current booth_number is different from their
previous_booth_number:

SELECT * FROM
(
SELECT
market_date,
vendor_id,
booth_number,
LAG(booth_number,1) OVER (PARTITION BY vendor_id ORDER BY market_date, vendor_id) AS previous_booth_number
FROM farmers_market.vendor_booth_assignments
) x
WHERE x.market_date = '2019-04-10'
AND (x.booth_number <> x.previous_booth_number OR x.previous_booth_number IS NULL)
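
To illustrate why the IS NULL branch matters, here is a small sketch with three invented vendors: one that changes booth, one that keeps its booth, and one that appears for the first time on 2019-04-10. It runs in an in-memory SQLite database through Python's sqlite3 module; all rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_booth_assignments "
    "(vendor_id INT, market_date TEXT, booth_number INT)"
)
conn.executemany(
    "INSERT INTO vendor_booth_assignments VALUES (?, ?, ?)",
    [
        (1, "2019-04-03", 2), (1, "2019-04-10", 7),  # changes booth
        (4, "2019-04-03", 5), (4, "2019-04-10", 5),  # keeps booth
        (9, "2019-04-10", 3),                        # new vendor
    ],
)

with_prev = """
    SELECT market_date, vendor_id, booth_number,
           LAG(booth_number, 1) OVER (PARTITION BY vendor_id
                                      ORDER BY market_date) AS previous_booth_number
    FROM vendor_booth_assignments
"""

changed = conn.execute(
    "SELECT vendor_id, booth_number, previous_booth_number FROM ("
    + with_prev
    + ") x WHERE x.market_date = '2019-04-10' "
    "AND (x.booth_number <> x.previous_booth_number "
    "     OR x.previous_booth_number IS NULL) "
    "ORDER BY x.vendor_id"
).fetchall()
print(changed)
```

Vendor 9 has no earlier row, so LAG returns NULL. A comparison with NULL is never true, which is why the new vendor is only caught by the explicit IS NULL check.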

Question: Let’s say you want to find out if the total sales on each
market date are higher or lower than they were on the previous
market date.

Breakdown (crux: this needs both GROUP BY, for the total sales per day, and a
window function, LAG):
● We are going to use the customer_purchases table from the Farmer’s Market
database, and also add a GROUP BY clause, which the previous examples
did not include.
● The window functions are calculated after the grouping and aggregation occur.
● First, we need to get the total sales per market date, using a GROUP BY and
regular aggregate SUM.

SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS market_date_total_sales
FROM farmers_market.customer_purchases
GROUP BY market_date

Then, we can add the LAG() window function to output the previous market_date’s
calculated sum on each row.

We ORDER BY market_date in the window function to ensure it’s the previous market
date we’re comparing to and not another date.

SELECT
market_date,
SUM(quantity * cost_to_customer_per_qty) AS market_date_total_sales,
LAG(SUM(quantity * cost_to_customer_per_qty), 1) OVER (ORDER BY
market_date) AS previous_market_date_total_sales
FROM farmers_market.customer_purchases
GROUP BY market_date

LEAD works the same way as LAG, but it gets the value from the next row instead of
the previous row (assuming the offset integer is 1). You can set the offset integer to any
value x to count x rows forward, following the sort order specified in the ORDER BY
section of the window function.
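
LAG and LEAD can be compared on the same grouped totals. The sketch below uses three made-up market dates in an in-memory SQLite database (an assumption; the notes target MySQL) and applies both functions on top of the GROUP BY sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_purchases "
    "(market_date TEXT, quantity INT, cost_to_customer_per_qty INT)"
)
conn.executemany(
    "INSERT INTO customer_purchases VALUES (?, ?, ?)",
    [
        ("2019-07-06", 2, 5), ("2019-07-06", 1, 10),  # total 20
        ("2019-07-13", 3, 5),                         # total 15
        ("2019-07-20", 5, 5),                         # total 25
    ],
)

rows = conn.execute(
    """
    SELECT market_date,
           SUM(quantity * cost_to_customer_per_qty) AS total_sales,
           LAG(SUM(quantity * cost_to_customer_per_qty), 1)
               OVER (ORDER BY market_date) AS previous_total,
           LEAD(SUM(quantity * cost_to_customer_per_qty), 1)
               OVER (ORDER BY market_date) AS next_total
    FROM customer_purchases
    GROUP BY market_date
    ORDER BY market_date
    """
).fetchall()

for row in rows:
    print(row)  # (market_date, total_sales, previous_total, next_total)
```

The first date has no previous total and the last date has no next total, so those cells are NULL (None in Python).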

Rolling Average - Window Frame

What if we want cumulative or running costs for each customer?

Q1 - Calculate the daily cumulative sales over the entire Sales table (SALES CSV).

SELECT employee,
sale,
date,
SUM(sale) OVER (ORDER BY date) AS cum_sales
FROM sales;

Breakdown of this query:

● This query is internally represented as: (ORDER BY date RANGE BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW). This is the default frame when a window
has an ORDER BY but no explicit frame clause.

● UNBOUNDED PRECEDING indicates that the window starts at the first row of
the partition. It is the default frame start.
● CURRENT ROW indicates the window begins or ends at the current row.
● UNBOUNDED FOLLOWING indicates that the window ends at the last row of
the partition.

The running total written with an explicit ROWS frame:

SELECT employee, sale, date,
SUM(sale) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS cum_sales
FROM sales;

Both ROWS and RANGE clauses in SQL limit the rows considered by the window
function within a partition.
The ROWS clause does that quite literally. It specifies a fixed number of
rows that precede or follow the current row regardless of their value. These
rows are used in the window function.

On the other hand, the RANGE clause logically limits the rows. That means it
considers the rows based on their value compared to the current row.
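
The difference only shows up when the ORDER BY column has duplicates. The sketch below runs both frame types on four made-up sales, two of which share a date, using Python's sqlite3 module. The column is named sale_date here instead of date, and all values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (employee TEXT, sale INT, sale_date TEXT)")
# Two sales fall on the same date.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", 10, "2023-01-01"),
        ("B", 20, "2023-01-02"),
        ("C", 20, "2023-01-02"),
        ("A", 40, "2023-01-03"),
    ],
)

rows = conn.execute(
    """
    SELECT sale_date, sale,
           -- default frame: RANGE UNBOUNDED PRECEDING .. CURRENT ROW
           SUM(sale) OVER (ORDER BY sale_date) AS cum_range,
           -- explicit physical frame
           SUM(sale) OVER (ORDER BY sale_date
                           ROWS UNBOUNDED PRECEDING) AS cum_rows
    FROM sales
    ORDER BY sale_date, cum_rows
    """
).fetchall()

for row in rows:
    print(row)  # (sale_date, sale, cum_range, cum_rows)
```

With RANGE, both rows dated 2023-01-02 are peers of each other and share the cumulative value 50. With ROWS, the total grows one physical row at a time (30, then 50).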

Q3 - Calculate the moving average on a window frame of 1 preceding and 1 following.

SELECT MONTH(date), SUM(sale),

AVG(SUM(sale)) OVER (ORDER BY MONTH(date)
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding_avg
FROM sales GROUP BY MONTH(date);
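
How will you calculate a 5-DAY moving average for stock prices?

The frame covers the current row plus the 4 preceding ones. The query below applies that frame to the monthly sales totals; for daily stock prices, order by the trading date instead of MONTH(date).

SELECT MONTH(date), SUM(sale),
AVG(SUM(sale)) OVER (ORDER BY MONTH(date)
RANGE 4 PRECEDING) AS sliding_avg
FROM sales GROUP BY MONTH(date);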
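
A numeric RANGE frame and a ROWS frame give different moving averages as soon as a period is missing. The sketch below assumes a pre-aggregated, made-up table of monthly totals with no row for month 3, and runs in an in-memory SQLite database (RANGE with a numeric offset needs SQLite 3.28 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INT, total INT)")
# Month 3 has no sales row at all.
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?)",
    [(1, 10), (2, 20), (4, 60)],
)

rows = conn.execute(
    """
    SELECT month,
           -- logical frame: months whose number is within +/- 1 of this one
           AVG(total) OVER (ORDER BY month
                            RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg_range,
           -- physical frame: the neighbouring rows, whatever their month
           AVG(total) OVER (ORDER BY month
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg_rows
    FROM monthly_sales
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)  # (month, avg_range, avg_rows)
```

For month 4, RANGE finds no month 3 or 5 and averages only the row itself, while ROWS pulls in month 2 as the preceding row.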

NTH_VALUE() - Window Frames

The NTH_VALUE() is a window function that allows you to get a value from the Nth row
in an ordered set of rows.

Syntax of the NTH_VALUE() function:

NTH_VALUE(expression, N)
FROM FIRST
OVER (
partition_clause
order_clause
frame_clause
)

● The NTH_VALUE() function returns the value of expression from the Nth row of
the window frame. If that Nth row does not exist, the function returns NULL. N
must be a positive integer e.g., 1, 2, and 3.
● The FROM FIRST instructs the NTH_VALUE() function to begin calculation at the
first row of the window frame.

https://dev.mysql.com/blog-archive/mysql-8-0-2-introducing-window-functions/

Let’s look at the Employee table.

Employee Table < for the instructor>:

CREATE TABLE farmers_market.employee(

employee_name VARCHAR(50) NOT NULL,
department VARCHAR(50) NOT NULL,
salary INT NOT NULL,
PRIMARY KEY (employee_name , department)
);
INSERT INTO
farmers_market.employee(employee_name,
department,
salary)
VALUES
('Diane Murphy','Accounting',8435),
('Mary Patterson','Accounting',9998),
('Jeff Firrelli','Accounting',8992),
('William Patterson','Accounting',8870),
('Gerard Bondur','Accounting',11472),
('Anthony Bow','Accounting',6627),
('Leslie Jennings','IT',8113),
('Leslie Thompson','IT',5186),
('Julie Firrelli','Sales',9181),
('Steve Patterson','Sales',9441),
('Foon Yue Tseng','Sales',6660),
('George Vanauf','Sales',10563),
('Loui Bondur','SCM',10449),
('Gerard Hernandez','SCM',6949),
('Pamela Castillo','SCM',11303),
('Larry Bott','SCM',11798),
('Barry Jones','SCM',10586);

Question: Find the employee with the second highest salary in each department.

SELECT
employee_name,
department,
salary,
NTH_VALUE(employee_name, 2) OVER (
PARTITION BY department
ORDER BY salary DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_highest_salary
FROM
farmers_market.employee;
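
The frame clause in this query is what makes the answer visible on every row. The sketch below loads the IT and Sales rows from the employee table above into an in-memory SQLite database (an assumption; SQLite is used only so the example runs anywhere) and compares the full-partition frame with the default frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, department TEXT, salary INT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Leslie Jennings", "IT", 8113),
        ("Leslie Thompson", "IT", 5186),
        ("Julie Firrelli", "Sales", 9181),
        ("Steve Patterson", "Sales", 9441),
        ("Foon Yue Tseng", "Sales", 6660),
        ("George Vanauf", "Sales", 10563),
    ],
)

# Frame spans the whole partition: every row sees the 2nd-ranked name.
second_per_dept = conn.execute(
    """
    SELECT DISTINCT department,
           NTH_VALUE(employee_name, 2) OVER (
               PARTITION BY department
               ORDER BY salary DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS second_highest
    FROM employee
    ORDER BY department
    """
).fetchall()

# Default frame ends at the current row, so the top earner's frame
# holds only one row and has no 2nd value yet.
default_frame = conn.execute(
    """
    SELECT employee_name,
           NTH_VALUE(employee_name, 2) OVER (
               PARTITION BY department ORDER BY salary DESC
           ) AS second_highest
    FROM employee
    WHERE department = 'IT'
    ORDER BY salary DESC
    """
).fetchall()

print(second_per_dept)
print(default_frame)
```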

[TODO Question] - use subqueries to find the 2nd highest or 3rd/5th highest item/value.

2nd highest price (the highest price that is below the overall maximum):

SELECT MAX(original_price) AS second_highest_price
FROM vendor_inventory
WHERE original_price < (SELECT MAX(original_price)
FROM vendor_inventory);

3rd highest price (the lowest of the top 3 distinct prices):

SELECT original_price
FROM
(SELECT DISTINCT original_price
FROM vendor_inventory
ORDER BY original_price DESC
LIMIT 3) AS Comp
ORDER BY original_price
LIMIT 1;
