
Window Functions SQL

Uploaded by

Divyansh Jain
Copyright
© All Rights Reserved

SQL-05: Window Functions

Monday, August 22, 2022 9:04 PM

With GROUP BY, each group collapses into a single output row, so we get only one value per group.
With window functions, every row is kept, so we can get a value for each row of a group (partition).
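A minimal sketch of the difference, run through Python's built-in sqlite3 module (assuming it links SQLite 3.25 or newer, the first version with window functions). The `sales` table and its rows are hypothetical:

```python
import sqlite3

# Hypothetical table: one row per sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (employee TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10), ("A", 20), ("B", 5)],
)

# GROUP BY: each employee collapses into a single output row.
grouped = conn.execute(
    "SELECT employee, SUM(amount) FROM sales GROUP BY employee ORDER BY employee"
).fetchall()

# Window function: every sale row is kept, and the employee's total is attached to it.
windowed = conn.execute(
    """
    SELECT employee, amount,
           SUM(amount) OVER (PARTITION BY employee) AS employee_total
    FROM sales
    ORDER BY employee, amount
    """
).fetchall()

print(grouped)   # [('A', 30), ('B', 5)]
print(windowed)  # [('A', 10, 30), ('A', 20, 30), ('B', 5, 5)]
```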

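ROW_NUMBER() with nothing inside OVER(): the whole result set is treated as one partition and its rows are numbered 1, 2, 3 and so on; without ORDER BY, which row gets which number is not guaranteed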
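A small sketch via Python's sqlite3 (SQLite 3.25+ assumed) with a hypothetical `vendors` table. Because OVER () has no ORDER BY, the database may hand out the numbers in any order, so the code only inspects which numbers come out:

```python
import sqlite3

# Hypothetical table with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_name TEXT)")
conn.executemany("INSERT INTO vendors VALUES (?)", [("x",), ("y",), ("z",)])

# Empty OVER (): the whole result set is one partition, numbered 1..n.
rows = conn.execute(
    "SELECT vendor_name, ROW_NUMBER() OVER () AS rn FROM vendors"
).fetchall()

# Which vendor gets which number is unspecified without ORDER BY,
# so only the set of numbers is checked.
numbers = sorted(rn for _, rn in rows)
print(numbers)  # [1, 2, 3]
```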

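• WHERE/HAVING can't be used to filter on the result of a window function, because window functions are computed after WHERE, GROUP BY and HAVING have already run

We need to treat that query as a table (a subquery) and then filter on the window column in the outer query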
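A sketch of the subquery pattern, assuming Python's sqlite3 with SQLite 3.25+; the `products` table and its columns are hypothetical. It keeps only the most expensive product per category by computing ROW_NUMBER() inside and filtering outside:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, product TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("fruit", "apple", 3.0), ("fruit", "mango", 5.0), ("veg", "kale", 4.0)],
)

# Not allowed: WHERE rn = 1 in the same query that computes rn, since WHERE
# runs before the window function. So compute rn in a subquery first.
most_expensive = conn.execute(
    """
    SELECT category, product, price
    FROM (
        SELECT category, product, price,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rn
        FROM products
    ) AS ranked
    WHERE rn = 1
    ORDER BY category
    """
).fetchall()

print(most_expensive)  # [('fruit', 'mango', 5.0), ('veg', 'kale', 4.0)]
```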

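• row_number() vs rank() vs dense_rank()
ROW_NUMBER() gives every row a distinct number, even when values tie
RANK() gives tied rows the same rank and then skips ranks (1, 2, 2, 4)
DENSE_RANK() gives tied rows the same rank without gaps (1, 2, 2, 3)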
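A side-by-side sketch on a hypothetical `scores` table with one tie, assuming Python's sqlite3 with SQLite 3.25+. `name` is added as a tie-breaker only for ROW_NUMBER() so its output is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 80), ("c", 80), ("d", 70)],  # b and c tie
)

rows = conn.execute(
    """
    SELECT name, score,
           -- name breaks the tie so the row numbers are deterministic
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS rn,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS drnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 90, 1, 1, 1)
# ('b', 80, 2, 2, 2)
# ('c', 80, 3, 2, 2)
# ('d', 70, 4, 4, 3)
```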


rank() assigns ranks according to the column(s) after ORDER BY inside OVER()

If there's no ORDER BY, every row in the partition is a peer of every other, so all of them get the same rank (1)
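A sketch comparing RANK() with and without ORDER BY, on a hypothetical `emp` table and assuming Python's sqlite3 with SQLite 3.25+:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("x", 100), ("x", 200), ("y", 300)],
)

# With ORDER BY: ranks follow salary within each department.
with_order = conn.execute(
    """
    SELECT dept, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
    FROM emp
    ORDER BY dept, salary DESC
    """
).fetchall()

# Without ORDER BY: all rows of a partition are peers, so each gets rank 1.
without_order = conn.execute(
    """
    SELECT dept, salary,
           RANK() OVER (PARTITION BY dept) AS rnk
    FROM emp
    ORDER BY dept, salary DESC
    """
).fetchall()

print(with_order)     # [('x', 200, 1), ('x', 100, 2), ('y', 300, 1)]
print(without_order)  # [('x', 200, 1), ('x', 100, 1), ('y', 300, 1)]
```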

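NTILE(k) splits the ordered rows of each partition into k buckets of as equal a size as possible and numbers the buckets 1 to k; when the rows don't divide evenly, the earlier buckets get one extra row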
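A sketch of NTILE(3) over 7 rows, assuming Python's sqlite3 with SQLite 3.25+ and a hypothetical single-column table `t` (one partition; with PARTITION BY the same split happens inside each partition):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 8)])  # 7 rows

# 7 rows into 3 buckets: sizes 3, 2, 2 (the larger bucket comes first).
buckets = conn.execute(
    "SELECT v, NTILE(3) OVER (ORDER BY v) AS bucket FROM t ORDER BY v"
).fetchall()

print(buckets)  # [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]
```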
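With ORDER BY inside OVER(), an aggregate such as AVG() automatically becomes a running average, because the default frame then runs from the first row of the partition up to the current row (including rows that tie with it)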
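A sketch of the running average, assuming Python's sqlite3 with SQLite 3.25+ and a hypothetical `daily` table. The second query spells out the default frame to show that it produces the same result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, value REAL)")
conn.executemany("INSERT INTO daily VALUES (?, ?)", [(1, 10), (2, 20), (3, 60)])

# ORDER BY alone: the default frame is start of partition .. current row.
implicit = conn.execute(
    "SELECT day, AVG(value) OVER (ORDER BY day) FROM daily ORDER BY day"
).fetchall()

# The same frame written out explicitly.
explicit = conn.execute(
    """
    SELECT day,
           AVG(value) OVER (ORDER BY day
                            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM daily
    ORDER BY day
    """
).fetchall()

print(implicit)  # [(1, 10.0), (2, 15.0), (3, 30.0)]
```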
For each vendor ID, what was the previous booth allotted to the vendor (previous means the booth allotted on the vendor's previous market date)?

SELECT
    cols,
    LAG(lag_col, num_rows_to_lag)
        OVER (PARTITION BY partition_col ORDER BY order_col) AS new_col
FROM
    table

The clauses work in this order:
• Data gets partitioned on partition_col; this is the granularity we want, i.e. each group (here, each vendor)
• Inside each group, rows are sorted on order_col (the column after ORDER BY); since LAG/LEAD fetch the previous/next row, this ordering decides what "previous" and "next" mean
• Finally, LAG() returns lag_col, the column whose previous value we want, from the row num_rows_to_lag positions earlier (NULL if there is no such row)
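A sketch of the vendor/booth question, assuming Python's sqlite3 with SQLite 3.25+. The table `vendor_booth_assignments` and its columns are hypothetical; dates are ISO strings, which sort correctly as text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_booth_assignments "
    "(vendor_id INTEGER, booth_number INTEGER, market_date TEXT)"
)
conn.executemany(
    "INSERT INTO vendor_booth_assignments VALUES (?, ?, ?)",
    [
        (1, 7, "2022-08-01"), (1, 9, "2022-08-08"), (1, 7, "2022-08-15"),
        (2, 3, "2022-08-01"), (2, 3, "2022-08-08"),
    ],
)

rows = conn.execute(
    """
    SELECT vendor_id, market_date, booth_number,
           LAG(booth_number, 1) OVER (
               PARTITION BY vendor_id   -- one group per vendor
               ORDER BY market_date     -- defines what "previous" means
           ) AS previous_booth_number
    FROM vendor_booth_assignments
    ORDER BY vendor_id, market_date
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2022-08-01', 7, None)
# (1, '2022-08-08', 9, 7)
# (1, '2022-08-15', 7, 9)
# (2, '2022-08-01', 3, None)
# (2, '2022-08-08', 3, 3)
```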

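Now suppose we want to filter on a particular market date. We can't simply add WHERE market_date = <that date> to the query above:
WHERE is applied before the window function, so LAG() only sees that one date's rows, finds no previous row and returns NULL, which is not what we expect

Hence, we should compute LAG() in a subquery and filter on the market date in the outer query: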
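A sketch contrasting the two queries, assuming Python's sqlite3 with SQLite 3.25+ and the same hypothetical `vendor_booth_assignments` table (smaller data set):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_booth_assignments "
    "(vendor_id INTEGER, booth_number INTEGER, market_date TEXT)"
)
conn.executemany(
    "INSERT INTO vendor_booth_assignments VALUES (?, ?, ?)",
    [(1, 7, "2022-08-01"), (1, 9, "2022-08-08"),
     (2, 3, "2022-08-01"), (2, 4, "2022-08-08")],
)

# Wrong: WHERE runs first, so LAG only sees the 2022-08-08 rows.
wrong = conn.execute(
    """
    SELECT vendor_id, booth_number,
           LAG(booth_number) OVER (PARTITION BY vendor_id ORDER BY market_date) AS prev_booth
    FROM vendor_booth_assignments
    WHERE market_date = '2022-08-08'
    ORDER BY vendor_id
    """
).fetchall()

# Right: compute LAG over all dates, then filter in the outer query.
right = conn.execute(
    """
    SELECT vendor_id, booth_number, prev_booth
    FROM (
        SELECT vendor_id, booth_number, market_date,
               LAG(booth_number) OVER (PARTITION BY vendor_id ORDER BY market_date) AS prev_booth
        FROM vendor_booth_assignments
    ) AS with_lag
    WHERE market_date = '2022-08-08'
    ORDER BY vendor_id
    """
).fetchall()

print(wrong)  # [(1, 9, None), (2, 4, None)]
print(right)  # [(1, 9, 7), (2, 4, 3)]
```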

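Similar to LAG, there's LEAD

LAG gets the previous row's data, LEAD gets the next row's data (in the ORDER BY order within the partition); LEAD returns NULL for the last row of each partition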
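A sketch putting LAG and LEAD side by side for one hypothetical vendor, assuming Python's sqlite3 with SQLite 3.25+; the `booths` table is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booths (vendor_id INTEGER, booth_number INTEGER, market_date TEXT)")
conn.executemany(
    "INSERT INTO booths VALUES (?, ?, ?)",
    [(1, 7, "2022-08-01"), (1, 9, "2022-08-08"), (1, 5, "2022-08-15")],
)

rows = conn.execute(
    """
    SELECT market_date, booth_number,
           LAG(booth_number)  OVER (PARTITION BY vendor_id ORDER BY market_date) AS prev_booth,
           LEAD(booth_number) OVER (PARTITION BY vendor_id ORDER BY market_date) AS next_booth
    FROM booths
    ORDER BY market_date
    """
).fetchall()

for row in rows:
    print(row)
# ('2022-08-01', 7, None, 9)
# ('2022-08-08', 9, 7, 5)
# ('2022-08-15', 5, 9, None)
```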

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here, I want the cumulative sum of sales (total sales up to and including that day) for each employee

I can do this:

If we don't put ORDER BY inside OVER(), the frame is the whole partition, so every (employee, order_date) row gets the employee's total sales instead of a running total
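A sketch showing both variants in one query, assuming Python's sqlite3 with SQLite 3.25+ and a hypothetical `employee_sales` table with one row per employee per day:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_sales (employee TEXT, order_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO employee_sales VALUES (?, ?, ?)",
    [
        ("A", "2022-08-01", 10), ("A", "2022-08-02", 20), ("A", "2022-08-03", 30),
        ("B", "2022-08-01", 5), ("B", "2022-08-02", 5),
    ],
)

rows = conn.execute(
    """
    SELECT employee, order_date, sales,
           -- with ORDER BY: running total up to this date
           SUM(sales) OVER (PARTITION BY employee ORDER BY order_date) AS cumulative_sales,
           -- without ORDER BY: the employee's grand total on every row
           SUM(sales) OVER (PARTITION BY employee) AS total_sales
    FROM employee_sales
    ORDER BY employee, order_date
    """
).fetchall()

for row in rows:
    print(row)
# ('A', '2022-08-01', 10, 10, 60)
# ('A', '2022-08-02', 20, 30, 60)
# ('A', '2022-08-03', 30, 60, 60)
# ('B', '2022-08-01', 5, 5, 10)
# ('B', '2022-08-02', 5, 10, 10)
```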
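Now suppose I want the sum over only the last two days (including the current day) instead of all days so far
LAG() can fetch a single earlier row's value (e.g. the 2nd-to-last day's), but it can't sum over the last 2 days' values; for that we need a frame clause (ROWS BETWEEN <start> AND <end>)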

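A frame is written as ROWS BETWEEN <start> AND <end> (or RANGE BETWEEN <start> AND <end>), and besides CURRENT ROW there are 4 different kinds of boundary:

N PRECEDING: the frame starts N rows before the current row
M FOLLOWING: the frame ends M rows after the current row
UNBOUNDED PRECEDING: the frame starts at the first row of the partition
UNBOUNDED FOLLOWING: the frame ends at the last row of the partition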
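A sketch of several frames on a hypothetical `daily_sales` table, assuming Python's sqlite3 with SQLite 3.25+, one row per day, and that "last two days" means the current row plus the one before it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

rows = conn.execute(
    """
    SELECT day, sales,
           SUM(sales) OVER (ORDER BY day
                            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_two_days,
           SUM(sales) OVER (ORDER BY day
                            ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS today_and_next,
           SUM(sales) OVER (ORDER BY day
                            ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND UNBOUNDED FOLLOWING) AS whole_partition
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
# (1, 10, 10, 30, 100)
# (2, 20, 30, 50, 100)
# (3, 30, 50, 70, 100)
# (4, 40, 70, 40, 100)
```

On the first row the "last two days" frame holds only one row, so the sum is just that day's sales.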

Orig table:

Query:

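Here we used Month because, when ORDER BY is combined with RANGE and an offset (1 PRECEDING / 1 FOLLOWING), the ORDER BY must be a single column the offset can be added to, i.e. numeric (some databases also accept dates with an INTERVAL offset)
moving_avg_value_on_curr_day = (value_on_prev_day + value_on_curr_day + value_on_next_day)/3
(for the first and last rows the frame holds only two rows, so AVG divides by 2 there)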

This gives the same result as
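A sketch of the 3-point moving average on a hypothetical `monthly` table, assuming Python's sqlite3 links SQLite 3.28 or newer (needed for RANGE frames with numeric offsets). It computes the frame once with RANGE and once with ROWS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (month INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 70)],
)

# RANGE: the frame holds rows whose month is within 1 of the current month.
by_range = conn.execute(
    """
    SELECT month,
           AVG(value) OVER (ORDER BY month
                            RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
    FROM monthly
    ORDER BY month
    """
).fetchall()

# ROWS: the frame holds the physically adjacent rows.
by_rows = conn.execute(
    """
    SELECT month,
           AVG(value) OVER (ORDER BY month
                            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
    FROM monthly
    ORDER BY month
    """
).fetchall()

print(by_range)  # [(1, 15.0), (2, 20.0), (3, 40.0), (4, 50.0)]
```

The two agree here because the months have no gaps or duplicates; if a month were missing, RANGE would still look only at month - 1 and month + 1, while ROWS would take whichever rows sit next to the current one.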

Question:
• Get the employee who earns the 2nd highest salary in each department

One way is to use the rank() function in a subquery; the other is NTH_VALUE()
With NTH_VALUE(), writing RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is compulsory; otherwise it returns NULL for the top earner in each department (with ORDER BY salary DESC, that row's frame contains only itself, so there is no 2nd value yet)
This is because, with ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, i.e. it ends at the current row (as in the running-average example above)
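A sketch of both approaches plus the default-frame pitfall, assuming Python's sqlite3 with SQLite 3.25+ and a hypothetical `employees` table without salary ties (with ties, RANK() and NTH_VALUE() can disagree on what "2nd highest" means):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (dept TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("eng", "a", 300), ("eng", "b", 200), ("eng", "c", 100),
        ("ops", "d", 50), ("ops", "e", 40),
    ],
)

# Way 1: RANK() in a subquery, filter in the outer query.
via_rank = conn.execute(
    """
    SELECT dept, name, salary
    FROM (
        SELECT dept, name, salary,
               RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
        FROM employees
    ) AS ranked
    WHERE rnk = 2
    ORDER BY dept
    """
).fetchall()

# Way 2: NTH_VALUE() over the whole partition (explicit full frame).
via_nth = conn.execute(
    """
    SELECT dept, name, salary
    FROM (
        SELECT dept, name, salary,
               NTH_VALUE(salary, 2) OVER (
                   PARTITION BY dept ORDER BY salary DESC
                   RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS second_highest
        FROM employees
    ) AS with_nth
    WHERE salary = second_highest
    ORDER BY dept
    """
).fetchall()

# Default frame (ends at the current row): the top earner's frame has one row.
default_frame = conn.execute(
    """
    SELECT dept, name,
           NTH_VALUE(salary, 2) OVER (PARTITION BY dept ORDER BY salary DESC)
    FROM employees
    ORDER BY dept, salary DESC
    """
).fetchall()

print(via_rank)       # [('eng', 'b', 200), ('ops', 'e', 40)]
print(via_nth)        # [('eng', 'b', 200), ('ops', 'e', 40)]
print(default_frame)  # [('eng', 'a', None), ('eng', 'b', 200), ('eng', 'c', 200), ('ops', 'd', None), ('ops', 'e', 40)]
```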
