Window Functions SQL
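With GROUP BY, each group collapses into a single output row, so we get only one value per group
With window functions, every input row is kept, so we can generate a value for each row inside a group (partition)
If a ranking function such as RANK() has no ORDER BY in its OVER clause, all rows in a partition are peers and get the same rank (1)
NTILE(k) splits each partition into k buckets of (as nearly as possible) equal size and assigns each row its bucket number, from 1 to k
When an aggregate such as AVG() is used with ORDER BY inside OVER, a running average is calculated automatically, because the default frame then runs from the start of the partition up to the current row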
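The points above can be tried out with a small sketch. The table sales_demo, its columns and its rows are made up for illustration, and the syntax assumes an engine with standard window functions (for example PostgreSQL, MySQL 8 or SQLite 3.28+). Some engines require an ORDER BY for ranking functions, in which case the rank_no_order column will not run there.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE sales_demo (employee TEXT, order_date DATE, amount INTEGER);
INSERT INTO sales_demo VALUES
  ('A', '2024-01-01', 10),
  ('A', '2024-01-02', 20),
  ('A', '2024-01-03', 30),
  ('B', '2024-01-01', 5),
  ('B', '2024-01-02', 15);

-- GROUP BY: one output row per employee.
SELECT employee, SUM(amount) AS total_amount
FROM sales_demo
GROUP BY employee;

-- Window functions: all five rows are kept, each with its own values.
SELECT
  employee,
  order_date,
  amount,
  -- No ORDER BY: every row in the partition is a peer, so the rank is 1 everywhere.
  RANK() OVER (PARTITION BY employee) AS rank_no_order,
  -- With ORDER BY: ranks 1, 2, 3 inside each employee.
  RANK() OVER (PARTITION BY employee ORDER BY amount) AS rank_by_amount,
  -- Two buckets per employee; with 3 rows the first bucket gets the extra row.
  NTILE(2) OVER (PARTITION BY employee ORDER BY amount) AS bucket,
  -- ORDER BY inside OVER turns AVG into a running average.
  AVG(amount) OVER (PARTITION BY employee ORDER BY order_date) AS running_avg
FROM sales_demo
ORDER BY employee, order_date;
```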
For each vendor ID, which booth was the vendor allotted previously? (Previous means the booth allotted on the previous market date.)
SELECT
    cols,
    LAG(lag_col, num_rows_to_lag)
        OVER (PARTITION BY partition_col ORDER BY order_col) AS new_col
FROM
    table
The logical order is as follows:
• The data is partitioned on partition_col; this is the granularity at which we want the result, i.e. each group
• Inside each group we want the previous/next row (since we're applying LAG/LEAD), so the rows are sorted by the column given in ORDER BY
• Finally, the column whose previous/next value we actually want is the one passed to LAG()
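A concrete version of the template, as a minimal sketch. The table vendor_booth_demo, its column names and its rows are assumptions made up for this example, not taken from the original data.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE vendor_booth_demo (vendor_id INTEGER, market_date DATE, booth_number INTEGER);
INSERT INTO vendor_booth_demo VALUES
  (1, '2024-03-02', 7),
  (1, '2024-03-09', 3),
  (1, '2024-03-16', 5),
  (2, '2024-03-02', 2),
  (2, '2024-03-09', 2);

SELECT
  vendor_id,
  market_date,
  booth_number,
  -- Partition per vendor, sort by date, then look one row back.
  LAG(booth_number, 1)
    OVER (PARTITION BY vendor_id ORDER BY market_date) AS previous_booth
FROM vendor_booth_demo
ORDER BY vendor_id, market_date;
-- previous_booth is NULL on each vendor's first market date,
-- because there is no earlier row inside that partition.
```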
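Now suppose we want to filter on a particular market date. We can't simply add a WHERE clause on the market date to the same query.
Such a query applies the WHERE filter first and only then computes LAG over the remaining rows, which is not what we expect: the previous market date's row is already gone, so LAG returns NULL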
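One common way around this is to compute LAG over all rows in a subquery and filter in the outer query. The sketch below shows both versions side by side; the table booth_filter_demo and its data are hypothetical.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE booth_filter_demo (vendor_id INTEGER, market_date DATE, booth_number INTEGER);
INSERT INTO booth_filter_demo VALUES
  (1, '2024-03-02', 7),
  (1, '2024-03-09', 3),
  (2, '2024-03-02', 2),
  (2, '2024-03-09', 4);

-- Not what we want: WHERE runs before the window function, so each
-- partition holds a single row and previous_booth is always NULL.
SELECT
  vendor_id,
  market_date,
  booth_number,
  LAG(booth_number, 1)
    OVER (PARTITION BY vendor_id ORDER BY market_date) AS previous_booth
FROM booth_filter_demo
WHERE market_date = '2024-03-09';

-- What we want: compute LAG over all rows first, then filter outside.
SELECT *
FROM (
  SELECT
    vendor_id,
    market_date,
    booth_number,
    LAG(booth_number, 1)
      OVER (PARTITION BY vendor_id ORDER BY market_date) AS previous_booth
  FROM booth_filter_demo
) AS with_lag
WHERE market_date = '2024-03-09';
-- Here previous_booth is 7 for vendor 1 and 2 for vendor 2.
```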
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
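Here, I want to know the cumulative sum of sales (the sum of sales up to and including that day) for each employee
I can do this by using SUM() as a window function, partitioned by employee and ordered by order date
If we leave out the ORDER BY clause, the frame becomes the whole partition, so each (employee, order_date) row gets that employee's total sales instead of a running sum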
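A small sketch of the difference, with and without ORDER BY. The table emp_sales_demo, its columns and its rows are assumed for illustration.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE emp_sales_demo (employee TEXT, order_date DATE, sale INTEGER);
INSERT INTO emp_sales_demo VALUES
  ('A', '2024-01-01', 10),
  ('A', '2024-01-02', 20),
  ('A', '2024-01-03', 30),
  ('B', '2024-01-01', 5),
  ('B', '2024-01-02', 15);

SELECT
  employee,
  order_date,
  sale,
  -- With ORDER BY: running sum up to the current row (10, 30, 60 for A).
  SUM(sale) OVER (PARTITION BY employee ORDER BY order_date) AS cumulative_sale,
  -- Without ORDER BY: the employee's total on every row (60 for A).
  SUM(sale) OVER (PARTITION BY employee) AS total_sale
FROM emp_sales_demo
ORDER BY employee, order_date;
```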
Now, suppose I want the sum over only the last two days (including the current day) and not over all days
LAG() can fetch the value from exactly 2 days back, but it cannot aggregate over the last 2 days; for that we need an explicit window frame
Orig table:
Query:
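Here we used Month because, when a RANGE frame uses a numeric offset (such as 1 PRECEDING), the ORDER BY column must be numeric (some databases also accept date columns with an interval offset)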
moving_avg_value_on_curr_day = (value_on_prev_day + value_on_curr_day + value_on_next_day)/3
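Since the original table and query are not reproduced here, the following is only a sketch of what such a query could look like. The table monthly_value_demo, its columns month_no and val, and the data are all assumptions.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE monthly_value_demo (month_no INTEGER, val INTEGER);
INSERT INTO monthly_value_demo VALUES (1, 10), (2, 20), (3, 30), (4, 40);

SELECT
  month_no,
  val,
  -- Previous, current and next month. At the edges only two rows fall
  -- inside the frame, so AVG divides by 2 there, not by 3.
  AVG(val) OVER (
    ORDER BY month_no
    RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS moving_avg,
  -- Previous and current month only: 10, 30, 50, 70.
  SUM(val) OVER (
    ORDER BY month_no
    RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
  ) AS sum_last_2
FROM monthly_value_demo
ORDER BY month_no;
```

RANGE works on the values of month_no, so a missing month simply contributes nothing to the frame. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING would instead count physical rows, whatever their month.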
Question:
• Get the employee who earns the 2nd highest salary in each department
One way is to use the RANK() function with a subquery; the other way is NTH_VALUE()
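Using RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is compulsory; otherwise NTH_VALUE() returns NULL for every row whose frame does not yet contain a 2nd row. Assuming the window is ordered by salary descending, that is the top earner in each department
This is because, when ORDER BY is present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (as we also saw in the cumulative sum example above)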
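Both approaches can be sketched as follows. The table emp_salary_demo and its rows are hypothetical, and the window is assumed to be ordered by salary descending. In the subquery version DENSE_RANK() is used in place of RANK() so that a tie for the highest salary does not skip rank 2; that is a choice made for this sketch.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE emp_salary_demo (emp_name TEXT, department TEXT, salary INTEGER);
INSERT INTO emp_salary_demo VALUES
  ('Asha', 'Eng', 300),
  ('Bala', 'Eng', 200),
  ('Chen', 'Eng', 100),
  ('Dev',  'Ops', 250),
  ('Esha', 'Ops', 150);

-- NTH_VALUE with the default frame vs. the full-partition frame.
SELECT
  emp_name,
  department,
  salary,
  -- Default frame ends at the current row: NULL for the top earner.
  NTH_VALUE(emp_name, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
  ) AS second_default_frame,
  -- Full frame: every row of the department sees the 2nd highest earner.
  NTH_VALUE(emp_name, 2) OVER (
    PARTITION BY department ORDER BY salary DESC
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS second_full_frame
FROM emp_salary_demo
ORDER BY department, salary DESC;

-- Ranking function plus subquery: one row per department.
SELECT emp_name, department, salary
FROM (
  SELECT
    emp_name,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
  FROM emp_salary_demo
) AS ranked
WHERE salary_rank = 2;
-- Returns Bala (Eng) and Esha (Ops) for the sample data.
```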