Windowing Functions in Databricks

The document provides an overview of windowing functions in Databricks, detailing various window options such as currentRow, unboundedPreceding, and unboundedFollowing. It also covers SQL syntax for functions like ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, and LAST_VALUE, demonstrating how to assign unique values or retrieve values from adjacent rows within partitions. The examples illustrate how to create windows and apply these functions to employee data based on salary and department.

Creating Windows

Different window options available are:

Window.currentRow
Window.unboundedPreceding
Window.unboundedFollowing
Specified number of rows

Window.partitionBy("clm").orderBy("clm2").rowsBetween(WindowStart, WindowEnd)

A single partition (one row is marked "if you are here", the current row):

emp_id  name    age
1       Shane   47
2       Eileen  29
3       Tadeas  34
4       Gordy   59
5       Carlos  18

Shwetank Singh
GritSetGrow - GSGLearn.com
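To make the start and end bounds concrete, here is a minimal plain-Python sketch of how a row-based frame could be resolved inside one ordered partition. It does not use Spark: the constants are stand-ins for `Window.unboundedPreceding`, `Window.unboundedFollowing` and `Window.currentRow`, the helper `frame_rows` is a hypothetical name, and treating Gordy as the current row is an assumption made only for illustration.

```python
# Stand-ins for the PySpark window constants (illustrative values for this sketch).
UNBOUNDED_PRECEDING = -(1 << 63)
UNBOUNDED_FOLLOWING = (1 << 63) - 1
CURRENT_ROW = 0


def frame_rows(partition, current, start, end):
    """Rows covered by rowsBetween(start, end) when the row at index
    `current` of the ordered `partition` is being evaluated."""
    if start > end:
        raise ValueError("window start must not come after window end")
    lo = max(current + start, 0)                  # clamp to the first row
    hi = min(current + end, len(partition) - 1)   # clamp to the last row
    if lo > hi:
        return []                                 # frame lies outside the partition
    return partition[lo:hi + 1]


names = ["Shane", "Eileen", "Tadeas", "Gordy", "Carlos"]  # the partition, in order
here = 3  # assumption: the current row is Gordy

only_current = frame_rows(names, here, CURRENT_ROW, CURRENT_ROW)
whole_partition = frame_rows(names, here, UNBOUNDED_PRECEDING, UNBOUNDED_FOLLOWING)
neighbours = frame_rows(names, here, -1, 1)
```

Negative offsets count rows before the current row and positive offsets count rows after it, which is why a frame such as `(-1, 1)` covers at most three rows.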
Window.currentRow

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(Window.currentRow, Window.currentRow)

Window created as: the current row only. Window start and window end are both the row you are on.
Window.unboundedPreceding and Window.unboundedFollowing

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

Window created as: the whole partition. The window starts at the first row of the partition and ends at the last row, whichever row you are on.
Specified number of rows

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(-1, 1)

Window created as: up to three rows. The window starts one row before the current row and ends one row after it.
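The same three-row frame can be tried without a cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in SQL engine (an assumption: it needs SQLite 3.25 or newer, and it is not Databricks SQL). The table name `emp` and the choice to order by `emp_id` are illustrative; the rows come from the table on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Shane", 47), (2, "Eileen", 29), (3, "Tadeas", 34),
     (4, "Gordy", 59), (5, "Carlos", 18)],
)

# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING is the SQL spelling of rowsBetween(-1, 1).
sliding = conn.execute("""
    SELECT name,
           SUM(age) OVER (ORDER BY emp_id
                          ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS age_sum
    FROM emp
    ORDER BY emp_id
""").fetchall()
```

The first and last rows sum only two ages because the frame is cut off at the partition boundary.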
The different window options can be used together for the start and end of the window.
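Because any start option can be paired with any end option, it can help to see how a pair of bounds reads as a SQL frame clause. This is a small sketch with hypothetical helper names (`bound_to_sql`, `rows_between_to_sql`); the constants are stand-ins for the PySpark ones rather than imports from Spark.

```python
# Stand-ins for Window.unboundedPreceding / unboundedFollowing / currentRow.
UNBOUNDED_PRECEDING = -(1 << 63)
UNBOUNDED_FOLLOWING = (1 << 63) - 1
CURRENT_ROW = 0


def bound_to_sql(offset):
    """Translate one rowsBetween bound into its SQL frame keyword."""
    if offset == UNBOUNDED_PRECEDING:
        return "UNBOUNDED PRECEDING"
    if offset == UNBOUNDED_FOLLOWING:
        return "UNBOUNDED FOLLOWING"
    if offset == CURRENT_ROW:
        return "CURRENT ROW"
    # Negative offsets look backwards, positive offsets look forwards.
    return f"{-offset} PRECEDING" if offset < 0 else f"{offset} FOLLOWING"


def rows_between_to_sql(start, end):
    """SQL frame clause equivalent to rowsBetween(start, end)."""
    return f"ROWS BETWEEN {bound_to_sql(start)} AND {bound_to_sql(end)}"


mixed = rows_between_to_sql(UNBOUNDED_PRECEDING, 1)
```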
Window.unboundedPreceding with a specified number of rows

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(Window.unboundedPreceding, 1)

Window created as: the window starts at the first row of the partition and ends one row after the current row.
A specified number of rows with Window.unboundedFollowing

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(1, Window.unboundedFollowing)

Window created as: the window starts one row after the current row and ends at the last row of the partition.
Window.currentRow with a specified number of rows

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(Window.currentRow, 1)

Window created as: the window starts at the current row and ends one row after it.
A specified number of rows with Window.currentRow

Window.partitionBy("clm").orderBy("clm2")\
.rowsBetween(-1, Window.currentRow)

Window created as: the window starts one row before the current row and ends at the current row. The start bound must not come after the end bound, so a preceding row is written as a negative number.
Windowing Functions in Databricks and SQL

ROW NUMBER
Assigns a unique sequential number to rows within each partition.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import row_number

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("row_num", row_number().over(window))
window_df.show()

SQL
SELECT first_name, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS ROW_NUM
FROM Employees

Dataset: Employees data for a single department.

Result: a unique sequential value is assigned to each row.
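As a rough illustration of what ROW_NUMBER computes, the plain-Python sketch below numbers rows inside each department after sorting by salary. The employee records and the helper name `with_row_number` are hypothetical. Tied salaries keep their input order here because Python's sort is stable, whereas a database gives no such guarantee for ties.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, not the slide's dataset.
employees = [
    {"first_name": "Ana", "department": "IT", "salary": 70000},
    {"first_name": "Ben", "department": "IT", "salary": 64000},
    {"first_name": "Cy", "department": "HR", "salary": 52000},
    {"first_name": "Dee", "department": "IT", "salary": 70000},
]


def with_row_number(rows):
    """Add row_num: 1, 2, 3, ... restarting in every department."""
    ordered = sorted(rows, key=itemgetter("department", "salary"))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter("department")):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, "row_num": n})
    return numbered


numbered = [(r["first_name"], r["row_num"]) for r in with_row_number(employees)]
```

Ana and Dee earn the same salary yet receive different numbers, which is the difference between ROW_NUMBER and the ranking functions.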
RANK
Assigns a rank to rows within a partition, with gaps in rank values after tied rows.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import rank

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("rank_num", rank().over(window))
window_df.show()

SQL
SELECT first_name, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary) AS RANK_NUM
FROM Employees

Dataset: Employees data for a single department.

Result: rows with the same value receive the same rank, and the next distinct value receives a rank that skips ahead, leaving a gap.
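The gap rule can be sketched in a few lines of plain Python: a tied row reuses the previous rank, and any other row takes its position in the ordered partition. The function name `rank_with_gaps` and the salaries are hypothetical.

```python
def rank_with_gaps(sorted_values):
    """RANK-style numbering for values that are already in window order."""
    ranks = []
    for position, value in enumerate(sorted_values, start=1):
        if position > 1 and value == sorted_values[position - 2]:
            ranks.append(ranks[-1])   # tie: reuse the previous row's rank
        else:
            ranks.append(position)    # new value: rank equals the row position
    return ranks


salaries = [64000, 70000, 70000, 95000]  # hypothetical, sorted ascending
ranks = rank_with_gaps(salaries)
```

The two tied salaries share rank 2, so rank 3 is skipped and the next salary is ranked 4.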
DENSE RANK
Similar to RANK, but without gaps in rank values after tied rows.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import dense_rank

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("drank_num", dense_rank().over(window))
window_df.show()

SQL
SELECT first_name, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary) AS DRANK_NUM
FROM Employees

Dataset: Employees data for a single department.

Result: every row receives a rank; rows with the same value share a rank, and the next distinct value takes the next consecutive rank with no gap.
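To compare RANK and DENSE_RANK side by side, the sketch below runs both over the same window using Python's built-in `sqlite3` module as a stand-in engine (an assumption: SQLite 3.25 or newer, not Databricks SQL). The four employees are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ben", "IT", 64000), ("Ana", "IT", 70000),
     ("Dee", "IT", 70000), ("Eli", "IT", 95000)],  # hypothetical rows
)

# One named window shared by both ranking functions.
ranked = conn.execute("""
    SELECT salary,
           RANK()       OVER w AS rank_num,
           DENSE_RANK() OVER w AS drank_num
    FROM Employees
    WINDOW w AS (PARTITION BY department ORDER BY salary)
    ORDER BY salary
""").fetchall()
```

Only the row after the tie differs: RANK jumps to 4 while DENSE_RANK continues with 3.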
LEAD
Returns the value from the next row in the same partition.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import lead

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("next_value", lead("salary").over(window))
window_df.show()

SQL
SELECT first_name, salary,
       LEAD(salary) OVER (PARTITION BY department ORDER BY salary) AS NEXT_VALUE
FROM Employees

Dataset: Employees data for a single department.

Result: for the new column, the salary from the next row is returned.
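A plain-Python sketch of the look-ahead makes the edge case visible: the last row of a partition has no next row, so it receives the default, which is null unless another default is given. The function name `lead_values` and the salaries are hypothetical.

```python
def lead_values(values, offset=1, default=None):
    """For each position, the value `offset` rows later, or `default` past the end."""
    return [
        values[i + offset] if i + offset < len(values) else default
        for i in range(len(values))
    ]


salaries = [64000, 70000, 95000]  # hypothetical partition, in window order
next_values = lead_values(salaries)
```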
LAG
Returns the value from the previous row in the same partition.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import lag

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("previous_value", lag("salary").over(window))
window_df.show()

SQL
SELECT first_name, salary,
       LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS PREVIOUS_VALUE
FROM Employees

Dataset: Employees data for a single department.

Result: for the new column, the salary from the previous row is returned.
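A common use of LAG is to compare each row with the one before it. The sketch below computes the salary gap to the previous employee, using Python's built-in `sqlite3` module as a stand-in engine (an assumption: SQLite 3.25 or newer, not Databricks SQL) and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ben", "IT", 64000), ("Ana", "IT", 70000), ("Eli", "IT", 95000)],  # hypothetical
)

# The first row of the partition has no previous row, so its gap is NULL (None).
gaps = conn.execute("""
    SELECT first_name, salary,
           salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary) AS gap
    FROM Employees
    ORDER BY salary
""").fetchall()
```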
FIRST VALUE
Returns the first value in the same partition.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import first

window = Window.partitionBy("department").orderBy("salary")
window_df = employees_df.withColumn("first_value", first("salary").over(window))
window_df.show()

SQL
SELECT first_name, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS FIRST_VALUE
FROM Employees

Dataset: Employees data for a single department.

Result: the first value of the partition, 95000 in the slide's dataset, is the value of the new column for every row in this partition.
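The sketch below shows FIRST_VALUE repeating one value per partition, using Python's built-in `sqlite3` module as a stand-in engine (an assumption: SQLite 3.25 or newer, not Databricks SQL). The two departments and their salaries are hypothetical; with an ascending ORDER BY, the first value is the lowest salary in each department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Cy", "HR", 52000), ("Fay", "HR", 58000),
     ("Ben", "IT", 64000), ("Eli", "IT", 95000)],  # hypothetical rows
)

firsts = conn.execute("""
    SELECT first_name, salary,
           FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS first_value
    FROM Employees
    ORDER BY department, salary
""").fetchall()
```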
LAST VALUE
Returns the last value in the same partition. An ordered window's default frame ends at the current row, so the frame is extended to Window.unboundedFollowing to reach the last row of the partition.

Syntax

Databricks
from pyspark.sql import Window
from pyspark.sql.functions import last

window = Window.partitionBy("department").orderBy("salary")\
.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
window_df = employees_df.withColumn("last_value", last("salary").over(window))
window_df.show()

SQL
SELECT first_name, salary,
       LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE
FROM Employees

Dataset: Employees data for a single department.

Result: the last value of the partition, 64000 in the slide's dataset, is the value of the new column for every row in this partition.

Shwetank Singh
GritSetGrow - GSGLearn.com
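LAST_VALUE depends on the window frame. When a window has an ORDER BY and no explicit frame, the default frame stops at the current row, so the function returns the current row's value (or that of its last tied peer) rather than the partition's last value. The sketch below contrasts the default frame with a frame spanning the whole partition, using Python's built-in `sqlite3` module as a stand-in engine (an assumption: SQLite 3.25 or newer, not Databricks SQL) and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (first_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ben", "IT", 64000), ("Ana", "IT", 70000), ("Eli", "IT", 95000)],  # hypothetical
)

lasts = conn.execute("""
    SELECT salary,
           -- default frame: ends at the current row
           LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS default_frame,
           -- explicit frame: spans the whole partition
           LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS full_frame
    FROM Employees
    ORDER BY salary
""").fetchall()
```

In PySpark the equivalent of the explicit frame is `.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)` on the window specification.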
