pyspark.sql.functions.explode

pyspark.sql.functions.explode(col)

Returns a new row for each element in the given array or map. Unless aliased, the output column is named col for array elements, and the two output columns are named key and value for map entries. Rows whose array or map is NULL or empty produce no output rows.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or column name

Target column to work on.

Returns
Column

One row per array element or map key-value pair.

Notes

Only one explode is allowed per SELECT clause. To explode several columns, chain select calls with one explode each, as in Example 3.
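
The row-generation behavior described above can be modeled in plain Python. This is only an illustrative sketch of the semantics, not how Spark implements them; `explode_model` is a hypothetical helper that treats a Python list as an array, a dict as a map, and `None` as NULL.

```python
from typing import Iterator, Optional, Union


def explode_model(value: Optional[Union[list, dict]]) -> Iterator[tuple]:
    """Hypothetical model: yield one output tuple per array element or map entry."""
    if value is None:
        return  # NULL input produces no rows
    if isinstance(value, dict):
        for key, val in value.items():
            yield (key, val)  # map entry -> (key, value)
    else:
        for item in value:
            yield (item,)  # array element -> (col,)


# Same input shape as Example 1: a full array, an empty array, and NULL.
rows = [(1, [1, 2, 3, None]), (2, []), (3, None)]
exploded = [(i, a, *out) for i, a in rows for out in explode_model(a)]

# Chaining two explodes, one per select, behaves like a cross product.
a1, a2 = [1, 2], [3, 4, 5]
chained = [(v1, v2) for (v1,) in explode_model(a1) for (v2,) in explode_model(a2)]
```

In this model only the row with `i = 1` survives, contributing four output rows (including the one for the NULL element), and `chained` holds the six `(v1, v2)` combinations.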

Examples

Example 1: Exploding an array column

>>> from pyspark.sql import functions as sf
>>> df = spark.sql('SELECT * FROM VALUES (1,ARRAY(1,2,3,NULL)), (2,ARRAY()), (3,NULL) AS t(i,a)')
>>> df.show()
+---+---------------+
|  i|              a|
+---+---------------+
|  1|[1, 2, 3, NULL]|
|  2|             []|
|  3|           NULL|
+---+---------------+
>>> df.select('*', sf.explode('a')).show()
+---+---------------+----+
|  i|              a| col|
+---+---------------+----+
|  1|[1, 2, 3, NULL]|   1|
|  1|[1, 2, 3, NULL]|   2|
|  1|[1, 2, 3, NULL]|   3|
|  1|[1, 2, 3, NULL]|NULL|
+---+---------------+----+

Example 2: Exploding a map column

>>> from pyspark.sql import functions as sf
>>> df = spark.sql('SELECT * FROM VALUES (1,MAP(1,2,3,4,5,NULL)), (2,MAP()), (3,NULL) AS t(i,m)')
>>> df.show(truncate=False)
+---+---------------------------+
|i  |m                          |
+---+---------------------------+
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|
|2  |{}                         |
|3  |NULL                       |
+---+---------------------------+
>>> df.select('*', sf.explode('m')).show(truncate=False)
+---+---------------------------+---+-----+
|i  |m                          |key|value|
+---+---------------------------+---+-----+
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|1  |2    |
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|3  |4    |
|1  |{1 -> 2, 3 -> 4, 5 -> NULL}|5  |NULL |
+---+---------------------------+---+-----+

Example 3: Exploding multiple array columns

>>> import pyspark.sql.functions as sf
>>> df = spark.sql('SELECT ARRAY(1,2) AS a1, ARRAY(3,4,5) AS a2')
>>> df.select(
...     '*', sf.explode('a1').alias('v1')
... ).select('*', sf.explode('a2').alias('v2')).show()
+------+---------+---+---+
|    a1|       a2| v1| v2|
+------+---------+---+---+
|[1, 2]|[3, 4, 5]|  1|  3|
|[1, 2]|[3, 4, 5]|  1|  4|
|[1, 2]|[3, 4, 5]|  1|  5|
|[1, 2]|[3, 4, 5]|  2|  3|
|[1, 2]|[3, 4, 5]|  2|  4|
|[1, 2]|[3, 4, 5]|  2|  5|
+------+---------+---+---+

Example 4: Exploding an array-of-structs column

>>> import pyspark.sql.functions as sf
>>> df = spark.sql('SELECT ARRAY(NAMED_STRUCT("a",1,"b",2), NAMED_STRUCT("a",3,"b",4)) AS a')
>>> df.select(sf.explode('a').alias("s")).select("s.*").show()
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  3|  4|
+---+---+