pyspark.sql.functions.lit

pyspark.sql.functions.lit(col)

Creates a Column of literal value.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column, str, int, float, bool or list, NumPy literals or ndarray

the value to turn into a PySpark literal. If a Column is passed, it is returned as is (see Example 7 below).

Changed in version 3.4.0: Supports the list type.

Returns
Column

the literal instance.

Examples

Example 1: Creating a literal column with an integer value.

>>> import pyspark.sql.functions as sf
>>> df = spark.range(1)
>>> df.select(sf.lit(5).alias('height'), df.id).show()
+------+---+
|height| id|
+------+---+
|     5|  0|
+------+---+

Example 2: Creating a literal column from a list.

>>> import pyspark.sql.functions as sf
>>> spark.range(1).select(sf.lit([1, 2, 3])).show()
+--------------+
|array(1, 2, 3)|
+--------------+
|     [1, 2, 3]|
+--------------+

Example 3: Creating a literal column from a string.

>>> import pyspark.sql.functions as sf
>>> df = spark.range(1)
>>> df.select(sf.lit("PySpark").alias('framework'), df.id).show()
+---------+---+
|framework| id|
+---------+---+
|  PySpark|  0|
+---------+---+

Example 4: Creating a literal column from a boolean value.

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([(True, "Yes"), (False, "No")], ["flag", "response"])
>>> df.select(sf.lit(False).alias('is_approved'), df.response).show()
+-----------+--------+
|is_approved|response|
+-----------+--------+
|      false|     Yes|
|      false|      No|
+-----------+--------+

Example 5: Creating literal columns from NumPy scalars.

>>> from pyspark.sql import functions as sf
>>> import numpy as np 
>>> spark.range(1).select(
...     sf.lit(np.bool_(True)),
...     sf.lit(np.int64(123)),
...     sf.lit(np.float64(0.456)),
...     sf.lit(np.str_("xyz"))
... ).show() 
+----+---+-----+---+
|true|123|0.456|xyz|
+----+---+-----+---+
|true|123|0.456|xyz|
+----+---+-----+---+

Example 6: Creating literal columns from a NumPy ndarray.

>>> from pyspark.sql import functions as sf
>>> import numpy as np 
>>> spark.range(1).select(
...     sf.lit(np.array([True, False], np.bool_)),
...     sf.lit(np.array([], np.int8)),
...     sf.lit(np.array([1.5, 0.1], np.float64)),
...     sf.lit(np.array(["a", "b", "c"], np.str_)),
... ).show() 
+------------------+-------+-----------------+--------------------+
|ARRAY(true, false)|ARRAY()|ARRAY(1.5D, 0.1D)|ARRAY('a', 'b', 'c')|
+------------------+-------+-----------------+--------------------+
|     [true, false]|     []|       [1.5, 0.1]|           [a, b, c]|
+------------------+-------+-----------------+--------------------+
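
Example 7: Passing an existing Column returns it as is (a minimal sketch of the pass-through behavior noted under Parameters; assumes the same active `spark` session as the examples above).

>>> import pyspark.sql.functions as sf
>>> df = spark.range(1)
>>> df.select(sf.lit(df.id)).show()  # lit is a no-op on an existing Column
+---+
| id|
+---+
|  0|
+---+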
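
Example 8: Using a literal inside an expression (a small usage sketch; a literal is an ordinary Column and composes with comparisons like any other Column).

>>> import pyspark.sql.functions as sf
>>> df = spark.range(5)
>>> df.where(df.id > sf.lit(2)).show()  # keep rows where id exceeds the literal 2
+---+
| id|
+---+
|  3|
|  4|
+---+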