Adding StructType Columns To Spark DataFrames
Matthew Powers
Jan 15, 2018 · 3 min read
StructType overview
The StructType case class can be used to define a DataFrame
schema as follows.
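import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val data = Seq(
  Row(1, "a"),
  Row(5, "z")
)

// Column names and types follow the output below; nullable = true is an assumption
val schema = StructType(List(
  StructField("num", IntegerType, true),
  StructField("letter", StringType, true)
))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)

df.show()
+---+------+
|num|letter|
+---+------+
|  1|     a|
|  5|     z|
+---+------+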
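Because StructType is a case class that wraps a sequence of StructField objects, a schema can be built and inspected without a running SparkSession. The sketch below assumes the same num / letter columns as above; the object name SchemaSketch and the nullable = true flags are illustrative choices, not something taken from the original code.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaSketch {
  // Same column names and types as the num / letter example; nullability is assumed.
  val schema: StructType = StructType(
    List(
      StructField("num", IntegerType, nullable = true),
      StructField("letter", StringType, nullable = true)
    )
  )

  def main(args: Array[String]): Unit = {
    // fieldNames returns the column names in declaration order.
    println(schema.fieldNames.mkString(", "))
    // A field can be looked up by name; this returns the matching StructField.
    println(schema("letter").dataType)
    // treeString renders the same tree that df.printSchema() prints.
    println(schema.treeString)
  }
}
```

Checking a schema this way can be handy in unit tests, since no cluster or local session needs to be started.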
import org.apache.spark.sql.functions.{col, struct}

// df is assumed to hold the weight and animal_type columns shown in the output below
val actualDF = df.withColumn(
  "animal_interpretation",
  struct(
    (col("weight") > 5).as("is_large_animal"),
    col("animal_type").isin("rat", "cat", "dog").as("is_mammal")
  )
)

actualDF.show(truncate = false)
+------+-----------+---------------------+
|weight|animal_type|animal_interpretation|
+------+-----------+---------------------+
|20.0  |dog        |[true,true]          |
|3.5   |cat        |[false,true]         |
|6.0E-6|ant        |[false,false]        |
+------+-----------+---------------------+
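The struct column above is only useful if its fields can be read back out. Here is a minimal, self-contained sketch of one way to do that. The input rows are copied from the output table, while the DoubleType / StringType schema, the local[1] session, and the names AnimalStructSketch, animals, and flatten are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object AnimalStructSketch {
  // Input rows copied from the output table; the column types are assumed.
  def animals(spark: SparkSession): DataFrame = {
    val data = Seq(Row(20.0, "dog"), Row(3.5, "cat"), Row(0.000006, "ant"))
    val schema = StructType(List(
      StructField("weight", DoubleType, nullable = true),
      StructField("animal_type", StringType, nullable = true)
    ))
    spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  }

  // Same struct column as in the snippet above.
  def withAnimalInterpretation(df: DataFrame): DataFrame =
    df.withColumn(
      "animal_interpretation",
      struct(
        (col("weight") > 5).as("is_large_animal"),
        col("animal_type").isin("rat", "cat", "dog").as("is_mammal")
      )
    )

  // Pull the nested fields back out into top-level columns.
  // Both Column.apply and dot notation address a field inside a struct.
  def flatten(df: DataFrame): DataFrame =
    df.select(
      col("animal_type"),
      col("animal_interpretation")("is_large_animal").as("is_large_animal"),
      col("animal_interpretation.is_mammal").as("is_mammal")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("animal-struct-sketch")
      .getOrCreate()

    val actualDF = withAnimalInterpretation(animals(spark))
    actualDF.printSchema() // shows animal_interpretation as a nested struct
    flatten(actualDF).show(truncate = false)

    spark.stop()
  }
}
```

Grouping the two booleans under one struct keeps the top-level schema small, and the dot notation still lets downstream code filter or select on an individual field.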
// data and schema here are assumed to describe the age and mood columns shown in the output below
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)

df
  .transform(ExampleTransforms.withIsTeenager())
  .transform(ExampleTransforms.withHasPositiveMood())
  .transform(ExampleTransforms.withWhatToDo())
  .show()
+---+-----+-----------+-----------------+-----------+
|age| mood|is_teenager|has_positive_mood| what_to_do|
+---+-----+-----------+-----------------+-----------+
| 30|happy|      false|             true|       null|
| 13|  sad|       true|            false|       null|
| 18| glad|       true|             true|have a chat|
+---+-----+-----------+-----------------+-----------+
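The ExampleTransforms object is not defined in the snippet above, so here is a hypothetical sketch of what it could look like. The age range (13 to 19), the list of positive moods, and the rule for what_to_do are assumptions chosen only because they reproduce the output table; the real definitions may differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical reconstruction: the method bodies are assumptions, not the original code.
object ExampleTransforms {

  // Assumed rule: a teenager is anyone aged 13 through 19 inclusive.
  def withIsTeenager()(df: DataFrame): DataFrame =
    df.withColumn("is_teenager", col("age").between(13, 19))

  // Assumed rule: "happy" and "glad" count as positive moods.
  def withHasPositiveMood()(df: DataFrame): DataFrame =
    df.withColumn("has_positive_mood", col("mood").isin("happy", "glad"))

  // Depends on the two columns added above; rows that do not match stay null
  // because when(...) is used without an otherwise(...) branch.
  def withWhatToDo()(df: DataFrame): DataFrame =
    df.withColumn(
      "what_to_do",
      when(col("is_teenager") && col("has_positive_mood"), "have a chat")
    )
}

object ExampleTransformsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("example-transforms-sketch")
      .getOrCreate()
    import spark.implicits._

    // Rows copied from the age and mood columns of the output table.
    val df = Seq((30, "happy"), (13, "sad"), (18, "glad")).toDF("age", "mood")

    df
      .transform(ExampleTransforms.withIsTeenager())
      .transform(ExampleTransforms.withHasPositiveMood())
      .transform(ExampleTransforms.withWhatToDo())
      .show()

    spark.stop()
  }
}
```

Each method takes an empty first parameter list and the DataFrame in a second one, which is what lets it be passed to transform. Note that withWhatToDo only works after the other two have run, because it reads the columns they add.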