Json To Dataframe

Uploaded by Rajashekar M

1. Read a JSON file into a DataFrame

Sample Data:

{"ID":1,"NAME":"Jourdan","GENDER":"Female","DOB":"2012-01-01","SALARY":82445.63,"NRI":null}
{"ID":2,"NAME":"Alvera","GENDER":"Female","DOB":"2023-08-08","SALARY":75985.14,"NRI":true}
{"ID":3,"NAME":"Chauncey","GENDER":"Male","DOB":"2010-09-17","SALARY":81600.32,"NRI":null}
{"ID":4,"NAME":"Karrie","GENDER":"Female","DOB":"2024-02-28","SALARY":93889.24,"NRI":null}
{"ID":5,"NAME":"Phil","GENDER":"Female","DOB":"2022-06-06","SALARY":99743.67,"NRI":true}

1.1 Read JSON file without specifying the schema

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("Reading JSON").getOrCreate()

val inputDF = spark.read
  .json("C:\\Users\\RECVUE-1162\\Desktop\\JSON_POC\\Simple_Multi_line.json")

println("Show DataFrame schema and data")
inputDF.printSchema()

println("inputDF:")
inputDF.show(false)

Output:

Show DataFrame schema and data


root
|-- DOB: string (nullable = true) // Spark does not infer the date type when no schema is specified
|-- GENDER: string (nullable = true)
|-- ID: long (nullable = true)
|-- NAME: string (nullable = true)
|-- NRI: boolean (nullable = true)
|-- SALARY: double (nullable = true)

inputDF:
+----------+------+---+--------+----+--------+
|DOB |GENDER|ID |NAME |NRI |SALARY |
+----------+------+---+--------+----+--------+
|2012-01-01|Female|1 |Jourdan |null|82445.63|
|2023-08-08|Female|2 |Alvera |true|75985.14|
|2010-09-17|Male |3 |Chauncey|null|81600.32|
|2024-02-28|Female|4 |Karrie |null|93889.24|
|2022-06-06|Female|5 |Phil |true|99743.67|
+----------+------+---+--------+----+--------+
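Although the inferred DOB column stays a string, the values are ISO-8601 (yyyy-MM-dd), so they convert cleanly to dates after the read; in Spark you would typically cast with `to_date(col("DOB"))`. A minimal plain-Scala sketch (no Spark) of the same parse, using a value from the sample data:

```scala
import java.time.LocalDate

// The DOB strings in the sample data are ISO-8601 (yyyy-MM-dd),
// which is exactly the format LocalDate.parse expects by default.
val dob = LocalDate.parse("2012-01-01")
println(dob.getYear)       // 2012
println(dob.getMonthValue) // 1
```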

1.2 Read JSON file with a user-specified schema

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local").appName("Reading JSON").getOrCreate()

val schema = StructType(
  Array(
    StructField("ID", IntegerType),
    StructField("NAME", StringType),
    StructField("GENDER", StringType),
    StructField("DOB", DateType),
    StructField("SALARY", DoubleType),
    StructField("NRI", BooleanType)
  )
)

val inputDF = spark.read
  .schema(schema)
  .json("C:\\Users\\RECVUE-1162\\Desktop\\JSON_POC\\Single_Line.json")

println("Show DataFrame schema and data")
inputDF.printSchema()

println("inputDF:")
inputDF.show(false)

Output:


root
|-- ID: integer (nullable = true)
|-- NAME: string (nullable = true)
|-- GENDER: string (nullable = true)
|-- DOB: date (nullable = true) // the date type comes from the user-specified schema
|-- SALARY: double (nullable = true)
|-- NRI: boolean (nullable = true)

inputDF:
+---+--------+------+----------+--------+----+
|ID |NAME |GENDER|DOB |SALARY |NRI |
+---+--------+------+----------+--------+----+
|1 |Jourdan |Female|2012-01-01|82445.63|null|
|2 |Alvera |Female|2023-08-08|75985.14|true|
|3 |Chauncey|Male |2010-09-17|81600.32|null|
|4 |Karrie |Female|2024-02-28|93889.24|null|
|5 |Phil |Female|2022-06-06|99743.67|true|
+---+--------+------+----------+--------+----+

2. Read a multiline JSON file

Sample Data:

[{"ID":1,
"NAME":"Jourdan",
"GENDER":"Female",
"DOB":"2012-01-01",
"SALARY":82445.63,
"NRI":null
},
{"ID":2,
"NAME":"Alvera",
"GENDER":"Female",
"DOB":"2023-08-08",
"SALARY":75985.14,
"NRI":true
},
{"ID":3,
"NAME":"Chauncey",
"GENDER":"Male",
"DOB":"2010-09-17",
"SALARY":81600.32,
"NRI":null
},
{"ID":4,
"NAME":"Karrie",
"GENDER":"Female",
"DOB":"2024-02-28",
"SALARY":93889.24,
"NRI":null
},
{"ID":5,
"NAME":"Phil",
"GENDER":"Female",
"DOB":"2022-06-06",
"SALARY":99743.67,
"NRI":true
}]

Code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local").appName("Reading JSON").getOrCreate()

val schema = StructType(
  Array(
    StructField("ID", IntegerType),
    StructField("NAME", StringType),
    StructField("GENDER", StringType),
    StructField("DOB", DateType),
    StructField("SALARY", DoubleType),
    StructField("NRI", BooleanType)
  )
)

val inputDF = spark.read
  .schema(schema)
  .option("multiline", "true")
  .json("C:\\Users\\RECVUE-1162\\Desktop\\JSON_POC\\Simple_Multi_line.json")

println("Show DataFrame schema and data")
inputDF.printSchema()

println("inputDF:")
inputDF.show(false)

Output:

Show DataFrame schema and data


root
|-- ID: integer (nullable = true)
|-- NAME: string (nullable = true)
|-- GENDER: string (nullable = true)
|-- DOB: date (nullable = true)
|-- SALARY: double (nullable = true)
|-- NRI: boolean (nullable = true)

inputDF:
+---+--------+------+----------+--------+----+
|ID |NAME    |GENDER|DOB       |SALARY  |NRI |
+---+--------+------+----------+--------+----+
|1  |Jourdan |Female|2012-01-01|82445.63|null|
|2  |Alvera  |Female|2023-08-08|75985.14|true|
|3  |Chauncey|Male  |2010-09-17|81600.32|null|
|4  |Karrie  |Female|2024-02-28|93889.24|null|
|5  |Phil    |Female|2022-06-06|99743.67|true|
+---+--------+------+----------+--------+----+
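By default Spark's JSON reader expects JSON Lines: one complete JSON document per line. The `multiline` option is needed here because the whole file is a single JSON array spread over many lines. A plain-Scala sketch (hypothetical strings, no Spark) of the structural difference:

```scala
// JSON Lines: each line is an independent record, so splitting on
// newlines recovers the record boundaries directly.
val jsonLines =
  """{"ID":1,"NAME":"Jourdan"}
    |{"ID":2,"NAME":"Alvera"}""".stripMargin
println(jsonLines.split("\n").length) // 2 lines = 2 records

// A pretty-printed array spreads one document across many lines;
// newline splitting yields fragments, not records, hence multiline=true.
val multiline =
  """[{"ID":1,
    |"NAME":"Jourdan"}]""".stripMargin
println(multiline.split("\n").length) // 2 lines, but only 1 JSON document
```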

3. Read nested JSON data into a DataFrame

Sample Data:

[
{
"ID": 2,
"NAME": "Jane Smith",
"AGE": 35,
"HEIGHT": 5.6,
"WEIGHT": 155.0,
"IS_STUDENT": false,
"DOB": "1987-09-20",
"ADDRESS": {
"STREET": "456 Oak St",
"CITY": "Othertown",
"STATE": "CA",
"ZIPCODE": "54321"
},
"GRADES": [75, 85, 90],
"SALARY": 85000.75,
"IS_MANAGER": true
},
{
"ID": 3,
"NAME": "Alice Johnson",
"AGE": 28,
"HEIGHT": 5.4,
"WEIGHT": 140.0,
"IS_STUDENT": true,
"DOB": "1993-03-10",
"ADDRESS": {
"STREET": "789 Pine St",
"CITY": "Smalltown",
"STATE": "TX",
"ZIPCODE": "67890"
},
"GRADES": [90, 95, 100],
"SALARY": 65000.25,
"IS_MANAGER": false
},
{
"ID": 4,
"NAME": "Robert Brown",
"AGE": 40,
"HEIGHT": 6.0,
"WEIGHT": 180.0,
"IS_STUDENT": false,
"DOB": "1982-12-05",
"ADDRESS": {
"STREET": "101 Elm St",
"CITY": "Villagetown",
"STATE": "IL",
"ZIPCODE": "98765"
},
"GRADES": [80, 85, 90],
"SALARY": 90000.00,
"IS_MANAGER": true
},
{
"ID": 5,
"NAME": "Emily Lee",
"AGE": 25,
"HEIGHT": 5.8,
"WEIGHT": 160.0,
"IS_STUDENT": true,
"DOB": "1996-07-08",
"ADDRESS": {
"STREET": "321 Maple St",
"CITY": "Hometown",
"STATE": "FL",
"ZIPCODE": "54321"
},
"GRADES": [95, 95, 95],
"SALARY": 60000.50,
"IS_MANAGER": false
},
{
"ID": 6,
"NAME": "Michael Davis",
"AGE": 45,
"HEIGHT": 6.2,
"WEIGHT": 190.0,
"IS_STUDENT": false,
"DOB": "1977-11-15",
"ADDRESS": {
"STREET": "567 Cedar St",
"CITY": "Mountainview",
"STATE": "CA",
"ZIPCODE": "12345"
},
"GRADES": [70, 75, 80],
"SALARY": 100000.00,
"IS_MANAGER": true
}
]

Code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType

val spark = SparkSession.builder().master("local").appName("Reading JSON").getOrCreate()

var inputDF = spark.read
  .option("multiline", "true")
  .option("inferSchema", "true")
  .json("C:\\Users\\RECVUE-1162\\Desktop\\JSON_POC\\Nested_Data.json")

inputDF = inputDF.withColumn("DOB", col("DOB").cast(DateType))

println("Show DataFrame schema and data")
inputDF.printSchema()

println("inputDF:")
inputDF.show(false)

println("Splitting nested fields in ADDRESS Column")
val splitAddressDF = inputDF
  .selectExpr("ID", "NAME", "DOB", "AGE", "SALARY", "HEIGHT", "WEIGHT", "IS_MANAGER", "GRADES", "ADDRESS.*")
splitAddressDF.printSchema()
splitAddressDF.show(false)

println("Exploding GRADES Array into separate rows")
val explodedDF = splitAddressDF.withColumn("GRADE", explode(col("GRADES"))).drop("GRADES")
explodedDF.show(false)
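explode() emits one output row per array element, duplicating all other columns of the row. Conceptually it is a flatMap; a plain-Scala sketch with a hypothetical miniature of the data:

```scala
// Hypothetical miniature of splitAddressDF: one record per person,
// each carrying an array of grades.
case class Person(id: Int, grades: Seq[Int])

val people = Seq(Person(2, Seq(75, 85, 90)), Person(3, Seq(90, 95, 100)))

// What explode(col("GRADES")) does, expressed as a flatMap:
// every (person, grade) pair becomes its own row.
val exploded = people.flatMap(p => p.grades.map(g => (p.id, g)))
println(exploded) // List((2,75), (2,85), (2,90), (3,90), (3,95), (3,100))
```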

Output:

Show DataFrame schema and data


root
|-- ADDRESS: struct (nullable = true)
| |-- CITY: string (nullable = true)
| |-- STATE: string (nullable = true)
| |-- STREET: string (nullable = true)
| |-- ZIPCODE: string (nullable = true)
|-- AGE: long (nullable = true)
|-- DOB: date (nullable = true)
|-- GRADES: array (nullable = true)
| |-- element: long (containsNull = true)
|-- HEIGHT: double (nullable = true)
|-- ID: long (nullable = true)
|-- IS_MANAGER: boolean (nullable = true)
|-- IS_STUDENT: boolean (nullable = true)
|-- NAME: string (nullable = true)
|-- SALARY: double (nullable = true)
|-- WEIGHT: double (nullable = true)
inputDF:
+---------------------------------------+---+----------+-------------+------+--+----------+----------+-------------+--------+------+
|ADDRESS                                |AGE|DOB       |GRADES       |HEIGHT|ID|IS_MANAGER|IS_STUDENT|NAME         |SALARY  |WEIGHT|
+---------------------------------------+---+----------+-------------+------+--+----------+----------+-------------+--------+------+
|{Othertown, CA, 456 Oak St, 54321}     |35 |1987-09-20|[75, 85, 90] |5.6   |2 |true      |false     |Jane Smith   |85000.75|155.0 |
|{Smalltown, TX, 789 Pine St, 67890}    |28 |1993-03-10|[90, 95, 100]|5.4   |3 |false     |true      |Alice Johnson|65000.25|140.0 |
|{Villagetown, IL, 101 Elm St, 98765}   |40 |1982-12-05|[80, 85, 90] |6.0   |4 |true      |false     |Robert Brown |90000.0 |180.0 |
|{Hometown, FL, 321 Maple St, 54321}    |25 |1996-07-08|[95, 95, 95] |5.8   |5 |false     |true      |Emily Lee    |60000.5 |160.0 |
|{Mountainview, CA, 567 Cedar St, 12345}|45 |1977-11-15|[70, 75, 80] |6.2   |6 |true      |false     |Michael Davis|100000.0|190.0 |
+---------------------------------------+---+----------+-------------+------+--+----------+----------+-------------+--------+------+

Splitting nested fields in ADDRESS Column


root
|-- ID: long (nullable = true)
|-- NAME: string (nullable = true)
|-- DOB: date (nullable = true)
|-- AGE: long (nullable = true)
|-- SALARY: double (nullable = true)
|-- HEIGHT: double (nullable = true)
|-- WEIGHT: double (nullable = true)
|-- IS_MANAGER: boolean (nullable = true)
|-- GRADES: array (nullable = true)
| |-- element: long (containsNull = true)
|-- CITY: string (nullable = true)
|-- STATE: string (nullable = true)
|-- STREET: string (nullable = true)
|-- ZIPCODE: string (nullable = true)

+---+-------------+----------+---+--------+------+------+----------+-------------+------------+-----+------------+-------+
|ID |NAME         |DOB       |AGE|SALARY  |HEIGHT|WEIGHT|IS_MANAGER|GRADES       |CITY        |STATE|STREET      |ZIPCODE|
+---+-------------+----------+---+--------+------+------+----------+-------------+------------+-----+------------+-------+
|2  |Jane Smith   |1987-09-20|35 |85000.75|5.6   |155.0 |true      |[75, 85, 90] |Othertown   |CA   |456 Oak St  |54321  |
|3  |Alice Johnson|1993-03-10|28 |65000.25|5.4   |140.0 |false     |[90, 95, 100]|Smalltown   |TX   |789 Pine St |67890  |
|4  |Robert Brown |1982-12-05|40 |90000.0 |6.0   |180.0 |true      |[80, 85, 90] |Villagetown |IL   |101 Elm St  |98765  |
|5  |Emily Lee    |1996-07-08|25 |60000.5 |5.8   |160.0 |false     |[95, 95, 95] |Hometown    |FL   |321 Maple St|54321  |
|6  |Michael Davis|1977-11-15|45 |100000.0|6.2   |190.0 |true      |[70, 75, 80] |Mountainview|CA   |567 Cedar St|12345  |
+---+-------------+----------+---+--------+------+------+----------+-------------+------------+-----+------------+-------+

Exploding GRADES Array into separate rows


+---+-------------+----------+---+--------+------+------+----------+------------+-----+------------+-------+-----+
|ID |NAME         |DOB       |AGE|SALARY  |HEIGHT|WEIGHT|IS_MANAGER|CITY        |STATE|STREET      |ZIPCODE|GRADE|
+---+-------------+----------+---+--------+------+------+----------+------------+-----+------------+-------+-----+
|2  |Jane Smith   |1987-09-20|35 |85000.75|5.6   |155.0 |true      |Othertown   |CA   |456 Oak St  |54321  |75   |
|2  |Jane Smith   |1987-09-20|35 |85000.75|5.6   |155.0 |true      |Othertown   |CA   |456 Oak St  |54321  |85   |
|2  |Jane Smith   |1987-09-20|35 |85000.75|5.6   |155.0 |true      |Othertown   |CA   |456 Oak St  |54321  |90   |
|3  |Alice Johnson|1993-03-10|28 |65000.25|5.4   |140.0 |false     |Smalltown   |TX   |789 Pine St |67890  |90   |
|3  |Alice Johnson|1993-03-10|28 |65000.25|5.4   |140.0 |false     |Smalltown   |TX   |789 Pine St |67890  |95   |
|3  |Alice Johnson|1993-03-10|28 |65000.25|5.4   |140.0 |false     |Smalltown   |TX   |789 Pine St |67890  |100  |
|4  |Robert Brown |1982-12-05|40 |90000.0 |6.0   |180.0 |true      |Villagetown |IL   |101 Elm St  |98765  |80   |
|4  |Robert Brown |1982-12-05|40 |90000.0 |6.0   |180.0 |true      |Villagetown |IL   |101 Elm St  |98765  |85   |
|4  |Robert Brown |1982-12-05|40 |90000.0 |6.0   |180.0 |true      |Villagetown |IL   |101 Elm St  |98765  |90   |
|5  |Emily Lee    |1996-07-08|25 |60000.5 |5.8   |160.0 |false     |Hometown    |FL   |321 Maple St|54321  |95   |
|5  |Emily Lee    |1996-07-08|25 |60000.5 |5.8   |160.0 |false     |Hometown    |FL   |321 Maple St|54321  |95   |
|5  |Emily Lee    |1996-07-08|25 |60000.5 |5.8   |160.0 |false     |Hometown    |FL   |321 Maple St|54321  |95   |
|6  |Michael Davis|1977-11-15|45 |100000.0|6.2   |190.0 |true      |Mountainview|CA   |567 Cedar St|12345  |70   |
|6  |Michael Davis|1977-11-15|45 |100000.0|6.2   |190.0 |true      |Mountainview|CA   |567 Cedar St|12345  |75   |
|6  |Michael Davis|1977-11-15|45 |100000.0|6.2   |190.0 |true      |Mountainview|CA   |567 Cedar St|12345  |80   |
+---+-------------+----------+---+--------+------+------+----------+------------+-----+------------+-------+-----+

4. Read nested JSON (structs and nested arrays) into a DataFrame

Sample Data:

{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}

Code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local").appName("Reading JSON").getOrCreate()

val schema = StructType(
  Array(
    StructField("id", StringType),
    StructField("type", StringType),
    StructField("name", StringType),
    StructField("ppu", DoubleType),
    StructField("batters", StructType(
      Array(
        StructField("batter", ArrayType(StructType(
          Array(
            StructField("id", StringType),
            StructField("type", StringType)
          )
        )))
      )
    )),
    StructField("topping", ArrayType(StructType(
      Array(
        StructField("id", StringType),
        StructField("type", StringType)
      )
    )))
  )
)

val inputDF = spark.read
  .schema(schema)
  .option("multiline", "true")
  .json("C:\\Users\\RECVUE-1162\\Desktop\\JSON_POC\\DONUT_JSON.json")

println("Show DataFrame schema and data")
inputDF.printSchema()

println("inputDF:")
inputDF.show(false)

val sampleDF = inputDF.withColumnRenamed("id", "key")

println("Creating a separate row for each element of the batter array by exploding it, then extracting the individual fields from the new_batter struct")
val finalBatDF = sampleDF
  .select(col("key"), explode(col("batters.batter")).alias("new_batter"))
  .select("key", "new_batter.*")
  .withColumnRenamed("id", "bat_id")
  .withColumnRenamed("type", "bat_type")
finalBatDF.show(false)
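Selecting `new_batter.*` flattens a struct column into one top-level column per struct field. A plain-Scala analogy, with a hypothetical case class standing in for the struct:

```scala
// Hypothetical stand-in for the new_batter struct column.
case class Batter(id: String, tpe: String)

val rows = Seq(("0001", Batter("1001", "Regular")), ("0001", Batter("1002", "Chocolate")))

// select("key", "new_batter.*") promotes each struct field to its own
// column, which here is just unpacking the nested value into a flat tuple.
val flat = rows.map { case (key, b) => (key, b.id, b.tpe) }
println(flat) // List((0001,1001,Regular), (0001,1002,Chocolate))
```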

println("Convert the nested topping array to a structured DataFrame")
val topDF = sampleDF
  .select(col("key"), explode(col("topping")).alias("new_topping"))
  .select("key", "new_topping.*")
  .withColumnRenamed("id", "top_id")
  .withColumnRenamed("type", "top_type")
topDF.show(false)

println("Explode the batters array")
val explodedBattersDF = inputDF.select(
  col("id"), col("type"), col("name"), col("ppu"),
  explode(col("batters.batter")).as("batter"), col("topping"))
println("explodedBattersDF:")
explodedBattersDF.show(100, false)

println("Explode the topping array")
val explodedToppingDF = explodedBattersDF.select(
  col("id"), col("type"), col("name"), col("ppu"),
  col("batter.id").as("batter_id"), col("batter.type").as("batter_type"),
  explode(col("topping")).as("topping"))
println("explodedToppingDF:")
explodedToppingDF.show(100, false)

println("Select the desired columns to form the complete DataFrame")
val completeDF = explodedToppingDF.select(
  col("id"), col("type"), col("name"), col("ppu"),
  col("batter_id"), col("batter_type"),
  col("topping.id").as("topping_id"), col("topping.type").as("topping_type"))
completeDF.show(100, false)

Output:

Show DataFrame schema and data


root
|-- id: string (nullable = true)
|-- type: string (nullable = true)
|-- name: string (nullable = true)
|-- ppu: double (nullable = true)
|-- batters: struct (nullable = true)
| |-- batter: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- id: string (nullable = true)
| | | |-- type: string (nullable = true)
|-- topping: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
| | |-- type: string (nullable = true)

inputDF:
+----+-----+----+----+---------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
|id  |type |name|ppu |batters                                                  |topping                                                                                                                                  |
+----+-----+----+----+---------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
|0001|donut|Cake|0.55|{[{1001, Regular}, {1002, Chocolate}, {1003, Blueberry}]}|[{5001, None}, {5002, Glazed}, {5005, Sugar}, {5007, Powdered Sugar}, {5006, Chocolate with Sprinkles}, {5003, Chocolate}, {5004, Maple}]|
+----+-----+----+----+---------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+

Creating a separate row for each element of the batter array by exploding it, then extracting the individual fields from the new_batter struct
+----+------+---------+
|key |bat_id|bat_type |
+----+------+---------+
|0001|1001 |Regular |
|0001|1002 |Chocolate|
|0001|1003 |Blueberry|
+----+------+---------+

Convert the nested topping array to a structured DataFrame


+----+------+------------------------+
|key |top_id|top_type |
+----+------+------------------------+
|0001|5001 |None |
|0001|5002 |Glazed |
|0001|5005 |Sugar |
|0001|5007 |Powdered Sugar |
|0001|5006 |Chocolate with Sprinkles|
|0001|5003 |Chocolate |
|0001|5004 |Maple |
+----+------+------------------------+

Explode the batters array


explodedBattersDF:
+----+-----+----+----+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
|id  |type |name|ppu |batter           |topping                                                                                                                                  |
+----+-----+----+----+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
|0001|donut|Cake|0.55|{1001, Regular}  |[{5001, None}, {5002, Glazed}, {5005, Sugar}, {5007, Powdered Sugar}, {5006, Chocolate with Sprinkles}, {5003, Chocolate}, {5004, Maple}]|
|0001|donut|Cake|0.55|{1002, Chocolate}|[{5001, None}, {5002, Glazed}, {5005, Sugar}, {5007, Powdered Sugar}, {5006, Chocolate with Sprinkles}, {5003, Chocolate}, {5004, Maple}]|
|0001|donut|Cake|0.55|{1003, Blueberry}|[{5001, None}, {5002, Glazed}, {5005, Sugar}, {5007, Powdered Sugar}, {5006, Chocolate with Sprinkles}, {5003, Chocolate}, {5004, Maple}]|
+----+-----+----+----+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------+
Explode the topping array
explodedToppingDF:
+----+-----+----+----+---------+-----------+--------------------------------+
|id |type |name|ppu |batter_id|batter_type|topping |
+----+-----+----+----+---------+-----------+--------------------------------+
|0001|donut|Cake|0.55|1001 |Regular |{5001, None} |
|0001|donut|Cake|0.55|1001 |Regular |{5002, Glazed} |
|0001|donut|Cake|0.55|1001 |Regular |{5005, Sugar} |
|0001|donut|Cake|0.55|1001 |Regular |{5007, Powdered Sugar} |
|0001|donut|Cake|0.55|1001 |Regular |{5006, Chocolate with Sprinkles}|
|0001|donut|Cake|0.55|1001 |Regular |{5003, Chocolate} |
|0001|donut|Cake|0.55|1001 |Regular |{5004, Maple} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5001, None} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5002, Glazed} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5005, Sugar} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5007, Powdered Sugar} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5006, Chocolate with Sprinkles}|
|0001|donut|Cake|0.55|1002 |Chocolate |{5003, Chocolate} |
|0001|donut|Cake|0.55|1002 |Chocolate |{5004, Maple} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5001, None} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5002, Glazed} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5005, Sugar} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5007, Powdered Sugar} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5006, Chocolate with Sprinkles}|
|0001|donut|Cake|0.55|1003 |Blueberry |{5003, Chocolate} |
|0001|donut|Cake|0.55|1003 |Blueberry |{5004, Maple} |
+----+-----+----+----+---------+-----------+--------------------------------+

Select the desired columns to form the complete DataFrame


+----+-----+----+----+---------+-----------+----------+------------------------+
|id |type |name|ppu |batter_id|batter_type|topping_id|topping_type |
+----+-----+----+----+---------+-----------+----------+------------------------+
|0001|donut|Cake|0.55|1001 |Regular |5001 |None |
|0001|donut|Cake|0.55|1001 |Regular |5002 |Glazed |
|0001|donut|Cake|0.55|1001 |Regular |5005 |Sugar |
|0001|donut|Cake|0.55|1001 |Regular |5007 |Powdered Sugar |
|0001|donut|Cake|0.55|1001 |Regular |5006 |Chocolate with Sprinkles|
|0001|donut|Cake|0.55|1001 |Regular |5003 |Chocolate |
|0001|donut|Cake|0.55|1001 |Regular |5004 |Maple |
|0001|donut|Cake|0.55|1002 |Chocolate |5001 |None |
|0001|donut|Cake|0.55|1002 |Chocolate |5002 |Glazed |
|0001|donut|Cake|0.55|1002 |Chocolate |5005 |Sugar |
|0001|donut|Cake|0.55|1002 |Chocolate |5007 |Powdered Sugar |
|0001|donut|Cake|0.55|1002 |Chocolate |5006 |Chocolate with Sprinkles|
|0001|donut|Cake|0.55|1002 |Chocolate |5003 |Chocolate |
|0001|donut|Cake|0.55|1002 |Chocolate |5004 |Maple |
|0001|donut|Cake|0.55|1003 |Blueberry |5001 |None |
|0001|donut|Cake|0.55|1003 |Blueberry |5002 |Glazed |
|0001|donut|Cake|0.55|1003 |Blueberry |5005 |Sugar |
|0001|donut|Cake|0.55|1003 |Blueberry |5007 |Powdered Sugar |
|0001|donut|Cake|0.55|1003 |Blueberry |5006 |Chocolate with Sprinkles|
|0001|donut|Cake|0.55|1003 |Blueberry |5003 |Chocolate |
|0001|donut|Cake|0.55|1003 |Blueberry |5004 |Maple |
+----+-----+----+----+---------+-----------+----------+------------------------+
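Because the two explodes are applied in sequence, every batter row is paired with every topping: 3 batters × 7 toppings = 21 rows. The row count can be sketched in plain Scala as a double iteration (a cartesian product):

```scala
// Hypothetical miniatures of the two arrays in the donut record.
val batters = Seq("Regular", "Chocolate", "Blueberry")
val toppings = Seq("None", "Glazed", "Sugar", "Powdered Sugar",
  "Chocolate with Sprinkles", "Chocolate", "Maple")

// explode(batter) followed by explode(topping) pairs every batter
// with every topping, just like this nested for-comprehension.
val pairs = for (b <- batters; t <- toppings) yield (b, t)
println(pairs.length) // 21 = 3 * 7
```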
