Spark Scala Interview Question

The document presents a Spark Scala interview question attributed to Airbnb: given user search data, find the date with the maximum number of room types searched. The approach is to split the room type column into an array, explode the array into separate rows, group by date, collect the room types, and count them to find the date with the highest number of searches.

Uploaded by

seenu0104
Spark Scala Interview Question (Airbnb)

Given the data below:


+---------+---------------+----------------------------+
| user_id | date_searched | filter_room_types          |
+---------+---------------+----------------------------+
| 1       | "2022-01-01"  | "entire home,private room" |
| 2       | "2022-01-02"  | "entire home,shared room"  |
| 3       | "2022-01-02"  | "private room,shared room" |
| 4       | "2022-01-03"  | "private room"             |
+---------+---------------+----------------------------+

Question: Find the date on which the maximum number of rooms was searched, along with the count.

#key #takeaways

✅ collect_list: to get a list of the items in a group.

✅ explode: to get collection items on separate rows.

✅ create a Date column while making the DataFrame:
⇒ java.sql.Date.valueOf("2022-01-01")
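
The Date takeaway matters because a Row-based DataFrame with a DateType field expects a java.sql.Date value rather than a plain String. A small sketch of how Date.valueOf behaves is shown here; the object name DateValueOfSketch and the use of Try are illustrative assumptions, not part of the original post.

```scala
import java.sql.Date
import scala.util.Try

// Hypothetical helper object, used only to illustrate Date.valueOf.
object DateValueOfSketch {
  // Date.valueOf expects the JDBC escape format yyyy-[m]m-[d]d.
  val parsed: Date = Date.valueOf("2022-01-01")

  // Any other layout throws IllegalArgumentException,
  // so untrusted input is wrapped in Try here.
  val rejected: Option[Date] = Try(Date.valueOf("01/01/2022")).toOption

  def main(args: Array[String]): Unit = {
    println(parsed)   // prints 2022-01-01
    println(rejected) // prints None
  }
}
```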

#Approach:
1. split: the filter_room_types column to get an Array of rooms
split(col("filter_room_types"), ",")

2. explode: the Array to get a separate row for each room
explode(split(col("filter_room_types"), ","))

3. groupBy: date_searched and then aggregate using:
⇒ collect_list to get all the rooms in a list
⇒ count to get the number of rooms searched (the same value as the size of the collected list)

4. orderBy("count") descending and limit 1 to get the row with the highest number of rooms searched.
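
Before running anything on Spark, the same four steps can be checked against the sample data with plain Scala collections. This is only a sanity-check sketch: flatMap stands in for explode, and the object and value names (RoomSearchPlainScala, roomsByDate, bestDate, bestRooms) are made up for illustration.

```scala
// Plain Scala collections version of the approach; no Spark required.
object RoomSearchPlainScala {
  // Sample rows from the question: (user_id, date_searched, filter_room_types)
  val rows: Seq[(Int, String, String)] = Seq(
    (1, "2022-01-01", "entire home,private room"),
    (2, "2022-01-02", "entire home,shared room"),
    (3, "2022-01-02", "private room,shared room"),
    (4, "2022-01-03", "private room")
  )

  // Steps 1 and 2: split each string, then flatten to one (date, room) pair per room.
  val exploded: Seq[(String, String)] =
    rows.flatMap { case (_, date, rooms) =>
      rooms.split(",").toSeq.map(room => (date, room))
    }

  // Step 3: group by date and keep the list of rooms for each date.
  val roomsByDate: Map[String, Seq[String]] =
    exploded.groupBy(_._1).map { case (date, pairs) => date -> pairs.map(_._2) }

  // Step 4: pick the date whose list is largest.
  val (bestDate, bestRooms) = roomsByDate.maxBy { case (_, rooms) => rooms.size }

  def main(args: Array[String]): Unit =
    println(s"$bestDate -> ${bestRooms.size} rooms")
}
```

For the sample data, 2022-01-02 has four room entries (two searches with two room types each), so that is the answer the Spark job should also return.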

#Spark #Scala Solution:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{explode, split, col, count, collect_list}
import java.sql.Date

object AirBnB_Room_Type_Search {
  // "yarn" needs a cluster; use "local[*]" to run on a single machine.
  private val spark = SparkSession.builder
    .appName("AirBnB_Room_Type_Search")
    .master("yarn")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    // Explicit schema so that date_searched is a real DateType column.
    val schema = new StructType(
      Array(
        StructField("user_id", IntegerType, nullable = true),
        StructField("date_searched", DateType, nullable = true),
        StructField("filter_room_types", StringType, nullable = true)
      )
    )

    val data = Seq(
      Row(1, Date.valueOf("2022-01-01"), "entire home,private room"),
      Row(2, Date.valueOf("2022-01-02"), "entire home,shared room"),
      Row(3, Date.valueOf("2022-01-02"), "private room,shared room"),
      Row(4, Date.valueOf("2022-01-03"), "private room")
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

    // Steps 1 and 2: split the comma-separated string and emit one row per room.
    val explodedDF = df.withColumn("filter_room_types",
      explode(split(col("filter_room_types"), ",")))

    // Steps 3 and 4: aggregate per date, then keep the date with the highest count.
    val resultDF = explodedDF.groupBy("date_searched")
      .agg(
        collect_list("filter_room_types").as("All Rooms"),
        count("filter_room_types").as("count")
      )
      .orderBy(col("count").desc).limit(1)

    resultDF.show(truncate = false)
  }
}
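
If only the count is needed and not the list of rooms, the explode step can be skipped by summing the size of each split array per date. The sketch below is an alternative to the solution above, not the original author's method; the object name, the helper roomCountsByDate, the intermediate column n_rooms, and the local[*] master are assumptions made for illustration.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size, split, sum}

// Hypothetical alternative: count rooms per date without explode.
object RoomTypeSearchNoExplode {
  // Adds the number of room types per row, then sums it per date.
  def roomCountsByDate(df: DataFrame): DataFrame =
    df.withColumn("n_rooms", size(split(col("filter_room_types"), ",")))
      .groupBy("date_searched")
      .agg(sum("n_rooms").as("count"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, so the sketch runs without a cluster.
    val spark = SparkSession.builder
      .appName("RoomTypeSearchNoExplode")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, Date.valueOf("2022-01-01"), "entire home,private room"),
      (2, Date.valueOf("2022-01-02"), "entire home,shared room"),
      (3, Date.valueOf("2022-01-02"), "private room,shared room"),
      (4, Date.valueOf("2022-01-03"), "private room")
    ).toDF("user_id", "date_searched", "filter_room_types")

    // Same final step as the original: highest count first, keep one row.
    roomCountsByDate(df).orderBy(col("count").desc).limit(1).show(truncate = false)

    spark.stop()
  }
}
```

In both versions, limit(1) keeps a single row, so if two dates tie for the highest count only one of them is shown.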

#credits
Ankit Bansal
OnlineLearningCenter
Suraz G.
