Spark Handbook

The document outlines various data engineering scenarios involving real-time data processing and aggregation using Spark. It includes examples of handling IoT sensor data, log data, user transactions with skew, and nested JSON datasets, along with common mistakes to avoid. Additionally, it promotes Prominent Academy's services for preparing candidates for data engineering interviews through mock interviews and personalized coaching.

Data engineer

Afrin Ahamed
Question:

You are given a stream of IoT sensor data with columns: sensor_id, timestamp, and value. Detect sensors with values exceeding a threshold (e.g., 100) in real-time.

Explanation:
Read streaming data from Kafka.
Parse the data and filter sensors with values exceeding the threshold.
Write the output to the console.

Question:

You are given a stream of log data with columns: timestamp, user_id, and action. Calculate the count of actions per user in real-time.

Explanation:
Read streaming data from Kafka.
Parse the data and group by user_id and a 1-minute window.
Write the output to the console.

Common Mistakes:
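Not defining the window correctly, for example grouping by user_id alone instead of by user_id and a window on the event timestamp.
Forgetting to start the streaming query with start(), or omitting awaitTermination() so the driver exits before any batch is processed.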

Question:

You are given a dataset of user transactions with columns: user_id, transaction_id, and amount. The dataset is heavily skewed on the user_id column. Calculate the total transaction amount per user while handling the skew.

Explanation:
Add a random salt to the skewed key (user_id) to distribute the data evenly.
Perform the first aggregation on the salted key.
Remove the salt and perform a second aggregation to get the final result.

Common Mistakes:
Not addressing data skew, leading to slow performance.
Forgetting to remove the salt in the final aggregation.

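Question:

You are given a nested JSON dataset with the following structure:

[Structure not preserved in this copy. The explanation refers to order_id, a nested items array with quantity and price, and payment.method.]

Write a Spark job to extract the following:
Total revenue per order.
Payment method for each order.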

Explanation:
Use explode to flatten the nested items array.
Calculate revenue for each item by multiplying quantity and price.
Group by order_id and payment.method to aggregate the total revenue.

Common Mistakes:
Not using explode to handle nested arrays.
Forgetting to include payment.method in the groupBy.

Think your skills are enough?

Think again. Scenario-based questions like these could cost you your data engineering job.

In recent interviews at several large MNCs, one of our students faced scenario-based data engineering questions that many candidates struggle to answer correctly. These questions are designed to test your real-world knowledge and your ability to solve complex data engineering problems.

Unfortunately, many students could not answer these questions confidently. Preparation is key, and that's where Prominent Academy comes in!

We specialize in preparing you for Spark and data engineering interviews by:
✅ Offering scenario-based mock interviews
✅ Providing hands-on training with data engineering features
✅ Optimizing your resume & LinkedIn profile
✅ Giving personalized interview coaching to ensure you're job-ready

Don't leave your future to chance!
