Spark Handbook
AFRIN AHAMED
Question:
Explanation:
Read streaming data from Kafka.
Parse the data and filter sensors with values exceeding the threshold.
Write the output to the console.
Question:
Explanation:
Read streaming data from Kafka.
Parse the data and group by user_id and a 1-minute window.
Write the output to the console.
Common Mistakes:
Not defining the window correctly.
Forgetting to start the streaming query with start() and keep it running with awaitTermination().
Question:
Explanation:
Add a random salt to the skewed key (user_id) to distribute the data
evenly.
Perform the first aggregation on the salted key.
Remove the salt and perform a second aggregation to get the final
result.
Common Mistakes:
Not addressing data skew, leading to slow performance.
Forgetting to remove the salt in the final aggregation.
Question:
Explanation:
Use explode to flatten the nested items array.
Calculate revenue for each item by multiplying quantity and price.
Group by order_id and payment.method to aggregate the total
revenue.
Common Mistakes:
Not using explode to handle nested arrays.
Forgetting to include payment.method in the groupBy.
Think your skills are enough?
❌
Think again: these scenario-based data engineering
questions could cost you your data engineering job.
In recent interviews at several large MNCs, one of our
students faced scenario-based data engineering
questions, and many candidates struggled to answer
them correctly. These questions are designed to test
your real-world knowledge and ability to solve
complex data engineering problems.