Deloitte Data Engineer

The document outlines various data engineering scenarios involving real-time processing of IoT sensor data, log data, user transactions, and nested JSON datasets. It emphasizes the importance of correctly handling data streams, addressing data skew, and using appropriate aggregation techniques. Additionally, it promotes Prominent Academy's services for preparing candidates for data engineering interviews through mock interviews and personalized coaching.

Uploaded by

ronit.kumar2802

Data Engineer
www.prominentacademy.in
Question:

You are given a stream of IoT sensor data with columns: sensor_id, timestamp, and value. Detect sensors with values exceeding a threshold (e.g., 100) in real-time.

Explanation:
Read streaming data from Kafka.
Parse the data and filter sensors with values exceeding the threshold.
Write the output to the console.

📞 Don’t wait—call us at +91 98604 38743 today


Your next opportunity is closer than you think. Let’s get you there!
Question:

You are given a stream of log data with columns: timestamp, user_id, and action. Calculate the count of actions per user in real-time.

Explanation:
Read streaming data from Kafka.
Parse the data and group by user_id and a 1-minute window.
Write the output to the console.

Common Mistakes:
Not defining the window correctly.
Forgetting to start the streaming query with start(), or omitting awaitTermination() so the driver exits before any data is processed.

Question:

You are given a dataset of user transactions with columns: user_id, transaction_id, and amount. The dataset is heavily skewed on the user_id column. Calculate the total transaction amount per user while handling the skew.

Explanation:
Add a random salt to the skewed key (user_id) to distribute the data evenly.
Perform the first aggregation on the salted key.
Remove the salt and perform a second aggregation to get the final result.

Common Mistakes:
Not addressing data skew, leading to slow performance.
Forgetting to remove the salt in the final aggregation.

Question:

You are given a nested JSON dataset with the following structure:

Write a Spark job to extract the following:
Total revenue per order.
Payment method for each order.

Explanation:
Use explode to flatten the nested items array.
Calculate revenue for each item by multiplying quantity and price.
Group by order_id and payment.method to aggregate the total revenue.

Common Mistakes:
Not using explode to handle nested arrays.
Forgetting to include payment.method in the groupBy.

#AzureSynapse #DataEngineering #InterviewPreparation
#JobReady #MockInterviews #Deloitte #CareerSuccess
#ProminentAcademy

❌ Think your skills are enough?

Think again: these scenario-based data engineering questions could cost you your data engineering job.

In recent interviews at several large MNCs, one of our students faced scenario-based questions related to data engineering, and many candidates struggled to answer them correctly. These questions are designed to test your real-world knowledge and ability to solve complex data engineering problems.

Unfortunately, many students failed to answer these questions confidently. The truth is, preparation is key, and that’s where Prominent Academy comes in!

We specialize in preparing you for Spark and data engineering interviews by:

Offering scenario-based mock interviews
Providing hands-on training with data engineering features
Optimizing your resume & LinkedIn profile
Giving personalized interview coaching to ensure you’re job-ready

Don’t leave your future to chance!

📞 Call us at +91 98604 38743 and get the interview prep you need to succeed
