
Spark Handbook

The document outlines various data engineering scenarios involving real-time data processing and aggregation using Spark. It includes examples of handling IoT sensor data, log data, user transactions with skew, and nested JSON datasets, along with common mistakes to avoid. Additionally, it promotes Prominent Academy's services for preparing candidates for data engineering interviews through mock interviews and personalized coaching.

Uploaded by ravikanth.vss.kompella


Data Engineer

Afrin Ahamed

Question:

You are given a stream of IoT sensor data with columns sensor_id, timestamp, and value. Detect sensors with values exceeding a threshold (e.g., 100) in real time.

Explanation:
Read streaming data from Kafka.
Parse the data and filter sensors with values exceeding the threshold.
Write the output to the console.

Question:

You are given a stream of log data with columns timestamp, user_id, and action. Calculate the count of actions per user in real time.

Explanation:
Read streaming data from Kafka.
Parse the data and group by user_id and a 1-minute window.
Write the output to the console.

Common Mistakes:
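Not defining the window correctly, for example grouping by the raw timestamp instead of window(timestamp, "1 minute").
Forgetting to launch the query with start() and then block on it with awaitTermination(); without awaitTermination(), a standalone script can exit before the stream produces any output.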

Question:

You are given a dataset of user transactions with columns user_id, transaction_id, and amount. The dataset is heavily skewed on the user_id column. Calculate the total transaction amount per user while handling the skew.

Explanation:
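Add a random salt to the skewed key (user_id) so that each hot user's rows are spread across several groups instead of landing in a single task.
Perform the first aggregation on the salted key (user_id plus salt).
Remove the salt and perform a second aggregation on user_id alone to get the final result.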

Common Mistakes:
Not addressing data skew, so the few tasks that receive the hot user_id values run far longer than the rest and hold up the whole stage.
Forgetting to remove the salt in the final aggregation.

Question:

You are given a nested JSON dataset with the following structure:

Write a Spark job to extract the following:

Total revenue per order.
Payment method for each order.

Explanation:
Use explode to flatten the nested items array.
Calculate revenue for each item by multiplying quantity and price.
Group by order_id and payment.method to aggregate the total revenue.

Common Mistakes:
Not using explode to handle nested arrays.
Forgetting to include payment.method in the groupBy, so the payment method is missing from the aggregated result.

Think your skills are enough?
❌
Think again: these scenario-based data engineering questions could cost you the job.
In recent interviews at several large MNCs, one of our students faced scenario-based data engineering questions, and many candidates struggled to answer them correctly. These questions are designed to test your real-world knowledge and your ability to solve complex data engineering problems.

Unfortunately, many students failed to answer these questions confidently. The truth is, preparation is key, and that’s where Prominent Academy comes in!
We specialize in preparing you for Spark and data engineering interviews by:
✅ Offering scenario-based mock interviews
✅ Providing hands-on training with data engineering features
✅ Optimizing your resume & LinkedIn profile
✅ Giving personalized interview coaching to ensure you’re job-ready
Don’t leave your future to chance!


