Spark Questions Asked in Mock Interview

The document lists a series of Spark-related questions commonly asked in mock interviews, covering topics such as executors, slowly changing dimensions (SCD), data handling, and optimization techniques. It includes inquiries about file reading/writing modes, handling NULL values, data frame operations, and various Spark concepts like Medallion Architecture and partitioning. The questions are designed to assess knowledge and practical skills in Spark and data processing.

Uploaded by

Satyajit Ligade

 Spark Questions Asked in Mock Interview 

1. There are 10 nodes and 15 cores. How many executors will there be?
2. How will you implement SCD in your project, which type will you use, and why?
3. What are the modes for reading a file?
4. What are the modes for writing in different file formats?
5. What is the query to implement SCD1 and SCD2 in a Delta table?
6. How will you handle NULLs in a DataFrame?
7. How will you read only 4 files from a folder that contains 10 files?
8. How will you replace NULL with NA or any other value?
9. What is the difference between a left anti join and a left semi join?
10. What is Medallion Architecture?
11. How will you handle or remove duplicates in a DataFrame?
12. What is serialization and what is deserialization?
13. How do you create a Delta table?
14. How does memory management work in Spark?
15. What are sorting and shuffling?
16. What is salting?
17. How will you handle skewness?
18. What optimization techniques are available in Spark?
19. What are the advanced joins in Spark?
20. What are Partition By and Bucket By (bucketing)?
21. What errors were faced in our airline project?
22. What are partition pruning and dynamic partition pruning?
23. How will you check for skewness in Spark?
24. How will you check which partition has larger data in it without using the UI?
25. How do you remove data from disk and from memory?
26. What is lineage, and how is it different from a DAG?
27. What are the steps to handle an extra comma in a CSV file?
28. What is the difference between the JSON and Parquet file formats?
29. Why do we use coalesce(1) when writing to a Parquet file?
30. What is speculative execution?
31. What is the difference between Spark's union and unionAll and SQL's UNION and UNION ALL?
32. What is a broadcast variable?
33. How will you write exact SQL queries in Spark?
34. What is the spark-submit command?
35. If there is no Python worker, will our PySpark code work?
36. Why can't we use coalesce to increase the number of partitions?
37. What are hash and heap?
38. How does our execution plan switch to AQE?
39. How do you calculate the number of executors and cores?
40. Given a 10 GB file and a cluster of 5 executors, how many tasks will be formed?
41. How do you pivot and unpivot in Spark?
42. How will you flatten the data?
43. How will you extract columns from a JSON file?
44. How will you take a column out of the DataFrame and save it?
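
For the question about a 10 GB file on a cluster of 5 executors, a rough estimate can be sketched as below. This assumes the file is splittable and that Spark's `spark.sql.files.maxPartitionBytes` is left at its default of 128 MB. The helper name `estimate_read_tasks` is hypothetical, and the real count also depends on settings such as the file open cost and the cluster's default parallelism. Under these assumptions, the number of executors affects how many tasks run at once, not how many are created.

```python
def estimate_read_tasks(file_size_bytes: int,
                        max_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Rough number of input partitions (one read task each) for a splittable file.

    Assumption: partitions are cut at max_partition_bytes, mirroring the
    128 MB default of spark.sql.files.maxPartitionBytes.
    """
    # Integer ceiling division: a partial final chunk still needs its own task.
    return -(-file_size_bytes // max_partition_bytes)


file_size = 10 * 1024 ** 3          # 10 GB expressed in bytes
tasks = estimate_read_tasks(file_size)
print(tasks)                        # 80
```

With 5 executors, those 80 tasks would be processed in waves whose size depends on the cores available per executor.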
