Extended Spark Interview QA
Compare Spark with Hadoop MapReduce. What are the key differences?
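Spark keeps intermediate data in memory where possible, which often makes it much faster
than Hadoop MapReduce, which writes intermediate results to disk between the map and reduce
phases and between chained jobs. The gap is largest for iterative algorithms and interactive
queries that reuse the same dataset. Spark also supports stream processing, whereas
MapReduce is a batch-only model.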
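The effect on iterative workloads can be illustrated with a toy model in plain Python. This is only a sketch, not Spark or Hadoop code: the function names and the JSON input file are hypothetical, and real MapReduce jobs also write intermediate output to disk, which the sketch ignores. It counts how often the input is read from disk under each approach.

```python
import json
import os
import tempfile


def iterate_from_disk(path, iterations):
    """MapReduce-style loop: every pass re-reads the input from disk."""
    reads = 0
    total = 0
    for _ in range(iterations):
        with open(path) as f:  # one disk read per iteration
            data = json.load(f)
        reads += 1
        total = sum(x + total % 7 for x in data)
    return total, reads


def iterate_in_memory(path, iterations):
    """Spark-style loop: read once, then reuse the in-memory copy."""
    with open(path) as f:  # single disk read
        data = json.load(f)
    reads = 1
    total = 0
    for _ in range(iterations):
        total = sum(x + total % 7 for x in data)
    return total, reads


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.json")
    with open(path, "w") as f:
        json.dump(list(range(1000)), f)
    disk_result, disk_reads = iterate_from_disk(path, iterations=5)
    mem_result, mem_reads = iterate_in_memory(path, iterations=5)
```

Both loops compute the same result, but the first performs one disk read per iteration while the second performs a single read in total.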
Describe the Spark execution model. How does it process data in parallel?
Spark follows a driver/executor (master/worker) architecture: the driver coordinates the
job, and executors, which are processes running on worker nodes, run the tasks. A dataset
is split into partitions, and each task processes one partition, so many partitions are
processed concurrently across the cluster.
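As a rough illustration of this model, the sketch below mimics it with Python's standard library. It is a toy, not Spark's API: worker threads stand in for executors, and the names split_into_partitions, task, and run_job are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal contiguous partitions."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def task(partition):
    """One task: square every record in a single partition and sum them."""
    return sum(x * x for x in partition)


def run_job(data, num_partitions=4, num_executors=2):
    """Play the driver: create one task per partition, hand the tasks to
    the 'executors' (worker threads here), then combine the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_sums = list(executors.map(task, partitions))
    return sum(partial_sums)


result = run_job(list(range(10)))
```

With four partitions and two workers, at most two tasks run at a time, which is analogous to a Spark job having more partitions than available executor cores.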
What is a Spark Driver? Explain its role in a Spark application.
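The Spark Driver is the process that runs the application's main program and orchestrates
the execution of a Spark job. It translates user code into tasks, schedules those tasks on
executors, and monitors their progress. Where it runs depends on the deploy mode: in client
mode it runs on the machine that submitted the application, and in cluster mode it runs on
a node inside the cluster.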
What is a DAG (Directed Acyclic Graph) in Spark? How does it help optimize
operations?
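A DAG is the graph of transformations that Spark records for a job: each node is a dataset
(RDD) and each edge is an operation that derives one dataset from another. When an action
runs, the scheduler splits the DAG into stages at shuffle boundaries and each stage into one
task per partition. Within a stage, narrow transformations are pipelined without
materializing intermediate results, so data is shuffled only where an operation requires
it, and the recorded lineage lets Spark recompute only lost partitions instead of rerunning
the whole job.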
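The way a job is cut into stages can be sketched with a toy model in plain Python. This is a simplification rather than Spark's scheduler: it handles only a linear chain of operations, and the WIDE_OPS set is an assumed classification (for instance, a real join can avoid a shuffle when both inputs are already co-partitioned).

```python
# Assumed set of operations that require a shuffle (wide dependencies).
WIDE_OPS = {"reduceByKey", "groupByKey", "sortByKey", "join", "repartition"}


def split_into_stages(lineage):
    """Group a linear chain of operations into stages.

    A new stage starts at every wide operation, because its input must be
    shuffled before processing can continue. Narrow operations stay in the
    current stage, where they can be pipelined.
    """
    stages = [[]]
    for op in lineage:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


lineage = ["textFile", "flatMap", "map", "reduceByKey", "filter", "sortByKey"]
stages = split_into_stages(lineage)
for number, ops in enumerate(stages):
    print(f"stage {number}: {' -> '.join(ops)}")
```

For this word-count-style chain the model yields three stages: the read and the two narrow maps, then reduceByKey with the following filter, then sortByKey. Only the two stage boundaries involve a shuffle.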