Day 26 Modes of Deployment
Cluster Mode
In Cluster Mode, the Driver runs within the cluster on one of the worker nodes, and the
cluster manager allocates resources, including the Driver and Executors, to handle the
application’s execution.
Use Cluster Mode for production applications or long-running jobs, where the driver runs
within the cluster for better resource management and fault tolerance.
1. User submits the Spark application (via spark-submit) to the Cluster Manager (YARN).
2. Cluster Manager allocates resources and launches the Application Master on a worker node.
3. Application Master initializes the Driver inside the cluster.
4. Driver assigns tasks to Executor 1 and Executor 2 for processing.
5. Executors carry out the tasks and return results to the Driver.
6. Driver aggregates all the results and sends the final output back to the User.
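The flow above begins with a spark-submit command like the following sketch. The class name, jar, paths, and resource sizes are illustrative placeholders, not values from this note:

```shell
# Submit in Cluster Mode: YARN launches the Driver on a worker node
# inside the cluster, so the client can disconnect after submission.
# All names and resource values below are illustrative placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.WordCount \
  --num-executors 2 \
  --executor-memory 4g \
  --driver-memory 2g \
  app.jar /data/input /data/output
```

Because the driver lives in the cluster, its logs go to the cluster manager (e.g. the YARN application logs) rather than to the submitting terminal.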
Client Mode
In Client Mode, the Driver runs on the client machine and interacts directly with the
cluster manager to request resources and assign tasks to the worker nodes for
execution.
Use Client Mode for interactive applications or development and testing, where the
driver needs to run on the local machine and interact directly with the user.
1. User submits the Spark application to the Driver.
2. Driver requests resources from the Cluster Manager (YARN); in YARN client
mode, a lightweight Application Master runs in the cluster to negotiate
resources on the Driver's behalf.
3. Cluster Manager allocates the required resources and returns them to the Driver.
4. Driver assigns tasks to Executor 1 for data processing.
5. Driver assigns tasks to Executor 2 for data processing.
6. Executor 1 sends task results back to the Driver.
7. Executor 2 sends task results back to the Driver.
8. Driver sends the final output back to the User.
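The same submission in Client Mode differs only in the deploy mode flag; the sketch below uses the same illustrative placeholders:

```shell
# Submit in Client Mode: the Driver runs on this machine, so driver logs
# and output appear directly in the terminal (handy for development).
# All names and resource values below are illustrative placeholders.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.WordCount \
  --num-executors 2 \
  --executor-memory 4g \
  app.jar /data/input /data/output
```

Note that the client machine must stay connected for the lifetime of the job, since killing the local driver process terminates the application.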
Key Differences Between Client Mode and Cluster Mode
• Driver Location: runs on the client machine in Client Mode; runs on a worker
node in the cluster in Cluster Mode.
• Execution Control: managed by the client in Client Mode; managed by the cluster
in Cluster Mode.
Conclusion
• Spark-submit gives flexibility to choose between Client Mode for development and
Cluster Mode for production.
• Databricks takes this flexibility further by unifying and automating deployment,
making Spark applications easier to develop, test, and deploy.