Day 26: Modes of Deployment


Master Spark Concepts: Zero to Big Data Hero

What is Spark Submit?


Spark-submit is a command-line tool used to deploy Spark applications to a cluster. It allows
users to:
1. Specify application configurations such as memory, cores, and dependencies.
2. Submit Spark jobs in different deployment modes.
3. Interact with resource managers like YARN, Mesos, or Kubernetes.
Key Features of Spark-submit:
• Enables the execution of distributed applications.
• Supports multiple languages like Scala, Python, Java, and R.
• Offers flexibility with deployment modes.
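As a sketch, these configuration options come together on a single command line. The application name, file paths, and resource sizes below are illustrative assumptions, not fixed requirements:

# Submit a PySpark application to YARN with explicit resource settings.
# All values (memory sizes, core counts, file names) are illustrative.
spark-submit \
  --master yarn \
  --driver-memory 2g \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 4 \
  --py-files dependencies.zip \
  my_app.py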

Deployment Modes in Spark


When submitting a Spark job using spark-submit, you must choose a deploy mode to define
where the driver program (main application logic) will run.
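For context, the driver program is simply the entry point of your application. A minimal PySpark script of the kind you would submit with spark-submit (the file name my_app.py used in these examples and the toy computation are assumptions for illustration) might look like this:

from pyspark.sql import SparkSession

# This code runs in the driver process; whether that process lives on the
# client machine or on a cluster worker is decided by the deploy mode.
spark = SparkSession.builder.appName("deploy-mode-demo").getOrCreate()

# A toy distributed computation: count one million rows generated on the executors.
row_count = spark.range(1_000_000).count()
print(f"Row count: {row_count}")

spark.stop()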

Cluster Mode
In Cluster Mode, the Driver runs within the cluster on one of the worker nodes, and the
cluster manager allocates the resources for both the Driver and the Executors that execute
the application.
Use Cluster Mode for production applications or long-running jobs, where the driver runs
within the cluster for better resource management and fault tolerance.
1. User submits the Spark application with spark-submit to the Cluster Manager (YARN).
2. Cluster Manager starts the Application Master on a worker node, which initializes the Driver.
3. Driver requests executor resources from the Cluster Manager, which launches Executor 1 and Executor 2.
4. Driver assigns tasks to Executor 1 and Executor 2 for processing.
5. Executors carry out the tasks and return results to the Driver.
6. Driver aggregates all the results, and the final status and output are reported back to the User.
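As a sketch, submitting the illustrative my_app.py from earlier in cluster mode on YARN could look like the following (resource values are assumptions):

# The driver is launched inside the cluster (in the YARN Application Master),
# so the job keeps running even if the submitting machine disconnects.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --num-executors 2 \
  my_app.py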
Client Mode
In Client Mode, the Driver runs on the client machine and interacts directly with the
cluster manager to request resources and assign tasks to the worker nodes for execution.
Use Client Mode for interactive applications or for development and testing, where the
driver needs to run on the local machine and interact directly with the user.
1. User submits the Spark application to the Driver.
2. Driver requests resources from the Cluster Manager (YARN); in client mode, YARN launches
a lightweight Application Master in the cluster to negotiate executor containers on the
Driver's behalf.
3. Cluster Manager allocates the required resources and returns them to the Driver.
4. Driver assigns tasks to Executor 1 for data processing.
5. Driver assigns tasks to Executor 2 for data processing.
6. Executor 1 sends task results back to the Driver.
7. Executor 2 sends task results back to the Driver.
8. Driver sends the final output back to the User.
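The same illustrative submission in client mode differs only in the deploy mode flag (again, resource values are assumptions):

# The driver runs in this terminal session; if the session ends, the job ends with it.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --executor-memory 4g \
  --num-executors 2 \
  my_app.py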
Key Differences Between Client Mode and Cluster Mode

Aspect            | Client Mode                           | Cluster Mode
Driver Location   | Runs on the client machine            | Runs on a worker node in the cluster
Dependency        | Requires an active client connection  | Operates independently of the client
Best For          | Development and testing               | Production and large-scale applications
Execution Control | Managed by the client                 | Managed by the cluster

How Does Databricks Overcome These Limitations?


Databricks simplifies the deployment of Spark applications and abstracts the complexities of
deployment modes. Here's how:
1. Unified Environment:
o Databricks combines the benefits of both Client Mode and Cluster Mode by
managing the driver program and executors within its environment.
2. Interactive Notebooks:
o Provides a seamless notebook interface for interactive development, similar to
Client Mode, but hosted entirely on the Databricks platform.
3. Job Clusters:
o For production workloads, Databricks uses job clusters, ensuring stability and
independence from user sessions, akin to Cluster Mode (see the sketch after this list).
4. Enhanced Reliability:
o Databricks automates resource allocation and handles disconnections
gracefully, allowing users to focus on development without worrying about
deployment configurations.
5. Scalability and Optimization:
o Databricks clusters dynamically scale resources based on workload needs,
offering better efficiency and performance compared to traditional deployment
modes.
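As a rough illustration of the job-cluster idea in point 3 above, a Databricks job definition (Jobs API style JSON) can declare a fresh cluster that exists only for the duration of the run. Every value here (job and task names, notebook path, runtime version, node type, worker count) is an assumption for illustration, not an exact payload to copy:

{
  "name": "daily-etl-job",
  "tasks": [
    {
      "task_key": "run_notebook",
      "notebook_task": { "notebook_path": "/Repos/team/etl_notebook" },
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ]
}

The job cluster is created when the run starts and terminated when it finishes, which is what gives it the Cluster Mode-like independence from the user's session.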

Conclusion
• Spark-submit gives flexibility to choose between Client Mode for development and
Cluster Mode for production.
• Databricks takes this flexibility further by unifying and automating deployment,
making Spark applications easier to develop, test, and deploy.
