PySpark - Spark-Submit Important Configs
spark-submit is the command used to submit applications to a Spark cluster. It lets you configure memory and CPU allocation, cluster and deploy modes, and application-specific parameters for each job. Properly configuring spark-submit is essential for optimizing Spark jobs for performance and resource usage.
Below are the most important configurations you can use with spark-submit, along with their purposes and examples:
--master: Specifies the cluster manager to connect to. It can be local for local mode, yarn for Hadoop YARN, a mesos:// URL for Apache Mesos, or a k8s:// URL for Kubernetes.
Example: --master yarn
--deploy-mode: Defines whether to launch the driver on the worker nodes (cluster) or locally on the machine submitting the application (client).
Example: --deploy-mode cluster
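Putting these first two flags together, a minimal submission might look like the sketch below; the application file my_app.py and its input path are placeholder names assumed for illustration:
# Minimal sketch: submit a PySpark script to YARN with the driver running inside the cluster.
# my_app.py and the input path are hypothetical, not from this guide.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  my_app.py --input hdfs:///data/input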
--num-executors: Sets the number of executors to use for the job. This is applicable with cluster managers such as YARN.
Example: --num-executors 5
--executor-cores: Specifies the number of CPU cores per executor. Higher values increase parallelism.
Example: --executor-cores 4
--executor-memory: Allocates memory for each executor process. Proper sizing can prevent out-of-memory errors.
Example: --executor-memory 8G
--driver-memory: Sets the amount of memory allocated for the driver process.
Example: --driver-memory 4G
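Taken together, the resource flags above translate into an invocation along these lines. The sizing numbers are illustrative assumptions, not recommendations for any particular cluster, and my_app.py is again a placeholder:
# Illustrative resource sizing: 5 executors, each with 4 cores and 8 GiB of memory,
# plus 4 GiB for the driver. Adjust to the capacity of your own cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 5 \
  --executor-cores 4 \
  --executor-memory 8G \
  --driver-memory 4G \
  my_app.py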
--conf spark.eventLog.enabled=true: Enables Spark event logging. This helps in monitoring and debugging by storing event information.
Example: --conf spark.eventLog.enabled=true
--conf spark.eventLog.dir: Specifies the directory where the event logs should be stored.
Example: --conf spark.eventLog.dir=hdfs:///logs/
--conf spark.executor.logs.rolling.strategy=time: Sets the rolling strategy for executor logs. Useful for managing log file sizes and retention.
Example: --conf spark.executor.logs.rolling.strategy=time
--conf spark.executor.logs.rolling.time.interval=daily: Defines the interval for rolling executor logs.
Example: --conf spark.executor.logs.rolling.time.interval=daily
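The logging-related settings above can be combined as sketched below, assuming an HDFS directory such as hdfs:///logs/ already exists and is writable by the Spark user:
# Event logging plus time-based rolling of executor logs (daily interval).
# The hdfs:///logs/ path and my_app.py are assumed example names.
spark-submit \
  --master yarn \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs:///logs/ \
  --conf spark.executor.logs.rolling.strategy=time \
  --conf spark.executor.logs.rolling.time.interval=daily \
  my_app.py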
--conf spark.sql.shuffle.partitions: Configures the number of partitions used when shuffling data during Spark SQL operations such as joins and aggregations. Tuning this value (the default is 200) can reduce shuffle overhead.
Example: --conf spark.sql.shuffle.partitions=200
--conf spark.serializer: Specifies the serializer used when shuffling and caching data. The default is Java serialization, but Kryo serialization is often faster and more compact.
Example: --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.executor.extraJavaOptions: Passes additional JVM options to the executors. Useful for setting system properties or tuning garbage collection.
Example: --conf spark.executor.extraJavaOptions="-XX:+UseG1GC"
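Finally, the performance-oriented settings can be combined into a single submission. One possible tuning baseline is sketched below; the specific values are assumptions to adapt to your workload, and my_app.py remains a placeholder:
# Possible tuning baseline: 200 shuffle partitions, Kryo serialization,
# and the G1 garbage collector on executors.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.executor.extraJavaOptions="-XX:+UseG1GC" \
  my_app.py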
6. Security Configurations