
TRAINING SHEET

APACHE SPARK APPLICATION PERFORMANCE TUNING
Maximize the performance of your applications
Maximize the performance of your applications

This three-day hands-on training course delivers the key concepts and expertise developers need to improve the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring.

Apache Spark Application Performance Tuning presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they’ve learned in an interactive notebook environment. The course applies to Spark 2.4, but also introduces the Spark 3.0 Adaptive Query Execution framework.

“Cloudera’s instructor was excellent, offering clear and concise training that was easy to understand. His wide-ranging peripheral knowledge helped apply the course materials to real-world situations. I look forward to attending another course.”
Comscore

What You Will Learn


Students who successfully complete this course will be able to:
• Understand Apache Spark’s architecture, job execution, and how techniques
such as lazy execution and pipelining can improve runtime performance
• Evaluate the performance characteristics of core data structures such as RDDs and DataFrames
• Select the file formats that will provide the best performance for your application
• Identify and resolve performance problems caused by data skew
• Use partitioning, bucketing, and join optimizations to improve Spark SQL performance
• Understand the performance overhead of Python-based RDDs, DataFrames, and
user-defined functions
• Take advantage of caching for better application performance (see the sketch after this list)
• Understand how the Catalyst and Tungsten optimizers work
• Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
• Learn about the new features in Spark 3.0 and specifically how the Adaptive
Query Execution engine improves performance
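
For illustration only (this snippet is not part of the course materials), the minimal PySpark sketch below shows two of the techniques named above: caching a DataFrame that is reused across several actions, and hinting a broadcast join so the large table is not shuffled. The file paths, table names, and join column are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
events = spark.read.parquet("/data/events")        # large
countries = spark.read.parquet("/data/countries")  # small

# Cache a DataFrame that several downstream actions reuse,
# so it is not recomputed from the source each time.
events.cache()

# Hint Spark to broadcast the small table, avoiding a
# shuffle (sort-merge join) of the large table.
joined = events.join(broadcast(countries), on="country_code")

joined.groupBy("country_name").count().show()

# Release the cached blocks once the DataFrame is no longer needed.
events.unpersist()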

What to Expect
This course is designed for software developers, engineers, and data scientists who
have experience developing Spark applications and want to learn how to improve the
performance of their code. This is not an introduction to Spark.
Spark examples and hands-on exercises are presented in Python, and the ability to
program in this language is required. Basic familiarity with the Linux command line is
assumed. Basic knowledge of SQL is helpful.

Course Details:

Spark Architecture
• RDDs
• DataFrames and Datasets
• Lazy Evaluation
• Pipelining

Data Sources and Formats
• Available Formats Overview
• Impact on Performance
• The Small Files Problem

Inferring Schemas
• The Cost of Inference
• Mitigating Tactics

Dealing With Skewed Data
• Recognizing Skew
• Mitigating Tactics

Catalyst and Tungsten Overview
• Catalyst Overview
• Tungsten Overview

Mitigating Spark Shuffles
• Denormalization
• Broadcast Joins
• Map-Side Operations
• Sort Merge Joins

Partitioned and Bucketed Tables
• Partitioned Tables
• Bucketed Tables
• Impact on Performance

Improving Join Performance
• Skewed Joins
• Bucketed Joins
• Incremental Joins

PySpark Overhead and UDFs
• PySpark Overhead
• Scalar UDFs
• Vector UDFs using Apache Arrow
• Scala UDFs

Caching Data for Reuse
• Caching Options
• Impact on Performance
• Caching Pitfalls

Workload XM (WXM) Introduction
• WXM Overview
• WXM for Spark Developers

What’s New in Spark 3.0? (see the configuration sketch below)
• Adaptive Number of Shuffle Partitions
• Skew Joins
• Convert Sort Merge Joins to Broadcast Joins
• Dynamic Partition Pruning
• Dynamic Coalesce Shuffle Partitions

Appendix A: Partition Processing
Appendix B: Broadcasting
Appendix C: Scheduling
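
As context for the “What’s New in Spark 3.0?” topics above, the sketch below shows how the Adaptive Query Execution settings introduced in Spark 3.0 are typically switched on when building a session. The property names are standard Spark 3.0 configuration keys; the application name and the choice to enable every feature are illustrative assumptions, not course recommendations.

from pyspark.sql import SparkSession

# Illustrative session; the app name is a placeholder.
spark = (
    SparkSession.builder
    .appName("aqe-sketch")
    # Master switch for Adaptive Query Execution (Spark 3.0+).
    .config("spark.sql.adaptive.enabled", "true")
    # Coalesce small shuffle partitions at runtime.
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    # Split skewed partitions so skew joins balance better.
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Prune fact-table partitions based on the dimension side of a join.
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    .getOrCreate()
)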

Cloudera, Inc. 5470 Great America Parkway, Santa Clara, CA 95054 cloudera.com
© 2020 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered
trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their
respective companies. Information is subject to change without notice.
spark-application-performance-tuning-datasheet_103 : 201026
