SparkOscope: Enabling Apache Spark Optimization Through
Cross-Stack Monitoring and Visualization
Yiannis Gkoufas
IBM Research Dublin, Ireland
High Performance Systems
whoami
• Research Software Engineer in IBM Research,
Ireland since 2012
• Work on Analytics Foundations Middleware
– Distributed Frameworks, Anything Java/Scala based,
Web-based POCs
• High Performance Systems Group: Kostas,
Andrea, Dimitris, Khalid, Michael, Michele,
Mustafa, Pierre, Sri
Spark Experience
• We love developing our analytical workloads in
Spark and have fully embraced it since the early
1.0.x versions
• Over the last few years, we have used it to run jobs
on large volumes of energy-related sensor data
Jobs on Daily Basis
• Once we managed to develop the needed jobs,
they were executed in a recurring fashion
• We were receiving a new batch of data every
day
Fighting Bugs
• When there was a bug in our code, it was very
easy to discover it through the Spark Web UI
• We could easily retrieve information about the
job, stage and line number in our source code
Fighting bottlenecks
• However, we couldn’t easily spot which jobs and
stages were causing a slowdown
• Which part of our code was the
bottleneck?
Ganglia Extension
• We had the option to use the Ganglia
Extension to export the metrics, but:
– We would need to maintain and configure yet another
external system
– There is no association with the Spark
jobs/stages/source code
Spark Monitoring Framework
• We could use the built-in Spark Monitoring
Framework, but:
– Collecting CSVs from the worker nodes and
aggregating them is cumbersome
– Again, we couldn’t easily associate the metrics with
the source code of the job
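To make the aggregation step concrete, here is a minimal sketch of what has to happen once the per-executor CSV files have been copied from every worker to one machine. It assumes file names of the form `<app>.<executor>.<metric>.csv` and a header row followed by `timestamp,value` rows; both are assumptions of this sketch, and the helper name `aggregate_metrics` is hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def aggregate_metrics(collected_dir):
    """Merge per-executor CSV metric files into {metric: [(t, executor, value), ...]}.

    Assumes file names of the form <app>.<executor>.<metric>.csv and a header
    row followed by "timestamp,value" rows; both are assumptions of this sketch.
    """
    merged = defaultdict(list)
    for path in sorted(Path(collected_dir).rglob("*.csv")):
        # e.g. "app-1.0.jvm.heap.used" -> ("app-1", "0", "jvm.heap.used")
        parts = path.stem.split(".", 2)
        if len(parts) != 3:
            continue  # not a metric file in the assumed naming scheme
        _app, executor, metric = parts
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            for row in reader:
                if len(row) < 2:
                    continue  # ignore blank or truncated rows
                merged[metric].append((int(row[0]), executor, float(row[1])))
    # Sort each series by timestamp so the executors are interleaved in time order.
    for series in merged.values():
        series.sort()
    return dict(merged)
```

Even with such a script, the files still have to be fetched from each worker's local filesystem first, and nothing in them points back to a job, stage or line of source code.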
Current Monitoring Architecture
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6. During job execution, each executor's Executor Source feeds the Monitoring Framework, which writes one CSV per executor to the local filesystem.]
Enter SparkOscope
SparkOscope Overview
• Extension to enrich Spark’s Monitoring
Framework with OS-level Metrics
• Enhancement of the Web UI to plot all the
available metrics + the newly developed
OS-level metrics
SparkOscope Modules
• SigarSource: Attached to the executor; uses the
Hyperic Sigar library to collect OS-level metrics
• HDFSSink: Exports all available metrics to an HDFS
directory
• MQTTSink: Publishes all available metrics on an MQTT
topic
• Modified Web UI: Spark Web UI patched to render historical
and realtime plots from the data produced by the modules
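The division of labour between a source and the sinks can be illustrated with a small stand-alone model. This is not SparkOscope code: `ProcessSource`, `ListSink` and `poll_once` are hypothetical names, the two gauges are stand-ins for real OS-level metrics, and an in-memory list takes the place of an HDFS directory or an MQTT topic. It only shows the pattern: a source exposes named gauges, and every sink receives a snapshot of all registered metrics at each polling tick.

```python
import os
import time


class ProcessSource:
    """Hypothetical source exposing a few gauges as name -> callable."""

    name = "os"

    def gauges(self):
        return {
            "cpu.count": lambda: os.cpu_count() or 0,
            "process.cpu.seconds": time.process_time,
        }


class ListSink:
    """Hypothetical sink that keeps every report in memory."""

    def __init__(self):
        self.reports = []

    def report(self, timestamp, snapshot):
        self.reports.append((timestamp, snapshot))


def poll_once(sources, sinks, now=None):
    """Read every gauge once and hand the same snapshot to every sink."""
    timestamp = int(time.time()) if now is None else now
    snapshot = {}
    for source in sources:
        for metric, read in source.gauges().items():
            snapshot[f"{source.name}.{metric}"] = read()
    for sink in sinks:
        # Each sink gets its own copy so one sink cannot mutate another's data.
        sink.report(timestamp, dict(snapshot))
    return snapshot
```

Calling `poll_once([ProcessSource()], [ListSink(), ListSink()])` on a timer would give both sinks identical readings, which is why adding a new sink does not require touching the sources.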
SparkOscope Flavors
• Historical Plots: View metrics on the UI after
the job has finished
• Realtime Plots: View metrics on the UI in
realtime as the job is being executed
• Headless: Use SigarSource, HDFSSink,
MQTTSink without viewing the plots on the UI
– https://github.com/ibm-research-ireland/sparkoscope-headless
SparkOscope High-level Architecture - Historical plots
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6, each with an Executor Source and a Sigar Source. During job execution, the Monitoring Framework writes the metrics to HDFS under /custom-metrics/app-xxxxxxx, with one subdirectory per executor (/executor1 to /executor6), and the Spark Web UI reads them from there.]
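The directory layout in the diagram suggests how a reader of the historical metrics would locate one executor's data. The sketch below builds those paths and groups a flat listing of files by executor. The root `/custom-metrics` and the `app-.../executorN` nesting come from the slide; the function names, the example application ID and the metric file names are made up for illustration, and no HDFS client is involved.

```python
import posixpath
from collections import defaultdict


def executor_dir(root, app_id, executor_id):
    """Return <root>/<app_id>/executor<N>, the layout shown on the slide."""
    return posixpath.join(root, app_id, f"executor{executor_id}")


def group_by_executor(paths, root, app_id):
    """Group file paths under <root>/<app_id>/ by their executor directory."""
    prefix = posixpath.join(root, app_id) + "/"
    grouped = defaultdict(list)
    for path in paths:
        if not path.startswith(prefix):
            continue  # belongs to another application
        executor, _, rest = path[len(prefix):].partition("/")
        if rest:
            grouped[executor].append(rest)
    return dict(grouped)
```

Keeping one directory per executor means a plot for a single executor only needs to read that executor's files, while an application-wide plot reads everything below the `app-...` directory.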
SparkOscope High-level Architecture - Realtime plots
[Diagram: Spark Worker1 and Spark Worker2 host Executor1 to Executor6, each with an Executor Source and a Sigar Source. Remaining labels: Job Execution, Monitoring Framework, Master, /custom-metrics/app-xxxxxxx, Spark Web UI, MQTT Broker.]
SparkOscope Basic Installation
• Clone the git repo: https://github.com/ibm-research-ireland/sparkoscope
• Build Spark
• Modify the configuration files:
metrics.properties and spark-defaults.conf
SparkOscope OS-level Metrics
• Download the Hyperic Sigar library to all the worker nodes
• Extract it anywhere on the system
• Modify the configuration files:
metrics.properties and spark-env.sh
SparkOscope Realtime Plots
• Modify the configuration files:
metrics.properties and spark-defaults.conf
• Make sure that no service is already running on the Master on the
ports specified
• Make sure that executor.sink.mqtt.port is the same as
spark.moquette.conf
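One way to check that a port on the Master is not already taken is to try binding to it. The following is a minimal sketch, not part of SparkOscope: the helper name `port_is_free` is hypothetical, and the port numbers to test are whatever you put in your own configuration.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now.

    Binding fails with OSError when another service holds the port. It also
    fails for ports below 1024 when run without sufficient privileges, in
    which case this sketch reports the port as not free.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

For example, `port_is_free(1883)` tests the conventional MQTT port; run it on the Master before starting Spark and substitute the ports you actually configured. The result is only a snapshot, since another process can still claim the port afterwards.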
SparkOscope Headless Installation
• Clone the git repo: https://github.com/ibm-research-ireland/sparkoscope-headless
• Build the Maven project
• Modify the configuration files as described for SigarSource,
HDFSSink, MQTTSink
• Additionally, append the paths of the built jars to
spark.executor.extraClassPath
• No need to have the patched Spark version, since the metrics
are not displayed in the UI
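Appending jars to `spark.executor.extraClassPath` by hand is easy to get wrong when the property already has a value. Below is a small sketch of a helper that edits the text of a spark-defaults.conf file. It assumes whitespace-separated `key value` lines and `:` as the classpath separator (Linux/macOS); the function name and any jar paths you pass in are hypothetical.

```python
def append_extra_classpath(conf_text, jar_paths, key="spark.executor.extraClassPath"):
    """Return conf_text with jar_paths appended to the value of `key`.

    Assumes "key<whitespace>value" lines and ":" as the classpath separator.
    Entries already present in the existing value are not added twice.
    """
    lines = conf_text.splitlines()
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split(None, 1)
        if parts[0] == key:
            existing = parts[1].split(":") if len(parts) > 1 else []
            merged = existing + [p for p in jar_paths if p not in existing]
            lines[index] = f"{key} {':'.join(merged)}"
            break
    else:
        # The property was not set at all, so add it at the end of the file.
        lines.append(f"{key} {':'.join(jar_paths)}")
    return "\n".join(lines) + "\n"
```

The function returns the new file contents rather than writing them, so the result can be reviewed or diffed before it replaces the real spark-defaults.conf.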
Demo!
Roadmap
• Expand the range of available Sinks and
Sources
• Smart recommendations on infrastructure needs
derived from patterns of resource utilization of
jobs
• Work with the open-source ecosystem to improve
it and target more use cases
Thank You.
Questions?
email: yiannisg@ie.ibm.com
