
Spark Monitoring and Tuning PPT 3.3.1

The document outlines a course on Big Data Technologies focusing on Spark and Scala at Chandigarh University, detailing course objectives, outcomes, and recommended readings. It emphasizes the importance of understanding the Hadoop Ecosystem, Apache Spark components, and techniques for monitoring and tuning Spark applications. Additionally, it provides insights into optimizing performance, resource usage, and best practices for effective Spark job execution.


APEX INSTITUTE OF TECHNOLOGY.

AIT-IBM CSE
CHANDIGARH UNIVERSITY, MOHALI

Big Data Technologies (Spark & Scala)


(22CSH-391)
Lecture-1 (CO1)
By
Dr Geeta Rani (E15227)
Associate Professor (Chandigarh University)
Course Objective
• The students will be able to illustrate the interaction of multi-faceted fields like data mining
• The students will be able to understand statistics and mathematics in the development of Predictive Analytics
• The students shall understand and apply the concepts of different models
• The students shall understand various aspects of the IBM SPSS Modeler interface
• The students shall be able to familiarize themselves with various data clustering and dimension reduction techniques
Books
Sr No | Title of the Book | Author Name | Volume/Edition | Publishing House | Year
1 | The Art of Data Science | Roger Peng | 3rd | lulu.com | 2016
2 | Scala Cookbook | Alvin Alexander | 2nd Edition | O'Reilly | 2008

Reference Books
Sr No | Title of the Book | Author Name | Volume/Edition | Publishing House | Year
1 | Scala Cookbook | Alvin Alexander | 4th | O'Reilly | 2014
Books
• E. Siegel, "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die", John Wiley & Sons, Inc., 2013.
• P. Simon, "Too Big to Ignore: The Business Case for Big Data", Wiley India, 2013.
• J. W. Foreman, "Data Smart: Using Data Science to Transform Information into Insight", Addison-Wesley.

OTHER LINKS
• https://spark.apache.org/
• https://developer.ibm.com/predictiveanalytics/videos/category/tutorials/
Course Outcomes
• CO1: Understand the components of the Hadoop Ecosystem and Data Science methodology
• CO2: Understand the constructs of Scala
• CO3: Understand Apache Spark and its components
• CO4: Design the applications using Scala
• CO5: Develop the applications using Spark and its available libraries
Spark Monitoring and Tuning
Optimizing Spark Performance and Resources
Introduction
• - Spark Monitoring: Tracking performance and resource usage of Spark applications.
• - Spark Tuning: Optimizing configurations to improve performance and resource utilization.
Key Components of Spark Monitoring
• - Driver and Executor Metrics:
  • - Driver: Schedules tasks and coordinates execution.
  • - Executor: Runs tasks and stores cached data.
• - Metrics Collected: Task time, shuffle read/write, GC time, etc.
• - Monitoring Tools: Spark UI, Ganglia, Prometheus.
Spark UI for Monitoring
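• - Served by the driver on port 4040 by default (http://<driver-host>:4040) while the application runs; if 4040 is taken, Spark tries 4041, 4042, and so on.
• - Key Tabs:
  • - Jobs: Progress and statistics for each job.
  • - Stages: Stage-level detail, including task durations and shuffle read/write.
  • - Executors: Per-executor memory and disk usage, cores, task counts, and GC time.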
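The data behind these tabs is also served as JSON by Spark's monitoring REST API under `/api/v1` on the UI port. The following is a minimal sketch, assuming a driver reachable at `localhost:4040`; the helper names and the sample payload are hypothetical, while the field names `totalDuration` and `totalGCTime` follow the executor summary returned by the API.

```python
import json
from urllib.request import urlopen


def executors_url(app_id, host="localhost", port=4040):
    """Build the REST endpoint that lists executors for one application."""
    return f"http://{host}:{port}/api/v1/applications/{app_id}/executors"


def fetch_executors(app_id, host="localhost", port=4040):
    """Fetch executor summaries; only works while a Spark application is running."""
    with urlopen(executors_url(app_id, host, port)) as resp:
        return json.load(resp)


def gc_fraction(executor):
    """Share of an executor's total task time spent in JVM garbage collection."""
    total = executor["totalDuration"]
    return executor["totalGCTime"] / total if total else 0.0


# Hypothetical payload for illustration (times in milliseconds).
sample = [
    {"id": "driver", "totalDuration": 0, "totalGCTime": 0},
    {"id": "1", "totalDuration": 120000, "totalGCTime": 6000},
    {"id": "2", "totalDuration": 100000, "totalGCTime": 30000},
]

for ex in sample:
    print(ex["id"], f"{gc_fraction(ex):.0%}")
```

An executor that spends a large share of its task time in GC, like executor 2 in the sample, is usually a sign to revisit memory settings or caching.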
Spark Monitoring with External Tools
• - Ganglia: Monitors cluster-wide resource usage (CPU, memory, network) on the nodes running Spark.
• - Prometheus: Scrapes and stores metrics as time series for querying and alerting.
• - Datadog/New Relic: Commercial platforms for comprehensive Spark monitoring and alerting.
Introduction to Spark Tuning
• - Spark tuning shortens job execution time and reduces wasted resources.
• - Focus Areas:
  • - Memory allocation.
  • - Parallelism and partitions.
  • - Shuffle optimization.
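The first two focus areas can be estimated with simple arithmetic before a job is run. The sketch below assumes the defaults described in the Spark documentation for recent versions: a memory overhead of 10% of the executor heap with a 384 MiB minimum (on YARN and Kubernetes), 300 MiB of reserved heap, `spark.memory.fraction` of 0.6, and the tuning guide's suggestion of 2 to 3 tasks per CPU core. The function names and the example cluster size are hypothetical.

```python
def container_memory_mb(executor_memory_mb, overhead_factor=0.10, min_overhead_mb=384):
    """Executor heap plus the off-heap overhead requested from the cluster manager."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead


def unified_memory_mb(executor_memory_mb, memory_fraction=0.6, reserved_mb=300):
    """Heap region shared by execution and storage (assumed defaults)."""
    return (executor_memory_mb - reserved_mb) * memory_fraction


def suggested_partitions(num_executors, cores_per_executor, tasks_per_core=2):
    """Rule-of-thumb partition count: a few tasks per available core."""
    return num_executors * cores_per_executor * tasks_per_core


# Hypothetical cluster: 10 executors, 4 cores and 4 GiB of heap each.
print(container_memory_mb(4096))        # memory requested per executor container
print(round(unified_memory_mb(4096)))   # heap available to execution + storage
print(suggested_partitions(10, 4))      # starting point for parallelism
```

These numbers are starting points only; the Spark UI shows whether a given job actually needs more memory or more partitions.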
Key Tuning Parameters
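• - `spark.executor.memory`: JVM heap size per executor (for example `4g`; the default is `1g`).
• - `spark.executor.cores`: Number of cores per executor, which bounds how many tasks it runs concurrently.
• - `spark.default.parallelism`: Default number of partitions for RDD operations such as `join`, `reduceByKey`, and `parallelize` when none is specified.
• - `spark.sql.shuffle.partitions`: Number of partitions used when shuffling data for DataFrame/SQL joins and aggregations (default 200).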
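These parameters are commonly passed to `spark-submit` as `--conf key=value` pairs. A minimal sketch of assembling such a command follows; the helper name, the application file `my_job.py`, and all values are hypothetical and would need to be adapted to the actual cluster.

```python
import shlex


def spark_submit_command(app, conf):
    """Render a spark-submit invocation with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return shlex.join(args)


# Illustrative values only, not recommendations.
conf = {
    "spark.executor.memory": "4g",
    "spark.executor.cores": 4,
    "spark.default.parallelism": 80,
    "spark.sql.shuffle.partitions": 80,
}

print(spark_submit_command("my_job.py", conf))
```

Keeping the settings in one dictionary makes it easy to change a single value between runs and compare the effect.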
Optimizing Spark Jobs
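• - Data Caching: Use `cache()` or `persist()` for data that is reused across multiple actions, so it is not recomputed each time.
• - Broadcast Variables: Ship small read-only data to each executor once instead of with every task.
• - Partitioning: Keep partitions balanced to avoid data skew, where a few oversized partitions hold up the whole stage.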
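One simple way to quantify skew is to compare the largest partition with the median partition. The sketch below works on a plain list of per-partition record counts; the function name and the sample counts are hypothetical, and the ratio is only one possible indicator.

```python
from statistics import median


def skew_ratio(partition_sizes):
    """Largest partition relative to the median; values near 1.0 mean balanced."""
    mid = median(partition_sizes)
    return max(partition_sizes) / mid if mid else float("inf")


# Hypothetical record counts per partition.
balanced = [100, 110, 90, 105, 95]
skewed = [100, 110, 90, 105, 1000]

print(f"balanced: {skew_ratio(balanced):.2f}")
print(f"skewed:   {skew_ratio(skewed):.2f}")
```

In PySpark, per-partition counts for an RDD can be collected with `rdd.glom().map(len).collect()`, which is practical for diagnostics on moderately sized data.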
Best Practices
• - Monitor regularly using Spark UI and external tools.
• - Adjust configurations iteratively: change one setting at a time and compare the results.
• - Optimize data shuffling and caching strategies.
• - Analyze query plans (for example with `explain()`) to identify inefficiencies.
Conclusion
• - Spark monitoring shows how resources are actually being used.
• - Spark tuning improves job performance and scalability.
• - Continuous monitoring and optimization are key.
