Troubleshooting Apache Spark [Video]

This is the code repository for Troubleshooting Apache Spark [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish.

About the Video Course

In this course, you will learn how Spark's computation model works and how to leverage the DataFrame API and its optimizations. Joining is one of the most important operations in any Big Data tool, and you will implement joins and write them efficiently. Implementing efficient transformations is hard, and common problems can make your processing run for a very long time. You will learn how to reuse objects and how to reduce setup and startup overheads by using shared variables. Finally, you will master Spark Streaming and solve problems that arise while using its API.
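Shared variables are one way to cut per-task setup cost. The sketch below assumes Spark 2.x or later is on the classpath and uses a hypothetical country-code lookup map. It broadcasts the map once per executor instead of serializing it into every task closure. It illustrates the general `SparkContext.broadcast` mechanism and is not code taken from the course.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastLookup {
  // Hypothetical lookup table, used only for illustration.
  val countryNames: Map[String, String] =
    Map("PL" -> "Poland", "DE" -> "Germany", "US" -> "United States")

  // Resolves country codes to names using a broadcast copy of the lookup table.
  def resolve(spark: SparkSession, codes: Seq[String]): Array[String] = {
    val sc = spark.sparkContext
    // The broadcast variable is shipped to each executor once and cached there,
    // rather than being serialized with every task.
    val namesBc = sc.broadcast(countryNames)
    try {
      sc.parallelize(codes, numSlices = 2)
        .map(code => namesBc.value.getOrElse(code, "unknown"))
        .collect()
    } finally {
      // Release the broadcast blocks once the result has been collected.
      namesBc.destroy()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-lookup-sketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    try println(resolve(spark, Seq("PL", "US", "DE", "XX")).mkString(", "))
    finally spark.stop()
  }
}
```

Codes missing from the map resolve to `"unknown"`. Destroying the broadcast in `finally` suits this short-lived example; a long-running job would usually keep the broadcast alive for as long as later stages still read it.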

What You Will Learn

Solve long-running computation problems by leveraging lazy evaluation in Spark
Avoid memory leaks by understanding the internal memory management of Apache Spark
Rework pipelines that fail to scale out by using partitions
Debug and create user-defined functions that enrich the Spark API
Choose a proper join strategy depending on the characteristics of your input data
Troubleshoot the join APIs of DataFrames and Datasets
Write code that minimizes object creation using the proper API
Troubleshoot real-time pipelines written in Spark Streaming
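Choosing a join strategy often comes down to whether one side is small enough to broadcast. The minimal sketch below assumes Spark 2.x or later and uses hypothetical `orders` and `customers` DataFrames joined on a `customerId` column. It applies the `broadcast` hint from `org.apache.spark.sql.functions` so that Spark plans a broadcast hash join instead of shuffling both sides.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object JoinStrategySketch {
  // Joins a large DataFrame with a small one on a hypothetical "customerId" column.
  // The broadcast() hint asks Spark to copy the small side to every executor,
  // so the large side can be joined without being shuffled.
  def joinWithBroadcast(large: DataFrame, small: DataFrame): DataFrame =
    large.join(broadcast(small), Seq("customerId"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-strategy-sketch")
      .master("local[*]") // local mode, for illustration only
      // Disable size-based automatic broadcasting so only the explicit hint drives the choice.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: many orders referencing a small set of customers.
    val orders = (1 to 100000).map(i => (i, i % 100)).toDF("orderId", "customerId")
    val customers = (0 until 100).map(i => (i, s"customer-$i")).toDF("customerId", "name")

    val joined = joinWithBroadcast(orders, customers)
    joined.explain() // the physical plan should show BroadcastHashJoin rather than SortMergeJoin
    println(joined.count())
    spark.stop()
  }
}
```

Inspecting the `explain()` output is a quick way to confirm which strategy Spark picked. If the small side grows beyond what fits comfortably in executor memory, a shuffle-based sort-merge join is usually the safer choice.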

Instructions and Navigation

Assumed Knowledge

To fully benefit from the coverage included in this course, you will need experience with Apache Spark.

Technical Requirements

This course has the following hardware and software requirements.
For an optimal experience with hands-on labs and other practical activities, we recommend the following configuration:

OS: Mac
Processor: Not Applicable
Memory: 4GB or above
Storage: 50GB free space

Software Requirements

OS: Windows or Mac
Browser: Google Chrome
Atom IDE, latest version
Node.js LTS 8.9.1 installed

Folders and files

Section_1
Section_2
Section_3
Section_4
churnanalysis
data/mllib
project
resources
src
LICENSE
README.md
SparkApisTests.scala
SparkTransformationsActionsTest.scala
build.sbt

PacktPublishing/Troubleshooting-Apache-Spark