3 - PDFsam - Beginner Guide Spark
Disclaimer
This material is intended only for learners and not for any commercial purpose. If you are not the
intended recipient, you should not distribute or copy this material. Please notify the sender immediately or
contact us.
Published by
ACADGILD,
[email protected]
What is Spark?
Apache Spark is a cluster computing framework that can run on Hadoop and handles many different types of data. It is a one-stop solution to many data-processing problems. Spark has rich resources for handling data and, most importantly, it can be 10-20x faster than Hadoop's MapReduce. It attains this speed through its in-memory primitives: data can be cached in memory (RAM), so computations run in memory instead of repeatedly reading the data from disk.
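The caching idea can be illustrated with a toy model in plain Python. This is not Spark's API: `ToyDataset`, `read_source` and the `reads` counter are hypothetical names for this sketch, chosen to echo the method names (`map`, `filter`, `cache`, `count`, `collect`) that PySpark RDDs provide. Transformations are only recorded. Each action replays them from the source unless `cache()` was called, in which case the first action keeps the result in memory and later actions reuse it.

```python
from typing import Callable, Iterable, List, Optional

Step = Callable[[List[int]], List[int]]


class ToyDataset:
    """Hypothetical stand-in for a Spark dataset (NOT Spark's API).

    Transformations are recorded lazily; every action replays them from
    the source unless the dataset has been cached in memory.
    """

    def __init__(self, source: Callable[[], Iterable[int]],
                 steps: Optional[List[Step]] = None):
        self._source = source            # simulates reading data from disk
        self._steps = steps or []        # recorded (not yet run) transformations
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None

    def map(self, f: Callable[[int], int]) -> "ToyDataset":
        return ToyDataset(self._source, self._steps + [lambda xs: [f(x) for x in xs]])

    def filter(self, p: Callable[[int], bool]) -> "ToyDataset":
        return ToyDataset(self._source, self._steps + [lambda xs: [x for x in xs if p(x)]])

    def cache(self) -> "ToyDataset":
        # Like Spark, caching is lazy: nothing is stored until the first action.
        self._cache_enabled = True
        return self

    def _compute(self) -> List[int]:
        if self._cached is not None:
            return self._cached          # served from memory, no re-read
        data = list(self._source())      # re-read the source
        for step in self._steps:         # replay every transformation
            data = step(data)
        if self._cache_enabled:
            self._cached = data          # keep the result in RAM
        return data

    # Actions trigger computation.
    def count(self) -> int:
        return len(self._compute())

    def collect(self) -> List[int]:
        return list(self._compute())


reads = {"n": 0}


def read_source() -> Iterable[int]:
    reads["n"] += 1                      # count simulated disk reads
    return range(10)


plain = ToyDataset(read_source).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
plain.count()
plain.count()
print("reads without cache:", reads["n"])

reads["n"] = 0
cached = ToyDataset(read_source).filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
print(cached.count(), cached.collect())
print("reads with cache:", reads["n"])
```

In the uncached case every action pays for another read of the source. With `cache()`, the second action is served from memory. Real Spark caching is lazy in the same way, but it also distributes the cached partitions across the cluster, which this single-process sketch does not model.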
[Figure: the Spark stack, with Spark SQL + DataFrames, Spark Streaming, MLlib (Machine Learning) and GraphX (Graph Computation) built on top of the Spark Core API]
Spark's rich set of libraries covers almost all the components of Hadoop. For example, Spark can perform both batch processing and real-time data processing without additional Hadoop-ecosystem tools such as Kafka or Flume, because it has its own streaming engine, Spark Streaming.
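Spark Streaming works by dividing a live data stream into small batches at a fixed batch interval and running ordinary batch logic on each one. The sketch below models that idea in plain Python under simplifying assumptions: `word_count`, `micro_batches` and the sample `stream` are hypothetical, events are assumed to arrive already sorted by timestamp, and everything runs in a single process rather than on a cluster. The same `word_count` function serves both a one-off batch job and every micro-batch.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # hypothetical event: (arrival time in seconds, line of text)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Batch logic: count the words in a finite collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group a time-ordered stream into consecutive windows of `interval` seconds.

    Assumption: events arrive sorted by timestamp, starting at time >= 0.
    """
    current: List[str] = []
    window_end = interval
    for ts, line in events:
        while ts >= window_end:          # close every window that has ended
            yield current
            current = []
            window_end += interval
        current.append(line)
    yield current                        # flush the final, possibly partial, batch


stream = [(0.1, "spark is fast"), (0.5, "spark streaming"), (1.2, "is fast"), (2.7, "spark")]

# Batch mode: run the logic once over all the data.
batch_result = word_count(line for _, line in stream)

# Streaming mode: run the same logic on each micro-batch as it closes.
stream_results = [word_count(batch) for batch in micro_batches(stream, interval=1.0)]

print(batch_result)
print(stream_results)
```

Adding up the per-batch counts gives the same totals as the batch job, which is the point of the design: one engine and one piece of logic for both modes. In Spark itself the batch interval is supplied when creating a `StreamingContext`; the newer Structured Streaming API applies the same idea to DataFrames.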