Apache Spark Workshop, 2020-09-17
[Diagram: the Spark stack. Components on top of Spark Core: Spark SQL, Spark Streaming, machine learning (MLlib), GraphX, 3rd-party libraries. Language APIs: R, Java, Python, Scala. Cluster managers: Standalone scheduler, EC2, Hadoop YARN, Apache Mesos, Kubernetes]
▶ Core functionalities
▶ task scheduling
▶ memory management
▶ fault recovery
▶ storage systems interaction
▶ etc.
▶ Basic data structure definitions/abstractions
▶ Resilient Distributed Datasets (RDDs)
▶ main Spark data structure (see the sketch below)
▶ Directed Acyclic Graph (DAG)
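A minimal RDD sketch (assuming a SparkContext named sc, as provided by the pyspark shell; the numbers are purely illustrative):

nums = sc.parallelize(range(1, 11), 4)     # RDD with 4 partitions
squares = nums.map(lambda x: x * x)        # transformation: lazy, only extends the DAG
squares.reduce(lambda a, b: a + b)         # action: triggers execution -> 385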
▶ Structured data manipulation
▶ DataFrames definition
▶ Table-like data representation
▶ Extension of RDDs
▶ Schema definition
▶ Execution of SQL queries (see the sketch below)
▶ Native support for schema-based data
▶ Hive, Parquet, JSON, CSV
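A minimal Spark SQL sketch (assuming a SparkSession named spark; the file name reviews.csv is a placeholder):

df = spark.read.csv('reviews.csv', header=True, inferSchema=True)
df.createOrReplaceTempView('reviews')      # register the DataFrame for SQL queries
spark.sql('SELECT sentiment, COUNT(*) AS n FROM reviews GROUP BY sentiment').show()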
▶ Data analysis of streaming data
▶ e.g. tweets, log messages
▶ Features of stream processing
▶ High-throughput
▶ Fault-tolerant
▶ End-to-end exactly-once guarantees
▶ High-level abstraction of a discretized stream
▶ DStream, represented as a sequence of RDDs (see the sketch below)
▶ Spark 2.3+: Continuous Processing
▶ end-to-end latencies as low as 1 ms
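A minimal DStream sketch (assuming a SparkContext sc and a text source on localhost:9999, e.g. started with nc -lk 9999; local mode needs local[N] with N >= 2 so the receiver does not starve processing):

from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, batchDuration=5)        # 5-second micro-batches
lines = ssc.socketTextStream('localhost', 9999)    # DStream: a sequence of RDDs
counts = lines.flatMap(lambda l: l.split()) \
              .map(lambda w: (w, 1)) \
              .reduceByKey(lambda a, b: a + b)
counts.pprint()                                    # print each batch's word counts
ssc.start()
ssc.awaitTermination()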
▶ Common ML functionalities
▶ ML Algorithms
▶ common learning algorithms such as classification, regression, clustering, and collaborative filtering
▶ Featurization
▶ feature extraction, transformation, dimensionality reduction, and selection
▶ Pipelines
▶ tools for constructing, evaluating, and tuning ML Pipelines
▶ Persistence
▶ saving and loading algorithms, models, and Pipelines
▶ Utilities
▶ linear algebra, statistics, data handling, etc.
▶ Two APIs
▶ RDD-based API (spark.mllib package)
▶ Spark 2.0+, DataFrame-based API (spark.ml package)
▶ Methods scale out across the cluster by default (see the pipeline sketch below)
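A minimal spark.ml pipeline sketch (assuming a DataFrame df2 with 'review' and 'sentiment' columns; the stage choices and save path are illustrative, not the workshop's actual model):

from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, StringIndexer
from pyspark.ml.classification import LogisticRegression

pipe = Pipeline(stages=[
    Tokenizer(inputCol='review', outputCol='words'),        # featurization
    HashingTF(inputCol='words', outputCol='features'),
    StringIndexer(inputCol='sentiment', outputCol='label'),
    LogisticRegression(maxIter=10),                         # ML algorithm
])
model = pipe.fit(df2)           # fits every stage, distributed across the cluster
model.save('sentimentModel')    # persistence; reload with PipelineModel.load(...)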
▶ Local mode
▶ "Pseudo-cluster" ad-hoc setup using a script
▶ Cluster mode
▶ Running via cluster manager
▶ Interactive mode
▶ Direct manipulation in a shell (pyspark, spark-shell)
▶ Non-distributed single-JVM deployment mode
▶ Spark library spawns (in a JVM)
▶ driver
▶ scheduler
▶ master
▶ executor
▶ Parallelism is the number of threads, defined by the parameter N in the Spark master URL
▶ local[N] (see the sketch below)
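A minimal local-mode sketch (the app name is arbitrary):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('local[4]') \
    .appName('workshop') \
    .getOrCreate()
sc = spark.sparkContext    # driver, scheduler, master and executor share this one JVM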
▶ Components
▶ Worker
▶ Node in a cluster that runs one or more executors
▶ An executor manages computation, storage, and caching
▶ Cluster manager
▶ Allocates resources, negotiated via the driver program's SparkContext (see the sketch below)
▶ Driver program
▶ A program holding SparkContext and main code to execute in Spark
▶ Sends application code to executors to execute
▶ Listens to incoming connections from executors
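Connecting a driver to a cluster manager looks just like local mode; only the master URL changes. A minimal sketch (host and port are placeholders for a standalone cluster):

from pyspark.sql import SparkSession

# The driver's SparkContext asks this cluster manager for executors on the workers
spark = SparkSession.builder \
    .master('spark://master-node:7077') \
    .appName('workshop') \
    .getOrCreate()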
1. Data preparation/import
▶ RDD creation, i.e. a parallel dataset with partitions
2. Transformations/actions definition
▶ Creation of tasks (units of work), each sent to one executor
▶ A job is the set of tasks executed by an action
3. Creation of a directed acyclic graph (DAG)
▶ Contains a graph of RDD operations
▶ Definition of stages: sets of tasks to be executed in parallel (i.e. at partition level)
4. Execution of a program
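The workshop example below walks through these steps on an IMDB movie-reviews dataset; rdd2 is assumed to hold the raw CSV lines, one 'review,sentiment' record per line.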
def organize(line):
    # Split '"review",sentiment' records; fall back to a plain comma split
    data = line.split('",')
    data = data if len(data) == 2 else line.split(',')
    return (data[1], data[0][1:51] + ' ...')   # (sentiment, truncated review text)

movies = rdd2.filter(lambda x: x != 'review,sentiment').map(organize)  # drop the header line
movies.count()  # 50,000
movies = movies.filter(lambda x: x[0] in ['positive', 'negative'])     # keep clean labels only
movies.count()  # 45,936
movieCounts = movies.groupByKey().map(lambda x: (x[0], len(x[1])))     # reviews per sentiment
posReviews.cache().collect()   # posReviews: an RDD derived earlier (not shown on this slide)
[Diagram: the DAG of the job above, split into Stage 1 and Stage 2]
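The stage boundary comes from the groupByKey shuffle; the lineage (DAG) behind it can be inspected directly, a minimal sketch:

movieCounts.toDebugString()    # RDD lineage, including the shuffle boundary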
df1.show()          # display the first rows of the DataFrame
df1.printSchema()   # print the inferred schema
df2.printSchema()   # df1, df2: DataFrames built earlier in the workshop (not shown here)
▶ User-defined functions (UDFs) are custom functions that run against the "database" directly
▶ Caveats
▶ Optimization problems (especially in PySpark!)
▶ Special values must be handled by the programmer (e.g. null values)
▶ Approaches to use UDFs (see the sketch below)
▶ df = df.withColumn(...)
▶ df = sqlContext.sql("SELECT * FROM <UDF>")
▶ rdd.map(UDF())
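A minimal withColumn UDF sketch producing the 'positiveWords' column used below; the word list is a toy assumption, not the workshop's actual list:

from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

POSITIVE = {'good', 'great', 'excellent', 'wonderful'}   # toy word list (assumption)

# Handle null reviews explicitly -- one of the caveats above
count_positive = udf(
    lambda text: sum(w in POSITIVE for w in text.split()) if text else 0,
    IntegerType())

df2 = df2.withColumn('positiveWords', count_positive(df2['review']))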
df2 = df2.drop('review')                                  # drop the raw text column
df2.select(df2['sentiment'], df2['positiveWords']).show(3)
df2.select(df2['sentiment'], df2['positiveWords']) \
    .filter(df2['positiveWords'] > 10).show(3)            # reviews with many positive words
df2.groupBy('sentiment').count().show()                   # rows per sentiment label
df2.summary().show()                                      # descriptive statistics for numeric columns
squeue -u campus02   # list the user's pending/running Slurm jobs
sacct -j 51438       # accounting details for job 51438
▶ https://training.databricks.com/visualapi.pdf
▶ https://events.prace-ri.eu/event/896/
▶ https://luminousmen.com/post/spark-core-concepts-explained
▶ https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:slurm_sbatch_script_for_spark_applications
▶ https://researchcomputing.princeton.edu/faq/spark-via-slurm