Apache Spark Installation and Programming Guide
This is a step-by-step guide to installing Apache Spark. Spark can run with several cluster managers, such as YARN, or on its own in local mode and standalone mode.
Standalone Deploy Mode
In this practical, you will configure Spark to run in standalone mode, where both the driver and the worker run on the same machine.
Since we use Java to write and run Spark programs, ensure that Java 8 is installed on every machine on which you will run Spark jobs.
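You can confirm the Java installation and version with the standard check below; it should report a 1.8.x release.
java -version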
To install Spark on the machine, download a prebuilt binary of Spark from the
http://spark.apache.org/downloads.html page.
Select the Spark release and package type you want from the options shown on that page.
You can also download Spark 1.6.1 directly by using the following command:
wget http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.4.tgz
Extract the Spark archive into the directory where you want to keep Spark:
tar xvf spark-1.6.1-bin-hadoop2.4.tgz -C /DeZyre
Make a soft link to the actual Spark directory (this will be helpful for any future version upgrade):
ln -s spark-1.6.1-bin-hadoop2.4 spark
Make an entry for Spark in the .bashrc file, replacing /mydirectory with the directory that contains the spark soft link (/DeZyre in this example):
export SPARK_HOME=/mydirectory/spark
export PATH=$SPARK_HOME/bin:$PATH
Source the updated .bashrc file with the following command:
source ~/.bashrc
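To confirm that the new environment variables are in effect, you can, for example, print SPARK_HOME:
echo $SPARK_HOME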
We have now configured Spark in standalone mode. To verify the setup, launch the Spark shell with the following command:
spark-shell
Inside the shell, check the Spark version with the following command:
sc.version
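If the installation succeeded, the version check in the shell should look roughly like the sketch below (the exact string depends on the distribution you downloaded):
scala> sc.version
res0: String = 1.6.1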
Writing a Program
Next, we will write a basic Java application to count the words in a file. Below is the source code for the
word count program in Apache Spark. The snippet imports the Spark classes it needs; you also need to
replace the placeholder paths with the input and output locations you want to use.
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Read the input file; sc is the JavaSparkContext created by the driver
// (replace the path with your input location).
JavaRDD<String> textFile = sc.textFile("hdfs://...");

// Split each line into individual words.
JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});

// Map each word to a (word, 1) pair.
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});

// Add up the counts for each word.
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) { return a + b; }
});

// Save the result (replace the path with your output location).
counts.saveAsTextFile("hdfs://...");
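For reference, here is a minimal sketch of how the snippet above could be packaged as a complete, runnable driver program. The class name JavaWordCount, the use of command-line arguments for the input and output paths, and the final sc.stop() call are illustrative choices for this sketch, not part of the original guide.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

// Hypothetical driver class wrapping the word count logic shown above.
public class JavaWordCount {
    public static void main(String[] args) {
        // Input and output paths are taken from the command line (an assumption made for this sketch).
        String inputPath = args[0];
        String outputPath = args[1];

        // Create the Spark context; the master URL is supplied by spark-submit.
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input file.
        JavaRDD<String> textFile = sc.textFile(inputPath);

        // Split each line into individual words.
        JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
        });

        // Map each word to a (word, 1) pair.
        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
        });

        // Add up the counts for each word.
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });

        // Save the result and shut down the context.
        counts.saveAsTextFile(outputPath);
        sc.stop();
    }
}

Once the class is compiled into a JAR (for example with Maven or sbt), it can be submitted to Spark with a command along the lines of spark-submit --class JavaWordCount --master local[2] wordcount.jar <input path> <output path>, where the JAR name and master URL are placeholders for your own setup.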