
Apache Spark on Docker

This repository contains a Dockerfile to build a Docker image with Apache Spark. This Docker
image depends on our previous Hadoop Docker image, available at the SequenceIQ GitHub page.
The base Hadoop Docker image is also available as an official Docker image.

1. Pull the image from Docker Repository


COMMAND - docker pull sequenceiq/spark:1.6.0
2. Building the image
COMMAND - cd /path/to/dockerfile
docker build --rm -t sequenceiq/spark:1.6.0 .
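
Either way, an optional check that the image is now available locally:

docker images sequenceiq/spark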


Running the image

- if using boot2docker make sure your VM has more than 2GB memory
- in your /etc/hosts file add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI (see the sketch after the run commands below)
- open yarn UI ports when running container

docker run -it -p 8088:8088 -p 8042:8042 -p 4040:4040 -h sandbox sequenceiq/spark:1.6.0 bash


or

docker run -d -h sandbox sequenceiq/spark:1.6.0
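
For the boot2docker host entry mentioned above, a minimal sketch (assuming the boot2docker binary is on your PATH and that `boot2docker ip` prints just the VM address):

# append the VM IP as host 'sandbox' on the host machine
echo "$(boot2docker ip) sandbox" | sudo tee -a /etc/hosts

With the ports published as in the interactive run command, the YARN ResourceManager UI should then be reachable at http://sandbox:8088 and the NodeManager UI at http://sandbox:8042.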

Versions
Hadoop 2.6.0 and Apache Spark v1.6.0 on CentOS

Testing
There are two deploy modes that can be used to launch Spark applications on YARN.

YARN-client mode
In yarn-client mode, the driver runs in the client process, and the application master is only used
for requesting resources from YARN.

# run the spark shell


spark-shell \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1
# execute the following command which should return 1000
scala> sc.parallelize(1 to 1000).count()
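
While the shell is running, a quick check from another terminal inside the container that the application is registered with YARN (uses the standard YARN CLI):

yarn application -list

The Spark shell should appear in the list in the RUNNING state.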

YARN-cluster mode
In yarn-cluster mode, the Spark driver runs inside an application master process which is managed
by YARN on the cluster, and the client can go away after initiating the application.

Estimating Pi (yarn-cluster mode):

# execute the following command which should write the "Pi is roughly 3.1418" into the logs
# note you must specify --files argument in cluster mode to enable metrics
spark-submit \
--class org.apache.spark.examples.SparkPi \
--files $SPARK_HOME/conf/metrics.properties \
--master yarn-cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar
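
Because the driver runs inside the application master in cluster mode, the result is not printed to your terminal. A sketch of pulling it out of the aggregated YARN logs, assuming log aggregation is enabled in this image (the application id below is an illustrative placeholder; spark-submit prints the real one):

# replace the placeholder with the application id reported by spark-submit
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep "Pi is roughly"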
Estimating Pi (yarn-client mode):

# execute the following command which should print the "Pi is roughly 3.1418" to the screen
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar

Submitting from outside the container

To use Spark from outside the container, set the YARN_CONF_DIR environment variable to a directory containing a configuration appropriate for the Docker cluster. The repository contains such a configuration in the yarn-remote-client directory.

export YARN_CONF_DIR="`pwd`/yarn-remote-client"
The HDFS inside the Docker container can be accessed only by root. When submitting Spark applications from outside the cluster as a user other than root, set the HADOOP_USER_NAME variable so that the root user is used.

export HADOOP_USER_NAME=root
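
Putting it together, a minimal sketch of a submission from the host machine (assumes a local Spark 1.6.0 client installation with SPARK_HOME set and that you are in the repository root; paths are illustrative):

# point the client at the container's YARN and run as root
export YARN_CONF_DIR="`pwd`/yarn-remote-client"
export HADOOP_USER_NAME=root
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.6.0-hadoop2.6.0.jar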
