
BDC Output 6

Uploaded by

vogalaf328
Copyright
© © All Rights Reserved
Steps :-

1st Step:

Create 2 clones of the Virtual Machine you’ve previously created.


2nd Step:

Make sure all the VMs have the following network configuration on Adapter 2:

3rd Step:

Let’s change the hostname on each virtual machine. Open the hostname file and type the name of the machine. Use this command:

sudo nano /etc/hostname
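As a rough sketch of what the file should end up containing, the snippet below writes a single hostname line to a scratch file. The name pd-master is an assumption (it matches the ssh-copy-id targets used later in this guide), and the scratch file stands in for /etc/hostname so the snippet runs without root.

```shell
# Assumed hostname for the master VM; use e.g. pd-slave1 / pd-slave2 on the clones.
NEW_HOSTNAME="pd-master"

# Scratch file so the sketch runs without root; the real file is /etc/hostname.
HOSTNAME_FILE="$(mktemp)"

# The hostname file holds exactly one line: the machine name.
printf '%s\n' "$NEW_HOSTNAME" > "$HOSTNAME_FILE"
cat "$HOSTNAME_FILE"

# To apply it for real (needs root):
#   echo "$NEW_HOSTNAME" | sudo tee /etc/hostname
```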


4th Step:

Now let’s figure out what our IP address is. To do that, just type the command:

ip addr

This is on the master VM; as you can see, our IP is 192.168.205.10.

This means that our IPs are:

master: 192.168.205.10
slave1: 192.168.205.11
slave2: 192.168.205.12
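
If the full `ip addr` output is hard to read, a small filter can pull out just the interface name and IPv4 address. This is only a sketch: `list_ipv4` is a hypothetical helper, and the interface name enp0s8 in the sample line is an assumption about how Adapter 2 shows up inside the VM.

```shell
# list_ipv4: read `ip -o -4 addr show` output on stdin and
# print "<interface> <address>" for each line, dropping the /prefix.
list_ipv4() {
  awk '{ split($4, a, "/"); print $2, a[1] }'
}

# Sample line in the one-line format printed by `ip -o -4 addr show`
# (interface name enp0s8 is an assumption).
sample='3: enp0s8    inet 192.168.205.10/24 brd 192.168.205.255 scope global enp0s8'
printf '%s\n' "$sample" | list_ipv4   # prints: enp0s8 192.168.205.10

# On a VM you would run:
#   ip -o -4 addr show | list_ipv4
```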

5th Step:

We need to edit the hosts file. Use the following command:

sudo nano /etc/hosts

and add your network information:
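
The original screenshot of the hosts file is not available, so here is a sketch of what the entries could look like. The IP addresses come from the 4th step; the hostnames pd-master, pd-slave1 and pd-slave2 are an assumption taken from the ssh-copy-id commands later on, so use whatever names you set in the 3rd step. `hosts_entries` is a hypothetical helper.

```shell
# hosts_entries: print one "IP hostname" line per VM (hostnames are assumed).
hosts_entries() {
  cat <<'EOF'
192.168.205.10 pd-master
192.168.205.11 pd-slave1
192.168.205.12 pd-slave2
EOF
}

hosts_entries

# To append the lines instead of typing them in nano (needs root):
#   hosts_entries | sudo tee -a /etc/hosts
```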
6th Step:

For the machines to pick up the changes from the previous steps, we need to reboot them. Use the following command on all of them:

sudo reboot

7th Step:

Do this step on all the Machines, master and slaves.


Now, to install Java, we need to run a few commands. Follow these commands and give permission when needed:
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install openjdk-11-jdk
To check if Java is installed, run the following command:
$ java -version

Next, copy your SSH public key to each machine:
$ ssh-copy-id user@pd-master
$ ssh-copy-id user@pd-slave1
$ ssh-copy-id user@pd-slave2

Let’s check that everything went well. Try to connect to the slaves:

$ ssh slave01
$ ssh slave02
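
If you’d rather not log in to each slave by hand, the check can be sketched as a small loop. `check_ssh` is a hypothetical helper; the host list reuses the names from the ssh commands above and must match whatever is in your /etc/hosts. It assumes the key was already copied with ssh-copy-id.

```shell
# Hostnames as used in the ssh commands above; adjust to match /etc/hosts.
SLAVES="slave01 slave02"

# check_ssh: try a non-interactive login to each host and report the result.
check_ssh() {
  for host in $SLAVES; do
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a missing key shows up as FAILED rather than a password prompt.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
    fi
  done
}
```

Run `check_ssh` on the master; any host reported as FAILED still needs its key copied.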

8th Step:

NOTE: Everything inside this step must be done on all the virtual machines.

Use the following command to download Apache Spark:

$ wget http://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

This is the most recent version as of the writing of this article; it might have changed if you try it later. Anyway, I think you’ll still be good using this one.

Extract the Apache Spark file you just downloaded

Use the following command to extract the Spark tar file:


$ tar xvf spark-2.4.4-bin-hadoop2.7.tgz

Use the following command to move the Spark software files to their directory (/usr/local/spark):
$ sudo mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark
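
Since the version may have changed by the time you read this, the download, extract and move commands above can be sketched with the version kept in one place. The variable names and the `install_spark` function are hypothetical; the URL pattern is simply the one used above, and a different version may live on a different mirror.

```shell
# Version details in one place, so only these two lines change for another release.
SPARK_VERSION="2.4.4"
HADOOP_PROFILE="hadoop2.7"

# Names derived from the version, matching the file used in this guide.
SPARK_DIR="spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}"
SPARK_TGZ="${SPARK_DIR}.tgz"
SPARK_URL="http://www-us.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_TGZ}"

# install_spark: download, extract and move Spark; stops at the first failure.
install_spark() {
  wget "$SPARK_URL" &&
  tar xvf "$SPARK_TGZ" &&
  sudo mv "$SPARK_DIR" /usr/local/spark
}

# Show what would be downloaded; call install_spark on each VM to run the steps.
echo "$SPARK_URL"
```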

Set up the environment for Apache Spark

Edit the ~/.bashrc file:

$ sudo gedit ~/.bashrc

Add the following line to the file. This adds the location of the Spark software files to the PATH variable.

Note: this screenshot has a mistake. When you’re doing this, don’t leave a space like I did. Just write “PATH=$PATH”.
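
The screenshot with the exact line is not available, so the following is a sketch of what it plausibly looks like. It assumes Spark was moved to /usr/local/spark as in the previous command, so its executables sit in /usr/local/spark/bin.

```shell
# Append Spark's bin directory to PATH (assumes Spark lives in /usr/local/spark).
# No spaces around "=": the shell would treat "PATH = ..." as a command, not an assignment.
export PATH=$PATH:/usr/local/spark/bin
```

After saving the file, run `source ~/.bashrc` or open a new terminal so the change takes effect.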
9th Step:

Apache Spark Master Configuration (do this step on the Master VM only)

Edit spark-env.sh

Move to the Spark conf folder, then create a copy of the spark-env.sh template and rename it:
$ cd /usr/local/spark/conf
$ cp spark-env.sh.template spark-env.sh

Now edit the configuration file spark-env.sh:

$ sudo vim spark-env.sh

And add the following parameters:

export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
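
With the values from this guide filled in, the two lines might look like the sketch below. The master IP comes from the 4th step. The JAVA_HOME path is an assumption: it is where the openjdk-11-jdk package usually lands on 64-bit Ubuntu, so check your own machine before copying it.

```shell
# Master IP from the 4th step of this guide.
export SPARK_MASTER_HOST='192.168.205.10'

# Assumed install path for openjdk-11-jdk on 64-bit Ubuntu; verify it with:
#   readlink -f "$(command -v java)"
# and drop the trailing /bin/java from the result.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```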

Edit the configuration file slaves in /usr/local/spark/conf:

$ sudo nano slaves
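
The slaves file lists one worker hostname per line. As a sketch, the snippet below writes such a list to a scratch file; the hostnames pd-slave1 and pd-slave2 are an assumption and must match the names in your /etc/hosts, and the scratch file stands in for /usr/local/spark/conf/slaves.

```shell
# Scratch path so the sketch runs anywhere; the real file is /usr/local/spark/conf/slaves.
SLAVES_FILE="$(mktemp)"

# One worker hostname per line (hostnames are assumed).
cat > "$SLAVES_FILE" <<'EOF'
pd-slave1
pd-slave2
EOF

cat "$SLAVES_FILE"
```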
