Experiment-2 BDA Lab
2. (i) Perform setting up and Installing Hadoop in its three operating modes:
a) Standalone,
b) Pseudo distributed,
c) Fully distributed
Fully Distributed Mode : This mode is fully distributed, with a minimum of two or more machines forming a cluster.
Hadoop is supported on the GNU/Linux platform and its flavors. Therefore, we have to install a Linux operating system to set up the Hadoop environment. In case you have an OS other than Linux, you can install VirtualBox and run Linux inside it.
Standalone Mode : There are no daemons running and everything runs in a single JVM. Standalone mode is suitable for running MapReduce programs during development, since it is easy to test and debug them.
STEP-1 : Update the package index.
The apt-get update command downloads the package lists from the repositories and "updates" them to get information on the newest versions of packages and their dependencies. It does this for all repositories and PPAs (Personal Package Archives).
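On Ubuntu this step is typically run as follows (assuming sudo privileges):
$ sudo apt-get update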
Installing Java
Java is the main prerequisite for Hadoop. First of all, you should verify the existence of java in
your system using the command “java -version”.
$ java -version
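If Java is not installed, one common route on Ubuntu for Oracle Java 7 (an assumption that matches the JAVA_HOME path used later, /usr/lib/jvm/java-7-oracle) is the WebUpd8 PPA:
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer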
Pre-installation Setup
Before installing Hadoop into the Linux environment, we need to set up Linux using ssh
(Secure Shell). Follow the steps given below for setting up the Linux environment.
The following commands are used for generating a key pair using SSH. Copy the public key from id_dsa.pub to authorized_keys, and provide the owner with read and write permissions to the authorized_keys file.
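A sketch of the usual commands, matching the id_dsa.pub file named above:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
You can verify passwordless login with "ssh localhost".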
Downloading Hadoop:
$ wget -c http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
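Once downloaded, extract the archive and move it to the install location assumed by the environment variables below (/usr/local/hadoop is a local choice, not a requirement):
$ tar xzf hadoop-2.7.0.tar.gz
$ sudo mv hadoop-2.7.0 /usr/local/hadoop
$ sudo chown -R $USER /usr/local/hadoop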
You can set Hadoop environment variables by appending the following commands to
~/.bashrc file.
#Hadoop Variables
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
$ source ~/.bashrc
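source reloads ~/.bashrc into the current shell so the new variables take effect immediately. You can confirm with:
$ echo $HADOOP_HOME
/usr/local/hadoop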
You can find all the Hadoop configuration files in the location
“$HADOOP_HOME/etc/hadoop”. It is required to make changes in those configuration files
according to your Hadoop infrastructure.
$ cd /usr/local/hadoop/etc/hadoop
In hadoop-env.sh, reset the Java environment variable by replacing its JAVA_HOME value with the location of Java on your system:
export JAVA_HOME="/usr/lib/jvm/java-7-oracle"
$ hadoop version
Output :-
If everything is fine with your setup, then you should see the following result:
Hadoop 2.7.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
d4c8d4d4d203c934e8074b31289a28724c0842cf
Compiled by jenkins on 2015-04-10T18:40Z
Compiled with protoc 2.5.0
From source with checksum a9e90912c37a35c3195d23951fd18f
This command was run using /usr/local/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0.jar
This means your Hadoop standalone mode setup is working fine. By default, Hadoop is configured to run in a non-distributed mode on a single machine.
Continuing from the standalone mode setup, we have to make the following configuration changes to convert the Hadoop setup into Pseudo-Distributed mode.
As noted above, the Hadoop configuration files are in "/usr/local/hadoop/etc/hadoop" and must be changed according to your Hadoop infrastructure, so change into that directory before editing the files as follows :
$ cd /usr/local/hadoop/etc/hadoop
The following is the list of files that you have to edit to configure Hadoop.
core-site.xml
The core-site.xml file contains information such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing the data, and the size of the Read/Write buffers.
Open core-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
The hdfs-site.xml file contains information such as the replication value, the namenode path, and the datanode paths of your local file systems, that is, the places where you want to store the Hadoop data.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hadoop_store/hdfs/datanode</value>
</property>
</configuration>
Note: In the above file, all the property values are user-defined and you can make changes
according to your Hadoop infrastructure.
yarn-site.xml
This file is used to configure YARN in Hadoop. Open the yarn-site.xml file and add the following properties between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
mapred-site.xml
This file is used to specify which MapReduce framework we are using. By default, Hadoop contains a template of this file named mapred-site.xml.template. First of all, copy mapred-site.xml.template to mapred-site.xml using the following command.
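$ cp mapred-site.xml.template mapred-site.xml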
Open the mapred-site.xml file and add the following properties between the <configuration> and </configuration> tags.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Verifying the Hadoop Pseudo-Distributed Mode Installation
$ cd ~
$ mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
$ mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
Set up the namenode using the command "hdfs namenode -format" as follows:
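$ hdfs namenode -format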
The following command is used to start all the services (daemons). Executing this command will start your Hadoop file system and the YARN daemons.
$ start-all.sh
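Note : start-all.sh is deprecated in Hadoop 2.x; running start-dfs.sh followed by start-yarn.sh is the equivalent. jps then lists the Java daemons that are running: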
$ jps
Output :-
If everything is fine with your setup, then you should see the following result:
3796 Jps
3205 SecondaryNameNode
2849 NameNode
3362 ResourceManager
3492 DataNode
3498 NodeManager
FULLY DISTRIBUTED MODE :
As the whole cluster cannot be demonstrated, we explain the Hadoop cluster environment using three systems (one master and two slaves); their IP addresses are given below.
Clone the Hadoop single-node cluster as Hadoop master, slave-1 and slave-2.
(Take 3 systems installed with the Hadoop single-node cluster and make the following changes to configure fully distributed mode (a multi-node cluster).)
192.168.1.15 master
192.168.1.16 slave1
192.168.1.17 slave2
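These three entries go into /etc/hosts on every machine (master and both slaves) so that the nodes can resolve one another by hostname; the IP addresses shown are examples from this lab's network:
$ sudo nano /etc/hosts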
On the master :
$ cd /usr/local/hadoop/etc/hadoop
In hdfs-site.xml, replace the dfs.replication value 1 with 3, so that blocks are replicated across all three nodes.
Edit the slaves file so that it lists every node that should run a datanode, one hostname per line (the slave<number> names defined above):
master
slave1
slave2
Only the master runs the namenode, so keep the dfs.namenode.name.dir property in the master's hdfs-site.xml; the slaves need only dfs.datanode.data.dir.
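Passwordless SSH from the master to each slave is also required, since the start scripts log in to the slaves over SSH; copy the master's public key into each slave's authorized_keys as in the pseudo-distributed setup. Then, on the master, format the namenode once before the first start:
$ hdfs namenode -format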
$ start-all.sh
Output :
Checking in Browser :
http://master:8088/
http://master:50070/
(ii) Use web-based tools to monitor your Hadoop setup.
The default port number of the Hadoop NameNode web UI is 50070. Use the following URL to get Hadoop services in a browser.
http://localhost:50070
The default port number of the ResourceManager web UI, which shows all applications of the cluster, is 8088. Use the following URL to visit this service.
http://localhost:8088