
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY: VADAPALANI CAMPUS

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Prerequisite

Ubuntu 16.04 (for example, an Amazon EC2 Ubuntu instance)

Enable password authentication on the EC2 instance:

Set a password for the ubuntu user: sudo passwd ubuntu

Step 1: Install Java 8

1. sudo add-apt-repository ppa:webupd8team/java


2. sudo apt-get update
3. sudo apt-get install oracle-java8-installer
4. sudo apt-get install oracle-java8-set-default
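
To confirm the installation, check the Java version (the exact update number will differ from the one sketched in the comment):

java -version
# should report something like: java version "1.8.0_xxx"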

Step 2: SSH SERVER INSTALLATION

5. sudo apt-get install openssh-server

6. sudo sed -i -e 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config

7. ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

(RSA is used instead of DSA here because OpenSSH 7.x, as shipped with Ubuntu 16.04, disables DSA keys by default.)

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

8. sudo service ssh restart

9. ssh localhost
// should log in without a password prompt
10. exit
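
If ssh localhost still asks for a password, the usual culprit is overly permissive file permissions on the key material; tightening them is a standard fix:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys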

Step 3: Download the Hadoop package

https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

11. wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

12. sudo tar -xzvf hadoop-2.7.3.tar.gz


sudo mkdir -p /usr/local/hadoop

sudo mv hadoop-2.7.3/* /usr/local/hadoop/

13. sudo chown -R ubuntu:ubuntu /usr/local/hadoop
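
As an optional sanity check, list the installation directory; the entries shown in the comment assume the standard Hadoop 2.7.3 layout:

ls -l /usr/local/hadoop
# should list bin, etc, lib, sbin, share, ... all owned by ubuntu:ubuntu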

// create the Hadoop temporary directory (HDFS keeps the NameNode and DataNode data under it)

14. sudo mkdir -p /app/hadoop/tmp

Set permissions:

15. sudo chown -R ubuntu /app/hadoop/tmp

Step 4: Configure Hadoop:


• Check where your Java is installed:

16. readlink -f /usr/bin/java

If you get something like /usr/lib/jvm/java-8-oracle/jre/bin/java, then /usr/lib/jvm/java-8-oracle is what you should use as JAVA_HOME.

• Add the following to your ~/.bashrc file:

17. sudo nano ~/.bashrc


export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"

• Reload the ~/.bashrc file:

18. source ~/.bashrc
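
At this point the hadoop command should be on your PATH; a quick check:

hadoop version
# the first line should read: Hadoop 2.7.3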
• Modify JAVA_HOME in hadoop-env.sh:

19. sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

• Modify core-site.xml:

20. sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

to have something like:


<configuration>
...
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/app/hadoop/tmp</value>

<description>A base for other temporary directories.</description>

</property>

...
</configuration>
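
Note that hdfs://master:9000 assumes the hostname master resolves to this machine. On a single-node setup the simplest fix is an /etc/hosts entry; the 127.0.0.1 mapping below is an assumption for a single-node install (alternatively, use hdfs://localhost:9000 in core-site.xml and skip this):

echo "127.0.0.1 master" | sudo tee -a /etc/hosts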

• Modify yarn-site.xml:

21. sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

to have something like:


<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
</configuration>
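
Optionally, check that the edited file is still well-formed XML. xmllint is not installed by default on Ubuntu 16.04; libxml2-utils is the package that provides it:

sudo apt-get install -y libxml2-utils
xmllint --noout /usr/local/hadoop/etc/hadoop/yarn-site.xml && echo "yarn-site.xml OK"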

• Create /usr/local/hadoop/etc/hadoop/mapred-site.xml from the template:

22. cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

• Modify mapred-site.xml:

23. sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

to have something like:


<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

• Modify hdfs-site.xml (replication is set to 1 below because this is a single-node cluster):

24. sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

to have something like:


<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

</configuration>

• Format the file system:

25. hdfs namenode -format
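
To confirm the format succeeded, look for a "successfully formatted" line in the output, or check that the NameNode metadata directory was created; the path below follows from hadoop.tmp.dir=/app/hadoop/tmp and Hadoop's default dfs/name layout:

ls /app/hadoop/tmp/dfs/name/current
# should list VERSION, seen_txid and an fsimage_* file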

• Start Hadoop:

26. start-dfs.sh
27. start-yarn.sh

You might be asked to accept the machine's key.

• Check that everything is running:

28. jps

You should get something like this (jps prefixes each name with its process ID):


Jps
NodeManager
NameNode
ResourceManager
DataNode
SecondaryNameNode
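
For a quick smoke test of the whole stack, you can run the example job shipped with Hadoop; the jar path assumes the standard 2.7.3 layout under /usr/local/hadoop:

# estimate pi with 2 mappers and 5 samples each
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 5
# a successful run ends with a line like: Estimated value of Pi is 3.60000...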

TYPE IN WEB BROWSER

29. http://localhost:8088/cluster (YARN ResourceManager UI)
30. http://localhost:50070/ (HDFS NameNode UI)

Note: localhost only works in a browser running on the instance itself. From another machine, use the instance's public DNS name instead and open ports 8088 and 50070 in the EC2 security group.
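
If you prefer not to expose the ports publicly, an SSH tunnel from your local machine works too (<EC2_PUBLIC_DNS> is a placeholder for your instance's address):

ssh -L 8088:localhost:8088 -L 50070:localhost:50070 ubuntu@<EC2_PUBLIC_DNS>
# then browse http://localhost:8088/cluster and http://localhost:50070 locally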

HADOOP CLUSTER INSTALLED SUCCESSFULLY ON AMAZON EC2
