Installation of Hadoop

This document provides the steps to install and configure Hadoop on Ubuntu: 1) installing Java and configuring JAVA_HOME; 2) creating a dedicated Hadoop user "hduser" and generating an SSH key; 3) downloading and extracting Hadoop, then configuring the core-site.xml, mapred-site.xml, and hdfs-site.xml files; 4) formatting the NameNode and starting all Hadoop services using start-all.sh; 5) verifying that the services are running using the jps tool.


1. Installing Oracle JDK 1.7: Installing a JDK is a required step before installing Hadoop. You can follow the steps in my previous post, or the steps below.

1. Based on your Linux architecture, download the proper version from the Oracle website (Oracle JDK 1.7).
2. Then, uncompress the JDK archive using the following command:
tar -xvf jdk-7u71-linux-i586.tar.gz
Or use the following command for 64-bit systems:
tar -xvf jdk-7u71-linux-x64.tar.gz
3. Create a folder named jvm under /usr/lib (if it does not exist) using the following command:
sudo mkdir -p /usr/lib/jvm
4. Then, move the extracted directory to /usr/lib/jvm:
sudo mv ~/Downloads/jdk1.7.0_71 /usr/lib/jvm/
5. Run the following commands to update the execution alternatives:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0_71/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0_71/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0_71/bin/javaws" 1
6. Finally, you need to export the JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
It is better to set JAVA_HOME persistently in .bashrc:
nano ~/.bashrc
then add the same line at the end of the file:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
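To confirm that the JDK is installed and that the variable is set, you can run the following in a new terminal (the exact version string will vary with your update level):
java -version
echo $JAVA_HOME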

2. Adding a dedicated Hadoop system user: You will need a dedicated user for the Hadoop system you are going to install. To create a new user "hduser" in a group called "hadoop", run the following commands in your terminal:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
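You can verify that the user and group were created (the uid and gid values will differ on your machine):
$ id hduser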
3. Configuring SSH: In Michael's blog, he assumed that SSH is already installed. But if you didn't install an SSH server before, you can run the following command in your terminal. With this command, you will have installed an SSH server on your machine; the port is 22 by default.

$ sudo apt-get install openssh-server

We have installed SSH because Hadoop requires access to localhost (in the case of a single-node cluster) or communicates with remote nodes (in the case of a multi-node cluster).
After this step, you will need to generate an SSH key for hduser (and for any other users who need to administer Hadoop, if any) by running the following commands, but first you need to switch to hduser:
$ su - hduser
$ ssh-keygen -t rsa -P ""
To be sure that the SSH installation went well, you can open a new terminal and try to create an SSH session as hduser with the following command:
$ ssh localhost
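Note: if ssh localhost still prompts for a password, the newly generated public key usually also has to be appended to hduser's authorized_keys file; this step is easy to miss (run it as hduser):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys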

Installing Hadoop
Now we can download Hadoop to begin the installation. Go to Apache Downloads and download Hadoop version 0.20.2. To avoid permission issues, you can download the tar file into hduser's home directory, for example /home/hduser. Check the following snapshot:

[Screenshot: the Hadoop tar file downloaded into /home/hduser]

Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the following commands:
$ cd /home/hduser
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
Please note that if you want to grant access to another Hadoop admin user (e.g. hduser2), you have to give that user ownership of the hadoop folder using the following command:
sudo chown -R hduser2:hadoop hadoop
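Since the archive was extracted with sudo, the extracted files are owned by root at this point. If hduser is the user that will run Hadoop, you will likely also want to hand the installation over to it (this command is my addition, not part of the original post):
$ sudo chown -R hduser:hadoop /home/hduser/hadoop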

Update $HOME/.bashrc
You will need to update the .bashrc file for hduser (and for every user who needs to administer Hadoop). To open the .bashrc file, you will need to open it as root:
$ sudo gedit /home/hduser/.bashrc
Then add the following configuration at the end of the .bashrc file:

# Set Hadoop-related environment variables


export HADOOP_HOME=/home/hduser/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on).
# Keep only the line that matches your Java installation:
# export JAVA_HOME=/usr/lib/jvm/java-6-sun
# export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71 (if you installed Oracle JDK as above)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH

# Some convenient aliases and functions for running Hadoop-related commands


unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
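After saving the file, reload it and check that the hadoop command is on the PATH; hadoop version should report 0.20.2 if you followed the steps above:
$ source ~/.bashrc
$ hadoop version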

Hadoop Configuration

Now we need to configure the Hadoop framework on the Ubuntu machine. The following are the configuration files we can use to do the proper configuration. To learn more about Hadoop configuration, you can visit this site.

hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. Simply open the file in a text editor using the following command:

$ sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh


Then you will need to change the following line:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
or, if you used this post to install your Java:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
or, for the OpenJDK package:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Note: if you face an "Error: JAVA_HOME is not set" error while starting the services, it means that you forgot to uncomment this line (just remove the #).

core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you need this environment for testing or quick prototyping (e.g. developing simple Hadoop programs for your personal tests), I suggest creating this folder under the /home/hduser/ directory; otherwise, you should create it in a shared place (like /usr/local/...), but you may face some security issues. To avoid the exceptions that may be caused by security restrictions (like java.io.IOException), I have created the tmp folder under hduser's space.
To create this folder, type the following command:
$ sudo mkdir /home/hduser/tmp

Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant them read and write permission on this folder using the following commands:

$ sudo chown hduser2:hadoop /home/hduser/tmp
$ sudo chmod 755 /home/hduser/tmp
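Note that hduser itself also needs to be able to write to this directory; since it was created with sudo, a follow-up like the following is usually required as well (this command is my addition, not part of the original post):
$ sudo chown hduser:hadoop /home/hduser/tmp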
Now we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry.
We can open core-site.xml in a text editor:
$ sudo gedit /home/hduser/hadoop/conf/core-site.xml
Then add the following configuration between the <configuration> .. </configuration> XML elements:
<!-- In: conf/core-site.xml -->

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>

mapred-site.xml
We will open hadoop/conf/mapred-site.xml in a text editor and add the following configuration values (as with core-site.xml):
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>

hdfs-site.xml
Open hadoop/conf/hdfs-site.xml in a text editor and add the following configuration:
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

Formatting the NameNode
You should format the NameNode of your HDFS. You should not do this step while the system is running; it is usually done once, when you first install the cluster.
Run the following command:
$ /home/hduser/hadoop/bin/hadoop namenode -format

[Screenshot: NameNode formatting output]
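A quick way to confirm that formatting succeeded is to check that the NameNode storage directory was created; with the default dfs.name.dir it lives under hadoop.tmp.dir, so with the configuration above (this check is my suggestion, not from the original post):
$ ls /home/hduser/tmp/dfs/name/current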

Starting the Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script.

[Screenshot: starting the Hadoop services using ./start-all.sh]

There is a nice tool called jps. You can use it to check that all the services are up.

[Screenshot: output of the jps tool]
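On a healthy single-node setup, jps should list the five Hadoop daemons plus Jps itself, similar to the following (the process IDs are only illustrative and will differ on your machine):
$ jps
2287 NameNode
2393 DataNode
2500 SecondaryNameNode
2583 JobTracker
2688 TaskTracker
2750 Jps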

The key feature of a Writable is that the framework knows how to serialize and deserialize a Writable object. A WritableComparable additionally provides the compareTo() method, so the framework also knows how to sort WritableComparable objects.
