Installation of Hadoop
1. Download the Oracle JDK 1.7 build that matches your Linux architecture (32-bit or 64-bit) from the Oracle website.
2. Then uncompress the JDK archive. On a 32-bit system, use the following command:
tar -xvf jdk-7u65-linux-i586.tar
Or, on a 64-bit system, use the following command:
tar -xvf jdk-7u65-linux-x64.tar
3. Create a folder named jvm under /usr/lib (if it does not already exist) using the following command:
sudo mkdir -p /usr/lib/jvm
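4. Then move the extracted directory to /usr/lib/jvm. The directory name follows the JDK update you downloaded; jdk1.7.0_71 is used from here on, so adjust it if your version differs: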
sudo mv ~/Downloads/jdk1.7.0_71 /usr/lib/jvm/
5. Run the following commands to update the execution alternatives:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0_71/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0_71/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0_71/bin/javaws" 1
6. Finally, export the JAVA_HOME variable:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
This only lasts for the current terminal session, so it is better to set JAVA_HOME in .bashrc:
nano ~/.bashrc
then add the same line at the end of the file:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
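If you prefer not to edit .bashrc by hand, the same step can be scripted. The following is a minimal sketch, not part of the original instructions: add_java_home is a hypothetical helper name, and it assumes you want the line appended only when it is not already present.

```shell
# add_java_home is a hypothetical helper, not part of any JDK tooling.
# It appends "export JAVA_HOME=<path>" to a shell startup file unless
# exactly that line is already there, so running it twice is harmless.
add_java_home() {
    java_home=$1
    rc_file=$2
    line="export JAVA_HOME=$java_home"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example call (not executed here):
#   add_java_home /usr/lib/jvm/jdk1.7.0_71 "$HOME/.bashrc"
```

Open a new terminal (or run `. ~/.bashrc`) afterwards so that the variable is picked up.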
2. Adding a dedicated Hadoop system user: You will need a dedicated user for the Hadoop system you are about to install. To
create a new user "hduser" in a group called "hadoop", run the following commands in your terminal:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
3. Configuring SSH: Michael's blog assumes that SSH is already installed. If you have not installed an SSH
server before, you can run the following command in your terminal:
This command installs an SSH server on your machine; it listens on port 22 by default.
Installing Hadoop
Now we can download Hadoop and begin the installation. Go to Apache Downloads and download Hadoop version
0.20.2. To avoid security issues, you can download the tar file into the hduser home directory, for
example /home/hduser. Check the following snapshot:
Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the
following commands:
$ cd /home/hduser
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop
Please note that if you want to grant access to another Hadoop admin user (e.g. hduser2), you have to give that user
access to the hadoop folder under /home/hduser. The following command does this by changing the owner of the folder to hduser2:
sudo chown -R hduser2:hadoop hadoop
Update $HOME/.bashrc
You will need to update the .bashrc for hduser (and for every user who needs to administer Hadoop). To edit the .bashrc
file, open it as root:
$ sudo gedit /home/hduser/.bashrc
Then add the following configuration at the end of the .bashrc file:
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
# Keep only the export that matches your installed JDK; if several are left active, the last one wins
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# or you can use the following line if you used this post to install your Java
# export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
Hadoop Configuration
Now we need to configure the Hadoop framework on the Ubuntu machine. The following are the configuration files we
use to do so. To know more about Hadoop configuration, you can visit this site
hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. Open the file in a text editor
using the following command:
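As an alternative to editing the file by hand, the change can be scripted. This is a sketch under assumptions: set_hadoop_java_home is a hypothetical helper, and it assumes the file is hadoop/conf/hadoop-env.sh (the same conf directory as the other files in this post) and that it contains an "export JAVA_HOME=..." line, possibly commented out with "#".

```shell
# set_hadoop_java_home is a hypothetical helper, not part of Hadoop.
# It points the JAVA_HOME export in an env file at the given JDK path.
set_hadoop_java_home() {
    java_home=$1
    env_file=$2
    tmp_file="$env_file.tmp"
    if grep -q '^[# ]*export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing export line, uncommenting it if needed.
        sed "s|^[# ]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" \
            "$env_file" > "$tmp_file" && mv "$tmp_file" "$env_file"
    else
        # No such line yet: append one.
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}

# Example call (not executed here):
#   set_hadoop_java_home /usr/lib/jvm/jdk1.7.0_71 /home/hduser/hadoop/conf/hadoop-env.sh
```

The sketch writes to a temporary file and moves it into place instead of using `sed -i`, whose behaviour differs between platforms.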
core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you need this environment for testing or a quick
prototype (e.g. developing simple Hadoop programs for your personal tests), I suggest creating this folder
under the /home/hduser/ directory. Otherwise, you should create it in a shared location (like
/usr/local...), but you may face some security issues. To avoid the exceptions that security restrictions may cause
(like java.io.IOException), I have created the tmp folder under the hduser home directory.
To create this folder, type the following command:
$ sudo mkdir /home/hduser/tmp
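Because the folder is created with sudo, it starts out owned by root, so hduser cannot write to it until the ownership is changed. The following is a minimal sketch of one way to hand it over; prepare_tmp_dir is a hypothetical helper, and mode 770 (full access for the owner and the group, none for others) is an assumption rather than a setting taken from this post.

```shell
# prepare_tmp_dir is a hypothetical helper, not part of Hadoop.
# It gives a directory to the named owner and group and lets both
# read, write and enter it, while other users get no access.
prepare_tmp_dir() {
    dir=$1
    owner=$2
    group=$3
    chown "$owner:$group" "$dir" &&
        chmod 770 "$dir"
}

# The equivalent direct commands for this setup (not executed here):
#   sudo chown hduser:hadoop /home/hduser/tmp
#   sudo chmod 770 /home/hduser/tmp
```

With group access in place, any other member of the hadoop group can use the folder too, provided that user is also allowed to enter /home/hduser.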
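Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant that user read
and write permission on this folder.
Next, open hadoop/conf/core-site.xml in a text editor and add the following properties between the <configuration> tags: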
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
mapred-site.xml
Open hadoop/conf/mapred-site.xml in a text editor and add the following configuration values (as with
core-site.xml):
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
hdfs-site.xml
Open hadoop/conf/hdfs-site.xml in a text editor and add the following configuration:
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
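In each of these site files, the <property> elements sit inside a single <configuration> root element. To make the overall file layout concrete, here is a sketch that writes a complete one-property file; write_site_file is a hypothetical helper, and note that it overwrites the target file, so it is only suitable for a fresh setup.

```shell
# write_site_file is a hypothetical helper, not part of Hadoop.
# It OVERWRITES the target with a site file holding one property.
write_site_file() {
    target=$1
    name=$2
    value=$3
    cat > "$target" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Example call (not executed here):
#   write_site_file /home/hduser/hadoop/conf/hdfs-site.xml dfs.replication 1
```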
Formatting NameNode
You should format the NameNode in your HDFS. Do not do this step while the system is running. It is
usually done only once, the first time you install Hadoop.
Run the following command:
$ /home/hduser/hadoop/bin/hadoop namenode -format
NameNode Formatting
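Since formatting must not happen while Hadoop is running, it can help to check for a live NameNode first. The following is a minimal sketch of such a guard; namenode_running is a hypothetical helper that reads the output of the JDK's jps tool, which prints one "<pid> <main class>" pair per line.

```shell
# namenode_running is a hypothetical helper, not part of Hadoop.
# It reads `jps` output on standard input and succeeds only if a
# NameNode process is listed. The exact match on the second field
# keeps SecondaryNameNode from being counted as a NameNode.
namenode_running() {
    awk '$2 == "NameNode" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example use (not executed here):
#   if jps | namenode_running; then
#       echo "Hadoop is running; stop it before formatting." >&2
#   else
#       /home/hduser/hadoop/bin/hadoop namenode -format
#   fi
```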
Starting Hadoop Cluster
Navigate to the hadoop/bin directory and run the ./start-all.sh script.
There is a handy tool called jps. You can use it to check that all the services are up.
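The jps check can be automated. The sketch below lists any daemon that is absent from the jps output; missing_daemons is a hypothetical helper, and the five names are the daemons that start-all.sh launches on a single-node Hadoop 0.20.2 setup.

```shell
# missing_daemons is a hypothetical helper, not part of Hadoop.
# It reads `jps` output on standard input and prints, one per line,
# each expected Hadoop daemon that is not listed there.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }' ||
            printf '%s\n' "$daemon"
    done
}

# Example use (not executed here):
#   jps | missing_daemons
```

Empty output means all five daemons are up; any name printed points to the log file worth checking first.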
The key feature of a Writable is that the framework knows how to serialize and deserialize
a Writable object. WritableComparable adds the compareTo method, so the framework
also knows how to sort WritableComparable objects.