
Hadoop Installation Steps

1: Run Ubuntu

2: Open Terminal (Press Ctrl+Alt+T)


To Update Ubuntu with current Updates (Step 3 is Optional):
3: vsmcoesys058$ sudo apt-get update
Note: Press y or yes when prompted
To Install Java JDK (use either default-jdk or openjdk-7-jdk)
4: vsmcoesys058$ sudo apt-get install default-jdk
OR
4: vsmcoesys058$ sudo apt-get install openjdk-7-jdk
Note: Press y or yes when prompted

To Create Group for Hadoop:


5: vsmcoesys058$ sudo addgroup hadoop

To Create User for Hadoop Group:


syntax: $ sudo adduser --ingroup <groupname> <username>

6:vsmcoesys058$ sudo adduser --ingroup hadoop hdvsm

Enter new UNIX password: Type your password


Retype new UNIX password: Type your password
Press Enter key for remaining 4 options
Is Information correct? (y/n): y
Give superuser (sudo) permissions to the Hadoop username
7:vsmcoesys058$ sudo adduser hdvsm sudo

Install Secure Shell (ssh) for client and server, or server only.
8: vsmcoesys058$ sudo apt-get install openssh-server
Note: Press y or yes when prompted

Now Connect/Login with Hadoop User Name & Password


9:vsmcoesys058$ su hdvsm
Password: Type the password given in step 6

Identify Present Working Directory


10: hdvsm$ pwd

Generate ssh key by using RSA algorithm


11: hdvsm$ ssh-keygen -t rsa -P ""
Enter file in which to save the key (/home/hdvsm/.ssh/id_rsa): Press Enter Key
Note: Press y or yes when prompted

12: hdvsm$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Run ssh localhost


13:hdvsm$ ssh localhost
Are you sure you want to continue connecting (yes/no)? yes

Now, Exit from localhost


14:hdvsm$ exit

Now, Download and Extract Hadoop software:


wget is a non-interactive tool to download files from the web.
If you are downloading it, then the present working directory should be /home/admin_name,
where admin_name means the main Ubuntu user name.
Otherwise, go to step 16B if it is already downloaded.
15A: hdvsm$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
15.1: hdvsm$ cd Downloads
Extract Hadoop: the tar command is the equivalent of Unzip in Windows
x - extract, v - verbose, z - gzip/gunzip, f - file
15.2: hdvsm@vsmcoesys058/home/Downloads $ tar xvzf hadoop-2.7.1.tar.gz
Rename hadoop-2.7.1 into hadoop using mv command :
15.3: hdvsm@vsmcoesys058/home/Downloads $ sudo mv hadoop-2.7.1 hadoop

Move hadoop from the Downloads folder into the /usr/local/ folder using the mv command:
15.4: hdvsm@vsmcoesys058:~$ sudo mv hadoop /usr/local/
password for hdvsm: Type Password which is given in step 6
Goto Step 17

OR
16B: Open our Softwares Folder in Home directory .
16.1: Right Click on hadoop-2.7.1.tar.gz zipped file.
16.2: Click on “Open with Archive Manager”
16.3: Right Click on hadoop-2.7.1
16.4. Click on Extract
16.5: Click /Home directory in left side Explorer
16.6: Click Extract button available at right side Down Corner.
16.7: After extraction, click the Close button of the Extraction Result dialog box
Rename hadoop-2.7.1 into hadoop using mv command :
16.8: hdvsm@/home/admins$ sudo mv hadoop-2.7.1 hadoop

Move hadoop from the Home folder into the /usr/local/ folder using the mv command:
16.9: hdvsm@/home/admins$ sudo mv hadoop /usr/local/
password for hdvsm: Type the password given in step 6

Change directory back to the home directory

17: hdvsm$ cd

Change ownership to the hadoop user

18: hdvsm$ sudo chown -R hdvsm /usr/local
Configuration of Environments for Hadoop
(Setting Various Paths):
Note: nano and gedit are default editors of the Ubuntu OS.
19: hdvsm$ sudo nano ~/.bashrc
ADD the following lines of code at the end of the .bashrc file
(AFTER fi statement at the end of the file )
copy:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_HOME/lib"
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
paste into .bashrc file.
Then to save and close bashrc file :
1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X

Execute script file


19.1 : hdvsm$ source ~/.bashrc

Edit hadoop-env.sh shell file


20: hdvsm$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Find the JAVA_HOME= line in hadoop-env.sh and replace it with the following 3 lines:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib"

1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X

Edit core-site.xml file:


21: hdvsm$ sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
Paste the following property code between
<configuration> </configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X

Edit hdfs-site.xml file:


22: hdvsm$ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Paste the following property code between the
<configuration> </configuration> tags:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>

1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X


Note: By default, HDFS maintains 3 replicas. Here, 1 replica is specified. We can change the
replication value as needed.

Edit yarn-site.xml file:


23: hdvsm$ sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
Paste the following property code between the
<configuration> </configuration> tags:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X


Copy mapred-site.xml.template to mapred-site.xml
24:hdvsm$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/mapred-site.xml
Note: The above 2 lines form one command; copy and paste them as a single command

Edit mapred-site.xml file


25:hdvsm$ sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
Paste the following property code between <configuration>
</configuration> tag
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

1.Press CTRL+O 2. Press Enter Key 3.Press CTRL +X

Now Create Folder for HDFS for maintaining NameNode and DataNode

26:hdvsm$ sudo mkdir -p /usr/local/hadoop_tmp


Note: We can give the Hadoop folder any name (here, hadoop_tmp), but it should
match the paths given in the hdfs-site.xml properties (see hdfs-site.xml)

Create Namenode folder in HDFS


27:hdvsm$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode

Create Datanode folder in HDFS


28:hdvsm$ sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode

Assign owner permissions to the HDFS folder. Remember: give your own
hadoop username. Do not paste and run the command as-is; paste it, change the hadoop
username (created in step 6), and then run it (press the Enter key).
29: hdvsm$ sudo chown -R hdvsm /usr/local/hadoop_tmp

30: Close terminal

31: Open Terminal( Press Ctrl+Alt+T)

Login into/with Hadoop username


32: vsmcoesys058$ su hdvsm
Password: Type password of hadoop user.

Format Name Node of Hadoop Distributed File System (HDFS)


33:hdvsm$ hdfs namenode -format
Note: Press y or yes when prompted

Start the Daemons (background processes): NameNode, DataNode and Secondary NameNode
(started by start-dfs.sh), plus the ResourceManager and NodeManager (started by
start-yarn.sh, which replace the older Job Tracker and Task Tracker).
34: hdvsm$ start-dfs.sh
Note: sometimes the system asks the following, up to 3 times:
Enter passphrase for key '/home/hduser/.ssh/id_rsa': Press the Enter key
hduser@localhost's password: Type the password of the hadoop user.

35:hdvsm$ start-yarn.sh
Note: sometimes the system asks the following, up to 2 times:
Enter passphrase for key '/home/hduser/.ssh/id_rsa': Press the Enter key
hduser@localhost's password: Type the password of the hadoop user.

OR you can run $ start-all.sh instead


Monitor JVM Process Status using jps tool
36:hdvsm$ jps
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
4615 SecondaryNameNode
5081 NodeManager
4766 ResourceManager
4419 DataNode
5193 Jps
4262 NameNode

We have successfully installed and configured the Hadoop Framework.


To run any Hadoop commands and/or MapReduce programs, all
daemons should be activated/started.

Stop all Daemons at the end of HDFS & MapReduce processing
37: hdvsm$ stop-all.sh

-------------------------------------------------------------------------------------------

Opening Cluster User Interfaces (UI) in Browsers


To open HDFS and its folders in a Browser:
1. Open any Browser
2. Type http://localhost:50070 in the URL address bar
3. Click the Utilities menu to browse HDFS folders and files
To open Application status in a Browser:
1. Open any Browser
2. Type http://localhost:8088 in the URL address bar
3. Monitor the status of applications (i.e., MapReduce programs)

Common Errors:
1. If the NameNode is not shown by the jps command, then apply step 33 to
format the NameNode and start all daemons.
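For example, a typical recovery sequence (a sketch, run as the hadoop user) is:
$ stop-all.sh
$ hdfs namenode -format
$ start-all.sh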

2. If a Permission Denied error occurs, then apply the chmod command:


Example:
$ sudo chmod 777 -R /usr/local/hadoop

-R - recursively apply rwx permissions to all subfolders


We can specify any folder. Do not change the permissions of Ubuntu's parent/system
folders.
3. We can use the nano or gedit editors
Install and Run Eclipse
1. Open Our Softwares folder in /Home Directory
2. Right Click on the eclipse-jee-luna-SR2-linux-gtk.tar.gz file
3. Click on “Open with Archive manager”
4. Right Click on Eclipse folder
5. Click Extract
6. Click /Home directory in left side Explorer
7. Click Extract button available at Right Side Down Corner.
8. Close Archive Manager Dialog Box. Close Extracted Window
9. Open Terminal (Ctrl + Alt +T)
10. $ sudo mv eclipse /opt/
11. $ sudo chmod 777 /usr/share/applications/
12. $ gedit /usr/share/applications/eclipse.desktop

Copy and Paste the following code:


[Desktop Entry]
Name=EclipseLuna
Name[en]=EclipseLuna
Comment=Integrated Development Environment
Type=Application
Exec=/opt/eclipse/eclipse
Icon=/opt/eclipse/icon.xpm
Terminal=false
NoDisplay=false
Categories=Development;IDE;

Click Save Icon and close gedit window

Now, Eclipse Icon is Created in Ubuntu.

Opening Eclipse

13. To open Eclipse, click the first (Search) icon/button (the Ubuntu button) of the
Ubuntu icons available on the left side.
14. Type Eclipse word in search text box
It shows Eclipse Icon in Applications Group
15. Click Eclipse Icon
Now the system opens Eclipse IDE.

Develop and Run a Hadoop MapReduce program using Java in Eclipse
1. Open Terminal and Login into hadoop user.
2. Run all Daemons of Hadoop (start-all.sh and jps commands)
3. Create folder to store our inputfiles in Hadoop HDFS
$ hdfs dfs -mkdir /inputs
Note: inputs folder is created in HDFS File System
4. Copy input files into HDFS. The input files may be existing large files, or you can
create a new input file for testing purposes.

4.1: $ sudo nano Words.txt


Type the following words as follows
I am I am I was I can I can able
Press Ctrl+o Press Enter key Press Ctrl +x
To Copy Local input file into HDFS System
4.2: $ hdfs dfs -put Words.txt /inputs/Words.txt
Now, the file "Words.txt" is copied from the local directory of Ubuntu into
the /inputs folder of HDFS.
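If you want to confirm the copy (optional), you can list the /inputs folder of HDFS, for example:
4.3: $ hdfs dfs -ls /inputs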

5. Minimize the Terminal (Do not Close terminal)

-------------------------------------------------------------------------------------

6. Open Eclipse
Note: Create a workspace for our project in any folder/directory. Remember the
workspace name and its directory. Ex: WorkSpaceWordCount. Click Next.
7. Once Eclipse is opened, maximize the Eclipse window to display the menu bar.
Otherwise, the menu bar will not be displayed. OR click the New icon.
8. To open New Project:
8.1. File -> New -> Project -> Java project
8.2. Type projectName Ex: prjWordCount
8.3. Click Next -> Finish
9. Add New Class to the project
9.1. Right Click on Projectname in left side project explorer
9.2. New --> Class ->
9.3. Remove the package name, if one exists. Then the class will be stored in the default
package.
9.4. Type the class name, which must equal the class name containing the main() method in
your source code.
9.5. Click the Finish button
Note: Click the OK button for the Perspective message box
Note: If a package name is specified when creating the class, then
package packagename; should be the first statement of the source code of your
class, and use your class as packagename.classname wherever required.

10. Copy the WordCount.java source code from the provided java file (a reference sketch is given below).
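In case the java file is not at hand, the following is a minimal sketch of the classic WordCount program (essentially the standard example from the Apache Hadoop MapReduce tutorial, using the org.apache.hadoop.mapreduce API). The class name WordCount is the same driver class name used when running the jar in step 14.2:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word of every input line
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums all the 1s emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: args[0] = input path, args[1] = output path (given on the hadoop jar command line)
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}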

11. Paste the code into the Eclipse code area and save the project (Ctrl+S)

12. To Add Hadoop and MapReduce jar files


12.1. Right click on Projectname to get Popup Menu
12.2. Click Build Path --> Add External Archives
12.3. Click FileSystem(Available at leftside) --> usr --> local -->hadoop
-->share --> hadoop -->common -->
12.4 Select all jar files by pressing Shift key and Click OK

12.5. Click FileSystem(Available at leftside) --> usr --> local -->hadoop


-->share --> hadoop -->mapreduce -->
12.6. Select all jar files by pressing Shift key and Click OK

13. To Create jar file for our project


13.1. Right click on Projectname to get the Popup Menu
13.2. Click Export -->
13.3. Expand Java --> click JAR file --> Next
Note: Click the OK button for the Warning message box
13.4. Type the Jar file name (ex: WordCount.jar) in the JAR Export dialog box
13.5. Click Next --> Finish
Note: By default, the jar file is created in the workspace folder

---------------------------------------------------------------------------------------
13. Maximize Terminal
14. To run MapReduce Program :
syntax : $ cd workspacename
14.1 $ cd WorkSpaceWordCount

Note: workspacename indicates the workspace name of our project, which was
created/specified in the first dialog box in Eclipse. The workspacename folder
contains our jar file.
Syntax: $ hadoop jar jarfile DriverClassName /inputfolder/inputfilename
/outputfolder
14.2. hduser/admin/WorkSpaceWordCount $ hadoop jar WordCount.jar
WordCount /inputs/Words.txt /wordcountoutput

where,
jar file: WordCount.jar
class name: WordCount
inputfolder: /inputs (which is already created)
input filename: Words.txt (which is already created)
outputfolder: /wordcountoutput (it will be created automatically by HDFS to store
the output files produced by Hadoop; a separate output folder must be used for
each program run)
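Note: Hadoop will refuse to run the job if the output folder already exists, so when the program is run again, give a new output folder name. For example (the folder name /wordcountoutput2 is just an illustration):
hduser/admin/WorkSpaceWordCount $ hadoop jar WordCount.jar WordCount /inputs/Words.txt /wordcountoutput2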

To see the output


15. $ hdfs dfs -cat /wordcountoutput/part-r-00000

It shows the final output.
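For the sample Words.txt created in step 4.1 ("I am I am I was I can I can able"), the output should look roughly like this (one word and its count per line, separated by a tab; keys are sorted, with capitalized words first):
I       5
able    1
am      2
can     2
was     1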

OR

15. Open Browser


type http://localhost:50070
click the Utilities link
Click "Browse the file system"; HDFS then shows all folders of HDFS.
Click the required folder, open the part-r-00000 file, and download it.

Thank you.
