BDM Lab Manual 2

This document provides instructions for loading and executing sample WordCount MapReduce code in both Hadoop and IntelliJ IDE. It describes how to create a JAR file of Java code in IntelliJ, upload it to a Hadoop VM and execute it. It also explains how to view job status and logs through the Hadoop GUI and command line.


MANUAL for BIG DATA MODELLING LABORATORY – 2

Ex.No. 6
Load and Execute Wordcount MapReduce Java code in Hadoop
Load and Execute Wordcount MapReduce Python code in Hadoop

Sample Steps to Load and Execute Wordcount MapReduce code in Hadoop


To execute the example MapReduce code from the repository,
$ hadoop jar /home/hadoop/install/hadoop/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /dir2/test2.txt /output

where wordcount is the MapReduce program, which runs on the test2.txt file and generates the
output file inside the /output folder.

To view the output,


$ hadoop fs -cat /output/part-r-00000 

Note: On a subsequent run of the program, Hadoop cannot create a new output file inside the
same existing /output folder. So either delete the existing output folder before running the
code again, or choose some other folder as the output folder.
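What the wordcount job computes can be sketched in plain Java. The following is a simplified, single-process illustration of the map and reduce phases (it does not use the Hadoop API; the class and method names are illustrative only):

```java
import java.util.*;

// Simplified, single-process sketch of the wordcount job.
// "map" emits (word, 1) pairs; "reduce" sums the counts per word.
public class WordCountSketch {
    // Map phase: emit (word, 1) for every token in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String token : line.trim().split("\\s+"))
                if (!token.isEmpty())
                    pairs.add(Map.entry(token, 1));
        return pairs;
    }

    // Reduce phase: sum the counts of each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big cluster", "data lab");
        Map<String, Integer> counts = reduce(map(input));
        // Same "word<TAB>count" layout that the real job writes to part-r-00000.
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

In the real job, Hadoop runs the map phase in parallel over HDFS blocks and shuffles the (word, 1) pairs to the reducers; this sketch only mirrors the logic in one process.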

Ex.No. 7
Load and execute existing WordCount MapReduce code in IntelliJ

i) Download the following files


WordCount.java, WordCountMapper.java, and WordCountReducer.java files
Hadoop jar files

ii) Create a new project in IntelliJ Idea software


File Menu > New > Project > Java > Click on Next Button > Type Project name >
Finish > Ok

iii) Load the word count files into the current project
Copy all the three word count Java code files from its folder
Paste it inside the src folder of Project Explorer window in IntelliJ

The file WordCount.java contains the main() method, which can be executed by
choosing the menu option Run or the "Run WordCount.main()" option from the
right-click menu. At this point, however, it fails, because the supporting Hadoop
libraries are not available in IntelliJ by default.

iv) Add the Hadoop libraries into the current project


File Menu > Project Structure > Libraries > + > Java Libraries > Choose the
folder which contains Hadoop Jar files > Select all Jar files from the list > Ok >
Add it into the current project > Ok

v) Now select the menu option Run or the "Run WordCount.main()" option from the
right-click menu to execute the code.
Even on successful execution it shows no output, since the program expects the
input and output paths as command-line arguments.
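The argument handling that causes this can be sketched as follows (a hypothetical helper, not the actual lab code; a real WordCount driver would go on to configure and submit a Hadoop Job with these paths):

```java
// Sketch of the argument validation a WordCount driver typically performs.
// Without the two path arguments, the program stops before producing output.
public class ArgsCheck {
    static String validate(String[] args) {
        if (args.length != 2)
            return "Usage: WordCount <input path> <output path>";
        return "input=" + args[0] + " output=" + args[1];
    }

    public static void main(String[] args) {
        System.out.println(validate(args));
    }
}
```

This is why the IntelliJ run shows nothing useful: a default run configuration passes no program arguments, so the driver exits at this check.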

Ex.No.8
Create the Jar file of your sample Java code in IntelliJ, upload it into the VM and
execute that code in Hadoop

i) To create the Jar file, first tell IntelliJ where the Jar file should be created.

In IntelliJ, select the following options in the menu


Menu - File > Project Structure > Artifacts > + > Jar > From modules with
dependencies > select the Java file which contains main() method > Ok

It creates a META-INF folder inside the src folder (see the Project Explorer window)

Project Explorer > src folder > META-INF folder > manifest.mf file

ii) To create the Jar file,

Menu - Build > Build Artifact > Project Name


List of Actions > Select Build to create the Jar file

It creates the folder "out" at the same level as the "src" folder.

out > artifacts > Project-name_jar > Project-name.jar
> Production (shown in orange color)

iii) If any further modifications are made to the code, rebuild the artifact so that the
changes are reflected in the .jar file as well.

iv) To load the Jar file into the VM,


Open MobaXTerm > Start the session with VM
Sftp > Upload > Browse and get project-name.jar file

Alternatively, create a new folder in the VM's file system and then upload the .jar file into it.

v) To start Hadoop daemons,


$ start-all.sh 

Verify it by using,
$ jps 
... Jps
... ResourceManager
... NameNode
... SecondaryNameNode

vi) To run the .jar file in VM


$ hadoop jar project-name.jar 

vii) To run the WordCount.jar file in VM, upload the text file into HDFS
$ vi test.txt 
$ hadoop fs -put test.txt /test.txt 

Then run the WordCount.jar file,


$ hadoop jar WordCount.jar /test.txt /output 
where /test.txt is the input HDFS file and /output is the HDFS folder in which the output
file is created. The name of the output file is part-r-00000 on the very first run.

To view the output,


$ hadoop fs -cat /output/part-r-00000 
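Each line of part-r-00000 holds a word and its count separated by a tab. A small plain-Java sketch of parsing that format (a hypothetical helper for illustration, not part of the lab code):

```java
import java.util.*;

// Parse "word<TAB>count" lines, the format the wordcount reducer writes
// into part-r-00000.
public class OutputParser {
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");   // [0] = word, [1] = count
            counts.put(parts[0], Integer.parseInt(parts[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("hadoop\t3", "hello\t1");
        System.out.println(parse(sample)); // prints {hadoop=3, hello=1}
    }
}
```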

viii) If the text file (the input file of the word count problem) exists in the Host OS file
system (i.e., Windows) of some other system in the same network, you can bring it into
the Guest OS file system in either of two ways in MobaXterm.

a) Upload the file


b) scp -v test.txt username@ipaddress-of-Hadoop-VM:. 

Alternatively, you can e-mail the file to yourself and download it into your VM.

ix) Check the daemons which are created during the execution of the MapReduce code,
During the execution of MapReduce WordCount program in the current terminal
window, open a new terminal window, and then give the command jps in that new
terminal. It will list out the following daemons.
... Jps
... ResourceManager
... NameNode
... SecondaryNameNode
... NodeManager
... MRAppMaster
... YarnChild
... YarnChild

The MRAppMaster daemon monitors and manages the currently running job in the
cluster. The YarnChild daemons host the Mapper and Reducer tasks.

x) View the status in GUI


Open a web browser (eg. FireFox) in VM
Give the web address as localhost:50070
Select Utilities > Browse the file system > Enter / > It will list all the files in the master machine

xi) View the number of blocks required to handle the input file of the Word Count Map
Reduce program.

Browser window > Double-click the test.txt file > Select Block >
Block 0
Slave 1
Slave 2
Details like these are shown if it is a multi-node cluster configuration.

xii) View the Job History in GUI


To start the job history,
$ mr-jobhistory-daemon.sh start historyserver 

In browser window:
localhost:19888/jobhistory 

It shows the log information of both Mapper code as well as the Reducer code.

xiii) View specific job details


localhost:19888
Click on specific job > Select Mapper > Select Log
