Exp 5 Bdafinal
Workflow of MapReduce
● During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the
cluster. Generally, the MapReduce paradigm is based on sending the computation to where the data
resides.
● A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.
● Map stage − The map or mapper’s job is to process the input data. Generally the input data is in the
form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to
the mapper function line by line. The mapper processes the data and creates several small chunks of
data.
● Reduce stage − This stage is the combination of the shuffle stage and the reduce stage. The
Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new
set of output, which is stored in HDFS.
● The framework manages all the details of data passing, such as issuing tasks, verifying task
completion, and copying data around the cluster between the nodes.
● Most of the computing takes place on nodes with data on local disks, which reduces network traffic.
● After completion of the given tasks, the cluster collects and reduces the data to form an appropriate
result, and sends it back to the Hadoop server.
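The three stages above can be sketched in plain Java for the matrix-multiplication example this experiment uses. This is a standalone simulation without Hadoop dependencies, not the actual classroom code; the input format `matrixName,row,col,value` and the matrix names A and B are assumptions:

```java
import java.util.*;

// A plain-Java simulation of the MapReduce matrix-multiplication workflow
// (map -> shuffle -> reduce). Input records use the assumed format
// "matrixName,row,col,value". A is m x n, B is n x p.
public class MapReduceMatrixDemo {

    // Map stage + shuffle: each input line is expanded into (key, value) pairs,
    // where the key "i,k" identifies one cell of the result matrix C = A x B,
    // and the shuffle groups all values sharing a key.
    static Map<String, List<double[]>> mapAndShuffle(List<String> lines, int m, int n, int p) {
        Map<String, List<double[]>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] t = line.split(",");
            int r = Integer.parseInt(t[1]), c = Integer.parseInt(t[2]);
            double v = Double.parseDouble(t[3]);
            if (t[0].equals("A")) {
                // A[r][c] contributes to every result cell C[r][k]
                for (int k = 0; k < p; k++)
                    grouped.computeIfAbsent(r + "," + k, x -> new ArrayList<>())
                           .add(new double[]{0, c, v}); // 0 marks matrix A, c is the join index j
            } else {
                // B[r][c] contributes to every result cell C[i][c]
                for (int i = 0; i < m; i++)
                    grouped.computeIfAbsent(i + "," + c, x -> new ArrayList<>())
                           .add(new double[]{1, r, v}); // 1 marks matrix B, r is the join index j
            }
        }
        return grouped;
    }

    // Reduce stage: for one result cell, join the A and B values on j
    // and sum the products: C[i][k] = sum_j A[i][j] * B[j][k].
    static double reduce(List<double[]> values, int n) {
        double[] a = new double[n], b = new double[n];
        for (double[] v : values) {
            if (v[0] == 0) a[(int) v[1]] = v[2]; else b[(int) v[1]] = v[2];
        }
        double sum = 0;
        for (int j = 0; j < n; j++) sum += a[j] * b[j];
        return sum;
    }

    public static void main(String[] args) {
        // A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]]
        List<String> input = Arrays.asList(
            "A,0,0,1", "A,0,1,2", "A,1,0,3", "A,1,1,4",
            "B,0,0,5", "B,0,1,6", "B,1,0,7", "B,1,1,8");
        Map<String, List<double[]>> grouped = mapAndShuffle(input, 2, 2, 2);
        for (Map.Entry<String, List<double[]>> e : grouped.entrySet())
            System.out.println(e.getKey() + " -> " + reduce(e.getValue(), 2));
    }
}
```

In a real Hadoop job, the same map and reduce logic would live in Mapper and Reducer classes, and the framework would perform the shuffle (grouping by key) between the two stages.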
Steps to follow:
∙ Create a project folder (e.g., C:\hadoop_project\). Inside this folder, right-click -> New -> Text Document.
∙ Copy the Java code (Mapper, Reducer, Driver) and paste it into Notepad. (The Java code is provided in the classroom.)
∙ Click File → Save As.
∙ Choose All Files as the file type.
∙ Save the file as MatrixMultiplicationMapper.java inside the project folder (e.g., C:\hadoop_project\).
∙ Repeat the same process for:
∙ MatrixMultiplicationReducer.java
∙ MatrixMultiplicationDriver.java
Step 2: Compile the Java Files
∙ Open Command Prompt (cmd) and navigate to the project folder: C:\hadoop_project
∙ Compile the Java files with the Hadoop dependencies on the classpath:
javac -classpath "C:\hadoop\share\hadoop\common\*;C:\hadoop\share\hadoop\mapreduce\*;C:\hadoop\share\hadoop\hdfs\*" -d . MatrixMultiplicationMapper.java MatrixMultiplicationReducer.java MatrixMultiplicationDriver.java
This will generate .class files inside the current directory.
Open Notepad and copy the below data. Save the file as matrix_input.txt in C:\hadoop_project\.
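The data itself is provided in the classroom, so the lines below are only an assumed illustration of a common input format for MapReduce matrix multiplication, where each line carries the matrix name, row index, column index, and value (the exact format must match the Mapper code you were given):

```
A,0,0,1
A,0,1,2
A,1,0,3
A,1,1,4
B,0,0,5
B,0,1,6
B,1,0,7
B,1,1,8
```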
Upload to HDFS: This you will have to do in another Command Prompt. First launch all daemons using
start-all.cmd and follow the steps below.
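The upload and job-submission steps can be sketched as follows using the standard Hadoop shell commands. The HDFS paths and the jar name matrixmul.jar are assumptions; adjust them to match your setup:

```shell
:: Start the Hadoop daemons (HDFS and YARN)
start-all.cmd

:: Create an input directory in HDFS and upload the input file
:: (the /input path is an assumption; any HDFS path works)
hdfs dfs -mkdir -p /input
hdfs dfs -put C:\hadoop_project\matrix_input.txt /input

:: Package the compiled classes into a jar and run the job
:: (matrixmul.jar is a hypothetical name; /output must not already exist)
jar cf matrixmul.jar *.class
hadoop jar matrixmul.jar MatrixMultiplicationDriver /input/matrix_input.txt /output
```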
Viewing the output:
Getting the output file into a specified local folder:
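Both steps can be done with the standard HDFS shell commands (the /output path is an assumption and must match the output path given when the job was submitted):

```shell
:: Print the job output stored in HDFS
hdfs dfs -cat /output/part-r-00000

:: Copy the output folder from HDFS to a local folder
hdfs dfs -get /output C:\hadoop_project\output
```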
Observations and learning: MapReduce is a processing technique and a programming model for distributed
computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map
takes a set of data and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs).
Questions
Consider matrices A and B of dimension 2 × 2 and perform matrix multiplication using MapReduce. Write all
the steps as discussed in class.