Exp 5 Bdafinal

This document outlines an experiment aimed at implementing a matrix multiplication algorithm using Map-Reduce in Hadoop. It details the prerequisites, theoretical background of Map-Reduce, and step-by-step instructions for setting up the environment, writing the necessary Java code, compiling it, and executing the program. The conclusion emphasizes the successful implementation of the algorithm and includes a question for further practice.

Uploaded by

jpurva23ecs

EXPERIMENT NO : 5

To implement a simple algorithm in Map-Reduce: Matrix Multiplication

Name : Purva Jage


Div : A
Class : TYECS
Roll no : 626
Date of performance :
Date of submission :
Grade:
Sign:
Aim: To implement a simple algorithm in Map-Reduce: Matrix Multiplication / Word Count.


Prerequisite : Ensure that Hadoop is installed, configured and is running.

Outcome: After completing this experiment, students will be able to:

1. Implement logic and execute complex programs with the help of external resources using MapReduce.

Theory: Map reduce


MapReduce is a style of computing that has been implemented in several systems, including Google's internal
implementation (simply called MapReduce) and the popular open-source implementation Hadoop, which can
be obtained, along with the HDFS file system, from the Apache Foundation. You can use an implementation
of MapReduce to manage many large-scale computations in a way that is tolerant of hardware faults. All you
need to write are two functions, called Map and Reduce, while the system manages the parallel execution and
the coordination of tasks that execute Map or Reduce, and also deals with the possibility that one of these
tasks will fail to execute. In brief, a MapReduce computation executes as follows:
1. Some number of Map tasks are each given one or more chunks from a distributed file system. These Map tasks
turn the chunk into a sequence of key-value pairs. The way key-value pairs are produced from the input data is
determined by the code written by the user for the Map function.
2. The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys
are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce
task.
3. The Reduce tasks work on one key at a time, and combine all the values associated with that key in some
way. The manner of combination of values is determined by the code written by the user for the Reduce
function.
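The three steps above can be sketched in plain Java without any Hadoop classes. This is only an in-memory illustration using word count as the example; the class and method names (MapReduceFlowSketch, map, groupByKey, reduce, run) are hypothetical and are not part of the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the map -> group-by-key -> reduce flow.
// No Hadoop classes are used; all names here are hypothetical.
class MapReduceFlowSketch {

    // Step 1 (Map): turn one chunk of input (here, a line of text) into key-value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return pairs;
    }

    // Step 2 (Group): collect all values that share a key, keeping keys in sorted order.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Step 3 (Reduce): combine all values for one key (here, by summing them).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run all three steps over a list of input chunks.
    static Map<String, Integer> run(List<String> chunks) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String chunk : chunks) {
            all.addAll(map(chunk));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groupByKey(all).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, map=2, reduce=2}
        System.out.println(run(Arrays.asList("map reduce map", "reduce hadoop")));
    }
}
```

In a real cluster the grouping step is performed by the framework across machines; here it is a single sorted map so that the data flow is easy to follow.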
Matrix Multiplication
Suppose we have an n x n matrix M, whose element in row i and column j is denoted by $m_{ij}$. Suppose
we also have a vector v of length n, whose jth element is $v_j$. Then the matrix-vector product is the vector x of
length n, whose ith element $x_i$ is given by
$x_i = \sum_{j=1}^{n} m_{ij} v_j$
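Let A and B be the two matrices to be multiplied and let the result be matrix C. Matrix A has dimensions
L x M and matrix B has dimensions M x N, so C has dimensions L x N.
In the Map phase, each element of A or B is emitted as a key-value pair whose key identifies the cell of C
that the element contributes to. In the commonly used one-pass formulation, $a_{ij}$ is emitted under the key
(i, k) for every k = 1, ..., N, and $b_{jk}$ is emitted under the key (i, k) for every i = 1, ..., L.
In the Reduce phase, the values for key (i, k) are matched on j and the products are summed:
$c_{ik} = \sum_{j=1}^{M} a_{ij} b_{jk}$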
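A minimal in-memory sketch of a common one-pass formulation of matrix multiplication in MapReduce style is given here. It is not the Mapper, Reducer and Driver code used in this experiment: the class name MatrixMultiplySketch, the Tagged helper and the string key "i,k" are hypothetical, and no Hadoop classes are used. Each element of A is emitted once for every column of C it contributes to, each element of B once for every row, and the reducer for key (i, k) sums the products matched on j.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of one-pass matrix multiplication C = A x B in MapReduce style.
// A is L x M, B is M x N. All names are hypothetical; no Hadoop classes are used.
class MatrixMultiplySketch {

    // One emitted value: source matrix ('A' or 'B'), shared index j, and the element.
    static final class Tagged {
        final char matrix;
        final int j;
        final double value;

        Tagged(char matrix, int j, double value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    // Map phase: the key "i,k" names the cell of C that the value contributes to.
    // A real mapper sees one element at a time and would read L and N from the job configuration.
    static Map<String, List<Tagged>> mapPhase(double[][] a, double[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        Map<String, List<Tagged>> grouped = new HashMap<>();
        for (int i = 0; i < l; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++)   // a[i][j] is needed by every column k in row i of C
                    grouped.computeIfAbsent(i + "," + k, x -> new ArrayList<>())
                           .add(new Tagged('A', j, a[i][j]));
        for (int j = 0; j < m; j++)
            for (int k = 0; k < n; k++)
                for (int i = 0; i < l; i++)   // b[j][k] is needed by every row i in column k of C
                    grouped.computeIfAbsent(i + "," + k, x -> new ArrayList<>())
                           .add(new Tagged('B', j, b[j][k]));
        return grouped;
    }

    // Reduce phase for one key (i,k): match A and B values on j and sum the products.
    static double reducePhase(List<Tagged> values, int m) {
        double[] fromA = new double[m];
        double[] fromB = new double[m];
        for (Tagged t : values) {
            if (t.matrix == 'A') fromA[t.j] = t.value;
            else fromB[t.j] = t.value;
        }
        double sum = 0;
        for (int j = 0; j < m; j++) sum += fromA[j] * fromB[j];
        return sum;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        double[][] c = new double[l][n];
        for (Map.Entry<String, List<Tagged>> e : mapPhase(a, b).entrySet()) {
            String[] ik = e.getKey().split(",");
            c[Integer.parseInt(ik[0])][Integer.parseInt(ik[1])] = reducePhase(e.getValue(), m);
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] c = multiply(new double[][]{{1, 2}, {3, 4}}, new double[][]{{5, 6}, {7, 8}});
        System.out.println(Arrays.deepToString(c)); // prints [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

This formulation sends every element over the network several times (N copies of each element of A, L copies of each element of B), which keeps the job to a single pass at the cost of extra shuffle traffic.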

Workflow of a Map-Reduce program to count words, using a Mapper and a Reducer:


● During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the
cluster. Generally the MapReduce paradigm is based on sending the computation to where the data
resides.

● A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.
● Map stage − The mapper's job is to process the input data. Generally the input data is in the
form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to
the mapper function line by line. The mapper processes the data and creates several small chunks of
data.
● Reduce stage − This stage is the combination of the shuffle stage and the reduce stage. The
reducer's job is to process the data that comes from the mapper. After processing, it produces a new
set of output, which is stored in HDFS.
● The framework manages all the details of data-passing, such as issuing tasks, verifying task
completion, and copying data around the cluster between the nodes.
● Most of the computing takes place on nodes with the data on local disks, which reduces network traffic.
● After completion of the given tasks, the cluster collects and reduces the data to form an appropriate
result, and sends it back to the Hadoop server.
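The shuffle stage decides which reduce task receives each key. A small sketch of that decision follows, using the same formula as Hadoop's default HashPartitioner but applied to plain Java strings. The class name ShufflePartitionSketch, its methods and the sample keys are hypothetical, and a Hadoop Text key would hash differently from a String, so the reducer numbers here are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrates how keys are divided among reduce tasks during the shuffle.
// Hypothetical names; plain Strings stand in for Hadoop key types.
class ShufflePartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: a non-negative hash
    // taken modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assign every key to a reducer index; repeated keys always land on the same reducer.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partitionFor(key, numReduceTasks), r -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Keys in the style "row,column" of an output cell, with one repeated key.
        System.out.println(assign(Arrays.asList("0,0", "0,1", "1,0", "1,1", "0,0"), 2));
    }
}
```

Because the partition depends only on the key, all values for one key reach the same reducer, which is what allows the reducer to combine them.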
Steps to follow:

Step 1: Create a folder in C:\ as ‘hadoop_project’ => C:\hadoop_project

∙ Inside this folder, right-click -> New -> Text Document.
∙ Copy the Java code (Mapper, Reducer, Driver) and paste it into Notepad. (The Java code is added in the
classroom.)
∙ Click File → Save As.
∙ Choose All Files as the file type.
∙ Save the file as MatrixMultiplicationMapper.java inside the project folder (C:\hadoop_project\).
∙ Repeat the same process for:

∙ MatrixMultiplicationReducer.java
∙ MatrixMultiplicationDriver.java
Step 2: Compile the Java Files

∙ Open Command Prompt (cmd) and navigate to the project folder: C:\hadoop_project
∙ Compile the Java files with the Hadoop dependencies on the classpath (enter the command on a single line):
javac -classpath "C:\hadoop\share\hadoop\common\*;C:\hadoop\share\hadoop\mapreduce\*;C:\hadoop\share\hadoop\hdfs\*" -d . MatrixMultiplicationMapper.java MatrixMultiplicationReducer.java MatrixMultiplicationDriver.java
This will generate .class files inside the current directory.

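Step 3: Create a JAR File:

Package the compiled classes with the JDK jar tool, for example (assuming the .class files are in C:\hadoop_project):
jar -cvf matrix-multiplication.jar *.class
This creates matrix-multiplication.jar inside C:\hadoop_project\.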

Step 4: Prepare Input Data

Open Notepad and copy the input matrix data into it. Save the file as matrix_input.txt in C:\hadoop_project\.
Upload to HDFS: Do this in a separate Command Prompt window. First launch all daemons using
start-all.cmd, then run the commands below.

C:\Users\Administrator>hdfs dfs -mkdir -p /matrix_input

C:\Users\Administrator>hdfs dfs -put C:\hadoop_project\matrix_input.txt /matrix_input
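The exact layout of matrix_input.txt depends on what the Mapper expects to parse. As an illustration only, the sketch below prints two 2 x 2 matrices in an assumed layout of one element per line, written as matrixName,row,column,value with 0-based indices. The layout, the class name MatrixInputSketch and the sample values are all assumptions; adjust them to match the Mapper code actually used.

```java
import java.util.ArrayList;
import java.util.List;

// Produces text lines in an ASSUMED layout: matrixName,row,column,value (0-based indices).
// This layout is hypothetical; it must match whatever the Mapper parses.
class MatrixInputSketch {

    // Convert one matrix into a list of "name,row,column,value" lines.
    static List<String> toLines(String name, double[][] matrix) {
        List<String> lines = new ArrayList<>();
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                lines.add(name + "," + r + "," + c + "," + matrix[r][c]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.addAll(toLines("A", new double[][]{{1, 2}, {3, 4}}));
        lines.addAll(toLines("B", new double[][]{{5, 6}, {7, 8}}));
        // The first line printed is A,0,0,1.0
        lines.forEach(System.out::println);
    }
}
```

The printed lines can be redirected into a text file and uploaded with the hdfs dfs -put command shown above.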

Step 5: Run the JAR File


C:\Users\Administrator>hadoop jar C:\hadoop_project\matrix-multiplication.jar MatrixMultiplicationDriver /matrix_input /matrix_output

Viewing output: The job writes its output files to the HDFS folder given as the last argument of the run
command (/matrix_output).
Observations and learning: MapReduce is a processing technique and a programming model for distributed
computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map
takes a set of data and converts it into another set of data, where individual elements are broken down into tuples
(key/value pairs). Reduce takes the output of Map as its input and combines those tuples into a smaller set of tuples.

Conclusion: Thus we successfully implemented a simple algorithm in Map-Reduce: Matrix Multiplication.

Questions
Consider two matrices A and B, each of dimension 2 x 2, and perform matrix multiplication using MapReduce.
Write all the steps as discussed in class.
