MapReduce - Algorithm
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
The Mapper class takes the input, tokenizes it, and then maps and sorts it. The output of the Mapper class is used
as input by the Reducer class, which in turn searches for matching pairs and reduces them.
MapReduce implements various mathematical algorithms to divide a task into small parts and
assign them to multiple systems. In technical terms, the MapReduce algorithm helps in sending the
Map and Reduce tasks to the appropriate servers in a cluster. This chapter discusses the following algorithms −
Sorting
Searching
Indexing
TF-IDF
Sorting
Sorting is one of the basic MapReduce algorithms to process and analyze data. MapReduce
implements a sorting algorithm to automatically sort the output key-value pairs from the mapper by
their keys.
In the Shuffle and Sort phase, after tokenizing the values in the mapper class, the Context
class (provided by the Hadoop framework) collects the matching keys and their values as a collection.
To collect similar key-value pairs (intermediate keys), the Mapper class takes the help of the
RawComparator class to sort the key-value pairs.
https://www.tutorialspoint.com/map_reduce/map_reduce_algorithm.htm# 1/4
6/19/22, 5:36 PM MapReduce - Algorithm
The set of intermediate key-value pairs for a given Reducer is automatically sorted by
Hadoop to form key-values (K2, {V2, V2, …}) before they are presented to the Reducer.
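The grouping described above can be simulated outside Hadoop. The following Python sketch is only an illustration of the shuffle-and-sort step, not the Hadoop API; the function and sample data are my own. It sorts intermediate pairs by key and collects each key's values into a list, forming (K2, {V2, V2, …}):

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_output):
    """Sort intermediate (key, value) pairs by key and group the
    values per key, as the framework does before the Reducer runs."""
    pairs = sorted(mapper_output, key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

# Intermediate pairs emitted by several mappers, in arbitrary order
mapped = [("is", 1), ("it", 1), ("is", 1), ("a", 1), ("it", 1)]
print(shuffle_and_sort(mapped))
# [('a', [1]), ('is', [1, 1]), ('it', [1, 1])]
```

Sorting first is what makes the single pass of grouping possible; Hadoop applies the same idea at scale, using a RawComparator to order serialized keys.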
Searching
Searching plays an important role in the MapReduce algorithm. It is used in the combiner phase
(optional) and in the Reducer phase. Let us try to understand how searching works with the help
of an example.
Example
The following example shows how MapReduce employs the searching algorithm to find the
details of the employee who draws the highest salary in a given employee dataset.
Let us assume we have employee data in four different files − A, B, C, and D. Let us also
assume there are duplicate employee records in all four files because of importing the
employee data from all database tables repeatedly. See the following illustration.
The Map phase processes each input file and provides the employee data in key-value
pairs (<k, v> : <emp name, salary>). See the following illustration.
The combiner phase (searching technique) will accept the input from the Map phase as
key-value pairs of employee name and salary. Using the searching technique, the combiner
will check all the employee salaries to find the highest-salaried employee in each file. See the
following snippet.
<k: employee name, v: salary>
Max = the salary of the first employee, treated as the max salary

if (v(other employee).salary > Max) {
    Max = v(salary);
}
else {
    Continue checking;
}

The expected result from the combiner phase is one <employee name, salary> pair per file −

<satish, 26000> <gopal, 50000> <kiran, 45000> <manisha, 45000>
Reducer phase − From each file, you will find the highest-salaried employee. To avoid
redundancy, check all the <k, v> pairs and eliminate duplicate entries, if any. The same
algorithm is applied across the four <k, v> pairs coming from the four input files.
The final output should be as follows −
<gopal, 50000>
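The map, combine, and reduce steps above can be sketched in a few lines of plain Python. This is a simulation of the flow, not Hadoop code; the file contents below just reuse the four employees shown in the snippet, and how records are split across files is an assumption for illustration:

```python
def map_phase(file_records):
    # Emit a (name, salary) pair for each employee record in one file
    return [(name, salary) for name, salary in file_records]

def combiner(pairs):
    # Keep only the highest-salaried employee seen in this file
    return max(pairs, key=lambda kv: kv[1])

def reducer(file_maxima):
    # Drop duplicate pairs, then pick the global maximum across files
    return max(set(file_maxima), key=lambda kv: kv[1])

files = [
    [("satish", 26000), ("gopal", 50000)],    # file A (assumed split)
    [("kiran", 45000), ("satish", 26000)],    # file B
    [("gopal", 50000), ("manisha", 45000)],   # file C
    [("manisha", 45000), ("kiran", 45000)],   # file D
]
per_file_max = [combiner(map_phase(f)) for f in files]
print(reducer(per_file_max))
# ('gopal', 50000)
```

The combiner runs once per file (per mapper), so only one pair per file travels to the reducer, which is exactly why the combiner is useful: it shrinks the data before the shuffle.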
Indexing
Normally, indexing is used to point to particular data and its address. MapReduce performs batch indexing
on the input files for a particular Mapper.
The indexing technique that is normally used in MapReduce is known as an inverted index. Search
engines like Google and Bing use the inverted indexing technique. Let us try to understand how
indexing works with the help of a simple example.
Example
The following text is the input for inverted indexing. Here T[0], T[1], and T[2] are the file names
and their contents are in double quotes.

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

After indexing, we get the following output −
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}
Here "a": {2} implies the term "a" appears in the T[2] file. Similarly, "is": {0, 1, 2} implies the term
"is" appears in the files T[0], T[1], and T[2].
TF-IDF
TF-IDF is a text processing algorithm, short for Term Frequency − Inverse Document
Frequency. It is one of the common web analysis algorithms. Here, the term 'frequency' refers to
the number of times a term appears in a document.
TF('the') = (Number of times the term 'the' appears in a document) / (Total number of
terms in the document)
While computing TF, all the terms are considered equally important. That means, TF counts the
term frequency even for common words like "is", "a", "what", etc. Thus we need to weigh down the
frequent terms while scaling up the rare ones, by computing the following −

IDF('the') = log(Total number of documents / Number of documents with the term 'the' in it)

where the logarithm is taken to base 10, as in the example below.
Example
Consider a document containing 1000 words, wherein the word hive appears 50 times. The TF
for hive is then (50 / 1000) = 0.05.
Now, assume we have 10 million documents and the word hive appears in 1000 of these. Then,
the IDF is calculated as log(10,000,000 / 1,000) = 4. The TF-IDF weight is the product of these
quantities − 0.05 × 4 = 0.20.
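The worked example can be checked directly. The following Python sketch implements the two formulas above (a base-10 logarithm is assumed, since that is what makes the IDF in the example come out to 4):

```python
import math

def tf(term_count, total_terms):
    # Term frequency: occurrences of the term / total terms in the document
    return term_count / total_terms

def idf(total_docs, docs_with_term):
    # Inverse document frequency, base-10 log to match the worked example
    return math.log10(total_docs / docs_with_term)

tf_hive = tf(50, 1000)              # 0.05
idf_hive = idf(10_000_000, 1_000)   # 4.0
print(tf_hive * idf_hive)           # TF-IDF weight: 0.2
```

A word that appears in almost every document gets an IDF near log(1) = 0, so its TF-IDF weight collapses toward zero no matter how often it occurs; this is how the scheme suppresses common words like "is" and "a".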