
Module - 4 - UNDERSTANDING MAP REDUCE FUNDAMENTALS

The document explains the fundamentals of MapReduce, a programming model developed by Google to efficiently process large data sets by dividing tasks into smaller parts assigned to multiple computers. It details the Map and Reduce tasks, how data is processed into key-value pairs, and the importance of grouping and combining data. Additionally, it addresses handling node failures during processing to ensure the completion of MapReduce jobs.

Module 4: UNDERSTANDING MAPREDUCE FUNDAMENTALS

MapReduce
1. Traditional enterprise systems normally have a centralized server to store and process data.
2. The following illustration depicts a schematic view of a traditional enterprise system. The traditional model is not suitable for processing huge volumes of data, which cannot be accommodated by standard database servers.
3. Moreover, the centralized system becomes a bottleneck when processing multiple files simultaneously.

Figure 4.1: MapReduce

4. Google solved this bottleneck issue using an algorithm called MapReduce. MapReduce divides a task
into small parts and assigns them to many computers.
5. Later, the results are collected at one place and integrated to form the result dataset.

Figure 4.2: Physical structure

6. A MapReduce computation executes as follows:


 Some number of Map tasks are each given one or more chunks from a distributed file system. These Map tasks turn each chunk into a sequence of key-value pairs. The way key-value pairs are produced from the input data is determined by the code written by the user for the Map function.
 The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.

Join Our Telegram Group to Get Notifications, Study Materials, Practice test & quiz: https://t.me/ccatpreparations
Visit: www.ccatpreparation.com
 The Reduce tasks work on one key at a time, and combine all the values associated with that key in some
way. The manner of combination of values is determined by the code written by the user for the Reduce
function.

Figure 4.3: Schematic MapReduce Computation

A. The Map Task


i. We view input files for a Map task as consisting of elements, which can be any type: a tuple or a
document, for example.
ii. A chunk is a collection of elements, and no element is stored across two chunks.
iii. Technically, all inputs to Map tasks and outputs from Reduce tasks are of the key-value-pair form, but
normally the keys of input elements are not relevant and we shall tend to ignore them.
iv. Insisting on this form for inputs and outputs is motivated by the desire to allow composition of
several MapReduce processes.
v. The Map function takes an input element as its argument and produces zero or more key-value pairs.
vi. The types of keys and values are each arbitrary.
vii. Further, keys are not “keys” in the usual sense; they do not have to be unique.
viii. Rather, a Map task can produce several key-value pairs with the same key, even from the same element.
Example 1: Consider a MapReduce computation with what has become the standard example application: counting the number of occurrences of each word in a collection of documents. In this example, the input file is a repository of documents, and each document is an element. The Map function for this example uses keys of type String (the words) and integer values. The Map task reads a document and breaks it into its sequence of words w1, w2, . . . , wn. It then emits a sequence of key-value pairs where the value is always 1. That is, the output of the Map task for this document is the sequence of key-value pairs:
(w1, 1), (w2, 1), . . . , (wn, 1)
A single Map task will typically process many documents – all the documents in one or more chunks. Thus, its output will be more than the sequence for the one document suggested above. If a word w appears m times among all the documents assigned to that task, then there will be m key-value pairs (w, 1) among its output. An option is to combine these m pairs into a single pair (w, m), but we can do that only because the Reduce tasks apply an associative and commutative operation, addition, to the values.
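The Map function of Example 1 can be sketched in Python (a hypothetical helper, not tied to any particular MapReduce framework; the tokenization rule is an assumption for illustration):

```python
import re

def map_word_count(document):
    """Map function for word count: emit a (word, 1) pair for every
    word occurrence in the input document (the element).
    Keys need not be unique; repeated words yield repeated pairs."""
    pairs = []
    for word in re.findall(r"[a-z']+", document.lower()):
        pairs.append((word, 1))
    return pairs

pairs = map_word_count("the quick fox and the lazy dog")
# The key "the" appears twice, once per occurrence.
```

Note that the function does not count anything itself; it only emits one pair per occurrence, leaving aggregation to the Reduce tasks.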

B. Grouping by Key
i. After the Map tasks have all completed successfully, the key-value pairs are grouped by key, and the values associated with each key are formed into a list of values.

ii. The grouping is performed by the system, regardless of what the Map and Reduce tasks do.

iii. The master controller process knows how many Reduce tasks there will be, say r such tasks.
iv. The user typically tells the MapReduce system what r should be.
v. Then the master controller picks a hash function that applies to keys and produces a bucket number
from 0 to r − 1.
vi. Each key that is output by a Map task is hashed and its key-value pair is put in one of r local files. Each file is destined for one of the Reduce tasks.
vii. To perform the grouping by key and distribution to the Reduce tasks, the master controller merges the
files from each Map task that are destined for a particular Reduce task and feeds the merged file to that
process as a sequence of key-list-of-value pairs.
viii. That is, for each key k, the input to the Reduce task that handles key k is a pair of the form (k,
[v1, v2, . . . , vn]), where (k, v1), (k, v2), . . . , (k, vn) are all the key-value pairs with key k coming from
all the Map tasks.
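The partitioning and grouping steps above can be sketched in Python (hypothetical helpers; Python's built-in hash() stands in for the master's chosen hash function, and is consistent within a single run):

```python
from collections import defaultdict

def partition(pairs, r):
    """Put each key-value pair into one of r buckets by hashing the
    key, so all pairs with the same key land in the same bucket
    (and hence go to the same Reduce task)."""
    buckets = [[] for _ in range(r)]
    for key, value in pairs:
        buckets[hash(key) % r].append((key, value))
    return buckets

def group_by_key(pairs):
    """Merge a Reduce task's input into (key, [v1, ..., vn]) form."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

grouped = group_by_key([("a", 1), ("b", 1), ("a", 1)])
# {"a": [1, 1], "b": [1]}
```

In a real system the buckets are local files on each Map worker's disk, and the merge into key-list-of-values form happens at the Reduce side.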

C. The Reduce Task


i. The Reduce function’s argument is a pair consisting of a key and its list of associated values.
ii. The output of the Reduce function is a sequence of zero or more key-value pairs.
iii. These key-value pairs can be of a type different from those sent from Map tasks to Reduce tasks, but
often they are the same type.
iv. We shall refer to the application of the Reduce function to a single key and its associated list of values as a reducer. A Reduce task receives one or more keys and their associated value lists.
v. That is, a Reduce task executes one or more reducers. The outputs from all the Reduce tasks are
merged into a single file.
vi. Reducers may be partitioned among a smaller number of Reduce tasks by hashing the keys and associating each Reduce task with one of the buckets of the hash function.
In Example 1, the Reduce function simply adds up all the values. The output of a reducer consists of the word and the sum. Thus, the output of all the Reduce tasks is a sequence of (w, m) pairs, where w is a word that appears at least once among all the input documents and m is the total number of occurrences of w among all those documents.
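The word-count reducer can be sketched in Python (a hypothetical function signature, chosen to mirror the (key, list-of-values) input described above):

```python
def reduce_word_count(key, values):
    """Reducer for word count: applied to a single key and its list
    of associated values; sums the 1's to get the total count.
    Returns a sequence of zero or more key-value pairs."""
    return [(key, sum(values))]

out = reduce_word_count("the", [1, 1, 1])
# → [("the", 3)]
```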
D. Combiners
i. Sometimes a Reduce function is associative and commutative. That is, the values to be combined can be combined in any order, with the same result.
ii. The addition performed in Example 1 is an example of an associative and commutative operation. It doesn’t matter how we group a list of numbers v1, v2, . . . , vn; the sum will be the same.
iii. When the Reduce function is associative and commutative, we can push some of what the reducers do to the Map tasks.
iv. In the word-count example, the key-value pairs with key w produced by a Map task would thus be replaced by one pair with key w and value equal to the sum of all the 1’s in those pairs.
v. That is, the pairs with key w generated by a single Map task would be replaced by a pair (w, m), where
m is the number of times that w appears among the documents handled by this Map task.
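The combining step can be sketched in Python (an illustrative helper running on one Map task's output; real frameworks invoke a user-supplied combiner, so the name here is an assumption):

```python
from collections import Counter

def combine(map_output):
    """Combiner: replace the m pairs (w, 1) produced by one Map task
    with a single pair (w, m). This is valid only because the
    Reduce operation (addition) is associative and commutative."""
    counts = Counter()
    for word, one in map_output:
        counts[word] += one
    return list(counts.items())

combined = combine([("the", 1), ("fox", 1), ("the", 1)])
# ("the", 2) and ("fox", 1), in some order
```

The benefit is less data to shuffle: the Map task ships one pair per distinct word instead of one pair per occurrence.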

E. Details of MapReduce task


The MapReduce algorithm contains two important tasks, namely Map and Reduce.

i. The Map task takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key-value pairs).
Figure 4.4: Overview of the execution of a MapReduce program

ii. The Reduce task takes the output from the Map as an input and combines those data tuples (key- value
pairs) into a smaller set of tuples.
iii. The reduce task is always performed after the map job.

Figure 4.5: Reduce job

 Input Phase − Here we have a Record Reader that translates each record in an input file and sends the parsed data to the mapper in the form of key-value pairs.
 Map − Map is a user-defined function, which takes a series of key-value pairs and processes each one of them to generate zero or more key-value pairs.
 Intermediate Keys − The key-value pairs generated by the mapper are known as intermediate keys.
 Combiner − A combiner is a type of local Reducer that groups similar data from the map phase into
identifiable sets. It takes the intermediate keys from the mapper as input and applies a user-defined code
to aggregate the values in a small scope of one mapper. It is not a part of the main MapReduce algorithm;
it is optional.
 Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It downloads the grouped key-
value pairs onto the local machine, where the Reducer is running. The individual key-value pairs are sorted
by key into a larger data list. The data list groups the equivalent keys together so that their values can be
iterated easily in the Reducer task.
 Reducer − The Reducer takes the grouped key-value paired data as input and runs a Reducer function on each one of them. Here, the data can be aggregated, filtered, and combined in a number of ways, and it can require a wide range of processing. Once the execution is over, it gives zero or more key-value pairs to the final step.

 Output Phase − In the output phase, we have an output formatter that translates the final key-value pairs
from the Reducer function and writes them onto a file using a record writer.
iv. The MapReduce phase

Figure 4.6: The MapReduce Phase
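The phases above can be strung together in a small single-process simulation (illustrative only; a real framework distributes these steps across a cluster, and the function names here are assumptions):

```python
from collections import defaultdict

def run_mapreduce(documents, map_fn, reduce_fn):
    """Simulate the Input/Map, Shuffle-and-Sort, and Reduce phases
    for a list of input records, in one process."""
    # Map phase: apply the user-defined map function to each record.
    intermediate = []
    for doc in documents:
        intermediate.extend(map_fn(doc))
    # Shuffle and sort: order pairs by key and group equal keys.
    groups = defaultdict(list)
    for key, value in sorted(intermediate):
        groups[key].append(value)
    # Reduce phase: run the reducer on each key and its value list.
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

docs = ["big data", "big ideas"]
mapper = lambda d: [(w, 1) for w in d.split()]
reducer = lambda k, vs: [(k, sum(vs))]
result = run_mapreduce(docs, mapper, reducer)
# → [("big", 2), ("data", 1), ("ideas", 1)]
```

Swapping in different mapper and reducer functions reuses the same pipeline, which is exactly the composability that the fixed key-value interface buys.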

F. MapReduce-Example

Twitter receives around 500 million tweets per day, which is nearly 6,000 tweets per second. The following illustration shows how Twitter manages its tweets with the help of MapReduce.

Figure 4.7: Example

i. Tokenize − Tokenizes the tweets into maps of tokens and writes them as key-value pairs.
ii. Filter − Filters unwanted words from the maps of tokens and writes the filtered maps as key- value
pairs.
iii. Count − Generates a token counter per word.
iv. Aggregate Counters − Prepares an aggregate of similar counter values into small manageable units.
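The four steps can be sketched as plain Python functions (hypothetical names and an illustrative stop-word list; the real pipeline runs each stage as a distributed MapReduce job):

```python
def tokenize(tweet):
    """Tokenize: split a tweet into tokens, written as key-value pairs."""
    return [(token, 1) for token in tweet.lower().split()]

STOPWORDS = {"the", "a", "is", "to"}  # illustrative stop-word list

def filter_tokens(pairs):
    """Filter: drop unwanted words from the maps of tokens."""
    return [(w, c) for w, c in pairs if w not in STOPWORDS]

def count(pairs):
    """Count / Aggregate Counters: total the counter per word."""
    totals = {}
    for w, c in pairs:
        totals[w] = totals.get(w, 0) + c
    return totals

counts = count(filter_tokens(tokenize("mapreduce is the engine of big data")))
# {"mapreduce": 1, "engine": 1, "of": 1, "big": 1, "data": 1}
```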

G. MapReduce – Algorithm

The MapReduce algorithm contains two important tasks, namely Map and Reduce.

i. The map task is done by means of Mapper Class


The Mapper class takes the input, tokenizes it, maps and sorts it. The output of the Mapper class is used as input by the Reducer class, which in turn searches for matching pairs and reduces them.

ii. The reduce task is done by means of Reducer Class.

MapReduce implements various mathematical algorithms to divide a task into small parts and assign them to multiple systems. In technical terms, the MapReduce algorithm helps in sending the Map and Reduce tasks to appropriate servers in a cluster.
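In Hadoop the Mapper and Reducer classes are written in Java; the same logic can be expressed, as a sketch, in the style of Hadoop Streaming scripts, where the mapper emits tab-separated records and the reducer receives them sorted by key (the function names here are assumptions, and stdin/stdout plumbing is replaced by plain iterables):

```python
def streaming_mapper(lines):
    """Mapper logic: tokenize each input line and emit one
    tab-separated 'word<TAB>1' record per word occurrence."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(sorted_records):
    """Reducer logic: records arrive sorted by key, so equal keys
    are adjacent and can be summed in a single pass."""
    current, total = None, 0
    for record in sorted_records:
        word, count = record.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

records = sorted(streaming_mapper(["big data", "big ideas"]))
totals = list(streaming_reducer(records))
# ["big\t2", "data\t1", "ideas\t1"]
```

The sort between the two functions plays the role of the framework's shuffle-and-sort step.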

Figure 4.8: The MapReduce Class

H. Coping With Node Failures


i. The worst thing that can happen is that the compute node at which the Master is executing fails. In this
case, the entire MapReduce job must be restarted.
ii. But only this one node can bring the entire process down; other failures will be managed by the Master,
and the MapReduce job will complete eventually.
iii. Suppose the compute node at which a Map worker resides fails. This failure will be detected by the
Master, because it periodically pings the Worker processes.
iv. All the Map tasks that were assigned to this Worker will have to be redone, even if they had completed. The reason for redoing completed Map tasks is that their output destined for the Reduce tasks resides at that compute node and is now unavailable to the Reduce tasks.
v. The Master sets the status of each of these Map tasks to idle and will schedule them on a Worker when
one becomes available.
vi. The Master must also inform each Reduce task that the location of its input from that Map task has
changed. Dealing with a failure at the node of a Reduce worker is simpler.
vii. The Master simply sets the status of its currently executing Reduce tasks to idle. These will be rescheduled on another Reduce worker later.
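The Master's bookkeeping on a failed worker can be sketched as follows (a simplified model with a hypothetical task-table layout; a real scheduler tracks far more state):

```python
def handle_worker_failure(worker, tasks):
    """On a failed ping from `worker`, re-mark its tasks as idle.
    Map tasks are reset even if completed, because their output
    lived on the failed node's local disk. Reduce tasks are reset
    only if still executing, since completed Reduce output is
    already in the distributed file system."""
    for task in tasks:
        if task["worker"] != worker:
            continue
        if task["kind"] == "map":
            task["status"] = "idle"    # redo even completed Map tasks
        elif task["status"] == "executing":
            task["status"] = "idle"    # reschedule in-flight Reduce tasks
    return tasks

tasks = [
    {"kind": "map", "worker": "w1", "status": "completed"},
    {"kind": "reduce", "worker": "w1", "status": "executing"},
    {"kind": "reduce", "worker": "w1", "status": "completed"},
    {"kind": "map", "worker": "w2", "status": "completed"},
]
handle_worker_failure("w1", tasks)
# w1's map task and executing reduce task become idle;
# w1's completed reduce task and w2's tasks are untouched.
```

The asymmetry between Map and Reduce tasks in this sketch is exactly points iv and vii above.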
