09b - MapReduce

HKBU - COMP7940

Cloud-Enabling Technologies: MapReduce

Slides are modified from several sources. Please check the reference page at the back.
Distributed System

⚫ Any distributed system must deal with two tasks:
– Storage -> GFS
– Computation
⚫ How do we deal with the scalability problem?
⚫ How do we use multiple computers to do what we used to do on one?
How it all got started:
Google MapReduce (2004)

23320 citations and counting …

3
Key-Value Pairs

(key, value) pairs are used as the format for both data and intermediate results

4
MapReduce

• Mappers and Reducers are user code (provided as functions)
• They just need to obey the key-value pair interface
• Mappers:
• Consume <key, value> pairs
• Produce <key, value> pairs
• Reducers:
• Consume <key, <list of values>>
• Produce <key, value>
• Shuffling and Sorting:
• Hidden phase between mappers and reducers
• Groups all <key, value> pairs with the same key from all mappers, and passes them to a certain reducer in the form of <key, <list of values>>

5
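The contract above is easy to simulate outside Hadoop. Below is a minimal Python sketch (function names are mine, not part of any Hadoop API) showing the map phase, the hidden shuffle-and-sort grouping, and the reduce phase:

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(records, mapper, reducer):
    """Simulate the MapReduce pipeline: map, shuffle/sort, reduce."""
    # Map phase: each input pair may yield zero, one, or many pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(mapper(key, value))
    # Shuffle & sort (the hidden phase): group all pairs by key.
    intermediate.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [v for _, v in group]
        # Reduce phase: consumes <key, <list of values>>.
        output.extend(reducer(key, values))
    return output
```

Passing a word-splitting mapper and a summing reducer to `run_mapreduce` reproduces the word-count example discussed later in these slides.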
A Brief View of MapReduce Processing Granularity

• Mappers
• Run on a record-by-record basis
• Your code processes each record and may produce
• Zero, one, or many outputs
• Reducers
• Run on a group of records (having the same key)
• Your code processes each group and may produce
• Zero, one, or many outputs

9
MapReduce: The Map Step

[Diagram: input key-value pairs (k, v) flow through map functions; each map call produces intermediate key-value pairs.]
MapReduce: The Reduce Step

[Diagram: intermediate key-value pairs are grouped by key into (k, <v, v, ...>) groups, and each group is passed to a reduce function that outputs (k, v) pairs.]
Warm up: Word Count

⚫ We have a large file of words, one word per line
⚫ Count the number of times each distinct word appears in the file
⚫ Sample application: analyze web server logs to find popular URLs
Word Count
⚫ Case 1: Entire file fits in memory
⚫ Load the file into memory and do the counting
⚫ Case 2: File too large for memory, but all <word, count> pairs fit in memory
⚫ Create a list of <word, count> pairs in memory, and scan the file on disk in a streaming fashion
⚫ Case 3: File on disk, too many distinct words to fit in memory
⚫ Sort the file on disk (costly), then scan the sorted file and count
– sort datafile | uniq -c
Word Count

⚫ To make it slightly harder, suppose we have a large corpus of documents
⚫ Count the number of times each distinct word occurs in the corpus
– words(docs/*) | sort | uniq -c
– where words takes a file and outputs the words in it, one per line
⚫ The above captures the essence of MapReduce
– The great thing is that it is naturally parallelizable
Word Count
• Job: Count the occurrences of each word in a data set

Map Reduce
Tasks Tasks

15
Word Count Example

[Diagram: a big document flows through MAP (provided by the programmer), which reads the input and produces key-value pairs such as (The, 1), (crew, 1), (of, 1), (space, 1), (the, 1), (shuttle, 1), (Endeavor, 1), (recently, 1), ... A Group-by-key phase collects all pairs with the same key, e.g. (crew, 1), (crew, 1) and (the, 1), (the, 1), (the, 1). REDUCE (also provided by the programmer) collects all values belonging to a key and outputs pairs such as (crew, 2), (the, 3), (shuttle, 1), (recently, 1), ... Only the grouping step requires a sequential pass over the data.]


Word Count Example

[Diagram: eight input lines, (1, the apple), (2, is an apple), (3, not an orange), (4, because the), (5, orange), (6, unlike the apple), (7, is orange), (8, not green), are split across four mappers (lines 1-2, 3-4, 5-6, 7-8). Each key-value pair output by a mapper is routed to the reducer responsible for its key range (A-G, H-N, O-U, V-Z). The reducers group values by key, e.g. (apple, {1, 1, 1}), (an, {1, 1}), (is, {1, 1}), (not, {1, 1}), (orange, {1, 1, 1}), (the, {1, 1, 1}), and emit counts: (apple, 3), (an, 2), (because, 1), (green, 1), (is, 2), (not, 2), (orange, 3), (the, 3), (unlike, 1).]

1. Each mapper receives some of the KV-pairs as input
2. The mappers process the KV-pairs one by one
3. Each KV-pair output by a mapper is sent to the reducer that is responsible for it
4. The reducers sort their input by key and group it
5. The reducers process their input one group at a time
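The five steps above can be sketched as a small simulation (Python used purely for illustration; hash partitioning stands in for the key ranges A-G, H-N, ... shown in the figure, and none of these names are Hadoop API):

```python
from collections import defaultdict

def wc_mapper(line_no, line):
    """Steps 1-2: process one record, emit (word, 1) for each word."""
    for word in line.split():
        yield (word, 1)

def partition_for(word, num_reducers):
    """Step 3: route each key to the reducer responsible for it."""
    return hash(word) % num_reducers  # stands in for key ranges A-G, H-N, ...

def wc_reducer(word, counts):
    """Step 5: process one key group, emit the total count."""
    yield (word, sum(counts))

def word_count(lines, num_reducers=4):
    # Step 4 (shuffle): one dict of key groups per reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line_no, line in enumerate(lines, start=1):
        for word, one in wc_mapper(line_no, line):
            partitions[partition_for(word, num_reducers)][word].append(one)
    results = {}
    for groups in partitions:
        for word in sorted(groups):  # reducers sort their input by key
            for key, total in wc_reducer(word, groups[word]):
                results[key] = total
    return results
```

Running `word_count` over the eight example lines reproduces the counts in the diagram, e.g. (apple, 3) and (orange, 3).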
How it looks in Java

[Code slides: a Java implementation provides the Map function by implementing Hadoop's Mapper abstract class, the Reduce function by implementing Hadoop's Reducer abstract class, and a job configuration that wires the two together.]
Example 2: Inverted Index

• Search engines use an inverted index to quickly find webpages containing a given keyword
• MapReduce program for creating an inverted index:
• Map
• For each (url, doc) pair
• Emit (keyword, url) for each keyword in doc
• Reduce
• For each keyword, output (keyword, list of urls)
20
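A minimal sketch of this Map/Reduce pair (a single-process Python simulation, not a Hadoop program; duplicate keywords within one page are collapsed with `set()`):

```python
from collections import defaultdict

def build_inverted_index(pages):
    """pages: iterable of (url, doc) pairs; returns keyword -> list of urls."""
    # Map: emit (keyword, url) for each distinct keyword in doc.
    emitted = [(word, url) for url, doc in pages for word in set(doc.split())]
    # Shuffle + Reduce: for each keyword, collect the list of urls.
    index = defaultdict(list)
    for keyword, url in sorted(emitted):  # sorting groups equal keywords together
        index[keyword].append(url)
    return dict(index)
```

A lookup is then a single dictionary access: `index["banana"]` returns every URL whose document contains that keyword.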
Exercise 1: Find the maximum temperature each year
• Given a large dataset of weather station readings, write down the Map and Reduce steps necessary to find the maximum temperature recorded each year across all weather stations.

• The dataset contains lines with the following format: 'stationID, year, month, day, max temperature (maxTemp), min temperature (minTemp)'
21
Exercise 2: How to process this SQL query in MapReduce?

SELECT AuthorName FROM Authors, Books WHERE Authors.AuthorID=Books.AuthorID AND Books.Date>1980
22
Answer Q1:

• (Map steps) For each record:
• Read each line and parse it
• Emit (year, maxTemp), where year is the key and max temperature (maxTemp) is the value

• (Reduce steps) For each key:
• Collect all values
• Keep only the max value
23
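The Map and Reduce steps above can be sketched as follows (a single-process Python simulation; the parsing assumes the comma-separated format given in the exercise):

```python
from collections import defaultdict

def max_temp_mapper(line):
    """Map: parse one reading and emit (year, maxTemp)."""
    station_id, year, month, day, max_temp, min_temp = line.split(",")
    yield (int(year), float(max_temp))

def max_temp_reducer(year, temps):
    """Reduce: keep only the max value for each year."""
    yield (year, max(temps))

def max_temp_per_year(lines):
    # Shuffle: group emitted values by year.
    groups = defaultdict(list)
    for line in lines:
        for year, t in max_temp_mapper(line):
            groups[year].append(t)
    return dict(kv for year in groups
                   for kv in max_temp_reducer(year, groups[year]))
```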
Answer Q2:

• For each record in the 'Authors' table:
• Map: Emit (AuthorID, AuthorName)
• For each record in the 'Books' table:
• Map: Emit (AuthorID, Date)
• Reduce:
• For each AuthorID, if Date>1980, output AuthorName
24
Answer Q2 (Optimized)

• For each record in the 'Authors' table:
• Map: Emit (AuthorID, AuthorName)
• For each record in the 'Books' table:
• Map: If Date>1980, emit (AuthorID, Date)
• Reduce:
• For each AuthorID, output AuthorName
25
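Both answers are reduce-side joins: records from the two tables meet at the reducer responsible for their AuthorID. A sketch of the optimized variant (a Python simulation; the source-table tags "A"/"B" are my own convention, not part of the answer above):

```python
from collections import defaultdict

def join_authors_books(authors, books):
    """authors: (author_id, name) pairs; books: (author_id, date) pairs."""
    groups = defaultdict(list)
    # Map over Authors: emit (AuthorID, ("A", AuthorName)).
    for author_id, name in authors:
        groups[author_id].append(("A", name))
    # Map over Books: filter early (the optimization), emit only Date > 1980.
    for author_id, date in books:
        if date > 1980:
            groups[author_id].append(("B", date))
    # Reduce: output the author's name once if any qualifying book exists.
    result = []
    for author_id, values in groups.items():
        names = [v for tag, v in values if tag == "A"]
        has_recent_book = any(tag == "B" for tag, _ in values)
        if names and has_recent_book:
            result.append(names[0])
    return result
```

Filtering in the map phase shrinks the data shuffled to the reducers, which is why the optimized plan is cheaper than filtering in the reduce phase.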
Hadoop

• Hadoop is an open-source implementation of Google's MapReduce and GFS
• Clean and simple programming abstraction
• Users only provide two functions, "map" and "reduce"
• Automatic parallelization & distribution
• Hidden from the end-user
• Fault tolerance and automatic recovery
• Nodes/tasks will fail and will recover automatically

26
Brief history
• Initially developed by Doug Cutting as a filesystem for Apache Nutch, a web search engine
• Early name: Nutch Distributed FileSystem (NDFS)
• Moved out of Nutch and taken up by Yahoo! in 2006 as an independent project called Hadoop

28
The origin of the name
• "Hadoop" is a made-up name, as explained by Doug Cutting:

"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such."

29
Hadoop: How it Works

30
Hadoop Architecture

• The Hadoop framework consists of two main layers
• Distributed file system (HDFS)
• Execution engine (MapReduce)

[Diagram: a main node (single node) coordinates many worker nodes.]

31
MapReduce Framework
Hadoop Distributed File System (HDFS)

One namenode
Maintains metadata info about files:
• Maps a filename to a set of blocks
• Maps a block to the DataNodes where it resides
• Replication engine for blocks

[Diagram: File F is divided into blocks 1, 2, 3, 4, 5 of 64 MB each.]

Many datanodes (1000s)
- Store the actual data
- Files are divided into blocks
- Each block is replicated r times (default r = 3)
- Communicate with the NameNode through periodic "heartbeats" (once per 3 secs)

33
Data flow overview

[Diagram: a Client communicates with the NameNode (Master) and the DataNodes, all sharing a ClusterId; a Secondary NameNode backs up the NameNode.]

34
Data Flow

⚫ Input and final output are stored on a distributed file system
– The scheduler tries to schedule map tasks "close" to the physical storage location of the input data
⚫ Intermediate results are stored on the local FS of map and reduce workers
⚫ The output is often the input to another MapReduce task
Distributed Execution Overview

[Diagram: the User Program forks a Master and several Workers. The Master assigns map and reduce tasks to Workers. Map Workers read input splits (Split 0, Split 1, Split 2) and write intermediate results to local disk (local write); reduce Workers perform remote reads, sort the data, and write Output File 0 and Output File 1.]
Heartbeats
• DataNodes send heartbeats to the NameNode

• Once every 3 secs

• NameNode uses heartbeats to detect DataNode failure

• No response in 10 mins is considered a failure

37
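The detection rule above can be sketched as follows (a simplified illustration, not the actual NameNode code; the 3 s heartbeat interval and 10 min failure window come from the slides):

```python
def failed_datanodes(last_heartbeat, now, timeout=600.0):
    """Flag DataNodes whose last heartbeat is older than the timeout.

    last_heartbeat: dict of node -> timestamp (seconds) of its last ping.
    HDFS sends heartbeats every ~3 s; no response in ~10 min (600 s)
    is considered a failure.
    """
    return sorted(node for node, t in last_heartbeat.items()
                  if now - t > timeout)
```

In the real system the NameNode reacts to each flagged node by re-replicating its blocks, as described in the next slide.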
Replication engine
• Upon detecting a DataNode failure

• Choose new DataNodes for replicas

• Balance disk usage

• Balance communication traffic to DataNodes

38
HDFS Erasure Coding
• New feature introduced in Hadoop 3.0
• Problem with the replication mechanism in HDFS
• Each extra replica adds 100% storage overhead, so the default 3-way replication results in 200% storage overhead
• Cold replicas are rarely accessed but still consume that space

39

• (XOR parity coding) Requires only 50% storage overhead
• But can tolerate only 1 failure
40
Reed-Solomon Algorithm
• RS multiplies m data cells with a Generator Matrix (GT) to get an extended codeword with m data cells and n parity cells
• Data can be recovered by multiplying the inverse of the generator matrix with the extended codeword, as long as m out of the m + n cells are available
• XOR is the special case with n = 1
• Can tolerate up to n failures
• But increases CPU load
41
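The n = 1 (XOR) special case is easy to demonstrate: the parity cell is the XOR of all data cells, and any single lost cell is the XOR of the parity with the surviving cells. A sketch (plain-Python illustration on integer cells, not the HDFS implementation):

```python
def xor_parity(data_cells):
    """Compute the single parity cell (the n = 1 case of Reed-Solomon)."""
    parity = 0
    for cell in data_cells:
        parity ^= cell
    return parity

def recover_lost_cell(surviving_cells, parity):
    """Rebuild one lost data cell from the parity and the m - 1 survivors."""
    lost = parity
    for cell in surviving_cells:
        lost ^= cell
    return lost
```

With m = 3 data cells and 1 parity cell, storage overhead is 1/3 rather than the 200% of 3-way replication, matching the trade-off described above.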
NameNode failure
• The NameNode is a single point of failure
• Transaction log stored in multiple directories
• A directory on the local file system
• A directory on a remote file system (NFS)
• Add a secondary NameNode
42
Hadoop Map-Reduce (Example: Color Count)

[Diagram: input blocks on HDFS flow through Map tasks, which produce (k, v) pairs of the form (color, 1). A parse-hash step shuffles and sorts the pairs by key k, and Reduce tasks consume (k, [v]) groups of the form (color, [1, 1, 1, 1, 1, 1, ...]) and produce (k', v') outputs of the form (color, 100).]

Users only provide the "Map" and "Reduce" functions
43
Hadoop MapReduce

• The Job Tracker is the master node (runs with the namenode)
• Receives the user's job
• Decides how many tasks will run (number of mappers)
• Decides where to run each mapper (locality matters)

[Diagram: a file's 5 blocks are spread across Node 1, Node 2, and Node 3.]

• This file has 5 blocks → run 5 map tasks
• Where to run the task reading block "1"?
• Try to run it on Node 1 or Node 3
44
Hadoop MapReduce

• The Task Tracker is the slave node (runs on each datanode)
• Receives the task from the Job Tracker
• Runs the task until completion (either a map or a reduce task)
• Always in communication with the Job Tracker, reporting progress

[Diagram: in this example, 1 map-reduce job consists of 4 map tasks and 3 reduce tasks.]
45
Failures

⚫ Map worker failure
– Map tasks completed or in progress at the worker are reset to idle
– Reduce workers are notified when a task is rescheduled on another worker
⚫ Reduce worker failure
– Only in-progress tasks are reset to idle
⚫ Master failure
– The MapReduce task is aborted and the client is notified
On worker failure
• Detect failure via periodic heartbeats
• Workers send heartbeat messages (pings) periodically to the master node
• Re-execute completed and in-progress map tasks
• Re-execute in-progress reduce tasks
• Task completion committed through master
47
Reference

• Chapter 6, Dan C. Marinescu, Cloud Computing: Theory and Practice, Second Edition
• https://www.ibm.com/docs/en/cics-ts/5.4?topic=processing-acid-properties-transactions
• https://www.mongodb.com/nosql-explained/best-nosql-database
• Slides from M. Silic, Analysis of Massive Datasets, University of Zagreb
48
