Map Reduce Algorithm - Hadoop
18BCE2482
18BCE2488
18BCE2490
What is MapReduce?
Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', [v']) → <k', v'>*
All values with the same key are reduced together
Handles scheduling
Assigns workers to map and reduce tasks
Handles data distribution
Moves the process to the data
Handles synchronization
Gathers, sorts, and shuffles intermediate data
Handles faults
Detects worker failures and restarts their tasks
Everything happens on top of a distributed FS
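The map/reduce contract above can be sketched as a single-machine simulation. This is a minimal sketch, not Hadoop's API: the `run_mapreduce` driver and the word-count `map_fn`/`reduce_fn` are illustrative names, and the grouping step stands in for the framework's shuffle and sort.

```python
from collections import defaultdict

def map_fn(key, value):
    # map(k, v) -> <k', v'>* ; here: emit (word, 1) for each word in a line
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # reduce(k', [v']) -> <k', v'>* ; all values with the same key arrive together
    yield (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle: group intermediate values by key (what the framework handles)
    groups = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    out = {}
    for k2 in sorted(groups):           # keys are presented to reduce in sorted order
        for k3, v3 in reduce_fn(k2, groups[k2]):
            out[k3] = v3
    return out

result = run_mapreduce([(0, "a b a")], map_fn, reduce_fn)
# result == {"a": 2, "b": 1}
```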
Sum of Squares
Sum of Squares of Even and Odd Numbers
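One way to phrase the even/odd sum-of-squares example as map and reduce: the mapper keys each square by the number's parity, and the reducer sums per key. The input list and function name below are illustrative, not from the slides.

```python
from collections import defaultdict

def map_parity_square(n):
    # emit (parity, n^2) so even and odd squares are reduced separately
    return ("even" if n % 2 == 0 else "odd", n * n)

groups = defaultdict(list)              # shuffle: group squares by parity key
for n in [1, 2, 3, 4, 5]:
    k, v = map_parity_square(n)
    groups[k].append(v)

sums = {k: sum(vs) for k, vs in groups.items()}   # reduce: sum values per key
# sums == {"odd": 1 + 9 + 25, "even": 4 + 16} == {"odd": 35, "even": 20}
```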
Map Reduce Architecture
Map Reduce with Combiner
Map Side
1. The map task writes its output to an in-memory circular buffer
2. Once the buffer fills to a threshold, its contents start spilling to local
disk
3. Before writing to disk, the data is partitioned according to the
reducers it will be sent to
4. Each partition is sorted by key, and the combiner is run on the sorted
output
5. Several spill files may exist by the time the map finishes.
These spill files are merged into a single partitioned, sorted output
file
6. The output file's partitions are made available to reducers over
HTTP
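Steps 3 and 4 above (partition, sort, combine during a spill) can be sketched as follows. This is a simplified model, not Hadoop internals: `spill` and `partition` are illustrative names, and the byte-sum partitioner is a deterministic stand-in for Hadoop's `HashPartitioner` (hash of key modulo number of reducers).

```python
def partition(key, num_reducers):
    # stand-in for Hadoop's HashPartitioner: hash(key) mod #reducers;
    # a byte sum keeps this example deterministic across runs
    return sum(key.encode()) % num_reducers

def spill(records, num_reducers, combiner):
    # records: (key, value) pairs buffered in memory before the spill
    parts = [[] for _ in range(num_reducers)]
    for k, v in records:                          # partition by target reducer
        parts[partition(k, num_reducers)].append((k, v))
    spilled = []
    for p in parts:
        p.sort(key=lambda kv: kv[0])              # sort each partition by key
        combined, i = [], 0
        while i < len(p):                         # run the combiner per key run
            j = i
            while j < len(p) and p[j][0] == p[i][0]:
                j += 1
            combined.append((p[i][0], combiner([v for _, v in p[i:j]])))
            i = j
        spilled.append(combined)
    return spilled

# word-count style records, 2 reducers, sum as the combiner
out = spill([("a", 1), ("b", 1), ("a", 1)], 2, sum)
# out == [[("b", 1)], [("a", 2)]]
```

Running the combiner at spill time shrinks the data written to disk and later shipped over HTTP, which is the point of step 4.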
Reduce Side
1. The map outputs sit on the mappers' local disks; reduce tasks need
this data in order to proceed
2. Each reduce task needs the map output for its particular partition from
many map tasks across the cluster
3. The reduce task starts copying the map outputs as soon as each
map completes. This is the copy phase. The map outputs are
fetched in parallel by multiple threads
4. Map outputs are copied into the reduce task's JVM memory if small enough,
otherwise to disk. As copies accumulate, they are merged into larger
sorted files. When all are copied, they are merged while maintaining
their sort order
5. The reduce function is invoked for each key in the sorted output, and the
output is written
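The merge in step 4 and the per-key reduce in step 5 can be simulated with the standard library: `heapq.merge` combines already-sorted runs without re-sorting, and `itertools.groupby` hands each key's values to the reduce function. The two map-output lists below are made-up sample data.

```python
import heapq
from itertools import groupby

# Each map task's output for this reducer's partition is already key-sorted
map_outputs = [
    [("ant", 1), ("bee", 2)],     # hypothetical output from map task 1
    [("ant", 3), ("cat", 1)],     # hypothetical output from map task 2
]

# Merge the sorted runs while maintaining sort order (the merge phase)
merged = heapq.merge(*map_outputs, key=lambda kv: kv[0])

# Invoke the reduce function (here: sum) once per key in the sorted stream
result = {k: sum(v for _, v in grp)
          for k, grp in groupby(merged, key=lambda kv: kv[0])}
# result == {"ant": 4, "bee": 2, "cat": 1}
```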
SEARCHING
Suppose employee data is stored in four different files: A, B, C, and D
Map Phase
- processes each input file and emits the employee data as
key-value pairs (<k, v> = <emp name, salary>)
Combiner phase
accepts the output of the Map phase
for each file, the combiner checks all the employee salaries to find the highest-paid
employee in that file
Reducer phase
receives the highest-paid employee from each file and outputs the overall highest:
<gopal, 50000>
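The searching example can be sketched end to end. Only the record `<gopal, 50000>` comes from the slides; the other names, salaries, and the `combiner` helper are made-up placeholders.

```python
# Four hypothetical input files of (emp name, salary) pairs;
# only ("gopal", 50000) is taken from the slides
files = {
    "A": [("satish", 26000), ("gopal", 50000)],
    "B": [("krishna", 25000), ("kiran", 45000)],
    "C": [("manisha", 45000), ("gopal", 50000)],
    "D": [("kiran", 45000), ("krishna", 25000)],
}

def combiner(records):
    # per-file combiner: keep only that file's highest-paid employee
    return max(records, key=lambda kv: kv[1])

# The reducer sees one (name, salary) pair per file and takes the overall maximum
per_file_max = [combiner(recs) for recs in files.values()]
name, salary = max(per_file_max, key=lambda kv: kv[1])
# (name, salary) == ("gopal", 50000)
```

The combiner cuts the reducer's input from all records down to one record per file, which is exactly why it is run on the map side.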
Inverted Index
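An inverted index maps each word to the set of documents containing it; in MapReduce terms, the mapper emits (word, doc id) pairs and the reducer collects the ids per word. The sketch below simulates both phases in one pass; the documents are made-up samples.

```python
from collections import defaultdict

# Hypothetical document collection: doc id -> text
docs = {"d1": "hadoop map reduce", "d2": "map side spill"}

# Map: emit (word, doc_id) for every word; Reduce: collect doc ids per word
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# index["map"] == {"d1", "d2"}; index["hadoop"] == {"d1"}
```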