Map Reduce

This document provides an overview of MapReduce, describing how it divides large data processing tasks into smaller sub-tasks that can be run in parallel across clusters of computers. It explains that MapReduce programs take input as lists and output lists, and use Map and Reduce functions to distribute the workload. The Map function produces intermediate key-value pairs that get shuffled and sorted before being passed to the Reduce function to produce the final output. It also describes how MapReduce can be used to analyze large datasets like social media usage or trading firm reconciliations.
Map Reduce

By
Isha Shrestha
Jebina Maharjan
Manisha Bhandari
Sarah Gorkhali
Introduction
● MapReduce is designed to process large amounts of data in
parallel by dividing the work.
● The whole job is taken from the user, divided into smaller
tasks, and the tasks are assigned to worker nodes.
● MapReduce programs take a list as input and produce a list
as output.
Why Map Reduce
● Distribute the load
● Condense big data and extract meaningful
information.
Working of Map Reduce

MapReduce consists of two functions: Map and Reduce.

They are sequenced one after the other.
● The Map function takes input from the disk as <key,value> pairs, processes them, and
produces another set of intermediate <key,value> pairs as output.
● The Reduce function also takes inputs as <key,value> pairs, and produces
<key,value> pairs as output.
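The flow above can be sketched as a minimal in-memory word count. This is an illustrative example only, not tied to any particular framework; the function names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical:

```python
from collections import defaultdict

def map_fn(_, line):
    # Emit an intermediate <word, 1> pair for every word in the line.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Aggregate all values that share the same key.
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: produce intermediate <key, value> pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    # Shuffle and sort: group all values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce call per key, in sorted key order.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output

records = [(0, "to be or not to be")]
print(run_mapreduce(records, map_fn, reduce_fn))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In a real cluster the map calls run on different machines and the shuffle moves data over the network, but the logical steps are the same.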
Map

● The input data is first split into smaller blocks. Each block is
then assigned to a mapper for processing.
● For example, if a file has 100 records to be processed, 100
mappers can run in parallel to process one record each, or
50 mappers can process two records each. The Hadoop
framework decides how many mappers to use, based on the
size of the data to be processed and the memory available
on each mapper server.
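The splitting step can be sketched as simple chunking. This is illustrative only (the helper `split_records` is hypothetical); Hadoop's real input splits follow HDFS block boundaries rather than record counts:

```python
def split_records(records, num_mappers):
    # Divide the input into roughly equal blocks, one per mapper.
    size = -(-len(records) // num_mappers)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

records = list(range(100))
splits = split_records(records, 50)
print(len(splits), len(splits[0]))  # 50 blocks of 2 records each
```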
Reduce

● After all the mappers complete processing, the framework
shuffles and sorts the results before passing them on to
the reducers. A reducer cannot start while a mapper is
still in progress. All the map output values that have the
same key are assigned to a single reducer, which then
aggregates the values for that key.
Combine and Partition

There are two intermediate steps between Map and Reduce.


Combine
● It is an optional process.
● The combiner is a reducer that runs individually on each mapper server.
● It reduces the data on each mapper further to a simplified form before passing it
downstream.
● This makes shuffling and sorting easier as there is less data to work with.
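Local combining on a single mapper's output can be sketched as follows. The input pairs are hypothetical word-count output; the combiner applies the same aggregation the reducer would, but only to this one mapper's data:

```python
from collections import Counter

def combine(mapper_output):
    # Pre-aggregate <word, 1> pairs locally before the shuffle,
    # so fewer pairs are sent across the network.
    totals = Counter()
    for word, count in mapper_output:
        totals[word] += count
    return sorted(totals.items())

mapper_output = [("be", 1), ("to", 1), ("be", 1), ("to", 1), ("or", 1)]
print(combine(mapper_output))  # [('be', 2), ('or', 1), ('to', 2)]
```

Five intermediate pairs shrink to three, which is exactly the saving the combiner provides at scale.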
Partition

● It is the process that translates the <key, value> pairs resulting from mappers to
another set of <key, value> pairs to feed into the reducer.
● It decides how the data has to be presented to the reducer and also assigns it to a
particular reducer.
● The default partitioner determines the hash value for the key, resulting from the
mapper, and assigns a partition based on this hash value. There are as many partitions
as there are reducers. So, once the partitioning is complete, the data from each
partition is sent to a specific reducer.
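The default behaviour described above amounts to hashing the key modulo the number of reducers. This sketch uses a CRC32 hash for deterministic illustration; Hadoop's default HashPartitioner actually uses the key's Java hashCode:

```python
import zlib

def partition(key, num_reducers):
    # Hash the key and map it to one of the reducer partitions.
    # The same key always lands in the same partition.
    return zlib.crc32(key.encode()) % num_reducers

num_reducers = 3
for word in ["be", "not", "or", "to"]:
    print(word, "-> reducer", partition(word, num_reducers))
```

Because the mapping depends only on the key, every occurrence of a given key, from every mapper, reaches the same reducer.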
Implementation
MapReduce programs can be written in Java, C, C++, Python, Ruby, Perl, etc.
Uses
It can be used with any complex problem that can be solved through
parallelization.
● A social media site could use it to determine how many new sign-ups it
received over the past month from different countries, to gauge its growing
popularity across geographies.
● A trading firm could perform its batch reconciliations faster and also
determine which scenarios often cause trades to break.
● Search engines could determine page views, and marketers could perform
sentiment analysis using MapReduce.
