MapReduce - Documentation
1. The input data from the user is passed to the Mapper as specified by
an InputFormat. The InputFormat is specified in the driver code; it
defines the location of the input data, such as a file or directory on
HDFS, and determines how to split the input data into input splits (see
the driver sketch after this list).
2. Each Mapper deals with a single input split. A RecordReader, an
object provided by the InputFormat, is used to extract (key, value)
records from the input source (the split data).
3. The Mapper processes the input (key, value) pairs and produces
output that is also in the form of (key, value) pairs. The output
from the Mapper is called the intermediate output.
4. The Mapper may use or completely ignore the input key. For
example, a standard pattern is to read a file one line at a time: the key
is the byte offset into the file at which the line starts, and the value is
the contents of the line itself. Typically the key is considered irrelevant.
If the Mapper writes anything out, the output must be in the form of
key/value pairs (see the Mapper sketch after this list).
5. The output from the Mapper (intermediate keys and their value
lists) is passed to the Reducer in sorted key order.
6. The Reducer outputs zero or more final key/value pairs, which are
written to HDFS. The Reducer usually emits a single key/value pair for
each input key (see the Reducer sketch after this list).
7. If a Mapper appears to be running more slowly than, or lagging
behind, the others, a new instance of the Mapper is started on another
machine, operating on the same data. The results of the first instance
to finish are used, and Hadoop kills the instance that is still running.
This is known as speculative execution (a configuration toggle for it
appears in the driver sketch below).
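
To make steps 1, 2, and 7 concrete, the driver below is a minimal sketch using the standard Hadoop Java API (org.apache.hadoop.mapreduce). The job name, class names, and HDFS paths (WordCountDriver, WordMapper, WordReducer, /user/data/input) are illustrative assumptions, not part of this documentation; the InputFormat, output types, and the mapreduce.map.speculative property are standard Hadoop.

    // Driver: configures the job, the InputFormat, and the input location (step 1).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Step 7: speculative execution (on by default) can be toggled here.
            conf.setBoolean("mapreduce.map.speculative", true);

            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordMapper.class);
            job.setReducerClass(WordReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // The InputFormat defines where the input lives on HDFS and how it
            // is split; TextInputFormat supplies a RecordReader that extracts
            // (byte offset, line) records from each split (steps 1 and 2).
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path("/user/data/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/data/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }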
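
The Mapper from steps 2-4 might look like this sketch: it receives a (byte offset, line) record from the RecordReader, ignores the offset key, and emits an intermediate (word, 1) pair per word. The word-count logic is an assumption chosen for illustration; the Mapper class and its map() signature are the standard API.

    // Mapper: input key = byte offset (ignored), input value = one line of text.
    // Emits intermediate (word, 1) pairs (steps 3 and 4).
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // The byte-offset key is irrelevant here; only the line is used.
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE); // intermediate (key, value) pair
                }
            }
        }
    }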
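
A matching Reducer for steps 5 and 6: it receives each intermediate key together with its list of values, in sorted key order, and emits one final (word, total) pair per input key, which the framework writes to HDFS. The summing logic is again an assumption, paired with the Mapper above.

    // Reducer: receives (word, [1, 1, ...]) in sorted key order and emits one
    // final (word, total) pair per input key; the output is written to HDFS.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();
            }
            context.write(word, new IntWritable(total));
        }
    }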
The number of map tasks in a MapReduce program depends on the
number of data blocks of the input file. For example, if the block size is
128 MB per block of split data and the input data is 1 GB in size, the
number of map tasks will be 8. The number of map tasks grows with
the size of the input data, and with it the degree of parallelism, which
results in faster processing of the data.
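
As a quick check of that arithmetic, the snippet below is a plain-Java sketch (not a Hadoop API call) of the split-count calculation, using the assumed 1 GB input and 128 MB block size from the example; ceiling division accounts for a partial final block.

    // Estimate the number of map tasks from input size and block size:
    // 1 GB input with 128 MB blocks -> ceil(1024 / 128) = 8 map tasks.
    public class SplitCountEstimate {
        public static void main(String[] args) {
            long inputSizeMb = 1024; // assumed input size: 1 GB
            long blockSizeMb = 128;  // assumed HDFS block size: 128 MB
            long mapTasks = (inputSizeMb + blockSizeMb - 1) / blockSizeMb; // ceiling division
            System.out.println("Map tasks: " + mapTasks); // prints: Map tasks: 8
        }
    }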