MapReduce Theory
MapReduce is a programming model for processing large datasets in parallel across a distributed
cluster of computers. The programmer supplies two functions, Map and Reduce; between them, the
framework performs a Shuffle and Sort step, giving the three-step flow described below.

1. Map Step:
The input data is divided into chunks, and the map function processes these chunks in parallel,
emitting intermediate key-value pairs derived from each input record. A sketch of such a map
function follows below.
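As a minimal single-machine sketch of what a map function can look like, here is a Python illustration using the word-count logic developed later in this document (the name map_chunk and the chunk format, an iterable of text lines, are assumptions for illustration):

def map_chunk(chunk):
    # chunk: an iterable of input records; here, lines of text
    pairs = []
    for line in chunk:
        for word in line.split():
            pairs.append((word, 1))  # emit the key-value pair (word, 1)
    return pairs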

2. Shuffle and Sort Step:
After the map function processes the data, the system groups and sorts these key-value pairs based
on their keys, so that all values corresponding to the same key are together.
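A minimal single-machine sketch of this grouping (in a real cluster the framework performs this step for you, moving intermediate data between machines; the name shuffle_and_sort is my own):

from collections import defaultdict

def shuffle_and_sort(pairs):
    # Collect every value under its key, then order the groups by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())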

3. Reduce Step:
In this step, the reduce function processes the grouped data and combines or aggregates it to
produce the final result.
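A matching sketch of a reduce function: it receives one key together with all of that key's grouped values and aggregates them (summation here, as in the word-count example below; other problems use other aggregations):

def reduce_group(key, values):
    # values: the list collected for this key during shuffle and sort
    return (key, sum(values))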

MapReduce Workflow:
Map: Processes the input chunks in parallel, producing intermediate key-value pairs.
Shuffle: Groups and sorts the intermediate key-value pairs by key.
Reduce: Combines or aggregates each group of values into the final output.

MapReduce enables large-scale data processing because it runs across many machines, with each
machine handling only its own portion of the data in parallel.

Key Features of MapReduce:

Scalability: Can handle petabytes of data by distributing the computation across a cluster of machines.
Fault Tolerance: If a task fails on one machine, it is reassigned to another, ensuring reliable execution.
Parallel Processing: Processes data simultaneously on multiple machines, speeding up computation.

Simple Example of MapReduce

Problem: Count the number of occurrences of each word in a large text file.

Input Data:

Hello World Hello Hadoop Hadoop is great

Step-by-Step Process:

1. Map Step:
Each line of the text is split into words, and for each word, we emit a key-value pair. The key is the
word, and the value is 1 (representing the occurrence of that word).
Input:


Hello World Hello Hadoop Hadoop is great

Map Output (Key-Value Pairs):


(Hello, 1) (World, 1) (Hello, 1) (Hadoop, 1) (Hadoop, 1) (is, 1) (great, 1)

2. Shuffle and Sort Step:
The intermediate data is grouped by keys (the words), so all occurrences of the same word are
grouped together.
Grouped Data:


(Hello, [1, 1]) (World, [1]) (Hadoop, [1, 1]) (is, [1]) (great, [1])

3. Reduce Step:
The reduce function takes each group of values associated with a key and aggregates them (sums
them up in this case).
Reduce Output:


(Hello, 2) (World, 1) (Hadoop, 2) (is, 1) (great, 1)

Final Output:
The final result is the word count for each word in the input text:


Hello: 2 World: 1 Hadoop: 2 is: 1 great: 1

Code Example (Python):

Here's a simple Python implementation of the same word count:

# Map function
def map_function(input_data):
    output = []
    for line in input_data:
        for word in line.split():
            output.append((word, 1))
    return output

# Reduce function
def reduce_function(mapped_data):
    result = {}
    for word, count in mapped_data:
        if word in result:
            result[word] += count
        else:
            result[word] = count
    return result

# Example usage
input_data = ["Hello World", "Hello Hadoop", "Hadoop is great"]

# Map phase
mapped_data = map_function(input_data)

# Reduce phase
final_result = reduce_function(mapped_data)
print(final_result)

Output:


{'Hello': 2, 'World': 1, 'Hadoop': 2, 'is': 1, 'great': 1}
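Note that reduce_function above folds the grouping (shuffle) and the aggregation into a single function. A variant that mirrors the three-phase model more literally, grouping first and then reducing each group, could look like this (a single-machine sketch continuing from the example above):

from collections import defaultdict

# Shuffle: group all counts emitted for the same word
groups = defaultdict(list)
for word, count in map_function(input_data):
    groups[word].append(count)

# Reduce: aggregate each group independently
final_result = {word: sum(counts) for word, counts in sorted(groups.items())}
print(final_result)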

Conclusion:
Big Data refers to large-scale datasets that require specialized tools for storage and
processing.
MapReduce is a powerful method for parallel processing of large datasets, where data is
processed in the form of key-value pairs.
The process involves mapping input data to intermediate key-value pairs, shuffling and sorting
those pairs by key, and reducing each group to a final result.

MapReduce is particularly effective for distributed computing environments like Hadoop, where
data is spread across many machines.
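For concreteness, here is a minimal sketch of the same word count in the Hadoop Streaming style, where the mapper and reducer are ordinary scripts that read standard input and write tab-separated key-value lines (the file names mapper.py and reducer.py are illustrative; the exact job-submission command depends on your Hadoop installation):

# mapper.py: emit one "word<TAB>1" line per word
import sys

for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py: Hadoop delivers the mapper output sorted by key, so equal
# words arrive on consecutive lines; sum each run of identical keys.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print(current_word + "\t" + str(current_count))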
