Map-reduce-Developing a map-reduce application – Map-reduce working procedure-2

Uploaded by

cakvlr

MapReduce Applications

Big Data Computing Vu Pham Hadoop MapReduce 2.0

Applications
Here are a few simple examples of interesting programs that
can be easily expressed as MapReduce computations.
Distributed Grep: The map function emits a line if it matches a
supplied pattern. The reduce function is an identity function that
just copies the supplied intermediate data to the output.
Count of URL Access Frequency: The map function processes
logs of web page requests and outputs (URL; 1). The reduce
function adds together all values for the same URL and emits a
(URL; total count) pair.
Reverse Web-Link Graph: The map function outputs (target;
source) pairs for each link to a target URL found in a page named
source. The reduce function concatenates the list of all source
URLs associated with a given target URL and emits the pair
(target; list(source)).
Contd…
Term-Vector per Host: A term vector summarizes the
most important words that occur in a document or a set
of documents as a list of (word; frequency) pairs.

The map function emits a (hostname; term vector) pair
for each input document (where the hostname is
extracted from the URL of the document).
The reduce function is passed all per-document term
vectors for a given host. It adds these term vectors
together, throwing away infrequent terms, and then emits
a final (hostname; term vector) pair.
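The term-vector computation can be sketched in a few lines of plain Python. This is only an in-memory illustration, not Hadoop code: the function names, the whitespace tokenizer, the `min_count` threshold used to decide what counts as "infrequent", and the sample URLs are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def map_term_vector(url, text):
    """Map: emit one (hostname, term vector) pair per document."""
    hostname = urlparse(url).netloc
    yield hostname, Counter(text.lower().split())

def reduce_term_vector(hostname, vectors, min_count=2):
    """Reduce: add the per-document vectors, then drop infrequent terms.

    min_count is an assumed threshold; the slides do not define one.
    """
    total = Counter()
    for vector in vectors:
        total.update(vector)
    frequent = {term: n for term, n in total.items() if n >= min_count}
    return hostname, frequent

def run(documents):
    """Group map output by hostname (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for url, text in documents:
        for host, vector in map_term_vector(url, text):
            grouped[host].append(vector)
    return dict(reduce_term_vector(h, vs) for h, vs in grouped.items())

docs = [
    ("http://a.example/p1", "hadoop yarn hadoop"),
    ("http://a.example/p2", "hadoop spark"),
    ("http://b.example/p1", "yarn yarn scheduler"),
]
result = run(docs)
print(result)
```

With the sample input, only terms seen at least twice on a host survive the reduce step.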


Contd…
Inverted Index: The map function parses each document,
and emits a sequence of (word; document ID) pairs. The
reduce function accepts all pairs for a given word, sorts
the corresponding document IDs and emits a (word;
list(document ID)) pair. The set of all output pairs forms a
simple inverted index. It is easy to augment this
computation to keep track of word positions.
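A minimal in-memory sketch of the inverted index follows. It assumes documents arrive as a dictionary of integer IDs to text and that words are split on whitespace; both are illustrative choices, as is emitting each word only once per document.

```python
from collections import defaultdict

def map_index(doc_id, text):
    """Map: emit (word, document ID) for each distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_index(word, doc_ids):
    """Reduce: sort the document IDs collected for one word."""
    return word, sorted(doc_ids)

def build_index(documents):
    """Group map output by word (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, d in map_index(doc_id, text):
            grouped[word].append(d)
    return dict(reduce_index(w, ids) for w, ids in grouped.items())

index = build_index({2: "map reduce", 1: "reduce sort", 3: "map map sort"})
print(index["map"], index["reduce"], index["sort"])
```

To track word positions, the map function could emit (word, (document ID, position)) instead, leaving the reduce step unchanged apart from what it sorts.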

Distributed Sort: The map function extracts the key from
each record, and emits a (key; record) pair. The reduce
function emits all pairs unchanged.


Applications of MapReduce
(1) Distributed Grep:

Input: large set of files
Output: lines that match pattern
Map – Emits a line if it matches the supplied pattern
Reduce – Copies the intermediate data to output
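As a rough illustration, distributed grep can be simulated in plain Python. The choice of the file name as the intermediate key, the function names, and the sample log lines are assumptions for this sketch; a real job would read input splits from HDFS.

```python
import re

def map_grep(filename, line, pattern):
    """Map: emit the line only if it matches the supplied pattern."""
    if pattern.search(line):
        yield filename, line

def reduce_identity(key, values):
    """Reduce: identity function, copies intermediate data to the output."""
    for value in values:
        yield key, value

def distributed_grep(files, regex):
    pattern = re.compile(regex)
    intermediate = []
    for filename, lines in files.items():
        for line in lines:
            intermediate.extend(map_grep(filename, line, pattern))
    output = []
    for key, value in intermediate:
        output.extend(reduce_identity(key, [value]))
    return output

files = {
    "a.log": ["ok start", "ERROR disk full"],
    "b.log": ["ERROR timeout", "ok done"],
}
matches = distributed_grep(files, r"ERROR")
print(matches)
```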


Applications of MapReduce
(2) Reverse Web-Link Graph:

Input: Web graph: tuples (a, b), where page a links to page b
Output: For each page, list of pages that link to it
Map – Processes the web graph and, for each input <source, target>, outputs <target, source>
Reduce – Emits <target, list(source)>
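The reversal can be sketched as follows, with the web graph given as a list of (source, target) tuples. The function names and sample edges are illustrative, and sorting the source list is an extra step added here only to make the output deterministic.

```python
from collections import defaultdict

def map_reverse(source, target):
    """Map: swap each <source, target> edge into <target, source>."""
    yield target, source

def reduce_reverse(target, sources):
    """Reduce: emit <target, list(source)>; sorted here for stable output."""
    return target, sorted(sources)

def reverse_links(edges):
    """Group map output by target (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for source, target in edges:
        for t, s in map_reverse(source, target):
            grouped[t].append(s)
    return dict(reduce_reverse(t, ss) for t, ss in grouped.items())

edges = [("a", "b"), ("c", "b"), ("a", "c")]
incoming = reverse_links(edges)
print(incoming)
```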


Applications of MapReduce
(3) Count of URL access frequency:

Input: Log of accessed URLs, e.g., from proxy server
Output: For each URL, % of total accesses for that URL
Map – Processes web log and outputs <URL, 1>
Multiple Reducers – Emit <URL, URL_count>
(So far, like Wordcount. But still need %)
Chain another MapReduce job after the one above:
Map – Processes <URL, URL_count> and outputs <1, (<URL, URL_count>)>
1 Reducer – Does two passes. In the first pass, sums up all URL_counts to calculate overall_count. In the second pass, calculates the %s.
Emits multiple <URL, URL_count/overall_count>
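The two chained jobs can be sketched in plain Python as below. The names `job1` and `job2` and the sample log are assumptions for illustration, and the shares are returned as fractions of 1 rather than percentages.

```python
from collections import defaultdict

def job1(log):
    """First job: map emits <URL, 1>, reducers sum the values per URL."""
    counts = defaultdict(int)
    for url in log:
        counts[url] += 1
    return list(counts.items())

def job2(url_counts):
    """Second job: map sends every pair to the single key 1, so one reducer
    sees all counts and can make two passes over them."""
    grouped = defaultdict(list)
    for url, count in url_counts:
        grouped[1].append((url, count))
    pairs = grouped[1]
    overall_count = sum(count for _, count in pairs)            # first pass
    return {url: count / overall_count for url, count in pairs}  # second pass

log = ["/home", "/about", "/home", "/home"]
shares = job2(job1(log))
print(shares)
```

Routing everything to one key is what forces a single reducer. That is acceptable here because the second job's input has one record per distinct URL, which is much smaller than the raw log.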
Applications of MapReduce
(4) Sort:
Note: Map task’s output is sorted (e.g., quicksort); Reduce task’s input is sorted (e.g., mergesort)

Input: Series of (key, value) pairs
Output: Sorted <value>s
Map – <key, value> → <value, _> (identity)
Reducer – <key, value> → <key, value> (identity)
Partitioning function – partition keys across reducers based on ranges (can’t use hashing!)
• Take data distribution into account to balance reducer tasks
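The reason hashing cannot be used is easiest to see in a small simulation. In the sketch below each reducer sorts its own partition and the outputs are concatenated in reducer order. The boundary values, the modulo "hash", and the sample data are assumptions chosen for the example.

```python
from bisect import bisect_right

def distributed_sort(values, partitioner, num_reducers):
    """Route each value to a reducer, sort within each reducer,
    then concatenate reducer outputs in reducer order."""
    partitions = [[] for _ in range(num_reducers)]
    for value in values:
        partitions[partitioner(value)].append(value)
    output = []
    for partition in partitions:
        output.extend(sorted(partition))
    return output

values = [42, 7, 19, 88, 3, 55]

# Range partitioning: reducer 0 gets values < 20, reducer 1 gets 20..49,
# reducer 2 gets values >= 50 (boundaries are assumed for this example).
boundaries = [20, 50]
by_range = distributed_sort(values, lambda v: bisect_right(boundaries, v), 3)

# Hash-style partitioning: each reducer's output is sorted, but the
# concatenation is not globally sorted.
by_hash = distributed_sort(values, lambda v: v % 3, 3)

print(by_range)
print(by_hash)
```

In practice the boundaries would be picked from a sample of the data so that each range holds a similar number of records, which is the load-balancing point made in the last bullet.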


The YARN Scheduler
• Used underneath Hadoop 2.x and later
• YARN = Yet Another Resource Negotiator
• Treats each server as a collection of containers
– Container = fixed CPU + fixed memory

• Has 3 main components

– Global Resource Manager (RM)
• Scheduling
– Per-server Node Manager (NM)
• Daemon and server-specific functions
– Per-application (job) Application Master (AM)
• Container negotiation with RM and NMs
• Detecting task failures of that job
YARN: How a job gets a container
[Figure: a Resource Manager running the Capacity Scheduler; 2 servers (Node A, Node B), each with its own Node Manager; 2 jobs (1, 2), each with its own Application Master (Application Master 1, Application Master 2); and a task of App2. Numbered steps shown in the figure:
1. Need container
2. Container completed
3. Container on Node B
4. Start task, please!]
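The four numbered steps can be walked through with a toy model. This is not the YARN API: the classes and methods below are hypothetical stand-ins, and which component sends each message is an assumption based on the roles listed for the RM, NM, and AM (the Application Master asks the Resource Manager for a container, a Node Manager reports a completed container, the Resource Manager grants a container on that node, and the Application Master asks that Node Manager to start the task).

```python
class ResourceManager:
    """Toy global scheduler: tracks free containers per node."""
    def __init__(self, free_containers):
        self.free = dict(free_containers)  # node name -> free container count

    def request_container(self):
        """Steps 1 and 3: grant a container on some node, or None if full."""
        for node, count in self.free.items():
            if count > 0:
                self.free[node] -= 1
                return node
        return None

    def container_completed(self, node):
        """Step 2: a Node Manager reports that a container has been freed."""
        self.free[node] += 1


class NodeManager:
    """Toy per-server daemon: runs tasks and reports completions."""
    def __init__(self, name):
        self.name = name
        self.running = []

    def start_task(self, task):
        self.running.append(task)

    def finish_task(self, task, rm):
        self.running.remove(task)
        rm.container_completed(self.name)


class ApplicationMaster:
    """Toy per-job master: negotiates containers and launches tasks."""
    def __init__(self, job, rm, node_managers):
        self.job = job
        self.rm = rm
        self.node_managers = node_managers

    def launch(self, task):
        node = self.rm.request_container()          # steps 1 and 3
        if node is None:
            return None                             # nothing free yet
        self.node_managers[node].start_task(task)   # step 4
        return node


# Both nodes start full; Node B is running a task that belongs to job 2.
rm = ResourceManager({"A": 0, "B": 0})
nms = {"A": NodeManager("A"), "B": NodeManager("B")}
nms["B"].start_task("app2-task")

am1 = ApplicationMaster("job1", rm, nms)
first_try = am1.launch("job1-task")       # no container is free yet
nms["B"].finish_task("app2-task", rm)     # step 2 frees a container on B
second_try = am1.launch("job1-task")      # granted on Node B, task started
print(first_try, second_try, nms["B"].running)
```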
