Lecture 5 MapReduce Working

Uploaded by

BHAWANI KUMARI

Processing Big Data With Hadoop MapReduce Technology

By
Dr. Aditya Bhardwaj


Big Data Analytics and Business Intelligence (CSET/CMCA-580)


Learning Objectives

• What is MapReduce
• MapReduce Working
• Job Tracker Role
• Task Tracker Role
• Architecture
Functional Architecture of Hadoop
• The core components of Hadoop are HDFS and MapReduce.
Working of MapReduce
The basic unit of information used by MapReduce is a key-value pair.
• The Map task takes a set of data and converts it into
another set of data, where individual elements are
broken down into tuples (key-value pairs).

• The Reduce task takes the output from the Map task as its
input and aggregates those intermediate key-value
pairs to produce the final output.
Example: Demonstrating How MapReduce Works
• For example, consider a MapReduce job that counts the number of times each
word is used across a set of documents.

• Note: The framework sorts all intermediate key-value pairs by key, not by value.
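The word count job described above can be sketched in plain Python. This is only an in-memory illustration of the map, sort-by-key, and reduce steps, not Hadoop code; the function names (`map_words`, `reduce_counts`, `word_count`) and the sample documents are assumptions made for this example.

```python
from collections import defaultdict

def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)

def word_count(documents):
    # Map: every document yields intermediate (word, 1) pairs.
    intermediate = []
    for doc in documents:
        intermediate.extend(map_words(doc))
    # Group the values by key, then visit the keys in sorted order,
    # mirroring the framework's sort by key (not by value).
    grouped = defaultdict(list)
    for word, one in intermediate:
        grouped[word].append(one)
    return [reduce_counts(word, grouped[word]) for word in sorted(grouped)]

# Hypothetical sample input: three small "documents".
result = word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"])
print(result)  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

Note that the output is ordered by word, even though "car" has the highest count, because sorting is done on the key.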
How Does MapReduce Work? High-Level Architecture

• The Shuffle stage and the Reduce step are together referred to as
the Reduce stage.
• Shuffling: The second phase of MapReduce, which sorts, groups,
and shuffles the output coming from the Mapper function.
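As a rough illustration of what shuffling does, the sketch below routes each mapper output pair to a reducer, then sorts and groups each reducer's input by key. It is a simplified, single-process model; the helper names (`partition`, `shuffle`) and the use of `zlib.crc32` to pick a reducer are assumptions for this example, not a description of Hadoop's actual partitioner.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def partition(key, num_reducers):
    """Pick a reducer for a key; crc32 keeps the choice stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapper_output, num_reducers):
    """Route pairs to reducers, then sort and group each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in mapper_output:
        buckets[partition(key, num_reducers)].append((key, value))
    shuffled = []
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))  # sort by key only, never by value
        grouped = [
            (key, [value for _, value in group])
            for key, group in groupby(bucket, key=itemgetter(0))
        ]
        shuffled.append(grouped)
    return shuffled

# Hypothetical mapper output from a word count job.
pairs = [("river", 1), ("car", 1), ("deer", 1), ("car", 1), ("river", 1)]
reducer_inputs = shuffle(pairs, num_reducers=2)
```

Each entry of `reducer_inputs` is the input one reducer would see: a list of `(key, [values])` entries in key order. All values for a given key end up at the same reducer, which is what lets the Reduce step aggregate them.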
MapReduce Word Count: Real-World Applications
Application 1. Break down movie ratings by rating score

Application 2. Log analysis from a web server
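Both applications follow the word count pattern: map each record to a (key, 1) pair and sum per key. The sketch below assumes two made-up input formats, a `user_id,movie_id,rating` line for ratings and a `METHOD PATH STATUS` line for the web server log; real datasets and log formats will differ, and the function names are hypothetical.

```python
from collections import Counter

def map_rating(line):
    """Emit (rating, 1) from a 'user_id,movie_id,rating' line (assumed format)."""
    _user, _movie, rating = line.split(",")
    return rating, 1

def map_status(line):
    """Emit (status_code, 1) from a 'METHOD PATH STATUS' log line (assumed format)."""
    return line.split()[-1], 1

def count_by_key(lines, mapper):
    """Word-count style job: map each line to (key, 1) and sum per key."""
    totals = Counter()
    for line in lines:
        key, one = mapper(line)
        totals[key] += one
    return dict(sorted(totals.items()))

# Hypothetical sample records.
ratings = ["1,10,5", "2,10,3", "3,11,5", "4,12,1"]
logs = ["GET /index.html 200", "GET /missing 404", "POST /login 200"]

rating_counts = count_by_key(ratings, map_rating)   # {'1': 1, '3': 1, '5': 2}
status_counts = count_by_key(logs, map_status)      # {'200': 2, '404': 1}
```

Only the mapper changes between the two jobs; the counting logic is shared.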
Job Tracker and Task Tracker in the Hadoop MapReduce Architecture

• There are two types of nodes for job execution:
• One master: JobTracker
• Multiple slaves: TaskTracker
What is JobTracker?

• JobTracker is a service that can run on the NameNode
(master node) and allocates jobs to the TaskTrackers.

• It tracks resource availability and manages the task life cycle,
including progress tracking and fault tolerance.
Functions of JobTracker

• JobTracker receives requests for MapReduce execution from the
client.

• JobTracker talks to the NameNode to determine the location of the
data.

• JobTracker finds the best TaskTracker nodes to execute tasks based on
data locality (proximity of the data) and the available slots to
execute a task on a given node.

• JobTracker monitors the individual TaskTrackers and submits
the overall status of the job back to the client.
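The way a JobTracker might weigh data locality against available slots can be sketched as follows. This is a deliberately simplified model with made-up node names and a hypothetical `pick_task_tracker` function; an actual scheduler takes more into account (for example, rack locality).

```python
def pick_task_tracker(block_locations, free_slots):
    """Choose a TaskTracker node for one map task.

    block_locations: nodes holding a copy of the task's input data
                     (the information the NameNode would provide).
    free_slots: dict mapping node name -> number of free task slots.
    Returns a node name, or None if no node has a free slot.
    """
    # Prefer a node that stores the data and has a free slot (data-local).
    local = [n for n in block_locations if free_slots.get(n, 0) > 0]
    if local:
        return max(local, key=lambda n: free_slots[n])
    # Otherwise fall back to any node with a free slot.
    available = [n for n, slots in free_slots.items() if slots > 0]
    if available:
        return max(available, key=lambda n: free_slots[n])
    return None  # no capacity right now; the task has to wait

# Hypothetical cluster state.
free = {"node1": 0, "node2": 2, "node3": 4}
print(pick_task_tracker(["node1", "node2"], free))  # node2 (has the data and a slot)
print(pick_task_tracker(["node1"], free))           # node3 (no local slot is free)
```

In the first call, `node3` has more free slots, but `node2` is chosen because it already holds the data, so the computation moves to the data rather than the other way round.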
Functions of TaskTracker

• TaskTracker runs on a DataNode.

• Map and Reduce functions are executed on DataNodes by TaskTrackers.

• TaskTracker runs the tasks and reports their status to the JobTracker:
it follows the JobTracker's orders and periodically updates it with
its progress status.
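The reporting relationship between the two trackers can be illustrated with a small sketch. The classes and the `heartbeat` method below are hypothetical stand-ins written for this example, and progress is reported after every work item rather than on a timer, which is a simplification.

```python
class JobTracker:
    """Keeps the latest status reported for each task (illustrative only)."""

    def __init__(self):
        self.task_status = {}

    def heartbeat(self, tracker_name, task_id, progress):
        # Record which tracker runs the task and how far along it is.
        self.task_status[task_id] = (tracker_name, progress)

    def job_complete(self):
        statuses = self.task_status.values()
        return bool(statuses) and all(progress == 1.0 for _, progress in statuses)


class TaskTracker:
    """Runs assigned tasks and reports progress back to the JobTracker."""

    def __init__(self, name, job_tracker):
        self.name = name
        self.job_tracker = job_tracker

    def run_task(self, task_id, work_items, func):
        results = []
        for done, item in enumerate(work_items, start=1):
            results.append(func(item))
            # Simplification: report after each item instead of periodically.
            self.job_tracker.heartbeat(self.name, task_id, done / len(work_items))
        return results


jt = JobTracker()
tt = TaskTracker("tracker-1", jt)
lengths = tt.run_task("map-0", ["deer", "bear", "river"], len)
print(lengths)            # [4, 4, 5]
print(jt.job_complete())  # True
```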
Features of MapReduce
1. Simplicity – MapReduce jobs are easy to run. Applications
can be written in languages such as Java, C++, and Python.
2. Scalability – The MapReduce framework is built in such a way
that it can accommodate more machines as and when required.
3. Synchronization – Execution of several concurrent processes
requires synchronization. The MapReduce framework tracks all
the tasks along with their timings and starts the
reduction process after the mapping phase completes.

4. Fault Tolerance – MapReduce takes care of failures. If one
copy of the data is unavailable, another machine has a copy of
the same key-value pair, which can be used for solving the same
subtask.
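The synchronization and fault tolerance features can be sketched together in a few lines. In this illustration, threads stand in for machines, a retry stands in for rescheduling a failed task elsewhere, and the failure itself is simulated; `run_with_retry`, `run_job`, and `flaky_count` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_retry(task, split, max_attempts=3):
    """Re-run a failed task, standing in for rescheduling on another machine."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(split)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt

def run_job(splits, map_task, reduce_task):
    # Map phase: the map tasks run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_with_retry, map_task, s) for s in splits]
        # Barrier: result() blocks, so every map task finishes before reduce starts.
        map_outputs = [f.result() for f in futures]
    return reduce_task(map_outputs)

failed_once = set()

def flaky_count(split):
    """Count words, failing the first time each split is seen (simulated fault)."""
    if split not in failed_once:
        failed_once.add(split)
        raise RuntimeError("simulated task failure")
    return len(split.split())

total = run_job(["a b c", "d e", "f"], flaky_count, sum)
print(total)  # 6
```

Every map task fails once here, yet the job still produces the correct total, and the reduce step (`sum`) only runs after all map outputs are available.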
Conclusion
• The functioning of MapReduce, as we have just seen, is
a sequential flow.

• MapReduce shuffling and sorting occur simultaneously to
summarize the Mapper's intermediate output.
Thanks Note

tungal/presentations/ad2012