3 Mapreduce Notes

The document discusses MapReduce and Hadoop, describing the architecture and features of MapReduce including parallelism, moving computation to data, and fault tolerance. It also discusses MapReduce programming and reduce-side joins.

Uploaded by Sandeep Boyina

Scalable Data Processing with MapReduce
Google’s Case Study
What is MapReduce?
MapReduce Programming Style
Hadoop MapReduce App Architecture
MapReduce Architecture
• Client computer (in our case: laptop): submits the MapReduce job
• Master node: runs the JobTracker, which receives the submitted job
• Slave nodes (three shown): each runs a TaskTracker, which runs a task instance


JobTracker
TaskTracker & TaskInstance
Job Execution in Hadoop
MapReduce Features
• Provides a simple programming model

• Parallelism

• Moving computation to data

• Fault-tolerance

• Provides job status and monitoring capabilities


MR Programming in Java/Python
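As an illustration of the programming style, here is a minimal Python sketch of a word count job. It runs in memory and does not use the Hadoop API: the names map_fn, reduce_fn, and run_job are hypothetical, and the grouping step inside run_job only stands in for the shuffle that the execution framework would perform.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Hypothetical driver: simulates map, shuffle (group by key), and reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # stand-in for the framework's shuffle
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


counts = run_job(["the quick fox", "the lazy dog"])
print(counts)
```

The programmer writes only map_fn and reduce_fn; on a real cluster, the parallelism, data movement, and fault tolerance listed under MapReduce Features would be handled by the framework, not by a loop like run_job.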
Reduce-side Join
• Basic idea: group by join key
– Map over both sets of tuples
– Emit tuple as value with join key as the intermediate key
– Execution framework brings together tuples sharing the same key
– Perform actual join in reducer
• Two variants
– 1-to-1 joins
– 1-to-many and many-to-many joins
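
The reduce-side join steps above can be sketched in Python as follows. This is an in-memory simulation under stated assumptions, not Hadoop code: the function names, the "L"/"R" source tags, and the sample users and orders data are all hypothetical, and shuffle only imitates the grouping that the execution framework performs.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_index):
    """Map: emit (join_key, (tag, tuple)); the tag records the source dataset."""
    for record in records:
        yield record[key_index], (tag, record)


def shuffle(pairs):
    """Stand-in for the framework: bring together values sharing the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: perform the actual join for one key (cross product of both sides)."""
    left = [record for tag, record in values if tag == "L"]
    right = [record for tag, record in values if tag == "R"]
    for left_record, right_record in product(left, right):
        yield key, left_record, right_record


def reduce_side_join(left_records, right_records, left_key=0, right_key=0):
    """Map over both sets of tuples, group by join key, then join in the reducer."""
    pairs = list(map_phase(left_records, "L", left_key))
    pairs += list(map_phase(right_records, "R", right_key))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined


# Hypothetical sample data: a 1-to-many join on the first field.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = reduce_side_join(users, orders)
print(result)
```

The cross product in reduce_phase covers the 1-to-1, 1-to-many, and many-to-many cases in one code path, at the cost of buffering all tuples for a key in memory. Keys that appear in only one dataset produce no output, which makes this sketch an inner join.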
