2 1-MapReduce
Major Components
• User Components:
– Mapper
– Reducer
– Combiner (Optional)
– Partitioner (Optional; used during the shuffle)
– Writable(s) (Optional)
• System Components:
– Master
– Input Splitter*
– Output Committer*
* You can use your own if you really want!
[Figure: the Input is divided into Split 0, Split 1, and Split 2, which feed Mapper 0, Mapper 1, and Mapper 2; mapper output goes to Reducer 0 and Reducer 1, which write Out 0 and Out 1.]
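The flow through these components (splits to mappers, a shuffle that routes keys to reducers, one output per reducer) can be sketched in plain Python. This is a toy single-process simulation, not Hadoop code: the names `mapper`, `partition`, `reducer`, and `run_job` are hypothetical, and the byte-sum hash is only an assumed stand-in for a real partitioner.

```python
from collections import defaultdict

def mapper(split):
    # Emit a (word, 1) pair for every word in this split.
    for word in split.split():
        yield word.lower(), 1

def partition(key, num_reducers):
    # Pick a reducer for the key. A simple stable hash is assumed here.
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    # Combine all values seen for one key.
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # Shuffle: group mapper output by key inside each reducer's partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # one map task per split
        for key, value in mapper(split):
            partitions[partition(key, num_reducers)][key].append(value)
    # Each reducer writes one output, processing its keys in sorted order.
    return [
        [reducer(key, part[key]) for key in sorted(part)]
        for part in partitions
    ]

splits = ["the store", "the teacher", "store opens"]
outputs = run_job(splits)
```

Every occurrence of a given key lands in the same partition, so each word is counted by exactly one reducer.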
Input Splitter
• Is responsible for splitting your input into multiple
chunks
• These chunks are then used as input for your mappers
• Splits on logical boundaries. The default chunk size is 64MB
– Depending on what you’re doing, 64MB might be a LOT of data! The chunk size is configurable
• Typically, you can just use one of the built-in splitters, unless you are reading a specially formatted file
• Each map task corresponds to a single input split.
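As a rough illustration of how chunk size determines the number of map tasks, the sketch below cuts a file of a given size into fixed-size byte ranges. It is a simplification under stated assumptions: `compute_splits` is a hypothetical helper, and it ignores the logical (record) boundaries that a real splitter respects.

```python
def compute_splits(file_size, split_size=64 * 1024 * 1024):
    """Return (offset, length) pairs that cover file_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

MB = 1024 * 1024
splits = compute_splits(200 * MB)                 # default 64MB chunks
small_splits = compute_splits(200 * MB, 16 * MB)  # smaller chunks, more map tasks
```

With one map task per split, shrinking the chunk size is a direct way to get more parallelism from the same input.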
Record Reader
• The RecordReader generates records (key-value pairs) from a split and passes them to the map function.
• The RecordReader is invoked repeatedly on the input until the entire split is consumed.
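A minimal sketch of a line-oriented record reader follows, assuming the key is the byte offset of a line and the value is its text. The function name `line_record_reader` and the boundary rule (a line belongs to the split that contains its first byte) are assumptions made for illustration, not a description of any particular built-in reader.

```python
def line_record_reader(data, start, length):
    """Yield (byte offset, line text) records for the split data[start:start+length]."""
    end = start + length
    pos = start
    if start > 0:
        # A line that straddles the split start belongs to the previous split,
        # so skip ahead to the first line that begins inside this split.
        newline = data.find(b"\n", start - 1)
        pos = len(data) if newline == -1 else newline + 1
    while pos < end:
        # Read one full line, even if it runs past the end of the split.
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1

data = b"the store\nthe teacher\nstore opens\n"
first = list(line_record_reader(data, 0, 17))    # split 0: bytes 0-16
second = list(line_record_reader(data, 17, 17))  # split 1: bytes 17-33
```

The second line straddles the split boundary at byte 17, yet it is emitted exactly once, by the reader for the first split.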
Mapper
• Reads in an input pair <K,V> (from a section as split by the input splitter)
• Outputs a pair <K’, V’>
• Ex. For our Word Count example, with the following input:
“The teacher went to the store. The store was closed; the
store opens in the morning. The store opens at 9am.”