
MapReduce Tutorial

Hadoop Streaming is a utility that allows users to create and run jobs with any
executables (e.g. shell utilities) as the mapper and/or the reducer.
Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications
(non JNI™ based).

4 Inputs and Outputs


The MapReduce framework operates exclusively on <key, value> pairs, that is, the
framework views the input to the job as a set of <key, value> pairs and produces a set of
<key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to
implement the Writable interface. Additionally, the key classes have to implement the
WritableComparable interface to facilitate sorting by the framework.
Input and Output types of a MapReduce job:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
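
The built-in types used later in WordCount (for example Text and IntWritable) already implement these interfaces. A custom key class would have to implement both itself; the WordPair class below is a minimal, hypothetical sketch of what that might look like, not part of the tutorial's example.

package org.myorg;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Hypothetical custom key type: a pair of words.
public class WordPair implements WritableComparable<WordPair> {
  private String first = "";
  private String second = "";

  // Writable: serialize the fields so the framework can move the key
  // between map and reduce tasks.
  public void write(DataOutput out) throws IOException {
    out.writeUTF(first);
    out.writeUTF(second);
  }

  // Writable: deserialize the fields in the same order they were written.
  public void readFields(DataInput in) throws IOException {
    first = in.readUTF();
    second = in.readUTF();
  }

  // WritableComparable: define the sort order the framework uses when it
  // sorts keys before the reduce phase.
  public int compareTo(WordPair other) {
    int cmp = first.compareTo(other.first);
    return cmp != 0 ? cmp : second.compareTo(other.second);
  }

  // A consistent hashCode helps the default HashPartitioner spread keys
  // evenly across reducers.
  @Override
  public int hashCode() {
    return first.hashCode() * 163 + second.hashCode();
  }
}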

5 Example: WordCount v1.0


Before we jump into the details, let's walk through an example MapReduce application to get
a flavour of how they work.
WordCount is a simple application that counts the number of occurrences of each word in a
given input set.
This works with a local-standalone, pseudo-distributed, or fully-distributed Hadoop
installation (Single Node Setup).

5.1 Source Code

WordCount.java

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;