
STREAMING

Hadoop Streaming is a utility in the Apache Hadoop framework that lets users write and run MapReduce jobs in languages other than Java, for processing large data sets in distributed environments. Hadoop initially supported only Java for MapReduce jobs; with Streaming, scripting languages such as Python and Ruby can be used as well. Data is passed to and from the mapper and reducer scripts through standard input and output streams, and the scripts handle it as key-value pairs.

Uploaded by dhanushraja32
Copyright: © All Rights Reserved

STREAMING & PIPES

Hadoop Streaming is a utility module in the Apache Hadoop framework that allows users to create and run jobs in various programming languages, not just Java. It is instrumental in processing vast amounts of data across distributed computing environments.

HISTORY
Apache introduced Hadoop Streaming as part of its Hadoop project to tackle the challenges of Big Data. In its initial stages, Hadoop only supported MapReduce jobs written in Java. With the advent of Hadoop Streaming, mappers and reducers could also be written in scripting languages such as Python and Ruby.

FUNCTIONALITY AND FEATURES

Hadoop Streaming works by passing data to and from the map and reduce tasks via standard input/output streams: each script reads its input line by line from stdin and writes its results to stdout. It uses mapper scripts to convert input data into a set of intermediate key-value pairs, and reducer scripts to merge the values that share a key into a smaller set of results.
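The mapper and reducer roles described above can be sketched in Python with a word count. This is a minimal illustration, not code from the slides: the function names map_lines, reduce_lines and run_locally are hypothetical, and the tab-separated "key<TAB>value" line format is assumed to be the separator in use. The sort step stands in for the grouping by key that Hadoop performs between the two phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Assumes the lines arrive sorted by key, which is what the framework
    is expected to provide between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (for testing only)."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In an actual Streaming job the two functions would live in separate scripts, each iterating over sys.stdin and printing its output lines; run_locally exists only so the logic can be checked without a cluster.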
For example, the following job uses /bin/cat as the mapper and /bin/wc as the reducer:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
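When a job like this is launched from a script rather than typed by hand, the same command line can be assembled as an argument list. The sketch below uses only the options shown in the command above; the helper names build_streaming_command and run_streaming_job are hypothetical, and the jar path simply mirrors the slide (its real location depends on the Hadoop installation).

```python
import subprocess


def build_streaming_command(hadoop_home, input_dir, output_dir, mapper, reducer):
    """Return the argv list for a Streaming job (hypothetical helper).

    The jar path follows the slide's example; adjust it to wherever the
    streaming jar actually lives in your installation.
    """
    return [
        f"{hadoop_home}/bin/hadoop",
        "jar",
        f"{hadoop_home}/hadoop-streaming.jar",
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def run_streaming_job(hadoop_home, input_dir, output_dir, mapper, reducer):
    """Launch the job and raise CalledProcessError if it exits non-zero."""
    cmd = build_streaming_command(hadoop_home, input_dir, output_dir, mapper, reducer)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when directory names or script paths contain spaces.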
