Hadoop Mapreduce Python Script
____________MAPPER_____________
1> make a file named mapper.py and paste the python code below for the mapper into it
$ nano mapper.py
#!/usr/bin/env python
import sys
# read lines from standard input and emit "word<TAB>1" for every word
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, 1)
----understanding above code----
#[ for line in sys.stdin: ] says that the input comes from standard input (STDIN); standard input (stdin) is the source of the input data for the python script.
#[ print '%s\t%s' % (word, 1) ] writes the result to standard output (stdout). This output will be the input for the reducer.
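As a quick local sanity check (not part of the Hadoop job itself), the mapper can be tested by piping some text into it; the sample sentence here is only an illustration:
$ echo "foo foo quux labs foo bar quux" | python mapper.py
foo	1
foo	1
quux	1
labs	1
foo	1
bar	1
quux	1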
3> make a file named reducer.py and paste the python code below for the reducer into it
$ nano reducer.py
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# read the sorted "word<TAB>count" pairs produced by the mapper
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # ignore the line if count was not a number
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# print the count for the last word
if current_word == word:
    print '%s\t%s' % (current_word, current_count)
----understanding above code----
#The code in reducer.py reads the results of mapper.py from standard input, so the output format of mapper.py and the input format expected by reducer.py must match.
#[ try:
    count = int(count)
except ValueError: ] converts count, which is currently a string, to an int, because count is supposed to be a number, i.e. an int.
#The [ continue ] statement inside the except block skips the line if count was not a number, i.e. not an int.
#[ if current_word == word:
    current_count += count
else:
    if current_word: ] this comparison works because hadoop sorts the map output by key (the word) before it is passed to the reducer, so all counts for the same word arrive one after another.
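As with the mapper, the whole pipeline can be sanity-checked locally before running it on Hadoop; here sort stands in for the shuffle/sort phase that Hadoop performs between map and reduce (the sample sentence is only an illustration):
$ echo "foo foo quux labs foo bar quux" | python mapper.py | sort -k1,1 | python reducer.py
bar	1
foo	3
labs	1
quux	2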
5> first copy the files that have to be processed from our local file system to Hadoop's HDFS
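A minimal sketch of this step, assuming a local text file named pg20417.txt and an HDFS directory /user/hduser/input (both names are placeholders for your own file and path):
$ hdfs dfs -mkdir -p /user/hduser/input
$ hdfs dfs -put pg20417.txt /user/hduser/input
$ hdfs dfs -ls /user/hduser/input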
6> run the hadoop streaming jar file, which allows python code to run on hadoop, followed by the mapper, reducer, input and output arguments
Here -file takes a file/dir to be shipped in the job jar file. -input takes the DFS input path for the map step. -mapper takes the streaming command to run the map step. -reducer takes the streaming command to run the reduce step.
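A sketch of the full command, assuming the input and output paths from the previous step and that the streaming jar sits under $HADOOP_HOME/share/hadoop/tools/lib (the exact jar name varies with the Hadoop version); the scripts are made executable first so the streaming job can run them:
$ chmod +x mapper.py reducer.py
$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -file mapper.py -mapper mapper.py \
    -file reducer.py -reducer reducer.py \
    -input /user/hduser/input \
    -output /user/hduser/output
$ hdfs dfs -cat /user/hduser/output/part-00000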