Hadoop Streaming

This document discusses Hadoop streaming using Python. It provides steps to install Python, write mapper and reducer code for a word count problem, and run the program locally and on HDFS. The mapper code takes input and emits each word and count. The reducer code sums the counts for each word. Common errors like bad interpreters are addressed. References for more information on Hadoop streaming and installing Python are also provided.

Uploaded by

Bigg Boss

HADOOP STREAMING

1. Install Python
# apt update
# apt install python-is-python3
# whereis python3
Or, to install a specific version (here Python 3.11) from the deadsnakes PPA:
# apt update && apt upgrade -y
# apt install software-properties-common -y
# add-apt-repository ppa:deadsnakes/ppa -y
# add-apt-repository ppa:deadsnakes/nightly -y
# apt update
# apt install python3.11
# python3.11 --version
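The scripts in the next section start with the shebang #!/usr/bin/python3, so it helps to confirm which interpreter is actually on the PATH. The following is a minimal sketch of that check in Python itself; the function name describe_interpreter is hypothetical and not part of the original handout.

```python
import shutil
import sys


def describe_interpreter():
    """Return the running Python version and the python3 path found on PATH."""
    version = "%d.%d.%d" % sys.version_info[:3]
    # shutil.which returns None when python3 is not on PATH
    path = shutil.which("python3")
    return version, path


if __name__ == "__main__":
    version, path = describe_interpreter()
    print("running Python", version)
    print("python3 on PATH:", path)
```

If the reported path differs from /usr/bin/python3, the shebang line of mapper.py and reducer.py would need to point at the reported path instead.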
2. WordCount Example in Python
Mapper Phase Code
Create the file mapper.py and make it executable: chmod +x mapper.py
#!/usr/bin/python3
"""mapper.py"""

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))
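To check the mapper logic without piping data through STDIN, the same steps can be wrapped in a generator function. This is only a sketch: map_words is a hypothetical name, and the sample lines are made up for illustration.

```python
def map_words(lines):
    """Yield a (word, 1) pair for every whitespace-separated word in lines."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


if __name__ == "__main__":
    sample = ["foo foo quux", "labs foo bar quux"]
    for word, count in map_words(sample):
        # same tab-delimited format that mapper.py writes to STDOUT
        print("%s\t%s" % (word, count))
```

In a streaming script, passing sys.stdin as the lines argument would give the same behaviour as mapper.py.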
Reducer Phase Code
Create the file reducer.py and make it executable: chmod +x reducer.py

Compiled by: Lê Thị Minh Châu

#!/usr/bin/python3
"""reducer.py"""

import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)

    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word, if there was one
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))
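Because the reducer receives its input sorted by key, the manual current_word bookkeeping can also be expressed with itertools.groupby. The following is an alternative sketch, not the handout's method; parse_pairs and reduce_counts are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter


def parse_pairs(lines):
    """Yield (word, count) from tab-delimited lines, skipping malformed ones."""
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab separator on this line
        try:
            yield word, int(count)
        except ValueError:
            continue  # count was not a number


def reduce_counts(lines):
    """Sum the counts of consecutive lines that share the same word."""
    for word, group in groupby(parse_pairs(lines), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    sorted_mapper_output = ["bar\t1", "foo\t1", "foo\t1", "quux\t1"]
    for word, total in reduce_counts(sorted_mapper_output):
        print("%s\t%s" % (word, total))
```

Like reducer.py, this only gives correct totals when equal keys arrive next to each other, which is what the sort step guarantees.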
3. Run the WordCount Program Locally
$ echo "foo foo quux labs foo bar quux" | /home/hadoopminhchau/mapper.py

$ echo "foo foo quux labs foo bar quux" | /home/hadoopminhchau/mapper.py | sort -k1,1 | /home/hadoopminhchau/reducer.py

Create a file data.txt containing the input data

$ cat ./data.txt | ./mapper.py

$ cat ./data.txt | ./mapper.py | sort -k1,1 | ./reducer.py
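The shell pipeline above imitates what Hadoop does: map, sort by key, then reduce. The same three stages can be sketched in a single Python function to make the role of the sort step visible. The name word_count is hypothetical, and Python's string ordering may differ from the locale-dependent ordering of the sort command.

```python
def word_count(lines):
    """Simulate map -> sort -> reduce for the word count problem in memory."""
    # Map: emit (word, 1) for every word
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle/sort: bring equal keys next to each other, like sort -k1,1
    pairs.sort(key=lambda pair: pair[0])
    # Reduce: sum the counts of adjacent pairs with the same word
    results = []
    for word, count in pairs:
        if results and results[-1][0] == word:
            results[-1] = (word, results[-1][1] + count)
        else:
            results.append((word, count))
    return results


if __name__ == "__main__":
    for word, total in word_count(["foo foo quux labs foo bar quux"]):
        print("%s\t%s" % (word, total))
```

Without the sort step the reduce loop would see "foo" in several separate runs and report it more than once, which is why the local test pipes through sort before reducer.py.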

4. Run the WordCount Program on HDFS
Create a directory myinput containing the input data

Copy the myinput directory into HDFS

Run the MapReduce job

$ hadoop jar hadoop-streaming-3.3.4.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input ./myinput -output ./myoutput

Display the result
$ hdfs dfs -cat ./myoutput/part-00000
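The output file holds one tab-delimited word and count per line, in key order. As a small sketch of post-processing, the function below picks the most frequent words from such lines. The name top_words and the sample data are hypothetical; the format is assumed to be exactly what reducer.py prints.

```python
def top_words(lines, n=3):
    """Return the n most frequent (word, count) pairs from tab-delimited lines."""
    counts = []
    for line in lines:
        word, sep, count = line.rstrip("\n").partition("\t")
        if sep:
            counts.append((word, int(count)))
    # highest count first; ties broken alphabetically
    return sorted(counts, key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    sample_output = ["bar\t1", "foo\t3", "labs\t1", "quux\t2"]
    for word, count in top_words(sample_output, n=2):
        print("%s\t%s" % (word, count))
```

Passing sys.stdin instead of the sample list would let the function read the piped output of the hdfs dfs -cat command.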

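5. Fixing Common Errors

If you get the error /usr/bin/env: ‘python\r’: No such file or directory, the script has Windows (CRLF) line endings. Install dos2unix and convert the scripts:

$ sudo apt install dos2unix
$ dos2unix mapper.py reducer.py
If you get the error /usr/bin/python^M: bad interpreter, the cause is the same. Open the file in vim, switch it to Unix line endings, and save:
$ vim mapper.py
:set ff=unix
:wq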
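Both errors come from carriage returns (\r) left at the end of each line, typically when a script was edited on Windows. If neither dos2unix nor vim is at hand, the same conversion can be sketched in Python; to_unix_newlines and convert_file are hypothetical helper names.

```python
def to_unix_newlines(data):
    """Replace Windows CRLF line endings with Unix LF in a bytes object."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at path in place with Unix line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_newlines(data))


if __name__ == "__main__":
    broken = b"#!/usr/bin/python3\r\nimport sys\r\n"
    print(to_unix_newlines(broken))
```

Calling convert_file("mapper.py") and convert_file("reducer.py") would rewrite both scripts; the files are opened in binary mode so Python does not translate the line endings itself.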

6. References

[1] https://www.tutorialspoint.com/hadoop/hadoop_streaming.htm
[2] https://www.tutsmake.com/how-to-install-python-3-10-on-ubuntu-22-04/
[3] https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
