Hadoop Streaming
1. Install Python
# apt update
# apt install python-is-python3
# whereis python3
Or, to install Python 3.11 from the deadsnakes PPA:
# apt update && apt upgrade -y
# apt install software-properties-common -y
# add-apt-repository ppa:deadsnakes/ppa -y
# add-apt-repository ppa:deadsnakes/nightly -y
# apt update
# apt install python3.11
# python3.11 --version
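After either installation route, it can help to confirm which interpreter the shell actually resolves, since the mapper script in the next section is launched through a `#!/usr/bin/python3` shebang. The following is a minimal sketch using only the standard library; the function name `check_python` and the `min_version` parameter are illustrative assumptions, not part of Hadoop or of the original steps.

```python
import shutil
import sys


def check_python(min_version=(3, 0)):
    """Describe the running interpreter and the python3 found on PATH.

    `check_python` and `min_version` are hypothetical names used only
    for this illustration.
    """
    return {
        # Absolute path of the interpreter executing this script
        "running": sys.executable,
        # (major, minor, micro) of the running interpreter
        "version": tuple(sys.version_info[:3]),
        # Path the shell would resolve for `python3`, or None if absent
        "python3_on_path": shutil.which("python3"),
        # True when the running interpreter is at least `min_version`
        "meets_minimum": sys.version_info[:2] >= min_version,
    }


if __name__ == "__main__":
    for key, value in check_python().items():
        print(f"{key}: {value}")
```

If `python3_on_path` is not `/usr/bin/python3`, the shebang line of the streaming scripts may need to be adjusted to match.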
2. WordCount Example Using Python
Mapper Phase Code
Create the file mapper.py and make it executable with chmod +x mapper.py
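#!/usr/bin/python3
"""mapper.py"""
import sys

# Hadoop Streaming feeds the input split to the mapper on standard input
for line in sys.stdin:
    # Split the line into whitespace-separated words
    for word in line.split():
        # Emit "word<TAB>1"; the reducer sums these counts per word
        print(f"{word}\t1")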
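To see how the mapper phase fits into the whole WordCount job, the flow can be simulated locally without a cluster. The sketch below is an illustration under stated assumptions, not the Hadoop implementation: it assumes the mapper emits one (word, 1) pair per word, uses Python's `sorted` as a stand-in for Hadoop's shuffle-and-sort step, and the function names `map_words`, `reduce_counts`, and `word_count` are hypothetical.

```python
from itertools import groupby


def map_words(lines):
    """Mapper step: emit one (word, 1) pair per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer step: sum the counts of consecutive pairs sharing a word.

    Relies on the input being sorted by word, which is what the
    shuffle-and-sort step guarantees to a streaming reducer.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, then sort (standing in for the shuffle), then reduce."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    sample = ["foo bar foo", "bar baz"]
    for word, total in word_count(sample):
        # Same tab-separated layout that streaming scripts write to stdout
        print(f"{word}\t{total}")
```

The reducer only compares each key with its neighbours, so it works because the pairs arrive sorted by word; feeding it unsorted pairs would produce several partial counts for the same word.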