Hadoop - Session 7 Python
Python
www.Bigdatainpractice.com
Introduction to Python
Python Data Types
Working with Files
Conditions, Loops etc
Data Structures
Python Streaming on Hadoop
12/19/2015
Python Features
1. Open-source, general-purpose programming language
2. Developed by Guido van Rossum (first released in 1991)
3. Guido wanted to bridge the gap between C and shell scripting
4. Rapid development (Python is an interpreted language, so there is no separate compilation step)
With open
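The slide body for this topic did not survive, so here is a minimal sketch of the `with open` idiom. The helper names `write_lines` and `read_lines` and the demo file name are hypothetical, chosen only for illustration. The point is that the `with` block closes the file automatically, even if an error occurs inside it.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in `lines` to `path`, one per line."""
    # The file is closed automatically when the with-block ends.
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_lines(path):
    """Return the lines of `path` with trailing newlines removed."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    # Hypothetical demo file in the system temp directory.
    demo_path = os.path.join(tempfile.gettempdir(), "with_open_demo.txt")
    write_lines(demo_path, ["first", "second"])
    print(read_lines(demo_path))
```

Without `with`, the same code would need an explicit `handle.close()`, usually inside a `try`/`finally` block.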
*********************************************
******************reducer.py******************
Look at script
*********************************************
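The reducer script itself is not reproduced in these notes. The following is a sketch of what a streaming reducer commonly looks like, not the session's actual `reducer.py`. It assumes the mapper emits `key<TAB>value` lines with a numeric value, and it relies on Hadoop Streaming delivering the lines sorted by key. The function name `reduce_sorted` is hypothetical.

```python
#!/usr/bin/env python
import sys


def reduce_sorted(lines):
    """Yield (key, total) for key-sorted 'key<TAB>value' lines.

    Assumption: values are numeric; malformed lines are skipped.
    """
    current_key = None
    total = 0.0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.partition("\t")
        try:
            amount = float(value)
        except ValueError:
            continue  # skip lines whose value is not a number
        if key != current_key:
            # Key changed: emit the finished group and start a new one.
            if current_key is not None:
                yield current_key, total
            current_key = key
            total = 0.0
        total += amount
    if current_key is not None:
        yield current_key, total


if __name__ == "__main__":
    for key, total in reduce_sorted(sys.stdin):
        print("%s\t%s" % (key, total))
```

Because the input is sorted, the reducer only needs to remember one key at a time, which keeps memory use constant regardless of input size.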
hadoop jar <streaming.jar> -file /user/cloudera/mapper.py -file \
/user/cloudera/reducer.py -mapper /user/cloudera/mapper.py -reducer \
/user/cloudera/reducer.py -input /user/cloudera/INPUT1/SalesData.csv -output /user/cloudera/OUT_PY
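The command above names `mapper.py`, whose contents are not shown in these notes. The following is a sketch of a streaming mapper for a CSV input, not the session's actual script. The layout of `SalesData.csv` is unknown, so the column positions `KEY_COLUMN` and `VALUE_COLUMN` are assumptions to be adjusted to the real file, and `map_rows` is a hypothetical name.

```python
#!/usr/bin/env python
import csv
import sys

# Assumptions: the real column layout of SalesData.csv is not known.
KEY_COLUMN = 0    # hypothetical: column holding the grouping key
VALUE_COLUMN = 1  # hypothetical: column holding a numeric amount


def map_rows(lines):
    """Yield (key, value) string pairs from CSV lines.

    Rows that are too short or whose value is not numeric
    (for example a header row) are skipped.
    """
    for row in csv.reader(lines):
        if len(row) <= max(KEY_COLUMN, VALUE_COLUMN):
            continue
        key = row[KEY_COLUMN].strip()
        value = row[VALUE_COLUMN].strip()
        try:
            float(value)
        except ValueError:
            continue
        yield key, value


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and reads
    # tab-separated key/value pairs from stdout.
    for key, value in map_rows(sys.stdin):
        print("%s\t%s" % (key, value))
```

A script like this can be checked locally before submitting the job by piping a sample file through it and `sort`. On the cluster, scripts passed to `-mapper` and `-reducer` generally need a shebang line and execute permission.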
Thank You
www.Bigdatainpractice.com