Lab4_hadoop Using WSL

The document outlines the process of analyzing maximum temperatures using Hadoop streaming with Python scripts for mapping and reducing. It includes commands for unzipping files, creating and modifying mapper and reducer scripts, and executing a Hadoop job to process input files. The mapper extracts date and temperature data, while the reducer calculates the maximum temperature for each date.

Uploaded by

siddharthrishi17
Copyright
© All Rights Reserved

unzip MaxTemperatureAnalysis.zip -d /home/hadoop/

hadoop@hp:~$ touch max_temp_mapper.py


hadoop@hp:~$ vi max_temp_mapper.py
hadoop@hp:~$ touch max_temp_reducer.py
hadoop@hp:~$ vi max_temp_reducer.py
hadoop@hp:~$ chmod +x max_temp_mapper.py max_temp_reducer.py
hadoop@hp:~$ realpath max_temp_mapper.py
/home/hadoop/max_temp_mapper.py
hadoop@hp:~$ realpath max_temp_reducer.py
/home/hadoop/max_temp_reducer.py

hadoop@hp:~$ hdfs dfs -put file1.txt /user/hadoop/


hadoop@hp:~$ hdfs dfs -put file2.txt /user/hadoop/

hadoop@hp:~$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hadoop/file1.txt \
    -input /user/hadoop/file2.txt \
    -output /user/hadoop/max_temp_output \
    -mapper /home/hadoop/max_temp_mapper.py \
    -reducer /home/hadoop/max_temp_reducer.py

hadoop@hp:~$ hdfs dfs -cat /user/hadoop/max_temp_output/part-00000

(If the output folder already exists from a previous run, delete it first:
hadoop fs -rm -r /user/hadoop/max_temp_output)

max_temp_reducer.py

#!/usr/bin/env python3
import sys

current_date = None
max_temperature = float('-inf')

for line in sys.stdin:
    # Strip any leading/trailing whitespace
    line = line.strip()

    # Split the line into key and value
    try:
        date, temperature = line.split('\t')
        # Convert temperature to float
        temperature = float(temperature)
    except ValueError:
        # Skip malformed lines (no tab separator, or temperature
        # is not a valid float)
        continue

    # Check if we are still processing the same date
    if current_date == date:
        # Update the maximum temperature for this date
        if temperature > max_temperature:
            max_temperature = temperature
    else:
        # We have moved to a new date: output the result for the previous date
        if current_date is not None:
            print(f"{current_date}\t{max_temperature}")

        # Start processing the new date
        current_date = date
        max_temperature = temperature

# Output the result for the last date
if current_date is not None:
    print(f"{current_date}\t{max_temperature}")
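The reducer relies on Hadoop's shuffle/sort phase delivering all records for the same date contiguously. The sketch below replays that grouping logic on a few hypothetical pre-sorted "date TAB temperature" lines (sample values invented for illustration, not taken from the lab's input files) so you can see the per-key maximum being emitted each time the key changes:

```python
# Hypothetical pre-sorted mapper output, as the shuffle/sort phase would
# deliver it: all lines for a given date arrive contiguously.
sorted_lines = [
    "2024-01-01\t12.5",
    "2024-01-01\t15.0",
    "2024-01-02\t9.8",
]

results = []
current_date = None
max_temperature = float('-inf')

for line in sorted_lines:
    date, temperature = line.split('\t')
    temperature = float(temperature)
    if current_date == date:
        # Same key: keep the running maximum
        max_temperature = max(max_temperature, temperature)
    else:
        # Key changed: emit the finished group, start a new one
        if current_date is not None:
            results.append((current_date, max_temperature))
        current_date = date
        max_temperature = temperature

# Emit the final group
if current_date is not None:
    results.append((current_date, max_temperature))

print(results)  # [('2024-01-01', 15.0), ('2024-01-02', 9.8)]
```

If the input were not sorted by date, the same date could open several groups and the per-date maximum would be wrong, which is why the reducer can stay this simple inside Hadoop streaming.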

max_temp_mapper.py

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    # Strip any leading/trailing whitespace
    line = line.strip()

    # Split the line into columns based on comma
    parts = line.split(',')

    # Check if the line has the expected number of columns
    # (date in column 2, temperature in column 4)
    if len(parts) >= 4:
        try:
            # Extract the date and temperature
            date = parts[1]
            temperature = float(parts[3])

            # Output the date and temperature as a key-value pair
            print(f"{date}\t{temperature}")
        except ValueError:
            # Skip lines where the temperature is not a valid float
            continue
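The mapper assumes a comma-separated layout with the date in the second column (parts[1]) and the temperature in the fourth (parts[3]); the lab does not show the input files, so the layout in the sketch below (id,date,station,temperature) is an assumption for illustration. It wraps the mapper's per-line logic in a function to show what gets emitted for a well-formed line and how malformed lines are dropped:

```python
# Sketch of the mapper's per-line behavior on a hypothetical CSV layout
# (id,date,station,temperature) -- the column positions are an assumption,
# since the lab's actual input files are not shown.
def map_line(line):
    parts = line.strip().split(',')
    if len(parts) >= 4:
        try:
            # Emit "date<TAB>temperature", as the mapper script does
            return f"{parts[1]}\t{float(parts[3])}"
        except ValueError:
            return None  # temperature not a valid float: line is skipped
    return None  # too few columns: line is skipped

print(map_line("1,2024-01-01,ST01,12.5"))   # prints: 2024-01-01<TAB>12.5
print(map_line("bad,row"))                  # prints: None
```

Because Hadoop streaming splits mapper output on the first tab to form the shuffle key, emitting the date before the tab is what makes the reducer receive all temperatures for one date together.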
