Lab4_hadoop Using WSL

The document outlines the process of analyzing maximum temperatures using Hadoop streaming with Python scripts for mapping and reducing. It includes commands for unzipping files, creating and modifying mapper and reducer scripts, and executing a Hadoop job to process input files. The mapper extracts date and temperature data, while the reducer calculates the maximum temperature for each date.

Uploaded by

siddharthrishi17
Copyright
© All Rights Reserved

unzip MaxTemperatureAnalysis.zip -d /home/hadoop/

hadoop@hp:~$ touch max_temp_mapper.py


hadoop@hp:~$ vi max_temp_mapper.py
hadoop@hp:~$ touch max_temp_reducer.py
hadoop@hp:~$ vi max_temp_reducer.py
hadoop@hp:~$ chmod +x max_temp_mapper.py max_temp_reducer.py
hadoop@hp:~$ realpath max_temp_mapper.py
/home/hadoop/max_temp_mapper.py
hadoop@hp:~$ realpath max_temp_reducer.py
/home/hadoop/max_temp_reducer.py

hadoop@hp:~$ hdfs dfs -put file1.txt /user/hadoop/


hadoop@hp:~$ hdfs dfs -put file2.txt /user/hadoop/

hadoop@hp:~$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hadoop/file1.txt \
    -input /user/hadoop/file2.txt \
    -output /user/hadoop/max_temp_output \
    -mapper /home/hadoop/max_temp_mapper.py \
    -reducer /home/hadoop/max_temp_reducer.py

hadoop@hp:~$ hdfs dfs -cat /user/hadoop/max_temp_output/part-00000

(If the output folder already exists from a previous run, delete it first:
hadoop fs -rm -r /user/hadoop/max_temp_output)

max_temp_reducer.py

#!/usr/bin/env python3
import sys

current_date = None
max_temperature = float('-inf')

for line in sys.stdin:
    # Strip any leading/trailing whitespace
    line = line.strip()

    # Split the line into key and value
    try:
        date, temperature = line.split('\t')
        # Convert temperature to float
        temperature = float(temperature)
    except ValueError:
        # Skip malformed lines (no tab separator, or temperature
        # is not a valid float)
        continue

    # Check if we are still processing the same date
    if current_date == date:
        # Update the maximum temperature for this date
        if temperature > max_temperature:
            max_temperature = temperature
    else:
        # We have moved to a new date: output the result for the previous date
        if current_date is not None:
            print(f"{current_date}\t{max_temperature}")

        # Start processing the new date
        current_date = date
        max_temperature = temperature

# Output the result for the last date
if current_date is not None:
    print(f"{current_date}\t{max_temperature}")
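The reducer relies on Hadoop's shuffle/sort phase delivering all records for the same date contiguously. The sketch below replays that grouping logic on a few hypothetical pre-sorted "date TAB temperature" lines (sample values invented for illustration, not taken from the lab's input files) so you can see the per-key maximum being emitted each time the key changes:

```python
# Hypothetical pre-sorted mapper output, as the shuffle/sort phase would
# deliver it: all lines for a given date arrive contiguously.
sorted_lines = [
    "2024-01-01\t12.5",
    "2024-01-01\t15.0",
    "2024-01-02\t9.8",
]

results = []
current_date = None
max_temperature = float('-inf')

for line in sorted_lines:
    date, temperature = line.split('\t')
    temperature = float(temperature)
    if current_date == date:
        # Same key: keep the running maximum
        max_temperature = max(max_temperature, temperature)
    else:
        # Key changed: emit the finished group, start a new one
        if current_date is not None:
            results.append((current_date, max_temperature))
        current_date = date
        max_temperature = temperature

# Emit the final group
if current_date is not None:
    results.append((current_date, max_temperature))

print(results)  # [('2024-01-01', 15.0), ('2024-01-02', 9.8)]
```

If the input were not sorted by date, the same date could open several groups and the per-date maximum would be wrong, which is why the reducer can stay this simple inside Hadoop streaming.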

max_temp_mapper.py

#!/usr/bin/env python3
import sys

for line in sys.stdin:
    # Strip any leading/trailing whitespace
    line = line.strip()

    # Split the line into columns based on comma
    parts = line.split(',')

    # Check if the line has the expected number of columns
    # (date in column 2, temperature in column 4)
    if len(parts) >= 4:
        try:
            # Extract the date and temperature
            date = parts[1]
            temperature = float(parts[3])

            # Output the date and temperature as a key-value pair
            print(f"{date}\t{temperature}")
        except ValueError:
            # Skip lines where the temperature is not a valid float
            continue
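The mapper assumes a comma-separated layout with the date in the second column (parts[1]) and the temperature in the fourth (parts[3]); the lab does not show the input files, so the layout in the sketch below (id,date,station,temperature) is an assumption for illustration. It wraps the mapper's per-line logic in a function to show what gets emitted for a well-formed line and how malformed lines are dropped:

```python
# Sketch of the mapper's per-line behavior on a hypothetical CSV layout
# (id,date,station,temperature) -- the column positions are an assumption,
# since the lab's actual input files are not shown.
def map_line(line):
    parts = line.strip().split(',')
    if len(parts) >= 4:
        try:
            # Emit "date<TAB>temperature", as the mapper script does
            return f"{parts[1]}\t{float(parts[3])}"
        except ValueError:
            return None  # temperature not a valid float: line is skipped
    return None  # too few columns: line is skipped

print(map_line("1,2024-01-01,ST01,12.5"))   # prints: 2024-01-01<TAB>12.5
print(map_line("bad,row"))                  # prints: None
```

Because Hadoop streaming splits mapper output on the first tab to form the shuffle key, emitting the date before the tab is what makes the reducer receive all temperatures for one date together.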
