Lab01

The document outlines two exercises related to MapReduce functions in a lab setting. Exercise 1 details the input and output parameters for Mapper and Reducer functions, using examples for word counting. Exercise 2 provides source code for various MapReduce jobs, including calculating employee salaries and error logging, with specific implementations for summing, averaging, and counting data.

Uploaded by

dev.m.dodiya2004

8/9/2024 Lab-1

IE-494

Vansh Joshi
202101445
IE-494 Lab-1

❖ Exercise.1: Understand all solved problems here. Each problem has one Map
and one Reduce function. You need to document the following for both functions
in each problem:

1. Input parameters: Key and Value with one-line description about each
parameter. Description typically tells what data this parameter carries.

2. Output: Key and Value with one-line description about each output.
Description typically tells what data each output emits.

- For Mapper Function :

• Input parameters : The key is unused for plain-text input. The value is one
line of text from the input file sample.txt, which can contain any text.
E.g. : Horse is running.
• Output parameters : One key-value pair per word in the line, where the key
is the word and the value is its count, which is 1 for each occurrence.
E.g. : [Horse : 1], [is : 1], [running : 1]

- For Reducer Function :

• Input parameters : The key is one distinct word. The value is the collection
of all counts that the mappers emitted for that word.
E.g. : Horse : 1,1,1

• Output parameters : The key is the word and the value is the sum of its
counts, i.e. its total frequency in the file.
E.g. : Horse : 3
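
The mapper and reducer described above can be tried without a cluster. The following is a minimal sketch in plain Python, not the solved problem's actual code: run_word_count is a hypothetical driver that imitates the shuffle step, and the punctuation stripping is an assumption made so that "running." is counted as "running".

```python
from collections import defaultdict


def mapper(_, line):
    """Emit (word, 1) for every word in one line of text; the key is unused."""
    for token in line.split():
        word = token.strip('.,;:!?')  # assumption: drop surrounding punctuation
        if word:
            yield word, 1


def reducer(word, counts):
    """Emit (word, total), where counts holds every 1 emitted for that word."""
    yield word, sum(counts)


def run_word_count(lines):
    """Hypothetical driver that groups mapper output by key, then reduces."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(None, line):
            grouped[word].append(count)
    result = {}
    for word in sorted(grouped):
        for key, total in reducer(word, grouped[word]):
            result[key] = total
    return result


sample = ["Horse is running.", "Horse is fast.", "Horse"]
word_counts = run_word_count(sample)
print(word_counts)
```

Here the reducer for the key Horse receives the values 1, 1, 1 and emits Horse : 3, matching the example above.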


❖ Exercise.2:
• Colab Link:

• Source Code:

1.

from mrjob.job import MRJob
import re

class EmpSum(MRJob):
    def mapper(self, key, line):
        # Each input line is one comma-separated employee record.
        record = re.split(',', line)
        dno = record[2]
        salary = int(record[3])
        yield dno, salary

    def reducer(self, dno, salaries):
        # Total salary for each dno.
        yield dno, sum(salaries)

if __name__ == '__main__':
    EmpSum.run()

2.

%%file empsumma.py
from mrjob.job import MRJob
import re

class EmpMaxSalary(MRJob):
    def mapper(self, _, line):
        record = re.split(',', line)
        state = record[4]
        if state == 'MA':
            dno = record[2]
            salary = int(record[3])
            yield dno, salary

    def reducer(self, dno, salaries):
        yield dno, max(salaries)

if __name__ == '__main__':
    EmpMaxSalary.run()

3.

%%file empavg.py
from mrjob.job import MRJob
import re

class EmpAvgSalary(MRJob):
    def mapper(self, _, line):
        record = re.split(',', line)
        dno = record[2]
        salary = int(record[3])
        yield dno, salary

    def reducer(self, dno, salaries):
        # salaries is a generator, so len() fails on it; build a list first.
        salaries = list(salaries)
        yield dno, sum(salaries) / len(salaries)

if __name__ == '__main__':
    EmpAvgSalary.run()
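
An average cannot be pre-aggregated by averaging partial averages, so a job that wants a combiner usually carries (sum, count) pairs instead. The following is a minimal sketch of that pattern in plain Python, offered as an alternative to the job above rather than a replacement. The function names, the sample records and their first two fields are hypothetical; only the positions of dno (index 2) and salary (index 3) follow the jobs in this lab.

```python
def merge_pairs(pairs):
    """Add up (salary_sum, count) pairs component-wise."""
    total, count = 0, 0
    for part_sum, part_count in pairs:
        total += part_sum
        count += part_count
    return total, count


def mapper(line):
    """Emit (dno, (salary, 1)) for one comma-separated record."""
    record = line.split(',')
    yield record[2], (int(record[3]), 1)


def combiner(dno, pairs):
    """Pre-aggregate on one mapper; output has the same shape as the input."""
    yield dno, merge_pairs(pairs)


def reducer(dno, pairs):
    """Merge all partial pairs and divide only once, at the end."""
    total, count = merge_pairs(pairs)
    yield dno, total / count


# Hypothetical records, split across two mappers.
split_a = ["1,Ann,5,100", "2,Bob,5,200"]
split_b = ["3,Cy,5,600"]

partials = []
for split in (split_a, split_b):
    pairs = [pair for line in split for _, pair in mapper(line)]
    partials.extend(pair for _, pair in combiner("5", pairs))

average = next(reducer("5", partials))
print(partials, average)
```

Because the combiner emits the same (sum, count) shape that it consumes, the result stays correct whether the combiner runs or not. Averaging the two partial averages (150 and 600) would give 375 instead of the true value 300.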

4.

%%file empsalary_4.py
from mrjob.job import MRJob
import re

class EmpSalary(MRJob):
    def mapper(self, _, line):
        record = re.split(',', line)
        dno = int(record[2])
        salary = int(record[3])
        if dno == 5 and salary > 100000:
            yield None, record  # Yield the entire record if it matches

if __name__ == '__main__':
    EmpSalary.run()

5.

%%file emp_5.py
from mrjob.job import MRJob
import re

class EmpCountByGender(MRJob):
    def mapper(self, key, line):
        record = re.split(',', line)
        dno = record[2]
        gender = record[5]
        yield (dno, gender), 1  # Yield a tuple of (dno, gender) as key

    def reducer(self, dno_gender, counts):
        # Sum the counts for each (dno, gender)
        yield dno_gender, sum(counts)

if __name__ == '__main__':
    EmpCountByGender.run()
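
The key in this job is a tuple. Assuming the job keeps mrjob's default JSON-based protocols, the key is written as JSON between the map and reduce steps, and JSON has no tuple type. The sketch below uses only the standard json module to show the effect; the key value is hypothetical.

```python
import json

key = ("5", "F")             # hypothetical (dno, gender) key from the mapper
encoded = json.dumps(key)    # JSON has no tuples, so this becomes an array
decoded = json.loads(encoded)

print(encoded)
print(type(decoded).__name__)
```

Under that assumption the reducer should see the key as a two-element list rather than a tuple, which is also how it appears in the job output.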

6.

%%file monthly_summary.py
from mrjob.job import MRJob
import re

class MonthlySummary(MRJob):

    def mapper(self, _, line):
        # Extract year-month, number of requests (always 1), and download size
        match = re.search(r'\[(\d{2}/\w{3}/\d{4}):.+\] "\w+ .+" \d+ (\d+)', line)
        if match:
            day, month, year = match.group(1).split('/')
            year_month = month + '-' + year  # Format: Dec-2015
            num_requests = 1
            download_size = int(match.group(2))
            yield year_month, (num_requests, download_size)

    def reducer(self, year_month, values):
        # Sum the number of requests and download sizes for each month
        total_requests = 0
        total_download = 0
        for num_requests, download_size in values:
            total_requests += num_requests
            total_download += download_size
        # Convert download size from bytes to MB
        yield year_month, (total_requests, total_download / (1024 * 1024))

if __name__ == '__main__':
    MonthlySummary.run()
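
The pattern used in the MonthlySummary mapper can be checked on single lines before running the whole job. The following is a minimal sketch: parse_line is a hypothetical helper, both sample lines are invented in the style of an Apache access log, and the pattern is written with a single space between the status code and the size.

```python
import re

# Same pattern as the mapper: date inside [...], quoted request, status, size.
LOG_PATTERN = re.compile(r'\[(\d{2}/\w{3}/\d{4}):.+\] "\w+ .+" \d+ (\d+)')


def parse_line(line):
    """Return (year_month, download_size), or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    _, month, year = match.group(1).split('/')
    return f"{month}-{year}", int(match.group(2))


# Hypothetical log lines; the second has "-" instead of a numeric size.
with_size = ('83.149.9.216 - - [12/Dec/2015:18:25:11 +0100] '
             '"GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0"')
no_size = ('83.149.9.216 - - [12/Dec/2015:18:25:12 +0100] '
           '"GET /missing HTTP/1.1" 304 - "-" "Mozilla/5.0"')

parsed = parse_line(with_size)
skipped = parse_line(no_size)
print(parsed, skipped)
```

Lines whose size field is not numeric do not match, so the mapper skips them and they are not counted as requests in the monthly totals.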

7.

%%file error_counter.py
from mrjob.job import MRJob

class Error404TimeStamps(MRJob):

    def mapper(self, key, line):
        request = line.split(' ')
        if len(request) < 11:
            return  # Skip malformed lines that lack the expected fields
        url = request[10]  # Initial portion of the URL
        # Combine it with the requested resource to make the complete URL
        url = url.replace('"', '') + request[6][1:]

        timestamp = request[3] + request[4]  # Combine date-time and time zone offset

        status_code = request[8]
        if status_code == "404":
            yield timestamp, url

if __name__ == '__main__':
    Error404TimeStamps.run()
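
The indices used in the Error404TimeStamps mapper depend on how a log line splits on spaces. The sketch below prints that mapping for one invented line in the style of an Apache access log; the line itself and the example.com address are hypothetical.

```python
# Hypothetical log line with a referrer URL in the tenth space-separated field.
sample = ('83.149.9.216 - - [12/Dec/2015:18:25:11 +0100] '
          '"GET /administrator/ HTTP/1.1" 404 226 '
          '"http://example.com/" "Mozilla/5.0"')

fields = sample.split(' ')
for index, field in enumerate(fields):
    print(index, field)

# The same expressions as in the mapper, applied to the sample line.
timestamp = fields[3] + fields[4]
status_code = fields[8]
url = fields[10].replace('"', '') + fields[6][1:]
print(timestamp, status_code, url)
```

A user agent that contains spaces adds more fields at the end, but it does not shift indices 0 to 10, so the mapper's lookups are unaffected.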

--X –
