BDA Lab
Experiment-1
Aim: Write a program to implement:
a) Map function
b) Reduce function
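Code:
No code is recorded for this experiment; a minimal sketch using Python's built-in map and functools.reduce (the sample list and lambdas are illustrative):

```python
from functools import reduce

# a) Map: apply a function to every element of a sequence
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x * x, numbers))  # [1, 4, 9, 16, 25]

# b) Reduce: fold the sequence into a single value
total = reduce(lambda acc, x: acc + x, squares)  # 55

print(squares)
print(total)
```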
Output:
Experiment-2
Aim: MapReduce program to implement word count.
Map: Split the text into words and map each word to a tuple (word, 1).
Reduce: Aggregate the counts for each word.
Code:
Map function –
def map_function(text_chunk):
    word_list = text_chunk.split()  # Split text into words
    map_output = []
    for word in word_list:
        map_output.append((word, 1))  # Emit tuple (word, 1)
    return map_output
Reduce function –
from collections import defaultdict

def reduce_function(map_outputs):
    word_count = defaultdict(int)  # Dictionary to count occurrences of each word
    for word, count in map_outputs:
        word_count[word] += count  # Aggregate the count for each word
    return word_count
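A short driver chaining the two phases on sample text (the sample string is illustrative; the functions are condensed versions of the ones above so the snippet runs on its own):

```python
from collections import defaultdict

def map_function(text_chunk):
    # Emit a (word, 1) pair for every word in the chunk
    return [(word, 1) for word in text_chunk.split()]

def reduce_function(map_outputs):
    # Sum the counts emitted for each word
    word_count = defaultdict(int)
    for word, count in map_outputs:
        word_count[word] += count
    return word_count

text = "big data big analytics data big"
counts = reduce_function(map_function(text))
print(dict(counts))  # {'big': 3, 'data': 2, 'analytics': 1}
```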
Experiment-3
Aim: MapReduce program to build an inverted index.
Code:
from mrjob.job import MRJob
from mrjob.step import MRStep

class InvertedIndexReducer(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper, reducer=self.reducer)
        ]

    def mapper(self, _, line):
        # Split the line into document_id and content
        doc_id, content = line.split("\t", 1)
        # Tokenize content and emit each word with its document id
        words = content.split()
        for word in words:
            yield word.lower(), doc_id

    def reducer(self, word, doc_ids):
        # Aggregate document IDs for each word
        doc_id_list = set(doc_ids)  # Use a set to avoid duplicates
        yield word, list(doc_id_list)

if __name__ == '__main__':
    InvertedIndexReducer.run()
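mrjob runs this job from the command line (for example, python inverted_index.py input.txt, where the script and input file names are illustrative). The same map and reduce logic can be checked without Hadoop; a standalone simulation over made-up tab-separated lines:

```python
from collections import defaultdict

# Sample input: document_id <TAB> content (lines are illustrative)
lines = [
    "doc1\tbig data tools",
    "doc2\tdata pipelines",
]

# Map phase: emit (word, doc_id) pairs
pairs = []
for line in lines:
    doc_id, content = line.split("\t", 1)
    for word in content.split():
        pairs.append((word.lower(), doc_id))

# Shuffle + reduce phase: collect the set of document ids per word
index = defaultdict(set)
for word, doc_id in pairs:
    index[word].add(doc_id)

for word in sorted(index):
    print(word, sorted(index[word]))
```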
Experiment-4
Aim: HDFS commands.
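Code:
No commands are recorded for this experiment; a few commonly used HDFS shell commands (the file and directory paths are illustrative):

```shell
hdfs dfs -ls /                                  # List files in the HDFS root directory
hdfs dfs -mkdir -p /user/data                   # Create a directory (with parents)
hdfs dfs -put local.txt /user/data/             # Copy a local file into HDFS
hdfs dfs -cat /user/data/local.txt              # Print a file's contents
hdfs dfs -get /user/data/local.txt ./copy.txt   # Copy a file from HDFS to the local disk
hdfs dfs -rm /user/data/local.txt               # Delete a file from HDFS
```

These commands require a running Hadoop installation; outputs depend on the cluster's contents.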
Experiment-5
Aim: MapReduce program to compute the average value per category.
Code:
Reduce function –
from collections import defaultdict

def reducer(mapped_data):
    # Store the running sum and count of values per category
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Process each (key, value) pair from the mapper
    for category, value in mapped_data:
        sums[category] += value
        counts[category] += 1
    # Calculate the average for each category
    averages = {}
    for category in sums:
        averages[category] = sums[category] / counts[category]
    return averages
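The mapper for this experiment is not shown; a minimal sketch assuming input lines of the form "category,value" (the format and sample data are assumptions), with a condensed copy of the reducer above so the snippet runs on its own:

```python
from collections import defaultdict

def mapper(lines):
    # Emit a (category, value) pair from each "category,value" line
    for line in lines:
        category, value = line.split(",")
        yield category, float(value)

def reducer(mapped_data):
    # Condensed version of the reducer above: average per category
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, value in mapped_data:
        sums[category] += value
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

lines = ["fruit,10", "fruit,20", "veg,5"]
print(reducer(mapper(lines)))  # {'fruit': 15.0, 'veg': 5.0}
```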