
INDIRA GANDHI DELHI TECHNICAL UNIVERSITY FOR WOMEN

BIG DATA ANALYTICS


PRACTICAL FILE
BTECH IT-1

Submitted to: Dr. Deepak

Submitted by: Kirti, 04901032021

S.No  Title                                                  Date    Remarks
1.    WAP to implement: map function and reduce function
2.    MapReduce program to implement word count.
3.    MapReduce program to implement Inverted Index.
4.    HDFS commands.
5.    MapReduce program to implement Average calculation.

Experiment-1
Aim: Write a program to implement:
a) Map function
b) Reduce function

Code:
a]
# Custom implementation of the map function
def custom_map(function, iterable):
    result = []
    for item in iterable:
        result.append(function(item))
    return result

# Example function to apply to each element
def square(x):
    return x * x

# Input list
numbers = [1, 2, 3, 4, 5]

# Using the custom map function
squared_numbers = custom_map(square, numbers)

# Output the result
print(squared_numbers)

Output:
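Running the script above prints:
[1, 4, 9, 16, 25]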

b]
from functools import reduce

# Function to sum two numbers
def add(x, y):
    return x + y

# List of numbers
numbers = [1, 2, 3, 4, 5]

# Use reduce to sum the list
result = reduce(add, numbers)
print("The sum of the list is:", result)

Output:
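Running part (b) prints:
The sum of the list is: 15

Part (b) relies on the built-in functools.reduce. For symmetry with part (a), a hand-rolled version could look like the sketch below; custom_reduce is an illustrative name and is not part of the original program.
# Custom implementation of the reduce function (illustrative sketch)
def custom_reduce(function, iterable, initial=None):
    iterator = iter(iterable)
    # Use the first element as the starting accumulator when no initial value is given
    accumulator = next(iterator) if initial is None else initial
    for item in iterator:
        accumulator = function(accumulator, item)
    return accumulator

print(custom_reduce(add, numbers))  # Prints 15, matching reduce(add, numbers)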

Experiment-2
Aim: MapReduce program to implement word count.
Map: Split text into words and map each word to a tuple of (word, 1).
Reduce: Aggregate the counts for each word.

Code:
• Map function –
def map_function(text_chunk):
    word_list = text_chunk.split()  # Split text into words
    map_output = []
    for word in word_list:
        map_output.append((word, 1))  # Output tuple (word, 1)
    return map_output

• Reduce function –
from collections import defaultdict

def reduce_function(map_outputs):
    word_count = defaultdict(int)  # Dictionary to count occurrences of each word
    for word, count in map_outputs:
        word_count[word] += count  # Aggregate the count for each word
    return word_count

Example:
# Example input text
text = "MapReduce is a programming model for processing large data sets with a distributed algorithm"

# Split text into chunks for Map function
text_chunks = [text]  # In a real-world scenario, you'd have multiple chunks from large files

# Apply map function
mapped_results = []
for chunk in text_chunks:
    mapped_results.extend(map_function(chunk))

# Apply reduce function
word_count_result = reduce_function(mapped_results)

# Print word count
for word, count in word_count_result.items():
    print(f'{word}: {count}')
Output:
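Running the example prints one line per distinct word, in order of first occurrence (Python 3.7+ dictionaries preserve insertion order):
MapReduce: 1
is: 1
a: 2
programming: 1
model: 1
for: 1
processing: 1
large: 1
data: 1
sets: 1
with: 1
distributed: 1
algorithm: 1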
Experiment-3
Aim: MapReduce program to implement inverted index.
Map: Map each word to its document ID.
Reduce: Aggregate document IDs for each word to create an inverted index.
Code:
• Map function –
from mrjob.job import MRJob

class InvertedIndexMapper(MRJob):
    def mapper(self, _, line):
        # Split the line into document_id and content
        doc_id, content = line.split("\t", 1)
        # Tokenize content and emit each word with its document id
        words = content.split()
        for word in words:
            yield word.lower(), doc_id

if __name__ == '__main__':
    InvertedIndexMapper.run()

• Reduce function –
from mrjob.job import MRJob
from mrjob.step import MRStep

class InvertedIndexReducer(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper, reducer=self.reducer)
        ]

    def mapper(self, _, line):
        # Split the line into document_id and content
        doc_id, content = line.split("\t", 1)
        # Tokenize content and emit each word with its document id
        words = content.split()
        for word in words:
            yield word.lower(), doc_id

    def reducer(self, word, doc_ids):
        # Aggregate document IDs for each word
        doc_id_list = set(doc_ids)  # Use a set to avoid duplicates
        yield word, list(doc_id_list)

if __name__ == '__main__':
    InvertedIndexReducer.run()

Example:
Document 1: "MapReduce program example"
Document 2: "MapReduce for inverted index"
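A minimal sketch of how these documents could be fed to the mrjob scripts above, assuming the input is stored one document per line as doc_id<TAB>content (the file and script names are illustrative):
docs.txt:
1	MapReduce program example
2	MapReduce for inverted index

Run locally with:
python inverted_index_reducer.py docs.txt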
Output:
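With that input, the reducer job should emit an inverted index along these lines (mrjob writes the JSON-encoded key and value separated by a tab; the order of lines and of document IDs within each list may vary):
"example"	["1"]
"for"	["2"]
"index"	["2"]
"inverted"	["2"]
"mapreduce"	["1", "2"]
"program"	["1"]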

Experiment-4
Aim: HDFS commands.

1. HDFS command syntax
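The general command form is shown below (paths are illustrative):
hdfs dfs -<command> [options] <path>
hadoop fs -<command> [options] <path>    # generic file-system form, equivalent for HDFS paths
# Example:
hdfs dfs -ls /user/hadoop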

2. Basic HDFS commands
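A few commonly used commands (directory and file names are illustrative):
hdfs dfs -mkdir /user/hadoop/input              # create a directory in HDFS
hdfs dfs -put sample.txt /user/hadoop/input     # copy a local file into HDFS
hdfs dfs -ls /user/hadoop/input                 # list a directory
hdfs dfs -cat /user/hadoop/input/sample.txt     # print a file's contents
hdfs dfs -get /user/hadoop/input/sample.txt .   # copy a file from HDFS to the local file system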


3. Additional HDFS commands
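Some further commands that are commonly covered (paths are illustrative):
hdfs dfs -cp /user/hadoop/input/sample.txt /user/hadoop/backup/   # copy within HDFS
hdfs dfs -mv /user/hadoop/input/sample.txt /user/hadoop/archive/  # move/rename within HDFS
hdfs dfs -rm /user/hadoop/archive/sample.txt                      # delete a file
hdfs dfs -rm -r /user/hadoop/backup                               # delete a directory recursively
hdfs dfs -du -h /user/hadoop                                      # show space used (human-readable)
hdfs dfsadmin -report                                             # report cluster capacity and DataNodes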
Experiment-5

Aim: MapReduce program to implement average calculation.

Map: Map each record to a (key, value) pair where the key is the category and the value is the number to be averaged.
Reduce: Compute the average for each key.
Code:
• Map function –
def mapper(record):
    # Assuming record is in the format: "Category, Value"
    category, value = record.split(', ')
    # Emit the category as the key and the value as a float
    return (category, float(value))

• Reduce function –
from collections import defaultdict

def reducer(mapped_data):
    # Store the total sum and count of values per category
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Process each (key, value) pair from the mapper
    for category, value in mapped_data:
        sums[category] += value
        counts[category] += 1
    # Calculate averages for each category
    averages = {}
    for category in sums:
        averages[category] = sums[category] / counts[category]
    return averages

Example: [('A', 5), ('A', 10), ('B', 7), ('B', 12)]
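The list above is already in mapped (key, value) form; a minimal driver sketch that starts from raw "Category, Value" records (the record strings below are illustrative) is:
records = ["A, 5", "A, 10", "B, 7", "B, 12"]
mapped_data = [mapper(r) for r in records]   # [('A', 5.0), ('A', 10.0), ('B', 7.0), ('B', 12.0)]
print(reducer(mapped_data))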


Output:
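Running the driver above prints:
{'A': 7.5, 'B': 9.5}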
Experiment-6
Aim:
