Assignment 1 - 2024

The document describes two MapReduce assignments. The first asks to analyze population census data to count births by month, and suggests handling many records by adding more reducers. The second asks to precompute friend recommendations by counting common friends between users, and provides sample input/output and asks to design the MapReduce job.

Uploaded by

claudia wong


Assignment 1: MapReduce Design

Out: Feb 29, 2024

Due: Mar 12, 2024, end of day

1. (4pts) Consider that the National Bureau of Statistics wants to analyze its population census data.
The dataset contains the details of babies born in the United States in 2017. Each record is of the
form and there are around 17.23 million
records. In order to find the number of babies born during each month of the year, you come up
with the following mapper and reducer.

The MapReduce cluster provided to you consists of N mappers but only 2 reducers, as shown
in the figure above. Reducer1 receives all (key, value) pairs where keys are between A and M
inclusive and Reducer2 receives (key, value) pairs between N and Z inclusive.
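
The routing rule described above can be sketched as a small range partitioner. This is only an illustration under stated assumptions: the record format and the mapper are not shown here, so the use of English month names as keys is an assumption, and the name `range_partition` is hypothetical.

```python
from collections import Counter

# Assumption: mapper output keys are English month names (not confirmed by the text).
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def range_partition(key: str) -> int:
    """Return 1 for keys starting with A-M (Reducer1), 2 for N-Z (Reducer2)."""
    first = key[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError(f"key must start with a letter: {key!r}")
    return 1 if first <= "M" else 2


# Count how many distinct keys each reducer would be responsible for.
keys_per_reducer = Counter(range_partition(month) for month in MONTHS)
print(dict(keys_per_reducer))  # {1: 9, 2: 3}
```

If the keys really are month names, the split of distinct keys between the two reducers is uneven, which is worth keeping in mind when reading the question below.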

Given that the mapper and reducer functions produce the correct output, what possible issue(s) could
you face while processing a job consisting of 17.23 million records? Suggest a workaround for
that issue.

2. (6pts) Facebook has a list of friends (note that friends are a bi-directional thing on Facebook.
If I'm your friend, you're mine). They also have lots of disk space and they serve hundreds of
millions of requests every day. They've decided to pre-compute calculations when they can to
reduce the processing time of requests. One common processing request is the "You and Joe
have 230 friends in common" feature. When you visit someone's profile, you see a list of friends
that you have in common. This list doesn't change frequently so it'd be wasteful to recalculate it
every time you visit the profile. We're going to use MapReduce so that we can calculate
everyone's common friends once a day and store those results.

Assume each line of the input file is stored as User: [List of Friends], and each list of friends is sorted, for
example:

A: [B, C, D, E, F]

B: [A, C, F]

C: [A, B]

D: [A]

E: [A]

F: [A, B]
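
A minimal sketch of reading this input format into memory, assuming each line looks exactly like the examples above. The helper name `parse_friend_line` is hypothetical and not part of the assignment.

```python
def parse_friend_line(line: str) -> tuple[str, list[str]]:
    """Split 'A: [B, C, D]' into ('A', ['B', 'C', 'D'])."""
    user, _, friends_part = line.partition(":")
    # Drop surrounding whitespace and the enclosing square brackets.
    inner = friends_part.strip().strip("[]")
    friends = [name.strip() for name in inner.split(",") if name.strip()]
    return user.strip(), friends


sample = """\
A: [B, C, D, E, F]
B: [A, C, F]
C: [A, B]
D: [A]
E: [A]
F: [A, B]
"""

graph = dict(parse_friend_line(line) for line in sample.splitlines() if line.strip())
print(graph["B"])  # ['A', 'C', 'F']
```

Because friendship is bi-directional, every user who appears in someone's list should list that person in return; the parsed `graph` makes that easy to check.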

This friendship network can be visualized as

As you can see, A and B have common friends C and F, and A and E have no common friends. Your
output will contain a common friends list for every user pair (excluding pairs with no
common friends). For each pair of users, you only need to output their common friends once. For
example, you will output the common friends for [B, C] and [C, B] only once, as [B, C]. It’s also
okay if the common friends list is not sorted. For the above data, you will have the output:

[A, B]: [C, F]

[A, C]: [B]

[A, F]: [B]

[B, C]: [A]

[B, D]: [A]

[B, E]: [A]

[B, F]: [A]

[C, D]: [A]

[C, E]: [A]

[C, F]: [A, B]

[D, E]: [A]

[D, F]: [A]

[E, F]: [A]
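
The listed output can be cross-checked with a direct, single-machine computation. The sketch below is not a MapReduce job; it simply intersects friend sets for every pair of users, and the function name `common_friends_bruteforce` is hypothetical.

```python
from itertools import combinations

# Adjacency lists copied from the example input.
graph = {
    "A": ["B", "C", "D", "E", "F"],
    "B": ["A", "C", "F"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["A"],
    "F": ["A", "B"],
}


def common_friends_bruteforce(graph):
    """Intersect the friend sets of every user pair; skip pairs with no overlap."""
    result = {}
    for u, v in combinations(sorted(graph), 2):
        shared = sorted(set(graph[u]) & set(graph[v]))
        if shared:
            result[(u, v)] = shared
    return result


expected = common_friends_bruteforce(graph)
for pair, shared in expected.items():
    print(f"[{pair[0]}, {pair[1]}]: [{', '.join(shared)}]")
```

This prints the same 13 pairs as the listing above and can serve as a reference when testing a MapReduce design on small inputs.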

Given this input and desired output, design a MapReduce job to perform the required processing.
In particular, detail the sequence of map/reduce phases of your algorithm: what are the map keys,
what are the map values, what are the reduce keys, what are the reduce values, what does the
map function do, what does the reduce function do. Also indicate if there is a possibility to use a
combiner at each step. You can use natural language, diagrams, examples AND/OR pseudo-code
to describe the algorithm, as you prefer (so long as it is readable).

(hint: given one input line A: [B, C, D, E, F], you can infer that B and C have common friend
A. And it’s okay to use a tuple, such as (user1, user2), as the key of a key-value pair.)
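
One way to read the hint is sketched below as a single-process simulation. It is one possible design under stated assumptions, not the required answer: the function names `map_friend_pairs`, `reduce_common_friends`, and `run_job` are hypothetical, and the shuffle phase is imitated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import combinations


def map_friend_pairs(user, friends):
    """For one input record, emit ((f1, f2), user) for every pair of the user's friends."""
    # Sorting makes (B, C) and (C, B) collapse into the single key (B, C).
    for f1, f2 in combinations(sorted(friends), 2):
        yield (f1, f2), user


def reduce_common_friends(pair, users):
    """Collect every user who listed both members of the pair."""
    return pair, sorted(users)


def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for user, friends in records:
        for key, value in map_friend_pairs(user, friends):
            grouped[key].append(value)
    return dict(reduce_common_friends(k, v) for k, v in sorted(grouped.items()))


records = [
    ("A", ["B", "C", "D", "E", "F"]),
    ("B", ["A", "C", "F"]),
    ("C", ["A", "B"]),
    ("D", ["A"]),
    ("E", ["A"]),
    ("F", ["A", "B"]),
]
output = run_job(records)
print(output[("A", "B")])  # ['C', 'F']
```

Pairs with no common friend never receive a value from any mapper, so they do not appear in the result without any explicit filtering.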
