Map Reduce Exercise

This document poses four problems on data analysis with MapReduce: 1) implement the DISTINCT operator, which returns the unique values of a column, in a single MapReduce stage; 2) implement a SHUFFLE operator that randomly reorders a dataset; 3) calculate the communication cost of a DISTINCT query on one column filtered by a condition on another column; 4) design a MapReduce job that computes the average sales price per supplier from product sales records. It closes with four true/false questions about MapReduce properties.

Uploaded by

Ashwin Ajmera

Cloud Computing for Data Analysis

Assignment – 1
1. The DISTINCT(X) operator returns only the distinct (unique) values of datatype (or
column) X in the entire dataset.

As an example, for the following table A:

A.ID    A.ZIPCODE    A.AGE
1       12345        30
2       12345        40
3       78910        10
4       78910        10
5       78910        20

DISTINCT(A.ID) = (1, 2, 3, 4, 5)
DISTINCT(A.ZIPCODE) = (12345, 78910)
DISTINCT(A.AGE) = (30, 40, 10, 20)

Implement the DISTINCT(X) operator using Map-Reduce. Provide the algorithm pseudocode.
You should use only one Map-Reduce stage, i.e. the algorithm should
make only one pass over the data.
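
One possible approach to problem 1, offered as a minimal sketch rather than a reference solution: the mapper emits the column value as the key, the framework groups identical keys, and the reducer emits each key once. The run_mapreduce driver below is a hypothetical single-process stand-in for a real framework's shuffle phase, and the function names are illustrative assumptions.

```python
from collections import defaultdict

# Table A from the example above: (ID, ZIPCODE, AGE).
TABLE_A = [
    (1, 12345, 30),
    (2, 12345, 40),
    (3, 78910, 10),
    (4, 78910, 10),
    (5, 78910, 20),
]


def distinct_map(row, column):
    """Map: emit the column value as the key; the value carries no information."""
    yield row[column], None


def distinct_reduce(key, values):
    """Reduce: all duplicates of a key arrive in one group, so emit the key once."""
    yield key


def run_mapreduce(rows, mapper, reducer):
    """Hypothetical local driver that imitates grouping by key (the shuffle)."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def distinct(rows, column):
    """DISTINCT over one column index, using a single map and reduce stage."""
    return run_mapreduce(rows, lambda r: distinct_map(r, column), distinct_reduce)


if __name__ == "__main__":
    print(distinct(TABLE_A, 1))  # distinct ZIP codes
    print(distinct(TABLE_A, 2))  # distinct ages
```

Because grouping by key already collapses duplicates, a single stage suffices; the order of the output depends on how the framework orders keys.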

2. The SHUFFLE operator takes a dataset as input and randomly re-orders it.

Hint: Assume that we have a function rand(m) that outputs a random integer
in the range [1, m].
Implement the SHUFFLE operator using Map-Reduce. Provide the algorithm pseudocode.
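
A minimal sketch of one way to approach problem 2, assuming the framework delivers keys to the reducers in sorted order: each mapper tags a record with a random key from rand(m), and the reducer emits the records of each key. Records that collide on the same key are reordered randomly inside the reducer, which is an added assumption to break ties. The local driver, the default choice of m, and the seed parameter are illustrative and not part of the assignment.

```python
import random
from collections import defaultdict


def rand(m, rng):
    """The rand(m) of the hint: a random integer in [1, m]."""
    return rng.randint(1, m)


def shuffle_map(record, m, rng):
    """Map: tag each record with a random key."""
    yield rand(m, rng), record


def shuffle_reduce(key, records, rng):
    """Reduce: emit the records of one key, breaking ties in random order."""
    records = list(records)
    rng.shuffle(records)
    yield from records


def mapreduce_shuffle(dataset, m=None, seed=None):
    """Hypothetical local driver; sorting the keys imitates the framework's sort."""
    rng = random.Random(seed)
    dataset = list(dataset)
    if m is None:
        # Assumption: a large key space keeps collisions rare.
        m = max(1, len(dataset) ** 2)
    groups = defaultdict(list)
    for record in dataset:
        for key, value in shuffle_map(record, m, rng):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(shuffle_reduce(key, groups[key], rng))
    return output


if __name__ == "__main__":
    print(mapreduce_shuffle(range(10), seed=1))
```

With several reducers, the overall order would be the concatenation of the reducers' outputs in key-range order, which presumes a range partitioner rather than a hash partitioner.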

3. What is the communication cost (in terms of total data flow on the network between mappers and
reducers) for the following query using Map-Reduce:

Get DISTINCT(A.ID from A WHERE A.AGE > 30)

The dataset A has 1000M rows, and 400M of these rows have A.AGE <= 30. DISTINCT(A.ID)
has 1M elements. A tuple emitted from any mapper is 1 KB in size.
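
The arithmetic behind problem 3 can be sketched as follows, under stated assumptions: mappers apply the A.AGE > 30 filter before emitting, every emitted tuple crosses the network, and 1 GB is taken as 10^6 KB. The second function gives only an upper bound for the case where each mapper deduplicates its own output with a combiner; the number of mappers is a hypothetical parameter that the problem does not specify.

```python
TOTAL_ROWS = 1_000_000_000      # 1000M rows in A
ROWS_AGE_LE_30 = 400_000_000    # rows with A.AGE <= 30, dropped by the mappers
DISTINCT_IDS = 1_000_000        # size of DISTINCT(A.ID)
TUPLE_KB = 1                    # size of one emitted tuple in KB


def cost_without_combiner_kb():
    """One tuple per row that passes the filter travels to the reducers."""
    passing_rows = TOTAL_ROWS - ROWS_AGE_LE_30
    return passing_rows * TUPLE_KB


def cost_upper_bound_with_combiner_kb(num_mappers):
    """Each mapper emits at most one tuple per distinct ID it has seen."""
    passing_rows = TOTAL_ROWS - ROWS_AGE_LE_30
    return min(passing_rows, num_mappers * DISTINCT_IDS) * TUPLE_KB


def kb_to_gb(kb):
    """Decimal conversion, assuming 1 GB = 10**6 KB."""
    return kb / 1_000_000


if __name__ == "__main__":
    print(kb_to_gb(cost_without_combiner_kb()))             # 600.0
    print(kb_to_gb(cost_upper_bound_with_combiner_kb(100)))  # 100.0
```

Without a combiner, 600M tuples of 1 KB each give roughly 600 GB of mapper-to-reducer traffic; with 100 hypothetical mappers and local deduplication, the bound drops to 100 GB.
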
4. Consider the checkout counter at a large supermarket chain. For each item sold, it generates a
record of the form [ProductId, Supplier, Price]. Here, ProductId is the unique identifier of a
product, Supplier is the supplier name of the product and Price is the sales price for the item.
Assume that the supermarket chain has accumulated many terabytes of data over a period of
several months.

The CEO wants a list of suppliers, listing for each supplier the average sales price of items
provided by the supplier. How would you organize the computation using the Map-Reduce
computation model?
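
A minimal sketch of one way to organize problem 4: the mapper emits (Supplier, (Price, 1)), an optional combiner adds up partial sums and counts on each mapper, and the reducer divides the total sum by the total count. Carrying (sum, count) pairs instead of averages keeps the combiner correct, since an average of averages would be wrong for unequal group sizes. The sample records, the split layout, and the local driver are hypothetical.

```python
from collections import defaultdict


def avg_map(record):
    """Map: key by supplier, carry a (price, count) partial aggregate."""
    product_id, supplier, price = record
    yield supplier, (price, 1)


def avg_combine(supplier, partials):
    """Combine: merge partial (sum, count) pairs on the mapper side."""
    total = sum(p for p, _ in partials)
    count = sum(c for _, c in partials)
    yield supplier, (total, count)


def avg_reduce(supplier, partials):
    """Reduce: merge all partials of a supplier and emit the average price."""
    total = sum(p for p, _ in partials)
    count = sum(c for _, c in partials)
    yield supplier, total / count


def average_price_by_supplier(splits):
    """Hypothetical local driver: one inner loop per input split (mapper)."""
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for record in split:
            for supplier, partial in avg_map(record):
                local[supplier].append(partial)
        for supplier, partials in local.items():
            for key, partial in avg_combine(supplier, partials):
                shuffled[key].append(partial)
    result = {}
    for supplier, partials in shuffled.items():
        for key, average in avg_reduce(supplier, partials):
            result[key] = average
    return result


if __name__ == "__main__":
    # Hypothetical records of the form [ProductId, Supplier, Price].
    splits = [
        [("p1", "Acme", 2.0), ("p2", "Acme", 4.0)],
        [("p3", "Bolt", 10.0), ("p1", "Acme", 6.0)],
    ]
    print(average_price_by_supplier(splits))
```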

***************************************************************************

For the following questions give short explanations of your answers.

5. True or False: Each mapper/reducer must generate the same number of output key/value pairs
as it receives on the input.
6. True or False: The output type of keys/values of mappers/reducers must be of the same type as
their input.
7. True or False: The input to reducers is grouped by key.
8. True or False: It is possible to start reducers while some mappers are still running.
