bigdata-1

Uploaded by Ramesh Raj

Suppose we execute the word-count MapReduce program described in this section on a
large repository such as a copy of the Web. We shall use 100 Map tasks and some
number of Reduce tasks.
(a) Suppose we do not use a combiner at the Map tasks. Do you expect there
to be significant skew in the times taken by the various reducers to process
their value list? Why or why not?
(b) If we combine the reducers into a small number of Reduce tasks, say 10
tasks, at random, do you expect the skew to be significant? What if we
instead combine the reducers into 10,000 Reduce tasks?
! (c) Suppose we do use a combiner at the 100 Map tasks. Do you expect skew
to be significant? Why or why not?
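The effect of a combiner on value-list lengths can be seen in a minimal single-process simulation. It is a sketch, not a real MapReduce job: the toy corpus and the helper names `map_words`, `combine`, and `run_word_count` are hypothetical, and each inner list stands in for the input of one Map task.

```python
from collections import Counter, defaultdict

def map_words(document):
    """Map function: emit (word, 1) for every word occurrence."""
    return [(word, 1) for word in document.split()]

def combine(pairs):
    """Combiner: sum the counts per word locally, inside one Map task."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def run_word_count(map_inputs, use_combiner):
    """Simulate map, optional combine, and shuffle.

    Returns the final counts and the length of each key's value list,
    i.e. how much work each reducer receives.
    """
    value_lists = defaultdict(list)
    for chunk in map_inputs:  # one chunk = the input of one Map task
        pairs = [p for doc in chunk for p in map_words(doc)]
        if use_combiner:
            pairs = combine(pairs)
        for word, n in pairs:
            value_lists[word].append(n)
    counts = {w: sum(vs) for w, vs in value_lists.items()}  # Reduce: sum
    lengths = {w: len(vs) for w, vs in value_lists.items()}
    return counts, lengths

tasks = [["the cat the dog", "the end"], ["a cat", "the the"]]
counts, lengths = run_word_count(tasks, use_combiner=False)
print(counts["the"], lengths["the"])  # 5 5
counts, lengths = run_word_count(tasks, use_combiner=True)
print(counts["the"], lengths["the"])  # 5 2
```

With the combiner, a key's value list holds at most one entry per Map task (at most 100 in the exercise's setting), whereas without it a frequent word gets one entry per occurrence.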

Design MapReduce algorithms to take a very large file of integers and produce as output:
(a) The largest integer.
(b) The average of all the integers.
(c) The same set of integers, but with each integer appearing only once.
(d) The count of the number of distinct integers in the input.
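One possible set of map and reduce functions for the four tasks can be sketched with a tiny in-memory driver. This is an illustration under assumptions, not a reference solution: `run_mapreduce` and the mapper/reducer names are hypothetical, chunks are assumed non-empty, and part (d) is done as a second job over the output of part (c).

```python
from collections import defaultdict

def run_mapreduce(chunks, mapper, reducer):
    """Toy driver: map every chunk, group values by key, reduce each group."""
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in mapper(chunk):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# (a) Largest integer: each Map task emits only its local maximum.
def max_mapper(chunk):
    yield ("max", max(chunk))

def max_reducer(key, values):
    return max(values)

# (b) Average: emit (sum, count) pairs, because an average of averages is wrong.
def avg_mapper(chunk):
    yield ("avg", (sum(chunk), len(chunk)))

def avg_reducer(key, values):
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return total / count

# (c) Distinct integers: the integer itself is the key; its reducer emits it once.
def distinct_mapper(chunk):
    for x in chunk:
        yield (x, None)

def distinct_reducer(key, values):
    return key

# (d) Number of distinct integers: count the (already distinct) output of (c).
def count_mapper(chunk):
    yield ("count", len(chunk))

def count_reducer(key, values):
    return sum(values)

chunks = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3]]
print(run_mapreduce(chunks, max_mapper, max_reducer))  # {'max': 9}
print(run_mapreduce(chunks, avg_mapper, avg_reducer))  # {'avg': 3.9}
distinct = list(run_mapreduce(chunks, distinct_mapper, distinct_reducer).values())
print(run_mapreduce([distinct], count_mapper, count_reducer))  # {'count': 7}
```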

Our formulation of matrix-vector multiplication assumed that the matrix M was square.
Generalize the algorithm to the case where M is an r-by-c matrix for some number of
rows r and columns c.
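A minimal sketch of the generalized algorithm, assuming M is stored as sparse (i, j, m_ij) triples and that the vector v (of length c) fits in memory at every Map task; the function name `matvec_mapreduce` is hypothetical. Only the index ranges change: j runs over c columns and the result has one key per row i.

```python
from collections import defaultdict

def matvec_mapreduce(entries, v):
    """Multiply an r-by-c matrix M, given as (i, j, m_ij) triples, by v.

    v must have length c and is assumed to be available to every Map task.
    """
    # Map: for each entry m_ij emit the key-value pair (i, m_ij * v_j).
    groups = defaultdict(list)
    for i, j, m_ij in entries:
        groups[i].append(m_ij * v[j])
    # Reduce: x_i is the sum of all values associated with key i.
    return {i: sum(values) for i, values in groups.items()}

# The 2-by-3 matrix [[1, 2, 0], [0, 3, 4]] times v = [1, 1, 2].
M = [(0, 0, 1), (0, 1, 2), (1, 1, 3), (1, 2, 4)]
print(matvec_mapreduce(M, [1, 1, 2]))  # {0: 3, 1: 11}
```

A row with no nonzero entries produces no key, so a missing key should be read as x_i = 0.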

In the form of relational algebra implemented in SQL, relations are not sets, but
bags; that is, tuples are allowed to appear more than
once. There are extended definitions of union, intersection, and difference for
bags, which we shall define below. Write MapReduce algorithms for computing
the following operations on bags R and S:
(a) Bag Union, defined to be the bag of tuples in which tuple t appears the
sum of the numbers of times it appears in R and S.
(b) Bag Intersection, defined to be the bag of tuples in which tuple t appears
the minimum of the numbers of times it appears in R and S.
(c) Bag Difference, defined to be the bag of tuples in which the number of
times a tuple t appears is equal to the number of times it appears in R
minus the number of times it appears in S. A tuple that appears more
times in S than in R does not appear in the difference.
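All three bag operations can share one Map function that tags each occurrence with the relation it came from; only the reducer's arithmetic differs. The sketch below simulates this in one process, with the hypothetical name `bag_ops` and tuples represented as Python tuples.

```python
from collections import defaultdict

def bag_ops(R, S):
    """Simulate Map/Reduce for bag union, intersection, and difference.

    Map: every occurrence of tuple t in R emits (t, "R"); in S, (t, "S").
    Reduce: for key t, count the "R" and "S" values and emit t the
    required number of times for each operation.
    """
    groups = defaultdict(list)
    for t in R:
        groups[t].append("R")
    for t in S:
        groups[t].append("S")
    union, inter, diff = [], [], []
    for t, tags in groups.items():
        n_r, n_s = tags.count("R"), tags.count("S")
        union += [t] * (n_r + n_s)
        inter += [t] * min(n_r, n_s)
        diff += [t] * max(n_r - n_s, 0)  # never a negative number of copies
    return union, inter, diff

R = [(1, 2), (1, 2), (3, 4)]
S = [(1, 2), (5, 6)]
union, inter, diff = bag_ops(R, S)
print(sorted(union))  # [(1, 2), (1, 2), (1, 2), (3, 4), (5, 6)]
print(inter)          # [(1, 2)]
print(sorted(diff))   # [(1, 2), (3, 4)]
```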

Selection can also be performed on bags. Give a MapReduce implementation that
produces the proper number of copies of each tuple t that passes the selection
condition. That is, produce key-value pairs from which the correct result of the
selection can be obtained easily from the values.
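One way to preserve multiplicities is to have the Map function emit (t, t) once per occurrence that passes the condition, so the reducer for key t receives exactly as many copies as it must output. A single-process sketch, with the hypothetical name `select_bag`:

```python
from collections import defaultdict

def select_bag(R, condition):
    """Bag selection sketch.

    Map: emit (t, t) for each occurrence of t that satisfies the condition.
    Reduce: emit every value; one value per occurrence keeps multiplicities.
    """
    groups = defaultdict(list)
    for t in R:  # Map
        if condition(t):
            groups[t].append(t)
    return [v for values in groups.values() for v in values]  # Reduce

R = [(1, "a"), (1, "a"), (2, "b"), (3, "a")]
print(select_bag(R, lambda t: t[1] == "a"))  # [(1, 'a'), (1, 'a'), (3, 'a')]
```

A variant emits (t, 1) instead and lets the reducer output t as many times as the values sum to; that sends less data when tuples are wide.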

The relational-algebra operation $R(A, B) \bowtie_{B<C} S(C, D)$ produces all tuples
(a, b, c, d) such that tuple (a, b) is in relation R, tuple (c, d) is in S, and
b < c. Give a MapReduce implementation of this operation, assuming R and S are sets.
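Because the condition b < c is an inequality, hashing on B and C would not bring matching tuples together. One approach, sketched here as an assumption rather than the book's method, is range partitioning: each S tuple goes to the reducer owning c's range, and each R tuple goes to b's range and every higher range. The boundary list `splits` and the name `theta_join_less` are hypothetical.

```python
import bisect
from collections import defaultdict

def theta_join_less(R, S, splits):
    """Sketch of R(A,B) join_{B<C} S(C,D) by range partitioning.

    `splits` is a sorted list of boundaries dividing the value domain into
    len(splits) + 1 ranges, one reducer per range (an assumed choice; a real
    job might pick boundaries from a sample of the data).
    """
    k = len(splits) + 1
    reducers = defaultdict(lambda: ([], []))
    # Map S: each (c, d) goes only to the reducer whose range contains c.
    for c, d in S:
        reducers[bisect.bisect_right(splits, c)][1].append((c, d))
    # Map R: each (a, b) goes to b's range and all higher ranges, since
    # any c > b lies in one of them.
    for a, b in R:
        for r in range(bisect.bisect_right(splits, b), k):
            reducers[r][0].append((a, b))
    # Reduce: join locally. A matching pair meets only at c's reducer,
    # so each output tuple is produced exactly once.
    out = []
    for r_tuples, s_tuples in reducers.values():
        out += [(a, b, c, d) for a, b in r_tuples for c, d in s_tuples if b < c]
    return out

R = [(1, 2), (2, 5)]
S = [(3, "x"), (6, "y"), (1, "z")]
print(sorted(theta_join_less(R, S, splits=[3])))
# [(1, 2, 3, 'x'), (1, 2, 6, 'y'), (2, 5, 6, 'y')]
```

The price of this scheme is that R tuples with small b are replicated to up to k reducers.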

Suppose a job consists of n tasks, each of which takes time t seconds. Thus, if
there are no failures, the sum over all compute nodes of the
time taken to execute tasks at that node is nt. Suppose also that the probability
of a task failing is p per job per second, and when a task fails, the overhead of
management of the restart is such that it adds 10t seconds to the total execution
time of the job. What is the total expected execution time of the job?

Suppose a Pregel job has a probability p of a failure during any superstep.
Suppose also that the execution time (summed over all compute
nodes) of taking a checkpoint is c times the time it takes to execute a superstep.
To minimize the expected execution time of the job, how many supersteps
should elapse between checkpoints?
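The trade-off can be explored numerically once a failure model is fixed. The model below is an assumption, not part of the exercise: each superstep fails independently with probability p (0 < p < 1), a failure discards all work since the last checkpoint, a failed superstep costs a full superstep, and checkpoints themselves never fail. Under it, the expected number of supersteps executed to get k consecutive successes with q = 1 - p is (1 - q^k)/(p q^k). The names `cost_per_superstep` and `best_interval` are hypothetical.

```python
def cost_per_superstep(k, p, c):
    """Expected work per useful superstep when checkpointing every k supersteps.

    Work is measured in units of one superstep; a checkpoint costs c units.
    Assumes the failure model stated in the lead-in (0 < p < 1).
    """
    q = 1.0 - p
    # Expected supersteps executed until k consecutive supersteps succeed.
    attempts = (1.0 - q**k) / (p * q**k)
    return (attempts + c) / k

def best_interval(p, c):
    """Increase k while the cost keeps falling.

    In this model the cost is unimodal in k, so the first k where it stops
    decreasing is the minimizer.
    """
    k = 1
    while cost_per_superstep(k + 1, p, c) < cost_per_superstep(k, p, c):
        k += 1
    return k

print(best_interval(0.1, 0.0))  # 1
print(best_interval(0.01, 10.0))
```

Changing the model (for example, letting checkpoints fail) changes the numbers, so this is a way to sanity-check an analytic answer rather than an answer itself.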

What is the communication cost of each of the following algorithms, as a function
of the size of the relations, matrices, or vectors to which they are applied?
(a) The matrix-vector multiplication algorithm of Section 2.3.2.
(b) The union algorithm of Section 2.3.6.
(c) The aggregation algorithm of Section 2.3.8.

Suppose relations R, S, and T have sizes r, s, and t, respectively, and we want to
take the 3-way join $R(A, B) \bowtie S(B, C) \bowtie T(A, C)$, using k reducers.
We shall hash values of attributes A, B, and C to a, b, and c buckets, respectively,
where abc = k. Each reducer is associated with a vector of buckets, one for each of
the three hash functions. Find, as a function of r, s, t, and k, the values of a, b,
and c that minimize the communication cost of the algorithm.
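A brute-force search over integer factorizations of k is a handy check on any closed-form answer. It assumes the usual cost model for this kind of join: an R(A, B) tuple knows its A- and B-buckets, so it must be sent to all c buckets of C; likewise S(B, C) goes to a reducers and T(A, C) to b reducers, giving cost r·c + s·a + t·b. The name `best_shares` is hypothetical.

```python
def best_shares(r, s, t, k):
    """Return (cost, a, b, c) minimizing r*c + s*a + t*b subject to a*b*c == k.

    Cost model (assumed): each tuple is replicated over the buckets of the
    one attribute it lacks.
    """
    best = None
    for a in range(1, k + 1):
        if k % a:
            continue
        for b in range(1, k // a + 1):
            if (k // a) % b:
                continue
            c = k // (a * b)
            cost = r * c + s * a + t * b
            if best is None or cost < best[0]:
                best = (cost, a, b, c)
    return best

print(best_shares(1, 1, 1, 8))     # (6, 2, 2, 2)
print(best_shares(100, 1, 1, 16))  # (108, 4, 4, 1)
```

For real-valued a, b, and c, a Lagrange-multiplier argument makes the three terms r·c, s·a, and t·b equal at the optimum; comparing that with the integer search on small k is a quick consistency check.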

Suppose we take a star join of a fact table $F(A_1, A_2, \ldots, A_m)$ with dimension
tables $D_i(A_i, B_i)$ for $i = 1, 2, \ldots, m$. Let there be k reducers, each
associated with a vector of buckets, one for each of the key attributes
$A_1, A_2, \ldots, A_m$. Suppose the number of buckets into which we hash $A_i$ is
$a_i$. Naturally, $a_1 a_2 \cdots a_m = k$. Finally, suppose each dimension table
$D_i$ has size $d_i$, and the size of the fact table is much larger than any of these
sizes. Find the values of the $a_i$'s that minimize the cost of taking the star join
as one MapReduce operation.
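One way to set up the optimization, assuming that each fact tuple knows all of $A_1, \ldots, A_m$ and so reaches exactly one reducer (a fixed cost $|F|$), that a $D_i$ tuple is replicated to the $k/a_i$ reducers that agree on its $A_i$-bucket, and treating the $a_i$ as real numbers:

```latex
\begin{aligned}
\text{cost}(a_1,\dots,a_m) &= |F| + \sum_{i=1}^{m} \frac{d_i\,k}{a_i},
  \qquad \text{subject to } \sum_{i=1}^{m} \ln a_i = \ln k,\\
\frac{\partial}{\partial a_i}\Big(\sum_{j=1}^{m} \frac{d_j\,k}{a_j}
  + \lambda \sum_{j=1}^{m} \ln a_j\Big) = 0
  &\;\Longrightarrow\; \frac{d_i\,k}{a_i} = \lambda \quad \text{for all } i,\\
a_i = x\,d_i \text{ with } x = \frac{k}{\lambda},\quad x^m \prod_{j=1}^{m} d_j = k
  &\;\Longrightarrow\; a_i = d_i \left(\frac{k}{\prod_{j=1}^{m} d_j}\right)^{1/m}.
\end{aligned}
```

In practice each $a_i$ must be a positive integer, so a value from this continuous relaxation would be rounded while keeping the product equal to k.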

Describe the graphs that model the following problems.

(a) The multiplication of an n × n matrix by a vector of length n.
(b) The natural join of R(A, B) and S(B, C), where A, B, and C have domains of
sizes a, b, and c, respectively.
(c) The grouping and aggregation on the relation R(A, B), where A is the
grouping attribute and B is aggregated by the MAX operation. Assume
A and B have domains of size a and b, respectively.
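For part (a), the graph can be built explicitly: inputs are the matrix entries and the vector entries, outputs are the components of the product, and an edge joins an input to each output that needs it, since $x_i = \sum_j m_{ij} v_j$. The node labels ("m", i, j), ("v", j), ("x", i) and the name `matvec_graph` are hypothetical.

```python
def matvec_graph(n):
    """Input-output graph for an n-by-n matrix times a length-n vector.

    Returns a set of (input, output) edges.
    """
    edges = set()
    for i in range(n):
        for j in range(n):
            edges.add((("m", i, j), ("x", i)))  # m_ij is needed only for x_i
            edges.add((("v", j), ("x", i)))     # v_j is needed for every x_i
    return edges

g = matvec_graph(3)
print(len(g))  # 18
```

Each output is connected to 2n inputs (its row of the matrix plus the whole vector), so the graph has $2n^2$ edges.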

Provide the details of the proof that a one-pass matrix-multiplication algorithm
requires replication rate at least $r \ge 2n^2/q$, including:
(a) The proof that, for a fixed reducer size, the maximum number of outputs is
covered by a reducer when that reducer receives an equal number of rows of M and
columns of N.
(b) The algebraic manipulation needed, starting with $\sum_{i=1}^{k} q_i^2 \ge 4n^4$.
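A sketch of the algebra in part (b), assuming n-by-n matrices M and N (so $2n^2$ inputs and $n^2$ outputs), k reducers where reducer i receives $q_i \le q$ inputs, and the bound from part (a) that such a reducer covers at most $(q_i/2n)^2 = q_i^2/(4n^2)$ outputs:

```latex
\begin{aligned}
\sum_{i=1}^{k} \frac{q_i^2}{4n^2} \ge n^2
  &\;\Longrightarrow\; \sum_{i=1}^{k} q_i^2 \ge 4n^4,\\
q_i \le q \;\Longrightarrow\; \sum_{i=1}^{k} q_i^2 \le q \sum_{i=1}^{k} q_i
  &\;\Longrightarrow\; \sum_{i=1}^{k} q_i \ge \frac{4n^4}{q},\\
r = \frac{1}{2n^2} \sum_{i=1}^{k} q_i
  &\;\ge\; \frac{4n^4}{2n^2\,q} = \frac{2n^2}{q}.
\end{aligned}
```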

Suppose our inputs are bit strings of length b, and the outputs correspond to pairs
of strings at Hamming distance 1.
(a) Prove that a reducer of size q can cover at most $(q/2)\log_2 q$ outputs.
(b) Use part (a) to show the lower bound on replication rate: $r \ge b/\log_2 q$.
(c) Show that there are algorithms with replication rate as given by part (b) for
the cases $q = 2$, $q = 2^b$, and $q = 2^{b/2}$.
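For the case $q = 2^{b/2}$ (b even), one algorithm sends each string to two reducers, one keyed by its left half and one keyed by its right half. Two strings at Hamming distance 1 differ in exactly one half, so they agree on the other half and meet at that half's reducer. The replication rate is 2, which equals $b/\log_2 2^{b/2}$. The sketch below covers only this case; the name `hamming1_by_halves` is hypothetical and strings are encoded as b-bit integers.

```python
from itertools import combinations

def hamming1_by_halves(b):
    """Find all pairs at Hamming distance 1 among b-bit strings (b even).

    Returns (pairs, largest reducer size, replication rate).
    """
    half = b // 2
    mask = (1 << half) - 1
    reducers = {}
    for w in range(1 << b):  # Map: two key-value pairs per input string
        reducers.setdefault(("left", w >> half), []).append(w)
        reducers.setdefault(("right", w & mask), []).append(w)
    outputs = set()
    for members in reducers.values():  # Reduce: compare within each group
        for u, v in combinations(members, 2):
            if bin(u ^ v).count("1") == 1:
                outputs.add((min(u, v), max(u, v)))
    max_size = max(len(m) for m in reducers.values())
    replication = sum(len(m) for m in reducers.values()) / (1 << b)
    return outputs, max_size, replication

pairs, q, rate = hamming1_by_halves(4)
print(len(pairs), q, rate)  # 32 4 2.0
```

There are $b \cdot 2^{b-1}$ pairs at distance 1 among the $2^b$ strings, which gives 32 for b = 4.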

You might also like
Smart Interviews (1 page)
Ch2. Solution Manual - The Design and Analysis of Algorithm - Levitin (51 pages)
PDDL: Knight Tour With Holes: Urukh-Ai (14 pages)
CO-1: Tutorials Tutorial-1 (9 pages)
quiz paper (10 pages)
Computational Tools DTU Presentation Week3 (33 pages)
2015-Spr (4 pages)
Job (4 pages)
AA Exam 2022 (3 pages)
Model Questions JRF CS 2025 CSB (9 pages)
GATE - Computer Science (CS) - 1998 Exam Paper (17 pages)
Extra Problems (41 pages)
Experiment No. 2 (22 pages)
Midterm 02 Solutions (10 pages)
Combined Exam 24.11.2020 (13 pages)
QUESTIONS Dynamic Programming (6 pages)
discrete Structure 1 (21 pages)
Solution Manual for Introduction to the Design and Analysis of Algorithms, 3/E 3rd Edition Anany Levitin - Download Today With Full Content (47 pages)
Homework 7 (5 pages)
Ai Graduate Sample Question Paper (3 pages)
Unit Iii & Iv (4 pages)
Ugc Net - January 2017 Paper-II (11 pages)
Database Exam, Second Term 2023: Database CS604 FinalExam 2022 2023 T2 v1 Model Online (12 pages)
Quiz Week 02 (2 pages)
Section8 Mapreduce Solution PDF (5 pages)
Competitive Programming Algorithms (1 page)
Solution Manual for Introduction to the Design and Analysis of Algorithms, 3/E 3rd Edition Anany Levitindownload (59 pages)
End Sem Paper (2 pages)
All chapter download Solution Manual for Introduction to the Design and Analysis of Algorithms, 3/E 3rd Edition Anany Levitin (56 pages)
Mining Massive Data University of Primorska Fall 2020 (2 pages)
Screenshot 2024-07-09 at 8.25.19 PM (2 pages)
Assignment 1 - 2024 (3 pages)
Vector Spaces and Linear Transformations. Basic Concepts and Theorems (7 pages)
Class 12 computer (7 pages)
Chapter 2 of The Book "Introduction To The Design and Analysis of Algorithms," 2nd Edition (51 pages)
DMS Unitwise Last 3 Years Questions (10 pages)
MC0063 Discrete Mathematics Model Question Paper (26 pages)
Solution Manual for Introduction to the Design and Analysis of Algorithms, 3/E 3rd Edition Anany Levitin instant download (54 pages)
CSCE 3110 Data Structures & Algorithm Analysis: Rada Mihalcea (19 pages)
TOC Question Bank (95 pages)
1st Semester Exams - Discrete Structures - April 2019 - DIT (6 pages)
CS-GATE - Solved PYQ (38 pages)
HW 1 (5 pages)
Cuet Pg 2021 With Solution (8 pages)
UCT501_MST_24 (2 pages)
Problem Set: The 33 ACM International Collegiate Programming Contest Asia Regional Contest - Hefei (15 pages)
Hash Solution (3 pages)
Mathematical Foundations of Computer Science: Unit-I (7 pages)
CS 106 Tutorials 1 9 (11 pages)
CS218-Data Structures Final Exam (7 pages)
Data Structure and Algorithms II - Final - Summer 2023 (4 pages)
Session+9+and+10 (32 pages)
Quiz1 Solution (1) (4 pages)
University of Toronto Faculty of Applied Science and Engineering Aps106 Midterm Ii - March 27, 2014 (5 pages)
Trifocal Tensor: Exploring Depth, Motion, and Structure in Computer Vision, by Fouad Sabry (Everand)
Numerical Analysis II Essentials, by The Editors of REA (Everand)
Mathematical Functions, by Oliver Linton (Everand)
Introduction to PHP, Part 2, Second Edition, by Adam Majczak (Everand)
C# Functions and Tutorial - 50 Examples, by Nino Paiotta (Everand)
Graphs with MATLAB (Taken from "MATLAB for Beginners: A Gentle Approach"), by Peter Kattan (Everand)
Standard-Slope Integration: A New Approach to Numerical Integration, by Peter James Italia, MD (Everand)
Analytical Modeling of Solute Transport in Groundwater: Using Models to Understand the Effect of Natural Processes on Contaminant Fate and Transport, by Mark Goltz (Everand)
Quick Sort (5 pages)
Unit 5 (95 pages)
Formal Logic Its Scope and Limits (Richard C. Jeffrey) (Z-Library) (232 pages)
[5] Brouwer's Cambridge Lectures on Intuitionism (10 pages)
Language of Relations and Functions (17 pages)
Module 2 Operators (18 pages)
Presantation - Chapter 06 - Brute Force and Exhaustive Search (68 pages)
Determinant and Adjoint of Fuzzy Neutrosophic Soft Matrices (17 pages)
1. Let F be a σ-algebra of subsets of Ω.: Ubmit The First Four Problems Only (2 pages)
Unit 4 Digital Logic Gates (39 pages)
2 2023 Lesson Plan CS2012 DAA (4 pages)
5 PLC Program To Implement A Combinational Logic Circuit (4 pages)
CH-3 Syntax Analyzer (41 pages)
Math 112 Module 11 Derivative of Inverse Trigonometric Functions (15 pages)
Introduction To Number Theory (95 pages)
NP Hard and NP Complete (15 pages)
Propositions: Chapter 1: The Foundations: Logic and Proofs (21 pages)
Chapter 3: Expressions and Interactivity: Starting Out With C++ Early Objects Seventh Edition (45 pages)
Boole (1 page)
Elementary Number Theory and Methods of Proof: CSE 215, Foundations of Computer Science Stony Brook University (52 pages)
Introduction To Sets: Basic, Essential, and Important Properties of Sets (41 pages)
Operation Research Quiz: A. Destinations, Sources (8 pages)
Mid Term Discreet Math (2 pages)
Problem1-Replace Each Emelent With Product of Remainig Element Except Self (2 pages)
GameAI FuzzyLogic (33 pages)
Lesson 6 Higher Derivatives and Implicit Differentiation (17 pages)
Kakatiya Institute of Technology & Science, Warangal (4 pages)
Activity 4 - MathLogic (2 pages)
