Module 3

The document presents educational material on Big Data Analytics, specifically focusing on Matrix-Vector and Matrix-Matrix multiplication using MapReduce. It outlines the problem statements, methods for handling large matrices, and provides pseudocode examples for implementing these operations. Additionally, it includes university questions related to MapReduce concepts and functions.

Uploaded by

Biya Rahul
Copyright
© All Rights Reserved

The material in this presentation belongs to St. Francis Institute of Technology and is solely for educational purposes.

Distribution and modification of the content are prohibited.

BIG DATA ANALYTICS (BDA)


ITDO8011

Subject In-charge
Sonali Suryawanshi
Assistant Professor, Department of Information Technology, SFIT
Room No. 328
email: [email protected]

St. Francis Institute of Technology


Department of Information Technology

Matrix-Vector Multiplication by MapReduce

Google implementation of MapReduce

• Originally created to execute very large matrix-vector multiplications.
• In the ranking of Web pages done by search engines, the matrix dimension n is in the tens of billions.
• PageRank is an iterative algorithm that repeats this multiplication.
• Also useful for simple (memory-based) recommender systems.


Problem Statement
• Given:
– an n × n matrix M, whose element in row i and column j is denoted m_ij;
– a vector v of length n, whose jth element is denoted v_j.
• Compute the matrix-vector product x = Mv, the vector of length n whose ith element is x_i = Σ_j m_ij · v_j, with j running from 1 to n.

Assume that
– the row-column coordinates of each matrix element are discoverable, either from its position in the file or because it is stored with explicit coordinates, as a triple (i, j, m_ij);
– the position of element v_j in the vector v is discoverable in the analogous way.

Case 1: n is large, but not so large that vector v cannot fit in main memory

● The Map Function: written to apply to one element of M.
● v is first read into the compute node executing a Map task and is available to all applications of the Map function at that node.
● Each Map task operates on a chunk of the matrix M.
● From each matrix element m_ij it produces the key-value pair (i, m_ij · v_j).
● Thus, all terms of the sum that make up the component x_i of the matrix-vector product get the same key, i.


● The Reduce Function:
● The Reduce function simply sums all the values associated with a given key i.
● The result is a pair (i, x_i).
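
The Map and Reduce functions of Case 1 can be sketched in plain Python as follows. This is a minimal single-process illustration, not a Hadoop job: the function names (`map_element`, `reduce_row`, `matrix_vector`), the 0-based indices, and the in-memory grouping that stands in for the shuffle phase are all assumptions made for the example.

```python
from collections import defaultdict

def map_element(i, j, m_ij, v):
    """Map: turn one matrix element into the pair (i, m_ij * v_j)."""
    return (i, m_ij * v[j])

def reduce_row(i, values):
    """Reduce: sum every value that shares key i, giving (i, x_i)."""
    return (i, sum(values))

def matrix_vector(triples, v):
    """Run Map over all (i, j, m_ij) triples, group by key, then Reduce."""
    groups = defaultdict(list)  # stands in for the shuffle/group-by-key phase
    for i, j, m_ij in triples:
        key, value = map_element(i, j, m_ij, v)
        groups[key].append(value)
    return dict(reduce_row(i, values) for i, values in groups.items())

# M = [[1, 2], [3, 4]] stored as (i, j, m_ij) triples, v = [5, 6]
M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
v = [5, 6]
x = matrix_vector(M, v)
print(x)  # {0: 17, 1: 39}
```

Each key in the result is a row index i and each value is the component x_i, so row 0 gives 1·5 + 2·6 = 17 and row 1 gives 3·5 + 4·6 = 39.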


Case 2: n is so large that vector v cannot fit in main memory

● The part of v used by a Map task should still be held in memory at the compute node running that task.
● Divide the matrix into vertical stripes of equal width and divide the vector into an equal number of horizontal stripes of the same height.
● The goal is to use enough stripes that the portion of the vector in one stripe fits conveniently into main memory at a compute node.


● The ith stripe of the matrix multiplies only components from the ith
stripe of the vector.
● Divide the matrix into one file for each stripe, and do the same for the
vector.
● Each Map task is assigned a chunk from one of the stripes of the
matrix and gets the entire corresponding stripe of the vector.
● The Map and Reduce tasks can then act exactly as described for Case 1.
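
The striping idea of Case 2 can be sketched the same way. The code below is an illustrative single-process version under stated assumptions: the helper names (`split_into_stripes`, `map_stripe`, `striped_matrix_vector`), the `width` parameter, and the 0-based indices are hypothetical, and a real job would place each stripe in its own file rather than in a dictionary.

```python
from collections import defaultdict

def split_into_stripes(triples, v, width):
    """Cut M into vertical stripes and v into matching horizontal stripes."""
    matrix_stripes = defaultdict(list)
    for i, j, m_ij in triples:
        matrix_stripes[j // width].append((i, j, m_ij))
    vector_stripes = {s: v[s * width:(s + 1) * width] for s in matrix_stripes}
    return matrix_stripes, vector_stripes

def map_stripe(stripe_id, chunk, v_stripe, width):
    """Map task: sees a chunk of one matrix stripe and only that stripe of v."""
    offset = stripe_id * width  # first column index covered by this stripe
    for i, j, m_ij in chunk:
        yield (i, m_ij * v_stripe[j - offset])

def striped_matrix_vector(triples, v, width):
    matrix_stripes, vector_stripes = split_into_stripes(triples, v, width)
    sums = defaultdict(int)
    for s, chunk in matrix_stripes.items():
        for i, term in map_stripe(s, chunk, vector_stripes[s], width):
            sums[i] += term  # Reduce: sum all terms that share key i
    return dict(sums)

# M = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] as triples, v = [1, 2, 3], stripes 2 columns wide
M3 = [(0, 0, 1), (0, 2, 2), (1, 1, 3), (2, 0, 4), (2, 2, 5)]
v3 = [1, 2, 3]
result = striped_matrix_vector(M3, v3, width=2)
print(result)  # {0: 7, 1: 6, 2: 19}
```

No Map task ever touches more than `width` components of v, which is the point of the stripes; the Reduce side is unchanged from Case 1.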



Matrix-Matrix Multiplication:

2-Step MapReduce Multiplication


Let
M be a matrix whose element in row i and column j is m_ij,
N be a matrix whose element in row j and column k is n_jk,
P = MN be the product, whose element in row i and column k is p_ik
(the number of columns of M must equal the number of rows of N). Then
p_ik = Σ_j m_ij · n_jk, where j ranges over the columns of M.
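
The slides that walked through the two steps did not survive extraction, so the sketch below follows the commonly taught two-step formulation rather than the slides themselves: step 1 groups elements by the shared index j and emits the products m_ij · n_jk keyed by (i, k); step 2 sums the products for each (i, k). All function names, the "M"/"N" tags, and the 0-based indices are assumptions made for the example.

```python
from collections import defaultdict

def step1_map(name, row, col, value):
    """Step 1 Map: key every element by the shared index j."""
    if name == "M":                       # m_ij: j is the column index
        return (col, ("M", row, value))
    return (row, ("N", col, value))       # n_jk: j is the row index

def step1_reduce(j, values):
    """Step 1 Reduce: for one j, emit ((i, k), m_ij * n_jk) for every pairing."""
    m_items = [(i, x) for tag, i, x in values if tag == "M"]
    n_items = [(k, y) for tag, k, y in values if tag == "N"]
    for i, m_ij in m_items:
        for k, n_jk in n_items:
            yield ((i, k), m_ij * n_jk)

def step2_reduce(key, products):
    """Step 2 Reduce: sum the products for one (i, k), giving p_ik."""
    return (key, sum(products))

def two_step_multiply(elements):
    by_j = defaultdict(list)
    for name, row, col, value in elements:
        j, tagged = step1_map(name, row, col, value)
        by_j[j].append(tagged)
    by_ik = defaultdict(list)
    for j, values in by_j.items():
        for key, product in step1_reduce(j, values):
            by_ik[key].append(product)    # step 2 Map is the identity
    return dict(step2_reduce(key, prods) for key, prods in by_ik.items())

# M = [[1, 2], [3, 4]] and N = [[5, 6], [7, 8]] as (matrix, row, col, value)
elements = [("M", 0, 0, 1), ("M", 0, 1, 2), ("M", 1, 0, 3), ("M", 1, 1, 4),
            ("N", 0, 0, 5), ("N", 0, 1, 6), ("N", 1, 0, 7), ("N", 1, 1, 8)]
P = two_step_multiply(elements)
```

For these inputs `P[(0, 0)]` is 1·5 + 2·7 = 19 and `P[(1, 1)]` is 3·6 + 4·8 = 50, matching the formula for p_ik.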


1-Step Matrix-Matrix Multiplication:


1. The Map and Reduce functions are code-heavy.
2. Used with sparse matrices, where most of the values are zero, so each stored value is kept in memory as a group of three elements (row, column, value); e.g. element m_11 with value 2 is stored as (1, 1, 2).
3. It puts more work on the mapper.
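
The three points above can be illustrated with a sketch of the commonly taught one-step formulation, since the slides showing the actual steps did not survive extraction. Here the mapper replicates each stored triple to every output cell (i, k) it contributes to, which is why the mapper does more work, and the reducer joins on j and sums. The function names, the "M"/"N" tags, the `rows_of_m` and `cols_of_n` parameters, and the 0-based indices are assumptions made for the example.

```python
from collections import defaultdict

def one_step_map(name, row, col, value, rows_of_m, cols_of_n):
    """Map: send each stored element to every output cell (i, k) that needs it."""
    if name == "M":   # m_ij contributes to p_ik for every column k of N
        for k in range(cols_of_n):
            yield ((row, k), ("M", col, value))
    else:             # n_jk contributes to p_ik for every row i of M
        for i in range(rows_of_m):
            yield ((i, col), ("N", row, value))

def one_step_reduce(key, values):
    """Reduce: match m_ij with n_jk on j and sum the products, giving p_ik."""
    m_by_j = {j: x for tag, j, x in values if tag == "M"}
    n_by_j = {j: y for tag, j, y in values if tag == "N"}
    # A j missing on either side is an unstored zero, so it adds nothing.
    return (key, sum(x * n_by_j[j] for j, x in m_by_j.items() if j in n_by_j))

def one_step_multiply(elements, rows_of_m, cols_of_n):
    groups = defaultdict(list)
    for name, row, col, value in elements:
        for key, tagged in one_step_map(name, row, col, value, rows_of_m, cols_of_n):
            groups[key].append(tagged)
    return dict(one_step_reduce(key, vals) for key, vals in groups.items())

# Sparse inputs: M = [[2, 0], [0, 3]], N = [[1, 4], [0, 5]], zeros not stored
sparse = [("M", 0, 0, 2), ("M", 1, 1, 3),
          ("N", 0, 0, 1), ("N", 0, 1, 4), ("N", 1, 1, 5)]
product = one_step_multiply(sparse, rows_of_m=2, cols_of_n=2)
```

With these inputs the mapper emits ten key-value pairs from five stored triples, and the reducer returns 2, 8, 0 and 15 for cells (0, 0), (0, 1), (1, 0) and (1, 1).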

University questions
Q.1) Explain the concept of MapReduce using an example. Write MapReduce pseudocode for group-by aggregation in a database. (10 Marks)
Q.2) What is MapReduce? Explain combiners using an example. (10 Marks)
Q.3) Write MapReduce pseudocode to multiply two matrices. Illustrate the procedure on the following matrices. Clearly show all the steps. (10 Marks)

Q.4) Write pseudocode for matrix-vector multiplication by MapReduce. Illustrate with an example showing all the steps. (10 Marks)

1. Which of the following options most aptly explains the reason behind the
creation of MapReduce?
a) Need to increase the processing power of new hardware
b) Need to perform complex analysis of structured data
c) Need to increase number of users
d) Need to spread distributed computing resources

2. In designing the MapReduce framework, which of the following needs did the engineers consider?
a) It should be cheap and distributed free of cost
b) Processing should expand and contract automatically
c) Processing should be stopped in the case of network failure
d) Developers should be able to create new languages

3. Which of the following describes the map function?
a) It processes data to create a list of key-value pairs
b) It indexes the data to list all the words occurring in it
c) It converts a relational database to key-value pairs
d) It tracks data across multiple tables and clusters in Hadoop

4. Which of the following describes the reduce function?


a) It analyzes the map function results to show the most frequently occurring
values
b) It combines the map function results to return a list of the best matches for
the query
c) It adds the results of the map function to convert the key-value pair lists to
columnar database
d) It processes map function results and creates a new key-value pair list to
answer the query

5. What role does the map function play in a word count query?
a) It sorts the words alphabetically and returns a list of the most frequently
used words
b) It creates a list with each word as a key and the number of occurrences as
the value
c) It creates a list with each word as a key and every occurrence as a value
d) It returns a list with each document as a key and the number of words in it
as the value
