Matrix-Vector Multiplication Using MapReduce in Big Data.


Snehal Nikam, Pallavi Girase, Prof. Dr. Manisha Abhyankar
Dept. of Computer Science

ABSTRACT

Matrix-vector multiplication is a fundamental operation with countless applications in computing and scientific computing, so efficient algorithms for it are of paramount importance. One popular operation on big data is matrix multiplication, which has been addressed using many approaches. However, the sheer size of the matrix is often an issue: if the matrix is dense, then Ω(n²) time is certainly required for an n × n matrix. Recently, researchers have applied MapReduce as an alternative approach to this problem. In this paper, we review matrix multiplication of big data using MapReduce, including the techniques for performing matrix multiplication with MapReduce, their time complexity, and the number of mappers needed for each technique.

Keywords — Big Data; MapReduce; Matrix Multiplication; Real-time Analytics.

INTRODUCTION

Today's business intelligence systems increasingly demand support for the integration and analytics of big data. Matrix multiplication, a fundamental operation in linear algebra, has many related real-life applications. More and more enterprises face the challenges of handling fast-growing data volume and data diversity. Researchers have found many applications for matrices, and the extensive use of personal computers has increased their use in a wide variety of fields, such as economics, engineering, statistics, and other sciences. MapReduce is a parallel approach that consists of two sequential tasks, Map and Reduce, each implemented with several subtasks. There are many applications using MapReduce, such as MapReduce with k-means for remote-sensing image clustering, MapReduce with decision trees for classification, and MapReduce with expectation maximization for text filtering. MapReduce has also been utilized in real-time systems and for job scheduling.

DATA INTEGRATION AND ANALYTICS

Data processing, e.g., in business intelligence systems, typically consists of two stages: data integration and data analytics.

Data integration. Data integration typically refers to the process of extraction-transformation-load (ETL). Traditionally this process runs at a regular interval, such as daily, weekly, or monthly. In a real-time scenario, however, large amounts of small-sized data are loaded into the data warehouse, possibly at high velocity and in many parallel streams.

Data analytics. Data analytics follows the integration process; it analyzes the data residing in a data store, or in streams, to extract the insight and value of the data with the help of analytics tools. When the data size grows to web scale, a traditional analytics tool becomes inadequate: it may, for example, be unable to return the analytics results within a deadline, so the results lose their value.

BIG DATA ANALYSIS AND RESEARCH TRENDS

Big data is a hot research topic with many applications in which complex and large data must be analyzed. The number of published articles on this topic is shown in Table I and illustrated in Fig. 1. Research on this topic is increasing because big data appears almost everywhere nowadays: news articles, professional magazines, and social networks such as tweets, YouTube videos, and blog discussions. Google Scholar was used to extract the number of articles published each year, using the query string "Big Data" as an exact search term.

The time complexity of the brute-force algorithm is O(n³), since it must visit every element of the arrays being multiplied. Better approaches with lower time complexity than the brute-force algorithm have been proposed over the years; MapReduce is one of them.
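For reference, the brute-force algorithm whose O(n³) cost is discussed above can be sketched as a triple nested loop (a minimal Python sketch; representing matrices as nested lists is our own choice, not prescribed by the paper):

```python
def multiply(A, B):
    """Brute-force matrix multiplication: O(n^3) time for n x n inputs."""
    n, m, q = len(A), len(B), len(B[0])  # A is n x m, B is m x q
    assert all(len(row) == m for row in A), "columns of A must equal rows of B"
    # C[i][k] = sum over j of A[i][j] * B[j][k] -- every element is visited.
    C = [[0] * q for _ in range(n)]
    for i in range(n):
        for k in range(q):
            for j in range(m):
                C[i][k] += A[i][j] * B[j][k]
    return C

print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```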

ROLE OF MAPREDUCE IN BIG DATA

Parallel computation has long been used for matrix multiplication and has recently been complemented by MapReduce [14]. MapReduce is a framework for processing big data in parallel distributed environments. It consists of two sequential tasks, Map and Reduce:

 Map: takes a set of data and transforms it into another set of data organized as key-value pairs, so that each processor can work on a different subset of the data.
 Reduce: combines the values of identical keys to form the intended output. The reduce task always starts after the map task.
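The two tasks can be illustrated with a small framework-free sketch (plain Python standing in for a real MapReduce cluster; the word-count mapper and reducer are our illustrative choice, not an example from the paper):

```python
from itertools import groupby
from operator import itemgetter

def map_task(record):
    # Map: turn a raw input record into key-value pairs.
    for word in record.split():
        yield (word, 1)

def reduce_task(key, values):
    # Reduce: combine all values that share the same key.
    return (key, sum(values))

records = ["big data", "big matrix"]
pairs = [kv for r in records for kv in map_task(r)]
# Shuffle/sort stage: group identical keys before the reduce task starts.
pairs.sort(key=itemgetter(0))
result = [reduce_task(k, [v for _, v in group])
          for k, group in groupby(pairs, key=itemgetter(0))]
print(result)  # [('big', 2), ('data', 1), ('matrix', 1)]
```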

One of the major challenges facing big data analysis is multiplying matrices, which has been implemented using many approaches and frameworks such as MapReduce.

MATRIX MULTIPLICATION IN BIG DATA

Many problems are solved using matrix multiplication because it is an essential operation in linear algebra. Fig. 2 shows how matrices are multiplied to form the resulting matrix. In matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second matrix; the sizes of the first and second matrices are n × m and m × q, respectively.

MATRIX MULTIPLICATION USING MAPREDUCE

There are many applications that use matrix multiplication in which the matrices are considered big. Thus, finding an efficient matrix multiplication algorithm is a popular research topic. Time and cost are the main challenges of this problem, and several algorithms have been proposed in the literature to solve it.

MapReduce is a parallel framework for big data which, when applied to matrix multiplication, consists of two jobs:

 First job: the reduce task is inactive, while the map task is simply used to read the input file and create the pairs of elements to be multiplied.
 Second job: the map task performs the multiplication independently for each pair of elements, while the reduce task combines the results for each output element.
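Under our reading of this two-job description, the pipeline can be sketched as follows (plain Python in place of a real cluster; the sparse (matrix, row, column, value) input format and the join on the shared index j are our assumptions about how the pairing is done):

```python
from collections import defaultdict

# Sparse input: (matrix_name, row, col, value) tuples for A and B.
A = [('A', 1, 1, 1), ('A', 1, 2, 2), ('A', 2, 1, 3), ('A', 2, 2, 4)]
B = [('B', 1, 1, 5), ('B', 1, 2, 6), ('B', 2, 1, 7), ('B', 2, 2, 8)]

# --- Job 1: the reduce task is inactive; the map task just reads the
# input and keys each element by the index j on which A and B are joined.
groups = defaultdict(lambda: ([], []))
for name, r, c, v in A + B:
    if name == 'A':
        groups[c][0].append((r, v))   # A[r][c]: the join key is the column c = j
    else:
        groups[r][1].append((c, v))   # B[r][c]: the join key is the row r = j

# --- Job 2: the map task multiplies each matched pair independently...
partial = []  # map output: ((i, k), product)
for j, (a_side, b_side) in groups.items():
    for i, a in a_side:
        for k, b in b_side:
            partial.append(((i, k), a * b))

# ...and the reduce task combines the results for each output element.
result = defaultdict(int)
for key, product in partial:
    result[key] += product

print(sorted(result.items()))
# [((1, 1), 19), ((1, 2), 22), ((2, 1), 43), ((2, 2), 50)]
```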
EXAMPLE

MapReduce is a technique in which a huge program is subdivided into small tasks that run in parallel, to make computation faster and save time; it is mostly used in distributed systems. It has 2 important parts:

 Mapper: takes raw input data and organizes it into key-value pairs. For example, in a dictionary you search for the word "Data" and its associated meaning is "facts and statistics collected together for reference or analysis". Here the key is "Data" and the associated value is "facts and statistics collected together for reference or analysis".
 Reducer: is responsible for processing the mapped data in parallel and producing the final output.

So let us consider the following matrix multiplication example to visualize MapReduce:

[1 2]     [5 6]
[3 4]  ×  [7 8]

Here we can see that matrix A is a 2×2 matrix, which means the number of rows (i) = 2 and the number of columns (j) = 2. Matrix B is also a 2×2 matrix, where the number of rows (j) = 2 and the number of columns (k) = 2. Each cell of the matrices is labelled Aij and Bjk. For example, the element 3 in matrix A is called A21, i.e. 2nd row, 1st column. One-step matrix multiplication has 1 mapper and 1 reducer.

The formula for mapping matrices A and B is:

o Mapper for matrix A: (k, v) = ((i, k), (A, j, Aij)) for all k
o Mapper for matrix B: (k, v) = ((i, k), (B, j, Bjk)) for all i

o To compute the mapping for matrix A:

# Here all dimensions are 2; therefore when k=1, i can take the 2 values 1 and 2, and each case can in turn take the 2 values j=1 and j=2.

# Substituting all values in the formula:

k=1  i=1  j=1  ((1, 1), (A, 1, 1))
          j=2  ((1, 1), (A, 2, 2))
     i=2  j=1  ((2, 1), (A, 1, 3))
          j=2  ((2, 1), (A, 2, 4))
k=2  i=1  j=1  ((1, 2), (A, 1, 1))
          j=2  ((1, 2), (A, 2, 2))
     i=2  j=1  ((2, 2), (A, 1, 3))
          j=2  ((2, 2), (A, 2, 4))

o To compute the mapping for matrix B:

i=1  j=1  k=1  ((1, 1), (B, 1, 5))
          k=2  ((1, 2), (B, 1, 6))
     j=2  k=1  ((1, 1), (B, 2, 7))
          k=2  ((1, 2), (B, 2, 8))
i=2  j=1  k=1  ((2, 1), (B, 1, 5))
          k=2  ((2, 2), (B, 1, 6))
     j=2  k=1  ((2, 1), (B, 2, 7))
          k=2  ((2, 2), (B, 2, 8))

o We have now computed the mapping, so the next step is to reduce it.
o The formula for reducing is:

Reducer(k, v): for each key (i, k), make sorted Alist and Blist
(i, k) => Summation of (Aij × Bjk) over j
Output => ((i, k), sum)

 To compute the reducer:

# We can observe from the mapper computation that 4 keys occur repeatedly: (1, 1), (1, 2), (2, 1) and (2, 2).

# Make a separate list for matrix A and matrix B with the corresponding values taken from the mapper step above:

(1, 1) => Alist = {(A, 1, 1), (A, 2, 2)}
          Blist = {(B, 1, 5), (B, 2, 7)}

Now Aij × Bjk: [(1×5) + (2×7)] = 19   (i)

(1, 2) => Alist = {(A, 1, 1), (A, 2, 2)}
          Blist = {(B, 1, 6), (B, 2, 8)}

Now Aij × Bjk: [(1×6) + (2×8)] = 22   (ii)

(2, 1) => Alist = {(A, 1, 3), (A, 2, 4)}
          Blist = {(B, 1, 5), (B, 2, 7)}

Now Aij × Bjk: [(3×5) + (4×7)] = 43   (iii)

(2, 2) => Alist = {(A, 1, 3), (A, 2, 4)}
          Blist = {(B, 1, 6), (B, 2, 8)}

Now Aij × Bjk: [(3×6) + (4×8)] = 50   (iv)

From (i), (ii), (iii) and (iv) we can conclude that the final pairs are:

((1, 1), 19)
((1, 2), 22)
((2, 1), 43)
((2, 2), 50)

Therefore the final matrix is:

[19 22]
[43 50]
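The one-step scheme walked through above can be checked with a short simulation (plain Python in place of a MapReduce framework; the dictionary-based matrix encoding and the hard-coded 2×2 dimensions are our own choices):

```python
from collections import defaultdict

# Matrices from the example, stored as {(row, col): value}.
A = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
B = {(1, 1): 5, (1, 2): 6, (2, 1): 7, (2, 2): 8}
n = q = 2  # A is n x m and B is m x q; here every dimension is 2

mapped = []
# Mapper for matrix A: emit ((i, k), ('A', j, A[i][j])) for all k.
for (i, j), a in A.items():
    for k in range(1, q + 1):
        mapped.append(((i, k), ('A', j, a)))
# Mapper for matrix B: emit ((i, k), ('B', j, B[j][k])) for all i.
for (j, k), b in B.items():
    for i in range(1, n + 1):
        mapped.append(((i, k), ('B', j, b)))

# Reducer: for each key (i, k), build Alist and Blist indexed by j,
# then sum A[i][j] * B[j][k] over j.
cells = defaultdict(dict)
for (i, k), (name, j, v) in mapped:
    cells[(i, k)].setdefault(name, {})[j] = v
result = {key: sum(vals['A'][j] * vals['B'][j] for j in vals['A'])
          for key, vals in cells.items()}

print(sorted(result.items()))
# [((1, 1), 19), ((1, 2), 22), ((2, 1), 43), ((2, 2), 50)]
```

This reproduces the four reduced pairs derived by hand in the example.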
CONCLUSION

With the arrival of the big data era, handling large amounts of data is challenging. The power of algebraic matrix multiplication algorithms stems from the redundancy in multiplying different pairs from the same collections of vectors. Such redundancy is not present in matrix-vector multiplication. We observed that articles are increasingly published in each of these three research areas, while their combination remains a recent research area in which only four papers have been presented; these are discussed in this paper. We concluded that the column-by-row technique, with a time complexity of O(n) and n mappers, is perhaps the best technique, while element-by-column-block and column-block-by-row-block are moderately acceptable ones that compromise between the time complexity of the algorithm and the number of mappers.

REFERENCES

[1] https://www.researchgate.net/publication/220779373_Matrix-vector_multiplication_in_sub-quadratic_time_some_preprocessing_required
[2] https://arxiv.org/pdf/1805.11938.pdf
[3] https://www.geeksforgeeks.org/matrix-multiplication-with-1-mapreduce-step/
[4] https://www.researchgate.net/publication/322872479_Matrix_multiplication_of_big_data_using_MapReduce_A_review
[5] https://www.researchgate.net/publication/266660592_Survey_of_real-time_processing_systems_for_big_data
[6] https://www.sciencedirect.com/science/article/pii/S2444883417300268
[7] https://lendap.wordpress.com/2015/02/16/matrix-multiplication-with-mapreduce/
[8] http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/9-parallel/matrix-mult.html
[9] https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00362-1
[10] https://www.sciencedirect.com/science/article/pii/S0747717108800132