Matrix-Vector Multiplication Using MapReduce in Big Data.


Snehal Nikam, Pallavi Girase, Prof. Dr. Manisha Abhyankar
Dept. of Computer Science

ABSTRACT

Matrix-vector multiplication is a fundamental operation with countless applications in computing and scientific computing, so efficient algorithms for it are of paramount importance. One popular operation on big data is matrix multiplication, which has been addressed using many approaches. However, the sheer size of the matrix is often an issue: if the matrix is dense, then Ω(n²) time is certainly required for an n × n matrix. Recently, researchers have applied MapReduce as an alternative approach to this problem. In this paper, we review matrix multiplication of big data using MapReduce, including the techniques for performing matrix multiplication with MapReduce, their time complexity, and the number of mappers needed for each technique.

Keywords — Big Data; MapReduce; Matrix Multiplication; Real-time Analytics.

INTRODUCTION

Today's business intelligence systems increasingly demand support for the integration and analytics of big data. Matrix multiplication, a fundamental operation in linear algebra, has many related real-life applications. More and more enterprises face the challenges of handling fast-growing data volume and data diversity. Researchers have found many applications for matrices, and the extensive use of personal computers has increased their use in a wide variety of fields, such as economics, engineering, statistics, and other sciences. MapReduce is a parallel approach that consists of two sequential tasks, Map and Reduce, each implemented with several subtasks. There are many applications using MapReduce, such as MapReduce with k-means for remote-sensing image clustering, MapReduce with decision trees for classification, and MapReduce with expectation maximization for text filtering. MapReduce has also been utilized in real-time systems and for job scheduling.

DATA INTEGRATION AND ANALYTICS

Data processing, e.g., in business intelligence systems, typically consists of two stages: data integration and data analytics.

Data integration. Data integration typically refers to the process of extraction-transformation-load (ETL). Traditionally this process runs at a regular interval, such as daily, weekly, or monthly. In a real-time scenario, however, large amounts of small-sized data are loaded into the data warehouse, possibly at high velocity and in many parallel streams.

Data analytics. Data analytics follows the integration process; it analyzes the data residing in a data store, or in streams, to extract the insight and value of the data with the help of analytics tools. When the data size grows to web scale, a traditional analytics tool becomes inadequate: it may, for example, be unable to return the analytics results within a deadline, so the results lose their value.

BIG DATA ANALYSIS AND RESEARCH TRENDS

Big data is a hot research topic with many applications in which complex and large data must be analyzed. The number of published articles on this topic is shown in Table I and illustrated in Fig. 1. Research on this topic is increasing because big data appears almost everywhere nowadays: news articles, professional magazines, and social networks such as tweets, YouTube videos, and blog discussions. Google Scholar was used to extract the number of articles published each year, using the query string "Big Data" as an exact search term.

The time complexity of the brute-force algorithm is O(n³), since it must visit every element of the arrays being multiplied. Better approaches with lower time complexity than the brute-force algorithm have been proposed over the years; MapReduce is one of them.
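For reference, the brute-force algorithm whose O(n³) cost is discussed above can be sketched as a triple nested loop (a minimal Python sketch; representing matrices as nested lists is our own choice, not prescribed by the paper):

```python
def multiply(A, B):
    """Brute-force matrix multiplication: O(n^3) time for n x n inputs."""
    n, m, q = len(A), len(B), len(B[0])  # A is n x m, B is m x q
    assert all(len(row) == m for row in A), "columns of A must equal rows of B"
    # C[i][k] = sum over j of A[i][j] * B[j][k] -- every element is visited.
    C = [[0] * q for _ in range(n)]
    for i in range(n):
        for k in range(q):
            for j in range(m):
                C[i][k] += A[i][j] * B[j][k]
    return C

print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```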

ROLE OF MAPREDUCE IN BIG DATA

Parallel computation has long been used for matrix multiplication and has recently been complemented by MapReduce [14]. MapReduce is a framework for processing big data in parallel distributed environments. It consists of two sequential tasks, Map and Reduce:

 Map: takes a set of data and transforms it into another set of data organized as key-value pairs, so that each processor can work on a different subset of the data.
 Reduce: combines the values of identical keys to form the intended output. The reduce task always starts after the map task.
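The two tasks can be illustrated with a small framework-free sketch (plain Python standing in for a real MapReduce cluster; the word-count mapper and reducer are our illustrative choice, not an example from the paper):

```python
from itertools import groupby
from operator import itemgetter

def map_task(record):
    # Map: turn a raw input record into key-value pairs.
    for word in record.split():
        yield (word, 1)

def reduce_task(key, values):
    # Reduce: combine all values that share the same key.
    return (key, sum(values))

records = ["big data", "big matrix"]
pairs = [kv for r in records for kv in map_task(r)]
# Shuffle/sort stage: group identical keys before the reduce task starts.
pairs.sort(key=itemgetter(0))
result = [reduce_task(k, [v for _, v in group])
          for k, group in groupby(pairs, key=itemgetter(0))]
print(result)  # [('big', 2), ('data', 1), ('matrix', 1)]
```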

One of the major challenges facing big data analysis is multiplying matrices, which has been implemented using many approaches and frameworks such as MapReduce.

MATRIX MULTIPLICATION IN BIG DATA

Many problems are solved using matrix multiplication because it is an essential operation in linear algebra. Fig. 2 shows how matrices are multiplied to form the resulting matrix. In matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second matrix; the sizes of the first and second matrices are n × m and m × q, respectively.

MATRIX MULTIPLICATION USING MAPREDUCE

There are many applications that use matrix multiplication in which the matrices are considered big. Thus, finding an efficient matrix multiplication algorithm is a popular research topic. Time and cost are the main challenges of this problem, and several algorithms have been proposed in the literature to solve it.

MapReduce is a parallel framework for big data which, when applied to matrix multiplication, consists of two jobs:

 First job: the reduce task is inactive, while the map task is simply used to read the input file and create the pairs of elements to be multiplied.
 Second job: the map task performs the multiplication independently for each pair of elements, while the reduce task combines the results for each output element.
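Under our reading of this two-job description, the pipeline can be sketched as follows (plain Python in place of a real cluster; the sparse (matrix, row, column, value) input format and the join on the shared index j are our assumptions about how the pairing is done):

```python
from collections import defaultdict

# Sparse input: (matrix_name, row, col, value) tuples for A and B.
A = [('A', 1, 1, 1), ('A', 1, 2, 2), ('A', 2, 1, 3), ('A', 2, 2, 4)]
B = [('B', 1, 1, 5), ('B', 1, 2, 6), ('B', 2, 1, 7), ('B', 2, 2, 8)]

# --- Job 1: the reduce task is inactive; the map task just reads the
# input and keys each element by the index j on which A and B are joined.
groups = defaultdict(lambda: ([], []))
for name, r, c, v in A + B:
    if name == 'A':
        groups[c][0].append((r, v))   # A[r][c]: the join key is the column c = j
    else:
        groups[r][1].append((c, v))   # B[r][c]: the join key is the row r = j

# --- Job 2: the map task multiplies each matched pair independently...
partial = []  # map output: ((i, k), product)
for j, (a_side, b_side) in groups.items():
    for i, a in a_side:
        for k, b in b_side:
            partial.append(((i, k), a * b))

# ...and the reduce task combines the results for each output element.
result = defaultdict(int)
for key, product in partial:
    result[key] += product

print(sorted(result.items()))
# [((1, 1), 19), ((1, 2), 22), ((2, 1), 43), ((2, 2), 50)]
```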
EXAMPLE

MapReduce is a technique in which a huge program is subdivided into small tasks that run in parallel, to make computation faster and save time; it is mostly used in distributed systems. It has 2 important parts:

 Mapper: takes raw input data and organizes it into key-value pairs. For example, in a dictionary you search for the word "Data" and its associated meaning is "facts and statistics collected together for reference or analysis". Here the key is "Data" and the associated value is "facts and statistics collected together for reference or analysis".
 Reducer: is responsible for processing the mapped data in parallel and producing the final output.

So let us consider the following matrix multiplication example to visualize MapReduce:

[1 2]     [5 6]
[3 4]  ×  [7 8]

Here we can see that matrix A is a 2×2 matrix, which means the number of rows (i) = 2 and the number of columns (j) = 2. Matrix B is also a 2×2 matrix, where the number of rows (j) = 2 and the number of columns (k) = 2. Each cell of the matrices is labelled Aij and Bjk. For example, the element 3 in matrix A is called A21, i.e. 2nd row, 1st column. One-step matrix multiplication has 1 mapper and 1 reducer.

The formula for mapping matrices A and B is:

o Mapper for matrix A: (k, v) = ((i, k), (A, j, Aij)) for all k
o Mapper for matrix B: (k, v) = ((i, k), (B, j, Bjk)) for all i

o To compute the mapping for matrix A:

# Here all dimensions are 2; therefore when k=1, i can take the 2 values 1 and 2, and each case can in turn take the 2 values j=1 and j=2.

# Substituting all values in the formula:

k=1  i=1  j=1  ((1, 1), (A, 1, 1))
          j=2  ((1, 1), (A, 2, 2))
     i=2  j=1  ((2, 1), (A, 1, 3))
          j=2  ((2, 1), (A, 2, 4))
k=2  i=1  j=1  ((1, 2), (A, 1, 1))
          j=2  ((1, 2), (A, 2, 2))
     i=2  j=1  ((2, 2), (A, 1, 3))
          j=2  ((2, 2), (A, 2, 4))

o To compute the mapping for matrix B:

i=1  j=1  k=1  ((1, 1), (B, 1, 5))
          k=2  ((1, 2), (B, 1, 6))
     j=2  k=1  ((1, 1), (B, 2, 7))
          k=2  ((1, 2), (B, 2, 8))
i=2  j=1  k=1  ((2, 1), (B, 1, 5))
          k=2  ((2, 2), (B, 1, 6))
     j=2  k=1  ((2, 1), (B, 2, 7))
          k=2  ((2, 2), (B, 2, 8))

o We have now computed the mapping, so the next step is to reduce it.
o The formula for reducing is:

Reducer(k, v): for each key (i, k), make sorted Alist and Blist
(i, k) => Summation of (Aij × Bjk) over j
Output => ((i, k), sum)

 To compute the reducer:

# We can observe from the mapper computation that 4 keys occur repeatedly: (1, 1), (1, 2), (2, 1) and (2, 2).

# Make a separate list for matrix A and matrix B with the corresponding values taken from the mapper step above:

(1, 1) => Alist = {(A, 1, 1), (A, 2, 2)}
          Blist = {(B, 1, 5), (B, 2, 7)}

Now Aij × Bjk: [(1×5) + (2×7)] = 19   (i)

(1, 2) => Alist = {(A, 1, 1), (A, 2, 2)}
          Blist = {(B, 1, 6), (B, 2, 8)}

Now Aij × Bjk: [(1×6) + (2×8)] = 22   (ii)

(2, 1) => Alist = {(A, 1, 3), (A, 2, 4)}
          Blist = {(B, 1, 5), (B, 2, 7)}

Now Aij × Bjk: [(3×5) + (4×7)] = 43   (iii)

(2, 2) => Alist = {(A, 1, 3), (A, 2, 4)}
          Blist = {(B, 1, 6), (B, 2, 8)}

Now Aij × Bjk: [(3×6) + (4×8)] = 50   (iv)

From (i), (ii), (iii) and (iv) we can conclude that the final pairs are:

((1, 1), 19)
((1, 2), 22)
((2, 1), 43)
((2, 2), 50)

Therefore the final matrix is:

[19 22]
[43 50]
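The one-step scheme walked through above can be checked with a short simulation (plain Python in place of a MapReduce framework; the dictionary-based matrix encoding and the hard-coded 2×2 dimensions are our own choices):

```python
from collections import defaultdict

# Matrices from the example, stored as {(row, col): value}.
A = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
B = {(1, 1): 5, (1, 2): 6, (2, 1): 7, (2, 2): 8}
n = q = 2  # A is n x m and B is m x q; here every dimension is 2

mapped = []
# Mapper for matrix A: emit ((i, k), ('A', j, A[i][j])) for all k.
for (i, j), a in A.items():
    for k in range(1, q + 1):
        mapped.append(((i, k), ('A', j, a)))
# Mapper for matrix B: emit ((i, k), ('B', j, B[j][k])) for all i.
for (j, k), b in B.items():
    for i in range(1, n + 1):
        mapped.append(((i, k), ('B', j, b)))

# Reducer: for each key (i, k), build Alist and Blist indexed by j,
# then sum A[i][j] * B[j][k] over j.
cells = defaultdict(dict)
for (i, k), (name, j, v) in mapped:
    cells[(i, k)].setdefault(name, {})[j] = v
result = {key: sum(vals['A'][j] * vals['B'][j] for j in vals['A'])
          for key, vals in cells.items()}

print(sorted(result.items()))
# [((1, 1), 19), ((1, 2), 22), ((2, 1), 43), ((2, 2), 50)]
```

This reproduces the four reduced pairs derived by hand in the example.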
CONCLUSION

With the arrival of the big data era, handling large amounts of data is challenging. The power of algebraic matrix multiplication algorithms stems from the redundancy in multiplying different pairs from the same collections of vectors. Such redundancy is not present in matrix-vector multiplication. We observed that articles are increasingly published in each of these three research areas, while their combination remains a recent research area in which only four papers have been presented; these are discussed in this paper. We concluded that the column-by-row technique, with a time complexity of O(n) and n mappers, is perhaps the best technique, while element-by-column-block and column-block-by-row-block are moderately acceptable ones that compromise between the time complexity of the algorithm and the number of mappers.

REFERENCES

[1] https://www.researchgate.net/publication/220779373_Matrix-vector_multiplication_in_sub-quadratic_time_some_preprocessing_required
[2] https://arxiv.org/pdf/1805.11938.pdf
[3] https://www.geeksforgeeks.org/matrix-multiplication-with-1-mapreduce-step/
[4] https://www.researchgate.net/publication/322872479_Matrix_multiplication_of_big_data_using_MapReduce_A_review
[5] https://www.researchgate.net/publication/266660592_Survey_of_real-time_processing_systems_for_big_data
[6] https://www.sciencedirect.com/science/article/pii/S2444883417300268
[7] https://lendap.wordpress.com/2015/02/16/matrix-multiplication-with-mapreduce/
[8] http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/9-parallel/matrix-mult.html
[9] https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00362-1
[10] https://www.sciencedirect.com/science/article/pii/S0747717108800132