Lecture 4

The document outlines the content of Lecture 4 of the course MS&E 317/CS 263 at Stanford University, focusing on Matrix Vector Multiplication and PageRank algorithms. It details the implementation of these algorithms using MapReduce, including specific functions for mapping and reducing data. Additionally, it discusses the challenges of dead-ends in PageRank and introduces the concept of random teleports to improve the algorithm's accuracy.

MS&E 317/CS 263: Algorithms for Modern Data Models, Spring 2014

http://msande317.stanford.edu
Instructors: Ashish Goel and Reza Zadeh, Stanford University.

Lecture 4, 4/9/2014. Scribed by Burak Yavuz.

4.1 Outline
1. Matrix Vector Multiply (Av)

2. PageRank

• on MapReduce
• on RDD’s / Spark

4.2 Matrix Vector Multiplication on MapReduce


We have a sparse matrix A stored as triples < i, j, aij >, where i and j are the row and column indices, and a vector v stored as pairs < j, vj >. We wish to compute Av.
For the following algorithm, we assume v is small enough to fit in the memory of each mapper.

Algorithm 1 Matrix Vector Multiplication on MapReduce


1: function map(< i, j, aij >)
2: Emit(i, aij v[j])
3: end function
4: function reduce(key,values)
5: ret ← 0
6: for val ∈ values do
7: ret ← ret + val
8: end for
9: Emit(key, ret)
10: end function
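Algorithm 1 can be simulated in memory; the following Python sketch models the map, shuffle, and reduce phases with plain data structures (the function name `matvec_mapreduce` is illustrative, not from the notes).

```python
from collections import defaultdict

def matvec_mapreduce(entries, v):
    """Compute Av, where A is given as sparse (i, j, a_ij) triples and
    v (assumed to fit in each mapper's memory) as a dict {j: v_j}."""
    # Map phase: each entry <i, j, a_ij> emits the partial product (i, a_ij * v[j]).
    mapped = [(i, a_ij * v[j]) for (i, j, a_ij) in entries]

    # Shuffle: group partial products by row index i.
    groups = defaultdict(list)
    for key, val in mapped:
        groups[key].append(val)

    # Reduce phase: sum the partial products for each row.
    return {key: sum(vals) for key, vals in groups.items()}

# A = [[1, 2], [0, 3]] stored sparsely; v = [4, 5].
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
v = {0: 4.0, 1: 5.0}
print(matvec_mapreduce(entries, v))  # {0: 14.0, 1: 15.0}
```

In a real MapReduce job the shuffle is performed by the framework; here it is emulated by grouping emitted pairs by key.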

4.3 PageRank
For a graph G with n nodes, we define the transition matrix Q = D^(−1) A, where A ∈ Rn×n is the adjacency matrix and D ∈ Rn×n is the diagonal matrix of out-degrees (Dii is the number of outgoing edges of node i).
We use Power Iteration to estimate importance values for webpages: v^(k+1) = v^(k) Q, where v ∈ Rn is a row vector and k is the iteration number. We set v^(0) = 1, the vector with every element equal to one.
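As a concrete sketch, the iteration can be run with NumPy on a small dense example. The 4-node graph below is hypothetical (chosen to be strongly connected and aperiodic so the iteration converges without teleports); it is not the graph of Figure 1.

```python
import numpy as np

# Hypothetical 4-node graph with no dead-ends, for illustration only.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))   # D^(-1): inverse out-degree matrix
Q = D_inv @ A                          # transition matrix Q = D^(-1) A

v = np.ones(4)                         # v^(0) = 1
for _ in range(200):
    v = v @ Q                          # v^(k+1) = v^(k) Q
print(v / v.sum())                     # normalized importance estimates
```

Note that v @ Q preserves the sum of v because Q is row-stochastic, so the iterate stays bounded and only its direction changes.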

[Figure 1: Graph G, a directed graph on nodes 1–7; nodes 2 and 7 have no outgoing edges.]

Using Q as the probability distribution for random walks is a problem when G contains dead-ends, i.e. “sink” nodes (nodes 2 and 7 in Figure 1). We introduce the idea of random teleports: with probability α the random walker teleports to a random webpage, and with probability 1 − α continues the walk, where 0 < α < 1. Then we have a new matrix:

P = (1 − α)Q + αΛ

where Λ is the n × n matrix whose every row is the same row vector λ:

        ⎡ − − − λ − − − ⎤
    Λ = ⎢ − − − λ − − − ⎥
        ⎢        ⋮       ⎥
        ⎣ − − − λ − − − ⎦ n×n

and λ ∈ Rn is the probability distribution of teleporting to each webpage (a common choice is the uniform distribution, λj = 1/n).
The Power Iteration applies again: π^(k+1) = π^(k) P.

Theorem 4.1
‖π − v^(k)‖2 ≤ e^(−ak)
for some constant a > 0.

According to Theorem 4.1, for n = 10^9, around 9 iterations are enough to get the correct ranking.

4.3.1 PageRank on MapReduce


P is stored as < i, {(j, Pij )} >, where Σj Pij = 1 for all i ∈ [1, n].
v^(k) is stored as < i, vi^(k) >.
We use a two-step algorithm:
Step 1:
Annotate Pi with vi^(k), i.e. Emit < i, vi^(k), {(j, Pij )} >.
Step 2:

Algorithm 2 PageRank Computation on MapReduce, Step 2
1: function map(< i, vi^(k), links = {(j, Pij )} >)
2: for (j, Pij ) ∈ links do
3: Emit(j, Pij vi^(k))
4: end for
5: end function
6: function reduce(key, values)
7: v_key^(k+1) ← Σ_{v ∈ values} v
8: Emit(key, v_key^(k+1))
9: end function
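Both steps can be simulated in a few lines of Python; this sketch models Step 1 as a join of v with the rows of P and Step 2 as the map, shuffle, and reduce over link contributions (the function name `pagerank_step` and the toy graph are illustrative, not from the notes).

```python
from collections import defaultdict

def pagerank_step(P_rows, v):
    """One power-iteration step via the two-step MapReduce scheme.
    P_rows: {i: [(j, P_ij), ...]} with the P_ij of each row summing to 1.
    v: {i: v_i^(k)}.  Returns {j: v_j^(k+1)}."""
    # Step 1 (join): annotate each row P_i with v_i -> < i, v_i, {(j, P_ij)} >.
    annotated = [(i, v[i], links) for i, links in P_rows.items()]

    # Step 2 (map + shuffle + reduce): each link emits (j, P_ij * v_i),
    # and contributions are summed per destination node j.
    new_v = defaultdict(float)
    for i, v_i, links in annotated:
        for j, P_ij in links:
            new_v[j] += P_ij * v_i
    return dict(new_v)

# Toy 2-node graph: node 0 links only to 1; node 1 links to both.
P_rows = {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]}
v = {0: 1.0, 1: 1.0}
print(pagerank_step(P_rows, v))  # {1: 1.5, 0: 0.5}
```

Iterating this function repeatedly reproduces the power iteration v^(k+1) = v^(k) P, one MapReduce round per iteration.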
