Lecture 4
http://msande317.stanford.edu.
Instructors: Ashish Goel and Reza Zadeh, Stanford University.
4.1 Outline
1. Matrix Vector Multiply (Av)
2. PageRank
• on MapReduce
• on RDDs / Spark
4.3 PageRank
For a graph G with n nodes, we define the transition matrix Q = D^{-1} A, where A ∈ R^{n×n} is the adjacency matrix and D ∈ R^{n×n} is the diagonal matrix whose i-th diagonal entry is the out-degree of node i (its number of outgoing edges).
We use Power Iteration to estimate importance values for webpages as v^(k+1) = v^(k) Q, where v ∈ R^n is a row vector and k is the iteration index. We set v^(0) = 1, the vector with every element equal to one.
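As a concrete illustration, here is a minimal NumPy sketch of this construction on a made-up 4-node graph (not the graph of Figure 1); it assumes every node has at least one outgoing edge, so that D is invertible.

import numpy as np

# Adjacency matrix of a toy 4-node directed graph: A[i, j] = 1 iff there is an edge i -> j.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

out_deg = A.sum(axis=1)          # out-degree of each node (assumed positive)
D_inv = np.diag(1.0 / out_deg)
Q = D_inv @ A                    # row-stochastic transition matrix Q = D^{-1} A

v = np.ones(A.shape[0])          # v^(0) = 1, as above
for _ in range(20):
    v = v @ Q                    # v^(k+1) = v^(k) Q (row vector multiplied on the left)
print(v)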
Figure 1: Graph G (a directed graph on nodes 1–7; nodes 2 and 7 have no outgoing edges)
Using Q as the transition matrix of the random walk is a problem when G contains dead ends, i.e. “sink” nodes with no outgoing edges (nodes 2 and 7 in Figure 1), since their rows of Q are not well defined. We introduce the idea of random teleports: with probability α, where 0 < α < 1, the random walker teleports to a random webpage, and with probability 1 − α it continues walking along an outgoing edge. Then we have a new matrix:
P = (1 − α)Q + αΛ
where Λ ∈ R^{n×n} is the matrix each of whose n rows equals the row vector λ (i.e. Λ = 1λ, with 1 ∈ R^n the all-ones column vector), and λ ∈ R^n is the probability distribution of teleporting to each webpage (typically uniform, λ_j = 1/n).
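To make the construction of P concrete, here is a minimal sketch that assumes a uniform teleport distribution λ_j = 1/n and, as one common convention not fixed in these notes, replaces the undefined sink rows of Q by λ.

import numpy as np

alpha = 0.15                       # teleport probability (an illustrative value)

# Toy 4-node graph in which node 3 is a sink (no outgoing edges).
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
], dtype=float)

n = A.shape[0]
lam = np.full(n, 1.0 / n)          # uniform teleport distribution λ (row vector)
Lam = np.tile(lam, (n, 1))         # Λ: every row equals λ

out_deg = A.sum(axis=1)
Q = np.zeros_like(A)
for i in range(n):
    Q[i] = A[i] / out_deg[i] if out_deg[i] > 0 else lam   # sink rows teleport uniformly (assumption)

P = (1 - alpha) * Q + alpha * Lam  # P = (1 − α)Q + αΛ
assert np.allclose(P.sum(axis=1), 1.0)                    # P is row-stochastic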
The Power Iteration applies again, v^(k+1) = v^(k) P, and the iterates converge to the stationary distribution π of P.
Theorem 4.1
‖π − v^(k)‖_2 ≤ e^(−ak)
for some constant a > 0.
According to Theorem 4.1, for n = 10^9, around 9 iterations are enough to obtain the correct ranking.
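To see where such an estimate comes from, note that requiring the error to fall below roughly 1/n (so that individual pages can be ranked reliably) gives e^(−ak) ≤ 1/n, i.e. k ≥ ln(n)/a; the constant a is graph-dependent, and with the purely illustrative choice a ≈ ln 10 this yields k ≥ log_10(10^9) = 9 for n = 10^9.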
Algorithm 2 PageRank Computation on MapReduce, Step 2
function map(⟨ i, v_i^(k), {(j, P_ij)} ⟩)
    for (j, P_ij) ∈ links do        ▷ links = {(j, P_ij)}, the out-links of node i
        Emit(j, P_ij · v_i^(k))
    end for
end function

function reduce(i, values)          ▷ values = {P_ji · v_j^(k) : j an in-neighbor of i}
    v_i^(k+1) = Σ_{v ∈ values} v
    Emit(i, v_i^(k+1))
end function
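As a sanity check, the following is a minimal single-machine Python simulation of this map/reduce round; the function names and the in-memory shuffle are illustrative stand-ins, not an actual MapReduce API.

from collections import defaultdict

def pagerank_map(record):
    # record = (i, v_i, [(j, P_ij), ...]); emit (j, P_ij * v_i) for every out-link.
    i, v_i, links = record
    for j, p_ij in links:
        yield (j, p_ij * v_i)

def pagerank_reduce(key, values):
    # Sum the incoming contributions for node `key` to obtain v_key^(k+1).
    return (key, sum(values))

def one_iteration(records):
    grouped = defaultdict(list)           # shuffle phase: group mapper output by key
    for record in records:
        for key, value in pagerank_map(record):
            grouped[key].append(value)
    return dict(pagerank_reduce(k, vs) for k, vs in grouped.items())   # reduce phase

# Toy input: one record per node, holding v_i^(k) = 1 and that node's row of P (3-node graph).
records = [
    (0, 1.0, [(1, 0.5), (2, 0.5)]),
    (1, 1.0, [(0, 1.0)]),
    (2, 1.0, [(0, 0.5), (1, 0.5)]),
]
print(one_iteration(records))             # {1: 1.0, 2: 0.5, 0: 1.5}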