The Linear Algebra Behind Google
by
SINDHUJARANI.V Roll No. [10212338]
May 1, 2012
Outline of the Talk
Basic working of Google.
The PageRank algorithm.
Solving the PageRank problem as an eigensystem.
Solving the PageRank problem as a linear system.
Introduction
The Internet can be seen as a large directed graph, where the Web pages are the vertices and their links are the edges.
The PageRank algorithm ranks pages by counting the back links (incoming links) of the vertices.
Vertices with more back links are considered more important.
Example: Figure 1
[Figure 1: a directed graph on four pages, with links 1→2, 1→3, 1→4, 2→3, 2→4, 3→1, 4→1, and 4→3.]
Counting back links as votes, we have x1 = 2, x2 = 1, x3 = 3, and x4 = 2. So page 3 is the most important, pages 1 and 4 are tied for second, and page 2 is the least important.
Drawback: Not all votes are equally important. A vote from a page with low importance should count less than a vote from a more important page.
To account for this, each vote's weight is divided by the number of different votes the page casts.
Matrix Model
The new format represents the link structure as a matrix of the form

A_{ij} = \begin{cases} 1/N_j & \text{if } P_j \text{ links to } P_i, \\ 0 & \text{otherwise,} \end{cases}    (1)

where N_j is the number of out links from page P_j.
Recursive form: The rank of each page is defined recursively as

r_i = \sum_{j \in L_i} \frac{r_j}{N_j},    (2)

where r_i is the PageRank of page P_i, N_j is the number of out links from page P_j, and L_i is the set of pages that link to page P_i.
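As a concrete sketch (not part of the original slides), equation (1) can be implemented directly from an adjacency list. The function name link_matrix and the 0-indexed page numbering are illustrative choices, not from the talk.

    import numpy as np

    def link_matrix(out_links, n):
        """Build the column stochastic link matrix of equation (1):
        column j holds 1/N_j in each row i that page j links to."""
        A = np.zeros((n, n))
        for j, targets in out_links.items():
            for i in targets:
                A[i, j] = 1.0 / len(targets)
        return A

    # The four-page web of Figure 1 (0-indexed):
    # page 1 -> 2,3,4; page 2 -> 3,4; page 3 -> 1; page 4 -> 1,3.
    A = link_matrix({0: [1, 2, 3], 1: [2, 3], 2: [0], 3: [0, 2]}, 4)
    print(A.sum(axis=0))   # each column sums to 1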
Let us apply this approach to Figure 1. For page 1 the recursive form is x_1 = \frac{x_3}{1} + \frac{x_4}{2}; for page 2, x_2 = \frac{x_1}{3}; for page 3, x_3 = \frac{x_1}{3} + \frac{x_2}{2} + \frac{x_4}{2}; and for page 4, x_4 = \frac{x_1}{3} + \frac{x_2}{2}. These linear equations can be written as Ax = x, where x = [x_1, x_2, x_3, x_4]^T and, in matrix form,

A = \begin{pmatrix}
0 & 0 & 1 & 1/2 \\
1/3 & 0 & 0 & 0 \\
1/3 & 1/2 & 0 & 1/2 \\
1/3 & 1/2 & 0 & 0
\end{pmatrix},

which transforms the web ranking problem into the standard problem of finding an eigenvector of a square matrix. In this case we obtain x_1 ≈ 0.387, x_2 ≈ 0.129, x_3 ≈ 0.290, and x_4 ≈ 0.194, so page 1 gets rank 1, page 3 rank 2, page 4 rank 3, and page 2 rank 4.
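The stated values are easy to reproduce numerically; a minimal check with NumPy (my own sketch, not from the slides):

    import numpy as np

    A = np.array([[0,   0,   1, 1/2],
                  [1/3, 0,   0, 0  ],
                  [1/3, 1/2, 0, 1/2],
                  [1/3, 1/2, 0, 0  ]])

    # Pick the eigenvector whose eigenvalue is (numerically) 1 and
    # normalize it so its entries sum to 1.
    vals, vecs = np.linalg.eig(A)
    x = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    x = x / x.sum()
    print(np.round(x, 3))   # [0.387 0.129 0.29  0.194]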
Speciality of the matrix A
Definition: A square matrix is called a column stochastic matrix if all of its entries are nonnegative and the entries in each column sum to 1.
A is a column stochastic matrix.
A has 1 as an eigenvalue.
A has the all-ones row vector e as a left eigenvector (eA = e), since each of its columns sums to 1.
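Why 1 is always an eigenvalue of a column stochastic matrix: the all-ones row vector is a left eigenvector, as this one-line calculation (using only the column-sum property) shows.

    % e = (1, 1, \dots, 1): each component of eA is a column sum of A
    (eA)_j = \sum_i A_{ij} = 1 = e_j
    \quad\Longrightarrow\quad eA = e,
    % and since A^T and A have the same eigenvalues,
    % 1 is also an eigenvalue of A, i.e. V_1(A) is nonempty.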
Difficulties arise when using formula (2):
Getting stuck at a page.
Nonunique rankings.
Getting stuck in a subgraph.
Stuck at a page
Definition: A node that has no out links is called a dangling node.
If the graph has a dangling node, then the link matrix has a column of zeros for that node, so the matrix is not column stochastic. To modify the link matrix into a column stochastic matrix, replace every entry of each zero column with 1/n, where n is the dimension of the matrix. The modified matrix is

\bar{A} = A + \frac{1}{n} e^T d,    (3)

where e is a row vector of ones and d is a row vector defined as

d_j = \begin{cases} 1 & \text{if } N_j = 0, \\ 0 & \text{otherwise.} \end{cases}    (4)
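A minimal sketch of equation (3); the name fix_dangling is illustrative, not from the talk:

    import numpy as np

    def fix_dangling(A):
        """Equation (3): add 1/n to every entry of each zero (dangling)
        column, i.e. A_bar = A + (1/n) e^T d with d the dangling
        indicator vector of equation (4)."""
        n = A.shape[0]
        d = (A.sum(axis=0) == 0).astype(float)   # d_j = 1 iff N_j = 0
        return A + np.ones((n, 1)) @ d.reshape(1, n) / n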
Example: Figure 2
[Figure 2: a directed web graph on six pages, in which page 1 is a dangling node.]
For Figure 2 we have d = [1, 0, 0, 0, 0, 0], since page 1 is the only dangling node. Thus

\bar{A} = A + \frac{1}{n} e^T d

adds 1/6 to every entry of column 1 of A (the zero column), leaving the other columns unchanged. With the creation of the matrix \bar{A}, we have a column stochastic matrix.
Nonunique rankings
For our rankings, it is desirable that the dimension of V_1(A) (the eigenspace for the eigenvalue 1) equals 1, so that there is a unique nonzero eigenvector x with \sum_i x_i = 1 that can be used for page ranking.
In general, this is not always true.
Example: Figure 3
[Figure 3: a five-page web made of two disconnected subwebs: pages 1 and 2 link to each other, pages 3 and 4 link to each other, and page 5 links to both 3 and 4.]
The link matrix of Figure 3 is

A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1/2 \\
0 & 0 & 1 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}.
We find here that V_1(A) is two-dimensional. One possible pair of basis vectors is x = [1/2, 1/2, 0, 0, 0]^T and y = [0, 0, 1/2, 1/2, 0]^T.
Any linear combination of these two vectors yields another vector in V_1(A), so the ranking is not unique.
Overcoming the problem of dim(V_1(A)) > 1
To solve this problem we modify equation (2). The analysis that follows is basically a special case of the Perron-Frobenius theorem.
Perron-Frobenius theorem: Let B be an n × n matrix with nonnegative real entries. Then we have the following:
1. B has a nonnegative real eigenvalue. The largest such eigenvalue, ρ(B), dominates the absolute values of all other eigenvalues of B. The domination is strict if the entries of B are strictly positive.
2. If B has strictly positive entries, then ρ(B) is a simple positive eigenvalue, and the corresponding eigenvector can be normalized to have strictly positive entries.
3. If B has an eigenvector v with strictly positive entries, then the corresponding eigenvalue is ρ(B).
A Modification of the Link Matrix A
For an n-page web with no dangling nodes, we replace the matrix A with the matrix

\hat{A} = \alpha A + (1 - \alpha) S.    (5)

For an n-page web with dangling nodes, we replace the matrix A with the matrix

\hat{A} = \alpha \bar{A} + (1 - \alpha) S,    (6)

where 0 ≤ α ≤ 1 is called the damping factor and S denotes the n × n matrix with every entry equal to 1/n. The matrix S is column stochastic, and V_1(S) is one dimensional.
Speciality of the matrix Â
1. All the entries satisfy 0 ≤ Â_{ij} ≤ 1.
2. Each column sums to one: \sum_i Â_{ij} = 1 for all j.
3. If α = 1, we recover the original problem, Â = A.
4. If α = 0, the problem reduces to Â = S.
Random walker
The random walker starts from a random page, and then selects one of the out links from that page in a random fashion.
The PageRank of a specific page can now be viewed as the asymptotic probability that the walker is present at that page.
This works because the walker is more likely to wander to pages with many votes (lots of in links), giving it a large probability of ending up at such pages.
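A minimal simulation sketch of this interpretation (my own construction, not from the slides): walk_frequencies, the step count, and the seed are illustrative. Dangling pages are handled with a uniform jump, matching the e^T d fix of equation (3).

    import numpy as np

    def walk_frequencies(out_links, n, steps=100_000, seed=0):
        """Estimate PageRank as the fraction of steps a random walker
        spends at each page: follow a random out link, or jump to a
        uniformly random page when standing on a dangling node."""
        rng = np.random.default_rng(seed)
        visits = np.zeros(n)
        page = rng.integers(n)
        for _ in range(steps):
            visits[page] += 1
            links = out_links.get(page)
            page = rng.choice(links) if links else rng.integers(n)
        return visits / steps

    # On the (strongly connected) web of Figure 1, the frequencies
    # approach the eigenvector [0.387, 0.129, 0.290, 0.194].
    print(walk_frequencies({0: [1, 2, 3], 1: [2, 3], 2: [0], 3: [0, 2]}, 4))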
Stuck in a subgraph
There is still one possible pitfall in the ranking: the walker may wander into a subsection of the complete graph that does not link to any outside pages. The link matrix for this model is a reducible matrix. We therefore want the matrix to be irreducible, making sure the walker cannot get stuck in a subgraph. Irreducibility is obtained by teleportation, the ability to jump with a small probability from any page in the link structure to any other page. Mathematically, for a web with no dangling nodes this is

\hat{A} = \alpha A + (1 - \alpha) \frac{1}{n} e^T e,    (7)

and for a web with dangling nodes

\hat{A} = \alpha \bar{A} + (1 - \alpha) \frac{1}{n} e^T e,    (8)

where e is a row vector of ones and α is the damping factor.
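A sketch of equations (7)/(8) for a dense matrix; the name google_matrix and the default alpha = 0.9 (the value used in the Figure 4 example that follows) are illustrative choices:

    import numpy as np

    def google_matrix(A_bar, alpha=0.9):
        """Damp the (dangling-fixed) link matrix and mix in the rank-one
        teleportation matrix (1/n) e^T e, whose entries are all 1/n."""
        n = A_bar.shape[0]
        return alpha * A_bar + (1 - alpha) * np.ones((n, n)) / n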
Example: Figure 4
[Figure 4: the same six-page web graph as Figure 2, with page 1 a dangling node.]
The link matrix for Figure 4, using equation (8) with α set to 0.9, is

\hat{A} = \alpha \bar{A} + (1 - \alpha) \frac{1}{n} e^T e:

every zero entry of \bar{A} becomes (1 - α)/6 = 1/60, each entry 1/3 becomes 19/60, each 1/2 becomes 7/15, each 1 becomes 11/12, and the entries 1/6 in the dangling column stay at 1/6. The matrix Â is a column stochastic matrix; adding (1 - α) \frac{1}{n} e^T e gives an equal chance of jumping to all pages.
Analysis of the matrix Â
Definition: A matrix A is positive if A_{ij} > 0 for all i and j.
If A is positive and column stochastic, then any eigenvector in V_1(A) has all positive or all negative components.
If A is positive and column stochastic, then V_1(A) has dimension 1.
Solution Methods for the PageRank Problem
Computing the PageRank is the same as finding the eigenvector corresponding to the largest eigenvalue of the matrix Â. To solve this we need an iterative method that works well for large sparse matrices. There are two formulations of the PageRank problem:
1. Eigensystem problem (the power method).
2. Linear system problem (Jacobi method, Gauss-Seidel method, SOR method, etc.).
The power method
The power method is a simple method for finding the largest eigenvalue and the corresponding eigenvector of a matrix. It can be used when the matrix has a dominant eigenvalue. Consider the iterates of the power method applied to Â:

x^{(k)} = \hat{A} x^{(k-1)} = \alpha A x^{(k-1)} + \alpha \frac{1}{n} e^T d\, x^{(k-1)} + (1 - \alpha) \frac{1}{n} e^T e\, x^{(k-1)},

where x^{(k-1)} is a probability vector, and thus e x^{(k-1)} = 1.
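A dense sketch of the iteration; pagerank_power, the tolerance, and the iteration cap are illustrative. For a real, sparse web one would exploit the rank-one structure of the teleportation term rather than forming Â explicitly.

    import numpy as np

    def pagerank_power(A_hat, tol=1e-10, max_iter=1000):
        """Power iteration x_k = A_hat x_{k-1}. Because A_hat is column
        stochastic and x_0 is a probability vector, every iterate stays
        a probability vector, so no per-step rescaling is needed."""
        n = A_hat.shape[0]
        x = np.ones(n) / n
        for _ in range(max_iter):
            x_new = A_hat @ x
            if np.abs(x_new - x).sum() < tol:
                return x_new
            x = x_new
        return x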
Convergence of the power method
Rescale the power method at each iteration by x_k = \frac{A x_{k-1}}{\|A x_{k-1}\|}, where \|\cdot\| can be any vector norm.
Every positive column stochastic matrix A has a unique vector x with positive components such that Ax = x and \|x\|_1 = 1. The vector x can be computed as x = \lim_{k \to \infty} A^k x_0 for any initial guess x_0 with positive components such that \|x_0\|_1 = 1.
The rate of convergence of the power method is linear, with ratio |\lambda_2 / \lambda_1|.
Linear system problem
We begin by formulating the PageRank problem as a linear system. The eigensystem Âx = \alpha \bar{A} x + (1 - \alpha) \frac{1}{n} e^T e\, x = x can be rewritten, using e x = 1, as

(I - \alpha \bar{A}) x = (1 - \alpha) \frac{1}{n} e^T =: b.    (9)

We split the matrix (I - \alpha \bar{A}) as

(I - \alpha \bar{A}) = L + D + U,    (10)

where D is the diagonal part of the matrix, and L and U are its strict lower triangular and strict upper triangular parts respectively.
Properties of (I - αĀ)
1. (I - αĀ) is an M-matrix.
2. (I - αĀ) is nonsingular.
3. The column sums of (I - αĀ) are 1 - α.
4. \|I - α\bar{A}\|_1 = 1 + α, provided at least one nondangling node exists.
5. Since (I - αĀ) is an M-matrix, (I - α\bar{A})^{-1} ≥ 0.
6. The column sums of (I - α\bar{A})^{-1} are 1/(1 - α). Therefore \|(I - α\bar{A})^{-1}\|_1 = 1/(1 - α).
7. Thus, the condition number is κ_1(I - α\bar{A}) = (1 + α)/(1 - α).
Definition: A real matrix A that has A_{ij} ≤ 0 when i ≠ j and A_{ii} ≥ 0 for all i can be expressed as A = sI - B, where s > 0 and B ≥ 0. When s ≥ ρ(B), A is called an M-matrix. An M-matrix can be either singular or nonsingular.
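Property 6 can be checked in one line from the left eigenvector relation eĀ = e (the columns of Ā sum to 1):

    e(I - \alpha\bar{A}) = e - \alpha\, e\bar{A} = (1 - \alpha)\, e
    \;\Longrightarrow\;
    e\,(I - \alpha\bar{A})^{-1} = \tfrac{1}{1-\alpha}\, e,
    % so each column of (I - \alpha\bar{A})^{-1} sums to 1/(1-\alpha);
    % together with property 5 this gives the norm in property 6
    % and the condition number in property 7.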
Jacobi method
The Jacobi method can be applied to the Google system (10):

(L + D + U) x = b
D x^k = b - (L + U) x^{k-1}
x^k = D^{-1} [b - (L + U) x^{k-1}],

where D is an invertible matrix.
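A minimal dense sketch of the iteration above; jacobi, the tolerance, and the iteration cap are illustrative names and values:

    import numpy as np

    def jacobi(M, b, tol=1e-10, max_iter=1000):
        """Jacobi iteration for M x = b with M = L + D + U as in (10)
        (here M = I - alpha * A_bar and b = (1 - alpha)/n * e^T)."""
        D = np.diag(M)            # diagonal entries as a vector
        R = M - np.diag(D)        # L + U
        x = np.zeros_like(b)
        for _ in range(max_iter):
            x_new = (b - R @ x) / D
            if np.abs(x_new - x).sum() < tol:
                return x_new
            x = x_new
        return x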
Gauss-Seidel method
The Gauss-Seidel method can be applied to the Google system (10):

(L + D + U) x = b
(L + D) x^k = b - U x^{k-1}
x^k = (L + D)^{-1} [b - U x^{k-1}],

where (L + D) is an invertible matrix. The Gauss-Seidel method converges much faster than the power and Jacobi methods. The disadvantage is that it is very hard to parallelize.
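A dense sketch of one Gauss-Seidel sweep (gauss_seidel is an illustrative name); the in-place use of updated components is exactly what makes the method sequential:

    import numpy as np

    def gauss_seidel(M, b, tol=1e-10, max_iter=1000):
        """Gauss-Seidel for M x = b: each component update uses the
        components already updated within the current sweep."""
        n = len(b)
        x = np.zeros_like(b)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                s = M[i, :i] @ x[:i] + M[i, i + 1:] @ x_old[i + 1:]
                x[i] = (b[i] - s) / M[i, i]
            if np.abs(x - x_old).sum() < tol:
                return x
        return x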
SOR method
The SOR method can be applied to the Google system (10):

(L + D + U) x = b
(\omega L + D) x^k = \omega (b - U x^{k-1}) + (1 - \omega) D x^{k-1}
x^k = (\omega L + D)^{-1} [\omega (b - U x^{k-1}) + (1 - \omega) D x^{k-1}],

where 1 ≤ ω ≤ 2. When ω = 1, this method reduces to Gauss-Seidel. Here (ωL + D) is an invertible matrix.
The SOR method is more expensive per iteration and less efficient in parallel computing for huge matrix systems.
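A dense sketch of the relaxed sweep; sor and the sample omega = 1.2 are illustrative (the best omega depends on the matrix):

    import numpy as np

    def sor(M, b, omega=1.2, tol=1e-10, max_iter=1000):
        """SOR for M x = b: a Gauss-Seidel update relaxed by omega;
        omega = 1 recovers Gauss-Seidel exactly."""
        n = len(b)
        x = np.zeros_like(b)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(n):
                s = M[i, :i] @ x[:i] + M[i, i + 1:] @ x_old[i + 1:]
                x[i] = (1 - omega) * x_old[i] + omega * (b[i] - s) / M[i, i]
            if np.abs(x - x_old).sum() < tol:
                return x
        return x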
Number of iterations required for convergence by the different methods
[Four convergence plots: Jacobi method, Gauss-Seidel method, SOR method, and power method.]
Conclusion
We discussed the mathematical ideas used in the Google search engine.
We investigated various problems that arise when computing the Google matrix (link matrix).
We took an example of a large matrix representation of the Internet and computed its PageRank using different methods.
Bibliography
Erik Andersson and Per-Anders Ekström. Investigating Google's PageRank algorithm. 2004.
Pavel Berkhin. A survey on PageRank computing. Internet Mathematics, 2(1), 2005.
Kurt Bryan and Tanya Leise. The $25,000,000,000 eigenvector: The linear algebra behind Google. SIAM Review, 48(3), 2006.
Amy N. Langville and Carl D. Meyer. Deeper inside PageRank. Internet Mathematics, 1(3), 2004.