Page Rank Algorithm

Uploaded by

Santosh Kumar
Copyright
© All Rights Reserved

BIG DATA ANALYTICS

CS-411

Dr. Bharat Singh


Department of Computer Science and Engineering

Indian Institute of Information Technology, Ranchi


11/12/24
Page Rank Algorithm
“A page is important if important pages link to it.”

• The PageRank algorithm, also called the Google algorithm, was introduced by Larry Page, one of the founders of Google.
• It was first used to rank web pages in the Google search engine.
• Today it is used in many other fields, for example to rank users in social media.
• What is fascinating about the PageRank algorithm is that it starts from a complex problem and ends up with a very simple solution.

The web can be represented as a directed graph in which nodes represent web pages and edges represent the links between them. An edge from node (web page) i to node j means that page i refers to page j.

• We first have to define what the importance of a web page is.
• A first approach is to take the total number of web pages that refer to it.
• If we stop at this criterion, the importance of the pages that refer to it is not taken into account. In other words, a link from an important web page carries the same weight as a link from a less important one.
• Another approach is to let each web page spread its importance equally among all the web pages it links to.
• We can then define the score of a node j as follows:

$$r_j = \sum_{i \to j} \frac{r_i}{d_i}$$

• where rᵢ is the score of node i and dᵢ its out-degree, and the sum runs over all nodes i that link to j.
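As a rough illustration of this scoring rule, the following Python sketch applies one update to a small graph. The adjacency list `links`, the helper `update_scores` and the equal starting scores are assumptions made for the example; they do not come from the slides.

```python
# Hypothetical 4-node graph: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}

def update_scores(links, r):
    """Return new scores where r_j is the sum of r_i / d_i over pages i linking to j."""
    new_r = {j: 0.0 for j in links}
    for i, targets in links.items():
        if not targets:
            continue  # a page without outlinks passes on no score
        share = r[i] / len(targets)  # r_i / d_i
        for j in targets:
            new_r[j] += share
    return new_r

n = len(links)
r0 = {i: 1.0 / n for i in links}  # assumed starting point: equal scores
r1 = update_scores(links, r0)
print(r1)
```

Page 3 receives no links in this graph, so its score drops to zero after one update, while page 2, which is linked from three pages, collects the largest share.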

From the graph, we can write one such equation per node. This gives a linear system that we can solve using Gaussian elimination.

• But this solution is limited to small graphs.
• Web graphs are sparse, and Gaussian elimination modifies the matrix as it performs its operations. We therefore lose the sparsity of the matrix, and it takes more memory space.
• In the worst case, the matrix can no longer be stored.
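The loss of sparsity can be seen on a toy case. The sketch below is a generic illustration, not the web graph from the slides: it builds a hypothetical sparse "arrow" matrix and counts the nonzero entries before and after the first elimination step. The helper names are made up for the example, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def count_nonzeros(matrix):
    return sum(1 for row in matrix for x in row if x != 0)

def eliminate_first_column(matrix):
    """Apply the first Gaussian elimination step and return a new matrix."""
    a = [row[:] for row in matrix]
    for i in range(1, len(a)):
        factor = a[i][0] / a[0][0]
        a[i] = [a[i][j] - factor * a[0][j] for j in range(len(a))]
    return a

n = 6
# Hypothetical sparse matrix: nonzeros only in the first row,
# in the first column and on the diagonal.
A = [[Fraction(0)] * n for _ in range(n)]
for k in range(n):
    A[0][k] = Fraction(1)  # dense first row
    A[k][0] = Fraction(1)  # dense first column
    A[k][k] = Fraction(n)  # diagonal (set last so it wins at k = 0)

before = count_nonzeros(A)
after = count_nonzeros(eliminate_first_column(A))
print(before, after)  # 16 31
```

A single elimination step roughly doubles the number of stored entries here, and the remaining block is completely dense. On a graph with millions of nodes the same effect is what makes the matrix impossible to store.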

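• Markov Chain and PageRank
• The graph can be seen as a Markov chain with the following transition matrix:

$$P_{ji} = \begin{cases} \dfrac{1}{d_i} & \text{if node } i \text{ links to node } j \\ 0 & \text{otherwise} \end{cases}$$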
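One possible way to build such a transition matrix is sketched below. It assumes the column convention in which entry (j, i) holds the probability of moving from node i to node j, that nodes are numbered 0 to n-1, and that every node has at least one outlink. The graph `links` and the function `transition_matrix` are hypothetical.

```python
# Hypothetical 4-node graph: links[i] lists the nodes that node i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}

def transition_matrix(links):
    """Build P with P[j][i] = 1/d_i if node i links to node j, else 0."""
    n = len(links)
    P = [[0.0] * n for _ in range(n)]
    for i, targets in links.items():
        for j in targets:
            P[j][i] = 1.0 / len(targets)
    return P

P = transition_matrix(links)
# Each column describes where the walker goes from one node, so it sums to 1.
column_sums = [sum(P[j][i] for j in range(len(P))) for i in range(len(P))]
print(column_sums)
```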

• Pᵀ is row stochastic, that is, each column of P sums to 1, which is a condition for applying Markov chain theorems.
• For the initial distribution, let us take the uniform distribution:

$$\pi^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^{T}$$

• where n is the total number of nodes.
• This means that the random walker chooses its starting node uniformly at random and moves to other nodes from there.

• At every step, the random walker jumps to another node according to the transition matrix, and the probability distribution is recomputed.
• This distribution tells us where the random walker is likely to be after a given number of steps.
• The probability distribution is computed using the following equation:

$$\pi^{(t+1)} = P\,\pi^{(t)}$$

• A stationary distribution of a Markov chain is a probability distribution π with π = Pπ.
• This means that the distribution does not change after one step. Note that a Markov chain does not always have a unique stationary distribution, and the iteration above does not always converge to one.

• All we have to do is solve this equation:

$$\pi = P\pi$$

• We notice that π is an eigenvector of the matrix P with eigenvalue 1.
• Instead of computing all the eigenvectors of P and selecting the one that corresponds to eigenvalue 1, we use the Frobenius-Perron theorem.

Frobenius-Perron theorem:
• If A is a square positive matrix (all its entries are positive), then it has a positive eigenvalue r such that |λ| < r for every other eigenvalue λ of A. The eigenvector v of A with eigenvalue r is positive, and it is the unique positive eigenvector up to scaling.
• To compute π, we use the power method, an iterative method that computes the dominant eigenvector of a given matrix A.
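A minimal sketch of the power method is given below, assuming a small matrix with positive entries so that the theorem applies. The function name `power_method`, the tolerance, the iteration cap and the 3×3 matrix are all assumptions chosen for illustration.

```python
def power_method(A, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenvector of A, normalized to sum to 1."""
    n = len(A)
    x = [1.0 / n] * n  # start from the uniform vector
    for _ in range(max_iter):
        # Matrix-vector product y = A x.
        y = [sum(A[j][i] * x[i] for i in range(n)) for j in range(n)]
        total = sum(y)  # nonzero because all entries of A and x are positive
        y = [value / total for value in y]
        if sum(abs(y[j] - x[j]) for j in range(n)) < tol:
            return y
        x = y
    return x

# Hypothetical column-stochastic matrix with all entries positive.
A = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.2, 0.6]]
pi = power_method(A)
print(pi)
```

Each iteration costs only one matrix-vector product and never modifies A, so a sparse matrix stays sparse, unlike with Gaussian elimination.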

Teleportation and Damping Factor
• In the web graph, we can find a web page i that refers only to web page j, while j refers only to i. This is what we call the spider trap problem.
• We can also find a web page that has no outlink. It is commonly called a dead end.


• Spider trap: when the random walker reaches node 1 in the example, it can only jump to node 2, and from node 2 it can only return to node 1, and so on. Nodes 1 and 2 absorb the importance of all the other nodes. In the example, the probability distribution converges to π = (0, 0.5, 0.5, 0). This is not the desired result.
• Dead end: when the walker arrives at node 2, it cannot reach any other node because node 2 has no outlink. The corresponding column of P is zero, so probability leaks out of the chain at every step and the iteration does not converge to a valid distribution.
• To get over these two problems, we introduce the notion of teleportation.
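Both problems can be reproduced on small graphs. The sketch below uses two hypothetical graphs with nodes 0 to 3, which are not the graphs shown on the slides, and runs a plain random walk without teleportation. The function names and the number of steps are assumptions.

```python
def step(links, pi):
    """One random-walk step: each node splits its probability over its outlinks."""
    new_pi = [0.0] * len(pi)
    for i, targets in links.items():
        for j in targets:
            new_pi[j] += pi[i] / len(targets)
    return new_pi

def walk(links, steps=60):
    """Start from the uniform distribution and apply `steps` random-walk steps."""
    n = len(links)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = step(links, pi)
    return pi

# Hypothetical graphs, built only to reproduce the two behaviours.
spider_trap = {0: [1, 3], 1: [2], 2: [1], 3: [0, 2]}  # 1 and 2 link only to each other
dead_end = {0: [1, 3], 1: [2], 2: [], 3: [0, 2]}      # node 2 has no outlink

trap_pi = walk(spider_trap)
dead_pi = walk(dead_end)
print(trap_pi, sum(trap_pi))
print(dead_pi, sum(dead_pi))
```

In the first graph nodes 1 and 2 end up with essentially all the probability, close to (0, 0.5, 0.5, 0). In the second graph the total probability shrinks towards zero, because whatever reaches node 2 is lost at the next step.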

Teleportation

• Teleportation consists of connecting each node of the graph to all other nodes.
• The graph then becomes complete.
• The idea is that, with probability β, the random walker follows a link according to the transition matrix P, and with probability 1-β it jumps to a node chosen uniformly at random, so each node is reached by teleportation with probability (1-β)/n. We then get the new transition matrix R:

$$R = \beta P + (1-\beta)\, v e^{T}$$

where v is a vector of ones and e a vector whose entries all equal 1/n, so that every entry of v eᵀ is 1/n. β is commonly called the damping factor.
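The construction of R can be sketched as follows, assuming P is stored as a dense list of rows with columns summing to 1. The matrix P below is a hypothetical example containing a spider trap, and β = 0.85 is a commonly quoted value used here as an assumption, not a value taken from the slides.

```python
def teleport_matrix(P, beta=0.85):
    """Return R, whose entries are beta * P[j][i] + (1 - beta) / n."""
    n = len(P)
    return [[beta * P[j][i] + (1.0 - beta) / n for i in range(n)]
            for j in range(n)]

# Hypothetical column-stochastic P with a spider trap between nodes 1 and 2.
P = [[0.0, 0.0, 0.0, 0.5],
     [0.5, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.5, 0.0, 0.0, 0.0]]
R = teleport_matrix(P, beta=0.85)
column_sums = [sum(R[j][i] for j in range(4)) for i in range(4)]
print(column_sums)
```

Every entry of R is at least (1-β)/n, so the walker can always leave nodes 1 and 2, and each column still sums to 1 up to rounding.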

• Applying teleportation to our example gives a new transition matrix R.

• When P is column stochastic, R is column stochastic too, and for β < 1 all its entries are positive. It therefore admits a unique stationary distribution, and we can use all the theorems we saw previously.
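Putting the pieces together, a possible PageRank sketch iterates π ← βPπ + (1-β)/n directly on the adjacency list, so the dense matrix R is never built. The function `pagerank`, its default parameters and the graph are assumptions. The treatment of dead ends, which spreads their score over all nodes, is one common convention and is not described in the slides.

```python
def pagerank(links, beta=0.85, tol=1e-10, max_iter=200):
    """Power iteration with teleportation, without building the matrix R."""
    n = len(links)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new_pi = [(1.0 - beta) / n] * n  # teleportation share of every node
        for i, targets in links.items():
            if targets:
                for j in targets:
                    new_pi[j] += beta * pi[i] / len(targets)
            else:
                # Assumption: a dead end spreads its score over all nodes.
                for j in range(n):
                    new_pi[j] += beta * pi[i] / n
        if sum(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

# Hypothetical graph with a spider trap between nodes 1 and 2.
links = {0: [1, 3], 1: [2], 2: [1], 3: [0, 2]}
scores = pagerank(links)
print(scores)
```

With teleportation, nodes 1 and 2 still rank highest, but nodes 0 and 3 keep a positive score instead of dropping to zero.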

Thank You!!
Dr. Bharat Singh
Email id—
[email protected]
Mobile No–
8707223885
