
Pagerank (Brin and Page 1998)

The idea that made Google great

This document introduces the famous Pagerank algorithm (Brin and Page 1998). These lecture notes are
not meant as an exhaustive and detailed document; instead, they contain a quick description of the main
concepts and features of the algorithm, with links to references and other more complete resources. Students
are strongly encouraged to read these; doing so should help with the “AA” transversal skill for the CAI-GCED
course. In the mid-term or final exams you will be asked questions regarding this topic, so make sure you
understand the algorithm and some of the extensions mentioned here.

1. What is pagerank? Main ideas


As its name suggests, Pagerank is an algorithm that generates a ranking. Namely, given a directed graph
as input (a web graph, for example, where nodes are pages and links are html hyperlinks) it assigns an
importance score to each page or node (its “pagerank”). The algorithm was proposed in the context of web
search, and the pagerank of pages was used to rank search results. Using the global structure of the web to
improve the ranking of web results was the magic ingredient that gave Google an advantage over its competitors.
In this text I talk about pages and nodes interchangeably, and it should be understood that when I say page I
mean a node in the input graph.
The main idea of the algorithm lies in the fact that a link from page A to page B should be understood as an
endorsement of B’s importance by A. Not all endorsements are equal, and being endorsed by the Queen of
England is not the same as being endorsed by a random citizen.

A page is important if it is pointed to by other important pages

To formalize this seemingly circular concept, let us start by introducing some notation. Let G = (V, E) be a
directed graph (the web graph) with nodes V = {1, ..., n} – so, there are n pages¹ – and (i, j) ∈ E if page i
points to page j.
As our running example, we may have the following 4-node graph given by

• V = {1, 2, 3, 4}, and


• E = {(1, 1), (1, 3), (1, 4), (2, 1), (2, 4), (3, 2), (3, 4), (4, 2)}

The pagerank pi of a page i ∈ V is a real value (a positive score) associated to the page. It corresponds
to the importance of node i globally in the graph. The intuition behind the mathematical definition of
pagerank is that

¹ Think of n as extremely large.

The pagerank (prestige) of a node is passed in equal parts to the nodes to which it points.

So we define:

Definition (pagerank): The vector (pi)i∈V of pageranks should satisfy

1. ∑i pi = 1, and
2. for all i: pi = ∑(j,i)∈E pj / out(j)

where out(j) is the outdegree of vertex j.

Example with toy graph


The definition leads to a system of n + 1 linear equations. For example, let us instantiate the equation for p1
using the general definition pi = ∑(j,i)∈E pj / out(j). We have to go over all nodes j pointing to 1, which are
nodes 1 and 2. Therefore we get that:

    p1 = p1/3 + p2/2

Notice the different denominators: node 1 shares its pagerank equally among 3 nodes (its outdegree is 3), while
node 2 has outdegree 2 and hence shares its pagerank equally between nodes 1 and 4. The following picture should
make the “flow” of pageranks clear:

This leads to the set of n + 1 linear equations:

    p1 = p1/3 + p2/2
    p2 = p3/2 + p4
    p3 = p1/3
    p4 = p1/3 + p2/2 + p3/2
    1  = p1 + p2 + p3 + p4

Notice also that the pageranks are distributed across the graph but the net pagerank should stay constant
(and add up to 1)².
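As a quick numerical check (a sketch using numpy; not part of the original notes), we can stack the n equations with the normalization equation and solve the resulting overdetermined system by least squares:

```python
import numpy as np

# Transpose of the transition matrix of the toy graph:
# entry (i, j) holds 1/out(j) when page j points to page i.
Mt = np.array([[1/3, 1/2, 0,   0],
               [0,   0,   1/2, 1],
               [1/3, 0,   0,   0],
               [1/3, 1/2, 1/2, 0]])

# The n equations p = Mt p rewritten as (I - Mt) p = 0,
# stacked with the normalization equation sum(p) = 1.
A = np.vstack([np.eye(4) - Mt, np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

# Overdetermined (n + 1 equations, n unknowns) but consistent,
# so least squares recovers the exact solution.
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print(p.round(2))  # approximately [0.26, 0.35, 0.09, 0.30]
```

The exact solution is (6/23, 8/23, 2/23, 7/23), which matches the power-method output reported later in these notes.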

2. Linear algebra view


We may write the system of linear equations compactly, as usual, using matrix notation. Let M (called the
transition matrix) be the matrix such that:

• Mij = 1/out(i) if (i, j) ∈ E
• Mij = 0 if (i, j) ∉ E

Then the system of equations above is equivalent to the matrix equation

    p = M^T p

² For well-behaved graphs at least; we shall see this later.

Notice that p is an eigenvector of M^T associated to eigenvalue 1, and so finding the solution to our system
of equations is equivalent to finding the leading eigenvector of the matrix M^T.
In conclusion, a node’s importance is given by its coordinate in the leading eigenvector of the transpose M^T
of the transition matrix M.

Example with toy graph


Rows of M add to 1 (M is row-stochastic). Columns of M^T add to 1 (M^T is column-stochastic).

    M   = [ 1/3   0   1/3  1/3 ]        M^T = [ 1/3  1/2   0    0  ]
          [ 1/2   0    0   1/2 ]              [  0    0   1/2   1  ]
          [  0   1/2   0   1/2 ]              [ 1/3   0    0    0  ]
          [  0    1    0    0  ]              [ 1/3  1/2  1/2   0  ]

The matrix equation p = M^T p for the toy graph reads:

    [ p1 ]   [ 1/3  1/2   0    0  ] [ p1 ]
    [ p2 ] = [  0    0   1/2   1  ] [ p2 ]
    [ p3 ]   [ 1/3   0    0    0  ] [ p3 ]
    [ p4 ]   [ 1/3  1/2  1/2   0  ] [ p4 ]
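As a side note, the matrices above can be built programmatically from the edge list; the following numpy sketch (variable names are illustrative, not from the text) constructs M and checks that it is row-stochastic:

```python
import numpy as np

# Toy graph edges, 1-indexed as in the text: (i, j) means page i links to page j.
edges = [(1, 1), (1, 3), (1, 4), (2, 1), (2, 4), (3, 2), (3, 4), (4, 2)]
n = 4

# Outdegree of each node.
out = np.zeros(n)
for i, _ in edges:
    out[i - 1] += 1

# Row-stochastic transition matrix: M[i, j] = 1/out(i) if (i, j) in E.
M = np.zeros((n, n))
for i, j in edges:
    M[i - 1, j - 1] = 1 / out[i - 1]

print(M.sum(axis=1))  # every row sums to 1: M is row-stochastic
```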

3. Probabilistic view (random surfer)


An equivalent but useful view of pagerank is given by a probabilistic interpretation of a random surfer that
jumps from page to page at random (following links in the web graph at random).
Following our toy example, assume that the surfer starts at node 1, and follows links uniformly at random.

Possible sequences of nodes visited by the random surfer:


• 1, 1, 1, 1, 4, 2, ..
• 1, 3, 4, 2, 1, 3, ..
• 1, 4, 2, 1, 4, 2, ..

We now view the “pagerank vector” as a distribution over the nodes of the web graph, describing the location
of the random surfer at time t.
For example:
• p(t = 0) = (1, 0, 0, 0)^T means that at time t = 0 the random surfer is at node 1.
• p(t = 0) = (1/4, 1/4, 1/4, 1/4)^T means that at time t = 0 the random surfer could be at any node with
equal probability.

Exercise: Supposing that the random surfer starts from node 1, where could we find her at time t = 1, and at
time t = 2? And at any given time t?
Here, the transition matrix is telling us the probabilities of jumping between nodes. In particular, the first
row of M tells us that, at time t = 1, the location of the surfer is given by p(t = 1) = (1/3, 0, 1/3, 1/3)^T, which we
obtain by the matrix-vector multiplication p(t = 1) = M^T p(t = 0). In general, to figure out the location at
the following time step, we can use

    p(t + 1) = M^T p(t) = (M^T)^2 p(t − 1) = ... = (M^T)^(t+1) p(0)

And the fixed point of this recurrence gives us the solution to pagerank as well: when p(t + 1) = p(t) =
M^T p(t), we have clearly found the pagerank solution, since it satisfies the linear equations as required.
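The recurrence can be explored numerically. A small numpy sketch, starting the surfer at node 1 (this also illustrates the exercise above):

```python
import numpy as np

# Transpose of the toy graph's transition matrix.
Mt = np.array([[1/3, 1/2, 0,   0],
               [0,   0,   1/2, 1],
               [1/3, 0,   0,   0],
               [1/3, 1/2, 1/2, 0]])

p = np.array([1.0, 0.0, 0.0, 0.0])  # surfer starts at node 1
for t in range(1, 4):
    p = Mt @ p                      # p(t+1) = M^T p(t)
    print(f"p(t={t}) = {p.round(3)}")
# p(t=1) = (1/3, 0, 1/3, 1/3), matching the text.
```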

4. Power iteration method


The last recurrence suggests a method for finding the pagerank values of a graph, called the power iteration
method for obvious reasons:
• t = 0
• p(0) = (1/n, ..., 1/n)^T
• repeat until convergence:
  – p(t + 1) = M^T p(t)
  – t = t + 1

Implementation of power method and application to toy example


import numpy as np

Mt = np.array([[1/3, 0, 1/3, 1/3],
               [1/2, 0, 0, 1/2],
               [0, 1/2, 0, 1/2],
               [0, 1, 0, 0]]).transpose()

n = 4

t = 0
p = np.array([1/n] * n)
p_old = p + 10

while not np.allclose(p, p_old):
    t += 1
    p_old = p
    p = Mt @ p

print(f'converged after {t} iterations to vector {p}')

produces the output:
converged after 22 iterations to vector [0.26 0.35 0.09 0.3]
Worth noting is that this type of implementation, relying directly on matrix-vector multiplication, may be
wasteful in space: storing the matrix M in explicit form is wasteful because it may contain lots of 0s for
sparse graphs – and most real graphs are actually sparse – and, consequently, wasteful in time too. More
efficient implementations exist where each iteration of the power method can be done in time O(n + m), where
n is the number of nodes and m is the number of edges in the graph. Note that the matrix representation uses
space, and hence time, O(n²) in each iteration, which for large graphs may be prohibitive. More on this in
the corresponding pagerank lab session.
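As a sketch of the sparse idea (an illustrative implementation, not the lab's actual code), each iteration can traverse the edge list once, giving O(n + m) time per iteration without ever materializing the n × n matrix:

```python
import numpy as np

def pagerank_sparse(edges, n, tol=1e-10):
    """Power iteration over an edge list: O(n + m) time and space per
    iteration; the n x n matrix is never built explicitly."""
    out = [0] * n
    for i, _ in edges:
        out[i] += 1
    p = np.full(n, 1 / n)
    while True:
        p_new = np.zeros(n)
        for i, j in edges:
            p_new[j] += p[i] / out[i]  # node i passes p[i]/out(i) along (i, j)
        if np.allclose(p_new, p, atol=tol):
            return p_new
        p = p_new

# Toy graph, 0-indexed
edges = [(0, 0), (0, 2), (0, 3), (1, 0), (1, 3), (2, 1), (2, 3), (3, 1)]
print(pagerank_sparse(edges, 4).round(2))  # approximately [0.26, 0.35, 0.09, 0.30]
```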

5. Convergence guarantees of the power method


Now, does the power method always work? If not, when is it guaranteed to work? We need to figure out the
following:
• The method converges to some solution
• The method converges to a unique solution
• The method converges fast to the unique solution
• The method converges fast to the unique solution for any starting point
It turns out that the power method can indeed fail for some bad inputs M. Let us look at some of the things
that could go wrong:

5.1. Dangling nodes


Dangling nodes are nodes with no outgoing links. The presence of such nodes is problematic for the
definition of pagerank. In fact, it is not hard to find a simple input graph with a dangling node where there
is no solution to the pagerank system of equations (Exercise: find it!).
Note that in the matrix M, the row corresponding to a dangling node is all 0, and so the matrix is not
stochastic (the rows corresponding to dangling nodes sum to 0 rather than to the expected 1 in stochastic
matrices). In fact, when the transition matrix M is stochastic (equivalently, M^T is column-stochastic), we are
guaranteed to find at least one solution by the Perron-Frobenius theorem of linear algebra.
So, to guarantee the existence of a solution, we shall force the matrices M over which we run the power
method to be stochastic. How do we do that? Essentially, by substituting the all-zero rows corresponding to
dangling nodes with uniform rows where all entries are 1/n. This corresponds to adding outgoing links from
dangling nodes to all nodes in the graph (including themselves). Or, equivalently, we are redistributing the
pagerank of dangling nodes across all nodes equally.
Remember that we said at some point that if all goes well the total pagerank always adds up to 1? Well, in
the presence of dangling nodes this is not so (try to imagine why!), and when executing the iterations of the
power method we will see that the total pagerank gets smaller and smaller. In other words, pagerank seems to
leak. If you find this is the case in your implementation, then chances are you are not dealing with dangling
nodes appropriately.
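A minimal numpy sketch of the fix, on a hypothetical 3-node graph where node 3 is dangling:

```python
import numpy as np

# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 1; node 3 has no out-links.
M = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])   # all-zero row: M is not stochastic

n = M.shape[0]
dangling = M.sum(axis=1) == 0    # rows that sum to 0
M[dangling] = 1 / n              # replace them with the uniform row

print(M.sum(axis=1))  # every row now sums to 1: M is stochastic
```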

5.2. Sink nodes


Sink nodes are nodes whose only outgoing link is to themselves. Try to think what happens to a random
surfer once she enters a sink node. There is no escape! So, these types of nodes end up hoarding all the
pagerank, rendering it useless.
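A quick numerical illustration on a hypothetical 2-node graph where node 2 is a sink:

```python
import numpy as np

# Node 1 links to node 2; node 2 links only to itself (a sink).
M = np.array([[0.0, 1.0],
              [0.0, 1.0]])

p = np.array([0.5, 0.5])
for _ in range(50):
    p = M.T @ p
print(p)  # the sink has hoarded all the pagerank: (0, 1)
```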

5.3. Disconnected components


Something similar happens in graphs that are not strongly connected, for example graphs with disconnected
components. Once the surfer enters such a component there is no way out, and so uniqueness of the pagerank
solution is not guaranteed; basically, pagerank is ill-defined in this case.
Think of the following scenario: two disconnected nodes, each with only a self-loop, so that M (and hence
M^T) is the 2 × 2 identity matrix. The system of equations is

    p = [ 1  0 ] p
        [ 0  1 ]

and so, in this extreme case, any vector is a solution!

Graphs with disconnected components have more than one eigenvector associated to the eigenvalue 1. If the
graph is strongly connected this does not happen – the eigenvalue 1 has multiplicity 1 – and then uniqueness
of the solution is guaranteed.

5.4. Certain cyclic patterns


A certain kind of cyclic graph is problematic, since it can make the power iteration fail. See, for example,
the case of a directed cycle on 4 nodes.

This has a unique solution, (1/4, 1/4, 1/4, 1/4)^T; however, the power method fails to converge if it starts
from any vector other than the solution vector.

Not all cyclic graphs are problematic; in fact, the problematic ones are those that are periodic. Hence the
solution will be to make sure that all input graphs are aperiodic.
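The failure is easy to reproduce on the smallest periodic example, a 2-cycle (an illustrative sketch, not from the text):

```python
import numpy as np

# Two pages that point only at each other: a cycle of period 2.
Mt = np.array([[0.0, 1.0],
               [1.0, 0.0]])  # M^T (here M is symmetric, so M^T = M)

p = np.array([1.0, 0.0])  # start at node 1
for t in range(4):
    print(f"p(t={t}) = {p}")
    p = Mt @ p
# The iterates oscillate between (1, 0) and (0, 1) forever and never
# reach the unique solution (1/2, 1/2).
```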

5.5. Fixing the problematic cases: damping factor (λ)


We fix the problematic cases by modifying the transition matrix so that it is stochastic and its graph is
aperiodic and strongly connected. This way none of the problematic cases can happen, a unique solution
exists, and the power method is guaranteed to find it fast.
Now, to solve the issues of sink nodes, disconnected components, and periodic input graphs, Google’s
founders came up with the notion of a damping factor. They define the Google matrix G as a mixture of the
original transition matrix M (forced to be stochastic as described above) and the transition matrix
corresponding to a complete digraph, to guarantee success of the power method:

    G = λM + (1 − λ)(1/n) J

where J is the matrix containing all 1s and 0 < λ < 1 is the damping factor.
Instead of running the power method on M, we are going to use the Google matrix G, which is guaranteed to
have a unique pagerank solution that the power method will find fast. In essence, the graph that corresponds
to G is strongly connected and aperiodic, and G is stochastic, so the Perron-Frobenius theorem guarantees
existence and uniqueness of the solution, and the second eigenvalue of G governs the convergence rate of the
power method.
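A sketch of the power method run on G for the toy graph, with λ = 0.85 chosen arbitrarily within the typical range:

```python
import numpy as np

# Row-stochastic transition matrix of the toy graph.
M = np.array([[1/3, 0,   1/3, 1/3],
              [1/2, 0,   0,   1/2],
              [0,   1/2, 0,   1/2],
              [0,   1,   0,   0]])
n, lam = 4, 0.85

# Google matrix: mix M with the uniform complete-graph matrix J/n.
G = lam * M + (1 - lam) / n * np.ones((n, n))

p = np.full(n, 1 / n)
for _ in range(100):
    p = G.T @ p          # power iteration on G instead of M
print(p.round(3))
```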

5.6. Damping factor and teleportation


The way we can interpret the damping factor in the random surfer interpretation of pagerank is as follows.
Before following a link in the web graph, the random surfer tosses a (biased) coin: with probability λ she
follows an outgoing link from the current location, but with probability 1 − λ, she jumps (i.e. teleports) to
any node in the web graph.

Final observations on the damping factor λ:
• λ is there to ensure uniqueness and (fast) convergence; it is not part of the pagerank definition.
• As λ → 1, the solution gets closer to the “true” pagerank.
• As λ → 0, the solution gets closer to uniform (not interesting).
• As λ → 0, convergence is guaranteed to be faster.
• λ balances speed against accuracy.
• Values between 0.8 and 0.9 are typical.

6. Topic-sensitive pagerank
The pagerank vector is defined on the basis of the linear system of equations p = G^T p, or equivalently

    p = λ M^T p + (1 − λ) u

where u is the uniform vector (all of its entries are 1/n).


This alternative formulation can be useful for defining personalized pageranks, by modifying the teleportation
to bias the result towards some subset of pages. So, instead of teleporting to any page uniformly at random
(as given by u), we can teleport to a particular subset of pages given by some personalized vector r:

    p = λ M^T p + (1 − λ) r

This is the idea in the following paper, which describes topic-sensitive pagerank. Notice that depending on
the nature of r, the resulting aperiodicity, strong connectivity, etc. may be broken, and so extra care needs
to be taken in those cases.
For more information, please read Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web
search by T.H. Haveliwala (Haveliwala 2003).
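A sketch of the personalized iteration on the toy graph, with an illustrative choice of r that teleports only to page 1:

```python
import numpy as np

# Transpose of the toy graph's transition matrix.
Mt = np.array([[1/3, 1/2, 0,   0],
               [0,   0,   1/2, 1],
               [1/3, 0,   0,   0],
               [1/3, 1/2, 1/2, 0]])
lam = 0.85
r = np.array([1.0, 0.0, 0.0, 0.0])  # teleportation always lands on page 1

p = np.full(4, 0.25)
for _ in range(100):
    p = lam * (Mt @ p) + (1 - lam) * r   # p = λ M^T p + (1 − λ) r
print(p.round(3))  # biased towards page 1 and the pages it endorses
```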

References
Brin, Sergey, and Lawrence Page. 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.”
Comput. Networks 30 (1-7): 107–17. https://doi.org/10.1016/S0169-7552(98)00110-X.
Haveliwala, T. H. 2003. “Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search.”
IEEE Transactions on Knowledge and Data Engineering 15 (4): 784–96. https://doi.org/10.1109/TKDE.2003.
1208999.
