Math 551 Lab 12
Goals: To apply linear algebra techniques to the ranking of webpages in the World Wide Web.
To get started:
Matlab commands used: load, size, numel, nnz, for... end, gplot
What you have to submit: The file lab12.m which you will modify during the lab session.
INTRODUCTION
According to user surveys, the majority of users look only at the first few results of an online
search, and very few look past the first page of results. Hence, it is crucially important
to rank the pages in the “right” order so that the most reputable and relevant results
come first. The simplest way to determine the rank of a webpage in a network is to look at how
many times it has been referred to by other webpages. This simple ranking method leaves a
lot to be desired. In particular, it can be easily manipulated by referring to a certain webpage
from a lot of “junk” webpages. The quality of the webpages referring to the page we are trying
to rank should matter too. This is the main idea behind the Google PageRank algorithm.
The Google PageRank algorithm is the oldest algorithm used by Google to rank webpages;
the pages are ranked offline, before any query is made, and the PageRank score of every
webpage is recomputed each time Google crawls the web. Let us look at the theory behind the
algorithm. As it turns out, it is based on theorems of linear algebra!
The main assumption of the algorithm is that if you are located on any webpage then with equal
probability you can follow any of the hyperlinks from that page to another page. This allows
us to represent a webpage network as a directed graph with the webpages being the nodes,
and the edges being the hyperlinks between the webpages. The adjacency matrix of such a
network is built in the following way: the (i, j)th element of this matrix is equal to 1 if there
is a hyperlink from the webpage i to the webpage j and is equal to 0 otherwise. Then the row
sums of this matrix represent the numbers of hyperlinks going out of each webpage, and the
column sums represent the numbers of times each webpage has been referred to by other webpages.
Furthermore, we can generate a matrix of probabilities S such that the (i, j)th element of this
matrix is equal to the probability of traveling from the ith webpage to the jth webpage in the
network. This probability is equal to zero if there is no hyperlink from the ith page to the jth
page and is equal to 1/Ni if there is a hyperlink from the ith page to the jth page, where Ni is
the total number of hyperlinks from the ith page. For instance, consider a sample network of
only four webpages, as shown in Fig. 1. The matrix S for this network can be written as:
    S = [  0    1/3   1/3   1/3
          1/2    0     0    1/2
           0    1/2    0    1/2        (1)
           0     0     0     0  ]

Figure 1: Sample network of four webpages

¹The original network data is available here: https://snap.stanford.edu/data/.
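As a quick illustration (this snippet is not part of lab12.m), the matrix S above can be
generated in Matlab from the adjacency matrix of the sample network:

% Adjacency matrix of the sample network in Fig. 1: entry (i,j) is 1
% if there is a hyperlink from webpage i to webpage j
A = [0 1 1 1;
     1 0 0 1;
     0 1 0 1;
     0 0 0 0];
Ni = sum(A,2);                   % number of hyperlinks on each page
S = zeros(4);
for i = 1:4
    if Ni(i) > 0
        S(i,:) = A(i,:)/Ni(i);   % each link on page i is equally likely
    end                          % row 4 (no outgoing links) stays zero
end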
There are several issues which make working with the matrix S inconvenient. First of all, there
are webpages that do not have any outgoing hyperlinks; these are called “dangling nodes” (such
as node 4 in Fig. 1). These nodes correspond to zero rows of the matrix S. Moreover, the webpages
in the network may not be connected to each other and the graph of the network may consist
of several disconnected components. These possibilities lead to undesirable properties of the
matrix S which make computations more complicated and sometimes even impossible.
The problem of dangling nodes can be solved by setting all elements of the matrix S in the
rows corresponding to the dangling nodes to the equal probability 1/N, where N is the number
of nodes in the network. This can be understood in the following way: if we are at a dangling
node, we can with equal probability jump to any other page in the network. To solve the
potential disconnectedness problem, we assume that a user can follow hyperlinks on any page
with a probability 1 − α and can jump (or “teleport”) to any other page in the network with
a probability α. The number α is called the damping factor. The value α = 0.15 is usually
taken in practical applications. The “teleport” surfing of the network can be interpreted as a
user manually typing the webpage address in the browser or using a saved hyperlink from their
bookmarks to move from one page to another. The introduction of the damping factor allows
us to obtain the Google matrix G in the form:
G = (1 − α)S + αE,
where E is the matrix with all elements equal to 1/N, and N is the number of webpages in
the network.
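Continuing the small example above, a minimal sketch of forming G (with the dangling row of
S first replaced by the uniform probabilities 1/N, as described above):

N = 4; alpha = 0.15;           % damping factor
S(4,:) = 1/N;                  % fix the dangling node (page 4)
E = ones(N)/N;                 % uniform "teleportation" matrix
G = (1-alpha)*S + alpha*E;     % the Google matrix
sum(G,2)                       % every row of G sums to 1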
The matrix G has nice properties. In particular, it has only positive entries and all of its rows
sum up to 1. In mathematical language, this matrix is stochastic and irreducible (you can look
up the precise definitions of these terms if you are interested). The matrix G satisfies the
assumptions of the following Perron-Frobenius theorem:
Theorem 1 (Perron-Frobenius) Every square matrix with positive entries has a unique unit
eigenvector with all positive entries. The eigenvalue corresponding to this eigenvector is real
and positive. Moreover, this eigenvalue is simple and is the largest in absolute value among all
the eigenvalues of this matrix.
Let us apply this theorem to the matrix G. First of all, observe that the row sums of the matrix
G are equal to 1. Consider the vector 1 = (1, 1, ..., 1)^T (the vector of all 1s). It is easy to see
that
G1 = 1.
(Multiplying the matrix G by 1 has the same effect as the MATLAB command sum(G,2).) But
then it follows that v1 = 1/‖1‖ = (1/√N) 1 is the unique unit eigenvector with all positive
components, and, therefore, by the Perron-Frobenius theorem, λ1 = 1 is the largest eigenvalue!
We are interested in the left eigenvector for the eigenvalue λ1 = 1:
u1^T G = u1^T.
Again, by the Perron-Frobenius theorem, the vector u1 is the unique unit eigenvector with all
positive components corresponding to the largest in absolute value eigenvalue λ1 = 1. We will
use the components of this vector for the ranking of webpages in the network.
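For the small example, this left eigenvector can be computed with eig (a sketch; Tasks 8
and 9 below do the same for a larger network):

[V,D] = eig(G');                   % eigenvectors of G' are the left eigenvectors of G
[~,k] = max(real(diag(D)));        % locate the largest eigenvalue, lambda_1 = 1
u1 = abs(V(:,k))/norm(V(:,k),1);   % scale to a probability vector (entries sum to 1)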
Let us look at the justification behind this algorithm. We have already established that the
vector u1 exists. Consider the following iterative process. Assume that at the beginning a user
can be on any webpage in the network with equal probability:
w0 = (1/N, 1/N, ..., 1/N ).
After one step (one move from one webpage to another using hyperlinks or teleporting), the
probability of being on the ith webpage is given by the ith component of the vector
w1 = w0 G.
After two moves the vector of probabilities becomes
w2 = w1 G = w0 G^2,
and so on.
We hope that after a large number of steps n, the vector wn = w0 G^n starts approaching some
kind of limit vector w∗, that is, wn → w∗. It turns out that due to the properties of the matrix G
this limit vector w∗ indeed exists and it is exactly the left eigenvector u1 corresponding to the
largest eigenvalue, namely, λ1 = 1. Moreover, numerical computation of matrix eigenvalues is actually
based on taking the powers of the matrix (it is called the Power method) and not on solving
the characteristic equation!
Notice that for any row vector w, the dot product w1 is simply the sum of all entries of w.
(That is, w*ones(N,1) == sum(w).) Also,
G1 = 1 ⇒ G^2 1 = 1 ⇒ G^3 1 = 1 ⇒ ···
(see if you can figure out why) so
wn 1 = w0 G^n 1 = w0 1 = 1.
From this, it is possible to show that w∗ is a non-negative vector whose entries sum to 1.
The ith component of this vector represents the probability of being on the ith webpage in the
network after a very large number of moves along the hyperlinks. Thus, it is reasonable to take
these probabilities as the ranking of the webpages in the network.
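For the four-page example above, the iteration can be carried out directly; a sketch reusing
the matrix G built earlier (note that sum(w) stays equal to 1 at every step, as argued above):

w = ones(1,4)/4;         % w0: equal probability of starting on any page
for n = 1:100
    w = w*G;             % one move: wn = w(n-1)*G = w0*G^n
end
sum(w)                   % still equals 1
w                        % approximates the limit vector w*, i.e., u1'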
TASKS
1. Open the file lab12.m. In the code cell titled %%Load the network data load the data
from the file AdjMatrix.mat into Matlab by using the load command. Save the resulting
matrix as AdjMatrix. Observe that the adjacency matrices of real networks are likely to
be very large (they may contain millions of nodes or more) and sparse. Check the sparsity of
the matrix AdjMatrix using the functions numel and nnz. Denote the ratio of the number of
non-zero elements to the total number of entries in AdjMatrix as RatioNnzAdjMatrix.
Variables: AdjMatrix, RatioNnzAdjMatrix
2. Check the dimensions of the matrix AdjMatrix using the size function. Save the dimen-
sions as new variables m and n.
Variables: m, n
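A minimal sketch of Tasks 1 and 2 (assuming the file AdjMatrix.mat stores the matrix under
the name AdjMatrix; if not, rename the loaded variable accordingly):

load('AdjMatrix.mat');                                 % assumed to create the variable AdjMatrix
RatioNnzAdjMatrix = nnz(AdjMatrix)/numel(AdjMatrix);   % fraction of non-zero entries
[m,n] = size(AdjMatrix);                               % dimensions of the matrix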
3. Observe that while the network described by the matrix AdjMatrix is not large at all from
the viewpoint of practical applications, computations with this matrix may still take a
noticeable amount of time. To save time, we will cut a subset out of this network and use
it to illustrate the Google PageRank algorithm. Introduce a new variable NumNetwork and
set its value to 500. Then cut a submatrix AdjMatrixSmall out of the matrix AdjMatrix
and plot the graph represented by the matrix AdjMatrixSmall by running the following
code cell (see the file lab12.m):
%% Display a small amount of network
NumNetwork = 500;
AdjMatrixSmall = AdjMatrix(1:NumNetwork,1:NumNetwork);
coordinates = zeros(NumNetwork,2);     % preallocate random node locations
for j = 1:NumNetwork
    coordinates(j,1) = NumNetwork*rand;
    coordinates(j,2) = NumNetwork*rand;
end
gplot(AdjMatrixSmall,coordinates,'k-*');
This will plot the subgraph of the first 500 nodes in the network with random locations
of the nodes. Notice the use of the function gplot to produce this plot. Matlab also has
dedicated functions graph and digraph for working with graphs (documented under “Graph
and Network Algorithms”), but they are available only in newer Matlab releases. Simpler
methods, as shown above, will be sufficient for our purposes.
Variables: AdjMatrixSmall, coordinates, NumNetwork
4. Use sum and max to check the number of links originating from each webpage, namely,
find the largest out-degree and the page with the largest out-degree.
Variables: NumLinks, MaxOutLinks, PageMaxOutLinks
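One possible sketch (out-degrees are the row sums of AdjMatrixSmall; full converts from the
sparse format, in case the matrix is stored as sparse):

NumLinks = full(sum(AdjMatrixSmall,2));          % number of links leaving each page
[MaxOutLinks,PageMaxOutLinks] = max(NumLinks);   % largest out-degree and its page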
5. Create a matrix of probabilities (the Google matrix). Element (i, j) of the matrix shows the
probability of moving from the i-th page of the network to the j-th page. It is assumed that
the user can follow any link on the page with a total probability of 85% (all hyperlinks
are equally likely), and jump (teleport) to any other page in the network with a total probability
of 15% (again, all pages are equally likely). Namely, we set the parameter α = 0.15 and proceed
as follows (see the file lab12.m):
alpha=0.15;                                % damping (teleportation) factor
GoogleMatrix=zeros(NumNetwork,NumNetwork);
for i=1:NumNetwork
    if NumLinks(i)~=0
        % page i has links: each of them is followed with equal probability
        GoogleMatrix(i,:)=AdjMatrixSmall(i,:)./NumLinks(i);
    else
        % dangling node: jump to any page with equal probability
        GoogleMatrix(i,:)=1./NumNetwork;
    end
end
GoogleMatrix=(1-alpha)*GoogleMatrix+alpha*ones(NumNetwork,NumNetwork)./NumNetwork;
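A quick sanity check (not required by the lab): every row of GoogleMatrix should now sum
to 1 up to round-off:

max(abs(sum(GoogleMatrix,2) - 1))   % should be on the order of machine epsilon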
7. Introduce the 1 × NumNetwork row vector w0 with each entry equal to 1/NumNetwork,
and compute the successive vectors w1 = w0 G, w2 = w1 G, w3 = w2 G, w90 = w0 G^90,
w100 = w0 G^100, where G is the GoogleMatrix. Compute the difference δw = w100 − w90.
Observe that the sequence wn converges to a certain limit vector w∗ very fast.
Variables: w0, w1, w2, w3, w90, w100, deltaw
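One possible sketch of this task:

w0 = ones(1,NumNetwork)/NumNetwork;   % uniform initial probability vector
w1 = w0*GoogleMatrix;
w2 = w1*GoogleMatrix;
w3 = w2*GoogleMatrix;
w90  = w0*GoogleMatrix^90;            % w0*G^90
w100 = w0*GoogleMatrix^100;           % w0*G^100
deltaw = w100 - w90;
norm(deltaw)                          % very small: the sequence has essentially converged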
8. Compute the eigenvalues and the left and the right eigenvectors of the matrix G using the
function eig. Observe that the right eigenvector corresponding to the eigenvalue λ1 = 1
is proportional to the vector v1 = (1, 1, ..., 1). To compute the left eigenvectors, use the
function eig on the matrix G’. Select the left eigenvector corresponding to the eigenvalue
λ1 = 1 and denote it as u1.
Variables: u1
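A hedged sketch (eig applied to G' returns the right eigenvectors of G', which are the left
eigenvectors of G; real discards the zero imaginary round-off):

[V,D] = eig(GoogleMatrix');    % columns of V are left eigenvectors of GoogleMatrix
[~,idx] = max(real(diag(D)));  % position of the largest eigenvalue, lambda_1 = 1
u1 = real(V(:,idx));           % left eigenvector for lambda_1 = 1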
9. In general, the vector u1 returned by eig is not scaled to have all positive components
(even though they will all have the same sign). Normalize this vector by using the code:
u1=abs(u1)/norm(u1,1);
This will create a probability vector with all positive components whose entries sum to 1.
10. Use the function max to find the maximal element of the vector u1 and its index.
Variables: MaxRank, PageMaxRank
Q2: Which page is the most important in the network?
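For instance:

[MaxRank,PageMaxRank] = max(u1);   % largest PageRank score and the page attaining it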
11. Find out whether the highest ranking webpage is the same as the page with the most
hyperlinks pointing to it. To do so, create the vector of column sums of the matrix
AdjMatrixSmall and save it as MaxInLinks. Use the function max again to select the
page with the maximum number of in-links.
Variables: MaxInLinks, PageMaxInLinks
Q3: What is the number of hyperlinks pointing to the webpage PageMaxRank?
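A sketch, reading MaxInLinks as the vector of column sums named in the task:

MaxInLinks = full(sum(AdjMatrixSmall,1));   % in-degree of each page (column sums)
[~,PageMaxInLinks] = max(MaxInLinks);       % page with the most incoming links
MaxInLinks(PageMaxRank)                     % Q3: number of links pointing to PageMaxRank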