Lab 4-2

Uploaded by

yi Liu
Copyright
© © All Rights Reserved

Lab 4 Internet Search Engines Math 2B

I. Page Rank

Modern search engines employ ranking methods to provide the “best” results first. One of the
best-known and most influential algorithms for computing the relevance of web pages is the Page
Rank algorithm used by the Google search engine. The idea behind this algorithm is that
the importance of any web page can be judged by looking at the pages that link to it. If we create
a web page 𝑖 and include a hyperlink to the web page 𝑗, this means that we consider 𝑗 important
and relevant for our topic. If there are a lot of pages that link to 𝑗, this means that the common
belief is that page 𝑗 is important. If, on the other hand, 𝑗 has only one backlink, but that link comes
from an authoritative site 𝑘, we say that 𝑘 transfers its authority to 𝑗. That is to say, 𝑘 asserts that
𝑗 is important. Whether we talk about popularity or authority, we can iteratively assign a rank to
each web page, based on the ranks of the pages that point to it. To understand this idea, let us
look at a simple example illustrated by the image below. Suppose we have a small “internet”
consisting of just 4 web sites referencing each other in the manner suggested by the image.

If we assume that someone on any given page is equally likely to click on any of the links, we can
create a transition matrix whose states are the 4 webpages. The columns are labeled w1, w2, w3, w4
(the page the user is currently on) and the rows Webpage 1 through Webpage 4 (the page the user
moves to):

\[
P =
\begin{bmatrix}
0 & 0 & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & 0 & 1 & 0 \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & 0
\end{bmatrix}
\]

If we begin by assuming that a user is equally likely to be on any one of the pages to begin with,
we have an initial state vector 𝒙𝟎 = [0.25, 0.25, 0.25, 0.25]^𝑇. Then if we wanted to know the
probability that the user is on a given webpage after 𝑘 clicks, we have the probability vector 𝑃^𝑘 𝒙𝟎.
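As a quick check of what a single click does, the first step 𝒙𝟏 = 𝑃𝒙𝟎 can be worked out by hand. The computation below assumes the entries of 𝑃 are exactly as written out in this block (column 𝑗 holds the click probabilities out of page 𝑗); if the matrix on the handout differs, the arithmetic changes accordingly.

```latex
\[
\mathbf{x}_1 = P\mathbf{x}_0 =
\begin{bmatrix}
0 & 0 & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & 0 & 1 & 0 \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & 0
\end{bmatrix}
\begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \end{bmatrix}
=
\begin{bmatrix} \tfrac{1}{8} \\ \tfrac{1}{3} \\ \tfrac{1}{3} \\ \tfrac{5}{24} \end{bmatrix},
\qquad
\tfrac{1}{8} + \tfrac{1}{3} + \tfrac{1}{3} + \tfrac{5}{24} = 1 .
\]
```

The entries of 𝒙𝟏 still sum to 1, as they must for a probability vector.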
1) Use the techniques from lab 2 to find the limit of this Markov process (round to 3 decimal
places). The result is called the PageRank vector of our system.
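One possible way to approximate such a limit numerically is to apply 𝑃 repeatedly until the state vector stops changing. The sketch below is only an illustration and may differ from the techniques of lab 2. The helper names `mat_vec` and `power_iterate`, the tolerance, and the step limit are hypothetical, and the matrix entries are an assumption based on reading the 4-page example with column 𝑗 holding the click probabilities out of page 𝑗.

```python
def mat_vec(P, x):
    """Multiply a square matrix (list of rows) by a column vector (list)."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


def power_iterate(P, x0, tol=1e-12, max_steps=10_000):
    """Apply P repeatedly to x0 until successive vectors differ by less than tol."""
    x = list(x0)
    for _ in range(max_steps):
        nxt = mat_vec(P, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence; the chain may be periodic")


# Assumed transition matrix of the 4-page example
# (column j = click probabilities out of page j).
P = [
    [0,   0,   0, 1/2],
    [1/3, 0,   1, 0],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0],
]
x0 = [0.25, 0.25, 0.25, 0.25]

q = power_iterate(P, x0)
print([round(v, 3) for v in q])
```

Stopping when successive vectors agree is a practical convergence test, not a proof that a limit exists. For a periodic chain the iteration never settles, which is why the sketch raises an error after `max_steps`.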

2) Recall from lab 2 that steady states correspond to the case where 𝑃𝒒 = 𝒒. We now know that
this is an example of a matrix having an eigenvalue of 1. Find all of the eigenvectors of 𝑃 which
correspond to the eigenvalue 1. A probability vector is a vector whose entries are all nonnegative
and sum to 1. Verify that the PageRank vector in (1) is the unique probability vector
that is an eigenvector corresponding to the eigenvalue 1.

II. Slide shows (a simple example)

Consider a series of web pages which constitute a slide show. Each page has one or more of the
following buttons NEXT (go to next slide/page), PREVIOUS (go to previous page/slide), and
FIRST (go to first slide/page).
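A slide show's buttons can be turned into a transition matrix mechanically. The sketch below assumes every button on a page is equally likely to be clicked. The function name `transition_matrix` and the 3-page show are hypothetical and are not one of the lab's exercises. Exact fractions are used so the entries can be compared with hand computations.

```python
from fractions import Fraction


def transition_matrix(links):
    """Build a column-stochastic matrix from {page: [pages its buttons lead to]}.

    Pages are numbered 1..n. Entry [i][j] is the probability of moving
    from page j+1 to page i+1, assuming every button is equally likely.
    """
    n = len(links)
    P = [[Fraction(0)] * n for _ in range(n)]
    for page, targets in links.items():
        for target in targets:
            P[target - 1][page - 1] += Fraction(1, len(targets))
    return P


# Hypothetical 3-page show (not one of the lab's exercises):
# page 1 has NEXT; page 2 has NEXT and PREVIOUS; page 3 has PREVIOUS and FIRST.
links = {1: [2], 2: [3, 1], 3: [2, 1]}
for row in transition_matrix(links):
    print([str(entry) for entry in row])
```

Each column describes one page's buttons, so every column of the result should sum to 1 as long as each page has at least one button.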

3) Suppose a slide show consists of 4 pages. The first page only contains a NEXT button. The
second page contains a NEXT and a PREVIOUS button. The third page contains a NEXT,
PREVIOUS and FIRST button. The fourth page contains a PREVIOUS and a FIRST button.
Find a transition matrix between these four pages. What is the logical initial state for this Markov
process? Find the PageRank vector for this slide show.

4) Consider a 5 page slide show where the first page contains a NEXT button and the last page
contains a PREVIOUS and a FIRST button and all of the other pages contain a NEXT and
PREVIOUS button. Find the PageRank vector for this slide show.

5) Design a 5 page slide show where the third page has the highest rank. Do you believe you
have designed a slide show that will give the third page the highest possible rank? Why or why
not? Does your slide show represent an “honest” slide show? Explain.

III. Dangling Nodes

Sometimes you might end up on a webpage that has no outgoing links. We refer to this as a
dangling node. Consider the simple example of just 3 webpages. Page 1 and Page 2 only have a
link to Page 3, but Page 3 has no outgoing links. This is described by the directed graph

1 → 3 ← 2

6) Write a transition matrix for this system. Verify that regardless of the choice of initial
probability vector, the PageRank vector is [0, 0, 1], implying that the only page of any value is the
dangling node.

It does not make sense that pages without links would be the only important pages. Nor does it
make sense that a user on a page without any links would never leave that page. One solution to
this problem is to assume that any such dangling node has virtual links to all of the other pages
including itself.
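The virtual-link fix amounts to replacing every all-zero column of the transition matrix by a column whose entries are all 1/𝑛. A minimal sketch follows, assuming the matrix is stored as a list of rows with column 𝑗 describing the links out of page 𝑗. The function name `patch_dangling` and the 2-page web are hypothetical.

```python
def patch_dangling(P):
    """Return a copy of P in which every all-zero column becomes uniform 1/n."""
    n = len(P)
    patched = [row[:] for row in P]
    for j in range(n):
        if all(P[i][j] == 0 for i in range(n)):
            for i in range(n):
                patched[i][j] = 1 / n
    return patched


# Hypothetical 2-page web: page 1 links to page 2, page 2 has no links.
P = [
    [0, 0],
    [1, 0],
]
print(patch_dangling(P))  # the second column becomes 0.5, 0.5
```

Columns that already contain links are left untouched, so only the dangling pages change.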
7) Apply these virtual links to the system above to obtain a new transition matrix. What is the
PageRank vector in this situation?

IV. PageRank and Eigenvalues

Since the PageRank vector is a steady state of a Markov process we see that it must be an
eigenvector of the transition matrix corresponding to the eigenvalue 1. Although there are
infinitely many eigenvectors, we know that the PageRank vector is a probability vector (i.e. a
vector with nonnegative entries that sum to one). In order for the PageRank to be well-defined
we would need to show that it is the unique probability eigenvector corresponding to 1.
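To see why uniqueness is not automatic, consider the short computation below. The symbols 𝒖, 𝒗 and 𝑡 are introduced here only for illustration: suppose 𝒖 and 𝒗 are two different probability vectors that are both eigenvectors of 𝑃 for the eigenvalue 1, and let 𝑡 be any number between 0 and 1.

```latex
\[
P\bigl(t\mathbf{u} + (1-t)\mathbf{v}\bigr)
= tP\mathbf{u} + (1-t)P\mathbf{v}
= t\mathbf{u} + (1-t)\mathbf{v},
\qquad 0 \le t \le 1 .
\]
```

The entries of 𝑡𝒖 + (1 − 𝑡)𝒗 are nonnegative and sum to 𝑡 + (1 − 𝑡) = 1, so every such weighted average would be another probability eigenvector. A system with two distinct steady-state probability vectors therefore has infinitely many of them.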

8) Consider a system where pages 1 and 2 only point at each other and pages 3, 4 and 5 point at
each other. (This situation is described in the directed graph to the right.) Write the transition
matrix described by this system. Find a basis for the eigenspace corresponding to the eigenvalue 1.
Is the PageRank vector unique in this system? Justify your answer.

V. The Google Matrix

As we saw in section (IV) a problem occurs when we are presented with disconnected
components. If we have two unlinked page clusters and the initial state vector is only in one of
these clusters then none of the pages in the other cluster will ever be accessed. An algorithm for
dealing with this situation was developed by Larry Page and Sergey Brin while they were
graduate students at Stanford and would later be used in search engines when they founded
Google in the 1990s.

In order to overcome both the problems of dangling nodes and disconnected components they
introduced the idea that at any given moment there is a probability 𝑝 that a user might randomly
select any of the available webpages without following a link. (This probability is called the
damping factor. A typical value for 𝑝 is 0.15.) If 𝑃 is the transition matrix of the system we
define the Page Rank matrix (or Google matrix) of the system to be

𝑀 = (1 − 𝑝)𝑃 + 𝑝𝐵

where 𝐵 is the 𝑛 × 𝑛 matrix whose entries are all 1/𝑛.
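The definition translates directly into code: each entry of 𝑀 is (1 − 𝑝) times the corresponding entry of 𝑃 plus 𝑝/𝑛. The sketch below assumes 𝑃 is stored as a list of rows. The function name `google_matrix` and the 2-page web are hypothetical.

```python
def google_matrix(P, p=0.15):
    """Return M = (1 - p) * P + p * B, where every entry of B is 1/n."""
    n = len(P)
    return [[(1 - p) * P[i][j] + p / n for j in range(n)] for i in range(n)]


# Hypothetical 2-page web in which the pages link only to each other.
P = [
    [0, 1],
    [1, 0],
]
M = google_matrix(P, p=0.15)
for row in M:
    print([round(entry, 3) for entry in row])
```

Because 𝐵 adds the same amount 𝑝/𝑛 to every entry, no entry of 𝑀 is zero when 𝑝 > 0, even where 𝑃 has no link.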


9) Use the Page Rank matrix with a damping factor of 𝑝 = 0.15 to find the PageRank of the
system in (8).

10) Use the Page Rank matrix with a damping factor of 𝑝 = 0.15 to find the PageRank of the
system in (6). Do you observe the same problem that occurred in (6)? How does this new
solution compare to the solution found in (7)? Which fix do you feel more accurately reflects
what might happen? Why?

The reason that the Page Rank matrix solves our earlier problems is the Perron-Frobenius
Theorem. One of its corollaries states that if 𝑃 is a positive, column stochastic matrix (i.e. a
matrix with positive entries whose columns sum to 1), then there is a unique probability
eigenvector which corresponds to the eigenvalue 1.

11) Prove that if 𝑃 is a transition matrix and 0 < 𝑝 ≤ 1, then the Page Rank matrix is a positive,
column stochastic matrix.

12) Consider the system represented by the directed graph to the left. Compute the PageRank
vector of this system using the Google matrix and a damping factor of 𝑝 = 0.15. Interpret your
results in terms of the relationship between the number of incoming links that each node has and
its rank.

13) Use the Google Matrix to compute the PageRank vector of the directed graph to the left using
the damping factors 𝑝 = 0, 𝑝 = 0.15, 𝑝 = 0.5 and 𝑝 = 1.
