Lab 4-2

Uploaded by

yi Liu

Lab 4 Internet Search Engines Math 2B

I. Page Rank

Modern search engines employ ranking methods to provide the “best” results first. One of the
best-known and most influential algorithms for computing the relevance of web pages is the Page
Rank algorithm used by the Google search engine. The idea behind this algorithm is that
the importance of any web page can be judged by looking at the pages that link to it. If we create
a web page 𝑖 and include a hyperlink to the web page 𝑗, this means that we consider 𝑗 important
and relevant for our topic. If there are a lot of pages that link to 𝑗, this means that the common
belief is that page 𝑗 is important. If, on the other hand, 𝑗 has only one backlink, but that comes
from an authoritative site 𝑘, we say that 𝑘 transfers its authority to 𝑗. That is to say, 𝑘 asserts that
𝑗 is important. Whether we talk about popularity or authority, we can iteratively assign a rank to
each web page, based on the ranks of the pages that point to it. To understand this idea, let us
look at a simple example illustrated by the image below. Suppose we have a small “internet”
consisting of just 4 web sites referencing each other in the manner suggested by the image.

If we assume that someone on any given page is equally likely to click on any of the links, we can
create a transition matrix 𝑃 whose states are the 4 webpages. The columns are labeled w1 to w4,
the rows are labeled Webpage 1 to Webpage 4, and each column sums to 1:

\[
P = \begin{bmatrix}
0 & 0 & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & 0 & 1 & 0 \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\
\tfrac{1}{3} & \tfrac{1}{2} & 0 & 0
\end{bmatrix}
\]

If we begin by assuming that a user is equally likely to be on any one of the pages to begin with,
we have an initial state vector 𝒙𝟎 = [0.25, 0.25, 0.25, 0.25]^𝑇. Then if we want to know the
probability that the user is on a given webpage after 𝑘 clicks, we have the probability vector 𝑃^𝑘 𝒙𝟎.
1) Use the techniques from lab 2 to find the limit of this Markov process (round to 3 decimal
places). The result is called the PageRank vector of our system.
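
The limit in (1) can be approximated numerically by applying 𝑃 repeatedly to 𝒙𝟎. The following is a minimal sketch in Python with NumPy, assuming the matrix 𝑃 of Section I reads as written in the code. The function name power_iteration and the step count of 100 are illustrative choices, not part of the lab.

```python
import numpy as np

# Transition matrix of the 4-page example (assumed reading of the matrix in Section I).
# Column j holds the probabilities of leaving page j, so each column sums to 1.
P = np.array([
    [0,   0,   0, 1/2],
    [1/3, 0,   1, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

def power_iteration(P, x0, steps=100):
    """Return P^steps @ x0, computed by repeated multiplication."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = P @ x
    return x

x0 = np.full(4, 0.25)              # user equally likely to start on any page
pagerank = power_iteration(P, x0)
print(np.round(pagerank, 3))       # rounded to 3 decimal places, as in (1)
```

Trying a few other starting vectors (for example, all of the probability on one page) is a quick way to see whether the limit depends on 𝒙𝟎.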

2) Recall from lab 2 that steady states correspond to the case where 𝑃𝒒 = 𝒒. We now know that
this is an example of a matrix having an eigenvalue of 1. Find all of the eigenvectors of 𝑃 which
correspond to the eigenvalue 1. A probability vector is a vector whose entries are all nonnegative and
whose entries sum up to 1. Verify that the PageRank vector in (1) is the unique probability vector
which happens to be an eigenvector corresponding to the eigenvalue 1.
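
The check in (2) can also be sketched numerically. The example below assumes the same reading of 𝑃 as in Section I and uses NumPy's numpy.linalg.eig to pick out the eigenvector for the eigenvalue closest to 1, then rescales it so its entries sum to 1. The variable names are hypothetical.

```python
import numpy as np

# Assumed reading of the Section I transition matrix.
P = np.array([
    [0,   0,   0, 1/2],
    [1/3, 0,   1, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

eigenvalues, eigenvectors = np.linalg.eig(P)

# Select the eigenvalue closest to 1 and its eigenvector (a column of `eigenvectors`).
index = np.argmin(np.abs(eigenvalues - 1.0))
v = np.real(eigenvectors[:, index])

# Any nonzero multiple of v is also an eigenvector; dividing by the sum
# picks the multiple whose entries add up to 1.
q = v / v.sum()
print(np.round(q, 3))
print(np.allclose(P @ q, q))       # checks that P q = q
```

Dividing by the sum also fixes the sign, since eig may return the eigenvector with all entries negative.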

II. Slide shows (a simple example)

Consider a series of web pages which constitute a slide show. Each page has one or more of the
following buttons NEXT (go to next slide/page), PREVIOUS (go to previous page/slide), and
FIRST (go to first slide/page).

3) Suppose a slide show consists of 4 pages. The first page only contains a NEXT button. The
second page contains a NEXT and a PREVIOUS button. The third page contains a NEXT,
PREVIOUS and FIRST button. The fourth page contains a PREVIOUS and a FIRST button.
Find a transition matrix between these four pages. What is the logical initial state for this Markov
process? Find the PageRank vector for this slide show.
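
One way to avoid arithmetic slips when writing a transition matrix is to build it from a list of links. The sketch below is a hypothetical helper (transition_matrix is not a name from the lab) that assumes each button on a page is pressed with equal probability; it is applied to the button layout described in (3).

```python
import numpy as np

def transition_matrix(links, n):
    """Build a column-stochastic matrix from {page: [pages it links to]} (1-based pages)."""
    P = np.zeros((n, n))
    for source, targets in links.items():
        for target in targets:
            # each link on a page is followed with equal probability
            P[target - 1, source - 1] += 1 / len(targets)
    return P

# Buttons of the 4-page slide show in (3), written as link targets.
slide_show = {
    1: [2],        # NEXT
    2: [3, 1],     # NEXT, PREVIOUS
    3: [4, 2, 1],  # NEXT, PREVIOUS, FIRST
    4: [3, 1],     # PREVIOUS, FIRST
}

P = transition_matrix(slide_show, 4)
print(P)
print(P.sum(axis=0))   # every column should sum to 1
```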

4) Consider a 5 page slide show where the first page contains a NEXT button and the last page
contains a PREVIOUS and a FIRST button and all of the other pages contain a NEXT and
PREVIOUS button. Find the PageRank vector for this slide show.

5) Design a 5 page slide show where the third page has the highest rank. Do you believe you
have designed a slide show that will give the third page the highest possible rank? Why or why
not? Does your slide show represent an “honest” slide show? Explain.

III. Dangling Nodes

Sometimes you might end up on a webpage that has no outgoing links. We refer to this as a
dangling node. Consider the simple example of just 3 webpages. Page 1 and Page 2 only have a
link to Page 3, but Page 3 has no outgoing links. This is described by the directed graph

1 → 3 ← 2

6) Write a transition matrix for this system. Verify that regardless of the choice of initial
probability vector, the PageRank vector is [0, 0, 1], implying that the only page of any value is the
dangling node.

It does not make sense that pages without links would be the only important pages. Nor does it
make sense that a user on a page without any links would never leave that page. One solution to
this problem is to assume that any such dangling node has virtual links to all of the other pages
including itself.
7) Apply these virtual links to the system above to obtain a new transition matrix. What is the
PageRank vector in this situation?
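
The virtual-link fix can be sketched in code as follows. This is a minimal illustration, assuming a dangling node shows up as an all-zero column in the raw link matrix; the helper name add_virtual_links and the step count are hypothetical.

```python
import numpy as np

def add_virtual_links(A):
    """Replace every all-zero column (a dangling node) with a uniform column of 1/n."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    dangling = A.sum(axis=0) == 0
    A[:, dangling] = 1 / n
    return A

# Raw link matrix: pages 1 and 2 link only to page 3; page 3 has no outgoing links.
A = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
])
P = add_virtual_links(A)

# Approximate the steady state by repeated multiplication.
x = np.full(3, 1 / 3)
for _ in range(200):
    x = P @ x
print(P)
print(np.round(x, 3))
```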

IV. PageRank and Eigenvalues

Since the PageRank vector is a steady state of a Markov process we see that it must be an
eigenvector of the transition matrix corresponding to the eigenvalue 1. Although there are
infinitely many eigenvectors, we know that the PageRank vector is a probability vector (i.e. a
vector with nonnegative entries that sum to one). In order for the PageRank to be well-defined
we would need to show that it is the unique probability eigenvector corresponding to 1.

8) Consider a system where pages 1 and 2 only point at each other and pages 3, 4 and 5 point at
each other. (This situation is described in the directed graph to the right.) Write the transition
matrix described by this system. Find a basis for the eigenspace corresponding to the
eigenvalue 1. Is the PageRank vector unique in this system? Justify your answer.
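
A numerical check of the eigenspace in (8) might look like the sketch below. It assumes that "point at each other" means pages 1 and 2 link only to one another and that each of pages 3, 4 and 5 links to the other two; since the graph is not reproduced here, that reading is an assumption. The eigenspace for the eigenvalue 1 is computed as the null space of 𝑃 − 𝐼 using a singular value decomposition.

```python
import numpy as np

# Assumed transition matrix for two disconnected clusters: {1, 2} and {3, 4, 5}.
P = np.array([
    [0, 1, 0,   0,   0  ],
    [1, 0, 0,   0,   0  ],
    [0, 0, 0,   1/2, 1/2],
    [0, 0, 1/2, 0,   1/2],
    [0, 0, 1/2, 1/2, 0  ],
])

# Eigenvectors for the eigenvalue 1 are exactly the null space of (P - I).
_, singular_values, Vt = np.linalg.svd(P - np.eye(5))
null_mask = singular_values < 1e-10
basis = Vt[null_mask].T            # columns span the eigenspace

print(basis.shape[1])              # dimension of the eigenspace
print(np.allclose(P @ basis, basis))
```

A dimension greater than 1 indicates that more than one probability vector satisfies 𝑃𝒒 = 𝒒.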

V. The Google Matrix

As we saw in section (IV) a problem occurs when we are presented with disconnected
components. If we have two unlinked page clusters and the initial state vector is only in one of
these clusters then none of the pages in the other cluster will ever be accessed. An algorithm for
dealing with this situation was developed by Larry Page and Sergey Brin while they were
graduate students at Stanford and would later be used in search engines when they founded
Google in the 1990s.

In order to overcome both the problems of dangling nodes and disconnected components they
introduced the idea that at any given moment there is a probability 𝑝 that a user might randomly
select any of the available webpages without following a link. (This probability is called the
damping factor. A typical value for 𝑝 is 0.15.) If 𝑃 is the transition matrix of the system we
define the Page Rank matrix (or Google matrix) of the system to be

𝑀 = (1 − 𝑝)𝑃 + 𝑝𝐵

where 𝐵 is the 𝑛 × 𝑛 matrix whose entries are all 1/𝑛.
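
The definition of 𝑀 translates directly into code. The following sketch uses a hypothetical helper named google_matrix and, purely for illustration, applies it to the 4-page matrix of Section I (assuming that matrix reads as written in the code).

```python
import numpy as np

def google_matrix(P, p=0.15):
    """Return M = (1 - p) P + p B, where B is the n x n matrix with all entries 1/n."""
    n = P.shape[0]
    B = np.full((n, n), 1 / n)
    return (1 - p) * P + p * B

# Assumed reading of the Section I transition matrix.
P = np.array([
    [0,   0,   0, 1/2],
    [1/3, 0,   1, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

M = google_matrix(P, p=0.15)
print(M)
print(M.sum(axis=0))   # column sums
print(M.min())         # smallest entry
```

Entries of 𝑃 that were 0 become 𝑝/𝑛 in 𝑀, which is 0.0375 here.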

9) Use the Page Rank matrix with a damping factor of 𝑝 = 0.15 to find the PageRank of the
system in (8).

10) Use the Page Rank matrix with a damping factor of 𝑝 = 0.15 to find the PageRank of the
system in (6). Do you observe the same problem that occurred in (6)? How does this new
solution compare to the solution found in (7)? Which fix do you feel more accurately reflects
what might happen? Why?

The Page Rank matrix solves our earlier problems because of the Perron-Frobenius
Theorem. One of its corollaries states that if 𝑃 is a positive, column stochastic matrix (i.e. a
matrix with positive entries whose columns each sum to 1), then there is a unique probability
eigenvector which corresponds to the eigenvalue 1.

11) Prove that if 𝑃 is a transition matrix then the Page Rank matrix is a positive, column
stochastic matrix.

12) Consider the system represented by the directed graph to the left. Compute the PageRank
vector of this system using the Google matrix and a damping factor of 𝑝 = 0.15. Interpret your
results in terms of the relationship between the number of incoming links that each node has and
its rank.

13) Use the Google Matrix to compute the PageRank vector of the directed graph to the left using
the damping factors 𝑝 = 0, 𝑝 = 0.15, 𝑝 = 0.5, and 𝑝 = 1.
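
The effect of the damping factor can be explored with a short loop. Because the graph for (13) is not reproduced here, the sketch below uses the 4-page matrix of Section I instead (assuming it reads as written in the code); the function name pagerank and the step count are hypothetical choices.

```python
import numpy as np

# Assumed reading of the Section I transition matrix, used as a stand-in example.
P = np.array([
    [0,   0,   0, 1/2],
    [1/3, 0,   1, 0  ],
    [1/3, 1/2, 0, 1/2],
    [1/3, 1/2, 0, 0  ],
])

def pagerank(P, p, steps=200):
    """Approximate the steady state of M = (1 - p) P + p B by repeated multiplication."""
    n = P.shape[0]
    M = (1 - p) * P + p * np.full((n, n), 1 / n)
    x = np.full(n, 1 / n)
    for _ in range(steps):
        x = M @ x
    return x

results = {p: pagerank(P, p) for p in (0, 0.15, 0.5, 1)}
for p, q in results.items():
    print(p, np.round(q, 3))
```

With 𝑝 = 0 the Google matrix is just 𝑃, and with 𝑝 = 1 it is 𝐵, so the two extremes show how much of the ranking comes from the link structure and how much from random jumps.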
