Page Ranking Techniques: Seminar

The document discusses PageRank, the algorithm used by Google to rank the importance of web pages. It defines PageRank as a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any page. The PageRank of a page is calculated based on the PageRank of other pages linking to it, with pages that have more important inbound links having a higher PageRank. Examples are provided to demonstrate how the PageRank of pages changes based on their link structure. Finally, ways to artificially increase a page's PageRank through unethical techniques like spamming are discussed.

Uploaded by

Chandan Kumar
Copyright
© Attribution Non-Commercial (BY-NC)
Seminar on Page Ranking Techniques in Search Engines

Chandan Kumar Sethy
Regd. no.: 0701211135
Roll no.: 107428
Branch: CSE

Introduction
Need
The need for search engines is increasing.
Search results should be ordered by
Relevance.
Importance.

What is Page Ranking?

Algorithms
HITS (Hyperlink-Induced Topic Search)
e.g. AltaVista

PageRank
e.g. Google.
Definition – PageRank.
We assume page A has pages T1...Tn which point to it
(i.e., are citations). The parameter d is a damping
factor, which can be set between 0 and 1. We usually set
d to 0.85. ... C(A) is defined as the number of links
going out of page A. The PageRank of a page A is given
as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Ref: Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale
Hypertextual Web Search Engine"
http://www-db.stanford.edu/~backrub/google.html
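As a minimal sketch, the formula can be applied once to every page of a small link graph. The function name pagerank_step, the dictionary layout, and the three-page graph are illustrative assumptions, not part of the original slides; links[T] lists the pages that T links to, so C(T) is len(links[T]).

```python
def pagerank_step(links, pr, d=0.85):
    """Apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once per page."""
    new_pr = {}
    for page in links:
        # Sum PR(T)/C(T) over every page T that links to `page`.
        incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
        new_pr[page] = (1 - d) + d * incoming
    return new_pr


# Hypothetical graph: A links to B and C; B and C each link back to A.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
start = {page: 1.0 for page in links}   # every page starts at PR = 1.0
result = pagerank_step(links, start)
print(result)
```

This performs a single pass only; repeating the call with the previous result as input gives successive iterations.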
How to use the formula.
e.g. 2 pages, A and B, pointing to each other.

A <-> B

Start with PR(A) = PR(B) = 1

PR(A) = (1-d) + d * (PR(B)/C(B))
      = (1-0.85) + 0.85 * (1/1)
      = 1
PR(B) = (1-d) + d * (PR(A)/C(A))
      = (1-0.85) + 0.85 * (1/1)
      = 1
Let's start with PR(A) = PR(B) = 10
After 1st iteration:
PR(A) = (1-d) + d*(PR(B)/C(B))
      = 0.15 + 0.85 * (10/1)
      = 8.65
PR(B) = (1-d) + d*(PR(A)/C(A))
      = 0.15 + 0.85 * (8.65/1)
      = 7.5025
After 2nd iteration:
PR(A) = (1-d) + d*(PR(B)/C(B))
      = 0.15 + 0.85 * (7.5025/1)
      = 6.527
PR(B) = (1-d) + d*(PR(A)/C(A))
      = 0.15 + 0.85 * (6.527/1)
      = 5.698
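The two iterations above can be reproduced with a short sketch. Note that PR(B) is computed from the PR(A) value just updated in the same iteration, which is how the worked numbers are obtained; the variable names are illustrative.

```python
d = 0.85
pr_a = pr_b = 10.0            # starting guess used in the worked example
history = []

for iteration in range(1, 3):
    pr_a = (1 - d) + d * (pr_b / 1)   # C(B) = 1: B has one outgoing link
    pr_b = (1 - d) + d * (pr_a / 1)   # uses the PR(A) computed on the line above
    history.append((pr_a, pr_b))
    print(f"iteration {iteration}: PR(A) = {pr_a:.3f}, PR(B) = {pr_b:.3f}")
```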
And so on... until when?
Ans: Iterations should be repeated until the PR values converge.

In this example, until PR(A) = PR(B) = 1.

Thus we can start with any PR values and repeat the iterations
until the values converge, i.e. they no longer change
appreciably from one iteration to the next.
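One way to make "converged" concrete is to stop when the largest change in any PR value falls below a small tolerance. The sketch below does this for the two-page example; the function name, the tolerance tol, and the iteration cap max_iter are assumptions chosen for illustration.

```python
def iterate_until_stable(pr_a, pr_b, d=0.85, tol=1e-6, max_iter=1000):
    """Repeat the two-page update until neither value changes by more than tol."""
    for n in range(1, max_iter + 1):
        new_a = (1 - d) + d * pr_b     # A is linked from B, and C(B) = 1
        new_b = (1 - d) + d * new_a    # B is linked from A, and C(A) = 1
        change = max(abs(new_a - pr_a), abs(new_b - pr_b))
        pr_a, pr_b = new_a, new_b
        if change < tol:
            return pr_a, pr_b, n
    return pr_a, pr_b, max_iter


# Different starting values should settle near PR(A) = PR(B) = 1.
for start in (10.0, 1.0, 0.0):
    a, b, n = iterate_until_stable(start, start)
    print(f"start {start}: PR(A) = {a:.4f}, PR(B) = {b:.4f} after {n} iterations")
```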
Difference…
Result of PR calculation.
Google toolbar values
Examples
Assumption: We'll take the initial PR value of each page as 1.0.
Example 1

A   B   (neither page has an inbound link)

PR(A) = (1-d) + d (0)
      = 0.15
PR(B) = (1-d) + d (0)
      = 0.15

To practise examples on PageRank, use the calculator:
www.webworkshop.net/pagerank_calculator.php?lnks=2,10,15&iblprs=0.15,0.15,0.15,0.15&pgnms=&pgs=2&initpr=1&its=100&type=simple
Example 2

A <- B   (B links to A)

PR(A) = (1-d) + d (PR(B) / C(B))
      = 0.15 + 0.85 (1/1)
      = 1   (first iteration, using the initial PR(B) = 1)
PR(B) = (1-d) + d (0)
      = 0.15

Dangling links are links that go to pages that don't have any outbound links.
Orphan pages are pages that don't have any inbound links.
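The two definitions can be checked mechanically on the link structure of Example 2, where B links to A and A links nowhere. This is a small sketch; the dictionary layout and variable names are illustrative assumptions.

```python
# Example 2 as a link graph: each key maps to the pages it links to.
links = {"A": [], "B": ["A"]}

# A dangling link points at a page that has no outbound links.
dangling_links = [
    (src, dst)
    for src, targets in links.items()
    for dst in targets
    if not links[dst]
]

# An orphan page is a page that no other page links to.
linked_to = {dst for targets in links.values() for dst in targets}
orphan_pages = [page for page in links if page not in linked_to]

print("dangling links:", dangling_links)
print("orphan pages:", orphan_pages)
```

Under these definitions the link from B to A is dangling, and B is an orphan page.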
Example 3
From here onwards, each page is shown with its final PR value,
reached after a sufficient number of iterations.

[Figure: two link diagrams over pages A, B, and C. In both, the final values are A = 1.0, B = 1.0, C = 1.0.]
Example 4

[Figure: link diagram over pages A, B, and C with values A = 1.85, B = 0.575, C = 0.575.]

Observation: We can channel a large proportion of a site's PR
to a particular page.
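Example 5

[Figure: first diagram, with pages A, B, C, External Site 1, and External Site 2. External Site 1 = 1.0, A = 1.0, B = 0.575; the values 0.638 and 0.575 appear beside C and External Site 2.]

[Figure: second diagram, with the same pages. External Site 1 = 1.0, A = 2.6, B = 1.255, C = 1.255, External Site 2 = 1.215.]

Observation: We can reduce PR leak by strengthening the internal link
structure.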
Example 5 Cont..

[Figure: External Site 1 = 1.0, A = 2.146, B = 1.549, C = 1.720, External Site 2 = 1.215.]
How to increase PR?

By adding spam pages.
By joining forums.
By submitting to search engine directories.
By exchanging reciprocal links.
By adding content.
Adding spam pages.

[Figure: pages A and B with spam pages Spam 1, Spam 2, ..., Spam 1000. Final values: A = 331.0, B = 281.6, each spam page = 0.39.]
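The effect of the spam pages can be sketched numerically. The slide's link arrows did not survive, so the structure below is an assumption, chosen because it yields values close to the figures shown (A about 331, B about 281.6, each spam page about 0.39): A links to B, B links to A and to every spam page, and every spam page links back to A. The function name and tolerance are also illustrative.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=10_000):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) until the values stop changing."""
    # Precompute, for each page, the list of pages that link to it.
    inbound = {page: [] for page in links}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    pr = {page: 1.0 for page in links}   # initial PR of 1.0 per page
    for _ in range(max_iter):
        new_pr = {
            page: (1 - d) + d * sum(pr[t] / len(links[t]) for t in inbound[page])
            for page in links
        }
        if max(abs(new_pr[p] - pr[p]) for p in links) < tol:
            return new_pr
        pr = new_pr
    return pr


# ASSUMED structure (not recoverable from the slide): 1000 spam pages feeding A.
spam = [f"Spam {i}" for i in range(1, 1001)]
links = {"A": ["B"], "B": ["A"] + spam}
links.update({s: ["A"] for s in spam})

pr = pagerank(links)
print(f"A = {pr['A']:.1f}, B = {pr['B']:.1f}, Spam 1 = {pr['Spam 1']:.2f}")
```

With this layout almost all of the PR contributed by the 1000 spam pages is funnelled into A and then passed on to B.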
Conclusion.
Even though the formula for calculating PageRank looks
difficult, it is easy to understand. But when a simple calculation is
applied hundreds of times, the results can seem complicated, and the
outcome of these iterations is hard to predict by inspection. More
practice yields more observations.
PageRank is an important factor in Google ranking, but
it is only one of several. For example, Google nowadays
pays a lot of attention to a link's anchor text when
deciding the relevancy of the target page.
Because PageRank remains one of the important factors, one should
be well aware of it while designing a website.
References.
http://www.webworkshop.net/pagerank.html
http://www.iprcom.com/papers/pagerank/
http://www-db.stanford.edu/~backrub/google.html
http://www.google.com/intl/en/technology/
http://www.google-watch.org/pagerank.html
?
Thanks
