
Eigenvector Centrality and HITS Algorithm


Eigenvector Centrality and Hyperlink-Induced Topic Search (HITS)
Eigenvector Centrality: Revisited
❑The eigenvector centrality $x_v$ of a node $v$ in a network $G(V, E)$ is given by

$$x_v = \frac{1}{\lambda} \sum_{t \in N(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} a_{vt}\, x_t$$

where $N(v)$ is the set of neighbours of $v$ and $\lambda$ is the largest eigenvalue of the matrix $A = (a_{ij})$, the adjacency matrix of the network $G$

❑The largest eigenvalue $\lambda$ is obtained by solving the eigenvalue equation

$$A X = \lambda X$$

❑$X$ above is a column vector whose $v$-th entry is $x_v$, the eigenvector centrality of the node $v$
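The eigenvalue equation above can be solved approximately by power iteration. The following is a minimal sketch, not a definitive implementation: the function name `eigenvector_centrality`, the max-norm scaling, the stopping tolerance, and the four-node example graph are all assumptions made for illustration. Power iteration is assumed to converge here because the example graph is connected and contains a triangle.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Approximate the largest eigenvalue and its eigenvector by power iteration.

    adj is a square adjacency matrix given as a list of lists (assumed format).
    """
    n = len(adj)
    x = [1.0] * n  # start from equal scores
    lam = 0.0
    for _ in range(iterations):
        # y = A x: each node sums the current scores of its neighbours
        y = [sum(adj[v][t] * x[t] for t in range(n)) for v in range(n)]
        lam = max(abs(val) for val in y)  # eigenvalue estimate (max-norm scaling)
        if lam == 0:
            return lam, y  # graph without edges: all scores are zero
        y = [val / lam for val in y]
        converged = max(abs(p - q) for p, q in zip(x, y)) < tol
        x = y
        if converged:
            break
    return lam, x


# Hypothetical example: triangle 0-1-2 with an extra node 3 attached to node 2
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, x = eigenvector_centrality(A)
print(lam, x)
```

In this example, node 2 has the most neighbours and receives the highest score, while the pendant node 3 receives the lowest.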
Hyperlink-Induced Topic Search (HITS)
❑Based on the concept of Hub nodes and Authority nodes.
❑In response to a query, instead of an ordered list of pages each meeting the
query, find two sets of inter-related pages:

❑Hub pages are good lists of links on a subject.


❑Authority pages occur recurrently on good hubs for the subject.

❑Thus, a good hub page for a topic points to many authoritative pages for that
topic.

❑A good authority page for a topic is pointed to by many good hubs for that topic.
[Figure: hubs Alice and Bob link to the authorities AT&T, ITIM, and O2 (topic: mobile telecom companies)]


How to compute hub and authority scores
❑ Do a regular web search first

❑ Call the search result the root set

❑ Find all pages that are linked to or linked from pages in the root set

❑ Call this larger set the base set

❑ Finally, compute hubs and authorities for the base set (which we’ll view as a
small web graph)
• Root set typically has 200–1000 nodes.
• Base set may have up to 5000 nodes.
[Figure: the root set nested inside the larger base set]
▪ Given a broad search query, q, HITS collects a set of pages as
follows:
▪ It sends the query q to a search engine.
▪ It then collects t (t = 200 is used in the HITS paper) highest ranked
pages. This set is called the root set W.
▪ It then grows W by including any page pointed to by a page in W and
any page that points to a page in W. This gives a larger set S, called the
base set.
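The expansion from root set to base set can be sketched as follows. This is a minimal illustration under stated assumptions: the search-engine step is skipped and the root set is given directly, and the names `build_base_set`, `out_links`, `in_links`, and the toy pages `p1` to `p6` are hypothetical.

```python
def build_base_set(root_set, out_links, in_links):
    """Grow the root set W into the base set S by one hop in each direction."""
    base = set(root_set)
    for page in root_set:
        base.update(out_links.get(page, ()))  # pages that `page` points to
        base.update(in_links.get(page, ()))   # pages that point to `page`
    return base


# Hypothetical toy link graph, given as (source, destination) pairs
edges = [("p1", "p2"), ("p3", "p1"), ("p2", "p4"), ("p5", "p6")]
out_links, in_links = {}, {}
for src, dst in edges:
    out_links.setdefault(src, set()).add(dst)
    in_links.setdefault(dst, set()).add(src)

root = {"p1"}  # stands in for the t highest-ranked search results
base = build_base_set(root, out_links, in_links)
print(sorted(base))
```

Page `p4` is two hops from the root set, so it stays outside the base set, as do the unrelated pages `p5` and `p6`.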
▪ HITS works on the pages in S, and assigns every page in S an
authority score and a hub score.
▪ Let the number of pages in S be n.
▪ We use G = (V, E) to denote the hyperlink graph of S.
▪ We use L to denote the adjacency matrix of the graph.
Let the authority score of page i be a(i), and the hub score of page i be h(i).
The mutually reinforcing relationship between the two scores is represented as
follows:

$$a(i) = \sum_{(j,i) \in E} h(j)$$

$$h(i) = \sum_{(i,j) \in E} a(j)$$
We use a to denote the column vector with all the authority scores,
$a = [a(1), a(2), \ldots, a(n)]^T$, and
use h to denote the column vector with all the hub scores,
$h = [h(1), h(2), \ldots, h(n)]^T$.
Then,
$$a = L^T h$$
$$h = L a$$
▪ Authority and hub scores are computed by power iteration, the same method used to compute
PageRank scores.

▪ If we use $a_k$ and $h_k$ to denote the authority and hub vectors at the kth iteration, substituting
$h = La$ and $a = L^T h$ into each other gives the iterations for generating the final solutions:

$$a_k = L^T L\, a_{k-1}$$
$$h_k = L L^T h_{k-1}$$
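A minimal sketch of the HITS power iteration is given below. It applies the updates a = Lᵀh and h = La in turn. The function name `hits`, the all-ones starting vectors, the normalisation of each vector to sum to 1, the fixed iteration count, and the five-page example graph are assumptions made for illustration and are not taken from the slides.

```python
def hits(L, iterations=50):
    """Compute authority and hub scores for adjacency matrix L (L[i][j] = 1 if i links to j)."""
    n = len(L)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iterations):
        # a = L^T h: a page's authority is the sum of the hub scores of pages linking to it
        a = [sum(L[j][i] * h[j] for j in range(n)) for i in range(n)]
        # h = L a: a page's hub score is the sum of the authority scores of pages it links to
        h = [sum(L[i][j] * a[j] for j in range(n)) for i in range(n)]
        # Normalise so each score vector sums to 1 (assumed convention; keeps values bounded)
        total_a = sum(a) or 1.0
        total_h = sum(h) or 1.0
        a = [val / total_a for val in a]
        h = [val / total_h for val in h]
    return a, h


# Hypothetical example: pages 0 and 1 are hubs; pages 2, 3, 4 are authorities.
# Page 0 links to 2 and 3; page 1 links to 3 and 4.
L = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
authority, hub = hits(L)
print(authority)
print(hub)
```

Page 3 is pointed to by both hubs, so it ends up with the highest authority score; the two hubs are symmetric and share the hub score equally.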
Exercise: Compute hub and authority scores for the graph below
[Figure: exercise graph]
Co-citation and Bibliographic Coupling
Another area of research concerned with links is citation analysis of scholarly
publications.
◦ A scholarly publication cites related prior work to acknowledge the origins of some ideas
and to compare the new proposal with existing work.

When a paper cites another paper, a relationship is established between the publications.
◦ Citation analysis uses these relationships (links) to perform various types of analysis.

We discuss two types of citation analysis, co-citation and bibliographic coupling. The HITS
algorithm is related to these two types of analysis.
Co-citation
If papers i and j are both cited by paper k, then they may be related in some sense to one another.

The more papers cite both i and j, the stronger their relationship is.
Bibliographic coupling
Bibliographic coupling operates on a similar principle.
Bibliographic coupling links papers that cite the same articles
◦ if papers i and j both cite paper k, they may be related.
The more papers they both cite, the stronger their similarity is.
Relationships with co-citation and bibliographic coupling
Co-citation of pages i and j, denoted by $C_{ij}$, is

$$C_{ij} = \sum_{k=1}^{n} L_{ki} L_{kj} = (L^T L)_{ij}$$

The authority matrix ($L^T L$) of HITS is the co-citation matrix C

Bibliographic coupling of two pages i and j, denoted by $B_{ij}$, is

$$B_{ij} = \sum_{k=1}^{n} L_{ik} L_{jk} = (L L^T)_{ij}$$

The hub matrix ($L L^T$) of HITS is the bibliographic coupling matrix B
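Both matrices can be computed directly from the sums above. The sketch below is illustrative only: the function names `cocitation` and `bibliographic_coupling` and the four-paper citation graph are hypothetical, and L[i][j] = 1 is assumed to mean that paper i cites paper j.

```python
def cocitation(L):
    """C[i][j] = sum_k L[k][i] * L[k][j], i.e. (L^T L)[i][j]."""
    n = len(L)
    return [[sum(L[k][i] * L[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bibliographic_coupling(L):
    """B[i][j] = sum_k L[i][k] * L[j][k], i.e. (L L^T)[i][j]."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical example: papers 0 and 1 each cite papers 2 and 3
L = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
C = cocitation(L)
B = bibliographic_coupling(L)
print(C[2][3])  # number of papers that cite both 2 and 3
print(B[0][1])  # number of papers cited by both 0 and 1
```

Papers 2 and 3 are co-cited by two papers, and papers 0 and 1 share two references, so both printed values are 2. The diagonal entries count a paper's citations received (in C) or references made (in B).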
Strengths and weaknesses of HITS
Strength: its ability to rank pages according to the query topic, which
may be able to provide more relevant authority and hub pages.
Weaknesses:
◦ It is easily spammed. It is in fact quite easy to influence HITS, since adding out-links
to one’s own page is so easy.
◦ Topic drift. Many pages in the expanded set may not be on topic.
◦ Inefficiency at query time: query-time evaluation is slow. Collecting the root
set, expanding it, and performing the eigenvector computation are all expensive
operations.
Katz Centrality
❑An extension of eigenvector centrality

❑Can be used to compute centrality in directed networks such as citation networks and the World
Wide Web

❑Mostly suitable in the analysis of directed acyclic graphs

❑Computes the relative influence of a node in a network by considering all immediate neighbors
and all further nodes connected to the node

❑Connections with distant neighbors are, however, penalized by an attenuation factor


Katz Centrality: Attenuation Factor
❑ Let us consider the influence of Jose in the network,
and let the attenuation factor be $\alpha$, $0 < \alpha < 1$

❑ The immediate neighbours of Jose are Diego, Aziz, Bob, Priya, and Sri. The influence of
these neighbours on Jose is attenuated by a factor of $\alpha$

❑ The second-order neighbours of Jose are Agneta, John, Samantha, and Kim. The influence of
these neighbours on Jose is attenuated by a factor of $\alpha^2$

❑ The (only) third-order neighbour of Jose is Jane. The influence of this neighbour on Jose
is attenuated by a factor of $\alpha^3$

Figure source: https://www.geeksforgeeks.org/katz-centrality-centrality-measure/
Katz Centrality
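❑The Katz centrality of a node $v_i$ in a network $G(V, E)$, denoted $C_{Katz}(i)$, is
defined as

$$C_{Katz}(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{|V|} \alpha^k (A^k)_{ji}$$

where $A$ is the adjacency matrix of $G$

❑Entry $(A^k)_{ji}$ of the matrix $A^k$ counts the paths (walks) of length $k$ between the
node-pair $v_j$, $v_i$, so it is non-zero exactly when such a path exists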
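The double sum in the Katz definition can be evaluated term by term. The sketch below is a minimal illustration under stated assumptions: the infinite sum is truncated after `max_k` terms, A[j][i] = 1 is assumed to mean an edge from node j to node i, and the function name `katz_centrality` and the three-node example are hypothetical. For graphs with cycles the series only converges when the attenuation factor is smaller than the reciprocal of the largest eigenvalue of A; for a directed acyclic graph, as in the example, the sum is finite.

```python
def katz_centrality(A, alpha, max_k=100):
    """Truncated Katz centrality: sum over k = 1..max_k of alpha^k * (number of length-k walks ending at i)."""
    n = len(A)
    walks = [1.0] * n   # walks[i]: number of walks of the current length that end at node i
    scores = [0.0] * n
    weight = 1.0
    for _ in range(max_k):
        # Extend every walk by one edge: new walks[i] = sum_j walks[j] * A[j][i]
        walks = [sum(walks[j] * A[j][i] for j in range(n)) for i in range(n)]
        weight *= alpha  # weight is now alpha^k for the current walk length k
        scores = [s + weight * w for s, w in zip(scores, walks)]
    return scores


# Hypothetical example: directed path 0 -> 1 -> 2
A = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
scores = katz_centrality(A, alpha=0.5)
print(scores)
```

Node 2 receives a contribution of 0.5 from its immediate neighbour (node 1) and 0.25 from its second-order neighbour (node 0), matching the attenuation scheme described for Jose.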
Slide Credits and Reference Material:
1) Social Network Analysis by Tanmoy Chakraborty
2) Slides by: CS583, Bing Liu, UIC
3) Slides by: CS276, Information Retrieval and Web Search, Chris Manning and Pandu Nayak
