Link Mining
• Issues
• Content similarity is easily spammed, e.g. Alta Vista.
Links
• Authority
• A page with many in-links.
• The page may have good or authoritative content on a topic.
• Hub
• A page with many out-links.
• The page serves as an organizer of information on a topic.
• The key idea
• A good hub points to many good authorities.
• A good authority is pointed to by many good hubs.
Authorities and Hubs
• Initially every hub score is 1: Ha = Hb = Hc = Hd = 1
Authorities and Hubs example
1. Update authorities: Aa = Hb = 1; Ab = Ha = 1; Ac = Ha + Hb = 2; Ad = Ha + Hb + Hc = 3
Normalise (divide each score by the sum, 7): Aa = 0.143; Ab = 0.143; Ac = 0.286; Ad = 0.429
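The authority update in this example can be reproduced with a short Python sketch. The edge list is an assumption: it is inferred from the update equations (for instance, Ad = Ha + Hb + Hc implies that a, b and c all link to d), since the figure itself is not available here. The names `edges`, `hub`, `auth` and `auth_norm` are illustrative only.

```python
# Edge list (source, target) inferred from the example's update equations;
# treat the graph as an assumption, not as the slide's exact figure.
edges = [("b", "a"), ("a", "b"), ("a", "c"), ("b", "c"),
         ("a", "d"), ("b", "d"), ("c", "d")]
pages = ["a", "b", "c", "d"]

# Every page starts with hub score 1.
hub = {p: 1.0 for p in pages}

# Authority update: A(p) is the sum of the hub scores of the pages linking to p.
auth = {p: 0.0 for p in pages}
for src, dst in edges:
    auth[dst] += hub[src]

# Normalise so that the authority scores sum to 1.
total = sum(auth.values())
auth_norm = {p: auth[p] / total for p in pages}

for p in pages:
    print(p, auth[p], round(auth_norm[p], 3))
```

Under these assumptions the unnormalised scores are 1, 1, 2, 3 and the normalised scores round to 0.143, 0.143, 0.286, 0.429, matching the values in the example.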
• HITS works on the pages in S, and assigns every page in S an authority score
and a hub score.
• Let the number of pages in S be n.
• We again use G = (V, E) to denote the hyperlink graph of S.
• We use L to denote the adjacency matrix of the graph.
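As an illustration, the adjacency matrix L might be built as follows. This is a minimal sketch that assumes the common convention L[i][j] = 1 when page i links to page j (and 0 otherwise), and reuses the four-page graph implied by the worked example earlier; the variable names are hypothetical.

```python
# Assumed convention: L[i][j] = 1 if page i links to page j, else 0.
pages = ["a", "b", "c", "d"]
index = {p: k for k, p in enumerate(pages)}

# Edge list inferred from the worked example (an assumption).
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "a"), ("b", "c"), ("b", "d"),
         ("c", "d")]

n = len(pages)
L = [[0] * n for _ in range(n)]
for src, dst in edges:
    L[index[src]][index[dst]] = 1

# Row sums count out-links (hub-like); column sums count in-links (authority-like).
out_degree = [sum(row) for row in L]
in_degree = [sum(L[i][j] for i in range(n)) for j in range(n)]

print(L)
print("out-degree:", out_degree)
print("in-degree:", in_degree)
```

With this convention, row i of L lists the pages that i points to, and column j lists the pages that point to j.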
The HITS algorithm
• Let the authority score of the page i be a(i), and the hub score of page i be
h(i).
• The mutual reinforcing relationship of the two scores is represented as
follows:
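In the standard formulation of HITS, which agrees with the worked example earlier (each authority score is the sum of the hub scores of the pages pointing to it), the two update rules can be written as below. The matrix form assumes the convention that L_{ij} = 1 when (i, j) is an edge in E and 0 otherwise, with a and h the column vectors of authority and hub scores.

```latex
a(i) = \sum_{(j,\,i) \in E} h(j), \qquad
h(i) = \sum_{(i,\,j) \in E} a(j)

% Equivalent matrix form, assuming L_{ij} = 1 iff (i, j) \in E:
\mathbf{a} = L^{T}\mathbf{h}, \qquad \mathbf{h} = L\,\mathbf{a}
```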
The HITS algorithm
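A minimal sketch of the iteration, assuming the usual power-iteration form of HITS: start with all scores at 1, repeatedly update authorities from hubs and hubs from authorities, and normalise after each round. The function name `hits`, the fixed iteration count, and the choice to normalise by the sum (as in the worked example, rather than by Euclidean length) are assumptions made for illustration.

```python
def hits(L, iterations=50):
    """Iterate the HITS updates on adjacency matrix L (L[i][j] = 1 if i links to j).

    Returns (authority_scores, hub_scores), each normalised to sum to 1.
    """
    n = len(L)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # a(i) = sum of h(j) over all pages j that link to i
        auth = [sum(L[j][i] * hub[j] for j in range(n)) for i in range(n)]
        # h(i) = sum of a(j) over all pages j that i links to
        hub = [sum(L[i][j] * auth[j] for j in range(n)) for i in range(n)]
        # Normalise by the sum; fall back to 1.0 to avoid dividing by zero
        # on a graph with no links.
        a_sum = sum(auth) or 1.0
        h_sum = sum(hub) or 1.0
        auth = [x / a_sum for x in auth]
        hub = [x / h_sum for x in hub]
    return auth, hub


# Four-page graph a, b, c, d inferred from the worked example (an assumption).
L = [
    [0, 1, 1, 1],  # a -> b, c, d
    [1, 0, 1, 1],  # b -> a, c, d
    [0, 0, 0, 1],  # c -> d
    [0, 0, 0, 0],  # d has no out-links
]

auth, hub = hits(L)
print("authority:", [round(x, 3) for x in auth])
print("hub:", [round(x, 3) for x in hub])
```

On this graph, page d ends up with the highest authority score and pages a and b with the highest hub scores, while d, having no out-links, gets a hub score of 0. Running `hits(L, iterations=1)` reproduces the normalised authority scores of the worked example.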
How is HITS used