SimRank Algorithm


Social Network Analysis
LINK ANALYSIS
SimRank: Measuring Similarity of Objects
▪ SimRank measures similarity of the structural context in which objects
occur, based on their relationships with other objects.
▪ Idea: Two objects are similar if they are related to similar objects.
▪ For a given domain, SimRank can be combined with other domain-specific similarity measures.
▪ Example 1: Citation graph
▪ Two papers are similar if they are cited by similar papers.
▪ Example 2: E-commerce graph
▪ Two products are similar if they are bought by similar customers.
▪ Two customers are similar if they buy similar products.
SimRank: Measuring Similarity of Objects
▪ Idea: Two objects are similar if they are related to similar objects.
▪ More precisely, objects a and b are similar if they are related to
objects c and d, respectively, and c and d are themselves
similar.
▪ The base case is that objects are similar to themselves.
SimRank: Measuring Similarity of Objects
▪ Graph shows the Web pages of two professors ProfA
and ProfB, their students StudentA and StudentB, and
the home page of their university Univ.
▪ Edges between nodes represent hyperlinks from one
page to another.
▪ Because both are referenced (linked to) by Univ, we may infer that ProfA and ProfB are similar.
▪ Can we infer that StudentA and StudentB are also similar, based on the similarity of ProfA and ProfB?
▪ Similar inferences can be drawn for other pairs of objects.
SimRank: Measuring Similarity of Objects
▪ The SimRank computation can be represented logically with a node-pair graph G².
▪ Each node of G² represents an ordered pair of nodes of G.
▪ A node (a, b) of G² points to a node (c, d) if, in G, a points to c and b points to d.
▪ Each node of G² carries the similarity score of the two nodes of G it represents.
▪ Scores are symmetric, so (a, b) and (b, a) are drawn as a single node {a, b} with the union of their associated edges.
▪ SimRank scores are computed iteratively for every node of G².
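The node-pair construction can be sketched in a few lines. The figure of the small Web graph is not available, so the edge list below is an assumption consistent with the description. Univ links to both professors, and each professor links to their student. We additionally assume StudentA → Univ and StudentB → ProfB, which are hypothetical and not stated in the slides. The helper name `node_pair_graph` is also hypothetical. Unordered pairs are stored as `frozenset`s, so (a, b) and (b, a) collapse into one node {a, b}.

```python
from itertools import product

# Assumed edge list for the small Web graph (source page -> target page).
# The StudentA -> Univ and StudentB -> ProfB links are hypothetical.
EDGES = [
    ("Univ", "ProfA"), ("Univ", "ProfB"),
    ("ProfA", "StudentA"), ("ProfB", "StudentB"),
    ("StudentA", "Univ"), ("StudentB", "ProfB"),
]

def node_pair_graph(edges):
    """Build G^2: {a, b} -> {c, d} whenever a -> c and b -> d in G.

    Pairs are frozensets, so (a, b) and (b, a) merge into one node and
    their outgoing edges are unioned; the pair (a, a) becomes {a}.
    """
    g2 = {}
    for (a, c), (b, d) in product(edges, repeat=2):
        src, dst = frozenset((a, b)), frozenset((c, d))
        g2.setdefault(src, set()).add(dst)
    return g2

g2 = node_pair_graph(EDGES)
# {Univ} points to {ProfA, ProfB}, which points to {StudentA, StudentB}.
print(frozenset({"ProfA", "ProfB"}) in g2[frozenset({"Univ"})])                 # True
print(frozenset({"StudentA", "StudentB"}) in g2[frozenset({"ProfA", "ProfB"})])  # True
```

Iterating over every ordered pair of edges is quadratic in the number of edges. That is fine for a toy graph, but it is not how one would handle a large network.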
SimRank: Basic Formulation
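▪ For a node $v$ in the network, $I(v) = \{I_i(v) \mid 1 \le i \le |I(v)|\}$ and $O(v) = \{O_i(v) \mid 1 \le i \le |O(v)|\}$ denote the sets of in-neighbors and out-neighbors of $v$, respectively.
▪ The similarity score $s(a, b) \in [0, 1]$ is formulated as follows, where $C$ is a constant with $0 < C < 1$:
$$
s(a,b) =
\begin{cases}
1 & \text{if } a = b \\
0 & \text{if } I(a) = \emptyset \text{ or } I(b) = \emptyset \\
\dfrac{C}{|I(a)|\,|I(b)|} \displaystyle\sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big) & \text{otherwise}
\end{cases}
$$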

▪ Case 1: A node is maximally similar to itself.
▪ Case 2: If either node has no in-neighbors, there is no neighborhood from which to derive a score, so the score is 0.
▪ Case 3: Otherwise, the similarity of two distinct nodes is C times the average similarity over all pairs of their in-neighbors.
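The recursive definition can be evaluated by iteration. Start from R₀(a, b) = 1 if a = b and 0 otherwise, then repeatedly apply the three cases using the previous round's scores. The sketch below is a naive O(n²·d²)-per-round version, not an optimized implementation.

It runs on the ProfA/ProfB Web graph. Because the figure is missing, its in-neighbor sets are an assumption: Univ links to both professors and each professor links to their student, plus the hypothetical links StudentA → Univ and StudentB → ProfB. The function name `simrank` and its parameters are illustrative.

```python
def simrank(nodes, in_nbrs, C=0.8, iterations=100):
    """Naive iterative SimRank (illustrative sketch).

    R_0(a, b) = 1 if a == b else 0; each round applies the three cases
    of the definition using the scores from the previous round.
    """
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a, b] = 1.0          # case 1
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[a, b] = 0.0          # case 2: empty in-neighborhood
                else:                        # case 3: C times the average
                    total = sum(s[c, d] for c in in_nbrs[a] for d in in_nbrs[b])
                    new[a, b] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        s = new
    return s

# Assumed in-neighbor sets I(v) for the Web graph (see lead-in).
IN = {
    "Univ": ["StudentA"],
    "ProfA": ["Univ"],
    "ProfB": ["Univ", "StudentB"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}
scores = simrank(list(IN), IN)
print(round(scores["ProfA", "ProfB"], 3))        # 0.414
print(round(scores["StudentA", "StudentB"], 3))  # 0.331
```

The error after k rounds is bounded by roughly Cᵏ, so 100 rounds with C = 0.8 is far more than needed here.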
SimRank: Basic Formulation
▪ The constant C (0 < C < 1) acts as a confidence level, or decay factor.
▪ Consider a simple scenario where page x references both c and d, so
we conclude some similarity between c and d.
▪ The similarity of x with itself is 1, but we probably don’t want to
conclude that s(c, d) = s(x, x) = 1.
▪ Rather, we let s(c, d) = C × s(x, x), meaning that we are less confident about the similarity between c and d than we are about the similarity between x and itself.
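As a worked check of this scenario, assume for illustration that x is the only page linking to c and to d, so I(c) = I(d) = {x}. The third case of the formula then reduces to:

$$
s(c,d) = \frac{C}{|I(c)|\,|I(d)|} \sum_{i=1}^{1} \sum_{j=1}^{1} s\big(I_i(c), I_j(d)\big) = \frac{C}{1 \cdot 1}\, s(x,x) = C
$$

With C = 0.8, for example, s(c, d) = 0.8, which is strictly less than s(x, x) = 1.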
SimRank: Example 2 Citation Network
[Figure: citation graph over papers A through H; an arrow X → Y means "X cites Y"]
❑ Paper E cites papers C and D.
❑ Papers C and D appear similar.
❑ Paper H cites papers B and G.
❑ Papers B and G appear similar.
❑ What about the similarity of papers A and B?
❑ Let's calculate…
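SimRank: Example 2 Citation Network
[Figure: the same citation graph]
In-neighbors (decay factor 0.8):
• I(A) = {C, F, G}, I(B) = {D, H}
• s(A, B) = 0.8/(3 × 2) × (s(C, D) + s(C, H) + s(F, D) + s(F, H) + s(G, D) + s(G, H))
• s(C, D) = 0.8 × s(E, E) = 0.8, since C and D are both cited only by E; the other five terms are 0 in this graph.
• s(A, B) = 0.8/6 × (0.8 + 0 + 0 + 0 + 0 + 0) = 0.8 × 0.8/6 = 0.64/6 ≈ 0.107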
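To confirm the hand calculation, the sketch below runs full iterative SimRank on this citation graph. The figure is missing, so we assume the graph contains only the citations named in the bullets: E → C, E → D, H → B, H → G, C → A, F → A, G → A and D → B. The helper names are hypothetical.

```python
# Assumed citation edges: (X, Y) means "X cites Y", so X is in I(Y).
CITES = [("E", "C"), ("E", "D"), ("H", "B"), ("H", "G"),
         ("C", "A"), ("F", "A"), ("G", "A"), ("D", "B")]

def in_neighbors(edges):
    """Map every paper to the list of papers that cite it."""
    nodes = sorted({n for edge in edges for n in edge})
    nbrs = {n: [] for n in nodes}
    for src, dst in edges:
        nbrs[dst].append(src)
    return nbrs

def simrank(in_nbrs, C=0.8, iterations=50):
    """Naive iterative SimRank over in-neighbor lists (sketch)."""
    nodes = list(in_nbrs)
    s = {(a, b): float(a == b) for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                Ia, Ib = in_nbrs[a], in_nbrs[b]
                if a == b:
                    new[a, b] = 1.0
                elif not Ia or not Ib:
                    new[a, b] = 0.0  # a paper nobody cites gets score 0
                else:
                    total = sum(s[c, d] for c in Ia for d in Ib)
                    new[a, b] = C * total / (len(Ia) * len(Ib))
        s = new
    return s

s = simrank(in_neighbors(CITES))
print(round(s["C", "D"], 3))  # 0.8
print(round(s["A", "B"], 3))  # 0.107
print(round(s["B", "G"], 3))  # 0.4
```

Under these assumed edges the iteration reaches the slide's value exactly, because every other in-neighbor pair of A and B bottoms out at a paper with no citations. It also gives s(B, G) = 0.4, matching the intuition that B and G appear similar.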
SimRank in Heterogeneous Bipartite Network
▪ In a heterogeneous network of users and products, the similarities of products and users reinforce each other:
▪ Two users can be considered similar if they buy similar products.
▪ Two products can be considered similar if they are bought by similar users.

▪ Similarity between two distinct users can be expressed as:
$$
s(u_1, u_2) = \frac{C_1}{|O(u_1)|\,|O(u_2)|} \sum_{i=1}^{|O(u_1)|} \sum_{j=1}^{|O(u_2)|} s\big(O_i(u_1), O_j(u_2)\big)
$$
▪ Similarity between two distinct products can be expressed as:
$$
s(p_1, p_2) = \frac{C_2}{|I(p_1)|\,|I(p_2)|} \sum_{i=1}^{|I(p_1)|} \sum_{j=1}^{|I(p_2)|} s\big(I_i(p_1), I_j(p_2)\big)
$$
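The two mutually reinforcing formulas can be iterated together. Each round recomputes user scores from the previous product scores (decay C1) and product scores from the previous user scores (decay C2). This is a hedged sketch: the function name, the `buys` dictionary format and the round count are illustrative choices. The purchase data is the one used in the illustration that follows (A buys k, l, m; B buys l, m, n).

```python
def bipartite_simrank(buys, C1=0.8, C2=0.8, iterations=100):
    """Mutually reinforcing SimRank on a user -> product 'buys' graph (sketch).

    buys[u] is O(u), the products user u bought; the product in-neighbor
    sets I(p) are derived from it.
    """
    users = list(buys)
    products = sorted({p for ps in buys.values() for p in ps})
    bought_by = {p: [u for u in users if p in buys[u]] for p in products}

    su = {(a, b): float(a == b) for a in users for b in users}
    sp = {(p, q): float(p == q) for p in products for q in products}
    for _ in range(iterations):
        new_su = {}
        for a in users:
            for b in users:
                if a == b:
                    new_su[a, b] = 1.0
                elif not buys[a] or not buys[b]:
                    new_su[a, b] = 0.0
                else:  # user score from the previous product scores
                    total = sum(sp[x, y] for x in buys[a] for y in buys[b])
                    new_su[a, b] = C1 * total / (len(buys[a]) * len(buys[b]))
        new_sp = {}
        for p in products:
            for q in products:
                Ip, Iq = bought_by[p], bought_by[q]
                if p == q:
                    new_sp[p, q] = 1.0
                else:  # every product here has at least one buyer
                    total = sum(su[u, v] for u in Ip for v in Iq)
                    new_sp[p, q] = C2 * total / (len(Ip) * len(Iq))
        su, sp = new_su, new_sp
    return su, sp

su, sp = bipartite_simrank({"A": ["k", "l", "m"], "B": ["l", "m", "n"]})
print(round(su["A", "B"], 3))  # 0.547
```

Both score tables are updated from the previous round's values. This simultaneous update converges to the same fixed point as solving the equations by hand.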
Illustration: SimRank in Heterogeneous Bipartite Network
[Figure: bipartite "buys" graph linking users A and B to products k, l, m, n]
To calculate the similarity between users A and B:
$O(A) = \{k, l, m\}$ and $O(B) = \{l, m, n\}$
$I(k) = \{A\}$, $I(l) = \{A, B\}$, $I(m) = \{A, B\}$, and $I(n) = \{B\}$
$$
s(A,B) = \frac{C_1}{3 \times 3}\big(s(k,l) + s(k,m) + s(k,n) + s(l,l) + s(l,m) + s(l,n) + s(m,l) + s(m,m) + s(m,n)\big)
$$
We have $s(X,X) = 1$ and $s(X,Y) = s(Y,X)$, so
$$
s(k,l) = \frac{C_2}{1 \times 2}\big(s(A,A) + s(A,B)\big) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}
$$
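Illustration: SimRank in Heterogeneous Bipartite Network
Similarly,
$$
s(k,m) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}, \qquad s(k,n) = C_2\, s(A,B)
$$
$$
s(l,l) = 1, \qquad s(l,m) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}, \qquad s(l,n) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}
$$
$$
s(m,l) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}, \qquad s(m,m) = 1, \qquad s(m,n) = \frac{C_2}{2} + \frac{C_2\, s(A,B)}{2}
$$
These nine terms sum to $2 + 3C_2 + 4C_2\, s(A,B)$, so
$$
s(A,B) = \frac{C_1}{9}\big(2 + 3C_2 + 4C_2\, s(A,B)\big)
$$
Solving,
$$
s(A,B) = \frac{3C_1C_2 + 2C_1}{9 - 4C_1C_2}
$$
Further, setting $C_1 = C_2 = 0.8$,
$$
\mathbf{s(A,B) \approx 0.547}
$$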
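The algebra can be double-checked numerically. The sketch evaluates the closed form and compares it with plain fixed-point iteration of x = (C1/9)(2 + 3·C2 + 4·C2·x). That equation comes from adding up the nine terms listed above, whose sum is 2 + 3·C2 + 4·C2·s(A, B). The function names are hypothetical.

```python
def closed_form(C1, C2):
    """s(A, B) from the solved equation (3*C1*C2 + 2*C1) / (9 - 4*C1*C2)."""
    return (3 * C1 * C2 + 2 * C1) / (9 - 4 * C1 * C2)

def fixed_point(C1, C2, iterations=200):
    """Iterate x <- (C1/9) * (2 + 3*C2 + 4*C2*x) starting from x = 0."""
    x = 0.0
    for _ in range(iterations):
        x = C1 / 9 * (2 + 3 * C2 + 4 * C2 * x)
    return x

print(round(closed_form(0.8, 0.8), 3))                              # 0.547
print(abs(closed_form(0.8, 0.8) - fixed_point(0.8, 0.8)) < 1e-12)  # True
```

The iteration contracts by a factor of 4·C1·C2/9 ≈ 0.28 per step, so it settles on the closed-form value quickly.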
SimRank in Homogeneous Bipartite Network
Can you apply SimRank in the following application?
In a citation network, two scientific papers might be similar as
survey papers if they cite similar result papers, while two papers
might be similar as result papers if they are cited by similar survey
papers.
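One possible approach is sketched below. It is an assumption, not an answer given in the slides. Each paper gets two scores, following the pattern of the bipartite formulas:
- s_survey(a, b) averages s_result over the papers that a and b cite (out-neighbors, decay C1).
- s_result(a, b) averages s_survey over the papers that cite a and b (in-neighbors, decay C2).

The citation data (hypothetical papers S1 and S2, each citing R1 and R2) and all names are made up for illustration.

```python
def survey_result_simrank(cites, C1=0.8, C2=0.8, iterations=100):
    """Two mutually reinforcing scores on one citation graph (sketch).

    survey[a, b]: similarity as survey papers, from cited papers' result scores.
    result[a, b]: similarity as result papers, from citing papers' survey scores.
    """
    papers = sorted({p for edge in cites for p in edge})
    out = {p: [dst for src, dst in cites if src == p] for p in papers}
    inn = {p: [src for src, dst in cites if dst == p] for p in papers}

    def step(nbrs, other, C):
        new = {}
        for a in papers:
            for b in papers:
                if a == b:
                    new[a, b] = 1.0
                elif not nbrs[a] or not nbrs[b]:
                    new[a, b] = 0.0
                else:
                    total = sum(other[x, y] for x in nbrs[a] for y in nbrs[b])
                    new[a, b] = C * total / (len(nbrs[a]) * len(nbrs[b]))
        return new

    init = {(a, b): float(a == b) for a in papers for b in papers}
    survey, result = init, dict(init)
    for _ in range(iterations):
        # Both updates read the previous round's tables.
        survey, result = step(out, result, C1), step(inn, survey, C2)
    return survey, result

CITES = [("S1", "R1"), ("S1", "R2"), ("S2", "R1"), ("S2", "R2")]
survey, result = survey_result_simrank(CITES)
print(round(survey["S1", "S2"], 3))  # 0.667
print(round(result["R1", "R2"], 3))  # 0.667
```

Here S1 and S2 come out similar as surveys because they cite the same results. R1 and R2 come out similar as results because the same surveys cite them.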
References:
1) Social Network Analysis by Tanmoy Chakraborty.
2) SimRank: a measure of structural-context similarity by Glen Jeh and Jennifer Widom, ACM, 2002. Link: https://dl.acm.org/doi/abs/10.1145/775047.775126
