SimRank Algorithm
Network Analysis
LINK ANALYSIS
SimRank: Measuring Similarity of Objects
▪ SimRank measures similarity of the structural context in which objects
occur, based on their relationships with other objects.
▪ Idea: Two objects are similar if they are related to similar objects
▪ For a given domain, SimRank can be combined with other domain-specific
similarity measures.
▪ Example 1: Citation graph
▪ Two papers are similar if they are cited by similar papers
▪ Example 2: E-commerce graph
▪ Two products are similar if they are bought by similar customers
▪ Two customers are similar if they buy similar products
▪ Idea: Two objects are similar if they are related to similar objects
▪ More precisely, objects a and b are similar if they are related to
objects c and d, respectively, and c and d are themselves
similar.
▪ The base case is that objects are similar to themselves.
▪ The graph shows the web pages of two professors, ProfA and ProfB, their students, StudentA and StudentB, and the home page of their university, Univ.
▪ Edges between nodes represent hyperlinks from one page to another.
▪ From the fact that both are referenced (linked to) by Univ, we may infer that ProfA and ProfB are similar.
▪ Can we infer that StudentA and StudentB are also similar, based on the similarity of ProfA and ProfB?
▪ Similar inferences can be derived for other pairs of objects.
▪ Logical representation of the SimRank computation using a node-pair graph
▪ A new graph G² is formed in which each node represents an ordered pair of nodes of G.
▪ A node (a, b) of G² points to a node (c, d) if, in G, a points to c and b points to d.
▪ Each node pair carries the similarity score between the two nodes it represents
▪ Scores are symmetric
▪ Draw (a, b) and (b, a) as a single node {a, b} (with the union of their associated edges).
▪ Iterative computation of SimRank scores for each node in G²
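The node-pair construction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `node_pair_graph` is hypothetical, and the edge list is an assumption. Only the two links from Univ to the professors are stated on the slide; the professor-to-student links are added here for illustration.

```python
from itertools import product


def node_pair_graph(edges):
    """Build the node-pair graph: (a, b) -> (c, d) iff a -> c and b -> d in G."""
    nodes = sorted({n for edge in edges for n in edge})
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)

    pair_graph = {}
    for a, b in product(nodes, repeat=2):  # every ordered pair of nodes of G
        pair_graph[(a, b)] = [(c, d) for c in successors[a] for d in successors[b]]
    return pair_graph


# Assumed edges: Univ -> professors is from the slide text,
# professor -> student is a guess at the figure.
edges = [
    ("Univ", "ProfA"),
    ("Univ", "ProfB"),
    ("ProfA", "StudentA"),
    ("ProfB", "StudentB"),
]
g2 = node_pair_graph(edges)
print(g2[("Univ", "Univ")])
print(g2[("ProfA", "ProfB")])
```

Under these assumed edges, (Univ, Univ) points to (ProfA, ProfB), which in turn points to (StudentA, StudentB): similarity flows along the edges of the node-pair graph.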
SimRank: Basic Formulation
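▪ For a node $v$ in the network, $I(v) = \{I_i(v) \mid 1 \le i \le |I(v)|\}$ and $O(v) = \{O_i(v) \mid 1 \le i \le |O(v)|\}$ denote the sets of in-neighbours and out-neighbours, respectively.
▪ Formulate the similarity score $s(a, b) \in [0, 1]$ as follows, where $C \in (0, 1)$ is a constant decay factor:
\[
s(a, b) =
\begin{cases}
1 & \text{if } a = b \\
0 & \text{if } I(a) = \emptyset \text{ or } I(b) = \emptyset \\
\dfrac{C}{|I(a)|\,|I(b)|} \displaystyle\sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big) & \text{otherwise}
\end{cases}
\]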
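The basic formulation can be evaluated by fixed-point iteration: start from s(a, a) = 1 and s(a, b) = 0 otherwise, then apply the formula repeatedly. The sketch below assumes a graph given as a list of (source, target) edges; the function name `simrank`, its parameters, and the three-node test graph are illustrative choices, not part of the slides.

```python
def simrank(edges, c=0.8, iterations=10):
    """Iteratively evaluate the basic SimRank formula on a directed graph."""
    nodes = sorted({n for edge in edges for n in edge})
    in_neighbors = {n: [] for n in nodes}
    for src, dst in edges:
        in_neighbors[dst].append(src)

    # Iteration 0: every node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    updated[(a, b)] = 1.0
                elif not ia or not ib:
                    updated[(a, b)] = 0.0
                else:
                    total = sum(sim[(i, j)] for i in ia for j in ib)
                    updated[(a, b)] = c * total / (len(ia) * len(ib))
        sim = updated  # all scores are updated from the previous iteration
    return sim


# Hypothetical graph: u links to both a and b.
scores = simrank([("u", "a"), ("u", "b")])
print(scores[("a", "b")])
```

Here a and b share their only in-neighbour u, so the formula gives s(a, b) = C · s(u, u) = C. This version stores a score for every ordered pair, which is quadratic in the number of nodes and only practical for small graphs.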
[Figure: citation network over papers A to H; an edge means "cites"]
❑ Paper E cites papers C and D
❑ Papers C and D appear similar
❑ Paper H cites papers B and G
❑ Papers B and G appear similar
❑ What about the similarity of papers A and B?
❑ Let's calculate…
SimRank: Example 2 Citation Network
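[Figure: citation network over papers A to H; an edge means "cites"]
In-neighbors:
• I(A) = {C, F, G}, I(B) = {D, H}
• With C = 0.8 and |I(A)| · |I(B)| = 3 × 2 = 6:
• s(A, B) = (0.8/6) (s(C,D) + s(C,H) + s(F,D) + s(F,H) + s(G,D) + s(G,H))
• s(A, B) = (0.8/6) (0.8 + 0 + 0 + 0 + 0 + 0) = 0.8 × 0.8/6 = 0.64/6 ≈ 0.107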
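The hand calculation can be checked by evaluating the recursive definition directly. The sketch below assumes the citation graph is acyclic, so the recursion terminates; on graphs with cycles an iterative scheme would be needed instead. The edge list is reconstructed from the example text and the in-neighbour sets, and it assumes that no other citations exist in the figure.

```python
from functools import lru_cache

C = 0.8  # decay factor used in the worked example

# (citing, cited) pairs. E->C, E->D, H->B, H->G come from the example text;
# the remaining edges follow from I(A) = {C, F, G} and I(B) = {D, H}.
CITES = [
    ("E", "C"), ("E", "D"),
    ("H", "B"), ("H", "G"),
    ("C", "A"), ("F", "A"), ("G", "A"),
    ("D", "B"),
]

IN_NEIGHBORS = {}
for citing, cited in CITES:
    IN_NEIGHBORS.setdefault(cited, []).append(citing)


@lru_cache(maxsize=None)
def s(a, b):
    """SimRank score by direct recursion (valid only on an acyclic graph)."""
    if a == b:
        return 1.0
    ia, ib = IN_NEIGHBORS.get(a, []), IN_NEIGHBORS.get(b, [])
    if not ia or not ib:
        return 0.0
    total = sum(s(i, j) for i in ia for j in ib)
    return C * total / (len(ia) * len(ib))


score_ab = s("A", "B")
print(round(score_ab, 3))  # the slide's hand calculation gives about 0.107
```

Only the pair (C, D) contributes to the sum, because E cites both C and D, while every other pair includes a paper that nobody cites or whose citers are unrelated.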
SimRank in Heterogeneous Bipartite Network
▪ In a heterogeneous network of users and products, the similarities of products and of users are mutually reinforcing
▪ two users can be considered similar if they buy similar products
▪ two products can be considered similar if they are bought by similar users
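[Figure: bipartite graph linking users A and B to products k, l, m and n]
$O(A) = \{k, l, m\}$ and $O(B) = \{l, m, n\}$
$I(k) = \{A\}$, $I(l) = \{A, B\}$, $I(m) = \{A, B\}$, and $I(n) = \{B\}$
\[
s(A, B) = \frac{C_1}{3 \times 3} \big( s(k, l) + s(k, m) + s(k, n) + s(l, l) + s(l, m) + s(l, n) + s(m, l) + s(m, m) + s(m, n) \big)
\]
\[
s(m, l) = \frac{C_2}{2} + \frac{C_2 \, s(A, B)}{2}, \qquad s(m, m) = 1, \qquad s(m, n) = \frac{C_2}{2} + \frac{C_2 \, s(A, B)}{2}
\]
Solving,
\[
s(A, B) = \frac{3 C_1 C_2 + 2 C_1}{9 - 4 C_1 C_2}
\]
Further, setting $C_1 = C_2 = 0.8$, $s(A, B) \approx 0.547$.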
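The closed-form result can be cross-checked numerically by iterating the two coupled updates: user scores are computed from product scores over out-neighbours with C1, and product scores from user scores over in-neighbours with C2. This is a sketch under stated assumptions; the function name `bipartite_simrank`, the `purchases` dictionary layout, and the iteration count are illustrative and not taken from the slides.

```python
def bipartite_simrank(purchases, c1=0.8, c2=0.8, iterations=100):
    """Coupled SimRank iteration on a user -> products bipartite graph."""
    users = sorted(purchases)
    products = sorted({p for items in purchases.values() for p in items})
    buyers = {p: [u for u in users if p in purchases[u]] for p in products}

    def identity(items):
        return {(x, y): 1.0 if x == y else 0.0 for x in items for y in items}

    def update(items, neighbors, other_scores, decay):
        scores = {}
        for x in items:
            for y in items:
                nx, ny = neighbors[x], neighbors[y]
                if x == y:
                    scores[(x, y)] = 1.0
                elif not nx or not ny:
                    scores[(x, y)] = 0.0
                else:
                    total = sum(other_scores[(i, j)] for i in nx for j in ny)
                    scores[(x, y)] = decay * total / (len(nx) * len(ny))
        return scores

    user_sim, product_sim = identity(users), identity(products)
    for _ in range(iterations):
        # Both updates read the previous iteration's scores.
        new_user_sim = update(users, purchases, product_sim, c1)
        new_product_sim = update(products, buyers, user_sim, c2)
        user_sim, product_sim = new_user_sim, new_product_sim
    return user_sim, product_sim


purchases = {"A": ["k", "l", "m"], "B": ["l", "m", "n"]}
user_sim, product_sim = bipartite_simrank(purchases)

c1 = c2 = 0.8
closed_form = (3 * c1 * c2 + 2 * c1) / (9 - 4 * c1 * c2)
print(round(user_sim[("A", "B")], 3), round(closed_form, 3))
```

With C1 = C2 = 0.8 the iteration settles on the same value as the closed form, about 0.547, which confirms the algebra for this small graph.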
SimRank in Homogeneous Bipartite Network
Can you apply SimRank in the following application?
In a citation network, two scientific papers might be similar as
survey papers if they cite similar result papers, while two papers
might be similar as result papers if they are cited by similar survey
papers.
References:
1) Social Network Analysis by Tanmoy Chakraborty
2) SimRank: a measure of structural-context similarity by Glen Jeh and Jennifer Widom, ACM, 2002. Link: https://dl.acm.org/doi/abs/10.1145/775047.775126