Social Network Analysis Unit-4

The document discusses link prediction in social networks. It begins by defining link prediction as predicting the likelihood of a future association between two nodes that currently have no association. It then discusses various methods for link prediction, including feature-based classification models, Bayesian probabilistic models, probabilistic relational models, and relational Markov networks. Feature-based methods extract features to train classification models, while probabilistic approaches model joint probabilities among network entities.

Link Prediction

in
Social Networks
Contents

• Introduction
• Feature based Link Prediction
• Bayesian Probabilistic Models
• Probabilistic Relational Models
• Relational Markov Network
Link Prediction

(Figure source: Zhu, J.; Zhang, J.; Wu, Q.; Jia, Y.; Zhou, B.; Wei, X.; Yu, P.S. Constrained Active Learning for Anchor Link Prediction Across Multiple Heterogeneous Social Networks. Sensors 2017, 17, 1786.)
Link Prediction

• Predict the likelihood of a future association between two nodes, knowing that there is no association between the nodes in the current state of the graph.
• G[t, t'] denotes the subgraph of G restricted to the edges with time-stamps between t and t'.
• Take a training interval [t0, t0'] and a test interval [t1, t1'], where t0' < t1.
• The link prediction task is to output a list of edges that are not present in G[t0, t0'] but are predicted to appear in the network G[t1, t1'].
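As a concrete illustration, this train/test split can be sketched with networkx; the toy snapshots and node names below are hypothetical:

```python
import networkx as nx

# Hypothetical snapshots: G_train plays the role of G[t0, t0'],
# G_test the role of G[t1, t1'].
G_train = nx.Graph([("a", "b"), ("b", "c"), ("c", "d")])
G_test = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

# Candidate pairs: vertex pairs with no edge in the training snapshot.
candidates = list(nx.non_edges(G_train))

# A candidate is a positive example iff its edge appears in the test interval.
labels = {pair: G_test.has_edge(*pair) for pair in candidates}
```

Here only the pair (a, c) is a positive example, since it is absent from the training snapshot but present in the test one.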
Application Areas

• Internet and WWW → automatic web hyperlink creation, web-site hyperlink prediction
• E-commerce → building recommendation systems
• Bibliography and library science → de-duplication and record linkage
• Bioinformatics → protein-protein interaction (PPI) prediction
• Security-related applications → identifying hidden groups of terrorists and criminals
Methods of Link Prediction

• The traditional (non-Bayesian) models → extract a set of features to train a binary classification model.
• The probabilistic approach → model the joint probability among entities in a network by Bayesian graphical models.
• The linear algebraic approach → compute the similarity between nodes in a network by rank-reduced similarity matrices.
• Notation: for a node x, Γ(x) represents the set of neighbors of x, and degree(x) is the size of Γ(x).
Feature based Link Prediction
(Supervised Classification)

• u, v ∈ V are two vertices in the graph G(V, E), and the label of the data point (u, v) is y(u,v): y(u,v) = +1 if (u, v) ∈ E, and −1 otherwise.
• Assume that the interactions between u and v are symmetric, so y(u,v) = y(v,u).
• The classification model predicts the unknown labels of pairs of vertices (u, v) ∉ E in the graph G[t1, t1'].
• Popular supervised classification tools, such as naive Bayes, neural networks, support vector machines (SVM), and k-nearest neighbors, can be used.
• The major challenge is to choose a good set of features for the classification task.
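A minimal sketch of such a classifier with scikit-learn: each row of X is a feature vector for one vertex pair, and the numbers are made-up placeholders (e.g. [common neighbors, Jaccard, Adamic/Adar]), not values from any real network:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical per-pair features: [common neighbors, Jaccard, Adamic/Adar].
X = [[3, 0.50, 1.9], [0, 0.00, 0.0], [2, 0.40, 1.2], [0, 0.05, 0.2]]
y = [1, 0, 1, 0]  # 1 = link appears in the test interval, 0 = it does not

clf = LogisticRegression().fit(X, y)

# Predict the link status of an unseen vertex pair from its features.
pred = clf.predict([[2, 0.45, 1.5]])[0]
```

Any of the other classifiers listed above (naive Bayes, SVM, k-NN) could be dropped in behind the same fit/predict interface.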
Feature Set Construction

• Each data point corresponds to a pair of vertices with the label denoting their link status, so the chosen features should represent some form of proximity between the pair of vertices.
• Graph topological features: (1) node neighborhood based, (2) path based.
• Compute the similarity based on the node neighborhoods or based on the ensembles of paths between a pair of nodes.
Node Neighborhood based Features

• Common Neighbors: for two nodes x and y, the size of their set of common neighbors is |Γ(x) ∩ Γ(y)|.
  • As the number of common neighbors grows, the chance that x and y will have a link between them increases.
• Jaccard Coefficient: normalizes the size of the common-neighbor set, |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|.
  • The probability that a common neighbor of a pair of vertices x and y would be selected, if the selection were made randomly from the union of the neighbor sets of x and y.
• Adamic/Adar: weighs common neighbors with smaller degree more heavily, Σ_{z ∈ Γ(x) ∩ Γ(y)} 1 / log|Γ(z)|; often works better than the previous two metrics.
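All three metrics are available in networkx, as a quick sketch on a toy graph shows (the graph and node names are hypothetical):

```python
import networkx as nx

# Toy graph: x and y share the common neighbors a and b.
G = nx.Graph([("x", "a"), ("x", "b"), ("y", "a"), ("y", "b"), ("y", "c")])

# Common neighbors: |Γ(x) ∩ Γ(y)|
cn = len(list(nx.common_neighbors(G, "x", "y")))

# Jaccard coefficient: |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|
_, _, jc = next(iter(nx.jaccard_coefficient(G, [("x", "y")])))

# Adamic/Adar: sum of 1 / log(degree(z)) over common neighbors z
_, _, aa = next(iter(nx.adamic_adar_index(G, [("x", "y")])))
```

On this graph cn is 2, jc is 2/3, and aa is 2/log 2, since both common neighbors have degree 2.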
Path based Features

• Shortest Path Distance: the path distance between two nodes can influence the formation of a link between them.
• Katz: directly sums over all the paths that exist between a pair of vertices x and y, exponentially damping the contribution of a path of length l by a factor of β^l. The Katz scores between all pairs of vertices can be computed as (I − βA)^(−1) − I, where A is the adjacency matrix.
• Hitting Time: the expected number of steps required for a random walk starting at x to reach y. A shorter hitting time denotes that the nodes are similar to each other, so they have a higher chance of linking in the future.
• Rooted PageRank: the stationary probability of y in a random walk that returns to x with probability 1 − β at each step, and moves to a random neighbor with probability β.
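The Katz matrix formula can be evaluated directly with numpy; the toy graph and the choice β = 0.1 below are assumptions (β must be smaller than the reciprocal of the largest eigenvalue of A for the underlying series to converge):

```python
import networkx as nx
import numpy as np

# Toy graph on four vertices.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (0, 2)])
A = nx.to_numpy_array(G)

beta = 0.1  # damping factor; assumed < 1 / (largest eigenvalue of A)
I = np.eye(A.shape[0])

# Katz scores for all vertex pairs: (I - beta*A)^(-1) - I
katz = np.linalg.inv(I - beta * A) - I
```

The resulting matrix is symmetric for an undirected graph, and adjacent pairs such as (0, 1) score higher than distant pairs such as (0, 3).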
Bayesian Probabilistic Models

• Obtain a posterior probability that denotes the chance of co-occurrence of the vertex pairs.
• Link Prediction by Local Probabilistic Models
• Network Evolution based Probabilistic Model
• Hierarchical Probabilistic Model
Link Prediction by Local Probabilistic Models

• Uses a Markov Random Field (MRF), an undirected graphical model.
• A central neighborhood set consists of other nodes that appear in the local neighborhood of x or y.
• Compute the joint probability P({w, x, y, z}), which represents the probability of co-occurrence of the objects in this set.
• Step 1: Find a collection of central neighborhood sets → find a shortest path between x and y; all the nodes along this path can belong to one central neighborhood set.
• Step 2: Obtain the training data for the MRF model, which is taken from the event log of the social network.
• Step 3: Train an MRF model from the training data. This training process is translated into a maximum entropy optimization problem. Once the model PM(Q) is built, one can estimate the joint probability between the vertices x and y.
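Step 1 can be sketched with networkx (the toy graph is hypothetical; a full implementation would collect several such sets, not just one):

```python
import networkx as nx

# Toy graph: two paths connect x and y, one through w and z,
# and a shorter one through q.
G = nx.Graph([("x", "w"), ("w", "z"), ("z", "y"), ("x", "q"), ("q", "y")])

# Nodes along a shortest x-y path form one central neighborhood set.
central_set = nx.shortest_path(G, "x", "y")
```

Here the shortest path runs through q, so {x, q, y} is one candidate central neighborhood set for the pair (x, y).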
Network Evolution based Probabilistic Model

• For a graph G(V, φ), V is the set of nodes and φ : V × V → [0, 1] is an edge label function.
• φ(x, y) denotes the probability that an edge exists between nodes x and y in G: φ(x, y) = 1 if an edge exists, and φ(x, y) = 0 if it does not.
• φ(t) → the edge label function at time t, which changes over time.
• The model is Markovian, i.e., φ(t+1) depends only on φ(t).
Model Evolution

• An edge label is copied from node l to node m randomly with probability wlm.
• First, the model decides on l and m, then chooses an edge label uniformly from l's |V| − 1 edge labels (excluding φ(l, m)) to copy as m's edge label.
• The model satisfies the standard probability constraints on the copy weights wlm.
• Through the edge label copying process, m can become a friend of one of l's friends.
• The learning task in the model is to compute the weights wij and the edge labels φ(t+1), given the edge labels φ(t) from the training dataset.
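A minimal sketch of a single copy step, assuming l and m have already been chosen (the toy vertex set and the helper `copy_step` are illustrative, not the model's actual code, and the wlm selection step is omitted):

```python
import random

V = ["a", "b", "c", "d"]

# Edge label function phi over unordered pairs (1.0 = edge, 0.0 = no edge).
phi = {frozenset((u, v)): 0.0 for u in V for v in V if u < v}
phi[frozenset(("a", "b"))] = 1.0  # a and b are linked

def copy_step(phi, l, m):
    """Copy one of l's edge labels (excluding phi(l, m)), chosen uniformly,
    onto the corresponding edge label of m."""
    others = [v for v in V if v not in (l, m)]
    z = random.choice(others)
    phi[frozenset((m, z))] = phi[frozenset((l, z))]
    return z

z = copy_step(phi, "a", "c")  # c may inherit a's link to b
```

If the chosen label is φ(a, b) = 1, then after the copy φ(c, b) = 1 as well, which is how m comes to share one of l's friends.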
Hierarchical Probabilistic Model

• Let G be a graph with n vertices.
• A dendrogram D is a binary tree with n leaves corresponding to the vertices of G. Each of the n − 1 internal nodes of D corresponds to the group of vertices that are descended from it.
• A probability pr is associated with each internal node r. Then, given two vertices i, j of G, the probability pij that they are connected by an edge is pij = pr, where r is their lowest common ancestor in D.
• The combination (D, {pr}) of the dendrogram and the set of probabilities then defines a hierarchical random graph.
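The pij = pr rule can be sketched with a hard-coded toy dendrogram (the tree shape, node names, and probability values below are all made up):

```python
# Toy dendrogram over leaves i, j, k, l:
#   root (p = 0.1)
#   +-- left  (p = 0.9): leaves i, j
#   +-- right (p = 0.8): leaves k, l
parent = {"i": "left", "j": "left", "k": "right", "l": "right",
          "left": "root", "right": "root"}
p = {"left": 0.9, "right": 0.8, "root": 0.1}

def ancestors(v):
    """Internal nodes on the path from leaf v up to the root, in order."""
    chain = []
    while v in parent:
        v = parent[v]
        chain.append(v)
    return chain

def p_link(i, j):
    """p_ij = p_r at the lowest common ancestor r of leaves i and j."""
    anc_j = set(ancestors(j))
    return p[next(a for a in ancestors(i) if a in anc_j)]
```

With this tree, i and j (same subtree) link with probability 0.9, while i and k (joined only at the root) link with probability 0.1.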
Cont..

• Dendrograms are sampled by a Markov chain Monte Carlo (MCMC) method, with probabilities proportional to their likelihood.
• To create the Markov chain, the method defines a set of transitions between possible dendrograms through rearrangement.
• For link prediction, a set of sample dendrograms is obtained at regular intervals once the MCMC random walk reaches equilibrium.
• For a pair of vertices x and y with no existing connection, the model computes a mean probability pxy that they are connected, by averaging the corresponding probability pxy over each of the sampled dendrograms.
Probabilistic Relational Models

• A concrete modeling tool that provides a systematic way to incorporate both vertex and edge attributes to model the joint probability distribution of a set of entities and the links that associate them.
• Considers the object-relational nature of structured data by capturing probabilistic interactions between entities.
• Type 1: based on Bayesian networks, which consider the relation links to be directed.
• Type 2: based on relational Markov networks, which consider the relation links to be undirected.
Example: Co-authorship Network

• A PRM can have heterogeneous entities in the model, e.g. article, author, conferenceVenue, and institution.
• An author may have attributes like name, affiliationInstitute, and status.
• An article may have publicationYear and conferenceVenue.
• An institution may have a location; a conference venue may have attributes like researchKeywords.
• There can then be relational links between these entities:
  • Two persons can be related by an advisor/advisee relationship.
  • A person can be related to a paper by an author relationship.
  • A paper can be related to a conference venue by a publish relationship.
Relational Markov Network

• The relational counterpart of undirected graphical models, or Markov networks.
• A Markov network for V defines a joint distribution over V through an undirected dependency network and a set of parameters.
• For a graph G, if C(G) is the set of cliques, the Markov network defines the distribution

    p(v) = (1/Z) ∏_{c ∈ C(G)} φc(vc)

  where Z is the standard normalizing factor, vc is the vertex set of the clique c, and φc is a clique potential function.
Cont..

• Given a particular instantiation I of the schema, the RMN M produces an unrolled Markov network over the attributes of the entities in I.
• There exists one clique for each c ∈ C(I), and all of these cliques are associated with the same clique potential φC.
• In a network with many relational attributes, the unrolled network is typically large.
• RMN also uses belief propagation for inference.
References

• https://pdfs.semanticscholar.org/e7d3/0fefe1b99c21813873f976e46d03dc82b4fc.pdf
• https://www.slideshare.net/SinaSajadmanesh/probabilistic-relational-models-for-link-prediction-problem
• http://www.robotics.stanford.edu/~koller/Papers/Friedman+al:IJCAI99.pdf
• https://networkx.github.io/documentation/networkx-1.10/reference/algorithms.link_prediction.html
• https://courses.cs.washington.edu/courses/cse574/05sp/slides/rmn-danny.pdf
• https://pdfs.semanticscholar.org/presentation/fad1/fb785125ba8db0e8bf417f8a68729327269e.pdf
