Clustering Social Network Graphs


 

Introduction
• A social network is a nonrandom collection of entities with at least one relationship connecting them.
• Social networks contain communities: groups of entities that are connected by many edges.
• E.g. groups of friends at school, or researchers interested in the same topic.
• Communities can be identified by clustering.

Disadvantages of Standard Clustering Algorithms
• Absence of a proper distance measure
• Sub-communities will not be identified
• Nodes belonging to different clusters may get combined
• Both k-means and hierarchical clustering can produce wrong clusterings

Solving the Problem: "Betweenness"
• Find the edges that are least likely to be inside a community.
• The betweenness of an edge (a, b) is the number of pairs of nodes x and y such that the edge (a, b) lies on the shortest path between x and y. When x and y have several shortest paths, the edge is credited with the fraction of those paths that pass through it.
• A large betweenness suggests that the edge runs between two different communities.
The Girvan-Newman Algorithm
• Used to calculate the number of shortest paths going through each edge
• Visits each node X once and computes the number of shortest paths from X to each of the other nodes that go through each of the edges
Steps
1. Perform a breadth-first search (BFS) of the graph, starting at the node X.
2. Label each node with the number of shortest paths that reach it from the root: the root is labelled 1, and every other node Y is labelled with the sum of the labels of its parents.
3. For each edge e, calculate the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e.
4. Repeat for all nodes, taking each one as the root in turn.
5. Complete the credit calculation by summing each edge's credit over all roots and dividing by 2.
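As a reference point for the steps above, edge betweenness can also be computed directly from its definition by enumerating every shortest path between every pair of nodes. The sketch below is a brute-force check, not the Girvan-Newman procedure itself. The seven-node graph and the function names are illustrative assumptions and do not come from the slides.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parents[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                parents[nbr].append(node)  # another equally short route
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        # Follow parent links back to the source, building the path in reverse.
        if node == source:
            paths.append([source] + suffix)
            return
        for parent in parents[node]:
            walk(parent, [node] + suffix)

    walk(target, [])
    return paths


def edge_betweenness_by_definition(graph):
    """Credit each edge with the fraction of shortest paths of each pair using it."""
    scores = {}
    for x, y in combinations(sorted(graph), 2):
        paths = all_shortest_paths(graph, x, y)
        for path in paths:
            for a, b in zip(path, path[1:]):
                edge = tuple(sorted((a, b)))
                scores[edge] = scores.get(edge, 0.0) + 1.0 / len(paths)
    return scores


# Hypothetical example: two tightly knit groups joined by the single edge B-D.
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
scores = edge_betweenness_by_definition(graph)
print(max(scores, key=scores.get))  # the edge with the highest betweenness
```

In this assumed graph the bridge B-D lies on every shortest path between the two groups, so it receives the largest score. Brute force is only practical for small graphs, which is why the BFS-based steps are used instead.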
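The Girvan-Newman Algorithm (contd.)
Step 1 - Perform a breadth-first search (BFS) of the graph, starting at the root node X. Edges between nodes at the same level cannot lie on a shortest path from X; the remaining edges, which join consecutive levels, form a directed acyclic graph (DAG).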
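A minimal sketch of Step 1, assuming the graph is stored as an adjacency dictionary. The example graph and the names `bfs_levels` and `dag_edges` are hypothetical, chosen only for illustration.

```python
from collections import deque


def bfs_levels(graph, root):
    """Return the BFS level (distance from root) of every reachable node."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level


def dag_edges(graph, level):
    """Keep only edges that go from a node to a neighbor exactly one level below."""
    return sorted(
        (u, v) for u in level for v in graph[u] if level[v] == level[u] + 1
    )


# Hypothetical example graph (not taken from the slides).
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
levels = bfs_levels(graph, "E")
edges = dag_edges(graph, levels)
print(levels)
print(edges)
```

Edges such as D-F, whose endpoints sit at the same level, are dropped because they cannot be part of any shortest path from the root.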
The Girvan-Newman Algorithm (contd.)
Step 2 - Label each node with the number of shortest paths that reach it from the root. The root is labelled 1; every other node Y is labelled with the sum of the labels of its parents, i.e. its neighbors one level above.
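The labelling in Step 2 can be folded into the same BFS pass. The sketch below assumes an adjacency-dictionary graph; the example graph and the name `count_shortest_paths` are illustrative, not from the slides.

```python
from collections import deque


def count_shortest_paths(graph, root):
    """Label each node with the number of shortest paths reaching it from root."""
    level = {root: 0}
    paths = {root: 1}  # the root is labelled 1
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                paths[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                # node is a parent of nbr, so its label is added to nbr's label
                paths[nbr] += paths[node]
    return paths


# Hypothetical example graph (not taken from the slides).
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
paths = count_shortest_paths(graph, "E")
print(paths)
```

With root E, node G has two parents (D and F), each labelled 1, so G is labelled 2; every other node in this assumed graph is reached by a single shortest path.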
The Girvan-Newman Algorithm (contd.)
Step 3 - For each edge e, calculate the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e.
The credits are calculated bottom-up, from the deepest level towards the root, using the following rules:
1. Each leaf in the DAG (a leaf is a node with no DAG edges to nodes at levels below) gets a credit of 1.
2. Each node that is not a leaf gets a credit equal to 1 plus the sum of the credits of the DAG edges from that node to the level below.
3. A DAG edge e entering node Z from the level above is given a share of the credit of Z proportional to the fraction of shortest paths from the root to Z that go through e.
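The three rules can be sketched as a single bottom-up pass over the nodes in reverse BFS order. This is an illustrative sketch under the assumption of an adjacency-dictionary graph; the example graph and the name `edge_credits_from_root` are hypothetical.

```python
from collections import deque


def edge_credits_from_root(graph, root):
    """Apply the credit rules for one root and return the credit of each DAG edge."""
    level, paths, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                paths[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                paths[nbr] += paths[node]

    # Rules 1 and 2: every node starts with credit 1; non-leaves gain more below.
    node_credit = {node: 1.0 for node in order}
    edge_credit = {}
    for node in reversed(order):  # deepest nodes first
        for parent in graph[node]:
            if level[parent] == level[node] - 1:
                # Rule 3: split the node's credit among the edges from its
                # parents in proportion to the parents' shortest-path labels.
                share = node_credit[node] * paths[parent] / paths[node]
                edge_credit[tuple(sorted((parent, node)))] = share
                node_credit[parent] += share
    return edge_credit


# Hypothetical example graph (not taken from the slides).
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
credits = edge_credits_from_root(graph, "E")
print(credits)
```

In this assumed graph, leaf G has credit 1 and two parents with equal labels, so the edges D-G and F-G each receive 0.5. The root's own accumulated credit is computed but never used.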
The Girvan-Newman Algorithm (contd.)
Steps 4 & 5 - Repeat for all nodes and complete the credit calculation
Sum the credit of each edge over all roots. Since each shortest path will have been discovered twice, once when each of its endpoints is the root, we must divide the credit for each edge by 2.
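Putting the steps together, the betweenness of every edge is the per-root credit summed over all roots and then halved. The sketch below repeats the per-root pass so that it runs on its own; the example graph and the name `edge_betweenness` are assumptions made for illustration.

```python
from collections import deque


def edge_betweenness(graph):
    """Sum per-root edge credits over every root, then divide by 2."""
    total = {}
    for root in graph:
        # Steps 1 and 2: BFS levels and shortest-path labels from this root.
        level, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    paths[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    paths[nbr] += paths[node]
        # Step 3: push credit from the deepest level up towards the root.
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in graph[node]:
                if level[parent] == level[node] - 1:
                    share = credit[node] * paths[parent] / paths[node]
                    credit[parent] += share
                    edge = tuple(sorted((parent, node)))
                    total[edge] = total.get(edge, 0.0) + share
    # Each shortest path was counted once from each endpoint, so halve the sums.
    return {edge: value / 2 for edge, value in total.items()}


# Hypothetical example graph (not taken from the slides).
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
betweenness = edge_betweenness(graph)
print(sorted(betweenness.items(), key=lambda item: -item[1]))
```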
Girvan-Newman Algorithm & Betweenness
• Remove the edges with the highest credit (betweenness) value
• Stop when the individuals have been assigned to clusters
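A minimal sketch of the removal loop. The slides do not say how ties are broken, whether betweenness is recomputed after each removal, or how the stopping point is chosen, so this sketch assumes one edge is removed at a time, betweenness is recomputed each round, and the loop stops at a caller-supplied number of communities. The example graph and the names `communities`, `girvan_newman`, and `target_communities` are hypothetical.

```python
from collections import deque


def edge_betweenness(graph):
    """Per-root credit calculation summed over all roots and halved."""
    total = {}
    for root in graph:
        level, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    paths[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    paths[nbr] += paths[node]
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for parent in graph[node]:
                if level[parent] == level[node] - 1:
                    share = credit[node] * paths[parent] / paths[node]
                    credit[parent] += share
                    edge = tuple(sorted((parent, node)))
                    total[edge] = total.get(edge, 0.0) + share
    return {edge: value / 2 for edge, value in total.items()}


def communities(graph):
    """Return the connected components as sorted lists of nodes."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            node = stack.pop()
            group.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(sorted(group))
    return result


def girvan_newman(graph, target_communities):
    """Remove highest-betweenness edges until enough communities appear."""
    work = {node: set(nbrs) for node, nbrs in graph.items()}  # editable copy
    while len(communities(work)) < target_communities:
        scores = edge_betweenness(work)
        if not scores:  # no edges left to remove
            break
        a, b = max(sorted(scores), key=scores.get)  # assumed tie-break: sorted order
        work[a].discard(b)
        work[b].discard(a)
    return communities(work)


# Hypothetical example graph (not taken from the slides).
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
clusters = girvan_newman(graph, target_communities=2)
print(clusters)
```

In this assumed graph, the bridge B-D has the highest betweenness, so removing it splits the graph into two groups.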
The Girvan-Newman Algorithm: Disadvantages
• A node cannot belong to two different communities at the same time
• Certain nodes may be removed from a community because they are associated with another community
Thank You
