The document discusses social network clustering and algorithms for identifying communities within networks. It introduces the Girvan-Newman algorithm which uses betweenness centrality to identify edges that connect communities. The algorithm calculates the betweenness of each edge by finding the number of shortest paths between all nodes that pass through that edge. Edges with the highest betweenness are removed iteratively until individual communities are identified. However, the algorithm has disadvantages in that nodes cannot belong to multiple communities and certain boundary nodes may be incorrectly removed from their own community.
CLUSTERING SOCIAL NETWORK GRAPHS

Introduction
A social network is a nonrandom collection of entities in which there is at least one relationship between the entities. Social networks contain communities: groups of entities that are connected to each other by many edges. Examples are groups of friends at a school, or researchers interested in the same topic.

Disadvantages of Standard Clustering Algorithms

Communities can be identified by clustering, but standard clustering algorithms such as K-means and hierarchical clustering have several disadvantages when applied to social network graphs. There is no proper distance measure between the nodes of the graph. Sub-communities will not be identified. Nodes that belong to different clusters may be combined into a single cluster. Finally, both K-means and hierarchical clustering may produce a wrong clustering.

Solving the Problem: Betweenness

The problem is solved by finding the edges that are least likely to be inside a community. The betweenness of an edge (a, b) is the number of pairs of nodes x and y such that the edge (a, b) lies on the shortest path between x and y. If there are several shortest paths between x and y, the edge (a, b) is credited with the fraction of those shortest paths that include it. A large betweenness suggests that the edge runs between two different communities.

The Girvan-Newman Algorithm

The Girvan-Newman algorithm calculates the number of shortest paths going through each edge. It visits each node X once and computes the number of shortest paths from X to each of the other nodes that go through each of the edges. The steps are:

1. Perform a breadth-first search (BFS) of the graph, starting at the node X, which becomes the root.
2. Label each node by the number of shortest paths that reach it from the root. The root is labelled 1, and each other node Y is labelled with the sum of the labels of its parents, i.e. the neighbours of Y on the level directly above it.
3. Calculate, for each edge e, the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e.
4. Repeat steps 1 to 3 with every node of the graph as the root.
5. Complete the credit calculation by summing the credits of each edge over all roots and halving the total.

Step 3: Calculating the Credits

The BFS edges that join a node to a neighbour on the next level down form a DAG. Credits are computed from the bottom level upwards, and the rules for the calculation are as follows:

1. Each leaf in the DAG (a leaf is a node with no DAG edges to nodes at levels below) gets a credit of 1.
2. Each node that is not a leaf gets a credit equal to 1 plus the sum of the credits of the DAG edges from that node to the level below.
3. A DAG edge e entering node Z from the level above is given a share of the credit of Z proportional to the fraction of shortest paths from the root to Z that go through e. If e comes from a parent labelled p and Z is labelled q, then e receives p/q of the credit of Z.

Steps 4 and 5: Repeating for All Nodes and Completing the Credit Calculation

After every node has served as the root, the credits of each edge are added up. Since each shortest path will have been discovered twice, once when each of its endpoints is the root, we must divide the credit for each edge by 2. The result is the betweenness of each edge.
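The five steps and the three credit rules can be traced in a short Python sketch. It is an illustration written for this section, not code from the original slides, and it assumes a simple, undirected, unweighted graph stored as a dictionary that maps each node to a list of its neighbours. The function name `edge_betweenness` and the seven-node example graph (two tightly knit groups {A, B, C} and {D, E, F, G} joined only by the edge B-D) are hypothetical.

```python
from collections import deque, defaultdict


def edge_betweenness(graph):
    """Edge betweenness by the BFS/credit method (steps 1-5 of Girvan-Newman).

    Assumes `graph` maps each node to an iterable of its neighbours in a
    simple, undirected, unweighted graph. Returns {frozenset({u, v}): value}.
    """
    total = defaultdict(float)
    for root in graph:  # Step 4: every node takes a turn as the root
        # Step 1: BFS from the root, recording each node's level and visit order.
        level = {root: 0}
        # Step 2: paths[y] = number of shortest paths from the root to y.
        paths = {root: 1}
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    paths[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    paths[nbr] += paths[node]  # label = sum of the parents' labels
        # Step 3: credits, working from the deepest level up towards the root.
        credit = {node: 1.0 for node in order}  # every node starts with credit 1
        for node in reversed(order):
            for parent in graph[node]:
                if level[parent] == level[node] - 1:
                    # Rule 3: the DAG edge parent -> node receives the fraction
                    # paths[parent] / paths[node] of node's credit.
                    share = credit[node] * paths[parent] / paths[node]
                    total[frozenset((parent, node))] += share
                    credit[parent] += share  # Rule 2: accumulate into the parent
    # Step 5: every shortest path was found once from each of its endpoints.
    return {edge: value / 2 for edge, value in total.items()}


# Hypothetical example: {A, B, C} and {D, E, F, G} joined only by edge B-D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E", "F", "G"],
    "E": ["D", "F"],
    "F": ["D", "E", "G"],
    "G": ["D", "F"],
}
scores = edge_betweenness(graph)
print(scores[frozenset(("B", "D"))])  # 12.0
print(scores[frozenset(("A", "B"))])  # 5.0
```

In this example B-D scores 12, one unit for each of the 12 pairs it separates (3 nodes on one side times 4 on the other), so it is the edge a Girvan-Newman run would remove first. Walking the BFS order backwards is what makes rule 2 work: every node on a deeper level has passed its share up before any node on the level above hands on its own credit.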
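Once every edge has its betweenness, the graph is split into communities by removing the edges with the highest credit value (betweenness).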
Removal stops once the individuals have been separated into clusters; the connected components that remain are the communities.

The Girvan-Newman Algorithm: Disadvantages

Nodes cannot be in two different communities at the same time. Certain nodes may be removed from their own community because they are also associated with another community.
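The removal loop can also be sketched in Python. This is a hypothetical illustration under stated assumptions, not the slides' own method: the graph is a simple, undirected adjacency dictionary; the stopping rule is a target number of connected components (`num_communities`, a parameter introduced here); ties for the highest betweenness are broken arbitrarily; and, as in Girvan and Newman's original formulation, betweenness is recomputed after every single removal. To keep the block self-contained it computes betweenness directly from the definition instead of with the credit method: the edge a-b lies on a shortest x-y path exactly when d(x, a) + 1 + d(b, y) = d(x, y), and it then carries sigma(x, a) * sigma(b, y) of the sigma(x, y) shortest paths, where sigma counts shortest paths. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Distance and number of shortest paths from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                sigma[nbr] = 0
                queue.append(nbr)
            if dist[nbr] == dist[node] + 1:
                sigma[nbr] += sigma[node]
    return dist, sigma


def pair_betweenness(graph):
    """Betweenness straight from the definition: a sum over node pairs {x, y}."""
    counts = {node: bfs_counts(graph, node) for node in graph}
    edges = {frozenset((u, v)) for u in graph for v in graph[u]}
    scores = dict.fromkeys(edges, 0.0)
    for x, y in combinations(graph, 2):
        dist_x, sigma_x = counts[x]
        if y not in dist_x:
            continue  # no path: x and y are already in different components
        dist_y, sigma_y = counts[y]
        for edge in edges:
            u, v = tuple(edge)
            for a, b in ((u, v), (v, u)):
                # a-b is on a shortest x-y path iff d(x,a) + 1 + d(b,y) == d(x,y)
                if a in dist_x and b in dist_y and dist_x[a] + 1 + dist_y[b] == dist_x[y]:
                    scores[edge] += sigma_x[a] * sigma_y[b] / sigma_x[y]
    return scores


def components(graph):
    """Connected components, each returned as a set of nodes."""
    seen, result = set(), []
    for start in graph:
        if start not in seen:
            reached = set(bfs_counts(graph, start)[0])
            seen |= reached
            result.append(reached)
    return result


def girvan_newman(graph, num_communities):
    """Remove top-betweenness edges until at least num_communities components exist."""
    graph = {node: set(nbrs) for node, nbrs in graph.items()}  # don't mutate input
    parts = components(graph)
    while len(parts) < num_communities and any(graph.values()):
        scores = pair_betweenness(graph)  # recomputed after every removal
        u, v = tuple(max(scores, key=scores.get))  # ties broken arbitrarily
        graph[u].discard(v)
        graph[v].discard(u)
        parts = components(graph)
    return parts


graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"], "F": ["D", "E", "G"],
    "G": ["D", "F"],
}
communities = girvan_newman(graph, 2)
print(sorted(sorted(c) for c in communities))
# [['A', 'B', 'C'], ['D', 'E', 'F', 'G']]
```

Every node ends up in exactly one returned set, which is the first disadvantage above in concrete form: the result is a partition and cannot express overlapping communities. The pair-by-pair computation is easy to check against the definition but considerably slower than the BFS/credit method, which is the better choice for anything beyond small graphs.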