SMA Module3
Analytics
Module 3
Community Structure in Networks
Overview - Community Structure in Networks
• Definition of Communities in social
networks, Applications of Community
Detection, Types of Communities.
• Community Detection Methods:
–Disjoint Community Detection- Node-
Centric Community Detection,
Modularity and Community
Detection- Louvain Algorithm, Girvan
Newman;
–Overlapping Community Detection:
Clique Percolation, Link Partition;
–Local Community Detection
Homophily in the Society
[real-world] community
A group of individuals with common economic, social, or political
interests or characteristics, often living in relative proximity.
Communities in a Network
Identifying communities gives an insight about
the inherent network structure
Community detection is an ill-defined problem
what we mean by a ‘community’ is often not
concrete
often hard to reliably define a ground-truth
annotation for communities
no standard measure to assess the
performance
Diverse approaches to the problem depending
on how we define a community structure in the
network
Community Detection in Networks: Applications
Performance enhancement of the similarity-based link prediction algorithms
Improving recommendation quality in Recommender systems by separating like-
minded people
Controlling information diffusion within a network by identifying community
memberships
Designing better marketing strategy by identifying position of the target group within
the network
Restricting epidemic propagation by suitably isolating and immunizing the vulnerable
population
Better anomaly detection in nodes, especially in evolving networks
Studying evolution of communities
Applications in criminology and detecting terrorist groups
Social Media Communities
• A basic community comes into existence when like-minded users on social
media form a link and start interacting with each other.
• Any formation of a community requires
–1) a set of at least two nodes sharing some interest and
–2) interactions with respect to that interest.
• Two types of groups in social media
–Explicit Groups: formed by user subscriptions
–Implicit Groups: implicitly formed by social interactions
• We may see group, cluster, cohesive subgroup, or module in different
contexts instead of “community”
Explicit Communities
(Clearly defined groups with membership & participation)
• These are communities where users explicitly join or participate, often with visible
membership lists.
• Facebook Groups – Groups formed around shared interests, such as "Machine
Learning Enthusiasts" or "Indian Cooking Lovers."
• Reddit Subreddits – Communities like r/India or r/AskScience, where users subscribe
and engage in discussions.
• LinkedIn Groups – Professional groups like "Data Science & AI Professionals" where
members interact on specific topics.
• Discord Servers – Gaming, tech, and hobby communities where users join dedicated
servers and participate in channels.
• WhatsApp/Telegram Groups – Private or public groups formed for specific
discussions, events, or topics.
Implicit Communities
(Inferred from interactions rather than explicit membership)
• These communities emerge naturally from user behavior, connections, or shared
activities rather than formal membership.
• Twitter Interaction Networks – Users who frequently like, reply, or retweet each other
form implicit communities around political figures, sports, or tech trends.
• Instagram Engagement Clusters – Users who consistently like, comment, or follow
similar accounts form hidden communities, like a fan base around celebrities or
influencers.
• YouTube Watch Patterns – Viewers who frequently watch and engage with similar types
of videos (e.g., tech reviews, fitness content) form implicit communities without direct
interaction.
• GitHub Developer Networks – Developers contributing to the same open-source
projects or following each other’s repositories create an implicit programming
community.
Facebook
• Frequent Interactors – People who consistently like, comment, or share posts
from the same set of pages or friends (e.g., fans of a specific political party,
sports team, or celebrity).
• Shared Interest Networks – Users who follow similar Facebook Pages (e.g.,
multiple pages related to stock trading, AI, or Bollywood) but are not in the
same group.
• Event-Based Communities – Users who RSVP to similar events (e.g., tech
conferences, music festivals) and engage with posts related to them.
• Ad Targeting Clusters – Facebook’s recommendation system identifies people
with similar browsing and engagement patterns, creating hidden communities
that receive similar ads.
LinkedIn
• Skill-Based Networks – Users who endorse each other for similar skills (e.g., "Python
Developer" or "Marketing Strategy") form hidden professional clusters.
• Frequent Engagers – People who regularly like or comment on posts related to
specific industries (e.g., AI advancements, HR trends) but are not in the same
LinkedIn group.
• Recruiter-Job Seeker Clusters – Job seekers applying for similar roles get
recommended similar recruiters, forming an unseen community of professionals
vying for the same job opportunities.
• Alumni Networks Without Groups – People who studied at the same university and
interact with related posts, even without joining an official alumni group.
Types of Communities: Disjoint Communities
• Community detection
–Discovering implicit communities
• Community evolution
–Studying temporal evolution of communities
• Community evaluation
–Evaluating Detected Communities
What is community detection?
• Community detection in social networks is the process of identifying groups of nodes
(users, entities) that are more densely connected to each other than to the rest of the
network.
• It helps in understanding the structure of social interactions, influence, and
relationships.
• The process of finding clusters of nodes (‘‘communities’’)
– With Strong internal connections and
– Weak connections between different communities
• Ideal decomposition of a large graph
– Completely disjoint communities
– There are no interactions between different communities.
• In practice,
– find community partitions that are maximally decoupled.
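A minimal sketch of what "maximally decoupled" means in practice: for a given partition, count the edges that stay inside communities and the edges that cross between them. The toy graph (two triangles joined by one bridge) and the names `edge_split` and `community_of` are assumptions made for illustration.

```python
def edge_split(edges, community_of):
    """Count edges inside communities versus edges crossing between them."""
    internal = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return internal, len(edges) - internal


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community_of = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}

internal, crossing = edge_split(edges, community_of)
print(f"internal edges: {internal}, crossing edges: {crossing}")
```

A partition with many internal edges and few crossing edges is closer to the ideal of disjoint communities.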
Challenges to Community Detection in Social Networks
• No Universal Definition of a Community
– Some algorithms define communities based on dense connections (e.g.,
modularity-based methods), while others focus on structural roles (e.g., core-
periphery models).
– Real-world communities may be based on common interests, social interactions,
or functional relationships, leading to different interpretations of community
structures.
• Different Algorithms Yield Different Communities
– Louvain may detect broad clusters,
– Infomap may find smaller, more information-theoretic groups,
– Label propagation may give unstable results due to randomness.
• The choice of algorithm introduces a layer of subjectivity in defining communities.
• Parameter Sensitivity
– Many community detection methods require parameters (e.g., resolution in
modularity, number of clusters in spectral clustering).
– Different parameter settings can lead to entirely different community structures,
making the results subjective to user choices.
• Ground Truth Communities May Not Be Well-Defined
– In many cases, the "real" communities in a social network are unknown or
ambiguous.
– On Facebook, community detection might group users based on friendship
networks, but the "ground truth" communities (e.g., real-life friend circles,
workplace colleagues, or shared interest groups) may not always align with the
detected structure.
• Overlapping vs. Non-Overlapping Communities
– Some researchers argue that real-world communities are overlapping (e.g., people
belong to multiple groups), while many traditional algorithms force hard partitions.
– Deciding whether a community should be strictly separate or overlapping is inherently
subjective.
• Human Interpretation Bias
– Analysts may interpret the results differently based on their prior knowledge, leading
to confirmation bias.
– For example, a detected community on Twitter might be labeled as "political
activists," but another analyst may see them as "social justice groups."
• Dynamic Nature of Communities
– Communities evolve over time, and different snapshots of a network may show
different structures.
– What is considered a "valid" community today may not be relevant tomorrow, adding
another layer of subjectivity.
Node-centric Community Detection: k-Clique
• Definition: A k-clique is a maximal set of nodes in which the shortest distance between any two
nodes, measured in the full graph, is at most k.
• This means that any two nodes in the set can be reached from one another in k steps
or fewer, possibly through nodes outside the set.
• Characteristics:
– The subgraph is not necessarily fully connected; some nodes may not have direct
links but are connected through intermediate nodes.
– The focus is on the maximum distance between nodes, allowing for a more relaxed
structure compared to traditional cliques.
k-Clan
• Definition: A k-clan is a k-clique with the additional constraint that the diameter of the induced
subgraph is at most k.
• The diameter is the greatest distance between any pair of nodes within the subgraph.
• Characteristics:
– The subgraph induced by the k-clan is more tightly connected, ensuring that all nodes are
within k steps of each other within the subgraph itself.
– This stricter condition means that k-clans are more cohesive subsets compared to k-cliques.
• Key Difference:
– While both k-cliques and k-clans consider the distance between nodes, a k-clique allows for
nodes to be within k steps in the context of the entire graph, potentially relying on external
nodes to maintain these distances.
– In contrast, a k-clan requires that the nodes be within k steps of each other within the
subgraph itself, ensuring a more cohesive and self-contained structure.
– Understanding these distinctions is crucial for analyzing the cohesiveness and communication
efficiency within subgroups of social networks.
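The key difference can be checked with a breadth-first search. This is a sketch under stated assumptions: the star-shaped graph and the helper names `bfs_distances` and `within_k` are hypothetical, and the maximality condition is deliberately not tested.

```python
from collections import deque


def bfs_distances(adj, source, allowed=None):
    """Shortest-path lengths from source; if allowed is given, paths stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist and (allowed is None or w in allowed):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def within_k(adj, nodes, k, inside_only):
    """True if every pair of nodes is at distance <= k.

    inside_only=False measures distance in the whole graph (k-clique condition).
    inside_only=True measures it in the induced subgraph (k-clan / k-club condition).
    """
    nodes = set(nodes)
    allowed = nodes if inside_only else None
    for u in nodes:
        dist = bfs_distances(adj, u, allowed)
        if any(dist.get(v, float("inf")) > k for v in nodes):
            return False
    return True


# Hypothetical star: a, b, c are linked only through the hub h.
adj = {"h": {"a", "b", "c"}, "a": {"h"}, "b": {"h"}, "c": {"h"}}

leaves = {"a", "b", "c"}
print(within_k(adj, leaves, 2, inside_only=False))  # distances via the outside hub
print(within_k(adj, leaves, 2, inside_only=True))   # distances inside the subset only
```

The leaves meet the distance condition of a 2-clique only because the paths run through the hub, which is outside the set; once paths must stay inside the set, the condition fails.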
Node-centric Community Detection: K-club
K-club is a K-clan minus the maximality condition
{2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {5, 6, 2}, and {6, 2, 3} in 𝐺2 are all 2-clubs
Every K-clan is a K-club as well as a K-clique
Challenges:
These algorithms are still computationally expensive for large K
Deciding appropriate K is difficult
[Figure: example network 𝐺2 on nodes 1-6]
K-Cliques, Clans, Clubs
• Ananya, Nikhil, Sonia, Priya, Raj: 2-clique (Kartik is not in the clique but lies on a path)
• Ananya, Nikhil, Sonia, Raj: 2-clique, 2-clan
• Ananya, Nikhil, Sonia, Priya: 2-clique, 2-clan
• Ananya, Kartik, Sonia, Priya, Raj: 2-clique, 2-clan
• Ravi, Kartik, Meena, Amit: 2-clique, 2-clan
• All of the 2-clans above are also 2-clubs, as are smaller sets such as
– Nikhil, Ananya, Sonia
– Kartik, Ravi, Meera, etc.
k-Clan
– Definition: a stricter form of k-clique where the diameter of the subgraph is at most k.
– Distance condition: the longest shortest path (diameter) within the group is ≤ k.
– Maximal: yes. No new node can be added if it increases the diameter beyond k.
• For detecting k-Cliques, k-Clans, and k-Clubs in large-scale social networks, exact
solutions are often impractical due to their NP-complete or NP-hard nature.
• Instead, researchers use heuristic, approximation, and optimization-based methods.
Summary
1. 𝑘-Clique: a maximal subgraph in which the largest shortest path distance between
any two nodes is less than or equal to 𝑘
2. 𝑘-Clan: follows the same definition as a 𝑘-clique
– Additional Constraint: nodes on the shortest paths should be part of the
subgraph (i.e., the diameter of the subgraph is at most 𝑘)
3. 𝑘-Club: a subgraph in which all shortest paths, measured within the subgraph, have
length less than or equal to 𝑘 (need not be maximal)
– All 𝑘-clans are 𝑘-cliques, but not vice versa.
Node-centric Community Detection: K-plex
Based on Node Degree
A subset of vertices 𝑆 in a graph is a 𝐾-plex if every vertex of the induced subgraph 𝐺[𝑆] has degree at
least |𝑆| − 𝐾
A measure based on the degree of the nodes
In the network 𝐺3 ,
The subset {3,4,5,6} is a 1-plex, i.e., a regular clique
The subset {1,3,4,5,6} is a 2-plex, but not a 1-plex
The subset {1,2,3,4,5,6} is a 3-plex, but not a 2-plex
[Figure: example network 𝐺3 on nodes 1-6]
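The degree condition is simple to test directly. The sketch below assumes a hypothetical edge list, chosen only so that it reproduces the three statements above; it is not a reconstruction of the 𝐺3 figure. The function name `is_k_plex` is likewise an assumption.

```python
def is_k_plex(adj, subset, k):
    """True if every vertex of subset has at least |subset| - k neighbours inside it."""
    subset = set(subset)
    return all(len(adj[v] & subset) >= len(subset) - k for v in subset)


# Hypothetical graph: {3, 4, 5, 6} is a clique, node 1 touches three of its
# members, and node 2 has three neighbours in total.
edges = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
         (1, 3), (1, 4), (1, 5), (2, 1), (2, 3), (2, 6)]
adj = {n: set() for n in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

for subset in ({3, 4, 5, 6}, {1, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 6}):
    smallest_k = next(k for k in range(1, len(subset) + 1) if is_k_plex(adj, subset, k))
    print(sorted(subset), "is a", f"{smallest_k}-plex")
```

Larger K tolerates more missing links per vertex, which is why the sets grow as K increases.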
K-Plex
• 𝒌-plex: a set of vertices 𝑉 in which we have $d_v \ge |V| - k$ for all $v \in V$,
where $d_v$ is the degree of $v$ in the subgraph induced by 𝑉
The Girvan-Newman Algorithm
Betweenness(1-3) = 1 × 5 = 5
Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 4 = 12
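Edge betweenness counts the shortest paths that pass over each edge; Girvan-Newman repeatedly removes the edge with the highest count. The sketch below computes the counts with a Brandes-style accumulation. The 7-node graph is an assumption (the original figure did not survive extraction), chosen so that edges 1-3, 3-7, and 6-7 reproduce the values quoted above.

```python
from collections import deque


def edge_betweenness(adj):
    """Number of shortest paths crossing each edge of an undirected graph."""
    score = {}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        credit = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + credit[w])
                edge = tuple(sorted((u, w)))
                score[edge] = score.get(edge, 0.0) + share
                credit[u] += share
    # Each unordered pair of endpoints was counted once from each side.
    return {e: c / 2.0 for e, c in score.items()}


# Hypothetical component: triangle {1, 2, 3} and triangle {4, 5, 6} joined through node 7.
edges = [(1, 2), (1, 3), (2, 3), (3, 7), (4, 5), (4, 6), (5, 6), (6, 7)]
adj = {n: set() for n in range(1, 8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

scores = edge_betweenness(adj)
for edge, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(edge, value)
```

One Girvan-Newman step would delete the top-scoring edge and recompute the scores on what remains.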
Girvan Newman method: example
Another example
[Figure: worked edge-betweenness example; surviving edge labels are 5 × 5 = 25 and 5 × 6 = 30 (twice)]
Girvan-Newman: Results
We need to resolve 2 questions
Community Detection: Modularity
• Node-centric methods discussed so far are not very useful when the network is large
• Modularity comes from the word ‘module’
• a network-centric metric to determine the quality of a community structure
• Based on the principle of comparison between
– the actual number of edges in a subgraph and its expected number of edges
– the expected number of edges is calculated by assuming a null model
• In the null model,
– each vertex is randomly connected to other vertices irrespective of the community
structure
– However, some of the structural properties are preserved
– One popular structural property is the degree distribution
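For reference, when the null model preserves node degrees, the expected number of edges between nodes i and j is the product of their degrees divided by twice the edge count, and the comparison described above takes the standard form below. The symbols are notation assumed here: A is the adjacency matrix, k_i the degree of node i, m the number of edges, c_i the community of node i, and δ equals 1 when its two arguments match and 0 otherwise.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```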
Concept of Modularity
𝑘-means
2 eigenvectors
Modularity Matrix
Modularity and Null Model
Making Modularity Optimization Faster
• Louvain Method
–A greedy modularity optimization method for community
detection
Example – Modularity Computation
Community Detection: Modularity Maximization
Modularity can be positive, negative, or zero
Positive modularity indicates the presence of community structure: more edges fall within
communities than the null model expects
Networks with high modularity have dense connections between the nodes
within modules but sparse connections between nodes in different
modules.
Different community assignments can lead to different values of modularity
an assignment that maximizes the modularity of the overall network often
finds the communities in the network
Fast Greedy Algorithm
Louvain Method
Extreme Cases of Modularity
• If all nodes belong to a single community, the modularity is zero (Q=0).
• This means that the partition captures no community structure beyond what a
random distribution of edges would give.
• If each node is its own community, modularity is negative (Q<0).
• This is because the partitioning is worse than random: no intra-
community edges exist, and all edges are treated as inter-community edges,
lowering modularity.
• Effective community detection should produce modularity values
significantly greater than zero, typically between 0.3 and 0.7 for good
partitions.
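The extreme cases can be verified numerically. This sketch computes modularity per community as (fraction of edges inside) minus (fraction of edge endpoints attached to the community) squared, which assumes the degree-preserving null model. The toy graph and the function name `modularity` are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    inside = {}      # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
nodes = range(1, 7)

q_single = modularity(edges, {n: 0 for n in nodes})         # one big community
q_singletons = modularity(edges, {n: n for n in nodes})     # every node alone
q_triangles = modularity(edges, {n: n <= 3 for n in nodes}) # the two triangles

print(q_single, q_singletons, q_triangles)
```

On this graph the two-triangle partition scores 5/14, inside the 0.3 to 0.7 range mentioned above, while the two extreme partitions score zero and below zero.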
Motivating Example - Louvain Method
• Example: Friends in a Social Network
• Consider a social network where individuals are connected based on
friendships:
    A -- B   D
    |       /
    C ---- E
• Initial Communities:
– Each person is initially in their own community: {A}, {B}, {C}, {D}, {E}.
• Optimization:
– The algorithm notices strong connections among A, B, C, and E, leading
to a merge into a single community.
– New communities: {A, B, C, E}, {D}.
• Modularity Calculation:
– Modularity is calculated to assess the improvement in the network
division.
• Final Result:
– After several iterations, the Louvain algorithm settles on a division with
higher modularity.
• Final communities: {A, B, C, E}, {D}.
Overview – Louvain Method
Community Detection: Louvain Method
Louvain Method
Start with a weighted network where all nodes are in their own
communities (i.e., n communities)
First Phase:
• For each node 𝑣𝑖 ,
–For all neighbors 𝑣𝑗 ∈ 𝑁(𝑣𝑖 ):
• compute the modularity gain if 𝑣𝑖 is removed from its community and placed in
the community of 𝑣𝑗 .
–Find the community with the maximum modularity gain
–If the maximum gain is positive, remove 𝑣𝑖 from its community, and
place it in that community
–If no positive gain, do not change communities
• Repeat until no node changes its community
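The first phase can be sketched as follows. To stay short, this version recomputes full modularity for every candidate move instead of using an incremental gain formula, so it is illustrative rather than efficient. The function names, the tolerance `1e-12`, and the toy graph are assumptions.

```python
def modularity(edges, community_of):
    """Weighted modularity of a partition; edges are (u, v, weight) triples."""
    total = sum(w for _, _, w in edges)
    inside, degree_sum = {}, {}
    for u, v, w in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0.0) + w
        degree_sum[cv] = degree_sum.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / total - (d / (2 * total)) ** 2
               for c, d in degree_sum.items())


def louvain_phase_one(nodes, edges):
    """Move nodes between neighbouring communities until no move raises modularity."""
    neighbors = {n: set() for n in nodes}
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    community_of = {n: n for n in nodes}  # every node starts in its own community
    moved = True
    while moved:
        moved = False
        for node in nodes:
            home = community_of[node]
            base = modularity(edges, community_of)
            best_gain, best_target = 0.0, home
            targets = {community_of[n] for n in neighbors[node]} - {home}
            for target in sorted(targets, key=str):
                community_of[node] = target  # tentatively move the node
                gain = modularity(edges, community_of) - base
                if gain > best_gain + 1e-12:
                    best_gain, best_target = gain, target
            community_of[node] = best_target  # keep the best move, or stay home
            if best_target != home:
                moved = True
    return community_of


# Hypothetical toy graph: two triangles joined by a bridge, all weights 1.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
         (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0), (3, 4, 1.0)]
result = louvain_phase_one(nodes, edges)
print(result)
```

The community labels in the result are arbitrary identifiers; only which nodes share a label matters.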
Important Points about Phase I
[Figure: surviving labels are Stanford (Basketball) and Stanford (Squash)]
CPM - Illustrated
Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
Clique Percolation Method: Example
Cliques of size 3:
{𝑣1 , 𝑣2 , 𝑣3 }, {𝑣3 , 𝑣4 , 𝑣5 },
{𝑣4 , 𝑣5 , 𝑣6 }, {𝑣4 , 𝑣5 , 𝑣7 },
{𝑣4 , 𝑣6 , 𝑣7 }, {𝑣5 , 𝑣6 , 𝑣7 },
{𝑣6 , 𝑣7 , 𝑣8 }, {𝑣8 , 𝑣9 , 𝑣10 }
Communities:
{𝑣1 , 𝑣2 , 𝑣3 },
{𝑣8 , 𝑣9 , 𝑣10 },
{𝑣3 , 𝑣4 , 𝑣5 , 𝑣6 , 𝑣7 , 𝑣8 }
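The percolation step in this example can be sketched in a few lines: two k-cliques are treated as adjacent when they share k-1 nodes, and each connected group of adjacent cliques becomes one community. The function name `percolate` and the use of plain integers for the node labels are assumptions; the clique list is the one given above.

```python
from itertools import combinations


def percolate(cliques, k):
    """Merge k-cliques that share k-1 nodes and return the resulting communities."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=lambda group: sorted(group))


# The eight cliques of size 3 from the example, with v1..v10 written as 1..10.
cliques = [(1, 2, 3), (3, 4, 5), (4, 5, 6), (4, 5, 7),
           (4, 6, 7), (5, 6, 7), (6, 7, 8), (8, 9, 10)]
communities = percolate(cliques, 3)
print(communities)
```

Nodes 3 and 8 end up in two communities each, which is how the method produces overlapping communities.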
Overview – Link Partition
Overlapping Community Detection: Link Partition
Uses links (edges) in the networks to detect communities
Two major approaches
Create a link network and apply a node partitioning algorithm or disjoint
community detection algorithm to find the community
Use similarity measures on the edges to find the communities directly
by creating the dendrogram
Jaccard coefficient might be a good choice for the similarity measure
For two edges $e_{ik}$ and $e_{jk}$ connected to node $k$,
$$\mathrm{Sim}(e_{ik}, e_{jk}) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$$
where $N_i$ and $N_j$ are the neighbours of the nodes $i$ and $j$
Link Partition Method - Illustrations
Example Network
• Key Idea
• Given a seed node, the algorithm expands the community iteratively by
greedily optimizing local modularity.
• It does not require the full network—only local neighborhood
information.
• This method is ideal for applications like personalized recommendations
on Instagram, Facebook, or LinkedIn, where a user’s network is
incrementally explored.
Algorithm
• Start with a seed node (e.g., an Instagram user).
• Initialize community as just the seed node.
• Expand the community:
–Consider neighboring nodes (directly connected to the
current community).
–Add the best node (the one that maximizes the local
modularity gain).
• Repeat until adding another node does not improve local
modularity.
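The expansion loop can be sketched as below. The scoring function is an assumption: it uses one common boundary-based form of local modularity, the fraction of edges touching the community's boundary nodes that stay inside the community. Returning 1.0 when there is no boundary is also a convention chosen here, as are the function names and the toy graph.

```python
def local_modularity(adj, community):
    """Fraction of edges incident to boundary nodes that stay inside the community."""
    boundary = {v for v in community if adj[v] - community}
    if not boundary:
        return 1.0  # assumption: a community with no outside links scores perfectly
    inside = outside = 0
    seen = set()
    for v in boundary:
        for w in adj[v]:
            edge = frozenset((v, w))
            if edge in seen:
                continue
            seen.add(edge)
            if w in community:
                inside += 1
            else:
                outside += 1
    return inside / (inside + outside)


def expand_from_seed(adj, seed):
    """Greedily add the neighbour that most improves the local score; stop when none does."""
    community = {seed}
    score = local_modularity(adj, community)
    while True:
        frontier = set().union(*(adj[v] for v in community)) - community
        best, best_score = None, score
        for candidate in sorted(frontier):
            candidate_score = local_modularity(adj, community | {candidate})
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            return community
        community.add(best)
        score = best_score


# Hypothetical toy graph: two triangles joined by a single bridge edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
found = expand_from_seed(adj, seed=1)
print(sorted(found))
```

Only the neighbourhoods of nodes already in the community are read, which matches the point that the full network is not required.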
Local Modularity
• Suppose that in a graph G, we have perfect
knowledge of the connectivity of some set of
vertices, i.e., the known portion of the graph,
which we denote C.
• This necessarily implies the existence of a set of
vertices U about which we know only their
adjacencies to C.
• Further, let us assume that the only way we may
gain additional knowledge about G is by visiting
some neighboring vertex 𝑉𝑖 ∈ U, which yields a list
of its adjacencies.
• As a result, 𝑉𝑖 becomes a member of C, and
additional unknown vertices may be added to U.
Local Community Detection: Local Modularity
Let us say we have complete knowledge about a subgraph (community) C of an
unweighted and undirected network G
We denote the known adjacency matrix of G as
$$A_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in C \lor j \in C \\ 0 & \text{otherwise} \end{cases}$$
Define the quality of the community as the fraction of known connections that lie
completely inside the community:
$$\frac{\sum_{ij} A_{ij}\,\xi(i,j)}{\sum_{ij} A_{ij}} = \frac{1}{2m^{*}} \sum_{ij} A_{ij}\,\xi(i,j)$$
• $m^{*}$: total number of edges in the partial adjacency matrix
• $\xi(i,j)$: 1 if 𝑖 and 𝑗 both belong to the same community; 0 otherwise
– This quantity will be large when C has many internal connections, and few
connections to the unknown portion of the graph.
– This measure also has the property that when |C| ≫ |U|, the partition will almost
always appear to be good.
Local Community Detection: Local Modularity
W is the set of vertices in C that have at least
one neighbor in U
Define the boundary adjacency matrix as
$$\hat{A}_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in W \lor j \in W \\ 0 & \text{otherwise} \end{cases}$$