SMA Module3

The document discusses community structure in social networks, including definitions, types of communities, and detection methods. It highlights the importance of community detection for applications such as improving recommendation systems and controlling information diffusion. The challenges of community detection, including varying definitions and algorithmic discrepancies, are also addressed, emphasizing the complexity of accurately identifying communities within networks.

Uploaded by

Mukund Tiwari
Copyright
© © All Rights Reserved
Social Media Analytics
Module 3
Community Structure in Networks
Overview - Community Structure in Networks
• Definition of Communities in social networks, Applications of Community Detection, Types of Communities.
• Community Detection Methods:
–Disjoint Community Detection: Node-Centric Community Detection, Modularity and Community Detection (Louvain Algorithm, Girvan-Newman);
–Overlapping Community Detection: Clique Percolation, Link Partition;
–Local Community Detection
Homophily in the Society

Tendency of individuals to associate and bond with similar others
Similar nodes tend to attract each other, and dissimilar nodes tend to get away from each other
Causes formation of a community structure in a social network
Homophily occurs against a number of categories:
• Age
• Class: education, occupation, and social class
• Location
• Interests
• Organizational role, etc.
Social Community

[real-world] community
A group of individuals with common economic, social, or political
interests or characteristics, often living in relative proximity.
Communities in a Network
Identifying communities gives an insight about
the inherent network structure
Community detection is an ill-defined problem
what we mean by a ‘community’ is often not concrete
it is often hard to reliably define a ground-truth annotation for communities
there is no standard measure to assess the performance
Diverse approaches to the problem depending
on how we define a community structure in the
network
Community Detection in Networks: Applications
Performance enhancement of the similarity-based link prediction algorithms
Improving recommendation quality in Recommender systems by separating like-minded people
Controlling information diffusion within a network by identifying community memberships
Designing better marketing strategy by identifying position of the target group within
the network
Restricting epidemic propagation by suitably isolating and immunizing the vulnerable
population
Better anomaly detection in nodes, especially in evolving networks
Studying evolution of communities
Applications in criminology and detecting terrorist groups
Social Media Communities
• A basic community comes to existence when likeminded users on social
media form a link and start interacting with each other.
• Any formation of a community requires
–1) a set of at least two nodes sharing some interest and
–2) interactions with respect to that interest.
• Two types of groups in social media
–Explicit Groups: formed by user subscriptions
–Implicit Groups: implicitly formed by social interactions
• We may see group, cluster, cohesive subgroup, or module in different
contexts instead of “community”
Explicit Communities
(Clearly defined groups with membership & participation)
• These are communities where users explicitly join or participate, often with visible
membership lists.
• Facebook Groups – Groups formed around shared interests, such as "Machine
Learning Enthusiasts" or "Indian Cooking Lovers."
• Reddit Subreddits – Communities like r/India or r/AskScience, where users subscribe
and engage in discussions.
• LinkedIn Groups – Professional groups like "Data Science & AI Professionals" where
members interact on specific topics.
• Discord Servers – Gaming, tech, and hobby communities where users join dedicated
servers and participate in channels.
• WhatsApp/Telegram Groups – Private or public groups formed for specific
discussions, events, or topics.
Implicit Communities
(Inferred from interactions rather than explicit membership)
• These communities emerge naturally from user behavior, connections, or shared
activities rather than formal membership.
• Twitter Interaction Networks – Users who frequently like, reply, or retweet each other
form implicit communities around political figures, sports, or tech trends.
• Instagram Engagement Clusters – Users who consistently like, comment, or follow
similar accounts form hidden communities, like a fan base around celebrities or
influencers.
• YouTube Watch Patterns – Viewers who frequently watch and engage with similar types
of videos (e.g., tech reviews, fitness content) form implicit communities without direct
interaction.
• GitHub Developer Networks – Developers contributing to the same open-source
projects or following each other’s repositories create an implicit programming
community.
Facebook
• Frequent Interactors – People who consistently like, comment, or share posts
from the same set of pages or friends (e.g., fans of a specific political party,
sports team, or celebrity).
• Shared Interest Networks – Users who follow similar Facebook Pages (e.g.,
multiple pages related to stock trading, AI, or Bollywood) but are not in the
same group.
• Event-Based Communities – Users who RSVP to similar events (e.g., tech
conferences, music festivals) and engage with posts related to them.
• Ad Targeting Clusters – Facebook’s recommendation system identifies people
with similar browsing and engagement patterns, creating hidden communities
that receive similar ads.
LinkedIn

• Skill-Based Networks – Users who endorse each other for similar skills (e.g., "Python
Developer" or "Marketing Strategy") form hidden professional clusters.
• Frequent Engagers – People who regularly like or comment on posts related to
specific industries (e.g., AI advancements, HR trends) but are not in the same
LinkedIn group.
• Recruiter-Job Seeker Clusters – Job seekers applying for similar roles get
recommended similar recruiters, forming an unseen community of professionals
vying for the same job opportunities.
• Alumni Networks Without Groups – People who studied at the same university and
interact with related posts, even without joining an official alumni group.
Types of Communities: Disjoint Communities

Also referred to as flat communities
Each node in the network can belong to at most one community
Differs from disconnected components: nodes in two different communities can still have connecting edges, referred to as bridges
Example: Full-time employees of an organization
Types of Communities: Overlapping Communities

Members can belong to more than one community at a time
Communities can even share edges
Realistic and generic community structure
Harder to find than flat communities
Example: Various groups in social networks
Types of Communities: Hierarchical Communities

Outcome of merging two or more flat or overlapping communities in a network
Can be linked to other hierarchical, overlapping, or flat communities
Example: various city-level communities merged to form a state-level community
Examples
• Facebook
– Global → National → Local (e.g., "AI Enthusiasts" → "AI Enthusiasts India" → "AI Enthusiasts
Bangalore")
– Interest-Based Groups (e.g., "Fitness & Nutrition" → "Bodybuilding," "Yoga," "Keto Diet")
– Brand Engagement Tiers (e.g., Apple → Apple India → iPhone Users, Mac Users)
• LinkedIn
– Corporate Networks (e.g., Company → Departments → Alumni Groups)
– Industry-Specific Subgroups (e.g., "Data Science Professionals" → "AI in Finance," "AI in
Healthcare")
– Professional Certification Networks (e.g., "Certified Project Managers (PMP)" → "Agile PMs,"
"Scrum Practitioners")
• Twitter (X)
– Fanbase Hierarchy (e.g., "Marvel Fans" → "Spider-Man Fans," "Avengers Fans")
– Hashtag-Based Movements (e.g., #MeToo → #MeTooIndia → #MeTooMedia)
– Influencer-Led Sub-Communities (e.g., Tech Twitter → AI Enthusiasts, Cybersecurity Experts)
Local Communities
Shows a community structure from a local perspective without focusing on the global structure
Local communities in social networks are smaller, tightly-knit groups where users interact frequently.
◦ Facebook: Neighborhood groups (e.g., "Mumbai Foodies" or "NYC Runners"); alumni groups (e.g., "IIT Delhi 2010 Batch"); local business communities (e.g., Chennai Startup Founders)
◦ LinkedIn: Industry-specific professional groups (e.g., "Data Science India"); company employee groups (e.g., "Google India Employees"); university alumni
◦ Instagram: City-based influencer networks (e.g., "Bangalore Photographers"); niche hobby groups using common hashtags (e.g., #HyderabadCycling); small business communities promoting local products
◦ Twitter (X): Regional activist communities (e.g., local environmental groups); followers of local politicians, celebrities, or sports teams; local event-based discussions (#DelhiComicCon)
What is Community Analysis?

• Community detection
–Discovering implicit communities

• Community evolution
–Studying temporal evolution of communities

• Community evaluation
–Evaluating Detected Communities
What is community detection?
• Community detection in social networks is the process of identifying groups of nodes
(users, entities) that are more densely connected to each other than to the rest of the
network.
• It helps in understanding the structure of social interactions, influence, and
relationships.
• The process of finding clusters of nodes (‘‘communities’’)
– With Strong internal connections and
– Weak connections between different communities
• Ideal decomposition of a large graph
– Completely disjoint communities
– There are no interactions between different communities.
• In practice,
– find community partitions that are maximally decoupled.
Challenges to Community Detection in Social Networks
• No Universal Definition of a Community
– Some algorithms define communities based on dense connections (e.g.,
modularity-based methods), while others focus on structural roles (e.g., core-
periphery models).
– Real-world communities may be based on common interests, social interactions,
or functional relationships, leading to different interpretations of community
structures.
• Different Algorithms Yield Different Communities
– Louvain may detect broad clusters,
– Infomap may find smaller, more information-theoretic groups,
– Label propagation may give unstable results due to randomness.
• The choice of algorithm introduces a layer of subjectivity in defining communities.
Challenges to Community Detection in Social Networks
• Parameter Sensitivity
– Many community detection methods require parameters (e.g., resolution in
modularity, number of clusters in spectral clustering).
– Different parameter settings can lead to entirely different community structures,
making the results subjective to user choices.
• Ground Truth Communities May Not Be Well-Defined
– In many cases, the "real" communities in a social network are unknown or
ambiguous.
– On Facebook, community detection might group users based on friendship
networks, but the "ground truth" communities (e.g., real-life friend circles,
workplace colleagues, or shared interest groups) may not always align with the
detected structure.
Challenges to Community Detection in Social Networks
• Overlapping vs. Non-Overlapping Communities
– Some researchers argue that real-world communities are overlapping (e.g., people
belong to multiple groups), while many traditional algorithms force hard partitions.
– Deciding whether a community should be strictly separate or overlapping is inherently
subjective.
• Human Interpretation Bias
– Analysts may interpret the results differently based on their prior knowledge, leading
to confirmation bias.
– For example, a detected community on Twitter might be labeled as "political
activists," but another analyst may see them as "social justice groups."
• Dynamic Nature of Communities
– Communities evolve over time, and different snapshots of a network may show
different structures.
– What is considered a "valid" community today may not be relevant tomorrow, adding
another layer of subjectivity.
Challenges to Community Detection in Social Networks

• Use multiple algorithms and compare results to identify stable patterns.
• Incorporate domain knowledge to validate detected
communities.
• Employ objective evaluation metrics, such as modularity,
conductance, or normalized mutual information (NMI).
• Allow for interpretability by using explainable AI techniques to
understand why a method detects certain communities.
Community Detection Methods: A Taxonomy
Community Detection Algorithms
• Group users based on group attributes
• Group users based on member attributes
Member-Based Community Detection
• Look at node characteristics; and
• Identify nodes with similar characteristics and consider them a community
Node Characteristics
A. Degree
–Nodes with same (or similar) degrees are in one community
–Example: cliques
B. Reachability
–Nodes that are close (small shortest paths) are in one community
–Example: 𝑘-cliques, 𝑘-clubs, and 𝑘-clans
C. Similarity
–Similar nodes are in the same community
Node-centric Community Detection
• Use the property of the nodes to find community structure in the
network
• Exploits node-centric features in a number of ways:
–Complete Mutuality
• Cliques
–Reachability of Members
• K-cliques
• K-clan
• K-club
–Node Degree
• K-plex
• K-core
Node-centric Community Detection: Finding Cliques
A subgraph of a graph is a clique if every pair of vertices in the subgraph is adjacent
Has diameter of 1
Can be considered as communities
A couple of problems with this approach
• Finding cliques in a network is NP-complete
• The constraints on cliques are too strict a requirement
• Large cliques are usually not present in social networks
Node Centric
• Most common subgraph searched for:
• Clique: a complete subgraph, i.e., one in which all nodes are adjacent to each other
Find communities by searching for
1. The maximum clique: the one with the largest number of vertices, or
2. All maximal cliques: cliques that are not subgraphs of a larger clique; i.e., cannot be further expanded
Both problems are NP-hard
To overcome this, we can
I. Brute Force
II. Relax cliques
III. Use cliques as the core for larger communities


Brute-Force Method

• Can find all the maximal cliques in the graph
• For each vertex 𝑣𝑥, we find the maximal clique that contains node 𝑣𝑥
Impractical for large networks:
• For a complete graph of only 100 nodes, the algorithm will generate at least $2^{99} - 1$ different cliques starting from any node in the graph
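As a rough illustration of enumerating all maximal cliques, here is a minimal sketch of the basic Bron-Kerbosch recursion. It is swapped in for the stack-based brute force described on the slide, and it is still exponential in the worst case. The toy graph `EDGES` and the helper names `build_adj` and `maximal_cliques` are assumptions made for this example.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: nodes already explored (used to avoid reporting non-maximal sets)
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy graph: two triangles joined by the edge (3, 4).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
cliques = maximal_cliques(build_adj(EDGES))
print(cliques)
```

On a complete graph every subset is a clique, which is why the running time explodes; the recursion above only avoids reporting the non-maximal ones.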
Enhancing the Brute-Force Performance
[Systematic] Pruning can help:
• When searching for cliques of size 𝑘 or larger
• If such a clique exists, each of its nodes must have a degree equal to or more than 𝑘 − 1
• We can first prune all nodes (and edges connected to them) with degrees less than 𝑘 − 1
–After this removal, more nodes may have degrees less than 𝑘 − 1
–Prune them recursively
• For large 𝑘, many nodes are pruned as social media networks follow a
power-law degree distribution
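The pruning step above can be sketched as follows. This is a minimal sketch, not the slide's own code: the function name `prune_for_clique` and the toy graph are assumptions chosen to show the recursive effect (removing low-degree nodes lowers the degree of their neighbours).

```python
def prune_for_clique(adj, k):
    """Repeatedly drop nodes whose degree is below k - 1.

    Any clique of size >= k survives, because each of its members keeps
    at least k - 1 neighbours inside the clique.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    low = [v for v in adj if len(adj[v]) < k - 1]
    while low:
        v = low.pop()
        if v not in adj:          # already removed earlier
            continue
        for w in adj.pop(v):
            adj[w].discard(v)
            if len(adj[w]) < k - 1:
                low.append(w)     # removal may push a neighbour below k - 1
    return adj


# Hypothetical graph: a 4-clique {1, 2, 3, 4} with a sparse tail 4-5, 5-6, 5-7, 6-7.
toy = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5},
    5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6},
}
survivors = prune_for_clique(toy, k=4)
print(sorted(survivors))
```

Here nodes 6 and 7 go first (degree 2), which drops node 5 to degree 1, so it is pruned as well; only the 4-clique is left for the expensive clique search.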
Maximum Clique: Pruning…

Example: to find a clique of size ≥ 4, remove all nodes with degree ≤ (4 − 1) − 1 = 2
– Remove nodes 2 and 9
– Remove nodes 1 and 3
– Remove node 4
Even with pruning, cliques are less desirable
– Cliques are rare
– A clique of 1000 nodes has 999 × 1000 / 2 = 499,500 edges
– A single edge removal destroys the clique
– That is about 0.0002% of the edges!
Node-Centric Community Detection: K-Cliques
The maximal subset of vertices of the network such that, for any two nodes belonging to this subset, the shortest distance between them is less than or equal to K
A 1-clique is a normal clique – {1,2,6} {4,5,7}
The nodes {1,2,3,5,6} and {2,3,4,5,6} each form a 2-clique in 𝐺1
2-cliques are known as friend of a friend in social network analysis
[Figure: example graph 𝑮𝟏 with nodes 1–7]
Issue:
A node not present in a K-clique can contribute to the formation of a shortest path in it!! (see vertex 1)
• This happens because:
– Bridging Nodes: A node outside a k-clique might serve as a bridge between different k-cliques, reducing the shortest path between nodes in different communities.
– Network Connectivity: Even if a node is not fully connected to all members of a k-clique, it can still provide a shortcut.
– Periphery Influence: Peripheral nodes often help connect dense subgraphs
• Blue (k-Clique 1) and Green (k-Clique 2) represent two fully connected subgroups.
• Red (Node 9) is a bridging node that is not part of either k-clique but connects them.
• This node reduces the shortest path distance between the two k-cliques, acting as a shortcut.
• Here is a LinkedIn network visualization where four executives (Ravi, Ananya, Meera, Karthik) form a 2-clique (connected in blue)
• Priya (HR Manager in red) is not part of the clique but contributes to shortest paths by connecting Ananya and Karthik.
• This setup illustrates how a node outside the clique can still be critical in shortest path formation between members.
Node-Centric Community Detection: K-Clan
A stricter version of K-clique
Only the nodes present in the set under inspection are used to create the subgraph in which the distance between any two nodes should be less than or equal to K
In the network 𝐺1, the nodes {1,2,3,5,6} form a 2-clique, but not a 2-clan
The nodes {2,3,4,5,6} form a 2-clan in the network 𝐺2
[Figure: example graph 𝐺2 with nodes 1–6; annotated sets {1,2,3,6} and {1,2,5,6}]
Maximality condition of K-clique also persists in K-clan
K -Cliques

• k-Clique:
• Definition: A k-clique is a subgraph in which the shortest distance between any two
nodes is at most k.
• This means that within the subgraph, any two nodes can be reached from one
another in k steps or fewer.
• Characteristics:
– The subgraph is not necessarily fully connected; some nodes may not have direct
links but are connected through intermediate nodes.
– The focus is on the maximum distance between nodes, allowing for a more relaxed
structure compared to traditional cliques.
k-Clan
• Definition: A k-clan is a k-clique with the additional constraint that the diameter of the induced
subgraph is at most k.
• The diameter is the greatest distance between any pair of nodes within the subgraph.
• Characteristics:
– The subgraph induced by the k-clan is more tightly connected, ensuring that all nodes are
within k steps of each other within the subgraph itself.
– This stricter condition means that k-clans are more cohesive subsets compared to k-cliques.
• Key Difference:
– While both k-cliques and k-clans consider the distance between nodes, a k-clique allows for
nodes to be within k steps in the context of the entire graph, potentially relying on external
nodes to maintain these distances.
– In contrast, a k-clan requires that the nodes be within k steps of each other within the
subgraph itself, ensuring a more cohesive and self-contained structure.
– Understanding these distinctions is crucial for analyzing the cohesiveness and communication
efficiency within subgroups of social networks.
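The key difference can be checked mechanically by measuring distances twice: once in the whole graph (the k-clique test) and once inside the induced subgraph (the k-clan / k-club test). The sketch below is a minimal illustration under assumptions: the edge list `G2_EDGES` is a reconstruction (a 5-cycle 2-3-4-5-6 with node 1 attached to 2 and 6) that is consistent with the sets quoted on the slides but is not confirmed to be the slide's figure, and the helper names are hypothetical. Only the distance condition is tested, not maximality.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source, allowed):
    """Shortest-path lengths from source, walking only through `allowed` nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in allowed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def all_pairs_within(adj, nodes, k, allowed):
    """True if every pair in `nodes` is at most k hops apart via `allowed` nodes."""
    for u, v in combinations(sorted(nodes), 2):
        if bfs_distances(adj, u, allowed).get(v, float("inf")) > k:
            return False
    return True


# Assumed reconstruction of the example graph (not taken from the figure itself).
G2_EDGES = [(1, 2), (1, 6), (2, 3), (3, 4), (4, 5), (5, 6), (6, 2)]
adj = build_adj(G2_EDGES)
everyone = set(adj)

s = {1, 2, 3, 5, 6}
t = {2, 3, 4, 5, 6}
s_in_graph = all_pairs_within(adj, s, 2, everyone)  # distances may use outside nodes
s_induced = all_pairs_within(adj, s, 2, s)          # distances inside the set only
t_in_graph = all_pairs_within(adj, t, 2, everyone)
t_induced = all_pairs_within(adj, t, 2, t)
print(s_in_graph, s_induced, t_in_graph, t_induced)
```

In this reconstruction, nodes 3 and 5 are two hops apart only through node 4, which lies outside {1,2,3,5,6}; inside the set their distance is 3, so the set passes the k-clique test but fails the k-clan test.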
Node-centric Community Detection: K-club
K-club is a K-clan minus the maximality condition
{2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {5, 6, 2}, and {6, 2, 3} in 𝐺2 are all 2-clubs
Every K-clan is a K-club as well as a K-clique
[Figure: example graph 𝐺2 with nodes 1–6]
Challenges:
These algorithms are still computationally expensive for large K
Deciding appropriate K is difficult
K – Cliques , Clans, Clubs

• Ananya, Nikhil, Sonia, Priya, Raj - 2-clique (Kartik is not in the clique but lies on a shortest path)
• Ananya, Nikhil, Sonia, Raj - 2-clique, 2-clan
• Ananya, Nikhil, Sonia, Priya - 2-clique, 2-clan
• Ananya, Kartik, Sonia, Priya, Raj - 2-clique, 2-clan
• Ravi, Kartik, Meera, Amit - 2-clique, 2-clan
• All the above are also 2-clubs, and so are
– Nikhil, Ananya, Sonia
– Kartik, Ravi, Meera, etc.

• Find All (Exercise)


Community Type | Definition | Key Constraints | Maximal?
k-Clique | A relaxed clique where each node is at most k hops away from every other node in the subgraph. | Every node in the group must be reachable from every other node in at most k steps. | Yes – No new nodes can be added without breaking the k-hop rule.
k-Clan | A stricter form of k-Clique where the diameter of the subgraph is at most k. | The longest shortest path (diameter) within the group is ≤ k. | Yes – No new nodes can be added if it increases the diameter beyond k.
k-Club | A subgraph where every node is at most k steps from every other node. | Must be connected and have a shortest path distance of ≤ k between all members. | No – A k-Club is not necessarily maximal, meaning a larger subset may exist.


• For detecting k-Cliques, k-Clans, and k-Clubs in large-scale social networks, exact
solutions are often impractical due to their NP-complete or NP-hard nature.
• Instead, researchers use heuristic, approximation, and optimization-based methods.
Summary
1. 𝑘-Clique: a maximal subgraph in which the largest shortest path distance between
any nodes is less than or equal to 𝑘
2. 𝑘-Clan: follows the same definition as a 𝑘-clique
– Additional Constraint: nodes on the shortest paths should be part of the
subgraph (i.e., diameter)
3. 𝑘-Club: a 𝑘-clique where for all shortest paths within the subgraph the distance is
equal or less than 𝑘. (need not be maximal)
– All 𝑘-clans are 𝑘-cliques, but not vice versa.
Node-centric Community Detection: K-plex
Based on Node Degree
A subset of vertices 𝑆 in a graph is a 𝐾-plex if every vertex of the induced subgraph 𝐺[𝑆] has degree at least |𝑆| − 𝐾
A measure based on the degree of the nodes
[Figure: example graph 𝐺3 with nodes 1–6]
In the network 𝐺3,
The subset {3,4,5,6} is a 1-plex, i.e., a regular clique
The subset {1,3,4,5,6} is a 2-plex, but not a 1-plex
The subset {1,2,3,4,5,6} is a 3-plex, but not a 2-plex
K-Plex
• 𝒌-plex: a set of vertices 𝑉 in which we have
$d_v \geq |V| - k, \quad \forall v \in V$
• 𝑑𝑣 is the degree of 𝑣 in the induced subgraph
– Number of nodes from 𝑉 that are connected to 𝑣
• Clique of size 𝑘 is a 1-plex
• Finding the maximum 𝑘-plex: NP-hard
– In practice, relatively easier due to smaller search space.
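Checking the k-plex condition for a given vertex set is straightforward, even though finding the largest such set is hard. The sketch below is a minimal illustration: `is_k_plex` is a hypothetical helper, and the graph `g3` is an assumed stand-in chosen to agree with the three statements made about 𝐺3 on the earlier slide, not the figure itself.

```python
def is_k_plex(adj, subset, k):
    """True if every node in `subset` has at least |subset| - k neighbours inside it."""
    subset = set(subset)
    return all(len(adj[v] & subset) >= len(subset) - k for v in subset)


# Assumed graph: a 4-clique on {3, 4, 5, 6}; node 1 is adjacent to 2, 3, 4, 6;
# node 2 is adjacent to 1, 3, 6.
g3 = {
    1: {2, 3, 4, 6},
    2: {1, 3, 6},
    3: {1, 2, 4, 5, 6},
    4: {1, 3, 5, 6},
    5: {3, 4, 6},
    6: {1, 2, 3, 4, 5},
}

checks = {
    "clique_is_1_plex": is_k_plex(g3, {3, 4, 5, 6}, 1),
    "five_is_2_plex": is_k_plex(g3, {1, 3, 4, 5, 6}, 2),
    "five_is_1_plex": is_k_plex(g3, {1, 3, 4, 5, 6}, 1),
    "all_is_3_plex": is_k_plex(g3, {1, 2, 3, 4, 5, 6}, 3),
    "all_is_2_plex": is_k_plex(g3, {1, 2, 3, 4, 5, 6}, 2),
}
print(checks)
```

The check runs in time proportional to the total degree of the subset, which is the "computationally easy" part; the NP-hard part is searching over subsets.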
Node-centric Community Detection: K-core
Another degree-centric measure
A subgraph 𝐺′ of a graph 𝐺 in which each node has degree greater than or equal to 𝐾
The (K+1)-core subgraph can be created from the current K-core subgraph by recursively removing nodes of degree K or less.
This should be repeated until there is no node of degree K or less in the current subgraph.
Issues:
Checking whether a given subgraph is a K-core or K-plex is computationally easy
The maximal K-core can be found efficiently by degree pruning, but finding the maximum K-plex is NP-hard
K-Core
[Figure panels: K-Core, Maximal K-Core]
• In degree pruning, we start by listing vertices with degrees less than k.
• We then iteratively remove a vertex with a degree less than k, together with all its incident edges, until no such vertices remain.
• The resulting subgraph is then the K-Core of the original graph.
• A K-Truss is a subgraph where
every edge belongs to at least
(K-2) triangles.
Node-centric Community Detection: K-Shell
• 𝒌-shell: nodes that are part of the 𝑘-core, but are
not part of the (𝑘 + 1)-core.
• To apply k-shell decomposition to a network of n nodes:
• Put all nodes with degree 1 in bucket1 and disconnect them. If any remaining node now has degree 1 or less, add it to bucket1 as well, and repeat until no such node is left.
• Repeat these steps for degree 2, 3, and so on, putting the removed nodes in bucket2, bucket3, etc.
– bucket1 = [3, 7, 6, 5]
– bucket2 = [1, 2, 4].
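The bucket procedure above can be sketched as follows. This is a minimal, unoptimised sketch (it rescans all nodes on every pass rather than using the linear-time bucket queue). The function name `k_shells` and the edge list are assumptions: the graph is a hypothetical one (a triangle 1-2-4 with a few pendant nodes) chosen so that it reproduces the two buckets listed on the slide.

```python
def k_shells(edges):
    """Return {k: [nodes in the k-shell]} by repeated degree pruning."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    shells = {}
    k = 0
    while adj:
        bucket = []
        while True:
            # Nodes whose current degree is at most k belong to shell k.
            low = [v for v in adj if len(adj[v]) <= k]
            if not low:
                break
            for v in low:
                for w in adj.pop(v):
                    if w in adj:
                        adj[w].discard(v)
                bucket.append(v)
        if bucket:
            shells[k] = bucket
        k += 1
    return shells


# Hypothetical graph: triangle 1-2-4, pendants 3 (on 1) and 7 (on 2), chain 4-5-6.
EDGES = [(1, 2), (2, 4), (1, 4), (1, 3), (2, 7), (4, 5), (5, 6)]
shells = k_shells(EDGES)
print(shells)
```

Node 5 starts with degree 2 but lands in bucket1, because removing node 6 leaves it with a single neighbour; this is the "check again" step in the description.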
Which Method is Best?
• k-Core and k-Shell: Due to their linear time complexity, these methods are well-suited
for large-scale networks.
• k-Truss: While more computationally demanding, it's feasible for moderately sized
graphs but may be challenging for extremely large networks.
• k-Plex: Given its NP-hard nature, exact detection in large graphs is often impractical;
heuristic or approximate methods are typically employed.
• Dynamic Graphs: Social networks are often dynamic, with nodes and edges
continuously being added or removed.
• Maintaining up-to-date decompositions in such evolving graphs is complex and may
require frequent recomputation, which is computationally expensive.
• Memory Consumption: Processing large graphs demands substantial memory
resources.
• Algorithms may need to be optimized for external memory or distributed computing
environments to handle datasets that exceed the capacity of a single machine.
Which Method is Best?

• k-Clique detection is related to the classical Clique Problem, which is NP-complete.


• k-Clan and k-Club introduce additional constraints on the shortest path distances
within the community, making them even harder (NP-hard).
• Exact algorithms are impractical for large social networks, so heuristic and
approximation algorithms are used in real-world applications (e.g., Facebook and
LinkedIn).
• Small graphs (≤ 10,000 nodes) → Bron-Kerbosch, ILP, or Greedy Expansion.
• Large graphs (millions of nodes, e.g., Facebook, LinkedIn) → Partitioning + Parallel
Heuristics.
• For highly connected communities (e.g., scientific collaborations, corporate
networks) → Spectral Clustering or Modularity Optimization.
Community Detection vs. Clustering
• While all previous methods offer valuable insights into network topology, they are limited in capturing complex community structures and in interpretability
• Most clustering algorithms can be used for community detection
• In general, the difference is in having link information
–Clustering algorithms work on a distance or similarity matrix
• k-means
–Network data tends to be “discrete”, leading to algorithms that use graph properties directly
• k-clique, quasi-clique, vertex-betweenness, edge-betweenness etc.
–Graph clustering algorithms are more appropriate than traditional clustering algorithms
Divisive Hierarchical Clustering
• Divisive clustering
–Partition nodes into several sets
–Each set is further divided into smaller ones
–Network-centric partition can be applied for the partition
• Girvan-Newman Example: recursively remove the “weakest” links within a “community”
–Find the weakest link (edge)
–Remove the edge and update the corresponding strength of each edge
• Recursively apply the above two steps until the network is decomposed into a desired number of connected components.
• Each component forms a community
Edge Betweenness

• To determine the weakest links (vital links), the algorithm uses a measure called “edge betweenness”
• Edge betweenness is the number of shortest paths that pass along the edge
• "For an edge e in a graph, edge betweenness of e is defined as the number of shortest paths between all node pairs (vi, vj) in the graph such that the shortest path between vi and vj passes through e".
• If there are k different shortest paths between vi and vj, each of them contributes 1/k to the count.
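The definition above can be written compactly as a formula. The symbols $\sigma_{ij}$ (number of shortest paths between $v_i$ and $v_j$) and $\sigma_{ij}(e)$ (number of those paths that pass through $e$) are notation introduced here for illustration; they do not appear in the slides.

```latex
EB(e) \;=\; \sum_{i < j} \frac{\sigma_{ij}(e)}{\sigma_{ij}}
```

When a pair has a single shortest path, it contributes 1 to every edge on that path; when it has $k$ equally short paths, each path carries a weight of $1/k$.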
Method 1: Strength of Weak Ties
• Edge betweenness: Number of shortest paths passing over the edge

[Figure: edge strengths (call volume) in a real network, and edge betweenness in a real network; example edges annotated b=16 and b=7.5]
Edge Betweenness: Example

• Edge betweenness of e(1, 2) is 6/2 + 1 = 4, as all the shortest paths from 2 to {4, 5, 6, 7, 8, 9} have to pass either e(1, 2) or e(2, 3), and e(1, 2) is the shortest path between 1 and 2
Edge Betweenness
• The edge betweenness centrality (EBC) can be defined as the number of shortest paths that pass through an edge in a network.
• Each and every edge is given an EBC score based on the shortest paths among all the nodes in the graph.
The Girvan-Newman Algorithm

1. Calculate edge betweenness for all edges in the graph.
2. Remove the edge with the highest betweenness.
3. Recalculate betweenness for all edges affected by the edge removal.
4. Repeat steps 2 and 3 until all edges are removed.
The Girvan-Newman Algorithm
1. Calculate the edge betweenness (EB) score for all edges in the graph. We can store it in a distance matrix as usual.
2. Identify the edge with the highest EB score and remove it.
3. If several edges share the highest EB score, all of them can be removed in one step.
4. If this causes the graph to separate into disconnected subgraphs, these form first-level communities.
5. Recompute the EB score for all the remaining edges.
6. Repeat from step 2.
7. Continue until the graph is partitioned into as many communities as desired or the highest EB score is below a predefined threshold value.
Edge Betweenness Clustering: Example
Initial betweenness value

The first edge that needs to be removed is e(4, 5) (or e(4, 6))
By removing e(4, 5), we compute the edge betweenness once again; this time, e(4, 6) has the highest betweenness value: 20.
This is because all shortest paths between nodes {1,2,3,4} and nodes {5,6,7,8,9} must pass e(4, 6); therefore, it has betweenness 4 × 5 = 20.
Implementation
• The major cost for the above algorithm is finding the EB score for every
edge in the graph.
• We need to find the number of shortest paths from all nodes.
• Girvan and Newman proposed a faster algorithm based on use of
Breadth First Search (BFS) algorithm.
• For each node N in the graph
–Perform breadth-first search of graph starting at node N
–Determine the number of shortest paths from N to every other node
–Based on these numbers, determine the amount of flow from N to all
other nodes that use each edge
–Divide sum of flow of all edges by 2
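The BFS procedure above, combined with the removal loop of the Girvan-Newman algorithm, can be sketched as follows. This is a minimal sketch under assumptions, not the slides' implementation: the function names are hypothetical, one highest-scoring edge is removed per round (ties are resolved arbitrarily), and the 9-node edge list `EDGES` is an assumed reconstruction of the example graph that agrees with the values quoted earlier (betweenness 4 for e(1, 2), and e(4, 5) / e(4, 6) removed first).

```python
from collections import deque


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph, one BFS per node."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:              # first time we reach w
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        flow = {v: 0.0 for v in order}
        for w in reversed(order):              # push flow back towards s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + flow[w])
                eb[frozenset((v, w))] += share
                flow[v] += share
    # Every pair was counted from both of its endpoints, so halve the totals.
    return {e: b / 2.0 for e, b in eb.items()}


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, n_communities):
    """Remove highest-betweenness edges until n_communities components exist."""
    adj = build_adj(edges)
    comps = components(adj)
    while len(comps) < n_communities:
        eb = edge_betweenness(adj)
        if not eb:                             # no edges left to remove
            break
        u, v = max(eb, key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps


# Assumed reconstruction of the 9-node example graph.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]
initial = edge_betweenness(build_adj(EDGES))
communities = sorted(sorted(c) for c in girvan_newman(EDGES, 2))
print(initial[frozenset((1, 2))], initial[frozenset((4, 5))], communities)
```

Each round costs one BFS per node, so the whole procedure is expensive on large graphs; that cost is the usual argument for modularity-based methods on bigger networks.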
Girvan Newman method: Example

Betweenness(7-8) = 7 × 7 = 49
Betweenness(1-3) = 1 × 12 = 12
Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 11 = 33
Girvan Newman method: example
Betweenness(1-3) = 1 × 5 = 5
Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 4 = 12
Girvan Newman method: example
Betweenness of every edge = 1


Girvan Newman method: example
Girvan-Newman: Example
[Figure: Step 1, Step 2, Step 3, and the resulting hierarchical network decomposition]
Another example
5 × 5 = 25
Another example
5 × 6 = 30, 5 × 6 = 30
Another example
Girvan-Newman: Results

[Figure: communities in physics collaborations]
Girvan-Newman: Results
• Zachary’s Karate club: hierarchical decomposition
We need to resolve 2 questions
• How to compute betweenness?
• How to select the number of clusters?
Community Detection: Modularity
• Node-centric methods discussed so far are not very useful when the network is large
• Modularity comes from the word ‘module’
• a network-centric metric to determine the quality of a community structure
• Based on the principle of comparison between
– the actual number of edges in a subgraph and its expected number of edges
– the expected number of edges is calculated by assuming a null model
• In the null model,
– each vertex is randomly connected to other vertices irrespective of the community
structure
– However, some of the structural properties are preserved
– One popular structural property is the degree distribution
Concept of Modularity

• Modularity is a measure of the structure of networks or graphs which
measures the strength of division of a network into modules (also called
groups, clusters or communities).
• Networks with high modularity have dense connections between the nodes
within modules but sparse connections between nodes in different modules.
• Modularity is often used in optimization methods for detecting community
structure in networks.
• Modularity is the fraction of the edges that fall within the given groups minus
the expected fraction if edges were distributed at random.
• Value of the modularity for unweighted & undirected graphs lies in the range
[-1/2, 1]
Modularity in a Graph (Simple Explanation)
• Modularity is a measure of how well a graph is divided into communities. It
compares the actual connections within groups to what we would expect if edges
were randomly distributed.
• Why Is Modularity Useful for Community Detection?
• Higher modularity → Stronger communities (nodes are densely connected within
groups, but sparsely connected outside).
• Used to evaluate and optimize community detection methods like the Louvain
Algorithm.
• Helps identify real-world groups in social networks (e.g., Facebook friend clusters,
LinkedIn professional groups).
• Goal: Find partitions that maximize modularity, meaning strong internal
connections and weak external links!
Community Detection: Modularity

The modularity 𝑄 of the community structure can be written as:
$$Q = \frac{1}{2|E|} \sum_{i,j} \left[ a_{ij} - \frac{\deg(i) \cdot \deg(j)}{2|E|} \right] \delta\big(Comm(i), Comm(j)\big)$$
• 𝐶𝑜𝑚𝑚(𝑖) is the identifier of the community to which node 𝑖 belongs
• 𝛿(𝐶𝑜𝑚𝑚(𝑖), 𝐶𝑜𝑚𝑚(𝑗)) = 1 if 𝑖 and 𝑗 belong to the same community, 0 otherwise
Community Detection: Modularity
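An alternative formulation of modularity:
$$Q = \sum_{n=1}^{|Comm|} \left[ \frac{m_n}{|E|} - \left( \frac{k_n}{2|E|} \right)^2 \right]$$
• 𝑚𝑛 denotes the number of edges in the community 𝑛
• |𝐶𝑜𝑚𝑚| is the total number of communities
• 𝑘𝑛 is the sum of the degrees of the nodes in community 𝑛: $k_n = \sum_{i \in Comm(n)} \deg(i)$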
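To check that the pairwise definition and the per-community formulation agree, both can be computed on a small graph. The sketch below is illustrative only: the function names and the test graph (two triangles joined by one edge) are assumptions for this example, and the graph is taken to be unweighted, undirected, and free of self-loops.

```python
from collections import defaultdict


def modularity_pairwise(edges, comm):
    """Q from the sum over all node pairs (i, j) in the same community."""
    m = len(edges)
    deg = defaultdict(int)
    adjacent = set()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = 0.0
    for i in comm:
        for j in comm:
            if comm[i] == comm[j]:
                a_ij = 1 if (i, j) in adjacent else 0
                total += a_ij - deg[i] * deg[j] / (2 * m)
    return total / (2 * m)


def modularity_by_community(edges, comm):
    """Q from the per-community form: sum of m_n/|E| - (k_n / 2|E|)^2."""
    m = len(edges)
    internal = defaultdict(int)    # m_n: edges inside community n
    degree_sum = defaultdict(int)  # k_n: total degree of community n
    for u, v in edges:
        degree_sum[comm[u]] += 1
        degree_sum[comm[v]] += 1
        if comm[u] == comm[v]:
            internal[comm[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical test graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
two_triangles = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
print(modularity_pairwise(edges, two_triangles))      # 5/14, about 0.357
print(modularity_by_community(edges, two_triangles))  # same value
```

Here each community has 𝑚𝑛 = 3 and 𝑘𝑛 = 7 with |𝐸| = 7, so 𝑄 = 2 × (3/7 − (7/14)²) = 5/14.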
Modular Communities
Consider a graph 𝐺(𝑉, 𝐸) in which the degrees are known beforehand but the
edges are not
–Consider two vertices 𝑣𝑖 and 𝑣𝑗 with degrees 𝑑𝑖 and 𝑑𝑗.
• What is the expected number of edges between 𝑣𝑖 and 𝑣𝑗?
• For any edge going out of 𝑣𝑖 at random, the probability of this
edge getting connected to vertex 𝑣𝑗 is 𝑑𝑗 / (2 ∙ |𝐸|)
• Since 𝑣𝑖 has 𝑑𝑖 such edges, the expected number of edges between 𝑣𝑖 and 𝑣𝑗
is 𝑑𝑖 ∙ 𝑑𝑗 / (2 ∙ |𝐸|)
Modularity and Modularity Maximization

• Given a degree distribution, we know the expected number of edges
between any pair of vertices
• We assume that real-world networks should be far from random.
Therefore, the more distant they are from this randomly generated
network, the more structural they are.
• Modularity defines this distance and modularity maximization tries to
maximize this distance
Normalized Modularity
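• Consider a partitioning of the data 𝑃 = (𝑃1, 𝑃2, 𝑃3, … , 𝑃𝑘)
• For partition 𝑃𝑥, this distance can be defined as
$$\sum_{v_i, v_j \in P_x} \left( a_{ij} - \frac{d_i d_j}{2|E|} \right)$$
• This distance can be generalized for a partitioning 𝑃
$$\sum_{x=1}^{k} \sum_{v_i, v_j \in P_x} \left( a_{ij} - \frac{d_i d_j}{2|E|} \right)$$
• The normalized version of this distance is defined as Modularity
$$Q = \frac{1}{2|E|} \sum_{x=1}^{k} \sum_{v_i, v_j \in P_x} \left( a_{ij} - \frac{d_i d_j}{2|E|} \right)$$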
Modularity Maximization: Example
Two communities: {1, 2, 3, 4} and {5, 6, 7, 8, 9}
[Figure: panels labelled Modularity Matrix, 2 eigenvectors, 𝑘-means]
Modularity and Null Model
Making Modularity Optimization Faster

• The matrix method discussed is slow and does not scale to
millions of nodes (and billions of edges)
• We can perform greedy optimization of modularity to speed up
the process
• Louvain Method
–A greedy modularity optimization method for community
detection
Example – Modularity Computation
Modularity Computation
Examples – Modularity Computation
Community Detection: Modularity Maximization
• Modularity can be positive, negative, or zero
– Positive modularity indicates the presence of community structure
• Networks with high modularity have dense connections between the nodes
within modules but sparse connections between nodes in different
modules.
• Different community assignments can lead to different values of modularity
– An assignment that maximizes the modularity of the overall network often
finds the communities in the network
Fast Greedy Algorithm
Louvain Method
Extreme Cases of Modularity
• If all nodes belong to a single community, the modularity is zero (Q=0).
• This means that the graph does not have a meaningful community structure
compared to a random distribution of edges.
• Effective community detection should produce modularity values
significantly greater than zero, typically between 0.3 and 0.7 for good
partitions.
• If each node is its own community, modularity is negative (Q<0).
• This is because the partitioning is worse than random, as no intra-
community edges exist, and all edges are treated as inter-community edges,
lowering modularity.
• Typically, meaningful community structures have modularity values closer
to 0.3–0.7.
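Both extreme cases can be checked by hand from the per-community form of modularity, Q = Σ_n [ m_n/|E| − (k_n / 2|E|)² ]. The short derivation below is a worked check that assumes a graph with at least one edge and no self-loops, and uses the fact that the degrees of all nodes sum to 2|E|.

```latex
% Case 1: all nodes in one community, so m_1 = |E| and k_1 = \sum_i \deg(i) = 2|E|
Q = \frac{|E|}{|E|} - \left( \frac{2|E|}{2|E|} \right)^2 = 1 - 1 = 0

% Case 2: every node is its own community, so m_n = 0 and k_n = \deg(n)
Q = \sum_{n} \left[ 0 - \left( \frac{\deg(n)}{2|E|} \right)^2 \right]
  = -\sum_{n} \left( \frac{\deg(n)}{2|E|} \right)^2 < 0
```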
Motivating Example - Louvain Method
• Example: Friends in a Social Network
• Consider a social network where individuals are connected based on
friendships:
    A -- B    D
    |    /
    C -- E
• Initial Communities:
– Each person is initially in their own community: {A}, {B}, {C}, {D}, {E}.
• Optimization:
– The algorithm notices strong connections among A, B, C, and E, leading
to a merge into a single community.
– New communities: {A, B, C, E}, {D}.
• Modularity Calculation:
– Modularity is calculated to assess the improvement in the network
division.
• Final Result:
– After several iterations, the Louvain algorithm settles on a division with
higher modularity.
• Final communities: {A, B, C, E}, {D}.
Overview – Louvain Method
Community Detection: Louvain Method
Louvain Method
Start with a weighted network where all nodes are in their own
communities (i.e., n communities)
First Phase:
• For each node 𝑣𝑖 ,
–For all neighbors 𝑣𝑗 ∈ 𝑁(𝑣𝑖 ):
• compute the modularity gain if 𝑣𝑖 is removed from its community and placed in
the community of 𝑣𝑗 .
–Find the community with the maximum modularity gain
–If the maximum gain is positive, remove 𝑣𝑖 from its community, and
place it in that community
–If no positive gain, do not change communities
• Repeat until no node changes its community
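The first phase can be sketched as below. This is a simplified illustration under stated assumptions, not the published implementation: it handles unweighted graphs only, and it recomputes the full modularity for every candidate move instead of using the incremental gain formula that makes Louvain fast. The names `modularity` and `louvain_phase_one` and the test graph are made up for this example.

```python
from collections import defaultdict


def modularity(edges, comm):
    """Q = sum over communities of m_n/|E| - (k_n / 2|E|)^2 (unweighted graph)."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[comm[u]] += 1
        degree_sum[comm[v]] += 1
        if comm[u] == comm[v]:
            internal[comm[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def louvain_phase_one(edges):
    """Move nodes between neighbouring communities until no move raises Q."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    comm = {v: v for v in adj}  # every node starts in its own community
    moved = True
    while moved:
        moved = False
        for v in sorted(adj):
            current = comm[v]
            base_q = modularity(edges, comm)
            best_gain, best_comm = 0.0, current
            # Candidate targets: communities of the neighbours of v.
            for c in sorted({comm[w] for w in adj[v]} - {current}):
                comm[v] = c
                gain = modularity(edges, comm) - base_q
                if gain > best_gain + 1e-12:
                    best_gain, best_comm = gain, c
            comm[v] = best_comm  # stays put when no gain is positive
            if best_comm != current:
                moved = True
    return comm


# Hypothetical test graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(louvain_phase_one(edges))
```

On this graph the loop ends with the two triangles as separate communities. Nodes are visited in sorted order here; a different visiting order may change the intermediate moves.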
Important Points about Phase I

• A node can be considered multiple times
• A local maximum of modularity is reached in Phase I
• Phase I is order dependent
–The modularity achieved is more or less stable and depends little on
the initial order
–The computation time depends on the initial order.
Louvain Method
Second Phase:
–Build a new network
• Nodes are communities
• Edges are the edges between nodes in the corresponding
communities (weights are sum of the weights)
• Self-loops represent edges within the community
• The algorithm creates hierarchies of communities
• It usually ends in fewer than 10 passes
• It seems to be an O(𝑛 log 𝑛) algorithm
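The aggregation step of the second phase can be sketched as follows. The function name `aggregate` and the dictionary-based edge representation are assumptions for this illustration; edge weights inside a community accumulate on a self-loop, and weights between two communities accumulate on a single edge.

```python
from collections import defaultdict


def aggregate(weighted_edges, comm):
    """Collapse each community into one node and sum the edge weights.

    weighted_edges: {(u, v): weight} for an undirected graph
    comm: {node: community id}
    Returns {(c1, c2): weight} with c1 <= c2; (c, c) entries are self-loops.
    """
    new_edges = defaultdict(float)
    for (u, v), w in weighted_edges.items():
        cu, cv = comm[u], comm[v]
        key = (cu, cv) if cu <= cv else (cv, cu)
        new_edges[key] += w
    return dict(new_edges)


# Hypothetical input: two triangles joined by one edge, every weight 1.
edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1}
comm = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(aggregate(edges, comm))  # {(0, 0): 3.0, (0, 1): 1.0, (1, 1): 3.0}
```

The first phase is then run again on this smaller weighted network, which is what produces the hierarchy of communities.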
Community Detection through Modularity Maximization:
Limitations
• Resolution limit:
–well-connected smaller communities tend to get merged with larger
communities even if the resultant communities are not that dense
–fails to detect those communities which are well-separated with
densely connected intra-community nodes but only a single inter-
community edge with the rest of the network
• Degeneracy of solutions:
–the case when there is an exponential number of community
structures with same (maximum) modularity value
Concept of overlapping communities
• In the social network arena where nodes are individuals it is natural that the
individuals can belong to several different communities at a time and thus
overlapping communities would be more natural.
– In a Twitter network we can have individuals following several other individuals at
the same time and thus participating in several communities simultaneously.
– LinkedIn Networks: A person can be in multiple professional communities (e.g.,
Data Science, AI, and Product Management).
– Facebook Groups: A person can be a member of multiple interest groups (e.g., "AI
Enthusiasts" and "Startup Founders").
– Instagram (Celebrities & Brands): A fashion influencer who is also a fitness coach
(e.g., an athlete with a clothing brand).
– Tech & Business Influencers: A startup founder posting about AI, entrepreneurship,
and personal development.
Facebook Network
[Figure: Facebook network. Nodes: Facebook users. Edges: friendships. Social
communities shown: High school, Summer internship, Stanford (Basketball),
Stanford (Squash)]
Clique Percolation Method
• "Clique": a fully connected subgraph.
– Every pair of vertices in the subgraph is connected by an edge.
• Finding all cliques of a given size in a graph is, in general, an NP-hard problem.
• "k-clique" denotes a clique consisting of k vertices.
• E.g., a 6-clique is a complete subgraph with 6 vertices.
[Figure: example graph, which contains]
• six 3-cliques: (1, 2, 3), (1, 2, 8), (2, 4, 5), (2, 4, 6), (2, 5, 6), (4, 5, 6)
• one 4-clique: (2, 4, 5, 6)
Clique Percolation Method (CPM)
• Identifies overlapping communities.
• Intuition: "In a dense community we are likely to find a large number of edges
and thus cliques; it is unlikely that edges between communities, i.e.,
inter-community edges, form cliques".
• Assumes that a community is normally formed from overlapping cliques;
detects communities by searching for adjacent cliques.
• First extract all the k-cliques in the network.
• A new graph, called the "clique-graph", is constructed
–Each extracted k-clique is compressed into one vertex.
–Connect two vertices in this clique-graph if the cliques represented by them
have (k − 1) members in common.
• Each connected component of the clique-graph represents one community
CPM - Illustrated
• We have six 3-cliques.
• We form a clique-graph with 6 vertices, each vertex representing one of
the following 6 cliques:
• a: (1, 2, 3); b: (1, 2, 8); c: (2, 4, 5); d: (2, 4, 6); e: (2, 5, 6); f: (4, 5, 6)
CPM - Illustrated
• k = 3; add an edge if two cliques share at least 2 vertices.
• Clique a and clique b have vertices 1 and 2 in common, so they are connected
by an edge.
• Similarly add other edges
• Connected components (a, b) & (c, d, e, f) form the communities.
• 2 connected components - 2 communities:
– c1 : (1, 2, 3, 8)
– c2 : (2, 4, 5, 6)
• Thus the community set C = {c1, c2} where the vertex 2 overlaps both the
communities.
• Vertex 7 is isolated as it is not a part of any 3-cliques.
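The illustrated steps can be reproduced with a short brute-force sketch. The edge list below is an assumption reconstructed from the six listed 3-cliques (the edges of vertex 7 are not recoverable from the text, so it is left out), and `clique_percolation` is a name chosen for this example. Enumerating every k-subset is only practical for tiny graphs.

```python
from itertools import combinations


def clique_percolation(edges, k=3):
    """Overlapping communities: merge k-cliques that share k - 1 vertices."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Step 1: extract all k-cliques (brute force over k-subsets of vertices).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # Step 2: clique-graph connectivity, tracked with a small union-find.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    # Step 3: each connected component of the clique-graph is one community.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Edges implied by the cliques a..f above (assumed; vertex 7 omitted).
edges = [(1, 2), (1, 3), (2, 3), (1, 8), (2, 8),
         (2, 4), (2, 5), (4, 5), (2, 6), (4, 6), (5, 6)]
print(clique_percolation(edges, k=3))  # [[1, 2, 3, 8], [2, 4, 5, 6]]
```

Vertex 2 appears in both communities, which is the overlap the method is designed to capture.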
CPM Example
Cliques of size 3:
{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7},
{5, 6, 8}, {5, 7, 8}, {6, 7, 8}

Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
Clique Percolation Method: Example
Cliques of size 3:
{𝑣1, 𝑣2, 𝑣3}, {𝑣3, 𝑣4, 𝑣5},
{𝑣4, 𝑣5, 𝑣6}, {𝑣4, 𝑣5, 𝑣7},
{𝑣4, 𝑣6, 𝑣7}, {𝑣5, 𝑣6, 𝑣7},
{𝑣6, 𝑣7, 𝑣8}, {𝑣8, 𝑣9, 𝑣10}

Communities:
{𝑣1, 𝑣2, 𝑣3},
{𝑣8, 𝑣9, 𝑣10},
{𝑣3, 𝑣4, 𝑣5, 𝑣6, 𝑣7, 𝑣8}
Overview – Link Partition
Overlapping Community Detection: Link Partition
• Uses links (edges) in the network to detect communities
• Two major approaches
– Create a link network and apply a node partitioning algorithm or disjoint
community detection algorithm to find the communities
– Use similarity measures on the edges to find the communities directly
by creating the dendrogram
• The Jaccard coefficient might be a good choice for the similarity measure
• For two edges 𝑒𝑖𝑘 and 𝑒𝑗𝑘 connected to node 𝑘,
$$Sim(e_{ik}, e_{jk}) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$$
where 𝑁𝑖 and 𝑁𝑗 are the neighbours of the nodes 𝑖 and 𝑗
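The edge similarity above is easy to compute directly. In the sketch below, `edge_similarity` and the toy graph are assumptions for illustration. The `inclusive` flag is an added option, not part of the slide: the original link-clustering formulation counts each node as part of its own neighbourhood, which tends to give more intuitive values on small graphs.

```python
def edge_similarity(adj, i, j, k, inclusive=False):
    """Jaccard similarity of edges (i, k) and (j, k), which share node k.

    adj: {node: set(neighbours)}
    inclusive=False follows the formula as stated (plain neighbour sets);
    inclusive=True also counts i in N_i and j in N_j.
    """
    if k not in adj[i] or k not in adj[j]:
        raise ValueError("both edges must be incident to the shared node k")
    n_i = adj[i] | {i} if inclusive else adj[i]
    n_j = adj[j] | {j} if inclusive else adj[j]
    return len(n_i & n_j) / len(n_i | n_j)


# Hypothetical graph: triangle 1-2-3 with an extra node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(edge_similarity(adj, 1, 2, 3))                  # |{3}| / |{1,2,3}| = 1/3
print(edge_similarity(adj, 1, 2, 3, inclusive=True))  # identical sets -> 1.0
print(edge_similarity(adj, 1, 4, 3, inclusive=True))  # |{3}| / |{1,2,3,4}| = 0.25
```

Hierarchical clustering on these pairwise similarities then yields the dendrogram of link communities.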
Link Partition Method - Illustrations
[Figure: three panels: Example Network; Link Network for the example network;
Dendrogram for the example network obtained using similarity measures]
Applications of Link Partitioning in Real Social Networks
• Link partitioning is highly useful for analyzing social networks like Instagram
and Facebook, where users belong to multiple communities based on their
interactions.
• Instead of just clustering users (nodes), it groups relationships (links) based
on their nature, capturing overlapping communities.
• Instagram – Spam & Fake Account Detection
–Spam bots interact differently from real users.
–By clustering link behaviors, Instagram detects unnatural engagement
patterns.
–Example: A bot might comment generic text on many posts but rarely DM
anyone → Identified as spam.
–Application: Instagram can flag accounts that belong to "spam clusters"
based on link partitioning.
Instagram
• Instead of forcing the user into one community, link partitioning detects separate
link-based groups:
–Fitness Community (Likes fitness-related posts)
–Tech Community (Comments on AI and startups)
–Fashion Community (DMs influencers for brand collaborations)
• Application: Instagram can recommend better content by identifying which
community a specific interaction belongs to.
• Influencer Marketing & Audience Segmentation
–Influencers have followers from different niches.
–Link partitioning analyzes interactions, grouping users based on engagement.
–Brands can identify which community an influencer is strongest in.
–Example: A fashion influencer also posts travel content.
–Link partitioning can separate "fashion-focused" followers from "travel lovers",
helping brands target the right audience.
Facebook
• Link partitioning helps in community detection based on relationships.
• Friendship & Group Overlap Detection
– A person might be in multiple groups (e.g., Work, Family, Gaming).
– Link partitioning clusters these friendships by the type of relationship.
– Example: A Facebook user named John:
– Chats with work colleagues → Work Group ; Likes and comments in a gaming community →
Gaming Group ; shares personal updates with family members → Family Group
– Link partitioning captures these separate relationship types, helping Facebook recommend
better content & connections.
• Viral Content & Trend Detection
• Facebook tracks how posts spread through shared links.
• Link partitioning detects clusters of users who frequently share similar types of content.
– Application: Helps identify trending topics in specific communities (e.g., tech news vs.
entertainment news).
– Helps Facebook prioritize news in the feed based on how it spreads within link communities.
Local Community Detection
• Algorithms that aim to search all communities in a network can be regarded as global
community detection algorithms
• However, in some real-world scenarios, people may be more concerned about the local
community instead of the global ones.
• For example, popular social applications such as Facebook and Instagram can
recommend candidate friends to a specific user.
• Intuitively, the persons who are in the same social circles with the user are more likely to
be recommended than others.
• Why Use Local Community Detection?
– Scalability: Works well on massive social networks like Facebook, Instagram, Twitter,
where global methods are computationally expensive.
– Personalized Analysis: Finds communities specific to a user, useful for
recommendations, targeted ads, and friend suggestions.
– Dynamic Networks: Works in real-time as networks evolve, unlike global methods.
Local Communities - Seed Expansion Methods
• Several algorithms exist to detect communities locally, starting from a seed node
and growing a subgraph.
• These methods start from a node and expand outward, adding strongly connected
nodes.
• Example: Personalized PageRank (PPR)
– Starts from a seed node (e.g., a user on Instagram)
– Uses random walks to find closely connected nodes
– More likely to detect friends and frequently interacted users
• Real-World Application:
– Instagram uses a similar approach for "People You May Know" suggestions.
– Facebook recommends Groups and Events based on locally detected
communities
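A minimal sketch of Personalized PageRank by power iteration is shown below, to make the "random walks from a seed" idea concrete. The function name, the damping value `alpha=0.85`, the fixed iteration count, and the test graph are all assumptions for illustration; production systems use far more scalable approximations.

```python
def personalized_pagerank(adj, seed, alpha=0.85, iterations=100):
    """Random walk with restart: with probability 1 - alpha jump back to seed.

    adj: {node: set(neighbours)} for an undirected graph.
    Returns {node: score}; scores sum to 1.
    """
    rank = {v: (1.0 if v == seed else 0.0) for v in adj}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adj}
        for u in adj:
            if adj[u]:
                share = alpha * rank[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                nxt[seed] += alpha * rank[u]  # dead end: return to the seed
        nxt[seed] += 1 - alpha  # restart mass always goes to the seed
        rank = nxt
    return rank


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = personalized_pagerank(adj, seed=1)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Nodes near the seed receive higher scores, so taking the top-scoring nodes gives a candidate local community around the seed.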
Local Optimization Methods
• These methods iteratively refine the local community based on a quality
measure.
• Example: Local Louvain Algorithm
–Starts from a seed node
–Merges nearby nodes if it improves modularity locally
–Stops when no more improvements can be made
• Real-World Application:
–Facebook detects friend clusters based on likes, comments, and shared
groups.
–Twitter identifies sub-communities in trending hashtags
Link-Based Approaches
• Instead of focusing on nodes, these methods group edges that are densely
connected.
• Example: Link Clustering Method
–Groups strongly connected interactions (DMs, comments, follows)
–Naturally detects overlapping communities
• Real-World Application:
–Instagram identifies interest-based communities (e.g., users active in
fitness & travel).
–Facebook detects political discussion groups by analyzing frequent
interactions.
Applications

Platform     Use Case
Instagram    Suggesting new friends based on shared likes & comments
Facebook     Recommending groups & events
Twitter      Finding topic-based communities (e.g., AI, politics)
LinkedIn     Suggesting professional connections in the same industry
YouTube      Recommending videos based on similar user interactions


Local Modularity Method

• Key Idea
• Given a seed node, the algorithm expands the community iteratively by
greedily optimizing local modularity.
• It does not require the full network—only local neighborhood
information.
• This method is ideal for applications like personalized recommendations
on Instagram, Facebook, or LinkedIn, where a user’s network is
incrementally explored.
Algorithm
• Start with a seed node (e.g., an Instagram user).
• Initialize community as just the seed node.
• Expand the community:
–Consider neighboring nodes (directly connected to the
current community).
–Add the best node (the one that maximizes the local
modularity gain).
• Repeat until adding another node does not improve local
modularity.
Local Modularity
• Suppose that in a graph G, we have perfect
knowledge of the connectivity of some set of
vertices, i.e., the known portion of the graph,
which we denote C.
• This necessarily implies the existence of a set of
vertices U about which we know only their
adjacencies to C.
• Further, let us assume that the only way we may
gain additional knowledge about G is by visiting
some neighboring vertex 𝑉𝑖 ∈ U, which yields a list
of its adjacencies.
• As a result, 𝑉𝑖 becomes a member of C, and
additional unknown vertices may be added to U.
Local Community Detection: Local Modularity
• Let us say we have complete knowledge about a subgraph (community) C of an
unweighted and undirected network G
• We denote the known adjacency matrix of G as
$$A_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in C \lor j \in C \\ 0 & \text{otherwise} \end{cases}$$
• Define the quality of the community as the fraction of known connections that lie
completely inside the community:
$$\frac{\sum_{ij} A_{ij}\, \xi(i,j)}{\sum_{ij} A_{ij}} = \frac{1}{2m} \sum_{i,j} A_{ij}\, \xi(i,j)$$
– 𝑚: total number of edges in the partial adjacency matrix
– 𝜉(𝑖, 𝑗): 1 if 𝑖 and 𝑗 both belong to the community C; 0 otherwise
– This quantity will be large when C has many internal connections, and few
connections to the unknown portion of the graph.
– This measure also has the property that when |C| ≫ |U|, the partition will almost
always appear to be good.
Local Community Detection: Local Modularity
• W is the set of vertices in C that have at least one neighbor in U
• Define the boundary adjacency matrix as
$$\hat{A}_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in W \lor j \in W \\ 0 & \text{otherwise} \end{cases}$$
• Using the boundary adjacency matrix, define local modularity as:
$$Q_{local} = \frac{\sum_{ij} \hat{A}_{ij}\, \delta(i,j)}{\sum_{ij} \hat{A}_{ij}} = \frac{I}{T}$$
– 𝛿(𝑖, 𝑗): 1 when 𝑖 belongs to 𝑊 and 𝑗 belongs to 𝐶, or vice versa; 0 otherwise
– 𝑇: number of edges with at least one end-point in 𝑊
– 𝐼: number of those edges having no end-point in 𝑈
[Figure: Abstract division of network G into local community C, its boundary W
and the edges connecting W to the unknown neighbors U]
Local Community Detection: Local Modularity Maximization
• Basic Algorithm steps:
1. Initialization: Initialize local community 𝐶 with 𝑣0 only;
neighbors of 𝑣0 form 𝑈
2. Update: Follow the following steps
a. Iterate over neighbors of 𝑣0 , add those to 𝐶 that provide
maximum increase in 𝑄𝑙𝑜𝑐𝑎𝑙
b. Explore neighbors of new vertices in 𝐶, add new neighbors
to 𝑈
3. Termination: process continues until the number of vertices in
𝐶 reaches a pre-decided maximum number
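The ratio I/T and the greedy growth loop can be sketched as follows. This is an illustration under assumptions: the function names are invented, the whole adjacency dictionary is passed in for convenience (a real crawler would only fetch the neighbours of vertices it visits), and returning 1.0 when the boundary is empty is a convention chosen here. At each step the sketch adds the single frontier vertex that gives the highest resulting 𝑄𝑙𝑜𝑐𝑎𝑙.

```python
def local_modularity(adj, community):
    """Q_local = I / T over edges with at least one end-point in the boundary W."""
    boundary = {v for v in community if any(w not in community for w in adj[v])}
    seen = set()
    total = inside = 0
    for v in boundary:
        for w in adj[v]:
            edge = frozenset((v, w))
            if edge in seen:
                continue
            seen.add(edge)
            total += 1              # T: edge touches the boundary
            if w in community:
                inside += 1         # I: neither end-point lies in U
    return inside / total if total else 1.0  # empty boundary: assumed convention


def expand_community(adj, seed, max_size):
    """Greedy growth: repeatedly add the frontier vertex with the best Q_local."""
    community = {seed}
    while len(community) < max_size:
        frontier = {w for v in community for w in adj[v]} - community
        if not frontier:
            break
        best = max(sorted(frontier),
                   key=lambda u: local_modularity(adj, community | {u}))
        community.add(best)
    return community


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(expand_community(adj, seed=1, max_size=3))  # {1, 2, 3}
print(local_modularity(adj, {1, 2, 3}))           # I/T = 2/3
```

For {1, 2, 3} the boundary is W = {3}, whose three edges include two that stay inside the community, hence 2/3.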
Local Community Detection: Subgraph Modularity
• Subgraph modularity is proposed based on the degree of each vertex
• Define the adjacency matrix for subgraph 𝐶 and its neighbors 𝑈 as:
$$S_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in C \lor j \in C \\ 0 & \text{otherwise} \end{cases}$$
• Define the in-degree of subgraph 𝐶 as the total number of edges that lie completely in
subgraph 𝐶
$$In(C) = \sum_{i,j} S_{ij}\, \delta(i,j)$$
• Define the out-degree of subgraph 𝐶 as the total number of edges between 𝐶 and the
remaining part of the network 𝐺
$$Out(C) = \sum_{i,j} S_{ij}\, \lambda(i,j)$$
• 𝜆(𝑖, 𝑗): 1 if exactly one of 𝑖 or 𝑗 lies in subgraph 𝐶, and 0 otherwise
Local Community Detection: Subgraph Modularity Maximization
The subgraph modularity 𝑆𝑀 is defined for the subgraph 𝐶 of network 𝐺 as the ratio of
the in-degree of subgraph 𝐶 to the out-degree of the subgraph 𝐶
$$SM = \frac{In(C)}{Out(C)}$$
Basic Algorithmic steps:
1. Initialization: Initialize a subgraph 𝐶 with node 𝑣 only; neighbors of 𝑣 form 𝑈
2. Addition: Iterate over the vertices in 𝑈 and add those vertices to 𝐶 that increase the 𝑆𝑀
value of the subgraph 𝐶
3. Deletion: Remove vertices in each iteration from 𝑉𝐶 such that the network remains
connected and the 𝑆𝑀 value increases
4. Final Step: Add neighbors of those vertices, which are left in the subgraph 𝐶,
considering only those neighbors that are not already in 𝑈
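The 𝑆𝑀 ratio itself is straightforward to compute. The sketch below counts each edge once, following the verbal definitions of in-degree and out-degree; `subgraph_modularity` is a name chosen for this example, and returning infinity when no edge leaves the subgraph is an assumed convention.

```python
def subgraph_modularity(adj, subgraph):
    """SM = In(C) / Out(C), with every edge counted once.

    adj: {node: set(neighbours)}; subgraph: set of nodes forming C.
    """
    inside_endpoints = 0
    outside = 0
    for v in subgraph:
        for w in adj[v]:
            if w in subgraph:
                inside_endpoints += 1  # internal edge, seen from both ends
            else:
                outside += 1           # edge leaving the subgraph
    inside = inside_endpoints // 2
    return inside / outside if outside else float("inf")


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(subgraph_modularity(adj, {1, 2}))     # In = 1, Out = 2 -> 0.5
print(subgraph_modularity(adj, {1, 2, 3}))  # In = 3, Out = 1 -> 3.0
```

Adding vertex 3 to {1, 2} raises 𝑆𝑀 from 0.5 to 3.0, so the addition step would accept it.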
Community Search
• Aims to find a community that contains the query node in a network
• A query-dependent version of the community detection problem
• All the community detection algorithms are also applicable to community search
• Common algorithms used for community search
– K-core: maximal subgraph in which each node has degree at least K
– K-clique: set of K vertices such that each pair of vertices has an edge between
them
– K-truss: maximal subgraph such that every edge of the subgraph is contained in at
least (K − 2) triangles in the subgraph
– K-ECC (K-edge connected component): a subgraph such that after removing any
K − 1 edges, the subgraph is still connected
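As one concrete instance, a K-core based community search can be sketched by peeling low-degree nodes and then keeping the connected part of the core that contains the query node. The function names and the toy graph are assumptions for this illustration.

```python
def k_core(adj, k):
    """Nodes of the k-core: repeatedly delete nodes whose degree is below k."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)
    return alive


def k_core_community(adj, query, k):
    """Connected component of the k-core that contains the query node."""
    core = k_core(adj, k)
    if query not in core:
        return set()
    community, stack = {query}, [query]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in core and w not in community:
                community.add(w)
                stack.append(w)
    return community


# Hypothetical graph: triangle 1-2-3 with pendant node 4, plus a separate triangle 5-6-7.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3},
       5: {6, 7}, 6: {5, 7}, 7: {5, 6}}
print(k_core_community(adj, query=1, k=2))  # {1, 2, 3}
print(k_core_community(adj, query=4, k=2))  # set(): node 4 is not in the 2-core
```

Only the part of the graph reachable from the query node inside the core is returned, which is what distinguishes community search from detecting every community.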
Community Detection (CD) versus Community Search (CS)

a) In CD, we find all the communities inside a network; in the case of


CS, we only find the community related to the query vertex
b) In CD, we use global parameters to find communities in the network;
in CS, the query parameter given by the user is used to find the
communities in the network
c) CD algorithms are generally time-consuming and non-scalable; they
cannot be used for online tasks. CS algorithms are meant to work for
the online tasks
d) It is hard to use indexes and dynamic networks with CD; whereas, it
is easy to do so with CS algorithms
END
