Cairo University
Faculty of Graduate Studies for Statistical Research
Machine Learning Approach for Community Detection and
Influential Node Detection in Social Networks
Presented By
Amal Mahmoud Yehia
Supervised By
Prof. Dr. Lamiaa Fattouh Ibrahim
Dr. Yasser Abd Elhamid
2021
Agenda
1. Introduction
2. Background
3. Benefits of communities in social groups
4. Community in the Real World
5. Problem Definition
6. Clustering Algorithms
7. Work Plan
8. References
1-Introduction
• A social network is a set of connections among users or
groups through a common platform (website) for sharing
career interests, personal interests, activities, and
opinions on public events.
• The rapid growth of social networks creates an urgent
need for community analysis.
1-Introduction
• Social network analysis (SNA) helps us map the
relationships and information flows between
people, groups, organizations, or computers.
2-Background
• A graph is the most powerful structure for representing a
social network [Vispute and Sane, 2020].
• A social network is represented by an undirected graph
G = (V, E).
• Vertices V represent the entities in the network, such as
users and organizations.
• Edges E represent links or associations between the
entities.
• There are many social networks, such as Facebook, Google+,
Twitter, LinkedIn, WeChat, and QQ.
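The graph model G = (V, E) above can be sketched in a few lines of Python. This is only an illustration: the helper `build_graph` and the user names are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph G = (V, E) as a dict: vertex -> set of neighbours."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # An undirected edge is stored in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical friendship links between four users.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "dave")]
graph = build_graph(edges)
vertices = set(graph)  # V: every user that appears in at least one edge
```

An adjacency dictionary keeps neighbour lookups cheap, which the centrality and clustering sketches in later slides rely on.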
Community
• A community is a group of related nodes that
– are densely interconnected
– have fewer connections with the rest of the
network
Community Detection
Community detection means finding the nodes that have
similar properties (e.g., age, job, hopes, …) or that
share common interests, and placing them in the
same group.
The main goal of finding communities is to extract
data or information from the various clusters and to find
the relationships among nodes [Kulkarni Varsha and Patil, 2020].
CENTRALITY
• Node centrality: every node has some degree of
influence or importance within the social domain.
• Centrality is a quantitative measure that aims to
reveal the importance of a node.
• The degree centrality of a vertex is the number of
edges connected to it.
• An influential user is a user with high centrality, who
therefore has many followers influenced by their opinions.
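Degree centrality as defined above can be computed directly from an adjacency dictionary. The sketch below is illustrative only: the function names and the toy network are assumptions, and ranking users by raw degree is just one simple way to pick candidate influential users.

```python
def degree_centrality(adjacency):
    """Return, for each vertex, the number of edges connected to it."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def most_influential(adjacency, top_k=1):
    """Rank vertices by degree (highest first); ties are broken by label."""
    degrees = degree_centrality(adjacency)
    return sorted(degrees, key=lambda n: (-degrees[n], str(n)))[:top_k]


# Toy undirected network (hypothetical data): "a" is linked to everyone.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
centrality = degree_centrality(adjacency)
top_user = most_influential(adjacency)
```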
CENTRALITY
Betweenness centrality
Betweenness centrality measures how much a node
plays a ‘mediation’ role in a network.
• The strength of a tie can be measured by edge
betweenness.
• Edge betweenness is the number of shortest paths
between pairs of nodes that run through that edge.
• An edge with higher betweenness tends to be a
bridge between two communities.
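Edge betweenness can be sketched with one breadth-first search per node, in the style of Brandes' algorithm for unweighted graphs. The function name and the example network (two triangles joined by one bridge) are assumptions chosen for illustration. When several shortest paths connect a pair, each path contributes an equal fraction.

```python
from collections import deque


def edge_betweenness(adjacency):
    """Count the shortest paths between node pairs that run through each edge."""
    score = {frozenset((u, v)): 0.0 for u in adjacency for v in adjacency[u] if u != v}
    for source in adjacency:
        # Breadth-first search: distances, shortest-path counts, predecessors.
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    sigma[nb], preds[nb] = 0.0, []
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        # Walk back from the farthest nodes, sharing path counts along edges.
        delta = {node: 0.0 for node in order}
        for node in reversed(order):
            for p in preds[node]:
                share = sigma[p] / sigma[node] * (1.0 + delta[node])
                score[frozenset((p, node))] += share
                delta[p] += share
    # Every unordered pair was counted from both endpoints, so halve the totals.
    return {edge: value / 2.0 for edge, value in score.items()}


# Two triangles {a, b, c} and {d, e, f} joined by the bridge c-d (hypothetical data).
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
betweenness = edge_betweenness(two_triangles)
bridge = max(betweenness, key=betweenness.get)
```

In this toy network all nine shortest paths between the two triangles cross the edge c-d, so it receives the highest score, which matches the bridge intuition stated above.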
Benefits of communities in social groups
– Behavior Analysis
– Location-Based Interaction Analysis
– Recommender Systems Development
– Link Prediction
– Customer Interaction, Analysis & Marketing
– Media Use
– Security
– Social studies
Graphs from the Real World
– Webpage hyperlink graph: directed communities
– Network of word associations: overlapping communities
Attributes of nodes
– People can be queried about different features, such as
age, gender, race, socioeconomic status, place of residence,
grade in school, etc.
Problem Definition
Finding communities in a complex network such as an online social network is
a difficult task with many challenges.
Scalability: such networks can be huge, often on a scale of millions of
actors and hundreds of millions of connections. Existing community
detection techniques might fail when applied directly to networks of this
size.
Heterogeneity: in reality, multiple relationships can exist between
individuals, and multiple types of entities can also be involved in one
network.
Evolution: social networks emphasize timeliness, which makes the
network dynamic: it changes over time.
Evaluation: comparing and evaluating different work in
community detection is also a challenging process.
Clustering Algorithm
– Density-Based Clustering Algorithm
– Spanning tree Algorithm
– Markov chain Algorithm
– Spectral clustering
– Partitioning clustering
– Hierarchical Clustering (HC)
Community Detection Algorithms
Density-Based Clustering Algorithm
Identifies clusters in large spatial data sets by
looking at the local density of database
elements.
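The slide does not name a specific density-based method. As one well-known example, the sketch below follows the DBSCAN idea: a point with at least `min_pts` points within distance `eps` (counting itself) is dense, and clusters grow outward from dense points. The function name, parameters, and sample points are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a dense point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is dense too, so keep expanding
    return labels


# Two tight groups and one isolated point (hypothetical coordinates).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The neighbourhood search here scans every point, which is quadratic overall; for large data sets a spatial index would normally replace it.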
Hierarchical Clustering (HC)
Goal: build a hierarchical structure of communities
based on network topology
Strategies for building a hierarchical
clustering fall into one of two types:
– Divisive Hierarchical Clustering (top-down)
– Agglomerative Hierarchical Clustering (bottom-up)
Divisive Hierarchical Clustering
• Divisive clustering
– Partition nodes into several sets
– Each set is further divided into smaller ones
• One particular example: recursively remove the
“weakest” tie
– Find the edge with the least strength
– Remove the edge and update the corresponding
strength of each edge
• Recursively apply the above two steps until the network
is decomposed into the desired number of connected
components.
• Each component forms a community.
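The remove-the-weakest-tie loop above can be sketched as follows. The slides do not fix a strength measure here, so this sketch assumes a simple proxy, the number of neighbours two endpoints share; `common_neighbours`, `divisive_clustering`, and the example network are hypothetical. With edge betweenness as the measure, one would instead remove the edge with the highest betweenness at each step.

```python
def connected_components(adjacency):
    """Return the connected components of the graph as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def common_neighbours(adjacency, u, v):
    """Assumed tie-strength proxy: how many neighbours u and v share."""
    return len(adjacency[u] & adjacency[v])


def divisive_clustering(adjacency, target, strength=common_neighbours):
    """Remove the weakest edge repeatedly until `target` components exist."""
    graph = {n: set(nbs) for n, nbs in adjacency.items()}  # work on a copy
    components = connected_components(graph)
    while len(components) < target:
        edges = {frozenset((u, v)) for u in graph for v in graph[u] if u != v}
        if not edges:
            break
        # Strengths are recomputed every round, since removals change them.
        # Assumes node labels are sortable, for a deterministic tie-break.
        weakest = min(edges, key=lambda e: (strength(graph, *sorted(e)), sorted(e)))
        u, v = tuple(weakest)
        graph[u].discard(v)
        graph[v].discard(u)
        components = connected_components(graph)
    return components


# Two triangles joined by the bridge c-d (hypothetical data).
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = divisive_clustering(two_triangles, target=2)
```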
Agglomerative Hierarchical Clustering
• Initialize each node as a community
• Merge communities successively into larger
communities following a certain criterion
– E.g., based on modularity increase
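A minimal sketch of modularity-driven merging, assuming the standard modularity Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. The function names and the toy graph are illustrative, the graph is assumed to have at least one edge, and the brute-force search over all pairs is meant for small examples only.

```python
from itertools import combinations


def modularity(adjacency, communities):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = sum(len(nbs) for nbs in adjacency.values()) / 2.0  # number of edges
    q = 0.0
    for community in communities:
        internal = sum(1 for u in community for v in adjacency[u] if v in community) / 2.0
        degree = sum(len(adjacency[u]) for u in community)
        q += internal / m - (degree / (2.0 * m)) ** 2
    return q


def agglomerative_clustering(adjacency):
    """Start from single-node communities; merge while modularity increases."""
    communities = [frozenset([node]) for node in adjacency]
    best_q = modularity(adjacency, communities)
    while len(communities) > 1:
        candidate, candidate_q = None, best_q
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            q = modularity(adjacency, merged)
            if q > candidate_q + 1e-12:  # keep the merge with the largest gain
                candidate, candidate_q = merged, q
        if candidate is None:
            break  # no merge improves modularity any further
        communities, best_q = candidate, candidate_q
    return communities, best_q


# Two triangles joined by the bridge c-d (hypothetical data).
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities, q = agglomerative_clustering(two_triangles)
```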
Community Detection Algorithms
• Partitioning clustering
Another popular class of methods for finding
clusters in a set of data points. However, the
number of clusters must be assigned in advance.
• Spectral Graph Theory
Enables finding a small community in a network
with millions of vertices.
Community Detection Algorithms
Markov chain Algorithm
A method for generating a sequence of random variables in which the
current value depends on the value of the prior variable.
Specifically, the selection of the next variable depends only on the last
variable in the chain.
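A random walk on a graph is a simple instance of this definition: the next node is chosen using only the current node. Random-walk-based community detection builds on the observation that a walker tends to stay inside a densely connected group for a while. The sketch below only generates such a walk; the function name, the seed parameter, and the example graph are assumptions.

```python
import random


def random_walk(adjacency, start, steps, seed=0):
    """Generate a Markov chain of nodes: each step depends only on the current node."""
    rng = random.Random(seed)  # fixed seed so the walk is reproducible
    walk = [start]
    for _ in range(steps):
        neighbours = sorted(adjacency[walk[-1]])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        walk.append(rng.choice(neighbours))
    return walk


# Two triangles joined by the bridge c-d (hypothetical data).
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
walk = random_walk(two_triangles, start="a", steps=20)
```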
Spanning tree Algorithm
• Every connected graph contains a spanning tree.
• On weighted graphs, one can define a minimum (maximum) spanning
tree, i.e. a spanning tree such that the sum of the weights on the edges
is minimal (maximal).
• Minimum and maximum spanning trees are often used to detect
communities quickly and accurately.
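One common way to use a minimum spanning tree for clustering is sketched below, under the assumption that edge weights act as distances (a small weight means a strong tie): build the tree with Kruskal's algorithm, then drop the k − 1 heaviest tree edges to leave k groups. The class and function names and the weights are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure used to detect cycles and to collect groups."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False  # already connected: adding this edge would form a cycle
        self.parent[root_a] = root_b
        return True


def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm; edges are (weight, u, v) tuples, result is sorted by weight."""
    forest = UnionFind(nodes)
    return [(w, u, v) for w, u, v in sorted(weighted_edges) if forest.union(u, v)]


def mst_communities(nodes, weighted_edges, k):
    """Split the graph into k groups by removing the k - 1 heaviest tree edges."""
    tree = minimum_spanning_tree(nodes, weighted_edges)
    kept = tree[: max(len(tree) - (k - 1), 0)]
    forest = UnionFind(nodes)
    for _, u, v in kept:
        forest.union(u, v)
    groups = {}
    for node in nodes:
        groups.setdefault(forest.find(node), set()).add(node)
    return list(groups.values())


# Hypothetical weighted network: two tight triangles joined by one long edge.
nodes = ["a", "b", "c", "d", "e", "f"]
weighted_edges = [
    (1, "a", "b"), (1, "b", "c"), (2, "a", "c"),
    (5, "c", "d"),
    (1, "d", "e"), (1, "e", "f"), (2, "d", "f"),
]
tree = minimum_spanning_tree(nodes, weighted_edges)
communities = mst_communities(nodes, weighted_edges, k=2)
```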
Work Plan
The research methodology consists of the following steps:
Step 1: Survey the published research on community detection
and influential node detection.
Step 2: Determine the characteristics and statistical information
of the datasets that will be used.
Step 3: Propose a machine learning approach for community
detection and influential node detection, with consideration of
performance.
Step 4: Evaluate the proposed approach.
Step 5: Analyze the results, draw conclusions, and
suggest future work.
References
• Kulkarni Varsha and Kiran Kumari Patil (2020). An Overview of Community Detection
Algorithms in Social Networks. Proceedings of the Fifth International Conference
on Inventive Computation Technologies (ICICT-2020), IEEE Xplore Part
Number: CFP20F70-ART; ISBN: 978-1-7281-4685-0.
• Symeon Papadopoulos, Yiannis Kompatsiaris, Athena Vakali, and Ploutarchos Spyridonos
(2012). Community Detection in Social Media. Data Mining and Knowledge Discovery,
Volume 24, Issue 3, pp. 515-554.
• Santo Fortunato (2010). Community Detection in Graphs. Complex Networks and Systems
Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, Italy.
• Stanley Wasserman and Katherine Faust (2008). Social Network Analysis: Methods and
Applications. Cambridge University Press.
• Robert A. Hanneman and Mark Riddle (2005). Introduction to Social Network Methods.
University of California,