
3.2 Detecting Communities in Social Networks

Uploaded by sshanjay1
Copyright © All Rights Reserved

Social Network Analysis (3 - 3) Extraction & Mining Communities in Web Social Networks

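With transpose:

$$A^{T} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}$$

Assume the initial hub weight vector is:

$$u = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

Ans.:
We compute the authority weight vector by:

$$v = A^{T} u = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}$$

Then, the updated hub weight is:

$$u = A v = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}$$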
This already corresponds to our intuition that node 3 is the most authoritative, since it is the
only one with incoming edges, and that nodes 1 and 2 are equally important hubs. If we repeat
the process further, we will only obtain scalar multiples of the vectors v and u computed at
step 1. So the relative weights of the nodes remain the same.
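The two update steps above can be repeated in a loop to confirm that the relative weights stop changing. This is a minimal sketch in pure Python, assuming the 3-node graph implied by the matrix A (edges 1 → 3 and 2 → 3, with A[i][j] = 1 when node i links to node j). Dividing each vector by its largest entry after every step is an illustrative choice not made in the text; it only removes the scalar factor so the fixed point is easy to see.

```python
def mat_vec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def transpose(m):
    """Return the transpose of matrix m."""
    return [list(col) for col in zip(*m)]


def scale(x):
    """Divide by the largest entry (illustrative normalization); leave all-zero vectors unchanged."""
    top = max(x)
    return [xi / top for xi in x] if top > 0 else x


def hits(a, u, steps=3):
    """Repeat the two HITS updates: v = A^T u (authorities), then u = A v (hubs)."""
    a_t = transpose(a)
    v = [0.0] * len(a)
    for _ in range(steps):
        v = scale(mat_vec(a_t, u))
        u = scale(mat_vec(a, v))
    return v, u


# A[i][j] = 1 when node i+1 links to node j+1: here 1 -> 3 and 2 -> 3.
A = [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]]
authority, hub = hits(A, [1, 1, 1])
print(authority)  # [0.0, 0.0, 1.0]
print(hub)        # [1.0, 1.0, 0.0]
```

Without the scaling, the unnormalized vectors after the first step are exactly (0, 0, 2) and (2, 2, 0), as computed above; later steps only multiply them by constants.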

3.2 Detecting Communities in Social Networks

• Networks are used to represent real-world entities and the relationships between them. For example, in a social network people are connected by friendships or co-authorships. Most real social networks contain two kinds of regions: denser parts and sparser parts.
• In a denser sub-network, a group of people are closely connected to each other. Such dense sub-networks are called communities.
• Detecting communities in a given social network is practically important for the following reasons:
1. Communities are used for information recommendation, because members of a community tend to have similar preferences and tastes.
2. Communities help us understand the structure of a given social network.
3. Communities play an important role when we visualize large-scale social networks.
• Communities, also known as modules or clusters, are sets of nodes that are more densely connected to each other than to the rest of the network, and they are believed to be intrinsic structures of naturally occurring networks.
• Nodes in the same community often share interesting properties such as a common function, interest, or purpose. Thus, community detection is one of the most important problems in network analysis.

TECHNICAL PUBLICATIONS® - An up thrust for knowledge



Why Community Detection?

1. Communities in a citation network might represent related papers on a single topic;
2. Communities on the web might represent pages on related topics;
3. A community structure can be considered a summary of the whole network, which makes the network easier to visualize and understand;
4. Communities can sometimes reveal properties of a network without releasing individuals' private information.

3.3 Definition of Community

• Definitions of community fall into three categories:
1. Local definitions
2. Global definitions
3. Definitions based on vertex similarity.

3.3.1 Local Definition

• Local definitions focus on the vertices of the sub-network under investigation and on its immediate neighborhood.
• Local definitions of community are divided into two types: self-referring ones and comparative ones.
• Examples of self-referring definitions are the clique, the n-clique and the k-plex.
1. Clique: A maximal sub-network in which each vertex is adjacent to all the others.
2. n-clique: A maximal sub-network in which the distance between each pair of vertices is not larger than n.
3. k-plex: A maximal sub-network in which each vertex is adjacent to all the others except at most k of them.
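To make the three self-referring conditions concrete, the sketch below tests whether a given vertex set satisfies each one. It is a minimal illustration, assuming an undirected graph stored as a Python dict that maps each vertex to its set of neighbors. It checks only the adjacency or distance condition, not maximality, and for the n-clique it assumes distances are measured in the whole graph (a common convention that the text does not state). The example graph and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distances_from(graph, source):
    """Breadth-first search: shortest-path length from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def is_clique(graph, nodes):
    """Every pair of vertices in `nodes` is adjacent."""
    return all(b in graph[a] for a, b in combinations(nodes, 2))


def is_n_clique(graph, nodes, n):
    """Every pair of vertices in `nodes` is at distance <= n (measured in the whole graph)."""
    for a in nodes:
        dist = distances_from(graph, a)
        if any(dist.get(b, float("inf")) > n for b in nodes if b != a):
            return False
    return True


def is_k_plex(graph, nodes, k):
    """Each vertex is adjacent to all other members except at most k of them."""
    nodes = set(nodes)
    return all(len(nodes - {v} - graph[v]) <= k for v in nodes)


# Hypothetical example: the 4-cycle 1-2-3-4-1 plus the chord 1-3.
G = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_clique(G, {1, 2, 3}))           # True
print(is_clique(G, {1, 2, 3, 4}))        # False (2 and 4 are not adjacent)
print(is_n_clique(G, {1, 2, 3, 4}, 2))   # True
print(is_k_plex(G, {1, 2, 3, 4}, 1))     # True
```

The set {1, 2, 3, 4} fails the clique test but passes the 2-clique and 1-plex tests, which shows how the two relaxed definitions admit slightly sparser groups.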
• Examples of comparative definitions are the LS set and the weak community.
• LS set: A sub-network in which each vertex has more neighbors inside the sub-network than outside it.
• Weak community: A sub-network in which the total internal degree of its vertices (counting only edges to other vertices inside it) exceeds the number of edges lying between the sub-network and the rest of the network.
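Both comparative definitions can be checked by counting, for each vertex, its neighbors inside and outside a candidate sub-network. A minimal sketch, assuming the dict-of-neighbor-sets representation, reading "total degree of the vertices inside the community" as the sum of internal degrees, and using a hypothetical example graph and hypothetical function names:

```python
def internal_external_degrees(graph, community):
    """Return {vertex: (neighbors inside community, neighbors outside)} for each member."""
    community = set(community)
    return {v: (len(graph[v] & community), len(graph[v] - community))
            for v in community}


def is_ls_set(graph, community):
    """LS set as defined in the text: every vertex has more neighbors inside than outside."""
    degs = internal_external_degrees(graph, community)
    return all(k_in > k_out for k_in, k_out in degs.values())


def is_weak_community(graph, community):
    """Total internal degree exceeds the number of edges leaving the community."""
    degs = internal_external_degrees(graph, community)
    total_in = sum(k_in for k_in, _ in degs.values())
    boundary_edges = sum(k_out for _, k_out in degs.values())
    return total_in > boundary_edges


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
     4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(is_ls_set(G, {1, 2, 3}))          # True
print(is_weak_community(G, {1, 2, 3}))  # True
print(is_ls_set(G, {2, 3, 4}))          # False
```

Every LS set is also a weak community, since summing the per-vertex inequality over all members gives the weak condition; the converse does not hold.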
• Fig. 3.3.1 shows a graph and a listing of the cliques contained in it. Each listed sub-graph is in fact a clique, and there are no other cliques in the graph. Notice that cliques in a graph may overlap: the same node or set of nodes might belong to more than one clique.
Cliques = {1, 2, 3}, {1, 3, 5} and {3, 4, 5, 6}
• For example, in the figure node 3 belongs to all three cliques. Also, there may be nodes that do not belong to any clique (for example, node 7). However, no clique can be entirely contained within another clique, because if it were, the smaller clique would not be maximal.
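Listing the cliques of a graph like the one in Fig. 3.3.1 can be automated with the Bron-Kerbosch algorithm, a standard maximal-clique enumeration technique that the text itself does not describe. Because the figure's edge list is not reproduced here, the graph below is a hypothetical reconstruction chosen so that its maximal cliques of three or more nodes are exactly {1, 2, 3}, {1, 3, 5} and {3, 4, 5, 6}, with node 7 attached by a single edge; the minimum size of three is also an assumption.

```python
def bron_kerbosch(graph, r, p, x, out):
    """Bron-Kerbosch with pivoting: record every maximal clique that extends r.

    r: current clique, p: candidates that can extend r, x: vertices already handled.
    """
    if not p and not x:
        out.append(r)
        return
    pivot = max(p | x, key=lambda u: len(graph[u] & p))
    for v in list(p - graph[pivot]):
        bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v], out)
        p = p - {v}
        x = x | {v}


def maximal_cliques(graph, min_size=3):
    """Return the sorted maximal cliques with at least min_size vertices."""
    out = []
    bron_kerbosch(graph, set(), set(graph), set(), out)
    return sorted(sorted(c) for c in out if len(c) >= min_size)


# Hypothetical edge list consistent with the cliques listed for Fig. 3.3.1.
edges = [(1, 2), (1, 3), (2, 3), (1, 5), (3, 5),
         (3, 4), (3, 6), (4, 5), (4, 6), (5, 6), (6, 7)]
G = {v: set() for e in edges for v in e}
for a, b in edges:
    G[a].add(b)
    G[b].add(a)

print(maximal_cliques(G))  # [[1, 2, 3], [1, 3, 5], [3, 4, 5, 6]]
```

The output reproduces the overlap noted above (node 3 appears in all three cliques), and node 7 appears in none.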

Fig. 3.3.1 : Graph and cliques

3.3.2 Global Definitions

• A global definition characterizes a sub-network with respect to the network as a whole. It starts from a null model, i.e. a network that matches the original network in some of its topological features but does not display community structure.
• The linking properties of sub-networks of the original network are then compared with those of the corresponding sub-networks in the null model. If there is a wide difference between them, the sub-networks are regarded as communities.
• A null model is designed by introducing randomness into the distribution of edges among vertices. The most popular null model is the one proposed by Newman and Girvan.
• This null model consists of a randomized version of the original network, in which edges are rewired at random under the constraint that each vertex keeps its degree. It is the basic concept behind the definition of modularity, a function that evaluates the goodness of partitions of a network into communities.
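One common way to sample from this degree-preserving null model is repeated double-edge swapping: pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), rejecting swaps that would create self-loops or duplicate edges. The sketch below is a minimal illustration under stated assumptions (an undirected simple graph, a fixed number of swap attempts, a hypothetical example graph and function names); it is not necessarily the exact procedure used by Newman and Girvan.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph with double-edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    so every vertex keeps exactly the same degree.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try the other pairing of endpoints as well
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Count how many edge ends touch each vertex."""
    return Counter(v for e in edges for v in e)


# Hypothetical graph: two triangles joined by the edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
rewired = degree_preserving_rewire(edges, n_swaps=100)
print(degrees(rewired) == degrees(edges))  # True
```

Comparing, say, the number of edges inside {1, 2, 3} in the original graph with the average over many rewired copies is the kind of comparison the global definition describes.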

3.3.3 Definitions Based on Vertex Similarity

• The last type of definition assumes that communities are groups of vertices that are similar to each other. A quantitative similarity measure is therefore computed for each pair of vertices.
• Similarity measures are the basis of hierarchical clustering, a way to find several layers of communities composed of vertices that are similar to each other.
• Repeatedly merging similar vertices according to a quantitative similarity measure generates a structure like the one shown in Fig. 3.3.2. This structure is called a dendrogram.
• Hierarchical clustering decomposes the data objects into several levels of nested partitions (a tree of clusters), called a dendrogram. A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
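The merge-and-cut procedure can be sketched as follows. Several choices here are assumptions, not taken from the text: vertex similarity is the Jaccard overlap of closed neighborhoods, clusters are merged by single linkage (processing vertex pairs from most to least similar and joining them with a union-find structure), and "cutting the dendrogram" keeps only the merges whose similarity is at or above a chosen threshold. The example graph and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(graph, u, v):
    """Vertex similarity: overlap of closed neighborhoods (each vertex counts as its own neighbor)."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def merge_history(graph):
    """Single-linkage dendrogram as a list of (similarity, u, v) merges, most similar first."""
    pairs = sorted(((jaccard(graph, u, v), u, v)
                    for u, v in combinations(sorted(graph), 2)),
                   key=lambda t: -t[0])
    parent = {v: v for v in graph}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    merges = []
    for sim, u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:  # this pair joins two different clusters
            parent[ru] = rv
            merges.append((sim, u, v))
    return merges


def cut(graph, merges, threshold):
    """Cut the dendrogram: apply only merges with similarity >= threshold, return the clusters."""
    parent = {v: v for v in graph}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for sim, u, v in merges:
        if sim >= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for v in graph:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
     4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
merges = merge_history(G)
print(cut(G, merges, 0.5))  # [[1, 2, 3], [4, 5, 6]]
print(cut(G, merges, 0.9))  # [[1, 2], [3], [4], [5, 6]]
```

Raising the threshold corresponds to cutting lower in the dendrogram and yields finer communities; lowering it eventually merges everything into one cluster.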

• Highly similar vertices are joined in the lower part of the dendrogram. Subtrees obtained by cutting the dendrogram with a horizontal line correspond to communities, and communities of different granularity are obtained by changing the position of the horizontal line.

Fig. 3.3.2 : Dendrogram


• The horizontal axis of the dendrogram represents the distance or dissimilarity between clusters. The vertical axis represents the objects and clusters.
• Each joining of two clusters is represented on the graph by the splitting of a horizontal line into two horizontal lines. The horizontal position of the split, shown by the short vertical bar, gives the distance (dissimilarity) between the two clusters.
• A cross-section of the tree at any level, as indicated by the dotted line, gives the communities at that level.

3.4 Evaluating Communities

• Various methods are used to partition a given network into communities. It is necessary to establish which partition exhibits a real community structure.
• Quality functions help in finding good partitions. The most popular quality function is modularity.
• Newman and Girvan were among the first to address this issue and proposed modularity to quantify the strength of community structure.
• This metric, based on the intuition that nodes within the same community should be more tightly connected than they would be by chance, has been adopted for a variety of uses, including the validation and comparison of community structures, but also as an objective function for optimization algorithms that identify communities.


• Fig. 3.4.1 shows a small network with community structure. In this case there are three communities, denoted by the dashed circles, which have dense internal links but only a lower density of external links between them.

Fig. 3.4.1 : Small network with community structure


• A graph can be split into communities in numerous ways, i.e. each graph has many possible community structures. In the simplest case, a community structure is defined as a partition of the graph into a set of node sets C = {C_i}.
• To provide a measure of the quality of a community structure, we make use of modularity.
• Modularity quantifies the extent to which a given partition of a graph into communities has systematically more intra-community links than the same community structure would have if the links were rewired at random while each node keeps its degree (the Newman-Girvan null model of Section 3.3.2).
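• Modularity (Q) can be defined in several ways. One common form, for a partition into k modules, is:

$$Q = \sum_{i=1}^{k} \left( e_{ii} - a_i^{2} \right)$$

where $e_{ii}$ is the probability that an edge lies in module i (the fraction of edges with both ends in module i), and $a_i^{2}$ is the probability that a random edge would fall into module i ($a_i$ being the fraction of edge ends attached to vertices in module i).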
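The formula can be evaluated directly from an edge list and a partition. A minimal sketch, assuming an undirected graph without self-loops, with $e_{ii}$ computed as the fraction of edges inside module i and $a_i$ as the fraction of edge ends attached to module i; the two-triangle graph, the two partitions and the function name are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Q = sum_i (e_ii - a_i^2) for an undirected graph given as an edge list.

    membership maps each vertex to its module label.
    """
    m = len(edges)
    e_ii = defaultdict(float)  # fraction of edges inside module i
    a_i = defaultdict(float)   # fraction of edge ends attached to module i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            e_ii[cu] += 1 / m
        a_i[cu] += 1 / (2 * m)
        a_i[cv] += 1 / (2 * m)
    return sum(e_ii[c] - a_i[c] ** 2 for c in set(membership.values()))


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
good = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
bad = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}
print(round(modularity(edges, good), 3))  # 0.357
print(round(modularity(edges, bad), 3))   # -0.214
```

For the natural split, each module holds 3 of the 7 edges and half of the edge ends, so Q = 2(3/7 - 1/4) ≈ 0.357; the scrambled partition keeps only 2 of 7 edges inside modules and scores negative.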
• Another View of Modularity


• Modularity measures the strength of a community partition by taking into account the degree distribution. A larger value indicates a better community structure.
• One advantage of modularity is that it can be computed using only the connectivity of the network, in the absence of any node labels or other information. However, this property can also be considered a weakness, because modularity is unable to incorporate metadata (e.g. node labels) even when it is available.
• Modularity measures internal rather than external connectivity, but it does so with reference to a randomized null model.
• Modularity can be either positive or negative. Positive values indicate the possible presence of community structure.

3.5 Methods for Community Detection and Mining

• The classical methods for dividing given networks into sub-networks are graph partitioning, hierarchical clustering, and k-means clustering.
• All these methods depend on knowing the number of clusters or their size in advance. It is therefore necessary to find suitable methods that can extract complete information about the community structure of networks.
• The methods for detecting communities are roughly classified into the following categories:
1. Divisive algorithms
2. Modularity optimization
3. Spectral algorithms

3.5.1 Divisive Algorithm

• A simple way to identify communities in a network is to find the edges that connect vertices of different communities and remove them, so that the communities become disconnected from each other.
• The Newman-Girvan algorithm has two key features:
1. It iteratively removes edges from the network to split it into communities; the edges to remove are identified using the "betweenness" measure, which counts the number of shortest paths between pairs of nodes that pass through each link.
2. The betweenness values are recalculated after each removal.
• Newman-Girvan algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and they can also be used to study the complex structure of networked systems. Fig. 3.5.1 shows community detection based on edge betweenness.
• It uses the idea that "bridges" between communities must have high edge betweenness: an edge with high betweenness tends to be a bridge between two communities.
• The edge betweenness of an edge is the number of shortest paths between pairs of vertices that run along it. By iteratively removing the edges with the highest betweenness, we can build a hierarchical tree and from it determine the communities.
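The edge-removal loop can be sketched in pure Python. As assumptions: the graph is undirected and unweighted, edge betweenness is computed with Brandes' breadth-first accumulation (when several shortest paths exist, each receives an equal fractional share), and the loop stops as soon as the graph splits into more connected components than it started with; recording every split instead would produce the full hierarchical tree. The example graph and function names are hypothetical.

```python
from collections import deque


def edge_betweenness(graph):
    """Brandes-style edge betweenness for an undirected, unweighted graph (dict of neighbor sets)."""
    bc = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bc[frozenset((v, w))] += share
                delta[v] += share
    # Each undirected pair was counted once from each endpoint.
    return {e: b / 2 for e, b in bc.items()}


def components(graph):
    """Connected components as sorted lists."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(graph[v] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return sorted(comps)


def girvan_newman_split(graph):
    """Remove the highest-betweenness edge, recomputing each time, until the graph splits."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    n_start = len(components(g))
    while len(components(g)) == n_start and any(g.values()):
        bc = edge_betweenness(g)
        u, v = tuple(max(bc, key=bc.get))
        g[u].discard(v)
        g[v].discard(u)
    return components(g)


# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the bridge 3-4.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
     4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(girvan_newman_split(G))  # [[1, 2, 3], [4, 5, 6]]
```

Here the bridge 3-4 carries all 3 × 3 = 9 shortest paths between the two triangles, far more than any other edge, so it is removed first and the two communities separate.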