A Comparative Study of Social Network Analysis Tools

This document compares social network analysis tools. It defines key concepts used to represent social networks, like one-mode and two-mode graphs. It discusses expected functionalities of SNA tools, including graph visualization, computation of node-level and network-level indicators, and community detection. The document aims to benchmark SNA tools based on these criteria.

International Workshop on Web Intelligence and Virtual Enterprises 2 (2010)

A comparative study of social network analysis tools

David Combe¹*, Christine Largeron¹, Előd Egyed-Zsigmond² and Mathias Géry¹
Université de Lyon
¹ CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000, Saint-Étienne, France
  Université de Saint-Étienne, Jean-Monnet, F-42000, Saint-Étienne, France
  Email: {david.combe, christine.largeron, mathias.gery}@univ-st-etienne.fr
² UMR 5205 CNRS, LIRIS
  7 av J. Capelle, F 69100 Villeurbanne, France
  Email: [email protected]

Abstract. Social networks have developed considerably since the appearance of Web 2.0 platforms. This has led to a growing need for social network mining and social network analysis (SNA) methods and tools, both to provide a deeper analysis of networks and to detect communities for various applications. For this reason, many works have focused on graph characterization and clustering, and several new SNA tools have been developed in recent years. The purpose of this article is to compare some of these tools, which implement algorithms dedicated to social network analysis.

Keywords: Social Network Analysis, Tools, Benchmark, Community detection

* Corresponding author. This work has been partly funded by the Web Intelligence project (région Rhône-Alpes), http://www.web-intelligence-rhone-alpes.org

1. Introduction

The explosion of Web 2.0 (blogs, wikis, content sharing sites, social networks, etc.) opens up new perspectives for sharing and managing information. In this context, among the emerging research fields concerning "Web Intelligence", one of the most exciting is the development of applications specialized in handling the social dimension of the Web. In particular, building and managing virtual communities for Virtual Enterprises requires a new generation of tools integrating social network modeling and analysis.

Several decades ago, the first works on Social Network Analysis (SNA) were carried out by researchers in the Social Sciences who wanted to understand the behaviour and evolution of human networks [34,33]. Several indicators were proposed to characterize the actors as well as the network itself. One of these indicators, for example, is centrality, which can be used in marketing to discover early adopters, or people whose activity is likely to spread information quickly to many others.

Nowadays, the widespread use of the Internet connects a very large number of people. According to the Facebook Factsheet page, there are currently over 500 million active users and, according to Datamonitor, there will be around one billion in 2012. As pointed out in the Gartner study [17], this very significant development of networks gives rise to a growing need for social network mining and social network analysis methods. These methods are needed to provide a deeper comprehension of the network, to detect communities and to study their evolution for applications in areas such as community marketing, social shopping, recommendation mechanisms, personalization filtering or alumni management. For this reason, while many new technologies
(wikis, social bookmarks and social tagging, etc.) and services (GData, Google Friend Connect, OpenSocial, Facebook Beacon, ...) were proposed on the Internet, several new SNA tools have been developed. These tools are very useful both to analyze a social network theoretically and to represent it graphically. They compute different indicators which characterize the network's structure, the relationships between the actors, and the position of a particular actor. They also allow several networks to be compared. The purpose of this article is to present some current major tools and to describe some of their functionalities. A similar comparison has already been done in [22], but from a more statistical perspective. Our comparative survey of state-of-the-art tools for network visualization and analysis focuses on three main points:

– Graph visualization;
– Computation of various indicators providing a local description (i.e. at the node level) or a global description (i.e. on the whole graph);
– Community detection (i.e. clustering).

In order to present the characteristics of the different tools, the main concepts used to represent social networks are defined in the next section. The measures we expect to find in an SNA tool are presented in Section 3. We describe the benchmarking approach and the results of this comparative study in Section 4, and then conclude.

2. Notations

The theoretical framework for social network analysis was introduced in the 1960s. Following the basic idea of Moreno [26], who suggested representing agents by points connected by lines, Cartwright and Harary proposed analyzing this sociogram using graph theory. For this reason, they are considered the founders of modern graph theory for social network analysis [8].

Two types of graphs can be defined to represent a social network: one-mode and two-mode graphs.

2.1. One-mode Graph

When the relationships between actors are considered, the social network can be represented by a graph G = (V, E), where V is the set of nodes (or vertices) associated with the actors, and E ⊆ V × V is the set of edges, which correspond to their relationships. This is the case, for instance, in a classical dataset [35] related to a karate club, where the nodes correspond to the members of the club and the edges describe their friendships. When the relationships are directed, edges are replaced by arcs. Nodes as well as edges can have attributes; in that case, we speak of labeled graphs.

2.2. Two-mode graph

When the relationships between two types of elements are considered, for example the members and the competitions in the karate club, a two-mode graph is best suited to represent the social network. A two-mode graph, also known as a bipartite graph, is a graph with two types of vertices, in which edges are allowed only between nodes of different types.

The most common way to store two-mode data is a rectangular data matrix with the two node types in rows and columns respectively. For example, a two-dimensional matrix with the actors in rows and the events in columns can represent a two-mode graph for the karate club. This representation is very common in SNA [19]. Two-mode graphs can be transformed into one-mode graphs by projecting onto one node type and creating edges between these nodes using different aggregation functions.

The concept of graph can be generalized by a hypergraph, in which an edge can connect any number of vertices. A multigraph is a graph which is permitted to have several edges with the same end nodes.

In the following sections, we denote by |V| and |E| the number of vertices and edges in G, and by deg(v) the degree of node v, i.e. the number of edges incident to v.

3. Expected functionalities of network analysis tools

This work focuses on different functionalities provided by network analysis tools: first, the visualization of the network; second, the computation of statistics based on nodes and on edges; and finally, community detection (or clustering).

3.1. Visualization

Visualization is one of the most sought-after functionalities in graph handling programs, and this holds true for network analysis software.
[Figures 1 and 2: drawings of the Zachary's Karate Club network, nodes labelled 1 to 34]
Fig. 1. Visualization of Zachary's Karate club using the igraph library and spring layout
Fig. 2. Visualization of Zachary's Karate club using the Pajek application and Kamada-Kawai layout
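Fig. 1 uses a spring (force-directed) layout. To make the idea concrete, here is a minimal plain-Python sketch in the spirit of Fruchterman-Reingold [15]:

- every pair of nodes repels with force k²/d, like charged particles;
- every edge pulls its endpoints together with force d²/k, like a spring;
- each move is capped by a "temperature" that shrinks at each iteration.

This is an illustration only, not the code used by igraph, Pajek or Gephi. The function name `spring_layout` is hypothetical. The starting temperature, the 0.95 cooling factor and the clamping to a unit frame are arbitrary simplifications.

```python
import math
import random


def spring_layout(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Fruchterman-Reingold-style force-directed layout (illustrative sketch).

    nodes: list of hashable node ids; edges: list of (u, v) pairs.
    Returns a dict mapping each node to an (x, y) position in the frame.
    """
    if not nodes:
        return {}
    rng = random.Random(seed)  # fixed seed: same input gives the same drawing
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10.0  # maximum move per iteration, cooled over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion k^2 / d between every pair of nodes (charged particles).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                f = k * k / dist
                disp[u][0] += dx / dist * f
                disp[u][1] += dy / dist * f
                disp[v][0] -= dx / dist * f
                disp[v][1] -= dy / dist * f
        # Attraction d^2 / k along every edge (springs).
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            f = dist * dist / k
            disp[u][0] -= dx / dist * f
            disp[u][1] -= dy / dist * f
            disp[v][0] += dx / dist * f
            disp[v][1] += dy / dist * f
        # Move each node by at most `temperature`, clamped to the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))
        temperature *= 0.95  # geometric cooling (an arbitrary choice)

    return {n: (x, y) for n, (x, y) in pos.items()}


if __name__ == "__main__":
    # A 4-cycle (square graph): prints one position per node.
    cycle = spring_layout([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
    for node, (x, y) in cycle.items():
        print(node, round(x, 3), round(y, 3))
```

The all-pairs repulsion loop makes each iteration quadratic in the number of nodes. This is one reason why, as noted below, such methods need several iterations and become costly on large graphs.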

Many algorithms consist in pushing isolated vertices toward empty spaces and grouping adjacent nodes. These algorithms are directly inspired by physical phenomena: for example, edges can be seen as springs and nodes can be handled as electrically charged particles. The location of each element is recalculated step by step. These methods require several iterations to provide a good result on large graphs. Force-based layouts are simple to develop but are prone to poor local minima (see Fig. 1).

Among these algorithms, we can mention Fruchterman-Reingold, a widely used force-based algorithm for graph visualization [15]; an example is provided in Fig. 3. An alternative is the Kamada-Kawai algorithm [24] (see Fig. 2), which converges faster than Fruchterman-Reingold but often gives less satisfying results. One option is therefore to use Kamada-Kawai to compute a first placement of the vertices. These two methods belong to the so-called "spring algorithms".

Some other layouts differ in the way they provide a view of a node's neighborhood (e.g. radial layout, hyperbolic layout). 3D graph visualization is the logical extension of planar representations, and most of the proposed methods can be adapted to 3D. Local zoom, the so-called fish-eye functionality, can also be useful to explore large graphs visually [16].

3.2. Indicator based network description

Many quantitative indicators have been defined on networks [34,33]. The descriptors at the network level are used to compare the proportion of nodes versus edges, or to evaluate properties of the graph such as randomness or small-world distributions. On the other hand, the descriptors at the node level are useful for detecting nodes strategically placed in the network, or for highlighting those that play an important part in communication, such as bridges or hubs.

3.2.1. Vertex and edge scoring

The place of a given actor in the network can be described using measures based on vertex scoring. The most common vertex scores are the centrality measures. Within graph theory and network analysis, there are various measures of the centrality of a vertex, which determine the relative importance of this vertex within the graph. For example, to measure how important a person is within a social network, Freeman [14] distinguished three main centralities:

a) Degree centrality: The first and simplest measure is degree centrality. It emphasizes nodes with high degrees [28]. In directed graphs, we can distinguish:

– incoming degree of a vertex v:

    N^{+}(v) = |\{\, i \in V : (i, v) \in E(G) \,\}|    (1)

– outgoing degree:

    N^{-}(v) = |\{\, i \in V : (v, i) \in E(G) \,\}|    (2)

– degree centrality [14]:

    C_D(v) = \frac{\deg(v)}{|V| - 1}    (3)

b) Closeness centrality: For connected graphs, closeness centrality is the inverse of the average distance
to all other nodes. This indicator can be useful for many real-world applications. For instance, if edges were streets, the crossroad (vertex) with the highest closeness centrality would be the best place for emergency services. Closeness centrality is defined by:

    C_C(v) = \frac{|V| - 1}{\sum_{u \in V,\, u \neq v} d(v, u)}    (4)

where d(v, u) is a distance, for example the number of edges in the shortest path between the two nodes or, in weighted graphs, the sum of the weights of these edges.

c) Betweenness centrality: Betweenness centrality is another centrality measure of a vertex within a graph. Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not [14]. Ulrik Brandes proposed an improved algorithm for this indicator, with a running time of O(|V|·|E|) on unweighted graphs [6]. The betweenness of vertex v is defined by:

    B_C(v) = \sum_{(u,w) \in V \times V,\ u \neq w,\ u \neq v,\ w \neq v} \frac{\sigma_{uw}(v)}{\sigma_{uw}}    (5)

where σ_uw is the number of shortest paths from node u to node w, and σ_uw(v) is the number of those shortest paths that pass through v. Analogously, betweenness can also be defined for an edge e:

    B_{EC}(e) = \sum_{(v,w) \in V \times V,\ v \neq w} \frac{\sigma_{vw}(e)}{\sigma_{vw}}    (6)

where σ_vw is the number of shortest paths from node v to node w, and σ_vw(e) is the number of those shortest paths that pass through e.

There is also another type of centrality measure, eigenvector centrality, which measures the importance of a node in a network. It is based on the principle that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes.

These different measures can also be calculated on directed graphs. For directed graphs, other measures can be defined, for instance PageRank or HITS.

d) PageRank: The score computed by PageRank [7] is higher for nodes that are highly connected and connected to nodes that are highly connected themselves. If L(v) is the number of links going out of page v, then the PageRank PR(u) of vertex u can be defined as:

    PR(u) = (1 - d) + d \sum_{v : (v,u) \in E} \frac{PR(v)}{L(v)}    (7)

The parameter d is a damping factor. The PageRank score is iterated until convergence. PageRank is a variant of eigenvector centrality.

e) HITS algorithm: Hyperlink-Induced Topic Search (HITS, also known as hubs and authorities) computes two scores for each vertex, a hub score and an authority score [25]. A vertex gets a high hub score when it points to many vertices with high authority scores. It gets a high authority score when it is pointed to by many vertices with high hub scores. At the beginning, every node is considered both a hub and an authority, and both scores are set to a constant. The scores are then updated at each iteration as follows and, when they are normalized after each step, they converge after a few iterations:

    \forall v,\quad auth(v) = \sum_{u : (u,v) \in E} hub(u)    (8)

    \forall v,\quad hub(v) = \sum_{u : (v,u) \in E} auth(u)    (9)

3.2.2. Network scoring

Network density is the ratio between the number of edges in the network and the number of edges that could exist in it, i.e. 2|E| / (|V|(|V| − 1)) for an undirected graph without loops. This measure shows whether the underlying graph is sparse or dense.

These indicators have since been adapted to directed graphs, which is useful in information dissemination theory. This asymmetry leads to the concept of prestige.

a) Dyad census: A dyad is a term borrowed from sociology, used to describe a group of two people, i.e. the smallest possible social group. By extension, it
is used in social network analysis to designate two interacting nodes. In directed graphs, four states can be observed between two nodes a and b:

– no arc;
– two mutual arcs;
– a to b;
– b to a.

Each dyad is classified into the mutual, asymmetric or null category, and the proportion of each category is provided. These counts help to determine whether the links follow a random or a small-world distribution [21].

b) Triad census: To extend the dyad count, Davis and Leinhardt [11] proposed the triad count, with 16 distinct cases (for directed graphs). Triadic analysis counts the triads in each configuration. Again, this information is useful for comparing a network with the random model.

3.2.3. Graph and vertex similarity

In social network analysis tools, one can expect to find functions expressing the similarity of nodes in a graph, and also functions measuring the similarity between graphs themselves. Some examples of similarity measures available in software are the Jaccard, Dice or Tanimoto similarities.

3.3. Clustering or community detection

The aim of clustering is to detect groups of nodes with dense connections within the groups and sparser connections between the groups. Statisticians and data mining professionals call these groups clusters, while sociologists prefer the word communities. A very complete survey on graph clustering can be found in [13].

3.3.1. Main approaches of community detection

Among the different methods proposed to detect communities, two main approaches can be distinguished. On the one hand, there is the hierarchical approach, in which the nodes are aggregated into a hierarchy of clusters, from the discrete partition to the whole network [23]. This approach evaluates the proximity between two nodes through a similarity measure and builds the groups using an agglomerative strategy, such as the single linkage or the complete linkage algorithm. On the other hand, there is partitional clustering, which consists in directly dividing the network into a predefined number of groups. The minimum cut method is an example of this approach: the groups are defined so that the number of edges between them is minimized.

The software considered in this benchmark includes three clustering methods. The first one is the Newman and Girvan [27] method. This is a hierarchical method based on edge betweenness. It repeatedly removes the edge with the highest betweenness and recomputes the betweenness of the remaining edges, until no edge remains. The second method, called Walktrap [31], uses random walks in the graph to detect the components in which the walker tends to stay. The distance between two vertices is derived from the probabilities for a walker to go from one vertex to another, and a hierarchical clustering is then performed to obtain the clusters.

The last algorithm is called Spinglass [32]. Fig. 3 shows an example of community detection done with the spinglass algorithm of igraph; in this figure, different vertex shapes indicate different communities.

Fig. 3. Community detection with igraph and the spinglass algorithm

With hierarchical methods, a dendrogram (Fig. 9) is the best representation for choosing the number of clusters to retain. Another way to determine the number of groups to retain consists in maximizing a particular criterion such as modularity.

3.3.2. Clustering validation

Modularity is a quality function useful to evaluate a clustering. It was proposed by Newman and
Girvan [27]. Modularity is defined by:

    Q = \frac{1}{2|E|} \sum_{(u,v) \in V \times V} \left( A_{uv} - P_{uv} \right) \delta(C_u, C_v)    (10)

where the pair (u, v) runs over all pairs of vertices. A is the adjacency matrix, where A_uv is 1 if u and v are linked by an edge and 0 otherwise, and P_uv is the expected number of edges between u and v. C_v is the group to which vertex v belongs, and δ is the Kronecker delta, which is 1 if its two arguments are equal and 0 otherwise. The clustering corresponding to a single partition containing the whole graph has a modularity value of zero.

4. Benchmarking

Many tools have been created for network analysis and visualization purposes. A long list of tools, with very different approaches, is available on Wikipedia¹. Many are purely academic software. Some are oriented toward visualization. Others consist of APIs allowing graph and hypergraph modeling, sometimes with the possibility of animating vertices, such as JUNG. Some tools are optimized for large data manipulation; others propose low-level implementations of specific algorithms.

For the libraries, this survey relies on their official documentation. We consider four tools: Pajek, Gephi, igraph and NetworkX, which are presented further in this section. They were chosen based on:

– a balance between well-established tools and newer ones based on recent development standards (in terms of ergonomics, modularity and data portability);
– an SNA point of view: the tools must provide basic metrics for networks;
– network size: the networks can reach tens of thousands of nodes.

Pajek is legacy software with its own graph-oriented approach. Gephi represents a modern answer for graph study, with a GUI (graphical user interface), an open source philosophy and a plugin orientation. NetworkX and igraph are two essential libraries for efficient large graph handling. The first can easily be integrated into any project, and the second can be recommended for a MATLAB-like (console-based) approach.

The following sections describe the dataset and the criteria used in the benchmark.

4.1. Dataset

The dataset considered in this survey is widely used in the SNA literature. It presents the affiliation graph between 34 members of the karate club of a US university in 1970. Zachary's Karate Club² has 34 vertices and 78 edges. Each vertex is numbered. An edge is present between two nodes when the two corresponding individuals "consistently interacted in contexts outside those of karate class, workouts and club meetings" [35].

¹ http://en.wikipedia.org/wiki/Social_network_analysis_software
² The dataset is available at http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm (as of April 5th, 2010).

4.2. Evaluated criteria

In our benchmark, we selected a set of evaluation criteria:

– the license of the tool;
– the data formats handled;
– the graph types supported;
– the number of nodes that can be loaded in a reasonable time;
– the available indicators;
– the clustering algorithms included;
– the visualization layouts available.

Each criterion is detailed in the following sections.

4.2.1. File formats

There are mainly three ways to serialize the structure of a network:

– adjacency matrix (square for directed graphs, triangular for undirected ones);
– adjacency lists (for directed graphs), where each source node is followed by the list of nodes that are the targets of the arcs starting from it;
– vertex pairs.

Several file formats have been created to provide graph representations. Here are the main ones:

a) The Pajek graph file format (.net extension), while not very well documented, is very popular among social network analysis tools (Fig. 4). In a text file, it lists first the vertices (one per line) and then the edges. Implementations other than the Pajek program itself rarely handle this format in full. Pajek allows edge representation with a matrix, an edge list or an arc list (for directed graphs). Weighted networks are allowed: the optional third column holds the arc weights.

    *Vertices 34
    1 "1"
    2 "2"
    ...
    34 "34"
    *Arcs
    1 2 1
    1 3 1
    1 4 1
    ...
    34 31 1
    34 32 1
    34 33 1

Fig. 4. Zachary dataset extract in Pajek .net format

b) GML (Graph Modelling Language) is also a structured text file format. Nodes and edges begin with the "node" and "edge" keywords, and their content is enclosed between "[" and "]". It allows annotations as content, such as coordinates for vertices (see Fig. 5). GML supports:

– directed and undirected graphs;
– node and edge labels;
– graphical placement of nodes (coordinates);
– other annotations.

    Creator "Mark Newman on Fri Jul...2006"
    graph
    [
      node
      [
        id 1
      ]
      node
      [
        id 2
      ]
      ...
      node
      [
        id 34
      ]
      edge
      [
        source 2
        target 1
      ]
      ...
      edge
      [
        source 34
        target 33
      ]
    ]

Fig. 5. Zachary dataset extract in GML format

c) GraphML is an XML-based graph description language (see Fig. 6). As described in its documentation³, it supports:

– directed, undirected, and mixed graphs;
– hypergraphs;
– hierarchical graphs;
– graphical representations;
– application-specific attribute data.

Like all XML-based representations, it is quite verbose.

d) The DL (Data Language) format comes from the Ucinet program [5]. The common extension for this format is .dat. An example is given in Fig. 7. The DL format supports:

– edge representation with a full matrix, a half-matrix, an arc list or an edge list;
– index labels;
– rectangular matrices for two-mode networks.

e) DOT is another popular graph description language, handled mainly by Graphviz [12].

f) GEXF⁴ is an XML-based format from the GEXF Working Group. It supports:

– dynamic graphs;
– application-specific attribute data, through the use of user XML namespaces;
– hierarchical structure (nodes can contain nodes);
– visualization and positioning information such as 3D coordinates, colours and shapes.

³ http://graphml.graphdrawing.org
⁴ http://gexf.net
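Because the Pajek layout in Fig. 4 is a plain text file, a minimal writer and reader are easy to sketch. The Python below handles only the subset shown in Fig. 4:

- a *Vertices section with one quoted label per line;
- an *Arcs (directed) or *Edges (undirected) section with an optional weight column.

It ignores the matrix and list variants, vertex coordinates and other options of the real format. The names `write_pajek` and `read_pajek` are hypothetical.

```python
def write_pajek(num_vertices, arcs, labels=None):
    """Serialize a graph in the minimal Pajek .net layout of Fig. 4 (sketch).

    Vertices are numbered 1..num_vertices. `arcs` holds (source, target) or
    (source, target, weight) tuples; a missing weight is written as 1.
    """
    labels = labels or {}
    lines = [f"*Vertices {num_vertices}"]
    for v in range(1, num_vertices + 1):
        lines.append(f'{v} "{labels.get(v, v)}"')
    lines.append("*Arcs")
    for arc in arcs:
        weight = arc[2] if len(arc) > 2 else 1  # optional third column
        lines.append(f"{arc[0]} {arc[1]} {weight}")
    return "\n".join(lines) + "\n"


def read_pajek(text):
    """Parse the subset produced by write_pajek: one quoted label per vertex
    line (no coordinates) and one arc or edge per line."""
    labels, links, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section keyword: *Vertices, *Arcs (directed) or *Edges (undirected).
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            parts = line.split(maxsplit=1)  # number, then the (quoted) label
            labels[int(parts[0])] = parts[1].strip('"') if len(parts) > 1 else parts[0]
        elif section in ("*arcs", "*edges"):
            parts = line.split()
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            links.append((int(parts[0]), int(parts[1]), weight))
    return labels, links


if __name__ == "__main__":
    text = write_pajek(3, [(1, 2), (2, 3, 2.5)])
    print(text, end="")
    print(read_pajek(text))
```

A round trip through both functions returns the vertex labels and the weighted links. Unweighted links come back with weight 1.0.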
8 Combe et al. / A comparative study of social network analysis tools

<?xml version="1.0" encoding="UTF-8"?> are dedicated to social network analysis. The covered
<graphml xmlns="http://graphml.
functionalities are:
graphdrawing.org/xmlns"
.. – tnet [30] for weighted, two-mode, and longitudi-
.
nal networks (networks study over the time) anal-
<graph id="G" ysis,
edgedefault="undirected"> – statnet [18] for statistical analysis of social net-
<node id="1"/> works,
<node id="2"/>
– sna includes node and graph-level indices, struc-
<edge id="e1" source="1"
tural distance and covariance methods, structural
target="2"/>
.. equivalence detection, theoretic models fitting,
. random graph generation, and 2D/3D network vi-
</graph> sualization
</graphml>
These packages are available on the Comprehensive R
Archive Network9 .
Fig. 6. Zachary dataset extract in GraphML format
igraph and NetworkX are two libraries suitable for
social network analysis within the R environment. Net-
DL
workX can also be called in Python programs.
N=34 NM=2
FORMAT = FULLMATRIX DIAGONAL PRESENT
LEVEL LABELS: 4.4. Benchmarking results
ZACHE
ZACHC The benchmarking results are summarized in Ta-
DATA: ble 1.
0 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 1 0 0 They are detailed in this section, following the eval-
1 0 1 1 0 0 0 1 ... 0 0 0 0 0 1 0 0 0 uation criteria introduced previously (see 4.2): the li-
1 1 0 1 0 0 0 1 ... 0 0 1 1 0 0 0 1 0
cense of the tool, the data format handled, the graph
1 1 1 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0
types supported, the available indicators, the cluster-
ing algorithms included and the visualization layouts
.. available.
.
The first point is licensing. It appears that NetworkX
Fig. 7. Zachary dataset in DAT format has the most permissive license, allowing integration
in proprietary software. Both igraph and Gephi have
4.3. Evaluated tools chosen GNU GPL which does not allow the integration
in proprietary software. Pajek source code is undis-
Two libraries and 2 stand-alone programs have been closed and the use of the software for commercial use
compared here: is not free. In matter of data format, Gephi handles
all the formats mentioned here. GEXF is not available
– Pajek 5
elsewhere mainly because this format started in the
– Gephi 6
Gephi project. DL comes with UCINET ; this last one
– igraph 7
being a project linked to Pajek, it is one of the pre-
– NetworkX 8
ferred formats for this tool. GML and GraphML are
The two libraries, igraph and NetworkX, use a gen- not supported in Pajek, so you can prefer the .net for-
eral purpose environment called R. mat, which is universal in our panel.
Concerning the bipartite graphs study and their ma-
The R environment It is dedicated to statistics. It is
nipulation, most tools propose a few primitives, such
organized into many packages amongst which, some
as projection (conversion of a bipartite graph into a
one-mode graph), but we would not recommend Gephi
5 http://pajek.imfm.si/doku.php
for that as two-modes graphs is not strictly two-mode
6 http://gephi.org
7 http://igraph.sourceforge.net
8 http://networkx.lanl.gov/ 9 http://cran.r-project.org/
Combe et al. / A comparative study of social network analysis tools 9

Software Pajek [4] Gephi [3] NetworkX [20] igraph [9]


Version 1.26 0.7 alpha 0.6 0.5.3
Type Stand-alone software Stand-alone software Library Library
Platform Windows Java Python R / Python / C libraries
License Free for non-commercial GNU GPL BSD License GNU GPL
use
Expectable computing time Fast (C) Medium (Java) Fast (C, Python) Fast (C)
Tractable number of nodes 500,000 nodes 150,000 nodes 1,000,000 nodes > 1.9 million relations
(without attributes)
Time to load 105 nodes 24 seconds 40 seconds 137 seconds 11 seconds
and 106 edges
File formats
GML No Yes Yes Yes
Pajek (.net) Yes Import only Yes Yes
GraphML Export only Yes Yes Yes
DL Yes Yes No No
GEXF No Yes No No
Graph types
Two-mode graphs Yes No Yes Yes
Multi-relational graphs Yes No No No
Temporality Yes No Yes No
Visualization layouts
Fruchterman Reingold Yes Yes No Yes
Kamada Kawai Yes Yes No Yes
Other spring layouts No Yes Yes Yes
Indicators
Degree centrality Yes Yes Yes Yes
Betweenness centrality Yes Yes Yes Yes
Closeness centrality Yes Yes Yes Yes
Dyad census No No No Yes
Triad census Yes No No Yes
HITS No Yes Yes Yes
Page Rank No Yes Yes Yes
Clustering algorithms
Edge betweenness No No No Yes
Walktrap No No No Yes
Spinglass No No No Yes
Dendogram display Yes Yes No Yes

Table 1. Features and availability of the main algorithms in the retained software

graph enabled. Pajek can handle links from different virtual machine pop up. The visualization pane is an
kinds. The temporality starts being taken into account important part of Gephi, while the other tools can pro-
in different projects. For now, the data can be filtered in cess indicators independently of drawing the Graph.
function of a year associated to the nodes for example, Such an architecture could penalize the application for
if the data format is adapted. this criterion. Pajek does not suffer for this point and
The tool appearing as the less efficient in matter of can load 500,000 in 52 minutes. igraph is very fast for
allowed vertices in memory is Gephi. After 200,000 data loading (22 seconds for 2.9 millions of nodes, but
nodes on our reference computer (Intel Core 2 Duo the dataset was attribute-free (no name for nodes pro-
2.5 GHz, 2 Go RAM, Windows), some errors or mes- vided, as .net import is quite restricted for this tool).
sages invite to increase the dedicated memory for the Gephi and NetworkX appears to be limited in their ca-
10 Combe et al. / A comparative study of social network analysis tools

pacity by the RAM consumption. NetworkX is quite


slow for loading 100,000 nodes, but the loading is rea-
sonable beyond. Some features such as management
of multi-graphs can be the cause of degraded perfor-
mance.
The four tools are suitable for computing common indicators, such as graph statistics, degree centrality, closeness centrality and betweenness centrality (the igraph and NetworkX implementations of betweenness centrality are based on the algorithm of Brandes [6]). Dyad census is available only in igraph, and triad census in both igraph and Pajek. For HITS and PageRank you cannot rely on Pajek, which is not up to date in this respect. If you need to create your own indicators, the two libraries and Gephi are useful.
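Since igraph and NetworkX both base their betweenness centrality on Brandes' algorithm [6], a compact sketch of that algorithm may help make the indicator concrete. This minimal Python version (the function name betweenness is hypothetical, and it is not either library's own code) assumes an unweighted graph given as an adjacency dictionary and returns unnormalized scores: one breadth-first search per source counts shortest paths, then dependencies are accumulated in reverse order of distance.

```python
from collections import deque

def betweenness(adj, directed=False):
    """Unnormalized betweenness centrality via Brandes' algorithm (unweighted).

    adj maps each node to an iterable of its neighbours; for an undirected
    graph every edge must be listed in both directions.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Single-source BFS, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if not directed:
        # Each unordered pair was counted once from each endpoint.
        bc = {v: c / 2 for v, c in bc.items()}
    return bc

# Path a - b - c: only b lies between the other two nodes.
print(betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
# {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

The single pass per source gives O(nm) time on unweighted graphs, which is why this approach is preferred over enumerating all shortest paths explicitly.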
Community detection is experimental in Gephi, with a beta version of the Markov cluster algorithm (MCL), while several algorithms (edge betweenness, Walktrap and spinglass, see Table 1) are available in igraph. Pajek offers hierarchical clustering capabilities and can export a dendrogram of a hierarchical clustering as an EPS (PostScript) image. igraph relies on the dendrogram plotting capabilities of R: as demonstrated on the igraph website10, the few lines in Fig. 8 produce Fig. 9. Gephi, Pajek and igraph all give a dendrogram representation of the communities obtained. When relating the visualization of Fig. 2 to the dendrogram of Fig. 9, keep in mind that igraph numbers nodes from 0.
10 http://igraph.sourceforge.net
Concerning visualization layouts, NetworkX lacks basic algorithms; if you need advanced visualization, you have to move your data to another platform. The three other tools implement the popular Fruchterman-Reingold and Kamada-Kawai force-based algorithms.

library(igraph)
# Read the Zachary karate club network from a Pajek .net file
g <- read.graph("karate.net", format="pajek")
# Walktrap community detection, keeping the modularity values
wt <- walktrap.community(g, modularity=TRUE)
# Convert the merge tree into an R dendrogram and plot it
dend <- as.dendrogram(wt, use.modularity=TRUE)
plot(dend, nodePar=list(pch=c(NA, 20)))

Fig. 8. Plotting a dendrogram with Walktrap and igraph

Fig. 9. Dendrogram of the Walktrap algorithm results on the Zachary dataset (igraph website example)

NetworkX is well documented and interesting, but it lacks clustering algorithms. Nodes and edges can be any kind of object (the only condition is to provide a hash function for it). Using a programming language makes it easy to redefine objects such as nodes in order to handle them as arbitrary objects. It also has some interesting functions for bipartite graphs.
igraph offers many algorithms, some of which are clustering-oriented. It is available for both Python and R environments, and as a C library as well. With R, it is easy to integrate igraph routines into a statistical process. A graphical user interface exists which offers easy visualization and some basic analysis functions. igraph is performance-oriented and the majority of its functionalities are implemented in C. 3D visualization layouts are available. It offers some node-level neighborhood similarity indexes, such as Jaccard, Dice and the inverse log-weighted similarity [1].
Pajek is closed-source software. It is a fast tool and comfortable for some visualization purposes. It is not as extensible as the three other tools studied. Nevertheless, Pajek is useful for hierarchical data manipulation and provides powerful and accessible data manipulation functions. 3D visualization and export to VRML are also available.
Gephi is a fairly new tool and is updated frequently. Many functionalities are already supported, but several algorithms are missing. Its ergonomics make Gephi easy to use. The rendering is highly customizable and quite fast, and vertices can be moved while layout algorithms are running.
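The neighborhood similarity indexes mentioned for igraph can be written directly from their usual definitions. The sketch below is plain Python on an undirected adjacency dictionary of sets; the function names are hypothetical and are not igraph's API. Jaccard divides the number of shared neighbors by the size of the union of the two neighborhoods, Dice divides twice that number by the sum of the neighborhood sizes, and the inverse log-weighted score of Adamic and Adar [1] sums 1/log(degree) over the shared neighbors, so that rare, low-degree common neighbors count more.

```python
import math

def jaccard(adj, u, v):
    """|N(u) & N(v)| / |N(u) | N(v)| on an undirected adjacency dict of sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def dice(adj, u, v):
    """2 |N(u) & N(v)| / (|N(u)| + |N(v)|)."""
    total = len(adj[u]) + len(adj[v])
    return 2 * len(adj[u] & adj[v]) / total if total else 0.0

def inverse_log_weighted(adj, u, v):
    """Each shared neighbour w contributes 1 / log(deg(w)).

    Neighbours of degree 1 (log 1 = 0) are skipped to avoid division by zero;
    this can only happen when u == v.
    """
    return sum(1 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

# Small undirected graph with edges 1-2, 1-3, 2-3 and 2-4.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
print(jaccard(adj, 1, 4), dice(adj, 1, 4))  # 0.5 0.6666666666666666
```

Storing neighborhoods as sets keeps each index a matter of one intersection and, for Jaccard, one union.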

4.5. Other interesting software for social network analysis
There are many other SNA tools available; we tested some of them, such as:
– GraphViz [12] is dedicated to graph visualization.
– Tulip [10] can handle over 1 million vertices and 4 million edges. It offers visualization, clustering and extension through plug-ins.
– UCInet [5] is not free. It uses Pajek and Netdraw for visualization. It is specialized in statistical and matrix analysis. It calculates indicators (such as triad census and Freeman betweenness) and performs hierarchical clustering.
– JUNG [29], the Java Universal Network/Graph Framework, is mainly developed for creating interactive graphs in Java user interfaces; it has been extended with some SNA metrics.
– GUESS [2] is dedicated to visualization. It is published under the GPL license.
These tools have not been detailed above because they:
– have narrow, specialized functionalities focused on a single aspect, e.g. GUESS on visualization,
– have in practice been replaced by other tools with the same target features and audience (Tulip by Gephi),
– are not focused on a computer science vision,
– are not freely available.
4.6. How to choose the right software for you?
The first question you have to ask is: "How to choose the right software?". If you need standard graph visualization, you are likely to find software that suits you. If your data is not in one of the standardized formats listed above, the best way is to generate a suitable representation from your memory-loaded graph, or to convert it into GML, for example. You can also look at this Wikipedia page11 for a list of the input/output formats supported by a large panel of programs. If your need is specific, or your graph must be handled with specific attributes for vertices and edges, you should take a look at the libraries. To choose a program for visualizing or manipulating graphs, it is advisable to try a few of them to check whether the approach fits your problem. For libraries, the choice depends on your favorite language. Computation time can also be an important criterion; if it is, prefer Python- or C-based software. If you are interested in an interactive console (as in the MATLAB experience), you should definitely try igraph on R.
11 http://en.wikipedia.org/wiki/Social_network_analysis_software
5. Conclusion
The fact that Social Network Analysis lies between several domains (sociology, computer science, mathematics and physics) has led to many different methodological approaches and to many tools. That is why so many programs have been created to manipulate and study social networks11. While stand-alone software is very useful for graph visualization (up to a few thousand nodes at most), data format conversion or indicator computation, libraries are better suited to tasks involving tens of thousands of nodes, to operations such as the union and the difference of node sets, and to clustering. A clean separation of the algorithms, the user interface and the visualization pane is important. Gephi adopted this approach with the recent release of the Gephi toolkit, a library built from the Gephi logic and algorithms.
We can also say that today's freely available tools provide a very rich set of functionalities, but specific analyses may require commercial software or complementary code development.
Finally, the main current challenges in graph exploration concern high-level visualization (e.g. hierarchical graphs). Among the possible enhancements of social network analysis tools, we can mention, first, temporal analysis, which should make it possible to study the evolution of networks over time, and second, social mining, which simultaneously exploits node attributes and graph structure.
References
[1] L.A. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
[2] E. Adar. GUESS: a language and interface for graph exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, page 800. ACM, 2006.

[3] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An Open Source Software for Exploring and Manipulating Networks. In International AAAI Conference on Weblogs and Social Media, pages 361–362, 2009.
[4] V. Batagelj and A. Mrvar. Pajek - program for large network analysis. Connections, 21(2):47–57, 1998.
[5] S.P. Borgatti, M.G. Everett, and L.C. Freeman. Ucinet for Windows: Software for social network analysis. Harvard, MA: Analytic Technologies, 2002.
[6] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25:163–177, 2001.
[7] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107–117, 1998.
[8] D. Cartwright and F. Harary. A graph theoretic approach to the investigation of system-environment relationships. Journal of Mathematical Sociology, 5:87–111, 1977.
[9] G. Csárdi and T. Nepusz. The igraph software package for complex network research. InterJournal Complex Systems, 1695, 2006.
[10] D. Auber. Tulip. Lecture Notes in Computer Science, pages 435–437, 2002.
[11] J.A. Davis and S. Leinhardt. The Structure of Positive Interpersonal Relations in Small Groups. Sociological Theories in Progress, 2:218–251, 1967.
[12] J. Ellson, E. Gansner, L. Koutsofios, S. North, and G. Woodhull. Graphviz - open source graph drawing tools. In Graph Drawing, pages 594–597. Springer, 2001.
[13] S. Fortunato. Community detection in graphs. Physics Reports, page 103, June 2009.
[14] L.C. Freeman. Centrality in social networks: conceptual clarification. Social Networks, 1(3):215–239, 1979.
[15] T.M.J. Fruchterman and E.M. Reingold. Graph Drawing by Force-directed Placement. Software: Practice and Experience, 21(11):1129–1164, 1991.
[16] E.R. Gansner, Y. Koren, and S. North. Topological fisheye views for visualizing large graphs. IEEE Transactions on Visualization and Computer Graphics, pages 457–468, 2005.
[17] Gartner. Hype Cycle for social software, 2008. G00158239, 2008.
[18] S.M. Goodreau, M.S. Handcock, D.R. Hunter, and C.T. Butts. A statnet Tutorial. Journal of Statistical Software, 24(9):1, 2008.
[19] J.-L. Guillaume and M. Latapy. Bipartite structure of all complex networks. Information Processing Letters, 90(5):215–221, 2004.
[20] A.A. Hagberg, D.A. Schult, and P.J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th SciPy Conf., G. Varoquaux, T. Vaught, and J. Millman (Eds), pages 11–15, 2008.
[21] P.W. Holland and S. Leinhardt. A method for detecting structure in sociometric data. American Journal of Sociology, 76(3):492–513, 1970.
[22] M. Huisman and M.A.J. Van Duijn. Software for social network analysis, pages 270–316. 2004.
[23] S.C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, September 1967.
[24] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information Processing Letters, 31(1):7–15, 1989.
[25] J.M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
[26] J.L. Moreno. Who shall survive? New York: Beacon Press, 1934.
[27] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[28] J. Nieminen. On the centrality in a graph. Scandinavian Journal of Psychology, 15(1):332–336, 1974.
[29] J. O'Madadhain, D. Fisher, S. White, and Y. Boey. The JUNG (Java Universal Network/Graph) framework. University of California, Irvine, California, 2003.
[30] T. Opsahl. Structure and Evolution of Weighted Networks. pages 104–122, 2009.
[31] P. Pons and M. Latapy. Computing communities in large networks using random walks. Computer and Information Sciences - ISCIS 2005, pages 284–293, 2005.
[32] J. Reichardt and S. Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):016110, 2006.
[33] J. Scott. Social Network Analysis: A handbook. Newbury Park, CA, Sage Publications, 1994.
[34] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, 1994.
[35] W.W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.
