A Comparative Study of Social Network Analysis Tools
Abstract. Social networks have developed considerably since the appearance of web 2.0 platforms. This has led to a growing need for social network mining and social network analysis (SNA) methods and tools, both to provide deeper analysis of the network and to detect communities for various applications. For this reason, much work has focused on graph characterization or clustering, and several new SNA tools have been developed in recent years. The purpose of this article is to compare some of these tools, which implement algorithms dedicated to social network analysis.
[...] (wikis, social bookmarks and social tagging, etc.) and services (GData, Google Friend Connect, OpenSocial, Facebook Beacon, ...) were proposed on the internet, several new SNA tools have been developed. These tools are very useful to analyze a social network theoretically but also to represent it graphically. They compute different indicators which characterize the network's structure, the relationships between the actors as well as the position of a particular actor. They also allow the comparison of several networks. The purpose of this article is to present some of the current major tools and to describe some of their functionalities. A similar comparison has already been done in [22], but from a more statistical point of view. Our comparative survey of the state-of-the-art tools for network visualization and analysis is focused on three main points:
– Graph visualization;
– Computation of various indicators providing a local (i.e. at the node level) or a global (i.e. on the whole graph) description;
– Community detection (i.e. clustering).
In order to present the characteristics of the different tools, the main concepts used to represent social networks are defined in the next section. The different measures we want to find in a SNA tool are presented in section 3. We describe the benchmarking approach and the results of this comparative study in section 4, and then we conclude.

2. Notations

The theoretical framework for social network analysis was introduced in the 1960s. Following the basic idea of Moreno [26], who suggested representing agents by points connected by lines, Cartwright and Harary proposed to analyze this sociogram using graph theory. For this reason, they are considered as the founders of modern graph theory for social network analysis [8].
Two types of graphs can be defined to represent a social network: one-mode and two-mode graphs.

2.1. One-mode Graph

[...] is the case, for instance, in a classical dataset [35] related to a karate club, where the nodes correspond to the members of the club and the edges describe their friendships. When the relationships are directed, edges are replaced by arcs. Nodes as well as edges can have attributes. In that case, we talk about labeled graphs.

2.2. Two-mode graph

When the relationships between two types of elements are considered, for example the members and the competitions in the karate club, a two-mode graph is best suited to represent the social network. A two-mode graph, also known as a bipartite graph, is a graph with two types of vertices. Edges are allowed only between nodes of different types.
The most common way to store two-mode data is a rectangular data matrix with the two node types respectively in rows and columns. For example, a two-dimensional matrix with the actors in rows and the events in columns can represent a two-mode graph for the karate club. This representation is very common in SNA [19]. Two-mode graphs can be transformed into one-mode graphs by projecting on one node type and creating edges between these nodes using different aggregation functions.
The concept of graph can be generalized by a hypergraph, in which two sets of vertices can be connected by an edge. A multigraph is a graph which is permitted to have several edges with the same end nodes.
In the next sections, |V| and |E| denote the numbers of vertices and edges in G, and deg(v) denotes the degree of the node v, i.e. the number of edges adjacent to v.

3. Expected functionalities of network analysis tools

This work focuses on different functionalities provided by network analysis tools. These functionalities are firstly the visualization of the network, secondly the computation of statistics based on nodes and on edges, and finally community detection (or clustering).
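The projection of a two-mode graph described in section 2.2 can be illustrated with a minimal sketch. It assumes a 0/1 actor-by-event matrix and uses the number of shared events as the aggregation function; the function name project, the member labels and the attendance matrix are hypothetical and are not taken from the karate club dataset.

```python
from itertools import combinations


def project(incidence, row_labels):
    """Project a two-mode 0/1 matrix (rows x columns) onto its row nodes.

    Returns a dict mapping each pair of row nodes to the number of
    columns they share (sum aggregation). Pairs sharing nothing are omitted.
    """
    weights = {}
    for (i, a), (j, b) in combinations(enumerate(row_labels), 2):
        shared = sum(x * y for x, y in zip(incidence[i], incidence[j]))
        if shared > 0:
            weights[(a, b)] = shared
    return weights


# Hypothetical data: 3 members (rows) and 3 competitions (columns).
members = ["m1", "m2", "m3"]
attendance = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
projection = project(attendance, members)
print(projection)
```

Other aggregation functions (for example a binary "at least one shared event" rule) would only change the line computing shared.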
Fig. 1. Visualization of Zachary's Karate club using the igraph library and spring layout
Fig. 2. Visualization of Zachary's Karate club using the Pajek application and Kamada-Kawai layout
Combe et al. / A comparative study of social network analysis tools

Many algorithms consist in pushing isolated vertices toward empty spaces and in grouping adjacent nodes. These algorithms are directly inspired by physical phenomena. For example, edges can be seen as springs and nodes can be handled as electrically charged particles. The location of each element is recalculated step by step. These methods require several iterations in order to provide a good result on large graphs. Force-based layouts are simple to develop but are subject to poor local minimum results (see Fig. 1).
Among these algorithms, we can mention Fruchterman-Reingold, a widely used force-based algorithm for graph visualization [15]. An example is provided in Fig. 3. An alternative is the algorithm of Kamada-Kawai [24] (see Fig. 2), which converges faster than Fruchterman-Reingold but often gives poorer results. Kamada-Kawai can therefore be used to compute a first placement of the vertices. These two methods are among those called "spring algorithms".
Some other layouts differ in the way they provide a view of the neighborhood of a node (e.g. radial layout, hyperbolic layout). 3D graph visualization is the logical extension of planar representations. Most of the proposed methods are adaptable to 3D.
A local zoom, the so-called fish-eye functionality, can also be useful to visually explore large graphs [16].

3.2. Indicator based network description

Many quantitative indicators have been defined on networks [34,33].
The descriptors at the network level are used to compare the proportion of nodes versus edges, or to evaluate properties of the graph such as randomness or small world distributions.
On the other hand, the descriptors at the node level are useful for detecting the nodes strategically placed in the network or highlighting those that take an important part in communication, such as bridges or hubs.

3.2.1. Vertex and edge scoring
The place of a given actor in the network can be described using measures based on vertex scoring. Common types of vertex scoring are the centrality measures. Within graph theory and network analysis, there are various measures of the centrality of a vertex to determine the relative importance of this vertex within the graph. For example, to measure how important a person is within a social network, Freeman [14] has distinguished three main centralities:

a) Degree centrality: The first and simplest measure is the degree centrality. It emphasizes nodes with high degrees [28].
In oriented graphs, we can distinguish:

– incoming degree of a vertex v:

$$N^{+}(v) = |\{i \in V : (i, v) \in E(G)\}| \tag{1}$$

– outgoing degree:

$$N^{-}(v) = |\{i \in V : (v, i) \in E(G)\}| \tag{2}$$

– degree centrality [14]:

$$C_D(v) = \frac{\deg(v)}{|V| - 1} \tag{3}$$

b) Closeness centrality: For connected graphs, closeness centrality is the inverse of the average distance to all other nodes. This indicator can be useful for many applications in the real world. For instance, if edges were streets, the crossroad (vertex) with the highest closeness centrality would be the best place for emergency services.
Closeness centrality is defined by:

$$C_C(v) = \frac{|V| - 1}{\sum_{u \in V, u \neq v} d(v, u)} \tag{4}$$

where d(v, u) is a distance, for example the number of edges in the shortest path between two nodes or, in weighted graphs, the sum of the weights of these edges.

d) PageRank: The score computed by PageRank [7] is higher for nodes that are highly connected and connected with nodes that are highly connected themselves. If L(v) is the number of outgoing links of page v, then the PageRank PR(u) of the vertex u can be defined as:

$$PR(u) = (1 - d) + d \sum_{v \mid (v,u) \in E} \frac{PR(v)}{L(v)} \tag{7}$$

The parameter d is a damping factor. The PageRank score is iterated until convergence.
PageRank is a variant of the Eigenvector centrality measure.
[...] Girvan [27]. Modularity is defined by:

$$Q = \frac{1}{2|E|} \sum_{(u,v) \in V \times V} (A_{uv} - P_{uv}) \, \delta(C_u, C_v) \tag{10}$$

where the couple (u, v) runs over all pairs of vertices, A is the adjacency matrix where A_uv contains 1 if u and v are linked by an edge and 0 otherwise, P_uv is the expected number of edges between u and v, C_v is the group to which vertex v belongs and δ is the Kronecker delta, which is 1 if its two arguments are equal, and 0 otherwise. The clustering corresponding to a single group containing the whole graph has a modularity value of zero.

4. Benchmarking

Many tools have been created for network analysis and visualization purposes. A long list of tools is available on Wikipedia (footnote 1), with very different approaches. Many are purely academic software. Some are oriented toward visualization, others consist of APIs allowing graph and hypergraph modeling, sometimes with the possibility of animation on vertices, such as JUNG. Some tools are optimized for large data manipulation. Others propose low level implementations of specific algorithms.
In this survey, the official documentation has been inspected for libraries. We consider 4 tools: Pajek, Gephi, igraph and NetworkX, which are presented further in this section. Their choice is based on:
– a balance between well established tools and newer ones, based on recent development standards (in terms of ergonomics, modularity and data portability),
– a SNA point of view. The tools must provide basic metrics for networks,
– the network size can reach tens of thousands of nodes.

Pajek is a legacy software, with its own graph-oriented approach. Gephi represents a modern answer for graph study, with a GUI (graphical user interface), an open source philosophy and a plugin orientation. NetworkX and igraph are two essential libraries for efficient large graph handling. The first one can be integrated easily into any project and the second one can be recommended for a MATLAB-like (console-based) approach.
The following sections describe the dataset and the criteria used in the benchmark.

4.1. Dataset

The dataset considered in this survey is widely used in the SNA literature. This dataset presents the affiliation graph between 34 members of the karate club of a US university in 1970.
Zachary's Karate Club (footnote 2) has 34 vertices and 78 edges. Each vertex is numbered. An edge is present between two nodes when the two corresponding individuals "consistently interacted in contexts outside those of karate class, workouts and club meetings" [35].

4.2. Evaluated criteria

In our benchmark, we have selected a set of evaluation criteria. These criteria are the license of the tool, the data formats handled, the graph types supported, the number of nodes that can be loaded in a reasonable time, the available indicators, the clustering algorithms included and the visualization layouts available. Each criterion is detailed in the following sections.

4.2.1. File formats
There are mainly three ways to express the structure of a network in a serial manner:
– adjacency matrix (square for directed graphs, triangular for undirected ones),
– adjacency lists (for directed graphs), where the source node is followed by the list of the nodes that are the targets of every arc starting from the node,
– vertex pairs.
Several file formats have been created in order to provide graph representations. Here are the main ones:

a) Pajek graph file format (.net extension), while not very well documented, is very popular among social network analysis tools (Fig. 4). It represents in a text file, first the vertices (one per line) and then the edges. This format is not often handled in the other implementations except the Pajek pro- [...]

1 http://en.wikipedia.org/wiki/Social_network_analysis_software
2 The dataset is available at http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm, as of April 5th, 2010.
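Equation (10) can be evaluated directly on a small graph. The sketch below assumes an undirected simple graph without self-loops and takes P_uv = deg(u) deg(v) / (2|E|), the usual Newman-Girvan null model; this choice is an assumption, since the text only describes P_uv as the expected number of edges between u and v. The function name and the two-triangle example are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of equation (10) for an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: dict mapping each vertex to its group label.
    Assumption: P_uv = deg(u) * deg(v) / (2|E|).
    """
    m = len(edges)
    degree = {}
    linked = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        linked.add((u, v))
        linked.add((v, u))
    q = 0.0
    for u in degree:
        for v in degree:
            # The Kronecker delta keeps only pairs in the same group.
            if communities[u] == communities[v]:
                a_uv = 1 if (u, v) in linked else 0
                p_uv = degree[u] * degree[v] / (2 * m)
                q += a_uv - p_uv
    return q / (2 * m)


# Hypothetical example: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {v: "A" for v in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

With this null model, the partition into the two triangles gives Q = 5/14, while the single group containing the whole graph gives a modularity of zero, as stated in the text.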
3 http://graphml.graphdrawing.org
4 http://gexf.net
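The three serial representations listed in section 4.2.1 (adjacency matrix, adjacency lists and vertex pairs) carry the same information and can be converted into each other. The following minimal sketch assumes a directed graph whose vertices are numbered from 1 to n; the function names and the three-vertex example are hypothetical and do not correspond to any specific file format.

```python
def pairs_to_matrix(n, pairs):
    """Build the n x n adjacency matrix of a directed graph from its arcs."""
    matrix = [[0] * n for _ in range(n)]
    for source, target in pairs:
        matrix[source - 1][target - 1] = 1
    return matrix


def pairs_to_adjacency_lists(n, pairs):
    """Map each source vertex to the list of targets of its outgoing arcs."""
    lists = {v: [] for v in range(1, n + 1)}
    for source, target in pairs:
        lists[source].append(target)
    return lists


def matrix_to_pairs(matrix):
    """List the (source, target) pairs encoded by an adjacency matrix."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


# Hypothetical directed graph with 3 vertices and 3 arcs.
arcs = [(1, 2), (1, 3), (3, 2)]
matrix = pairs_to_matrix(3, arcs)
lists = pairs_to_adjacency_lists(3, arcs)
recovered = matrix_to_pairs(matrix)
print(matrix, lists, recovered)
```

For an undirected graph, each edge would be stored in both directions, which makes the matrix symmetric and explains why a triangular matrix is sufficient in that case.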
    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
        ...
      <graph id="G" edgedefault="undirected">
        <node id="1"/>
        <node id="2"/>
        <edge id="e1" source="1" target="2"/>
        ...
      </graph>
    </graphml>

Fig. 6. Zachary dataset extract in GraphML format

    DL
    N=34 NM=2
    FORMAT = FULLMATRIX DIAGONAL PRESENT
    LEVEL LABELS:
    ZACHE
    ZACHC
    DATA:
    0 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 1 0 0
    1 0 1 1 0 0 0 1 ... 0 0 0 0 0 1 0 0 0
    1 1 0 1 0 0 0 1 ... 0 0 1 1 0 0 0 1 0
    1 1 1 0 0 0 0 1 ... 0 0 0 0 0 0 0 0 0
    1 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0
    ...

Fig. 7. Zachary dataset in DAT format

4.3. Evaluated tools

Two libraries and two stand-alone programs have been compared here:

– Pajek (footnote 5)
– Gephi (footnote 6)
– igraph (footnote 7)
– NetworkX (footnote 8)

Of the two libraries, igraph can be used within a general purpose environment called R.

The R environment. R is dedicated to statistics. It is organized into many packages, some of which are dedicated to social network analysis. The covered functionalities are:

– tnet [30] for the analysis of weighted, two-mode, and longitudinal networks (study of networks over time),
– statnet [18] for statistical analysis of social networks,
– sna includes node and graph-level indices, structural distance and covariance methods, structural equivalence detection, theoretic model fitting, random graph generation, and 2D/3D network visualization.

These packages are available on the Comprehensive R Archive Network (footnote 9).
igraph is also a library suitable for social network analysis within the R environment. Both igraph and NetworkX can be called from Python programs, NetworkX being a Python library.

4.4. Benchmarking results

The benchmarking results are summarized in Table 1. They are detailed in this section, following the evaluation criteria introduced previously (see 4.2): the license of the tool, the data formats handled, the graph types supported, the available indicators, the clustering algorithms included and the visualization layouts available.

Table 1. Features and availability of the main algorithms in the retained software

The first point is licensing. It appears that NetworkX has the most permissive license, allowing integration in proprietary software. Both igraph and Gephi have chosen the GNU GPL, which does not allow integration in proprietary software. Pajek source code is undisclosed and commercial use of the software is not free. In terms of data formats, Gephi handles all the formats mentioned here. GEXF is not available elsewhere, mainly because this format started in the Gephi project. DL comes with UCINET; since UCINET is a project linked to Pajek, DL is one of the preferred formats for this tool. GML and GraphML are not supported in Pajek, so the .net format, which is universal in our panel, can be preferred.
Concerning the study and manipulation of bipartite graphs, most tools propose a few primitives, such as projection (conversion of a bipartite graph into a one-mode graph), but we would not recommend Gephi for that, as it is not strictly two-mode graph enabled. Pajek can handle links of different kinds. Temporal aspects are starting to be taken into account in different projects. For now, the data can be filtered according to a year associated with the nodes, for example, if the data format is adapted.
The tool that appears the least efficient in terms of the number of vertices allowed in memory is Gephi. After 200,000 nodes on our reference computer (Intel Core 2 Duo 2.5 GHz, 2 GB RAM, Windows), errors or messages inviting the user to increase the memory dedicated to the virtual machine pop up. The visualization pane is an important part of Gephi, while the other tools can process indicators independently of drawing the graph. Such an architecture could penalize the application for this criterion. Pajek does not suffer from this issue and can load 500,000 nodes in 52 minutes. igraph is very fast for data loading (22 seconds for 2.9 million nodes), but the dataset was attribute-free (no names were provided for the nodes, as .net import is quite restricted for this tool). Gephi and NetworkX appear to be limited in their ca- [...]

5 http://pajek.imfm.si/doku.php
6 http://gephi.org
7 http://igraph.sourceforge.net
8 http://networkx.lanl.gov/
9 http://cran.r-project.org/
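A GraphML file with the structure shown in Fig. 6 can be read with Python's standard xml.etree.ElementTree module. The sketch below is only an illustration: the embedded document is a hypothetical, complete three-node graph written in the same style as the extract of Fig. 6, not the Zachary dataset itself, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical GraphML document with the same structure as Fig. 6.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="1"/>
    <node id="2"/>
    <node id="3"/>
    <edge id="e1" source="1" target="2"/>
    <edge id="e2" source="1" target="3"/>
  </graph>
</graphml>
"""

# GraphML elements live in an XML namespace, so lookups need a prefix map.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

root = ET.fromstring(GRAPHML.encode("utf-8"))
graph = root.find("g:graph", NS)

directed = graph.get("edgedefault") == "directed"
nodes = [node.get("id") for node in graph.findall("g:node", NS)]
edges = [
    (edge.get("source"), edge.get("target"))
    for edge in graph.findall("g:edge", NS)
]
print(directed, nodes, edges)
```

The node and edge lists obtained this way are the "vertex pairs" representation of section 4.2.1, so they can be passed to any library or converted into another format.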
4.5. Other interesting software for social network analysis

There are many other SNA tools available. We tested some of them, such as:

– GraphViz [12] is dedicated to graph visualization.
– Tulip [10] can handle over 1 million vertices and 4 million edges. It has visualization, clustering and plug-in extension capabilities.
– UCInet [5] is not free. It uses Pajek and Netdraw for visualization. It is specialized in statistical and matrix analysis. It calculates indicators (such as triad census, Freeman betweenness) and performs hierarchical clustering.
– JUNG [29], for Java Universal Network/Graph Framework, is mainly developed for creating interactive graphs in Java user interfaces. JUNG has been extended with some SNA metrics.
– GUESS [2] is dedicated to visualization purposes. It is published under the GPL license.

The reasons why other tools have not been detailed above are that they:

– have narrow and specialized functionalities focused on a single aspect, e.g. GUESS on visualization,
– have been replaced in practice by other tools with the same target features and audience (Tulip by Gephi),
– are not focused on a computer science vision,
– are not freely available.

4.6. How to choose the right software for you?

The first question you have to ask is: "How to choose the right software?". If you need standard graph visualization, it is likely that you can find a software that suits you. If your data is not in one of the standardized formats given in the list above, the best way is to generate a suitable representation from your memory-loaded graph or to convert it into GML, for example. You can also look at the Wikipedia page (footnote 11) for a list of input/output formats allowed by a large panel of programs. If your need is specific or your graph needs to be handled with specific attributes for vertices and edges, you should take a look at the libraries. In order to choose a program for visualizing or manipulating graphs, it is advisable to try a few of them to check if the approach fits your problem. For libraries, the choice depends on your favorite language. The computation time can also be an important criterion in your choice; if this is the case, prefer a Python or C-based software. If you are interested in an interactive console (as in the MATLAB experience), you should definitely try igraph on R.

5. Conclusion

The fact that Social Network Analysis is situated between several domains (sociology, computer science, mathematics and physics) has led to many different methodological approaches and to a lot of tools. That is why so many programs have been created in order to manipulate and study them (footnote 11). While a stand-alone software is very useful for graph visualization (up to a maximum of a few thousand nodes), data format conversion or indicator computation, libraries are better adapted for tasks involving tens of thousands of nodes and for operations such as the union and the difference between sets of nodes, or for clustering. A fair separation of the algorithms, the user interface and the visualization pane is important. Gephi adopted this approach with the recent release of the Gephi toolkit, a library created from the Gephi logic and algorithms.
We can also say that today the freely available tools are able to provide a very rich set of functionalities, but if one wants specific analysis, a commercial software or complementary code developments may be needed.
Finally, at this moment, the main challenges concerning graph exploration are oriented toward high-level visualization (e.g. hierarchical graphs), while amongst the possible enhancements of social network analysis tools, we can mention firstly temporal analysis, which should allow the study of the evolution of networks over time, and secondly social mining, which simultaneously exploits the attributes of nodes and the graph structure.

11 http://en.wikipedia.org/wiki/Social_network_analysis_software

References

[1] L.A. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25(3):211–230, 2003.
[2] E. Adar. Guess: a language and interface for graph exploration. In Proceedings of the SIGCHI conference on Human Factors in computing systems, page 800. ACM, 2006.
[3] M. Bastian, S. Heymann, and M. Jacomy. Gephi: An Open Source Software for Exploring and Manipulating Networks. In International AAAI Conference on Weblogs and Social Media, pages 361–362, 2009.
[4] V. Batagelj and A. Mrvar. Pajek-program for large network analysis. Connections, 21(2):47–57, 1998.
[5] S.P. Borgatti, M.G. Everett, and L.C. Freeman. Ucinet for Windows: Software for social network analysis. Harvard, MA: Analytic Technologies, 2002.
[6] U. Brandes. A Faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25:163–177, 2001.
[7] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer networks and ISDN systems, 30(1-7):107–117, 1998.
[8] D. Cartwright and F. Harary. A graph theoretic approach to the investigation of system-environment relationships. Journal of Mathematical Sociology, 5:87–111, 1977.
[9] G. Csárdi and T. Nepusz. The igraph software package for complex network research. InterJournal Complex Systems, 1695, 2006.
[10] A. David. Tulip. Lecture notes in computer science, pages 435–437, 2002.
[11] J.A. Davis and S. Leinhardt. The Structure of Positive Interpersonal Relations in Small Groups. Sociological Theories in Progress, 2:218–251, 1967.
[12] J. Ellson, E. Gansner, L. Koutsofios, S. North, and G. Woodhull. Graphviz - open source graph drawing tools. In Graph Drawing, pages 594–597. Springer, 2001.
[13] S. Fortunato. Community detection in graphs. Physics Reports, page 103, June 2009.
[14] L.C. Freeman. Centrality in social networks conceptual clarification. Social networks, 1(3):215–239, 1979.
[15] T.M.J. Fruchterman and E.M. Reingold. Graph Drawing by Force-directed Placement. Software: Practice and Experience, 21(11):1129–1164, 1991.
[16] E.R. Gansner, Y. Koren, and S. North. Topological fisheye views for visualizing large graphs. IEEE Transactions on Visualization and Computer Graphics, pages 457–468, 2005.
[17] Gartner. Hype Cycle for social software, 2008. G00158239, 2008.
[18] S.M. Goodreau, M.S. Handcock, D.R. Hunter, and C.T. Butts. A statnet Tutorial. Journal of statistical software, 24(9):1, 2008.
[19] J.-L. Guillaume and M. Latapy. Bipartite structure of all complex networks. Information Processing Letters, 90(5):215–221, 2004.
[20] A.A. Hagberg, D.A. Schult, and P.J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th SciPy Conf., Varoquaux G, Vaught T, and Millman J (Eds), pages 11–15, 2008.
[21] P.W. Holland and S. Leinhardt. A method for detecting structure in sociometric data. American Journal of Sociology, 76(3):492–513, 1970.
[22] M. Huisman and M.A.J. Van Duijn. Software for social network analysis, pages 270–316. 2004.
[23] S.C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, September 1967.
[24] T. Kamada and S. Kawai. An algorithm for drawing general undirected graphs. Information processing letters, 31(12):7–15, 1989.
[25] J.M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
[26] J.L. Moreno. Who shall survive? New York: Beacon Press, 1934.
[27] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical review E, 69(2):26113, 2004.
[28] J. Nieminen. On the centrality in a graph. Scandinavian Journal of Psychology, 15(1):332–336, 1974.
[29] J. O'Madadhain, D. Fisher, S. White, and Y. Boey. The jung (java universal network/graph) framework. University of California, Irvine, California, 2003.
[30] T. Opsahl. Structure and Evolution of Weighted Networks. pages 104–122, 2009.
[31] P. Pons and M. Latapy. Computing communities in large networks using random walks. Computer and Information Sciences-ISCIS 2005, pages 284–293, 2005.
[32] J. Reichardt and S. Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):16110, 2006.
[33] J. Scott. Social Network Analysis: A handbook. Newbury Park, CA, Sage Publications, 1994.
[34] S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, 1994.
[35] W.W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452–473, 1977.