A Tutorial by Clement Levallois, Erasmus University Rotterdam
V 1.1 - 2013
Bio notes
Clement Levallois
Table of contents
Presentation of Gephi
Setup
Adding plugins
A note on terminology
--- start case 1 ---
Importing the dataset
Import report
Saving a Gephi project
Mouse controls
Rearranging windows, tabs, panels
Switching on labels
What a layout is
Example of a simple layout
An insightful layout: Force Atlas 2
How do force-based layouts work
Making nodes bigger
Visualizing attributes in your data
Partitioning or ranking?
Example of partitioning
Visualizing new attributes with Gephi
Detecting communities & visualizing them
Tips to clean up the appearance
Gephi: what’s the output?
Export your visualization!
--- end case 1 ---
The “hairball” problem
Tackling the hairball problem
Helper on importing CSV files
Helper on filters
Helper on ranking panel
Helper on centrality
Cheat sheets
References
Gephi
• Created in 2008 by a core team of 4 French computing engineers inspired by a professor.
• In 2013:
– 11 core developers
– 15 plugin developers
– 4 Google Summer of Code participations
– One Java Duke’s Choice Award
– 210,000 downloads in the past year
• Software written in Java, running on Mac, PC and Linux.
A note on the slides
Setup
• Checklist:
- Make sure you have the latest version of Gephi installed.
- In the Gephi menu, go to Tools -> Options -> Visualization -> OpenGL. Increasing the anti-aliasing factor there can improve the quality of the view on your screen.
Adding plugins
• In the menu, go to Tools -> Plugins -> Available plugins.
Note: you will notice that there is also a “Plugins” item in the menu, which is empty. That’s a secret feature. Or just a very misleading UI design.
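A note on terminology
[Figure: two nodes, X and Y, connected by a line]
• A “node” (also called a vertex, plural vertices): X and Y.
• An “edge” (or a tie, or a link…): the line connecting X and Y.
• Node attributes: information attached to a node, for example:
– Y was born in …
– Y’s marital status is …
– Y’s latitude is …
– Y’s longitude is …
(NB: edges can have attributes too!)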
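To make this vocabulary concrete, here is a minimal sketch of how a small network with node and edge attributes could be held in memory. The class names, attribute keys and values are hypothetical illustrations; this is not how Gephi stores graphs internally.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    # Node attributes: free-form key/value pairs (hypothetical keys)
    attributes: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    # Edges can carry attributes too, e.g. a weight
    attributes: dict = field(default_factory=dict)

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = Node(node_id, dict(attributes))

    def add_edge(self, source, target, **attributes):
        # Both endpoints must already exist as nodes
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, dict(attributes)))

# Hypothetical values mirroring the X - Y figure
g = Graph()
g.add_node("X")
g.add_node("Y", marital_status="married", latitude=48.85, longitude=2.35)
g.add_edge("X", "Y", weight=1.0)
```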
1st practice: workflow in Gephi
• Import the network
• Export
The dataset
• Co-appearance network of the characters of Les Miserables, by Victor Hugo
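A co-appearance network links two characters when they appear together. Assuming the usual construction (two characters are connected when they appear in the same chapter, and the edge weight counts how many chapters they share), a sketch could look like this. The chapter casts below are made-up placeholders, not data from the novel.

```python
from collections import Counter
from itertools import combinations

def coappearance_edges(chapters):
    """Return {(a, b): weight}, where weight = number of chapters shared by a and b."""
    weights = Counter()
    for characters in chapters:
        # Each unordered pair of distinct characters in a chapter co-appears once
        for a, b in combinations(sorted(set(characters)), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Placeholder data: each inner list is the cast of one hypothetical chapter
chapters = [
    ["Valjean", "Javert", "Fantine"],
    ["Valjean", "Cosette"],
    ["Valjean", "Javert"],
]
edges = coappearance_edges(chapters)
```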
1. Importing the dataset in Gephi
2. Import Report
• This report provides information on the file you just opened:
– Number of nodes found
– Number of edges found
– An additional tab gives info on the attributes found in the graph, useful to spot errors in the file.
– If necessary, the graph will be resized to fit in the window.
– Should an edge A-B actually be seen as A->B? You decide it here.
– If you want to merge the graph with one that is already open in Gephi, choose it here.
4. Saving a Gephi project
• Gephi can save projects with the .gephi extension. Click File -> Save.
– These files can only be opened in Gephi, and contain all the information about your project.
– There have been a number of reports of these files being unstable, so always also export your network as indicated just below.
5. Mouse controls
• Left click: performs the action you previously selected.
6. Rearranging windows, tabs, panels
7. Switching on labels
• Please refer to cheat sheet 1 in the appendix of this slide deck.
• Notice the panel at the bottom of the screen, and its “Labels” tab.
• Both edges and nodes can have labels, but in most use cases only nodes have them.
9. Example of a simple layout
• Select “Circular Layout” in the drop down menu of layouts.
10. Example of a simple layout (continued)
The “circular layout” applies the following procedure: “spread nodes on a circle, at
an equal distance from each other.”
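The procedure quoted above can be sketched in a few lines. This illustrates the idea only (equal angular spacing on a circle of arbitrary radius); it is not Gephi’s own implementation, and the function name and the radius parameter are hypothetical.

```python
import math

def circular_layout(node_ids, radius=100.0):
    """Sketch: spread nodes on a circle, at an equal distance from each other."""
    n = len(node_ids)
    positions = {}
    for i, node in enumerate(node_ids):
        angle = 2 * math.pi * i / n  # same angle step between consecutive nodes
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

pos = circular_layout(["A", "B", "C", "D"])
```

With four nodes the angle step is 90 degrees, so A lands at (100, 0) and B at (0, 100), up to floating-point rounding.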
11. An insightful layout: Force Atlas 2
• Select “Force Atlas 2” in the drop down menu of layouts.
• Run it!
12. Force Atlas 2: the result
The result is that densely connected nodes are grouped in the same regions.
Recent work has demonstrated this result analytically: with certain kinds of layouts, densely connected groups of nodes translate visually into distinct regions of the 2D plane.
See Noack (2009) and Waltman, van Eck and Noyons (2010) in the reference list.
13. How do force-based layouts work?
• “Nodes repulse each other like magnets, while edges
attract their nodes, like springs”
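One iteration of a generic spring-electrical layout illustrates the quote: every pair of nodes pushes apart with a force that decreases with distance, and every edge pulls its two endpoints together. This is a simplified sketch with arbitrary constants (`repulsion`, `attraction`, `step` are assumptions); ForceAtlas2 uses its own force formulas and adaptive speed (Jacomy et al. 2012), which are not reproduced here.

```python
import math

def force_step(pos, edges, repulsion=1.0, attraction=0.1, step=0.1):
    """Sketch: new positions after one iteration of a basic force-directed layout."""
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Repulsion between every pair of nodes ("like magnets"), magnitude repulsion / d
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9  # overlapping nodes: no direction, no force
            f = repulsion / d
            disp[a][0] += f * dx / d
            disp[a][1] += f * dy / d
            disp[b][0] -= f * dx / d
            disp[b][1] -= f * dy / d
    # Attraction along edges ("like springs"), magnitude attraction * d
    for a, b in edges:
        dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
        d = math.hypot(dx, dy)
        if d > 0:
            f = attraction * d
            disp[a][0] -= f * dx / d
            disp[a][1] -= f * dy / d
            disp[b][0] += f * dx / d
            disp[b][1] += f * dy / d
    # Move each node a small step along its total displacement
    return {n: (pos[n][0] + step * disp[n][0], pos[n][1] + step * disp[n][1]) for n in pos}
```

Running this step repeatedly lets connected nodes drift together while unconnected ones spread out, which is why densely connected groups end up in the same regions.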
14. Making nodes bigger
• Please refer to cheat sheet 2 in the appendix of this slide deck.
15. Visualizing attributes in your data
• Nodes in our example network have one
attribute: the gender of the character in “Les
Miserables”.
16. Partitioning or ranking?
• Partitioning is this: [Figure: nodes colored by category, with a legend “Male” / “Female”]
Works for attributes that classify nodes into categories: “male or female”, “East, West, South or North”, “country of residence”. Each category is represented by a different color, as above.
• Ranking is this: [Figure: a scale of growing sizes, from 28 to 35]
or this: [Figure: a scale of color shades, from 28 to 35]
Works for attributes that are gradual, not categorical (age, etc.: any numerical attribute).
The gradation is represented visually either by bigger sizes, or by changes in shades of color (usually from light to dark, or from light colors to warm colors).
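Ranking boils down to mapping a numerical attribute onto a visual range. A minimal sketch, assuming a linear interpolation between a minimum and a maximum node size (the sizes 10 and 50 and the function name are arbitrary choices for illustration):

```python
def rank_to_size(values, min_size=10.0, max_size=50.0):
    """Sketch: linearly map numerical attribute values onto node sizes."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for node, v in values.items():
        # t is 0 for the smallest value and 1 for the largest
        t = (v - lo) / span if span else 0.0
        sizes[node] = min_size + t * (max_size - min_size)
    return sizes

sizes = rank_to_size({"a": 28, "b": 35, "c": 31.5})
# a -> 10.0, b -> 50.0, c (halfway between 28 and 35) -> 30.0
```

The same value of `t` could drive a color shade instead of a size.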
17. Example of partitioning
• In the “partition” tab, hit the refresh button to make sure Gephi loads all available attributes.
• Click on “apply”!
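Partitioning assigns one color per category. A sketch of that mapping, assuming a fixed, hypothetical palette that is reused cyclically when there are more categories than colors (Gephi picks its own colors, which you can change by hand):

```python
PALETTE = ["#e6194b", "#3cb44b", "#ffe119", "#4363d8", "#f58231"]  # hypothetical palette

def partition_colors(attribute):
    """Sketch: map each node to a color according to its categorical attribute value."""
    # Sort categories so the color assignment is deterministic
    categories = sorted(set(attribute.values()))
    color_of = {cat: PALETTE[i % len(PALETTE)] for i, cat in enumerate(categories)}
    return {node: color_of[value] for node, value in attribute.items()}

colors = partition_colors({"Valjean": "male", "Fantine": "female", "Javert": "male"})
```

All nodes sharing a category get the same color, whatever their position in the network.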
19. Detecting communities & visualizing them
• Go to the statistics panel and run the Modularity algorithm.
• This algorithm identifies relatively densely connected groups of nodes in the network.
• As a result, each node is assigned to one of these groups (called “class” here).
• In the same way that we colored nodes by gender, we can color them by the group (“class”) they belong to.
• Go to the partition tab, hit the refresh button, select “modularity class”, and click “apply”!
• Note: by clicking on the colored square next to each category in the partition tab, you can choose the colors you prefer. In a future release of Gephi, this color choosing mechanism will be improved further.
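Gephi’s Modularity statistic searches for a partition with a high value of a score called modularity (its implementation is based on the Louvain method). The search itself is too long for a short example, but the score can be sketched for an undirected, unweighted graph with the standard per-community formula Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The function below only scores a given partition; it does not find communities.

```python
from collections import defaultdict

def modularity(edges, community):
    """Sketch: modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
together = {n: 0 for n in split}
```

Splitting the two triangles scores 5/14 (about 0.36), while putting everyone in one group scores 0: the densely connected groups are the better partition.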
20. Visualization of communities: the result
21. Tips to clean up the appearance
• Apply the layout “Label adjust”
– Find it in the drop down menu of the layouts
22. Gephi: what’s the output?
• The next slide shows how to export your viz to a picture file. Is that what Gephi is ultimately for?
• In my view, the added value lies above all in the insights you gain while exploring the viz interactively and iteratively in Gephi.
23. Export your visualization!
• Directly from the overview panel with the screenshot option (see cheat sheet 2).
• Why don’t the views in the overview and preview panels match exactly?
– Visualizing a graph in real time, as in the overview panel, requires a technology (OpenGL) which is incompatible with export to the PDF or SVG file formats.
– The preview panel provides this bridge towards PDF and SVG by using a different technology to render the graph.
– This difference in technology means that some visual features of the overview panel don’t translate exactly to the preview mode.
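To illustrate what rendering to a vector format involves, here is a minimal sketch that writes nodes as circles and edges as lines into an SVG document. It is a toy renderer, not Gephi’s preview engine; the function name, default radius and colors are assumptions.

```python
from xml.sax.saxutils import escape

def to_svg(pos, edges, labels=None, radius=5, width=400, height=400):
    """Sketch: render node positions and edges as an SVG (vector) string."""
    labels = labels or {}
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    # Draw edges first so that nodes are painted on top of them
    for a, b in edges:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="gray"/>')
    for node, (x, y) in pos.items():
        parts.append(f'<circle cx="{x}" cy="{y}" r="{radius}" fill="steelblue"/>')
        if node in labels:
            # Escape the label so characters like & or < do not break the XML
            parts.append(f'<text x="{x + radius}" y="{y}">{escape(labels[node])}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

svg = to_svg({"A": (50, 50), "B": (150, 80)}, [("A", "B")], labels={"A": "Valjean"})
```

Because the output describes shapes rather than pixels, it stays sharp at any zoom level, which is what makes PDF and SVG exports suitable for print.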
24. Les Miserables network just before export to PDF
END CASE 1
• Topics that remain to be covered:
– example of a ranking
– example of a filtering
– dynamic networks
– import formats
– plugins (huge chunk!)
– the Gephi toolkit
– …
CASE 2:
Digital humanities on Twitter
Twitter data collected with NodeXL
1. The original dataset
• Nodes: Twitter users who mentioned “digital humanities” or #digitalhumanities in their tweets on April 3, 2013 and the few days before.
2. The “hairball” problem
The dh Twitter network:
- spatialized with Force Atlas 2
- modularity algo applied
- partitioned according to the communities found by the modularity algo
One area of interest (the spike), but the rest is a big hairball. Not insightful!
3. Tackling the hairball problem
• The basic problem is that we have too many connections: if I mention somebody in just a single tweet, I’ll be connected to this person on the graph.
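The way such a network is built explains the hairball: a single mention is enough to create an edge. A sketch, assuming hypothetical (author, mentioned user) pairs already extracted from tweets, counts mentions per pair so that one-off mentions can be told apart from repeated interactions:

```python
from collections import Counter

def mention_network(mentions):
    """Sketch: directed edge weights = number of times author mentioned target."""
    # Self-mentions are ignored; every other mention creates or reinforces an edge
    return Counter((author, target) for author, target in mentions if author != target)

# Hypothetical (author, mentioned user) pairs
mentions = [("ann", "bob"), ("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "ann")]
weights = mention_network(mentions)
```

Five mentions already produce four edges, three of them resting on a single tweet.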
4. The reworked dataset
• The original network is processed with Gaze:
– Check www.clementlevallois.net/software.php
– Only the strongest connections are preserved. Isolated nodes are deleted.
– Edge weights range from 0.29 (small similarity) to 1 (absolute similarity).
• We end up with 467 nodes and 14,645 edges.
• Find the original and reworked datasets at:
www.clementlevallois.net/gephi/tuto/dh.zip
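Gaze’s own similarity computation is not described here, so the sketch below starts from an already weighted edge list and only reproduces the two clean-up steps named above: keeping the strongest connections (those at or above a threshold) and deleting the nodes left isolated. The threshold value 0.29 is used only because it is the minimum weight quoted above; the data and function name are hypothetical.

```python
def prune(nodes, weighted_edges, threshold):
    """Sketch: keep edges with weight >= threshold, then drop nodes left without edges."""
    kept = [(a, b, w) for a, b, w in weighted_edges if w >= threshold]
    connected = {a for a, _, _ in kept} | {b for _, b, _ in kept}
    return [n for n in nodes if n in connected], kept

# Hypothetical similarity-weighted edges
nodes = ["a", "b", "c", "d"]
weighted_edges = [("a", "b", 0.9), ("b", "c", 0.4), ("c", "d", 0.2)]
kept_nodes, kept_edges = prune(nodes, weighted_edges, threshold=0.29)
```

Here the weak c-d edge is dropped, which leaves d isolated, so d is deleted too.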
2. Task:
• Import
• Spatialize
• Compute modularity
• Partition according to communities
• Rank nodes by number of followers
• Compute centrality
• Rank nodes by their centrality
• You work on your own.
a) Helper on importing CSV files
• See cheat sheet 4 for reference
• Close all open projects, create a new empty one
• Go to the data laboratory
• Click on “import spreadsheet”
• Import the files
– dh [Nodes].csv
– dh [Edges].csv
• Choose the type of each attribute carefully! (String, Integer or Float)
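For reference, here is a sketch that produces two small CSV files in the column layout described in the Data Laboratory cheat sheet (Id and Label for nodes; Source, Target, Type and Weight for edges). The file names `example_nodes.csv` / `example_edges.csv`, the rows and the extra “Gender” column are made-up examples (chosen so as not to overwrite the dh files); check the import dialog for the options your Gephi version expects.

```python
import csv
import io

def to_csv_text(rows):
    """Sketch: serialize a list of dicts to CSV text, header taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up rows; "Gender" is an extra node attribute column
nodes = [
    {"Id": "1", "Label": "Valjean", "Gender": "male"},
    {"Id": "2", "Label": "Fantine", "Gender": "female"},
]
edges = [{"Source": "1", "Target": "2", "Type": "Undirected", "Weight": "3.0"}]

for path, rows in [("example_nodes.csv", nodes), ("example_edges.csv", edges)]:
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(to_csv_text(rows))
```

Edges refer to nodes through their Id, not their Label, so the two files must use the same identifiers.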
b) Helper on filters
We combine 2 filters:
- One which filters out (hides) edges that have a
weight under a certain threshold
c) Helper on ranking panel
d) Helper on centrality
• Centrality measures are different ways to
capture the elusive notion of being “central” in a
network.
• Betweenness centrality is one such approach:
– Among all the shortest paths connecting any two nodes in a network, the most central node is the one which lies on most of these paths.
[Figure: example with nodes A, B and C]
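Betweenness can be computed with Brandes’ algorithm: a breadth-first search from every node counts the shortest paths, then each node accumulates its share of the paths passing through it. A sketch for an unweighted, undirected graph follows, assuming for the example that A, B and C form a path A - B - C. It returns raw scores; Gephi may report its values with a different normalization.

```python
from collections import deque

def betweenness(adj):
    """Sketch of Brandes' algorithm; adj is {node: [neighbors]}, undirected, unweighted."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies, from the farthest nodes back to s
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair was counted from both endpoints
    return {v: c / 2 for v, c in score.items()}

print(betweenness({"A": ["B"], "B": ["A", "C"], "C": ["B"]}))
# {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

B lies on the only shortest path between A and C, so it is the only node with a non-zero score.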
Gephi Cheat Sheet 1
The Overview Panel
Where all the functions are available to explore the network visually.
• Partition: for categorical attributes. Example: to color all nodes representing males in yellow, and all nodes representing females in green.
• Statistics: computes metrics on the network. These metrics are recorded, and can then be displayed on the graph. Ex: compute the centrality of nodes, then use the ranking panel to make central nodes bigger.
• Filters: to hide or display nodes and edges.
• Bottom toolbar:
– Export a screenshot. Click the arrow for resolution settings.
– Switch background color! (useful for prints)
– Change edge thickness.
– 1, the 3 buttons on the left: from left to right, switch on or off the display of:
- node labels
- hulls (not implemented yet)
- edges
- edges in the color of the source node
– 2, the slider: label size
– 3, the 2 buttons on the right:
- label color
- edge labels
– More label settings:
- label size: should it track the node size?
- label color: should it track the node color?
- label font
- text to be displayed in the label
• How to memorize all these icons?? All these controls are also available with a more explicit description in the panel here. Once you know these controls well, the icons are a quick way to access them.
Gephi Cheat Sheet 3
The Data Laboratory
Where the numerical and textual data for nodes and edges can be examined and modified.
NODE VIEW
• Import function: opens a dialog window to import nodes and edges from a CSV file into Gephi.
• A switch between the views of nodes & edges.
• 3 default columns for nodes:
– Nodes: simply a copy of the label column (or of the id if there are no labels).
– Id: the unique identifier of the node.
– Label: the name of the node, which will be displayed next to it if we choose to.
• Extra columns: each node can have extra information besides its id and label. This extra info is written in additional columns. Example: here, each node is characterized by a number, recorded in a column we choose to call “Modularity class”. Columns can contain numbers, text or booleans (true / false).
EDGE VIEW
• Import function and switch between views: as in the node view.
• 6 default columns for edges:
– Source and Target: the two connected nodes forming the edge.
– Type: is the direction of the edge meaningful?
– Id: the unique identifier of the edge.
– Label: the name of the edge, which will be displayed next to it if we choose to.
– Weight: how “strong” is the tie between the two nodes forming the edge? This is a numerical value.
• Extra columns: each edge can have extra information besides its id, label, type and weight. For example, here I added a column to characterize the connection between the 2 characters of the Miserables: friends or enemies in the novel?
• Helper functions to quickly edit columns.
Gephi Cheat Sheet 5
The Preview panel
Where you make final adjustments before exporting your visualization to an image file (PDF, SVG or PNG)
• Load or save parameters
• 1. Setting the parameters
References
Jacomy M., Heymann S., Venturini T., Bastian M. (2012). “ForceAtlas2, A Continuous Graph Layout Algorithm for Handy Network Visualization.” (draft) http://www.medialab.sciences-po.fr/publications/Jacomy_Heymann_Venturini-Force_Atlas2.pdf
Noack A. (2009). “Modularity clustering is force-directed layout.” Physical Review E, vol. 79(2).
Thorp J. (2013). “Visualization as Process, Not Output.” HBR Blog Network, April 3. http://blogs.hbr.org/cs/2013/04/visualization_as_process.html
Waltman L., van Eck N. J., Noyons E. C. M. (2010). “A unified approach to mapping and clustering of bibliometric networks.” arXiv, http://arxiv.org/abs/1006.1032