
Graph Machine Learning

Take graph data to the next level by applying machine learning techniques and algorithms

Claudio Stamile

Aldo Marzullo
Enrico Deusebio

BIRMINGHAM—MUMBAI
Graph Machine Learning
Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the publisher,
except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without warranty,
either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors,
will be held liable for any damages caused or alleged to have been caused directly or indirectly by
this book.
Packt Publishing has endeavored to provide trademark information about all of the companies
and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing
cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh


Publishing Product Manager: Devika Battike
Senior Editor: Roshan Kumar
Content Development Editor: Sean Lobo
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Vinayak Purushotham
Production Designer: Joshua Misquitta

First published: May 2021

Production reference: 1270521

Published by Packt Publishing Ltd.


Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-80020-449-2

www.packt.com
Alla memoria di mio Zio, Franchino Avolio. Alle ruote delle bici troppo
sgonfie, all'infanzia che mi ha regalato.

In memory of my uncle, Franchino Avolio. To the wheels of bikes that are
too flat, to the childhood he gave me.
– Claudio Stamile

To my family, my roots.
– Aldo Marzullo

To Lili, for always reminding me with your 'learning' process how
wonderful the human brain and life are.
– Enrico Deusebio
Contributors
About the authors
Claudio Stamile received an M.Sc. degree in computer science from the University of
Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint
Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon,
France). During his career, he has developed a solid background in artificial intelligence,
graph theory, and machine learning, with a focus on the biomedical field. He is currently
a senior data scientist in CGnal, a consulting firm fully committed to helping its top-tier
clients implement data-driven strategies and build AI-powered solutions to promote
efficiency and support new business models.
Aldo Marzullo received an M.Sc. degree in computer science from the University of
Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid
background in several areas, including algorithm design, graph theory, and machine
learning. In January 2020, he received his joint Ph.D. from the University of Calabria and
Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning
and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently
a postdoctoral researcher at the University of Calabria and collaborates with several
international institutions.
Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that
helps its top-tier clients implement data-driven strategies and build AI-powered solutions.
He has been working with data and large-scale simulations using high-performance
facilities and large-scale computing centers for over 10 years, both in an academic and
industrial context. He has collaborated and worked with top-tier universities, such as the
University of Cambridge, the University of Turin, and the Royal Institute of Technology
(KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc.
degrees in aerospace engineering from Politecnico di Torino.
About the reviewers
Kacper Kubara is a technical co-founder of Artemo and a data engineer at Annual
Insight, and is currently pursuing a postgraduate degree in AI at the University of
Amsterdam. Despite the focus of his research being graph representation learning, he
is also interested in the tools and methods that help to bridge the gap between the AI
industry and academia.
Tural Gulmammadov has been leading a group of data scientists and machine learning
engineers at Oracle to tackle applied machine learning problems from various industries.
He is dedicated to and motivated by the applications of graph theory and discrete
mathematics in machine learning over distributed computational environments. He is a
cognitive science, statistics, and psychology enthusiast, as well as a chess player, painter,
seasonal horse rider, and paddler.
Table of Contents

Preface

Section 1 – Introduction to Graph Machine Learning

1 Getting Started with Graphs
Technical requirements
Introduction to graphs with networkx
  Types of graphs
  Graph representations
Plotting graphs
  networkx
  Gephi
Graph properties
  Integration metrics
  Segregation metrics
  Centrality metrics
  Resilience metrics
Benchmarks and repositories
  Examples of simple graphs
  Generative graph models
  Benchmarks
Dealing with large graphs
Summary

2 Graph Machine Learning
Technical requirements
Understanding machine learning on graphs
  Basic principles of machine learning
  The benefit of machine learning on graphs
The generalized graph embedding problem
The taxonomy of graph embedding machine learning algorithms
  The categorization of embedding algorithms
Summary

Section 2 – Machine Learning on Graphs

3 Unsupervised Graph Learning
Technical requirements
The unsupervised graph embedding roadmap
Shallow embedding methods
  Matrix factorization
  Skip-gram
Autoencoders
  TensorFlow and Keras – a powerful combination
  Our first autoencoder
  Denoising autoencoders
  Graph autoencoders
Graph neural networks
  Variants of GNNs
  Spectral graph convolution
  Spatial graph convolution
  Graph convolution in practice
Summary

4 Supervised Graph Learning
Technical requirements
The supervised graph embedding roadmap
Feature-based methods
Shallow embedding methods
  Label propagation algorithm
  Label spreading algorithm
Graph regularization methods
  Manifold regularization and semi-supervised embedding
  Neural Graph Learning
  Planetoid
Graph CNNs
  Graph classification using GCNs
  Node classification using GraphSAGE
Summary

5 Problems with Machine Learning on Graphs
Technical requirements
Predicting missing links in a graph
  Similarity-based methods
  Embedding-based methods
Detecting meaningful structures such as communities
  Embedding-based community detection
  Spectral methods and matrix factorization
  Probability models
  Cost function minimization
Detecting graph similarities and graph matching
  Graph embedding-based methods
  Graph kernel-based methods
  GNN-based methods
  Applications
Summary

Section 3 – Advanced Applications of Graph Machine Learning

6 Social Network Graphs
Technical requirements
Overview of the dataset
  Dataset download
  Loading the dataset using networkx
Network topology and community detection
  Topology overview
  Node centrality
  Community detection
Embedding for supervised and unsupervised tasks
  Task preparation
  node2vec-based link prediction
  GraphSAGE-based link prediction
  Hand-crafted features for link prediction
  Summary of results
Summary

7 Text Analytics and Natural Language Processing Using Graphs
Technical requirements
Providing a quick overview of a dataset
Understanding the main concepts and tools used in NLP
Creating graphs from a corpus of documents
  Knowledge graphs
  Bipartite document/entity graphs
Building a document topic classifier
  Shallow learning methods
  Graph neural networks
Summary

8 Graph Analysis for Credit Card Transactions
Technical requirements
Overview of the dataset
  Loading the dataset and graph building using networkx
Network topology and community detection
  Network topology
  Community detection
Embedding for supervised and unsupervised fraud detection
  Supervised approach to fraudulent transaction identification
  Unsupervised approach to fraudulent transaction identification
Summary

9 Building a Data-Driven Graph-Powered Application
Technical requirements
Overview of Lambda architectures
Lambda architectures for graph-powered applications
  Graph processing engines
  Graph querying layer
  Selecting between Neo4j and GraphX
Summary

10 Novel Trends on Graphs
Technical requirements
Learning about data augmentation for graphs
  Sampling strategies
  Exploring data augmentation techniques
Learning about topological data analysis
  Topological machine learning
Applying graph theory in new domains
  Graph machine learning and neuroscience
  Graph theory and chemistry and biology
  Graph machine learning and computer vision
  Recommendation systems
Summary
Why subscribe?

Other Books You May Enjoy

Index
Preface
Graph Machine Learning provides a new set of tools for processing network data and
leveraging the power of the relationship between entities that can be used for predictive,
modeling, and analytics tasks.
You will start with a brief introduction to graph theory and Graph Machine Learning,
learning to understand their potential. As you proceed, you will become well versed with
the main machine learning models for graph representation learning: their purpose,
how they work, and how they can be implemented in a wide range of supervised and
unsupervised learning applications. You'll then build a complete machine learning
pipeline, including data processing, model training, and prediction, in order to exploit
the full potential of graph data. Moving on, you will cover real-world scenarios, such as
extracting data from social networks, text analytics, and natural language processing using
graphs and financial transaction systems on graphs. Finally, you will learn how to build
and scale out data-driven applications for graph analytics to store, query, and process
network information, before progressing to explore the latest trends on graphs.
By the end of this machine learning book, you will have learned the essential concepts
of graph theory and all the algorithms and techniques used to build successful machine
learning applications.

Who this book is for


This book is for data analysts, graph developers, graph analysts, and graph professionals
who want to leverage the information embedded in the connections and relations between
data points, unravel hidden structures, and exploit topological information to boost their
analysis and models' performance. The book will also be useful for data scientists and
machine learning developers who want to build machine learning-driven graph databases.
A beginner-level understanding of graph databases and graph data is required. An
intermediate-level working knowledge of Python programming and machine learning is
also expected to make the most out of this book.

What this book covers


Chapter 1, Getting Started with Graphs, introduces the basic concepts of graph theory
using the NetworkX Python library.
Chapter 2, Graph Machine Learning, introduces the main concepts of graph machine
learning and graph embedding techniques.
Chapter 3, Unsupervised Graph Learning, covers recent unsupervised graph embedding
methods.
Chapter 4, Supervised Graph Learning, covers recent supervised graph embedding
methods.
Chapter 5, Problems with Machine Learning on Graphs, introduces the most common
machine learning tasks on graphs.
Chapter 6, Social Network Graphs, shows an application of machine learning algorithms
on social network data.
Chapter 7, Text Analytics and Natural Language Processing Using Graphs, shows the
application of machine learning algorithms to natural language processing tasks.
Chapter 8, Graph Analysis for Credit Card Transactions, shows the application of machine
learning algorithms to credit card fraud detection.
Chapter 9, Building a Data-Driven Graph-Powered Application, introduces some
technologies and techniques that are useful for dealing with large graphs.
Chapter 10, Novel Trends on Graphs, introduces some novel trends (algorithms and
applications) in graph machine learning.

To get the most out of this book


A Jupyter or a Google Colab notebook is sufficient to cover all the examples. For some
chapters, Neo4j and Gephi are also required.

If you are using the digital version of this book, we advise you to type the code yourself
or access the code via the GitHub repository (link available in the next section). Doing
so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at
https://github.com/PacktPublishing/Graph-Machine-Learning. In case there's an
update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at
https://github.com/PacktPublishing/. Check them out!

Download the color images


We also provide a PDF file that has color images of the screenshots/diagrams used
in this book. You can download it here:
https://static.packt-cdn.com/downloads/9781800204492_ColorImages.pdf.

Conventions used
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names,
filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as
another disk in your system."
A block of code is set as follows:

html, body, #map {


height: 100%;
margin: 0;
padding: 0
}

When we wish to draw your attention to a particular part of a code block, the relevant
lines or items are set in bold:

Jupyter==1.0.0
networkx==2.5
matplotlib==3.2.2
node2vec==0.3.3
karateclub==1.0.19
scipy==1.6.2

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see on screen. For
example, words in menus or dialog boxes appear in the text like this. Here is an example:
"Select System info from the Administration panel."

Tips or important notes


Appear like this.

Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book
title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes
do happen. If you have found a mistake in this book, we would be grateful if you would
report this to us. Please visit www.packtpub.com/support/errata, selecting your
book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet,
we would be grateful if you would provide us with the location address or website name.
Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in,
and you are interested in either writing or contributing to a book, please visit authors.
packtpub.com.

Reviews
Please leave a review. Once you have read and used this book, why not leave a review on
the site that you purchased it from? Potential readers can then see and use your unbiased
opinion to make purchase decisions, we at Packt can understand what you think about
our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Section 1 – Introduction to Graph Machine Learning

In this section, the reader will get a brief introduction to graph machine learning, showing
the potential of graphs combined with the right machine learning algorithms. Moreover,
a general overview of graph theory and Python libraries is provided in order to allow the
reader to deal with (that is, create, modify, and plot) graph data structures.
This section comprises the following chapters:

• Chapter 1, Getting Started with Graphs


• Chapter 2, Graph Machine Learning
1
Getting Started with Graphs
Graphs are mathematical structures that are used for describing relations between entities
and are used almost everywhere. For example, social networks are graphs, where users
are connected depending on whether one user "follows" the updates of another user. They
can be used for representing maps, where cities are linked through streets. Graphs can
describe biological structures, web pages, and even the progression of neurodegenerative
diseases.
Graph theory, the study of graphs, has received major interest for years, leading people
to develop algorithms, identify properties, and define mathematical models to better
understand complex behaviors.
This chapter will review some of the concepts behind graph-structured data. Theoretical
notions will be presented, together with examples to help you understand some of the
more general concepts and put them into practice. In this chapter, we will introduce and
use some of the most widely used libraries for the creation, manipulation, and study of the
structure, dynamics, and functions of complex networks, specifically looking at the Python
networkx library.

The following topics will be covered in this chapter:

• Introduction to graphs with networkx


• Plotting graphs
• Graph properties
• Benchmarks and repositories
• Dealing with large graphs

Technical requirements
We will be using Jupyter Notebooks with Python 3.8 for all of our exercises. In the
following code snippet, we show a list of Python libraries that will be installed for
this chapter using pip (for example, run pip install networkx==2.5 on the
command line, and so on):

Jupyter==1.0.0
networkx==2.5
snap-stanford==5.0.0
matplotlib==3.2.2
pandas==1.1.3
scipy==1.6.2

In this book, the following Python commands will be referred to:

• import networkx as nx
• import pandas as pd
• import numpy as np

For more complex data visualization tasks, Gephi (https://gephi.org/) is also
required. The installation manual is available here: https://gephi.org/users/install/.
All code files relevant to this chapter are available at
https://github.com/PacktPublishing/Graph-Machine-Learning/tree/main/Chapter01.

Introduction to graphs with networkx


In this section, we will give a general introduction to graph theory. Moreover, in order
to merge theoretical concepts with their practical implementation, we will enrich our
explanation with code snippets in Python, using networkx.
A simple undirected graph (or simply, a graph) G is defined as a couple G=(V,E), where
V={v1, .., vn} is a set of nodes (also called vertices) and E={{vk, vw}, .., {vi, vj}} is a set
of edges (also called links). Each edge is a two-set (a set of two elements) representing the
connection between two nodes belonging to V.
It is important to underline that since each element of E is a two-set, there is no order
between the two nodes of an edge. In other words, {vk, vw} and {vw, vk} represent the same
edge.
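As a quick check of this property, the following minimal sketch (the two city names are only illustrative) shows that networkx treats an undirected edge as an unordered pair:

```python
import networkx as nx

# Build a tiny undirected graph with a single edge.
G = nx.Graph()
G.add_edge("Milan", "Paris")

# The edge can be looked up in either order, because it is a two-set.
same_edge = G.has_edge("Milan", "Paris") and G.has_edge("Paris", "Milan")

# Adding the reversed pair does not create a second edge.
G.add_edge("Paris", "Milan")
n_edges = G.number_of_edges()
print(same_edge, n_edges)
```

The graph still contains a single edge after the second call, since both calls describe the same two-set.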
We now provide definitions for some basic properties of graphs and nodes, as follows:

• The order of a graph is the number of its vertices |V|. The size of a graph is the
number of its edges |E|.
• The degree of a vertex is the number of edges that are adjacent to it. The neighbors of
a vertex v in a graph G are the vertices adjacent to v; they form a subset V′ of V.
• The neighborhood graph (also known as an ego graph) of a vertex v in a graph G
is a subgraph of G, composed of the vertices adjacent to v and all edges connecting
vertices adjacent to v.

An example of what a graph looks like can be seen in the following screenshot:

Figure 1.1 – Example of a graph


According to this representation, since there is no direction, an edge from Milan to Paris
is equal to an edge from Paris to Milan. Thus, it is possible to move in the two directions
without any constraint. If we analyze the properties of the graph depicted in Figure 1.1,
we can see that it has order and size equal to 4 (there are, in total, four vertices and four
edges). The Paris and Dublin vertices have degree 2, Milan has degree 3, and Rome has
degree 1. The neighbors for each node are shown in the following list:

• Paris = {Milan, Dublin}
• Milan = {Paris, Dublin, Rome}
• Dublin = {Paris, Milan}
• Rome = {Milan}

The same graph can be represented in networkx, as follows:

import networkx as nx
G = nx.Graph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Milan','Paris'), ('Paris','Dublin'),
('Milan','Rome')]
G.add_nodes_from(V)
G.add_edges_from(E)

Since the nx.Graph() command generates an undirected graph by default, we do not
need to specify both directions of each edge. In networkx, nodes can be any hashable
object: strings, classes, or even other networkx graphs. Let's now compute some
properties of the graph we previously generated.
All the nodes and edges of the graph can be obtained by running the following code:

print(f"V = {G.nodes}")
print(f"E = {G.edges}")

Here is the output of the previous commands:

V = ['Rome', 'Dublin', 'Milan', 'Paris']


E = [('Rome', 'Milan'), ('Dublin', 'Milan'), ('Dublin',
'Paris'), ('Milan', 'Paris')]

We can also compute the graph order, the graph size, and the degree and neighbors for
each of the nodes, using the following commands:

print(f"Graph Order: {G.number_of_nodes()}")
print(f"Graph Size: {G.number_of_edges()}")
print(f"Degree for nodes: { {v: G.degree(v) for v in G.nodes} }")
print(f"Neighbors for nodes: { {v: list(G.neighbors(v)) for v in G.nodes} }")

The result will be the following:

Graph Order: 4
Graph Size: 4
Degree for nodes: {'Rome': 1, 'Paris': 2, 'Dublin':2, 'Milan':
3}
Neighbors for nodes: {'Rome': ['Milan'], 'Paris': ['Milan',
'Dublin'], 'Dublin': ['Milan', 'Paris'], 'Milan': ['Dublin',
'Paris', 'Rome']}

Finally, we can also compute an ego graph of a specific node for the graph G, as follows:

ego_graph_milan = nx.ego_graph(G, "Milan")
print(f"Nodes: {ego_graph_milan.nodes}")
print(f"Edges: {ego_graph_milan.edges}")

The result will be the following:

Nodes: ['Paris', 'Milan', 'Dublin', 'Rome']


Edges: [('Paris', 'Milan'), ('Paris', 'Dublin'), ('Milan',
'Dublin'), ('Milan', 'Rome')]
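Milan is adjacent to every other node, so its ego graph coincides with the whole graph. As a contrasting illustration, here is a minimal sketch that rebuilds the same four-city graph and extracts the ego graph of Rome instead. Note that, with its default arguments, nx.ego_graph keeps the center node itself in the result:

```python
import networkx as nx

# Rebuild the four-city graph used in this section.
G = nx.Graph()
G.add_edges_from([("Milan", "Dublin"), ("Milan", "Paris"),
                  ("Paris", "Dublin"), ("Milan", "Rome")])

# Rome is adjacent only to Milan, so its ego graph is much smaller.
ego_graph_rome = nx.ego_graph(G, "Rome")
rome_nodes = sorted(ego_graph_rome.nodes)
rome_size = ego_graph_rome.number_of_edges()
print(rome_nodes, rome_size)
```

Only Rome, Milan, and the single edge between them survive in this subgraph.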

The original graph can also be modified by adding new nodes and/or edges, as follows:

# Add new nodes and edges
new_nodes = {'London', 'Madrid'}
new_edges = [('London','Rome'), ('Madrid','Paris')]
G.add_nodes_from(new_nodes)
G.add_edges_from(new_edges)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

This would output the following lines:

V = ['Rome', 'Dublin', 'Milan', 'Paris', 'London', 'Madrid']


E = [('Rome', 'Milan'), ('Rome', 'London'), ('Dublin',
'Milan'), ('Dublin', 'Paris'), ('Milan', 'Paris'), ('Paris',
'Madrid')]

Removal of nodes can be done by running the following code:

node_remove = {'London', 'Madrid'}


G.remove_nodes_from(node_remove)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

This is the result of the preceding commands:

V = ['Rome', 'Dublin', 'Milan', 'Paris']


E = [('Rome', 'Milan'), ('Dublin', 'Milan'), ('Dublin',
'Paris'), ('Milan', 'Paris')]

As expected, all the edges that contain the removed nodes are automatically deleted from
the edge list.
Also, edges can be removed by running the following code:

node_edges = [('Milan','Dublin'), ('Milan','Paris')]


G.remove_edges_from(node_edges)
print(f"V = {G.nodes}")
print(f"E = {G.edges}")

The final result will be as follows:

V = ['Dublin', 'Paris', 'Milan', 'Rome']


E = [('Dublin', 'Paris'), ('Milan', 'Rome')]

The networkx library also allows us to remove a single node or a single edge from
a graph G by using the following commands: G.remove_node('Dublin') and
G.remove_edge('Dublin', 'Paris').
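One practical difference between the single-item and the bulk methods is how they react to items that are not in the graph. The following sketch, which assumes the two-edge graph left over from the previous step and uses 'London' as a deliberately missing node, illustrates this behavior:

```python
import networkx as nx

# The graph as it stands after the edge removal shown previously.
G = nx.Graph()
G.add_edges_from([("Dublin", "Paris"), ("Milan", "Rome")])

# Removing a single node also drops every edge incident to it.
G.remove_node("Dublin")
remaining_edges = list(G.edges)

# The bulk method silently skips nodes that are not in the graph...
G.remove_nodes_from(["London"])

# ...while the single-node method raises an error for a missing node.
try:
    G.remove_node("London")
    raised = False
except nx.NetworkXError:
    raised = True
print(remaining_edges, raised)
```

When the presence of a node is uncertain, the bulk variant is therefore the more forgiving choice.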

Types of graphs
In the previous section, we described how to create and modify simple undirected graphs.
Here, we will show how we can extend this basic data structure in order to encapsulate
more information, thanks to the introduction of directed graphs (digraphs), weighted
graphs, and multigraphs.

Digraphs
A digraph G is defined as a couple G=(V, E), where V={v1, .., vn} is a set of nodes and
E={(vk, vw), .., (vi, vj)} is a set of ordered couples representing the connection between
two nodes belonging to V.
Since each element of E is an ordered couple, it enforces the direction of the connection.
The edge (vk, vw) means the node vk goes into vw. This is different from (vw, vk),
which means the node vw goes into vk. The starting node vk is called the tail, while the
ending node vw is called the head.
Due to the presence of edge direction, the definition of node degree needs to be extended.

Indegree and outdegree

For a vertex v, the number of head ends adjacent to v is called the indegree
(indicated by $\deg^{-}(v)$) of v, while the number of tail ends adjacent to v is its
outdegree (indicated by $\deg^{+}(v)$).

An example of what a digraph looks like is available in the following screenshot:

Figure 1.2 – Example of a digraph


The direction of the edge is visible from the arrow. For example, Milan -> Dublin means
from Milan to Dublin. Dublin has $\deg^{-}(v) = 2$ and $\deg^{+}(v) = 0$, Paris has
$\deg^{-}(v) = 0$ and $\deg^{+}(v) = 2$, Milan has $\deg^{-}(v) = 1$ and
$\deg^{+}(v) = 2$, and Rome has $\deg^{-}(v) = 1$ and $\deg^{+}(v) = 0$.
The same graph can be represented in networkx, as follows:

G = nx.DiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Paris','Milan'), ('Paris','Dublin'),
('Milan','Rome')]
G.add_nodes_from(V)
G.add_edges_from(E)

The definition is the same as that used for simple undirected graphs; the only difference
is in the networkx class that is used to instantiate the object. For digraphs, the
nx.DiGraph() class is used.
Indegree and outdegree can be computed using the following commands:

print(f"Indegree for nodes: { {v: G.in_degree(v) for v in G.nodes} }")
print(f"Outdegree for nodes: { {v: G.out_degree(v) for v in G.nodes} }")

The results will be as follows:

Indegree for nodes: {'Rome': 1, 'Paris': 0, 'Dublin': 2,


'Milan': 1}
Outdegree for nodes: {'Rome': 0, 'Paris': 2, 'Dublin': 0,
'Milan': 2}

As for the undirected graphs, the G.add_nodes_from(), G.add_edges_from(),
G.remove_nodes_from(), and G.remove_edges_from() functions can be used
to modify a given graph G.
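Beyond the degree counts, it is often useful to know which nodes sit at the other end of the incoming and outgoing edges. The following minimal sketch rebuilds the digraph of Figure 1.2 and uses the predecessors and successors methods of nx.DiGraph for this purpose:

```python
import networkx as nx

# Rebuild the digraph of Figure 1.2.
G = nx.DiGraph()
G.add_edges_from([("Milan", "Dublin"), ("Paris", "Milan"),
                  ("Paris", "Dublin"), ("Milan", "Rome")])

# Predecessors are the sources of incoming edges; successors are the
# targets of outgoing edges.
preds = {v: sorted(G.predecessors(v)) for v in G.nodes}
succs = {v: sorted(G.successors(v)) for v in G.nodes}

# Their counts match the indegree and outdegree of each node.
consistent = all(
    len(preds[v]) == G.in_degree(v) and len(succs[v]) == G.out_degree(v)
    for v in G.nodes
)

# Unlike in an undirected graph, the order of the pair now matters.
forward = G.has_edge("Milan", "Dublin")
backward = G.has_edge("Dublin", "Milan")
print(preds["Dublin"], succs["Paris"], consistent, forward, backward)
```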

Multigraph
We will now introduce the multigraph object, which is a generalization of the graph
definition that allows multiple edges to have the same pair of start and end nodes.
A multigraph G is defined as G=(V, E), where V is a set of nodes and E is a multi-set (a set
allowing multiple instances for each of its elements) of edges.
A multigraph is called a directed multigraph if E is a multi-set of ordered couples;
otherwise, if E is a multi-set of two-sets, then it is called an undirected multigraph.
An example of a directed multigraph is available in the following screenshot:

Figure 1.3 – Example of a multigraph


In the following code snippet, we show how to use networkx in order to create a
directed or an undirected multigraph:

directed_multi_graph = nx.MultiDiGraph()
undirected_multi_graph = nx.MultiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin'), ('Milan','Dublin'), ('Paris','Milan'),
('Paris','Dublin'), ('Milan','Rome'), ('Milan','Rome')]
directed_multi_graph.add_nodes_from(V)
undirected_multi_graph.add_nodes_from(V)
directed_multi_graph.add_edges_from(E)
undirected_multi_graph.add_edges_from(E)

The only difference between a directed and an undirected multigraph is in the first
two lines, where two different objects are created: nx.MultiDiGraph() is used to
create a directed multigraph, while nx.MultiGraph() is used to build an undirected
multigraph. The function used to add nodes and edges is the same for both objects.
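To see what the multi-set of edges changes in practice, the following sketch loads the same edge list into an nx.MultiGraph and into a plain nx.Graph and compares the resulting edge counts (the variable names are chosen here only for illustration):

```python
import networkx as nx

E = [("Milan", "Dublin"), ("Milan", "Dublin"), ("Paris", "Milan"),
     ("Paris", "Dublin"), ("Milan", "Rome"), ("Milan", "Rome")]

multi_graph = nx.MultiGraph()
multi_graph.add_edges_from(E)
simple_graph = nx.Graph()
simple_graph.add_edges_from(E)

# A multigraph stores both parallel Milan-Dublin edges...
parallel = multi_graph.number_of_edges("Milan", "Dublin")
# ...while a simple graph collapses repeated pairs into one edge.
multi_size = multi_graph.number_of_edges()
simple_size = simple_graph.number_of_edges()
print(parallel, multi_size, simple_size)
```

The multigraph keeps all six edges, whereas the simple graph ends up with four.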

Weighted graphs
We will now introduce weighted graphs, in their directed, undirected, and multigraph forms.
An edge-weighted graph (or simply, a weighted graph) G is defined as G=(V, E, w), where
V is a set of nodes, E is a set of edges, and w: E → ℝ is the weight function that assigns
to each edge e ∈ E a weight expressed as a real number.
A node-weighted graph G is defined as G=(V, E, w), where V is a set of nodes, E is a set of
edges, and w: V → ℝ is the weight function that assigns to each node v ∈ V a weight
expressed as a real number.
Please keep the following points in mind:

• If E is a set of ordered couples, then we call it a directed weighted graph.
• If E is a set of two-sets, then we call it an undirected weighted graph.
• If E is a multi-set of ordered couples, we call it a directed weighted multigraph.
• If E is a multi-set of two-sets, we call it an undirected weighted multigraph.

An example of a directed edge-weighted graph is available in the following screenshot:

Figure 1.4 – Example of a directed edge-weighted graph


From Figure 1.4, it is easy to see how the presence of weights on graphs helps to add useful
information to the data structures. Indeed, we can imagine the edge weight as a "cost" to
reach a node from another node. For example, reaching Dublin from Milan has a "cost"
of 19, while reaching Dublin from Paris has a "cost" of 11.

In networkx, a directed weighted graph can be generated as follows:

G = nx.DiGraph()
V = {'Dublin', 'Paris', 'Milan', 'Rome'}
E = [('Milan','Dublin', 19), ('Paris','Milan', 8),
('Paris','Dublin', 11), ('Milan','Rome', 5)]
G.add_nodes_from(V)
G.add_weighted_edges_from(E)
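Once the weights are stored, they can be read back and used by graph algorithms. The following sketch is only an illustration of the "cost" interpretation: it rebuilds the weighted digraph of Figure 1.4, reads one weight, sums the weights leaving Milan, and asks networkx for the cheapest route from Paris to Dublin:

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([("Milan", "Dublin", 19), ("Paris", "Milan", 8),
                           ("Paris", "Dublin", 11), ("Milan", "Rome", 5)])

# add_weighted_edges_from stores each value under the "weight" key.
milan_dublin = G["Milan"]["Dublin"]["weight"]

# Weighted outdegree: sum of the weights of the outgoing edges.
milan_out = G.out_degree("Milan", weight="weight")

# Cheapest route from Paris to Dublin when weights are read as costs.
cost = nx.shortest_path_length(G, "Paris", "Dublin", weight="weight")
print(milan_dublin, milan_out, cost)
```

Here, the direct Paris -> Dublin edge (cost 11) is cheaper than going through Milan (8 + 19 = 27).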

Bipartite graphs
We will now introduce another type of graph that will be used in this section: multipartite
graphs. Bipartite and tripartite graphs (and, more generally, k-partite graphs) are graphs
whose vertices can be partitioned into two, three, or, more generally, k sets of nodes,
respectively. Edges are only allowed across different sets and are not allowed between
nodes belonging to the same set. In most cases, nodes belonging to different sets are also
characterized by particular node types. In Chapter 7, Text Analytics and Natural Language
Processing Using Graphs, and Chapter 8, Graph Analysis for Credit Card Transactions, we
will deal with some practical examples of graph-based applications, and you will see how
multipartite graphs can indeed arise in several contexts, for example, in the following
scenarios:

• When processing documents and structuring the information in a bipartite graph of


documents and entities that appear in the documents
• When dealing with transactional data, in order to encode the relations between the
buyers and the merchants

A bipartite graph can be easily created in networkx with the following code:

import networkx as nx
import numpy as np
import pandas as pd

n_nodes = 10
n_edges = 12
# Even-numbered nodes form one set, odd-numbered nodes the other
bottom_nodes = [ith for ith in range(n_nodes) if ith % 2 == 0]
top_nodes = [ith for ith in range(n_nodes) if ith % 2 == 1]
# Randomly pair a bottom node with a top node for each edge
iter_edges = zip(
    np.random.choice(bottom_nodes, n_edges),
    np.random.choice(top_nodes, n_edges))
edges = pd.DataFrame([
    {"source": a, "target": b} for a, b in iter_edges])
B = nx.Graph()
# The bipartite attribute records which set each node belongs to
B.add_nodes_from(bottom_nodes, bipartite=0)
B.add_nodes_from(top_nodes, bipartite=1)
B.add_edges_from([tuple(x) for x in edges.values])

The network can also be conveniently plotted using the bipartite_layout utility
function of networkx, as illustrated in the following code snippet:

from networkx.drawing.layout import bipartite_layout

pos = bipartite_layout(B, bottom_nodes)
nx.draw_networkx(B, pos=pos)

The bipartite_layout function produces a plot of the graph, as shown in the following
screenshot:

Figure 1.5 – Example of a bipartite graph
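
The rule that edges may only connect nodes from different sets can also be verified programmatically. The following is a minimal sketch, assuming a small hand-made graph of documents and entities (the node names are hypothetical and chosen only for illustration); it relies on the bipartite module of networkx.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical document/entity data, used only for illustration
documents = ['doc1', 'doc2', 'doc3']
entities = ['Dublin', 'Paris']

B = nx.Graph()
B.add_nodes_from(documents, bipartite=0)
B.add_nodes_from(entities, bipartite=1)
# Every edge connects a document to an entity that appears in it
B.add_edges_from([('doc1', 'Dublin'), ('doc2', 'Dublin'),
                  ('doc2', 'Paris'), ('doc3', 'Paris')])

# The graph admits a two-set partition, and `documents` is one side of it
valid_graph = bipartite.is_bipartite(B)
valid_partition = bipartite.is_bipartite_node_set(B, documents)

# An edge between two documents closes the odd cycle doc1-Dublin-doc2,
# so the graph is no longer bipartite
B.add_edge('doc1', 'doc2')
still_bipartite = bipartite.is_bipartite(B)

print(valid_graph, valid_partition, still_bipartite)  # True True False
```

Passing the node set explicitly to is_bipartite_node_set is a deliberate choice here: a randomly generated bipartite graph may be disconnected, in which case the two sets cannot be inferred unambiguously from the edges alone.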

Graph representations
As described in the previous sections, with networkx, we can actually define and
manipulate a graph by using node and edge objects. In different use cases, such a
representation would not be as easy to handle. In this section, we will show two ways to
perform a compact representation of a graph data structure, namely, an adjacency matrix
and an edge list.

Adjacency matrix
The adjacency matrix M of a graph G=(V,E) is a square (|V| × |V|) matrix such that
its element M_ij is 1 when there is an edge from node i to node j, and 0 when there is no
edge. In the following screenshot, we show a simple example where the adjacency matrix
of different types of graphs is displayed:

Figure 1.6 – Adjacency matrix for an undirected graph, a digraph, a multigraph, and a weighted graph

It is easy to see that adjacency matrices for undirected graphs are always symmetric,
since no direction is defined for the edges. Symmetry is instead not guaranteed for the
adjacency matrix of a digraph, since an edge from node i to node j does not imply an
edge from node j to node i. For a multigraph, we can instead have values greater than 1, since
multiple edges can be used to connect the same couple of nodes. For a weighted graph, the
value in a specific cell is equal to the weight of the edge connecting the two nodes.
In networkx, the adjacency matrix for a given graph can be computed in two different
ways. If G is the networkx digraph of Figure 1.6, we can compute its adjacency matrix as follows:

nx.to_pandas_adjacency(G) # adjacency matrix as pd DataFrame
# to_numpy_matrix was removed in networkx 3.0; nx.to_numpy_array replaces it
nx.to_numpy_matrix(G) # adjacency matrix as numpy matrix

For the first and second line, we get the following results respectively:

        Rome  Dublin  Milan  Paris
Rome     0.0     0.0    0.0    0.0
Dublin   0.0     0.0    0.0    0.0
Milan    1.0     1.0    0.0    0.0
Paris    0.0     1.0    1.0    0.0

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [1. 1. 0. 0.]
 [0. 1. 1. 0.]]

Since a numpy matrix cannot represent the names of the nodes, the order of the elements in
the adjacency matrix is the one defined in the G.nodes list.
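
The properties listed above (symmetry for undirected graphs, weights stored in the cells, and values greater than 1 for multigraphs) can be checked numerically. The following is a minimal sketch, assuming the four-city example used earlier and a fixed node order passed through the nodelist argument; the graph names U, D, and M are illustrative.

```python
import networkx as nx
import numpy as np

nodes = ['Dublin', 'Milan', 'Paris', 'Rome']  # fixed row/column order
edges = [('Milan', 'Dublin', 19), ('Paris', 'Milan', 8),
         ('Paris', 'Dublin', 11), ('Milan', 'Rome', 5)]

# Same weighted edges, once undirected and once directed
U = nx.Graph()
U.add_weighted_edges_from(edges)
D = nx.DiGraph()
D.add_weighted_edges_from(edges)
A_undirected = nx.to_numpy_array(U, nodelist=nodes)
A_directed = nx.to_numpy_array(D, nodelist=nodes)

# Multigraph with two parallel, unweighted edges between Milan and Rome
M = nx.MultiGraph()
M.add_nodes_from(nodes)
M.add_edges_from([('Milan', 'Rome'), ('Milan', 'Rome')])
A_multi = nx.to_numpy_array(M, nodelist=nodes)

i, j = nodes.index('Milan'), nodes.index('Dublin')
print(np.array_equal(A_undirected, A_undirected.T))  # True
print(np.array_equal(A_directed, A_directed.T))      # False
print(A_directed[i, j], A_directed[j, i])            # 19.0 0.0
print(A_multi[nodes.index('Milan'), nodes.index('Rome')])  # 2.0
```

Passing nodelist explicitly makes the row and column order independent of the insertion order stored in G.nodes, which is convenient when matrices from different graphs need to be compared.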

Edge list
As well as an adjacency matrix, an edge list is another compact way to represent graphs.
The idea behind this format is to represent a graph as a list of edges.
The edge list L of a graph G=(V,E) is a list of size |E| such that its element L_i is a
couple representing the tail and the end node of the edge i. An example of the edge list for
each type of graph is available in the following screenshot:

Figure 1.7 – Edge list for an undirected graph, a digraph, a multigraph, and a weighted graph
In the following code snippet, we show how to compute in networkx the edge list of the
simple undirected graph G available in Figure 1.7:

print(nx.to_pandas_edgelist(G))

By running the preceding command, we get the following result:

  source  target
0  Milan  Dublin
1  Milan    Rome
2  Paris   Milan
3  Paris  Dublin

Other representation methods, which we will not discuss in detail, are also available in
networkx. Some examples are nx.to_dict_of_dicts(G) and nx.to_numpy_array(G),
among others.
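
An edge list is enough to rebuild the edges of a graph. The following is a minimal sketch, assuming the directed weighted graph of the four cities used earlier and a plain Python list of (tail, end, weight) tuples rather than a pandas DataFrame; the names edge_list and H are illustrative.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ('Milan', 'Dublin', 19), ('Paris', 'Milan', 8),
    ('Paris', 'Dublin', 11), ('Milan', 'Rome', 5)])

# One (tail, end, weight) tuple per edge, so the list has |E| elements
edge_list = list(G.edges(data='weight'))

# Rebuild a graph from the edge list alone
H = nx.DiGraph()
H.add_weighted_edges_from(edge_list)

same_size = len(edge_list) == G.number_of_edges()
same_edges = sorted(edge_list) == sorted(H.edges(data='weight'))
print(same_size, same_edges)  # True True
```

Note that an edge list only records nodes that appear in at least one edge, so isolated nodes would have to be stored separately.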

Plotting graphs
As we have seen in previous sections, graphs are intuitive data structures that can be
represented graphically. Nodes can be plotted as simple circles, while edges are lines
connecting two nodes.
Despite their simplicity, it could be quite difficult to make a clear representation when the
number of edges and nodes increases. The source of this complexity is mainly related to
the position (space/Cartesian coordinates) to assign to each node in the final plot. Indeed,
for a graph with hundreds of nodes, it could be unfeasible to manually assign the specific
position of each node in the final plot.
In this section, we will see how we can plot graphs without specifying coordinates for each
node. We will exploit two different solutions: networkx and Gephi.

networkx
networkx offers a simple interface to plot graph objects through the nx.draw function. In
the following code snippet, we show how to use it in order to plot graphs:

def draw_graph(G, nodes_position, plot_weight):
    # Draw nodes and edges at the given positions
    nx.draw(G, pos=nodes_position, with_labels=True, font_size=15,
            node_size=400, edge_color='gray', arrowsize=30)
    if plot_weight:
        # Annotate each edge with its 'weight' attribute
        edge_labels = nx.get_edge_attributes(G, 'weight')
        nx.draw_networkx_edge_labels(G, pos=nodes_position,
                                     edge_labels=edge_labels)

Here, nodes_position is a dictionary where the keys are the nodes and the value
assigned to each key is an array of length 2, with the Cartesian coordinates used for
plotting the specific node.
The nx.draw function will plot the whole graph by putting its nodes in the given
positions. The with_labels option will plot the name of each node on top of it with the
specified font_size value. node_size and edge_color will respectively specify the
size of the circle representing the node and the color of the edges. Finally, arrowsize
will define the size of the arrow for directed edges. This option will be used when the
graph to be plotted is a digraph.
In the following code example, we show how to use the draw_graph function previously
defined in order to plot a graph:

G = nx.Graph()
V = {'Paris', 'Dublin','Milan', 'Rome'}
E = [('Paris','Dublin', 11), ('Paris','Milan', 8),
     ('Milan','Rome', 5), ('Milan','Dublin', 19)]
G.add_nodes_from(V)
G.add_weighted_edges_from(E)
node_position = {"Paris": [0,0], "Dublin": [0,1],
                 "Milan": [1,0], "Rome": [1,1]}
draw_graph(G, node_position, True)

The result of the plot is available to view in the following screenshot:

Figure 1.8 – Result of the plotting function



The method previously described is simple but impractical in a real scenario since
the node_position value could be difficult to decide. In order to solve this issue,
networkx offers different functions to automatically compute the position of each node
according to different layouts. In Figure 1.9, we show a series of plots of an undirected
graph, obtained using the different layouts available in networkx. In order to use them
in the function we proposed, we simply need to assign node_position to the result
of the layout we want to use, for example, node_position = nx.circular_layout(G).
The plots can be seen in the following screenshot:

Figure 1.9 – Plots of the same undirected graph with di erent layouts
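
Each layout function returns the same kind of dictionary described for nodes_position, so layouts are interchangeable. The following is a minimal sketch, assuming the undirected four-city graph used earlier; the choice of the three layouts and the seed value are arbitrary, and the layouts dictionary is an illustrative name. No drawing is performed, so the snippet only inspects the computed positions.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('Paris', 'Dublin'), ('Paris', 'Milan'),
                  ('Milan', 'Rome'), ('Milan', 'Dublin')])

# Each layout function maps every node to a pair of (x, y) coordinates
layouts = {
    'circular': nx.circular_layout(G),
    'shell': nx.shell_layout(G),
    # The spring layout is randomized; a seed makes it reproducible
    'spring': nx.spring_layout(G, seed=42),
}

for name, node_position in layouts.items():
    rounded = {node: tuple(round(float(c), 2) for c in xy)
               for node, xy in node_position.items()}
    print(name, rounded)
```

Any of these dictionaries can be passed as the position argument of a plotting helper such as draw_graph, in place of manually chosen coordinates.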
networkx is a great tool for easily manipulating and analyzing graphs, but it does
not offer good functionalities for producing complex and good-looking plots of
graphs. In the next section, we will investigate another tool for complex graph
visualization: Gephi.
