A Gentle Introduction To Graph Neural Networks
Graph
Before we get into GNNs, let's first understand what a graph is. In computer science, a
graph is a data structure consisting of two components: vertices and edges. A graph G
can be well described by the set of vertices V and edges E it contains.
The vertices are often called nodes. In this article, these two terms are interchangeable.
In the node classification problem setup, each node v is characterized by its feature x_v
and associated with a ground-truth label t_v. Given a partially labeled graph G, the
goal is to leverage the labeled nodes to predict the labels of the unlabeled ones. The GNN
learns to represent each node with a d-dimensional vector (state) h_v that contains the
information of its neighborhood. Specifically,
h_v = f(x_v, x_co[v], h_ne[v], x_ne[v])  (https://arxiv.org/pdf/1812.08434)
where x_co[v] denotes the features of the edges connected to v, h_ne[v] denotes the
embeddings of the neighboring nodes of v, and x_ne[v] denotes the features of the
neighboring nodes of v. The function f is the transition function that projects these
inputs onto a d-dimensional space. Since we are seeking a unique solution for h_v, we
can apply the Banach fixed-point theorem and rewrite the above equation as an iterative
update process. Such an operation is often referred to as message passing or
neighborhood aggregation.
H^(t+1) = F(H^t, X)  (https://arxiv.org/pdf/1812.08434)
where H^t and X denote the stacked states and features of all the nodes, respectively.
The output of the GNN is computed by passing the state h_v as well as the feature x_v
to an output function g.
o_v = g(h_v, x_v)  (https://arxiv.org/pdf/1812.08434)
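To make the fixed-point iteration concrete, here is a minimal numpy sketch of the update h_v = f(...) and the readout o_v = g(h_v, x_v), assuming a toy adjacency list, randomly initialized linear stand-ins for f and g, and no edge features; it illustrates the iteration itself, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph as an adjacency list, plus a feature vector x_v per node.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
d_x, d_h = 4, 8
x = {v: rng.normal(size=d_x) for v in neighbors}

# Random stand-ins for the learned transition function f and output function g.
# The small weight scale keeps f a contraction, so the iteration converges.
W_self = 0.1 * rng.normal(size=(d_h, d_x))
W_nbr = 0.05 * rng.normal(size=(d_h, d_x + d_h))
W_out = rng.normal(size=(2, d_h + d_x))

def f(v, h):
    """Transition function: combine x_v with the features and states of v's
    neighbors (edge features x_co[v] are omitted for brevity)."""
    msgs = [W_nbr @ np.concatenate([x[u], h[u]]) for u in neighbors[v]]
    return np.tanh(W_self @ x[v] + np.mean(msgs, axis=0))

def g(v, h):
    """Output function: map the state h_v and the feature x_v to an output o_v."""
    return W_out @ np.concatenate([h[v], x[v]])

# Message passing: iterate h_v <- f(...) until the states stop changing.
h = {v: np.zeros(d_h) for v in neighbors}
for step in range(200):
    h_new = {v: f(v, h) for v in neighbors}
    delta = max(np.linalg.norm(h_new[v] - h[v]) for v in neighbors)
    h = h_new
    if delta < 1e-6:
        break

print(step, {v: g(v, h) for v in neighbors})
```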
However, there are three main limitations with this original proposal of GNN pointed
out by this paper:
1. It is inefficient to update the hidden states iteratively to reach the fixed point;
relaxing the fixed-point assumption allows a multi-layer architecture in which each
layer has its own parameters.
2. It cannot process edge information (e.g. different edges in a knowledge graph may
indicate different relationships between nodes).
3. The fixed point can discourage the diversification of node distributions, and thus may
not be suitable for learning to represent nodes.
Several variants of GNN have been proposed to address the above issues. However, they
are not covered here as they are not the focus of this post.
DeepWalk
DeepWalk is the first algorithm to propose learning node embeddings in an unsupervised
manner. It highly resembles word embedding in terms of the training process. The
motivation is that the distributions of both nodes in a graph and words in a corpus
follow a power law, as shown in the following figure:
Power-law distribution of nodes in a graph and words in a corpus (http://www.perozzi.net/publications/14_kdd_deepwalk.pdf)
The algorithm contains two steps:
1. Perform random walks on the nodes of the graph to generate node sequences
2. Run skip-gram to learn the embedding of each node based on the node sequences
generated in step 1
At each time step of the random walk, the next node is sampled uniformly from the
neighbors of the previous node. Each sequence is then truncated into sub-sequences of
length 2|w| + 1, where w denotes the window size in skip-gram. If you are not
familiar with skip-gram, my previous blog post gives a brief overview of how it works.
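As a rough illustration of step 1, here is a minimal, dependency-free sketch of uniform random-walk generation; the toy adjacency list, walk length, and number of walks per node are made-up values, and the gensim call mentioned in the comments is only one possible way to run skip-gram on the resulting sequences.

```python
import random

def random_walks(neighbors, walk_length=10, walks_per_node=5, seed=0):
    """Step 1 of DeepWalk: uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(neighbors)
        rng.shuffle(nodes)          # visit start nodes in a random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # The next node is sampled uniformly from the previous node's neighbors.
                walk.append(rng.choice(neighbors[walk[-1]]))
            walks.append(walk)
    return walks

# Toy adjacency list (illustrative only).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(neighbors)
print(walks[0])

# Step 2 would treat each walk as a "sentence" and run skip-gram on it, e.g.:
# from gensim.models import Word2Vec
# model = Word2Vec([[str(v) for v in w] for w in walks],
#                  vector_size=64, window=5, sg=1, hs=1, min_count=0)
```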
To train skip-gram, we need the probability of observing a neighboring node given the
current node, and the softmax that produces this probability is normalized over every
vertex in the graph. Therefore, the computation time is O(|V|) for the original softmax,
where V denotes the set of vertices in the graph.
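To make that cost concrete, here is a small sketch of the full softmax skip-gram would otherwise need; the embedding matrix is a random stand-in, and the point is simply that every prediction touches all |V| output vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
num_vertices, d = 10_000, 64
out_embeddings = 0.01 * rng.normal(size=(num_vertices, d))  # one output vector per vertex

def full_softmax(context_embedding):
    """P(v | context) for every vertex: the denominator sums over all |V| vertices."""
    scores = out_embeddings @ context_embedding   # O(|V| * d) work per prediction
    scores -= scores.max()                        # for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

probs = full_softmax(rng.normal(size=d))
print(probs.shape, probs.sum())   # (10000,) ~1.0
```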
Hierarchical softmax utilizes a binary tree to deal with this problem. In this binary tree,
all the leaves (v1, v2, … v8 in the figure below) are the vertices of the graph. At each
inner node, a binary classifier decides which path to follow. To compute the probability
of a given vertex v_k, one simply computes the probability of each sub-path along the
path from the root node to the leaf v_k. Since the probabilities of each node's children
sum to 1, the property that the probabilities of all the vertices sum to 1 still holds in the
hierarchical softmax. The computation time for an element is now reduced to O(log|V|),
as the longest path in a binary tree is bounded by O(log n), where n is the number of
leaves.
Hierarchical Softmax (http://www.perozzi.net/publications/14_kdd_deepwalk.pdf)
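A minimal sketch of how such a root-to-leaf product could be computed is shown below; the complete-binary-tree (heap) layout, the number of leaves, and the per-inner-node weight vectors are all assumptions made for illustration rather than the exact parameterization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # embedding dimension (illustrative)
num_leaves = 8               # v1 ... v8, as in the figure above
num_inner = num_leaves - 1   # inner nodes of a complete binary tree

# One binary classifier (here just a weight vector) per inner node.
inner_w = 0.1 * rng.normal(size=(num_inner, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaf_probability(leaf_index, context_embedding):
    """P(v_k | context) as the product of left/right decisions on the root-to-leaf path.

    Nodes are stored heap-style: inner node i has children 2i+1 and 2i+2,
    and the leaves occupy indices num_inner .. 2 * num_inner.
    """
    node = num_inner + leaf_index
    prob = 1.0
    while node > 0:
        parent = (node - 1) // 2
        p_left = sigmoid(inner_w[parent] @ context_embedding)
        prob *= p_left if node == 2 * parent + 1 else 1.0 - p_left
        node = parent
    return prob

context = rng.normal(size=d)
probs = [leaf_probability(k, context) for k in range(num_leaves)]
print(sum(probs))   # ~1.0: the leaf probabilities still sum to one
```

Each call only evaluates the classifiers on the root-to-leaf path, i.e. about log2(|V|) of them, instead of touching all |V| output vectors.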
After the DeepWalk model is trained, it has learned a good representation of each
node, as shown in the following figure. Different colors indicate different labels in the
input graph. We can see that in the output graph (embeddings with 2 dimensions),
nodes with the same label are clustered together, while most nodes with different
labels are properly separated.
Node representations learned by DeepWalk (http://www.perozzi.net/publications/14_kdd_deepwalk.pdf)
However, the main issue with DeepWalk is that it lacks the ability to generalize.
Whenever a new node comes in, the model has to be re-trained in order to represent this
node (it is transductive). Thus, such a model is not suitable for dynamic graphs where the
nodes are ever-changing.
GraphSage
GraphSage provides a solution to the aforementioned problem by learning the
embedding for each node in an inductive way. Specifically, each node is represented
by the aggregation of its neighborhood. Thus, even if a new node unseen during
training appears in the graph, it can still be properly represented by its
neighboring nodes. The GraphSage algorithm is shown below.
The GraphSage embedding-generation algorithm (https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)
The outer loop indicates the number of update iterations, while h^k_v denotes the
latent vector of node v at update iteration k. At each update iteration, h^k_v is computed
from an aggregation function, the latent vectors of v and of v's neighborhood from the
previous iteration, and a weight matrix W^k. The paper proposes three aggregation
functions:
1. Mean aggregator:
The mean aggregator takes the average of the latent vectors of a node and all of its
neighbors.
h^k_v = σ(W^k · MEAN({h^(k-1)_v} ∪ {h^(k-1)_u, ∀u ∈ N(v)}))  (https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)
Compared with the original update equation, it removes the concatenation operation at
line 5 of the above pseudo-code. That concatenation can be viewed as a “skip connection”,
which the paper later shows to largely improve performance. (A code sketch of how the
aggregators plug into the update loop follows the list below.)
2. LSTM aggregator:
Since the nodes in a graph have no natural order but an LSTM processes its inputs
sequentially, the authors apply the LSTM to a random permutation of the node's neighbors.
3. Pooling aggregator:
In the pooling aggregator, each neighbor's latent vector is fed through a fully connected
layer, followed by an element-wise max-pooling operation:
AGGREGATE^pool_k = max({σ(W_pool · h^k_(u_i) + b), ∀u_i ∈ N(v)})  (https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)
where max can be replaced with mean-pooling or any other symmetric pooling function.
The paper points out that the pooling aggregator performs the best, and that mean-pooling
and max-pooling have similar performance, so max-pooling is used as the default
aggregation function.
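As referenced above, here is a minimal numpy sketch of K = 2 GraphSage update iterations on a toy graph, showing both the mean aggregator and the max-pooling aggregator; the graph, dimensions, and weights are random stand-ins, so this only illustrates the structure of the update, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph; h^0_v is initialised with the node features x_v.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
d, K = 6, 2
h = {v: rng.normal(size=d) for v in neighbors}

# Random stand-ins for the learned weights (one W^k per iteration).
W_mean = [0.3 * rng.normal(size=(d, d)) for _ in range(K)]
W_cat = [0.3 * rng.normal(size=(d, 2 * d)) for _ in range(K)]
W_pool = 0.3 * rng.normal(size=(d, d))
b_pool = 0.1 * rng.normal(size=d)
relu = lambda z: np.maximum(0.0, z)

use_mean_aggregator = False   # flip to True to try the mean aggregator

for k in range(K):
    h_new = {}
    for v in neighbors:
        nbr_vecs = [h[u] for u in neighbors[v]]
        if use_mean_aggregator:
            # Mean aggregator: average h^{k-1}_v with its neighbours' vectors;
            # this variant drops the concatenation at line 5 of the pseudo-code.
            z = relu(W_mean[k] @ np.mean([h[v]] + nbr_vecs, axis=0))
        else:
            # Pooling aggregator: a dense layer per neighbour, element-wise max,
            # then concatenation with h^{k-1}_v as in line 5 of the pseudo-code.
            agg = np.max([relu(W_pool @ u + b_pool) for u in nbr_vecs], axis=0)
            z = relu(W_cat[k] @ np.concatenate([h[v], agg]))
        h_new[v] = z / (np.linalg.norm(z) + 1e-12)   # L2 normalisation step
    h = h_new

print(h[0])
```

Switching the use_mean_aggregator flag swaps between the two variants without changing the rest of the loop.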
GraphSage can also be trained in an unsupervised manner with the following graph-based
loss function:
J_G(z_u) = −log(σ(z_u^T z_v)) − Q · E_(v_n∼P_n(v))[log(σ(−z_u^T z_(v_n)))]  (https://www-cs-faculty.stanford.edu/people/jure/pubs/graphsage-nips17.pdf)
where u and v co-occur in a fixed-length random walk, while the v_n are negative
samples that do not co-occur with u. Such a loss function encourages nearby nodes to have
similar embeddings, while nodes far apart are pushed away from each other in the
projected space. Via this approach, the nodes gain more and more information about their
neighborhoods.
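As a rough sketch of this objective for a single positive pair (u, v) and Q negative samples, assuming toy random embeddings in place of the model's outputs z:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unsupervised_loss(z_u, z_v, z_negatives):
    """-log(sigma(z_u . z_v)) - Q * E[log(sigma(-z_u . z_vn))], with the
    expectation approximated by the average over the Q sampled negatives."""
    positive_term = -np.log(sigmoid(z_u @ z_v))
    negative_term = -np.mean([np.log(sigmoid(-z_u @ z_n)) for z_n in z_negatives])
    return positive_term + len(z_negatives) * negative_term

# Toy embeddings: u and v co-occur on a random walk, the v_n do not.
d, Q = 16, 5
z_u, z_v = rng.normal(size=d), rng.normal(size=d)
z_negatives = [rng.normal(size=d) for _ in range(Q)]
print(unsupervised_loss(z_u, z_v, z_negatives))
```

In practice, z_u, z_v, and the negatives would be the output representations produced by the aggregation steps above, and the loss would be minimized with gradient descent.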