Tutorial3

The document provides an overview of Graph Attention Networks (GAT), detailing their structure, advantages, and implementation. It discusses the graph attention layer, message passing, and the computational efficiency of GAT, highlighting its ability to assign different levels of importance to nodes. It also includes implementation details using PyTorch Geometric and the steps to create a GCNConv layer.


Graph Attention Networks (GAT)

Antonio Longa¹,²
¹ MobS Lab, Fondazione Bruno Kessler, Trento, Italy
² SML Lab, University of Trento, Italy
TABLE OF CONTENTS

01 Recap
02 Introduction
03 Graph attention layer (GAT)
04 Pros of GAT
05 Message passing implementation
06 Implement our GCNConv
07 GAT implementation
01 Recap
PROBLEMS:

■ Different sizes
■ NOT invariant to node ordering

The same graph under two node orderings, 𝙂 = 𝙂’, can have different adjacency matrices: Adj(𝙂) ≠ Adj(𝙂’)
COMPUTATION GRAPH
The neighbourhood of a node defines its computation graph.

[Figure: INPUT GRAPH and the corresponding COMPUTATION GRAPH]


[Figure: the neighbour features X_A, X_B, X_E are processed by neural networks and combined with an ordering-invariant aggregation, e.g. sum or average]
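A minimal sketch (not from the slides) of why sum and mean aggregations are ordering invariant: shuffling the neighbour features does not change the aggregated result.

import torch

# hypothetical neighbour features X_A, X_B, X_E (3 neighbours, 4 features each)
x = torch.tensor([[1., 0., 2., 1.],
                  [0., 3., 1., 1.],
                  [2., 1., 0., 5.]])

perm = torch.randperm(x.size(0))   # a random reordering of the neighbours
x_shuffled = x[perm]

# sum and mean give the same result regardless of the neighbour ordering
assert torch.allclose(x.sum(dim=0), x_shuffled.sum(dim=0))
assert torch.allclose(x.mean(dim=0), x_shuffled.mean(dim=0))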
02 Introduction

How important are the features of node “c” to node “i”?

Can we learn such importance automatically?

YES, with GAT


03 Graph Attention Networks (GAT)
Petar Veličković, Senior Research Scientist at DeepMind
03 Graph Attention layer

INPUT: a set of node features

OUTPUT: a new set of node features
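In the notation of the GAT paper (Veličković et al., 2018), the layer maps N input feature vectors of size F to N output feature vectors of size F':

\mathbf{h} = \{\vec{h}_1, \ldots, \vec{h}_N\}, \ \vec{h}_i \in \mathbb{R}^{F} \;\longrightarrow\; \mathbf{h}' = \{\vec{h}'_1, \ldots, \vec{h}'_N\}, \ \vec{h}'_i \in \mathbb{R}^{F'}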


1) Apply a parameterized linear transformation to every node
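In the GAT paper this is a shared weight matrix applied to every node:

W\vec{h}_i, \qquad W \in \mathbb{R}^{F' \times F}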


2) Self-attention

Specify the importance of node j’s features to node i.
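In the GAT paper this importance is an unnormalized attention coefficient computed for every edge (i, j):

e_{ij} = a(W\vec{h}_i, W\vec{h}_j), \qquad a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \rightarrow \mathbb{R}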


3) Normalization
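The coefficients are normalized with a softmax across each neighbourhood, so that for every node i they sum to 1:

\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}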
4) Attention mechanism

The attention mechanism a is a single-layer feedforward neural network, with a LeakyReLU non-linearity: max(0.2x, x).
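In the GAT paper the mechanism is parameterized by a weight vector \vec{a} \in \mathbb{R}^{2F'} applied to the concatenation of the two transformed feature vectors, followed by the LeakyReLU above:

e_{ij} = \mathrm{LeakyReLU}\left( \vec{a}^{\,T} [\, W\vec{h}_i \,\Vert\, W\vec{h}_j \,] \right)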
5) Use it :)
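The normalized coefficients weight the transformed features of the neighbours and produce the output features of the layer (σ is a non-linearity):

\vec{h}'_i = \sigma\left( \sum_{j \in \mathcal{N}_i} \alpha_{ij} W \vec{h}_j \right)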
6) Multi-head attention

The outputs of the attention heads are combined by:
● Concatenation
● Average, on the final (prediction) layer of the network
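With K independent attention heads, the GAT paper combines the head outputs by concatenation in hidden layers and by averaging in the final layer:

\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\left( \sum_{j \in \mathcal{N}_i} \alpha^{k}_{ij} W^{k} \vec{h}_j \right) \qquad \text{(concatenation)}

\vec{h}'_i = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha^{k}_{ij} W^{k} \vec{h}_j \right) \qquad \text{(average)}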
04 Pros of GAT
● Computationally efficient:
  - self-attention layers can be parallelized across edges
  - output features can be parallelized across nodes

● Allows assigning different importance to nodes of the same neighborhood

● It is applied in a shared manner to all edges of the graph, so the entire graph is not required

● Works in both:
  - transductive learning (Cora, Citeseer, Pubmed)
  - inductive learning (PPI)
05 Message passing implementation

The feature representation of node i at the k-th layer is computed from:
● the feature representation of node i at the (k-1)-th layer
● the feature representation of node j at the (k-1)-th layer, for every j in the neighbourhood of i
● [optionally] the features of the edge (i, j)

These inputs are combined by:
● a differentiable message function (e.g. an MLP)
● a differentiable, ordering-invariant aggregation function (e.g. sum, average) over the neighbourhood of i
● a differentiable update function (e.g. an MLP)
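In the notation used by the PyTorch Geometric documentation (γ is the update function, ϕ the message function, □ the ordering-invariant aggregation), this corresponds to:

\mathbf{x}_i^{(k)} = \gamma^{(k)}\left( \mathbf{x}_i^{(k-1)},\; \square_{j \in \mathcal{N}(i)}\, \phi^{(k)}\left( \mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}, \mathbf{e}_{j,i} \right) \right)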
05 Message passing implementation
PyTorch Geometric provides the MessagePassing base class.

● message(): the differentiable message function ϕΘ (e.g. an MLP)
● aggregation: the ordering-invariant aggregation (sum, mean, max)
● update(): the differentiable update function γΘ (e.g. an MLP)
PARAMETERS
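As a hedged sketch (the parameter names below come from the PyTorch Geometric API, not from the slide), the MessagePassing constructor accepts, among others, the aggregation scheme and the message-flow direction:

from torch_geometric.nn import MessagePassing

class MyLayer(MessagePassing):
    def __init__(self):
        # aggr: aggregation scheme, e.g. "add", "mean" or "max"
        # flow: message direction, "source_to_target" (default) or "target_to_source"
        super().__init__(aggr="add", flow="source_to_target")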
METHODS

● aggregate(): aggregates the messages from the neighbors (sum, mean, max)
● message(): constructs the messages from node j to node i, in analogy to ϕΘ
● propagate(): propagates the messages
● update(): updates the node embeddings, in analogy to γΘ
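A minimal skeleton (a sketch, not the code shown on the slides) of how these methods fit together in a custom layer:

from torch_geometric.nn import MessagePassing

class MyConv(MessagePassing):
    def __init__(self):
        super().__init__(aggr='mean')             # ordering-invariant aggregation

    def forward(self, x, edge_index):
        # x: [N, F] node features, edge_index: [2, E] edge list
        return self.propagate(edge_index, x=x)    # calls message -> aggregate -> update

    def message(self, x_j):
        # x_j: features of the source node of every edge (the ϕΘ part)
        return x_j

    def update(self, aggr_out):
        # aggr_out: aggregated messages per node (the γΘ part)
        return aggr_out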
HOW TO USE IT?

● Layer name: GCNConv inherits from MessagePassing
● Initialize the class: call super(), specifying your aggregation (add, max, mean)
● Forward and propagate
● Compute the message
06 Implement our GCNConv
Simple example

In steps:
1. Add self-loops
2. Apply a linear transformation to the node feature matrix
3. Compute the normalization coefficients
4. Normalize the node features
5. Sum up the neighboring node features

Steps 1-3 are done in the forward() method, step 4 in the message() method, and step 5 is handled internally by the aggregation.
06 Implement our GCNConv
GCNConv inherits from MessagePassing and implements the five steps above (a code sketch follows below).
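A hedged sketch of the full layer, closely following the GCNConv example in the PyTorch Geometric documentation (exact signatures can differ between PyG versions):

import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree

class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')                  # step 5: sum aggregation
        self.lin = torch.nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, x, edge_index):
        # x: [N, in_channels], edge_index: [2, E]

        # Step 1: add self-loops to the adjacency matrix
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: linearly transform the node feature matrix
        x = self.lin(x)

        # Step 3: compute the normalization coefficients 1/sqrt(deg(i) * deg(j))
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

        # Steps 4-5: propagate() calls message(), aggregates and updates
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # Step 4: normalize the neighbouring node features
        return norm.view(-1, 1) * x_j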


07 GAT implementation

Jupyter Notebook
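A hedged usage sketch of PyTorch Geometric's built-in GATConv layer, which implements the attention layer described above, including multi-head attention (the class name and layer sizes below are illustrative, not from the notebook):

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, heads=8):
        super().__init__()
        # hidden layer: multi-head attention, head outputs are concatenated
        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads)
        # prediction layer: head outputs are averaged (concat=False)
        self.conv2 = GATConv(hidden_channels * heads, out_channels, heads=1, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)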
