GNNs
• Conclusions
GNNs: Input, output & goals
• INPUT: GNNs take as input a graph with nodes (examples) and edges.
• OUTPUT & GOALS: GNNs embed (encode) examples as vectors in vector space such that similar or related examples tend to cluster. This enables us to classify unlabeled examples or predict links.
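A minimal toy illustration of this input/output contract (hypothetical data, no particular GNN library assumed): the input is a node-feature matrix X plus an adjacency matrix A, and the output is one embedding vector per node.

```python
import numpy as np

# Input graph: 3 nodes with 2-D features, edges 0-1 and 1-2.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# One (untrained) mean-aggregation layer mapping 2-D features to 4-D
# embeddings; a real GNN stacks several such layers and learns W.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
A_hat = A / A.sum(axis=1, keepdims=True)   # row-normalized adjacency
Z = np.tanh((X + A_hat @ X) @ W)           # output: (3 nodes) x (4-D embedding)
print(Z.shape)                             # -> (3, 4)
```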
GNNs: The Intuition
• A node (example) “asks” its immediate neighbors about their features and becomes more like them in vector space.
• This is iterated layer-wise, incorporating information passed to the (“evolving”) target node from neighbor nodes residing at a progressively increasing number of hops away.
• Locations of nodes in vector space are updated in this manner layer by layer.
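A per-layer update consistent with the symbol definitions below (a plausible reconstruction written as a mean-aggregation layer; the slide's original formula may differ in details such as the choice of nonlinearity σ):

$$h_t^{(l+1)} = \sigma\!\left( W_t^{(l)}\, h_t^{(l)} \;+\; W_N^{(l)}\,\frac{1}{|N|}\sum_{n \in N} h_n^{(l)} \right)$$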
where h_t^(l+1) is the vector representation of the target node (t) in layer l+1; h_t^(l) is the vector representation of the target node in layer l; N is the set of neighbor nodes in layer l with cardinality |N|; n is an element of N with vector representation h_n^(l) in layer l; and W_N^(l) and W_t^(l) are learned layer-specific weight matrices.
How does learning take place in this message passing GNN (I)?
• “Similar” nodes in the original graph should have similar (final) embeddings (z) in
the vector space after all layer-wise transformations in the GNN.
• Similarity in the embedding space is usually assessed with the dot product (or its normalized form, cosine similarity)
• Similarity of nodes in an original graph may be defined in several ways, such as:
• Node u has a high likelihood of being visited during a random walk starting at node v (this can be a truly random walk, or a biased walk geared toward preferential capture of either local or global aspects of network topology)
$$\mathrm{Loss} = \sum_{u,v} \mathrm{CE}\!\left( y_{u,v},\; z_u \cdot z_v \right)$$
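A short sketch of how this pairwise loss can be computed (assumed PyTorch; the binary labels y[u, v] might encode, e.g., whether node v is reached by a random walk starting at u, and z holds the final node embeddings):

```python
import torch
import torch.nn.functional as F

def pairwise_embedding_loss(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Sum over node pairs of CE(y_uv, z_u . z_v), treating the dot
    product as a logit for binary cross-entropy."""
    logits = z @ z.T  # z_u . z_v for every pair (u, v)
    return F.binary_cross_entropy_with_logits(logits, y.float(), reduction="sum")

# Illustration with random data only.
z = torch.randn(5, 16, requires_grad=True)   # 5 nodes, 16-D embeddings
y = torch.randint(0, 2, (5, 5))              # binary pairwise labels
loss = pairwise_embedding_loss(z, y)
loss.backward()                              # gradients flow back to the embeddings
```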
• Ai et al. (2024) have proposed melding node-level and higher-level (subgraph) structural information to produce a more comprehensive representation. Subgraphs are represented as “super-nodes.”
• Typically, the number of labeled nodes used to train a GNN is far smaller than the
number of unlabeled nodes whose labels one wishes to predict (overfitting risk),
and the distribution of labeled training nodes and test nodes may be different (e.g.,
different degrees, dissimilar proportions of neighbors with different labels),
exacerbating the risk of poor prediction generalization
• Fan et al. (2024) propose a new variable-decorrelation regularizer to mitigate the effect of this distribution shift while maintaining sample size (this complex approach is covered in the supplementary slides of the regularization module).
GNNs: Caveats, insights and innovations (II)
• GNNs may promote “over-homogenization” of nodes
(oversmoothing), especially with increasing depth of the network.
• This makes intuitive sense as GNNs induce target nodes to become more
like their neighbors – including, potentially, nodes belonging to other
classes. The oversmoothing issue is aggravated by more layers, which
allow passing of messages from more distant neighbors.
• This can compromise classification performance by GNNs. Inter-class
edges in the graph fed to a GCN are thought to shoulder blame here.
• Wang et al. (2024) proposed “GUIded Dropout over Edges” (GUIDE) to mitigate this problem. Edge strength (tied to the number of times an edge lies on the shortest paths between all node pairs) is used as a proxy for “inter-class” edges, which are preferentially removed.
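The core idea can be sketched with standard graph tooling (an illustration of the betweenness-based heuristic only, not Wang et al.'s exact GUIDE procedure):

```python
import networkx as nx

def drop_high_betweenness_edges(G: nx.Graph, drop_fraction: float = 0.1) -> nx.Graph:
    """Remove the top `drop_fraction` of edges ranked by edge betweenness
    (how often an edge lies on shortest paths between node pairs), used
    here as a proxy for likely inter-class edges."""
    scores = nx.edge_betweenness_centrality(G)              # {(u, v): score}
    ranked = sorted(scores, key=scores.get, reverse=True)   # highest first
    H = G.copy()
    H.remove_edges_from(ranked[: int(len(ranked) * drop_fraction)])
    return H

# Example: prune a small random graph before feeding it to a GCN.
G = nx.erdos_renyi_graph(50, 0.1, seed=0)
G_pruned = drop_high_betweenness_edges(G, drop_fraction=0.2)
```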
GUIDE in Graph-based semi-supervised learning (GSSL) – Wang et al., 2024
• A bipartite graph, with nodes consisting of patients and chronic diseases, was used to
create a patient network with weights reflecting number of shared diseases. This was fed
to a GNN, with patient features for each node, to make chronic disease predictions (Lu
and Uddin, 2021 – see https://pubmed.ncbi.nlm.nih.gov/34799627/ and supplementary
slide)
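A minimal sketch of building such a patient–patient graph from a bipartite patient–disease edge list (toy data; edge weights count shared diseases, in the spirit of, but not reproducing, Lu and Uddin's pipeline):

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite graph: patients on one side, chronic diseases on the other.
pairs = [("p1", "diabetes"), ("p1", "hypertension"),
         ("p2", "hypertension"), ("p2", "asthma"),
         ("p3", "diabetes"), ("p3", "hypertension")]
patients = {p for p, _ in pairs}
B = nx.Graph()
B.add_nodes_from(patients, bipartite=0)
B.add_nodes_from({d for _, d in pairs}, bipartite=1)
B.add_edges_from(pairs)

# Project onto patients: edge weight = number of diseases two patients share.
P = bipartite.weighted_projected_graph(B, patients)
print(list(P.edges(data=True)))   # e.g., ('p1', 'p3', {'weight': 2})
```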
• Over 39K compounds, with 20% held out for testing of the ensemble after it was trained on all training data
• Predictions then made for over 12 million compounds
*See Yang et al. for details on bond-based message passing (https://pubs.acs.org/doi/10.1021/acs.jcim.9b00237). Note that atom-based features are concatenated with bond features prior to message passing. Skip connections with the original feature vector are used. At the end, Yang et al. return to an atom-based representation by summing incoming bond messages and concatenating with atomic features. Images from Yang et al.
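A heavily simplified numpy sketch of directed, bond-based message passing with a skip connection and the atom-level readout described above (illustrative shapes and random weights only; not the actual Chemprop/Yang et al. implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecule: 3 atoms in a chain; each bond stored as two directed edges.
atom_feats = rng.normal(size=(3, 4))            # per-atom features
bonds = [(0, 1), (1, 0), (1, 2), (2, 1)]        # directed bonds (u -> v)
bond_feats = rng.normal(size=(len(bonds), 3))   # per-bond features

d_hidden = 8
W_in = rng.normal(size=(4 + 3, d_hidden))       # atom || bond -> hidden
W_msg = rng.normal(size=(d_hidden, d_hidden))

# Initial directed-bond states: concatenate source-atom and bond features.
src = [u for u, _ in bonds]
h0 = np.tanh(np.concatenate([atom_feats[src], bond_feats], axis=1) @ W_in)
h = h0.copy()

for _ in range(3):                               # message-passing iterations
    m = np.zeros_like(h)
    for i, (u, v) in enumerate(bonds):
        # Aggregate states of incoming bonds w -> u, excluding the reverse bond v -> u.
        for j, (w, x) in enumerate(bonds):
            if x == u and w != v:
                m[i] += h[j]
    h = np.tanh(h0 + m @ W_msg)                  # skip connection to the initial bond state

# Readout: sum incoming bond messages at each atom, concatenate with atom features.
atom_msg = np.zeros((3, d_hidden))
for i, (_, v) in enumerate(bonds):
    atom_msg[v] += h[i]
atom_repr = np.concatenate([atom_feats, atom_msg], axis=1)
mol_repr = atom_repr.sum(axis=0)                 # molecule-level representation
```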
Case Study (continued)
• Wong and colleagues further filtered hits, removing those thought to
have unfavorable medicinal chemistry properties
• For molecules with high predicted activity, a Monte Carlo tree search was used to identify subgraphs (i.e., a portion of the molecule, referred to in the paper as a “rationale”) thought to be responsible for activity
• Authors found that GCNs fed sparse graphs more precisely identified patients with similar
survival times compared with denser graphs, but since sparse graphs may miss neighbors,
the team “knitted together” multiple sparse continuous k-nearest neighbor graphs to improve
sensitivity
• Two-layer GCN with final fully connected layer outputting risk scores
• After training, create new graph incorporating an unlabeled example, update degree and
adjacency matrices, and use the following to obtain the risk score for the unlabeled example:
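One plausible form of this inference step, following the standard two-layer GCN propagation rule (the slide's exact expression is not reproduced here; the symbols $\hat{A}$, $W^{(0)}$, $W^{(1)}$ and the final fully connected weights $\mathbf{w}_{\mathrm{fc}}$ are notational assumptions), with $\tilde{A} = A + I$ the updated adjacency matrix including the new node and $\tilde{D}$ its degree matrix:

$$\hat{A} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}, \qquad \mathbf{r} = \hat{A}\,\mathrm{ReLU}\!\big(\hat{A}\,X\,W^{(0)}\big)\,W^{(1)}\,\mathbf{w}_{\mathrm{fc}}$$

where the row of $\mathbf{r}$ corresponding to the unlabeled example gives its risk score.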
Weaving together sparse graphs.
Figure from Ling, Liu and Xue, 2024. Random subsets of features are used to create the sparse graphs. The alignment of the GCN-generated output with survival time is assessed for each. Start with the sparse graph that performs best. Sequentially add sparse graphs until the composite graph, when fed to the GCN, no longer generates improved alignment as measured by Harrell’s concordance index (CI).
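A schematic sketch of this greedy “weaving” loop (assumed structure only; `fit_gcn_and_score` is a hypothetical stand-in for training the GCN on a composite graph and returning Harrell's C-index on validation data):

```python
import numpy as np

def fit_gcn_and_score(adjacency: np.ndarray) -> float:
    """Placeholder: train the GCN on `adjacency` and return Harrell's C-index."""
    return float(np.random.default_rng(int(adjacency.sum())).uniform(0.5, 0.8))

def weave_sparse_graphs(sparse_graphs: list) -> np.ndarray:
    # Start with the single sparse graph whose GCN output aligns best with survival time.
    scores = [fit_gcn_and_score(g) for g in sparse_graphs]
    order = np.argsort(scores)[::-1]
    composite, best = sparse_graphs[order[0]].copy(), scores[order[0]]
    # Sequentially add the remaining sparse graphs; stop once the composite
    # graph no longer improves the C-index.
    for idx in order[1:]:
        candidate = np.maximum(composite, sparse_graphs[idx])   # union of edges
        score = fit_gcn_and_score(candidate)
        if score > best:
            composite, best = candidate, score
        else:
            break
    return composite

# Example with toy 0/1 adjacency matrices for 10 patients.
rng = np.random.default_rng(0)
graphs = [(rng.random((10, 10)) < 0.1).astype(float) for _ in range(5)]
woven = weave_sparse_graphs(graphs)
```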
Results of the weaving exercise
• 90%-10% train-test split
• Ten-fold cross-validation using
training set
• Grid search for hyperparameter
optimization
• Adam optimizer
• Dropout 0.1
• Early stopping to avoid overfitting
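An illustrative training configuration mirroring the Adam/dropout/early-stopping bullets above (assumed PyTorch with toy stand-in data; the authors' actual code, model, and hyperparameter grid are not shown in the slides):

```python
import torch

# Toy stand-in data: 100 samples, 16 features, continuous target.
X, y = torch.randn(100, 16), torch.randn(100, 1)
X_train, y_train, X_val, y_val = X[:90], y[:90], X[90:], y[90:]   # 90%-10% split

model = torch.nn.Sequential(               # stand-in for the 2-layer GCN + FC head
    torch.nn.Linear(16, 32), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.1),               # dropout 0.1
    torch.nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # Adam optimizer
loss_fn = torch.nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()
    # Early stopping: halt once validation loss stops improving.
    if val_loss < best_val - 1e-4:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```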
Figure and table from Ling, Liu and Xue, 2024. Tabular data
from cross-validation. AGGSurv is the test method.
Rankings in figure presumably represent test set data (not
specified in paper).