Unit-II Node Embeddings
• Node Embeddings
• Each node in a graph is encoded as a low-dimensional vector.
• The vector captures the node's graph position and local neighborhood
structure.
• Nodes that are close in the graph should have embeddings that are close in
the latent space.
• Latent Space Representation
• Instead of working with an entire adjacency matrix (which is sparse and high-
dimensional), embeddings place nodes in a continuous vector space.
• The relative distances between embeddings should reflect the relationships in
the original graph (e.g., edges).
3.1 An Encoder-Decoder Perspective
• The Encoder-Decoder framework is a central idea in graph
representation learning.
• It provides a structured way to approach the problem of encoding
node information and reconstructing graph relationships based on
these encodings.
• This perspective is particularly useful for understanding node
embeddings and is crucial for learning accurate representations of
nodes in a graph.
3.1.1 The Encoder
• Encoder:
The encoder's job is to transform each node in the graph into a low-dimensional vector (embedding). This mapping is learned so that similar nodes in the graph are represented by similar vectors in the embedding space.
• Shallow Embedding Approach:
In many methods, the encoder is a shallow function that simply looks up a
pre-learned vector for each node. This can be seen as a simple mapping
from node IDs to embeddings.
• Key Idea:
• The encoder itself does not use the graph structure explicitly; it relies on training so that the embeddings, once learned, capture the relevant information for each node. This approach is used in basic node embedding methods like DeepWalk or Node2Vec.
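• In the shallow setting, the encoder is simply an embedding lookup; a compact way to write this, with Z denoting the learnable embedding matrix, is:

  ENC(u) = Z[u],   Z \in \mathbb{R}^{|V| \times d}

  where Z[u] is the row of Z assigned to node u and d is the embedding dimension.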
3.1.2 The Decoder
• Purpose of the Decoder
• Given a node embedding z_u for a node u, the decoder attempts to predict certain characteristics of the graph related to that node, such as the node's set of neighbors or its row in the adjacency matrix.
• Pairwise Decoders
• The most common type of decoder used in node embedding models is the
pairwise decoder. This decoder takes in a pair of node embeddings and
predicts their relationship or similarity based on the graph structure.
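• Formally, a pairwise decoder maps a pair of embeddings to a real-valued score, and training pushes this score toward a chosen similarity measure S[u, v]:

  DEC : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^{+},   DEC(z_u, z_v) \approx S[u, v]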
• The loss function ensures that similar nodes stay close in the embedding
space:
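• One standard form of this loss, used by distance-based (Laplacian-eigenmaps-style) methods, weights the squared embedding distance by the similarity of the pair, so that highly similar nodes are penalized most for being far apart:

  L = \sum_{(u,v) \in D} DEC(z_u, z_v) \cdot S[u, v],   with   DEC(z_u, z_v) = \|z_u - z_v\|_2^2

  where D is the set of training node pairs.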
• Step 5: Interpretation
• Each row of Z represents a 2D embedding for a node.
• Similar nodes (e.g., A and C, B and D) have closer embeddings.
• The embeddings preserve local graph structure by minimizing differences between connected nodes.
3. Inner-Product Based Methods
• A more recent approach replaces L2 distance with an inner-product
decoder
• This assumes that node similarity (e.g., neighborhood overlap) is
proportional to the dot product of their embeddings.
• Some methods using this approach:
• Graph Factorization (GF): Uses the adjacency matrix (S = A) as the similarity measure.
• GraRep: Uses powers of the adjacency matrix to capture long-range connections.
• HOPE: Uses more general node similarity measures. The loss function minimizes the difference between predicted and actual similarities.
• These methods can be solved using Singular Value Decomposition (SVD), reducing
the problem to matrix factorization:
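• Written out, the objective compares the decoded similarity with the target similarity for each training pair, which is equivalent to a low-rank factorization of the similarity matrix S that a truncated SVD can (approximately) solve:

  L = \sum_{(u,v) \in D} \| z_u^{\top} z_v - S[u, v] \|_2^2 \;\approx\; \| Z Z^{\top} - S \|_2^2

  where the rows of Z are the node embeddings.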
• Step 1: Define the Adjacency Matrix A
• Step 2: Define the Inner-Product Decoder
• Inner-product based methods use the formula given below.
• This means that the similarity between nodes u and v is computed by taking the dot product of their embedding vectors.
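• The formula referred to above is the inner-product decoder:

  DEC(z_u, z_v) = z_u^{\top} z_v \approx S[u, v]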
• Step 3: Learn Node Embeddings
• Let's assume we have 2D embeddings for each node:
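• As a minimal sketch (all values below are hypothetical), assume a 4-node cycle graph A-B-C-D-A and illustrative 2D embeddings; the NumPy code computes the inner-product reconstruction and a squared-error loss against the adjacency matrix:

import numpy as np

# Hypothetical adjacency matrix for a 4-node cycle A-B-C-D-A (illustrative only).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Hypothetical 2D embeddings, one row per node (A, B, C, D).
# A and C share all their neighbors, so they get identical embeddings; likewise B and D.
Z = np.array([[1.0, 0.2],
              [0.2, 1.0],
              [1.0, 0.2],
              [0.2, 1.0]])

# Inner-product decoder: predicted similarity for every node pair.
S_pred = Z @ Z.T

# Reconstruction loss: squared difference between predicted and true similarities (S = A).
loss = np.sum((S_pred - A) ** 2)
print(S_pred)
print("reconstruction loss:", loss)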
• Second-order Proximity: Nodes with many common neighbors should have similar embeddings.
• Optimized using the Kullback-Leibler (KL) divergence to approximate second-order relationships.
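• One standard formulation of second-order proximity defines, for each node u, a softmax distribution over candidate neighbors and minimizes the KL divergence between it and the empirical neighbor distribution observed in the graph (some methods use a separate "context" embedding for the neighbor role; plain node embeddings are used here for simplicity):

  p(v | u) = \frac{\exp(z_v^{\top} z_u)}{\sum_{w \in V} \exp(z_w^{\top} z_u)},   L = \sum_{u \in V} KL\big(\hat{p}(\cdot | u) \,\|\, p(\cdot | u)\big)

  where \hat{p}(v | u) is the empirical distribution given by the graph (e.g., normalized edge weights).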
Example
• The length of random walks (T) controls the influence of different eigenvalues, linking
it to spectral clustering methods.
• Spectral clustering is a technique that uses graph theory and eigenvalues to cluster data points.
Multi-relational Data and
Knowledge Graphs
1. Knowledge Graph Completion
• Knowledge Graph Completion (KGC) is the process of predicting missing edges in a multi-relational graph, typically referred to as a knowledge graph (KG). A knowledge graph consists of nodes representing entities and typed edges representing relations (facts) between them.
Definition of Multi-Relational Graphs
• A multi-relational graph is a type of graph where edges have specific types (relations). In the notation used here, an edge is written as a tuple (u, τ, v), where τ denotes the relation type connecting nodes u and v.
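• The relational structure is often summarized in an adjacency tensor (used again later in this unit), defined as:

  A \in \mathbb{R}^{|V| \times |R| \times |V|},   A[u, \tau, v] = 1 if (u, \tau, v) \in E, and 0 otherwise

  where R is the set of relation types.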
3. Applications of Knowledge Graph Completion
• The primary application of KGC is relation prediction, where missing
relationships in the graph are inferred.
• Other tasks include:
• Node classification: Assigning labels to entities based on relational data (e.g.,
classifying drugs as "antibiotics" or "painkillers").
• Link prediction: Predicting new relationships between existing entities.
• A well-known reference for node classification using knowledge graphs is
Schlichtkrull et al., 2017.
4.1 Reconstructing Multi-Relational
Data
• In a multi-relational graph, nodes are connected by different types of
edges (relationships).
• An important task is to embed these nodes into a low-dimensional space and reconstruct their relationships.
• Unlike simple graphs, where we only consider node pairs, multi-
relational graphs require us to consider edge types as well.
Decoder Function for Multi-Relational Graphs
• The decoder function DEC(u, τ, v) computes the likelihood that an
edge (u, v) of type τ exists.
• One early approach is RESCAL, which represents each relation using a learnable matrix.
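• In the RESCAL formulation, each relation type \tau has its own learnable matrix, and the decoder is a bilinear score:

  DEC(u, \tau, v) = z_u^{\top} R_{\tau} z_v,   R_{\tau} \in \mathbb{R}^{d \times d}

  giving the (unnormalized) likelihood that an edge (u, v) of type \tau exists.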
Decoders, Similarity Functions, and
Loss Functions
• Decoder (DEC): Computes a score between node embeddings to
estimate their relationship.
• Similarity Function (S[u, v]): Defines what type of node-node similarity
is being decoded.
• Loss Function (L): Measures the discrepancy between the decoder’s
output and the actual similarity measure.
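• These three components combine into a single training objective: the decoder's output is compared with the target similarity by the loss, summed over the training pairs:

  L = \sum_{(u,v) \in D} \ell\big(DEC(z_u, z_v), S[u, v]\big)

  where D is the set of training node pairs and \ell is the per-pair loss.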
Challenges in Multi-Relational
Graphs
• Most multi-relational embedding methods focus on reconstructing
immediate neighbors because defining higher-order relationships is
difficult.
• The adjacency tensor (a multi-relational extension of an adjacency matrix)
is often used as the similarity function.
• A naive reconstruction loss (such as mean-squared error over the full adjacency tensor) is impractical because:
• It is computationally expensive, since it requires scoring every possible (node, relation, node) triple.
• Instead, margin-based (max-margin) losses with negative sampling are commonly used: the loss is zero if the true edge's score is higher than a corrupted (negative) edge's score by at least a margin Δ; otherwise, the model is penalized.
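• The margin-based loss described above can be written as follows (the notation P_{u,\tau} is used here simply to denote the sampled negative nodes for the pair (u, \tau)):

  L = \sum_{(u,\tau,v) \in E} \; \sum_{v_n \in P_{u,\tau}} \max\big(0, \; \Delta - DEC(u, \tau, v) + DEC(u, \tau, v_n)\big)

  The loss is zero whenever the true edge's score exceeds the corrupted edge's score by at least \Delta.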