Enhancing Graph Neural Networks With Limited Labeled Data (Pseudo Labeling)
Abstract—Graphs are pervasive in the real world, with applications such as social network analysis, bioinformatics, and knowledge graphs. Graph neural networks (GNNs) have great ability in node classification, a fundamental task on graphs. Unfortunately, conventional GNNs still face challenges in scenarios with few labeled nodes, despite the prevalence of few-shot node classification tasks in real-world applications. To address this challenge, various approaches have been proposed, including graph meta-learning, transfer learning, and methods based on Large Language Models (LLMs). However, traditional meta-learning and transfer learning methods often require prior knowledge from base classes or fail to exploit the potential advantages of unlabeled nodes. Meanwhile, LLM-based methods may overlook the zero-shot capabilities of LLMs and rely heavily on the quality of generated contexts. In this paper, we propose a novel approach that integrates LLMs and GNNs, leveraging the zero-shot inference and reasoning capabilities of LLMs and employing a Graph-LLM-based active learning paradigm to enhance GNNs' performance. Extensive experiments demonstrate the effectiveness of our model in improving node classification accuracy with considerably limited labeled data, surpassing state-of-the-art baselines by significant margins.

Index Terms—Graph Neural Networks, Large Language Model, Active Learning, Pseudo Labeling

I. INTRODUCTION

Graphs have become increasingly recognized as one of the most powerful data structures for real-world content analysis [1]–[3]. They are adept at representing complex relationships and uncovering hidden information between objects across various domains. Among various tasks on graphs, node classification stands out as a classic task with broad applications, such as sentiment analysis [4] and user attribute inference [5]. Recently, graph neural networks (GNNs) [6]–[8] have shown great power in node classification. Generally, GNNs adopt the message-passing mechanism, which updates a node's representation by aggregating its neighbors' information, facilitating the implicit propagation of information from labeled nodes to unlabeled ones. This strategy has substantially enhanced performance across various benchmark datasets [9].
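As a brief illustration, a single message-passing layer can be written generically as

h_v^{(l+1)} = \sigma( W^{(l)} \cdot \mathrm{AGG}( \{ h_u^{(l)} : u \in \mathcal{N}(v) \cup \{v\} \} ) ),

where h_v^{(l)} is node v's representation at layer l, \mathcal{N}(v) is its neighbor set, AGG is a permutation-invariant aggregator such as mean or sum, and \sigma is a nonlinearity (this is a generic schematic of our own, not the formulation of any specific architecture discussed here).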
Despite the great success of GNNs in node representation learning and node classification, they often struggle to generalize effectively when labeled data is scarce. However, in many real-world applications, due to various reasons such as labeling cost and privacy issues, one often needs to train a GNN classifier with sparse labels, which is known as few-shot node classification. For example, labeling a large number of web documents can be both costly and time-consuming [10], [11]; similarly, in social networks, privacy concerns limit access to personal information, leading to a scarcity of attribute labels [5]. Consequently, when confronted with such datasets, GNNs may exhibit poor generalization to unlabeled nodes. To tackle the few-shot learning problem, various methods have been proposed, such as meta-learning [12]–[14], transfer learning [15], [16], and adversarial reprogramming [17]. However, they still require a substantial number of labeled nodes in each class to achieve satisfactory results [18] or require auxiliary labeled data to provide supervision.

Recently, Large Language Models (LLMs) have demonstrated outstanding generalizability in zero-shot learning and reasoning [19], [20]. Several efforts have been made to introduce LLMs into graph learning, such as pre-processing textual node attributes or taking textual descriptions of rationales as inputs [21], [22], leveraging LLMs to construct graph structure [23], [24], and generating new nodes [25]. For example, Chen et al. [19] first leveraged LLMs as annotators to provide more supervision for graph learning. Yu et al. [25] leveraged the generative capability of LLMs to address the few-shot node classification problem. These works demonstrate that LLMs can enhance GNNs from different perspectives. However, they typically treat LLMs merely as annotators or generators for node classification tasks, overlooking their untapped potential, such as the capacity to uncover hidden insights within the results and their zero-shot reasoning ability, which could significantly enhance GNNs' performance on few-shot learning tasks.

In this paper, we introduce a novel few-shot node classification model that enhances GNNs' capabilities by actively distilling knowledge from LLMs. Unlike previous approaches, our model uses LLMs as "teachers", capitalizing on their zero-shot inference and reasoning capabilities to bolster the performance of GNNs in few-shot learning scenarios. However, there are two primary challenges: (i) LLMs cannot consistently
deliver accurate predictions for all nodes, raising the question of how to select the nodes for which LLMs can provide high-quality labels that benefit the GNN most; and (ii) how to effectively distill the knowledge from LLMs to GNNs. To address these challenges, we propose an active learning-based knowledge distillation strategy that selects valuable nodes for LLMs and bridges the gap between LLMs and GNNs. This approach significantly enhances the efficacy of GNNs when labeled data is scarce. We first explore the metrics that affect the correctness of LLMs' predictions. Then, we employ LLMs as a teacher model, applying them to the limited training data to generate soft labels for the training nodes along with logits and rationales. These outputs are used to supervise GNNs from two perspectives: the probability distribution and feature enhancement at the embedding level. In this way, GNNs can learn hidden information from unlabeled nodes and the detailed explanations provided by LLMs. Furthermore, we introduce a novel Graph-LLM-based active learning approach to establish a connection between LLMs and GNNs, which effectively selects nodes for which GNNs fail to provide accurate pseudo-labels but LLMs can offer reliable ones, thereby enabling GNNs to leverage the zero-shot capabilities of LLMs and enhance their performance with limited data. Afterward, the selected pseudo-labels are merged with the true labels to train the final few-shot node classification model.
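To make these two supervision signals concrete, the following is a minimal sketch of one plausible form of the combined objective (our own simplified PyTorch illustration, not the exact loss of our implementation; the temperature T and the weights lambda_kd and lambda_emb are illustrative assumptions):

import torch
import torch.nn.functional as F

def distillation_loss(gnn_logits, llm_logits, gnn_emb, rationale_emb,
                      T=2.0, lambda_kd=0.5, lambda_emb=0.1):
    # Perspective 1 (probability distribution): KL divergence between the
    # GNN's temperature-softened predictions and the LLM's soft labels,
    # scaled by T^2 as in standard knowledge distillation.
    kd = F.kl_div(F.log_softmax(gnn_logits / T, dim=-1),
                  F.softmax(llm_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    # Perspective 2 (feature enhancement): pull GNN node embeddings toward
    # embeddings of the LLM-generated rationales.
    emb = 1.0 - F.cosine_similarity(gnn_emb, rationale_emb, dim=-1).mean()
    return lambda_kd * kd + lambda_emb * emb

# Toy usage: 4 training nodes, 3 classes, 16-dimensional embeddings.
loss = distillation_loss(torch.randn(4, 3), torch.randn(4, 3),
                         torch.randn(4, 16), torch.randn(4, 16))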
Our major contributions are:
• We innovate a semi-supervised learning model by distilling knowledge from Large Language Models and leveraging the enhanced rationales provided by Large Language Models to help GNNs improve their performance.
• We design and implement a Graph-LLM-based active learning paradigm to enhance the performance of GNNs. This is achieved by identifying nodes for which GNNs struggle to generate reliable pseudo-labels, yet LLMs can provide dependable predictions, thereby leveraging the zero-shot ability of LLMs (see the sketch after this list).
• Extensive experiments on various benchmark datasets demonstrate the effectiveness of our proposed framework for node classification tasks with limited labeled data.
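The node-selection rule referenced in the second contribution can be sketched as follows (an illustrative simplification of our own; using the maximum softmax probability as the confidence measure, a precomputed per-node LLM confidence llm_conf, and a fixed selection budget are assumptions, not our exact criterion):

import torch
import torch.nn.functional as F

def select_nodes_for_llm(gnn_logits, llm_conf, unlabeled_idx, budget=20):
    # GNN confidence: maximum softmax probability per node.
    gnn_conf = F.softmax(gnn_logits, dim=-1).max(dim=-1).values
    # Prefer nodes where the GNN is uncertain but the LLM is confident,
    # so that LLM pseudo-labels complement the GNN's weak spots.
    score = llm_conf - gnn_conf
    order = torch.argsort(score[unlabeled_idx], descending=True)
    return unlabeled_idx[order][:budget]

# Toy usage: 100 nodes, 5 classes; nodes 10..99 are unlabeled.
selected = select_nodes_for_llm(torch.randn(100, 5), torch.rand(100),
                                torch.arange(10, 100))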
II. RELATED WORK

Graph Neural Networks. Graph Neural Networks (GNNs) have garnered widespread attention for their effective exploitation of graph structure information. There are two primary types of GNNs: spectral-based and spatial-based. Kipf and Welling [6] followed the idea of CNNs and proposed the Graph Convolutional Network (GCN) to aggregate information within the spectral domain using graph convolution. Different from GCN, the Graph Attention Network (GAT) [26] and GraphSAGE [8] emerged as spatial-based approaches. GAT applies the attention mechanism to learn the importance of neighbors when aggregating information. GraphSAGE randomly samples a fixed number of neighbors for each node and aggregates information from these local neighborhoods. Despite their extensive application across various domains, GNNs often face challenges due to limited labeled data. Existing convolutional filters or aggregation mechanisms struggle to effectively propagate labels throughout the entire graph when only a few labeled data points are available [27].
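Concretely, the layer-wise propagation rule of GCN [6] can be written as

H^{(l+1)} = \sigma( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} ),

where \tilde{A} = A + I is the adjacency matrix with added self-loops, \tilde{D} is its degree matrix, H^{(l)} holds the node representations at layer l, W^{(l)} is a learnable weight matrix, and \sigma is a nonlinearity.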
Few-shot Node Classification. In real-world graph learning tasks, obtaining high-quality labeled samples can be particularly challenging due to various factors, such as the high cost involved in annotation processes or the limited access to node information. Thus, researchers have proposed different methods to improve the performance of GNNs with only a few labeled data points. Most recent advancements in few-shot node classification (FSNC) models have mainly developed from two approaches: metric-learning and meta-learning. Metric-learning models aim to learn a task-invariant metric applicable across all tasks to facilitate FSNC [28], [29]. The prototypical network [30] and the relation network [31] are two classic examples: the former uses the mean vector of the support set as a prototype and calculates distance metrics to classify query instances, while the latter trains a neural network to learn the distance metric between the query and support set. Meta-learning models use task distributions to conduct meta-training, learning shared initialization parameters that are then adapted to new tasks during meta-testing [13], [32], [33]. These approaches have demonstrated effectiveness compared to metric-learning, which often struggles due to task divergence issues. However, meta-learning requires significant amounts of data for meta-training, sourced from the same domain as meta-testing, which severely limits its practical applicability. Different from metric-learning and meta-learning models, in this paper we propose to distill knowledge from LLMs to GNNs, leveraging the LLMs' zero-shot and reasoning abilities to improve GNNs for few-shot node classification.
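As a concrete formalization of the prototype idea described above (the standard formulation from [30], restated in our notation):

c_k = (1 / |S_k|) \sum_{(x_i, y_i) \in S_k} f_\phi(x_i), \qquad p(y = k | x) \propto \exp( -d( f_\phi(x), c_k ) ),

where S_k is the support set of class k, f_\phi is the learned embedding function, and d is a distance measure such as squared Euclidean distance.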
LLMs for Text-Attributed Graphs. Recently, LLMs have garnered widespread attention and experienced rapid development, emerging as a hot topic in the artificial intelligence area. Within the graph domain, LLMs show their generalizability in dealing with Text-Attributed Graphs (TAGs). Chen et al. [19] demonstrated the power of LLMs' zero-shot ability on node classification tasks. Moreover, LLMs have also demonstrated their power in providing rationales to enhance node features [22] and in constructing edges in graphs [34]. Liu et al. [21] further proposed OFA to encode all graph data into text and leverage LLMs to make predictions on different tasks. Despite their remarkable proficiency in understanding text, LLMs still face limitations when it comes to processing graph-structured data. Therefore, leveraging LLMs' zero-shot ability and integrating them with GNNs has emerged as the latest state-of-the-art approach in text-attributed graph learning [19].

Active Learning. Active learning (AL) [35]–[39] is a widely adopted approach across various domains for addressing the issue of label sparsity. The core concept involves selecting the most informative instances from the pool of unlabeled data. Recently, many works [37], [38], [40] have integrated GNNs with AL to improve the representative power of graph embeddings. However, how to leverage AL to build connections between LLMs and GNNs and thereby improve the performance of GNNs remains an open problem. Chen et al. [19] first leveraged active learning to select nodes that are close to the cluster center
provide superior pseudo-labels with rationales, whereas GNNs