Enhancing Graph Neural Networks with Limited Labeled Data by Actively Distilling Knowledge from Large Language Models
Quan Li¹, Tianxiang Zhao¹, Lingwei Chen², Junjie Xu¹, Suhang Wang¹
¹ Pennsylvania State University, University Park, PA, USA
² Wright State University, Dayton, OH, USA
{qbl5082, tkz5084, junjiexu, szw494}@psu.edu, [email protected]
arXiv:2407.13989v3 [cs.LG] 4 Sep 2024

Abstract—Graphs are pervasive in the real world, in areas such as social network analysis, bioinformatics, and knowledge graphs. Graph neural networks (GNNs) have great ability in node classification, a fundamental task on graphs. Unfortunately, conventional GNNs still face challenges in scenarios with few labeled nodes, despite the prevalence of few-shot node classification tasks in real-world applications. To address this challenge, various approaches have been proposed, including graph meta-learning, transfer learning, and methods based on Large Language Models (LLMs). However, traditional meta-learning and transfer learning methods often require prior knowledge from base classes or fail to exploit the potential advantages of unlabeled nodes. Meanwhile, LLM-based methods may overlook the zero-shot capabilities of LLMs and rely heavily on the quality of generated contexts. In this paper, we propose a novel approach that integrates LLMs and GNNs, leveraging the zero-shot inference and reasoning capabilities of LLMs and employing a Graph-LLM-based active learning paradigm to enhance GNNs’ performance. Extensive experiments demonstrate the effectiveness of our model in improving node classification accuracy with considerably limited labeled data, surpassing state-of-the-art baselines by significant margins.

Index Terms—Graph Neural Networks, Large Language Model, Active Learning, Pseudo Labeling

I. INTRODUCTION

Graphs have become increasingly recognized as one of the most powerful data structures to perform real-world content analysis [1]–[3]. They are adept at representing complex relationships and uncovering hidden information between objects across various domains. Among various tasks on graphs, node classification stands out as a classic task with broad applications, such as sentiment analysis [4] and user attribute inference [5]. Recently, graph neural networks (GNNs) [6]–[8] have shown great power in node classification. Generally, GNNs adopt the message-passing mechanism, which updates a node’s representation by aggregating its neighbors’ information, facilitating the implicit propagation of information from labeled nodes to unlabeled ones. This strategy has substantially enhanced performance across various benchmark datasets [9].

Despite the great success of GNNs in node representation learning and node classification, they often struggle to generalize effectively when labeled data is scarce. However, in many real-world applications, due to various reasons such as labeling cost and privacy issues, one often needs to train a GNN classifier with sparse labels, which is known as few-shot node classification. For example, labeling a large number of web documents can be both costly and time-consuming [10], [11]; similarly, in social networks, privacy concerns limit access to personal information, leading to a scarcity of attribute labels [5]. Consequently, when confronted with such datasets, GNNs may exhibit poor generalization to unlabeled nodes. To tackle the few-shot learning problem, various methods have been proposed, such as meta-learning [12]–[14], transfer learning [15], [16], and adversarial reprogramming [17]. However, they still require a substantial amount of labeled nodes in each class to achieve satisfactory results [18] or require auxiliary labeled data to provide supervision.

Recently, Large Language Models (LLMs) have demonstrated their outstanding generalizability in zero-shot learning and reasoning [19], [20]. Several efforts have been made to introduce LLMs to graph learning, such as pre-processing of textual node attributes or taking textual descriptions of rationales as inputs [21], [22], leveraging LLMs to construct graph structure [23], [24], and generating new nodes [25]. For example, Chen et al. [19] first leveraged LLMs as annotators to provide more supervision for graph learning. Yu et al. [25] leveraged the generative capability of LLMs to address the few-shot node classification problem. These works demonstrate that LLMs can enhance GNNs from different perspectives. However, they typically treat LLMs merely as annotators or generators for node classification tasks, overlooking their untapped potential, such as the capacity to uncover hidden insights within the results and their zero-shot reasoning ability, which could significantly enhance GNNs’ performance for few-shot learning tasks.

In this paper, we introduce a novel few-shot node classification model that enhances GNNs’ capabilities by actively distilling knowledge from LLMs. Unlike previous approaches, our model uses LLMs as “teachers”, capitalizing on their zero-shot inference and reasoning capabilities to bolster the performance of GNNs in few-shot learning scenarios. However, there are two primary challenges: (i) LLMs cannot consistently
deliver accurate predictions for all nodes, so the question is how to select nodes for which LLMs can provide high-quality labels that benefit the GNN most; and (ii) how to effectively distill the knowledge from LLMs to GNNs. To address these challenges, we propose an active learning-based knowledge distillation strategy that selects valuable nodes for LLMs and bridges the gap between LLMs and GNNs. This approach significantly enhances the efficacy of GNNs when labeled data is scarce. We first explore the metrics that affect the correctness of LLMs’ predictions. Then, we employ LLMs as a teacher model and apply them to the limited training data, generating soft labels for training nodes along with logits and rationales. These outputs are used to supervise GNNs in learning from two perspectives: probability distribution and feature enhancement at the embedding level. In this way, GNNs can learn the hidden information from unlabeled nodes and the detailed explanation provided by LLMs. Furthermore, we introduce a novel Graph-LLM-based active learning approach to establish a connection between LLMs and GNNs, which effectively selects nodes for which GNNs fail to provide accurate pseudo-labels but LLMs can offer reliable pseudo-labels, thereby enabling GNNs to leverage the zero-shot capabilities of LLMs and enhance their performance with limited data. Afterward, the selected pseudo-labels are merged with the true labels to train the final few-shot node classification model. Our major contributions are:
• We develop a semi-supervised learning model by distilling knowledge from Large Language Models and leveraging the enhanced rationales provided by Large Language Models to help GNNs improve their performance.
• We design and implement a Graph-LLM-based active learning paradigm to enhance the performance of GNNs. This is achieved by identifying nodes for which GNNs struggle to generate reliable pseudo-labels, yet LLMs can provide dependable predictions, which leverages the zero-shot ability of LLMs to enhance the performance of GNNs.
• Extensive experiments on various benchmark datasets demonstrate the effectiveness of our proposed framework for node classification tasks with limited labeled data.

II. RELATED WORK

Graph Neural Networks. Graph Neural Networks (GNNs) have garnered widespread attention for their effective exploitation of graph structure information. There are two primary types of GNNs: spectral-based and spatial-based. Kipf and Welling [6] followed the idea of CNNs and proposed the Graph Convolutional Network (GCN) to aggregate information within the spectral domain using graph convolution. Different from GCN, Graph Attention Network (GAT) [26] and GraphSAGE [8] emerged as spatial-based approaches. GAT applies the attention mechanism to learn the importance of the neighbors when aggregating information. GraphSAGE randomly samples a fixed number of neighbors of a node and aggregates information from these local neighborhoods. Despite their extensive application across various domains, GNNs often face challenges due to limited labeled data. Existing convolutional filters or aggregation mechanisms struggle to effectively propagate labels throughout the entire graph when only a few labeled data points are available [27].

Few-shot Node Classification. In real-world graph learning tasks, obtaining high-quality labeled samples can be particularly challenging due to various factors such as the high cost involved in annotation processes or the limited access to node information. Thus, researchers have proposed different methods to improve the performance of GNNs with only a few labeled data. Most recent advancements in few-shot node classification (FSNC) models have mainly developed from two approaches: metric-learning and meta-learning. Metric-learning models aim to learn a task-invariant metric applicable across all tasks to facilitate FSNC [28], [29]. Prototypical network [30] and relation network [31] are two classic examples, where the former uses the mean vector of the support set as a prototype and calculates the distance metrics to classify query instances, and the latter trains a neural network to learn the distance metric between the query and support set. Meta-learning models use task distributions to conduct meta-training, learning shared initialization parameters that are then adapted to new tasks during meta-testing [13], [32], [33]. These approaches have demonstrated effectiveness compared to metric-learning, which often struggles due to task divergence issues. However, meta-learning requires significant amounts of data for meta-training, sourced from the same domain as meta-testing, thereby severely limiting its practical applicability. Different from metric-learning and meta-learning models, in this paper, we propose to distill knowledge from LLMs to GNNs, leveraging the LLMs’ zero-shot ability and reasoning ability to improve GNNs for few-shot node classification.

LLMs for Text-Attributed Graphs. Recently, LLMs have garnered widespread attention and experienced rapid development, emerging as a hot topic in the artificial intelligence area. Within the graph domain, LLMs show their generalizability in dealing with Text-Attributed Graphs (TAGs). Chen et al. [19] demonstrated the power of LLMs’ zero-shot ability on node classification tasks. Moreover, LLMs also demonstrate their power in providing rationales to enhance node features [22] and construct edges in graphs [34]. Liu et al. [21] further proposed OFA to encode all graph data into text and leverage LLMs to make predictions on different tasks. Despite their remarkable proficiency in understanding text, LLMs still face limitations when it comes to processing graph-structured data. Therefore, leveraging LLMs’ zero-shot ability and integrating them with GNNs has emerged as the latest state-of-the-art approach in text-attributed graph learning [19].

Active Learning. Active learning (AL) [35]–[39] is a widely adopted approach across various domains for addressing the issue of label sparsity. The core concept involves selecting the most informative instances from the pool of unlabeled data. Recently, many works [37], [38], [40] integrate GNNs with AL to improve the representative power of graph embeddings. However, how to leverage AL to build connections between LLMs and GNNs and improve the performance of GNNs has emerged as a problem. Chen et al. [19] first leverage active learning to select nodes that are close to the cluster center
under the label-free setting and use LLM as an annotator to create labels for these nodes. However, their approach simply leverages LLMs to annotate nodes and ignores the benefits of unlabeled nodes and the zero-shot reasoning ability of LLMs. Moreover, under the few-shot setting, GNNs themselves can provide relatively high-quality pseudo-labels for those nodes close to the cluster center, so using LLMs to generate pseudo-labels for those nodes wastes resources. In addition, prior research primarily concentrated on selecting data with the highest confidence score during the AL process. In our work, instead of focusing on nodes where GNNs have high confidence, we prioritize nodes where GNNs struggle to provide pseudo-labels with high confidence scores but LLMs can provide reliable predictions. This approach is motivated by our integration of LLMs as teacher models to enhance the performance of GNNs by leveraging LLMs’ zero-shot pseudo-labeling and reasoning ability. Through active learning, we integrate LLMs into GNNs, enabling LLMs to instruct GNNs with data that GNNs find challenging to label confidently.

III. PRELIMINARIES

In this section, we conduct preliminary experiments to reveal the metrics that can affect LLMs’ ability to generate high-quality pseudo-labels, and we formulate the problem.

Notations. We use $G = (V, E)$ to denote a graph, where $V = \{v_1, v_2, \ldots, v_N\}$ is a set of $N$ nodes and $E$ is a set of edges. We use $A$ to denote the adjacency matrix, where $A_{ij} = 1$ means nodes $v_i$ and $v_j$ are connected; otherwise $A_{ij} = 0$. The text-attributed graph can be defined as $G_T = (V, A, \mathcal{X})$, where $\mathcal{X} = \{X_1, X_2, \cdots, X_N\}$ denotes the set of raw texts, which can be encoded as text embeddings $\mathbf{X} = \{x_1, x_2, \cdots, x_N\}$. In semi-supervised learning, the node set $V$ can be divided into two different sets: (1) the labeled node set $V_l$ and (2) the unlabeled node set $V_u$. Moreover, we use $V_S$ to denote the labeled node set including both original labeled data and the data selected through active learning, and use $Y$ to represent the label set, where $Y = \{y_1, y_2, \cdots, y_N\}$.

A. Understanding LLM’s Capability

As label sparsity challenges GNNs, in this paper, we aim to imbue GNNs with the zero-shot learning prowess of LLMs, thereby elevating their performance in scenarios with limited labeled data. However, LLMs might be good at classifying certain nodes while performing poorly on other nodes. Thus, it is important to identify nodes for which LLMs can provide superior pseudo-labels with rationales, whereas GNNs cannot, which can better enhance GNNs’ performance. Hence, we first conduct preliminary experiments to understand key factors pivotal for LLMs in generating reliable pseudo-labels.

LLMs may benefit from various metrics to perform node classification well. Particularly, in graph $G$, certain metrics exert a more pronounced influence on the correctness of LLM predictions on nodes, which include: 1) the degree of a node, and 2) the homophily ratio. Both degrees and the homophily ratio are important for a node. The former indicates how many nodes will be affected by a node, while the latter suggests that the node tends to connect with others having similar features. Therefore, it is crucial to examine how the degree and the homophily ratio affect the performance of LLMs’ predictions.

We conduct preliminary experiments to understand how these factors influence the classification performance of LLMs. Specifically, we use the following equation to compute the homophily ratio:

$$\mathrm{HR} = \frac{\#\ \text{of neighbors that have the same label}}{\text{total}\ \#\ \text{of neighbors}}$$

We divide degree and homophily into 3 categories: highest, middle, and lowest, and select 200 nodes for each category. Specifically, we sort the nodes based on the degrees and the homophily ratio in descending order, evenly selecting 200 nodes from the head, tail, and middle of the node list for the highest, lowest, and middle categories, respectively. GPT-3.5-turbo is used for testing. We provide the raw text $X_i$ and the potential classes to the LLMs, asking them to assign a label from the given classes to $X_i$. Then, we compare the results from the LLM and the ground truth labels for evaluation.

[Figure 1: two panels plotting Accuracy (y-axis) for the Low, Mid, and High node groups; x-axis of the left panel: Degrees; x-axis of the right panel: Homophily Ratio; series: Cora, Citeseer, PubMed]
Fig. 1. Preliminary experiments: the impact of the degrees (left) and homophily ratio (right) on LLMs

Figure 1 shows the preliminary experimental results. From the figure, we can find that the performance of LLMs can be affected by the degree and the homophily ratio of a node. When we use the nodes with higher degrees or a higher homophily ratio, the classification accuracy of LLMs increases significantly. Compared with degrees, LLMs are more sensitive to the homophily ratio. From the right panel of Figure 1, it is clear that the performance of LLMs changes drastically across nodes with different homophily ratios. For example, the accuracy of LLMs for nodes with the lowest homophily ratio in the Cora dataset is around 40%, but it reaches 75% on nodes with the highest homophily ratio. This is because nodes with higher degrees and a higher homophily ratio tend to be closer to the distribution center and occupy well-connected positions in the graph. These nodes exert indirect influence and are naturally associated with richer textual information, making them more significant and representative within clusters. For instance, in citation networks, papers with more citations are more representative and easily distinguishable within their field. These representative and distinct contexts lead LLMs to achieve better classification outcomes.

Our preliminary experiments demonstrate that LLMs are capable of generating high-quality pseudo-labels for nodes with a higher homophily ratio and higher degree, which paves the way to effectively select nodes to query LLMs to obtain high-quality knowledge for enhancing GNNs.
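
The node statistics used in the preliminary experiments of Section III-A can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code. It assumes a dense 0/1 adjacency matrix stored as nested lists, assigns isolated nodes a homophily ratio of 0, and breaks ties by node index; the helper names `degree_and_homophily` and `split_high_mid_low` are hypothetical. It computes each node's degree and homophily ratio, then takes the head, middle, and tail of a descending sort, mirroring the highest/middle/lowest grouping described above (the paper uses 200 nodes per group).

```python
def degree_and_homophily(adj, labels):
    """Return (degrees, homophily ratios) for every node.

    adj    -- dense 0/1 adjacency matrix as nested lists (assumed layout)
    labels -- ground-truth class label of every node
    """
    degrees, ratios = [], []
    for i, row in enumerate(adj):
        neighbors = [j for j, connected in enumerate(row) if connected]
        same = sum(1 for j in neighbors if labels[j] == labels[i])
        degrees.append(len(neighbors))
        # HR = (# neighbors with the same label) / (total # neighbors).
        # Assumption: an isolated node gets ratio 0.0.
        ratios.append(same / len(neighbors) if neighbors else 0.0)
    return degrees, ratios


def split_high_mid_low(scores, k):
    """Pick k node indices from the head, middle, and tail of a descending sort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    mid_start = (len(order) - k) // 2
    return order[:k], order[mid_start:mid_start + k], order[-k:]


# Toy graph with edges 0-1, 0-2, 1-2, 2-3, 3-4 and two classes.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
labels = [0, 0, 0, 1, 1]

degrees, ratios = degree_and_homophily(adj, labels)
high, mid, low = split_high_mid_low(ratios, k=1)
print("degrees:", degrees)
print("homophily ratios:", ratios)
print("high / mid / low groups by homophily:", high, mid, low)
```

Because the homophily ratio depends on ground-truth labels, a computation like this is only possible for analysis on labeled benchmark data; how the ratio is obtained for unlabeled nodes during active learning is not specified in this part of the paper.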
[Figure 2: framework diagram. Stage labels: Input Graph, GNN, Knowledge Distillation and Alignment, Pseudo-Labeling. Other labeled elements: Labeled Nodes, Soft label prompt, Rationale prompt, Logits, Rationale – Enhanced Explanation, Predictions, Optimization ($\mathcal{L}_S$, $\mathcal{L}_T$, $\mathcal{L}_F$), Active Learning, Pseudo-Labels, Selected Nodes with Pseudo-Labels. Example rationale shown in the figure: “Based on the information provided in the paper, it can be categorized as "Rule Learning". The paper primarily focuses on the integration of knowledge acquisition and machine learning techniques ... the emphasis on rule acquisition and the use of a specific program like FOCL aligns with the characteristics of Rule Learning.”]
Fig. 2. An illustration of the proposed framework

B. Problem Statement

Since LLMs cannot provide reliable knowledge for all nodes, in this paper we study a novel problem: how to effectively leverage LLMs to enhance the performance of few-shot node classification over graphs. Given a text-attributed graph $G_T = (V, A, \mathcal{X})$ with a very limited labeled node set $V_l$ (i.e., $|V_l| \ll |V_u|$) and their label set $Y_l$, a budget size $B$ (note that the budget size $B$ is the number of nodes per class), and a large language model LLM, we aim to train a GNN that achieves better performance with only a few available labeled nodes by querying the LLM within the budget $B$.

IV. PROPOSED MODEL

Though GNNs have shown great power in node classification, vanilla GNNs suffer from low generalizability when little labeled data is available for training [41]. Thus, in order to enhance the generalizability of GNNs, we propose a framework that integrates GNNs with LLMs and employs a novel Graph-LLM-based active learning strategy to actively distill knowledge from LLMs. Our proposed model uses a GNN as the backbone model and takes advantage of LLMs’ zero-shot pseudo-labeling and reasoning capabilities, especially for nodes on which it is difficult for the GNN to give accurate predictions. In these instances, LLMs can offer reliable pseudo-labels and provide enhanced rationales, thereby improving the few-shot learning capability of GNNs from distinct perspectives.

An illustration of the framework is shown in Figure 2. Specifically, the LLM serves as a teacher model, instructing the student model (GNNs) from two distinct perspectives: (1) it imparts the “correct” answers to the student model along with the probability distribution for all potential categories, drawing upon its vast knowledge, which teaches GNNs with the output logits; and (2) it explains the rationale behind its decision-making process, providing insights into why certain decisions are made, which serves as the feature teacher to teach GNNs at the embedding level. Then the knowledge obtained from LLMs is distilled to GNNs, and GNNs propagate label information to all unlabeled nodes. We leverage Graph-LLM-based active learning to identify nodes for which GNNs struggle to generate reliable pseudo-labels but LLMs can provide reliable predictions. These selected nodes are then added to the training set with pseudo-labels, and fed to LLMs for logits and rationales, which can further enhance the capability of GNNs under the guidance of LLMs. Finally, we train the ultimate student model, enhancing its generalizability with limited data. Next, we introduce each component in detail.

A. Base GNN Classifier

As GNNs have shown great power in semi-supervised node classification, we adopt Graph Neural Networks (GNNs) as the backbone models, which can be used to capture the structure information between entities and naturally propagate the information to all unlabeled nodes efficiently. We first use SBERT [42] to encode raw texts $\mathcal{X}$ to text embeddings $\mathbf{X}$. Then, we apply GNNs to the given graph and these embeddings. Specifically, the GNN takes the graph $G_T$ as input and learns the node representation as

$$H^f = \mathrm{GNN}(A, \mathbf{X}) \tag{1}$$

where $H^f$ is the node representation matrix from the last layer of the GNN. The final prediction results can be computed as:

$$Z = \mathrm{softmax}(H^f) \tag{2}$$

where $Z$ contains the predicted class probabilities for all nodes in the graph. The loss function for training the GNN will be introduced in Section IV-D.

B. Obtaining Knowledge from LLM

Despite GNNs showing success in dealing with graph data, the generalizability of GNNs with few available data is still limited. To tackle this challenge, we introduce LLMs as teacher models, leveraging their zero-shot ability [20] to instruct GNNs in classification tasks and provide insights into the reasoning behind these decisions. In this way, GNNs can learn hidden label distribution information and enhanced feature information from LLMs, which empowers GNNs when labeled data is scarce.

To effectively distill knowledge from LLMs to GNNs, we consider two types of knowledge: (1) soft labels and logits; and (2) rationales behind LLMs’ decision-making process. Soft labels and logits reveal hidden distribution information for unlabeled data, while rationales contribute richer node information. This combination allows GNNs to benefit from the unlabeled data and get enhanced node features. We prompt the prediction and reasoning in a two-step manner: first, we input the raw texts into the LLMs to generate the soft labels and logits with the probability distribution. We then let LLMs explain the reason why they make these decisions. Examples of prompts are shown in Table I. To avoid deviations in output
formats with multi-task prompts, we use separate prompts for logits and rationales, respectively. Next, we give the details.

TABLE I
PROMPT EXAMPLES
For soft labels and logits: Paper: <Paper Information>. Task: For the following categories: <categories>, which categories does this paper belong to? Provide your <k> best guesses within the given categories: <categories> and a confidence score that each is correct (0 to 1). The sum of all confidence should be 1. Outputs must be in the given categories. For example: “answer”: <your first answer>, “confidence”: <confidence for first answer>, ...
For rationales: Paper: <Paper Information>. Task: For the following categories: <categories>, which categories does this paper belong to? Think step by step. Explain your decision in detail.

1) Soft Labels and Logits Generation: For a node $v_i$, we first feed the raw text $X_i$ into LLMs to generate soft labels $\bar{y}_i$ for $X_i$ and the logits $l_i$ for all possible categories. An example of the prompt for soft labels and logit generation is shown in the first row of Table I. We leverage the zero-shot ability of LLMs to generate relevant and reliable soft labels and logits so that GNNs can leverage the hidden information of unlabeled data with knowledge distillation. This can be written as

$$\bar{y}_i, l_i = \mathrm{LLM}(X_i; \mathrm{prompt}) \tag{3}$$

2) Rationales for Feature Enhancement: Traditional knowledge distillation methods primarily utilize the soft labels and logits from the teacher model. Nonetheless, incorporating the rationales behind text decisions can significantly enhance the learning capabilities of GNNs [22]. In this context, GNNs are able to learn more informative features from the LLM at the embedding level. Consequently, we introduce LLMs as a feature teacher, guiding GNNs to assimilate more informative features in their decision-making process. Unlike previous works that concatenate the enhanced embeddings and the node embeddings or simply replace the node representations directly, we use a loss function to minimize the difference between them, which helps GNNs learn the enhanced representation while retaining the original node representations. The loss function will be detailed in Section IV-C.

For a node $v_i$, LLMs will output the classification result for $X_i$ with a detailed explanation of the decision-making process. The enhanced explanation $R_i$ can be represented as follows:

$$R_i = \mathrm{LLM}(X_i; \mathrm{prompt}) \tag{4}$$

An example of the prompt for rationales is shown in the second row of Table I. Since the rationales we get from LLMs are all textual explanations, we further need to transform them into the embedding level to teach GNNs the more informative features. We use a pre-trained language model such as Sentence BERT (SBERT) [42] to get the embeddings for $R_i$, which can be represented as follows:

$$\bar{r}_i = \mathrm{LM}(R_i) \tag{5}$$

where $\bar{r}_i$ denotes the embedding of the $i$-th rationale.

However, since the dimensionality of $\bar{r}_i$ may be different from the dimension of the final layer of GNNs, alignment between these representations is necessary. While min/max pooling can effectively reduce dimensionality for alignment purposes, it tends to lose information during the pooling process. To retain the enriched information from these rationales, we train a Multi-Layer Perceptron (MLP) using text embeddings $\mathbf{X}_l$ of the limited labeled node set $V_l$ and their corresponding ground truth labels $Y_l$, applying the cross-entropy loss function. This MLP is tasked with aligning the representations between the rationales $\bar{r}$ and the outputs $H^f$ of the final layer of GNNs, ensuring that valuable information is retained throughout the alignment process. The final representation for the $i$-th rationale is generated as follows:

$$r_i = \mathrm{MLP}(\bar{r}_i) \tag{6}$$

where $r_i$ is the embedding that has the same dimension as the final layer’s outputs $H^f$ in GNNs.

C. Distilling Knowledge to GNN

With the knowledge from the LLM represented as $r_i$ and $l_i$, we use knowledge distillation [43] to distill this knowledge into GNNs. Through this process, GNNs can tap into the hidden information behind unlabeled nodes by using output logits to enhance their performance. Moreover, they can achieve improved node representations by incorporating the rationales generated from LLMs to further enrich the depth and quality of the information being processed. Specifically, LLMs serve as a pre-trained teacher model to teach the student model (GNNs) from two distinct perspectives: (1) soft labels and the probability distribution (logits) and (2) the rationales at the embedding level.

1) Loss for Knowledge Distillation: Let $V_S$ be the set of nodes including the original training data and the data selected through active learning (to be introduced in Section IV-E). Following [5], for each $v_i \in V_S$, we first convert the logits $l_i$ from the LLM as:

$$p(y_i = j \mid \mathrm{LLM}) = \frac{\exp(l_{ij}/\tau)}{\sum_{c=1}^{C} \exp(l_{ic}/\tau)} \tag{7}$$

where $C$ is the number of classes, $\tau$ is the knowledge distillation (KD) temperature that controls how much of the teacher’s knowledge is distilled to the student model, and $l_{ij}$ is the $j$-th element of $l_i$. Then, the student can learn the distilled knowledge from the teacher by optimizing the following loss:

$$\mathcal{L}_T = -\frac{1}{|V_S|} \sum_{v_i \in V_S} \sum_{j=1}^{C} p(y_i = j \mid \mathrm{LLM}) \log Z_{ij} \tag{8}$$

where $Z_{ij}$ is the probability that $v_i$ belongs to class $j$ as predicted by the GNN. This enhances the model’s capacity to get insights from unlabeled data and augment its overall learning capabilities.

2) Loss for Feature Alignment: We also introduce rationales to augment the node representation from a feature perspective at the embedding level. With $r$ obtained in Section IV-B, Mean Square Error (MSE) is used to calculate the loss between the rationales
and node embeddings $H^f$ at the final layer of GNNs for all the nodes in the current training set $V_S$ as:

$$\mathcal{L}_F = \frac{1}{|V_S|} \sum_{i=1}^{|V_S|} (H_i^f - r_i)^2 \tag{9}$$

where $H_i^f$ is the node embedding of $v_i$ from the GNN. By employing this approach, the GNNs learn informative rationales from LLMs, enhancing their learning capabilities from a feature perspective at the embedding level.

D. Objective Function of Proposed Framework

The student model itself computes the training loss between predictions and labeled or pseudo-labeled data as

$$\mathcal{L}_S = -\frac{1}{|V_S|} \sum_{v_i \in V_S} \sum_{c=1}^{C} I(y_i = c) \log Z_{ic} \tag{10}$$

where $I$ is the indicator function, which outputs 1 if $y_i = c$ and 0 otherwise. $V_S$ is the set of labeled or pseudo-labeled nodes, and $y_i$ is the label or pseudo-label of $v_i$.

With the knowledge distillation from the LLM, the final loss function of our proposed model can be formalized as follows:

$$\min_{\mathrm{GNN}} \mathcal{L} = (1 - \alpha - \beta)\mathcal{L}_S + \alpha \mathcal{L}_T + \beta \mathcal{L}_F \tag{11}$$

where $\alpha$ and $\beta$ are both balance parameters that adjust the relative weight of the knowledge distillation loss and the feature embedding loss, respectively.

E. Graph-LLM-based Active Learning

To further improve GNNs’ few-shot learning ability, we introduce a novel Graph-LLM-based active learning strategy to select valuable nodes for querying LLMs and add them to the training set iteratively. We seek to select $B$ nodes for each class where GNNs exhibit low confidence in classification results, yet LLMs can offer high-quality pseudo-labels based on their inherent knowledge. Through iterative selection, we progressively enhance the GNNs’ capabilities.

As indicated by the preliminary experiment results presented in Section III, LLMs demonstrate the ability to generate high-quality pseudo-labels for nodes with higher homophily ratios and higher degrees. Thus, we define an evaluation metric that combines the confidence score of the GNN’s prediction, homophily ratio, and degrees to evaluate if the node in the unlabeled node set $V_u$ is valuable for our proposed model. The evaluation metric is defined as follows:

$$S_{GL_i} = RS(p_i) + RS(HR_i) + RS(D_i) \tag{12}$$

where $S_{GL_i}$ means the evaluation score for the $i$-th node, $p_i$ (i.e., $p_i = \max(Z_i)$), $HR_i$, and $D_i$ denote the final output confidence score, the homophily ratio, and degree for i-th

RS assigns scores to the $p_i$ in descending order, prioritizing nodes for which GNNs cannot generate reliable pseudo-labels.

Considering the fact that some nodes can better contribute to label propagation and model improvement in the graph, we would like to add a metric to evaluate the importance of the node and to facilitate selecting the most valuable pseudo-labels. Here, we utilize neighborhood entropy reduction to assess the importance of a given node $v_i$ [44]. Specifically, for each node $v_i$ in the candidate set $V_c$, we compute the entropy reduction in the neighbors’ softmax vectors by removing $v_i$ from the node set $V_n$, which contains $v_i$ and its neighbors. The basic intuition is that a node is more informative when it can greatly change uncertainty (entropy) within its neighborhood. In other words, the larger the change in entropy, the more important the node is. Then we rank and assign scores to these nodes based on the change of entropy. The score of entropy change is defined as follows:

$$S_{E_i} = RS\big(h(\hat{y}_{V_n - v_i}) - h(\hat{y}_{V_n})\big) \tag{13}$$

where $S_{E_i}$ is the score of entropy change for $v_i$, $h(\cdot)$ is the entropy function, and $\hat{y}$ denotes the pseudo-labels from GNNs ($\hat{y}_{V_n}$ and $\hat{y}_{V_n - v_i}$ represent the pseudo-labels for the nodes in the set $V_n$ with and without $v_i$), which is computed based on nodes’ logits vectors and the activation function. Thus, the final evaluation metric for $v_i$ is:

$$S_i = S_{GL_i} + S_{E_i} \tag{14}$$

In each stage, we select subsets of valuable nodes with high $S_i$, each consisting of $b$ nodes per class. We query the LLM to obtain the pseudo-label, logits, and rationales. We then add these nodes to the label set and retrain our model using Eq. 11. We continue this process until the total number of nodes meets the budget size $B$ times the number of classes $C$. Here, $B$ is a relatively small budget size, achieving a balance between the cost of querying LLMs and the resultant performance improvement. Finally, the selected nodes with pseudo-labels are used to train the final GNN model. This approach makes GNNs benefit from the various abilities of LLMs, enhancing their performance with scarce labeled data.

V. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we present the evaluation results of our proposed few-shot node classification model on benchmark datasets. We aim to answer the following research questions:
• RQ1: How does our proposed model perform compared with state-of-the-art baselines under consistent settings?
• RQ2: How do different hyper-parameters impact the performance of our model?
• RQ3: How do different components in our proposed model
node, respectively. The homophily ratio is calculated using contribute to the performance?
labels generated from GNN. The RS represents a ranking
function used to calculate scores for each evaluation metric. A. Experimental Setup
Specifically, we arrange the nodes in ascending order accord- 1) Datasets: We evaluate our proposed model using three
ing to the evaluation metric results, excluding pi , and assign public citation datasets: Cora, Citeseer, and PubMed [6].
scores ranging from 0 to 1 with a step of 1/|Vu |. Note that These datasets are among the most commonly utilized citation
TABLE II • Meta-PN: Meta-PN uses meta-learning and employs a bi-
S TATISTICS OF THE DATASETS level optimization to generate high-quality pseudo-labels.
Dataset # nodes # edges # classes • LLM-based model: This LLM-based model leverages
Cora 2,078 5,429 7
Citeseer 3,327 9,228 6 LLMs to generate the pseudo nodes for each class, uses
PubMed 19,717 88,651 3 LM to encode these nodes and uses an MLP to build edges.
Note that the LLM-based model does not provide the original
network datasets for evaluating GNN models in node classi- code; therefore, we independently developed the model based
fication tasks. In these datasets, nodes are papers with topics on the paper. For fair comparison, the hyperparameters of all
serving as labels. Edges depict citation links between papers, the models are tuned on the validation set. All experimental
and node features are derived from the title and abstract of results are conducted under consistent settings.
papers. The dataset statistics are shown in Table II.
2) Implementation: Following the traditional dataset split B. Comparison with Baselines
setting, we divide the dataset into three parts: 60% for training,
20% for validation, and 20% for testing. From the training We conduct comprehensive experiments to evaluate the
set, we then randomly select n-shot samples (i.e. n × C) performance of our proposed model compared with 7 state-
to be used as training data. It is important to note that of-the-art baseline models under consistent settings and aim
since we randomly select n-shot nodes as training nodes for to answer RQ1. Specifically, we compare our model with
the few-shot setting, the choice of seeds will influence the seven state-of-the-art models using shots n = [1, 3, 5, 7].
quality of the initial nodes, thereby impacting the classification Additionally, we assess our model with different backbone
performance. Hence, we conduct experiments with different models (GCN, GAT, GraphSAGE) and LLMs (GPT, Gemini,
seeds [0, 1, 2] and use the average accuracy as our final results. LLama). The experiment results can be found in Table IV,
For our Graph-LLM-based active learning strategies, we set Table III, and Table V. From the experimental results, we can
the budget size B = 3, indicating the selection of 3 samples observe that the LLM-based few-shot model either matches
per class in total during the graph active learning process. or surpasses both traditional GNN models and meta-learning-
The balanced parameters are configured as α = 0.3 and based models. Obviously, our proposed model with GCN
β = 0.1, and the KD temperature τ = 3 is used for distilling outperforms all state-of-the-art baselines by a large margin
knowledge from LLMs to GNNs. We use GPT3.5-Turbo as in different shots. For example, with the 3-shot setting, the
our base LLM model. Additionally, we assess the impacts improvement margin of accuracy is (6 ∼ 35)% for Cora,
of different backbone models (GCN, GAT, and GraphSAGE), (2 ∼ 23)% for Citeseer, and (5 ∼ 20)% for PubMed. Com-
different LLM base models (GPT3.5-Turbo, Gemini-1.5-flash, pared to the LLM-based few-shot model, our model performs
and LLama-3.1-8b), different training sizes N, the sample size better while requiring fewer detailed outputs, thus reducing
per class B for active learning, the balance parameter α and costs and resource usage. These observations highlight that
β, and the KD temperature τ in V-B and V-C. Note that for our proposed model can achieve state-of-the-art performance
the LLM base models, we use the API for GPT and Gemini, with fewer labeled nodes, rendering it a promising approach
while LLama is run locally. for few-shot node classification tasks.
3) Baselines: We use 7 state-of-the-art models as baselines: From the tables, we can also observe that both the backbone
3 backbone GNN models (GCN [6], GAT [26], and Graph- models and LLMs influence the performance of our proposed
SAGE [8]), 2 GNN based few-shot learning models (Meta-PN model. Our model tends to perform better when paired with a
[14], CGPN [15]), 1 graph self-supervised model (MVGRL high-performing backbone model. Similarly, if the LLM excels
[45]), and 1 LLM-based few-shot learning model [25]. in text inference tasks, our model’s performance improves
• GCN: GCN conducts convolution operations on graph-
accordingly. Despite this dependency, our model significantly
structured data, which aggregates information from the outperforms the backbone models alone, highlighting its abil-
neighbors to iteratively update node representations. ity to effectively distill knowledge from LLMs by leveraging
• GAT: GAT incorporates an attention mechanism into GNN
their enhanced rationales and soft labels with few labeled data.
for feature aggregation, which allows GAT to focus on more
important neighbors and get better node representations. C. Hyper-parameter Sensitivity Analysis
• GraphSAGE: GraphSAGE samples neighbors and employs We evaluate the impact of different hyper-parameters to
mean aggregation to learn node embeddings, efficiently answer RQ2. We evaluate our proposed model with different
capturing the graph’s structural information. hyper-parameter settings: training size N ∈ {C ×1, C ×3, C ×
• MVGRL: MVGRL is a benchmark in GNN self-supervised 5, C × 7}; the budget size B ∈ [1, 2, 3, 4, 5, 7, 10, 15, 20, 25]
learning by using data augmentation to create diverse views for graph-LLM based active learning with N = C × 3;
for contrastive learning, employing graph diffusion, and the balance parameters α ∈ (0, 0.5] with β = 0.1 and
subgraph sampling to enhance its performance. β ∈ [0.01, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5] with α = 0.3, re-
• CGPN: CGPN introduces the concept of poison learning spectively; and the KD temperature τ ∈ [1, 5]. The evaluation
and utilizes contrastive learning to propagate limited labels results for training size are shown in Tables IV, III, and V.
across the entire graph efficiently. The hyper-parameter evaluations are illustrated in Figure 3.
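The budget $B$ studied here controls the Graph-LLM-based selection of Section IV-E. As a rough illustration of Eq. (12) only, the following NumPy sketch ranks unlabeled nodes and picks the top $b$ per predicted class. It rests on several assumptions that are not stated in the text: the confidence $p_i$ is ranked in descending order (so low-confidence nodes score higher), ties are broken by node index, the homophily ratio is the fraction of neighbors sharing the node's predicted label, and the entropy term of Eq. (13) is left out. All function names are hypothetical.

```python
import numpy as np

def rank_score(values, ascending=True):
    """RS(.): rank-based scores in (0, 1] with a step of 1/len(values).

    Assumption: ties are broken by node index (stable sort).
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values if ascending else -values, kind="stable")
    scores = np.empty(len(values))
    scores[order] = np.arange(1, len(values) + 1) / len(values)
    return scores

def homophily_ratio(pred, neighbors):
    """Fraction of each node's neighbors that share its predicted label."""
    return np.array([
        np.mean(pred[nbrs] == pred[i]) if len(nbrs) else 0.0
        for i, nbrs in enumerate(neighbors)
    ])

def graph_llm_score(Z, neighbors):
    """Sketch of Eq. (12): RS(p_i) + RS(HR_i) + RS(D_i).

    Z: (num_nodes, num_classes) softmax outputs of the GNN.
    neighbors: list of neighbor-index lists, one per node.
    """
    pred = Z.argmax(axis=1)
    conf = Z.max(axis=1)                       # p_i = max(Z_i)
    hr = homophily_ratio(pred, neighbors)      # HR_i from GNN labels
    deg = np.array([len(n) for n in neighbors])
    # Assumption: low confidence should rank high, so p_i is ranked descending.
    return rank_score(conf, ascending=False) + rank_score(hr) + rank_score(deg)

def select_per_class(scores, pred, b, num_classes):
    """Pick the b highest-scoring nodes for each predicted class."""
    chosen = []
    for c in range(num_classes):
        idx = np.flatnonzero(pred == c)
        top = idx[np.argsort(-scores[idx], kind="stable")[:b]]
        chosen.extend(top.tolist())
    return chosen
```

In an iterative setting, `select_per_class` would be called once per stage on the not-yet-selected nodes, with the GNN retrained in between, until $B \times C$ nodes have been chosen.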
TABLE III
FEW-SHOT NODE CLASSIFICATION PERFORMANCE COMPARISON ON CITESEER. THE BEST RESULTS ARE HIGHLIGHTED IN BOLD
Models 1-shot 3-shot 5-shot 7-shot
GCN 43.72±(1.22) 52.24±(1.04) 56.69±(4.39) 58.20±(3.20)
GAT 29.02±(1.35) 33.91±(1.77) 36.33±(2.28) 38.77±(2.74)
GraphSAGE 43.64±(2.60) 54.33±(3.20) 57.76±(3.90) 58.11±(3.36)
MVGRL 46.13±(3.05) 55.61±(2.14) 59.56±(2.56) 60.52±(3.06)
Meta-PN 31.00±(4.89) 42.84±(4.70) 47.83±(4.30) 52.33±(3.33)
CGPN 34.75±(2.79) 41.69±(1.81) 45.88±(3.24) 46.80±(3.32)
LLM-based Model 44.63±(1.72) 55.75±(0.63) 58.65±(1.22) 59.53±(2.39)
Our Model (GAT, GPT) 32.86±(2.29) 39.51±(1.37) 41.55±(2.05) 42.87±(1.84)
Our Model (GraphSAGE, GPT) 45.41±(3.70) 56.63±(3.43) 59.87±(3.73) 60.87±(3.49)
Our Model (GCN, LLama) 45.11±(1.82) 56.22±(2.59) 60.23±(3.89) 62.05±(2.64)
Our Model (GCN, Gemini) 47.04±(2.75) 57.01±(3.07) 60.02±(3.19) 62.55±(3.48)
Our Model (GCN, GPT) 47.74±(1.69) 57.43±(3.33) 62.19±(2.80) 63.32±(2.64)

TABLE IV
FEW-SHOT NODE CLASSIFICATION PERFORMANCE COMPARISON ON CORA. THE BEST RESULTS ARE HIGHLIGHTED IN BOLD
Models 1-shot 3-shot 5-shot 7-shot
GCN 48.63±(5.49) 66.27±(5.04) 72.15±(2.24) 74.98±(0.97)
GAT 41.14±(7.24) 62.07±(6.96) 65.27±(3.40) 68.26±(1.29)
GraphSAGE 49.81±(4.66) 64.94±(6.89) 69.80±(2.41) 73.37±(1.19)
MVGRL 29.02±(2.49) 39.60±(3.17) 41.46±(4.11) 47.34±(6.01)
Meta-PN 36.59±(4.66) 54.55±(3.37) 59.74±(4.58) 66.89±(2.66)
CGPN 47.69±(6.97) 55.39±(5.01) 60.57±(2.05) 62.62±(2.86)
LLM-based Model 48.96±(3.57) 68.45±(4.80) 72.65±(1.92) 74.36±(1.35)
Our Model (GAT, GPT) 47.40±(2.49) 66.66±(3.02) 68.47±(3.26) 70.97±(1.77)
Our Model (GraphSAGE, GPT) 51.55±(2.15) 68.23±(3.76) 72.54±(2.40) 75.39±(0.44)
Our Model (GCN, LLama) 52.57±(1.72) 73.07±(2.23) 73.12±(2.84) 77.26±(2.04)
Our Model (GCN, Gemini) 53.35±(3.68) 73.42±(1.83) 75.44±(1.03) 79.61±(1.02)
Our Model (GCN, GPT) 53.82±(2.26) 74.32±(1.79) 78.13±(1.63) 79.73±(1.08)

TABLE V
FEW-SHOT NODE CLASSIFICATION PERFORMANCE COMPARISON ON PUBMED. THE BEST RESULTS ARE HIGHLIGHTED IN BOLD
Models 1-shot 3-shot 5-shot 7-shot
GCN 54.55±(3.06) 62.64±(0.85) 66.53±(3.25) 69.99±(1.31)
GAT 49.53±(2.87) 59.35±(0.80) 59.95±(2.22) 64.58±(1.74)
GraphSAGE 54.62±(0.02) 60.58±(2.82) 63.79±(1.53) 68.15±(1.67)
MVGRL 41.54±(7.91) 51.96±(4.74) 52.14±(4.35) 54.56±(2.19)
Meta-PN 40.13±(0.50) 45.72±(4.38) 51.47±(3.84) 56.70±(5.47)
CGPN 42.73±(3.48) 39.79±(5.86) 41.67±(5.74) 50.02±(3.33)
LLM-based Model 56.45±(0.74) 61.89±(2.47) 67.47±(4.30) 71.25±(2.01)
Our Model (GAT, GPT) 52.77±(3.21) 62.06±(1.47) 63.41±(0.88) 67.78±(2.95)
Our Model (GraphSAGE, GPT) 57.37±(1.66) 63.40±(2.71) 65.98±(3.36) 71.00±(1.55)
Our Model (GCN, LLama) 57.11±(1.16) 63.16±(1.25) 68.00±(3.00) 73.00±(3.74)
Our Model (GCN, Gemini) 57.29±(2.22) 64.01±(0.85) 69.24±(2.93) 73.05±(2.56)
Our Model (GCN, GPT) 58.17±(1.13) 65.40±(0.84) 69.37±(2.47) 73.43±(3.36)
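Each entry in Tables III to V is an accuracy averaged over the seeds [0, 1, 2], with the n-shot training nodes drawn at random from the training split. A minimal sketch of that protocol is given below; the helper names are hypothetical, and the use of the population standard deviation for the ± term is an assumption, since the text does not say which estimator was used.

```python
import numpy as np

def sample_n_shot(labels, train_idx, n, seed):
    """Randomly pick n training nodes per class (n x C nodes in total)."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels[train_idx]):
        pool = train_idx[labels[train_idx] == c]
        # Guard against classes with fewer than n candidates.
        take = min(n, len(pool))
        picked.extend(rng.choice(pool, size=take, replace=False).tolist())
    return np.array(sorted(picked))

def summarize(accuracies):
    """Format per-seed accuracies in the mean±(std) style of the tables.

    Assumption: population standard deviation (ddof=0).
    """
    accs = np.asarray(accuracies, dtype=float)
    return f"{accs.mean():.2f}±({accs.std():.2f})"
```

For example, one would call `sample_n_shot(labels, train_idx, n=3, seed=s)` for each seed `s` in `[0, 1, 2]`, train and evaluate once per seed, and pass the three test accuracies to `summarize`.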

• From Table III, Table IV, and Table V, we observe that classification accuracy consistently improves as the training size $N$ increases. The gain is largest when moving from the 1-shot to the 3-shot setting.
• In Figure 3(a), we observe that as the value of $\alpha$ increases, the performance initially improves and reaches a peak around $\alpha = 0.3$. However, the performance decreases drastically with further increases in $\alpha$ beyond this point.
• Figure 3(b) illustrates that the performance increases for $\beta \in [0.01, 0.1]$ and reaches its highest value at $\beta = 0.1$. The performance then keeps decreasing as $\beta$ grows: it drops slightly while $\beta$ is less than 0.3 and drops significantly after 0.3. The observed trend is understandable: as we increase the value of the balance parameters, the proportion of the loss based on the ground truth gradually decreases. When this proportion falls below a certain threshold, the loss is primarily driven by the teacher loss and the feature embedding loss. However, the quality of the pseudo-labels and feature embeddings generated from LLMs cannot be guaranteed.
• Figure 3(c) shows the results of the evaluation on the KD temperature $\tau$. As we increase $\tau$, the performance initially improves significantly, stabilizes at a high level for $\tau \in [3, 5]$, and then drops drastically when $\tau$ changes from 5 to 9. This trend has a simple explanation: when $\tau$ is relatively small, the soft label probabilities distilled from the teacher model are informative and assist in optimizing the student model. However, when $\tau$ becomes large, the distilled knowledge becomes ambiguous, potentially leading to a smoothing effect on the student model's inference ability.
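Both effects discussed above can be made concrete with a small NumPy sketch. The first helper applies a temperature-scaled softmax, as in standard knowledge distillation, and shows how a large $\tau$ flattens the teacher distribution; the second combines the three loss terms with the weights of Eq. (11). The function names and the example logits are hypothetical, and the individual losses are treated as given scalars rather than reimplemented.

```python
import numpy as np

def softmax_t(logits, tau=1.0):
    """Softmax over the last axis after dividing the logits by temperature tau."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + 1e-12)).sum())

def total_loss(l_s, l_t, l_f, alpha=0.3, beta=0.1):
    """Weighted objective of Eq. (11): (1 - alpha - beta) L_S + alpha L_T + beta L_F."""
    return (1.0 - alpha - beta) * l_s + alpha * l_t + beta * l_f

# Hypothetical teacher logits for one node with three classes.
teacher_logits = [4.0, 1.0, 0.0]
for tau in (1, 3, 9):
    p = softmax_t(teacher_logits, tau)
    print(tau, np.round(p, 3), round(entropy(p), 3))
```

As $\tau$ grows, the printed distribution approaches uniform and its entropy rises, which is the smoothing effect described for large temperatures. Likewise, as $\alpha + \beta$ approaches 1, the weight $(1 - \alpha - \beta)$ on the ground-truth loss $L_S$ approaches 0.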
[Figure 3: four line plots of Accuracy for Cora, Citeseer, and PubMed, one panel per hyper-parameter]

Fig. 3. Hyper-parameters evaluation: (a) Alpha (b) Beta (c) Temperature (d) Budget Size

• Figure 3(d) shows the impact of the budget size $B$. As $B$ increases from 1 to 7, we observe a consistent improvement in performance. However, beyond this range, the performance remains stable or even drops, indicating an upper limit to the benefits gained from enlarging $B$. At this point, any further increase in the budget size results in higher costs but only minimal gains in performance. This suggests that while increasing the budget can enhance performance up to a certain point, we need to make a trade-off between the performance and the cost.

D. Ablation Study

In this section, we design the ablation study to further investigate how different components contribute to the performance of our model and answer RQ3. Our model proceeds with LLMs and Graph-LLM-based active learning. For the LLMs, we further have two distinct perspectives: 1) soft labels and logits and 2) enhanced rationales. Thus, we investigate three components in our model design: soft labels and logits, enhanced rationales, and Graph-LLM-based active learning. (1) Soft labels and logits refer to the soft labels and logits obtained from LLMs, which are used for knowledge distillation; (2) rationales refer to the enhanced explanations obtained from LLMs, which provide insights from the feature perspective at the embedding level; (3) Graph-LLM-based active learning refers to selecting valuable nodes for the model. The hyper-parameters are set to $N = C \times 3$, $\alpha = 0.3$, $\beta = 0.1$, $\tau = 3$, and $B = 3$, and the backbone model is GCN. Note that when we solely apply active learning to the model, we select nodes with high confidence scores and prioritize the most important nodes. However, when we integrate the LLM into the model, we employ our Graph-LLM-based active learning strategy.

TABLE VI
ABLATION STUDY: SOFT LABELS (SL), ENHANCED RATIONALES (ER), AND GRAPH-LLM-BASED ACTIVE LEARNING (AL)
SL ER AL Citeseer Cora PubMed
52.24±(1.04) 66.27±(5.04) 62.64±(0.85)
✓ 53.21±(2.25) 68.93±(3.88) 63.23±(1.15)
✓ 54.39±(3.84) 70.32±(3.81) 63.82±(0.99)
✓ 54.65±(2.58) 68.87±(2.12) 63.19±(2.41)
✓ ✓ 55.57±(2.18) 71.12±(2.63) 64.33±(1.27)
✓ ✓ 55.41±(1.13) 72.91±(1.74) 64.59±(1.02)
✓ ✓ 55.37±(1.83) 71.78±(2.80) 63.68±(0.82)
✓ ✓ ✓ 57.43±(3.33) 74.32±(1.79) 65.40±(0.84)

As illustrated in Table VI, all these components contribute to the performance of our model. Among these components, the enhanced rationales have a relatively small impact on the performance. When we add soft labels and active learning independently, the performance improves by a considerable margin. When we add these two components together, the performance improves further. The experimental results also show that LLMs can effectively enhance the performance of GNNs: whether we incorporate the logits and enhanced rationales independently or combine them, the performance improves. Furthermore, when all these components are integrated, our model achieves state-of-the-art performance.

TABLE VII
ABLATION STUDY: ALIGNMENT AND ACTIVE LEARNING
Strategy Citeseer Cora PubMed
Alignment: max pooling 52.28±(2.37) 70.88±(2.24) 62.43±(1.21)
AL: Random Selection 54.97±(2.32) 72.77±(1.86) 63.58±(0.64)
AL: w/o iteration 55.84±(2.83) 73.12±(1.97) 64.43±(0.57)
Our Model 57.43±(3.33) 74.32±(1.79) 65.40±(0.84)

For the rationale alignment, we further evaluate two different alignment strategies: 1) max pooling and 2) our MLP-based alignment approach. For active learning, we evaluate different selection strategies: 1) randomly select nodes with pseudo-labels; 2) select all valuable nodes at once with Graph-LLM-based AL; 3) select nodes iteratively with our Graph-LLM-based AL, which means we select $b$ nodes per class until the total number of selected nodes reaches $B \times C$. As shown in Table VII, max pooling yields only a limited improvement compared with the MLP-based alignment strategy, which is reasonable because max pooling loses information during the pooling process. For active learning, the performance of the model with our Graph-LLM-based AL is better than random selection. Moreover, although selecting all valuable nodes at once already yields a clear performance improvement, the performance is highest when we apply the iterative selection strategy. This enhancement is attributed to the iterative active learning process, where GNNs benefit from the LLM's zero-shot inference and reasoning ability to refine their predictions iteratively.

VI. CONCLUSION

In this paper, we extend the task of node classification to a more challenging and realistic case where only few labeled
data are available. To tackle this challenge, we propose a novel few-shot node classification model that leverages the zero-shot and reasoning ability of Large Language Models. We treat LLMs as a teacher to teach GNNs from two different perspectives including the logits from the distribution side and enhanced rationales from the feature side. Moreover, we propose a Graph-LLM-based active learning method to further improve the generalizability of GNNs with few available data by actively selecting and distilling knowledge from LLMs. To assess the effectiveness of our model, extensive experiments have been conducted on three citation networks. The evaluation results demonstrate that our model achieves state-of-the-art performance and that LLMs can effectively provide insights to GNNs from different perspectives, reaffirming the model's effectiveness in node classification, its superiority over baseline models, and its practical significance in addressing the challenges of few-shot node classification.

REFERENCES

[1] W. Chen, Y. Gu, Z. Ren, X. He, H. Xie, T. Guo, D. Yin, and Y. Zhang, “Semi-supervised user profiling with heterogeneous graph attention networks,” in IJCAI, vol. 19, 2019, pp. 2116–2122.
[2] A. Mohamed, K. Qian, M. Elhoseiny, and C. Claudel, “Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction,” in CVPR, 2020, pp. 14 424–14 432.
[3] M. Réau, N. Renaud, L. C. Xue, and A. M. Bonvin, “Deeprank-gnn: a graph neural network framework to learn patterns in protein–protein interfaces,” Bioinformatics, vol. 39, no. 1, p. btac759, 2023.
[4] R. Li, H. Chen, F. Feng, Z. Ma, X. Wang, and E. Hovy, “Dual graph convolutional networks for aspect-based sentiment analysis,” in ACL-IJCNLP, 2021.
[5] Q. Li, X. Li, L. Chen, and D. Wu, “Distilling knowledge on text graph for social media attribute inference,” in SIGIR, 2022, pp. 2024–2028.
[6] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[7] S. Abu-El-Haija, B. Perozzi, A. Kapoor, N. Alipourfard, K. Lerman, H. Harutyunyan, G. Ver Steeg, and A. Galstyan, “Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing,” in ICML. PMLR, 2019, pp. 21–29.
[8] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” NeurIPS, vol. 30, 2017.
[9] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, “A comprehensive survey on graph neural networks,” IEEE transactions on neural networks and learning systems, vol. 32, no. 1, pp. 4–24, 2020.
[10] J. Wei, C. Huang, S. Vosoughi, Y. Cheng, and S. Xu, “Few-shot text classification with triplet networks, data augmentation, and curriculum learning,” arXiv preprint arXiv:2103.07552, 2021.
[11] X. Han, H. Zhu, P. Yu, Z. Wang, Y. Yao, Z. Liu, and M. Sun, “Fewrel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation,” arXiv preprint arXiv:1810.10147, 2018.
[12] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in ICML, 2017, pp. 1126–1135.
[13] K. Ding, Q. Zhou, H. Tong, and H. Liu, “Few-shot network anomaly detection via cross-network meta-learning,” in WWW, 2021.
[14] K. Ding, J. Wang, J. Caverlee, and H. Liu, “Meta propagation networks for graph few-shot semi-supervised learning,” in AAAI, 2022.
[15] S. Wan, Y. Zhan, L. Liu, B. Yu, S. Pan, and C. Gong, “Contrastive graph poisson networks: Semi-supervised learning with extremely limited labels,” NeurIPS, vol. 34, pp. 6316–6327, 2021.
[16] Q. Zhu, C. Yang, Y. Xu, H. Wang, C. Zhang, and J. Han, “Transfer learning of graph neural networks with ego-graph information maximization,” NeurIPS, vol. 34, pp. 1766–1779, 2021.
[17] L. Chen, X. Li, and D. Wu, “Adversarially reprogramming pretrained neural networks for data-limited and cost-efficient malware detection,” in SDM. SIAM, 2022, pp. 693–701.
[18] H. Yao, Y. Wei, L.-K. Huang, D. Xue, J. Huang, and Z. J. Li, “Functionally regionalized knowledge transfer for low-resource drug discovery,” NeurIPS, vol. 34, 2021.
[19] Z. Chen, H. Mao, H. Wen, H. Han, W. Jin, H. Zhang, H. Liu, and J. Tang, “Label-free node classification on graphs with large language models (llms),” arXiv preprint arXiv:2310.04668, 2023.
[20] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large language models are zero-shot reasoners,” NeurIPS, 2022.
[21] H. Liu, J. Feng, L. Kong, N. Liang, D. Tao, Y. Chen, and M. Zhang, “One for all: Towards training one graph model for all classification tasks,” arXiv preprint arXiv:2310.00149, 2023.
[22] X. He, X. Bresson, T. Laurent, and B. Hooi, “Explanations as features: Llm-based features for text-attributed graphs,” arXiv preprint arXiv:2305.19523, 2023.
[23] Z. Guo, L. Xia, Y. Yu, Y. Wang, Z. Yang, W. Wei, L. Pang, T.-S. Chua, and C. Huang, “Graphedit: Large language models for graph structure learning,” arXiv preprint arXiv:2402.15183, 2024.
[24] J. Jiang, K. Zhou, Z. Dong, K. Ye, W. X. Zhao, and J.-R. Wen, “Structgpt: A general framework for large language model to reason over structured data,” arXiv preprint arXiv:2305.09645, 2023.
[25] J. Yu, Y. Ren, C. Gong, J. Tan, X. Li, and X. Zhang, “Empower text-attributed graphs learning with large language models (llms),” arXiv preprint arXiv:2310.09872, 2023.
[26] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[27] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in AAAI, vol. 32, no. 1, 2018.
[28] F. Hao, F. He, J. Cheng, L. Wang, J. Cao, and D. Tao, “Collect and select: Semantic alignment metric learning for few-shot learning,” in ICCV, 2019, pp. 8460–8469.
[29] W. Jiang, K. Huang, J. Geng, and X. Deng, “Multi-scale metric learning for few-shot learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 3, pp. 1091–1102, 2020.
[30] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” NeurIPS, vol. 30, 2017.
[31] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, “Learning to compare: Relation network for few-shot learning,” in CVPR, 2018, pp. 1199–1208.
[32] F. Zhou, C. Cao, K. Zhang, G. Trajcevski, T. Zhong, and J. Geng, “Meta-gnn: On few-shot node classification in graph meta-learning,” in CIKM, 2019, pp. 2357–2360.
[33] K. Huang and M. Zitnik, “Graph meta learning via local subgraphs,” NeurIPS, vol. 33, pp. 5862–5874, 2020.
[34] S. Sun, Y. Ren, C. Ma, and X. Zhang, “Large language models as topological structure enhancers for text-attributed graphs,” arXiv preprint arXiv:2311.14324, 2023.
[35] Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, and A. Anandkumar, “Deep active learning for named entity recognition,” arXiv preprint arXiv:1707.05928, 2017.
[36] P. Bachman, A. Sordoni, and A. Trischler, “Learning algorithms for active learning,” in ICML. PMLR, 2017, pp. 301–310.
[37] H. Cai, V. W. Zheng, and K. C.-C. Chang, “Active learning for graph embedding,” arXiv preprint arXiv:1705.05085, 2017.
[38] L. Gao, H. Yang, C. Zhou, J. Wu, S. Pan, and Y. Hu, “Active discriminative network representation learning,” in IJCAI, 2018.
[39] Y. Wu, Y. Xu, A. Singh, Y. Yang, and A. Dubrawski, “Active learning for graph neural networks via node feature propagation,” arXiv preprint arXiv:1910.07567, 2019.
[40] S. Hu, Z. Xiong, M. Qu, X. Yuan, M.-A. Côté, Z. Liu, and J. Tang, “Graph policy network for transferable active learning on graphs,” NeurIPS, vol. 33, pp. 10 174–10 185, 2020.
[41] W. Feng, J. Zhang, Y. Dong, Y. Han, H. Luan, Q. Xu, Q. Yang, E. Kharlamov, and J. Tang, “Graph random neural networks for semi-supervised learning on graphs,” NeurIPS, 2020.
[42] N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” arXiv preprint arXiv:1908.10084, 2019.
[43] G. Hinton, O. Vinyals, J. Dean et al., “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, vol. 2, no. 7, 2015.
[44] F. Wang, T. Zhao, and S. Wang, “Distribution consistency based self-training for graph neural networks with sparse labels,” in WSDM, 2024.
[45] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view representation learning on graphs,” in ICML. PMLR, 2020, pp. 4116–4126.
