(KDD 2023) All in One - Multi-Task Prompting For Graph Neural Networks
Figure 1: Fine-tuning, Pre-training, and Prompting.
Figure 2: Our graph prompt inspired by the language prompt.
2 BACKGROUND
[Figure 2 panels: (a) NLP tasks: Question Answering, Sentiment Classification, Masked Prediction; (b) graph tasks: Graph-level, Subgraph-level, Edge-level, and Node-level Operations.]

Graph Neural Networks. Graph neural networks (GNNs) have demonstrated powerful expressiveness in many graph-based applications [10, 12, 15, 29]. The nature of most GNNs is to capture the underlying message-passing patterns for graph representation. To this end, many effective neural network architectures have been proposed, such as the graph attention network (GAT) [32], the graph convolutional network (GCN) [34], and the Graph Transformer [25]. Recent works also consider how to make graph learning more adaptive when data …
… encoders. When we treat the target node's label as this induced graph's label, we can easily translate the node classification problem into graph classification. Similarly, we present an induced graph for a pair of nodes in Figure 4b. Here, the pair of nodes can be treated as a positive edge if there is an edge connecting them, or a negative edge if not. This subgraph can be easily built by extending this node pair to their 𝜏-distance neighbors. We can then reformulate the edge-level task by assigning the graph label as the edge label of the target node pair. Note that for unweighted graphs, the 𝜏 distance is equal to the 𝜏-hop length; for weighted graphs, the 𝜏 distance refers to the shortest-path distance, and the induced graph can be easily found by many efficient algorithms [1, 39].
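As an illustration of how such an induced graph can be built, here is a minimal NetworkX sketch. The function name and the use of ego_graph are our own choices, and it assumes the unweighted case where the 𝜏 distance equals 𝜏 hops.

```python
import networkx as nx

def induced_graph_for_pair(G: nx.Graph, u, v, tau: int = 2) -> nx.Graph:
    """Collect all nodes within tau hops of either endpoint and return the
    subgraph they induce (unweighted case: tau-distance = tau hops)."""
    nodes = set()
    for seed in (u, v):
        nodes |= set(nx.ego_graph(G, seed, radius=tau).nodes)
    return G.subgraph(nodes).copy()

# The induced graph is labeled positive iff the original graph has the edge:
# y = int(G.has_edge(u, v))
```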
… where P denotes the set of prompt tokens and |P| is the number of tokens. Each token 𝑝𝑖 ∈ P can be represented by a token vector p𝑖 ∈ R^{1×𝑑} with the same size as the node features in the input graph. Note that in practice we usually have |P| ≪ 𝑁 and |P| ≪ 𝑑ℎ, where 𝑑ℎ is the size of the hidden layer in the pre-trained graph model. With these token vectors, the input graph can be reformulated by adding the 𝑗-th token to graph node 𝑣𝑖 (e.g., x̂𝑖 = x𝑖 + p𝑗). Then we replace the input features with the prompted features and send them to the pre-trained model for further processing.
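To make the token-insertion step concrete, below is a minimal PyTorch sketch under our own assumptions: the class and variable names are ours, and the softmax-weighted combination is just one possible inserting pattern (the simplest case being x̂𝑖 = x𝑖 + p𝑗, as stated above).

```python
import torch
import torch.nn as nn

class PromptTokens(nn.Module):
    """|P| learnable prompt tokens, each with the same dimension d as the
    input node features (a sketch, not the paper's implementation)."""
    def __init__(self, num_tokens: int, feat_dim: int):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, feat_dim) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add a weighted combination of tokens to every node feature:
        # x_hat_i = x_i + sum_j w_ij * p_j
        w = torch.softmax(x @ self.tokens.t(), dim=1)   # (N, |P|)
        return x + w @ self.tokens                      # (N, d)

# x_prompted = PromptTokens(10, x.shape[1])(x)  # then feed to the frozen GNN
```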
3.3.3 Token Structures. S = {(𝑝𝑖, 𝑝𝑗) | 𝑝𝑖, 𝑝𝑗 ∈ P} is the token structure, denoted by pair-wise relations among tokens. Unlike the NLP prompt, the token structure in the prompt graph is usually implicit. To solve this problem, we propose three methods to design the prompt token structures: (1) the first way is to learn tunable parameters

A = ⋃_{𝑖=1}^{|P|−1} ⋃_{𝑗=𝑖+1}^{|P|} {𝑎𝑖𝑗}

…
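A hedged sketch of option (1): each unordered token pair (𝑖, 𝑗) gets a tunable score 𝑎𝑖𝑗, and an edge is kept in the prompt graph when its sigmoid exceeds a threshold. The class name and the thresholding rule are our assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TokenStructure(nn.Module):
    """One tunable score a_ij per unordered token pair (i < j)."""
    def __init__(self, num_tokens: int, threshold: float = 0.5):
        super().__init__()
        self.num_tokens = num_tokens
        self.threshold = threshold
        n_pairs = num_tokens * (num_tokens - 1) // 2
        self.a = nn.Parameter(torch.zeros(n_pairs))

    def edge_index(self) -> torch.Tensor:
        # Upper-triangular pair indices (i, j) with i < j.
        i, j = torch.triu_indices(self.num_tokens, self.num_tokens, offset=1)
        keep = torch.sigmoid(self.a) > self.threshold
        return torch.stack([i[keep], j[keep]])  # (2, num_kept_edges)
```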
… the inserting pattern defined in section 3.3; G is the original graph and G𝑝 is the prompt graph. Then we can learn an optimal prompt graph G𝑝∗ to extend Equation (5) as follows:

𝜑∗(𝜓(G, G𝑝∗)) = 𝜑∗(g(A, X)) + 𝑂𝑝𝜑∗    (6)

By efficient tuning, the new error bound 𝑂𝑝𝜑∗ can be further reduced. In section 4.6, we empirically demonstrate that 𝑂𝑝𝜑∗ can be significantly smaller than 𝑂𝑝𝜑 via efficient training. That means our method supports more flexible transformations on graphs to match various pre-training strategies.

3.5.3 Efficiency. Assume an input graph has 𝑁 nodes and 𝑀 edges, and the prompt graph has 𝑛 tokens with 𝑚 edges. Let the graph model contain 𝐿 layers, and let the maximum dimension of one layer be 𝑑. The parameter complexity of the prompt graph is only 𝑂(𝑛𝑑). In contrast, some typical graph models (e.g., GAT [32]) usually contain 𝑂(𝐿𝐾𝑑² + 𝐿𝐾𝑑) parameters to generate node embeddings and an additional 𝑂(𝐾𝑑) parameters to obtain the whole-graph representation (𝐾 is the number of attention heads). The parameter count may be even larger in other graph neural networks (e.g., the Graph Transformer [37]). In our prompt learning framework, we only need to tune the prompt while keeping the pre-trained graph model frozen, making the training process converge faster than traditional transfer tuning.
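To put these counts in perspective, here is a back-of-the-envelope comparison that follows the complexity formulas above; the concrete values (L = 2, K = 4, d = 100, n = 10) are illustrative choices of ours, not reported measurements.

```python
def prompt_params(n_tokens: int, d: int) -> int:
    # O(n d): one d-dimensional vector per prompt token.
    return n_tokens * d

def gat_like_params(L: int, K: int, d: int) -> int:
    # O(L K d^2 + L K d) for the encoder plus O(K d) for the readout,
    # mirroring the counts quoted in the text (constants omitted).
    return L * K * d * d + L * K * d + K * d

print(prompt_params(10, 100))      # 1,000 tunable parameters
print(gat_like_params(2, 4, 100))  # 81,200 under the same rough count
```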
For the time complexity, a typical graph model (e.g., GCN [34]) usually needs 𝑂(𝐿𝑁𝑑² + 𝐿𝑀𝑑 + 𝑁𝑑) time to generate node embeddings via message passing and then obtain the whole-graph representation (e.g., 𝑂(𝑁𝑑) for summation pooling). By inserting the prompt into the original graph, the total time is 𝑂(𝐿(𝑛+𝑁)𝑑² + 𝐿(𝑚+𝑀)𝑑 + (𝑛+𝑁)𝑑). Compared with the original time, the additional time cost is only 𝑂(𝐿𝑛𝑑² + 𝐿𝑚𝑑 + 𝑛𝑑), where 𝑛 ≪ 𝑑, 𝑛 ≪ 𝑁, and 𝑚 ≪ 𝑀.

Besides the efficient parameter and time costs, our method is also memory friendly. Taking node classification as an example, the memory cost of a graph model largely consists of parameters, graph features, and graph structure information. As previously discussed, our method only needs to cache the prompt parameters, which are far fewer than those of the original graph model. For the graph features and structures, traditional methods usually need to feed the whole graph into the model, which requires a large amount of memory to cache these contents. In contrast, we only need to feed an induced graph to the model for each node, the size of which is usually far smaller than the original graph. Notice that in many real-world applications we are often interested in only a small fraction of the nodes, which means our method can stop early once there are no more nodes to predict, without propagating messages over the whole graph. This is particularly helpful for large-scale data.
3.5.4 Compatibility. Unlike GPPT, which can only use binary edge prediction as a pretext and is only applicable to node classification as the downstream task, our framework supports node-level, edge-level, and graph-level downstream tasks and can adopt various graph-level pretexts with only a few steps of tuning. Besides, when transferring the model to different tasks, traditional approaches usually need to additionally tune a task head. In contrast, our method focuses on manipulating the input data and relies less on the downstream task, which means we have a larger tolerance for the task head. For example, in section 4.3 we study the transferability from other domains or tasks but only adapt our prompt, leaving the source task head unchanged. We can even select a specific pretext and customize the details of our prompt without any tuned task head. Here we present a case that does not need to tune a task head, and we evaluate its feasibility in section 4.4.

Prompt without Task Head Tuning:
Pretext: GraphCL [36], a graph contrastive learning task that tries to maximize the agreement between a pair of views from the same graph.
Downstream Tasks: node/edge/graph classification.
Prompt Answer: node classification. Assume there are 𝑘 categories for the nodes. We design the prompt graph with 𝑘 sub-graphs (a.k.a. sub-prompts), where each sub-graph has 𝑛 tokens and corresponds to one node category. Then we can generate 𝑘 graph views for each input graph. We classify the target node with label ℓ (ℓ = 1, 2, · · · , 𝑘) if the ℓ-th graph view is closest to the induced graph. Edge and graph classification are handled similarly.
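The "closest graph view" rule above can be read as a nearest-neighbour decision in the embedding space of the frozen pre-trained model. A minimal sketch, with cosine similarity as our assumed notion of "closest" (the paper does not specify the distance):

```python
import torch
import torch.nn.functional as F

def classify_without_task_head(induced_emb: torch.Tensor,
                               view_embs: torch.Tensor) -> int:
    """induced_emb: (d,) embedding of the target node's induced graph.
    view_embs: (k, d) embeddings of the k prompted graph views, one per class.
    Returns the index of the view closest to the induced graph."""
    sims = F.cosine_similarity(view_embs, induced_emb.unsqueeze(0), dim=1)  # (k,)
    return int(torch.argmax(sims))
```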
Interestingly, by shrinking the prompt graph to isolated tokens aligned with node classes and replacing the induced graphs with the whole graph, our prompt format degenerates to GPPT, which means we can also leverage edge-level pretexts for node classification. Since this format is exactly the same as GPPT, we do not discuss it further. Instead, we directly compare our method with GPPT on node classification.

4 EVALUATION
In this section, we extensively evaluate our method against other approaches on node-level, edge-level, and graph-level tasks. In particular, we wish to answer the following research questions: Q1: How effective is our method under the few-shot learning setting for multiple graph tasks? Q2: How adaptable is our method when transferred to other domains or tasks? Q3: How do the main components of our method impact the performance? Q4: How efficient is our model compared with traditional approaches? Q5: How powerful is our method when we manipulate graphs?

4.1 Experimental Settings
4.1.1 Datasets. We compare our method with other approaches on five public datasets: Cora [34], CiteSeer [34], Reddit [8], Amazon [23], and Pubmed [34]. Detailed statistics are presented in Table 1, where the last column refers to the number of node classes. To conduct edge-level and graph-level tasks, we sample edges and subgraphs from the original data, where the label of an edge is decided by its two endpoints and the subgraph label follows the majority class of the subgraph nodes. For example, if nodes have 3 different classes, say 𝑐1, 𝑐2, 𝑐3, then the edge-level tasks contain at least 6 categories (𝑐1, 𝑐2, 𝑐3, 𝑐1𝑐2, 𝑐1𝑐3, 𝑐2𝑐3). We also evaluate additional graph classification and link prediction on more specialized datasets, where the graph label and the link label are inborn and not related to any node (see Appendix A).

Table 1: Statistics of datasets.
Dataset    #Nodes     #Edges        #Features   #Labels
Cora       2,708      5,429         1,433       7
CiteSeer   3,327      9,104         3,703       6
Reddit     232,965    23,213,838    602         41
Amazon     13,752     491,722       767         10
Pubmed     19,717     88,648        500         3
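For illustration, a small sketch of how edge and subgraph labels can be derived from node classes as described above (the helper names are ours; the exact sampling procedure is not shown here):

```python
from collections import Counter

def edge_label(y_u: int, y_v: int) -> tuple:
    # Unordered pair of endpoint classes; with 3 node classes this yields
    # at least 6 edge categories, as in the example above.
    return tuple(sorted((y_u, y_v)))

def subgraph_label(node_labels) -> int:
    # Majority class among the nodes of the sampled subgraph.
    return Counter(node_labels).most_common(1)[0][0]
```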
4.1.2 Approaches. Compared approaches come from three categories. (1) Supervised methods: these methods directly train a GNN model on a specific task and then directly infer the result. We take three well-known GNN models: GAT [32], GCN [34], and Graph Transformer [25] (GT for short). These GNN models are also included as the backbones of our prompt method. (2) Pre-training with fine-tuning: these methods first pre-train a GNN model in a self-supervised way, such as GraphCL [36] and SimGRACE [35]; the pre-trained model is then fine-tuned for a new downstream task. (3) Prompt methods: with a pre-trained model frozen and a learnable prompt graph, our prompt method aims to change the input graph and reformulate the downstream task to fit the pre-training strategies.

4.1.3 Implementations. We set the number of graph neural layers to 2 with a hidden dimension of 100. To study the transferability across different graph data, we use SVD (Singular Value Decomposition) to reduce the initial features to 100 dimensions. The token number of our prompt graph is set to 10. We also discuss the impact of the token number in section 4.4, where we change it from 1 to 20. We use the Adam optimizer for all approaches. The learning rate is set to 0.001 for most datasets. In the meta-learning stage, we randomly split all the node-level, edge-level, and graph-level tasks 1:1 into meta-training and meta-testing. Reported results are averaged over all tested tasks. More implementation details are shown in Appendix A, in which we also analyze the performance on more datasets and more kinds of tasks such as regression, link prediction, and so on.
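The SVD-based feature reduction can be done, for example, with scikit-learn's TruncatedSVD. A minimal sketch under our assumptions (per-dataset fitting, 100 components):

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def unify_features(X: np.ndarray, dim: int = 100) -> np.ndarray:
    """Reduce raw node features to a shared `dim`-dimensional space so that
    models can be transferred across datasets with different feature sizes."""
    svd = TruncatedSVD(n_components=dim, random_state=0)
    return svd.fit_transform(X)  # shape: (num_nodes, dim)
```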
4.2 Multi-Task Performance with Few-shot Learning Settings (RQ1)
We compared our prompt-based method with other mainstream training schemes on node-level, edge-level, and graph-level tasks under the few-shot setting. We repeat the evaluation 5 times and report the average results in Table 2, Table 12 (Appendix A), and Table 13 (Appendix A). From the results, we can observe that most supervised methods find it very hard to achieve better performance than pre-training and prompt methods. This is because the empirical annotations required by supervised frameworks are very limited in the few-shot setting, leading to poor performance. In contrast, pre-training approaches contain more prior knowledge, making the graph model rely less on data labels. However, to achieve better results on a specific task, we usually need to carefully select an appropriate pre-training approach and carefully tune the model to match the target task, and this huge effort is not guaranteed to carry over to other tasks. The gap between pre-training strategies and downstream tasks is still very large, making it very hard for the graph model to transfer knowledge in multi-task settings (we further discuss transferability in section 4.3). Compared with pre-training approaches, our solution further improves the compatibility of graph models. The reported improvements range from 1.10% to 8.81% on node-level tasks, 1.28% to 12.26% on edge-level tasks, and 0.14% to 10.77% on graph-level tasks. In particular, we also compare our node-level performance with the previously mentioned node-level prompt model GPPT in Table 2. Kindly note that our experimental settings are totally different from GPPT's. GPPT studies the few-shot problem by masking 30% or 50% of the data labels. In our paper, however, we propose a more challenging problem: how does the model perform if we further reduce the labeled data? So in our experiments, each class only has 100 labeled samples. This setting makes our labeled ratio approximately only 25% on Cora, 18% on CiteSeer, 1.7% on Reddit, 7.3% on Amazon, and 1.5% on Pubmed, which is far less than the 50% labeled ratio reported by GPPT.

4.3 Transferability Analysis (RQ2)
To evaluate the transferability, we compared our method with the hard transfer method and the fine-tuning method. Here the hard transfer method means we take the source task model that has the same task head as the target task and directly conduct model inference on the new task. The fine-tuning method means we load the source task model and then tune the task head for the new task. We evaluate the transferability from two perspectives: (1) how effectively is the model transferred to different tasks within the same domain? and (2) how effectively is the model transferred to different domains?

4.3.1 Transferability to Different Level Tasks. Here we pre-train the graph neural network on Amazon, then tune the model on two source tasks (graph level and node level), and further evaluate the performance on the target task (edge level). For simplicity, both the source tasks and the target task are built as binary classifications with 1:1 positive and negative samples (we randomly select a class as the positive label and sample negatives from the rest). We report the results in Table 3, from which we have two observations. First, our prompt method significantly outperforms the other approaches and the prediction results make sense. In contrast, the problem of the hard transfer method is that the source model sometimes cannot decide well on the target tasks because the target classes may be far away from the source classes. This may even cause negative transfer results (results that are lower than a random guess). In most cases, the fine-tuning method can output meaningful results with a few steps of tuning, but it can still encounter the negative transfer problem. Second, the graph-level task has better adaptability than the node-level task for the edge-level target, which is in line with our previous intuition presented in Figure 3 (section 3.2).

4.3.2 Transferability to Different Domains. We also tune the model on Amazon and PubMed as source domains, then load the model states from these source domains and report the performance on the target domain (Cora). Since different datasets have various input feature dimensions, we use SVD to unify the input features from all domains to 100 dimensions. Results are shown in Table 4, from which we can find that the good transferability of our prompt also holds when we deal with different domains.

4.4 Ablation Study (RQ3)
In this section, we compare our complete framework with four variants: "w/o meta" is the prompt method without the meta-learning
step; "w/o h" is our method without task head tuning, which was previously introduced in section 3.5.4; "w/o token structure" is the prompt where all the tokens are treated as isolated, without any inner connection; and "w/o inserting" is the prompt without …

Table 2: Node-level performance (%) with 100-shot setting. IMP (%): the average improvement of prompt over the rest.

Table 3: Transferability (%) on Amazon from different-level task spaces. Source tasks: graph-level tasks and node-level tasks. Target task: edge-level tasks.

[Figure 6: performance (y-axis, roughly 80 to 100%) vs. the number of prompt tokens (x-axis, 1 to 20) for the full prompt and the "w/o meta", "w/o h", "w/o token structure", and "w/o inserting" variants.]

4.5 Efficiency Analysis (RQ4)
Figure 6 presents the impact of an increasing token number on the model performance, from which we can find that most tasks can reach satisfactory performance with very limited tokens, making
the complexity of the prompt graph very small. The limited token numbers make our tunable parameter space far smaller than that of traditional methods, as can be seen in Table 5. This means our method can be efficiently trained with a few steps of tuning. As shown in Figure 7, the prompt-based method converges faster than traditional pre-train and supervised methods, which further suggests the efficiency advantages of our method.

Table 5: Tunable parameters comparison. RED (%): average reduction of the prompt method relative to the others.
Methods   Cora     CiteSeer   Reddit   Amazon   Pubmed   RED (%)
GAT       ∼155K    ∼382K      ∼75K     ∼88K     ∼61K     95.4↓
GCN       ∼154K    ∼381K      ∼75K     ∼88K     ∼61K     95.4↓
GT        ∼615K    ∼1.52M     ∼286K    ∼349K    ∼241K    98.8↓
prompt    ∼7K      ∼19K       ∼3K      ∼4K      ∼3K      –

Figure 7: Training losses with epochs (loss vs. epoch for supervised, pre-train, and prompt). Mean values and 65% confidence intervals over 5 repeats with different seeds.

Table 6: Error bound discussed in section 3.5.2. RED (%): average reduction of each method relative to the original error.
Naive Prompt (Equation 5)              1    0.8710   0.5241   2.0835   66.70↓
Our Prompt Graph (with token,          3    0.0875   0.2337   0.6542   90.66↓
structure, and inserting patterns)     5    0.0685   0.1513   0.4372   93.71↓
                                       10   0.0859   0.1144   0.2600   95.59↓
REFERENCES
[1] Takuya Akiba, Takanori Hayashi, Nozomi Nori, Yoichi Iwata, and Yuichi Yoshida. 2015. Efficient top-k shortest-path distance queries on large networks by pruned landmark labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
[2] Yunsheng Bai, Hao Ding, Yang Qiao, Agustin Marinovic, Ken Gu, Ting Chen, Yizhou Sun, and Wei Wang. 2019. Unsupervised inductive graph-level representation learning via graph-graph proximity. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 1988–1994.
[3] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[4] Hongxu Chen, Hongzhi Yin, Xiangguo Sun, Tong Chen, Bogdan Gabrys, and Katarzyna Musial. 2020. Multi-level graph convolutional networks for cross-platform anchor link prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1503–1511.
[5] Junru Chen, Yang Yang, Tao Yu, Yingying Fan, Xiaolong Mo, and Carl Yang. 2022. BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2741–2751.
[6] Taoran Fang, Yunchao Zhang, Yang Yang, and Chunping Wang. 2022. Prompt Tuning for Graph Neural Networks. arXiv preprint arXiv:2209.15240 (2022).
[7] Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics. 3816–3830.
[8] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems 30 (2017).
[9] Bowen Hao, Hongzhi Yin, Jing Zhang, Cuiping Li, and Hong Chen. 2022. A Multi-Strategy based Pre-Training Method for Cold-Start Recommendation. ACM Transactions on Information Systems (2022).
[10] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang. 2022. GraphMAE: Self-Supervised Masked Graph Autoencoders. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 594–604.
[11] W Hu, B Liu, J Gomes, M Zitnik, P Liang, V Pande, and J Leskovec. 2020. Strategies For Pre-training Graph Neural Networks. In International Conference on Learning Representations (ICLR).
[12] Cheng Jiashun, Li Man, Li Jia, and Fugee Tsung. 2023. Wiener Graph Deconvolutional Network Improves Graph Self-Supervised Learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
[13] Wei Jin, Tyler Derr, Haochen Liu, Yiqi Wang, Suhang Wang, Zitao Liu, and Jiliang Tang. 2020. Self-supervised learning on graphs: Deep insights and new direction. arXiv preprint arXiv:2006.10141 (2020).
[14] Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 3045–3059.
[15] Jia Li, Zhichao Han, Hong Cheng, Jiao Su, Pengyun Wang, Jianfeng Zhang, and Lujia Pan. 2019. Predicting path failure in time-evolving graphs. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1279–1289.
[16] Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics. 4582–4597.
[17] Yan Ling, Jianfei Yu, and Rui Xia. 2022. Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2149–2159.
[18] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 (2021).
[19] Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2022. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 61–68.
[20] Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heinz, and Dan Roth. 2021. Recent advances in natural language processing via large pre-trained language models: A survey. arXiv preprint arXiv:2111.01243 (2021).
[21] Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, et al. 2021. Exploring low-dimensional intrinsic task subspace via prompt tuning. arXiv preprint arXiv:2110.07867 (2021).
[22] Anna Rogers, Olga Kovaleva, Matthew Downey, and Anna Rumshisky. 2020. Getting closer to AI complete question answering: A set of prerequisite real tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8722–8731.
[23] Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. 2018. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868 (2018).
[24] Zheyan Shen, Jiashuo Liu, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2021. Towards out-of-distribution generalization: A survey. arXiv preprint arXiv:2108.13624 (2021).
[25] Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. 2020. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509 (2020).
[26] Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. 2020. AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. In Empirical Methods in Natural Language Processing (EMNLP).
[27] Mingchen Sun, Kaixiong Zhou, Xin He, Ying Wang, and Xin Wang. 2022. GPPT: Graph pre-training and prompt tuning to generalize graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1717–1727.
[28] Xiangguo Sun, Hong Cheng, Bo Liu, Jia Li, Hongyang Chen, Guandong Xu, and Hongzhi Yin. 2023. Self-supervised Hypergraph Representation Learning for Sociological Analysis. IEEE Transactions on Knowledge and Data Engineering (2023).
[29] Xiangguo Sun, Hongzhi Yin, Bo Liu, Hongxu Chen, Qing Meng, Wang Han, and Jiuxin Cao. 2021. Multi-level hyperedge distillation for social linking prediction on sparsely observed networks. In Proceedings of the Web Conference 2021. 2934–2945.
[30] Xiangguo Sun, Hongzhi Yin, Bo Liu, Qing Meng, Jiuxin Cao, Alexander Zhou, and Hongxu Chen. 2022. Structure Learning Via Meta-Hyperedge for Dynamic Rumor Detection. IEEE Transactions on Knowledge and Data Engineering (2022).
[31] Jianheng Tang, Jiajin Li, Ziqi Gao, and Jia Li. 2022. Rethinking Graph Neural Networks for Anomaly Detection. In International Conference on Machine Learning.
[32] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In International Conference on Learning Representations.
[33] Liyuan Wang, Mingtian Zhang, Zhongfan Jia, Qian Li, Chenglong Bao, Kaisheng Ma, Jun Zhu, and Yi Zhong. 2021. AFEC: Active forgetting of negative transfer in continual learning. Advances in Neural Information Processing Systems 34 (2021), 22379–22391.
[34] Max Welling and Thomas N Kipf. 2016. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR 2017).
[35] Jun Xia, Lirong Wu, Jintao Chen, Bozhen Hu, and Stan Z Li. 2022. SimGRACE: A Simple Framework for Graph Contrastive Learning without Data Augmentation. In Proceedings of the ACM Web Conference 2022. 1070–1079.
[36] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen. 2020. Graph contrastive learning with augmentations. Advances in Neural Information Processing Systems 33 (2020), 5812–5823.
[37] Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. 2019. Graph transformer networks. Advances in Neural Information Processing Systems 32 (2019).
[38] Zexuan Zhong, Dan Friedman, and Danqi Chen. 2021. Factual Probing Is [MASK]: Learning vs. Learning to Recall. In NAACL-HLT. 5017–5033.
[39] Andy Diwen Zhu, Xiaokui Xiao, Sibo Wang, and Wenqing Lin. 2013. Efficient single-source shortest path and distance queries on large graphs. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 998–1006.
[40] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021. 2069–2080.
A APPENDIX
In this section, we supplement more experiments to further evaluate the effectiveness of our framework. The source code is publicly available at https://anonymous.4open.science/r/mpg

Additional Datasets. Besides the datasets mentioned in the main experiments of our paper, we supplement more datasets in Table 7 to further evaluate the effectiveness of our framework. Specifically, ENZYMES and ProteinsFull are two molecule/protein datasets that are used in our additional graph-level classification tasks. Movielens and QM9 are used to evaluate the performance of our method on edge-level and graph-level regression, respectively. In particular, Movielens contains users' rating scores for movies, where each edge has a score value ranging from 0 to 5. QM9 is a molecule graph dataset where each graph has 19 regression targets, which are treated as graph-level multi-output regression. The PersonalityCafe and Facebook datasets are used to test the performance of link prediction; both are social networks where edges denote following/quoting relations.

Table 7: Statistics of Additional Datasets
Dataset          #Nodes      #Edges      #Features   #Labels   #Graphs
ENZYMES          19,580      74,564      21          6         600
ProteinsFull     43,471      162,088     32          2         1,113
Movielens        10,352      100,836     100         -         1
QM9              2,333,625   4,823,498   16          -         129,433
PersonalityCafe  100,340     3,788,032   100         0         1
Facebook         4,039       88,234      1,283       0         1

Multi-label vs. Multi-class Classification. In the main experiments, we treat the classification task as a multi-label problem. Here we present the experimental results under a multi-class setting. As reported in Table 8, our prompt-based method still outperforms the rest of the methods.

Table 8: Multi-class node classification (100-shots)
Methods                    Cora                      CiteSeer
                           Acc (%)   Macro F1 (%)    Acc (%)   Macro F1 (%)
Supervised                 74.11     73.26           77.33     77.64
Pre-train and Fine-tune    77.97     77.63           79.67     79.83
Prompt                     80.12     79.75           80.50     80.65
Prompt w/o h               78.55     78.18           80.00     80.05
Reported Acc of GPPT
(Label Ratio 50%)          77.16     -               65.81     -

Additional Graph-level Classification. Here, we evaluate the graph-level classification performance where the graph label is not impacted by the nodes' attributes. As shown in Table 9, our method is more effective in multi-class graph classification, especially in the few-shot setting.

Table 9: Additional graph-level classification.
Methods                ProteinsFull (100 shots)        ENZYMES (50 shots)
                       Acc (%)   Macro F1 (%)          Acc (%)   Macro F1 (%)
Supervised             66.64     65.03                 31.33     30.25
Pre-train + Fine-tune  66.50     66.43                 34.67     33.94
Prompt                 70.50     70.17                 35.00     34.92
Prompt w/o h           68.50     68.50                 36.67     34.05

Edge/Graph-level Regression. Beyond classification tasks, our method can also help improve graph models on regression tasks. Here, we evaluate the regression performance on a graph-level (QM9) and an edge-level (MovieLens) dataset using MAE (mean absolute error) and MSE (mean squared error). We only feed 100-shot edge-induced graphs to the model; the results are shown in Table 10, from which we can observe that our prompt-based methods outperform traditional approaches.

Table 10: Graph/edge-level regression with few-shot settings.
Tasks                  Graph Regression            Edge Regression
Datasets               QM9 (100 shots)             MovieLens (100 shots)
Methods                MAE       MSE               MAE       MSE
Supervised             0.3006    0.1327            0.2285    0.0895
Pre-train + Fine-tune  0.1539    0.0351            0.2171    0.0774
Prompt                 0.1384    0.0295            0.1949    0.0620
Prompt w/o h           0.1424    0.0341            0.2120    0.0744

Link Prediction. Beyond edge classification, link prediction is also a widely studied problem in the graph learning area. Here, the edges are split into three parts: (1) 80% of the edges are used for message passing only; (2) 10% of the remaining edges form the supervised training set; and (3) the rest form the testing set. For each edge in the training set and the testing set, we treat these edges as positive samples and sample non-adjacent nodes as negative samples. We generate the edge-induced graph for these node pairs according to the first-part edges. The graph label is assigned as positive if the node pair has a positive edge, and vice versa. To further evaluate our method's potential in the extremely limited setting, we only sample 100 positive edges from the training set to train our model. In the testing stage, each positive edge is paired with 100 negative edges. We evaluate the performance by MRR (mean reciprocal rank) and Hit Ratio@1, 5, 10. Results in Table 11 demonstrate that the performance of our prompt-based method remains the best in most cases.
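For reference, a minimal sketch of how MRR and Hit@k can be computed when each positive edge is ranked against its 100 sampled negatives (this is our own helper, not the paper's evaluation code):

```python
import torch

def mrr_and_hits(pos_score: torch.Tensor, neg_scores: torch.Tensor,
                 ks=(1, 5, 10)):
    """pos_score: scalar score of the positive edge.
    neg_scores: (100,) scores of its sampled negative edges."""
    rank = 1 + int((neg_scores > pos_score).sum())  # 1-based rank of the positive
    mrr = 1.0 / rank
    hits = {k: float(rank <= k) for k in ks}
    return mrr, hits
```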
Table 11: Evaluation on link prediction (100-shot settings)
Datasets               PersonalityCafe                     Facebook
Methods                MRR    Hit@1   Hit@5   Hit@10       MRR    Hit@1   Hit@5   Hit@10
Supervised             0.18   0.04    0.24    0.56         0.13   0.06    0.17    0.35
Pre-train + Fine-tune  0.13   0.05    0.12    0.34         0.10   0.02    0.16    0.33
Prompt                 0.20   0.07    0.32    0.60         0.19   0.10    0.23    0.39
Prompt w/o h           0.20   0.06    0.30    0.50         0.15   0.09    0.15    0.33
Label Ratio            ∼0.003% (training), ∼80% (message passing)    ∼0.1% (training), ∼80% (message passing)
Table 12: Edge-level performance (%) with 100-shot setting. IMP (%): the average improvement of prompt over the rest.
Table 13: Graph-level performance (%) with 100-shot setting. IMP (%): the average improvement of prompt over the rest.