
GraphGPT: Graph Instruction Tuning for Large Language Models

Jiabin Tang (University of Hong Kong), Yuhao Yang (University of Hong Kong), Wei Wei (University of Hong Kong),
Lei Shi (Baidu Inc.), Lixin Su (Baidu Inc.), Suqi Cheng (Baidu Inc.),
Dawei Yin (Baidu Inc.), Chao Huang* (University of Hong Kong)

arXiv:2310.13023v3 [cs.CL] 7 May 2024

ABSTRACT
Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advancing graph model generalization in zero-shot learning environments. Inspired by the success of large language models (LLMs), we aim to create a graph-oriented LLM capable of exceptional generalization across various datasets and tasks without relying on downstream graph data. We introduce the GraphGPT framework, which integrates LLMs with graph structural knowledge through graph instruction tuning. This framework includes a text-graph grounding component to link textual and graph structures and a dual-stage instruction tuning approach with a lightweight graph-text alignment projector. These innovations allow LLMs to comprehend complex graph structures and enhance adaptability across diverse datasets and tasks. Our framework demonstrates superior generalization in both supervised and zero-shot graph learning tasks, surpassing existing benchmarks. The open-sourced model implementation of our GraphGPT is available at https://github.com/HKUDS/GraphGPT.

CCS CONCEPTS
• Information systems → Data mining; Language models; • Mathematics of computing → Graph algorithms.

KEYWORDS
Large Language Models, Graph Learning, Instruction Tuning

* Chao Huang is the Corresponding Author.

SIGIR '24, July 14–18, 2024, Washington, DC, USA
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0431-4/24/07. https://doi.org/10.1145/3626772.3657775

1 INTRODUCTION
Graph neural networks (GNNs) have emerged as a powerful framework for analyzing and learning from graph-structured data [4, 27], enabling advancements in various domains, such as social network analysis [31, 65], recommender systems [9, 42], and biological network analysis [6, 25]. One of the key benefits of GNNs is their ability to capture the inherent structural information and dependencies present in graph data. By leveraging message passing and aggregation mechanisms, GNNs can effectively propagate and combine information across the graph, enabling them to model complex relationships and make accurate predictions.

In recent years, various GNN architectures have introduced innovations in how information is exchanged and aggregated among graph nodes. For example, graph convolutional networks (GCNs) [17, 22] adapt convolutional operations to the graph domain, enabling effective feature representations. Graph attention networks (GATs) [39, 43] leverage attention mechanisms to assign different weights to neighboring nodes, allowing for more fine-grained information aggregation. Graph transformer networks (GTNs) [14, 60] incorporate self-attention and positional encoding to capture global dependencies and structural patterns in the graph. However, a notable limitation of many GNN approaches is their heavy reliance on supervised learning, which can lead to inadequate robustness and generalization when confronted with sparse and noisy data.

To enhance the generalization ability of GNNs, self-supervised learning (SSL) has emerged as a promising approach in graph representation learning. It aims to pre-train a robust graph model using auxiliary tasks on unlabeled graph data. The idea is to leverage the inherent structure and patterns within the graph itself to create meaningful self-supervisory signals. SSL-enhanced graph learning methods exhibit two primary paradigms: contrastive SSL and generative SSL. Within contrastive SSL, the emphasis lies on learning representations by contrasting positive and negative samples, with notable advancements such as DGI [40] and GCA [67]. Conversely, generative SSL focuses on generating synthetic samples that closely resemble the original graph structures with masked autoencoders, exemplified by techniques like GraphMAE [11] and S2GAE [35]. While these methods aim to generate graph embeddings that are generalizable to different downstream tasks, they often require a fine-tuning process using labels specific to the downstream graph learning scenarios.
However, this reliance on labeled data from downstream tasks can restrict their generalization in practical situations where obtaining high-quality labels may not always be feasible. This limitation is particularly relevant in learning scenarios like cold-start recommendation systems or traffic flow prediction in new cities, where accurate labels may be scarce or unavailable. As a result, the objective of this research is to advance the generalization capabilities of graph models by addressing challenging and practical zero-shot learning scenarios. Inspired by the remarkable success of large language models (LLMs) in natural language processing (NLP) tasks [48], where they have demonstrated exceptional generalization abilities, this work aims to develop a graph-oriented LLM capable of achieving high generalization across diverse downstream datasets and tasks. However, effectively integrating large language models with graph learning poses non-trivial challenges:
• C1: Achieving a proper alignment between the structural information of a graph and the language space demands meticulous deliberation and thoughtful consideration.
• C2: Effectively guiding LLMs to comprehend the structural information of graphs remains a considerable challenge.
• C3: Endowing LLMs with the ability to reason step-by-step is important when tackling complex graph learning tasks.

To gain a deeper understanding of the limitations associated with directly prompting LLMs using purely text-based prompts for graph structure modeling, we provide illustrative examples in Figure 1. These examples facilitate a comparative analysis between our GraphGPT framework and the ChatGPT approach. We focus on a representative node classification task, where the objective is to predict the category of a given paper. In Figure 1 (a) and Figure 1 (b), we showcase the prediction results for two scenarios using ChatGPT: (1) utilizing only the input node textual data, and (2) employing text-based graph structure-aware prompts inspired by the prompt designs in recent studies [2, 5]. These figures highlight the potential limitations that arise when relying solely on text-based prompts for graph structure modeling, as evidenced by the incorrect paper node classification results presented. In contrast, our GraphGPT framework effectively addresses these limitations by preserving and leveraging the graph structural information, as shown in Figure 1 (c). It enables accurate identification of the paper category by understanding the underlying graph structure.

[Figure 1: Limitation of LLMs in understanding graph structural contexts with heavy reliance on textual data. Panels: (a) ChatGPT with node content only (615 tokens); (b) ChatGPT with node content and a text-based graph structure (4,649 tokens); (c) GraphGPT with the <graph> indicator token (750 tokens). Ground truth: cs.LG, Machine Learning.]

Additionally, the utilization of text-based structural prompts leads to an increase in token size, which presents challenges in practical scenarios. Longer token sequences incur higher computational and memory costs, making them less feasible for real-world applications. Furthermore, existing LLMs have token limits, which further restrict the applicability of longer text-based prompts for large-scale graph structure modeling. These limitations emphasize the necessity for more efficient and scalable approaches that can effectively incorporate graph structural information into LLMs.

Contributions. To address these challenges, we propose a novel framework called GraphGPT, which aims to align Large Language Models (LLMs) with graphs using a carefully designed graph instruction tuning paradigm. (C1) Our framework introduces a text-graph grounding paradigm as the initial step to align the encoding of graph structures with the natural language space. By incorporating textual information in a contrastive manner, we enable effective alignment of graph structure information within language models. (C2) In our proposed dual-stage graph instruction tuning paradigm, we leverage self-supervised signals through the graph matching task, which is derived from unlabeled graph structures, to serve as instructions for guiding model tuning of LLMs. By incorporating this self-supervised instruction tuning, the language model acquires domain-specific structural knowledge related to graphs, thereby enhancing its understanding of graph structures. To further customize the LLM's reasoning behavior for diverse downstream graph learning tasks, the second stage of our graph instruction tuning paradigm involves fine-tuning the LLM with task-specific graph instructions to improve the model's adaptability. (C3) By incorporating Chain-of-Thought (CoT) distillation into our framework, GraphGPT enhances its step-by-step reasoning abilities and improves its performance in the face of distribution shift.

In summary, our work makes the following contributions:
• This work aims to align graph domain-specific structural knowledge with the reasoning ability of Large Language Models (LLMs) to improve the generalization of graph learning.
• Our approach aims to align LLMs with graphs through a graph instruction tuning paradigm. This paradigm incorporates self-supervised instruction tuning, enhancing the LLM's comprehension of graph structural knowledge and its reasoning capabilities. Additionally, we introduce task-specific instruction tuning to improve the model's adaptability across diverse graph tasks.
• We evaluate our proposed GraphGPT on supervised and zero-shot graph learning tasks. We conduct thorough analyses of its component-wise effects and generalization ability. By comparing it with state-of-the-art baselines, we demonstrate the superior generalization power of our approach across various settings.
2 PRELIMINARIES
Graph-structured Data. This form of data represents information as entities (nodes) and the relationships (edges) between them. A graph is denoted as G(V, E, A, X), comprising the following key components. The node set V represents the collection of nodes, with |V| = N indicating the total number of nodes. The edge set E characterizes the relationships between nodes. The adjacency matrix A ∈ R^(N×N) encodes the graph's topology, with each element A_(i,j) indicating the presence or absence of an edge between nodes i and j. The feature matrix X ∈ R^(N×F) contains attribute or feature information associated with each node, where F represents the feature dimensionality.
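To make the notation concrete, here is a minimal sketch of building A and X for a toy graph in PyTorch (the library named in Section 4.1.4); the edge list and feature dimensionality are illustrative assumptions, not data from the paper.

```python
import torch

# Toy graph G(V, E, A, X): N = 4 nodes on a cycle (edges assumed for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
N, F = 4, 8

# Adjacency matrix A in R^(N x N): A[i, j] = 1 iff an edge connects nodes i and j.
A = torch.zeros(N, N)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Feature matrix X in R^(N x F): one F-dimensional attribute vector per node (random placeholders).
X = torch.randn(N, F)
print(A.shape, X.shape)  # torch.Size([4, 4]) torch.Size([4, 8])
```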
Graph Neural Networks. GNNs have become a powerful framework for learning representations from graph-structured data. Unlike traditional neural networks that process grid-like data, GNNs excel in capturing the intricate relationships and dependencies within graphs. They utilize the graph's structure, comprising nodes and edges, to derive expressive node representations through repeated message propagation and aggregation operations:

m_v^(l) = Propagate^(l)({h_u^(l-1) : u ∈ N(v)}),   h_v^(l) = Aggregate^(l)(h_v^(l-1), m_v^(l))   (1)

In Graph Neural Networks, the feature vector of node v at layer l is denoted as h_v^(l). Message passing is performed by the Propagate^(l) function, aggregating information from neighboring nodes of v in layer l. The Aggregate^(l) function combines this information with the previous layer's representation of node v to update h_v^(l). By incorporating graph structure into learned representations, GNNs can be tailored for tasks like node classification and link prediction.
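The following is a minimal, illustrative sketch of one propagate/aggregate step in the spirit of Eq. (1); the mean aggregator and the concatenation-based update are assumed choices for illustration, not the paper's exact operators.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One step of Eq. (1): m_v = Propagate({h_u : u in N(v)}), h_v = Aggregate(h_v, m_v)."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)  # combines h_v^(l-1) with the message m_v^(l)

    def forward(self, h, A):
        deg = A.sum(dim=1, keepdim=True).clamp(min=1.0)
        m = (A @ h) / deg                                            # Propagate: mean over neighbor states
        return torch.relu(self.update(torch.cat([h, m], dim=-1)))   # Aggregate: fuse with h_v^(l-1)

# toy usage: 4 nodes on a cycle, 8-dimensional node states
A = torch.tensor([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=torch.float)
h = torch.randn(4, 8)
h_next = MessagePassingLayer(dim=8)(h, A)
```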
3 METHODOLOGY
3.1 Structural Information Encoding with Text-Graph Grounding
To improve the understanding of graph structural information by large language models, our framework focuses on aligning the encoding of graph structures with the natural language space. This alignment enables language models to effectively comprehend the graph's structural elements using their language understanding capabilities. To achieve this, we introduce a text-graph grounding paradigm that generates prompts preserving the graph's structural context for language models. This paradigm acts as a bridge, connecting the semantic understanding of textual information with the inherent structural relationships in the graph.

In our GraphGPT, we design the graph encoder to be highly flexible, allowing it to leverage a wide range of backbone GNN architectures obtained from diverse graph pre-training paradigms. We incorporate a message-passing neural network architecture, which can be a graph transformer [60] or a graph convolutional network [17], as the structure-level pre-trained graph model. In each message-passing step, the graph encoder aggregates information from neighboring nodes, considering their relationships:

H^(l) = σ(Ã H^(l-1) W)   (2)

The self-loop adjacency matrix, denoted as Ã, is obtained by adding the identity matrix I to the original adjacency matrix A; it captures the self-connections and local connectivity of nodes in the graph. W is the parameter matrix, σ(·) is the non-linear activation, and H^(l) is the graph representation at the l-th layer.
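Eq. (2) can be read directly as code. The sketch below is a bare-bones version under the stated definition (self-loops via Ã = A + I, a single weight matrix, ReLU as σ); it is not the pre-trained graph transformer actually used in GraphGPT.

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """H^(l) = sigma(Ã H^(l-1) W), with Ã = A + I as in Eq. (2)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)      # the parameter matrix W

    def forward(self, H, A):
        A_tilde = A + torch.eye(A.size(0), device=A.device)  # add self-loops
        return torch.relu(self.W(A_tilde @ H))               # propagate, transform, activate

# toy usage: 4 nodes, 8 -> 16 dimensional representations
H1 = SimpleGraphConv(8, 16)(torch.randn(4, 8), torch.ones(4, 4))
```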
Text-Structure Alignment. To enhance the alignment of graph structure information with Language Models (LLMs), our focus is on exploring effective encoding methods that can collaborate seamlessly with LLMs. Building upon previous works [30, 49], we adopt a contrastive approach by incorporating textual information into the graph structure encoding process. We directly integrate a pre-trained graph encoder into our GraphGPT framework, enabling the seamless utilization of its capabilities. Formally, given a graph G(V, E, A, X) with raw textual contents C = {c_i ∈ R^(l_i×d), 1 ≤ i ≤ N} for N nodes, we obtain encoded graph representations Ĥ ∈ R^(N×d) and encoded text representations T̂ ∈ R^(N×d) as follows:

H = f_G(X),   T = f_T(C),   Ĥ = norm(H),   T̂ = norm(T)   (3)

We utilize the graph encoder, f_G, to generate structure-level graph representations from the input graph G(V, E, A, X). To encode the raw textual contents C associated with the nodes, we employ a text encoder, such as a transformer or BERT, denoted as f_T. This step produces encoded text representations of nodes, which are then normalized row-wise using the norm function. The text-structure alignment across modalities is conducted as follows:

Γ_1 = (Ĥ T̂^⊤) · exp(τ),   Γ_2 = (Ĥ T̂′^⊤) · exp(τ),   Γ_3 = (T̂ T̂′^⊤) · exp(τ),
L = Σ_{i=1}^{3} (1/2) λ_i (CE(Γ_i, y) + CE(Γ_i^⊤, y))   (4)

where T̂′ = {(1/|N_i|) Σ_{j∈N_i} T̂_j, 1 ≤ i ≤ N} and N is the number of nodes. In our text-graph grounding, we use the label y = (0, 1, · · · , n − 1)^⊤ for the contrastive alignment objective. We employ a graph transformer [61] as the graph encoder and a vanilla transformer [38] as the text encoder.
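A sketch of the grounding objective in Eqs. (3)-(4), treating the graph and text encoders as given black boxes; the temperature value, the equal weights λ_i, and the neighbor-list format are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def alignment_loss(H, T, neighbors, tau=0.07, lambdas=(1.0, 1.0, 1.0)):
    """Text-structure grounding loss of Eq. (4).
    H: [N, d] graph-encoder outputs; T: [N, d] text-encoder outputs;
    neighbors: list of neighbor-index tensors per node (for the aggregated view T')."""
    H_hat = F.normalize(H, dim=-1)
    T_hat = F.normalize(T, dim=-1)
    # T': mean of normalized neighbor text embeddings for each node
    T_prime = torch.stack([T_hat[idx].mean(dim=0) for idx in neighbors])

    scale = torch.tensor(tau).exp()
    gammas = [H_hat @ T_hat.T * scale,      # Gamma_1: graph vs. text
              H_hat @ T_prime.T * scale,    # Gamma_2: graph vs. neighbor text
              T_hat @ T_prime.T * scale]    # Gamma_3: text vs. neighbor text

    y = torch.arange(H.size(0))             # label y = (0, 1, ..., N-1)
    return sum(lam * 0.5 * (F.cross_entropy(g, y) + F.cross_entropy(g.T, y))
               for lam, g in zip(lambdas, gammas))

# toy usage with N = 4 nodes and d = 16
loss = alignment_loss(torch.randn(4, 16), torch.randn(4, 16),
                      neighbors=[torch.tensor([1, 3]), torch.tensor([0, 2]),
                                 torch.tensor([1, 3]), torch.tensor([0, 2])])
```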
3.2 Dual-Stage Graph Instruction Tuning
The dual-stage graph instruction tuning paradigm proposed in this work builds upon the concept of instruction tuning, which has been recently introduced to enhance the adaptability of language models for specific domains [45]. In this paradigm, we aim to align the language capacity of the model with the nuances of graph learning tasks, enabling the language model to generate more accurate and contextually appropriate responses for graph-structured data.

3.2.1 Self-Supervised Instruction Tuning. In the initial stage of our graph instruction tuning, we introduce self-supervised instruction tuning. This mechanism enhances the language model's reasoning abilities by incorporating graph domain-specific structural knowledge and effectively understanding contextual information within the graph's structure. To achieve this, we utilize self-supervised signals derived from unlabeled graph structures as instructions for model tuning. Specifically, we design a structure-aware graph matching task that guides the language model in differentiating between graph tokens using language tokens. This instruction task plays a vital role in accurately associating graph tokens with their corresponding textual descriptions, deepening the model's comprehension of the graph with the provided guidance.

[Figure 2: The overall architecture of our proposed GraphGPT with graph instruction tuning paradigm. Left panel: structural information encoding, where input graphs with text attributes from multiple domains pass through a text-grounded structural encoder. Middle panel: self-supervised instruction tuning, where graph tokens from the frozen structural encoder are mapped by the tuned alignment projector and matched against shuffled node texts by the frozen LLM (e.g., Vicuna or Llama). Right panel: task-specific instruction tuning with CoT distillation for downstream tasks such as node classification and link prediction.]

Instruction Design. The instruction for our graph matching task consists of three components: i) graph information, ii) human question, and iii) GraphGPT response. In this task, we treat each node in the graph as a central node and perform h-hop random neighbor sampling, resulting in a subgraph structure. The natural language input for the LLM is the human question. In the context of the graph matching task, the instruction includes the indicator token <graph> and a shuffled list of node text information. For example, in a citation graph, the node text information corresponds to paper titles. The objective of the LLM in the graph matching task is to align each graph token with its corresponding node text information. This requires reordering the node text information list based on the sequence of graph tokens, effectively associating each graph token with its relevant textual description. The detailed designs of graph matching are shown in Figure 4.
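The sketch below illustrates how one graph-matching instruction sample of this form could be assembled; the field names and prompt wording paraphrase Figure 4 and are assumptions, not the released data format.

```python
import random

def build_graph_matching_instruction(node_ids, node_titles):
    """Builds one self-supervised graph-matching sample: the LLM must re-order a
    shuffled title list so that it matches the order of the graph tokens."""
    shuffled = node_titles[:]
    random.shuffle(shuffled)

    human = (
        "Given a sequence of graph tokens <graph> that constitute a subgraph of a "
        f"citation graph, here is a list of paper titles: {shuffled}. "
        "Please reorder the list of papers according to the order of graph tokens."
    )
    # Ground-truth response: graph token i corresponds to the i-th sampled node's title.
    response = " ".join(
        f"Graph token {i + 1} corresponds to {title}." for i, title in enumerate(node_titles)
    )
    return {"graph": {"node_list": node_ids}, "human": human, "gpt": response}

sample = build_graph_matching_instruction(
    node_ids=[68442, 101, 205],
    node_titles=["paper A title", "paper B title", "paper C title"],
)
```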
Tuning Strategy. To optimize the tuning process efficiently, we propose incorporating a Lightweight Alignment Projector. During training, we focus on optimizing the parameters of the projector f_P, while keeping the parameters of both the LLM and the graph encoder fixed. We assume that the projector successfully learns to map the encoded graph representation to graph tokens, while the LLM excels at aligning these tokens with diverse node text information. To align the graph tokens with the language tokens, we employ a projector f_P, which can be as simple as a single linear layer. This projector establishes correspondence between the graph tokens and the language tokens. By replacing the indicator token <graph> in the original language token sequence, the aligned graph tokens create a modified token sequence for the LLM. This modified sequence, denoted as {<graph_begin>, <graph_token>_1, · · · , <graph_token>_n, <graph_end>}, corresponds to the number of nodes n in the graph associated with the given prompt. Given that the graph matching process is unsupervised, we have the opportunity to leverage a vast amount of unlabeled graph data from different domains to enhance the generalizability of the learned projector. Mathematically, with projected graph tokens X_G = f_P(Ĥ) and text embeddings X_I = tokenizer(instruction), for a sequence of length L, we compute the probability of generating the target output X_O as follows:

p(X_O | X_G, X_I) = ∏_{i=1}^{L} p_θ(x_i | X_G, X_I,<i, X_O,<i)   (5)

where θ are the learnable parameters within GraphGPT.
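A sketch of the lightweight alignment projector and the token-splicing step: a single linear layer maps frozen graph-encoder outputs into the LLM embedding space and replaces the <graph> indicator, after which only the projector would be optimized under the autoregressive objective of Eq. (5). Tensor shapes and the splice logic are illustrative assumptions; the frozen LLM and graph encoder are treated as given.

```python
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    """f_P: maps graph-encoder node embeddings to LLM token embeddings (one linear layer)."""
    def __init__(self, graph_dim, llm_dim):
        super().__init__()
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, H_hat):                # H_hat: [n_nodes, graph_dim], frozen encoder output
        return self.proj(H_hat)              # X_G:   [n_nodes, llm_dim]

def splice_graph_tokens(text_embeds, graph_embeds, graph_pos):
    """Replaces the single <graph> placeholder embedding at index graph_pos with the
    projected graph tokens, yielding the modified input sequence for the LLM."""
    return torch.cat([text_embeds[:graph_pos], graph_embeds, text_embeds[graph_pos + 1:]], dim=0)

projector = GraphTokenProjector(graph_dim=128, llm_dim=4096)
X_G = projector(torch.randn(5, 128))                       # 5 sampled subgraph nodes
inputs = splice_graph_tokens(torch.randn(32, 4096), X_G, graph_pos=7)
# Only projector.parameters() would be optimized; the frozen LLM then maximizes the
# autoregressive likelihood of the target response given `inputs`, as in Eq. (5).
```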
3.2.2 Task-Specific Instruction Tuning. In the second stage, we introduce task-specific instruction tuning to customize the model's reasoning behavior for different graph learning tasks, such as node classification or link prediction. By fine-tuning the LLM using task-specific graph instructions, we guide the model to generate responses that align with the constraints and requirements of the specific graph learning task. This enhances the model's adaptability and performance in handling diverse graph learning tasks.

[Figure 3: Workflow of text-structure alignment, pairing a GNN over the graph structure with a transformer over node text attributes.]

Instruction Design. We utilize a consistent instruction template comprising three parts. To generate graph information for each node, we employ the same neighbor sampling approach as in the first stage. This approach ensures the inclusion of relevant graph information, with each node serving as the central node. For the node classification task, the human question instruction includes the indicator token <graph> and specific text information about the central node. This instruction guides the language model to predict the category of the central node based on both the graph structure data and the accompanying text information. Figure 4 provides instruction examples for different tasks, visually illustrating the presentation of the instruction to the language model.

Tuning Strategy. In the second stage of training, we utilize the parameters of the structure-aware projector that were trained in the first stage as the initial state. This allows us to conduct instruction tuning specifically for downstream tasks. During this training process, we keep the parameters of the language model (LLM) and graph encoder fixed, focusing solely on optimizing the parameters of the projector from the previous stage. By doing so, we ensure that the LLM further aligns with the requirements of downstream tasks, enhancing its ability to comprehend and interpret graph structures. After completing the two training stages as described above, we have confidence that our GraphGPT has acquired the capability to comprehend the given graph structure and perform downstream tasks on the provided graph. The training process involving instruction tuning and the freezing of specific model parameters has refined the model's understanding of graph structures, enabling it to effectively tackle various tasks associated with the given graph.
Graph Information: <graph>: Central Node: 68442, Edge index: [[…src node…],[…dst node…]], Node list: […] Graph Matching
Human Question: Given a sequence of graph tokens <graph> that constitute a subgraph of a citation graph, …. Here is a list of paper titles: 1. …
2. …, please reorder the list of papers according to the order of graph tokens.
GraphGPT Response: Based on the given graph tokens and the list of paper titles, we obtain the matching of graph tokens and papers: Graph token 1
corresponds to smt based induction methods for timed systems. Graph token 2 corresponds to …

Graph Information: <graph>: Central Node: 2, Edge index: [[…src node…],[…dst node…]], Node list: […] Node Classification
Human Question: Given a citation graph: <graph> where the 0th node is the target paper, with the following information: Abstract: … Title: …
Question: Which arXiv CS sub-category does this paper belong to? …
GraphGPT Response: cs.IT, cs.LG, cs.SP, cs.CV, cs.NA. The paper discusses the Restricted Isometry …. So, it is likely to belong to cs.IT…

Graph Information: <graph>: Central Node 1: 8471, Edge index 1: [[…src node…],[…dst node…]], Node list 1: […] Link Prediction
<graph>: Central Node 2: 19368, Edge index 2: [[…src node…],[…dst node…]], Node list 2: […]
Human Question: Given a sequence of graph tokens: <graph> that constitute a subgraph of a citation graph, …. Abstract: … Title: … and the other
sequence of graph tokens: <graph>, … Abstract: … Title: …, are these two central nodes connected? Give me an answer of "yes" or "no".
GraphGPT Response: Yes, they are connected. Based on the first paper, …. And the second paper proposes ….

Figure 4: Our instruction designs for graph matching task (upper), node classification (middle) and link prediction (lower).

3.3 Chain-of-Thought (CoT) Distillation
When faced with diverse graph data, language models may encounter unfamiliar patterns and structures, leading to challenges in generating accurate and coherent responses. This is especially true when the number of node classes varies across different types of graph data, causing distribution shift. To address this challenge and enhance accuracy in the presence of distribution shift, it is crucial to equip our GraphGPT with step-by-step reasoning abilities. Thus, we propose incorporating the Chain-of-Thought (CoT) technique [47], which explicitly models the flow of thoughts and reasoning steps. By leveraging CoT, our language model improves the coherence and consistency of generated text, enabling it to follow a logical progression of ideas and enhance its understanding and reasoning capabilities for the given graph data.

Incorporating the Chain-of-Thought technique can be challenging due to the influence of model parameter scale [32]. To overcome this, we draw inspiration from previous research [32] and adopt a distillation approach. By extracting valuable knowledge from a closed-source, powerful language model like ChatGPT (with over 200 billion parameters), we can generate high-quality CoT instructions and enhance our model's CoT reasoning capabilities without increasing the parameter count.

CoT Distillation Paradigm. Our approach involves designing tailored Chain-of-Thought (CoT) prompts for node-specific tasks. For the node classification task in a citation graph, we provide the abstract, paper title, and a task description as input. Using the GPT-3.5 language model, we incorporate "Please think about the categorization in a step-by-step manner." to enable step-by-step reasoning. By engaging in sequential thought, the LLM generates output that includes predictions for node classes and detailed explanations for each prediction. This ensures transparent and comprehensible reasoning and decision-making. To further enhance performance, we integrate the generated CoT instruction data with previously designed instructions for task-specific instruction tuning. With the integrated instructions, we proceed with the proposed instruction tuning paradigm.
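A sketch of how a CoT distillation query could be assembled; the teacher model is abstracted as a callable because the exact API used by the authors is not specified, and the prompt wording beyond the quoted step-by-step cue is an assumption.

```python
def build_cot_prompt(title, abstract, task_description):
    """Assembles a node-classification query whose answer (from a strong teacher LLM such
    as GPT-3.5) becomes a Chain-of-Thought instruction sample for GraphGPT."""
    return (
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        f"{task_description}\n"
        "Please think about the categorization in a step-by-step manner."
    )

def distill_cot_instructions(records, teacher_llm):
    """records: iterable of dicts with 'title', 'abstract', and 'task' fields.
    teacher_llm: any callable str -> str wrapping the closed-source teacher model."""
    dataset = []
    for r in records:
        prompt = build_cot_prompt(r["title"], r["abstract"], r["task"])
        dataset.append({"human": prompt, "gpt": teacher_llm(prompt)})  # reasoning + prediction
    return dataset
```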
4 EVALUATION
We conduct experiments to address key research questions:
• RQ1: How does the proposed GraphGPT framework perform in both supervised and zero-shot graph learning settings?
• RQ2: What is the generalization ability of our model in handling multiple tasks without experiencing catastrophic forgetting?
• RQ3: What is the contribution of various key components in the proposed GraphGPT framework to its overall performance?
• RQ4: How scalable and efficient is our GraphGPT framework?

4.1 Experimental Settings
4.1.1 Data Descriptions. We evaluate our GraphGPT using three datasets: OGB-arxiv, PubMed, and Cora. The OGB-arxiv dataset [12] represents a directed graph capturing the citation network among computer science arXiv papers indexed by MAG [41]. Each paper is manually labeled with a research category selected from 40 subject areas. The PubMed dataset [8] consists of 19,717 scientific publications on diabetes from the PubMed database, categorized into experimentally induced diabetes, Type 1 diabetes, and Type 2 diabetes. Additionally, it includes a citation network with 44,338 links. The Cora dataset [49] comprises 25,120 research papers connected through citations. We use an expanded version with 70 classes, larger than previous versions [17].

4.1.2 Evaluation Protocols. To facilitate comparison across different datasets, we map node features into a unified vector space by encoding raw text information with a pre-trained BERT [3]. In our experiments, we partition the Cora and PubMed datasets into training, validation, and testing sets following a 3:1:1 ratio, as described in previous works [8, 49]. For the OGB-arxiv data, we adhere to the public split setting [12] with a training, validation, and testing ratio of 6:2:3. To evaluate our model's performance, we utilize three commonly used metrics: Accuracy and Macro-F1 for node classification, and AUC for link prediction.
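For reference, the three reported metrics can be computed with scikit-learn as in the sketch below; the labels and scores are toy values.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Node classification: Accuracy and Macro-F1 over predicted class indices.
y_true, y_pred = [0, 2, 1, 2, 0], [0, 1, 1, 2, 0]
acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Link prediction: AUC over predicted edge scores.
edge_labels, edge_scores = [1, 0, 1, 0], [0.9, 0.3, 0.6, 0.4]
auc = roc_auc_score(edge_labels, edge_scores)

print(f"Acc={acc:.4f}  Macro-F1={macro_f1:.4f}  AUC={auc:.4f}")
```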
4.1.3 Baseline Methods. In our performance comparison, we consider various state-of-the-art methods for comprehensive evaluation. (i) The first category includes MLP, which employs a Multi-Layer Perceptron for node representation. (ii) The second category comprises representative graph neural encoders, including GraphSAGE [7], GCN [17], GAT [39], and RevGNN [21]. (iii) The third category focuses on the self-supervised approach DGI [40] for graph learning. (iv) The fourth category explores knowledge distillation-enhanced GNNs, with GKD [55] and GLNN [63] as notable methods. (v) The fifth category showcases recently proposed strong graph transformer networks, with NodeFormer [51] and DIFFormer [50] as competitors. (vi) Lastly, we consider open-sourced LLMs, such as Baichuan-7B, Vicuna-7B-v1.1, and Vicuna-7B-v1.5, as baselines for understanding text-attributed graph data.
Table 1: Performance comparison of various methods on node classification under both supervised and zero-shot settings.
Dataset Arxiv-Arxiv Arxiv-PubMed Arxiv-Cora (Arxiv+PubMed)-Cora (Arxiv+PubMed)-Arxiv
Model Accuracy Macro-F1 Accuracy Macro-F1 Accuracy Macro-F1 Accuracy Macro-F1 Accuracy Macro-F1
MLP 0.5179 0.2536 0.3940 0.1885 0.0258 0.0037 0.0220 0.0006 0.2127 0.0145
GraphSAGE 0.5480 0.3290 0.3950 0.1939 0.0328 0.0132 0.0132 0.0029 0.1281 0.0129
GCN 0.5267 0.3202 0.3940 0.1884 0.0214 0.0088 0.0187 0.0032 0.0122 0.0008
GAT 0.5332 0.3118 0.3940 0.1884 0.0167 0.0110 0.0161 0.0057 0.1707 0.0285
RevGNN 0.5474 0.3240 0.4440 0.3046 0.0272 0.0101 0.0217 0.0016 0.1309 0.0126
DGI 0.5059 0.2787 0.3991 0.1905 0.0205 0.0011 0.0205 0.0011 0.5059 0.2787
GKD 0.5570 0.1595 0.3645 0.2561 0.0470 0.0093 0.0406 0.0037 0.2089 0.0179
GLNN 0.6088 0.3757 0.4298 0.3182 0.0267 0.0115 0.0182 0.0092 0.3373 0.1115
NodeFormer 0.5922 0.3328 0.2064 0.1678 0.0152 0.0065 0.0144 0.0053 0.2713 0.0855
DIFFormer 0.5986 0.3355 0.2959 0.2503 0.0161 0.0094 0.0100 0.0007 0.1637 0.0234
baichuan-7B 0.0946 0.0363 0.4642 0.3876 0.0405 0.0469 0.0405 0.0469 0.0946 0.0363
vicuna-7B-v1.1 0.2657 0.1375 0.5251 0.4831 0.1090 0.0970 0.1090 0.0970 0.2657 0.1375
vicuna-7B-v1.5 0.4962 0.1853 0.6351 0.5231 0.1489 0.1213 0.1489 0.1213 0.4962 0.1853
GraphGPT-7B-v1.1-cot 0.4913 0.1728 0.6103 0.5982 0.1145 0.1016 0.1250 0.0962 0.4853 0.2102
GraphGPT-7B-v1.5-stage2 0.7511 0.5600 0.6484 0.5634 0.0813 0.0713 0.0934 0.0978 0.6278 0.2538
GraphGPT-7B-v1.5-std 0.6258 0.2622 0.7011 0.6491 0.1256 0.0819 0.1501 0.0936 0.6390 0.2652
GraphGPT-7B-v1.5-cot 0.5759 0.2276 0.5213 0.4816 0.1813 0.1272 0.1647 0.1326 0.6476 0.2854
p-val 2.26e−9 1.56e−10 2.22e−7 1.55e−9 1.04e−9 9.96e−6 7.62e−8 1.97e−7 1.5e−13 4.63e−6

4.1.4 Implementation Details. For our model implementation, we primarily use the PyTorch and Transformers libraries. We utilize Vicuna-7B-v1.1 and Vicuna-7B-v1.5 as the base models. The batch size is set to 2 per GPU, and the learning rate is 2e−3. We apply a warmup ratio of 3e−2 and set the maximum input length of the Large Language Model (LLM) to 2048. The training process runs for 3 epochs. In the task-specific instruction tuning stage, we explore various combinations of instruction data to assess the model's performance under different data mixtures. The hyperparameter settings remain constant, except for the number of training epochs, which is set to 2 in this stage. The alignment projector parameters fine-tuned in the self-supervised instruction tuning stage serve as the initial parameters for the projector in the second tuning stage. For evaluating most baselines, we use their publicly available code. For more implementation details, please refer to our released code.

4.2 Overall Performance Comparison (RQ1)
We conduct experiments on the node classification task, evaluating both supervised and zero-shot scenarios. The overall performance is summarized in Table 1. In the Supervised Task Setting, models are trained on a specific dataset and evaluated on the corresponding test set (e.g., training on Arxiv-Arxiv and testing on the Arxiv test set). In the Zero-Shot Task Setting, models are trained on a specific dataset and tested on other datasets without additional training (e.g., training on Arxiv-PubMed and testing on the PubMed dataset). To handle variations in the number of classes across datasets, we employ a transfer-trained classifier, such as a linear layer, when testing GNN-based models. In Table 1, "-7B-" indicates the parameter scale, while "-v1.1-" and "-v1.5-" represent different versions of the base Vicuna model. "-stage2" indicates the adoption of only the second-stage tuning. "-std" and "-cot" denote the use of the standard and generated CoT instruction datasets, respectively.

Obs.1: Overall Superiority of our GraphGPT. Our graph LLM consistently outperforms various state-of-the-art baselines in both supervised and zero-shot scenarios. Notably, even recently developed strong GNN-based models, such as NodeFormer, DIFFormer, and GKD, exhibit good structural modeling capabilities in the supervised setting. However, when transferred to new datasets without further training, their performance significantly declines. In contrast, our GraphGPT not only surpasses all state-of-the-art methods in supervised tasks but also achieves a remarkable 2-10 times increase in accuracy in the zero-shot graph learning scenario.

LLM-based solutions like Baichuan-7B and Vicuna-7B maintain stable performance across different datasets but rely solely on text information for predictions. In contrast, our GraphGPT preserves graph structure, providing a comprehensive solution for graph learning tasks. Two key factors contribute to these improvements: (i) Our dual-stage graph instruction tuning aligns structural information encoded by the graph encoder with natural language tokens, enabling the LLM to understand the graph's inherent characteristics. (ii) Our framework facilitates mutual enhancement between the graph encoder and LLM, filling the LLM's gap in structural understanding and enabling it to reason about the graph's structure.

Obs.2: Benefits with Structure-aware Graph Matching. The presence of the first stage, which involves self-supervised graph matching tasks for instruction tuning, plays a crucial role in enhancing the zero-shot transferability of our GraphGPT. The first stage focuses on aligning the graph tokens, which encode rich structural information, with the language tokens. This alignment enables the model to develop a deeper understanding of the inherent structural characteristics of the graph data. Without the first stage, if we only conduct the second stage of task-specific instruction tuning, the model tends to be more prone to overfitting on the specific dataset. In such cases, the model's performance may be heavily reliant on dataset-specific patterns and characteristics, rather than a genuine understanding of the underlying graph structure. This can limit the model's ability to generalize to new, unseen datasets.

Obs.3: Benefits with CoT Distillation. The "-std" and "-cot" variants indicate that the use of CoT distillation substantially benefits more complex graph learning tasks. Models tuned with the standard instruction dataset can already achieve prominent results when transferred to simpler tasks, such as the PubMed dataset with 3 classes, with an accuracy of 0.7011 for Arxiv-PubMed. However, their performance tends to be mediocre when applied to complex tasks like the Cora dataset with 70 classes. By leveraging the powerful reasoning capabilities of the closed-source model (GPT-3.5) through CoT distillation, our model can integrate this knowledge and significantly enhance its performance on complex graph tasks.
Table 2: Performance comparison of various instruction mixtures in supervised learning on the Arxiv dataset and the zero-shot setting on the Cora dataset for node classification.
Dataset: Supervised on Arxiv (Acc, Macro-F1), Zero-Shot on Cora (Acc, Macro-F1)
MLP 0.5179 0.2536 0.0220 0.0006
GraphSAGE 0.5480 0.3290 0.0132 0.0029
GCN 0.5267 0.3202 0.0187 0.0032
GAT 0.5332 0.3118 0.0161 0.0057
RevGNN 0.5474 0.3240 0.0217 0.0016
DGI 0.5059 0.2787 0.0205 0.0011
GKD 0.5570 0.1595 0.0406 0.0037
GLNN 0.6088 0.3757 0.0182 0.0092
NodeFormer 0.5922 0.3328 0.0144 0.0053
DIFFormer 0.5986 0.3355 0.0100 0.0007
baichuan-7B 0.0946 0.0363 0.0405 0.0469
vicuna-7B-v1.1 0.2657 0.1375 0.1090 0.0970
vicuna-7B-v1.5 0.4962 0.1853 0.1489 0.1213
Arxiv-std + PubMed-std 0.6390 0.2652 0.1501 0.0936
Arxiv-cot + PubMed-cot 0.6476 0.2854 0.1647 0.1326
Arxiv-mix + PubMed-mix 0.6139 0.2772 0.1544 0.1048
Arxiv-std + PubMed-std + Link 0.5931 0.2238 0.1847 0.1579
Arxiv-mix + PubMed-mix + Link 0.6874 0.3761 0.1836 0.1494

Table 3: Performance comparison of various instruction mixtures for link prediction on PubMed.
Model AUC AP
MLP 0.5583 0.5833
GAT 0.5606 0.6373
GraphSAGE 0.5041 0.5813
RevGNN 0.4538 0.5083
Node2Vec 0.6535 0.6885
w/o Link 0.5010 0.5005
only Link 0.6704 0.6087
Arxiv-std + PubMed-std + Link 0.8246 0.8026
Arxiv-mix + PubMed-mix + Link 0.6451 0.5886

Table 4: Module ablation study under both supervised and zero-shot settings to analyze the individual contributions.
Dataset: Arxiv-Arxiv (Acc, Macro-F1), Arxiv-PubMed (Acc, Macro-F1), Arxiv-Cora (Acc, Macro-F1)
w/o GS 0.4962 0.1853 0.6351 0.5231 0.1489 0.1213
w/o LR 0.5807 0.2462 0.2523 0.1925 0.0050 0.0016
ours 0.6258 0.2622 0.7011 0.6491 0.1813 0.1272

4.3 Generalization Ability Investigation (RQ2)
In this subsection, we explore the generalization ability of our model by incorporating more instruction data to fine-tune the LLM for effectively handling various types of tasks. Our main results and experimental observations are presented as follows:

More Data Boost Model Transfer Ability. In our preliminary investigation, we examine the influence of data quantity on the transfer capability of our GraphGPT, as illustrated in the "(Arxiv + PubMed)-Cora" column of Table 1. In this experiment, we train models using a combination of the Arxiv and PubMed datasets and perform zero-shot testing on the Cora dataset. The results reveal that by incorporating a relatively smaller PubMed dataset (with 20,000+ items) alongside Arxiv, our GraphGPT exhibits a significant improvement in transfer performance on Cora. In contrast, the transfer performance of GNN-based models, trained separately on Arxiv and PubMed, actually deteriorates.

More Data Yet No Forgetting. We further validate the performance of the combined Arxiv and PubMed instruction data on the original Arxiv data, as demonstrated in the "(Arxiv + PubMed)-Arxiv" column in Table 1. The results indicate that most traditional GNN-based approaches experience a significant decline in performance on Arxiv after iterative training. In contrast, our model exhibits improved performance. We attribute this phenomenon to the occurrence of catastrophic forgetting in GNN-based models, where the structural modeling competence of the model trained solely on the smaller PubMed dataset is compromised. However, our model effectively mitigates this issue through our unified graph instruction tuning paradigm. This enables our model to maintain and even enhance its performance by retaining the generalized graph structure patterns despite incorporating additional data.

Generalization for Multitasking Graph Learner. Recent studies on instruction tuning suggest that mixing different instruction tuning data can further enhance the performance of Large Language Models (LLMs). In this study, we ensure a consistent number of instruction entries and mix different types of instruction data, including standard instruction ("-std"), CoT instruction ("-cot"), a blend of standard (50%) and CoT (50%) instruction ("-mix"), and link prediction instruction ("Link"). The results are presented in Table 2 and Table 3. We can observe that effective data mixture solutions can significantly improve the performance of our GraphGPT under various settings. The addition of task-specific instruction for the link prediction task notably enhances the performance of our model in node classification. Interestingly, after incorporating node classification, the performance of link prediction also exceeds that of the selected best-performing existing models. After mixing the instructions of different tasks, our model demonstrates the ability to effectively handle various graph learning tasks and transfer its knowledge to other unseen datasets.

4.4 Module Ablation Study (RQ3)
We conduct an ablation study to investigate the individual contributions of different sub-modules of our proposed framework, and the results are reported in Table 4. The observations are as follows:

Effect of Graph Instruction Tuning. In our study, we investigate the benefit of incorporating graph structural information into the LLM using the variant "w/o GS". In this variant, we directly adopt the base LLM (specifically, Vicuna-7B-v1.5) to perform node classification on three datasets, without incorporating graph structural information. The results of our study demonstrate that our model significantly outperforms the base model that lacks structural information. This indicates that our graph instruction tuning paradigm enables the LLM to understand the graph structural information more effectively. Importantly, this improvement in performance was achieved without altering the original parameters of the LLM. Instead, it was solely accomplished through our lightweight alignment projector, which aligns graph tokens and natural language tokens through a single linear projection operation.
Table 5: Study on the time and space efficiency of our GraphGPT during both the training and inference stages.
Variants, Training Time, Tuned Parameters, GPU Occupancy (MiB per GPU)
Stage-1-tune OOM 6,607,884,288 OOM
Stage-1-freeze 22:53:33 131,612,672 39517.75
improvement - ↓50.21x -
Stage-2-tune OOM 6,607,884,288 OOM
Stage-2-freeze 03:44:35 131,612,672 38961.75
improvement - ↓50.21x -

[Figure 5: Inference efficiency study of our GraphGPT. Two panels (Arxiv-Arxiv and Arxiv-Cora) plot inference time in seconds and accuracy for baichuan, vicuna-v1.1, vicuna-v1.5, and ours.]

Effect of LLM-enhanced Semantic Reasoning. We conduct further investigations to assess the influence of the LLM's reasoning ability in our GraphGPT by performing supervised and zero-shot predictions using only the default graph encoders. This variant is denoted as "w/o LR". The results of our study indicate that our GraphGPT, which integrates the LLM, significantly enhances the performance of the graph encoder, especially in the zero-shot setting. This suggests that the rich semantic information injected by the LLM provides a substantial gain in performance.

4.5 Model Efficiency Study (RQ4)
The study aims to assess the computational efficiency of our model during both the model training and inference stages.

Training Efficiency with Graph Instruction Tuning. Our instruction tuning framework follows a two-stage process where the parameters of both the LLM and the graph encoder are frozen, and only the graph-text alignment projector is tuned. We conduct a comparison between freezing and tuning the LLM parameters in a 4-card 40G Nvidia A100 environment, denoted by "-freeze" and "-tune" respectively. The study analyzes the time and space efficiency in terms of training time, the number of tuned parameters, and GPU occupancy (MiB per GPU). Under the same experimental conditions, when tuning LLM parameters, we encounter Out of Memory (OOM) errors even with a batch size of 1. However, by utilizing our tuning strategy, the training process remains stable even with a batch size of 2. Moreover, the number of tuned parameters decreases by more than 50 times compared to full LLM tuning.

Model Inference Efficiency. In our exploration, we assess the inference speed and accuracy of our GraphGPT by comparing it with the baichuan-7B, vicuna-7B-v1.1, and vicuna-7B-v1.5 LLMs. Using a single 40G Nvidia A100, we measure inference time (seconds per response) on the Arxiv and Cora CoT instruction datasets, as shown in Figure 5. Our graph LLM demonstrates superior efficiency and accuracy. Lower inference time does not necessarily mean better performance: baichuan-7B yields quick but often incorrect or irrelevant answers, while vicuna-7B-v1.1 and vicuna-7B-v1.5 require longer, more complex reasoning steps for better answers. In contrast, our model achieves accurate predictions through a brief reasoning process, enhancing inference efficiency.

4.6 Model Case Study (RQ5)
We conduct a detailed analysis of our model's performance in downstream graph learning tasks compared to traditional LLMs using different types of instructions. We evaluate ChatGPT and our GraphGPT using Arxiv data, with prompts based on node content, node content with a text-based graph structure, and our designed graph instruction. The results, shown in Table 6, clearly demonstrate that despite its massive parameter count (over 200B), ChatGPT struggles to make accurate predictions based solely on node text information or node content with a text-based graph structure. This challenge is particularly evident when dealing with papers that have cross-disciplinary characteristics, as seen in the example of research domains in machine learning and hardware architecture. In contrast, our GraphGPT consistently provides accurate predictions and reasonable explanations. This is because our GraphGPT incorporates a subgraph structure with 103 nodes, allowing it to extract rich structural information from neighboring nodes' citation relationships, leading to accurate predictions.

Furthermore, we believe that our approach of using graph tokens to represent the graph structure as input to the LLM is more efficient than the natural language solution. In the case of a 103-node subgraph, our GraphGPT only requires 750 tokens to be fed into the LLM, while the text-based method requires 4,649 tokens. This significant reduction in token consumption translates to a substantial decrease in training and inference resource requirements.
Table 6: Comparison of prediction results between our GraphGPT and ChatGPT.

Title: TiM-DNN: Ternary in-Memory accelerator for Deep Neural Networks. (with 102 first-order and second-order neighbors)
Ground-Truth Category: cs.LG, Machine Learning
ChatGPT (Node Content Only): cs.AR, cs.AI, cs.SY, cs.ET, cs.NE. The paper presents a hardware accelerator for deep neural networks (DNNs) that is
specifically designed to execute ternary DNNs, which represent weights and activations by signed ternary values. Therefore, the most likely category for
this paper is cs.AR (Hardware Architecture), which focuses on the design and analysis of hardware and systems for AI and machine learning applications.
The paper also involves the optimization of compute and storage requirements of DNNs, which falls under the category of cs.AI (Artificial Intelligence).
Additionally, the paper proposes a programmable, in-memory accelerator, which is a type of embedded system, and therefore, it could be categorized
under cs.SY (Systems and Control). Furthermore, the paper discusses the use of ternary values for DNNs, which is an emerging technique to optimize the
compute and storage requirements of DNNs, and therefore, it could be categorized under cs.ET (Emerging Technologies). Lastly, the paper proposes a new
bit-cell called Ternary Processing Cells (TPCs), which could be categorized under cs.NE (Neural and Evolutionary Computing).
ChatGPT (Node Content and Text-based Graph Structure): Based on the title "TiM-DNN: Ternary In-Memory Accelerator for Deep Neural Networks",
the paper is likely to belong to the following arXiv CS sub-categories: 1. cs.AR (Hardware Architecture): The paper proposes a hardware accelerator,
TiM-DNN, specifically designed for executing ternary deep neural networks. This falls under the domain of hardware architecture research. 2. cs.NE
(Neural and Evolutionary Computing): The paper focuses on accelerating deep neural networks, which is a topic within the field of neural and evolutionary
computing. 3. cs.CV (Computer Vision and Pattern Recognition): Deep neural networks are widely used in computer vision tasks, and the TiM-DNN
accelerator could potentially be applied to accelerate computer vision algorithms. 4. cs.SY (Systems and Control): The TiM-DNN accelerator is a system-level
solution for accelerating deep neural networks, so it could be categorized under systems and control research. 5. cs.DC (Distributed, Parallel, and Cluster
Computing): While the paper does not explicitly mention distributed or parallel computing, the TiM-DNN accelerator is designed to optimize the compute
and storage requirements of deep neural networks, which are often executed on distributed or parallel computing systems.
GraphGPT: Based on the title and abstract of the paper, we can identify the following arXiv CS sub-categories that are most likely to be relevant:
1. cs.LG (Machine Learning): This sub-category is relevant because the paper discusses the use of ternary
DNNs for language modeling and image classification, which are both tasks that involve learning. 2. cs.AI
(Artificial Intelligence): This sub-category is relevant because the paper proposes a new accelerator for DNNs,
which is designed to optimize the compute and storage requirements of complex DNNs. 3. cs.NE (Numerical
Analysis): This sub-category is relevant because the paper discusses the use of ternary DNNs, which involve
signed ternary values, and the use of TiM tiles, which are specialized memory arrays that perform massively
parallel signed vector-matrix multiplications on ternary values. 4. cs.SY (Systems and Control Theory): This
sub-category is relevant because the paper discusses the implementation of TiM-DNN in 32nm technology
using an architectural simulator calibrated with SPICE simulations and RTL synthesis. 5. cs.AR (Hardware
Architecture): This sub-category is relevant because the paper proposes a new accelerator for DNNs, which
is designed to be programmable and in-memory.

into a unified task template. Additionally, Sun et al. [34] present a unified prompt format, reformulate tasks to the graph level, and incorporate meta-learning techniques to improve multi-task performance in graph prompting. Despite these advances, these methods still require further fine-tuning that relies on supervision labels from downstream tasks to ensure accurate learning. In contrast, this work addresses this limitation by introducing a foundational graph model that tackles the more challenging task of zero-shot graph learning. By eliminating the need for label inputs from downstream tasks, this approach allows for a more general and flexible graph learning paradigm in real-world scenarios.

Large Language Models. In recent years, LLMs (e.g., ChatGPT [29] and Claude [1]) have gained widespread attention for their remarkable capabilities in various NLP tasks [18, 46]. Building on these capabilities, many tuning-free prompting techniques have been explored to enhance their generative abilities, such as in-context learning [28] and Chain-of-Thought prompting [47, 57]. With the rise of open-source LLMs such as Llama [36, 37], ChatGLM [62], and Baichuan [54], techniques for aligning pre-trained LLMs with specific tasks and human feedback have been proposed, making private, domain-specific LLMs feasible [19, 44, 45]. While there have been successful attempts to align LLMs with visual information, such as multimodal LLMs [23, 66], the alignment of LLMs with graph structures remains largely unexplored. This research addresses this gap by introducing a dual-stage graph instruction tuning paradigm that effectively aligns the language capacity of LLMs with graph learning. Previous studies [2, 5] have attempted to incorporate graph information into LLMs using natural language, but they have struggled to handle complex graph structures and to achieve a deep understanding of graphs, owing to the limitations of relying solely on text-based prompts.

6 CONCLUSION
This work presents an effective and scalable graph large language model aimed at improving the generalization capabilities of graph models. The proposed framework, GraphGPT, injects graph-domain-specific structural knowledge into the LLM through a dual-stage graph instruction tuning paradigm. By leveraging a simple yet effective graph-text alignment projector, we enable LLMs to comprehend and interpret the structural components of graphs. Extensive evaluations across different settings demonstrate the effectiveness of our method in both supervised and zero-shot graph learning scenarios. Furthermore, the model exhibits strong generalization abilities, allowing it to handle diverse downstream datasets and tasks without suffering from catastrophic forgetting. A potential avenue for future investigation is exploring pruning techniques to compress redundant or less important parameters of the LLM, thereby reducing the overall model size while preserving its performance.
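To make the graph-text alignment projector mentioned above concrete, the following is a minimal PyTorch-style sketch written for illustration only; the class name, dimensions, and the idea of keeping only this layer trainable are our assumptions, not the released GraphGPT implementation. A single lightweight linear layer maps graph-encoder node embeddings into the LLM's token-embedding space so that the projected graph tokens can be interleaved with text tokens during instruction tuning.

    # Illustrative sketch of a graph-text alignment projector (assumed design,
    # not the official GraphGPT code): one trainable linear layer maps graph
    # encoder outputs into the LLM's token-embedding space.
    import torch
    import torch.nn as nn

    class GraphTextProjector(nn.Module):
        def __init__(self, graph_dim: int, llm_dim: int):
            super().__init__()
            self.proj = nn.Linear(graph_dim, llm_dim)  # the only trainable part in this sketch

        def forward(self, graph_tokens: torch.Tensor) -> torch.Tensor:
            # graph_tokens: (num_nodes, graph_dim) from a frozen graph encoder
            # returns:      (num_nodes, llm_dim) token-like embeddings for the LLM
            return self.proj(graph_tokens)

    # Usage: project 8 sampled node embeddings (dim 128) into a 4096-dim LLM
    # embedding space; the result would stand in for a <graph> placeholder.
    projector = GraphTextProjector(graph_dim=128, llm_dim=4096)
    node_embeddings = torch.randn(8, 128)
    graph_token_embeddings = projector(node_embeddings)
    print(graph_token_embeddings.shape)  # torch.Size([8, 4096])

Keeping the projector this small is what makes the alignment lightweight: the graph encoder and the LLM can remain frozen while only the projection is tuned.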
REFERENCES
[1] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, et al. 2022. Constitutional AI: Harmlessness from AI Feedback. CoRR abs/2212.08073 (2022).
[2] Zhikai Chen, Haitao Mao, Hang Li, et al. 2023. Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. CoRR abs/2307.03393 (2023).
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT (1). Association for Computational Linguistics, 4171–4186.
[4] Yushun Dong, Ninghao Liu, Brian Jalaian, et al. 2022. EDITS: Modeling and Mitigating Data Bias for Graph Neural Networks. In WWW. ACM, 1259–1269.
[5] Jiayan Guo, Lun Du, and Hengyu Liu. 2023. GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking. CoRR abs/2305.15066 (2023).
[6] Zhichun Guo, Kehan Guo, Bozhao Nan, Yijun Tian, Roshni G. Iyer, et al. 2023. Graph-based Molecular Representation Learning. In IJCAI. 6638–6646.
[7] William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS. 1024–1034.
[8] Xiaoxin He, Xavier Bresson, et al. 2023. Explanations as Features: LLM-Based Features for Text-Attributed Graphs. CoRR abs/2305.19523 (2023).
[9] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR. ACM, 639–648.
[10] Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, et al. 2023. GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner. In WWW. 737–746.
[11] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Jie Tang, et al. 2022. GraphMAE: Self-supervised masked graph autoencoders. In KDD. 594–604.
[12] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, et al. 2020. Open Graph Benchmark: Datasets for Machine Learning on Graphs. In NeurIPS.
[13] Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, and Yizhou Sun. 2020. GPT-GNN: Generative pre-training of graph neural networks. In KDD. 1857–1867.
[14] Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. 2020. Heterogeneous Graph Transformer. In WWW. ACM / IW3C2, 2704–2710.
[15] Yangqin Jiang, Chao Huang, and Lianghao Huang. 2023. Adaptive graph contrastive learning for recommendation. In KDD. 4252–4261.
[16] Baoyu Jing, Chanyoung Park, and Hanghang Tong. 2021. HDMI: High-order deep multiplex infomax. In WWW. 2414–2424.
[17] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR (Poster). OpenReview.net.
[18] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. In NeurIPS.
[19] Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, et al. 2023. RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. CoRR abs/2309.00267 (2023).
[20] Bolian Li, Baoyu Jing, and Hanghang Tong. 2022. Graph communal contrastive learning. In WWW. 1203–1213.
[21] Guohao Li, Matthias Müller, Bernard Ghanem, and Vladlen Koltun. 2021. Training Graph Neural Networks with 1000 Layers. In ICML. 6437–6449.
[22] Mingkai Lin, Wenzhong Li, Ding Li, Yizhou Chen, and Sanglu Lu. 2022. Resource-Efficient Training for Large Graph Convolutional Networks with Label-Centric Cumulative Sampling. In WWW. ACM, 1170–1180.
[23] Haotian Liu, Chunyuan Li, et al. 2023. Visual Instruction Tuning.
[24] Yixin Liu, Ming Jin, Shirui Pan, Chuan Zhou, Yu Zheng, Feng Xia, and S Yu Philip. 2022. Graph self-supervised learning: A survey. TKDE 35, 6 (2022), 5879–5900.
[25] Yunchao Liu, Yu Wang, Oanh Vu, Rocco Moretti, et al. 2023. Interpretable Chirality-Aware Graph Neural Network for Quantitative Structure Activity Relationship Modeling in Drug Discovery. In AAAI. 14356–14364.
[26] Zemin Liu, Xingtong Yu, et al. 2023. GraphPrompt: Unifying pre-training and downstream tasks for graph neural networks. In WWW. 417–428.
[27] Xiaojun Ma, Qin Chen, et al. 2022. Meta-Weight Graph Neural Network: Push the Limits Beyond Global Homophily. In WWW. ACM, 1270–1280.
[28] Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?. In EMNLP. 11048–11064.
[29] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, et al. 2022. Training language models to follow instructions with human feedback. In NeurIPS.
[30] Alec Radford, Jong Wook Kim, Chris Hallacy, et al. 2021. Learning Transferable Visual Models From Natural Language Supervision. In ICML. PMLR, 8748–8763.
[31] Zezhi Shao et al. 2022. Pre-training Enhanced Spatial-temporal Graph Neural Network for Multivariate Time Series Forecasting. In KDD. ACM, 1567–1577.
[32] Kumar Shridhar, Alessandro Stolfo, and Mrinmaya Sachan. 2023. Distilling Reasoning Capabilities into Smaller Language Models. In ACL. 7059–7073.
[33] Mingchen Sun, Kaixiong Zhou, et al. 2022. GPPT: Graph pre-training and prompt tuning to generalize graph neural networks. In KDD. 1717–1727.
[34] Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, and Jihong Guan. 2023. All in One: Multi-Task Prompting for Graph Neural Networks. In KDD.
[35] Qiaoyu Tan, Ninghao Liu, Xiao Huang, Soo-Hyun Choi, Li Li, Rui Chen, and Xia Hu. 2023. S2GAE: Self-Supervised Graph Autoencoders are Generalizable Learners with Graph Masking. In WSDM. 787–795.
[36] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, et al. 2023. LLaMA: Open and Efficient Foundation Language Models. CoRR abs/2302.13971 (2023).
[37] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR abs/2307.09288 (2023).
[38] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, et al. 2017. Attention is all you need. In NeurIPS, Vol. 30.
[39] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, et al. 2018. Graph Attention Networks. In ICLR (Poster). OpenReview.net.
[40] Petar Velickovic, William Fedus, William L. Hamilton, Pietro Liò, et al. 2019. Deep Graph Infomax. In ICLR (Poster). OpenReview.net.
[41] Kuansan Wang, Zhihong Shen, et al. 2020. Microsoft Academic Graph: When experts are not enough. Quant. Sci. Stud. 1, 1 (2020), 396–413.
[42] Xiang Wang, Tinglin Huang, Dingxian Wang, et al. 2021. Learning Intents behind Interactions with Knowledge Graph for Recommendation. In WWW. 878–887.
[43] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, et al. 2019. Heterogeneous Graph Attention Network. In WWW. ACM, 2022–2032.
[44] Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, et al. 2023. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. CoRR abs/2306.04751 (2023).
[45] Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, and Hannaneh Hajishirzi. 2023. Self-Instruct: Aligning Language Models with Self-Generated Instructions. In ACL. 13484–13508.
[46] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Jeff Dean, William Fedus, et al. 2022. Emergent Abilities of Large Language Models. Trans. Mach. Learn. Res. 2022 (2022).
[47] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In NeurIPS.
[48] Wei Wei, Xubin Ren, Jiabin Tang, Qinyong Wang, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2023. LLMRec: Large Language Models with Graph Augmentation for Recommendation. CoRR abs/2311.00423 (2023).
[49] Zhihao Wen and Yuan Fang. 2023. Augmenting Low-Resource Text Classification with Graph-Grounded Pre-training and Prompting. In SIGIR.
[50] Qitian Wu, Chenxiao Yang, et al. 2023. DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion. In ICLR.
[51] Qitian Wu, Wentao Zhao, et al. 2023. NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification. CoRR abs/2306.08385 (2023).
[52] Jun Xia, Lirong Wu, Jintao Chen, et al. 2022. SimGRACE: A simple framework for graph contrastive learning without data augmentation. In WWW. 1070–1079.
[53] Lianghao Xia, Chao Huang, Tao Yu, Ben Kao, et al. 2023. Automated Self-Supervised Learning for Recommendation. In WWW. 992–1002.
[54] Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, et al. 2023. Baichuan 2: Open Large-scale Language Models. CoRR abs/2309.10305 (2023).
[55] Chenxiao Yang, Qitian Wu, and Junchi Yan. 2022. Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks. In NeurIPS.
[56] Haoran Yang, Hongxu Chen, Shirui Pan, Lin Li, Philip S Yu, and Guandong Xu. 2022. Dual space graph contrastive learning. In WWW. 1238–1247.
[57] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. CoRR abs/2305.10601 (2023).
[58] Yuning You, Tianlong Chen, Yang Shen, and Zhangyang Wang. 2021. Graph contrastive learning automated. In ICML. PMLR, 12121–12132.
[59] Yuning You, Tianlong Chen, Yongduo Sui, et al. 2020. Graph contrastive learning with augmentations. In NeurIPS, Vol. 33. 5812–5823.
[60] Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J. Kim. 2019. Graph Transformer Networks. In NeurIPS. 11960–11970.
[61] Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. 2019. Graph transformer networks. In NeurIPS, Vol. 32.
[62] Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, et al. 2023. GLM-130B: An Open Bilingual Pre-trained Model. In ICLR.
[63] Shichang Zhang, Yozen Liu, Yizhou Sun, and Neil Shah. 2022. Graph-less Neural Networks: Teaching Old MLPs New Tricks Via Distillation. In ICLR.
[64] Wen Zhang, Yushan Zhu, Mingyang Chen, et al. 2023. Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer. In WWW. 2581–2590.
[65] Yanfu Zhang et al. 2022. Robust Self-Supervised Structural Graph Neural Network for Social Network Prediction. In WWW. ACM, 1352–1361.
[66] Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. 2023. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv preprint arXiv:2304.10592 (2023).
[67] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021. Graph contrastive learning with adaptive augmentation. In WWW. 2069–2080.