
Exploring Large Language Models for Knowledge Graph Completion

Liang Yao, Jiazhen Peng
Tencent Inc.
Shenzhen, China

Chengsheng Mao, Yuan Luo
Northwestern University
Chicago, USA

arXiv:2308.13916v3 [cs.CL] 10 Sep 2023

Abstract

Knowledge graphs play a vital role in numerous artificial intelligence tasks, yet they frequently face the issue of incompleteness. In this study, we explore utilizing Large Language Models (LLMs) for knowledge graph completion. We consider triples in knowledge graphs as text sequences and introduce an innovative framework called Knowledge Graph LLM (KG-LLM) to model these triples. Our technique employs entity and relation descriptions of a triple as prompts and utilizes the response for predictions. Experiments on various benchmark knowledge graphs demonstrate that our method attains state-of-the-art performance in tasks such as triple classification and relation prediction. We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) outperforms recent ChatGPT and GPT-4.

1 Introduction

Large knowledge graphs (KGs) like FreeBase (Bollacker et al., 2008), YAGO (Suchanek et al., 2007), and WordNet (Miller, 1995) serve as a powerful foundation for numerous critical AI tasks, including semantic search, recommendation (Zhang et al., 2016), and question answering (Cui et al., 2017). A KG is generally a multi-relational graph with entities as nodes and relations as edges. Each edge is depicted as a triple (head entity, relation, tail entity), abbreviated as (h, r, t), signifying the relationship between two entities, for instance, (Steve Jobs, founded, Apple Inc.). Despite their effectiveness, knowledge graphs remain incomplete. This issue leads to the challenge of knowledge graph completion, which aims to evaluate the plausibility of triples that are not present in a knowledge graph.

A significant amount of research has been dedicated to knowledge graph completion. One prevalent method is knowledge graph embedding (Wang et al., 2017). However, most knowledge graph embedding models rely solely on structural information from observed triple facts, leading to issues arising from the sparsity of knowledge graphs. A number of studies integrate textual information to enhance knowledge representation (Socher et al., 2013; Xie et al., 2016; Xiao et al., 2017; Wang and Li, 2016; Xu et al., 2017; An et al., 2018). Our previous work KG-BERT (Yao et al., 2019) was the first to employ the pre-trained language model BERT (Devlin et al., 2019) to encode prior knowledge and contextual information. The KG-BERT model was extended by several recent studies (Wang et al., 2021, 2022; Lovelace and Rose, 2022) in both efficiency and performance, but the language models used in these works are relatively small.

Recently, large language models (Zhao et al., 2023) like ChatGPT and GPT-4 (OpenAI, 2023) have gained significant attention. Researchers find that scaling pre-trained language models often leads to improved model capacity on downstream tasks. These large-sized models show different behaviors from smaller models like BERT and display surprising abilities in solving a series of complex tasks.

In this study, we propose a novel method for knowledge graph completion using large language models. Specifically, we treat entities, relations, and triples as textual sequences and model knowledge graph completion as a sequence-to-sequence problem. We perform instruction tuning with open LLMs (LLaMA (Touvron et al., 2023) and ChatGLM (Du et al., 2022)) on these sequences for predicting the plausibility of a triple or a candidate entity/relation. The method achieves stronger performance in several KG completion tasks. Our source code is available at: https://github.com/yao8839836/kg-llm. Our contributions are summarized as follows:

• We propose a new language modeling method for knowledge graph completion. To the best of our knowledge, this is the first study to systematically investigate large language models for KG completion tasks.
• Results on several benchmarks show that our method achieves state-of-the-art results in triple classification and relation prediction. We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) can outperform recent ChatGPT and GPT-4.

2 Related Work

2.1 Knowledge Graph Completion

Comprehensive reviews of knowledge graph completion techniques have been carried out by Wang et al. (2017) and Ji et al. (2021). These techniques can be grouped into two categories based on their scoring functions for a triple (h, r, t): translational distance models like TransE (Bordes et al., 2013) and semantic matching models like DistMult (Yang et al., 2015). Convolutional neural networks have also demonstrated promising results in knowledge graph completion (Dettmers et al., 2018; Nguyen et al., 2018; Nathani et al., 2019).

The methods mentioned above perform knowledge graph completion using only the structural information found in triples. However, incorporating various types of external information, such as entity types, logical rules, and textual descriptions, can enhance performance (Wang et al., 2017; Ji et al., 2021). For textual descriptions, Socher et al. (2013) initially represented entities by averaging the word embeddings in their names, with the embeddings learned from an external corpus. Wang et al. (2014a) suggested embedding entities and words in the same vector space by aligning Wikipedia anchors with entity names. Xie et al. (2016) employed convolutional neural networks (CNNs) to encode word sequences in entity descriptions. There are also a number of studies in this line of work (Xiao et al., 2017; Wang and Li, 2016; Xu et al., 2017; An et al., 2018). Yao et al. (2019) proposed KG-BERT, which improves the above methods with pre-trained language models (PLMs). Recently, Wang et al. (2021, 2022) and Lovelace and Rose (2022) extended the cross-encoder in KG-BERT to a bi-encoder, which enhances performance and inference efficiency. Similar to this work, KGT5 (Saxena et al., 2022) and KG-S2S (Chen et al., 2022) treat KG completion as a sequence-to-sequence task. However, the pre-trained language models used in these studies are relatively small. Compared with these methods, our method utilizes more powerful large language models with emergent abilities not present in small models, such as in-context learning, instruction following, and step-by-step reasoning. These abilities are helpful for KG completion tasks.

2.2 LLMs with KG Completion

Recently, Zhao et al. (2023) presented a comprehensive survey of LLMs that describes knowledge completion as a basic evaluation task of LLMs. Two closely related studies (Xie et al., 2023; Zhu et al., 2023) evaluate ChatGPT and GPT-4 on a link prediction task in KGs. Our study is inspired by these works, but we further provide more comprehensive results for KG completion and perform instruction tuning on three tasks.

3 Method

3.1 Knowledge Graph Completion Tasks

In this section, we describe the three tasks in knowledge graph completion: triple classification, relation prediction, and entity (link) prediction, and how to transform them into simple prompt questions for an LLM to complete. The entire process is depicted in Figure 1.

Triple Classification. Given a triple (h, r, t), the task is to classify it as correct or incorrect. For example, given the triple <Steve Jobs, founded, Apple Inc.>, the task is to classify it as correct. The prompt would be "Is this true: Steve Jobs founded Apple Inc.?", and the ideal output of the LLM would be "Yes, this is true."

Relation Prediction. Given a head entity and a tail entity, the task is to predict the relationship between them. For example, given the head entity "Steve Jobs" and the tail entity "Apple Inc.", the task is to predict that their relationship is "founded". The prompt would be "What is the relationship between Steve Jobs and Apple Inc.? Please choose your answer from: was born in | founded | is citizen of | … | plays for.", and the desired response would be "Steve Jobs founded Apple Inc."

Entity (Link) Prediction. Given a head entity and a relationship, the task is to predict the tail entity related to the head entity; given a tail entity and a relationship, the task is to predict the head entity. For example, given the head entity "Steve Jobs" and the relationship "founded", the task is to predict the tail entity "Apple Inc.". The prompt would be "Steve Jobs founded" for asking the tail entity and "What/Who/When/Where/Why founded
Figure 1: Illustrations of Large Language Models (LLMs) for Knowledge Graph (KG) Completion. (The figure shows the input triple <Steve Jobs, founded, Apple Inc.>, the prompts formed for triple classification, relation prediction, and entity prediction, and the corresponding LLM outputs.)
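The prompt formations illustrated in Figure 1 can be written down as plain string templates. Below is a minimal sketch; the dictionary and placeholder names are illustrative, not taken from the released code.

```python
# Illustrative prompt templates for the three KG completion tasks in Figure 1.
TEMPLATES = {
    "triple_classification": "Is this true: {head} {relation} {tail}?",
    "relation_prediction": (
        "What is the relationship between {head} and {tail}? "
        "Please choose your answer from: {choices}."
    ),
    "entity_prediction_tail": "{head} {relation}",  # asks for the tail entity
}

prompt = TEMPLATES["triple_classification"].format(
    head="Steve Jobs", relation="founded", tail="Apple Inc."
)
print(prompt)  # Is this true: Steve Jobs founded Apple Inc.?
```

Filling a template with the entity and relation descriptions of a training triple yields the prompt–response pairs used for fine-tuning and inference.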

Apple Inc.?" for asking the head entity. The ideal response would be "Steve Jobs founded Apple Inc."

3.2 Instruction Tuning LLM with KG (KG-LLM)

In order to align LLMs with KG triples, we introduce KG-LLM, which instruction-tunes the pre-trained LLM to process KG data using the specific factual question-answering prompt paradigm. Specifically, we fine-tune two open LLMs: ChatGLM-6B (Du et al., 2022) with P-tuning v2 (Liu et al., 2021) and LLaMA (Touvron et al., 2023) with LoRA (Hu et al., 2021), using prompts and responses of training triples in a KG. We name our fine-tuned models KG-ChatGLM-6B and KG-LLaMA (7B and 13B).

4 Experiments

4.1 Datasets and Settings

We ran our experiments on four widely used benchmark KG datasets: WN11 (Socher et al., 2013), FB13 (Socher et al., 2013), WN18RR and YAGO3-10 (Dettmers et al., 2018). Table 1 provides statistics of all datasets we used. We used the same entity and relation text descriptions as in (Yao et al., 2019). Due to the access limit of GPT-4, we randomly selected 100 test examples from FB13 and YAGO3-10 for evaluation; we name these subsets FB13-100 and YAGO3-10-100.

Dataset     # Ent     # Rel   # Train     # Dev   # Test
WN11        38,696    11      112,581     2,609   10,544
FB13        75,043    13      316,232     5,908   23,733
WN18RR      40,943    11      86,835      3,034   3,134
YAGO3-10    123,182   37      1,079,040   5,000   5,000

Table 1: Summary statistics of datasets.

Method                              WN11    FB13    Avg.
NTN (Socher et al., 2013)           86.2    90.0    88.1
TransE (Wang et al., 2014b)         75.9    81.5    78.7
TransH (Wang et al., 2014b)         78.8    83.3    81.1
TransR (Lin et al., 2015)           85.9    82.5    84.2
TransD (Ji et al., 2015)            86.4    89.1    87.8
TEKE (Wang and Li, 2016)            86.1    84.2    85.2
TransG (Xiao et al., 2016)          87.4    87.3    87.4
TranSparse-S (Ji et al., 2016)      86.4    88.2    87.3
DistMult (Zhang et al., 2018)       87.1    86.2    86.7
DistMult-HRS (Zhang et al., 2018)   88.9    89.0    89.0
AATE (An et al., 2018)              88.0    87.2    87.6
ConvKB (Nguyen et al., 2018)        87.6    88.8    88.2
DOLORES (Wang et al., 2018)         87.5    89.3    88.4
DKRL (BERT)                         87.3    79.8    83.6
KG-BERT(a) (Yao et al., 2019)       93.5    90.4    91.9
KGT5                                72.8    66.3    69.6
LLaMA-7B                            21.1    9.1     15.1
LLaMA-13B                           28.1    17.6    22.9
KG-LLaMA-7B                         95.5    89.2    92.4
KG-LLaMA-13B                        95.6    90.2    92.9

Table 2: Triple classification accuracy (in percentage) for different methods. The baseline results with citations are obtained from corresponding papers.

Method          FB13-100
ChatGPT         0.90
GPT-4           0.94
LLaMA-7B        0.14
LLaMA-13B       0.16
KG-LLaMA-7B     0.93
KG-LLaMA-13B    0.94

Table 3: Triple classification accuracy on 100 test instances of FB13 for different LLMs.

We compare KG-LLM with multiple KG embedding methods: TransE and its extensions TransH (Wang et al., 2014b), TransD (Ji et al., 2015), TransR (Lin et al., 2015), TransG (Xiao et al., 2016) and TranSparse (Ji et al., 2016); DistMult and its extension DistMult-HRS (Zhang et al., 2018); the neural tensor network NTN (Socher et al., 2013); CNN models: ConvKB (Nguyen et al., 2018); contextualized KG embeddings: DOLORES (Wang et al., 2018); and KG embeddings with textual information: TEKE (Wang and Li,
2016), DKRL (Xie et al., 2016) (BERT encoder), and AATE (An et al., 2018). Pre-trained language models: KG-BERT (Yao et al., 2019), StAR (Wang et al., 2021) and KGT5 (Saxena et al., 2022). We also compare with two state-of-the-art LLMs: ChatGPT and GPT-4.

For instruction tuning and inference of ChatGLM-6B, we used the default parameter settings in its public implementation. For LLaMA, we use the implementation in the Transformers Python library. More detailed settings can be found in our code. For KG completion models, we use the results in their original papers or reproduce the results using the default configurations in their implementations. For KGT5, we use our prompts and responses for training; other settings are the same as in its implementation. We input our designed prompts to the web interface of GPT-4 and ChatGPT to obtain results.

Method                      WN18RR    YAGO3-10    YAGO3-10-100
KG-BERT(a)                  0.1102    –           –
StAR (Wang et al., 2021)    0.2430    –           –
KGT5                        0.1011    0.0484      0.12
ChatGPT                     –         –           0.22
GPT-4                       –         –           0.24
KG-ChatGLM-6B               0.1613    0.0455      0.11
LLaMA-7B                    0.0849    0.0254      0.03
LLaMA-13B                   0.0991    0.0276      0.01
KG-LLaMA-7B                 0.2415    0.0782      0.16
KG-LLaMA-13B                0.2559    0.0872      0.13

Table 4: Entity (link) prediction Hits@1 for different methods. The baseline results with citations are obtained from corresponding papers.

Method           YAGO3-10    YAGO3-10-100
KG-BERT(b)       0.6816      –
KGT5             0.5714      0.60
ChatGPT          –           0.39
GPT-4            –           0.56
ChatGLM-6B       0.0658      0.07
KG-ChatGLM-6B    0.5662      0.58
LLaMA-7B         0.0348      0.13
LLaMA-13B        0.0040      0.01
KG-LLaMA-7B      0.7028      0.71
KG-LLaMA-13B     0.6968      0.64

Table 5: Relation prediction Hits@1 for different methods.

4.2 Results

Table 2 presents triple classification accuracy scores on WN11 and FB13. If the ground truth is true and the response contains affirmative words like "Yes" and "yes", or if the label is false and the response contains negative words like "No"/"no"/"not"/"n't", we label the response as correct. We find that LLaMA-7B and LLaMA-13B perform poorly on both WN11 and FB13. However, when instructed to process KG data, KG-LLaMA shows significant improvement compared to LLaMA. KG-LLaMA-13B achieves the highest accuracy scores on the two KG datasets. Table 3 presents the accuracy scores of different LLMs on the 100 test instances of FB13. We manually label the responses of the different LLMs as correct or wrong. We find that KG-LLaMA performs well: its score is higher than that of ChatGPT and equal to that of GPT-4.

The link prediction Hits@1 scores of various pre-trained language models on WN18RR and YAGO3-10 are presented in Table 4. The scores are averaged over head and tail entity predictions. In the case of LLMs, the response is considered correct if it contains the label words. The results indicate a promising outcome for our paradigm, as KG-LLaMA shows significant improvements due to instruction tuning.

Table 5 demonstrates that KG-LLaMA-7B produces the best relation prediction Hits@1 on YAGO3-10, even better than GPT-4. KG-ChatGLM-6B also shows much better results than its base model. This indicates that instruction tuning leads the LLM to extract knowledge stored in model parameters more efficiently.

Table 6 illustrates the differences in responses between an LLM and its KG-LLM counterpart given the same input. We found the answers of the original models are not satisfactory, while instruction tuning can teach the models to answer like training triples and to be more aware of a fact.

The main reasons why KG-LLM performs well are: 1) LLMs contain more general knowledge compared with smaller pre-trained language models; 2) instruction tuning fills the gap between the pre-trained weights in LLMs and KG triple descriptions.

5 Conclusions

In this work, we propose a novel KG completion method termed KG-LLM. Our approach attains state-of-the-art performance in KG completion tasks such as triple classification and relation prediction. For future work, we plan to apply KG-LLM as a knowledge-enhanced language model to other NLP tasks and to incorporate the structure information of KGs. Furthermore, we are going to explore more effective prompt engineering and in-context instructions for LLMs.
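The response-labeling rule for triple classification described in Section 4.2 can be sketched as follows. The word lists come from the text above; the function itself is an illustrative reconstruction, not the authors' evaluation script.

```python
def label_response(response: str, gold_is_true: bool) -> bool:
    """Return True when the LLM response is counted as correct under the
    keyword-matching rule of Section 4.2 (illustrative reconstruction)."""
    affirmative = ("Yes", "yes")
    negative = ("No", "no", "not", "n't")
    words = affirmative if gold_is_true else negative
    return any(w in response for w in words)

print(label_response("Yes, this is true.", gold_is_true=True))      # True
print(label_response("No, this is not true.", gold_is_true=False))  # True
```

A simple substring check like this can over-trigger (e.g., "not" inside "cannot"), which is one reason the 100-instance FB13 subset was also labeled manually.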
Input: Is this true: Everett T Moore profession Librarian? (Label: True)
  GPT-4: I cannot verify specific personal information about individuals who are not public figures.
  ChatGPT: I'm sorry, but I don't have enough information to confirm whether Everett T Moore is a librarian or not.
  LLaMA-7B: I was wondering if anyone could tell me if this is true.
  LLaMA-13B: I'm a librarian at Everett T Moore Middle School in San Antonio, TX.
  KG-LLaMA-7B: Yes, this is true.
  KG-LLaMA-13B: Yes, this is true.
  ChatGLM-6B: Yes, Everett T Moore is a profession Librarian.
  KG-ChatGLM-6B: Yes, this is true.

Input: Josip Škorić has gender (Label: male)
  GPT-4: Josip Škorić is a male name. Josip is a Croatian form of the name Joseph, which is a masculine name.
  ChatGPT: Josip Škorić is a human being and therefore has a gender. However, I couldn't find any information about the gender of Josip Škorić.
  LLaMA-7B: Josip Škorić has gender issues. I'm a 22-year-old guy from Croatia, who has gender issues.
  LLaMA-13B: Josip Škorić has undergone gender reassignment surgery.
  KG-LLaMA-7B: male
  KG-LLaMA-13B: male
  ChatGLM-6B: As an AI language model, I do not have access to information about specific individuals' genders or their personal identity, including Josip Škorić's.
  KG-ChatGLM-6B: male

Table 6: Example outputs from different LLMs. The first input is taken from FB13-100 and the second from YAGO3-10-100.
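For entity and relation prediction, Section 4.2 counts an LLM response as correct when it contains the label words, and Hits@1 is the fraction of such responses. A minimal sketch of this containment check (illustrative, not the authors' evaluation code):

```python
def hits_at_1(responses, labels):
    """Fraction of responses that contain their gold label words
    (containment rule from Section 4.2; illustrative sketch)."""
    hits = sum(1 for response, label in zip(responses, labels) if label in response)
    return hits / len(responses)

score = hits_at_1(
    ["Steve Jobs founded Apple Inc.", "The founder was someone else."],
    ["Apple Inc.", "Apple Inc."],
)
print(score)  # 0.5
```

For entity prediction, the reported number would be this score averaged over the head-entity and tail-entity query sets.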

Limitations

Although our method has demonstrated promising results in KG completion using LLMs, it currently lacks the ability to handle KGs without textual names or descriptions for entities and relations. Additionally, we have not yet fully utilized the KG structure information, which has the potential to significantly improve results, particularly in the entity prediction task.

References

Bo An, Bo Chen, Xianpei Han, and Le Sun. 2018. Accurate text-enhanced knowledge graph representation learning. In NAACL, pages 745–755.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, pages 1247–1250.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS, pages 2787–2795.

Chen Chen, Yufei Wang, Bing Li, and Kwok-Yan Lam. 2022. Knowledge is flat: A seq2seq generative framework for various knowledge graph completion. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4005–4017.

Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, and Wei Wang. 2017. KBQA: learning question answering over qa corpora and knowledge bases. Proceedings of the VLDB Endowment, 10(5):565–576.

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2d knowledge graph embeddings. In AAAI, pages 1811–1818.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186.

Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2022. Glm: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 320–335.

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.

Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In ACL, pages 687–696.

Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In AAAI.

Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and S Yu Philip. 2021. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE transactions on neural networks and learning systems, 33(2):494–514.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI.

Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2021. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. arXiv preprint arXiv:2110.07602.

Justin Lovelace and Carolyn Rose. 2022. A framework for adapting pre-trained language models to knowledge graph completion. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5937–5955.

George A Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.

Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. 2019. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4710–4723.

Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen, and Dinh Phung. 2018. A convolutional neural network-based model for knowledge base completion and its application to search personalization. Semantic Web.

OpenAI. 2023. Gpt-4 technical report.

Apoorv Saxena, Adrian Kochsiek, and Rainer Gemulla. 2022. Sequence-to-sequence knowledge graph completion and question answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2814–2828.

Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS, pages 926–934.

Fabian M Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In WWW, pages 697–706. ACM.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Ying Wang, and Yi Chang. 2021. Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021, pages 1737–1748.

Haoyu Wang, Vivek Kulkarni, and William Yang Wang. 2018. Dolores: Deep contextualized knowledge graph embeddings. arXiv preprint arXiv:1811.00147.

Liang Wang, Wei Zhao, Zhuoyu Wei, and Jingming Liu. 2022. Simkgc: Simple contrastive knowledge graph completion with pre-trained language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4281–4294.

Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE TKDE, 29(12):2724–2743.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014a. Knowledge graph and text jointly embedding. In EMNLP.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014b. Knowledge graph embedding by translating on hyperplanes. In AAAI.

Zhigang Wang and Juan-Zi Li. 2016. Text-enhanced representation learning for knowledge graph. In IJCAI, pages 1293–1299.

Han Xiao, Minlie Huang, Lian Meng, and Xiaoyan Zhu. 2017. SSP: semantic space projection for knowledge graph embedding with text descriptions. In AAAI.

Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2016. TransG: A generative model for knowledge graph embedding. In ACL, volume 1, pages 2316–2325.

Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, and Maosong Sun. 2016. Representation learning of knowledge graphs with entity descriptions. In AAAI.

Xin Xie, Zhoubo Li, Xiaohan Wang, Yuqi Zhu, Ningyu Zhang, Jintian Zhang, Siyuan Cheng, Bozhong Tian, Shumin Deng, Feiyu Xiong, and Huajun Chen. 2023. Lambdakg: A library for pre-trained language model-based knowledge graph embeddings.

Jiacheng Xu, Xipeng Qiu, Kan Chen, and Xuanjing Huang. 2017. Knowledge graph representation with jointly structural and textual encoding. In IJCAI, pages 1318–1324.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In ICLR.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Kg-bert: Bert for knowledge graph completion. arXiv preprint arXiv:1909.03193.

Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. 2016. Collaborative knowledge base embedding for recommender systems. In KDD, pages 353–362. ACM.

Zhao Zhang, Fuzhen Zhuang, Meng Qu, Fen Lin, and Qing He. 2018. Knowledge graph embedding with hierarchical relation structure. In EMNLP, pages 3198–3207.

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.

Yuqi Zhu, Xiaohan Wang, Jing Chen, Shuofei Qiao, Yixin Ou, Yunzhi Yao, Shumin Deng, Huajun Chen, and Ningyu Zhang. 2023. Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities.

A Example Input
An example input for LLM relation prediction from YAGO3-10: What is the relationship between Sergio Padt and Jong Ajax? Please choose your answer from: is known for|is citizen of|has currency|has child|deals with|has academic advisor|has gender|wrote music for|acted in|died in|has capital|works at|lives in|is affiliated to|has musical role|is located in|happened in|has official language|created|has won prize|influences|is politician of|is connected to|owns|graduated from|was born in|is leader of|exports|is interested in|participated in|directed|imports|edited|has neighbor|has website|is married to|plays for.
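A free-text response to such a prompt can be mapped back onto one of the "|"-separated candidates with a longest-match rule. The sketch below is illustrative post-processing; the function and variable names are not from the released code.

```python
def match_relation(response, choices_line):
    """Return the longest candidate relation (from a '|'-separated list)
    that appears verbatim in the LLM response, or None if no candidate
    matches (illustrative sketch)."""
    candidates = [c.strip() for c in choices_line.split("|")]
    found = [c for c in candidates if c in response]
    return max(found, key=len) if found else None

choices = "is known for|is citizen of|plays for|was born in"
print(match_relation("Sergio Padt plays for Jong Ajax.", choices))  # plays for
```

Preferring the longest match avoids spurious hits when one candidate relation is a substring of another.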
