KGLM: Integrating Knowledge Graph Structure in Language Models For Link Prediction

arXiv:2211.02744v2 [cs.CL] 17 May 2023

Abstract

The ability of knowledge graphs to represent complex relationships at scale has led to their adoption in many applications, including knowledge representation, question-answering, and recommendation systems. Knowledge graphs are often incomplete in the information they represent, necessitating the need for knowledge graph completion tasks. Pre-trained and fine-tuned language models have shown promise in these tasks, although these models ignore the intrinsic information encoded in the knowledge graph, namely the entity and relation types. In this work, we propose the Knowledge Graph Language Model (KGLM) architecture, where we introduce a new entity/relation embedding layer that learns to differentiate distinctive entity and relation types, therefore allowing the model to learn the structure of the knowledge graph. We show that further pre-training the language models with this additional embedding layer using the triples extracted from the knowledge graph, followed by the standard fine-tuning phase, sets a new state-of-the-art performance for the link prediction task on the benchmark datasets.

[Figure 1: Sample knowledge graph with 6 triples. The graph contains three unique entity types (circle for person, triangle for company, and square for location) and 5 unique relation types, or 10 if considering both the forward and inverse relations. The task of knowledge graph completion is to complete the missing links in the graph, e.g., (Bill Gates, bornIn?, Washington), using the existing knowledge graph.]

1 Introduction

Knowledge graph (KG) is defined as a directed, multi-relational graph where entities (nodes) are connected with one or more relations (edges) (Wang et al., 2017). It is represented with a set of triples, where a triple consists of (head entity, relation, tail entity), or (h, r, t) for short, for example (Bill Gates, founderOf, Microsoft) as shown in Figure 1. Due to their effectiveness in identifying patterns among data and gaining insights into the mechanisms of action, associations, and testable hypotheses (Li and Chen, 2014; Silvescu et al., 2012), both manually curated KGs like DBpedia (Auer et al., 2007), WordNet (Miller, 1998), KIDS (Youn et al., 2022), and CARD (Alcock et al., 2020), and automatically curated ones like FreeBase (Bollacker et al., 2008), Knowledge Vault (Dong et al., 2014), and NELL (Carlson et al., 2010) exist. However, these KGs often suffer from incompleteness. For example, 71% of the people in FreeBase have no known place of birth (West et al., 2014). To address this issue, knowledge graph completion (KGC) methods aim at connecting the missing links in the KG.

Graph feature models like the path ranking algorithm (PRA) (Lao and Cohen, 2010; Lao et al., 2011) attempt to solve KGC tasks by extracting features from the observed edges over the KG to predict the existence of a new edge (Nickel et al., 2015). For example, the existence of the path Jennifer Gates --daughterOf--> Melinda French <--divorcedWith-- Bill Gates in Figure 1 can be used as a clue to infer the triple (Jennifer Gates, daughterOf, Bill Gates). Other popular types of models are latent feature models such as TransE (Bordes et al., 2013), TransH (Wang et al., 2014), and RotatE (Sun et al., 2019), where entities and relations are converted into a latent space using embeddings. TransE, a representative latent feature model, models the relationship between the entities by interpreting it as a translational operation. That is, the model optimizes the embeddings by enforcing the vector operation of head entity embedding h plus the relation embedding r to be close to the tail entity embedding t for a given fact in the KG, or simply h + r ≈ t.

Recently, pre-trained language models like BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have shown state-of-the-art performance across natural language processing (NLP) tasks. As a natural extension, models like KG-BERT (Yao et al., 2019) and BERTRL (Zha et al., 2021) that utilize these pre-trained language models by treating a triple in the KG as a textual sequence, e.g., (Bill Gates, founderOf, Microsoft) as 'Bill Gates founder of Microsoft', have also shown state-of-the-art results on the downstream KGC tasks. Although such textual encoding (Wang et al., 2021) models are generalizable to unseen entities or relations (Zha et al., 2021), they still fail to learn the intrinsic structure of the KG as the models are only trained on the textual sequence. To solve this issue, a hybrid approach like StAR (Wang et al., 2021) has recently been proposed to take advantage of both latent feature models and textual encoding models by enforcing a translation-based graph embedding approach to train the textual encoders. Yet, current textual encoding models still suffer from entity ambiguity problems (Cucerzan, 2007), where an entity Apple, for example, can refer to either the company Apple Inc. or the fruit. Moreover, there is no way to distinguish the forward relation (Jennifer Gates, daughterOf, Melinda French) from the inverse relation (Melinda French, daughterOf^-1, Jennifer Gates).

In this paper, we propose the Knowledge Graph Language Model (KGLM) (Figure 2), a simple yet effective language model pre-training approach that learns from both the textual and structural information of the knowledge graph. We continue pre-training a language model that has already been pre-trained on other large natural language corpora, using a corpus generated by converting the triples in the knowledge graph into textual sequences, while enforcing the model to better understand the underlying graph structure by adding an additional entity/relation-type embedding layer. Testing our model on the WN18RR dataset for the link prediction task shows that our model improved the mean rank by 21.2% compared to the previous state-of-the-art method (40.18 vs. 51, respectively). All code and instructions on how to reproduce the results are available online.¹

¹ https://github.com/ibpa/KGLM

[Figure 2: Proposed pre-training approach of the KGLM. First, both the forward and inverse triples are extracted from the knowledge graph to serve as the pre-training corpus. We then continue pre-training the language model, RoBERTa in our case, using the masked language model training objective, with an additional entity/relation-type embedding layer. The entity/relation-type embedding scheme shown here corresponds to the KGLM_GER, the most fine-grained version where both the entity and relation types are considered unique. Note that the inverse relation denoted by -1 is different from its forward counterpart. For demonstration purposes, we assume all entities and relations to have a single token.]

2 Background

Link Prediction. The link prediction (LP) task, one of the commonly researched knowledge graph completion tasks, attempts to predict the missing head entity (h) or tail entity (t) of a triple (h, r, t) given a KG G = (E, R), where h, t ∈ E (the set of all entities) and r ∈ R (the set of all relations). Specifically, given a single positive test triple (h, r, t), its corresponding link prediction test dataset can be constructed by corrupting either the head or the tail entity in the filtered setting (Bordes
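The translational objective h + r ≈ t can be sketched in a few lines of NumPy. The vectors below are illustrative placeholders rather than trained embeddings; in practice they would be learned by minimizing a margin-based ranking loss.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.
    Scores closer to 0 indicate a more plausible triple."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
# Toy vectors standing in for trained embeddings of a known fact,
# e.g., (Bill Gates, founderOf, Microsoft).
h = rng.normal(size=dim)
r = rng.normal(size=dim)

t_true = h + r + rng.normal(scale=0.01, size=dim)  # near h + r, as trained
t_random = rng.normal(size=dim)                    # an unrelated entity

assert transe_score(h, r, t_true) > transe_score(h, r, t_random)
```

After training, candidate tails for a query (h, r, ?) are simply ranked by this score.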
Table 1: Statistics of the benchmark knowledge graphs used for link prediction.

Dataset     # ent    # rel   # train   # val    # test
WN18RR      40,943   11      86,835    3,034    3,134
FB15k-237   14,951   237     272,115   17,535   20,466
UMLS        135      46      5,216     652      661

et al., 2013) as

D_LP^(h,r,t) = {(h, r, t′) | t′ ∈ (E − {h, t}) ∧ (h, r, t′) ∉ D}
             ∪ {(h′, r, t) | h′ ∈ (E − {h, t}) ∧ (h′, r, t) ∉ D}
             ∪ {(h, r, t)},   (1)
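The filtered construction of Eq. (1) can be sketched directly with sets; the entity and triple names below are illustrative.

```python
def lp_candidates(triple, entities, known_triples):
    """Filtered link-prediction candidate set of Eq. (1): corrupt the
    tail, then the head, dropping corruptions that are themselves known
    triples, and keep the positive triple itself."""
    h, r, t = triple
    candidates = {(h, r, t2) for t2 in entities - {h, t}
                  if (h, r, t2) not in known_triples}
    candidates |= {(h2, r, t) for h2 in entities - {h, t}
                   if (h2, r, t) not in known_triples}
    candidates.add((h, r, t))
    return candidates

entities = {"BillGates", "MelindaFrench", "JenniferGates", "Microsoft"}
known = {("BillGates", "founderOf", "Microsoft"),
         ("MelindaFrench", "founderOf", "Microsoft")}
cands = lp_candidates(("BillGates", "founderOf", "Microsoft"),
                      entities, known)
# 2 tail corruptions + 1 surviving head corruption + the positive = 4;
# the known triple (MelindaFrench, founderOf, Microsoft) is filtered out.
assert len(cands) == 4
```

The positive triple is then ranked against every candidate in this set.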
where D = D_train ∪ D_val ∪ D_test is the complete dataset. Evaluation of the link prediction task is measured with mean rank (MR), mean reciprocal rank (MRR), and hits@N (Rossi et al., 2021). MR is defined as

MR = ( Σ_{(h,r,t) ∈ D_test} rank((h, r, t) | D_LP^(h,r,t)) ) / |D_test|,   (2)

where rank(·|·) is the rank of the positive triple among its corrupted versions and |D_test| is the number of positive test triples. MRR is the same as MR except that the reciprocal rank 1/rank(·|·) is used. Hits@N is defined as

hits@N = ( Σ_{(h,r,t) ∈ D_test} 1[rank((h, r, t) | D_LP^(h,r,t)) ≤ N] ) / |D_test|,   (3)

where 1[·] is the indicator function and N ∈ {1, 3, 10} is commonly reported. Higher MRR and hits@N values are better, whereas, for MR, lower values denote higher performance.

3 Proposed Approach

In this work, we propose to continue pre-training, instead of pre-training from scratch, the language model RoBERTa_LARGE (Liu et al., 2019) that has already been trained on English-language corpora of varying sizes and domains, using both the forward and inverse knowledge graph textual sequences (Figure 2). Following the convention used in KG-BERT and StAR (see Appendix A), we use a textual representation of a given triple, e.g., (Bill Gates, founderOf, Microsoft) as 'Bill Gates founder of Microsoft', to generate the pre-training corpus. However, instead of extracting only the forward triple as done in the previous work, we extract both the forward and inverse versions of the triple, e.g., (Jennifer Gates, daughterOf, Bill Gates) and (Bill Gates, daughterOf^-1, Jennifer Gates), where the ^-1 notation denotes the inverse direction of the corresponding relation.

To enforce the model to learn the knowledge graph structure, we introduce a new embedding layer, the entity/relation-type embedding (ER-type embedding), in addition to the pre-existing token and position embeddings of RoBERTa, as shown in Figure 2. This additional layer aims to embed the tokens in the input sequence with their corresponding entity/relation-type, where the set of entities E in the knowledge graph can have t_E different entity types depending on the schema of the knowledge graph (e.g., t_E = 3 for person, company, and location in Figure 1). Note that many knowledge graphs do not specify the entity types, in which case t_E = 1. For the set of relations R, there exist t_R = 2n_R relation types, where n_R is the number of unique relations in the knowledge graph and the multiplier of 2 comes from the forward and inverse directions (e.g., t_R = 10 for the sample knowledge graph in Figure 1).

In this work, we propose three different variations of ER-type embeddings. KGLM_Base is the simplified version where all entities are assigned a single entity type and relations are assigned either the forward or inverse relation type regardless of their unique relation types, resulting in a total of 3 ER-type embeddings. KGLM_GR is a version with granular relation types, with t_R + 1 ER-type embeddings. KGLM_GER is the most granular version, where we utilize all t_E + t_R ER-type embeddings; in other words, all entity types as well as all relation types, including both directions, are considered.

To be specific, we convert a triple (h, r, t) to a sequence of tokens w^(h,r,t) = ⟨[s] w_a^h w_b^r w_c^t [/s] : a ∈ {1..|h|}, b ∈ {1..|r|}, c ∈ {1..|t|}⟩ ∈ R^(|h|+|r|+|t|+2), where [s] and [/s] are special tokens denoting the beginning and end of the sequence, respectively. The input to the RoBERTa model is then constructed by adding the ER-type embedding t^(h,r,t) and the position embeddings p^(h,r,t) to the
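Given the rank of each positive test triple among its candidates, the three evaluation metrics above can be computed directly; the ranks below are illustrative.

```python
def mr(ranks):
    """Mean rank, Eq. (2): average rank of the positive triples."""
    return sum(ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, n):
    """hits@N, Eq. (3): fraction of positives ranked within the top N."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 3, 12, 2, 40]          # rank of each positive test triple
assert mr(ranks) == 11.6
assert hits_at(ranks, 10) == 0.6   # 3 of the 5 ranks are within the top 10
```

Lower MR and higher MRR/hits@N indicate better performance, as noted above.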
Table 2: Link prediction results on the benchmark datasets WN18RR, FB15k-237, and UMLS. Bold numbers denote the best performance for a given metric and class of models. Underlined numbers denote the best performance for a given metric regardless of the model type. Note that we do not report KGLM_GER performance since the tested datasets do not specify entity types in their schema.

Table 3: Ablation results on WN18RR comparing the two claims against KGLM_GR.

Model     Continue      ER-type embeddings       Hits@1   Hits@3   Hits@10   MR      MRR
          pre-training  Pre-train   Fine-tune
Claim 1   o             x           x            0.331    0.529    0.728     53.5    0.462
Claim 2   x             -           o            0.322    0.489    0.672     66.4    0.439
KGLM_GR   o             o           o            0.330    0.538    0.741     40.18   0.467
and AdamW optimizer (Loshchilov and Hutter, 2017). For fine-tuning training data, we sampled 10 negative triples for each positive triple by corrupting the head and tail entity 5 times each. We used the validation set to find the optimal learning rate ∈ {1e-06, 5e-07}, batch size ∈ {16, 32}, epochs ∈ {1, 2, 3, 4, 5} for WN18RR and FB15k-237 and ∈ {25, 50, 75, 100} for UMLS, and α from 0.0 to 1.0 with an increment of 0.1. For all experiments, we set α = 0.5 based on the WN18RR validation set performance. Both pre-training and fine-tuning were performed on 3 × Nvidia Quadro RTX 6000 GPUs in a distributed manner using 16-bit mixed precision and the DeepSpeed (Rasley et al., 2020; Rajbhandari et al., 2020) library in the stage-2 setting. We used the Transformers library (Wolf et al., 2019).

4.3 Link Prediction Results

The hypothesis behind the KGLM was that learning the ER-type embedding layers in the pre-training stage, using the corpus generated from the knowledge graph, followed by fine-tuning, yields the best performance. To test this hypothesis, we broke it down into two separate claims. For the first claim, we only continued pre-training RoBERTa_LARGE followed by fine-tuning, without the ER-type embeddings. This test removes the contribution of the ER-type embeddings and solely tests the performance gained by further pre-training the model with the knowledge graph as input. Table 3 shows that claim 1 falls behind KGLM_GR in all metrics except for hits@1 (0.331 vs. 0.330, respectively). For the second claim, we did not continue pre-training and instead used the RoBERTa_LARGE pre-trained weights as-is; we then learned the ER-type embeddings in the fine-tuning stage. This test shows whether the ER-type embeddings can be learned only during the fine-tuning stage. Table 3 shows that KGLM_GR outperforms the second claim on all metrics. This result shows that the combination of these two claims works in a non-linear fashion to maximize performance.

The results of performing link prediction on the benchmark datasets are shown in Table 2. Compared to StAR, which had the best performance on MR and hits@10 on WN18RR, KGLM_GR outperformed it on all metrics, with a 21.2% improved MR (40.18 vs. 51, respectively) and a 4.5% increased hits@10 (0.741 vs. 0.709, respectively). Although still inferior to the graph embedding approaches, KGLM_GR has a 35.8% improved hits@1 compared to the best language model-based approach, StAR (0.330 vs. 0.243, respectively). Across all model types, KGLM_GR has the best performance on all metrics for WN18RR except for hits@1. Although we did not observe any improvement compared to StAR for the FB15k-237 dataset, we had the best performance on all metrics for UMLS, with a 21.2% improved MR compared to ComplEx (1.19 vs. 1.51, respectively). KGLM_GR outperformed KGLM_Base in all metrics.

5 Conclusion

In this work, we presented KGLM, which introduces a new entity/relation (ER)-type embedding layer for learning the structure of the knowledge graph. Compared to the previous language model-based methods that only fine-tune for a given task, we found that learning the ER-type embeddings in the pre-training stage followed by fine-tuning resulted in better performance. In future work, we plan to further test the version of KGLM that takes into account entity types, KGLM_GER, on domain-specific knowledge graphs like KIDS (Youn et al., 2022) that specify entity types in their schema.

Limitations

Although KGLM outperforms state-of-the-art models when the training set includes full sentences (e.g., UMLS and WN18RR), the model performed similarly to the state-of-the-art in cases where the training dataset had only ontological relationships, such as the /music/artist/origin relation present in the FB15k-237 dataset. One major limitation of the proposed method is the long training and inference time, which we plan to alleviate by adopting Siamese-style textual encoders (Wang et al., 2021; Li et al., 2022) in future work.

Ethics Statement

The authors declare no competing interests.

References

Silviu Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716.

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
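The negative sampling used above for fine-tuning (5 head and 5 tail corruptions per positive triple) might be sketched as follows; the entity names are illustrative placeholders.

```python
import random

def sample_negatives(triple, entities, known_triples, per_side=5, seed=0):
    """Corrupt the head and the tail `per_side` times each, skipping
    corruptions that are themselves known (positive) triples."""
    h, r, t = triple
    rng = random.Random(seed)
    pool = list(entities - {h, t})
    negatives = []
    for corrupt_head in (True, False):
        count = 0
        while count < per_side:
            e = rng.choice(pool)
            neg = (e, r, t) if corrupt_head else (h, r, e)
            if neg not in known_triples:
                negatives.append(neg)
                count += 1
    return negatives

entities = {f"e{i}" for i in range(20)}
negs = sample_negatives(("e0", "r0", "e1"), entities,
                        {("e0", "r0", "e1")})
assert len(negs) == 10  # 5 corrupted heads + 5 corrupted tails
```

Each positive thus contributes an 11-triple mini-batch (1 positive, 10 negatives) to the binary triple-classification objective.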
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26.

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In Twenty-Fourth AAAI Conference on Artificial Intelligence.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.

George A Miller. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2015. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33.

Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. 2020. ZeRO: Memory optimizations toward training trillion parameter models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–16. IEEE.

Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. 2020. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3505–3506.

Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. 2021. Knowledge graph embedding for link prediction: A comparative analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(2):1–49.

Adrian Silvescu, Doina Caragea, and Anna Atramentov. 2012. Graph databases. Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University.

Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.

Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66.

Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung, et al. 2019. A capsule network-based embedding model for knowledge graph completion and search personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2180–2189.

Bo Wang, Tao Shen, Guodong Long, Tianyi Zhou, Ying Wang, and Yi Chang. 2021. Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021, pages 1737–1748.

Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28.

Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge base completion via search-based question answering. In Proceedings of the 23rd International Conference on World Wide Web, pages 515–526.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. KG-BERT: BERT for knowledge graph completion. arXiv preprint arXiv:1909.03193.

Jason Youn, Navneet Rai, and Ilias Tagkopoulos. 2022. Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes. Nature Communications, 13(1):1–11.

Hanwen Zha, Zhiyu Chen, and Xifeng Yan. 2021. Inductive relation prediction by BERT. arXiv preprint arXiv:2103.07102.

A Previous Work

A.1 KG-BERT

KG-BERT (Yao et al., 2019) is a fine-tuning method that utilizes the base version of the pre-trained language model BERT (BERT_BASE) (Devlin et al., 2018) as an encoder for entities and relations of the knowledge graph. Specifically, KG-BERT first converts a triple (h, r, t) to a sequence of tokens w^(h,r,t) = ⟨[CLS] w_a^h [SEP] w_b^r [SEP] w_c^t [SEP] : a ∈ {1..|h|}, b ∈ {1..|r|}, c ∈ {1..|t|}⟩, where w_n denotes the nth token of either entity or relation, [CLS] and [SEP] are the special tokens, while |h|, |r|, and |t| denote the number of tokens in the head entity, relation, and tail entity, respectively. This textual token sequence is then converted to a sequence of token embeddings w^(h,r,t) ∈ R^(d×(|h|+|r|+|t|+4)), where d is the dimension of the embeddings and 4 is from the special tokens. Then the segment embeddings s^(h,r,t) = ⟨(s_e)^×(|h|+2) (s_r)^×(|r|+1) (s_e)^×(|t|+1)⟩, where s_e and s_r are used to differentiate entities from relations, respectively, as well as the position embeddings p^(h,r,t) = ⟨p_i : i ∈ {1..(|h|+|r|+|t|+4)}⟩, are added to the token embeddings w^(h,r,t) to form a final input representation X^(h,r,t) ∈ R^(d×(|h|+|r|+|t|+4)) that is fed to BERT as input. Then, the score of how likely a given triple (h, r, t) is to be true is computed by

score_KG-BERT(h, r, t) = SeqCls(X^(h,r,t)).   (6)

KG-BERT significantly improved the MR of the link prediction task compared to the previous state-of-the-art approach CapsE (Vu et al., 2019) (97 compared to 719, an 86.5% decrease), but suffered from a poor hits@1 of 0.041 due to the entity ambiguity problem and the lack of structural learning (Wang et al., 2021; Cucerzan, 2007).
A.2 StAR

StAR (Wang et al., 2021) is a hybrid model that learns both the contextual and structural information of the knowledge graph by augmenting the structured knowledge in the encoder. It divides a triple into two parts, (h, r) and (t), and applies a Siamese-style transformer with a sequence classification head to generate u = Pool(X^(h,r)) ∈ R^(d×(|h|+|r|+3)) and v = Pool(X^(t)) ∈ R^(d×(|t|+2)), respectively, where Pool(·) is the output of RoBERTa's pooling layer. The first scoring module focuses on classifying the triple by applying a