A Survey on Knowledge Graphs: Representation, Acquisition, and Applications
Fig. 3. Illustration of knowledge representation in different spaces. (a) Pointwise space. (b) Complex vector space. (c) Gaussian distribution. (d) Manifold space.

A. Representation Space

The key issue of representation learning is to learn low-dimensional distributed embeddings of entities and relations. Current literature mainly uses the real-valued pointwise space [see Fig. 3(a)], including vector, matrix, and tensor space, while other kinds of space, such as complex vector space [see Fig. 3(b)], Gaussian space [see Fig. 3(c)], and manifold [see Fig. 3(d)], are utilized as well. The embedding space should follow three conditions, i.e., differentiability, calculation possibility, and definability of a scoring function [15].

1) Pointwise Space: The pointwise Euclidean space is widely applied for representing entities and relations, projecting relation embeddings in vector or matrix space, or capturing relational interactions. TransE [16] represents entities and relations in d-dimensional vector space, i.e., h, t, r ∈ R^d, and makes embeddings follow the translational principle h + r ≈ t. To tackle the problem of the insufficiency of a single space for both entities and relations, TransR [17] then further introduces separated spaces for entities and relations. The authors projected entities (h, t ∈ R^k) into the relation (r ∈ R^d) space by a projection matrix M_r ∈ R^{k×d}. NTN [18] models entities across multiple dimensions by a bilinear tensor neural layer. The relational interaction between head and tail, h^⊤ M̂ t, is captured as a tensor denoted as M̂ ∈ R^{d×d×k}. Instead of using the Cartesian coordinate system, HAKE [19] captures semantic hierarchies by mapping entities into the polar coordinate system, i.e., entity embeddings e_m ∈ R^d and e_p ∈ [0, 2π)^d in the modulus and phase part, respectively.

Many other translational models, such as TransH [20], also use a similar representation space, while semantic matching models use plain vector space (e.g., HolE [21]) and relational projection matrices (e.g., ANALOGY [22]). Principles of these translational and semantic matching models are introduced in Sections III-B1 and III-B2, respectively.

2) Complex Vector Space: Instead of using a real-valued space, entities and relations are represented in a complex space, where h, t, r ∈ C^d. Taking the head entity as an example, h has a real part Re(h) and an imaginary part Im(h), i.e., h = Re(h) + i Im(h). ComplEx [23] first introduces complex vector space, shown in Fig. 3(b), which can capture both symmetric and antisymmetric relations. The Hermitian dot product is used to do composition for the relation, the head, and the conjugate of the tail. Inspired by Euler's identity e^{iθ} = cos θ + i sin θ, RotatE [24] proposes a rotational model taking the relation as a rotation from head entity to tail entity in complex space as t = h ◦ r, where ◦ denotes the elementwise Hadamard product. QuatE [25] extends the complex-valued space into hypercomplex h, t, r ∈ H^d by a quaternion Q = a + bi + cj + dk with three imaginary components, where the quaternion inner product, i.e., the Hamilton product h ⊗ r, is used as the compositional operator for head entity and relation. With the introduction of the rotational Hadamard product in complex space, RotatE [24] can also capture inversion and composition patterns, as well as symmetry and antisymmetry. QuatE [25] uses the Hamilton product to capture latent interdependencies within the 4-D space of entities and relations and gains a more expressive rotational capability than RotatE.

3) Gaussian Distribution: Inspired by Gaussian word embedding, the density-based embedding model KG2E [26] introduces Gaussian distribution to deal with the (un)certainties of entities and relations. The authors embedded entities and relations into multidimensional Gaussian distributions H ∼ N(μ_h, Σ_h) and T ∼ N(μ_t, Σ_t). The mean vector μ indicates the position of entities and relations, and the covariance matrix models their (un)certainties. Following the translational principle, the probability distribution of the entity transformation H − T is denoted as P_e ∼ N(μ_h − μ_t, Σ_h + Σ_t). Similarly, TransG [27] represents entities with Gaussian distributions, while it draws a mixture of Gaussian distributions for relation embedding, where the mth component translation vector of relation r is denoted as μ_{r,m} = t − h ∼ N(μ_t − μ_h, (σ_h² + σ_t²)E).

4) Manifold and Group: This section reviews knowledge representation in manifold space, Lie group, and dihedral group. A manifold is a topological space, which could be defined as a set of points with neighborhoods by set theory. A group is an algebraic structure defined in abstract algebra. Previous pointwise modeling is an ill-posed algebraic system where the number of scoring equations is far more than the number of entities and relations. Moreover, embeddings are restricted to an overstrict geometric form even in some methods with subspace projection. To tackle these issues, ManifoldE [28] extends pointwise embedding into manifold-based embedding. The authors introduced two settings of manifold-based embedding, i.e., sphere and hyperplane. An example of a sphere is shown in Fig. 3(d). For the sphere setting, reproducing kernel Hilbert space is used to represent the manifold function. Another "hyperplane" setting is introduced to enhance the model with intersected embeddings. ManifoldE [28] relaxes the real-valued pointwise space into manifold space with a more expressive representation from the geometric perspective. When the manifold function and relation-specific manifold parameter are set to zero, the manifold collapses into a point.
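To make the pointwise and complex-space views concrete, the following sketch (a minimal NumPy illustration, not code from the surveyed papers; the dimension d and random embeddings are placeholders) computes a TransE-style distance score h + r ≈ t and a RotatE-style score t ≈ h ◦ r with a unit-modulus complex rotation:

```python
import numpy as np

d = 50  # embedding dimension (placeholder)

# Pointwise space: TransE scores a triple by the distance ||h + r - t||.
def transe_score(h, r, t, norm=1):
    return -np.linalg.norm(h + r - t, ord=norm)

# Complex space: RotatE treats the relation as an element-wise rotation,
# t ≈ h ∘ r with |r_i| = 1, and scores by the distance after rotation.
def rotate_score(h, r_phase, t):
    r = np.exp(1j * r_phase)          # unit-modulus complex rotation
    return -np.linalg.norm(h * r - t)

h, r, t = np.random.randn(3, d)
print(transe_score(h, r, t))

h_c = np.random.randn(d) + 1j * np.random.randn(d)
t_c = np.random.randn(d) + 1j * np.random.randn(d)
phase = np.random.uniform(0, 2 * np.pi, d)
print(rotate_score(h_c, phase, t_c))
```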
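For the Gaussian view, a hedged sketch: assuming diagonal covariances and the translational principle of KG2E, the entity transformation h − t is itself Gaussian, and a triple can be scored by the (negative) KL divergence between that distribution and the relation distribution. The function below applies the standard diagonal-Gaussian KL formula for illustration only; it is not the authors' implementation.

```python
import numpy as np

def kg2e_kl_score(mu_h, var_h, mu_t, var_t, mu_r, var_r):
    """Score a triple by -KL(P_e || P_r), where
    P_e = N(mu_h - mu_t, var_h + var_t) follows the translational principle
    and P_r = N(mu_r, var_r); covariances are diagonal, given as vectors."""
    mu_e = mu_h - mu_t
    var_e = var_h + var_t
    kl = 0.5 * np.sum(
        var_e / var_r
        + (mu_r - mu_e) ** 2 / var_r
        - 1.0
        + np.log(var_r) - np.log(var_e)
    )
    return -kl  # higher means more plausible

d = 50
mu_h, mu_t, mu_r = np.random.randn(3, d)
var_h, var_t, var_r = np.random.uniform(0.05, 1.0, size=(3, d))
print(kg2e_kl_score(mu_h, var_h, mu_t, var_t, mu_r, var_r))
```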
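For the manifold view, the sphere setting of ManifoldE can be read as requiring the tail to lie on a sphere centered at h + r rather than at a single point. A minimal sketch under that reading (D_r is assumed to be a relation-specific radius; illustrative NumPy only):

```python
import numpy as np

def manifolde_sphere_score(h, r, t, D_r):
    # Sphere setting: the manifold function M(h, r, t) = ||h + r - t||^2
    # should equal D_r^2; deviation from the manifold is penalized.
    m = np.sum((h + r - t) ** 2)
    return -(m - D_r ** 2) ** 2

d = 50
h, r, t = np.random.randn(3, d)
print(manifolde_sphere_score(h, r, t, D_r=1.0))
```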
Fig. 5. Illustrations of neural encoding models. (a) CNN [43] inputs triples into a dense layer and a convolution operation to learn semantic representations. (b) GCN [44] acts as an encoder of knowledge graphs to produce entity and relation embeddings. (c) RSN [45] encodes entity–relation sequences and skips relations discriminatively. (d) Transformer-based CoKE [46] encodes triples as sequences with an entity replaced by [MASK].
… rank-r factorization RESCAL over each relational slice of the knowledge graph tensor. For the kth relation of m relations, the kth slice of X is factorized as

X_k ≈ A R_k A^⊤.    (15)

The authors further extended it to handle attributes of entities efficiently [50]. Jenatton et al. [51] then proposed a bilinear structured latent factor model (LFM), which extends RESCAL by decomposing R_k = Σ_{i=1}^{d} α_i^(k) u_i v_i^⊤. By introducing three-way Tucker tensor decomposition, TuckER [52] learns to embed by outputting a core tensor and embedding vectors of entities and relations. LowFER [53] proposes a multimodal factorized bilinear pooling mechanism to better fuse entities and relations. It generalizes the TuckER model and is computationally efficient with low-rank approximation.

3) Neural Networks: Neural networks for encoding semantic matching have yielded remarkable predictive performance in recent studies. Encoding models with linear/bilinear blocks can also be modeled using neural networks, for example, SME [39]. Representative neural models include the multilayer perceptron (MLP) [3], the neural tensor network (NTN) [18], and the neural association model (NAM) [54]. They generally feed entities or relations or both into deep neural networks and compute a semantic matching score. MLP [3] encodes entities and relations together into a fully connected layer and uses a second layer with sigmoid activation for scoring a triple as

f_r(h, t) = σ(w^⊤ σ(W[h, r, t]))    (16)

where W ∈ R^{n×3d} is the weight matrix and [h, r, t] is a concatenation of the three vectors. NTN [18] takes entity embeddings as input associated with a relational tensor and outputs the predictive score as

f_r(h, t) = r^⊤ σ(h^⊤ M̂ t + M_{r,1} h + M_{r,2} t + b_r)    (17)

where b_r ∈ R^k is the bias for relation r, and M_{r,1} and M_{r,2} are relation-specific weight matrices. It can be regarded as a combination of MLPs and bilinear models. NAM [54] associates the hidden encoding with the embedding of the tail entity and proposes the relational-modulated neural network (RMNN).

4) Convolutional Neural Networks: CNNs are utilized for learning deep expressive features. ConvE [55] uses 2-D convolution over embeddings and multiple layers of nonlinear features to model the interactions between entities and relations by reshaping the head entity and relation into 2-D matrices, i.e., M_h ∈ R^{d_w×d_h} and M_r ∈ R^{d_w×d_h} for d = d_w × d_h. Its scoring function is defined as

f_r(h, t) = σ(vec(σ([M_h; M_r] ∗ ω))W) t    (18)

where ω denotes the convolutional filters and vec is the vectorization operation reshaping a tensor into a vector. ConvE can express semantic information by nonlinear feature learning through multiple layers. ConvKB [43] adopts CNNs for encoding the concatenation of entities and relations without reshaping [see Fig. 5(a)]. Its scoring function is defined as

f_r(h, t) = concat(σ([h, r, t] ∗ ω)) · w.    (19)

The concatenation of the set of feature maps generated by convolution increases the learning ability of latent features. Compared with ConvE, which captures the local relationships, ConvKB keeps the transitional characteristic and shows better experimental performance. HypER [56] utilizes a hypernetwork H for 1-D relation-specific convolutional filter generation to achieve multitask knowledge sharing and, meanwhile, simplifies 2-D ConvE. It can also be interpreted as a tensor factorization model when taking the hypernetwork and weight matrix as tensors.

5) Recurrent Neural Networks: The MLP- and CNN-based models, as mentioned above, learn triplet-level representations. In comparison, recurrent networks can capture long-term relational dependencies in knowledge graphs. Gardner et al. [57] and Neelakantan et al. [58] propose RNN-based models over the relation path to learn vector representations without and with entity information, respectively. RSN [45] [see Fig. 5(c)] designs a recurrent skip mechanism to enhance semantic representation learning by distinguishing relations and entities. The relational path (x_1, x_2, ..., x_T), with entities and relations in an alternating order, is generated by random walk and is further used to calculate the recurrent hidden state h_t = tanh(W_h h_{t−1} + W_x x_t + b). The skipping operation is conducted as

h'_t = h_t if x_t ∈ E;   h'_t = S_1 h_t + S_2 x_{t−1} if x_t ∈ R    (20)

where S_1 and S_2 are weight matrices.

6) Transformers: Transformer-based models have boosted contextualized text representation learning. To utilize contextual information in knowledge graphs, CoKE [46] employs transformers to encode edges and path sequences. Similarly, KG-BERT [59] borrows the idea from language model pretraining and takes the Bidirectional Encoder Representations from Transformers (BERT) model as an encoder for entities and relations.

7) Graph Neural Networks: GNNs are introduced for learning connectivity structure under an encoder–decoder framework. R-GCN [60] proposes relation-specific transformations to model the directed nature of knowledge graphs. Its forward propagation is defined as

x_i^(l+1) = σ( Σ_{r∈R} Σ_{j∈N_i^r} (1/c_{i,r}) W_r^(l) x_j^(l) + W_0^(l) x_i^(l) )    (21)

where x_i^(l) ∈ R^{d^(l)} is the hidden state of the ith entity in the lth layer, N_i^r is the neighbor set of the ith entity within relation r ∈ R, W_r^(l) and W_0^(l) are learnable parameter matrices, and c_{i,r} is a normalization constant, such as c_{i,r} = |N_i^r|. Here, the GCN [61] acts as a graph encoder. To enable specific tasks, a decoder model still needs to be developed and integrated into the R-GCN framework. R-GCN takes the neighborhood of each entity equally. SACN [44] introduces a weighted GCN [see Fig. 5(b)], which defines the strength of two adjacent nodes with the same relation type, to capture the structural information in knowledge graphs by utilizing node structure, node attributes, and relation types. The decoder module, called Conv-TransE, adopts the ConvE model as the semantic matching metric and preserves the translational property. By aligning the convolutional outputs of entity and relation embeddings with C kernels to be M(h, r) ∈ R^{C×d}, its scoring function is defined as

f_r(h, t) = g(vec(M(h, r))W) t.    (22)

Nathani et al. [62] introduced graph attention networks with multihead attention as the encoder to capture multihop neighborhood features by inputting the concatenation of entity and relation embeddings. CompGCN [63] proposes entity–relation composition operations over each edge in the neighborhood of a central node and generalizes previous GCN-based models.
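To make the factorization and neural scoring functions above concrete, the following sketch (illustrative NumPy only, with arbitrary placeholder dimensions) computes a RESCAL-style bilinear score in the spirit of (15) and the MLP score of (16):

```python
import numpy as np

d, n = 50, 100
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# RESCAL / bilinear score: each relation k owns a matrix R_k, and the
# plausibility of (h, k, t) is h^T R_k t (one cell of A R_k A^T in (15)).
def rescal_score(h, R_k, t):
    return h @ R_k @ t

# MLP score of (16): concatenate [h, r, t], apply a hidden layer W and an
# output vector w with sigmoid activations.
def mlp_score(h, r, t, W, w):
    x = np.concatenate([h, r, t])      # [h, r, t] in R^{3d}
    return sigmoid(w @ sigmoid(W @ x))

h, r, t = rng.normal(size=(3, d))
R_k = rng.normal(size=(d, d))
W = rng.normal(size=(n, 3 * d))
w = rng.normal(size=n)
print(rescal_score(h, R_k, t), mlp_score(h, r, t, W, w))
```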
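The recurrent skipping of (20) can be sketched as follows (illustrative NumPy, assuming a toy alternating entity–relation path such as one produced by a random walk; the weight initialization is arbitrary):

```python
import numpy as np

d = 32
rng = np.random.default_rng(1)
Wh, Wx, S1, S2 = rng.normal(size=(4, d, d)) * 0.1
b = np.zeros(d)

def rsn_forward(path):
    """path: list of (vector, is_entity) pairs alternating entity/relation,
    as produced by a random walk over the knowledge graph."""
    h = np.zeros(d)
    outputs = []
    prev_x = np.zeros(d)
    for x, is_entity in path:
        h = np.tanh(Wh @ h + Wx @ x + b)          # ordinary recurrent update
        if is_entity:
            h_out = h                              # h'_t = h_t when x_t in E
        else:
            h_out = S1 @ h + S2 @ prev_x           # skip: reuse the preceding entity
        outputs.append(h_out)
        prev_x = x
    return outputs

path = [(rng.normal(size=d), i % 2 == 0) for i in range(6)]
print(len(rsn_forward(path)))
```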
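A minimal sketch of the relation-specific propagation in (21), assuming a small toy graph stored as (subject, relation, object) index triples; this is illustrative NumPy, not the R-GCN authors' implementation:

```python
import numpy as np

def rgcn_layer(X, triples, num_relations, W_rel, W_self):
    """One R-GCN layer: X is (num_nodes, d_in); W_rel is (num_relations, d_in, d_out);
    W_self is (d_in, d_out). Messages flow from object j to subject i per relation."""
    num_nodes, d_out = X.shape[0], W_self.shape[1]
    agg = np.zeros((num_nodes, d_out))
    count = np.zeros((num_nodes, num_relations))          # c_{i,r} = |N_i^r|
    for i, r, j in triples:
        count[i, r] += 1
    for i, r, j in triples:
        agg[i] += (X[j] @ W_rel[r]) / count[i, r]
    out = agg + X @ W_self                                 # add the self-loop term W_0 x_i
    return np.maximum(out, 0)                              # sigma = ReLU

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 8))
triples = [(0, 0, 1), (0, 1, 2), (3, 0, 1)]
W_rel = rng.normal(size=(2, 8, 8)) * 0.1
W_self = rng.normal(size=(8, 8)) * 0.1
print(rgcn_layer(X, triples, 2, W_rel, W_self).shape)
```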
D. Embedding With Auxiliary Information

Multimodal embedding incorporates external information, such as text descriptions, type constraints, relational paths, and visual information, with a knowledge graph itself to facilitate more effective knowledge representation.

1) Textual Description: Entities in knowledge graphs have textual descriptions denoted as D = ⟨w_1, w_2, ..., w_n⟩, providing supplementary semantic information. The challenge of KRL with textual description is to embed both structured knowledge and unstructured textual information in the same space. Wang et al. [64] proposed two alignment models for aligning entity space and word space by introducing entity names and Wikipedia anchors. DKRL [65] extends TransE [16] to learn representations directly from entity descriptions by a convolutional encoder. SSP [66] captures the strong correlations between triples and textual descriptions by projecting them into a semantic subspace. The joint loss function is widely applied when incorporating KGE with textual description. Wang et al. [64] used a three-component loss L = L_K + L_T + L_A of the knowledge model L_K, the text model L_T, and the alignment model L_A. SSP [66] uses a two-component objective function L = L_embed + μ L_topic of the embedding-specific loss L_embed and the topic-specific loss L_topic within the textual description, traded off by a parameter μ.

2) Type Information: Entities are represented with hierarchical classes or types and, consequently, relations with semantic types. SSE [67] incorporates semantic categories of entities to embed entities belonging to the same category smoothly in semantic space. TKRL [68] proposes a type encoder model for the projection matrix of entities to capture the type hierarchy. Noticing that some relations indicate attributes of entities, KR-EAR [69] categorizes relation types into attributes and relations and models the correlations between entity descriptions. Zhang et al. [70] extended existing embedding methods with a hierarchical relation structure of relation clusters, relations, and subrelations.

3) Visual Information: Visual information (e.g., entity images) can be utilized to enrich KRL. Image-embodied IKRL [71], containing cross-modal structure-based and image-based representations, encodes images to entity space and follows the translation principle. The cross-modal representations make sure that structure- and image-based representations are in the same representation space.

There remain many kinds of auxiliary information for KRL, such as attributes, relation paths, and logical rules. Wang et al. [5] gave a detailed review of using additional information. This article discusses relation paths and logical rules under the umbrella of KGC in Sections IV-A2 and IV-A4, respectively.

4) Uncertain Information: Knowledge graphs, such as ProBase [72], NELL [73], and ConceptNet [74], contain uncertain information with a confidence score assigned to every relational fact. In contrast to classic deterministic KGE, uncertain embedding models aim to capture the uncertainty representing the likelihood of relational facts. Chen et al. [75] proposed an uncertain KGE model to simultaneously preserve structural and uncertainty information, where probabilistic soft logic is applied to infer the confidence score. Probability calibration is a postprocessing step that adjusts probability scores so that predictions make sense in a probabilistic way. Tabacof and Costabello [76] first studied probability calibration for KGE under the closed-world assumption, revealing that well-calibrated models can lead to improved accuracy. Safavi et al. [77] further explored probability calibration under the more challenging open-world assumption.

E. Summary

KRL is vital in the research community of knowledge graphs. This section reviews four folds of KRL, with several modern methods summarized in Table I and more in Appendix C in the Supplementary Material. Overall, developing a novel KRL model is to answer the following four questions: 1) which representation space to choose; 2) how to measure the plausibility of triplets in a specific space; 3) which encoding model to use for modeling relational interactions; and 4) whether to utilize auxiliary information. The most popularly used representation space is the Euclidean point-based space, embedding entities in vector space and modeling interactions via vectors, matrices, or tensors. Other representation spaces, including complex vector space, Gaussian distribution, and manifold space and group, are also studied. Manifold space has an advantage over pointwise Euclidean space by relaxing the pointwise embedding. Gaussian embeddings can express the uncertainties of entities and relations, and multiple relation semantics. Embedding in complex vector space can effectively model different relational connectivity patterns, especially the symmetry/antisymmetry pattern. The representation space plays an essential role in encoding the semantic information of entities and capturing the relational properties. When developing a representation learning model, an appropriate representation space should be selected and designed carefully to match the nature of encoding methods and balance expressiveness and computational complexity. The scoring function with a distance-based metric utilizes the translation principle, while the semantic matching scoring function employs compositional operators. Encoding models, especially neural networks, play a critical role in modeling interactions of entities and relations. The bilinear models have also drawn much attention, and some tensor factorizations can also be regarded as part of this family. Other methods incorporate auxiliary information of textual descriptions, relation/entity types, entity images, and confidence scores.

IV. KNOWLEDGE ACQUISITION

Knowledge acquisition aims to construct knowledge graphs from unstructured text and other structured or semistructured sources, complete an existing knowledge graph, and discover and recognize entities and relations. Well-constructed and large-scale knowledge graphs can be useful for many downstream applications and empower knowledge-aware models with commonsense reasoning, thereby paving the way for AI. The main tasks of knowledge acquisition include relation extraction, KGC, and other entity-oriented acquisition tasks, such as entity recognition and entity alignment (EA). Most methods formulate KGC and relation extraction separately. These two tasks, however, can also be integrated into a unified framework. Han et al. [78] proposed a joint learning framework with mutual attention for data fusion between knowledge graphs and text, which solves KGC and relation extraction from text. There are also other tasks related to knowledge …
TABLE I
SUMMARY OF RECENT KRL MODELS. SEE MORE DETAILS IN APPENDIX C IN THE SUPPLEMENTARY MATERIAL
Fig. 6. (a) Embedding-based ranking and (b) relation path reasoning [58].
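As a small illustration of the joint training objectives described in Section III-D1 (L = L_K + L_T + L_A in the style of Wang et al. [64] and L = L_embed + μL_topic in the style of SSP [66]), the sketch below shows how such multi-component losses are typically combined; the numeric component losses are placeholders, not results from any model.

```python
import numpy as np

def joint_loss(l_knowledge, l_text, l_align, weights=(1.0, 1.0, 1.0)):
    """Three-component objective: L = L_K + L_T + L_A, optionally weighted."""
    w_k, w_t, w_a = weights
    return w_k * l_knowledge + w_t * l_text + w_a * l_align

def ssp_style_loss(l_embed, l_topic, mu=0.1):
    """Two-component objective: L = L_embed + mu * L_topic."""
    return l_embed + mu * l_topic

# Placeholder component losses (in practice these come from the KGE model,
# the text model, and the alignment model, respectively).
print(joint_loss(0.52, 0.37, 0.11), ssp_style_loss(0.52, 0.8))
```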
TABLE II
COMPARISON OF RL-BASED PATH FINDING FOR KNOWLEDGE GRAPH REASONING
… and found that nine-layer CNNs have improved performance. Liu et al. [160] proposed to initialize the neural model by transfer learning from entity classification. The cooperative CORD [161] ensembles text corpus and knowledge graph with external logical rules by bidirectional knowledge distillation and adaptive imitation. TK-MF [162] enriches sentence representation learning by matching sentences and topic words. Recently, Shahbazi et al. [163] studied trustworthy relation extraction by benchmarking several explanation mechanisms, including saliency, gradient × input, and leave-one-out.

The existence of low-frequency relations in knowledge graphs requires few-shot relation classification with unseen classes or only a few instances. Gao et al. [164] proposed hybrid attention-based prototypical networks to compute prototypical relation embeddings and compare their distance to the query embedding. Qin et al. [165] explored the relationships between relations with a global relation graph and formulated few-shot relation extraction as a Bayesian metalearning problem to learn the posterior distribution of relations' prototype vectors.

7) Joint Entity and Relation Extraction: Traditional relation extraction models utilize pipeline approaches by first extracting entity mentions and then classifying relations. However, pipeline methods may cause error accumulation. Several studies show better performance by joint learning [143], [166] than by conventional pipeline methods. Katiyar and Cardie [167] proposed a joint extraction framework with an attention-based LSTM network. Some convert joint extraction into different problems, such as sequence labeling via a novel tagging scheme [168] and multiturn question answering [169]. Challenges remain in dealing with entity pair and relation overlapping [170]. Wei et al. [171] proposed a cascade binary tagging framework that models relations as subject–object mapping functions to solve the overlapping problem.

There is a distribution discrepancy between training and inference in the joint learning framework, leading to exposure bias. Recently, Wang et al. [172] proposed a one-stage joint extraction framework that transforms joint entity and relation extraction into a token pair linking task to mitigate error propagation and exposure bias. In contrast to the common view that joint models can ease error accumulation by capturing the mutual interaction of entities and relations, Zhong and Chen [173] proposed a simple pipeline-based yet effective approach of learning two independent encoders for entities and relations, revealing that strong contextual representation can preserve distinct features of entities and relations. Future research needs to rethink the relation between the pipeline and joint learning methods.

D. Summary

This section reviews knowledge completion for incomplete knowledge graphs and acquisition from plain text.

KGC completes missing links between existing entities or infers entities given entity and relation queries. Embedding-based KGC methods generally rely on triple representation learning to capture semantics and do candidate ranking for completion. Embedding-based reasoning remains at the individual relation level and is poor at complex reasoning because it ignores the symbolic nature of the knowledge graph and lacks interpretability. Hybrid methods with symbolics and embedding incorporate rule-based reasoning, overcome the sparsity of the knowledge graph to improve the quality of embedding, facilitate efficient rule injection, and induce interpretable rules. With the observation of the graphical nature of knowledge graphs, path search and neural path representation learning are studied. However, they suffer from connectivity deficiency when traversing over large-scale graphs. The emerging direction of metarelational learning aims to learn fast adaptation over unseen relations in low-resource settings.

Entity discovery acquires entity-oriented knowledge from text and fuses knowledge between knowledge graphs. There are several categories according to specific settings. Entity recognition is explored in a sequence-to-sequence manner, entity typing discusses noisy type labels and zero-shot typing, and entity disambiguation and alignment learn unified embeddings, with iterative alignment models proposed to tackle the issue of a limited number of alignment seeds. However, it may face error accumulation problems if newly aligned entities suffer from poor performance. Language-specific knowledge has increased in recent years and, consequentially, motivates the research on cross-lingual knowledge alignment.

Relation extraction suffers from noisy patterns under the assumption of distant supervision, especially in text corpora of different domains. Thus, weakly supervised relation extraction must mitigate the impact of noisy labeling. For example, multi-instance learning takes bags of sentences as inputs, the attention mechanism [146] reduces noisy patterns by soft selection over instances, and RL-based methods formulate instance selection as a hard decision. Another principle is to learn representations that are as rich as possible. As deep neural networks can solve the error propagation in traditional feature extraction methods, this field is dominated by DNN-based models, as summarized in Table III.

V. TEMPORAL KNOWLEDGE GRAPH

Current knowledge graph research mostly focuses on static knowledge graphs, where facts are not changed with time, while the temporal dynamics of a knowledge graph are less explored. However, the temporal information is of great importance because the structured knowledge only holds within a specific period, and the evolution of facts follows a time sequence. Recent research begins to take temporal information into KRL and KGC, which is termed temporal knowledge graph in contrast to the previous static knowledge graph. Research efforts have been made for learning temporal and relational embeddings simultaneously. Relevant models for dynamic network embedding also inspire temporal KGE. For example, the temporal graph attention (TGAT) network [174], which captures the temporal–topological structure and learns time-feature interactions simultaneously, may be useful to preserve temporal-aware relations for knowledge graphs.

A. Temporal Information Embedding

Temporal information is considered in temporal-aware embedding by extending triples into a temporal quadruple (h, r, t, τ), where τ provides additional temporal information about when the fact held. Leblay and Chekol [175] investigated temporal scope prediction over time-annotated triples and simply extended existing embedding methods, for example, TransE with the vector-based TTransE defined as

f_τ(h, r, t) = −‖h + r + τ − t‖_{L1/L2}.    (23)

Ma et al. [176] also generalized existing static embedding methods and proposed ConT by replacing the shared weight vector of Tucker with a timestamp embedding.
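A minimal NumPy sketch of the temporally extended translation in (23), treating the timestamp embedding τ as just another translation vector (illustrative only; dimensions and embeddings are placeholders):

```python
import numpy as np

def ttranse_score(h, r, tau, t, norm=1):
    # TTransE-style score: h + r + tau should be close to t.
    return -np.linalg.norm(h + r + tau - t, ord=norm)

d = 50
h, r, tau, t = np.random.randn(4, d)
print(ttranse_score(h, r, tau, t))
```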
TABLE III
SUMMARY OF NRE AND RECENT ADVANCES
Temporally scoped quadruples extend triples by adding a time scope [τ_s, τ_e], where τ_s and τ_e stand for the beginning and ending of the valid period of a triple, and then, a static subgraph G_τ can be derived from the dynamic knowledge graph when given a specific timestamp τ. HyTE [177] takes a timestamp as a hyperplane w_τ and projects entity and relation representations as P_τ(h) = h − (w_τ^⊤ h)w_τ, P_τ(t) = t − (w_τ^⊤ t)w_τ, and P_τ(r) = r − (w_τ^⊤ r)w_τ. The temporally projected scoring function is calculated as

f_τ(h, r, t) = ‖P_τ(h) + P_τ(r) − P_τ(t)‖_{L1/L2}    (24)

with the projected translation P_τ(h) + P_τ(r) ≈ P_τ(t). García-Durán et al. [178] concatenated the predicate token sequence and the temporal token sequence and used an LSTM to encode the concatenated time-aware predicate sequences. The last hidden state of the LSTM is taken as the temporal-aware relational embedding r_temp. The scoring functions of the extended TransE and DistMult are calculated as ‖h + r_temp − t‖_2 and (h ◦ t) r_temp^⊤, respectively. By defining the context of an entity e as the aggregate set of facts containing e, Liu et al. [179] proposed context selection to capture useful contexts and measured temporal consistency with the selected context. By formulating temporal KGC as fourth-order tensor completion, Lacroix et al. [180] proposed TComplEx, which extends the ComplEx decomposition and introduced weighted regularizers.

B. Entity Dynamics

Real-world events change entities' states and, consequently, affect the corresponding relations. To improve temporal scope inference, the contextual temporal profile model [181] formulates the temporal scoping problem as state change detection and utilizes the context to learn state and state change vectors. Inspired by diachronic word embedding, Goel et al. [182] took an entity and a timestamp as the input of the entity embedding function to preserve the temporal-aware characteristics of entities at any time point. Know-evolve [183], a deep evolutionary knowledge network, investigates the knowledge evolution phenomenon of entities and their evolved relations. A multivariate temporal point process is used to model the occurrence of facts, and a novel recurrent network is developed to learn the representation of nonlinear temporal evolution. To capture the interaction between nodes, RE-NET [184] models event sequences via an RNN-based event encoder and a neighborhood aggregator. Specifically, the RNN is used to capture the temporal entity interaction, and the neighborhood aggregator aggregates the concurrent interactions.

C. Temporal Relational Dependence

There exist temporal dependencies in relational chains following the timeline, for example, wasBornIn → graduateFrom → workAt → diedIn. Jiang et al. [185], [186] proposed time-aware embedding, a joint learning framework with temporal regularization, to incorporate temporal order and consistency information. The authors defined a temporal scoring function as

f(r_k, r_l) = ‖r_k T − r_l‖_{L1/L2}    (25)

where T ∈ R^{d×d} is an asymmetric matrix that encodes the temporal order of relations, for a temporally ordered relation pair ⟨r_k, r_l⟩. Three temporal consistency constraints of disjointness, ordering, and spans are further applied by an integer linear programming formulation.

D. Temporal Logical Reasoning

Logical rules are also studied for temporal reasoning. Chekol et al. [187] explored the Markov logic network and probabilistic soft logic for reasoning over uncertain temporal knowledge graphs. RLvLR-Stream [95] considers temporal close-path rules and learns the structure of rules from the knowledge graph stream for reasoning.
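The hyperplane projection of HyTE in (24) can be sketched as follows (illustrative NumPy; w_tau is assumed to be a unit normal vector of the timestamp-specific hyperplane):

```python
import numpy as np

def project_to_hyperplane(x, w_tau):
    # P_tau(x) = x - (w_tau^T x) w_tau, with ||w_tau|| = 1
    return x - (w_tau @ x) * w_tau

def hyte_score(h, r, t, w_tau, norm=1):
    ph, pr, pt = (project_to_hyperplane(x, w_tau) for x in (h, r, t))
    return -np.linalg.norm(ph + pr - pt, ord=norm)

d = 50
h, r, t = np.random.randn(3, d)
w_tau = np.random.randn(d)
w_tau /= np.linalg.norm(w_tau)   # normalize the hyperplane normal
print(hyte_score(h, r, t, w_tau))
```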
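Similarly, the temporal-order scoring function of (25) amounts to checking whether a learned asymmetric matrix T maps an earlier relation embedding close to a later one; a small illustrative sketch with placeholder embeddings:

```python
import numpy as np

def temporal_order_score(r_k, r_l, T, norm=1):
    # f(r_k, r_l) = ||r_k T - r_l||: low when r_k temporally precedes r_l.
    return np.linalg.norm(r_k @ T - r_l, ord=norm)

d = 50
r_born, r_graduate = np.random.randn(2, d)
T = np.random.randn(d, d)        # asymmetric temporal-order matrix
print(temporal_order_score(r_born, r_graduate, T))
```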
VI. KNOWLEDGE-AWARE APPLICATIONS

Rich structured knowledge can be useful for AI applications. However, how to integrate such symbolic knowledge remains a challenge. The application of knowledge graphs is twofold: 1) in-KG applications, such as link prediction and NER, and 2) out-of-KG applications, including relation extraction and more downstream knowledge-aware applications, such as question answering and recommendation systems. This section introduces several recent DNN-based knowledge-driven approaches with applications on natural language processing and recommendation. More miscellaneous applications, such as digital health and search engines, are introduced in Appendix E in the Supplementary Material.

A. Language Representation Learning

Language representation learning via self-supervised language model pretraining has become an integral component of many NLP systems. Traditional language modeling does not exploit the factual knowledge with entities frequently observed in the text corpus. How to integrate knowledge into language representation has drawn increasing attention. The knowledge graph language model (KGLM) [188] learns to render knowledge by selecting and copying entities. ERNIE-Tsinghua [189] fuses informative entities via aggregated pretraining and random masking. K-BERT [116] infuses domain knowledge into the BERT contextual encoder. ERNIE-Baidu [190] introduces named entity masking and phrase masking to integrate knowledge into the language model and is further improved by ERNIE 2.0 [115] via continual multitask learning. To capture factual knowledge from text, KEPLER [191] combines knowledge embedding and masked language modeling losses via joint optimization. GLM [192] proposes a graph-guided entity masking scheme to utilize the knowledge graph implicitly. CoLAKE [193] further exploits the knowledge context of an entity through a unified word-knowledge graph and a modified transformer encoder. Similar to the K-BERT model and focusing on the medical corpus, BERT-MK [194] integrates medical knowledge into the pretraining language model via knowledge subgraph extraction. Rethinking large-scale training of language models and querying over knowledge graphs, Petroni et al. [195] analyzed the language model and the knowledge base. They found that certain factual knowledge can be acquired via the pretraining language model.

B. Question Answering

Knowledge-graph-based question answering (KG-QA) answers natural language questions with facts from knowledge graphs. Neural network-based approaches represent questions and answers in distributed semantic space, and some also conduct symbolic knowledge injection for commonsense reasoning.

1) Single-Fact QA: Taking a knowledge graph as an external intellectual source, simple factoid QA or single-fact QA is to answer a simple question involving a single knowledge graph fact. Dai et al. [196] proposed a conditional focused neural network equipped with focused pruning to reduce the search space. BAMnet [197] models the two-way interaction between questions and the knowledge graph with a bidirectional attention mechanism. Although deep learning techniques are intensively applied in KG-QA, they inevitably increase the model complexity. Through the evaluation of simple KG-QA with and without neural networks, Mohammed et al. [198] found that sophisticated deep models, such as LSTM and GRU with heuristics, achieve the state of the art, while nonneural models also gain reasonably good performance.

2) Multihop Reasoning: Dealing with complex multihop relations requires a more dedicated design that is capable of multihop commonsense reasoning. Structured knowledge provides informative commonsense observations and acts as relational inductive biases, which boosts recent studies on commonsense knowledge fusion between symbolic and semantic space for multihop reasoning. Bauer et al. [199] proposed multihop bidirectional attention and a pointer-generator decoder for effective multihop reasoning and coherent answer generation, utilizing external commonsense knowledge by relational path selection from ConceptNet and injection with selectively gated attention. The variational reasoning network (VRN) [200] conducts multihop logic reasoning with reasoning-graph embedding, while handling the uncertainty in topic entity recognition. KagNet [201] performs concept recognition to build a schema graph from ConceptNet and learns path-based relational representations via GCN, LSTM, and hierarchical path-based attention. CogQA [202] combines implicit extraction and explicit reasoning and proposes a cognitive graph model based on BERT and GNN for multihop QA.

C. Recommender Systems

Integrating knowledge graphs as external information enables recommendation systems to have the ability of commonsense reasoning, with the potential to solve the sparsity issue and the cold start problem. By injecting knowledge-graph-based side information, such as entities, relations, and attributes, many efforts work on embedding-based regularization to improve recommendation. The collaborative CKE [203] jointly trains KGEs, items' textual information, and visual content via a translational KGE model and stacked autoencoders. Noticing that time- and topic-sensitive news articles consist of condensed entities and common knowledge, DKN [204] incorporates the knowledge graph by a knowledge-aware CNN model with multichannel word-entity-aligned textual inputs. However, DKN cannot be trained in an end-to-end manner as it needs to learn entity embeddings in advance. To enable end-to-end training, MKR [205] associates multitask knowledge graph representation and recommendation by sharing latent features and modeling high-order item–entity interaction. While other works consider the relational path and structure of knowledge graphs, KPRN [206] regards the interaction between users and items as an entity–relation path in the knowledge graph and conducts preference inference over the path with LSTM to capture the sequential dependence. PGPR [207] performs reinforcement policy-guided path reasoning over knowledge-graph-based user–item interaction. KGAT [208] applies a graph attention network over the collaborative knowledge graph of entity–relation and user–item graphs to encode high-order connectivities via embedding propagation and attention-based aggregation. Knowledge-graph-based recommendation inherently possesses interpretability from embedding propagation with multihop neighbors in the knowledge graph.

VII. FUTURE DIRECTIONS

Many efforts have been conducted to tackle the challenges of knowledge representation and its related applications. However, there remain several formidable open problems and promising future directions.

A. Complex Reasoning

Numerical computing for knowledge representation and reasoning requires a continuous vector space to capture the semantics of entities and relations.
While embedding-based methods have a limitation on complex logical reasoning, two directions on the relational path and symbolic logic are worthy of being further explored. Some promising methods, such as recurrent relational path encoding, GNN-based message passing over the knowledge graph, and RL-based pathfinding and reasoning, are up-and-coming for handling complex reasoning. For the combination of logic rules and embeddings, recent works [102], [103] combine Markov logic networks with KGE, aiming to leverage logic rules and handle their uncertainty. Enabling probabilistic inference to capture the uncertainty and domain knowledge with efficient embedding will be a noteworthy research direction.

B. Unified Framework

Several representation learning models on knowledge graphs have been verified to be equivalent; for example, Hayashi and Shimbo [41] proved that HolE and ComplEx are mathematically equivalent for link prediction with a particular constraint. ANALOGY [22] provides a unified view of several representative models, including DistMult, ComplEx, and HolE. Wang et al. [47] explored connections among several bilinear models. Sharma et al. [209] explored the geometric understanding of additive and multiplicative KRL models. Most works formulated the knowledge acquisition tasks of KGC and relation extraction separately with different models. Han et al. [78] put them under the same roof and proposed a joint learning framework with mutual attention for information sharing between the knowledge graph and text. A unified understanding of knowledge representation and reasoning is less explored. An investigation toward unification in a way similar to the unified framework of graph networks [210], however, will be worthy of bridging the research gap.

C. Interpretability

Interpretability of knowledge representation and injection is a vital issue for knowledge acquisition and real-world applications. Preliminary efforts have been made for interpretability. ITransF [36] uses sparse vectors for knowledge transferring and interprets them with attention visualization. CrossE [42] explores the explanation scheme of knowledge graphs by using embedding-based path searching to generate explanations for link prediction. However, recent neural models have limitations on transparency and interpretability, although they have gained impressive performance. Some methods combine black-box neural models and symbolic reasoning by incorporating logical rules to increase interpretability. Interpretability can convince people to trust predictions. Thus, further work should go into interpretability and improve the reliability of predicted knowledge.

D. Scalability

Scalability is crucial in large-scale knowledge graphs. There is a tradeoff between computational efficiency and model expressiveness, with a limited number of works applied to more than one million entities. Several embedding methods use simplification to reduce the computation cost, such as simplifying tensor products with the circular correlation operation [21]. However, these methods still struggle with scaling to millions of entities and relations. Probabilistic logic inference using Markov logic networks is computationally intensive, making it hard to scale to large-scale knowledge graphs. Rules in a recent neural logical model [102] are generated by simple brute-force search, making them insufficient on large-scale knowledge graphs. ExpressGNN [103] attempts to use NeuralLP [100] for efficient rule induction. Nevertheless, there is still a long way to go to deal with cumbersome deep architectures and the increasingly growing knowledge graphs.

E. Knowledge Aggregation

The aggregation of global knowledge is the core of knowledge-aware applications. For example, recommendation systems use a knowledge graph to model user–item interaction, and text classification jointly encodes text and the knowledge graph into a semantic space. Most current knowledge aggregation methods design neural architectures, such as attention mechanisms and GNNs. The natural language processing community has been boosted by large-scale pretraining via transformers and variants, such as BERT models. At the same time, a recent finding [195] reveals that a language model pretrained on unstructured text can acquire certain factual knowledge. Large-scale pretraining can be a straightforward way to inject knowledge. However, rethinking the way of knowledge aggregation in an efficient and interpretable manner is also of significance.

F. Automatic Construction and Dynamics

Current knowledge graphs rely highly on manual construction, which is labor-intensive and expensive. The widespread applications of knowledge graphs on different cognitive intelligence fields require automatic knowledge graph construction from large-scale unstructured content. Recent research mainly works on semiautomatic construction under the supervision of existing knowledge graphs. Facing multimodality, heterogeneity, and large-scale application, automatic construction remains a great challenge.

The mainstream research focuses on static knowledge graphs, with several works on predicting temporal scope validity and learning temporal information and entity dynamics. Many facts only hold within a specific period. A dynamic knowledge graph, together with learning algorithms capturing dynamics, can address the limitation of traditional knowledge representation and reasoning by considering the temporal nature.

VIII. CONCLUSION

Knowledge graphs as the ensemble of human knowledge have attracted increasing research attention, with the recent emergence of KRL, knowledge acquisition methods, and a wide variety of knowledge-aware applications. This article conducts a comprehensive survey on the following four scopes: 1) KGE, with a full-scale systematic review from embedding space, scoring metrics, encoding models, embedding with external information, and training strategies; 2) knowledge acquisition of entity discovery, relation extraction, and graph completion from three perspectives of embedding learning, relational path inference, and logical rule reasoning; 3) temporal knowledge graph representation learning and completion; and 4) real-world knowledge-aware applications on NLU,
recommendation systems, question answering, and other miscellaneous applications. Besides, some useful resources of data sets and open-source libraries, and future research directions are introduced and discussed. Knowledge graph hosts a large research community and has a wide range of methodologies and applications. We conduct this survey to have a summary of current representative research efforts and trends and expect that it can facilitate future research.

REFERENCES

[1] A. Newell, J. C. Shaw, and H. A. Simon, “Report on a general problem solving program,” in Proc. IFIP Congr., vol. 256, 1959, p. 64.
[2] E. Shortliffe, Computer-Based Medical Consultations: MYCIN, vol. 2. Amsterdam, The Netherlands: Elsevier, 2012.
[3] X. Dong et al., “Knowledge vault: A Web-scale approach to probabilistic knowledge fusion,” in Proc. SIGKDD, 2014, pp. 601–610.
[4] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proc. IEEE, vol. 104, no. 1, pp. 11–33, Jan. 2016.
[5] Q. Wang, Z. Mao, B. Wang, and L. Guo, “Knowledge graph embedding: A survey of approaches and applications,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 12, pp. 2724–2743, Dec. 2017.
[6] A. Hogan et al., “Knowledge graphs,” 2020, arXiv:2003.02320. [Online]. Available: http://arxiv.org/abs/2003.02320
[7] F. N. Stokman and P. H. de Vries, “Structuring knowledge in a graph,” in Human-Computer Interaction. Berlin, Germany: Springer, 1988, pp. 186–206.
[8] A. Bordes, J. Weston, R. Collobert, and Y. Bengio, “Learning structured embeddings of knowledge bases,” in Proc. AAAI, 2011, pp. 301–306.
[9] Y. Lin, X. Han, R. Xie, Z. Liu, and M. Sun, “Knowledge representation learning: A quantitative review,” 2018, arXiv:1812.10901. [Online]. Available: http://arxiv.org/abs/1812.10901
[10] R. H. Richens, “Preprogramming for mechanical translation,” Mech. Transl., vol. 3, no. 1, pp. 20–25, 1956.
[11] H. Paulheim, “Knowledge graph refinement: A survey of approaches and evaluation methods,” Semantic Web, vol. 8, no. 3, pp. 489–508, Dec. 2016.
[12] L. Ehrlinger and W. Wöß, “Towards a definition of knowledge graphs,” SEMANTiCS (Posters, Demos, SuCCESS), vol. 48, pp. 1–4, 2016.
[13] T. Wu, G. Qi, C. Li, and M. Wang, “A survey of techniques for constructing Chinese knowledge graphs and their applications,” Sustainability, vol. 10, no. 9, p. 3245, Sep. 2018.
[14] X. Chen, S. Jia, and Y. Xiang, “A review: Knowledge reasoning over knowledge graph,” Expert Syst. Appl., vol. 141, Mar. 2020, Art. no. 112948.
[15] T. Ebisu and R. Ichise, “TorusE: Knowledge graph embedding on a lie group,” in Proc. AAAI, 2018, pp. 1819–1826.
[16] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Translating embeddings for modeling multi-relational data,” in Proc. NIPS, 2013, pp. 2787–2795.
[17] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” in Proc. AAAI, 2015, pp. 2181–2187.
[18] R. Socher, D. Chen, C. D. Manning, and A. Ng, “Reasoning with neural tensor networks for knowledge base completion,” in Proc. NIPS, 2013, pp. 926–934.
[19] Z. Zhang, J. Cai, Y. Zhang, and J. Wang, “Learning hierarchy-aware knowledge graph embeddings for link prediction,” in Proc. AAAI, 2020, pp. 3065–3072.
[20] Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph embedding by translating on hyperplanes,” in Proc. AAAI, 2014, pp. 1112–1119.
[21] M. Nickel, L. Rosasco, and T. Poggio, “Holographic embeddings of knowledge graphs,” in Proc. AAAI, 2016, pp. 1955–1961.
[22] H. Liu, Y. Wu, and Y. Yang, “Analogical inference for multi-relational embeddings,” in Proc. ICML, 2017, pp. 2168–2178.
[23] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, “Complex embeddings for simple link prediction,” in Proc. ICML, 2016, pp. 2071–2080.
[24] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang, “RotatE: Knowledge graph embedding by relational rotation in complex space,” in Proc. ICLR, 2019, pp. 1–18.
[25] S. Zhang, Y. Tay, L. Yao, and Q. Liu, “Quaternion knowledge graph embedding,” in Proc. NeurIPS, 2019, pp. 2731–2741.
[26] S. He, K. Liu, G. Ji, and J. Zhao, “Learning to represent knowledge graphs with Gaussian embedding,” in Proc. 24th ACM Int. Conf. Inf. Knowl. Manage., Oct. 2015, pp. 623–632.
[27] H. Xiao, M. Huang, and X. Zhu, “TransG: A generative model for knowledge graph embedding,” in Proc. ACL, vol. 1, 2016, pp. 2316–2325.
[28] H. Xiao, M. Huang, Y. Hao, and X. Zhu, “From one point to a manifold: Orbit models for knowledge graph embedding,” in Proc. IJCAI, 2016, pp. 1315–1321.
[29] I. Balazevic, C. Allen, and T. Hospedales, “Multi-relational poincaré graph embeddings,” in NeurIPS, 2019, pp. 4463–4473.
[30] I. Chami, A. Wolf, D.-C. Juan, F. Sala, S. Ravi, and C. Ré, “Low-dimensional hyperbolic knowledge graph embeddings,” in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 6901–6914.
[31] C. Xu and R. Li, “Relation embedding with dihedral group in knowledge graph,” in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 263–272.
[32] B. Yang, W.-T. Yih, X. He, J. Gao, and L. Deng, “Embedding entities and relations for learning and inference in knowledge bases,” in Proc. ICLR, 2015, pp. 1–13.
[33] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao, “Knowledge graph embedding via dynamic mapping matrix,” in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics 7th Int. Joint Conf. Natural Lang. Process., vol. 1, 2015, pp. 687–696.
[34] H. Xiao, M. Huang, Y. Hao, and X. Zhu, “TransA: An adaptive approach for knowledge graph embedding,” in Proc. AAAI, 2015, pp. 1–7.
[35] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, and X. Zhu, “Knowledge graph embedding by flexible translation,” in Proc. KR, 2016, pp. 557–560.
[36] Q. Xie, X. Ma, Z. Dai, and E. Hovy, “An interpretable knowledge transfer model for knowledge base completion,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, vol. 1, 2017, pp. 950–962.
[37] W. Qian, C. Fu, Y. Zhu, D. Cai, and X. He, “Translating embeddings for knowledge graph completion with relation attention mechanism,” in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 4286–4292.
[38] S. Yang, J. Tian, H. Zhang, J. Yan, H. He, and Y. Jin, “TransMS: Knowledge graph embedding for complex relations by multidirectional semantics,” in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019, pp. 1935–1942.
[39] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “A semantic matching energy function for learning with multi-relational data,” Mach. Learn., vol. 94, no. 2, pp. 233–259, Feb. 2014.
[40] Y. Xue, Y. Yuan, Z. Xu, and A. Sabharwal, “Expanding holographic embeddings for knowledge completion,” in Proc. NeurIPS, 2018, pp. 4491–4501.
[41] K. Hayashi and M. Shimbo, “On the equivalence of holographic and complex embeddings for link prediction,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, vol. 1, 2017, pp. 554–559.
[42] W. Zhang, B. Paudel, W. Zhang, A. Bernstein, and H. Chen, “Interaction embeddings for prediction and explanation in knowledge graphs,” in Proc. 12th ACM Int. Conf. Web Search Data Mining, Jan. 2019, pp. 96–104.
[43] D. Q. Nguyen, T. D. Nguyen, D. Q. Nguyen, and D. Phung, “A novel embedding model for knowledge base completion based on convolutional neural network,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., vol. 2, 2018, pp. 327–333.
[44] C. Shang, Y. Tang, J. Huang, J. Bi, X. He, and B. Zhou, “End-to-end structure-aware convolutional networks for knowledge base completion,” in Proc. AAAI, vol. 33, 2019, pp. 3060–3067.
[45] L. Guo, Z. Sun, and W. Hu, “Learning to exploit long-term relational dependencies in knowledge graphs,” in Proc. ICML, 2019, pp. 2505–2514.
[46] Q. Wang et al., “CoKE: Contextualized knowledge graph embedding,” 2019, arXiv:1911.02168. [Online]. Available: http://arxiv.org/abs/1911.02168
[47] Y. Wang, R. Gemulla, and H. Li, “On multi-relational link prediction with bilinear models,” in Proc. AAAI, 2018, pp. 4227–4234.
[48] S. M. Kazemi and D. Poole, “SimplE embedding for link prediction in knowledge graphs,” in Proc. NeurIPS, 2018, pp. 4284–4295.
[49] M. Nickel, V. Tresp, and H.-P. Kriegel, “A three-way model for collective learning on multi-relational data,” in Proc. ICML, vol. 11, 2011, pp. 809–816.
[50] M. Nickel, V. Tresp, and H. P. Kriegel, “Factorizing YAGO: Scalable machine learning for linked data,” in Proc. WWW, 2012, pp. 271–280.
Erik Cambria (Fellow, IEEE) received the Ph.D. degree through a joint program between the University of Stirling, Stirling, U.K., and the Media Lab, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2012.

He worked at Microsoft Research Asia, Beijing, China, and HP Labs India, Bengaluru, India. He is the Founder of SenticNet, a Singapore-based company offering B2B sentiment analysis services, and an Associate Professor with Nanyang Technological University (NTU), Singapore, where he also holds the appointment of the Provost Chair in Computer Science and Engineering.

Dr. Cambria is involved in many international conferences as a PC member and Program Chair. He is also an Associate Editor of several journals, e.g., Neurocomputing (NEUCOM), Information Fusion (INFFUS), Knowledge-Based Systems (KBS), IEEE Computational Intelligence Magazine (IEEE CIM), and IEEE Intelligent Systems (where he manages the Department of Affective Computing and Sentiment Analysis).