
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 2, FEBRUARY 2022

A Survey on Knowledge Graphs: Representation, Acquisition, and Applications
Shaoxiong Ji, Shirui Pan, Member, IEEE, Erik Cambria, Fellow, IEEE, Pekka Marttinen, and Philip S. Yu, Life Fellow, IEEE

Abstract— Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction toward cognition and human-level intelligence. In this survey, we provide a comprehensive review of the knowledge graph covering overall research topics about: 1) knowledge graph representation learning; 2) knowledge acquisition and completion; 3) temporal knowledge graph; and 4) knowledge-aware applications and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning are reviewed. We further explore several emerging topics, including metarelational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of data sets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.

Index Terms— Deep learning, knowledge graph completion (KGC), knowledge graph, reasoning, relation extraction, representation learning.

NOMENCLATURE
G                  Knowledge graph.
F                  Set of facts.
(h, r, t)          Triple of head, relation, and tail.
h, r, t            Embeddings of head, relation, and tail.
r ∈ R, e ∈ E       Relation set and entity set.
v ∈ V              Vertex in the vertex set.
ξ ∈ E_G            Edge in the edge set.
e_s, e_q, e_t      Source/query/current entity.
r_q                Query relation.
⟨w_1, . . . , w_n⟩  Text corpus.
d_·(·)             Distance metric in a specific space.
f_r(h, t)          Scoring function.
σ(·), g(·)         Nonlinear activation function.
M_r                Mapping matrix.
M̂                  Tensor.
L                  Loss function.
R^d                d-dimensional real-valued space.
C^d                d-dimensional complex space.
H^d                d-dimensional hypercomplex space.
T^d                d-dimensional torus space.
B^d_c              d-dimensional hyperbolic space with curvature c.
N(u, σ²I)          Gaussian distribution.
⟨h, t⟩             Hermitian dot product.
t ⊗ r              Hamilton product.
h ◦ t, h ⊙ t       Hadamard (elementwise) product.
h ⋆ t              Circular correlation.
concat(), [h, r]   Vector/matrix concatenation.
ω                  Convolutional filters.
∗                  Convolution operator.
Manuscript received August 9, 2020; revised January 16, 2021; accepted March 30, 2021. Date of publication April 26, 2021; date of current version February 4, 2022. This work was supported in part by NSF under Grant III-1763325, Grant III-1909323, and Grant SaTC-1930941; in part by the Agency for Science, Technology and Research (A*STAR) through its AME Programmatic Funding Scheme under Project A18A2b0046; in part by the Academy of Finland under Grant 336033 and Grant 315896; in part by Business Finland under Grant 884/31/2018; and in part by EU H2020 under Grant 101016775. (Corresponding author: Shirui Pan.)
Shaoxiong Ji and Pekka Marttinen are with the Department of Computer Science, Aalto University, 02150 Aalto, Finland (e-mail: [email protected]; [email protected]).
Shirui Pan is with the Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia (e-mail: [email protected]).
Erik Cambria is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).
Philip S. Yu is with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607 USA (e-mail: [email protected]).
This article has supplementary material provided by the authors and color versions of one or more figures available at https://doi.org/10.1109/TNNLS.2021.3070843.
Digital Object Identifier 10.1109/TNNLS.2021.3070843

I. INTRODUCTION

INCORPORATING human knowledge is one of the research directions of artificial intelligence (AI). Knowledge representation and reasoning, inspired by human problem solving, are to represent knowledge for intelligent systems to gain the ability to solve complex tasks [1], [2]. Recently, knowledge graphs as a form of structured human knowledge have drawn great research attention from both academia and industry [3]–[6]. A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts, relationships represent the relation between entities, and semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning. Property graphs or attributed graphs are widely used, in which nodes and relations have properties or attributes.
The term knowledge graph is synonymous with knowledge base, with a minor difference. A knowledge graph can be viewed as a graph when considering its graph structure [7]. When it involves formal semantics,


Fig. 1. Example of knowledge base and knowledge graph. (a) Factual triples in knowledge base. (b) Entities and relations in knowledge graph.

it can be taken as a knowledge base for interpretation and inference over facts [8]. Examples of knowledge base and knowledge graph are illustrated in Fig. 1. Knowledge can be expressed in a factual triple in the form of (head, relation, tail) or (subject, predicate, object) under the resource description framework (RDF), for example, (Albert Einstein, WinnerOf, Nobel Prize). It can also be represented as a directed graph with nodes as entities and edges as relations. For simplicity and following the trend of the research community, this article uses the terms knowledge graph and knowledge base interchangeably.
Recent advances in knowledge-graph-based research focus on knowledge representation learning (KRL) or knowledge graph embedding (KGE) by mapping entities and relations into low-dimensional vectors while capturing their semantic meanings [5], [9]. Specific knowledge acquisition tasks include knowledge graph completion (KGC), triple classification, entity recognition, and relation extraction. Knowledge-aware models benefit from the integration of heterogeneous information, rich ontologies and semantics for knowledge representation, and multilingual knowledge. Thus, many real-world applications, such as recommendation systems and question answering, have prospered thanks to the ability of commonsense understanding and reasoning. Some real-world products, for example, Microsoft's Satori and Google's Knowledge Graph [3], have shown a strong capacity to provide more efficient services.
This article conducts a comprehensive survey of current literature on knowledge graphs, which enriches graphs with more context, intelligence, and semantics for knowledge acquisition and knowledge-aware applications. Our main contributions are summarized as follows.
1) Comprehensive Review: We conduct a comprehensive review of the origin of knowledge graphs and modern techniques for relational learning on knowledge graphs. Major neural architectures of knowledge graph representation learning and reasoning are introduced and compared. Moreover, we provide a complete overview of many applications in different domains.
2) Full-View Categorization and New Taxonomies: A full-view categorization of research on knowledge graph, together with fine-grained new taxonomies, is presented. Specifically, at the high level, we review the research on knowledge graphs in four aspects: KRL, knowledge acquisition, temporal knowledge graphs, and knowledge-aware applications. For KRL, we further propose fine-grained taxonomies into four views, including representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, KGC is reviewed under embedding-based ranking, relational path reasoning, logical rule reasoning, and metarelational learning; entity acquisition tasks are divided into entity recognition, typing, disambiguation, and alignment; and relation extraction is discussed according to the neural paradigms.
3) Wide Coverage on Emerging Advances: We provide wide coverage on emerging topics, including transformer-based knowledge encoding, graph neural network (GNN)-based knowledge propagation, reinforcement learning (RL)-based path reasoning, and metarelational learning.
4) Summary and Outlook on Future Directions: This survey provides a summary of each category and highlights promising future research directions.
The remainder of this survey is organized as follows. First, an overview of knowledge graphs, including history, notations, definitions, and categorization, is given in Section II. Then, we discuss KRL in Section III from four scopes. Next, our review goes to tasks of knowledge acquisition and temporal knowledge graphs in Sections IV and V. Downstream applications are introduced in Section VI. Finally, we discuss future research directions, together with a conclusion in the end. Other information, including KRL model training and a collection of knowledge graph data sets and open-source implementations, can be found in the appendixes.

II. OVERVIEW

A. Brief History of Knowledge Bases

Knowledge representation has experienced a long history of development in the fields of logic and AI. The idea of graphical knowledge representation first dated back to 1956 as the concept of semantic net proposed by Richens [10], while the symbolic logic knowledge can go back to the General Problem Solver [1] in 1959. The knowledge base was first used with knowledge-based systems for reasoning and problem-solving. MYCIN [2] is one of the most famous rule-based expert systems for medical diagnosis, with a knowledge base of about 600 rules. Later, the community of human knowledge representation saw the development of frame-based languages, rule-based, and hybrid representations. Approximately at the end of this period, the Cyc project¹ began, aiming at assembling human knowledge. RDF² and Web Ontology Language (OWL)³ were released in turn and became important standards of the Semantic Web.⁴ Then, many open knowledge bases or ontologies were published, such as WordNet, DBpedia, YAGO, and Freebase. Stokman and Vries [7] proposed a modern idea of structured knowledge in a graph in 1988. However, it was in 2012 that the concept of knowledge graph gained great popularity with its first launch by Google's search engine,⁵ where the knowledge fusion framework called Knowledge Vault [3] was proposed to build large-scale knowledge graphs. A brief road map of knowledge base history is illustrated in Fig. 1 in Appendix A in the Supplementary Material. Many general knowledge graph databases and domain-specific knowledge bases have been released to facilitate research. We introduce more general and domain-specific knowledge bases in Appendixes F-A1 and F-A2 in the Supplementary Material.

1 http://cyc.com
2 Released as a W3C recommendation in 1999, available at http://w3.org/TR/1999/REC-rdf-syntax-19990222
3 http://w3.org/TR/owl-guide
4 http://w3.org/standards/semanticweb
5 http://blog.google/products/search/introducing-knowledge-graph-things-not


B. Definitions and Notations

Most efforts have been made to give a definition by describing general semantic representation or essential characteristics. However, there is no such wide-accepted formal definition. Paulheim [11] defined four criteria for knowledge graphs. Ehrlinger and Wöß [12] analyzed several existing definitions and proposed Definition 1, which emphasizes the reasoning engine of knowledge graphs. Wang et al. [5] proposed a definition as a multirelational graph in Definition 2. Following previous literature, we define a knowledge graph as G = {E, R, F}, where E, R, and F are sets of entities, relations, and facts, respectively. A fact is denoted as a triple (h, r, t) ∈ F.
Definition 1 (Ehrlinger and Wöß [12]): A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
Definition 2 (Wang et al. [5]): A knowledge graph is a multirelational graph composed of entities and relations, which are regarded as nodes and different types of edges, respectively.
Specific notations and their descriptions are listed in the Nomenclature. Details of several mathematical operations are explained in Appendix B in the Supplementary Material.

Fig. 2. Categorization of research on knowledge graphs.

C. Categorization of Research on Knowledge Graph

This survey provides a comprehensive literature review on the research of knowledge graphs, namely, KRL, knowledge acquisition, and a wide range of downstream knowledge-aware applications, where many recent advanced deep learning techniques are integrated. The overall categorization of the research is illustrated in Fig. 2.
Knowledge Representation Learning is a critical research issue of the knowledge graph, which paves the way for many knowledge acquisition tasks and downstream applications. We categorize KRL into four aspects of representation space, scoring function, encoding models, and auxiliary information, providing a clear workflow for developing a KRL model. Specific ingredients include the following:
1) representation space in which the relations and entities are represented;
2) scoring function for measuring the plausibility of factual triples;
3) encoding models for representing and learning relational interactions;
4) auxiliary information to be incorporated into the embedding methods.
Representation learning includes pointwise space, manifold, complex vector space, the Gaussian distribution, and discrete space. Scoring metrics are generally divided into distance-based and similarity-matching-based scoring functions. Current research focuses on encoding models, including linear/bilinear models, factorization, and neural networks. Auxiliary information considers textual, visual, and type information.
Knowledge Acquisition tasks are divided into three categories, i.e., KGC, relation extraction, and entity discovery. The first one is for expanding existing knowledge graphs, while the other two discover new knowledge (also known as relations and entities) from the text. KGC falls into the following categories: embedding-based ranking, relation path reasoning, rule-based reasoning, and metarelational learning. Entity discovery includes recognition, disambiguation, typing, and alignment. Relation extraction models utilize attention mechanisms, graph convolutional networks (GCNs), adversarial training (AT), RL, deep residual learning, and transfer learning.
Temporal Knowledge Graphs incorporate temporal information for representation learning. This survey categorizes four research fields, including temporal embedding, entity dynamics, temporal relational dependence, and temporal logical reasoning.
Knowledge-Aware Applications include natural language understanding (NLU), question answering, recommendation systems, and miscellaneous real-world tasks, which inject knowledge to improve representation learning.

D. Related Surveys

Previous survey papers on knowledge graphs mainly focus on statistical relational learning [4], knowledge graph refinement [11], Chinese knowledge graph construction [13], knowledge reasoning [14], KGE [5], or KRL [9]. The latter two surveys are more related to our work. Lin et al. [9] presented KRL in a linear manner, with a concentration on quantitative analysis. Wang et al. [5] categorized KRL according to scoring functions and specifically focused on the type of information utilized in KRL. It provides a general view of current research only from the perspective of scoring metrics. Our survey goes deeper into the flow of KRL and provides a full-scaled view from four folds, including representation space, scoring function, encoding models, and auxiliary information. Besides, our paper provides a comprehensive review of knowledge acquisition and knowledge-aware applications with several emerging topics, such as knowledge-graph-based reasoning and few-shot learning, discussed.

III. KNOWLEDGE REPRESENTATION LEARNING

KRL is also known as KGE, multirelation learning, and statistical relational learning in the literature. This section reviews recent advances on distributed representation learning with rich semantic information of entities and relations from four scopes, including representation space (representing entities and relations, Section III-A), scoring function (measuring the plausibility of facts, Section III-B), encoding models (modeling the semantic interaction of facts, Section III-C), and auxiliary information (utilizing external information, Section III-D). We further provide a summary in Section III-E. The training strategies for KRL models are reviewed in Appendix D in the Supplementary Material.
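To make the notation of Section II-B concrete before turning to specific models, the following minimal Python sketch (the toy facts, variable names, and helper structures are illustrative and not part of the survey) builds a small knowledge graph G = {E, R, F} and converts each fact (h, r, t) into the integer indices that the embedding models reviewed in Section III consume.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

# A toy fact set F; the knowledge graph is G = {E, R, F}.
facts = [
    Triple("Albert Einstein", "WinnerOf", "Nobel Prize"),
    Triple("Albert Einstein", "GraduateFrom", "University of Zurich"),
]

# Entity set E and relation set R induced from the facts.
entities = sorted({e for f in facts for e in (f.head, f.tail)})
relations = sorted({f.relation for f in facts})

# Integer indices, the usual input format when triples are fed to a KRL model.
ent2id = {e: i for i, e in enumerate(entities)}
rel2id = {r: i for i, r in enumerate(relations)}
indexed = [(ent2id[f.head], rel2id[f.relation], ent2id[f.tail]) for f in facts]
print(indexed)
```

Any scoring function f_r(h, t) from Section III-B can then be evaluated on such indexed triples by looking up the corresponding embedding rows.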


Fig. 3. Illustration of knowledge representation in different spaces. (a) Pointwise space. (b) Complex vector space. (c) Gaussian distribution. (d) Manifold space.

A. Representation Space

The key issue of representation learning is to learn low-dimensional distributed embedding of entities and relations. Current literature mainly uses real-valued pointwise space [see Fig. 3(a)], including vector, matrix, and tensor space, while other kinds of space, such as complex vector space [see Fig. 3(b)], Gaussian space [see Fig. 3(c)], and manifold [see Fig. 3(d)], are utilized as well. The embedding space should follow three conditions, i.e., differentiability, calculation possibility, and definability of a scoring function [15].
1) Pointwise Space: The pointwise Euclidean space is widely applied for representing entities and relations, projecting relation embedding in vector or matrix space, or capturing relational interactions. TransE [16] represents entities and relations in d-dimensional vector space, i.e., h, t, r ∈ R^d, and makes embeddings follow the translational principle h + r ≈ t. To tackle the problem of the insufficiency of a single space for both entities and relations, TransR [17] then further introduces separated spaces for entities and relations. The authors projected entities (h, t ∈ R^k) into relation (r ∈ R^d) space by a projection matrix M_r ∈ R^{k×d}. NTN [18] models entities across multiple dimensions by a bilinear tensor neural layer. The relational interaction between head and tail, hᵀ M̂ t, is captured as a tensor denoted as M̂ ∈ R^{d×d×k}. Instead of using the Cartesian coordinate system, HAKE [19] captures semantic hierarchies by mapping entities into the polar coordinate system, i.e., entity embeddings e_m ∈ R^d and e_p ∈ [0, 2π)^d in the modulus and phase part, respectively.
Many other translational models, such as TransH [20], also use a similar representation space, while semantic matching models use plain vector space (e.g., HolE [21]) and relational projection matrix (e.g., ANALOGY [22]). Principles of these translational and semantic matching models are introduced in Sections III-B1 and III-B2, respectively.
2) Complex Vector Space: Instead of using a real-valued space, entities and relations can be represented in a complex space, where h, t, r ∈ C^d. Taking the head entity as an example, h has a real part Re(h) and an imaginary part Im(h), i.e., h = Re(h) + i Im(h). ComplEx [23] first introduces the complex vector space shown in Fig. 3(b), which can capture both symmetric and antisymmetric relations. The Hermitian dot product is used to do composition for relation, head, and the conjugate of the tail. Inspired by Euler's identity e^{iθ} = cos θ + i sin θ, RotatE [24] proposes a rotational model taking relation as a rotation from head entity to tail entity in complex space, i.e., t = h ◦ r, where ◦ denotes the elementwise Hadamard product. QuatE [25] extends the complex-valued space into hypercomplex h, t, r ∈ H^d by a quaternion Q = a + bi + cj + dk with three imaginary components, where the quaternion inner product, i.e., the Hamilton product h ⊗ r, is used as the compositional operator for head entity and relation. With the introduction of the rotational Hadamard product in complex space, RotatE [24] can also capture inversion and composition patterns, as well as symmetry and antisymmetry. QuatE [25] uses the Hamilton product to capture latent interdependencies within the 4-D space of entities and relations and gains a more expressive rotational capability than RotatE.
3) Gaussian Distribution: Inspired by Gaussian word embedding, the density-based embedding model KG2E [26] introduces Gaussian distribution to deal with the (un)certainties of entities and relations. The authors embedded entities and relations into multidimensional Gaussian distributions H ∼ N(μ_h, Σ_h) and T ∼ N(μ_t, Σ_t). The mean vector μ indicates the position of entities and relations, and the covariance matrix Σ models their (un)certainties. Following the translational principle, the probability distribution of the entity transformation H − T is denoted as P_e ∼ N(μ_h − μ_t, Σ_h + Σ_t). Similarly, TransG [27] represents entities with Gaussian distributions, while it draws a mixture of Gaussian distributions for relation embedding, where the mth component translation vector of relation r is denoted as μ_{r,m} = t − h ∼ N(μ_t − μ_h, (σ_h² + σ_t²)E).
4) Manifold and Group: This section reviews knowledge representation in manifold space, Lie group, and dihedral group. A manifold is a topological space, which could be defined as a set of points with neighborhoods by set theory. A group is an algebraic structure defined in abstract algebra. Previous pointwise modeling is an ill-posed algebraic system where the number of scoring equations is far more than the number of entities and relations. Moreover, embeddings are restricted in an overstrict geometric form even in some methods with subspace projection. To tackle these issues, ManifoldE [28] extends pointwise embedding into manifold-based embedding. The authors introduced two settings of manifold-based embedding, i.e., sphere and hyperplane. An example of a sphere is shown in Fig. 3(d). For the sphere setting, reproducing kernel Hilbert space is used to represent the manifold function. Another "hyperplane" setting is introduced to enhance the model with intersected embeddings. ManifoldE [28] relaxes the real-valued pointwise space into manifold space with a more expressive representation from the geometric perspective. When the manifold function and relation-specific manifold parameter are set to zero, the manifold collapses into a point.
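As a hedged illustration of the two spaces just discussed, the sketch below scores one triple under the translational principle h + r ≈ t in real-valued space (TransE-style) and under an elementwise rotation t = h ◦ r in complex space (RotatE-style). The embeddings are random toy vectors rather than trained parameters, so the numbers only demonstrate the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (toy value)

# --- TransE-style: entities and relations in real-valued space R^d ---
h, r, t = rng.normal(size=(3, d))
transe_score = -np.linalg.norm(h + r - t, ord=1)   # higher means more plausible

# --- RotatE-style: complex space C^d, relation as an elementwise rotation ---
h_c = rng.normal(size=d) + 1j * rng.normal(size=d)
t_c = rng.normal(size=d) + 1j * rng.normal(size=d)
theta = rng.uniform(0, 2 * np.pi, size=d)
r_c = np.exp(1j * theta)                 # |r_i| = 1, so r acts as a pure rotation
rotate_score = -np.linalg.norm(h_c * r_c - t_c)    # distance after rotating h

print(transe_score, rotate_score)
```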


Hyperbolic space, a multidimensional Riemannian manifold with a constant negative curvature −c (c > 0): B^{d,c} = {x ∈ R^d : ‖x‖² < 1/c}, is drawing attention for its capacity of capturing hierarchical information. MuRP [29] represents the multirelational knowledge graph in the Poincaré ball of hyperbolic space B^d_c = {x ∈ R^d : c‖x‖² < 1}, while it fails to capture logical patterns and suffers from constant curvature. Chami et al. [30] leverage expressive hyperbolic isometries and learn a relation-specific absolute curvature c_r in the hyperbolic space.
TorusE [15] solves the regularization problem of TransE via embedding in an n-dimensional torus space, which is a compact Lie group. With the projection from vector space into torus space defined as π: R^n → T^n, x ↦ [x], entities and relations are denoted as [h], [r], [t] ∈ T^n. Similar to TransE, it also learns embeddings following the relational translation in torus space, i.e., [h] + [r] ≈ [t]. Recently, DihEdral [31] proposes a dihedral symmetry group preserving a 2-D polygon. It utilizes a finite non-Abelian group to preserve the relational properties of symmetry/skew-symmetry, inversion, and composition effectively with the rotation and reflection properties in the dihedral group.

Fig. 4. Illustrations of distance-based and similarity matching-based scoring functions taking TransE [16] and DistMult [32] as examples. (a) Translational distance-based scoring of TransE. (b) Semantic similarity-based scoring of DistMult.

B. Scoring Function

The scoring function is used to measure the plausibility of facts, also referred to as the energy function in the energy-based learning framework. Energy-based learning aims to learn the energy function E_θ(x) (parameterized by θ, taking x as input) and to make sure that positive samples have higher scores than negative samples. In this article, the term scoring function is adopted for unification. There are two typical types of scoring functions, i.e., distance-based [see Fig. 4(a)] and similarity-based [see Fig. 4(b)] functions, to measure the plausibility of a fact. The distance-based scoring function measures the plausibility of facts by calculating the distance between entities, where additive translation with relations as h + r ≈ t is widely used. Semantic similarity-based scoring measures the plausibility of facts by semantic matching. It usually adopts a multiplicative formulation, i.e., hᵀ M_r ≈ tᵀ, to transform the head entity near the tail in the representation space.
1) Distance-Based Scoring Function: An intuitive distance-based approach is to calculate the Euclidean distance between the relational projections of entities. Structural embedding (SE) [8] uses two projection matrices and the L1 distance to learn SE as

f_r(h, t) = ‖M_{r,1} h − M_{r,2} t‖_{L1}.   (1)

A more intensively used principle is the translation-based scoring function that aims to learn embeddings by representing relations as translations from head to tail entities. Bordes et al. [16] proposed TransE by assuming that the added embedding of h + r should be close to the embedding of t, with the scoring function defined under L1 or L2 constraints as

f_r(h, t) = ‖h + r − t‖_{L1/L2}.   (2)

Since then, many variants and extensions of TransE have been proposed. For example, TransH [20] projects entities and relations into a hyperplane, TransR [17] introduces separate projection spaces for entities and relations, and TransD [33] constructs dynamic mapping matrices M_{rh} = r_p h_pᵀ + I and M_{rt} = r_p t_pᵀ + I by the projection vectors h_p, t_p, r_p ∈ R^n. By replacing the Euclidean distance, TransA [34] uses the Mahalanobis distance to enable more adaptive metric learning. While previous methods used additive score functions, TransF [35] relaxes the strict translation and uses the dot product as f_r(h, t) = (h + r)ᵀ t. To balance the constraints on head and tail, a flexible translation scoring function is further proposed.
Recently, ITransF [36] enables hidden concept discovery and statistical strength transferring by learning associations between relations and concepts via sparse attention vectors, with the scoring function defined as

f_r(h, t) = ‖α_r^H · D · h + r − α_r^T · D · t‖   (3)

where D ∈ R^{n×d×d} is the stacked concept projection matrices of entities and relations, and α_r^H, α_r^T ∈ [0, 1]^n are attention vectors calculated by sparse softmax. TransAt [37] integrates a relation attention mechanism with translational embedding, and TransMS [38] transmits multidirectional semantics with nonlinear functions and linear bias vectors, with the scoring function as

f_r(h, t) = ‖−tanh(t ◦ r) ◦ h + r − tanh(h ◦ r) ◦ t + α · (h ◦ t)‖_{1/2}.   (4)

KG2E [26] in the Gaussian space and ManifoldE [28] with manifold also use the translational distance-based scoring function. KG2E uses two scoring methods, i.e., asymmetric KL-divergence and symmetric expected likelihood, while the scoring function of ManifoldE is defined as

f_r(h, t) = ‖M(h, r, t) − D_r²‖²   (5)

where M is the manifold function and D_r is a relation-specific manifold parameter.
2) Semantic Matching: Another direction is to calculate semantic similarity. SME [39] proposes to semantically match separate combinations of entity–relation pairs of (h, r) and (r, t). Its scoring function is defined with two versions of matching blocks, linear and bilinear, i.e.,

f_r(h, t) = g_left(h, r)ᵀ g_right(r, t).   (6)

The linear matching block is defined as g_left(h, r) = M_{l,1} h + M_{l,2} r + b_l, and the bilinear form is g_left(h, r) = (M_{l,1} h) ◦ (M_{l,2} r) + b_l. By restricting the relation matrix M_r to be diagonal for multirelational representation learning, DistMult [32] proposes a simplified bilinear formulation defined as

f_r(h, t) = hᵀ diag(M_r) t.   (7)
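The two scoring families above can be contrasted in a few lines. The sketch below (NumPy, random toy parameters; not code from the cited papers) computes the distance-based structural embedding score of (1) and the similarity-based DistMult score of (7).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
h, t = rng.normal(size=(2, d))

# Distance-based scoring, as in (1): two relation-specific projections
# M_{r,1}, M_{r,2} and an L1 distance between the projected entities.
M_r1, M_r2 = rng.normal(size=(2, d, d))
se_score = -np.linalg.norm(M_r1 @ h - M_r2 @ t, ord=1)

# Semantic matching, as in (7): DistMult restricts the relation matrix
# to a diagonal, giving the bilinear product h^T diag(r) t.
r = rng.normal(size=d)
distmult_score = h @ np.diag(r) @ t      # equivalently np.sum(h * r * t)

print(se_score, distmult_score)
```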

To capture productive interactions in relational data and compute efficiently, HolE [21] introduces a circular correlation of embedding, which can be interpreted as a compressed tensor product, to learn compositional representations. By defining a perturbed holographic compositional operator as p(a, b; c) = (c ◦ a) ⋆ b, where c is a fixed vector, the expanded holographic embedding model HolEx [40] interpolates the HolE and full tensor product method. It can be viewed as the linear concatenation of perturbed HolE. Focusing on multirelational inference, ANALOGY [22] models analogical structures of relational data. Its scoring function is defined as

f_r(h, t) = hᵀ M_r t   (8)

with the relation matrix constrained to be a normal matrix in linear mapping, i.e., M_r M_rᵀ = M_rᵀ M_r, for analogical inference. HolE with Fourier transform in the frequency domain can be viewed as a special case of ComplEx [41], which connects holographic and complex embeddings. The analogical embedding framework [22] can recover or equivalently obtain several models, such as DistMult, ComplEx, and HolE, by restricting the embedding dimension and scoring function. Crossover interactions are introduced by CrossE [42] with an interaction matrix C ∈ R^{n_r×d} to simulate the bidirectional interaction between entity and relation. The relation-specific interaction is obtained by looking up the interaction matrix as c_r = x_rᵀ C. By combining the interactive representations and matching with the tail embedding, the scoring function is defined as

f(h, r, t) = σ(tanh(c_r ◦ h + c_r ◦ h ◦ r + b) tᵀ).   (9)

The semantic matching principle can be encoded by neural networks further discussed in Section III-C.
The two methods mentioned above in Section III-A4 with group representation also follow the semantic matching principle. The scoring function of TorusE [15] is defined as

f_r(h, t) = min_{(x,y) ∈ ([h]+[r]) × [t]} ‖x − y‖_i.   (10)

By modeling 2L relations as group elements, the scoring function of DihEdral [31] is defined as the summation of components

f_r(h, t) = hᵀ R t = Σ_{l=1}^{L} h^{(l)ᵀ} R^{(l)} t^{(l)}   (11)

where the relation matrix R is defined in block diagonal form for R^{(l)} ∈ D_K, and entities are embedded in real-valued space for h^{(l)} and t^{(l)} ∈ R².

Fig. 5. Illustrations of neural encoding models. (a) CNN [43] feeds triples into dense layers and a convolution operation to learn semantic representations. (b) GCN [44] acts as an encoder of knowledge graphs to produce entity and relation embeddings. (c) RSN [45] encodes entity–relation sequences and skips relations discriminatively. (d) Transformer-based CoKE [46] encodes triples as sequences with an entity replaced by [MASK].

C. Encoding Models

This section introduces models that encode the interactions of entities and relations through specific model architectures, including linear/bilinear models, factorization models, and neural networks. Linear models formulate relations as a linear/bilinear mapping by projecting head entities into a representation space close to tail entities. Factorization aims to decompose relational data into low-rank matrices for representation learning. Neural networks encode relational data with nonlinear neural activation and more complex network structures by matching the semantic similarity of entities and relations. Several neural models are illustrated in Fig. 5.
1) Linear/Bilinear Models: Linear/bilinear models encode interactions of entities and relations by applying linear operations as

g_r(h, t) = M_rᵀ [h; t]   (12)

or bilinear transformation operations as in (8). Canonical methods with linear/bilinear encoding include SE [8], SME [39], DistMult [32], ComplEx [23], and ANALOGY [22]. For TransE [16] with L2 regularization, the scoring function can be expanded to a form with only a linear transformation of 1-D vectors, i.e.,

‖h + r − t‖₂² = 2rᵀ(h − t) − 2hᵀt + ‖r‖₂² + ‖h‖₂² + ‖t‖₂².   (13)

Wang et al. [47] studied various bilinear models and evaluated their expressiveness and connections by introducing the concepts of universality and consistency. The authors further showed through experiments that ensembles of multiple linear models can improve the prediction performance. Recently, to solve the independence embedding issue of entity vectors in canonical Polyadic decomposition, SimplE [48] introduces the inverse of relations and calculates the average canonical Polyadic score of (h, r, t) and (t, r⁻¹, h) as

f_r(h, t) = ½ ((h ◦ r)ᵀ t + (t ◦ r′)ᵀ h)   (14)

where r′ is the embedding of the inverse relation. Embedding models in the bilinear family, such as RESCAL, DistMult, HolE, and ComplEx, can be transformed from one into another with certain constraints [47]. More bilinear models are proposed from a factorization perspective discussed in Section III-C2.
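Before moving to factorization models, note that the expansion in (13) is a plain algebraic identity; the short check below verifies it numerically on random toy vectors (NumPy, illustrative only).

```python
import numpy as np

rng = np.random.default_rng(2)
h, r, t = rng.normal(size=(3, 8))

lhs = np.linalg.norm(h + r - t) ** 2
rhs = 2 * r @ (h - t) - 2 * h @ t + r @ r + h @ h + t @ t
print(np.isclose(lhs, rhs))  # True: the expansion in (13) holds
```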

2) Factorization Models: Factorization methods formulate KRL models as a three-way tensor X decomposition. A general principle of tensor factorization can be denoted as X_hrt ≈ hᵀ M_r t, with the composition function following the semantic matching pattern. Nickel et al. [49] proposed the three-way rank-r factorization RESCAL over each relational slice of the knowledge graph tensor. For the kth relation of m relations, the kth slice of X is factorized as

X_k ≈ A R_k Aᵀ.   (15)

The authors further extended it to handle attributes of entities efficiently [50]. Jenatton et al. [51] then proposed a bilinear structured latent factor model (LFM), which extends RESCAL by decomposing R_k = Σ_{i=1}^{d} α_i^k u_i v_iᵀ. By introducing three-way Tucker tensor decomposition, TuckER [52] learns to embed by outputting a core tensor and embedding vectors of entities and relations. LowFER [53] proposes a multimodal factorized bilinear pooling mechanism to better fuse entities and relations. It generalizes the TuckER model and is computationally efficient with low-rank approximation.
3) Neural Networks: Neural networks for encoding semantic matching have yielded remarkable predictive performance in recent studies. Encoding models with linear/bilinear blocks can also be modeled using neural networks, for example, SME [39]. Representative neural models include the multilayer perceptron (MLP) [3], the neural tensor network (NTN) [18], and the neural association model (NAM) [54]. They generally feed entities or relations or both into deep neural networks and compute a semantic matching score. MLP [3] encodes entities and relations together into a fully connected layer and uses a second layer with sigmoid activation for scoring a triple as

f_r(h, t) = σ(wᵀ σ(W[h, r, t]))   (16)

where W ∈ R^{n×3d} is the weight matrix and [h, r, t] is a concatenation of three vectors. NTN [18] takes entity embeddings as input associated with a relational tensor and outputs a predictive score as

f_r(h, t) = rᵀ σ(hᵀ M̂ t + M_{r,1} h + M_{r,2} t + b_r)   (17)

where b_r ∈ R^k is the bias for relation r, and M_{r,1} and M_{r,2} are relation-specific weight matrices. It can be regarded as a combination of MLPs and bilinear models. NAM [54] associates the hidden encoding with the embedding of the tail entity and proposes the relational-modulated neural network (RMNN).
4) Convolutional Neural Networks: CNNs are utilized for learning deep expressive features. ConvE [55] uses 2-D convolution over embeddings and multiple layers of nonlinear features to model the interactions between entities and relations by reshaping the head entity and relation into a 2-D matrix, i.e., M_h ∈ R^{d_w×d_h} and M_r ∈ R^{d_w×d_h} for d = d_w × d_h. Its scoring function is defined as

f_r(h, t) = σ(vec(σ([M_h; M_r] ∗ ω)) W) t   (18)

where ω is the convolutional filters and vec is the vectorization operation reshaping a tensor into a vector. ConvE can express semantic information by nonlinear feature learning through multiple layers. ConvKB [43] adopts CNNs for encoding the concatenation of entities and relations without reshaping [see Fig. 5(a)]. Its scoring function is defined as

f_r(h, t) = concat(σ([h, r, t] ∗ ω)) · w.   (19)

The concatenation of a set of feature maps generated by convolution increases the learning ability of latent features. Compared with ConvE, which captures the local relationships, ConvKB keeps the transitional characteristic and shows better experimental performance. HypER [56] utilizes a hypernetwork H for 1-D relation-specific convolutional filter generation to achieve multitask knowledge sharing and, meanwhile, simplifies 2-D ConvE. It can also be interpreted as a tensor factorization model when taking the hypernetwork and weight matrix as tensors.
5) Recurrent Neural Networks: The MLP- and CNN-based models, as mentioned above, learn triplet-level representations. In comparison, recurrent networks can capture long-term relational dependencies in knowledge graphs. Gardner et al. [57] and Neelakantan et al. [58] propose RNN-based models over the relation path to learn vector representations without and with entity information, respectively. RSN [45] [see Fig. 5(c)] designs a recurrent skip mechanism to enhance semantic representation learning by distinguishing relations and entities. The relational path (x_1, x_2, . . . , x_T), with entities and relations in an alternating order, is generated by random walk, and it is further used to calculate the recurrent hidden state h_t = tanh(W_h h_{t−1} + W_x x_t + b). The skipping operation is conducted as

h_t′ = h_t,  if x_t ∈ E;   h_t′ = S_1 h_t + S_2 x_{t−1},  if x_t ∈ R   (20)

where S_1 and S_2 are weight matrices.
6) Transformers: Transformer-based models have boosted contextualized text representation learning. To utilize contextual information in knowledge graphs, CoKE [46] employs transformers to encode edges and path sequences. Similarly, KG-BERT [59] borrows the idea from language model pretraining and takes the Bidirectional Encoder Representations from Transformers (BERT) model as an encoder for entities and relations.
7) Graph Neural Networks: GNNs are introduced for learning connectivity structure under an encoder–decoder framework. R-GCN [60] proposes relation-specific transformations to model the directed nature of knowledge graphs. Its forward propagation is defined as

x_i^{(l+1)} = σ( Σ_{r∈R} Σ_{j∈N_i^r} (1/c_{i,r}) W_r^{(l)} x_j^{(l)} + W_0^{(l)} x_i^{(l)} )   (21)

where x_i^{(l)} ∈ R^{d^{(l)}} is the hidden state of the ith entity in the lth layer, N_i^r is the neighbor set of the ith entity within relation r ∈ R, W_r^{(l)} and W_0^{(l)} are the learnable parameter matrices, and c_{i,r} is a normalization constant, such as c_{i,r} = |N_i^r|. Here, the GCN [61] acts as a graph encoder. To enable specific tasks, a decoder model still needs to be developed and integrated into the R-GCN framework. R-GCN takes the neighborhood of each entity equally. SACN [44] introduces a weighted GCN [see Fig. 5(b)], which defines the strength of two adjacent nodes with the same relation type, to capture the structural information in knowledge graphs by utilizing node structure, node attributes, and relation types. The decoder module, called Conv-TransE, adopts the ConvE model as the semantic matching metric and preserves the translational property. By aligning the convolutional outputs of entity and relation embeddings with C kernels to be M(h, r) ∈ R^{C×d}, its scoring function is defined as

f_r(h, t) = g(vec(M(h, r)) W) t.   (22)
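The propagation rule in (21) can be sketched directly. The following minimal NumPy implementation (dense loops, incoming edges only, ReLU as σ; real R-GCN implementations add basis decomposition, inverse relations, and batching) is an assumption-laden illustration rather than the reference code of [60].

```python
import numpy as np

def rgcn_layer(x, triples, num_rels, W_rel, W_self):
    """One R-GCN propagation step as in (21).

    x       : (num_entities, d) input entity states
    triples : list of (head, rel, tail) index triples
    W_rel   : (num_rels, d, d) relation-specific weights W_r
    W_self  : (d, d) self-connection weight W_0
    """
    num_ent, d = x.shape
    agg = x @ W_self.T                       # self-loop term W_0 x_i
    for r in range(num_rels):
        for i in range(num_ent):
            # neighbors j of entity i under relation r (here: incoming edges)
            neigh = [h for (h, rel, t) in triples if rel == r and t == i]
            if neigh:
                c = len(neigh)               # normalization c_{i,r} = |N_i^r|
                agg[i] += sum(W_rel[r] @ x[j] for j in neigh) / c
    return np.maximum(agg, 0.0)              # nonlinearity sigma = ReLU

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 8))
triples = [(0, 0, 1), (2, 0, 1), (1, 1, 3)]
out = rgcn_layer(x, triples, num_rels=2,
                 W_rel=rng.normal(size=(2, 8, 8)), W_self=rng.normal(size=(8, 8)))
print(out.shape)
```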

Nathani et al. [62] introduced graph attention networks with multihead attention as the encoder to capture multihop neighborhood features by inputting the concatenation of entity and relation embeddings. CompGCN [63] proposes entity–relation composition operations over each edge in the neighborhood of a central node and generalizes previous GCN-based models.

D. Embedding With Auxiliary Information

Multimodal embedding incorporates external information, such as text descriptions, type constraints, relational paths, and visual information, with the knowledge graph itself to facilitate more effective knowledge representation.
1) Textual Description: Entities in knowledge graphs have textual descriptions, denoted as D = ⟨w_1, w_2, . . . , w_n⟩, providing supplementary semantic information. The challenge of KRL with textual description is to embed both structured knowledge and unstructured textual information in the same space. Wang et al. [64] proposed two alignment models for aligning entity space and word space by introducing entity names and Wikipedia anchors. DKRL [65] extends TransE [16] to learn representations directly from entity descriptions by a convolutional encoder. SSP [66] captures the strong correlations between triples and textual descriptions by projecting them into a semantic subspace. A joint loss function is widely applied when incorporating KGE with textual description. Wang et al. [64] used a three-component loss L = L_K + L_T + L_A of the knowledge model L_K, the text model L_T, and the alignment model L_A. SSP [66] uses a two-component objective function L = L_embed + μ L_topic of the embedding-specific loss L_embed and the topic-specific loss L_topic within textual description, traded off by a parameter μ.
2) Type Information: Entities are represented with hierarchical classes or types and, consequently, relations with semantic types. SSE [67] incorporates semantic categories of entities to embed entities belonging to the same category smoothly in semantic space. TKRL [68] proposes a type encoder model for the projection matrix of entities to capture the type hierarchy. Noticing that some relations indicate attributes of entities, KR-EAR [69] categorizes relation types into attributes and relations and models the correlations between entity descriptions. Zhang et al. [70] extended existing embedding methods with a hierarchical relation structure of relation clusters, relations, and subrelations.
3) Visual Information: Visual information (e.g., entity images) can be utilized to enrich KRL. Image-embodied IKRL [71], containing cross-modal structure-based and image-based representations, encodes images to entity space and follows the translation principle. The cross-modal representations make sure that structure- and image-based representations are in the same representation space.
There remain many kinds of auxiliary information for KRL, such as attributes, relation paths, and logical rules. Wang et al. [5] gave a detailed review of using additional information. This article discusses relation paths and logical rules under the umbrella of KGC in Sections IV-A2 and IV-A4, respectively.
4) Uncertain Information: Knowledge graphs, such as ProBase [72], NELL [73], and ConceptNet [74], contain uncertain information with a confidence score assigned to every relational fact. In contrast to classic deterministic KGE, uncertain embedding models aim to capture uncertainty representing the likelihood of relational facts. Chen et al. [75] proposed an uncertain KGE model to simultaneously preserve structural and uncertainty information, where probabilistic soft logic is applied to infer the confidence score. Probability calibration is a postprocessing step that adjusts probability scores so that predictions make probabilistic sense. Tabacof and Costabello [76] first studied probability calibration for KGE under the closed-world assumption, revealing that well-calibrated models can lead to improved accuracy. Safavi et al. [77] further explored probability calibration under the more challenging open-world assumption.

E. Summary

KRL is vital in the research community of knowledge graphs. This section reviews four folds of KRL with several modern methods summarized in Table I and more in Appendix C in the Supplementary Material. Overall, developing a novel KRL model is to answer the following four questions: 1) which representation space to choose; 2) how to measure the plausibility of triplets in a specific space; 3) which encoding model to use for modeling relational interactions; and 4) whether to utilize auxiliary information. The most popularly used representation space is the Euclidean point-based space, embedding entities in vector space and modeling interactions via vectors, matrices, or tensors. Other representation spaces, including complex vector space, Gaussian distribution, and manifold space and group, are also studied. Manifold space has an advantage over pointwise Euclidean space by relaxing the pointwise embedding. Gaussian embeddings can express the uncertainties of entities and relations, and multiple relation semantics. Embedding in complex vector space can effectively model different relational connectivity patterns, especially the symmetry/antisymmetry pattern. The representation space plays an essential role in encoding the semantic information of entities and capturing the relational properties. When developing a representation learning model, an appropriate representation space should be selected and designed carefully to match the nature of the encoding method and balance expressiveness and computational complexity. The scoring function with a distance-based metric utilizes the translation principle, while the semantic matching scoring function employs compositional operators. Encoding models, especially neural networks, play a critical role in modeling interactions of entities and relations. The bilinear models have also drawn much attention, and some tensor factorization methods can also be regarded as part of this family. Other methods incorporate auxiliary information of textual descriptions, relation/entity types, entity images, and confidence scores.

IV. KNOWLEDGE ACQUISITION

Knowledge acquisition aims to construct knowledge graphs from unstructured text and other structured or semistructured sources, complete an existing knowledge graph, and discover and recognize entities and relations. Well-constructed and large-scale knowledge graphs can be useful for many downstream applications and empower knowledge-aware models with commonsense reasoning, thereby paving the way for AI. The main tasks of knowledge acquisition include relation extraction, KGC, and other entity-oriented acquisition tasks, such as entity recognition and entity alignment (EA). Most methods formulate KGC and relation extraction separately. These two tasks, however, can also be integrated into a unified framework. Han et al. [78] proposed a joint learning framework with mutual attention for data fusion between knowledge graphs and text, which solves KGC and relation extraction from text.
There are also other tasks related to knowledge acquisition, such as triple classification [79], relation classification [80], and open knowledge enrichment [81]. In this section, three categories of knowledge acquisition techniques, namely, KGC, entity discovery, and relation extraction, are reviewed thoroughly.

TABLE I. Summary of recent KRL models. See more details in Appendix C in the Supplementary Material.

A. Knowledge Graph Completion

Because of the incomplete nature of knowledge graphs, KGC is developed to add new triples to a knowledge graph. Typical subtasks include link prediction, entity prediction, and relation prediction.
Preliminary research on KGC focused on learning low-dimensional embedding for triple prediction. In this survey, we term those methods embedding-based methods. Most of them, however, failed to capture multistep relationships. Thus, recent work turns to explore multistep relation paths and incorporate logical rules, termed relation path inference and rule-based reasoning, respectively. Triple classification, an associated task of KGC that evaluates the correctness of a factual triple, is additionally reviewed in this section.

Fig. 6. (a) Embedding-based ranking and (b) relation path reasoning [58].

1) Embedding-Based Models: Taking entity prediction as an example, embedding-based ranking methods, as shown in Fig. 6(a), first learn embedding vectors based on existing triples. By replacing the tail entity or head entity with each entity e ∈ E, those methods calculate the scores of all the candidate entities and rank the top k entities. The aforementioned KRL methods (e.g., TransE [16], TransH [20], TransR [17], HolE [21], and R-GCN [60]) and joint learning methods, such as DKRL [65] with textual information, can be used for KGC.
Unlike representing inputs and candidates in a unified embedding space, ProjE [82] proposes a combined embedding by space projection of the known parts of input triples, i.e., (h, r, ?) or (?, r, t), and the candidate entities with the candidate-entity matrix W_c ∈ R^{s×d}, where s is the number of candidate entities. The embedding projection function, including a neural combination layer and an output projection layer, is defined as h(e, r) = g(W_c σ(e ⊕ r) + b_p), where e ⊕ r = D_e e + D_r r + b_c is the combination operator of the input entity–relation pair. Previous embedding methods do not differentiate entity and relation prediction, and ProjE does not support relation prediction. Based on these observations, SENN [83] distinguishes three KGC subtasks explicitly by introducing a unified neural shared embedding with an adaptively weighted general loss function to learn different latent features. Existing methods rely heavily on existing connections in knowledge graphs and fail to capture the evolution of factual knowledge or entities with few connections. ConMask [84] proposes relationship-dependent content masking over the entity description to select relevant snippets of given relations and CNN-based target fusion to complete the knowledge graph with unseen entities. It can only make a prediction when query relations and entities are explicitly expressed in the text description. Previous methods are discriminative models that rely on preprepared entity pairs or text corpus. Focusing on the medical domain, REMEDY [85] proposes a generative model, called conditional relationship variational autoencoder, for entity pair discovery from latent space.
2) Relation Path Reasoning: Embedding learning of entities and relations has gained remarkable performance on some benchmarks, but it fails to model complex relation paths. Relation path reasoning turns to leverage path information over the graph structure. Random walk inference has been widely investigated; for example, the path-ranking algorithm (PRA) [86] chooses a relational path under a combination of path constraints and conducts maximum-likelihood classification. To improve path search, Gardner et al. [57] introduced vector space similarity heuristics in the random walk by incorporating textual content, which also relieves the feature sparsity issue in PRA. Neural multihop relational path modeling is also studied. Neelakantan et al. [58] developed an RNN model to compose the implications of relational paths by applying compositionality recursively [see Fig. 6(b)]. Chain-of-Reasoning [87], a neural attention mechanism to enable multiple reasons, represents logical composition across all relations, entities, and text. Recently, DIVA [88] proposes a unified variational inference framework that takes multihop reasoning as two substeps of path-finding (a prior distribution for underlying path inference) and path-reasoning (a likelihood for link classification).
3) RL-Based Path Finding: Deep RL is introduced for multihop reasoning by formulating path-finding between entity pairs as sequential decision making, specifically a Markov decision process (MDP). The policy-based RL agent learns to find a step of relation to extend the reasoning path via interaction with the knowledge graph environment, where the policy gradient is utilized for training RL agents.
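To make the MDP formulation above concrete, the toy environment below represents a state as (source entity, query relation, current entity) and an action as an outgoing edge of the current entity, with a binary terminal reward. It is a bare skeleton under assumptions stated in the comments, not a re-implementation of DeepPath or MINERVA.

```python
import random

class KGPathEnv:
    """Toy MDP for multihop reasoning over a knowledge graph.

    A state is (source entity e_s, query relation r_q, current entity e_t);
    an action is an outgoing edge (relation, next entity) of e_t; the agent
    receives a reward of 1 when it reaches the correct answer entity.
    """

    def __init__(self, triples):
        self.out_edges = {}
        for h, r, t in triples:
            self.out_edges.setdefault(h, []).append((r, t))

    def reset(self, source, query_relation, answer):
        self.state = (source, query_relation, source)
        self.answer = answer
        return self.state

    def actions(self):
        return self.out_edges.get(self.state[2], [])

    def step(self, action):
        relation, next_entity = action
        self.state = (self.state[0], self.state[1], next_entity)
        reward = 1.0 if next_entity == self.answer else 0.0   # binary reward
        return self.state, reward

env = KGPathEnv([("a", "r1", "b"), ("b", "r2", "c")])
state = env.reset("a", "rq", "c")
state, reward = env.step(random.choice(env.actions()))
print(state, reward)
```

A policy network trained with policy gradients would replace the random action choice in this sketch.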

DeepPath [89] first applies RL to relational path learning and develops a novel reward function to improve accuracy, path diversity, and path efficiency. It encodes states in the continuous space via a translational embedding method and takes the relation space as its action space. Similarly, MINERVA [90] takes path walking to the correct answer entity as a sequential optimization problem by maximizing the expected reward. It excludes the target answer entity and provides more capable inference. Instead of using a binary reward function, Multi-Hop [91] proposes a soft reward mechanism. Action dropout is also adopted to mask some outgoing edges during training to enable more effective path exploration. M-Walk [92] applies an RNN controller to capture the historical trajectory and uses Monte Carlo tree search (MCTS) for effective path generation. By leveraging a text corpus with the sentence bag of the current entity denoted as b_{e_t}, CPL [93] proposes collaborative policy learning for path finding and fact extraction from text.
With the source, query, and current entity denoted as e_s, e_q, and e_t and the query relation denoted as r_q, the MDP environments and policy networks of these methods are summarized in Table II, where MINERVA, M-Walk, and CPL use the binary reward. For the policy networks, DeepPath uses a fully connected network, and the extractor of CPL employs a CNN, while the rest use recurrent networks.

TABLE II. Comparison of RL-based path finding for knowledge graph reasoning.

4) Rule-Based Reasoning: To better make use of the symbolic nature of knowledge, another research direction of KGC is logical rule learning. A rule is defined by the head and body in the form of head ← body. The head is an atom, i.e., a fact with variable subjects and/or objects, while the body can be a set of atoms. For example, given relations sonOf, hasChild, and gender and entities X and Y, there is a rule in the reverse form of logic programming as

(Y, sonOf, X) ← (X, hasChild, Y) ∧ (Y, gender, Male).

Logical rules can be extracted by rule mining tools, such as AMIE [94]. The recent RLvLR [95] proposes a scalable rule mining approach with efficient rule searching and pruning and uses the extracted rules for link prediction.
More research attention focuses on injecting logical rules into embeddings to improve reasoning, with joint learning or iterative training applied to incorporate first-order logic rules. For example, KALE [96] proposes a unified joint model with t-norm fuzzy logical connectives defined for compatible triple and logical rule embedding. Specifically, three compositions of logical conjunction, disjunction, and negation are defined to compose the truth value of a complex formula. Fig. 7(a) illustrates a simple first-order Horn clause inference. RUGE [97] proposes an iterative model, where soft rules are utilized for soft label prediction from unlabeled triples and labeled triples are used for embedding rectification. IterE [98] proposes an iterative training strategy with three components of embedding learning, axiom induction, and axiom injection.

Fig. 7. Illustrations of logical rule learning. (a) KALE [96]. (b) pLogicNet [102].

The logical rule is one kind of auxiliary information; meanwhile, it can incorporate prior knowledge, enabling interpretable multihop reasoning and paving the way for generalization even with few-shot labeled relational triples. However, logic rules alone can only cover a limited number of relational facts in knowledge graphs and suffer from a colossal search space. The combination of neural and symbolic computation has complementary advantages: it utilizes efficient data-driven learning and differentiable optimization and exploits prior logical knowledge for precise and interpretable inference. Incorporating rule-based learning for knowledge representation is principally to add regularizations or constraints to representations. The neural theorem prover (NTP) [99] learns logical rules for multihop reasoning, which utilizes a radial basis function kernel for differentiable computation on the vector space. NeuralLP [100] enables gradient-based optimization to be applicable in inductive logic programming, where a neural controller system is proposed by integrating an attention mechanism and auxiliary memory. Neural-Num-LP [101] extends NeuralLP to learn numerical rules with dynamic programming and cumulative sum operations. pLogicNet [102] proposes probabilistic logic neural networks [see Fig. 7(b)] to leverage first-order logic and learn effective embedding by combining the advantages of Markov logic networks and KRL methods while handling the uncertainty of logic rules. ExpressGNN [103] generalizes pLogicNet by tuning graph networks and embedding and achieves more efficient logical reasoning.
5) Metarelational Learning: The long-tail phenomenon exists in the relations of knowledge graphs. Meanwhile, the real-world scenario of knowledge is dynamic, where unseen triples are usually acquired. The new scenario, called metarelational learning or few-shot relational learning, requires models to predict new relational facts with only a very few samples.

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
504 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 2, FEBRUARY 2022

Targeting the previous two observations, GMatching [104]


develops a metric-based few-shot learning method with
entity embeddings and local graph structures. It encodes
one-hop neighbors to capture the structural information with
R-GCN and then takes the structural entity embedding
for multistep matching guided by long short-term mem-
ory (LSTM) networks to calculate the similarity scores. Meta-
KGR [105], an optimization-based metalearning approach,
adopts model agnostic metalearning for fast adaption and RL
for entity searching and path reasoning. Inspired by model-
Fig. 8. Illustrations of several entity discovery tasks. (a) Entity recognition
and optimization-based metalearnings, MetaR [106] transfers with LSTM-CRF [111]. (b) EA with IPTransE [126].
relation-specific metainformation from support set to query
set and archives fast adaption via loss gradient of high-order
relational representation. Zhang et al. [107] proposed joint training. Recently, Li et al. [114] formulated flat and nested
modules of heterogeneous graph encoder, recurrent autoen- NER as a unified machine reading comprehension framework
coder, and matching network to complete new relational facts by referring annotation guidelines to construct query questions.
with few-shot references. Qin et al. [108] utilized GAN to Pretrained language models with knowledge graphs, such as
generate reasonable embeddings for unseen relations under the ERNIE [115] and K-BERT [116], have been applied into NER
zero-shot learning setting. Baek et al. [109] proposed a trans- and achieved improved performance.
ductive metalearning framework, called graph extrapolation 2) Entity Typing: Entity typing includes coarse and
networks (GENs), for a few-shot out-of-graph link prediction fine-grained types, while the latter uses a tree-structured type
in knowledge graphs. category and is typically regarded as multiclass and multilabel
6) Triple Classification: Triple classification is to determine classification. To reduce label noise, PLE [117] focuses on cor-
whether facts are correct in testing data, which is typically rect type identification and proposes a partial-label embedding
regarded as a binary classification problem. The decision model with a heterogeneous graph for the representation of
rule is based on the scoring function with a specific thresh- entity mentions, text features, and entity types and their rela-
old. Aforementioned embedding methods could be applied tionships. To tackle the increasing growth of typeset and noisy
for triple classification, including translational distance-based labels, Ma et al. [118] proposed prototype-driven label embed-
methods, such as TransH [20] and TransR [17], and semantic ding with hierarchical information for zero-shot fine-grained
matching-based methods, such as NTN [18], HolE [21], and named entity typing. Recent studies utilize embedding-based
ANALOGY [22]. approaches. For example, JOIE [119] learns joint embeddings
Vanilla vector-based embedding methods failed to deal of instance- and ontology-view graphs and formulates entity
with 1-to-n relations. Recently, Dong et al. [79] extended typing as top-k ranking to predict associated concepts. Con-
the embedding space into region-based n-dimensional balls nectE [120] explores local typing and global triple knowledge
where the tail region is in the head region for 1-to-n relation to enhance joint embedding learning.
using fine-grained type chains, i.e., tree-structure conceptual 3) Entity Disambiguation: Entity disambiguation or entity
clusterings. This relaxation of embedding to n-balls turns linking is a unified task, which links entity mentions to the
triple classification into a geometric containment problem and corresponding entities in a knowledge graph. For example,
improves the performance for entities with long-type chains. Einstein won the Noble Prize in Physics in 1921. The entity
However, it relies on the type chains of entities and suffers mention of “Einstein” should be linked to the entity of
from the scalability problem. Albert Einstein. The contemporary end-to-end learning
approaches have made efforts through representation learning
B. Entity Discovery of entities and mentions, for example, DSRM [121] for mod-
This section distinguishes entity-based knowledge acqui- eling entity semantic relatedness and EDKate [122] for the
sition into several fractionized tasks, i.e., entity recognition, joint embedding of entity and text. Ganea and Hofmann [123]
entity disambiguation, entity typing, and EA. We term them proposed an attentive neural model over local context windows
as entity discovery as they all explore entity-related knowledge for entity embedding learning and differentiable message
under different settings. passing for inferring ambiguous entities. By regarding rela-
1) Entity Recognition: Entity recognition or named entity tions between entities as latent variables, Le and Titov [124]
recognition (NER), when it focuses on specifically named developed an end-to-end neural architecture with relationwise
entities, is a task that tags entities in text. Handcrafted features, and mentionwise normalization.
such as capitalization patterns and language-specific resources, 4) Entity Alignment: The tasks, as mentioned earlier,
such as gazetteers, are applied in many pieces of literature. involve entity discovery from text or a single knowledge graph,
Recent work applies sequence-to-sequence neural architec- while EA aims to fuse knowledge among various knowledge
tures, for example, LSTM-CNN [110] for learning character- graphs. Given E1 and E2 as two different entity sets of two
and word-level features and encoding partial lexicon matches. different knowledge graphs, EA is to find an alignment set
Lample et al. [111] proposed stacked neural architectures A = {(e1 , e2 ) ∈ E1 × E2 |e1 ≡ e2 }, where entity e1 and entity
by stacking LSTM layers and CRF layers, i.e., LSTM-CRF e2 hold an equivalence relation ≡. In practice, a small set of
[in Fig. 8(a)] and Stack-LSTM. MGNER [112] proposes an alignment seeds (i.e., synonymous entities appear in different
integrated framework with entity position detection in various knowledge graphs) is given to start the alignment process,
granularities and attention-based entity classification for both as shown in the left box of Fig. 8(b).
nested and nonoverlapping named entities. Hu et al. [113] dis- Embedding-based alignment calculates the similarity
tinguished multitoken and single-token entities with multitask between the embeddings of a pair of entities. MTransE [125]

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
JI et al.: SURVEY ON KNOWLEDGE GRAPHS: REPRESENTATION, ACQUISITION, AND APPLICATIONS 505

vanilla CNN [136], PCNN can more efficiently capture the


structural information within the entity pair. MIMLCNN [139]
further extends it to multilabel learning with cross-sentence
max pooling for feature selection. Side information, such as
class ties [140] and relation path [141], is also utilized. RNNs
are also introduced; for example, SDP-LSTM [142] adopts
multichannel LSTM while utilizing the shortest dependence
path between entity pair, and Miwa and Bansal [143] stack
sequential and tree-structure LSTMs based on dependence
tree. BRCNN [144] combines RNN for capturing sequential
Fig. 9. Overview of NRE. dependence with CNN for representing local semantics using
two-channel bidirectional LSTM and CNN.
first studies EA in the multilingual scenario. It considers 2) Attention Mechanism: Many variants of attention mecha-
distance-based axis calibration, translation vectors, and linear nisms are combined with CNNs, including word-level attention
transformations for cross-lingual entity matching and triple to capture semantic information of words [145] and selective
alignment verification. Following the translation-based and attention over multiple instances to alleviate the impact of
linear transformation models, IPTransE [126] proposes an noisy instances [146]. Other side information is also intro-
iterative alignment model by mapping entities into a unified duced for enriching semantic representation. APCNN [147]
representation space under a joint embedding framework [see introduces entity description by PCNN and sentence-level
Fig. 8(b)] through aligned translation as e1 + r(E1 →E2 ) − e2 , attention, while HATT [148] proposes hierarchical selec-
linear transformation as M(E1 →E2 ) e1 − e2 , and parameter tive attention to capture the relation hierarchy by concate-
sharing as e1 ≡ e2 . To solve error accumulation in iterative nating attentive representation of each hierarchical layer.
alignment, BootEA [127] proposes a bootstrapping approach Rather than CNN-based sentence encoders, Att-BLSTM [80]
in an incremental training manner, together with an editing proposes word-level attention with BiLSTM. Recently,
technique for checking newly labeled alignment. Soares et al. [149] utilized pretrained relation representations
Additional information of entities is also incorporated for from the deep transformer model.
refinement, for example, JAPE [128] capturing the correla- 3) Graph Convolutional Networks: GCNs are utilized for
tion between cross-lingual attributes, KDCoE [129] embed- encoding a dependence tree over sentences or learning KGEs
ding multilingual entity descriptions via cotraining, and Mul- to leverage relational knowledge for sentence encoding. C-
tiKE [130] learning multiple views of the entity name, rela- GCN [150] is a contextualized GCN model over the pruned
tion, and attributes, and alignment with character attribute dependence tree of sentences after path-centric pruning.
embedding [131]. EA has been intensively studied in recent AGGCN [151] also applies GCN over the dependence tree
years. We recommend Sun et al.’s quantitative survey [132] but utilizes multihead attention for edge selection in a soft
for detailed reading. weighting manner. Unlike the previous two GCN-based mod-
els, Zhang et al., [152] applied GCN for relation embedding in
knowledge graph for sentence-based relation extraction. The
C. Relation Extraction authors further proposed a coarse-to-fine knowledge-aware
Relation extraction is a key task to build large-scale knowl- attention mechanism for the selection of informative instances.
edge graphs automatically by extracting unknown relational 4) Adversarial Training: AT is applied to add adversar-
facts from plain text and adding them into knowledge graphs. ial noise to word embeddings for CNN- and RNN-based
Due to the lack of labeled relational data, distant super- relation extractions under the MIML learning setting [153].
vision [133], also referred to as weak supervision or self- DSGAN [154] denoises distantly supervised relation extraction
supervision, uses heuristic matching to create training data by by learning a generator of sentence-level true positive samples
assuming that sentences containing the same entity mentions and a discriminator that minimizes the probability of being true
may express the same relation under the supervision of a positive of the generator.
relational database. Mintz et al. [134] adopted the distant 5) Reinforcement Learning: RL has been integrated into
supervision for relation classification with textual features, NRE recently by training instance selectors with policy net-
including lexical and syntactic features, named entity tags, works. Qin et al. [155] proposed to train policy-based RL
and conjunctive features. Traditional methods rely highly on agent of sentential relation classifier to redistribute false pos-
feature engineering [134], with a recent approach exploring the itive instances into negative samples to mitigate the effect of
inner correlation between features [135]. Deep neural networks noisy data. The authors took the F1 score as an evaluation
are changing the representation learning of knowledge graphs metric and used F1 score-based performance change as the
and texts. This section reviews recent advances in neural rela- reward for policy networks. Similarly, Zeng et al. [156] and
tion extraction (NRE), with an overview illustrated in Fig. 9. Feng et al. [157] proposed different reward strategies. The
1) Neural Relation Extraction: Trendy neural networks are advantage of RL-based NRE is that the relation extractor
widely applied to NRE. CNNs with position features of is model-agnostic. Thus, it could be easily adapted to any
relative distances to entities [136] are first explored for relation neural architecture for effective relation extraction. Recently,
classification and then extended to relation extraction by mul- HRL [158] proposed a hierarchical policy learning framework
tiwindow CNN [137] with multiple sized convolutional filters. of high-level relation detection and low-level entity extraction.
Multi-instance learning takes a bag of sentences as input to 6) Other Advances: Other advances of deep learning are
predict the relationship of the entity pair. PCNN [138] applies also applied for NRE. Noticing that current NRE methods
the piecewise max-pooling over the segments of convolutional do not use very deep networks, Huang and Wang [159]
representation divided by entity position. Compared with applied deep residual learning to noisy relation extraction

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
506 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 2, FEBRUARY 2022

and found that nine-layer CNNs have improved performance. embedding, facilitate efficient rule injection, and induce inter-
Liu et al. [160] proposed to initialize the neural model by pretable rules. With the observation of the graphical nature of
transfer learning from entity classification. The cooperative knowledge graphs, path search and neural path representation
CORD [161] ensembles text corpus and knowledge graph with learning are studied. However, they suffer from connectivity
external logical rules by bidirectional knowledge distillation deficiency when traverses over large-scale graphs. The emerg-
and adaptive imitation. TK-MF [162] enriches sentence rep- ing direction of metarelational learning aims to learn fast
resentation learning by matching sentences and topic words. adaptation over unseen relations in low-resource settings.
Recently, Shahbazi et al. [163] studied trustworthy relation Entity discovery acquires entity-oriented knowledge from
extraction by benchmarking several explanation mechanisms, text and fuses knowledge between knowledge graphs. There
including saliency, gradient × input, and leave one out. are several categories according to specific settings. Entity
The existence of low-frequency relations in knowledge recognition is explored in a sequence-to-sequence manner,
graphs requires few-shot relation classification with unseen entity typing discusses noisy type labels and zero-shot typing,
classes or only a few instances. Gao et al. [164] pro- and entity disambiguation and alignment learn unified embed-
posed hybrid attention-based prototypical networks to compute dings with iterative alignment model proposed to tackle the
prototypical relation embedding and compare its distance issue of a limited number of alignment seeds. However, it may
between the query embedding. Qin et al. [165] explored the face error accumulation problems if newly aligned entities
relationships between relations with a global relation graph suffer from poor performance. Language-specific knowledge
and formulated few-shot relation extraction as a Bayesian has increased in recent years and, consequentially, motivates
metalearning problem to learn the posterior distribution of the research on cross-lingual knowledge alignment.
relations’ prototype vectors. Relation extraction suffers from noisy patterns under the
7) Joint Entity and Relation Extraction: Traditional assumption of distant supervision, especially in text corpus of
relation extraction models utilize pipeline approaches by first different domains. Thus, weakly supervised relation extraction
extracting entity mentions and then classifying relations. How- must mitigate the impact of noisy labeling. For example,
ever, pipeline methods may cause error accumulation. Several multi-instance learning takes bags of sentences as inputs and
studies show better performance by joint learning [143], [166] attention mechanism [146] reduce noisy patterns by soft selec-
than by conventional pipeline methods. Katiyar and tion over instances, and RL-based methods formulate instance
Cardie [167] proposed a joint extraction framework with an selection as a hard decision. Another principle is to learn richer
attention-based LSTM network. Some convert joint extraction representation as possible. As deep neural networks can solve
into different problems, such as sequence labeling via a novel error propagation in traditional feature extraction methods,
tagging scheme [168] and multiturn question answering [169]. this field is dominated by DNN-based models, as summarized
Challenges remain in dealing with entity pair and relation in Table III.
overlapping [170]. Wei et al. [171] proposed a cascade binary
tagging framework that models relations as subject–object V. T EMPORAL K NOWLEDGE G RAPH
mapping functions to solve the overlapping problem. Current knowledge graph research mostly focuses on sta-
There is a distribution discrepancy between training and tic knowledge graphs where facts are not changed with
inference in the joint learning framework, leading to exposure time, while the temporal dynamics of a knowledge graph
bias. Recently, Wang et al. [172] proposed a one-stage joint are less explored. However, the temporal information is of
extraction framework by transforming joint entity and relation great importance because the structured knowledge only holds
extraction into a token pair linking task to mitigate error prop- within a specific period, and the evolution of facts follows
agation and exposure bias. In contrast to the common view that a time sequence. Recent research begins to take temporal
joint models can ease error accumulation by capturing mutual information into KRL and KGC, which is termed temporal
interaction of entities and relations, Zhong and Chen [173] knowledge graph in contrast to the previous static knowl-
proposed a simple pipeline-based yet effective approach to edge graph. Research efforts have been made for learning
learning two independent encoders for entities and relations, temporal and relational embeddings simultaneously. Relevant
revealing that strong contextual representation can preserve models for dynamic network embedding also inspire temporal
distinct features of entities and relations. Future research needs KGE. For example, the temporal graph attention (TGAT)
to rethink the relation between the pipeline and joint learning network [174] that captures temporal–topological structure and
methods. learn time-feature interactions simultaneously may be useful
to preserve temporal-aware relation for knowledge graphs.
D. Summary
This section reviews knowledge completion for incomplete A. Temporal Information Embedding
knowledge graphs and acquisition from plain text. Temporal information is considered in temporal-aware
KGC completes missing links between existing entities or embedding by extending triples into temporal quadruple as
infers entities given entity and relation queries. Embedding- (h, r, t, τ ), where τ provides additional temporal information
based KGC methods generally rely on triple representation about when the fact held. Leblay and Chekol [175] investi-
learning to capture semantics and do candidate ranking for gated temporal scope prediction over time-annotated triple and
completion. Embedding-based reasoning remains at the indi- simply extended existing embedding methods, for example,
vidual relation level and is poor at complex reasoning because TransE with the vector-based TTransE defined as
it ignores the symbolical nature of the knowledge graph f τ (h, r, t) = − h + r + τ − t L 1/2 . (23)
and lack of interpretability. Hybrid methods with symbolics
and embedding incorporate rule-based reasoning, overcome Ma et al. [176] also generalized existing static embedding
the sparsity of knowledge graph to improve the quality of methods and proposed ConT by replacing the shared weight

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
JI et al.: SURVEY ON KNOWLEDGE GRAPHS: REPRESENTATION, ACQUISITION, AND APPLICATIONS 507

TABLE III
S UMMARY OF NRE AND R ECENT A DVANCES

vector of Tucker with a timestamp embedding. Temporally A multivariate temporal point process is used to model the
scoped quadruple extends triples by adding a time scope occurrence of facts, and a novel recurrent network is developed
[τs , τe ], where τs and τe stand for the beginning and ending to learn the representation of nonlinear temporal evolution.
of the valid period of a triple, and then, a static subgraph G τ To capture the interaction between nodes, RE-NET [184]
can be derived from the dynamic knowledge graph when given models event sequences via an RNN-based event encoder
a specific timestamp τ . HyTE [177] takes a time stamp as a and neighborhood aggregator. Specifically, RNN is used to
hyperplane wτ and projects entity and relation representation capture the temporal entity interaction, and the neighborhood
as Pτ (h) = h − (wτ h)wτ , Pτ (t) = t − (wτ t)wτ , and Pτ (r) = aggregator aggregates the concurrent interactions.
r − (wτ r)wτ . The temporally projected scoring function is
calculated as
C. Temporal Relational Dependence
f τ (h, r, t) = Pτ (h) + Pτ (r) − Pτ (t) L 1 /L 2 (24) There exists temporal dependencies in relational chains
within the projected translation of Pτ (h) + Pτ (r) ≈ Pτ (t). following the timeline, for example, wasBornIn →
García-Durán et al. [178] concatenated predicate token graduateFrom → workAt → diedIn. Jiang et al. [185],
sequence and temporal token sequence and used LSTM to [186] proposed time-aware embedding, a joint learning frame-
encode the concatenated time-aware predicate sequences. The work with temporal regularization, to incorporate temporal
last hidden state of LSTM is taken as temporal-aware relational order and consistency information. The authors defined a
embedding rtemp . The scoring functions of extended TransE temporal scoring function as
and DistMult are calculated as h+rtemp −t 2 and (h◦ t)rtemp T
, f (rk , rl ) = rk T − rl L 1/2 (25)
respectively. By defining the context of an entity e as an aggre-
gate set of facts containing e, Liu et al. [179] proposed context where T ∈ Rd×d is an asymmetric matrix that encodes the
selection to capture useful contexts and measured temporal temporal order of relation, for a temporal ordering relation
consistency with selected context. By formulating temporal pair rk , rl . Three temporal consistency constraints of dis-
KGC as four-order tensor completion, Lacroix et al. [180] jointness, ordering, and spans are further applied by integer
proposed TComplEx that extends ComplEx decomposition and linear programming formulation.
introduced weighted regularizers.
D. Temporal Logical Reasoning
B. Entity Dynamics
Logical rules are also studied for temporal reasoning.
Real-world events change entities’ states and, consequently, Chekol et al. [187] explored Markov logic network and
affect the corresponding relations. To improve temporal scope probabilistic soft logic for reasoning over uncertain temporal
inference, the contextual temporal profile model [181] formu- knowledge graphs. RLvLR-Stream [95] considers temporal
lates the temporal scoping problem as state change detection close-path rules and learns the structure of rules from the
and utilizes the context to learn state and state change vectors. knowledge graph stream for reasoning.
Inspired by the diachronic word embedding, Goel et al. [182]
took an entity and timestamp as the input of entity embed-
ding function to preserve the temporal-aware characteristics VI. K NOWLEDGE -AWARE A PPLICATIONS
of entities at any time point. Know-evolve [183], a deep Rich structured knowledge can be useful for AI applica-
evolutionary knowledge network, investigates the knowledge tions. However, how to integrate such symbolic knowledge
evolution phenomenon of entities and their evolved relations. into the computational framework of real-world applications

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
508 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 2, FEBRUARY 2022

remains a challenge. The application of knowledge graphs 2) Multihop Reasoning: To deal with complex multihop
includes twofold: 1) in-KG applications, such as link pre- relations, it requires a more dedicated design to be capa-
diction and NER and 2) out-of-KG applications, including ble of multihop commonsense reasoning. Structured knowl-
relation extraction and more downstream knowledge-aware edge provides informative commonsense observations and acts
applications, such as question answering and recommendation as relational inductive biases, which boosts recent studies
systems. This section introduces several recent DNN-based on commonsense knowledge fusion between symbolic and
knowledge-driven approaches with applications on natural lan- semantic space for multihop reasoning. Bauer et al. [199] pro-
guage processing and recommendation. More miscellaneous posed multihop bidirectional attention and pointer-generator
applications, such as digital health and search engine, are decoder for effective multihop reasoning and coherent answer
introduced in Appendix E in the Supplementary Material. generation, utilizing external commonsense knowledge by
relational path selection from ConceptNet and injection with
A. Language Representation Learning selectively gated attention. The variational reasoning net-
work (VRN) [200] conducts multihop logic reasoning with
Language representation learning via self-supervised lan- reasoning-graph embedding, while handling the uncertainty
guage model pretraining has become an integral component of in topic entity recognition. KagNet [201] performs concept
many NLP systems. Traditional language modeling does not recognition to build a schema graph from ConceptNet and
exploit factual knowledge with entities frequently observed in learns path-based relational representation via GCN, LSTM,
the text corpus. How to integrate knowledge into language and hierarchical path-based attention. CogQA [202] combines
representation has drawn increasing attention. The knowledge implicit extraction and explicit reasoning and proposes a cog-
graph language model (KGLM) [188] learns to render knowl- nitive graph model based on BERT and GNN for multihop QA.
edge by selecting and copying entities. ERNIE-Tsinghua [189]
fuses informative entities via aggregated pretraining and ran- C. Recommender Systems
dom masking. K-BERT [116] infuses domain knowledge Integrating knowledge graphs as external information
into BERT contextual encoder. ERNIE-Baidu [190] intro- enables recommendation systems to have the ability of com-
duces named entity masking and phrase masking to integrate monsense reasoning, with the potential to solve the sparsity
knowledge into the language model and is further improved issue and the cold start problem. By injecting knowledge-
by ERNIE 2.0 [115] via continual multitask learning. To graph-based side information, such as entities, relations, and
capture factual knowledge from text, KEPLER [191] combines attributes, many efforts work on embedding-based regulariza-
knowledge embedding and masked language modeling losses tion to improve recommendation. The collaborative CKE [203]
via joint optimization. GLM [192] proposes a graph-guided jointly trains KGEs, item’s textual information, and visual con-
entity masking scheme to utilize knowledge graph implicitly. tent via translational KGE model and stacked autoencoders.
CoLAKE [193] further exploits the knowledge context of an Noticing that time- and topic-sensitive news articles consist
entity through a unified word-knowledge graph and a modified of condensed entities and common knowledge, DKN [204]
transformer encoder. Similar to the K-BERT model and focus- incorporates knowledge graph by a knowledge-aware CNN
ing on the medical corpus, BERT-MK [194] integrates medical model with multichannel word-entity-aligned textual inputs.
knowledge into the pretraining language model via knowledge However, DKN cannot be trained in an end-to-end manner
subgraph extraction. Rethinking about large-scale training on as it needs to learn entity embedding in advance. To enable
language model and querying over knowledge graphs, Petroni end-to-end training, MKR [205] associates multitask knowl-
et al. [195] analyzed the language model and knowledge base. edge graph representation and recommendation by sharing
They found that certain factual knowledge can be acquired via latent features and modeling high-order item-entity interaction.
the pretraining language model. While other works consider the relational path and struc-
ture of knowledge graphs, KPRN [206] regards the inter-
B. Question Answering action between users and items as an entity–relation path
Knowledge-graph-based question answering (KG-QA) in the knowledge graph and conducts preference inference
answers natural language questions with facts from knowledge over the path with LSTM to capture the sequential depen-
graphs. Neural network-based approaches represent questions dence. PGPR [207] performs reinforcement policy-guided path
and answers in distributed semantic space, and some also reasoning over knowledge-graph-based user–item interaction.
conduct symbolic knowledge injection for commonsense KGAT [208] applies graph attention network over the collabo-
reasoning. rative knowledge graph of entity–relation and user–item graphs
1) Single-Fact QA: Taking a knowledge graph as an exter- to encode high-order connectivities via embedding propaga-
nal intellectual source, simple factoid QA or single-fact QA tion and attention-based aggregation. Knowledge graph-based
is to answer a simple question involving a single knowledge recommendation inherently processes interpretability from
graph fact. Dai et al. [196] proposed a conditional focused embedding propagation with multihop neighbors in the knowl-
neural network equipped with focused pruning to reduce the edge graph.
search space. BAMnet [197] models the two-way interaction VII. F UTURE D IRECTIONS
between questions and knowledge graph with a bidirectional Many efforts have been conducted to tackle the challenges
attention mechanism. Although deep learning techniques are of knowledge representation and its related applications. How-
intensively applied in KG-QA, they inevitably increase the ever, there remain several formidable open problems and
model complexity. Through the evaluation of simple KG-QA promising future directions.
with and without neural networks, Mohammed et al. [198]
found that sophisticated deep models, such as LSTM and GRU A. Complex Reasoning
with heuristics, achieve state of the art, and nonneural models Numerical computing for knowledge representation and
also gain reasonably well performance. reasoning requires a continuous vector space to capture the

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
JI et al.: SURVEY ON KNOWLEDGE GRAPHS: REPRESENTATION, ACQUISITION, AND APPLICATIONS 509

semantic of entities and relations. While embedding-based Probabilistic logic inference using Markov logic networks
methods have a limitation on complex logical reasoning, two is computationally intensive, making it hard to scalable to
directions on the relational path and symbolic logic are worthy large-scale knowledge graphs. Rules in a recent neural logical
of being further explored. Some promising methods, such model [102] are generated by simple brute-force search, mak-
as recurrent relational path encoding, GNN-based message ing it insufficient on large-scale knowledge graphs. Express-
passing over knowledge graph, and RL-based pathfinding and GNN [103] attempts to use NeuralLP [100] for efficient rule
reasoning, are up-and-coming for handling complex reasoning. induction. Nevertheless, there still has a long way to go to
For the combination of logic rules and embeddings, recent deal with cumbersome deep architectures and the increasingly
works [102], [103] combine Markov logic networks with KGE, growing knowledge graphs.
aiming to leverage logic rules and handling their uncertainty.
Enabling probabilistic inference for capturing the uncertainty
and domain knowledge with efficiently embedding will be a E. Knowledge Aggregation
noteworthy research direction. The aggregation of global knowledge is the core of
knowledge-aware applications. For example, recommendation
systems use a knowledge graph to model user–item interaction
B. Unified Framework and text classification jointly to encode text and knowledge
Several representation learning models on knowledge graphs graph into a semantic space. Most current knowledge aggre-
have been verified as equivalence; for example, Hayshi and gation methods design neural architectures, such as attention
Shimbo [41] proved that HolE and ComplEx are mathemati- mechanisms and GNNs. The natural language processing com-
cally equivalent for link prediction with a particular constraint. munity has been boosted from large-scale pretraining via trans-
ANALOGY [22] provides a unified view of several repre- formers and variants, such as BERT models. At the same time,
sentative models, including DistMult, ComplEx, and HolE. a recent finding [195] reveals that the pretraining language
Wang et al. [47] explored connections among several bilinear model on the unstructured text can acquire certain factual
models. Sharma et al. [209] explored the geometric under- knowledge. Large-scale pretraining can be a straightforward
standing of additive and multiplicative KRL models. Most way to injecting knowledge. However, rethinking the way of
works formulated knowledge acquisition KGC and relation knowledge aggregation in an efficient and interpretable manner
extraction separately with different models. Han et al. [78] is also of significance.
put them under the same roof and proposed a joint learn-
ing framework with mutual attention for information sharing
between knowledge graph and text. A unified understanding F. Automatic Construction and Dynamics
of knowledge representation and reasoning is less explored. Current knowledge graphs rely highly on manual construc-
An investigation toward unification in a way similar to the tion, which is labor-intensive and expensive. The widespread
unified framework of graph networks [210], however, will be applications of knowledge graphs on different cognitive intel-
worthy of bridging the research gap. ligence fields require automatic knowledge graph construction
from large-scale unstructured content. Recent research mainly
works on semiautomatic construction under the supervision of
C. Interpretability existing knowledge graphs. Facing multimodality, heterogene-
Interpretability of knowledge representation and injection is ity, and large-scale application, automatic construction is still
a vital issue for knowledge acquisition and real-world applica- of great challenge.
tions. Preliminary efforts have been made for interpretability. The mainstream research focuses on static knowledge
ITransF [36] uses sparse vectors for knowledge transfer- graphs, with several works on predicting temporal scope valid-
ring and interprets with attention visualization. CrossE [42] ity and learning temporal information and entity dynamics.
explores the explanation scheme of knowledge graphs by Many facts only hold within a specific period. A dynamic
using embedding-based path searching to generate explana- knowledge graph, together with learning algorithms capturing
tions for link prediction. However, recent neural models have dynamics, can address the limitation of traditional knowledge
limitations on transparency and interpretability although they representation and reasoning by considering the temporal
have gained impressive performance. Some methods combine nature.
black-box neural models and symbolic reasoning by incorpo-
rating logical rules to increase interoperability. Interpretability
can convince people to trust predictions. Thus, further work VIII. C ONCLUSION
should go into interpretability and improve the reliability of Knowledge graphs as the ensemble of human knowledge
predicted knowledge. have attracted increasing research attention, with the recent
emergence of KRL, knowledge acquisition methods, and a
wide variety of knowledge-aware applications. This article
D. Scalability conducts a comprehensive survey on the following four scopes:
Scalability is crucial in large-scale knowledge graphs. There 1) KGE, with a full-scale systematic review from embedding
is a tradeoff between computational efficiency and model space, scoring metrics, encoding models, embedding with
expressiveness, with a limited number of works applied to external information, and training strategies; 2) knowledge
more than one million entities. Several embedding methods acquisition of entity discovery, relation extraction, and graph
use simplification to reduce the computation cost, such as completion from three perspectives of embedding learning,
simplifying tensor products with circular correlation opera- relational path inference, and logical rule reasoning; 3) tempo-
tion [21]. However, these methods still struggle with scaling ral knowledge graph representation learning and completion;
to millions of entities and relations. and 4) real-world knowledge-aware applications on NLU,

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
510 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 33, NO. 2, FEBRUARY 2022

recommendation systems, question answering, and other mis- [26] S. He, K. Liu, G. Ji, and J. Zhao, “Learning to represent knowledge
cellaneous applications. Besides, some useful resources of data graphs with Gaussian embedding,” in Proc. 24th ACM Int. Conf. Inf.
sets and open-source libraries, and future research directions Knowl. Manage., Oct. 2015, pp. 623–632.
[27] H. Xiao, M. Huang, and X. Zhu, “TransG: A generative model
are introduced and discussed. Knowledge graph hosts a large for knowledge graph embedding,” in Proc. ACL, vol. 1, 2016,
research community and has a wide range of methodologies pp. 2316–2325.
and applications. We conduct this survey to have a summary [28] H. Xiao, M. Huang, Y. Hao, and X. Zhu, “From one point to a manifold:
of current representative research efforts and trends and expect Orbit models for knowledge graph embedding,” in Proc. IJCAI, 2016,
pp. 1315–1321.
that it can facilitate future research.
[29] I. Balazevic, C. Allen, and T. Hospedales, “Multi-relational poincaré
graph embeddings,” in NeurIPS, 2019, pp. 4463–4473.
[30] I. Chami, A. Wolf, D.-C. Juan, F. Sala, S. Ravi, and C. Ré, “Low-
R EFERENCES dimensional hyperbolic knowledge graph embeddings,” in Proc. 58th
[1] A. Newell, J. C. Shaw, and H. A. Simon, “Report on a general problem Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 6901–6914.
solving program,” in Proc. IFIP Congr., vol. 256, 1959, p. 64. [31] C. Xu and R. Li, “Relation embedding with dihedral group in knowl-
[2] E. Shortliffe, Computer-Based Medical Consultations: MYCIN, vol. 2. edge graph,” in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics,
Amsterdam, The Netherlands: Elsevier, 2012. 2019, pp. 263–272.
[3] X. Dong et al., “Knowledge vault: A Web-scale approach to proba- [32] B. Yang, W.-T. Yih, X. He, J. Gao, and L. Deng, “Embedding entities
bilistic knowledge fusion,” in Proc. SIGKDD, 2014, pp. 601–610. and relations for learning and inference in knowledge bases,” in Proc.
[4] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review ICLR, 2015, pp. 1–13.
of relational machine learning for knowledge graphs,” Proc. IEEE, [33] G. Ji, S. He, L. Xu, K. Liu, and J. Zhao, “Knowledge graph embedding
vol. 104, no. 1, pp. 11–33, Jan. 2016. via dynamic mapping matrix,” in Proc. 53rd Annu. Meeting Assoc.
[5] Q. Wang, Z. Mao, B. Wang, and L. Guo, “Knowledge graph embed- Comput. Linguistics 7th Int. Joint Conf. Natural Lang. Process., vol. 1,
ding: A survey of approaches and applications,” IEEE Trans. Knowl. 2015, pp. 687–696.
Data Eng., vol. 29, no. 12, pp. 2724–2743, Dec. 2017. [34] H. Xiao, M. Huang, Y. Hao, and X. Zhu, “TransA: An adaptive
[6] A. Hogan et al., “Knowledge graphs,” 2020, arXiv:2003.02320. approach for knowledge graph embedding,” in Proc. AAAI, 2015,
[Online]. Available: http://arxiv.org/abs/2003.02320 pp. 1–7.
[7] F. N. Stokman and P. H. de Vries, “Structuring knowledge in a graph,” [35] J. Feng, M. Huang, M. Wang, M. Zhou, Y. Hao, and X. Zhu,
in Human-Computer Interaction. Berlin, Germany: Springer, 1988, “Knowledge graph embedding by flexible translation,” in Proc. KR,
pp. 186–206. 2016, pp. 557–560.
[8] A. Bordes, J. Weston, R. Collobert, and Y. Bengio, “Learning structured [36] Q. Xie, X. Ma, Z. Dai, and E. Hovy, “An interpretable knowledge
embeddings of knowledge bases,” in Proc. AAAI, 2011, pp. 301–306. transfer model for knowledge base completion,” in Proc. 55th Annu.
[9] Y. Lin, X. Han, R. Xie, Z. Liu, and M. Sun, “Knowledge representation Meeting Assoc. Comput. Linguistics, vol. 1, 2017, pp. 950–962.
learning: A quantitative review,” 2018, arXiv:1812.10901. [Online]. [37] W. Qian, C. Fu, Y. Zhu, D. Cai, and X. He, “Translating embeddings
Available: http://arxiv.org/abs/1812.10901 for knowledge graph completion with relation attention mechanism,”
[10] R. H. Richens, “Preprogramming for mechanical translation,” Mech. in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 4286–4292.
Transl., vol. 3, no. 1, pp. 20–25, 1956. [38] S. Yang, J. Tian, H. Zhang, J. Yan, H. He, and Y. Jin, “TransMS:
[11] H. Paulheim, “Knowledge graph refinement: A survey of approaches Knowledge graph embedding for complex relations by multidirectional
and evaluation methods,” Semantic Web, vol. 8, no. 3, pp. 489–508, semantics,” in Proc. 28th Int. Joint Conf. Artif. Intell., Aug. 2019,
Dec. 2016. pp. 1935–1942.
[12] L. Ehrlinger and W. Wöß, “Towards a definition of knowledge graphs,” [39] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “A semantic matching
SEMANTiCS (Posters, Demos, SuCCESS), vol. 48, pp. 1–4, 2016. energy function for learning with multi-relational data,” Mach. Learn.,
[13] T. Wu, G. Qi, C. Li, and M. Wang, “A survey of techniques for vol. 94, no. 2, pp. 233–259, Feb. 2014.
constructing Chinese knowledge graphs and their applications,” Sus- [40] Y. Xue, Y. Yuan, Z. Xu, and A. Sabharwal, “Expanding holographic
tainability, vol. 10, no. 9, p. 3245, Sep. 2018. embeddings for knowledge completion,” in Proc. NeurIPS, 2018,
[14] X. Chen, S. Jia, and Y. Xiang, “A review: Knowledge reasoning pp. 4491–4501.
over knowledge graph,” Expert Syst. Appl., vol. 141, Mar. 2020, [41] K. Hayashi and M. Shimbo, “On the equivalence of holographic and
Art. no. 112948. complex embeddings for link prediction,” in Proc. 55th Annu. Meeting
[15] T. Ebisu and R. Ichise, “TorusE: Knowledge graph embedding on a lie Assoc. Comput. Linguistics, vol. 1, 2017, pp. 554–559.
group,” in Proc. AAAI, 2018, pp. 1819–1826. [42] W. Zhang, B. Paudel, W. Zhang, A. Bernstein, and H. Chen, “Interac-
[16] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, tion embeddings for prediction and explanation in knowledge graphs,”
“Translating embeddings for modeling multi-relational data,” in Proc. in Proc. 12th ACM Int. Conf. Web Search Data Mining, Jan. 2019,
NIPS, 2013, pp. 2787–2795. pp. 96–104.
[17] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu, “Learning entity and [43] D. Q. Nguyen, T. D. Nguyen, D. Q. Nguyen, and D. Phung, “A
relation embeddings for knowledge graph completion,” in Proc. AAAI, novel embedding model for knowledge base completion based on
2015, pp. 2181–2187. convolutional neural network,” in Proc. Conf. North Amer. Chapter
[18] R. Socher, D. Chen, C. D. Manning, and A. Ng, “Reasoning with neural Assoc. Comput. Linguistics: Human Lang. Technol., vol. 2, 2018,
tensor networks for knowledge base completion,” in Proc. NIPS, 2013, pp. 327–333.
pp. 926–934. [44] C. Shang, Y. Tang, J. Huang, J. Bi, X. He, and B. Zhou, “End-
[19] Z. Zhang, J. Cai, Y. Zhang, and J. Wang, “Learning hierarchy-aware to-end structure-aware convolutional networks for knowledge base
knowledge graph embeddings for link prediction,” in Proc. AAAI, 2020, completion,” in Proc. AAAI, vol. 33, 2019, pp. 3060–3067.
pp. 3065–3072. [45] L. Guo, Z. Sun, and W. Hu, “Learning to exploit long-term rela-
[20] Z. Wang, J. Zhang, J. Feng, and Z. Chen, “Knowledge graph embedding tional dependencies in knowledge graphs,” in Proc. ICML, 2019,
by translating on hyperplanes,” in Proc. AAAI, 2014, pp. 1112–1119. pp. 2505–2514.
[21] M. Nickel, L. Rosasco, and T. Poggio, “Holographic embeddings of [46] Q. Wang et al., “CoKE: Contextualized knowledge graph
knowledge graphs,” in Proc. AAAI, 2016, pp. 1955–1961. embedding,” 2019, arXiv:1911.02168. [Online]. Available:
[22] H. Liu, Y. Wu, and Y. Yang, “Analogical inference for multi-relational http://arxiv.org/abs/1911.02168
embeddings,” in Proc. ICML, 2017, pp. 2168–2178. [47] Y. Wang, R. Gemulla, and H. Li, “On multi-relational link prediction
[23] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, with bilinear models,” in Proc. AAAI, 2018, pp. 4227–4234.
“Complex embeddings for simple link prediction,” in Proc. ICML, [48] S. M. Kazemi and D. Poole, “SimplE embedding for link prediction
2016, pp. 2071–2080. in knowledge graphs,” in Proc. NeurIPS, 2018, pp. 4284–4295.
[24] Z. Sun, Z.-H. Deng, J.-Y. Nie, and J. Tang, “RotatE: Knowledge graph [49] M. Nickel, V. Tresp, and H.-P. Kriegel, “A three-way model for
embedding by relational rotation in complex space,” in Proc. ICLR, collective learning on multi-relational data,” in Proc. ICML, vol. 11,
2019, pp. 1–18. 2011, pp. 809–816.
[25] S. Zhang, Y. Tay, L. Yao, and Q. Liu, “Quaternion knowledge graph [50] M. Nickel, V. Tresp, and H. P. Kriegel, “Factorizing YAGO: Scalable
embedding,” in Proc. NeurIPS, 2019, pp. 2731–2741. machine learning for linked data,” in Proc. WWW, 2012, pp. 271–280.

Authorized licensed use limited to: China University of Petroleum. Downloaded on July 28,2022 at 09:23:28 UTC from IEEE Xplore. Restrictions apply.
JI et al.: SURVEY ON KNOWLEDGE GRAPHS: REPRESENTATION, ACQUISITION, AND APPLICATIONS 511

[51] R. Jenatton, N. L. Roux, A. Bordes, and G. R. Obozinski, “A latent [77] T. Safavi, D. Koutra, and E. Meij, “Evaluating the calibration of
factor model for highly multi-relational data,” in Proc. NIPS, 2012, knowledge graph embeddings for trustworthy link prediction,” in Proc.
pp. 3167–3175. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2020,
[52] I. Balazevic, C. Allen, and T. Hospedales, “TuckER: Tensor factor- pp. 8308–8321.
ization for knowledge graph completion,” in Proc. Conf. Empirical [78] X. Han, Z. Liu, and M. Sun, “Neural knowledge acquisition via mutual
Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. attention between knowledge graph and text,” in Proc. AAAI, 2018,
Process. (EMNLP-IJCNLP), 2019, pp. 5185–5194. pp. 4832–4839.
[53] S. Amin, S. Varanasi, K. A. Dunfield, and G. Neumann, “LowFER: [79] T. Dong, Z. Wang, J. Li, C. Bauckhage, and A. B. Cremers, “Triple
Low-rank bilinear pooling for link prediction,” in Proc. ICML, 2020, classification using regions and fine-grained entity typing,” in Proc.
pp. 1–11. AAAI, vol. 33, 2019, pp. 77–85.
[54] Q. Liu et al., “Probabilistic reasoning via deep learning: Neural [80] P. Zhou et al., “Attention-based bidirectional long short-term memory
association models,” 2016, arXiv:1603.07704. [Online]. Available: networks for relation classification,” in Proc. 54th Annu. Meeting
http://arxiv.org/abs/1603.07704 Assoc. Comput. Linguistics, vol. 2, 2016, pp. 207–212.
[55] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel, “Convolutional [81] E. Cao, D. Wang, J. Huang, and W. Hu, “Open knowledge enrichment
2D knowledge graph embeddings,” in Proc. AAAI, vol. 32, 2018, for long-tail entities,” in Proc. Web Conf., Apr. 2020, pp. 384–394.
pp. 1811–1818. [82] B. Shi and T. Weninger, “ProjE: Embedding projection for knowledge
[56] I. Balažević, C. Allen, and T. M. Hospedales, “Hypernetwork knowl- graph completion,” in Proc. AAAI, 2017, pp. 1236–1242.
edge graph embeddings,” in Proc. ICANN, 2019, pp. 553–565. [83] S. Guan, X. Jin, Y. Wang, and X. Cheng, “Shared embedding based
[57] M. Gardner, P. Talukdar, J. Krishnamurthy, and T. Mitchell, “Incorpo- neural networks for knowledge graph completion,” in Proc. 27th ACM
rating vector space similarity in random walk inference over knowledge Int. Conf. Inf. Knowl. Manage., Oct. 2018, pp. 247–256.
Shaoxiong Ji received the bachelor's degree in electronics and information engineering from the School of Information and Communication Engineering, Dalian University of Technology, Dalian, China, in 2014. He is currently pursuing the Ph.D. degree with the Department of Computer Science, Aalto University, Aalto, Finland.
He worked as a Research Assistant at the University of Technology Sydney (UTS), Ultimo, NSW, Australia, The University of Queensland (UQ), Brisbane, QLD, Australia, and Nanyang Technological University (NTU), Singapore. His research interests include machine learning, data mining, and knowledge representation and reasoning.

Shirui Pan (Member, IEEE) received the Ph.D. degree in computer science from the University of Technology Sydney (UTS), Ultimo, NSW, Australia.
He was a Lecturer with the School of Software, UTS. He is currently a Lecturer with the Faculty of Information Technology, Monash University, Clayton, VIC, Australia. To date, he has published over 80 research articles in top-tier journals and conferences, including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI), the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (TKDE), and the Annual Conference on Neural Information Processing Systems (NeurIPS). His research interests include data mining and machine learning.

Erik Cambria (Fellow, IEEE) received the Ph.D. degree through a joint program between the University of Stirling, Stirling, U.K., and the Media Lab, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2012.
He worked at Microsoft Research Asia, Beijing, China, and HP Labs India, Bengaluru, India. He is the Founder of SenticNet, a Singapore-based company offering B2B sentiment analysis services, and an Associate Professor with Nanyang Technological University (NTU), Singapore, where he also holds the appointment of the Provost Chair in Computer Science and Engineering.
Dr. Cambria is involved in many international conferences as a PC member and the Program Chair. He is also an Associate Editor of several journals, e.g., Neurocomputing (NEUCOM), Information Fusion (INFFUS), Knowledge-Based Systems (KBS), IEEE Computational Intelligence Magazine (IEEE CIM), and IEEE Intelligent Systems (where he manages the Department of Affective Computing and Sentiment Analysis).

Pekka Marttinen is currently an Assistant Professor of machine learning with Aalto University, Aalto, Finland, where he is also a Principal Investigator with the Machine Learning for Health (Aalto-ML4H) Research Group. His research interests include probabilistic machine learning, deep learning, Bayesian modeling, and applications in genomics and healthcare. He has coauthored 49 articles on these topics.

Philip S. Yu (Life Fellow, IEEE) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA.
He is currently a Distinguished Professor of computer science with the University of Illinois at Chicago, Chicago, IL, USA, where he is the Wexler Chair in Information Technology. He has published more than 830 articles in refereed journals and conferences. He holds or has applied for more than 300 U.S. patents. His research interests include big data, data mining, data streams, databases, and privacy.
Dr. Yu is a fellow of the ACM. He received the ACM SIGKDD 2016 Innovation Award, the Research Contributions Award from the IEEE International Conference on Data Mining in 2003, and the Technical Achievement Award from the IEEE Computer Society in 2013.