Triple Trustworthiness Measurement For Knowledge Graph
ABSTRACT
The knowledge graph (KG) uses triples to describe facts in the real world. It has been widely used in intelligent analysis and applications. However, possible noises and conflicts are inevitably introduced in the process of construction, while KG-based tasks and applications assume that the knowledge in the KG is completely correct, which brings about potential deviations. In this paper, we establish a knowledge graph triple trustworthiness measurement model that quantifies the semantic correctness of triples and the degree to which the facts they express are true. The model is a crisscrossing neural network structure. It synthesizes the internal semantic information in the triples and the global inference information of the KG to achieve trustworthiness measurement and fusion at three levels: the entity level, the relationship level, and the KG global level. We analyze the validity of the model's output confidence values and conduct experiments on the real-world dataset FB15K (from Freebase) for the knowledge graph error detection task. The experimental results show that, compared with other models, our model achieves significant and consistent improvements.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Natural language processing; Knowledge representation and reasoning; Reasoning about belief and knowledge; Probabilistic reasoning;

KEYWORDS
Knowledge graph; trustworthiness; neural network; error detection

ACM Reference format:
Shengbin Jia, Yang Xiang, and Xiaojun Chen. 2019. Triple Trustworthiness Measurement for Knowledge Graph. In Proceedings of the 2019 World Wide Web Conference (WWW '19), San Francisco, CA, USA, May 13–17, 2019, 8 pages. DOI: 10.1145/3308558.3313586

© 2019 ACM. ISBN 978-1-4503-6674-8/19/05.
arXiv:1809.09414v3 [cs.AI] 19 Feb 2019

1 INTRODUCTION
A knowledge graph (KG) aims to describe the various entities or concepts and their relationships existing in the objective world [44]. With its powerful semantic processing and open organization capabilities, it lays the foundation for knowledge-based organization and intelligent application in the Internet age, and it has attracted increasing attention in academic and industrial circles. A KG usually stores knowledge in the form of triples (head entity, relationship, tail entity), which can be simplified to (h, r, t). The construction of preliminary KGs has mainly relied on manual annotation or expert supervision [8, 35]. This way is extremely labor-intensive and time-consuming, and can no longer keep up with the speed of updating and growth of real-world knowledge [8]. Therefore, an increasing number of researchers are committed to productively extracting information directly from unstructured web text, as in ORE [2, 9, 19] and NELL [7], and to automatically constructing large-scale knowledge graphs, such as Freebase [3], DBpedia [1], and Wikidata1. However, some noises and errors are inevitably introduced in the process of automation; references [23] and [14] verify the existence of errors in KGs and the problems they cause. Existing knowledge-driven learning tasks or applications [11, 13, 22, 27] assume that the knowledge in the existing KG is completely correct and therefore bring about potential errors [28, 43].

1 https://www.wikidata.org/wiki/Wikidata

For a piece of knowledge in a KG, especially from a professional field, it is difficult to determine clearly whether it is true when it has not been tested in practice or strictly proven mathematically. Therefore, we introduce the concept of KG triple trustworthiness, which indicates the degree of certainty that the knowledge expressed by the triple is true. Its value is set to lie within the interval [0, 1]: the closer the value is to 0, the greater the probability that the triple is in error. Based on this, we can find possible errors in an existing KG and improve its knowledge quality.

There are intricate relationships among the entities in a KG: the same relationship can occur between different entities, and multiple relationships can involve the same entity at the same time. It is a challenge to study how to evaluate the trustworthiness of a knowledge triple with appropriate methods. We propose a knowledge graph triple trustworthiness measurement model (KGTtm), which is a crisscrossed neural network-based structure. We measure the trustworthy probability at multiple levels, including the entity level (correlation strength between an entity pair), the relationship level (translation invariance of relation vectors), and the KG global level (inference proof from a triple's related reachable paths). Corresponding to the different levels, we generate three essential questions and focus on solving them by designing three kinds of Estimators. A comprehensive triple confidence value is then output through a Fusioner.

The main contributions of this work include: (1) We propose a knowledge graph triple trustworthiness measurement method that makes comprehensive use of the triple semantic information and
the globally inferred information. We can achieve three levels of measurement and an integration of the confidence value at the entity level, the relationship level, and the knowledge graph global level. (2) We empirically verify the effectiveness of the triple trustworthiness on benchmark data created from a real-world, large-scale KG, Freebase. Experimental results show that error or noise instances are assigned low confidence values, while true triples receive high trustworthiness. (3) The trustworthiness calculated by KGTtm can be utilized in knowledge graph construction or improvement. We evaluate the model on the knowledge graph error detection task, and experimental results show that KGTtm can effectively detect error triples and achieves promising performance.

[Figure 1: The triple trustworthiness measurement model for KG. The ResourceRank, TEF, and RNN-based Estimators feed a hidden layer (the Fusioner) that outputs the triple trustworthiness.]

2 RELATED WORK
The concept of trustworthiness has been applied to knowledge graph related tasks to some extent. Reference [43] proposed a triple confidence awareness knowledge representation learning framework, which improved the knowledge representation effect. It included three kinds of triple credibility calculation methods using the internal structure information of the KG, but used only the information provided by the relationship, ignoring the related entities. NELL [7] constantly iterates its extraction templates and keeps learning new knowledge. It uses heuristics to assign confidence values to candidate relations and continuously updates the values through the learning process; this method is relatively simple but lacks semantic considerations. Dong et al. [8] constructed a probabilistic knowledge base (Knowledge Vault), where the reliable probability of a triple was a fusion of several estimators: extractors provided a reliability value, while a probability could be computed by prior models fitted with the existing knowledge repositories in Freebase. This method was tailored to their knowledge base construction and does not have good generalization capabilities. Li et al. [22] used a neural network method to embed the words in ConceptNet and provide confidence scores to unseen tuples to complete the knowledge base. This method considered only the triples themselves, ignoring the global information provided by the knowledge base. The above models used trustworthiness to solve various specific tasks, which shows that triple trustworthiness is important for applications and research. However, there is at present a lack of systematic research on methods for calculating knowledge triple trustworthiness. Our work is devoted to this basic research and proposes a unified measurement model that could facilitate a variety of tasks.

In this work, we verify the effect of the triple trustworthiness on the knowledge graph error detection task, which is dedicated to identifying whether a triple in the KG is in error. The existence of noise and errors in the KG is unavoidable, so error detection is especially important for KG construction and application. Error detection can actually be regarded as a special case of trustworthiness measurement in which the result is divided into two Boolean values: "true (trusted)" and "error (untrusted)". Traditional methods [7, 14, 16, 21] were still based on manual detection, and the cost was considerable. Recently, researchers have begun to study automatic KG error detection methods [8, 23, 31, 36]. In particular, embedding-based methods [5, 22, 39] have gained a significant amount of attention: we can efficiently measure the semantic correlations of entities and relations in the vector space, and whether two entities have a potential relationship can be predicted by matrix-vector operations on their corresponding embeddings. These methods have good efficiency and prospects. Furthermore, knowledge representation learning (KRL) technology is used to project the entities and relations in the KG into dense, real-valued, low-dimensional semantic embeddings; the main methods include TransE [5], TransH [42], TransR [25], TransD [18], PTransE [24], ComplEx [40], and others.

3 THE TRIPLE TRUSTWORTHINESS MEASUREMENT MODEL
The triple trustworthiness measurement model (KGTtm) for knowledge graphs is based on a crisscrossing neural network structure, as shown in figure 1. Longitudinally, it can be divided into two levels: the upper level is a pool of multiple trustworthiness estimate cells (Estimators), and the outputs of these Estimators form the input of the lower-level fusion device (Fusioner), a multi-layer perceptron that generates the final trustworthiness value for each triple. Viewed laterally, for a given triple (h, r, t), we consider the triple trustworthiness from three progressive levels and correspondingly answer three hierarchical questions: 1) Is there a possible relationship between the entity pair (h, t)? 2) Can a certain relationship r occur between the entity pair (h, t)? 3) From a global perspective, can other relevant triples in the KG infer that the triple is trustworthy? To answer these questions we designed three kinds of Estimators, as described below.

3.1 Is there a possible relationship between the entity pairs?
We use the association strength between a given entity pair (h, t) to measure the likelihood of an undetermined relationship occurring between the pair. If a pair of entities has very weak relevance, there is little hope that a relationship exists between them, and the trustworthiness of the triples formed by the entity pair will be greatly compromised. As shown in figure 2, there are dense edges (relationships) from node (entity) A to node E, that is, there is a high association strength for (A, E). We can easily guess that there is a relationship between the entity pair (A, E). However, it is
[Figure 2: The graph of resource allocation in the ResourceRank algorithm.]
[Figure 3: Effects display of the Translation-based energy function.]
[Figure 4: The inference instances for triple trustworthiness.]
impossible to reach F from G following the directed edges, so we can also guess that there is no relationship between (G, F).

We propose an algorithm named ResourceRank to characterize the association strength between an entity pair, following the idea of resource allocation [24, 26, 43, 46]. The algorithm assumes that the association between an entity pair (h, t) is stronger when more resource is passed from the head h through all associated paths to the tail t in a graph; the amount of resource aggregated at t indicates the association strength from h to t. The ResourceRank algorithm mainly includes three steps: 1) constructing a directed graph centered on the head entity h; 2) iterating the flow of resources in the graph until it converges and calculating the resource retention value of the tail entity t; 3) synthesizing other features and outputting the likelihood of (h, ?, t).

The details are as follows. Each entity is abstracted into a node; if there is a relationship from entity e1 to entity e2, a directed edge exists from node e1 to node e2. Therefore, the KG can be mapped to a directed graph. This graph is weakly connected, but starting from h every node in the graph can be reached. In the initial state, the resource amount of h is 1 and that of all other nodes is 0, and the sum over all nodes is always 1; if a node does not exist in the graph, its resource is always 0. Moreover, there may be multiple relations between an entity pair (e1, e2) but only one directed edge from e1 to e2 in the graph. Depending on the number of these relations, each edge has a different bandwidth: the larger the bandwidth, the more resource flows through the edge.

The resource owned by node h flows through all associated paths to the other nodes in the entire graph. We simulate the flow of resource until the distribution is steady, based on the PageRank [6, 33] algorithm. The value of the resource on the tail entity is R(t | h), calculated as follows:

R(t | h) = (1 − θ) Σ_{ei ∈ Mt} [R(ei | h) · BW(ei, t) / OD(ei)] + θ/N.   (1)

Where Mt is the set of all nodes that have outgoing links to node t, OD(ei) is the out-degree of node ei, and BW(ei, t) is the bandwidth from ei to t. Thus, for each node ei in Mt, the resource transferred from ei to t is R(ei | h) · BW(ei, t) / OD(ei). Because of noise in the KG, the graph is imperfect, and there may be closed loops affecting the flow of resources. To improve the model's fault tolerance, we assume that the resource flow from each node may jump directly to a random node with the same probability θ; this part of the resource, which flows to t randomly, is θ/N, where N is the total number of nodes.

Different states of the nodes in the graph reflect information about the entities. Considering the following six characteristics: 1) R(t | h); 2) the in-degree of the head node, ID(h); 3) the out-degree of the head node, OD(h); 4) the in-degree of the tail node, ID(t); 5) the out-degree of the tail node, OD(t); and 6) the depth from the head node to the tail node, Dep, we construct a feature vector V. After activation, the vector is transformed into a probability value RR(h, t), indicating the likelihood that there are one or more relationships between the head entity h and the tail entity t. This transformation is:

u = α(W1 V + b1)
RR(h, t) = W2 u + b2   (2)

Here, α is a nonlinear activation function, and Wi and bi are parameter matrices learned during model training. RR(h, t) lies within the range [0, 1]; the closer it is to 1, the more likely it is that there is a relationship between h and t, which answers the question in the section title at the entity level.

3.2 Can the determined relationship r occur between the entity pair (h, t)?
The above Estimator can only measure the likelihood that some undetermined relationship occurs between the entity pair, not what kind of relationship. We next calculate the possibility of a specific relation r occurring between the entity pair (h, t) using the Translation-based energy function (TEF) algorithm.

Inspired by the translation invariance phenomenon in the word embedding space [29, 30], a relationship in the KG is regarded as a certain translation between entities; that is, the relational vector r acts as the translating operation between the head entity embedding h and the tail entity embedding t [5]. As illustrated in figure 3, in the vector space the same relational vector can be mapped to the same plane and freely translated within that plane while remaining unchanged. The triples (BinLaden, Religion, Islam) and (Obama, Religion, Protestantism) should both be correct; however, according to the translational invariance of relation vectors, (BinLaden, Religion, Protestantism) must be wrong. Therefore, a trustworthy triple (h, r, t) should satisfy h + r ≈ t. The energy function is defined as E(h, r, t) = ‖h + r − t‖. The higher the degree of fit among h, r, and t, the smaller the value of E(h, r, t). We believe that the smaller E(h, r, t) is, the greater the probability that the relationship r holds between the entity pair (h, t) and the better the trustworthiness of (h, r, t), and vice versa.

The TEF algorithm operates as follows. Firstly, knowledge representation learning technology is used to obtain a low-dimensional distributed representation of the entities and relations, and we compute E(h, r, t) for each triple.
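This first step — computing the translation-based energy — can be sketched as follows (a minimal sketch with toy three-dimensional vectors; the paper's embeddings are 100-dimensional and learned by KRL training):

```python
import numpy as np

def transe_energy(h, r, t):
    """Translation-based energy E(h, r, t) = ||h + r - t||."""
    return float(np.linalg.norm(h + r - t))

# Toy embeddings (illustrative values only, not trained ones).
h = np.array([0.2, 0.1, 0.4])
r = np.array([0.3, 0.2, -0.1])
t_fit = np.array([0.5, 0.3, 0.3])    # close to h + r  -> low energy
t_bad = np.array([-0.9, 0.8, 0.1])   # far from h + r  -> high energy

assert transe_energy(h, r, t_fit) < transe_energy(h, r, t_bad)
```

A lower energy means that r fits better as a translation from h to t, and hence suggests a higher trustworthiness for (h, r, t).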
Then, a modified sigmoid function is used to convert E(h, r, t) into the probability that the entity pair (h, t) constitutes the relationship r. The conversion formula is as follows:

P(E(h, r, t)) = 1 / (1 + e^(−λ(δr − E(h, r, t))))   (3)

Here, δr is a threshold related to the relationship r. When E(h, r, t) = δr, the probability value P is 0.5; if E(h, r, t) < δr, then P > 0.5. The λ is a hyperparameter used for smoothing and can be adjusted dynamically along with the model training. P(E(h, r, t)) answers the second question, at the relation layer.

3.3 Can the relevant triples in the KG infer that the triple is trustworthy?
Inspired by "social identity" theory [17, 41], we make a metaphor: we regard the KG as a social group, where each triple is an individual. The degree of acknowledgement from other individuals to the targeted individual (the target triple) reflects whether the targeted individual can properly integrate into the society (i.e., the KG). We believe that only a true triple can achieve popular recognition; conversely, if a triple is well accepted, we tend to believe that it is trustworthy. Therefore, the answer to the question in the title is yes. How, then, can we infer the credibility of the target triple by evaluating the acknowledgements of the relevant triples in the KG?

We design a Reachable paths inference (RPI) algorithm for this purpose. There are many substantial multi-step paths from head entities to tail entities, which indicate the semantic relevance and the complex inference patterns among triples [34]. These reachable paths are important evidence for judging triple trustworthiness. For example, as shown in figure 4, there are multiple reachable paths between the entity pair "Bin Laden" and "Saudi Arabia". According to the path "Bin Laden →(BornInCity) Riyadh →(CityOfCountry) Saudi Arabia", we can firmly infer the fact triple (Bin Laden, Nationality, Saudi Arabia). In addition, suppose there were a pseudo-triple (Bin Laden, Religion, Christianity) in the KG; its related paths would be very few and illogical, and we should doubt the credibility of this tuple. In contrast, we can confirm the correct triple (Bin Laden, Religion, Islam) because it receives good acknowledgement through multiple reachable paths. To exploit the reachable paths for inferring triple trustworthiness, we need to address two key challenges.

3.3.1 Reachable Paths Selection. In a large-scale KG, the number of reachable paths associated with a triple may be enormous, and it is costly to weigh all of them; meanwhile, not all paths are meaningful and reliable. For example, the path "Bin Laden →(DeathInPlace) Pakistan →(DiplomaticCountry) Saudi Arabia" provides only scarce evidence for reasoning about the credibility of the triple (Bin Laden, Nationality, Saudi Arabia). Therefore, it is necessary to choose the most efficient reachable paths. Previous works believed that paths leading to many possible tail entities were mostly unreliable for the entity pair, and proposed a path-constraint resource allocation algorithm to select relation paths [24, 43]. Such methods ignore the semantic information of the paths. However, we find that the reliability of a reachable path is actually a matter of the semantic relevance of the path to the target triple. Therefore, we propose a Semantic distance-based path selection algorithm, described as Algorithm 1.

Algorithm 1 Reachable Paths Selecting Algorithm
Require: The knowledge graph (KG); a given target triple (h, r, t).
Ensure: Multiple reachable paths most relevant to the target triple.
1: Search the reachable paths from h to t in the KG and store them in P(h,r,t) = {p1, ..., pn};
2: For each pi = (h, l1, e1), (e1, l2, e2), ..., (e(n−1), ln, t), calculate
   1) the semantic distance between r and all relations in pi: SD(pi(l), r) = (1/n) Σ_{lj ∈ pi(l)} (r · lj) / (‖r‖ ‖lj‖);
   2) the semantic distance between t and all head entities in pi: SD(pi(e), t) = (1/n) Σ_{ej ∈ pi(e)} (t · ej) / (‖t‖ ‖ej‖);
   3) the semantic distance between h and all tail entities in pi: SD(pi(e), h) = (1/n) Σ_{ej ∈ pi(e)} (h · ej) / (‖h‖ ‖ej‖);
3: Calculate the average distance SD̄(pi) = (1/3)(SD(pi(e), t) + SD(pi(l), r) + SD(pi(e), h));
4: Select the first TopK paths with the highest SD̄(pi) scores;
5: Return {pi | 1 ≤ i ≤ TopK, Sort(SD̄(pi), descend)}.

3.3.2 Reachable Paths Representation. After the paths are selected, it is necessary to map each path to a low-dimensional vector for subsequent calculations. The previous methods [24, 43] merely considered the relations in the paths. Here, we consider the whole triples in the paths, including not only the relations but also the head and tail entities, since the entities can also provide significant semantic information. The embeddings of the three elements of each triple are concatenated as a unit s, so a path is transformed into an ordered sequence S = s1, s2, ..., sn. We use recurrent neural networks (RNNs) [15], which are good at capturing the temporal semantics of a sequence, to learn the semantic information contained in the path. The RNN layer encodes st by considering the forward information from s1 to st, and we use the output vector ht of the last time step to represent the semantic information of each path. We stitch the outputs ht of the TopK paths together to form a vector, which is nonlinearly transformed (using the method in eq. (2)) into a recognition value RP((h, r, t)), indicating the degree to which the relevant triples in the KG recognize the target triple as credible.

3.4 Fusing the Estimators
We design a Fusioner based on a multi-layer perceptron [12] to output the final triple trustworthiness values. A simple way to combine the above Estimators is to splice their outputs into a feature vector f(s) for each triple s = (h, r, t):

f(s) = [RR(h, t), p(E(s)), RP(s)]   (4)

The vector f(s) is input into the Fusioner and transformed through multiple hidden layers. The output layer is a binary classifier, assigning a label of y = 1 to true tuples and a label of y = 0 to fake ones. A nonlinear activation function (the logistic sigmoid) is used to calculate p(y = 1 | f(s)):

hi = σ(Whi f(s) + bhi)
p(y = 1 | f(s)) = φ(Wo h + bo)   (5)

Where hi is the i-th hidden layer, Whi and bhi are the parameter matrices to be learned in the i-th hidden layer, and Wo and bo are the parameter matrices of the output layer.
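The fusion described by eq. (4) and eq. (5) can be sketched as follows. This is a minimal single-hidden-layer sketch with made-up, untrained weights; the real Fusioner's parameters are learned during training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fusioner(rr, p_e, rp, W_h, b_h, W_o, b_o):
    """Splice the three Estimator outputs into f(s) (eq. (4)) and map
    them through one hidden layer to p(y = 1 | f(s)) (eq. (5))."""
    f_s = np.array([rr, p_e, rp])          # feature vector f(s)
    h1 = np.tanh(W_h @ f_s + b_h)          # hidden layer (tanh as sigma)
    return float(sigmoid(W_o @ h1 + b_o))  # logistic output phi

# Made-up parameters, for illustration only.
W_h = np.array([[1.0, 1.0, 1.0],
                [0.5, -0.5, 0.5]])
b_h = np.zeros(2)
W_o = np.array([2.0, 1.0])
b_o = -1.5

conf = fusioner(0.9, 0.8, 0.95, W_h, b_h, W_o, b_o)
assert 0.0 < conf < 1.0
```

When all three Estimators report high values, the fused confidence lands above that of a triple whose Estimator outputs are low.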
4 EXPERIMENTS
4.1 Experimental Settings
We focus on Freebase [4], one of the most popular real-world large-scale knowledge graphs, and perform our experiments on FB15K [5], a typical benchmark knowledge graph extracted from Freebase. FB15K contains 1,345 relations, 14,951 entities, and the corresponding 592,213 triples. We use all 592,213 triples to construct the graphs described in Section 3.1; each head entity is the core of a graph, so we construct 14,951 graphs. These graphs are used by the ResourceRank algorithm and the Reachable paths inference algorithm.

There are no explicitly labelled errors in FB15K. Considering the experience that most errors in real-world KGs derive from misunderstandings between similar entities, we use the method described in [43] to generate fake triples as negative examples automatically, where the picked replacement must already appear at the same position in some triple. For example, given the true triple (Newton, Nationality, England), (Newton, Nationality, American) is a potential negative example, rather than the obviously irrational (Newton, Nationality, Google), since England and American are both common as tails of Nationality. We ensure that the number of negative examples equals that of positive examples. In a random but quantitatively balanced manner, one of three kinds of fake triples is constructed for each true triple: by replacing the head entity, the relationship, or the tail entity. We assign a label of 1 to positive examples and 0 to negative examples. The resulting corpus contains 2 × 592,213 triples, separated into training (2 × 483,142 triples), validation (2 × 50,000 triples), and testing (2 × 59,071 triples).

We implement the neural network using the Keras library2,3. The dimension of the entity and relation embeddings is 100, and the batch size is fixed to 50. We use early stopping [10] based on the performance on the validation set. The number of RNN units is 100. Parameter optimization is performed with the Adam optimizer [20], with an initial learning rate of 0.001. In addition, to mitigate overfitting, we apply the dropout method [38] to regularize our model.

There are also some adjustable parameters during model training. θ is the probability that the resource flow from a node directly jumps to a random node; following the value used in the PageRank algorithm [6, 33], we set θ = 0.15. We set K = 4 and TopK = 3: if these two parameters are set too large, the cost of model training increases greatly, and if they are set too small, the related algorithms are affected, so this is a trade-off found after repeated attempts. The relation-specific threshold δr is searched by maximizing the classification accuracy on the validation triples that belong to the relation r.

2 https://github.com/keras-team/keras
3 The code can be obtained from https://github.com/TJUNLP/TTMF.

[Figure 5: (a) The scatter plot of the triple confidence value distribution. (b) The curves of precision and recall as the triple confidence threshold varies.]

4.2 Interpreting the Validity of the Trustworthiness
To verify whether the triple trustworthiness output by KGTtm is valid, we perform the following analysis on the test set.

We display the triple confidence values in a centralized coordinate system, as shown in figure 5(a). The left area shows the distribution of the values of the negative examples, while the right area shows that of the positive examples. It can be seen that the confidence values of the positive examples are mainly concentrated in the upper region (> 0.5)4, whereas those of the negative examples are mainly concentrated in the lower region (< 0.5). This is consistent with the natural law of judging triple trustworthiness, proving that the triple confidence values output by our model are meaningful.

In addition, by dynamically setting the threshold for the triple confidence values (only a triple whose value is higher than the threshold is considered trustworthy), we can measure the curves of the precision and recall of the output, as shown in figure 5(b). As the threshold increases, the precision continues to increase and the recall continues to decrease. When the threshold is adjusted within the interval [0, 0.5], there is no obvious change in the recall, and it remains at a high level; when the threshold is adjusted within [0.5, 1], the recall tends to decline, and the closer the threshold is to 1, the greater the rate of decline. This shows that the positive examples universally have high confidence values (> 0.5). Moreover, the precision remains at a relatively high level even when the threshold is set to a small value, which indicates that our model can identify the negative instances well and assign them small confidence values.

4 The output layer of our model is a binary classifier, and the confidence value of a triple is the probability that the triple is predicted as label = 1; therefore we choose a threshold of 0.5.

4.3 Comparing With Other Models on The Knowledge Graph Error Detection Task
The knowledge graph error detection task is to detect possible errors in the KG according to their triple trustworthiness scores. Precisely, it aims to predict whether a triple is correct or not, which can be viewed as a triple classification task [37].
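The same-position negative sampling of Section 4.1 can be sketched as follows (the replacement is drawn from items that already occur at the corrupted position; the entity and relation names here are toy data, not FB15K):

```python
import random

def corrupt(triple, position_pool, rng):
    """Replace the head, relation, or tail with another item that
    already occurs at that position somewhere in the KG."""
    slot = rng.choice(['head', 'rel', 'tail'])         # pick one position
    idx = {'head': 0, 'rel': 1, 'tail': 2}[slot]
    candidates = [x for x in position_pool[slot] if x != triple[idx]]
    fake = list(triple)
    fake[idx] = rng.choice(candidates)
    return tuple(fake)

# Toy pools of items seen at each position (illustrative data only).
pool = {'head': ['Newton', 'Obama'],
        'rel': ['Nationality', 'Religion'],
        'tail': ['England', 'American']}
true_triple = ('Newton', 'Nationality', 'England')
fake = corrupt(true_triple, pool, random.Random(0))
assert fake != true_triple
```

Each generated fake triple differs from its source triple in exactly one position, matching the three noise types used in the experiments.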
Table 1: Evaluation results on the Knowledge graph error detection.
Models     MLP    Bilinear  TransE  TransH  TransD  TransR  PTransE  Ours(TransE)  Ours(PTransE)  Ours(TransH)
Accuracy   0.833  0.861     0.868   0.912   0.913   0.902   0.941    0.977         0.978          0.981
F1-score   0.846  0.869     0.876   0.913   0.913   0.904   0.942    0.975         0.979          0.982
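The two metrics reported in Table 1 — classification accuracy at the 0.5 threshold and the maximum F1-score over thresholds — can be sketched as follows (the confidence values below are hypothetical):

```python
def accuracy(scores, labels, threshold=0.5):
    """Predict positive when confidence >= threshold, then score."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def max_f1(scores, labels):
    """Best F1-score over candidate thresholds taken from the scores."""
    best = 0.0
    for th in scores:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < th and y == 1)
        if tp:
            p, r = tp / (tp + fp), tp / (tp + fn)
            best = max(best, 2 * p * r / (p + r))
    return best

# Hypothetical confidence values: four positives, four negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.45, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
assert accuracy(scores, labels) == 1.0
assert max_f1(scores, labels) == 1.0
```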
If the confidence value of a testing triple (h, r, t) is below the threshold 0.5, it is predicted as negative; otherwise, it is positive. (2) The maximum F1-score as the given threshold varies over [0, 1].
As shown in Table 1, our model achieves better results in terms of accuracy and F1-score than the other models. The Bilinear model [22, 32, 45] and the multi-layer perceptron (MLP) model [8, 22] have been widely applied to KG-related tasks. They can calculate a score for the validity of triples through operations such as tensor decomposition and nonlinear transformation. Here we convert the scores to confidence values using the sigmoid function. Compared with the Bilinear and MLP models, our model shows improvements of more than 10% on the two evaluation indicators. We use the TEF algorithm (as illustrated in Section 3.2) to transform the output of the embedding-based models TransE, TransH, TransD, TransR, and PTransE into triple confidence values. These embedding-based models perform better than the traditional methods, but their results are affected by the quality of the embeddings. In comparison, our model does not rely on word embeddings. Introducing different embeddings into our model, as shown by Ours TransE, Ours TransH, and Ours PTransE, has only very subtle effects. Since our model makes full use of the internal semantic information of the triples and the global inference information of the knowledge graph, it achieves a more robust three-level measure of trustworthiness.

Table 2: Evaluation results on the three types of noise.

          (h, r, ?)        (h, ?, t)        (?, r, t)
Models    Recall  Quality  Recall  Quality  Recall  Quality
MLP       0.970   0.791    0.912   0.735    0.978   0.844
Bilinear  0.936   0.828    0.904   0.807    0.973   0.907
TransE    0.960   0.796    0.927   0.759    0.959   0.786
TransH    0.935   0.826    0.927   0.811    0.955   0.850
TransD    0.942   0.838    0.909   0.804    0.954   0.853
TransR    0.964   0.872    0.921   0.829    0.972   0.868
PTransE   0.944   0.841    0.973   0.888    0.957   0.863
Ours      0.987   0.943    0.977   0.923    0.994   0.959

4.4 Analyzing the Ability of Models to Tackle the Three Types of Noise
Three types of errors or noises are generated by replacing the head entity, tail entity, or relation in the triples. We measure the ability of the models to recall positive cases from candidate triples doped with a large number of noises.
We select only the true triples (positive examples) in the test set and divide them into three categories: all pairs of head and relation (h, r, ?), all pairs of head and tail (h, ?, t), and all pairs of tail and relation (?, r, t). We then complete each empty position with every object in the entity set or the relation set. In this way, for a given pair of head and relation, or of tail and relation, 14,951 candidate triples can be constructed; similarly, for a pair of head and tail entities, 1,345 candidate triples can be generated. For each completed triple (h, r, t), we calculate its confidence value; when the value is higher than the threshold (> 0.5), we judge it to be correct. Two evaluation metrics are used: (1) the recall of true triples in the test set (Recall); (2) the average trustworthiness value across each set of true triples (Quality) [22].
Analyzing the results in Table 2, we find that our model achieves higher recall on all three types of test sets than the other models, which shows that it can more accurately pick out the right triples from noisy candidates. The average trustworthiness values of our model are also higher than those of the others, which shows that it identifies the correct instances with high confidence values. In addition, our model achieves its best results on the (?, r, t) set but its worst on the (h, ?, t) set, and the output of almost all models shows the same pattern. It is difficult to judge the relation type of an entity pair, because various relations may hold between a given entity pair, which increases the difficulty of model judgment.

Table 3: Evaluation results of each single estimator on the knowledge graph error detection.

Models    TEF(TransE)  ResourceRank  RPI    KGTtm
Accuracy  0.868        0.811         0.881  0.977

4.5 Analyzing the Effects of Single Estimators
To measure the effect of the single Estimators, we separate each Estimator into an independent model that calculates confidence values for triples. The results on knowledge graph error detection are shown in Table 3. The accuracy obtained by each single model is above 0.8, which demonstrates the effectiveness of each Estimator. Among them, the reachable paths inference (RPI) based method achieves better results than the other two Estimators. After combining all the Estimators, the accuracy obtained by the global model (KGTtm) is greatly improved, which shows that our model has good flexibility and scalability: it can integrate multiple aspects of information to obtain a more reasonable trustworthiness.
It is worth emphasizing that our model is flexible and easy to extend. Newly added estimators can train their parameters together with the model framework. In addition, the confidence value generated by a single estimator can be appended directly to the feature vector f(s).
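As a rough illustration of this extensibility, the sketch below appends each single estimator's confidence to a feature vector f(s) and merges the entries with a logistic layer; the estimator outputs, the fusion weights, and the bias are hypothetical placeholders, not the trained parameters of KGTtm.

```python
import math

def fuse(estimator_confidences, weights, bias):
    """Merge per-estimator confidence values into one trustworthiness score.

    Each single estimator (e.g. TEF, ResourceRank, RPI) contributes one
    entry of the feature vector f(s); a logistic layer with hand-picked
    placeholder weights stands in for the trained fusion network.
    """
    z = sum(w * c for w, c in zip(weights, estimator_confidences)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> value in (0, 1)

# Three illustrative estimator outputs for one triple:
f_s = [0.86, 0.74, 0.91]   # feature vector f(s) built from single estimators
weights = [1.5, 1.0, 2.0]  # placeholder fusion weights
score = fuse(f_s, weights, bias=-2.0)
print(round(score, 3))
```

In the actual model the fusion parameters would be learned jointly with the rest of the network; adding a new estimator then amounts to appending one more entry to f(s) and one more weight.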
5 CONCLUSION
In this paper, to eliminate the deviation that errors in the KG cause in knowledge-driven learning tasks and applications, we establish a knowledge graph triple trustworthiness measurement model (KGTtm) to detect and eliminate errors in the KG. The KGTtm is a crisscrossing neural network structure; it evaluates the trustworthiness of triples from three perspectives and synthetically uses the triple semantic information and the global inference information of the knowledge graph. Experiments were conducted on the popular knowledge graph Freebase, and the experimental results confirmed the capabilities of our model. In the future, we will explore adding more estimators to the model to further improve the effectiveness of the trustworthiness measurement. We will also try to apply the trustworthiness to more knowledge-based applications.

ACKNOWLEDGMENTS
The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant 71571136, and in part by the Project of Science and Technology Commission of Shanghai Municipality under Grant 16JC1403000 and Grant 14511108002.

REFERENCES
[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a Web of open data. In Lecture Notes in Computer Science, Vol. 4825. 722–735.
[2] Michele Banko, Michael J. Cafarella, and Stephen Soderland. 2007. Open information extraction from the web. In International Joint Conference on Artificial Intelligence (IJCAI 2007). 2670–2676.
[3] Kurt Bollacker, Robert Cook, and Patrick Tufts. 2007. Freebase: A shared database of structured general human knowledge. In Proceedings of the National Conference on Artificial Intelligence.
[4] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008). 1247–1250.
[5] Antoine Bordes, Nicolas Usunier, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (NIPS 2013).
[6] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. 2000. Graph structure in the Web. Computer Networks 33, 1 (2000), 309–320.
[7] Andrew Carlson, Justin Betteridge, and Bryan Kisiel. 2010. Toward an architecture for never-ending language learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2010).
[8] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge Vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014).
[9] Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011). 1535–1545.
[10] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. IEEE, 6645–6649. arXiv:1303.5778
[11] Saiping Guan, Xiaolong Jin, Yantao Jia, Yuanzhuo Wang, and Xueqi Cheng. 2018. Knowledge graph oriented knowledge inference methods: A survey. Journal of Software 29, 10 (2018), 1–29.
[12] John B. Hampshire and Barak Pearlmutter. 1991. Equivalence proofs for multi-layer perceptron classifiers and the Bayesian discriminant function. In Connectionist Models. Elsevier, 159–172.
[13] Baoli Han, Ling Chen, and Xiaoxue Tian. 2018. Knowledge based collection selection for distributed information retrieval. Information Processing and Management 54, 1 (2018), 116–128.
[14] Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. 2016. Vandalism detection in Wikidata. In ACM International Conference on Information and Knowledge Management. 327–336.
[15] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[16] Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence 194 (2013), 28–61.
[17] Paul James. 2015. Despite the terrors of typologies: The importance of understanding categories of difference and identity. Interventions 17, 2 (2015), 174–195.
[18] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Meeting of the Association for Computational Linguistics and the International Joint Conference on Natural Language Processing. 687–696.
[19] Shengbin Jia, Maozhen Li, Yang Xiang, et al. 2018. Chinese open relation extraction and knowledge base establishment. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 17, 3 (2018), 15.
[20] Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[21] Jens Lehmann. 2015. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 2 (2015), 167–195.
[22] Xiang Li, Aynaz Taheri, Lifu Tu, and Kevin Gimpel. 2016. Commonsense knowledge base completion. In Meeting of the Association for Computational Linguistics. 1445–1455.
[23] Jiaqing Liang, Yanghua Xiao, Yi Zhang, Seung-won Hwang, and Haixun Wang. 2017. Graph-based wrong IsA relation detection in a large-scale lexical taxonomy. In AAAI. 1178–1184.
[24] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015. Modeling relation paths for representation learning of knowledge bases. In EMNLP 2015. arXiv:1506.00379
[25] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.
[26] Linyuan Lü and Tao Zhou. 2011. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390, 6 (2011), 1150–1170.
[27] Denis Lukovnikov, Asja Fischer, and Jens Lehmann. 2017. Neural network-based question answering over knowledge graphs on word and character level. In International Conference on World Wide Web. 1211–1220.
[28] Michel V. Manago and Yves Kodratoff. 1987. Noise and knowledge acquisition. In International Joint Conference on Artificial Intelligence. 348–354.
[29] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[30] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119. arXiv:1310.4546
[31] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2015. A review of relational machine learning for knowledge graphs. Proc. IEEE (2015). arXiv:1503.00759
[32] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In ICML.
[33] Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1998. The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford Digital Library Technologies Project (1998).
[34] Giuseppe Pirrò. 2015. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference.
[35] Liu Qiao, Li Yang, Duan Hong, Liu Yao, and Zhiguang Qin. 2016. Knowledge graph construction techniques. Journal of Computer Research and Development (2016).
[36] Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowledge-Based Systems 104 (2016), 123–133.
[37] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems. 926–934.
[38] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
[39] Kristina Toutanova, Xi Victoria Lin, Wen-tau Yih, Hoifung Poon, and Chris Quirk. 2016. Compositional learning of embeddings for relation paths in knowledge bases and text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). 1434–1444. arXiv:1611.07232
[40] Théo Trouillon, Christopher R. Dance, Éric Gaussier, Johannes Welbl, Sebastian Riedel, and Guillaume Bouchard. 2017. Knowledge graph completion via complex tensor factorization. The Journal of Machine Learning Research 18, 1 (2017), 4735–4772.
[41] John C. Turner and Penelope J. Oakes. 1986. The significance of the social identity concept for social psychology with reference to individualism, interactionism and social influence. British Journal of Social Psychology 25, 3 (1986), 237–252.
[42] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI Conference on Artificial Intelligence.
[43] Ruobing Xie, Zhiyuan Liu, and Maosong Sun. 2017. Does William Shakespeare REALLY write Hamlet? Knowledge representation learning with confidence. arXiv preprint arXiv:1705.03202 (2017).
[44] Z. L. Xu, Y. P. Sheng, L. R. He, and Y. F. Wang. 2016. Review on knowledge graph techniques. Journal of University of Electronic Science and Technology of China (2016).
[45] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv:1412.6575 (2014).
[46] Tao Zhou, Jie Ren, Matúš Medo, and Yi-Cheng Zhang. 2007. Bipartite network projection and personal recommendation. Physical Review E 76, 4 (2007), 046115. arXiv:0707.0540