
Information Sciences 667 (2024) 120430


SANe: Space adaptation network for temporal knowledge graph completion

Yancong Li a, Xiaoming Zhang a,∗, Bo Zhang b, Feiran Huang c, Xiaopeng Chen d, Ming Lu e, Shuai Ma a

a State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
b School of Computer and Electronic Information / School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, China
c College of Cyber Security, Jinan University, Guangzhou 510632, China
d TAIJI COMPUTER CORPORATION LIMITED, China
e School of Cyber Science and Technology, Beihang University, Beijing 100191, China

ARTICLE INFO

Keywords: Temporal knowledge graph; Temporal knowledge graph completion; Space adaptation network; Parameter generation; Partition tree

ABSTRACT

Temporal Knowledge Graphs (TKGs) model time-dependent facts as relations between entities at specific timestamps, making them well-suited for real-world scenarios. However, TKGs are susceptible to incompleteness, necessitating Temporal Knowledge Graph Completion (TKGC) to predict missing facts. Prior methods often struggle to effectively handle two critical properties of TKGs, time-variability and time-stability, simultaneously, which hinders their performance. In this paper, we propose the Space Adaptation Network (SANe), a novel approach for TKGC. SANe adapts facts at different timestamps to distinct latent spaces, effectively addressing time-variability. Our model introduces a Parameter Generation Network to produce separate neural networks for each snapshot, which are then encoded into different latent spaces. A dynamic convolutional neural network processes entities and relations, utilizing different learned parameters generated by the parameter generation network with respect to timestamps. By handling different temporal snapshots separately, TKGC is transformed into static KGC, enabling the modeling of time-variability. The dynamic convolutional neural network efficiently learns collective knowledge over large periods and gradually supplements more specific knowledge in smaller periods, facilitating time-stability. To strike a balance between learning time-variability and time-stability, we introduce a time-aware parameter generator that produces parameters hierarchically based on year, month, and day timestamps. Long-term knowledge is effectively shared across adjacent snapshots within the same year or month, while short-term knowledge within a day is preserved in specific parameters. However, in unbalanced TKGs, where many facts occur in small intervals, the large number of parameters generated by the time-aware parameter generator may remain underutilized. To address this, we propose Adaptive Parameter Generation with a partition tree, ensuring parameter load balancing while maintaining time-stability. We conduct extensive experiments on five benchmark datasets, demonstrating the superiority of SANe over existing methods for TKGC, achieving state-of-the-art performance. Our contributions include pioneering TKGC from the perspective of space adaptation, achieving a balance between time-variability and time-stability through latent space overlap constraints, and substantiating the effectiveness of our model through comprehensive experiments on rich temporal datasets.

* Corresponding author.
E-mail address: [email protected] (X. Zhang).

https://doi.org/10.1016/j.ins.2024.120430
Received 22 March 2023; Received in revised form 28 February 2024; Accepted 3 March 2024
Available online 6 March 2024
0020-0255/© 2024 Elsevier Inc. All rights reserved.

Fig. 1. A comparison of different methods for handling temporal information in knowledge graphs is illustrated. Existing approaches include (a) learning entities,
relations, and timestamps separately and then processing them in a shared latent space [9–11], and (b) implicitly encoding temporal information in entities or relations
to learn their time-aware representations, which are then processed in a shared latent space [18,19,15]. (c) Our proposed method adapts entities and relations into
specific latent spaces generated based on temporal information.

1. Introduction

Knowledge Graphs (KGs) [1] are widely adopted for organizing and managing knowledge as structured information, typically
presented in the form of fact triples. This facilitates downstream tasks such as question answering [2], community detection [3],
and recommendation systems [4]. In KGs, entities are represented as nodes, and the relations between them are depicted as directed
edges. The inherent incompleteness of most KGs has spurred a significant amount of research on Knowledge Graph Completion (KGC), the task of inferring potential or missing facts within KGs. Current methods mainly focus on completing static KGs [5–8]. However, many real-world facts are time-dependent and have a limited validity period. Such temporal facts undermine the effectiveness of static KGC methods, since the triples in KGs are not invariant over time.
Temporal Knowledge Graphs (TKGs) [9–11] model facts as quadruples by associating each triple in KGs with a timestamp, making
them more suitable for real-world scenarios. A TKG comprises a collection of subgraphs, known as snapshots, each corresponding
to a different timestamp, and dynamically evolves over time. Similar to static KGs, TKGs are also susceptible to incompleteness.
Consequently, Temporal Knowledge Graph Completion (TKGC), which involves predicting missing facts in TKGs, has emerged as a
significant and rapidly growing research topic [12–14].
In recent years, several techniques have been proposed for TKGC [12,13,15]. One line of studies learn independent embedding
representations for entities, relations and timestamps respectively (shown in Fig. 1(a)) [12,13,16,17]. Other solutions integrate
temporal information into entities or relations to produce time-aware representations (shown in Fig. 1(b)) [18,19,15]. These solutions
simply extend traditional KGC models by constraining the representations of facts in terms of temporal information. They attempt to model the varying knowledge in a single shared latent space, which is usually ineffective for adapting to dynamic TKGs. The static fact representations in the first approach fail to account for the validity of knowledge over specific periods. Meanwhile, the second solution frequently changes the representations of facts, even when the facts remain stable and are not sensitive to time. Essentially,
previous methods struggle to simultaneously address two inherent and critical properties of TKGs: time-variability and time-stability.
Time-variability means that knowledge changes over time periods. For example, the president of the USA was Donald Trump on 2020-
10-01, but Joe Biden on 2021-10-01. Time-stability means that certain knowledge remains valid for a specific period. For instance,
the triple (Donald Trump, presidentOf, USA) remains valid during a specific time frame, spanning from 2017-01-20 to 2021-01-20.
Efficiently designing a principled approach to model the time-variability and time-stability of TKGs is a critical problem for the TKGC
task.
In this paper, we provide a new train of thought and method to tackle the challenges. Facts at different timestamps usually have
different context. For example, the query (?, presidentOf, USA) yields different answers given different timestamps. Handling time-
variability aims to mitigate the interference of the same query occurring at different times. Hence, it is reasonable to separate facts at
different timestamps, known as snapshots, into distinct processing pipelines. As shown in Fig. 1(c), each snapshot is endowed with its own latent space, such that knowledge can be effectively distinguished at different timestamps. However, this straightforward strategy leads to a parameter explosion when the TKG covers a lengthy period, as the number of model parameters increases linearly with the number of timestamps. Besides, it also entirely disregards time-stability. More specifically, we assume that the
latent spaces for different timestamps are required to be identical, partially overlapping, or uncorrelated, depending on whether the
timestamps are identical, adjacent, or distant. To facilitate knowledge sharing among adjacent snapshots, controlling the overlap
of latent spaces based on the distance between timestamps is necessary. This approach enables time-stability and maintains an
acceptable number of parameters.
To this end, we propose Space Adaptation Network (SANe) for TKGC. We introduce parameter generation [20] to produce a
neural network for each snapshot by specifying a set of parameters. Snapshots are encoded into different latent spaces by means of different neural networks. Specifically, a dynamic convolutional neural network (DCNN) is introduced to process the entities and relations using different learned parameters that are generated with respect to timestamps. Essentially, by


Table 1
Abbreviations and their descriptions.

Acronym | Description
KGs | Knowledge graphs
KGC | Knowledge graph completion
TKGs | Temporal knowledge graphs
TKGC | Temporal knowledge graph completion
SANe | Space adaptation network
DCNN | Dynamic convolutional neural network
PGN | Parameter generation network
TaPG | Time-aware parameter generator
AdaPG | Adaptive parameter generation
CNNs | Convolutional neural networks
DCL | Dynamic convolutional layer
ReLU | Rectified linear unit
RNN | Recurrent neural network

handling different temporal snapshots in separate spaces, TKGC can be transformed into static KGC, thereby enabling the modeling of
time-variability. DCNN consists of multiple stacked dynamic convolutional layers. Inspired by CNNs in visual tasks, we anticipate that
the DCNN first captures collective knowledge over extensive periods and then incrementally integrates more specific knowledge over
shorter periods. This strategy plays an important role in modeling time-stability. Therefore, we construct a time-aware parameter
generator (TaPG) to generate the parameters for DCNN hierarchically. TaPG produces parameters for each layer of DCNN according
to the year, month and day of timestamps. Long-term knowledge is shared effectively across multiple adjacent snapshots within
the same year or month. Simultaneously, short-term knowledge within a day can be preserved in a set of specific parameters as
well. Our solution is capable of striking a balance between learning time-variability and time-stability. In addition, TaPG produces
parameters in terms of fixed time intervals. However, some TKGs are unbalanced, i.e., plenty of facts occur mainly within a small interval, which may cause many of the parameters produced by TaPG to be rarely used. We therefore propose an Adaptive Parameter Generator (AdaPG) to maintain parameter load balancing while ensuring time-stability. AdaPG uses a partition tree to
allocate parameters for snapshots based on the number of facts, rather than time intervals. The experimental evaluation of SANe,
conducted on five standard benchmark datasets, demonstrates its superiority over existing TKGC methods, achieving state-of-the-art
performance. The main contributions are outlined as follows:

• Our approach sheds new light on solving TKGC by adapting different latent spaces for each snapshot. To the best of our
knowledge, this is the first work that implements TKGC from the perspective of space adaptation.
• SANe achieves a balance between learning time-variability and time-stability by imposing constraints on the overlap of latent
spaces with respect to time intervals.
• The extensive experiments conducted on five benchmark datasets that contain rich temporal information substantiate the supe-
riority of the proposed model.

This paper presents an expanded version, which includes improvements in both technique and performance evaluation, over
its preliminary version [21]. In the developed version, we introduce an adaptive parameter generator (AdaPG) as an improvement
over the TaPG model. AdaPG overcomes TaPG’s limitations in handling unbalanced TKGs by allocating parameters based on the
number of facts rather than time intervals. This ensures parameter load balancing. By utilizing a partition tree, AdaPG optimizes
the utilization of generated parameters in TKGs with varying fact distributions. These advancements enhance the adaptability and
effectiveness of our approach for TKGC tasks. Moreover, to further enhance time representation quality, we introduce a consistency
loss to smooth timestamp representations. By incorporating the consistency loss during training, we ensure that snapshots with
similar timestamps are more consistent in the parameter generation process, thus improving the model’s ability to capture temporal
variations. Specifically, the main improvements are as follows: first, we enhance the parameter generation method, AdaPG, by
proposing a strategy for load balancing of parameters, enabling adaptation to unbalanced TKGs. Compared to TaPG [21], it assigns
the parameters based on the number of facts by constructing a flexible partition tree. Second, we conduct a more comprehensive
evaluation of the proposed approach with additional experiments on five benchmark datasets. Third, we provide more detailed
information about the modules designed in SANe and discuss the influence of the AdaPG module.
The remainder of this paper is organized as follows: Section 2 provides a detailed review of related work, illuminating the existing
landscape of our research context. Section 3 outlines the employed methodology. Section 4 encompasses the experimental design,
results, and in-depth discussions. Finally, Section 5 summarizes our findings concisely, highlighting both the limitations and strengths
of our proposed framework.
A comprehensive compilation of abbreviations employed in this manuscript, along with their corresponding explanations, is
presented in Table 1. This table is intended to serve as a convenient reference for readers, facilitating a clearer comprehension of
the text. When encountering unfamiliar acronyms throughout the document, readers are encouraged to consult Table 1 for quick
clarification.


Table 2
Summary of related models, categories, and characteristics.

Category | Sub-category | Model | Characteristic | Limitation
Static KGC | Translation | TransE [5], TransH [22], TransD [23], TransR [24] | Models relations as translations from head to tail entities | Ignores temporal information, resulting in identical plausibility scores for quadruples with different timestamps
Static KGC | Bilinear | ComplEx [25], RESCAL [26], TuckER [27] | Bilinear transformations capture semantic correlations between entities under relations | (as above)
Static KGC | Neural | ConvE [28], RGHAT [29], InteractE [30] | Uses neural networks to model entity-relation interactions | (as above)
Temporal KGC | – | TTransE [12], TComplEx [13], TeLM [16], ChronoR [17] | Represents entities, relations, and timestamps independently within the same latent space | Cannot effectively model time-variability and time-stability in TKGs
Temporal KGC | – | HyTE [31], DE-SimplE [19], ATiSE [18], TIE [15] | Incorporates temporal information into either entities or relations | (as above)
Temporal KGC | – | SANe [21] | Maps facts with different timestamps into distinct latent spaces | Many of the generated parameters are rarely used
Parameter Generation | Other domains | Platanios et al. [20], N3 [32], Nekvinda et al. [33] | Contextual parameter generation for sentence translation, image classification, and multilingual speech synthesis | Not specifically designed for the knowledge graph domain
Parameter Generation | Static KGC | CoPER [34], ParamE [35] | Generates model parameters for link prediction in static KGs based on neural networks | Does not incorporate temporal information; limited to static KG completion

2. Related work

In this section, typical methods for static knowledge graph completion and temporal knowledge graph completion are introduced,
and research advances on parameter generation in various fields are briefly reviewed. Table 2 provides a summary of the relevant
models along with their characteristics and limitations.

2.1. Static knowledge graph completion

Considerable research has focused on static knowledge graph completion (KGC). The primary goal of these time-agnostic models
is to infer missing or potential facts from static knowledge graphs.
TransE [5] is a prominent translation-based model that has demonstrated impressive performance across diverse domains and
applications, owing to its straightforward formulation and computational efficiency. Following the introduction of TransE, several
variants have been proposed in order to address some of its limitations. Among these variants are TransH [22], TransD [23], and
TransR [24]. TransH models each relation as a translation on a relation-specific hyperplane onto which entity vectors are projected. TransD constructs dynamic mapping matrices from entity and relation projection vectors to capture diverse relation semantics, while TransR employs a separate matrix for each relation to map entities into the relation-specific space.
Bilinear models, which include ComplEx [25], RESCAL [26], and TuckER [27], adopt a framework in which relations are rep-
resented as linear transformations that act on entity embeddings. ComplEx extends the conventional embedding model by using
complex-valued embeddings for entities and relations, allowing it to capture asymmetric and symmetric relations. RESCAL models a
relation as a linear transformation on the outer product of two entity embeddings and has been shown to be effective in capturing
complex relationships among entities. TuckER, on the other hand, leverages the Tucker decomposition to factorize the 3-way tensor
that encodes entity-relation interactions, enabling it to capture higher-order interactions among entities and relations.
Neural models, such as ConvE [28], RGHAT [29], and InteractE [30], integrate nonlinear neural networks to complement knowl-
edge graphs, demonstrating impressive effectiveness in various tasks. ConvE projects the entities and relations into 2D feature maps
and uses convolutional neural networks (CNNs) to model their interactions in the feature space. RGHAT utilizes a relational graph neural network with hierarchical attention to model the rich relational structures in the graph. InteractE extends the ConvE architecture to better model
entity-relation interactions, resulting in improved performance on the KGC task.
The aforementioned methods have demonstrated promising outcomes in the task of link prediction for static KGs. However, these
time-unaware models have limitations in reasoning about temporal facts. Specifically, static models will output the same plausibility
score, owing to their ignorance of temporal information, when given two quadruples with the same head entity, relation, and tail entity but different timestamps, e.g., (Donald Trump, presidentOf, USA, 2020) and (Donald Trump, presidentOf, USA, 2022). However, only the
former of the two quadruples is positive. To improve the performance of static KGC models by leveraging temporal information,
several investigations have been performed regarding temporal knowledge graph completion.


2.2. Temporal knowledge graph completion

Recent studies have demonstrated that the performance of KGC models can be augmented by incorporating temporal information
in TKGs. The current methods for TKGC can generally be divided into two categories.
The first category of approaches represents entities, relations, and timestamps independently within the same latent space. TTransE [12] is a model that builds on TransE [5] and incorporates temporal information to improve performance on
TKGC tasks. Similarly, TComplEx [13] is an extension of the ComplEx model [25] that adopts the canonical decomposition of order-
4 tensors, incorporating a novel regularization mechanism that enhances its ability to handle temporal dynamics. TeLM [16] improves
upon TComplEx by utilizing multi-vector embeddings and a linear temporal regularizer to undertake fourth-order tensor factorization
of temporal knowledge graphs. ChronoR [17] is a model that uses k-dimensional rotation to represent temporal information in
knowledge graphs. SANe [21] maps facts with different timestamps into distinct latent spaces using CNN, capturing the dynamic
variation of knowledge in TKGs. It incorporates a time-aware parameter generator to enable efficient knowledge sharing between
adjacent time intervals and enhance TKGC performance.
The second category of approaches posits that temporal information needs to be incorporated into either entities or relations,
leading to the acquisition of time-aware representations. HyTE [31] is a temporally aware KG embedding method that incorporates
time into the entity-relation space by associating each timestamp with a corresponding hyperplane. It enables KG inference with
temporal guidance and predicts temporal scopes for relational facts with missing time annotations. DE-SimplE [19] is a method that
combines the static KGC model SimplE [36] with a diachronic embedding function, enabling the learning of time-sensitive entity
representations. The diachronic embedding function captures temporal information by transforming entity embeddings based on a
time index, while SimplE is used to score the KGC task. ATiSE [18] is a method that incorporates temporal information into entity
and relation embeddings in a knowledge graph by utilizing additive time series decomposition and Gaussian distributions to capture
temporal uncertainty. TIE [15] is an incremental embedding framework designed to capture temporal variations. TIE generates time-
dependent embeddings of entities and relations by incrementally updating the embeddings based on newly observed data, and uses
experience replay to prevent catastrophic forgetting.

2.3. Parameter generation

Parameter generation has been explored across various academic disciplines. The neural translation model proposed by Pla-
tanios et al. [20] employs a contextual parameter generator to create encoder and decoder parameters for a sentence, taking into
account information from both the source and target languages to produce effective parameters. N3 [32] combines natural language
descriptions with pre-trained models to generate network parameters for image classification. Nekvinda et al. [33] proposed a mul-
tilingual speech synthesis approach that leverages contextual parameter generation through meta-learning to produce fluent and
natural-sounding multilingual speech.
To the best of our knowledge, there is also literature on the generation of parameters for completing static knowledge graphs. CoPER [34]
leverages relation embeddings to derive model parameters that manipulate head entity embeddings, enabling complex entity-relation
interactions. ParamE [35] combines the advantages of translational models and convolution neural network-based models for link
prediction. In ParamE, entity and relation embeddings are viewed as input, parameters, and output of a neural network. However,
neither CoPER nor ParamE incorporates temporal information, rendering them incapable of modeling the temporal dependencies
across facts in TKG settings.

3. Methodology

A temporal knowledge graph (TKG) is a directed graph composed of a set of entities $\mathcal{E}$, relations $\mathcal{R}$, and timestamps $\mathcal{T}$, represented as a structured dataset of quadruples $\mathcal{G} = \{(h, r, t, \tau) \mid h, t \in \mathcal{E}, r \in \mathcal{R}, \tau \in \mathcal{T}\}$. Each quadruple in $\mathcal{G}$ represents a temporal link in the TKG, where $h$ and $t$ denote the head and tail entities, connected via a relation $r$ at time $\tau$. The objective of Temporal Knowledge Graph Completion (TKGC) [37] is to predict the missing tail or head entity in a query $(h, r, ?, \tau)$ or $(?, r, t, \tau)$ by utilizing the temporal information available within the TKG. This paper focuses exclusively on the interpolation task for TKGC, involving the prediction of missing facts only for timestamps present in the dataset. The extrapolation task [14], which involves predicting future facts beyond the observed timestamps, is not addressed in this study.
To overcome the challenges in TKGC, this paper introduces the Space Adaptation Network (SANe). SANe leverages distinct
latent spaces to adapt snapshots with varying timestamps, as shown in Fig. 2. The SANe model comprises two key components:
a Dynamic Convolutional Neural Network (DCNN) and a Parameter Generation Network (PGN). DCNN is a convolutional neural
network that adapts entities and relations into distinct latent spaces using convolutional layers with varied parameters. PGN is
responsible for generating these parameters based on temporal information. More precisely, PGN transforms timestamps into a group
of parameters that regulate the intersection of several latent spaces within DCNN. This approach ensures valid knowledge sharing
among neighboring snapshots and enables the model to capture temporal dependencies in the data. In our model, the head entity
and relation are represented by embedding vectors 𝐡 ∈ ℝ𝑑 and 𝐫 ∈ ℝ𝑑 , respectively, with dimensionality 𝑑 . For a query (ℎ, 𝑟, ?, 𝜏),
the DCNN 𝑓 leverages parameters generated by the PGN 𝑔 to predict the missing tail entity 𝑡. The parameters, associated with the
DCNN 𝑓 and generated by PGN 𝑔 , are expressed as 𝜃𝑓 = 𝑔(𝜏). The prediction of the missing tail entity 𝑡 is formulated as follows:

𝐭 = 𝑓 (𝐡, 𝐫; 𝑔(𝜏)), (1)


Fig. 2. SANe consists of a multi-layer convolutional neural network called DCNN, which is designed to predict missing entities. The filter parameters of DCNN are
generated by Parameter Generation Network (PGN) based on temporal information.

where 𝑔(𝜏) is the set of parameters associated with 𝑓 that is determined by the input timestamp 𝜏 .
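For concreteness, the following minimal PyTorch-style sketch shows the interface implied by Eq. (1); the class and attribute names (`SANe`, `pgn`, `dcnn`) are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

class SANe(nn.Module):
    """Sketch of Eq. (1): t = f(h, r; g(tau))."""
    def __init__(self, num_entities, num_relations, d, pgn, dcnn):
        super().__init__()
        self.ent = nn.Embedding(num_entities, d)   # entity embeddings (h and candidates t)
        self.rel = nn.Embedding(num_relations, d)  # relation embeddings r
        self.pgn = pgn    # g: timestamp -> parameters theta_f of the DCNN
        self.dcnn = dcnn  # f: (h, r; theta_f) -> representation of the missing tail

    def forward(self, h_idx, r_idx, tau):
        theta_f = self.pgn(tau)  # generate timestamp-specific parameters
        return self.dcnn(self.ent(h_idx), self.rel(r_idx), theta_f)
```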
The subsequent sections, Section 3.1, Section 3.2, and Section 3.3, delve into the architecture of the DCNN, and offer two distinct
approaches within the PGN: the Time-aware Parameter Generator (TaPG) and the Adaptive Parameter Generator (AdaPG), catering
to different TKGs flexibly. The loss function is discussed in detail in Section 3.4.

3.1. Dynamic convolutional neural network

Convolutional Neural Networks (CNNs) have proven effective in modeling static KGC tasks [28,30]. However, their potential
in the context of TKGC has yet to be fully explored. To this end, we introduce specialized parameters into the CNN architecture,
enhancing its capability to support TKGC by incorporating temporal information. The DCNN 𝑓 comprises multiple dynamic convolu-
tional layers and batch normalization, followed by a fully connected linear layer. The Dynamic Convolutional Layer (DCL) serves as
a crucial component of the DCNN architecture, responsible for identifying key features from the input with a filter. Unlike traditional
convolutional layers, the filter parameters of the DCL are not fixed but are instead dynamically generated from the PGN. The PGN
serves as a vast parameter pool that selects appropriate parameters for the DCLs based on the diverse temporal facts present in
the input data. Therefore, the DCLs are capable of capturing the temporal dependencies of the input data, which can be crucial in
predicting missing information in a TKG.
DCL employs padding and filtering operations on the input feature matrix $\mathbf{X} \in \mathbb{R}^{C_i \times H \times W}$ to generate the feature map $\mathbf{X}' \in \mathbb{R}^{C_o \times H \times W}$, where $H$ and $W$ correspond to the height and width dimensions. Specifically, the convolution operation between the input feature matrix and the filter $\omega_{\tau,p} \in \mathbb{R}^{C_o \times C_i \times k \times k}$ is followed by the Rectified Linear Unit (ReLU) non-linear activation function [38]. The DCL can be formulated as follows:

$$\mathbf{X}' = \mathsf{DCL}\left(\mathbf{X}; \theta_{\omega_{\tau,p}}\right) = \mathsf{ReLU}\left(\mathbf{X} \circledast \omega_{\tau,p}\right), \tag{2}$$

where $\circledast$ represents the convolution operator, $C_o$ and $C_i$ refer to the sizes of the output and input channels, and $k$ denotes the kernel size. The PGN generates the filter $\omega_{\tau,p}$ according to the position $p$ of the DCL within the DCNN and the corresponding timestamp $\tau$, with $\theta_{\omega_{\tau,p}}$ derived from the set of parameters $g(\tau, p)$ of the PGN.
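As a minimal sketch of a DCL, the snippet below assumes the PGN supplies a flat parameter vector that is reshaped into a filter of shape $(C_o, C_i, k, k)$; `torch.nn.functional.conv2d` accepts such an externally generated weight directly.

```python
import torch.nn.functional as F

def dynamic_conv_layer(X, flat_params, C_o, C_i, k):
    """Eq. (2): X' = ReLU(X * omega_{tau,p}) with a PGN-generated filter.

    X:           (B, C_i, H, W) input feature map (one shared timestamp per batch)
    flat_params: (C_o * C_i * k * k,) vector produced by the PGN for this layer
    """
    omega = flat_params.view(C_o, C_i, k, k)  # reshape the pooled parameters into a filter
    out = F.conv2d(X, omega, padding=k // 2)  # 'same' padding preserves H x W for odd k
    return F.relu(out)
```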
Multiple DCLs are combined in a stacked manner to process the entities and relations more efficiently. We begin by reshaping the entity $\mathbf{h}$ and the relation $\mathbf{r}$ into two-dimensional matrices, denoted as $\tilde{\mathbf{h}} \in \mathbb{R}^{H \times W}$ and $\tilde{\mathbf{r}} \in \mathbb{R}^{H \times W}$. To enhance the heterogeneous interactions between the vectors representing entity $\tilde{\mathbf{h}}$ and relation $\tilde{\mathbf{r}}$, we apply feature permutation and checkered reshaping operations to their concatenated matrix $\mathbf{X} \in \mathbb{R}^{2H \times W}$, drawing inspiration from previous research [30]. Feature permutation entails the random reordering of each constituent within the vectors $\tilde{\mathbf{h}}$ and $\tilde{\mathbf{r}}$, whereas checkered reshaping guarantees that the elements within the matrix $\mathbf{X}$ are positioned alternately, so that each pair of contiguous cells is occupied by elements from $\mathbf{h}$ and $\mathbf{r}$ in a cyclical manner. Following these operations, the input $\mathbf{X}$ is regularized and subsequently processed through $P$ DCLs with the objective of generating the feature map $\mathbf{M}$.
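The interleaving below is a simplified sketch of these two operations: the permutations are assumed to be sampled once at initialization, and the cell-by-cell alternation only approximates the exact checkered layout of [30].

```python
import torch

def build_input(h, r, H, W, perm_h, perm_r):
    """Interleave permuted entity/relation features into a 2H x W input grid.

    h, r:           (d,) embeddings with d = H * W
    perm_h, perm_r: fixed random permutations of range(d) (feature permutation)
    """
    h_p, r_p = h[perm_h], r[perm_r]
    x = torch.empty(2 * h.numel())
    x[0::2], x[1::2] = h_p, r_p  # alternate elements of h and r
    return x.view(2 * H, W)      # reshaped input X fed to the stacked DCLs
```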
A scoring function is used to predict the tail entity by evaluating the correlation score between a candidate tail entity $\mathbf{t} \in \mathbb{R}^d$ and the query $(h, r, ?, \tau)$:

$$\psi_\tau(h, r, t) = \langle \mathsf{Linear}(\mathsf{flatten}(\mathbf{M})), \mathbf{t} \rangle, \tag{3}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product of vectors, $\mathsf{Linear}(\cdot)$ represents a linear layer activated by the Rectified Linear Unit (ReLU) function, and $\mathsf{flatten}(\mathbf{M})$ refers to the operation of flattening the feature map $\mathbf{M}$ into a $d$-dimensional vector.
The sets of filters contained within the DCNN, represented by 𝜔𝜏,1 , ⋯ 𝜔𝜏,𝑃 , capture the delivery and variation of knowledge
across different temporal snapshots. Specifically, when considering timestamps of related events that are identical, adjacent, or


Fig. 3. Query prediction using DCNN. Different parameters are used to process queries with distinct timestamps, while some model parameters are partially shared
by queries with adjacent timestamps.

distant, the latent spaces generated by the DCNN should be equivalent, partially overlapping, or uncorrelated, respectively. The
extent of overlap between latent spaces across different timestamps determines the degree of shared knowledge between them. This
characteristic facilitates the reduction of interference from earlier snapshots and the incorporation of missing facts in subsequent
snapshots, resulting in accumulated knowledge. In essence, our SANe model is designed to store factual information in multiple
knowledge bases, where each knowledge base corresponds to a different time range and is represented by a distinct set of parameters.
Thus, the model is able to accurately “index” its knowledge by retrieving specific “records” from the appropriate set of parameters
based on the given timestamp. The next two sections will present two solutions that employ the parameter generation network in
order to preserve valid knowledge and forget mistaken information in a time-aware manner.

3.2. Time-aware parameter generator

When searching for records, individuals traditionally refine their search by sequentially breaking down timestamps into year,
month, and day components. This systematic approach aids in precisely locating a specific record or retrieving similar records in
proximity to the timestamp of interest.
Based on empirical evidence, the filter parameters $\omega_{\tau,1}$ of the first DCL in the DCNN play a crucial role in establishing a global “catalogue” of the year of $\tau$. This catalogue serves to encode high-level contextual features from an annual perspective. Subsequently, the second and third DCLs utilize the parameters $\omega_{\tau,2}$ and $\omega_{\tau,3}$, respectively, to provide additional information related to the month and day aspects. This refined encoding facilitates a more precise prediction of relevant facts in the temporal context.
In this part, we present the time-aware parameter generator (TaPG), which employs three sets of parameters associated with the
temporal aspects of “Year-Month-Day” to “store” knowledge. The timestamp $\tau$ is initially partitioned and embedded into a fixed-length sequence of embeddings $(\boldsymbol{\tau}_1, \boldsymbol{\tau}_2, \boldsymbol{\tau}_3)$, representing the year, month, and day, respectively, each in the space $\mathbb{R}^{d_\tau}$. To model the sequence data, we use a recurrent neural network (RNN), which generates multiple outputs as follows:

$$\{\mathbf{o}^\tau_1, \mathbf{o}^\tau_2, \mathbf{o}^\tau_3\} = \mathsf{RNN}(\{\boldsymbol{\tau}_1, \boldsymbol{\tau}_2, \boldsymbol{\tau}_3\}), \tag{4}$$
$$\mathbf{o}^\tau_i = \sigma(\mathbf{W}_o \mathbf{s}_i + \mathbf{b}_o), \tag{5}$$
$$\mathbf{s}_i = \sigma(\mathbf{U}_s \boldsymbol{\tau}_i + \mathbf{W}_s \mathbf{s}_{i-1} + \mathbf{b}_s), \tag{6}$$

where $\sigma$ denotes the nonlinear activation function, $\mathbf{s}_i$ and $\mathbf{o}^\tau_i$ indicate the hidden state and output at step $i$, respectively, and the RNN parameters include $\mathbf{W}_o$, $\mathbf{W}_s$, and $\mathbf{U}_s$. The output $\mathbf{o}^\tau_3$ corresponds to the contextual representation of the timestamp $\tau$.
We employ a collection of fully connected layers, denoted as $\{\mathsf{Linear}_1, \mathsf{Linear}_2, \mathsf{Linear}_3\}$, to transform the RNN outputs into a set of parameters represented as

$$g(\tau, p) = \omega_{\tau,p} = \mathsf{Linear}_p(\mathbf{o}^\tau_p), \tag{7}$$

where $\tau$ is a timestamp and $p \in \{1, 2, 3\}$ indexes the year, month, and day, respectively. These linear layers constitute a pool of parameters that are retrieved with respect to timestamps, thereby enabling contextualized parameter generation. The $d$-dimensional vectors generated by $\{\mathsf{Linear}_i\}$ are reshaped into tensors with dimensions $\mathbb{R}^{C_o \times C_i \times k \times k}$ to support convolutional operations. It is worth noting that the scale of the DCNN parameters is independent of the number of timestamps and is determined solely by the size of the linear layers $\{\mathsf{Linear}_i\}$.
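A sketch of TaPG under the assumptions stated here: year, month, and day indices are embedded separately, encoded with `nn.RNN` as in Eqs. (4)–(6), and each step's output is mapped by its own linear layer (Eq. (7)) to one DCL's flat filter parameters.

```python
import torch
import torch.nn as nn

class TaPG(nn.Module):
    """Eqs. (4)-(7): (year, month, day) -> three sets of filter parameters."""
    def __init__(self, n_years, d_tau, filter_sizes):
        super().__init__()
        # 12 months and 31 days are fixed; only the number of years varies
        self.emb = nn.ModuleList(
            [nn.Embedding(n, d_tau) for n in (n_years, 12, 31)])
        self.rnn = nn.RNN(d_tau, d_tau, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(d_tau, s) for s in filter_sizes])  # parameter pool

    def forward(self, year, month, day):
        # (B, 3, d_tau) sequence of year, month, day embeddings
        seq = torch.stack(
            [e(i) for e, i in zip(self.emb, (year, month, day))], dim=1)
        o, _ = self.rnn(seq)  # o: (B, 3, d_tau), the outputs o^tau_1..3
        # one flat parameter vector per DCL, later reshaped to (C_o, C_i, k, k)
        return [head(o[:, p]) for p, head in enumerate(self.heads)]
```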
Specifically, the parameters $\omega_{\tau,p}$ generated by Eq. (7) play a crucial role in establishing a global “catalogue” corresponding to
the specific aspect of the timestamp, providing essential contextual information for subsequent operations in the DCNN.
As illustrated in Fig. 3, the filters $\omega_{\tau,1}$ handle the facts within the same year, resulting in valid knowledge being shared during an interval across adjacent snapshots. In contrast to previous approaches [39,40] that explicitly construct sparse snapshots at each timestamp, our implicit approach facilitates knowledge delivery across different snapshots. Conversely, when dealing with facts whose timestamps are separated by a long gap, two distinct models are employed to avoid interference from earlier knowledge. Thus, the proposed TaPG
facilitates the efficient handling of time-variability and time-stability by DCNN. The parameters generated by TaPG create multiple
spaces that are specifically designed for different temporal snapshots. Additionally, the knowledge across these spaces can be shared
or separated based on the contextual information provided by TaPG.


Fig. 4. Time distribution of facts for four benchmarks TKG datasets, i.e., ICEWS14 [41], ICEWS05-15 [41], YAGO11k [31], and Wikidata12k [31].

Fig. 5. Illustration of the parameter generation methods. (a) Time-aware Parameter Generator (TaPG) with a year-month-day time division. (b) Adaptive Parameter
Generator (AdaPG) ensuring parameter load balancing based on fact distribution.

3.3. Adaptive parameter generator

While TaPG represents a straightforward yet practical module for providing a latent space for each snapshot, it encounters
challenges when handling TKGs with incomplete timestamps, specifically when month and day values are missing. In addition,
sometimes the time distribution of facts is unbalanced, i.e., plenty of facts occur mainly within certain periods of time. For example, Fig. 4 reports the number of facts over time in several TKGs, where YAGO11k and Wikidata12k exhibit a long tail effect in the time distribution of facts. TaPG generates parameters for each snapshot based on the strict division of “Year-Month-Day”, forcing a few parameters in $\{\mathsf{Linear}_i\}$ to learn most of the facts, which are concentrated in a small interval. Consequently, a significant number of
parameters are underutilized, representing an inefficiency in handling unbalanced temporal distributions. To address this limitation,
we are exploring enhancements to TaPG to accommodate TKGs with incomplete timestamps and mitigate the impact of unbalanced
time distributions, thereby maximizing the efficiency of parameter utilization.
In this part, we propose an enhanced Adaptive Parameter Generator (AdaPG) designed to simultaneously maintain load balancing
of parameters and ensure time-stability. Unlike TaPG, AdaPG organizes the timestamps in $\mathcal{T}$ based on the number of facts rather than physical time. Specifically, AdaPG employs a $(P+2)$-layer partition tree $PT$, where the leaf nodes represent the facts and non-leaf nodes represent time intervals. A fact with timestamp $\tau$ can be queried in terms of a path $(\gamma_{root}, \gamma_1, \gamma_2, \cdots, \gamma_P, \gamma_{leaf})$, where $\gamma_{root}$ and $\gamma_{leaf}$ represent the root and leaf nodes, respectively. AdaPG uses an RNN to encode the path and retrieves the parameters from $\{\mathsf{Linear}_i\}_{i=1}^{P}$,

$$\{\mathbf{o}^\tau_1, \mathbf{o}^\tau_2, \cdots, \mathbf{o}^\tau_P\} = \mathsf{RNN}(\{\boldsymbol{\gamma}_1, \boldsymbol{\gamma}_2, \cdots, \boldsymbol{\gamma}_P\}), \tag{8}$$
$$g(\tau, p) = \omega_{\tau,p} = \mathsf{Linear}_p(\mathbf{o}^\tau_p), \quad p \in \{1, \cdots, P\} \tag{9}$$

where $\boldsymbol{\gamma}_i \in \mathbb{R}^{d_\tau}$ is the embedding of the node $\gamma_i$, and $\mathbf{o}^\tau_P$ is the representation of the timestamp $\tau$. Compared to TaPG, this strategy can flexibly control the number of layers of the DCNN, i.e., $P$, by setting the height of $PT$.
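A compact sketch of Eqs. (8)–(9), assuming each timestamp has been pre-mapped to its root-to-leaf path of non-leaf node indices in $PT$:

```python
import torch.nn as nn

class AdaPG(nn.Module):
    """Encode a partition-tree path (gamma_1, ..., gamma_P) into P filter sets."""
    def __init__(self, num_nodes, d_tau, filter_sizes):
        super().__init__()
        self.node_emb = nn.Embedding(num_nodes, d_tau)  # one vector per non-leaf node
        self.rnn = nn.RNN(d_tau, d_tau, batch_first=True)
        self.heads = nn.ModuleList([nn.Linear(d_tau, s) for s in filter_sizes])

    def forward(self, path):                  # path: (B, P) node indices for tau
        o, _ = self.rnn(self.node_emb(path))  # (B, P, d_tau) outputs o^tau_1..P
        return [head(o[:, p]) for p, head in enumerate(self.heads)]
```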
Essentially, as shown in Fig. 5(a), the temporal division employed by TaPG aligns with that of $PT$, where non-leaf nodes are constructed based on the time ranges of specific years, months and days. To avoid a serious waste of parameters in $\{\mathsf{Linear}_i\}$, AdaPG requires that the non-leaf nodes of $PT$ at each layer manage similar numbers of facts, as illustrated in Fig. 5(b). This ensures a balanced traversal of all non-leaf nodes when processing all the facts. Specifically, let $D(\gamma) = \{(h, r, t, \tau)\}$ be the set of facts in the subtree rooted at $\gamma$, and $|\cdot|$ be the size of a set. A parent node $\gamma_p$ in the $k$-th layer splits into $m_k$ child nodes $\{\gamma_{p,i}\}_{i=1}^{m_k}$. The splitting rule is to search for an optimal division $\{D(\gamma_{p,i})\}_{i=1}^{m_k}$ by solving the following programming problem:

$$\min \sum_{i=1}^{m_k - 1} \mathsf{abs}\left(|D(\gamma_{p,i+1})| - |D(\gamma_{p,i})|\right)$$
$$\mathrm{s.t.} \quad \tau - \tau' < 0, \ \forall \tau \in D(\gamma_{p,i}), \ \forall \tau' \in D(\gamma_{p,j}) \ \mathrm{when} \ i < j, \tag{10}$$
$$D(\gamma_{p,i}) \cap D(\gamma_{p,j}) = \emptyset, \ \forall i \neq j,$$
$$\bigcup_{i=1}^{m_k} D(\gamma_{p,i}) = D(\gamma_p),$$

where $\mathsf{abs}(x) = |x|$ is the absolute value. The rule divides the set $D(\gamma_p)$, sorted by $\tau$, into $m_k$ disjoint subsets $\{D(\gamma_{p,i})\}_{i=1}^{m_k}$ in temporal order, where the subsets have roughly similar sizes. The first condition ensures that facts with the same timestamp are always assigned to the same child node. The rule also prioritizes the aggregation of facts with adjacent timestamps.
Given the numbers of child nodes $\{m_k\}_{k=1}^{P}$ and a TKG $\mathcal{G}$, the $(P+2)$-layer partition tree $PT$ is constructed in this paper by the following steps:

1. Construct the root node $\gamma_{root}$ and initialize $D(\gamma_{root}) = \mathcal{G}$, which contains all the facts in the TKG.
2. Set $\gamma_p = \gamma_{root}$ and $\ell = 1$.
3. If $\ell > P$, for each node $\gamma$ in the $(P+1)$-th layer, split it into $|D(\gamma)|$ leaf nodes, each associated with one fact in $D(\gamma)$. Return $PT$.
4. Otherwise, split the node $\gamma_p$ into $m_\ell$ child nodes $\{\gamma_{p,i}\}_{i=1}^{m_\ell}$ according to the solution of Eq. (10).
5. Set $\ell = \ell + 1$. For each $i \in \{1, \cdots, m_\ell\}$, set $\gamma_p = \gamma_{p,i}$ and repeat steps 3 and 4.

The construction of $PT$ can induce many special variants of SANe. For example, when $m_1, \cdots, m_P = 1$, SANe degrades to a common $P$-layer convolutional network with a fixed set of parameters.
There are $\binom{n}{k-1}$ kinds of divisions when splitting $n$ facts into $k$ subsets while solving Eq. (10). Therefore, the time complexity of selecting the optimal division is $O(n^{k-1})$, which imposes a high computational burden when $k$ is large. A simple heuristic method is applied to approximate the optimal solution of Eq. (10). Specifically, we use a $(g+2)$-layer $PT$ to process the $n$ facts with $m_1, \cdots, m_g = 2$, where $g = \mathsf{ceil}(\log_2 k)$. The $2^g$ non-leaf nodes at the last layer are returned as the approximate result. The time complexity then reduces to $O(n \log_2 k)$. Note that the number of subsets changes from $k$ to $2^g$. In the experiments, we select appropriate $m_1, \cdots, m_g$, e.g., 1, 2, and 3, to approach $k$.
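The following sketch illustrates the splitting rule in the spirit of Eq. (10): facts are sorted by timestamp and cut into $m$ contiguous subsets of near-equal size, never separating facts that share a timestamp. It is a greedy approximation for illustration, not the exact optimizer.

```python
from itertools import groupby

def balanced_split(facts, m):
    """Split (h, r, t, tau) facts into m contiguous, size-balanced subsets."""
    facts = sorted(facts, key=lambda q: q[3])  # order by timestamp tau
    # group facts by timestamp so each group is assigned as a whole
    groups = [list(g) for _, g in groupby(facts, key=lambda q: q[3])]
    target = len(facts) / m                    # ideal share per subset
    subsets, current = [], []
    for g in groups:
        current.extend(g)
        if len(subsets) < m - 1 and len(current) >= target:
            subsets.append(current)            # close this subset, start the next
            current = []
    subsets.append(current)
    return subsets
```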

3.4. Loss function

Two losses are used to train the model.


Task Loss. We employ the task loss of TKGC to train SANe for tail entity prediction. Specifically, the probability of a candidate tail entity $t$ answering the query $(h, r, ?, \tau)$ is defined as

$$p(h; r, t, \tau) = \sigma\left(\psi_\tau(h, r, t)\right), \tag{11}$$

where $\sigma(\cdot)$ is the sigmoid function. The objective of the training process is to minimize the negative log-likelihood loss function:

$$\mathcal{L}_t = -\frac{1}{|\mathcal{G}|} \sum_{(h,r,t,\tau) \in \mathcal{G}} \left( \log p(h; r, t, \tau) + \frac{1}{|S_t|} \sum_{\tilde{t} \in S_t} \log\left(1 - p(h; r, \tilde{t}, \tau)\right) \right), \tag{12}$$

where $S_t$ represents the set of negative samples obtained by replacing the tail entity of $(h, r, t, \tau)$ with $\tilde{t}$.
Consistency Loss. To ensure smooth representations of timestamps with respect to time, i.e., adjacent timestamps sharing similar representations, we generate a representation $\mathbf{o}^\tau$ for each timestamp $\tau \in \mathcal{T}$ using Eq. (8). Let $O$ denote the set $\{\mathbf{o}^\tau \mid \tau \in \mathcal{T}\}$, sorted by timestamp. We employ a consistency loss to smooth the representations of timestamps:

$$\mathcal{L}_c = \frac{1}{|O|} \sum_{i=1}^{|O|-1} \left\| O_{i+1} - O_i \right\|_p^p, \tag{13}$$

where $\|\cdot\|_p$ is the L-$p$ norm and $O_i$ is the $i$-th element of $O$.
The aforementioned constrained optimization problem is centered on the pursuit of temporal smoothness, with the objective of
ensuring a degree of similarity between adjacent time embeddings. It is important to recognize that this approach is a prevalent
feature in the literature of related works [17,16]. Nevertheless, it is critical to delineate a fundamental distinction between our
methodology and the conventions found in existing literature. The predominant approach in these works focuses on
modeling complete timestamps, whereas our framework necessitates the decomposition of timestamps as an integral component.


Table 3
A comparison of the space complexity and scoring functions of various TKGC methods.

Model | Space Complexity | Scoring Function
TransE [5] | $O(n_e d + n_r d)$ | $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$
TTransE [12] | $O(n_e d + n_r d + n_\tau d)$ | $\|\mathbf{h} + \mathbf{r} + \boldsymbol{\tau} - \mathbf{t}\|$
HyTE [31] | $O(n_e d + n_r d + n_\tau d)$ | $\|P_\tau(\mathbf{h}) + P_\tau(\mathbf{r}) - P_\tau(\mathbf{t})\|$
ATiSE [18] | $O(n_e d + n_r d)$ | $\mathrm{KL}(\mathbf{P}_{h,\tau} - \mathbf{P}_{t,\tau},\ \mathbf{P}_{r,\tau})$
TeRo [42] | $O(n_e d + n_r d)$ | $\|\mathbf{h}_\tau + \mathbf{r} - \mathbf{t}_\tau\|$
SANe | $O(n_e d + n_r d + n_y d_\tau)$ | $\langle \mathsf{Linear}(\mathsf{flatten}(\mathsf{CNN}(\mathbf{h}, \mathbf{r}))),\ \mathbf{t} \rangle$

Given the distinctive nature of our timestamp decomposition method, the conventional approach to constrained optimization
does not straightforwardly apply to our model. To overcome this challenge and adapt it to our temporal partitioning paradigm,
we introduce a consistency loss function, detailed in Equation (13). Furthermore, it is pertinent to underscore that our primary
emphasis in this research endeavor is oriented towards temporal knowledge graph completion (TKGC), rather than the constrained
optimization problem. Therefore, we introduced a hyperparameter, denoted as 𝜆, to modulate the weight of the consistency loss.
Notably, our choice of a relatively modest 𝜆 is a conscious decision intended to accentuate the primacy of the TKGC task during
model training, while not overshadowing the overarching focus of constrained optimization.
The overall loss function to train SANe is then

$$\mathcal{L} = \mathcal{L}_t + \lambda \mathcal{L}_c, \tag{14}$$

where $\lambda$ denotes a hyper-parameter that controls the relative importance of the two losses.
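A sketch of the full objective, Eqs. (11)–(14); `pos_scores` and `neg_scores` are assumed to hold the $\psi_\tau$ values of true quadruples and their sampled corruptions, and `O` stacks the timestamp representations sorted by time.

```python
import torch.nn.functional as F

def sane_loss(pos_scores, neg_scores, O, lam, p=2):
    """L = L_t + lambda * L_c (Eq. (14)).

    pos_scores: (B,)   psi_tau(h, r, t) for true quadruples
    neg_scores: (B, N) scores of N sampled negative tails per query
    O:          (|T|, d_tau) timestamp representations sorted by timestamp
    """
    # Eq. (12); logsigmoid(-x) = log(1 - sigmoid(x)), for numerical stability
    l_t = -(F.logsigmoid(pos_scores).mean()
            + F.logsigmoid(-neg_scores).mean(dim=1).mean())
    # Eq. (13): penalize differences between adjacent timestamp representations
    # (mean over adjacent pairs; differs from the 1/|O| factor only by a constant)
    l_c = (O[1:] - O[:-1]).norm(p=p, dim=-1).pow(p).mean()
    return l_t + lam * l_c
```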

3.5. Complexity analysis

Table 3 provides an overview of scoring functions and space complexity for various TKGC techniques, including SANe. The
notation 𝑛𝑒 , 𝑛𝑟 , and 𝑛𝜏 denotes the number of entities, relations, and timestamps, respectively. The dimensions of feature vectors
are denoted by 𝑑 and 𝑑𝜏 , while 𝑛𝑦 denotes the number of years. Additionally, the CNN in DCNN corresponds to a three-layer
convolutional neural network. It is noteworthy that the space complexity of our proposed SANe is comparable to that of other
established methods.
In regard to the intricacies of PGN, a novel component introduced within our model to manage temporal variability and stability,
an in-depth analysis reveals that its inherent complexity is notably less pronounced in comparison to methods that do not decom-
pose timestamps. To illustrate this point, we focus on the TaPG within the PGN framework. Consider a TKG spanning a century,
encompassing approximately 36,500 unique timestamps, corresponding to each day over the span of 100 years. In typical TKGC
approaches, the requirement to generate an embedding vector for each of these timestamps results in the storage of a voluminous number of values: 3,650,000 if each vector is represented in a 100-dimensional space.
In contrast, the TaPG mechanism partitions timestamps into year-month-day units, significantly reducing the volume of necessary time embeddings. Specifically, this method requires only 143 time vectors: 100 for years, 12 for months, and 31 for days, each potentially in a 100-dimensional space. Consequently, the total number of numerical values required to store timestamp embeddings diminishes dramatically (to 14,300 in this example) compared to traditional approaches. Moreover, the partitioning
rules for timestamps are conceptually simple and user-friendly.
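Worked through in a few lines (the 100-dimensional size follows the example above, not a measured configuration):

```python
d_tau = 100
per_timestamp = 36_500 * d_tau        # one vector per day over 100 years: 3,650,000 values
decomposed = (100 + 12 + 31) * d_tau  # TaPG: 143 vectors -> 14,300 values
print(per_timestamp / decomposed)     # ~255x fewer stored values
```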
In essence, the PGN framework’s timestamp decomposition approach embodies a form of distributed representation, rendering
the complexity of timestamp handling considerably more manageable when contrasted with established methods that undertake
direct mapping of each timestamp to an embedding vector. In this context, the complexity of timestamp management is contingent
solely upon the number of distinct years, while the number of months and days remains constant at 12 and 31, respectively. This
simplification not only aligns with user-friendly considerations but also exhibits inherent simplicity in its structure.

4. Experiments

In this section, we present a detailed evaluation of our proposed TKGC model on five benchmark datasets. First, we detail the
experimental setup. Second, we analyze and discuss the experimental results. Furthermore, we conduct ablation studies to determine
the significance of the various components in our proposed model. Two versions of the SANe model, specifically SANe and SANe+,
incorporating TaPG and AdaPG respectively, are utilized in the experiments.

4.1. Experimental setup

Datasets. The model is evaluated on five publicly available benchmark datasets: ICEWS14 [41], ICEWS05-15 [41], YAGO11k
[31], Wikidata12k [31], and GDELT [43]. The ICEWS datasets consist of socio-political events with time annotations from the
Integrated Crisis Early Warning System dataset [9], where ICEWS14 contains events from 2014 and ICEWS05-15 contains events from
2005 to 2015. YAGO11k and Wikidata12k are subsets of YAGO3 [10] and Wikidata [11] datasets, respectively. Time annotations in


Table 4
Statistics of the benchmark datasets used for evaluating TKGC methods. The time span unit is years.

Datasets | ICEWS14 | ICEWS05-15 | YAGO11k | Wikidata12k | GDELT
#Entities | 6,869 | 10,094 | 10,623 | 12,554 | 500
#Relations | 230 | 251 | 10 | 24 | 20
Time span | 2014 | 2005-2015 | -453-2844 | 1709-2018 | 2015-2016
#Train | 72,826 | 368,962 | 16,408 | 32,497 | 2,735,685
#Valid | 8,941 | 46,275 | 2,050 | 4,062 | 341,961
#Test | 8,963 | 46,092 | 2,051 | 4,062 | 341,961

YAGO11k and Wikidata12k are time intervals that are discretized into quadruplets with year-level granularity, while month and day
information is discarded as suggested by Dasgupta et al. [31]. To handle these datasets, fabricated timestamps with constant values
for months and days are used. GDELT (Global Database of Events, Language, and Tone) is a comprehensive temporal knowledge
graph, encompassing records of human behavior starting from 1979, with a focus on events occurring between April 1, 2015, and
March 31, 2016. Table 4 provides a summary of the statistics for these five benchmark datasets.
Baselines. In the experiment, we compare two versions of the proposed models with several baselines, namely SANe equipped
with TaPG and SANe+ equipped with AdaPG. The baselines include both static and temporal KGC models. The former category
comprises TransE [5], DistMult [6], RotatE [7], ComplEx-N3 [44], and QuatE2 [8]. The latter category consists of TTransE [12],
TA-TransE [41], TA-DistMult [41], HyTE [31], DE-SimplE [19], ATiSE [18], TeRo [42], TimePlex [45], TComplEx [13], ChronoR
[17], TeLM [16], BoxTE [14], and RoAN-DED [37]. It is worth noting that DE-SimplE, ChronoR, BoxTE, and RoAN-DED are excluded
from the comparison with SANe and SANe+ on the YAGO11k and Wikidata12k datasets, as their results are not available in the
original research papers.
Evaluation Metrics. For each quadruple (ℎ, 𝑟, 𝑡, 𝜏), two queries are constructed to train and test the model, i.e., (ℎ, 𝑟, ?, 𝜏) and
$(?, r, t, \tau)$. In practice, the query $(?, r, t, \tau)$ is replaced by a reciprocal query $(t, r^{-1}, ?, \tau)$ for convenience of training, as most works do [45,16]. The Mean Reciprocal Rank (MRR) and Hits@N are employed as evaluation metrics. MRR is calculated
as the mean of the reciprocals of all rank values for each query. Meanwhile, Hits@N evaluates the accuracy of the top-N retrieved
entities, which is defined as the proportion of queries where the correct entity candidate is ranked among the top-N in the list
of retrieved entities. MRR is deemed to be a more robust evaluation metric as compared to Hits@N, since it is less vulnerable to
the influence of outliers and is thus able to more accurately reflect the performance of the model [41]. Improved accuracy in the
prediction of factual information is indicated by higher values of both MRR and Hits@N. It is pertinent to note that all evaluations
have been conducted under the time-wise filtering setting, which has been extensively employed in prior research studies [18,42].
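Given the 1-based filtered rank of the correct entity for each query (after removing the other true answers of $(h, r, ?, \tau)$ under time-wise filtering), both metrics reduce to a few lines; the sketch below assumes the ranks are precomputed.

```python
import numpy as np

def mrr_hits(ranks, ns=(1, 3, 10)):
    """Compute MRR and Hits@N from 1-based filtered ranks."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float((1.0 / ranks).mean())                   # mean reciprocal rank
    hits = {n: float((ranks <= n).mean()) for n in ns}  # fraction ranked in top-N
    return mrr, hits
```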
Implementation details. The model is implemented using PyTorch, a deep learning framework, on a single NVIDIA GeForce
RTX 3090 GPU. To select the optimal hyperparameters for the model, we conduct a grid search and tune the hyperparameters based
on the MRR performance on the validation set. Xavier initialization [46] is utilized to initialize the model parameters, and the Adam
optimizer [47] is adopted with a learning rate of 0.001 for optimization. Each epoch comprises 256 mini-batches, and a negative
sampling ratio of 1000 is used for generating negative samples in the training set. For ICEWS05-15, the dimension is set to 𝑑 = 300,
while for other datasets, 𝑑 = 200. The kernel size is selected from 𝑘 ∈ {3, 5, 7} and the convolution filters are set to 64. To maintain
acceptable time complexity, the number of nodes 𝛾𝑖 at each level of the partition tree is set to a multiple of 1, 2, 3, except for 𝛾𝑟𝑜𝑜𝑡 and
𝛾𝑙𝑒𝑎𝑓 . To determine the value of 𝜆, we conducted experiments with different values from {5e − 4, 2e − 4, 1e − 4, 5e − 5, 1e − 5}. For
ICEWS14 and ICEWS05-15 datasets, 𝜆 = 5e − 5, and for YAGO11k and Wikidata12k, 𝜆 = 1e − 5. In the future, we plan to investigate
how the structure of partition trees affects task performance.

4.2. Main results

The results of the models on the ICEWS datasets, which include ICEWS14 and ICEWS05-15, are shown in Table 5. Additionally, a
set of observations and analyses are itemized herein. (1) Most Temporal Knowledge Graph Completion (TKGC) models significantly
outperform static Knowledge Graph Completion (KGC) models. TKGC models employ temporal information to restrict the similarity
between facts, thereby effectively distinguishing similar facts with distinct timestamps. (2) SANe and SANe+ models exhibit optimal
performance across all metrics for link prediction. This observation indicates that adapting snapshots with different timestamps to
distinct latent spaces is a productive strategy. The facts are assigned implicitly to distinct Convolutional Neural Network (CNN)
modules, whereby each snapshot at varying timestamps is processed based on a specific latent space. The outcomes indicate that
parameter generation represents a practical approach to mitigate mutual interference of knowledge across snapshots that possess
differing timestamps. (3) Facts in ICEWS represent transient events that usually occur and conclude within a short period. Unlike
other TKGC techniques, SANe can recall and deduce instantaneous facts by restoring the CNN model from the parameter pool based
on timestamps. The outcomes presented in Table 5 provide additional validation that SANe is more effective in enabling the intrinsic
time-variability that exists within TKGs.
Table 6 presents the results for YAGO11k and Wikidata12k datasets, which are based on Wikipedia. The SANe approach exhibits a
considerably larger improvement over the prior methods than on the ICEWS datasets. Notably, with regard to the main metric for
the TKGC task, namely MRR, SANe outperforms the state-of-the-art approaches by a remarkable margin of 6% and 29% for YAGO11k
and Wikidata12k, respectively. The datasets derived from Wikipedia contain factual information spanning multiple centuries, and


Table 5
Link prediction results on ICEWS14 and ICEWS05-15 for various methods. Results marked with ∗, †, and ⋄ are
taken from the studies by [41], [42], and [16], respectively. Dashes indicate unobtainable results, and all other
results are from the original papers.

Datasets ICEWS14 ICEWS05-15
Metrics Hits@10 Hits@3 Hits@1 MRR Hits@10 Hits@3 Hits@1 MRR

TransE [5] .637 – .094 .280 .663 – .090 .294
DistMult∗ [6] .672 – .323 .439 .691 – .337 .456
RotatE† [7] .690 .478 .291 .418 .595 .355 .164 .304
ComplEx-N3† [44] .716 .527 .347 .467 .729 .535 .362 .481
QuatE2† [8] .712 .530 .353 .471 .727 .529 .370 .482

TTransE† [12] .601 – .074 .255 .616 – .084 .271
TA-TransE∗ [41] .625 – .095 .275 .668 – .096 .299
TA-DistMult∗ [41] .686 – .363 .477 .728 – .346 .474
HyTE† [31] .655 .416 .108 .297 .681 .445 .116 .316
DE-SimplE† [19] .725 .592 .418 .526 .748 .578 .392 .513
ATiSE [18] .757 .632 .423 .545 .803 .623 .394 .533
TeRo [42] .732 .621 .468 .562 .795 .668 .469 .586
TimePlex [45] .771 – .515 .604 .818 – .545 .640
TComplEx⋄ [13] .770 .660 .530 .610 .800 .710 .590 .660
ChronoR [17] .773 .669 .547 .625 .820 .723 .596 .675
TeLM⋄ [16] .774 .673 .545 .625 .823 .728 .599 .678
BoxTE [14] .763 .664 .528 .613 .820 .719 .582 .667
RoAN-DED [37] .774 .644 .457 .569 .464 .289 .169 .268

SANe .782 .688 .558 .638 .823 .734 .605 .683
SANe+ .788 .696 .563 .643 .826 .737 .608 .686

Table 6
Comparison of link prediction results on YAGO11k and Wikidata12k across several TKGC methods. Results marked
with ∗, †, and ⋄ are taken from [18], [42], and [16], respectively. Dashes indicate unobtainable results.

Datasets YAGO11k Wikidata12k
Metrics Hits@10 Hits@3 Hits@1 MRR Hits@10 Hits@3 Hits@1 MRR

TransE∗ [5] .244 .138 .015 .100 .339 .192 .100 .178
DistMult∗ [6] .268 .161 .107 .158 .460 .238 .119 .222
RotatE [7] .305 .167 .103 .167 .461 .236 .116 .221
ComplEx-N3∗ [44] .282 .154 .106 .167 .436 .253 .123 .233
QuatE2 [8] .270 .148 .107 .164 .416 .243 .125 .230

TTransE† [12] .251 .150 .020 .108 .329 .184 .096 .172
TA-TransE† [41] .326 .160 .027 .127 .429 .267 .030 .178
TA-DistMult† [41] .292 .171 .103 .161 .447 .232 .122 .218
HyTE† [31] .272 .143 .015 .105 .333 .197 .098 .180
ATiSE [18] .301 .189 .126 .185 .462 .288 .148 .252
TeRo† [42] .319 .197 .121 .187 .507 .329 .198 .299
TimePlex [45] .367 – .169 .236 .532 – .228 .334
TComplEx⋄ [13] .307 .183 .127 .185 .539 .357 .233 .331
TeLM⋄ [16] .321 .194 .129 .191 .542 .360 .231 .332

SANe .401 .266 .180 .250 .640 .483 .331 .432
SANe+ .402 .292 .210 .274 .629 .490 .341 .437

Furthermore, the longevity of factual information in Wikipedia-based datasets differs markedly from the fleeting nature of events captured by ICEWS. The outstanding performance of SANe underscores the value of generating the parameters that yield multiple latent spaces in a systematic manner, restricting the extent of knowledge sharing according to the temporal distance between timestamps. Several parameter sets are utilized to capture the contextual information of timestamps, which allows knowledge to be aggregated across adjacent snapshots; as a result, valid knowledge over a particular period can be efficiently preserved and shared. Models that employ independent representations [12,17,45,13,16] or incorporate timestamps into entities and relations [31,41,19,18,42,14] encounter interference issues across multiple snapshots, particularly when knowledge persists for a prolonged period. This is primarily because these models treat all the available facts uniformly in a single latent space, at the risk of misremembering or forgetting important knowledge. Furthermore, the findings presented in Table 6 affirm that SANe is more proficient at preserving the time-stability inherent in TKGs.
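For reference, the metrics reported in these tables can be computed from a batch of model scores as in the minimal sketch below; the tensor names are illustrative, and the masking of other known true triples (the filtered evaluation setting) is omitted for brevity.

```python
import torch

def rank_metrics(scores: torch.Tensor, target: torch.Tensor) -> dict:
    """scores: (batch, num_entities) model scores; target: (batch,) gold entity ids."""
    # Rank of each gold entity: 1 + number of candidates scored strictly higher.
    gold = scores.gather(1, target.unsqueeze(1))
    ranks = 1 + (scores > gold).sum(dim=1).float()
    return {
        "MRR": (1.0 / ranks).mean().item(),
        "Hits@1": (ranks <= 1).float().mean().item(),
        "Hits@3": (ranks <= 3).float().mean().item(),
        "Hits@10": (ranks <= 10).float().mean().item(),
    }
```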
Table 7 presents the experimental results on the GDELT dataset, offering a comprehensive view of the model’s performance in a diverse and complex TKG context. Notably, our proposed models, SANe and SANe+, demonstrate their capacity to excel in the face of the intricate dynamics presented by the GDELT dataset.


Table 7
Comparison of link prediction results on GDELT across several TKGC methods.

Metrics               Hits@10  Hits@3  Hits@1  MRR

TransE [5]            .312     .158    .000    .113
DistMult [6]          .348     .208    .117    .196

TTransE [12]          .318     .160    .000    .115
HyTE [31]             .326     .165    .000    .118
TA-DistMult [41]      .365     .219    .124    .206
DE-SimplE [19]        .403     .248    .141    .230

SANe                  .476     .326    .212    .301
SANe+                 .480     .330    .214    .304

These models surpass the baseline approaches, achieving state-of-the-art results. The performance gains are especially significant when considering the challenges posed by the GDELT dataset.
The GDELT dataset is notably denser than the ICEWS datasets, housing approximately 2.7 million training facts, 500 entities, and
20 relations. Moreover, it exhibits a pronounced temporal intricacy characterized by facts that persist across multiple consecutive
timestamps, intermingled with transient and sparse events. This temporal diversity underscores the dataset’s complexity, necessitating
advanced reasoning capabilities. The exceptional performance exhibited by SANe and SANe+ on the GDELT dataset is a testament
to the model’s adaptability and its ability to effectively capture temporal patterns.
SANe+ achieves the best results across all five TKGC benchmark datasets. Moreover, it improves significantly over SANe on the YAGO11k and Wikidata12k datasets, which exhibit the typical long-tail effect in the time distribution of facts, as shown in Fig. 4. In the experiments, TaPG and AdaPG manage the timestamps using partition trees 𝑃𝑇 with the same structure, except for the number of facts associated with the leaves. The results suggest that SANe+ is capable of handling the long-tail effect by optimizing the manner of parameter generation: AdaPG ensures a linear correlation between the number of facts and the number of parameters, and thus keeps the parameter load balanced.
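The following is a minimal sketch of the load-balancing idea behind AdaPG, under the assumption that the partition tree is grown by splitting a time range at the median of its cumulative fact count until each leaf covers at most a fixed number of facts; the `capacity` threshold and the helper names are illustrative, not the paper's exact construction.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionNode:
    timestamps: list                      # contiguous, chronologically sorted
    children: list = field(default_factory=list)

def build_partition_tree(timestamps, fact_counts, capacity):
    """timestamps: sorted list; fact_counts: mapping timestamp -> number of facts."""
    node = PartitionNode(timestamps)
    total = sum(fact_counts[t] for t in timestamps)
    if total <= capacity or len(timestamps) == 1:
        return node                       # leaf: one dedicated parameter set
    # Split at the median of the cumulative fact count so that both halves
    # hold a comparable number of facts (load balancing).
    acc, split = 0, 1
    for i, t in enumerate(timestamps):
        acc += fact_counts[t]
        if acc >= total / 2:
            split = max(1, i)
            break
    node.children = [
        build_partition_tree(timestamps[:split], fact_counts, capacity),
        build_partition_tree(timestamps[split:], fact_counts, capacity),
    ]
    return node

# e.g. daily timestamps of ICEWS14 with per-day fact counts (illustrative):
# tree = build_partition_tree(list(range(365)), counts, capacity=5000)
```

Because every leaf is capped at roughly `capacity` facts, dense periods receive proportionally more leaves, and hence more parameter sets, than sparse ones.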
Additionally, it is worth highlighting that the performance improvements exhibited by SANe and SANe+ on the YAGO11k and
Wikidata12k datasets surpass those on the ICEWS datasets. This observation underscores the enhanced utility of SANe and SANe+
in the face of the unique challenges posed by sparse and imbalanced temporal datasets.

4.3. Ablation study

To verify the contributions of the modules involved in SANe and SANe+, five variant models, V1 to V5, are investigated on the ICEWS14 dataset. The variant V1 ignores the time information of facts and predicts the tail entity 𝐭 with a fixed set of parameters Θ, i.e., 𝐭 = 𝑓(𝐡, 𝐫; Θ). Following the work [48], the variant V2 introduces time-dependent entity embeddings by employing the time-wise information, i.e., 𝐭 = 𝑓(𝐡 ⊙ 𝝉, 𝐫; Θ), where ⊙ is the Hadamard product and 𝝉 ∈ ℝ𝑑 is a 𝑑-dimensional embedding of the timestamp 𝜏. To verify the effectiveness of the contextual representation of 𝜏, the variant V3 replaces 𝝉 with 𝐨𝜏3 from Eq. (4), i.e., 𝐭 = 𝑓(𝐡 ⊙ 𝐨𝜏3, 𝐫; Θ). In addition, we verify the necessity of a principled parameter generation network with the variant V4, which employs three linear layers {𝖫𝗂𝗇𝖾𝖺𝗋𝑖}3𝑖=1 as the PGN and projects 𝝉 directly into the parameters of the DCNN, i.e., 𝐭 = 𝑓(𝐡, 𝐫; {𝖫𝗂𝗇𝖾𝖺𝗋𝑖(𝝉)}3𝑖=1). To validate the effectiveness of the consistency loss, V5 applies the same configuration as SANe+ but omits the consistency loss.
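To make the variants concrete, the hedged sketch below contrasts V1, V2, and V4, assuming a ConvE-style scorer whose convolution kernel is either a fixed parameter or generated from the timestamp embedding by a linear layer; the shapes are illustrative and only one of the three generated layers is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, c = 200, 3, 32                      # embedding dim, kernel size, channels

tau_emb = nn.Embedding(366, d)            # one embedding per timestamp
fixed_kernel = nn.Parameter(torch.randn(c, 1, k, k))    # shared Theta (V1/V2)
pgn = nn.Linear(d, c * 1 * k * k)                        # V4: linear PGN

def score(h, r, kernel):
    # Stack head and relation embeddings into a 2D "image", ConvE-style.
    x = torch.cat([h, r], dim=-1).view(1, 1, 2, d)
    x = F.relu(F.conv2d(x, kernel, padding=k // 2))
    return x.flatten()                    # projected to entity scores in a full model

h, r, tau = torch.randn(d), torch.randn(d), torch.tensor(42)

v1 = score(h, r, fixed_kernel)                               # no time information
v2 = score(h * tau_emb(tau), r, fixed_kernel)                # Hadamard time embedding
v4 = score(h, r, pgn(tau_emb(tau)).view(c, 1, k, k))         # generated parameters
```

In the full model, the PGN instances TaPG and AdaPG replace the plain linear projection, producing the DCNN parameters from the contextual representation of 𝜏 rather than from 𝝉 alone.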
The outcomes of the ablation analysis are presented in Table 8, where the symbol ✓ signifies the presence of a particular component and ✗ indicates its absence. V1 clearly achieves the worst results, confirming the importance of time information for the TKGC task. The performance of V2 far exceeds that of V1, and even surpasses many baselines (e.g., TimePlex and DE-SimplE) in Table 5. V2 processes all the facts with the DCNN alone, without any extra module or complicated translation, verifying that TKGC benefits from the strong expressiveness of the convolutional network. V3 likewise involves only the DCNN but introduces the contextual representation 𝐨𝜏3 of timestamps to boost performance. Its slight improvement over V2 shows that 𝐨𝜏3 encodes meaningful time information and is capable of describing the physical relationship between timestamps. Together, V2, V3 and V4 show that time information, time contextual representations and the PGN all contribute to predicting the tail entities. Comparing V4 with V2 and V3, the PGN is clearly the dominant factor, even though V4 equips only simple linear layers as its PGN. This is mainly because the parameter generation strategy avoids interference between time-dependent facts by separating snapshots into different networks. Compared to V4, SANe and SANe+ additionally enable the time-stability of TKGs by producing parameters based on time contextual representations, and thus obtain the best results; TaPG and AdaPG contribute to effective knowledge management according to time divisions. Finally, owing to the consistency loss, SANe+ achieves superior performance compared to V5, demonstrating the effectiveness of this loss.


Table 8
Comparison of results for different variations of the model on ICEWS14.

Variant Models   Time Information   Time Contextual Representation   PGN      Consistency Loss   Hits@10  Hits@3  Hits@1  MRR

V1               ✗                  ✗                                ✗        ✗                  .703     .529    .350    .469
V2               ✓                  ✗                                ✗        ✗                  .760     .656    .527    .608
V3               ✓                  ✓                                ✗        ✗                  .778     .679    .536    .622
V4               ✓                  ✗                                Linear   ✗                  .780     .683    .548    .630
SANe             ✓                  ✓                                TaPG     ✗                  .782     .688    .558    .638
V5               ✓                  ✓                                AdaPG    ✗                  .786     .692    .561    .641

SANe+            ✓                  ✓                                AdaPG    ✓                  .788     .696    .563    .643

Table 9
Evaluation of generalization performance on ICEWS14 dataset for queries with unseen timestamps.

Metrics               Hits@10  Hits@3  Hits@1  MRR

DistMult [6]          .620     .462    .302    .410
DE-SimplE [19]        .624     .492    .333    .434
TComplEx [13]         .625     .492    .348    .443

SANe                  .709     .569    .394    .503
SANe+                 .718     .600    .435    .535

4.4. Analysis

Generalizing to Unseen Timestamps. One of the key strengths of our proposed SANe model lies in its ability to generate contextual representations of timestamps, which enables the model to answer queries involving unseen timestamps that do not appear in the training data. This feature is particularly critical in temporal knowledge graph completion, where a model must be robust enough to handle new, unseen data points.
Effective training typically requires a large dataset, yet constraints on time and computational resources often restrict training to a subset of the available data. In such scenarios, it is essential to evaluate the model on time points it has not encountered during training; to this end, we partitioned the dataset in a manner that allows such an assessment.
Consistent with the methodology adopted by Goel et al. [19], we re-partitioned the ICEWS14 dataset by selectively removing quadruplets corresponding to the 5th, 15th, and 25th days of each month from the training set. The removed quadruplets were then utilized to construct validation and test sets through a random selection process. This approach effectively simulates a scenario where the model encounters new time points in future applications, thereby providing a robust assessment of its generalization capabilities.
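A small sketch of this re-partitioning protocol, assuming the quadruples are stored as (head, relation, tail, 'YYYY-MM-DD') tuples; the helper name and the even validation/test split are illustrative.

```python
import random

def split_unseen_timestamps(quadruples, seed=0):
    """Withhold facts dated on the 5th, 15th and 25th of each month."""
    held_out_days = {5, 15, 25}
    train, held_out = [], []
    for q in quadruples:
        day = int(q[3].split("-")[2])            # 'YYYY-MM-DD' -> day of month
        (held_out if day in held_out_days else train).append(q)
    random.Random(seed).shuffle(held_out)        # random but reproducible selection
    half = len(held_out) // 2
    return train, held_out[:half], held_out[half:]   # train / valid / test
```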
The findings from this experiment, as presented in Table 9, clearly demonstrate the superior performance of the SANe model
when compared to the TComplEx model [13]. Specifically, SANe outperforms TComplEx by approximately 14% in terms of mean
reciprocal rank (MRR), highlighting the model’s effectiveness in handling unseen timestamps.
Furthermore, the enhanced version of our model, SANe+, achieves a substantial improvement in MRR, outperforming TComplEx
by almost 21%. This significant performance boost serves as a testament to the efficacy of our model in generalizing to previously
unseen timestamps. The improvements introduced in SANe+ contribute to its robustness and adaptability, making it a promising
candidate for a wide range of temporal knowledge graph completion tasks.
Performance Evaluation on Various Relations. The annotations in YAGO11k are predominantly temporal intervals, reflecting the temporal nature of entities and relations in real-world knowledge graphs. In practice, relations between entities often exhibit temporal dynamics and can change over time, leading to evolving or outdated knowledge in the graph. This temporal variation poses a significant challenge for TKGC, where the objective is to predict the likely relations between entities, and demands models that can effectively capture and reason about the temporal dependencies in the data.
Our evaluation of SANe is conducted on several relations (i.e., worksAt, hasWonPrize, graduatedFrom, and isAffiliatedTo) within the YAGO11k dataset. Furthermore, we reproduce the results of ATiSE and TimePlex, employing the respective hyperparameters provided by each method. These relations are typically established or dissolved between specific entities at particular points in time, and
subsequently maintained for a specified duration [18]. As an illustration, an individual may transition to a different company after
being employed by one company for a few months. In this context, SANe is anticipated to exhibit favorable performance. The results
depicted in Fig. 6 provide a comprehensive performance comparison among SANe and two baseline models across four distinct
relations within the YAGO11k dataset. The evaluation encompasses various metrics, revealing that SANe consistently outperforms
the baseline models, particularly demonstrating notable superiority in terms of MRR. Notably, SANe excels in capturing the temporal
dynamics of the worksAt relation, exhibiting exceptional performance in this specific context.

Fig. 6. Results from several relations in YAGO11k achieved by ATiSE [18], TimePlex [45], and SANe.

The process of adapting different temporal snapshots to distinct latent spaces through parameter generation has an advantage in effectively capturing the time-variability of knowledge, and our results verify this hypothesis. The incorporation of contextual representations of timestamps within overlapping latent spaces facilitates the sharing of knowledge between adjacent snapshots, which proves advantageous in effectively modeling the time-stability of knowledge.
Visualization of Contextual Representations of Timestamps. Most existing methods for TKGC directly utilize embedding
representations 𝝉 to capture temporal information. However, we propose a novel approach that leverages contextual representations
𝐨𝜏3 to encode physical relationships between timestamps in a more meaningful way. In particular, Fig. 7 provides a t-SNE [49]
visualization of timestamp representations learned by our model, SANe, and a variant, V4, which only considers parameters in terms
of 𝝉 as discussed in Section 4.3.
In Fig. 7(a), we visualize representations of timestamps without context, as learned by the V4 variant; the distribution of points lacks a discernible pattern. In contrast, Fig. 7(b) visualizes representations obtained by decomposing timestamps into context and processing them through an RNN. Notably, even without employing timestamp smoothing techniques, the points in Fig. 7(b) intuitively reveal distinct clusters in chronological order. This outcome underscores the effectiveness of our approach in preserving time series information and providing meaningful geometric interpretations for temporal embeddings, contributing to the improved performance of the model. The clarity of temporal relationships in Fig. 7(b) is particularly evident, emphasizing the advantage of incorporating contextual information in timestamp representations.
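As an illustration, the sketch below shows one way a contextual representation of a timestamp might be produced by feeding its (year, month, day) tokens through an RNN and then inspected with t-SNE, as in Fig. 7; the GRU, the shared token vocabulary, and the index offsets are assumptions made for this sketch, not the paper's exact encoder.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

d = 64
# Shared vocabulary: ids 0-2099 for years, 2100-2111 for months, 2112-2142 for days.
embed = nn.Embedding(2100 + 12 + 31, d)
rnn = nn.GRU(d, d, batch_first=True)

def contextual_repr(year, month, day):
    # Offset month/day tokens past the year range so ids do not collide.
    tokens = torch.tensor([[year, 2100 + (month - 1), 2112 + (day - 1)]])
    out, _ = rnn(embed(tokens))           # out: (1, 3, d)
    return out[0, -1]                     # state after the day token, akin to o_tau3

reps = torch.stack([contextual_repr(2014, m, day)
                    for m in range(1, 13) for day in (1, 10, 20)])
xy = TSNE(n_components=2, perplexity=5).fit_transform(reps.detach().numpy())
```

Because adjacent dates share their year and month prefixes, their recurrent states start from the same context, which is one plausible reason the learned representations cluster chronologically.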
Effectiveness of AdaPG in Handling Unbalanced Time Distribution. We conduct additional experiments to investigate the
effectiveness of AdaPG in handling unbalanced time distribution within the ICEWS14 dataset. We compare the performance of two
models, SANe with TaPG and SANe+ with AdaPG, on the 365 timestamps present in the dataset.
Fig. 8 illustrates the MRR performance of SANe and SANe+ on the 365 timestamps of the ICEWS14 dataset. As depicted in the graph, SANe+ consistently outperforms SANe across most timestamps, demonstrating its superior stability and effectiveness in handling unbalanced time distribution data. Specifically, SANe+ achieves better performance than SANe at 212 of the 365 timestamps, indicating that AdaPG effectively addresses the challenges posed by the unbalanced time distribution in the dataset. The observed performance improvement suggests that AdaPG's ability to balance parameter generation based on the number of facts at each timestamp positively impacts the overall model performance, leading to more stable results across the unbalanced time distribution.

Fig. 7. Visualization of temporal embeddings from SANe and its variant trained on ICEWS14 dataset using t-SNE.

Fig. 8. MRR performance comparison between SANe and SANe+ on the 365 timestamps of ICEWS14.

Table 10
Hyperparameter sensitivity analysis of SANe on ICEWS14.

Learning rate   Kernel size   Hits@10  Hits@3  Hits@1  MRR

0.005           7             .774     .666    .499    .599
0.005           5             .769     .622    .432    .549
0.005           3             .771     .630    .438    .555
0.0005          7             .774     .673    .539    .622
0.0005          5             .779     .684    .553    .633
0.0005          3             .780     .687    .557    .637
0.001           7             .770     .675    .547    .626
0.001           5             .777     .685    .556    .635
0.001           3             .782     .688    .558    .638
Hyperparameter Sensitivity Analysis. In this section, we perform a comprehensive hyperparameter sensitivity analysis on the
SANe model. Specifically, we conduct grid search experiments using the ICEWS14 dataset to investigate the sensitivity of SANe to
the learning rate and kernel size hyperparameters. We explore various settings for these hyperparameters, selecting from a range of
values: learning rates {0.005, 0.001, 0.0005} and kernel sizes {3, 5, 7}. The results are shown in Table 10.
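The grid search itself is straightforward; a minimal sketch is shown below, where `train_and_eval` is a hypothetical helper (stubbed here) that trains one configuration and returns its validation MRR.

```python
from itertools import product

def train_and_eval(lr: float, kernel_size: int) -> float:
    """Hypothetical helper: train SANe with this configuration and
    return the validation MRR (stubbed for illustration)."""
    return 0.0

grid = {"lr": [0.005, 0.001, 0.0005], "kernel_size": [3, 5, 7]}
results = {(lr, k): train_and_eval(lr, kernel_size=k)
           for lr, k in product(grid["lr"], grid["kernel_size"])}
best_lr, best_k = max(results, key=results.get)
print(f"best configuration: lr={best_lr}, kernel_size={best_k}")
```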


Our findings reveal the impact of these hyperparameters on the model’s performance. Notably, we observe that the learning rate
significantly influences SANe’s performance, with the best results achieved at a learning rate of 0.001. In contrast, the choice of
kernel size demonstrates less sensitivity, as performance remains comparable across kernel sizes of 3, 5, and 7.
The insights gained from this hyperparameter sensitivity analysis provide valuable guidance for model optimization in temporal
knowledge graph completion tasks. We encourage further exploration and collaboration in the study of hyperparameter sensitivity
to refine the model’s adaptability to various applications.

4.5. Discussion

In this section, we present a comprehensive analysis of the experimental results obtained from our proposed Space Adaptation Network (SANe) for Temporal Knowledge Graph Completion (TKGC) tasks. We highlight the main contributions, discuss the effectiveness of individual model components, and explore the generalization capabilities of SANe and SANe+ on unseen timestamps and various relations. Additionally, we provide insights into the visualization of contextual representations of timestamps and the effectiveness of AdaPG in handling unbalanced time distribution.
Main Results. Our experimental results on the event-based datasets (ICEWS14, ICEWS05-15, and GDELT) and the Wikipedia-based datasets (YAGO11k and Wikidata12k) demonstrate the superiority of SANe and SANe+ over existing models for TKGC tasks. Specifically, SANe+ outperforms all other methods on the five benchmark datasets, achieving state-of-the-art performance. The effectiveness of SANe and SANe+ is attributed to their ability to adapt facts at different timestamps to distinct latent spaces, enabling them to effectively capture time-variability within Temporal Knowledge Graphs (TKGs).
Ablation Study. To verify the contributions of different model components, we conducted an ablation study, evaluating five
variant models (V1 to V5) on the ICEWS14 dataset. The results show that the incorporation of time information, time contextual
representations, and the principled parameter generation network (PGN) play crucial roles in the success of SANe and SANe+.
Moreover, the inclusion of the consistency loss in SANe+ further improves the model’s performance, confirming the effectiveness of
the consistency loss in capturing temporal variations.
Generalizing to Unseen Timestamps. SANe’s contextual representations of timestamps allow it to handle unseen timestamps
effectively. The model’s ability to generalize to previously unseen data was evaluated by re-partitioning the ICEWS14 dataset and
simulating the introduction of new time points in future applications. The results demonstrate that SANe significantly outperforms
TComplEx [13] in terms of mean reciprocal rank (MRR), highlighting its effectiveness in handling previously unseen timestamps.
Performance Evaluation on Various Relations. The evaluation of SANe on various relations within the YAGO11k dataset shows
its superiority over other methods. SANe effectively captures and reasons about temporal dependencies in data, enabling it to perform
well on relations with temporal dynamics. The utilization of distinct latent spaces for different temporal snapshots, facilitated by
parameter generation, enhances the model’s ability to capture time-variability and time-stability of knowledge.
Visualization of Contextual Representations of Timestamps. The use of contextual representations 𝐨𝜏3 for timestamps provides
meaningful geometric interpretations for temporal embeddings. This leads to improved performance for SANe by enhancing the
preservation of time series information. The t-SNE visualization of timestamp representations learned by SANe further validates the
effectiveness of our approach.
Effectiveness of AdaPG in Handling Unbalanced Time Distribution. The additional experiments on the ICEWS14 dataset
demonstrate that SANe+ with AdaPG outperforms SANe with TaPG in handling unbalanced time distribution. AdaPG’s ability to
balance parameter generation based on the number of facts positively impacts the model’s stability and performance, making SANe+
more effective in handling unbalanced time distribution data. In summary, the discussion highlights the key findings of our experiments and validates the effectiveness of SANe and its components. We provide insights into the generalization capabilities of SANe, its performance on various relations, and the advantages of using contextual representations for timestamps. Furthermore, we emphasize the effectiveness of AdaPG in addressing unbalanced time distribution, which leads to improved stability and performance in the SANe+ model.
Computational Resource Analysis. The efficient utilization of computational resources is of paramount importance for the
practical applicability of complex models in the domain of temporal knowledge graph completion (TKGC). This section delves into
a detailed analysis of the computational resources required for training and utilizing the proposed models, SANe and SANe+. The
analysis encompasses hardware and software dependencies, training times, and model complexity.
Hardware and Software Dependencies. The training of SANe and SANe+ relies on a single NVIDIA GeForce RTX 3090 Graphics
Processing Unit (GPU). This choice of GPU, renowned for its high computational capacity, aligns with the demands imposed by the
intricacy of the models under consideration. The utilization of the PyTorch deep learning framework in our implementation further
ensures compatibility and efficiency in resource utilization.
Training Times. Training times, a crucial factor in assessing practical feasibility, vary across datasets. For the ICEWS14 dataset,
which serves as a benchmark, the training duration for SANe is approximately 3.5 hours. In the case of the ICEWS05-15 dataset,
the training period extends to around 5 hours. For the Wikidata12k dataset, training spans approximately 6 hours. In scenarios with
notably large temporal graphs, such as YAGO11k, the training time extends to roughly 16 hours. These training durations are deemed
acceptable, given the intricacies of the models and the scale of the datasets under consideration.
Model Complexity. A fundamental metric for assessing computational resource requirements is the model’s parameter count. In
the case of the ICEWS14 dataset, we conducted a parameter analysis comparing SANe and SANe+ to TeRo and ATiSE [18]. The
results are presented in Table 11.


Table 11
Model parameter comparison on the ICEWS14 dataset.

Model     #Parameters

TeRo      8,231,500
ATiSE     18,980,089
SANe      21,670,219
SANe+     21,671,019

The parameter counts of SANe and SANe+ fall within the same range as the non-deep-learning model ATiSE [18], reflecting a reasonable level of model complexity. In essence, the analysis presented in this section underscores the practical feasibility of training and deploying SANe and SANe+ on resource-constrained systems. The choice of hardware, efficient software dependencies, reasonable training times, and manageable model complexity collectively contribute to the models' applicability in real-world settings.
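For any PyTorch model, counts like those in Table 11 can be reproduced with a one-line helper; `model` stands in for an instantiated network and is the only assumption in this sketch.

```python
def count_parameters(model) -> int:
    """Total number of trainable parameters, as reported in Table 11."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Usage: count_parameters(sane_model)  # e.g. 21,670,219 for SANe on ICEWS14
```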
Discussion on Handling Different Temporal Granularities. The consideration of temporal granularity is paramount in the
domain of TKGC, as real-world knowledge graphs often exhibit diverse temporal resolutions. This discussion delves into the model’s
adaptability to varying temporal granularities, a crucial aspect in ensuring the applicability of the proposed framework.
The framework presented in this study, notably the Parameter Generation Network (PGN), is designed to flexibly accommodate
knowledge graphs characterized by differing temporal granularities. To illustrate this adaptability, we examine datasets featuring
distinct temporal resolutions, including daily and yearly granularities.
Notably, datasets such as ICEWS14, ICEWS05-15, and GDELT are characterized by daily temporal granularity, whereas the YAGO11k and Wikidata12k datasets feature yearly temporal granularity. The model's ability to handle these diverse temporal resolutions underscores its utility in addressing real-world knowledge graphs with varying temporal characteristics.
Moreover, the PGN, specifically the Adaptive Parameter Generator (AdaPG), incorporates a partitioning strategy distinct from the traditional Time-aware Parameter Generator (TaPG). This differentiation allows the model to effectively adapt to scenarios where temporal granularity does not extend to months or days.

4.6. Future directions

The promising results obtained from our model, SANe, have opened up intriguing avenues for future research. While our work
has focused on addressing the challenges of TKGC on benchmark datasets, there are several promising directions for advancing the
field and extending the applicability of our approach.
Scalability to Ultra-Large Knowledge Graphs. One of the most pressing questions in the field of TKGC pertains to the scalability
of models to ultra-large knowledge graphs. Real-world knowledge graphs can often encompass millions of entities and relations,
presenting unique challenges in terms of computational resources and model efficiency. Future research efforts can be directed
towards developing novel techniques that enable the effective completion of ultra-large-scale TKGs. This would involve exploring
methods for efficient model training, data handling, and distributed processing, all while preserving the model’s accuracy and
robustness.
Enhancing Model Interpretability. Complex models like SANe offer significant performance gains but often lack interpretability.
Understanding model decisions is crucial for real-world adoption. We urge further research in making complex models interpretable
without sacrificing their performance. Achieving model interpretability is vital for building trust and expanding their practical
applicability.
Real-World Applications. Our future work will focus on real-world deployments and case studies of SANe. We aim to collaborate
with domain-specific experts and organizations to showcase SANe’s effectiveness in practical applications. Through these real-world
use cases, we plan to highlight SANe’s adaptability and utility in addressing knowledge graph challenges across diverse domains.
This will bridge the gap between research and deployment, offering valuable insights for broader knowledge graph applications.

5. Conclusion

In this paper, we present a novel solution for modeling the time-variability and time-stability of TKGs based on a parameter generation framework. To address time-variability, we propose to endow the snapshots at different timestamps with their own latent spaces. Specifically, SANe processes entities and relations using different DCNN pipelines, which are generated by the PGN in terms of temporal information. SANe is thereby capable of alleviating interference from early snapshots and sharing missing facts across adjacent snapshots. In addition, TaPG, one of the PGN instances, plays an important role in achieving time-stability by endowing timestamps with contextual representations: it gathers valid knowledge from multiple sets of parameters by constraining the overlap of latent spaces. AdaPG is further proposed to address the parameter waste of TaPG, based on a partition tree over the timestamps. In summary, our approach demonstrates optimal performance across five benchmark datasets. Notably, a substantial improvement of 29% on the Wikidata12k dataset compared to baseline models validates the efficacy of our model in addressing temporal knowledge graph completion tasks. However, the model lacks interpretability, and its scalability on ultra-large knowledge graphs remains unverified. These limitations highlight the need for further in-depth exploration in future research.


CRediT authorship contribution statement

Yancong Li: Writing – original draft, Visualization, Software, Methodology, Conceptualization. Xiaoming Zhang: Writing – review & editing, Project administration, Funding acquisition, Conceptualization. Bo Zhang: Writing – review & editing, Visualization, Investigation. Feiran Huang: Supervision, Formal analysis, Data curation. Xiaopeng Chen: Supervision, Investigation. Ming Lu: Validation, Software. Shuai Ma: Writing – review & editing, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62272025 and No. U22B2021), and
in part by Fund of the State Key Laboratory of Software Development Environment.

References

[1] Y. Wang, H. Wang, W. Lu, Y. Yan, Hygge: hyperbolic graph attention network for reasoning over knowledge graphs, Inf. Sci. 630 (2023) 190–205.
[2] H. Cui, T. Peng, F. Xiao, J. Han, R. Han, L. Liu, Incorporating anticipation embedding into reinforcement learning framework for multi-hop knowledge graph question answering, Inf. Sci. 619 (2023) 745–761.
[3] H. Roghani, A. Bouyer, A fast local balanced label diffusion algorithm for community detection in social networks, IEEE Trans. Knowl. Data Eng., https://doi.org/10.1109/TKDE.2022.3162161.
[4] F. Wang, Z. Zheng, Y. Zhang, Y. Li, K. Yang, C. Zhu, To see further: knowledge graph-aware deep graph convolutional network for recommender systems, Inf. Sci. 647 (2023) 119465.
[5] A. Bordes, N. Usunier, A. Garcia-Durán, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 2013, pp. 2787–2795.
[6] B. Yang, W.-t. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, in: Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[7] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: knowledge graph embedding by relational rotation in complex space, in: International Conference on Learning Representations, 2019.
[8] S. Zhang, Y. Tay, L. Yao, Q. Liu, Quaternion knowledge graph embeddings, in: Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, pp. 2735–2745.
[9] E. Boschee, J. Lautenschlager, S. O'Brien, S. Shellman, J. Starz, M. Ward, ICEWS coded event data, Harvard Dataverse, https://doi.org/10.7910/DVN/28075.
[10] F. Mahdisoltani, J. Biega, F. Suchanek, YAGO3: a knowledge base from multilingual Wikipedias, in: 7th Biennial Conference on Innovative Data Systems Research, CIDR Conference, 2014.
[11] F. Erxleben, M. Günther, M. Krötzsch, J. Mendez, D. Vrandečić, Introducing Wikidata to the linked data web, in: International Semantic Web Conference, Springer, 2014, pp. 50–65.
[12] J. Leblay, M.W. Chekol, Deriving validity time in knowledge graph, in: Companion Proceedings of the Web Conference 2018, 2018, pp. 1771–1776.
[13] T. Lacroix, G. Obozinski, N. Usunier, Tensor decompositions for temporal knowledge base completion, in: International Conference on Learning Representations, 2020.
[14] J. Messner, R. Abboud, I.I. Ceylan, Temporal knowledge graph completion using box embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
[15] J. Wu, Y. Xu, Y. Zhang, C. Ma, M. Coates, J.C.K. Cheung, TIE: a framework for embedding-based incremental temporal knowledge graph completion, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 428–437.
[16] C. Xu, Y.-Y. Chen, M. Nayyeri, J. Lehmann, Temporal knowledge graph completion using a linear temporal regularizer and multivector embeddings, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 2569–2578.
[17] A. Sadeghian, M. Armandpour, A. Colas, D.Z. Wang, ChronoR: rotation based temporal knowledge graph embedding, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, 2021, pp. 6471–6479.
[18] C. Xu, M. Nayyeri, F. Alkhoury, H. Yazdi, J. Lehmann, Temporal knowledge graph completion based on time series Gaussian embedding, in: International Semantic Web Conference, Springer, 2020, pp. 654–671.
[19] R. Goel, S.M. Kazemi, M. Brubaker, P. Poupart, Diachronic embedding for temporal knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 3988–3995.
[20] E.A. Platanios, M. Sachan, G. Neubig, T. Mitchell, Contextual parameter generation for universal neural machine translation, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 425–435.
[21] Y. Li, X. Zhang, B. Zhang, H. Ren, Each snapshot to each space: space adaptation for temporal knowledge graph completion, in: The Semantic Web – ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23–27, 2022, Proceedings, Springer, 2022, pp. 248–266.
[22] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, 2014.
[23] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 687–696.
[24] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[25] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 2016, pp. 2071–2080.
[26] M. Nickel, V. Tresp, H.-P. Kriegel, A three-way model for collective learning on multi-relational data, in: Proceedings of the 28th International Conference on International Conference on Machine Learning, 2011, pp. 809–816.
[27] I. Balažević, C. Allen, T. Hospedales, TuckER: tensor factorization for knowledge graph completion, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 5185–5194.
[28] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018.
[29] Z. Zhang, F. Zhuang, H. Zhu, Z. Shi, H. Xiong, Q. He, Relational graph neural network with hierarchical attention for knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 9612–9619.
[30] S. Vashishth, S. Sanyal, V. Nitin, N. Agrawal, P. Talukdar, InteractE: improving convolution-based knowledge graph embeddings by increasing feature interactions, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 3009–3016.
[31] S.S. Dasgupta, S.N. Ray, P. Talukdar, HyTE: hyperplane-based temporally aware knowledge graph embedding, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2001–2011.
[32] T. Jin, Z. Liu, S. Yan, A. Eichenberger, L.-P. Morency, Language to network: conditional parameter adaptation with natural language descriptions, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020.
[33] T. Nekvinda, O. Dušek, One model, many languages: meta-learning for multilingual text-to-speech, in: Proc. Interspeech 2020, 2020, pp. 2972–2976.
[34] G. Stoica, O. Stretcu, E.A. Platanios, T. Mitchell, B. Póczos, Contextual parameter generation for knowledge graph link prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 3000–3008.
[35] F. Che, D. Zhang, J. Tao, M. Niu, B. Zhao, ParamE: regarding neural network parameters as relation embeddings for knowledge graph completion, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 2774–2781.
[36] S.M. Kazemi, D. Poole, SimplE embedding for link prediction in knowledge graphs, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, pp. 4289–4300.
[37] L. Bai, X. Ma, X. Meng, X. Ren, Y. Ke, RoAN: a relation-oriented attention network for temporal knowledge graph completion, Eng. Appl. Artif. Intell. 123 (2023) 106308.
[38] X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2011, pp. 315–323.
[39] J. Wu, M. Cao, J.C.K. Cheung, W.L. Hamilton, TeMP: temporal message passing for temporal knowledge graph completion, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 5730–5746.
[40] Y. Xu, E. Haihong, M. Song, W. Song, X. Lv, W. Haotian, Y. Jinrui, RTFE: a recursive temporal fact embedding framework for temporal knowledge graph completion, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 5671–5681.
[41] A. Garcia-Duran, S. Dumančić, M. Niepert, Learning sequence encoders for temporal knowledge graph completion, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 4816–4821.
[42] C. Xu, M. Nayyeri, F. Alkhoury, H.S. Yazdi, J. Lehmann, TeRo: a time-aware knowledge graph embedding via temporal rotation, in: Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 1583–1593.
[43] K. Leetaru, P.A. Schrodt, GDELT: global data on events, location, and tone, 1979–2012, in: ISA Annual Convention, 2013.
[44] T. Lacroix, N. Usunier, G. Obozinski, Canonical tensor decomposition for knowledge base completion, in: International Conference on Machine Learning, PMLR, 2018, pp. 2863–2872.
[45] P. Jain, S. Rathi, S. Chakrabarti, et al., Temporal knowledge base completion: new algorithms and evaluation protocols, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 3733–3747.
[46] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2010, pp. 249–256.
[47] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: Proceedings of 3rd International Conference on Learning Representations, 2015.
[48] P. Shao, D. Zhang, G. Yang, J. Tao, F. Che, T. Liu, Tucker decomposition-based temporal knowledge graph completion, Knowl.-Based Syst. 238 (2022) 107841.
[49] A. Bibal, V. Delchevalerie, B. Frénay, DT-SNE: t-SNE discrete visualizations as decision tree structures, Neurocomputing 529 (2023) 101–112.
