
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts
Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
Center for Research on Intelligent Perception and Computing
National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
{qiang.liu, shu.wu, wangliang, tnt}@nlpr.ia.ac.cn

Abstract

Spatial and temporal contextual information plays a key role in analyzing user behaviors, and is helpful for predicting where he or she will go next. With the growing ability of collecting information, more and more temporal and spatial contextual information is collected in systems, and the location prediction problem becomes crucial and feasible. Some works have been proposed to address this problem, but they all have their limitations. Factorizing Personalized Markov Chain (FPMC) is constructed based on a strong independence assumption among different factors, which limits its performance. Tensor Factorization (TF) faces the cold start problem in predicting future actions. The Recurrent Neural Networks (RNN) model shows promising performance compared with FPMC and TF, but all these methods have problems in modeling continuous time intervals and geographical distances. In this paper, we extend RNN and propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). ST-RNN can model local temporal and spatial contexts in each layer, with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Experimental results show that the proposed ST-RNN model yields significant improvements over the competitive compared methods on two typical datasets, i.e., the Global Terrorism Database (GTD) and the Gowalla dataset.

Introduction

With the rapid growth of available information on the internet and the enhancing ability of systems to collect information, more and more temporal and spatial contexts have been collected. Spatial and temporal contexts describe the essential factors of an event, i.e., where and when. These factors are fundamental for modeling behavior in practical applications. It is challenging and crucial to predict where a person will be at a given time point with complex temporal and spatial information. For example, based on historical check-in data, we can analyze and predict where a user will go next. Moreover, such analysis can also be used for social good, such as predicting where traffic jams will happen or which city terrorist organizations will attack.

Nowadays, the spatial temporal prediction problem has been extensively studied. Factorizing Personalized Markov Chain (FPMC) (Rendle, Freudenthaler, and Schmidt-Thieme 2010) is a personalized extension of common Markov chain models, and has become one of the most popular methods for sequential prediction. FPMC has also been applied to next location prediction (Cheng et al. 2013; Chen, Liu, and Yu 2014). The main concern about FPMC is that it is based on a strong independence assumption among different factors. As another popular method, Tensor Factorization (TF) has been successfully applied to time-aware recommendation (Xiong et al. 2010) as well as modeling spatial temporal information (Zheng et al. 2010a; Bahadori, Yu, and Liu 2014). In TF, both time bins and locations are regarded as additional dimensions in the factorized tensor, which leads to the cold start problem in behavior prediction with new time bins, e.g., behaviors in the future. Recently, Recurrent Neural Networks (RNN) have been successfully employed for word embedding (Mikolov et al. 2010; 2011a; 2011b) and sequential click prediction (Zhang et al. 2014). RNN shows promising performance compared with conventional methods.

Although the above methods have achieved satisfactory results in some applications, they are unable to handle continuous geographical distances between locations and time intervals between nearby behaviors in modeling sequential data. First, these continuous values of spatial and temporal contexts are significant in behavior modeling. For instance, a person may tend to go to a restaurant nearby, but he or she may hesitate to go to a restaurant far away even if it is delicious and popular. Meanwhile, suppose a person went to an opera house last night and a parking lot last month; where he or she will go today has a higher probability of being influenced by the opera house, because of similar interests and demands in a short period. Secondly, these local temporal contexts have fundamental effects in revealing characteristics of the user and are helpful for behavior modeling. If the person went to an opera house last night and an art museum this morning, it is probable that both the opera house and the museum have higher importance than other contexts, e.g., going to a shopping mall last month, in the next location prediction. Finally, as some behaviors are periodical, such as going to church every Sunday, the effect of time intervals becomes important for temporal prediction in such situations.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In this paper, to better model spatial and temporal information, we propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). Rather than considering only one element in each layer of RNN, ST-RNN takes local temporal contexts into consideration and models the sequential elements within an almost fixed time period in each layer. Besides, ST-RNN utilizes the recurrent structure to capture the periodical temporal contexts. Therefore, ST-RNN can well model not only the local temporal contexts but also the periodical ones. On the other hand, ST-RNN employs time-specific and distance-specific transition matrices to characterize the dynamic properties of continuous time intervals and the geographical properties of distances, respectively. Since it is difficult to estimate matrices for continuous time intervals and geographical distances, we divide the spatial and temporal values into discrete bins. For a specific temporal value in one time bin, we can calculate the corresponding transition matrix via a linear interpolation of the transition matrices of the upper bound and lower bound. Similarly, for a specific spatial value, we can generate the transition matrix. Incorporating the recurrent architecture with continuous time intervals and location distances, ST-RNN can better model spatial temporal contexts and give more accurate location predictions.

The main contributions of this work are listed as follows:
• We model time intervals in a recurrent architecture with time-specific transition matrices, which presents a novel perspective on temporal analysis.
• We incorporate distance-specific transition matrices for modeling geographical distances, which promotes the performance of spatial temporal prediction in a recurrent architecture.
• Experiments conducted on real-world datasets show that ST-RNN is effective and clearly outperforms the state-of-the-art methods.

Related Work

In this section, we review several types of methods for spatial temporal prediction, including factorization methods, neighborhood based methods, Markov chain based methods and recurrent neural networks.

Matrix Factorization (MF) based methods (Mnih and Salakhutdinov 2007; Koren, Bell, and Volinsky 2009) have become the state-of-the-art approaches to collaborative filtering. The basic objective of MF is to factorize a user-item rating matrix into two low rank matrices, each of which represents the latent factors of users or items; the original matrix can be approximated via their multiplication. MF has been extended to be time-aware and location-aware. Tensor Factorization (TF) (Xiong et al. 2010) treats time bins as another dimension and generates latent vectors of users, items and time bins via factorization, and timeSVD++ (Koren 2010; Koenigstein, Dror, and Koren 2011) extends SVD++ (Koren 2008) in the same way. Besides temporal information, spatial information can also be modeled via factorization models, such as tensor factorization (Zheng et al. 2010a) and collective matrix factorization (Zheng et al. 2010b). Moreover, temporal and spatial information can be included in TF simultaneously as two separate dimensions to make location predictions (Bahadori, Yu, and Liu 2014; Bhargava et al. 2015; Zhong et al. 2015). However, it is hard for factorization based models to generate latent representations of time bins that have never or seldom appeared in the training data. Thus, it is hard to predict future behaviors with factorization based models.

Neighborhood based models might be the most natural methods for prediction with both temporal and spatial contexts. Time-aware neighborhood based methods (Ding and Li 2005; Nasraoui et al. 2007; Lathia, Hailes, and Capra 2009; Liu et al. 2010) adapt ordinary neighborhood based algorithms to temporal effects by giving more relevance to recent observations and less to past ones. For spatial information, the distance between locations is calculated and the prediction is made based on a power law distribution (Ye et al. 2011) or the multi-center Gaussian model (Cheng et al. 2012). Recently, some work considers users' interest in the neighborhood of the destination location (Liu et al. 2014; Li et al. 2015), and the Personalized Ranking Metric Embedding (PRME) method (Feng et al. 2015) learns embeddings while calculating the distance between the destination location and recently visited ones. However, neighborhood based methods are unable to model the underlying properties in users' sequential behavior histories.

As a commonly-used method for sequential prediction, Markov Chain (MC) based models aim to predict the next behavior of a user based on the past sequential behaviors. In these methods, an estimated transition matrix indicates the probability of a behavior given the past behaviors. Extending MC via factorization of the probability transition matrix, Factorizing Personalized Markov Chain (FPMC) (Rendle, Freudenthaler, and Schmidt-Thieme 2010) has become a state-of-the-art method. It has also been extended by generating user groups (Natarajan, Shin, and Dhillon 2013), modeling the interest-forgetting curve (Chen, Wang, and Wang 2015) and capturing the dynamics of boredom (Kapoor et al. 2015). Recently, rather than merely modeling temporal information, FPMC has been successfully applied to spatial temporal prediction by using a location constraint (Cheng et al. 2013) or by combining with general MC methods (Chen, Liu, and Yu 2014). However, FPMC assumes that all the components are linearly combined, indicating that it makes a strong independence assumption among factors (Wang et al. 2015).

Recently, Recurrent Neural Networks (RNN) have not only been successfully applied in word embedding for sentence modeling (Mikolov et al. 2010; 2011a; 2011b), but have also shown promising performance for sequential click prediction (Zhang et al. 2014). RNN consists of an input layer, an output unit and multiple hidden layers. The hidden representation of RNN can change dynamically along with a behavioral history, which makes it a suitable tool for modeling temporal information. However, in modeling sequential data, RNN assumes that temporal dependency changes monotonously along the position in a sequence. This does not conform to some real situations, especially for the most recent elements in a historical sequence, which means that RNN cannot well model local temporal contexts.

Moreover, though RNN shows promising performance compared with conventional methods in sequential prediction, it is not capable of modeling the continuous geographical distance between locations and the time interval between behaviors.

Proposed Model

In this section, we first formulate our problem and introduce the general RNN model, and then detail our proposed ST-RNN model. Finally, we present the learning procedure of the proposed model.

Problem Formulation

Let P be the set of users and Q be the set of locations, where $p_u \in \mathbb{R}^d$ and $q_v \in \mathbb{R}^d$ indicate the latent vectors of user u and location v. Each location v is associated with its coordinate $\{x_v, y_v\}$. For each user u, the history of where he or she has been is given as $Q^u = \{q^u_{t_1}, q^u_{t_2}, ...\}$, where $q^u_{t_i}$ denotes where user u is at time $t_i$. The history of all users is denoted as $Q^U = \{Q^{u_1}, Q^{u_2}, ...\}$. Given the historical records of a user, the task is to predict where the user will go next at a specific time t.

Recurrent Neural Networks

The architecture of RNN consists of an input layer, an output unit, hidden layers, and inner weight matrices (Zhang et al. 2014). The vector representation of the hidden layer is computed as:

$h^u_{t_k} = f\left( M q^u_{t_k} + C h^u_{t_{k-1}} \right),$ (1)

where $h^u_{t_k}$ is the representation of user u at time $t_k$, $q^u_{t_k}$ denotes the latent vector of the location the user visits at time $t_k$, C is the recurrent connection propagating sequential signals from the previous status, and M denotes the transition matrix for input elements, capturing the current behavior of the user. The activation function $f(x)$ is chosen as the sigmoid function $f(x) = 1/(1 + e^{-x})$.

RNN With Temporal Context

Since long time intervals have different impacts compared with short ones, the length of a time interval is essential for predicting future behaviors. But continuous time intervals cannot be modeled by the current RNN model. Meanwhile, since RNN cannot well model local temporal contexts in a user's behavioral history, we need more subtle processing for the most recent elements in a behavioral history. Accordingly, it is reasonable and plausible to model more elements of the local temporal contexts in each layer of the recurrent structure and to take continuous time intervals into consideration. Thus, we replace the transition matrix M in RNN with time-specific transition matrices. Mathematically, given a user u, his or her representation at time t can be calculated as:

$h^u_t = f\left( \sum_{q^u_{t_i} \in Q^u,\; t-w < t_i < t} T_{t-t_i} q^u_{t_i} + C h^u_{t-w} \right),$ (2)

where w is the width of the time window, the elements in this window are modeled by each layer of the model, and $T_{t-t_i}$ denotes the time-specific transition matrix for the time interval $t - t_i$ before the current time t. The matrix $T_{t-t_i}$ captures the impact of elements in the most recent history and takes the continuous time interval into consideration.

Spatial Temporal Recurrent Neural Networks

Conventional RNN has difficulty in modeling not only the time interval information, but also the geographical distance between locations. Considering that distance information is an essential factor for location prediction, it is necessary to involve it in our model. Similar to the time-specific transition matrices, we incorporate distance-specific transition matrices for different geographical distances between locations. Distance-specific transition matrices capture the geographical properties that affect human behavior. In ST-RNN, as shown in Figure 1, given a user u, his or her representation at time t can be calculated as:

$h^u_{t,q^u_t} = f\left( \sum_{q^u_{t_i} \in Q^u,\; t-\hat{w} < t_i < t} S_{q^u_t - q^u_{t_i}} T_{t-t_i} q^u_{t_i} + C h^u_{t-\hat{w}, q^u_{t-\hat{w}}} \right),$ (3)

where $S_{q^u_t - q^u_{t_i}}$ is the distance-specific transition matrix for the geographical distance $q^u_t - q^u_{t_i}$ according to the current coordinate, and $q^u_t$ denotes the coordinate of user u at time t. The geographical distance can be calculated as a Euclidean distance:

$q^u_t - q^u_{t_i} := \left\| \left( x^u_t - x^u_{t_i},\; y^u_t - y^u_{t_i} \right) \right\|_2 .$ (4)

Usually, the location $q^u_{t-w}$, i.e., the location user u visits at time $t - w$, does not exist in the visiting history $Q^u$. We can utilize the approximate value $\hat{w}$ as the local window width. Based on the visiting list $Q^u$ and the time point $t - w$, we set the value $\hat{w}$ such that $\hat{w}$ is the closest value to w and $q^u_{t-\hat{w}} \in Q^u$. Thus, $\hat{w}$ is usually slightly larger or smaller than w.

Moreover, when the history is not long enough or the predicted position is at the very first part of the history, we have $t < w$. Then, Equation 3 should be rewritten as:

$h^u_{t,q^u_t} = f\left( \sum_{q^u_{t_i} \in Q^u,\; 0 < t_i < t} S_{q^u_t - q^u_{t_i}} T_{t-t_i} q^u_{t_i} + C h^u_0 \right),$ (5)

where $h^u_0 = h_0$ denotes the initial status. The initial status of all users should be the same because there does not exist any behavioral information for personalized prediction in such situations.

Finally, the prediction of ST-RNN can be yielded by calculating the inner product of user and item representations. The prediction of whether user u would go to location v at time t can be computed as:

$o_{u,t,v} = (h^u_{t,q_v} + p_u)^T q_v ,$ (6)

where $p_u$ is the permanent representation of user u, indicating his or her interest and activity range, and $h^u_{t,q_v}$ captures his or her dynamic interests under the specific spatial and temporal contexts.

Figure 1: Overview of the proposed ST-RNN model.

Linear Interpolation for Transition Matrices

If we learn a distinct matrix for each possible continuous time interval and geographical distance, the ST-RNN model will face the data sparsity problem. Therefore, we partition time intervals and geographical distances into discrete bins respectively. Only the transition matrices for the upper and lower bounds of the corresponding bins are learned in our model. For a time interval in a time bin or a geographical distance in a distance bin, the transition matrix can be calculated via a linear interpolation. Mathematically, the time-specific transition matrix $T_{t_d}$ for time interval $t_d$ and the distance-specific transition matrix $S_{l_d}$ for geographical distance $l_d$ can be calculated as:

$T_{t_d} = \frac{ T_{L(t_d)} \left( U(t_d) - t_d \right) + T_{U(t_d)} \left( t_d - L(t_d) \right) }{ \left( U(t_d) - t_d \right) + \left( t_d - L(t_d) \right) },$ (7)

$S_{l_d} = \frac{ S_{L(l_d)} \left( U(l_d) - l_d \right) + S_{U(l_d)} \left( l_d - L(l_d) \right) }{ \left( U(l_d) - l_d \right) + \left( l_d - L(l_d) \right) },$ (8)

where $U(t_d)$ and $L(t_d)$ denote the upper and lower bounds of the time interval $t_d$, and $U(l_d)$ and $L(l_d)$ denote the upper and lower bounds of the geographical distance $l_d$, respectively. Such a linear interpolation method solves the problem of learning transition matrices for continuous values and provides a solution for modeling the impact of continuous temporal and spatial contexts.

Parameter Inference

In this subsection, we introduce the learning process of ST-RNN with Bayesian Personalized Ranking (BPR) (Rendle et al. 2009) and Back Propagation Through Time (BPTT) (Rumelhart, Hinton, and Williams 1988).

BPR (Rendle et al. 2009) is a state-of-the-art pairwise ranking framework for implicit feedback data. The basic assumption of BPR is that a user prefers a selected location over a negative one. Then, we need to maximize the following probability:

$p(u, t, v \succ v') = g(o_{u,t,v} - o_{u,t,v'}) ,$ (9)

where $v'$ denotes a negative location sample and $g(x)$ is a nonlinear function selected as $g(x) = 1/(1 + e^{-x})$. Incorporating the negative log likelihood, we can equivalently solve the following objective function:

$J = \sum \ln\left( 1 + e^{-(o_{u,t,v} - o_{u,t,v'})} \right) + \frac{\lambda}{2} \left\| \Theta \right\|^2 ,$ (10)

where $\Theta = \{P, Q, S, T, C\}$ denotes all the parameters to be estimated, and $\lambda$ is a parameter controlling the strength of the regularization. The derivatives of J with respect to the parameters can be calculated as:

$\frac{\partial J}{\partial p_u} = \sum \frac{(q_{v'} - q_v)\, e^{-(o_{u,t,v} - o_{u,t,v'})}}{1 + e^{-(o_{u,t,v} - o_{u,t,v'})}} + \lambda p_u ,$

$\frac{\partial J}{\partial q_v} = -\sum \frac{(h^u_{t,q_v} + p_u)\, e^{-(o_{u,t,v} - o_{u,t,v'})}}{1 + e^{-(o_{u,t,v} - o_{u,t,v'})}} + \lambda q_v ,$

$\frac{\partial J}{\partial q_{v'}} = \sum \frac{(h^u_{t,q_{v'}} + p_u)\, e^{-(o_{u,t,v} - o_{u,t,v'})}}{1 + e^{-(o_{u,t,v} - o_{u,t,v'})}} + \lambda q_{v'} ,$

$\frac{\partial J}{\partial h^u_{t,q_v}} = -\sum \frac{q_v\, e^{-(o_{u,t,v} - o_{u,t,v'})}}{1 + e^{-(o_{u,t,v} - o_{u,t,v'})}} ,$

$\frac{\partial J}{\partial h^u_{t,q_{v'}}} = \sum \frac{q_{v'}\, e^{-(o_{u,t,v} - o_{u,t,v'})}}{1 + e^{-(o_{u,t,v} - o_{u,t,v'})}} .$

Moreover, the parameters in ST-RNN can be further learned with the back propagation through time algorithm (Rumelhart, Hinton, and Williams 1988). Given the derivative $\partial J / \partial h^u_{t,q^u_t}$, the corresponding gradients of all parameters in the hidden layer can be calculated as:

$\frac{\partial J}{\partial h^u_{t-w, v^u_{t-w}}} = C^T \left[ f'(\cdot) \otimes \frac{\partial J}{\partial h^u_{t,q^u_t}} \right] ,$

$\frac{\partial J}{\partial C} = \left[ f'(\cdot) \otimes \frac{\partial J}{\partial h^u_{t,q^u_t}} \right] \left( h^u_{t-w, v^u_{t-w}} \right)^T ,$

$\frac{\partial J}{\partial q^u_{t_i}} = \left( S_{q^u_t - q^u_{t_i}} T_{t-t_i} \right)^T \left[ f'(\cdot) \otimes \frac{\partial J}{\partial h^u_{t,q^u_t}} \right] ,$

$\frac{\partial J}{\partial S_{q^u_t - q^u_{t_i}}} = \left[ f'(\cdot) \otimes \frac{\partial J}{\partial h^u_{t,q^u_t}} \right] \left( T_{t-t_i} q^u_{t_i} \right)^T ,$

$\frac{\partial J}{\partial T_{t-t_i}} = S^T_{q^u_t - q^u_{t_i}} \left[ f'(\cdot) \otimes \frac{\partial J}{\partial h^u_{t,q^u_t}} \right] \left( q^u_{t_i} \right)^T ,$

where $\otimes$ denotes the element-wise product and $f'(\cdot)$ is the derivative of the activation function. Now, we can employ stochastic gradient descent to estimate the model parameters after all the gradients are calculated. This process is repeated iteratively until convergence is achieved.

Experimental Results and Analysis

In this section, we conduct empirical experiments to demonstrate the effectiveness of ST-RNN on next location prediction. We first introduce the datasets, baseline methods and evaluation metrics of our experiments. Then we compare our ST-RNN to the state-of-the-art baseline methods. The final part is the parameter selection and convergence analysis.
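The interpolation of Equations 7 and 8 and one pairwise term of the BPR objective can be sketched as follows. This is a toy illustration: the bin edges, the constant matrices at the bin bounds, and the function names are hypothetical, not from the paper.

```python
import numpy as np

def interp_transition(bound_mats, bin_edges, value):
    """Eqs. 7-8: linearly interpolate between the transition matrices
    learned at the lower and upper bounds of the bin containing `value`.
    Note the denominator (U - value) + (value - L) equals U - L."""
    i = int(np.clip(np.searchsorted(bin_edges, value) - 1, 0, len(bin_edges) - 2))
    lo, up = bin_edges[i], bin_edges[i + 1]
    return (bound_mats[i] * (up - value) + bound_mats[i + 1] * (value - lo)) / (up - lo)

def bpr_term(o_pos, o_neg):
    """One pairwise term of Eq. 10, ln(1 + e^{-(o_v - o_v')}),
    with the regularization term omitted."""
    return float(np.log1p(np.exp(-(o_pos - o_neg))))

# Toy time bins (in hours) with a constant matrix learned at each bound.
edges = np.array([0.0, 6.0, 12.0, 24.0])
mats = [np.full((2, 2), float(i)) for i in range(4)]
T_9h = interp_transition(mats, edges, 9.0)   # halfway between the 6h and 12h matrices
```

Since 9h lies exactly halfway between the 6h and 12h bounds, the interpolated matrix is the average of the two bound matrices; only the matrices at the bin bounds ever need to be learned.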

Table 1: Performance comparison on two datasets evaluated by recall, F1-score, MAP and AUC.

Gowalla
method    recall@1  recall@5  recall@10  F1-score@1  F1-score@5  F1-score@10  MAP     AUC
TOP       0.0052    0.0292    0.0585     0.0052      0.0097      0.0106       0.0372  0.6685
MF        0.0100    0.0538    0.1146     0.0100      0.0179      0.0208       0.0527  0.7056
MC        0.0091    0.0543    0.1015     0.0091      0.0181      0.0184       0.0510  0.7029
TF        0.0116    0.0588    0.1120     0.0116      0.0196      0.0204       0.0551  0.7097
FPMC      0.0159    0.0792    0.1535     0.0159      0.0264      0.0279       0.0671  0.7363
FPMC-LR   0.0186    0.0940    0.1823     0.0186      0.0313      0.0331       0.0763  0.7580
PRME      0.0203    0.0990    0.1896     0.0203      0.0330      0.0344       0.0847  0.7695
RNN       0.0257    0.1349    0.2286     0.0257      0.0450      0.0416       0.0921  0.7875
ST-RNN    0.0304    0.1524    0.2714     0.0304      0.0508      0.0493       0.1038  0.8115

GTD
method    recall@1  recall@5  recall@10  F1-score@1  F1-score@5  F1-score@10  MAP     AUC
TOP       0.0290    0.2105    0.3490     0.0290      0.0702      0.0634       0.1307  0.7036
MF        0.0784    0.2993    0.4935     0.0784      0.0998      0.0897       0.1986  0.7906
MC        0.0733    0.2995    0.4969     0.0733      0.0998      0.0903       0.1968  0.7929
TF        0.0861    0.3469    0.5347     0.0861      0.1156      0.0972       0.2181  0.8147
FPMC      0.0964    0.3944    0.5741     0.0964      0.1315      0.1044       0.2385  0.8376
FPMC-LR   0.1014    0.3988    0.5775     0.1014      0.1329      0.1050       0.2428  0.8399
PRME      0.1147    0.4128    0.5861     0.1147      0.1359      0.1066       0.2512  0.8431
RNN       0.1216    0.4168    0.5912     0.1216      0.1389      0.1075       0.2600  0.8470
ST-RNN    0.1654    0.4986    0.6812     0.1654      0.1662      0.1239       0.3238  0.9042

Table 2: Performance of ST-RNN with varying window width w on two datasets evaluated by recall, MAP and AUC.

dataset           w    recall@1  recall@5  recall@10  MAP     AUC
Gowalla (d = 13)  6h   0.0304    0.1524    0.2714     0.1038  0.8115
                  12h  0.0281    0.1447    0.2623     0.0996  0.8056
                  1d   0.0271    0.1417    0.2598     0.0982  0.8042
                  2d   0.0256    0.1357    0.2522     0.0953  0.7997
                  3d   0.0258    0.1368    0.2543     0.0956  0.8007
GTD (d = 7)       15d  0.1649    0.4492    0.6232     0.3018  0.8708
                  1m   0.1717    0.4798    0.6529     0.3175  0.8880
                  2m   0.1698    0.4892    0.6604     0.3184  0.8919
                  3m   0.1654    0.4986    0.6812     0.3238  0.9042
                  4m   0.1638    0.4803    0.6606     0.3145  0.8915
                  6m   0.1662    0.4898    0.6710     0.3206  0.8994

Experimental Settings

We evaluate the different methods on two real-world datasets belonging to two different scenarios:
• Gowalla (https://snap.stanford.edu/data/loc-gowalla.html) (Cho, Myers, and Leskovec 2011) is a dataset from the Gowalla website, one of the biggest location-based online social networks. It records the check-in history of users, containing detailed timestamps and coordinates. We would like to predict where a user will check in next.
• The Global Terrorism Database (GTD) (http://www.start.umd.edu/gtd/) includes more than 125,000 terrorist incidents that have occurred around the world since 1970. The time information is collected at the day level. For social good, we would like to predict which province or state a terrorist organization will attack, making it possible to take action before incidents happen and save people's lives.

On both datasets, 70% of the elements of each user's behavioral history are selected for training, 20% for testing, and the remaining 10% as the validation set. The regularization parameter for all experiments is set as λ = 0.01.

Then we employ several evaluation metrics. Recall@k and F1-score@k are two popular metrics for ranking tasks. The evaluation score for our experiment is computed according to where the next selected location appears in the ranked list. We report recall@k and F1-score@k with k = 1, 5 and 10 in our experiments. Mean Average Precision (MAP) and Area under the ROC curve (AUC) are two commonly used global evaluations for ranking tasks; they are standard metrics for evaluating the quality of whole ranked lists. For all of these metrics, the larger the value, the better the performance.

We compare ST-RNN with several representative methods for location prediction:
• TOP: The most popular locations in the training set are selected as the prediction for each user.
• MF (Mnih and Salakhutdinov 2007): Based on a user-location matrix, it is one of the state-of-the-art methods for conventional collaborative filtering.
• MC: The Markov chain model is a classical sequential model and can be used as a sequential baseline method.
• TF (Bahadori, Yu, and Liu 2014): TF extends MF to three dimensions, including user, temporal information and spatial information.
• FPMC (Rendle, Freudenthaler, and Schmidt-Thieme 2010): It is a sequential prediction method based on Markov chains.
• FPMC-LR (Cheng et al. 2013): It extends FPMC with a location constraint in prediction.
• PRME (Feng et al. 2015): It takes the distance between the destination location and recently visited ones into consideration for learning embeddings.

Figure 2: (a) Gowalla with w = 6h; (b) GTD with w = 3m: MAP performance of ST-RNN on the two datasets with varying dimensionality d. (c) Gowalla (w = 6h, d = 13); (d) GTD (w = 3m, d = 7): convergence curves of ST-RNN on the two datasets measured by normalized recall and MAP.

• RNN (Zhang et al. 2014): This is a state-of-the-art method for temporal prediction, which has been successfully applied in word embedding and ad click prediction.

Analysis of Experimental Results

The performance comparison on the two datasets evaluated by recall, F1-score, MAP and AUC is illustrated in Table 1. MF and MC obtain similar performance improvements over TOP, and which of the two is better differs across metrics; MF and MC cannot model temporal information and collaborative information respectively. Jointly modeling all kinds of information, TF slightly improves the results compared with MF and MC, but still cannot well predict the future. FPMC improves the performance greatly compared with TF, and FPMC-LR and PRME achieve further improvements by incorporating distance information. Another great improvement is brought by RNN, which is the best method among the compared ones. Moreover, we can observe that ST-RNN outperforms the compared methods on both the Gowalla dataset and the GTD dataset measured by all the metrics. The MAP improvements over RNN are 12.71% and 24.54% respectively, while the AUC improvements are 3.04% and 6.54% respectively. These improvements indicate that our proposed ST-RNN can well model temporal and spatial contexts. The larger improvement on the GTD dataset shows that the impact of time interval and geographical distance information is more significant for modeling terrorist organizations' behavior than for users' check-in behavior.

Analysis of Window Width and Dimensionality

Table 2 illustrates the performance of ST-RNN evaluated by recall, MAP and AUC with varying window widths, which provides a clue for parameter selection. The dimensionality is set to d = 13 and d = 7 respectively. On the Gowalla dataset, the best parameter is clearly w = 6h under all the metrics, and the performances with other window widths are still better than those of the compared methods. On the GTD dataset, the best performance of recall@1 is obtained with w = 1m while the best performance of the other metrics is obtained with w = 3m, which indicates that a longer ranking list requires a larger window width. We select w = 3m as the parameter, and all the results still defeat the compared methods. These results show that ST-RNN is not very sensitive to the window width.

To investigate the impact of dimensionality and select the best parameters for ST-RNN, we illustrate the MAP performance of ST-RNN on both datasets with varying dimensionality in Figures 2(a) and 2(b). The window width is set to w = 6h on the Gowalla dataset and w = 3m on the GTD dataset. It is clear that the performance of ST-RNN stays stable over a large range on both datasets, and the best parameters can be selected as d = 13 and d = 7 respectively. Moreover, even without the best dimensionality, ST-RNN still outperforms all the compared methods according to Table 1. In a word, these curves show that ST-RNN is not very sensitive to the dimensionality and can be well applied in practical applications.

Analysis of Convergence Rates

Figures 2(c) and 2(d) illustrate the convergence curves of ST-RNN on the Gowalla and GTD datasets evaluated by recall and MAP. To draw the curves of different metrics in one figure and compare their convergence rates, we calculate normalized values of the convergence results of recall@1, recall@5, recall@10 and MAP on both datasets. The normalized values are computed according to the convergence procedure of each evaluation metric, which ensures that the starting value is 0 and the final value is 1 for each convergence curve. From these curves, we can observe that ST-RNN converges in a satisfactory number of iterations. Moreover, on both datasets, it is obvious that the curves of recall@1 converge very quickly, followed by those of recall@5, while the results of recall@10 and MAP converge the slowest. From this observation, we find that the more items output in the ranked list, the more iterations are needed in training.

Conclusion

In this paper, we have proposed a novel spatial temporal prediction method, i.e., ST-RNN. Instead of considering only one element in each layer of RNN, ST-RNN considers the elements in the local temporal contexts in each layer. In ST-RNN, to capture time interval and geographical distance information, we replace the single transition matrix in RNN with time-specific

transition matrices and distance-specific transition matrices. Li, X.; Cong, G.; Li, X.-L.; Pham, T.-A. N.; and Krish-
Moreover, a linear interpolation is applied for the training of naswamy, S. 2015. Rank-geofm: A ranking based geographi-
transition matrices. The experimental results on real datasets cal factorization method for point of interest recommendation.
show that ST-RNN outperforms the state-of-the-art methods In SIGIR, 433–442.
and can well model the spatial and temporal contexts. Liu, N. N.; Zhao, M.; Xiang, E.; and Yang, Q. 2010. Online
evolutionary collaborative filtering. In RecSys, 95–102.
Acknowledgments

This work is jointly supported by the National Basic Research Program of China (2012CB316300) and the National Natural Science Foundation of China (61403390, U1435221, 61420106015, 61525306).
