Predicting The Next Location: A Recurrent Model With Spatial and Temporal Contexts
In this paper, to better model spatial and temporal information, we propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). Rather than considering only one element in each layer of RNN, ST-RNN takes local temporal contexts into consideration and models the sequential elements falling in an almost fixed time period in each layer. Besides, ST-RNN utilizes the recurrent structure to capture the periodical temporal contexts. Therefore, ST-RNN can well model not only the local temporal contexts but also the periodical ones. On the other hand, ST-RNN employs time-specific and distance-specific transition matrices to characterize the dynamic properties of continuous time intervals and the geographical properties of distances, respectively. Since it is difficult to estimate a matrix for every continuous time interval and geographical distance, we divide the spatial and temporal values into discrete bins. For a specific temporal value in one time bin, we can calculate the corresponding transition matrix via a linear interpolation of the transition matrices of the bin's upper and lower bounds. Similarly, we can generate the transition matrix for a specific spatial value. Incorporating continuous time intervals and location distances into the recurrent architecture, ST-RNN can better model spatial and temporal contexts and give more accurate location predictions.

The main contributions of this work are listed as follows:
• We model time intervals in a recurrent architecture with time-specific transition matrices, which presents a novel perspective on temporal analysis.
• We incorporate distance-specific transition matrices for modeling geographical distances, which promotes the performance of spatial temporal prediction in a recurrent architecture.
• Experiments conducted on real-world datasets show that ST-RNN is effective and clearly outperforms the state-of-the-art methods.

Related Work
In this section, we review several types of methods for spatial temporal prediction, including factorization methods, neighborhood based methods, Markov chain based methods and recurrent neural networks.

Matrix Factorization (MF) based methods (Mnih and Salakhutdinov 2007; Koren, Bell, and Volinsky 2009) have become the state-of-the-art approaches to collaborative filtering. The basic objective of MF is to factorize a user-item rating matrix into two low-rank matrices, each of which represents the latent factors of users or items. The original matrix can then be approximated by their product. MF has since been extended to be time-aware and location-aware. Tensor Factorization (TF) (Xiong et al. 2010) treats time bins as another dimension and generates latent vectors of users, items and time bins via factorization, and timeSVD++ (Koren 2010; Koenigstein, Dror, and Koren 2011) extends SVD++ (Koren 2008) in the same way. Beyond temporal information, spatial information can also be modeled via factorization models, such as tensor factorization (Zheng et al. 2010a) and collective matrix factorization (Zheng et al. 2010b). Moreover, temporal and spatial information can be included in TF simultaneously as two separate dimensions for location prediction (Bahadori, Yu, and Liu 2014; Bhargava et al. 2015; Zhong et al. 2015). However, it is hard for factorization based models to generate latent representations of time bins that have never or seldom appeared in the training data. Thus, it is hard to predict future behaviors with factorization based models.

Neighborhood based models might be the most natural methods for prediction with both temporal and spatial contexts. Time-aware neighborhood based methods (Ding and Li 2005; Nasraoui et al. 2007; Lathia, Hailes, and Capra 2009; Liu et al. 2010) adapt the ordinary neighborhood based algorithms to temporal effects by giving more relevance to recent observations and less to past ones. For spatial information, the distance between locations is calculated and the prediction is made based on a power-law distribution (Ye et al. 2011) or the multi-center Gaussian model (Cheng et al. 2012). Recently, some work considers users' interest in the neighborhood of the destination location (Liu et al. 2014; Li et al. 2015), and the Personalized Ranking Metric Embedding (PRME) method (Feng et al. 2015) learns embeddings while calculating the distance between the destination location and recently visited ones. However, neighborhood based methods are unable to model the underlying properties in users' sequential behavior history.

As a commonly-used method for sequential prediction, Markov Chain (MC) based models aim to predict the next behavior of a user based on the past sequential behaviors. In these methods, an estimated transition matrix indicates the probability of a behavior given the past behaviors. Extending MC via factorization of the probability transition matrix, the Factorizing Personalized Markov Chain (FPMC) (Rendle, Freudenthaler, and Schmidt-Thieme 2010) has become a state-of-the-art method. It has also been extended by generating user groups (Natarajan, Shin, and Dhillon 2013), modeling the interest-forgetting curve (Chen, Wang, and Wang 2015) and capturing the dynamics of boredom (Kapoor et al. 2015). Recently, rather than merely modeling temporal information, FPMC has been successfully applied to spatial temporal prediction by using a location constraint (Cheng et al. 2013) or by combining it with general MC methods (Chen, Liu, and Yu 2014). However, FPMC assumes that all the components are linearly combined, which means it makes strong independence assumptions among factors (Wang et al. 2015).

Recently, Recurrent Neural Networks (RNN) have not only been successfully applied in word embedding for sentence modeling (Mikolov et al. 2010; 2011a; 2011b), but also show promising performance for sequential click prediction (Zhang et al. 2014). An RNN consists of an input layer, an output unit and multiple hidden layers. The hidden representation of an RNN can change dynamically along with a behavioral history, which makes it a suitable tool for modeling temporal information. However, in modeling sequential data, RNN assumes that the temporal dependency changes monotonically with the position in a sequence. This does not conform to some real situations, especially for the most recent elements in a historical sequence, which means that RNN cannot well model local temporal contexts. Moreover, though RNN shows promising performance compared with conventional methods in sequential prediction, it is not capable of modeling the continuous geographical distance between locations or the time interval between behaviors.

Here T_{t−t_i} denotes the time-specific transition matrix for the time interval t − t_i before the current time t. The matrix T_{t−t_i} captures the impact of elements in the most recent history and takes the continuous time interval into consideration.
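The role of the time-specific and distance-specific transition matrices described above can be illustrated with a small sketch. The exact recurrence form below is our assumption (reconstructed from the gradient equations later in the paper), and all variable names are illustrative:

```python
import numpy as np

# Minimal sketch of an ST-RNN-style hidden-state update. The recurrence
# form and all names here are assumptions, not the paper's actual code.
d = 4                                      # latent dimensionality
rng = np.random.default_rng(0)

C = rng.normal(scale=0.1, size=(d, d))     # recurrent weight matrix
T_dt = rng.normal(scale=0.1, size=(d, d))  # time-specific transition matrix T_{t-t_i}
S_dl = rng.normal(scale=0.1, size=(d, d))  # distance-specific transition matrix
q_i = rng.normal(size=d)                   # latent vector of a recently visited location
h_prev = np.zeros(d)                       # hidden state from the previous time window

# A recent location enters the hidden state through matrices chosen by its
# time interval and geographical distance, plus the recurrent connection.
h_t = np.tanh(S_dl @ T_dt @ q_i + C @ h_prev)
print(h_t.shape)  # (4,)
```

The key point is that, unlike a plain RNN with a single shared transition matrix, the matrices applied to each input depend on its continuous time interval and distance to the prediction target.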
Figure 1: Overview of the proposed ST-RNN model.

Linear Interpolation for Transition Matrices
If we learn a distinct matrix for each possible continuous time interval and geographical distance, the ST-RNN model will face the data sparsity problem. Therefore, we partition time intervals and geographical distances into discrete bins, and only the transition matrices for the upper and lower bounds of the corresponding bins are learned in our model. For a time interval in a time bin or a geographical distance in a distance bin, the transition matrix can be calculated via a linear interpolation. Mathematically, the time-specific transition matrix T_{t_d} for time interval t_d and the distance-specific transition matrix S_{l_d} for geographical distance l_d can be calculated as:

T_{t_d} = [T_{L(t_d)} (U(t_d) − t_d) + T_{U(t_d)} (t_d − L(t_d))] / [(U(t_d) − t_d) + (t_d − L(t_d))] ,  (7)

S_{l_d} = [S_{L(l_d)} (U(l_d) − l_d) + S_{U(l_d)} (l_d − L(l_d))] / [(U(l_d) − l_d) + (l_d − L(l_d))] ,  (8)

where U(t_d) and L(t_d) denote the upper and lower bounds of time interval t_d, and U(l_d) and L(l_d) denote the upper and lower bounds of geographical distance l_d, respectively. Such a linear interpolation method solves the problem of learning transition matrices for continuous values and provides a way to model the impact of continuous temporal and spatial contexts.
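The interpolation of Eqs. (7) and (8) can be sketched as follows; the uniform binning scheme keyed by integer boundary indices, and all names, are illustrative assumptions:

```python
import numpy as np

def interpolate_transition(td, bin_width, matrices):
    """Linear interpolation of a time-specific transition matrix (Eq. 7).

    `matrices[k]` is the learned matrix for the bin boundary at k * bin_width;
    the binning scheme and names here are illustrative assumptions.
    """
    lower = int(td // bin_width)          # index of the lower bound L(t_d)
    upper = lower + 1                     # index of the upper bound U(t_d)
    L, U = lower * bin_width, upper * bin_width
    # The closer t_d is to a boundary, the larger that boundary matrix's weight.
    # Note the denominator (U - td) + (td - L) simplifies to U - L.
    return (matrices[lower] * (U - td) + matrices[upper] * (td - L)) / (U - L)

# Tiny usage example with 1x1 "matrices" at the bin boundaries 0 and 1.
mats = {0: np.array([[0.0]]), 1: np.array([[1.0]])}
print(interpolate_transition(0.25, 1.0, mats))  # [[0.25]]
```

The same function shape applies to the distance-specific matrices S_{l_d} of Eq. (8), with distances in place of time intervals.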
Parameter Inference
In this subsection, we introduce the learning process of ST-RNN with Bayesian Personalized Ranking (BPR) (Rendle et al. 2009) and Back Propagation Through Time (BPTT) (Rumelhart, Hinton, and Williams 1988).

BPR (Rendle et al. 2009) is a state-of-the-art pairwise ranking framework for implicit feedback data. The basic assumption of BPR is that a user prefers a selected location over a negative one. Then, we need to maximize the following probability:

p(u, t, v ≻ v′) = g(o_{u,t,v} − o_{u,t,v′}) ,  (9)

where v′ denotes a negative location sample, and g(x) is a nonlinear function selected as g(x) = 1/(1 + e^{−x}). Incorporating the negative log likelihood, we can equivalently solve the following objective function:

J = Σ ln(1 + e^{−(o_{u,t,v} − o_{u,t,v′})}) + (λ/2) ‖Θ‖² ,  (10)

where Θ = {P, Q, S, T, C} denotes all the parameters to be estimated, and λ is a parameter controlling the strength of the regularization. The derivatives of J with respect to the parameters can be calculated as:

∂J/∂p_u = −(q_v − q_{v′}) e^{−(o_{u,t,v} − o_{u,t,v′})} / (1 + e^{−(o_{u,t,v} − o_{u,t,v′})}) + λ p_u ,

∂J/∂q_v = −(h^u_{t,q^u_t} + p_u) e^{−(o_{u,t,v} − o_{u,t,v′})} / (1 + e^{−(o_{u,t,v} − o_{u,t,v′})}) + λ q_v ,

∂J/∂q_{v′} = (h^u_{t,q^u_t} + p_u) e^{−(o_{u,t,v} − o_{u,t,v′})} / (1 + e^{−(o_{u,t,v} − o_{u,t,v′})}) + λ q_{v′} ,

∂J/∂h^u_{t,q^u_t} = −(q_v − q_{v′}) e^{−(o_{u,t,v} − o_{u,t,v′})} / (1 + e^{−(o_{u,t,v} − o_{u,t,v′})}) .

Moreover, the parameters in ST-RNN can be further learnt with the back propagation through time algorithm (Rumelhart, Hinton, and Williams 1988). Given the derivative ∂J/∂h^u_{t,q^u_t}, the corresponding gradients of all the parameters in the hidden layer can be calculated as:

∂J/∂h^u_{t−w,q^u_{t−w}} = C^T [f′(·) ⊗ ∂J/∂h^u_{t,q^u_t}] ,

∂J/∂C = [f′(·) ⊗ ∂J/∂h^u_{t,q^u_t}] (h^u_{t−w,q^u_{t−w}})^T ,

∂J/∂q^u_{t_i} = (S_{q^u_t − q^u_{t_i}} T_{t−t_i})^T [f′(·) ⊗ ∂J/∂h^u_{t,q^u_t}] ,

∂J/∂S_{q^u_t − q^u_{t_i}} = [f′(·) ⊗ ∂J/∂h^u_{t,q^u_t}] (T_{t−t_i} q^u_{t_i})^T ,

∂J/∂T_{t−t_i} = S^T_{q^u_t − q^u_{t_i}} [f′(·) ⊗ ∂J/∂h^u_{t,q^u_t}] (q^u_{t_i})^T .

After all the gradients are calculated, we can employ stochastic gradient descent to estimate the model parameters. This process is repeated iteratively until convergence is achieved.

Experimental Results and Analysis
In this section, we conduct empirical experiments to demonstrate the effectiveness of ST-RNN on next location prediction. We first introduce the datasets, baseline methods and evaluation metrics of our experiments. Then we compare our ST-RNN to the state-of-the-art baseline methods. The final part is the parameter selection and convergence analysis.
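Before turning to the experiments, one step of the BPR-based learning procedure from the previous section can be sketched for a single training triple. The score form o_{u,t,v} = (h + p_u)ᵀ q_v and all variable names are assumptions consistent with the gradients given above:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
p_u = rng.normal(size=d)     # user latent vector
q_v = rng.normal(size=d)     # positive (visited) location
q_n = rng.normal(size=d)     # sampled negative location v'
h = rng.normal(size=d)       # hidden representation h^u_t (held fixed here)
lam, eta = 0.01, 0.01        # regularization weight and learning rate

def loss(p_u, q_v, q_n):
    # BPR objective of Eq. (10) for one triple, with assumed score form.
    delta = (h + p_u) @ q_v - (h + p_u) @ q_n    # o_{u,t,v} - o_{u,t,v'}
    reg = 0.5 * lam * (p_u @ p_u + q_v @ q_v + q_n @ q_n)
    return np.log1p(np.exp(-delta)) + reg

# Shared factor e^{-delta} / (1 + e^{-delta}) appearing in every gradient.
delta = (h + p_u) @ q_v - (h + p_u) @ q_n
coef = np.exp(-delta) / (1.0 + np.exp(-delta))

g_p = -(q_v - q_n) * coef + lam * p_u            # dJ/dp_u
g_v = -(h + p_u) * coef + lam * q_v              # dJ/dq_v
g_n = (h + p_u) * coef + lam * q_n               # dJ/dq_{v'}

before = loss(p_u, q_v, q_n)
after = loss(p_u - eta * g_p, q_v - eta * g_v, q_n - eta * g_n)
# With a small step size, one SGD step decreases the objective on this triple.
```

In the full model the gradient with respect to h would additionally be propagated backwards through time with BPTT; this sketch only covers the pairwise ranking part.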
Table 1: Performance comparison on two datasets evaluated by recall@1, recall@5, recall@10, F1-score@1, F1-score@5, F1-score@10, MAP and AUC.

Figure 2: MAP performance of ST-RNN on the two datasets with varying dimensionality d, and convergence curves of ST-RNN on the two datasets measured by normalized recall and MAP. (a) Gowalla with w = 6h; (b) GTD with w = 3m; (c) Gowalla (w = 6h, d = 13); (d) GTD (w = 3m, d = 7).
• RNN (Zhang et al. 2014): This is a state-of-the-art method for temporal prediction, which has been successfully applied in word embedding and ad click prediction.

Analysis of Experimental Results
The performance comparison on the two datasets evaluated by recall, F1-score, MAP and AUC is illustrated in Table 1. MF and MC obtain similar performance improvements over TOP, and which of the two is better varies with the metric: MF cannot model temporal information, while MC cannot model collaborative information. Jointly modeling both kinds of information, TF slightly improves the results compared with MF and MC, but still cannot predict future behaviors well. FPMC improves the performance greatly compared with TF, and FPMC-LR and PRME achieve further improvements by incorporating distance information. Another great improvement is brought by RNN, which is the best method among the compared ones. Moreover, we can observe that ST-RNN outperforms the compared methods on both the Gowalla dataset and the GTD dataset under all the metrics. The MAP improvements over RNN are 12.71% and 24.54% respectively, while the AUC improvements are 3.04% and 6.54% respectively. These improvements indicate that our proposed ST-RNN can well model temporal and spatial contexts. The larger improvement on the GTD dataset shows that time interval and geographical distance information have a more significant impact on modeling terrorist organizations' behavior than on users' check-in behavior.

Analysis of Window Width and Dimensionality
Table 2 illustrates the performance of ST-RNN evaluated by recall, MAP and AUC with varying window widths, which provides a clue for parameter selection. The dimensionality is set to d = 13 and d = 7 respectively. On the Gowalla dataset, the best parameter is clearly w = 6h under all the metrics, and the performance with other window widths is still better than that of the compared methods. On the GTD dataset, the best recall@1 is obtained with w = 1m while the best performance on the other metrics is obtained with w = 3m, which indicates that a longer ranking list requires a larger window width. We select w = 3m as the parameter, and all the results still beat those of the compared methods. These results show that ST-RNN is not very sensitive to the window width.

To investigate the impact of dimensionality and select the best parameters for ST-RNN, we illustrate the MAP performance of ST-RNN on both datasets with varying dimensionality in Figures 2(a) and 2(b). The window width is set to w = 6h on the Gowalla dataset and w = 3m on the GTD dataset. It is clear that the performance of ST-RNN stays stable over a large range on both datasets, and the best parameters can be selected as d = 13 and d = 7 respectively. Moreover, even without the best dimensionality, ST-RNN still outperforms all the compared methods according to Table 1. In a word, these curves show that ST-RNN is not very sensitive to the dimensionality and can be well applied in practical applications.

Analysis of Convergence Rates
Figures 2(c) and 2(d) illustrate the convergence curves of ST-RNN on the Gowalla and GTD datasets evaluated by recall and MAP. To draw the curves of different metrics in one figure and compare their convergence rates, we calculate normalized values of the convergence results of recall@1, recall@5, recall@10 and MAP on both datasets. The normalized values are computed over the convergence procedure of each evaluation metric, which ensures the starting value is 0 and the final value is 1 for each convergence curve. From these curves, we can observe that ST-RNN converges in a satisfactory number of iterations. Moreover, on both datasets, it is obvious that the curves of recall@1 converge the fastest, followed by those of recall@5, while the results of recall@10 and MAP converge the slowest. From this observation, we can find that the more items one would like to output in the ranked list, the more iterations are needed in training.
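The normalization used for the convergence curves (shifting and scaling each curve so it starts at 0 and ends at 1) can be sketched as follows; the function and variable names are ours:

```python
import numpy as np

def normalize_curve(values):
    """Rescale a convergence curve so it starts at 0 and ends at 1.

    Mirrors the normalization described for the convergence plots;
    names and the sample data are illustrative.
    """
    v = np.asarray(values, dtype=float)
    return (v - v[0]) / (v[-1] - v[0])

map_curve = [0.10, 0.25, 0.28, 0.30]   # hypothetical MAP values over iterations
norm = normalize_curve(map_curve)      # -> [0.0, 0.75, 0.9, 1.0]
```

This lets curves with very different raw scales (e.g. recall@1 versus MAP) share one set of axes while preserving their relative convergence speed.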
Conclusion
In this paper, we have proposed a novel spatial temporal prediction method, i.e., ST-RNN. Instead of considering only one element in each layer of RNN, ST-RNN considers the elements in the local temporal contexts in each layer. In ST-RNN, to capture time interval and geographical distance information, we replace the single transition matrix in RNN with time-specific
transition matrices and distance-specific transition matrices. Moreover, a linear interpolation is applied for the training of transition matrices. The experimental results on real datasets show that ST-RNN outperforms the state-of-the-art methods and can well model spatial and temporal contexts.

Acknowledgments
This work is jointly supported by the National Basic Research Program of China (2012CB316300) and the National Natural Science Foundation of China (61403390, U1435221, 61420106015, 61525306).

References
Bahadori, M. T.; Yu, Q. R.; and Liu, Y. 2014. Fast multivariate spatio-temporal analysis via low rank tensor learning. In NIPS, 3491–3499.
Bhargava, P.; Phan, T.; Zhou, J.; and Lee, J. 2015. Who, what, when, and where: Multi-dimensional collaborative recommendations using tensor factorization on sparse user-generated data. In WWW, 130–140.
Chen, M.; Liu, Y.; and Yu, X. 2014. NLPMM: A next location predictor with Markov modeling. In PAKDD, 186–197.
Chen, J.; Wang, C.; and Wang, J. 2015. A personalized interest-forgetting Markov model for recommendations. In AAAI, 16–22.
Cheng, C.; Yang, H.; King, I.; and Lyu, M. R. 2012. Fused matrix factorization with geographical and social influence in location-based social networks. In AAAI, 17–23.
Cheng, C.; Yang, H.; Lyu, M. R.; and King, I. 2013. Where you like to go next: Successive point-of-interest recommendation. In IJCAI, 2605–2611.
Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: User movement in location-based social networks. In SIGKDD, 1082–1090.
Ding, Y., and Li, X. 2005. Time weight collaborative filtering. In CIKM, 485–492.
Feng, S.; Li, X.; Zeng, Y.; Cong, G.; Chee, Y. M.; and Yuan, Q. 2015. Personalized ranking metric embedding for next new POI recommendation. In IJCAI, 2069–2075.
Kapoor, K.; Subbian, K.; Srivastava, J.; and Schrater, P. 2015. Just in time recommendations: Modeling the dynamics of boredom in activity streams. In WSDM, 233–242.
Koenigstein, N.; Dror, G.; and Koren, Y. 2011. Yahoo! music recommendations: Modeling music ratings with temporal dynamics and item taxonomy. In RecSys, 165–172.
Koren, Y.; Bell, R.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.
Koren, Y. 2008. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In SIGKDD, 426–434.
Koren, Y. 2010. Collaborative filtering with temporal dynamics. Communications of the ACM 53(4):89–97.
Lathia, N.; Hailes, S.; and Capra, L. 2009. Temporal collaborative filtering with adaptive neighbourhoods. In SIGIR, 796–797.
Li, X.; Cong, G.; Li, X.-L.; Pham, T.-A. N.; and Krishnaswamy, S. 2015. Rank-GeoFM: A ranking based geographical factorization method for point of interest recommendation. In SIGIR, 433–442.
Liu, N. N.; Zhao, M.; Xiang, E.; and Yang, Q. 2010. Online evolutionary collaborative filtering. In RecSys, 95–102.
Liu, Y.; Wei, W.; Sun, A.; and Miao, C. 2014. Exploiting geographical neighborhood characteristics for location recommendation. In CIKM, 739–748.
Mikolov, T.; Karafiát, M.; Burget, L.; Cernockỳ, J.; and Khudanpur, S. 2010. Recurrent neural network based language model. In INTERSPEECH, 1045–1048.
Mikolov, T.; Kombrink, S.; Burget, L.; Cernocky, J. H.; and Khudanpur, S. 2011a. Extensions of recurrent neural network language model. In ICASSP, 5528–5531.
Mikolov, T.; Kombrink, S.; Deoras, A.; Burget, L.; and Cernocky, J. 2011b. RNNLM - recurrent neural network language modeling toolkit. In ASRU Workshop, 196–201.
Mnih, A., and Salakhutdinov, R. 2007. Probabilistic matrix factorization. In NIPS, 1257–1264.
Nasraoui, O.; Cerwinske, J.; Rojas, C.; and González, F. A. 2007. Performance of recommendation systems in dynamic streaming environments. In SDM, 569–574.
Natarajan, N.; Shin, D.; and Dhillon, I. S. 2013. Which app will you use next? Collaborative filtering with interactional context. In RecSys, 201–208.
Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 452–461.
Rendle, S.; Freudenthaler, C.; and Schmidt-Thieme, L. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW, 811–820.
Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1988. Learning representations by back-propagating errors. Cognitive Modeling 5:3.
Wang, P.; Guo, J.; Lan, Y.; Xu, J.; Wan, S.; and Cheng, X. 2015. Learning hierarchical representation model for next basket recommendation. In SIGIR, 403–412.
Xiong, L.; Chen, X.; Huang, T.-K.; Schneider, J. G.; and Carbonell, J. G. 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In SDM, 211–222.
Ye, M.; Yin, P.; Lee, W.-C.; and Lee, D.-L. 2011. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR, 325–334.
Zhang, Y.; Dai, H.; Xu, C.; Feng, J.; Wang, T.; Bian, J.; Wang, B.; and Liu, T.-Y. 2014. Sequential click prediction for sponsored search with recurrent neural networks. In AAAI, 1369–1376.
Zheng, V. W.; Cao, B.; Zheng, Y.; Xie, X.; and Yang, Q. 2010a. Collaborative filtering meets mobile recommendation: A user-centered approach. In AAAI, 236–242.
Zheng, V. W.; Zheng, Y.; Xie, X.; and Yang, Q. 2010b. Collaborative location and activity recommendations with GPS history data. In WWW, 1029–1038.
Zhong, Y.; Yuan, N. J.; Zhong, W.; Zhang, F.; and Xie, X. 2015. You are where you go: Inferring demographic attributes from location check-ins. In WSDM, 295–304.