1 Introduction

Points of Interest (POIs) are specific locations that are notable or useful to individuals, often highlighted in maps and navigation systems for informational or navigational purposes. Location-based social networks (LBSNs), e.g., Foursquare, Instagram and Yelp, have gained global popularity as platforms where users can share their experiences at various POIs. The extensive data generated from these LBSNs enable the delivery of personalized location-aware services to users. POI recommendation systems have attracted significant attention due to their potential to enhance user engagement and support targeted advertising for businesses.

As user preferences are dynamically changing, POI recommendation is often a time-critical task, meaning that the right place should be presented to the user at the right time. Previously, systems have been developed for recommending the next new POI to visit in a short period [1,2,3,4,5,6,7,8,9,10]. However, these next POI recommenders suffer from several drawbacks: (1) Users occasionally visit multiple POIs in a short period [11, 12], so recommending only the next one may not be sufficient; (2) Users may not plan their visits in a short period step by step [13], e.g., arbitrarily visiting a group of stores to compare products, so it is more meaningful to recommend all places the user may visit in this period; (3) Even though next POI recommenders may iteratively predict a sequence of POIs, the resulting recommendations lose focus on the time horizon.

In light of this, we formulate a novel task, namely short-term POI recommendation, to recommend a set of new POIs for the user to visit in a short period following the user’s recent check-ins. Short-term POI recommendation poses numerous challenges. First, we need to model not only direct (first-order) transitions but also indirect (high-order) transitions in a short period. In Fig. 1, we recommend to \(u_1\) new POIs d and e for the next six hours, based on recent check-ins on b and c, direct transitions \(c\rightarrow d\) observed from \(u_2\) and \(u_4\), \(b\rightarrow e\) from \(u_3\), and indirect transition \(b\rightarrow d\) from \(u_4\). We would otherwise miss d if we considered only direct transitions from the last check-in on b. Second, we should consider the temporal context of transitions. In this example, we would rank d before e since both \(b\rightarrow d\) and \(c\rightarrow d\) have smaller time spans than \(b\rightarrow e\), suggesting that d is more likely to be visited within six hours. Third, we need to learn users’ personal preferences. For instance, \(u_1\)’s preferences are more similar to \(u_4\)’s than to \(u_3\)’s because of more common check-ins. Besides, check-ins are sparse and non-uniform, and the scarcity of POI transitions within the time horizon makes models prone to overfitting training examples. In addition, check-ins are implicit feedback [5, 14,15,16] that contains much noise.

Fig. 1: Illustration of short-term POI recommendation

To our knowledge, few existing works address all these challenges. Time-aware POI recommenders [16,17,18] do not utilize sequential information. Next POI recommenders [1,2,3,4,5,6,7,8,9,10] focus only on the next POI in a short period. Mobility prediction models, such as refs. [13, 19,20,21,22,23], are not effective in recommending a set of new POIs over a short period. Session-based (aware) recommenders [15, 24,25,26] do not consider the temporal context of transitions.

In this paper, we propose a simple but effective Personalized Time-Weighted Latent Ranking (PTWLR) model that jointly learns users’ short-term and general POI preferences by factorizing POI transitions and explicitly models both direct and indirect POI transitions within a specific time horizon. To capture the temporal context of transitions, we introduce a temporal weighting scheme that adapts training weights based on transition time. We extend our model to accommodate the transition dependencies on multiple recent check-ins and the geographical context. Experiments on real-world LBSN datasets show that our model achieves over 14% overall performance gain over seven baseline methods across different metrics and yields the best results in various contexts. Notably, our model has superior performance for users of different activity levels and adapts to varying time horizons.

We summarize our contributions below:

  • We formulate the task of short-term POI recommendation and perform data analysis.

  • We propose a Personalized Time-Weighted Latent Ranking (PTWLR) model that jointly learns POI transitions in a time horizon and user preferences, together with a novel temporal weighting scheme for training. We extend it with the transition dependencies on multiple recent check-ins and the geographical context.

  • We evaluate our model on real-world LBSN datasets against seven widely used baselines under varying time horizon settings and perform subgroup analysis. We conduct ablation study and parameter effect analysis to validate the effectiveness of proposed components.

2 Related work

2.1 POI recommendation

POI recommendation can take memory-based or model-based approaches. On memory-based approaches, Ye et al. [27] report that user-based collaborative filtering (CF) outperforms item-based CF because POIs may have insufficient check-ins for similarity measurement. As such, they propose USG, a user-based CF model that fuses geographical and social influences. Sang et al. [11] propose a context-based CF model for sequential POI recommendation. Yuan et al. [18] investigate the modeling of temporal and geographical influences. Zhang et al. [28] exploit the transition dependencies on historical check-ins using an additive Markov chain. Nonetheless, memory-based recommenders cannot handle large-scale data and are susceptible to data sparsity.

On model-based approaches, latent factor models have been widely adopted in POI recommendation; they approximate the data with low-rank latent matrices and can be extended to incorporate context such as sequential [2,3,4,5,6, 29], temporal [2,3,4, 6, 16, 17], geographical [16, 29, 30], categorical [4,5,6], and social [2, 30] information. Rank-GeoFM [16] captures the geographical influences from neighboring POIs and the temporal influences from similar time slots. Other model-based approaches include metric embedding [1, 7, 31], probabilistic graphical models [32, 33], and neural networks [8,9,10, 34].

Various context information has been exploited in POI recommendation. Geographical influences can be modeled using power-law distributions [5, 16, 18, 27], Gaussian mixture models [19, 30, 32], or kernel density estimation [28], among which power-law distributions are empirically shown to be the most effective approach [35]. POI categories are considered in [4,5,6, 11, 32]. Social influences are considered in [2, 27, 28, 30], but Ye et al. [27] observe that they play a minor role because friends could have different tastes.

2.2 Temporal context-aware recommendation

Temporal context-aware recommenders capture users’ dynamically changing preferences, in addition to long-term preferences. Time-aware POI recommenders [16,17,18] model users’ temporal preferences in disjoint periodical time slots, e.g., hours and days. Nonetheless, as the time slots are fixed and sequential information is not utilized, they are unsuitable for short-term POI recommendation. On sequential recommenders, FPMC [36] jointly learns users’ general and sequential preferences to recommend the next basket of items based on recent purchases. FPMC-LR [29] incorporates a localized region constraint to recommend the next set of POIs, but it omits indirect POI transitions and transition time. Session-based (aware) recommenders [15, 24,25,26] predict items stepwise in a session, but they disregard the temporal context as the duration of sessions varies.

Next POI recommenders [1,2,3,4,5,6,7,8,9,10] focus on the next new POI a user may visit in a short period. PRME [1] embeds user-POI and POI sequential relations in Euclidean spaces. HME [7] embeds those relations in a Poincaré ball to better represent hierarchical structures. PRME and HME preserve high-order sequential relations since sequentially related POIs are embedded in proximity in the latent space, but the directions of sequential relations are ignored, so origin and destination POIs cannot be distinguished. Methods in refs. [2, 3, 6] integrate users’ temporal and sequential preferences, but they are trained with only direct POI transitions and consider only dependencies on the last POI. Since these models assume the next check-in time is available as a timestamp, they are not useful for short-term POI recommendation. TLR-M_UI [8] leverages transformer encoder-decoders to simultaneously predict the next POI and queuing time. GETNext [9] and STHGCN [10] are transformer-based models that adopt graphical representations of trajectories. These models are unsuitable for our task since they focus on predicting the next POI in a trajectory rather than a set of POIs over a short period.

High-order sequential relations are exploited in conjunction with transition time in some works. Zhao et al. [37] propose a utility-based recommender that models time gaps between the next purchase and historical purchases. Wang and Zhang [38] propose a conditional opportunity model to estimate the time of follow-up purchases. Liu et al. [33] adopt factorized probabilistic graphs for POI recommendation over future periods, but the performance over consecutive short periods is not strong. Mobility prediction models in refs. [13, 20, 22] consider spatio-temporal intervals between check-ins. Work by Chen et al. [23] simultaneously predicts the location and time of the next check-in but treats them as separate outputs. TiSASRec [39] is an attention-based model that embeds time intervals between purchases. These models focus on predicting the next location or item. Differently, we exploit high-order POI transitions and transition time specifically for short-term POI recommendation. Our model predicts the set of new POIs a user may visit in a consecutive short period based on the user’s recent check-ins and POI transitions within the time horizon.

2.3 Ranking-based optimization

Ranking-based optimization objectives measure the ranking quality of results, which work well on implicit feedback and sparse data because they utilize both positive and negative training examples. Bayesian Personalized Ranking (BPR) [14] optimizes the pairwise rank of positive and negative examples. Weighted Approximate Rank Pairwise (WARP) [40, 41], initially developed for information retrieval, computes loss on the rank of positive examples with a diminishing marginal penalty for losing a rank. A stochastic gradient descent (SGD) optimization algorithm for WARP is proposed in Weston et al. [41], and a truncated sampling scheme is presented in Lim and Lanckriet [42]. Check-in frequencies have been exploited in WARP for POI recommendation in refs. [16, 43]. Different from existing work, we consider the time horizon effect in our optimization objective.

3 Data description and analysis

3.1 Data description

We use two real-world LBSN datasets published in Liu et al. [35]. One contains Foursquare data in the US from April 2012 to September 2013 [44]; the other contains Gowalla data world-wide from February 2009 to October 2010 [19]. Each check-in is represented as a tuple (user, POI, time) and assumed to record the user’s actual location. In Foursquare, users and POIs with fewer than 10 check-ins are filtered; in Gowalla, users with fewer than 15 check-ins and POIs with fewer than 10 check-ins are filtered. After ordering check-ins by time, we merge successive check-ins of the same POI within 60 s, which are assumed to be duplicates. Table 1 summarizes the statistics of both datasets after pre-processing.
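As a concrete illustration of this pre-processing, the following sketch (a hypothetical helper, not the published pipeline) filters low-activity users and POIs and merges duplicate check-ins, assuming check-ins are stored as (user, POI, timestamp) tuples with timestamps in seconds:

```python
from collections import Counter

def preprocess(checkins, min_user=10, min_poi=10, dedup_window=60):
    """checkins: list of (user, poi, timestamp) tuples; timestamps in seconds."""
    # Filter users and POIs with too few check-ins (single pass, for illustration).
    users = Counter(u for u, _, _ in checkins)
    pois = Counter(l for _, l, _ in checkins)
    checkins = [(u, l, t) for u, l, t in checkins
                if users[u] >= min_user and pois[l] >= min_poi]
    # Order by user and time, then merge successive check-ins of the same POI
    # occurring within `dedup_window` seconds (assumed duplicates).
    checkins.sort(key=lambda s: (s[0], s[2]))
    cleaned = []
    for u, l, t in checkins:
        if cleaned and cleaned[-1][0] == u and cleaned[-1][1] == l \
                and t - cleaned[-1][2] <= dedup_window:
            continue
        cleaned.append((u, l, t))
    return cleaned
```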

Table 1 Statistics of datasets

3.2 Data analysis

3.2.1 POI check-in

Figure 2a shows the distributions on the number of check-ins by users. Our datasets are sparse, so most users fall in the \(10-50\) group. Both datasets have wide distributions, indicating users’ different activity levels, and Gowalla has a longer tail due to higher density. We will evaluate the performance for different user groups in Sect. 6.2.3.

Figure 2b plots the complementary cumulative distribution functions (CCDFs) of users’ exploration ratio on new POIs. Starting with a user’s first 70% of check-ins as the history, we incrementally count new POI check-ins and update the history. On both datasets, over 70% of users visit new POIs at least half of the time, showing their persistent impetus for exploration [1, 12, 29]. Thus, it is meaningful to study new POI recommendation.
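The exploration ratio itself is straightforward to reproduce; the sketch below (an illustrative helper, not the original analysis script) starts from a user’s first 70% of check-ins as the history and counts subsequent check-ins on unseen POIs, updating the history after each step:

```python
def exploration_ratio(user_checkins, history_frac=0.7):
    """user_checkins: time-ordered list of (user, poi, time) tuples for one user."""
    split = int(len(user_checkins) * history_frac)
    history = {l for _, l, _ in user_checkins[:split]}
    new_visits = 0
    future = user_checkins[split:]
    for _, l, _ in future:
        if l not in history:
            new_visits += 1
        history.add(l)  # incrementally update the history
    return new_visits / len(future) if future else 0.0
```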

Fig. 2: POI check-in distributions

3.2.2 Short-term POI visit

Figure 3 shows the distributions on the size of non-empty check-in sets within 3, 6, and 12 h after a check-in. A substantial proportion of sets contain multiple POIs, suggesting that users tend to visit multiple POIs in a short time [11, 12]. On Gowalla, this proportion is over 50% under any time horizon setting. Hence, it is meaningful to recommend a set of POIs the user may visit in a consecutive short period. We will evaluate the performance on check-in sets of different sizes in Sect. 6.2.2.

Fig. 3: Distributions on the size of non-empty check-in sets in consecutive periods

3.2.3 Transition time

Figure 4a plots the CCDFs of time between consecutive check-ins. The proportions of consecutive check-in pairs within 3, 6, and 12 h are 18.24%, 23.88%, and 29.14% on Foursquare, and 37.94%, 45.63%, and 51.56% on Gowalla, respectively, showing that short-term POI transitions are scarcer on Foursquare. Figure 4b plots the CCDFs of transition time less than 6 h, which we observe to roughly follow exponential distributions.

Fig. 4: Distributions on transition time

4 Task formulation

Let \(\mathcal {U}\) be the set of users and \(\mathcal {L}\) be the set of POIs. Each check-in s is a tuple \((u, l, t)\) with a user \(u\in \mathcal {U}\), a POI \(l\in \mathcal {L}\), and time t. Let \(\mathcal {S}_u\) be the set of check-ins by a user and \(\mathcal {L}_u\) be the set of POIs visited by a user. We use \(\mathcal {S}^c_u\) to denote the current set of check-ins and \(\mathcal {L}^c_u\) to denote the current set of POIs visited by a user. We denote the time difference of a check-in pair \((s_i,s_j)\) with \(\delta t_{s_i,s_j}:=t_j-t_i\), where \(s_i = (u_i,l_i,t_i)\) and \(s_j=(u_j,l_j,t_j)\). We formulate our task as follows:

Definition 1

(Short-Term POI Recommendation) Given the current check-in \(s^c=(u^c,l^c,t^c)\) and the current set of check-ins \(\mathcal {S}_{u^c}^c\) by a user, we recommend a set of new POIs \(\mathcal {D}_{s^c,\tau }\subseteq \mathcal {L}{\setminus } \mathcal {L}_{u^c}^c\) that the user would visit from \(l^c\) in the time interval \((t^c,t^c+\tau ]\), where \(\tau\) is the time horizon.

We define some fundamental concepts:

Definition 2

(Relevant and Irrelevant POIs) For a time horizon \(\tau\), the set of relevant POIs from a check-in \(s_i=(u_i,l_i,t_i)\) is \(\mathcal {L}_{s_i,\tau }^+:=\{l_j\mid s_j\in \mathcal {S}_{u_i}\wedge 0<\delta t_{s_i,s_j}\le \tau \}\), i.e., POIs visited by the user in \(\tau\). Note that \(\mathcal {D}_{s_i,\tau }\subseteq \mathcal {L}_{s_i,\tau }^+\). The set of irrelevant POIs is \(\mathcal {L}_{s_i,\tau }^-:=\mathcal {L}{\setminus } \bigcup _{s_j\in \mathcal {S}_{u_i},l_j=l_i}\mathcal {L}_{s_j,\tau }^+\), i.e., POIs not included in any relevant set from a check-in on \(l_i\) by the user.

Definition 3

(Transitions) The set of transitions of a user is \(\mathcal {T}_u:=\{(s_i,s_j)\mid s_i, s_j\in \mathcal {S}_u\wedge \delta t_{s_i,s_j}>0\}\), i.e., check-in pairs of a user with positive time difference. We define \(\mathcal {T}:=\bigcup _{u\in \mathcal {U}}\mathcal {T}_u\).

Definition 4

(Temporal and Non-temporal Transitions) For a time horizon \(\tau\), the set of temporal transitions of a user is \(\mathcal {T}_{u,\tau }^T:=\{(s_i,s_j)\in \mathcal {T}_u\mid \delta t_{s_i,s_j}\le \tau \}\), i.e., transitions with time no greater than \(\tau\). The set of non-temporal transitions is \(\mathcal {T}_{u,\tau }^{\lnot T}:=\mathcal {T}_u{\setminus }\mathcal {T}_{u,\tau }^T\). We define \(\mathcal {T}_\tau ^T:=\bigcup _{u\in \mathcal {U}} \mathcal {T}_{u,\tau }^T\) and \(\mathcal {T}_\tau ^{\lnot T}:=\bigcup _{u\in \mathcal {U}} \mathcal {T}_{u,\tau }^{\lnot T}\).

Definition 5

(Direct and Indirect Transitions) The set of direct transitions of a user is \(\mathcal {T}_u^D:=\{(s_i,s_j)\in \mathcal {T}_u\mid \delta t_{s_i,s_j}=\min \{\delta t_{s_i,s_k}\mid (s_i,s_k)\in \mathcal {T}_u\}\}\), i.e., consecutive check-in pairs. The set of indirect transitions is \(\mathcal {T}_u^{\lnot D}:=\mathcal {T}_u{\setminus }\mathcal {T}_u^D\). We define \(\mathcal {T}^D:=\bigcup _{u\in \mathcal {U}} \mathcal {T}_u^D\) and \(\mathcal {T}^{\lnot D}:=\bigcup _{u\in \mathcal {U}}\mathcal {T}_u^{\lnot D}\).

In Fig. 5, \(a\rightarrow b\) and \(a\rightarrow c\) are temporal transitions, while \(a\rightarrow d\) is a non-temporal transition; \(a\rightarrow b\) is a direct transition, while \(a \rightarrow c\) and \(a\rightarrow d\) are indirect transitions. Note that from a check-in \(s_i\), any temporal transition directs to some relevant POI in \(\mathcal {L}_{s_i,\tau }^+\), while a non-temporal transition does not necessarily direct to any relevant POI. Table 2 summarizes our notations.

Fig. 5: Illustration of transitions

Table 2 List of notations

5 Proposed method

For short-term POI recommendation, we propose a ranking-based model that jointly learns short-term direct and indirect POI transitions along with user preferences. We also develop a temporal weighting scheme to capture the temporal context of transitions. In this section, we describe our model and training examples, explain our optimization objective and training algorithm, and propose extensions to incorporate extra context.

5.1 Personalized latent ranking framework

Given the current check-in \(s^c=(u^c,l^c,t^c)\) and check-in set \(\mathcal {S}_{u^c}^c\) of a user, we learn a function \(f_\tau (s^c,\mathcal {S}_{u^c}^c,l)\) that scores the user’s visiting preference on a new POI l in a time horizon \(\tau\). We start by considering only the user \(u^c\) and the current POI \(l^c\) and model user-POI and POI-POI relations with low-dimensional latent matrices \(\textbf{V}^{U,I}\in \mathbb {R}^{|\mathcal {U}|\times K}\), \(\textbf{V}^{I,U}\in \mathbb {R}^{|\mathcal {L}|\times K}\), \(\textbf{V}^{L,I}\in \mathbb {R}^{|\mathcal {L}|\times K}\), and \(\textbf{V}^{I,L}\in \mathbb {R}^{|\mathcal {L}|\times K}\). Our simplified scoring function is expressed as:

$$\begin{aligned} g_\tau (u^c,l^c,l)=\textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}+\textbf{v}_{l^c}^{L,I}\cdot \textbf{v}_l^{I,L}, \end{aligned}$$
(1)

where the first term measures user preferences and the second term measures transition influences from the current POI.

When constructing training examples \(D_\mathcal {T}\) for our model, we categorize observed transitions w.r.t. Definitions 3, 4, 5. We learn POI transition relations from temporal transitions \(\mathcal {T}_\tau ^T\) because they direct to relevant POIs within the time horizon of our task (Definition 2). Since direct transitions \(\mathcal {T}^D\) contain the target POI of any indirect transition, we learn user preferences from direct transitions only. While non-temporal indirect transitions are not used in training, temporal indirect transitions \(\mathcal {T}^{T}_\tau \bigcap \mathcal {T}^{\lnot D}\) contain high-order POI transition relations useful for our task, which first-order models fail to capture [2,3,4, 29, 36]. Our augmentation of training examples with indirect transitions also alleviates data sparsity and improves the generalization performance of our model. We include temporal indirect transitions to at most \(\Delta\) check-ins following a direct transition, such that the size of training examples is \(O((\Delta +1)|\mathcal {T}^D|)\), i.e., linear in the number of direct transitions. The step limit \(\Delta\) preserves training efficiency and avoids data imbalance caused by dense check-in clusters.
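A minimal sketch of this construction is given below. It assumes each user’s check-ins are time-ordered (user, POI, time) tuples and that transition time and the time horizon \(\tau\) share the same unit; the tuple layout of the emitted examples is an illustrative choice rather than the authors’ exact implementation:

```python
def build_training_examples(user_checkins, tau, delta=2):
    """user_checkins: time-ordered (user, poi, time) tuples for one user.
    Emits (source, target, dt, is_temporal, is_direct) training examples."""
    examples = []
    for i in range(len(user_checkins) - 1):
        # step 1 is the direct transition; steps 2..delta+1 are indirect transitions
        # to at most `delta` check-ins following the direct one.
        for step in range(1, delta + 2):
            j = i + step
            if j >= len(user_checkins):
                break
            dt = user_checkins[j][2] - user_checkins[i][2]
            if dt <= 0:
                continue
            is_direct = step == 1
            is_temporal = dt <= tau
            # Non-temporal indirect transitions are not used in training.
            if not is_temporal and not is_direct:
                continue
            examples.append((user_checkins[i], user_checkins[j], dt,
                             is_temporal, is_direct))
    return examples
```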

During training, examples of different transition types are scored with a group of functions derived from Eq. (1):

$$\begin{aligned} g_\tau (u^c,l^c,l)= {\left\{ \begin{array}{ll} \textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}+\textbf{v}_{l^c}^{L,I}\cdot \textbf{v}_l^{I,L}&{}\text {if }(s^c,s)\in \mathcal {T}^{T}_\tau \bigcap \mathcal {T}^D,\\ 2(\textbf{v}_{l^c}^{L,I}\cdot \textbf{v}_l^{I,L}) &{}\text {if }(s^c,s)\in \mathcal {T}^{T}_\tau \bigcap \mathcal {T}^{\lnot D},\\ 2(\textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}) &{}\text {if }(s^c,s)\in \mathcal {T}^{\lnot T}_\tau \bigcap \mathcal {T}^{D}. \end{array}\right. } \end{aligned}$$
(2)

The scores in cases two and three are doubled to be on the same scale as case one. With this formulation, \(\textbf{V}^{U,I}\), \(\textbf{V}^{I,U}\) learn user general POI preferences, and \(\textbf{V}^{L,I}\), \(\textbf{V}^{I,L}\) learn first- and high-order temporal transition influences. This explains how our framework collectively exploits different types of transitions for training.
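The case-split scoring of Eqs. (1)–(2) can be sketched with four NumPy matrices named after the notation above; the initialization scale and class layout are assumptions made for illustration:

```python
import numpy as np

class PTWLRScorer:
    def __init__(self, n_users, n_pois, K=100, scale=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.V_UI = rng.normal(0, scale, (n_users, K))  # user -> target POI
        self.V_IU = rng.normal(0, scale, (n_pois, K))
        self.V_LI = rng.normal(0, scale, (n_pois, K))   # source POI -> target POI
        self.V_IL = rng.normal(0, scale, (n_pois, K))

    def score(self, u, lc, l, is_temporal=True, is_direct=True):
        user_term = self.V_UI[u] @ self.V_IU[l]
        trans_term = self.V_LI[lc] @ self.V_IL[l]
        if is_temporal and is_direct:        # Eq. (2), case 1
            return user_term + trans_term
        if is_temporal and not is_direct:    # case 2: transition term only, doubled
            return 2.0 * trans_term
        return 2.0 * user_term               # case 3: non-temporal direct, doubled
```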

5.2 Time-weighted ranking optimization

We train our model by optimizing the rank of target POIs. For a training example \((s^c,s)\in D_\mathcal {T}\), the margin-penalized rank induced by the scoring function \(g_\tau\) is computed as:

$$\begin{aligned} rank_{g_\tau }(u^c,l^c,l)=\displaystyle \sum _{l'\in \mathcal {L}^{neg}}\mathbb I[g_\tau (u^c,l^c,l)<g_\tau (u^c,l^c,l')+\epsilon ], \end{aligned}$$
(3)

where \({\mathbb {I}}(\cdot )\) is the indicator function, \(l'\) is a negative POI example, and \(\epsilon\) is the margin. When the training example \((s^c,s)\) is a temporal transition, we consider the set of irrelevant POIs \(\mathcal {L}_{s^c,\tau }^-\) (Definition 2) as negative POIs; otherwise we consider the set of unvisited POIs \(\mathcal {L}\setminus \mathcal {L}_{u^c}\) as negative POIs. Basically, the margin-penalized rank counts misranked negative POIs whose score is greater than the target POI’s score after adding the margin. A loss is derived from the rank through an ordered weighted averaging operator:

$$\begin{aligned} L(rank)=\displaystyle \sum _{i=1}^{rank}\alpha _i,~~~\text {with }\alpha _1\ge \alpha _2\ge \ldots \ge 0. \end{aligned}$$
(4)

Since the loss increment reduces with the rank, the loss emphasizes ranks at top positions. We set \(\alpha _i\) to \(\frac{1}{i}\) following refs. [40, 41].
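For reference, with \(\alpha _i=\frac{1}{i}\) the loss in Eq. (4) is simply a truncated harmonic sum:

```python
def warp_loss(rank):
    """Eq. (4) with alpha_i = 1/i: the loss grows with diminishing increments."""
    return sum(1.0 / i for i in range(1, rank + 1))
```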

To strengthen the time horizon effect of our model, we further incorporate transition time in training. We note that under our task settings, the importance of training examples varies. For example, a shorter transition time indicates that the user has a stronger tendency to visit the target POI in the short period. We therefore introduce a factor \(w_{\delta t}\) to weight training examples based on their transition time. From Sect. 3.2.3, we approximate the CCDF of short-term transition time \(\mathbb {P}(T\ge \delta t_{s^c,s})\) with an exponential decay factor \(e^{-k\cdot \delta t_{s^c,s}}\) and combine it with a constant \(\beta\) that lower-bounds the weight of non-temporal transition examples. The weight factor is computed as:

$$\begin{aligned} w_{\delta t}= {\left\{ \begin{array}{ll} (1-\beta )e^{-k\cdot \delta t_{s^c,s}}+\beta &{}\text {if }(s^c,s)\in \mathcal {T}^{T}_\tau \bigcap \mathcal {T}^D, \\ \frac{1}{2}\left[ (1-\beta )e^{-k\cdot \delta t_{s^c,s}}+\beta \right] &{}\text {if }(s^c,s)\in \mathcal {T}^{T}_\tau \bigcap \mathcal {T}^{\lnot D}, \\ \frac{1}{2}\beta &{}\text {if }(s^c,s)\in \mathcal {T}^{\lnot T}_\tau \bigcap \mathcal {T}^{D}. \end{array}\right. } \end{aligned}$$
(5)

The factors in cases two and three are halved to counteract the scaling-up of scores in Eq. (2). Our method generalizes standard training strategies: with \(k=0\), temporal transitions are uniformly weighted; with \(\beta =1\), all transitions are uniformly weighted; with \(\beta =0\), only temporal transitions are used for training. With \(k>0\) and \(0<\beta <1\), non-temporal transitions have the minimum weight, and temporal transitions receive additional temporal weight. Besides, indirect transitions following a direct transition have a greater time span and thus less weight than the direct transition. By differentiating the importance of training examples, our temporal weighting scheme assists the joint training with different types of transitions. We minimize the following optimization objective in training:

$$\begin{aligned} \mathcal {O}=\displaystyle \sum _{(s^c,s)\in D_\mathcal {T}}w_{\delta t}\,L[rank_{g_\tau }(u^c,l^c,l)], \end{aligned}$$
(6)

which sums the time-weighted ranking loss.
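The weight factor of Eq. (5) that enters this objective can be written as a small helper; the sketch below assumes transition time is measured in hours, consistent with the parameter choices \(k=0.25\) and \(\beta =0.5\) reported in Sect. 6.1.1:

```python
import math

def temporal_weight(dt, is_temporal, is_direct, k=0.25, beta=0.5):
    """Eq. (5): weight of a training example given its transition time dt (in hours)."""
    decayed = (1 - beta) * math.exp(-k * dt) + beta
    if is_temporal and is_direct:
        return decayed
    if is_temporal and not is_direct:
        return 0.5 * decayed        # halved to counteract the doubled score
    return 0.5 * beta               # non-temporal direct transition
```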

To recommend new POIs to a user, it is insufficient to optimize the ranking loss on training data. For better generalization performance, we constrain \(\textbf{v}_{u^c}^{U,I}\), \(\textbf{v}_l^{I,U}\) in a ball of radius \(\alpha C\) and \(\textbf{v}_{l^c}^{L,I}\), \(\textbf{v}_{l}^{I,L}\) in a ball of radius C, which bounds \(\textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}\) in \([-\alpha ^2 C^2,\alpha ^2 C^2]\) and \(\textbf{v}_{l^c}^{L,I}\cdot \textbf{v}_l^{I,L}\) in \([-C^2,C^2]\). The importance of user preference score \(\textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}\) can be controlled with \(\alpha\). We also apply a weighted \(L_2\) regularization term in our objective:

$$\begin{aligned} \mathcal {O}=\displaystyle \sum _{(s^c,s)\in D_{\mathcal {T}}}w_{\delta t}\left[ L[rank_{g_\tau }(u^c,l^c,l)]+\lambda \Vert \Theta \Vert _F^2\right] . \end{aligned}$$
(7)

In summary, we develop a personalized latent ranking framework where our model learns short-term POI transition influences and user preferences by joint training with different types of transitions. Our proposed temporal weighting scheme captures the temporal context of transitions and enhances the training. We apply regularization to improve the performance. We name our model Personalized Time-Weighted Latent Ranking (PTWLR).

5.3 Parameter learning

The parameters \(\Theta =\{\textbf{V}^{U,I},\textbf{V}^{I,U},\textbf{V}^{L,I},\textbf{V}^{I,L}\}\) of PTWLR can be efficiently learnt with an SGD-based algorithm. Since the ranking loss in Eq. (4) is not continuous, we apply a continuous approximation [16]. When \(rank_{g_\tau }(u^c,l^c,l)>0\), we have:

$$\begin{aligned} L[rank_{g_\tau }(u^c,l^c,l)] \!\!=&L[rank_{g_\tau }(u^c,l^c,l)]\,\frac{\sum _{l'\in \mathcal {L}^{neg}}{\mathbb {I}}(g_l<g_{l'}+\epsilon )}{rank_{g_\tau }(u^c,l^c,l)}\nonumber \\ \approx&L[rank_{g_\tau }(u^c,l^c,l)]\,\frac{\sum _{l'\in \mathcal {L}^{neg}\wedge g_l<g_{l'}+\epsilon }\sigma (g_{l'}+\epsilon -g_l)}{rank_{g_\tau }(u^c,l^c,l)}\nonumber \\ =&L[rank_{g_\tau }(u^c,l^c,l)]\,\mathbb E_{l'\in \mathcal {L}^{neg}\wedge g_l<g_{l'}+\epsilon }\left[ \sigma (g_{l'}+\epsilon -g_l)\right] , \end{aligned}$$
(8)

where \(g_l\) is short for \(g_\tau (u^c,l^c,l)\), and \(\sigma (\cdot )\) is the logistic function. For a training example \((s^c,s)\in D_\mathcal {T}\), we uniformly sample a misranked negative POI \(l'\) from \({\mathcal {L}}^{neg}\) and count the number of trials n [41]. Since n follows a geometric distribution with the success probability \(p=\frac{rank_{g_\tau }(u^c,l^c,l)}{|\mathcal {L}^{neg}|}\), the maximum likelihood estimate of p is \(\frac{1}{n}\), and \(rank_{g_\tau }(u^c,l^c,l)\approx \left\lfloor \frac{|\mathcal {L}^{neg}|}{n}\right\rfloor\). We then compute the approximated gradient as:

$$\begin{aligned} \frac{\partial {L[rank_{g_\tau }(u^c,l^c,l)]}}{\partial \Theta }\approx L\left( \left\lfloor \frac{|\mathcal L^{neg}|}{n}\right\rfloor \right) \frac{\partial \sigma (g_{l'}+\epsilon -g_l)}{\partial \Theta }. \end{aligned}$$
(9)
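The sampling procedure that produces n and the rank estimate can be sketched as follows; score_fn and the list of negative POIs are assumed inputs, and the routine is an illustration rather than the exact implementation:

```python
import random

def sample_misranked_negative(score_fn, target_score, negatives, epsilon=0.3,
                              max_trials=None):
    """Uniformly sample negatives until one violates the margin; return
    (negative, n_trials, estimated_rank), or (None, n_trials, 0) if none is found."""
    max_trials = max_trials or len(negatives)
    for n in range(1, max_trials + 1):
        l_neg = random.choice(negatives)
        if target_score < score_fn(l_neg) + epsilon:     # misranked negative found
            return l_neg, n, len(negatives) // n          # rank ~ floor(|L^neg| / n)
    return None, max_trials, 0
```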

The details of our training algorithm are presented in Algorithm 1. For a training example \((s^c,s)\in D_\mathcal {T}\), we compute \(g_l\) as Eq. (2) and sample a misranked negative POI \(l'\) (lines 4–8). After obtaining \(l'\), the weight \(w_{\delta t}\) is determined as Eq. (5), and the gradient approximation is computed as Eq. (9). We update each parameter \(\theta \in \Theta\) (line 12) as:

$$\begin{aligned} \theta \leftarrow \theta -\gamma \,w_{\delta t}\left[ L\left( \left\lfloor \frac{|\mathcal {L}^{neg}|}{n}\right\rfloor \right) \frac{\partial \sigma (g_{l'}+\epsilon -g_l)}{\partial \theta }+2\lambda \theta \right] , \end{aligned}$$
(10)

where \(\gamma\) is the learning rate. We project the updated latent vectors to enforce norm constraints.

Algorithm 1: Training algorithm of PTWLR
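Since the pseudocode itself is not reproduced here, the sketch below outlines one parameter update in the spirit of Algorithm 1 and Eq. (10) for a case-1 (temporal direct) example: compute the gradient of the logistic surrogate \(\sigma (g_{l'}+\epsilon -g_l)\), apply the time-weighted update with regularization, and project each updated vector back onto its norm ball. The gradients for cases two and three differ only in which vectors are touched; the model object is the hypothetical scorer from the earlier sketch:

```python
import numpy as np

def sgd_step(model, u, lc, l_pos, l_neg, rank_est, w, gamma=0.001,
             lam=0.03, eps=0.3, C=1.0, alpha=0.5):
    """One update (Eq. 10) for a temporal direct example; `model` carries V_UI, V_IU, V_LI, V_IL."""
    loss = sum(1.0 / i for i in range(1, rank_est + 1))          # L(rank), Eq. (4)
    g_pos = model.V_UI[u] @ model.V_IU[l_pos] + model.V_LI[lc] @ model.V_IL[l_pos]
    g_neg = model.V_UI[u] @ model.V_IU[l_neg] + model.V_LI[lc] @ model.V_IL[l_neg]
    s = 1.0 / (1.0 + np.exp(-(g_neg + eps - g_pos)))             # sigma(g_l' + eps - g_l)
    coef = loss * s * (1.0 - s)                                  # derivative of the surrogate
    # Partial derivatives of (g_l' + eps - g_l) w.r.t. each involved latent vector.
    grads = {
        ('V_UI', u): model.V_IU[l_neg] - model.V_IU[l_pos],
        ('V_IU', l_pos): -model.V_UI[u],
        ('V_IU', l_neg): model.V_UI[u].copy(),
        ('V_LI', lc): model.V_IL[l_neg] - model.V_IL[l_pos],
        ('V_IL', l_pos): -model.V_LI[lc],
        ('V_IL', l_neg): model.V_LI[lc].copy(),
    }
    for (name, idx), dx in grads.items():
        mat = getattr(model, name)
        mat[idx] -= gamma * w * (coef * dx + 2 * lam * mat[idx])
        # Project onto the norm ball: radius alpha*C for user-related factors, C otherwise.
        radius = alpha * C if name in ('V_UI', 'V_IU') else C
        norm = np.linalg.norm(mat[idx])
        if norm > radius:
            mat[idx] *= radius / norm
```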

The running time of an iteration of our algorithm is \(O({\bar{n}}|D_{\mathcal {T}}|K)\), linear in the size of training examples. The average number of trials \({\bar{n}}\) is at most \(|\mathcal {L}|\), and its expectation is upper-bounded by \(\frac{|\mathcal {L}|}{H(rank_{g_\tau })}\), where \(H(\cdot )\) denotes the harmonic mean (see analysis in Appendix A). A truncated sampling scheme [42] can be applied to reduce \(\bar{n}\).

5.4 Contextualized POI recommendation

Our PTWLR model can incorporate richer context to improve the accuracy of short-term POI recommendation. We observed in Sect. 3.2.2 that a user often visits multiple POIs in a short period, which suggests that the user’s future visits may relate to the last few check-ins, not just the current one. We therefore extend our model to aggregate transition influences from historical check-ins [28, 45, 46]. Based on Sect. 3.2.3, we assume that for any check-in \(s=(u,l,t)\) that may occur in a short term, the likelihood of the transition time from a historical check-in \(s^p\in \mathcal {S}_u^c\) (including \(s^c\)) decays exponentially:

$$\begin{aligned} \mathbb {P}(T\ge \delta t_{s^p,s})\propto e^{-k'\cdot \delta t_{s^p,s}}\propto e^{-k'\cdot \delta t_{s^p,s^c}}. \end{aligned}$$
(11)

Even though \(\delta t_{s^p,s}\) is not certain since the exact time of s is unknown, \(\delta t_{s^p,s^c}\) gives an equivalent measure of the likelihood. By aggregating the likelihood-weighted transition influences from historical check-ins, we extend our scoring function from Eq. (1) to:

$$\begin{aligned} f_\tau (s^c,\mathcal {S}_{u^c}^c,l)=\textbf{v}_{u^c}^{U,I}\cdot \textbf{v}_l^{I,U}+\frac{1}{Z}\displaystyle \sum _{s^p\in \mathcal {S}_{u^c}^c}e^{-k'\cdot \delta t_{s^p,s^c}} (\textbf{v}_{l^p}^{L,I}\cdot \textbf{v}_l^{I,L}), \end{aligned}$$
(12)

where \(Z=\sum _{s^p\in \mathcal {S}_{u^c}^c}e^{-k'\cdot \delta t_{s^p,s^c}}\) is a normalizing factor. The historical context enables our model to better capture the user’s short-term interests. We include \(\Delta '\) most recent check-ins within an hour of \(s^c\) to avoid noisy transition influences.
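A sketch of the aggregation in Eq. (12), reusing the latent matrices of the earlier scorer sketch; history holds the \(\Delta '\) most recent check-ins (including \(s^c\)) as (POI, time) pairs, and time differences are assumed to be in hours:

```python
import math

def score_with_history(model, u, l, history, t_current, k_prime=3.0):
    """Eq. (12): aggregate likelihood-weighted transition influences over history."""
    user_term = model.V_UI[u] @ model.V_IU[l]
    weights, trans = [], []
    for l_p, t_p in history:
        w = math.exp(-k_prime * (t_current - t_p))   # exponential decay in dt(s^p, s^c)
        weights.append(w)
        trans.append(model.V_LI[l_p] @ model.V_IL[l])
    Z = sum(weights)                                  # normalizing factor
    return user_term + sum(w_i * s_i for w_i, s_i in zip(weights, trans)) / Z
```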

Additionally, we model the likelihood of transition distances in a short term with the power-law distribution [18, 27]:

$$\begin{aligned} \mathbb {P}(D\ge d_{l^c,l})\propto (1+a\cdot d_{l^c,l})^{-b}, \end{aligned}$$
(13)

where \(a>0\) and \(b>0\) are parameters. We incorporate the geographical context by adjusting our score w.r.t the likelihood of the transition distance from \(l^c\) to l:

$$\begin{aligned} f_\tau ^{G}(s^c,\mathcal {S}_{u^c}^c,l)= \mathbb {P}(D\ge d_{l^c,l})\cdot f_\tau (s^c,\mathcal {S}_{u^c}^c,l). \end{aligned}$$
(14)
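A sketch of the geographical adjustment in Eqs. (13)–(14); we assume \(d_{l^c,l}\) is the great-circle distance in kilometres between POI coordinates, and the haversine helper is included only to make the example self-contained:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def geo_adjusted_score(base_score, coord_current, coord_candidate, a=3.0, b=0.5):
    """Eq. (14): scale the base score by the power-law likelihood of Eq. (13)."""
    d = haversine_km(*coord_current, *coord_candidate)
    return (1 + a * d) ** (-b) * base_score
```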

With the proposed extensions, our model incorporates the context of historical check-ins and geographical distances to make recommendation. We refer to our extended model with the historical and geographical context as PTWLR-HG.

6 Experiments

6.1 Experimental setup

6.1.1 Experimental settings

We conduct experiments on Foursquare and Gowalla datasets described in Sect. 3.1. We use each user’s first 70% check-ins for training, the next 10% for validation, and the last 20% for test. For a test example \((s^c,\mathcal {S}_{u^c}^c)\), the set of new POI check-ins in \(\tau\) forms the ground truth, i.e., \(\mathcal {D}_{s^c,\tau }\) (Definition 1). We set the time horizon \(\tau\) to 6 h in our primary experiments following refs. [1, 7, 29], and we evaluate other time horizon settings in Sect. 6.2.4.

The hyper-parameters of our model are set as \(\epsilon =0.3\), \(C=1\), \(\alpha =0.5\), dimension size \(K=100\), learning rate \(\gamma =0.001\), and regularization parameter \(\lambda =0.03\). For time-weighted ranking optimization, we select step limit \(\Delta\) from 0 to 4 and perform grid search for k and \(\beta\) in \([0,1]\times [0,1]\). Based on validation, we set \(\Delta =2\), \(k=0.25\), and \(\beta =0.5\). For historical context, we select context window size \(\Delta '\) from 1 to 5 and decay parameter \(k'\) in [0, 20]. We set \(\Delta '=3\), \(k'=3\) on Foursquare, and \(k'=5\) on Gowalla. For geographical context, we perform grid search for a and b in \([0,3]\times [0,1.5]\). We set \(b=0.5\), \(a=3\) on Foursquare, and \(a=2\) on Gowalla. Section 6.4 explains the details of parameter effects. We adopt early stopping with patience of 100 during training.

6.1.2 Evaluation metrics

Performance evaluation is based on Rec@N, MAP@N, and nDCG@N. Rec@N evaluates ground truth coverage while MAP@N and nDCG@N evaluate ranking quality. All metrics range within [0, 1], and a higher value indicates better performance. Our results are averaged first over a user’s test examples, then over users. Appendix B gives further explanation.
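For reference, the three metrics can be computed per test example as below, where recommended is a ranked list of POI ids and ground_truth is the set \(\mathcal {D}_{s^c,\tau }\); this mirrors standard definitions and is not necessarily the exact evaluation script (see Appendix B):

```python
import math

def rec_map_ndcg_at_n(recommended, ground_truth, N):
    """Rec@N, MAP@N, nDCG@N for one test example (all values in [0, 1])."""
    top = recommended[:N]
    hits = [1 if l in ground_truth else 0 for l in top]
    rec = sum(hits) / len(ground_truth)
    # Average precision over the top-N positions.
    ap, n_hit = 0.0, 0
    for i, h in enumerate(hits, start=1):
        if h:
            n_hit += 1
            ap += n_hit / i
    ap /= min(len(ground_truth), N)
    # Normalized discounted cumulative gain.
    dcg = sum(h / math.log2(i + 1) for i, h in enumerate(hits, start=1))
    idcg = sum(1 / math.log2(i + 1) for i in range(1, min(len(ground_truth), N) + 1))
    return rec, ap, dcg / idcg
```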

6.1.3 Baseline models

We compare our proposed model, PTWLR-HG, with seven baseline models:

  • MC: The first-order Markov chain of temporal POI transitions.

  • FPMC-LR [29]: A next POI recommender that factorizes the check-in tensor into user-POI and POI transition matrices and imposes a localized region constraint.

  • PRME-G [1]: A next POI recommender that models user-POI relations and POI transitions in separate latent Euclidean spaces.

  • HME-G [7]: A next POI recommender that models user-POI relations and POI transitions in a Poincaré ball.

  • Rank-GeoFM [16]: A general POI recommender that considers neighboring POIs. We adapt it for our task by multiplying recommendation scores with the geographical context formula in Eq. (13).

  • DeepMove [21]: An RNN-based next POI prediction model that captures periodicity with historical attention.

  • TiSASRec [39]: An attention-based next item prediction model that encodes relative time intervals to model the effect of time gaps.

The dimension size of all baselines is set to 100 for fair comparison. Other parameters are chosen according to the corresponding papers.

6.2 Performance analysis

We compare PTWLR-HG against the baselines from different perspectives: (1) the overall performance on two datasets; (2) the performance on test examples that contain different numbers of POIs; (3) the performance for users with different numbers of check-ins in the training set; (4) the performance under varying time horizon settings. We report the mean results from five independent runs.

6.2.1 Overall performance

Table 3 compares the overall performance. MC has better top-5 results than FPMC-LR on Foursquare, but its top-10 results are not as good due to poor generalization ability. PRME-G outperforms FPMC-LR on both datasets because it captures high-order POI transition relations and learns user preferences from both temporal and non-temporal POI transitions. Although Rank-GeoFM does not consider sequential information, its performance is comparable with PRME-G, which implies that user preferences and geographical context are also important for our task. Neither DeepMove nor TiSASRec performs satisfactorily, probably because: (1) they do not predict target POIs within a specific time horizon; (2) they do not exploit geographical context; (3) they are too heavily parameterized to train on sparse data.

Our model, PTWLR-HG, consistently outperforms all baselines, with over 14% gain across all metrics, statistically significant in terms of paired t-tests with p-values less than \(10^{-4}\). This is because PTWLR learns time-aware, high-order POI transition relations along with user preferences by jointly modeling different types of POI transitions. Moreover, our proposed temporal weighting scheme strengthens the temporal context of transitions, and the incorporation of historical and geographical context leads to further improvement. PTWLR-HG is also lightweight and easy to train on sparse data. These properties make PTWLR-HG the best model in our tests.

Table 3 Overall model performance, with the best results highlighted in boldface

6.2.2 Performance on test examples with different numbers of POIs

Users may visit a varying number of POIs in a short period. To investigate its impact on performance, we group test examples by the number of POIs (1, 2–4, and \(\ge 5\)) and evaluate the recall since it measures ground truth coverage. In multi-POI examples, users generally travel longer distances than in single-POI examples. As shown in Fig. 6, the relative performance of Rank-GeoFM declines on multi-POI examples because the geographical context is biased towards POIs close to the current location. In contrast, models that utilize sequential information, like PRME-G, exhibit more stable performance across groups. This suggests that sequential information helps recommend not only the immediate next POI but also subsequent POIs in a short period. Since our model considers high-order POI transitions and geographical context, it has the highest recall on all groups.

Fig. 6: Performance on test examples with different numbers of POIs

6.2.3 Performance for users with different numbers of check-ins

We further evaluate the performance for users grouped by the number of check-ins in the training set (1–20, 21–50, 51–100, and \(\ge 101\)). Even though active users have more history to learn from, recommendation gets harder as new POIs are less common. From Fig. 7, the best performance generally comes from the 1–20 group, which suggests that insufficient visit history can be mitigated with sequential and geographical information. The relative performance of MC declines for active users because it cannot recommend infrequent POIs. In contrast, our model performs the best for all user groups. By learning user preferences and utilizing the historical context, our model captures the short-term interests of active users and makes accurate recommendations. This explains why our model achieves the largest performance gain over baselines for the \(\ge 101\) group on Gowalla, which contains denser check-ins.

Fig. 7: Performance for users with different numbers of check-ins in the training set

6.2.4 Performance under varying time horizon settings

Time horizon \(\tau\) can be customized in different applications, so we evaluate the performance while varying \(\tau\) (3, 6, and 12 h). From Fig. 8, the relative performance is similar under different settings. Generally, as the time horizon increases, the performance declines because sequential and geographical information becomes weaker. DeepMove and TiSASRec show smaller decline because they model long sequential patterns. Our model outperforms all baselines regardless of the time horizon, demonstrating its wide applicability on short-term POI recommendation.

Fig. 8: Performance under varying time horizon settings

6.3 Ablation study

We compare PTWLR-HG (denoted e in the discussion below) with the following simplified models:

  (a) LR-HG: The base model without the user preference component, trained without indirect POI transitions or the temporal weighting scheme.

  (b) PLR-HG: LR-HG with the user preference component.

  (c) PTWLR-H: PTWLR with historical context only.

  (d) PTWLR-G: PTWLR with geographical context only.

Table 4 compares the performance. PTWLR-HG outperforms all simplified models, validating the effectiveness of all components. The improvement of b over a shows that user preference learning is helpful. The incorporation of geographical context leads to considerable improvement (e vs. c) because it complements sequential information, especially for infrequent POIs. The incorporation of indirect POI transitions and the temporal weighting scheme also brings substantial improvement (e vs. b) because more sequential information is learned during training. Although the improvement from historical context is smaller (e vs. d), it is more evident on Gowalla with denser check-ins. Historical context also stabilizes the performance, as we observe lower standard deviations in models other than d. We note that without geographical context, PTWLR-H still outperforms all baselines in Sect. 6.2.1, which suggests the possibility of applying our model in non-geographical domains like E-commerce.

Table 4 Performance of variants of our proposed model, with the best results highlighted in boldface

6.4 Parameter effects

This section explains our parameter selection process based on validation results.

6.4.1 Effect of step limit \(\Delta\)

We first investigate the effect of step limit \(\Delta\) on the PLR model. With \(\Delta =0\), the model is trained with direct POI transitions only; with \(\Delta \ge 1\), the model is trained with both direct and indirect POI transitions. From Fig. 9, the performance improves when indirect transitions are used in training, peaking at \(\Delta =2\) and then declining. This is because indirect transitions contain additional sequential information and noise, and the step limit helps avoid the ones with high noise, in accordance with [43].

Fig. 9: Effect of step limit \(\Delta\)

6.4.2 Effect of temporal weighting parameters

After setting \(\Delta =2\), we investigate the effect of constant \(\beta\) and decay factor k in temporal weighting. As shown in Fig. 10, the performance significantly improves when non-temporal transitions are exploited for user preference learning (\(\beta >0\)). The performance with temporal weighting at \(k=0.25\) and \(\beta =0.5\) is consistently better than uniform weighting (\(\beta =1\)), which validates the effectiveness of our proposed temporal weighting scheme.

Fig. 10: Effect of temporal weighting parameters

6.4.3 Effect of dimension size K

Figure 11a compares the performance as dimension size K varies from 10 to 200. The model with temporal weighting outperforms the model without in all cases, and the margin is even larger when \(K=25\), probably because the temporal information benefits training. From \(K=50\) to 200, the performance is relatively stable. Figure 11b compares the number of iterations to converge. It decreases drastically as K increases to 25 and stays below 500 when \(K\ge 50\).

Fig. 11: Effect of dimension size K

6.4.4 Effect of historical context parameters

We investigate the effect of window size \(\Delta '\) and decay factor \(k'\) on PTWLR-H model. As shown in Fig. 12, when \(k'\ge 1\), the performance with the context of multiple historical check-ins (\(\Delta '\ge 2\)) is superior to the context of the current check-in only (\(\Delta '=1\)). The performance peaks at \(k'=3\) on Foursquare and \(k'=5\) on Gowalla with \(\Delta '=3\), showing that the window size limit helps reduce noisy transition influences.

Fig. 12: Effect of historical context parameters

7 Conclusions and future work

In this paper, we formulated the task of short-term POI recommendation and discussed its significance and challenges. We proposed PTWLR, a ranking-based model that jointly learns short-term POI transitions and user preferences with a temporal weighting scheme. Our extended model, PTWLR-HG, accommodates transition influences from multiple recent check-ins and geographical context. Experiments on real-world LBSN datasets demonstrated the superior performance of PTWLR-HG over seven widely used baselines in various contexts. Our ablation study and parameter effect analysis showed that all proposed components contribute to performance improvement.

For future work, we consider incorporating extra context information like POI categories and periodicity. Higher-level temporal information in terms of weekday/weekend and months may encode periodical POI popularity. We could also adapt our approach to other types of models, e.g., metric embedding, and other domains, e.g., E-commerce.