
2017 IEEE 3rd International Conference on Collaboration and Internet Computing

Deep and Broad Learning on Content-aware POI Recommendation

Fengjiao Wang∗, Yongzhi Qu†, Lei Zheng∗, Chun-Ta Lu∗ and Philip S. Yu∗
∗Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, 60607, USA
Email: {fwang27,lzheng21,clu29,psyu}@uic.edu
†Wuhan University of Technology, China
Email: quwong@whut.edu.cn

Abstract—POI recommendation has attracted much research attention recently. There are several key factors that need to be modeled towards effective POI recommendation: POI properties, user preference, and the sequential momentum of check-ins. The challenge lies in how to synergistically learn from multi-source heterogeneous data. Previous work tries to model multi-source information in a flat manner, using either embedding based methods or sequential prediction models in a cross-related space, which cannot generate mutually reinforcing results. In this paper, a deep and broad learning approach based on a Deep Content-aware POI Recommendation (DCPR) model is proposed to structurally learn POI and user characteristics. The proposed DCPR model includes three collaborative layers: a CNN layer for POI feature mining, an RNN layer for sequential dependency and user preference modeling, and an interactive layer based on matrix factorization that jointly optimizes the overall model. Experiments over three data sets demonstrate that the DCPR model achieves significant improvements over state-of-the-art POI recommendation algorithms and other deep recommendation models.

Keywords—Spatial-Temporal Modeling; Embedding; POI Recommendation

I. INTRODUCTION

As location-based applications rapidly gain popularity, a large volume of online content with geo-tagged information (check-ins) is created daily. Check-ins, as a direct channel connecting the online and offline worlds, aid the development of many personalized and locational information services, such as personalized advertisement [1], local event promotion [2], [3] and city management improvement [4]. One of the core tasks behind these services is Point Of Interest (POI) recommendation, since it not only helps users enrich their urban experiences but also facilitates the analysis of crowd mobility and communication.

Most of the prominent approaches to POI recommendation can be divided into three categories: 1) collaborative filtering, 2) sequential pattern modeling, and 3) context-aware recommendation. Basically, they are derived to learn three types of information: user preference, check-in sequences, and text information, respectively. Recently, some state-of-the-art models try to learn two types of information simultaneously, such as PRME [5] and FPMC [6], which model user preference and sequential patterns together. However, most of the extended variants of the prominent approaches still rely on the original architecture and integrate other information as side information. There are several drawbacks to these algorithms. First of all, existing POI recommendation algorithms mainly focus on information about users, such as user preference and users' check-in sequences, while ignoring the characteristics of POIs. Second, current algorithms typically model different sources of information with the same metric, such as distances in PRME and transition probabilities in FPMC; however, these symbolized features may not be suitable for handling different forms of dependencies. Third, they always model consecutive dependencies but ignore long-term dependencies in check-in sequences. Moreover, the above-mentioned models are all shallow models, which cannot capture the highly non-linear nature of sequential patterns.

Recently, researchers have taken the content information of POIs into consideration. Content information can be helpful in various ways. For instance, a user may search a POI's reviews or tips beforehand to decide whether she/he is interested in visiting the place; in reality, POIs' reviews or tips can actually be part of the inputs that affect a user's check-in decision. Besides, content information can help identify semantically similar POIs, e.g., 'burgers' often appears in the reviews and descriptions of fast food shops. As shown in recent works [7], [8], [9], integrating content information can be beneficial for alleviating the sparsity problem in POI recommendation. However, most of these works are based on traditional topic models that simply use bag-of-words features and ignore word order. Sentences with similar N-grams but totally different semantic meanings are hard to differentiate for bag-of-words based techniques [10]. Therefore, previous methods may not fully uncover the semantic information of POIs. Moreover, topic models can easily be affected by the scalability problem and also cannot handle new users and new POIs.

Due to the success of deep neural networks, researchers have also applied deep models to POI recommendation tasks. Among them, Recurrent Neural Networks (RNNs) are especially suitable for sequential prediction. Recently, [11] showed the RNN's superior performance on sequential click prediction. By concurrently modeling spatial and temporal patterns in LBSNs through the transition matrix of an RNN, [12] achieves promising performance improvements over matrix factorization based and Markov chain-based algorithms.

In order to broadly fuse different sources of information (user preference, check-in sequences, and text information), in this paper we propose a new deep and broad learning model, the Deep Content-aware POI Recommendation (DCPR) model, to learn effective representations of POIs and users that facilitate the POI recommendation task. In particular, we design a multi-layer deep architecture which consists of multiple deep neural networks. The composition of multiple layers of deep neural networks can first map the data (POIs associated with text information) into a highly non-linear latent space (the POI space); the user representations can then be learned through user preference and check-in sequence modeling with deep neural networks in the produced highly non-linear latent space.

Specifically, to enable content-aware features as well as to address the sparsity problem and long-term sequential pattern mining, the proposed model utilizes a convolutional neural network (CNN) to capture the semantic information and common opinions of POIs while preserving the word order of the original documents. Then, a long short-term memory network (LSTM) is employed to store user preference by modeling check-in sequences, collectively learning user preference from similar users. The LSTM network and the CNN network are connected in a structural manner: the LSTM learns user preference and sequential patterns with prior knowledge of POIs' semantic information by taking the representational vectors from the CNN layer as input. Finally, the personalized ranking layer on top jointly optimizes the latent representations produced in the first two layers (the convolutional CNN layer and the recurrent LSTM layer), refining the learned latent features towards more accurate patterns and better recommendations. The proposed architecture makes DCPR an end-to-end trainable deep model.

The contributions of this paper are summarized as follows.

1) We propose a deep and broad learning approach based on a deep content-aware model (DCPR) in which content-based POI features and user-specific sequential patterns are learned synergistically. The hierarchical model can jointly learn a multi-source heterogeneous network and is robust to sparsity.
2) We propose a structural pair of deep learning models, in which the first deep learning algorithm effectively learns an embedding space with latent representations of POIs, and the second deep learning model learns the global structure of the constructed embedding space, with physical meanings, to mine users' mobility patterns. Both the deep representation learning and the deep mobility mining are optimized by a unified ranking-based objective function.
3) The proposed model is extensively evaluated on three real LBSN datasets. The results demonstrate that it outperforms state-of-the-art sequential modeling methods and deep recommendation models in POI recommendation tasks.
4) The proposed deep learning framework can be employed to solve a generic class of problems involving heterogeneous network learning.

The rest of the paper is organized as follows: Section 2 gives the details of the problem definition. Section 3 illustrates the proposed architecture and its mathematical formulation. Section 4 shows the experimental results as well as the discussion. Section 5 presents a review of the state-of-the-art research. Section 6 concludes the paper.

II. PROBLEM FORMULATION

In this section, we introduce the problem formulation. Consider a set of users U = {u_1, u_2, ..., u_N} and a set of POIs P = {p_1, p_2, ..., p_M} in a location-based service, where N is the total number of users and M denotes the total number of POIs. Each user in U is associated with a list of check-ins in chronological order. For instance, user u_i's check-in list is denoted as C_{u_i} = {c_{u_i}^1, c_{u_i}^2, ..., c_{u_i}^n}. The k-th check-in c_{u_i}^k in the list C_{u_i} is defined as c_{u_i}^k = (u_i, p_l), which means that user u_i checked in at POI p_l at the k-th time stamp. Each POI in P is associated with a list of reviews or tips. For example, for POI p_l, its list of reviews/tips is denoted as R_{p_l} = {r_{p_l}^1, r_{p_l}^2, ..., r_{p_l}^m}, where r_{p_l}^j indicates the j-th review/tip in POI p_l's review/tip list. Given G = (U, P, C, R), which consists of the list of users U, the list of POIs P, all users' check-ins C, and all POIs' reviews/tips R, the task is to recommend a certain number of POIs for each user based on the previous check-ins.

In this paper, we utilize a ranking-based loss [13] to train the deep neural networks. For a ranking-based loss, each training sample usually contains a positive item and a negative item. For the proposed problem, each training sample is a sequence of check-in POIs performed by a user: the positive item is the POI checked in after the sequence of check-ins, while the negative item is a POI uniformly sampled from the POIs that are not in the user's training sequence. A user, a positive POI, and a negative POI thus form one training sample. The training data D_S is defined as

    D_S := {(u_i, p_j, p_j') | u_i ∈ U ∧ p_j ∈ P_i^+ ∧ p_j' ∈ P_i^-}    (1)

where u_i, p_j, and p_j' are uniformly sampled from U, P_i^+, and P_i^-, respectively. P_i^+ denotes the list of positive POIs for user u_i, while P_i^- represents the list of negative POIs for user u_i.
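As a concrete illustration of how such triples can be drawn, the following Python sketch builds D_S by uniform negative sampling. This is our own minimal sketch, not code from the paper; the function and variable names (build_training_triples, checkins, and so on) are illustrative assumptions.

```python
import random

def build_training_triples(checkins, all_pois, num_neg=1):
    """Build D_S as in Eq. (1): for each user, pair every observed
    check-in POI (positive) with a POI sampled uniformly from the
    POIs the user never visited in training (negative)."""
    triples = []
    for user, seq in checkins.items():   # seq: POI ids in chronological order
        visited = set(seq)
        for pos in seq:                  # each check-in serves as a positive
            for _ in range(num_neg):
                neg = random.choice(all_pois)
                while neg in visited:    # resample until an unvisited POI is drawn
                    neg = random.choice(all_pois)
                triples.append((user, pos, neg))
    return triples
```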
III. THE PROPOSED ARCHITECTURE

In this section, we introduce the proposed model DCPR, which effectively learns embeddings of users and POIs for POI recommendation through a deep network architecture. DCPR collectively models broad information on check-in sequences and text information with deep neural networks in a hierarchical manner, and it is coupled with probabilistic matrix factorization [14] to provide top-N recommendations for users. The advantages of the proposed DCPR model are two-fold. Firstly, DCPR is an end-to-end deep model which can learn more representative embeddings of users and POIs. Secondly, the proposed model explains how check-in behaviors are formed by modeling text information and check-in sequences in a hierarchical order.

A. Architecture

The architecture of the proposed framework is illustrated in Fig. 1. It consists of three components, from bottom to top: POI context extraction, user preference and check-in sequence modeling, and personalized ranking.

Fig. 1: Network Architecture. The architecture contains three components: 1) POI representation learning; 2) user representation learning; 3) check-in behavior learning.
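The hierarchical composition can also be summarized in code. The following PyTorch-style sketch is our own illustration of how the three components fit together; the module names, dimensions, and the score function are assumptions, and Sections III-B to III-D give the actual formulations.

```python
import torch
import torch.nn as nn

class DCPRSketch(nn.Module):
    """Minimal sketch of the three-component composition; any CNN text
    encoder (Sec. III-B) can be plugged in as `poi_encoder`."""
    def __init__(self, poi_encoder: nn.Module, poi_dim=50, hidden_dim=50):
        super().__init__()
        self.poi_encoder = poi_encoder                 # POI representation learning
        self.user_encoder = nn.LSTM(poi_dim, hidden_dim,
                                    batch_first=True)  # user representation learning

    def score(self, history_docs, candidate_doc):
        # Encode each visited POI's review document, then run the LSTM
        # over the resulting sequence of POI embeddings.
        hist = torch.stack([self.poi_encoder(d) for d in history_docs], dim=1)
        _, (h, _) = self.user_encoder(hist)
        u = h[-1]                                      # user embedding
        v = self.poi_encoder(candidate_doc)            # candidate POI embedding
        return (u * v).sum(dim=-1)                     # ranking score (Sec. III-D)
```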

At the bottom of DCPR, the POI context extraction component learns semantic information of POIs to generate latent representations from reviews/tips by employing a CNN. Above the POI context extraction component is the check-in sequence modeling component, which is responsible for modeling check-in sequences to learn latent representations of users by utilizing an LSTM. In the check-in sequence modeling component, the rectangle R stands for a recurrent cell and h denotes a hidden state in the LSTM. The POI embeddings learned by the CNN from reviews/tips represent POIs' properties and can help explain users' check-ins. Compared to previous models that ignore the textual content, such as [6], [5], this facilitates the check-in sequence modeling component in learning more effective latent representations of users. Furthermore, the above-mentioned two components are directly connected and organized in a hierarchical order. At the top of DCPR is the personalized ranking component, which optimizes the latent representations of users and POIs following the fashion of probabilistic matrix factorization [14].

Fig. 2: The structure of the POI representation learning component: document embeddings pass through a convolutional layer with multiple filters, a max-pooling layer, and a fully connected layer.

B. POI representation learning

Given all POIs' reviews/tips, we aim to learn latent representations of POIs that facilitate POI recommendation. Intuitively, when a user searches for a POI online, he/she is likely to browse some of the reviews/tips to sum up the properties and general opinions of this POI. To mimic this online behavior and accurately model POIs from their textual content, we propose to model POIs from their reviews/tips.

To sum up all reviews/tips belonging to one POI, we first concatenate all reviews/tips of the POI into one document. Formally, for the q-th POI p_q, its list of reviews/tips is concatenated into one document d_q. The document d_q contains the semantic information and common opinions about the q-th POI. This helps to construct a meaningful solution space and facilitates the prediction of users' future check-ins. It also helps learn the users' historical behavior more effectively and boosts the prediction performance.

Given a document d_q of POI p_q, before feeding it to the POI context extraction component, we first apply a word embedding function, denoted as Φ, to each word of d_q; Φ maps each word into an n-dimensional vector.

Assume there are N words in document d_q; the embedding matrix Π_q of document d_q is then represented as

    Π_q = Φ(w_1) ⊕ Φ(w_2) ⊕ ... ⊕ Φ(w_n) ⊕ ... ⊕ Φ(w_N)    (2)

where Φ is the word embedding function mapping each word to an n-dimensional vector, Π_q denotes the embedding matrix of document d_q, and ⊕ is the concatenation operator. Note that the n-th column of Π_q corresponds to the embedding of the n-th word in document d_q.
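For clarity, the following small sketch shows one way to realize Eq. (2) with a pretrained word-vector table; the `word_vecs` lookup and the zero-vector treatment of out-of-vocabulary words are our assumptions, not details specified in the paper.

```python
import numpy as np

def document_matrix(tokens, word_vecs, dim=300):
    """Eq. (2): stack the word embeddings Phi(w_1), ..., Phi(w_N) of one
    POI's merged review/tip document into the embedding matrix Pi_q.
    Rows here index words; the paper treats words as columns, which is
    just the transpose of this layout."""
    rows = [word_vecs.get(w, np.zeros(dim)) for w in tokens]  # OOV -> zeros (assumption)
    return np.stack(rows, axis=0)  # shape (N, dim)
```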

Following the embedding function, three inner layers inside the CNN, including a convolution layer, a max-pooling layer, and a fully connected layer, are built to learn feature vectors of POIs. The structure of the POI representation learning component is illustrated in Figure 2. Next, we explain these three layers in detail.

Convolutional layers apply the convolution operator on document embeddings to generate new features. A convolution operation corresponds to a neuron in the neural network: it applies a filter K_j ∈ R^{h×t} to a window of h words to generate a new feature. For example, applying the convolution operation on document d_q produces the feature z_j^q defined as

    z_j^q = f(Π_q ∗ K_j + b_j)    (3)

where z_j^q is the new convolution feature produced by filter K_j, Π_q is the q-th document that the convolution operation works on, the symbol ∗ is the convolution operator, b_j is the bias term, and f is the activation function. Rectified Linear Units (ReLUs) [15] are used as activation units; it has been shown that using ReLUs as activation units in CNNs effectively shortens the training time of neural networks [16]. The ReLU function is

    f(x) = max{0, x}    (4)

Following the convolution layer, a max-pooling operation is applied on the newly produced features as in Eq. (5):

    l_j = max{z_j^1, z_j^2, ..., z_j^{n−h+1}}    (5)

Here, l_j denotes the feature corresponding to filter K_j. For all of the filters, the features produced after the max-pooling layer are

    L = {l_1, l_2, l_3, ..., l_{n_1}}    (6)

where n_1 denotes the number of filters (neurons). The output of the max-pooling layer is fed into a fully connected layer as

    x_q = f(W × L + g)    (7)

where W is the weight matrix of the fully connected layer and x_q ∈ R^{n_2×1} is the latent feature vector of the q-th POI. The fully connected layer is designed to learn a non-linear combination of the features extracted by the convolution and max-pooling operations.
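As a concrete, hedged illustration, the following PyTorch sketch wires Eqs. (2)-(7) together. The word-embedding dimension of 300 is our assumption, while the 100 filters, window size h = 3, and output size n_2 = 50 follow the settings later reported in Section IV-A; applying f (ReLU) in the fully connected layer mirrors Eq. (7).

```python
import torch
import torch.nn as nn

class POIEncoder(nn.Module):
    """Sketch of Eqs. (2)-(7): word embedding -> convolution with ReLU
    (Eqs. 3-4) -> max pooling over time (Eqs. 5-6) -> fully connected
    layer (Eq. 7)."""
    def __init__(self, vocab_size, word_dim=300, n_filters=100, h=3, n2=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)            # Phi in Eq. (2)
        self.conv = nn.Conv1d(word_dim, n_filters, kernel_size=h)  # filters K_j
        self.fc = nn.Linear(n_filters, n2)                          # W, g in Eq. (7)

    def forward(self, doc_token_ids):                # (batch, N words)
        Pi = self.embed(doc_token_ids)               # (batch, N, word_dim)
        z = torch.relu(self.conv(Pi.transpose(1, 2)))  # Eqs. (3)-(4)
        l = z.max(dim=2).values                      # max over time, Eqs. (5)-(6)
        return torch.relu(self.fc(l))                # x_q, Eq. (7)
```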
C. User representation learning

In this section, we aim to model a user's interests from the user's past POI sequences. Traditional approaches represent each POI with a one-hot encoding and lose the rich semantic information existing in the textual content. In order to utilize the semantic information, we build a check-in sequence modeling component on top of the POI representations from the POI representation learning component. Given the list of POIs visited by a user i and their corresponding embeddings, the sequence modeling component generates a vector as the embedding of user i.

The check-in sequence modeling component employs a long short-term memory network (LSTM) [17] to model check-in sequences with long-term dependencies. The architecture of the LSTM is illustrated in Figure 3.

Fig. 3: Basic structure of LSTM.

A special mechanism is involved in the basic structure of the LSTM, which includes a memory cell, input and output gates, and a forget gate. The different parts work collaboratively to store and access information in the memory cell, a unique part introduced in the LSTM particularly to handle the long-term dependency problem. The equations of this mechanism are listed as follows:

    f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (8)
    i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (9)
    C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)    (10)
    C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t    (11)
    o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (12)
    h_t = o_t ∗ tanh(C_t)    (13)

where W_f, W_i, W_C, and W_o are weight matrices, and b_f, b_i, b_C, and b_o are bias terms. Equation (8) is the forget gate layer, which calculates how much information should be discarded from the memory cell. In Eq. (8), h_{t−1} denotes the hidden state at the last time stamp, x_t stands for the input at time stamp t, b_f is the bias term of the forget gate, f_t determines how much information should be kept in the memory cell, and σ is the sigmoid function. In our scenario, the input x_t is the embedding vector of the POI checked in at this time stamp. Equation (8) decides what information to forget from the memory cell of the last time stamp, while Equations (9) and (10) determine what new information should be stored in the new memory cell: Equation (9) is the input gate layer, which decides which values will be updated according to the last hidden state and the input, and Equation (10) deploys a tanh layer to create a vector of new candidate values C̃_t. Equation (11) updates the memory cell values at this time stamp. Equations (12) and (13) calculate the values of the new hidden state based on the values in the new memory cell, the hidden state at the last time stamp, and the input at this time stamp.

D. Check-in behavior learning

Ranking-based loss functions have attracted much attention lately [6], [5] since they directly optimize the ranking order of POIs. Essentially, to recommend POIs is to provide a ranking over the list of POIs, with the top POIs having a high probability of being visited by the user. Inspired by Bayesian Personalized Ranking [18], we model the conditional probability over POI j's latent features with a Gaussian distribution as

    p(v_j | x_j, λ) = N(v_j | x_j, λ_v I)    (14)

where I is a K × K identity matrix. Similarly, the conditional probability over user i's latent representation is defined with a Gaussian distribution as

    p(u_i | h_i, λ) = N(u_i | h_i, λ_u I)    (15)

where I is also a K × K identity matrix. The goal is to maximize the difference between the positive POI and the negative POI. The difference probability given user i ∈ U, positive POI j ∈ P_i^+, and negative POI j' ∈ P_i^- is defined as

    p(r_{i,j,j'} | u_i, v_j, v_j') = σ(u_i^T v_j − u_i^T v_j')    (16)

where u_i, v_j, and v_j' are the latent features of user i, POI j, and POI j', and σ is the sigmoid function.

For optimization, we utilize MAP estimation. Maximizing the posterior probability of u, v, and the parameters in the deep neural networks is equivalent to minimizing the negative log-likelihood

    L = − Σ_{(i,j,j') ∈ D_S} log σ(u_i^T v_j − u_i^T v_j')
        + (λ_u / 2) Σ_i (u_i − h_i)^T (u_i − h_i)
        + (λ_v / 2) Σ_j (v_j − x_j)^T (v_j − x_j)    (17)

The first term of Equation (17) enforces user preference by maximizing the difference between the product of the user factors with the positive embeddings and the product of the user factors with the negative embeddings. The second and third terms force u_i and v_j to be close to user i's latent factors and POI j's latent features, respectively. The Stochastic Gradient Descent algorithm [19] is utilized to minimize the loss function.
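A minimal sketch of this objective for one batch of triples, assuming batched tensors for the latent factors and the deep representations (the names are ours), could look as follows; lam_u and lam_v are placeholders, since the paper does not report the values of λ_u and λ_v:

```python
import torch

def dcpr_loss(u, v_pos, v_neg, h, x_pos, lam_u=0.01, lam_v=0.01):
    """Negative log-likelihood of Eq. (17). u: user factors; v_pos/v_neg:
    positive/negative POI factors; h: LSTM user representations h_i;
    x_pos: CNN representations x_j of the positive POIs."""
    bpr = torch.log(torch.sigmoid((u * v_pos).sum(-1) - (u * v_neg).sum(-1)))
    reg_u = lam_u / 2 * ((u - h) ** 2).sum(-1)          # ties u_i to h_i
    reg_v = lam_v / 2 * ((v_pos - x_pos) ** 2).sum(-1)  # ties v_j to x_j
    return (-bpr + reg_u + reg_v).sum()
```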
chains model, which constructs a transition tensor
IV. E XPERIMENTS to model the probability of users’ next behavior
based on previous behaviors. A factorization model
To test whether the proposed architecture can effectively is proposed to decompose the tensor to estimate the
modeling users’ check-in sequences and extracting semantic probability. The factorization model is able to learn
information from text, we evaluate the performance of the pro- information among similar users and similar items.
posed framework and state-of-the-art baselines in this section 2) PRME [5] stands for personalized ranking metric
with various metrics and case studies. embedding model. It learns two embeddings in two


3) FM [21] refers to Factorization Machines. It models pairwise interactions between all features. Note that, for the proposed problem, three types of features are constructed for FM: a one-hot encoding of the user, a combination of one-hot encodings of the POIs in the check-in sequence, and a one-hot encoding of the POI checked in after the sequence (see the feature-construction sketch after this list).
4) RNN [11] is the state-of-the-art deep model for sequential prediction, adopting recurrent neural networks.
5) CDL [22] jointly models text information with deep representation learning and user feedback with collaborative filtering.
6) DCPR is the proposed method in this paper.
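The feature construction described for FM in 3) can be sketched as follows; the block layout and the 1/|sequence| weighting of the sequence POIs are our assumptions for illustration, not settings reported in the paper:

```python
import numpy as np

def fm_feature_vector(user_id, seq_poi_ids, next_poi_id, n_users, n_pois):
    """Three one-hot feature blocks for FM: the user, the POIs in the
    check-in sequence, and the POI checked in after the sequence."""
    x = np.zeros(n_users + 2 * n_pois)
    x[user_id] = 1.0                               # one-hot user
    for p in seq_poi_ids:                          # multi-hot sequence POIs
        x[n_users + p] = 1.0 / len(seq_poi_ids)    # equal weighting (assumption)
    x[n_users + n_pois + next_poi_id] = 1.0        # one-hot next POI
    return x
```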
To evaluate the performance of the different approaches, for each user we pick the first 80% of check-ins as training data, and the remaining 20% of check-ins are used as testing data. For the FPMC algorithm, the training data is further divided into 80% and 20% for training and validation, respectively; the learning rate is set to 0.005, the regularization parameter is set to 0.03, and the factorization dimension is set to 20. For the PRME algorithm, the parameter α and the latent dimension are set to 0.02 and 60, respectively, following the setting in the original paper. For the RNN algorithm, the dimension of the POI embeddings is set to 50, the number of neurons in the recurrent layer is set to 64, and cross entropy is employed as the loss function. For the proposed DCPR algorithm, the embedding dimension of POIs is set to 50; for the convolution layer, the number of filters is set to 100 and the filter length to 3; the number of neurons in the fully connected layer and the recurrent layer is set to 50. Note that we use different latent dimensions for the different comparison algorithms to optimize the performance of each.

Three metrics are used to evaluate the performance of the compared methods. The output of each method is a ranked list of all POIs indicating, from high to low, the likelihood of each POI being checked in during the testing period. The first metric is Precision@N, which measures the percentage of correct predictions in the top-N ranked list. The second metric, Recall@N, measures the percentage of correct predictions in the top-N ground-truth set. Note that the top-N ground-truth set is constructed based on the time difference between the training check-in sequence and the testing check-ins: the closer the time difference, the higher the position the POI takes in the top-N ground-truth list. The third metric, F1-score@N, is the harmonic mean of the above two metrics and provides a comprehensive evaluation of the compared methods.
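These metrics can be computed per user from the model's ranked list and the top-N ground-truth list; the sketch below uses our own function and variable names and assumes the time-difference-based ground-truth list is already given:

```python
def precision_recall_f1(ranked, ground_truth, N):
    """Precision@N, Recall@N, and F1-score@N for one user. `ranked` is the
    model's full POI ranking; `ground_truth` is the top-N ground-truth list
    built from the held-out 20% of check-ins."""
    top = set(ranked[:N])
    gt = set(ground_truth[:N])
    hits = len(top & gt)
    prec = hits / N
    rec = hits / max(len(gt), 1)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```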
Fig. 5: Performance on the Foursquare and Yelp datasets: Precision@TopN, Recall@TopN, and F1-score@TopN on Foursquare (a)-(c) and Yelp (d)-(f).

B. Performance Comparison

Fig. 5 shows the performance of POI recommendation on the Foursquare and Yelp datasets with the metrics Precision@N, Recall@N, and F1-score@N, where N varies from 1 to 20. Four observations are made as follows.

Fig. 6: Performance on the TIST dataset: (a) Precision@TopN; (b) Recall@TopN; (c) F1-score@TopN.

Fig. 7: Performance comparison with varied training size (50% to 80%): Precision@Top5, Recall@Top5, and F1-score@Top5 on the Foursquare (a)-(c) and Yelp (d)-(f) datasets.

• DCPR consistently outperforms the other compared methods on the two datasets, as shown in Fig. 5. Although on the Yelp dataset PRME achieves slightly better results when N = 1, the proposed DCPR algorithm performs the best in most general cases. The reason PRME shows slightly higher results there is that PRME utilizes a metric embedding technique to model sequential transition probability. The metric embedding technique is designed to learn the transition probability between consecutive check-ins, but it cannot model long-term sequential influences. In contrast, the proposed DCPR algorithm employs a special recurrent structure that particularly models long-term dependencies; therefore, DCPR wins for almost all of the varied N.

• FM usually performs well in rating prediction tasks, while it achieves inferior results compared to the other methods in the POI recommendation task. Although FM captures all pairwise interactions between all features, the model is incapable of differentiating the importance of different feature interactions; therefore, it is not able to focus on important feature interactions and ignore insignificant ones. In comparison, the proposed DCPR has different parts that specialize in modeling specific types of information and jointly learns the importance of each part in one loss function. Therefore, it achieves superior results compared to FM.
• The proposed DCPR outperforms the other two deep neural network based models. It can be seen from Fig. 5 that DCPR achieves much higher accuracy compared with the typical RNN algorithm and with CDL as well. Even though the RNN tries to model check-in sequences, long-term dependencies may not be captured by plain deep recurrent neural networks. Also, the RNN ignores text information and thus loses another source of information for tackling the problem. Although the CDL algorithm learns deep representations for content information, it is not capable of modeling sequential influence. The proposed DCPR algorithm models text information and check-in sequences simultaneously, so it outperforms RNN and CDL by a big margin.

• Comparing the performance of the methods on the three different metrics, we observe that Precision@N and Recall@N always monotonically decrease or increase, respectively, in all three datasets, while F1-score@N shows a non-monotonic trend: it increases first and then decreases. It is worth noticing that DCPR almost always achieves the biggest improvement when the comparison algorithms are at their highest F1-score. For example, on the Foursquare dataset (Fig. 5c), when N = 4, DCPR achieves a 13% improvement over FPMC and a 128% improvement over FM. Interestingly, for the Foursquare and Yelp datasets, almost all algorithms perform best when N = 4, probably because these two datasets contain more users with short sequences.

From Fig. 5, we can conclude that FPMC is the best performing baseline method. Therefore, for the large-scale TIST dataset, we only compare the performance of the proposed DCPR algorithm with FPMC, in Fig. 6. Since the TIST dataset does not contain text information, we adapt the proposed DCPR algorithm to generate embeddings for POIs directly, omitting the convolutional feature-generation process. We can see that, for all three metrics, DCPR always outperforms FPMC by a big margin. For instance, on the F1-score@N metric, when N equals 12, DCPR achieves a 15.2% improvement over FPMC. On the F1-score@N metric, both algorithms perform best when N = 15, probably because the TIST dataset includes users with longer sequences compared to the Foursquare and Yelp datasets. This shows that the proposed DCPR is robust to varying sequence lengths. Also, compared to its performance on the Foursquare and Yelp datasets, the proposed DCPR algorithm achieves its largest improvement on the TIST dataset, probably because DCPR is especially good at modeling long-term dependencies and the average sequence length of the TIST dataset is much longer than that of the other two datasets.

The robustness of the proposed algorithm is also tested by varying the size of the training check-ins on the Foursquare and Yelp datasets; we pick N = 5 for illustration purposes. As can be seen in Fig. 7, the proposed DCPR always outperforms the other compared algorithms. For instance, on the Yelp dataset, when the size of the training data increases from 50% to 80%, FPMC's Recall@Top5 increases by 7.54%, while DCPR's Recall@Top5 increases by 14.92%. The evaluation results on the size of the training set indicate that DCPR is capable of producing high-quality embedding vectors of users and POIs.

Besides evaluating the proposed approach on the whole dataset with different metrics at the macro level, we also present a study of the performance of the compared approaches at the micro level. Specifically, we study the performance of the proposed algorithm on different user groups, where users are clustered according to the length of their check-in sequences. As an illustrative example, Fig. 8 shows the gain of DCPR over FPMC in Precision@5 and Recall@5. We pick users with modestly long sequences in the overall population for the Foursquare and Yelp datasets. At the same time, the population density of each group is shown to provide an in-depth understanding of the performance of the different algorithms; the population density of each group is indicated by the size of the orange marker in Fig. 8. First of all, in all of the user groups, DCPR achieves improvements larger than 10%. Interestingly, for both datasets, the highest improvement is always achieved for users having 11 or 12 check-ins. For instance, when the sequence length equals 11, the proposed algorithm achieves nearly a 50% improvement over FPMC on the Foursquare dataset, while it also improves over FPMC by nearly 70% on the Yelp dataset. A possible reason for this observation is that feeding too long a sequence from the past may introduce more noise, while too short a sequence does not capture enough behavior information.

Fig. 8: Gain over FPMC on the Foursquare and Yelp datasets: (a) Precision@Top5 and (b) Recall@Top5 on Foursquare; (c) Precision@Top5 and (d) Recall@Top5 on Yelp.

C. Sensitivity analysis
We perform a sensitivity analysis in Fig. 9 on two parameters: one is the number of convolution kernels n_1, and the other is the number of latent recurrent features n_2. These results are all based on the Yelp dataset due to space limitations. The upper two figures show results for n_1, while the bottom two figures display results for n_2; the first column's figures display the analysis of Precision@5, while the second column's figures show the analysis of Recall@5. As can be seen, for parameter n_1, values increase as it grows from 5 to 50; however, beyond 50 the values almost stay the same. For parameter n_2, the performance increases drastically as it grows from 5, and when it reaches 50 the performance levels off. Therefore, for the proposed DCPR algorithm, we set the number of convolution kernels to 100 and the number of recurrent features to 50.

Fig. 9: Sensitivity analysis: (a) Precision@5 and (b) Recall@5 varying n_1; (c) Precision@5 and (d) Recall@5 varying n_2.

V. RELATED WORK

A. POI Recommendation

Similar to traditional recommender systems, matrix factorization techniques have been introduced in POI recommendation [23], [24]. Different from item recommender systems, which employ explicit user feedback such as ratings, POI recommendation utilizes implicit user behavior (check-ins) as user feedback. Other implicit information has also been introduced, such as the locations of check-in POIs, the temporal information of check-ins, and social networks. Some recent works focus on leveraging geographical [23], [24], social [24], and temporal influences. [24] combines users' preference, social influence, and geographical influence based on a matrix factorization framework. [23] proposes a GeoMF model which jointly models geographical information and user preference. [25] introduces a ranking-based loss into the GeoMF model.

Sequential pattern mining has gained much attention lately in personalized recommendation [6], [26]. Rendle et al. [6] propose the FPMC model, which constructs a personalized probability transition tensor based on Markov chains; a factorization model is then proposed to estimate the transition tensor. The FPMC model has been extended by incorporating geographical constraints [27]. Embedding techniques [5], [28] have attracted much research attention lately since they are capable of learning better representations for various tasks. The Personalized Ranking Metric Embedding (PRME) [5] model learns embeddings in two separate spaces which model sequential transition probability and user preference, and a Bayesian personalized ranking loss is introduced to combine the learned embeddings to predict future check-ins. Instead of learning POI representations only from previous check-ins, [28] proposes to learn representations from surrounding check-ins, inspired by skip-gram. [29] incorporates the skip-gram model with a Bayesian personalized ranking loss. Even though PRME also models sequential patterns and user preference, a simple linear combination of embeddings cannot explain the complex interactions between these two factors.

B. Context-Aware Recommendation

Although spatial, temporal, and social information have been investigated in POI recommendation, text information is relatively less explored [30], [8], [9]. Text information includes reviews, tags, tips, categories, etc. [8] proposes a topic- and location-aware probabilistic matrix factorization model using POI-associated tags: first, users' interests with respect to semantic topics are learned from the text information of POIs through a Latent Dirichlet Allocation (LDA) model; then, the learned users' topic interests are compared with POIs' topic distributions to find potential POIs utilizing probabilistic matrix factorization. Meanwhile, word-of-mouth opinions are considered in the above-mentioned factor-based model. Yang et al. [7] employ sentiment analysis techniques to extract users' preferences from text information (tips); the preference inferred from contents is then considered simultaneously with the preference learned from users' check-in behavior. The factor analysis framework has also been extended to model geographical influence [24]. Similar to the LDA model, [9] proposes a spatial topic model by simultaneously modeling spatial and content information in Twitter networks. [31] investigates personal and local preferences from POIs' contents. [30] exploits contents associated with POIs and comments written by users with weighted matrix factorization. [32] models personal preferences and sequential influence with a latent probabilistic generative model.

The above-mentioned models learn text similarity only based on lexical similarity. Two reviews can be semantically similar even when they have low lexical overlap, as the English vocabulary is very diverse. These works ignore semantic meaning, which plays an important role in understanding POIs. In addition, topic modeling-based approaches are easily affected by the sparsity problem and also cannot cope with new users and POIs.

C. Deep Learning for Recommendation

Lately, neural network based methods have attracted much attention, not only because they generate useful representations for various learning tasks but also because they deliver state-of-the-art performance on natural language processing and other sequential modeling tasks [11], [12]. Among them, Recurrent Neural Networks (RNNs) are especially good at modeling sequences [33], [34]. For example, [11] shows the RNN's superior performance on sequential click prediction. By concurrently modeling spatial and temporal patterns in LBSNs through the transition matrix of an RNN, [12] achieves promising improvements over matrix factorization-based and Markov chain-based algorithms.

Researchers have also started to employ neural network based models for traditional recommender systems [35], [22]. [35] proposes an item recommendation algorithm which jointly models users and items from reviews utilizing deep neural networks.

As discussed above, while there are studies that try to model sequential patterns in check-in sequences or review text in item recommender systems, they do not address both challenges simultaneously. Instead of learning sequences with Markov chain-based models, the proposed DCPR model learns personalized sequential behaviors with the aid of an advanced deep model. Instead of relying only on topic modeling based methods to handle review text, the proposed DCPR learns the semantic meaning and sentimental attitudes of reviews with a deep CNN model.

VI. CONCLUSION

This paper proposed a deep content-aware POI recommendation (DCPR) algorithm to tackle the problem of POI recommendation. Broad learning from multiple sources of information is utilized to solve this challenging problem. Specifically, text information associated with POIs and users' check-in sequences are modeled simultaneously. Furthermore, two different types of deep neural networks are combined in an architectural framework, with each one learning one information source, and a ranking-based loss is finally introduced to learn the users' overall check-in behaviors. The proposed DCPR model learns the different information sources discriminatively; therefore, it can synergistically learn multi-source heterogeneous networks. To this end, it is a deep and broad learning model. Evaluation on three different real-world datasets demonstrated the effectiveness of the proposed approach. For future work, other side information such as temporal information and geographical information can be included in the proposed framework. Besides, the proposed deep framework can be further extended for event recommendation.

ACKNOWLEDGMENT

This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, and by NSFC grant 61672313.

REFERENCES

[1] A. Agarwal, K. Hosanagar, and M. D. Smith, "Location, location, location: An analysis of profitability of position in online advertising markets," JMR'11.
[2] R. Lee and K. Sumiya, "Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection," in SIGSPATIAL Workshop'10.
[3] T. Sakaki, M. Okazaki, and Y. Matsuo, "Earthquake shakes twitter users: Real-time event detection by social sensors," in WWW'10.
[4] C. Xia, R. Schwartz, K. Xie, A. Krebs, A. Langdon, J. Ting, and M. Naaman, "Citybeat: Real-time social media visualization of hyperlocal city data," in WWW'14.
[5] S. Feng, X. Li, Y. Zeng, G. Cong, Y. M. Chee, and Q. Yuan, "Personalized ranking metric embedding for next new poi recommendation," in IJCAI'15.
[6] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, "Factorizing personalized markov chains for next-basket recommendation," in WWW'10.
[7] D. Yang, D. Zhang, Z. Yu, and Z. Wang, "A sentiment-enhanced personalized location recommendation system," in HT'13.
[8] B. Liu and H. Xiong, "Point-of-interest recommendation in location based social networks with topic and location awareness," in SDM'13.
[9] B. Hu and M. Ester, "Spatial topic modeling in online social media for location recommendation," in RecSys'13.
[10] H. M. Wallach, "Topic modeling: Beyond bag-of-words," in ICML'06.
[11] Y. Zhang, H. Dai, C. Xu, J. Feng, T. Wang, J. Bian, B. Wang, and T.-Y. Liu, "Sequential click prediction for sponsored search with recurrent neural networks," in AAAI'14.
[12] Q. Liu, S. Wu, L. Wang, and T. Tan, "Predicting the next location: A recurrent model with spatial and temporal contexts," in AAAI'16.
[13] W. W. Cohen, R. E. Schapire, and Y. Singer, "Learning to order things," J. Artif. Int. Res., 1999.
[14] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in NIPS'08.
[15] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann machines," in ICML'10.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in NIPS'12.
[17] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., 1997.
[18] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "Bpr: Bayesian personalized ranking from implicit feedback," in UAI'09.
[19] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade, 1998.
[20] D. Yang, D. Zhang, L. Chen, and B. Qu, "Nationtelescope: Monitoring and visualizing large-scale collective behavior in lbsns," J. Network and Computer Applications, 2015.
[21] S. Rendle, "Factorization machines with libFM," ACM Trans. Intell. Syst. Technol., 2012.
[22] H. Wang, N. Wang, and D.-Y. Yeung, "Collaborative deep learning for recommender systems," in KDD'15.
[23] D. Lian, C. Zhao, X. Xie, G. Sun, E. Chen, and Y. Rui, "Geomf: Joint geographical modeling and matrix factorization for point-of-interest recommendation," in KDD'14.
[24] B. Liu, Y. Fu, Z. Yao, and H. Xiong, "Learning geographical preferences for point-of-interest recommendation," in KDD'13.
[25] X. Li, G. Cong, X.-L. Li, T.-A. N. Pham, and S. Krishnaswamy, "Rank-geofm: A ranking based geographical factorization method for point of interest recommendation," in SIGIR'15.
[26] J.-D. Zhang, C.-Y. Chow, and Y. Li, "Lore: Exploiting sequential influence for location recommendations," in SIGSPATIAL'14.
[27] C. Cheng, H. Yang, M. R. Lyu, and I. King, "Where you like to go next: Successive point-of-interest recommendation," in IJCAI'13.
[28] X. Liu, Y. Liu, and X. Li, "Exploring the context of locations for personalized location recommendations," in IJCAI'16.
[29] S. Zhao, T. Zhao, I. King, and M. R. Lyu, "Geo-teaser: Geo-temporal sequential embedding rank for point-of-interest recommendation," in WWW Companion'17.
[30] H. Gao, J. Tang, X. Hu, and H. Liu, "Content-aware point of interest recommendation on location-based social networks," in AAAI'15.
[31] H. Yin, Y. Sun, B. Cui, Z. Hu, and L. Chen, "Lcars: A location-content-aware recommender system," in KDD'13.
[32] W. Wang, H. Yin, S. W. Sadiq, L. Chen, M. Xie, and X. Zhou, "SPORE: A sequential personalized spatial item recommender system," in ICDE'16.
[33] A. Graves, "Generating sequences with recurrent neural networks," CoRR'13.
[34] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," in A Field Guide to Dynamical Recurrent Neural Networks, 2001.
[35] L. Zheng, V. Noroozi, and P. S. Yu, "Joint deep modeling of users and items using reviews for recommendation," in WSDM'17.
