
Taxi Demand Prediction using Ensemble Model Based on RNNs and XGBOOST

Ukrish Vanichrujee, Teerayut Horanont, and Thanaruk Theeramunkong
School of Information, Computer, and Communication Technology
Sirindhorn International Institute of Technology, Thammasat University
Bangkok, Thailand
[email protected], [email protected], [email protected]

Wasan Pattara-atikom
National Electronics and Computer Technology Center
112 Thailand Science Park, Phahon Yothin Rd, Klong 1, Klong Luang, Pathumthani 12120, Thailand
[email protected]

Takahiro Shinozaki
Department of Information and Communication Engineering
Tokyo Institute of Technology
Kanagawa, Japan

Abstract — Taxis play an important role in urban transportation. Understanding future taxi demand gives an opportunity to organize the taxi fleet better. It also reduces the waiting time of passengers and the cruising time of taxi drivers. Although some works have been proposed to predict taxi demand, few studies consider the function of areas such as hospital areas, department store areas, residential areas, and tourist attractions. One predictive model may not fit all types of area. We use points of interest (POIs) to match taxi demand with places in order to study taxi demand in areas with different functions. In this paper, we investigate the best predictive models that can forecast hourly taxi demand for 7 types of area function. The models selected for the experiment are long short term memory (LSTM), gated recurrent unit (GRU) and extreme gradient boosting (XGBOOST). We then propose an ensemble model that forecasts taxi demand well across all types of area function using the information from those machine learning models. We build the models on a real-world dataset generated by over 5,000 taxis in Bangkok, Thailand over 4 months. The results show that the proposed ensemble model outperforms the other models overall.

Keywords — Taxi Demand Prediction; Time Series; Neural Networks; Spatio-temporal Data

I. INTRODUCTION

Where to find a passenger is one of the most important questions for all taxi drivers. The more time a taxi driver spends cruising for a new passenger, the higher the fuel consumption and the fewer passengers can be picked up. Inexperienced taxi drivers usually do not know where to pick up a new passenger since they have no experience of taxi demand over time and space. Information about future taxi demand can be used to guide both inexperienced and experienced drivers to catch up with the taxi demand in the city faster. Hence, it helps to match demand with supply in taxi services.

With the rich information from taxi GPS sensors, many researchers have used this digital footprint to discover knowledge that can help improve the transportation system. Therefore, many existing systems use taxi GPS trajectories as transportation sensors. Anomaly detection is one of the most popular topics using taxi GPS data. Zhang, D. et al. [1] applied an isolation-based method, which uses the "few and different" concept, to develop an anomaly detection system for taxi trajectories. Yisheng Lv et al. [2] used GPS traces to build a deep learning model to predict the traffic flow of a city.

For taxi demand prediction studies, Moreira-Matias, L. et al. [3] presented a model for predicting the number of services that will happen at taxi stands by applying a time-varying Poisson model and the autoregressive integrated moving average (ARIMA). Moreover, they used a sliding-window ensemble framework to originate a prediction by combining the predictions of each model. The dataset was generated from 441 vehicles and 63 taxi stands in the city of Porto. N. Davis et al. [4] proposed a taxi demand forecasting model using a multi-level clustering approach and data from a leading taxi booking application in the city of Bengaluru, India. They applied many linear forecasting models such as the Holt-Winters (HW) model, Seasonal Naive, STL decomposition, ARIMA, and TBATS. The results showed that STL had the best performance. There are also studies [5], [6], [7] using artificial neural networks to forecast taxi demand. Mukai and Yoden [5] applied artificial neural networks to predict taxi demand in 25 regions of Tokyo. They divided the hours of a day into 6 intervals (4 hours each). They reported that the areas with the highest error rate are those with many types of transportation, such as railway and subway, because the taxi demand there is small and non-periodical. Zhao et al. [6] compared a Markov predictor with a neural network predictor using datasets from yellow taxicabs and Uber in New York. The results showed that the Markov predictor can achieve high accuracy in areas with high theoretical maximum predictability, while the neural network model performed better in areas with lower theoretical maximum predictability. Jun Xu et al. [7] divided the entire New York City into around 6,500 areas. Then, they applied a special kind of recurrent neural network, a long short term memory recurrent neural

network (LSTM RNN), to the data sequence of the number of taxi requests in each area.

In this paper, we propose an ensemble model to predict taxi demand using three machine learning models: long short term memory (LSTM) [8], gated recurrent unit (GRU) [9] and extreme gradient boosting (XGBOOST) [10]. Unlike previous works, we aggregate taxi demand within a 200-meter radius of a point of interest (POI), which is a place such as a hospital, an airport, or a subway station. We investigate how well each model performs in areas with different functions. Then, we combine the results produced by those models to produce a final prediction. Figure 1 shows taxi demand patterns of areas with different functions.

Fig. 1. Demand patterns of areas with different functions

The rest of the paper is organized as follows. In Section II, we describe data processing. In Section III, we briefly explain the chosen predictive models. In Section IV, we show the experimental setup and results. Lastly, we conclude in Section V.

II. DATA PROCESSING

The dataset used in this study is real-world data generated by taxi GPS devices in Bangkok, Thailand from January 1 to June 30, 2016. The data collection process was supported by Toyota Tsusho Electronics (Thailand) Co., Ltd. The sampling rate of this dataset is 5 seconds. For historical weather data in Bangkok, we use the data from openweathermap.org, which has an hourly sampling rate. There are many types of weather condition obtained from the website, such as Thunderstorm, Rain, Clouds, Clear, and Fog. We reclassify the weather data into only clear and rainy. The detail of each field in the dataset is shown in Table I.

TABLE I. SPECIFICATION OF TAXI TRACE

Data Type           Description
IMEI                The taxi unique ID
Timestamp           Timestamp of the sample point
Latitude/Longitude  GPS location of the sample point
Speed               Current speed of a taxi at the sample point
Angle               Current direction of a taxi at the sample point
Meter               The status which indicates whether a taxi is occupied or not (0 when available, 1 when occupied)

A. Data Preparation

The GPS sensor in a taxicab is very sensitive. Therefore, the obtained position may sometimes be invalid. We want to create a reliable model to predict taxi demand accurately, so the invalid data should be removed. There are four steps in this process.

1) Extracting pick-up/drop-off events from raw GPS records: A shift of the meter status indicates a pick-up/drop-off event. If the meter changes from 0 to 1, we mark that position as the pick-up point. On the other hand, if the meter changes from 1 to 0, the position is marked as the drop-off point.

2) Filtering invalid trajectories: After a preliminary investigation of the average duration and speed of normal trajectories, the normal range of duration should be 100 to 6,000 seconds. The normal range of speed should be between 1 m/s and 30 m/s. Trajectories out of those ranges are removed.

3) Mapping pick-up events to POIs: In this study, we use points of interest (POIs) to match a pick-up event with an area. If the pick-up event is within a radius of 200 meters from a point of interest, we consider that the pick-up event occurred at that POI.

4) Calculating the aggregated demand in every hour for each point of interest: The number of pick-up events represents the taxi demand. We aggregate the number of pick-up events for all areas in every hour. Therefore, we have the taxi demand of each area in 24 time slots a day.

B. Point of Interest

There are 7 types of area function in this study. We select the top 10 taxi-demand places for each type, except the airport, because Bangkok has only 2 airports.

1) Hospital
2) Residential area
3) Education area
4) Tourist attraction area
5) Airport
6) Department store
7) Subway
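The four data-preparation steps above can be illustrated with a short Python sketch: extracting pick-up/drop-off events from meter-status shifts, filtering trips by duration and speed, matching pick-ups to POIs within 200 m, and counting pick-ups per hour. The function names, record layout, and the mean-speed argument are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371000 * 2 * asin(sqrt(a))

def extract_trips(records):
    """Step 1: a meter shift 0->1 marks a pick-up, 1->0 marks a drop-off.
    Each record is (timestamp_s, lat, lon, speed_mps, meter)."""
    trips, pickup, prev_meter = [], None, 0
    for ts, lat, lon, speed, meter in records:
        if prev_meter == 0 and meter == 1:
            pickup = (ts, lat, lon)
        elif prev_meter == 1 and meter == 0 and pickup is not None:
            trips.append({"pickup": pickup, "dropoff": (ts, lat, lon)})
            pickup = None
        prev_meter = meter
    return trips

def is_valid(trip, mean_speed):
    """Step 2: keep trips with 100-6,000 s duration and 1-30 m/s mean speed."""
    duration = trip["dropoff"][0] - trip["pickup"][0]
    return 100 <= duration <= 6000 and 1 <= mean_speed <= 30

def hourly_demand(trips, pois):
    """Steps 3-4: assign each pick-up to any POI within 200 m, then count
    pick-ups per (POI, hour) slot."""
    demand = defaultdict(int)
    for trip in trips:
        ts, lat, lon = trip["pickup"]
        for name, (plat, plon) in pois.items():
            if haversine_m(lat, lon, plat, plon) <= 200:
                demand[(name, ts // 3600)] += 1
    return demand
```

A real pipeline would also need to deduplicate pick-ups that fall inside the 200 m radius of several POIs; the sketch simply credits every matching POI.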
C. Features

There are 6 main features for the predictive model. These features are selected based on previous studies [5], [7].

1) Number of pick-ups
2) Hour of the day
3) Day of the week
4) Day of the month
5) Weather
6) National holiday

We also apply a rolling window technique to the number-of-pickups feature. Let n(t) be the number of pick-ups in time step t and w be the window size that we want to look back. If we want to predict n(t), we also use n(t-1), n(t-2), ..., n(t-w) as extra features. The window size w is 24 in this study.

III. MODELS

In this section, we briefly explain our selected models, which are Long Short Term Memory (LSTM), Gated Recurrent Unit (GRU) and eXtreme Gradient Boosting (XGBOOST). We also describe the overview of the ensemble model in this section.

A. Recurrent Neural Network (RNN)

A recurrent neural network is a special kind of artificial neural network which has a backward connection between hidden layers. Recurrent neural networks have internal memory, which allows them to operate over sequential data effectively. Therefore, RNNs are among the most popular models for sequential tasks such as handwriting recognition, language modeling and time series prediction.

Fig. 2. A recurrent neural network

Fig. 3. An unrolled recurrent neural network

Figure 2 shows the structure of a recurrent neural network. Figure 3 represents the unrolled version of a recurrent neural network, which illustrates clearly how the network deals with sequential data. Given an input time series x_1, x_2, ..., x_T, the RNN computes the hidden state sequence h_1, h_2, ..., h_T and the output sequence y_1, y_2, ..., y_T iteratively using (1), (2).

    h_t = f(W_xh x_t + W_hh h_(t-1) + b_h)    (1)
    y_t = g(W_hy h_t + b_y)    (2)

In (1) and (2), W_xh, W_hh and W_hy denote the input-to-hidden weight matrix, the hidden-to-hidden weight matrix and the hidden-to-output weight matrix respectively. The vectors b_h and b_y are the biases of the hidden layer and the output layer respectively. f(.) and g(.) are the activation functions of the hidden layer and output layer respectively. The hidden state of time step t is passed to the hidden state of time step t+1.

However, conventional recurrent networks suffer from the vanishing gradient problem when working with multi-step dependencies. Long short term memory networks [8] were invented by Hochreiter and Schmidhuber in 1997. They were explicitly designed to avoid the vanishing gradient problem of traditional recurrent neural networks.

1) Long short term memory

Long short term memory (LSTM) is currently one of the most popular artificial neural networks because it can achieve highly accurate results in many kinds of research. The repeating module in LSTM has a more complicated structure than in traditional RNNs. It contains a cell state, a forget gate, an input gate and an output gate.

2) Gated recurrent unit

The gated recurrent unit was introduced by Cho et al. [9] in 2014. The structure of the gated recurrent unit is similar to LSTM. A GRU cell consists of two gates: a reset gate and an update gate. In some research, GRU provides results comparable to LSTM, but it has not yet been explored for taxi demand prediction. Therefore, we selected GRU as another predictive model in this study.

B. Extreme gradient boosting

The gradient boosting tree model was proposed by Friedman in 2001 [11]. The main idea of boosting is to merge a set of weak learners into a strong one in an iterative fashion. Extreme gradient boosting (XGBoost) [10] is a library optimized for the boosting algorithm. The library provides a scalable, portable framework which has been used in many data science competitions.

C. Ensemble model

We combine the predictions from those three models to find the best final prediction by assigning more weight to the model that achieved a low error rate in the previous w time steps. We use sMAPE (symmetric mean absolute percentage error) to evaluate the performance of each model. The formula of sMAPE is described in Section IV. Firstly, we find the average sMAPE over the previous w time steps for all models. Let e_(m,t-1), e_(m,t-2), ..., e_(m,t-w) be the set of sMAPE values for model m in the previous w time steps. We calculate the average sMAPE E_m of model m in the previous w time steps using (3).
    E_m = (1/w) Σ_(i=1..w) e_(m,t-i)    (3)

Then, we rank the models based on their average sMAPE in the previous w time steps. Let Ŷ be the final prediction and Ŷ_1, Ŷ_2, Ŷ_3 be the predictions from the models ranked in ascending order of average sMAPE. μ, α and β are the coefficients of the first-rank model, the second-rank model and the third-rank model respectively. Finally, the final prediction is computed by the weighted arithmetic mean in (4). Figure 4 illustrates the architecture of our ensemble model.

    Ŷ = μ Ŷ_1 + α Ŷ_2 + β Ŷ_3    (4)

Fig. 4. Architecture of our ensemble model

IV. EXPERIMENTAL RESULTS

This section contains the experimental setup (A), the evaluation metric (B), and the experimental results (C).

A. Experimental setup

We have a real-world dataset generated from over 5,000 taxis in Bangkok for 4 months. We train our models on the first 3 months and validate them on the last month. The long short term memory and gated recurrent unit models were implemented with the Keras API [12], which is built on top of TensorFlow. A simple architecture achieves a low error rate: the designed architectures for both LSTM and GRU have one hidden layer which contains 150 neurons. The XGBoost library [10] is used to build the gradient boosting tree model. For the ensemble model's coefficients, μ, α and β are 0.7, 0.2, and 0.1 respectively. The window size w is 3.

B. Evaluation Metric

We evaluate the performance of our predictive models with a widely used prediction error metric: the symmetric mean absolute percentage error (sMAPE). The formulation of the prediction error metric is given as (5).

    sMAPE_i = (1/T) Σ_(t=1..T) |Y_(i,t) - Ŷ_(i,t)| / (Y_(i,t) + Ŷ_(i,t) + C)    (5)

Y_(i,t) is the real demand while Ŷ_(i,t) is the predicted demand in area i at time t. We also include a constant C, which equals one, to prevent division by zero.

C. Experimental result

We show the sMAPE results of all models over all types of area function. Then, we compare the predictions of all models with the real demand at a specific place.

1) sMAPE results over all types of area function

Fig. 5. sMAPE result from the LSTM model

Fig. 6. sMAPE result from the GRU model

Fig. 7. sMAPE result from the XGBOOST model

Fig. 8. sMAPE result from the ensemble model

Figures 5, 6, 7 and 8 show the sMAPE results of the LSTM, GRU, XGBOOST and ensemble models respectively as box plots. The area where all models achieve the lowest sMAPE is the airport area.
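Before turning to the per-area results, the evaluation and combination steps of (3)-(5) can be sketched in Python: score each model's average sMAPE over the previous w time steps, rank the models, and blend their next predictions with the coefficients μ, α, β (0.7/0.2/0.1 in this paper, with w = 3). Function names and the dictionary-based interface are illustrative assumptions, not the authors' code.

```python
def smape(actual, predicted, c=1.0):
    """Symmetric mean absolute percentage error per (5); the constant
    C = 1 prevents division by zero when demand is zero."""
    return sum(abs(a - p) / (a + p + c) for a, p in zip(actual, predicted)) / len(actual)

def ensemble_predict(history, next_preds, weights=(0.7, 0.2, 0.1), w=3):
    """history: {model: (actual_list, predicted_list)} covering at least
    the previous w time steps; next_preds: {model: prediction for t+1}.
    Models are ranked by average sMAPE as in (3) and blended as in (4)."""
    # Average sMAPE of each model over the last w time steps, per (3).
    avg_err = {m: smape(a[-w:], p[-w:]) for m, (a, p) in history.items()}
    # Rank ascending: the best (lowest-error) model gets the largest weight.
    ranked = sorted(avg_err, key=avg_err.get)
    # Weighted arithmetic mean of the ranked predictions, per (4).
    return sum(wt * next_preds[m] for wt, m in zip(weights, ranked))
```

For example, if the LSTM tracked the last three hours perfectly while GRU and XGBOOST drifted, the blend leans 70% on the LSTM's next-hour prediction, re-ranking the models at every time step.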

TABLE II. AVERAGE SMAPE OF ALL MODELS IN PERCENT

Areas               lstm     gru      xgboost  ensemble
Airport             18.30%   18.80%   17.31%   17.62%
Residential Area    23.32%   23.95%   27.42%   24.01%
Department Store    24.65%   25.40%   24.21%   23.95%
Education Area      22.01%   22.23%   27.41%   22.20%
Hospital            24.01%   24.84%   26.92%   24.52%
Subway              27.01%   27.44%   25.75%   25.92%
Tourist Attraction  25.39%   25.44%   24.10%   23.85%
All places          24.32%   24.82%   25.55%   24.02%

The average sMAPE of all models in each area type is shown in Table II. The XGBOOST approach outperforms both the LSTM and GRU models in the airport, department store and tourist attraction areas, which are high taxi demand areas. The predictions of GRU are very close to LSTM's, but the results show that LSTM still provides better predictions than GRU for all types of area. Both LSTM and GRU work very well in low taxi demand areas such as residential, education and hospital areas. With an sMAPE of 24.32%, LSTM is the best standalone predictive model for taxi demand prediction in this experiment. The ensemble model provides the lowest error rate in only the tourist attraction and department store areas, but it produces results that are close to those of the best model in each type of area function. The average sMAPE of the ensemble model over all places is 24.02%, which is lower than that of all standalone models.

2) Comparing the predictions over a specific place

Fig. 9. One day prediction in the airport

Fig. 10. One week prediction in the subway area

Figure 9 shows the comparison of the one-day prediction of each model with the actual demand at the airport, while Figure 10 shows the comparison of the one-week prediction in the subway area. Figures 9 and 10 illustrate that the model which has provided a better prediction lately is likely to produce a more accurate prediction in the next time step. The ensemble model follows this trend. Therefore, it can provide a better prediction overall.

V. CONCLUSION

We proposed an ensemble model based on long short term memory network (LSTM), gated recurrent unit network (GRU) and eXtreme gradient boosting (XGBOOST) models to predict taxi demand in 7 types of area function in the city of Bangkok, Thailand. We use points of interest (POIs) to match the taxi demand with the studied areas. The model that has provided a better prediction lately is likely to produce a more accurate prediction in the next time step. The ensemble model predicts taxi demand by following this trend. The results show that the ensemble model outperforms the other standalone models with an sMAPE of 24.02% over all areas. Among the standalone models, LSTM provides the most accurate predictions in low taxi demand areas such as residential, hospital and education areas. LSTM achieves better results than GRU in all types of area. The XGBOOST model gives better results than the others in high taxi demand areas such as department store, subway and airport areas. This study shows that a single predictive model cannot provide the best prediction in all areas. Combining the predictions from various models can improve the overall performance.

ACKNOWLEDGMENT

This research is financially supported by the Thailand Advanced Institute of Science and Technology (TAIST), the National Science and Technology Development Agency (NSTDA), Tokyo Institute of Technology, and Sirindhorn International Institute of Technology, Thammasat University under the TAIST Tokyo Tech Program, and partially supported by the Center of Excellence in Intelligent Informatics, Speech and Language Technology and Service Innovation (CILS). The real-world taxi GPS dataset is supported by Toyota Tsusho Electronics (Thailand) Co., Ltd.
REFERENCES

[1] D. Zhang, N. Li, Z.-H. Zhou, C. Chen, L. Sun, and S. Li, "iBAT: detecting anomalous taxi trajectories from GPS traces," in Proc. 13th International Conference on Ubiquitous Computing (UbiComp), 2011.
[2] Y. Lv, Y. Duan, et al., "Traffic flow prediction with big data: A deep learning approach," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865-873, 2015.
[3] L. Moreira-Matias, et al., "Predicting taxi-passenger demand using streaming data," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393-1402, 2013.
[4] N. Davis, G. Raina, and K. Jagannathan, "A multi-level clustering approach for forecasting taxi travel demand," in Proc. IEEE ITSC, pp. 223-228, Dec. 2016.
[5] N. Mukai and N. Yoden, "Taxi demand forecasting based on taxi probe data by neural network," Smart Innovation, Systems and Technologies, 2012.
[6] K. Zhao, D. Khryashchev, J. Freire, C. Silva, and H. Vo, "Predicting taxi demand at high spatial resolution: Approaching the limit of predictability," in Proc. IEEE International Conference on Big Data, 2016.
[7] J. Xu, et al., "Real-time prediction of taxi demand using recurrent neural networks," IEEE Trans. Intell. Transp. Syst., vol. PP, no. 99, pp. 1-10, 2017.
[8] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
[9] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, D. Bahdanau, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[10] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," arXiv preprint arXiv:1603.02754, 2016.
[11] J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Annals of Statistics, pp. 1189-1232, 2001.
[12] F. Chollet, Keras, GitHub repository, 2015. Available: https://github.com/keras-team/keras
