Predicting Taxi Demand Using Machine Learning: International Research Journal of Engineering and Technology (Irjet)

The document discusses a methodology for predicting taxi demand in urban areas using machine learning algorithms and streaming data. It evaluates various models, including Random Forest, Linear Regression, and XGBoost, to optimize taxi dispatching and reduce passenger waiting times. The study concludes that Random Forest Regression provides the best accuracy for predicting taxi demand, aiding taxi service companies in improving their operational efficiency.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072

Predicting Taxi Demand Using Machine Learning


Suhas A Bhyratae¹, Arvind R², Bhuvan Kumar S³, Aishwarya R Pillai⁴
¹Assistant Professor, Dept of ISE, Atria Institute of Technology, Bangalore, Karnataka
²,³,⁴Students, Dept of ISE, Atria Institute of Technology, Bangalore, Karnataka

---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Taxi service is imbalanced in big cities. Taxi drivers have to decide where to wait for passengers so that they can pick someone up as soon as possible, and passengers always prefer a quick taxi service whenever they need one. The control centre can decide which busy areas to concentrate on. The sensors installed in these vehicles help in automatically discovering new facts, and this data is already being used by transportation systems to find time-saving routes, for taxi dispatching and for other such purposes. By organizing the availability of taxis, more customers can be served in a short time. In this paper we use six different algorithms along with streaming data to improve the performance of demand prediction and taxi-passenger distribution over a short-term time horizon. We evaluate our method on a dataset from New York City by dividing the city into smaller areas and then analyzing and predicting the demand in each area. The data set includes around nineteen different features, with properties like the GPS location, pickup points, drop-off location, etc. This model can be used to predict the demand in different areas of the city at a particular time, and we show which algorithm gives the best results.

Key Words: Taxi Demand Prediction, Baseline Models, Regression Models, Time Series Data

1. INTRODUCTION

Taxi drivers need to choose a place to wait for passengers so that they can pick someone up quickly. Likewise, passengers need to find a cab quickly. Dispatching taxis resourcefully helps both customers and drivers and reduces waiting time for both. In this system, real-time taxi demand prediction is proposed, and streaming data is used to predict the future demand for taxis in a particular area at a particular time. The real-time objectives include managing a large number of taxis in a crowded area, utilizing resources effectively to lessen waiting time, and organizing the available taxis to serve more customers in a short time. Our system uses features like GPS location and other properties of the taxi, such as pickup point and drop point, to predict taxi demand.

Our work focuses on the real-time choice problem of which taxi stand to go to after a passenger drop-off (i.e. the stand where the next passenger can be picked up soonest). The network reliability for both companies and clients can be improved with a smart approach to this issue: a clever allotment of vehicles across stands will reduce the average waiting time to pick up a passenger, while the distance travelled will be profitable. Passengers will also experience a lower waiting time to get a taxi, which will be automatically dispatched or directly picked up at a stand.

2. PROPOSED METHODOLOGY

Prediction of taxi demand is a time series analysis problem. The steps involved are: cleaning the data, clustering, Fourier transform and making predictions using machine learning models. The system requires a minimum of a Pentium 2.266 MHz processor, 1 GB of RAM and 250 MB of disk space, and is written in Python. A collection of libraries such as dask, folium, numpy, pandas, matplotlib, etc. is also used. The input data was collected from the New York City Taxi and Limousine Commission’s website. The collected data comprised around 7000 examples, which had to be cleaned, and it was brought up to 92% accuracy. The dataset is cleaned during preprocessing: redundant data was removed depending on factors such as whether the pickup point was outside the city or whether the trip lasted more than 24 hours, and incomplete records were also removed. Once the cleaned data set is available, it is clustered using the K-means algorithm. All the time series data is then converted into the frequency domain using Fourier transforms to obtain frequency and amplitude. This is then given as input to various baseline models and regression models, whose output is the accuracy. The model with the best accuracy is selected for prediction. The baseline models used in this system are Simple Moving Average, Weighted Moving Average and Exponential Moving Average, and the regression models used are Linear Regression, Random Forest and XGBoost.

A moving average analyzes data points by creating a series of averages of various subsets of the complete data set. It is also called a moving mean or rolling mean, and it is a type of finite impulse response filter. The different types are simple, weighted and exponential. A simple moving average (SMA) is an arithmetic moving average calculated by taking the sum of the recent values and dividing it by the number of values. Short-term averages react quickly to changes in the values of the underlying data, while long-term averages are slower to react. The formula for the SMA is:

$$\mathrm{SMA} = \frac{A_1 + A_2 + \cdots + A_n}{n}$$

An exponentially weighted moving average (EWMA), also known as an exponential moving average (EMA), is a first-order infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease. Fig. 1 shows the simple and exponential moving average indicators.

Fig -1: Simple and Exponential Moving Average

A weighted moving average (WMA) is an average that has multiplying factors to give different weights to data at different positions in the sample window. Mathematically, the weighted moving average is the convolution of the data points with a fixed weighting function. Fig. 2 shows how the weights decrease, from the highest weight for the most recent data points down to zero. This can be compared with the weights in the exponential moving average described above.

Fig -2: Weighted Moving Average

Linear regression is a type of regression model. It is a linear approach to modelling the relationship between a scalar response and one or more independent variables. The case with one independent variable is called simple linear regression; the case with more than one independent variable is called multiple linear regression. This is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted rather than a single scalar variable. In linear regression, the relationships are modelled using linear predictor functions whose unknown parameters are estimated from the data. Such models are called linear models. Given the values of the predictors, linear regression focuses on the conditional probability distribution of the response, rather than on the joint probability distribution of all of these variables. Linear regression is used extensively because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters. Fig. 3 shows how the values are divided in a simple linear regression.

Fig -3: Linear Regression

Random forest, also known as random decision forests, is a method for regression, classification and other tasks that works by constructing a collection of decision trees at training time and outputting the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Fig. 4 gives the structure of a random forest.

Fig -4: Random Forest Structure
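
The three baseline models named in this section (simple, weighted and exponential moving averages) can be illustrated with a short sketch. This is not the authors' code: the function names, the sample demand values, the linear weights 1..n used for the WMA and the smoothing factor alpha = 0.5 are all assumptions chosen for illustration. Each function returns a one-step-ahead forecast from past demand counts.

```python
from typing import Sequence


def simple_moving_average(values: Sequence[float], window: int) -> float:
    """Mean of the last `window` values: (A_1 + ... + A_n) / n."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def weighted_moving_average(values: Sequence[float], window: int) -> float:
    """Linearly weighted mean: the newest value gets weight n, the oldest weight 1.

    The linear weighting scheme is an assumption; other fixed weights are possible.
    """
    recent = values[-window:]
    weights = range(1, len(recent) + 1)
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


def exponential_moving_average(values: Sequence[float], alpha: float) -> float:
    """Recursive EMA: the weight of each older value shrinks by a factor (1 - alpha)."""
    ema = values[0]
    for v in values[1:]:
        ema = alpha * v + (1 - alpha) * ema
    return ema


# Hypothetical pickup counts for one region in consecutive time bins.
demand = [120, 135, 150, 160, 155, 170]

sma_forecast = simple_moving_average(demand, window=3)
wma_forecast = weighted_moving_average(demand, window=3)
ema_forecast = exponential_moving_average(demand, alpha=0.5)
```

With these values the WMA forecast lies above the SMA forecast because the most recent (and largest) count carries the highest weight, which is the behaviour the weighting is meant to produce.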

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 7339
XGBoost is a highly flexible and versatile tool that can handle most regression, classification and ranking problems. It can be easily accessed and used through different platforms. XGBoost stands for eXtreme Gradient Boosting. The algorithm was developed to reduce computing time and to make efficient use of memory resources. Handling of missing values and support for parallelization in tree construction are some of its important features. Fig. 5 shows an example structure of XGBoost.

Fig -5: Example tree structure for XGBoost

Our system produces predictions, and the algorithm that gives the best accuracy is selected through this process. We see that Random Forest, Regression Tree and XGBoost are the most suitable.

3. GENERAL STRUCTURE

Fig -6: General Structure of the System

The above figure shows the general structure of the model. The data first undergoes data mining, in which only the required features are extracted. After data mining, we perform clustering using the k-means algorithm. The data is then passed as input to the different models, and the predictions are obtained as output. The model that gives the most accurate prediction is selected.

4. CONCLUSIONS

Random Forest Regression seems to be the best model: its train MAPE decreases below 12% with no sign of overfitting or underfitting, whereas the other models seem to overfit slightly. All models have a test MAPE in the range of 12.6% to 13.6%.

Fig -7: MAPE of the models

Our approach to predicting taxi demand in a particular area during a particular time interval provides a simple and efficient method for taxi service companies to improve their business model based on the demand for taxis and the availability of customers.
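
The models are compared by MAPE (mean absolute percentage error). The paper does not state which variant of the metric was used, so the sketch below assumes the common per-sample definition, the mean of |actual - predicted| / |actual| expressed in percent. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, skipping bins with zero actual demand."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical pickup counts and forecasts for four time bins.
actual_demand = [100, 200, 50, 80]
predicted_demand = [90, 220, 55, 80]

error_percent = mape(actual_demand, predicted_demand)
```

Computing this separately on the training and test splits gives the two figures whose gap is used above to judge overfitting: a train MAPE far below the test MAPE suggests the model has memorized the training data.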
