
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072

Predicting Taxi Demand Using Machine Learning

Suhas A Bhyratae¹, Arvind R², Bhuvan Kumar S³, Aishwarya R Pillai⁴
¹Assistant Professor, Dept of ISE, Atria Institute of Technology, Bangalore, Karnataka
²,³,⁴Students, Dept of ISE, Atria Institute of Technology, Bangalore, Karnataka

Abstract - Taxi service is imbalanced in big cities. Taxi drivers have to decide where to wait for passengers so that they can pick someone up as soon as possible, and passengers always prefer a quick taxi service whenever they need one. The control centre can decide which busy areas taxis should concentrate on. Sensors installed in the vehicles help to discover new facts automatically, and this data is already used by transport systems to find time-saving routes, dispatch taxis and support other such tasks. By organizing taxi availability, more customers can be served in a short time. In this paper we use six different algorithms, along with streaming data, to improve the prediction of demand and of the taxi-passenger distribution over a short-term time horizon. We evaluate our method on a New York City dataset by dividing the city into smaller areas and then analyzing and predicting the demand in each area. The dataset includes around nineteen features, such as the GPS location, pickup point and drop-off location. The model can be used to predict demand in different areas of the city at a particular time, and we show which algorithm gives the best results.

Key Words: Taxi Demand Prediction, Baseline Models, Regression Models, Time Series Data

1. INTRODUCTION

Taxi drivers need to choose a place to wait for passengers so that they can pick someone up quickly. Likewise, passengers need to find their cabs quickly. Dispatching taxis efficiently helps both customers and drivers by reducing waiting time for both. In this system, real-time taxi demand prediction is proposed: streaming data is used to predict the future demand for taxis in a particular area at a particular time. The real-time objectives include managing large numbers of taxis in crowded areas, using resources effectively to reduce waiting time, and organizing the available taxis to serve more customers in a short time. Our system uses features such as the GPS location and other trip properties, like the pickup and drop-off points, to predict taxi demand.

Our work focuses on the real-time decision problem of which taxi stand to go to after a passenger drop-off (i.e. the stand where the next passenger can be picked up most quickly). The reliability of the taxi network for both companies and clients can be improved with a smart approach to this issue: a clever allocation of vehicles across stands will reduce the average waiting time to pick up a passenger while keeping the distance travelled profitable. Passengers will also experience a shorter wait for a taxi, which will either be dispatched automatically or be picked up directly at a taxi stand.

2. PROPOSED METHODOLOGY

Prediction of taxi demand is a time series analysis problem. The steps involved are: cleaning the data, clustering, applying the Fourier transform and making predictions using machine learning models. The system is implemented in Python and requires at least a Pentium 2.266 GHz processor, 1 GB of RAM and 250 MB of disk space. Libraries such as dask, folium, numpy, pandas and matplotlib are also used. The input data was collected from the New York City Taxi and Limousine Commission's website. The collected data comprised around 7000 examples, which had to be cleaned; cleaning brought the data up to 92% accuracy. Cleaning is done during preprocessing: redundant and invalid records were removed, for example records whose pickup point was outside the city, trips that lasted more than 24 hours, and incomplete records. Once the cleaned dataset is available, it is clustered using the K-means algorithm. All the time series data is then converted into the frequency domain with Fourier transforms to obtain frequencies and amplitudes. These are given as input to various baseline models and regression models, whose predictions are evaluated for accuracy, and the model with the best accuracy is selected for prediction. The baseline models used in this system are the Simple Moving Average, Weighted Moving Average and Exponential Moving Average, and the regression models are Linear Regression, Random Forest and XGBoost.

A moving average analyzes data points by creating a series of averages of different subsets of the complete dataset. It is also called a moving mean or rolling mean, and it is a type of finite impulse response filter. Three types are used here: simple, weighted and exponential. A simple moving average (SMA) is the unweighted arithmetic mean of the most recent values: their sum divided by the number of values. Short-term averages react quickly to changes in the underlying values, while long-term averages are slower to react.
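The cleaning rules described above (pickup outside the city, trip longer than 24 hours, incomplete record) can be expressed as a simple record filter. The sketch below is illustrative only: the `Trip` record layout and the latitude/longitude bounding box used to decide "inside the city" are assumptions, not values from the paper, and the real TLC files use their own column names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed NYC bounding box (illustrative; not given in the paper).
LAT_RANGE = (40.5774, 40.9176)
LON_RANGE = (-74.15, -73.7004)
MAX_TRIP_SECONDS = 24 * 60 * 60  # trips longer than 24 hours are discarded


@dataclass
class Trip:
    # Hypothetical, simplified record; real TLC files use their own column names.
    pickup_lat: Optional[float]
    pickup_lon: Optional[float]
    pickup_time: Optional[float]   # Unix timestamp, seconds
    dropoff_time: Optional[float]  # Unix timestamp, seconds


def is_valid(trip: Trip) -> bool:
    """Apply the cleaning rules from the methodology to one record."""
    fields = (trip.pickup_lat, trip.pickup_lon, trip.pickup_time, trip.dropoff_time)
    if any(value is None for value in fields):
        return False  # incomplete record
    inside_city = (LAT_RANGE[0] <= trip.pickup_lat <= LAT_RANGE[1]
                   and LON_RANGE[0] <= trip.pickup_lon <= LON_RANGE[1])
    if not inside_city:
        return False  # pickup point outside the city
    duration = trip.dropoff_time - trip.pickup_time
    # Rejecting negative durations is an extra assumption beyond the paper's rules.
    return 0 <= duration <= MAX_TRIP_SECONDS


def clean(trips: List[Trip]) -> List[Trip]:
    """Keep only the records that pass every rule."""
    return [trip for trip in trips if is_valid(trip)]
```

On the full dataset the same predicate would more likely be expressed as vectorised pandas or dask filters; the plain loop just keeps each rule visible.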

© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 7338
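The K-means clustering step in Section 2, which divides the city into smaller areas, can be sketched with plain Lloyd's algorithm on pickup coordinates. This is a minimal illustration, assuming pickups are (latitude, longitude) pairs and that the number of regions `k` is chosen by the user (the paper does not state how many regions it uses). It treats degrees as planar coordinates, which is only an approximation over small areas.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) of a pickup


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # start from k data points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        # (Euclidean distance on raw degrees is a small-area approximation.)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster where it was
        if updated == centroids:
            break  # assignments are stable: converged
        centroids = updated
    return centroids, labels
```

Each pickup's label then identifies the region whose demand series is forecast. For millions of trips, a library implementation such as scikit-learn's `KMeans` would normally replace this loop.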
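The Fourier step described in Section 2 turns each region's demand time series into frequency and amplitude features. A hedged sketch using a naive discrete Fourier transform follows. Keeping only the strongest few frequencies, and the `count` parameter that controls how many, are assumptions, since the paper does not say how the spectrum is summarised. `numpy.fft.fft` computes the same transform efficiently.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(series: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; numpy.fft.fft gives the same result faster."""
    n = len(series)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
            for k in range(n)]


def top_frequencies(series: Sequence[float], count: int) -> List[Tuple[float, float]]:
    """Return (frequency in cycles per time step, amplitude) for the strongest components.

    Amplitude here is the unnormalised magnitude |X_k|. The zero-frequency (mean)
    term is skipped, and only the first half of the spectrum is examined because
    the spectrum of a real-valued series is symmetric.
    """
    n = len(series)
    spectrum = dft(series)
    pairs = [(k / n, abs(spectrum[k])) for k in range(1, n // 2 + 1)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)  # strongest first
    return pairs[:count]
```

For example, a series with a daily cycle sampled every 10 minutes would show its strongest component at 1/144 cycles per step.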

The formula for SMA is:

$$\mathrm{SMA} = \frac{A_1 + A_2 + \cdots + A_n}{n}$$

where $A_1, \dots, A_n$ are the $n$ most recent values.

An exponentially weighted moving average (EWMA), also known as an exponential moving average (EMA), is a first-order infinite impulse response filter that applies weighting factors which decrease exponentially: the weight of each older datum decreases exponentially but never reaches zero. Fig. 1 shows the simple and exponential moving average indicators.

Fig -1: Simple and Exponential Moving Average

A weighted moving average (WMA) is an average that uses multiplying factors to give different weights to data at different positions in the sample window. Mathematically, the weighted moving average is the convolution of the data points with a fixed weighting function. Fig. 2 shows how the weights decrease, from the highest weight for the most recent data points down to zero. These weights can be compared with those of the exponential moving average described above.

Fig -2: Weighted Moving Average

Linear regression is a type of regression model. It is a linear approach to modelling the relationship between a scalar response and one or more independent variables. The case with one independent variable is called simple linear regression; the case with more than one independent variable is called multiple linear regression. (Multiple linear regression is distinct from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single scalar variable.) In linear regression, the relationships are modelled using linear predictor functions whose unknown parameters are estimated from the data; such models are called linear models. Given the values of the predictors, linear regression focuses on the conditional probability distribution of the response rather than on the joint probability distribution of all of these variables. Linear regression is widely used because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters. Fig. 3 illustrates a simple linear regression.

Fig -3: Linear Regression

Random forest, also known as random decision forests, is a method for regression, classification and other tasks that works by constructing a collection of decision trees at training time and outputting the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Fig. 4 shows the structure of a random forest.

Fig -4: Random Forest Structure
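The three baseline models described above can each produce a one-step-ahead demand forecast from a region's recent history. In this sketch the window size, the linear WMA weights (n, n-1, ..., 1) and the EMA smoothing factor `alpha` are assumptions chosen for illustration; the paper does not report them. The EMA recurrence $S_t = \alpha x_t + (1-\alpha) S_{t-1}$ shrinks the weight of each older value by a factor of $(1-\alpha)$ per step without ever reaching zero.

```python
from typing import List, Sequence


def sma_forecast(values: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` values: (A_1 + ... + A_n) / n."""
    recent = list(values[-window:])
    return sum(recent) / len(recent)


def wma_forecast(values: Sequence[float], window: int) -> float:
    """Weighted moving average with linear weights: newest value gets weight n, oldest gets 1."""
    recent = list(values[-window:])
    weights = list(range(1, len(recent) + 1))  # oldest -> newest
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


def ema_series(values: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average S_t = alpha * x_t + (1 - alpha) * S_{t-1}."""
    smoothed = [values[0]]  # seeded with the first observation (one common convention)
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

For example, with demand counts `[10, 20, 30, 40]` and a window of 3, the SMA forecast is 30.0 while the WMA forecast is about 33.3, because the WMA leans towards the most recent value.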

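Simple linear regression, the one-variable case described above, has a closed-form least-squares solution: $\hat\beta_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below fits it in pure Python. Using, say, the previous interval's pickup count as the single predictor would be a hypothetical choice, since the paper does not list the exact features fed to its regression models.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(x: Sequence[float],
                                 y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ intercept + slope * x (one independent variable)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return intercept + slope * x
```

With several predictors (multiple linear regression) the same idea generalises to solving the normal equations, which libraries such as scikit-learn's `LinearRegression` handle.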
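The random forest described above can be sketched as bagged regression trees: each tree is grown on a bootstrap sample, each split considers a random subset of features, and the forest averages the trees' predictions. This is a small teaching sketch, not the implementation used in the paper; the tree count, depth limit and p/3 feature-subset rule are assumed defaults. In practice a library such as scikit-learn's `RandomForestRegressor` would be used.

```python
import random
from typing import List, Optional, Sequence

Row = Sequence[float]


def _mean(ys: Sequence[float]) -> float:
    return sum(ys) / len(ys)


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean: the regression-tree split criterion."""
    m = _mean(ys)
    return sum((y - m) ** 2 for y in ys)


class _Node:
    def __init__(self, value: float):
        self.value = value                   # prediction when the node is a leaf
        self.feature: Optional[int] = None   # split feature; None marks a leaf
        self.threshold = 0.0
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


def _grow(X: List[Row], y: List[float], depth: int, max_depth: int,
          n_features: int, rng: random.Random) -> _Node:
    node = _Node(_mean(y))
    if depth >= max_depth or len(set(y)) < 2:
        return node  # depth limit reached or node already pure
    best = None  # (score, feature, threshold)
    # Random forests consider only a random subset of features at each split.
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return node  # every sampled feature is constant here
    _, node.feature, node.threshold = best
    go_left = [row[node.feature] <= node.threshold for row in X]
    node.left = _grow([r for r, g in zip(X, go_left) if g],
                      [v for v, g in zip(y, go_left) if g],
                      depth + 1, max_depth, n_features, rng)
    node.right = _grow([r for r, g in zip(X, go_left) if not g],
                       [v for v, g in zip(y, go_left) if not g],
                       depth + 1, max_depth, n_features, rng)
    return node


def _predict_tree(node: _Node, row: Row) -> float:
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.value


class RandomForestSketch:
    """Bagged regression trees with per-split feature sampling; outputs the mean prediction."""

    def __init__(self, n_trees: int = 25, max_depth: int = 4, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees: List[_Node] = []

    def fit(self, X: List[Row], y: List[float]) -> "RandomForestSketch":
        rng = random.Random(self.seed)
        n_features = max(1, len(X[0]) // 3)  # p/3: a common choice for regression
        self.trees = []
        for _ in range(self.n_trees):
            idx = rng.choices(range(len(X)), k=len(X))  # bootstrap sample (with replacement)
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    0, self.max_depth, n_features, rng))
        return self

    def predict(self, row: Row) -> float:
        return _mean([_predict_tree(tree, row) for tree in self.trees])
```

Averaging many decorrelated trees is what reduces variance relative to a single tree, which is consistent with the paper's observation that Random Forest showed the least overfitting.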

XGBoost is a highly flexible and versatile tool that can handle most regression, classification and ranking problems, and it can easily be accessed and used on different platforms. XGBoost stands for eXtreme Gradient Boosting. The algorithm was developed to reduce computation time and to make efficient use of memory resources. Its important features include handling of missing values and support for parallelization in tree construction. Fig. 5 shows an example tree structure for XGBoost.

Fig -5: Example tree structure for XGBoost

Our system produces predictions, and the algorithm that gives the best accuracy is selected through this process.

3. GENERAL STRUCTURE

Fig -6: General Structure of the System

The above figure shows the general structure of the model. The data first undergoes data mining, in which only the required features are extracted. After data mining, we perform clustering using the K-means algorithm. The data is then passed as input to the different models, which produce predictions as output, and the model that gives the most accurate prediction is selected.

4. CONCLUSIONS

Random Forest Regression appears to be the best model: its training MAPE falls below 12% with no sign of overfitting or underfitting, whereas the other models appear to overfit slightly. All models have a test MAPE in the range of 12.6% to 13.6%. We see that Random Forest, Regression Tree and XGBoost are the most suitable models.

Fig -7: MAPE of the models

Our approach to predicting taxi demand in a particular area at a particular time interval provides a simple and efficient method for taxi service companies to improve their business model based on taxi demand and customer availability.
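XGBoost builds on gradient boosting: trees are added one at a time, each fitted to the errors of the current ensemble. The sketch below shows only that core idea, using depth-1 trees (stumps) and squared-error loss, for which the negative gradient is simply the residual. It is plain gradient boosting, not XGBoost: it omits XGBoost's regularized objective, second-order split gain, missing-value handling and parallel tree construction. The number of rounds and the learning rate are illustrative assumptions; the real library exposes a scikit-learn-style `xgboost.XGBRegressor`.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def _fit_stump(X: List[Row], residuals: List[float]) -> Stump:
    """Fit a depth-1 regression tree to the residuals by minimising squared error."""
    best = None  # (sse, feature, threshold, left value, right value)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def _apply(stump: Stump, row: Row) -> float:
    f, t, lv, rv = stump
    return lv if row[f] <= t else rv


class GradientBoostingSketch:
    """Gradient boosting with squared-error loss and stumps (NOT XGBoost itself)."""

    def __init__(self, n_rounds: int = 100, learning_rate: float = 0.1):
        self.n_rounds, self.learning_rate = n_rounds, learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Row], y: List[float]) -> "GradientBoostingSketch":
        self.base = sum(y) / len(y)  # start from the mean target
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # For squared loss the negative gradient is just the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = _fit_stump(X, residuals)
            self.stumps.append(stump)
            # Shrink each new tree's contribution by the learning rate.
            pred = [p + self.learning_rate * _apply(stump, row) for p, row in zip(pred, X)]
        return self

    def predict(self, row: Row) -> float:
        return self.base + self.learning_rate * sum(_apply(s, row) for s in self.stumps)
```

The small learning rate makes each round correct only part of the remaining error, which is the usual trade-off between more rounds and less overfitting.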

