
Flight Price Prediction 2

The document discusses a flight fare prediction system that uses machine learning algorithms to analyze historical flight data and predict ticket prices. The system applies techniques like data preprocessing, feature selection, and multiple regression analysis to create a predictive model. This model can help customers save money by informing them of trends in ticket prices and providing predicted fare amounts to refer to when booking flights.

The price of an airline ticket is affected by a number of factors, such as flight distance, purchasing time, and fuel price.

[Figure: airfare prediction framework. Inputs: Airline Origin and Destination Survey (DB1B), Air Carrier Statistics Database (T-100), and economic data. Stages: data preprocessing, feature extraction, feature selection, airfare prediction model.]

Flight Fare Prediction System

Vinod Kimbhaune, Harshil Donga, Asutosh Trivedi, Sonam Mahajan and Viraj Mahajan
EasyChair preprints are intended for rapid
dissemination of research results and are
integrated with the rest of EasyChair.

May 19, 2021

FLIGHT FARE PREDICTION SYSTEM


Ratnakar Garje (B-75), Ayush Tambe (B-57), Raj Singh (B-52)
Students of Third Year IT Engineering
Datta Meghe College of Engineering, Airoli, Navi Mumbai

Abstract: Travelling by air has become an integral part of today's lifestyle as more and more people opt for faster travelling options. Flight ticket prices increase or decrease every now and then depending on various factors such as the timing of the flights, the destination, the duration of the flights, and occasions such as vacations or the festive season. Therefore, having some basic idea of flight fares before planning a trip will help many people save money and time. In the proposed system, a predictive model will be created by applying machine learning algorithms to collected historical flight data. This system will give people an idea of the trends that prices follow and also provide a predicted price value which they can refer to before booking their flight tickets to save money.

I. INTRODUCTION
This project aims to develop an application which will predict the prices of various flights using a machine learning model. The user will get the predicted values and, with these as a reference, can decide when to book tickets.

In the current scenario, airlines try to manipulate flight ticket prices to maximize their profits. Many people travel regularly by air and so have an idea of the best time to book cheap tickets. But there are also many people who are inexperienced in booking tickets and end up falling into discount traps set by the companies, where they actually spend more than they should have. The proposed system can help customers save millions of rupees by providing them with the information needed to book tickets at the right time.

The proposed problem statement is “Flight Fare Prediction System”.

II. RELATED WORK
In study [1], "Airfare price prediction using machine learning techniques", a dataset consisting of 1814 flights of Aegean Airlines was collected and used to train machine learning models. Different numbers of features were used to train various models to show how the selection of features can change the accuracy of a model.

In case study [2], William Groves introduces an agent which is able to optimize purchase timing on behalf of customers. The partial least squares regression technique is used to build the model.

The survey paper [4] by Supriya Rajankar, a survey on flight fare prediction using machine learning algorithms, uses a small dataset consisting of flights between Delhi and Bombay. Algorithms such as K-nearest neighbours (KNN), linear regression, and support vector machine (SVM) are applied.

In the research by Santos [3], analysis is done on airfares for routes from Madrid to London, Frankfurt, New York and Paris over the course of a few months. The model provides the acceptable number of days in advance for buying the flight ticket.

Tianyi Wang [5] proposed a framework where two databases are combined together with macroeconomic data and
machine learning algorithms such as support vector machine and XGBoost are used to model the average ticket price based on source and destination pairs. The framework achieves a high prediction accuracy, with an adjusted R squared of 0.869.

In [6], a model is implemented using the Linear Quantile Mixed Regression methodology for the San Francisco–New York route, where daily airfares are obtained from an online website. Two features, the number of days until departure and whether departure is on a weekend or a weekday, are considered to develop the model.

III. IMPLEMENTATION
For this project, we have implemented the machine learning life cycle to create a basic web application which will predict flight prices by applying machine learning algorithms to historical flight data using Python libraries such as Pandas, NumPy, Matplotlib, seaborn and sklearn. Figure 1 shows the steps that we followed from the life cycle:

Fig. 1. Machine Learning Life Cycle

Data selection is the first step, where historical flight data is gathered for the model to predict prices. Our dataset consists of more than 10,000 records of data related to flights and their prices. Some of the features of the dataset are source, destination, departure date, departure time, number of stops, arrival time, prices and a few more.

In the exploratory data analysis step, we cleaned the dataset by removing duplicate values and null values. If these values are not removed, they would affect the accuracy of the model. We also gained further information such as the distribution of the data.

The next step is data pre-processing, where we observed that most of the data was present in string format. Data is extracted from each feature: for example, day and month are extracted from the date of journey in integer format, and hours and minutes are extracted from the departure time. Features such as source and destination needed to be converted into numeric values as they were of categorical type. For this, one-hot encoding and label encoding techniques are used to convert categorical values into values the model can work with.

The feature selection step involves selecting the important features that are more correlated to the price. Some features, such as extra information and route, are unnecessary and may affect the accuracy of the model; therefore, they need to be removed before getting our model ready for prediction.

After selecting the features which are more correlated to price, the next step involves applying a machine learning algorithm and creating a model. As our dataset consists of labelled data, we will be using supervised machine learning algorithms; within supervised learning, we will be using regression algorithms, as the value to be predicted (price) is continuous. Regression models are used to describe the relationship between dependent and independent variables. The machine learning algorithms that we will be using in our project are:

Linear Regression
In simple linear regression there is only one independent and one dependent feature, but as our dataset consists of many independent features on which the price may depend, we will be using multiple linear regression, which estimates the relationship between two or more
independent variables and one dependent variable.

The multiple linear regression model is represented by:

\[ Y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \varepsilon \]

Y = the predicted value of the dependent variable
x_1, ..., x_n = the independent variables
\beta_1, ..., \beta_n = the coefficients of the independent variables
\beta_0 = the y-intercept (the value of Y when all independent variables are 0)
\varepsilon = the model error term

Decision Tree
Decision trees are basically of two types, classification trees and regression trees, where classification is used for categorical values and regression is used for continuous values. A decision tree chooses independent variables from the dataset as decision nodes for decision making.

It divides the whole dataset into different sub-sections, and when test data is passed to the model the output is decided by checking the section to which the data point belongs. For whichever section the data point belongs to, the decision tree gives as output the average value of all the data points in that sub-section.

Random Forest
Random forest is an ensemble learning technique, where the training model uses multiple learning models and then combines their individual results to get a final predicted result. Within ensemble learning, random forest falls into the bagging category, where a random number of features and records will
be selected and passed to the group of models. Random forest basically uses a group of decision trees as its group of models. A random amount of data is passed to the decision trees, and each decision tree predicts values according to the dataset given to it. From the predictions made by the decision trees, the average of the predicted values is considered as the output of the random forest model.

Performance Metrics
Performance metrics are statistical measures which will be used to compare the accuracy of the machine learning models trained by different algorithms. The sklearn.metrics module will be used to implement the functions that measure the errors of each model using the regression metrics. The following metrics will be used to check the error measure of each model.

MAE (Mean Absolute Error)
Mean Absolute Error is the average of the absolute differences between the predicted and actual values.

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

y_i = actual output values
\hat{y}_i = predicted output values
n = total number of data points

The lower the value of MAE, the better the performance of the model.

MSE (Mean Square Error)
Mean Square Error squares the difference between the actual and predicted output values before summing them, instead of using the absolute value.

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

y_i = actual output values
\hat{y}_i = predicted output values
n = total number of data points

MSE punishes big errors more heavily because the errors are squared. The lower the value of MSE, the better the performance of the model.

RMSE (Root Mean Square Error)
RMSE is measured by taking the square root of the average of the squared differences between the predictions and the actual values.
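The error measures described so far can be computed directly from their definitions. The sketch below is only an illustration in plain Python: the project itself relies on the sklearn.metrics module, and the function names and the four fare values here are made up for the example.

```python
import math


def mae(actual, predicted):
    # Mean of the absolute differences |y - y_hat|.
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    # Mean of the squared differences (y - y_hat)^2.
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Square root of the mean squared error.
    return math.sqrt(mse(actual, predicted))


# Hypothetical fares in rupees, not taken from the project's dataset.
actual_fares = [4000.0, 5500.0, 7200.0, 3100.0]
predicted_fares = [4200.0, 5000.0, 7000.0, 3500.0]

print("MAE :", mae(actual_fares, predicted_fares))
print("MSE :", mse(actual_fares, predicted_fares))
print("RMSE:", rmse(actual_fares, predicted_fares))
```

Because the errors are squared before averaging, the single 500-rupee miss contributes more than half of the MSE here, which is the sense in which MSE and RMSE punish large errors.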
\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]

y_i = actual output values
\hat{y}_i = predicted output values
n = total number of data points

RMSE is always greater than or equal to MAE, and when comparing different models, the lower the value of RMSE, the better the performance of that model.

R2 (Coefficient of determination)
It helps you to understand how much of the variance in the dependent variable is explained by the independent variables in your model.

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]

\bar{y} = mean of the actual output values

The value of R-square normally lies between 0 and 1 (it can be negative when a model fits worse than always predicting the mean). The closer its value is to one, the better your model is when compared with other models.

There are also different cross-validation techniques, such as GridSearchCV and RandomizedSearchCV, which will be used for improving the accuracy of the model. Parameters of the models, such as the number of trees in a random forest or the maximum depth of a decision tree, can be changed using these techniques, which will help us further enhance the accuracy.

The last three steps of the life cycle model are involved in the deployment of the trained machine learning model. Therefore, after getting the model with the best accuracy, we store that model in a file using the pickle module. The back-end of the application will be created using the Flask framework, where API endpoints handling GET and POST requests will be created to perform operations related to fetching and displaying data on the front-end of the application.

The front-end of the application will be created using the Bootstrap framework, where the user will have the functionality of entering their flight data. This data will be sent to the back-end service, where the model will predict the output according to the provided data. The predicted value is sent to the front-end and displayed.
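The store-then-serve flow described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: in the real system the pickled object would be the fitted sklearn estimator and `predict_fare` would be called from a Flask POST endpoint. Here a plain dictionary of made-up linear coefficients stands in for the trained model, an in-memory buffer stands in for the model file, and all names (`save_model`, `load_model`, `predict_fare`, the feature keys) are hypothetical.

```python
import io
import pickle

# Stand-in for a trained model: made-up linear coefficients (assumption).
trained_model = {
    "intercept": 2500.0,
    "coef": {"total_stops": 1200.0, "duration_hours": 150.0},
}


def save_model(model, fileobj):
    # Serialize the model so the back-end can load it without retraining.
    pickle.dump(model, fileobj)


def load_model(fileobj):
    # Restore the model object exactly as it was saved.
    return pickle.load(fileobj)


def predict_fare(model, features):
    # What a back-end handler would do with the user's submitted flight data.
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["coef"].items()
    )


# io.BytesIO stands in for a file opened with open(..., "wb") / open(..., "rb").
buffer = io.BytesIO()
save_model(trained_model, buffer)
buffer.seek(0)
restored_model = load_model(buffer)

user_input = {"total_stops": 1, "duration_hours": 2.5}
fare = predict_fare(restored_model, user_input)
print("Predicted fare:", fare)
```

Keeping the prediction logic in a plain function like this makes it easy to test separately from the web framework that eventually wraps it.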

IV. CONCLUSION
A proper implementation of this project can save inexperienced people money by providing them with information about the trends that flight prices follow and by giving them a predicted price which they can use to decide whether to book a ticket now or later. In conclusion, this type of service can be implemented with good prediction accuracy. As the predicted value is not fully accurate, there is huge scope for improvement of this kind of service.

Fig. 2. System Architecture Diagram

V. FUTURE SCOPE
Currently, there are many fields where prediction-based services are used, such as stock price predictor tools used by stock brokers and services like Zestimate, which gives the estimated value of house prices. Therefore, there is a requirement for a service like this in the aviation industry which can help customers in booking tickets. Much research has been done on this using various techniques, and more research is needed to improve the accuracy of the prediction by using different algorithms. More accurate data with better features can also be used to get more accurate results.

ACKNOWLEDGMENT
We feel great pleasure in submitting this project paper on flight fare prediction. We wish to express our true sense of gratitude towards our project guide, Dr. V. V. Kimbhahune, who at every stage in the study of this project contributed his valuable guidance and helped to solve each and every problem. Our great obligation would remain due towards Dr. V. V. Kimbhahune, who was a constant inspiration during our project. He provided us with an opportunity to undertake the project at Smt. Kashibai Navale College of Engineering, Pune. We feel highly indebted to those who provided us with all our project requirements and did much beyond our expectations to bring out the best in us. We sincerely thank our respected Head of Department, Dr. P. N. Mahalle, who proved to be a constant motivation for knowledge acquisition and moral support during our course curriculum.

REFERENCES
[1] K. Tziridis, T. Kalampokas, G. Papakostas and K. Diamantaras, "Airfare price prediction using machine learning techniques," in European Signal Processing Conference (EUSIPCO), DOI: 10.23919/EUSIPCO.2017.8081365.

[2] William Groves and Maria Gini, "An agent for optimizing airline ticket purchasing," in Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems.

[3] J. Santos Dominguez-Menchero, Javier Rivera and Emilio Torres-Manzanera, "Optimal purchase timing in the airline market".

[4] Supriya Rajankar, Neha Sakhrakar and Omprakash Rajankar, "Flight fare prediction using machine learning algorithms," International Journal of Engineering Research and Technology (IJERT), June 2019.

[5] Tianyi Wang, Samira Pouyanfar, Haiman Tian and Yudong Tao, "A framework for airline price prediction: A machine learning approach".

[6] T. Janssen, "A linear quantile mixed regression model for prediction of airline ticket prices".

[7] Wohlfarth, T. Clemencon, S. Roueff, "A data mining approach to travel price forecasting," 10th International Conference on Machine Learning, Honolulu, 2011.

[8] medium.com/analytics-vidhya/mae-mse-rmse-coefficient-of-determination-adjusted-r-squared-which-metric-is-better-cd0326a5697e, article on performance metrics.

[9] www.keboola.com/blog/random-forest-regression, article on random forest.

[10] https://towardsdatascience.com/machine-learning-basics-decision-tree-regression-1d73ea003fda, article on decision tree regression.
