Prediction of Flight-Fare Using Machine Learning
net/publication/369249835
8 authors, including:
Aditi Sharma
Parul University
All content following this page was uploaded by Aditi Sharma on 21 March 2023.
Technology and Science For Hyderabad, India Nazarbayev University, and Science For Women,
Women,Guntur,India [email protected] Asthana, Kazakhstan Guntur,India
[email protected] [email protected] [email protected]
Abstract— As demand for air travel in India grows and more flight tickets are purchased online, passengers are trying to understand how airlines set ticket prices over time. A variety of strategies aim to help travellers buy at the right moment. Customers want the cheapest ticket possible, while airlines want to maximize profit by keeping their total revenue as high as feasible. To increase revenue, airlines use a number of computational tactics, including demand forecasting and price discrimination. This work is aimed at the consumer who buys a flight ticket and estimates the flight fare. From the customer's perspective, the most difficult part is finding the best price or the ideal time to purchase a ticket. Most existing techniques rely on advanced computational intelligence and prediction models from the field of Machine Learning (ML). This research highlights the relevant factors and provides guidance for developing a machine learning-based flight fare prediction model.

Keywords— Flight fare, Machine Learning, Random Forest, Hyperparameter tuning

I. INTRODUCTION

Everyone wants to try new things in life, and travel is one of the biggest ways of doing so. India has some of the most fascinating cities in the world and a wealth of options for visitors, and it is one of the world's most religiously and ethnically diverse nations. Because of its cultural and geographical variety, India is one of the world's most promising countries. These developments also raise concerns regarding the cost of airline tickets. When comparing the price of an airline ticket today with the price on the previous day, it can be difficult to make an educated judgement. Tourists who wish to visit a new location in India should be aware of ticket prices in order to obtain the cheapest and most reliable ticket that meets their requirements. This gap motivates forecasting flight ticket prices, to make it easier for travellers to buy tickets that meet their demands.

The data set for this project contains 10683 records with 13 columns that describe international and domestic flights in India in 2019. In this paper, we analyze this data set using machine learning techniques in order to forecast the price of an airline ticket from the values in the data columns. We remove null values (erroneous input) in these columns so that they do not affect the result of our analysis. We start with the Duration column to see how many entries are in each group, and we check for null values before discarding the NaN values.

The next step is Exploratory Data Analysis (EDA), which is important for the quality of our forecast. We engineer certain features so that they can be used to explain the output of our machine learning model. Handling categorical data is one of the most significant aspects of EDA. Categorical data can be classified into nominal data (without order), which is handled with One Hot Encoder, and ordinal data (with order), which is handled with Label Encoder. Because the Airline column is nominal categorical data, we encode it using One Hot Encoder. The same code is applied to the Source column, and the Price and Source features are compared.

The training and test datasets are split beforehand in our machine learning process, which helps prevent data leakage. Once the data is imported and pre-processed, the goal is to forecast the price, so we perform feature selection to determine which features have the strongest relationship with the target variable (Price). We partition the data using scikit-learn and fit a Random Forest model. The resulting score can be improved through hyperparameter tuning, for which Randomized Search CV and Grid Search CV are available. This research looks at the three most popular designs and learning methodologies. We are able to overcome this challenge in our project. The literature survey is presented in Section II, and the proposed system architecture, algorithm, and procedure are described in the sections that follow it.

II. LITERATURE SURVEY

Tiyani Wang [1] proposed predicting ticket prices at the market-segment level, using the DB1B and T-100 datasets together with data about the economy. The proposed framework has several primary components. In the data preparation stage, all datasets are cleaned to exclude inaccurate samples, transformed, and merged by market segment. The feature extraction module then extracts and generates handcrafted features that are intended to characterize a market segment.
Authorized licensed use limited to: Parul Institute of Engineering and Technology. Downloaded on March 15,2023 at 05:36:29 UTC from IEEE Xplore. Restrictions apply.
P.H.K. Tissera [2] presented, as the output of the research component, a web application built with React Native, a hybrid application development platform, together with two APIs. One API is written in Node.js, while the other is written in Python Flask.

K. Tziridis [3] and Th. Kalampokas [4] observed that airline firms use complex tactics and approaches to assign dynamic airfare prices. These tactics consider a number of financial, marketing, commercial, and societal elements that all influence the final flight cost. Because the pricing mechanisms employed by airlines are so complicated and prices fluctuate often, it is quite difficult for a customer to get the best deal on an airline ticket.

G.A. Papakostas [5] noted that several strategies have recently been presented that can suggest the optimal moment for a consumer to purchase an airline ticket by forecasting the price of the flight. Most of these strategies rely on advanced prediction models developed in Machine Learning (ML), a branch of computational intelligence research.

Janssen [6] designed a linear quantile hybrid regressor model that performs well for predicting plane ticket prices several days before arrival.

Ren, Yang and Yuan [7] studied models for predicting aircraft ticket prices and reported that LR (77.06% acc.), NB (73.06% acc.), SR (76.84% acc.), and SVM (80.6% acc. for two bins) performed well.

... we predict the cheapest flight ticket price using machine learning algorithms.

Fig. 1. Architecture of Proposed System

IV. ALGORITHM
For v = 1, ..., V:
1. Sample, with replacement, n training examples from Z, Y; call these Z_v, Y_v.
2. Train a classification or regression tree f_v on Z_v, Y_v.

After training, predictions for unseen samples z' can be made by averaging the predictions from all the individual regression trees on z':

$$\hat{f} = \frac{1}{V} \sum_{v=1}^{V} f_v(z')$$

Alternatively, the majority vote is taken in the case of classification trees.

Additionally, the standard deviation of the predictions from all of the individual regression trees on z' may be used to quantify the uncertainty of the prediction.

V. RESULTS

Figure 2 is a heat map, a graphical depiction in which the individual values of a matrix are portrayed as colors. A heat map is a useful tool for visualizing how values are distributed across the two dimensions of a matrix. This enhances pattern detection and gives the impression of depth.

Fig. 2. Heat-map

The model-fitting feature selection approach shown in Figure 3 is tied to the machine learning algorithm that we aim to fit to a given data set. It follows a greedy search in which feasible feature combinations are evaluated against the evaluation criterion.
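The greedy search described for the feature selection approach of Figure 3 can be illustrated with a small forward-selection sketch. This is one possible greedy variant, not necessarily the procedure used in the paper. The scoring function `toy_score`, the usefulness values, and the `Noise` feature are hypothetical stand-ins for a real validation score; only the column names Duration, Airline, and Source come from the text.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the feature that most improves `score`; stop when none helps."""
    selected: List[str] = []
    best_score = score(frozenset())
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no candidate improves the evaluation criterion
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical criterion: each feature has a made-up usefulness, and every
# extra feature costs a fixed penalty (a stand-in for a validation score).
USEFULNESS = {"Duration": 0.40, "Airline": 0.25, "Source": 0.20, "Noise": 0.02}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)


chosen = forward_select(list(USEFULNESS), toy_score, max_features=4)
print(chosen)
```

In a real pipeline the criterion would typically be a cross-validated model score, which makes each evaluation far more expensive than in this toy example.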
Fig. 3. The fitting model feature selection approach

Fig. 5. Scatter plot
Figure 5 is a scatter plot, a two-dimensional graph that depicts how much one variable influences another, or the relationship between them. Scatter plots, like line graphs, depict data points on vertical and horizontal axes.

VI. CONCLUSION

In this paper we applied machine learning models to find the cheapest ticket price. The literature review covered ticket price prediction and demand forecasting algorithms. We began with a summary of airline pricing policies, which involve periodic ticket price adjustments based on internal and external factors, and described how customers and airlines interact to determine dynamic ticket costs. We imported the data set and performed exploratory data analysis. We converted all text columns into numerical data types, because machine learning algorithms operate on numerical input. We dropped the columns that are of no use and handled the categorical data. The training and test datasets were split, which helps prevent data leakage, and the model was fitted using the Random Forest Regressor, an ensemble learning method that can be used for both classification and regression. Hyperparameter tuning improved the result: by using Randomized Search CV, we improved our score from 79 to 81 percent.
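The handling of categorical data summarized above, with one-hot encoding for nominal columns such as Airline and label encoding for ordered ones, can be sketched without any library as follows. This is a minimal illustration under stated assumptions: the helper names `one_hot` and `label_encode`, the airline values, and the ordered stop categories are all hypothetical and not taken from the data set.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str) -> List[Dict[str, int]]:
    """One 0/1 column per category: suited to nominal data (no order)."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def label_encode(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Map each value to its rank in `order`: suited to ordinal data."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


# Hypothetical example values (not taken from the paper's data set).
airlines = ["IndiGo", "Air India", "IndiGo"]
stops = ["1 stop", "non-stop", "2 stops"]

airline_columns = one_hot(airlines, prefix="Airline")
stop_ranks = label_encode(stops, order=["non-stop", "1 stop", "2 stops"])

print(airline_columns[0])  # {'Airline_Air India': 0, 'Airline_IndiGo': 1}
print(stop_ranks)          # [1, 0, 2]
```

One-hot columns avoid implying an order between airlines, whereas the integer ranks deliberately keep the order of the stop categories.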
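The Random Forest fit mentioned in the conclusion builds on the bagging steps of Section IV: draw bootstrap samples, train one tree per sample, then average the tree predictions and use their standard deviation as an uncertainty estimate. The sketch below illustrates this with the standard library only. It assumes a one-split "stump" as a stand-in for a full regression tree and omits the random feature subsetting that a real Random Forest adds. The duration and price values are made up.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Tree = Callable[[float], float]


def fit_stump(zs: Sequence[float], ys: Sequence[float]) -> Tree:
    """Fit a one-split regression tree (a stand-in for a full decision tree)."""
    overall = statistics.fmean(ys)
    best_sse, best_tree = float("inf"), (lambda z: overall)
    for t in sorted(set(zs))[1:]:  # thresholds that leave both sides non-empty
        left = [y for z, y in zip(zs, ys) if z < t]
        right = [y for z, y in zip(zs, ys) if z >= t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best_sse:
            best_sse = sse
            best_tree = lambda z, t=t, ml=ml, mr=mr: ml if z < t else mr
    return best_tree


def fit_bagged(zs, ys, n_trees: int, rng: random.Random) -> List[Tree]:
    """Steps 1-2: draw n examples with replacement, train one tree per sample."""
    n = len(zs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([zs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees: Sequence[Tree], z: float) -> Tuple[float, float]:
    """Averaged prediction and spread (standard deviation) across trees at z."""
    preds = [tree(z) for tree in trees]
    return statistics.fmean(preds), statistics.stdev(preds)


# Made-up data: flight duration in hours against price.
durations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
prices = [3000.0, 3200.0, 3100.0, 3300.0, 7000.0, 7200.0, 7100.0, 7300.0]

trees = fit_bagged(durations, prices, n_trees=25, rng=random.Random(0))
mean_short, spread_short = predict(trees, 2.0)
mean_long, spread_long = predict(trees, 7.0)
print(round(mean_short), round(mean_long))
```

`statistics.stdev` needs at least two predictions, so `n_trees` must be 2 or more in this sketch.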
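The randomized search used for hyperparameter tuning can be sketched in the same way: sample a fixed number of random settings from a search space, score each by k-fold cross-validation, and keep the best. In practice scikit-learn's `RandomizedSearchCV` would be applied to the Random Forest. The version below uses only the standard library to show the mechanics. It assumes a toy nearest-neighbour model with a single hyperparameter `k` in place of the forest, mean absolute error in place of the paper's score, and synthetic data.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Toy model standing in for the Random Forest: mean y of the k nearest x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def cv_error(data: Sequence[Point], k: int, folds: int) -> float:
    """Mean absolute error of the toy model over `folds` cross-validation splits."""
    errors: List[float] = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th point, starting at index f
        train = [p for i, p in enumerate(data) if i % folds != f]
        errors.extend(abs(knn_predict(train, x, k) - y) for x, y in test)
    return statistics.fmean(errors)


def randomized_search(data, space: Dict[str, List[int]], n_iter: int,
                      folds: int, rng: random.Random):
    """Try n_iter random settings from `space`; keep the lowest CV error."""
    best_params, best_err = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        err = cv_error(data, folds=folds, **params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


# Synthetic data: a noisy linear trend (values are made up).
rng = random.Random(42)
data = [(float(x), 3000 + 400 * x + rng.uniform(-300, 300)) for x in range(30)]

space = {"k": list(range(1, 16))}
best_params, best_err = randomized_search(data, space, n_iter=8, folds=5, rng=rng)
print(best_params, round(best_err, 1))
```

Unlike a grid search, which evaluates every combination in the space, this procedure evaluates only `n_iter` sampled settings, which is why it scales better to large search spaces.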