Updated Hard Copy Final Report

This document is a project report on flight fare prediction using machine learning. It discusses the problem statement, aims and objectives, literature review, methodology, implementation, results, and conclusions. The report was submitted by a student in partial fulfilment of the degree of Bachelor of Technology in Computer Engineering.

Uploaded by

SHIVOM YADAV
Flight fare prediction using machine learning

A Project Report Submitted by

Shiv om Yadav – 91900151044

in partial fulfilment for the award of the degree of

Bachelor of Technology
in
Computer Engineering – AI

Faculty of Engineering & Technology


Marwadi University, Rajkot
April 2024

Faculty of Technology
Marwadi University
Computer Engineering – AI department
2023-2024

CERTIFICATE

This is to certify that the project entitled Flight fare prediction using machine learning
has been carried out by Shiv om Yadav – 91900151044 under my guidance in partial fulfilment
of the degree of Bachelor of Technology in Computer Engineering of Marwadi University,
Rajkot during the academic year 2023-2024.

Date: ____________________

Internal Guide Head of the Department


Prof. Abdul Kalam Dr. Madhu Shukla
Assistant professors

Acknowledgments
We would like to extend our sincere gratitude to all those who have contributed to
the successful completion of this project. Your unwavering support, guidance, and
assistance have been invaluable throughout this endeavor. First and foremost, we
would like to express our deep appreciation to Marwadi University for providing
us with a conducive research and learning environment. Your commitment to
academic excellence has been a constant source of inspiration.

Our heartfelt thanks go to Dr. Madhu Shukla, whose visionary leadership has not
only guided the department but also provided us with the opportunity to explore our
research interests. Your mentorship has played a pivotal role in our growth. We are
immensely grateful to our mentor, Prof. Abdul Kalam, whose expertise and
guidance have shaped our understanding of the subject matter. We also express our
heartfelt appreciation to Prof. Anjan Kumar Sahoo, our external guide, whose
expertise and guidance were instrumental in shaping our understanding of the
subject matter.

Our sincere appreciation extends to the esteemed faculty members at Computer
Engineering – Artificial Intelligence, who have generously shared their
knowledge and experiences, expanded our horizons, and fostered our spirit of
inquiry. To our fellow classmates, your camaraderie and collaborative efforts have
made this academic journey all the more enriching. We offer our gratitude for your
companionship.

Lastly, we would like to extend our thanks to our friends for their constant support and
understanding. Your presence has provided us with strength. This project represents
a collective effort, and we are thankful to all who have played a role in its
completion.


Index

Institute’s Vision and Mission
Department’s Vision and Mission
PEO, POs and PSOs
Abstract

1 Introduction
1.1 Problem Summary
1.2 Aims and Objective
1.3 Problem Specifications
1.4 Literature Review
1.5 Plan of the Work
1.6 Materials and Tools Required
1.7 Motivation

2 Methodology
2.1 Design Specification
2.2 Proposed Machine Learning Algorithm

3 Implementation
3.1 Classification Results
3.2 Proposed Models Result

4 Conclusion
4.1 Summary of the Result
4.2 Discussion
4.3 Future Work

5 References


Institute’s Vision and Mission

Institute’s Vision

Our vision is to address challenges facing our society and planet through sterling education
that builds the capacity of our students and empowers them through innovative thinking,
practice and character building, which will ultimately boost creativity and
responsibility in utilizing limited natural resources to meet the challenges of the 21st
century.

Institute’s Mission

• To produce creative, responsible and informed professionals

• To produce individuals who are digital-age literates, inventive thinkers, effective
communicators and highly productive.

• To deliver cost-effective quality education

• To offer world-class, cross-disciplinary education in strategic sectors of economy
through well devised and synchronized delivery structure and system, designed to
tackle the creative intelligence and enhance the productivity of individuals.

• To provide a conducive environment that enables and promotes individuals to
creatively interact, coordinate, disseminate and examine change, opinion as well
as concept that will enable students to experience higher level of learning acquired
through ceaseless effort that led to the development of character, confidence,
values and technical skills.


Department’s Vision and Mission

Department’s Vision

To impart quality technical education through research, innovation and teamwork for
creating professionally superior and ethically strong manpower that meet the global
challenges of engineering industries and research organization in the area of Computer
Engineering.

Department’s Mission

• Maintain vital, state-of-the-art ICT enabled teaching and learning
methodologies, which provide its students and faculty with opportunities to
create, interpret, apply and disseminate knowledge.

• Enable graduates in becoming digital age literates, innovators, efficient
communicators and result oriented professionals.

• Dedicate itself to providing its students with the skills, knowledge and attitudes
that will allow its graduates to succeed as engineers, leaders, professionals and
entrepreneurs.

• Prepare its graduates for life-long learning to meet intellectual, ethical and career
challenges.

• Inspire graduates for competitive exams and higher education as well as research and
development.


PEO, PO and PSO

Program Educational Objectives (PEO):

Our graduated students are expected to fulfill the following Program Educational
Objectives (PEOs):

1. Core Competency: Successfully apply fundamental mathematical, scientific, and
engineering principles in formulating and solving engineering and real-life problems
for betterment of society.

2. Breadth: Will apply current industry accepted practices, new and emerging
technologies to analyse, design, implement and maintain state of art solutions.

3. Professionalism: Work effectively and ethically in ever changing global professional
environment and multi-disciplinary environment.

4. Learning Environment: Demonstrate excellent communication and soft skills to
fulfil their commitment towards social responsibilities and foster life-long learning.

5. Preparation: Promote research and patenting to enhance technical and
entrepreneurship skills within them.

• Function and communicate effectively to solve technical problems.


• Advance professionally to roles of greater computer engineering responsibilities,
and/or by transitioning into leadership position in various industries such as business,
government, and/or education.
• Prepare for entrepreneurship skills by demonstrating commitment to community by
applying technical skills and knowledge to support various service activities.
• Place themselves in positions of leadership and responsibility within an organization
and progress through advanced degree or certificate programs in engineering,
business, and other professionally related fields.
• Participate in higher study by the process of life-long learning through the successful
completion of advanced degrees, continuing education, and/or engineering
certification(s)/licensure or other professional development.


Program Outcomes (POs)

Engineering Graduates will be able to:

PO1: Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of complex
engineering problems.

PO2: Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

PO3: Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.

PO4: Conduct investigations of complex problems: Use research-based knowledge
and research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.

PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.

PO6: The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.

PO7: Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.

PO8: Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO9: Individual and team work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.


PO10: Communication: Communicate effectively on complex engineering activities
with the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.

PO11: Project management and finance: Demonstrate knowledge and understanding of
the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary environments.

PO12: Life-long learning: Recognize the need for, and have the preparation and ability
to engage in independent and life-long learning in the broadest context of technological
change.

Program Specific Outcomes (PSOs)

PSO1. Students shall demonstrate skills, the knowledge and competence in the analysis,
design and development of computer-based systems addressing industrial and social
issues.

PSO2. Students shall have competence to take challenges associated with future
technological issues associated with security, wearable devices, augmented reality,
Internet of Anything etc.


PO / PSO | Attainment Level | Justification
PO1 | 2 | The project applies mathematical and engineering fundamentals to develop a deep learning model for prostate cancer classification.
PO2 | 3 | The project identifies and analyzes complex medical data classification problems, substantiating conclusions using data and research.
PO3 | 2 | The project considers public health and safety aspects, as well as societal and environmental factors, in designing a model for prostate cancer diagnosis.
PO4 | 3 | The project conducts investigations and provides valid conclusions through research-based knowledge and data analysis.
PO5 | 3 | The project utilizes modern engineering and IT tools, including machine learning techniques, for the development of the classification model.
PO6 | 3 | The project considers societal and health-related issues in the development of a model that aids in medical diagnosis.
PO7 | 2 | The project understands the impact in societal and environmental contexts but it primarily focuses on technical aspects rather than sustainable development.
PO8 | 3 | The project strictly adheres to ethical principles in the use of artificial intelligence and machine learning for medical data classification.
PO9 | 2 | The project involves collaborating with a multidisciplinary team to develop and validate the machine learning model.
PO10 | 3 | The project effectively communicates the results and findings of the classification model to the engineering community and society.
PO11 | 2 | The project manages the project and resources effectively to achieve the project's objectives.
PO12 | 2 | The project acknowledges the need for lifelong learning, especially in the context of technological change.
PSO1 | 3 | The project demonstrates proficiency in computer-based system development for advanced prostate cancer classification, potentially enhancing medical diagnosis and patient care.


Abstract

Demand for air travel in India is growing, and more flight tickets are being purchased
online, so passengers increasingly want to understand how airlines decide ticket prices
over time. A variety of strategies exist to help time a purchase. Customers want the
cheapest ticket possible, while airlines want to maximize profit by keeping total
revenue as high as feasible. To increase revenue, airlines use several computational
tactics, including demand forecasting and price discrimination. From the customer’s
perspective, the most difficult part is finding the best price or the ideal time to
purchase tickets. This project serves consumers who buy flight tickets by estimating the
flight fare in advance. Most existing techniques rely on advanced computational
intelligence and prediction models from the field of Machine Learning (ML). This report
highlights the factors that influence fares and provides instructions for developing a
machine learning-based flight fare prediction model.


List of Tables

Table No.    Table Description

Table 3.1    The most frequently used metrics for evaluating machine

Table 3.2    Models result


List of Figures

Figure No.    Figure Description

Fig 2(a)      Proposed methodology

Fig 2(b)      Source city in economic class

Fig 2(c)      Distribution fare price vs day

Fig 2(d)      City used business class

Fig 2(e)      Distribution of business

Fig 2.1(a)    Architectural diagram


List of Symbols, Abbreviations and Nomenclature

API: Application Programming Interface
CDSS: Clinical Decision Support System
CSV: Comma-Separated Values (a file format)
EDA: Exploratory Data Analysis
GPU: Graphics Processing Unit
GUI: Graphical User Interface
RAM: Random Access Memory
UI: User Interface
KNN: K-nearest neighbors
LR: Logistic Regression
SVM: Support Vector Machine
DT: Decision Tree
RF: Random Forest Classifier
XAI: Explainable Artificial Intelligence


Chapter 1


1 Introduction

The prediction of flight fares has been a topic of interest in the field of transportation and
tourism for many years. Various methods have been proposed for predicting flight fares,
including statistical models, machine learning algorithms, and artificial intelligence techniques.
One common approach for flight fare prediction is the use of linear regression models. These
models use historical data on flight fares, such as the date of the flight, the destination, and the
carrier, to predict future fares. Researchers have found that linear regression models can
provide accurate predictions of flight fares, but they may not be able to capture the complex
relationships between different factors that influence fares.
Another popular approach for flight fare prediction is the use of machine learning algorithms,
such as decision trees, random forests, and neural networks. These algorithms have been found
to be effective in capturing the non-linear relationships between different factors that influence
fares. However, they may require a large amount of data to train effectively.
Artificial intelligence techniques such as deep learning models have also been used for flight
fare prediction. These models have been found to be effective in capturing the non-linear
relationships between different factors that influence fares. However, they are typically more
computationally expensive than traditional machine learning algorithms.
Several studies have been published on flight fare prediction, including those that have used
various datasets, such as airlines' pricing data, search engines' data, and travel agencies' data.
Some studies have also focused on specific market segments, such as low-cost carriers or
full-service carriers, or on specific regions.
In conclusion, the literature on flight fare prediction has shown that there are a variety of
methods that can be used for prediction, including linear regression models, machine learning
algorithms, and artificial intelligence techniques. Each approach has its own strengths and
weaknesses, and the choice of method will depend on the specific needs of the problem and the
availability of data.
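
The contrast drawn above between linear regression and tree-based models can be illustrated with a small sketch. The data here is synthetic and purely an assumption for illustration (a fare that rises sharply in the last days before departure); it is not the dataset used in this report, and the numbers it produces say nothing about real fares.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, illustrative data (an assumption, not the report's dataset):
# the fare grows non-linearly as the number of days left shrinks.
days_left = rng.integers(1, 50, size=2000)
fare = 3000 + 20000 / days_left + rng.normal(0, 300, size=2000)

X = days_left.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.25, random_state=0
)

# A straight line cannot follow the 1/days_left curve; a forest can.
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

r2_linear = r2_score(y_test, linear.predict(X_test))
r2_forest = r2_score(y_test, forest.predict(X_test))
print(f"linear R^2: {r2_linear:.3f}, forest R^2: {r2_forest:.3f}")
```

On data shaped like this, the random forest should score a noticeably higher R^2 than the linear model, which is the kind of non-linear relationship the literature refers to.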

1.1 Problem summary

Currently, airlines use sophisticated tactics and procedures to set ticket prices
dynamically. These techniques take into consideration several financial, marketing,
commercial, and societal elements that have a direct impact on the final price of a flight. Due
to the tremendous complexity of the pricing methods used by airlines, it is very difficult for a
passenger to acquire an airline ticket at the lowest price, since the price fluctuates constantly.

To solve this problem, we have been provided with prices of flight tickets for various airlines
between the months of March and June of 2019 and between various cities, using which we
aim to build a model which predicts the prices of the flights using various input features.

1.2 Background & Objective

Anyone who has booked a flight ticket knows how unexpectedly the prices vary. Airlines use
sophisticated quasi-academic tactics known as "revenue management" or "yield
management". The cheapest available ticket for a given date gets more or less expensive over
time. This usually happens as an attempt to maximize revenue based on -

1. Time of purchase patterns (making sure last-minute purchases are expensive).

2. Keeping the flight as full as they want it (raising prices on a flight which is filling up in
order to reduce sales and hold back inventory for expensive last-minute purchases).

So, if we could inform travellers of the optimal time to buy their flight tickets based on
historic data and show them various trends in the airline industry, we could help them save
money on their travels. This would be a practical application of data analysis, statistics,
and machine learning techniques to a daily problem faced by travellers.

The objectives of the project can broadly be laid down by the following questions –

1. Flight Trends

Do airfares change frequently? Do they move in small increments or in large jumps?
Do they tend to go up or down over time?

2. Best Time to Buy

What is the best time to buy so that the consumer can save the most while taking the
least risk? Should a passenger wait to buy a ticket, or buy as early as possible?

3. Verifying Myths

Does the price increase as the departure date approaches? Is Indigo cheaper than Jet
Airways? Are morning flights expensive?
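
Questions of this kind can be explored with simple grouped averages before any model is trained. The sketch below uses a handful of made-up records; the column names (`airline`, `days_left`, `dep_period`, `price`) and all values are assumptions for illustration only, not results from this project.

```python
import pandas as pd

# Hypothetical toy records; names and values are illustrative assumptions.
fares = pd.DataFrame({
    "airline": ["Indigo", "Indigo", "Jet Airways", "Jet Airways", "Indigo", "Jet Airways"],
    "days_left": [30, 2, 30, 2, 15, 15],
    "dep_period": ["Morning", "Evening", "Morning", "Evening", "Morning", "Evening"],
    "price": [3500, 7200, 5200, 9800, 4100, 6300],
})

# Does the price increase as the departure date approaches?
price_by_days_left = fares.groupby("days_left")["price"].mean().sort_index()

# Is one airline cheaper than another on average?
price_by_airline = fares.groupby("airline")["price"].mean()

# Are morning flights more expensive than evening flights?
price_by_period = fares.groupby("dep_period")["price"].mean()

# The cheapest average purchase point in this toy sample.
best_days_left = price_by_days_left.idxmin()

print(price_by_days_left)
print(price_by_airline)
print(price_by_period)
```

With real data, the same three groupings give a first answer to each myth, although averages alone do not control for route or cabin class.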

1.3 Problem Specifications

The scope of this project extends to the management of extensive airline datasets,
encompassing historical flight fare records. The primary challenge is to process these
large datasets effectively, extracting meaningful patterns and trends, while creating
algorithms that remain computationally efficient. The following problem specifications
emerged during the project, drawing on recent research papers:

1. Handling Large and Complex airlines Datasets

2. Extracting Meaningful Patterns from Noisy Data

3. Developing Real-time Predictive Algorithms

4. Ensuring Model Robustness and Adaptability

1. Handling Large and Complex Airlines Datasets:

Airlines generate vast amounts of data from various sources such as flight schedules,
passenger information, maintenance logs, etc. Handling this data requires robust
infrastructure and efficient algorithms to process, store, and retrieve information in a
timely manner.

2. Extracting Meaningful Patterns from Noisy Data:

Data collected from airlines can often be noisy due to various factors like sensor errors,
inconsistent reporting, or missing values. Extracting meaningful insights from such noisy
data involves advanced techniques such as data cleaning, feature engineering, and
statistical analysis to identify patterns and trends.

3. Developing Real-time Predictive Algorithms:

Real-time predictive algorithms are essential for tasks like predicting flight delays,
optimizing route planning, or forecasting demand. These algorithms need to be fast,
accurate, and capable of processing incoming data streams to provide timely predictions
and recommendations.

4. Ensuring Model Robustness and Adaptability:

Models deployed in the airline industry must be robust to changes in data patterns,
external factors (e.g., weather conditions), and operational dynamics. Regular monitoring
and updating of models are necessary to ensure they remain effective over time.
Additionally, models should be adaptable to new data sources and evolving business
requirements.
Each of these areas requires a combination of domain expertise, data engineering skills, and
advanced analytics techniques to address the specific challenges faced by the airline industry.

1.4 Literature Review and Prior Art Search

• Tiyani Wang [3] proposed a framework to predict airfares at the market-segment
level, using the DB1B and T-100 datasets as well as data about the economy. The work
gives a high-level overview of the proposed framework's primary components. In the data
preparation stage, all datasets are cleaned to exclude any inaccurate sample data,
transformed, and merged by market segment. The feature extraction module
extracts and generates handcrafted features that are intended to characterize a
market segment.

• P.H.K.Tissera [4] presented a research component whose output is a web application
built with React Native, a hybrid application development platform, backed by two APIs.
One API is written in Node.js, while the other is written in Python Flask.

• K.Tziridis [5] and Th.Kalampokas [6] noted that airline firms use complex tactics and
approaches to assign dynamic airfare pricing. These tactics consider a number of financial,
marketing, commercial, and societal elements that all influence the final flight cost.
Because the pricing mechanisms employed by airlines are incredibly complicated, it is
quite difficult for a customer to get the best deal on an airline ticket because prices
fluctuate often.

• G.A.Papakostas [7] noted that several strategies have lately been presented that can
give the optimum moment for a consumer to purchase an airline ticket by projecting the
price of the flight. The bulk of these strategies rely on advanced prediction models
developed in the Machine Learning (ML) branch of computational intelligence research.

• Janssen [8] designed a linear quantile hybrid regressor model that performs well for
predicting plane ticket prices several days before arrival.

• Ren, Yang and Yuan [9] studied models for predicting flight ticket prices and reported
that LR (77.06% acc.), NB (73.06% acc.), SR (76.84% acc.), and SVM (80.6% acc. for
two bins) models performed well.

1.5 Plan of the work

Our project is a machine learning approach to forecasting flight fares. It
begins by analyzing vast amounts of historical flight data, considering factors such as
departure and arrival locations, dates, and ticket types. This data is then cleaned and
organized to focus on the most relevant aspects. Using this refined dataset, the system
trains machine learning models to understand the relationship between various
parameters and ticket prices. These models, such as regression algorithms, learn from
past data to make predictions about future flight fares. The accuracy of these predictions
is continually evaluated and refined to ensure reliability. Once the model is deemed
accurate, it can be deployed to provide real-time fare estimates, helping travelers plan
their trips more effectively. Continuous monitoring and adaptation ensure that the system
remains up-to-date with changing trends and variables in the airline industry, maintaining
its usefulness and accuracy over time.
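
The plan above (prepare the data, train a regression model, evaluate it, then serve estimates) can be sketched as a single scikit-learn pipeline. This is a minimal sketch under stated assumptions: the data is synthetic, the column names are hypothetical, and a random forest is used only as one possible regression algorithm, not necessarily the model chosen in this project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-in for historical flight data (column names are assumptions).
data = pd.DataFrame({
    "source_city": rng.choice(["Delhi", "Mumbai", "Chennai"], size=n),
    "destination_city": rng.choice(["Kolkata", "Hyderabad", "Bangalore"], size=n),
    "ticket_class": rng.choice(["Economy", "Business"], size=n),
    "days_left": rng.integers(1, 50, size=n),
})
base = np.where(data["ticket_class"] == "Business", 40000, 5000)
data["price"] = base + 10000 / data["days_left"] + rng.normal(0, 500, size=n)

categorical = ["source_city", "destination_city", "ticket_class"]
numeric = ["days_left"]

# Step 1: clean/encode. Categorical columns are one-hot encoded,
# numeric columns are passed through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# Step 2: train a regression model on the encoded features.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])

X = data[categorical + numeric]
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)

# Step 3: evaluate on held-out data.
mae = mean_absolute_error(y_test, model.predict(X_test))

# Step 4: produce a fare estimate for a new query.
query = pd.DataFrame([{
    "source_city": "Delhi",
    "destination_city": "Kolkata",
    "ticket_class": "Economy",
    "days_left": 10,
}])
estimate = float(model.predict(query)[0])
print(f"MAE: {mae:.0f}, estimate: {estimate:.0f}")
```

Keeping the encoding inside the pipeline means a new query goes through exactly the same transformation as the training data, which is helpful when the model is later deployed or retrained.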

1.6 Materials / Tools required

• Historical Flight Data: Access to comprehensive datasets containing past
flight information, including fares, departure/arrival airports, dates, times,
and other relevant details.

• Programming Language: Proficiency in a language such as Python, commonly
used for machine learning tasks.

• Machine Learning Libraries: Familiarity with libraries such as scikit-learn,
TensorFlow, or PyTorch for building and training predictive models.

• Data Preprocessing Tools: Utilization of tools like pandas, NumPy, or
scikit-learn for cleaning, transforming, and preparing the data for analysis.

• Statistical Analysis Tools: Basic understanding and use of statistical
methods and tools for analyzing data distributions, correlations, and outliers.

• Regression Algorithms: Knowledge of regression algorithms like linear
regression, decision trees, random forests, or gradient boosting machines for
predicting flight fares.

• Model Evaluation Metrics: Familiarity with evaluation metrics like mean
absolute error (MAE), mean squared error (MSE), or root mean squared error
(RMSE) to assess model performance.

• Documentation Tools: Use of tools like Jupyter Notebooks, Google Colab, and
Visual Studio Code.

• Domain Knowledge: Understanding of the airline industry, including factors
influencing flight fares such as seasonal trends, economic indicators, and
market dynamics, to inform feature selection and model training decisions.
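
As a small worked example of the evaluation metrics listed above, the sketch below computes MAE, MSE and RMSE for four made-up fares. The values are illustrative assumptions, not results from this project. MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE, which brings the error back to the same unit as the fare.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative actual and predicted fares (made-up values).
y_true = np.array([5000.0, 7000.0, 4000.0, 6000.0])
y_pred = np.array([5500.0, 6500.0, 4000.0, 7000.0])

# Errors are -500, 500, 0 and -1000.
mae = mean_absolute_error(y_true, y_pred)   # (500 + 500 + 0 + 1000) / 4
mse = mean_squared_error(y_true, y_pred)    # (250000 + 250000 + 0 + 1000000) / 4
rmse = float(np.sqrt(mse))                  # square root of the MSE

print(f"MAE={mae:.2f}, MSE={mse:.2f}, RMSE={rmse:.2f}")
```

RMSE penalizes the single large error (1000) more heavily than MAE does, which is why it comes out higher than MAE on the same predictions.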


1.7 Motivation

• The motivation for developing a flight fare prediction model lies in its potential to
benefit both consumers and businesses in the travel industry.
• It offers consumers cost-saving insights, convenience, and informed decision-
making while providing businesses with market insights, competitive advantage,
and improved revenue management capabilities.
• Flight fare prediction projects provide an excellent opportunity to apply and
experiment with various data science techniques, such as machine learning
algorithms, time series analysis, and feature engineering. It's a chance to delve
into predictive modelling and gain practical experience in a real-world scenario.


Chapter 2


2 Methodology

The research methodology for this investigation is shown below. It can be broken down
into a total of six distinct stages, each of which is explained in turn below (as shown
in Fig. 2(a)).

Fig. 2(a): Research Methodology

The first step, Application Understanding, establishes that variability in airfares
can be analysed effectively using various machine learning techniques. Airfare
movements are currently judged manually, based on people's sentiment and
experience, which fails to consider the other factors that affect variability in
airfares. This manual process also takes a lot of time to identify the right price of a flight for a
specific departure date for a customer. Traditional techniques can at times
be inaccurate, whereas modern methods can produce findings that are more
accurate and available more quickly. This research can help automate the customer’s
experience of booking a flight at the most optimal cost. By predicting the flight
ticket price pattern with respect to the number of days left before departure, passengers
can determine how many days in advance to make the booking and so book at an optimal
cost.

The second step, Data Gathering (EDA), involves gathering the dataset for flight price
prediction. It is taken from Kaggle¹, which is a public and open-source repository. The
dataset captures flight-level data over the period of 11th February 2022 to 31st March
2022, i.e., for 50 days. It covers the top 6 major metro cities of India and all the
major airline companies of India. In total, 300261 datapoints and 11 features have been
taken into consideration. The features include the name of the
airline company, the city from which the flight takes off, the departure time of the flight, airline
stops, the arrival time of the flight, destination city, cabin class, duration of flight, and days left
before departure; the target variable is the price of the flight.
The 6 major cities of India covered are Delhi, Mumbai, Chennai, Hyderabad, Kolkata and
Bangalore. The dataset also captures data from 6 major airline service providers: Vistara,
Air India, Go First, Indigo, Air Asia and Spice Jet. Among these airline companies,
Vistara and Air India are known to be premium airline service providers, while Go First, Indigo,
Air Asia and Spice Jet are marked as low-cost airline carriers.

1 https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction?select=Clean_Dataset.csv

Fig. 2(b): Source city in economic class Fig. 2(c): Distribution day left vs Price

Fig. 2(d): Cities used in business class Fig. 2(e): Distribution of business class

The third step, Data pre-processing and transformation, prepares the structured flight-level
dataset for modelling. The following steps were performed. Initially, flight_details.csv was
read into a data frame and the Unnamed column was dropped. The dataset was then checked for
null values, and none were found in any column. Next, the distributions of the numerical
variables were inspected with box plots to check for outliers; log scaling was applied to the
features that had outlier values, and the z-score method was then applied to detect and remove
the remaining outliers. After that, the dataset was transformed so that it could be fitted
accurately by various machine learning models. Firstly, dummy variables were generated for all
the categorical data in the pre-processed dataset using the one-hot encoding technique. Previous
research has shown that the accuracy of machine learning models increases when one-hot
encoding is applied to categorical variables, as it makes the representation of categorical data
more expressive. Then, a correlation matrix of all the dummy variables and the original features
against the price was plotted to check the impact of each variable on the price. After this, the
data were split into training and testing sets in the ratio 70:30 (i.e. train: 70% and test: 30%)
so that various machine learning models could be applied, and the target variable, i.e. the
price, was separated from the feature set. Finally, the min-max scaling transformation was
applied to the train and test sets to obtain a better fit with the machine learning models.
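The pre-processing steps described above can be sketched in plain Python as follows. This is only an illustrative sketch, not the project's actual code: the helper names, the sample values and the z-score threshold of 2.5 are assumptions, and fitting the min-max range on the training values only is a design choice made here rather than something stated in the report.

```python
import math
import random

def one_hot(values):
    """Return (categories, rows); each row is a 0/1 indicator list."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def remove_outliers_zscore(values, threshold=3.0):
    """Keep only values whose absolute z-score is at most `threshold`."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return list(values)
    return [v for v in values if abs((v - mean) / std) <= threshold]

def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def min_max_fit(train_values):
    """Learn the scaling range from the training values only."""
    return min(train_values), max(train_values)

def min_max_transform(values, lo, hi):
    """Map values so that the training range becomes [0, 1]."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

# Hypothetical sample values, used only to exercise the helpers.
airlines = ["Vistara", "Indigo", "Vistara", "Air India"]
categories, encoded = one_hot(airlines)

durations = [2.0, 2.5, 3.0, 2.2, 2.8, 2.4, 2.6, 2.1, 2.9, 2.3, 40.0]
log_durations = [math.log1p(d) for d in durations]   # log scaling first
kept = remove_outliers_zscore(log_durations, threshold=2.5)

train, test = train_test_split(list(range(10)))      # 70:30 split
lo, hi = min_max_fit([10.0, 20.0, 30.0])
scaled_test = min_max_transform([15.0, 30.0], lo, hi)
```

Learning the minimum and maximum from the training set and reusing them for the test set avoids leaking information from the test data into the scaling step.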

The fourth step, Data modelling and conversion, predicts the continuous target variable, i.e.,
the flight price. Various regression-based ML models, both basic and advanced, are applied to
the processed dataset. Regression analysis is a statistical method for relating a dependent
variable to one or more independent (explanatory) variables. A regression model can show
whether variations in the dependent variable are related to variations in one or more
explanatory variables. The regression models used to predict the flight ticket prices were
multiple linear regression, decision tree regressor, K-neighbours regressor, extra trees
regressor, XGBoost regressor and bagging regressor.
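Of the models listed, the K-neighbours regressor is the simplest to illustrate. The sketch below is a minimal, assumed implementation (the function name, the two features and the sample prices are hypothetical): it predicts a price as the average target of the k closest training rows.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    distances = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical rows: [duration in hours, days left before departure]
train_X = [[2.0, 1.0], [2.0, 10.0], [2.0, 20.0], [2.0, 30.0], [2.0, 40.0]]
train_y = [12000.0, 8000.0, 6000.0, 5000.0, 4800.0]

# The two nearest rows to 25 days left are those at 20 and 30 days.
prediction = knn_predict(train_X, train_y, [2.0, 25.0], k=2)
```

Because the method relies on distances, it benefits from the feature scaling applied in the pre-processing step.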

The fifth step, Evaluation, computes several metrics for the implemented regression models and
compares them to find the model that predicts flight prices most accurately. The evaluation
metrics are RMSE (Root Mean Squared Error), MSE (Mean Squared Error), MAE (Mean Absolute
Error) and Adjusted R-Square. The performance of all the models was also analysed graphically,
and the model with the lowest error terms was then used to generate the flight price predictions
on the testing data. The detailed description and formulas for the evaluation metrics used to
measure the prediction of airline ticket prices can be found here.
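The four evaluation metrics named above follow standard definitions, which can be sketched as below. The function names and the four sample prices are assumptions chosen for illustration; `n_features` is the number of explanatory variables used by the model.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same unit as the price."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r2(y_true, y_pred, n_features):
    """R-squared penalised for the number of explanatory variables."""
    n = len(y_true)
    return 1 - (1 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)

# Hypothetical actual and predicted prices in INR.
y_true = [3000.0, 5000.0, 7000.0, 9000.0]
y_pred = [3500.0, 4500.0, 7500.0, 8500.0]

print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
print(r2(y_true, y_pred), adjusted_r2(y_true, y_pred, n_features=1))
```

An adjusted R-Square close to 1 means that the model explains most of the variance in the prices, while MAE and RMSE express the typical error directly in INR.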

2.1 Design Specifications

To predict airfares successfully, it is important to understand the design approach that I will
follow. The airfare prediction starts by collecting the Airlines dataset from Kaggle, which is a
public open-source repository, followed by pre-processing the various features of the data. The
pre-processed data is then split into training and testing sets, and various regression-based
models are fitted on the training data. The models are then tested on the test data, a predicted
flight price is generated for each test record, and the models are evaluated. The diagram below
(Fig 2.1 (a)) illustrates this flow.
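The train, test and compare flow described in this section can be sketched as follows. The two toy models, their names and the sample data are assumptions used only to show the loop; they are stand-ins for the regression models used in the project.

```python
import math

class MeanModel:
    """Baseline that always predicts the mean training price."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

class SimpleLinearModel:
    """Least-squares line on the first feature column only."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]

def rmse(y_true, y_pred):
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training set and score it on the test set."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = rmse(y_test, model.predict(X_test))
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical data: price falls by 200 INR per extra day left.
X_train = [[1.0], [5.0], [10.0], [20.0], [30.0]]
y_train = [9800.0, 9000.0, 8000.0, 6000.0, 4000.0]
X_test = [[15.0], [25.0]]
y_test = [7000.0, 5000.0]

best, scores = compare_models(
    {"mean": MeanModel(), "linear": SimpleLinearModel()},
    X_train, y_train, X_test, y_test,
)
```

The model with the lowest test RMSE is selected, which mirrors the selection rule used in the Evaluation step.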

Fig 2.1 (a): Architecture diagram for Airfare prediction

2.2 Proposed Machine Learning Algorithms:

After the train and test sets of the data are obtained, multiple machine learning models have
been applied on the training dataset to predict the flight price. The models implemented to
predict the flight price are listed below:

2.2.1 Multiple Linear regression:

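Multiple linear regression predicts a continuous target, here the flight price, as a linear
function of several input features. It is closely related to the Logistic Regression Classifier,
which is utilized to predict the probability of a data point belonging to a particular category.
Logistic regression shares with linear regression the assumption that the data can be
characterized by a linear function. However, instead of using the linear function directly,
logistic regression passes it through the sigmoid function to model the data [6].

The sigmoid function is given in equation (1), where z is the linear combination of the inputs:

$\sigma(z) = \frac{1}{1 + e^{-z}}$ (1)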
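As a hedged illustration of the two functions discussed in this subsection, the sketch below defines the sigmoid function and fits a multiple linear regression by solving the normal equations with Gaussian elimination. The helper names and the sample data are assumptions; the project itself may have used a library implementation instead.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear_regression(X, y):
    """Return [intercept, w1, ..., wp] minimising the squared error."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

def predict(coeffs, row):
    return coeffs[0] + sum(w * x for w, x in zip(coeffs[1:], row))

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [1 + 2 * a + 3 * b for a, b in X]
coeffs = fit_linear_regression(X, y)
```

For the flight price, which is continuous, the linear prediction is used directly; the sigmoid would only be needed if the target were a category.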

2.2.2 Decision tree regressor:

One of the most popular and useful methods for supervised learning is the decision tree. It can
solve both Regression and Classification problems, although Classification is more often used
in real-world settings. In Decision Trees for Classification, precise and effective
classifications come from the tree asking the right question at the right node. Entropy and
Information Gain are the two metrics used in Classification Trees to accomplish this. However,
because we are predicting a continuous variable, we cannot compute the entropy and follow the
same procedure. For a continuous target variable, the mean square error (MSE) is used instead;
it measures how much our predictions stray from the actual target values.
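The use of MSE as the splitting criterion can be sketched for a single feature as follows. This is a minimal illustration with assumed names and sample data: each candidate threshold is scored by the size-weighted MSE of the two groups it creates, and the threshold with the lowest score is chosen.

```python
def mse(values):
    """Mean squared deviation of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(xs, ys):
    """Return (threshold, weighted_mse) of the best split on one feature."""
    best = (None, mse(ys))                      # baseline: no split at all
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(ys)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical data: tickets booked late are much more expensive.
days_left = [1, 2, 3, 20, 25, 30]
prices = [12000.0, 11800.0, 12200.0, 5000.0, 5200.0, 4800.0]
threshold, weighted = best_split(days_left, prices)
```

A full regression tree repeats this search recursively on each side of the chosen threshold and predicts the mean price of the leaf that a record falls into.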

2.2.3 Random forest regressor:

Random Forest, developed by Leo Breiman [7], is an ensemble learning method that combines
numerous decision trees. Unlike a single decision tree, it combines predictions from diverse
trees trained on randomly chosen subsets of the training data, resulting in improved
performance [8]. Similar to decision trees, Random Forest evaluates candidate splits for each
parameter, using the Gini Index for classification and the regression error formula [7] for
regression, to locate the best node, which improves prediction accuracy.
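The idea of averaging trees trained on random subsets of the data can be sketched as below. This is a simplified illustration under stated assumptions: each tree is only a depth-1 stump, there is a single feature (so the per-split feature sampling of a real Random Forest is omitted), and all names and sample values are hypothetical.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: (threshold, left_mean, right_mean)."""
    overall = _mean(ys)
    best = (float("inf"), overall, overall)   # no split: predict the mean
    best_err = _sse(ys)
    unique = sorted(set(xs))
    for lo, hi in zip(unique, unique[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = _sse(left) + _sse(right)
        if err < best_err:
            best_err = err
            best = (t, _mean(left), _mean(right))
    return best

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)

# Hypothetical data: days left before departure vs price in INR.
days_left = [1, 2, 3, 20, 25, 30]
prices = [12000.0, 11800.0, 12200.0, 5000.0, 5200.0, 4800.0]
forest = fit_forest(days_left, prices)
late_booking = predict_forest(forest, 2)
early_booking = predict_forest(forest, 28)
```

Averaging over many bootstrap-trained trees reduces the variance of a single tree, which is the source of the improved performance mentioned above.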

Chapter 3

3 Implementation

3.1 Classification Results:

A. Metrics

In machine learning, metrics serve as benchmarks for assessing the performance of an ML
model. Summarized in Table 3.1, these metrics gauge the accuracy, efficacy, and efficiency of
a model, enabling comparisons between different models. The selection of a specific metric
hinges on the task at hand and the application's demands. Within machine learning, various
metrics are employed:
Accuracy in classification: This metric assesses the proportion of correct predictions made by
the model and is commonly used as the main assessment measure for classification tasks.
Precision and recall: These metrics assess the degree to which a model can reliably discover
positive cases, such as the identification of cancer in medical diagnostics. Recall is the
percentage of actual positive cases that are correctly detected, whereas precision is the
percentage of positive predictions that are correct.
F1 score: An indicator of the balance between recall and precision, the F1 score combines both
metrics into a single number. It is commonly employed to evaluate a model's overall efficacy
with one figure.

                 Predicted: True                      Predicted: False
Actual: True     True Positive (TP)                   False Negative (FN), Type 2 Error
Actual: False    False Positive (FP), Type 1 Error    True Negative (TN)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

$\text{Specificity} = \frac{TN}{TN + FP}$

$\text{Precision} = \frac{TP}{TP + FP}$

$\text{Recall} = \frac{TP}{TP + FN}$

$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Table 3.1: The most frequently used metrics for evaluating machine learning
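The confusion-matrix metrics can be computed as in the sketch below. The function names and the ten sample labels are assumptions for illustration; 1 denotes the positive (True) class and 0 the negative (False) class.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Compute the standard metrics, returning 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }

# Hypothetical labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

These metrics apply to classification tasks; for the continuous flight price, the regression metrics (RMSE, MSE, MAE and Adjusted R-Square) are the relevant ones.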

B. Model results

Algorithms                  Accuracy
Linear Regression           90.46
Decision Tree Regressor     95.99
Random Forest Regressor     96.28
XGB Regressor               97.56

Table 3.2: Model results

Chapter 4

4 Conclusion

4.1 Summary of the Results

• The Decision Tree regressor was the fourth best performing model, with an adjusted
R-Square value of 0.9738, MAE of 1247.90, RMSE of 3307.63 and MSE of 1.34 x 10^7.
The decision tree regressor has a better MAE than the Extra Trees regressor, which
means that its average prediction error is lower. The adjusted R-Square indicates that
the model explains about 97.38% of the variance in the original prices, so the
predicted and actual values are very close, and the predicted prices deviate from the
actual prices by 1247.90 INR on average.

• The Linear regression model had the lowest accuracy in predicting the flight prices,
with an adjusted R-Square of 0.9069, MAE of 4571.18, RMSE of 6919.96 and MSE of
4.78 x 10^7. The model therefore explains about 90.69% of the variance in the
original prices. This R-squared is significantly lower than that of the top 6 models
presented above, so the predicted and actual prices are significantly less close than
for those models. The predicted prices deviate from the actual prices by 4571.18 INR
on average, which is significantly higher than for the other models.

• Since the XG Boost model had the best evaluation metrics in predicting the flight
price, it was then applied to the test data to obtain the flight price predictions. The
predicted flight prices were plotted against the actual flight prices. The XG Boost
model was seen to fit the test data well, and the predicted prices almost matched the
actual prices of the flights.

4.2 Discussion

This project was carried out step by step, and it was difficult to identify a good dataset and to
extract, clean, and convert it. The project's goal was achieved after pre-processing the dataset,
applying one-hot encoding to the categorical variables, and scaling the data using the min-max
scaling transformation. Understanding the purpose of the variables in the dataset and
identifying the relevant factors were the main challenges in pre-processing the data.
Exploratory data analysis was performed on the pre-processed dataset. It produced better
insights than the previous work on predicting airfares by (Liu et al., 2017) and yielded more
information about the features that affect the variability of airfares, such as the time of the
flight, the weekday of the departure and the duration of the flight.

The number of days left before departure, the times of day at which flights depart and arrive,
the weekday on which the flight is scheduled and the duration of the flight play a vital role in
determining the price of the flight.

The case studies, using a variety of sophisticated regression machine learning models,
outperformed the earlier models in the literature study. With an adjusted R-squared value of
0.9845, the XG Boost model produced the best prediction metrics in this research. This is
significantly better than the research conducted by (Wang et al. 2019), which produced a best
adjusted R-squared value of 0.858 using the Random Forest model. By accurately forecasting a
customer's ideal flight cost, this study will benefit airline passengers and improve the
likelihood that they purchase airline tickets at the most optimal cost.

4.3 Future Work

• More routes can be added and the same analysis can be expanded to major airports and
travel routes in India.
• The analysis can be done by increasing the data points and the historical data used.
That will train the model better, giving better accuracies and more savings.
• More rules can be added in the Rule based learning based on our understanding of the
industry, also incorporating the offer periods given by the airlines.
• Developing a more user-friendly interface for various routes, giving more flexibility to
the users.

References

[1] R. R. Subramanian, M. S. Murali, B. Deepak, P. Deepak, H. N. Reddy, and R. R. Sudharsan,
“Airline Fare Prediction Using Machine Learning Algorithms,” IEEE Xplore, Jan. 01, 2022.
[2] K. Tziridis, Kalampokas, A. G. Papakostas and I. K. Diamantaras, “Airfare prices
prediction using machine learning techniques.” In 2021 25th European Signal Processing
Conference (EUSIPCO), August 2021, pp. 1036-1039, IEEE.
[3] R. Ren, Y. Yang and S. Yuan, “Prediction of airline ticket price.” University of Stanford,
2014.
[4] K. Tziridis, Kalampokas, A. G. Papakostas and I. K. Diamantaras, “Airfare prices
prediction using machine learning techniques.” In 2017 25th European Signal Processing
Conference (EUSIPCO), August 2017, pp. 1036-1039, IEEE.
[5] M. Papadakis, “Predicting Airfare Prices,” 2014.
[6] Gordiievych and I. Shubin, “Forecasting of airfare prices using time series,” 2015
Information Technologies in Innovation Business Conference (ITIB), 2015, pp. 68-71, doi:
10.1109/ITIB.2015.7355055.
[7] Y. Narangajavana, F. J. Garrigos-Simon, J. S. García, S. ForgasColl, “Prices, prices and
prices: A study in the airline sector.” Tourism Management, 41, 28-42, 2014.
[8] Lantseva, K. Mukhina, A. Nikishova, S. Ivanov, and K. Knyazkov, “Data-driven modeling
of airlines pricing.” Procedia Computer Science, 66, 267-276, 2015.
[9] H. C. Wen and H. P. Chen, “Passenger booking timing for low-cost airlines: A continuous
logit approach.” Journal of Air Transport Management, 64, 91-99, 2017.
