Presentation On Flight Price Prediction 2

Uploaded by

bytestech50


FLIGHT PRICE PREDICTION

PRESENTED BY: Saurabh Yadav

INDEX
Introduction
Problem Statement.
Problem Understanding.
What is Flight Price Prediction?
Importance of flight price prediction.
Exploratory data analysis.
Visualizations.
Analysis.
Model Building.
Hyper Parameter Tuning.
Saving the model and predictions from saved best model.
Conclusion.
INTRODUCTION
The airline industry is one of the most sophisticated users of dynamic pricing, setting fares to maximise revenue with proprietary
algorithms and hidden variables. That is why airline companies use complex algorithms to calculate flight ticket prices.
The price of a flight ticket depends on several different factors. The seller has information about all of these factors,
but buyers can access only limited information, which is not enough to predict airfares. Considering features
such as departure time, arrival time and time of day can help identify the best time to buy a ticket.
Nowadays, the number of people flying has increased significantly. It is difficult for airlines to maintain prices, since prices
change dynamically under different conditions. That’s why we will try to use machine learning models to solve this problem.
This can help airlines predict what prices they can maintain. It can also help customers predict future flight prices and
plan their journeys accordingly.
PROBLEM STATEMENT:
Anyone who has booked a flight ticket knows how unexpectedly prices vary. The cheapest available ticket on a given flight
becomes more or less expensive over time. This usually happens as an attempt to maximise revenue based on:
1. Time-of-purchase patterns (making sure last-minute purchases are expensive).
2. Keeping the flight as full as the airline wants it (raising prices on a flight that is filling up in order to slow sales and hold back
inventory for expensive last-minute purchases).
Business goal: The main aim of this project is to predict the price of flight tickets based on various features. The purpose of the
paper is to study the factors that influence fluctuations in airfares and how they relate to changes in
prices.
Then, using this information, build a system that can help buyers decide whether or not to buy a ticket. So, we will deploy a machine
learning model for flight ticket price prediction and analysis. This model will provide the approximate selling price of flight
tickets based on different features.
PROBLEM UNDERSTANDING
Airlines implement dynamic pricing for their tickets and base their pricing decisions on demand estimation models. The
reason for such a complicated system is that each flight has only a fixed number of seats to sell, so airlines must regulate
demand. When demand is expected to exceed capacity, the airline may increase prices to slow the rate at
which seats fill. On the other hand, a seat that goes unsold represents lost revenue; selling that seat for any price
above the service cost of a single passenger would have been preferable.
Here we are trying to help buyers understand flight ticket prices by deploying machine learning models.
These models would help sellers and buyers understand flight ticket prices in the market so that they can
book their tickets accordingly.
Benefits of Flight Price Prediction
Pricing in the airline industry is often compared to a brain game
between carriers and passengers in which each party pursues the best
rates. Carriers want to sell tickets at the highest possible price while
not losing customers to competitors. Passengers want to buy
flights at the lowest available cost without missing the
chance to get on board. All this makes flight prices volatile and hard
to predict. But nothing is impossible for people armed with intellect
and algorithms. Predicting flight prices helps individuals know
and understand the future price of flight tickets.
There are two main use cases of flight price prediction in the travel
industry. OTAs (online travel agencies) and other travel platforms integrate this feature to
attract more visitors looking for the best rates. Airlines use the
technology to forecast competitors’ rates and adjust their pricing
strategies accordingly.
Data Analysis and Model Building Flowchart
Import Libraries → Import Datasets → Data Preprocessing → Finding and Treating Null Values → EDA & Visualization → Identifying Outliers and Skewness → Ordinal Encoding → Checking Correlation & VIF → Model Building → R2 Score, CV & Evaluation Metrics → Hyper Parameter Tuning → Saving the Model & Prediction
EXPLORATORY DATA ANALYSIS
As a first step, I imported the required libraries and the datasets, which were in CSV format.
Then I did the basic statistical checks: shape, nunique (the number of unique values in each column), value counts, info,
etc.
While checking the info of the datasets, I found some columns with more than 80% null values. These columns carry too little
information to be useful, so I decided to drop them as unnecessary.
Then, while looking at the value counts, I found some columns with more than 85% zero values. Such nearly constant columns make
the distributions highly skewed and can bias the model, so I dropped all columns with more than 85% zero values.
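A minimal pandas sketch of the column-dropping rule above, assuming the data is already loaded into a DataFrame. The 80% and 85% thresholds come from the text; the helper name `drop_sparse_columns` and the demo columns are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, null_thresh: float = 0.80,
                        zero_thresh: float = 0.85) -> pd.DataFrame:
    """Drop columns whose share of nulls or of exact zeros exceeds a threshold."""
    null_share = df.isna().mean()   # fraction of missing values per column
    zero_share = (df == 0).mean()   # fraction of exact zeros per column (NaN counts as non-zero)
    to_drop = df.columns[(null_share > null_thresh) | (zero_share > zero_thresh)]
    return df.drop(columns=to_drop)


# Hypothetical demo data: one useful column, one empty column, one all-zero column.
demo = pd.DataFrame({
    "Price": [5000.0, 6200.0, 4800.0, 7100.0, 5300.0],
    "Mostly_Null": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "Mostly_Zero": [0, 0, 0, 0, 0],
})
cleaned = drop_sparse_columns(demo)
print(list(cleaned.columns))  # ['Price']
```

Computing both shares first and dropping in one call keeps the rule easy to audit: printing `null_share` and `zero_share` shows exactly why each column was removed.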
Exploratory Data Analysis (EDA) Steps
➢ Imported the necessary libraries and loaded the collected dataset as a data frame.
➢ Checked statistical information such as shape, number of unique values, info(), unique(), data types and value_counts().
➢ Checked null values, found missing values in the column “Meal_Availability”, and filled them with the mode.
➢ Handled the timestamp variables by converting “Dep_Time” and “Arrival_Time” from object data type to datetime data
type.
➢ Extracted Departure_Hour, Departure_Min, Arrival_Hour and Arrival_Min from the Dep_Time and Arrival_Time columns and
dropped the original columns after extraction.
➢ The target variable “Price” should be continuous numeric data, but because of thousands separators (“,”) it was stored as object data type.
So, I removed the commas (replaced them with an empty string) and converted the column to float.
➢ The value counts of Total_Stops showed categorical labels, so I replaced them with the numeric number of stops.
➢ Checked the statistical description of the data and separated categorical and numeric features.
➢ Visualized each feature with the seaborn and matplotlib libraries using several categorical and numerical plots.
➢ Identified outliers using box plots.
➢ Checked for skewness and reduced the skewness in the numerical column “Duration” using the square root transformation.
➢ Encoded the object-type columns using Label Encoder. Used Pearson’s correlation coefficient to check the
correlation between the label and features. The heatmap and the correlation bar graph showed each feature’s
relationship with the label.
➢ Separated feature and label data and scaled the features with Standard Scaler so that features measured on larger scales do not dominate the model.
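The cleaning steps listed above can be sketched in pandas and scikit-learn as follows. This is a minimal sketch under stated assumptions, not the author’s code: the raw times are assumed to be "HH:MM" strings, the stop labels are assumed to be spelled "Non Stop", "1 Stop" and "2 Stop(s)", and the function name `preprocess` and the demo rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Apply the listed cleaning steps to a copy of the raw scraped data."""
    df = df.copy()

    # 1. Fill missing meal information with the most frequent value (mode).
    meal_mode = df["Meal_Availability"].mode().iloc[0]
    df["Meal_Availability"] = df["Meal_Availability"].fillna(meal_mode)

    # 2. Parse clock times (assumed "HH:MM") and extract hour/minute features.
    for col, prefix in [("Dep_Time", "Departure"), ("Arrival_Time", "Arrival")]:
        parsed = pd.to_datetime(df[col], format="%H:%M")
        df[f"{prefix}_Hour"] = parsed.dt.hour
        df[f"{prefix}_Min"] = parsed.dt.minute
    df = df.drop(columns=["Dep_Time", "Arrival_Time"])

    # 3. Price is text such as "5,953": remove the commas, then cast to float.
    df["Price"] = df["Price"].str.replace(",", "", regex=False).astype(float)

    # 4. Map stop labels to a numeric count (label spellings are assumptions).
    df["Total_Stops"] = df["Total_Stops"].map({"Non Stop": 0, "1 Stop": 1, "2 Stop(s)": 2})

    # 5. Label-encode the remaining categorical text columns.
    for col in categorical_cols:
        df[col] = LabelEncoder().fit_transform(df[col])
    return df


raw = pd.DataFrame({
    "Airline": ["IndiGo", "Vistara", "IndiGo"],
    "Meal_Availability": ["No Meal", None, "No Meal"],
    "Dep_Time": ["06:15", "18:40", "22:05"],
    "Arrival_Time": ["08:30", "21:10", "00:20"],
    "Total_Stops": ["Non Stop", "1 Stop", "Non Stop"],
    "Price": ["5,953", "12,410", "4,870"],
})
clean = preprocess(raw, categorical_cols=["Airline", "Meal_Availability"])
print(clean)
```

Any stop label missing from the mapping becomes NaN, so it is worth checking `clean["Total_Stops"].isna().sum()` on real data before modelling.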
Visualization: Univariate Analysis for Numerical Variables
The distribution plots show how the data is
distributed in each column.
From the distribution plots we can observe that the columns are
not normally distributed, as none of them has a proper
bell-shaped curve.
The columns "Duration", "Total_Stops" and "Price"
are right-skewed, as their mean values are
much greater than their medians (50%).
Also, the data in the column Arrival_Hour is skewed to the left,
since its mean is less than its median.
Since skewness is present in the data, we need to
reduce the skewness in the numerical columns to avoid
biasing the model.
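A minimal sketch of checking skewness and applying the square root transformation used on "Duration". The |skew| > 0.5 cut-off is a common rule of thumb and an assumption here, not a value from the slides; `sqrt_transform` and the demo durations are hypothetical.

```python
import numpy as np
import pandas as pd


def sqrt_transform(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Square-root transform the given columns (values must be non-negative)."""
    df = df.copy()
    for col in cols:
        if (df[col] < 0).any():
            raise ValueError(f"{col} has negative values; sqrt is undefined there")
        df[col] = np.sqrt(df[col])
    return df


# Hypothetical flight durations (minutes) with a long right tail.
flights = pd.DataFrame({"Duration": [60, 75, 80, 90, 95, 100, 120, 600, 900]})

# Flag skewed numeric columns (rule-of-thumb threshold, an assumption).
skewness = flights.skew(numeric_only=True)
skewed_cols = skewness[skewness.abs() > 0.5].index.tolist()

before = flights["Duration"].skew()
after = sqrt_transform(flights, skewed_cols)["Duration"].skew()
print(skewed_cols, f"skew before: {before:.2f}, after: {after:.2f}")
```

The square root compresses large values more than small ones, which pulls in a right tail; it is milder than a log transform and, unlike the log, is defined at zero.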
VISUALIZATION

IndiGo is the airline preferred by the most people, covering 49.48% of the total records. Air Asia, Go First and Vistara
are similar in range. FlyBig has the lowest number.
VISUALIZATION

The most common departure city (source) is "New Delhi", covering 31.91% of the
records in the column.
"Mumbai" comes second, covering 21.85% of the records in the column.
Other popular departure cities are "Bangalore", "Hyderabad" and "Kolkata".
The least common departure city is "Chennai".
VISUALIZATION

In the barplot of Departure hour vs Airline, FlyBig has the latest departure hour, while
IndiGo has the earliest.
In the barplot of Arrival time vs Airline, FlyBig has the latest arrival time, while Vistara has the
earliest.
In the barplot of Flight duration vs Airline, Air Asia has the longest flight duration overall, while Alliance
Air has the shortest.
Comparing the barplots of Flight prices vs Airline, Vistara has very high flight prices, while
FlyBig has the lowest fares.
VISUALIZATION

SpiceJet has the most non-stop flights.

Air India has the most 1-stop flights.
VISUALIZATION

Airfares on Vistara and Air India are high compared with other airlines.
Flights departing from cities such as Chennai and Patna have a higher price range; the other cities fall in a
similar, slightly lower range without a large difference.
Similarly, flights arriving in Port Blair and Dehradun have a high price range.
Considering layovers, direct flights are cheaper than flights that
have 1 or more stops.
OUTLIERS

A box plot summarizes a data set using the box-and-whisker
method, which helps us understand the data at a glance.
Box plots are very useful when we want to know how the
data is distributed and spread. Three quartiles (the lower
quartile, the median and the upper quartile) define the box,
and the whiskers extend towards the minimum and maximum.
The box therefore spans the 25th to 75th percentiles, with
the 50th percentile marked inside, and points beyond the
whiskers are drawn as outliers.
From the box plots we can see outliers in the "Duration"
and "Price" columns.
Since Price is our target variable, there is no need to remove outliers from
this column. We removed outliers from the Duration column
using the Z-score method.
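A minimal sketch of Z-score outlier removal applied only to "Duration", leaving the target untouched. The cut-off of 3 standard deviations is a common convention and an assumption here; the slides do not state the threshold. `remove_outliers_zscore` and the demo data are hypothetical.

```python
import pandas as pd


def remove_outliers_zscore(df: pd.DataFrame, cols: list[str], threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose |z-score| is below `threshold` in every listed column."""
    sub = df[cols]
    # Population std (ddof=0), matching scipy.stats.zscore's default.
    z = (sub - sub.mean()) / sub.std(ddof=0)
    return df[(z.abs() < threshold).all(axis=1)]


# Hypothetical data: 20 ordinary durations plus one extreme value.
durations = [90, 95, 100, 105, 110] * 4 + [1000]
flights = pd.DataFrame({"Duration": durations,
                        "Price": [4000 + 100 * i for i in range(len(durations))]})

filtered = remove_outliers_zscore(flights, ["Duration"])  # Price is not used for filtering
print(len(flights), "->", len(filtered))  # 21 -> 20
```

Note that with very few rows the largest possible z-score is bounded, so a single extreme value may not cross the threshold; on a data set of this project’s size that is not a concern.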
CORRELATION

From the heatmap and bar plot we can see which features are positively and which are negatively correlated with the label.
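The label-vs-feature correlations behind the bar plot can be computed with pandas’ Pearson correlation. A minimal sketch; the helper name `label_correlation` and the demo columns are hypothetical, and the plotting calls are left as comments because they need matplotlib.

```python
import pandas as pd


def label_correlation(df: pd.DataFrame, label: str = "Price") -> pd.Series:
    """Pearson correlation of every numeric feature with the label, most positive first."""
    corr = df.corr(method="pearson", numeric_only=True)[label].drop(label)
    return corr.sort_values(ascending=False)


# Hypothetical encoded data.
data = pd.DataFrame({
    "Duration":     [60, 90, 120, 150, 180],
    "Arrival_Hour": [23, 20, 15, 10, 5],
    "Price":        [3000, 4000, 5000, 6000, 7000],
})
corr_with_price = label_correlation(data)
print(corr_with_price)

# corr_with_price.plot.bar() draws the correlation bar graph, and
# seaborn.heatmap(data.corr(), annot=True) draws the full heatmap.
```

Pearson’s coefficient only measures linear association, so a feature with a weak value here can still matter to tree-based models.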
DATA ANALYSIS STEPS DONE
I performed feature engineering steps such as feature extraction and feature selection to improve data normality and linearity.
Identified outliers using box plots and removed outliers in numerical variables.
Identified skewness using distribution plots and reduced it using the square root transformation.
Used Pearson’s correlation coefficient to check the correlation between the dependent and independent variables, and visualized the
correlation with a heatmap and a bar plot.
Used StandardScaler to scale the data so that no feature dominates because of its scale.
Split the data into train and test sets to build machine learning models, and found the random state giving the best R2 score. The model
building process is shown in the following steps.
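A minimal sketch of scaling the features and searching for the best random state. The slides do not say which model was used for the search, so `LinearRegression` serves as a fast baseline here (an assumption); the helper name `find_best_random_state`, the seed range 1 to 50 and the 30% test size are also assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def find_best_random_state(X, y, model, states=range(1, 51), test_size=0.3):
    """Return the split seed giving the highest test R2 for `model`, plus that R2."""
    best_state, best_r2 = None, -np.inf
    for state in states:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=state)
        fitted = clone(model).fit(X_tr, y_tr)   # fresh, unfitted copy for every split
        score = r2_score(y_te, fitted.predict(X_te))
        if score > best_r2:
            best_state, best_r2 = state, score
    return best_state, best_r2


rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=20, size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=5, size=200)

X_scaled = StandardScaler().fit_transform(X)   # each feature -> mean 0, std 1
state, r2 = find_best_random_state(X_scaled, y, LinearRegression())
print(state, round(r2, 3))
```

Two design notes: choosing the seed that maximizes test R2 makes that score optimistic, so cross-validated scores are the more reliable comparison; and fitting the scaler on the training split only (rather than on all rows, as in this sketch and the slides) avoids leaking test-set statistics.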
ASSUMPTIONS:
Firstly, from the problem statement we know that this is a regression problem, so we used regression
algorithms to build the model and predict the price of flight tickets, using data collected from the Yatra website by web scraping.
Secondly, from the distribution plots I found skewness in the Duration column, and from the box plots I found outliers in the target column
and in a categorical column. Also, the analysis and visualization showed that some of the features have
a roughly linear relationship with the label. So, I assumed these features help in model building and in predicting the price of flight
tickets. This model can also help buyers understand the future price of flight tickets.
So, I suggest that sellers and buyers take this model into consideration: the features found to be most important
in this study might help them estimate flight ticket prices.
MODEL BUILDING:
In this problem, “Price” is our target variable; it is continuous, and we need to predict the price of flight tickets.
So this is a regression problem, and I used the following regression algorithms.
After pre-processing and data cleaning I was left with 11 columns, including the target, and with the help of a feature importance bar graph
I used these independent features for model building and prediction. The algorithms used to train on the data are as follows:
i. Decision Tree Regressor
ii. Random Forest Regressor
iii. Extra Trees Regressor
iv. Gradient Boosting Regressor
v. Extreme Gradient Boosting Regressor (XGB)
vi. Bagging Regressor
vii. KNN Regressor

I found the random state giving the maximum R2 score and then created a new train-test split with it to build the above models.
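The model comparison can be sketched as one loop that fits each regressor and reports R2, MAE, MSE, RMSE and a cross-validated R2. This is a minimal sketch on synthetic data, not the author’s code: the helper name `compare_models`, the 30% test size, 5-fold CV and the reduced `n_estimators` are assumptions. The XGB Regressor is omitted to avoid an extra dependency; `xgboost.XGBRegressor` follows the same fit/predict interface and could be added to the dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (BaggingRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, random_state=42, test_size=0.3, cv=5):
    """Fit each regressor on one split and report test metrics plus mean CV R2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=random_state)
    models = {
        "Decision Tree": DecisionTreeRegressor(random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=50, random_state=random_state),
        "Extra Trees": ExtraTreesRegressor(n_estimators=50, random_state=random_state),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
        "Bagging": BaggingRegressor(random_state=random_state),
        "KNN": KNeighborsRegressor(),
    }
    rows = []
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        rows.append({
            "model": name,
            "R2": r2_score(y_te, pred),
            "MAE": mean_absolute_error(y_te, pred),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            # cross_val_score clones the model, so the fit above is not reused
            "CV_R2": cross_val_score(model, X, y, cv=cv, scoring="r2").mean(),
        })
    return pd.DataFrame(rows).set_index("model").sort_values("R2", ascending=False)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2000 + 500 * X[:, 0] - 300 * X[:, 1] + rng.normal(scale=50, size=300)
scores = compare_models(X, y)
print(scores.round(3))
```

Reporting the CV score next to the single-split R2 makes it easier to see whether a model’s lead holds across folds or depends on one lucky split.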
BEST RANDOM STATE
HYPER PARAMETER TUNING

I used GridSearchCV to find the best parameters for the XGB Regressor, and used the obtained parameters to
evaluate the final model.
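A minimal GridSearchCV sketch. The slide names the XGB Regressor while the conclusion names Extra Trees; this sketch uses scikit-learn’s `ExtraTreesRegressor` so it needs no extra package, and `xgboost.XGBRegressor` can be passed to `GridSearchCV` the same way with its own grid (for example `n_estimators`, `max_depth`, `learning_rate`). The parameter grid, the 3-fold CV and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 5 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hypothetical search space; widen it for real data.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(ExtraTreesRegressor(random_state=42), param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is refit on the whole training split.
best_model = search.best_estimator_
test_r2 = r2_score(y_test, best_model.predict(X_test))
print(search.best_params_)
print("test R2:", round(test_r2, 3))
```

The grid grows multiplicatively (here 2 × 2 × 2 = 8 combinations, each fitted 3 times), so a randomized search is often a cheaper choice for larger grids.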
SAVING THE MODEL AND PREDICTIONS
USING SAVED MODEL

I saved my best model as a .pkl (pickle) file.

After saving the best model, I loaded the saved model and used it to predict the test values.
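A minimal sketch of saving a fitted model to a .pkl file and loading it back for prediction, using Python’s standard `pickle` module. The file name `flight_price_model.pkl`, the helper names and the small synthetic model are hypothetical; `joblib.dump` and `joblib.load` are a common alternative for scikit-learn models.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def save_model(model, path: Path) -> None:
    """Serialize a fitted model to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: Path):
    """Load a model saved with save_model (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -1.0, 0.5])
model = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "flight_price_model.pkl"   # hypothetical file name
    save_model(model, path)
    loaded = load_model(path)

original_pred = model.predict(X[:5])
loaded_pred = loaded.predict(X[:5])
print(np.allclose(original_pred, loaded_pred))  # True
```

Unpickling can execute arbitrary code, so load .pkl files only from sources you trust, and load them with the same scikit-learn version that saved them.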
CONCLUSION
The case study aims to give an idea of how to apply machine learning algorithms to predict the price of flight tickets. Through
this project, we gained insight into how to collect data, pre-process it, analyse it, clean it and build a model.
First, we collected the flight data from a website using web scraping. The framework used for web scraping was Selenium,
which has the advantage of automating the data collection process. We collected about 5303 records containing the ticket prices of the
flights and other related features. The scraped data was then saved in an Excel file for further analysis.
Then we loaded the dataset and performed data cleaning, EDA and pre-processing steps such as checking outliers, skewness and
correlation, and scaling the data. Data visualization gave us further insights.
From the visualizations we learned that flight ticket prices change between the morning and the evening. From the distribution plots
we learned that flight ticket prices go up and down rather than staying fixed. These graphs also showed that prices
can rise by large amounts. The plots showed that prices tend to go up from morning to evening.
From the categorical plots (bar and box) we learned that early-morning and late-night flights are cheaper than flights during working hours.
The categorical plots also showed that flight ticket prices increase as the departure time approaches; that is, last-minute flights are
very expensive. From the bar plot we learned that IndiGo and SpiceJet have almost the same ticket fares.
After splitting our data into train and test sets, we ran different ML regression algorithms to find the best-performing model on the
basis of metrics such as R2 score, MAE, MSE and RMSE. Extra Trees Regressor was the best model among all the models. On this
basis we performed hyperparameter tuning to find the best parameters and improve the scores. The R2 score increased after tuning, so
we concluded that Extra Trees Regressor is the best model, as it gave a high R2 score after tuning.
