Thesis Defense

Flight ticket price prediction using machine learning

Flight Ticket Price Prediction

Presentation by Shubham Karmakar

Supervised by Sri Tulsidas Mukherjee


Objectives
• To develop a predictive model that accurately forecasts flight ticket
prices.
• Identify and analyze key factors such as airline selection,
routes, and stops that influence ticket pricing.
• Optimize model performance through hyperparameter tuning and
feature importance analysis.
• Gain insights into the complex relationships between features and
ticket prices to make informed decisions when booking flights.
• Enhance consumer welfare by enabling cost-effective travel
planning.
• Optimize airline revenue management through informed pricing
strategies.
Introduction
The motivation behind predicting flight ticket prices is
driven by the dual imperative of enhancing consumer
welfare and optimizing revenue streams for airlines. For
travelers, predictive algorithms serve as indispensable
tools, offering discerning insights into optimal purchasing
windows, thereby imbuing journeys with a sense of
financial prudence and foresight. These sophisticated
models not only augment market transparency but also
engender a paradigm shift towards consumer
empowerment, elucidating the intricate dynamics of
pricing volatility. Concurrently, for airlines, predictive
analytics represent a strategic linchpin, enabling dynamic
pricing strategies that maximize yield while mitigating
demand volatility. Furthermore, the deployment of
predictive tools engenders a virtuous cycle of innovation,
catalyzing the evolution of pricing mechanisms and
operational efficiencies within the aviation industry.

Insights
• Total Operating Revenue:
⚬ India's aviation industry generated approximately $20 billion in revenue.
• Revenue Passenger Kilometers (RPK):
⚬ In 2019, Indian airlines achieved around 265 billion RPK, reflecting
significant passenger demand and airline activity.
• Passenger Traffic:
⚬ Domestic airlines in India carried approximately 144 million passengers
in 2019, showcasing the growing preference for air travel.
Stakeholders

1. Consumers:
• Cost Savings: Helps consumers identify the best times to purchase tickets at the lowest prices.
• Budget Planning: Enables travelers to plan their expenses more effectively.

2. Airline Benefits:
• Revenue Management: Assists airlines in setting optimal prices to maximize revenue.
• Demand Forecasting: Improves airlines' ability to predict passenger demand and adjust pricing strategies accordingly.

3. Market Dynamics:
⚬ Enhances market competitiveness by providing transparent pricing.
⚬ Encourages more informed decision-making for both consumers and airlines.

Data Processing
• Importing Datasets:
⚬ Collected datasets from various sources, including airline websites and travel agencies.
⚬ Ensured datasets included relevant features such as date of journey, departure time, arrival time, and ticket prices.
• Handling Missing Values:
⚬ Identified and addressed missing data points using imputation methods to maintain dataset integrity.
⚬ Applied techniques such as mean imputation for numerical data and mode imputation for categorical data.
• Date and Time Conversion:
⚬ Converted date and time columns into datetime format for consistent processing.
⚬ Extracted valuable time-based features such as day of the week, month, and hour of travel.
• Data Normalization:
⚬ Standardized numerical features to ensure uniform scaling across the dataset.
⚬ Used normalization techniques to improve the performance of machine learning models.
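
The processing steps above can be sketched with pandas. This is a minimal illustration and not the project's actual code: the column names (Airline, Date_of_Journey, Dep_Time, Duration_Min, Price) and the sample rows are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Airline": ["IndiGo", None, "Air India", "IndiGo"],
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019", "12/05/2019"],
    "Dep_Time": ["22:20", "05:50", "09:25", "18:05"],
    "Duration_Min": [170.0, 445.0, None, 325.0],
    "Price": [3897.0, 7662.0, 13882.0, 6218.0],
})

# Mean imputation for numerical data, mode imputation for categorical data.
df["Duration_Min"] = df["Duration_Min"].fillna(df["Duration_Min"].mean())
df["Airline"] = df["Airline"].fillna(df["Airline"].mode()[0])

# Convert text columns to datetime and extract time-based features.
journey = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_Day_of_Week"] = journey.dt.dayofweek  # Monday = 0, Sunday = 6
df["Journey_Month"] = journey.dt.month
df["Dep_Hour"] = pd.to_datetime(df["Dep_Time"], format="%H:%M").dt.hour

# Standardize a numerical feature to zero mean and unit variance.
mean, std = df["Duration_Min"].mean(), df["Duration_Min"].std(ddof=0)
df["Duration_Scaled"] = (df["Duration_Min"] - mean) / std

print(df[["Airline", "Journey_Month", "Dep_Hour", "Duration_Scaled"]])
```

In a real pipeline the imputation values and scaling statistics would normally be computed on the training split only and then reused on the test split.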

Handling Categorical Data

We use two main encoding techniques to convert categorical data into a numerical format:

• Nominal data (categories with no inherent order): one-hot encoding
• Ordinal data (categories with a natural order): LabelEncoder
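
A minimal sketch of the two encodings, assuming hypothetical column names (Airline, Total_Stops) and sample values. One substitution is made here: scikit-learn's LabelEncoder assigns integers in sorted label order, which need not match the real ranking, so the ordinal column below uses an explicit mapping instead.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "Airline": ["IndiGo", "Jet Airways", "Air India", "IndiGo"],
    "Total_Stops": ["non-stop", "2 stops", "1 stop", "non-stop"],
})

# Nominal feature: one-hot encode so that no artificial order is implied.
airline_dummies = pd.get_dummies(df["Airline"], prefix="Airline", dtype=int)

# Ordinal feature: map each category to an integer that preserves its order.
stop_order = {"non-stop": 0, "1 stop": 1, "2 stops": 2}
df["Total_Stops_Encoded"] = df["Total_Stops"].map(stop_order)

encoded = pd.concat([df[["Total_Stops_Encoded"]], airline_dummies], axis=1)
print(encoded)
```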


From the graph, we see that Jet Airways has the highest price; apart from
the first airline, all airlines have a similar median price.
Handling Outliers

As there are some outliers in the price feature, we replace them with the median.
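
A minimal sketch of median replacement using only the standard library. The slides do not say how outliers were detected, so the 1.5 × IQR rule, the function name, and the sample prices below are all assumptions for illustration.

```python
import statistics

def replace_outliers_with_median(prices, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median."""
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(prices)
    return [p if low <= p <= high else median for p in prices]

# Hypothetical prices with one extreme value at the end.
prices = [3900, 4200, 4500, 5100, 5600, 6200, 7600, 79500]
cleaned = replace_outliers_with_median(prices)
print(cleaned)
```

Only the extreme value is changed; every price inside the IQR fences is left as it was.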
Feature Engineering
Feature engineering is a preprocessing step in
supervised machine learning and statistical modelling which
transforms raw data into a more effective set of inputs. Each input
comprises several attributes, known as features. By providing
models with relevant information, feature engineering significantly
enhances their predictive accuracy and decision-making capability.

1. Date and Time Extraction:
⚬ Extracted day and month from the date of journey to capture seasonal and monthly trends.
⚬ Extracted hour and minute from departure and arrival times to analyze the impact of travel time on ticket prices.

2. Duration Calculation:
⚬ Calculated the duration of each flight by subtracting departure time from arrival time.
⚬ Converted the duration into a numerical feature representing total travel time in minutes or hours.
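
The extraction and duration steps can be sketched with the standard library. The function name, the input formats (dd/mm/yyyy and HH:MM), and the next-day rule for overnight flights are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

def flight_features(date_of_journey, dep_time, arr_time):
    """Derive date, time, and duration features from raw text fields."""
    dep = datetime.strptime(f"{date_of_journey} {dep_time}", "%d/%m/%Y %H:%M")
    arr = datetime.strptime(f"{date_of_journey} {arr_time}", "%d/%m/%Y %H:%M")
    # Assumption: an arrival earlier than the departure means a next-day landing.
    if arr < dep:
        arr += timedelta(days=1)
    return {
        "journey_day": dep.day,
        "journey_month": dep.month,
        "dep_hour": dep.hour,
        "dep_minute": dep.minute,
        "arr_hour": arr.hour,
        "arr_minute": arr.minute,
        "duration_minutes": int((arr - dep).total_seconds() // 60),
    }

# Hypothetical overnight flight departing 22:20 and landing 01:10.
features = flight_features("24/03/2019", "22:20", "01:10")
print(features)
```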
Feature Engineering
1. Airline Encoding:
⚬ Encoded the airline names to capture airline-specific pricing strategies.
⚬ Used label encoding or one-hot encoding for transforming airline data into numerical features.

2. Route Encoding:
⚬ Encoded the route (combination of source and destination) to identify route-specific pricing patterns.
⚬ Applied one-hot encoding to convert categorical route data into numerical format.

3. Additional Features:
⚬ Considered adding features such as layovers, number of stops, and seat class.
⚬ Evaluated the importance of each feature using feature importance metrics from machine learning models.
Model Selection

1. RandomForest Regressor:
⚬ Chosen for its ability to handle complex interactions and provide robust predictions.
⚬ Performed well in initial tests with high accuracy and low error rates.

2. Logistic Regression:
⚬ Evaluated for its simplicity and efficiency in binary classification.
⚬ Found less suitable for continuous price prediction tasks.
Model Selection

3. K-Nearest Neighbors (KNN):
⚬ Considered for its straightforward approach to prediction based on nearest neighbors.
⚬ Showed limitations in handling large datasets and complex feature spaces.

4. Decision Tree Regressor:
⚬ Explored for its interpretability and ease of visualization.
⚬ Prone to overfitting on training data, requiring pruning techniques.
Model Selection

5. Support Vector Regression (SVR):
⚬ Tested for its effectiveness in high-dimensional spaces.
⚬ Computationally intensive and less scalable for larger datasets.

6. Gradient Boosting Regressor:
⚬ Assessed for its ability to improve prediction accuracy through iterative boosting.
⚬ Demonstrated competitive performance but required careful tuning of hyperparameters.
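
A comparison like the one described on these slides could be set up roughly as follows with scikit-learn. This is a sketch under stated assumptions: the data is synthetic (make_regression stands in for the encoded flight features), the model settings are illustrative defaults, and logistic regression is left out because it is a classifier and cannot be scored on a continuous target.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded flight features and prices (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

candidates = {
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "SVR": SVR(),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the flight dataset; the sketch only shows how the candidates can be scored on equal terms.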
So Which is the best Model?
• Model Selection:
⚬ After evaluating multiple models, the RandomForest Regressor emerged as the best-performing model.

....why so?
• Performance Metrics:
⚬ R² score: Achieved an R² of about 0.854 after tuning (reported as 85.36% accuracy), indicating a high level of predictive performance.
⚬ Mean Absolute Error (MAE): Low MAE values, reflecting the model's precision in predicting
ticket prices.
⚬ Root Mean Squared Error (RMSE): Demonstrated the model's ability to handle variability in the
data with minimal error.
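
For reference, the three metrics named above can be computed as in the following sketch. The function name and the sample prices are hypothetical; the project's actual predictions are not reproduced here.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical actual and predicted ticket prices.
actual = [4000, 5000, 6000, 7000]
predicted = [4200, 4900, 6300, 6800]
mae, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE={mae:.1f}  RMSE={rmse:.1f}  R^2={r2:.3f}")
```

MAE and RMSE are in the same unit as the ticket price, while R² is unitless and measures the share of price variance the model explains.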

....why so?
• Feature Importance:
⚬ Identified key features contributing to accurate predictions, such as departure time, duration, and airline.
⚬ Provided insights into which factors most significantly influence ticket prices.

• Model Robustness:
⚬ Validated the model's robustness through cross-validation and testing on unseen data.
⚬ Ensured consistent performance across different datasets and conditions.
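
A minimal sketch of how feature importances and cross-validation scores can be read from a random forest in scikit-learn. The data is synthetic, and the feature names and the price formula are assumptions chosen so that the expected ranking is easy to see.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical feature names; the last column is pure noise.
feature_names = ["dep_hour", "duration_minutes", "airline_code", "noise"]
X = rng.uniform(0, 1, size=(400, 4))
# Synthetic price driven mostly by duration, then airline, then departure hour.
y = 500 * X[:, 0] + 3000 * X[:, 1] + 1500 * X[:, 2] + rng.normal(0, 50, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Robustness check: R^2 on each of 5 cross-validation folds.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Fold R^2 scores:", np.round(cv_scores, 3))

# Impurity-based importances from a model fitted on all rows.
model.fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Impurity-based importances sum to 1 and tend to favor high-cardinality features, so they are best read as a rough ranking.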
Hypertuning The Model
Objective:
• Optimize the performance of the RandomForest Regressor by fine-tuning its hyperparameters.
Parameters Tuned:
• Number of Trees (n_estimators): Adjusted the number of decision trees in the forest to balance prediction stability against training cost.
• Maximum Depth (max_depth): Set limits on the depth of the trees to prevent overfitting.
• Minimum Samples Split (min_samples_split): Determined the minimum number of samples required to split an internal node.
• Minimum Samples Leaf (min_samples_leaf): Established the minimum number of samples required to be at a leaf node.
Optimization Techniques:
• RandomizedSearchCV: Utilized to efficiently search through a wide range of hyperparameters by randomly sampling from the specified distributions.
• GridSearchCV: Employed to perform an exhaustive search over a predefined grid of hyperparameters to find the best combination.
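
The tuning setup described above might look roughly like this with scikit-learn's RandomizedSearchCV. The search space, the number of iterations, and the synthetic data are assumptions; the values actually searched in the project are not given on the slides.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the encoded flight dataset (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search space covering the four tuned hyperparameters.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of random combinations to try
    cv=3,              # 3-fold cross-validation per combination
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out R^2:", search.best_estimator_.score(X_test, y_test))
```

GridSearchCV takes a param_grid in the same dictionary form and tries every combination, which is practical once the random search has narrowed the ranges.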
Hypertuning The Model
• Results:
⚬ Achieved improved accuracy and reduced error rates with optimized
hyperparameters.
⚬ Fine-tuned model demonstrated enhanced generalization on unseen data.

Before hypertuning: R² score was 0.8383033821751005
After hypertuning: R² score is 0.8536822685106241

After hypertuning, the R² score increases.


Data analysis (sample data)

• Sample Data Visualization:


⚬ Display charts and graphs showing
the distribution of ticket prices over
time.
⚬ Include visuals depicting the
relationship between ticket prices
and key features such as departure
time, duration, and airline.
Data analysis (sample data)

• Insights from Visualizations:


⚬ Price Trends: Identify patterns and trends in ticket prices based on the day of the
week, month, and season.
⚬ Departure Time Impact: Show how departure times influence ticket prices,
highlighting peak and off-peak hours.
⚬ Airline Comparison: Compare ticket prices across different airlines to reveal
pricing strategies and differences.
• Descriptive Statistics:
⚬ Provide summary statistics such as mean, median, and standard deviation of ticket
prices.
⚬ Highlight any notable outliers or anomalies in the data.
Glossary
RandomForest Regressor:
An ensemble learning method that constructs multiple decision trees and averages their predictions to improve
predictive accuracy and control overfitting.
Hyperparameter Tuning:
The process of adjusting a model's hyperparameters (settings fixed before training, such as the number of
trees) to optimize performance and achieve the best possible predictive accuracy.
Feature Importance:
A metric that indicates the contribution of each feature to the predictive power of the model, helping to
identify the most influential variables.
One-Hot Encoding:
A technique used to convert categorical variables into a binary format, where each category is represented
by a unique binary vector.
LabelEncoder:
A preprocessing tool that converts categorical labels into a numerical format, allowing them to be used in
machine learning algorithms.
Conclusion
Based on the analysis conducted, it can be concluded that factors such as the airline, total stops, and
specific routes have a significant impact on ticket prices. It is advisable for passengers to consider these
factors while booking flights to potentially find more cost-effective options.
Additionally, understanding the importance of these variables can help passengers make informed
decisions and optimize their travel expenses.
The flight price prediction project utilized advanced machine learning techniques such as the RandomForest
Regressor and hyperparameter tuning to accurately forecast ticket prices. By analyzing feature
importance, we identified key factors like airline selection, routes, and stops that significantly influence
pricing. Techniques like One-Hot Encoding and LabelEncoder were employed to preprocess categorical
data for model training.
Through this analysis, we gained insights into the complex relationships between various features and
ticket prices, enabling us to make informed decisions when booking flights. The project showcased the
importance of data preprocessing, model optimization, and feature analysis in enhancing prediction
accuracy and understanding the dynamics of flight pricing.
Thank you
