Flight Delay Prediction: Project Synopsis On

This document provides a synopsis for a project using machine learning models to predict flight delays. The goals are to apply linear regression to predict arrival delays based on departure delay and route distance. The dataset used contains features like origin, destination, carrier, day of week, and departure/arrival delays. Python is identified as the programming language due to its powerful machine learning and scientific computing packages. The proposed work will use multiple linear regression to model the relationship between arrival delay as the dependent variable and other features as explanatory variables. The system will involve data collection, preprocessing, training a machine learning model, and evaluating model performance.

Uploaded by

Ramesh Kumar
PROJECT SYNOPSIS ON

Flight Delay Prediction


DEPARTMENT OF INFORMATION TECHNOLOGY
NETAJI SUBHASH ENGINEERING COLLEGE

By

RAMESH KUMAR (10900216031)

Under the guidance of:

(ANUPAM BERA)

Project group: SAURABH KUMAR, SAMEER AKHTER, PAYAL KUMARI, RAMESH KUMAR
Under my guidance and supervision, the synopsis of the project
_____________________________________________________________ of 4th year
Information Technology is submitted.

(signature of Project guide)


--------------------------------------
ANUPAM BERA
Department of Information Technology
Netaji Subhash Engineering College,
Garia, Kolkata - 700152
ACKNOWLEDGEMENT
I owe a deep sense of gratitude to my respected mentor Prof. ANUPAM BERA,
Department of Information Technology, Netaji Subhash Engineering College,
Kolkata, for his meticulous and expert guidance, constructive criticism, patient
hearing and benevolent behaviour throughout the present research. I
shall remain grateful to him for his cordial, cooperative attitude and his wise and
knowledgeable counsel, which acted as an impetus in the successful completion of
my project titled MACHINE LEARNING MODEL FOR FLIGHT DELAY
PREDICTION.
I would particularly like to thank the Head of the Department for giving me guidance
and inspiration during my study in the department. I will never forget the kind help
extended by the HOD, nor the help provided by all the faculty members.
Last but not least, my friends in the department deserve some words of
thanks.
CONTENT
Abstract
Introduction
Project Goals and Scope
Data and Tools
4.1 Data Used
4.1.1 Choosing the Dataset
4.2 Tools
Python and associated packages
Proposed Work
5.1 Multiple Linear Regression
System Design
I. Data Collection
II. Pre Processing
III. Training the Machine
IV. Data Scoring
Conclusion
Future work
References
Abstract
As the population increases, demand for air travel grows and more airlines enter
the market. The resulting growth of the aviation industry has led to air-traffic
congestion, which causes flight delays. Flight delays have not only an economic
impact but also harmful environmental effects, and air-traffic management is
becoming increasingly challenging. In this project I apply a machine learning
algorithm, linear regression, to predict whether a given flight's arrival will be
delayed and by how much.
Introduction

Delay is one of the most annoying things people face in day-to-day life, and
flights are no exception: passengers are in a hurry and dislike delays. A delay
may be defined as the difference between the scheduled and the actual time of
departure or arrival of a plane. National regulatory authorities use a
multitude of indicators related to tolerance thresholds for flight delays. Indeed,
flight delay is an essential subject in the context of air transportation systems.
For passengers, a flight delay causes inconvenience, frustration, and a loss of
both time and money; for the airport, delays seriously disrupt normal operation;
for the airline, frequent delays not only bring large economic losses but also
damage its reputation. Flight delay has become a constraint on the development of
the aviation industry.
Project Goals and Scope
In this project we establish a multiple linear regression model that uses
departure delay and route distance to predict arrival delay, and present the design
and implementation of a flight delay prediction system. A chief goal of this project
is to add to the academic understanding of flight delay prediction. The hope is that,
with a greater understanding of how flights are delayed, customers will be better
equipped to anticipate and avoid delays.
It is important here to define the scope of the project. This project focuses
exclusively on predicting the arrival delay of individual flights. It makes no
attempt to decide what action should be taken on the basis of each prediction.
Beyond prediction, the project analyses the accuracy of these predictions.

Data and Tools

4.1 Data Used


4.1.1 Choosing the Dataset
We have selected a dataset available on kaggle.com. The features contained in the
dataset are as follows:
1. Origin
2. Dest
3. Unique_Carrier
4. Day_of_Week
5. Dep_Hour
6. Arr_Delay.
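
As a minimal sketch of how a file with these six columns could be read, the snippet below uses only the Python standard library. The sample rows, the airport and carrier codes, and the helper name `load_flights` are made up for illustration and are not taken from the Kaggle dataset; treating `Arr_Delay` as a number of minutes is likewise an assumption.

```python
import csv
import io

# Made-up sample rows, for illustration only; they are NOT taken from the Kaggle dataset.
SAMPLE_CSV = """Origin,Dest,Unique_Carrier,Day_of_Week,Dep_Hour,Arr_Delay
CCU,DEL,AI,1,9,12
DEL,BOM,6E,5,18,-4
BOM,CCU,AI,7,6,35
"""

# Columns that hold numbers; the remaining columns are kept as text.
NUMERIC_COLUMNS = ("Day_of_Week", "Dep_Hour", "Arr_Delay")


def load_flights(handle):
    """Read flight rows from a CSV file object, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(handle):
        for name in NUMERIC_COLUMNS:
            record[name] = float(record[name])
        rows.append(record)
    return rows


# An in-memory file stands in for the real dataset file here.
flights = load_flights(io.StringIO(SAMPLE_CSV))
print(len(flights), flights[0]["Origin"], flights[0]["Arr_Delay"])
```

With a real file, the same function could be called as `load_flights(open(path, newline=""))`, where `path` points at the downloaded dataset.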
4.2 Tools
Python and associated packages
Python was the language of choice for this project. This was an easy decision for
several reasons.
1. Python as a language has an enormous community behind it. Any problems
that might be encountered can be easily solved with a trip to Stack Overflow.
Python is among the most popular languages on the site which makes it very
likely there will be a direct answer to any query.
2. Python has an abundance of powerful tools ready for scientific computing.
Packages such as NumPy, pandas, and SciPy are freely available,
performant, and well documented. Packages such as these can dramatically
reduce and simplify the code needed to write a given program. This makes
iteration quick.
3. Python as a language is forgiving and allows for programs that look like
pseudo code. This is useful when pseudo code given in academic papers
needs to be implemented and tested. Using Python, this step is usually
reasonably trivial.

Proposed Work

This project uses regression.

5.1 Multiple Linear Regression


In statistics, linear regression is an approach for modeling the relationship between
a scalar dependent variable y and one or more explanatory variables (or
independent variables) denoted X. The case of one explanatory variable is called
simple linear regression. For more than one explanatory variable, the process is
called multiple linear regression. In linear regression, the relationships are modeled
using linear predictor functions whose unknown model parameters are estimated
from the data. Such models are called linear models.
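
For the two explanatory variables named in the project goals, the model can be written out as follows. Here $y_i$ is the arrival delay of flight $i$, $x_{i1}$ its departure delay and $x_{i2}$ its route distance. The use of ordinary least squares as the estimator is an assumption, since the text only says that the parameters are estimated from the data.

```latex
% Multiple linear regression with two explanatory variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad i = 1, \dots, n

% Matrix form, with a leading column of ones in X for the intercept
\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}

% Ordinary least squares estimate (assumes X^{\top} X is invertible)
\hat{\boldsymbol{\beta}} = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```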
Linear regression has many practical uses. Most applications fall into one of the
following two broad categories: (1) if the goal is prediction, or forecasting, or error
reduction, linear regression can be used to fit a predictive model to an observed
data set of Y and X values. After developing such a model, if an additional value of
X is then given without its accompanying value of Y, the fitted model can be used
to make a prediction of the value of Y. (2) given a variable Y and a number of
variables X1, ..., Xp that may be related to Y, linear regression analysis can be
applied to quantify the strength of the relationship between Y and the Xj, to assess
which Xj may have no relationship with Y at all, and to identify which subsets of the
Xj contain redundant information about Y.
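
The prediction use case can be sketched with NumPy, one of the packages named under Tools. The numbers below are synthetic and generated from known coefficients purely so that the fit can be checked; the units (minutes, kilometres) and the helper name `predict` are assumptions, and this is not the fitted model of the project.

```python
import numpy as np

# Synthetic data for illustration only: delays in minutes, distances in km (assumed units).
dep_delay = np.array([0.0, 10.0, 20.0, 30.0, 45.0, 60.0])
distance = np.array([300.0, 1200.0, 800.0, 1500.0, 500.0, 2000.0])

# Arrival delay generated from known coefficients so the fit can be checked.
arr_delay = 2.0 + 0.9 * dep_delay - 0.002 * distance

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(dep_delay), dep_delay, distance])

# Ordinary least squares solved with NumPy's least-squares routine.
beta, *_ = np.linalg.lstsq(X, arr_delay, rcond=None)


def predict(dep_delay_min, distance_km):
    """Predict the arrival delay of one flight from the fitted coefficients."""
    return beta[0] + beta[1] * dep_delay_min + beta[2] * distance_km


print(np.round(beta, 3))
print(round(float(predict(15.0, 1000.0)), 2))
```

Because the synthetic target contains no noise, the fitted coefficients should match the generating ones up to floating-point error; with real flight data the fit would only approximate the relationship.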

System Design
The first step is the conversion of the raw data into processed data. This is done
using feature extraction, since the raw data contains many attributes but only a
few of them are useful for prediction. In this step the key attributes are
extracted from the full list of attributes available in the raw dataset. Feature
extraction starts from an initial set of measured data and builds derived values,
or features. These features are intended to be informative and non-redundant,
facilitating the subsequent learning and generalization steps. Feature extraction
is a dimensionality reduction process: the initial set of raw variables is reduced
to a more manageable set of features while still describing the original dataset
accurately and completely.
Feature extraction is followed by data splitting, in which the extracted data is
divided into two distinct segments, a training set and a test set. The training set
is used to train the model, whereas the test set is used to estimate the accuracy of
the model. The split is made so that the training set holds a larger share of the
data than the test set. Classification is the problem of identifying which of a set
of categories a new observation belongs to; here the target, arrival delay, is a
numeric quantity, so the task is one of regression. The random forest algorithm uses
a collection of randomized decision trees to analyze the data; in layman's terms,
each tree in the forest looks at a subset of the attributes in the data. In this
case, the end goal of the proposed system is to predict the arrival delay of a
flight by analyzing historical flight data.
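
The attribute-selection part of this step can be sketched as follows. The key attributes are the ones listed in the dataset description; `Tail_Num` and `Flight_Num` are hypothetical extra raw columns included only to show what is discarded, and the records themselves are made up.

```python
# Attribute names follow the dataset description; "Tail_Num" and "Flight_Num"
# are hypothetical extra raw columns used only to show what gets discarded.
KEY_ATTRIBUTES = ("Origin", "Dest", "Unique_Carrier", "Day_of_Week", "Dep_Hour")
TARGET = "Arr_Delay"


def extract_features(raw_rows):
    """Keep only the key attributes and separate out the prediction target."""
    features, targets = [], []
    for row in raw_rows:
        features.append({name: row[name] for name in KEY_ATTRIBUTES})
        targets.append(row[TARGET])
    return features, targets


# Made-up raw records, for illustration only.
raw_rows = [
    {"Origin": "CCU", "Dest": "DEL", "Unique_Carrier": "AI", "Day_of_Week": 1,
     "Dep_Hour": 9, "Arr_Delay": 12.0, "Tail_Num": "X1", "Flight_Num": 101},
    {"Origin": "DEL", "Dest": "BOM", "Unique_Carrier": "6E", "Day_of_Week": 5,
     "Dep_Hour": 18, "Arr_Delay": -4.0, "Tail_Num": "X2", "Flight_Num": 202},
]

features, targets = extract_features(raw_rows)
print(features[0])
print(targets)
```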
The various modules of the project would be divided into the segments as described.
I. Data Collection
Data collection is the most basic module and the initial step of the project. It
deals with collecting the right dataset. The dataset to be used for delay prediction
has to be filtered on various aspects. Data collection can also enhance the dataset
by adding further external data. Our data mainly consists of the previous year's
flight timetable. Initially, we will analyze the Kaggle dataset and, depending on
the accuracy obtained, use the model with this data to make predictions.
II. Pre Processing
Data pre-processing is a part of data mining that involves transforming raw data
into a more coherent format. Raw data is usually inconsistent or incomplete and
often contains many errors. Pre-processing involves checking for missing values,
identifying categorical values, splitting the dataset into a training set and a test
set, and finally applying feature scaling to limit the range of the variables so
that they can be compared on a common scale.
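
Three of these pre-processing steps can be sketched in plain Python. The helper names are hypothetical, dropping incomplete rows is only one possible way of handling missing values, and one-hot encoding and min-max scaling are assumed choices that the text does not prescribe.

```python
def drop_incomplete(rows, required):
    """Discard rows in which any required field is missing or empty."""
    return [r for r in rows if all(r.get(k) not in (None, "") for k in required)]


def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors, one column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return encoded, categories


def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up rows, for illustration only; the last one has a missing departure hour.
rows = [
    {"Unique_Carrier": "AI", "Dep_Hour": 6},
    {"Unique_Carrier": "6E", "Dep_Hour": 9},
    {"Unique_Carrier": "AI", "Dep_Hour": 18},
    {"Unique_Carrier": "6E", "Dep_Hour": None},
]

clean = drop_incomplete(rows, required=("Unique_Carrier", "Dep_Hour"))
carrier_vectors, carrier_names = one_hot([r["Unique_Carrier"] for r in clean])
scaled_hours = min_max_scale([r["Dep_Hour"] for r in clean])
print(carrier_names, carrier_vectors)
print(scaled_hours)
```

In practice the minimum and maximum used for scaling would be computed on the training set only and then reused for the test set, so that no information from the test data leaks into training.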
III. Training the Machine
Training the machine means feeding the training data to the algorithm so that it
can fit the model. The training set is used to tune and fit the models. The test
set is left untouched during training, as a model should be judged only on data it
has not seen. Training includes cross-validation, which gives a well-grounded
estimate of the model's performance using only the training data. Tuning is meant
specifically to adjust hyperparameters such as the number of trees in a random
forest. We perform the entire cross-validation loop for each set of hyperparameter
values, calculate a cross-validated score for each set, and then select the best
hyperparameters. The idea behind training is that we start with some initial
parameter values and then optimize them on the dataset, repeating until we reach the
optimal values. Finally, we take the predictions of the trained model on the
inputs from the test dataset. For this purpose the data is divided in the ratio
80:20, with 80% used for the training set and the remaining 20% for the test set.
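
The 80:20 split and the cross-validation loop can be sketched with the standard library alone. The function names, the fixed random seed, and the toy mean-predicting "model" are assumptions made for illustration; for hyperparameter tuning, `cross_validate` would be called once per candidate setting and the best-scoring setting kept.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(rows, k, fit, score):
    """Average validation score of `fit` over k folds of the training rows."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in val_idx]))
    return sum(scores) / len(scores)


def fit_mean(rows):
    """Toy 'model': always predict the mean arrival delay of the training rows."""
    return sum(rows) / len(rows)


def neg_mae(model, rows):
    """Negative mean absolute error, so that a higher score is better."""
    return -sum(abs(r - model) for r in rows) / len(rows)


delays = [float(d) for d in range(100)]  # made-up arrival delays
train_rows, test_rows = train_test_split(delays, test_fraction=0.2, seed=42)
cv_score = cross_validate(train_rows, k=5, fit=fit_mean, score=neg_mae)
print(len(train_rows), len(test_rows), cv_score)
```

The test rows play no part in `cross_validate`; they are kept aside for the final evaluation of the selected model.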
IV. Data Scoring
The process of applying a predictive model to a set of data is referred to as
scoring the data. The technique used to process the dataset is the random forest
algorithm. Random forest is an ensemble method that is commonly used for both
classification and regression. Based on the learned models, we achieve interesting
results. This last module thus describes how the result of the model can help to
predict whether a flight will be delayed, and by how much, based on certain
parameters. It also shows the vulnerabilities of a particular entity. A user
authentication control is implemented to make sure that only authorized entities
access the results.
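
Scoring can be sketched as applying a fitted prediction function to held-out inputs and summarising the errors. The error measures chosen here (mean absolute error, root mean squared error and the coefficient of determination) are assumptions, since the text does not name a metric, and the coefficients inside `example_predict` are illustrative rather than results of this project.

```python
import math


def score_model(predict, features, actual):
    """Apply `predict` to each feature tuple and compare with the actual delays."""
    predicted = [predict(x) for x in features]
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


def example_predict(x):
    """Illustrative linear model on (departure delay, route distance)."""
    dep_delay_min, distance_km = x
    return 2.0 + 0.9 * dep_delay_min - 0.002 * distance_km


# Made-up test inputs and observed arrival delays, for illustration only.
test_features = [(10.0, 1000.0), (20.0, 500.0), (0.0, 2000.0)]
test_actual = [10.0, 17.0, -2.0]

metrics = score_model(example_predict, test_features, test_actual)
print(metrics)
```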

Conclusion
In this project, I was able to successfully apply machine learning algorithms to
predict flight arrival delay and to show that simple models such as linear
regression can predict fairly accurately whether a flight's arrival will be
delayed, and by how much.

Future work
As further work, I would like to improve my model, perhaps with more training
data, a deeper neural network, or both. Taxi-delay prediction is a natural
progression of this work, considering the amount of fuel wasted while taxiing.
Accurate taxi-delay prediction requires taking airport runway and taxiway
configurations into consideration, an area where very little work exists.
