MALLA REDDY ENGINEERING COLLEGE
(AUTONOMOUS)
DEPARTMENT OF CSE
Flight Delay Prediction Based on Aviation Big Data and Machine
Learning
Under the Guidance of
Mr. K. Srikanth
Submitted By:
A.Manu Priyatham Chand -19J41A0567
B.Rishikesh -19J41A0570
B.Srikanth -19J41A0573
K.VSR.Sravan Varma -19J41A0592
CONTENTS
• ABSTRACT
• INTRODUCTION
• EXISTING SYSTEM
• WEAKNESSES (GAPS) OF EXISTING SYSTEM
• PROBLEM STATEMENT
• OBJECTIVES OF THE PROPOSED SYSTEM
• PROPOSED SYSTEM
• ALGORITHMS USED IN PROPOSED SYSTEM
• ADVANTAGES OF PROPOSED SYSTEM OVER EXISTING SYSTEM
• PROPOSED SYSTEM ARCHITECTURE
• SOFTWARE & HARDWARE REQUIREMENT
ABSTRACT
• Accurate flight delay prediction is fundamental to establishing a more efficient airline business.
Recent studies have focused on applying machine learning methods to predict flight delay.
Most previous prediction methods were applied to a single route or airport. This paper
explores a broader scope of factors that may potentially influence flight delay, and compares
several machine learning-based models on specially designed, generalized flight delay prediction
tasks. To build a dataset for the proposed scheme, automatic dependent surveillance-broadcast
(ADS-B) messages are received, pre-processed, and integrated with other information such as
weather conditions, flight schedules, and airport information. The designed prediction tasks
comprise several classification tasks and a regression task. Experimental results show that
long short-term memory (LSTM) is capable of handling the obtained aviation sequence data, but
overfitting occurs on our limited dataset.
INTRODUCTION
New technologies such as automatic dependent surveillance-broadcast (ADS-B) have been
proposed, in which flights periodically broadcast their current state information, such as
International Civil Aviation Organization (ICAO) identity number, longitude, latitude,
and speed. Compared with traditional radar-based schemes, the ADS-B-based scheme is
low cost, and the corresponding ADS-B receiver (at 1090 MHz or 978 MHz) can be easily
connected to a personal computer. The received ADS-B messages, along with other data
collected from the Internet, constitute a huge volume of aviation data, and mining this data
can support military, agricultural, and commercial applications.
In one earlier two-stage model, the first stage produces the day-to-day delay status, and the
second stage is a layered neural network that predicts the delay of each individual flight
from that status and other information. The two stages of the model achieved accuracies of
85% and 87.42%, respectively. That study suggested that a deep learning model requires a
large volume of data; otherwise, the model is likely to end up with poor performance or
overfitting.
EXISTING SYSTEM
Nowadays, aircraft have become a necessity because they make life easier. They
are efficient at carrying goods and passengers around the world. They also
deliver emergency supplies in wartime and play a vital role in carrying medical
necessities. Hence, the advent of airplanes is considered important. Flight
delays can affect thousands of people across the globe, either directly or
indirectly. Delays have many causes, such as severe weather, security issues,
and air traffic.
Several methods have been implemented in existing systems to predict flight
delays, but because of the complexity of air traffic flow management (ATFM)
and the huge datasets involved, it has become very difficult to find an
accurate solution to this problem. Many algorithms have been implemented to
forecast flight delays. We use Python in Visual Studio Code and implement
binary classification to prepare a model that can predict delays.
WEAKNESSES (GAPS) OF EXISTING SYSTEM
The existing system does not apply data transformation
or class balancing.
Its performance suffers from the lack of
data cleaning and data integration.
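The two missing steps named above, data transformation and balancing, can be illustrated with a minimal sketch. It assumes min-max scaling as the transformation and random oversampling as the balancing method; the slides do not say which techniques are intended, and the field names `dep_delay_min` and `delayed` are hypothetical.

```python
import random
from collections import Counter


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def oversample(rows, label_key="delayed", seed=0):
    """Duplicate random minority-class rows until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical, imbalanced toy data: six on-time flights and two delayed ones.
flights = [{"dep_delay_min": d, "delayed": 0} for d in (0, 2, 3, 5, 7, 9)]
flights += [{"dep_delay_min": d, "delayed": 1} for d in (35, 80)]

balanced = oversample(flights)
counts = Counter(row["delayed"] for row in balanced)
scaled = min_max_scale([10, 20, 30])
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.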
OBJECTIVES OF THE PROPOSED SYSTEM
The proposed work benefits from considering as many factors as possible that may potentially
influence flight delay, for instance airport information, weather at airports, traffic flow at
airports, and traffic flow on routes. The contributions of this paper can be summarized as follows:
The system explores a broader scope of factors that may potentially influence flight delay and
quantizes the selected factors, yielding an integrated aviation dataset. Our experimental
results indicate that these multiple factors can be used effectively to predict whether a flight will
be delayed.
Several machine learning-based network architectures are proposed and matched with the
established aviation dataset. The traditional flight delay prediction problem is a binary
classification task.
PROPOSED SYSTEM
The proposed work benefits from considering as many factors as possible that may
potentially influence flight delay, for instance airport information, weather at airports,
traffic flow at airports, and traffic flow on routes. The contributions of this paper can be
summarized as follows:
The system explores a broader scope of factors that may potentially influence flight
delay and quantizes the selected factors, yielding an integrated aviation dataset.
Our experimental results indicate that these multiple factors can be used effectively to
predict whether a flight will be delayed.
• Several machine learning-based network architectures are proposed and matched with
the established aviation dataset. The traditional flight delay prediction problem is a binary
classification task. To comprehensively evaluate the performance of the architectures,
several prediction tasks covering classification and regression are designed. Conventional
schemes mostly focused on a single route or a single airport [4], [6], [12]. However, our
work covers all routes and airports within our ADS-B platform.
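As a rough illustration of how one integrated record could feed both the classification and the regression tasks, the sketch below joins a flight record with weather for its origin airport and derives three targets from the same delay value. The 15-minute threshold, the class bin edges, and all field names are assumptions made for this example, not values taken from the project.

```python
from bisect import bisect_right

# Assumed values for illustration only.
DELAY_THRESHOLD_MIN = 15          # a flight counts as "delayed" at or above this
CLASS_EDGES_MIN = (15, 60, 120)   # bin edges for a four-class task


def binary_label(delay_min):
    """Binary classification target: 1 if delayed, else 0."""
    return int(delay_min >= DELAY_THRESHOLD_MIN)


def multiclass_label(delay_min):
    """Multi-class target: index of the delay bin (0 = below the first edge)."""
    return bisect_right(CLASS_EDGES_MIN, delay_min)


def build_example(flight, weather_by_airport):
    """Join one flight with its origin-airport weather and derive all targets."""
    weather = weather_by_airport[flight["origin"]]
    features = {
        "scheduled_hour": flight["scheduled_hour"],
        "wind_speed_kt": weather["wind_speed_kt"],
        "visibility_km": weather["visibility_km"],
    }
    delay = flight["arrival_delay_min"]
    targets = {
        "binary": binary_label(delay),
        "multiclass": multiclass_label(delay),
        "regression": float(delay),  # regression predicts the delay itself
    }
    return features, targets


# Hypothetical records.
weather_by_airport = {"HYD": {"wind_speed_kt": 12, "visibility_km": 4.0}}
flight = {"origin": "HYD", "scheduled_hour": 18, "arrival_delay_min": 42}
features, targets = build_example(flight, weather_by_airport)
```

Deriving every target from the same delay value keeps the tasks comparable, since all models see identical features and differ only in what they predict.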
ALGORITHMS USED IN PROPOSED SYSTEM
• K-Nearest Neighbors (KNN):
• A simple but very powerful classification algorithm
• Classifies based on a similarity measure
• Non-parametric
• Lazy learning
• Does not “learn” until a test example is given
• Whenever we have a new data point to classify, we find its k nearest neighbors in
the training data
• Example
• The input to the classifier consists of the k closest training examples in feature space
• Feature space is the space spanned by the variables that describe each example
• Instance-based learning: it works lazily, because the training instances closest
to the input vector are searched for only at test or prediction time, not during
training
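The points above can be sketched in a few lines of plain Python. This is a minimal illustration assuming Euclidean distance and a majority vote; the two features (a weather severity score and an airport traffic score, both scaled to 0..1) and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: no model is built in advance; all the work happens here.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (weather severity, airport traffic).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
           (0.8, 0.9), (0.9, 0.7), (0.85, 0.8)]
train_y = ["on_time", "on_time", "on_time",
           "delayed", "delayed", "delayed"]

busy_stormy = knn_predict(train_X, train_y, (0.82, 0.85), k=3)
calm_quiet = knn_predict(train_X, train_y, (0.12, 0.22), k=3)
```

Because the vote depends on distances, features should be brought to a comparable scale before KNN is applied; otherwise the feature with the largest numeric range dominates.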
Random Forest
• Random forests or random decision forests are an ensemble learning method for classification,
regression and other tasks that operates by constructing a multitude of decision trees at training
time. For classification tasks, the output of the random forest is the class selected by most trees.
For regression tasks, the mean or average prediction of the individual trees is returned. Random
decision forests correct for decision trees' habit of overfitting to their training set. Random forests
generally outperform decision trees, but their accuracy is lower than gradient boosted trees.
However, data characteristics can affect their performance.
• The first algorithm for random decision forests was created in 1995 by Tin Kam Ho[1] using the
random subspace method, which, in Ho's formulation, is a way to implement the "stochastic
discrimination" approach to classification proposed by Eugene Kleinberg.
• An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered
"Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension
combines Breiman's "bagging" idea and random selection of features, introduced first by Ho[1] and
later independently by Amit and Geman[13] in order to construct a collection of decision trees with
controlled variance.
• Random forests are frequently used as "blackbox" models in businesses, as they generate
reasonable predictions across a wide range of data while requiring little configuration.
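The combination of bagging, random feature selection, and majority voting described above can be sketched as follows. This is a deliberately small, unoptimized illustration and not a production implementation; it assumes numeric features, integer class labels, Gini impurity as the split criterion, and toy data invented for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, n_features, depth=0, max_depth=3):
    """Grow a tree; each split considers only a random subset of features."""
    split = None
    if depth < max_depth and len(set(y)) > 1:
        split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    grow = lambda ids: build_tree([X[i] for i in ids], [y[i] for i in ids],
                                  rng, n_features, depth + 1, max_depth)
    return (f, t, grow(left), grow(right))


def tree_predict(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each tree on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        ids = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in ids], [y[i] for i in ids],
                                 rng, n_features))
    return forest


def forest_predict(forest, row):
    """Classification output: the class selected by most trees."""
    return Counter(tree_predict(t, row) for t in forest).most_common(1)[0][0]


# Hypothetical data: (weather severity, airport traffic) -> 1 = delayed.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25),
     (0.8, 0.9), (0.9, 0.7), (0.85, 0.8), (0.7, 0.75)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
```

For a regression task, the leaves would store mean target values and `forest_predict` would average the tree outputs instead of voting.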
ADVANTAGES OF PROPOSED SYSTEM OVER
EXISTING SYSTEM
The proposed method is built on an ADS-B message-based
aviation big data platform, which is more effective
and faster.
ADS-B is an integrated communication and surveillance
system for air traffic management (ATM),
in which flights periodically broadcast location and other
information on the same frequency band.
PROPOSED SYSTEM ARCHITECTURE
SOFTWARE & HARDWARE REQUIREMENT
H/W System Configuration:-
➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Key Board - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA
SOFTWARE REQUIREMENTS:
Operating system : Windows 7 Ultimate.
Coding Language : Python.
Front-End : Python.
Back-End : Django-ORM
Designing : HTML, CSS, JavaScript.
Data Base : MySQL (WAMP Server).
FLOW CHART
Service Provider
OUTPUTS
THANK YOU
ANY QUESTIONS?