
DWDM Project PDF

This document discusses a project analyzing flight disruptions using data mining and warehousing techniques. The project aims to understand the causes of flight delays and cancellations in order to reduce their economic impacts. Multiple linear regression, decision tree regression, and backward elimination models are used to predict disruptions based on factors like weather, aircraft malfunctions, staffing issues. Graphs show relationships between these factors and effects like cancellations or diversions. The results and discussions section displays modeling outputs, comparing actual and predicted disruption data to evaluate the prediction capabilities of linear regression and decision tree regression techniques. Overall, the goal is to better understand the sources of disruptions and identify solutions to make airline operations more efficient and reliable for passengers.

Uploaded by

Ashi Gupta

Innovative Project

Data Warehousing and Data Mining MC411

Project: Flight Disruption Analysis

DELHI TECHNOLOGICAL UNIVERSITY

SHAHBAD DAULATPUR, BAWANA ROAD

DELHI – 110042

Submitted to : Submitted by :

Ms Trasha Gupta Vanshika Bansal(2K17/MC/118)

Pranav Tripathi(2K17/EP/055)
Contents

-Introduction
-Understanding the project
-Problem Statement
-Multiple Linear Regression
-Decision Tree Regression
-Backward Elimination
Introduction

Understanding disruptions:

‘Flight disruption’ is defined as ‘situations where a scheduled flight is
cancelled, or delayed for two hours or more, within 48 hours of the original
scheduled departure time’. Disruptions in aviation cost airlines and their
customers up to $60 billion per year, or about 8% of worldwide airline revenue,
according to Amadeus (2016). These disruptions are still far from being solved
because the underlying incidents (airport constraints, weather, etc.) are only
partly predictable. Therefore, decisions need to be taken to close the gap
between the uncertainty of these incidents and the scheduled plan.
Because flight schedules leave little margin for error, a single disruption can
propagate as multiple delays further down the schedule. Understanding
these events (predictability, duration and impact) is the key step in the
disruption management process.

Inherent uncertainty in airline operations makes delays and disruptions
inevitable. Because the airline system operates as a closely interconnected
network, it is subject to ‘network effects’, that is, a disruption in one place can
quickly propagate to multiple other parts of the network. Therefore, managing
these delays as they arise is crucial. Disruption management is the process by
which, on the day of operation, when a disruption occurs, airlines try to bring
operations back on schedule as quickly as possible, while incurring minimal
costs. Measures such as flight cancelations, flight holds, aircraft swaps, crew
swaps, reserve crew and passenger reaccommodation are used as part of the
disruption management process.
In this work, we integrate disruption management and flight planning. Flight
planning is the process of determining, at the pre-departure stage of each flight,
its three-dimensional trajectory, involving its path, altitude(s), speed and fuel
burn as the aircraft flies from its origin to its destination. Our goal is to reduce
flight delays and disruptions to passengers using disruption management
combined with flight planning, to achieve the appropriate trade-off of passenger
service with fuel burn and additional operating costs incurred during recovery.
To our knowledge, this is the first work that integrates these aspects of airline
operations. The ability to change a flight’s speed directly impacts its block (or
flying) time, and thus, its arrival time; which in turn can impact network
connectivity of the flight’s aircraft, crew and passengers to downstream flights.
Therefore, changes to block times allow us to trade off the costs of changing a
flight’s arrival time (including, for example, network connectivity costs
capturing the costs associated with resulting delays and disruptions to the
flight’s aircraft, crews and passengers) against the change in fuel burn costs
(associated with the flight’s block time adjustment). To illustrate our integrated
disruption management and flight planning approach, consider, for example, a
flight experiencing a departure delay at its origin. The choices are to:
(1) operate the flight at increased speeds (and increased fuel burn), and employ
techniques such as aircraft swaps and flight cancelations as necessary, to absorb
delays at the flight destination and to decrease costs associated with passenger
delays and misconnections; or (2) reduce the flight’s speed using flight planning
to decrease fuel burn and emissions if connectivity is unaffected. Our
overarching goal is to decrease costs incurred during airline operations by
identifying the operational trade-offs between (i) aircraft and passenger delay
costs; and (ii) fuel burn costs.
Understanding The Project

The purpose of this project is to analyse the most evident causes of flight
disruptions and find feasible solutions, if any, for the prevailing problems.

Adverse weather conditions, strikes, political reasons and other different causes
can impact the long-term success of an airline. When multiple delays are caused
by a single event, passenger itineraries and airline schedules are seriously
damaged.

Often, the disruption problem spreads virally, because the flight that was
cancelled in one city was supposed to provide the aircraft for a departure from
another city.

According to a report by the US Federal Aviation Administration, domestic
flight delays cost passengers, airlines and other parts of the economy
32.9 billion dollars per year.
Problem Statement

How does the aviation industry operate? How does such a vast and vibrant
industry manage its resources amidst all the governing conditions that are
evident in any large-scale industry? How do flights get delayed, and what is
the industry doing to prevent this?

Through this project we try to get a glimpse of the aviation industry, to
understand how flights get disrupted, and to see whether this problem can be
resolved.

The dataset is collected from the website of the Bureau of Transportation
Statistics, United States Department of Transportation:

https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp?pn=1
Prediction Models used to calculate our results:

Multiple Linear Regression


Multiple Linear Regression is the most common form of linear regression
analysis. As a predictive analysis, the multiple linear regression is used to
explain the relationship between one continuous dependent variable and
two or more independent variables. The independent variables can be
continuous or categorical (dummy coded as appropriate).

Multiple linear regression attempts to model the relationship between two or
more explanatory variables and a response variable by fitting a linear equation
to observed data. Every value of the independent variable x is associated with a
value of the dependent variable y. The population regression line for p
explanatory variables $x_1, x_2, \ldots, x_p$ is defined to be

$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p.$$

This line describes how the mean response $\mu_y$ changes with the
explanatory variables. The observed values for y vary about their means $\mu_y$
and are assumed to have the same standard deviation $\sigma$. The fitted values
$b_0, b_1, \ldots, b_p$ estimate the parameters $\beta_0, \beta_1, \ldots, \beta_p$
of the population regression line.

Formally, the model for multiple linear regression, given n observations, is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, n.$$
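The fitted values b0, b1, ..., bp described above can be obtained by ordinary
least squares. The following is a minimal sketch in plain Python, not the code
used in this project: it solves the normal equations (X'X) b = X'y by Gaussian
elimination. The function names and the toy data are assumptions made for
illustration; the numbers are invented and do not come from the BTS dataset.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return the least-squares estimates [b0, b1, ..., bp]."""
    # Prepend a constant 1 to every observation so that b0 is the intercept.
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bp*xp for one observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Invented toy data generated from y = 1 + 2*x1 + 3*x2 with no noise.
toy_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_y = [1, 3, 4, 6, 8]
toy_coeffs = fit_mlr(toy_rows, toy_y)
```

Because the toy data contain no noise, the estimates recover the generating
coefficients up to floating-point rounding. With real data the residuals are
non-zero and the estimates only approximate the population parameters.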


Decision Tree - Regression
A decision tree builds regression or classification models in the form of a tree
structure. It breaks down a dataset into smaller and smaller subsets while at the
same time an associated decision tree is incrementally developed. The final
result is a tree with decision nodes and leaf nodes. A decision node (e.g.,
Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each
representing values for the attribute tested. A leaf node (e.g., Hours Played)
represents a decision on the numerical target. The topmost decision node in a
tree, which corresponds to the best predictor, is called the root node. Decision
trees can handle both categorical and numerical data.
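To make the idea of splitting a dataset into smaller subsets concrete, the
following is a minimal sketch of a regression tree in plain Python. It is an
illustration under stated assumptions, not the model used in this project: it
handles numerical features only, chooses each split by minimising the sum of
squared errors of the two child nodes, and predicts the mean target value in
each leaf. The function names, the dictionary layout of the nodes and the toy
data are all hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y):
    """Return (feature_index, threshold) that most reduces the SSE, or None."""
    best, best_cost = None, sse(y)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [yi for r, yi in zip(rows, y) if r[f] <= t]
            right = [yi for r, yi in zip(rows, y) if r[f] > t]
            if not left or not right:
                continue  # a split must send data to both branches
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (f, t), cost
    return best


def build_tree(rows, y, depth=0, max_depth=2):
    """Recursively build decision nodes until no split helps or depth runs out."""
    split = best_split(rows, y) if depth < max_depth else None
    if split is None:
        return {"leaf": sum(y) / len(y)}  # leaf node: mean of the target
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth),
    }


def predict_tree(node, row):
    """Follow decision nodes from the root down to a leaf."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


# Invented toy data: one numerical feature with two clearly separated groups.
toy_rows = [(1,), (2,), (3,), (10,), (11,), (12,)]
toy_y = [5, 5, 5, 20, 20, 20]
toy_tree = build_tree(toy_rows, toy_y)
```

On this toy data the root node splits the single feature at the threshold 3,
and both children become leaf nodes because no further split reduces the error.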
Backward Elimination

Steps of Backward Elimination

The main steps of the backward elimination process are:
Step-1: Select a significance level for a predictor to stay in
the model (SL = 0.05).
Step-2: Fit the complete model with all possible
predictors/independent variables.
Step-3: Consider the predictor which has the highest P-value.

a. If P-value > SL, go to Step-4.

b. Otherwise finish; the model is ready.

Step-4: Remove that predictor.

Step-5: Rebuild and fit the model with the remaining variables, then
return to Step-3.
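The steps above can be sketched as a short loop. This is a minimal sketch
rather than the project's own code: the function `backward_eliminate` and its
callback parameter `fit_p_values` are hypothetical names, and the callback is
assumed to refit the model on the given predictors and return one P-value per
predictor. The stub `placeholder_p_values` and its numbers are invented purely
so that the example runs; a real refit would produce different P-values each
time a predictor is removed.

```python
def backward_eliminate(features, fit_p_values, significance_level=0.05):
    """Drop the least significant predictor until all P-values are <= SL."""
    remaining = list(features)
    while remaining:
        # Step-2 / Step-5: fit the model on the current set of predictors.
        p_values = fit_p_values(remaining)
        # Step-3: find the predictor with the highest P-value.
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= significance_level:
            break  # Step-3b: every remaining predictor is significant
        # Step-4: remove that predictor, then loop back to refit.
        remaining.remove(worst)
    return remaining


def placeholder_p_values(names):
    """Stand-in for a real model fit; the P-values are made up."""
    table = {"x1": 0.001, "x2": 0.02, "x3": 0.40, "x4": 0.08}
    return {name: table[name] for name in names}


selected = backward_eliminate(["x1", "x2", "x3", "x4"], placeholder_p_values)
```

With the invented P-values, x3 is removed first and x4 second, which leaves x1
and x2. In practice the callback could refit an ordinary least squares model
and return its P-values, for example from the `pvalues` attribute of a fitted
statsmodels OLS result.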

Need for Backward Elimination: An optimal Multiple Linear
Regression model

Unnecessary features increase the complexity of the model. Hence
it is good to keep only the most significant features and keep the
model simple to get a better result.
So, in order to optimize the performance of the MLR model, we use
the Backward Elimination method, which retains the features that
affect the outcome most and removes those that affect it least.
We apply it to our MLR model.
Results and Discussions

The graph shown above displays the relation between the factors
of flight disruption and the effect they cause. Here the effect is
the cancellation of the flight.
This graph shows the relation between the factors causing disruption
and the diversion of the flight.

Now we apply multiple linear regression to analyse the data, and this is
the graph we obtain. The red region is the actual data and the yellow
region is the predicted data. This regression predicts flight
cancellation.
Now we apply multiple linear regression to analyse the data, and this is
the graph we obtain. The red region is the actual data and the blue
region is the predicted data. This regression predicts flight diversion.

To improve our prediction model, we apply the decision tree
regression model. The red region is the actual data and the yellow
region is the predicted data. This regression predicts flight
cancellation.
To improve our prediction model, we apply the decision tree
regression model. The red region is the actual data and the blue
region is the predicted data. This regression predicts flight diversion.
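The graphs compare actual and predicted values visually. One possible way to
express the same comparison as numbers is sketched below, using the mean
absolute error and the coefficient of determination (R squared). The function
names and the sample values are assumptions made for illustration; they are
not results from this project.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot: share of the variance explained by the model."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Invented values, used only to show how the functions are called.
sample_actual = [3, 5, 7, 9]
sample_predicted = [2.5, 5.5, 7, 9.5]
sample_mae = mean_absolute_error(sample_actual, sample_predicted)
sample_r2 = r_squared(sample_actual, sample_predicted)
```

Computing both measures on the same held-out data for the linear regression
and the decision tree models would allow the two to be compared directly.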

Moreover, to analyse which factor is the most significant, the P-value
of each factor is calculated. Little could be inferred about the
relationship for flight cancellation, but for flight diversion the
P-values showed that security delay was not very relevant.
