Region Ex
Learning Journal
Objectives
• This case uses a data set modeled after U.S.
commercial airline flight delay data to explore
the use of descriptive statistics and graphical
analysis in solving a business problem.
• The case illustrates potential data analysis pitfalls
and paradoxes, and demonstrates that even
when using simple methodology, an analyst
must act like a detective to discover what the
data do or do not say about the problem at hand.
After completing this case, you should be able to:
• We see here that while RegionEx does indeed have a higher average
flight delay than MDA, it also has a large standard deviation.
• We should perform a two‐sample t‐test with unequal variances and
significance level α= 0.05 to test the hypothesis that the means are
different.
• The null hypothesis is that the mean arrival delays μ1 and μ2 of
RegionEx and MDA, respectively, are equal.
• The alternative hypothesis is that they are unequal.
• Our test statistic is:
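For a two-sample t-test with unequal variances, the standard (Welch) statistic is t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2), where x̄, s² and n are each airline's sample mean, sample variance and number of flights. The sketch below computes this statistic and the Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the sample values are assumptions for illustration, not the case data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Variance of each sample mean: sample variance divided by sample size.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical delay minutes, for illustration only (not the case data).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The resulting t is compared with the critical value of a t distribution with df degrees of freedom at α = 0.05. That lookup is left to a statistics table or library.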
Rankings alone do not give any indication of the
magnitude of difference.
Distributions matter
• That RegionEx has a higher average delay than MDA but a lower median
delay is indicative of RegionEx having a skewed distribution of flight delays.
• Indeed, this skewness is also indicated by the difference in the two airlines’
90th percentile delay values and RegionEx’s extremely large standard
deviation.
• This suggests that while half of RegionEx flights arrive within 9 minutes of
schedule (compared to MDA’s 13.0 minutes), some RegionEx flights arrived
very late, pulling the average delay upward.
• It isn’t immediately clear which is the better statistic to use: passengers might
be more concerned about the likelihood of delays, which suggests the use
of percentiles, whereas costs incurred by the airlines in terms of fuel and
crew are a function of the average delay.
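A small sketch of how a few very late flights can raise the mean while leaving the median low. Both lists of delay minutes are hypothetical and chosen only to reproduce the pattern described above; they are not the RegionEx or MDA data.

```python
from statistics import mean, median, stdev

# Hypothetical delay minutes (assumption, not the case data).
skewed = [5, 7, 8, 9, 9, 10, 12, 15, 60, 150]        # a few very late flights
compact = [10, 11, 12, 13, 13, 14, 15, 16, 18, 20]   # no extreme delays

for name, delays in (("skewed", skewed), ("compact", compact)):
    # The mean and standard deviation react to the outliers; the median does not.
    print(f"{name:8s} mean={mean(delays):6.1f} "
          f"median={median(delays):5.1f} stdev={stdev(delays):6.1f}")
```

The skewed list has the higher mean and the larger standard deviation but the lower median, which matches the RegionEx pattern.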
How do we compare apples‐to‐apples?
• MDA cancelled three of its flights, so they are excluded
from the computation of the descriptive statistics above.
• Some airlines cancel extremely delayed flights and prefer to
rebook those passengers on a different flight.
• If MDA uses a policy of cancelling extremely delayed flights
while RegionEx does not, then this could explain why
RegionEx had a few extremely large delay values that pulled
its mean delay upward.
• It is unclear which policy is better from a passenger’s
standpoint, but this is at least one plausible explanation for
the difference between the two airlines’ delay statistics.
Inspect the distribution of RegionEx’s arrival delays by constructing a histogram of the number of
arrival delay minutes of RegionEx’s flights. Do the same for MDA’s flights. How do these two
distributions compare? Interpret the meaning of the descriptive statistics from Question 1 in relation
to these histograms. What, if any, additional information do the histograms provide?
Histograms show us the distribution of the flight delays
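A minimal sketch of the binning behind such a histogram, using only the standard library. The helper name `histogram`, the 10-minute bin width and the sample delays are assumptions for illustration.

```python
from collections import Counter


def histogram(delays, bin_width=10):
    """Count delays per bin; each key is the bin's lower edge in minutes."""
    # Floor division also places early arrivals (negative delays) correctly.
    return Counter((d // bin_width) * bin_width for d in delays)


# Hypothetical delay minutes (assumption, not the case data).
sample = [3, 7, 9, 12, 14, 18, 25, 95]
counts = histogram(sample)
for lower in sorted(counts):
    # One '#' per flight in the bin gives a quick text histogram.
    print(f"{lower:>4} min and up | {'#' * counts[lower]}")
```

A long right tail, such as the single flight in the 90-minute bin here, is the feature to look for when comparing the two airlines.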
• We were told only that RegionEx “ranked worse” than MDA in flight delays; we see
here that the actual difference in the percentage of delayed flights is small.
• To test whether this difference is significant, we perform a hypothesis test for two
proportions.
• The null hypothesis is that the proportions p1 and p2 of delayed flights of RegionEx and
MDA, respectively, are equal. The alternative hypothesis is that they are unequal.
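A minimal sketch of a two-sided test for two proportions, assuming the usual pooled z statistic z = (p1 − p2) / sqrt(p(1 − p)(1/n1 + 1/n2)), where p is the pooled proportion of delayed flights. The function name `two_proportion_z` and the counts in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(delayed1, n1, delayed2, n2):
    """Return (z, two-sided p-value) for H0: p1 == p2."""
    p1, p2 = delayed1 / n1, delayed2 / n2
    # Under H0 both samples share one proportion, so pool the counts.
    pooled = (delayed1 + delayed2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts (assumption): 26 of 100 flights delayed vs 25 of 100.
z, p_value = two_proportion_z(26, 100, 25, 100)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

If the p-value exceeds α = 0.05, the null hypothesis of equal proportions is not rejected.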
Definitions matter
• We see that although RegionEx has a higher percentage of delayed flights in the
aggregate, when we look at each route individually, RegionEx does no worse than
MDA on any route.
• Moreover, on routes between DFW and MSY, it experiences a lower fraction of
delayed flights than MDA.
• Why is this reasonable? MDA flies the same number of flights on each of the four
legs, so its total percentage delay is just a straight average of the delays on the
four legs.
• RegionEx, however, flies three times as many flights on the DFW routes as on the
PNS routes, so these receive three times the weight in the aggregated average.
• Moreover, Dallas‐Fort Worth legs experience higher delays (for both airlines) than
the Pensacola routes.
• As a result, the total percentage delay is pulled higher for RegionEx than for MDA.
The same metric applied at different levels of aggregation can yield a different
ranking.
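The weighting effect can be sketched with the per-leg delay fractions and RegionEx flight counts that appear in this deck's adjusted-delay arithmetic. Treating the first two legs as the DFW routes and giving MDA equal weight on every leg are assumptions based on the bullets above.

```python
# Per-leg fraction of delayed flights; first two legs assumed to be DFW routes.
regionex_rates = [0.256, 0.289, 0.200, 0.267]
mda_rates = [0.267, 0.300, 0.200, 0.267]

regionex_flights = [90, 90, 30, 30]   # three times as many DFW flights
mda_weights = [1, 1, 1, 1]            # same number of flights on each leg


def aggregate_rate(rates, weights):
    """Weighted average of per-leg rates."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


regionex_total = aggregate_rate(regionex_rates, regionex_flights)
mda_total = aggregate_rate(mda_rates, mda_weights)
print(f"RegionEx aggregate: {regionex_total:.3f}")
print(f"MDA aggregate:      {mda_total:.3f}")
```

RegionEx is no worse than MDA on any single leg, yet its aggregate comes out higher because the high-delay legs carry more weight.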
Adjusted average delay
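• If RegionEx were operating all flights, the overall fraction of delayed flights would be:
• ((90+28)*0.256+(90+29)*0.289+(30+30)*0.200+(30+30)*0.267)/357 = 0.259.
• If MDA were operating all flights, the overall fraction of delayed flights would be:
• ((90+28)*0.267+(90+29)*0.300+(30+30)*0.200+(30+30)*0.267)/357 = 0.267.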
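The arithmetic above applies each airline's per-leg rate to the combined flight counts of both airlines, so both are weighted by the same route mix. A short sketch that reproduces it; the variable names are illustrative.

```python
# Combined flights per leg (RegionEx + MDA), as in the expressions above.
combined_flights = [90 + 28, 90 + 29, 30 + 30, 30 + 30]

regionex_rates = [0.256, 0.289, 0.200, 0.267]
mda_rates = [0.267, 0.300, 0.200, 0.267]


def adjusted_rate(rates, flights):
    """Overall rate if one airline flew every flight on the common route mix."""
    return sum(r * n for r, n in zip(rates, flights)) / sum(flights)


regionex_adjusted = adjusted_rate(regionex_rates, combined_flights)
mda_adjusted = adjusted_rate(mda_rates, combined_flights)
print(f"RegionEx adjusted: {regionex_adjusted:.3f}")
print(f"MDA adjusted:      {mda_adjusted:.3f}")
```

With the route mix held fixed, RegionEx comes out slightly ahead of MDA, the reverse of the aggregate ranking.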
Consider the RegionEx flights only. Prepare a scatter plot of arrival delay minutes versus number of passengers.
Your scatter plot should consist of 240 data points, one for each flight in the data set where the vertical
coordinate is arrival delay minutes of that flight and the horizontal coordinate is the number of passengers.
What is the correlation coefficient between arrival delay minutes and number of passengers for RegionEx’s
flights? Interpret your results.
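A minimal sketch of the Pearson correlation coefficient asked for here, written without external libraries. The helper name `pearson_r` and the passenger and delay values are hypothetical; the real calculation would use all 240 RegionEx flights.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values (assumption, not the case data).
passengers = [40, 55, 60, 70, 85]
delay_minutes = [5, 9, 8, 14, 20]
r = pearson_r(passengers, delay_minutes)
print(f"r = {r:.3f}")
```

A value of r near +1 or −1 indicates a strong linear association. Correlation alone does not show that passenger load causes delays.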
• Although RegionEx shows a greater average flight delay, its scheduled
flight times on the DFW routes are actually shorter than those of MDA.
• MDA has built 10 extra minutes of padding into the scheduled flight time of
DFW routes and 5 extra minutes on PNS routes as compared to RegionEx.
• If we adjust RegionEx’s delays for this padding, the average delay drops to
6.9 minutes per flight, and its on‐time performance increases to 92%!
• A flight schedule can be padded to make delay statistics appear more
favorable to an airline even if passengers actually have to spend longer at
the airport.
• However, because passengers make plans based on the scheduled arrival
time of the flight, it is useful for scheduled flight times to match reality.
RegionEx might benefit from increasing its scheduled flight times.
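A sketch of the padding adjustment described above: subtract MDA's extra scheduled minutes (10 on DFW routes, 5 on PNS routes) from each RegionEx delay, then recompute the average delay and the on-time share. The flight records are hypothetical, and counting a flight as on time when it arrives less than 15 minutes late is an assumption, since the deck does not state its threshold.

```python
from statistics import mean

PADDING = {"DFW": 10, "PNS": 5}   # MDA's extra scheduled minutes, from the text
ON_TIME_LIMIT = 15                # assumed threshold in minutes

# Hypothetical RegionEx flights as (route group, arrival delay in minutes).
flights = [("DFW", 20), ("DFW", 12), ("PNS", 18), ("PNS", 3)]


def summarize(delays):
    """Return (average delay, share of flights under the on-time limit)."""
    on_time = sum(1 for d in delays if d < ON_TIME_LIMIT) / len(delays)
    return mean(delays), on_time


raw = [delay for _, delay in flights]
# Delay measured against a schedule padded the way MDA's is.
adjusted = [delay - PADDING[route] for route, delay in flights]

raw_mean, raw_on_time = summarize(raw)
adj_mean, adj_on_time = summarize(adjusted)
print(f"raw:      mean={raw_mean:.2f} on-time={raw_on_time:.0%}")
print(f"adjusted: mean={adj_mean:.2f} on-time={adj_on_time:.0%}")
```

A negative adjusted delay means the flight would have arrived ahead of the padded schedule.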
• What other factors should Marion Volero take
into consideration regarding the data
analysis?
• Sampling: Was the sample large enough or
representative of all flights on those legs?
• Is September a representative month, or did
something peculiar happen during September
that might skew the data in one direction or
another for one or both airlines?
Flight Delays versus Passenger Delays