
Data Presentation Final


Ingoude Company

DATASET: NEWYORKCITY_FLIGHTS

MID-TERM: DATA ANALYTICS FOR BUSINESS
LE NGUYEN QUYNH TRANG - 23006478


TABLE OF CONTENTS

1 Introduction
2 Data Cleaning
3 Problem Statement: Airport & Airline Flight Information
4 Problem Statement: Delay Information
5 Solution & Conclusion



Data name: NewYorkCity_flights


Data from the 3 airports serving NYC:
John F. Kennedy International Airport (JFK), located in southern Queens
Newark Liberty International Airport (EWR), located in New Jersey
LaGuardia Airport (LGA), located in northern Queens

Key columns
Date: The date of the flight in 2013.
Dep Delay: Departure delay in minutes
Arr Delay: Arrival delay in minutes
Carrier: Airline carrier code (16 carriers)
Origin: Origin airport code (3 origins)
Air Time: Flight airtime in minutes
Distance: Distance traveled in miles

DATA CLEANING BY GOOGLE COLAB


1. Read Data: Use pd.read_csv to read the CSV file with a
semicolon (;) delimiter.
2. Handle Missing Values: Check for missing values and fill them
with the mean value of the corresponding column.
3. Convert Data Types: Convert the 'Date' column to datetime
format, and the 'Carrier' and 'Origin' columns to text format.
4. Handle Blank Values: Replace blank values with NaN and then
fill them with the mean value for numerical columns.
5. Handle Duplicates: Remove duplicate rows from the dataset.
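
The five steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook used for the slides: the column names are assumed to match the key columns listed earlier, the helper name clean_flights is hypothetical, and the inline CSV is an invented stand-in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Invented stand-in for the real file (assumption: same column names, ';' delimiter).
raw_csv = """Date;Dep Delay;Arr Delay;Carrier;Origin;Air Time;Distance
2013-01-01;2;11;UA;EWR;227;1400
2013-01-01;2;11;UA;EWR;227;1400
2013-01-02;;20;B6;JFK; ;1089
2013-01-03;-4;-8;DL;LGA;116;762
"""

NUMERIC_COLS = ["Dep Delay", "Arr Delay", "Air Time", "Distance"]


def clean_flights(source):
    """Hypothetical helper that applies the five cleaning steps in order."""
    # Step 1: read the semicolon-delimited CSV.
    df = pd.read_csv(source, sep=";")

    # Step 4 (first half): turn blank or whitespace-only cells into NaN.
    df = df.replace(r"^\s*$", np.nan, regex=True)

    # Step 3: convert data types.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Carrier"] = df["Carrier"].astype("string")
    df["Origin"] = df["Origin"].astype("string")
    df[NUMERIC_COLS] = df[NUMERIC_COLS].apply(pd.to_numeric, errors="coerce")

    # Steps 2 and 4 (second half): fill missing numeric values with the column mean.
    df[NUMERIC_COLS] = df[NUMERIC_COLS].fillna(df[NUMERIC_COLS].mean())

    # Step 5: remove duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


flights = clean_flights(io.StringIO(raw_csv))
print(flights)
```

Note that in this order the column means are computed before duplicates are dropped, so duplicated rows weigh on the fill value; dropping duplicates first would be an equally defensible choice.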

CREATE NEW DATA CATEGORIES



Problem Statement:
Airport & Airline Flight Information

Question: What is the market share distribution among different carriers in terms of scheduled flights, and how does this impact the competitive landscape of the airline industry?

#Graphical representation of carriers scheduled flights in numbers and %

This distribution can help identify which carriers dominate the flight operations and which have a minimal
presence.
UA and B6 are the most frequent carriers.

This visualization helps in understanding the market share of each carrier in terms of flight operations.
The pie chart confirms that United Airlines (UA), JetBlue (B6), Delta Air Lines (DL) are the major players, each
contributing a significant share of the total flights.
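
As a rough sketch of how the counts and percentages behind such a chart could be computed, the snippet below tallies flights per carrier and converts them to shares. The 'Carrier' column name follows the key columns listed earlier; the eight sample rows are invented and do not reflect the real distribution.

```python
import pandas as pd

# Invented sample rows; the real dataset has 16 carriers.
flights = pd.DataFrame(
    {"Carrier": ["UA", "UA", "UA", "B6", "B6", "DL", "DL", "AA"]}
)

# Number of scheduled flights per carrier, largest first.
counts = flights["Carrier"].value_counts()

# Share of total flights, in percent.
share_pct = (counts / counts.sum() * 100).round(1)

market_share = pd.DataFrame({"flights": counts, "share_pct": share_pct})
print(market_share)
```

The same table can feed both a bar chart (the flights column) and a pie chart (the share_pct column).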

#Number of scheduled flight from different origin

EWR is the busiest airport, followed by JFK and then LGA.
The distribution of flights is relatively balanced among the three airports, with no single airport dominating the total flight count.

#Overall pattern of air time from NYC airports

Comparison Between Origin Airports

Median: JFK has the highest median air time (178 minutes), followed by EWR (153 minutes) and LGA (117 minutes).
Interquartile Range (IQR): JFK has the widest IQR, indicating greater variability in air time from JFK compared to LGA and EWR.
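
The flight counts, medians, and IQRs compared above could be produced with a grouped summary along these lines. This is only a sketch: the 'Origin' and 'Air Time' column names are assumed from the key columns, and the twelve sample values are invented, so the numbers differ from those on the slide.

```python
import pandas as pd

# Invented sample values, four flights per airport.
flights = pd.DataFrame({
    "Origin": ["JFK"] * 4 + ["EWR"] * 4 + ["LGA"] * 4,
    "Air Time": [60, 150, 200, 340, 50, 140, 160, 250, 40, 110, 120, 150],
})

grouped = flights.groupby("Origin")["Air Time"]

summary = pd.DataFrame({
    "flights": grouped.size(),       # flight volume per airport
    "median": grouped.median(),      # centre line of the box plot
    "q1": grouped.quantile(0.25),    # lower edge of the box
    "q3": grouped.quantile(0.75),    # upper edge of the box
})
# Interquartile range: width of the middle 50% of air times.
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```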


Question: How do the three major NYC airports (EWR, JFK, and LGA) compare in terms of flight volume and air time patterns,
and what implications does this have for airport operations and resource allocation?

Problem Statement:
Delay Information

Delayed flights / total flights: 25%


Detailed Analysis and Insights


Descriptive Statistics
Departure Delay (Dep Delay)

The average departure delay is positive (12.63 minutes), indicating that flights on average depart later than scheduled.
The median is negative (-2 minutes), suggesting that at least half of the flights depart earlier than scheduled.
The high skewness and kurtosis values indicate a significant number of extreme delays, which pull the mean above the median.

Arrival Delay (Arr Delay)

The average arrival delay is slightly positive (7.09 minutes), indicating that flights on average arrive later than scheduled.
The median is negative (-5 minutes), suggesting that at least half of the flights arrive earlier than scheduled.
Similar to departure delay, the high skewness and kurtosis values indicate a significant number of extreme delays.

Air Time

The average air time is 150.72 minutes, with a median of 129 minutes.
The distribution of air time is slightly right-skewed, indicating that there are some flights with significantly longer air times.
The range of air time is substantial (665 minutes), showing a wide variability in flight durations.
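
To illustrate how a few extreme delays can produce a positive mean alongside a negative median, here is a small sketch using pandas' built-in statistics. The eight delay values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Invented departure delays in minutes: mostly slightly early, one extreme delay.
dep_delay = pd.Series([-5, -3, -2, -2, -1, 0, 4, 180], name="Dep Delay")

stats = {
    "mean": dep_delay.mean(),          # pulled upward by the 180-minute outlier
    "median": dep_delay.median(),      # unaffected by the outlier's size
    "skewness": dep_delay.skew(),      # positive for a long right tail
    "kurtosis": dep_delay.kurt(),      # excess kurtosis; large when tails are heavy
    "range": dep_delay.max() - dep_delay.min(),
}

for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Here the mean is about 21 minutes while the median is -1.5 minutes, the same qualitative pattern as in the delay statistics above.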

# Monthly Average Departure Delay

Seasonal Trends:
Summer Months: June and July show significantly higher average departure delays, which could be due to increased travel during the summer vacation period.
Winter Month: December also shows a higher average departure delay, possibly due to holiday travel and weather-related disruptions.
Fall Months: September, October, and November have the lowest average departure delays, indicating more stable flight schedules during these months.

# Average Arrival Delay by Carrier

Carriers like AS (Alaska Airlines) and HA (Hawaiian Airlines) tend to arrive early on average.
Carriers such as F9 (Frontier Airlines), FL (AirTran Airways), and YV (Mesa Airlines) exhibit significant delays, indicating major punctuality issues.
Most other carriers show moderate delays, with AA (American Airlines) demonstrating excellent punctuality with minimal delays.
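
The two charts summarized above rest on simple group averages. A minimal sketch of that aggregation, assuming the 'Date', 'Carrier', 'Dep Delay', and 'Arr Delay' columns from the key column list and using invented sample rows:

```python
import pandas as pd

# Invented sample rows for illustration only.
flights = pd.DataFrame({
    "Date": pd.to_datetime([
        "2013-06-10", "2013-06-20", "2013-07-05",
        "2013-10-01", "2013-10-15", "2013-12-24",
    ]),
    "Carrier": ["UA", "B6", "UA", "AS", "B6", "UA"],
    "Dep Delay": [30, 20, 40, -2, 4, 25],
    "Arr Delay": [25, 15, 35, -10, 1, 20],
})

# Average departure delay per calendar month (1 = January, 12 = December).
month = flights["Date"].dt.month.rename("Month")
monthly_dep = flights.groupby(month)["Dep Delay"].mean()

# Average arrival delay per carrier, most punctual first.
carrier_arr = flights.groupby("Carrier")["Arr Delay"].mean().sort_values()

print(monthly_dep)
print(carrier_arr)
```

A negative value in carrier_arr means the carrier arrives early on average.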

Question: How do departure and arrival delays vary across months and carriers, and what insights can be drawn to improve
operational efficiency and customer satisfaction in the airline industry?

#Top 5 fastest flights from NYC
#Heatmap of flight speeds by carriers monthly

Summer (June-August) is considered the peak season for air travel in New York, and winter (November-February) the lean season. Airlines operate the highest number of flights and carry the maximum passenger load during the summer. The data supports this statement: most airlines have their maximum departures between June and August and their minimum departures between November and February.
The heatmap of monthly flight speeds by carrier shows that during the peak summer months (particularly July and August), most airlines fly faster than their normal speed, likely to cover the maximum number of departures and manage the increased demand. This is evident from the darker shades in the heatmap for these months across multiple carriers.
During the lean winter months (November-February), flight speeds are generally lower, as indicated by the lighter shades in the heatmap for most carriers during these months.
This pattern suggests that airlines adjust their operations seasonally, increasing speeds and potentially the number of flights during peak travel times to accommodate higher passenger volumes, while reducing speeds and possibly flight frequencies during slower periods to optimize costs and efficiency.
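
The slides do not say how flight speed was derived. One plausible reading, sketched below, is distance divided by air time, averaged per carrier and month to give the table behind a heatmap. The formula, the 'Month' and 'Speed (mph)' column names, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

# Invented sample rows; Distance in miles and Air Time in minutes, as in the key columns.
flights = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-10", "2013-01-20", "2013-07-05", "2013-07-15"]),
    "Carrier": ["UA", "B6", "UA", "B6"],
    "Air Time": [120, 150, 100, 120],
    "Distance": [800, 1000, 800, 1000],
})

# Assumed definition: average speed in miles per hour.
flights["Speed (mph)"] = flights["Distance"] / (flights["Air Time"] / 60)
flights["Month"] = flights["Date"].dt.month

# Rows are carriers, columns are months, cells are mean speed.
speed_table = flights.pivot_table(
    index="Carrier", columns="Month", values="Speed (mph)", aggfunc="mean"
)
print(speed_table)
```

A table of this shape can be passed directly to a heatmap function in a plotting library.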

SOLUTION

Enhanced Data Collection: Include additional data points such as taxi in and out times, flight diversions,
and fuel consumption to better understand and manage delays.
Dynamic Scheduling: Implement dynamic scheduling systems that adjust flight schedules based on real-
time data to minimize delays.
Speed Adjustments: Allow pilots to increase flight speeds for delayed flights to catch up on lost time,
balancing fuel consumption and punctuality.
Resource Allocation: Optimize resource allocation at airports, such as gate assignments and ground crew
availability, to reduce turnaround times.
Predictive Analytics: Use predictive analytics to anticipate delays and proactively manage flight
operations, improving overall efficiency.

CONCLUSION

The dataset does not give reasons for delays and is missing important data such as taxi-in and taxi-out times, flight
diversions, chocks-on and chocks-off timing, and fuel consumption. As a result, it does not provide a clear
understanding of delay issues, which would help identify the delays that can be controlled or reduced.

For example: if airlines permit pilots to fly at higher speed, at the cost of higher fuel consumption, on planes that
departed late, the spread of delay along the flight network can be minimized. This would decrease the
delay itself and significantly reduce the number of delayed aircraft.

A solution applicable to one type of delay may affect the others, resulting in a ripple effect that allows more
efficient operations, benefiting passengers, airports, and carriers.


Thank You