DM Project - Step 4

The document analyzes flight delay data through data visualization and statistical analysis. Various visualizations are created to understand the distribution of departure delays and how they vary by factors like carrier, departure time, month, day of week, precipitation, and departing airport.

Uploaded by

BHAVIKA MALHOTRA

Data Analysis & Visualization

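import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the flight data and drop exact duplicate rows
df = pd.read_csv("full_data_flightdelay.csv")
df.drop_duplicates(inplace=True)

# Encode the categorical columns as integers.
# Note: the single encoder is refit on every column, so afterwards it only
# retains the classes of the last column fitted (PREVIOUS_AIRPORT).
label_encoder = LabelEncoder()
df['DEP_TIME_BLK'] = label_encoder.fit_transform(df['DEP_TIME_BLK'])
df['CARRIER_NAME'] = label_encoder.fit_transform(df['CARRIER_NAME'])
df['DEPARTING_AIRPORT'] = label_encoder.fit_transform(df['DEPARTING_AIRPORT'])
df['PREVIOUS_AIRPORT'] = label_encoder.fit_transform(df['PREVIOUS_AIRPORT'])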
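
One caveat with the encoding above: a LabelEncoder keeps only the classes from its most recent fit, so a single shared instance can later decode only the last column it was fitted on (here PREVIOUS_AIRPORT). A minimal sketch of one alternative keeps a separate encoder per column in a dictionary. The `encoders` name and the toy values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the flight data (values are made up for illustration)
toy = pd.DataFrame({
    "CARRIER_NAME": ["Carrier B", "Carrier C", "Carrier B", "Carrier A"],
    "DEP_TIME_BLK": ["midday", "early", "midday", "early"],
})

# Fit a separate encoder per column and keep it for later decoding
encoders = {}
for col in ["CARRIER_NAME", "DEP_TIME_BLK"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Each column can now be decoded with the encoder that was fitted on it
carrier_names = encoders["CARRIER_NAME"].inverse_transform(toy["CARRIER_NAME"])
print(list(carrier_names))
```

With this layout, `encoders["CARRIER_NAME"]` still decodes carrier codes correctly after the other columns have been encoded.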

         MONTH  DAY_OF_WEEK  DEP_TIME_BLK  DISTANCE_GROUP  SEGMENT_NUMBER  \
0            1            7             3               2               1
1            1            7             2               7               1
2            1            7             1               7               1
3            1            7             1               9               1
4            1            7             0               7               1
...        ...          ...           ...             ...             ...
1048570      3            7            15              10               1
1048571      3            7             3               3               1
1048572      3            7            13               7               1
1048573      3            7             3              11               1
1048574      3            7            14               5               1

         CONCURRENT_FLIGHTS  NUMBER_OF_SEATS  CARRIER_NAME  \
0                        25              143            14
1                        29              191             6
2                        27              199             6
3                        27              180             6
4                        10              182            15
...                     ...              ...           ...
1048570                  25              154            16
1048571                  33              276            16
1048572                  26              169            16
1048573                  33              235            16
1048574                  25              173            16

         AIRPORT_FLIGHTS_MONTH  AIRLINE_FLIGHTS_MONTH  ...  DEPARTING_AIRPORT  \
0                        13056                 107363  ...                 41
1                        13056                  73508  ...                 41
2                        13056                  73508  ...                 41
3                        13056                  73508  ...                 41
4                        13056                  15023  ...                 41
...                        ...                    ...  ...                ...
1048570                  11562                  53007  ...                 48
1048571                  11562                  53007  ...                 48
1048572                  11562                  53007  ...                 48
1048573                  11562                  53007  ...                 48
1048574                  11562                  53007  ...                 48

         LATITUDE  LONGITUDE  PREVIOUS_AIRPORT  PRCP  SNOW  SNWD  TMAX   AWND  \
0          36.080   -115.152               208  0.00   0.0   0.0  65.0   2.91
1          36.080   -115.152               208  0.00   0.0   0.0  65.0   2.91
2          36.080   -115.152               208  0.00   0.0   0.0  65.0   2.91
3          36.080   -115.152               208  0.00   0.0   0.0  65.0   2.91
4          36.080   -115.152               208  0.00   0.0   0.0  65.0   2.91
...           ...        ...               ...   ...   ...   ...   ...    ...
1048570    40.696    -74.172               208  0.03   0.0   0.0  65.0  14.09
1048571    40.696    -74.172               208  0.03   0.0   0.0  65.0  14.09
1048572    40.696    -74.172               208  0.03   0.0   0.0  65.0  14.09
1048573    40.696    -74.172               208  0.03   0.0   0.0  65.0  14.09
1048574    40.696    -74.172               208  0.03   0.0   0.0  65.0  14.09

         DEP_DEL15
0                0
1                0
2                0
3                0
4                0
...            ...
1048570          1
1048571          1
1048572          0
1048573          0
1048574          0

[1044213 rows x 26 columns]

1. What is the distribution of departure delays (DEP_DEL15)?
import matplotlib.pyplot as plt

plt.hist(df['DEP_DEL15'], bins=20, color='skyblue', edgecolor='black')

plt.xlabel('Departure Delay (DEP_DEL15)')
plt.ylabel('Frequency')
plt.title('Distribution of Departure Delays')
plt.show()
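
In the sample output above, DEP_DEL15 takes only the values 0 and 1, which suggests it is a delay indicator rather than a delay duration (an assumption; the notebook does not define the column). If so, the class proportions carry the same information as the histogram in a more compact form. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up 0/1 delay indicator standing in for df['DEP_DEL15']
dep_del15 = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 0, 0], name="DEP_DEL15")

# Share of on-time (0) and delayed (1) flights
delay_share = dep_del15.value_counts(normalize=True).sort_index()
print(delay_share)
```

On the real data the same call would be `df['DEP_DEL15'].value_counts(normalize=True)`.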
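
2. How does the average departure delay vary by carrier (CARRIER_NAME)?
import seaborn as sns

# Recover the carrier names from the raw file: the shared label_encoder was
# last fitted on PREVIOUS_AIRPORT, so it cannot correctly decode CARRIER_NAME.
# df keeps its original row labels after drop_duplicates, so .loc aligns them.
raw_carriers = pd.read_csv("full_data_flightdelay.csv", usecols=['CARRIER_NAME'])
decoded_carrier_names = raw_carriers.loc[df.index, 'CARRIER_NAME']

plt.figure(figsize=(10, 6))
sns.barplot(x=decoded_carrier_names, y='DEP_DEL15', data=df)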
plt.xticks(rotation=45)
plt.xlabel('Carrier Name')
plt.ylabel('Average Departure Delay (DEP_DEL15)')
plt.title('Average Departure Delay by Carrier')
plt.show()

3. Is there a relationship between departure delay and the number of flight attendants per passenger (FLT_ATTENDANTS_PER_PASS)?
sns.scatterplot(x='FLT_ATTENDANTS_PER_PASS', y='DEP_DEL15', data=df)
plt.xlabel('Flight Attendants per Passenger')
plt.ylabel('Departure Delay (DEP_DEL15)')
plt.title('Departure Delay vs Flight Attendants per Passenger')
plt.show()
The scatter plot shows no apparent relationship between the number of flight attendants per passenger and departure delay.
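
A visual impression like this can be backed by a number. With a 0/1 outcome, the Pearson correlation coincides with the point-biserial correlation, so a value near zero is consistent with no linear relationship. A minimal sketch on made-up values (the figures below are not taken from the dataset):

```python
import pandas as pd

# Made-up values standing in for the two columns of df
toy = pd.DataFrame({
    "FLT_ATTENDANTS_PER_PASS": [0.01, 0.02, 0.03, 0.04],
    "DEP_DEL15": [0, 1, 1, 0],
})

# Pearson correlation between the ratio and the 0/1 delay indicator
r = toy["FLT_ATTENDANTS_PER_PASS"].corr(toy["DEP_DEL15"])
print(round(r, 3))
```

On the real data the call would be `df['FLT_ATTENDANTS_PER_PASS'].corr(df['DEP_DEL15'])`.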

4. What is the distribution of departure delays (DEP_DEL15) for different departure time blocks (DEP_TIME_BLK)?
plt.figure(figsize=(12, 6))
# Plot against the integer-encoded time blocks. The shared label_encoder was
# last fitted on PREVIOUS_AIRPORT, so it cannot correctly decode DEP_TIME_BLK.
sns.boxplot(x='DEP_TIME_BLK', y='DEP_DEL15', data=df, palette='viridis')
plt.xticks(rotation=45)
plt.xlabel('Departure Time Block')
plt.ylabel('Departure Delay (DEP_DEL15)')
plt.title('Distribution of Departure Delays by Time Block')
plt.show()
As the graph suggests, delays stand out only for flights in time blocks 13, 14 and 16 (encoded labels).
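
Since DEP_DEL15 appears to be a 0/1 indicator (an assumption based on the sample output), a box plot can show little beyond where the quartiles fall, and the share of delayed flights per time block may be easier to compare. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up rows standing in for df[['DEP_TIME_BLK', 'DEP_DEL15']]
toy = pd.DataFrame({
    "DEP_TIME_BLK": [0, 0, 0, 0, 13, 13, 13, 13],
    "DEP_DEL15":    [0, 0, 0, 1, 0, 1, 1, 0],
})

# Mean of a 0/1 column = share of delayed flights in each block
delay_rate = (
    toy.groupby("DEP_TIME_BLK")["DEP_DEL15"]
    .mean()
    .sort_values(ascending=False)
)
print(delay_rate)
```

Sorting in descending order puts the blocks with the highest delay share first.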

5. How does the average departure delay vary by month (MONTH)?
import calendar

# Map month numbers to month names

df['MONTH_NAME'] = df['MONTH'].apply(lambda x: calendar.month_name[x])

plt.figure(figsize=(10, 6))
# ci=None disables the confidence band (seaborn >= 0.12 prefers errorbar=None)
sns.lineplot(x='MONTH_NAME', y='DEP_DEL15', data=df, estimator='mean', ci=None)
plt.xlabel('Month')
plt.ylabel('Average Departure Delay (DEP_DEL15)')
plt.title('Average Departure Delay by Month')
plt.xticks(rotation=45)
plt.show()

# Drop the newly added column after plotting, if not needed anymore
df.drop('MONTH_NAME', axis=1, inplace=True)
As the plot shows, the average delay is highest in February, followed by January, and lowest in March.

6. How does departure delay vary by the day of the week (DAY_OF_WEEK)?
sns.barplot(x='DAY_OF_WEEK', y='DEP_DEL15', data=df)
plt.xlabel('Day of Week')
plt.ylabel('Average Departure Delay (DEP_DEL15)')
plt.title('Average Departure Delay by Day of Week')
plt.show()

7. How does departure delay vary with precipitation (PRCP)?
sns.scatterplot(x='PRCP', y='DEP_DEL15', data=df)
plt.xlabel('Precipitation (PRCP)')
plt.ylabel('Departure Delay (DEP_DEL15)')
plt.title('Departure Delay vs Precipitation')
plt.show()
As the plot suggests, precipitation has no major influence on departure delay; its effect is almost negligible.
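
A scatter plot of a 0/1 outcome against a continuous variable can hide differences in delay rates, so one way to double-check this reading is to bin precipitation and compare the share of delayed flights per bin. A minimal sketch on made-up values; the bin edges and labels are arbitrary assumptions chosen for illustration:

```python
import pandas as pd

# Made-up rows standing in for df[['PRCP', 'DEP_DEL15']]
toy = pd.DataFrame({
    "PRCP":      [0.0, 0.0, 0.0, 0.0, 0.05, 0.2, 0.6, 1.5],
    "DEP_DEL15": [0,   0,   1,   0,   0,    1,   0,   1],
})

# Hypothetical bins: no rain, up to 0.5, above 0.5
bins = [-0.001, 0.0, 0.5, float("inf")]
labels = ["none", "light", "heavy"]
toy["PRCP_BIN"] = pd.cut(toy["PRCP"], bins=bins, labels=labels)

# Delay share and number of flights in each precipitation bin
summary = toy.groupby("PRCP_BIN", observed=False)["DEP_DEL15"].agg(["mean", "count"])
print(summary)
```

Reporting the count next to the mean shows how many flights each rate is based on.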

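8. Is there a difference in departure delay between flights departing from different airports (DEPARTING_AIRPORT)?
# Recover the airport names from the raw file: the shared label_encoder was
# last fitted on PREVIOUS_AIRPORT, so it cannot correctly decode DEPARTING_AIRPORT.
# df keeps its original row labels after drop_duplicates, so .loc aligns them.
raw_airports = pd.read_csv("full_data_flightdelay.csv", usecols=['DEPARTING_AIRPORT'])
decoded_airport_names = raw_airports.loc[df.index, 'DEPARTING_AIRPORT']

plt.figure(figsize=(50, 25))
sns.boxplot(x=decoded_airport_names, y='DEP_DEL15', data=df)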
plt.xticks(rotation=45, ha='right')
plt.xlabel('Departing Airport')
plt.ylabel('Departure Delay (DEP_DEL15)')
plt.title('Departure Delay vs Departing Airport')
plt.show()
As the plot suggests, only three airports stand out with departure delays:

1. Alexander Hamilton Airport
2. Albuquerque International Sunport
3. Columbus Metropolitan Airport
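
A ranking like the one above can be cross-checked by computing the delay share per airport together with the number of flights behind it, so that airports with very few flights do not dominate. A minimal sketch on made-up values; the `min_flights` threshold is a hypothetical parameter, not something the notebook defines:

```python
import pandas as pd

# Made-up rows standing in for the decoded airport names and DEP_DEL15
toy = pd.DataFrame({
    "DEPARTING_AIRPORT": ["A", "A", "A", "A", "B", "B", "C"],
    "DEP_DEL15":         [0,   1,   0,   0,   1,   1,   1],
})

# Delay share and flight count per airport
stats = toy.groupby("DEPARTING_AIRPORT")["DEP_DEL15"].agg(
    delay_rate="mean", flights="count"
)

# Keep airports with enough flights, then rank by delay share
min_flights = 2  # hypothetical threshold
ranked = stats[stats["flights"] >= min_flights].sort_values(
    "delay_rate", ascending=False
)
print(ranked)
```

Here airport C is dropped because a single flight is too little evidence for a delay rate.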
