Flight Delayed
NetworkX example
import networkx as nx
import matplotlib.pyplot as plt
Initialize the graph. Graph() creates an undirected graph; DiGraph() creates a directed graph.
G = nx.Graph()
#G = nx.DiGraph()
nodes = ["A","B","C"]
G.add_nodes_from(nodes)
G.add_edge("A","B")
nx.draw(G)
plt.show()
nx.draw(G, with_labels=True)
plt.show()
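The difference between Graph() and DiGraph() mentioned above is easiest to see by adding the same edge to both. The following is a minimal sketch; the variable names undirected and directed are illustrative and do not appear in the original notebook.

```python
import networkx as nx

# Add the same single edge to an undirected and a directed graph.
undirected = nx.Graph()
undirected.add_edge("A", "B")

directed = nx.DiGraph()
directed.add_edge("A", "B")

# An undirected edge exists in both directions; a directed edge only from A to B.
print(undirected.has_edge("B", "A"))  # True
print(directed.has_edge("B", "A"))    # False
```

For the airline data this matters because a flight from JFK to LAX is not the same route as LAX to JFK, which is presumably why the later cells use DiGraph().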
https://colab.research.google.com/drive/1wAWBV7cxvJemrPy8W5EqgY2RM1avOmJp#scrollTo=yr80CTNdLO0V&printMode=true 1/9
11/26/23, 11:09 PM Lab Activity - Network Visualization.ipynb - Colaboratory
Airlines Network
Mounted at /content/drive
import pandas as pd
data = pd.read_csv('/airlines_network_optimization.csv')
Data Overview
data.shape
(99, 16)
data.head(5)
Create a new dataframe by selecting only origin, dest and distance columns
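The cell that builds data_distance did not survive in this printout. A plausible sketch, assuming it follows the same data.filter pattern the notebook uses later for air_time, is shown below. The small data frame here is a toy stand-in for the real 99-row dataset, and its dep_delay values are made up.

```python
import pandas as pd

# Toy stand-in for `data`; the real notebook loads 99 rows and 16 columns.
data = pd.DataFrame({
    "origin": ["JFK", "LGA", "LGA"],
    "dest": ["DEN", "RDU", "STL"],
    "distance": [1626.0, 431.0, 888.0],
    "dep_delay": [7, -3, 2],  # made-up values, only to show a column being dropped
})

# Keep only the three columns needed for the distance graph.
data_distance = data.filter(["origin", "dest", "distance"])
print(list(data_distance.columns))
```

Selecting with data[["origin", "dest", "distance"]] would give the same result; filter simply ignores names that are absent instead of raising an error.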
data_distance
99 rows × 3 columns
data_distance = data_distance.groupby(['origin', 'dest'], as_index=False).mean()
data_distance
25 JFK DEN 1626.0
53 LGA PBI 1035.0
54 LGA RDU 431.0
55 LGA STL 888.0
56 LGA TPA 1010.0
Add nodes using origin and dest column
graphD.add_nodes_from(data_distance['origin'])
graphD.add_nodes_from(data_distance['dest'])
nx.draw(graphD, with_labels=True)
pos = nx.spring_layout(graphD)
edge_labels = {(u, v): d['weight'] for u, v, d in graphD.edges(data=True)}
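The printout does not show how graphD was created, how its edges were added, or how the edge labels were drawn. The sketch below fills those steps in under assumptions: graphD is taken to be an undirected nx.Graph(), the three rows are copied from the distance table above, and the fixed seed is added only to make the layout repeatable.

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Three rows copied from the distance table; the real frame has more routes.
data_distance = pd.DataFrame({
    "origin": ["JFK", "LGA", "LGA"],
    "dest": ["DEN", "RDU", "STL"],
    "distance": [1626.0, 431.0, 888.0],
})

graphD = nx.Graph()  # assumption: the original graph type is not shown
for _, row in data_distance.iterrows():
    # Store the distance as the edge weight.
    graphD.add_edge(row["origin"], row["dest"], weight=row["distance"])

pos = nx.spring_layout(graphD, seed=42)  # seed is an addition for repeatability
edge_labels = {(u, v): d["weight"] for u, v, d in graphD.edges(data=True)}

# Draw nodes and edges first, then overlay the distance on each edge.
nx.draw(graphD, pos, with_labels=True)
nx.draw_networkx_edge_labels(graphD, pos, edge_labels=edge_labels)
plt.show()
```

Passing the same pos to both drawing calls is what keeps the labels aligned with their edges.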
94 LGA TPA -6
95 EWR LAX 0
96 JFK BOS -1
97 EWR SJU 1
98 LGA IAH -4
99 rows × 3 columns
Filter the data to remove negative values. A negative value means the flight departed ahead of schedule, so keep only rows with dep_delay > 0.
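The filtering cell itself is missing from the printout. One common way to express it is a boolean mask, sketched below on the five rows shown at the end of the table above. This is an assumed implementation, not necessarily the one the notebook used.

```python
import pandas as pd

# The last five rows of the dep_delay table above.
data_delayed = pd.DataFrame({
    "origin": ["LGA", "EWR", "JFK", "EWR", "LGA"],
    "dest": ["TPA", "LAX", "BOS", "SJU", "IAH"],
    "dep_delay": [-6, 0, -1, 1, -4],
})

# Keep only flights that left later than scheduled.
data_delayed = data_delayed[data_delayed["dep_delay"] > 0]
print(data_delayed)
```

A comparison against a missing value evaluates to False, so a mask like this also excludes rows where dep_delay is NaN.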
data_delayed
0 EWR MEM 97
1 LGA FLL 14
3 JFK DEN 7
16 JFK TPA 16
24 EWR TPA 19
25 EWR MSY 2
35 LGA MSP 2
36 LGA DTW 2
43 EWR IAH 4
44 JFK CLT 9
45 LGA STL 2
54 EWR IAD 2
55 LGA PBI 27
62 JFK LAX 21
64 LGA CVG 2
65 EWR CVG 68
67 EWR IAH 5
68 LGA DTW 14
69 LGA ORD 37
71 LGA TPA 86
72 JFK PBI 48
80 EWR MSP 9
83 EWR LAX 17
84 JFK MCO 235
88 JFK SJU 1
89 JFK LAX 119
90 JFK ATL 48
97 EWR SJU 1
Some records have no value in the dep_delay column. Drop those records.
data_delayed = data_delayed.dropna()
data_delayed
0 EWR MEM 97
1 LGA FLL 14
3 JFK DEN 7
16 JFK TPA 16
24 EWR TPA 19
25 EWR MSY 2
35 LGA MSP 2
36 LGA DTW 2
43 EWR IAH 4
44 JFK CLT 9
54 EWR IAD 2
55 LGA PBI 27
62 JFK LAX 21
64 LGA CVG 2
65 EWR CVG 68
67 EWR IAH 5
68 LGA DTW 14
69 LGA ORD 37
71 LGA TPA 86
72 JFK PBI 48
80 EWR MSP 9
83 EWR LAX 17
88 JFK SJU 1
90 JFK ATL 48
97 EWR SJU 1
Draw network graph
graph = nx.DiGraph()
graph.add_nodes_from(data_delayed['origin'])
for index, row in data_delayed.iterrows():
    graph.add_edge(row['origin'], row['dest'], weight=row['dep_delay'])
weight = [graph.edges[e]['weight'] for e in graph.edges]
nx.draw(graph, nx.spring_layout(graph), node_color='#FA8072', edge_color=weight, width=2, edge_cmap=plt.cm.Reds, with_labels=True)
Question 2: Which airport (EWR, JFK or LGA) has the largest number of delayed
departures?
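One way to approach Question 2 is to count the rows per origin in the filtered table. The sketch below uses only the first five rows of that table, so its counts illustrate the method and are not the answer for the full dataset; delayed_per_origin is an illustrative name.

```python
import pandas as pd

# First five rows of the filtered table above; not the full dataset.
data_delayed = pd.DataFrame({
    "origin": ["EWR", "LGA", "JFK", "JFK", "EWR"],
    "dest": ["MEM", "FLL", "DEN", "TPA", "TPA"],
    "dep_delay": [97, 14, 7, 16, 19],
})

# Each row is one delayed flight, so counting rows per origin counts delays.
delayed_per_origin = data_delayed["origin"].value_counts()
print(delayed_per_origin)
```

Counting rows is likely safer than reading the out-degree of the DiGraph, because a DiGraph keeps a single edge per origin and destination pair and repeated routes such as EWR to IAH would be counted once.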
Question 3: Draw a network graph using air_time as the edge weight. Compare it with
the distance graph in Question 1 to see whether there is a direct correlation between
air time and distance. Are the graphs similar, or are there differences? You can use
the same code as in Question 1; you just need to change the column name.
data_airtime = data.filter(['origin', 'dest', 'air_time'])
data_airtime
96 JFK BOS 40
99 rows × 3 columns
data_airtime = data_airtime.dropna()
data_airtime
96 JFK BOS 40
99 rows × 3 columns
graph2 = nx.DiGraph()
graph2.add_nodes_from(data_airtime['origin'])
for index, row in data_airtime.iterrows():
    graph2.add_edge(row['origin'], row['dest'], weight=row['air_time'])
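Comparing two drawings by eye can be inconclusive, so a numeric check may help with Question 3. The sketch below averages distance and air_time per route and computes their Pearson correlation. All values are illustrative rather than taken from the dataset, and the names routes and correlation are not from the original notebook.

```python
import pandas as pd

# Illustrative values only; substitute the real `data` frame from the notebook.
data = pd.DataFrame({
    "origin": ["JFK", "LGA", "LGA", "EWR"],
    "dest": ["DEN", "RDU", "STL", "LAX"],
    "distance": [1626.0, 431.0, 888.0, 2454.0],
    "air_time": [230.0, 70.0, 130.0, 340.0],
})

# Average both measures per route, mirroring the earlier groupby on distance.
routes = data.groupby(["origin", "dest"], as_index=False)[["distance", "air_time"]].mean()

# Pearson correlation: values near 1 suggest air time grows with distance.
correlation = routes["distance"].corr(routes["air_time"])
print(round(correlation, 3))
```

A value close to 1 would suggest the two graphs should look alike once their edges are colored by weight; routes that break the pattern would be worth inspecting individually.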