Flight Delayed

The document discusses network visualization of airline flight data. It loads the airline data, builds a networkx graph with airports as nodes and flights as edges weighted by distance, and visualizes the graph, finding that JFK has the largest number of long-distance flights.

11/26/23, 11:09 PM Lab Activity - Network Visualization.ipynb - Colaboratory

NetworkX example

Import the networkx library

import networkx as nx

Initialize the graph. Graph() creates an undirected graph and DiGraph() creates a directed graph.

G = nx.Graph()
#G = nx.DiGraph()
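The difference between the two classes shows up as soon as an edge is queried in the reverse direction. A minimal sketch (the variable names `undirected` and `directed` are illustrative and not part of the lab):

```python
import networkx as nx

undirected = nx.Graph()
directed = nx.DiGraph()

# Add the same single edge to both graphs
undirected.add_edge("A", "B")
directed.add_edge("A", "B")

# An undirected edge exists in both directions; a directed edge only in one
print(undirected.has_edge("B", "A"))
print(directed.has_edge("B", "A"))
```

For flight data, a directed graph keeps the distinction between origin and destination.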

Add nodes from a list / array

nodes = ["A","B","C"]
G.add_nodes_from(nodes)

Add an edge from A to B

G.add_edge("A","B")

Draw the graph

import matplotlib.pyplot as plt

nx.draw(G)
plt.show()

Draw with node label

nx.draw(G, with_labels=True)

https://colab.research.google.com/drive/1wAWBV7cxvJemrPy8W5EqgY2RM1avOmJp#scrollTo=yr80CTNdLO0V&printMode=true 1/9

Airlines Network

Import libraries and dataset

import numpy as np # linear algebra


import pandas as pd # data processing and data structure

Load the dataset

data = pd.read_csv('/airlines_network_optimization.csv')

Data Overview

from google.colab import drive


drive.mount('/content/drive')

Mounted at /content/drive

data.shape

(99, 16)

Display the first 5 records in the dataset

data.head(5)

year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time a

0 2013 2 26 1807 1630 97 1956 1837

1 2013 8 17 1459 1445 14 1801 1747

2 2013 2 13 1812 1815 -3 2055 2125

3 2013 4 11 2122 2115 7 2339 2353

4 2013 8 5 1832 1835 -3 2145 2155

Drawing network graph of flight distance

Create a new dataframe by selecting only origin, dest and distance columns

data_distance = data.filter(['origin', 'dest', 'distance'])
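One detail of `DataFrame.filter` worth knowing: given a list of names, it keeps the labels that exist and silently skips the rest, whereas bracket selection raises a `KeyError` on a misspelled column. A small sketch on a two-row sample frame (the misspelled name `distanse` is deliberate, and the frame `sample` is only an illustration):

```python
import pandas as pd

sample = pd.DataFrame({
    "origin": ["EWR", "LGA"],
    "dest": ["MEM", "FLL"],
    "distance": [946, 1076],
    "air_time": [144, 147],
})

# filter() keeps the listed columns that exist and ignores unknown names
kept = sample.filter(["origin", "dest", "distanse"])
print(list(kept.columns))

# Bracket selection is stricter and fails loudly on the typo
try:
    sample[["origin", "dest", "distanse"]]
    raised = False
except KeyError:
    raised = True
print(raised)
```

If a later step complains about a missing column, checking `data_distance.columns` right after the filter is a quick way to spot a typo.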

Display the new dataframe

data_distance


origin dest distance

0 EWR MEM 946

1 LGA FLL 1076

2 EWR SEA 2402

3 JFK DEN 1626

4 JFK SEA 2422

... ... ... ...

94 LGA TPA 1010

95 EWR LAX 2454

96 JFK BOS 187

97 EWR SJU 1608


98 LGA IAH 1416

99 rows × 3 columns

Several records share the same origin and dest pair. Group them together and calculate the mean distance for each group.

data_distance = data_distance.groupby(['origin', 'dest'], as_index=False).mean()
data_distance
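To see what the grouping does, here is a minimal sketch on a few hand-made rows. The two JFK to BOS distances are invented for illustration so that the pair appears twice.

```python
import pandas as pd

# Illustrative rows: the route JFK -> BOS appears twice with different distances
rows = pd.DataFrame({
    "origin": ["JFK", "JFK", "LGA"],
    "dest": ["BOS", "BOS", "TPA"],
    "distance": [186, 188, 1010],
})

# One row per (origin, dest) pair; as_index=False keeps the keys as columns
grouped = rows.groupby(["origin", "dest"], as_index=False).mean()
print(grouped)
```

Taking the mean turns the integer distances into floats, which is why the grouped output shows values such as 1626.0.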

25 JFK DEN 1626.0

26 JFK DTW 509.0

27 JFK FLL 1069.0

28 JFK IAD 228.0

29 JFK JAX 828.0

30 JFK LAX 2475.0

31 JFK MCO 944.0

32 JFK PBI 1028.0

33 JFK SEA 2422.0

34 JFK SJU 1598.0

35 JFK SRQ 1041.0

36 JFK TPA 1005.0

37 LGA ATL 762.0

38 LGA BHM 866.0

39 LGA CLT 544.0

40 LGA CMH 479.0

41 LGA CVG 585.0

42 LGA DCA 214.0

43 LGA DEN 1620.0

44 LGA DTW 502.0

45 LGA FLL 1076.0

46 LGA IAD 229.0

47 LGA IAH 1416.0

48 LGA MCO 950.0


49 LGA MDW 725.0

50 LGA MIA 1096.0

51 LGA MSP 1020.0

52 LGA ORD 733.0

53 LGA PBI 1035.0

54 LGA RDU 431.0

55 LGA STL 888.0

56 LGA TPA 1010.0

Draw a network graph based on the data above. Identify which airport (EWR, JFK or LGA) has the largest number of long-distance flights.

import networkx as nx
graphD = nx.DiGraph()

Add nodes using the origin and dest columns

graphD.add_nodes_from(data_distance['origin'])
graphD.add_nodes_from(data_distance['dest'])

Add edges with distance as the weight

for index, row in data_distance.iterrows():
    graphD.add_edge(row['origin'], row['dest'], weight=row['distance'])
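As an alternative to the explicit loop, networkx can build the same kind of graph directly from a dataframe with `nx.from_pandas_edgelist`. The sketch below assumes a frame with the three columns used here; the sample rows are copied from the grouped table, and `edge_attr="distance"` stores the value under the key `distance` rather than `weight`.

```python
import networkx as nx
import pandas as pd

sample = pd.DataFrame({
    "origin": ["JFK", "JFK", "LGA"],
    "dest": ["DEN", "SEA", "TPA"],
    "distance": [1626.0, 2422.0, 1010.0],
})

# Each row becomes a directed edge origin -> dest carrying its distance
g = nx.from_pandas_edgelist(
    sample,
    source="origin",
    target="dest",
    edge_attr="distance",
    create_using=nx.DiGraph(),
)

print(g.edges(data=True))
```

The loop version is more explicit about the attribute name, which matters when later cells read `['weight']` from the edges.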

Draw the graph

import matplotlib.pyplot as plt

nx.draw(graphD, with_labels=True)
plt.show()
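Judging long-distance routes from the drawing alone can be hard, so a numeric cross-check may help. The sketch below counts outgoing edges whose weight exceeds a cutoff. The cutoff of 1500 and the helper name `long_haul_counts` are assumptions made for illustration, and the edges are a small sample copied from the grouped table, not the full dataset.

```python
import networkx as nx

# Sample edges (origin, dest, mean distance) taken from the grouped table
sample_edges = [
    ("JFK", "DEN", 1626.0),
    ("JFK", "LAX", 2475.0),
    ("JFK", "DTW", 509.0),
    ("LGA", "DEN", 1620.0),
    ("LGA", "ATL", 762.0),
    ("LGA", "IAH", 1416.0),
]

g = nx.DiGraph()
g.add_weighted_edges_from(sample_edges)  # stores the third item as 'weight'

def long_haul_counts(graph, cutoff):
    """Count outgoing edges per node whose weight is above the cutoff."""
    counts = {}
    for origin, _dest, weight in graph.edges(data="weight"):
        if weight > cutoff:
            counts[origin] = counts.get(origin, 0) + 1
    return counts

print(long_haul_counts(g, cutoff=1500))
```

Calling the same helper on `graphD` would give the counts for the whole dataset; the result depends on which cutoff is taken to mean "long distance".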


Let's draw the labels for the edges

pos = nx.spring_layout(graphD)
edge_labels=dict([((u,v,),d['weight']) for u,v,d in graphD.edges(data=True)])

nx.draw(graphD, pos, with_labels=True)


nx.draw_networkx_edge_labels(graphD,pos,edge_labels=edge_labels)


{('EWR', 'BOS'): Text(0.34717333823612767, -0.2352327433966388, '200.0'),
('EWR', 'CVG'): Text(-0.1219596621367476, -0.38961664996052137, '569.0'),
('EWR', 'DFW'): Text(-0.3352747551485213, -0.16334520274627184, '1372.0'),
('EWR', 'IAD'): Text(-0.4815249181096476, 0.21080692310871185, '212.0'),
('EWR', 'IAH'): Text(-0.39974887175863405, 0.3754992830540069, '1400.0'),
('EWR', 'IND'): Text(-0.3925309166976173, 0.29091877548460665, '645.0'),
('EWR', 'LAX'): Text(-0.22071915879980544, 0.49696270993680997, '2454.0'),
('EWR', 'MCO'): Text(-0.31042212792148116, 0.4512580304058955, '937.0'),
('EWR', 'MEM'): Text(0.18116175190146444, -0.3384410415138626, '946.0'),
('EWR', 'MIA'): Text(0.1408098692387888, 0.5667378937823518, '1085.0'),
('EWR', 'MSP'): Text(0.44368252609923947, 0.26010802335761557, '1008.0'),
('EWR', 'MSY'): Text(0.09562273141941097, -0.39327628931425823, '1167.0'),
('EWR', 'ORD'): Text(-0.4596279157574, 0.039352349420861556, '719.0'),
('EWR', 'PBI'): Text(0.4426389053408897, -0.1859033634485659, '1023.0'),
('EWR', 'RDU'): Text(-0.052004077748965226, 0.55643929320759, '416.0'),
('EWR', 'RSW'): Text(-0.3366231787243114, -0.3022052720406718, '1068.0'),
('EWR', 'SEA'): Text(0.03636006922501661, 0.5958029595968523, '2402.0'),
('EWR', 'SFO'): Text(0.38978342790312553, 0.4602155197542636, '2565.0'),
('EWR', 'SJU'): Text(0.4267774486062334, 0.35746535010220065, '1608.0'),
('EWR', 'TPA'): Text(0.25126850639905984, 0.5506072167197091, '997.0'),
('JFK', 'ATL'): Text(0.2876553328498047, 0.42935093497598203, '760.0'),
('JFK', 'BOS'): Text(0.343383746063162, -0.27049083812446567, '187.0'),
('JFK', 'CLE'): Text(-0.24275899472649543, -0.3729213133172455, '425.0'),
('JFK', 'CLT'): Text(0.46832732639200114, 0.05166425538083779, '541.0'),
('JFK', 'DCA'): Text(0.4935188094349753, -0.04091672247641189, '213.0'),
('JFK', 'DEN'): Text(-0.11732291598550598, -0.33087911722892427, '1626.0'),
('JFK', 'DTW'): Text(0.2714231708413726, -0.343012173859437, '509.0'),
('JFK', 'FLL'): Text(-0.005340912572215486, -0.40922454363260924, '1069.0'),
('JFK', 'IAD'): Text(-0.48531451028261324, 0.17554882838088495, '228.0'),
('JFK', 'JAX'): Text(-0.42476279309307163, 0.0982398779840032, '828.0'),
('JFK', 'LAX'): Text(-0.2245087509727711, 0.4617046152089831, '2475.0'),
('JFK', 'MCO'): Text(-0.3142117200944468, 0.41599993567806864, '944.0'),
('JFK', 'PBI'): Text(0.438849313167924, -0.22116145817639277, '1028.0'),
('JFK', 'SEA'): Text(0.03257047705205096, 0.5605448648690253, '2422.0'),
('JFK', 'SJU'): Text(0.42298785643326775, 0.3222072553743738, '1598.0'),
('JFK', 'SRQ'): Text(-0.4437820730565589, -0.2273168209141458, '1041.0'),
('JFK', 'TPA'): Text(0.2474789142260942, 0.5153491219918823, '1005.0'),
('LGA', 'ATL'): Text(0.30345675245929205, 0.40154240248432727, '762.0'),
('LGA', 'BHM'): Text(-0.146988835704565, 0.5046050572741247, '866.0'),
('LGA', 'CLT'): Text(0.4841287460014885, 0.023855722889183036, '544.0'),
('LGA', 'CMH'): Text(0.44111023942248967, -0.15351661614217302, '479.0'),
('LGA', 'CVG'): Text(-0.10994783470022594, -0.452683277180003, '585.0'),
('LGA', 'DCA'): Text(0.5093202290444626, -0.06872525496806664, '214.0'),
('LGA', 'DEN'): Text(-0.10152149637601868, -0.35868764972057904, '1620.0'),
('LGA', 'DTW'): Text(0.2872245904508599, -0.37082070635109177, '502.0'),
('LGA', 'FLL'): Text(0.010460507037271824, -0.437033076124264, '1076.0'),
('LGA', 'IAD'): Text(-0.4695130906731259, 0.1477402958892302, '229.0'),
('LGA', 'IAH'): Text(-0.38773704432211237, 0.3124326558345253, '1416.0'),
('LGA', 'MCO'): Text(-0.2984103004849595, 0.3881914031864139, '950.0'),
('LGA', 'MDW'): Text(-0.4538752269072912, -0.12703456079419814, '725.0'),
('LGA', 'MIA'): Text(0.15282169667531048, 0.5036712665628702, '1096.0'),
('LGA', 'MSP'): Text(0.45569435353576115, 0.19704139613813396, '1020.0'),
('LGA', 'ORD'): Text(-0.4476160883208783, -0.02371427779862008, '733.0'),
('LGA', 'PBI'): Text(0.45465073277741136, -0.2489699906680475, '1035.0'),
('LGA', 'RDU'): Text(-0.039992250312443564, 0.49337266598810836, '431.0'),
('LGA', 'STL'): Text(0.5161229109133811, 0.11963212566402007, '888.0'),
('LGA', 'TPA'): Text(0.2632803338355815, 0.4875405895002275, '1010.0')}

weight = [graphD.edges[e]['weight'] for e in graphD.edges]

nx.draw(graphD, pos, node_color='#A0CBE2', edge_color=weight, width=2, edge_cmap=plt.cm.Blues, with_labels=True)

Question 1: Which airport (EWR, JFK or LGA) has the largest number of long-distance
flights?

Drawing network graph of flight delays

Create a new dataframe by selecting the origin, dest and dep_delay columns

data_delayed = data.filter(['origin', 'dest', 'dep_delay'])

data_delayed

origin dest dep_delay

0 EWR MEM 97

1 LGA FLL 14

2 EWR SEA -3

3 JFK DEN 7

4 JFK SEA -3

... ... ... ...

94 LGA TPA -6

95 EWR LAX 0

96 JFK BOS -1

97 EWR SJU 1

98 LGA IAH -4

99 rows × 3 columns

Filter the data to remove negative values. A negative value means the flight departed ahead of schedule. Keep only the rows with dep_delay > 0.

data_delayed = data_delayed[data_delayed['dep_delay'] > 0]
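The filter is a boolean mask: the comparison yields one True or False per row and only the True rows are kept. A comparison against a missing value is False, so rows without a dep_delay are removed by this step as well. A small sketch on a few sample rows (the last row with a missing delay is made up for illustration):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "origin": ["EWR", "EWR", "JFK", "LGA"],
    "dest": ["MEM", "SEA", "DEN", "TPA"],
    "dep_delay": [97.0, -3.0, 7.0, np.nan],  # last value is missing on purpose
})

mask = sample["dep_delay"] > 0   # NaN > 0 evaluates to False
delayed = sample[mask]
print(delayed)
```

The kept rows retain their original index labels, which is why the filtered output is numbered 0, 1, 3, 16 and so on rather than consecutively.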

data_delayed


origin dest dep_delay

0 EWR MEM 97

1 LGA FLL 14

3 JFK DEN 7

16 JFK TPA 16

24 EWR TPA 19

25 EWR MSY 2

27 JFK DTW 199

35 LGA MSP 2

36 LGA DTW 2

43 EWR IAH 4

44 JFK CLT 9

45 LGA STL 2

54 EWR IAD 2

55 LGA PBI 27

62 JFK LAX 21

64 LGA CVG 2

65 EWR CVG 68

67 EWR IAH 5

68 LGA DTW 14

69 LGA ORD 37

71 LGA TPA 86

72 JFK PBI 48

80 EWR MSP 9

83 EWR LAX 17

84 JFK MCO 235

88 JFK SJU 1

89 JFK LAX 119

90 JFK ATL 48

97 EWR SJU 1

Some records have no value in the dep_delay column. Drop those records.

data_delayed = data_delayed.dropna()

data_delayed


origin dest dep_delay

0 EWR MEM 97

1 LGA FLL 14

3 JFK DEN 7

16 JFK TPA 16

24 EWR TPA 19

25 EWR MSY 2

27 JFK DTW 199

35 LGA MSP 2

36 LGA DTW 2

43 EWR IAH 4

44 JFK CLT 9

45 LGA STL 2

54 EWR IAD 2

55 LGA PBI 27

62 JFK LAX 21

64 LGA CVG 2

65 EWR CVG 68

67 EWR IAH 5

68 LGA DTW 14

69 LGA ORD 37

71 LGA TPA 86

72 JFK PBI 48

80 EWR MSP 9

83 EWR LAX 17

84 JFK MCO 235

88 JFK SJU 1

89 JFK LAX 119

90 JFK ATL 48

97 EWR SJU 1

Draw network graph

import networkx as nx

graph = nx.DiGraph()

graph.add_nodes_from(data_delayed['origin'])

for index, row in data_delayed.iterrows():
    graph.add_edge(row['origin'], row['dest'], weight=row['dep_delay'])

weight = [graph.edges[e]['weight'] for e in graph.edges]

nx.draw(graph, nx.spring_layout(graph), node_color='#FA8072', edge_color=weight, width=2, edge_cmap=plt.cm.Reds, with_labels=True)
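A count per origin can back up a visual reading of the delay graph. Note also that a DiGraph holds at most one edge per (origin, dest) pair, so a route that appears twice keeps only the last weight added. The sketch below uses a few rows copied from the delayed table, not the full data.

```python
import networkx as nx
import pandas as pd

sample = pd.DataFrame({
    "origin": ["JFK", "JFK", "JFK", "EWR", "LGA"],
    "dest": ["LAX", "LAX", "MCO", "CVG", "TPA"],
    "dep_delay": [21, 119, 235, 68, 86],
})

# Number of delayed departures per origin airport
per_origin = sample["origin"].value_counts()
print(per_origin)

g = nx.DiGraph()
for _, row in sample.iterrows():
    g.add_edge(row["origin"], row["dest"], weight=row["dep_delay"])

# JFK -> LAX was added twice; the second add overwrote the first weight
print(g.number_of_edges(), g["JFK"]["LAX"]["weight"])
```

Because of this overwrite, the number of edges leaving an airport in the graph can be smaller than its number of delayed flights in the table.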

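Question 2: Which airport (EWR, JFK or LGA) has the largest number of delayed
departures?

Airport JFK is the most affected by departure delays. It ties with EWR at 10 delayed flights in the table above (LGA has 9), and it has by far the largest delays (235, 199 and 119).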

Question 3: Draw a network graph using air_time as the edge weight. Compare it with
the distance graph in Question 1 to see whether there is a direct correlation between
air time and distance. Are the graphs similar, or are there some differences? You can use
the same code as in Question 1; you just need to change the column name.
data_airtime = data.filter(['origin', 'dest', 'air_time'])
data_airtime

origin dest air_time

0 EWR MEM 144

1 LGA FLL 147

2 EWR SEA 315

3 JFK DEN 221

4 JFK SEA 358

... ... ... ...

94 LGA TPA 147

95 EWR LAX 308

96 JFK BOS 40

97 EWR SJU 200

98 LGA IAH 189

99 rows × 3 columns

data_airtime = data_airtime.dropna()
data_airtime

origin dest air_time

0 EWR MEM 144

1 LGA FLL 147

2 EWR SEA 315

3 JFK DEN 221

4 JFK SEA 358

... ... ... ...

94 LGA TPA 147

95 EWR LAX 308

96 JFK BOS 40

97 EWR SJU 200

98 LGA IAH 189

99 rows × 3 columns

graph2 = nx.DiGraph()

graph2.add_nodes_from(data_airtime['origin'])
for index, row in data_airtime.iterrows():
    graph2.add_edge(row['origin'], row['dest'], weight=row['air_time'])

weight2 = [graph2.edges[e]['weight'] for e in graph2.edges]

nx.draw(graph2, nx.spring_layout(graph2), node_color='#FA8072', edge_color=weight2, width=2, edge_cmap=plt.cm.Reds, with_labels=True)
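Besides comparing the two drawings, the relationship between air time and distance can be checked numerically. The sketch below computes the Pearson correlation on the ten rows visible in the printed tables (rows 0 to 4 and 94 to 98), so the result describes this sample only; the frame name `sample` is illustrative.

```python
import pandas as pd

# distance and air_time for rows 0-4 and 94-98 as printed in the tables
sample = pd.DataFrame({
    "origin": ["EWR", "LGA", "EWR", "JFK", "JFK", "LGA", "EWR", "JFK", "EWR", "LGA"],
    "dest": ["MEM", "FLL", "SEA", "DEN", "SEA", "TPA", "LAX", "BOS", "SJU", "IAH"],
    "distance": [946, 1076, 2402, 1626, 2422, 1010, 2454, 187, 1608, 1416],
    "air_time": [144, 147, 315, 221, 358, 147, 308, 40, 200, 189],
})

# Pearson correlation coefficient; values near 1 indicate a strong linear relationship
r = sample["distance"].corr(sample["air_time"])
print(round(r, 3))
```

On the full dataset the same call would be `data['distance'].corr(data['air_time'])`, which skips rows where either value is missing.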
