Chapter 5 - Case Study: Analyzing Flight Delays
Delay Data
PARALLEL PROGRAMMING WITH DASK IN PYTHON
Dhavide Aruliah
Director of Training, Anaconda
Case study: Analyzing flight delays
Limitations:
Unsupported file formats
date,amount
2016-01-31,103.15
2016-02-25,114.17
2016-03-06,4.03
2016-05-20,150.48
accounts/Bob.csv:
date,amount
2016-01-04,99.68
2016-02-09,146.41
2016-02-21,-42.94
2016-03-14,0.26
10.56476
89160 NaN
89161 0.0
89162 NaN
89163 NaN
89164 NaN
Name: WEATHER_DELAY, dtype: float64
Daily weather data
import pandas as pd
df = pd.read_csv('DEN.csv', parse_dates=True, index_col='Date')
df.columns
            PrecipitationIn             Events
Date
2016-03-27             0.00                NaN
2016-03-28             0.00                NaN
2016-03-29             0.04  Rain-Thunderstorm
2016-03-30             0.04          Rain-Snow
2016-03-31             0.01               Snow
'0.00'
str
df[['PrecipitationIn', 'Events']].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 2 columns):
PrecipitationIn 366 non-null object
Events 115 non-null object
dtypes: object(2)
memory usage: 5.8+ KB
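The info() output above shows PrecipitationIn stored as object (strings such as '0.00') rather than as floats. One possible way to clean such a column is pd.to_numeric with errors='coerce', sketched below on a small made-up frame. The sample rows and the non-numeric marker 'T' are assumptions for illustration, not values taken from DEN.csv.

```python
import pandas as pd

# Hypothetical sample rows; 'T' is an assumed non-numeric marker,
# included only to show what errors='coerce' does with it.
weather = pd.DataFrame(
    {
        "PrecipitationIn": ["0.00", "0.04", "T", "0.01"],
        "Events": [None, "Rain-Thunderstorm", "Rain-Snow", "Snow"],
    },
    index=pd.to_datetime(
        ["2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31"]
    ),
)

# Before conversion the column holds Python strings (dtype object).
print(weather["PrecipitationIn"].dtype)

# Convert to floats; anything that cannot be parsed becomes NaN.
weather["PrecipitationIn"] = pd.to_numeric(
    weather["PrecipitationIn"], errors="coerce"
)

print(weather["PrecipitationIn"].dtype)
print(weather["PrecipitationIn"].isna().sum())
```

After the conversion, numeric reductions such as sum() or mean() work on the column, and unparseable entries are skipped as NaN by default.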
Merging DataFrames
Pandas: pd.merge()
Pandas: pd.DataFrame.merge()
Dask: dask.dataframe.merge()
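As a rough illustration of the two Pandas entry points listed above, the sketch below joins a tiny flight table to a tiny weather table on a date key. The column name FL_DATE and all values are hypothetical stand-ins, not taken from the course data; only WEATHER_DELAY, Date, and Events appear in the slides.

```python
import pandas as pd

# Hypothetical flight records; FL_DATE is an assumed column name.
flights = pd.DataFrame(
    {
        "FL_DATE": pd.to_datetime(["2016-03-29", "2016-03-30", "2016-03-31"]),
        "WEATHER_DELAY": [12.0, 0.0, 45.0],
    }
)

# Hypothetical daily weather records keyed by Date.
weather = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2016-03-29", "2016-03-30", "2016-03-31"]),
        "Events": ["Rain-Thunderstorm", "Rain-Snow", "Snow"],
    }
)

# Function form: pd.merge(left, right, ...)
merged = pd.merge(
    flights, weather, left_on="FL_DATE", right_on="Date", how="left"
)

# Method form: left.merge(right, ...) with the same keyword arguments.
merged_method = flights.merge(
    weather, left_on="FL_DATE", right_on="Date", how="left"
)

print(merged)
```

Both calls should give the same result. dask.dataframe.merge() takes left and right frames with similar key arguments, so the same pattern is expected to carry over to Dask DataFrames, with the result evaluated lazily.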
%time print(df.WEATHER_DELAY.mean().compute())
2.701183508773752
CPU times: user 3.35 s, sys: 719 ms, total: 4.07 s
Wall time: 1.64 s
%time print(df.WEATHER_DELAY.std().compute())
21.230502105
CPU times: user 3.33 s, sys: 706 ms, total: 4.04 s
Wall time: 1.61 s
%time print(df.WEATHER_DELAY.count().compute())
192563
CPU times: user 3.36 s, sys: 695 ms, total: 4.06 s
Wall time: 1.66 s
%time print(persisted_df.WEATHER_DELAY.mean().compute())
2.701183508773752
CPU times: user 15.1 ms, sys: 9.24 ms, total: 24.3 ms
Wall time: 18.5 ms
%time print(persisted_df.WEATHER_DELAY.std().compute())
21.230502105
CPU times: user 29.6 ms, sys: 12.5 ms, total: 42.1 ms
Wall time: 29.5 ms
%time print(persisted_df.WEATHER_DELAY.count().compute())
192563
CPU times: user 9.88 ms, sys: 2.98 ms, total: 12.9 ms
Wall time: 9.43 ms
https://dask.org/