Visualizing Time Series Data in Python
1.1 Introduction
1.1.1 Load your time series data
[ ]: # Load the discoveries dataset: great inventions and scientific
# discoveries per year (the import and read_csv lines are reconstructed
# to match the load cells in the later chapters)
import pandas as pd
discoveries = pd.read_csv('https://raw.githubusercontent.com/ozlerhakan/datacamp/
↪master/Visualizing%20Time%20Series%20Data%20in%20Python/ch1_discoveries.csv')
discoveries.head()
date Y
0 01-01-1860 5
1 01-01-1861 3
2 01-01-1862 0
3 01-01-1863 2
4 01-01-1864 0
[ ]: discoveries.dtypes
[ ]: date object
Y int64
dtype: object
[ ]: # Convert the date column from object to datetime64
discoveries['date'] = pd.to_datetime(discoveries['date'])
discoveries.dtypes
[ ]: date datetime64[ns]
Y int64
dtype: object
[ ]: # Enable inline plotting; the pyplot import is added here since plt is
# used throughout the notebook
%matplotlib inline
import matplotlib.pyplot as plt
1.1.2 Specify plot styles
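The cell for this section did not survive export; presumably it set a matplotlib style, along these lines (fivethirtyeight is the style used later in the notebook):
[ ]: # Use the fivethirtyeight style for all subsequent plots
plt.style.use('fivethirtyeight')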
1.1.3 Display and label plots
[ ]: # Plot a line chart of the discoveries DataFrame using the specified arguments
ax = discoveries.plot(color='blue', figsize=(8, 3), linewidth=2, fontsize=6)
# Label the x-axis and display the plot
ax.set_xlabel('Date')
plt.show()
1.1.4 Subset time series data
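Date-based slicing such as discoveries['1945':'1950'] requires a DatetimeIndex, so a cell along these lines (lost in export, but implied by the date-indexed head() output below) presumably ran first:
[ ]: # Use the datetime64 date column as the index
discoveries = discoveries.set_index('date')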
[ ]: # Select the subset of data between 1945 and 1950
discoveries_subset_1 = discoveries['1945':'1950']
[ ]: # Select the subset of data between 1939 and 1958
discoveries_subset_2 = discoveries['1939':'1958']
[ ]: discoveries.head()
[ ]: Y
date
1860-01-01 5
1861-01-01 3
1862-01-01 0
1863-01-01 2
1864-01-01 0
1.1.5 Add vertical and horizontal markers
[ ]: # Plot the discoveries time series
ax = discoveries.plot(color='blue', fontsize=6)
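The marker lines themselves did not survive export; the usual pattern looks like this (the specific date and y-value here are illustrative assumptions):
[ ]: # Add a red vertical line at a date of interest
ax.axvline('1939-01-01', color='red', linestyle='--')
# Add a green horizontal line at a given y-value
ax.axhline(4, color='green', linestyle='--')
plt.show()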
[ ]: # Side exercise on a custom dataset; the input cells were lost in export
# and are reconstructed here from the printed outputs. SoluongSV
# ("số lượng sinh viên") is Vietnamese for "number of students"; the
# identifier is kept as-is.
df = pd.DataFrame({'Year': [1945, 1946, 1947, 1948, 1949, 1950],
                   'SoluongSV': [1000, 2000, 1500, 1700, 2500, 2700]})
df
[ ]: Year SoluongSV
0 1945 1000
1 1946 2000
2 1947 1500
3 1948 1700
4 1949 2500
5 1950 2700
[ ]: # Set Year as the index and preview the result
df = df.set_index('Year')
df.head()
[ ]: SoluongSV
Year
1945 1000
1946 2000
1947 1500
1948 1700
1949 2500
[ ]: df.dtypes
[ ]: SoluongSV int64
dtype: object
[ ]: # Plot the series
df.plot()
[ ]: <Axes: xlabel='Year'>
1.2 Summary Statistics and Diagnostics
1.2.1 Find missing values
[ ]: co2_levels = pd.read_csv('https://raw.githubusercontent.com/ozlerhakan/datacamp/
↪master/Visualizing%20Time%20Series%20Data%20in%20Python/ch2_co2_levels.csv')
co2_levels.head(n=8)
datestamp co2
0 1958-03-29 316.1
1 1958-04-05 317.3
2 1958-04-12 317.6
3 1958-04-19 317.5
4 1958-04-26 316.4
5 1958-05-03 316.9
6 1958-05-10 NaN
7 1958-05-17 317.5
[ ]: # Set datestamp column as index
co2_levels = co2_levels.set_index('datestamp')
# Print out the number of missing values
print(co2_levels.isnull().sum())
co2 59
dtype: int64
1.2.2 Handle missing values
[ ]: # Impute missing values by backfilling; fillna(method='bfill') is
# deprecated, so the current .bfill() idiom is used here
co2_levels = co2_levels.bfill()
print(co2_levels.isnull().sum())
co2 0
dtype: int64
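1.2.3 Display rolling averages
The cell for this section did not survive export; a minimal sketch of the usual rolling-average plot (the 52-week window is an assumption for this weekly series):
[ ]: # Compute and plot the 52-week rolling mean of the co2 series
ax = co2_levels.rolling(window=52).mean().plot(fontsize=6)
ax.set_xlabel('Date')
plt.show()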
1.2.4 Display aggregated values
[ ]: co2_levels.dtypes
[ ]: co2 float64
dtype: object
[ ]: co2_levels.reset_index('datestamp',inplace=True)
[ ]: co2_levels['datestamp'] = pd.to_datetime(co2_levels.datestamp)
[ ]: co2_levels.set_index('datestamp',inplace=True)
# Extract the month from each date in the index (this line is
# reconstructed; index_month was not defined anywhere in the export)
index_month = co2_levels.index.month
# Compute the mean CO2 levels for each month of the year
mean_co2_levels_by_month = co2_levels.groupby(index_month).mean()
# Plot the mean CO2 levels for each month of the year
mean_co2_levels_by_month.plot(fontsize=6)
plt.show()
[ ]: # Print out summary statistics of the co2_levels DataFrame (this first
# line is reconstructed; only its output survived export)
print(co2_levels.describe())
# Print out the minima of the co2 column in the co2_levels DataFrame
print(co2_levels.co2.min())
# Print out the maxima of the co2 column in the co2_levels DataFrame
print(co2_levels.co2.max())
co2
count 2284.000000
mean 339.657750
std 17.100899
min 313.000000
25% 323.975000
50% 337.700000
75% 354.500000
max 373.900000
313.0
373.9
[ ]: # Generate a boxplot
ax = co2_levels.boxplot()
[ ]: # Generate a histogram
ax = co2_levels.plot(kind='hist', bins=50, fontsize=6)
plt.legend(fontsize=10)
plt.show()
1.3 Seasonality, Trend and Noise
1.3.1 Autocorrelation in time series data
[ ]: # Import required libraries
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots
plt.style.use('fivethirtyeight')
# Display the autocorrelation plot of the co2 series (the plot_acf call
# is reconstructed; the discussion below refers to it)
fig = tsaplots.plot_acf(co2_levels['co2'], lags=24)
# Show plot
plt.show()
To help you assess how trustworthy these autocorrelation values are, the plot_acf() function also returns confidence intervals (represented as blue shaded regions). If an autocorrelation value goes beyond the confidence interval region, you can assume that the observed autocorrelation value is statistically significant.
For the co2 series, the lagged values are highly correlated and statistically significant.
By contrast, autocorrelation is weak when the autocorrelation values do not go beyond the confidence intervals (the blue shaded regions) and the correlations (check the lines and their corresponding values on the y-axis) are not greater than 0.5.
1.3.2 Partial autocorrelation in time series data
[ ]: # Display the partial autocorrelation plot of the co2 series (the
# plot_pacf call is reconstructed; only the show lines survived export)
fig = tsaplots.plot_pacf(co2_levels['co2'], lags=24)
# Show plot
plt.show()
If partial autocorrelation values are close to 0, then the values of observations and lagged observations are not correlated with one another. Conversely, partial autocorrelations close to 1 or -1 indicate strong positive or negative correlations between the lagged observations of the time series.
At which lag values do we have statistically significant partial autocorrelations? At lags 0, 1, 4, 5 and 6: these are the lag values that go beyond the confidence intervals.
1.3.3 Time series decomposition
[ ]: # Import statsmodels.api as sm
import statsmodels.api as sm
# Decompose the co2 series and print its seasonal component (these two
# lines are reconstructed; the output below implies them)
decomposition = sm.tsa.seasonal_decompose(co2_levels)
print(decomposition.seasonal)
datestamp
1958-03-29 1.028042
1958-04-05 1.235242
1958-04-12 1.412344
1958-04-19 1.701186
1958-04-26 1.950694
…
2001-12-01 -0.525044
2001-12-08 -0.392799
2001-12-15 -0.134838
2001-12-22 0.116056
2001-12-29 0.285354
Name: seasonal, Length: 2284, dtype: float64
1.3.5 Visualize the airline dataset
[ ]: airline = pd.read_csv('https://raw.githubusercontent.com/ozlerhakan/datacamp/
↪master/Visualizing%20Time%20Series%20Data%20in%20Python/
↪ch3_airline_passengers.csv')
airline.head()
[ ]: Month AirPassengers
0 1949-01 112
1 1949-02 118
2 1949-03 132
3 1949-04 129
4 1949-05 121
1.3.6 Analyze the airline dataset
[ ]: # Print out the number of missing values and the summary statistics
# (this input cell is reconstructed; only its output survived export)
print(airline.isnull().sum())
print(airline.describe())
Month 0
AirPassengers 0
dtype: int64
AirPassengers
count 144.000000
mean 280.298611
std 119.966317
min 104.000000
25% 180.000000
50% 265.500000
75% 360.500000
max 622.000000
[ ]: # Display boxplot of airline values
ax = airline.boxplot()
[ ]: airline['Month'] = pd.to_datetime(airline.Month)
[ ]: airline.set_index('Month', inplace=True)
[ ]: airline.index
[ ]: DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01',
…
'1960-11-01', '1960-12-01'],
dtype='datetime64[ns]', name='Month', length=144, freq=None)
[ ]: # Extract the month from each date in the index (this line is
# reconstructed; the groupby below requires index_month)
index_month = airline.index.month
# Compute the mean number of passengers for each month of the year
mean_airline_by_month = airline.groupby(index_month).mean()
# Plot the mean number of passengers for each month of the year
mean_airline_by_month.plot()
plt.legend(fontsize=20)
plt.show()
1.3.7 Time series decomposition of the airline dataset
[ ]: # Import statsmodels.api as sm
import statsmodels.api as sm
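The cell that builds airline_decomposed did not survive export; a sketch consistent with the head() output below, assuming the standard seasonal_decompose call:
[ ]: # Decompose the passenger series and collect its trend and seasonal parts
decomposition = sm.tsa.seasonal_decompose(airline['AirPassengers'])
airline_decomposed = pd.DataFrame({'trend': decomposition.trend,
                                   'seasonal': decomposition.seasonal})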
[ ]: import numpy as np
airline_decomposed.head()
[ ]: trend seasonal
Month
1949-01-01 NaN -24.748737
1949-02-01 NaN -36.188131
1949-03-01 NaN -2.241162
1949-04-01 NaN -8.036616
1949-05-01 NaN -4.506313
1.4 Work with Multiple Time Series
1.4.1 Load multiple time series
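The load cell did not survive export; presumably it mirrored the earlier read_csv calls (the ch4_meat.csv filename is inferred from the repository's chapter naming pattern and is an assumption):
[ ]: meat = pd.read_csv('https://raw.githubusercontent.com/ozlerhakan/datacamp/
↪master/Visualizing%20Time%20Series%20Data%20in%20Python/ch4_meat.csv')
meat.head()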
(head() output truncated in export; only the last column, turkey, survived — rows 0 to 4 are all NaN)
[ ]: meat.describe()
(output truncated in export; only the last two columns survived)
other_chicken turkey
count 143.000000 635.000000
mean 43.033566 292.814646
std 3.867141 162.482638
min 32.300000 12.400000
25% 40.200000 154.150000
50% 43.400000 278.300000
75% 45.650000 449.150000
max 51.100000 585.100000
[ ]: # Convert date to datetime64 and set it as the index (these two lines
# are reconstructed; the date-indexed output below implies them)
meat['date'] = pd.to_datetime(meat['date'])
meat = meat.set_index('date')
meat.head()
(output truncated in export; only the turkey column survived)
turkey
date
1944-01-01 NaN
1944-02-01 NaN
1944-03-01 NaN
1944-04-01 NaN
1944-05-01 NaN
1.4.2 Visualize multiple time series
[ ]: # Plot all the time series in the meat DataFrame (the plot call is
# reconstructed; only the customizations survived export)
ax = meat.plot(linewidth=2, fontsize=12)
# Additional customizations
ax.set_xlabel('Date')
ax.legend(fontsize=15)
# Show plot
plt.show()
[ ]: # Plot an area chart of the meat DataFrame (the area-chart call is an
# assumption; only the customizations survived export)
ax = meat.plot.area(fontsize=12)
# Additional customizations
ax.set_xlabel('Date')
ax.legend(fontsize=15)
# Show plot
plt.show()
1.4.3 Define the color palette of your plots
[ ]: # Plot time series dataset using the cubehelix color palette (the plot
# call is reconstructed; only the customizations survived export)
ax = meat.plot(colormap='cubehelix', fontsize=15, figsize=(15,10))
# Additional customizations
ax.set_xlabel('Date')
ax.legend(fontsize=18)
# Show plot
plt.show()
[ ]: # Plot time series dataset using the PuOr color palette
ax = meat.plot(colormap='PuOr', fontsize=15, figsize=(15,10))
# Additional customizations
ax.set_xlabel('Date')
ax.legend(fontsize=18)
# Show plot
plt.show()
1.4.4 Add summary statistics to your time series plot
[ ]: des = meat.describe().loc['mean']
meat_mean = pd.DataFrame([des.values], columns=des.index.values, index=['mean'])
meat_mean
(output truncated in export; only the last two columns survived)
other_chicken turkey
mean 43.033566 292.814646
[ ]: # Plot the meat time series and add the table of means to the plot
# (the plot and table calls are reconstructed; only the legend and show
# lines survived export)
ax = meat.plot(fontsize=6, linewidth=1)
ax.table(cellText=meat_mean.values,
         colWidths=[0.15]*len(meat_mean.columns),
         rowLabels=meat_mean.index,
         colLabels=meat_mean.columns,
         loc='top')
# Specify the fontsize and location of your legend
ax.legend(loc='upper center', bbox_to_anchor=(0.5, 0.95), ncol=3, fontsize=12)
# Show plot
plt.show()
1.4.6 Compute correlations between time series
The pearson method should be used when the relationship between your variables is thought to be linear, while the rank-based kendall and spearman methods should be used when the relationship is thought to be non-linear.
[ ]: # Print the correlation matrix between the beef and pork columns using the␣
↪spearman method
print(meat[['beef', 'pork']].corr(method='spearman'))
beef pork
beef 1.000000 0.827587
pork 0.827587 1.000000
0.827587
[ ]: # Compute the correlation between the pork, veal and turkey columns using the␣
↪pearson method (the corr call is reconstructed; it was lost in export)
print(meat[['pork', 'veal', 'turkey']].corr(method='pearson'))
# The lowest correlation value found above
print(-0.768366)
1.4.7 Visualize correlation matrices
[ ]: # Compute the correlation matrix and draw it as a heatmap (the import,
# corr and heatmap lines are reconstructed; only the tick-rotation and
# show calls survived export)
import seaborn as sns
corr_meat = meat.corr(method='spearman')
sns.heatmap(corr_meat, annot=True)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.show()
1.4.8 Clustered heatmaps
[ ]: # Customize the heatmap of the corr_meat correlation matrix and rotate the x-axis labels
fig = sns.clustermap(corr_meat,
row_cluster=True,
col_cluster=True,
figsize=(10, 10))
plt.setp(fig.ax_heatmap.xaxis.get_majorticklabels(), rotation=90)
plt.setp(fig.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
plt.show()
1.5 Case Study
[ ]: import seaborn as sns
sns.regplot(x=meat["veal"], y=meat["lamb_and_mutton"])
1.5.1 Explore the Jobs dataset
[ ]: jobs = pd.read_csv('https://raw.githubusercontent.com/ozlerhakan/datacamp/
↪master/Visualizing%20Time%20Series%20Data%20in%20Python/ch5_employment.csv')
jobs.head()
(head() output truncated in export; the 16 sector columns spilled past the page edge and their headers did not survive)
[ ]: jobs.tail()
(tail() output truncated in export; the column headers did not survive)
[ ]: jobs.dtypes
datestamp object
Agriculture float64
Business services float64
Construction float64
Durable goods manufacturing float64
Education and Health float64
Finance float64
Government float64
Information float64
Leisure and hospitality float64
Manufacturing float64
Mining and Extraction float64
Nondurable goods manufacturing float64
Other float64
Self-employed float64
Transportation and Utilities float64
Wholesale and Retail Trade float64
dtype: object
[ ]: # Count the missing values in each column (input cell reconstructed;
# only its output survived export)
print(jobs.isnull().sum())
Agriculture 0
Business services 0
Construction 0
Durable goods manufacturing 0
Education and Health 0
Finance 0
Government 0
Information 0
Leisure and hospitality 0
Manufacturing 0
Mining and Extraction 0
Nondurable goods manufacturing 0
Other 0
Self-employed 0
Transportation and Utilities 0
Wholesale and Retail Trade 0
dtype: int64
[ ]: # Generate a boxplot
jobs.boxplot(fontsize=6, vert=False)
plt.show()
# Print summary statistics of the jobs DataFrame (this line is
# reconstructed; only its output survived export)
print(jobs.describe())
# Print the highest mean: Agriculture (see the summary statistics below)
print(9.840984)
# Print the highest variability: Construction's standard deviation
print(4.587619)
Agriculture Business services Construction \
count 122.000000 122.000000 122.000000
mean 9.840984 6.919672 9.426230
std 3.962067 1.862534 4.587619
min 2.400000 4.100000 4.400000
25% 6.900000 5.600000 6.100000
50% 9.600000 6.450000 8.100000
75% 11.950000 7.875000 10.975000
max 21.300000 12.000000 27.100000
(the remaining column blocks were truncated in export)
9.840984
4.587619
1.5.4 Annotate significant events in time series data
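Only the closing plt.show() survived from this section's cells; a sketch of the usual pattern (the datestamp indexing is implied by jobs.index.year in the next section, while the annotation date and colors are illustrative assumptions):
[ ]: # Use datestamp as a datetime index
jobs['datestamp'] = pd.to_datetime(jobs['datestamp'])
jobs = jobs.set_index('datestamp')
[ ]: # Plot all series and mark the onset of the 2008 financial crisis
ax = jobs.plot(colormap='Dark2', fontsize=6, figsize=(10, 6))
ax.axvline('2008-01-01', color='red', linestyle='--')
# Show plot
plt.show()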
1.5.5 Plot monthly and yearly trends
[ ]: # Extract the year from each date in the index of the jobs DataFrame
index_year = jobs.index.year
# Compute and plot the mean unemployment rate for each year (the groupby
# and plot lines are reconstructed; only the legend and show survived)
jobs_by_year = jobs.groupby(index_year).mean()
ax = jobs_by_year.plot(fontsize=6, linewidth=1)
ax.legend(bbox_to_anchor=(0.1, 0.5), fontsize=10)
plt.show()
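The monthly counterpart referenced in the note below was also lost in export; a sketch, assuming the same pattern:
[ ]: # Compute and plot the mean unemployment rate for each calendar month
index_month = jobs.index.month
jobs_by_month = jobs.groupby(index_month).mean()
ax = jobs_by_month.plot(fontsize=6, linewidth=1)
ax.legend(bbox_to_anchor=(0.1, 0.5), fontsize=10)
plt.show()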
Averaging the time series values by month shows that the unemployment rate tends to be much higher during the winter months for the Agriculture and Construction industries. The increase in the unemployment rate after 2008 is very clear when averaging the time series values by year.
1.5.6 Apply time series decomposition to your dataset
[ ]: # Initialize the dictionary of decompositions and list the series names
# (these two lines are reconstructed; the loop below requires them)
jobs_names = jobs.columns
jobs_decomp = {}
# Run time series decomposition on each time series of the DataFrame
for ts in jobs_names:
    ts_decomposition = sm.tsa.seasonal_decompose(jobs[ts])
    jobs_decomp[ts] = ts_decomposition
1.5.7 Visualize the seasonality of multiple time series
[ ]: jobs_seasonal = {}
[ ]: # Extract the seasonal values for the decomposition of each time series
for ts in jobs_names:
    jobs_seasonal[ts] = jobs_decomp[ts].seasonal
# Build a DataFrame from the jobs_seasonal dictionary and facet-plot it
# (these plotting lines are reconstructed; only plt.show survived export)
seasonality_df = pd.DataFrame.from_dict(jobs_seasonal)
seasonality_df.plot(subplots=True, layout=(4, 4), figsize=(16, 16), legend=False)
# Show plot
plt.show()
1.5.8 Correlations between multiple time series
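The closing cell is empty in the export; the course's final step computes the correlations between the seasonal components and plots a clustered heatmap, along these lines (the correlation method and figure options are assumptions):
[ ]: # Correlation matrix of the seasonal components of all job series
corr_seasonality = seasonality_df.corr(method='spearman')
# Clustered heatmap of the seasonality correlations
fig = sns.clustermap(corr_seasonality, annot=True, annot_kws={'size': 4},
                     figsize=(15, 10))
plt.setp(fig.ax_heatmap.xaxis.get_majorticklabels(), rotation=90)
plt.setp(fig.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
plt.show()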