
22/04/2024, 03:17 week_10_intro_forecasting

CMPINF 2120 - Week 10


Introduction to time series forecasting methods
This report introduces the basic concepts of forecasting methods by focusing on 3
simple approaches. You will learn how the AVERAGE, Naive, and Seasonal Naive
forecasting methods work. You will then see how to combine these simple approaches
with decompositions.
These simple approaches are the fundamental building blocks of time series forecasting.
They must be understood before the more advanced methods can be utilized.
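To make the three methods concrete before touching real data, here is a minimal sketch on a made-up toy series (the numbers and the quarterly index are purely illustrative, not the retail data):

```python
import pandas as pd

# Hypothetical toy series (NOT the retail data): a 4-step seasonal pattern repeated 6 times.
y = pd.Series([10, 12, 15, 11] * 6,
              index=pd.date_range('2015-01-01', periods=24, freq='QS'))

average_fc = y.mean()             # AVERAGE: every future value is the historical mean
naive_fc = y.iloc[-1]             # Naive: every future value is the last observation
seasonal_naive_fc = y.iloc[-4:]   # Seasonal Naive: repeat the most recent full season

print(average_fc, naive_fc, list(seasonal_naive_fc))
```

All three "forecasts" are just summary statistics of the history, which is exactly why they make such good benchmarks.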

Import Modules
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns

In [2]: import statsmodels.api as sm

Read data
Let's use the US retail employment example again.
In [4]: us_retail_df = pd.read_csv('us_retail_employment.csv')

In [5]: us_retail_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357 entries, 0 to 356
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 357 non-null int64
1 Month 357 non-null int64
2 Day 357 non-null int64
3 Employed 357 non-null float64
dtypes: float64(1), int64(3)
memory usage: 11.3 KB

In [6]: us_retail_df.head()


Out[6]: Year Month Day Employed


0 1990 1 1 13255.8
1 1990 2 1 12966.3
2 1990 3 1 12938.2
3 1990 4 1 13012.3
4 1990 5 1 13108.3

Prepare data
We need to create the datetime object column and then separate the Employed
column into its own Series.
In [7]: us_retail_df['date_dt'] = pd.to_datetime( us_retail_df.loc[:, ['Year', 'Month', 'Day']] )

In [8]: us_retail_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357 entries, 0 to 356
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 357 non-null int64
1 Month 357 non-null int64
2 Day 357 non-null int64
3 Employed 357 non-null float64
4 date_dt 357 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(3)
memory usage: 14.1 KB

In [9]: us_retail_df.head()

Out[9]: Year Month Day Employed date_dt


0 1990 1 1 13255.8 1990-01-01
1 1990 2 1 12966.3 1990-02-01
2 1990 3 1 12938.2 1990-03-01
3 1990 4 1 13012.3 1990-04-01
4 1990 5 1 13108.3 1990-05-01
Visualize the Employed column vs the date_dt column using Seaborn.
In [10]: sns.relplot(data = us_retail_df, x='date_dt', y='Employed', kind='line', aspect=2)

plt.show()


C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

Extract the Series.


In [11]: retail_series = us_retail_df.Employed.copy()

In [12]: retail_series

Out[12]: 0 13255.8
1 12966.3
2 12938.2
3 13012.3
4 13108.3
...
352 15691.6
353 15775.5
354 15785.9
355 15749.5
356 15611.3
Name: Employed, Length: 357, dtype: float64

In [13]: retail_series.index

Out[13]: RangeIndex(start=0, stop=357, step=1)

Set the index to a DateTimeIndex to enable using Time Series methods.


In [14]: retail_series.index = us_retail_df.date_dt

In [15]: retail_series


Out[15]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2019-05-01 15691.6
2019-06-01 15775.5
2019-07-01 15785.9
2019-08-01 15749.5
2019-09-01 15611.3
Name: Employed, Length: 357, dtype: float64

In [16]: retail_series.index

Out[16]: DatetimeIndex(['1990-01-01', '1990-02-01', '1990-03-01', '1990-04-01',
                        '1990-05-01', '1990-06-01', '1990-07-01', '1990-08-01',
                        '1990-09-01', '1990-10-01',
                        ...
                        '2018-12-01', '2019-01-01', '2019-02-01', '2019-03-01',
                        '2019-04-01', '2019-05-01', '2019-06-01', '2019-07-01',
                        '2019-08-01', '2019-09-01'],
                       dtype='datetime64[ns]', name='date_dt', length=357, freq=None)

We can RESAMPLE the data to force a regular sampling frequency to support traditional/classic time series methods.
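As a minimal illustration of what resampling does (on a made-up three-point series, not the retail data), note how a missing month becomes an explicit NaN at the forced month-start frequency:

```python
import pandas as pd

# Hypothetical irregular series (NOT the retail data): March is missing entirely.
s = pd.Series([1.0, 2.0, 4.0],
              index=pd.to_datetime(['2020-01-01', '2020-02-01', '2020-04-01']))

# 'MS' = month-start frequency; the missing month becomes NaN instead of silently vanishing
regular = s.resample('MS').mean()
print(regular)
```

For the retail data the index is already complete, so resampling simply attaches the `MS` frequency without changing any values.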
In [17]: ready_series = retail_series.copy().resample('MS').mean()

In [18]: ready_series

Out[18]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2019-05-01 15691.6
2019-06-01 15775.5
2019-07-01 15785.9
2019-08-01 15749.5
2019-09-01 15611.3
Freq: MS, Name: Employed, Length: 357, dtype: float64

Visualize the Series plot.


In [19]: ready_series.plot( figsize=(15, 6) )

plt.show()


Split data
Let's split the data into dedicated training and test sets. This way we can get some idea
of how well the forecasting methods are working.
However, the goal of time series forecasters is to forecast the future. Therefore, we
should NEVER randomly split time series data. Instead, we should force the hold-out
test set to always be in the future!!!!
Let's first check the number of unique years in the data.
In [20]: us_retail_df.Year.value_counts().sort_index()


Out[20]: Year
1990 12
1991 12
1992 12
1993 12
1994 12
1995 12
1996 12
1997 12
1998 12
1999 12
2000 12
2001 12
2002 12
2003 12
2004 12
2005 12
2006 12
2007 12
2008 12
2009 12
2010 12
2011 12
2012 12
2013 12
2014 12
2015 12
2016 12
2017 12
2018 12
2019 9
Name: count, dtype: int64

We can split the data using the DateTimeIndex .


In [21]: ready_series.loc[ ready_series.index < '2017-01-01' ]

Out[21]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2016-08-01 15864.6
2016-09-01 15750.3
2016-10-01 15899.5
2016-11-01 16260.2
2016-12-01 16394.3
Freq: MS, Name: Employed, Length: 324, dtype: float64

Create the training set.


In [22]: train_series = ready_series.loc[ ready_series.index < '2017-01-01' ].copy()


Create the hold-out "future" test set.


In [23]: test_series = ready_series.loc[ ready_series.index >= '2017-01-01' ].copy()

Visualize the TRAINING set and the HOLD-OUT future test set.
In [24]: fig, ax = plt.subplots(figsize=(15, 6))

ready_series.plot( ax = ax, label = 'all' )

train_series.plot( ax = ax, label = 'train' )

test_series.plot( ax = ax, label = 'test' )

ax.legend()

plt.show()

If we remove the "ALL" series...then there will be a gap between the training and test
series.
In [25]: fig, ax = plt.subplots(figsize=(15, 6))

#ready_series.plot( ax = ax, label = 'all' )

train_series.plot( ax = ax, label = 'train', color='orange' )

test_series.plot( ax = ax, label = 'test', color = 'green' )

ax.legend()

plt.show()


Simple Forecasting
The two simplest forecasting methods:
Average all historical measurements - all future forecasts equal the AVERAGE
Use the most recent (last) observation as the forecast -> Naive method
The average or MEAN method is easy to calculate...
In [26]: train_series.mean()

Out[26]: 14623.75277777778

The most recent or last observation is the Naive forecaster:


In [27]: train_series.iloc[ -1 ]

Out[27]: 16394.3

The Naive method literally uses the LAST observation as the forecast.
In [28]: train_series

Out[28]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2016-08-01 15864.6
2016-09-01 15750.3
2016-10-01 15899.5
2016-11-01 16260.2
2016-12-01 16394.3
Freq: MS, Name: Employed, Length: 324, dtype: float64


Make these simple forecasts on the hold out test set.


Let's compile everything into a Pandas DataFrame. The .index attribute of the
DataFrame is set to the DateTimeIndex.
In [31]: my_forecasts = pd.DataFrame({'observed': test_series.values.copy() },
index=test_series.index)

In [32]: my_forecasts.head()

Out[32]: observed
date_dt
2017-01-01 15854.4
2017-02-01 15627.9
2017-03-01 15635.0
2017-04-01 15686.6
2017-05-01 15759.5
Forecast using the AVERAGE or MEAN method.
In [35]: my_forecasts['AVERAGE'] = train_series.mean()

In [36]: my_forecasts.head()

Out[36]: observed AVERAGE


date_dt
2017-01-01 15854.4 14623.752778
2017-02-01 15627.9 14623.752778
2017-03-01 15635.0 14623.752778
2017-04-01 15686.6 14623.752778
2017-05-01 15759.5 14623.752778
Forecast using the Naive method.
In [37]: my_forecasts['Naive'] = train_series.iloc[-1].copy()

In [38]: my_forecasts.head()


Out[38]: observed AVERAGE Naive


date_dt
2017-01-01 15854.4 14623.752778 16394.3
2017-02-01 15627.9 14623.752778 16394.3
2017-03-01 15635.0 14623.752778 16394.3
2017-04-01 15686.6 14623.752778 16394.3
2017-05-01 15759.5 14623.752778 16394.3
Compare the hold out test set observations with the forecasts. Seaborn wide-format
plotting options are used below because the DataFrame .index is the DateTimeIndex.
In [39]: sns.relplot(data = my_forecasts, kind='line', aspect=2)

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

Let's use Pandas plotting and matplotlib plotting to include the training set, the
forecasts, and the test set in a single plot. Neither simple forecasting method captures
the repeating patterns associated with the hold out test set. However, the Naive method is at least "in
the right ballpark" compared to the AVERAGE method in this example.
In [40]: fig, ax = plt.subplots( figsize=(15, 6) )

train_series.plot(ax=ax, label='train', color='orange' )

test_series.plot(ax=ax, label='test', color='green' )

my_forecasts.AVERAGE.plot( ax=ax, label='AVERAGE', color='black' )


my_forecasts.Naive.plot( ax=ax, label='Naive', color='cyan' )

ax.legend()

plt.show()

We know from our exploration that there is a SEASONAL pattern present in this data
set!!!!
We can modify our simple forecasts to account for the seasonality by using SEASONAL
NAIVE forecasting!!!!
Seasonal Naive corresponds to using the last or most recent season as the future
forecasts for all future seasons.
Future forecasts in May will correspond to the most recently observed or last value for
May, while future forecasts for October will be the last October value. Therefore, not all
seasonal (month in this case) forecasts are the same. The seasonal (monthly) variation
is preserved based on the last year in the training data.
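One possible way to sketch this month-matching logic as a reusable helper is shown below. The `seasonal_naive` function name and the synthetic two-year series are illustrative assumptions, not part of the report:

```python
import pandas as pd

# Hypothetical two-year monthly series (NOT the retail data); year 2 differs from year 1.
idx = pd.date_range('2015-01-01', periods=24, freq='MS')
y = pd.Series([float(v) for v in list(range(1, 13)) + list(range(11, 23))], index=idx)

def seasonal_naive(train, horizon_index):
    """Forecast each future month with the last observed value for that calendar month."""
    last_per_month = train.groupby(train.index.month).last()
    return pd.Series([last_per_month[ts.month] for ts in horizon_index],
                     index=horizon_index)

future_idx = pd.date_range('2017-01-01', periods=12, freq='MS')
fc = seasonal_naive(y, future_idx)
print(fc)
```

Every forecast comes from the most recent year of the training data, so the monthly shape is preserved no matter how far ahead the horizon extends.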
The last year in the training data is 2016:
In [42]: us_retail_df.loc[ us_retail_df.Year == 2016 ]


Out[42]: Year Month Day Employed date_dt


312 2016 1 1 15625.7 2016-01-01
313 2016 2 1 15486.6 2016-02-01
314 2016 3 1 15576.8 2016-03-01
315 2016 4 1 15648.7 2016-04-01
316 2016 5 1 15745.7 2016-05-01
317 2016 6 1 15851.8 2016-06-01
318 2016 7 1 15874.4 2016-07-01
319 2016 8 1 15864.6 2016-08-01
320 2016 9 1 15750.3 2016-09-01
321 2016 10 1 15899.5 2016-10-01
322 2016 11 1 16260.2 2016-11-01
323 2016 12 1 16394.3 2016-12-01
Seasonal Naive uses the above values as the forecasts in each future month. There are
many ways to execute the Seasonal Naive forecast method. Let's use some Pandas data
manipulation techniques to execute the Seasonal Naive forecast.
In [44]: my_forecasts_b = my_forecasts.reset_index().copy()

In [45]: my_forecasts_b.head()

Out[45]: date_dt observed AVERAGE Naive


0 2017-01-01 15854.4 14623.752778 16394.3
1 2017-02-01 15627.9 14623.752778 16394.3
2 2017-03-01 15635.0 14623.752778 16394.3
3 2017-04-01 15686.6 14623.752778 16394.3
4 2017-05-01 15759.5 14623.752778 16394.3
Check the data types.
In [46]: my_forecasts_b.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date_dt 33 non-null datetime64[ns]
1 observed 33 non-null float64
2 AVERAGE 33 non-null float64
3 Naive 33 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 1.2 KB

Let's extract the Date Time components of Year and Month from the date_dt column.
In [47]: my_forecasts_b['Year'] = my_forecasts_b.date_dt.dt.year

In [48]: my_forecasts_b.head()

Out[48]: date_dt observed AVERAGE Naive Year


0 2017-01-01 15854.4 14623.752778 16394.3 2017
1 2017-02-01 15627.9 14623.752778 16394.3 2017
2 2017-03-01 15635.0 14623.752778 16394.3 2017
3 2017-04-01 15686.6 14623.752778 16394.3 2017
4 2017-05-01 15759.5 14623.752778 16394.3 2017
In [49]: my_forecasts_b['Month'] = my_forecasts_b.date_dt.dt.month

In [50]: my_forecasts_b.head()

Out[50]: date_dt observed AVERAGE Naive Year Month


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5
We can now JOIN or MERGE the Seasonal Naive forecasts from 2016 (the most recent
year in the training set) to ALL FUTURE forecast months!!!
The "smaller" data set of the most recent monthly measurements is shown below.
In [51]: us_retail_df.loc[ us_retail_df.Year == 2016, ['Month', 'Employed']].rename(columns={'Employed': 'Seasonal_Naive'})


Out[51]: Month Seasonal_Naive


312 1 15625.7
313 2 15486.6
314 3 15576.8
315 4 15648.7
316 5 15745.7
317 6 15851.8
318 7 15874.4
319 8 15864.6
320 9 15750.3
321 10 15899.5
322 11 16260.2
323 12 16394.3
Join the above small data to the larger forecast DataFrame.
In [52]: my_forecasts_b.merge( us_retail_df.loc[ us_retail_df.Year == 2016, ['Month', 'Employed']].rename(columns={'Employed': 'Seasonal_Naive'}),
                               on=['Month'],
                               how='left' )


Out[52]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7
5 2017-06-01 15843.0 14623.752778 16394.3 2017 6 15851.8
6 2017-07-01 15841.1 14623.752778 16394.3 2017 7 15874.4
7 2017-08-01 15810.2 14623.752778 16394.3 2017 8 15864.6
8 2017-09-01 15679.3 14623.752778 16394.3 2017 9 15750.3
9 2017-10-01 15819.9 14623.752778 16394.3 2017 10 15899.5
10 2017-11-01 16285.8 14623.752778 16394.3 2017 11 16260.2
11 2017-12-01 16305.9 14623.752778 16394.3 2017 12 16394.3
12 2018-01-01 15718.6 14623.752778 16394.3 2018 1 15625.7
13 2018-02-01 15577.0 14623.752778 16394.3 2018 2 15486.6
14 2018-03-01 15610.8 14623.752778 16394.3 2018 3 15576.8
15 2018-04-01 15681.4 14623.752778 16394.3 2018 4 15648.7
16 2018-05-01 15797.2 14623.752778 16394.3 2018 5 15745.7
17 2018-06-01 15844.9 14623.752778 16394.3 2018 6 15851.8
18 2018-07-01 15854.5 14623.752778 16394.3 2018 7 15874.4
19 2018-08-01 15834.9 14623.752778 16394.3 2018 8 15864.6
20 2018-09-01 15680.6 14623.752778 16394.3 2018 9 15750.3
21 2018-10-01 15796.5 14623.752778 16394.3 2018 10 15899.5
22 2018-11-01 16291.3 14623.752778 16394.3 2018 11 16260.2
23 2018-12-01 16309.2 14623.752778 16394.3 2018 12 16394.3
24 2019-01-01 15753.5 14623.752778 16394.3 2019 1 15625.7
25 2019-02-01 15567.4 14623.752778 16394.3 2019 2 15486.6
26 2019-03-01 15576.6 14623.752778 16394.3 2019 3 15576.8
27 2019-04-01 15624.9 14623.752778 16394.3 2019 4 15648.7
28 2019-05-01 15691.6 14623.752778 16394.3 2019 5 15745.7
29 2019-06-01 15775.5 14623.752778 16394.3 2019 6 15851.8
30 2019-07-01 15785.9 14623.752778 16394.3 2019 7 15874.4
31 2019-08-01 15749.5 14623.752778 16394.3 2019 8 15864.6
32 2019-09-01 15611.3 14623.752778 16394.3 2019 9 15750.3
Assign the joined data to a new object.
In [53]: my_forecasts_c = my_forecasts_b.merge( us_retail_df.loc[ us_retail_df.Year == 2016, ['Month', 'Employed']].rename(columns={'Employed': 'Seasonal_Naive'}),
                                                on=['Month'],
                                                how='left' ).\
                          copy()

In [54]: my_forecasts_c.head()

Out[54]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7
Let's force the index to be a DateTimeIndex based on the date_dt column using the
.set_index() method.

In [55]: my_forecasts_c.set_index('date_dt').head()

Out[55]: observed AVERAGE Naive Year Month Seasonal_Naive


date_dt
2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7
2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6
2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8
2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7
2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7
By default Pandas does not modify in place and so my_forecasts_c is not changed.
In [56]: my_forecasts_c.head()


Out[56]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7
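A minimal check of this default behavior, on a throwaway DataFrame (the names `df` and `returned` are illustrative):

```python
import pandas as pd

# Throwaway DataFrame (NOT the forecasts) to demonstrate the default behavior.
df = pd.DataFrame({'k': ['a', 'b'], 'v': [1, 2]})

returned = df.set_index('k')  # a NEW DataFrame is returned...
print(df.columns.tolist())    # ...while df itself still has both columns
```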
Let's make a copy of the DataFrame to be safe.
In [57]: my_forecasts_d = my_forecasts_c.copy()

And now set the index to be a DateTimeIndex.


In [58]: my_forecasts_d.set_index('date_dt', inplace=True, drop=False)

In [59]: my_forecasts_d.head()

Out[59]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive


date_dt
2017-01-01 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7
2017-02-01 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6
2017-03-01 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8
2017-04-01 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7
2017-05-01 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7
Visualize the Seasonal Naive forecasts compared to the other forecasting procedures.
The Seasonal Naive approach captures the seasonal pattern! Seasonal Naive is a
very useful method when there is an important seasonal component.
In [60]: fig, ax = plt.subplots( figsize=(15, 6) )

train_series.plot(ax=ax, label='train', color='orange' )

test_series.plot(ax=ax, label='test', color='green' )

my_forecasts_d.AVERAGE.plot( ax=ax, label='AVERAGE', color='black' )

my_forecasts_d.Naive.plot( ax=ax, label='Naive', color='cyan' )


my_forecasts_d.Seasonal_Naive.plot( ax=ax, label='Seasonal Naive', color='magenta' )

ax.legend()

plt.show()

Combine simple forecast with Time Series Decomposition
Use a time series decomposition method to enable a simple forecaster, which must then
be re-seasonalized. Let's use the STL decomposition for this example.
In [61]: from statsmodels.tsa.seasonal import STL

In [62]: train_stl_fit = STL( train_series ).fit()

In [63]: fig = train_stl_fit.plot()


Calculate the seasonally adjusted data.


In [64]: df_stl_train = pd.DataFrame({'observed': train_stl_fit.observed,
                                      'seasonal_adjust': train_stl_fit.observed - train_stl_fit.seasonal},
                                     index=train_series.index)

In [65]: sns.relplot(data = df_stl_train, kind='line', aspect=3)

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

We will use the Naive method...but apply the Naive logic to the seasonally adjusted
data. Thus, we will use the last or most recent seasonally adjusted value.

In [66]: df_stl_train.seasonal_adjust.iloc[-1]

Out[66]: 15913.226856066618

Re-seasonalize by adding the seasonal component to the last seasonally adjusted
value. The seasonal component comes from the decomposition. The .seasonal
attribute for the last year in the decomposition is shown below.
In [67]: train_stl_fit.seasonal[ train_stl_fit.seasonal.index >= '2016-01-01' ]

Out[67]: date_dt
2016-01-01 -104.057251
2016-02-01 -279.689973
2016-03-01 -216.257000
2016-04-01 -153.775629
2016-05-01 -64.140218
2016-06-01 25.390812
2016-07-01 34.305993
2016-08-01 -1.915552
2016-09-01 -134.848905
2016-10-01 15.664630
2016-11-01 364.508477
2016-12-01 481.073144
Freq: MS, Name: season, dtype: float64

ADD the seasonally adjusted Naive value to the most recent year's Seasonal
component!!!
In [68]: train_stl_fit.seasonal[ train_stl_fit.seasonal.index >= '2016-01-01' ] + df_stl_train.seasonal_adjust.iloc[-1]

Out[68]: date_dt
2016-01-01 15809.169605
2016-02-01 15633.536883
2016-03-01 15696.969856
2016-04-01 15759.451227
2016-05-01 15849.086638
2016-06-01 15938.617668
2016-07-01 15947.532849
2016-08-01 15911.311304
2016-09-01 15778.377951
2016-10-01 15928.891486
2016-11-01 16277.735333
2016-12-01 16394.300000
Freq: MS, Name: season, dtype: float64

Manually apply the re-seasoned forecast to the future...


In [69]: reseason_naive_forecast = train_stl_fit.seasonal[ train_stl_fit.seasonal.index >= '2016-01-01' ] + df_stl_train.seasonal_adjust.iloc[-1]

Reset the index to convert the Pandas Series to a DataFrame.


In [70]: df_reseason_naive_forecast = reseason_naive_forecast.reset_index()


In [71]: df_reseason_naive_forecast

Out[71]: date_dt season


0 2016-01-01 15809.169605
1 2016-02-01 15633.536883
2 2016-03-01 15696.969856
3 2016-04-01 15759.451227
4 2016-05-01 15849.086638
5 2016-06-01 15938.617668
6 2016-07-01 15947.532849
7 2016-08-01 15911.311304
8 2016-09-01 15778.377951
9 2016-10-01 15928.891486
10 2016-11-01 16277.735333
11 2016-12-01 16394.300000
Extract the Month DateTime component from the date_dt column.
In [72]: df_reseason_naive_forecast['Month'] = df_reseason_naive_forecast.date_dt.dt.month

In [73]: df_reseason_naive_forecast

Out[73]: date_dt season Month


0 2016-01-01 15809.169605 1
1 2016-02-01 15633.536883 2
2 2016-03-01 15696.969856 3
3 2016-04-01 15759.451227 4
4 2016-05-01 15849.086638 5
5 2016-06-01 15938.617668 6
6 2016-07-01 15947.532849 7
7 2016-08-01 15911.311304 8
8 2016-09-01 15778.377951 9
9 2016-10-01 15928.891486 10
10 2016-11-01 16277.735333 11
11 2016-12-01 16394.300000 12


Merge the above forecasts with the larger hold-out test forecast DataFrame.
In [74]: df_reseason_naive_forecast.loc[:, ['Month', 'season']].rename(columns={'season': 'STL_Reseason_Naive'})

Out[74]: Month STL_Reseason_Naive


0 1 15809.169605
1 2 15633.536883
2 3 15696.969856
3 4 15759.451227
4 5 15849.086638
5 6 15938.617668
6 7 15947.532849
7 8 15911.311304
8 9 15778.377951
9 10 15928.891486
10 11 16277.735333
11 12 16394.300000
In [75]: my_forecasts_d.merge( df_reseason_naive_forecast.loc[:, ['Month', 'season']].rename(columns={'season': 'STL_Reseason_Naive'}),
                               on=['Month'],
                               how='left').\
                        head()

Out[75]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive STL_Reseason_Naive


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7 15809.169605
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6 15633.536883
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8 15696.969856
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7 15759.451227
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7 15849.086638
Assign the forecasts with the STL reseasoned Naive forecast to a new
DataFrame.
In [76]: my_forecasts_e = my_forecasts_d.merge( df_reseason_naive_forecast.loc[:, ['Month', 'season']].rename(columns={'season': 'STL_Reseason_Naive'}),
                                                on=['Month'],
                                                how='left').\
                          copy()

In [77]: my_forecasts_e.head()

Out[77]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive STL_Reseason_Naive


0 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7 15809.169605
1 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6 15633.536883
2 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8 15696.969856
3 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7 15759.451227
4 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7 15849.086638
Set the .index to be a DateTimeIndex to support the Seaborn wide-format plotting
options.
In [78]: my_forecasts_f = my_forecasts_e.set_index('date_dt', drop=False)

In [79]: my_forecasts_f.head()

Out[79]: date_dt observed AVERAGE Naive Year Month Seasonal_Naive STL_Reseason_Naive


date_dt
2017-01-01 2017-01-01 15854.4 14623.752778 16394.3 2017 1 15625.7 15809.169605
2017-02-01 2017-02-01 15627.9 14623.752778 16394.3 2017 2 15486.6 15633.536883
2017-03-01 2017-03-01 15635.0 14623.752778 16394.3 2017 3 15576.8 15696.969856
2017-04-01 2017-04-01 15686.6 14623.752778 16394.3 2017 4 15648.7 15759.451227
2017-05-01 2017-05-01 15759.5 14623.752778 16394.3 2017 5 15745.7 15849.086638
Visualize the 4 different simple forecasting methods in one plot and compare to the
known hold-out test set. Reseasoning the Naive seasonally adjusted forecast enables
the forecasts to capture the seasonal pattern present in the hold out data.
In [83]: fig, ax = plt.subplots( figsize=(15, 6) )

train_series.plot(ax=ax, label='train', color='orange' )


test_series.plot(ax=ax, label='test', color='green' )

my_forecasts_f.AVERAGE.plot( ax=ax, label='AVERAGE', color='black' )

my_forecasts_f.Naive.plot( ax=ax, label='Naive', color='cyan' )

my_forecasts_f.Seasonal_Naive.plot( ax=ax, label='Seasonal Naive', color='magenta' )

my_forecasts_f.STL_Reseason_Naive.plot( ax=ax, label='STL Reseason Naive', color='crimson' )

ax.legend()

plt.show()

Model selection
You have seen 4 different forecasting methods in this report. Three of the methods
involve NO parameters or model "fitting". Summary statistics are used as the forecast!
These three methods are the foundation for all other forecasting methods. The fourth
approach is your first "advanced" method because it combines the decomposition
approach from visualizing and exploring the time series with a simple forecasting
procedure, enabling it to capture more advanced patterns. The simple forecasting
methods were executed using Pandas attributes, methods, and functions, but there are
multiple ways to execute these simple strategies.
The 4 methods were visually compared on the hold out test set. The AVERAGE or MEAN
method clearly does not capture the hold out test set behavior. However, it is visually
difficult to tell which method is better between Seasonal Naive and the STL Reseason
Naive approach. Let's quantify the performance on the hold out test set by calculating a
performance metric appropriate for regression problems. The cells below calculate
the RMSE for each of the 4 forecasting methods. As shown by the values displayed to
the screen, the Seasonal Naive method has the lowest RMSE on the hold-out test set.
Thus, Seasonal Naive outperforms the STL Reseasoning approach!
In [80]: np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f.Seasonal_Naive )**2 ).mean() )

Out[80]: 81.74907839347037

In [81]: np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f.AVERAGE )**2 ).mean() )

Out[81]: 1190.6051791210214

In [82]: np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f.Naive )**2 ).mean() )

Out[82]: 632.3527036537291

In [83]: np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f.STL_Reseason_Naive )**2 ).mean() )

Out[83]: 100.99782463931113
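The four RMSE cells all follow one pattern, so a small helper can compute them in a loop. The `rmse` function and the made-up `demo` numbers below are illustrative assumptions, not the retail results:

```python
import numpy as np
import pandas as pd

def rmse(observed, predicted):
    """Root mean squared error between two aligned Series."""
    return float(np.sqrt(((observed - predicted) ** 2).mean()))

# Made-up numbers (NOT the retail forecasts) with the same layout as my_forecasts_f.
demo = pd.DataFrame({'observed': [10.0, 12.0, 11.0],
                     'AVERAGE':  [11.0, 11.0, 11.0],
                     'Naive':    [12.0, 12.0, 12.0]})

scores = {m: rmse(demo.observed, demo[m]) for m in ['AVERAGE', 'Naive']}
print(scores)
```

Collecting the scores in a dict makes it easy to sort methods by error when comparing many forecasters at once.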

Conclusion
Three of the four methods involve zero parameters. They are summary statistics which
are VERY easy to interpret and describe. You should always include these simple
methods as benchmarks to compare against more complex time series forecasting
methods.