Week 10 Intro Forecasting
Import Modules
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Read data
Let's use the US retail employment example again.
In [4]: us_retail_df = pd.read_csv('us_retail_employment.csv')
In [5]: us_retail_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357 entries, 0 to 356
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 357 non-null int64
1 Month 357 non-null int64
2 Day 357 non-null int64
3 Employed 357 non-null float64
dtypes: float64(1), int64(3)
memory usage: 11.3 KB
In [6]: us_retail_df.head()
Prepare data
We need to create the datetime object column and then separate the Employed
column into its own Series.
In [7]: us_retail_df['date_dt'] = pd.to_datetime( us_retail_df.loc[:, ['Year', 'Month', 'Day']] )
In [8]: us_retail_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 357 entries, 0 to 356
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 357 non-null int64
1 Month 357 non-null int64
2 Day 357 non-null int64
3 Employed 357 non-null float64
4 date_dt 357 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), int64(3)
memory usage: 14.1 KB
In [9]: us_retail_df.head()
plt.show()
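The cells that build retail_series and ready_series are only partially shown in this export. A minimal sketch of the likely steps (they are spread across the next few cells): pull Employed into its own Series, re-index it by date_dt, and assign a monthly-start frequency.

# Sketch of the likely steps (the exact cells are not all shown).
retail_series = us_retail_df.Employed.copy()

# Re-index the Series with the datetime column (this is what Out[15] displays).
retail_series.index = us_retail_df.date_dt

# Assign an explicit monthly-start ('MS') frequency (this is what Out[18] displays).
ready_series = retail_series.asfreq('MS')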
In [12]: retail_series
Out[12]: 0 13255.8
1 12966.3
2 12938.2
3 13012.3
4 13108.3
...
352 15691.6
353 15775.5
354 15785.9
355 15749.5
356 15611.3
Name: Employed, Length: 357, dtype: float64
In [13]: retail_series.index
In [15]: retail_series
Out[15]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2019-05-01 15691.6
2019-06-01 15775.5
2019-07-01 15785.9
2019-08-01 15749.5
2019-09-01 15611.3
Name: Employed, Length: 357, dtype: float64
In [16]: retail_series.index
In [18]: ready_series
Out[18]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2019-05-01 15691.6
2019-06-01 15775.5
2019-07-01 15785.9
2019-08-01 15749.5
2019-09-01 15611.3
Freq: MS, Name: Employed, Length: 357, dtype: float64
plt.show()
Split data
Let's split the data into dedicated training and test sets. This way we can get some idea
of how well the forecasting methods are working.
However, the goal of time series forecasting is to forecast the future. Therefore, we
should NEVER randomly split time series data. Instead, we should force the hold-out
test set to always be in the future!!!!
Let's first check the number of unique years in the data.
In [20]: us_retail_df.Year.value_counts().sort_index()
Out[20]: Year
1990 12
1991 12
1992 12
1993 12
1994 12
1995 12
1996 12
1997 12
1998 12
1999 12
2000 12
2001 12
2002 12
2003 12
2004 12
2005 12
2006 12
2007 12
2008 12
2009 12
2010 12
2011 12
2012 12
2013 12
2014 12
2015 12
2016 12
2017 12
2018 12
2019 9
Name: count, dtype: int64
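The cell that produced Out[21] below is not shown. A minimal sketch, assuming the training set keeps everything through December 2016 and the hold-out test set starts in January 2017 (test_series is a hypothetical name for the hold-out piece):

# TRAINING set: all months through the end of 2016 (27 years x 12 months = 324 values).
train_series = ready_series[ ready_series.index < '2017-01-01' ].copy()

# HOLD-OUT test set: everything from January 2017 forward.
test_series = ready_series[ ready_series.index >= '2017-01-01' ].copy()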
Out[21]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2016-08-01 15864.6
2016-09-01 15750.3
2016-10-01 15899.5
2016-11-01 16260.2
2016-12-01 16394.3
Freq: MS, Name: Employed, Length: 324, dtype: float64
Visualize the TRAINING set and the HOLD-OUT future test set.
In [24]: fig, ax = plt.subplots(figsize=(15, 6))
ax.legend()
plt.show()
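The plotting statements inside the In[24] cell are cut off in this export. A minimal sketch of one way to draw both pieces on shared axes (the label text, and the test_series name from the split sketch above, are assumptions):

fig, ax = plt.subplots(figsize=(15, 6))

# Sketch: overlay the training series and the hold-out test series.
train_series.plot(ax=ax, label='training')
test_series.plot(ax=ax, label='hold-out test')

ax.set_ylabel('Employed')
ax.legend()
plt.show()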
If we remove the "ALL" series...then there will be a gap between the training and test
series.
In [25]: fig, ax = plt.subplots(figsize=(15, 6))
ax.legend()
plt.show()
Simple Forecasting
The two simplest forecasting methods:
Average all historical measurements - all future forecasts equal the AVERAGE
Use the most recent (last) observation as the forecast -> Naive method
The average or MEAN method is easy to calculate...
In [26]: train_series.mean()
Out[26]: 14623.75277777778
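The cell producing Out[27] is not shown; presumably it grabs the LAST training observation, which is what the Naive method will use:

# Last (most recent) observation in the training set.
train_series.iloc[-1]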
Out[27]: 16394.3
The Naive method literally uses the LAST observation as the forecast.
In [28]: train_series
Out[28]: date_dt
1990-01-01 13255.8
1990-02-01 12966.3
1990-03-01 12938.2
1990-04-01 13012.3
1990-05-01 13108.3
...
2016-08-01 15864.6
2016-09-01 15750.3
2016-10-01 15899.5
2016-11-01 16260.2
2016-12-01 16394.3
Freq: MS, Name: Employed, Length: 324, dtype: float64
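The cells that create the my_forecasts DataFrame (In[29] through In[31]) are not shown. A minimal sketch, assuming it starts from the observed hold-out values (test_series is the hypothetical name from the split sketch above):

# Start the forecast comparison DataFrame from the observed hold-out values.
my_forecasts = test_series.to_frame(name='observed')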
In [32]: my_forecasts.head()
Out[32]: observed
date_dt
2017-01-01 15854.4
2017-02-01 15627.9
2017-03-01 15635.0
2017-04-01 15686.6
2017-05-01 15759.5
Forecast using the AVERAGE or MEAN method.
In [35]: my_forecasts['AVERAGE'] = train_series.mean()
In [36]: my_forecasts.head()
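The cell that adds the Naive column (between In[36] and In[38]) is not shown; presumably:

# Naive forecast: every future month is forecast with the last training observation.
my_forecasts['Naive'] = train_series.iloc[-1]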
In [38]: my_forecasts.head()
plt.show()
Let's use Pandas plotting and matplotlib plotting to show the training set, the forecasts,
and the test set in a single plot. Neither approach captures the repeating patterns
associated with the hold-out test set. However, the Naive method is at least "in the right
ballpark" compared to the AVERAGE method in this example.
In [40]: fig, ax = plt.subplots( figsize=(15, 6) )
ax.legend()
plt.show()
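The body of the In[40] cell is cut off in this export. A minimal sketch of one way to overlay everything on a single matplotlib Axes (the label text is an assumption):

fig, ax = plt.subplots(figsize=(15, 6))

# Sketch: training data, hold-out observations, and the two simple forecasts.
train_series.plot(ax=ax, label='training')
my_forecasts.observed.plot(ax=ax, label='hold-out test')
my_forecasts.AVERAGE.plot(ax=ax, style='--', label='AVERAGE forecast')
my_forecasts.Naive.plot(ax=ax, style='--', label='Naive forecast')

ax.legend()
plt.show()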
We know from our exploration...that there is a SEASONAL pattern present in this data
set!!!!
We can modify our simple forecasts to account for the seasonality by using: SEASONAL
NAIVE forecasting!!!!
Seasonal Naive corresponds to using the last or most recent season as the forecast for
all future seasons.
Future forecasts for May will correspond to the most recently observed (last) value for
May, while future forecasts for October will be the last October value. Therefore, not all
seasonal (month, in this case) forecasts are the same. The seasonal (monthly) variation
is preserved based on the last year in the training data.
The last year in the training data is 2016:
In [42]: us_retail_df.loc[ us_retail_df.Year == 2016 ]
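The cells that assemble the Seasonal Naive forecast are only partially shown. A minimal sketch of the idea, using hypothetical helper names, is below; the following cells suggest the notebook itself does this with .dt accessors and a merge on Month.

# Last training year (2016): one value per month.
last_year = train_series[ train_series.index >= '2016-01-01' ]
month_to_2016_value = dict(zip(last_year.index.month, last_year.values))

# Seasonal Naive: forecast each hold-out month with the 2016 value for that same month.
my_forecasts['Seasonal_Naive'] = [ month_to_2016_value[m] for m in my_forecasts.index.month ]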
In [45]: my_forecasts_b.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date_dt 33 non-null datetime64[ns]
1 observed 33 non-null float64
2 AVERAGE 33 non-null float64
3 Naive 33 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 1.2 KB
Let's extract the Date Time components of Year and Month from the date_dt column.
In [47]: my_forecasts_b['Year'] = my_forecasts_b.date_dt.dt.year
In [48]: my_forecasts_b.head()
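The companion cell that extracts the month (In[49]) is not shown; presumably:

my_forecasts_b['Month'] = my_forecasts_b.date_dt.dt.month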
In [50]: my_forecasts_b.head()
In [54]: my_forecasts_c.head()
In [55]: my_forecasts_c.set_index('date_dt').head()
In [59]: my_forecasts_d.head()
ax.legend()
plt.show()
Combine simple forecast with Time Series Decomposition
This approach uses a time series decomposition method to enable a simple forecaster,
which must then be re-seasonalized. Let's use the STL decomposition for this example.
In [61]: from statsmodels.tsa.seasonal import STL
plt.show()
We will use the Naive method...but apply the Naive logic to the seasonally adjusted
data. Thus, we will use the last or most recent seasonally adjusted value.
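The cells that fit the STL decomposition and seasonally adjust the training data are not all shown. A minimal sketch, assuming the names that appear in the later cells (train_stl_fit, df_stl_train):

# Fit STL to the monthly training series.
train_stl_fit = STL(train_series, period=12).fit()

# Seasonally adjusted training data: observed minus the estimated seasonal component.
df_stl_train = pd.DataFrame({'observed': train_series,
                             'season': train_stl_fit.seasonal,
                             'trend': train_stl_fit.trend})
df_stl_train['seasonal_adjust'] = df_stl_train.observed - df_stl_train.season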
In [66]: df_stl_train.seasonal_adjust.iloc[-1]
Out[66]: 15913.226856066618
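The cell producing Out[67] is not shown; presumably it selects the last training year of the estimated seasonal component:

train_stl_fit.seasonal[ train_stl_fit.seasonal.index >= '2016-01-01' ]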
Out[67]: date_dt
2016-01-01 -104.057251
2016-02-01 -279.689973
2016-03-01 -216.257000
2016-04-01 -153.775629
2016-05-01 -64.140218
2016-06-01 25.390812
2016-07-01 34.305993
2016-08-01 -1.915552
2016-09-01 -134.848905
2016-10-01 15.664630
2016-11-01 364.508477
2016-12-01 481.073144
Freq: MS, Name: season, dtype: float64
ADD the seasonally adjusted Naive value to the most recent year's Seasonal
component!!!
In [68]: train_stl_fit.seasonal[ train_stl_fit.seasonal.index >= '2016-01-01' ] + df_stl_train.seasonal_adjust.iloc[-1]
Out[68]: date_dt
2016-01-01 15809.169605
2016-02-01 15633.536883
2016-03-01 15696.969856
2016-04-01 15759.451227
2016-05-01 15849.086638
2016-06-01 15938.617668
2016-07-01 15947.532849
2016-08-01 15911.311304
2016-09-01 15778.377951
2016-10-01 15928.891486
2016-11-01 16277.735333
2016-12-01 16394.300000
Freq: MS, Name: season, dtype: float64
In [71]: df_reseason_naive_forecast
In [73]: df_reseason_naive_forecast
Merge the above forecasts with the larger hold-out test forecast DataFrame.
In [74]: df_reseason_naive_forecast.loc[:, ['Month', 'season']].rename(columns={'seas
how='left').\
copy()
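The In[74] cell above is cut off in this export. A minimal sketch of the idea, using hypothetical names for the renamed column and the result (the later cells refer to my_forecasts_e): attach the re-seasonalized Naive forecast to the hold-out forecast DataFrame by matching on Month.

# Hypothetical reconstruction of the truncated merge cell.
my_forecasts_e = my_forecasts_d.merge(
    df_reseason_naive_forecast.loc[:, ['Month', 'season']]
        .rename(columns={'season': 'STL_Reseason_Naive'}),   # hypothetical column name
    on='Month',
    how='left',
).copy()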
In [77]: my_forecasts_e.head()
In [79]: my_forecasts_f.head()
ax.legend()
plt.show()
Model selection
You have seen 4 different forecasting methods in this report. Three of the methods
involve NO parameters or model "fitting": summary statistics are used as the forecast!
These three methods are the foundation for all other forecasting methods. The fourth
method is your first "advanced" approach because it combines the decomposition
approach from visualizing and exploring the time series with a simple forecasting
procedure, which enables capturing more advanced patterns. The simple forecasting
methods were executed using Pandas attributes, methods, and functions, but there are
multiple ways to execute these simple strategies.
The 4 methods were visually compared on the hold-out test set. The AVERAGE or MEAN
method clearly does not capture the hold-out test set behavior. However, it is visually
difficult to tell which method is better between the Seasonal Naive and the STL Reseason
Naive approach. Let's quantify the performance on the hold-out test set by calculating a
performance metric appropriate for regression problems. The cells below calculate
the RMSE for each of the 4 forecasting methods. As shown by the values displayed to
the screen, the Seasonal Naive method has the lowest RMSE on the hold-out test set.
Thus, the Seasonal Naive method outperforms the STL Reseasoning approach!
In [80]: np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f.Seasonal_Naive )**2 ).mean() )
Out[80]: 81.74907839347037
Out[81]: 1190.6051791210214
Out[82]: 632.3527036537291
Out[83]: 100.99782463931113
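The In[] cells for Out[81] through Out[83] are not shown; presumably they repeat the same RMSE pattern for the other forecast columns. A sketch (column names other than Seasonal_Naive are assumptions based on the earlier cells):

# RMSE on the hold-out test set for each forecasting method (column names assumed).
for col in ['AVERAGE', 'Naive', 'Seasonal_Naive', 'STL_Reseason_Naive']:
    rmse = np.sqrt( ( ( my_forecasts_f.observed - my_forecasts_f[col] )**2 ).mean() )
    print(col, rmse)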
Conclusion
Three of the four methods involve zero parameters. They are summary statistics which
are VERY easy to interpret and describe. You should always include these simple
methods as benchmarks to compare against more complex time series forecasting
methods.