Complete Guide To Time Series Forecasting (With Codes in Python)

The document provides a comprehensive guide to time series forecasting with codes in Python. It discusses loading and handling time series data in Pandas, checking and ensuring stationarity, and building forecasting models. The guide covers theoretical concepts and applies them in Python codes for an end-to-end demonstration of the time series forecasting process.

Complete guide to Time Series Forecasting (with Codes in Python) 20/08/19, 12)40 PM

A comprehensive beginner’s guide to create a Time Series Forecast (with Codes in Python and R)
AARSHAY JAIN (https://www.analyticsvidhya.com/blog/author/aarshay/), FEBRUARY 6, 2016

https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/ Page 1 of 62
Overview
Learn the steps to create a Time Series forecast
Additional focus on Dickey-Fuller test & ARIMA (Auto-Regressive Integrated Moving Average) models
Learn the concepts theoretically as well as with their implementation in Python

Introduction
Time Series (referred to as TS from now on) is considered to be one of the less known skills in the data science space (even I had little clue about it a couple of days back). I set myself on a journey to learn the basic steps for solving a Time Series problem, and here I am sharing the same with you. These will definitely help you get a decent model in any future project you take up!

[Figure: complete guide to create a time series forecast with python]

Before going through this article, I highly recommend reading A Complete Tutorial on Time Series Modeling (https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/) and taking the Time Series Forecasting course. These focus on the fundamental concepts, while I will focus on using these concepts to solve a problem end-to-end, along with code in Python. Many resources exist for time series in R but very few exist for Python, so I’ll be using Python in this article.

Our journey would go through the following steps:

1. What makes Time Series Special?
2. Loading and Handling Time Series in Pandas
3. How to Check Stationarity of a Time Series?
4. How to make a Time Series Stationary?
5. Forecasting a Time Series

1. What makes Time Series Special?


As the name suggests, TS is a collection of data points collected at constant time intervals. These are analyzed to determine the long-term trend so as to forecast the future or perform some other form of analysis. But what makes a TS different from, say, a regular regression problem? There are 2 things:

1. It is time dependent. So the basic assumption of a linear regression model, that the observations are independent, doesn’t hold in this case.
2. Along with an increasing or decreasing trend, most TS have some form of seasonality trends, i.e. variations specific to a particular time frame. For example, if you see the sales of a woolen jacket over time, you will invariably find higher sales in winter seasons.

Because of the inherent properties of a TS, there are various steps involved in analyzing it. These are discussed in detail below. Let’s start by loading a TS object in Python. We’ll be using the popular AirPassengers data set, which can be downloaded here (https://www.analyticsvidhya.co…content/uploads/2016/02/AirPassengers.csv).

Please note that the aim of this article is to familiarize you with the various techniques used for TS in general. The example considered here is just for illustration, and I will focus on covering a breadth of topics rather than on making a very accurate forecast.

2. Loading and Handling Time Series in Pandas


Pandas has dedicated support for handling TS objects, particularly the datetime64[ns] class, which stores time information and allows us to perform some operations really fast. Let’s start by firing up the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run the magic:  %matplotlib inline

# Make every figure 15 inches wide and 6 inches tall
plt.rcParams['figure.figsize'] = 15, 6

Now, we can load the data set and look at some initial rows and data types of the columns:

data = pd.read_csv('AirPassengers.csv')
print(data.head())
print('\nData Types:')
print(data.dtypes)

[Figure: first rows of the data and the column data types]

The data contains a particular month and the number of passengers travelling in that month. But this is still not read as a TS object, as the data types are ‘object’ and ‘int’. In order to read the data as a time series, we have to pass special arguments to the read_csv command:

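from datetime import datetime

# Parse month strings such as '1949-01' into datetime objects
dateparse = lambda dates: datetime.strptime(dates, '%Y-%m')

data = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month',
                   date_parser=dateparse)
# Note: date_parser is deprecated from pandas 2.0; there, pass date_format='%Y-%m' instead.

print(data.head())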

[Figure: first rows of the data with ‘Month’ parsed as a datetime index]

Let’s understand the arguments one by one:

1. parse_dates: This specifies the column which contains the date-time information. As we saw above, the column name is ‘Month’.
2. index_col: A key idea behind using Pandas for TS data is that the index has to be the variable depicting date-time information. So this argument tells pandas to use the ‘Month’ column as index.
3. date_parser: This specifies a function which converts an input string into a datetime variable. By default, Pandas reads data in the format ‘YYYY-MM-DD HH:MM:SS’. If the data is not in this format, the format has to be manually defined. Something similar to the dateparse function defined here can be used for this purpose.
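The date_parser argument is deprecated from pandas 2.0 onwards. One version-independent alternative (a sketch, not the article’s method) is to read the file normally and convert the index afterwards with pd.to_datetime and an explicit format string. To keep the sketch self-contained, it reads three illustrative rows from an in-memory string instead of AirPassengers.csv:

```python
import io

import pandas as pd

# Three illustrative rows in the same 'YYYY-MM' layout as AirPassengers.csv
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
"""

# Read without any date parsing, using 'Month' as the index
data = pd.read_csv(io.StringIO(csv_text), index_col='Month')

# Convert the string index into datetimes with an explicit format
data.index = pd.to_datetime(data.index, format='%Y-%m')

ts = data['#Passengers']
print(type(ts.index).__name__)   # DatetimeIndex
print(ts.loc['1949-01-01'])      # 112
```

Passing an explicit format also avoids pandas having to guess the layout of each string.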

Now we can see that the data has a time object as index and #Passengers as the column. We can cross-check the datatype of the index with the following command:

data.index

[Figure: output of data.index]

Notice the dtype='datetime64[ns]', which confirms that it is a datetime object. As a personal preference, I would convert the column into a Series object to avoid referring to column names every time I use the TS. Please feel free to use it as a dataframe if that works better for you.

ts = data['#Passengers']
ts.head(10)

[Figure: first 10 values of the ts Series]

Before going further, I’ll discuss some indexing techniques for TS data. Let’s start by selecting a particular value in the Series object. This can be done in the following 2 ways:

#1. Specify the index as a string constant:

ts['1949-01-01']

#2. Import the datetime library and use 'datetime' function:

from datetime import datetime

ts[datetime(1949,1,1)]

Both would return the value ‘112’, which can also be confirmed from the previous output. Suppose we want all the data up to May 1949. This can be done in 2 ways:

#1. Specify the entire range:

ts['1949-01-01':'1949-05-01']

#2. Use ':' if one of the indices is at ends:

ts[:'1949-05-01']

Both would yield the following output:

[Figure: values from 1949-01-01 to 1949-05-01]

There are 2 things to note here:

1. Unlike numeric indexing, the end index is included here. For instance, if we index a list as a[:5], it would return the values at indices [0,1,2,3,4]. But here the index ‘1949-05-01’ was included in the output.
2. The indices have to be sorted for ranges to work. If you randomly shuffle the index, this won’t work.
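Both points can be checked on a small synthetic monthly series. This is a minimal sketch; the 24-month series of consecutive integers is a hypothetical stand-in for ts:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: 24 month-start timestamps from January 1949
idx = pd.date_range('1949-01-01', periods=24, freq='MS')
ts = pd.Series(np.arange(24), index=idx)

# Label-based slicing on a DatetimeIndex includes BOTH endpoints
jan_to_may = ts['1949-01-01':'1949-05-01']
print(len(jan_to_may))       # 5 values: Jan, Feb, Mar, Apr, May

# Positional slicing of a plain list excludes the end position
a = list(range(10))
print(a[:5])                 # [0, 1, 2, 3, 4]

# Partial-string indexing selects every timestamp in 1949
print(len(ts.loc['1949']))   # 12

# After a shuffle, sort_index() restores the order that range slicing expects
shuffled = ts.sample(frac=1, random_state=0)
restored = shuffled.sort_index()
print(restored['1949-01-01':'1949-05-01'].equals(jan_to_may))   # True
```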

Consider another instance where you need all the values of the year 1949. This can be done as:

ts['1949']

[Figure: all values for the year 1949]

The month part was omitted. Similarly, if you want all days of a particular month, the day part can be omitted.

Now, let’s move on to analyzing the TS.

3. How to Check Stationarity of a Time Series?


A TS is said to be stationary if its statistical properties, such as mean and variance, remain constant over time. But why is it important? Most of the TS models work on the assumption that the TS is stationary. Intuitively, we can say that if a TS has a particular behaviour over time, there is a very high probability that it will follow the same in the future. Also, the theories related to stationary series are more mature and easier to implement as compared to non-stationary series.

Stationarity is defined using a very strict criterion. However, for practical purposes we can assume the series to be stationary if it has constant statistical properties over time, i.e. the following:

1. constant mean
2. constant variance
3. an autocovariance that does not depend on time.
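In symbols, these three practical conditions are the usual definition of weak (covariance) stationarity. The notation below (X_t for the value at time t, a constant mean μ, a constant variance σ², and an autocovariance γ that depends only on the lag h) is introduced here and is not from the original article:

```latex
\begin{aligned}
\mathbb{E}[X_t] &= \mu && \text{for all } t \\
\operatorname{Var}(X_t) &= \sigma^2 && \text{for all } t \\
\operatorname{Cov}(X_t, X_{t+h}) &= \gamma(h) && \text{for all } t \text{ and every lag } h
\end{aligned}
```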

I’ll skip the details as it is very clearly defined in this article (https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/). Let’s move on to the ways of testing stationarity. First and foremost is to simply plot the data and analyze it visually. The data can be plotted using the following command:

plt.plot(ts)

[Figure: plot of the AirPassengers series]

It is clearly evident that there is an overall increasing trend in the data along with some seasonal variations. However, it might not always be possible to make such visual inferences (we’ll see such cases later). So, more formally, we can check stationarity using the following:

1. Plotting Rolling Statistics: We can plot the moving average or moving variance and see if it varies with time. By moving average/variance I mean that at any instant ‘t’, we’ll take the average/variance of the last year, i.e. the last 12 months. But again this is more of a visual technique.
2. Dickey-Fuller Test: This is one of the statistical tests for checking stationarity. Here the null hypothesis is that the TS is non-stationary. The test results comprise a Test Statistic and some Critical Values for different confidence levels. If the ‘Test Statistic’ is less than the ‘Critical Value’, we can reject the null hypothesis and say that the series is stationary. Refer to this article (https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/) for details.
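To show what the Dickey-Fuller statistic measures, here is a minimal from-scratch sketch of the simplified test: regress the first difference on a constant and the lagged level, and take the t-statistic of the lagged-level coefficient. It is not statsmodels’ adfuller, which also adds lagged differences chosen by AIC. The -2.86 threshold is the approximate asymptotic 5% critical value for the constant-only case, and the white-noise and random-walk series are synthetic assumptions:

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                    # dy_t = y_t - y_{t-1}
    X = np.column_stack([np.ones(len(dy)), y[:-1]])    # columns: constant, y_{t-1}
    beta, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

CRIT_5PCT = -2.86   # approximate asymptotic 5% critical value (constant, no trend)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # stationary: white noise
walk = np.cumsum(noise)        # non-stationary: random walk

for name, series in [('white noise', noise), ('random walk', walk)]:
    stat = dickey_fuller_stat(series)
    verdict = 'reject H0: stationary' if stat < CRIT_5PCT else 'cannot reject H0'
    print(f'{name}: test statistic = {stat:.2f} ({verdict})')
```

A strongly negative statistic means the series keeps getting pulled back towards its mean, which is exactly what a unit-root (non-stationary) series lacks. For real analysis, use adfuller, since ignoring autocorrelated errors can distort this simplified statistic.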

These concepts might not sound very intuitive at this point. I recommend going through the prequel article. If you’re interested in some theoretical statistics, you can refer to Introduction to Time Series and Forecasting by Brockwell and Davis. The book is a bit stats-heavy, but if you have the skill to read between the lines, you can understand the concepts and tangentially touch the statistics.

Back to checking stationarity: we’ll be using the rolling statistics plots along with the Dickey-Fuller test results, so I have defined a function which takes a TS as input and generates them for us. Please note that I’ve plotted the standard deviation instead of the variance to keep the unit similar to the mean.

from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):

    # Determine rolling statistics over a 12-month window
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()

    # Plot rolling statistics:
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test:
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used',
                                             'Number of Observations Used'])
    # dftest[4] maps each significance level ('1%', '5%', '10%') to its critical value
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

The code is pretty straightforward. Please feel free to discuss the code in the comments if you face challenges grasping it.

Let’s run it for our input series:

test_stationarity(ts)

[Figure: rolling mean, rolling standard deviation and Dickey-Fuller test results for ts]

Though the variation in standard deviation is small, the mean is clearly increasing with time, so this is not a stationary series. Also, the test statistic is way more than the critical values. Note that the signed values should be compared and not the absolute values.

Next, we’ll discuss the techniques that can be used to take this TS towards stationarity.

4. How to make a Time Series Stationary?


Though the stationarity assumption is taken in many TS models, almost none of the practical time series are stationary. So statisticians have figured out ways to make series stationary, which we’ll discuss now. Actually, it’s almost impossible to make a series perfectly stationary, but we try to take it as close as possible.

Let’s understand what is making a TS non-stationary. There are 2 major reasons behind the non-stationarity of a TS:
1. Trend – varying mean over time. For example, in this case we saw that on average, the number of passengers was growing over time.
2. Seasonality – variations at specific time-frames. For example, people might have a tendency to buy cars in a particular month because of pay increments or festivals.

The underlying principle is to model or estimate the trend and seasonality in the series and remove those from the series to get a stationary series. Then statistical forecasting techniques can be implemented on this series. The final step would be to convert the forecasted values into the original scale by applying the trend and seasonality constraints back.

Note: I’ll be discussing a number of methods. Some might work well in this case and others might not. But
idea is to get a hang of all the methods and not focus on just the problem at hand.

Let’s start by working on the trend part.

Estimating & Eliminating Trend


One of the first tricks to reduce trend can be transformation. For example, in this case we can clearly see that there is a significant positive trend. So we can apply a transformation which penalizes higher values more than smaller values. These can be taking a log, square root, cube root, etc. Let’s take a log transform here for simplicity:

ts_log = np.log(ts)
plt.plot(ts_log)

[Figure: plot of ts_log]

In this simpler case, it is easy to see a forward trend in the data. But it’s not very intuitive in the presence of noise. So we can use some techniques to estimate or model this trend and then remove it from the series. There can be many ways of doing it, and some of the most commonly used are:

1. Aggregation – taking average for a time period like monthly/weekly averages
2. Smoothing – taking rolling averages
3. Polynomial Fitting – fit a regression model

I will discuss smoothing here, and you should try the other techniques as well, which might work out for other problems. Smoothing refers to taking rolling estimates, i.e. considering the past few instances. There can be various ways, but I will discuss two of those here.
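Polynomial fitting is not demonstrated in the article, so here is a minimal sketch of that option using numpy.polyfit: fit a polynomial in time, treat it as the trend, and subtract it. The synthetic series below (a linear trend plus a 12-month cycle plus noise, loosely mimicking a log-transformed series) is an assumption for illustration:

```python
import numpy as np

# Synthetic stand-in for a log-transformed monthly series (not AirPassengers):
# linear trend + 12-month seasonal cycle + small noise
rng = np.random.default_rng(1)
t = np.arange(144)
y = 4.8 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.02, size=t.size)

# Fit a degree-1 polynomial (a straight line) in time: the estimated trend
coefs = np.polyfit(t, y, deg=1)
trend = np.polyval(coefs, t)

# Subtracting the fitted trend leaves seasonality + noise
detrended = y - trend
print(f'estimated slope: {coefs[0]:.4f}')                  # close to the true 0.01
print(f'mean of detrended series: {detrended.mean():.1e}') # ~0, as for any OLS fit with intercept
```

A higher degree can follow a curved trend, at the cost of fitting noise near the ends of the series, which is where forecasts start.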

Moving average

In this approach, we take the average of ‘k’ consecutive values depending on the frequency of the time series. Here we can take the average over the past 1 year, i.e. the last 12 values. Pandas has specific functions defined for determining rolling statistics.

moving_avg = ts_log.rolling(12).mean()
plt.plot(ts_log)
plt.plot(moving_avg, color='red')

[Figure: ts_log with its 12-month rolling mean in red]

The red line shows the rolling mean. Let’s subtract this from the original series. Note that since we are taking the average of the last 12 values, the rolling mean is not defined for the first 11 values. This can be observed as:

ts_log_moving_avg_diff = ts_log - moving_avg

ts_log_moving_avg_diff.head(12)

[Figure: first 12 values of ts_log_moving_avg_diff]

Notice the first 11 values being NaN. Let’s drop these NaN values and check the plots to test stationarity.

ts_log_moving_avg_diff.dropna(inplace=True)

test_stationarity(ts_log_moving_avg_diff)

[Figure: rolling statistics and Dickey-Fuller test results for ts_log_moving_avg_diff]

This looks like a much better series. The rolling values appear to be varying slightly, but there is no specific trend. Also, the test statistic is smaller than the 5% critical value, so we can say with 95% confidence that this is a stationary series.
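To make the rolling-mean definition concrete: with window k, the value at position t is the plain average of the values at t-k+1, …, t, which is why the first k-1 entries are NaN. A minimal sketch that checks this against a manual computation, on a synthetic series standing in for ts_log:

```python
import numpy as np
import pandas as pd

# Synthetic series standing in for ts_log
s = pd.Series(np.log(np.arange(100, 136, dtype=float)))

k = 12
rolling = s.rolling(window=k).mean()

# The first k-1 positions have fewer than k observations, so they are NaN
print(rolling.iloc[:k - 1].isna().all())   # True

# Every later value is the plain average of the current and previous k-1 values
manual = np.array([s.iloc[i - k + 1:i + 1].mean() for i in range(k - 1, len(s))])
print(np.allclose(rolling.iloc[k - 1:].to_numpy(), manual))   # True

# The moving-average difference series used above, with the NaNs dropped
diff = (s - rolling).dropna()
print(len(diff) == len(s) - (k - 1))       # True
```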

However, a drawback of this particular approach is that the time-period has to be strictly defined. In this case we can take yearly averages, but in complex situations like forecasting a stock price, it’s difficult to come up with a number. So we take a ‘weighted moving average’ where more recent values are given a higher weight. There can be many techniques for assigning weights. A popular one is the exponentially weighted moving average, where weights are assigned to all the previous values with a decay factor. Find details here (http://pandas.pydata.org/pandas-docs/stable/computation.html#exponentially-weighted-moment-functio…). This can be implemented in Pandas as:

expwighted_avg = ts_log.ewm(halflife=12).mean()
plt.plot(ts_log)
plt.plot(expwighted_avg, color='red')

[Figure: ts_log with its exponentially weighted moving average in red]

Note that here the parameter ‘halflife’ is used to define the amount of exponential decay. This is just an assumption here and would depend largely on the business domain. Other parameters like span and center of mass can also be used to define the decay, as discussed in the link shared above. Now, let’s remove this from the series and check stationarity:

ts_log_ewma_diff = ts_log - expwighted_avg

test_stationarity(ts_log_ewma_diff)

[Figure: rolling statistics and Dickey-Fuller test results for ts_log_ewma_diff]

This TS has even smaller variations in mean and standard deviation. Also, the test statistic is smaller than the 1% critical value, which is better than the previous case. Note that in this case there will be no missing values, as all values from the start are given weights. So it’ll work even with no previous values.
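To see how halflife turns into weights: pandas derives the smoothing factor as alpha = 1 - exp(-ln(2) / halflife), so a value’s weight halves every halflife steps back. With the default adjust=True, the average at each point divides the weighted sum by the sum of the weights actually used, which is why there are no leading NaNs. A minimal sketch that reproduces the last value by hand, on a synthetic series standing in for ts_log:

```python
import numpy as np
import pandas as pd

# Synthetic random-walk series standing in for ts_log
rng = np.random.default_rng(2)
x = pd.Series(rng.normal(size=30).cumsum())

halflife = 12
alpha = 1 - 0.5 ** (1 / halflife)   # same as 1 - exp(-ln(2) / halflife)

ewm = x.ewm(halflife=halflife).mean()   # adjust=True is the default

# Manual check at the last point: weight (1 - alpha)**i for the value i steps back
lags = np.arange(len(x))
weights = (1 - alpha) ** lags                   # lag 0 = most recent value
manual_last = np.sum(weights * x.to_numpy()[::-1]) / weights.sum()

print(np.isclose(ewm.iloc[-1], manual_last))    # True
print(ewm.isna().sum())                          # 0: no leading NaNs
```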

Eliminating Trend and Seasonality

The simple trend reduction techniques discussed before don’t work in all cases, particularly the ones with high seasonality. Let’s discuss two ways of removing trend and seasonality:

1. Differencing – taking the difference with a particular time lag
2. Decomposition – modeling both trend and seasonality and removing them from the model.

Differencing

One of the most common methods of dealing with both trend and seasonality is differencing. In this technique, we take the difference of the observation at a particular instant with that at the previous instant. This mostly works well in improving stationarity. First order differencing can be done in Pandas as:

ts_log_diff = ts_log - ts_log.shift()


plt.plot(ts_log_diff)

[Figure: plot of ts_log_diff]

This appears to have reduced the trend considerably. Let’s verify using our plots:

ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)

[Figure: rolling statistics and Dickey-Fuller test results for ts_log_diff]

We can see that the mean and std have small variations with time. Also, the Dickey-Fuller test statistic is less than the 10% critical value, thus the TS is stationary with 90% confidence. We can also take second or third order differences, which might get even better results in certain applications. I leave it to you to try them out.
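The variants mentioned above, and the step that undoes a first difference when converting forecasts back to the original scale, can be sketched in a few lines. The series is a synthetic stand-in for ts_log:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for ts_log
idx = pd.date_range('1949-01-01', periods=36, freq='MS')
s = pd.Series(np.log(np.linspace(100, 200, 36)), index=idx)

# First-order differencing: s.diff() is the same as s - s.shift()
d1 = s.diff()
print(np.allclose(d1, s - s.shift(), equal_nan=True))   # True

# Second-order differencing is differencing applied twice (2 leading NaNs)
d2 = s.diff().diff()

# A seasonal difference subtracts the value 12 periods earlier (12 leading NaNs)
d12 = s.diff(12)

# Undoing a first difference: cumulative sum of the differences plus the first value
restored = d1.fillna(0).cumsum() + s.iloc[0]
print(np.allclose(restored, s))   # True
```

The same cumulative-sum idea is what puts differenced predictions back on the log scale, after which np.exp returns them to the original units.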

Decomposing

In this approach, both trend and seasonality are modeled separately and the remaining part of the series is returned. I’ll skip the statistics and come to the results:

from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(ts_log)

trend = decomposition.trend
seasonal = decomposition.seasonal

residual = decomposition.resid

plt.subplot(411)
plt.plot(ts_log, label='Original')

plt.legend(loc='best')
plt.subplot(412)

plt.plot(trend, label='Trend')
plt.legend(loc='best')

plt.subplot(413)
plt.plot(seasonal,label='Seasonality')

plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals')

plt.legend(loc='best')
plt.tight_layout()

[Figure: original, trend, seasonality and residual components of ts_log]

Here we can see that the trend and seasonality are separated out from the data, and we can model the residuals. Let’s check the stationarity of the residuals:

ts_log_decompose = residual

ts_log_decompose.dropna(inplace=True)
test_stationarity(ts_log_decompose)

[Figure: rolling statistics and Dickey-Fuller test results for the decomposition residuals]

The Dickey-Fuller test statistic is significantly lower than the 1% critical value. So this TS is very close to stationary. You can try advanced decomposition techniques as well, which can generate better results. Also, you should note that converting the residuals into original values for future data is not very intuitive in this case.
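For intuition about what the decomposition computes, here is a from-scratch sketch of the classical additive moving-average decomposition: a centred 2×12 moving average for the trend, the average detrended value at each position in the cycle for the seasonality, and whatever is left as the residual. To our understanding this is the classical approach behind seasonal_decompose’s default additive model, but this is not statsmodels’ code. It also shows why the residuals contain NaN values that have to be dropped: the centred trend is undefined for the first and last half-period. The noise-free test series is an assumption, chosen so the components are known exactly:

```python
import numpy as np

def classical_additive_decompose(x, period=12):
    """Sketch of x = trend + seasonal + residual for an EVEN period."""
    assert period % 2 == 0, 'this sketch only handles even periods'
    x = np.asarray(x, dtype=float)
    half = period // 2

    # Centred moving average over period + 1 points, half weight on the two ends
    filt = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(x.shape, np.nan)
    trend[half:len(x) - half] = np.convolve(x, filt, mode='valid')

    # Seasonal effect: mean detrended value for each position in the cycle
    detrended = x - trend
    pos = np.arange(len(x)) % period
    effects = np.array([np.nanmean(detrended[pos == p]) for p in range(period)])
    effects -= effects.mean()          # make the seasonal effects sum to zero
    seasonal = effects[pos]

    resid = x - trend - seasonal       # NaN wherever the trend is undefined
    return trend, seasonal, resid

# Noise-free test series: linear trend plus a 12-month sine cycle
t = np.arange(72)
x = 2 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = classical_additive_decompose(x)

print(np.isnan(trend[:6]).all(), np.isnan(trend[-6:]).all())   # True True
print(np.nanmax(np.abs(resid)) < 1e-9)                          # True: nothing left over
```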

5. Forecasting a Time Series

We saw different techniques, and all of them worked reasonably well for making the TS stationary. Let’s build a model on the TS after differencing, as it is a very popular technique. Also, it’s relatively easier to add noise and seasonality back into predicted residuals in this case. Having performed the trend and seasonality estimation techniques, there can be two situations:

1. A strictly stationary series with no dependence among the values. This is the easy case, wherein we can model the residuals as white noise. But this is very rare.
2. A series with significant dependence among values. In this case we need to use some statistical models like ARIMA to forecast the data.

Let me give you a brief introduction to ARIMA. I won’t go into the technical details, but you should understand these concepts in detail if you wish to apply them more effectively. ARIMA stands for Auto-Regressive Integrated Moving Averages. The ARIMA forecasting for a stationary time series is nothing but a linear (like a linear regression) equation. The predictors depend on the parameters (p,d,q) of the ARIMA model:

1. Number of AR (Auto-Regressive) terms (p): AR terms are just lags of the dependent variable. For instance, if p is 5, the predictors for x(t) will be x(t-1)….x(t-5).
2. Number of MA (Moving Average) terms (q): MA terms are lagged forecast errors in the prediction equation. For instance, if q is 5, the predictors for x(t) will be e(t-1)….e(t-5), where e(i) is the forecast error at the ith instant, i.e. the difference between the actual value and the model’s prediction.
3. Number of Differences (d): These are the number of nonseasonal differences, i.e. in this case we took the first order difference. So either we can pass that variable and put d=0, or pass the original variable and put d=1. Both will generate the same results.
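Written out, the ARIMA(p,d,q) prediction equation is linear in these predictors. In the notation below, which is introduced here and not taken from the original, x'_t is the series after d rounds of differencing, e_t is the error at time t, and c, φ_i and θ_i are coefficients estimated from the data. Sign conventions for the θ terms vary between textbooks and libraries:

```latex
x'_t = c + \phi_1 x'_{t-1} + \dots + \phi_p x'_{t-p}
         + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} + e_t,
\qquad \text{e.g. } x'_t = x_t - x_{t-1} \text{ when } d = 1
```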

An important concern here is how to determine the values of ‘p’ and ‘q’. We use two plots to determine these numbers. Let’s discuss them first.

1. Autocorrelation Function (ACF): It is a measure of the correlation between the TS and a lagged version of itself. For instance, at lag 5, ACF would compare the series at time instants ‘t1’…’t2’ with the series at instants ‘t1-5’…’t2-5’ (t1-5 and t2 being end points).
2. Partial Autocorrelation Function (PACF): This measures the correlation between the TS and a lagged version of itself, but after eliminating the variations already explained by the intervening comparisons. For example, at lag 5, it will check the correlation but remove the effects already explained by lags 1 to 4.
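The ACF definition can be computed directly with numpy: at lag k, r_k is the sum of (x_t - m)(x_{t+k} - m) divided by the sum of (x_t - m)², where m is the sample mean. This is a minimal sketch on a synthetic AR(1) series whose theoretical ACF is 0.7**k (the series is an assumption for illustration); the ±1.96/√n band is the same one used in the plotting code that follows:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_k = sum (x_t - m)(x_{t+k} - m) / sum (x_t - m)^2."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = dev @ dev
    return np.array([dev[:len(x) - k] @ dev[k:] / denom for k in range(nlags + 1)])

# Synthetic AR(1) series x_t = 0.7 * x_{t-1} + noise: true ACF at lag k is 0.7**k
rng = np.random.default_rng(3)
n = 2000
noise = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + noise[t]

r = sample_acf(x, nlags=5)
band = 1.96 / np.sqrt(n)   # approximate 95% band around zero
print(np.round(r, 2))      # starts at 1.0 and decays roughly like 0.7**k
print(f'approx. 95% band: +/-{band:.3f}')
```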

The ACF and PACF plots for the TS after differencing can be plotted as:

#ACF and PACF plots:

from statsmodels.tsa.stattools import acf, pacf

lag_acf = acf(ts_log_diff, nlags=20)

lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')

#Plot ACF:

plt.subplot(121)
plt.plot(lag_acf)

plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')

plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Autocorrelation Function')

#Plot PACF:
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0,linestyle='--',color='gray')

plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()

[Figure: ACF and PACF plots of ts_log_diff with confidence bands]

In this plot, the two dotted lines on either side of 0 are the confidence intervals (drawn at ±1.96/√n, roughly a 95% band). These can be used to determine the ‘p’ and ‘q’ values as:

1. p – The lag value where the PACF chart crosses the upper confidence interval for the first time. If you notice closely, in this case p=2.
2. q – The lag value where the ACF chart crosses the upper confidence interval for the first time. If you notice closely, in this case q=2.
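The bound and the rule can be sketched in a few lines. `confidence_bound` and `first_crossing` are hypothetical helpers, not part of statsmodels, and `first_crossing` implements one literal reading of the rule above: the first lag at which the value falls to or below the upper line. Many practitioners instead take the last lag still outside the band, which can give a smaller order, so treat the result as a starting point for trying nearby orders.

```python
import math


def confidence_bound(n_obs, z=1.96):
    """Half-width of the approximate 95% band drawn in the ACF/PACF plots."""
    return z / math.sqrt(n_obs)


def first_crossing(values, upper):
    """First lag k >= 1 whose value is at or below the upper bound, else None.

    values[0] is lag 0 (always 1.0 for ACF/PACF), so the search starts at lag 1.
    """
    for k in range(1, len(values)):
        if values[k] <= upper:
            return k
    return None


# Made-up PACF values for illustration, with n = 143 observations.
pacf_values = [1.0, 0.45, 0.12, -0.05, 0.02]
bound = confidence_bound(143)
print(round(bound, 3), first_crossing(pacf_values, bound))
```

With the article's arrays, the equivalent call would be `first_crossing(lag_pacf, confidence_bound(len(ts_log_diff)))`.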

Now, let’s make 3 different ARIMA models considering individual as well as combined effects. I will also print the RSS for each. Please note that here the RSS is computed on the differenced log series (ts_log_diff), not on the actual series.
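The RSS printed in each plot title is the sum of squared gaps between the fitted values and ts_log_diff, that is, on the differenced log scale. Below is a minimal sketch with plain dicts standing in for the pandas Series; the dates and values are made up. Only dates present in both are used here, whereas pandas would align the two indexes and produce NaN for any date missing from either.

```python
def rss(fitted, observed):
    """Residual sum of squares over the dates present in both mappings."""
    common = sorted(set(fitted) & set(observed))
    return sum((fitted[d] - observed[d]) ** 2 for d in common)


# Hypothetical fitted and observed log-differences keyed by month.
fitted = {"1949-02": 0.02, "1949-03": 0.10}
observed = {"1949-02": 0.05, "1949-03": 0.11, "1949-04": -0.03}
print(rss(fitted, observed))
```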

We need to load the ARIMA model first:

from statsmodels.tsa.arima_model import ARIMA  # legacy ARIMA class used throughout this article (removed in statsmodels 0.13)

The p,d,q values can be specified using the order argument of ARIMA, which takes a tuple (p,d,q). Let’s model the 3 cases:

AR Model

model = ARIMA(ts_log, order=(2, 1, 0))

results_AR = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(results_AR.fittedvalues, color='red')
plt.title('RSS: %.4f'% sum((results_AR.fittedvalues-ts_log_diff)**2))

[Figure: ts_log_diff with the AR model’s fitted values in red; RSS in the title]
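Roughly speaking, the AR part of the fit asks how well each change in ts_log can be predicted from the two previous changes. The sketch below illustrates that idea with conditional least squares on simulated AR(2) data with assumed coefficients 0.5 and 0.3. It is a simplified stand-in, since statsmodels estimates the model by maximum likelihood, and all function names are illustrative. In the article's setting, the input would be the differenced series ts_log_diff.

```python
import random


def simulate_ar2(phi1, phi2, n, c=0.0, seed=0):
    """Simulate y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t with N(0, 1) noise."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + 100):  # 100 extra values as burn-in, dropped below
        y.append(c + phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[-n:]


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x


def fit_ar2_cls(y):
    """Conditional least squares for AR(2) with intercept; returns [c, phi1, phi2]."""
    rows = [(1.0, y[t - 1], y[t - 2]) for t in range(2, len(y))]
    targets = [y[t] for t in range(2, len(y))]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * tgt for row, tgt in zip(rows, targets)) for i in range(3)]
    return solve_linear(xtx, xty)


series = simulate_ar2(0.5, 0.3, n=2000)
c_hat, phi1_hat, phi2_hat = fit_ar2_cls(series)
print(f"c={c_hat:.3f} phi1={phi1_hat:.3f} phi2={phi2_hat:.3f}")
```

The estimates should land close to the assumed 0.5 and 0.3. On real data, statsmodels' likelihood-based estimates will generally differ somewhat from this least-squares shortcut.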

MA Model

model = ARIMA(ts_log, order=(0, 1, 2))

results_MA = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(results_MA.fittedvalues, color='red')
plt.title('RSS: %.4f'% sum((results_MA.fittedvalues-ts_log_diff)**2))

[Figure: ts_log_diff with the MA model’s fitted values in red; RSS in the title]

Combined Model

model = ARIMA(ts_log, order=(2, 1, 2))


results_ARIMA = model.fit(disp=-1)
plt.plot(ts_log_diff)
plt.plot(results_ARIMA.fittedvalues, color='red')

plt.title('RSS: %.4f'% sum((results_ARIMA.fittedvalues-ts_log_diff)**2))

[Figure: ts_log_diff with the combined ARIMA model’s fitted values in red; RSS in the title]

Here we can see that the AR and MA models have almost the same RSS, but the combined model is significantly better. Now, we are left with one last step, i.e. taking these values back to the original scale.
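Why are the fitted values not already on the passenger scale? With d=1, the model works on the first differences of ts_log, so each fitted value is a predicted month-to-month change in log passengers. The sketch below shows the one-step recursion behind such fitted values for an ARMA(p, q) on an already differenced series. The coefficients are hypothetical, pre-sample values and errors are set to zero (a conditional simplification that statsmodels' exact likelihood does not make), and the function name is illustrative rather than the statsmodels API.

```python
def arma_fitted(diffs, phi, theta, c=0.0):
    """One-step-ahead fitted values of an ARMA(p, q) on a differenced series.

    pred_t = c + sum_i phi[i] * diffs[t-1-i] + sum_j theta[j] * err[t-1-j],
    where err_t = diffs_t - pred_t; terms before the start are treated as 0.
    """
    preds, errors = [], []
    for t, value in enumerate(diffs):
        ar = sum(p * diffs[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(q * errors[t - 1 - j] for j, q in enumerate(theta) if t - 1 - j >= 0)
        pred = c + ar + ma
        preds.append(pred)
        errors.append(value - pred)
    return preds


# Hypothetical ARMA(1, 1) coefficients applied to three made-up log-differences.
print(arma_fitted([0.05, 0.02, -0.03], phi=[0.5], theta=[0.5]))
```

These outputs are changes in log passengers. That is why they have to be cumulated, added to the first log value and exponentiated before they can be compared with the original series.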

Taking it back to original scale

Since the combined model gave the best result, let’s scale it back to the original values and see how well it performs there. The first step is to store the predicted results as a separate series and observe it.

predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)


print(predictions_ARIMA_diff.head())

[Figure: output of predictions_ARIMA_diff.head()]

Notice that these start from ‘1949-02-01’ and not the first month. Why? This is because we took a lag by 1, and the first element doesn’t have anything before it to subtract from. The way to convert the differencing back to log scale is to add these differences consecutively to the base number. An easy way to do it is to first determine the cumulative sum at each index and then add it to the base number. The cumulative sum can be found as:

predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()

print(predictions_ARIMA_diff_cumsum.head())

[Figure: output of predictions_ARIMA_diff_cumsum.head()]

You can quickly do some back-of-the-envelope calculations using the previous output to check if these are correct. Next, we have to add them to the base number. For this, let’s create a series with all values as the base number and add the differences to it. This can be done as:

predictions_ARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum, fill_value=0)
predictions_ARIMA_log.head()

[Figure: output of predictions_ARIMA_log.head()]

Here the first element is the base number itself, and from there on the values are cumulatively added. The last step is to take the exponent and compare with the original series.

predictions_ARIMA = np.exp(predictions_ARIMA_log)

plt.plot(ts)
plt.plot(predictions_ARIMA)
plt.title('RMSE: %.4f'% np.sqrt(sum((predictions_ARIMA-ts)**2)/len(ts)))

[Figure: original series ts with predictions_ARIMA overlaid; RMSE in the title]
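The whole round trip can be sanity-checked on a toy case. The plain-Python sketch below (names are illustrative) mirrors the steps above: start from the first log value, add the cumulative sum of the log differences, exponentiate, and score the result with the same RMSE formula as in the plot title. Fed the exact differences of the first four AirPassengers values (112, 118, 132, 129), it should reproduce them up to floating-point error. With fitted differences instead, an error at one step is carried into every later value by the cumulative sum.

```python
import math


def undo_log_diff(base_log, diffs):
    """Rebuild original-scale values from a base log value and first differences."""
    log_levels = [base_log]  # first element is the base number itself
    running = 0.0
    for d in diffs:
        running += d  # cumulative sum of the differences
        log_levels.append(base_log + running)
    return [math.exp(v) for v in log_levels]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


series = [112, 118, 132, 129]
logs = [math.log(v) for v in series]
log_diffs = [logs[i] - logs[i - 1] for i in range(1, len(logs))]
rebuilt = undo_log_diff(logs[0], log_diffs)
print([round(v, 6) for v in rebuilt], rmse(rebuilt, series))
```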

That’s all in Python. Now, let’s learn how to implement a time series forecast in R.

Time Series Forecast in R

Step 1: Reading data and calculating basic summary

#Installing packages and calling out the libraries
install.packages("summarytools")
install.packages("tseries")
install.packages("forecast")

library(forecast)
library(ggplot2)
library(tseries)
library(summarytools)

#Reading the AirPassengers data
data("AirPassengers")
tsdata<-AirPassengers
#Identifying the class of data
class(tsdata)
#Observations of the time series data
tsdata
#Summary of the data and missing values
dfSummary(tsdata)


Output

> class(tsdata)
"ts"

> #Observations of the time series data
> tsdata
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

> #Summary of the data and missing values

tsdata was converted to a data frame

Data Frame Summary
tsdata
Dimensions: 144 x 1
Duplicates: 26

No  Variable  Stats / Values            Freqs (% of Valid)   Valid       Missing
--  --------  ------------------------  -------------------  ----------  -------
1   tsdata    Mean (sd) : 280.3 (120)   118 distinct values  144 (100%)  0 (0%)
    [ts]      min < med < max:          Start: 1949-01
              104 < 265.5 < 622         End  : 1960-12
              IQR (CV) : 180.5 (0.4)
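The headline numbers in this summary can be reproduced outside R. The sketch below is in Python, like the rest of the article's code, and uses only the standard library; `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as R's default quantile definition, and the data are the 144 monthly values printed above. The function name `summarize` is illustrative.

```python
import statistics

# Monthly AirPassengers totals (thousands), Jan 1949 to Dec 1960, as printed above.
AIR_PASSENGERS = [
    112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
    115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
    145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
    171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194,
    196, 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201,
    204, 188, 235, 227, 234, 264, 302, 293, 259, 229, 203, 229,
    242, 233, 267, 269, 270, 315, 364, 347, 312, 274, 237, 278,
    284, 277, 317, 313, 318, 374, 413, 405, 355, 306, 271, 306,
    315, 301, 356, 348, 355, 422, 465, 467, 404, 347, 305, 336,
    340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337,
    360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405,
    417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432,
]


def summarize(values):
    """The main statistics dfSummary reports for a numeric column."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), like R's sd()
    distinct = len(set(values))
    return {
        "mean": mean, "sd": sd, "min": min(values), "median": median,
        "max": max(values), "iqr": q3 - q1, "cv": sd / mean,
        "distinct": distinct, "duplicates": len(values) - distinct,
    }


print(summarize(AIR_PASSENGERS))
```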

Step 2: Checking the cycle of Time Series Data and Plotting the Raw Data

#Check the cycle of data and plot the raw data
as.data.frame(tsdata)
cycle(tsdata)
plot(tsdata, ylab="Passengers (1000s)", type="o")


Output

> cycle(tsdata)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949   1   2   3   4   5   6   7   8   9  10  11  12
1950   1   2   3   4   5   6   7   8   9  10  11  12
1951   1   2   3   4   5   6   7   8   9  10  11  12
1952   1   2   3   4   5   6   7   8   9  10  11  12
1953   1   2   3   4   5   6   7   8   9  10  11  12
1954   1   2   3   4   5   6   7   8   9  10  11  12
1955   1   2   3   4   5   6   7   8   9  10  11  12
1956   1   2   3   4   5   6   7   8   9  10  11  12
1957   1   2   3   4   5   6   7   8   9  10  11  12
1958   1   2   3   4   5   6   7   8   9  10  11  12
1959   1   2   3   4   5   6   7   8   9  10  11  12
1960   1   2   3   4   5   6   7   8   9  10  11  12

Step 3: Decomposing the time series data

#Decomposing the data into its trend, seasonal, and random error components
tsdata_decom <- decompose(tsdata, type = "multiplicative")
plot(tsdata_decom)


Output: [Figure: plot(tsdata_decom), panels observed, trend, seasonal and random]
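decompose() performs a classical decomposition. The plain-Python sketch below re-implements those steps as we understand them: a centred 2x12 moving average for the trend, month-by-month averages of x/trend rescaled to mean 1 for the seasonal factors, and x/(trend*seasonal) for the random part. It assumes an even period and a series that starts in January, and it is illustrative rather than R's actual code. It also shows where the missing values come from. The moving average needs six months on each side, so the trend and random components are undefined for the first and last six observations, which is why observations 7 to 138 of the random component are used in Step 4.

```python
# Monthly AirPassengers totals (thousands), Jan 1949 to Dec 1960, as printed above.
AIR_PASSENGERS = [
    112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
    115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
    145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
    171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194,
    196, 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201,
    204, 188, 235, 227, 234, 264, 302, 293, 259, 229, 203, 229,
    242, 233, 267, 269, 270, 315, 364, 347, 312, 274, 237, 278,
    284, 277, 317, 313, 318, 374, 413, 405, 355, 306, 271, 306,
    315, 301, 356, 348, 355, 422, 465, 467, 404, 347, 305, 336,
    340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337,
    360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405,
    417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432,
]


def decompose_multiplicative(x, period=12):
    """Classical multiplicative decomposition: returns (trend, seasonal, random).

    Assumes an even period and that x[0] is the first position of the cycle.
    Undefined values are returned as None.
    """
    n = len(x)
    half = period // 2
    # Centred moving average with weights 0.5, 1, ..., 1, 0.5 (they sum to period).
    weights = [0.5] + [1.0] * (period - 1) + [0.5]
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]
        trend[t] = sum(w * v for w, v in zip(weights, window)) / period
    # Seasonal figure: mean ratio x/trend for each month, rescaled to average 1.
    figure = []
    for pos in range(period):
        ratios = [x[t] / trend[t] for t in range(pos, n, period) if trend[t] is not None]
        figure.append(sum(ratios) / len(ratios))
    scale = sum(figure) / period
    figure = [f / scale for f in figure]
    seasonal = [figure[t % period] for t in range(n)]
    random_part = [
        None if trend[t] is None else x[t] / (trend[t] * seasonal[t]) for t in range(n)
    ]
    return trend, seasonal, random_part


trend, seasonal, random_part = decompose_multiplicative(AIR_PASSENGERS)
print("first defined trend value (Jul 1949):", round(trend[6], 4))
print("July seasonal factor:", round(seasonal[6], 3))
print("defined random values:", sum(v is not None for v in random_part))
```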

Step 4: Test the stationarity of data

#Testing the stationarity of the data
#Augmented Dickey-Fuller Test
adf.test(tsdata)

#Autocorrelation test
autoplot(acf(tsdata,plot=FALSE))+ labs(title="Correlogram of Air Passengers data")

tsdata_decom$random
autoplot(acf(tsdata_decom$random[7:138],plot=FALSE))+ labs(title="Correlogram of Air Passenge


Output

Augmented Dickey-Fuller Test

data: tsdata
Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

The p-value is 0.01, which is < 0.05. Therefore, we reject the null hypothesis (a unit root), and by this test the time series is stationary. Keep in mind that adf.test() does not report p-values below 0.01 (smaller ones are printed as 0.01), and its test regression includes a linear trend, so strictly the result points to stationarity around a trend.

The notable spike in the correlogram is at lag 1, i.e. 12 months (for a monthly ts the lag axis is in years), which indicates a positive relationship with the 12-month cycle.

The second correlogram is of the random component, using observations 7:138, which excludes the NA values at both ends of the decomposition.

Step 5: Fitting the model

#Fitting the model
#Linear model
autoplot(tsdata) + geom_smooth(method="lm")+ labs(x ="Date", y = "Passenger numbers (1000's)

#ARIMA Model
arimats <- auto.arima(tsdata)
arimats
library(ggfortify) # ggtsdiag() is provided by the ggfortify package
ggtsdiag(arimats)


Output


Series: tsdata
ARIMA(2,1,1)(0,1,0)[12]

Coefficients:
         ar1     ar2      ma1
      0.5960  0.2143  -0.9819
s.e.  0.0888  0.0880   0.0292

sigma^2 estimated as 132.3:  log likelihood=-504.92
AIC=1017.85   AICc=1018.17   BIC=1029.35
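The three information criteria follow from the log likelihood. As a check on how they are computed, the sketch below assumes k = 4 estimated parameters (ar1, ar2 and ma1 plus sigma^2) and n = 131 observations (144 months minus one regular and twelve seasonal differences), which is how we understand the forecast package counts them. It is written in Python, like the article's other code.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc and BIC from a log likelihood.

    k: number of estimated parameters (here assumed to include sigma^2);
    n: number of observations the likelihood is computed on.
    """
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    return aic, aicc, bic


# ARIMA(2,1,1)(0,1,0)[12] output above: ar1, ar2, ma1 plus sigma^2 -> k = 4;
# 144 months minus 1 regular and 12 seasonal differences -> n = 131 (assumed).
aic, aicc, bic = information_criteria(-504.92, k=4, n=131)
print(f"AIC={aic:.2f} AICc={aicc:.2f} BIC={bic:.2f}")
```

It prints values within about 0.02 of the AIC, AICc and BIC shown above; the small gaps come from the log likelihood being rounded to two decimals in the printout.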


Step 6: Forecasting

#Forecast of Arima Model
fts <- forecast(arimats, level = c(95))
autoplot(fts)


Output: [Figure: autoplot(fts), ARIMA forecasts with a 95% prediction interval]
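The shaded band is a 95% prediction interval. A textbook way to approximate its width uses the psi-weights of the model's MA(infinity) form: the h-step forecast variance is sigma^2 (psi_0^2 + ... + psi_(h-1)^2). The sketch below applies that formula to the coefficients printed by auto.arima above. It treats the estimates as known and is not how forecast() computes intervals internally (R's predict for Arima models works from the state-space form), so read the numbers as approximate. All function names are illustrative.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials in the backshift operator B (lists of coefficients)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def psi_weights(ar, ma, d=0, seasonal_d=0, period=12, count=24):
    """First `count` psi-weights of an ARIMA model with optional seasonal differencing.

    Sign convention as in R's output: (1 - ar1*B - ...) x_t = (1 + ma1*B + ...) e_t.
    """
    ar_poly = [1.0] + [-a for a in ar]
    for _ in range(d):
        ar_poly = poly_mul(ar_poly, [1.0, -1.0])  # (1 - B)
    for _ in range(seasonal_d):
        ar_poly = poly_mul(ar_poly, [1.0] + [0.0] * (period - 1) + [-1.0])  # (1 - B^period)
    phi = [-c for c in ar_poly[1:]]  # x_t = phi_1 x_{t-1} + ... + MA terms
    theta = [1.0] + list(ma)
    psi = []
    for j in range(count):
        value = theta[j] if j < len(theta) else 0.0
        value += sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi.append(value)
    return psi


def interval_half_widths(psi, sigma2, z=1.96):
    """Approximate 95% half-widths for horizons 1..len(psi)."""
    widths, total = [], 0.0
    for p in psi:
        total += p * p
        widths.append(z * math.sqrt(sigma2 * total))
    return widths


# Coefficients and sigma^2 from the ARIMA(2,1,1)(0,1,0)[12] output above.
psi = psi_weights(ar=[0.5960, 0.2143], ma=[-0.9819], d=1, seasonal_d=1, count=24)
widths = interval_half_widths(psi, sigma2=132.3)
print("1-step half-width:", round(widths[0], 1))
print("12-step half-width:", round(widths[11], 1))
print("24-step half-width:", round(widths[23], 1))
```

The widths grow with the horizon because every extra step adds another squared psi-weight to the variance, which matches the widening band in the plot.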


Finally we have a forecast at the original scale. Not a very good forecast I would say, but you got the idea, right? Now, I leave it up to you to refine the methodology further and make a better solution.

Projects
Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Test the techniques discussed in this post and accelerate your learning in Time Series Analysis with the following Practice Problems:

Practice Problem: Food Demand Forecasting


(https://datahack.analyticsvidhya.com/contest/genpact-machine-learning-hackathon-1/?utm_source=complete-tutorial-learn-data-science-python-scratch-2&utm_medium=blog)


Practice Problem: Time Series Analyses


(https://datahack.analyticsvidhya.com/conte
problem-time-series-2/?utm_source=time-ser
forecasting-codes-python&utm_medium=blog

End Notes
Through this article I have tried to give you a standard approach for solving time series problems. This couldn’t have come at a better time, as today is our Mini DataHack (http://datahack.analyticsvidhya.com/contes datahack), which will challenge you to solve a similar problem. We’ve covered concepts of stationarity, how to take a time series closer to stationarity and finally forecasting the residuals. It was a long journey and I skipped some statistical details, which I encourage you to refer to using the suggested material. If you don’t want to copy-paste, you can download the iPython notebook with all the codes from my GitHub (https://github.com/aarshayj/Analytics_Vidhya/tree/master/Articles) repository.

I hope this article will help you achieve a good first solution today. All the best guys!

Did you like the article? How helpful was it in the hackathon today? Is something bothering you which you wish to discuss further? Please feel free to post a comment and I’ll be more than happy to discuss.

Note – The discussions of this article are going on at AV’s Discuss portal.
here (https://discuss.analyticsvidhya.com/t/discussions-for-article-a-
comprehensive-beginners-guide-to-create-a-time-series-forecast-with-code
in-python/65783?u=jalfaizy)!


Aarshay Jain
Aarshay is an ML enthusiast, pursuing MS in Data Science at Columbia University, graduating in Dec 2017. He is currently exploring the various ML techniques and writes articles for AV to share his knowledge with the community.
LinkedIn (https://in.linkedin.com/in/aarshayjain), GitHub (https://github.com/aarshayj)


72 COMMENTS

DR.D.K.SAMUEL
February 6, 2016 at 10:02 am

Real thanks for a post which met my need

AARSHAY JAIN
February 6, 2016 at 10:04 am

I’m glad you liked it

SATISH
February 7, 2016 at 3:17 pm

Thanks for the great explanation related to time series. What is the difference between Holt-Winters and ARIMA forecasting?

SHAN
February 8, 2016 at 6:09 am

Holt-Winters is a double exponential smoothing method. ARIMA forecasts by identifying the p,d,q components of a series. Hope it helps.

AARSHAY JAIN
February 8, 2016 at 6:51 am

To add to Shan, Holt-Winters uses a weighted average of past values while ARIMA uses past values and past errors. You can find more details in this UK Office for National Statistics paper: “From Holt-Winters to ARIMA modelling – measuring the impact on forecasting errors for components of quarterly estimates of public service output” (www.ons.gov.uk).

TL
April 11, 2016 at 12:59 am

Holt-Winters (at least the additive model) is a special case of the ARIMA model (a seasonal ARIMA model). That would be an ARIMA(p,d,q)(P,D,Q), where the second parentheses contain the seasonal effects. I would additionally recommend checking out any of Rob Hyndman’s work on ARIMA modeling; I find it to be very accessible.

SHAN
February 7, 2016 at 6:12 pm

Hi..
Thanks for an informative article.
I am eager to know the following:

a) How can we identify what nlags value to test with?
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')

b) How can we forecast for future time points (say 12 time points ahead)?
Can we still use the following?
predictions_ARIMA_log = pd.Series(ts_log.ix[0], index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
ts_log is not available for future points.

c) In one of the articles referred to by you (A Complete Tutorial on Time Series Modeling in R),
the ADF test is performed as:
adf.test(diff(log(AirPassengers)), alternative="stationary", k=0)
What is k, and how can we identify the value of k while performing the test?

and ARIMA is fitted as:
fit <- arima(log(AirPassengers), c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))

We can identify the (p,d,q) from the ACF and PACF plots.
Please explain the parameter seasonal = list(order = c(0, 1, 1)).
What values should we pass in the seasonal parameter and how do we identify them?

It will be helpful if you guide on the above. Thanks in anticipation.

AARSHAY JAIN
February 8, 2016 at 6:52 am

Hi Shan,

Thanks for reaching out. Please find my responses below:

a) The ‘nlags’ doesn’t affect the output values. It just specifies how many values to display. So you can start with a small number and, if you don’t find the crossing point within that, you can increase it, up to a maximum of the number of observations in the data.

b) ARIMA has a specific function for forecasting values. The ‘results_ARIMA’ variable here is of the type ‘ARIMAResults’, which has a ‘predict’ function. You can check the details at:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.arima_model.ARMAResult
Please feel free to get back to me in case you face challenges in implementing this. You can also start a thread in the discussion forum, which will allow more freedom of expression while discussing.

c) I’m not much experienced with R, so let me read the code syntax. I’ll get back to you on this.

Cheers!

CHIRAG BHATIA
July 15, 2016 at 5:48 am

Hi Aarshay,
I am trying to predict future values on the same AirPassengers data but I am not getting correct results. I may be missing some parameters while predicting. Please help me; I have been stuck for 2 days. My code is:

import pandas as pd
import numpy as np
from statsmodels.tsa.arima_model import ARIMA
import matplotlib.pylab as plt

data_1 = pd.read_csv('AirPassengers.csv')
avg= data_1['#Passengers']
avg=list(avg)
res = pd.Series(avg, index=pd.to_datetime(data_1['Month'],format='%Y-%m'))

ts=np.log(res)
ts_diff = ts - ts.shift()
ts_diff.dropna(inplace=True)
r = ARIMA(ts,(2,1,2))
r = r.fit(disp=-1)
pred = r.predict(start='1961-01',end='1970-01')
dates = pd.date_range('1961-01','1970-01',freq='M')
# print dates
predictions_ARIMA_diff = pd.Series(pred, copy=True)
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
predictions_ARIMA_log = pd.Series(ts.ix[0])
predictions_ARIMA_log=predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.plot(res)
plt.plot(predictions_ARIMA)
# plt.title('RMSE: %.4f'% np.sqrt(sum((predictions_ARIMA-ts1)**2)/len(ts)))
plt.show()

print predictions_ARIMA.head()
print ts.head()

AARSHAY JAIN
February 8, 2016 at 12:32 pm

Hi Shan,

I guess you have started a separate discussion thread for your query ‘c’. Let’s continue the discussion there. For others who’re reading this and are interested in exploring further, please check out this link:

http://discuss.analyticsvidhya.com/t/seasonal-parameter-in-arima-and-adf-test/7385/1

Cheers!

AMITSETHIA
February 13, 2016 at 12:23 pm

Thanks Aarshay for this write-up. It is also recommended not to go for combined models, as p & q used together will nullify their impact on the model; hence, it is either a moving average or autocorrelation along with differences. But here the combined model has given the best results. Can you please correct my understanding around combined models?

AARSHAY JAIN
February 13, 2016 at 12:50 pm

I haven’t read that p & q should not be combined. It actually appears counter-intuitive, because if that was the case, ARIMA should not exist in the first place. Can you throw some light on why you believe that they cancel out the effect of one another?

AAYUSH KUMAR SINGHA
February 29, 2016 at 10:58 pm

Hi!
The article is the best available on Time Series with Python, with great external links too for those who want to understand the stats behind it.
I would like to request you to please extend this article to predict out-of-sample data ranges also, with different methods to depict the better ones, as you did for eliminating trend (taking rolling average and ewma).
That will make it a fully fledged time-series article.
Thanks in advance.

AARSHAY JAIN
March 1, 2016 at 8:46 am

Hi Ayush!
Thanks for your valuable feedback. Yes, I think that component is necessary. But instead of extending this article, I’ll probably write a separate post taking another case study. I’m a bit crunched for bandwidth, but you can expect it sometime this month. Stay tuned!

MICHAEL
March 13, 2016 at 5:49 pm

Thanks for the excellent article. I have 2 clarifications:

1) In the Estimating & Eliminating Trend step, I have negative numbers. Could you please tell me what transformations I could apply? Log and Sqrt return NaN.
2) Also, test_stationarity(ts_log_decompose,nlags=10), while executing, says nlags is not defined.

Thanks in advance.

AARSHAY JAIN
March 13, 2016 at 6:12 pm

Hi Michael,

Thanks for reaching out. Regarding your queries:


1. You can try scaling up your values and then applying transformations. Also, you might want to
check if log transformation is actually required in your case. You can try a cube root as well.
2. Please remove the nlags argument and then run the code. I’ve updated the code above as well

WILL WELCH
March 14, 2016 at 1:34 am

Nice article, you rarely see this range of models discussed in one place and in such a hands-on way.

For anyone doing seasonal decomposition in Python, I’d like to shamelessly plug my `seasonal` package (PyPI or https://github.com/welch/seasonal) in addition to statsmodels seasonal_decompose. `seasonal` offers some richer and more robust detrending possibilities, and will also estimate your model’s periodicity for you (convenient in a dev-ops setting with thousands of streams at hand). It also includes a robust periodogram for visualizing the periodicities in your data.

AARSHAY JAIN
March 14, 2016 at 5:58 am

Thanks Will for sharing your library. It’ll be helpful for everyone.

ALON STAR
March 24, 2016 at 7:38 pm

Can you please explain what dftest[0:4] is?

AARSHAY JAIN
March 25, 2016 at 6:26 am

The adfuller function returns a tuple with many values. I’m picking the first 4 using [0:4]. I’ve used th… value separately. You might want to print the dftest variable and you’ll see.

LBERT
April 10, 2016 at 5:53 am

Can we use this method for decimal data? Why did the program give me the error “ValueError: You must specify a freq or x must be a pandas object with a timeseries index”?

AARSHAY JAIN
April 11, 2016 at 6:59 am

I don’t think the decimals are causing the error. Please check whether your index is a time series (datetime) index.
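One way to run that check and fix it, as a sketch assuming the dates sit in the index as strings such as '1949-01' (the frame and its values are hypothetical stand-ins): convert the index with pd.to_datetime and give it an explicit frequency, which seasonal_decompose can then read instead of asking for freq.

```python
import pandas as pd

# Hypothetical frame whose dates are stored as plain strings in the index.
df = pd.DataFrame({'value': [112, 118, 132, 129]},
                  index=['1949-01', '1949-02', '1949-03', '1949-04'])

# At this point isinstance(df.index, pd.DatetimeIndex) is False.
df.index = pd.to_datetime(df.index, format='%Y-%m')
df = df.asfreq('MS')   # monthly data stamped at month start
ts = df['value']
```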


CLOGA (HTTP://WWW.CLOGA,INFO)
April 14, 2016 at 9:54 am

Hi Aarshay Jain,

One more question,


When you fit the model, you use ts_log as the sample, i.e. model = ARIMA(ts_log, order=(2, 1, 2)), but when you predict, you treat the predicted values as differenced values: predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True). Does results_ARIMA.fittedvalues return log values or differenced values?

Thank you for your time.

AARSHAY JAIN
April 14, 2016 at 10:53 am

Actually, while calling ARIMA I have set order = (2,1,2). Here the middle argument 1 means that ARIMA will automatically take a difference of 1 while making predictions, so the fitted values it returns are on the differenced scale.
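To make the reply concrete: with d = 1 the model works on first differences of ts_log, so turning its output back into log levels means a cumulative sum plus the first log value. A minimal round-trip sketch in plain pandas; here the "fitted values" are simply the true differences (an assumption for illustration), so the reconstruction should reproduce ts_log exactly. The numbers are the first six AirPassengers counts:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1949-01-01', periods=6, freq='MS')
ts_log = pd.Series(np.log([112, 118, 132, 129, 121, 135]), index=idx)

# d = 1: the model works on first differences, which start one period late.
ts_log_diff = ts_log.diff().dropna()

# Stand-in for a model's fitted values on the differenced scale.
fitted_diff = ts_log_diff.copy()

# Undo the differencing: first observed log value plus the running sum.
restored = pd.Series(ts_log.iloc[0], index=ts_log.index)
restored = restored.add(fitted_diff.cumsum(), fill_value=0)
```

With a real model the fitted differences are not exact, so the restored curve only approximates ts_log, and the errors accumulate through the cumulative sum.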

CLOGA (HTTP://WWW.CLOGA,INFO)
April 15, 2016 at 2:31 am

Got it thank you!

AYODEJI OLUFEMI AYOTUNDE


April 19, 2016 at 12:44 am

What an excellent article on Time Series, more grease to your elbow. But the question is: is this method a package for analyzing Time Series related data, or what? And can’t we do the same in SPSS with a method as simple as this? However, I have to commend you a lot for this wonderful presentation. God will continue to increase your knowledge.

AARSHAY JAIN

April 19, 2016 at 5:09 am

Thanks Ayodeji!
I’m not sure about SPSS, and sorry, I didn’t get your question: what do you mean by “is this method a package of analyzing Time Series related data”? Please elaborate.

ANDREW
April 22, 2016 at 4:47 am

Hi Aarshay Jain

I’ve tried the Dickey-Fuller test code with a different dataset, and this error showed up:

ValueError: too many values to unpack

Please give me some advice.

Thank you

AARSHAY JAIN
April 25, 2016 at 5:50 pm

Please share the code.

DENNIS
May 27, 2016 at 1:41 am

Andrew, you are probably passing in a DataFrame instead of a Series where Aarshay’s code computes dftest.
Specifically here: dftest = adfuller(timeseries.unstack(), autolag='AIC')
Note the .unstack() that I added, which transforms the DataFrame into a Series, when I also encountered the same error.
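A sketch of the two shapes involved, assuming the data was read into a one-column DataFrame (values are the first three AirPassengers counts, for illustration only). Selecting the column, or squeezing the frame, yields the 1-D Series the test expects. .unstack() also returns a Series for a frame with a plain index, but its index becomes a MultiIndex, so selecting the column is usually the cleaner fix:

```python
import pandas as pd

df = pd.DataFrame({'#Passengers': [112, 118, 132]},
                  index=pd.date_range('1949-01-01', periods=3, freq='MS'))

# A one-column DataFrame is still 2-D; functions expecting a 1-D series can fail on it.
print(df.ndim)  # 2

ts = df['#Passengers']                 # select the column -> 1-D Series
ts_alt = df.squeeze(axis='columns')    # same result for a single-column frame
```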

TANVIR

April 23, 2016 at 8:10 pm

Hello,
I am struggling with a question.
Here, the forecast is shown up to 1960-12-01. Now, based on the current measure, I want to forecast the upcoming years, for example 1961-01-01 to 1965-12-01.
How can I do this?

ANIRBAN DHAR
May 3, 2016 at 6:45 pm

Thanks Aarshay for this detailed and illustrative post.

One concern: the decomposition is not working for 12 months of data (a single year).
E.g. in AirPassengers.csv, if I take only the records of 1949, it fails to decompose, giving the error below:

File "C:\anirban\install\Anaconda3\lib\site-packages\statsmodels\tsa\seasonal.py", line 88, in seasonal_decompose
  trend = convolution_filter(x, filt)
File "C:\anirban\install\Anaconda3\lib\site-packages\statsmodels\tsa\filters\filtertools.py", line 289, in convolution_filter
  result = signal.convolve(x, filt, mode='valid')
File "C:\anirban\install\Anaconda3\lib\site-packages\scipy\signal\signaltools.py", line 470, in convolve
  return correlate(volume, kernel[slice_obj], mode)
File "C:\anirban\install\Anaconda3\lib\site-packages\scipy\signal\signaltools.py", line 160, in correlate
  _check_valid_mode_shapes(in1.shape, in2.shape)
File "C:\anirban\install\Anaconda3\lib\site-packages\scipy\signal\signaltools.py", line 72, in _check_valid_mode_shapes
  "in1 should have at least as many items as in2 in "
ValueError: in1 should have at least as many items as in2 in every dimension for 'valid' mode.

I think this is somehow related to the filter, but I’m not able to nail it down since I’m a novice.
Please note: the code is an exact replica of yours.

Any help will be appreciated. Thanks again!
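The traceback points at the trend step. For monthly data (period 12) the trend is a centered 2x12 moving average with 13 weights, convolved in 'valid' mode, so a single year of 12 observations is shorter than the filter (in1 has 12 items, in2 has 13). A small pure-NumPy sketch of that length check; the weight construction is the standard centered 2x12 moving average, and the two-complete-cycles requirement reflects what I understand newer statsmodels releases enforce, so treat it as an assumption to verify against your installed version:

```python
import numpy as np

def centered_ma_weights(period):
    """Weights of a centered moving average over one seasonal period."""
    if period % 2 == 0:
        # Even period: 2 x period average, half weights at both ends.
        return np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period
    return np.repeat(1.0 / period, period)

def can_decompose(n_obs, period):
    """True if the trend filter fits inside the data and two full cycles exist."""
    return n_obs >= len(centered_ma_weights(period)) and n_obs >= 2 * period

weights = centered_ma_weights(12)
print(len(weights))            # 13
print(can_decompose(12, 12))   # False: one year of monthly data
print(can_decompose(24, 12))   # True
```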


BAWASIR KI DAWA (HTTP://RAJHERBALS.COM/ARSHOHER-BAWASIR-KI-DAWA.HT


May 6, 2016 at 12:47 pm

Great blog post, thanks for sharing this post.


Ayurvedic treatment for piles (http://rajherbals.com/arshoher-bawasir-ki-dawa.html)

AARSHAY JAIN
May 6, 2016 at 5:56 pm

You are welcome

SK
May 16, 2016 at 7:06 pm

How do I make predictions for the future?

start = len(ts)
end = len(ts)+14
y_forecast = np.exp(results_ARIMA.predict(start, end))
This does not provide good results.
Would you please expand your code and description to include a one-month-ahead forecast?

PRAKHA SHRIVASTAVA
May 19, 2016 at 10:59 am

Hi,
Thank you for sharing this post. I have one question: does time series in pandas only work with a CSV file? I ask because I want to forecast my database values for the next 6 months. I have connected Python to a MySQL database, i.e. I have the data in Python as a dataset, not in a CSV file. So how can I use the time series forecasting method? If you provide the code, it will be a huge help for me.

PRAKHAR SHRIVASTAVA
May 25, 2016 at 12:45 pm

This problem is solved by using data = pd.read_sql_query(cur, con).
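Building on this, read_sql_query can also parse the date column while setting it as the index, via its parse_dates argument; that yields the DatetimeIndex the time series code expects. A self-contained sketch with an in-memory SQLite table standing in for the MySQL database. The table name readings and its values are hypothetical; only the column name datum appears elsewhere in this thread:

```python
import sqlite3
import pandas as pd

# In-memory SQLite table standing in for the MySQL table (hypothetical data).
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE readings (datum TEXT, value REAL)')
con.executemany('INSERT INTO readings VALUES (?, ?)',
                [('2016-01-01', 10.0), ('2016-02-01', 12.5), ('2016-03-01', 11.0)])

# The first argument is the SQL query text; parse_dates converts the column
# to datetimes before it becomes the index.
data = pd.read_sql_query('SELECT datum, value FROM readings ORDER BY datum', con,
                         index_col='datum', parse_dates=['datum'])
con.close()
ts = data['value']
```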

PRAKHAR SHRIVASTAVA
May 25, 2016 at 12:32 pm

In the Dickey-Fuller test, my output came out as: Results of Dickey-Fuller Test:


Test Statistic -2.287864
p-value 0.175912
#Lags Used 11.000000
Number of Observations Used 215.000000
Critical Value (1%) -3.461136
Critical Value (10%) -2.573986
Critical Value (5%) -2.875079
dtype: float64
Is my p-value too low? Does it mean my data is not normal, or not suited to this model?

PRAKHAR SHRIVASTAVA
May 27, 2016 at 9:24 am

This problem is also solved.

SATYA CHANDU
July 1, 2016 at 12:31 am

If the test statistic (-2.287864) is greater than the critical values (-3.46, -2.87, -2.57), then we can’t reject the null hypothesis that the series is non-stationary (has a unit root). In other words, it is still non-stationary. If you increase the d (I) value in the ARIMA model, perhaps the above condition will be met and you may get good forecast values.
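The same decision rule written out, using only the numbers from the Dickey-Fuller output posted above: the statistic is compared with each critical value, and the p-value with a conventional 0.05 threshold.

```python
# Numbers copied from the Dickey-Fuller output posted above.
test_statistic = -2.287864
p_value = 0.175912
critical_values = {'1%': -3.461136, '5%': -2.875079, '10%': -2.573986}

# ADF null hypothesis: the series has a unit root (is non-stationary).
# Reject it only if the statistic is more negative than the critical value.
decisions = {level: test_statistic < crit for level, crit in critical_values.items()}

for level, reject in decisions.items():
    print(level, 'reject H0 (stationary)' if reject else 'cannot reject H0')
print('p-value below 0.05:', p_value < 0.05)
```

Here the null survives at every level and the p-value is well above 0.05, which is consistent with the reply: the series still looks non-stationary.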

DENNIS
May 27, 2016 at 1:43 am

Hi Aarshay,


I really enjoyed this, but running it on Python 3, I’ve encountered a couple of errors in the last portion:
plt.title('RSS: %.4f' % sum((results_MA.fittedvalues-ts_log_diff)**2))

pandas\tslib.pyx in pandas.tslib.Timestamp.__radd__ (pandas\tslib.c:14048)()

pandas\tslib.pyx in pandas.tslib._Timestamp.__add__ (pandas\tslib.c:19022)()

ValueError: Cannot add integral value to Timestamp without offset.

Googling around, it seems that it’s a bug in statsmodels, but I was wondering whether you or someone else has ported it to Python 3?

Thanks

PRAKHAR SHRIVASTAVA
May 27, 2016 at 9:23 am

Thank you for this post. I get an error in model = ARIMA(ts_log, order=(2, 1, 0)) and I am unable to find the cause. Please help me.

PRAKHAR SHRIVASTAVA
June 1, 2016 at 2:29 pm

Thank you for this example. I have one problem. When I tried to put the model in my program, it said:
ValueError: Given a pandas object and the index does not contain dates. But in my dataset the date is there:
data = pd.read_sql_query(cur, con, index_col='datum', coerce_float=True, params=None)
I don’t know what the problem is.

SEAN STANISLAW
June 3, 2016 at 5:08 am

I still prefer gretl for building time series and econometric models. It’s easy to use and open source: just download and go.

YUER

June 10, 2016 at 1:09 am

Hi, thank you for sharing.

I am using this model to predict the play count of a song.
When I input the data with data = pd.read_csv('00play.csv', parse_dates='data', index_col='data', date_parser=dateparse)
I get the error 'TypeError: Only booleans, lists, and dictionaries are accepted for the 'parse_dates' parameter'.
If I delete the parse_dates parameter, is there any influence?
Using the data without the parse_dates parameter, seasonal_decompose gives another error: 'ValueErro… D not understood. Please report if you think this in error.'
Please give some advice.
Thank you

JITENDRA
June 12, 2016 at 4:00 pm

Hi, can you give me an idea of how to approach forecasting multiple time series?

JIE
June 19, 2016 at 3:59 am

Really liked this post. Thank you very much for sharing.

BOM
June 23, 2016 at 5:09 pm

How do I deal with the trend of irregular time series data (data with different time intervals)?

SATYA CHANDU
June 27, 2016 at 10:58 pm

Hi,


This is a very good blog and very useful. I could follow the entire process. But I did not understand how to forecast the next 12 months from the last value. In the current case the last value is 1960-12, and I need to forecast till 1961-12 (12 values). How can I do that in the following code? It would be great if you could add that process and update this article.

predictions_ARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum, fill_value=0)
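Since several readers ask how to go beyond 1960-12: the same two inverse steps (cumulative sum, then exp) apply to future dates, anchored at the last observed log value instead of the first. A sketch assuming you already have forecasts of the log differences for the next 12 months, for example from a fitted model whose output is on the differenced scale; the helper diffs_to_levels, the last value 432 and the flat 0.01 growth are illustrative assumptions. If your statsmodels version's forecast already returns log levels, skip the cumulative sum and only apply np.exp.

```python
import numpy as np
import pandas as pd

def diffs_to_levels(last_log_value, diff_forecast, last_date, freq='MS'):
    """Turn forecasts of log differences into forecasts in original units.

    last_log_value: last observed value of ts_log
    diff_forecast: forecast first differences of ts_log, one per future period
    last_date: timestamp of the last observation
    """
    steps = len(diff_forecast)
    # Dates for the periods after the last observation (drop last_date itself).
    future_index = pd.date_range(last_date, periods=steps + 1, freq=freq)[1:]
    # Undo the differencing: running sum anchored at the last observed log value.
    log_levels = last_log_value + np.cumsum(np.asarray(diff_forecast, dtype=float))
    # Undo the log transform.
    return pd.Series(np.exp(log_levels), index=future_index)

# Hypothetical inputs: last value 432 in 1960-12 and a flat 0.01 monthly log growth.
forecast = diffs_to_levels(np.log(432.0), [0.01] * 12, pd.Timestamp('1960-12-01'))
```

The resulting Series is indexed 1961-01-01 through 1961-12-01, so it can be plotted directly after the observed data.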

AADITYA
June 28, 2016 at 7:23 am

Hi Aarshay,

First of all thanks for this brilliant post.

I am following a similar approach for forecasting a minute’s data using the previous hours’ data. I am using the forecast function in statsmodels along with the ARIMA model, and calculating P, Q and D using the approach you mentioned in your post.

However, I am facing a few problems:

1. At times ARIMA throws an error for the AR or MA parameters.
2. ARIMA in Python takes a lot of time. Similar code in R takes less than 30 minutes for forecasting a month’s data. Am I missing something, or is ARIMA in Python inherently slow?
3. I get an "MLE not converging" warning almost every time. Why is that?
4. ARIMA does not allow a D value greater than two; however, at times the adfuller results suggest a d value greater than two. What should be done in this case?

Looking forward to your suggestions.

Thanks,
Aaditya

SATYA CHANDU
June 28, 2016 at 4:21 pm

Hi Arshay,


This is a very useful article. I have a small doubt about forecasting values. How can I get the forecast values from the following code? Suppose I want to print them for the next year; how can I do that?

Thanks,
Satya

predictions_arima_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_arima_log = predictions_arima_log.add(predictions_arima_diff_cumsum, fill_value=0)

SHANTANU SAHA
July 5, 2016 at 7:07 pm

Thank you for such a detailed post. I was wondering: what if the data was at the country level? How can we deal with such time series data then?

EVELYN
July 8, 2016 at 5:50 pm

For this code line:

data = pd.read_csv('AirPassengers.csv', parse_dates='Month', index_col='Month', date_parser=dateparse)
Does it work? I got an error message in my Anaconda Python 2.7 because Python can’t identify 'Month' as a list of Month column values for the parse_dates parameter, so I changed it to ['Month'] and it works.
Could anyone confirm this in Python 3? Thanks.
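This fix matches the TypeError quoted elsewhere in this thread: parse_dates accepts a bool, list, or dict, not a bare column name. A sketch with the first three AirPassengers rows inlined via io.StringIO so it runs standalone. It relies on pandas parsing '1949-01' style strings by itself rather than on the article's custom date parser (the date_parser argument is deprecated in pandas 2.x), which is an assumption about your pandas version:

```python
import io
import pandas as pd

# First rows of AirPassengers.csv, inlined so the sketch runs on its own.
csv_text = "Month,#Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"

# parse_dates must be a bool, list, or dict, so the column name goes in a list.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Month'], index_col='Month')
ts = data['#Passengers']
```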

MICHAEL FRANCIS
July 11, 2016 at 4:44 pm

I keep getting “ValueError: Cannot add integral value to Timestamp without offset.” whenever I use the ARIMA function, and I was wondering if you could tell me what this means and how I could fix it. I’m using the same data and steps as the example above.

FLORENT
July 18, 2016 at 4:58 pm


Thanks for this great article, it greatly helped me get started with time series forecasting.

What would be the additional steps if you wanted to make a more accurate forecast?

ABS
July 20, 2016 at 9:22 am

Really nice post.

My question is: how different would the second part of the problem be if you were to use decomposition instead of differencing for forecasting the time series?

SHREYAK TIWARI
July 20, 2016 at 11:53 am

Can someone please explain: while creating ARIMA models you are using ts_log (just the log time series), but when calculating RSS you are using ts_log_diff. Am I missing something here?

MAYANK SATNALIKA
July 26, 2016 at 7:03 am

Hey, I’m a newbie in machine learning and wanted to sort out an issue: what types of problems can be classified as time series forecasting problems? Many tutorials begin with predicting stock prices for the next few days; is that a time series forecasting problem? Also, is the Bike Sharing Demand question from Kaggle a time series forecasting question, since we are given the demand for some dates and need to predict demand for upcoming days?

BHUVANESHWARAN
August 17, 2016 at 7:22 am

How do I select which model is better for our data? Are there any properties of the data that help in selecting a model?

JARAD

August 25, 2016 at 4:15 am

This is literally the BEST article I’ve ever seen on time-series analysis with Python. Very well explained. I wish the statsmodels documentation was this good (they give you the tools but don’t show you how to use them!).

I am very confused about ACF and PACF and how to read the charts to determine the proper P and Q. You concluded that p and q are both 2, and you mention the “upper confidence level”. I don’t see the lines crossing the upper-confidence-level dashed line at point 2 in the ACF or PACF. Is this a typo?

If not a typo, can you explain?
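On the dashed lines: in plots of this kind they are typically the approximate 95% bounds +/-1.96/sqrt(N) for a white-noise series, where N is the number of observations, and one reads off the lag at which the ACF (for q) or PACF (for p) first drops inside that band. A pure-NumPy sketch that computes the sample ACF and lists the lags still outside the band, so the crossing point can be checked numerically rather than by eye. The sine test series is purely illustrative; in practice statsmodels' own acf/pacf functions are the usual tools, and a PACF would need a separate computation:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def lags_outside_band(acf_values, n_obs, z=1.96):
    """Lags (excluding 0) whose autocorrelation lies outside +/- z / sqrt(n_obs)."""
    bound = z / np.sqrt(n_obs)
    lags = [k for k in range(1, len(acf_values)) if abs(acf_values[k]) > bound]
    return lags, bound

# Illustrative series: a noiseless 12-period cycle with 100 points.
x = np.sin(2 * np.pi * np.arange(100) / 12)
acf_values = sample_acf(x, nlags=10)
lags, bound = lags_outside_band(acf_values, len(x))
```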

JARAD
August 26, 2016 at 6:22 am

I wonder what your thoughts are on doing a decomposition, then performing ARIMA forecasting on each component (trend, seasonality, residual), then re-scaling back. Is this a sound method/approach? I did this, and the prediction line looks like what I’d expect. I’m just wondering if this is a common practice.

ANIL
August 29, 2016 at 7:55 pm

Hi Aarshey, great article. I have tested the code and it’s working fine; however, I am not getting the years on the X axis. I tried different date parse methods, but no luck. How did you get year values on the X axis, whereas the parse method is converting the Month column to a string in %Y-%m-%d format?

YASSIR
September 2, 2016 at 3:00 pm

I got confused on many points:
1- We apply many transformations to get stationary data, and each transformation gives data with good stationarity. In the example, you got the best stationarity after applying decomposition, so why did you use the ts_log_diff and ts_log data with ACF, PACF and ARIMA instead of using the decomposed data?
2- I have seen many styles of ACF and PACF plots, one like a continuous graph and another like pins. Which one should I go for?
3- What is the best and easiest way to detect AR and MA from the ACF and PACF? Some tutorials say every ARIMA model has a special ACF and PACF pattern, and others talk about the intersection between the lags and the upper confidence line!
4- Is there any way to automate the step of getting the AR and MA orders instead of investigating the ACF and PACF plots?

ALEX DEBIE
September 21, 2016 at 12:59 am

Thanks a lot for the information, I learned a ton. I’m just a little confused, now that I have this model, about how to use it to predict the next point in time.

DWITI BASU
September 26, 2016 at 6:31 am

Hi, I am getting this error when I run the following code. Can anyone help?

date1 = lambda dates: pd.datetime.strptime(dates, '%Y-%m')
dataset = pd.read_csv('AirPassangers.csv', parse_dates='Month', index_col='Month', date_parser=date1)

This is what I am getting:

========
date1 = lambda dates: pd.datetime.strptime(dates, '%Y-%m-%d')

dataset = pd.read_csv('AirPassangers.csv', parse_dates='Month', index_col='Month', date_parser=date1)

Traceback (most recent call last):

  File "", line 1, in
    dataset = pd.read_csv('AirPassangers.csv', parse_dates='Month', index_col='Month', date_parser=date1)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in __init__
    self._make_engine(self.engine)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1202, in __init__
    ParserBase.__init__(self, kwds)

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in __init__
    kwds.pop('parse_dates', False))

  File "C:\Users\dwiti.b\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\io\parsers.py", line …, in _validate_parse_dates_arg
    raise TypeError(msg)

TypeError: Only booleans, lists, and dictionaries are accepted for the 'parse_dates' parameter

DENNIS
October 3, 2016 at 10:42 am

Thank you for this post. However, do you have any tutorials on stock price prediction using artificial neural networks?

LEO
October 4, 2016 at 5:09 am

This is not a complete guide, but it can be something to get you started. Time series analysis is not that limited.

© Copyright 2013-2019 Analytics Vidhya.