0% found this document useful (0 votes)
142 views

Time Series Analysis Using Ipython

This article discusses analyzing time series data using ARMA and ARIMA models in Python with the Statsmodels package. It loads annual sunspot data from 1700-2008 and checks that the data was uploaded correctly. It then prepares the data for analysis.

Uploaded by

Kaustubh Keskar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
142 views

Time Series Analysis Using Ipython

This article discusses analyzing time series data using ARMA and ARIMA models in Python with the Statsmodels package. It loads annual sunspot data from 1700-2008 and checks that the data was uploaded correctly. It then prepares the data for analysis.

Uploaded by

Kaustubh Keskar
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 69

Advertisements

Search... 
Report this ad

HOME | FEATURED | ARTICLES | LISTS | INTERVIEWS | INFOGRAPHICS | EDUCATION & TRAINING

| ABOUT

Search... ARTICLES

BI Corner Newsletter Time Series Analysis using


iPython
BY JEFFREY STRICKLAND ON NOVEMBER 16, 2015 • ( 5 COMMENTS )

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In this example, we will examine ARMA and ARIMA models with Python using the
Statsmodels package. This package can be downloaded
Advertisements
at http://statsmodels.sourceforge.net/stable/index.html.
Autogressive Moving-Average Processes (ARMA) and Auto-
Regressive Integrated Moving Average (ARIMA) can be called
from the tsa (Time Series) module from the Statamodels
package.
Note: I am not as expert in time-series analysis as I am in other areas of Analytics, so if you
find errors I would be happy to know about them and correct them.

Introduction
Sign Up!
ARIMA models are, in theory, the most general class of models for forecasting a time series,
which can be made to be “stationary” by differencing (if necessary), perhaps in conjunction
Most Viewed Posts with nonlinear transformations such as logging or deflating (if necessary). A random variable
that is a time series is stationary if its statistical properties are all constant over time. A
INFOGRAPHIC: stationary series has no trend, its variations around its mean have a constant amplitude, and
NOSQL - The it wiggles in a consistent fashion, i.e., its short-term random time patterns always look the
Ultimate Solution for
same in a statistical sense. The latter condition means that its autocorrelations (correlations
Database
Management with its prior deviations from the mean) remain constant over time, or equivalently, that its
power spectrum remains constant over time. A random variable of this form can be viewed
A 12 Step Approach
(as usual) as a combination of signal and noise, and the signal (if one is apparent) could be a
to Analytical
Modeling pattern of fast or slow mean reversion, or sinusoidal oscillation, or rapid alternation in sign,
and it could also have a seasonal component. An ARIMA model can be viewed as a “filter”
Data Fitting with R
that tries to separate the signal from the noise, and the signal is then extrapolated into the
future to obtain forecasts.
Naïve Bayes using R

The Data
Time Series Analysis
using iPython
The data we will use is annual sunspot data from 1700 – 2008 recording the number of
7 Implementation sunspots per year. The file sunspots.csv and can be downloaded from the line below.
Challenges of Big
Data Analytics (w/ https://mareksybilak-pythonxy.googlecode.com/hg-
Infographic) Report this ad
history/fbb21ff5a891:/src/python/statsmodels/PLATLIB/statsmodels/datasets/sunspots/

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Unobserved
Component Models
Import Packages FOLLOW BI CORNER

in R
In addition to Statsmodels, we will need to import additional packages, including Numpy, Join 3,393 other followers
Business Intelligence Scipy, Pandas, and Matplotlib.
and Analytics Value Enter your email address
Framework
Also, from Statsmodels we will need to import qqplot.
Follow Us
Analytics: is it more
than a buzzword? In [1]:
From Data Silos to
Semantic Data Hubs import numpy as np  Follow BI Corner on Twitter
from scipy import stats
import pandas as pd
BI Corner
 Tag Cloud import matplotlib.pyplot as plt
@bicorner
import statsmodels.api as sm
Agile BI Amazon 6 Big Data Solutions for
Artificial Intelligence Small Businesses
bicorner.com/2018/08/11/6-b…
Banking Best
In [2]:
Practices Big from statsmodels.graphics.api import qqplot

Data
Business Loading the Dataset
Analytics We will use pandas to load the .csv file which we downloaded earlier.

Business In [3]:

Intelligenc
dta= pd.read_csv("C:/Users/Strickland/Documents/Python Scripts/sunspot Aug 11, 2018

e CAO CIO Cloud


BI Corner
@bicorner

Cloudera CMO COO Data


Check the Data Upload Bridging the gap between data
Governance Data Lake and insights
bicorner com/2018/08/11/bri
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
bicorner.com/2018/08/11/bri…
Next we check to see that the data uploaded correctly by viewing the dataset Notes.
Data
Data Mining

Science Data In [4]:

Scientist Data print sm.datasets.sunspots.NOTE

Security Data

Visualization Data
Warehouse Aug 11, 2018
::
Facebook Finance
BI Corner
Forrester Gartner
Number of Observations - 309 (Annual 1700 - 2008) @bicorner
Google Hadoop Number of Variables - 1 The Dark Side of Big Data
Variable name definitions::
Healthcare HR IBM bicorner.com/2018/08/10/the…

IoT Machine SUNACTIVITY - Number of sunspots for each year


Learning
Manufacturing The data file contains a 'YEAR' variable that is not returned by l

Marketing
Microsoft Mobile BI
Netflix NoSQL Oracle
Preparing the Data
Predictive Aug 10, 2018
Next we need to do a little dataset preparation. Here, an annual date series must be date-
Analytics Python Qlik
times at the end of the year. BI Corner
Retail R programming
@bicorner
SAP Self-Service BI SMB In [5]: Big Data Is Stepping Into The
Spark Tableau Fashion World
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008') bicorner.com/2018/08/10/big…
del dta["YEAR"]

Examine the Data


Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Now we take a look at the data. Aug 10, 2018

In [6]: BI Corner
@bicorner
# show plots in the notebook Top 10 roles in AI and data science
%matplotlib inline bicorner.com/2018/08/10/top…
dta.plot(figsize=(12,8));

Aug 10, 2018

Embed View on Twitter

Like BI Corner on Facebook

BI Corner
1K likes

Like Page
Auto-correlations
Before we decide which model to use, we need to look at auto-correlations.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Autocorrelation correlogram. Seasonal patterns of time series can be examined via
correlograms, which display graphically and numerically the autocorrelation function (ACF).
Auto-correlation in pandas plotting and statsmodels graphics standardize the data before
computing the auto-correlation. These libraries subtract the mean and divide by the  Follow BI Corner by RSS
standard deviation of the data.

When using standardization, they make an assumption that your data has been generated
with a Gaussian law (with a certain mean and standard deviation). This may not be the case
in reality.

Correlation is sensitive. Both (matplotlib and pandas plotting) of these functions have their
drawbacks. The figure generated by the following code using matplotlib will be identical to
figure generated by pandas plotting or statsmodels graphics.

Partial autocorrelations. Another useful method to examine serial dependencies is to


examine the partial autocorrelation function (PACF) – an extension of autocorrelation, where
the dependence on the intermediate elements (those within the lag) is removed.

Once we determine the nature of the auto-correlations we use the following rules of thumb.

Rule 1: If the ACF shows exponential decay, the PACF has a spike at lag 1, and no
correlation for other lags, then use one autoregressive (p)parameter
Rule 2: If the ACF shows a sine-wave shape pattern or a set of exponential decays, the
PACF has spikes at lags 1 and 2, and no correlation for other lags, the use two
autoregressive (p) parameters
Rule 3: If the ACF has a spike at lag 1, no correlation for other lags, and the PACF damps
out exponentially, then use one moving average (q) parameter.
Rule 4: If the ACF has spikes at lags 1 and 2, no correlation for other lags, and the PACF
has a sine-wave shape pattern or a set of exponential decays, then use two moving
average (q) parameter.
Rule 5: If the ACF shows exponential decay starting at lag 1, and the PACF shows
exponential decay starting at lag 1, then use one autoregressive (p) and one moving
average (q) parameter.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Removing serial dependency. Serial dependency for a particular lag can be removed by Advertisements

differencing the series. There are two major reasons for such transformations.

First, we can identify the hidden nature of seasonal dependencies in the series.
Autocorrelations for consecutive lags are interdependent, so removing some of the
autocorrelations will change other auto correlations, making other seasonalities more
apparent.
Second, removing serial dependencies will make the series stationary, which is necessary
for ARIMA and other techniques.

Another popular test for serial correlation is the Durbin-Watson statistic. The DW statistic
will lie in the 0-4 range, with a value near two indicating no first-order serial correlation.
Positive serial correlation is associated with DW values below 2 and negative serial
correlation with DW values above 2.

In [7]:

sm.stats.durbin_watson(dta)

Out[7]:

array([ 0.13952893])

The value of Durbin-Watson statistic is close to 2 if the errors are uncorrelated. In our
example, it is 0.1395. That means that there is a strong evidence that the variable open has
high autocorrelation.
In [8]:

Report this ad
# show plots in the notebook
%matplotlib inline

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(dta.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(dta, lags=40, ax=ax2)

C:\Anaconda\lib\site-packages\matplotlib\collections.py:590: FutureWar
if self._edgecolors == str('face'):

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
The plots also indicate that autocorrelation is present. Another set of plots (shown below)
are available using the autocorrelation_plot function from Pandas.
In [9]:

from pandas.tools.plotting import autocorrelation_plot

In [10]:

# show plots in the notebook


%matplotlib inline

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
dta['SUNACTIVITY_2'] = dta['SUNACTIVITY']
dta['SUNACTIVITY_2'] = (dta['SUNACTIVITY_2'] - dta['SUNACTIVITY_2'].me
plt.acorr(dta['SUNACTIVITY_2'],maxlags = len(dta['SUNACTIVITY_2']) -1,
plt.show()
autocorrelation_plot(dta['SUNACTIVITY'])
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
For mixed ARMA processes the Autocorrelation function is a mixture of exponentials and
damped sine waves after (q-p) lags. The partial autocorrelation function is a mixture of
exponentials and dampened sine waves after (p-q) lags.

Times Series Modeling


We will only explore two methods here. An ARMA model is classified as ARMA(p,q), with no
differenceing terms. ARMA models can be described by a series of equations. The equations
are somewhat simpler if the time series is first reduced to zero-mean by subtracting the
sample mean. Therefore, we will work with the mean-adjusted series

yt; = Yt – Y̅, t = 1, …N

where Yt is the original time series, Y̅ is its sample mean, and yt is the mean-adjusted series.
One subset of ARMA models are the so-called autoregressive, or AR models. An AR model
expresses a time series as a linear function of its past values. The order of the AR model tells
how many lagged past values are included. The simplest AR model is the first-order
autoregressive, or AR(1), model

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
yt + a1 yt-1 = et

where yt is the mean-adjusted series in year t, yt-1 is the series in the previous year, at is the
lag-1 autoregressive coefficient, and et is the noise. The noise also goes by various other
names: the error, the random-shock, and the residual. The residuals et are assumed to be
random in time (not autocorrelated), and normally distributed. Be rewriting the equation for
the AR(1) model as

yt = a1 yt-1 + et

We see that the AR(1) model has the form of a regression model in which yt is regressed on
its previous value. In this form, at is analogous to the negative of the regression coefficient,
and et to the regression residuals. The name autoregressive refers to the regression on self
(auto).

A nonseasonal ARIMA model is classified as an ARIMA(p,d,q) model, where:

p is the number of autoregressive terms,


d is the number of nonseasonal differences needed for stationarity, and
q is the number of lagged forecast errors in the prediction equation.

The forecasting equation is constructed as follows. First, let y denote the dth difference of Y,
which means:

If d=0: yt = Yt

If d=1: yt = Yt – Yt-1

If d=2: yt = (Yt – Yt-1) – (Yt-1 – Yt-2) = Yt – 2Yt-1 + Yt-2

Note that the second difference of Y (the d=2 case) is not the difference from two periods
ago. Rather, it is the first-difference-of-the-first difference, which is the discrete analog of a
second derivative, i.e., the local acceleration of the series rather than its local trend.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Modeling the Data
Model One: ARMA(2,0)
Using Rule 2 from above, we will first try an ARMA(2,0) model with two autoregressive terms
and no moving averages.

In [11]:

arma_mod20 = sm.tsa.ARMA(dta, (2,0)).fit()


print arma_mod20.params

const 49.659389
ar.L1.SUNACTIVITY 1.390656
ar.L2.SUNACTIVITY -0.688571
dtype: float64

We now calculate the Akaike Information Criterion (AIC), Schwarz Bayesian Information
Criterion (BIC), and Hannan-Quinn Information Criterion (HQIC). Our goalis to choose a model
that minimizes (AIC, BIC, HQIC).
In [12]:

print arma_mod20.aic, arma_mod20.bic, arma_mod20.hqic

2622.63633806 2637.56970317 2628.60672591

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Does our model obey the theory? We will use the Durbin-Watson test for autocorrelation.
The Durbin-Watson statistic ranges in value from 0 to 4. A value near 2 indicates non-
autocorrelation; a value toward 0 indicates positive autocorrelation; a value toward 4
indicates negative autocorrelation.
In [13]:

sm.stats.durbin_watson(arma_mod20.resid.values)

Out[13]:

2.1458269461951018

The Durbin-Watson test shows no autocorrelation.


Plotting the Data
Next we plot and study the data it represents.

In [14]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arma_mod20.resid.plot(ax=ax);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Analyzing the Residuals
In the following steps, we calculate the residuals, tests the null hypothesis that the residuals
come from a normal distribution, and construct a qq-plot.

In [15]:

resid20 = arma_mod20.resid

In [16]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
stats.normaltest(resid20)

Out[16]:

NormaltestResult(statistic=41.736012879102248, pvalue=8.65246706745336

In [17]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid20, line='q', ax=ax, fit=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model Autocorrelation
Now we investigate the autocorrelation of our ARMA(2,0) model.

In [18]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid20.values.squeeze(), lags=40, ax=a
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid20, lags=40, ax=ax2)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Next, we calculate the lag, autocorrelation (AC), Q statistic and Prob>Q. The Ljung–Box Q test
(named for Greta M. Ljung and George E. P. Box) is a type of statistical test of whether any of
a group of autocorrelations of a time series are different from zero. The null hypothesis is,
H0: The data are independently distributed (i.e. the correlations in the population from which
the sample is taken are 0, so that any observed correlations in the data result from
randomness of the sampling process).

In [19]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
r,q,p = sm.tsa.acf(resid20.values.squeeze(), qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print table.set_index('lag')

AC Q Prob(>Q)
lag
1 -0.085220 2.265964 0.132244
2 0.103691 5.631592 0.059857
3 -0.027833 5.874877 0.117859
4 0.091122 8.491069 0.075158
5 0.019010 8.605302 0.125881
6 0.031320 8.916425 0.178333
7 0.044485 9.546121 0.215786
8 -0.034337 9.922552 0.270504
9 0.185690 20.967727 0.012794
10 0.191608 32.767487 0.000298
11 0.190385 44.456231 0.000006
12 0.121693 49.247963 0.000002
13 -0.016219 49.333365 0.000004
14 0.014986 49.406527 0.000008
15 -0.063197 50.711976 0.000009
16 0.039730 51.229689 0.000015
17 0.009577 51.259872 0.000027
18 -0.073645 53.050932 0.000026
19 0.076469 54.988666 0.000023
20 -0.006827 55.004163 0.000041
21 0.088818 57.636429 0.000029
22 0.120485 62.497142 0.000009
23 0.103328 66.084650 0.000005

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
24 -0.085728 68.562767 0.000004
25 0.013730 68.626556 0.000006
26 -0.036183 69.071127 0.000009
27 -0.150156 76.754545 0.000001
28 0.049680 77.598615 0.000002
29 -0.055467 78.654537 0.000002
30 0.003354 78.658411 0.000003
31 -0.010905 78.699521 0.000005
32 0.120386 83.727446 0.000002
33 0.042680 84.361682 0.000002
34 0.011107 84.404793 0.000004
35 0.024261 84.611235 0.000005
36 -0.125046 90.115463 0.000002
37 -0.036394 90.583412 0.000002
38 -0.060509 91.881735 0.000002
39 -0.024440 92.094325 0.000003
40 0.000581 92.094446 0.000005

Notice that the p-values for the Ljung–Box Q test all are well above .05 for lags 1 through 8,
indicating “significance.” This is not a desirable result. However, the p-values for the
remaining lags through 40 data values as less than .05. So there is much data not
contributing to correlations at high lags.
Predictions
Next, we compute the predictions and analyze their fit against actual values.

In [20]:

predict_sunspots20 = arma_mod20.predict('1990', '2012', dynamic=True)


print predict_sunspots20

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1990-12-31 164.966812
1991-12-31 135.687529
1992-12-31 89.897551
1993-12-31 46.380327
1994-12-31 17.392506
1995-12-31 7.045131
1996-12-31 12.615671
1997-12-31 27.487278
1998-12-31 44.332851
1999-12-31 57.519084
2000-12-31 64.257219
2001-12-31 64.547986
2002-12-31 60.312657
2003-12-31 54.222559
2004-12-31 48.669655
2005-12-31 45.140943
2006-12-31 44.057288
2007-12-31 44.980067
2008-12-31 47.009508
2009-12-31 49.196363
2010-12-31 50.840111
2011-12-31 51.620193
2012-12-31 51.573181
Freq: A-DEC, dtype: float64

In [21]:

ax = dta.ix['1950':].plot(figsize=(12,8))
ax = predict_sunspots20.plot(ax=ax, style='r--', label='Dynamic Predic
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
The fit looks good up to about 1998 and underfit the data afterwards.

Calculate Forecast Errors


Mean absolute error: The mean absolute error (MAE) value is computed as the average
absolute error value. If this value is 0 (zero), the fit (forecast) is perfect. As compared to the
mean squared error value, this measure of fit will “de-emphasize” outliers, that is, unique or
rare large error values will affect the MAE less than the MSE value.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Mean Forecast Error (Bias). The mean forecast error (MFE) is the average error in the
observations. A large positive MFE means that the forecast is undershooting the actual
observations, and a large negative MFE means the forecast is overshooting the actual
observations. A value near zero is ideal.

The MAE is a better indicator of fit than the MFE.

In [22]:

def mean_forecast_err(y, yhat):


return y.sub(yhat).mean()

In [23]:

def mean_absolute_err(y, yhat):


return np.mean((np.abs(y.sub(yhat).mean()) / yhat)) # or percent e

In [24]:

print "MFE = ", mean_forecast_err(dta.SUNACTIVITY, predict_sunspots20)


print "MAE = ", mean_absolute_err(dta.SUNACTIVITY, predict_sunspots20)

MFE = 4.73038833847
MAE = 0.134688778061

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
For MFE > 0, models tends to under-forecast. However, as long as the tracking signal is
between –4 and 4, we assume the model is working correctly. The measure of MAE being
small would indicate a pretty good fit.

Model Two: ARMA(3,0)


As a second model, we will try an ARMA(3,0) model with three autoregressive terms and no
moving averages.

In [25]:

arma_mod30 = sm.tsa.ARMA(dta, (3,0)).fit()

In [26]:

print arma_mod30.params

const 49.749953
ar.L1.SUNACTIVITY 1.300810
ar.L2.SUNACTIVITY -0.508093
ar.L3.SUNACTIVITY -0.129650
dtype: float64

In [27]:

print arma_mod30.aic, arma_mod30.bic, arma_mod30.hqic

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
2619.4036287 2638.07033508 2626.8666135

Does our model obey the theory?

In [28]:

sm.stats.durbin_watson(arma_mod30.resid.values)

Out[28]:

1.9564808156070777

The Durbin-Watson is lose to 2, so the test shows very little if any positive autocorrelation.

In [29]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arma_mod30.resid.plot(ax=ax);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [30]:

resid30 = arma_mod30.resid

In [31]:

stats.normaltest(resid30)

Out[31]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
NormaltestResult(statistic=49.845020032909339, pvalue=1.50069150690246

In [32]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid30, line='q', ax=ax, fit=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [33]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid30.values.squeeze(), lags=40, ax=a
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid30, lags=40, ax=ax2)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [34]:

r,q,p = sm.tsa.acf(resid30.values.squeeze(), qstat=True)


data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print table.set_index('lag')

AC Q Prob(>Q)
lag

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1 0.009179 0.026286 8.712028e-01
2 0.041793 0.573042 7.508715e-01
3 -0.001335 0.573601 9.024483e-01
4 0.136089 6.408921 1.706203e-01
5 0.092468 9.111828 1.046860e-01
6 0.091948 11.793244 6.674345e-02
7 0.068748 13.297201 6.518982e-02
8 -0.015020 13.369229 9.976132e-02
9 0.187592 24.641909 3.393910e-03
10 0.213718 39.321995 2.229474e-05
11 0.201082 52.361139 2.344948e-07
12 0.117182 56.804191 8.574251e-08
13 -0.014055 56.868327 1.893901e-07
14 0.015398 56.945566 3.997655e-07
15 -0.024967 57.149321 7.741463e-07
16 0.080916 59.296772 6.872157e-07
17 0.041138 59.853741 1.110943e-06
18 -0.052021 60.747431 1.548430e-06
19 0.062496 62.041694 1.831641e-06
20 -0.010301 62.076981 3.381239e-06
21 0.074453 63.926656 3.193583e-06
22 0.124955 69.154773 8.978346e-07
23 0.093162 72.071037 5.799777e-07
24 -0.082152 74.346689 4.713011e-07
25 0.015695 74.430045 8.289033e-07
26 -0.025037 74.642904 1.367283e-06
27 -0.125861 80.041150 3.722562e-07
28 0.053225 81.009983 4.716274e-07
29 -0.038693 81.523809 6.916626e-07
30 -0.016904 81.622228 1.151660e-06
31 -0.019296 81.750940 1.868764e-06
32 0.104990 85.575067 8.927949e-07
33 0.040086 86.134568 1.247508e-06

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
34 0.008829 86.161811 2.047823e-06
35 0.014588 86.236449 3.263804e-06
36 -0.119329 91.248899 1.084453e-06
37 -0.036665 91.723868 1.521921e-06
38 -0.046193 92.480517 1.938732e-06
39 -0.017768 92.592886 2.990676e-06
40 -0.006220 92.606708 4.696979e-06

This could indicate a lack of fit. Notice that the p-values for the Ljung–Box Q test all are well
above 0.05 for lags 1 through 8, indicating “significance.” This is not a desirable result.
However, the p-values for the remaining lags through 40 data values as less than 0.05. So
there is much data not contributing to correlations at high lags.

In-sample dynamic prediction. How well does our model do?

In [35]:

predict_sunspots30 = arma_mod30.predict('1990', '2012', dynamic=True)


print predict_sunspots30

1990-12-31 167.047421
1991-12-31 140.993013
1992-12-31 94.859134
1993-12-31 46.860929
1994-12-31 11.242619
1995-12-31 -4.721260
1996-12-31 -1.166883
1997-12-31 16.185712
1998-12-31 39.021896

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1999-12-31 59.449879
2000-12-31 72.170148
2001-12-31 75.376790
2002-12-31 70.436468
2003-12-31 60.731600
2004-12-31 50.201814
2005-12-31 42.076047
2006-12-31 38.114308
2007-12-31 38.454663
2008-12-31 41.963833
2009-12-31 46.869301
2010-12-31 51.423272
2011-12-31 54.399728
2012-12-31 55.321701
Freq: A-DEC, dtype: float64

In [36]:

ax = dta.ix['1950':].plot(figsize=(12,8))
ax = predict_sunspots30.plot(ax=ax, style='r--', label='Dynamic Predic
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [37]:

print "MFE = ", mean_forecast_err(dta.SUNACTIVITY, predict_sunspots30)


print "MAE = ", mean_absolute_err(dta.SUNACTIVITY, predict_sunspots30)

MFE = 5.63694046318
MAE = -0.141136496478

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
For small MAE the model is a pretty good fit. The MFE being slightly above 4 might indicate
the model tends to under-forecast.

Model Three: ARMA(2,1)


As a third model, we will try an ARMA(2,1) model with two autoregressive terms and one
moving average term.

In [38]:

arma_mod40 = sm.tsa.ARMA(dta, (2,1)).fit()

In [39]:

res40 = arma_mod40.resid

In [40]:

print arma_mod40.params

const 49.749195
ar.L1.SUNACTIVITY 1.470738
ar.L2.SUNACTIVITY -0.755121
ma.L1.SUNACTIVITY -0.153691
dtype: float64

In [41]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print arma_mod40.aic, arma_mod40.bic, arma_mod40.hqic

2620.27719156 2638.94389794 2627.74017636

In [42]:

sm.stats.durbin_watson(arma_mod40.resid.values)

Out[42]:

1.990341245334406

In [43]:

predict_sunspots40 = arma_mod40.predict('1990', '2012', dynamic=True)

In [44]:

resid40 = arma_mod40.resid

In [45]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
stats.normaltest(resid40)

Out[45]:

NormaltestResult(statistic=48.550090206155012, pvalue=2.86735763168793

In [46]:

predict_sunspots40 = arma_mod40.predict('1990', '2012', dynamic=True)

In [47]:

print "Metric for ARMA(2,1): ",mean_forecast_err(dta.SUNACTIVITY, p

Metric for ARMA(2,1): 5.65195019619 MFE

In [48]:

print "Metric for ARMA(2,1): ",mean_absolute_err(dta.SUNACTIVITY, p

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Metric for ARMA(2,1): 4.82202232657 MAE

Model Four: ARMA(2,3)


As a fourth model we will try an ARMA(2,3) model with three autoregressive terms and two
moving averages.

In [49]:

arma_mod50 = sm.tsa.ARMA(dta, (2,3)).fit()

In [50]:

sm.stats.durbin_watson(arma_mod50.resid.values)

Out[50]:

1.9674807976736908

In [51]:

print arma_mod50.params

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
const 49.718535
ar.L1.SUNACTIVITY 1.447478
ar.L2.SUNACTIVITY -0.749029
ma.L1.SUNACTIVITY -0.144607
ma.L2.SUNACTIVITY 0.067941
ma.L3.SUNACTIVITY 0.009734
dtype: float64

In [52]:

print arma_mod50.aic, arma_mod50.bic, arma_mod50.hqic

2622.85169078 2648.98507972 2633.29986951

In [53]:

resid50 = arma_mod50.resid

In [54]:

stats.normaltest(resid50)

Out[54]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
NormaltestResult(statistic=48.615275864226881, pvalue=2.77540890867719

In [55]:

predict_sunspots50 = arma_mod50.predict('1990', '2012', dynamic=True)

In [56]:

ax = dta.ix['1950':].plot(figsize=(12,8))
ax = predict_sunspots50.plot(ax=ax, style='r--', label='Dynamic Predic
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [57]:

print "Metric for ARMA(3,2): ",mean_forecast_err(dta.SUNACTIVITY, p

Metric for ARMA(3,2): 5.27758730154 MLE

In [58]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print "Metric for ARMA(3,2): ",mean_absolute_err(dta.SUNACTIVITY, p

Metric for ARMA(3,2): 0.325297875678 MAE

Model Five: ARIMA(3,0,2)


As a fifth model we will try an AIMA(3,0,2) model with three autoregressive terms, no
differences, and two moving averages.

In [59]:

arima_mod1 = sm.tsa.ARIMA(dta, (3,0,2)).fit()

In [60]:

print arima_mod1.params

const 48.486959
ar.L1.SUNACTIVITY 2.560637
ar.L2.SUNACTIVITY -2.471094
ar.L3.SUNACTIVITY 0.892054
ma.L1.SUNACTIVITY -1.518372
ma.L2.SUNACTIVITY 0.663744
dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [61]:

sm.stats.durbin_watson(arima_mod1.resid.values)

Out[61]:

1.7634439830432345

In [62]:

print arima_mod1.aic, arima_mod1.bic, arima_mod1.hqic

2581.57229897 2607.7056879 2592.0204777

In [63]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arima_mod1.resid.plot(ax=ax);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [64]:

resid1 = arima_mod1.resid

In [65]:

stats.normaltest(resid1)

Out[65]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
NormaltestResult(statistic=45.133708715833087, pvalue=1.58248527805154

In [66]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid1, line='q', ax=ax, fit=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [67]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid1.values.squeeze(), lags=40, ax=ax
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid1, lags=40, ax=ax2)

In [68]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
r,q,p = sm.tsa.acf(resid1.values.squeeze(), qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print table.set_index('lag')

AC Q Prob(>Q)
lag
1 0.102089 3.251829 0.071344
2 -0.073750 4.954393 0.083978
3 -0.128911 10.173307 0.017149
4 0.069554 11.697568 0.019748
5 0.095314 14.569406 0.012370
6 0.072379 16.230927 0.012567
7 -0.023626 16.408542 0.021635
8 -0.136402 22.348679 0.004309
9 0.040967 22.886280 0.006457
10 0.100846 26.154915 0.003537
11 0.089601 28.743898 0.002487
12 -0.014426 28.811239 0.004202
13 -0.132360 34.498960 0.001010
14 -0.065597 35.900688 0.001080
15 -0.011676 35.945248 0.001800
16 0.122644 40.878636 0.000579
17 0.064368 42.242196 0.000619
18 -0.075295 44.114441 0.000556
19 -0.051250 44.984809 0.000689
20 -0.115365 49.410363 0.000269
21 -0.011647 49.455629 0.000434
22 0.071806 51.182092 0.000405
23 0.059462 52.370145 0.000446

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
24 -0.078581 54.452304 0.000371
25 0.014392 54.522396 0.000568
26 -0.004754 54.530072 0.000870
27 -0.081503 56.793746 0.000683
28 0.050981 57.682592 0.000798
29 -0.050678 58.564034 0.000929
30 -0.064736 60.007507 0.000919
31 -0.058183 61.177715 0.000978
32 0.067297 62.748932 0.000931
33 0.038756 63.271926 0.001174
34 0.011765 63.320293 0.001663
35 0.005979 63.332829 0.002343
36 -0.108496 67.476475 0.001142
37 -0.025842 67.712414 0.001523
38 -0.022721 67.895486 0.002035
39 0.000369 67.895534 0.002814
40 -0.002080 67.897080 0.003841

In [69]:

predict_sunspots1 = arima_mod1.predict('1990', '2012', dynamic=True)


print predict_sunspots1

1990-12-31 175.041558
1991-12-31 164.082608
1992-12-31 129.091771
1993-12-31 82.132355
1994-12-31 38.576035
1995-12-31 11.871561

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1996-12-31 9.232555
1997-12-31 29.609714
1998-12-31 64.487621
1999-12-31 101.089254
2000-12-31 126.803664
2001-12-31 133.315806
2002-12-31 119.098931
2003-12-31 89.541196
2004-12-31 54.794996
2005-12-31 26.180332
2006-12-31 12.402623
2007-12-31 16.836970
2008-12-31 36.711921
2009-12-31 64.356302
2010-12-31 89.986320
2011-12-31 105.033134
2012-12-31 104.888644
Freq: A-DEC, dtype: float64

In [70]:

ax = dta.ix['1950':].plot(figsize=(12,8))
ax = predict_sunspots1.plot(ax=ax, style='r--', label='Dynamic Predict
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [71]:

print "Metrics for ARIMA(3,0,2): ",mean_forecast_err(dta.SUNACTIVITY,

Metrics for ARIMA(3,0,2): -13.3948141872 MLE

In [72]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print "Metrics for ARIMA(3,0,2): ",mean_absolute_err(dta.SUNACTIVITY,

Metrics for ARIMA(3,0,2): 0.357429358606 MAE

Model Six: ARIMA(2,0,3)


Now let’s examine a sixth model.

In [73]:

arima_mod2 = sm.tsa.ARIMA(dta, (2,0,3)).fit()

In [74]:

print arima_mod2.params

const 49.718535
ar.L1.SUNACTIVITY 1.447478
ar.L2.SUNACTIVITY -0.749029
ma.L1.SUNACTIVITY -0.144607
ma.L2.SUNACTIVITY 0.067941
ma.L3.SUNACTIVITY 0.009734
dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [75]:

mean_forecast_err(dta.SUNACTIVITY, predict_sunspots30)

Out[75]:

5.63694046318347

In [76]:

print arima_mod2.aic, arima_mod2.bic, arima_mod2.hqic

2622.85169078 2648.98507972 2633.29986951

In [77]:

# show plots in the notebook


%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
ax = arima_mod2.resid.plot(ax=ax);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [78]:

resid2 = arima_mod2.resid

In [79]:

stats.normaltest(resid2)

Out[79]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
NormaltestResult(statistic=48.615275864226881, pvalue=2.77540890867719

In [80]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid2, line='q', ax=ax, fit=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [81]:

%matplotlib inline
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid2.values.squeeze(), lags=40, ax=ax
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid2, lags=40, ax=ax2)

In [82]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
r,q,p = sm.tsa.acf(resid2.values.squeeze(), qstat=True)
data = np.c_[range(1,41), r[1:], q, p]
table = pd.DataFrame(data, columns=['lag', "AC", "Q", "Prob(>Q)"])
print table.set_index('lag')

AC Q Prob(>Q)
lag
1 0.003654 0.004166 9.485368e-01
2 0.016284 0.087173 9.573495e-01
3 -0.012200 0.133919 9.874772e-01
4 0.135484 5.917449 2.053986e-01
5 0.087475 8.336347 1.386472e-01
6 0.081117 10.423244 1.079229e-01
7 0.053309 11.327560 1.249555e-01
8 -0.029051 11.597006 1.701103e-01
9 0.177629 21.704026 9.865850e-03
10 0.210414 35.933812 8.640929e-05
11 0.198595 48.652409 1.091797e-06
12 0.113626 52.829935 4.416460e-07
13 -0.017531 52.929715 9.294815e-07
14 0.010195 52.963573 1.929801e-06
15 -0.028636 53.231609 3.528614e-06
16 0.076135 55.132751 3.381868e-06
17 0.034417 55.522597 5.646754e-06
18 -0.058278 56.644198 7.048564e-06
19 0.057447 57.737799 8.771678e-06
20 -0.011998 57.785664 1.558883e-05
21 0.074738 59.649517 1.442882e-05
22 0.127935 65.129954 3.773169e-06
23 0.092605 68.011458 2.456406e-06

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
24 -0.084264 70.405618 1.897762e-06
25 0.014395 70.475736 3.269389e-06
26 -0.025898 70.703497 5.237901e-06
27 -0.127168 76.214477 1.396141e-06
28 0.052248 77.148070 1.761348e-06
29 -0.037720 77.636389 2.555245e-06
30 -0.019569 77.768295 4.115004e-06
31 -0.019972 77.906185 6.513392e-06
32 0.106340 81.829283 3.036510e-06
33 0.042161 82.448210 4.093087e-06
34 0.010416 82.486126 6.560314e-06
35 0.015178 82.566924 1.022858e-05
36 -0.118302 87.493436 3.552541e-06
37 -0.036109 87.954084 4.926259e-06
38 -0.043925 88.638258 6.325244e-06
39 -0.017690 88.749642 9.574259e-06
40 -0.008921 88.778071 1.467656e-05

In [83]:

predict_sunspots2 = arima_mod2.predict('1990', '2012', dynamic=True)


print predict_sunspots2

1990-12-31 167.956436
1991-12-31 142.397104
1992-12-31 95.564926
1993-12-31 46.661219
1994-12-31 10.952848
1995-12-31 -4.103928

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1996-12-31 0.848334
1997-12-31 19.294592
1998-12-31 42.285756
1999-12-31 61.748175
2000-12-31 72.698544
2001-12-31 73.971042
2002-12-31 67.610807
2003-12-31 57.451369
2004-12-31 47.509807
2005-12-31 40.729330
2006-12-31 38.361260
2007-12-31 40.012307
2008-12-31 44.175913
2009-12-31 48.965961
2010-12-31 52.780787
2011-12-31 54.714777
2012-12-31 54.656770
Freq: A-DEC, dtype: float64

In [84]:

print "Metrics for ARIMA(2,0,3): ",mean_forecast_err(dta.SUNACTIVITY,

Metrics for ARIMA(2,0,3): 5.27758730154 MLE

In [85]:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print "Metrics for ARIMA(2,0,3): ",mean_absolute_err(dta.SUNACTIVITY,

Metrics for ARIMA(2,0,3): 0.325297875678 MAE

In [86]:

ax = dta.ix['1950':].plot(figsize=(12,8))
ax = predict_sunspots2.plot(ax=ax, style='r--', label='Dynamic Predict
ax.legend();
ax.axis((-20.0, 38.0, -4.0, 200.0));

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Now let’s compare all six ARMA models using the metrics we have discussed.

In [87]:

print "Model 1: ARMA(2,0) AIC", arma_mod20.aic, " BIC", arma_mod20


print "Model 2: ARMA(3,0) AIC", arma_mod30.aic, " BIC", arma_mod30
print "Model 3: ARMA(2,1) AIC", arma_mod40.aic, "BIC", arma_mod40.
print "Model 4: ARMA(2,3) AIC", arma_mod50.aic, " BIC", arma_mod50
print "Model 5: ARIMA(3,0,2) AIC", arima_mod1.aic, " BIC", arima_mod1
print "Model 6: ARIMA(2,0,3) AIC", arima_mod2.aic, " BIC", arima_mod2

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 1: ARMA(2,0) AIC 2622.63633806 BIC 2637.56970317 HQIC 2628
Model 2: ARMA(3,0) AIC 2619.4036287 BIC 2638.07033508 HQIC 2626.
Model 3: ARMA(2,1) AIC 2620.27719156 BIC 2638.94389794 HQIC 2627.
Model 4: ARMA(2,3) AIC 2622.85169078 BIC 2648.98507972 HQIC 2633
Model 5: ARIMA(3,0,2) AIC 2581.57229897 BIC 2607.7056879 HQIC 2592.
Model 6: ARIMA(2,0,3) AIC 2622.85169078 BIC 2648.98507972 HQIC 2633

In [88]:

print "Metrics for Model 1 ARMA(2,0): ",mean_forecast_err(dta.SUNAC


print "Metrics for Model 2 ARMA(3,0): ",mean_forecast_err(dta.SUNAC
print "Metrics for Model 3 ARMA(2,1): ",mean_forecast_err(dta.SUNAC
print "Metrics for Model 4 ARMA(2,3): ",mean_forecast_err(dta.SUNAC
print "Metrics for Model 5 ARIMA(3,0,2): ",mean_forecast_err(dta.SUNAC
print "Metrics for Model 6 ARIMA(2,0,3): ",mean_forecast_err(dta.SUNAC

Metrics for Model 1 ARMA(2,0): 4.73038833847 MFE 0.134688778061


Metrics for Model 2 ARMA(3,0): 5.63694046318 MFE -0.14113649647
Metrics for Model 3 ARMA(2,1): 5.65195019619 MFE 4.82202232657
Metrics for Model 4 ARMA(2,3): 5.27758730154 MFE 0.325297875678
Metrics for Model 5 ARIMA(3,0,2): -13.3948141872 MFE 0.35742935860
Metrics for Model 6 ARIMA(2,0,3): 5.27758730154 MFE 0.325297875678

Analysis
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y
Clearly Model 5 minimizes (aic, bic, hqic). However, Model 1 has the smallest MFE and MAE.
In terms of consistency, Model 2 minimizes AIC, BIC, HQIC, MFE and MAE. Also, ARIMA(3,0,2)
has the lowest AIC, BIC and HQIC values, though it slightly overfits. Notice that ARIMA(2,0,3)
and ARMA(2,3) are the same. Why?

Conclusion
Based on the analysis, I would be conservative and choose Model 1, ARMA(2,0), above the
other models.

More Experimentation
Now lets build a few model models for comparison. Note we will renumber the models as
well.

In [89]:

arma_mod10 = sm.tsa.ARMA(dta, (1,0)).fit()


arma_mod20 = sm.tsa.ARMA(dta, (2,0)).fit()
arma_mod30 = sm.tsa.ARMA(dta, (3,0)).fit()
arma_mod40 = sm.tsa.ARMA(dta, (2,1)).fit()
arma_mod50 = sm.tsa.ARMA(dta, (2,3)).fit()
arima_mod1 = sm.tsa.ARIMA(dta, (3,0,2)).fit()
arima_mod2 = sm.tsa.ARIMA(dta, (2,0,2)).fit()
arima_mod3 = sm.tsa.ARIMA(dta, (1,0,0)).fit()
arima_mod4 = sm.tsa.ARIMA(dta, (0,1,0)).fit()
arima_mod5 = sm.tsa.ARIMA(dta, (0,0,1)).fit()
arima_mod6 = sm.tsa.ARIMA(dta, (1,1,0)).fit()
arima_mod7 = sm.tsa.ARIMA(dta, (0,1,1)).fit()
arima_mod8 = sm.tsa.ARIMA(dta, (1,1,1)).fit()
arima_mod9 = sm.tsa.ARIMA(dta, (3,0,3)).fit()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
arima_mod10= sm.tsa.ARIMA(dta, (1,0,6)).fit()
arima_mod11= sm.tsa.ARIMA(dta, (1,0,3)).fit()

We caclualue AIC, BIC, and HQIC.

In [90]:

print "Model 01: ARMA(1,0) AIC", arma_mod20.aic, " BIC", arma_mod2


print "Model 02: ARMA(2,0) AIC", arma_mod20.aic, " BIC", arma_mod2
print "Model 03: ARMA(3,0) AIC", arma_mod30.aic, " BIC", arma_mod3
print "Model 04: ARMA(2,1) AIC", arma_mod40.aic, "BIC", arma_mod40
print "Model 05: ARMA(2,3) AIC", arma_mod50.aic, " BIC", arma_mod5
print "Model 06: ARIMA(3,0,2) AIC", arima_mod1.aic, " BIC", arima_mod
print "Model 07: ARIMA(2,0,2) AIC", arima_mod2.aic, " BIC", arima_mod
print "Model 08: ARIMA(1,0,0) AIC", arima_mod3.aic, " BIC", arima_mod
print "Model 09: ARIMA(0,1,0) AIC", arima_mod4.aic, " BIC", arima_mod
print "Model 10: ARIMA(0,0,1) AIC", arima_mod5.aic, " BIC", arima_mod
print "Model 11: ARIMA(1,1,0) AIC", arima_mod6.aic, " BIC", arima_mod
print "Model 12: ARIMA(0,1,1) AIC", arima_mod7.aic, " BIC", arima_mod
print "Model 13: ARIMA(1,1,1) AIC", arima_mod8.aic, " BIC", arima_mod
print "Model 14: ARIMA(3,0,3) AIC", arima_mod9.aic, " BIC", arima_mod
print "Model 15: ARIMA(1,0,6) AIC", arima_mod10.aic, " BIC", arima_mo
print "Model 16: ARIMA(1,0,3) AIC", arima_mod11.aic, " BIC", arima_mo

Model 01: ARMA(1,0) AIC 2622.63633806 BIC 2637.56970317 HQIC 262


Model 02: ARMA(2,0) AIC 2622.63633806 BIC 2637.56970317 HQIC 262
Model 03: ARMA(3,0) AIC 2619.4036287 BIC 2638.07033508 HQIC 2626
Model 04: ARMA(2,1) AIC 2620.27719156 BIC 2638.94389794 HQIC 2627
Model 05: ARMA(2,3) AIC 2622.85169078 BIC 2648.98507972 HQIC 263

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 06: ARIMA(3,0,2) AIC 2581.57229897 BIC 2607.7056879 HQIC 2592
Model 07: ARIMA(2,0,2) AIC 2620.87269582 BIC 2643.27274348 HQIC 262
Model 08: ARIMA(1,0,0) AIC 2819.16915243 BIC 2830.36917626 HQIC 282
Model 09: ARIMA(0,1,0) AIC 2835.1157734 BIC 2842.57597297 HQIC 2838
Model 10: ARIMA(0,0,1) AIC 2886.90066859 BIC 2898.10069242 HQIC 289
Model 11: ARIMA(1,1,0) AIC 2730.68977742 BIC 2741.88007677 HQIC 273
Model 12: ARIMA(0,1,1) AIC 2743.27827663 BIC 2754.46857598 HQIC 274
Model 13: ARIMA(1,1,1) AIC 2724.51170879 BIC 2739.43210792 HQIC 273
Model 14: ARIMA(3,0,3) AIC 2575.69564491 BIC 2605.56237513 HQIC 258
Model 15: ARIMA(1,0,6) AIC 2644.84706483 BIC 2678.44713632 HQIC 265
Model 16: ARIMA(1,0,3) AIC 2655.64444292 BIC 2678.04449059 HQIC 266

Now we calculate predictions for each model.

In [91]:

predict_sunspots10 = arma_mod10.predict('1990', '2012', dynamic=True)


predict_sunspots20 = arma_mod20.predict('1990', '2012', dynamic=True)
predict_sunspots30 = arma_mod30.predict('1990', '2012', dynamic=True)
predict_sunspots40 = arma_mod40.predict('1990', '2012', dynamic=True)
predict_sunspots50 = arma_mod50.predict('1990', '2012', dynamic=True)
predict_sunspots1 = arima_mod1.predict('1990', '2012', dynamic=True)
predict_sunspots2 = arima_mod2.predict('1990', '2012', dynamic=True)
predict_sunspots3 = arima_mod3.predict('1990', '2012', dynamic=True)
predict_sunspots4 = arima_mod4.predict('1990', '2012', dynamic=True)
predict_sunspots5 = arima_mod5.predict('1990', '2012', dynamic=True)
predict_sunspots6 = arima_mod6.predict('1990', '2012', dynamic=True)
predict_sunspots7 = arima_mod7.predict('1990', '2012', dynamic=True)
predict_sunspots8 = arima_mod8.predict('1990', '2012', dynamic=True)
predict_sunspots9 = arima_mod9.predict('1990', '2012', dynamic=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
predict_sunspots10 = arima_mod10.predict('1990', '2012', dynamic=True)
predict_sunspots11 = arima_mod11.predict('1990', '2012', dynamic=True)

Now we calculate MFE and MAE for each model.

In [92]:

print "Metrics for Model 01 ARMA(1,0): ",mean_forecast_err(dta.SUNA


print "Metrics for Model 02 ARMA(2,0): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 03 ARMA(3,0): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 04 ARMA(2,1): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 05 ARMA(2,3): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 06 ARIMA(3,0,2): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 07 ARIMA(2,0,2): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 08 ARIMA(1,0,0): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 09 ARIMA(0,1,0): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 10 ARIMA(0,0,1): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 11 ARIMA(1,1,0): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 12 ARIMA(0,1,1): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 13 ARIMA(1,1,1): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 14 ARIMA(3,0,3): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 15 ARIMA(1,0,6): ",mean_forecast_err(dta.SUNA
print "Metrics for Model 16 ARIMA(1,0,3): ",mean_forecast_err(dta.SUNA

Metrics for Model 01 ARMA(1,0): 2.46349253974 MFE 0.04916343837


Metrics for Model 02 ARMA(2,0): 4.73038833847 MFE 0.13468877806
Metrics for Model 03 ARMA(3,0): 5.63694046318 MFE -0.1411364964
Metrics for Model 04 ARMA(2,1): 5.65195019619 MFE 4.82202232657

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Metrics for Model 05 ARMA(2,3): 5.27758730154 MFE 0.32529787567
Metrics for Model 06 ARIMA(3,0,2): -13.3948141872 MFE 0.3574293586
Metrics for Model 07 ARIMA(2,0,2): 5.26199149696 MFE 0.21582641378
Metrics for Model 08 ARIMA(1,0,0): -13.3102092512 MFE 0.2074657886
Metrics for Model 09 ARIMA(0,1,0): 61.396291866 MFE -9004.78947368
Metrics for Model 10 ARIMA(0,0,1): 9.30138026093 MFE 0.18354189627
Metrics for Model 11 ARIMA(1,1,0): 57.8502628003 MFE 11612.9783655
Metrics for Model 12 ARIMA(0,1,1): 60.7306048362 MFE -17940.477995
Metrics for Model 13 ARIMA(1,1,1): 59.019277972 MFE 10926.8374441
Metrics for Model 14 ARIMA(3,0,3): -10.2668182636 MFE 0.2810331308
Metrics for Model 15 ARIMA(1,0,6): 2.46349253974 MFE 0.04916343837
Metrics for Model 16 ARIMA(1,0,3): -3.61421431343 MFE 0.0652529614

Conclusion
We had assumed from our preliminary data analysis that the model would have at leat 2
autoregression terms. So, we will first examine and compare the models ARMA(2,0),
ARMA(2,1), ARMA(2,3) and ARIMA(2,0,3)

Model 14 ARIMA(2,0,3) has relatively good values for AIC, BIC, HQIC, and MAE, but not MFE
and underfits (MFE = 5.2779 > 4).

Model 02 ARMA(2,0) has the next best MAE and a relatively good BIC, but slightly underfits
(MFE = 4.7303 > 4).

Model 01 ARMA(2,1) and ARMA(2,3) are not as good as ARMA(2,0).

However, look at ARMA(1,0). It has the same AIX, BIC, and HQIC, but it minimizes MFE and
MAE.

We choose Model 01 ARMA(1,0) as the best model based on these results. However, I would
take the best performing model and plot their prediction and compare them.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Now try to build ARNA(3,2) on your own and compare it with the best model.

Authored by:
Jeffrey Strickland, Ph.D.

Jeffrey Strickland, Ph.D., is the Author of “Predictive Analytics Using R”


and a Senior Analytics Scientist with Clarity Solution Group. He has
performed predictive modeling, simulation and analysis for the
Department of Defense, NASA, the Missile Defense Agency, and the
Financial and Insurance Industries for over 20 years. Jeff is a Certified
Modeling and Simulation professional (CMSP) and an Associate
Systems Engineering Professional. He has published nearly 200 blogs on LinkedIn, is also a
frequently invited guest speaker and the author of 20 books including:

Discrete Event simulation using ExtendSim


Crime Analysis and Mapping
Missile Flight Simulation, Second Edition
Mathematical modeling of Warfare and Combat Phenomenon
Predictive Modeling and Analytics
Using Math to Defeat the Enemy
Verification and Validation for Modeling and Simulation
Simulation Conceptual Modeling

Connect with Jeffrey Strickland


Contact Jeffrey Strickland

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Advertisements

Report this ad

Report this ad

Colorado Springs, CO, USA

SHARE THI S:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
         3 1

Like
2 bloggers like this.

RELATED

Predictive Analytics: the The Skills and Tools of What’s up with Gold?
skills and tools required Predictive Analytics In "Articles"
In "Articles" In "Articles"

 Categories: Articles, Education & Training, Featured, Jeffrey Strickland

 Tagged as: Big Data, Business Analytics, Predictive Analytics, Python, Statistics

 This Is Why You’re Failing at Big Data Use Big Data for Property Market Research 

5 replies »

 Pingback: 1 – Time Series Analysis using iPython - Exploding Ads

 Pingback: What is Time Series Analysis? | bicorner.com

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
 Pingback: bicorner.com’s Top 10 Posts of 2015 | bicorner.com

 Pingback: Truckee real estate – Luv the Data

mike Reply
December 11, 2016 at 2:22 pm

The link to the data is down

 Like

 Leave a Reply
Enter your comment here...

TOP CATEGORIES: ARTICLES MICROSOFT

SOCIAL LINKS:  TWITTER  FACEBOOK  GOOGLE+  TUMBLR

BLOG AT WORDPRESS.COM. |

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD

You might also like