
Getting Started with NumXL
Tutorial Version 1.57
August 10th, 2012


Overview

NumXL is a suite of time series Add-ins for Microsoft Excel. Once loaded, NumXL integrates
scores of time series functions, along with a rich set of user interfaces and tools to assist in your
data analysis.

This document is prepared as a self-study course; it is divided into eight (8) separate modules
organized in terms of relevance (and, to some extent, difficulty):
- Module 1: Data Preparation
- Module 2: Descriptive Statistics
- Module 3: Time Series Smoothing
- Module 4: Correlogram Analysis
- Module 5: Modeling
- Module 6: Calibration
- Module 7: Residuals Diagnosis
- Module 8: Forecast

The purpose of these modules is to demonstrate the basic steps in conducting your own time
series analysis, diagnosis, and forecast, using only NumXL's functions and tools.
Feel free to start at any point in the course, but bear in mind that the examples build on each
other from one module to the next.

Getting Started with NumXL is designed as a self-study course. We are constantly working on
improving it, so we invite comments, suggestions, and criticisms.


******************
Spider Financial Corp
1507 E. 53rd Street, Ste. 480
Chicago, IL 60615

+1 (888) 427-9486
+1 (312) 324-0367

[email protected]
www.spiderfinancial.com




Module 1: Data Preparation
In this module, we'll discuss how to prepare our sample data for time series analysis with NumXL.

Sample Data
Consider the daily adjusted closing prices[1] for shares of Microsoft stock between 1/3/2000 and May 1st, 2009.
[Figure: MSFT daily adjusted closing prices, Jan 2000 - May 2009]

We downloaded the sample data (MSFT) from finance.yahoo.com.
1.1 Data Layout in Excel
Once you have your sample data, the most common time series layout method is to display the
dates and values in adjacent columns in the same spreadsheet. Although the date component is
not needed for modeling, it gives us a general idea about the chronological order of the values.
All NumXL functions support two different chronological orders:

1. Ascending: the first value corresponds to the
earliest observation. NumXL assumes an
ascending order by default, unless otherwise
specified.

2. Descending: the first value (observation)
corresponds to the latest observation.





[1] Closing prices are adjusted for splits and dividends.


1.2 Data Sampling
Once you have the ordered time series in your worksheet, you should examine the sampling
assumptions.

A time series data sample will generally contain observations that are equally-spaced over time,
where the value of each observation is available (i.e. there are no missing values).

For the MSFT daily closing price sample data, the observations are recorded at the end of each
workday. In this case, the sample period is the trading day (not the calendar day), and the
observations in the sample are equally spaced as a result.

Note: In the event the sample data contains one or more missing values, a special treatment is
required to impute their values. Refer to the Missing Values issue in the NumXL Tips & Hints
archive online.

Once you have the ordered time series in your worksheet and have made a note of the sampling
assumption, you should examine the data visually to ensure that it meets the important
assumptions defined by econometric and time series theories:
1. Is the underlying process stable (homogeneity)?
2. Do the variance and auto-covariance remain the same throughout the sample span (stationarity)?
3. Do we have observations with unusual values?
4. Are the values of the observations well spread out?

1.3 Stationarity
For stationarity, we are mainly concerned with the stability of variances and covariance
throughout the sample.

The stationarity assumption is pivotal for time series theories, so how do we check for it?
Paradoxically, we start by testing for non-stationary conditions, primarily: (1) the presence of a
unit root (random walk) and/or (2) the presence of a deterministic trend. If we can't find them, we
may conclude that the data is stationary.

Let's examine the plot of the original data for a deterministic trend or random walk (possibly
with a drift).

For the MSFT price time series, the data plot does not exhibit any trend and the series seems
stationary.

Note: For more details on time series stationarity, refer to the Stationarity issue in the
NumXL Tips & Hints archive online.



1.4 Homogeneity
Before we attempt to propose a model for a time series, we ought to verify that the underlying
process did not undergo a structural change, at least during the span of the sample data.

What are structural changes? Structural changes are events that permanently alter the
statistical properties of the stochastic process. A structural change can be triggered by changes
in policy, the passing of new laws, or any major exogenous development during the span of
the sample.

To examine for homogeneity (or the lack thereof), look over the data plot along with WMA and
EWMA and try to identify any (permanent) changes in the mean, variance or any signs of trend
or random walk.

Furthermore, an analyst/investigator must bring a rich prior knowledge and strong hypotheses
about the underlying process structure to his interpretation of a data set.

In the plot below, we draw a 20-day equally-weighted moving average along with the original
data.

[Figure: MSFT daily closing prices with a 20-day equally-weighted moving average (WMA), 1/3/2000 - 1/3/2009]

Examining the sample data plot and the weighted-moving average (WMA), there is no evidence
of a sudden permanent change in the underlying process mean.

Note: For more details on time series homogeneity, refer to the Homogeneity issue in the
NumXL Tips & Hints archive online.
1.5 Outliers
An outlier is an observation that is numerically distant from the rest of the data. In other words,
an outlier is one that appears to deviate markedly from other members of the sample in which it
occurs.

The mere presence of outliers in our data may change the mean level in the uncontaminated time
series, or it might suggest that the underlying distribution has fat-tails.



Outlier detection is a complex topic; for starters, we can examine the data plot visually. There are
a few statistical methods to mark potential outliers, but it is your responsibility to verify, and to
some extent explain, their values.
Note: This is a quick overview of a very complex subject. For more details on time series
outliers, refer to the Outliers issue in the NumXL Tips & Hints archive online.

A quick way to screen for outliers is the quartile-based rule using Q1, Q3 and the inter-quartile
range (IQR):

LL = Q1 - 1.5 × IQR
UL = Q3 + 1.5 × IQR

In the plot below, the shaded region represents the values between the upper fence (UL) and the
lower fence (LL).
[Figure: MSFT daily closing prices with the IQR fences shaded, 1/3/2000 - 1/3/2009]
One may argue that the values of the observations at the beginning of the sample are very high.
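For readers who want to reproduce this screen outside Excel, the following Python sketch computes the same quartile fences; the file name "msft.csv" and column name "Adj Close" are assumptions, not part of NumXL:

import pandas as pd

# Hypothetical input: the MSFT sample prices in a CSV with an "Adj Close" column.
prices = pd.read_csv("msft.csv")["Adj Close"]

q1, q3 = prices.quantile([0.25, 0.75])   # first and third quartiles
iqr = q3 - q1                            # inter-quartile range
lower_fence = q1 - 1.5 * iqr             # LL = Q1 - 1.5 * IQR
upper_fence = q3 + 1.5 * iqr             # UL = Q3 + 1.5 * IQR

outliers = prices[(prices < lower_fence) | (prices > upper_fence)]
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]; flagged points: {len(outliers)}")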
1.6 Concentration of Values
Occasionally, we face a time series in which values are naturally restricted to a given range. For
example, binomial data are restricted between 0 and 1. Another example is company quarterly
revenues, which are listed as positive integers within a wide range.
Why should we care?
First, time series models do not assume any bounds or limits on the values that the time series
can take, so using those models on a constrained data set may yield a poor fit.

Second, having a floor or a ceiling level in the data set affects the symmetry (or lack of skew) of
the values around the mean. This phenomenon can also be difficult to capture using time series
models.

Third, a data set whose values span several orders of magnitude can prove to be problematic for
modeling and forecasting.

Finally, a relationship between the observation level and local variance may develop and, for the
same reasons above, well have to stabilize the variance before doing anything else.

NumXL 1.57 - Getting Started -7- Spider Financial Corp, 2012


To detect the issues associated with concentration of values, we ask the following questions:
(1) Is the volatility/variance changing in relation to the observation levels?
(2) Are the data values capped or floor-leveled?
(3) Does the distribution show a skew in either direction?
Assuming we have a concentration-of-values issue, what is next? We need to perform a data
transformation on the input data.
Goal: we would like the values of the observations to be distributed close to a normal distribution.

Let's examine the distribution of the daily closing prices for MSFT shares. First, we plot the
histogram and QQ-plot of the daily closing prices:
[Figure: histogram and QQ-plot of the MSFT daily closing prices]

Obviously, the data is far from normal, but the takeaway here is that 50% of the observations fall
in a narrow range (22.18 - 27.61).

Next, let's transform the values using the Box-Cox function. A Box-Cox transformation is a
special form of power transformation and requires one input: lambda.

By optimizing the Box-Cox transformation for our sample data, we found that a zero-valued
lambda (i.e. a log transformation) brings our data close to normality.
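As a rough cross-check outside Excel, the optimal lambda can also be estimated with SciPy; this sketch assumes the prices series loaded in the earlier outlier example:

import numpy as np
from scipy import stats

# prices: positive daily closing prices (loaded as in the earlier sketch)
transformed, lam = stats.boxcox(prices)   # maximum-likelihood estimate of lambda
print(f"Optimal lambda: {lam:.3f}")       # a value near zero points to a log transform

log_prices = np.log(prices)               # the lambda = 0 special case used in this guide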

In the figure below, we plotted the histogram and QQ-plot for the log-price.
[Figure: histogram and QQ-plot of the MSFT daily log prices]

The log (Box-Cox with zero-lambda) transformation improves the distribution of the values,
especially at the right-tail.


Conclusion
In this module, we discussed the sample data layout in Excel and the sampling assumptions of the
data, and highlighted four (4) important issues to examine in our data before we conduct the analysis.


For concentration of values, we used histograms and an empirical distribution function to show
that 50% of the values are concentrated in a narrow band. To bring the distribution close to a normal
distribution, we used the Box-Cox transformation, optimized for lambda (i.e. the Box-Cox
parameter), and found that the logarithmic transformation (a special case of Box-Cox) is the best choice.
For the remainder of the user guide, we will use the log-transformed sample data to
carry on the analysis.



Module 2: Summary Statistics
In module 1, we examined the MSFT daily closing prices time series to explore common issues
associated with real time series data: stationarity, homogeneity, outliers, missing values and
concentration of values. We concluded the module with a transformation of the input data using
logarithmic transformation.

In this module, we will conduct a few computations to summarize the sample statistical
distribution in an attempt to understand the unknown population distribution.

NumXL comes with scores of functions to compute various summary statistics, including robust
measures like the quantile, IQR, etc. Furthermore, NumXL includes a wide range of statistical
tests to verify the significance of the computed summary statistics.

For our purposes here, we'll use the NumXL toolbar and the summary statistics wizard.
- Excel 2007/2010: Using the NumXL tab, click DESC STAT.

- Excel 97-2003: Using either the NumXL menu or toolbar, click DESC STAT.

Next, the Descriptive Statistics dialog box will pop up. Fill in the fields with your (log-
transformed) data location, series time order, options and location for the results to appear on
your worksheet.



The Descriptive Statistics dialog box will print out the selected statistics and tests (along with the
formulas) into your worksheet.

In sum, one may conclude that the underlying distribution has the following properties:
- Mean is significantly different from zero
- Density (mass) distribution is significantly positively skewed
- Density distribution has fat-tails
- Half of the observation values fall between 3.09 and 3.32

The median is smaller than the mean, and the distribution is positively skewed, leading us to
believe that the distribution has a fat right tail.

The quartiles (Q1, Q3) inscribe 50% of the values in the sample. The inter-quartile range (IQR)
can be used to characterize the data when there may be extremities that skew the data; the
interquartile range is a relatively robust statistic (also sometimes called "resistant") compared to
the range and standard deviation.
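Outside Excel, a comparable summary can be sketched in Python; the CSV file and column name are assumptions carried over from Module 1:

import numpy as np
import pandas as pd
from scipy import stats

prices = pd.read_csv("msft.csv")["Adj Close"]       # hypothetical file from Module 1
log_prices = np.log(prices)                         # log-transformed series

summary = {
    "mean":            log_prices.mean(),
    "stdev":           log_prices.std(ddof=1),
    "skew":            stats.skew(log_prices),
    "excess kurtosis": stats.kurtosis(log_prices),  # > 0 indicates fat tails
    "Q1":              log_prices.quantile(0.25),
    "median":          log_prices.quantile(0.50),
    "Q3":              log_prices.quantile(0.75),
}
summary["IQR"] = summary["Q3"] - summary["Q1"]      # robust measure of dispersion
print(pd.Series(summary).round(3))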

We'll revisit the statistical tests for white noise, normality and the ARCH effect in module 5, but for
now we can ignore the right-most table.



Recall the MSFT log-price histogram in module 1 (see below):
[Figure: histogram of the MSFT daily log prices]






Module 3: Smoothing
In module one (1), we demonstrated the data preparation phase of time series analysis. In
module two (2), we described a few steps to calculate numerous summary statistics and verify the
significance of their values.

In this module, we will walk you through time series smoothing in Excel using NumXL
functions and tools. For sample data, we'll use the S&P 500 weekly closing prices between
January 2009 and July 2012.

NumXL supports numerous smoothing functions, but each function assumes a particular
characteristic about the sample data.
[Figure: S&P 500 weekly closing prices, Jan 2009 - Jul 2012]

Let's consider the S&P 500 weekly close prices time series between Jan 2009 and July 2012.
The time series exhibits a trend over time.

Using an equally-weighted moving average (WMA) with a window size of 4 weeks, forecasting
into the next 12 weeks, we find:
[Figure: S&P 500 with a 4-week WMA and a flat 12-week out-of-sample forecast]

The WMA keeps pace with the original data, but it is lagging. Furthermore, the out-of-sample
forecast is flat.
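A rough equivalent of this 4-week WMA outside Excel, assuming a hypothetical weekly CSV file and column name:

import pandas as pd

# Hypothetical input: weekly S&P 500 closing prices, Jan 2009 - Jul 2012.
sp500 = pd.read_csv("sp500_weekly.csv", index_col=0, parse_dates=True)["Close"]

wma = sp500.rolling(window=4).mean()      # 4-week equally-weighted moving average

# A naive out-of-sample forecast carries the last smoothed value forward,
# which is why the 12-week WMA forecast in the plot above is flat.
wma_forecast = pd.Series(wma.iloc[-1], index=range(1, 13))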



Assuming the trend is deterministic (non-stochastic), we can use the Holt-Winters double
exponential smoothing functions (DESMTH).

[Figure: S&P 500 with Holt-Winters double exponential smoothing and its 12-week forecast]

The double exponential smoothing function[2] tracks the data pretty well, and the forecast looks
in line with the original curve. Is this it? Did we find a crystal ball that tells us where the price
will be each week? Not quite!
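For comparison, Holt's linear (double exponential) smoothing is also available in statsmodels; this sketch is only an analogue of NumXL's DESMTH, with the smoothing parameters optimized from the data as in footnote 2:

from statsmodels.tsa.holtwinters import Holt

# Double exponential smoothing with optimized smoothing parameters.
fit = Holt(sp500).fit(optimized=True)
smoothed = fit.fittedvalues          # in-sample smoothed series
trend_forecast = fit.forecast(12)    # 12-week out-of-sample forecast that follows the trend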

Earlier, we made the assumption that the trend is deterministic (non-stochastic), but the price is
more like a random-walk process, so the trend we observe is just an anomaly that can occur in
the random-walk.
Proof?
The Augmented Dickey-Fuller unit-root test (ADF Test) in NumXL can test for the presence of a
unit-root (i.e. random-walk) in the presence of drift and/or trend.

$(1 - L)y_t = \nabla y_t = \alpha + \beta t + \rho\, y_{t-1} + \cdots + \varepsilon_t$

The ADF test is basically a test of $H_0: \rho = 0$.

The existence of a unit root is confirmed in all three formulations: no constant ($\nabla y_t = \rho\, y_{t-1} + \varepsilon_t$),
constant ($\nabla y_t = \alpha + \rho\, y_{t-1} + \varepsilon_t$) and constant + trend ($\nabla y_t = \alpha + \beta t + \rho\, y_{t-1} + \varepsilon_t$).

The time series is integrated (i.e. has a unit root), so we need to take the first difference to
stabilize (i.e. make stationary) the input data:

$z_t = (1 - L)y_t = \nabla y_t$
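Outside Excel, the same three ADF formulations can be run with statsmodels as a rough cross-check; the sp500 series is the hypothetical weekly data loaded earlier:

from statsmodels.tsa.stattools import adfuller

# ADF test under the three formulations: no constant, constant, constant + trend
# ("n", "c", "ct" in statsmodels terms).
for reg in ("n", "c", "ct"):
    stat, pvalue, *rest = adfuller(sp500, regression=reg)
    print(f"{reg:>2}: ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")

# The series is integrated, so take the first difference to make it stationary.
z = sp500.diff().dropna()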




[2] We used optimal values for the smoothing parameters of the exponential smoothing function.


Module 4: Correlogram Analysis
In module one (1), we demonstrated the data preparation phase of time series analysis. In module
two (2), we described a few steps to calculate numerous summary statistics and verify the
significance of their values.

In this module, we present a few steps to conduct a correlogram analysis in Excel using NumXL
functions and tools.

For sample data, we'll use the S&P 500 log closing prices[3] between January 2009 and July 2012.
[Figure: S&P 500 weekly log closing prices, Jan 2009 - Jul 2012]

Many time series data sets exhibit time interdependency among their values. This is important to
detect, as it will eventually factor into improving the forecast quality of the model.

NumXL supports numerous functions and a wizard user interface, simplifying the process of
constructing ACF and partial ACF (aka PACF) plots.

Using the NumXL Correlogram toolbar, you can generate the ACF/PACF values and their plots
in a few steps.
1. Using the NumXL toolbar (or menu in Excel 97-2003), select Correlogram.



[3] In module 1, we showed that the logarithmic transformation of the prices provides a better distribution of values.


2. The Correlogram dialog box pops up. Fill in the location of your data, series time order,
output options and location for the table and graphs to be generated in your worksheet.


3. Once finished, the tool prints out the table (along with the formulas) into the target cells
and creates a correlogram plot (if selected).



[Figure: ACF and PACF of the S&P 500 log price series, lags 1-15]



The shaded area in the ACF and PACF plots represents the confidence intervals for the ACF and
PACF values.

Note that the PACF is significant (~100%) at lag order 1, and the ACF is declining very slowly.
This is a common pattern indicating the presence of a unit root[4].
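An illustrative ACF/PACF computation in Python (statsmodels), assuming the sp500 weekly series from the earlier sketches:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import acf, pacf

log_sp500 = np.log(sp500)                 # weekly log prices (see footnote 3)

acf_values = acf(log_sp500, nlags=15)     # slow ACF decay suggests a unit root
pacf_values = pacf(log_sp500, nlags=15)   # large spike at lag 1

plot_acf(log_sp500, lags=15)
plot_pacf(log_sp500, lags=15)
plt.show()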

Next, let's take the first difference of the time series.
[Figure: S&P 500 weekly log returns]

Next, let's run the correlogram analysis on the differenced (i.e. log returns) time series.

[Figure: ACF and PACF of the S&P 500 weekly log returns, lags 1-15]


[4] In module three, we tested the time series for the presence of a unit root.


The log returns do not exhibit strong interdependency, though lag orders 8 and 9 show marginal
significance. This begs the following question:

Q1: Does the log-returns time series exhibit white-noise (no serial correlation)?
To answer this question, we'll use the descriptive statistics wizard and check the white-noise test
option.


Now, check the white-noise (Ljung-Box) test field:

The summary statistics table with the white-noise test appears as follows:

The answer to our question is yes: the time series does not exhibit significant serial correlation.
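The same Ljung-Box check can be sketched in Python; lag 12 is an arbitrary choice here, not NumXL's default:

from statsmodels.stats.diagnostic import acorr_ljungbox

log_returns = log_sp500.diff().dropna()    # weekly log returns

# Ljung-Box white-noise test; a large p-value means no significant serial correlation.
print(acorr_ljungbox(log_returns, lags=[12]))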

What's next?
The weekly log-returns distribution possesses fat tails (i.e. excess kurtosis > 0),
which may happen if the squared returns are correlated (aka the ARCH effect).

Q: Does the log-returns time series exhibit an ARCH effect? Are the squared weekly log-returns
correlated, or are they more like white noise?



[Figure: S&P 500 weekly squared log returns]

Again, launch the descriptive statistics wizard from the NumXL toolbar (or menu in Excel 2003),
and select the ARCH effect.

Examining the ARCH effect test results, we conclude that the squared returns are serially
correlated; i.e. we have conditional heteroskedasticity in the log returns.
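Engle's ARCH-LM test in statsmodels gives a comparable check outside Excel; again, lag 12 is an assumption:

from statsmodels.stats.diagnostic import het_arch

# A small p-value indicates the squared returns are serially correlated (ARCH effect).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_arch(log_returns, nlags=12)
print(f"ARCH LM p-value: {lm_pvalue:.4f}")

squared_returns = log_returns ** 2     # used for the squared-returns correlogram below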

Let's examine the correlogram of the squared log returns:
[Figure: ACF and PACF of the squared weekly log returns, lags 1-15]

The PACF shows significant autocorrelation up to the 3rd lag order.
Conclusion
The correlogram analysis is a key tool to explore the interdependency of the observation values;
it can also be used as a tool to identify the model and to estimate the orders of its components.
In our example, we found that the weekly log returns are not correlated, but their squared values
are. As a result, an ARCH/GARCH model may be in order here.



Module 5: Time Series Modeling
In module four (4), we demonstrated correlogram analysis and its use in identifying proper time
series models.

In this module, we will walk you through the model specification process using NumXL
functions and tools.

NumXL supports numerous time series models: ARMA, ARIMA, AirLine, GARCH, etc., and
more will be added as users request them.

In all cases, we start this phase with a model in mind (e.g. GARCH(1,1)), and use NumXL tools
and wizards to facilitate the model specification stage.

For the sample data, we are using the weekly log returns for S&P 500 between January 2009 and
July 2012.
[Figure: S&P 500 weekly log returns, Jan 2009 - Jul 2012]

In Module 4, we showed that the weekly log returns don't exhibit significant serial correlation,
but they do possess an ARCH effect. In other words, an ARCH/GARCH model is better suited to
fit the data than, say, ARMA. To start, let's consider a GARCH(1,1) model:
$y_t = \mu + a_t$
$a_t = \sigma_t \varepsilon_t$
$\varepsilon_t \sim \Phi(0,1)$
$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
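For readers following along outside Excel, the same specification can be written with the Python arch package; this is only an illustration, not part of NumXL (returns are scaled to percent, a common convention that helps the optimizer):

from arch import arch_model

# GARCH(1,1) with a constant mean and Gaussian innovations, mirroring the equations above.
am = arch_model(100 * log_returns, mean="Constant", vol="GARCH", p=1, q=1, dist="normal")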


Using the NumXL toolbar, locate and click on the GARCH icon.




The GARCH wizard dialog box pops up. In the input data field, specify the cell range for the
sample data. Next, enter the values of the ARCH and GARCH component orders as one (1).


For the innovations distribution, we'll use the default Gaussian distribution, and this completes the
GARCH(1,1) model specification.

Next, let's instruct the GARCH wizard to generate the goodness-of-fit calculations and residuals
diagnosis sections and add them to the model output table.

By default, the selected cell is used for the output range value. If this is acceptable, let's click the
OK button.

The following table will be generated in your worksheet:


The model's parameter values are set by a quick guess, and they are not optimal. The model
ought to be calibrated (next module) before we can gauge its fit or consider it for forecasting.

In the middle table (i.e. Goodness of Fit), the wizard created log-likelihood function and Akaike
information criterion formulas in the corresponding cells. The formulas reference the model's
parameter cells and the input data range, so after you calibrate the model, they will reflect the
goodness of fit of the optimal values.



In the right-most table (i.e. Residuals Diagnosis), the wizard created a series of statistical tests
(formulas) for the standardized residuals (i.e. $\{\varepsilon_t\}$) to help us verify the GARCH assumption:

$\varepsilon_t \sim \text{i.i.d.} \sim \Phi(0,1)$

The generated formulas reference the model's parameter cells and the input data cell range, so
when you calibrate (or modify) the values of the model's parameters, the statistical test results
reflect the latest parameter values.


What's next?
A quick recap: we've analyzed the statistical properties of the input data and come up with a suspect
model: GARCH(1,1). Now, we need to answer the following questions:
1. What are the optimal values for GARCH(1,1), given the input data?
2. Given the calibrated model, how well does the model fit the input data? Do the residuals
satisfy the assumption(s) of the underlying model?
3. Are there similar models to consider (e.g. EGARCH, GARCH-M, etc.)? How do we rank
and ultimately decide which of them to use?

As you may have guessed, our analysis has reached a new phase: the model identification phase.
For now, let's address the first two questions:
- In module six (6), we will address the calibration process.
- In module seven (7), we will visit the residuals diagnosis in greater detail and validate the
model's assumptions.





Module 6: Model Calibration
In module five (5), we presented the few steps needed to specify the time series model, along with
the goodness-of-fit and residuals diagnosis tables.

In this module, we will continue along this path and find the optimal values for the model's
parameters, a process referred to as calibration. Once calibrated, you can examine the
residuals against the model's assumptions and compare this model with other models.

NumXL supports numerous time series models, but fortunately the calibration process using
NumXL is the same for all models.

In a nutshell, calibration is an optimization problem where we search for a set of parameter
values that maximize the value of a utility function (i.e. the log-likelihood function) while
complying with one constraint: the stability of the model.

What do we mean by model stability? For an ARMA model, the underlying process has a
finite unconditional (long-run) mean and variance (i.e. the roots of the characteristic equation
are outside the unit circle). For GARCH and GARCH-variant models, in addition to the
earlier constraint, the variance model must guarantee positive values for the conditional variance.

Fortunately, NumXL lumps the model-specific constraints together into a single function (e.g.
ARMA_CHECK, GARCH_CHECK, etc.). The function returns one (1) for a stable model,
and zero (0) otherwise.
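For intuition, here is a minimal sketch of what such a stability check boils down to for a GARCH(1,1) model; this illustrates the constraints, it is not NumXL's GARCH_CHECK implementation:

def garch11_is_stable(omega: float, alpha: float, beta: float) -> int:
    """Return 1 for a stable GARCH(1,1) model, 0 otherwise: positive conditional
    variance (omega > 0, alpha, beta >= 0) and a finite long-run variance (alpha + beta < 1)."""
    positive_variance = omega > 0 and alpha >= 0 and beta >= 0
    finite_long_run = alpha + beta < 1
    return int(positive_variance and finite_long_run)

print(garch11_is_stable(omega=0.0001, alpha=0.10, beta=0.85))   # example values -> 1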

In the Goodness-of-fit table, the right-most formula is actually a stability check.


The first two tables contain all the inputs we need to carry out the calibration process, so let's
proceed.
1. Select the top cell in the model table (i.e. M32 in the figure above).
2. Locate and click on the calibration icon in the NumXL toolbar.

3. The Microsoft Excel solver pops up in your worksheet.



4. Notice that all fields in the solver are already pre-set with your model.
5. Click on the Solver button.

The solver begins its search for a set of parameter values that maximize the objective (i.e. the log-
likelihood function) while keeping the model valid/stable.

Using the GRG method, the solver does not guarantee a global maximum, and we may end up with a
locally optimal solution. This is sufficient for the majority of cases.

Once we accept the solver solution, the new set of parameter values is copied to your worksheet.


Note: If the calculation options are set to manual, you need to force a recalculation to update all
formulas that reference the model's parameter cells.

The calibrated model is shown below:


Note that the log-likelihood function (LLF) value changed from 412 to 425 with the new optimal
values, and the model remains valid/stable (S34 = 1).

What's next?
The calibrated parameter values improved the overall fit of the model to the input data, but is
this the right model? We need to examine the assumption(s) of GARCH and determine whether or not
they are met by this model. This will be the focus of our next module.


Module 7: Residuals Diagnosis
In modules five and six, we demonstrated the time series modeling procedure from model
specification to calibration.

In this module, we will look into the model's residuals time series and examine the underlying
model assumptions (e.g. normality, etc.).

In a nutshell, a time series model draws some patterns for the evolution of values over time and
assumes the error terms (i.e. residuals) to be independent and to follow a particular probability
distribution. Once we fit the model to the sample data, it is imperative to examine those
residuals for independence and check whether their values follow the assumed distribution.

Example 1: An ARMA(p, q) model assumes the residuals time series $\{a_t\}$ to be Gaussian white noise:

$\left(1 - \sum_{i=1}^{p}\phi_i L^i\right) y_t = \left(1 + \sum_{j=1}^{q}\theta_j L^j\right) a_t$
$a_t \sim \text{i.i.d.} \sim \Phi(0, \sigma^2)$


Example 2: A GARCH(1,1) model assumes the standardized residuals $\{\varepsilon_t\}$ to be Gaussian
white noise with zero mean and unit variance:

$y_t = \mu + a_t$
$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
$\varepsilon_t = a_t / \sigma_t$
$\varepsilon_t \sim \text{i.i.d.} \sim \Phi(0,1)$

To avoid confusion as to when you should use the regular residuals $\{a_t\}$ or the standardized
residuals $\{\varepsilon_t\}$, we will limit ourselves to the standardized residuals. This simplifies the diagnosis
dramatically:

$\varepsilon_t \sim \text{i.i.d.} \sim \Phi(0,1)$

Why do we care? The objective of a time series model is forecasting, so by ensuring that our model
properly fits the data and meets all assumptions, we can have faith in the projected forecast.

Note: similar to modules 5 and 6, we will be using the S&P 500 weekly log returns time series
between Jan 2009 and July 2012.

NumXL supports numerous functions to help us construct residuals series and conduct statistical
tests to answer the independence/probability distribution questions.





In this module, you don't need to launch any wizard or create any formula; all the tests we need
are included in the model table (right-most part):


The standardized residuals diagnosis includes the following hypothesis tests:
1. Population mean ($H_0: \mu = 0$)
2. Population standard deviation ($H_0: \sigma = 1$)
3. Population skew ($H_0: S = 0$)
4. Population excess kurtosis ($H_0: K = 0$)
5. White-noise or serial-correlation test ($H_0: \rho_1 = \rho_2 = \cdots = \rho_k = 0$)
6. Normality test
7. ARCH effect test

The first four tests examine the distribution's center, dispersion, symmetry and far-end tails. The
normality test complements these tests by assuming a specific distribution, the Gaussian:

$\varepsilon_t \sim \Phi(0,1)$

The white-noise and ARCH effect tests address a different concern: the independence of the residual
observations:

$\varepsilon_t \sim \text{i.i.d.}$

Since testing for independence is quite a complex topic, we simplify it by examining the linear and
quadratic order dependency.
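The same battery of checks can be sketched in Python on the standardized residuals of the arch model from the earlier illustration; this is an analogue, not NumXL's formulas:

from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch

res = am.fit(disp="off")                   # calibrated GARCH(1,1) from the earlier sketch
std_resid = res.std_resid.dropna()         # standardized residuals

print("mean:", std_resid.mean(), " stdev:", std_resid.std(ddof=1))
print("skew:", stats.skew(std_resid), " excess kurtosis:", stats.kurtosis(std_resid))
jb_stat, jb_pvalue = stats.jarque_bera(std_resid)
print("normality (Jarque-Bera) p-value:", jb_pvalue)
print(acorr_ljungbox(std_resid, lags=[12]))                     # white-noise test
print("ARCH LM p-value:", het_arch(std_resid, nlags=12)[1])     # remaining ARCH effect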

Let's analyze the residuals diagnosis table results:
(1) The population mean (i.e. AVG) test shows that the sample average is not significantly
different from zero (Target). As a result, the residuals distribution has a mean of zero.

(2) The population standard deviation (i.e. STDEV) test shows that sample data standard
deviation is not significantly different from one (1).

(3) The population skew test shows that sample skew is not significantly different from zero.
The residuals distribution is symmetrical.



(4) The population excess kurtosis test indicates sample kurtosis is not significantly different
from that of a normal distribution (i.e. excess-kurtosis = 0). The residuals distribution
tails are normal.

(5) So far, the residuals distribution seems like a Gaussian distribution. The Normality test
shows that standardized residuals are likely to be sampled from a normal population.

(6) Now let's examine the interdependence concern among the values of the residuals. First,
let's examine the first-order (linear) dependence, or serial correlation, using the white-noise
(Ljung-Box) test. The test shows no sign of significant serial correlation.

(7) Let's examine the second-order (quadratic) dependence, or ARCH effect. The ARCH
effect test shows insignificant serial correlation in the squared residuals, i.e. the absence of
an ARCH effect.


As a result, the standardized residuals are independent and identically Gaussian distributed.
Thus, the GARCH model assumption is met. The model is fair.

Let's plot the standardized residuals distribution and the QQ-plot.


[Figure: histogram and QQ-plot of the standardized residuals]

The sample data histogram does not strongly suggest a Gaussian distribution, but this is due to the
construction of the histogram, which is only a rough estimate of the underlying distribution.

On the other hand, the QQ-plot confirms our earlier finding of normality and an absence of fat
tails on either end.






Module 8: Time Series Forecast
In modules five and six, we demonstrated a few steps to specify a model and calibrate the values of
its parameters, and in module seven, we examined the standardized residuals (residuals
diagnosis) to ensure the model properly fits the input data.

In this module, we will take the final step and actually project a forecast: mean, standard error,
and confidence interval.

In general, we are interested in forecasting the conditional mean and conditional standard
deviation (aka volatility):

$y_{T+k} = \mu_{T+k} + a_{T+k}$
$\mu_{T+k} = E_T[y_{T+k}]$
$a_{T+k} = \sigma_{T+k}\,\varepsilon_{T+k}$
$\varepsilon_{T+k} \sim \Phi(0,1)$

Where:
- $\mu_{T+k}$ is the conditional mean forecast at T+k
- $\sigma_{T+k}$ is the conditional volatility forecast at T+k

As a result, for a 95% confidence interval, the forecast is expressed as follows:

$\mu_{T+k} + 1.96\,\sigma_{T+k} \geq y_{T+k} \geq \mu_{T+k} - 1.96\,\sigma_{T+k}$

For GARCH models, the conditional mean is constant ($\mu$), so the forecast procedure is primarily
focused on the volatility forecast:

$\mu + 1.96\,\sigma_{T+k} \geq y_{T+k} \geq \mu - 1.96\,\sigma_{T+k}$
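Before walking through the wizard, here is a rough Python analogue of the forecast itself, continuing the arch model sketches (values are in percent because of the 100x scaling):

import numpy as np

fc = res.forecast(horizon=15)               # 15-step-ahead analytic forecast
mean_fc = fc.mean.iloc[-1]                  # constant conditional mean, one value per step
vol_fc = np.sqrt(fc.variance.iloc[-1])      # conditional volatility per step

upper = mean_fc + 1.96 * vol_fc             # 95% confidence band
lower = mean_fc - 1.96 * vol_fc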

Using the NumXL forecast toolbar, you can generate the out-of-sample forecast values, standard
errors and confidence intervals in a few steps.



1. Select the first cell in the Model (i.e. M32)
2. Locate and click on the Forecast icon in the NumXL toolbar




3. The forecast wizard pops up on the screen.



4. For input data, select the cell range of the latest (i.e. most recent) weekly log returns (~
July 2012).
5. For the realized volatility, you may:
a. Leave it blank, so that the GARCH-fitted volatility is used, or
b. Select a range of the latest weekly realized volatility (computed using a different
approach).
6. In the Output section's Max Steps field, select a 15-week forecast.
7. In the Output section, leave Volatility Term Structure checked for now; we will discuss it later on.
8. In the Output Range, select an empty cell in your worksheet to print the forecast
formulas.
9. Click OK now.
The forecast wizard prints the formulas for different cells in the forecast table (below):



[Figure: 15-week out-of-sample forecast with its confidence interval]


Furthermore, the forecast standard error (i.e. conditional volatility) is increasing with the forecast
horizon (below).

[Figure: GARCH volatility forecast over the forecast horizon]

In the graph above, the volatility forecast climbs to its long-run value of 3.77%.

What are the long-run values?
For a stable time series model, the conditional mean and variance forecasts converge to their long-
run (historical or unconditional) values. The long-run values are implied (i.e. calculated) from
the model's parameter values.

Example: for GARCH(1,1), the long-run conditional volatility (GARCH_VL) is expressed as
follows:

$\sigma_{T+L} = \sqrt{V_L} = \sqrt{\frac{\alpha_0}{1 - \sum_{i=1}^{\max(p,q)}(\alpha_i + \beta_i)}}$
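With the calibrated arch model from the earlier sketches, the GARCH(1,1) special case of this formula is a short computation; the parameter names follow the arch package, not NumXL:

import numpy as np

omega = res.params["omega"]
alpha = res.params["alpha[1]"]
beta = res.params["beta[1]"]

long_run_vol = np.sqrt(omega / (1 - alpha - beta))          # long-run weekly volatility
print(f"long-run weekly volatility: {long_run_vol:.2f}%")   # percent, given the 100x scaling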


Term Structure
In finance, we often wish to compute a multi-period volatility forecast (aka volatility term
structure).



$\sigma_{T,T+K}^2 = \sigma_{T,T+1}^2 + \sigma_{T+1,T+2}^2 + \cdots + \sigma_{T+k-1,T+k}^2$
$\sigma_{T,T+K}^2 = \sigma_{T+1}^2 + \sigma_{T+2}^2 + \cdots + \sigma_{T+k}^2$

Now we have a base unit of $\sigma_{T,T+K}^2$ expressed in terms of a k-period time unit. To facilitate
comparison among different periods, we use a one-period time unit for all volatility calculations:

$\bar\sigma_{T,T+K}^2 = \frac{\sigma_{T+1}^2 + \sigma_{T+2}^2 + \cdots + \sigma_{T+k}^2}{k}$
Let's plot the GARCH volatility term structure:

[Figure: GARCH volatility forecast (local vol) and the volatility term structure]


As for the multi-period log returns:

$r_{T,T+k} = r_{T,T+1} + r_{T+1,T+2} + \cdots + r_{T+k-1,T+k}$

And using the same time-unit base:

$\bar r_{T,T+k} = \frac{r_{T+1} + r_{T+2} + \cdots + r_{T+k}}{k}$
For the GARCH model, the conditional mean forecast is constant ($\mu$), so the multi-period returns
(term structure) are also constant ($\mu$).

Application
Using the GARCH model earlier, what is the 3-month (12 weeks) volatility forecast per annum?

$\sigma_{T,T+12}^2 = \frac{52}{12}\left(\sigma_{T+1}^2 + \sigma_{T+2}^2 + \cdots + \sigma_{T+12}^2\right)$
This volatility value can be plugged into the Black-Scholes option pricing equation to price a
3-month European S&P 500 index option.
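Continuing the forecast sketch, the annualized 3-month volatility works out as follows (52 weeks per year; again an illustration outside Excel, not a NumXL formula):

import numpy as np

variance_12w = fc.variance.iloc[-1].iloc[:12]     # forecast variances for steps 1..12
weekly_vol = np.sqrt(variance_12w.mean())         # one-period (weekly) base unit
annualized_vol = weekly_vol * np.sqrt(52)         # 3-month volatility forecast, per annum
print(f"3-month volatility forecast, annualized: {annualized_vol:.2f}%")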






Online Resources
NumXL Reference Manual
http://www.spiderfinancial.com/support/documentation/numxl/reference-manual
NumXL Tutorial Videos
http://www.spiderfinancial.com/support/library
http://www.youtube.com/user/spiderfinancial
NumXL User Interface Guide
http://www.spiderfinancial.com/support/documentation/numxl/user-interface-guide
NumXL Tips & Hints Archives
http://www.spiderfinancial.com/tips-demos
http://www.scribd.com/spiderfinancial
NumXL select-cases
http://www.spiderfinancial.com/support/documentation/numxl/white-paper


Useful Links & Resources
https://www.facebook.com/spiderfinancial/app_197602066931325
