Econometrics in MATLAB: ARMAX, Pseudo Ex-Post Forecasting, GARCH and EGARCH, Implied Volatility
Introduction to MATLAB 2
Piotr Z. Jelonek
Thus $u_t$ depends on its past values $u_{t-1}, u_{t-2}, \ldots, u_{t-p}$, weighted with, respectively, $\phi_1, \phi_2, \ldots, \phi_p$,
and on current and past residuals $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$, weighted with $\theta_0 = 1, \theta_1, \ldots, \theta_q$. While the past
values of $u_t$ are treated as known, the residuals are not observable and may only be estimated from
the data. Two important simple cases of the ARMA(p,q) model are:
AR(1): $u_t = c + \phi_1 u_{t-1} + \varepsilon_t$, MA(1): $u_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}$.
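To make the two simple cases concrete, the following minimal sketch simulates one AR(1) path and one MA(1) path from the same residuals. The parameter values and variable names are illustrative assumptions and do not come from the course scripts.

```matlab
% Sketch: simulate AR(1) and MA(1) paths (all parameter values are made up).
rng(0);                          % reproducible random numbers
T      = 500;                    % number of observations
c      = 0.1;                    % constant
phi1   = 0.6;                    % AR(1) coefficient
theta1 = 0.4;                    % MA(1) coefficient
e      = randn(T,1);             % residuals epsilon_t
u_ar   = zeros(T,1);
u_ma   = zeros(T,1);
u_ar(1) = c/(1-phi1) + e(1);     % start near the unconditional mean
u_ma(1) = c + e(1);
for t = 2:T
    u_ar(t) = c + phi1*u_ar(t-1) + e(t);      % AR(1): depends on its own past value
    u_ma(t) = c + e(t) + theta1*e(t-1);       % MA(1): depends on the past residual
end
fprintf('Sample means: AR(1) %.3f, MA(1) %.3f\n', mean(u_ar), mean(u_ma));
```

In this setup the AR(1) path fluctuates around c/(1-phi1), while the MA(1) path fluctuates around c.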
The ARMA model has two useful extensions.
First, we might explain the dependent variable with current or past values of additional
exogenous variables. This extension is known as AutoRegressive–Moving-Average with eXogenous
inputs (ARMAX). For example, if we added an exogenous variable $v_t$ to our specification, a simple
ARMAX(1,1) model could take the form:
$u_t = c + \phi_1 u_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \beta v_t$.
Second, we could estimate an ARMA model on differences of order d of the dependent variable. This
extension is known as AutoRegressive Integrated Moving-Average and abbreviated to ARIMA(p,d,q).
Since the series used in this class are already transformed to log-returns, we do not need any further
differencing and thus we may set d = 0. In this case ARIMA(p,0,q) and ARMA(p,q) are the same
model. ARIMA also has a more complicated variant that can capture seasonality. While it is
available in MATLAB, we will not be using it here (financial data is rarely seasonal).
Since MATLAB's default function estimates ARIMA models, from a technical point of
view what we estimate below are ARIMA(X) models. But since we do not need to
difference the dependent variable, these are in fact ARMA(p,q) specifications. Let us
look at some syntax.
To estimate an ARMA(p,q) model in MATLAB, type:
Mdl = arima(p,0,q); % <- shorthand syntax, model with a constant
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,r);
If you intend to estimate a model with no constant term, replace the first line by
Mdl = arima('ARLags',1:p,'MALags',1:q,'Constant',0);
Above, the input variables p and q are the desired lag orders of, respectively, the autoregressive and moving
average parts of the ARMA specification. The vectors 1:p and 1:q enumerate the desired lags (in
the latter syntax you may specify individual lags instead of an entire range). The letter r stands for the
underlying data vector (in my examples it will be either returns or log-returns of Alphabet
Inc. or Apple Inc.). Mdl is a structure which holds the model setup (strictly speaking it is a model
object, but its properties are accessed in the same way as the fields of a structure).
A structure is just a number of variables of different types, bundled together. Structures have
fields, one for each variable. To access a chosen field, refer to it using the
name of the structure and the name of the field, separated by a dot. For example, the Mdl structure
has a field P, denoting the lag order of the autoregressive part. To access its value (and save it as lowercase p),
refer to it as:
p=Mdl.P
To view other fields of this structure, click it to inspect it in the workspace window.
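As a small illustration of the dot syntax described above, the sketch below builds a structure by hand and reads its fields back. The structure s and its field names are hypothetical; they are not the fields of Mdl.

```matlab
% Sketch: building a structure and reading its fields (names are illustrative).
s.name = 'Alphabet Inc.';     % a character field
s.P    = 2;                   % a numeric field, e.g. an AR lag order
s.lags = [1 3];               % a vector field
p  = s.P;                     % read one field with the dot syntax
fn = fieldnames(s);           % list all field names as a cell array
fprintf('%s has %d fields; P = %d\n', s.name, numel(fn), p);
```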
Among the output variables, EstMdl is a structure which contains the estimated model,
EstParamCov contains the estimate of the covariance matrix of the model parameters, logL refers to the
logarithm of the model likelihood, and info is yet another structure containing information on the
convergence of the estimation process.
The following code selects the best¹ ARMA(p,q) specification for Alphabet Inc. log-returns,
given p ≤ max_p and q ≤ max_q:
for i=1:max_p
    for j=1:max_q
        Mdl = arima(i,0,j); % <- shorthand syntax, model with a constant
        [EstMdl,EstParamCov,logL,info] = estimate(Mdl,r);
        [aic,bic]=aicbic(logL,i+j,N);
        fprintf('\n');
        % ... (rest of the loop body is not shown in this excerpt)
    end
end
In the full script, the function infer() returns the model residuals and the estimates of their variance(s). You may use
the function forecast() to generate h-step ahead forecasts from a model given by the EstMdl structure,
using vector r as the past data:
[fcast,YMSE] = forecast(EstMdl,h,'Y0',r);
¹ It would be more interesting to investigate all possibilities by selecting separate lags.
Output variable fcast will contain the forecast, YMSE is a vector of the corresponding mean
square errors. This example is provided as ‘ARMA selection.m’ script.
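The information criteria returned by aicbic can also be written out by hand, which makes the trade-off between fit and the number of parameters explicit. The sketch below uses made-up log-likelihood values for two hypothetical models; only the formulas are standard.

```matlab
% Sketch: AIC and BIC by hand for two hypothetical fitted models.
N     = 500;                  % number of observations (made up)
logL1 = 1510.2;  k1 = 2;      % log-likelihood and parameter count, model 1
logL2 = 1512.0;  k2 = 4;      % log-likelihood and parameter count, model 2
aic1 = -2*logL1 + 2*k1;       % AIC = -2 logL + 2k
aic2 = -2*logL2 + 2*k2;
bic1 = -2*logL1 + k1*log(N);  % BIC = -2 logL + k log(N)
bic2 = -2*logL2 + k2*log(N);
fprintf('AIC: %.1f vs %.1f, BIC: %.1f vs %.1f\n', aic1, aic2, bic1, bic2);
```

Lower values are better. With these numbers the smaller model is preferred by both criteria, because the gain in likelihood does not offset the penalty for two extra parameters.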
The script ‘regARMA selection.m’ selects the best model using regARIMA (a regression model with
ARMA errors). Here, instead of taking all the AR lags up to and including p and all the MA lags up to and including
q, we indicate which specific lags to include in the model specification.
% forecast horizon
h=6;
% loading data
data
% find log-returns
r_a=log(a(1:end-1))-log(a(2:end));
r_g=log(g(1:end-1))-log(g(2:end));
tic
for i=1:max_p
v = dec2ind(i); colsv=size(v,2);
for j=1:max_q
w = dec2ind(j); colsw=size(w,2);
% Mdl = regARIMA('ARLags',v,'MALags',w);
Mdl = regARIMA('ARLags',v,'MALags',w,'Constant',0); % <- model without intercept
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,r);
[aic,bic]=aicbic(logL,colsv+colsw,N);
best_i=i;
best_j=j;
end
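The helper dec2ind is not shown in these notes. One plausible reading, offered here purely as an assumption, is that it converts the loop counter into a list of lags by reading its binary digits as on/off switches. A hypothetical stand-in could look like this:

```matlab
% Hypothetical stand-in for dec2ind: read the binary digits of k as lag switches.
k     = 5;                       % 5 is 101 in binary
nbits = 3;                       % largest lag considered
bits  = bitget(k, 1:nbits);      % bits(m) is 1 if lag m is switched on
v     = find(bits);              % selected lags, here lags 1 and 3
disp(v);
```

Under this reading, letting k run from 1 to 2^nbits-1 enumerates every non-empty subset of the lags 1, ..., nbits.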
There are two easy ways to forecast the variable $u_t$ indirectly through its log-returns. We can either
specify the basic model as:
$r_{t+h} = \alpha + \beta r_t + \varepsilon_t$, (1)
or, alternatively, we may write it down using log-returns across h periods:
% DEFINE PARAMETERS
% forecast horizon (in periods)
h=5;
% number of forecasts
nf=250;
% lag order
l=1;
% LOADING DATA
A = xlsread('it.xlsx');
% find log-returns
r_a=log(a(1:end-1))-log(a(2:end));
r_g=log(g(1:end-1))-log(g(2:end));
% data length
n=size(r_a,1);
% FORECASTING
for i=1:nf; % <- for a required number of forecasts
first=n-(nf-i+1)-w-h+1;
last=n-(nf-i+1)-h;
y=r_a(first:last);
% BENCHMARK - AR(1)
% select the data
x=[ones(w,1) r_a(first-(h-j+1):last-(h-j+1),1)];
% ols estimation
b=(x'*x)\x'*y; % <- the same as: b=inv(x'*x)*x'*y;
% take the last row of x’s, multiply by betas
f=x(end,:)*b;
% save forecasts
bmark1(i,j)=f;
end
end
% forecast accuracy
fprintf(1,'Comparing accuracy of the model with AR(1)\n');
fcast_acc=fcast_std./bmark0_std
fprintf(1,'Comparing accuracy of the model with naive forecasts\n');
fmark1_acc=bmark1_std./bmark0_std
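The excerpts above come from a longer script, so they do not run on their own. The following self-contained sketch illustrates the same idea (rolling-window OLS estimation of an AR(1), pseudo ex-post forecasts, comparison with a naive benchmark) on simulated data. The variable names, the one-step horizon, the zero forecast used as the naive benchmark and the RMSE ratio are simplifying assumptions and may differ from what the course script does.

```matlab
% Sketch: rolling-window AR(1) forecasts vs. a naive zero forecast.
% All data are simulated; names are illustrative, not taken from the scripts.
rng(1);                          % reproducible random numbers
n  = 600;                        % sample length
w  = 100;                        % rolling window length
nf = 250;                        % number of forecasts
r  = 0.01*randn(n,1);            % simulated "log-returns" (white noise)

f_ar    = zeros(nf,1);           % AR(1) forecasts
f_naive = zeros(nf,1);           % naive forecasts: always zero
actual  = zeros(nf,1);           % realised values

for i = 1:nf
    t = n - nf + i;              % index of the value being forecast
    y = r(t-w:t-1);              % dependent variable: the last w observations
    x = [ones(w,1) r(t-w-1:t-2)];% regressors: constant and one lag
    b = x\y;                     % OLS estimate (least squares)
    f_ar(i)   = [1 r(t-1)]*b;    % one-step-ahead forecast, uses data up to t-1 only
    actual(i) = r(t);
end

rmse_ar    = sqrt(mean((actual - f_ar).^2));
rmse_naive = sqrt(mean((actual - f_naive).^2));
ratio      = rmse_ar/rmse_naive; % values above 1 mean the naive benchmark wins
fprintf('RMSE ratio (AR(1) / naive): %.3f\n', ratio);
```

Because the simulated series is white noise, there is nothing for the AR(1) to exploit, so a ratio close to 1 is what one should expect here.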
function plotfcast(y,fcast_prices,scale)
% This is my function which plots the forecasted trajectories.
n=size(y,1);
nf=size(fcast_prices,1);
h=size(fcast_prices,2)-1;
x=(1:n)'; % <- time index 1,2,...,n
plot(x,y)
hold on;
for i=1:nf
    plot(x(n-h-(nf-i):n-(nf-i)),fcast_prices(i,:),'b');
end
hold off;
axis tight;
axis([min(x) max(x) (1/scale)*min(y) scale*max(y)]); % <- defining range of x and y
end
To forecast from a different model (e.g. ARMAX), all you need to modify are lines 77-84
in the Forecasting ARX.m script. The graph of the generated forecast trajectories is presented
below.
Figure 1: Pine in the wind: ex-post Alphabet Inc. share price forecasts (pretty bad, as you can see).
In an attempt to generate better forecasts you may try to change the number of forecasts (nf=250),
the length of the rolling window (w=100) or the forecast horizon (h=5).
Why is it useful: Contrary to what you might think, most models are bad at forecasting. Good
sample statistics might simply signal overfitting. Statistical significance of model coefficients
does not imply economic significance, since the estimated coefficients may be tiny and thus
tell us nothing useful about the future. From a statistical point of view, a single forecasted
trajectory is equivalent to a sample of size 1, so it is useless when we are trying to evaluate
the properties of the trajectories generated by the model. It is hard to evaluate whether a model is good or
bad without an appropriate point of reference (a benchmark). The pine in the wind chart allows us
to assess forecasts qualitatively (are we getting the trend and the turning points right?).
In the ‘Forecasting ARX.m’ script I provide a model with a slightly different specification
which includes a moving average component. With this one we will generate a smaller number
of forecasts – otherwise it takes about 2 hrs to run. You may also run a ‘Nasdaq.m’ script which
provides another (negative) example of an ARMA forecast. This last bit of code was extracted
from the MATLAB help.
3 GARCH and EGARCH models, simulations
Asset returns display empirical features known as volatility clustering and heavy tails.
Volatility clustering: disturbances $u_t$ of large absolute value tend to be followed by further large
absolute values, but not necessarily of the same sign. Heavy tails: extreme observations (very
large or very small) happen more often than the fitted normal distribution would imply.
A likely explanation is behavioural. If investors are anxious and worried, market volatility is large;
if they are calm and relaxed, it is small. Panicked or excited investors are more likely to imitate
each other. As a result, the market goes through alternating turbulent (volatile) and tranquil periods.
Since investors’ attitude does not (typically) change overnight, volatility is persistent. We may
use this observation to model volatility as an unobserved variable. This variable will change
over time, depending on the past data. This is useful because volatility is to some extent forecastable.
ARCH/GARCH models represent volatility as an unobserved variable which depends on the data from
t − 1. As a result, the conditional standard deviation (and thus also the conditional variance) changes
over time. These models also explain heavy tails.
Figure 2: Estimate of volatility, obtained from GARCH(1,1).
N=size(g,1);
z=[0; randn(N-1,1)];
lagz=[0; z(1:N-1,1)];
r_sim=zeros(N,1);
s2_sim=[sigma_0^2; zeros(N-1,1)];
for i=2:N
s2_sim(i,1)=beta_0+s2_sim(i-1,1)*(beta_1+beta_2*z(i-1,1)^2);
r_sim(i,1)=alpha_0+alpha_1*r_sim(i-1,1)+sqrt(s2_sim(i,1))*z(i,1)+...
+alpha_2*sqrt(s2_sim(i-1,1))*z(i-1,1);
end
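The simulation excerpt above relies on parameters defined elsewhere in the script. A self-contained variant, with illustrative parameter values chosen only for this sketch and with the mean equation switched off (all alpha coefficients set to zero), could look like this:

```matlab
% Sketch: simulate a GARCH(1,1) variance path and returns (illustrative parameters).
rng(2);                                          % reproducible random numbers
N      = 2000;
beta_0 = 1e-6;  beta_1 = 0.90;  beta_2 = 0.08;   % beta_1 + beta_2 < 1
z      = randn(N,1);                             % standardised innovations
s2     = zeros(N,1);                             % conditional variances
r      = zeros(N,1);                             % simulated returns
s2(1)  = beta_0/(1 - beta_1 - beta_2);           % start at the unconditional variance
r(1)   = sqrt(s2(1))*z(1);
for i = 2:N
    % same variance recursion as in the excerpt above
    s2(i) = beta_0 + s2(i-1)*(beta_1 + beta_2*z(i-1)^2);
    r(i)  = sqrt(s2(i))*z(i);
end
% sample kurtosis computed by hand (equals 3 for a normal distribution)
d = r - mean(r);
k = mean(d.^4) / mean(d.^2)^2;
fprintf('Sample kurtosis of simulated returns: %.2f\n', k);
```

Plotting r shows calm and turbulent stretches (volatility clustering), and the sample kurtosis typically comes out above 3, in line with the heavy tails discussed earlier.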
Alternatively, you might use a VIX-style index (VIX is calculated for the S&P 500), although it might be
labour-intensive (for more details see the next section).
You may also try the ‘regARMA GARCH.m’ and Simulate and forecast GARCH scripts; the latter
uses predefined MATLAB functions to forecast volatility.
The Exponential GARCH (EGARCH) model belongs to the class of nonlinear, asymmetric GARCH
models. These models account for the asymmetry of conditional volatility. This asymmetry reflects
the fact that positive and negative innovations do not generate the same standard deviation.
The basic EGARCH(1,1) model is:
where $\varepsilon_t \sim \mathrm{NID}(0, 1)$ and $E|\varepsilon_{t-1}| = \sqrt{2/\pi}$. Main features:
• negative innovations may have a bigger impact on future variance than positive innovations
of the same magnitude (erroneously called the “leverage effect”)
• the exponential of the log-variance is always positive, so no non-negativity conditions are needed
• natural logs make the construction of unbiased volatility forecasts more difficult
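For reference, one common way of writing an EGARCH(1,1) model that is consistent with the condition $E|\varepsilon_{t-1}| = \sqrt{2/\pi}$ is given below. This is a textbook parametrisation stated as an assumption; the symbols $\omega$, $\beta$, $\gamma$ and $\alpha$ are introduced here for illustration, and the specification used in the course scripts may differ.

```latex
u_t = \sigma_t \varepsilon_t, \qquad
\ln \sigma_t^2 = \omega + \beta \ln \sigma_{t-1}^2
  + \gamma\, \varepsilon_{t-1}
  + \alpha \left( |\varepsilon_{t-1}| - \sqrt{2/\pi} \right)
```

In this form $\gamma < 0$ means that a negative innovation raises next period's log-variance by more than a positive innovation of the same size, and $\sigma_t^2 = \exp(\ln \sigma_t^2)$ is positive for any parameter values.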
You may also try the ‘regARMA EGARCH.m’ and Simulate and forecast EGARCH scripts; the
latter uses predefined MATLAB functions to forecast volatility.
Figure 3: Estimate of volatility, obtained from EGARCH(1,1).
4 Implied volatility
Definition 2 (Implied volatility) The volatility of (future) stock prices which yields theoretical
option prices (e.g. from the Black-Scholes formula) equal to the recorded market transaction
prices.
The script ‘IV.mat’ calculates implied volatility using put option prices (transaction level data)
and risk-free interest rates (obtained from futures prices). This code uses a bisection method in
order to calculate the level of volatility (σ) which – according to the Black-Scholes formula – would
generate the option price recorded in a given transaction. Then the sigmas obtained for all the
options traded on a given day are weighted (using transaction volume) to obtain a daily proxy of
volatility.
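The bisection step and the volume weighting described above can be sketched in a few lines. The example below is not the course script: it prices a European put with the Black-Scholes formula on made-up inputs (spot S, strike K, rate r, maturity T), inverts the price by bisection, and then forms a volume-weighted average of a few hypothetical implied volatilities.

```matlab
% Sketch: implied volatility of a European put by bisection (Black-Scholes).
% All inputs are made-up illustrative numbers, not data from the course files.
S = 100;  K = 95;  r = 0.05;  T = 0.5;     % spot, strike, rate, maturity in years
sigma_true = 0.25;                         % volatility used to create a test price

Phi = @(x) 0.5*erfc(-x/sqrt(2));           % standard normal CDF
d1  = @(s) (log(S/K) + (r + 0.5*s^2)*T) / (s*sqrt(T));
put = @(s) K*exp(-r*T)*Phi(-(d1(s) - s*sqrt(T))) - S*Phi(-d1(s));

price = put(sigma_true);                   % "market" price to be inverted

lo = 1e-4;  hi = 5;                        % bracket for sigma
tol = 1e-8;                                % stop when the bracket is this narrow
while hi - lo > tol
    mid = 0.5*(lo + hi);
    if put(mid) > price                    % the put price increases with sigma
        hi = mid;
    else
        lo = mid;
    end
end
iv = 0.5*(lo + hi);
fprintf('Implied volatility: %.6f\n', iv);

% Volume-weighted daily proxy from several hypothetical transactions
ivs = [0.22; 0.25; 0.30];                  % implied volatilities of three trades
vol = [10; 30; 60];                        % their trading volumes
daily_iv = sum(vol.*ivs)/sum(vol);
```

Bisection works here because the Black-Scholes price is monotonically increasing in sigma, so the bracket always contains the solution as long as the observed price lies between the prices at the two initial end points.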
In the code below the pwd (‘print working directory’) command produces a string (a sequence
of characters) which corresponds to the current directory. The syntax [pwd,'\'] appends a single
character ('\', the Windows path separator) to the string representing the working directory. Hence the command:
currentFolder=[pwd,'\'];
stores the path of the current folder, with a trailing backslash, in currentFolder. This is later used to create the paths to the files with put option prices.
The command run() runs the file saved under the indicated path. The command:
filename1=names1{i,1};
accesses the entry in the i-th row and first column of the cell array names1 and saves it as filename1. Cell arrays are MATLAB
data structures similar to matrices, but their entries are not required to be numerical. In this
case they are strings, listing the names of all the files containing put option prices. In order to
access the elements of a cell array, we need to use curly brackets (‘{’, ‘}’).
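The sketch below illustrates both points on made-up file names: the difference between curly and round brackets when indexing a cell array, and fullfile as a portable alternative to appending '\' by hand (fullfile inserts the separator appropriate for the operating system). The folder and file names are hypothetical.

```matlab
% Sketch: portable paths and cell indexing (file names are made up).
currentFolder = pwd;                                    % current directory as text
optionsFolder = fullfile(currentFolder,'put_options');  % adds the right separator
names1 = {'options_a.xlsx'; 'options_b.xlsx'};          % hypothetical cell of names
i = 2;
filename1 = names1{i,1};      % curly brackets: the content (a character vector)
sub       = names1(i,1);      % round brackets: a 1x1 cell, not the text itself
fullpath  = fullfile(optionsFolder, filename1);
disp(fullpath);
```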
% fixed parameters
global small_number;
small_number=0.00000001; % <- smallest increment allowed for
% setting folders
currentFolder=[pwd,'\'];
optionsFolder=[currentFolder,'put_options','\'];
futuresFolder=[currentFolder,'futures','\'];
% loading data
filename='mibid_and_mibor.xlsx';
[NUM,TXT,RAW]=xlsread([currentFolder,filename]);
% defining outputs
ivalues=[];
output=[];
count1=0; count2=0; flag=0;
% processing options
n=size(NUM,1);
strike=NUM(:,1);
call_price=NUM(:,7);
trading_volume=NUM(:,9);
open_interest=NUM(:,11);
option_date=datenum(RAW1(2:end,2),'dd/mm/yyyy'); % <- former t0
expiry_date=datenum(RAW1(2:end,3),'dd/mm/yyyy');
dt=(expiry_date-option_date)/365;
% loading futures prices from the files
filename2=names2{i,1};
[NUM,TXT,RAW2]=xlsread([futuresFolder,filename2]);
% processing futures
future_price=NUM(:,6);
future_date=datenum(RAW2(2:end,2),'dd/mm/yyyy');
% defining outputs
iv=zeros(n,1);
if t>=first_date && t<=last_date % <- if the interest rates are available ...
k=strike(m,1);
moneyness=k/f;
if T-t<30
h=1;
elseif (T-t>=30 && T-t<60)
h=2;
else
h=3;
end
id1=0;
while (dates(id1+1)<=t) && (id1+1 < size(dates,1))
id1=id1+1;
end
id2=size(dates,1);
while (dates(id2-1)>=t) && (id2-1>1)
id2=id2-1;
end
d=dt(m);
p=f*exp(-r*d);
c=call_price(m);
if c>=(k*exp(-r*d)-p)
v_p = cal_iv_p(k,c,d,r,p);
iv(m,1)=v_p(1);
end
oi=open_interest(m);
tv=trading_volume(m);
if ~isnan(v_p(1))
if tv>0
% data for analysing the relationship, list after commas
output=[output;t,T,v_p(1),oi,tv];
end
end
end
end
m=m+1;
end
ivalues=[ivalues; iv];
end
size(z2,1);
fprintf(1,'\nTotal number of approximated volatilities: %i',...
size(z2,1));
fprintf(1,'\nPercentage of cases when numerical approx. fails: %4.3f',...
size(z1,1)/size(z2,1));
fprintf(1,'\n');
Figure 4: Implied volatilities (volume-weighted) implied by put option prices (NIFTY index)
5 Best coding practices
• use different naming conventions for scripts and functions (the names of my scripts always
begin with a capital letter, hence you always know which file is executable)
• make sure that running a script does not require any manual operations (such as loading
the data separately); all scripts should work on a plug-and-play basis (this makes your work reusable,
otherwise after a few months you will not remember how to use the script)
• in general: any operation which you will be doing 3 or more times should be automated
• if there are any limitations in using a script, or if you need to perform any operations before running the
script (which for some reason cannot be automated), describe them in the header of the script
(this makes the script easier to use when you go back to it some time later)
• if you think it might be helpful later on, add comments to the code (again, this increases its
reusability: the code will be easier to modify later)
• any complicated block of operations which has to be repeated again and again is best
written as a separate function
• in the header of the function explain what the function does, list the function's inputs (and
their types) and its outputs (and their types)
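As an illustration of the last two points, here is a minimal sketch of a small function with a documented header. The function logReturns is a hypothetical example written for this purpose and is not one of the course files; note that it assumes prices ordered from oldest to newest.

```matlab
function r = logReturns(p)
% LOGRETURNS  Log-returns of a price series.
%   r = logReturns(p)
%   Input:  p - (n x 1) numeric vector of strictly positive prices, oldest first
%   Output: r - ((n-1) x 1) numeric vector with r(t) = log(p(t+1)) - log(p(t))
%   Limitations: missing values are not handled (NaN entries propagate).
    r = diff(log(p(:)));     % p(:) forces a column vector
end
```

Saved as logReturns.m, it can be called as r = logReturns([100; 110; 121]), which returns two equal log-returns of log(1.1). The lower-case first letter follows the convention above that only script names begin with a capital letter.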