Hidden Markov Models in Univariate Gaussians
$$\sum_{k=1}^{N} \mathrm{Prob}[q_{t+1} = s(k) \mid q_t = s(i)]\;\mathrm{Prob}[x_{t+1} \mid q_{t+1} = s(k), \theta]\;\mathrm{Prob}[x_{t+2}, \ldots, x_T \mid q_{t+1} = s(k), \theta]$$
If one defines the following variables:
$$\beta_t(i) = \mathrm{Prob}[x_{t+1}, \ldots, x_T \mid q_t = s(i), \theta]$$
then:
6 HMMUG_20091222_05.nb
$$\beta_t(i) = \sum_{k=1}^{N} a(i, k)\, b_{t+1}(k)\, \beta_{t+1}(k)$$
The initial condition to start the backward recursion is arbitrarily chosen as:
$$\beta_T(i) = 1$$
The backward recursion starts at time t = T and recursively calculates the backward probabilities across states for each time period down to t = 1.
Sufficient Statistics for the Occult Data, Updating $\theta$, and Calculating $\log \mathcal{L}[X \mid \theta]$
With the forward probabilities $\alpha_t(i)$ and backward probabilities $\beta_t(i)$ one can compute the probability that the system is in a specific state at a specific time given the observed data and the working parameters:
$$\mathrm{Prob}[q_t = s(i) \mid X, \theta] \propto \mathrm{Prob}[x_1, \ldots, x_t, q_t = s(i) \mid \theta]\;\mathrm{Prob}[x_{t+1}, \ldots, x_T \mid q_t = s(i), \theta]$$
Define:
$$\gamma_t(i) = \mathrm{Prob}[q_t = s(i) \mid X, \theta]$$
then:
$$\gamma_t(i) \propto \alpha_t(i)\, \beta_t(i)$$
Similarly, we can calculate the probability that the system transits from state s(i) to state s(j) at time t:
$$\mathrm{Prob}[q_t = s(i), q_{t+1} = s(j) \mid X, \theta] \propto \mathrm{Prob}[x_1, \ldots, x_t, q_t = s(i) \mid \theta]\;\mathrm{Prob}[q_{t+1} = s(j) \mid q_t = s(i)]\;\mathrm{Prob}[x_{t+1} \mid q_{t+1} = s(j), \theta]\;\mathrm{Prob}[x_{t+2}, \ldots, x_T \mid q_{t+1} = s(j), \theta]$$
Recalling the definition of $\xi_t(i, j)$:
$$\xi_t(i, j) = \mathrm{Prob}[q_t = s(i), q_{t+1} = s(j) \mid X, \theta]$$
then:
$$\xi_t(i, j) \propto \alpha_t(i)\, a(i, j)\, b_{t+1}(j)\, \beta_{t+1}(j)$$
Note that $\xi_t$ is not defined for t = T. The values of $\gamma_t$ and $\xi_t$ are normalized so that they represent proper probability measures:
$$\sum_{k=1}^{N} \gamma_t(k) = 1 \qquad \sum_{k=1}^{N} \sum_{l=1}^{N} \xi_t(k, l) = 1$$
Again, note that, consistent with the definitions of the variables:
$$\gamma_t(i) = \sum_{k=1}^{N} \xi_t(i, k)$$
As described earlier, the parameter set $\theta$ is updated using $x_t$, $\gamma_t$ and $\xi_t$. The log likelihood can be computed by summing $\alpha_T(i)$ over the states:
$$\log \mathcal{L}[\theta \mid X] = \log \sum_{i=1}^{N} \mathrm{Prob}[X, q_T = s(i) \mid \theta] = \log \sum_{i=1}^{N} \alpha_T(i)$$
MLE Implementation
Implementation Issues
There are several implementation issues associated with actually fitting a model to data. We deal with three here:
Selecting the number of states.
Initializing and terminating the algorithm. This has three sub-issues:
Generating an initial estimate of the parameters $\theta^{(0)}$.
Terminating the algorithm.
Avoiding local maxima of the log likelihood.
Scaling to avoid under- and over-flow conditions in the forward-backward recursions.
Selecting the Number of States:
In MLE one can always increase the likelihood by adding parameters, but, as one adds parameters, the risk of overfitting is also
increased. A trade-off mechanism is needed.
The Bayesian Information Criterion (BIC) is the log likelihood penalized for the number of free parameters [McLachlan 2000].
The BIC is not a test of significance (i.e., it is not used to accept or reject models), but it does provide a means of ranking models
that have been fit on the same data. The model with the smallest (i.e., usually most negative) BIC is preferred.
The BIC given the likelihood $\mathcal{L}$, the number of free parameters $k$, and the number of observations $n$ is
$$\mathrm{BIC}[X, \theta] = -2 \log \mathcal{L} + k \log n$$
In the HMMUG model the number of observations is $T$, and the number of free parameters for $\iota$, $a$, $\mu$, and $\sigma$, respectively, is
$$k[\theta] = (N - 1) + N(N - 1) + N + N = N(N + 2) - 1$$
The HMMUG model's BIC is, therefore,
$$\mathrm{BIC}[X, \theta] = -2 \log \mathcal{L}[\theta \mid X] + (N(N + 2) - 1) \log T$$
There are alternatives to the BIC [McLachlan 2000] that involve similar concepts of log likelihood penalized for model
complexity.
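The parameter count and the BIC formula are easy to check numerically. The following is a minimal Python sketch, not part of the tutorial's Mathematica code; the function names `hmmug_free_parameters` and `bic` are illustrative assumptions.

```python
import math


def hmmug_free_parameters(n_states: int) -> int:
    """Free parameters of an N-state HMMUG model.

    (N - 1) for iota, N(N - 1) for the transition matrix,
    N for the means and N for the standard deviations.
    """
    n = n_states
    return (n - 1) + n * (n - 1) + n + n


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log L + k log n; the smaller value is preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

For example, `bic(ll, hmmug_free_parameters(2), T)` gives the BIC of a two-state fit with log likelihood `ll` on `T` observations.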
Initialization and Termination
The EM algorithm starts with an initial guess of the parameters $\theta$. The algorithm can get stuck in a local maximum, so the choice of an initial $\theta$ is more than just an issue of efficiency. Several approaches have been suggested [Finch et al. 1989] [Karlis 2001] [Karlis et al. 2003]. Most termination criteria do not detect convergence per se but lack of progress, and the likelihood function has "flat" regions that can lead to premature termination [Karlis 2001].
Therefore, it makes sense to make the termination criteria reasonably strict, and it also makes sense to start the algorithm at multiple starting points [Karlis 2001] [Karlis et al. 2003]. An approach suggested in [Karlis et al. 2003] is to run multiple starting points for a limited number of iterations, pick the one with the highest likelihood, and then run that choice using a fairly tight termination tolerance. This is the approach taken in the demonstrations below.
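The multi-start strategy can be sketched independently of the model being fit. The Python below is an illustrative outline only: `fit` is a hypothetical callable taking `(theta0, max_iter, tol)` and returning `(theta, log_likelihood)`, and the default limits are assumptions rather than values prescribed by the references.

```python
from typing import Any, Callable, Sequence, Tuple

FitFn = Callable[[Any, int, float], Tuple[Any, float]]


def multi_start_fit(
    fit: FitFn,
    candidates: Sequence[Any],
    short_iters: int = 5,
    final_iters: int = 400,
    final_tol: float = 1e-7,
) -> Tuple[Any, float]:
    """Run short fits from every candidate, then refine the best one."""
    # Short runs: a zero tolerance means the iteration cap is what stops them.
    short_runs = [fit(theta0, short_iters, 0.0) for theta0 in candidates]
    # Keep the parameters reached by the short run with the highest likelihood.
    best_theta, _ = max(short_runs, key=lambda run: run[1])
    # Long run from the best short-run parameters with a tight tolerance.
    return fit(best_theta, final_iters, final_tol)
```

Restarting the long run from the parameters reached by the best short run, rather than from its original starting point, avoids repeating the first few iterations.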
Scaling
Consider the forward recursion:
$$\alpha_t(i) = b_t(i) \sum_{k=1}^{N} \alpha_{t-1}(k)\, a(k, i)$$
Repeated multiplication by $b_t(i)$ at each time step can cause serious computational problems. In the discrete case as discussed in [Rabiner 1989] $b_t(i) \ll 1$ and the computation of $\alpha_t(i)$ is driven exponentially towards zero. In the continuous case, however, $b_t(i)$ may take on any value, and in the Gaussian case its value is typically of order $1 / (\sigma \sqrt{2 \pi})$. Thus, $\alpha_t(i)$ may be driven to 0 or $\infty$ depending upon $\sigma$. In the case of time series of financial returns $\sigma \ll 1 / \sqrt{2 \pi}$; hence, $b_t(i)$ tends to be $\gg 1$ and the problem is more one of over-flow than under-flow.
The solution, as discussed in [Rabiner 1989], is to scale the computations in a manner which will still allow one to use the same forward-backward recursions. For each $\alpha_t$ compute a normalization constant $\nu_t$ and apply it to $\alpha_t$ to produce a normalized $\hat{\alpha}_t$ that sums to 1:
$$\nu_t = 1 \Big/ \sum_{k=1}^{N} \alpha_t(k)$$
$$\hat{\alpha}_t(i) = \nu_t\, \alpha_t(i)$$
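At each recursion use the normalized $\hat{\alpha}_t$ to compute an unnormalized $\alpha_{t+1}$ which is then itself normalized and used in the next iteration. Note that as the recursion progresses the scaled values are related to the true (unscaled) $\alpha_t(i)$ by:
$$\hat{\alpha}_t(i) = \alpha_t(i) \prod_{u=1}^{t} \nu_u$$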
On the backward recursion apply the same normalization constants so that a normalized $\hat{\beta}_t$ is produced for each t. Note that:
$$\hat{\beta}_t(i) = \beta_t(i) \prod_{u=t}^{T} \nu_u$$
As [Rabiner 1989] shows, the effects of the normalization constants cancel out in the numerators and denominators of the computations of $\gamma_t$ and $\xi_t$.
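Note that the true value of $\alpha_T$ can be recovered from the scaled values:
$$1 = \sum_{k=1}^{N} \hat{\alpha}_T(k) = \left( \prod_{t=1}^{T} \nu_t \right) \sum_{k=1}^{N} \alpha_T(k) \quad \Longrightarrow \quad \sum_{i=1}^{N} \alpha_T(i) = 1 \Big/ \prod_{t=1}^{T} \nu_t$$
and used to compute the log likelihood
$$\log \mathcal{L}[\theta \mid X] = \log \sum_{i=1}^{N} \alpha_T(i) = -\sum_{t=1}^{T} \log \nu_t$$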
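The scaled forward recursion and the recovery of the log likelihood from the normalization constants can be sketched in a few lines. This is a plain-Python illustration under the definitions above, not a transcription of the tutorial's Mathematica code; the names `normal_pdf` and `scaled_forward` are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def scaled_forward(data, iota, a, mu, sigma):
    """Scaled forward pass for a Gaussian HMM.

    Returns (alpha_hat, nu, log_likelihood) where each row of alpha_hat
    sums to 1 and log_likelihood = -sum(log nu_t).
    """
    n = len(iota)
    alpha_hat, nu = [], []
    prev = None
    for t, x in enumerate(data):
        b = [normal_pdf(x, mu[i], sigma[i]) for i in range(n)]
        if t == 0:
            raw = [iota[i] * b[i] for i in range(n)]
        else:
            raw = [b[i] * sum(prev[k] * a[k][i] for k in range(n))
                   for i in range(n)]
        nu_t = 1.0 / sum(raw)            # normalization constant
        prev = [nu_t * v for v in raw]   # normalized alpha for time t
        alpha_hat.append(prev)
        nu.append(nu_t)
    return alpha_hat, nu, -sum(math.log(v) for v in nu)
```

Because only the normalized values are carried forward, no intermediate quantity grows or shrinks geometrically with the length of the series.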
At this point, the complete algorithm can be set out.
EM (Baum-Welch) Algorithm for the HMMUG
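E-Step
The forward recursion is initialized at t = 1 and scaled:
$$\alpha_1(i) = \iota(i)\, b_1(i) \qquad \nu_1 = 1 \Big/ \sum_{k=1}^{N} \alpha_1(k) \qquad \hat{\alpha}_1(i) = \nu_1\, \alpha_1(i)$$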
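The forward recursions continue for $t = 2, \ldots, T$, computing $\alpha_t(i)$ then scaling it to $\hat{\alpha}_t(i)$:
$$\alpha_t(i) = b_t(i) \sum_{k=1}^{N} \hat{\alpha}_{t-1}(k)\, a(k, i)$$
$$\nu_t = 1 \Big/ \sum_{k=1}^{N} \alpha_t(k)$$
$$\hat{\alpha}_t(i) = \nu_t\, \alpha_t(i)$$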
The backward recursion is initialized and proceeds backward from $t = T, \ldots, 1$:
$$\hat{\beta}_T(i) = \nu_T$$
$$\hat{\beta}_t(i) = \nu_t \sum_{k=1}^{N} a(i, k)\, b_{t+1}(k)\, \hat{\beta}_{t+1}(k)$$
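The values of $\gamma$ and $\xi$ are estimated using the scaled forward-backward values:
$$\gamma_t(i) = \frac{\hat{\alpha}_t(i)\, \hat{\beta}_t(i)}{\sum_{k=1}^{N} \hat{\alpha}_t(k)\, \hat{\beta}_t(k)}$$
$$\xi_t(i, j) = \frac{\hat{\alpha}_t(i)\, a(i, j)\, b_{t+1}(j)\, \hat{\beta}_{t+1}(j)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \hat{\alpha}_t(k)\, a(k, l)\, b_{t+1}(l)\, \hat{\beta}_{t+1}(l)}$$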
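The E-step above can be sketched as one Python function that takes the precomputed observation densities. This is a hedged illustration rather than the tutorial's implementation: the name `e_step` and the convention that `b[t][i]` holds the density of observation t under state i are assumptions.

```python
def e_step(iota, a, b):
    """Scaled forward-backward pass; returns (gamma, xi).

    b[t][i] is the density of observation t under state i.
    gamma[t][i] and xi[t][i][j] are normalized at every t.
    """
    T, n = len(b), len(iota)
    # Forward pass with normalization constants nu[t].
    alpha, nu = [], []
    for t in range(T):
        if t == 0:
            raw = [iota[i] * b[0][i] for i in range(n)]
        else:
            raw = [b[t][i] * sum(alpha[t - 1][k] * a[k][i] for k in range(n))
                   for i in range(n)]
        nu.append(1.0 / sum(raw))
        alpha.append([nu[t] * v for v in raw])
    # Backward pass reusing the same constants.
    beta = [[0.0] * n for _ in range(T)]
    beta[T - 1] = [nu[T - 1]] * n
    for t in range(T - 2, -1, -1):
        beta[t] = [nu[t] * sum(a[i][k] * b[t + 1][k] * beta[t + 1][k]
                               for k in range(n)) for i in range(n)]
    # State probabilities.
    gamma = []
    for t in range(T):
        w = [alpha[t][i] * beta[t][i] for i in range(n)]
        s = sum(w)
        gamma.append([v / s for v in w])
    # Transition probabilities, defined for t = 1, ..., T - 1 only.
    xi = []
    for t in range(T - 1):
        w = [[alpha[t][i] * a[i][j] * b[t + 1][j] * beta[t + 1][j]
              for j in range(n)] for i in range(n)]
        s = sum(sum(row) for row in w)
        xi.append([[v / s for v in row] for row in w])
    return gamma, xi
```

Indices here are 0-based, so `xi[t]` corresponds to the transition between the observations at positions t and t + 1.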
M-Step
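The updated parameters $\theta^{+}$ are:
$$\iota^{+}(i) = \gamma_1(i)$$
$$a^{+}(i, j) = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\mu^{+}(i) = \frac{\sum_{t=1}^{T} \gamma_t(i)\, x_t}{\sum_{t=1}^{T} \gamma_t(i)}$$
$$\sigma^{+}(i) = \sqrt{\frac{\sum_{t=1}^{T} \gamma_t(i)\, (x_t - \mu^{+}(i))^2}{\sum_{t=1}^{T} \gamma_t(i)}}$$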
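The M-step updates translate directly into weighted averages. The sketch below is illustrative Python, assuming `gamma[t][i]` and `xi[t][i][j]` are already normalized as in the E-step; the name `m_step` is an assumption.

```python
import math


def m_step(data, gamma, xi):
    """Return updated (iota, a, mu, sigma) from gamma and xi."""
    T, n = len(data), len(gamma[0])
    iota = list(gamma[0])
    # Transition matrix: expected transitions over expected visits.
    a = []
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T - 1))
        a.append([sum(xi[t][i][j] for t in range(T - 1)) / visits
                  for j in range(n)])
    # State means and standard deviations: gamma-weighted moments.
    mu, sigma = [], []
    for i in range(n):
        weight = sum(gamma[t][i] for t in range(T))
        m = sum(gamma[t][i] * data[t] for t in range(T)) / weight
        v = sum(gamma[t][i] * (data[t] - m) ** 2 for t in range(T)) / weight
        mu.append(m)
        sigma.append(math.sqrt(v))
    return iota, a, mu, sigma
```

With hard (0 or 1) state assignments the updates reduce to per-state sample means and MLE standard deviations, which makes a convenient sanity check.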
Log Likelihood
As noted earlier the log likelihood is computed from the scaling constants:
$$\log \mathcal{L}[\theta \mid X] = -\sum_{t=1}^{T} \log \nu_t$$
Termination Criteria
In the code developed here the relative change in log likelihood is used as the primary termination criterion. For some positive constant $\tau \ll 1$ the algorithm is terminated when for the $(h + 1)$th iteration:
$$\frac{\log \mathcal{L}[\theta^{(h+1)} \mid X]}{\log \mathcal{L}[\theta^{(h)} \mid X]} - 1 \le \tau$$
Other choices of termination criteria are covered in [Karlis 2001] and [Karlis et al. 2003].
In addition, a maximum iteration limit is set and at that point the algorithm terminates even if the log likelihood tolerance has not
been achieved; one can look at the convergence of the log likelihood function and accept the solution or restart the algorithm
using either the final parameter estimates from the prior run or a new initialization.
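The ratio test can be sketched as a small Python helper. The first function follows the criterion as stated; the second, which uses absolute values, is an assumption added here for series whose log likelihood is negative, where the plain ratio drops below 1 as the fit improves and the test would fire immediately. Both names are illustrative.

```python
def has_converged(ll_history, tol=1e-7):
    """Ratio test as stated: LL[h+1] / LL[h] - 1 <= tol."""
    if len(ll_history) < 2:
        return False
    return ll_history[-1] / ll_history[-2] - 1.0 <= tol


def has_converged_abs(ll_history, tol=1e-7):
    """Sign-safe variant (an assumption): |LL[h+1] - LL[h]| / |LL[h]| <= tol."""
    if len(ll_history) < 2:
        return False
    new, old = ll_history[-1], ll_history[-2]
    return abs(new - old) / abs(old) <= tol
```

In the demonstrations in this tutorial the log likelihoods are positive (the densities of monthly returns exceed 1), so the two tests agree there.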
Programming Considerations
This tutorial uses Mathematica to implement the Baum-Welch algorithm along with some useful supporting functions.
xHMMUG[data, $\theta^{(0)}$]
The primary function is xHMMUG[], which performs the actual MLE. Its structure can be summarized as follows:
    input data vector X and initial $\theta^{(0)}$
    resolve options and set up working variables
    compute initial b, $\alpha$, $\nu$, and log $\mathcal{L}$
    begin loop
        E-step: compute current $\beta$, $\gamma$, and $\xi$
        M-step: update $\theta$ = {$\iota$, a, $\mu$, $\sigma$}
        log likelihood: compute next b, $\alpha$, $\nu$, and log $\mathcal{L}$
        append log $\mathcal{L}$ to the history vector
        break out of loop if log $\mathcal{L}$ has converged or max iterations reached
    end loop
    compute BIC
    return results: $\theta$, $\gamma$, BIC, and log $\mathcal{L}$ history
Note that an initial log likelihood is computed before the main loop starts. Normally, the log likelihood is computed immediately after the M-step, but the forward-backward algorithm also uses the $\alpha$s in the E-step. These computations are organized so that the log likelihood reported at termination refers to the most recently updated $\theta$.
Exploiting List Processing
Mathematica, in common with other modern tools such as R, MATLAB, and others, has a syntax which supports list or array
processing. Using these capabilities usually results in code which is simpler and more efficient. For example, consider the
expression for $\alpha_t(i)$:
$$\alpha_t(i) = b_t(i) \sum_{k=1}^{N} \alpha_{t-1}(k)\, a(k, i)$$
Using $\odot$ to denote element-by-element multiplication, the expression above can be stated as:
$$\alpha_t = b_t \odot (\alpha_{t-1}\, a)$$
Similarly, consider the expression for $\xi_t(i, j)$:
$$\xi_t(i, j) \propto \alpha_t(i)\, a(i, j)\, b_{t+1}(j)\, \beta_{t+1}(j)$$
Using $\otimes$ to denote the outer or Kronecker product, this can be restated as:
$$\xi_t \propto a \odot \left( \alpha_t \otimes (\beta_{t+1} \odot b_{t+1}) \right)$$
It is not necessary to hold onto $\xi_t$ for each iteration. The code in xHMMUG[] computes, normalizes, and accumulates $\xi$ in one statement:
    mnXi += (#/Total[Flatten[#]]) &[mnA KroneckerProduct[mnAlpha[[t]], mnBeta[[t + 1]] mnB[[t + 1]]]]
Unfortunately, the list or array conventions vary from one system to the next. Care needs to be taken if one tries, e.g., to transcribe Mathematica code into R or MATLAB. At least one of the references below [Zucchini 2009] contains samples of R code.
Equilibrium Distribution of the Markov Chain
The equilibrium distribution for the Markov chain, $\pi(i)$, is the long-run probability that the system finds itself in state s(i):
$$\pi(i) = \sum_{k=1}^{N} a(k, i)\, \pi(k)$$
or
$$A^{T} \pi = \pi$$
For finite-state Markov chains, the equilibrium distribution can be found by solving the following equations, where $O$ is a matrix of 1s, $\mathbf{1}$ is a vector of 1s, and $\mathbf{0}$ is a vector of 0s:
$$A^{T} \pi - I \pi = \mathbf{0} \ \text{ and } \ O \pi = \mathbf{1} \quad \Longrightarrow \quad (A^{T} + O - I)\, \pi = \mathbf{1}$$
The general problem of finding the equilibrium distribution of a Markov chain is quite difficult, but the approach above often
works for small problems.
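For small chains the linear system $(A^{T} + O - I)\,\pi = \mathbf{1}$ can be solved directly. The following Python sketch uses Gauss-Jordan elimination with partial pivoting so that it needs no external libraries; the function name `equilibrium` is an assumption, and the approach presumes an ergodic chain so that the system is non-singular.

```python
def equilibrium(a):
    """Solve (A^T + O - I) pi = 1 for a row-stochastic matrix a."""
    n = len(a)
    # Augmented system [M | 1] with M[i][j] = a[j][i] + 1 - (1 if i == j else 0).
    m = [[a[j][i] + 1.0 - (1.0 if i == j else 0.0) for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]
```

For a two-state chain the result can be checked by hand against $\pi(1) = a(2,1) / (a(1,2) + a(2,1))$.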
Mathematica Functions
The functions presented below are teaching and demonstration tools, lacking even rudimentary error checking and handling. It would be easy to add the required features, but that would greatly increase the number of lines of code and obscure the algorithms. They are, however, perfectly usable if one is careful about the inputs.
Equilibrium Distribution for a Finite State Markov Chain
Description
Compute the equilibrium distribution for an ergodic, finite state Markov chain.
Input
The transition matrix of the Markov chain, such that mnTransitionMatrix[[i, j]] $= \mathrm{Prob}[q_{t+1} = s(j) \mid q_t = s(i)]$. There are no options.
Output
The equilibrium distribution as a numeric vector.
Note
This function is of limited generality but is quite useful for small problems.
Code
In[1]:= xMarkovChainEquilibrium[mnTransitionMatrix_] :=
  Inverse[Transpose[mnTransitionMatrix] - IdentityMatrix[Length[mnTransitionMatrix]] + 1] .
   ConstantArray[1, Length[mnTransitionMatrix]];
Random $\theta$ Initializer
Description
Generate a random $\theta$ to initialize the EM algorithm for HMMUG models.
Input
There are two inputs: the data and the number of states:
the data are represented by a numeric vector, and
the number of states is a positive integer.
There are no options.
Output
The result is returned as a list of data representing $\theta$; it contains the following four components:
$\iota$ - initial state probability vector (numeric vector),
a - transition matrix of the Markov chain (numeric square matrix),
$\mu$ - state means (numeric vector), and
$\sigma$ - state standard deviations (numeric vector).
Note
The output is in a form in which it can be used directly as the second argument to xHMMUG[], i.e., $\theta^{(0)}$.
The term "random" must be taken with a grain of salt. What is produced randomly is a candidate state transition matrix. The initial state is estimated from its equilibrium distribution. A random spread for the mean and sdev vectors is then generated based on the total sample mean and sdev.
In[2]:= xHMMUGRandomInitial[vnData_, iStates_] := Module[
  {mnA, i, vnGamma, vnIota, vnM, vnQ, vnS, t, mnW},
  (* Generate a random transition matrix (a) *)
  mnA = ((#/Total[#]) & /@ RandomReal[{0.01, 0.99}, {iStates, iStates}]);
  (* Compute (ι) from (a) *)
  vnIota = xMarkovChainEquilibrium[mnA];
  (* Mean (μ) and sdev (σ) for each state *)
  vnM = (#/Total[#]) &[RandomReal[{0.01, 0.99}, iStates]] Mean[vnData];
  vnS = Sqrt[(#/Total[#]) &[RandomReal[{0.01, 0.99}, iStates]] Variance[vnData]];
  (* return θ *)
  {vnIota, mnA, vnM, vnS}
  ];
EM Algorithm for HMMs with Univariate Gaussian Outcomes
Description
MLE fit of a HMMUG model using the Baum-Welch, or forward-backward, version of the EM algorithm.
Input
There are two inputs: the raw data and initial parameters:
the raw data as a numeric vector, and
the parameters $\theta$ as a list containing {$\iota$, a, $\mu$, $\sigma$}.
There are two options controlling termination:
"LikelihoodTolerance" representing the minimum change in the log likelihood function required to terminate the EM iterations, and
"MaxIterations" which is the maximum number of EM iterations before the function terminates.
The lhs of the option rules are strings.
Output
The result is returned as a list of rules containing:
"i" the initial state probability vector $\iota$,
"a" the transition matrix,
"m" the mean vector $\mu$,
"s" the standard deviation vector $\sigma$,
"g" the matrix of periodic state probabilities $\gamma$, i.e., element [[t, i]] is $\mathrm{Prob}[q_t = s(i)]$,
"BIC" the Bayesian Information Criterion for the current fit, and
"LL" the log likelihood history, i.e., a list of $\log \mathcal{L}[\theta \mid X]$ after each EM iteration.
The lhs of the rules above are strings.
Note
Restarting the fit from the prior run is straightforward. If vxH is the result of the prior run, then {"i", "a", "m", "s"} /. vxH will pull out $\theta$, which can be used to restart the algorithm at the point at which it terminated. Typically, one should consider changing the iteration limit or log likelihood tolerance options when restarting, although this is not always necessary.
Code
In[3]:= (* Options for xHMMUG function *)
Options[xHMMUG] = {"LikelihoodTolerance" -> 10.^-7, "MaxIterations" -> 400};
In[4]:= (* Input X, {ι, a, μ, σ}, and optionally a tolerance and iteration limit. *)
xHMMUG[vnData_, {vnInitialIota_, mnInitialA_, vnInitialMean_, vnInitialSdev_},
  OptionsPattern[]] := Module[
  {mnA, mnAlpha, mnB, mnBeta, nBIC, mnGamma, vnIota, vnLogLikelihood,
   iMaxIter, vnMean, iN, vnNu, vnSdev, t, iT, nTol, mnWeights, mnXi},
  (* Resolve options *)
  nTol = OptionValue["LikelihoodTolerance"];
  iMaxIter = OptionValue["MaxIterations"];
  (* Initialize variables *)
  iT = Length[vnData];
  iN = Length[mnInitialA];
  vnIota = vnInitialIota;
  mnA = mnInitialA;
  vnMean = vnInitialMean;
  vnSdev = vnInitialSdev;
  (* Initial log ℒ *)
  (* --- b *)
  mnB = Table[
    PDF[NormalDistribution[vnMean[[#]], vnSdev[[#]]], vnData[[t]]] & /@ Range[iN],
    {t, 1, iT}
    ];
  (* --- α and ν *)
  mnAlpha = Array[0. &, {iT, iN}];
  vnNu = Array[0. &, iT];
  mnAlpha[[1]] = vnIota mnB[[1]];
  vnNu[[1]] = 1/Total[mnAlpha[[1]]];
  mnAlpha[[1]] *= vnNu[[1]];
  For[t = 2, t <= iT, t++,
   mnAlpha[[t]] = (mnAlpha[[t - 1]] . mnA) mnB[[t]];
   vnNu[[t]] = 1/Total[mnAlpha[[t]]];
   mnAlpha[[t]] *= vnNu[[t]];
   ];
  (* --- log ℒ *)
  vnLogLikelihood = {-Total[Log[vnNu]]};
  (* Main Loop *)
  Do[
   (* --- E-Step *)
   (* --- --- β *)
   mnBeta = Array[0. &, {iT, iN}];
   mnBeta[[iT, ;;]] = vnNu[[iT]];
   For[t = iT - 1, t >= 1, t--,
    mnBeta[[t]] = mnA . (mnBeta[[t + 1]] mnB[[t + 1]]) vnNu[[t]];
    ];
   (* --- --- γ *)
   mnGamma = (#/Total[#]) & /@ (mnAlpha mnBeta);
   (* --- --- ξ; note that we do not need the individual ξ_t s *)
   mnXi = Array[0. &, {iN, iN}];
   For[t = 1, t <= iT - 1, t++,
    mnXi += (#/Total[Flatten[#]]) &[
       mnA KroneckerProduct[mnAlpha[[t]], mnBeta[[t + 1]] mnB[[t + 1]]]];
    ];
   (* --- M-Step *)
   (* --- --- a *)
   mnA = (#/Total[#]) & /@ mnXi;
   (* --- --- ι *)
   vnIota = mnGamma[[1]];
   (* --- --- observation weights *)
   mnWeights = (#/Total[#]) & /@ Transpose[mnGamma];
   (* --- --- μ and σ *)
   vnMean = mnWeights . vnData;
   vnSdev = Sqrt[Total /@ (mnWeights ((vnData - #) & /@ vnMean)^2)];
   (* --- Log Likelihood *)
   (* --- --- b *)
   mnB = Table[
     PDF[NormalDistribution[vnMean[[#]], vnSdev[[#]]], vnData[[t]]] & /@ Range[iN],
     {t, 1, iT}
     ];
   (* --- --- α and ν *)
   mnAlpha = Array[0. &, {iT, iN}];
   vnNu = Array[0. &, iT];
   mnAlpha[[1]] = vnIota mnB[[1]];
   vnNu[[1]] = 1/Total[mnAlpha[[1]]];
   mnAlpha[[1]] *= vnNu[[1]];
   For[t = 2, t <= iT, t++,
    mnAlpha[[t]] = (mnAlpha[[t - 1]] . mnA) mnB[[t]];
    vnNu[[t]] = 1/Total[mnAlpha[[t]]];
    mnAlpha[[t]] *= vnNu[[t]];
    ];
   (* --- --- log ℒ *)
   vnLogLikelihood = Append[vnLogLikelihood, -Total[Log[vnNu]]];
   (* --- --- likelihood test for early Break[] out of Do[] *)
   If[vnLogLikelihood[[-1]]/vnLogLikelihood[[-2]] - 1 <= nTol, Break[]],
   (* --- Max iterations for Do[] *)
   {iMaxIter}
   ];
  (* BIC *)
  nBIC = -2 vnLogLikelihood[[-1]] + (iN (iN + 2) - 1) Log[iT];
  (* Return ι, a, μ, σ, γ, BIC, log ℒ as rule vector *)
  {"i" -> vnIota, "a" -> mnA, "m" -> vnMean,
   "s" -> vnSdev, "g" -> mnGamma, "BIC" -> nBIC, "LL" -> vnLogLikelihood}
  ];
xHMMUG Report
Description
Produce a summary report of the results of an xHMMUG[] fit.
Input
A vector of rules, typically as produced from xHMMUG[]. There are no options.
Output
The function does not return a result; it Print[]s the summary to the notebook as a side effect.
Code
In[5]:= xHMMUGReport[hmmug_] := Module[
  {},
  Print["ι̂ = ", MatrixForm["i" /. hmmug]];
  Print["â = ", MatrixForm["a" /. hmmug]];
  Print["π̂ = ", MatrixForm[xMarkovChainEquilibrium["a" /. hmmug]]];
  Print["μ̂ = ", MatrixForm["m" /. hmmug]];
  Print["σ̂ = ", MatrixForm["s" /. hmmug]];
  Print["ℒ[θ̂ | X] = ", Last["LL" /. hmmug]];
  Print["τ = ", (#[[-1]]/#[[-2]] - 1) &["LL" /. hmmug]];
  (* the label glyph on the next line did not survive extraction;
     the value printed is the length of the log likelihood history *)
  Print[" = ", Length["LL" /. hmmug]];
  Print["BIC = ", "BIC" /. hmmug];
  ];
HMMUG Simulator
Description
Simulate an HMMUG model with specified parameters for a specified number of periods.
Input
There are two inputs:
$\theta$ itself as a list containing the parameters {$\iota$, a, $\mu$, $\sigma$}, and
a positive integer simulation length.
There are no options.
Output
Returns a 2-list:
the vector of hidden states expressed as integers and
the vector of simulated data.
Note
The entire simulation function is a single line of Mathematica.
Code
In[6]:= xSimHMMUG[{vnStart_, mnMarkovChain_, vnMean_, vnSdev_}, iSimLength_] :=
  {#, RandomReal[NormalDistribution[vnMean[[#]], vnSdev[[#]]]] & /@ #} &[
   NestList[
    RandomChoice[mnMarkovChain[[#]] -> Range[Length[mnMarkovChain]]] &,
    RandomChoice[vnStart -> Range[Length[mnMarkovChain]]],
    iSimLength - 1
    ]
   ];
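The logic of the simulator (draw a state path from the Markov chain, then draw one Gaussian outcome per state) can be sketched in Python for comparison. This is an illustrative analogue rather than a transcription: the name `simulate_hmmug` and the optional `rng` argument are assumptions, and states are numbered from 0 here rather than from 1.

```python
import random


def simulate_hmmug(iota, a, mu, sigma, length, rng=None):
    """Return (states, data) for a simulated Gaussian HMM path."""
    rng = rng or random.Random()
    n = len(iota)
    # Initial state from iota, then one transition per remaining period.
    states = [rng.choices(range(n), weights=iota)[0]]
    for _ in range(length - 1):
        states.append(rng.choices(range(n), weights=a[states[-1]])[0])
    # One Gaussian draw per period, using the parameters of the current state.
    data = [rng.gauss(mu[s], sigma[s]) for s in states]
    return states, data
```

Passing a seeded `random.Random` instance makes a simulation reproducible, which is convenient when comparing a fit against the known hidden states.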
S&P 500 Demonstration
The dataset used to demonstrate the code above is the monthly log returns for the S&P 500 from Jan 1969 to the last full month
of the current date.
Running the code below in mid-December 2009 results in just under 40 years of monthly log returns for the S&P 500 index. The ticker "^GSPC" is price only; dividends are not included. For the purposes of this exposition, this is good enough.
Note that this period includes a great deal of variety in the market: the sideways markets of the 1970s, the extended bull market that with hindsight appears to have ended in the early 2000s, and the difficult markets of the first decade of the century. It includes several interesting events such as the explosion of interest rates in the 1980s, the 1987 crash, the tech bubble, the housing bubble, etc.
In[7]:= FinancialData@"^GSPC", "Name"D
Out[7]= S And P 500 Indexrth
In[8]:= mxSP500 = FinancialData["^GSPC", {1968, 12, 1}];
mxSP500 = Most[Last /@ Split[mxSP500, #1[[1, 2]] == #2[[1, 2]] &]];
mxSP500LogReturns = Transpose[{Rest[First /@ mxSP500], Differences[Log[Last /@ mxSP500]]}];
In[11]:= Print["Date Range: ", {First[First[#]], First[Last[#]]} &[mxSP500LogReturns]];
Print["Number of Months: ", Length[mxSP500LogReturns]]
Date Range: 881969, 1, 31<, 82009, 12, 31<<
Number of Months: 492
Plots of the period and cumulative returns and a histogram of returns are:
In[13]:= DateListPlot[mxSP500LogReturns, Joined -> True, PlotRange -> All]
Out[13]= [Plot: monthly S&P 500 log returns, 1970 to 2010; vertical axis from about -0.2 to 0.1]
In[14]:= DateListPlot[Transpose[{First /@ #, Accumulate[Last /@ #]}] &[mxSP500LogReturns],
  Joined -> True, PlotRange -> All]
Out[14]= [Plot: cumulative S&P 500 log returns, 1970 to 2010; vertical axis from about -0.5 to 2.5]
The log returns are noticeably negatively skewed and leptokurtic (heavy-tailed):
In[15]:= Through[{Mean, StandardDeviation, Skewness, Kurtosis}[mxSP500LogReturns[[All, 2]]]]
Out[15]= {0.0048245, 0.0452431, -0.709973, 5.58923}
Model Fits
Before fitting fancy hidden Markov models, it would be wise to evaluate the simple assumption of log Normality. For the S&P
500's log returns, the log likelihood is computed as follows:
Univariate Gaussian Model
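For a univariate Gaussian distribution the log density of a single observation is:
$$\log \psi[x \mid \mu, \sigma] = -\frac{1}{2} \log[2 \pi] - \log[\sigma] - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2$$
hence, for the observations X:
$$\log \mathcal{L}[\theta \mid X] = \sum_{t=1}^{T} \log \psi[x_t \mid \mu, \sigma] = -T \left( \frac{1}{2} \log[2 \pi] + \log[\sigma] \right) - \sum_{t=1}^{T} \frac{1}{2} \left( \frac{x_t - \mu}{\sigma} \right)^2$$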
Note that we are using the MLE of the standard deviation.
In[16]:= nMean = Mean[mxSP500LogReturns[[All, 2]]]
nSdev = Sqrt[Mean[mxSP500LogReturns[[All, 2]]^2] - nMean^2]
iSize = Length[mxSP500LogReturns[[All, 2]]]
Out[16]= 0.0048245
Out[17]= 0.0451971
Out[18]= 492
The log likelihood is:
In[19]:= nLogLikelihood =
  -iSize (Log[2 π]/2 + Log[nSdev]) -
   1/2 Total[((mxSP500LogReturns[[All, 2]] - nMean)/nSdev)^2]
Out[19]= 825.47
The BIC is:
In[20]:= nBIC = -2 nLogLikelihood + 2 Log[iSize]
Out[20]= -1638.54
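The single-Gaussian baseline is simple enough to reproduce in a few lines of Python. The sketch below follows the formulas above, using the MLE (divide-by-T) standard deviation; the function name `gaussian_fit_log_likelihood` is an assumption.

```python
import math


def gaussian_fit_log_likelihood(data):
    """Return (mean, mle_sd, log_likelihood) for a univariate Gaussian fit."""
    n = len(data)
    mean = sum(data) / n
    # MLE standard deviation: sqrt(E[x^2] - mean^2), i.e. divide by n, not n - 1.
    sd = math.sqrt(sum(x * x for x in data) / n - mean * mean)
    ll = (-n * (0.5 * math.log(2.0 * math.pi) + math.log(sd))
          - 0.5 * sum(((x - mean) / sd) ** 2 for x in data))
    return mean, sd, ll
```

At the MLE the sum of squared standardized residuals equals the number of observations, so the last term is always -T/2; the BIC then follows as -2 log ℒ + 2 log T because this model has two free parameters.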
Two-State Model
As mentioned earlier, it's a good idea to initialize the algorithm at different points. Because some of these initial values may be really bad, Mathematica may display underflow warnings. It's usually okay at this stage to ignore them. A set of 20 initial guesses is run for five iterations...
In[21]:= vxInitialTest["SP500", 2] = xHMMUG[mxSP500LogReturns[[All, 2]], #, "MaxIterations" -> 5] & /@
   Table[xHMMUGRandomInitial[mxSP500LogReturns[[All, 2]], 2], {20}];
... then the one with the highest likelihood is selected...
In[22]:= iBest["SP500", 2] = Position[#, Max[#]] &[Last["LL" /. #] & /@ vxInitialTest["SP500", 2]][[1, 1]]
Out[22]= 12
It's a good idea to examine the ensemble graphically to see if the number of iterations allowed for each candidate is reasonable:
In[23]:= ListPlot["LL" /. # & /@ vxInitialTest["SP500", 2], Joined -> True]
Out[23]= [Plot: log likelihood histories of the 20 candidate fits; horizontal axis 1 to 6, vertical axis about 760 to 840]
The best candidate is then used to initialize the model fit. Here the default tolerance and iteration limit are used:
In[24]:= vxHMMUG["SP500", 2] = xHMMUG[
   mxSP500LogReturns[[All, 2]],
   {"i", "a", "m", "s"} /. (vxInitialTest["SP500", 2][[iBest["SP500", 2]]])
   ];
The convergence of the likelihood function should be examined:
In[25]:= ListPlot["LL" /. vxHMMUG["SP500", 2], PlotRange -> All]
Out[25]= [Plot: log likelihood by iteration; horizontal axis to about 40, vertical axis about 853.0 to 854.5]
A report summarizing the result is:
In[26]:= xHMMUGReport[vxHMMUG["SP500", 2]]
ι̂ = {5.76112×10^-10, 1.}
â =
    0.812364  0.187636
    0.044969  0.955031
π̂ = {0.193328, 0.806672}
μ̂ = {-0.0188743, 0.0104782}
σ̂ = {0.0691238, 0.0349897}
ℒ[θ̂ | X] = 854.718
τ = 9.38355×10^-8
 = 39
BIC = -1666.05
Finally, the values of $\gamma_t$ are cumulatively summed. This makes the interplay across states easier to see compared with plotting the state probabilities:
In[27]:= DateListPlot[Transpose[{mxSP500LogReturns[[All, 1]], #}] & /@
   (Accumulate /@ Transpose["g" /. vxHMMUG["SP500", 2]]), PlotStyle -> {{Red}, {Green}}, Joined -> True]
Out[27]= [Plot: cumulative state probabilities for the two states, 1970 to 2010; vertical axis 0 to about 400]
If there are doubts about the solution, then the number of initial candidates or the number of iterations set for the candidates can
be varied. Once a candidate is selected the tolerance and iteration limit of the main fit may be adjusted. It's also reasonable to
repeat the entire process above multiple times to check the convergence of the log likelihood. Finally, no claim is made that the
method of generating initial candidates is optimal; it may be wise to consider alternatives.
Three-State Model
Here is the same analysis for a three-state model:
In[28]:= vxInitialTest["SP500", 3] = xHMMUG[mxSP500LogReturns[[All, 2]], #, "MaxIterations" -> 5] & /@
   Table[xHMMUGRandomInitial[mxSP500LogReturns[[All, 2]], 3], {20}];
In[29]:= iBest["SP500", 3] = Position[#, Max[#]] &[Last["LL" /. #] & /@ vxInitialTest["SP500", 3]][[1, 1]]
Out[29]= 15
In[30]:= ListPlot["LL" /. # & /@ vxInitialTest["SP500", 3], Joined -> True]
Out[30]= [Plot: log likelihood histories of the 20 candidate fits; horizontal axis 1 to 6, vertical axis about 800 to 850]
Previous trials have indicated that the iteration limit sometimes must be increased to achieve the desired tolerance.
In[31]:= vxHMMUG["SP500", 3] = xHMMUG[
   mxSP500LogReturns[[All, 2]],
   {"i", "a", "m", "s"} /. (vxInitialTest["SP500", 3][[iBest["SP500", 3]]]),
   "MaxIterations" -> 1000
   ];
In[32]:= ListPlot["LL" /. vxHMMUG["SP500", 3], PlotRange -> All]
Out[32]= [Plot: log likelihood by iteration; horizontal axis to about 350, vertical axis about 854 to 864]
In[33]:= xHMMUGReport[vxHMMUG["SP500", 3]]
ι̂ = {2.76515×10^-259, 1., 1.22483×10^-137}
â =
    0.541067       0.458933        4.91812×10^-29
    1.28261×10^-6  0.969118        0.0308804
    0.195081       8.44613×10^-33  0.804919
π̂ = {0.0549045, 0.815937, 0.129159}
μ̂ = {0.0536356, 0.00809838, -0.0366716}
σ̂ = {0.0204549, 0.0356294, 0.0695324}
ℒ[θ̂ | X] = 864.624
τ = 7.87853×10^-8
 = 354
BIC = -1642.47
In[34]:= DateListPlot[Transpose[{mxSP500LogReturns[[All, 1]], #}] & /@
   (Accumulate /@ Transpose["g" /. vxHMMUG["SP500", 3]]), PlotStyle -> {{Red}, {Green}, {Blue}},
  Joined -> True]
Out[34]= [Plot: cumulative state probabilities for the three states, 1970 to 2010; vertical axis 0 to about 400]
Summary
Summarizing results by comparing the BICs for each model:
In[35]:= TableForm[
  Transpose[{{1, 2, 3}, Join[{nBIC}, "BIC" /. # & /@ {vxHMMUG["SP500", 2], vxHMMUG["SP500", 3]}]}],
  TableHeadings -> {None, {"States", "BIC"}}, TableAlignments -> Center]
Out[35]//TableForm=
States BIC
1 -1638.54
2 -1666.05
3 -1642.47
Run as of 21-Dec-2009, the two-state model is selected on the basis of its BIC.
A Few Observations
Consider the "best" model above, the two-state model:
In[36]:= xHMMUGReport[vxHMMUG["SP500", 2]]
ι̂ = {5.76112×10^-10, 1.}
â =
    0.812364  0.187636
    0.044969  0.955031
π̂ = {0.193328, 0.806672}
μ̂ = {-0.0188743, 0.0104782}
σ̂ = {0.0691238, 0.0349897}
ℒ[θ̂ | X] = 854.718
τ = 9.38355×10^-8
 = 39
BIC = -1666.05
Based on monthly returns, the market appears to be in a "low volatility-moderately positive return" or "bull" state about 80% of the time and in a "high volatility-highly negative return" or "bear" state the remaining 20% of the time. The large transition probabilities on the main diagonal of the state transition matrix a indicate that the states are somewhat "sticky". Thus, once the market finds itself in a bull or bear state, it tends to stay there for a time.
This is the simple bull-bear model used as the initial example in the first section of the tutorial.
Simulation
Generating Simulated Data
A simulation, with timing, over the same time horizon as the original data with the two-state model's parameters is:
In[37]:= {nTime, {vnStates, vnSim}} =
   Timing[xSimHMMUG[{"i", "a", "m", "s"} /. vxHMMUG["SP500", 2], Length[mxSP500LogReturns]]];
On a 2×2.93 GHz quad-core Intel Nehalem-based Xeon system, the time is < 5 milliseconds. Your time may be faster or slower, but even on a relatively slow processor the code should run fairly quickly.
In[38]:= nTime
Out[38]= 0.004454
Plots of the period and cumulative returns and a histogram of returns are:
In[39]:= ListPlot[vnSim, Joined -> True, PlotRange -> All]
Out[39]= [Plot: simulated period returns over 492 periods; vertical axis about -0.15 to 0.15]
In[40]:= ListPlot[Accumulate[vnSim], Joined -> True, PlotRange -> All]
Out[40]= [Plot: cumulative simulated returns; vertical axis about 0.5 to 2.5]
In[41]:= Histogram[vnSim]
Out[41]= [Histogram of simulated returns]
The distribution should be noticeably negatively skewed and leptokurtic:
In[42]:= Through[{Mean, StandardDeviation, Skewness, Kurtosis}[vnSim]]
Out[42]= {0.00525115, 0.0436044, -0.472083, 4.60125}
Fitting a Two-State HMMUG Model to the Simulation
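In[43]:= vxInitialTest["Sim", 2] = xHMMUG[mxSP500LogReturns[[All, 2]], #, "MaxIterations" -> 5] & /@
   Table[xHMMUGRandomInitial[mxSP500LogReturns[[All, 2]], 2], {20}];
In[44]:= iBest["Sim", 2] = Position[#, Max[#]] &[Last["LL" /. #] & /@ vxInitialTest["Sim", 2]][[1, 1]]
Out[44]= 14
In[45]:= ListPlot["LL" /. # & /@ vxInitialTest["Sim", 2], Joined -> True]
Out[45]= [Plot: log likelihood histories of the 20 candidate fits; horizontal axis 1 to 6, vertical axis about 790 to 850]
In[46]:= vxHMMUG["Sim", 2] = xHMMUG[
   mxSP500LogReturns[[All, 2]],
   {"i", "a", "m", "s"} /. (vxInitialTest["Sim", 2][[iBest["Sim", 2]]])
   ];
In[47]:= ListPlot["LL" /. vxHMMUG["Sim", 2], PlotRange -> All]
Out[47]= [Plot: log likelihood by iteration; horizontal axis to about 40, vertical axis about 852.5 to 854.5]
As printed, In[43] and In[46] pass mxSP500LogReturns[[All, 2]] rather than vnSim, so the outputs reported for this fit are for the original S&P 500 series; substitute vnSim in both cells to fit the simulated data.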
Evaluating the Model
Note that the assignment of parameters to states is arbitrary and dependent upon initial conditions.
The original model fit is:
In[48]:= xHMMUGReport[vxHMMUG["SP500", 2]]
ι̂ = {5.76112×10^-10, 1.}
â =
    0.812364  0.187636
    0.044969  0.955031
π̂ = {0.193328, 0.806672}
μ̂ = {-0.0188743, 0.0104782}
σ̂ = {0.0691238, 0.0349897}
ℒ[θ̂ | X] = 854.718
τ = 9.38355×10^-8
 = 39
BIC = -1666.05
The fit on the data simulated using the above as parameters is:
In[49]:= xHMMUGReport[vxHMMUG["Sim", 2]]
ι̂ = {2.62911×10^-10, 1.}
â =
    0.811942  0.188058
    0.044902  0.955098
π̂ = {0.192746, 0.807254}
μ̂ = {-0.0189478, 0.0104746}
σ̂ = {0.069162, 0.0350023}
ℒ[θ̂ | X] = 854.718
τ = 7.83003×10^-8
 = 42
BIC = -1666.05
The cumulative $\gamma$-plots for the true hidden states produced by the simulation (dashed) and the fit (solid) are below. The assignment of the underlying states positionally is dependent upon initialization.
In[50]:= ListPlot[Join[Accumulate /@ Transpose["g" /. vxHMMUG["Sim", 2]],
   Accumulate /@ Transpose[If[# == 1, {1, 0}, {0, 1}] & /@ vnStates]],
  PlotStyle -> {{Red}, {Green}, {Red, Dashed}, {Green, Dashed}}, Joined -> True]
Out[50]= [Plot: cumulative state probabilities, fitted (solid) and simulated (dashed); horizontal axis 0 to about 500, vertical axis 0 to about 400]
Using a qq-plot or quantile plot to compare the distribution of the original and simulated data requires that the Statistical Plots
package be loaded:
In[51]:= Needs["StatisticalPlots`"]
In[52]:= QuantilePlot[mxSP500LogReturns[[All, 2]], vnSim]
Out[52]=