Time Series Regime Analysis in Python - by Spencer - Medium
To highlight the importance of time series regime analysis, one need look no
further than classic time series forecasting techniques. A linear autoregressive
(AR) model, for example, assumes that a value in time depends linearly on its
previous values and on an error term. An autoregressive model of order p is
represented below.
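Written out in the standard textbook form (the symbols c, φ_i and ε_t are notation chosen here for illustration), an AR(p) process is:

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t
```

Here c is a constant, φ_1, ..., φ_p are the autoregressive coefficients and ε_t is the error term.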
https://medium.com/@spencer13luck/time-series-regime-analysis-in-python-ffdc7aa005dd 1/26
24/05/2024, 08:50 Time Series Regime Analysis in Python | by Spencer | Medium
[1]
The main assumption of the AR approach is that the mean and variance of the time
series remain constant. We may consider this to be a single regime in terms of
mean and variance. In reality, time series do not always stay in one regime, and
series are often observed switching between regimes. This makes it necessary to
understand and detect the underlying regimes in order to improve forecasting
techniques.
[2]
Markov Switching Autoregressive (MS-AR) models still define the price of an asset as
an autoregressive process; however, each process is also regime specific.
Representing this mathematically, an easy-to-understand MS-AR process with 3
regimes (S) is shown below.
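As a sketch of what such a process looks like, assuming for illustration an AR(1) specification within each regime (the order and the notation are assumptions made here, not taken from [3]):

```latex
y_t =
\begin{cases}
c_1 + \phi_1 y_{t-1} + \varepsilon_{1,t}, & S_t = 1 \\
c_2 + \phi_2 y_{t-1} + \varepsilon_{2,t}, & S_t = 2 \\
c_3 + \phi_3 y_{t-1} + \varepsilon_{3,t}, & S_t = 3
\end{cases}
\qquad
\varepsilon_{k,t} \sim N(0, \sigma_k^2),
\qquad
p_{ij} = P(S_t = j \mid S_{t-1} = i)
```

The unobserved state S_t follows a Markov chain with transition probabilities p_ij, so the intercept, the autoregressive coefficient and the error variance can all differ by regime.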
[3]
Moving on to the implementation, this is separated into two sub-sections.
The first deals with the MS-AR implementation for regime detection and
classification. The second deals with the machine learning implementation, a
Random Forest Classifier, for predicting the next regime.
MS-AR
First of all, we are going to need data. We use the Standard & Poor's 500 (S&P 500)
index at daily frequency. The S&P 500 is a stock market index tracking the
performance of 500 large companies listed on stock exchanges in the United States.
We use the S&P 500 ticker ^GSPC, specifying the start and end dates as well as a
daily interval. The data is sourced from Yahoo Finance.
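A minimal sketch of this download step, assuming the third-party `yfinance` package (the article only states that the data comes from Yahoo Finance, so the library choice is an assumption). The helper name `load_sp500` and the default end date are hypothetical; the start date is chosen to match the first row of model output shown later.

```python
import pandas as pd


def load_sp500(start="2000-01-01", end="2022-12-31"):
    """Download daily S&P 500 (^GSPC) data from Yahoo Finance.

    Assumption: uses the `yfinance` package; the default dates are
    illustrative and may differ from the ones used in the article.
    """
    import yfinance as yf  # imported lazily so this module loads without it

    data = yf.download(
        "^GSPC",
        start=start,
        end=end,
        interval="1d",
        auto_adjust=False,  # keep the separate "Adj Close" column
    )
    # Newer yfinance versions return (field, ticker) MultiIndex columns;
    # flatten them to plain field names for a single ticker.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    return data
```

Calling `load_sp500()` requires network access and returns a DataFrame indexed by date, with the adjusted close in the "Adj Close" column.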
Plotting the adjusted close prices gives us a good picture of what we are dealing
with. This is clearly an increasing series with drawdowns around major crises such
as the global financial crisis of 2008/2009 and the COVID-19 pandemic of 2020.
Before fitting our model it’s important to understand what we are trying to achieve
with regime analysis. If we were investing in a fund tracking the S&P 500 index,
what would we do? We may buy and hold, making a handsome return over the last
22 years. However, it would be even better to avoid significant drawdowns. If we
had a way of characterizing the time series during these periods of losses, we
could hopefully exit the market and only re-enter after the drawdown period.
Now, above I mentioned that if we could characterize the time series, we may be
able to identify and avoid large drawdowns. The word characterize leads into one of
the most important parts of the article. A simple characteristic describing how price
is changing is the percentage change in price from one period to the next. We will
call this the price return. Referring back to our discussion on what a market is, price
is changing because participants are buying and selling an asset. Therefore, the
percentage return of an asset can be loosely thought of as characterizing the buying
and selling behavior of market participants. In reality there is a lot more that could
characterize this behavior, such as traded volumes or high-low spreads, but for
simplicity’s sake let’s take one-period price returns as our only characteristic. When
markets are calm, participants are not frantically buying and selling assets, so daily
price returns vary less and prices tend to trend upward. We may describe price
returns in this state as having a low variance, meaning that there is relatively
little spread in returns. When markets are in a state of panic, participants are
frantically buying and selling assets to either make short-term gains or avoid losses.
We may describe price returns in this state as having a high variance, as there is a
wide spread in daily returns. This panic is likely to start when assets are becoming
overpriced or in response to a macro-event.
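To make the idea concrete, here is a small sketch that computes one-period percentage returns with pandas and compares their variance across a calm and a turbulent stretch. The price series is synthetic and the helper name `price_returns` is hypothetical; with real data the same function would be applied to the adjusted close column.

```python
import numpy as np
import pandas as pd


def price_returns(prices: pd.Series) -> pd.Series:
    """One-period percentage change in price, with the leading NaN dropped."""
    return prices.pct_change().dropna()


# Synthetic prices (illustrative only): a calm stretch, then a turbulent one.
rng = np.random.default_rng(0)
calm = rng.normal(0.0005, 0.005, 250)   # small daily moves, slight upward drift
panic = rng.normal(-0.001, 0.03, 250)   # large daily moves
dates = pd.bdate_range("2020-01-01", periods=501)
prices = pd.Series(100 * np.cumprod(np.r_[1.0, 1 + calm, 1 + panic]), index=dates)

returns = price_returns(prices)
calm_var = returns.iloc[:250].var()
panic_var = returns.iloc[250:].var()
print(f"calm variance:  {calm_var:.6f}")
print(f"panic variance: {panic_var:.6f}")
```

The turbulent stretch shows a much larger return variance than the calm one, which is the property the regime model is asked to pick up.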
Given the discussion above, a plot of the price returns (excess returns) starts to paint
a picture of possible underlying regimes and how returns could characterize
participants' behavior.
Focusing on our key dates, 2008/2009 and 2020, we can see price returns spike
dramatically. This supports the hypothesized dynamics we alluded to previously.
It also supports the use of price returns to characterize the behavior of
markets and participants, and suggests that there are in fact underlying regimes
described by price return variance.
Fitting the Markov regression is the easiest part of the analysis. We instantiate the
model and fit it in only two lines of code; a final line prints a summary of the fit
results. When creating the model we specify the number of regimes we want to
detect. In my experience, trying to detect and characterize more than 3 or 4 regimes
becomes tedious, with the usefulness of the results decaying rapidly. In this case we
specify 2 regimes, which can be characterized as a low variance return regime and a
high variance return regime. Trend is specified as no trend, as we are exploring
stationary (or close to stationary) price return data. Setting variance switching to
true means that we expect regime-specific heteroskedasticity. In other words, the
variance of our residuals is not constant across the series, and we expect it to differ
by regime.
                   0         1
Date
2000-01-03  0.029665  0.970335
2000-01-04  0.000002  0.999998
2000-01-05  0.003385  0.996615
2000-01-06  0.003566  0.996434
2000-01-07  0.000665  0.999335
To further visualize the output we can plot smoothed regime probabilities. Now we
can see our regime classifications taking shape, with our crisis dates of interest
showing dominant probabilities of the high variance regime.
Now all we have to do is visualize our regimes on a graph of the adjusted close price
of the S&P 500.
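One way to prepare such a plot is to turn the smoothed probabilities into binary regime labels and then find the contiguous date ranges of each regime. The sketch below is a minimal illustration on made-up probabilities; the 0.5 threshold and the helper name `regime_spans` are assumptions.

```python
import pandas as pd


def regime_spans(labels: pd.Series, regime: int):
    """Return (start, end) index pairs for each unbroken run of `regime`."""
    in_regime = labels == regime
    # A new run starts wherever the in/out flag differs from the previous row.
    run_id = (in_regime != in_regime.shift()).cumsum()
    spans = []
    for _, run in labels[in_regime].groupby(run_id[in_regime]):
        spans.append((run.index[0], run.index[-1]))
    return spans


# Hypothetical smoothed probabilities of the high variance regime.
dates = pd.bdate_range("2020-01-01", periods=8)
p_high = pd.Series([0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.7, 0.95], index=dates)

labels = (p_high > 0.5).astype(int)  # 1 = high variance, 0 = low variance
high_spans = regime_spans(labels, regime=1)
print(high_spans)
```

Each (start, end) pair can then be shaded on the price chart, for example with Matplotlib's `Axes.axvspan`, using red for high variance spans and green for low variance spans.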
What we get makes a lot of sense. During periods of volatility, generally around
major market declines, we see high variance regime returns (red), confirming our
hypothesis. During periods of low volatility, generally during calm uptrends, we see
low variance regime returns (green). From an investor's perspective this graph can
be interpreted simply: buy or hold assets during green periods and sell holdings as
soon as possible in red periods. This is where machine learning may prove to be
useful. A Markov switching regression is good at telling us what regime we were just
in (‘nowcasting’), but it does not necessarily give us much information about the
next regime. A machine learning model can provide just that, giving us an
expectation of the next regime.
Random Forest Classifier
Machine learning is commonly separated into three types of learning, namely
supervised, unsupervised and reinforcement learning. In this case we are going to
be using a Random Forest Classifier, an algorithm in the supervised domain.
Supervised learning means that our model is trained on both dependent and
independent variables. Our dependent variable is our data label, which in this case
will be the regime in the next period. Our independent variables are the features
which the model will use in making these predictions. At a high level, our machine learning
model will aim to learn a function mapping our features to our labels. Our biggest
challenge in applying machine learning to our problem is the relatively small
dataset, which increases the risk of overfitting. Overfitting means that our model
memorizes our past data while learning few underlying relationships, which severely
limits its usefulness looking forward.
First we need to prepare our dataset for the Random Forest Classifier. In addition to
S&P 500 price data, we will create price and volume return features. These will be
the percentage changes in the adjusted close price and volume over a number of
different lookback periods. Most importantly, we also need to append our MS-AR
regime classification probabilities, which describe the classification of the
current regime to the machine learning model. We will also add return volatility
statistics to describe the volatility of past prices. Finally, we need our binary state
or regime classifications; this is the column that we will transform into our labels
for the classifier.
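A minimal sketch of this feature preparation, using pandas on synthetic data. The lookback windows, the column names, the 0.5 threshold and the assumption that column 1 is the high variance regime are all illustrative choices, not the article's exact feature set.

```python
import numpy as np
import pandas as pd


def build_features(data, probs, lookbacks=(1, 5, 20)):
    """Assemble features plus a next-period regime label.

    `data` needs "Adj Close" and "Volume" columns; `probs` holds one column
    of smoothed regime probabilities per regime (columns 0 and 1).
    """
    out = pd.DataFrame(index=data.index)
    for n in lookbacks:
        out[f"price_ret_{n}"] = data["Adj Close"].pct_change(n)
        out[f"volume_ret_{n}"] = data["Volume"].pct_change(n)
    # Rolling standard deviation of one-period returns as a volatility feature.
    out["volatility_20"] = data["Adj Close"].pct_change().rolling(20).std()
    out["prob_regime_0"] = probs[0]
    out["prob_regime_1"] = probs[1]
    # Today's binary regime, and the label: the regime one row ahead.
    out["regime"] = (probs[1] > 0.5).astype(int)
    out["next_regime"] = out["regime"].shift(-1)
    out = out.dropna()  # drops the warm-up rows and the final unlabeled row
    out["next_regime"] = out["next_regime"].astype(int)
    return out


# Synthetic inputs, for illustration only.
rng = np.random.default_rng(1)
idx = pd.bdate_range("2021-01-01", periods=60)
data = pd.DataFrame(
    {
        "Adj Close": 100 * np.cumprod(1 + rng.normal(0, 0.01, 60)),
        "Volume": rng.integers(1_000, 2_000, 60).astype(float),
    },
    index=idx,
)
p1 = pd.Series(rng.uniform(0, 1, 60), index=idx)
probs = pd.DataFrame({0: 1 - p1, 1: p1})

features = build_features(data, probs)
print(features.tail())
```

The `shift(-1)` is the key step: it pairs each row's features with the regime of the following period, which is what the classifier is asked to predict.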
Now that our regimes column is our target, we can start our data preprocessing. I
will be using the train-test split function to create X and y sets for both training and
testing. I opted for an 80–20 train-test split, with 80% of the data used for training
and 20% reserved for evaluating the model.
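A sketch of the 80–20 split using scikit-learn's `train_test_split` on a made-up feature table. Whether the original notebook shuffles the rows is not stated; `shuffle=False` below is an assumption that keeps the test set strictly after the training set in time.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical feature table: three features plus the next-period regime label.
rng = np.random.default_rng(7)
features = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
features["next_regime"] = rng.integers(0, 2, 100)

X = features.drop(columns="next_regime")
y = features["next_regime"]

# 80% for training, 20% held out for evaluation.
# shuffle=False (an assumption) preserves chronological order; the default
# shuffle=True samples rows at random instead.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print(X_train.shape, X_test.shape)
```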
Upon inspection we find that we have significantly fewer high variance regime
training samples than low variance regime training samples.
This makes sense, as times of volatility and instability are thankfully relatively
infrequent; however, it causes our dataset to be unbalanced. Unbalanced data
means that our classes for classification are unequally represented, which may affect
model performance. To rebalance our dataset we have a few options. We could
oversample our smallest class by synthetically creating more examples of high
variance regimes. I prefer to take the more intuitive approach of undersampling the
largest class until it matches the size of the smallest class. This means that we
simply discard low variance examples until the numbers of high and low variance
regime examples are equal.
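Random undersampling as described above can be sketched with pandas and NumPy alone. The helper name `undersample` and the toy data are hypothetical; which rows are discarded depends on the random seed.

```python
import numpy as np
import pandas as pd


def undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until all match the smallest."""
    n_min = y.value_counts().min()
    rng = np.random.default_rng(seed)
    keep = []
    for cls in y.unique():
        idx = y.index[y == cls].to_numpy()
        keep.extend(rng.choice(idx, size=n_min, replace=False))
    keep = sorted(keep)  # restore the original row order
    return X.loc[keep], y.loc[keep]


# Toy data: 90 low variance (0) rows and 10 high variance (1) rows.
X = pd.DataFrame({"feature": np.arange(100, dtype=float)})
y = pd.Series([0] * 90 + [1] * 10)

X_bal, y_bal = undersample(X, y)
print(y_bal.value_counts())
```

Only the training set should be rebalanced this way, so that the test set still reflects the real class frequencies.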
Now that our data is ready we can fit our model. This is the easy part as all complex
calculations and iterations are taken care of for us.
In fitting our model we need to specify the number of estimators which is the
number of trees in our random forest. I arbitrarily chose to initialize the model with
20 trees. We will explore how to better choose this parameter later on in the article.
After fitting our model we can make predictions on our testing dataset which we
will use to evaluate performance. We can get two types of predictive output from the
random forest classifier. We can get a binary classification output or a probabilistic
classification output.
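A minimal sketch of fitting the classifier and producing both kinds of output with scikit-learn. The data is synthetic, and `random_state=0` is an addition made here so the example is repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic training and testing data, for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))

# n_estimators is the number of trees in the forest.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)        # binary output: a 0/1 class per row
y_prob = clf.predict_proba(X_test)  # probabilistic output: one column per class
print(y_pred[:5])
print(y_prob[:5])
```

The second column of `predict_proba` is the predicted probability of class 1, which is the score used later for ROC and AUC calculations.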
On the testing set (unseen data), our model has achieved a true positive rate of
99.4%, meaning that over 99% of actual high variance regimes in the next period
were correctly predicted as high variance regimes. The model produced a 0.5%
false positive rate, meaning that 0.5% of actual low variance regimes in the next
period were incorrectly predicted as high variance regimes. To get a
threshold-independent view of performance we can compute the area under the
receiver operating characteristic curve (AUC). Our model has an AUC of 99.5%,
meaning that a randomly chosen high variance sample receives a higher predicted
probability than a randomly chosen low variance sample about 99.5% of the time.
Alternatively we can also use an accuracy score
computation. All metrics would suggest a model with a high degree of accuracy and
low prediction error, providing a satisfactory result for this use case.
Previously I mentioned that I had arbitrarily chosen to initialize the random forest
with 20 trees. This has seemed to work relatively well, but how would we go
about tuning that hyperparameter? Although tuning would likely yield a small
improvement in results, it is important to remember the random nature of machine
learning: every model fit is going to contain some variation. To tune a
hyperparameter such as the number of trees, we can write a loop to fit models
of different specifications and plot the results. Visually we can see that in this case
there is not much extra performance to gain from increasing the number of
trees beyond approximately 20.
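Such a loop might look like the sketch below, which uses synthetic data and accuracy as the score. The candidate tree counts and the fixed `random_state` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification problem, for illustration only.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one forest per candidate size and record its held-out accuracy.
tree_counts = [1, 5, 10, 20, 50, 100]
scores = {}
for n in tree_counts:
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    clf.fit(X_train, y_train)
    scores[n] = accuracy_score(y_test, clf.predict(X_test))

for n, score in scores.items():
    print(f"{n:>3} trees: accuracy {score:.3f}")
```

Plotting `scores` against `tree_counts` gives the kind of curve described above. Repeating the loop with several seeds helps separate a real gain from run-to-run variation.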
As one would expect, our most important features are columns 20 and 21, which are
our current variance regime probability columns. It would seem that our model has
learnt rules mapping the current regime to the future regime.
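Feature importances like these can be read from a fitted scikit-learn forest via its `feature_importances_` attribute. In the sketch below the data is synthetic and the label is deliberately driven by a single column, so the column names and the resulting ranking are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features; the label depends only on "prob_high" by construction.
rng = np.random.default_rng(5)
X = pd.DataFrame(
    rng.normal(size=(300, 4)),
    columns=["ret_1", "ret_5", "vol_20", "prob_high"],
)
y = (X["prob_high"] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1.
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```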
This wraps up our exploration of using machine learning to predict the next
variance return regime. I have two areas that I would target when building on and
improving this approach.
2. Using deep learning for predicting the next state. Leveraging long short-term
memory neural networks could allow for a comprehensive predictive model that
can detect non-linear dependencies within the series over a specified lookback
window. Again the amount of data is the limiting factor in this application,
making it more useful for higher granularity time series.
I hope you found this simple introduction to regime analysis and prediction useful.
My takeaway from this process is the different angle of approach to time series
problems. While many may try to better predict a series, an approach like this
suggests that it may be better to know when to predict and when not to predict. In
terms of finance, this translates to analyzing when to trade and when not to trade a
particular strategy. Find the full notebook and code for this article here.
[1] https://en.wikipedia.org/wiki/Autoregressive_model
[2] https://en.wikipedia.org/wiki/Markov_chain
[3] Mendy, D. & Widodo, T. (2018). "Two stage Markov switching model: Identifying the
Indonesian Rupiah Per US Dollar turning points post 1997 financial crisis," MPRA Paper
86728.
Written by Spencer