Week 1 Notes
OVERVIEW
BDF1: Supervised ML
ADMIN
May 15: Coursework released (probably about deep learning)
TECH
Recommend Python 3, with packages:
numpy
pandas
matplotlib/seaborn
scikit-learn
tensorflow/keras
Alternative: Google Colab; the downside is that you get disconnected after 12 hours.
Note: If you want to do extensive computation for your projects or otherwise, you
might want to check out cloud services like AWS and Google Cloud.
These services cost money, and you’re under no pressure to spend any on your
projects. However, you often get some free credit when you sign up, which might be
helpful. But let me stress that we’re perfectly happy for you to do a smaller-scope
project that is feasible to run on a laptop; you can achieve the same grades with
a study like this if it is well executed.
Notebook exercise: Which neural network can learn the structure of the “two
spirals” data? Which network outperforms a simple tree or random forest?
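One possible starting point for this exercise (the spiral generator and model choices below are my own illustration, not the official notebook):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

def two_spirals(n_per_class=500, noise=0.05, seed=0):
    # two interleaved spirals; class 1 is class 0 rotated by 180 degrees
    rng = np.random.default_rng(seed)
    theta = np.sqrt(rng.uniform(0, 1, n_per_class)) * 3 * np.pi
    spiral = np.c_[theta * np.cos(theta), theta * np.sin(theta)] / (3 * np.pi)
    X = np.vstack([spiral, -spiral]) + rng.normal(scale=noise, size=(2 * n_per_class, 2))
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return X, y

X, y = two_spirals()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MLP test accuracy:", mlp.score(X_te, y_te))
print("Random forest test accuracy:", rf.score(X_te, y_te))

Try varying hidden_layer_sizes to see which architectures can and cannot capture the spiral structure.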
Data: For every stock in the US market and every month t in the sample period, the
authors collect:
Macro: an indicator of the market condition (bear market or not)
Past performance: the return between month t-12 and t-2
Future performance: the return between month t and t+1
Look at Fig 1 and Tab 3: The basic momentum strategy (WML, “winners minus losers”)
buys the winners (top 10% of stocks in terms of past performance) and shorts the
losers (bottom 10%) each month. This strategy performs very well on average but
poorly in bear markets: it does badly exactly during bad times.
The key thing to notice is the interaction between past performance of a stock and
the macro environment. In a very simple 2D plot (where plus and minus signs denote
expected returns):
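(The plot itself did not survive in these notes. Based on the momentum-crash pattern just described, the sign pattern is plausibly:

                Not bear    Bear
Past winner        +          -
Past loser         -          +

Treat this as my reconstruction, not the original figure.)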
NB: This is a human specification search, and still very coarse: there is only one threshold nonlinearity (bear vs. not bear).
Goal for this part of the course: Learn how we can refine further using deep learning
and exploit more complex nonlinearities
Learning outcomes:
MODEL SETUP
We want to cast the task of return prediction as a classical supervised learning
problem with loss minimisation:
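(The display that belongs here is missing. The generic form is: choose a prediction function f, mapping features x to target y, so as to minimise expected loss,

min_f E[ L(y, f(x)) ].)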
We look at a simple approach that imposes very little structure on the data — the
point is to let the machine figure out the important patterns. An alternative approach
which imposes more structure is in the supplementary notes below.
Unlike in the momentum case, we do not hard-wire things like the definition of a
“loser” or a “bear market”. We allow the machine to recognise these kinds of signals
in a flexible way.
For the target y, we use future performance between date t (now) and t+1 (next
month). Again, we do not hard-wire the kind of portfolio we are interested in (e.g.,
WML). We let the machine tell us flexibly which stocks are good prospects.
More explicitly, the prediction problem becomes, for every time period (month) t and
every stock i:
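(Reconstructing the missing display: with features x_{i,t} known at time t, and target y equal to the return r_{i,t+1} of stock i from t to t+1, we fit

r_{i,t+1} ≈ f(x_{i,t}).)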
For the loss function, we use mean square error, which is the standard choice in
machine learning when predicting a continuous variable (i.e., in “regression”
problems), and also a standard choice in asset pricing. Formally, if we have T time
periods and N stocks, this is
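(The formula is missing here; the standard form matching the description above is

MSE = (1 / (N*T)) * Σ_{t=1..T} Σ_{i=1..N} ( r_{i,t+1} - f(x_{i,t}) )²,

i.e., squared errors averaged over all stocks and months.)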
To summarise, this picture shows how we would attack the problem with a neural
network:
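The picture is not reproduced in these notes. As a stand-in, here is a minimal keras sketch of the same idea; the layer sizes and feature count are placeholders I chose, not from the lecture:

import tensorflow as tf

n_features = 100  # placeholder: characteristics + macro variables + interactions

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # one output: predicted return from t to t+1
])
model.compile(optimizer="adam", loss="mse")  # mean square error, as above
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)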
This is perhaps the simplest setup but by no means the only one. Some interesting
suggestions came up in class:
What if we stick to looking at momentum, but allow a machine to fine-tune the
strategy (e.g., 5% winners instead of 10%) at different points in time?
Which combinations of past returns should one consider for the features x?
Does it make sense to predict returns separately for all stocks, so that the
network has N outputs instead of one?
Perhaps a mean-square loss is too conservative, and the model will have an
incentive to always predict zero. Is there value in an additional loss term for getting
the sign of returns wrong? (A sketch of such a loss follows after this list.)
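On that last suggestion, a hedged keras sketch of what a sign-penalty loss could look like (the functional form and the weight alpha are my illustration, not from the lecture):

import tensorflow as tf

def mse_plus_sign_loss(alpha=0.1):
    # alpha is an illustrative weight, not a recommended value
    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        # relu(-y_true * y_pred) is positive only when predicted and realised signs disagree
        sign_penalty = tf.reduce_mean(tf.nn.relu(-y_true * y_pred))
        return mse + alpha * sign_penalty
    return loss

# usage: model.compile(optimizer="adam", loss=mse_plus_sign_loss())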
DATA SETUP
Look at GKX (Gu, Kelly, and Xiu), Section 2.1
To select the right predictive variables x, they rely on Welch-Goyal for macro
indicators, and on decades of literature about the cross section of stock returns for
the relevant micro characteristics. Characteristics used are, among others:
Classics such as beta, B/M (value vs. growth), size, and momentum (these are the
predictors that inspired the famous Fama-French factor models)
Accounting ratios
Past performance information beyond momentum, e.g., past volatility
1. Standardisation: We first convert characteristics into cross-sectional ranks in each
month (e.g., the company with the highest book-to-market ratio in June 2013 is
assigned rank 1 in that month, the second-highest rank 2, and so on). A pandas
sketch of steps 1, 3, and 4 follows after this list.
2. Train-test split: 18 years for training, 12 years for validation, 30 years for testing.
Note: No cross-validation. This is typical for neural networks because re-training on
many folds is too computationally expensive. It also helps us respect the time-series
dimension: the validation set follows immediately after the training set, which
would not be possible with many folds.
3. Missing values: Replace with the cross-sectional median (the economic rationale
for this is unclear, but it prevents the dataset from shrinking due to missing values).
4. Interactions: Include not only the stock characteristics and macro variables, but
also all of their cross-products. We are nudging the machine towards considering
relationships like “characteristic A matters more in macro environment B”, as we
saw in the momentum case study.
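The promised pandas sketch of steps 1, 3, and 4, on a toy panel (all column names are hypothetical):

import numpy as np
import pandas as pd

# toy long-format panel: one row per (stock, month)
df = pd.DataFrame({
    "month": ["2013-06"] * 3 + ["2013-07"] * 3,
    "bm":   [0.8, np.nan, 0.3, 0.5, 0.9, 0.1],    # book-to-market
    "mom":  [0.1, 0.05, np.nan, 0.2, -0.1, 0.0],  # past return, t-12 to t-2
    "bear": [0, 0, 0, 1, 1, 1],                   # macro: bear-market indicator
})
char_cols, macro_cols = ["bm", "mom"], ["bear"]

# step 1: standardise characteristics as cross-sectional ranks within each month
df[char_cols] = df.groupby("month")[char_cols].rank(ascending=False)

# step 3: replace missing values with the cross-sectional median of that month
df[char_cols] = df.groupby("month")[char_cols].transform(lambda s: s.fillna(s.median()))

# step 4: interactions between each characteristic and each macro variable
for c in char_cols:
    for m in macro_cols:
        df[f"{c}_x_{m}"] = df[c] * df[m]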
Takeaway: A lot of economics and domain expertise are already baked in before we
start: the characteristics have been chosen based on 30 years of literature.
SUPPLEMENTARY NOTES
The approach above, following GKX, was to keep the statistical model as general
and flexible as possible. Another approach is to use insights from economic theory
to put constraints on the model.
A more general theory that we can use to impose constraints is that financial
markets should not allow arbitrage to persist. Arbitrage means getting a free lunch:
A trade with zero risk and positive return. The argument goes: If prices permit a free
lunch, say by buying firm A, many traders will rush to buy, prices will go up, and the
arbitrage goes away.
Let’s impose the idea that there can be no arbitrage on our model as a constraint.
We should not take this too literally: Everyone knows that arbitrage sometimes
persists for a while in reality, and high-frequency traders make a lot of money this
way. But it may be a decent enough approximation if we are doing relatively low-
frequency trading, e.g., the monthly trades that we have looked at in this lecture so
far.
We need a few results from finance theory to make this work. The approach loosely
follows Chen-Pelger-Zhu (CPZ), whose research paper is on the course page.
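The first result and its equation did not survive in these notes; the standard statement (see, e.g., Cochrane’s textbook) is:
1. No arbitrage implies the existence of a stochastic discount factor (SDF) M_{t+1} such that, for every stock’s excess return R^e_{i,t+1},

E[ M_{t+1} R^e_{i,t+1} | Info_t ] = 0.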
The expectation above conditions on Info_t, which stands for all information
available to the market at time t.
2. Another way of reading this equation is to say: You cannot use any combination
of information at time t to predict the modulated excess returns
(because, conditional on any of this info, the predicted modulated return is always
zero!). This implies another useful equation:
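This display is also missing; reconstructing it as the unconditional moment condition used by CPZ:

E[ M_{t+1} R^e_{i,t+1} g(x_t) ] = 0 for any function g of time-t information x_t. (*)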
Now, the expectation is unconditional, and g(.) can be any arbitrary function of
information at date t. This is just another way of mathematically encoding the no-
arbitrage condition. If you are interested in the maths, you can try to derive this
version — the proof is only a few lines if you start with the previous equation, and
uses the law of iterated expectations.
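For the curious, here is that derivation in one line (my own write-up): since g(x_t) is determined by time-t information, the law of iterated expectations gives

E[ M_{t+1} R^e_{i,t+1} g(x_t) ] = E[ E[ M_{t+1} R^e_{i,t+1} | Info_t ] g(x_t) ] = E[ 0 * g(x_t) ] = 0.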
CPZ constrain their neural network by imposing no arbitrage on the model. In fact,
they go further: instead of predicting excess returns directly (as we did above), they
move the goalposts and try to estimate the SDF. This is reasonable: remember that
only the SDF determines how expected returns differ from zero. Therefore, once we
know the SDF, we can find any expected return we want (the details of how to back
out expected returns from the SDF are in their paper).
How to find the SDF? The idea is to get the left-hand side of equation (*) as close to
zero as possible. The loss function is therefore changed to minimising the (squared)
left-hand side of (*). In particular, they set up a neural network whose input layer
consists of micro and macro variables x that are known at time t (just as in GKX),
but whose output is an SDF (unlike in GKX, where the output is a return prediction).
In addition, they use a second neural network to discipline their predictions. Notice
in equation (*) that the function g can be anything: Intuitively, we can
condition on any function of information we want; it should still be impossible to
predict modulated returns. The second network now tries to find the function g(x)
that gives us the *worst* possible result, that is, the g(x) which drives the left-hand
side as far from zero as possible. Thus, we have two networks fighting with each
other: Network 1 tries to get the pricing equation right by picking m to minimise
pricing error, network 2 tries to break it by picking g to maximise errors. This
technique is called Generative Adversarial Networks (GAN). The intuition is that “what
doesn’t kill you makes you stronger”: network 1 has to try harder, and price better, in
order to win against network 2.
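A hedged tensorflow sketch of this min-max game (the layer sizes, shapes, and the simplified one-moment loss are my own choices; CPZ’s actual architecture is richer, see their paper):

import tensorflow as tf

n_features = 100  # hypothetical number of micro + macro features

def make_net(out_dim=1):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(out_dim),
    ])

sdf_net = make_net()   # network 1: maps x to the SDF M_{t+1}
cond_net = make_net()  # network 2: the adversarial conditioning function g(x)
opt_sdf, opt_cond = tf.keras.optimizers.Adam(), tf.keras.optimizers.Adam()

def pricing_error(x, r_excess):
    # squared sample analogue of E[ M * R^e * g(x) ];
    # x has shape (batch, n_features), r_excess has shape (batch, 1)
    return tf.square(tf.reduce_mean(sdf_net(x) * r_excess * cond_net(x)))

def train_step(x, r_excess):
    # network 1 minimises the pricing error...
    with tf.GradientTape() as tape:
        loss = pricing_error(x, r_excess)
    grads = tape.gradient(loss, sdf_net.trainable_variables)
    opt_sdf.apply_gradients(zip(grads, sdf_net.trainable_variables))
    # ...network 2 maximises it (gradient ascent = descent on the negated loss)
    with tf.GradientTape() as tape:
        loss = -pricing_error(x, r_excess)
    grads = tape.gradient(loss, cond_net.trainable_variables)
    opt_cond.apply_gradients(zip(grads, cond_net.trainable_variables))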
PS: When you read CPZ, you’ll see that they cover even more ground by including a
recurrent neural net (RNN) in the architecture. The idea here is to encode the history
of many (approx. 170) macro indicators in their data into a smaller number of
“hidden macro states”. This is another way to constrain the model. We do not have
time to cover RNN in class, but there are some references below. Talk to me if you’d
like to use this in your project.
FURTHER READING
For an exhaustive resource on deep learning, written by some of the top researchers
in the field: deeplearningbook.org
Another great introduction to neural nets is the course CS231n at Stanford, which is
publicly available (google it). This also talks about RNN.
I encourage you to read the original paper on momentum crashes and GKX in as
much detail as possible.
For the arbitrage pricing approach, the main resource is the paper by Chen-Pelger-
Zhu, which is on the course website. If you find the asset pricing theory in that paper
hard to follow, I recommend the textbook “Asset Pricing” by John Cochrane
(especially the first few chapters) as a refresher. Cochrane also has great lecture
notes online.