
Week 1

OVERVIEW
BDF1: Supervised ML

1. Predict default in credit markets -> Logit, trees, forests
2. Predict time series of stock returns -> LASSO, trees (Welch/Goyal)
3. Predict cross section of stock returns -> text

BDF2: Deep learning and reinforcement learning

1. Combine TS & CS prediction in stock markets using deep neural networks
2. Portfolio selection with reinforcement learning
3. Credit allocation for Fintech

Focus: Interpretability and causality

ADMIN
May 15: Coursework released (probably about deep learning)

May 18: Project proposals in class (anything to do with lectures)

June 15: Project presentations

Exam: 6 discussion questions, 8 multiple-choice questions

2019 exam: similar style of questions, but the focus of the course was different

Office hour: Email me, Antoine or Nick

Zoom: Questions on chat, breakout rooms with presentations afterwards

TECH
Recommend Python 3, with packages:
NumPy
pandas
Matplotlib/Seaborn
scikit-learn
TensorFlow/Keras

Alternative: Google Colab; the downside is that you get disconnected after 12 hours

Note: If you want to do extensive computation for your projects or otherwise, you
might want to check out cloud services like AWS and Google Cloud.

These things cost money and you’re under no pressure to spend money on your
projects. However, you often get some free credit when you sign up, which might be
helpful. But let me stress that we’re perfectly happy for you to do smaller-scope
projects that are feasible to run on a laptop — you can achieve the same grades with
a study like this if it is well executed.

DEEP LEARNING INTRODUCTION


See BDF1 for introduction to neural networks

Recap of standard architecture:
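As a quick recap (generic notation, not taken from the slides): a feedforward network with L hidden layers maps an input x to a prediction via

    h^{(0)} = x
    h^{(l)} = \sigma( W^{(l)} h^{(l-1)} + b^{(l)} ),   l = 1, ..., L
    \hat{y} = W^{(L+1)} h^{(L)} + b^{(L+1)}

where \sigma is an elementwise nonlinear activation (e.g., ReLU) and the weights W and biases b are learned by minimising a loss.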

Why use a deep neural network with many layers?


Universal approximation: A one-layer network can represent *any* nonlinear function, as long as it has enough units
Curse of dimensionality: When the input dimension p is large, the number of units needed for universal approximation grows exponentially with p
Deep networks: The number of functions you can represent grows exponentially in the number of layers, while the computational cost grows only linearly in the number of layers -> we beat the curse of dimensionality by adding more layers to our network

Notebook exercise: Which neural network can learn the structure of the “two
spirals” data? Which network outperforms a simple tree or random forest?
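A minimal sketch of how one might run this comparison with scikit-learn (the spiral generator and all parameter choices are illustrative assumptions, not the course notebook):

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Generate a "two spirals" dataset (illustrative construction)
    rng = np.random.default_rng(0)
    n = 1000
    theta = np.sqrt(rng.uniform(size=n)) * 3 * np.pi
    r = theta
    x1 = np.c_[r * np.cos(theta), r * np.sin(theta)]      # spiral 1
    x2 = np.c_[-r * np.cos(theta), -r * np.sin(theta)]    # spiral 2 (rotated 180 degrees)
    X = np.vstack([x1, x2]) + rng.normal(scale=0.5, size=(2 * n, 2))
    y = np.r_[np.zeros(n), np.ones(n)]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Shallow network vs. deeper network vs. tree-based models
    models = {
        "1 hidden layer": MLPClassifier(hidden_layer_sizes=(16,), max_iter=5000, random_state=0),
        "3 hidden layers": MLPClassifier(hidden_layer_sizes=(32, 32, 32), max_iter=5000, random_state=0),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(name, model.score(X_te, y_te))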

Next: Why is this important in finance?

CASE STUDY: MOMENTUM


Based on “Momentum crashes” by Daniel and Moskowitz (JFE 2016)

Data: For every stock in the US market, and for every month t in the sample period, the authors collect

A macro variable capturing the market condition (bear market or not)
Past performance: return between month t-12 and t-2
Future performance: return between t and t+1

Look at Fig 1 and Tab 3: The basic momentum strategy (WML, winners minus losers) buys the winners (top 10% of stocks in terms of past performance) and shorts the losers (bottom 10%) each month. This strategy performs very well on average but poorly in bear markets.

The key thing to notice is the interaction between the past performance of a stock and the macro environment. In a very simple two-by-two summary (where "good result" and "bad result" refer to the sign of expected returns):

Win + good market = good result
Lose + good market = bad result
Win + bad market = bad result
Lose + bad market = good result
A refined momentum strategy based on the results in the paper should be nonlinear:

WML in good times
Neutral (or even LMW) in bad times

NB: This is still human specification search, and it is very coarse: only one threshold nonlinearity (see the sketch below).
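A tiny numpy sketch of this one-threshold rule (the variable names and decile cutoffs are illustrative assumptions):

    import numpy as np

    def refined_momentum_weights(past_ret_rank, is_bear):
        """Long winners / short losers in good times, flat in bad times.

        past_ret_rank: array of cross-sectional ranks in [0, 1] for month t
        is_bear: True if month t is classified as a bear market
        """
        weights = np.zeros_like(past_ret_rank)
        if not is_bear:                            # single threshold nonlinearity
            weights[past_ret_rank >= 0.9] = 1.0    # top decile: winners
            weights[past_ret_rank <= 0.1] = -1.0   # bottom decile: losers
        return weights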

Goal for this part of the course: Learn how we can refine further using deep learning
and exploit more complex nonlinearities

Learning outcomes:

1. Set up the data and model architecture (weeks 1 and 2)
2. Train the network (week 2)
3. Interpret the results (week 3)

MODEL SETUP
We want to cast the task of return prediction as a classical supervised learning
problem with loss minimisation.

We look at a simple approach that imposes very little structure on the data — the
point is to let the machine figure out the important patterns. An alternative approach
which imposes more structure is in the supplementary notes below.

For the inputs x, we use:

Micro characteristics: Past performance (for the momentum example) and other characteristics of each stock (for more general analysis)
Macro data: Stock market conditions (for the momentum example) and other macro indicators

Unlike in the momentum case, we do not hard-wire things like the definition of a
“loser” or a “bear market”. We allow the machine to recognise these kinds of signals
in a flexible way.

For the target y, we use future performance between date t (now) and t+1 (next
month). Again, we do not hard-wire the kind of portfolio we are interested in (e.g.,
WML). We let the machine tell us flexibly which stocks are good prospects.

More explicitly, the prediction problem becomes, for every time period (month) t and
every stock i:
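(reconstructed from the surrounding text) predict r_{i,t+1} from x_{i,t}, i.e.,

    \hat{r}_{i,t+1} = f(x_{i,t})

where x_{i,t} collects the micro characteristics of stock i and the macro variables known at date t, r_{i,t+1} is the return between t and t+1, and f is the function (neural network) we are fitting.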

For the loss function, we use mean square error, which is the standard choice in
machine learning when predicting a continuous variable (i.e., in “regression”
problems), and also a standard choice in asset pricing. Formally, if we have T time
periods and N stocks, this is
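(reconstructed from the description above, with f denoting the model)

    MSE = (1 / (T N)) \sum_{t=1}^{T} \sum_{i=1}^{N} ( r_{i,t+1} - f(x_{i,t}) )^2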

To summarise, this picture shows how we would attack the problem with a neural
network:
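As a concrete stand-in, here is a minimal Keras sketch of this setup (the number of features, layer sizes, and all names are illustrative assumptions):

    from tensorflow import keras

    n_features = 100   # micro characteristics + macro variables (+ interactions), per stock-month

    # Feedforward network: stock-month features in, predicted next-month return out
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),                      # predicted return r_{i,t+1}
    ])

    # Mean squared error loss, as in the text
    model.compile(optimizer="adam", loss="mse")

    # model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=1024)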
This is perhaps the simplest setup but by no means the only one. Some interesting
suggestions came up in class:

What if we stick to looking at momentum, but allow a machine to fine-tune the
strategy (e.g., 5% winners instead of 10%) at different points in time?
Which combinations of past returns should one consider for the features x?
Does it make sense to predict returns separately for all stocks, so that the
network has N outputs instead of one?
Perhaps a mean-square loss is too conservative, and the model will have an
incentive to always predict zero. Is there value in an additional loss for getting the
sign of returns wrong? (A sketch of one option follows below.)

Some of these might be interesting to pursue in your projects.
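For the last suggestion, a minimal Keras sketch of a combined loss (the penalty form and the weight alpha are illustrative assumptions, not something from the lecture):

    import tensorflow as tf

    def mse_with_sign_penalty(alpha=0.1):
        """MSE plus a differentiable penalty that is positive only when the
        predicted and realised returns have opposite signs."""
        def loss(y_true, y_pred):
            mse = tf.reduce_mean(tf.square(y_true - y_pred))
            # relu(-y_true * y_pred) > 0 only when the signs disagree
            sign_penalty = tf.reduce_mean(tf.nn.relu(-y_true * y_pred))
            return mse + alpha * sign_penalty
        return loss

    # model.compile(optimizer="adam", loss=mse_with_sign_penalty(alpha=0.1))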

DATA SETUP
Look at GKX, Section 2.1

The authors apply an extensive amount of pre-processing and domain knowledge to
the raw data before running their neural network.

To select the right predictive variables x, they rely on Welch-Goyal for macro
indicators, and on decades of literature about the cross section of stock returns for
the relevant micro characteristics. Characteristics used are, among others:

Classics such as beta, B/M (value vs. growth), size, and momentum (these are the
predictors that inspired the famous Fama-French factor models)
Accounting ratios
Past performance information beyond momentum, e.g., past volatility

Pre-processing steps are as follows

1. Standardisation

Usual procedure in ML would be: subtract mean, divide by standard deviation

Here: We first convert characteristics into cross-sectional ranks in each month (e.g.,
the company with the highest book-market ratio in June 2013 gets assigned a “1” in
that month, the second-highest a “2”, and so on).

Ranks are then mapped to the [-1, 1] interval for normalisation.

This is informed by previous work in asset pricing that has found ranks to be more predictive than raw numbers.
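A pandas sketch of this rank-based standardisation (the column names and the exact mapping to [-1, 1] are illustrative assumptions):

    import pandas as pd

    def rank_standardise(df, char_cols):
        """Convert characteristics to cross-sectional ranks within each month,
        then map the ranks to the [-1, 1] interval."""
        out = df.copy()
        for col in char_cols:
            # rank within each month (1 = smallest value)
            ranks = out.groupby("month")[col].rank(method="first")
            counts = out.groupby("month")[col].transform("count")
            # map ranks {1, ..., n} linearly to [-1, 1]
            out[col] = 2 * (ranks - 1) / (counts - 1) - 1
        return out

    # Example: panel with one row per (stock, month)
    # df = rank_standardise(df, ["bm", "size", "momentum"])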

2. Train-test split: 18Y for training, 12Y for validation, 30Y for testing

Note: No cross-validation. This is typical for neural networks because re-training on
many folds is too computationally expensive. This also helps us to respect the time
series dimension: the validation set follows immediately after the training set, which
would not be possible if using many folds.
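A sketch of such a purely time-ordered split (assuming a datetime "month" column; the cutoff construction is an illustrative assumption):

    def time_split(df, train_years=18, val_years=12):
        """Contiguous train / validation / test split along the time axis
        (no shuffling, no cross-validation folds)."""
        years = sorted(df["month"].dt.year.unique())
        train_end = years[0] + train_years
        val_end = train_end + val_years
        train = df[df["month"].dt.year < train_end]
        val = df[(df["month"].dt.year >= train_end) & (df["month"].dt.year < val_end)]
        test = df[df["month"].dt.year >= val_end]
        return train, val, test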

3. Missing values: Replace with the cross-sectional median (the economic rationale for this is unclear, but it prevents shrinking of the dataset due to missing values)
4. Interactions: Do not include only the stock characteristics and macro variables, but also all of their cross-products. We are nudging the machine towards considering relationships like “characteristic A matters more in macro environment B”, as we saw in the momentum case study. (A sketch of steps 3 to 5 follows after this list.)

5. Ensure validity of the exercise: Do not use variables at month t that were published only later, and lag variables to avoid look-ahead bias
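A pandas sketch of steps 3 to 5 (the column names, the set of characteristics and macro variables, and the lag convention are illustrative assumptions):

    def preprocess(df, char_cols, macro_cols):
        """df: pandas DataFrame with one row per (stock, month)."""
        out = df.copy()

        # Step 3: replace missing characteristics with that month's cross-sectional median
        for col in char_cols:
            out[col] = out.groupby("month")[col].transform(lambda s: s.fillna(s.median()))

        # Step 4: add all cross-products between characteristics and macro variables
        for c in char_cols:
            for m in macro_cols:
                out[f"{c}_x_{m}"] = out[c] * out[m]

        # Step 5: lag predictors by one month within each stock to avoid look-ahead bias
        predictor_cols = [c for c in out.columns if c not in ("stock", "month", "ret")]
        out[predictor_cols] = out.groupby("stock")[predictor_cols].shift(1)
        return out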

Takeaway: A lot of economics and domain expertise is already baked in before we
start; the characteristics have been chosen based on 30 years of literature.

This is perhaps in contrast to the view (common in the computer science community)
that neural nets can figure out everything “from scratch” without any human
expertise. Maybe economists won’t be replaced by robots just yet...

ALTERNATIVE ARCHITECTURE: ARBITRAGE PRICING APPROACH
NB: We did not go through this in class, so I will not examine you on the technical
details of this section. However, it will definitely help your progress to read it
carefully and understand the intuition, even if you skip some technicalities.

The approach above, following GKX, was to keep the statistical model as general
and flexible as possible. Another approach is to use insights from economic theory
to put constraints on the model.

Constraints sound like a negative thing, so why would we impose them on ourselves? For a very simple example, suppose that theory suggests very strongly that stocks A and B always move together. Then, it makes sense to bake this co-movement into your statistical model from the start. More concretely, we would do this by using an architecture which, instead of chasing universal approximation, is only able to represent relationships that respect this co-movement. This way, the model does not have to waste its flexibility (or more formally, its “degrees of freedom”) on figuring out that they move together.

This is especially important because, to prevent overfitting in practice, we end up regularising our neural nets. This effectively means that we make the optimisation algorithm pay a penalty whenever the model becomes more complex (e.g., in the sense of the L1 or L2 norm of the parameter vector). An unconstrained model, which says “stock A is going up and stock B is going up”, is more complex than a constrained model, which knows that A and B move together and says “both are going up”. Hence, the unconstrained model has to spend more of its limited complexity budget to accurately describe stocks A and B. In fact, the optimisation algorithm might end up choosing not to describe them at all in order to save its budget for stocks C or D...

A more general theory that we can use to impose constraints is that financial
markets should not allow arbitrage to persist. Arbitrage means getting a free lunch:
A trade with zero risk and positive return. The argument goes: If prices permit a free
lunch, say by buying firm A, many traders will rush to buy, prices will go up, and the
arbitrage goes away.

Let’s impose the idea that there can be no arbitrage on our model as a constraint.
We should not take this too literally: Everyone knows that arbitrage sometimes
persists for a while in reality, and high-frequency traders make a lot of money this
way. But it may be a decent enough approximation if we are doing relatively low-
frequency trading, e.g., the monthly trades that we have looked at in this lecture so
far.

We need a few results from finance theory to make this work. The approach loosely
follows Chen-Pelger-Zhu (CPZ), whose research paper is on the course page.

1. There is no arbitrage in a market if and only if there exists a “stochastic discount factor” denoted m (a.k.a. SDF, pricing kernel, or equivalent martingale measure). An SDF satisfies the following for all stocks i at all times t:
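(reconstructed from the description below, writing R^e_{i,t+1} for the excess return of stock i between t and t+1)

    E[ m_{t+1} R^e_{i,t+1} | Info_t ] = 0    for all stocks i and all times t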
This says that, once modulated with the SDF, the future excess return on every
stock is zero in expectation, at all times. Of course, the raw (unmodulated) expected
returns deviate from zero — figuring out which ones are high or low is the whole
point. However, these deviations are all summarised in one place, namely, in the
SDF.

The expectation above conditions on Info_t, which stands for all information
available to the market at time t.

2. Another way of reading this equation is to say: You cannot use any combination
of information at time t to predict the modulated excess returns

(because, conditional on any of this info, the predicted modulated return is always
zero!). This implies another useful equation:
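(reconstructed from the description below)

    E[ m_{t+1} R^e_{i,t+1} g(Info_t) ] = 0    for any function g of time-t information    (*)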

Now, the expectation is unconditional, and g(.) can be any arbitrary function of
information at date t. This is just another way of mathematically encoding the no-
arbitrage condition. If you are interested in the maths, you can try to derive this
version — the proof is only a few lines if you start with the previous equation, and
uses the law of iterated expectations.

CPZ constrain their neural network by imposing no arbitrage on the model. In fact,
they go further: Instead of predicting excess returns directly
(as we did above by predicting returns), they move the goalposts and try to estimate
the SDF. This is reasonable: Remember that only the SDF determines how expected
returns differ from zero. Therefore, once we know the SDF, we can find any
expected return we want (the details on how we back out expected returns from the
SDF are in their paper)

How to find the SDF? The idea is to get equation (*) as close to zero as possible.
The loss function is therefore changed to minimising the left-hand side of (*). In
particular, they set up a neural network whose input layer consists of micro and
macro variables x that are known at time t (just as in GKX), but whose output y is an
SDF (unlike in GKX, where y=returns).

In addition, they use a second neural network to discipline their predictions. Notice
in equation (*) that the function g can be anything: Intuitively, we can
condition on any function of information we want; it should still be impossible to
predict modulated returns. The second network now tries to find the function g(x)
that gives us the *worst* possible result, that is, the g(x) which drives the left-hand
side as far from zero as possible. Thus, we have two networks fighting with each
other: Network 1 tries to get the pricing equation right by picking m to minimise
pricing error, network 2 tries to break it by picking g to maximise errors. This
technique is called Generative Adversarial Networks (GAN). The intuition is that “what
doesn’t kill you makes you stronger”. Network 1 has to try harder, and price better, in
order to win against network 2.
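Schematically (a paraphrase of the objective rather than the exact formulation in CPZ), the two networks solve a min-max problem over the squared pricing errors:

    min over m   max over g   \sum_i ( E[ m_{t+1} R^e_{i,t+1} g(x_{i,t}) ] )^2

where network 1 parameterises the SDF m and network 2 parameterises the conditioning function g.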

PS: When you read CPZ, you’ll see that they cover even more ground by including a
recurrent neural net (RNN) in the architecture. The idea here is to encode the history
of many (approx. 170) macro indicators in their data into a smaller number of
“hidden macro states”. This is another way to constrain the model. We do not have
time to cover RNN in class, but there are some references below. Talk to me if you’d
like to use this in your project.

FURTHER READING
For an exhaustive resource on deep learning, written by some of the top researchers
in the field: deeplearningbook.org

Another great introduction to neural nets is the course CS231n at Stanford, which is
publicly available (google it). This also talks about RNN.

I encourage you to read the original paper on momentum crashes and GKX in as
much detail as possible

For the arbitrage pricing approach, the main resource is the paper by Chen-Pelger-
Zhu, which is on the course website. If you find the asset pricing theory in that paper
hard to follow, I recommend the textbook “Asset Pricing” by John Cochrane
(especially the first few chapters) as a refresher. Cochrane also has great lecture
notes online.
