MS&E 448 Final Presentation

High Frequency Algorithmic Trading

Francis Choi, George Preudhomme, Nopphon Siranart, Roger Song, Daniel Wright

Stanford University

June 6, 2017

High-Frequency Trading MS&E448 June 6, 2017 1 / 29


Overview

Review of our strategy and progress since the midterm
Changes in Data Processing
Changes to Models
Strategy and Simulations
Results
Evaluation and Next Steps


Recall from the Midterm

Goal: Next-minute price movement prediction based on order book dynamics
Data: Minute-by-minute consolidated order book for the S&P 500 ETF (IVV)
Model: Random Forest three-way classifier
Labels: Mid-price changes and spread crossing
Trading Strategy: Accumulate positions and close them out at the end of the day
Results: Not yet generating a profit


After the Midterm

Data Processing
Changed the data frequency from minute-by-minute to second-by-second
Changed from three-way classification to binary classification (the spread-crossing label is no longer used)
Train and test on a rolling-window basis: a 2-week training period prior to each test day
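A minimal sketch of the rolling-window split, assuming the model is retrained before each test day on the preceding 10 trading days (about 2 weeks). The helper names `trading_days` and `rolling_splits` are hypothetical, and exchange holidays are not removed.

```python
from datetime import date, timedelta


def trading_days(start: date, end: date) -> list:
    """Weekdays from start to end inclusive (exchange holidays are NOT removed)."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(d)
        d += timedelta(days=1)
    return days


def rolling_splits(days: list, train_len: int = 10):
    """Yield (train_days, test_day) pairs: each test day is preceded by
    the previous `train_len` trading days (10 weekdays is about 2 weeks)."""
    for i in range(train_len, len(days)):
        yield days[i - train_len:i], days[i]


# Example: the first split trains on 01/05-01/16/2015 and tests on 01/19/2015.
days = trading_days(date(2015, 1, 5), date(2015, 1, 30))
train, test = next(rolling_splits(days))
```

Each later split slides the 10-day window forward by one day, so every test day is predicted by a model that has only seen earlier data.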


Data (Example)


After the Midterm

New Labels
AREA
Time-weighted PnL over the next period (the area under the price movement curve)

VWAP
Whether the volume-weighted average price (VWAP), based on the inner bid and ask, goes up or down over the window.
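The two labels can be sketched as follows, assuming per-second arrays of mid-prices and of inner (best) bid/ask prices and sizes. The function names and the `horizon` parameter (window length in seconds) are hypothetical. The AREA label sums the mid-price move, relative to the current mid, over the next window; ties are assigned to -1, an arbitrary choice for a binary label.

```python
import numpy as np


def area_label(mid: np.ndarray, horizon: int) -> np.ndarray:
    """+1 if the area under the future mid-price path (relative to the
    current mid) over the next `horizon` seconds is positive, else -1.
    The last `horizon` entries lack a full window and are labelled 0."""
    n = len(mid)
    labels = np.zeros(n, dtype=int)
    for t in range(n - horizon):
        future = mid[t + 1 : t + 1 + horizon]
        area = np.sum(future - mid[t])  # time-weighted PnL of holding 1 share long
        labels[t] = 1 if area > 0 else -1
    return labels


def inner_vwap(bid, ask, bid_sz, ask_sz):
    """Volume-weighted average of the inner bid and ask prices."""
    return (bid * bid_sz + ask * ask_sz) / (bid_sz + ask_sz)


def vwap_label(vwap: np.ndarray, horizon: int) -> np.ndarray:
    """+1 if the VWAP is higher `horizon` seconds later, else -1 (0 at the end)."""
    labels = np.zeros(len(vwap), dtype=int)
    change = vwap[horizon:] - vwap[:-horizon]
    labels[:-horizon] = np.where(change > 0, 1, -1)
    return labels
```

The AREA label rewards moves that persist through the window, while the VWAP label only compares the start and end of the window.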


After the Midterm

Adding New Features

Bid-Ask Volume Imbalance: the number of shares at the bid minus the number of shares at the ask in the current order book.
VWAP: a variation on the mid-price in which the bid and ask prices are averaged with weights proportional to their inverse volumes.
Second-Order Derivatives: expand the feature universe to encompass multiple time periods.
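A sketch of the three feature types, assuming per-second arrays of inner bid/ask prices and sizes. We read "second-order derivatives" as discrete second differences, x[t] - 2x[t-k] + x[t-2k], taken at several lags k; the slide does not specify this, so the lags and function names are hypothetical.

```python
import numpy as np


def volume_imbalance(bid_sz, ask_sz):
    """Shares at the inner bid minus shares at the inner ask."""
    return bid_sz - ask_sz


def inverse_volume_weighted_mid(bid, ask, bid_sz, ask_sz):
    """Average of bid and ask weighted by 1/bid_sz and 1/ask_sz.
    (bid/bid_sz + ask/ask_sz) / (1/bid_sz + 1/ask_sz) simplifies to the
    closed form below, so a heavier bid pulls the price towards the ask."""
    return (bid * ask_sz + ask * bid_sz) / (bid_sz + ask_sz)


def second_differences(x, lags=(1, 5, 10)):
    """Map each lag k to the series x[t] - 2*x[t-k] + x[t-2k];
    NaN where there is not enough history."""
    x = np.asarray(x, dtype=float)
    out = {}
    for k in lags:
        d2 = np.full(len(x), np.nan)
        d2[2 * k:] = x[2 * k:] - 2 * x[k:-k] + x[:-2 * k]
        out[k] = d2
    return out
```

Using several lags is one way to let the same feature describe the book over multiple time periods.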


Model

Logistic Regression
Outputs a probability (how confident we are) for each trade
Advantages over random forest: it trains much faster, and its coefficients are interpretable
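To illustrate the probability output and the coefficient interpretation, here is a minimal NumPy logistic regression fitted by batch gradient descent. It is a stand-in for whatever library implementation the project used; the learning rate, iteration count and L2 penalty are arbitrary, and features are assumed to be standardized.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit P(up | x) = sigmoid(x @ w + b) by gradient descent on the
    L2-regularised mean log loss. y holds 0 (down) / 1 (up)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    """Probability of an up move for each row of X."""
    return sigmoid(X @ w + b)


# Toy data: only the first feature carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(float)
w, b = fit_logistic(X, y)
p = predict_proba(X, w, b)
```

Each coefficient w[j] is the change in log-odds of an up move per unit of feature j, so exp(w[j]) is the corresponding odds multiplier; here the weight on the informative first feature dominates.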


Model
Random Forest
Again, outputs a probability (how confident we are) for each trade
Key advantages over logistic regression: it doesn't assume any functional form, and its accuracy is slightly higher


Strategy

Train the model on a rolling backward-looking window.
At each second, use the model to make a prediction with a probability estimate.
If the probability estimate is above the threshold, make the predicted trade, with its size weighted by that probability.
Close out the trade at the end of the trading window.
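The trading rule can be sketched as a simple mid-price backtest. Assumptions not stated on the slide: a trade is made whenever max(p, 1 - p) reaches the threshold, its size grows linearly from 0 at 50% confidence to `max_size` at 100%, every trade is closed out at the mid-price `horizon` seconds later, and overlapping positions are allowed. All names are hypothetical.

```python
import numpy as np


def simulate(prob_up, mid, threshold=0.6, horizon=10, max_size=100):
    """Per-trade PnL of a mid-price backtest: enter at mid[t], exit at
    mid[t + horizon]. prob_up[t] is the model's probability of an up move."""
    pnls = []
    for t in range(len(mid) - horizon):
        p = prob_up[t]
        confidence = max(p, 1 - p)
        if confidence < threshold:
            continue  # not confident enough: no trade this second
        side = 1 if p > 0.5 else -1  # +1 buy, -1 sell short
        size = max_size * (2 * confidence - 1)  # 0 at 50%, max_size at 100%
        pnls.append(side * size * (mid[t + horizon] - mid[t]))
    return np.array(pnls)


mid = np.array([100.0, 101.0, 102.0, 103.0])
prob_up = np.array([0.9, 0.55, 0.1, 0.5])
trade_pnl = simulate(prob_up, mid, threshold=0.6, horizon=1)  # second trade is a losing short
pnl_per_trade = trade_pnl.mean()
```

Sweeping `threshold` and recording `trade_pnl.mean()` at each value gives a PnL-per-trade vs threshold curve.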


Thesys Simulator
Here is what we think it looks like


Thesys Simulator
Here is what it actually looks like


Thesys Simulator

Very frustrating and very slow
We decided to just pull the data from Thesys and run the simulations ourselves.


Results

We chose 10 stocks and ETFs to test our trading strategies, selected based on liquidity
These are XLF, CSCO, EEM, IVV, IWM, QQQ, UVXY, VXX, XLE and SPY
Training period: 2 weeks, 01/05/2015 to 01/16/2015
Test period: 2 weeks, 01/19/2015 to 01/30/2015
We use PnL per trade as the performance metric


Tuning Parameters

Figure: Heat map of accuracy for different decay and window-length parameters (left: XLE, right: XLF)
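Heat maps like these come from evaluating the model over a grid of (window length, decay) pairs. The slides do not define how `decay` enters the model, so this sketch leaves that to a caller-supplied `score_fn(window, decay)` (hypothetical) that should train with those settings and return validation accuracy; a toy function with an arbitrary peak stands in here, not real results.

```python
import numpy as np


def grid_search(score_fn, windows, decays):
    """Evaluate score_fn on every (window, decay) pair.
    Returns the score grid (rows = windows, columns = decays), which can be
    drawn as a heat map, and the best-scoring pair."""
    grid = np.empty((len(windows), len(decays)))
    for i, w in enumerate(windows):
        for j, d in enumerate(decays):
            grid[i, j] = score_fn(w, d)
    i, j = np.unravel_index(np.argmax(grid), grid.shape)
    return grid, (windows[i], decays[j])


def toy_score(window, decay):
    """Illustrative score peaking at window = 10, decay = 0.7."""
    return -((window - 10) / 10) ** 2 - (decay - 0.7) ** 2


grid, best = grid_search(toy_score, [5, 10, 15, 20], [0.6, 0.7, 0.8, 0.9])
```

Because the best pair can differ from stock to stock, the search has to be repeated per ticker.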


Accuracy of Model: Logistic Regression

Figure: Prediction accuracy vs prediction threshold for the logistic regression model.


Accuracy of Model: Random Forest

Figure: Prediction accuracy vs prediction threshold for the random forest model.


Accuracy of Model: Difference
Overall, Random Forest has slightly better accuracy across threshold
values.

Figure: Difference in prediction accuracy (RF minus LR) vs prediction threshold.
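Accuracy-versus-threshold curves can be computed from predicted probabilities as sketched below, assuming the threshold is applied to the model's confidence max(p, 1 - p) and accuracy is measured only on the samples that pass it. Names are hypothetical; an RF minus LR curve is the difference of two such accuracy arrays computed on the same test set.

```python
import numpy as np


def accuracy_vs_threshold(prob_up, y_up, thresholds):
    """Accuracy and coverage (fraction of samples traded) for each threshold,
    counting only samples with confidence max(p, 1 - p) >= threshold."""
    prob_up = np.asarray(prob_up, dtype=float)
    y_up = np.asarray(y_up, dtype=bool)
    pred_up = prob_up > 0.5
    confidence = np.maximum(prob_up, 1 - prob_up)
    acc, cov = [], []
    for th in thresholds:
        mask = confidence >= th
        cov.append(mask.mean())
        acc.append((pred_up[mask] == y_up[mask]).mean() if mask.any() else np.nan)
    return np.array(acc), np.array(cov)


acc, cov = accuracy_vs_threshold([0.9, 0.8, 0.6, 0.55, 0.3], [1, 0, 1, 0, 0],
                                 [0.5, 0.75, 0.85, 0.95])
```

Reporting coverage alongside accuracy matters because a high threshold can raise accuracy while leaving very few trades.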


Cumulative PnL (XLF)
PnL increases steadily throughout the day, implying a high Sharpe ratio

Figure: Cumulative PnL within a day
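The Sharpe ratio claim can be checked directly from the PnL series. A minimal sketch, assuming a zero risk-free rate, a constant capital base (dividing PnL by a fixed capital amount leaves mean/std unchanged, so PnL can be used instead of returns) and annualization by the square root of the number of periods per year (252 for daily PnL). Names are hypothetical.

```python
import numpy as np


def cumulative_pnl(trade_pnl):
    """Running total of per-trade PnL, as plotted within a day."""
    return np.cumsum(trade_pnl)


def sharpe_ratio(period_pnl, periods_per_year=252):
    """Annualised Sharpe ratio of per-period PnL with a zero risk-free rate."""
    x = np.asarray(period_pnl, dtype=float)
    return np.sqrt(periods_per_year) * x.mean() / x.std(ddof=1)


curve = cumulative_pnl([1.0, -1.0, 2.0])
sr = sharpe_ratio([1.0, 2.0, 3.0], periods_per_year=1)
```

A smooth, steadily rising cumulative curve corresponds to a high mean relative to the standard deviation of the period PnL.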


Trading PnL (XLF)
Logistic Regression with the VWAP label performs best in this case

Figure: PnL per Trade vs prediction threshold for each algorithm and label


Trading PnL (XLF)
Tuning hyperparameters improves the model significantly

Figure: PnL per Trade vs prediction threshold for different hyperparameters


Trading PnL (MSFT)
Random Forest with the AREA label performs best for MSFT

Figure: PnL per Trade vs prediction threshold for each algorithm and label


Trading PnL (MSFT)
Non-optimal combinations of hyperparameters, models and labels perform poorly.

Figure: PnL per Trade vs prediction threshold for different hyperparameters


Multiple Stocks
Random Forest with AREA labels. Window = 15, decay = 0.8

Figure: PnL per Trade vs prediction threshold for different stocks


Multiple Stocks
Logistic Regression with AREA labels. Window = 15, decay = 0.8

Figure: PnL per Trade vs prediction threshold for different stocks


Evaluating Our Strategy

Strengths:
High accuracy rates: the model predicts the labels well
High PnL per trade with small variance, especially when training on a longer period of time
The model can be generalized to multiple stocks/ETFs
Performs well even in tumultuous historical periods and in hypothetical scenarios
Limitations:
Hyperparameters have to be tuned for each stock
High prediction accuracy does not always mean profit: the label isn't exactly a prediction of PnL
Interpretability of the model


Future Work and Areas for Improvement

Within 10 weeks, we can’t make the perfect trading strategy: there is


still a lot we could improve.
Some ideas for further work:
Training on a longer period of time
More sophisticated features: right now we only use the order book
data, could try including external features (such as an index like the
VIX, or data on correlated securities, etc.)
Converting to a strategy that trades at bid and ask (rather than
midprice)
Modifying strategy to handle scaled-up trade quantities
Risk Management
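Moving from mid-price to bid/ask execution can be illustrated with a round-trip calculation. This sketch assumes marketable orders that buy at the ask and sell at the bid, with no fees or queue effects; the function name and quotes are hypothetical.

```python
def round_trip_pnl(side, size, entry_bid, entry_ask, exit_bid, exit_ask, at_mid=True):
    """PnL of opening at the entry quotes and closing at the exit quotes.
    side = +1 for a long position, -1 for a short position."""
    if at_mid:
        entry = (entry_bid + entry_ask) / 2
        exit_price = (exit_bid + exit_ask) / 2
    elif side > 0:
        entry, exit_price = entry_ask, exit_bid  # buy at the ask, sell at the bid
    else:
        entry, exit_price = entry_bid, exit_ask  # sell at the bid, buy back at the ask
    return side * size * (exit_price - entry)


# Quotes move up by 2 cents: profitable at the mid, break-even when crossing the spread.
mid_pnl = round_trip_pnl(1, 100, 10.00, 10.02, 10.02, 10.04, at_mid=True)
cross_pnl = round_trip_pnl(1, 100, 10.00, 10.02, 10.02, 10.04, at_mid=False)
```

Each round trip that crosses the spread gives up one full spread relative to the mid-price assumption, which is why small per-trade edges can vanish.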


Conclusion

Idea: use machine learning techniques on the order book to make price movement predictions. Trade on these predictions to make $$$
Models: Random forest, logistic regression
Data: Second-by-second order book data from Thesys
Calibrated the trading frequency, prediction labels and model hyperparameters
Performed simulations on historical data
Promising results that can be built upon


Conclusion

The End
Questions?
