MS&E 448 Final Presentation High Frequency Algorithmic Trading
Stanford University
June 6, 2017
Data Processing
Changed the data from minute-by-minute to second-by-second resolution
Changed from three-way classification to binary classification (no
longer using the spread-crossing label)
Train and test on a rolling-window basis: a two-week training period
prior to each test day (see the sketch below)
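A minimal sketch of this rolling-window setup, assuming the second-by-second data sits in a pandas DataFrame indexed by timestamp; the function name, the 10-trading-day window standing in for two weeks, and the DataFrame layout are illustrative assumptions rather than the project's actual code.

    import pandas as pd

    def rolling_windows(data: pd.DataFrame, train_days: int = 10):
        """Yield (train, test) pairs: roughly two weeks (10 trading days) of
        history before each test day."""
        days = data.index.normalize().unique().sort_values()
        for i in range(train_days, len(days)):
            train = data[(data.index >= days[i - train_days]) & (data.index < days[i])]
            test = data[data.index.normalize() == days[i]]
            yield train, test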
New Labels
AREA
Time-weighted PnL over the next period (the area under the price-movement
curve)
VWAP
Volume-weighted average price (VWAP) computed from the inner bid and ask;
the label records whether it moves up or down over the window (both labels
are sketched below).
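One possible reading of the two labels, sketched below for one-second bars; the column names, the 60-second horizon, and the use of traded volume to weight the inner mid-price are assumptions made here for illustration.

    import pandas as pd

    def area_label(mid: pd.Series, horizon: int = 60) -> pd.Series:
        """AREA: sign of the time-weighted price move over the next `horizon`
        seconds, i.e. the area under the price-movement curve (1 = up, 0 = down)."""
        moves = pd.concat([mid.shift(-k) - mid for k in range(1, horizon + 1)], axis=1)
        return (moves.sum(axis=1) > 0).astype(int)

    def vwap_label(bid: pd.Series, ask: pd.Series, volume: pd.Series,
                   horizon: int = 60) -> pd.Series:
        """VWAP: whether the volume-weighted average of the inner bid/ask
        mid-price over the next window is above that of the current window."""
        inner_mid = (bid + ask) / 2
        vwap = (inner_mid * volume).rolling(horizon).sum() / volume.rolling(horizon).sum()
        return (vwap.shift(-horizon) > vwap).astype(int)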
Logistic Regression
Outputs a probability (how confident we are) for each trade
Advantages over the random forest: it trains much faster and its
coefficients have a direct interpretation (see the sketch below)
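A self-contained sketch of this step using scikit-learn's LogisticRegression; the random placeholder features and the 0.6 trading threshold are stand-ins for the project's real inputs.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 5))            # placeholder order-book features
    y_train = rng.integers(0, 2, size=1000)         # placeholder binary up/down labels
    X_test = rng.normal(size=(200, 5))

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    prob_up = model.predict_proba(X_test)[:, 1]     # confidence that the label is "up"

    # Trade only when the model is confident enough in either direction.
    threshold = 0.6
    go_long = prob_up > threshold
    go_short = prob_up < 1 - threshold

    # The fitted coefficients are directly interpretable as each feature's
    # signed contribution to the log-odds of an upward move.
    print(model.coef_[0])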
Figure: Heat map of accuracy for different decay and window length parameters
(Left) XLE (Right) XLF
Figure: Prediction accuracy vs prediction threshold for the random forest model.
Figure: PnL per Trade vs prediction threshold for each algorithm and label
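Curves like those in the figures could be produced with a threshold sweep along the following lines; the inputs (predicted up-probabilities, realized binary outcomes, per-interval price changes) and the threshold grid are assumptions for illustration.

    import numpy as np

    def sweep(prob_up, y_true, price_change,
              thresholds=np.arange(0.5, 0.95, 0.05)):
        """Return (threshold, accuracy, PnL per trade) for each threshold."""
        results = []
        for t in thresholds:
            longs = prob_up > t
            shorts = prob_up < 1 - t
            traded = longs | shorts
            if not traded.any():
                continue
            pred_up = longs[traded].astype(int)                   # 1 = predicted up
            accuracy = (pred_up == y_true[traded]).mean()
            side = np.where(longs, 1.0, np.where(shorts, -1.0, 0.0))
            pnl_per_trade = (side * price_change)[traded].mean()  # realized move per trade
            results.append((t, accuracy, pnl_per_trade))
        return results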
Strengths:
High accuracy rates: the model is doing a good job
High PnL per trade with small variance, especially when training on a
longer period of time
The model can be generalized to multiple stocks/ETFs
Performs well even in tumultuous historical periods and in
hypothetical scenarios
Limitations:
Have to tune hyperparameters for each stock
High prediction accuracy does not always mean profit: label isn’t
exactly a prediction of PnL
Limited interpretability of the model
The End
Questions?