Tactical Investment Algorithms
Marcos López de Prado
True Positive Technologies

Two major epistemological limitations prevent finance from becoming a science, on par with physics, chemistry, or biology. First, finance does not comply with Popper’s falsifiability criterion, because financial theories cannot be tested in a laboratory through controlled experiments. Claims such as “value and momentum factors explain the outperformance of stocks” cannot be proven wrong, even if they are. All researchers have is the outcome of a single realized path (a price time series) produced by an unknown data-generating process (DGP). We cannot draw millions of alternative paths from the same DGP and evaluate in how many instances value and momentum factors had explanatory power, while controlling for environmental conditions.
The second epistemological limitation afflicting finance is non-stationarity. Financial systems are
extremely dynamic and complex, with conditions that quickly change over time. Financial cause-
effect mechanisms are not invariant, due to changes in regulation, expectations, economic cycles,
market regimes and other environmental variables. For instance, even if value and momentum
factors truly explained the outperformance of stocks in the 20th century, that may no longer be the
case as a result of recent technological, behavioral or policy changes. Perhaps value and momentum
only worked under certain conditions that are no longer present. Consequently, claims made by
financial economists are typically based on anecdotal information, and do not rise to the standard
of scientific theories.
Due to these epistemological limitations, researchers rely on backtesting for developing investment
algorithms. A backtest infers the performance of an investment algorithm under the general
assumption that future observations will be drawn from the same DGP that produced past
observations. In this paper, I explain the different types of backtesting methods, and the specific
Quarter 1 • 2020 Tactical Investment Algorithms
assumptions underlying each method. I also argue that one particular type of backtesting method can help address finance’s epistemological limitations, and bring financial theories closer to scientific standards.

The Three Types of Backtests

In general terms, we can differentiate between three types of backtests. First, the walk-forward method (WF) assesses the performance of an investment algorithm under the assumption that history repeats itself exactly.¹ A first caveat of WF is that past time series merely reflect one possible path produced by the DGP. If we were to take a time machine, the stochastic nature of the DGP would produce a different path. Since WF backtests are not representative of the past DGP, there is no reason to believe that they are representative of the future DGP. Accordingly, WF is more likely to yield a descriptive (or anecdotal) statement than an inferential one (see López de Prado [2018], chapter 11). A second caveat of WF is that the DGP is never stated: should the DGP change, the researcher will not be able to decommission the algorithm before it loses money, because she never understood the conditions that made the algorithm work.

The second type of backtest is the resampling method (RS), which addresses WF’s first caveat. RS assesses the performance of an investment algorithm under the assumption that future paths can be simulated through the resampling of past observations. The resampling can be deterministic (e.g., jackknife, cross-validation) or random (e.g., subsampling, bootstrap). Because RS can produce many different paths, where the historical path is just one possibility, it allows us to consider more general scenarios consistent with the DGP. For instance, through an RS backtest we can bootstrap the distribution of the algorithm’s Sharpe ratio, which is much more informative than the single-path Sharpe ratio derived by WF. Whereas it is trivial to overfit a WF backtest, it is more difficult to overfit an RS backtest. Still, resampling from a finite historical sample may not yield paths representative of the future (see López de Prado [2018], chapter 12).

The third type of backtest, the Monte Carlo method (MC), addresses both of WF’s caveats. The MC method assesses the performance of an investment algorithm under the assumption that future paths can be simulated via Monte Carlo. MC requires a deeper knowledge of the DGP, derived from the statistical analysis of the observations or from theory (e.g., market microstructure, institutional processes, economic links, etc.). For instance, economic theory may suggest that two variables are cointegrated, and empirical studies may indicate the range of values that characterize the cointegration vector. Accordingly, researchers can simulate millions of years of data, where the cointegration vector takes many different values within the estimated range. This is a much richer analysis than merely resampling observations from a finite (and likely unrepresentative) set of observations (see López de Prado [2018], chapter 13).

A Practical Example of an MC Backtest

Consider a researcher who wishes to design a market-making algorithm. Market microstructure theory tells us that uninformed traders cause short-term mean reversion as a result of temporary market impact, and that informed traders cause permanent impact on market prices. Informed traders arrive at the market at a rate μ and uninformed traders arrive at a rate ε, where both rates can be modelled with a Poisson process. The statistical analysis of historical time series gives us a range of fluctuation for μ and ε, which can be used to simulate long series under various scenarios. For a given combination of μ and ε, MC allows us to derive the optimal market-making algorithm, that is, the set of profit-taking and stop-loss levels that maximize the Sharpe ratio in an MC backtest. In contrast, WF and RS would backtest the overall performance of the market-making algorithm over all historical values of μ and ε, without allowing us to estimate the performance at specific pairs of μ and ε, and without allowing us to derive the optimal market-making algorithm for each specific pair.

Exhibit 1 shows the performance of a trading algorithm under various profit-taking and stop-loss scenarios, where the underlying price follows an Ornstein-Uhlenbeck process with a half-life of 5, zero drift, and noise with unit variance (see López de Prado [2018], chapter 13). The half-life is so small that performance is maximized in a narrow range of combinations of small profit-taking with large stop-losses. In other words, the optimal trading rule is to hold the inventory long enough for a small profit to arise, even at the risk of experiencing some 5-fold or 7-fold unrealized losses. Sharpe ratios are high, reaching levels of around 3.2. The worst possible trading rule in this setting would be to combine a short stop-loss with a large profit-taking threshold, a situation that market makers avoid in practice. Performance is closest to neutral along the diagonal of the mesh, where profit-taking and stop-losses are symmetric.
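The Exhibit 1 experiment can be reproduced in miniature. The sketch below simulates Ornstein-Uhlenbeck price paths, using the half-life parametrization φ = 2^(−1/h), and estimates a per-trade Sharpe ratio for each (profit-taking, stop-loss) pair. The seed price of −0.5 below an equilibrium of 0, the path length, and the number of paths are illustrative assumptions, not the settings behind Exhibit 1, so the numbers will not match the exhibit.

```python
import numpy as np

def simulate_sharpe(profit_taking, stop_loss, half_life=5.0, p0=-0.5,
                    sigma=1.0, n_paths=2000, max_steps=50, seed=0):
    """Per-trade Sharpe ratio of an exit rule on an Ornstein-Uhlenbeck
    price that reverts from the seed price p0 toward an equilibrium of 0."""
    rng = np.random.default_rng(seed)
    phi = 2.0 ** (-1.0 / half_life)   # AR(1) coefficient implied by the half-life
    pnl = np.empty(n_paths)
    for i in range(n_paths):
        p = p0
        for _ in range(max_steps):
            p = phi * p + sigma * rng.standard_normal()
            gain = p - p0             # unrealized PnL of a long entered at p0
            if gain >= profit_taking or gain <= -stop_loss:
                break                 # exit at profit-taking or stop-loss
        pnl[i] = p - p0
    return pnl.mean() / pnl.std()

# Coarse mesh of exit rules: with mean reversion this fast, small
# profit-taking combined with a large stop-loss should dominate
for pt in (0.5, 1.0, 2.0):
    for sl in (0.5, 1.0, 2.0):
        print(f"profit-taking {pt:.1f}, stop-loss {sl:.1f}: "
              f"Sharpe {simulate_sharpe(pt, sl):.2f}")
```

Under these assumptions, the small-profit-taking/large-stop-loss corner of the mesh yields higher per-trade Sharpe ratios than the opposite corner, mirroring the pattern described for Exhibit 1.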
Exhibit 2 shows what happens when the half-life increases from 5 to 10. The areas of highest and lowest performance spread over the mesh, while the Sharpe ratios decrease to levels around or below 2. This is because, as the half-life increases, so does the magnitude of the autoregressive coefficient, bringing the process closer to a random walk. For a sufficiently long half-life, even the optimal combination of profit-taking and stop-loss levels yields an unacceptably low return on risk.

Third, MC backtests enable the incorporation of priors, which inject information beyond what we could have learned from a finite set of observations. When these priors are motivated by economic theory, MC offers a powerful tool to simulate the most likely scenarios, even if some of those scenarios have not been observed in the past. Unlike WF or RS, MC backtests can help us develop tactical algorithms to be deployed in the presence of black swans.
Fourth, the length of an MC backtest can be extended for as long as needed to achieve a targeted degree of confidence. This is helpful in that MC backtests avoid the indeterminacy inherent in working with finite datasets.
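The link between the half-life and the autoregressive coefficient invoked in the Exhibit 2 discussion can be made concrete. For an AR(1) process p_t = φ·p_{t−1} + ε_t, a deviation decays by half after h steps when φ^h = 1/2, that is, φ = 2^(−1/h). A minimal check (the specific half-life values are illustrative):

```python
def ar1_coefficient(half_life: float) -> float:
    """AR(1) coefficient phi whose impulse response halves
    after `half_life` steps, i.e. phi ** half_life == 0.5."""
    return 2.0 ** (-1.0 / half_life)

for h in (5, 10, 50):
    # phi approaches 1 as the half-life grows: the process
    # approaches a random walk
    print(f"half-life {h:>2}: phi = {ar1_coefficient(h):.4f}")
```

A half-life of 5 implies φ ≈ 0.87; doubling it to 10 pushes φ to ≈ 0.93, consistent with the weaker mean reversion and lower Sharpe ratios reported for Exhibit 2.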
One potential caveat of parametric Monte Carlo is that the DGP may be more complex than a finite set of algebraic functions can replicate. When that is the case, non-parametric Monte Carlo experiments may be of help, through the use of variational autoencoders, self-organizing maps, or generative adversarial networks (De Meer Pardo [2019]). These methods can be understood as non-parametric, non-linear estimators of latent variables (similar to a non-linear PCA). An autoencoder is a neural network that learns how to represent high-dimensional observations in a low-dimensional space. Variational autoencoders have an additional property that makes their latent spaces continuous. This allows for successful random sampling and interpolation and, in turn, their use as a generative model. Once a variational autoencoder has learned the fundamental structure of the data, it can generate new observations that resemble the statistical properties of the original sample, within a given dispersion (hence the notion of “variational”). A self-organizing map differs from autoencoders in that it applies competitive learning (rather than error-correction), and it uses a neighborhood function to preserve the topological properties of the input space. Generative adversarial networks train two competing neural networks, where one network (called a generator) is tasked with generating simulated observations from a distribution function, and the other network (called a discriminator) is tasked with predicting the probability that the simulated observations are false, given the true observed data. The two neural networks compete with each other until they converge to an equilibrium. The original sample on which the non-parametric Monte Carlo is trained must be representative enough to learn the general characteristics of the DGP; otherwise, a parametric Monte Carlo approach should be preferred. See López de Prado [2019] for additional details.

The Tactical Algorithmic Factory

The WF and RS backtesting methods attempt to find “all-weather” algorithms, that is, strategic investment algorithms that are not associated with a particular DGP and are deployed under all market conditions. The notion of strategic (all-weather) investment algorithms is inconsistent with the fact that markets go through regimes, during which some algorithms are expected to work and others are expected to fail. Given that markets are adaptive and investors learn from mistakes, the likelihood that truly all-weather algorithms exist is rather slim (an argument often wielded by discretionary portfolio managers). And even if all-weather algorithms existed, they are likely to be a rather insignificant subset of the population of algorithms that work across one or more regimes.

In contrast to WF and RS backtests, MC backtests help us define the precise sensitivity of an investment algorithm to the characteristics of each DGP. Once we understand what characteristics make the algorithm work, we can deploy it tactically, while monitoring the suitability of market conditions, and derive the appropriate ex-ante risk allocations. When used in this way, MC backtests allow us to trade the algorithms rather than the markets. Under this investment paradigm, a firm will develop as many tactical investment algorithms as possible (López de Prado [2018], chapter 1), and then deploy only those algorithms that are certified to work under the prevalent market conditions. These algorithms are DGP-specific, not instrument-specific: the same algorithm will be deployed tactically on different instruments over time, when those instruments temporarily follow the DGP associated with that algorithm. The main difference between the tactical algorithmic factory (TAF) approach and the strategic algorithmic factory (SAF) approach is that TAF’s objective is to develop DGP-specific algorithms, which are not required to work all the time. Instead, TAF’s algorithms only need to work during the DGP for which they have been certified.

DGP Identification

MC backtests allow researchers to pose the algorithm selection problem as a DGP identification problem. This is advantageous, because finding an algorithm that works well across all possible DGPs is much more challenging than estimating the current DGP (which in turn determines the algorithm that should be run at a given point in time). Also, from a mathematical perspective, identifying the optimal algorithm associated with a particular DGP is a well-defined problem.³

One practical way of identifying the prevailing DGP is as follows. First, through MC backtests, develop many tactical investment algorithms for a wide range of DGPs. Second, select a sample of recent market performance. Third, evaluate the probability that the sample of recent market performance was drawn from each of the studied DGPs. This probability can be estimated through different methods: the total variation distance, the Wasserstein distance, the Jensen-Shannon distance, some derivation of the Kullback-Leibler divergence, or the Kolmogorov-Smirnov test. The resulting probability distribution can then be used to allocate risk across the algorithms developed by the TAF. In other words, an ensemble of optimal strategies is deployed, not only the most likely optimal strategy.

In practice, it takes only a few recent observations for the estimated probability distribution to narrow down the likely DGPs. The reason is that we are comparing two samples, where the synthetic one comprises potentially millions of datapoints, and it typically does not take many observations to discard the DGPs that are inconsistent with recent observations.

Another possibility is to create a basket of securities with a returns distribution that matches the distribution of a given DGP. Under this alternative implementation, rather than estimating the probability that a security follows a DGP, we create a synthetic security (as a basket of securities) for which a given algorithm is optimal.

One virtue of running an ensemble of optimal algorithms is that the ensemble strategy does not correspond to any particular DGP. This allows the ensemble strategy to transition dynamically and smoothly from one DGP to another, and even to profit from a never-before-seen DGP.

Conclusion

In this paper I have argued that MC backtests offer financial researchers the possibility of conducting randomized controlled experiments. Absent financial laboratories, this is as close as finance can get to the Popperian criterion of falsifiability.
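The three-step DGP-identification recipe can be sketched in code. In this toy version, the candidate DGPs are hypothetical volatility regimes (the names and parameters are invented for illustration), the plausibility of each DGP is scored with the two-sample Kolmogorov-Smirnov test (just one of the distance measures listed above), and the p-values are normalized into risk weights; that normalization is a simple assumption on my part, not a specification from the paper.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Step 1: a library of candidate DGPs (hypothetical volatility regimes;
# in a TAF, each DGP would carry its own MC-certified algorithm)
dgps = {"low-vol": 0.005, "mid-vol": 0.010, "high-vol": 0.020}
synthetic = {name: sigma * rng.standard_normal(100_000)
             for name, sigma in dgps.items()}

# Step 2: a sample of recent market performance
# (here, secretly drawn from the high-vol DGP)
recent = 0.020 * rng.standard_normal(250)

# Step 3: score how plausible each DGP is for the recent sample
# (two-sample Kolmogorov-Smirnov p-values), then normalize the
# scores into risk-allocation weights across the certified algorithms
pvals = {name: ks_2samp(recent, sample).pvalue
         for name, sample in synthetic.items()}
total = sum(pvals.values())
weights = {name: p / total for name, p in pvals.items()}

for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:>8}: weight {w:.3f}")
```

Because the synthetic samples are large, a short recent sample suffices to drive the weights of inconsistent DGPs toward zero, which is the point made above about needing only a few recent observations.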
An MC backtest can be understood as a certification of the performance of an algorithm subject to certain declared environmental conditions, similar to how an engineer would certify the performance of a type of equipment. In contrast with the WF and RS methods, MC backtests inform us about the conditions under which the tactical investment algorithm should be deployed. This information also helps investors pinpoint the circumstances under which the algorithm is most vulnerable, when the algorithm should be decommissioned, and how much risk should be allocated to it.

Given that markets are adaptive and investors learn from mistakes, the likelihood that truly all-weather algorithms exist is rather slim (an argument often wielded by discretionary portfolio managers). And even if all-weather algorithms existed, they are likely to be a rather insignificant subset of the population of algorithms that work across one or more regimes. Accordingly, asset managers should embrace the TAF paradigm, developing as many tactical investment algorithms as possible through MC backtesting.

Endnotes

1. The main argument in favor of WF is that it prevents leakage from look-ahead information. However, if a walk-backwards backtest does not exhibit significantly better performance than a WF, look-ahead leakage is not a concern, making the main argument for WF rather weak.

2. In recent years, it has proven fashionable for some asset managers to promote certain investment factors through long WF backtests (in some cases, covering over a hundred years). Consider the validity of that work when, for instance, the current environment of negative interest rates has never been experienced before. In contrast, it is straightforward to conduct an MC backtest on data simulated by a DGP with negative interest rates.

3. Most journal articles promote investment algorithms without stating the DGP that those algorithms supposedly exploit. Without knowing the DGP, we cannot know the conditions under which the algorithm is supposed to be run, or when to decommission it.

References

De Meer Pardo, F. (2019): “Enriching Financial Datasets with Generative Adversarial Networks.” Working paper. Available at http://resolver.tudelft.nl/uuid:51d69925-fb7b-4e82-9ba6-f8295f96705c

Franco-Pedroso, J., J. Gonzalez-Rodriguez, J. Cubero, M. Planas, R. Cobo, and F. Pablos (2019): “Generating Virtual Scenarios of Multivariate Financial Data for Quantitative Trading Applications.” The Journal of Financial Data Science, 1(2), pp. 55-77. Available at https://doi.org/10.3905/jfds.2019.1.003

Hamilton, J. (1994): Time Series Analysis. First edition, Princeton University Press.

Jarvis, S., J. Sharpe, and A. Smith (2017): “Ersatz Model Tests.” British Actuarial Journal, 22(3), pp. 490-521.

López de Prado, M. (2018): Advances in Financial Machine Learning. First edition, Wiley. https://www.amazon.com/dp/1119482089

López de Prado, M. (2019): Systems and Methods for a Factory that Produces Tactical Investment Algorithms through Monte Carlo Backtesting. United States Patent and Trademark Office, Application No. 62/899,164.

López de Prado, M. (2020): Machine Learning for Asset Managers. First edition, Cambridge University Press. Forthcoming.

Author Bio

Marcos López de Prado, PhD
True Positive Technologies

Prof. Marcos López de Prado is the CIO of True Positive Technologies (TPT), and Professor of Practice at Cornell University’s School of Engineering. He has over 20 years of experience developing investment strategies with the help of machine learning algorithms and supercomputers. Marcos launched TPT after he sold some of his patents to AQR Capital Management, where he was a principal and AQR’s first head of machine learning. Marcos also founded and led Guggenheim Partners’ Quantitative Investment Strategies business, where he managed up to $13 billion in assets and delivered an audited risk-adjusted return (information ratio) of 2.3.

Concurrently with the management of investments, between 2011 and 2018 Marcos was a research fellow at Lawrence Berkeley National Laboratory (U.S. Department of Energy, Office of Science). He has published dozens of scientific articles on machine learning and supercomputing in leading academic journals, is a founding co-editor of The Journal of Financial Data Science, and SSRN ranks him as the most-read author in economics. Marcos is the author of several graduate textbooks, including Advances in Financial Machine Learning (Wiley, 2018) and Machine Learning for Asset Managers (Cambridge University Press, forthcoming).

Marcos earned a PhD in financial economics (2003) and a second PhD in mathematical finance (2011) from Universidad Complutense de Madrid, and is a recipient of Spain’s National Award for Academic Excellence (1999). He completed his postdoctoral research at Harvard University and Cornell University, where he is a faculty member. Marcos has an Erdős #2 according to the American Mathematical Society, and in 2019 he received the ‘Quant of the Year Award’ from The Journal of Portfolio Management.