004 - Best Practices in Research For Quantitative Equity Strategies
The Journal of Portfolio Management, Special QES Issue 2016, 42 (5) 135-143; DOI: https://doi.org/10.3905/jpm.2016.42.5.135
Abstract
The authors examine the research process and principles underlying successful models used in
quantitative equity strategies. They identify three key factors they see contributing to improved empirical
work: 1) making research design a top priority, 2) making new, more extensive datasets available, and 3)
making advances in computational areas such as econometrics, machine learning, and statistics. The authors
explain these key factors and also share insights on how to integrate market dynamics, data, research design,
advanced modeling techniques, and economic/financial introspection into the research process.
In this article, we examine the research process and principles underlying successful models used in
quantitative equity strategies. The research process is at the heart of the development of successful
quantitative strategies. Key factors for their success are the availability of more and better data, advances
in computational and econometric methods, and better understanding of how to enhance judgment in
the research process. Our discussion does not provide rules to follow, but rather tenets that emerged from our experience.
By identifying and examining the characteristics of quantitative strategies, we attempt to highlight some
best practices in quantitative modeling and aspire to outline a broader paradigm for building successful
models. These characteristics may not be strictly statistical or mathematical in nature, but rather they emphasize
the integration of market dynamics, data, research design, modeling techniques, and economic and
financial judgment.
In Superforecasting, Tetlock and Gardner [2015] state that "foresight isn't a mysterious gift bestowed at
birth. It is the product of particular ways of thinking, of gathering information, of updating beliefs." In this
article, we share some insights on accomplishing this objective by providing a framework and some thoughts
on building quantitative forecasting models. Our discussion centers on developing quantitative models
regardless of asset class, but our examples draw heavily from equities. Although we focus on
quantitative research methodologies, we think that some of these ideas are valuable for a fundamental
research process.
WHAT ARE QUANTITATIVE MODELS?
In this article, we refer to quantitative modeling in a broader sense. A quantitative strategy is a systematic, data-
and model-based approach to making investment decisions. We can further qualify quantitative strategies by
their underlying core characteristics. By examining these core characteristics, we can attempt to identify
some best practices in quantitative modeling to develop a paradigm that will lead to building successful
models.
The most important characteristic of the quantitative modeling approach is the scientific approach. This
approach provides a paradigm that guides and informs empirical work. Similar to other fields that take a
scientific approach—including natural sciences, medicine, and social sciences—this approach in quantitative
modeling attempts to describe, inquire, and interpret with precision. The characteristics of a scientific approach include the following:
• Use of empirical work to attempt to put precision around investment decisions and economic
reasoning
• Use of sensitivity analysis to challenge assumptions and context in which the strategy was developed
Dating back to the 17th century, the scientific method is an approach for examining and understanding
phenomena, developing new theories, or modifying or integrating existing theories based on the
presentation of empirical and measurable evidence subject to specific principles of reasoning. Research based
on the scientific method typically takes steps to 1) define a question, 2) collect information and resources, 3)
form an explanatory hypothesis, 4) test the hypothesis by performing experiments and collecting data in a
reproducible manner, 5) analyze the data, and 6) interpret the data and draw conclusions. After drawing
these conclusions, the researcher may then go back to reformulate the explanatory hypothesis and repeat the process.
Within the context of investment management, empirical analysis uses data and tools to design, research, and
evaluate hypotheses/models. The primary function of empirical research is to create (some) evidence for
trading models. A large part of empirical analysis is research design. A well-thought-out research design
provides support and credibility for validating the investment insights underlying trading models.
Exhibit 1: Types of Quantitative Strategies
• Multifactor strategies: Models that invest in equities based on multiple characteristics that replicate how
investors make decisions. These strategies have an edge in their superior processing of information and
identification of differentiated insights.
• Allocation decisions: Strategies that make decisions based on allocations to different countries, sectors,
factor timing, regime switching, etc.
• Stock-specific strategies: Strategies that focus on the information specific to an individual equity security.
• Factor strategies: Rule-based strategies that trade well-known equity risk premiums to earn returns.
• Event studies: Strategies that generate alpha from specific events.
• Market microstructure strategies: Strategies that exploit profitable opportunities arising from the trading
flows and dynamics of equity markets.
• Statistical arbitrage: Strategies that exploit systematic relationships among equity securities with
similar characteristics. In contrast to pure riskless arbitrage, statistical arbitrage is a risky form of
strategy.
• Textual strategies: Quantitative strategies that trade based on qualitative textual signals, such as news
reports, company documents, or Internet searches.
• Thematic/macro strategies: Strategies that trade baskets of equity securities based on broad themes in the
economic environment, such as technology, demographics, or other fields.
Exhibit 1 lists different types of quantitative strategies. The typical steps in developing these strategies are as
follows: 1) formulate trading ideas and strategies, 2) develop signals, 3) acquire and process data, 4) analyze
the signals, 5) build the strategy, 6) evaluate the strategy, 7) backtest the strategy, and 8) implement the strategy.
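To make these steps concrete, the following is a toy, end-to-end sketch in Python using simulated returns and a simple momentum signal. The data, the signal definition, and the long-short construction are illustrative assumptions only, not a recommended strategy.

import numpy as np

rng = np.random.default_rng(0)
n_stocks, n_months = 100, 120
# 3) acquire and process data (here: simulated monthly returns)
returns = rng.normal(0.01, 0.06, size=(n_months, n_stocks))

# 2) develop a signal (here: trailing 12-month momentum, an illustrative choice)
signal = np.zeros_like(returns)
for t in range(12, n_months):
    signal[t] = returns[t - 12:t].sum(axis=0)

# 5) build the strategy: long the top 10 names, short the bottom 10
strategy_returns = []
for t in range(12, n_months - 1):
    ranks = signal[t].argsort()
    long, short = ranks[-10:], ranks[:10]
    strategy_returns.append(returns[t + 1, long].mean() - returns[t + 1, short].mean())

# 6-7) evaluate and backtest: annualized Sharpe ratio of the simulated strategy
strategy_returns = np.array(strategy_returns)
sharpe = strategy_returns.mean() / strategy_returns.std() * np.sqrt(12)
print(f"Backtest annualized Sharpe: {sharpe:.2f}")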
A successful quantitative strategy often starts as an idea based on economic intuition, a market insight, or an
anomaly. Background research can be helpful for understanding what others have tried or implemented in
the past.
To distinguish between a trading idea and a quantitative strategy, we look at the economic motivation for
each. A trading idea has a shorter-term horizon, often associated with a specific event or mispricing. A
quantitative strategy has a longer time span and exploits opportunities to process information better,
receive premiums associated with anomalies, or identify mispricings.
• Developing Signals
After having established the idea of the strategy, we move from the economic concepts to the construction of
signals that may be able to capture our intuition. Signals provide building blocks for the model used to
create an investment strategy.
Built from data, signals are quantitative measures that represent an investment idea. How signals are built
varies depending on the investment thesis and the data representing the thesis. For example, a
quantitative signal could be based on a stock's underlying characteristics such as its return on equity
or valuation ratio. A sentiment signal could be developed from unstructured text from various company-related documents.
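As a minimal sketch of signal construction, the snippet below builds a value signal by winsorizing and z-scoring a book-to-price ratio across a hypothetical cross-section. The tickers and numbers are invented for illustration.

import pandas as pd

# Hypothetical cross-section of fundamentals for one date.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "book_value": [50.0, 120.0, 30.0, 80.0],
    "market_cap": [200.0, 300.0, 250.0, 100.0],
})

# A simple value signal: book-to-price, winsorized and z-scored cross-sectionally.
df["b2p"] = df["book_value"] / df["market_cap"]
clipped = df["b2p"].clip(df["b2p"].quantile(0.01), df["b2p"].quantile(0.99))
df["value_signal"] = (clipped - clipped.mean()) / clipped.std()
print(df[["ticker", "value_signal"]])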
Data are critical to a strategy's success. A strategy relies on accurate and clean data to build signals. Data need
to be carefully stored in an infrastructure that is scalable and flexible. Upon acquiring new data sources, researchers should evaluate their quality and coverage before integrating them into this infrastructure.
• Analyzing the Signals
Researchers perform a variety of statistical tests and econometric techniques on the data to evaluate the
empirical properties of signals. This empirical research is used to understand the risk-and-return potential
of a signal. For example, a researcher might be interested in statistically testing whether a signal's Sharpe
ratio is larger than 1. This analysis may form the basis for building a more complete trading strategy.
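As a sketch of such a test, the snippet below computes an annualized Sharpe ratio from simulated monthly returns and tests whether it exceeds 1, using a standard iid asymptotic approximation to the Sharpe ratio's standard error (in the spirit of Lo's Sharpe-ratio statistics). The return series and the iid assumption are illustrative simplifications.

import numpy as np
from scipy import stats

# Simulated monthly signal returns; in practice these come from the signal's backtest.
rng = np.random.default_rng(1)
r = rng.normal(0.012, 0.04, size=240)  # 20 years of monthly returns

T = len(r)
sr_monthly = r.mean() / r.std(ddof=1)
sr_annual = sr_monthly * np.sqrt(12)

# Asymptotic standard error under an iid approximation, scaled to annual units.
se_monthly = np.sqrt((1 + 0.5 * sr_monthly**2) / T)
se_annual = np.sqrt(12) * se_monthly

z = (sr_annual - 1.0) / se_annual            # H0: annualized Sharpe <= 1
p_value = 1 - stats.norm.cdf(z)              # one-sided p-value
print(f"Sharpe={sr_annual:.2f}, z={z:.2f}, one-sided p={p_value:.3f}")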
• Building the Strategy
A model represents a mathematical or systematic specification of the trading strategy. There are two
important considerations in this specification: 1) the selection of the specific signals and 2) how these
signals are combined. Both considerations may be motivated by the economic intuition driving the
trading strategy.
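A minimal sketch of the second consideration: standardized signals aggregated into a composite score with fixed weights. The signal values and weights below are illustrative assumptions, not recommendations.

import pandas as pd

# Hypothetical standardized (z-scored) signals for a small universe.
signals = pd.DataFrame({
    "value":     [ 0.8, -1.2,  0.3,  0.1],
    "momentum":  [-0.5,  1.4,  0.2, -1.1],
    "sentiment": [ 0.1,  0.6, -0.9,  0.2],
}, index=["AAA", "BBB", "CCC", "DDD"])

weights = {"value": 0.5, "momentum": 0.3, "sentiment": 0.2}  # illustrative priors
alpha_score = sum(w * signals[name] for name, w in weights.items())
print(alpha_score.sort_values(ascending=False))  # ranking used to build positions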
• Evaluating the Strategy
The final steps involve assessing the estimation, specification, and forecast quality of the model. This analysis
includes examining the goodness of fit (often done in sample) and forecasting ability (often done out of sample).
Empirical validation and testing are key drivers in the development of quantitative trading strategies. They
bridge the gap between stylized financial models and the real world represented by the markets.
Financial models are often crude approximations of reality—with regimes in which they work acceptably, and
regimes in which they do not work at all or work very poorly at best. Careful systematic empirical research can
help identify these regimes. The researcher's judgment and experience become a critical factor in this step.
The well-known statistician George Box stated, "All models are wrong; some models are useful" (Box [1976]).
Models simplify the world around us through idealization. Naturally, this idealization describes the most
salient features of markets, and it is important to note that not every market dynamic is included in a model.
The construction of idealized representations of the financial markets is a vital part of academic and
practitioner research.
Although models are quantitative in nature, the research process is subject to data and design decisions that
are more qualitative in nature. Judgment calls include deciding how to cleanse the data, how to select a
specific model, how to aggregate signals, and which risk measures to rely on. Researchers make these
decisions, of course, based on their experience and preferences. Besides the research process itself,
judgment is also prevalent in the feedback mechanism of backtesting and running the strategy.
Generally, most quantitative models are based on two approaches of thinking—hypothesis based
(deductive) and pattern based (inductive). Each approach requires a different model-building research
process. For the hypothesis-based approach, the starting point is some insight about why a trading
opportunity exists. It is dependent on an economic thesis or hypothesis on how the market works or
why the opportunity exists. Frequently, the "story" precedes the empirical work.
The second approach is inductive or pattern based. This approach is exploratory in nature, and the discovery
of insights emerges from the empirical work. A key feature is that learning occurs throughout the process. In
this approach, it is critical to be able to distinguish between correlation and causation. Are measured
statistical correlations spurious or causal? Understanding underlying economic mechanisms and theory helps answer this question.
Best practices involve understanding how to make better decisions in the research design process. It is
useful to draw on sciences from other disciplines that study decision making, often in experimental settings.
A good question to ask is, "How do we make better decisions in the development of quantitative
strategies?" We compiled the following list of attributes from the research of various experts in the areas
of decision sciences, including Leamer [1978, 1983], Tetlock and Gardner [2015], and Tversky and Kahneman [1974]:
• Understand the assumptions underlying the research methodology decisions and why those decisions were made.
• Tweak the research question being asked and try answering this revised question as a way of challenging the original conclusions.
• Break down the investment thesis into its underlying assumptions and scrutinize each
assumption.
• Distinguish what you know about the investment thesis from what you do not know or cannot know.
The benefits of using quantitative models extend beyond pure quantitative trading. These models provide value to fundamental investors as well. It is useful to differentiate between how purely quantitative investors use model forecasts and how fundamental investors
use them. For quantitative investors, model forecasts produce an expected return forecast on a security or
a set of securities. For fundamental investors, model forecasts create new insights to synthesize
with other qualitative information (e.g., management meetings and industry strategy) being acquired to
make investment decisions. Working with fundamental investors, we apply quantitative models to
understand complex relationships, to verify investment theses, and to discover new opportunities.
Collaboration of quantitative researchers with fundamental investors is a social experience that can create an
"investment edge." The process of building a quantitative model jointly produces unique investment insights.
Numerous studies and anecdotes provide evidence that combining computer-based forecasting and
human judgment results in better outcomes. We can draw on the literature from other disciplines to assist in
providing insight into how to better integrate the two. For example, in "freestyle chess," a chess tournament in
which players are open to consult any resource available to assist them, the winners of tournaments are
humans paired with machines, which beat machines alone and human experts alone (Cowen
[2013]). The key to the winners' success is being able to synthesize information from multiple sources while applying judgment about when to rely on each.
Quantitative strategies exist across different markets. The characteristics underlying these strategies vary
substantially. We categorize strategies along a number of dimensions, such as the asset class, type of
securities, horizon, trading style, and investment philosophy. Each of these categories influences the
quantitative modeling process, often starting with the research design, data, modeling techniques, and
evaluation methods.
Commonality in these traits allows us to classify strategies into groups. In Exhibit 1, we attempt to create a
simple taxonomy of quantitative strategies. There is some overlap in this classification because strategies
share common traits. We also considered how investors implement the strategies.
Quantitative investment strategies differ in their motivation to trade, the frequency of trading, information used
to trade, and the markets traded. The strategies employ different holding periods and trading frequencies—
the latter of which can occur in milliseconds, or extend to months or years. Separately, the holding period
of each trade varies along similar horizons. Both trading frequencies and holding periods are functions
of the investment theory underlying the strategy and the empirical results uncovered in research.
In addition to insights about the market, quantitative computational methods are critical for success. Many of
the traditional financial econometric techniques continue to be widely used. Their success results from their
tractability—being well understood and fairly straightforward to implement. These include regression-
based techniques, such as Fama-MacBeth and generalized least squares, and nonparametric techniques,
such as portfolio sorts. Our understanding of how to effectively apply and interpret these techniques has
matured.
Researchers continue to extend and innovate upon traditional computational approaches. For example,
Patton and Timmermann [2010] propose new ways to test for monotonicity (in portfolio sorts) in the
expected returns of securities sorted by characteristics that theory predicts should earn a systematic
premium. They provide a summary statistic for monotonicity, allowing researchers to decompose the results to
better diagnose the source of a rejection of (or failure to reject) the theory being tested.
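The sketch below illustrates the flavor of such a monotonicity test in simplified form: the statistic is the smallest mean return difference between adjacent characteristic-sorted portfolios, and its null distribution is approximated with a basic iid bootstrap rather than the stationary bootstrap of the original paper. The portfolio returns are simulated, with an artificial built-in spread.

import numpy as np

rng = np.random.default_rng(2)
T, n_ports = 240, 5
# Simulated monthly returns for 5 sorted portfolios with an increasing mean.
mu = np.linspace(0.004, 0.010, n_ports)
rets = rng.normal(mu, 0.05, size=(T, n_ports))

diffs = rets[:, 1:] - rets[:, :-1]     # adjacent portfolio return differences
stat = diffs.mean(axis=0).min()        # statistic: smallest mean difference

# Bootstrap the statistic under the null of no monotonic pattern (demeaned diffs).
null_diffs = diffs - diffs.mean(axis=0)
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, T, size=T)
    boot[b] = null_diffs[idx].mean(axis=0).min()
p_value = (boot >= stat).mean()
print(f"min mean diff={stat:.4f}, bootstrap p-value={p_value:.3f}")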
New computational methods continue to emerge and flourish. An increasingly popular computational
field among quantitative researchers is statistical learning (sometimes referred to as machine learning).
These analytical tools—which can be classified into supervised and unsupervised methods—are valuable for
building models because they reveal the structure of data, incorporate nonlinearities into the model, and
provide robust predictions. In our view, these approaches should not be viewed or used as "black boxes."
Researchers in finance are applying these newer methodologies to create insights into the dynamics of
equity markets. Moritz and Zimmermann [2014] address the research question of which variables provide
independent information about the cross-section of stock returns. Their computational approach, called
deep conditional portfolio sorts, is designed to deal with a large number of variables and potential
nonlinearities and interactions. When estimating the model, the authors incorporate concepts from the
statistical learning literature, mirroring methods used to estimate decision tree and ensemble methods.
Ogneva, Piotroski, and Zakolyukina [2015] use the lasso (least absolute shrinkage and selection
operator) model by Tibshirani [1996] to select a parsimonious set of fundamental variables for a
probability of recession given a failure model. Lasso estimates a sparse solution of a regression problem by
setting some of the regression coefficients to zero. They are primarily interested in the out-of-sample performance of the resulting model.
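The snippet below sketches how the lasso produces a sparse selection on simulated data. The variables, penalty level, and linear-regression setting are illustrative and unrelated to the authors' actual specification, which involves a failure model.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 500, 30
X = rng.normal(size=(n, p))            # 30 candidate fundamental variables
beta = np.zeros(p)
beta[:3] = [0.8, -0.5, 0.4]            # only 3 variables truly matter
y = X @ beta + rng.normal(scale=1.0, size=n)

model = Lasso(alpha=0.1).fit(X, y)     # L1 penalty shrinks and selects
selected = np.flatnonzero(model.coef_)
print(f"variables kept: {selected}, coefficients: {model.coef_[selected].round(2)}")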
Unstructured data have become more valuable in developing quantitative signals for equities. Textual analysis
of corporate disclosures such as financial statements, earnings releases, and conference call transcripts are
sources of unstructured, qualitative data. Li [2010] provides a survey of various techniques to extract
signals from textual data, showing that the communication patterns of management could reveal
certain management characteristics that have an impact on understanding corporate decisions and
forecasting stock returns. Focusing on research related to earnings quality, stock market efficiency, and
corporate financial policies, he highlights two general approaches for conducting content analysis: a rule-based dictionary approach and a statistical approach, such as the naïve Bayesian machine-learning algorithm.
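As a minimal sketch of the rule-based dictionary approach, the function below scores a text by counting words from fixed positive and negative lists and forming a net-tone measure. The tiny word lists stand in for full dictionaries such as Loughran-McDonald and are purely illustrative.

import re

POSITIVE = {"growth", "improve", "strong", "exceed", "record"}
NEGATIVE = {"decline", "weak", "loss", "impairment", "litigation"}

def net_tone(text: str) -> float:
    """Net tone = (positive count - negative count) / total words."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

call = "Revenue growth was strong and exceeded guidance despite litigation costs."
print(f"net tone: {net_tone(call):+.3f}")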
Equity markets react to news flow. Although it is potentially rich in information, this source of data also
contains substantial noise. Heston and Sinha [2015] compare different methods of textual analysis using
news reports from Dow Jones to predict cross-sectional stock returns. They analyze the horizons over which
returns are realized for sentiment signals created using different computational methods. Some signals provide a
forecasting horizon of up to a quarter, whereas others forecast returns over shorter horizons, such as a day.
A growing amount of information comes from Twitter, Internet searches, and other sources of text-based
social media. These data sources can also be useful for building quantitative measures of investor
sentiment. For example, Da, Engelberg, and Gao [2015] use daily Internet search volume to construct a
measure of market-level sentiment. They show that this measure is useful in predicting short-term return reversals, temporary increases in volatility, and mutual fund flows.
It is important to understand the intuition, assumptions, and strengths and weaknesses of computational
approaches. The choice of a method involves trade-offs. The computational approach should align with the
data structure, research design approach, and underlying investment strategy being researched and traded. For
example, research that uses a hypothesis-based (deductive) approach to modeling typically relies on more traditional econometric techniques, whereas inductive, pattern-based approaches are often applied to unstructured data sources and/or data that contain nonlinearities or other unusual
features.
There are many advantages to using a quantitative model. The scientific approach to developing
models brings advantages such as rigor, creativity, avoidance of biases, and process.
Rigorous analysis is an underlying principle of the scientific approach. A rigorous approach allows one to
validate ideas through a framework incorporating statistical rigor by employing backtests, in-sample/out-of-sample comparisons, and Monte Carlo analysis to study the robustness and sensitivity of a strategy to a
given choice of parameters. This verification should also incorporate new market and theoretical
developments. New assumptions (paradigms/theories) require the reconstruction of prior assumptions and the reevaluation of prior facts.
Advances in computational methods and the expanding set of data sources provide tools and raw materials to explore new trading ideas. Similar to an
artist who has access to paints and canvases, a researcher benefits from creativity—which results from hard
work, introspection, and inspiration. This creativity is the driver for new investment ideas. With the right
tools, researchers have the ability to develop ideas about investing strategies and create models.
All decisions are subject to biases. In investing, the behavioral biases are well documented—confirmation bias,
optimism bias, and overconfidence, to name a few. Quantitative models give us a more objective framework for processing information, which can help mitigate these biases.
The decisions we make to build quantitative strategies are also affected by biases. We sometimes see
data and construct a story to explain what happened. Taleb [2007] calls this the "narrative fallacy": looking
backward and creating a story to fit events. Being aware of potential biases and understanding how our
assumptions drive our choices in the modeling process are key to building successful strategies.
Quantitative strategies are systematic; that is, the underlying strategy is consistently applied to identify and
implement trading opportunities in a structured framework. A framework brings structure and logic to a
disorderly and complex activity of identifying opportunities in the markets. This framework provides a
process—a common plan of direction and action to use in developing, evaluating, and implementing trading strategies.
In markets filled with near-constant information flow, not all information influences asset prices. The benefit of
having a systematic model is having a consistent process to focus efforts on information that influences asset prices.
Empirical research is often based on historical data, and there is a limit to how much information about
the future we can infer from the past. Sometimes, quantitative investors are at risk of being too systematic in
their approach. There is always the risk of low-probability events that will challenge the underlying
assumptions. Because markets change and the current environment looks different from the past, we need
to evaluate whether those changes are structural or transitory. It is important to understand how both types of
changes will impact the performance of quantitative models. We need to continually evaluate our models and
the markets they operate in, revising a model when our judgment and experience indicate it is no longer valid.
At the heart of a quantitative model is data. Quantitative analysis relies on nonexperimental inference. How
the data are used and the source of the data are of critical importance. "Garbage in and garbage out" is a
commonly used phrase referring to how the data inputs of a model can affect its output. For researchers of
quantitative strategies, this means that a quantitative process is only as good as its data.
Data impact the outcome of a research project. In any dataset, there are some data features we understand, and
some we do not. For researchers, it is critical to explore data features and expose unexpected features of
the data.
We can classify data in a number of different ways. More recently, it has become common to characterize data
as structured or unstructured. Structured data are organized into tables with clearly identified and
organized information. Unstructured data, such as text containing natural language, do not have a
formal structure. They require specialized processes to extract the important attributes that can be used in various computational techniques, thereby introducing new opportunities and challenges for researchers. The infrastructure to store and access this information is still evolving; thus, it requires continued investment and development.
Data containing errors, missing values, and other flaws affect the validity of the analysis. For example, Kothari,
Sabino, and Zach [2005] find that non-surviving firms tend to be either extremely bad or extremely good
performers. Survivor bias implies truncation of such extreme observations. The authors show that even a small
degree of such nonrandom truncation can have a strong impact on sample moments of stock returns.
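A small simulation can illustrate the point: truncating even a few percent of extreme observations from a fat-tailed return distribution visibly shifts the sample mean and standard deviation. The distribution and truncation fractions below are arbitrary choices, not the authors' design.

import numpy as np

rng = np.random.default_rng(4)
full = rng.standard_t(df=4, size=100_000) * 0.10   # fat-tailed "true" returns

# Nonrandom truncation: drop the worst 2% and best 1% (e.g., non-surviving names).
lo, hi = np.quantile(full, [0.02, 0.99])
surviving = full[(full > lo) & (full < hi)]

for name, x in [("full", full), ("survivors", surviving)]:
    print(f"{name:9s} mean={x.mean():+.4f} std={x.std():.4f}")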
Data are often available from multiple sources, and the number of available data sources is increasing. It is well
known that different data sources maintain a different level of detail. These differences can have a large impact on empirical results, and the researcher needs to explain how discrepancies between databases affect the research output. The issue is
to determine whether the use of a particular data source might have influenced the results.
Successfully working with data means understanding the nuances of data sources and the characteristics of good data. The following are a few suggestions:
• Understand how the database evolves over time. Most databases change over time, and those
changes include what data were collected, how they were collected, and their coverage.
• Understand how the database's standard procedures work and how they differ among different
data sources. Most databases have standardized procedures for reporting certain items in their
system in order to ensure comparability.
• Choose one data source to build the model and a second data source to confirm the model (see the reconciliation sketch after this list).
• Include statistics that describe and compare the usability of data items with regard to standard empirical
applications in finance.
• Look for economic explanations of any outliers. For example, Brown, Lajbcygier, and Li [2008]
examine the economic significance of outliers in their dataset. In their work, they show that the
outliers result from firms with materially different financial situations. In contrast to outliers caused
by bad data, this set of outliers had material implications for the conclusions of their results.
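As a sketch of the cross-source confirmation suggested above, the snippet below reconciles a price field from two hypothetical vendors and flags discrepancies beyond a tolerance. The data frames, tickers, and threshold are invented for illustration.

import pandas as pd

# Hypothetical closing prices for the same tickers from two vendors.
a = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "close": [10.00, 54.20, 7.85]})
b = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "close": [10.00, 54.90, 7.85]})

merged = a.merge(b, on="ticker", suffixes=("_src1", "_src2"))
merged["rel_diff"] = (merged["close_src1"] - merged["close_src2"]).abs() / merged["close_src2"]
flagged = merged[merged["rel_diff"] > 0.005]   # flag discrepancies beyond 0.5%
print(flagged[["ticker", "close_src1", "close_src2", "rel_diff"]])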
There may be opportunities in using less clean data sources, data with shorter histories, and so on. This
type of data might provide a source of alpha that others overlook because of the work and patience
required to make the data usable for a research process.
How does poor data affect strategies? High-quality data are critical to success. The validity and power of the
results rely on well-prepared datasets. For example, Ljungqvist, Malloy, and Marston [2009] document
changes in the collection and recording of historical I/B/E/S analyst stock recommendations. They show
that these changes are nonrandom, and the consequences of these changes affect the returns generated by strategies based on these recommendations.
WHAT DO WE MEAN BY "GOOD" MODELS AND STRATEGIES?
In this section, we describe five key properties of "good" quantitative models. We leverage the work of
Gabaix and Laibson [2008], who describe critical properties for building economic models. These properties are as follows:
Parsimony: Parsimony means models with few assumptions. All models are only approximations
of reality, and some features will always be omitted. Our assumptions are based on the results of empirical
research (parameter estimates), economic intuition, and judgment and theory (for example, priors,
structural assumptions). Having too many assumptions tends to lead to overfitting. When overfitting occurs, a model captures noise rather than the underlying structure and performs poorly out of sample.
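A compact illustration of the parsimony point: on simulated data whose true relationship is linear, a heavily parameterized polynomial fits better in sample but worse out of sample. The data and polynomial degrees below are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, size=60)
y = 0.5 * x + rng.normal(scale=0.3, size=60)   # the true model is linear
x_tr, y_tr, x_te, y_te = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 12):                          # parsimonious vs. overparameterized
    coef = np.polyfit(x_tr, y_tr, degree)
    mse_in = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    mse_out = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    print(f"degree={degree:2d} in-sample MSE={mse_in:.3f} out-of-sample MSE={mse_out:.3f}")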
Tractability: Tractable models are easy to analyze, providing transparency to the user. We should have
explicit descriptions of our research choices for data, research design, and computational choices. Having
tractability enables us to question our model assumptions and make changes when necessary.
Conceptual insightfulness: Our model should align with market dynamics, investor behavior, and
investment theory. The empirical analysis and our hypothesis about why the strategy works should be
mutually reinforcing.
Predictability: We desire models that give us robust forecasts. As practitioners, we are primarily concerned with
the profitability and risk of a strategy and how well the model's motivation fits economic theory and market
behavior.
Adaptability: Financial markets are constantly evolving, subject to sudden, unpredictable changes. Market
changes challenge the assumptions of models. It is important to understand and forecast the potential impact
of those changes. For example, change may be the result of innovation of financial products, change in the
preferences of market participants, and/or responses to exogenous financial and economic shocks.
Flexibility in our thinking and our models is necessary to adapt to changing market conditions.
CONCLUSIONS
In this article, we discussed a number of principles, insights, and experiences for building successful
quantitative equity strategies. To build these successful strategies, we need to combine judgment with the
scientific approach to identify and validate new opportunities in ever-changing markets. Understanding
the data and computational techniques is necessary for obtaining empirical evidence to support our
investment ideas. Without doubt, the research and development of successful quantitative models is a
blend of science and art. Our endeavors in model building require constant innovation and rigorous
analytics to be successful.
[References]
1. Box G.E.P. "Science and Statistics." Journal of the American Statistical Association, Vol. 71 (1976), pp. 791-799.
2. Brown S., Lajbcygier P., Li B. "Going Negative: What to Do with Negative Book Equity Firms." Working paper, 2008.
3. Cowen T. "Average Is Over: Powering America Beyond the Age of the Great Stagnation." New York: Dutton, 2013.
4. Da Z., Engelberg J., Gao P. "The Sum of All FEARS: Investor Sentiment and Asset Prices." Review of Financial Studies, Vol. 28, No. 1 (2015), pp. 1-32.
5. Gabaix X., Laibson D. "The Seven Properties of Good Models." In The Foundations of Positive and Normative Economics: A Handbook, edited by Caplin A. and Schotter A. New York: Oxford University Press, 2008.
6. Heston S., Sinha N. "News versus Sentiment: Predicting Stock Returns from News Stories." Working paper, 2015, http://ssrn.com/abstract=2311310.
7. Kothari S., Sabino J., Zach T. "Implications of Survival and Data Trimming for Tests of Market Efficiency." Journal of Accounting and Economics, Vol. 39, No. 1 (2005), pp. 129-161.
8. Leamer E. "Specification Searches: Ad Hoc Inference with Nonexperimental Data." New York: John Wiley & Sons, 1978.
9. Leamer E. "Let's Take the Con Out of Econometrics." American Economic Review, Vol. 73, No. 1 (1983), pp. 31-43.
10. Li F. "Textual Analysis of Corporate Disclosures: A Survey of the Literature." Journal of Accounting Literature, Vol. 29 (2010), pp. 143-165.
11. Ljungqvist A., Malloy C., Marston F. "Rewriting History." Journal of Finance, Vol. 64 (2009), pp. 1935-1960.
12. Moritz B., Zimmermann T. "Deep Conditional Portfolio Sorts: The Relation between Past and Future Stock Returns." Working paper, 2014.
13. Ogneva M., Piotroski J., Zakolyukina A. "Accounting Fundamentals and Systematic Risk: Corporate Failure over the Business Cycle." USC, Marshall School of Business Working Paper, 2015.
14. Patton A., Timmermann A. "Monotonicity in Asset Returns: New Tests with Applications to the Term Structure, the CAPM, and Portfolio Sorts." Journal of Financial Economics, Vol. 98 (2010), pp. 605-625.
15. Taleb N. "The Black Swan: The Impact of the Highly Improbable." New York: Random House, 2007.
16. Tetlock P., Gardner D. "Superforecasting: The Art and Science of Prediction." New York: Crown Publishers, 2015.
17. Tibshirani R. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1 (1996), pp. 267-288.
18. Tversky A., Kahneman D. "Judgment under Uncertainty: Heuristics and Biases." Science, Vol. 185 (1974), pp. 1124-1131.