Peng Liu - Quantitative Trading Strategies Using Python: Technical Analysis, Statistical Testing, and Machine Learning - Apress (2023)
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and
accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with
respect to the material contained herein or for any errors or omissions
that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
Quantitative trading, also called algorithmic trading, refers to automated trading activities that buy or sell particular instruments based on specific algorithms. Here, an algorithm can be considered a model that transforms an input into an output. In this case, the input includes sufficient data to make a proper trading decision, and the output is the action of buying or selling an instrument. The quality of a trading decision thus relies on the sufficiency of the input data and the suitability and robustness of the model.
Developing a successful quantitative trading strategy involves the collection and processing of vast amounts of input data, such as historical price data, financial news, and economic indicators. The data is passed as input to the model development process, where the goal is to accurately forecast market trends, identify trading opportunities, and manage potential risks, all of which are reflected in the resulting buy or sell signals.
A robust trading algorithm is often identified via the process of backtesting, which involves simulating the algorithm's performance using historical data. Simulating the performance of the algorithm under different scenarios allows us to assess the strategy's potential effectiveness better, identify its limitations, and fine-tune the parameters to optimize its results. However, one also needs to be aware of the potential risks of overfitting and survivorship bias, which can lead to inflated metrics and potentially poor test set performance.
In this chapter, we start by covering a few basic and important concepts related to quantitative trading. We then switch to hands-on examples of working with financial data using Python.
The model used to generate trading signals could be either rule-based or trained using data. The rule-based approach mainly relies on domain knowledge and requires explicitly writing out the logic flow from input to output, similar to following a cooking recipe. On the other hand, the data-driven approach involves training a model using machine learning techniques and using the model as a black box for prediction and inference. Let us review the overall model training process in a typical machine learning workflow.
Now let us look at a specific type of algorithmic trading at large institutions: institutional algorithmic trading.
The institutional algorithmic strategies generate optimal trading signals by analyzing daily quotes and prices. For example, an institutional algorithmic strategy may suggest entering a long position if the current stock price moves from below to above the volume-weighted average price (VWAP) over a day, a technical indicator often used by short-term traders. The institutional algorithmic strategies may also exploit arbitrage opportunities or price spreads between correlated securities. Here, arbitrage means making a sure positive profit with zero net investment. Arbitrage opportunities, if they exist, normally disappear very quickly because many hedge funds and investors are constantly looking for them.
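As a concrete illustration of the VWAP crossover rule mentioned above, the following is a minimal sketch (not from the book) that computes the intraday VWAP from bar-level data and flags the bar where the close crosses from below to above it. The column names 'Close' and 'Volume' are assumptions about the input DataFrame.

import pandas as pd

def vwap_cross_signal(bars: pd.DataFrame) -> pd.Series:
    """Return 1 on the bar where the close crosses from below to above the intraday VWAP.

    Assumes `bars` holds one trading day of intraday data with 'Close' and 'Volume' columns.
    """
    cum_pv = (bars["Close"] * bars["Volume"]).cumsum()   # cumulative price * volume
    cum_vol = bars["Volume"].cumsum()                     # cumulative volume
    vwap = cum_pv / cum_vol                               # running intraday VWAP

    above = bars["Close"] > vwap
    # the signal fires only when the price moves from below to above the VWAP
    return (above & ~above.shift(1, fill_value=False)).astype(int)

In practice, such a raw signal would be combined with risk controls and backtesting before any order is placed.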
The next section briefly introduces the role of a quant trader.
Alternatively, we can group tradable assets based on the type of maturity. Stocks, currencies, and commodities are asset classes with no maturity, while fixed-income instruments and derivatives have maturities. For a vanilla security with a maturity date, such as a futures contract, it is possible to compute its fair price based on the no-arbitrage argument, a topic we will discuss in Chapter 3.
We can also group assets based on the linearity of the payoff function at maturity for certain derivative instruments. For example, a futures contract allows the buyer/seller to buy/sell the underlying asset at an agreed price at maturity. Let us assume the underlying (stock) price at the maturity date is ST and the agreed price is K. When a buyer enters/longs a futures contract to buy the stock at price K, the buyer would make a profit of ST − K if ST ≥ K (purchase the stock at a lower price) or suffer a loss of K − ST if ST < K (purchase the stock at a higher price). A similar analysis applies to the case of entering a short position in a futures contract. Both functions are linear with respect to the underlying asset's price upon exercise. See Figure 1-4 for an illustration of linear payoff functions.
Figure 1-4 Illustration of the linear payoff function of entering a long or short position in a futures contract
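A minimal sketch of these two linear payoffs as plain Python functions (the function and variable names are illustrative only):

def long_futures_payoff(S_T: float, K: float) -> float:
    # profit of S_T - K when S_T >= K, loss of K - S_T otherwise (same expression)
    return S_T - K

def short_futures_payoff(S_T: float, K: float) -> float:
    # mirror image of the long position
    return K - S_T

# example with an agreed price K = 100
print(long_futures_payoff(110, 100))   # 10: gain for the long side
print(short_futures_payoff(110, 100))  # -10: loss for the short side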
Other derivative products with linear payoff functions include forwards and swaps. These are easy to
price since their prices are linear functions of the underlying asset. We can price these instruments
irrespective of the mathematical model for the underlying price. In other words, we only require the
underlying asset’s price, not the mathematical model around the asset. These assets are thus subject to
model-independent pricing.
Let us look at the nonlinear payoff function from an options contract. A call option gives the buyer a choice to buy the underlying asset at the strike price K at the maturity date T when the underlying asset price is ST, while a put option changes such choice to selling the underlying asset at the strike price K. Under both situations, the buyer can choose not to exercise the option and therefore gains no profit. Given that an investor can either long or short a call or put option, there are four combinations when participating in an options contract, as listed in the following:
Long a call: Buy a call option to obtain the opportunity to buy the underlying asset at a prespecified strike price upon maturity.
Short a call: Sell a call option to allow the buyer the opportunity to buy the underlying asset at a prespecified strike price upon maturity.
Long a put: Buy a put option to obtain the opportunity to sell the underlying asset at a prespecified strike price upon maturity.
Short a put: Sell a put option to allow the buyer the opportunity to sell the underlying asset at a prespecified strike price upon maturity.
Figure 1-5 contains the payoff functions for the four different combinations, all of which are nonlinear
functions of the underlying asset price ST.
Figure 1-5 Four types of nonlinear payoff functions in an options contract
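The four payoffs at maturity (ignoring the option premium) can be sketched in a few lines of Python; this is an illustrative snippet, not code from the book:

def long_call(S_T, K):  return max(S_T - K, 0.0)    # exercised only when S_T > K
def short_call(S_T, K): return -max(S_T - K, 0.0)   # seller's payoff mirrors the buyer's
def long_put(S_T, K):   return max(K - S_T, 0.0)    # exercised only when S_T < K
def short_put(S_T, K):  return -max(K - S_T, 0.0)

K = 100
for S_T in (80, 100, 120):
    print(S_T, long_call(S_T, K), short_call(S_T, K), long_put(S_T, K), short_put(S_T, K))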
Note that tradable instruments within the same asset class exhibit similar characteristics but will differ
from one another in some aspects. The market behavior will differ for tradable instruments that follow their
respective price dynamics.
We can also group a tradable asset according to whether it belongs to the cash market or the derivative
market. The cash market, also called the spot market, is a marketplace where trading instruments are
exchanged at the point of sale, and purchasers take immediate possession of the trading products. For
example, the stock exchange falls into the cash market since investors receive shares of stock almost
immediately in exchange for cash, thus settling the transactions on the spot.
On the other hand, the derivative market completes a transaction only at a prespecified date in the future. Take the futures market, for example. A buyer who pays for the right to receive a good only gets to expect the delivery at a prespecified future date.
The next section introduces common trading avenues and steps.
Market Structures
Before 2010, open outcry was a popular way to communicate trade orders in trading pits (floors). Traders would tap into temporary information asymmetry and use verbal communication and hand signals to perform trading activities at stock, option, and futures exchanges. Traders would arrange their trades face to face on the exchange's trading floor, cry out bids and offers to offer liquidity, and listen for bids and offers to take liquidity. The open outcry rule is that traders must announce their bids and offers so that other traders may react to them, avoiding whispering among a small group of traders. They must also publicly announce that they accept bids (assets sold) or offers (assets taken) of particular trades. The largest pit was the US long-term treasury bond futures market, with over 500 floor traders at the Chicago Board of Trade (CBOT), a major exchange that later merged into the CME Group.
As technology advanced, the trading markets moved from physical to electronic, shaping a fully
automated exchange. First proposed by Fischer Black in 1971, the fully automated exchange was also called
program trading, which encompasses a wide range of portfolio trading strategies.
The trading rules and systems together define a trading market's market structure. One type of market is called the call market, where trades are allowed only when the market is called. The other type of market is the continuous market, where trades are allowed anytime during regular trading hours. Big exchanges such as NYSE, LSE (London Stock Exchange), and SGX (Singapore Exchange) allow a hybrid mode of market structure.
The market structure can also be categorized based on the nature of pricing among the tradable assets.
When the prices are determined based on the bid (buy) and ask (sell) quotations from market makers or
dealers, it is called a quote-driven or price-driven market. The trades are determined by dealers and market
makers who participate in every trade and match orders from their inventory. Typical assets in a quote-
driven market include bonds, currencies, and commodities.
On the other hand, when the trades are based on the buyers' and sellers' requirements, it is called an order-driven market, where the bid and ask prices, along with the number of shares desired, are put on display. Typical order-driven markets include stock exchanges, futures exchanges, and electronic communication networks (ECNs). There are two basic types of orders: market orders, which execute at the asset's current market price, and limit orders, which execute only at a preset limit price or better.
Let us look at a few major types of buy-side stock investors.
Market Making
Market maker refers to a firm or an individual that actively quotes the two-sided markets (buy side and sell side) of a particular security. The market maker provides bids, meaning the particular price of the security along with the quantity it is willing to buy. It also provides offers (asks), meaning the price of the security and the quantity it is willing to sell. Naturally, the asking price is supposed to be higher than the bid price, so that the market maker can make a profit based on the spread of the two quote prices.
Market makers post quotes and stand ready to trade, thereby providing immediacy and liquidity to the
market. By quoting bid and ask prices, market makers make the assets more liquid for potential buyers and
short sellers.
A market maker also takes a significant risk of holding the assets because a security's value may decline between its purchase and sale to another buyer. They need capital to finance their inventories. The capital available to them thus limits their ability to offer liquidity. Because market making is very risky, investors generally dislike investing in market-making operations. Market-making firms with significant external financing typically have excellent risk management systems that prevent their dealers from generating large losses.
The next section introduces the concept of scalping.
Scalping
Scalping is a type of trading that makes small and fast profits by quickly and continuously acquiring and unwinding positions, typically holding large positions for no more than a few minutes. Traders that engage in scalping are referred to as scalpers.
When engaged in scalping, a trader requires a live feed of quotes in order to move fast. The trader, also
called the day trader, must follow a strict exit strategy because one large loss could eliminate the many
small gains the trader worked to accumulate.
Active traders such as day traders are strong believers in market timing, a key component of actively managed investment strategies. For example, if traders can predict when the market will go up and down, they can make trades to turn that market move into a profit. Obviously, this is a difficult and strenuous task as one needs to watch the market continuously, from daily to even hourly, as compared to long-term position traders that invest for the long run.
The next section introduces the concept of portfolio rebalancing.
Portfolio Rebalancing
As time goes on, a portfolio's current asset allocation will drift away from an investor's original target asset allocation. If left unadjusted, the portfolio will either become too risky or too conservative. Rebalancing corrects this drift by changing the position of one or more assets in the portfolio, either buying or selling, with the goal of maximizing the portfolio return or hedging another financial instrument.
Asset allocations in a portfolio can change as market performance alters the values of the assets due to price changes. Rebalancing involves periodically buying or selling the assets in a portfolio to regain and maintain that original, desired level of asset allocation defined by an investor's risk and reward profile.
There are several reasons why a portfolio may deviate from its target allocation over time, such as market fluctuations, additional cash injection or withdrawal, and changes in risk tolerance. We can perform portfolio rebalancing using either a time-based rebalancing approach (e.g., quarterly or annually) or a threshold-based rebalancing approach, which triggers when the allocation of an asset class deviates from the target by a predefined percentage.
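A minimal sketch of a threshold-based rebalancing check; the asset names, weights, and 5% threshold are illustrative assumptions:

def needs_rebalancing(current_weights, target_weights, threshold=0.05):
    # flag the portfolio if any asset drifts from its target weight by more than the threshold
    return any(
        abs(current_weights[asset] - target_weights[asset]) > threshold
        for asset in target_weights
    )

target = {"stocks": 0.60, "bonds": 0.40}
current = {"stocks": 0.68, "bonds": 0.32}
print(needs_rebalancing(current, target))  # True: stocks drifted 8 percentage points above target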
In the world of quantitative trading, Python has emerged as a powerful tool for formulating and
implementing trading algorithms. Part of the reason is its comprehensive open source libraries and strong
community support. In the next section, we will discuss the practical aspect of financial data analysis and
start by acquiring and summarizing the stock data using Python.
Let us examine a bullish (green) candle over a trading day. When the market starts, the stock
assumes an opening price and starts to move. Across the day, the stock will experience the highest price
point (high) and the lowest price point (low), where the gap in between indicates the momentum of the
movement. We know for a fact that the high will always be higher than the low, as long as there is
movement. When the market closes, the stock registers a close. Figure 1-7 depicts a sample movement path
summarized by the green candlestick.
Figure 1-7 A sample path of stock price movement represented by the green candlestick chart. When the market starts, the stock assumes an
opening price and starts to move. It will experience the highest price point (high) and the lowest price point (low), where the gap in between
indicates the momentum of the movement. When the market closes, the stock registers a close
Next, we will switch gears and start working on the actual stock price data using Python. We will
download the data from Yahoo! Finance and introduce different ways to graph the data.
The result shows a long list of information about Microsoft, useful for our initial analysis of a particular
stock. Note that all this information is structured in the form of a dictionary, making it easy for us to access
a specific piece of information. For example, the following code snippet prints the market cap of the stock:
Such structured information, also considered metadata in this context, comes in handy when we analyze
multiple tickers together.
Now let us focus on the actual stock data of Microsoft. In Listing 1-2, we download the stock price data of
Microsoft from the beginning of 2022 till the current date. Here, the current date is determined
automatically by the today() function from the datetime package, which means we will obtain a
different (bigger) result every time we run the code on a future date. We also specify the format of the date
to be “YYYY-mm-dd,” an important practice to unify the date format.
# download daily stock price data by passing in the specified ticker and date range
import yfinance as yf  # assumed to have been imported earlier in the chapter; repeated here for completeness
from datetime import datetime

today_date = datetime.today().strftime('%Y-%m-%d')
print(today_date)

data = yf.download("MSFT", start="2022-01-01", end=today_date)
Listing 1-2 Downloading stock price data
We can examine the first few rows by calling the head() function of the DataFrame. The resulting table contains price-related information such as open, high, low, close, and adjusted close prices, along with the daily trading volume:
We can also view the last few rows using the tail() function:
>>> data.tail()
Open High Low Close Adj Close Volume
Date
2022-12-30 238.210007 239.960007 236.660004 239.820007 239.820007 21930800
2023-01-03 243.080002 245.750000 237.399994 239.580002 239.580002 25740000
2023-01-04 232.279999 232.869995 225.960007 229.100006 229.100006 50623400
2023-01-05 227.199997 227.550003 221.759995 222.309998 222.309998 39585600
2023-01-06 223.000000 225.759995 219.350006 224.929993 224.929993 43597700
It is also a good habit to check the dimension of the DataFrame using the shape attribute:
The following section will look at visualizing the time series data via interactive charts.
Running the code produces Figure 1-8. Note that the graph is interactive; hovering over each point displays the corresponding date and closing price.
Figure 1-8 Interactive time series plot of the daily closing price of Microsoft
We can also enrich the graph by overlaying the trading volume information, as shown in Listing 1-4.
Running the code generates Figure 1-9. Note that the trading volume assumes a secondary y-axis on the
right, by setting secondary_y=True.
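Listing 1-4 itself is not reproduced here; a minimal sketch of the overlay it describes, assuming the data DataFrame from Listing 1-2, might look as follows:

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# closing price on the primary y-axis, trading volume on the secondary y-axis
fig2 = make_subplots(specs=[[{"secondary_y": True}]])
fig2.add_trace(go.Scatter(x=data.index, y=data["Close"], name="Close"),
               secondary_y=False)
fig2.add_trace(go.Bar(x=data.index, y=data["Volume"], name="Volume"),
               secondary_y=True)
fig2.show()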
Figure 1-9 Visualizing the daily closing price and trading volume of Microsoft
Based on this graph, a few bars stand out, making it difficult to see the line chart. Let us change it by controlling the magnitude of the secondary y-axis. Specifically, we can enlarge the total magnitude of the right y-axis to make these bars appear shorter, as shown in Listing 1-5.
# rescale volume
fig2.update_yaxes(range=[0,500000000],secondary_y=True)
fig2.update_yaxes(visible=True, secondary_y=True)
fig2
Listing 1-5 Rescaling the y-axis
Running the code generates Figure 1-10. Now the bars appear shorter given a bigger range (0 to 500M)
of the y-axis on the right.
Figure 1-10 Controlling the magnitude of the daily trading volume as bars
Lastly, let us plot all the price points via candlestick charts. This requires us to pass in all the price-
related information in the DataFrame. The Candlestick() function can help us achieve this, as shown in
Listing 1-6.
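Listing 1-6 is not reproduced here; a minimal sketch of the candlestick chart it describes, again assuming the data DataFrame, might look as follows:

import plotly.graph_objects as go

# one candlestick per trading day, built from the four daily price points
fig3 = go.Figure(data=[go.Candlestick(
    x=data.index,
    open=data["Open"],
    high=data["High"],
    low=data["Low"],
    close=data["Close"],
)])
fig3.show()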
Running the code generates Figure 1-11. Each bar represents one day’s summary points (open, high,
low, and close), with the green color indicating an increase in price and red indicating a decrease in price at
the end of the trading day.
Figure 1-11 Visualizing all daily price points of Microsoft as candlestick charts
Notice the sliding window at the bottom. We can use it to zoom in on a specific range, as shown in Figure 1-12. The dates along the x-axis are automatically adjusted as we zoom in. Also, note that these bars come in groups of five. This is no coincidence: there are five trading days in a week.
Summary
In this chapter, we covered the basics of quantitative trading, including topics such as institutional algorithmic trading, major asset classes, derivatives such as options, market structures, buy-side investors,
market making, scalping, and portfolio rebalancing. We then delved into exploratory data analysis of the
stock data, starting with summarizing the periodic data points using candlestick charts. We also reviewed
the practical side of things, covering data retrieval, analysis, and visualization via interactive charts. These
will serve as the building blocks as we develop different trading strategies later on.
Exercises
List a few financial instruments and describe their risk and reward profiles.
Can a model get exposed to the test set data during training?
A model is considered better if it does better than another model on the training set, correct?
For daily stock price data, can we aggregate it as weekly data? How about hourly?
What is the payoff function for the issuer of a European call option? Put option? How is it connected to
the payoff function of the buyer?
Suppose you purchase a futures contract that requires you to sell a particular commodity one month later
for a price of $10,000. What is your payoff when the price of the commodity grows to $12,000? Drops to
$7000?
What about the payoff for the buyer in both cases?
How do the results change if we switch to an options contract with the same strike price and delivery
date?
Draw a sample stock price curve of a red candlestick.
Download the stock price data of Apple, plot it as both a line and a candlestick chart, and analyze its trend.
Calculate the YTD (year-to-date) average stock price of Apple.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_2
2. Electronic Market
Peng Liu1
(1) Singapore, Singapore
In this chapter, we delve into the world of electronic markets, which have revolutionized the way financial instruments are traded. With the rapid advancements in technology and the widespread adoption of the Internet, electronic markets have largely replaced traditional, floor-based trading venues, ushering in an era of speed, efficiency, and accessibility for market participants around the globe.
Electronic markets facilitate the buying and selling of financial instruments, such as stocks, bonds, currencies, and commodities (covered in Chapter 1), through computerized systems and networks. They have played a critical role in democratizing access to financial markets, enabling a broader range of participants, including retail investors, institutional investors, and high-frequency traders, to engage in trading activities with ease and transparency. At the heart of electronic markets lies the trading mechanism, which governs how buy and sell orders are matched, executed, and settled.
Furthermore, electronic markets offer a variety of order types that cater to the diverse needs
and objectives of traders. These order types can be used to achieve specific goals, such as
minimizing market impact, ensuring a desired level of execution, or managing risk. In this
chapter, we will examine the most common types of orders, including market orders, limit orders,
stop orders, and their various iterations.
As we progress through this chapter, readers will gain a comprehensive understanding of the
inner workings of electronic markets, the trading mechanisms that drive them, and the wide
array of order types available to market participants.
Electronic Order
The rise of electronic trading has brought about significant improvements in the efficiency, speed, and accessibility of financial markets. Transactions that once took minutes or hours to complete can now be executed in milliseconds or even microseconds, thanks to the power of high-speed networks and advanced computer algorithms. As a result, market participants can take advantage of fleeting trading opportunities, react more swiftly to market news, and benefit from tighter bid-ask spreads, which translate into lower transaction costs.
Moreover, electronic trading has democratized access to global financial markets, allowing individual investors to trade alongside institutional players such as hedge funds, banks, and proprietary trading firms. Through user-friendly online trading platforms, retail investors can access a vast array of financial instruments, from stocks and bonds to currencies and derivatives, and participate in various markets around the world. These platforms provide a wealth of market data, research tools, and risk management features, empowering investors to make more informed decisions and execute their trading strategies with precision and ease. At the same time, the increased transparency and availability of market data have fostered a more competitive landscape, driving innovation in trading strategies, algorithms, and financial products.
Orders are short messages sent to the exchange through the broker. An order is a set of
instructions the trader gives to the exchange. It must contain at least the following instructions:
Contract/security (or contracts/securities) to trade
Buy or sell or cancel or modify
Size: How many shares or contracts to trade
From an investor’s perspective, making a trade via a computer system is simple and easy.
However, the complex process behind the scenes sits on top of an impressive array of technology.
What was once associated with shouting traders and wild hand gestures in open outcry markets
has now become more closely associated with computerized trading strategies.
When you place an order to trade a financial instrument, the complex technology enables your
brokerage to interact with all the securities exchanges looking to execute the trade. Those
exchanges simultaneously interact with all the brokerages to facilitate trading activities.
For example, the Singapore Exchange (SGX), a Singaporean investment holding company, acts
through its central depository (CDP) as a central counterparty to all matched trades (mainly
securities) executed on the SGX ST Trading Engine, as well as privately negotiated married trades
that are reported to the clearing house for clearing on the trade date. Being a central
counterparty (CCP), CDP assumes the role of the seller to the buying clearing member and buyer
to the selling clearing member. CDP, therefore, takes the buyer’s credit risks and assumes the
seller’s delivery risks. This interposing of CDP as the CCP eliminates settlement uncertainty for
market participants. SGX provides a centralized order-driven market with automated order
routing, supported by decentralized computer networks. There are no designated market makers
(liquidity providers), and member firms act as brokers or principals for clearing and settlement.
Market Order
The market order is the most common transaction type in the stock markets. It is an instruction by an investor to a broker to buy or sell stock shares, bonds, or other assets at the best available price in the current financial market. This means a market order instructs the broker to buy or sell a security immediately at the current price. Since there will be plenty of willing buyers and sellers for large-cap stocks, futures, or ETFs, market orders are best used for buying or selling these financial instruments with high liquidity.
Since the market order is an instruction to trade a given quantity at the best price possible, the priority of the market-order trader is to execute the order immediately with no specific price limit. Thus, the main risk is the uncertainty of the ultimate execution price. Once submitted, a market order generally cannot be canceled because it is executed almost immediately.
Note that the electronic market orders don't wait. Upon receipt of a market order, the exchange will match it against the standing limit orders immediately until it is completely filled. Such immediacy characterizes market orders compared to limit orders (introduced in the following section). This means that when filling a market order, the order matching system will buy at the (ideally) lowest ask price or sell at the highest bid price, thus ending up paying the bid/ask spread.
Given the nature of market orders, they are particularly suitable for situations where the primary goal is to execute a trade quickly, rather than achieving a specific target price. This makes market orders especially useful in fast-moving or volatile market conditions, where getting in or out of a position promptly is crucial. However, the urgency of market orders also exposes investors to the risk of price slippage, which occurs when the actual execution price differs from the expected price due to rapid market fluctuations.
It is important for investors to understand that market orders offer no price protection, meaning that the execution price may be significantly different from the current market price, especially for illiquid or thinly traded instruments. In such cases, limit orders may be a more appropriate choice, as they allow investors to specify a maximum purchase price or a minimum sale price for their orders, providing some level of price control. However, limit orders come with the trade-off of potentially not being executed if the specified price is not met.
Limit Order
A limit order, which instructs the broker to buy or sell at the best price available only if the price is no worse than the price limit specified by the investor, is the main alternative to the market order for most individual investors. It is preferable when buying or selling a less frequently traded or highly volatile asset.
During regular hours, limit orders are arranged by the exchange according to price and time priority, that is, by the limit price and the time of receipt. When a buy market order arrives, the limit order at the front of the queue selling at the lowest ask price is matched first. When a sell market order arrives, the limit order at the front of the queue bidding at the highest bid price is executed first. If an order is not executable, it becomes a standing offer and is placed in a file called the limit order book.
A buy limit order is an order to purchase a financial instrument at or below a specified price, allowing traders to control how much they would pay for the instrument. In other words, the investor is guaranteed to pay that price or less by using a limit order to make a purchase.
Although the price is guaranteed, the order is not guaranteed to be filled. After all, a buy limit order will only be executed if the asking price is at or below the specified limit price. If the asset does not reach the specified price or moves too quickly through the price, the order is not filled and will be stacked into the limit order book, causing the investor to miss out on the trading opportunity. That is, by using a buy limit order, the investor is guaranteed to pay the buy limit order price or better but is not guaranteed to have the order filled.
The same reasoning applies to the sell limit order, where the investor will sell the financial instrument at or above a specified selling price. A sell limit order allows traders to set a minimum selling price for their financial instruments. In this case, the investor is guaranteed to receive the specified price or a better price for the sale, but there is no guarantee that the order will be executed. A sell limit order will only be filled if the bid price is at or above the specified limit price. If the asset does not reach the specified price or moves too quickly through the price, the order is not filled and will be stored in the limit order book, potentially causing the investor to miss out on the trading opportunity.
Limit orders offer more control over the execution price than market orders and can be
particularly useful when trading illiquid or volatile assets, where price slippage is more likely.
However, they also come with the risk that the order may not be executed if the specified price is not
not reached, potentially resulting in missed trading opportunities.
To maximize the chances of a limit order being executed, traders should carefully monitor
market conditions and adjust their limit prices accordingly. They may also consider using other
advanced order types, such as stop-limit orders or trailing stop-limit orders, which combine the
features of limit orders with additional conditions, providing even greater control over the
execution price and risk management.
Figure 2-1 Illustrating the limit order book that consolidates all standing limit orders (prices and quantities) from the buy side and
the sell side. A market maker is incentivized to reduce the gap by providing more liquidity to the market, serving as the liquidity
provider, and making the trades of this asset more executable
We can also look at the marketability of buy and sell orders at different ranges. As shown in
Figure 2-2, we divide the limit order book into five different regions: above the best offer, at the
best offer, between the best bid and best offer, at the best bid, and below the best bid. For a buy
order, it will be (easily) marketable if the price is at regions 1 and 2, since those eager to sell the
asset (at the bottom part of the top box) would love to see a buyer with an expected or even
higher bid. We call the buy order in the market if it lives within region 3, a situation in flux. Region
4 is borderline and is called at the market, representing the best bid of all the buyers in the limit
order book. When the price of the buy order drops to region 5, there is no marginal
competitiveness, and the order will simply be buried among the rest of the buy orders, leaving it
behind the market. The same reasoning applies to the marketability of sell orders as well.
Figure 2-2 Analyzing the marketability of buy and sell orders within different regions of the limit order book
It is important for traders and investors to understand the marketability of buy and sell
orders in these different regions so as to optimize their order execution strategies. By
strategically placing orders in the appropriate regions, traders can increase the likelihood of their
orders being executed at the desired price levels, thus minimizing transaction costs and better
managing trading risks. Furthermore, by monitoring the market dynamics and the depth of the
limit order book (the number of levels of buy and sell limit orders available in the order book at a
given point in time), traders can gain valuable insights into the market dynamics of the asset.
Stop Order
By default, a stop order is a market order conditioned on a preset stop price. A stop order
becomes a market order as soon as the current market price reaches or crosses the preset stop
price.
A stop order is always executed in the direction that the asset price is moving, assuming that
such movement will continue in its original direction. For instance, if the market for a particular
asset is moving downward, the stop order will be to sell at a preset price below the current
market price. This is called a stop-loss order, which is placed to limit potential losses when the
investor is in an open position of the asset. The stop-loss order will take the investor out of the
open position at a preset level if the market moves against the existing position.
Stop-loss orders are essential, especially when one cannot actively keep an eye on the market.
It’s thus recommended to always have a stop-loss order in place for any existing position in order
to gain protection from a sudden drop in price due to adverse market news. We can also call it a
sell-stop order, which is always placed below the current market price and is typically used to
limit a loss or protect a profit on a long stock position.
Alternatively, if the price is moving upward, the stop order will be to buy once the security
reaches a preset price above the current market price. This is called a stop-entry order, or buy-
stop order, which can be used to enter the market in the direction the market is moving. A buy-
stop order is always placed above the current market price.
Therefore, before entering a position, we can use a stop-entry (buy-stop) order to long an
asset if the market price exceeds the preset stop price, and use a sell-stop order to short an asset
if the market price drops below the preset stop price. If we are already in a long (or short)
position, we can use a sell-stop (or buy-stop) order to limit the loss of the position in case the
market price drops (or rises).
Also, note that stop orders can be subject to slippage, that is, the difference between the
expected execution price and the actual execution price. Since stop orders are triggered and
converted into market orders once the preset stop price is reached, there is a possibility that the
order may be executed at a worse price than initially anticipated, especially in fast-moving or
illiquid markets. As a result, slippage can lead to a larger loss or a smaller profit than originally
expected.
Let us look at one example. Say you observe that a particular stock has been moving in a
sideways range (a fairly stable range without forming any distinct trends over some period of
time) between $20 and $30, and you believe it will ultimately break out above the upper limit and move
higher. You would like to employ breakout trading, which means you will take a position within
the early stage of an upward-moving trend. In this case, you could place a stop-entry order above
the current upper limit of $30. The price of the stop-entry order can be set as $30.25 to allow for a
margin of error. Placing the stop-entry order gets you into the market once the sideways range is
broken to the upside. Also, now that you’re long in the position, if you’re a disciplined trader,
you’ll want to immediately establish a regular stop-loss sell order to limit your losses in case the
upward trend is false.
When placing a stop order, we have (unknowingly) entered into the world of algorithmic
trading. Here, the logic of algorithmic trading is simple: if the market price reaches or crosses the
stop price, issue a market order; else, keep checking the market price.
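This check-and-trigger logic can be sketched in a few lines of Python. The get_market_price callable below is a hypothetical stand-in for a live price feed; this is an illustration of the rule stated above, not production order-management code.

def run_stop_order(stop_price: float, side: str, get_market_price) -> str:
    """Poll the market price and issue a market order once the stop price is reached or crossed."""
    while True:
        price = get_market_price()
        if side == "sell" and price <= stop_price:   # sell-stop (stop-loss): triggers on a drop
            return "issue market sell order"
        if side == "buy" and price >= stop_price:    # buy-stop (stop-entry): triggers on a rise
            return "issue market buy order"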
Stop-Limit Order
A stop-limit order is similar to a stop order in that a stop price will activate the order. However,
unlike the stop order, which is submitted as a market order when elected, the stop-limit order is
submitted as a limit order. A stop-limit order combines the features of a stop order and a limit
order, providing more control over the execution price while still allowing for the possibility of
protecting against significant losses or locking in profits. Specifically, when the market price reaches the preset stop price, the stop-limit order becomes a limit order that will be executed at the specified limit price or better. This ensures that the order will not be executed at a price
worse than the limit price, thus mitigating the risk associated with market orders.
A stop-limit order is a conditional trade that combines the features of a stop order with those of a limit order and is used to mitigate risk. So a stop-limit order is a limit order contingent on a preset stop price and a limit price. A stop-limit order eliminates the price risk associated with a stop order where the execution price cannot be guaranteed. However, it exposes the investor to the risk that the order may never fill even if the stop price is reached. A stop-limit order gives traders precise control over when the order should be filled, but the order is not guaranteed to be executed. Traders often use stop-limit orders to lock in profits or limit downside losses, although they could “miss the market” altogether, resulting in missed opportunities if the asset's price moves in the desired direction but doesn't satisfy the limit price condition.
In summary, stop-limit orders offer a balance between limiting the execution price and stopping potential loss due to significant adverse market movements. However, they come with the risk of not being executed if the limit price is not met, potentially causing traders to miss out on potential profits or fail to limit their losses effectively.
Let us look at an example algorithm behind the stop-limit order. Suppose research shows that the slippage is usually three ticks. Regarding the algorithmic rule for a buy-stop-limit order, if the market price reaches or crosses the stop price, the system would issue a limit order with a limit price three ticks above the stop price. Otherwise, it will keep checking the market price. Regarding the algorithmic rule for a sell-stop-limit order, if the market price reaches or crosses the stop price, the system would issue a limit order with a limit price three ticks below the stop price. Otherwise, it will keep checking the market price.
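A minimal sketch of these two rules in Python, assuming a tick size of $0.01 purely for illustration:

TICK = 0.01  # assumed tick size for illustration

def check_buy_stop_limit(market_price: float, stop_price: float):
    # buy-stop-limit: once the stop is reached or crossed, submit a buy limit
    # order with a limit price three ticks above the stop price
    if market_price >= stop_price:
        return ("buy limit", stop_price + 3 * TICK)
    return None  # otherwise keep checking the market price

def check_sell_stop_limit(market_price: float, stop_price: float):
    # sell-stop-limit: once the stop is reached or crossed, submit a sell limit
    # order with a limit price three ticks below the stop price
    if market_price <= stop_price:
        return ("sell limit", stop_price - 3 * TICK)
    return None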
Pegged Order
A pegged order is a type of order that allows the limit price to be dynamic, adjusting
automatically based on a reference price. This can be particularly useful in spread trading or
other trading strategies that require staying in sync with the market’s best bid, best offer, or mid-
price.
The price in a limit order is fixed and static; we can only issue a new order to have a new limit price. However, there are situations when we would like the limit price to be dynamic. For example, suppose a trading strategy must trade at an offset of the best bid or best ask. But these two quotes fluctuate, and you want your limit order prices to change in sync with them. Pegged orders allow you to do just that.
Placing a pegged order requires specifying the reference price to track, along with an optional
differential offset. The differential offset can be a positive or negative multiple of the tick size that
represents the minimum price movement for the particular asset. The trading system will then
manage the pegged order by automatically modifying its price on the order book as the reference
price moves, maintaining the desired price relationship.
A pegged order is a limit order with a dynamic limit price. It allows traders to keep their orders in line with the changing market conditions without having to monitor and adjust their orders manually and constantly. This can be particularly beneficial in fast-moving markets or when trading strategies require maintaining specific price relationships with the best bid, best offer, or mid-price. However, it's essential to understand that pegged orders still carry the risk of not being executed if the market moves unfavorably, and the dynamic limit price never reaches a level at which the order can be filled.
The pegged order is often used in spread trading, which involves the simultaneous buying and selling of related securities as a unit, designed to profit from a change in the spread (price difference) between the two securities. Here, spread trading is a strategy that takes advantage of the price difference, or spread, between two related securities. In this strategy, a trader simultaneously buys one security and sells another security to profit from changes in the spread
between the two. The objective is to capitalize on the temporary mispricing or changing price
relationship between the securities rather than betting on the direction of the individual
securities themselves.
So how does a pegged order work? When entering a pegged order, the trader must specify a reference price to track, which could be the best bid, best offer, or mid-price. Best bid and best offer pegs may track at a differential offset, which is specified as a multiple of the whole
tick size. This means that the trading system will manage the pegged order by automatically
modifying the pegged order’s price on the order book as the reference price moves.
Let us look at an example of a pegged order. Suppose your strategy requires a buy limit order pegged at three ticks below the current best bid and a sell limit order pegged at two ticks above the current best offer. When the bid price changes, the pegged order
becomes a composite order comprising
A cancelation order of total order size (one buy limit order and one sell limit order)
A new buy limit order with a limit price pegged at the new best bid less an offset of three ticks,
and a new sell limit order with a limit price pegged at the new best ask plus an offset of two
ticks
Let’s say the current best bid is $100, and the best offer is $101. According to this strategy, we
will place a buy limit order at $100 – (3 ticks) and a sell limit order at $101 + (2 ticks). Assuming
each tick is $0.01, the buy limit order will be placed at $99.97, and the sell limit order will be
placed at $101.02.
Now, if the best bid changes to $100.50 and the best offer changes to $101.50, the pegged
orders will automatically adjust to the new reference prices. Specifically, the buy limit order will
now be placed at $100.50 – (3 ticks) = $100.47, and the sell limit order will be placed at $101.50 +
(2 ticks) = $101.52.
The pseudocode for the algorithm behind a pegged buy order with an offset of x is as follows:
1. If the bid price increases to B+
   a. Cancel the current limit order
   b. Submit a buy limit order at a price of B+ − x
2. Else
   a. If the bid price decreases to B−
      i. If the current limit order is not filled
         1. Cancel the current limit order
         2. Submit a buy limit order at a price of B− − x
      ii. Else
         1. Keep checking whether the bid price has changed
When the bid price changes, the algorithm checks if the change is an increase or a decrease. If the bid price increases, the current limit order is canceled, and a new buy limit order is submitted at the new bid price minus the offset x. If the bid price decreases, the algorithm first checks whether the current limit order has been filled. If the current limit order is not filled, the order is canceled, and a new buy limit order is submitted at the new bid price minus the offset x. If the order is filled, no further action is needed. The algorithm will continue monitoring the bid price for changes and adjust the buy limit order accordingly.
Pay attention to the inner if condition in the else statement. Here, we check whether the current limit order is filled. Since the price has dropped, the standing limit order may already have been executed if the market fell to the buy limit price.
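The following is a minimal Python sketch of this repricing rule for a pegged buy order; the function name and signature are illustrative assumptions rather than code from the book:

def update_pegged_buy_price(old_bid, new_bid, x_ticks, order_filled, tick=0.01):
    """Return the new pegged buy limit price, or None if the standing order should be left alone.

    Mirrors the pseudocode above: re-peg whenever the best bid rises, and also when it
    falls while the standing order is still unfilled.
    """
    if new_bid > old_bid:                         # bid increased: cancel and re-peg
        return new_bid - x_ticks * tick
    if new_bid < old_bid and not order_filled:    # bid decreased and order not yet filled
        return new_bid - x_ticks * tick
    return None                                   # bid unchanged, or order already filled

# example: best bid moves from 100.00 to 100.50 with a three-tick offset
print(update_pegged_buy_price(100.00, 100.50, 3, order_filled=False))  # 100.47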
We can similarly write out the pseudocode for the algorithm behind a pegged sell order with an offset of x as follows:
1. If the ask price decreases to A−
   a. Cancel the current limit order
   b. Submit a sell limit order at a price of A− + x
2. Else
   a. If the ask price increases to A+
      i. If the current limit order is not filled
         1. Cancel the current limit order
         2. Submit a sell limit order at a price of A+ + x
      ii. Else
         1. Keep checking whether the ask price has changed
Price Impact
It is important to note the potential price impact of large market orders, which tend to move prices because there is often not enough liquidity for a large order to fill entirely at the best price. This phenomenon is known as price slippage, which occurs when the actual execution price of an order differs from the expected price due to insufficient liquidity.
For example, suppose that a 10K-share market buy order arrives, and the best offer is $100
for 5K shares. Half the order will fill at $100, but the next 5K shares will have to fill at the next
price in the book, say at $100.02 (where we assume there are also 5K shares offered). The
volume-weighted average price for the order will be $100.01, which is larger than $100.00. Thus,
the price might move further following the trade.
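The arithmetic in this example can be reproduced with a short snippet; the prices and sizes from the example are hard-coded here purely for illustration:

# a 10K-share market buy order walking the ask side of the book
book_asks = [(100.00, 5_000), (100.02, 5_000)]  # (price, shares available) at each level
order_size = 10_000

filled, cost = 0, 0.0
for price, qty in book_asks:
    take = min(qty, order_size - filled)   # take as much as this level offers
    filled += take
    cost += take * price
    if filled == order_size:
        break

print(cost / filled)  # 100.01: the order's VWAP, one cent above the best offer of 100.00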
To mitigate the impact of large market orders on prices, traders can consider using
alternative order types or strategies, such as using limit orders to control the price at which their
orders get executed or iceberg orders that divide large orders into smaller parts, thus reducing
the visibility of the order’s total size.
Order Flow
In trading, order flow is an important concept. It is the overall trade direction at any given period of time. Ex post, order flow can be inferred from the trade direction. For example, a trade is said
to be buyer initiated if the trade took place at the ask price or higher. In this case, the buyer is
willing to absorb the bid/ask spread and pay a higher price. The trade sign is +1.
Conversely, a trade is seller initiated if the trade occurred at the bid price or lower. In this case,
the seller is willing to absorb the bid/ask spread and sell for a low price. The trade sign is –1.
In essence, the order flow suggests the net direction of the market. When there are more buy (sell) market orders (MOs) than sell (buy) MOs, the market direction is typically up (down). Many papers in the literature have provided ample evidence of this intuitive observation. It is also well known among traders. By analyzing the order flow, traders can identify buying and selling pressure and anticipate potential price movements. The concept of order flow is based on the premise that the net direction of market orders can provide insights into market trends and potential price changes.
A positive net order flow, where there are more buy market orders than sell market orders, generally indicates a bullish market with upward price movement. Conversely, a negative net order flow, where there are more sell market orders than buy market orders, signals a bearish market with a downward price movement. This correlation between order flow and market direction is well documented in academic literature and widely recognized by traders.
So how do we measure the direction of market order flows? One way is to use the net trade sign: the total number of buyer-initiated trades less the total number of seller-initiated trades. We can also use the net trade volume sign: the aggregate size of buyer-initiated trades less the aggregate size of seller-initiated trades.
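Both measures are easy to compute once each trade carries a sign. A minimal sketch, assuming a trades DataFrame with 'sign' (+1 or -1) and 'size' columns (both column names are assumptions):

import pandas as pd

def order_flow_measures(trades: pd.DataFrame):
    # net trade sign: count of buyer-initiated trades minus count of seller-initiated trades
    net_trade_sign = trades["sign"].sum()
    # net trade volume sign: signed trade sizes aggregated over all trades
    net_volume_sign = (trades["sign"] * trades["size"]).sum()
    return net_trade_sign, net_volume_sign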
That being said, if we can forecast the direction of order flow ex ante, the trade direction in the future can be anticipated. In other words, a positive order flow suggests the market is likely to go up, while a negative order flow suggests the market is likely to go down.
Therefore, we can use some models to forecast the order flow on the fly. A simple model is to generate a trading signal if the forecasted order flow for the next period exceeds some threshold. This threshold can be determined via backtesting (to be covered in a later chapter).
In the following section, we will look at a sample limit order book data and develop familiarity
with both concepts and implementation.
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
df = np.loadtxt('data/Train_Dst_NoAuction_DecPre_CF_7.txt')
Listing 2-1 Loading the LOB dataset
We can access the dimensions of the sample dataset via the shape attribute:
>>> df.shape
(149, 254750)
In this dataset, the rows indicate features such as asset price and volume, and the columns
indicate timestamps. Typically, we would use the rows to indicate observation-level data per
timestamp and use the columns to represent features or attributes. We would need to transpose
the dataset.
Also, based on the documentation on the dataset, the first 40 rows carry 10 levels of bid and
ask from the order book, along with the volume of each particular price point. We have a total of
40 entries per timestamp since each side (buy and sell) contains 10 price levels, and each level
includes two points: price and volume. In other words, the limit order book in a single time
snapshot shows up as an array of 40 elements.
The following code prints out price-volume data of ten price levels for the sell and the buy sides at the first timestamp:
>>> df[:40,0]
array([0.2615 , 0.00353, 0.2606 , 0.00326, 0.2618 , 0.002 , 0.2604 ,
0.00682, 0.2619 , 0.00164, 0.2602 , 0.00786, 0.262 , 0.00532,
0.26 , 0.00893, 0.2621 , 0.00151, 0.2599 , 0.00159, 0.2623 ,
0.00837, 0.2595 , 0.001 , 0.2625 , 0.0015 , 0.2593 , 0.00143,
0.2626 , 0.00787, 0.2591 , 0.00134, 0.2629 , 0.00146, 0.2588 ,
0.00123, 0.2633 , 0.00311, 0.2579 , 0.00128])
Since each level consists of a price-volume pair for both sides (buy and sell), we know that for the first four entries, 0.2615 indicates the ask price, 0.00353 the volume at that ask price level, 0.2606 the bid price, and 0.00326 the volume at that bid price level. Every two entries constitute a price-volume pair, and every price level corresponds to two consecutive pairs. We have a total of 10 price levels, corresponding to 20 price-volume pairs, including 10 for the buy side and 10 for the sell side. Also, we know that price levels on the sell side should always be higher than on the buy side, and a quick check verifies this.
Let us extract the price-volume pairs across all timestamps. Remember to transpose the dataset, which is achieved by accessing the .T attribute. The final result is then converted into a Pandas DataFrame format for better processing later. Remember to print a few rows of the transformed dataset in df2 for a sanity check:
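The listing that performs this step is not reproduced in this excerpt; a minimal sketch consistent with the shape of df2 reported below might look as follows:

# keep the first 40 rows (10 levels of price-volume pairs per side), transpose so that
# rows are timestamps and columns are features, then wrap the result in a DataFrame
df2 = pd.DataFrame(df[:40, :].T)
print(df2.head())
print(df2.shape)  # expected: (254750, 40)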
# class labels for the mid-price movement; assumed here to be up/stationary/down,
# encoded as 1, 2, 3 in the label rows of the dataset
labels = ['up', 'stationary', 'down']

def printdistribution(dataset):
    # one histogram per prediction horizon k; rows 144-148 hold the labels
    fig = make_subplots(rows=1, cols=5,
                        subplot_titles=("k=10", "k=20", "k=30", "k=50", "k=100"))
    fig.add_trace(go.Histogram(x=dataset[144, :], histnorm='percent'), row=1, col=1)
    fig.add_trace(go.Histogram(x=dataset[145, :], histnorm='percent'), row=1, col=2)
    fig.add_trace(go.Histogram(x=dataset[146, :], histnorm='percent'), row=1, col=3)
    fig.add_trace(go.Histogram(x=dataset[147, :], histnorm='percent'), row=1, col=4)
    fig.add_trace(go.Histogram(x=dataset[148, :], histnorm='percent'), row=1, col=5)
    fig.update_layout(
        title="Label distribution of mid-point movement",
        width=700,
        height=300,
        showlegend=False
    )
    fig.update_xaxes(ticktext=labels, tickvals=[1, 2, 3], tickangle=-45)
    fig.update_yaxes(visible=False, showticklabels=False)
    fig.layout.yaxis.title.text = 'percent'
    fig.show()

>>> printdistribution(df)
Listing 2-2 Plotting the label distribution of the mid-point movement
Running the code generates Figure 2-3. The plot suggests an increasingly obvious trend for
upward and downward movements as the lookahead window gets large.
Figure 2-3 Histogram of three types of movement across different lookahead windows in the limit order book
>>> df2.shape
(254750, 40)
Now we would like to dissect this DataFrame and allocate each component to a separate
DataFrame. In Listing 2-3, we subset the DataFrame based on the sequence of columns for each
component, resulting in four DataFrames: dfAskPrices, dfAskVolumes, dfBidPrices, and
dfBidVolumes. Subsetting the DataFrame is done with the loc indexer, supplying the corresponding row and column labels.
>>> dfAskPrices.loc[0,:]
0 0.2615
4 0.2618
8 0.2619
12 0.2620
16 0.2621
20 0.2623
24 0.2625
28 0.2626
32 0.2629
36 0.2633
Name: 0, dtype: float64
>>> dfBidPrices.loc[0,:]
2 0.2606
6 0.2604
10 0.2602
14 0.2600
18 0.2599
22 0.2595
26 0.2593
30 0.2591
34 0.2588
38 0.2579
Name: 0, dtype: float64
The results show that the ask prices follow an increasing sequence, while the bid prices follow
a decreasing sequence. Since we often work with price data that follow an increasing sequence in
analyses such as plotting, we need to reverse the order of the bid prices. The order could be
reversed by rearranging the sequence of columns in the DataFrame. The current sequence of the
columns is
>>> dfBidPrices.columns
Int64Index([2, 6, 10, 14, 18, 22, 26, 30, 34, 38], dtype='int64')
>>> dfBidPrices.columns[::-1]
Int64Index([38, 34, 30, 26, 22, 18, 14, 10, 6, 2], dtype='int64')
Now let us reverse both bid prices and volumes, where we passed the reversed column names
to the respective DataFrames based on column selection:
dfBidPrices = dfBidPrices[dfBidPrices.columns[::-1]]
dfBidVolumes = dfBidVolumes[dfBidVolumes.columns[::-1]]
Examining the first row of dfBidPrices shows an increasing price trend now:
>>> dfBidPrices.loc[0,:]
38 0.2579
34 0.2588
30 0.2591
26 0.2593
22 0.2595
18 0.2599
14 0.2600
10 0.2602
6 0.2604
2 0.2606
Name: 0, dtype: float64
Note that the index for each entry still stays the same. We may need to reset the index depending on the specific follow-up process.
Since the price increases from the bottom (buy side) to the top (sell side) in a limit order
book, we can join the price tables from both sides to show the continuum. There are multiple
ways to join two tables, and we choose outer join to avoid missing any entry. Listing 2-4 joins the
price and volume tables from both sides, followed by renaming the columns.
We can print out the first row of dfPrices to check the prices across all levels at the first timestamp:
>>> dfPrices.loc[0,:]
1 0.2579
2 0.2588
3 0.2591
4 0.2593
5 0.2595
6 0.2599
7 0.2600
8 0.2602
9 0.2604
10 0.2606
11 0.2615
12 0.2618
13 0.2619
14 0.2620
15 0.2621
16 0.2623
17 0.2625
18 0.2626
19 0.2629
20 0.2633
Name: 0, dtype: float64
The result shows that all prices are in increasing order. Since the first ten columns show the buy-side prices and the last ten columns belong to the sell-side prices, the best bid price would be the highest price on the buy side, that is, 0.2606, while the best ask price (best offer) would be the lowest price on the sell side, that is, 0.2615. The difference between the two price points gives us the bid/ask spread for the current snapshot, and its movement across different snapshots indicates market dynamics.
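As a quick illustration (using the column labels 1 through 20 from the joined table), the spread at any snapshot can be read off directly:
# Best bid is the highest buy-side level (column 10); best ask is the lowest
# sell-side level (column 11); their difference is the bid/ask spread
spread = dfPrices[11] - dfPrices[10]
# For the first snapshot: 0.2615 - 0.2606 = 0.0009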
We can plot these prices as time series, where each price curve represents the evolution of the price at a particular level on the buy or sell side. As a matter of fact, these curves should not intersect with each other; otherwise, the orders would have been transacted and jointly removed from that price level. Listing 2-5 plots the 20 price curves for the first 50 timestamps.
fig = go.Figure()
for i in dfPrices.columns:
    fig.add_trace(go.Scatter(y=dfPrices[:50][i]))
fig.update_layout(
    title='10 price levels of each side of the orderbook',
    xaxis_title="Time snapshot index",
    yaxis_title="Price levels",
    height=500,
    showlegend=False,
)
>>> fig.show()
Listing 2-5 Visualizing sample price curves
Running the code generates Figure 2-4. Note the big gap in the middle; this is the bid/ask spread of the limit order book. The figure also tells us something about market dynamics. For example, at time step 20, we observe a sudden jump in ask prices, which may be caused by a certain event in the market, causing the sellers to raise their prices as a whole.
Figure 2-4 Visualizing the 10 price curves for both sides for the first 50 time snapshots. Each curve represents the price evolution at a particular price level and will not intersect with each other. The big gap in the middle represents the bid/ask spread of the limit order book
Note that the graph is interactive, offering the usual set of flexible controls (such as zooming, highlighting via selection, and additional data upon hovering) based on the plotly library.
We can also plot the volume data as stacked bar charts. The following code snippet retrieves the first 5 snapshots of volume data and plots the 20 levels of volumes as stacked bars:
px.bar(dfVolumnes.head(5).transpose(), orientation='h')
Figure 2-5 Plotting the first 5 snapshots of volume as bar charts across all 20 price levels
Let us plot the volume at each price level for a particular time snapshot. We can use the
iloc() function to access a particular portion based on the positional index. For example, the
following code prints out the first row of dfPrices:
>>> dfPrices.iloc[0]
1 0.2579
2 0.2588
3 0.2591
4 0.2593
5 0.2595
6 0.2599
7 0.2600
8 0.2602
9 0.2604
10 0.2606
11 0.2615
12 0.2618
13 0.2619
14 0.2620
15 0.2621
16 0.2623
17 0.2625
18 0.2626
19 0.2629
20 0.2633
Name: 0, dtype: float64
We can plot the volume data of a particular timestamp as bars. As shown in Listing 2-6, we use
list comprehension to format the prices to four decimal places before passing them to the y
argument in the go.Bar() function.
colors = ['lightslategrey',] * 10
colors = colors + ['crimson',] * 10
fig = go.Figure()
timestamp = 0
fig.add_trace(go.Bar(
    y=['price-'+'{:.4f}'.format(x) for x in dfPrices.iloc[timestamp].tolist()],
    x=dfVolumnes.iloc[timestamp].tolist(),
    orientation='h',
    marker_color=colors
))
fig.update_layout(
    title='Volume of 10 price levels of each side of the orderbook',
    xaxis_title="Volume",
    yaxis_title="Price levels",
    # template='plotly_dark'
)
fig.show()
Listing 2-6 Visualizing the volume data
We can also combine the previous two charts together, as shown in Listing 2-7.
# The combined figure needs a two-panel subplot layout; the creation of 'fig' is
# assumed here, as the opening line of the original listing is not reproduced
fig = make_subplots(rows=1, cols=2)
for i in dfPrices.columns:
    fig.add_trace(go.Scatter(y=dfPrices.head(20)[i]), row=1, col=1)
timestamp = 0
fig.add_trace(go.Bar(
    y=['price-'+'{:.4f}'.format(x) for x in dfPrices.iloc[timestamp].tolist()],
    x=dfVolumnes.iloc[timestamp].tolist(),
    orientation='h',
    marker_color=colors
), row=1, col=2)
fig.update_layout(
    title='10 price levels of each side of the orderbook for multiple time points, bar size represents volume',
    xaxis_title="Time snapshot",
    yaxis_title="Price levels",
    template='plotly_dark'
)
fig.show()
Listing 2-7 Combining multiple charts together
Figure 2-7 Combining the price and volume data for each price level
Visualizing Price Movement
The price at each price level may move across different timestamps as a reflection of market dynamics. Visualizing the whole time series of the price index may be too granular at first glance, since there are too many observations, given the nature of the ultra-high-frequency data. Instead, we can pick a fixed-size window to plot the price at a particular period within the window and then move the window forward in time to show the change in price. The rolling window can then be used to generate an animation of prices moving up and down.
Listing 2-8 achieves the desired plotting effect. Here, we set the window length to 100 and
choose the second price level for visualization. The animation is essentially a collection of frames
changing from one to another. Thus, we supply the corresponding sequence of data for each
frame in the animation.
widthOfTime = 100
priceLevel = 1
fig = go.Figure(
    data=[go.Scatter(x=dfPrices.index[:widthOfTime].tolist(),
                     y=dfPrices[:widthOfTime][priceLevel].tolist(),
                     name="frame",
                     mode="lines",
                     line=dict(width=2, color="blue")),
          ],
    layout=go.Layout(width=1000, height=400,
                     # xaxis=dict(range=[0, 100], autorange=False, zeroline=False),
                     # yaxis=dict(range=[0, 1], autorange=False, zeroline=False),
                     title="10 price levels of each side of the orderbook",
                     xaxis_title="Time snapshot index",
                     yaxis_title="Price levels",
                     template='plotly_dark',
                     hovermode="closest",
                     updatemenus=[dict(type="buttons",
                                       showactive=True,
                                       x=0.01,
                                       xanchor="left",
                                       y=1.15,
                                       yanchor="top",
                                       font={"color": 'blue'},
                                       buttons=[dict(label="Play",
                                                     method="animate",
                                                     args=[None])])]),
    frames=[go.Frame(
        data=[go.Scatter(
            x=dfPrices.iloc[k:k+widthOfTime].index.tolist(),
            y=dfPrices.iloc[k:k+widthOfTime][priceLevel].tolist(),
            mode="lines",
            line=dict(color="blue", width=2))
        ]) for k in range(widthOfTime, 1000)]
)
fig.show()
Listing 2-8 Animating the price movement
Running the code generates Figure 2-8. We can click the Play button to start animating the
line chart, which will change shape as we move forward.
Figure 2-8 Animating the price changes of a selected price level via a rolling window of 100 timestamps
In addition, we can also plot the animation of the change in volume across all the price levels, as shown in Listing 2-9. The change in volume also indicates the market dynamics in terms of supply and demand, although less directly than the price itself.
timeStampStart = 100
fig = go.Figure(
    data=[go.Bar(y=['price-'+'{:.4f}'.format(x) for x in
                    dfPrices[:timeStampStart].values[0].tolist()],
                 x=dfVolumnes[:timeStampStart].values[0].tolist(),
                 orientation='h',
                 name="priceBar",
                 marker_color=colors),
          ],
    layout=go.Layout(width=800, height=450,
                     title="Volume of 10 buy, sell price levels of an orderbook",
                     xaxis_title="Volume",
                     yaxis_title="Price levels",
                     template='plotly_dark',
                     hovermode="closest",
                     updatemenus=[dict(type="buttons",
                                       showactive=True,
                                       x=0.01,
                                       xanchor="left",
                                       y=1.15,
                                       yanchor="top",
                                       font={"color": 'blue'},
                                       buttons=[dict(label="Play",
                                                     method="animate",
                                                     args=[None])])]),
    frames=[go.Frame(
        data=[go.Bar(y=['price-'+'{:.4f}'.format(x) for x in
                        dfPrices.iloc[k].values.tolist()],
                     x=dfVolumnes.iloc[k].values.tolist(),
                     orientation='h',
                     marker_color=colors)],
        layout=go.Layout(width=800, height=450,
                         title="Volume of 10 buy, sell price levels of an orderbook [Snapshot=" + str(k) + "]",
                         xaxis_title="Volume",
                         yaxis_title="Price levels",
                         template='plotly_dark',
                         hovermode="closest")) for k in range(timeStampStart, 500)]
)
fig.show()
Listing 2-9 Animating the volume movement
Figure 2-9 Visualizing the change in the volume across all the price levels
Summary
In this chapter, we covered the basics of the electronic market and the different types of electronic orders, including market order, stop order, limit order, and other forms of dynamic order (e.g., pegging, trailing stop, market if touched, limit, and cancelation). We discussed the mechanism of the order matching system and order flow.
In the second section, we looked at real LOB data and discussed different ways to visualize the price and volume data, such as their movement across time. Working with the actual data by first plotting them out and performing some initial analysis is a common and important first step in the whole pipeline of devising and implementing trading strategies.
Exercises
Write a function in Python to illustrate the algorithm of a pegged buy order and sell order. (Hint: Start by defining your own input and output.)
What's the difference between the market if touched (MIT) order and the stop order?
How do we calculate the mid-price in a limit order book? Implement the logic in code. (Hint: Start by defining your own input and output.)
Describe how a buy trailing stop order works.
Should the trailing stop-loss order be placed above or below the current market price for an
investor in a long position? A short position?
Obligations at Maturity
There are two types of settlement upon expiration of a futures (and
options) contract: physical delivery and cash settlement. Such derivative
contracts will either be physically delivered or cash-settled.
The first type is the physical delivery of the underlying asset. A deliverable futures contract stipulates that the buyer in the long position of the futures contract will pay the agreed-upon price to the seller, who in turn will deliver the underlying asset to the buyer on the predetermined date (settlement date of the futures contract). This process is called delivery, where the actual underlying asset needs to be delivered upon the specified delivery date, rather than being traded out with offsetting contracts.
For example, a buyer enters a one-year crude oil futures contract with an opposing seller at a price of $60. We know that one futures contract corresponds to 1000 barrels of crude oil. This means the buyer is obligated to purchase 1000 barrels of crude oil from the seller, regardless of the commodity's spot price on the settlement date. If the spot price of the crude oil on the agreed settlement date one year later falls to $58, the long contract holder loses a total of ($60 − $58) × 1000 = $2,000, and the short position holder gains $2,000. Conversely, if the spot price rises to $65 per barrel, the long position holder gains ($65 − $60) × 1000 = $5,000, and the short position holder loses $5,000.
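The arithmetic of this example can be expressed in a few lines of Python; the variable names below are chosen for illustration only:
contract_size = 1000      # barrels of crude oil per futures contract
futures_price = 60.0      # agreed futures price per barrel

def long_pnl(spot_at_settlement):
    # Profit (or loss) of the long position at settlement;
    # the short position earns the exact opposite amount
    return (spot_at_settlement - futures_price) * contract_size

>>> long_pnl(58)   # spot falls to $58: the long loses $2,000
-2000.0
>>> long_pnl(65)   # spot rises to $65: the long gains $5,000
5000.0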
The second type is cash settlement. When a futures contract is cash-
settled, the net cash position of the contract on the expiry date is
transferred between the buyer and the seller. It permits the buyer and
seller to pay the net cash value of the position on the delivery date.
Take the previous case, for example. When the spot price of the crude
oil drops to $58, the long position holder will lose $2000, which happens
by debiting $2000 from the buyer’s account and crediting this amount to
the seller’s account. On the other hand, when the spot price rises to $65,
the account of the long position holder will be credited $5000, which
comes from debiting the account of the short position holder.
It is important to understand that the majority of futures contracts
are not held until maturity, and most participants in the futures market
do not actually take or make delivery of the underlying asset. Instead,
they are traded out before the settlement date. Traders and investors
often choose to close their positions before the contract’s expiration date
to avoid the obligations associated with physical delivery or cash
settlement. This can be achieved by entering into an offsetting
transaction that effectively cancels out the original position. For example,
a trader with a long position in a futures contract can sell an identical
contract to offset the position, while a trader with a short position can
buy an identical contract to close the position.
The process of closing out a futures position before maturity is a common practice in the market, as it allows participants to lock in gains or limit losses without having to deal with the actual delivery or cash settlement of the underlying asset. This flexibility is one of the key features of futures trading, as it enables market participants to manage their risk exposure and capitalize on market opportunities efficiently.
In conclusion, while futures contracts carry obligations at maturity in the form of physical delivery or cash settlement, most participants in the futures market choose to close their positions before the expiration date. By engaging in offsetting transactions, traders and investors can effectively manage their risk exposure and profit from price movements in the underlying asset without having to deal with the logistics of taking or making the delivery.
Clearing House
Farmers who sell futures contracts do not sell directly to the buyers. Rather, they sell to the clearing house of the futures exchange. As a designated intermediary between a buyer and seller in the financial market, the clearing house validates and finalizes each transaction, ensuring that both the buyer and the seller honor their contractual obligations. The clearing house thus guarantees that all of the traders in the futures market will honor their obligations, thus avoiding potential counterparty risk.
The clearing house serves this role by adopting the buyer’s position
to every seller and the seller’s position to every buyer. Every trader in the
futures market has obligations only to the clearing house. The clearing
house takes no active position in the market, but interposes itself
between all parties to every transaction. As the middleman, the clearing
house provides the security and efficiency integral to financial market
stability. So as far as the farmers are concerned, they can sell their goods
to the clearing house at the price of the futures contract when the
contract expires.
The clearing house will then match and confirm the details of the trades executed on the exchange, including the contract size, price, and expiration date, ensuring that all parties have accurate and consistent information. Order matching and confirmation is thus one of the main roles of a clearing house.
The clearing house of the futures market also has a margin requirement, which is a sum of the deposit that serves as the minimum maintenance margin for the (clearing) member of the exchange. All members of an exchange are required to clear their trades through the clearing house at the end of each trading session and satisfy the margin requirement to cover the corresponding minimum balance requirement. Otherwise, the member will receive a margin call to top up the remaining balance when the margin account runs low due to fluctuation in asset price. Clearing houses thus collect and monitor margin requirements from their members, ensuring that all participants have sufficient collateral to cover potential losses. This helps to maintain the financial stability of the market and reduces the likelihood of default.
Figure 3-1 illustrates the clearing house as a middle party between
the buyer and the seller.
Figure 3-1 Illustrating the role of the clearing house as an intermediary between buyers and sellers
in a futures market
Mark-to-Market
Mark-to-market involves updating the price of a futures contract to reflect its current market value rather than the book value, so as to ensure that margin requirements are being met. If the current market value of the futures contract causes the margin account to fall below its required level, the trader will receive a margin call from the exchange to top up the remaining balance.
Mark-to-market is a process of pricing futures contracts at the end of every trading day. Made to accounts with open futures positions, the cash adjustment in mark-to-market reflects the day's profit or loss, based on the settlement price of the product, and is determined by the exchange. Since mark-to-market adjustments affect the cash balance in a futures account, the margin requirement for the account is being assessed on a daily basis to continue holding an open position.
Let us look at a mark-to-market example and understand the daily change in the price of the futures contract due to fluctuating prices in the
underlying asset. First, note the two counterparties on either side of a
futures contract, that is, a long position trader and a short position trader.
The long trader goes bullish as the underlying asset is expected to
increase in price, while the trader shorting the contract is considered
bearish due to the expected drop in the price of the underlying asset.
The futures contract may go up or down in value at the end of the trading day. When its price goes up, the long margin account increases in value due to mark-to-market, with the daily gain credited to the margin account of the long position trader. Correspondingly, the short position trader on the opposing side will suffer a loss of an equal amount, which is debited from the margin account.
Similarly, when the price of the futures contract goes down, the long
margin account decreases in value due to mark-to-market, with the daily
loss debited from the margin account of the long position trader. This
amount will be credited to the margin account of the short position
trader, who will realize a gain of an equal amount.
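As a small sketch of the mechanics, the daily cash flows between the two margin accounts can be computed from the day-over-day changes in the settlement price; the settlement prices below are hypothetical:
import numpy as np

contract_size = 1000
settle = np.array([60.0, 60.5, 59.8, 61.2])    # hypothetical daily settlement prices

daily_move = np.diff(settle)                   # approximately [0.5, -0.7, 1.4]
long_flow = daily_move * contract_size         # credited to (or debited from) the long account
short_flow = -long_flow                        # mirror image on the short account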
By updating the price of a futures contract to reflect its current market value, the exchange can monitor the risk exposure of traders in real time. This helps to ensure that margin requirements are being met and that traders have enough funds to cover their positions, which essentially reduces risk exposure to the traders. This also allows traders to accurately assess their profit or loss and make informed decisions about their positions.
Figure 3-2 illustrates the two types of traders with an open position in the same futures contract and their respective profit and loss due to mark-to-market.
Figure 3-2 Illustrating the mark-to-market process and the resulting effect on the margin account of long and short position traders for the same futures contract
Note that the margin account changes the balance daily due to gain/loss from the mark-to-market exercise. Although the final settlement price at the delivery date could be different from the intended price upon entering the futures position, the traders on both sides would still end up transacting at an effective price equal to the initially intended price, thus hedging the risk of price fluctuations.
Now let us look at how to price this derivative product, starting with
its similar twin: forward contract.
As time passes by, the value of each position will evolve. Specifically, the forward position becomes −F + S_T since we would buy one asset valued at S_T for a price of F. Our stock position becomes −S_T due to the change in the stock price, and the cash position becomes S_t e^{r(T − t)}.
Now, using the no-arbitrage argument, we would end up with zero value in our portfolio since we started with zero value. Adding the value of the three positions at time T gives the total portfolio value of −F + S_t e^{r(T − t)}. And by equating it to zero, we have F = S_t e^{r(T − t)}, thus completing the pricing of the forward contract using the no-arbitrage argument.
This is the formula for the price of a forward contract. It demonstrates
that the forward price is determined by the current price of the
underlying asset, the risk-free interest rate, and the time until the
contract expires. By using this formula, both parties in a forward contract
can agree on a fair price that eliminates arbitrage opportunities and
re lects the true value of the underlying asset.
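A minimal sketch of this pricing formula in code, assuming continuous compounding and no storage costs or dividends, might look as follows:
import numpy as np

def forward_price(spot, risk_free_rate, time_to_delivery):
    # Fair forward price F = S_t * exp(r * (T - t)) under the no-arbitrage argument
    return spot * np.exp(risk_free_rate * time_to_delivery)

>>> round(forward_price(spot=100, risk_free_rate=0.05, time_to_delivery=1.0), 4)
105.1271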
It is interesting to note that the stock and cash positions jointly
constitute a replicating portfolio that offsets the randomness in the
payoff function of the forward contract at the delivery date. This means
that no matter what the price of the forward contract will be in the future,
we will always be able to use another replicating portfolio to deliver the
same payoff, as if we were in a position of the forward contract. This is
called pricing by replication.
Let us see what happens if the price of the forward is not equal to the stock price compounded at the continuously compounded interest rate. We can argue about arbitrage opportunities based on the riskless profit from the buy-low-sell-high principle. When F > S_t e^{r(T − t)}, we can borrow an amount of S_t and use the money to short a forward contract that allows us to sell one unit of the underlying asset at price F. Upon reaching the delivery date, we receive a total of F by selling the asset, pay back the borrowed money with interest S_t e^{r(T − t)}, and earn a net profit of F − S_t e^{r(T − t)}. This is arbitrage, where we made a riskless profit by taking advantage of the price difference at the future time T.
Similarly, when F < S_t e^{r(T − t)}, the forward contract is cheaper, and the asset is more expensive. In that case, we again exercise the buy-low-sell-high principle by longing a forward contract at time t that allows us to buy one unit of the underlying asset at price F and time T. We will also short one unit of the underlying asset at time t to gain a total amount of S_t, which further grows to S_t e^{r(T − t)} upon reaching the delivery date. When the contract expires, we will close the short position in the underlying asset by purchasing one unit of the asset for a price of F. We get to keep the remaining balance S_t e^{r(T − t)} − F, thus also establishing the arbitrage argument and ensuring a riskless profit.
Note that the forward price is equal to the spot price of the underlying asset when the delivery date coincides with the current time t. To see this, simply set T = t, and we have F = S_t e^{r(t − t)} = S_t.
In a nutshell, the future net cash flow predetermined or fixed in advance (today) must equal today's net cash flow to annihilate arbitrage opportunities. The no-arbitrage argument gives a fair price for the forward contract.
Figure 3-7 Illustrating the price dynamics of the futures contract in backwardation
# For visualisation
import matplotlib.pyplot as plt
plt.style.use('seaborn-darkgrid')
%matplotlib inline
Let us plot the closing price via Listing 3-2. Note the use of the fontsize argument in adjusting the font size in the figure.
Note that the DataFrame has two levels of columns, with the first level specifying the symbol name and the second one showing the different price points.
Similarly, we can plot the closing price of the two sets of futures data,
as shown in Listing 3-4.
# Set the figure size
ax = plt.figure(figsize=(15, 7))
futures_symbol = "ES=F"
futures_data = yf.download(futures_symbol,
                           start="2022-01-01", end="2022-04-01", interval="1d")
Listing 3-5 Downloading S&P 500 E-Mini futures data
# Calculate RSI (requires the 'ta' technical analysis library)
import ta
futures_data["RSI"] = ta.momentum.RSIIndicator(futures_data["Close"]).rsi()
# Calculate MACD
macd = ta.trend.MACD(futures_data["Close"])
futures_data["MACD"] = macd.macd()
futures_data["MACD_signal"] = macd.macd_signal()
Listing 3-6 Calculating common technical indicators
Now we can plot the raw futures time series data together with the
technical indicators to facilitate analysis, as shown in Listing 3-7.
# 'axes' is assumed to come from an earlier subplot setup in the full listing,
# e.g., fig, axes = plt.subplots(4, 1, figsize=(15, 12), sharex=True)
# Plot RSI
axes[1].plot(futures_data.index, futures_data["RSI"], label="RSI", color="g")
axes[1].axhline(30, linestyle="--", color="r", alpha=0.5)
axes[1].axhline(70, linestyle="--", color="r", alpha=0.5)
axes[1].set_title("Relative Strength Index (RSI)")
axes[1].grid()
# Plot MACD
axes[3].plot(futures_data.index, futures_data["MACD"], label="MACD", color="b")
axes[3].plot(futures_data.index, futures_data["MACD_signal"], label="Signal Line",
             linestyle="--", color="r")
axes[3].axhline(0, linestyle="--", color="k", alpha=0.5)
axes[3].set_title("Moving Average Convergence Divergence (MACD)")
axes[3].grid()
Listing 3-7 Visualizing futures data and technical indicators
We can observe a few things here. In the plotted RSI chart, there are periods when the RSI crossed below 30, which might signal potentially
oversold conditions. Traders may use these signals to consider entering
or exiting positions. In the plotted chart on Bollinger Bands, we can see
periods when the price touched or crossed the bands, which may indicate
potential trend reversals or support and resistance levels. In the MACD
chart, we can observe periods when the MACD line crossed the signal line,
which may signal potential entry or exit points for traders.
Summary
In this chapter, we delved into the world of options and futures contracts.
Forward contracts are customized, private agreements between two
parties and are traded over the counter (OTC). They are only settled at
the end of the agreement and are priced based on the spot price of the
underlying asset, the risk-free interest rate, and the time to expiration.
However, forward contracts come with potential counterparty risk as
there is no clearing house to guarantee the fulfillment of the contractual
obligations.
Futures contracts, on the other hand, are standardized contracts
traded on regulated exchanges. They are marked to market daily,
meaning that the price of the contract is adjusted to reflect its current
market value, ensuring that margin requirements are met. The clearing
house of the futures exchange serves as an intermediary between buyers
and sellers, mitigating counterparty risk and ensuring the stability of the
market.
We also covered the pricing of both types of contracts. For example,
the pricing of futures contracts is influenced by factors such as the spot
price of the underlying asset, the risk-free interest rate, storage costs, and
convenience yield. In addition, futures markets can exhibit contango,
where futures prices are higher than the spot price, or backwardation,
where futures prices are lower than the spot price.
Exercises
A farmer sells agricultural products, and a manufacturer purchases raw
materials for production. In both cases, what position should they take
in a futures contract in order to hedge against adverse price changes in
the future?
A wheat farmer takes a short position in ten wheat futures contracts on
day 1, each valued at $4.5 and representing 5000 bushels. If the price of
the futures contracts increases to $4.55 on day 2, what is the change in
the farmer’s margin account?
Suppose we enter into a short forward position. What is the risk due to the fluctuating asset price in the future? How can we hedge the risk?
Assume we could buy a barrel of oil for $80 today, and the current futures price is $85 for delivery three months from today. One futures contract can buy 1000 barrels of oil. How can you arbitrage in this situation? What is the profit? Assume a zero risk-free interest rate.
Apply the same no-arbitrage argument to value a forward contract in a
short position.
Write a function to calculate the fair price of a futures contract given
the spot price of the asset, risk-free interest rate, rate of storage cost,
convenience yield, and delivery date. Allow for both annual
compounding and continuous compounding.
Explain the source of riskless profit when a forward contract is overpriced or underpriced relative to its theoretical no-arbitrage value.
Figure 4-1 Illustrating the four quadrants of risk and return profile
In the following section, we will start by understanding the
fundamentals of returns as a performance measure of financial assets.
Understanding returns is crucial for us to evaluate the success of
different investments and make informed decisions in managing
portfolios.
Analyzing Returns
The return is the first and foremost metric most investors would look at for a specific investment vehicle. It represents the change in value of a financial asset over a specified period. It can be expressed in absolute
terms (e.g., the dollar amount gained or lost) or as a percentage of the
initial investment value. As a crucial metric on the performance of an
asset or portfolio, the return allows us to compare across different
investments.
When measured in percentage terms, the return could (theoretically) range from negative infinity to positive infinity. Suppose the asset price changes from S_{t−1} to S_t. The change in price is S_t − S_{t−1}, which could be positive or negative. Considering that the price of an asset changes across different time points, and also the fact that multiple assets have multiple price levels, it is difficult to assess whether the price change S_t − S_{t−1} is big or small. To standardize the price changes and make them easier to compare, a more widely used measure is the percentage return R_t, defined as R_t = (S_t − S_{t−1}) / S_{t−1}.
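The two return series referenced next are not shown in this excerpt; based on the DataFrame printed below, they can be reconstructed as follows:
# Single-period returns of two hypothetical assets (consistent with return_df below)
asset_return1 = [0.05, 0.30, -0.10, 0.35, 0.20]
asset_return2 = [0.5, -0.2, 0.3, 0.5, -0.3]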
Next, let us combine these two lists in a Pandas DataFrame for easy
manipulation. This is achieved by wrapping the two lists in a dictionary
and passing it to the pd.DataFrame() function:
return_df = pd.DataFrame({"Asset1":asset_return1,
"Asset2":asset_return2})
>>> return_df
Printing out the return_df variable generates the following,
where the two lists now appear as the two columns in the DataFrame:
Asset1 Asset2
0 0.05 0.5
1 0.30 -0.2
2 -0.10 0.3
3 0.35 0.5
4 0.20 -0.3
To facilitate visual analysis, let us plot the two return series in a bar
chart using the .plot.bar() method:
>>> return_df.plot.bar()
>>> return_df.std()
Asset1 0.185068
Asset2 0.384708
dtype: float64
>>> return_df.mean()
Asset1 0.16
Asset2 0.16
dtype: float64
>>> return_df + 1
Asset1 Asset2
0 1.05 1.5
1 1.30 0.8
2 0.90 1.3
3 1.35 1.5
4 1.20 0.7
init_investment = 100
cum_value = (return_df + 1).cumprod()*100
>>> cum_value
Asset1 Asset2
0 105.0000 150.0
1 136.5000 120.0
2 122.8500 156.0
3 165.8475 234.0
4 199.0170 163.8
>>> cum_value.plot.line()
Figure 4-4 Illustrating the calculation process of return using the 1+R format that gives a more
convenient way to calculate the return
Also, note that the last row in the shifted column is NA, which is due
to the fact that there is no more future price available at the last time
point. This also makes the 1+R return column NA. We will demonstrate
the calculation process in code later. For now, it is good to digest and
accept the 1+R formatted return as an equivalent way of describing
asset returns.
Multiperiod Return
The terminal return can also be considered as the multiperiod return,
or the return over a combined period of time. Since the evolution
process is sequential, we need to compound the returns in each period,
sequentially. When we have the 1+R formatted returns, it is easy to
calculate the multiperiod return by multiplying/compounding the
intermediate 1+R returns followed by a subtraction of one.
The multiperiod return is a measure of an investment's performance over a series of consecutive periods. Recall that the terminal return can be calculated via R_{0,T} = (1 + R_{0,1})(1 + R_{1,2})…(1 + R_{T−1,T}) − 1. When we calculate the two-period return R_{t,t+2}, the formula becomes R_{t,t+2} = (1 + R_{t,t+1})(1 + R_{t+1,t+2}) − 1.
This method allows us to calculate the overall return over the two
periods while considering the compounding effect of each period’s
return on the next. The compounded return is thus easy to calculate
using the 1+R formatted returns for both periods. Figure 4-6 illustrates
the process of compounding the two-period return.
Figure 4-6 Calculating the two-period return by compounding the two single-period returns in
1+R format, followed by an adjustment of subtraction by one
By multiplying the 1+R formatted returns for all n periods and then
subtracting one, we can determine the compounded return over the
entire n-period investment horizon.
Let us look at a simple example. Suppose we invest in an asset for two periods, where the first-period return is 10%, and the second-period return is −2%. To calculate the compounded return, our first step is to convert both single-period returns to the 1+R format, giving 1.1 and 0.98, respectively. We would then multiply these two numbers and subtract one: 1.1 × 0.98 − 1 = 0.078, that is, a compounded return of 7.8% over the two periods.
Annualizing Returns
Once we know how to calculate the terminal return of any asset, the next question is how to compare assets measured over different periods of time. For example, some returns are daily, while other returns are monthly, quarterly, or yearly. The answer is annualization, where we convert the returns to the common time scale of a year for a fair comparison.
Annualizing returns is a crucial step in comparing the performance
of assets with different investment horizons. By converting returns to
an annualized basis, we can more easily evaluate and compare the
performance of various assets on a standardized time scale. This
process helps to level the playing field and facilitate informed decision-making.
The overall process for annualizing returns is as follows:
Calculate the 1+R formatted return for the given period.
Raise the 1+R formatted return to the power of the number of
periods per year.
Subtract one to convert the result from the 1+R format back to the
return itself.
Let us look at an example. Suppose we have an asset that generates a monthly return of 1%. To calculate the annualized return, we need to enlarge the time horizon to a year. However, simply multiplying 12 by 1% is incorrect. To proceed with the sequential compounding process, we would construct the 1+R formatted return (1 + 0.01) for each month, multiply across all 12 months to reach (1 + 0.01)^12, and finally subtract one to give (1 + 0.01)^12 − 1 ≈ 12.68%, which is higher than 12%. Calculating the annualized return thus involves deriving the 1+R formatted return, compounding it over the number of periods per year, and subtracting one to convert from 1+R back to R.
This calculation shows that the annualized return is 12.68%, which
is higher than simply multiplying the 1% monthly return by 12. This
difference is due to the compounding effect, which is an essential
factor to consider when annualizing returns.
>>> prices[1]/prices[0] - 1
1.0
>>> prices[2]/prices[1] - 1
-1.25
>>> print(prices[1:])
[0.2, -0.05]
>>> print(prices[:-1])
[0.1, 0.2]
Now we can do the division for the corresponding elements in one shot. However, we need to convert both lists to NumPy arrays in order for the element-wise division to work:
>>>
print(np.array(prices[1:])/np.array(prices[:-1])-1)
[ 1. -1.25]
prices_df = pd.DataFrame({"price":prices})
>>> prices_df
price
0 0.10
1 0.20
2 -0.05
>>> prices_df.iloc[1:]
price
1 0.20
2 -0.05
>>> prices_df.iloc[:-1]
price
0 0.1
1 0.2
Pay attention to the indexes in the first column here. These are the
default row-level indexes assigned upon creating the Pandas
DataFrame, and these indexes remain unchanged even after the
subsetting operation. Having misaligned indexes could easily lead to
problems when trying to combine two DataFrames. In this case, we
would end up with an unwanted result when we divide these two
DataFrames:
>>> prices_df.iloc[1:]/prices_df.iloc[:-1]
price
0 NaN
1 1.0
2 NaN
>>> prices_df.iloc[1:].values/prices_df.iloc[:-1] - 1
price
0 1.00
1 -1.25
>>> prices_df.iloc[1:]/prices_df.iloc[:-1].values - 1
price
1 1.00
2 -1.25
Let us stay with the shifting operation a bit longer. It turns out that
there is a function with the same name. For example, to shift the prices
downward by one unit, we can pass one to the shift() function of the
Pandas DataFrame object as follows:
>>> prices_df.shift(1)
price
0 NaN
1 0.1
2 0.2
Notice that the first element is filled with NaN since there is no value before the first price. We can then divide the original DataFrame
by the shifted DataFrame to obtain the sequence of single-period 1+R
formatted returns and subtract by one to get the normal return:
>>> prices_df/prices_df.shift(1) - 1
price
0 NaN
1 1.00
2 -1.25
returns_df = prices_df.pct_change()
>>> returns_df
price
0 NaN
1 1.00
2 -1.25
>>> returns_df + 1
price
0 NaN
1 2.00
2 -0.25
>>> np.prod(returns_df + 1) - 1
price -1.5
dtype: float64
>>> (returns_df+1).prod() - 1
price -1.5
dtype: float64
r = 0.0001
>>> (1+r)**252-1
0.025518911987694626
r = 0.01
>>> (1+r)**12-1
0.12682503013196977
r = 0.05
>>> (1+r)**4-1
0.21550625000000023
Analyzing Risk
The risk of an asset is related to volatility, which is of equal or higher
importance than the reward. Volatility is a crucial metric in assessing
the risk of an investment, as it represents the level of uncertainty or
fluctuations in the asset’s returns. A higher volatility implies a higher risk, as the asset’s price can experience more significant ups and
downs. To quantify the risk associated with an investment, we must
understand the concept of volatility and how to calculate it.
Recall the returns of two assets in Figure 4-3. Despite having the
same average reward, asset 2 is more volatile than asset 1. Asset 2
deviates from the mean more often and more significantly than asset 1.
Volatility thus measures the degree of deviation from the mean. We will
formalize the notion of volatility in this section.
Before looking at volatility, let us first introduce the concepts of variance and standard deviation.
Here, R_i − R_P means to de-mean the original return R_i, that is, subtract the mean return R_P from the original return R_i. This gives the deviation from the mean. Also, by squaring these deviations, the
problem of canceling out positive and negative terms no longer exists;
all de-meaned returns end up being positive or zero. Finally, we take
the average of the squared deviations as the variance of the return
series. A visual inspection of Figure 4-3 also suggests that asset 2 has a
higher variance than asset 1.
Although variance summarizes the average degree of deviation
from the mean return, its unit is the squared distance from the average
return, making it difficult to interpret the unit. In practice, we would
often take the square root of the variance and bring it back to the same
scale as the return. The result is called standard deviation, where the
deviation is now standardized and comparable.
Annualizing Volatility
Similar to return, the volatility also needs to be annualized to warrant a
fair comparison. Without annualizing the volatility, it is difficult to compare the volatility of monthly data with that of daily data.
The formula for annualizing the volatility relies on the fact that the volatility increases with the square root of the time period. The annualized volatility σ_{P, T} can be calculated as σ_{P, T} = σ_P × √T, where σ_P is the volatility per period and T is the number of periods in a year (for example, T = 252 for daily returns or T = 12 for monthly returns).
p1_ret = 0.05
p1_vol = 0.2
p2_ret = 0.1
p2_vol = 0.5
risk_free_rate = 0.03
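These variables presumably set up the risk-adjusted comparison illustrated in Figure 4-9; a minimal sketch of the Sharpe-ratio computation for the two hypothetical portfolios is:
# Sharpe ratio = (return - risk-free rate) / volatility
sharpe_p1 = (p1_ret - risk_free_rate) / p1_vol   # (0.05 - 0.03) / 0.2 = 0.10
sharpe_p2 = (p2_ret - risk_free_rate) / p2_vol   # (0.10 - 0.03) / 0.5 = 0.14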
Figure 4-9 Different risk-adjusted returns. Subtracting the risk-free rate from the (annualized)
return gives the excess return, which considers the market benchmark performance
import yfinance as yf
prices_df = yf.download(["AAPL","GOOG"], start="2023-01-01")
>>> prices_df.head()
Listing 4-3 Downloading stock data using yfinance
Figure 4-10 Printing the first few rows of daily stock prices for Apple and Google
returns_df = prices_df.pct_change()
>>> returns_df.head()
AAPL GOOG
2023-01-03 NaN NaN
2023-01-04 0.010314 -0.011037
2023-01-05 -0.010605 -0.021869
2023-01-06 0.036794 0.016019
2023-01-09 0.004089 0.007260
Again, the first row is empty since there is no data point before it. We can remove this row using the dropna() function:
returns_df = returns_df.dropna()
>>> returns_df.head()
AAPL GOOG
2023-01-04 0.010314 -0.011037
2023-01-05 -0.010605 -0.021869
2023-01-06 0.036794 0.016019
2023-01-09 0.004089 0.007260
2023-01-10 0.004456 0.004955
>>> returns_df.std(axis=0)
AAPL 0.012995
GOOG 0.016086
dtype: float64
Google’s stock prices were more volatile than Apple’s in the first few days. Now let us try setting axis=1:
>>> returns_df.std(axis=1)
2023-01-04 0.015097
2023-01-05 0.007965
2023-01-06 0.014690
2023-01-09 0.002242
2023-01-10 0.000352
2023-01-11 0.009001
2023-01-12 0.002259
2023-01-13 0.000308
2023-01-17 0.011068
2023-01-18 0.000882
2023-01-19 0.016097
dtype: float64
The result shows the daily standard deviation calculated for the two
stocks combined.
Now we show how to calculate the volatility manually by going
through the exact steps described earlier. Our first step is to de-mean
the daily returns and obtain the deviations from the (arithmetic)
mean:
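# The de-meaning step (the corresponding listing is not reproduced here; this is
# one way to obtain the deviations_df used below)
deviations_df = returns_df - returns_df.mean()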
The next step is to square these deviations so that they would not
cancel each other when summing together. Squaring is the same as
raising the element to the power of two, using the double asterisk
notation:
squared_deviations_df = deviations_df**2
>>> squared_deviations_df.head()
AAPL GOOG
2023-01-04 0.000010 2.350688e-04
2023-01-05 0.000318 6.845668e-04
2023-01-06 0.000874 1.374582e-04
2023-01-09 0.000010 8.787273e-06
2023-01-10 0.000008 4.352158e-07
variance = squared_deviations_df.mean()
>>> variance
AAPL 0.000154
GOOG 0.000235
dtype: float64
The last step is to take the square root of the variance to obtain the
volatility:
volatility = np.sqrt(variance)
>>> volatility
AAPL 0.012390
GOOG 0.015337
dtype: float64
Notice that the result is different from the one obtained using the std() function! The cause for the difference is that the std() function calculates the sample standard deviation, which divides by N − 1 in the denominator as opposed to N in our manual calculations.
To correct this, let us revisit step three and divide the sum of squared deviations by N − 1 this time. In Listing 4-5, we first get the number of rows N from the first dimension (row dimension) of the DataFrame's shape attribute, then plug it into the formula for the sample variance.
num_rows = squared_deviations_df.shape[0]
variance2 = squared_deviations_df.sum() / (num_rows-1)
>>> variance2
AAPL 0.000169
GOOG 0.000259
dtype: float64
Listing 4-5 Calculating the sample variance
Taking the square root now gives the same result as using the
std() function:
volatility2 = np.sqrt(variance2)
>>> volatility2
AAPL 0.012995
GOOG 0.016086
dtype: float64
annualized_vol = returns_df.std()*np.sqrt(252)
>>> annualized_vol
AAPL 0.206289
GOOG 0.255356
dtype: float64
annualized_vol = returns_df.std()*(252**0.5)
>>> annualized_vol
AAPL 0.206289
GOOG 0.255356
dtype: float64
returns_per_day = (returns_df+1).prod()**(1/returns_df.shape[0]) - 1
>>> returns_per_day
AAPL 0.007153
GOOG 0.004178
dtype: float64
annualized_return = (returns_per_day+1)**252-1
>>> annualized_return
AAPL 5.025830
GOOG 1.859802
dtype: float64
Listing 4-6 Annualizing the daily return
It seems Apple is doing quite well compared with Google for the first few days.
There is another way to calculate the annualized return, a faster
way:
annualized_return = (returns_df+1).prod()**(252/returns_df.shape[0]) - 1
>>> annualized_return
AAPL 5.025830
GOOG 1.859802
dtype: float64
The key change here is that we raise the terminal return to the
power of 252/N. This is standardization, bringing the daily scale to the
yearly scale.
Calculating the Sharpe Ratio
Finally, let us compute the Sharpe ratio for both stocks. We assume a
risk-free interest rate of 3%, calculate the excess return by subtracting
it from the annualized return, and divide it by the annualized volatility
to obtain the Sharpe ratio. This is shown in Listing 4-7.
riskfree_rate = 0.03
excess_return = annualized_return - riskfree_rate
sharpe_ratio = excess_return/annualized_vol
>>> sharpe_ratio
AAPL 24.217681
GOOG 7.165694
dtype: float64
Listing 4-7 Calculating the Sharpe ratio
Summary
In this chapter, we explored the two key characteristics of any financial asset: risk and return. Return refers to the financial reward an asset
brings, while risk represents the volatility or uncertainty of that return.
As investors, our goal is to maximize return while minimizing risk.
We introduced different ways to represent and calculate the
returns, including the simple return, terminal return, multiperiod
return, and the 1+R formatted return. It is important to understand the
connections among these forms of return when translating one form to
the other.
We then highlighted the risk-return trade-off, where low-return
assets are typically associated with low risk and high-return assets
with high risk. To better compare the risk and return for different
investment vehicles, we introduced the annualized return and
volatility, as well as a risk-adjusted return metric called the Sharpe
ratio. We also provided examples illustrating the importance of
considering both risk and return when comparing investment
products.
Exercises
How many inputs do we need to calculate a single-period return?
What is the return if the asset price changes from $5 to $6?
Is the total return of a popular stock typically higher or lower than its
price return?
Calculate the three-period return that consists of 10%, –5%, and 6%.
If we buy an asset that rises by 10% on day one and drops by 10% on
day two, is our return positive, negative, or zero?
Calculate the annualized return for an asset with a quarterly (three
months) return of 2%.
Download the YTD stock data for Apple and Tesla and calculate the
daily cumulative returns using the daily closing price. Plot the
returns as line charts.
Both annualized volatility and variance grow linearly with time,
correct?
Suppose the monthly volatility is 5%. Calculate the annualized
volatility.
The annualized volatility is always greater than the monthly
volatility. True or false?
The risk-free rate is the return on an investment that carries a low
risk. True or false?
If the risk-free rate goes up and the volatility of the portfolio remains
unchanged, will the Sharpe ratio increase or decrease?
Obtain monthly return data based on the median daily price per
month of Apple stock in the first half of 2022. Calculate the
annualized return and volatility based on the monthly returns.
5. Trend-Following Strategy
Figure 5-2 Calculating the simple returns based on the definition of percentage return
This requires two steps: first, calculate the ratio to obtain the so-called 1+R return. This ratio reflects the growth factor of the asset’s price from the
beginning of the period to the end. If this ratio is greater than one, it indicates
that the asset’s price has increased over the period. If it’s less than one, it
indicates a decrease in the asset’s price. If the ratio equals one, it means the
asset’s price hasn’t changed.
Next, we would subtract one from the 1+R return to convert it to the simple
return. This step transforms the growth factor into the actual percentage
return. Subtracting one essentially removes the initial investment from the
calculation, leaving only the gained or lost amount relative to the initial
investment, which is the return. See Figure 5-3 for an illustration, where the
daily returns are the same as in the previous approach.
Figure 5-3 Calculating the simple returns based on the 1+R approach
This 1+R method is often used because it is more intuitive. The growth
factor easily shows how much the initial investment has grown (or shrunk),
and subtracting one gives the net growth in percentage terms, which is the
simple return. This method is especially useful when dealing with multiple time
periods, as growth factors can simply be multiplied together to calculate the
cumulative growth factor over several periods.
Q4: What is the terminal return from day 1 to day 5 without compounding?
Answer: The terminal return is the total return on an investment over a given
period of time. It’s a measure of the total gain or loss experienced by an
investment from the start of the investment period to the end, without
considering any compounding effect over the period.
To calculate the terminal return without involving the compounding process, we would resort to R_{1,5} = (S_5 − S_1)/S_1 = S_5/S_1 − 1, where the second formula first calculates the ratio of the asset’s price on day 5 to its price on day 1 (which reflects the overall growth factor) and then subtracts one to convert the growth factor into a terminal return. See Figure 5-4 for an illustration.
Figure 5-4 Calculating the terminal return without compounding
Q5: What is the terminal return from day 1 to day 5 with compounding? Is it
equal to the result in Q4?
Answer: Compounding returns is an important concept in finance. It reflects
the fact that not only your initial investment earns a return but also the returns
from previous periods. This leads to exponential growth over time, given a
positive return rate.
We will fill in the “return3” column, where each cell is a product between the 1+R return of the current period and the cumulative 1+R return of the previous period, offset by one. For the first period (from day 1 to day 2), the “return3”
value would be just the “1 + R” return for this period. See Figure 5-5 for an
illustration.
Figure 5-5 Calculating the terminal return using compounding
As it turns out, the terminal return is 6%, which is the same as previously
calculated.
Q6: Sum up the single-period returns in Q3. Is it equal to the result in Q4?
Answer: The result shows that it is different from 6%. In general, adding up
single-period returns can lead to incorrect conclusions about the overall return
on investment. The sum of the single-period returns is not equal to the terminal
return (from Q4) because this approach overlooks the effect of compounding. In
other words, by simply summing up single-period returns, we are effectively
treating each period’s return as if it was independent and earned on the initial
investment amount, disregarding the fact that the investment grows with each
period due to the returns earned in the prior periods. This is why we see a
difference between the summed single-period returns and the terminal return
calculated through the correct method that takes into account the compounding
effect.
The principle of compounding acknowledges that returns accumulate over
time, meaning the returns earned in one period are reinvested and can generate
further returns in subsequent periods. So, while the sum of single-period returns
might provide a rough estimate of the total return, it is not a correct measure,
especially when the time span is long, or the return rate is high. Instead, the
appropriate way to calculate the total return over multiple periods is to use the
concept of compound returns, which considers both the initial investment and
the reinvestment of returns. It is thus important to follow the sequential
compounding process when calculating the terminal return. See Figure 5-6 for
an illustration.
Figure 5-6 Summing up all single-period returns
The single-period log return is defined as r_t = ln(S_{t+1} / S_t). Here, S_{t+1} and S_t represent the asset price at the future time t + 1 and the current time t, respectively, and ln denotes the natural logarithm. See Figure 5-7 for an illustration.
Here, the first-period return is NaN as there is no prior stock price available. Let us calculate the terminal return using the original approach by taking the first and last closing prices as the inputs (based on the definition given earlier), as shown in Listing 5-3.
# terminal return
terminal_return = df.Close[-1]/df.Close[0] - 1
>>> terminal_return
-0.01716826464354737
Listing 5-3 Calculating the terminal return using the original approach by definition
We can also calculate the same value by compounding the (1+R) returns
based on the .cumprod() function, as shown in Listing 5-4.
# cumulative returns
cum_returns = (1+returns).cumprod() - 1
>>> cum_returns
Date
2023-01-03 00:00:00-05:00 NaN
2023-01-04 00:00:00-05:00 -0.011037
2023-01-05 00:00:00-05:00 -0.032664
2023-01-06 00:00:00-05:00 -0.017168
Name: Close, dtype: float64
Listing 5-4 Calculating the same cumulative terminal return by compounding 1+R formatted returns
Now we calculate the same using log returns, starting by obtaining the
single-period log returns in Listing 5-5.
We can add all log returns from previous periods together to get the
cumulative log returns, convert back to the original scale via exponentiation,
and, lastly, offset by one to convert from 1+R to the simple return format, as
shown in Listing 5-6.
Again, we can check the value of the last entry and verify that it is the same as the previous terminal return:
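# A sketch of Listings 5-5 and 5-6 (not reproduced above), assuming df holds the
# downloaded prices and numpy is imported as np
log_returns = np.log(df.Close / df.Close.shift(1))     # single-period log returns
cum_log_returns = log_returns.cumsum()                 # additive accumulation over time
cum_simple_returns = np.exp(cum_log_returns) - 1       # back to the simple-return scale
>>> cum_simple_returns.iloc[-1]                        # approximately -0.017168, as before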
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
symbol = 'AAPL'
df = yf.download(symbol, start="2022-01-01", end="2023-01-01")
df.index = pd.to_datetime(df.index)
>>> df.head()
                  Open        High         Low       Close   Adj Close     Volume
Date
2022-01-03  177.830002  182.880005  177.710007  182.009995  180.434296  104487900
2022-01-04  182.630005  182.940002  179.119995  179.699997  178.144302   99310400
2022-01-05  179.610001  180.169998  174.639999  174.919998  173.405685   94537600
2022-01-06  172.699997  175.300003  171.639999  172.000000  170.510956   96904000
2022-01-07  172.889999  174.139999  171.029999  172.169998  170.679489   86709100
Listing 5-7 Downloading Apple’s stock price data
Note that we have an index named Date which now assumes a datetime
format to facilitate plotting.
Listing 5-8 generates a plot on the daily adjusted closing price. We will later
overlay its SMA on the same plot.
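Listing 5-8 itself is not reproduced here; a plausible sketch, assuming matplotlib's pyplot has been imported as plt (as in Listing 5-7), is:
df['Adj Close'].plot(linewidth=3, figsize=(12, 6))
plt.title('Daily adjusted closing price', fontsize=20)
plt.xlabel('Date', fontsize=16)
plt.ylabel('Price', fontsize=16)
plt.show()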
Now we create an SMA series with a window size of three. We can create the
rolling window using the rolling() method for a Pandas Series, followed by
the mean() method to extract the average value from the window (a collection
of price points). Listing 5-9 creates a new SMA column called SMA-3 and subsets
to keep only two columns: the adjusted closing price and the SMA column.
window = 3
SMA1 = "SMA-"+str(window)
df[SMA1] = df['Adj Close'].rolling(window).mean()
colnames = ["Adj Close",SMA1]
df2 = df[colnames]
>>> df2.head()
Adj Close SMA-3
Date
2022-01-03 180.434296 NaN
2022-01-04 178.144302 NaN
2022-01-05 173.405685 177.328094
2022-01-06 170.510956 174.020315
2022-01-07 170.679489 171.532043
Listing 5-9 Creating simple moving averages
Let us pause for a moment and look at how this column is generated. We see
that the first two rows in the SMA column are missing. This makes sense as both
of them are unable to get a full three-period moving window to calculate the
average. In other words, we cannot calculate the average when there is an empty
value in the window unless additional treatment is applied here, such as
ignoring the empty value while calculating the average.
We note that the third entry of the SMA column is 177.328094. Let us verify this through a manual calculation. The following command takes the first three entries of the adjusted closing price column and calculates the average, which reports the same value:
which verifies the calculation. Figure 5-10 summarizes the process of calculating SMA in our running example.
Note that we can configure the min_periods argument in the rolling() function to control the behavior at the initial windows with incomplete data. For example, by setting min_periods=1, the previous code will report the average value based on the available data in the window. See the following code snippet for a comparison:
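# Comparison sketch: same three-period window, but averaging whatever data is
# available in the (possibly incomplete) initial windows
df['Adj Close'].rolling(window=3, min_periods=1).mean().head()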
Note that the only difference is in the first two entries, where we have an incomplete set of values in the rolling window.
Next, we plot the three-period SMA alongside the original daily adjusted
closing price series, as shown in Listing 5-10.
Running these commands generates Figure 5-11. Note that the three-period
SMA curve in red looks less volatile than the original price series in blue. Also,
the three-period SMA curve starts from the third entry.
Figure 5-11 Visualizing the original price and three-period SMA
Now let us add another SMA with a longer period. In Listing 5-11, we add a
20-period SMA as an additional column to df2.
window = 20
SMA2 = "SMA-"+str(window)
df2[SMA2] = df2['Adj Close'].rolling(window).mean()
colnames = ["Adj Close",SMA1,SMA2]
Listing 5-11 Creating 20-period SMA
Running these commands generates Figure 5-12, which shows that the 20-
period SMA is smoother than the 3-period SMA due to a larger window size.
Figure 5-12 Visualizing the daily prices together with 3-period and 20-period SMAs
The exponentially weighted moving average (EWMA, or simply the exponential moving average, EMA) is computed recursively as EWMA_t = α·S_t + (1 − α)·EWMA_{t−1}, where α is the smoothing factor, which ranges between zero and one. The smoothing factor α determines the weight given to the most recent price relative to the existing EMA. A higher α emphasizes recent prices more strongly. As for the first EWMA value at time t = 0, a default choice is to set EWMA_0 = S_0. Therefore, EMA assumes that recent data is more relevant than old data. Such an assumption has its merit since EMA can react faster to changes and is thus more sensitive to recent movements as compared to the simple moving average. This also means there is no window size to be specified by the function since all historical data points are in use.
It’s important to note that while EMA provides more accurate and timely
signals than SMA, it might also produce more false signals as it’s more
responsive to short-term price fluctuations.
The EMA can be calculated by calling the ewm() method from a Pandas
Series object, followed by extracting the average value via mean(). We can set
the alpha argument in ewm() to directly control the importance of the current
observation compared with historical ones. See Listing 5-13 for an illustration,
where we set α = 0.1 to give more weightage to historical prices.
alpha = 0.1
df2['EWM_'+str(alpha)] = df2['Adj
Close'].ewm(alpha=alpha, adjust=False).mean()
df2.head()
Adj Close SMA-3 SMA-20 EWM_0.1
Date
2022-01-03 180.434296 NaN NaN 180.434296
2022-01-04 178.144302 NaN NaN 180.205296
2022-01-05 173.405685 177.328094 NaN 179.525335
2022-01-06 170.510956 174.020315 NaN 178.623897
2022-01-07 170.679489 171.532043 NaN 177.829456
Listing 5-13 Creating EMA series
We observe that there is no missing value in the EMA series. Indeed, the first
entry will simply be the original price itself due to the design of the EMA
weighting scheme.
As usual, let us verify the calculations to ensure our understanding is on the
right track. The following code snippet manually calculates the second EMA
value, which is the same as the one obtained using the ewm() function:
alpha = 0.1
>>> alpha*df2['Adj Close'][1] + (1-alpha)*df2['Adj
Close'][0]
180.205297
Let us continue to create another EMA series with α = 0.5. In other words, we
assign an equal weightage to the current observation and historical ones:
alpha = 0.5
df2['EWM_'+str(alpha)]= df2['Adj Close'].ewm(alpha=alpha,
adjust=False).mean()
df2.head()
             Adj Close       SMA-3  SMA-20     EWM_0.1     EWM_0.5
Date
2022-01-03  180.434296         NaN     NaN  180.434296  180.434296
2022-01-04  178.144302         NaN     NaN  180.205296  179.289299
2022-01-05  173.405685  177.328094     NaN  179.525335  176.347492
2022-01-06  170.510956  174.020315     NaN  178.623897  173.429224
2022-01-07  170.679489  171.532043     NaN  177.829456  172.054357
Let us put all these moving averages in a single chart. Here, the plot()
function treats all five columns as separate series to be plotted against the
index column, as shown in Listing 5-14.
df2.plot(linewidth=3, figsize=(12,6))
plt.title('Daily adjusted closing price with SMA and EWM', fontsize=20)
plt.xlabel('Date', fontsize=16)
plt.ylabel('Price', fontsize=16)
Listing 5-14 Plotting all moving averages together
Having looked at how to compute these moving averages, the next section
shows how to use them as technical indicators to develop a trend-following
strategy.
Before doing so, let us inspect the structure of the DataFrame we will be working with:
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2022-01-03 00:00:00-05:00 to
2022-12-30 00:00:00-05:00
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Adj Close 251 non-null float64
1 SMA-3 249 non-null float64
2 SMA-20 232 non-null float64
3 EWM_0.1 251 non-null float64
4 EWM_0.5 251 non-null float64
dtypes: float64(5)
memory usage: 19.9 KB
Now we will use SMA-3 and SMA-20 as the respective short-term and long-
term moving averages, whose crossover will generate a trading signal. We leave
it as an exercise to try both SMA with different window sizes and EMA with
different weighting schemes.
Note that we can only use the information up to yesterday to make a trading
decision for tomorrow. We cannot use today’s information since the closing
price is not yet available in the middle of the day. To enforce this requirement,
we can shift the moving averages one day into the future, as shown in the
following code snippet. This essentially says that the moving average for today
is derived from historical information up to yesterday.
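The shifting snippet is not reproduced in this extract; a minimal sketch of it is:
df2['SMA-3'] = df2['SMA-3'].shift(1)    # today's value becomes yesterday's moving average
df2['SMA-20'] = df2['SMA-20'].shift(1)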
Now let us implement the trading rule: buy if SMA-3 > SMA-20, and sell if
SMA-3 < SMA-20. Such an if-else condition can be created using the
np.where() function, as shown in Listing 5-15.
# identify buy signal
df2['signal'] = np.where(df2['SMA-3'] > df2['SMA-20'], 1,
0)
# identify sell signal
df2['signal'] = np.where(df2['SMA-3'] < df2['SMA-20'],
-1, df2['signal'])
df2.dropna(inplace=True)
Listing 5-15 Creating and identifying buy and sell signals
>>> df2['signal'].value_counts()
-1 135
1 96
Name: signal, dtype: int64
The result shows that there are more days in a downtrend regime (short-term SMA
below long-term SMA) than in an uptrend regime, which is consistent with the
downward-trending price series shown earlier.
Next, we introduce a baseline strategy called buy-and-hold, which simply
means we hold one share of Apple stock until the end of the whole period. Also,
we will use the log return instead of the raw return to facilitate the calculations.
Therefore, instead of taking the ratio between consecutive stock prices to get
the gross return 1 + R_t = S_{t+1}/S_t, we now take the difference
log S_{t+1} − log S_t to obtain the log return r_t, which can then be
exponentiated to convert back to 1 + R_t.
The following code snippet calculates the instantaneous logarithmic single-
period return, where we first take the logarithm of the adjusted closing prices
and then call the diff() function to obtain the differences between
consecutive pairs of prices:
df2['log_return_buy_n_hold'] = np.log(df2['Adj
Close']).diff()
Now comes the calculation of the single-period return for the trend-
following strategy. Recall the signal column we created earlier. This column
represents whether we go long (value 1) or short (value –1) in a position for
every single period. Recall also that the logarithmic return r_t = log(S_{t+1}/S_t)
is positive if S_{t+1} > S_t and negative if S_{t+1} < S_t. This creates the
following four scenarios when the asset moves from S_t to S_{t+1}:
When we long an asset and its logarithmic return is positive, the trend-
following strategy reports a positive return, that is, (+1) × r_t > 0.
When we long an asset and its logarithmic return is negative, the trend-
following strategy reports a negative return, that is, (+1) × r_t < 0.
When we short an asset and its logarithmic return is positive, the trend-
following strategy reports a negative return, that is, (−1) × r_t < 0.
When we short an asset and its logarithmic return is negative, the trend-
following strategy reports a positive return, that is, (−1) × r_t > 0.
df2['log_return_trend_follow'] = df2['signal'] *
df2['log_return_buy_n_hold']
Listing 5-16 Calculating the log return of the trend-following strategy
To identify the days on which the position actually changes, we take the
difference between consecutive signals and store it in a new column called action:
df2['action'] = df2.signal.diff()
We can produce a frequency count of different trading actions using the
value_counts() function:
>>> df2['action'].value_counts()
0.0 216
2.0 7
-2.0 7
Name: action, dtype: int64
The result shows that the majority of the trading days do not require action.
For the 14 days with a trading action, 7 days change the position from short to
long, and another 7 change from long to short.
We can visualize these trading actions as triangles on the graph with stock
prices and SMAs. In Listing 5-17, we indicate a buy action via the green triangle
facing upward when the short-term SMA crosses above the long-term SMA. On
the other hand, we use a red triangle facing downward to indicate a sell action
when the short-term SMA crosses below the long-term SMA.
plt.rcParams['figure.figsize'] = 12, 6
plt.grid(True, alpha=.3)
plt.plot(df2['Adj Close'], label='Adj Close')
plt.plot(df2['SMA-3'], label='SMA-3')
plt.plot(df2['SMA-20'], label='SMA-20')
plt.plot(df2.loc[df2.action == 2].index,
         df2['SMA-3'][df2.action == 2], '^',
         color='g', markersize=12)
plt.plot(df2[df2.action == -2].index,
         df2['SMA-20'][df2.action == -2], 'v',
         color='r', markersize=12)
plt.legend(loc=1);
Listing 5-17 Visualizing trading actions
Running these commands generates Figure 5-14. Again, the green triangles denote
moving from short to long, and the red triangles denote moving from long to short.
Figure 5-14 Visualizing the trading actions, including going from short to long (green triangles) and long to
short (red triangles)
Let us analyze the cumulative returns of each period for both trading
strategies. Specifically, we would like to obtain the final percentage return at the
end of 2022 if we started with one unit of Apple stock at the beginning of 2022,
comparing the two trading strategies.
Recall that we need to multiply the 1+R return at each period to carry out the
compounding process in order to obtain the terminal return (after subtracting
one). We also know that the 1+R return is the same as the ratio between two
consecutive prices, that is, 1 + R_t = S_{t+1}/S_t = e^{r_t}. Therefore, to calculate the
terminal return, we first convert the returns from the logarithmic format to the
usual percentage format using the np.exp() function, then carry out the
compounding by performing a cumulative product operation using the
cumprod() method. This is achieved via Listing 5-18, where we leave out the
last step of subtracting by one and report the 1+R return.
plt.plot(np.exp(df2['log_return_buy_n_hold']).cumprod(),
label='Buy-n-hold')
plt.plot(np.exp(df2['log_return_trend_follow']).cumprod(),
label='Trend following')
plt.legend(loc=2)
plt.title("Cumulative return of different trading
strategies")
plt.grid(True, alpha=.3)
Listing 5-18 Visualizing cumulative returns
Running these commands generates Figure 5-15, which shows that the
trend-following strategy clearly outperforms the buy-and-hold strategy.
However, note that this is a simplified setting that does not take into account
transaction costs and other market factors. More analyses and tests are needed
to assess the performance of this trading strategy (and many others) in a real-
world environment.
Figure 5-15 Comparing the cumulative return of buy-and-hold and trend-following strategies for one share
of Apple’s stock
It turns out that sticking to the buy-and-hold strategy would have lost about 25%,
while following the trend-following strategy generates a terminal return of 7%.
Summary
In this chapter, we covered the basics of the popular trend-following strategy
and its implementation in Python. We started with an exercise on working with
log returns and then transitioned to different moving averages as commonly
used technical indicators, including simple moving averages and exponential
moving averages. Lastly, we discussed how to generate trading signals and
calculate the performance metrics using this strategy, which will serve as a good
baseline strategy as we delve into other candidates later on.
Exercises
Explain why log returns are symmetric mathematically.
How can we deal with a situation where the price point at a given day is
missing when calculating its moving average?
How does the value of the window size affect the smoothness of the SMA?
What about the impact of α on the smoothness of EMA?
Change the code to obtain a moving median instead of a moving average.
Discuss the difference between the median and the mean. How about
maximum and minimum over the same rolling window?
Switch to EMA to derive the trading signals and discuss the results.
Show mathematically why the log returns are additive over time and explain
the significance of this property in the context of asset returns.
Suppose there are multiple missing price points in your data, how would you
modify the moving average calculation to handle these gaps? What are the
potential issues with your approach?
Experiment with different window sizes for SMA and different values of α for
EMA. Discuss how these parameters affect the sensitivity of the moving
averages to price changes. How would you choose an optimal value for these
parameters?
Momentum trading is a strategy that uses the strength of recent price movements as the basis for opening
positions, either longing or shorting a set of assets. Traders buy and/or sell a selected set of assets
according to the recent strength of their price trends, seeking to capitalize on the force or speed of price
movements. Crucially, the key presumption underpinning this approach is that existing trends, provided
their force is strong enough, will persist in the same direction.
When an asset displays an upward trend, registering higher prices, it invariably attracts more attention
from a wider spectrum of traders and investors. The heightened attention garnered by the asset fuels its
market price further. This momentum endures until a significant number of sellers enter the market,
supplying an abundance of the asset. Once enough sellers are in the market, the momentum changes
direction and forces the asset's price lower. This is essentially the price dynamics between supply and
demand. At this juncture, market participants may reassess the fair price of the asset, which may be
perceived as overvalued due to the recent price surge.
In other words, as more sellers infiltrate the market, the momentum alters its course, pushing the
asset's price in a downward direction. This is essentially a representation of the classic supply-and-demand
dynamics and the shift from an environment with more buyers than sellers to one where sellers outweigh
buyers. Also, note that while price trends can persist for an extended period, they will inevitably reverse at
some point. Thus, the ability to identify these inflection points and adjust positions accordingly is of
equal importance.
Figure 6-1 Characterizing the momentum trading strategy for three stocks
The momentum trading strategy is particularly effective in equities, offering a systematic approach to
compare and analyze similar assets. It performs a cross-sectional analysis across the equity universe (in
this case, three stocks), evaluating and rank-ordering the constituents based on their relative performances
over a specified lookback period. This process enables traders to identify strong performers and potential
laggards, using their recent momentum as a proxy for future performance.
In making a trading decision, the momentum strategy often embraces a two-pronged approach,
establishing a portfolio with two legs. The first leg is the “long” leg, consisting of top-ranked assets
projected to maintain their strong upward price momentum. Traders buy these stocks with an expectation
of price appreciation, aiming to sell at a higher price in the future. The second leg is the “short” leg, made up
of bottom-ranked assets showing signs of declining price momentum. Traders sell these stocks, often
through short-selling, where they borrow the stock to sell in the market with the intent to buy it back at a
lower price later. The idea is to profit from the anticipated price decline of these assets. By going long on
assets with strong positive momentum and short on assets with negative momentum, traders can
potentially benefit from both rising and falling markets, provided the identified momentum persists over
the holding period.
Note that momentum strategies, grounded in the principle of relative momentum, maintain their long
and short positions irrespective of the broader market trends. These strategies function on the assumption
that the strongest performers and underperformers will persist in their respective trajectories, thus
maintaining their relative positions in the investment universe. In other words, in a bullish market
environment, the stocks with the strongest upward momentum are expected to outperform the market.
Meanwhile, during bearish phases, these same high-momentum stocks may fall in price, but they are still
expected to perform better than other stocks that are falling more rapidly. Conversely, the bottom-ranked
stocks, showing declining momentum, are expected to underperform the market. In a rising market, these
stocks may increase in value, but at a slower pace than the market. Similarly, in a falling market, these stocks
are anticipated to decline more rapidly than the broader market. Thus, irrespective of whether the market
is bullish or bearish, momentum strategies rely on the persistence of relative performance.
Contrary to the momentum trading strategy, which mandates regular trading based on a predefined
lookahead window, the trend-following strategy operates without a set trading frequency. Rather, it is driven
entirely by the data at hand. Trading actions are informed by the moving averages' interactions, leading to
potentially less frequent but more strategically timed trades. Such a mechanism makes the trend-following
strategy more flexible as it adapts to the market's movements.
Note that in a trend-following strategy, the primary concern is whether an asset is on an upward or
downward trend. When employing this strategy, traders do not focus on the comparative performance of
different assets against each other, as in a momentum strategy. Rather, their interest lies in identifying and
capitalizing on established price trends of individual assets. The underlying assumption for this strategy is
that the identified asset prices that have been rising or falling steadily over a period will continue to move
in the same direction. So, a trader would go long when an asset shows an upward trend and go short when
it’s on a downward trend. The action is to “ride the wave” as long as the trend continues. The “trendiness” of
the market completely determines the trading decisions of the strategy.
In summary, while both strategies aim to exploit market momentum, the trend-following strategy
involves time series analysis that relies on the absolute momentum in historical prices of the same asset,
and the momentum trading strategy involves cross-sectional analysis that relies on the relative momentum
across multiple assets. Thus, these two strategies are fundamentally different from each other.
The next section introduces implementing the momentum trading strategy using Python.
import os
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import yfinance as yf
Listing 6-1 Importing relevant packages
Next, we write a function called fetch_info() to complete the scraping task. As shown in Listing 6-2,
we first assign the web link to the url variable and store the header details in the headers variable. The
headers provide the metadata sent along when visiting a website. We then send a GET request to obtain information
from the specified web link via the requests.get() method and pull and parse the data out of the
scraped HTML file using BeautifulSoup(), stored in the soup variable. We can then find the meat in the
soup by passing the specific node name (table in this case) to the find_all() function, read the HTML
data into a DataFrame format using the read_html() function from Pandas, and drop the unnecessary
column (the Notes column) before returning the DataFrame object. Finally, if the scraping fails, the
function will print out an error message via a try-except control statement.
def fetch_info():
    try:
        url = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0',
            'Accept': 'application/json',
            'Accept-Language': 'en-US,en;q=0.5',
        }
        # Send GET request
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")
        # Get the symbols table
        tables = soup.find_all('table')
        # Convert the table to a DataFrame
        df = pd.read_html(str(tables))[1]
        # Cleanup
        df.drop(columns=['Notes'], inplace=True)
        return df
    except:
        print('Error loading data')
        return None
Listing 6-2 Fetching relevant information from the web page
Now let us call the function to store the result in dji_df and output the first five rows, as shown in the
following:
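The call itself is not reproduced in this extract; it presumably amounts to the following, with dji_df holding the scraped constituents table:
dji_df = fetch_info()
>>> dji_df.head()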
We can then take the Symbol column, extract the values, and convert it to a list format:
tickers = dji_df.Symbol.values.tolist()
With the DJI tickers available, we can now download the stock prices for these ticker symbols using the
yfinance package.
start_date = "2021-01-01"
end_date = "2022-09-01"
df = yf.download(tickers, start=start_date, end=end_date)
Listing 6-3 Downloading the daily stock prices of DJI tickers
By now, we have stored the stock prices of the 30 DJI constituents, with each column representing one
ticker and each row indicating a corresponding trading day. The index of the DataFrame follows the
datetime format.
Next, we convert the daily stock prices to monthly returns.
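The chained conversion is not shown in this extract; a sketch of it, assuming the result is stored in mth_return_df (the name used later in this chapter), could be:
mth_return_df = (
    df['Adj Close']
    .pct_change()                        # daily simple returns
    .resample('M')                       # group the daily returns by calendar month
    .agg(lambda x: (1 + x).prod() - 1)   # compound within each month, then convert back to a simple return
)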
Although chaining together relevant operations looks more concise, it is not the best way to learn these
operations if this is the first time we encounter them. Let us decompose these operations. The first
operation is to call the pct_change() method, which is a convenient function widely used in many
contexts. Next comes the resample() function, which is a convenient method for frequency conversion
and resampling of time series data. Let us use some dummy data to understand this function.
The following code snippet creates a Pandas Series object with nine integers ranging from zero to eight,
which are indexed by nine one-minute timestamps:
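The snippet that creates this dummy series is missing from this extract; it presumably resembles the standard Pandas example:
index = pd.date_range('1/1/2000', periods=9, freq='T')  # nine one-minute timestamps
series = pd.Series(range(9), index=index)               # integers 0 through 8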
We then aggregate the series into three-minute bins and sum the values of the timestamps falling into a
bin, as shown in the following code snippet:
>>> series.resample('3T').sum()
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64
As we can see from the result, the resample() function completes the aggregation operation by the
specified interval, and the following method summarizes the data within the interval.
Back to our running example, we downsample the raw daily returns into monthly returns, so each month
is represented by only one data point instead of roughly 21 (the typical number of trading days in a month).
The aggregation compounds all daily returns within the month following the same procedure as before:
converting to the 1+R format, compounding, and then converting back to a simple return.
The new thing here is the lambda function. We use the x symbol to represent a general input argument;
in this case, it holds all the raw daily returns in a given month. Since this lambda function performs a
customized operation, we use the agg() function to apply the customized function, instead of
using a built-in function such as sum() as before.
By now, we have converted the daily returns to monthly representations where every single monthly
return represents the terminal return of the daily returns compounded within the month. Next, we calculate
another metric using historical monthly returns to indicate the current month’s stock performance.
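The corresponding code is not shown in this extract; a sketch of it, assuming the result is stored in past_cum_return_df (the name used below), could be:
past_cum_return_df = mth_return_df.rolling(6).apply(lambda x: (1 + x).prod() - 1)  # 6-month cumulative return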
By now, we have calculated the six-month terminal monthly return as the cumulative return of the past
six months, including the current month. This also explains why the first five months show empty values in
the previous result and the cumulative monthly returns only start from the sixth month.
Next, we look at using these terminal returns to generate trading signals.
Since our data lasts until 2022-08-31, we will use 2022-07-31 as the trade formation period. To generate
a trading strategy, we will use the terminal monthly return from the previous month indexed at 2022-06-30
as the end of the measurement period. We resort to the datetime package to encode these two dates, as
shown in Listing 6-6.
import datetime as dt
end_of_measurement_period = dt.datetime(2022,6,30)
formation_period = dt.datetime(2022,7,31)
Listing 6-6 Identifying the measurement and formation periods
These dates will then be used to slice the cumulative monthly return DataFrame stored in
past_cum_return_df. In the following code snippet, we pass the end_of_measurement_period
variable to the .loc[] property of past_cum_return_df to perform label-based indexing at the row
level. Since the result is a Pandas Series indexed by the 30 ticker symbols, we will use the reset_index()
method to reset its index to zero-based integers and bring the symbols as a column in the resulting
DataFrame. The following code snippet shows the resulting cumulative terminal returns at the end of the
measurement period:
end_of_measurement_period_return_df =
past_cum_return_df.loc[end_of_measurement_period]
end_of_measurement_period_return_df =
end_of_measurement_period_return_df.reset_index()
>>> end_of_measurement_period_return_df.head()
index 2022-06-30 00:00:00-04:00
0 AAPL -0.227936
1 AMGN 0.099514
2 AXP -0.144964
3 BA -0.320882
4 CAT -0.126977
These six-month terminal monthly returns of the 30 DJI constituents represent the relative momentum
of each stock. We can observe the stock symbols and returns with the highest momentum in the positive
and negative directions using the following code snippet:
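That snippet is not reproduced here; one way to inspect the extremes, assuming the column layout shown above (the second column holds the returns) and a hypothetical name returns_col, is:
returns_col = end_of_measurement_period_return_df.columns[1]
>>> end_of_measurement_period_return_df.loc[end_of_measurement_period_return_df[returns_col].idxmax()]
>>> end_of_measurement_period_return_df.loc[end_of_measurement_period_return_df[returns_col].idxmin()]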
Here, we used the methods idxmax() and idxmin() to return the index of the maximum and
minimum values, respectively.
These two stocks would become the best choices if we were to long or short an asset. Instead of focusing
on only one stock in each direction (long and short), we can enlarge the space and use a quantile approach
for stock selection. For example, we can classify all stocks into ive groups (also referred to as quantiles or
percentiles) based on their returns and form a trading strategy that longs the stocks in the top percentile
and shorts those in the bottom percentile.
To obtain the quantile of each return, we can use the qcut() function from Pandas, which receives a
Pandas Series and cuts it into a prespecified number of groups based on their quantiles, thus discretizing
the continuous variable into a categorical (more specifically, ordinal) one. The following code snippet
provides a short example:
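The example itself is not reproduced here; a short example in the same spirit, using dummy data, might be:
>>> pd.qcut(pd.Series(range(10)), 5, labels=False)  # labels: 0 0 1 1 2 2 3 3 4 4, two observations per quintile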
Thus, the qcut() function rank-orders the series into ive groups based on their quantiles. We can now
similarly rank-order the returns and store the result in a new column called rank, as shown in Listing 6-7.
end_of_measurement_period_return_df['rank'] =
pd.qcut(end_of_measurement_period_return_df.iloc[:,1], 5, labels=False)
>>> end_of_measurement_period_return_df.head()
index 2022-06-30 00:00:00-04:00 rank
0 AAPL -0.227936 1
1 AMGN 0.099514 4
2 AXP -0.144964 2
3 BA -0.320882 0
4 CAT -0.126977 2
Listing 6-7 Rank-ordering the stocks based on cumulative terminal monthly returns
We can now use this column to select the top and bottom performers. Specifically, we will long the stocks
ranked four and short the stocks ranked zero. Let us observe the stock symbols in these two groups via
Listing 6-8.
long_stocks = end_of_measurement_period_return_df.loc[
    end_of_measurement_period_return_df["rank"] == 4, "index"].values
>>> long_stocks
array(['AMGN', 'CVX', 'IBM', 'KO', 'MRK', 'TRV'], dtype=object)
short_stocks = end_of_measurement_period_return_df.loc[
    end_of_measurement_period_return_df["rank"] == 0, "index"].values
>>> short_stocks
array(['BA', 'CRM', 'CSCO', 'DIS', 'HD', 'NKE'], dtype=object)
Listing 6-8 Obtaining the stock tickers to long or short
Having identified the group of stocks to be bought or sold, we will execute the trading actions and enter
into these positions for a period of one month. Since the current period is 2022-07-31, we will evaluate the
out-of-sample performance of the momentum strategy on 2022-08-31.
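The listing that evaluates the long leg (presumably Listing 6-9) is not included in this extract; a sketch consistent with Listing 6-10 below could be:
from dateutil.relativedelta import relativedelta
long_return_df = mth_return_df.loc[formation_period + relativedelta(months=1),
                                   mth_return_df.columns.isin(long_stocks)]
>>> long_return_df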
The result shows that the majority of the top performers are decreasing in price, which is a direct
reflection of market sentiment during that period of time. We can similarly obtain the evaluation-period
performance for the bottom performers in the short position, as shown in Listing 6-10.
short_return_df = mth_return_df.loc[formation_period + relativedelta(months=1),
                                    mth_return_df.columns.isin(short_stocks)]
>>> short_return_df
BA 0.005900
CRM -0.151614
CSCO -0.014327
DIS 0.056362
HD -0.035350
NKE -0.073703
Name: 2022-08-31 00:00:00-04:00, dtype: float64
Listing 6-10 Obtaining the performance of stocks in a short position at the evaluation period
Now we calculate the return of the evaluation period based on these two positions. We assume an
equally weighted portfolio in both positions. Thus, the final return is the average of all member stocks in the
respective position. Also, since we hold a short position for the bottom performers, we subtract the average
return from the short position in these stocks while adding the average return from the long position.
Listing 6-11 completes the calculation.
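Listing 6-11 is not reproduced in this extract; under the equal-weighting assumption described above, the computation presumably boils down to something like the following, with momentum_return as a hypothetical variable name:
momentum_return = long_return_df.mean() - short_return_df.mean()  # long leg minus short leg
>>> momentum_return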
Therefore, the momentum trading strategy reports a final monthly return of 1.587%. Now let us
compare with the buy-and-hold strategy.
Comparing with the Buy-and-Hold Strategy
We assume a buy-and-hold strategy on the DJI as the benchmark. This means entering a long position in
the index at the beginning of the trading period on 2021-01-01 and holding it all the way until
2022-09-01. We first download the data on this index by passing “^DJI” as the ticker symbol, as shown in
the following code snippet:
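A sketch of this download, reusing the same date range and a hypothetical variable name dji_index_df, could be:
dji_index_df = yf.download("^DJI", start=start_date, end=end_date)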
Next, we follow the same approach to calculate the monthly terminal returns, as shown in Listing 6-12.
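Listing 6-12 is not shown here; following the same recipe as before, it presumably looks like:
dji_mth_return_df = (
    dji_index_df['Adj Close']
    .pct_change()
    .resample('M')
    .agg(lambda x: (1 + x).prod() - 1)
)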
We can then access the monthly return during the evaluation period, as shown in the following code
snippet:
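A possible form of that snippet, reusing the hypothetical dji_mth_return_df from above, is:
>>> dji_mth_return_df.loc[formation_period + relativedelta(months=1)]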
The buy-and-hold strategy thus reports a monthly return of –4.064% in the same evaluation period.
Although the momentum trading strategy performs better, we are still far from claiming victory here. More
robust backtesting on the out-of-sample performance across multiple periods is still needed.
Summary
In this chapter, we looked at the momentum trading strategy and its implementation in Python. We started
by comparing it with the trend-following strategy from the previous chapter, discussing their connections
and differences in terms of time series and cross-sectional analysis, as well as the different use of lookback
and lookahead windows. Next, we covered its implementation using monthly returns, focusing on the
process of signal generation and out-of-sample performance evaluation.
In the next chapter, we will learn a systematic way of assessing different trading strategies using
backtesting.
Exercises
Play around with the parameters of the momentum trading strategy (such as the window size) and
assess the performance.
Try implementing the momentum trading strategy on a different set of assets, such as commodities,
forex, or cryptocurrencies. Discuss any differences or similarities you observe in the performance of the
strategy.
Try to create a hybrid strategy that combines both momentum trading and trend following. How does
this hybrid strategy perform compared to the stand-alone strategies?
Try to incorporate volatility measures, such as Bollinger Bands or standard deviation of returns, into the
momentum trading strategy. How does this impact the performance?
Implement the strategy using other momentum indicators such as the Relative Strength Index (RSI) or
the Moving Average Convergence Divergence (MACD). Compare their performance with the basic
momentum strategy.
Incorporate transaction costs into the momentum trading strategy. How do these costs impact the
overall profitability of the strategy?
Perform backtesting of the momentum trading strategy over different market periods (bull market, bear
market, high volatility period, etc.). How robust is the strategy across different market conditions?
Introducing Backtesting
Backtesting allows us to simulate a trading strategy using historical data and
analyze the risk and return before actually entering into a position. It refers to the
process of testing a particular trading strategy backward using historical data in
order to assess its performance on future data going forward. Such performance
is also called the test set performance in the context of training a machine
learning model, with the common constraint that the test set needs to be
completely kept away when formulating a strategy or training a model. This
period of historical data reserved for testing purposes allows us to assess the
potential variability of the proposed trading strategy.
Building on that, backtesting offers a way to measure the effectiveness of a
trading strategy while keeping emotions and subjective bias at bay. It provides a
scientific method to simulate the actual performance of a strategy, which can then
be used to calculate various summary metrics that indicate the strategy's
potential profitability, risk, and stability over time. Example metrics include the
total return, average return, volatility, maximum drawdown (to be covered
shortly), and the Sharpe ratio.
When carrying out a backtesting procedure, one needs to avoid data snooping
(i.e., peeking into the future) and respect the sequence of time. Even if a certain
period of historical data is used to cross-validate a strategy, one needs to ensure
that the cross-validation periods fall outside or, more specifically, after the
training period. In other words, the cross-validation period cannot sit in the
middle of the training period, thus preserving the sequence of time as we move
forward.
Retrospectively testing out the hypothetical performance of a trading strategy
on historical data allows us to assess its variability over a set of aforementioned
metrics. Since the same trading strategy may exhibit completely different
behavior when backtested over various choices of investment horizons and
assets, it is critical to overlay a comprehensive set of backtesting scenarios for
the particular trading strategy before its adoption. It is essential to conduct a
thorough and varied backtesting process, as the performance can vary greatly
depending on the choice of investment horizon, the selection of assets, and the
specific market conditions during the testing period.
For example, we can use backtesting on the trend-following strategy we
covered earlier, where we use two moving averages to generate trading signals if
there is a crossover. In this process, the input consists of two window sizes: one
for the short window and one for the long window. The output is the resulting
return, volatility, or another risk-adjusted return measure such as the Sharpe
ratio. Any pair of window sizes for the moving averages has a corresponding
performance metric, and we would change the input parameters in order to obtain
the optimal performance metric on the historical data. More specifically, we can
create a range of potential values for each parameter; for example, we could test
short moving averages from 10 to 30 days and long moving averages from 50 to
200 days. For each combination of these parameters, we calculate the
corresponding performance metric. The optimal parameters then maximize (or
minimize, depending on the specific metric) this selected performance metric.
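As a concrete illustration of this parameter sweep (not from the book), the sketch below loops over candidate window pairs and records the Sharpe ratio of each; prices is assumed to be a Pandas Series of adjusted closing prices, and evaluate_sharpe is a hypothetical helper:
import itertools
import numpy as np

def evaluate_sharpe(prices, short_win, long_win, riskfree_rate=0.03):
    # Hypothetical helper: backtest a moving-average crossover and return its annualized Sharpe ratio
    short_ma = prices.rolling(short_win).mean().shift(1)   # use information up to the previous day
    long_ma = prices.rolling(long_win).mean().shift(1)
    signal = np.where(short_ma > long_ma, 1, -1)
    log_ret = np.log(prices).diff() * signal
    ann_ret = np.exp(log_ret.sum()) ** (252 / log_ret.count()) - 1
    ann_vol = log_ret.std() * np.sqrt(252)
    return (ann_ret - riskfree_rate) / ann_vol

results = {}
for short_win, long_win in itertools.product(range(10, 31, 5), range(50, 201, 50)):
    results[(short_win, long_win)] = evaluate_sharpe(prices, short_win, long_win)
best_params = max(results, key=results.get)  # window pair with the highest Sharpe ratio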
Caveats of Backtesting
Note that a good backtesting performance does not necessarily guarantee a good
future return. This is due to the underlying assumption of backtesting: any
strategy that did well in the past is likely to do well in the future period, and
conversely, any strategy that performed poorly in the past is likely to perform
poorly in the future. However, financial markets are complex adaptive systems
influenced by a myriad of factors, including economic indicators, geopolitical
events, and even shifts in investor sentiment; all of these are constantly evolving
and can deviate significantly from past patterns. In summary, past performance is
not indicative of future results.
That said, a well-conducted backtest that yields positive results gives some
assurance that the strategy is fundamentally sound and is more likely to yield
profits when implemented in reality. Backtesting can at least help us weed out the
strategies that do not prove themselves worthy. However, this assumption is
likely to fail in the stock market, which typically exhibits a low signal-to-noise
ratio. Since financial markets keep evolving fast, the future may exhibit patterns
not present in the historical data, making extrapolation a more difficult task than
interpolation.
Another issue with backtesting is the potential to overfit a strategy such that
it performs well on the historical data used for testing but fails to generalize to
new, unseen data. Overfitting occurs when a strategy is too complex and tailors
itself to the idiosyncrasies and noise in the test data rather than identifying and
exploiting the fundamental patterns that govern the data-generating process.
In addition, the backtesting period of the historical data needs to be
representative and reflect a variety of market conditions. Excessively reusing the
same dataset for backtesting is called data dredging, where the same dataset may
produce an exceptionally good result purely by chance. If the backtest only
includes a period of economic boom, for instance, the strategy might appear more
successful than it would during a downturn or volatile market conditions. By
assessing the trading strategy over a comprehensive and diverse period of
historical data, we can avoid data dredging and better tell whether the good
performance, if any, is due to sound trading or merely a fluke.
Data dredging, or “p-hacking,” is a material concern in backtesting. It involves
repeatedly running different backtests with slightly modified parameters on the
same dataset until a desirable result is found. The danger here lies in the fact that
the positive result might just be a product of chance rather than an indication of a
genuinely effective strategy. This overfitting could lead to a strategy that
performs exceptionally well on the test data but fails miserably on new, unseen
data.
On the other hand, the selection of the stocks used for backtesting also needs
to be representative, including companies that eventually went bankrupt, were
sold, or were liquidated. Failing to do so produces survivorship bias, where one
cherry-picks a set of stocks, only looks at those that survived until today, and
ignores others that disappeared along the way. By excluding companies that have
failed or undergone significant structural changes, we could end up with an overly
optimistic view of the strategy's profitability and risk profile. This is because the
stocks that have survived, in general, are likely to be those that performed better
than average. Ignoring companies that went bankrupt or were delisted for any
reason may skew the results, creating an illusion of a successful strategy when, in
reality, the strategy may not perform as well in the real environment.
Moreover, by incorporating stocks that have underperformed or failed, we are
in a better position to assess the risk of the strategy and prepare for worst-case
scenarios. This can lead to more accurate risk and reward assessments and better
inform the decision-making process when it comes to deploying the strategy. The
resulting strategy will also be more robust and able to withstand various market
conditions, including periods of economic downturn or industry-specific shocks.
Lastly, a backtest should also consider all trading costs, however insignificant,
as these can add up over the course of the backtesting period and drastically
affect a trading strategy's profitability. These costs can include brokerage fees,
bid-ask spreads, slippage (the difference between the expected price of a trade
and the price at which the trade is executed), and in some cases, taxes and other
regulatory fees. Overlooking these costs in backtesting can lead to an overly
optimistic assessment of a strategy's performance. For example, a high-frequency
trading strategy might seem profitable when backtested without trading costs.
However, in reality, such strategies involve a large number of trades and,
therefore, high transaction costs, which can quickly erode any potential profits.
Considering these costs during the backtesting stage will present a more accurate
estimate of the net profitability of the strategy. Moreover, the impact of trading
costs can vary greatly depending on the specifics of the trading strategy.
Strategies that involve frequent trading, narrow profit margins, or large order
sizes can be particularly sensitive to the assumptions made about trading costs in
the backtesting process.
Before diving into the specifics of backtesting, let us introduce a popular risk
measure called the maximum drawdown, or max drawdown.
Again, the max drawdown is a risk measure that helps us understand the
worst-case scenario of the trading strategy during the backtest period. At each
point in time, the drawdown is the percentage drop of the current wealth from its
running peak, that is, (W_t − P_t)/P_t, where W_t is the wealth index at time t and
P_t is the highest wealth achieved up to time t; the max drawdown is the most
negative of these values. Such a calculation intuitively makes sense, since most
people treat the drawdown as the money they have lost compared to the peak
asset value they once owned in the past.
Figure 7-2 provides a sample wealth index curve and the corresponding
single-period drawdowns. Based on the cumulative wealth index curve in the blue
line in the left panel, we can obtain the cumulative peak value in the green line,
which overlaps with the wealth index whenever the wealth reaches new highs
and stays flat when the wealth drops. We can thus form a new time series
consisting of single-period drawdowns as the percentage difference between
these two curves and take its lowest point as the max drawdown.
Figure 7-2 Obtaining the max drawdown based on a sample wealth index curve
Here, the max drawdown does not mean we are going to suffer such a loss; it
simply means the maximum loss we could have suffered following the particular
trading strategy. The strategy may incur such a loss if we are extremely unlucky
and happen to buy the asset at its peak price and sell it at its trough price. A
strategy with a high max drawdown would indicate a higher risk level, as it shows
that the strategy has historically resulted in substantial losses. On the other hand,
a strategy with a low max drawdown would indicate lower risk, as it has not led to
significant losses in the past.
A shrewd reader may immediately wonder if there is a risk-adjusted return
metric based on drawdown risk. It turns out there is, and the measure is called the
Calmar ratio, which is calculated as the ratio between the annualized return of the
trailing 36 months and the max drawdown over the same trailing 36 months.
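A minimal sketch of this ratio, assuming a Pandas Series of monthly returns named monthly_returns, could be:
trailing = monthly_returns.iloc[-36:]                                 # trailing 36 months
annualized_return = (1 + trailing).prod() ** (12 / len(trailing)) - 1
wealth = (1 + trailing).cumprod()
max_dd = ((wealth - wealth.cummax()) / wealth.cummax()).min()
calmar_ratio = annualized_return / abs(max_dd)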
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
start_date = "2023-01-01"
end_date = "2023-02-11"
df = yf.download(['GOOG', 'MSFT'], start=start_date,
end=end_date)
>>> df.head()
Listing 7-1 Downloading the stock price data
Figure 7-3 Printing the first five rows of the downloaded stock price data
Note that the DataFrame is indexed by a list of dates in the datetime format,
as shown in the following:
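The step that creates df2 is missing from this extract; presumably it keeps only the adjusted closing prices, for example:
df2 = df['Adj Close']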
>>> df2.index
DatetimeIndex(['2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11',
'2023-01-12', '2023-01-13', '2023-01-17', '2023-01-18',
'2023-01-19', '2023-01-20', '2023-01-23', '2023-01-24',
'2023-01-25', '2023-01-26', '2023-01-27', '2023-01-30',
'2023-01-31', '2023-02-01', '2023-02-02', '2023-02-03',
'2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09',
'2023-02-10'], dtype='datetime64[ns]', name='Date',
freq=None)
We can use these date indices to subset the DataFrame by the different
granularity of time periods, such as selecting at the monthly level. As an example,
the following code snippet slices the data in February 2023:
>>> df2.loc["2023-02"]
GOOG MSFT
Date
2023-02-01 101.430000 252.750000
2023-02-02 108.800003 264.600006
2023-02-03 105.220001 258.350006
2023-02-06 103.470001 256.769989
2023-02-07 108.040001 267.559998
2023-02-08 100.000000 266.730011
2023-02-09 95.459999 263.619995
2023-02-10 94.860001 263.100006
The DataFrame we will work with contains 28 days of daily adjusted closing
prices for both stocks, ranging from 2023-01-03 to 2023-02-10. We can check
these details using the info() method:
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 28 entries, 2023-01-03 to 2023-02-10
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GOOG 28 non-null float64
1 MSFT 28 non-null float64
dtypes: float64(2)
memory usage: 672.0 bytes
>>> df2.plot.line()
To better understand the stock returns, let us convert the raw stock prices to
single-period percentage returns using the pct_change() function:
returns_df = df2.pct_change()
>>> returns_df.head()
GOOG MSFT
Date
2023-01-03 NaN NaN
2023-01-04 -0.011037 -0.043743
2023-01-05 -0.021869 -0.029638
2023-01-06 0.016019 0.011785
2023-01-09 0.007260 0.009736
Again, the first day shows an NA value since there is no prior stock price as the
baseline to calculate the daily return.
The corresponding line plot for the daily returns follows in Figure 7-5.
>>> returns_df.plot.line()
The figure suggests that the daily returns of both stocks are highly correlated,
except for the last few days when Google showed a sharp dip in price. Such a dip
will reflect itself in the max drawdown measure, as we will show later. Besides, we
also observe a higher volatility for Google as compared to Microsoft.
Now let us construct the wealth index time series. We assume an initial
amount of $1000 for each stock, based on which we will observe the daily
evolution of the portfolio value, assuming a buy-and-hold strategy. Such a wealth
process relies on the sequential compounding process using the cumprod()
function based on 1+R returns, as shown in Listing 7-2.
initial_wealth = 1000
wealth_index_df = initial_wealth*(1+returns_df).cumprod()
>>> wealth_index_df.head()
GOOG MSFT
Date
2023-01-03 NaN NaN
2023-01-04 988.963234 956.256801
2023-01-05 967.335558 927.915502
2023-01-06 982.831735 938.851285
2023-01-09 989.966623 947.992292
Listing 7-2 Constructing the wealth curve
We can override the initial entry as 1000 in order to plot the complete wealth
index curve for both stocks. This essentially tracks the money we have at each
time point after we invest $1000 in each stock on day 1, that is, 2023-01-03.
wealth_index_df.loc["2023-01-03"] = initial_wealth
>>> wealth_index_df.head()
GOOG MSFT
Date
2023-01-03 1000.000000 1000.000000
2023-01-04 988.963234 956.256801
2023-01-05 967.335558 927.915502
2023-01-06 982.831735 938.851285
2023-01-09 989.966623 947.992292
Now we plot the wealth curve for both stocks, as shown in Figure 7-6.
>>> wealth_index_df.plot.line()
prior_peaks_df = wealth_index_df.cummax()
>>> prior_peaks_df.head()
GOOG MSFT
Date
2023-01-03 1000.0 1000.0
2023-01-04 1000.0 1000.0
2023-01-05 1000.0 1000.0
2023-01-06 1000.0 1000.0
2023-01-09 1000.0 1000.0
Listing 7-3 Constructing the cumulative maximum wealth
>>> prior_peaks_df.plot.line()
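The listing that computes the daily drawdowns (presumably Listing 7-4) is not reproduced here; based on the definition used throughout this section, it amounts to:
drawdown_df = (wealth_index_df - prior_peaks_df) / prior_peaks_df  # percentage drop from the running peak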
>>> drawdown_df.plot.line()
The sharp dip in Google's drawdown at the end of the series becomes more
noticeable now, and we can say something about the reason behind the steep drop.
It turns out that there was a factual error in the demo when Google introduced
Bard as a response to the challenge from the Microsoft-backed ChatGPT. The error
caused Google shares to tank, wiping out roughly $100 billion in market value.
Coming back to the max drawdown, we can now collect the minimum of these
daily drawdowns as the final report of the max drawdown for this trading
strategy, as shown in Listing 7-5. Note that we entered a long position in both
stocks at the beginning of the investment period, so the trading strategy is simply
buy-and-hold.
>>> drawdown_df.min()
GOOG -0.128125
MSFT -0.072084
dtype: float64
Listing 7-5 Calculating the max drawdown
Here, we take the minimum of the daily drawdown as it is a negative value. In
practice, we would often report it as a positive number. The result shows that
Google has a much bigger max drawdown (again, expressed as a negative value
and interpreted as a positive absolute value), nearly double the max drawdown of
Microsoft during the same trading period.
We can observe the date when the max drawdown occurs using the
idxmin() function, which returns the date index of the minimum value across
the whole column/series, as shown in the following code snippet:
>>> drawdown_df.idxmin()
GOOG 2023-02-10
MSFT 2023-01-05
dtype: datetime64[ns]
We can also limit the range of the DataFrame by subsetting using a less
granular date index in the loc() function. For example, the following code
returns the max drawdown and the corresponding date for each stock in January
2023:
>>> drawdown_df.loc["2023-01"].min()
GOOG -0.044264
MSFT -0.072084
dtype: float64
>>> drawdown_df.loc["2023-01"].idxmin()
GOOG 2023-01-25
MSFT 2023-01-05
dtype: datetime64[ns]
Till now, we have managed to calculate the max drawdown following the
requisite steps. It turns out that a function would be extremely helpful when such
steps become tedious and complex. Using a function to wrap the recipe as a black
box allows us to focus on the big picture and not get bogged down by the inner
workings each time we calculate the max drawdown.
We de ine a function called drawdown() to achieve this task, as shown in
Listing 7-6. This function takes the daily returns in the form of a single Pandas
Series as input, executes the aforementioned calculation steps, and returns the
daily wealth index, prior peaks, and drawdowns in a DataFrame as the output.
Note that the calculation process remains the same. The only change is the
compilation of the relevant information (wealth index, prior peaks, and
drawdown) in one DataFrame. Also, we explicitly specified the input type to be a
Pandas Series, as this saves the need to check the input type later on.
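Listing 7-6 itself is not included in this extract; a sketch consistent with the description above and the outputs below could be:
def drawdown(return_series: pd.Series) -> pd.DataFrame:
    # Compute the wealth index, prior peaks, and drawdowns from a series of periodic returns
    wealth_index = 1000 * (1 + return_series).cumprod()
    prior_peaks = wealth_index.cummax()
    drawdowns = (wealth_index - prior_peaks) / prior_peaks
    return pd.DataFrame({"Wealth index": wealth_index,
                         "Prior peaks": prior_peaks,
                         "Drawdown": drawdowns})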
Now let us test this function by passing Google’s daily returns as the input
series:
>>> drawdown(returns_df["GOOG"]).head()
Wealth index Prior peaks Drawdown
Date
2023-01-03 NaN NaN NaN
2023-01-04 988.963234 988.963234 0.000000
2023-01-05 967.335558 988.963234 -0.021869
2023-01-06 982.831735 988.963234 -0.006200
2023-01-09 989.966623 989.966623 0.000000
The following code snippet plots the wealth index and prior peaks as line
charts:
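That snippet is not shown here; a possible form, mirroring the monthly version right after, is:
>>> drawdown(returns_df["GOOG"])[['Wealth index', 'Prior peaks']].plot.line()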
We can use the loc() function to subset for a specific month. For example,
the following code returns the same curves for January 2023:
>>> drawdown(returns_df.loc["2023-01","GOOG"])[['Wealth
index', 'Prior peaks']].plot.line()
Figure 7-10 Visualizing the wealth index and prior peaks for January 2023
Similarly, we can obtain the max drawdown and the corresponding date for
both stocks, as shown in the following code snippet:
>>> drawdown(returns_df["GOOG"])['Drawdown'].min()
-0.1281250188455857
>>> drawdown(returns_df["GOOG"])['Drawdown'].idxmin()
Timestamp('2023-02-10 00:00:00')
>>> drawdown(returns_df["MSFT"])['Drawdown'].min()
-0.035032299621028426
>>> drawdown(returns_df["MSFT"])['Drawdown'].idxmin()
Timestamp('2023-01-19 00:00:00')
The following code snippet returns the max drawdown for both stocks in
January 2023:
>>> drawdown(returns_df.loc["2023-01","GOOG"])
['Drawdown'].min()
-0.04426435893749917
>>> drawdown(returns_df.loc["2023-01","MSFT"])
['Drawdown'].min()
-0.035032299621028426
In the next section, we will discuss the backtesting procedure using the trend-
following strategy.
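The code that prepares df_goog is not part of this extract; a sketch, assuming Google's daily prices for 2022 downloaded via yfinance, could be:
df_goog = yf.download("GOOG", start="2022-01-01", end="2023-01-01")[['Adj Close']]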
Now we create two moving averages, a short curve with a span of 5 using the
exponential moving average via the ewm() method and a long curve with a
window size of 30 using the simple moving average via the rolling() method,
as shown in Listing 7-7.
sma_span = 30
ema_span = 5
short_ma = 'ema'+str(ema_span)
long_ma ='sma'+str(sma_span)
df_goog[long_ma] = df_goog['Adj
Close'].rolling(sma_span).mean()
df_goog[short_ma] = df_goog['Adj
Close'].ewm(span=ema_span).mean()
>>> df_goog.head()
Adj Close sma30 ema5
Date
2022-01-03 145.074493 NaN 145.074493
2022-01-04 144.416504 NaN 144.679700
2022-01-05 137.653503 NaN 141.351501
2022-01-06 137.550995 NaN 139.772829
2022-01-07 137.004501 NaN 138.710106
Listing 7-7 Calculating the short and long moving averages
Note that the span is directly related to the α parameter we introduced earlier
via the relationship α = 2 / (span + 1), where span ≥ 1.
Since generating the trading signal requires that both moving averages are
available at each time point, we remove the rows with any NA value in the
DataFrame using the dropna() method, where we set inplace=True to
change within the DataFrame directly:
df_goog.dropna(inplace=True)
>>> df_goog.head()
Adj Close sma30 ema5
Date
2022-02-14 135.300003 137.335750 137.064586
2022-02-15 136.425507 137.047450 136.851559
2022-02-16 137.487503 136.816483 137.063541
2022-02-17 132.308502 136.638317 135.478525
2022-02-18 130.467499 136.402200 133.808181
Now let us plot these two moving averages together with the original price
curve via the following code snippet:
fig = plt.figure(figsize=(14,7))
plt.plot(df_goog.index, df_goog['Adj Close'],
linewidth=1.5, label='Daily Adj Close')
plt.plot(df_goog.index, df_goog[long_ma], linewidth=2,
label=long_ma)
plt.plot(df_goog.index, df_goog[short_ma], linewidth=2,
label=short_ma)
plt.title("Trend following strategy")
plt.ylabel('Price($)')
plt.legend()
Figure 7-11 Visualizing the moving averages together with the raw time series
As Figure 7-11 suggests, the short moving average (green curve) tracks the
raw time series more closely, while the long moving average (orange curve)
displays a smoother pattern due to a stronger averaging effect.
Now let us calculate the log returns of the buy-and-hold strategy, which
assumes buying one share of Google stock and holding it till the end of the
investment period. This is shown in Listing 7-8.
df_goog['log_return_buy_n_hold'] = np.log(df_goog['Adj
Close'] / df_goog['Adj Close'].shift(1))
Listing 7-8 Calculating the log returns of the buy-and-hold strategy
df_goog['log_return_buy_n_hold'] = np.log(df_goog['Adj
Close']).diff()
Listing 7-9 An equivalent way of calculating the log returns
Next, we identify the trading signals for the trend-following strategy, starting
by creating a signal column that indicates the intended position based on the
magnitude of the two moving averages. This is shown in Listing 7-10.
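Listing 7-10 is not reproduced here; a sketch that mirrors the signal logic of Listing 5-15 could be:
df_goog['signal'] = np.where(df_goog[short_ma] > df_goog[long_ma], 1, -1)  # long when the short MA is above the long MA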
The periodic log returns for the trend-following strategy can be obtained by
multiplying signal with log_return_buy_n_hold via Listing 7-11.
df_goog['log_return_trend_follow'] = df_goog['signal'] *
df_goog['log_return_buy_n_hold']
Listing 7-11 Calculating the periodic log returns of the trend-following strategy
The terminal return can be calculated using the cumprod() function or the
prod() function, as shown in Listing 7-12. The first approach compounds the
periodic 1+R returns and takes the last value as the final return before converting
to the simple return format. The second approach directly multiplies all
intermediate 1+R returns to get the final value, followed by the same conversion
to a simple return.
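Listing 7-12 is not included in this extract; the two equivalent computations described above can be sketched as follows, with terminal_return as a hypothetical name:
# approach 1: compound the periodic 1+R returns and take the last value
terminal_return = np.exp(df_goog['log_return_trend_follow']).cumprod().iloc[-1] - 1
# approach 2: multiply all periodic 1+R returns directly
terminal_return = np.exp(df_goog['log_return_trend_follow']).prod() - 1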
>>> np.exp(df_goog['log_return_trend_follow'].sum())**
(252/df_goog.shape[0])-1
0.4210313983829783
Let us calculate the annualized volatility, as shown in Listing 7-14. Recall that
the daily volatility scales up as a function of the square root of time.
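Listings 7-13 and 7-14 are not reproduced here; a sketch that produces the annualized returns and volatilities consumed by Listing 7-15 below (variable names chosen to match it) could be:
n_days = df_goog.shape[0]
annualized_return_buy_n_hold = np.exp(df_goog['log_return_buy_n_hold'].sum()) ** (252 / n_days) - 1
annualized_return_trend_follow = np.exp(df_goog['log_return_trend_follow'].sum()) ** (252 / n_days) - 1
annualized_vol_buy_n_hold = df_goog['log_return_buy_n_hold'].std() * np.sqrt(252)
annualized_vol_trend_follow = df_goog['log_return_trend_follow'].std() * np.sqrt(252)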
Now we calculate the Sharpe ratio, assuming a risk-free interest rate of 3%.
This is shown in Listing 7-15.
riskfree_rate = 0.03
# calculate Sharpe ratio of buy-n-hold
sharpe_ratio_buy_n_hold = (annualized_return_buy_n_hold -
riskfree_rate) / annualized_vol_buy_n_hold
>>> sharpe_ratio_buy_n_hold
-1.0569661045137495
# calculate Sharpe ratio of trend following
sharpe_ratio_trend_follow =
(annualized_return_trend_follow - riskfree_rate) /
annualized_vol_trend_follow
>>> sharpe_ratio_trend_follow
0.9953569038205886
Listing 7-15 Calculating the Sharpe ratio
The two strategies turn out to be quite disparate in terms of these backtesting
measures, which underscores the importance of demonstrating the superiority of a
strategy across a set of common backtesting measures before its adoption. In the
next chapter, we will discuss a feedback loop that optimizes the selection of
trading parameters, such as the window size, in order to obtain the best trading
performance for a specific trading strategy.
Summary
In this chapter, we covered the process of backtesting a trading strategy. We
started by introducing the concept of backtesting and its caveats. We then
introduced the maximum drawdown, a commonly used performance measure on
the downside risk of a particular trading strategy, followed by its calculation
process. Lastly, we provided an example of how to backtest a trend-following
strategy via multiple performance measures.
In the next chapter, we will introduce statistical arbitrage with hypothesis
testing, with the pairs trading strategy as the working example.
Exercises
Asset A loses 1% a month for 12 months, and asset B gains 1% per month for 12
months. Which is the more volatile asset?
Drawdown is a measure of only downside risk and not upside risk. True or
false?
Assume the risk-free rate is never negative. The drawdown of an investment
that returns the risk-free rate every month is zero. True or false?
The drawdown computed from a daily return series is always greater than or
equal to the drawdown computed from the corresponding monthly series. True
or false?
Write a class to calculate the annualized return, volatility, Sharpe ratio, and max
drawdown of a momentum trading strategy.
How does the frequency of data sampling affect the calculated max drawdown?
What might be the implications of using daily data vs. monthly data?
Assume you have calculated a Sharpe ratio of 1.5 for your trading strategy. If the
risk-free rate increases, what would happen to the Sharpe ratio, all else being
equal?
If a strategy has a positive average return but a high max drawdown, what
might this suggest about the risk of the strategy?
Statistical Arbitrage
Statistical arbitrage refers to the use of statistical methods to identify statistically significant
relationships underlying multiple financial assets and generate trading signals. There are two
parts involved in this process: statistical analysis and arbitrage. In this context, statistical
analysis mostly refers to hypothesis testing, which is a suite of statistical procedures that allows
us to determine if a specific relationship among multiple financial instruments based on the
observed data is statistically significant. On the other hand, arbitrage means making sure-win
profits.
At its core, this strategy relies on mean reversion, which assumes that financial instruments
that have deviated far from their historical relationship will eventually converge again. For
instance, consider two highly correlated stocks, A and B. If, due to some short-term market
factors, the price of A increases disproportionately compared to B, a statistical arbitrage strategy
might involve short-selling A (which is now overpriced) and buying B (which is underpriced). As
the prices of A and B revert to their historical correlation, the arbitrageur would close both
positions—buy A to cover the short sell and sell B to realize the gain. The net profit comes from
the convergence of prices. Therefore, statistical arbitrage is essentially a market-neutral strategy,
generating profits by taking advantage of temporary market inefficiencies.
Note that statistical arbitrage strategies should expect a relatively stable long-term
equilibrium relationship between the two underlying assets for the strategy to work. They also
operate on relatively small profit margins, necessitating high volumes of trades to generate
substantial returns.
Delving deeper, the first step in the statistical arbitrage process is to identify pairs of trading
instruments that exhibit a high degree of comovement. This can be achieved through statistical
procedures such as correlation analysis or cointegration tests. For instance, consider stocks A
and B, which typically move in sync with each other. Although perfect correlation is rare in
financial markets, we can leverage historical price data to find stocks that are highly correlated,
often within the same industry or sector.
However, this comovement doesn’t always mean equal price changes. Short-term fluctuations
driven by various factors like market sentiment, sudden news announcements, or unforeseen
events like a pandemic can cause a temporary divergence in the price relationship. In the given
example, if stock A increases by 10% and stock B only by 5%, it suggests a temporary mispricing
where B is underpriced relative to A.
This brings us to the second step, which involves capitalizing on this mispricing through
trading actions such as pairs trading. In the case of A and B, an investor could execute a long
position on the underpriced stock B, expecting its price to increase and converge with the price
of A.
It’s important to note that statistical arbitrage relies heavily on the premise that these pricing
inefficiencies are temporary and that the price relationship will revert to its historical norm.
Therefore, this strategy necessitates diligent monitoring and a robust risk management system
to ensure timely entries and exits.
Figure 8-1 illustrates one way of performing statistical arbitrage. We assume a perfect
correlation between stocks A and B, where the same percentage change is observed for periods 0,
1, and 2. However, stock A increased by 10% in period 3, while stock B only increased by 5%.
Based on the principle of statistical arbitrage, we could long stock B, which is considered to be
underpriced, or short stock A, which is considered overpriced. We could also do both at the same
time.
Figure 8-1 Illustrating the concept of statistical arbitrage. After identifying a perfect correlation between stocks A and B using
statistical techniques, as indicated by the prices in periods 0, 1, and 2, we would take advantage of market mispricing by longing stock
B (which is underpriced) and/or shorting stock A (which is overpriced)
Pairs Trading
Pairs trading is a market-neutral strategy that leverages statistical analysis to generate potential
profits regardless of the overall market direction. The “pair” in pairs trading refers to
simultaneously taking two positions: going long on one asset and short on another, with the key
requirement being that these assets are highly correlated. The trading signal stems from the
spread or price difference between these two assets.
An unusually large spread, in comparison to historical data, suggests a temporary divergence,
and the anticipation is that this divergence will eventually correct itself, reverting to its mean or
average value over time. Traders can capitalize on this mean-reverting behavior, initiating trades
when the spread is abnormally wide and closing them once the spread narrows and returns to its
typical range.
The determination of what constitutes an “abnormal” or “normal” spread is crucial and forms
the core parameters of the pairs trading strategy. This typically involves extensive backtesting,
where historical price data is analyzed to identify consistent patterns in price divergence and
convergence, which then informs the thresholds for trade entry and exit points. Pairs trading,
while robust in its market-neutral stance, requires a keen understanding of the long-term
equilibrium relationship between the paired assets and careful management of potential risks if
the expected price convergence does not materialize.
In the strategy of pairs trading, asset selection is grounded in a statistical procedure called
hypothesis testing, specifically, the cointegration test. This process uses historical price data to
identify pairs of financial instruments that exhibit a high level of correlation. When two assets
are highly correlated, they tend to move in a synchronized manner. This means that any price
change in one asset is typically mirrored proportionally by the other, resulting in relatively stable
spreads that do not deviate significantly from their historical average. However, there can be
moments when this spread deviates markedly from its historical norm, suggesting temporary
mispricing of the assets. This divergence indicates that the assets’ prices have drifted apart more
than their usual correlation would predict.
Such deviations create a unique profit opportunity in pairs trading. Traders can capitalize on
these large spreads by betting on their future contraction. Specifically, the strategy would be to go
long on the underpriced asset and short on the overpriced one, with the anticipation that the
spread will revert back to its historical average as the asset prices correct themselves. This
reversion provides the opportunity to close both positions at a profit.
Figure 8-2 provides the overall workflow of implementing a pairs trading strategy. At first, we
analyze a group of financial assets (such as stocks) and identify a pair that passes the
cointegration test. This is a statistical test that determines if a group of assets is cointegrated,
meaning their combination generates a stationary time series, despite each individual time
series not exhibiting such stationarity. In other words, the historical differences, or spreads, of
the two cointegrated assets form a stationary time series. We can thus monitor the current
spread and check if it exceeds a reasonable range of historical spreads. Exceeding the normal
range indicates a trading signal to enter two positions: long the underpriced asset and short the
overpriced asset. We would then hold these positions until the current spread shrinks back to the
normal range, at which point we would exit the positions and lock in a profit before the spread
shrinks even further (which would result in a loss).
Figure 8-2 Overall workflow of implementing the pairs trading strategy
Cointegration
Cointegration, a concept pivotal to hypothesis testing, posits two potential scenarios: the null
hypothesis, which states that two or more non-stationary time series are not cointegrated, and
the alternative hypothesis, which claims the opposite, that is, these time series are cointegrated
if their linear combination generates a stationary time series (more on this later).
Let’s demystify some of the jargon here. A time series refers to a sequence of data points
indexed (or listed or graphed) in time order, with each data point assigned a specific timestamp.
This dataset can be analyzed through several summary statistics or statistical properties. These
can include metrics like mean and variance computed over a certain time frame or window.
Moving this window across different periods, a stationary time series exhibits constancy in
its mean and variance on average. This means that no matter when you observe it, its basic
properties do not change. On the other hand, a non-stationary time series demonstrates a trend
or a drift, signifying a changing mean and variance across varying time periods. These time series
are dynamic, with their basic properties shifting over time, often due to factors like trends and
seasonality.
Hence, the process of cointegration examines whether there is a long-term equilibrium
relationship between non-stationary time series despite short-term fluctuations. Such long-term
equilibrium manifests as a stationary time series as a linear combination of the two non-
stationary time series.
Many traditional statistical methods, including ordinary least squares (OLS) regression, are
based on the assumption that the variables under analysis—which are also time series data
points—exhibit stationarity. This implies that their fundamental statistical characteristics
remain consistent over time. However, when dealing with non-stationary variables, this
stationarity assumption gets violated. As a result, different techniques are needed to perform the
modeling. One common strategy is to difference the non-stationary variable (deriving a new time
series by taking the difference in the observed values of two consecutive time points) to
eliminate any observable trend or drift.
A non-stationary time series might possess a unit root, which signifies a root of one in its
autoregressive (AR) polynomial. To put it differently, the value in the next time period is strongly
impacted by the present period value. This dependency reflects a form of serial correlation,
where values from previous periods exert influence on subsequent ones, thereby potentially
leading to non-stationary behavior.
The unit root test, therefore, is a method to examine whether a time series is non-stationary
and possesses a unit root. Identifying and addressing the presence of a unit root is a critical step
in the process of time series modeling, especially when the aim is to understand long-term trends
and forecasts.
In essence, a cointegration test examines the assumption that, although individual time
series may each have a unit root and hence be non-stationary, a linear combination of these time
series might result in a stationary series. This forms the alternative hypothesis for the test.
To be precise, the alternative hypothesis states that the aggregate time series, derived from a
linear combination of individual time series, achieves stationarity. Should this be the case, it
would imply a persistent long-term relationship among these time series variables. Such long-
term relationships will get obscured by temporary fluctuations in the market from time to time,
due to factors such as mispricing. Hence, the cointegration test aids in revealing these hidden
long-term relationships among time series variables.
When assets are determined to be cointegrated—meaning that the alternative hypothesis is
upheld—they are fed into the trading signal generation phase of the pairs trading strategy. Here,
we anticipate the long-term relationship between the two time series variables to prevail,
regardless of short-term market turbulence.
Therefore, cointegration serves as a valuable tool in statistical analysis, exposing the
underlying long-term relationship between two non-stationary and seemingly unrelated time
series. This long-term association, difficult to detect when these time series are analyzed
independently, can be discovered by combining these individual non-stationary assets in a
particular way. This combination is typically done using the Johansen test, yielding a new,
combined time series that exhibits stationarity, characterized by a consistent mean and variance
over different periods. Alternatively, the Engle-Granger test can be employed to generate a spread
series from the residuals of a linear regression model between the two assets.
Figure 8-3 illustrates the process of cointegration and strategy formulation. The purpose of
cointegration is to convert individual non-stationary time series data into a combined stationary
series, which can be achieved via the Johansen test with a linear combination, the Engle-Granger
test via a linear regression model, or other test procedures. We would then derive another series
called the spread to indicate the extent of short-term fluctuation from the long-term equilibrium
relationship. The spread is used to generate trading signals in the form of entry and exit points
based on the extent of deviation at each time point, with the help of entry and exit thresholds
defined in advance.
Figure 8-3 Illustrating the process of cointegration using different tests and strategy formulation to generate trading signals
Stationarity
Stock prices are time series data. A stationary time series is a time series where the statistical
properties of the series, including the mean, variance, and covariance at different time points, are
constant and do not change over time. A stationary time series is thus characterized by a lack of
observable trends or cycles in the data.
Let us take the normal distribution as an example. A normal distribution y = f (x; μ, σ) is a
probability density function that maps an input x to a probability output y, assuming a fixed set of
parameters: the mean μ as the central tendency and standard deviation σ as the average
deviation from the mean. The specific form of the probability density function is as follows:
f(x; μ, σ) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
A widely used normal distribution is the standard normal, specifying μ = 0 and σ = 1. The
resulting probability density function is
f(x) = (1 / √(2π)) exp(−x² / 2)
We can generate random samples following this specific form using the random.normal()
function from NumPy. In Listing 8-1, we define a function generate_normal_sample() that
generates a normally distributed random sample by passing in the input parameters μ and σ in a
list.
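Listing 8-1 itself is not reproduced in this extract; a minimal sketch of such a helper function could be:
import numpy as np

def generate_normal_sample(params):
    # params is a two-element list [mean, standard deviation]
    mu, sigma = params
    return np.random.normal(loc=mu, scale=sigma)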
To see the impact on the samples generated from a non-stationary distribution, we will
specify two different non-stationary distributions alongside the standard normal baseline. Specifically,
we will generate 100 samples that follow a distribution with an increasing mean and/or an
increasing standard deviation. Listing 8-2
performs the random sampling for 100 rounds and compares them with the samples from the
standard normal distribution.
T = 100
stationary_list, nonstationary_list1, nonstationary_list2 = [], [], []
for i in range(T):
    # generate a stationary sample and append to list
    stationary_list.append(generate_normal_sample([0, 1]))
    # generate a non-stationary sample with an increasing mean and append to list
    nonstationary_list1.append(generate_normal_sample([i, 1]))
    # generate a non-stationary sample with an increasing mean and sd and append to list
    nonstationary_list2.append(generate_normal_sample([i, np.sqrt(i)]))

x = range(T)
# plot the lists as line plots with labels for each line
plt.plot(x, stationary_list, label='Stationary')
plt.plot(x, nonstationary_list1, label='Non-stationary with increasing mean')
plt.plot(x, nonstationary_list2, label='Non-stationary with increasing mean and sd')
plt.legend()
Running the code generates Figure 8-4, where the impact of a changing mean and standard
deviation becomes more pronounced as we increase the magnitude in later rounds.
Figure 8-4 Generating normally distributed random samples from non-stationary distributions with different parameter
specifications
Note that we can use the augmented Dickey-Fuller (ADF) test to check if a series is
stationary. The function stationarity_test() defined in Listing 8-3 accepts two inputs: the
time series to be tested for stationarity and the significance level used to compare with the p-value
and determine the statistical significance. Note that the p-value is accessed as the second
element of the result returned by the adfuller() function. This is shown in Listing 8-3.
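Listing 8-3 itself is omitted from this extract; a minimal sketch that matches the output format shown below could be:
from statsmodels.tsa.stattools import adfuller

def stationarity_test(series, significance_level=0.05):
    # run the ADF test; the p-value is the second element of the returned tuple
    pvalue = adfuller(series)[1]
    if pvalue < significance_level:
        return f"p-value is {pvalue}. The series is likely stationary."
    else:
        return f"p-value is {pvalue}. The series is likely non-stationary."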
Let us apply this function to the previous time series data. The result shows that the ADF test is
able to differentiate whether a time series is stationary (with fixed parameters) based on a preset
significance level:
>>> print(stationarity_test(stationary_list))
>>> print(stationarity_test(nonstationary_list1))
>>> print(stationarity_test(nonstationary_list2))
p-value is 1.2718058919122438e-12. The series is likely stationary.
p-value is 0.9925665941220737. The series is likely non-stationary.
p-value is 0.9120355459829741. The series is likely non-stationary.
Let us look at a concrete example of how to test for cointegration between two stocks.
import os
import random
import numpy as np
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
from matplotlib import pyplot as plt
%matplotlib inline
SEED = 8
random.seed(SEED)
np.random.seed(SEED)
Now we dig into the linear regression model between these two stocks. We will treat Microsoft
stock as the (only) independent variable and Google stock as the dependent variable to be
predicted. The model assumes the following form:
y = β0 + β1x + ϵ
where β0 denotes the intercept and β1 is the slope of the linear line fitted between these two
stocks. ϵ represents the random noise that is not modeled by the predictor x. Note that we are
assuming a linear relationship between x and y, which is unlikely to be the case in a real-world
environment. Another name for ϵ is the residual, which is interpreted as the (vertical) distance
between the predicted value β0 + β1x and the target value y. That is, ϵ = y − (β0 + β1x).
Our focus would then shift to these residuals, with the intention of assessing if the residual
time series would be stationary. Let us first obtain the residuals from the linear regression
model.
In Listing 8-5, we assign the first stock as the target variable Y and the second stock as the
predictor variable X. We then use the add_constant() function to add a column of ones to the
X variable, which can also be considered as the bias trick to incorporate the intercept term β0.
Next, we construct a linear regression model object using the OLS() function, perform learning
by invoking the fit() function, and calculate the residuals as the difference between the target
values and the predicted values, obtained via the predict() method.
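Listing 8-5 is not reproduced in this extract. A minimal sketch, assuming stocks = ['GOOG', 'MSFT'] and a DataFrame df of adjusted closing prices, might be:
import statsmodels.api as sm
from statsmodels.regression.linear_model import OLS

Y = df[stocks[0]]                 # target variable (first stock)
X = df[stocks[1]]                 # predictor variable (second stock)
X = sm.add_constant(X)            # bias trick: add a column of ones for the intercept
model = OLS(Y, X).fit()           # fit the linear regression via ordinary least squares
residuals = Y - model.predict(X)  # residuals = target values minus predicted values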
The model object is essentially a collection of the model weights (also called parameters) and
the architecture that governs how the data flows from the input to the output. Let us access the
model weights:
We have two parameters in the model: const corresponding to β0 and MSFT corresponding
to β1.
Besides using the predict() method to obtain the predicted values, we can also construct
the explicit expression for the predictions and calculate them manually. That is, we can calculate
the predicted values as follows:
ŷ = β0 + β1x
The following code snippet implements this expression and calculates the model predictions
manually. We also check if the manually calculated residuals are equal to the previous values
using the equals() function:
# alternative approach
residuals2 = Y - (model.params['const'] + model.params[stocks[1]] * X)
# check if both residuals are the same
print(residuals.equals(residuals2))
Lastly, we test the stationarity of the residual series, again using the augmented Dickey-Fuller
(ADF) test. The test can be performed using the adfuller() function from the statsmodels
package. There are two metrics that are relevant to every statistical test: the test statistic and the
p-value. Both metrics convey the same information on the statistical significance of the
underlying hypothesis, with the p-value being a standardized and, thus, more interpretable
metric. A widely used threshold (also called the significance level) is 5% for the p-value. That is, if
the resulting p-value from a statistical test is less than 5%, we can safely (up to a confidence level
of 95%) reject the null hypothesis in favor of the alternative hypothesis. If the p-value is greater
than 5%, we fail to reject the null hypothesis and conclude that the two stocks are not
cointegrated.
The null hypothesis often represents the status quo. In the case of the cointegration testing
using the Engle-Granger test, the null hypothesis is that the two stocks are not cointegrated. That
is, the historical prices do not exhibit a linear relationship in the long run. The alternative
hypothesis is that the two stocks are cointegrated, as exhibited by a linear relationship between
the two and a stationary residual series.
Now let us carry out the ADF test and use the result to determine if these two stocks are
cointegrated using a significance level of 5%. In Listing 8-6, we apply the adfuller() function
to the prediction residuals and print out the test statistic and p-value. This is followed by an if-
else statement to determine if we have enough confidence to reject the null hypothesis and claim
that the two stocks are cointegrated.
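Listing 8-6 is not shown in this extract. A minimal sketch of such a test on the residual series, under the 5% significance level used in the text, could be:
from statsmodels.tsa.stattools import adfuller

adf_result = adfuller(residuals)
test_statistic, pvalue = adf_result[0], adf_result[1]
print(f"ADF test statistic: {test_statistic}, p-value: {pvalue}")

if pvalue < 0.05:
    print("Reject the null hypothesis: the two stocks are likely cointegrated.")
else:
    print("Fail to reject the null hypothesis: the two stocks are likely not cointegrated.")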
The result suggests that Google and Microsoft stocks are cointegrated due to a small p-value
of 2%. Indeed, based on our previous analysis of calculating the max drawdown, Google and
Microsoft stock prices generally tend to move together. However, with the introduction of
ChatGPT in Bing search, the overall picture may start to change. Such cointegration
(comovement) may gradually weaken, as Microsoft has everything to gain (web search accounts
for only a small share of its revenue) and Google has much to lose (the majority of its revenue
comes from web search).
Next, we touch upon another closely related but different statistical concept: correlation.
np.random.seed(123)
X = np.random.normal(1, 1, 100)
Y = np.random.normal(2, 1, 100)
X = pd.Series(np.cumsum(X), name='X')
Y = pd.Series(np.cumsum(Y), name='Y')
Running the code generates Figure 8-5. Series Y has a higher drift than series X as designed
and also exhibits a high degree of correlation (or comovement) across the whole history of 100
points.
Figure 8-5 Illustrating the evolution of two series that are highly correlated but not cointegrated
Let us calculate the exact correlation coefficient and cointegration p-value. In the following
code snippet, we call the corr() method to obtain the correlation of X with Y and use the
coint() function from the statsmodels package to perform the cointegration test and
retrieve the resulting p-value. The coint() function performs the augmented Engle-Granger
two-step cointegration test, similar to the two-step process we carried out manually earlier.
The result shows that these two series are highly correlated but not cointegrated.
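The code snippet itself is not reproduced in this extract; a minimal sketch using the X and Y series defined above could be:
from statsmodels.tsa.stattools import coint

correlation = X.corr(Y)           # Pearson correlation between the two series
score, pvalue, _ = coint(X, Y)    # augmented Engle-Granger two-step cointegration test
print(f"Correlation: {correlation:.3f}, cointegration p-value: {pvalue:.3f}")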
In the next section, we dive deep into the implementation of the pairs trading strategy.
These 15 unique pairs of stocks are stored as tuples in a list. Each tuple will go through the
cointegration test in the following section.
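A minimal sketch of how such a list of pairs might be built, assuming a hypothetical list tickers containing six stock symbols (which yields C(6, 2) = 15 unique pairs), is:
from itertools import combinations

# all unique two-element combinations of the tickers, stored as tuples in a list
stock_pairs = list(combinations(tickers, 2))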
threshold = 0.1
# run Engle-Granger test for cointegration on each pair of stocks
for pair in stock_pairs:
    # subset df based on current pair of stocks
    df2 = df[list(pair)]
    # perform test for the current pair of stocks
    score, pvalue, _ = coint(df2.values[:,0], df2.values[:,1])
    # check if the current pair of stocks is cointegrated
    if pvalue < threshold:
        print(pair, 'are cointegrated')
    else:
        print(pair, 'are not cointegrated')
Listing 8-8 Performing a cointegration test for each unique pair of stocks
Note that the threshold is set as 10% instead of 5% as before, since the test would show no
cointegrated pair of stocks when setting the threshold as the latter. As it turns out, the coint()
function is slightly different from our manual implementation of the test procedure earlier. For
example, the order of the time series assumed by the coint() function may not be the same.
Running the code generates the following result:
It turns out that only Google and Microsoft stock prices are cointegrated using the 10%
threshold on the significance level. These two stocks will be the focus of our pairs trading
strategy in the following, starting by identifying the stationary spread between the two stocks.
Figure 8-6 Visualizing the spread as the residuals of the linear regression model
Converting to Z-Scores
A z-score is a measure of how many standard deviations the daily spread is from its mean. It is a
standardized score that we can use to compare across different distributions. Denote x as the
original observation. The z-score is calculated as follows:
z = (x − μ) / σ
where μ and σ denote the mean and standard deviation of the time series, respectively.
Therefore, the magnitude of the z-score indicates how far away the current observation
deviates from the mean in terms of the unit of standard deviations, and the sign of the z-score
suggests whether the deviation is above (a positive z-score) or below (a negative z-score) the
mean.
For example, assume a distribution with a mean of 10 and a standard deviation of 2. If an
observation is valued at 8, the z-score for this observation would be (8 − 10)/2 = −1. In other words,
this observation is one standard deviation below the mean of the distribution.
The z-score is often used to assess the statistical significance of an observation in hypothesis
testing. A z-score of greater than or equal to 1.96 (or smaller than or equal to –1.96) corresponds
to a p-value of 0.05 or less, which is a common threshold for assessing the statistical significance.
In Listing 8-10, we visualize the probability density function (PDF) of a standard normal
distribution with a mean of 0 and a standard deviation of 1. We first generate a list of equally
spaced input values as the z-scores using the np.linspace() function and obtain the
corresponding probabilities in the PDF of the standard normal distribution using the norm.pdf()
function with a location parameter of 0 (corresponding to the mean) and a scale of 1
(corresponding to the standard deviation). We also shade the areas before –1.96 and after 1.96,
where a z-score of ±1.96 corresponds to a 5% significance level in a two-sided statistical test. In
other words, z-scores greater than or equal to 1.96 account for 2.5% of the total probability, and
z-scores lower than or equal to –1.96 account for another 2.5%, giving 5% in total.
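Listing 8-10 is not reproduced in this extract. A minimal sketch, assuming the norm object comes from scipy.stats, could be:
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt

z = np.linspace(-4, 4, 500)          # equally spaced z-scores
pdf = norm.pdf(z, loc=0, scale=1)    # standard normal PDF
plt.plot(z, pdf)
# shade the two rejection regions beyond +/-1.96
plt.fill_between(z, pdf, where=(z <= -1.96), alpha=0.5)
plt.fill_between(z, pdf, where=(z >= 1.96), alpha=0.5)
plt.show()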
In the context of hypothesis testing, the shaded area represents the probability of observing a
z-score beyond ±1.96 under the null hypothesis. Performing the statistical test would give us
a z-score. If the z-score is above 1.96 or below –1.96 in a two-sided test, we would reject the null
hypothesis in favor of the alternative hypothesis at the 0.05 significance level, since the
probability of observing the phenomenon under the null hypothesis would simply be too small.
In summary, we use the z-score as a standardized score to measure how many standard
deviations an observation is from the mean of a distribution. It is used in hypothesis testing to
determine the statistical significance of an observation, that is, the probability of an event
happening under the null hypothesis. The significance level is often set at 0.05. We can use the z-
score to calculate the probability of observing a value as extreme as the observation under the
null hypothesis. Finally, we make a decision on whether to reject or fail to reject the null
hypothesis.
Now let us revisit the running example. Since stock prices are often volatile, we switch to the
moving average approach to derive the running mean and standard deviation. That is, each daily
spread would have a corresponding running mean and standard deviation based on the collection
of spreads in the rolling window. In Listing 8-11, we derive the running mean and standard
deviation using a window size of ten and apply the transformation to derive the resulting z-
scores as the standardized spread.
# convert to z score
# z-score is a measure of how many standard deviations the spread is from its mean
# derive mean and sd using a moving window
window_size = 10
spread_mean = spread.rolling(window=window_size).mean()
spread_std = spread.rolling(window=window_size).std()
zscore = (spread - spread_mean) / spread_std
zscore.plot(figsize=(12,6))
Listing 8-11 Converting to z-scores based on moving averages
Running the code generates Figure 8-8, where the standardized spreads now look more
normally distributed as white noise.
Figure 8-8 Visualizing the z-scores after standardizing the spreads using the running mean and standard deviation
Since we used a window size of ten, the first nine observations will appear as NA in the
moving average series. Let us get rid of the initial NA values by first identifying the first valid
index using the first_valid_index() function and then subsetting the z-score series, as
shown in the following code:
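The original snippet is not included in this extract; a minimal sketch of this cleanup step could be:
# drop the leading NA values produced by the rolling window
first_valid = zscore.first_valid_index()
zscore = zscore.loc[first_valid:]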
The next section formulates the trading strategy using the z-scores.
Figure 8-9 Illustrating the process of formulating trading signals based on preset entry and exit thresholds for the z-scores
In Listing 8-12, we first initialize the entry and exit thresholds, respectively. We create two
Pandas Series objects (stock1_position and stock2_position) to store the daily
positions for each stock. Based on the current z-score and the preset thresholds for entering and
exiting long or short positions, we check the daily z-score in a loop and match it to one of the four
cases for signal generation based on the following rule:
Long stock 1 and short stock 2 if the z-score is below –2 and stock 1 has no prior position.
Short stock 1 and long stock 2 if the z-score is above 2 and stock 2 has no prior position.
Exit the position in both stock 1 and stock 2 if the z-score is between –1 and 1.
Maintain the position in both stock 1 and stock 2 for the rest of the cases, that is, the z-score is
between –2 and –1 or between 1 and 2.
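Listing 8-12 is not reproduced in this extract. A minimal sketch of this signal-generation loop, assuming the trimmed zscore series from earlier and entry/exit thresholds of 2 and 1, could be:
import pandas as pd

entry_threshold, exit_threshold = 2, 1
stock1_position = pd.Series(index=zscore.index, dtype=float)
stock2_position = pd.Series(index=zscore.index, dtype=float)
pos1, pos2 = 0, 0
for date, z in zscore.items():
    if z < -entry_threshold and pos1 == 0:
        pos1, pos2 = 1, -1      # long stock 1, short stock 2
    elif z > entry_threshold and pos2 == 0:
        pos1, pos2 = -1, 1      # short stock 1, long stock 2
    elif -exit_threshold < z < exit_threshold:
        pos1, pos2 = 0, 0       # exit both positions
    # otherwise, maintain the current positions
    stock1_position[date], stock2_position[date] = pos1, pos2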
We can now calculate the overall profit of the pairs trading strategy. In Listing 8-13, we first
obtain the daily percentage changes using the pct_change() function for each stock, starting
from the index with a valid value. These daily returns will be adjusted according to the position
we held from the previous trading day. In other words, multiplying the shifted positions with the
daily returns gives the strategy’s daily returns for each stock, filling possible NA values with zero.
Finally, we add up the daily returns from the two stocks, convert them to 1+R returns, and
perform the sequential compounding procedure using the cumprod() function to obtain the
wealth index.
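Listing 8-13 is not reproduced in this extract. A minimal sketch of this calculation, reusing the position series above and assuming the price DataFrame df and stock list stocks, could be:
# daily simple returns of the two stocks over the period covered by the positions
returns1 = df[stocks[0]].loc[stock1_position.index].pct_change()
returns2 = df[stocks[1]].loc[stock2_position.index].pct_change()

# apply yesterday's position to today's return and fill missing values with zero
strategy_returns = (stock1_position.shift(1) * returns1).fillna(0) + \
                   (stock2_position.shift(1) * returns2).fillna(0)

# convert to 1 + R returns and compound into a wealth index
wealth_index = (1 + strategy_returns).cumprod()
terminal_return = wealth_index.iloc[-1] - 1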
The terminal return, extracted via the following code, shows that the pairs trading strategy
delivers a total of 14.1% profit at the end of the trading year.
Again, this result is subject to more rigorous backtesting in terms of the selection of
investment assets, trading periods, and evaluation metrics.
Summary
In this chapter, we covered the concept of statistical arbitrage and hypothesis testing, as well as
the implementation details based on the pairs trading strategy. We first walked through the
overall process of developing a pairs trading strategy and introduced new concepts such as
cointegration and stationarity. Next, we compared cointegration and correlation, both closely
related but drastically different. Last, we introduced a case study on calculating the cumulative
return using the pairs trading strategy.
In the next chapter, we will introduce Bayesian optimization, a principled way to search for
optimal parameters of a trading strategy.
Exercises
Evaluate the cointegration of selected stock pairs during bull and bear market periods
separately. Do the results vary significantly? If so, discuss possible reasons.
Implement rolling cointegration tests on a pair of time series data and observe how
cointegration status (cointegrated or not) evolves over time.
For a given pair of stocks, test the stationarity of the spread between them using the ADF test.
If the spread is stationary, what does it imply for the pairs trading strategy?
Given the time series data of spreads for a pair of stocks, perform a hypothesis test to check
whether the mean of spreads is equal to zero.
Calculate the z-scores of the spread for different lookback periods (e.g., 30, 60, and 90 days).
How does changing the lookback period affect the distribution of z-scores and the performance
of your pairs trading strategy?
More on Optimization
Optimization aims at locating the optimal value f ∗ = f (x ∗) or its maximizer
x ∗ = arg max f(x) over all the input values in a maximization setting,
which could also be cast as a minimization problem. The procedure that carries out the
optimization process is called the optimizer. There are multiple types of
optimizers, with stochastic gradient descent (SGD) being the most popular
optimizer in the space of deep learning. In the context of backtesting a trading
strategy, we are mostly interested in optimizing the risk-adjusted return,
represented by the Sharpe ratio or other risk measures such as the max
drawdown. Plus, we have the additional challenge that the inputs are not
continuous values; instead, they are discrete such as window sizes or trading
volumes.
The optimizer takes a function f and figures out the desired optimum value f
∗ or its corresponding input parameter x ∗. Being an optimum value means that f
(x ∗) is greater (or less, in the case of minimization) than any other values in the
neighborhood. Here, f ∗ may be either a local optimum or a global optimum. A
local optimum means f (x ∗) sits at the top of one mountain, while the global optimum
is the highest point across all mountains in the region. That is, in a
maximization setting, we could take all the local maxima, compare them against
one another, and report the largest as the global maximum. Both are
characterized by having a zero gradient at the point x ∗, yet the global optimum
is often what we aim for. The optimizer needs a strategy to escape from these
local optima and continue its search for the global optimum. There are various
techniques to handle this issue, including using different initial values via the
multistart procedure, applying random jumps in the parameter space, and using
complex algorithms like simulated annealing or genetic algorithms that employ
specific mechanisms to escape local optima.
In the context of developing a trading strategy, we are interested in the
global maximizer (optimal input parameters) that gives the maximal Sharpe
ratio. This is a complex task as there may be many sets of parameters that yield
good results (local maxima), but we want to find the absolute best (global
maximum).
Note that using the gradient information to identify an optimum represents
a huge improvement in our understanding of optimization problems, as first
proposed by Isaac Newton. Before that, one would have to manually compare every
unique pair of candidate values, a combinatorial exercise that quickly becomes
prohibitively time-consuming. When the function form is available, such as
y = x2, we could invoke the tool of calculus and solve for the point whose gradient
is zero, that is, y′ = 2x = 0, giving x = 0. We could then calculate the second
derivative or apply the sign chart method to ascertain if this is a maximum or
minimum point.
The next section introduces more on the global optimization problem.
Global Optimization
Optimization aims to locate the optimal set of parameters of interest across the
whole search domain, often by carefully allocating limited resources. For
example, when searching for the car key at home before leaving for work in two
minutes, we would naturally start with the most promising place where we
would usually put the key. If it is not there, think for a little while about the
possible locations and go to the next most promising place. This process
iterates until the key is found. In this example, the search policy is, in a way,
behaving intelligently. It digests the available information on previous searches
and proposes the following promising location, so as to use the limited resource
wisely. The resource could be the limited number of trials we could run before a
project deadline approaches tomorrow or the two-minute budget to search for
the key in this case. The unknown function is the house itself, a binary value that
reveals if the key is placed at the proposed location upon each sampling at the
specific location.
This intelligent search policy represents a cornerstone concept in
optimization, especially in the context of derivative-free optimization where the
unknown function does not reveal any derivative information. Here, the policy
needs to balance exploration, which probes the unknown function at various
locations in the search domain, and exploitation, which focuses on promising
areas where we have already identified a good candidate value. This trade-off is
usually characterized by a learning curve showing the function value of the best-
found solution over the number of function evaluations.
The key search example is considered an easy one since we are familiar with
the environment in terms of its structural design. However, imagine locating an
item in a totally new environment. The optimizer would need to account for the
uncertainty due to unfamiliarity with the environment while determining the
next sampling location via multiple sequential trials. When the sampling budget
is limited, as is often the case in real-life searches in terms of time and
resources, the optimizer needs to reason carefully about the utility of each candidate
input parameter value.
This process is characterized by sequential decision-making under
uncertainty, a problem that lies at the heart of the field of optimization. When
faced with such a situation, optimizers need to develop an intelligent search
policy that effectively manages the trade-off between exploration (searching
new areas) and exploitation (capitalizing on known, promising locations). In the
context of searching for an item in an unfamiliar environment, exploration
involves searching in completely new areas where the item could potentially be
located, while exploitation involves focusing the search around areas where
clues or signs of the item have already been found. The challenge is to balance
these two approaches, as focusing too much on exploration could lead to a waste
of time and resources, while focusing too much on exploitation could result in
missed opportunities.
In the world of trading strategies, this situation amounts to a search in a
high-dimensional parameter space where each dimension represents a different
aspect of the trading strategy. Exploration would involve trying out completely
new sets of parameters, while exploitation would involve fine-tuning the most
promising sets of parameters already discovered. The optimizer aims to
effectively navigate this high-dimensional space and find the set of parameters
that yields the best possible performance in terms of the Sharpe ratio or other
preset metrics.
Let us formalize this sequential global optimization using mathematical
terms. We are dealing with an unknown scalar-valued objective function f defined
over a specific domain 𝒳. In other words, the unknown subject of interest f is a
function that maps a candidate parameter x in 𝒳 to a real number in ℝ,
that is, f: 𝒳 → ℝ. We typically place no specific assumption about the nature
of the domain 𝒳 other than that it should be a bounded, compact, and convex
set.
A bounded set means that it has upper and lower limits, and all values of
the parameters contained within fall within these bounds. A compact set is
one that is both bounded and closed, meaning that it includes its boundary. And
a convex set is one in which, for any two points within the set, the set contains
the whole line segment that joins them. These assumptions make our problem
mathematically tractable and realistic in the real-world scenario.
Unless otherwise specified, we focus on the maximization setting instead of
minimization since maximizing the objective function is equivalent to
minimizing the negated objective, followed by another negation to recover the
original maximum value. The optimization procedure thus aims at locating the
global maximum f ∗ or its corresponding location x ∗ in a principled and
systematic manner. Mathematically, we wish to locate f ∗ where
f ∗ = max_{x ∈ 𝒳} f(x) = f(x ∗)
Bayesian Optimization
As the name suggests, Bayesian optimization is an area that studies
optimization problems using the Bayesian approach. Optimization aims at
locating the optimal objective value (i.e., a global maximum or minimum) of all
possible values or the corresponding location of the optimum over the search
domain, also called the environment. The search process starts at a specific
initial location and follows a particular policy to iteratively guide the following
sampling locations, collect new observations, and refresh the guiding search
policy.
At its core, Bayesian optimization uses a probabilistic model (such as
Gaussian processes) to represent the unknown function and a utility function
(also called the acquisition function) to decide where to sample next. It
iteratively updates the probabilistic model with new sample points and uses
this updated model to select the next sampling location.
As shown in Figure 9-5, the overall optimization process consists of repeated
interactions between the policy (the optimizer) and the environment (the
unknown objective function). The policy is a mapping function that takes in a
new input parameter (plus historical ones) and outputs the next parameter
value to try out in a principled way. Here, we are constantly learning and
improving the policy as the search continues. A good policy guides our search
toward the global optimum faster than a bad one. In arguing which parameter
value to try out, a good policy would spend the limited sampling budget on
promising candidate values.
Figure 9-5 The overall Bayesian optimization process. The policy digests the historical observations and
proposes a new sampling location. The environment governs how the (possibly noise-corrupted) observation
at the newly proposed location is revealed to the policy. Our goal is to learn an efficient and effective policy
that could navigate toward the global optimum as quickly as possible
On the other hand, the environment contains the unknown objective function
to be learned by the policy within a specific boundary (maximum and minimum
values of the parameter value). When probing the functional value as requested
by the policy, the actual observation revealed by the environment to the policy is
often corrupted by noise due to the choice of the backtesting period, making the
learning even more challenging. Thus, Bayesian optimization, a specific
approach for global optimization, would like to learn a policy that can help us
efficiently and effectively navigate toward the global optimum of an unknown,
noise-corrupted objective function as quickly as possible.
When deciding which parameter value to try next, most search strategies
face the exploration and exploitation trade-off. Exploration means searching
within an unknown and faraway area, and exploitation refers to searching
within the neighborhood visited earlier in the hope of locating a better
functional evaluation. Bayesian optimization also faces the same dilemma.
Ideally, we would like to explore more at the initial phase to increase our
understanding of the environment (the black-box function) and gradually shift
toward the exploitation mode that taps into the existing knowledge and digs
into known promising regions.
Bayesian optimization achieves such a trade-off via two components: a
Gaussian process (GP) used to approximate the underlying black-box function
and an acquisition function that encodes the exploration-exploitation trade-off
into a scalar value as an indicator of the sampling utility across all candidates in
the domain. Let us look at each component in detail in the following sections.
Gaussian Process
As a widely used stochastic process (able to model an unknown black-box
function and the corresponding uncertainties of modeling), the Gaussian
process takes the finite-dimensional probability distributions one step further
into a continuous search domain that contains an infinite number of variables,
where any finite set of points in the domain jointly forms a multivariate
Gaussian distribution. It is a flexible framework to model a broad family of
functions and quantify their uncertainties, thus being a powerful surrogate
model used to approximate the true underlying function. Let us look at a few
visual examples to see what it offers.
Figure 9-6 illustrates an example of a “flipped” prior probability distribution
for a single random variable selected from the prior belief of the Gaussian
process. Every single point represents a parameter value, although it is now
modeled as a random variable and thus has randomness in its realizations.
Specifically, each point follows a normal distribution. Plotting the mean (solid
line) and 95% credible interval (dashed lines) of all these prior distributions
gives us the prior process for the objective function regarding each location in
the domain. The Gaussian process thus employs an infinite number of normally
distributed random variables within a bounded range to model the underlying
objective function and quantify the associated uncertainty via a probabilistic
approach.
Figure 9-6 A sample prior belief of the Gaussian process represented by the mean and 95% credible interval
for each location in the domain. Every objective value is modeled by a random variable that follows a normal
prior predictive distribution. Collecting the distributions of all random variables and updating these
distributions as more observations are collected could help us quantify the potential shape of the true
underlying function and its probability
The prior process can thus serve as the surrogate data-generating process of
the unknown black-box function, which can also be used to generate samples in
the form of functions, an extension of sampling single points from a probability
distribution. For example, if we were to repeatedly sample from the prior
process, we would expect the majority (around 95%) of the samples to fall
within the credible interval and a minority outside this range. Figure 9-7
illustrates three functions sampled from the prior process.
Figure 9-7 Three example functions sampled from the prior process, where the majority of the functions fall
within the 95% credible interval
Figure 9-8 Updated posterior process after incorporating two exact observations in the Gaussian process.
The posterior mean interpolates through the observations, and the associated variance reduces as we move
nearer the observations
Mathematically, for a new sampling location x ∗ in the domain, the corresponding
functional evaluation f ∗ following the Gaussian process would assume a
conditional normal distribution:
f ∗ | x ∗, x_{1:n}, y_{1:n} ~ N(μ_n(x ∗), σ_n²(x ∗))
where μ_n(x ∗) and σ_n²(x ∗) denote the posterior mean and variance at x ∗.
Therefore, we can obtain the posterior mean and variance at any arbitrary
location based on the posterior Gaussian process model, serving as the
surrogate model for the underlying function of the specific trading strategy.
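As a brief illustration of how these posterior quantities can be queried in code, the following sketch assumes model is a fitted BoTorch SingleTaskGP (as constructed later in this chapter) and test_x is a hypothetical tensor of candidate parameter values:
import torch

# hypothetical batch of 20 two-dimensional candidate locations in the unit square
test_x = torch.rand(20, 2, dtype=torch.double)
with torch.no_grad():
    posterior = model.posterior(test_x)
    post_mean = posterior.mean       # posterior mean at each candidate location
    post_var = posterior.variance    # posterior variance (uncertainty) at each location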
Now let us look at the other critical component: the acquisition function.
Acquisition Function
The tools from Bayesian inference and the incorporation of the Gaussian
process provide principled reasoning on the underlying distribution of the
objective function. However, we would still need to incorporate such
probabilistic information in our decision-making to search for the global
maximum. We need to build a policy (by maximizing the acquisition function)
that absorbs the most updated information on the objective function and
recommends the following most promising sampling location in the face of
uncertainties across the domain. The optimization policy guided by maximizing
the acquisition function thus plays an essential role in connecting the Gaussian
process to the eventual goal of Bayesian optimization. In particular, the
posterior predictive distribution obtained from the updated Gaussian process
provides an outlook on the objective value and the associated uncertainty for
locations not explored yet, which could be used by the optimization policy to
quantify the utility of any alternative location within the domain.
When converting the posterior knowledge about candidate locations, that is,
posterior parameters such as the mean and the variance of the Gaussian
distribution at each location, to a single scalar utility score, the acquisition
function comes into play. An acquisition function is a manually designed
mechanism that evaluates the relative potential of each candidate location in
the form of a scalar score, and the location with the maximum score will be used
as the next sampling choice. It is a function that assesses how valuable a
candidate’s location is when we acquire/sample it.
The acquisition function takes into account both the expected value and the
uncertainty (variance) of the function at unexplored locations, as provided by
the Gaussian process posterior distribution. In this context, exploration means
sampling in regions of high uncertainty, while exploitation involves sampling
where the function value is expected to be high.
The acquisition function is also cheap to evaluate as a side computation
since we need to evaluate it at every candidate location and then locate the
maximum utility score, posing another (inner) optimization problem. Figure 9-9
provides a sample curve of the acquisition function.
Figure 9-9 Illustrating a sample acquisition function curve. The location that corresponds to the highest
value of the acquisition function is the next location (parameter value of a trading strategy) to sample. Since
there is no value added if we were to sample those locations already sampled earlier, the acquisition function
thus reports zero at these locations
EI and UCB
Acquisition functions differ in multiple aspects, including the choice of the
utility function, the number of lookahead steps, the level of risk aversion or
preference, etc. Introducing risk appetite directly benefits from the posterior
belief about the underlying objective function. In the case of GP regression as the
surrogate model, the risk is quantified by the covariance function, with its
credible interval expressing the uncertainty level about the objective’s possible
values.
Regarding the utility of the collected observations, the expected
improvement chooses the historical maximum of the observed value as the
benchmark for comparison upon selecting an additional sampling location. It
also implicitly assumes that only one more additional sampling is left before the
optimization process terminates. The expected marginal gain in utility (i.e., the
acquisition function) becomes the expected improvement in the maximal
observation, calculated as the expected difference between the observed
maximum and the new observation after the additional sampling at an arbitrary
sampling location.
Specifically, denote y_{1:n} = {y1, …, yn} as the set of collected observations at the
corresponding locations x_{1:n} = {x1, …, xn}. Assuming the noise-free setting, the
actual observations would be exact, that is, y_{1:n} = f_{1:n}. Given the collected
dataset D_n = {(x_i, y_i), i = 1, …, n}, the corresponding utility is
u(D_n) = max(y1, …, yn) = y_n^max, where y_n^max is the incumbent maximum observed so
far. Similarly, assume we obtain another observation y_{n+1} = f_{n+1} at a new
location x_{n+1}; the resulting utility is u(D_{n+1}) = max(y_n^max, y_{n+1}). Taking the difference
between these two gives the increase in utility due to the addition of another
observation:
u(D_{n+1}) − u(D_n) = max(y_{n+1} − y_n^max, 0)
Figure 9-10 The full Bayesian optimization loop featuring an iterative interaction between the unknown
(black-box) environment and the decision-making policy that consists of a Gaussian process for probabilistic
evaluation and acquisition function for utility assessment of candidate locations in the environment
With the basic BO framework in mind, let us test it out by optimizing the
window lengths of the pairs trading strategy.
import os
import math
import torch
import random
import numpy as np
from matplotlib import pyplot as plt
import torch.nn as nn
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
%matplotlib inline
SEED = 1
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
The next section touches upon the performance of the pairs trading strategy
as the black-box function.
class QTS_OPTIMIZER(nn.Module):
    def __init__(self, ticker_pair, start_date, end_date, riskfree_rate=0.04):
        super(QTS_OPTIMIZER, self).__init__()
        self.ticker_pair = ticker_pair
        self.start_date = start_date
        self.end_date = end_date
        self.riskfree_rate = riskfree_rate
        self.stock = self.get_stock_data()
Listing 9-1 Defining the black-box function for Bayesian optimization
Upon instantiating this class, the __init__() function will get triggered,
which also includes downloading the stock data for the selected ticker and date
range. Listing 9-2 has the definition of the get_stock_data() method, where
we use the usual download() function to download the data and extract the
adjusted closing price that considers dividends and splits.
def get_stock_data(self):
    print("===== DOWNLOADING STOCK DATA =====")
    # download the adjusted closing prices for both tickers in the pair
    df = yf.download(list(self.ticker_pair), start=self.start_date,
                     end=self.end_date)['Adj Close']
    print("===== DOWNLOAD COMPLETE =====")
    return pd.DataFrame(df)
Listing 9-2 Defining the method to retrieve stock data
return sharpe_ratio
Listing 9-3 Defining the method to calculate the Sharpe ratio
Let us test the class out. The following code instantiates the class into the
qts variable by passing the ticker symbol of Google and Microsoft with a date
range of the start and end dates of 2022. Note the printed message after running
this line, showing that the get_stock_data() function gets triggered during
the process. Note that there is no mention of entry and exit signals at this stage;
the initialization stage is meant to handle all preparatory work before the actual
scoring in the forward() function.
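The instantiation code is not included in this extract; a minimal sketch, with the exact date strings assumed for illustration, could be:
qts = QTS_OPTIMIZER(ticker_pair=['GOOG', 'MSFT'],
                    start_date='2022-01-01', end_date='2022-12-31')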
We can also print the first few rows of the object’s stock attribute as a sanity
check:
>>> qts.stock.head()
GOOG MSFT
Date
2022-01-03 145.074493 330.813873
2022-01-04 144.416504 325.141388
2022-01-05 137.653503 312.659851
2022-01-06 137.550995 310.189270
2022-01-07 137.004501 310.347412
Let us test out the scoring function. In the following code snippet, we pass in
different values of entry and exit thresholds and obtain the corresponding
Sharpe ratio for the whole year of 2022:
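That snippet is not reproduced in this extract; a minimal sketch, assuming the forward() method accepts the entry_threshold and exit_threshold keyword arguments used in Listing 9-4, could be:
# score a few candidate entry/exit threshold pairs; calling the module invokes forward()
for entry, exit_ in [(1.5, 0.5), (2.0, 0.5), (2.5, 1.0)]:
    sharpe = qts(entry_threshold=entry, exit_threshold=exit_)
    print(f"entry={entry}, exit={exit_}, Sharpe ratio={sharpe}")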
def generate_initial_data(n=10):
    # generate random initial locations
    train_x1 = x1_bound[0] + (x1_bound[1] - x1_bound[0]) * torch.rand(size=(n,1), device=device, dtype=dtype)
    train_x2 = torch.rand(size=(n,1), device=device, dtype=dtype)
    train_x = torch.cat((train_x1, train_x2), 1)
    # obtain the exact value of the objective function and add output dimension
    train_y = []
    for i in range(len(train_x)):
        train_y.append(qts(entry_threshold=train_x1[i], exit_threshold=train_x2[i]))
    train_y = torch.Tensor(train_y, device=device).to(dtype).unsqueeze(-1)
    # get the current best observed value, i.e., utility of the available dataset
    best_observed_value = train_y.max().item()
    return train_x, train_y, best_observed_value
Listing 9-4 Generating initial training data
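The call that produces the initial dataset is not shown above; presumably it takes the following form:

train_x, train_y, best_observed_value = generate_initial_data(n=10)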
Next, we implement the first component in BO: the Gaussian process model.
# initialize GP model
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

def initialize_model(train_x, train_y):
    # create a single-task exact GP model instance
    # use a GP prior with a Matern kernel and constant mean function by default
    model = SingleTaskGP(train_X=train_x, train_Y=train_y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    return mll, model
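Listing 9-6 below operates on an mll and model pair; assuming initialize_model() returns both objects as sketched here, the connecting line would be:

mll, model = initialize_model(train_x, train_y)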
# optimize GP hyperparameters
from botorch.fit import fit_gpytorch_mll

# fit hyperparameters (kernel parameters and noise variance) of a GPyTorch model
fit_gpytorch_mll(mll.cpu())
mll = mll.to(train_x)
model = model.to(train_x)
>>> list(model.named_hyperparameters())
[('likelihood.noise_covar.raw_noise', Parameter containing:
  tensor([0.2238], dtype=torch.float64, requires_grad=True)),
 ('mean_module.raw_constant', Parameter containing:
  tensor(1.1789, dtype=torch.float64, requires_grad=True)),
 ('covar_module.raw_outputscale', Parameter containing:
  tensor(1.8917, dtype=torch.float64, requires_grad=True)),
 ('covar_module.base_kernel.raw_lengthscale', Parameter containing:
  tensor([[-0.8823, -0.9687]], dtype=torch.float64, requires_grad=True))]
Listing 9-6 Optimizing GP hyperparameters
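Listing 9-7 relies on imports and on per-policy GP surrogates (model_ei, model_qei, model_ucb, model_qkg) that are created in code not reproduced here; the imports themselves would be:

from botorch.acquisition import (
    ExpectedImprovement,
    qExpectedImprovement,
    UpperConfidenceBound,
    qKnowledgeGradient,
)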
EI = ExpectedImprovement(model=model_ei, best_f=best_observed_value)
qEI = qExpectedImprovement(model=model_qei, best_f=best_observed_value)
beta = 0.8
UCB = UpperConfidenceBound(model=model_ucb, beta=beta)
num_fantasies = 64
qKG = qKnowledgeGradient(
    model=model_qkg,
    num_fantasies=num_fantasies,
    X_baseline=train_x,
    q=1
)
Listing 9-7 Defining and initializing the acquisition functions
The acquisition function is used to generate the next parameter value to be
sampled, which is located by maximizing the acquisition function at hand. The
process of searching for the maximum value of the acquisition function within
the search domain is handled by the optimize_acqf() function, which is
provided by the botorch.optim module. The new parameter value, along with
the corresponding score from the unknown objective function, will be used as an
additional training data point to support an updated version of the GP model
and acquisition function in the next round.
Listing 9-8 provides the detailed implementation of passing an acquisition
function and obtaining the next sampling decision and functional observation.
Note the additional parameters required by the optimization procedure
optimize_acqf(): bounds to define the search domain of each parameter,
BATCH_SIZE to specify the number of samples to probe at each round (probing
multiple points in parallel is possible), NUM_RESTARTS to control the number of
initial conditions when optimization starts, and RAW_SAMPLES to indicate the
number of initial samples to support heuristic-based optimization over the
acquisition function.
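The constants referenced in this paragraph are defined in a listing that is not reproduced above. A setup consistent with the two-dimensional search domain used earlier might be the following, where the specific values are illustrative:

from botorch.optim import optimize_acqf

bounds = torch.tensor([[x1_bound[0], 0.0], [x1_bound[1], 1.0]],
                      device=device, dtype=dtype)
BATCH_SIZE = 1       # number of candidates proposed per round
NUM_RESTARTS = 10    # restarts for the inner acquisition optimization
RAW_SAMPLES = 512    # raw samples for the initialization heuristic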
def optimize_acqf_and_get_observation(acq_func):
    """Optimizes the acquisition function, and returns a
    new candidate and a noisy observation."""
    # optimize
    candidates, value = optimize_acqf(
        acq_function=acq_func,
        bounds=bounds,
        q=BATCH_SIZE,
        num_restarts=NUM_RESTARTS,
        raw_samples=RAW_SAMPLES,  # used for initialization heuristic
    )
    # observe new values
    new_x = candidates.detach()
    # sample output value
    new_y = qts(entry_threshold=new_x.squeeze()[0].item(),
                exit_threshold=new_x.squeeze()[1].item())
    # add output dimension
    new_y = torch.Tensor([new_y], device=device).to(dtype).unsqueeze(-1)
    # print("new fn value:", new_y)
    return new_x, new_y
Let us test out this function with the qKG acquisition function:
>>> optimize_acqf_and_get_observation(qKG)
(tensor([[1.5470, 0.6003]], dtype=torch.float64),
tensor([[2.2481]], dtype=torch.float64))
Before scaling up to multiple iterations, we will also test out the random search strategy, which selects a random value for each threshold at each round. This serves as the baseline for comparison, since manual selection often amounts to a random search in the initial phase. In the function update_random_observations() shown in Listing 9-9, we pass in a running list of best-observed function values, draw a random location, observe the corresponding function value, compare it with the current running maximum, and return the list with the updated maximum appended.
def update_random_observations(best_random):
    """Simulates a random policy by drawing a new random point,
    observing its value, and appending the updated best value
    to the running list.
    """
    new_x1 = x1_bound[0] + (x1_bound[1] - x1_bound[0]) * torch.rand(size=(1, 1), device=device, dtype=dtype)
    new_x2 = torch.rand(size=(1, 1), device=device, dtype=dtype)
    new_x = torch.cat((new_x1, new_x2), 1)
    new_y = qts(entry_threshold=new_x[0, 0].item(),
                exit_threshold=new_x[0, 1].item())
    best_random.append(max(best_random[-1], new_y))
    return best_random
Listing 9-9 Defining the random search strategy
# single trial
import time
N_ROUND = 20
verbose = True
beta = 0.8

best_random.append(best_observed_value)
best_observed_ei.append(best_observed_value)
best_observed_qei.append(best_observed_value)
best_observed_ucb.append(best_observed_value)
best_observed_qkg.append(best_observed_value)

# update progress
best_random = update_random_observations(best_random)
best_value_ei = max(best_observed_ei[-1], new_y_ei.item())
best_value_qei = max(best_observed_qei[-1], new_y_qei.item())
best_value_ucb = max(best_observed_ucb[-1], new_y_ucb.item())
best_value_qkg = max(best_observed_qkg[-1], new_y_qkg.item())
best_observed_ei.append(best_value_ei)
best_observed_qei.append(best_value_qei)
best_observed_ucb.append(best_value_ucb)
best_observed_qkg.append(best_value_qkg)
t1 = time.monotonic()
Listing 9-10 Performing the sequential search
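Only fragments of Listing 9-10 survive above. As a hedged sketch of how those pieces fit together, the following single-trial loop runs the EI policy alongside the random baseline; the other acquisition functions would follow the same refit-propose-update pattern with their own copies of the data.

best_random, best_observed_ei = [best_observed_value], [best_observed_value]
train_x_ei, train_y_ei = train_x.clone(), train_y.clone()

for iteration in range(1, N_ROUND + 1):
    t0 = time.monotonic()
    # random baseline: draw one random point and update its running best
    best_random = update_random_observations(best_random)
    # refit the GP on all data gathered so far by the EI policy
    mll_ei, model_ei = initialize_model(train_x_ei, train_y_ei)
    fit_gpytorch_mll(mll_ei)
    # rebuild the acquisition function around the current incumbent and propose a point
    EI = ExpectedImprovement(model=model_ei, best_f=best_observed_ei[-1])
    new_x_ei, new_y_ei = optimize_acqf_and_get_observation(EI)
    # augment the training data and update the running best value
    train_x_ei = torch.cat([train_x_ei, new_x_ei])
    train_y_ei = torch.cat([train_y_ei, new_y_ei])
    best_value_ei = max(best_observed_ei[-1], new_y_ei.item())
    best_observed_ei.append(best_value_ei)
    t1 = time.monotonic()
    if verbose:
        print(f"round {iteration}: best EI value = {best_value_ei:.4f}, "
              f"time = {t1 - t0:.2f} seconds")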
Let us plot the search progress so far via the following code snippet:
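The plotting snippet is not reproduced here; a sketch along the following lines produces the progress curves, assuming the running-best lists for all four acquisition functions have been populated as in the fragments above:

iters = np.arange(len(best_observed_ei))
plt.figure(figsize=(8, 5))
plt.plot(iters, best_random, label="random")
plt.plot(iters, best_observed_ei, label="EI")
plt.plot(iters, best_observed_qei, label="qEI")
plt.plot(iters, best_observed_ucb, label="UCB")
plt.plot(iters, best_observed_qkg, label="qKG")
plt.xlabel("number of observations")
plt.ylabel("best observed Sharpe ratio")
plt.legend()
plt.show()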
Let us repeat the experiments a number of times to assess the stability of the
results, as shown in Listing 9-11.
# multiple trials
# number of runs to assess std of different BO loops
N_TRIALS = 4
# indicator to print diagnostics
verbose = True
# number of steps in the outer BO loop
N_ROUND = 20

best_random_all, best_observed_ei_all, best_observed_qei_all, \
    best_observed_ucb_all, best_observed_qkg_all = [], [], [], [], []

best_random.append(best_observed_value)
best_observed_ei.append(best_observed_value)
best_observed_qei.append(best_observed_value)
best_observed_ucb.append(best_observed_value)
best_observed_qkg.append(best_observed_value)

# update progress
best_random = update_random_observations(best_random)
best_value_ei = max(best_observed_ei[-1], new_y_ei.item())
best_value_qei = max(best_observed_qei[-1], new_y_qei.item())
best_value_ucb = max(best_observed_ucb[-1], new_y_ucb.item())
best_value_qkg = max(best_observed_qkg[-1], new_y_qkg.item())
best_observed_ei.append(best_value_ei)
best_observed_qei.append(best_value_qei)
best_observed_ucb.append(best_value_ucb)
best_observed_qkg.append(best_value_qkg)
t1 = time.monotonic()

best_observed_ei_all.append(best_observed_ei)
best_observed_qei_all.append(best_observed_qei)
best_observed_ucb_all.append(best_observed_ucb)
best_observed_qkg_all.append(best_observed_qkg)
best_random_all.append(best_random)
Listing 9-11 Assessing the stability of the results via repeated experiments
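The outer loop of Listing 9-11 is only partially visible above; a sketch of its likely structure, in which each trial regenerates its own initial data and reruns the inner loop from Listing 9-10 before storing the per-trial results, is:

for trial in range(1, N_TRIALS + 1):
    print(f"----- trial {trial} -----")
    train_x, train_y, best_observed_value = generate_initial_data(n=10)
    best_random = [best_observed_value]
    best_observed_ei = [best_observed_value]
    best_observed_qei = [best_observed_value]
    best_observed_ucb = [best_observed_value]
    best_observed_qkg = [best_observed_value]
    # ... run the inner N_ROUND loop from Listing 9-10 here ...
    best_observed_ei_all.append(best_observed_ei)
    best_observed_qei_all.append(best_observed_qei)
    best_observed_ucb_all.append(best_observed_ucb)
    best_observed_qkg_all.append(best_observed_qkg)
    best_random_all.append(best_random)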
Running the code generates Figure 9-12, suggesting that BO-based search
strategies consistently outperform the random search strategy.
Figure 9-12 Assessing the stability of the results via repeated experiments
Finally, let us extract the mean and standard deviation of all experiments, as
shown in Listing 9-12.
def extract_last_entry(x):
    tmp = []
    for i in range(4):
        tmp.append(x[i][-1])
    return tmp

rst_df = pd.DataFrame({
    "EI": [np.mean(extract_last_entry(best_observed_ei_all)),
           np.std(extract_last_entry(best_observed_ei_all))],
    "qEI": [np.mean(extract_last_entry(best_observed_qei_all)),
            np.std(extract_last_entry(best_observed_qei_all))],
    "UCB": [np.mean(extract_last_entry(best_observed_ucb_all)),
            np.std(extract_last_entry(best_observed_ucb_all))],
    "qKG": [np.mean(extract_last_entry(best_observed_qkg_all)),
            np.std(extract_last_entry(best_observed_qkg_all))],
    "random": [np.mean(extract_last_entry(best_random_all)),
               np.std(extract_last_entry(best_random_all))],
}, index=["mean", "std"])
>>> rst_df
EI qEI UCB qKG random
mean 2.736916 2.734416 2.786065 2.706545 2.470426
std 0.116130 0.146371 0.106940 0.041464 0.247212
Listing 9-12 Extracting the mean and standard deviation for all experiments
Exercises
How does Bayesian optimization approach the problem of hyperparameter
tuning in trading strategies? What makes this approach particularly suitable
for this task?
Change the objective function to search for the parameters that minimize the
maximum drawdown of the trend-following strategy.
Bayesian optimization is based on a probabilistic model of the objective
function, typically a Gaussian process (GP). How does this model assist in
identifying areas of the search space to explore or exploit?
Can you describe a scenario where a long-term (nonmyopic) acquisition
function would be beneficial in the context of optimizing trading strategies?
What about a scenario where a short-term (myopic) function might be
preferable?
Can you discuss how the incorporation of prior knowledge can be leveraged in
the Bayesian optimization process for parameter tuning in trading strategies?
How can Bayesian optimization handle noisy evaluations, a common
occurrence in financial markets, during the optimization process of a trading
strategy’s parameters?
Machine learning can be used in pairs trading in several ways to improve the
effectiveness of trading strategies. Examples include pair selection, feature engineering,
spread prediction, etc. In this final chapter, we are going to focus on spread prediction
using different machine learning algorithms in order to generate trading signals.
Figure 10-1 Summarizing the three components that determine the success of a pairs trading strategy
Figure 10-2 Example of a typical model training process. The workflow starts with the available training data and
gradually tunes a model. The tuning process requires matching the model prediction to the target output, where the gap is
measured by a particular cost measure and used as feedback for the next round of tuning. Each tuning produces a new
model, and we want to look for one that minimizes the cost
In the following section, we will introduce the high-level principles of three different
types of machine learning algorithms: support vector machine, random forest, and
neural network.
Support Vector Machine
Support vector machine (SVM) is a popular supervised learning algorithm, especially in the Kaggle community, for both classification and regression. In the context of classification, SVM works by mapping the input data from its original feature space into a high-dimensional feature space using a kernel function, and then finding the hyperplane that best separates the different classes of data. The hyperplane is chosen in order to maximize the margin between the classes. Seeking a boundary based on the principle of maximal margin often leads to a better generalization performance, thus reducing the risk of overfitting.
Since we are interested in predicting the spread as a continuous outcome, making it a regression task, SVM instead finds the hyperplane that best fits the input data while minimizing the margin violations. In this case, our goal in the regression task is to fit a hyperplane as closely as possible to the actual data points by minimizing the sum of the squared errors (SSE) as the cost measure between the predicted output and the actual target values. Since minimizing SSE toward zero would easily lead to an overfitting model, the SVM model used in regression often assumes an ϵ-insensitive loss function, which allows the model to tolerate some error in its predictions, up to a certain threshold ϵ.
There are multiple technical terms here that deserve more explanation. Let us start with the concept of the hyperplane. A hyperplane is a decision line used to predict the continuous output in the case of regression. The data points on either side of the hyperplane within a certain distance (specifically, within ϵ) are called support vectors. We can also use these support vectors to draw two decision boundaries around the hyperplane at a distance of ϵ.
Moving on, a kernel is a set of mathematical functions that take data as input and transform it into the required form, possibly in a different dimension. These are generally used for finding a hyperplane in the higher-dimensional space, which is considered easier to achieve linear separation than finding the same separating hyperplane in the original feature space. Using kernels in SVM provides a powerful and flexible tool for classification and regression tasks, allowing SVM to handle complex and even nonlinearly separable datasets.
Figure 10-3 helps illustrate these concepts. Given a set of training observations in the
form of input-output pairs, the support vector regression model will build a hyperplane
as the regression line to predict future test data. The hyperplane is surrounded by two
decision boundaries, determined by a user-specified hyperparameter ϵ. Here, ϵ specifies the width of the ϵ-insensitive zone (or tolerance zone) around the regression line, where
errors are not penalized. Not all the points are within the decision boundaries, and SVM
is designed to minimize such margin violations by maximizing the number of points
within the decision boundary upon estimating the hyperplane.
Figure 10-3 Illustrating the training mechanism of the support vector regression model
Note that ϵ controls the tolerance of the margin violation. It determines the trade-off
between the model complexity and the predictive accuracy. A small value of ϵ will result
in a complex model that closely fits the training data, but risks overfitting the training set
and therefore generalizing poorly to the new data. On the other hand, a large value of ϵ
will result in a simpler model with larger errors but potentially a better generalization
performance.
As a user-specified hyperparameter, ϵ needs to be chosen carefully, since the resulting predictive performance can be highly sensitive to its value. A common approach is cross-validation, which
involves partitioning the raw data into training and validation sets several times, each
starting with a different random seed. The best ϵ is the one that reports the highest
predictive performance on average.
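To make the procedure concrete, the following short example cross-validates ϵ for a linear-kernel SVR with scikit-learn; the data here is synthetic and the candidate grid is arbitrary, so it only illustrates the mechanics rather than the book's exact setup.

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=200)

grid = GridSearchCV(
    SVR(kernel="linear"),
    param_grid={"epsilon": [0.01, 0.05, 0.1, 0.2, 0.5]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X_demo, y_demo)
print(grid.best_params_)   # epsilon with the best average validation score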
We introduce the random forest model in the following section.
Random Forest
Random forest is a type of ensemble model, which includes multiple simple models combined together to make the final prediction. It is a powerful and flexible model that can be used for both regression and classification tasks. As the name suggests, the algorithm constructs multiple decision trees and combines all trees in the forest to make a final prediction.
The main differentiating factor about random forest compared with other models is how the raw training dataset is divided to support the training of each tree. Specifically, each tree is trained on a different subset of the data and a different subset of the features, a process known as bagging or bootstrap aggregation. By using random subsets of the data and features, the algorithm creates multiple independent submodels that have a low bias and high variance. The final prediction is then produced by taking the average of the predictions of all the individual trees, similar to collecting the views from multiple independent consultants and taking the average recommendation as the final decision.
Note that at each node of the tree, a random subset of features is considered to
determine the best split, instead of considering all features. This process is called feature
bagging. The randomness in feature selection ensures that the trees are decorrelated
and reduces the chance of overfitting.
Random forests are widely used for their simplicity, versatility, and robustness. They
can handle a mix of numerical and categorical features, require very little preprocessing
of the data, and provide a built-in method for handling missing values. Furthermore, they
offer measures of feature importance, which can provide insights into the underlying
structure of the data.
Figure 10-4 illustrates the overall training process of the random forest model. We
start by sampling from the original training set to obtain a total of B subsets. Each
sampling randomly selects both observations and features, so that the resulting subsets
appear to be independent of each other and uncorrelated in the feature space. We will
then train a decision tree model for each subset, leading to B submodels. Upon assessing
a new test data point, these B predictions will be aggregated together and averaged to
produce the final prediction.
Figure 10-4 Illustrating the training mechanism of the random forest model
Neural Network
A neural network consists of multiple interconnected nodes, also called neurons, stacked
together in layers. Each neuron serves as a function that receives input from the neurons
in the preceding layer, performs a nonlinear transformation on that input, and sends an
output to the neurons in the next layer. In between these neurons are the weights, also
called parameters of the neural network. Learning a neural network model essentially
means tuning the weights so that the final prediction is accurate, and the model
generalizes well to the test set.
A typical neural network consists of an input layer representing the input data and an
output layer generating the output. It can also include any number of layers in between
(called hidden layers). Each layer contains at least one neuron, interpreted as an
extracted hidden feature. When it comes to the number of layers of a neural network, it
refers to the hidden layer plus the output layer. For example, a perceptron is a single-
layer neural network, meaning it has only input and output layers and does not have any
hidden layer in between.
Being the fundamental constituent of a neural network, a perceptron is a single
neuron that completes two steps of mathematical operations: the weighted sum and the
nonlinear transformation. For a single observation with p dimensions x ∈ ℝp, the
perceptron first calculates the weighted sum between x and its corresponding
weight vector w ∈ ℝp, which is (and should be) also p-dimensional. The weighted sum is
often accompanied by one more term called intercept or bias, which acts as an additional
parameter to exercise a global level shift to the weighted sum to fit the data better.
After adding an intercept/bias term b, the sum passes through an activation function
which introduces a nonlinear transformation to the weighted sum. Note that the bias
term is added by inserting a column of ones in the input data, which is the same bias
trick as linear regression. Such nonlinear transformation, together with the number and
width of layers, determines neural networks' flexibility, expressivity, and approximating power. Figure 10-5 summarizes the process flow of a perceptron.
Figure 10-5 The process flowchart of a perceptron, which consists of a weighted sum operation followed by an
activation function. A column of ones is automatically added to correspond to the bias term in the weight vector
The most popular choice of activation function is the rectified linear unit (ReLU), which acts as an on/off switch that fires the input signal as it is if its value is above a specific threshold and mutes it by outputting zero if it is below the threshold. In other
words, the ReLU operation is an identity function if the input is positive; otherwise, the
output is set as zero. Without such nonlinear activation, a multilayer neural network
would simply become a series of linear functions stacked on top of each other, resulting
in a linear model.
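The perceptron computation described above fits in a few lines of PyTorch; the numbers below are made up purely for illustration.

import torch

x = torch.tensor([0.5, -1.2, 3.0])   # one observation with p = 3 features
w = torch.tensor([0.2, 0.4, -0.1])   # weight vector, also p-dimensional
b = torch.tensor(0.05)               # bias / intercept term
z = torch.dot(w, x) + b              # weighted sum
out = torch.relu(z)                  # ReLU: identity if positive, zero otherwise
print(z.item(), out.item())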
Figure 10-6 visualizes the ReLU function's shape and summarizes the characteristics of the perceptron operation discussed so far. Other than the architectural flexibility of a neural network model in terms of the number and width of its layers, another main added flexibility lies in the nonlinear operation. In fact, many exciting and meaningful hidden features could be automatically extracted using ReLU as an activation function. For example, when training an image classifier using a special architecture called convolutional neural networks, low-level features in the initial hidden layers tend to resemble fundamental structural components such as lines or edges, while high-level features at later hidden layers start to learn structural patterns such as squares, circles, or even complex shapes like the wheels of a car. This is not possible if we are limited to the linear transformation of features and is considered an extremely difficult task if we were to engineer such informative features manually.
Figure 10-6 Decomposing a single perceptron into a weighted sum and an activation function which is often ReLU. The
ReLU operation passes through a signal if it is positive and mutes it if it is negative. Such nonlinearity also introduces great
approximating power to the neural networks in addition to the flexibility in designing the number and width of layers
One of the reasons why ReLU (and its variants) remains the most popular activation
function is its fast gradient computation. When the input is less than or equal to zero, the
gradient (of a constant number) becomes zero, thus saving the need for backpropagation
and parameter update. When the input is positive, the gradient (of the original input
variable) is simply one, which gets backpropagated as it is.
Having reviewed these three model classes, let us switch to the implementation of
pairs trading and compare their performances after using machine learning models to
predict the daily spread.
import os
import random
import numpy as np
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
from matplotlib import pyplot as plt
%matplotlib inline
SEED = 8
random.seed(SEED)
np.random.seed(SEED)
For simplicity, we will define spread as the difference in the log price of the two
stocks, which is calculated and visualized in Listing 10-2.
Figure 10-7 Visualizing the daily spread defined as the difference in the log price of both stocks
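Listings 10-1 and 10-2 are not reproduced in this copy. A sketch consistent with the description, assuming the same GOOG/MSFT pair as the previous chapter and an arbitrary multi-year date range, would be:

df = yf.download(['GOOG', 'MSFT'], start='2020-01-01',
                 end='2022-12-31')['Adj Close']
spread = np.log(df['GOOG']) - np.log(df['MSFT'])
spread.plot(title='Log-price spread of GOOG vs. MSFT')
plt.show()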
Feature Engineering
Feature engineering is the process of selecting, transforming, and extracting relevant
features from the raw data in order to boost the performance of a machine learning
model. The quality and sometimes the quantity of the features are critical factors that
influence the performance of a machine learning model. These additional engineered
features may not necessarily make sense from an interpretability perspective, yet they
will likely improve the predictive performance of the machine learning algorithm by
offering a new knob for the model to tune with.
We have already encountered feature engineering in previous discussions, with the
moving average being the most notable example. In this exercise, we will use five features to predict the spread series, including the daily returns for both stocks, the five-day moving average of the spread series, and the 20-day moving standard deviation of
daily returns. These are created in Listing 10-3.
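Listing 10-3 itself is not shown; a hedged sketch of the five engineered features, with assumed column and variable names, is:

ret1 = df['GOOG'].pct_change()
ret2 = df['MSFT'].pct_change()
X = pd.DataFrame({
    'ret1': ret1,                            # daily return of stock 1
    'ret2': ret2,                            # daily return of stock 2
    'spread_ma5': spread.rolling(5).mean(),  # five-day moving average of the spread
    'ret1_std20': ret1.rolling(20).std(),    # 20-day moving standard deviation
    'ret2_std20': ret2.rolling(20).std(),
}).dropna()
y = spread.loc[X.index]                      # prediction target: the spread itself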
Let us also split the data into a training and a test set. We will adopt the common 80-20 rule; that is, 80% of the data goes to the training set, and 20% goes to the test set. We will also respect the time ordering, so the 80% training set does not peek into the future, as shown in Listing 10-4.
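A sketch of this chronological 80/20 split (Listing 10-4 is not reproduced) is:

split = int(len(X) * 0.8)
train_X, test_X = X.iloc[:split], X.iloc[split:]
train_y, test_y = y.iloc[:split], y.iloc[split:]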
With the training and test data ready, we can now move into the model training part,
starting with SVM.
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

svm_model = SVR(kernel='linear')
svm_model.fit(train_X, train_y)
train_pred = svm_model.predict(train_X)
>>> print("training rmse: ", np.sqrt(mean_squared_error(train_y, train_pred)))
test_pred = svm_model.predict(test_X)
>>> print("test rmse: ", np.sqrt(mean_squared_error(test_y, test_pred)))
training rmse: 0.039616044798431914
test rmse: 0.12296547390274865
Listing 10-5 Model training and testing using SVM
The RMSE measures the model’s predictive performance. However, we still need to plug the model into the trading strategy and evaluate the ultimate profitability of the pairs trading strategy. As the only change is the predicted spread produced by the specific machine learning model, we can define a function that takes the model as an input parameter and outputs the terminal profit. The score_fn() function in Listing 10-6 completes the scoring operation.
import torch
In this function, we add another input parameter to control whether the model is a neural network. This flag determines the specific prediction method to use. For standard sklearn algorithms such as SVM and random forest, we can call the predict() method of the model object to generate predictions for the given input data. However, when the model is a neural network trained using PyTorch, we need to first convert the input to a tensor object using torch.Tensor(), generate predictions by calling the model object itself (under the hood, the forward() function within the model class is called), extract the outputs without gradient information using the detach() method, and convert them to a NumPy array using numpy().
Next, we calculate the z-score using the mean and the standard deviation of the
predicted spread series. We then use an entry threshold of two and an exit threshold of
one to generate the trading signals based on the standardized z-scores. The rest of the
calculations follow the same approach as in the previous chapter.
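Listing 10-6 is not reproduced in this copy. Based on the description, a sketch of score_fn() might look like the following; the flag name is_nn, the column names, and the return calculation are assumptions modeled on the previous chapter rather than the book's exact code.

def score_fn(model, is_nn=False):
    # predict the spread on the test features
    if is_nn:
        inputs = torch.Tensor(test_X.values)
        pred = model(inputs).detach().numpy().flatten()
    else:
        pred = model.predict(test_X)
    pred = pd.Series(pred, index=test_X.index)
    # standardize the predicted spread
    zscore = (pred - pred.mean()) / pred.std()
    # entry threshold of two, exit threshold of one
    position = pd.Series(np.nan, index=zscore.index)
    position[zscore > 2] = -1.0    # short the spread
    position[zscore < -2] = 1.0    # long the spread
    position[zscore.abs() < 1] = 0.0
    position = position.ffill().fillna(0.0)
    # daily strategy return: long stock 1 and short stock 2 when long the spread
    ret1 = df['GOOG'].pct_change().loc[zscore.index]
    ret2 = df['MSFT'].pct_change().loc[zscore.index]
    strat_ret = position.shift(1) * (ret1 - ret2)
    # terminal (cumulative) return over the test period
    return float((1 + strat_ret.fillna(0.0)).prod())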
We can now use this function to obtain the terminal return for the pairs trading
strategy using the SVM model:
>>> score_fn(svm_model)
1.143746922303926
Similarly, we can obtain the same measure using the random forest regressor.
# random forest
from sklearn.ensemble import RandomForestRegressor
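The remainder of the random forest listing is not shown above; a sketch mirroring the SVM workflow, with an assumed variable name rf_model, would be:

rf_model = RandomForestRegressor(n_estimators=100, random_state=SEED)
rf_model.fit(train_X, train_y)
print("training rmse: ", np.sqrt(mean_squared_error(train_y,
                                                    rf_model.predict(train_X))))
print("test rmse: ", np.sqrt(mean_squared_error(test_y,
                                                rf_model.predict(test_X))))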
The result shows that random forest can better fit the data, with a lower training and test set RMSE compared with SVM.
We also calculate the terminal return as follows:
>>> score_fn(rf_model)
0.9489411965252148
The result reports a lower terminal return, despite the better predictive performance. This is another form of overfitting, in the sense that a more predictive model at the stage-one prediction task leads to a lower terminal return at the stage-two trading task. Combining these two tasks into a single stage is an interesting and active area of research.
We move to neural networks in the next section.
Note that we use the .values attribute to access the values from the DataFrame
and the view() function to reshape the target into a column.
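The conversion itself (Listing 10-7) is not reproduced; a sketch with assumed tensor names is:

train_X_t = torch.Tensor(train_X.values)
train_y_t = torch.Tensor(train_y.values).view(-1, 1)   # reshape the target into a column
test_X_t = torch.Tensor(test_X.values)
test_y_t = torch.Tensor(test_y.values).view(-1, 1)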
Next, we define the neural network model in Listing 10-8. Here, we slot the attributes into the initialization function, including one input linear layer, one hidden linear layer, and one output linear layer. The number of incoming neurons in the input layer (i.e., train_X.shape[1]) and the number of outgoing neurons in the output layer (i.e., 1) are determined by the specific problem at hand. The number of neurons in the middle layers is user defined and directly determines the model complexity. All these layers are chained together with a ReLU activation function in the middle via the forward() function. Also, note that it is unnecessary to apply ReLU to the last layer since the output will be a scalar value representing the predicted spread.
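Listing 10-8 is not reproduced here. The sketch below follows the description of one input, one hidden, and one output linear layer chained by ReLU activations; the class name and the hidden widths of 64 and 32 are assumptions, chosen because they reproduce the 2497-parameter count reported next for five input features.

import torch.nn as nn

class SpreadNet(nn.Module):   # hypothetical class name
    def __init__(self, n_features):
        super().__init__()
        self.input_layer = nn.Linear(n_features, 64)
        self.hidden_layer = nn.Linear(64, 32)
        self.output_layer = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.input_layer(x))
        x = self.relu(self.hidden_layer(x))
        return self.output_layer(x)   # no ReLU on the final scalar output

model = SpreadNet(train_X.shape[1])
print(sum(p.numel() for p in model.parameters()))   # 2497 with five input features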
The result shows that the neural network contains a total of 2497 parameters over
three linear layers. Note that the ReLU layer does not have any associated parameters as
it involves deterministic mapping only.
Next, we define the loss function as the mean square error using MSELoss() and choose Adam as the optimizer over the network weights, with an initial learning rate of 0.001:
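A sketch of this setup (Listing 10-9 is not shown) is:

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)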
We now enter the iterative training loop to update the weights by minimizing the specified loss function, as shown in Listing 10-10.
Here, we iterate over the training set for a total of 100 epochs. In each epoch, we first clear the existing gradients in memory using the zero_grad() function of the optimizer. Next, we score the training set to obtain predicted targets in outputs, calculate the corresponding MSE loss, perform backward propagation to calculate the gradients using autograd functionality via the backward() method, and finally perform the gradient descent update using the step() function.
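Listing 10-10 is not reproduced; a sketch that matches the described steps, using the tensors and objects assumed in the earlier sketches, is:

for epoch in range(100):
    optimizer.zero_grad()                   # clear gradients from the previous step
    outputs = model(train_X_t)              # score the training set
    loss = criterion(outputs, train_y_t)    # MSE between predictions and targets
    loss.backward()                         # backpropagate to compute gradients
    optimizer.step()                        # gradient descent update
    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1}, training loss {loss.item():.6f}")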
Running the code generates the following results, where we see that the training loss
continues to decrease as iteration proceeds:
The result shows that the neural network overfits less than the random forest model.
Now we obtain the terminal return of the pairs trading strategy based on the neural
network model:
Again, this result shows that an accurate machine learning model may not necessarily
lead to a higher terminal return in the pairs trading strategy. Even if the machine
learning model is predictive of future spreads, another layer of assumption imposed by
the pairs trading strategy is that the temporary market fluctuations will ease down, and
the two assets will revert back to the long-term equilibrium relationship. Such an
assumption may not necessarily stand, along with the many unpredictable factors in the
market.
Summary
In this chapter, we introduced different machine learning algorithms used in predicting the spread, a key component when employing the pairs trading strategy. We started by introducing the overall framework when training any machine learning algorithm and then elaborated on three specific algorithms: support vector machine, random forest, and neural network. Lastly, we plugged these models into the strategy and found that a higher predictive performance by the machine learning model, a sign of overfitting, may lead to a lower performance score in terms of cumulative return. It is thus important not to overfit the machine learning models at the prediction stage and instead focus more on the final performance of the trading strategy at the decision stage, where the actual trading action is made.
Exercises
How does the SVM model determine the optimal hyperplane for predicting the spread
in a pairs trading strategy? What are the key parameters that need to be adjusted in an
SVM?
How does a random forest algorithm handle feature selection when predicting the
spread in a pairs trading strategy? What are the implications of feature importance in
this context?
Explain how SVM, random forest, and neural networks approach the problem of
overfitting in the context of predicting the spread in a pairs trading strategy.
How can you handle nonlinear relationships between features in SVM, random forest,
and neural networks when predicting the spread in a pairs trading strategy?
How can the layers in a neural network be optimized to improve the prediction of the
spread in a pairs trading strategy?
Index
A
Acquisition function
best-observed value
closed-form EI
decision-making
de ining and initializing
randomness
trade-off between
UCB
Actively managed investment pools
Active traders
add_constant() function
Advanced order types
Agency orders
Agency trading
agg() function
All-or-none (AON)
alpha argument
Annualized variance
Annualizing returns
Annualizing volatility
Annuities
apply() function
Arbitrage
Arithmetic mean
Asset classes
Asset price curve
asset_return1
asset_return2
Augmented Dickey-Fuller (ADF) test
Automated optimization techniques
Automated trading
B
Backtesting
historical data
market phases
maximum drawdown/max drawdown
optimistic assessment
parameters
performance
performance indicator
procedure
profits
risk and reward assessments
test set performance
trend-following strategy
backward() method
Backwardation
Bayesian optimization
black-box function
environment
parameter
policy
Big exchanges
Black-box function
Bollinger Bands
Bonds
botorch.optim module
Buy-and-hold strategy
Buy-side institutional investors
Buy-side prices
Buy-side retail investors
C
Call market
Candlestick() function
Candlestick charts
Cash settlement
Central depository (CDP)
Centralized order-driven market
ChatGPT model
Chicago Board of Trade (CBOT)
Clearance
Clearing house
coint() function
Cointegration
correlation
equilibrium
statsmodels package
time series
hypothesis
non-stationary time series
process
statistical analysis
statistical characteristics
traditional statistical methods
Compounded return
Compounding returns
Contango
Continuously compounded returns
Continuous market
Convenience yield
cummax() function
.cumprod() function
cumprod() function
D
Daily drawdowns
Dark pools
DataFrames
Data-generating process
datetime format
Day trader
Delivery
Derivative market
df2 variable
dfBidPrices
dfPrices
diff() function
Display orders
DJI Stock Symbols
Dow Jones Industrial Average (DJIA)
momentum trading
download() function
drawdown()
dropna() function
E
Earnings per share (EPS)
Electronic communications networks (ECNs)
Electronic markets
buying and selling financial instruments
discrete price grid
discrete price ladder
display vs. non-display order
electronic order
limit order
limit order book
market order
market participants
MIT
order flow
order matching systems
order types
pegged order
price impact
proprietary and agency trading
revolution
stop-limit order
stop order
trailing stop order
E-Mini futures contract
equals() function
Evaluation-period performance
ewm() method
ExactMarginalLogLikelihood() function
Exchange-traded funds (ETFs)
Execution risk
Expected improvement (EI)
Exploratory data analysis
Exponentially weighted moving average (EWMA)
Exponential moving average (EMA)
Exponentiation
F
FAK orders
Feature engineering
SVM and random forest
training and test data
DataFrame X
Fill and kill (FAK)
Fill or kill (FOK)
Financial assets
Financial data analysis
definition
downloading stock price data
summarizing stock prices
visualizing stock price data
Financial derivatives
Financial instrument
Financial market stability
Financial trading
First-order gradient-based methods
First-period return
Flexible controls
FOK orders
Forward and futures contracts
age-old practice
derivative products
financial instruments
futures trading
key difference
market participants opportunities
predetermined quantity
purchase and receive obligation
Forward contract
arbitrage opportunities
buy-low-sell-high principle
counterparty risk
current time point
definition
exponential constant
formula
net cash flow
no-arbitrage argument
portfolio
predetermined delivery price
private agreements
risk-free interest rate
stock and cash positions
trading price and quantity
unforeseen circumstances
Futures contract
clearing house
hedging and speculation
leverage
mark-to-market
obligations at maturity
parameters
pricing
standardized features
standardized contracts
Futures data
closing price
downloading
fontsize argument
“GC=F” and “HG=F” symbols
technical indicators
visualizing
yfinance package
Futures trading
G
Gaussian distribution
Gaussian process (GP) model
generate_initial_data()
get_stock_data() function
go.Bar() function
Group tradable assets
derivative products
maturity
nonlinear payoff function
payoff function linearity
H
head() function
Hedge funds
Hedgers
Hedging
Hidden/non-display orders
High-frequency trading (HFT)
Hyperparameter tuning
Hypothesis testing
I, J, K
Iceberg orders
idxmin() function
iloc() function
iloc() method
Immediate or cancel (IOC)
Implementing trend-following strategy
buy-and-hold
cumulative returns analysis
framework
long-term moving average
momentum-related technical indicators
1+R return
short-term moving average
signal column
sign switch
single-period return
SMA-3 and SMA-20
trading actions
trading rule
transaction cost
info() function
initialize_model()
Input data groups
financial news
fundamentals
market states
technicals
Institutional algorithmic trading
Interpolation
L
Label distribution
Leverage
Limit order
Limit order book (LOB)
Linear regression model
LOB data
data folder
features
label distribution
limit prices
normalized data representations
price-volume data
price-volume pair
visualizing price movement
loc() function
Logarithmic returns
advantages
compounding returns
dummy stock prices
mathematical computations
natural logarithm
percentage return
1+R approach
sequential compounding process
single-period returns
stock price analysis
stock returns calculation
terminal return
Lookback windows application
M
Machine learning
components
market-neutral strategy
pairs trading
calculation
stocks
trading horizon
trained model
training process work low
training situation
types
make_subplots() function
Marked to market (MTM)
Market if touched (MIT)
Market maker
Market-neutral trading strategy
Market-not-held orders
Market orders
Market participants
Market timing
Mark-to-market
definition
exchange
final settlement price
fluctuating prices
long margin account
minimum requirement
price updation
profit and loss
traders risk exposure
Maximum drawdown
buy-and-hold strategy
calculating
calculation process
daily returns
DataFrame
distance
line charts
performance
risk-adjusted return metric
risk measure
stock price data
stock returns
stocks
trading strategy
volatility
wealth index curve
Maximum log-likelihood (MLL) approach
mean() method
Mean square error
Model development work low
Model training process
Momentum trading
asset’s price
characterizes
current month
elements
time frame
volatility
volume
measurement period
monthly returns
principle
terminal return
traders
traders and investors
and trend-following
Moving Average Convergence Divergence (MACD)
Moving averages (MA)
Multiperiod return
Mutual funds
N
NaN value
NASDAQ Nordic stock market
Neural network
fundamental constituent
input data and an output layer
linear regression
parameters
ReLU function
New York Stock Exchange (NYSE)
No-arbitrage argument
Nonconvex function
Non-display orders
Normal contango
Normality
n-period investment horizon
np.exp() function
np.mean() function
Null hypothesis
O
Objective functions
OHLC prices
OHLC chart
On-balance volume (OBV)
One-dimensional objective function
Online trading platforms
Optimization
argmax operation
decision-making
derivative-free
global
optimizer
parameters
procedure
time and resources
trading strategy
Order-driven market
Order flow
Order matching systems
definition
conditional orders
electronic exchanges
exchanges
non-displayed orders
order precedence rules
order types
price/display/time precedence rule
rule-based systems
Order precedence rules
types
price precedence
size precedence
time precedence
Order types
Ordinary least squares (OLS)
Over-the-counter (OTC)
P
Pairs trading
assets
asset selection
components
implementation
mean-reverting behavior
neural network
SVM
fit() method
predict() method
score_fn() function
sklearn algorithms
torch.Tensor()
strategy
view() function
traders
Pandas DataFrame
pct_change() function
pd.DataFrame() function
Pegged order
algorithm
best bid
composite order
definition
differential offset
dynamic limit price
limit order
reference price
securities
Percentage change
Percentage returns
p-hacking
Physical delivery
plot() function
plotly package
Potential trading opportunities
predict() method
Price impact
Price ladder
Price movement visualization
Price precedence
Price return
Price slippage
Principle of compounding
prod() function
Program trading
Proprietary orders
Proprietary trading
Q
qcut() function
Quantitative trading
algorithm
avenues and steps
buy-side investors
common assets
See Tradable assets, quantitative trading
data collection and processing
definition
grouping tradable assets
institutional algorithmic trading
market making
market structures
model development workflow
order execution
portfolio rebalancing
process
quant trader
scalping
structured features
Quant trader
Quote-driven/price-driven market
R
Random forest
bagging or bootstrap aggregation
factor
features
training process
Random forest regressor
Real estate investment trusts (REITs)
Rebalances a portfolio
Rebalancing
Recti ied linear unit (ReLU)
Relative Strength Index (RSI)
resample() function
return_df variable
Returns analysis
annualized returns calculation
annualizing
description
dummy returns
multiperiod return
1+R format
single-period returns calculation
stock return with dividends
terminal return
two-period terminal return calculation
Return values
1+R format
1+R formatted DataFrame
Risk-adjusted return
Risk analysis
annualized returns calculation
annualized volatility calculation
column-wise arithmetic mean returns
Sharpe ratio
Sharpe ratio calculation
stock price data
variance and standard deviation
volatility
Risk and return trade-off
diversi ication strategies
factors
individual asset
low-return asset
profit maximization
stock market
two-dimensional coordinate system
Risk-free bond interest rate
Risk-free interest rate
1+R method
rolling() function
Root mean squared error (RMSE)
1+R return
Rule-based approach
S
Scalping
shape() function
Sharpe ratio
shift() function
Short-term swings
Simple moving average (SMA)
Singaporean investment
Singapore Exchange (SGX)
Single-period logarithmic return
Single-period log returns
Single-period percentage return
Single-period returns
Single-period volatility
Size precedence
Slippage
SMA-3
Speculators
S&P 500 E-Mini futures contract
Spot market
Stacked bar charts
Standard deviation
Standardization
Stationarity
adfuller() function
distribution
mean and standard deviation
random.normal() function
stationarity_test()
stock prices
time series
Statistical arbitrage
concept
market movements
mean reversion
short-term fluctuations
short-term market factors
statistical methods
steps
stocks
Statistical concept
Statistical measures
std() function
Stock data
Stock price data
Stock return with dividends
Stocks
Stop-entry order
Stop-limit order
Stop-loss orders
Stop orders
summary() function
Sum of the squared errors (SSE)
Support vector machine (SVM)
hyperplane
input-output pairs
mathematical functions
support vectors
user-specified hyperparameter
Symmetry
T
tail() function
Tangible and intangible factors
Technical indicators
additional features
Bollinger Bands
DataFrame
EMA
integral
MA
MACD
market analysis clarification
mathematical calculations
raw futures time series data
RSI
SMA
volume-based indicators
Terminal monthly return
Terminal return
Ticker() module
Time precedence
Time series data
today() function
torch.Tensor() function
Tradable assets, quantitative trading
annuities
bonds
cash and equivalents
commodities
currencies
ETFs
forward
futures
hedge funds
mutual funds
options
REITs
stocks
Trade formation period
Traders
Trading agency
Trading algorithm
Trading avenues
Trading signals
Trading steps
acquisition of information and quotes
confirmation, clearance, and settlement
execution of order
routing of order
Trading volume
Trailing stop orders
Transactions
Trend following strategy
definition
implementation
See Implementing trend-following strategy
log return
See Logarithmic return
lookback window
risk management techniques
technical indicators
See also Trend trading
Trend traders
Trend trading
definition
EMA
fundamental principle
moving average
SMA
technical analysis tools
technical indicators
See Technical indicators
Two-period return
Typical model training process
U
Unit root test
Upper con idence bound (UCB)
V, W, X
value_counts() function
Variance and standard deviation
Volatility
Volume-weighted average price (VWAP)
Y
Yahoo! Finance
yfinance library
yfinance package
Z
zero_grad() function
Zero-sum game
Z-score