Quantitative trading, also called algorithmic trading, refers to automated trading activities that buy or sell
particular instruments based on specific algorithms. Here, an algorithm can be considered a model that transforms
an input into an output. In this case, the input includes sufficient data to make a proper trading decision, and the
output is the action of buying or selling an instrument. The quality of a trading decision thus relies on the
sufficiency of the input data and the suitability and robustness of the model.
Developing a successful quantitative trading strategy involves the collection and processing of vast amounts of
input data, such as historical price data, financial news, and economic indicators. The data is passed as input to the
model development process, where the goal is to accurately forecast market trends, identify trading opportunities,
and manage potential risks, all of which are reflected in the resulting buy or sell signals.
A robust trading algorithm is often identified via the process of backtesting, which involves simulating the
algorithm’s performance using historical data. Simulating the performance of the algorithm under different
scenarios allows us to assess the strategy’s potential effectiveness better, identify its limitations, and fine-tune the
parameters to optimize its results. However, one also needs to be aware of the potential risks of overfitting and
survivorship bias, which can lead to inflated metrics and potentially poor test set performance.
In this chapter, we start by covering a few basic and important concepts related to quantitative trading. We then
switch to hands-on examples of working with financial data using Python.
The model used to generate trading signals could be either rule-based or trained using data. The rule-based
approach mainly relies on domain knowledge and requires explicitly writing out the logic flow from input to
output, similar to following a cooking recipe. On the other hand, the data-driven approach involves training a
model using machine learning techniques and using the model as a black box for prediction and inference. Let us
review the overall model training process in a typical machine learning workflow.
Figure 1-2 Example of a typical model training process. The workflow starts with the available training data and gradually tunes a model. The tuning
process requires matching the model prediction to the target output, where the gap is measured by a particular cost function and used as feedback for the
next round of tuning. Each tuning produces a new model, and we want to look for one that minimizes the cost
Now let us look at a specific type of algorithmic trading at large institutions: institutional algorithmic trading.
The institutional algorithmic strategies generate optimal trading signals by analyzing daily quotes and prices.
For example, an institutional algorithmic strategy may suggest entering a long position if the current stock price
moves from below to above the volume-weighted average price (VWAP) over a day, a technical indicator often
used by short-term traders. The institutional algorithmic strategies may also exploit arbitrage opportunities or
price spreads between correlated securities. Here, arbitrage means making a sure, positive profit with zero net investment. Arbitrage opportunities, if they exist, normally disappear very quickly because many hedge funds and investors are constantly searching for them.
The next section briefly introduces the role of a quant trader.
Figure 1-3 Grouping common investment assets into four major classes
Alternatively, we can group tradable assets based on the type of maturity. Stocks, currencies, and commodities
are asset classes with no maturity, while fixed-income instruments and derivatives have maturities. For a vanilla security with a maturity date, such as a futures contract, it is possible to compute its fair price based on the no-arbitrage argument, a topic we will discuss in Chapter 3.
We can also group assets based on the linearity of the payoff function at maturity for certain derivative
instruments. For example, a futures contract allows the buyer/seller to buy/sell the underlying asset at an agreed
price at maturity. Let us assume the underlying (stock) price at the maturity date is ST and the agreed price is K.
When a buyer enters/longs a futures contract to buy the stock at price K, the buyer would make a profit of ST − K if
ST ≥ K (purchase the stock at a lower price) or suffer a loss of K − ST if ST < K (purchase the stock at a higher
price). A similar analysis applies to the case of entering a short position in a futures contract. Both functions are
linear with respect to the underlying asset’s price upon exercise. See Figure 1-4 for an illustration of linear payoff
functions.
Figure 1-4 Illustration of the linear payoff function of entering a long or short position in a futures contract
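To make the payoff concrete, here is a minimal sketch (not one of the book's listings; the function name is ours) that computes the linear payoff of a long or short futures position given the price at maturity ST and the agreed price K:
# payoff of a futures position at maturity (illustrative sketch)
def futures_payoff(S_T: float, K: float, long: bool = True) -> float:
    """Return S_T - K for a long position and K - S_T for a short position."""
    return S_T - K if long else K - S_T

print(futures_payoff(S_T=110, K=100, long=True))   # 10: the long side gains when the price rises
print(futures_payoff(S_T=110, K=100, long=False))  # -10: the short side loses the same amount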
Other derivative products with linear payoff functions include forwards and swaps. These are easy to price
since their prices are linear functions of the underlying asset. We can price these instruments irrespective of the
mathematical model for the underlying price. In other words, we only require the underlying asset’s price, not the
mathematical model around the asset. These assets are thus subject to model-independent pricing.
Let us look at the nonlinear payoff function of an options contract. A call option gives the buyer the choice to buy the underlying asset at the strike price K at the maturity date T, when the underlying asset price is ST, while a put option gives the buyer the choice to sell the underlying asset at the strike price K. In both situations, the buyer can choose not to exercise the option and therefore gains no profit. Given that an investor can either long or short a
call or put option, there are four combinations when participating in an options contract, as listed in the following:
Long a call: Buy a call option to obtain the opportunity to buy the underlying asset at a prespecified strike price
upon maturity.
Short a call: Sell a call option to allow the buyer the opportunity to buy the underlying asset at a prespecified
strike price upon maturity.
Long a put: Buy a put option to obtain the opportunity to sell the underlying asset at a prespecified strike price
upon maturity.
Short a put: Sell a put option to allow the buyer the opportunity to sell the underlying asset at a prespecified
strike price upon maturity.
Figure 1-5 contains the payoff functions for the four different combinations, all of which are nonlinear
functions of the underlying asset price ST.
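As a minimal sketch (not one of the book's listings; the function names are ours), the four payoffs at maturity can be computed as follows, ignoring the option premium paid or received up front:
# option payoffs at maturity (premium ignored; illustrative sketch)
def call_payoff(S_T: float, K: float) -> float:
    return max(S_T - K, 0.0)   # long call: exercise only if S_T > K

def put_payoff(S_T: float, K: float) -> float:
    return max(K - S_T, 0.0)   # long put: exercise only if S_T < K

S_T, K = 110, 100
print(call_payoff(S_T, K))    # long call:  10
print(-call_payoff(S_T, K))   # short call: -10 (the seller loses what the buyer gains)
print(put_payoff(S_T, K))     # long put:   0
print(-put_payoff(S_T, K))    # short put:  0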
Note that tradable instruments within the same asset class exhibit similar characteristics but still differ from one another in some respects. Market behavior also differs across instruments, each of which follows its own price dynamics.
We can also group a tradable asset according to whether it belongs to the cash market or the derivative market.
The cash market, also called the spot market, is a marketplace where trading instruments are exchanged at the
point of sale, and purchasers take immediate possession of the trading products. For example, the stock exchange
falls into the cash market since investors receive shares of stock almost immediately in exchange for cash, thus
settling the transactions on the spot.
On the other hand, the derivative market completes a transaction only at a prespecified date in the future. Take
the futures market, for example. A buyer who pays for the right to receive a good can only expect delivery at a prespecified future date.
The next section introduces common trading avenues and steps.
Market Structures
Before 2010, open outcry was a popular way to communicate trade orders in trading pits (floor). Traders would tap
into temporary information asymmetry and use verbal communication and hand signals to perform trading
activities at stock, option, and futures exchanges. Traders would arrange their trades face to face on the exchange’s
trading floor, cry out bids and offers to offer liquidity, and listen for bids and offers to take liquidity. The open
outcry rule is that traders must announce their bids and offers so that other traders may react to them, avoiding
whispering among a small group of traders. They must also publicly announce that they accept bids (assets sold) or
offers (assets taken) of particular trades. The largest pit was the US long-term Treasury bond futures market, with over 500 floor traders at the Chicago Board of Trade (CBOT), a major exchange that later merged into the CME Group.
As technology advanced, the trading markets moved from physical to electronic, shaping a fully automated
exchange. First proposed by Fischer Black in 1971, the fully automated exchange was also called program trading,
which encompasses a wide range of portfolio trading strategies.
The trading rules and systems together define a trading market’s market structure. One type of market is called
the call market, where trades are allowed only when the market is called. The other type of market is the
continuous market, where trades are allowed anytime during regular trading hours. Big exchanges such as NYSE,
LSE (London Stock Exchange), and SGX (Singapore Exchange) allow a hybrid mode of market structure.
The market structure can also be categorized based on the nature of pricing among the tradable assets. When
the prices are determined based on the bid (buy) and ask (sell) quotations from market makers or dealers, it is
called a quote-driven or price-driven market. The trades are determined by dealers and market makers who
participate in every trade and match orders from their inventory. Typical assets in a quote-driven market include
bonds, currencies, and commodities.
On the other hand, when the trades are based on the buyers’ and sellers’ requirements, it is called an order-
driven market where the bid and ask prices, along with the number of shares desired, are put on display. Typical
assets in an order-driven market include stock markets, futures exchanges, and electronic communications
networks (ECNs). There are two basic types of orders: market orders, which execute at the prevailing market price, and limit orders, which execute only at a preset limit price or better.
Let us look at a few major types of buy-side stock investors.
Market Making
Market maker refers to a firm or an individual that actively quotes the two-sided markets (buy side and sell side) of
a particular security. The market maker provides bids, meaning the particular price of the security along with the
quantity it is willing to buy. It also provides offers (asks), meaning the price of the security and the quantity it is
willing to sell. Naturally, the ask price is supposed to be higher than the bid price, so that the market maker can make a profit from the spread between the two quoted prices.
Market makers post quotes and stand ready to trade, thereby providing immediacy and liquidity to the market.
By quoting bid and ask prices, market makers make the assets more liquid for potential buyers and short sellers.
A market maker also takes a significant risk of holding the assets because a security’s value may decline
between its purchase and sale to another buyer. They need capital to finance their inventories. The capital available
to them thus limits their ability to offer liquidity. Because market making is very risky, investors generally dislike
investing in market-making operations. Market-making firms with significant external financing typically have
excellent risk management systems that prevent their dealers from generating large losses.
The next section introduces the concept of scalping.
Scalping
Scalping is a style of trading that seeks small, fast profits by quickly and continuously acquiring and unwinding large positions, typically holding each for no more than a few minutes. Traders who engage in scalping are referred to as scalpers.
When engaged in scalping, a trader requires a live feed of quotes in order to move fast. The trader, also called
the day trader, must follow a strict exit strategy because one large loss could eliminate the many small gains the
trader worked to accumulate.
Active traders such as day traders are strong believers in market timing, a key component of actively managed
investment strategies. For example, if traders can predict when the market will go up and down, they can make
trades to turn that market move into a profit. Obviously, this is a difficult and strenuous task as one needs to watch
the market continuously, from daily to even hourly, as compared to long-term position traders that invest for the
long run.
The next section introduces the concept of portfolio rebalancing.
Portfolio Rebalancing
As time goes on, a portfolio's current asset allocation will drift away from the investor's original target asset allocation. If left unadjusted, the portfolio will become either too risky or too conservative. Rebalancing corrects this drift by changing the position of one or more assets in the portfolio, either buying or selling, with the goal of restoring the target allocation, maximizing the portfolio return, or hedging another financial instrument.
Asset allocations in a portfolio can change as market performance alters the values of the assets due to price
changes. Rebalancing involves periodically buying or selling the assets in a portfolio to regain and maintain that
original, desired level of asset allocation defined by an investor’s risk and reward profile.
There are several reasons why a portfolio may deviate from its target allocation over time, such as market fluctuations, cash injections or withdrawals, and changes in risk tolerance. We can perform
portfolio rebalancing using either a time-based rebalancing approach (e.g., quarterly or annually) or a threshold-
based rebalancing approach, which occurs when the allocation of an asset class deviates from the target by a
predefined percentage.
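As an illustration, the following sketch (with made-up numbers and names) checks whether a threshold-based rebalance is triggered for a simple two-asset portfolio:
# threshold-based rebalancing check (illustrative values only)
current_values = {"stocks": 70_000, "bonds": 30_000}   # current market values
target_weights = {"stocks": 0.60, "bonds": 0.40}       # target allocation
threshold = 0.05   # rebalance when a weight drifts more than 5 percentage points

total = sum(current_values.values())
for asset, value in current_values.items():
    drift = value / total - target_weights[asset]
    if abs(drift) > threshold:
        print(f"Rebalance {asset}: weight {value / total:.2%} vs. target {target_weights[asset]:.2%}")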
In the world of quantitative trading, Python has emerged as a powerful tool for formulating and implementing
trading algorithms. Part of the reason is its comprehensive open source libraries and strong community support. In
the next section, we will discuss the practical aspect of financial data analysis and start by acquiring and
summarizing the stock data using Python.
Figure 1-6 Illustrating the bullish candlestick in green and bearish candlestick in red
Let us examine the bullish (green) candle of a trading day. When the market starts, the stock assumes an
opening price and starts to move. Across the day, the stock will experience the highest price point (high) and the
lowest price point (low), where the gap in between indicates the momentum of the movement. We know for a fact
that the high will always be higher than the low, as long as there is movement. When the market closes, the stock
registers a close. Figure 1-7 depicts a sample movement path summarized by the green candlestick.
Figure 1-7 A sample path of stock price movement represented by the green candlestick chart. When the market starts, the stock assumes an opening
price and starts to move. It will experience the highest price point (high) and the lowest price point (low), where the gap in between indicates the
momentum of the movement. When the market closes, the stock registers a close
Next, we will switch gears and start working on the actual stock price data using Python. We will download the
data from Yahoo! Finance and introduce different ways to graph the data.
Next, we can use the Ticker() module from the yfinance package to observe the profile information of a
specific stock. The following code snippet obtains the ticker information on Microsoft and prints it out via the
info attribute:
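The snippet itself is not reproduced here; a minimal version (the variable name msft is ours) would look like this:
# obtain the ticker object for Microsoft and print its profile information
import yfinance as yf

msft = yf.Ticker("MSFT")
print(msft.info)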
The result shows a long list of information about Microsoft, useful for our initial analysis of a particular stock.
Note that all this information is structured in the form of a dictionary, making it easy for us to access a specific
piece of information. For example, the following code snippet prints the market cap of the stock:
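Again, a minimal version of the referenced snippet, assuming the dictionary key is "marketCap":
# access a single field from the info dictionary
print(msft.info["marketCap"])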
Such structured information, also considered metadata in this context, comes in handy when we analyze
multiple tickers together.
Now let us focus on the actual stock data of Microsoft. In Listing 1-2, we download the stock price data of
Microsoft from the beginning of 2022 till the current date. Here, the current date is determined automatically by
the today() function from the datetime package, which means we will obtain a different (bigger) result every
time we run the code on a future date. We also specify the format of the date to be “YYYY-mm-dd,” an important
practice to unify the date format.
# download daily stock price data by passing in the specified ticker and date range
import yfinance as yf
from datetime import datetime

today_date = datetime.today().strftime('%Y-%m-%d')
print(today_date)
data = yf.download("MSFT", start="2022-01-01", end=today_date)
Listing 1-2 Downloading stock price data
We can examine the first few rows by calling the head() function of the DataFrame. The resulting table
contains price-related information such as open, high, low, close, and adjusted close prices, along with the daily
trading volume:
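The call itself is simply the following (the printed table is omitted here):
data.head()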
We can also view the last few rows using the tail() function:
>>> data.tail()
Open High Low Close Adj Close Volume
Date
2022-12-30 238.210007 239.960007 236.660004 239.820007 239.820007 21930800
2023-01-03 243.080002 245.750000 237.399994 239.580002 239.580002 25740000
2023-01-04 232.279999 232.869995 225.960007 229.100006 229.100006 50623400
2023-01-05 227.199997 227.550003 221.759995 222.309998 222.309998 39585600
2023-01-06 223.000000 225.759995 219.350006 224.929993 224.929993 43597700
It is also a good habit to check the dimensions of the DataFrame using the shape attribute:
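For example (the exact row count depends on the date the code is run):
data.shape   # returns a tuple of (number of rows, number of columns)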
The following section will look at visualizing the time series data via interactive charts.
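The plotting listing is not reproduced here; a minimal sketch using Plotly Express on the downloaded data might look like the following:
# interactive line chart of the daily closing price (a sketch, not the book's exact listing)
import plotly.express as px

fig = px.line(data, x=data.index, y="Close",
              title="Daily closing price of Microsoft")
fig.show()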
Running the code produces Figure 1-8. Note that the graph is interactive; hovering over each point displays the corresponding date and closing price.
Figure 1-8 Interactive time series plot of the daily closing price of Microsoft
We can also enrich the graph by overlaying the trading volume information, as shown in Listing 1-4.
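Listing 1-4 is not reproduced here; a sketch of the idea, creating a figure fig2 with a secondary y-axis for the volume bars, could read:
# overlay the closing price (left axis) and trading volume (right axis); a sketch only
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig2 = make_subplots(specs=[[{"secondary_y": True}]])
fig2.add_trace(go.Scatter(x=data.index, y=data["Close"], name="Close"),
               secondary_y=False)
fig2.add_trace(go.Bar(x=data.index, y=data["Volume"], name="Volume"),
               secondary_y=True)
fig2.update_layout(title="Daily closing price and trading volume of Microsoft")
fig2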
Running the code generates Figure 1-9. Note that the trading volume assumes a secondary y-axis on the right,
by setting secondary_y=True.
Figure 1-9 Visualizing the daily closing price and trading volume of Microsoft
Based on this graph, a few bars stand out, making it difficult to see the line chart. Let us change it by
controlling the magnitude of the secondary y-axis. Specifically, we can enlarge the total magnitude of the right y-
axis to make these bars appear shorter, as shown in Listing 1-5.
# rescale the volume (secondary) y-axis to a larger range so the volume bars appear shorter
fig2.update_yaxes(range=[0, 500000000], secondary_y=True)
fig2.update_yaxes(visible=True, secondary_y=True)
fig2
Listing 1-5 Rescaling the y-axis
Running the code generates Figure 1-10. Now the bars appear shorter given a bigger range (0 to 500M) of the
y-axis on the right.
Figure 1-10 Controlling the magnitude of the daily trading volume as bars
Lastly, let us plot all the price points via candlestick charts. This requires us to pass in all the price-related
information in the DataFrame. The Candlestick() function can help us achieve this, as shown in Listing 1-6.
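Listing 1-6 is not reproduced here; a minimal sketch of a candlestick chart built from the downloaded DataFrame could read:
# candlestick chart of daily open, high, low, and close prices (a sketch only)
import plotly.graph_objects as go

fig3 = go.Figure(data=[go.Candlestick(x=data.index,
                                      open=data["Open"],
                                      high=data["High"],
                                      low=data["Low"],
                                      close=data["Close"])])
fig3.update_layout(title="Daily candlestick chart of Microsoft")
fig3.show()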
Running the code generates Figure 1-11. Each bar represents one day’s summary points (open, high, low, and
close), with the green color indicating an increase in price and red indicating a decrease in price at the end of the
trading day.
Figure 1-11 Visualizing all daily price points of Microsoft as candlestick charts
Notice the sliding window at the bottom. We can use it to zoom in on a specific range, as shown in Figure 1-12. The dates along the x-axis are automatically adjusted as we zoom in. Also, note that these bars come in groups of five. This is no coincidence: there are five trading days in a week.
Summary
In this chapter, we covered the basics of quantitative trading, including topics such as institutional algorithmic
trading, major asset classes, derivatives such as options, market structures, buy-side investors, market making,
scalping, and portfolio rebalancing. We then delved into exploratory data analysis of the stock data, starting with
summarizing the periodic data points using candlestick charts. We also reviewed the practical side of things,
covering data retrieval, analysis, and visualization via interactive charts. These will serve as the building blocks as
we develop different trading strategies later on.
Exercises
List a few financial instruments and describe the risk and reward profile.
Can a model get exposed to the test set data during training?
A model is considered better if it does better than another model on the training set, correct?
For daily stock price data, can we aggregate it as weekly data? How about hourly?
What is the payoff function for the issuer of a European call option? Put option? How is it connected to the
payoff function of the buyer?
Suppose you purchase a futures contract that requires you to sell a particular commodity one month later for a
price of $10,000. What is your payoff when the price of the commodity grows to $12,000? Drops to $7000?
What about the payoff for the buyer in both cases?
How do the results change if we switch to an options contract with the same strike price and delivery date?
Draw a sample stock price curve of a red candlestick.
Download the stock price data of Apple, plot it as both a line and a candlestick chart, and analyze its trend.
Calculate the YTD (year-to-date) average stock price of Apple.
2. Electronic Market
In this chapter, we delve into the world of electronic markets, which have revolutionized the way
financial instruments are traded. With the rapid advancements in technology and the widespread
adoption of the Internet, electronic markets have largely replaced traditional, floor-based trading
venues, ushering in an era of speed, efficiency, and accessibility for market participants around the
globe.
Electronic markets facilitate the buying and selling of financial instruments, such as stocks, bonds,
currencies, and commodities (covered in Chapter 1), through computerized systems and networks. They
have played a critical role in democratizing access to financial markets, enabling a broader range of
participants, including retail investors, institutional investors, and high-frequency traders, to engage in
trading activities with ease and transparency. At the heart of electronic markets lies the trading
mechanism, which governs how buy and sell orders are matched, executed, and settled.
Furthermore, electronic markets offer a variety of order types that cater to the diverse needs and
objectives of traders. These order types can be used to achieve specific goals, such as minimizing
market impact, ensuring a desired level of execution, or managing risk. In this chapter, we will examine
the most common types of orders, including market orders, limit orders, stop orders, and their various
iterations.
As we progress through this chapter, readers will gain a comprehensive understanding of the inner
workings of electronic markets, the trading mechanisms that drive them, and the wide array of order
types available to market participants.
Electronic Order
The rise of electronic trading has brought about significant improvements in the efficiency, speed, and
accessibility of financial markets. Transactions that once took minutes or hours to complete can now be
executed in milliseconds or even microseconds, thanks to the power of high-speed networks and
advanced computer algorithms. As a result, market participants can take advantage of fleeting trading
opportunities, react more swiftly to market news, and benefit from tighter bid-ask spreads, which
translate into lower transaction costs.
Moreover, electronic trading has democratized access to global financial markets, allowing
individual investors to trade alongside institutional players such as hedge funds, banks, and proprietary
trading firms. Through user-friendly online trading platforms, retail investors can access a vast array of
financial instruments, from stocks and bonds to currencies and derivatives, and participate in various
markets around the world. These platforms provide a wealth of market data, research tools, and risk
management features, empowering investors to make more informed decisions and execute their trading
strategies with precision and ease. At the same time, the increased transparency and availability of
market data have fostered a more competitive landscape, driving innovation in trading strategies,
algorithms, and financial products.
Orders are short messages sent to the exchange through the broker. An order is a set of instructions the trader gives to the exchange. It must contain at least the following instructions (a toy representation follows the list):
Contract/security (or contracts/securities) to trade
Buy or sell or cancel or modify
Size: How many shares or contracts to trade
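As a minimal illustration only (not any exchange's actual message format), such an order could be represented as a simple data structure:
# a toy representation of an order message (illustrative, not a real exchange API)
from dataclasses import dataclass

@dataclass
class Order:
    security: str   # contract/security to trade
    action: str     # "buy", "sell", "cancel", or "modify"
    size: int       # number of shares or contracts

order = Order(security="MSFT", action="buy", size=100)
print(order)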
From an investor’s perspective, making a trade via a computer system is simple and easy. However,
the complex process behind the scenes sits on top of an impressive array of technology. What was once
associated with shouting traders and wild hand gestures in open outcry markets has now become more
closely associated with computerized trading strategies.
When you place an order to trade a financial instrument, the complex technology enables your
brokerage to interact with all the securities exchanges looking to execute the trade. Those exchanges
simultaneously interact with all the brokerages to facilitate trading activities.
For example, the Singapore Exchange (SGX), a Singaporean investment holding company, acts
through its central depository (CDP) as a central counterparty to all matched trades (mainly securities)
executed on the SGX ST Trading Engine, as well as privately negotiated married trades that are
reported to the clearing house for clearing on the trade date. Being a central counterparty (CCP), CDP
assumes the role of the seller to the buying clearing member and buyer to the selling clearing member.
CDP, therefore, takes the buyer’s credit risks and assumes the seller’s delivery risks. This interposing of
CDP as the CCP eliminates settlement uncertainty for market participants. SGX provides a centralized
order-driven market with automated order routing, supported by decentralized computer networks.
There are no designated market makers (liquidity providers), and member firms act as brokers or
principals for clearing and settlement.
Proprietary and Agency Trading
In the world of finance, the distinction between proprietary and agency trading plays a crucial role in
determining the objectives and motivations behind trading activities. While both types of trading
involve the execution of orders in financial markets, they serve different purposes and are subject to
different regulations and risk profiles.
Proprietary trading allows financial institutions to generate profits by leveraging their own capital
and expertise in market analysis, risk management, and trading strategies. Prop traders often engage in
various strategies such as arbitrage, market making, and statistical arbitrage, seeking opportunities to
capitalize on market inefficiencies and price discrepancies. However, proprietary trading carries a
higher degree of risk due to the full responsibility for potential losses. As a result, proprietary trading
desks are often subject to strict risk management controls and regulatory oversight, particularly in the
wake of the 2008 financial crisis.
On the other hand, agency trading focuses on providing execution services for clients, prioritizing
the best execution of client orders, and ensuring that clients’ interests are aligned with the broker’s
actions. The primary goal of agency trading is to achieve the most favorable terms for the client while
minimizing the impact of the trade on the market. Brokers engaged in agency trading earn income
through commissions and fees, rather than by taking positions in the market. Since agency traders do
not assume market risk on behalf of their clients, they are subject to different regulatory and compliance
requirements than proprietary traders.
A broker or trading agency can execute trading orders for their clients or their own agency. The
main difference between agency and proprietary trading is the trading client, that is, for whom the trade
is executed, and whose investment portfolio is changed as a result of trading. Agency trading is any
type of trade that a broker executes for their clients/investors who are charged a brokerage fee.
Proprietary trading, also known as prop trading, refers to when an agency or broker executes trades for
the benefit of its own institution. The orders submitted by traders for their own accounts/institutions are
called proprietary orders. Since most traders cannot access the markets directly, most orders are agency
orders, which a broker presents to the market.
Agency orders can be held or not held. Held orders are those for which the broker has an obligation to the client to fill the order. Market-not-held orders are institutional orders in which the trader hires a broker-dealer to work the order at its discretion. Working an order means the broker-dealer takes some time to fill it.
Understanding the differences between proprietary and agency trading is essential for market
participants to navigate the complex world of financial markets. While proprietary trading focuses on
generating profits through active market participation, agency trading emphasizes the execution of
client orders in the best possible manner, ensuring that the interests of clients are at the forefront of the
broker’s actions.
Market Order
The market order is the most common transaction type in the stock markets. It is an instruction by an
investor to a broker to buy or sell stock shares, bonds, or other assets at the best available price in the
current financial market. This means a market order instructs the broker to buy or sell a security
immediately at the current price. Since there will be plenty of willing buyers and sellers for large-cap
stocks, futures, or ETFs, market orders are best used for buying or selling these financial instruments
with high liquidity.
Since the market order is an instruction to trade a given quantity at the best price possible, the
priority of the market-order trader is to execute the order immediately with no specific price limit. Thus,
the main risk is the uncertainty of the ultimate execution price. Once submitted, a market order typically cannot be canceled because it is executed almost immediately.
Note that the electronic market orders don’t wait. Upon receipt of a market order, the exchange will
match it against the standing limit orders immediately until it is completely filled. Such immediacy
characterizes market orders compared to limit orders (introduced in the following section). This means
that when filling a market order, the order matching system will buy at the (ideally) lowest ask price or
sell at the highest bid price, thus ending up paying the bid/ask spread.
Given the nature of market orders, they are particularly suitable for situations where the primary
goal is to execute a trade quickly, rather than achieving a specific target price. This makes market orders
especially useful in fast-moving or volatile market conditions, where getting in or out of a position
promptly is crucial. However, the urgency of market orders also exposes investors to the risk of price
slippage, which occurs when the actual execution price differs from the expected price due to rapid
market fluctuations.
It is important for investors to understand that market orders offer no price protection, meaning that
the execution price may be significantly different from the current market price, especially for illiquid
or thinly traded instruments. In such cases, limit orders may be a more appropriate choice, as they allow
investors to specify a maximum purchase price or a minimum sale price for their orders, providing
some level of price control. However, limit orders come with the trade-off of potentially not being
executed if the specified price is not met.
Limit Order
A limit order, which instructs the broker to buy or sell at the best price available only if the price is no
worse than the price limit specified by the investor, is the main alternative to the market order for most
individual investors. It is preferable when buying or selling a less frequently traded or highly volatile
asset.
During regular hours, limit orders are queued according to the exchange's limit price and time of receipt. When a buy market order arrives, the limit order first in the queue selling at the lowest ask price is matched first. When a sell market order arrives, the limit order first in the queue bidding at the highest bid price is executed first. If an order is not immediately executable, it becomes a standing offer and is placed in a file called the limit order book.
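To make the matching logic concrete, here is a toy sketch (our own, not an exchange's actual engine) of price-time priority: a market buy order walks the standing ask queue, filling the earliest order at the best price first:
# toy ask queue of standing sell limit orders as (price, quantity), best price and earliest first
ask_queue = [(100.00, 50), (100.00, 30), (100.02, 100)]

def fill_market_buy(quantity, asks):
    """Match a market buy order against standing asks; return the fills and the remaining queue."""
    fills, remaining = [], []
    for price, qty in asks:
        if quantity <= 0:
            remaining.append((price, qty))
            continue
        traded = min(qty, quantity)
        fills.append((price, traded))
        quantity -= traded
        if qty > traded:
            remaining.append((price, qty - traded))
    return fills, remaining

fills, book = fill_market_buy(60, ask_queue)
print(fills)  # [(100.0, 50), (100.0, 10)] -- the earliest order at the best price fills first
print(book)   # [(100.0, 20), (100.02, 100)]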
A buy limit order is an order to purchase a financial instrument at or below a specified price,
allowing traders to control how much they would pay for the instrument. In other words, the investor is
guaranteed to pay that price or less by using a limit order to make a purchase.
Although the price is guaranteed, the order itself is not guaranteed to be filled in time. After all, a buy limit order will only be executed if the ask price is at or below the specified limit price. If the asset does not reach the specified price or moves too quickly through it, the order is not filled and will remain standing in the limit order book, causing the investor to miss out on the trading
opportunity. That is, by using a buy limit order, the investor is guaranteed to pay the buy limit order
price or better but is not guaranteed to have the order filled.
The same reasoning applies to the sell limit order, where the investor will sell the financial
instrument at or above a specified selling price. A sell limit order allows traders to set a minimum
selling price for their financial instruments. In this case, the investor is guaranteed to receive the
specified price or a better price for the sale, but there is no guarantee that the order will be executed. A
sell limit order will only be filled if the bid price is at or above the specified limit price. If the asset does
not reach the specified price or moves too quickly through the price, the order is not filled and will be
stored in the limit order book, potentially causing the investor to miss out on the trading opportunity.
Limit orders offer more control over the execution price than market orders and can be particularly
useful when trading illiquid or volatile assets, where price slippage is more likely. However, they also
come with the risk that the order may not be executed if the specified price is not reached, potentially
resulting in missed trading opportunities.
To maximize the chances of a limit order being executed, traders should carefully monitor market
conditions and adjust their limit prices accordingly. They may also consider using other advanced order
types, such as stop-limit orders or trailing stop-limit orders, which combine the features of limit orders
with additional conditions, providing even greater control over the execution price and risk
management.
Figure 2-1 Illustrating the limit order book that consolidates all standing limit orders (prices and quantities) from the buy side and the
sell side. A market maker is incentivized to reduce the gap by providing more liquidity to the market, serving as the liquidity provider, and
making the trades of this asset more executable
We can also look at the marketability of buy and sell orders at different ranges. As shown in Figure
2-2, we divide the limit order book into five different regions: above the best offer, at the best offer,
between the best bid and best offer, at the best bid, and below the best bid. A buy order is (easily) marketable if its price falls in region 1 or 2, since those eager to sell the asset (at the bottom part of the top box) would love to see a buyer bidding at or above their offer. We call the buy order in the market if it lives within region 3, a situation in flux. Region 4 is borderline and is called at
the market, representing the best bid of all the buyers in the limit order book. When the price of the buy
order drops to region 5, there is no marginal competitiveness, and the order will simply be buried
among the rest of the buy orders, leaving it behind the market. The same reasoning applies to the
marketability of sell orders as well.
Figure 2-2 Analyzing the marketability of buy and sell orders within different regions of the limit order book
It is important for traders and investors to understand the marketability of buy and sell orders in
these different regions so as to optimize their order execution strategies. By strategically placing orders
in the appropriate regions, traders can increase the likelihood of their orders being executed at the
desired price levels, thus minimizing transaction costs and better managing trading risks. Furthermore,
by monitoring the market dynamics and the depth of the limit order book (the number of levels of buy
and sell limit orders available in the order book at a given point in time), traders can gain valuable
insights into the market dynamics of the asset.
Stop Order
By default, a stop order is a market order conditioned on a preset stop price. A stop order becomes a
market order as soon as the current market price reaches or crosses the preset stop price.
A stop order is always executed in the direction that the asset price is moving, assuming that such
movement will continue in its original direction. For instance, if the market for a particular asset is
moving downward, the stop order will be to sell at a preset price below the current market price. This is
called a stop-loss order, which is placed to limit potential losses when the investor is in an open position
of the asset. The stop-loss order will take the investor out of the open position at a preset level if the
market moves against the existing position.
Stop-loss orders are essential, especially when one cannot actively keep an eye on the market. It’s
thus recommended to always have a stop-loss order in place for any existing position in order to gain
protection from a sudden drop in price due to adverse market news. We can also call it a sell-stop order,
which is always placed below the current market price and is typically used to limit a loss or protect a
profit on a long stock position.
Alternatively, if the price is moving upward, the stop order will be to buy once the security reaches
a preset price above the current market price. This is called a stop-entry order, or buy-stop order, which
can be used to enter the market in the direction the market is moving. A buy-stop order is always placed
above the current market price.
Therefore, before entering a position, we can use a stop-entry (buy-stop) order to long an asset if the
market price exceeds the preset stop price, and use a sell-stop order to short an asset if the market price
drops below the preset stop price. If we are already in a long (or short) position, we can use a sell-stop
(or buy-stop) order to limit the loss of the position in case the market price drops (or rises).
Also, note that stop orders can be subject to slippage, that is, the difference between the expected
execution price and the actual execution price. Since stop orders are triggered and converted into
market orders once the preset stop price is reached, there is a possibility that the order may be executed
at a worse price than initially anticipated, especially in fast-moving or illiquid markets. As a result,
slippage can lead to a larger loss or a smaller profit than originally expected.
Let us look at one example. Say you observe that a particular stock has been moving in a sideways
range (a fairly stable range without forming any distinct trends over some period of time) between $20
and $30, and you believe it will ultimately break out of the upper limit and move higher. You would like to
employ breakout trading, which means you will take a position within the early stage of an upward-
moving trend. In this case, you could place a stop-entry order above the current upper limit of $30. The
price of the stop-entry order can be set as $30.25 to allow for a margin of error. Placing the stop-entry
order gets you into the market once the sideways range is broken to the upside. Also, now that you’re
long in the position, if you’re a disciplined trader, you’ll want to immediately establish a regular stop-
loss sell order to limit your losses in case the upward trend is false.
When placing a stop order, we have (unknowingly) entered into the world of algorithmic trading.
Here, the logic of algorithmic trading is simple: if the market price reaches or crosses the stop price,
issue a market order; else, keep checking the market price.
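As a sketch of this logic, the following snippet simulates the price check over a list of observed prices (the values are made up, and in practice the check would run against a live quote feed and an order-routing call):
# sell-stop order: issue a market sell order once the stop price is reached or crossed
def sell_stop_triggered(prices, stop_price):
    """Return the first observed price at or below the stop price, else None."""
    for price in prices:
        if price <= stop_price:   # stop reached or crossed: a market sell order would be issued here
            return price
    return None

observed = [101.2, 100.7, 99.9, 99.5]
print(sell_stop_triggered(observed, stop_price=100.0))  # 99.9 triggers the market sell order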
Stop-Limit Order
A stop-limit order is similar to a stop order in that a stop price will activate the order. However, unlike
the stop order, which is submitted as a market order when elected, the stop-limit order is submitted as a
limit order. A stop-limit order combines the features of a stop order and a limit order, providing more
control over the execution price while still allowing for the possibility of protecting against significant
losses or locking in profits. Specifically, when the market price reaches the preset stop price, the stop-
limit order becomes a limit order that will be executed at the specified limit price or better. This ensures
that the order will not be executed at a price worse than the limit price, thus mitigating the risk
associated with market orders.
A stop-limit order is a conditional trade that combines the features of a stop order with those of a
limit order and is used to mitigate risk. So a stop-limit order is a limit order contingent on a preset stop
price and a limit price. A stop-limit order eliminates the price risk associated with a stop order where
the execution price cannot be guaranteed. However, it exposes the investor to the risk that the order may
never fill even if the stop price is reached. A stop-limit order gives traders precise control over when the
order should be filled, but the order is not guaranteed to be executed. Traders often use stop-limit orders
to lock in profits or limit downside losses, although they could “miss the market” altogether, resulting in
missed opportunities if the asset’s price moves in the desired direction but doesn’t satisfy the limit price
condition.
In summary, stop-limit orders offer a balance between limiting the execution price and stopping
potential loss due to significant adverse market movements. However, they come with the risk of not
being executed if the limit price is not met, potentially causing traders to miss out on potential profits or
fail to limit their losses effectively.
Let us look at an example algorithm behind the stop-limit order. Suppose research shows that the
slippage is usually three ticks. For a buy-stop-limit order, if the market price reaches or crosses the stop price, the system issues a limit order with a limit price three ticks above the stop price; otherwise, it keeps checking the market price. For a sell-stop-limit order, if the market price reaches or crosses the stop price, the system issues a limit order with a limit price three ticks below the stop price; otherwise, it keeps checking the market price.
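A sketch of the buy-stop-limit rule above, assuming a tick size of $0.01 and simulating the price check over a list of observed prices (illustrative values):
# buy-stop-limit: once the stop is triggered, issue a limit order three ticks above the stop price
TICK = 0.01

def buy_stop_limit_price(prices, stop_price, ticks=3):
    """Return the limit price to submit once the stop is triggered, else None."""
    for price in prices:
        if price >= stop_price:                  # stop reached or crossed
            return stop_price + ticks * TICK     # limit price allows three ticks of slippage
    return None

observed = [29.80, 29.95, 30.05]
print(buy_stop_limit_price(observed, stop_price=30.00))  # 30.03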
Pegged Order
A pegged order is a type of order that allows the limit price to be dynamic, adjusting automatically
based on a reference price. This can be particularly useful in spread trading or other trading strategies
that require staying in sync with the market’s best bid, best offer, or mid-price.
The price in a limit order is fixed and static; we can only issue a new order to have a new limit
price. However, there are situations when we would like the limit price to be dynamic. For example,
suppose a trading strategy must trade at an offset of the best bid or best ask. But these two quotes
fluctuate, and you want your limit order prices to change in sync with them. Pegged orders allow you to
do just that.
Placing a pegged order requires specifying the reference price to track, along with an optional
differential offset. The differential offset can be a positive or negative multiple of the tick size that
represents the minimum price movement for the particular asset. The trading system will then manage
the pegged order by automatically modifying its price on the order book as the reference price moves,
maintaining the desired price relationship.
A pegged order is a limit order with a dynamic limit price. It allows traders to keep their orders in
line with the changing market conditions without having to monitor and adjust their orders manually
and constantly. This can be particularly beneficial in fast-moving markets or when trading strategies
require maintaining specific price relationships with the best bid, best offer, or mid-price. However, it’s
essential to understand that pegged orders still carry the risk of not being executed if the market moves
unfavorably, and the dynamic limit price never reaches a level at which the order can be filled.
The pegged order is often used in spread trading, which involves the simultaneous buying and
selling of related securities as a unit, designed to profit from a change in the spread (price difference)
between the two securities. Here, spread trading is a strategy that takes advantage of the price
difference, or spread, between two related securities. In this strategy, a trader simultaneously buys one
security and sells another security to profit from changes in the spread between the two. The objective
is to capitalize on the temporary mispricing or changing price relationship between the securities rather
than betting on the direction of the individual securities themselves.
So how does a pegged order work? When entering a pegged order, you must specify a reference price you wish to track, which could be the best bid, best offer, or mid-price. Best bid and best offer
pegs may track at a differential offset, which is specified as a multiple of the whole tick size. This
means that the trading system will manage the pegged order by automatically modifying the pegged
order’s price on the order book as the reference price moves.
Let us look at an example of a pegged order. Suppose your strategy requires a buy limit order to be filled at three ticks lower than the current best bid and a sell limit order to be filled at two ticks higher than the current best offer. When the bid price changes, the pegged order becomes a composite order comprising
A cancelation order of total order size (one buy limit order and one sell limit order)
A new buy limit order with a limit price pegged at the new best bid less an offset of three ticks, and a
new sell limit order with a limit price pegged at the new best ask plus an offset of two ticks
Let’s say the current best bid is $100, and the best offer is $101. According to this strategy, we will
place a buy limit order at $100 – (3 ticks) and a sell limit order at $101 + (2 ticks). Assuming each tick
is $0.01, the buy limit order will be placed at $99.97, and the sell limit order will be placed at $101.02.
Now, if the best bid changes to $100.50 and the best offer changes to $101.50, the pegged orders
will automatically adjust to the new reference prices. Specifically, the buy limit order will now be
placed at $100.50 – (3 ticks) = $100.47, and the sell limit order will be placed at $101.50 + (2 ticks) =
$101.52.
The pseudocode for the algorithm behind a pegged buy order with an offset of x is as follows:
1. If the bid price increases to B+:
    a. Cancel the current limit order.
    b. Submit a buy limit order at a price of B+ − x.
2. Else, if the bid price decreases to B−:
    a. If the current limit order is not filled:
        i. Cancel the current limit order.
        ii. Submit a buy limit order at a price of B− − x.
    b. Else, keep checking whether the bid price has changed.
When the bid price changes, the algorithm checks if the change is an increase or a decrease. If the
bid price increases, the current limit order is canceled, and a new buy limit order is submitted at the new
bid price minus the offset x. If the bid price decreases, the algorithm first checks if the current limit
order has been filled or not. If the current limit order is not filled, the order is canceled, and a new buy
limit order is submitted at the new bid price minus the offset x. If the order is filled, no further action is
needed. The algorithm will continue monitoring the bid price for changes and adjust the buy limit order
accordingly.
Pay attention to the inner if condition in the else branch. Here, we check whether the current limit order has been filled: since the bid price has dropped, the standing buy limit order may already have been executed if the market price fell to its limit price.
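A sketch of this pegged-buy logic in Python, replaying a sequence of best-bid updates (the bookkeeping of cancellations and fills is simplified, and the quote values are made up):
# pegged buy order: keep the limit price at (best bid - offset) as the best bid moves
TICK = 0.01

def replay_pegged_buy(bid_updates, offset_ticks=3):
    """Return the sequence of (re)submitted limit prices as the best bid changes."""
    submitted = []
    last_bid = None
    for bid in bid_updates:
        if bid != last_bid:   # bid moved up or down: cancel and resubmit at the new pegged price
            submitted.append(round(bid - offset_ticks * TICK, 2))
            last_bid = bid
    return submitted

print(replay_pegged_buy([100.00, 100.50, 100.40]))  # [99.97, 100.47, 100.37]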
We can similarly write out the pseudocode for the algorithm behind a pegged sell order with an offset of x as follows:
1. If the ask price decreases to A−:
    a. Cancel the current limit order.
    b. Submit a sell limit order at a price of A− + x.
2. Else, if the ask price increases to A+:
    a. If the current limit order is not filled:
        i. Cancel the current limit order.
        ii. Submit a sell limit order at a price of A+ + x.
    b. Else, keep checking whether the ask price has changed.
Price Impact
It is important to note the potential price impact of large market orders, which tend to move prices. The reason is the lack of sufficient liquidity for a large order to fill entirely at the best price. Once the quantity available at the best price level is exhausted, the remainder of the order fills at progressively worse prices, a phenomenon known as price slippage: the actual execution price differs from the expected price due to insufficient liquidity.
For example, suppose that a 10K-share market buy order arrives, and the best offer is $100 for 5K
shares. Half the order will fill at $100, but the next 5K shares will have to fill at the next price in the
book, say at $100.02 (where we assume there are also 5K shares offered). The volume-weighted
average price for the order will be $100.01, which is larger than $100.00. Thus, the price might move
further following the trade.
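The arithmetic of this example can be reproduced directly:
# volume-weighted average price of the two fills in the example above
fills = [(100.00, 5_000), (100.02, 5_000)]
vwap = sum(p * q for p, q in fills) / sum(q for _, q in fills)
print(round(vwap, 2))  # 100.01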
To mitigate the impact of large market orders on prices, traders can consider using alternative order
types or strategies, such as using limit orders to control the price at which their orders get executed or
iceberg orders that divide large orders into smaller parts, thus reducing the visibility of the order’s total
size.
Order Flow
In trading, order flow is an important concept. It is the overall trade direction at any given period of
time. Ex post, order flow can be inferred from the trade direction. For example, a trade is said to be
buyer initiated if the trade took place at the ask price or higher. In this case, the buyer is willing to
absorb the bid/ask spread and pay a higher price. The trade sign is +1.
Conversely, a trade is seller initiated if the trade occurred at the bid price or lower. In this case, the
seller is willing to absorb the bid/ask spread and sell for a low price. The trade sign is –1.
In essence, order flow suggests the net direction of the market. When there are more buy (sell) market orders (MO) than sell (buy) MOs, the market direction is typically up (down). Many papers in the literature provide ample evidence for this intuitive observation, and it is also well known among traders. By analyzing the order flow, traders can identify buying and selling pressure and
anticipate potential price movements. The concept of order flow is based on the premise that the net
direction of market orders can provide insights into market trends and potential price changes.
A positive net order flow, where there are more buy market orders than sell market orders, generally
indicates a bullish market with upward price movement. Conversely, a negative net order flow, where
there are more sell market orders than buy market orders, signals a bearish market with a downward
price movement. This correlation between order flow and market direction is well documented in
academic literature and widely recognized by traders.
So how do we measure the direction of market order flows? One way is to use the net trade sign: the
total number of buyer-initiated trades less the total number of seller-initiated trades. We can also use the
net trade volume sign: the aggregate size of buyer-initiated trades less the aggregate size of seller-
initiated trades.
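Both measures are simple sums over signed trades; a minimal sketch with made-up trade records of the form (sign, size) could read:
# +1 marks a buyer-initiated trade, -1 a seller-initiated trade (illustrative records)
trades = [(+1, 200), (-1, 100), (+1, 300)]

net_trade_sign = sum(sign for sign, _ in trades)               # 1: one more buy than sell
net_signed_volume = sum(sign * size for sign, size in trades)  # 400: buy volume exceeds sell volume
print(net_trade_sign, net_signed_volume)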
Accordingly, if we can forecast the direction of order flow ex ante, we can anticipate the future trade direction. In other words, a positive order flow suggests the market is likely to go up,
while a negative order flow suggests the market is likely to go down.
Therefore, we can use some models to forecast the order flow on the fly. A simple model is to
generate a trading signal if the forecasted order flow for the next period exceeds some threshold. This
threshold can be determined via backtesting (to be covered in a later chapter).
In the following section, we will look at a sample limit order book data and develop familiarity with
both concepts and implementation.
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
df = np.loadtxt('data/Train_Dst_NoAuction_DecPre_CF_7.txt')
Listing 2-1 Loading the LOB dataset
We can access the dimensions of the sample dataset via the shape attribute:
>>> df.shape
(149, 254750)
In this dataset, the rows indicate features such as asset price and volume, and the columns indicate
timestamps. Typically, we would use the rows to indicate observation-level data per timestamp and use
the columns to represent features or attributes. We would need to transpose the dataset.
Also, based on the documentation on the dataset, the first 40 rows carry 10 levels of bid and ask
from the order book, along with the volume of each particular price point. We have a total of 40 entries
per timestamp since each side (buy and sell) contains 10 price levels, and each level includes two
points: price and volume. In other words, the limit order book in a single time snapshot shows up as an
array of 40 elements.
The following code prints out price-volume data of ten price levels for the sell and the buy sides at
the first timestamp:
>>> df[:40,0]
array([0.2615 , 0.00353, 0.2606 , 0.00326, 0.2618 , 0.002 , 0.2604 ,
0.00682, 0.2619 , 0.00164, 0.2602 , 0.00786, 0.262 , 0.00532,
0.26 , 0.00893, 0.2621 , 0.00151, 0.2599 , 0.00159, 0.2623 ,
0.00837, 0.2595 , 0.001 , 0.2625 , 0.0015 , 0.2593 , 0.00143,
0.2626 , 0.00787, 0.2591 , 0.00134, 0.2629 , 0.00146, 0.2588 ,
0.00123, 0.2633 , 0.00311, 0.2579 , 0.00128])
Since each level consists of a price-volume pair for both sides (buy and sell), we know that among the first four entries, 0.2615 is the ask price, 0.00353 is the volume at that ask price level, 0.2606 is the bid price, and 0.00326 is the volume at that bid price level. Every two entries constitute a price-volume pair, and every price level corresponds to two consecutive pairs. We have a total of 10 price levels, corresponding to 20 price-volume pairs, 10 for the bid side and 10 for the ask side. Also, we know that price levels on the sell side should always be higher than on the buy side, and a quick check verifies this.
Let us extract the price-volume pairs across all timestamps. Remember to transpose the dataset,
which is achieved by accessing the .T attribute. The final result is then converted into a Pandas
DataFrame format for better processing later. Remember to print a few rows of the transformed dataset
in df2 for a sanity check:
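The extraction code itself is not reproduced in this excerpt. A minimal sketch consistent with the shape reported later, (254750, 40), could look as follows (df2 is the name the text refers to):
df2 = pd.DataFrame(df[:40, :].T)  # first 40 rows hold the 10-level order book; transpose so rows = timestamps
df2.head()                        # sanity check: print a few rows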
labels = ['up', 'stationary', 'down']  # assumed class names for mid-price movement labels 1, 2, 3
def printdistribution(dataset):
    fig = make_subplots(rows=1, cols=5,
                        subplot_titles=("k=10", "k=20", "k=30", "k=50", "k=100"))
    fig.add_trace(
        go.Histogram(x=dataset[144, :], histnorm='percent'),
        row=1, col=1
    )
    fig.add_trace(
        go.Histogram(x=dataset[145, :], histnorm='percent'),
        row=1, col=2
    )
    fig.add_trace(
        go.Histogram(x=dataset[146, :], histnorm='percent'),
        row=1, col=3
    )
    fig.add_trace(
        go.Histogram(x=dataset[147, :], histnorm='percent'),
        row=1, col=4
    )
    fig.add_trace(
        go.Histogram(x=dataset[148, :], histnorm='percent'),
        row=1, col=5
    )
    fig.update_layout(
        title="Label distribution of mid-point movement",
        width=700,
        height=300,
        showlegend=False
    )
    fig.update_xaxes(ticktext=labels, tickvals=[1, 2, 3], tickangle=-45)
    fig.update_yaxes(visible=False, showticklabels=False)
    fig.layout.yaxis.title.text = 'percent'
    fig.show()
>>> printdistribution(df)
Listing 2-2 Plotting the label distribution of the mid-point movement
Running the code generates Figure 2-3. The plot suggests that upward and downward movements become increasingly prevalent as the lookahead window gets larger.
Figure 2-3 Histogram of three types of movement across different lookahead windows in the limit order book
>>> df2.shape
(254750, 40)
Now we would like to dissect this DataFrame and allocate each component to a separate
DataFrame. In Listing 2-3, we subset the DataFrame based on the sequence of columns for each
component, resulting in four DataFrames: dfAskPrices, dfAskVolumes, dfBidPrices, and
dfBidVolumes. Subsetting the DataFrame is completed by calling the loc() function and
supplying the corresponding row and column indexes.
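Listing 2-3 itself is not reproduced here. Based on the column layout described above (price and volume alternating between the ask and bid sides at each level), a minimal sketch of the subsetting could be:
dfAskPrices  = df2.loc[:, list(range(0, 40, 4))]  # columns 0, 4, ..., 36
dfAskVolumes = df2.loc[:, list(range(1, 40, 4))]  # columns 1, 5, ..., 37
dfBidPrices  = df2.loc[:, list(range(2, 40, 4))]  # columns 2, 6, ..., 38
dfBidVolumes = df2.loc[:, list(range(3, 40, 4))]  # columns 3, 7, ..., 39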
One thing to note is that the ask and bid prices do not follow the same sequence order. Printing out
the first row of dfAskPrices and dfBidPrices helps us verify this:
>>> dfAskPrices.loc[0,:]
0 0.2615
4 0.2618
8 0.2619
12 0.2620
16 0.2621
20 0.2623
24 0.2625
28 0.2626
32 0.2629
36 0.2633
Name: 0, dtype: float64
>>> dfBidPrices.loc[0,:]
2 0.2606
6 0.2604
10 0.2602
14 0.2600
18 0.2599
22 0.2595
26 0.2593
30 0.2591
34 0.2588
38 0.2579
Name: 0, dtype: float64
The results show that the ask prices follow an increasing sequence, while the bid prices follow a
decreasing sequence. Since we often work with price data that follow an increasing sequence in
analyses such as plotting, we need to reverse the order of the bid prices. The order could be reversed by
rearranging the sequence of columns in the DataFrame. The current sequence of the columns is
>>> dfBidPrices.columns
Int64Index([2, 6, 10, 14, 18, 22, 26, 30, 34, 38], dtype='int64')
>>> dfBidPrices.columns[::-1]
Int64Index([38, 34, 30, 26, 22, 18, 14, 10, 6, 2], dtype='int64')
Now let us reverse both bid prices and volumes by passing the reversed column names to the respective DataFrames via column selection:
dfBidPrices = dfBidPrices[dfBidPrices.columns[::-1]]
dfBidVolumes = dfBidVolumes[dfBidVolumes.columns[::-1]]
Examining the first row of dfBidPrices shows an increasing price trend now:
>>> dfBidPrices.loc[0,:]
38 0.2579
34 0.2588
30 0.2591
26 0.2593
22 0.2595
18 0.2599
14 0.2600
10 0.2602
6 0.2604
2 0.2606
Name: 0, dtype: float64
Note that the index for each entry still stays the same. We may need to reset the index depending on
the specific follow-up process.
Since the price increases from the bottom (buy side) to the top (sell side) in a limit order book, we
can join the price tables from both sides to show the continuum. There are multiple ways to join two
tables, and we choose outer join to avoid missing any entry. Listing 2-4 joins the price and volume
tables from both sides, followed by renaming the columns.
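Listing 2-4 is not reproduced in this excerpt. A minimal sketch consistent with the column labels (1 to 20) printed below could be:
dfPrices = dfBidPrices.join(dfAskPrices, how='outer')
dfPrices.columns = range(1, 21)        # 1-10: buy side, 11-20: sell side
dfVolumnes = dfBidVolumes.join(dfAskVolumes, how='outer')
dfVolumnes.columns = range(1, 21)      # keeping the book's variable name dfVolumnes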
We can print out the first row of dfPrices to check the prices across all levels at the first
timestamp:
>>> dfPrices.loc[0,:]
1 0.2579
2 0.2588
3 0.2591
4 0.2593
5 0.2595
6 0.2599
7 0.2600
8 0.2602
9 0.2604
10 0.2606
11 0.2615
12 0.2618
13 0.2619
14 0.2620
15 0.2621
16 0.2623
17 0.2625
18 0.2626
19 0.2629
20 0.2633
Name: 0, dtype: float64
The result shows that all prices are in increasing order. Since the first ten columns show the buy-
side prices and the last ten columns belong to the sell-side prices, the best bid price would be the
highest price at the buy side, that is, 0.2606, while the best ask price (best offer) would be the lowest
price at the sell side, that is, 0.2615. The difference between the two price points gives us the bid/ask
spread for the current snapshot, and its movement across different snapshots indicates market dynamics.
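As a quick illustration (a sketch based on the column labels above, not code from the book), the best bid, best ask, and spread at each snapshot can be read directly from dfPrices:
best_bid = dfPrices[10]        # highest buy-side price
best_ask = dfPrices[11]        # lowest sell-side price
spread = best_ask - best_bid
print(spread.head())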
We can plot these prices as time series, where each price curve represents the evolution of the price at a particular level on either the buy or the sell side. As a matter of fact, these curves should not intersect with each other; if a bid and an ask price ever met, the orders would be matched and removed from that price level.
Listing 2-5 plots the 20 price curves for the first 50 timestamps.
fig = go.Figure()
for i in dfPrices.columns:
    fig.add_trace(go.Scatter(y=dfPrices[:50][i]))
fig.update_layout(
    title='10 price levels of each side of the orderbook',
    xaxis_title="Time snapshot index",
    yaxis_title="Price levels",
    height=500,
    showlegend=False,
)
>>> fig.show()
Listing 2-5 Visualizing sample price curves
Running the code generates Figure 2-4. Note the big gap in the middle; this is the bid/ask spread of
the limit order book. The figure also tells us something about market dynamics. For example, at time
step 20, we observe a sudden jump in ask prices, which may be caused by a certain event in the market,
causing the sellers to raise the prices as a whole.
Figure 2-4 Visualizing the 10 price curves for both sides for the first 50 time snapshots. Each curve represents the price evolution at a
particular price level and will not intersect with each other. The big gap in the middle presents the bid/ask spread of the limit order book
Note that the graph is interactive, offering the usual set of flexible controls (such as zooming,
highlighting via selection, and additional data upon hovering) based on the plotly library.
We can also plot the volume data as stacked bar charts. The following code snippet retrieves the
first 5 snapshots of volume data and plots the 20 levels of volumes as stacked bars:
px.bar(dfVolumnes.head(5).transpose(), orientation='h')
Figure 2-5 Plotting the first 5 snapshots of volume as bar charts across all 20 price levels
Let us plot the volume at each price level for a particular time snapshot. We can use the iloc()
function to access a particular portion based on the positional index. For example, the following code
prints out the first row of dfPrices:
>>> dfPrices.iloc[0]
1 0.2579
2 0.2588
3 0.2591
4 0.2593
5 0.2595
6 0.2599
7 0.2600
8 0.2602
9 0.2604
10 0.2606
11 0.2615
12 0.2618
13 0.2619
14 0.2620
15 0.2621
16 0.2623
17 0.2625
18 0.2626
19 0.2629
20 0.2633
Name: 0, dtype: float64
We can plot the volume data of a particular timestamp as bars. As shown in Listing 2-6, we use list
comprehension to format the prices to four decimal places before passing them to the y argument in the
go.Bar() function.
colors = ['lightslategrey'] * 10
colors = colors + ['crimson'] * 10
fig = go.Figure()
timestamp = 0
fig.add_trace(go.Bar(
    y=['price-' + '{:.4f}'.format(x) for x in dfPrices.iloc[timestamp].tolist()],
    x=dfVolumnes.iloc[timestamp].tolist(),
    orientation='h',
    marker_color=colors
))
fig.update_layout(
    title='Volume of 10 price levels of each side of the orderbook',
    xaxis_title="Volume",
    yaxis_title="Price levels",
    # template='plotly_dark'
)
fig.show()
Listing 2-6 Visualizing the volume data
Figure 2-6 Volume data of 20 price levels (10 for the sell side and 10 for the buy side) for a particular snapshot in time
We can also combine the previous two charts together, as shown in Listing 2-7.
fig = make_subplots(rows=1, cols=2)  # assumed: the two charts are combined side by side (figure creation not shown in the excerpt)
for i in dfPrices.columns:
    fig.add_trace(go.Scatter(y=dfPrices.head(20)[i]), row=1, col=1)
timestamp = 0
fig.add_trace(go.Bar(
    y=['price-' + '{:.4f}'.format(x) for x in dfPrices.iloc[timestamp].tolist()],
    x=dfVolumnes.iloc[timestamp].tolist(),
    orientation='h',
    marker_color=colors
), row=1, col=2)
fig.update_layout(
    title='10 price levels of each side of the orderbook for multiple time points, bar size represents volume',
    xaxis_title="Time snapshot",
    yaxis_title="Price levels",
    template='plotly_dark'
)
fig.show()
Listing 2-7 Combining multiple charts together
widthOfTime = 100
priceLevel = 1
fig = go.Figure(
    data=[go.Scatter(x=dfPrices.index[:widthOfTime].tolist(),
                     y=dfPrices[:widthOfTime][priceLevel].tolist(),
                     name="frame",
                     mode="lines",
                     line=dict(width=2, color="blue")),
          ],
    layout=go.Layout(width=1000, height=400,
                     # xaxis=dict(range=[0, 100], autorange=False, zeroline=False),
                     # yaxis=dict(range=[0, 1], autorange=False, zeroline=False),
                     title="10 price levels of each side of the orderbook",
                     xaxis_title="Time snapshot index",
                     yaxis_title="Price levels",
                     template='plotly_dark',
                     hovermode="closest",
                     updatemenus=[dict(type="buttons",
                                       showactive=True,
                                       x=0.01,
                                       xanchor="left",
                                       y=1.15,
                                       yanchor="top",
                                       font={"color": 'blue'},
                                       buttons=[dict(label="Play",
                                                     method="animate",
                                                     args=[None])])]),
    frames=[go.Frame(
        data=[go.Scatter(
            x=dfPrices.iloc[k:k + widthOfTime].index.tolist(),
            y=dfPrices.iloc[k:k + widthOfTime][priceLevel].tolist(),
            mode="lines",
            line=dict(color="blue", width=2))
        ]) for k in range(widthOfTime, 1000)]
)
fig.show()
Listing 2-8 Animating the price movement
Running the code generates Figure 2-8. We can click the Play button to start animating the line
chart, which will change shape as we move forward.
Figure 2-8 Animating the price changes of a selected price level via a rolling window of 100 timestamps
In addition, we can also animate the change in volume across all the price levels, as shown in Listing 2-9. The change in volume also reflects market dynamics in terms of supply and demand, although less directly than the price itself.
timeStampStart = 100
fig = go.Figure(
    data=[go.Bar(y=['price-' + '{:.4f}'.format(x) for x in
                    dfPrices[:timeStampStart].values[0].tolist()],
                 x=dfVolumnes[:timeStampStart].values[0].tolist(),
                 orientation='h',
                 name="priceBar",
                 marker_color=colors),
          ],
    layout=go.Layout(width=800, height=450,
                     title="Volume of 10 buy, sell price levels of an orderbook",
                     xaxis_title="Volume",
                     yaxis_title="Price levels",
                     template='plotly_dark',
                     hovermode="closest",
                     updatemenus=[dict(type="buttons",
                                       showactive=True,
                                       x=0.01,
                                       xanchor="left",
                                       y=1.15,
                                       yanchor="top",
                                       font={"color": 'blue'},
                                       buttons=[dict(label="Play",
                                                     method="animate",
                                                     args=[None])])]),
    frames=[go.Frame(
        data=[go.Bar(y=['price-' + '{:.4f}'.format(x) for x in
                        dfPrices.iloc[k].values.tolist()],
                     x=dfVolumnes.iloc[k].values.tolist(),
                     orientation='h',
                     marker_color=colors)],
        layout=go.Layout(width=800, height=450,
                         title="Volume of 10 buy, sell price levels of an orderbook [Snapshot=" + str(k) + "]",
                         xaxis_title="Volume",
                         yaxis_title="Price levels",
                         template='plotly_dark',
                         hovermode="closest")) for k in range(timeStampStart, 500)]
)
fig.show()
Listing 2-9 Animating the volume movement
Figure 2-9 Visualizing the change in the volume across all the price levels
Summary
In this chapter, we covered the basics of the electronic market and the different types of electronic
orders, including market order, stop order, limit order, and other forms of dynamic order (e.g., pegging,
trailing stop, market if touched, limit, and cancelation). We discussed the mechanism of the order
matching system and order flow.
In the second section, we looked at real LOB data and discussed different ways to visualize the
price and volume data, such as their movement across time. Working with the actual data by first
plotting them out and performing some initial analysis is a common and important first step in the
whole pipeline of devising and implementing trading strategies.
Exercises
Write a function in Python to illustrate the algorithm of a pegged buy order and sell order. (Hint: Start
by defining your own input and output.)
What’s the difference between the market if touched order (MIT) and the stop order?
How do you calculate the mid-price in a limit order book? Implement the logic in code. (Hint: Start by
defining your own input and output.)
Describe how a buy trailing stop order works.
Should the trailing stop-loss order be placed above or below the current market price for an investor
in a long position? A short position?
Obligations at Maturity
There are two types of settlement upon expiration of a futures (or options)
contract: physical delivery and cash settlement.
The first type is the physical delivery of the underlying asset. A
deliverable futures contract stipulates that the buyer in the long position of the
futures contract will pay the agreed-upon price to the seller, who in turn will
deliver the underlying asset to the buyer on the predetermined date (settlement
date of the futures contract). This process is called delivery, where the actual
underlying asset needs to be delivered upon the specified delivery date, rather
than being traded out with offsetting contracts.
For example, a buyer enters a one-year crude oil futures contract with an
opposing seller at a price of $60. We know that one futures contract
corresponds to 1000 barrels of crude oil. This means the buyer is obligated to
purchase 1000 barrels of crude oil from the seller, regardless of the
commodity's spot price on the settlement date. If the spot price of crude oil
on the agreed settlement date one year later drops to $58, the long contract
holder loses a total of ($60 − $58) × 1,000 = $2,000, and the short position
holder gains $2,000. Conversely, if the spot price rises to $65 per barrel, the
long position holder gains ($65 − $60) × 1,000 = $5,000, and the short
position holder loses $5,000.
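As a quick numerical check (a sketch, not code from the book), the long side's profit or loss on such a contract can be computed as follows:
def futures_pnl_long(entry_price, settle_price, contract_size=1000):
    # P&L for the long position; the short position earns the negative of this amount
    return (settle_price - entry_price) * contract_size

print(futures_pnl_long(60, 58))   # -2000: long loses, short gains $2,000
print(futures_pnl_long(60, 65))   #  5000: long gains, short loses $5,000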
The second type is cash settlement. When a futures contract is cash-
settled, the net cash position of the contract on the expiry date is transferred
between the buyer and the seller. It permits the buyer and seller to pay the net
cash value of the position on the delivery date.
Take the previous case, for example. When the spot price of the crude oil
drops to $58, the long position holder will lose $2000, which happens by
debiting $2000 from the buyer’s account and crediting this amount to the
seller’s account. On the other hand, when the spot price rises to $65, the
account of the long position holder will be credited $5000, which comes from
debiting the account of the short position holder.
It is important to understand that the majority of futures contracts are not
held until maturity, and most participants in the futures market do not actually
take or make delivery of the underlying asset. Instead, they are traded out
before the settlement date. Traders and investors often choose to close their
positions before the contract’s expiration date to avoid the obligations
associated with physical delivery or cash settlement. This can be achieved by
entering into an offsetting transaction that effectively cancels out the original
position. For example, a trader with a long position in a futures contract can
sell an identical contract to offset the position, while a trader with a short
position can buy an identical contract to close the position.
The process of closing out a futures position before maturity is a common
practice in the market, as it allows participants to lock in gains or limit losses
without having to deal with the actual delivery or cash settlement of the
underlying asset. This flexibility is one of the key features of futures trading,
as it enables market participants to manage their risk exposure and capitalize
on market opportunities efficiently.
In conclusion, while futures contracts carry obligations at maturity in the
form of physical delivery or cash settlement, most participants in the futures
market choose to close their positions before the expiration date. By engaging
in offsetting transactions, traders and investors can effectively manage their
risk exposure and profit from price movements in the underlying asset without
having to deal with the logistics of taking or making the delivery.
Clearing House
Farmers who sell futures contracts do not sell directly to the buyers. Rather,
they sell to the clearing house of the futures exchange. As a designated
intermediary between a buyer and seller in the financial market, the clearing
house validates and finalizes each transaction, ensuring that both the buyer
and the seller honor their contractual obligations. The clearing house thus
guarantees that all of the traders in the futures market will honor their
obligations, thus avoiding potential counterparty risk.
The clearing house serves this role by adopting the buyer’s position to
every seller and the seller’s position to every buyer. Every trader in the futures
market has obligations only to the clearing house. The clearing house takes no
active position in the market, but interposes itself between all parties to every
transaction. As the middleman, the clearing house provides the security and
efficiency integral to financial market stability. So as far as the farmers are
concerned, they can sell their goods to the clearing house at the price of the
futures contract when the contract expires.
The clearing house will then match and confirm the details of the trades
executed on the exchange, including the contract size, price, and expiration
date, ensuring that all parties have accurate and consistent information. Order
matching and confirmation is thus one of the main roles of a clearing house.
The clearing house of the futures market also has a margin requirement,
which is a sum of the deposit that serves as the minimum maintenance margin
for the (clearing) member of the exchange. All members of an exchange are
required to clear their trades through the clearing house at the end of each
trading session and satisfy the margin requirement to cover the corresponding
minimum balance requirement. Otherwise, the member will receive a margin
call to top up the remaining balance when the margin account runs low due to
fluctuation in asset price. Clearing houses thus collect and monitor margin
requirements from their members, ensuring that all participants have sufficient
collateral to cover potential losses. This helps to maintain the financial
stability of the market and reduces the likelihood of default.
Figure 3-1 illustrates the clearing house as a middle party between the
buyer and the seller.
Figure 3-1 Illustrating the role of the clearing house as an intermediary between buyers and sellers in a
futures market
Mark-to-Market
Mark-to-market involves updating the price of a futures contract to reflect its
current market value rather than the book value, so as to ensure that margin
requirements are being met. If the current market value of the futures contract
causes the margin account to fall below its required level, the trader will
receive a margin call from the exchange to top up the remaining balance.
Mark-to-market is a process of pricing futures contracts at the end of every
trading day. Made to accounts with open futures positions, the cash adjustment
in mark-to-market reflects the day’s profit or loss, based on the settlement
price of the product, and is determined by the exchange. Since mark-to-market
adjustments affect the cash balance in a futures account, the margin
requirement for the account is being assessed on a daily basis to continue
holding an open position.
Let us look at a mark-to-market example and understand the daily change
in the price of the futures contract due to fluctuating prices in the underlying
asset. First, note the two counterparties on either side of a futures contract,
that is, a long position trader and a short position trader. The long trader goes
bullish as the underlying asset is expected to increase in price, while the trader
shorting the contract is considered bearish due to the expected drop in the
price of the underlying asset.
The futures contract may go up or down in value at the end of the trading
day. When its price goes up, the long margin account increases in value due to
mark-to-market, with the daily gain credited to the margin account of the long
position trader. Correspondingly, the short position trader on the opposing side
will suffer a loss of an equal amount, which is debited from the margin
account.
Similarly, when the price of the futures contract goes down, the long
margin account decreases in value due to mark-to-market, with the daily loss
debited from the margin account of the long position trader. This amount will
be credited to the margin account of the short position trader, who will realize
a gain of an equal amount.
By updating the price of a futures contract to reflect its current market
value, the exchange can monitor the risk exposure of traders in real time. This
helps to ensure that margin requirements are being met and that traders have
enough funds to cover their positions, which essentially reduces risk exposure
to the traders. This also allows traders to accurately assess their profit or loss
and make informed decisions about their positions.
Figure 3-2 illustrates the two types of traders with an open position in the
same futures contract and their respective profit and loss due to mark-to-
market.
Figure 3-2 Illustrating the mark-to-market process and the resulting effect on the margin account of
long and short position traders for the same futures contract
Figure 3-3 An example of daily changes in the margin account of long and short position traders due
to mark-to-market
Note that the margin account changes the balance daily due to gain/loss
from mark-to-market exercise. Although the final settlement price at the
delivery date could be different from the intended price upon entering the
futures position, the traders on both sides would still end up transacting at an
effective price equal to the initially intended price, thus hedging the risk of
price fluctuations.
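To make the daily adjustment concrete, here is a small sketch (the settlement prices are illustrative, not from the book) that computes the daily mark-to-market cash flows for a long position; the short side receives the negative of each amount:
import pandas as pd

settle_prices = pd.Series([60.0, 61.0, 59.5, 60.5])  # hypothetical daily settlement prices
contract_size = 1000                                  # barrels per crude oil contract

daily_mtm_long = settle_prices.diff().dropna() * contract_size
print(daily_mtm_long)   # daily credits (+) to or debits (-) from the long margin account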
Now let us look at how to price this derivative product, starting with its
simpler twin: the forward contract.
Pricing Forward Contract
A forward contract is a customizable contract between two parties to buy or
sell an asset at a specified price on a future date. Different from the futures
contract, whose price is settled on a daily basis until the end of the contract, a
forward contract is only settled at the end of the agreement and is traded over
the counter. Therefore, it is easier to price.
The price of a forward contract is the predetermined delivery price for the
underlying asset decided by the buyer and the seller. This is the price to be
paid at a predetermined date in the future and is determined by the following
formula:
$$ {F}_0={S}_0{e}^{rT} $$
where F_0 is the price of the forward contract at the current time point t = 0,
and S_0 is the price of the underlying asset at t = 0. r is the risk-free
interest rate, the theoretical rate of return of an investment with zero risk. T is
the duration from the current time point t = 0 to the expiration date t = T. More
generally, we can write the price of the forward contract at time t as follows:
$$ {F}_t={S}_t{e}^{r\left(T-t\right)} $$
Here, multiplying by the exponential factor simply scales up the price of the
forward contract, depending on the baseline interest rate r and the duration
T − t under a continuously compounding scheme. In other words, suppose we
deposit $1000 in a bank, which promises a continuously compounded interest
rate of r. We can thus expect to see the total value of the deposit grow to
1000e^r at the end of year 1, 1000e^{2r} at the end of year 2, etc. This is a
common way of compounding in finance and accounting.
Now let us look at how this formula comes into shape. The reasoning
follows the no-arbitrage argument, which says there is no arbitrage
opportunity to make any riskless profit, no matter how the price of the
underlying asset changes. Suppose we enter into a long forward contract that
obligates us to buy the asset S at time T for a price of F. We are living at the
current time point t, where the spot price of the asset is S_t, and the future price
of the asset will be S_T. The nature of the agreement fixes our action at the
delivery date; thus, we need to pay an amount of F to purchase the asset
valued at S_T. In other words, our net profit/loss (P&L) at time T is −F + S_T,
where the negative sign means cash outflow. Note that this happens in the
future at time T and not yet now at time t.
However, there is a risk involved upon entering this contract. Since the
asset price fluctuates in the future, the asset price may drop a lot due to
unforeseen circumstances in the future, leading to a very negative P&L upon
delivery. Although the opposite could also be true and the final P&L could be
very positive, this still poses a potential risk, especially for market participants
such as farmers and manufacturers mentioned earlier.
To hedge this risk, we could short one unit of this asset at time t, since we
know that a short position makes a profit if the asset price drops. The gain
from the short position offsets the potential loss on the forward contract
caused by a decrease in the future asset price. We short exactly one unit of the underlying asset because
we can use the exact one unit of the asset bought based on the forward
agreement to close the initial short position in the underlying asset, that is,
return the asset back to where we borrowed it from.
Now we look at the process in more detail. Upon entering the short
position of one unit of the underlying asset at time t, we obtain a cash inflow
of St, as shorting means selling an asset and buying it back later. This means
that we will have a cash outflow of ST at the delivery date to pay back the
asset and close the short position.
Note that the cash S_t received at time t will not sit idle. Instead, we will invest the
cash, such as depositing it in the bank to enjoy the risk-free interest rate. The
money will grow to S_t e^{r(T − t)} upon reaching the delivery date, with an
investment period of T − t. This investment will be used to cover the short
position in the underlying asset.
Figure 3-4 summarizes the positions in different products and the total
portfolio value with the evolution of time. Here, we have three different
products in our portfolio: a forward contract, an asset (e.g., one share of
stock), and cash. These three constitute our portfolio, and we start with zero
value in the portfolio at time t. To see this, we observe that the forward
position is zero at time t since we only make the transaction upon reaching the
delivery date. The stock position gives −S_t since we are shorting the stock, and
the cash position gives S_t, the income generated by shorting the stock. Adding
up the value of these three positions gives zero value for the portfolio at time
t. The net cash flow at time t is zero.
Figure 3-4 Pricing the forward contract in a long position using the no-arbitrage argument. The stock
and cash positions also constitute a replicating portfolio that offsets the randomness in the payoff
function of the forward contract at the delivery date
As time passes by, the value of each position will evolve. Specifically, the
forward position becomes −F + S_T since we would buy one asset valued at S_T
for a price of F. Our stock position becomes −S_T due to the change in the stock
price, and the cash position becomes S_t e^{r(T − t)}.
Now, using the no-arbitrage argument, we would end up with zero value in
our portfolio since we started with zero value. Adding the value of the three
positions at time T gives the total portfolio value of −F + S_t e^{r(T − t)}. And by
equating it to zero, we have F = S_t e^{r(T − t)}, thus completing the pricing of the
forward contract using the no-arbitrage argument.
This is the formula for the price of a forward contract. It demonstrates that
the forward price is determined by the current price of the underlying asset,
the risk-free interest rate, and the time until the contract expires. By using this
formula, both parties in a forward contract can agree on a fair price that
eliminates arbitrage opportunities and reflects the true value of the underlying
asset.
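For reference, a minimal sketch of this formula in code (not from the book) is:
import numpy as np

def forward_price(spot, r, T, t=0.0):
    # Fair forward price F = S_t * exp(r * (T - t)) under the no-arbitrage argument
    return spot * np.exp(r * (T - t))

print(forward_price(spot=100, r=0.05, T=1.0))   # approximately 105.13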
It is interesting to note that the stock and cash positions jointly constitute a
replicating portfolio that offsets the randomness in the payoff function of the
forward contract at the delivery date. This means that no matter what the price
of the forward contract will be in the future, we will always be able to use
another replicating portfolio to deliver the same payoff, as if we were in a
position of the forward contract. This is called pricing by replication.
Let us see what happens if the price of the forward is not equal to the stock
price compounded at the continuously compounded risk-free rate. We can argue about
arbitrage opportunities based on the riskless profit from the buy-low-sell-high
principle. When F > S_t e^{r(T − t)}, we can borrow an amount of S_t, use the
money to buy one unit of the underlying asset, and short a forward contract that
obligates us to sell one unit of the asset at price F at time T. Upon reaching the
delivery date, we deliver the asset and receive a total of F, pay back the
borrowed money with interest S_t e^{r(T − t)}, and earn a net profit of
F − S_t e^{r(T − t)}. This is arbitrage, where we made a riskless profit by taking
advantage of the price difference at the future time T.
Similarly, when F < S_t e^{r(T − t)}, the forward contract is cheaper, and the
asset is more expensive. In that case, we again exercise the buy-low-sell-high
principle by longing a forward contract at time t that allows us to buy one unit
of the underlying asset at price F and time T. We will also short one unit of the
underlying asset at time t to gain a total amount of S_t, which further grows to
S_t e^{r(T − t)} upon reaching the delivery date. When the contract expires, we will
close the short position in the underlying asset by purchasing one unit of the
asset for a price of F. We get to keep the remaining balance S_t e^{r(T − t)} − F, thus
also establishing the arbitrage argument and ensuring a riskless profit.
Note that when the delivery date coincides with the current time t, the forward
price is simply the spot price of the underlying asset. To see this, set T = t and
we have F = S_t e^{r(t − t)} = S_t.
In a nutshell, the future net cash flow predetermined or fixed in advance
(today) must equal today’s net cash flow to annihilate arbitrage opportunities.
The no-arbitrage argument gives a fair price for the forward contract.
Figure 3-5 Calculating the fair price of a futures contract with an annually compounded interest rate,
storage cost, and convenience yield
Figure 3-6 Illustrating the price dynamics of the futures contract in contango. The left panel shows the
price curve at the current time point, where a futures contract with a longer delivery date is more
expensive. The right panel shows the price evolution of the asset and futures contract with different
delivery dates, each converging to the spot price upon reaching the respective delivery date
Figure 3-7 Illustrating the price dynamics of the futures contract in backwardation
# For visualisation
import matplotlib.pyplot as plt
plt.style.use('seaborn-darkgrid')
%matplotlib inline
Let us plot the closing price via Listing 3-2. Note the use of the
fontsize argument in adjusting the font size in the figure.
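Listing 3-2 itself is not reproduced in this excerpt. A minimal sketch consistent with Figure 3-8 (the ticker "PL=F", the date range, and the download step are assumptions) could be:
import yfinance as yf
import matplotlib.pyplot as plt

# Download the assumed platinum futures data for 2022 and plot the closing price
platinum = yf.download("PL=F", start="2022-01-01", end="2023-01-01", interval="1d")
platinum["Close"].plot(figsize=(10, 5), fontsize=12, title="Platinum futures closing price in 2022")
plt.show()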
Figure 3-8 Visualizing the closing price of platinum futures data in 2022
Note that the DataFrame has two levels of columns, with the first level
specifying the symbol name and the second one showing the different price
points.
Similarly, we can plot the closing price of the two sets of futures data, as
shown in Listing 3-4.
Figure 3-9 Visualizing the closing price of gold and copper futures data in 2022
futures_symbol = "ES=F"
futures_data = yf.download(futures_symbol,
start="2022-01-01", end="2022-04-01", interval="1d")
Listing 3-5 Downloading S&P 500 E-Mini futures data
Now let us calculate a few technical indicators using the ta library. In this
example, we will calculate the Relative Strength Index (RSI), Bollinger
Bands, and MACD (Moving Average Convergence Divergence). The
following list briefly describes these popular technical indicators:
Relative Strength Index (RSI): RSI is a momentum oscillator that measures
the speed and change of price movements. The RSI oscillates between 0
and 100, and traders often consider an asset overbought when the RSI is
above 70 and oversold when it’s below 30.
Bollinger Bands: Bollinger Bands are a volatility indicator that measures
the standard deviation of price changes. The indicator consists of three
lines: the middle line (a simple moving average) and two outer lines (upper
and lower bands) plotted at a specified number of standard deviations away
from the moving average. When the bands widen, it indicates increased
volatility, and when they narrow, it signifies decreased volatility. Prices
often move between the upper and lower bands.
Moving Average Convergence Divergence (MACD): MACD is a
momentum indicator that shows the relationship between two moving
averages of an asset’s price. It consists of two lines: the MACD line
(difference between short-term and long-term moving averages) and the
signal line (a moving average of the MACD line). When the MACD line
crosses above the signal line, it may suggest a bullish signal (buy), and
when it crosses below the signal line, it may indicate a bearish signal (sell).
Additionally, when the MACD line is above zero, it suggests an upward
momentum, while below zero indicates a downward momentum.
Listing 3-6 calculates these technical indicators and concatenates them to
the DataFrame.
# Calculate RSI
futures_data["RSI"] = ta.momentum.RSIIndicator(futures_data["Close"]).rsi()
# Calculate MACD
macd = ta.trend.MACD(futures_data["Close"])
futures_data["MACD"] = macd.macd()
futures_data["MACD_signal"] = macd.macd_signal()
Listing 3-6 Calculating common technical indicators
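The Bollinger Bands calculation mentioned in the text is not shown in the listing above. A minimal sketch using the ta library's volatility module (the new column names are assumptions; ta is assumed to be imported as in Listing 3-6) could be:
# Calculate Bollinger Bands with the library's default window settings
bb = ta.volatility.BollingerBands(futures_data["Close"])
futures_data["BB_upper"] = bb.bollinger_hband()
futures_data["BB_middle"] = bb.bollinger_mavg()
futures_data["BB_lower"] = bb.bollinger_lband()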
Now we can plot the raw futures time series data together with the
technical indicators to facilitate analysis, as shown in Listing 3-7.
# Plot RSI
axes[1].plot(futures_data.index, futures_data["RSI"], label="RSI", color="g")
axes[1].axhline(30, linestyle="--", color="r", alpha=0.5)
axes[1].axhline(70, linestyle="--", color="r", alpha=0.5)
axes[1].set_title("Relative Strength Index (RSI)")
axes[1].grid()
# Plot MACD
axes[3].plot(futures_data.index, futures_data["MACD"], label="MACD", color="b")
axes[3].plot(futures_data.index, futures_data["MACD_signal"], label="Signal Line", linestyle="--", color="r")
axes[3].axhline(0, linestyle="--", color="k", alpha=0.5)
axes[3].set_title("Moving Average Convergence Divergence (MACD)")
axes[3].grid()
Listing 3-7 Visualizing futures data and technical indicators
We can make a few observations here. In the RSI chart, we can spot
periods when the RSI crossed below 30, which might signal potentially
oversold conditions. Traders may use these signals to consider entering or
exiting positions. In the plotted chart on Bollinger Bands, we can see periods
when the price touched or crossed the bands, which may indicate potential
trend reversals or support and resistance levels. In the MACD chart, we can
observe periods when the MACD line crossed the signal line, which may
signal potential entry or exit points for traders.
Summary
In this chapter, we delved into the world of options and futures contracts.
Forward contracts are customized, private agreements between two parties
and are traded over the counter (OTC). They are only settled at the end of the
agreement and are priced based on the spot price of the underlying asset, the
risk-free interest rate, and the time to expiration. However, forward contracts
come with potential counterparty risk as there is no clearing house to
guarantee the fulfillment of the contractual obligations.
Futures contracts, on the other hand, are standardized contracts traded on
regulated exchanges. They are marked to market daily, meaning that the price
of the contract is adjusted to reflect its current market value, ensuring that
margin requirements are met. The clearing house of the futures exchange
serves as an intermediary between buyers and sellers, mitigating counterparty
risk and ensuring the stability of the market.
We also covered the pricing of both types of contracts. For example, the
pricing of futures contracts is influenced by factors such as the spot price of
the underlying asset, the risk-free interest rate, storage costs, and convenience
yield. In addition, futures markets can exhibit contango, where futures prices
are higher than the spot price, or backwardation, where futures prices are
lower than the spot price.
Exercises
A farmer sells agricultural products, and a manufacturer purchases raw
materials for production. In both cases, what position should they take in a
futures contract in order to hedge against adverse price changes in the
future?
A wheat farmer takes a short position in ten wheat futures contracts on day
1, each valued at $4.5 and representing 5000 bushels. If the price of the
futures contracts increases to $4.55 on day 2, what is the change in the
farmer’s margin account?
Suppose we enter into a short forward position. What is the risk due to the
fluctuating asset price in the future? How can we hedge the risk?
Assume we could buy a barrel of oil for $80 today, and the current futures
price is $85 for delivery three months from today. One futures contract can
buy 1000 barrels of oil. How can you arbitrage in this situation? What is the
profit? Assume a zero risk-free interest rate.
Apply the same no-arbitrage argument to value a forward contract in a short
position.
Write a function to calculate the fair price of a futures contract given the
spot price of the asset, risk-free interest rate, rate of storage cost,
convenience yield, and delivery date. Allow for both annual compounding
and continuous compounding.
Explain the source of riskless profit when a forward contract is overpriced
or underpriced than its theoretical no-arbitrage value.
Any financial asset is characterized by its risk and return. Return means the
financial reward it brings, such as the percentage increase in the asset value.
We hope to maximize the percentage return of the asset as much as
possible. However, a higher reward often comes with higher risk, where
risk refers to the volatility of that return. That is, a risky asset displays large
oscillations in its historical returns, making its future outlook more
uncertain than, say, a stable product with little deviation from its expected
gain, such as a bond. As an investor, the goal of making profits boils
down to maximizing the return and, at the same time, minimizing the risk.
Return is a measure of the financial gain or loss of an investment over a
specific period. It can be calculated as a percentage of the initial
investment, taking into account factors such as capital appreciation,
dividends, and interest payments. Returns can be either realized (already
received) or unrealized (expected to be received in the future). There are
various ways to measure returns, including absolute return, annualized
return, and risk-adjusted return.
Risk is the variability or uncertainty in the returns of an investment. It
represents the potential for losses due to factors such as market fluctuations,
economic conditions, and company-specific events. There are several types
of risk, including market risk, credit risk, liquidity risk, and operational risk,
among others. In general, investments with higher risk tend to offer higher
potential returns to compensate for the increased uncertainty.
Figure 4-1 Illustrating the four quadrants of risk and return profile
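The two return series used below are defined earlier in the chapter and not shown in this excerpt. Based on the printed DataFrame that follows, they can be reconstructed as:
asset_return1 = [0.05, 0.30, -0.10, 0.35, 0.20]
asset_return2 = [0.5, -0.2, 0.3, 0.5, -0.3]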
Next, let us combine these two lists in a Pandas DataFrame for easy
manipulation. This is achieved by wrapping the two lists in a dictionary and
passing it to the pd.DataFrame() function:
return_df = pd.DataFrame({"Asset1":asset_return1,
"Asset2":asset_return2})
>>> return_df
Asset1 Asset2
0 0.05 0.5
1 0.30 -0.2
2 -0.10 0.3
3 0.35 0.5
4 0.20 -0.3
To facilitate visual analysis, let us plot the two return series in a bar
chart using the .plot.bar() method:
>>> return_df.plot.bar()
Running this command generates Figure 4-2. The figure suggests that
despite having the same mean return, these two assets clearly have different
risk profiles. Specifically, asset 2 (orange bars) is more volatile than asset 1
(blue bars).
>>> return_df.std()
Asset1 0.185068
Asset2 0.384708
dtype: float64
Note that the std() is applied column-wise. Similarly, we can call the
mean() function to calculate the mean value of each column:
>>> return_df.mean()
Asset1 0.16
Asset2 0.16
dtype: float64
>>> return_df + 1
Asset1 Asset2
0 1.05 1.5
1 1.30 0.8
2 0.90 1.3
3 1.35 1.5
4 1.20 0.7
init_investment = 100
cum_value = (return_df + 1).cumprod() * init_investment
>>> cum_value
Asset1 Asset2
0 105.0000 150.0
1 136.5000 120.0
2 122.8500 156.0
3 165.8475 234.0
4 199.0170 163.8
>>> cum_value.plot.line()
Also, note that the last row in the shifted column is NA, which is due to
the fact that there is no more future price available at the last time point.
This also makes the 1+R return column NA. We will demonstrate the
calculation process in code later. For now, it is good to digest and accept the
1+R formatted return as an equivalent way of describing asset returns.
Multiperiod Return
The terminal return can also be considered as the multiperiod return, or the
return over a combined period of time. Since the evolution process is
sequential, we need to compound the returns in each period, sequentially.
When we have the 1+R formatted returns, it is easy to calculate the
multiperiod return by multiplying/compounding the intermediate 1+R
returns followed by a subtraction of one.
The multiperiod return is a measure of an investment’s performance
over a series of consecutive periods. Recall that the terminal return can be
calculated via R_{0,T} = (1 + R_{0,1})(1 + R_{1,2})⋯(1 + R_{T−1,T}) − 1. When we
calculate the two-period return R_{t,t+2}, the formula becomes
$$ R_{t,t+2} = (1 + R_{t,t+1})(1 + R_{t+1,t+2}) - 1 $$
This method allows us to calculate the overall return over the two
periods while considering the compounding effect of each period’s return
on the next. The compounded return is thus easy to calculate using the 1+R
formatted returns for both periods. Figure 4-6 illustrates the process of
compounding the two-period return.
Figure 4-6 Calculating the two-period return by compounding the two single-period returns in 1+R
format, followed by an adjustment of subtraction by one
By multiplying the 1+R formatted returns for all n periods and then
subtracting one, we can determine the compounded return over the entire n-
period investment horizon.
Let us look at a simple example. Suppose we invest in an asset for two
periods, where the first-period return is 10%, and the second-period return
is –2%. To calculate the compounded return, our first step is to convert both
single-period returns to the 1+R format, giving 1.1 and 0.98, respectively.
We would then multiply these two numbers and subtract one: 1.1 × 0.98 − 1 = 0.078, or a compounded two-period return of 7.8%.
Annualizing Returns
Once we know how to calculate the terminal return of any asset, the next
question is comparing assets with different periods of time. For example,
some returns are daily, while other returns are monthly, quarterly, or yearly.
The answer is annualization, where we annualize the returns to the same
time scale of a year for a fair comparison.
Annualizing returns is a crucial step in comparing the performance of
assets with different investment horizons. By converting returns to an
annualized basis, we can more easily evaluate and compare the performance
of various assets on a standardized time scale. This process helps to level
the playing field and facilitate informed decision-making.
The overall process for annualizing returns is as follows:
Calculate the 1+R formatted return for the given period.
Raise the 1+R formatted return to the power of the number of periods per
year.
Subtract one to convert the result from the 1+R format back to the return
itself.
Let us look at an example. Suppose we have an asset that generates a
monthly return of 1%. To calculate the annualized return, we need to
enlarge the time horizon to a year. However, simply multiplying 1% by 12
is incorrect. To proceed with the sequential compounding process, we
would construct the 1+R formatted return (1 + 0.01) for each month,
multiply across all 12 months to reach (1 + 0.01)^12, and finally subtract
one to give (1 + 0.01)^12 − 1 ≈ 12.68%, which is higher than 12%.
Calculating the annualized return thus involves deriving the 1+R formatted
return, raising it to the power of the number of periods per year, and
subtracting one to convert from 1+R back to R.
This calculation shows that the annualized return is 12.68%, which is
higher than simply multiplying the 1% monthly return by 12. This
difference is due to the compounding effect, which is an essential factor to
consider when annualizing returns.
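The list of prices used in the following snippets is defined earlier and not shown in this excerpt. Based on the printed outputs below, it can be reconstructed as:
prices = [0.1, 0.2, -0.05]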
The first-period return can be calculated based on the first two price
points. We would first obtain the 1+R formatted return and then subtract by
one to switch to the normal return:
>>> prices[1]/prices[0] - 1
1.0
>>> prices[2]/prices[1] - 1
-1.25
>>> print(prices[1:])
[0.2, -0.05]
>>> print(prices[:-1])
[0.1, 0.2]
>>>
print(np.array(prices[1:])/np.array(prices[:-1])-1)
[ 1. -1.25]
Another approach is to rely on the Pandas ecosystem, which implements
a lot of NumPy calculations under the hood. Let us convert the list to a
Pandas DataFrame by converting a dictionary, the same technique used
earlier:
prices_df = pd.DataFrame({"price":prices})
>>> prices_df
price
0 0.10
1 0.20
2 -0.05
>>> prices_df.iloc[1:]
price
1 0.20
2 -0.05
>>> prices_df.iloc[:-1]
price
0 0.1
1 0.2
Pay attention to the indexes in the first column here. These are the
default row-level indexes assigned upon creating the Pandas DataFrame,
and these indexes remain unchanged even after the subsetting operation.
Having misaligned indexes could easily lead to problems when trying to
combine two DataFrames. In this case, we would end up with an unwanted
result when we divide these two DataFrames:
>>> prices_df.iloc[1:]/prices_df.iloc[:-1]
price
0 NaN
1 1.0
2 NaN
The reason behind this seemingly irregular behavior is that both
DataFrames are trying to locate the corresponding element with the same
index. When the counterparty cannot be found, a NaN value shows up.
To correct this, we can extract the value attribute only from these
DataFrames. We only need to do this for one DataFrame as the other will be
converted to the format of the value automatically. The following code
snippet shows the way to go, where the result is the same as before:
>>> prices_df.iloc[1:].values/prices_df.iloc[:-1] - 1
price
0 1.00
1 -1.25
>>> prices_df.iloc[1:]/prices_df.iloc[:-1].values - 1
price
1 1.00
2 -1.25
Let us stay with the shifting operation a bit longer. It turns out that there
is a function with the same name. For example, to shift the prices
downward by one unit, we can pass one to the shift() function of the
Pandas DataFrame object as follows:
>>> prices_df.shift(1)
price
0 NaN
1 0.1
2 0.2
Notice that the first element is filled with NaN since there is no value
before the first price. We can then divide the original DataFrame by the
shifted DataFrame to obtain the sequence of single-period 1+R formatted
returns and subtract by one to get the normal return:
>>> prices_df/prices_df.shift(1) - 1
price
0 NaN
1 1.00
2 -1.25
Finally, we have one more utility function that helps us perform these
calculations in one shot. The function is pct_change(), which
calculates the percentage change between two consecutive values in the
DataFrame:
returns_df = prices_df.pct_change()
>>> returns_df
price
0 NaN
1 1.00
2 -1.25
>>> returns_df + 1
price
0 NaN
1 2.00
2 -0.25
We then call np.prod() to multiply all the elements; when applied to a
Pandas object, it delegates to the Pandas prod() method, which skips the NaN
value by default. This gets us the 1+R formatted terminal return, from which
we subtract one to convert to the normal terminal return:
>>> np.prod(returns_df + 1) - 1
price -1.5
dtype: float64
There is also a corresponding Pandas way, which gives the same result:
>>> (returns_df+1).prod() - 1
price -1.5
dtype: float64
For a daily return, since there are typically 252 trading days in a year, we
compound it 252 times:
r = 0.0001
>>> (1+r)**252-1
0.025518911987694626
For the monthly return, since there are 12 months in a year, we would
compound it 12 times:
r = 0.01
>>> (1+r)**12-1
0.12682503013196977
Similarly, for a quarterly return of 5%, we compound it four times a year:
r = 0.05
>>> (1+r)**4-1
0.21550625000000023
Analyzing Risk
The risk of an asset is related to volatility, which is of equal or higher
importance than the reward. Volatility is a crucial metric in assessing the
risk of an investment, as it represents the level of uncertainty or fluctuations
in the asset’s returns. A higher volatility implies a higher risk, as the asset’s
price can experience more significant ups and downs. To quantify the risk
associated with an investment, we must understand the concept of volatility
and how to calculate it.
Recall the returns of two assets in Figure 4-3. Despite having the same
average reward, asset 2 is more volatile than asset 1. Asset 2 deviates from
the mean more often and more significantly than asset 1. Volatility thus
measures the degree of deviation from the mean. We will formalize the
notion of volatility in this section.
Before looking at volatility, let us first introduce the concept of variance
and standard deviation.
The (population) variance of a return series R_1, R_2, …, R_N with mean
return R_P is
$$ \sigma_P^2 = \frac{1}{N}\sum_{i=1}^{N} \left(R_i - R_P\right)^2 $$
Here, R_i − R_P means to de-mean the original return R_i, that is,
subtract the mean return R_P from the original return R_i. This gives the
deviation from the mean. Also, by squaring these deviations, the problem of
canceling out positive and negative terms no longer exists; all squared
deviations end up being positive or zero. Finally, we take the average of the
squared deviations as the variance of the return series. A visual inspection
of Figure 4-3 also suggests that asset 2 has a higher variance than asset 1.
Although variance summarizes the average degree of deviation from the
mean return, its unit is the squared distance from the average return, making
it difficult to interpret the unit. In practice, we would often take the square
root of the variance and bring it back to the same scale as the return. The
result is called standard deviation, where the deviation is now standardized
and comparable.
This is also our measure of volatility. It measures how large the prices
swing around the mean price and serves as a direct measure of the
dispersion of returns. The higher the volatility, the higher the deviations
from the mean return. Figure 4-7 summarizes the definitions of common
statistical measures such as the mean, variance (both population and
sample), and standard deviation, also called volatility in the financial
context.
Figure 4-7 Summarizing the common statistical measures, including the mean, variance
(population and sample), and standard deviation (also called volatility)
Annualizing Volatility
Similar to return, the volatility also needs to be annualized to warrant a fair
comparison. Without annualizing the volatility, it is difficult to compare the
volatility of monthly data with that of daily data.
The formula for annualizing the volatility relies on the fact that the
volatility grows with the square root of the number of periods T in a year. The
annualized volatility σ_{P,T} can be calculated as
$$ \sigma_{P,T} = \sigma_P \sqrt{T} $$
where σ_P is the single-period volatility (e.g., daily) and T is the number of
such periods per year (e.g., 252 trading days).
Figure 4-8 Comparing the differences when annualizing volatility and variance. When given a
fixed single-period volatility or variance, the annualized volatility grows nonlinearly with time, while
the annualized variance grows linearly with time
p1_ret = 0.05
p1_vol = 0.2
p2_ret = 0.1
p2_vol = 0.5
risk_free_rate = 0.03
Let us work with some real data to calculate the aforementioned metrics
in the next section.
import yfinance as yf
prices_df = yf.download(["AAPL","GOOG"],
start="2023-01-01")
>>> prices_df.head()
Listing 4-3 Downloading stock data using yfinance
Running the code generates Figure 4-10. Note the multilevel columns
here. There are two levels of columns, with the first level indicating the
price type and the second one denoting the ticker symbol. Also, the index of
the DataFrame follows a datetime format.
Figure 4-10 Printing the first few rows of daily stock prices for Apple and Google
Next, we would like to focus on the daily adjusted closing price of the
two stocks, indexed by date instead of datetime. Listing 4-4 completes these
two tasks.
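Listing 4-4 itself is not reproduced here. A minimal sketch consistent with the description (column and attribute names follow the yfinance output shown above) could be:
prices_df = prices_df["Adj Close"]        # keep only the adjusted closing prices
prices_df.index = prices_df.index.date    # index by date instead of datetime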
Here, we accessed the date attribute of the index and assigned it to the
index attribute of the DataFrame. We would then calculate the single-period
returns using the pct_change() utility function:
returns_df = prices_df.pct_change()
>>> returns_df.head()
AAPL GOOG
2023-01-03 NaN NaN
2023-01-04 0.010314 -0.011037
2023-01-05 -0.010605 -0.021869
2023-01-06 0.036794 0.016019
2023-01-09 0.004089 0.007260
Again, the first row is empty since there is no data point before it. We
can remove this row using the dropna() function:
returns_df = returns_df.dropna()
>>> returns_df.head()
AAPL GOOG
2023-01-04 0.010314 -0.011037
2023-01-05 -0.010605 -0.021869
2023-01-06 0.036794 0.016019
2023-01-09 0.004089 0.007260
2023-01-10 0.004456 0.004955
>>> returns_df.mean()
AAPL 0.007228
GOOG 0.004295
dtype: float64
>>> returns_df.std(axis=0)
AAPL 0.012995
GOOG 0.016086
dtype: float64
Google’s stock prices were more volatile than Apple’s in the first few
days. Now let us try setting axis=1:
>>> returns_df.std(axis=1)
2023-01-04 0.015097
2023-01-05 0.007965
2023-01-06 0.014690
2023-01-09 0.002242
2023-01-10 0.000352
2023-01-11 0.009001
2023-01-12 0.002259
2023-01-13 0.000308
2023-01-17 0.011068
2023-01-18 0.000882
2023-01-19 0.016097
dtype: float64
The result shows, for each day, the standard deviation calculated across the
returns of the two stocks.
Now we show how to calculate the volatility manually by going through
the exact steps described earlier. Our first step is to de-mean the daily
returns and obtain the deviations from the (arithmetic) mean:
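The de-meaning step is not shown in this excerpt; it can be computed by subtracting the column means from the daily returns:
deviations_df = returns_df - returns_df.mean()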
The next step is to square these deviations so that they would not cancel
each other when summing together. Squaring is the same as raising the
element to the power of two, using the double asterisk notation:
squared_deviations_df = deviations_df**2
>>> squared_deviations_df.head()
AAPL GOOG
2023-01-04 0.000010 2.350688e-04
2023-01-05 0.000318 6.845668e-04
2023-01-06 0.000874 1.374582e-04
2023-01-09 0.000010 8.787273e-06
2023-01-10 0.000008 4.352158e-07
In the third step, we average these daily squared deviations using the
mean() function:
variance = squared_deviations_df.mean()
>>> variance
AAPL 0.000154
GOOG 0.000235
dtype: float64
The last step is to take the square root of the variance to obtain the
volatility:
volatility = np.sqrt(variance)
>>> volatility
AAPL 0.012390
GOOG 0.015337
dtype: float64
Notice that the result is different from the one obtained using the
std() function! The cause for the difference is that the std() function
calculates the sample standard deviation, which divides by N − 1 in the
denominator as opposed to N in our manual calculations.
To correct this, let us revisit step three and divide the sum of squared
deviations by N − 1 this time. In Listing 4-5, we first get the number of
rows N using the first dimension (row dimension) of the shape attribute,
then plug in the calculation based on the formula of the sample variance.
num_rows = squared_deviations_df.shape[0]
variance2 = squared_deviations_df.sum() / (num_rows-1)
>>> variance2
AAPL 0.000169
GOOG 0.000259
dtype: float64
Listing 4-5 Calculating the sample variance
Taking the square root now gives the same result as using the std()
function:
volatility2 = np.sqrt(variance2)
>>> volatility2
AAPL 0.012995
GOOG 0.016086
dtype: float64
Now that we have the single-period volatility that measures the daily spread
of the returns around their mean, the next section calculates the annualized
volatility.
annualized_vol = returns_df.std()*np.sqrt(252)
>>> annualized_vol
AAPL 0.206289
GOOG 0.255356
dtype: float64
We can also calculate the square root of 252 by raising it to the power of
0.5, which returns the same result:
annualized_vol = returns_df.std()*(252**0.5)
>>> annualized_vol
AAPL 0.206289
GOOG 0.255356
dtype: float64
returns_per_day = (returns_df+1).prod()**(1/returns_df.shape[0]) - 1
>>> returns_per_day
AAPL 0.007153
GOOG 0.004178
dtype: float64
Compounding this daily return over 252 trading days then gives the annualized return:
annualized_return = (returns_per_day+1)**252 - 1
>>> annualized_return
AAPL 5.025830
GOOG 1.859802
dtype: float64
Listing 4-6 Annualizing the daily return
It seems Apple is doing quite well compared with Google for the first
few days.
There is another, faster way to calculate the annualized return:
annualized_return = (returns_df+1).prod()**(252/returns_df.shape[0]) - 1
>>> annualized_return
AAPL 5.025830
GOOG 1.859802
dtype: float64
The key change here is that we raise the terminal growth factor (the 1+R terminal
return) to the power of 252/N. This is a standardization step, rescaling the return
from an N-day horizon to a 252-trading-day (yearly) horizon.
riskfree_rate = 0.03
excess_return = annualized_return - riskfree_rate
sharpe_ratio = excess_return/annualized_vol
>>> sharpe_ratio
AAPL 24.217681
GOOG 7.165694
dtype: float64
Listing 4-7 Calculating the Sharpe ratio
Thus, the Sharpe ratio as a risk-adjusted return is much higher for Apple
than Google for the first few days.
Summary
In this chapter, we explored the two key characteristics of any financial
asset: risk and return. Return refers to the financial reward an asset brings,
while risk represents the volatility or uncertainty of that return. As
investors, our goal is to maximize return while minimizing risk.
We introduced different ways to represent and calculate the returns,
including the simple return, terminal return, multiperiod return, and the 1+R
formatted return. It is important to understand the connections among these
forms of return when translating one form to the other.
We then highlighted the risk-return trade-off, where low-return assets
are typically associated with low risk and high-return assets with high risk.
To better compare the risk and return for different investment vehicles, we
introduced the annualized return and volatility, as well as a risk-adjusted
return metric called the Sharpe ratio. We also provided examples illustrating
the importance of considering both risk and return when comparing
investment products.
Exercises
How many inputs do we need to calculate a single-period return?
What is the return if the asset price changes from $5 to $6?
Is the total return of a popular stock typically higher or lower than its
price return?
Calculate the three-period return that consists of 10%, –5%, and 6%.
If we buy an asset that rises by 10% on day one and drops by 10% on day
two, is our return positive, negative, or zero?
Calculate the annualized return for an asset with a quarterly (three
months) return of 2%.
Download the YTD stock data for Apple and Tesla and calculate the
daily cumulative returns using the daily closing price. Plot the returns as
line charts.
Both annualized volatility and variance grow linearly with time, correct?
Suppose the monthly volatility is 5%. Calculate the annualized volatility.
The annualized volatility is always greater than the monthly volatility.
True or false?
The risk-free rate is the return on an investment that carries a low risk.
True or false?
If the risk-free rate goes up and the volatility of the portfolio remains
unchanged, will the Sharpe ratio increase or decrease?
Obtain monthly return data based on the median daily price per month of
Apple stock in the first half of 2022. Calculate the annualized return and
volatility based on the monthly returns.
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_5
5. Trend-Following Strategy
Peng Liu1
(1) Singapore, Singapore
Figure 5-2 Calculating the simple returns based on the definition of percentage return
This requires two steps: first, calculate the ratio of the ending price to the beginning
price to obtain the so-called 1+R return. This ratio reflects the growth factor of the asset's price from the beginning of
the period to the end. If this ratio is greater than one, it indicates that the asset’s price
has increased over the period. If it’s less than one, it indicates a decrease in the asset’s
price. If the ratio equals one, it means the asset’s price hasn’t changed.
Next, we would subtract one from the 1+R return to convert it to the simple
return. This step transforms the growth factor into the actual percentage return.
Subtracting one essentially removes the initial investment from the calculation,
leaving only the gained or lost amount relative to the initial investment, which is the
return. See Figure 5-3 for an illustration, where the daily returns are the same as in the
previous approach.
Figure 5-3 Calculating the simple returns based on the 1+R approach
This 1+R method is often used because it is more intuitive. The growth factor
easily shows how much the initial investment has grown (or shrunk), and subtracting
one gives the net growth in percentage terms, which is the simple return. This method
is especially useful when dealing with multiple time periods, as growth factors can
simply be multiplied together to calculate the cumulative growth factor over several
periods.
Q4: What is the terminal return from day 1 to day 5 without compounding?
Answer: The terminal return is the total return on an investment over a given
period of time. It’s a measure of the total gain or loss experienced by an investment
from the start of the investment period to the end, without considering any
compounding effect over the period.
To calculate the terminal return without involving the compounding process, we
would resort to R = (S5 − S1)/S1 = S5/S1 − 1, where the second formula first calculates
the ratio of the asset’s price on day 5 to its price on day 1 (which reflects the overall
growth factor) and then subtracts one to convert the growth factor into a terminal
return. See Figure 5-4 for an illustration.
Figure 5-4 Calculating the terminal return without compounding
Q5: What is the terminal return from day 1 to day 5 with compounding? Is it
equal to the result in Q4?
Answer: Compounding returns is an important concept in finance. It reflects the
fact that not only your initial investment earns a return but also the returns from
previous periods. This leads to exponential growth over time, given a positive return
rate.
We will fill in the “return3” column, where each cell is a product between the 1+R
return of the current period and the cumulative 1+R return of the previous period,
offset by one. For the first period (from day 1 to day 2), the “return3” value would be
just the “1 + R” return for this period. See Figure 5-5 for an illustration.
Figure 5-5 Calculating the terminal return using compounding
As it turns out, the terminal return is 6%, which is the same as previously
calculated.
Q6: Sum up the single-period returns in Q3. Is it equal to the result in Q4?
Answer: The result shows that it is different from 6%. In general, adding up
single-period returns can lead to incorrect conclusions about the overall return on
investment. The sum of the single-period returns is not equal to the terminal return
(from Q4) because this approach overlooks the effect of compounding. In other
words, by simply summing up single-period returns, we are effectively treating each
period’s return as if it was independent and earned on the initial investment amount,
disregarding the fact that the investment grows with each period due to the returns
earned in the prior periods. This is why we see a difference between the summed
single-period returns and the terminal return calculated through the correct method
that takes into account the compounding effect.
The principle of compounding acknowledges that returns accumulate over time,
meaning the returns earned in one period are reinvested and can generate further
returns in subsequent periods. So, while the sum of single-period returns might
provide a rough estimate of the total return, it is not a correct measure, especially
when the time span is long, or the return rate is high. Instead, the appropriate way to
calculate the total return over multiple periods is to use the concept of compound
returns, which considers both the initial investment and the reinvestment of returns. It
is thus important to follow the sequential compounding process when calculating the
terminal return. See Figure 5-6 for an illustration.
Figure 5-6 Summing up all single-period returns
The single-period log return is defined as r = ln(St + 1/St) = ln St + 1 − ln St. Here, St + 1 and St represent the asset price at the future time t + 1 and the current
time t, respectively, and ln denotes the natural logarithm. See Figure 5-7 for an
illustration.
Normality: In addition, financial models often assume that returns are normally
distributed. However, it’s been observed that simple returns have skewness and
excess kurtosis, implying that they deviate from normality. On the other hand, log
returns tend to have properties closer to normality, which makes them a better fit for
these financial models.
Continuously compounded returns: Log returns also represent continuously
compounded returns. This property makes log returns the preferred choice in
certain financial applications, especially those involving options and other
derivatives, where continuous compounding is commonly used.
In summary, using log returns simplifies mathematical computations and
statistical analyses, enables symmetry and normality, and represents continuously
compounded returns. These properties make log returns highly valuable in financial
analysis and modeling.
Let us look at a concrete example to understand the calculations using log returns.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
symbol = 'GOOG'
df = yf.download(symbol, start="2023-01-01", end="2023-01-08")
>>> df
                 Open       High        Low      Close  Adj Close    Volume
Date
2023-01-03  89.830002  91.550003  89.019997  89.699997  89.699997  20738500
2023-01-04  91.010002  91.239998  87.800003  88.709999  88.709999  27046500
2023-01-05  88.070000  88.209999  86.559998  86.769997  86.769997  23136100
2023-01-06  87.360001  88.470001  85.570000  88.160004  88.160004  26612600
Listing 5-1 Downloading Google’s stock price
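The single-period simple returns used below can be obtained with the pct_change() method; a minimal sketch, assuming the df DataFrame from Listing 5-1:
returns = df.Close.pct_change()   # (S_t+1 / S_t) - 1, with NaN for the first period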
Here, the first-period return is NaN as there is no prior stock price available.
Let us calculate the terminal return using the original approach by taking the first
and last closing prices as the inputs (based on the definition given earlier), as shown
in Listing 5-3.
# terminal return
terminal_return = df.Close[-1]/df.Close[0] - 1
>>> terminal_return
-0.01716826464354737
Listing 5-3 Calculating the terminal return using the original approach by definition
We can also calculate the same value by compounding the (1+R) returns based on
the .cumprod() function, as shown in Listing 5-4.
# cumulative returns
cum_returns = (1+returns).cumprod() - 1
>>> cum_returns
Date
2023-01-03 00:00:00-05:00 NaN
2023-01-04 00:00:00-05:00 -0.011037
2023-01-05 00:00:00-05:00 -0.032664
2023-01-06 00:00:00-05:00 -0.017168
Name: Close, dtype: float64
Listing 5-4 Calculating the same cumulative terminal return by compounding 1+R formatted returns
Now we calculate the same using log returns, starting by obtaining the single-
period log returns in Listing 5-5.
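A minimal sketch of this step, assuming the df DataFrame from Listing 5-1:
log_returns = np.log(df.Close / df.Close.shift(1))   # single-period log returns
# equivalently: log_returns = np.log(df.Close).diff()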
We can add all log returns from previous periods together to get the cumulative
log returns, convert back to the original scale via exponentiation, and, lastly, offset by
one to convert from 1+R to the simple return format, as shown in Listing 5-6.
Again, we verify the value of the last entry and verify that it is the same as the
previous terminal return:
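A minimal sketch, assuming the log_returns series from the previous step:
cum_log_returns = log_returns.cumsum()       # add all log returns from previous periods
cum_returns2 = np.exp(cum_log_returns) - 1   # exponentiate, then offset by one
# The last entry is again approximately -0.017168, matching the terminal return above.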
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
symbol = 'AAPL'
df = yf.download(symbol, start="2022-01-01", end="2023-01-01")
df.index = pd.to_datetime(df.index)
>>> df.head()
                  Open        High         Low       Close   Adj Close     Volume
Date
2022-01-03  177.830002  182.880005  177.710007  182.009995  180.434296  104487900
2022-01-04  182.630005  182.940002  179.119995  179.699997  178.144302   99310400
2022-01-05  179.610001  180.169998  174.639999  174.919998  173.405685   94537600
2022-01-06  172.699997  175.300003  171.639999  172.000000  170.510956   96904000
2022-01-07  172.889999  174.139999  171.029999  172.169998  170.679489   86709100
Listing 5-7 Downloading Apple’s stock price data
Note that we have an index named Date which now assumes a datetime
format to facilitate plotting.
Listing 5-8 generates a plot on the daily adjusted closing price. We will later
overlay its SMA on the same plot.
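A minimal plotting sketch consistent with this description:
df['Adj Close'].plot(figsize=(12, 6))
plt.title('Daily adjusted closing price')
plt.xlabel('Date')
plt.ylabel('Price')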
Now we create an SMA series with a window size of three. We can create the
rolling window using the rolling() method for a Pandas Series, followed by the
mean() method to extract the average value from the window (a collection of price
points). Listing 5-9 creates a new SMA column called SMA-3 and subsets to keep
only two columns: the adjusted closing price and the SMA column.
window = 3
SMA1 = "SMA-"+str(window)
df[SMA1] = df['Adj Close'].rolling(window).mean()
colnames = ["Adj Close",SMA1]
df2 = df[colnames]
>>> df2.head()
Adj Close SMA-3
Date
2022-01-03 180.434296 NaN
2022-01-04 178.144302 NaN
2022-01-05 173.405685 177.328094
2022-01-06 170.510956 174.020315
2022-01-07 170.679489 171.532043
Listing 5-9 Creating simple moving averages
Let us pause for a moment and look at how this column is generated. We see that
the first two rows in the SMA column are missing. This makes sense as both of them
are unable to get a full three-period moving window to calculate the average. In other
words, we cannot calculate the average when there is an empty value in the window
unless additional treatment is applied here, such as ignoring the empty value while
calculating the average.
We note that the third entry of the SMA column is 177.328094. Let us verify
through manual calculation. The following command takes the first three entries of
the adjusted closing price column and calculates the average, which reports the same
value:
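A sketch of the check, assuming the df DataFrame from Listing 5-7:
df['Adj Close'][:3].mean()   # ≈ 177.328094, matching the third SMA-3 entry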
This verifies the calculation. Figure 5-10 summarizes the process of calculating the
SMA in our running example.
Note that the only difference is in the first two entries, where we have an
incomplete set of values in the rolling window.
Next, we plot the three-period SMA alongside the original daily adjusted closing
price series, as shown in Listing 5-10.
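A minimal sketch consistent with this description, plotting both columns of df2:
df2.plot(linewidth=2, figsize=(12, 6))
plt.title('Daily adjusted closing price and 3-period SMA')
plt.xlabel('Date')
plt.ylabel('Price')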
Running these commands generates Figure 5-11. Note that the three-period SMA
curve in red looks less volatile than the original price series in blue. Also, the three-
period SMA curve starts from the third entry.
Figure 5-11 Visualizing the original price and three-period SMA
Now let us add another SMA with a longer period. In Listing 5-11, we add a 20-
period SMA as an additional column to df2.
window = 20
SMA2 = "SMA-"+str(window)
df2["SMA-"+SMA2] = df2['Adj
Close'].rolling(window).mean()
colnames = ["Adj Close",SMA1,SMA2]
Listing 5-11 Creating 20-period SMA
Next, we overlay the 20-period SMA on the previous graph, as shown in Listing
5-12.
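A minimal sketch of the overlay, assuming df2 now holds the adjusted closing price, SMA-3, and SMA-20 columns:
df2[colnames].plot(linewidth=2, figsize=(12, 6))
plt.title('Daily adjusted closing price with 3-period and 20-period SMAs')
plt.xlabel('Date')
plt.ylabel('Price')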
Running these commands generates Figure 5-12, which shows that the 20-period
SMA is smoother than the 3-period SMA due to a larger window size.
Figure 5-12 Visualizing the daily prices together with 3-period and 20-period SMAs
alpha = 0.1
df2['EWM_'+str(alpha)] = df2['Adj Close'].ewm(alpha=alpha, adjust=False).mean()
df2.head()
Adj Close SMA-3 SMA-20 EWM_0.1
Date
2022-01-03 180.434296 NaN NaN 180.434296
2022-01-04 178.144302 NaN NaN 180.205296
2022-01-05 173.405685 177.328094 NaN 179.525335
2022-01-06 170.510956 174.020315 NaN 178.623897
2022-01-07 170.679489 171.532043 NaN 177.829456
Listing 5-13 Creating EMA series
We observe that there is no missing value in the EMA series. Indeed, the first
entry will simply be the original price itself due to the design of the EMA weighting
scheme.
As usual, let us verify the calculations to ensure our understanding is on the right
track. The following code snippet manually calculates the second EMA value, which
is the same as the one obtained using the ewm() function:
alpha = 0.1
alpha*df2['Adj Close'][1] + (1-alpha)*df2['Adj Close'][0]   # ≈ 180.205296, matching the second EWM_0.1 entry
Let us continue to create another EMA series with α = 0.5. In other words, we
assign an equal weightage to the current observation and historical ones:
alpha = 0.5
df2['EWM_'+str(alpha)] = df2['Adj Close'].ewm(alpha=alpha, adjust=False).mean()
df2.head()
             Adj Close       SMA-3  SMA-20     EWM_0.1     EWM_0.5
Date
2022-01-03  180.434296         NaN     NaN  180.434296  180.434296
2022-01-04  178.144302         NaN     NaN  180.205296  179.289299
2022-01-05  173.405685  177.328094     NaN  179.525335  176.347492
2022-01-06  170.510956  174.020315     NaN  178.623897  173.429224
2022-01-07  170.679489  171.532043     NaN  177.829456  172.054357
Let us put all these moving averages in a single chart. Here, the plot() function
treats all four columns as four separate series to be plotted against the index column,
as shown in Listing 5-14.
df2.plot(linewidth=3, figsize=(12,6))
plt.title('Daily adjusted closing price with SMA and EWM', fontsize=20)
plt.xlabel('Date', fontsize=16)
plt.ylabel('Price', fontsize=16)
Listing 5-14 Plotting all moving averages together
Running these commands generates Figure 5-13. We note that EWM_0.1 (red
line) is close to SMA-20 (green line), both of which give more weightage to historical
observations. The same is true for the other two moving averages. For EMA, a small
weighting factor α results in a high degree of smoothing, while a larger value leads to
a quicker response to recent changes.
Figure 5-13 Visualizing the daily closing prices with both SMA and EMA of different configurations
Having looked at how to compute these moving averages, the next section shows
how to use them as technical indicators to develop a trend-following strategy.
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2022-01-03 00:00:00-05:00 to
2022-12-30 00:00:00-05:00
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Adj Close 251 non-null float64
1 SMA-3 249 non-null float64
2 SMA-20 232 non-null float64
3 EWM_0.1 251 non-null float64
4 EWM_0.5 251 non-null float64
dtypes: float64(5)
memory usage: 19.9 KB
Now we will use SMA-3 and SMA-20 as the respective short-term and long-term
moving averages, whose crossover will generate a trading signal. We leave it as an
exercise to try both SMA with different window sizes and EMA with different
weighting schemes.
Note that we can only use the information up to yesterday to make a trading
decision for tomorrow. We cannot use today’s information since the closing price is
not yet available in the middle of the day. To enforce this requirement, we can shift
the moving averages one day into the future, as shown in the following code snippet.
This essentially says that the moving average for today is derived from historical
information up to yesterday.
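A minimal sketch of this shift, assuming the SMA columns created earlier:
df2['SMA-3'] = df2['SMA-3'].shift(1)    # today's value now reflects data up to yesterday
df2['SMA-20'] = df2['SMA-20'].shift(1)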
Now let us implement the trading rule: buy if SMA-3 > SMA-20, and sell if
SMA-3 < SMA-20. Such an if-else condition can be created using the
np.where() function, as shown in Listing 5-15.
# identify buy signal
df2['signal'] = np.where(df2['SMA-3'] > df2['SMA-20'], 1, 0)
# identify sell signal
df2['signal'] = np.where(df2['SMA-3'] < df2['SMA-20'], -1, df2['signal'])
df2.dropna(inplace=True)
Listing 5-15 Creating and identifying buy and sell signals
Here, a normal trading day would assume a value of either 1 or –1 in the signal
column. When there is a missing value or other special cases, we set it to 0. We also
use the dropna() function to ensure that the DataFrame is of good quality by
dropping rows with any NA/missing value in it.
We can check the frequency distribution of the signal column as follows:
>>> df2['signal'].value_counts()
-1 135
1 96
Name: signal, dtype: int64
The result shows that there are more days with a short (sell) signal than with a long
(buy) signal, which is consistent with the downward-trending price series shown earlier.
Next, we introduce a baseline strategy called buy-and-hold, which simply means
we hold one share of Apple stock until the end of the whole period. Also, we will use
the log return instead of the raw return to facilitate the calculations. Therefore, instead
of taking the ratio of consecutive stock prices to get the growth factor St + 1/St, we now take the
difference log St + 1 − log St to get the log return, which can then be exponentiated to
convert back to the growth factor St + 1/St.
The following code snippet calculates the instantaneous logarithmic single-period
return, where we first take the logarithm of the adjusted closing prices and then call
the diff() function to obtain the differences between consecutive pairs of prices:
df2['log_return_buy_n_hold'] = np.log(df2['Adj Close']).diff()
Now comes the calculation of the single-period return for the trend-following
strategy. Recall the signal column we created earlier. This column represents
whether we go long (valued 1) or short (valued –1) in a position for every single
period. Note also that the logarithmic return is positive if St + 1 > St and
negative if St + 1 < St. This creates the following four scenarios when the asset moves
from St to St + 1:
When we long an asset and its logarithmic return is positive, the trend-following
strategy reports a positive return, that is, 1 × (log St + 1 − log St) > 0.
When we long an asset and its logarithmic return is negative, the trend-following
strategy reports a negative return, that is, 1 × (log St + 1 − log St) < 0.
When we short an asset and its logarithmic return is positive, the trend-following
strategy reports a negative return, that is, (−1) × (log St + 1 − log St) < 0.
When we short an asset and its logarithmic return is negative, the trend-following
strategy reports a positive return, that is, (−1) × (log St + 1 − log St) > 0.
Summarizing these four scenarios, we can obtain the single-period logarithmic
return for the trend-following strategy by multiplying signal with the
log_return_buy_n_hold (the single-period logarithmic return based on the
buy-and-hold strategy), as shown in Listing 5-16.
df2['log_return_trend_follow'] = df2['signal'] * df2['log_return_buy_n_hold']
Listing 5-16 Calculating the log return of the trend-following strategy
Compared with the buy-and-hold strategy, the key difference is the additional
shorting actions generated by the trend-following strategy. That is, when the stock
price drops, the buy-and-hold strategy will register a loss, while the trend-following
strategy will make a profit if the trading signal is to go short. Creating a good trading
signal thus makes all the difference.
Next, we create explicit trading actions. The signal column tells us whether we
should go long or short in the given asset under the trend-following strategy.
However, this does not mean we need to make a trade at every period. If the signal
remains the same for two consecutive periods, we simply hold on to the position and
remain seated. In other words, there is no trading action for this specific trading day.
This applies in the case of two consecutive 1s or –1s in the signal column.
On the other hand, we will make an action when there is a sign switch in the
trading signal, changing from 1 to –1 or from –1 to 1. The former means changing
from longing a unit of stock to shorting it, while the latter means the reverse.
To create the trading actions, we can use the diff() method again on the
signal column, as shown in the following:
df2['action'] = df2.signal.diff()
We can produce a frequency count of different trading actions using the
value_counts() function:
>>> df2['action'].value_counts()
0.0 216
2.0 7
-2.0 7
Name: action, dtype: int64
The result shows that the majority of the trading days do not require action. For
the 14 days with a trading action, 7 days change the position from short to long, and
another 7 change from long to short.
We can visualize these trading actions as triangles on the graph with stock prices
and SMAs. In Listing 5-17, we indicate a buy action via the green triangle facing
upward when the short-term SMA crosses above the long-term SMA. On the other
hand, we use a red triangle facing downward to indicate a sell action when the short-
term SMA crosses below the long-term SMA.
plt.rcParams['figure.figsize'] = 12, 6
plt.grid(True, alpha=.3)
plt.plot(df2['Adj Close'], label='Adj Close')
plt.plot(df2['SMA-3'], label='SMA-3')
plt.plot(df2['SMA-20'], label='SMA-20')
plt.plot(df2.loc[df2.action == 2].index, df2['SMA-3'][df2.action == 2],
         '^', color='g', markersize=12)
plt.plot(df2[df2.action == -2].index, df2['SMA-20'][df2.action == -2],
         'v', color='r', markersize=12)
plt.legend(loc=1);
Listing 5-17 Visualizing trading actions
Running these commands generates Figure 5-14. Again, we denote the green
triangles as acting from short to long and the red triangles as moving from long to
short.
Figure 5-14 Visualizing the trading actions, including going from short to long (green triangles) and long to
short (red triangles)
Let us analyze the cumulative returns of each period for both trading strategies.
Specifically, we would like to obtain the final percentage return at the end of 2022 if
we started with one unit of Apple stock at the beginning of 2022, comparing the two
trading strategies.
Recall that we need to multiply the 1+R return at each period to carry out the
compounding process in order to obtain the terminal return (after subtracting one).
We also know that the 1+R return is the same as the ratio between two consecutive
prices, that is, 1 + R = St + 1/St. Therefore, to calculate the terminal return, we first
convert the returns from the logarithmic format to the usual percentage format using
the np.exp() function, then carry out the compounding by performing a cumulative
product operation using the cumprod() method. This is achieved via Listing 5-18,
where we leave out the last step of subtracting by one and report the 1+R return.
plt.plot(np.exp(df2['log_return_buy_n_hold']).cumprod(), label='Buy-n-hold')
plt.plot(np.exp(df2['log_return_trend_follow']).cumprod(), label='Trend following')
plt.legend(loc=2)
plt.title("Cumulative return of different trading strategies")
plt.grid(True, alpha=.3)
Listing 5-18 Visualizing cumulative returns
Running these commands generates Figure 5-15, which shows that the trend-
following strategy clearly outperforms the buy-and-hold strategy. However, note that
this is a simplified setting that does not take into account transaction cost and other
market factors. More analyses and tests are needed to assess the performance of this
trading strategy (also many others) in the real-world environment.
Figure 5-15 Comparing the cumulative return of buy-and-hold and trend-following strategies for one share of
Apple’s stock
It turns out that sticking to the buy-and-hold strategy would have lost about 25%, while
the trend-following strategy would have generated a terminal return of about 7%.
Summary
In this chapter, we covered the basics of the popular trend-following strategy and its
implementation in Python. We started with an exercise on working with log returns
and then transitioned to different moving averages as commonly used technical
indicators, including simple moving averages and exponential moving averages.
Lastly, we discussed how to generate trading signals and calculate the performance
metrics using this strategy, which will serve as a good baseline strategy as we delve
into other candidates later on.
Exercises
Explain why log returns are symmetric mathematically.
How can we deal with a situation where the price point at a given day is missing
when calculating its moving average?
How does the value of the window size affect the smoothness of the SMA? What
about the impact of α on the smoothness of EMA?
Change the code to obtain a moving median instead of a moving average. Discuss
the difference between the median and the mean. How about maximum and
minimum over the same rolling window?
Switch to EMA to derive the trading signals and discuss the results.
Show mathematically why the log returns are additive over time and explain the
significance of this property in the context of asset returns.
Suppose there are multiple missing price points in your data. How would you
modify the moving average calculation to handle these gaps? What are the
potential issues with your approach?
Experiment with different window sizes for SMA and different values of α for
EMA. Discuss how these parameters affect the sensitivity of the moving averages
to price changes. How would you choose an optimal value for these parameters?
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_6
Momentum trading is a strategy that makes use of the strength of price movements as a basis for opening
positions, either longing or shorting a set of assets. It involves buying and/or selling a selected set of assets
according to the recent strength of price trends, assuming that these trends will continue in the same direction if
there is enough force behind a price move. When using momentum trading, traders intend to capitalize on the force
or speed of price movements to determine investment positions. They would either initiate long or short positions
in a curated selection of assets based on the recent vigor of price trends. Crucially, the key presumption
underpinning this approach is that existing trends, given that their force is strong enough, will persist in the same
direction.
When an asset displays an upward trend, registering higher prices, it invariably attracts more attention from a
wider spectrum of traders and investors. The heightened attention garnered by the asset fuels its market price
further. This momentum endures until a significant number of sellers enter the market, supplying an
abundance of the asset. Once enough sellers are in the market, the momentum changes its direction and forces an
asset’s price to go lower. This is essentially the price dynamics between supply and demand. At this juncture,
market participants may reassess the fair price of the asset, which may be perceived as overvalued due to the
recent price surge.
In other words, as more sellers infiltrate the market, the momentum alters its course, pushing the asset’s price
in a downward direction. This is essentially a representation of the classic supply and demand dynamics and the
shift from an environment with more buyers than sellers to one where sellers outweigh buyers. Also, note that
while price trends can persist for an extended period, they will inevitably reverse at some point. Thus, the ability to
identify these inflection points and adjust the positions accordingly is also of equal importance.
Figure 6-1 Characterizing the momentum trading strategy for three stocks
The momentum trading strategy is particularly effective in equities, offering a systematic approach to compare
and analyze similar assets. It performs a cross-sectional analysis across the equity universe (in this case, three
stocks), evaluating and rank-ordering the constituents based on their relative performances over a specified
lookback period. This process enables traders to identify strong performers and potential laggards, using their
recent momentum as a proxy for future performance.
In making a trading decision, the momentum strategy often embraces a two-pronged approach, establishing a
portfolio with two legs. The first leg is the “long” leg, consisting of top-ranked assets projected to maintain their
strong upward price momentum. Traders buy these stocks with an expectation of price appreciation, aiming to sell
at a higher price in the future. The second leg is the “short” leg, made up of bottom-ranked assets showing signs of
declining price momentum. Traders sell these stocks, often through short-selling, where they borrow the stock to
sell in the market with the intent to buy it back at a lower price later. The idea is to profit from the anticipated price
decline of these assets. By going long on assets with strong positive momentum and short on assets with negative
momentum, traders can potentially benefit from both rising and falling markets, provided the identified momentum
persists over the holding period.
Note that momentum strategies, grounded in the principle of relative momentum, maintain their long and short
positions irrespective of the broader market trends. These strategies function on the assumption that the strongest
performers and underperformers will persist in their respective trajectories, thus maintaining their relative
positions in the investment universe. In other words, in a bullish market environment, the stocks with the strongest
upward momentum are expected to outperform the market. Meanwhile, during bearish phases, these same high-
momentum stocks may fall in price, but they are still expected to perform better than other stocks that are falling
more rapidly. Conversely, the bottom-ranked stocks, showing declining momentum, are expected to underperform
the market. In a rising market, these stocks may increase in value, but at a slower pace than the market. Similarly,
in a falling market, these stocks are anticipated to decline more rapidly than the broader market. Thus, irrespective
of whether the market is bullish or bearish, momentum strategies rely on the persistence of relative performance.
Contrary to the momentum trading strategy, which mandates regular trading based on a predefined lookahead
window, the trend-following strategy operates without a set trading frequency. Rather, it’s driven entirely by the
data at hand. Trading actions are informed by the moving averages’ interactions, leading to potentially less
frequent but more strategically timed trades. Such a mechanism makes the trend-following strategy more flexible
as it adapts to the market’s movements.
Note that in a trend-following strategy, the primary concern is whether an asset is on an upward or downward
trend. When employing this strategy, traders do not focus on the comparative performance of different assets
against each other, as in a momentum strategy. Rather, their interest lies in identifying and capitalizing on
established price trends of individual assets. The underlying assumption for this strategy is that the identified asset
prices that have been rising or falling steadily over a period will continue to move in the same direction. So, a
trader would go long when an asset shows an upward trend and go short when it’s on a downward trend. The
action is to “ride the wave” as long as the trend continues. The “trendiness” of the market completely determines
the trading decisions of the strategy.
In summary, while both strategies aim to exploit market momentum, the trend-following strategy involves time
series analysis that relies on the absolute momentum in historical prices of the same asset, and the momentum
trading strategy involves cross-sectional analysis that relies on the relative momentum across multiple assets.
Thus, these two strategies are fundamentally different from each other.
The next section introduces implementing the momentum trading strategy using Python.
import pandas as pd
import requests
from bs4 import BeautifulSoup
import os
import numpy as np
import yfinance as yf
Listing 6-1 Importing relevant packages
Next, we write a function called fetch_info() to complete the scraping task. As shown in Listing 6-2, we
first assign the web link to the url variable and store the header details in the headers variable. The headers are
necessary metadata upon visiting a website. We then send a GET request to obtain information from the specified
web link via the requests.get() method and pull and parse the data out of the scraped HTML file using
BeautifulSoup(), stored in the soup variable. We can then find the meat in the soup by passing the specific
node name (table in this case) to the find_all() function, read the HTML data into a DataFrame format
using the read_html() function from Pandas, and drop the unnecessary column (the Notes column) before
returning the DataFrame object. Finally, if the scraping fails, the function will print out an error message via a
try-except control statement.
def fetch_info():
    try:
        url = "https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average"
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0',
            'Accept': 'application/json',
            'Accept-Language': 'en-US,en;q=0.5',
        }
        # Send GET request
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")
        # Get the symbols table
        tables = soup.find_all('table')
        # Convert table to dataframe
        df = pd.read_html(str(tables))[1]
        # Cleanup
        df.drop(columns=['Notes'], inplace=True)
        return df
    except:
        print('Error loading data')
        return None
Listing 6-2 Fetching relevant information from the web page
Now let us call the function to store the result in dji_df and output the first five rows, as shown in the
following:
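A sketch of the call:
dji_df = fetch_info()
dji_df.head()   # first five constituents of the Dow Jones Industrial Average table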
We can then take the Symbol column, extract the values, and convert it to a list format:
tickers = dji_df.Symbol.values.tolist()
With the DJI tickers available, we can now download the stock prices for these ticker symbols using the
yfinance package.
start_date = "2021-01-01"
end_date = "2022-09-01"
df = yf.download(tickers, start=start_date, end=end_date)
Listing 6-3 Downloading the daily stock prices of DJI tickers
By now, we have stored the stock prices of the 30 DJI constituents, with each column representing one ticker
and each row indicating a corresponding trading day. The index of the DataFrame follows the datetime format.
Next, we convert the daily stock prices to monthly returns.
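A sketch of the chained conversion, assuming we work with the adjusted closing prices and store the result in mth_return_df (the name used later):
mth_return_df = (
    df['Adj Close']
    .pct_change()                         # daily simple returns
    .resample('M')                        # monthly bins
    .agg(lambda x: (1 + x).prod() - 1)    # compound within each month
)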
Although chaining together relevant operations looks more concise, it is not the best way to learn these
operations if this is the first time we encounter them. Let us decompose these operations. The first operation is to
call the pct_change() method, which is a convenient function widely used in many contexts. Next comes the
resample() function, which is a convenient method for frequency conversion and resampling of time series
data. Let us use some dummy data to understand this function.
The following code snippet creates a Pandas Series object with nine integers ranging from zero to eight, which
are indexed by nine one-minute timestamps:
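One way to construct such a series, mirroring the standard Pandas documentation example:
index = pd.date_range('1/1/2000', periods=9, freq='T')   # nine one-minute timestamps
series = pd.Series(range(9), index=index)                # the integers 0 through 8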
We then aggregate the series into three-minute bins and sum the values of the timestamps falling into a bin, as
shown in the following code snippet:
>>> series.resample('3T').sum()
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64
As we can see from the result, the resample() function groups the data by the
specified interval, and the method chained after it (sum() in this case) summarizes the data within each interval.
Back to our running example, we downsample the raw daily returns into monthly returns, so each month is
represented with only one data point instead of one per trading day (roughly 21 in a typical month). The aggregation works by cumulating all
daily returns following the same procedure: converting to the 1+R format, compounding, and then converting back to
the simple return.
The new thing here is the lambda function. We use the x symbol to represent a general input argument. In this
case, it will be all the raw daily returns in a given month. Since this lambda function performs a customized
operation, we use the agg() function to carry through the customized function, instead of using the built-in
function such as sum() as before.
By now, we have converted the daily returns to monthly representations where every single monthly return
represents the terminal return of the daily returns compounded within the month. Next, we calculate another metric
using historical monthly returns to indicate the current month’s stock performance.
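A sketch consistent with the description that follows, storing the rolling six-month compounded return in past_cum_return_df (the name used later):
past_cum_return_df = mth_return_df.rolling(6).apply(lambda x: (1 + x).prod() - 1)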
By now, we have calculated the six-month terminal monthly return as the cumulative return of the past six
months, including the current month. This also explains why the first five months show empty values in the
previous result and the cumulative monthly returns only start from the sixth month.
Next, we look at using these terminal returns to generate trading signals.
Since our data lasts until 2022-08-31, we will use 2022-07-31 as the trade formation period. To generate a
trading strategy, we will use the terminal monthly return from the previous month indexed at 2022-06-30 as the
end of the measurement period. We resort to the datetime package to encode these two dates, as shown in
Listing 6-6.
import datetime as dt
end_of_measurement_period = dt.datetime(2022,6,30)
formation_period = dt.datetime(2022,7,31)
Listing 6-6 Identifying the measurement and formation periods
These dates will then be used to slice the cumulative monthly return DataFrame stored in
past_cum_return_df. In the following code snippet, we pass the end_of_measurement_period
variable to the .loc[] property of past_cum_return_df to perform label-based indexing at the row level.
Since the result is Pandas Series indexed by the 30 ticker symbols, we will use the reset_index() method to
reset its index to zero-based integers and bring the symbols as a column in the resulting DataFrame. The following
code snippet shows the resulting cumulative terminal returns at the end of the measurement period:
end_of_measurement_period_return_df = past_cum_return_df.loc[end_of_measurement_period]
end_of_measurement_period_return_df = end_of_measurement_period_return_df.reset_index()
>>> end_of_measurement_period_return_df.head()
index 2022-06-30 00:00:00-04:00
0 AAPL -0.227936
1 AMGN 0.099514
2 AXP -0.144964
3 BA -0.320882
4 CAT -0.126977
These six-month terminal monthly returns of the 30 DJI constituents represent the relative momentum of each
stock. We can observe the stock symbols and returns with the highest momentum in the positive and negative
directions using the following code snippet:
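A sketch, assuming the return values sit in the second column of the DataFrame:
returns_col = end_of_measurement_period_return_df.columns[1]
best = end_of_measurement_period_return_df[returns_col].idxmax()    # row of the strongest positive momentum
worst = end_of_measurement_period_return_df[returns_col].idxmin()   # row of the strongest negative momentum
end_of_measurement_period_return_df.loc[[best, worst]]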
Here, we used the methods idxmax() and idxmin() to return the index of the maximum and minimum
values, respectively.
These two stocks would become the best choices if we were to long or short an asset. Instead of focusing on
only one stock in each direction (long and short), we can enlarge the space and use a quantile approach for stock
selection. For example, we can classify all stocks into five groups (also referred to as quantiles or percentiles)
based on their returns and form a trading strategy that longs the stocks in the top percentile and shorts those in the
bottom percentile.
To obtain the quantile of each return, we can use the qcut() function from Pandas, which receives a Pandas
Series and cuts it into a prespecified number of groups based on their quantiles, thus discretizing the continuous
variable into a categorical (more specifically, ordinal) one. The following code snippet provides a short
example:
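A sketch using a toy series of ten values, which qcut() splits into five equally sized quantile groups:
s = pd.Series([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
>>> pd.qcut(s, 5, labels=False)
0    0
1    0
2    1
3    1
4    2
5    2
6    3
7    3
8    4
9    4
dtype: int64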
Thus, the qcut() function rank-orders the series into five groups based on their quantiles. We can now
similarly rank-order the returns and store the result in a new column called rank, as shown in Listing 6-7.
end_of_measurement_period_return_df['rank'] = pd.qcut(
    end_of_measurement_period_return_df.iloc[:, 1], 5, labels=False)
>>> end_of_measurement_period_return_df.head()
index 2022-06-30 00:00:00-04:00 rank
0 AAPL -0.227936 1
1 AMGN 0.099514 4
2 AXP -0.144964 2
3 BA -0.320882 0
4 CAT -0.126977 2
Listing 6-7 Rank-ordering the stocks based on cumulative terminal monthly returns
We can now use this column to select the top and bottom performers. Specifically, we will long the stocks
ranked four and short the stocks ranked zero. Let us observe the stock symbols in these two groups via Listing 6-8.
long_stocks = end_of_measurement_period_return_df.loc[
    end_of_measurement_period_return_df["rank"] == 4, "index"].values
>>> long_stocks
array(['AMGN', 'CVX', 'IBM', 'KO', 'MRK', 'TRV'], dtype=object)
short_stocks = end_of_measurement_period_return_df.loc[
    end_of_measurement_period_return_df["rank"] == 0, "index"].values
>>> short_stocks
array(['BA', 'CRM', 'CSCO', 'DIS', 'HD', 'NKE'], dtype=object)
Listing 6-8 Obtaining the stock tickers to long or short
Having identified the group of stocks to be bought or sold, we will execute the trading actions and enter into
these positions for a period of one month. Since the current period is 2022-07-31, we will evaluate the out-of-
sample performance of the momentum strategy on 2022-08-31.
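The code for the long leg is a mirror image of the short-leg code in Listing 6-10; a sketch, using relativedelta to step one month past the formation period:
from dateutil.relativedelta import relativedelta

long_return_df = mth_return_df.loc[formation_period + relativedelta(months=1),
                                   mth_return_df.columns.isin(long_stocks)]
# Displaying long_return_df shows the evaluation-month (2022-08-31) returns of the long-leg stocks.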
The result shows that the majority of the top performers are decreasing in price, which is a direct reflection of
market sentiment during that period of time. We can similarly obtain the evaluation-period performance for the
bottom performances in the short position, as shown in Listing 6-10.
short_return_df = mth_return_df.loc[formation_period + relativedelta(months=1),
                                    mth_return_df.columns.isin(short_stocks)]
>>> short_return_df
BA 0.005900
CRM -0.151614
CSCO -0.014327
DIS 0.056362
HD -0.035350
NKE -0.073703
Name: 2022-08-31 00:00:00-04:00, dtype: float64
Listing 6-10 Obtaining the performance of stocks in a short position at the evaluation period
Now we calculate the return of the evaluation period based on these two positions. We assume an equally
weighted portfolio in both positions. Thus, the final return is the average of all member stocks in the respective
position. Also, since we hold a short position for the bottom performers, we subtract the average return from the
short position in these stocks while adding the average return from the long position. Listing 6-11 completes the
calculation.
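A sketch consistent with this description, taking equal weights within each leg and subtracting the short leg's average return from the long leg's:
momentum_return = long_return_df.mean() - short_return_df.mean()   # ≈ 0.01587, i.e., 1.587%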
Therefore, the momentum trading strategy reports a final monthly return of 1.587%. Now let us compare with
the buy-and-hold strategy.
Next, we follow the same approach to calculate the monthly terminal returns, as shown in Listing 6-12.
We can then access the monthly return during the evaluation period, as shown in the following code snippet:
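A sketch under the assumption that the buy-and-hold benchmark is the Dow Jones Industrial Average index itself (ticker ^DJI), converted to monthly returns with the same procedure as before:
dji = yf.download("^DJI", start=start_date, end=end_date)   # assumption: benchmark = the DJIA index
dji_mth_return = (
    dji['Adj Close']
    .pct_change()
    .resample('M')
    .agg(lambda x: (1 + x).prod() - 1)
)
dji_mth_return.loc[formation_period + relativedelta(months=1)]   # ≈ -0.04064, i.e., -4.064%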
The buy-and-hold strategy thus reports a monthly return of –4.064% in the same evaluation period. Although
the momentum trading strategy performs better, we are still far from claiming victory here. More robust
backtesting on the out-of-sample performance across multiple periods is still needed.
Summary
In this chapter, we looked at the momentum trading strategy and its implementation in Python. We started by
comparing it with the trend-following strategy from the previous chapter, discussing their connections and
differences in terms of time series and cross-sectional analysis, as well as the different use of lookback and
lookahead windows. Next, we covered its implementation using monthly returns, focusing on the process of signal
generation and out-of-sample performance evaluation.
In the next chapter, we will learn a systematic way of assessing different trading strategies using backtesting.
Exercises
Play around with the parameters of the momentum trading strategy (such as the window size) and assess the
performance.
Try implementing the momentum trading strategy on a different set of assets, such as commodities, forex, or
cryptocurrencies. Discuss any differences or similarities you observe in the performance of the strategy.
Try to create a hybrid strategy that combines both momentum trading and trend following. How does this hybrid
strategy perform compared to the stand-alone strategies?
Try to incorporate volatility measures, such as Bollinger Bands or standard deviation of returns, into the
momentum trading strategy. How does this impact the performance?
Implement the strategy using other momentum indicators such as the Relative Strength Index (RSI) or the
Moving Average Convergence Divergence (MACD). Compare their performance with the basic momentum
strategy.
Incorporate transaction costs into the momentum trading strategy. How do these costs impact the overall
profitability of the strategy?
Perform backtesting of the momentum trading strategy over different market periods (bull market, bear market,
high volatility period, etc.). How robust is the strategy across different market conditions?
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_7
As the name suggests, backtesting refers to the process of testing a trading strategy on
relevant historical data before rolling it out to the live market. It gives an indication of
the likely performance in different trading scenarios. In this chapter, we delve into the
intricacies of backtesting a trading strategy, starting with an understanding of why
backtesting is an important component in quantitative trading.
Note that while backtesting can offer insightful results, it is only as good as the
quality of the data and the assumptions underpinning the trading strategy. For example,
a trading strategy might work very well in a bull market, but it’s equally important to
know how it performs during a bear market or during periods of high market volatility.
By using backtesting, we can analyze the strategy’s robustness over different market
phases, which provides a more holistic view of its performance. Therefore, a good
practice is to choose multiple representative trading periods and record the backtesting
performances so as to obtain a robust measure of the actual performance of a specific
trading strategy.
Introducing Backtesting
Backtesting allows us to simulate a trading strategy using historical data and analyze
the risk and return before actually entering into a position. It refers to the process of
testing a particular trading strategy backward using historical data in order to assess its
performance on future data going forward. Such performance is also called the test set
performance in the context of training a machine learning model, with the common
constraint that the test set needs to be completely kept away when formulating a
strategy or training a model. This period of historical data reserved for testing purposes
allows us to assess the potential variability of the proposed trading strategy.
Building on that, backtesting offers a way to measure the effectiveness of a trading
strategy while keeping emotions and subjective bias at bay. It provides a scientific
method to simulate the actual performance of a strategy, which then can be used to
calculate various summary metrics that indicate the strategy’s potential profitability,
risk, and stability over time. Example metrics include the total return, average return,
volatility, maximum drawdown (to be covered shortly), and the Sharpe ratio.
When carrying out a backtesting procedure, one needs to avoid data snooping (i.e.,
peeking into the future) and observe the sequence of time. Even if a certain period of
historical data is used to cross-validate a strategy, one needs to ensure that the cross-
validation periods fall outside or, more specifically, after the training period. In other
words, the cross-validation period cannot exist in the middle of the training period, thus
preserving the sequence of time as we move forward.
Retrospectively testing out the hypothetical performance of a trading strategy on
historical data allows us to assess its variability over a set of aforementioned metrics.
Since the same trading strategy may exhibit completely different behavior when
backtested over various choices of investment horizons and assets, it is critical to
overlay a comprehensive set of backtesting scenarios for the particular trading strategy
before its adoption. It’s essential to conduct a thorough and varied backtesting process,
as the performance of a trading strategy can greatly vary depending on the choice of
investment horizon, the selection of assets, and the specific market conditions during
the testing period.
For example, we can use backtesting on the trend-following strategy we covered
earlier, where we use two moving averages to generate trading signals if there is a
crossover. In this process, the input consists of two window sizes: one for the short
window and one for the long window. The output is the resulting return, volatility, or
other risk-adjusted return such as the Sharpe ratio. Any pair of window sizes for the
moving averages has a corresponding performance metric, and we would change the
input parameters in order to obtain the optimal performance metric on the historical
data. More specifically, we can create a range of potential values for each parameter—
for example, we could test short moving averages from 10 to 30 days and long moving
averages from 50 to 200 days. For each combination of these parameters, we calculate
the corresponding performance metric. The optimal parameters then maximize (or
minimize, depending on the specific metric) this selected performance metric.
Caveats of Backtesting
Note that a good backtesting performance does not necessarily guarantee a good future
return. This is due to the underlying assumption of backtesting: any strategy that did
well in the past is likely to do well in the future period, and conversely, any strategy
that performed poorly in the past is likely to perform poorly in the future. Since
financial markets are complex adaptive systems that are influenced by a myriad of
factors, including economic indicators, geopolitical events, and even shifts in investor
sentiment, all these are constantly evolving and can deviate significantly from past
patterns. In summary, past performance is not indicative of future results.
That said, a well-conducted backtest that yields positive results gives some assurance
that the strategy is fundamentally sound and may yield profits when implemented in
reality. At the very least, backtesting helps us weed out the strategies that do not prove
themselves worthy. Still, the assumption that past performance carries over is fragile in
the stock market, which typically exhibits a low signal-to-noise ratio. Since financial
markets keep evolving fast, the future may exhibit patterns not present in the historical
data, making extrapolation a more difficult task than interpolation.
Another issue with backtesting is the potential to overfit a strategy such that it
performs well on the historical data used for testing but fails to generalize to new,
unseen data. Overfitting occurs when a strategy is too complex and tailors itself to the
idiosyncrasies and noise in the test data rather than identifying and exploiting the
fundamental patterns that govern the data-generating process.
In addition, the backtesting period of the historical data needs to be representative
and reflect a variety of market conditions. Excessively using the same dataset for
backtesting is called data dredging, where the same dataset may produce an
exceptionally good result purely by chance. If the backtest only includes a period of
economic boom, for instance, the strategy might appear more successful than it would
during a downturn or volatile market conditions. By assessing the trading strategy over
a comprehensive and diverse period of historical data, we can avoid data dredging and
better tell if the good performance, if any, is due to sound trading or merely a fluke.
Data dredging, or “p-hacking,” is a material concern in backtesting. It involves
repeatedly running different backtests with slightly modified parameters on the same
dataset until a desirable result is found. The danger here lies in the fact that the positive
result might just be a product of chance rather than an indication of a genuinely
effective strategy. This overfitting could lead to a strategy that performs exceptionally
well on the test data but fails miserably on new, unseen data.
On the other hand, the selection of the stocks used for backtesting also needs to be
representative, including companies that eventually went bankrupt, were sold, or
liquidated. Failing to do so produces the survivorship bias, where one cherry-picks a set
of stocks and only looks at those that survived till today and ignores others that
disappeared in the middle. By excluding companies that have failed or undergone
significant structural changes, we could end up with an overly optimistic view of the
strategy’s profitability and risk profile. This is because the stocks that have survived, in
general, are likely to be those that performed better than average. Ignoring companies
that went bankrupt or were delisted for any reason may skew the results, creating an
illusion of a successful strategy when, in reality, the strategy may not perform as well
in the real environment.
Moreover, by incorporating stocks that have underperformed or failed, we are in a
better position to assess the risk of the strategy and prepare for worst-case scenarios.
This can lead to more accurate risk and reward assessments and better inform the
decision-making process when it comes to deploying the strategy. This strategy will
also be more robust and can withstand various market conditions, including periods of
economic downturn or industry-specific shocks.
Lastly, a backtest should also consider all trading costs, however insignificant, as
these can add up over the course of the backtesting period and drastically affect the
performance of a trading strategy’s profitability. These costs can include brokerage
fees, bid-ask spreads, slippage (the difference between the expected price of a trade and
the price at which the trade is executed), and in some cases, taxes and other regulatory
fees. Overlooking these costs in backtesting can lead to an overly optimistic assessment
of a strategy’s performance. For example, a high-frequency trading strategy might seem
profitable when backtested without trading costs. However, in reality, such strategies
involve a large number of trades and, therefore, high transaction costs, which can
quickly erode any potential profits. Considering these costs during the backtesting
stage will present a more accurate estimate of the net profitability of the strategy.
Moreover, the impact of trading costs can vary greatly depending on the specifics of
the trading strategy. Strategies that involve frequent trading, narrow profit margins, or
large order sizes can be particularly sensitive to the assumptions made about trading
costs in the backtesting process.
Before diving into the specifics of backtesting, let us introduce a popular risk
measure called the maximum drawdown, or max drawdown.
The max drawdown is a risk measure that helps us understand the worst-case scenario of
the trading strategy during the backtest period. It is defined as the largest percentage
decline of the wealth curve from its historical peak. This calculation is intuitive,
since most people think of drawdown as the money they have lost compared to the peak
asset value they once owned in the past.
Figure 7-2 provides a sample wealth index curve and the corresponding single-
period drawdowns. Based on the cumulative wealth index curve in the blue line in the
left panel, we can obtain the cumulative peak value in the green line, which overlaps
with the wealth index when the wealth keeps making new highs and stays flat when the
wealth drops. We can thus form a new time series curve consisting of single-period
drawdowns as the percentage difference between these two curves and return the
lowest point as the max drawdown.
Figure 7-2 Obtaining the max drawdown based on a sample wealth index curve
Here, the max drawdown does not mean we are going to suffer such a loss; it
simply means the maximum loss we could have suffered following the particular
trading strategy. The strategy may incur such a loss if we are extremely unlucky and
happen to buy the asset at its peak price and sell it at its trough price. A strategy with a
high max drawdown would indicate a higher risk level, as it shows that the strategy has
historically resulted in substantial losses. On the other hand, a strategy with a low max
drawdown would indicate lower risk, as it has not led to significant losses in the past.
A shrewd reader may immediately wonder if there is a risk-adjusted return metric
based on drawdown risk. It turns out there is, and the measure is called the Calmar
ratio, which is calculated as the ratio between the annualized return of the trailing 36
months and the max drawdown over the same trailing 36 months.
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
start_date = "2023-01-01"
end_date = "2023-02-11"
df = yf.download(['GOOG', 'MSFT'], start=start_date, end=end_date)
>>> df.head()
Listing 7-1 Downloading the stock price data
As shown in Figure 7-3, the DataFrame has a multilayer column structure, where
the first level indicates the type of stock price and the second layer indicates the stock
ticker.
Figure 7-3 Printing the first five rows of the downloaded stock price data
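The examples that follow work with a DataFrame named df2 that holds only the adjusted closing prices of the two stocks. The intermediate step is not reproduced above; it presumably amounts to selecting the Adj Close level from the multilayer columns, for example:
df2 = df['Adj Close']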
Note that the DataFrame is indexed by a list of dates in the datetime format, as
shown in the following:
>>> df2.index
DatetimeIndex(['2023-01-03', '2023-01-04', '2023-01-05',
'2023-01-06', '2023-01-09', '2023-01-10', '2023-01-11',
'2023-01-12', '2023-01-13', '2023-01-17', '2023-01-18',
'2023-01-19', '2023-01-20', '2023-01-23', '2023-01-24',
'2023-01-25', '2023-01-26', '2023-01-27', '2023-01-30',
'2023-01-31', '2023-02-01', '2023-02-02', '2023-02-03',
'2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09',
'2023-02-10'], dtype='datetime64[ns]', name='Date',
freq=None)
We can use these date indices to subset the DataFrame by the different granularity
of time periods, such as selecting at the monthly level. As an example, the following
code snippet slices the data in February 2023:
>>> df2.loc["2023-02"]
GOOG MSFT
Date
2023-02-01 101.430000 252.750000
2023-02-02 108.800003 264.600006
2023-02-03 105.220001 258.350006
2023-02-06 103.470001 256.769989
2023-02-07 108.040001 267.559998
2023-02-08 100.000000 266.730011
2023-02-09 95.459999 263.619995
2023-02-10 94.860001 263.100006
The DataFrame we will work with contains 28 days of daily adjusted closing prices
for both stocks, ranging from 2023-01-03 to 2023-02-10. We can check these details
using the info() method:
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 28 entries, 2023-01-03 to 2023-02-10
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GOOG 28 non-null float64
1 MSFT 28 non-null float64
dtypes: float64(2)
memory usage: 672.0 bytes
>>> df2.plot.line()
As shown in Figure 7-4, both stocks maintained an increasing trend during this
period, although Google suffered a big hit in stock price near the end of the period.
To better understand the stock returns, let us convert the raw stock prices to single-
period percentage returns using the pct_change() function:
returns_df = df2.pct_change()
>>> returns_df.head()
GOOG MSFT
Date
2023-01-03 NaN NaN
2023-01-04 -0.011037 -0.043743
2023-01-05 -0.021869 -0.029638
2023-01-06 0.016019 0.011785
2023-01-09 0.007260 0.009736
Again, the first day shows an NA value since there is no prior stock price as the
baseline to calculate the daily return.
The corresponding line plot for the daily returns follows in Figure 7-5.
>>> returns_df.plot.line()
The figure suggests that the daily returns of both stocks are highly correlated,
except for the last few days when Google showed a sharp dip in price. Such a dip will
show up in the max drawdown measure, as we will see later. We also observe higher
volatility for Google compared with Microsoft.
Now let us construct the wealth index time series. We assume an initial amount of
$1000 for each stock, based on which we will observe the daily evolution of the
portfolio value, assuming a buy-and-hold strategy. Such a wealth process relies on the
sequential compounding process using the cumprod() function based on 1+R
returns, as shown in Listing 7-2.
initial_wealth = 1000
wealth_index_df = initial_wealth*(1+returns_df).cumprod()
>>> wealth_index_df.head()
GOOG MSFT
Date
2023-01-03 NaN NaN
2023-01-04 988.963234 956.256801
2023-01-05 967.335558 927.915502
2023-01-06 982.831735 938.851285
2023-01-09 989.966623 947.992292
Listing 7-2 Constructing the wealth curve
We can override the initial entry as 1000 in order to plot the complete wealth index
curve for both stocks. This essentially tracks the money we have at each time point
after we invest $1000 in each stock on day 1, that is, 2023-01-03.
wealth_index_df.loc["2023-01-03"] = initial_wealth
>>> wealth_index_df.head()
GOOG MSFT
Date
2023-01-03 1000.000000 1000.000000
2023-01-04 988.963234 956.256801
2023-01-05 967.335558 927.915502
2023-01-06 982.831735 938.851285
2023-01-09 989.966623 947.992292
Now we plot the wealth curve for both stocks, as shown in Figure 7-6.
>>> wealth_index_df.plot.line()
It appears that investing in Microsoft ends up with a higher portfolio value than
investing in Google, despite the latter taking the lead on most of the earlier trading
days. As it turns out, one of the biggest drivers behind Microsoft's strong momentum is
its investment in OpenAI, the maker of ChatGPT, and the recent integration of the
technology into its Bing search engine and Edge browser.
With the wealth index ready, we can build a new series to indicate the cumulative
peak wealth value for each trading day. This is achieved using the cummax() function
shown in Listing 7-3.
prior_peaks_df = wealth_index_df.cummax()
>>> prior_peaks_df.head()
GOOG MSFT
Date
2023-01-03 1000.0 1000.0
2023-01-04 1000.0 1000.0
2023-01-05 1000.0 1000.0
2023-01-06 1000.0 1000.0
2023-01-09 1000.0 1000.0
Listing 7-3 Constructing the cumulative maximum wealth
>>> prior_peaks_df.plot.line()
Now we are in a good position to calculate the daily drawdown as the percentage
difference between the current wealth and the prior peak. This is shown in Listing 7-4.
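Listing 7-4 itself is not reproduced above; the calculation it describes amounts to the following:
# daily drawdown: percentage gap between the current wealth and the prior peak
drawdown_df = (wealth_index_df - prior_peaks_df) / prior_peaks_df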
>>> drawdown_df.plot.line()
The sharp dip in Google's drawdown at the end of the series is now much more
noticeable, and there is a clear story behind the steep drop. Google had just introduced
Bard in response to the challenge from Microsoft's ChatGPT-powered Bing, and a factual
error in the demo caused Google shares to tank, wiping out roughly $100 billion in
market value.
Coming back to the max drawdown, we can now collect the minimum of these
daily drawdowns as the final report of the max drawdown for this trading strategy, as
shown in Listing 7-5. Note that we entered a long position in both stocks at the
beginning of the investment period, so the trading strategy is simply buy-and-hold.
>>> drawdown_df.min()
GOOG -0.128125
MSFT -0.072084
dtype: float64
Listing 7-5 Calculating the max drawdown
>>> drawdown_df.idxmin()
GOOG 2023-02-10
MSFT 2023-01-05
dtype: datetime64[ns]
We can also limit the range of the DataFrame by subsetting using a less granular
date index in the loc() function. For example, the following code returns the max
drawdown and the corresponding date for each stock in January 2023:
>>> drawdown_df.loc["2023-01"].min()
GOOG -0.044264
MSFT -0.072084
dtype: float64
>>> drawdown_df.loc["2023-01"].idxmin()
GOOG 2023-01-25
MSFT 2023-01-05
dtype: datetime64[ns]
So far, we have calculated the max drawdown by following the required steps one by one.
A function becomes extremely helpful when such steps grow tedious and complex. Wrapping
the recipe inside a function as a black box allows us to focus on the big picture and
not get bogged down by the inner workings each time we calculate the max drawdown.
We define a function called drawdown() to achieve this task, as shown in Listing
7-6. This function takes the daily returns in the form of a single Pandas Series as input,
executes the aforementioned calculation steps, and returns the daily wealth index, prior
peaks, and drawdowns in a DataFrame as the output.
Note that the calculation process remains the same. The only change is the
compilation of the relevant information (wealth index, prior peaks, and drawdown) in
one DataFrame. Also, we explicitly specified the input type to be a Pandas Series, as
this saves the need to check the input type later on.
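Listing 7-6 is not reproduced above. A reconstruction consistent with the description and with the column names in the printout below could look like this (the initial wealth of $1,000 mirrors the earlier setup):
def drawdown(return_series: pd.Series) -> pd.DataFrame:
    # compute the wealth index, prior peaks, and drawdowns from a return series
    wealth_index = 1000 * (1 + return_series).cumprod()
    prior_peaks = wealth_index.cummax()
    drawdowns = (wealth_index - prior_peaks) / prior_peaks
    return pd.DataFrame({"Wealth index": wealth_index,
                         "Prior peaks": prior_peaks,
                         "Drawdown": drawdowns})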
Now let us test this function by passing Google’s daily returns as the input series:
>>> drawdown(returns_df["GOOG"]).head()
Wealth index Prior peaks Drawdown
Date
2023-01-03 NaN NaN NaN
2023-01-04 988.963234 988.963234 0.000000
2023-01-05 967.335558 988.963234 -0.021869
2023-01-06 982.831735 988.963234 -0.006200
2023-01-09 989.966623 989.966623 0.000000
The following code snippet plots the wealth index and prior peaks as line charts:
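(The snippet itself is not reproduced above; presumably it mirrors the monthly variant shown shortly.)
>>> drawdown(returns_df["GOOG"])[['Wealth index', 'Prior peaks']].plot.line()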
Figure 7-9 Visualizing the wealth index and prior peaks as line charts
We can use the loc() function to subset for a specific month. For example, the
following code returns the same curves for January 2023:
>>> drawdown(returns_df.loc["2023-01","GOOG"])[['Wealth
index', 'Prior peaks']].plot.line()
Figure 7-10 Visualizing the wealth index and prior peaks for January 2023
Similarly, we can obtain the max drawdown and the corresponding date for both
stocks, as shown in the following code snippet. Note that the Microsoft figures differ
slightly from the earlier drawdown_df results because the drawdown() function does not
override the first entry of the wealth index with the initial $1,000, so the loss on the
very first trading day is not counted as a drawdown:
>>> drawdown(returns_df["GOOG"])['Drawdown'].min()
-0.1281250188455857
>>> drawdown(returns_df["GOOG"])['Drawdown'].idxmin()
Timestamp('2023-02-10 00:00:00')
>>> drawdown(returns_df["MSFT"])['Drawdown'].min()
-0.035032299621028426
>>> drawdown(returns_df["MSFT"])['Drawdown'].idxmin()
Timestamp('2023-01-19 00:00:00')
The following code snippet returns the max drawdown for both stocks in January
2023:
>>> drawdown(returns_df.loc["2023-01","GOOG"])
['Drawdown'].min()
-0.04426435893749917
>>> drawdown(returns_df.loc["2023-01","MSFT"])
['Drawdown'].min()
-0.035032299621028426
In the next section, we will discuss the backtesting procedure using the trend-
following strategy.
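The preparation of df_goog, a DataFrame holding Google's daily adjusted closing prices, is not shown above. A plausible reconstruction follows; the exact date range is an assumption (the head shown in Listing 7-7 only confirms that the data starts in January 2022):
# assumed data preparation for the trend-following example
df_goog = yf.download('GOOG', start="2022-01-01", end="2023-02-11")[['Adj Close']]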
Now we create two moving averages, a short curve with a span of 5 using the
exponential moving average via the ewm() method and a long curve with a window
size of 30 using the simple moving average via the rolling() method, as shown in
Listing 7-7.
sma_span = 30
ema_span = 5
short_ma = 'ema' + str(ema_span)
long_ma = 'sma' + str(sma_span)
df_goog[long_ma] = df_goog['Adj Close'].rolling(sma_span).mean()
df_goog[short_ma] = df_goog['Adj Close'].ewm(span=ema_span).mean()
>>> df_goog.head()
Adj Close sma30 ema5
Date
2022-01-03 145.074493 NaN 145.074493
2022-01-04 144.416504 NaN 144.679700
2022-01-05 137.653503 NaN 141.351501
2022-01-06 137.550995 NaN 139.772829
2022-01-07 137.004501 NaN 138.710106
Listing 7-7 Calculating the short and long moving averages
Note that the span is directly related to the α parameter we introduced earlier via
the following relationship:
$$ \alpha =\frac{2}{span+1} $$
where span ≥ 1.
Since generating the trading signal requires that both moving averages are available
at each time point, we remove the rows with any NA value in the DataFrame using the
dropna() method, where we set inplace=True to change within the DataFrame
directly:
df_goog.dropna(inplace=True)
>>> df_goog.head()
Adj Close sma30 ema5
Date
2022-02-14 135.300003 137.335750 137.064586
2022-02-15 136.425507 137.047450 136.851559
2022-02-16 137.487503 136.816483 137.063541
2022-02-17 132.308502 136.638317 135.478525
2022-02-18 130.467499 136.402200 133.808181
Now let us plot these two moving averages together with the original price curve
via the following code snippet:
fig = plt.figure(figsize=(14, 7))
plt.plot(df_goog.index, df_goog['Adj Close'], linewidth=1.5, label='Daily Adj Close')
plt.plot(df_goog.index, df_goog[long_ma], linewidth=2, label=long_ma)
plt.plot(df_goog.index, df_goog[short_ma], linewidth=2, label=short_ma)
plt.title("Trend following strategy")
plt.ylabel('Price($)')
plt.legend()
Figure 7-11 Visualizing the moving averages together with the raw time series
As Figure 7-11 suggests, the short moving average (green curve) tracks the raw
time series more closely, while the long moving average (orange curve) displays a
smoother pattern due to a stronger averaging effect.
Now let us calculate the log returns of the buy-and-hold strategy, which assumes
buying one share of Google stock and holding it till the end of the investment period.
This is shown in Listing 7-8.
df_goog['log_return_buy_n_hold'] = np.log(df_goog['Adj Close'] / df_goog['Adj Close'].shift(1))
Listing 7-8 Calculating the log returns of the buy-and-hold strategy
df_goog['log_return_buy_n_hold'] = np.log(df_goog['Adj Close']).diff()
Listing 7-9 An equivalent way of calculating the log returns
Next, we identify the trading signals for the trend-following strategy, starting by
creating a signal column that indicates the intended position based on the magnitude of
the two moving averages. This is shown in Listing 7-10.
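Listing 7-10 is not reproduced above. A minimal long-only reconstruction sets the position to 1 when the short moving average sits above the long one and to 0 otherwise; whether the original listing also takes short positions (-1) is not confirmed here:
# intended position: long (1) when the short MA is above the long MA, flat (0) otherwise
df_goog['signal'] = np.where(df_goog[short_ma] > df_goog[long_ma], 1, 0)
In practice, one would typically lag the signal by one period (e.g., with shift(1)) before applying it to returns to avoid look-ahead bias.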
The periodic log returns for the trend-following strategy can be obtained by
multiplying signal with log_return_buy_n_hold via Listing 7-11.
df_goog['log_return_trend_follow'] = df_goog['signal'] * df_goog['log_return_buy_n_hold']
Listing 7-11 Calculating the periodic log returns of the trend-following strategy
The terminal return can be calculated using the cumprod() function or the
prod() function, as shown in Listing 7-12. The first approach calculates the
compounded periodic return and accesses the last period as the final return before
converting to the simple return format. The second approach directly multiplies all
intermediate percentage returns to get the final return as the last period, followed by
conversion to a simple return.
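Listings 7-12 and 7-13 are not reproduced above. A sketch of the two approaches, applied to the trend-following log returns and using variable names that mirror Listing 7-15, could be:
# periodic growth factors, converted from log returns
growth = np.exp(df_goog['log_return_trend_follow'])
# approach 1: compound the periodic returns and read off the last value
terminal_return_1 = growth.cumprod().iloc[-1] - 1
# approach 2: multiply all periodic growth factors directly
terminal_return_2 = growth.prod() - 1
# annualized returns used later for the Sharpe ratio
annualized_return_trend_follow = growth.prod() ** (252 / df_goog.shape[0]) - 1
annualized_return_buy_n_hold = np.exp(df_goog['log_return_buy_n_hold']).prod() ** (252 / df_goog.shape[0]) - 1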
Note that we can also add up all the log returns, exponentiate the sum, and annualize
the result to arrive at the same figure:
>>> np.exp(df_goog['log_return_trend_follow'].sum())**(252/df_goog.shape[0]) - 1
0.4210313983829783
Let us calculate the annualized volatility, as shown in Listing 7-14. Recall that the
daily volatility scales up as a function of the square root of time.
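Listing 7-14 is not reproduced above; a sketch using the square-root-of-time rule, with names matching Listing 7-15, could be as follows. Whether the book computes the standard deviation on log or simple returns is not shown, but the two are close for small daily moves.
# annualize daily volatility by scaling with the square root of 252 trading days
annualized_vol_buy_n_hold = df_goog['log_return_buy_n_hold'].std() * np.sqrt(252)
annualized_vol_trend_follow = df_goog['log_return_trend_follow'].std() * np.sqrt(252)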
Now we calculate the Sharpe ratio, assuming a risk-free interest rate of 3%. This is
shown in Listing 7-15.
riskfree_rate = 0.03
# calculate Sharpe ratio of buy-n-hold
sharpe_ratio_buy_n_hold = (annualized_return_buy_n_hold - riskfree_rate) / annualized_vol_buy_n_hold
>>> sharpe_ratio_buy_n_hold
-1.0569661045137495
# calculate Sharpe ratio of trend following
sharpe_ratio_trend_follow = (annualized_return_trend_follow - riskfree_rate) / annualized_vol_trend_follow
>>> sharpe_ratio_trend_follow
0.9953569038205886
Listing 7-15 Calculating the Sharpe ratio
Lastly, we calculate the max drawdown of both strategies, as shown in Listing 7-16.
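Listing 7-16 is not reproduced above. One way to obtain the max drawdowns is to reuse the drawdown() helper defined earlier on simple returns derived from the log returns; this is a sketch rather than the original listing:
# convert log returns back to simple returns before reusing the drawdown() helper
max_dd_buy_n_hold = drawdown(np.exp(df_goog['log_return_buy_n_hold']) - 1)['Drawdown'].min()
max_dd_trend_follow = drawdown(np.exp(df_goog['log_return_trend_follow']) - 1)['Drawdown'].min()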
The two strategies turn out to be quite different across these measures, which
highlights the importance of demonstrating a strategy's superiority across a set of
common backtesting measures before adopting it. In the next chapter, we will discuss a
feedback loop that optimizes the selection of trading parameters, such as the window
size, in order to obtain the best trading performance for a given trading strategy.
Summary
In this chapter, we covered the process of backtesting a trading strategy. We started by
introducing the concept of backtesting and its caveats. We then introduced the
maximum drawdown, a commonly used measure of the downside risk of a particular trading
strategy, followed by its calculation process. Lastly, we provided an
example of how to backtest a trend-following strategy via multiple performance
measures.
In the next chapter, we will introduce statistical arbitrage with hypothesis testing,
with the pairs trading strategy as the working example.
Exercises
Asset A loses 1% a month for 12 months, and asset B gains 1% per month for 12
months. Which is the more volatile asset?
Drawdown is a measure of only downside risk and not upside risk. True or false?
Assume the risk-free rate is never negative. The drawdown of an investment that
returns the risk-free rate every month is zero. True or false?
The drawdown computed from a daily return series is always greater than or equal to
the drawdown computed from the corresponding monthly series. True or false?
Write a class to calculate the annualized return, volatility, Sharpe ratio, and max
drawdown of a momentum trading strategy.
How does the frequency of data sampling affect the calculated max drawdown?
What might be the implications of using daily data vs. monthly data?
Assume you have calculated a Sharpe ratio of 1.5 for your trading strategy. If the
risk-free rate increases, what would happen to the Sharpe ratio, all else being equal?
If a strategy has a positive average return but a high max drawdown, what might this
suggest about the risk of the strategy?
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_8
Statistical arbitrage is a market-neutral trading strategy leveraging statistical methods to identify and
exploit significant relationships between financial assets. Through hypothesis testing, it discerns pricing
discrepancies within correlated asset pairs caused by temporary market inefficiencies. By purchasing
underpriced and selling overpriced assets, the strategy seeks to profit as the market corrects these
inefficiencies, regardless of overall market movements.
Statistical Arbitrage
Statistical arbitrage refers to the use of statistical methods to identify statistically significant
relationships underlying multiple financial assets and generate trading signals. There are two parts
involved in this process: statistical analysis and arbitrage. In this context, statistical analysis mostly
refers to hypothesis testing, which is a suite of statistical procedures that allows us to determine if a
specific relationship among multiple financial instruments based on the observed data is statistically
significant. On the other hand, arbitrage means making sure-win profits.
At its core, this strategy relies on mean reversion, which assumes that financial instruments that
have deviated far from their historical relationship will eventually converge again. For instance,
consider two highly correlated stocks, A and B. If, due to some short-term market factors, the price of A
increases disproportionately compared to B, a statistical arbitrage strategy might involve short-selling A
(which is now overpriced) and buying B (which is underpriced). As the prices of A and B revert to their
historical correlation, the arbitrageur would close both positions—buy A to cover the short sell and sell
B to realize the gain. The net profit comes from the convergence of prices. Therefore, statistical
arbitrage is essentially a market-neutral strategy, generating profits by taking advantage of temporary
market inefficiencies.
Note that statistical arbitrage strategies should expect a relatively stable long-term equilibrium
relationship between the two underlying assets for the strategy to work. They also operate on relatively
small profit margins, necessitating high volumes of trades to generate substantial returns.
Delving deeper, the first step in the statistical arbitrage process is to identify pairs of trading
instruments that exhibit a high degree of comovement. This can be achieved through statistical
procedures such as correlation analysis or cointegration tests. For instance, consider stocks A and B,
which typically move in sync with each other. Although perfect correlation is rare in financial markets,
we can leverage historical price data to find stocks that are highly correlated, often within the same
industry or sector.
However, this comovement doesn’t always mean equal price changes. Short-term fluctuations
driven by various factors like market sentiment, sudden news announcements, or unforeseen events like
a pandemic can cause a temporary divergence in the price relationship. In the given example, if stock A
increases by 10% and stock B only by 5%, it suggests a temporary mispricing where B is underpriced
relative to A.
This brings us to the second step, which involves capitalizing on this mispricing through trading
actions such as pairs trading. In the case of A and B, an investor could execute a long position on the
underpriced stock B, expecting its price to increase and converge with the price of A.
It’s important to note that statistical arbitrage relies heavily on the premise that these pricing
inefficiencies are temporary and that the price relationship will revert to its historical norm. Therefore,
this strategy necessitates diligent monitoring and a robust risk management system to ensure timely
entries and exits.
Figure 8-1 illustrates one way of performing statistical arbitrage. We assume a perfect correlation
between stocks A and B, where the same percentage change is observed for periods 0, 1, and 2.
However, stock A increased by 10% in period 3, while stock B only increased by 5%. Based on the
principle of statistical arbitrage, we could long stock B, which is considered to be underpriced, or short
stock A, which is considered overpriced. We could also do both at the same time.
Figure 8-1 Illustrating the concept of statistical arbitrage. After identifying a perfect correlation between stocks A and B using statistical
techniques, as indicated by the prices in periods 0, 1, and 2, we would take advantage of market mispricing by longing stock B (which is
underpriced) and/or shorting stock A (which is overpriced)
Pairs Trading
Pairs trading is a market-neutral strategy that leverages statistical analysis to generate potential profits
regardless of the overall market direction. The “pair” in pairs trading refers to simultaneously taking
two positions: going long on one asset and short on another, with the key requirement being that these
assets are highly correlated. The trading signal stems from the spread or price difference between these
two assets.
An unusually large spread, in comparison to historical data, suggests a temporary divergence, and
the anticipation is that this divergence will eventually correct itself, reverting to its mean or average
value over time. Traders can capitalize on this mean-reverting behavior, initiating trades when the
spread is abnormally wide and closing them once the spread narrows and returns to its typical range.
The determination of what constitutes an “abnormal” or “normal” spread is crucial and forms the
core parameters of the pairs trading strategy. This typically involves extensive backtesting, where
historical price data is analyzed to identify consistent patterns in price divergence and convergence,
which then informs the thresholds for trade entry and exit points. Pairs trading, while robust in its
market-neutral stance, requires a keen understanding of the long-term equilibrium relationship between
the paired assets and careful management of potential risks if the expected price convergence does not
materialize.
In the strategy of pairs trading, asset selection is grounded in a statistical procedure called
hypothesis testing, specifically, the cointegration test. This process uses historical price data to identify
pairs of financial instruments that exhibit a high level of correlation. When two assets are highly
correlated, they tend to move in a synchronized manner. This means that any price change in one asset
is typically mirrored proportionally by the other, resulting in relatively stable spreads that do not deviate
significantly from their historical average. However, there can be moments when this spread deviates
markedly from its historical norm, suggesting temporary mispricing of the assets. This divergence
indicates that the assets’ prices have drifted apart more than their usual correlation would predict.
Such deviations create a unique profit opportunity in pairs trading. Traders can capitalize on these
large spreads by betting on their future contraction. Specifically, the strategy would be to go long on the
underpriced asset and short on the overpriced one, with the anticipation that the spread will revert back
to its historical average as the asset prices correct themselves. This reversion provides the opportunity
to close both positions at a profit.
Figure 8-2 provides the overall workflow of implementing a pairs trading strategy. At first, we
analyze a group of financial assets (such as stocks) and identify a pair that passes the cointegration test.
This is a statistical test that determines if a group of assets is cointegrated, meaning their combination
generates a stationary time series, despite each individual time series not exhibiting such stationarity. In
other words, the historical differences, or spreads, of the two cointegrated assets form a stationary time
series. We can thus monitor the current spread and check if it exceeds a reasonable range of historical
spreads. Exceeding the normal range indicates a trading signal to enter two positions: long the
underpriced asset and short the overpriced asset. We would then hold these positions until the current
spread shrinks back to the normal range, upon which point we would exit the positions and lock in a
profit before it shrinks even further (which results in a loss).
Cointegration
Cointegration, a concept pivotal to hypothesis testing, posits two potential scenarios: the null
hypothesis, which states that two or more non-stationary time series are not cointegrated, and the
alternative hypothesis, which claims the opposite, that is, these time series are cointegrated if their
linear combination generates a stationary time series (more on this later).
Let’s demystify some of the jargon here. A time series refers to a sequence of data points indexed
(or listed or graphed) in time order, with each data point assigned a specific timestamp. This dataset can
be analyzed through several summary statistics or statistical properties. These can include metrics like
mean and variance computed over a certain time frame or window.
Moving this window across different periods, a stationary time series exhibits constancy in its mean
and variance on average. This means that no matter when you observe it, its basic properties do not
change. On the other hand, a non-stationary time series demonstrates a trend or a drift, signifying a
changing mean and variance across varying time periods. These time series are dynamic, with their
basic properties shifting over time, often due to factors like trends and seasonality.
Hence, the process of cointegration examines whether there is a long-term equilibrium relationship
between non-stationary time series despite short-term fluctuations. Such long-term equilibrium
manifests as a stationary time series as a linear combination of the two non-stationary time series.
Many traditional statistical methods, including ordinary least squares (OLS) regression, are based
on the assumption that the variables under analysis—which are also time series data points—exhibit
stationarity. This implies that their fundamental statistical characteristics remain consistent over time.
However, when dealing with non-stationary variables, this stationarity assumption gets violated. As a
result, different techniques are needed to perform the modeling. One common strategy is to difference
the non-stationary variable (deriving a new time series by taking the difference in the observed values
of two consecutive time points) to eliminate any observable trend or drift.
A non-stationary time series might possess a unit root, which signifies a root of one in its
autoregressive (AR) polynomial. To put it differently, the value in the next time period is strongly
impacted by the present period value. This dependency reflects a form of serial correlation, where
values from previous periods exert influence on subsequent ones, thereby potentially leading to non-
stationary behavior.
The unit root test, therefore, is a method to examine whether a time series is non-stationary and
possesses a unit root. Identifying and addressing the presence of a unit root is a critical step in the
process of time series modeling, especially when the aim is to understand long-term trends and
forecasts.
In essence, a cointegration test examines the assumption that, although individual time series may
each have a unit root and hence be non-stationary, a linear combination of these time series might result
in a stationary series. This forms the alternative hypothesis for the test.
To be precise, the alternative hypothesis states that the aggregate time series, derived from a linear
combination of individual time series, achieves stationarity. Should this be the case, it would imply a
persistent long-term relationship among these time series variables. Such long-term relationships will
get obscured by temporary fluctuations in the market from time to time, due to factors such as
mispricing. Hence, the cointegration test aids in revealing these hidden long-term relationships among
time series variables.
When assets are determined to be cointegrated—meaning that the alternative hypothesis is upheld—
they are fed into the trading signal generation phase of the pairs trading strategy. Here, we anticipate the
long-term relationship between the two time series variables to prevail, regardless of short-term market
turbulence.
Therefore, cointegration serves as a valuable tool in statistical analysis, exposing the underlying
long-term relationship between two non-stationary and seemingly unrelated time series. This long-term
association, difficult to detect when these time series are analyzed independently, can be discovered by
combining these individual non-stationary assets in a particular way. This combination is typically done
using the Johansen test, yielding a new, combined time series that exhibits stationarity, characterized by
a consistent mean and variance over different periods. Alternatively, the Engle-Granger test can be
employed to generate a spread series from the residuals of a linear regression model between the two
assets.
Figure 8-3 illustrates the process of cointegration and strategy formulation. The purpose of
cointegration is to convert individual non-stationary time series data into a combined stationary series,
which can be achieved via the Johansen test with a linear combination, the Engle-Granger test via a
linear regression model, or other test procedures. We would then derive another series called the spread
to indicate the extent of short-term fluctuation from the long-term equilibrium relationship. The spread
is used to generate trading signals in the form of entry and exit points based on the extent of deviation at
each time point, with the help of entry and exit thresholds defined in advance.
Figure 8-3 Illustrating the process of cointegration using different tests and strategy formulation to generate trading signals
Stationarity
Stock prices are time series data. A stationary time series is a time series where the statistical properties
of the series, including the mean, variance, and covariance at different time points, are constant and do
not change over time. A stationary time series is thus characterized by a lack of observable trends or
cycles in the data.
Let us take the normal distribution as an example. A normal distribution y = f (x; μ, σ) is a
probability density function that maps an input x to a probability density y, assuming a fixed set of
parameters: the mean μ as the central tendency and the standard deviation σ as the average deviation from
the mean. The specific form of the probability density function is as follows:
$$ f\left(x;\mu, \sigma \right)=\frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{{\left(x-\mu \right)}^2}{2{\sigma}^2}} $$
A widely used normal distribution is the standard normal, specifying μ = 0 and σ = 1. The resulting
probability density function is
$$ f(x)=\frac{1}{\sqrt{2\pi }}{e}^{-\frac{x^2}{2}} $$
We can generate random samples following this specific form using the random.normal()
function from NumPy. In Listing 8-1, we define a function generate_normal_sample() that
generates a normally distributed random sample by passing in the input parameter μ and σ in a list.
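Listing 8-1 is not reproduced above; a reconstruction consistent with how the function is called later (a two-element list holding the mean and standard deviation) is:
def generate_normal_sample(params):
    # draw one sample from a normal distribution given [mean, standard deviation]
    mu, sigma = params
    return np.random.normal(loc=mu, scale=sigma)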
To see the impact on the samples generated from a non-stationary distribution, we will specify three
different non-stationary distributions. Specifically, we will generate 100 samples that follow a
distribution with either an increasing mean or standard deviation. Listing 8-2 performs the random
sampling for 100 rounds and compares them with the samples from the standard normal distribution.
T = 100
stationary_list, nonstationary_list1, nonstationary_list2 = [], [], []
for i in range(T):
    # generate a stationary sample and append to list
    stationary_list.append(generate_normal_sample([0, 1]))
    # generate a non-stationary sample with an increasing mean and append to list
    nonstationary_list1.append(generate_normal_sample([i, 1]))
    # generate a non-stationary sample with an increasing mean and sd and append to list
    nonstationary_list2.append(generate_normal_sample([i, np.sqrt(i)]))
x = range(T)
# plot the lists as line plots with labels for each line
plt.plot(x, stationary_list, label='Stationary')
plt.plot(x, nonstationary_list1, label='Non-stationary with increasing mean')
plt.plot(x, nonstationary_list2, label='Non-stationary with increasing mean and sd')
plt.legend()
Running the code generates Figure 8-4, where the impact of a changing mean and standard
deviation becomes more pronounced as we increase the magnitude in later rounds.
Figure 8-4 Generating normally distributed random samples from non-stationary distributions with different parameter specifications
Note that we can use the augmented Dickey-Fuller (ADF) test to check whether a series is stationary.
The function stationarity_test() defined in Listing 8-3 accepts two inputs: the time series to
be tested for stationarity and the significance level used to compare against the p-value and determine
statistical significance. Note that the p-value is accessed as the second element of the result returned
by the adfuller() function. This is shown in Listing 8-3.
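Listing 8-3 is not reproduced above; a reconstruction that matches the printed messages shown next could be:
from statsmodels.tsa.stattools import adfuller
def stationarity_test(series, significance_level=0.05):
    # run the ADF test; the p-value is the second element of the returned tuple
    pvalue = adfuller(series)[1]
    if pvalue < significance_level:
        return f"p-value is {pvalue}. The series is likely stationary."
    else:
        return f"p-value is {pvalue}. The series is likely non-stationary."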
Let us apply this function to the previous time series data. The result shows that the ADF test is able
to tell whether a time series is stationary (i.e., generated with fixed parameters) based on a preset
significance level:
>>> print(stationarity_test(stationary_list))
>>> print(stationarity_test(nonstationary_list1))
>>> print(stationarity_test(nonstationary_list2))
p-value is 1.2718058919122438e-12. The series is likely stationary.
p-value is 0.9925665941220737. The series is likely non-stationary.
p-value is 0.9120355459829741. The series is likely non-stationary.
Let us look at a concrete example of how to test for cointegration between two stocks.
import os
import random
import numpy as np
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
from matplotlib import pyplot as plt
%matplotlib inline
SEED = 8
random.seed(SEED)
np.random.seed(SEED)
$$ y={\beta}_0+{\beta}_1x+\epsilon $$
where β0 denotes the intercept and β1 is the slope of the linear line fitted between these two stocks.
ϵ represents the random noise that is not modeled by the predictor x. Note that we are assuming a linear
relationship between x and y, which is unlikely to hold exactly in a real-world environment. Another
name for ϵ is the residual, which is interpreted as the (vertical) distance between the predicted value
β0 + β1x and the target value y. That is, ϵ = y − (β0 + β1x).
Our focus would then shift to these residuals, with the intention of assessing if the residual time
series would be stationary. Let us first obtain the residuals from the linear regression model.
In Listing 8-5, we assign the first stock as the target variable Y and the second stock as the predictor
variable X. We then use the add_constant() function to add a column of ones to the X variable,
which can also be considered as the bias trick to incorporate the intercept term β0. Next, we construct a
linear regression model object using the OLS() function, perform learning by invoking the fit()
function, and calculate the residuals as the difference between the target values and the predicted
values, obtained via the predict() method.
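Listing 8-5 is not reproduced above. A reconstruction consistent with the description is sketched below; the ticker list and the name X_const for the design matrix are assumptions, and df is assumed to hold the two adjusted-close series:
stocks = ['GOOG', 'MSFT']                # assumed ticker list
Y = df[stocks[0]]                        # target variable
X = df[stocks[1]]                        # predictor variable
X_const = sm.add_constant(X)             # bias trick: add a column of ones for the intercept
model = OLS(Y, X_const).fit()            # fit the linear regression via ordinary least squares
residuals = Y - model.predict(X_const)   # residuals = target values - predicted values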
The model object is essentially a collection of the model weights (also called parameters) and the
architecture that governs how the data flow from the input to the output. Let us access the model
weights:
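(The snippet is not shown above; with statsmodels, the fitted coefficients are exposed through the params attribute.)
>>> model.params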
We have two parameters in the model: const corresponding to β0 and MSFT corresponding to β1.
Besides using the predict() method to obtain the predicted values, we can also construct the
explicit expression for the predictions and calculate them manually. That is, we can calculate the
predicted values as follows:
$$ \hat{y}={\beta}_0+{\beta}_1x $$
The following code snippet implements this expression and calculates the model predictions
manually. We also check if the manually calculated residuals are equal to the previous values using the
equals() function:
# alternative approach
residuals2 = Y - (model.params['const'] + model.params[stocks[1]] * X)
# check if both residuals are the same
print(residuals.equals(residuals2))
Lastly, we test the stationarity of the residual series, again using the augmented Dickey-Fuller
(ADF) test. The test can be performed using the adfuller() function from the statsmodels
package. There are two metrics that are relevant to every statistical test: the test statistic and the p-value.
Both metrics convey the same information on the statistical significance of the underlying hypothesis,
with the p-value being a standardized and, thus, more interpretable metric. A widely used threshold
(also called the significance level) is 5% for the p-value. That is, if the resulting p-value from a
statistical test is less than 5%, we can safely (up to a confidence level of 95%) reject the null hypothesis
in favor of the alternative hypothesis. If the p-value is greater than 5%, we fail to reject the null
hypothesis and conclude that the two stocks are not cointegrated.
The null hypothesis often represents the status quo. In the case of the cointegration testing using the
Engle-Granger test, the null hypothesis is that the two stocks are not cointegrated. That is, the historical
prices do not exhibit a linear relationship in the long run. The alternative hypothesis is that the two
stocks are cointegrated, as exhibited by a linear relationship between the two and a stationary residual
series.
Now let us carry out the ADF test and use the result to determine if these two stocks are
cointegrated using a significance level of 5%. In Listing 8-6, we apply the adfuller() function to
the prediction residuals and print out the test statistic and p-value. This is followed by an if-else
statement to determine if we have enough confidence to reject the null hypothesis and claim that the
two stocks are cointegrated.
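Listing 8-6 is not reproduced above; a reconstruction consistent with the description could be:
# ADF test on the regression residuals
adf_stat, pvalue = adfuller(residuals)[:2]
print(f"ADF test statistic: {adf_stat}")
print(f"p-value: {pvalue}")
if pvalue < 0.05:
    print("The residual series is stationary; the two stocks are likely cointegrated.")
else:
    print("The residual series is non-stationary; the two stocks are likely not cointegrated.")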
The result suggests that Google and Microsoft stocks are cointegrated, given the small p-value of about 2%.
Indeed, based on our earlier analysis of the max drawdown, Google and Microsoft stock
prices generally tend to move together. However, with the introduction of ChatGPT in Bing search, the
overall picture may start to change. Such cointegration (comovement) may gradually weaken, as the tool
gives Microsoft everything to gain (only a small share of its revenue comes from web search) and Google
much to lose (the majority of its revenue comes from web search).
Next, we touch upon another closely related but different statistical concept: correlation.
np.random.seed(123)
X = np.random.normal(1, 1, 100)
Y = np.random.normal(2, 1, 100)
X = pd.Series(np.cumsum(X), name='X')
Y = pd.Series(np.cumsum(Y), name='Y')
Running the code generates Figure 8-5. Series Y has a higher drift than series X as designed and
also exhibits a high degree of correlation (or comovement) across the whole history of 100 points.
Figure 8-5 Illustrating the evolution of two series that are highly correlated but not cointegrated
Let us calculate the exact correlation coefficient and cointegration p-value. In the following code
snippet, we call the corr() method to obtain the correlation of X with Y and use the coint()
function from the statsmodels package to perform the cointegration test and retrieve the resulting
p-value. The coint() function performs the augmented Engle-Granger two-step cointegration test,
similar to how to manually carry out the two-step process earlier. The result shows that these two series
are highly correlated but not cointegrated.
from statsmodels.tsa.stattools import coint
# calculate the correlation coefficient
>>> print('Correlation: ' + str(X.corr(Y)))
# perform the cointegration test
score, pvalue, _ = coint(X, Y)
>>> print('Cointegration test p-value: ' + str(pvalue))
Correlation: 0.994833254077976
Cointegration test p-value: 0.17830098966789126
In the next section, we dive deep into the implementation of the pairs trading strategy.
Next, we analyze each unique pair of stocks and perform the cointegration test to look for those
with a long-term equilibrium relationship.
These 15 unique pairs of stocks are stored as tuples in a list. Each tuple will go through the
cointegration test in the following section.
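The code that downloads the stock prices and builds the list of candidate pairs is not reproduced above. Fifteen unique pairs correspond to six tickers; the ticker list and date range below are purely illustrative (only GOOG and MSFT are confirmed by the text):
from itertools import combinations
# illustrative ticker list; 6 tickers yield C(6, 2) = 15 unique pairs
tickers = ['GOOG', 'MSFT', 'AAPL', 'AMZN', 'META', 'NFLX']
df = yf.download(tickers, start="2022-01-01", end="2023-02-11")['Adj Close']
stock_pairs = list(combinations(tickers, 2))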
threshold = 0.1
# run Engle-Granger test for cointegration on each pair of stocks
for pair in stock_pairs:
    # subset df based on current pair of stocks
    df2 = df[list(pair)]
    # perform test for the current pair of stocks
    score, pvalue, _ = coint(df2.values[:, 0], df2.values[:, 1])
    # check if the current pair of stocks is cointegrated
    if pvalue < threshold:
        print(pair, 'are cointegrated')
    else:
        print(pair, 'are not cointegrated')
Listing 8-8 Performing a cointegration test for each unique pair of stocks
Note that the threshold is set at 10% instead of 5% as before, since no pair of stocks would pass the
test at the stricter threshold. As it turns out, the coint() function differs slightly from our earlier
manual implementation of the test procedure. For example, coint() treats its first argument as the
dependent variable in the underlying regression, so the ordering of the two series can affect the
resulting p-value.
Running the code generates the following result:
It turns out that only Google and Microsoft stock prices are cointegrated using the 10% threshold on
the significance level. These two stocks will be the focus of our pairs trading strategy in the following,
starting by identifying the stationary spread between the two stocks.
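The code that derives the spread is not reproduced above. Consistent with the caption of Figure 8-6, the spread can be taken as the residual series from the Engle-Granger regression fitted earlier:
# the spread between the two cointegrated stocks, taken as the regression residuals
spread = residuals
spread.plot(figsize=(12, 6))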
Running the code generates Figure 8-6. The spread now resembles white noise, fluctuating around a
roughly constant mean. Since different stock pairs produce spreads on different scales, it is
recommended to standardize them onto the same scale for ease of comparison and strategy
formulation. The next section covers the conversion process that turns the spread into z-scores.
Figure 8-6 Visualizing the spread as the residuals of the linear regression model
Converting to Z-Scores
A z-score is a measure of how many standard deviations the daily spread is from its mean. It is a
standardized score that we can use to compare across different distributions. Denote x as the original
observation. The z-score is calculated as follows:
$$ z=\frac{x-\mu }{\sigma } $$
where μ and σ denote the mean and standard deviation of the time series, respectively.
Therefore, the magnitude of the z-score indicates how far away the current observation deviates
from the mean in terms of the unit of standard deviations, and the sign of the z-score suggests whether
the deviation is above (a positive z-score) or below (a negative z-score) the mean.
For example, assume a distribution with a mean of 10 and a standard deviation of 2. If an
observation is valued at 8, the z-score for this observation would be (8 − 10)/2 = −1. In other words,
this observation is one standard deviation below the mean of the distribution.
The z-score is often used to assess the statistical significance of an observation in hypothesis testing.
A z-score greater than or equal to 1.96 (or smaller than or equal to –1.96) corresponds to a two-sided
p-value of 0.05 or less, which is a common threshold for assessing statistical significance.
In Listing 8-10, we visualize the probability density function (PDF) of a standard normal
distribution with a mean of 0 and a standard deviation of 1. We first generate a list of equally spaced
input values as the z-scores using the np.linspace() function and obtain the corresponding
probabilities in the PDF of standard normal distribution using the norm.pdf() function with a
location parameter of 0 (corresponding to the mean) and scale of 1 (corresponding to the standard
deviation). We also shade the areas before –1.96 and after 1.96, where a z-score of 1.96 corresponds to
a 5% significance level (i.e., 95% confidence) in a two-sided statistical test. In other words, z-scores
greater than or equal to 1.96 account for 2.5% of the total probability, z-scores lower than or equal to
–1.96 account for another 2.5%, and together the two tails account for 5%.
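Listing 8-10 is not reproduced above; a reconstruction of the visualization could be:
from scipy.stats import norm
z = np.linspace(-4, 4, 500)                    # equally spaced z-scores
pdf = norm.pdf(z, loc=0, scale=1)              # standard normal density
plt.plot(z, pdf)
# shade the two rejection regions beyond +/-1.96
plt.fill_between(z, pdf, where=(z <= -1.96), alpha=0.5)
plt.fill_between(z, pdf, where=(z >= 1.96), alpha=0.5)
plt.xlabel("z-score")
plt.ylabel("Probability density")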
Figure 8-7 Visualizing the probability density function of a standard normal distribution, with the 5% significance level shaded at both
the left and right sides
In the context of hypothesis testing, the shaded area represents the probability of observing a z-score
greater than 1.96 or less than –1.96 under the null hypothesis. Performing the statistical test gives us
a z-score. If the z-score is above 1.96 or below –1.96 in a two-sided test, we would reject the null
hypothesis in favor of the alternative hypothesis at the 0.05 significance level, since the probability
of observing such an extreme value under the null hypothesis would simply be too small.
In summary, we use the z-score as a standardized score to measure how many standard deviations
an observation is from the mean of a distribution. It is used in hypothesis testing to determine the
statistical significance of an observation, that is, the probability of an event happening under the null
hypothesis. The significance level is often set at 0.05. We can use the z-score to calculate the
probability of observing a value as extreme as the observation under the null hypothesis. Finally, we
make a decision on whether to reject or fail to reject the null hypothesis.
Now let us revisit the running example. Since stock prices are often volatile, we switch to the
moving average approach to derive the running mean and standard deviation. That is, each daily spread
would have a corresponding running mean and standard deviation based on the collection of spreads in
the rolling window. In Listing 8-11, we derive the running mean and standard deviation using a window
size of ten and apply the transformation to derive the resulting z-scores as the standardized spread.
# convert to z-score
# z-score is a measure of how many standard deviations the spread is from its mean
# derive mean and sd using a moving window
window_size = 10
spread_mean = spread.rolling(window=window_size).mean()
spread_std = spread.rolling(window=window_size).std()
zscore = (spread - spread_mean) / spread_std
zscore.plot(figsize=(12,6))
Listing 8-11 Converting to z-scores based on moving averages
Running the code generates Figure 8-8, where the standardized spreads now look more normally
distributed as white noise.
Figure 8-8 Visualizing the z-scores after standardizing the spreads using the running mean and standard deviation
Since we used a window size of ten, the first nine observations will appear as NA in the moving
average series. Let us get rid of the initial NA values by first identifying the first valid index using the
first_valid_index() function and then subsetting the z-score series, as shown in the following
code:
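(Reconstructed; the original snippet is not reproduced above.)
zscore = zscore.loc[zscore.first_valid_index():]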
The next section formulates the trading strategy using the z-scores.
Figure 8-9 Illustrating the process of formulating trading signals based on preset entry and exit thresholds for the z-scores
In Listing 8-12, we first initialize the entry and exit thresholds, respectively. We create two Pandas
Series objects (stock1_position and stock2_position) to store the daily positions for each
stock. Based on the current z-score and the preset thresholds for entering and exiting long or short
positions, we check the daily z-score in a loop and match it to one of the four cases for signal
generation based on the following rule (a reconstruction sketch follows the list):
Long stock 1 and short stock 2 if the z-score is below –2 and stock 1 has no prior position.
Short stock 1 and long stock 2 if the z-score is above 2 and stock 2 has no prior position.
Exit the position in both stock 1 and stock 2 if the z-score is between –1 and 1.
Maintain the position in both stock 1 and stock 2 for the rest of the cases, that is, the z-score is
between –2 and –1 or between 1 and 2.
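Listing 8-12 is not reproduced above. The sketch below reconstructs the four rules; the exact threshold values (2 for entry, 1 for exit) and the looping style are assumptions based on the description:
entry_threshold = 2
exit_threshold = 1
stock1_position = pd.Series(0, index=zscore.index)
stock2_position = pd.Series(0, index=zscore.index)
pos1, pos2 = 0, 0
for date, z in zscore.items():
    if z < -entry_threshold and pos1 == 0:
        pos1, pos2 = 1, -1        # long stock 1, short stock 2
    elif z > entry_threshold and pos2 == 0:
        pos1, pos2 = -1, 1        # short stock 1, long stock 2
    elif -exit_threshold < z < exit_threshold:
        pos1, pos2 = 0, 0         # exit both positions
    # otherwise (|z| between 1 and 2): maintain the current positions
    stock1_position[date] = pos1
    stock2_position[date] = pos2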
We can now calculate the overall profit of the pairs trading strategy. In Listing 8-13, we first obtain
the daily percentage changes using the pct_change() function for each stock, starting from the
index with a valid value. These daily returns will be adjusted according to the position we held from the
previous trading day. In other words, multiplying the shifted positions with the daily returns gives the
strategy’s daily returns for each stock, filling possible NA values with zero. Finally, we add up the daily
returns from the two stocks, convert them to 1+R returns, and perform the sequential compounding
procedure using the cumprod() function to obtain the wealth index.
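Listing 8-13 is not reproduced above. A reconstruction, assuming the cointegrated pair is GOOG (stock 1) and MSFT (stock 2) and that df holds their adjusted closing prices, could be:
# daily simple returns for each stock, starting from the first valid z-score date
start = zscore.index[0]
ret1 = df['GOOG'].loc[start:].pct_change()
ret2 = df['MSFT'].loc[start:].pct_change()
# apply the previous day's position to today's return; fill missing values with zero
strategy_returns = (stock1_position.shift(1) * ret1).fillna(0) + (stock2_position.shift(1) * ret2).fillna(0)
# convert to 1+R returns and compound into a wealth index
wealth_index = (1 + strategy_returns).cumprod()
wealth_index.plot(figsize=(12, 6))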
Summary
In this chapter, we covered the concept of statistical arbitrage and hypothesis testing, as well as the
implementation details based on the pairs trading strategy. We first walked through the overall process
of developing a pairs trading strategy and introduced new concepts such as cointegration and
stationarity. Next, we compared cointegration and correlation, both closely related but drastically
different. Last, we introduced a case study on calculating the cumulative return using the pairs trading
strategy.
In the next chapter, we will introduce Bayesian optimization, a principled way to search for optimal
parameters of a trading strategy.
Exercises
Evaluate the cointegration of selected stock pairs during bull and bear market periods separately. Do
the results vary significantly? If so, discuss possible reasons.
Implement rolling cointegration tests on a pair of time series data and observe how cointegration
status (cointegrated or not) evolves over time.
For a given pair of stocks, test the stationarity of the spread between them using the ADF test. If the
spread is stationary, what does it imply for the pairs trading strategy?
Given the time series data of spreads for a pair of stocks, perform a hypothesis test to check whether
the mean of spreads is equal to zero.
Calculate the z-scores of the spread for different lookback periods (e.g., 30, 60, and 90 days). How
does changing the lookback period affect the distribution of z-scores and the performance of your
pairs trading strategy?
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_9
Figure 9-1 Illustrating the optimization problem. The selected trading strategy manifests as an unknown
function, and our goal is to search for the optimal set of window lengths that deliver the highest performance
metric, the Sharpe ratio in this case
The next section provides more perspectives on the overall optimization process.
More on Optimization
Optimization aims at locating the optimal value f ∗ = f (x∗) or its maximizer
$$ {x}^{\ast }={\mathrm{argmax}}_{x\in \mathcal{X}}f(x) $$ for all the input
values $$ x\in \mathcal{X} $$ in a maximization setting, which could equally be framed as a
minimization problem. The procedure that carries out the optimization process is
called the optimizer. There are multiple types of optimizers, with stochastic gradient
descent (SGD) being the most popular optimizer in the space of deep learning. In the
context of backtesting a trading strategy, we are mostly interested in optimizing the
risk-adjusted return, represented by the Sharpe ratio or other risk measures such as the
max drawdown. Plus, we have the additional challenge that the inputs are not
continuous values; instead, they are discrete such as window sizes or trading volumes.
The optimizer takes a function f and figures out the desired optimum value f ∗ or
its corresponding input parameter x ∗. Being an optimum value means that f (x ∗) is
greater (or less, in the case of minimization) than any other values in the
neighborhood. Here, f ∗ may be either a local optimum or a global optimum. A local
optimum means f (x ∗) sits at the top of one particular mountain, whereas the global
optimum is the highest point across all mountains in the region. That is, in a
maximization setting, we could collect all the local maxima, compare them, and report
the largest as the global maximum. Both are characterized by a zero gradient at the point
x ∗, yet the global optimum is usually what we aim for. The optimizer thus needs a strategy
to escape from local optima and continue its search for the global optimum.
There are various techniques to handle this issue, including using different initial
values via the multistart procedure, applying random jumps in the parameter space,
and using complex algorithms like simulated annealing or genetic algorithms that
employ specific mechanisms to escape local optima.
In the context of developing a trading strategy, we are interested in the global
maximizer (optimal input parameters) that gives the maximal Sharpe ratio. This is a
complex task as there may be many sets of parameters that yield good results (local
maxima), but we want to find the absolute best (global maximum).
Note that using gradient information to identify an optimum represents a huge
improvement in our understanding of optimization problems, dating back to Isaac Newton.
Without calculus, we would have to compare candidate values manually, a combinatorial
and extremely time-consuming exercise. When the function form is available, such as
y = x2, we can invoke the tools of calculus and solve for the point whose gradient is
zero, that is, y′ = 2x = 0, giving x = 0. We can then calculate the second derivative or
apply the sign chart method to ascertain whether this is a maximum or a minimum point.
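As a small side illustration (not from the book), the same reasoning can be reproduced symbolically with SymPy:
import sympy as sp
x = sp.symbols('x')
f = x**2
critical_points = sp.solve(sp.diff(f, x), x)   # solve f'(x) = 2x = 0, giving x = 0
second_derivative = sp.diff(f, x, 2)           # f''(x) = 2 > 0, so x = 0 is a minimum
print(critical_points, second_derivative)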
The next section introduces more on the global optimization problem.
Global Optimization
Optimization aims to locate the optimal set of parameters of interest across the whole
search domain, often by carefully allocating limited resources. For example, when
searching for the car key at home before leaving for work in two minutes, we would
naturally start with the most promising place where we would usually put the key. If it
is not there, think for a little while about the possible locations and go to the next
most promising place. This process iterates until the key is found. In this example, the
search policy is, in a way, behaving intelligently. It digests the available information
on previous searches and proposes the following promising location, so as to use the
limited resource wisely. The resource could be the limited number of trials we could
run before a project deadline approaches tomorrow or the two-minute budget to
search for the key in this case. The unknown function is the house itself, a binary
value that reveals if the key is placed at the proposed location upon each sampling at
the specific location.
This intelligent search policy represents a cornerstone concept in optimization,
especially in the context of derivative-free optimization where the unknown function
does not reveal any derivative information. Here, the policy needs to balance
exploration, which probes the unknown function at various locations in the search
domain, and exploitation, which focuses on promising areas where we have already
identified a good candidate value. This trade-off is usually characterized by a learning
curve showing the function value of the best-found solution over the number of
function evaluations.
The key search example is considered an easy one since we are familiar with the
environment in terms of its structural design. However, imagine locating an item in a
totally new environment. The optimizer would need to account for the uncertainty due
to unfamiliarity with the environment while determining the next sampling location
via multiple sequential trials. When the sampling budget is limited, as is often the
case in real-life searches in terms of time and resources, the optimizer needs to argue
carefully on the utility of each candidate input parameter value.
This process is characterized by sequential decision-making under uncertainty, a
problem that lies at the heart of the field of optimization. When faced with such a
situation, optimizers need to develop an intelligent search policy that effectively
manages the trade-off between exploration (searching new areas) and exploitation
(capitalizing on known, promising locations). In the context of searching for an item
in an unfamiliar environment, exploration involves searching in completely new areas
where the item could potentially be located, while exploitation involves focusing the
search around areas where clues or signs of the item have already been found. The
challenge is to balance these two approaches, as focusing too much on exploration
could lead to a waste of time and resources, while focusing too much on exploitation
could result in missed opportunities.
In the world of trading strategies, this situation amounts to a search in a high-
dimensional parameter space where each dimension represents a different aspect of
the trading strategy. Exploration would involve trying out completely new sets of
parameters, while exploitation would involve fine-tuning the most promising sets of
parameters already discovered. The optimizer aims to effectively navigate this high-
dimensional space and find the set of parameters that yields the best possible
performance in terms of the Sharpe ratio or other preset metrics.
Let us formalize this sequential global optimization using mathematical terms. We
are dealing with an unknown scalar-valued objective function f based on a specific
domain $$ \mathcal{X} $$ . In other words, the unknown subject of interest f is
a function that maps a candidate parameter in $$ \mathcal{X} $$ to a
real number in $$ \mathbb{R} $$ , that is,
$$ f:\mathcal{X}\to \mathbb{R} $$ . We typically place no specific assumption
on the nature of the domain $$ \mathcal{X} $$ other than that it should be a
bounded, compact, and convex set.
A bounded set $$ \mathcal{X} $$ means that it has upper and lower limits,
and all values of the parameters contained within $$ \mathcal{X} $$ fall within
these bounds. A compact set is one that is both bounded and closed, meaning that it
includes its boundary. And a convex set is one in which, for any two points within the
set, the set contains the whole line segment that joins them. These assumptions make
our problem mathematically tractable and realistic in the real-world scenario.
Unless otherwise specified, we focus on the maximization setting instead of
minimization since maximizing the objective function is equivalent to minimizing the
negated objective, followed by another negation to recover the original maximum
value. The optimization procedure thus aims at locating the global maximum f ∗ or its
corresponding location x ∗ in a principled and systematic manner. Mathematically, we
wish to locate f ∗ where
$$ {f}^{\ast }=\underset{x\in \mathcal{X}}{\max }f(x)=f\left({x}^{\ast}\right) $$
Or equivalently, we are interested in its location x∗ where
$$ {x}^{\ast }={\mathrm{argmax}}_{\mathrm{x}\in \mathcal{X}}f(x) $$
The argmax operation is used in mathematics to denote the argument of the
maximum or the set of points in the domain $$ \mathcal{X} $$ that maximizes
the function f. When used in this optimization problem, it means that we are looking
for the specific values of the input parameters that yield the maximum value of the
function.
Again, note that f (x) is unknown and only indirectly observable through
sampling, and $$ \mathcal{X} $$ could be a set in a high-dimensional space.
So, we are looking for the best parameters in a high-dimensional space that we can
only explore one sample at a time. This is what makes the global optimization
problem challenging in practice.
Figure 9-2 provides an example one-dimensional objective function with its
global maximum f ∗ and its location x ∗ highlighted. The goal of global optimization
is thus to systematically reason about a series of sampling decisions within the total
search space $$ \mathcal{X} $$ , so as to locate the global maximum as fast as
possible, that is, sampling as few times as possible, instead of conducting random
trials or grid search. Besides, when the optimizer makes a sequence of decisions about
where in the parameter space to sample next, each decision is influenced by the
results of previous samples (also referred to as the training set) and is aimed at
improving the estimated optimum.
Figure 9-2 An example objective function with the global maximum f ∗ and its location x∗. The goal of global
optimization is to systematically reason about a series of sampling decisions so as to locate the global maximum as
fast as possible
Note that this is a nonconvex function, as is often the case in real-life functions
we are optimizing. A nonconvex function means that there are multiple local optima
in the function. Thus, we could not resort to first-order gradient-based methods to
reliably search for the global optimum, as we did for the convex function y = x².
Using the gradient-based method, such as solving for the solution that makes the
gradient of the original function equal to zero, will likely converge to a local
optimum. This is also one of the advantages of Bayesian optimization, introduced as a
global optimization technique later, compared with other gradient-based optimization
procedures for local search.
The next section covers more on the objective function.
Figure 9-3 Three possible functional forms. On the left is a convex function whose optimization is easy. In the
middle is a nonconvex function with multiple local minima, and on the right is also a nonconvex function with a
wide flat region full of saddle points. Optimization for the latter two cases takes a lot more work than for the first
case
Figure 9-4 Slow convergence due to a small learning rate on the left and divergence due to a large learning rate
on the right
Bayesian Optimization
As the name suggests, Bayesian optimization is an area that studies optimization
problems using the Bayesian approach. Optimization aims at locating the optimal
objective value (i.e., a global maximum or minimum) of all possible values or the
corresponding location of the optimum over the search domain, also called the
environment. The search process starts at a specific initial location and follows a
particular policy to iteratively guide the following sampling locations, collect new
observations, and refresh the guiding search policy.
At its core, Bayesian optimization uses a probabilistic model (such as Gaussian
processes) to represent the unknown function and a utility function (also called the
acquisition function) to decide where to sample next. It iteratively updates the
probabilistic model with new sample points and uses this updated model to select the
next sampling location.
As shown in Figure 9-5, the overall optimization process consists of repeated
interactions between the policy (the optimizer) and the environment (the unknown
objective function). The policy is a mapping function that takes in a new input
parameter (plus historical ones) and outputs the next parameter value to try out in a
principled way. Here, we are constantly learning and improving the policy as the
search continues. A good policy guides our search toward the global optimum faster
than a bad one. In arguing which parameter value to try out, a good policy would
spend the limited sampling budget on promising candidate values.
Figure 9-5 The overall Bayesian optimization process. The policy digests the historical observations and
proposes a new sampling location. The environment governs how the (possibly noise-corrupted) observation at the
newly proposed location is revealed to the policy. Our goal is to learn an efficient and effective policy that could
navigate toward the global optimum as quickly as possible
On the other hand, the environment contains the unknown objective function to be
learned by the policy within a specific boundary (maximum and minimum values of
the parameter value). When probing the functional value as requested by the policy,
the actual observation revealed by the environment to the policy is often corrupted by
noise due to the choice of the backtesting period, making the learning even more
challenging. Thus, Bayesian optimization, a specific approach for global
optimization, would like to learn a policy that can help us efficiently and effectively
navigate toward the global optimum of an unknown, noise-corrupted objective
function as quickly as possible.
When deciding which parameter value to try next, most search strategies face the
exploration and exploitation trade-off. Exploration means searching within an
unknown and faraway area, and exploitation refers to searching within the
neighborhood visited earlier in the hope of locating a better functional evaluation.
Bayesian optimization also faces the same dilemma. Ideally, we would like to explore
more at the initial phase to increase our understanding of the environment (the black-
box function) and gradually shift toward the exploitation mode that taps into the
existing knowledge and digs into known promising regions.
Bayesian optimization achieves such a trade-off via two components: a Gaussian
process (GP) used to approximate the underlying black-box function and an
acquisition function that encodes the exploration-exploitation trade-off into a scalar
value as an indicator of the sampling utility across all candidates in the domain. Let
us look at each component in detail in the following sections.
Gaussian Process
As a widely used stochastic process (able to model an unknown black-box function
and the corresponding uncertainties of modeling), the Gaussian process takes the
finite-dimensional probability distributions one step further into a continuous search
domain that contains an infinite number of variables, where any finite set of points in
the domain jointly forms a multivariate Gaussian distribution. It is a flexible
framework to model a broad family of functions and quantify their uncertainties, thus
being a powerful surrogate model used to approximate the true underlying function.
Let us look at a few visual examples to see what it offers.
Figure 9-6 illustrates an example of a “flipped” prior probability distribution for a
single random variable selected from the prior belief of the Gaussian process. Every
single point represents a parameter value, although it is now modeled as a random
variable and thus has randomness in its realizations. Specifically, each point follows a
normal distribution. Plotting the mean (solid line) and 95% credible interval (dashed
lines) of all these prior distributions gives us the prior process for the objective
function regarding each location in the domain. The Gaussian process thus employs
an infinite number of normally distributed random variables within a bounded range
to model the underlying objective function and quantify the associated uncertainty via
a probabilistic approach.
Figure 9-6 A sample prior belief of the Gaussian process represented by the mean and 95% credible interval for
each location in the domain. Every objective value is modeled by a random variable that follows a normal prior
predictive distribution. Collecting the distributions of all random variables and updating these distributions as
more observations are collected could help us quantify the potential shape of the true underlying function and its
probability
The prior process can thus serve as the surrogate data-generating process of the
unknown black-box function, which can also be used to generate samples in the form
of functions, an extension of sampling single points from a probability distribution.
For example, if we were to repeatedly sample from the prior process, we would
expect the majority (around 95%) of the samples to fall within the credible interval
and a minority outside this range. Figure 9-7 illustrates three functions sampled from
the prior process.
Figure 9-7 Three example functions sampled from the prior process, where the majority of the functions fall
within the 95% credible interval
In a Gaussian process, the uncertainty on the objective value of each location (i.e.,
the parameter value of a trading strategy) is quantified using a credible interval. As
we start to collect observations and assume a noise-free and exact observation model,
the uncertainties at the collection locations will be resolved, leading to zero variance
and direct interpolation at these locations. Besides, the variance increases as we move
further away from the observations, which is a result of integrating the prior process
(the prior belief about the unknown black-box function) with the information
provided by the actual observations. Figure 9-8 illustrates the updated posterior
process after collecting two observations. The posterior process with updated
knowledge based on the observations will thus make a more accurate surrogate model
and better estimate the objective function.
Figure 9-8 Updated posterior process after incorporating two exact observations in the Gaussian process. The
posterior mean interpolates through the observations, and the associated variance reduces as we move nearer the
observations
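To make the idea of sampling functions from a prior process concrete, the short snippet below (an illustration, not one of the book's listings) draws three sample functions from a zero-mean Gaussian process prior with a squared-exponential kernel, mirroring Figures 9-6 and 9-7:
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # squared-exponential covariance between two sets of one-dimensional inputs
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

x = np.linspace(0, 5, 100)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # small jitter for numerical stability
samples = np.random.multivariate_normal(mean=np.zeros(len(x)), cov=K, size=3)
# the prior mean is zero; the 95% credible interval is +/- 1.96 * sqrt(np.diag(K))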
Acquisition Function
The tools from Bayesian inference and the incorporation of the Gaussian process
provide principled reasoning on the underlying distribution of the objective function.
However, we would still need to incorporate such probabilistic information in our
decision-making to search for the global maximum. We need to build a policy (by
maximizing the acquisition function) that absorbs the most updated information on
the objective function and recommends the following most promising sampling
location in the face of uncertainties across the domain. The optimization policy
guided by maximizing the acquisition function thus plays an essential role in
connecting the Gaussian process to the eventual goal of Bayesian optimization. In
particular, the posterior predictive distribution obtained from the updated Gaussian
process provides an outlook on the objective value and the associated uncertainty for
locations not explored yet, which could be used by the optimization policy to quantify
the utility of any alternative location within the domain.
When converting the posterior knowledge about candidate locations, that is,
posterior parameters such as the mean and the variance of the Gaussian distribution at
each location, to a single scalar utility score, the acquisition function comes into play.
An acquisition function is a manually designed mechanism that evaluates the relative
potential of each candidate location in the form of a scalar score, and the location
with the maximum score will be used as the next sampling choice. It is a function that
assesses how valuable a candidate location is when we acquire/sample it.
The acquisition function takes into account both the expected value and the
uncertainty (variance) of the function at unexplored locations, as provided by the
Gaussian process posterior distribution. In this context, exploration means sampling
in regions of high uncertainty, while exploitation involves sampling where the
function value is expected to be high.
The acquisition function also needs to be cheap to evaluate as a side computation since we
need to evaluate it at every candidate location and then locate the maximum utility
score, posing another (inner) optimization problem. Figure 9-9 provides a sample
curve of the acquisition function.
Figure 9-9 Illustrating a sample acquisition function curve. The location that corresponds to the highest value of
the acquisition function is the next location (parameter value of a trading strategy) to sample. Since there is no
value added if we were to sample those locations already sampled earlier, the acquisition function thus reports zero
at these locations
EI and UCB
Acquisition functions differ in multiple aspects, including the choice of the utility
function, the number of lookahead steps, the level of risk aversion or preference, etc.
Introducing risk appetite directly benefits from the posterior belief about the
underlying objective function. In the case of GP regression as the surrogate model,
the risk is quantified by the covariance function, with its credible interval expressing
the uncertainty level about the objective’s possible values.
Regarding the utility of the collected observations, the expected improvement
chooses the historical maximum of the observed value as the benchmark for
comparison upon selecting an additional sampling location. It also implicitly assumes
that only one additional sample remains before the optimization process
terminates. The expected marginal gain in utility (i.e., the acquisition function)
becomes the expected improvement in the maximal observation, calculated as the
expected difference between the observed maximum and the new observation after
the additional sampling at an arbitrary sampling location.
Specifically, denote y1 : n = {y1, …, yn} as the set of collected observations at the
corresponding locations x1 : n = {x1, …, xn}. Assuming the noise-free setting, the
actual observations would be exact, that is, y1 : n = f1 : n. Given the collected dataset
$$ {\mathcal{D}}_n=\left\{{x}_{1:n},{y}_{1:n}\right\} $$ , the corresponding
utility is
$$ u\left({\mathcal{D}}_n\right)=\max \left\{{f}_{1:n}\right\}={f}_n^{\ast } $$ , where
$$ {f}_n^{\ast } $$ is the incumbent maximum observed so far. Similarly,
assuming we obtain another observation yn + 1 = fn + 1 at a new location xn + 1, the
resulting utility is
$$ u\left({\mathcal{D}}_{n+1}\right)=u\left({\mathcal{D}}_n\cup \left\{{x}_{n+1},{f}_{n+1}\right\}\right)=\max \left\{{f}_{n+1},{f}_n^{\ast}\right\} $$
Taking the difference between these two gives the increase in utility due to the
addition of another observation:
$$ u\left({\mathcal{D}}_{n+1}\right)-u\left({\mathcal{D}}_n\right)=\max \left\{{f}_{n+1},{f}_n^{\ast}\right\}-{f}_n^{\ast }=\max \left\{{f}_{n+1}-{f}_n^{\ast },0\right\} $$
which returns the marginal increment in the incumbent if
$$ {f}_{n+1}\ge {f}_n^{\ast } $$ and zero otherwise, as a result of observing
fn + 1. Readers familiar with the activation function in neural networks would instantly
connect this form with the ReLU (rectified linear unit) function, which keeps the
positive signal and silences the negative one.
Due to randomness in yn + 1, we can introduce the expectation operator to integrate
it out, giving us the expected marginal gain in utility, that is, the expected
improvement acquisition function:
$$ {\alpha}_{EI}\left({x}_{n+1};{\mathcal{D}}_n\right)=\mathbb{E}\left[u\left({\mathcal{D}}_{n+1}\right)-u\left({\mathcal{D}}_n\right)\mid {x}_{n+1},{\mathcal{D}}_n\right]=\int \max \left\{{f}_{n+1}-{f}_n^{\ast },0\right\}p\left({f}_{n+1}\mid {x}_{n+1},{\mathcal{D}}_n\right)d{f}_{n+1} $$
Under the framework of GP regression, we can obtain a closed-form expression of
the expected improvement acquisition function as follows:
$$ {\alpha}_{EI}\left({x}_{n+1};{\mathcal{D}}_n\right)=\left({\mu}_{n+1}-
{f}_n^{\ast}\right)\Phi \left(\frac{\mu_{n+1}-{f}_n^{\ast }}{\sigma_{n+1}}\right)+
{\sigma}_{n+1}\phi \left(\frac{\mu_{n+1}-{f}_n^{\ast }}{\sigma_{n+1}}\right) $$
where $$ {f}_n^{\ast } $$ is the best-observed value so far, and ϕ and Φ
denote the probability density function and the cumulative distribution function of a
standard normal distribution, respectively. μn + 1 and σn + 1 denote the posterior mean
and standard deviation at the tentative point xn + 1.
The closed-form EI consists of two components: exploitation (the first term) and
exploration (the second term). Exploitation means continuing to sample the
neighborhood of the observed region with a high posterior mean, while exploration
encourages sampling an unvisited area where the posterior uncertainty is high. The
expected improvement acquisition function thus implicitly balances off these two
opposing forces.
On the other hand, the UCB acquisition function, as defined in the following,
encodes such a trade-off explicitly:
$$ {\alpha}_{UCB}\left({x}_{n+1};{\mathcal{D}}_n\right)={\mu}_{n+1}+
{\beta}_{n+1}{\sigma}_{n+1} $$
where βn + 1 is a user-defined stagewise hyperparameter that controls the trade-off
between the posterior mean and standard deviation. A low value of βn + 1 encourages
exploitation, and a high value of βn + 1 leans more toward exploration.
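For concreteness, the closed-form expressions above can be evaluated with a few lines of code; the snippet below (an illustration rather than one of the book's listings) computes both scores from the posterior mean and standard deviation at a candidate location:
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # alpha_EI = (mu - f*) * Phi(z) + sigma * phi(z), with z = (mu - f*) / sigma
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def upper_confidence_bound(mu, sigma, beta=0.8):
    # alpha_UCB = mu + beta * sigma
    return mu + beta * sigma

print(expected_improvement(mu=1.2, sigma=0.3, f_best=1.0))
print(upper_confidence_bound(mu=1.2, sigma=0.3))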
Both acquisition functions will then be assessed globally in search of the
maximizing location, which will serve as the next sampling choice. Let us summarize
the full BO (Bayesian optimization) loop in the following section.
Figure 9-10 The full Bayesian optimization loop featuring an iterative interaction between the unknown (black-
box) environment and the decision-making policy that consists of a Gaussian process for probabilistic evaluation
and acquisition function for utility assessment of candidate locations in the environment
With the basic BO framework in mind, let us test it out by optimizing the entry and
exit thresholds of the pairs trading strategy.
We also import a few supporting packages in the following, along with setting the
random seed for reproducibility:
import os
import math
import torch
import random
import numpy as np
from matplotlib import pyplot as plt
import torch.nn as nn
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
%matplotlib inline
SEED = 1
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
The next section touches upon the performance of the pairs trading strategy as the
black-box function.
class QTS_OPTIMIZER(nn.Module):
    def __init__(self, ticker_pair, start_date, end_date, riskfree_rate=0.04):
        super(QTS_OPTIMIZER, self).__init__()
        self.ticker_pair = ticker_pair
        self.start_date = start_date
        self.end_date = end_date
        self.riskfree_rate = riskfree_rate
        self.stock = self.get_stock_data()
Listing 9-1 Defining the black-box function for Bayesian optimization
Upon instantiating this class, the __init__() function will get triggered, which
also includes downloading the stock data for the selected ticker and date range.
Listing 9-2 has the definition of the get_stock_data() method, where we use
the usual download() function to download the data and extract the adjusted
closing price that considers dividends and splits.
def get_stock_data(self):
    print("===== DOWNLOADING STOCK DATA =====")
    df = yf.download(self.ticker_pair, start=self.start_date,
                     end=self.end_date)['Adj Close']
    print("===== DOWNLOAD COMPLETE =====")
    return pd.DataFrame(df)
Listing 9-2 Defining the method to retrieve stock data
return sharpe_ratio
Listing 9-3 Defining the method to calculate the Sharpe ratio
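Only the final line of the Sharpe ratio method is reproduced above. For illustration, such a calculation could be structured as follows, assuming a pandas Series of daily strategy returns, 252 trading days per year, and the annual risk-free rate stored in self.riskfree_rate (the method name and details are illustrative rather than the book's exact listing):
def calc_sharpe_ratio(self, daily_returns):
    # annualize the mean excess return and the volatility assuming 252 trading days
    excess_return = daily_returns.mean() * 252 - self.riskfree_rate
    annual_volatility = daily_returns.std() * np.sqrt(252)
    sharpe_ratio = excess_return / annual_volatility
    return sharpe_ratio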
Let us test the class out. The following code instantiates the class into the qts
variable by passing the ticker symbols of Google and Microsoft along with a date range
covering the whole of 2022. Note the printed message after running this line,
showing that the get_stock_data() function gets triggered during the process.
Note that there is no mention of entry and exit signals at this stage; the initialization
stage is meant to handle all preparatory work before the actual scoring in the
forward() function.
We can also print the first few rows of the object’s stock attribute as a sanity
check:
>>> qts.stock.head()
GOOG MSFT
Date
2022-01-03 145.074493 330.813873
2022-01-04 144.416504 325.141388
2022-01-05 137.653503 312.659851
2022-01-06 137.550995 310.189270
2022-01-07 137.004501 310.347412
Let us test out the scoring function. In the following code snippet, we pass in
different values of entry and exit thresholds and obtain the corresponding Sharpe ratio
for the whole year of 2022:
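The exact snippet is abridged here; for illustration, the call would look like the following, assuming the forward() method accepts the entry_threshold and exit_threshold keyword arguments used in Listing 9-4 below:
sharpe = qts(entry_threshold=2.0, exit_threshold=1.0)  # Sharpe ratio for the chosen thresholds
Listing 9-4 then wraps this scoring call in a helper that generates a set of random initial observations to seed the optimization.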
def generate_initial_data(n=10):
    # generate random initial locations
    train_x1 = x1_bound[0] + (x1_bound[1] - x1_bound[0]) * torch.rand(
        size=(n, 1), device=device, dtype=dtype)
    train_x2 = torch.rand(size=(n, 1), device=device, dtype=dtype)
    train_x = torch.cat((train_x1, train_x2), 1)
    # obtain the exact value of the objective function and add output dimension
    train_y = []
    for i in range(len(train_x)):
        train_y.append(qts(entry_threshold=train_x1[i], exit_threshold=train_x2[i]))
    train_y = torch.Tensor(train_y, device=device).to(dtype).unsqueeze(-1)
    # get the current best observed value, i.e., utility of the available dataset
    best_observed_value = train_y.max().item()
    return train_x, train_y, best_observed_value
Listing 9-4 Generating initial training data
Next, we implement the first component in BO: the Gaussian process model.
# initialize GP model
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood
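The construction of the GP surrogate itself is abridged here; following standard BoTorch usage, it would look roughly as follows, with model and mll matching the variable names used in the subsequent listings:
model = SingleTaskGP(train_x, train_y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)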
Let us print out the values of the hyperparameters (including kernel parameters
and noise variance) of the GP model before optimization:
# optimize GP hyperparameters
from botorch.fit import fit_gpytorch_mll
# fit hyperparameters (kernel parameters and noise variance) of a GPyTorch model
fit_gpytorch_mll(mll.cpu());
mll = mll.to(train_x)
model = model.to(train_x)
>>> list(model.named_hyperparameters())
[('likelihood.noise_covar.raw_noise', Parameter
containing:
tensor([0.2238], dtype=torch.float64,
requires_grad=True)),
('mean_module.raw_constant', Parameter containing:
tensor(1.1789, dtype=torch.float64,
requires_grad=True)),
('covar_module.raw_outputscale', Parameter containing:
tensor(1.8917, dtype=torch.float64,
requires_grad=True)),
('covar_module.base_kernel.raw_lengthscale', Parameter
containing:
tensor([[-0.8823, -0.9687]], dtype=torch.float64,
requires_grad=True))]
Listing 9-6 Optimizing GP hyperparameters
The result shows a different set of hyperparameters after optimization. Note that
we need to move the mll object to the CPU to perform the optimization, after which it
can be moved back to the GPU (if available).
The optimized GP model can then be incorporated into the acquisition function to
guide the following search process, as detailed in the next section.
from botorch.acquisition import (
    ExpectedImprovement,
    qExpectedImprovement,
    UpperConfidenceBound,
    qKnowledgeGradient,
)

EI = ExpectedImprovement(model=model_ei, best_f=best_observed_value)
qEI = qExpectedImprovement(model=model_qei, best_f=best_observed_value)

beta = 0.8
UCB = UpperConfidenceBound(model=model_ucb, beta=beta)

num_fantasies = 64
qKG = qKnowledgeGradient(
    model=model_qkg,
    num_fantasies=num_fantasies,
    X_baseline=train_x,
    q=1
)
Listing 9-7 Defining and initializing the acquisition functions
from botorch.optim import optimize_acqf

def optimize_acqf_and_get_observation(acq_func):
    """Optimizes the acquisition function, and returns a
    new candidate and a noisy observation."""
    # optimize
    candidates, value = optimize_acqf(
        acq_function=acq_func,
        bounds=bounds,
        q=BATCH_SIZE,
        num_restarts=NUM_RESTARTS,
        raw_samples=RAW_SAMPLES,  # used for initialization heuristic
    )
    # observe new values
    new_x = candidates.detach()
    # sample output value
    new_y = qts(entry_threshold=new_x.squeeze()[0].item(),
                exit_threshold=new_x.squeeze()[1].item())
    # add output dimension
    new_y = torch.Tensor([new_y], device=device).to(dtype).unsqueeze(-1)
    # print("new fn value:", new_y)
    return new_x, new_y
Let us test out this function with the qKG acquisition function:
>>> optimize_acqf_and_get_observation(qKG)
(tensor([[1.5470, 0.6003]], dtype=torch.float64),
tensor([[2.2481]], dtype=torch.float64))
Before scaling up to multiple iterations, we will also test out the random search
strategy, which selects a random value for each tuning parameter at each round.
This serves as the baseline for comparison, since manual selection often amounts to a
random search strategy in the initial phase. In the function
update_random_observations() shown in Listing 9-9, we pass a running list
of best-observed function values, perform a random selection, observe the
corresponding functional evaluation, compare it with the current running maximum,
and then return the list of running maxima with the current maximum appended.
def update_random_observations(best_random):
    """Simulates a random policy by drawing a new random point,
    observing its value, and appending the updated best
    candidate to the running list.
    """
    new_x1 = x1_bound[0] + (x1_bound[1] - x1_bound[0]) * torch.rand(
        size=(1, 1), device=device, dtype=dtype)
    new_x2 = torch.rand(size=(1, 1), device=device, dtype=dtype)
    new_x = torch.cat((new_x1, new_x2), 1)
    new_y = qts(entry_threshold=new_x[0, 0].item(),
                exit_threshold=new_x[0, 1].item())
    best_random.append(max(best_random[-1], new_y))
    return best_random
Listing 9-9 Defining the random search strategy
# single trial
import time
N_ROUND = 20
verbose = True
beta = 0.8
best_random.append(best_observed_value)
best_observed_ei.append(best_observed_value)
best_observed_qei.append(best_observed_value)
best_observed_ucb.append(best_observed_value)
best_observed_qkg.append(best_observed_value)
# update progress
best_random = update_random_observations(best_random)
best_value_ei = max(best_observed_ei[-1],
new_y_ei.item())
best_value_qei = max(best_observed_qei[-1],
new_y_qei.item())
best_value_ucb = max(best_observed_ucb[-1],
new_y_ucb.item())
best_value_qkg = max(best_observed_qkg[-1],
new_y_qkg.item())
best_observed_ei.append(best_value_ei)
best_observed_qei.append(best_value_qei)
best_observed_ucb.append(best_value_ucb)
best_observed_qkg.append(best_value_qkg)
Let us plot the search progress so far via the following code snippet:
For each iteration, we fit the GP model to optimize its hyperparameters for each
strategy, instantiate the acquisition function based on the updated GP model instance,
optimize over the acquisition function, propose the next sampling point, obtain the
corresponding function evaluation, append the new observation (parameter value and
Sharpe ratio) to the training set, update the search progress by appending to running
maximum Sharpe ratio, and finally reinitialize the GP for the next iteration.
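The full loop is abridged in the excerpt above; a condensed sketch of one iteration for the EI policy is shown below, where initialize_model is an assumed helper that wraps the SingleTaskGP and ExactMarginalLogLikelihood construction:
for i in range(N_ROUND):
    # refit the GP hyperparameters on the current training set
    mll_ei, model_ei = initialize_model(train_x_ei, train_y_ei)
    fit_gpytorch_mll(mll_ei)
    # rebuild the acquisition function with the updated posterior and incumbent
    EI = ExpectedImprovement(model=model_ei, best_f=train_y_ei.max())
    # propose the next sampling location and evaluate the Sharpe ratio there
    new_x_ei, new_y_ei = optimize_acqf_and_get_observation(EI)
    # append the new observation and update the running maximum
    train_x_ei = torch.cat([train_x_ei, new_x_ei])
    train_y_ei = torch.cat([train_y_ei, new_y_ei])
    best_observed_ei.append(max(best_observed_ei[-1], new_y_ei.item()))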
Running the code generates Figure 9-11. The comparison demonstrates the
benefits of adopting a principled model-based search strategy over random selections.
UCB performs the best across all iterations, showing the advantage of a higher focus
on early exploration embedded in this acquisition function. Other strategies pick up
later and stay flat afterward. All model-based strategies perform better than the
random strategy.
Figure 9-11 Cumulative maximum Sharpe ratio of all search strategies. The UCB policy performs the best as it
is able to identify the highest Sharpe ratio in just one iteration. Other policies pick up later but lack exploration
toward the later iterations. The random strategy performs the worst, showing the advantage of a principled search
policy over random selection
Let us repeat the experiments a number of times to assess the stability of the
results, as shown in Listing 9-11.
# multiple trials
# number of runs to assess std of different BO loops
N_TRIALS = 4
# indicator to print diagnostics
verbose = True
# number of steps in the outer BO loop
N_ROUND = 20
best_random_all, best_observed_ei_all,
best_observed_qei_all, best_observed_ucb_all,
best_observed_qkg_all = [], [], [], [], []
best_random.append(best_observed_value)
best_observed_ei.append(best_observed_value)
best_observed_qei.append(best_observed_value)
best_observed_ucb.append(best_observed_value)
best_observed_qkg.append(best_observed_value)
# update progress
best_random =
update_random_observations(best_random)
best_value_ei = max(best_observed_ei[-1],
new_y_ei.item())
best_value_qei = max(best_observed_qei[-1],
new_y_qei.item())
best_value_ucb = max(best_observed_ucb[-1],
new_y_ucb.item())
best_value_qkg = max(best_observed_qkg[-1],
new_y_qkg.item())
best_observed_ei.append(best_value_ei)
best_observed_qei.append(best_value_qei)
best_observed_ucb.append(best_value_ucb)
best_observed_qkg.append(best_value_qkg)
t1 = time.monotonic()
best_observed_ei_all.append(best_observed_ei)
best_observed_qei_all.append(best_observed_qei)
best_observed_ucb_all.append(best_observed_ucb)
best_observed_qkg_all.append(best_observed_qkg)
best_random_all.append(best_random)
Listing 9-11 Assessing the stability of the results via repeated experiments
Running the code generates Figure 9-12, suggesting that BO-based search
strategies consistently outperform the random search strategy.
Figure 9-12 Assessing the stability of the results via repeated experiments
Finally, let us extract the mean and standard deviation of all experiments, as
shown in Listing 9-12.
def extract_last_entry(x):
    tmp = []
    for i in range(4):
        tmp.append(x[i][-1])
    return tmp

rst_df = pd.DataFrame({
    "EI": [np.mean(extract_last_entry(best_observed_ei_all)),
           np.std(extract_last_entry(best_observed_ei_all))],
    "qEI": [np.mean(extract_last_entry(best_observed_qei_all)),
            np.std(extract_last_entry(best_observed_qei_all))],
    "UCB": [np.mean(extract_last_entry(best_observed_ucb_all)),
            np.std(extract_last_entry(best_observed_ucb_all))],
    "qKG": [np.mean(extract_last_entry(best_observed_qkg_all)),
            np.std(extract_last_entry(best_observed_qkg_all))],
    "random": [np.mean(extract_last_entry(best_random_all)),
               np.std(extract_last_entry(best_random_all))],
}, index=["mean", "std"])
>>> rst_df
EI qEI UCB qKG random
mean 2.736916 2.734416 2.786065 2.706545 2.470426
std 0.116130 0.146371 0.106940 0.041464 0.247212
Listing 9-12 Extracting the mean and standard deviation for all experiments
Summary
In this chapter, we introduced the use of Bayesian optimization techniques to search
for optimal parameters of a trading strategy. We started by illustrating the concept of
optimizing trading strategies by tuning the corresponding governing parameters, a
nontrivial task. By treating the performance measure as a black-box function of the
tuning parameters, we introduced the Bayesian optimization framework, which uses
Gaussian processes and acquisition functions (such as EI and UCB) to support the
search of optimal parameters in a sample-efficient manner. With the full BO loop in
perspective, we went through a case study that optimizes the entry and exit thresholds
of a pairs trading strategy to obtain an optimal Sharpe ratio.
In the final chapter, we will look at the use of machine learning models in the
pairs trading strategy.
Exercises
How does Bayesian optimization approach the problem of hyperparameter tuning
in trading strategies? What makes this approach particularly suitable for this task?
Change the objective function to search for the parameters that minimize the
maximum drawdown of the trend-following strategy.
Bayesian optimization is based on a probabilistic model of the objective function,
typically a Gaussian process (GP). How does this model assist in identifying areas
of the search space to explore or exploit?
Can you describe a scenario where a long-term (nonmyopic) acquisition function
would be beneficial in the context of optimizing trading strategies? What about a
scenario where a short-term (myopic) function might be preferable?
Can you discuss how the incorporation of prior knowledge can be leveraged in the
Bayesian optimization process for parameter tuning in trading strategies?
How can Bayesian optimization handle noisy evaluations, a common occurrence in
financial markets, during the optimization process of a trading strategy’s
parameters?
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
P. Liu, Quantitative Trading Strategies Using Python
https://doi.org/10.1007/978-1-4842-9675-2_10
Machine learning can be used in pairs trading in several ways to improve the effectiveness of
trading strategies. Examples include pair selection, feature engineering, spread prediction, etc.
In this final chapter, we are going to focus on spread prediction using different machine
learning algorithms in order to generate trading signals.
Figure 10-1 Summarizing the three components that determine the success of a pairs trading strategy
Figure 10-2 Example of a typical model training process. The workflow starts with the available training data and gradually
tunes a model. The tuning process requires matching the model prediction to the target output, where the gap is measured by a
particular cost measure and used as feedback for the next round of tuning. Each tuning produces a new model, and we want to
look for one that minimizes the cost
In the following section, we will introduce the high-level principles of three different types
of machine learning algorithms: support vector machine, random forest, and neural network.
Note that ϵ controls the tolerance of the margin violation. It determines the trade-off
between the model complexity and the predictive accuracy. A small value of ϵ will result in a
complex model that closely fits the training data, but risks overfitting the training set and
therefore generalizing poorly to the new data. On the other hand, a large value of ϵ will result
in a simpler model with larger errors but potentially a better generalization performance.
As a user-specified hyperparameter, ϵ can strongly affect the resulting
predictive performance. A common approach to choosing it is cross-validation, which involves partitioning
the raw data into training and validation sets several times, each starting with a different
random seed. The best ϵ is the one that reports the highest predictive performance on average.
We introduce the random forest model in the following section.
Random Forest
Random forest is a type of ensemble model, which includes multiple simple models combined
together to make the final prediction. It is a powerful and flexible model that can be used for
both regression and classification tasks. As the name suggests, the algorithm constructs
multiple decision trees and combines all trees in the forest to make a final prediction.
The main differentiating factor about random forest compared with other models is how
the raw training dataset is divided to support the training of each tree. Specifically, each tree is
trained on a different subset of the data and a different subset of the features, a process known
as bagging or bootstrap aggregation. By using random subsets of the data and features, the
algorithm creates multiple independent submodels that have a low bias and high variance. The
final prediction is then produced by taking the average of the predictions of all the individual
trees, similar to collecting the views from multiple independent consultants and taking the
average recommendation as the final decision.
Note that at each node of the tree, a random subset of features is considered to determine
the best split, instead of considering all features. This process is called feature bagging. The
randomness in feature selection ensures that the trees are decorrelated and reduces the chance
of overfitting.
Random forests are widely used for their simplicity, versatility, and robustness. They can
handle a mix of numerical and categorical features, require very little preprocessing of the
data, and provide a built-in method for handling missing values. Furthermore, they offer
measures of feature importance, which can provide insights into the underlying structure of the
data.
Figure 10-4 illustrates the overall training process of the random forest model. We start by
sampling from the original training set to obtain a total of B subsets. Each sampling randomly
selects both observations and features, so that the resulting subsets appear to be independent of
each other and uncorrelated in the feature space. We will then train a decision tree model for
each subset, leading to B submodels. Upon assessing a new test data point, these B predictions
will be aggregated together and averaged to produce the final prediction.
Figure 10-4 Illustrating the training mechanism of the random forest model
Neural Network
A neural network consists of multiple interconnected nodes, also called neurons, stacked
together in layers. Each neuron serves as a function that receives input from the neurons in the
preceding layer, performs a nonlinear transformation on that input, and sends an output to the
neurons in the next layer. In between these neurons are the weights, also called parameters of
the neural network. Learning a neural network model essentially means tuning the weights so
that the final prediction is accurate, and the model generalizes well to the test set.
A typical neural network consists of an input layer representing the input data and an
output layer generating the output. It can also include any number of layers in between (called
hidden layers). Each layer contains at least one neuron, interpreted as an extracted hidden
feature. When it comes to the number of layers of a neural network, it refers to the hidden
layer plus the output layer. For example, a perceptron is a single-layer neural network,
meaning it has only input and output layers and does not have any hidden layer in between.
Being the fundamental constituent of a neural network, a perceptron is a single neuron that
completes two steps of mathematical operations: the weighted sum and the nonlinear
transformation. For a single observation with p dimensions x ∈ ℝᵖ, the perceptron first
calculates the weighted sum between x and its corresponding weight vector
w ∈ ℝᵖ, which is (and should be) also p-dimensional. The weighted sum is often accompanied
by one more term called intercept or bias, which acts as an additional parameter to exercise a
global level shift to the weighted sum to fit the data better.
After adding an intercept/bias term b, the sum passes through an activation function which
introduces a nonlinear transformation to the weighted sum. Note that the bias term is added by
inserting a column of ones in the input data, which is the same bias trick as linear regression.
Such nonlinear transformation, together with the number and width of layers, determines
neural networks' flexibility, expressivity, and approximating power. Figure 10-5 summarizes
the process flow of a perceptron.
Figure 10-5 The process flowchart of a perceptron, which consists of a weighted sum operation followed by an activation
function. A column of ones is automatically added to correspond to the bias term in the weight vector
The most popular choice of activation function is the rectified linear unit (ReLU), which
acts as an on/off switch that fires the input signal as it is if its value is above a specific
threshold and mutes it by outputting zero if it is below the threshold. In other words, the ReLU
operation is an identity function if the input is positive; otherwise, the output is set as zero.
Without such nonlinear activation, a multilayer neural network would simply become a series
of linear functions stacked on top of each other, resulting in a linear model.
Figure 10-6 visualizes the ReLU function's shape and summarizes the characteristics of the
perceptron operation discussed so far. Other than the architectural flexibility of a neural
network model in terms of the number and width of its layers, another main added flexibility
lies in the nonlinear operation. In fact, many exciting and meaningful hidden features could be
automatically extracted using ReLU as an activation function. For example, when training an
image classifier using a special architecture called convolutional neural networks, low-level
features in the initial hidden layers tend to resemble fundamental structural components such
as lines or edges, while high-level features at later hidden layers start to learn structural
patterns such as squares, circles, or even complex shapes like the wheels of a car. This is not
possible if we are limited to the linear transformation of features and is considered an
extremely difficult task if we were to engineer such informative features manually.
Figure 10-6 Decomposing a single perceptron into a weighted sum and an activation function which is often ReLU. The
ReLU operation passes through a signal if it is positive and mutes it if it is negative. Such nonlinearity also introduces great
approximating power to the neural networks in addition to the flexibility in designing the number and width of layers
One of the reasons why ReLU (and its variants) remains the most popular activation
function is its fast gradient computation. When the input is less than or equal to zero, the
gradient (of a constant number) becomes zero, thus saving the need for backpropagation and
parameter update. When the input is positive, the gradient (of the original input variable) is
simply one, which gets backpropagated as it is.
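This behavior is easy to confirm; the short check below (an illustration, not one of the book's listings) backpropagates through ReLU and shows a zero gradient for the negative input and a gradient of one for the positive input:
import torch

x = torch.tensor([-2.0, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 1.])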
Having reviewed these three model classes, let us switch to the implementation of pairs
trading and compare their performances after using machine learning models to predict the
daily spread.
import os
import random
import numpy as np
import yfinance as yf
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
from matplotlib import pyplot as plt
%matplotlib inline
SEED = 8
random.seed(SEED)
np.random.seed(SEED)
For simplicity, we will define spread as the difference in the log price of the two stocks,
which is calculated and visualized in Listing 10-2.
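For illustration, the calculation could look like the following, assuming the adjusted closing prices of the two stocks are stored in a DataFrame named df with columns 'GOOG' and 'MSFT' (the listing itself is not reproduced in full here):
spread = np.log(df['GOOG']) - np.log(df['MSFT'])
spread.plot(title='Log-price spread')
plt.show()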
Feature Engineering
Feature engineering is the process of selecting, transforming, and extracting relevant features
from the raw data in order to boost the performance of a machine learning model. The quality
and sometimes the quantity of the features are critical factors that influence the performance of
a machine learning model. These additional engineered features may not necessarily make
sense from an interpretability perspective, yet they will likely improve the predictive
performance of the machine learning algorithm by offering a new knob for the model to tune
with.
We have already encountered feature engineering in previous discussions, with the moving
average being the most notable example. In this exercise, we will use five features to predict
the spread series, including the daily returns for both stocks, the five-day moving average of
the spread series, and the 20-day moving standard deviation of daily returns. These are created
in Listing 10-3.
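A compact sketch of these five features is shown below, continuing from the log-spread calculation above; the variable names are illustrative:
ret_goog = np.log(df['GOOG']).diff()          # daily log return of the first stock
ret_msft = np.log(df['MSFT']).diff()          # daily log return of the second stock
spread_ma5 = spread.rolling(5).mean()         # 5-day moving average of the spread
ret_goog_std20 = ret_goog.rolling(20).std()   # 20-day moving std of the first stock's returns
ret_msft_std20 = ret_msft.rolling(20).std()   # 20-day moving std of the second stock's returns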
Note that this is just one way to create additional features. In practice, we would create
many more features to support algorithms such as SVM and random forest if the goal is to
maximize the predictive accuracy. For neural networks, however, such feature engineering is
helpful but not essential. Neural networks are powerful function approximators in that they can
learn the correct feature extraction given a sufficiently complex architecture and enough
training time.
We will then aggregate these features into a single DataFrame X, followed by filling NA
values with zero. We also assign the spread series to y.
Let us also split the data into a training and a test set. We will adopt the common 80-20
rule; that is, 80% of the data goes to the training set, and 20% goes to the test set. We will
also preserve the chronological order, so the 80% training set does not peek into the future,
as shown in Listing 10-4.
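A minimal sketch of the aggregation and the time-ordered 80-20 split, continuing from the feature sketch above (train_X, train_y, test_X, and test_y match the variable names used in the listings that follow):
X = pd.concat([ret_goog, ret_msft, spread_ma5, ret_goog_std20, ret_msft_std20],
              axis=1).fillna(0)
y = spread

split = int(len(X) * 0.8)  # keep chronological order; no shuffling
train_X, test_X = X.iloc[:split], X.iloc[split:]
train_y, test_y = y.iloc[:split], y.iloc[split:]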
With the training and test data ready, we can now move into the model training part,
starting with SVM.
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

svm_model = SVR(kernel='linear')
svm_model.fit(train_X, train_y)
train_pred = svm_model.predict(train_X)
>>> print("training rmse: ", np.sqrt(mean_squared_error(train_y, train_pred)))
test_pred = svm_model.predict(test_X)
>>> print("test rmse: ", np.sqrt(mean_squared_error(test_y, test_pred)))
training rmse: 0.039616044798431914
test rmse: 0.12296547390274865
Listing 10-5 Model training and testing using SVM
The RMSE measures the model’s predictive performance. However, we still need to plug
the model into the trading strategy and evaluate the ultimate profitability in the pairs trading
strategy. As the only change is on the predicted spread based on the specific machine learning
model, we can define a function to score the model as an input parameter and output the
terminal profit. The score_fn() function in Listing 10-6 completes the scoring operation.
import torch
In this function, we add another input parameter to indicate whether the model is a neural
network. This flag determines the specific prediction method to use. For
standard sklearn algorithms such as SVM and random forest, we can call the predict()
method of the model object to generate predictions for the given input data. However, when
the model is a neural network trained using PyTorch, we need to first convert the input to a
tensor object using torch.Tensor(), generate predictions by calling the model object
itself (under the hood, the forward() function within the model class is called), extract the
outputs without gradient information using the detach() method, and convert them to a
NumPy array using numpy().
Next, we calculate the z-score using the mean and the standard deviation of the predicted
spread series. We then use an entry threshold of two and an exit threshold of one to generate
the trading signals based on the standardized z-scores. The rest of the calculations follow the
same approach as in the previous chapter.
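For illustration, the signal-generation step inside score_fn() could look like the following, assuming the model's predicted spread is stored in pred_spread (the names are illustrative, not the book's exact listing):
zscore = (pred_spread - pred_spread.mean()) / pred_spread.std()
entry_threshold, exit_threshold = 2, 1
long_entry = zscore < -entry_threshold      # buy the spread when it is unusually low
short_entry = zscore > entry_threshold      # sell the spread when it is unusually high
exit_signal = abs(zscore) < exit_threshold  # unwind positions near equilibrium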
We can now use this function to obtain the terminal return for the pairs trading strategy
using the SVM model:
>>> score_fn(svm_model)
1.143746922303926
Similarly, we can obtain the same measure using the random forest regressor.
# random forest
from sklearn.ensemble import RandomForestRegressor
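The fitting code is abridged here; a minimal sketch, assuming the fitted model is stored in a variable named rf_model (a name not shown in the excerpt), would be:
rf_model = RandomForestRegressor(random_state=SEED)
rf_model.fit(train_X, train_y)
print("training rmse:", np.sqrt(mean_squared_error(train_y, rf_model.predict(train_X))))
print("test rmse:", np.sqrt(mean_squared_error(test_y, rf_model.predict(test_X))))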
The result shows that random forest can better fit the data with a lower training and test set
RMSE compared with SVM.
We also calculate the terminal return as follows:
>>> score_fn(rf_model)
0.9489411965252148
The result reports a lower terminal return, despite a better predictive performance. This is
also a form of overfitting, in the sense that a model that is more accurate at the stage-one
prediction task leads to a lower terminal return at the stage-two trading task. Combining
these two tasks in a single stage is an interesting and active area of research.
We move to neural networks in the next section.
Note that we use the .values attribute to access the values from the DataFrame and the
view() function to reshape the target into a column.
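A minimal sketch of this conversion (the tensor variable names carry a _t suffix here purely for illustration):
train_X_t = torch.Tensor(train_X.values)
train_y_t = torch.Tensor(train_y.values).view(-1, 1)  # reshape the target into a column
test_X_t = torch.Tensor(test_X.values)
test_y_t = torch.Tensor(test_y.values).view(-1, 1)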
Next, we define the neural network model in Listing 10-8. Here, we slot the attributes to
the initialization function, including one input linear layer, one hidden linear layer, and one
output linear layer. The number of incoming neurons in the input layer (i.e.,
train_X.shape[1]) and the number of outgoing neurons in the output layer (i.e., 1) are
determined by the specific problem at hand. The number of neurons in the middle layers is
user defined and directly determines the model complexity. All these layers are chained
together with a ReLU activation function in the middle via the forward() function. Also,
note that it is unnecessary to apply ReLU to the last layer since the output will be a scalar
value representing the predicted spread.
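A sketch of such an architecture is shown below; the hidden widths of 64 and 32 are assumptions chosen so that, with the five input features used here, the three linear layers hold the 2497 trainable parameters reported next (the class name is illustrative):
import torch.nn as nn

class SpreadNet(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.input_layer = nn.Linear(n_features, 64)
        self.hidden_layer = nn.Linear(64, 32)
        self.output_layer = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # chain the layers with ReLU in between; no activation on the final output
        x = self.relu(self.input_layer(x))
        x = self.relu(self.hidden_layer(x))
        return self.output_layer(x)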
Now we instantiate a neural network model in nn_model and inspect the architectural
information of the model using the summary() function, as shown in Listing 10-9.
The result shows that the neural network contains a total of 2497 parameters over three
linear layers. Note that the ReLU layer does not have any associated parameters as it involves
deterministic mapping only.
Next, we define the loss function as the mean square error using MSELoss() and choose
Adam as the optimizer over the network weights, with an initial learning rate of 0.001:
We now enter the iterative training loop to update the weights by minimizing the specified
loss function, as shown in Listing 10-10.
Here, we iterate over the training set for a total of 100 epochs. In each epoch, we first clear
the existing gradients in memory using the zero_grad() function of the optimizer. Next,
we score the training set to obtain predicted targets in outputs, calculate the corresponding
MSE loss, perform backward propagation to calculate the gradients using autograd
functionality via the backward() method, and finally perform gradient descent update using
the step() function.
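A condensed sketch of the loop, continuing from the sketches above and assuming the model instance is stored in nn_model and the training tensors in train_X_t and train_y_t:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(nn_model.parameters(), lr=0.001)

for epoch in range(100):
    optimizer.zero_grad()                  # clear accumulated gradients
    outputs = nn_model(train_X_t)          # score the training set
    loss = criterion(outputs, train_y_t)   # mean squared error against the target
    loss.backward()                        # backpropagate gradients via autograd
    optimizer.step()                       # gradient-based update of the weights
    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1}, training loss {loss.item():.6f}")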
Running the code generates the following results, where we see that the training loss
continues to decrease as iteration proceeds:
The result shows that the neural network overfits less than the random forest model.
Now we obtain the terminal return of the pairs trading strategy based on the neural network
model:
Again, this result shows that an accurate machine learning model may not necessarily lead
to a higher terminal return in the pairs trading strategy. Even if the machine learning model is
predictive of future spreads, another layer of assumption imposed by the pairs trading strategy
is that the temporary market fluctuations will ease down, and the two assets will revert back to
the long-term equilibrium relationship. Such an assumption may not always hold, given
the many unpredictable factors in the market.
Summary
In this chapter, we introduced different machine learning algorithms used in predicting the
spread, a key component when employing the pairs trading strategy. We started by introducing
the overall framework when training any machine learning algorithm and then elaborated on
three specific algorithms: support vector machine, random forest, and neural network. Lastly,
we plugged these models into the strategy and found that a higher predictive performance by
the machine learning model, a sign of overfitting, may lead to a lower performance score in
terms of cumulative return. It is thus important not to overfit the machine learning models at
the prediction stage and instead focus more on the final performance of the trading strategy at
the decision stage, where the actual trading action is made.
Exercises
How does the SVM model determine the optimal hyperplane for predicting the spread in a
pairs trading strategy? What are the key parameters that need to be adjusted in an SVM?
How does a random forest algorithm handle feature selection when predicting the spread in
a pairs trading strategy? What are the implications of feature importance in this context?
Explain how SVM, random forest, and neural networks approach the problem of overfitting
in the context of predicting the spread in a pairs trading strategy.
How can you handle nonlinear relationships between features in SVM, random forest, and
neural networks when predicting the spread in a pairs trading strategy?
How can the layers in a neural network be optimized to improve the prediction of the spread
in a pairs trading strategy?
Index
A
Acquisition function
best-observed value
closed-form EI
decision-making
defining and initializing
randomness
trade-off between
UCB
Actively managed investment pools
Active traders
add_constant() function
Advanced order types
Agency orders
Agency trading
agg() function
All-or-none (AON)
alpha argument
Annualized variance
Annualizing returns
Annualizing volatility
Annuities
apply() function
Arbitrage
Arithmetic mean
Asset classes
Asset price curve
asset_return1
asset_return2
Augmented Dickey-Fuller (ADF) test
Automated optimization techniques
Automated trading
B
Backtesting
historical data
market phases
maximum drawdown/max drawdown
optimistic assessment
parameters
performance
performance indicator
procedure
profits
risk and reward assessments
test set performance
trend-following strategy
backward() method
Backwardation
Bayesian optimization
black-box function
environment
parameter
policy
Big exchanges
Black-box function
Bollinger Bands
Bonds
botorch.optim module
Buy-and-hold strategy
Buy-side institutional investors
Buy-side prices
Buy-side retail investors
C
Call market
Candlestick() function
Candlestick charts
Cash settlement
Central depository (CDP)
Centralized order-driven market
ChatGPT model
Chicago Board of Trade (CBOT)
Clearance
Clearing house
coint() function
Cointegration
correlation
equilibrium
statsmodels package
time series
hypothesis
non-stationary time series
process
statistical analysis
statistical characteristics
traditional statistical methods
Compounded return
Compounding returns
Contango
Continuously compounded returns
Continuous market
Convenience yield
cummax() function
cumprod() function
D
Daily drawdowns
Dark pools
DataFrames
Data-generating process
datetime format
Day trader
Delivery
Derivative market
df2 variable
dfBidPrices
dfPrices
diff() function
Display orders
DJI Stock Symbols
Dow Jones Industrial Average (DJIA)
momentum trading
download() function
drawdown()
dropna() function
E
Earnings per share (EPS)
Electronic communications networks (ECNs)
Electronic markets
buying and selling financial instruments
discrete price grid
discrete price ladder
display vs. non-display order
electronic order
limit order
limit order book
market order
market participants
MIT
order flow
order matching systems
order types
pegged order
price impact
proprietary and agency trading
revolution
stop-limit order
stop order
trailing stop order
E-Mini futures contract
equals() function
Evaluation-period performance
ewm() method
ExactMarginalLogLikelihood() function
Exchange-traded funds (ETFs)
Execution risk
Expected improvement (EI)
Exploratory data analysis
Exponentially weighted moving average (EWMA)
Exponential moving average (EMA)
Exponentiation
F
FAK orders
Feature engineering
SVM and random forest
training and test data
DataFrame X
Fill and kill (FAK)
Fill or kill (FOK)
Financial assets
Financial data analysis
definition
downloading stock price data
summarizing stock prices
visualizing stock price data
Financial derivatives
Financial instrument
Financial market stability
Financial trading
First-order gradient-based methods
First-period return
Flexible controls
FOK orders
Forward and futures contracts
age-old practice
derivative products
financial instruments
futures trading
key difference
market participants opportunities
predetermined quantity
purchase and receive obligation
Forward contract
arbitrage opportunities
buy-low-sell-high principle
counterparty risk
current time point
definition
exponential constant
formula
net cash flow
no-arbitrage argument
portfolio
predetermined delivery price
private agreements
risk-free interest rate
stock and cash positions
trading price and quantity
unforeseen circumstances
Futures contract
clearing house
hedging and speculation
leverage
mark-to-market
obligations at maturity
parameters
pricing
standardized features
standardized contracts
Futures data
closing price
downloading
fontsize argument
“GC=F” and “HG=F” symbols
technical indicators
visualizing
yfinance package
Futures trading
G
Gaussian distribution
Gaussian process (GP) model
generate_initial_data()
get_stock_data() function
go.Bar() function
Group tradable assets
derivative products
maturity
nonlinear payoff function
payoff function linearity
H
head() function
Hedge funds
Hedgers
Hedging
Hidden/non-display orders
High-frequency trading (HFT)
Hyperparameter tuning
Hypothesis testing
I, J, K
Iceberg orders
idxmin() function
iloc() function
iloc() method
Immediate or cancel (IOC)
Implementing trend-following strategy
buy-and-hold
cumulative returns analysis
framework
long-term moving average
momentum-related technical indicators
1+R return
short-term moving average
signal column
sign switch
single-period return
SMA-3 and SMA-20
trading actions
trading rule
transaction cost
info() function
initialize_model()
Input data groups
financial news
fundamentals
market states
technicals
Institutional algorithmic trading
Interpolation
L
Label distribution
Leverage
Limit order
Limit order book (LOB)
Linear regression model
LOB data
data folder
features
label distribution
limit prices
normalized data representations
price-volume data
price-volume pair
visualizing price movement
loc() function
Logarithmic returns
advantages
compounding returns
dummy stock prices
mathematical computations
natural logarithm
percentage return
1+R approach
sequential compounding process
single-period returns
stock price analysis
stock returns calculation
terminal return
Lookback windows application
M
Machine learning
components
market-neutral strategy
pairs trading
calculation
stocks
trading horizon
trained model
training process workflow
training situation
types
make_subplots() function
Marked to market (MTM)
Market if touched (MIT)
Market maker
Market-neutral trading strategy
Market-not-held orders
Market orders
Market participants
Market timing
Mark-to-market
definition
exchange
final settlement price
fluctuating prices
long margin account
minimum requirement
price updation
profit and loss
traders risk exposure
Maximum drawdown
buy-and-hold strategy
calculating
calculation process
daily returns
DataFrame
distance
line charts
performance
risk-adjusted return metric
risk measure
stock price data
stock returns
stocks
trading strategy
volatility
wealth index curve
Maximum log-likelihood (MLL) approach
mean() method
Mean square error
Model development workflow
Model training process
Momentum trading
asset’s price
characterizes
current month
elements
time frame
volatility
volume
measurement period
monthly returns
principle
terminal return
traders
traders and investors
and trend-following
Moving Average Convergence Divergence (MACD)
Moving averages (MA)
Multiperiod return
Mutual funds
N
NaN value
NASDAQ Nordic stock market
Neural network
fundamental constituent
input data and an output layer
linear regression
parameters
ReLU function
New York Stock Exchange (NYSE)
No-arbitrage argument
Nonconvex function
Non-display orders
Normal contango
Normality
n-period investment horizon
np.exp() function
np.mean() function
Null hypothesis
O
Objective functions
OHLC prices
OHLC chart
On-balance volume (OBV)
One-dimensional objective function
Online trading platforms
Optimization
argmax operation
decision-making
derivative-free
global
optimizer
parameters
procedure
time and resources
trading strategy
Order-driven market
Order flow
Order matching systems
definition
conditional orders
electronic exchanges
exchanges
non-displayed orders
order precedence rules
order types
price/display/time precedence rule
rule-based systems
Order precedence rules
types
price precedence
size precedence
time precedence
Order types
Ordinary least squares (OLS)
Over-the-counter (OTC)
P
Pairs trading
assets
asset selection
components
implementation
mean-reverting behavior
neural network
SVM
fit() method
predict() method
score_fn() function
sklearn algorithms
torch.Tensor()
strategy
view() function
traders
Pandas DataFrame
pct_change() function
pd.DataFrame() function
Pegged order
algorithm
best bid
composite order
definition
differential offset
dynamic limit price
limit order
reference price
securities
Percentage change
Percentage returns
p-hacking
Physical delivery
plot() function
plotly package
Potential trading opportunities
predict() method
Price impact
Price ladder
Price movement visualization
Price precedence
Price return
Price slippage
Principle of compounding
prod() function
Program trading
Proprietary orders
Proprietary trading
Q
qcut() function
Quantitative trading
algorithm
avenues and steps
buy-side investors
common assets
See Tradable assets, quantitative trading
data collection and processing
definition
grouping tradable assets
institutional algorithmic trading
market making
market structures
model development workflow
order execution
portfolio rebalancing
process
quant trader
scalping
structured features
Quant trader
Quote-driven/price-driven market
R
Random forest
bagging or bootstrap aggregation
factor
features
training process
Random forest regressor
Real estate investment trusts (REITs)
Rebalances a portfolio
Rebalancing
Rectified linear unit (ReLU)
Relative Strength Index (RSI)
resample() function
return_df variable
Returns analysis
annualized returns calculation
annualizing
description
dummy returns
multiperiod return
1+R format
single-period returns calculation
stock return with dividends
terminal return
two-period terminal return calculation
Return values
1+R format
1+R formatted DataFrame
Risk-adjusted return
Risk analysis
annualized returns calculation
annualized volatility calculation
column-wise arithmetic mean returns
Sharpe ratio
Sharpe ratio calculation
stock price data
variance and standard deviation
volatility
Risk and return trade-off
diversification strategies
factors
individual asset
low-return asset
profit maximization
stock market
two-dimensional coordinate system
Risk-free bond interest rate
Risk-free interest rate
1+R method
rolling() function
Root mean squared error (RMSE)
1+R return
Rule-based approach
S
Scalping
shape() function
Sharpe ratio
shift() function
Short-term swings
Simple moving average (SMA)
Singaporean investment
Singapore Exchange (SGX)
Single-period logarithmic return
Single-period log returns
Single-period percentage return
Single-period returns
Single-period volatility
Size precedence
Slippage
SMA-3
Speculators
S&P 500 E-Mini futures contract
Spot market
Stacked bar charts
Standard deviation
Standardization
Stationarity
adfuller() function
distribution
mean and standard deviation
random.normal() function
stationarity_test()
stock prices
time series
Statistical arbitrage
concept
market movements
mean reversion
short-term fluctuations
short-term market factors
statistical methods
steps
stocks
Statistical concept
Statistical measures
std() function
Stock data
Stock price data
Stock return with dividends
Stocks
Stop-entry order
Stop-limit order
Stop-loss orders
Stop orders
summary() function
Sum of the squared errors (SSE)
Support vector machine (SVM)
hyperplane
input-output pairs
mathematical functions
support vectors
user-specified hyperparameter
Symmetry
T
tail() function
Tangible and intangible factors
Technical indicators
additional features
Bollinger Bands
DataFrame
EMA
integral
MA
MACD
market analysis clarification
mathematical calculations
raw futures time series data
RSI
SMA
volume-based indicators
Terminal monthly return
Terminal return
Ticker() module
Time precedence
Time series data
today() function
torch.Tensor() function
Tradable assets, quantitative trading
annuities
bonds
cash and equivalents
commodities
currencies
ETFs
forward
futures
hedge funds
mutual funds
options
REITs
stocks
Trade formation period
Traders
Trading agency
Trading algorithm
Trading avenues
Trading signals
Trading steps
acquisition of information and quotes
confirmation, clearance, and settlement
execution of order
routing of order
Trading volume
Trailing stop orders
Transactions
Trend-following strategy
definition
implementation
See Implementing trend-following strategy
log return
See Logarithmic returns
lookback window
risk management techniques
technical indicators
See also Trend trading
Trend traders
Trend trading
definition
EMA
fundamental principle
moving average
SMA
technical analysis tools
technical indicators
See Technical indicators
Two-period return
Typical model training process
U
Unit root test
Upper confidence bound (UCB)
V, W, X
value_counts() function
Variance and standard deviation
Volatility
Volume-weighted average price (VWAP)
Y
Yahoo! Finance
yfinance library
yfinance package
Z
zero_grad() function
Zero-sum game
Z-score