A Beginners Guide To Python Programming For Traders
Connors Research, LLC
Steve Jost
Publishing
Copyright © 2020, Connors Research, LLC.
Published by The Connors Group, Inc., 185 Hudson St., Suite 2500, Jersey City, NJ
07311
Publisher’s Notice
The publisher has provided this eBook to you without Digital Rights Management (DRM)
software applied so that you can enjoy reading it on your personal devices. This eBook
is for your personal use only. You may not print or post this eBook, or make this eBook
publicly available in any way. You may not copy, reproduce, or upload this eBook
except to read it on your personal devices.
Copyright infringement is against the law. If you believe the copy of this eBook you are
reading infringes on the author’s copyright, please notify the publisher at
[email protected].
ISBN 978-0-578-68440-6
It should not be assumed that the methods, techniques, or indicators presented in these products will be
profitable or that they will not result in losses. Past results of any individual trader or trading system
published by Company are not indicative of future returns by that trader or system, and are not indicative
of future returns which may be realized by you. In addition, the indicators, strategies, columns, articles and all
other features of Company's products (collectively, the "Information") are provided for informational and
educational purposes only and should not be construed as investment advice. Examples presented on
Company's website are for educational purposes only. Such set-ups are not solicitations of any order to
buy or sell. Accordingly, you should not rely solely on the Information in making any investment. Rather,
you should use the Information only as a starting point for doing additional independent research in order
to allow you to form your own opinion regarding investments. You should always check with your licensed
financial advisor and tax advisor to determine the suitability of any investment.
Chapter 1 - Introduction
● Why is Python taking over Quantitative Finance?
● What can Python do for your trading and your career?
● My Journey
● About this e-book
Python is the hottest programming language on the planet, not just in the Quantitative Finance
field but also in many industries. Python continues to gain more users than any of its
competitors and is soon projected to be the most popular programming language in the world.
Stackoverflow.com, the most popular website for coding-related questions, reports that the
number of questions regarding the Python programming language is growing much faster than
that of any other language, as the following graph shows, indicating Python's recent rise in
popularity:
Python is behind some of the largest companies and most important projects in the world, from
helping Netflix stream videos to more than 100 million homes worldwide, to powering the very
popular photo-sharing phenomenon Instagram, to helping NASA in space exploration.
In fact, it was Python that recently aided NASA in stitching together the first image of a black
hole some 500 million trillion kilometers away!
While Python is popular in many industries, it is especially popular in the Data Science arena in
general and Quantitative finance in particular.
Not only is Python growing faster than other programming languages for general use, it's
already the most used language for quantitative finance and trading on the professional level.
There are many reasons for Python’s rise to the preeminent language used in quantitative
finance and data science. Some of these reasons include:
Don’t take our word for this, just do a quick search for any trading related job posting by the
largest and most sophisticated investment banks and hedge funds in the world.
You will see that all of these institutions require Python knowledge to get your foot in the door.
These companies don’t care that you can code in retail products like Amibroker and
TradeStation!
These institutions realize that Python makes them more efficient and, more importantly, more
profitable! They require that any new hires be well versed in the Python programming language.
My trading and strategy development skills were greatly enhanced when I learned how to
program in Python. Below is a quick summary of my coding and systematic trading journey so
far.
Before joining Connors Research, I was an electrical engineer for 35 years, retiring in 2019.
Aside from engineering, my other passion is finance and trading.
As an avid reader of stock trading books, I applied techniques I read about to managing my own
money but never really achieved much success as a discretionary trader.
I was also drawn to systematic trading and viewed it as a way to possibly achieve the success
that eluded me as a discretionary trader. If I could develop one or more quantified approaches
that suited my personality and risk tolerance, I would be much more likely to have confidence in
my trading and be able to stick it out through the inevitable drawdowns. In short, having realistic
expectations based on quantified results and a set of rules to follow would enable me to control
my emotions and to become a better trader.
I started my journey to systematic trading using Excel, but it quickly became apparent that if I
was going to make any real progress I had to learn how to program.
My first venture into programming was using a community-based platform called WealthLab in
2003. As an open source platform, WealthLab had an active community that would freely share
code. Not all of it was great, as there was a lot of curve fitting and one-upmanship, but I did take
away from this experience a few good ideas and I learned the basics of how to program. Fidelity
bought WealthLab in 2004 and that was the end of the experience for me.
I drifted away from systematic trading for lack of an inexpensive platform, but eventually found
one that I could afford with Amibroker and Norgate Data. About that time I also found Connors
Research and purchased a number of their guidebooks on short-term trading. I enjoyed
implementing the rules in Amibroker code and obtained good results trading the systems.
Enter Python
Amibroker worked out well for me for a number of years, but it had its limitations.
Because Amibroker's language is closed source, I was at the mercy of the developers at Amibroker
to advance the language and extend its functionality. I also missed the community-based
experience that I had years earlier with WealthLab.
I needed to step up my game. While Amibroker was fine, at the end of the day it is a retail
product. I wanted to enter the big leagues and learn a more flexible, open-source, professionally
used coding language. A language that the biggest and best banks and hedge funds on Wall
Street were using.
I began reading how Python was taking over quantitative finance and how it could be used for
trading and backtesting. The decision from there was easy; I set out to learn Python and to find
another platform to develop and backtest trading algorithms.
Fast forward nine months and Python is now my primary tool for trading - ranging from
developing complete trading strategies to analyzing my backtests, to finding new trading edges.
Python has greatly expanded my skill set, ultimately making me a better, more profitable trader.
Below are some of the ways I use Python in my trading and research. I can now...
● Code any strategy I can think of with much greater efficiency and improved flexibility.
● Code portfolio level trading strategies, not just strategies applied to one security at a
time. This is a required skill for professional trading system development.
● Perform backtests on Futures Contracts, using continuous futures with several “roll”
options for realistic simulations.
● Test individual trading signals for historical edges before incorporating that signal into a
complete strategy. Answering questions such as “every time the RSI has been below
10, what has happened over the next 3 business days?” for example.
● Run statistical and machine learning models, which are already pre-programmed in
Python.
● Make custom charts and plots of any data I can get my hands on - this includes things
like line charts, bar charts, scatter plots, and correlation matrices.
● Analyze data not just from the Quantopian database, but also from any website, program
with a Python API (there are tons of them), Excel spreadsheets or CSV files.
● Interact with a large community of traders, data scientists, developers, and researchers
using Python. This has huge benefits including always having somebody available to
help with questions and generous sharing of code throughout the community.
This e-book is designed to help start you on your journey to using the power of Python to
improve your research, trading, and career prospects. It is not meant to be a complete course
but rather is meant to start you down the right path.
There are many great sources teaching the basics of Python. With this e-book, we don’t intend
to just put another resource out there that covers the same topics. Instead, our aim is to teach
some of the most important skills you will need to know in Python specifically for quantitative
finance, research, and systematic trading.
In this e-book, you will first learn the basics of Python, setting the foundation for some of the
more advanced topics we cover later in the book. This includes topics such as basic math in
Python, variable assignments, comparison operators, data types, and control flow statements.
While this serves as a general introduction to the language, special attention is paid to Python
techniques useful for trading and quantitative research.
After the basics have been covered, we will move on to the most important Python package
used for quantitative financial research and trading – Pandas.
After spending some time getting to know the basics of the Pandas package, we will go on to
the heart of the book, demonstrating some quantitative research and trading strategy
development using our new Python and Pandas skills.
We will start with a highly practical quantitative research case study. In this chapter, we will
walk you through how to check for edges in the marketplace by observing the future
performance of a security based on the reading of a technical analysis indicator. For our
example, we will use the popular technical analysis indicator RSI (relative strength index).
After gaining the knowledge provided in this chapter, you will be able to take this general
framework and inspect any trading indicator (or combination of indicators) to statistically see if it
has any predictive power.
We will then move on to an introduction to Zipline - the backtesting engine that powers
Quantopian.com. You will learn the basics of the Zipline API, the structure of a trading
algorithm, how to send orders, and how to access portfolio level information.
Finally, we will end the book with a complete walkthrough of how to code our first full-blown
trading strategy using the research we conducted in chapter 7. We will go step by step,
explaining the logic of the trading strategy complete with code snippets. At the end of the
chapter, we will provide the complete source code, which is also available for download.
Last but not least, we will inspect our test results using Pyfolio - the open-source Python
package designed to analyze historical backtested results.
I want this to be a highly practical book, specifically designed to start you on your journey to
leveling up your trading by showing real-world research examples and developing a profitable
trading strategy from scratch.
My hope is that you walk away from this book excited by the possibilities Python can offer an
aspiring, or even an established, quantitative trader or investment manager.
In this chapter, we are going to quickly cover the tool we will be utilizing throughout the book -
Quantopian.com. While we will be working in Quantopian for the purposes of this book,
remember all of the Python skills you will learn can be applied to your local environment as well.
The first step to get going with Quantopian is to set up a Quantopian account. This is done
simply and easily and should take all of five minutes to complete.
1. Go to Quantopian.com and click on the “Sign Up” button on the top right-hand corner.
2. From there, simply fill in your name, email address, and create a password.
3. A confirmation email will be sent to the email address provided. Pull up this email and
verify it is you. After that you are all set, just go back to Quantopian.com, enter your
credentials under “Log In” (top right hand corner) and you are in. Welcome to
Quantopian!
Research/Notebook Environment
The first environment, or place where you write code, we will cover in Quantopian.com is what is
known as the “research” or “notebook” environment. This environment contains embedded
Jupyter Notebooks. These Jupyter Notebooks have quickly become the ubiquitous tool for
quants everywhere, with some going as far as saying that Jupyter Notebooks are the new
Excel!
These notebooks are primarily designed to facilitate iterative coding. This means having the
ability to observe your data, write some code, and inspect the results in a step-by-step fashion.
This ensures that your code is doing what you intend it to do and greatly helps the development
process.
To enter the research/notebook environment, click on the “Research” dropdown menu at the top
of Quantopian.com (after you have logged in to your account) and select “Notebooks” from the
dropdown menu.
This will take you to a screen containing your existing notebooks. To create a new notebook,
click on the blue plus sign ( + ) at the top right hand side of the page.
Algorithm Environment
The other environment we will be working in is known as the “algorithm” environment. This is
where you construct complete trading models and produce historical (backtested) test results.
We will be working in this environment in Chapters 8 and 9, where we construct a trading model
from scratch.
To access this environment, again go to the “Research” tab at the top of the page. This time
select “Algorithms” from the dropdown menu.
This takes you to a screen displaying all of your saved Algorithms. To create a new Algorithm,
select “New Algorithm” at the top right-hand side of the screen.
Images provided courtesy of Quantopian.com
After you name your new Algorithm, this takes you to the Algorithm environment, where you can
construct full trading models. There will be some pre-populated sample code to get you started.
As a beginner, I would suggest deleting that code and starting from scratch. I will guide you
through writing a complete trading Algorithm in chapters 8 and 9.
Now that you are familiar with Quantopian.com, let's go on to some Python basics. All of the
example code provided over the next couple of chapters has been created in the
research/notebook environment. We will then go on to code in the Algorithm environment to
conclude the book.
Chapter 3 - Getting Started with Python - The Basics
Variables
A variable can best be thought of as a named container used to store a value.
As a practical example, let's say you wanted to code a moving average. You first have to
choose a length for your moving average (50-day, 200-day, etc). You could hard code this into
your code, or you can make a variable called something like “ma_length” which contains the
length of that moving average.
The big advantage of the latter approach is that you can easily go back and change the length
of the moving average by changing the value of “ma_length”. All moving averages in your code
that use “ma_length” will then be changed as well.
This makes your code much more dynamic and robust, and is considered a best-practice
methodology.
Before we use a variable in our code, we first have to assign it. Assigning variables in Python is
straightforward: we just use the equal sign (=). This stores whatever is to the right of the equal
sign in the variable named on the left.
From there, we can just reference the variable name in our code, and Python will know what
value that variable holds.
In the next example, we set “ma_length” to the number 200. We then printed out “ma_length”
and Python told us this variable is now set to 200.
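The assignment described above can be sketched as follows. The name ma_length and the value 200 come from the text; the comments are added for illustration.

```python
# Store the moving-average length once so it can be reused and changed in one place.
ma_length = 200

# Referencing the name gives back the stored value.
print(ma_length)
```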
Doing basic math in Python is very straightforward. The following table displays some of the
basic mathematical operations that can be done in the language.
You might be thinking, why do I need Python for this? Doesn’t a simple calculator or a
spreadsheet such as Microsoft Excel have this functionality?
Yes, it does, but as will become clearer as your Python skills advance, Python can get
tasks done far more efficiently than those other tools.
Below, find some examples of using Python for basic math operations.
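As a minimal sketch of the basic math operations, here is one way they might be used on a simple trade. The variable names and numbers are made up for illustration, and Python 3 division behavior is assumed.

```python
# Hypothetical trade values, for illustration only.
entry_price = 50.0
exit_price = 55.0
shares = 100

cost = entry_price * shares                     # multiplication
profit = (exit_price - entry_price) * shares    # subtraction, then multiplication
final_value = cost + profit                     # addition
pct_return = profit / cost                      # division
squared = 3 ** 2                                # exponent (3 to the power of 2)
remainder = 10 % 3                              # modulo (remainder after division)

print(cost, profit, final_value, pct_return, squared, remainder)
```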
Comparison operators are what they sound like - these are statements comparing one thing to
another. Common comparison operators include “greater than”, “less than”, “equal to” etc.
Find a table below outlining common comparison operators in Python which will be very useful
in future trading logic.
One thing to point out here trips up some new programmers: the code that checks whether
something is equal to something else. Notice that for this logic, we need a double
equal sign (==).
This is because a single equal sign is used to assign a variable in Python, as we already
learned.
When you ask Python to evaluate comparison operations, you will get a “True” or “False” value
back. When your Python skills improve, you will learn how to use this output to control your
trading logic.
For now, some simple code examples using comparison operators are below.
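A short sketch of the common comparison operators follows. The variable names and values are hypothetical; each comparison evaluates to True or False.

```python
# Hypothetical values, for illustration only.
price = 105.0
moving_average = 100.0

print(price > moving_average)    # greater than
print(price < moving_average)    # less than
print(price >= moving_average)   # greater than or equal to
print(price <= moving_average)   # less than or equal to
print(price == moving_average)   # equal to (double equal sign)
print(price != moving_average)   # not equal to

# The result of a comparison can itself be stored in a variable.
above_average = price > moving_average
```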
Data and analysis tools provided by Quantopian.com.
Comments
Comments are an important part of programming, and it is best to get in the habit of liberally
commenting your code right from the beginning.
Comments are lines in your code that are not executed by the computer and instead are
designed to be read by an actual human. Use them to explain what the code is doing. This has
a dual benefit.
First, it helps you understand what some code you previously wrote is trying to do when you
come back to it sometime in the future.
Second, it helps others to understand what your code is doing. This is especially necessary if
you are collaborating on a team, as your comments help another team member understand the
code.
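Comments in Python are done in one of two ways. You can either start a line with a hash sign (#)
or wrap the text in triple quotes (''').
A hash sign comments out a single line, while an opening triple quote comments out every line
that follows until a closing triple quote is encountered. Strictly speaking, a triple-quoted block
is a string that Python evaluates and then discards, but it is commonly used as a multi-line
comment.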
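Both comment styles might look like this in practice. This is a minimal sketch, and the variable is only there so the snippet has something to run.

```python
# This is a single-line comment; Python ignores everything after the hash sign.
ma_length = 200  # a comment can also follow code on the same line

'''
This triple-quoted block spans several lines.
It is a string that is never assigned to anything,
so it has no effect when the code runs.
'''

print(ma_length)
```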
DataTypes are best thought of as categories of data in Python, and they affect how Python treats
a data point in calculations and scripts.
It’s not immediately clear to novice programmers why they need to even care about what
DataType a certain variable is. After you have been programming for a while, however, it will
become obvious why we need to care – because Python treats different DataTypes in different
ways!
A failure to understand this can cause your code to fail or, worse, cause it to behave
differently than you intended.
The DataTypes covered here are:
1. Integers
2. Floats
3. Strings
4. Booleans
5. Python Lists
6. Python Dictionaries
type() function
Before we dive into different DataTypes, we first must cover the type() function. This function
simply returns the DataType of whatever object you pass into it.
If you are unsure of what DataType something is, pass it into the type() function and Python will
tell you.
Integers
In the example above, the first piece of code just passes the whole number 3 into the type()
function. Python then told us 3 is an integer (“int”).
In the second piece of code, we first set the variable “x” to the whole number 10. We then
passed “x” into the type() function and Python again told us that “x” is an integer, since it is set to
10.
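The two integer checks described here can be sketched as below. How the type is displayed can differ between Python versions and environments, so no exact output is shown.

```python
# Pass a whole number directly into type().
print(type(3))

# Assign a whole number to a variable, then check the variable's type.
x = 10
print(type(x))
```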
Floats
Here we pass the number 5.0 into the type() function. Since 5.0 has a decimal point,
Python tells us this is a float. We then set the variable “x” to the number 0.35 and pass “x” into
the type() function. Python tells us this is a float as well.
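A minimal sketch of the float checks just described, using the same values as the text:

```python
# A number written with a decimal point is a float.
print(type(5.0))

# The same holds when the float is stored in a variable first.
x = 0.35
print(type(x))
```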
Strings
A string in Python is technically anything in quotes. Typically, but not always, strings are words
as opposed to numbers.
Here we pass the phrase ‘python is cool’ into the type() function. Python tells us this is a string.
Notice the quotes around ‘python is cool’, indicating that this is, in fact, a string. We then set
the variable “x” to ‘python is cool’ and pass “x” into the type() function. Python again tells us
“x” is now a string.
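The string example can be sketched the same way. The last line is an added illustration that a number in quotes is also a string.

```python
# Text in quotes is a string.
print(type('python is cool'))

x = 'python is cool'
print(type(x))

# Quotes make the difference: '200' is a string, while 200 is an integer.
print(type('200'))
```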
Booleans
Booleans can best be thought of as a type of “on” or “off” switch in Python. Technically,
booleans are “True” or “False” values. Notice the capitalization of “True” and “False.” These
are special Python words, stored as Boolean values.
You can either set a variable as “True” or “False” directly or code a logical operation which
Python will then read and return either “True” or “False”, similar to what we saw previously.
In the code above, we first check the type of the words “True” and “False”. Python told us these
were of the “bool” datatype, short for Boolean. We then set the variable “x” to “5 > 3” and
checked the type of x, “bool” again. Finally, we printed out “x”, which Python tells us is True
(because 5 is greater than 3).
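A sketch of the Boolean checks described here, following the same steps as the text:

```python
# True and False are built-in Boolean values.
print(type(True))
print(type(False))

# A comparison produces a Boolean, which can be stored in a variable.
x = 5 > 3
print(type(x))
print(x)
```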
Python Lists
A Python list is an ordered collection of objects. Python lists are written with square brackets “[]”.
The len() function is a quick way to check the length of an object. In this case, we are working
with Python lists, which are collections of objects in Python, as we just learned.
To check how many objects are in our Python list, we simply pass the list into the len() function.
In the code above, notice that Python returned the integer “5” after we passed our list
(my_first_list) into the len() function. This is because there is a total of 5 objects in this Python
list.
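The list and len() call described here might look like the following sketch. The name my_first_list comes from the text; the five ticker symbols inside it are hypothetical placeholders.

```python
# A list of five objects (here, five strings), written with square brackets.
my_first_list = ['SPY', 'QQQ', 'IWM', 'TLT', 'GLD']

print(type(my_first_list))

# len() returns how many objects the list contains.
list_length = len(my_first_list)
print(list_length)
```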
The len() function will come in handy when we are building trading algorithms, as you will see
later in this book.
Python Dictionary
A Python dictionary is a collection of key:value pairs. Python dictionaries are designed so you
can look up a key and get back the associated value. This is similar to a regular dictionary, where
you look up the word (the “key”) and it tells you the definition (the “value”).
Python dictionaries are set up using curly brackets “{}”. Each key is separated from its value by a
colon, with a comma in between each pair.
In the code above we first set the variable “my_first_dictionary” to a dictionary we made in curly
brackets, containing three key:value pairs of three countries and their corresponding largest city.
We then checked the type of the variable “my_first_dictionary”, and Python told us this is a
dictionary.
We then printed out the dictionary and Python displays it for us.
Finally, to show the practical functionality of Python dictionaries, we referenced our dictionary
and followed that by a key in square brackets. Python returns to us the value associated with
that key.
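The dictionary steps above can be sketched as follows. The name my_first_dictionary comes from the text, but the three countries and cities shown are assumptions chosen for illustration.

```python
# Three key:value pairs: country (key) and its largest city (value).
my_first_dictionary = {'USA': 'New York', 'Japan': 'Tokyo', 'France': 'Paris'}

print(type(my_first_dictionary))
print(my_first_dictionary)

# Look up a value by putting its key in square brackets.
largest_city = my_first_dictionary['Japan']
print(largest_city)
```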
Control flow statements are a very important topic, specifically for trading purposes. These
statements control the “flow” of the code and are used to determine which lines of code get
executed and which do not.
As a practical example, it’s easy to imagine that we want to code logic such that if the price is
above a moving average we will buy, and if it is below a moving average we will sell. The most
straightforward way to achieve this would be an “if” statement.
In Python, white space matters! Said another way, code that is indented under a condition will
only be executed if that condition is True. If not, the indented code will not be executed. If this
is confusing to you, the code examples to follow should clear up any confusion.
If statements
If statements first check if a condition or conditions are True. The code indented under the if
statement is executed if the preceding condition is True and not executed if it is False.
In this next code snippet, we asked Python if 5 is less than 2. Of course, this is not true. As
such, since this line of code is False, the code indented under it which prints out “yes, this is
true” is not executed, which is why you don’t see it printed out.
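A minimal sketch of the if statement just described:

```python
# 5 < 2 is False, so the indented line is skipped and nothing is printed.
if 5 < 2:
    print("yes, this is true")
```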
If … Else Statements
If … else statements are exactly what they sound like. Python first checks if the condition
following the “if” is true. If so, it executes the code under that condition. If not, then it executes
the code under the “else” statement.
In this code, Python first checks if 5 is greater than 2. Since it is, the code under that is
executed, so “yes, this is true” is printed.
Since the first if statement is true, the code under the else statement is not executed.
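The if … else logic might be sketched like this. The message under “else” is an assumption, since the text only gives the message for the True case.

```python
# 5 > 2 is True, so the first branch runs and the else branch is skipped.
if 5 > 2:
    message = "yes, this is true"
else:
    message = "no, this is false"  # hypothetical message, not from the text

print(message)
```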
The next control flow statement we will learn is if … elif … else. This statement first checks if
the condition following “if” is True. If it is, the code indented under the if statement
gets executed.
If not, then Python checks to see if the condition following “elif” (short for “else if”) is True. If
that condition is True, the code indented under it is executed.
If both the if statement and the elif statement are false, then the code under the “else” statement
is executed. The following code examples demonstrate this concept.
In the above code, we set the variable “x” to the integer 30. In the first if statement, we check if
“x” is greater than 20. Since it is, the indented code under the first if statement is executed,
printing out “the first if statement is true”.
Finally, in our last code example, we set the variable “x” to be the integer 10. As such, the first if
statement and the second elif statement are both False, since 10 is not greater than 20 and 10
is also not greater than 15. Since these lines are both False, Python executes the code under
the else statement – printing “neither are true”.
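The if … elif … else cases can be sketched with one small helper that is run on several values of “x”. The thresholds 20 and 15 and the first and last messages come from the text. The function name check_value, the elif message, and the test value 17 are assumptions added for illustration.

```python
def check_value(x):
    """Return the message for whichever branch matches x."""
    if x > 20:
        return "the first if statement is true"
    elif x > 15:
        return "the elif statement is true"  # hypothetical message
    else:
        return "neither are true"

print(check_value(30))  # first condition is True
print(check_value(17))  # first is False, elif is True
print(check_value(10))  # both are False, so the else branch runs
```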
Loops
Loops are an important topic in any programming language. Loops are used to iterate through a
collection of items, such as a Python list or a Python dictionary. There are two basic types of
loops – “for” loops and “while” loops.
A “for” loop iterates through each item in a collection one by one, executing whatever desired
code you wish to implement on each item in the collection. Once the loop works through each
item in the collection, the loop is stopped.
A “while” loop continues to execute the loop while a condition is True, only stopping when that
condition is False.
For Loops
In our first example, we will just make a Python list containing a collection of numbers. We will
then write a for loop to iterate through the list, simply printing out each item.
Notice in the code above, we set up a for loop by typing the Python keyword “for” followed by an
iteration variable “x”. This iteration variable can be named anything; for simplicity we will stick
with “x” as its name. We next type “in” followed by the name of the collection
we wish to iterate through - “my_list”. This is followed by a colon.
The indented code is what happens inside the loop. In this simple example, all we do is iterate
through the list and print out the contents one by one as shown above.
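The for loop described here can be sketched as below. The names my_list and x come from the text; the numbers in the list are assumed.

```python
# A list of numbers to iterate through (values are illustrative).
my_list = [1, 2, 3, 4, 5]

# The indented line runs once for each item in the list.
for x in my_list:
    print(x)
```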
Here is another example. This time we will iterate through the same list, add 5 to each item,
and print out the result.
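A sketch of this second loop, again assuming the list holds the numbers 1 through 5. The results list is an addition that keeps the computed values so they can be inspected afterwards.

```python
my_list = [1, 2, 3, 4, 5]
results = []  # added for illustration: collects each computed value

for x in my_list:
    new_value = x + 5
    print(new_value)
    results.append(new_value)
```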
While Loops
A while loop executes the loop while a certain condition is True. Only when that condition fails
to be True does the loop stop.
In the code above, we first set the variable “y” equal to the integer 0. We then begin our while
loop. Our while loop first prints out the current value of “y”, then adds 1 to that value.
Notice that this loop executes only while the logical statement, in this case “y is less than or
equal to 5”, is True. Once the value of y becomes greater than 5, the loop is stopped. This is
why the integer 6 is not printed out.
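The while loop just described can be sketched as:

```python
y = 0

# Keep looping as long as y is less than or equal to 5.
while y <= 5:
    print(y)
    y = y + 1  # without this line the condition would never become False
```

When the loop ends, y holds 6, which is the first value that fails the condition, so it is never printed.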
Chapter 6 - Intro to Pandas: The Most Important Library for
Quantitative Trading
Now that we have some Python basics down, let’s get into more useful Python programming as
it applies to Quantitative Finance and Trading. The most useful Python library for quantitative
research and trading is called Pandas.
Pandas is a software library for Python designed specifically for data manipulation and
analysis. The Pandas library works with data in tabular format.
Tabular data is the main structure of data you will encounter in Finance and Trading. Pandas
was built specifically to work with this type of data, and more specifically, Pandas was built to
handle time series data.
Time series data refers to data that has an associated timestamp, such as a date or time (or
both). This is extremely common in Finance. Think of any price data you have encountered, for
example. This data will contain prices, maybe open, high, low and close along with volume,
accompanied by a timestamp. That is an example of time series data.
Let’s now jump into some Pandas coding examples. After we cover the basics of Pandas, we
will use our new-found skills to do some quantitative research, observing historical edges
provided by the popular technical analysis indicator RSI. Finally, we will go on to build our first
complete trading model in Python, incorporating the conclusions from our research project!
Let's get started.
The two main data structures of the Pandas library are known as Pandas DataFrames and
Pandas Series.
Most of your work will be on Pandas DataFrames. Pandas DataFrames are simply tabular data,
similar to an Excel spreadsheet, only much more powerful. A Pandas DataFrame will always
have labeled rows, called the index, and labeled columns. These row and column labels will be
used heavily in your Pandas code.
A Pandas Series is similar to a Pandas DataFrame, except a Pandas Series only contains one
column. This is in contrast to a Pandas DataFrame, which can contain multiple columns. Just
like DataFrames, Pandas Series will also have labeled rows, called an index.
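To make the difference between the two structures concrete, here is a minimal sketch that builds one of each by hand. The dates, column names, and numbers are placeholders rather than real market data.

```python
import pandas as pd

# Row labels (the index) shared by both objects.
dates = pd.to_datetime(['2019-08-16', '2019-08-19', '2019-08-20'])

# A Series holds a single column of values.
closes = pd.Series([100.0, 101.0, 102.0], index=dates, name='close')

# A DataFrame holds one or more labeled columns.
bars = pd.DataFrame(
    {'close': [100.0, 101.0, 102.0], 'volume': [1000, 1100, 1200]},
    index=dates,
    columns=['close', 'volume'],
)

print(closes.shape)  # one dimension: rows only
print(bars.shape)    # two dimensions: rows and columns
```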
Indexes and Column Names
As mentioned above, Pandas is designed to deal with tabular data. All data in Pandas
DataFrames will have labels for both the rows and the columns. We will use these labels to execute
Pandas code.
The row labels in a Pandas Series and a Pandas DataFrame are called the index. Whenever we
refer to the index of a Series or DataFrame, we mean the labels associated with the rows. If there
are no explicit row labels provided, the index will be an integer index starting with 0 (so 0, 1, 2,
3, etc.).
For time series data, however, a default integer index is not very useful. As you will soon
see, a lot of Pandas code is based on referencing the index, so it's best to have an index that
makes sense and actually means something, like a date for instance.
For finance and trading, the most common index is some kind of date or time (or both). For
example, it is very common to be working with pricing data, which almost all trading models will
require. This data will be of the time series variety, with each data point corresponding to a date
or time.
Column names are the other labels you need to know to write efficient Pandas code. For pricing
data, this is often a label such as “open”, “high”, “low”, “close”, “volume”, etc.
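The following sketch shows the default integer index and then replaces it with dates. All values are placeholders chosen for illustration.

```python
import pandas as pd

# No row labels are given, so Pandas assigns the integer index 0, 1, 2.
prices = pd.DataFrame({'close': [100.0, 101.0, 102.0]})
default_labels = list(prices.index)
print(default_labels)

# Replace the integer index with dates so each row label means something.
prices.index = pd.to_datetime(['2019-08-16', '2019-08-19', '2019-08-20'])

# Rows can now be looked up by date, and columns by name.
close_on_19th = prices.loc[pd.Timestamp('2019-08-19'), 'close']
print(close_on_19th)
print(list(prices.columns))
```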
In the rest of the examples in this chapter, we will be working in the notebook environment on
the Quantopian website. This is an embedded Jupyter notebook within Quantopian where we
can grab data and practice our coding. All you have to do is sign up for Quantopian to access
this tool; no downloading is required. Go ahead and create an account on Quantopian.com, which
should take all of 5 minutes, to follow along.
To grab some data to practice our Pandas coding skills, we need to use the built-in Quantopian
function “get_pricing”. This function takes several arguments, which are the values you pass into the
function. These arguments control what data we are grabbing, how much data, the frequency of
the data, etc.
Here we will use the get_pricing function to grab the prices for the ETF “SPY” from 08/16/2019
to 08/22/2019 using daily frequency (daily bars).
Notice the arguments we passed into our “get_pricing” function. First the security we want data
for - “SPY”. Next, we specified a start date and an end date using the arguments “start_date”
and “end_date” respectively. Finally, we specified the frequency of our data using the
“frequency” argument.
This results in a Pandas DataFrame with the dates being the index (row labels) and
“open_price”, “high”, “low”, “close_price”, “volume” and “price” being the column names.
We will use this small sample DataFrame for the rest of this chapter to demonstrate some basic
Pandas code.
First, let's save our new DataFrame as a variable. This is accomplished the same way all other
variables are assigned in Python, simply with the equal sign (=).
Let’s quickly check the datatype of our newly formed object we named “df”. We will do this the
same way we did earlier in this book, using the type() function:
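Because get_pricing is only available inside the Quantopian notebook, the sketch below builds a stand-in DataFrame by hand so the same steps can be tried anywhere Pandas is installed. Only the column names and dates mirror the sample described above. Every number is a made-up placeholder rather than a real SPY price.

```python
import pandas as pd

# Stand-in for the get_pricing result; all values are made-up placeholders.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {
        "open_price": [286.5, 292.2, 291.8, 292.5, 293.2],
        "high": [289.3, 293.1, 292.9, 292.9, 293.9],
        "low": [286.4, 291.4, 290.0, 291.7, 290.4],
        "close_price": [288.9, 292.3, 290.1, 292.5, 292.4],
        "volume": [83e6, 53e6, 35e6, 38e6, 51e6],
        "price": [288.9, 292.3, 290.1, 292.5, 292.4],
    },
    index=dates,
)

# Assignment uses the equal sign; type() reports what kind of object df is.
print(type(df))
```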
We will begin our Pandas coding by learning how to grab rows and columns.
Selecting a column
To select a column of a DataFrame, we simply pass the name of the column (as a string) into
square brackets after the name of the DataFrame.
In the code above, we simply passed ‘price’ into square brackets after the name of our
DataFrame (df). Remember ‘price’ is the name of one of the columns in our DataFrame. This
returns a Pandas Series containing just the ‘price’ column.
To select multiple columns from a DataFrame, you pass a Python list into square brackets after
the name of the DataFrame, with the list containing the names of the columns you wish to
select.
Remember a Python list is itself in square brackets. So, since we are passing a list into square
brackets after the name of the DataFrame, this actually results in double square brackets.
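A short sketch of both forms of column selection, using a hand-built DataFrame with made-up placeholder values in place of the Quantopian sample:

```python
import pandas as pd

# Made-up placeholder values standing in for the chapter's sample DataFrame.
dates = pd.to_datetime(["2019-08-16", "2019-08-19", "2019-08-20"])
df = pd.DataFrame(
    {
        "close_price": [288.9, 292.3, 290.1],
        "volume": [83e6, 53e6, 35e6],
        "price": [288.9, 292.3, 290.1],
    },
    index=dates,
)

# One column name in square brackets returns a Series.
price = df["price"]

# A list of names (hence the double brackets) returns a DataFrame.
subset = df[["close_price", "volume"]]

print(type(price))
print(subset)
```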
Selecting a Row
To select a row in Pandas, it is best practice to use either .loc[] or .iloc[] after calling the
DataFrame name.
Both .loc[] and .iloc[] can be used to select rows in Pandas, though they do so in slightly
different ways.
.loc[] is used to select rows based on their index (row label). As such, you have to pass in the
name of the row or rows you want to select.
.iloc[] uses integer-based location. So instead of passing in the name or names of the rows you
want to select, you pass in the integer location of the row or rows you want to select. This will
become clearer once you see the examples and is very handy for trading purposes as we will
soon see.
In the code below, we select the row containing the data for the day “2019-08-21” using .loc[].
This works because “2019-08-21” is in the index (row labels) of our DataFrame.
Data and analysis tools provided by Quantopian.com.
To select rows via their integer location instead of the row names, use .iloc[].
It is important to remember that Python indexes are zero-based, meaning they start at 0. So, the
first row is actually row 0 and the second row is row 1.
Notice above that df.iloc[0] returns the first row of our DataFrame (08/16/2019) and df.iloc[1]
returns the second row (08/19/2019).
A handy thing about using .iloc[] is you can count backwards using negative numbers. This is
something you will find yourself doing a lot when coding trading models.
For example, df.iloc[-1] would result in the last row. In trading, this is often the most recent data.
If you want to reference the most recent moving average value, for example, you would use
.iloc[-1].
Notice that using .iloc[-1] here returns the last row of our DataFrame (08/22/2019).
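The single-row selections described so far can be sketched as follows. The DataFrame is a hand-built stand-in with made-up placeholder values; only the dates match the sample in the text.

```python
import pandas as pd

# Made-up placeholder values; only the dates mirror the chapter's sample.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {"close_price": [288.9, 292.3, 290.1, 292.5, 292.4]},
    index=dates,
)

row = df.loc["2019-08-21"]  # label-based: the row whose index is this date
first = df.iloc[0]          # position-based: first row
second = df.iloc[1]         # second row
last = df.iloc[-1]          # negative positions count back from the end

print(row)
print(last)
```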
In the code below, we select the rows ‘2019-08-19’ to ‘2019-08-21’ using .loc[]. Notice the use
of the colon here: we are telling Pandas we want all data from the first date to the last date.
Finally, we can tell Pandas to give us all the rows from the beginning of our DataFrame to a
date, or from a date to the end of our DataFrame. To achieve this, we use .loc[] and either
leave the left side of the colon blank (start at the beginning) or the right side of the colon blank
(go to the end).
Notice in our first example, we used df.loc[ : ’2019-08-19’ ] to select all of the rows from the
beginning of the DataFrame to the row corresponding to ‘2019-08-19’.
In our second example, we used df.loc[ ’2019-08-19’ : ] to select all of the rows beginning with
‘2019-08-19’ to the last row of the DataFrame.
Selecting multiple rows works the same way with .iloc[], just pass in the integer locations as
opposed to the labels.
Here we grab rows corresponding to the integer index #3 (the 4th row) to the end:
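The slicing patterns above can be sketched on a stand-in DataFrame (made-up placeholder values). One detail worth noticing: a .loc[] slice includes the end label, while an .iloc[] slice follows the usual Python rule and excludes the end position.

```python
import pandas as pd

# Made-up placeholder values; only the dates mirror the chapter's sample.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {"close_price": [288.9, 292.3, 290.1, 292.5, 292.4]},
    index=dates,
)

middle = df.loc["2019-08-19":"2019-08-21"]  # both end labels are included
up_to = df.loc[:"2019-08-19"]               # from the beginning to a date
onward = df.loc["2019-08-19":]              # from a date to the end
from_fourth = df.iloc[3:]                   # integer position 3 (4th row) to the end

print(middle)
print(from_fourth)
```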
To select both rows and columns, we will also make use of .loc[] and .iloc[]. The general
structure here is .loc/.iloc[row/rows , column/columns].
To select different rows and columns, simply pass in the rows you want to select, followed by a
comma, followed by the columns you want to select.
The functionality of .iloc[] and .loc[] remains the same, namely .iloc[] uses integer-based selection
while .loc[] uses label-based selection.
Let’s now use .loc[] to select a row and a column. In this example we will grab data from the
row corresponding to the date 08/19/2019 and the column “high”:
Notice that only 293.079999 is returned, which is the high from 2019-08-19.
We can do the same thing using .iloc[]. This time we have to pass in the integer locations for
“2019-08-19” and “high” instead of the names.
Grabbing multiple rows and multiple columns works the same way as we saw before,
specifically you can use a colon to tell Pandas “I want this row to that row”, for example.
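A sketch of the row-and-column pattern on a stand-in DataFrame. The values are made-up placeholders, so the number returned here is not the real high for that day.

```python
import pandas as pd

# Made-up placeholder values; only the dates and column names mirror the sample.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {
        "open_price": [286.5, 292.2, 291.8, 292.5, 293.2],
        "high": [289.3, 293.1, 292.9, 292.9, 293.9],
        "low": [286.4, 291.4, 290.0, 291.7, 290.4],
        "close_price": [288.9, 292.3, 290.1, 292.5, 292.4],
    },
    index=dates,
)

# Label-based: row label first, then column label.
high_by_label = df.loc["2019-08-19", "high"]

# Position-based: "2019-08-19" is row 1 and "high" is column 1.
high_by_position = df.iloc[1, 1]

# Several rows and several columns at once.
block_by_label = df.loc["2019-08-19":"2019-08-21", ["high", "low"]]
block_by_position = df.iloc[1:4, 1:3]

print(high_by_label, high_by_position)
print(block_by_label)
```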
Boolean Indexing
A useful technique to know, which will come in handy when writing trading strategies, is what is
known as Boolean indexing. Boolean indexing is used to filter the data in our DataFrame or
Series based on a logical expression as opposed to row/column labels or integer locations.
Remember when we asked Python to evaluate a logical expression, such as whether something is greater or less
than something else? Python then returned a Boolean value, either True or False, as in the
simple example below:
We can do that same thing for a row or column in a Pandas DataFrame. Pandas will return a
Series of True and False values.
We will continue to use our small 5 row DataFrame (“df”) we have been working with. As a
reminder, here is the full DataFrame:
In the code below, we grab the “close_price” column and ask Pandas whether the close_price for each
day is above $292. Notice Pandas returns a collection of True/False (Boolean) values.
We can visually inspect this and see that the first and third row have close prices below $292
while the second, fourth and fifth rows have close prices above $292.
You might be thinking, “who cares, I could have just looked to see if that was true.” Well, here is
where the magic comes in.
If we pass that expression into square brackets after the name of the DataFrame, Pandas will
return to us only the rows that are “True”. In this case, it would be only the rows (days) with a
close value above $292.
Notice in the code above that only 3 days are returned since there are only three days with a
close above $292.
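The two steps, building the True/False Series and then using it as a filter, can be sketched like this. The closes are made-up placeholders chosen so that the second, fourth and fifth rows are above 292, matching the pattern described in the text.

```python
import pandas as pd

# Made-up placeholder closes, chosen to match the True/False pattern in the text.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {"close_price": [288.9, 292.3, 290.1, 292.5, 292.4]},
    index=dates,
)

# Step 1: the comparison returns a Series of True/False values, one per row.
mask = df["close_price"] > 292
print(mask)

# Step 2: passing that Series into square brackets keeps only the True rows.
above_292 = df[mask]
print(above_292)
```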
While this simple example is just to demonstrate the technique, think for a minute how useful
this technique could be from a trading perspective.
Let's say we had a universe of 500 stocks, for example, and we wanted to do a quick filter of all
the stocks which have a current price greater than some moving average. This technique of
Boolean filtering can get this logic done for us quickly and easily, without the need to write for
loops!
Boolean filtering isn’t limited to just one condition either. You can use multiple conditions as
well. You can also save those conditions as their own variables and use those variables to
conduct your Boolean filtering.
We will show this in the following examples. Here we print our original DataFrame for reference,
then create two variables named “condition_1” and “condition_2” which contain logical
operations.
Notice the series of Boolean (True/False) values returned when we print out “condition_1” and
“condition_2”.
Here we combined the conditions using “|”, meaning “or”. Notice that four days are now
returned, corresponding to the days where either the price was greater than $292 OR the
volume was greater than 40,000,000.
Boolean filtering is a powerful technique, allowing you an unlimited number of filtering conditions
in just a few lines of code!
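A sketch of saving conditions as variables and combining them. The prices and volumes are made-up placeholders, picked so that the “or” filter returns four days as in the text. The “&” (and) variant is shown as well for comparison.

```python
import pandas as pd

# Made-up placeholder values, not real SPY data.
dates = pd.to_datetime(
    ["2019-08-16", "2019-08-19", "2019-08-20", "2019-08-21", "2019-08-22"]
)
df = pd.DataFrame(
    {
        "price": [288.9, 292.3, 290.1, 292.5, 292.4],
        "volume": [83e6, 53e6, 35e6, 38e6, 51e6],
    },
    index=dates,
)

condition_1 = df["price"] > 292
condition_2 = df["volume"] > 40_000_000

either = df[condition_1 | condition_2]  # rows where at least one condition is True
both = df[condition_1 & condition_2]    # rows where both conditions are True

print(either)
print(both)
```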
What is a method?
A method in Python is similar to a function, with the key difference being that a method is
tied to the object you call it on. Integers, for example, have different methods
associated with them than strings do. This is another example of why knowing
datatypes matters in your Python coding.
OBJECT.SOME_METHOD()
In this section, we will learn about a bunch of useful methods for Pandas DataFrames. This
certainly isn’t an exhaustive list of all methods available, but these are the most common
methods used in trading and quantitative finance.
For the examples in this section, we are going to use the following DataFrame. This DataFrame
contains daily pricing data over a 17-day time frame.
There are 17 rows (for the 17 days in our sample) and six columns to start with, named
“open_price”, “high”, “low”, “close_price”, “volume” and “price”.
See the sample DataFrame we will be using in this section below:
To view some summary statistics of our DataFrames, we can use the .info() and .describe()
methods.
.info() displays general information about our DataFrame such as the names of the columns,
how many rows we have, what the DataTypes of the columns are and how much memory this
DataFrame is currently using.
Inspecting the output of our df.info() call, we can see that our DataFrame has 17 entries (17
rows) and 6 columns. We can also see the names of the columns and the datatypes of the data
contained in that column.
For our DataFrame, all columns contain data of the datatype “float”, which is a non-whole
number (a number with a decimal place).
The .describe() method displays statistical summaries of the columns in our DataFrame such as
count, mean, max, min, etc.
In our first example, we will cover the technique used to add a new column to an existing
DataFrame.
In this example, we will also introduce two more DataFrame methods - .rolling()
and .mean().
.rolling() is used to grab a rolling window of data. This is very useful in calculating things like
moving averages, which we will do here.
The number of rolling periods you want is controlled by an integer argument you pass into the
parentheses after .rolling(). For example, if you want a five-day rolling window, you would use
.rolling(5).
We are going to do that in the following code. Here we add a new column we will name “SMA”
which calculates the 5-period simple moving average of the closes.
Notice here we referenced the close_price column, grabbed a rolling 5-period window of data
using .rolling(5), then calculated the mean using .mean(). We set the result as a new column
titled “SMA” - all in one line of code!
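Here is a minimal sketch of that one-liner on a small stand-in DataFrame. The closes are made-up round numbers so the averages are easy to check by hand.

```python
import pandas as pd

# Made-up closing prices on seven consecutive business days.
dates = pd.date_range("2019-08-01", periods=7, freq="B")
df = pd.DataFrame(
    {"close_price": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]},
    index=dates,
)

# New column: the mean of each close and the four closes before it.
df["SMA"] = df["close_price"].rolling(5).mean()
print(df)
```

The first four rows of “SMA” are NaN because a 5-period window is not yet full there.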
The .head() and .tail() methods return the first or last X rows of a DataFrame, respectively.
By default, .head() and .tail() return the first and last 5 rows.
Passing an integer into the parentheses controls how many rows are returned (if you leave it
blank, 5 will be returned).
Example:
Notice the head() and tail() methods returned the first and last 5 rows of our DataFrame,
respectively, by default. If we explicitly wanted to have it return the last 2 rows, we would pass
in the integer 2.
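A quick sketch with a made-up seven-row DataFrame:

```python
import pandas as pd

# Made-up closing prices on seven consecutive business days.
dates = pd.date_range("2019-08-01", periods=7, freq="B")
df = pd.DataFrame(
    {"close_price": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]},
    index=dates,
)

first_five = df.head()  # default: first 5 rows
last_five = df.tail()   # default: last 5 rows
last_two = df.tail(2)   # explicit: last 2 rows

print(last_two)
```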
.pct_change()
A common task in quantitative finance is to calculate the percent changes of a security over a
certain time period.
Pandas makes this very easy by including a .pct_change() method. This method calculates the
percentage change on a column of data.
If we don’t pass in any arguments to our pct_change() function, Pandas automatically calculates
the percent change for one row (in this case one day). This is very useful if we quickly want to
change daily closing prices into daily returns.
If we want to calculate the percent change over other time periods, we would pass the number
of rows we want in our .pct_change() function.
If, for example, we want to get the rolling percent change for the last 5 days, we would use
pct_change(5).
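Both calls can be sketched on a made-up price column. The column names “daily_return” and “five_day_return” are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Made-up closing prices on seven consecutive business days.
dates = pd.date_range("2019-08-01", periods=7, freq="B")
df = pd.DataFrame(
    {"close_price": [100.0, 102.0, 101.0, 105.0, 104.0, 110.0, 108.0]},
    index=dates,
)

# Change versus the previous row (one day here).
df["daily_return"] = df["close_price"].pct_change()

# Change versus the row five rows earlier.
df["five_day_return"] = df["close_price"].pct_change(5)

print(df)
```

Results are decimal fractions, so 0.02 means 2%. The leading rows are NaN because there is no earlier row to compare against.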
.dropna()
Notice that in our DataFrame, there are a few data points in the “SMA” column that show
up as NaN (not a number).
This happens because we are calculating a 5-day moving average, but we need at least 5 days’
worth of data to calculate the average. For the first four rows in our DataFrame, we
don’t have enough data points to calculate the average, so Pandas returns NaN.
These NaN values can, at times, mess up the calculations for our code. As such, Pandas has
the handy .dropna() method, which by default drops (deletes) any row that contains a “NaN”
value.
See the code snippets below. The first just prints out our whole DataFrame (notice the
“NaN” values in the “SMA” column for the first four rows). In the second, we add
the .dropna() method, which results in those four rows containing NaN values being dropped.
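A sketch of the before and after, using made-up closes. Here the result is saved under a new, hypothetical name (“clean_df”) so the original DataFrame stays untouched.

```python
import pandas as pd

# Made-up closing prices on seven consecutive business days.
dates = pd.date_range("2019-08-01", periods=7, freq="B")
df = pd.DataFrame(
    {"close_price": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]},
    index=dates,
)
df["SMA"] = df["close_price"].rolling(5).mean()

print(df)  # first four rows show NaN in the SMA column

# By default, dropna() returns a new DataFrame without any row containing NaN.
clean_df = df.dropna()
print(clean_df)
```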
.shift()
A useful technique in quantitative research and trading is to shift values. If your index is a date,
which is typical in finance, this has the effect of shifting prices forward or backward in time.
How many rows to shift the value is controlled by the argument passed into the .shift() method.
In the code example below, we add a new column “Tomorrows_Close” by shifting the “price”
column back in time by one day (one row in this case).
Using this technique, we can observe the future return of a security given some factor, technical
analysis indicator, or anything else. We will do this in the next section.
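The shift described above can be sketched with a few made-up prices. A negative argument pulls later rows up, so each row ends up holding the next row's value.

```python
import pandas as pd

# Made-up prices on four consecutive business days.
dates = pd.date_range("2019-08-01", periods=4, freq="B")
df = pd.DataFrame({"price": [100.0, 101.0, 102.0, 103.0]}, index=dates)

# shift(-1) moves values one row earlier in time: each row sees tomorrow's price.
df["Tomorrows_Close"] = df["price"].shift(-1)
print(df)
```

The last row is NaN because there is no later row to pull a value from.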
.sort_values()
To sort a specific column in our DataFrame, we would use the .sort_values() method. In this
method, we need to pass in an argument to tell Pandas which columns we want to sort by.
There are also arguments for this function that control whether you want to sort from high to low
or low to high.
Notice that the code below sorts the DataFrame from low to high based on the “price” column.
Notice also that the index (dates) is no longer in chronological order.
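A sketch of both sort directions on made-up prices. The “ascending” argument is the one that controls low-to-high versus high-to-low ordering.

```python
import pandas as pd

# Made-up prices on four consecutive business days.
dates = pd.date_range("2019-08-01", periods=4, freq="B")
df = pd.DataFrame({"price": [102.0, 100.0, 103.0, 101.0]}, index=dates)

low_to_high = df.sort_values("price")                   # default: ascending
high_to_low = df.sort_values("price", ascending=False)  # reverse order

print(low_to_high)
```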
Resampling data in Pandas is a very important technique for trading. Resampling refers to
changing the frequency of your price data. For example, if you have daily price data, like we do
in this example, and you want to change it to weekly frequency (weekly bars) you would employ
the .resample() method.
For the resample method, we need to pass in two things. First would be the time frequency we
want to resample the data to. For example, to resample to weekly data we would need to pass
in “W” to the resample method. Find all the available resample frequencies below:
The next thing we need is an aggregation method of our resampled data, letting Pandas know
how we want it to aggregate the data. For example, we can use .mean() along with the
.resample() method to return the average (mean) price for every week.
While this is useful for some tasks, what is more common in finance is to resample and take the
last data point. For example, if we had daily data and wanted to resample to weekly closes, we
would have to chain .last() after our .resample() method.
See the example below, where we resample our DataFrame from the daily frequency to the
weekly frequency, taking the weekly closes as our aggregation method using .last().
Notice that the dates are now the end of every week, as opposed to every day.
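A sketch of daily-to-weekly resampling with both aggregations, using two weeks of made-up closes. With the “W” frequency, Pandas labels each weekly bin with the Sunday that ends it, even though the last trading day is a Friday.

```python
import pandas as pd

# Two made-up trading weeks (Mon 2019-08-05 through Fri 2019-08-16).
dates = pd.date_range("2019-08-05", periods=10, freq="B")
df = pd.DataFrame(
    {"close_price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]},
    index=dates,
)

weekly_mean = df.resample("W").mean()   # average close within each week
weekly_close = df.resample("W").last()  # last close of each week

print(weekly_mean)
print(weekly_close)
```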
Chapter 7 - Case Study: Using Pandas to Conduct Quantitative
Financial Research
We are now going to transition to conducting preliminary quantitative research that can be
implemented using our new Python/Pandas skills.
We will go over this step by step, explaining what we are doing and showing the code
examples.
All the techniques we are going to use in this example we have already covered in our
introduction to Python and Pandas.
Let’s say, for example, you aim to develop a trading strategy that uses the popular RSI technical
analysis indicator. You hypothesize that low RSI readings will lead to higher than average
returns over the next couple days since those securities showing low RSI readings will naturally
be oversold and are likely to revert to the mean.
You visually inspect some charts and this does seem to be the case. You observe consistent
snap-back rallies after low RSI readings.
Visually inspecting charts, however, is not nearly enough. Let's write some code to see if our
observation in fact holds up quantitatively.
Our goal for this research is to observe the future percent changes for a given security after a
certain RSI value is reached. This will show us if this indicator has any predictive power and will
guide us as to how we use this indicator in our potential trading strategy.
In the code below, we grab data for SPY, the most popular ETF in the world. We will use this as
our example security.
We get this data via the “get_pricing” function in our embedded Jupyter notebook on the
Quantopian website. Notice we pass in the arguments for:
● The security we want data for, in this case “SPY”
● Our start date
● Our end date
● The frequency of our data
The result is a DataFrame, which we saved as “df”, spanning January 2003 to August 2019
containing data for SPY in the daily frequency.
Note: this data is total return data, so it is adjusted for dividends and other corporate actions.
Let’s get some quick information about our new DataFrame - “df.” We use the .info() method
here to give us a macro view of the data in this DataFrame.
Here we can observe some information about our DataFrame, including the number of rows
(4,184 rows/days), the six column names, and the date range (01/02/2003 - 08/15/2019) to
name a few.
Step #2 - Make a new column which calculates the RSI value for every day in our sample
Next we have to calculate our technical analysis indicator, RSI. While it certainly is possible to
code the actual math to calculate the RSI in Python, this indicator already comes prepackaged
in a third-party package called “TA-LIB.” Lucky for us, TA-LIB is already available on the
Quantopian website for us to use.
Before we use TA-LIB, however, we must import it into our environment and give it a name. We
do this via the “import” keyword. We name this package “ta”, which will be how we refer to it in
our code.
Now that we have TA-LIB imported, we can use it to calculate the RSI values per day. We will
add this statistic to our existing DataFrame, putting it in a new column called “RSI.”
In the code above, we told Pandas we are making a new column called “RSI” which contains
the 4-period RSI values per day. We used TA-LIB to calculate this, passing in the column we
want to use for our calculation (in this case the “close_price” column) and the length of our RSI
(in this case 4).
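For readers without TA-LIB installed, the sketch below computes a Wilder-style RSI using Pandas alone. This is a swapped-in approximation, not the TA-LIB implementation used in the book, so its values may differ slightly from TA-LIB's, especially in the early rows. The helper name “rsi” is hypothetical and the prices are a seeded random walk, not real SPY data.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 4) -> pd.Series:
    """Wilder-style RSI using Pandas only (an approximation, not TA-LIB)."""
    delta = close.diff()
    gains = delta.clip(lower=0)      # up moves, zero otherwise
    losses = (-delta).clip(lower=0)  # down moves as positive numbers
    # Wilder's smoothing is an exponential average with alpha = 1 / period.
    avg_gain = gains.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = losses.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# Made-up random-walk prices (seeded so the output is repeatable).
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", periods=50, freq="B")
df = pd.DataFrame({"close_price": 100 + rng.normal(0, 1, 50).cumsum()}, index=dates)

df["RSI"] = rsi(df["close_price"], period=4)
print(df.head(8))
```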
This will result in NaN values for the first few rows, since there isn't enough data to calculate
the RSI. We will use the .dropna() method to drop any rows that now contain NaN values.
The argument “inplace=True” tells Pandas to modify the existing DataFrame “df” directly,
removing the NaN rows, instead of returning a new copy.
Step #3 - Make a new column displaying the future 3-day percent changes.
The next step is to make a new column displaying the future 3-day percent changes for our
security - SPY. If you have never done this before it may seem a bit tricky at first, but once you
get the hang of this technique it will become second nature.
We can easily get this done in one line of code, but for the sake of clarity, we will do this in two
steps, making two new columns.
The first step is to shift the close prices back in time. We can do this via the .shift() method.
Remember the goal here is to observe future price changes; shifting the data back in time
allows us to achieve this.
In the code above, we made a new column in our DataFrame titled “Future_3_day_close.” We
populate this column using the close_price column shifted back 3 rows (3 days).
We used .shift(-3) to achieve the desired result. To inspect the last 10 rows of our DataFrame,
we used .tail(10).
The next step is to take our new “Future_3_day_close” column and calculate the 3-day
percentage change. This will result in a column which contains the 3-day future percent
changes, which is what we are after.
Finally, since these calculations resulted in some NaN values for very recent days (we can’t tell
the future, after all), we will add our .dropna() method to drop those rows.
Again, the “inplace=True” argument modifies the DataFrame “in place”, meaning the NaN rows are
dropped from “df” itself rather than from a new copy.
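The two-step calculation can be sketched as follows. The prices are made-up round numbers, and the formula used for the second column (future close divided by today's close, minus one) is an assumption about how the percent change is computed, since the original code is not reproduced here.

```python
import pandas as pd

# Made-up closes on six consecutive business days.
dates = pd.date_range("2019-08-01", periods=6, freq="B")
df = pd.DataFrame(
    {"close_price": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]},
    index=dates,
)

# Step 1: pull the close from 3 rows later onto each row.
df["Future_3_day_close"] = df["close_price"].shift(-3)

# Step 2 (assumed formula): percent change from today's close to that future close.
df["Future_3_day_pct_ch"] = df["Future_3_day_close"] / df["close_price"] - 1

# The last 3 rows have no future close yet; drop them from df itself.
df.dropna(inplace=True)
print(df)
```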
To summarize, we first grabbed data using Quantopian’s built-in “get_pricing” function. We then
calculated the 4-period RSI values for every day using ta-lib and added that as a new column
titled “RSI”. Finally, we calculated the future 3-day percent changes by shifting the closing
prices back in time by 3 days and calculating the 3-day percent change.
Now all that is left to do is filter our DataFrame using Boolean filtering and observe the results.
Step #4 - Use Boolean filtering to filter our DataFrame using different RSI readings
Our next step is to filter our DataFrame for different RSI readings and observe the future
percent changes.
Before we do this, however, it's helpful to get a baseline. Let's observe all the days and the
average 3-day return for the 4,000+ days in our sample. The column we care about here is
“Future_3_day_pct_ch.” We will use the .describe() method to get some summary statistics for
this column.
We can see that we have 4,174 days in our sample and the average 3-day percentage return
was 0.0012 (0.12%).
Let’s now observe the future 3-day percent changes when the RSI is below 10. To do this we
first have to code a logical expression that checks if the RSI each day is below 10.
We then pass that logical expression into square brackets after our DataFrame name to filter
our DataFrame to return only rows (days) with an RSI value below 10. We name this new
DataFrame “filtered_df” and observe the last 5 rows using .tail(). Notice the RSI column, all the
values here are below 10.
Finally we grab the “Future_3_day_pct_ch” column in our new “filtered_df” DataFrame and use
.describe() as before.
Let's do one more example to make sure we are clear. Here we are going to look for RSI
values greater than 10 but less than 20. We follow the same code structure as before; the only
difference is that this time we need to pass two logical expressions (RSI > 10 and RSI < 20) into the
square brackets, joined with “&”.
This time, after filtering for RSI values above 10 but below 20, we observe 211 trading days and
an average 3-day future return of 0.0037 (0.37%). Not as high as when RSI is below 10, but still
about 3 times higher than the average 3-day return for the whole sample (0.12%).
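The filtering pattern from this step can be sketched on a tiny hand-made DataFrame. All RSI readings and returns below are made-up placeholders, so the statistics printed are illustrative only and say nothing about SPY. Note that each comparison needs its own parentheses when combined with “&”.

```python
import pandas as pd

# Made-up RSI readings and future returns; not results from real data.
df = pd.DataFrame(
    {
        "RSI": [5.0, 12.0, 18.0, 45.0, 8.0, 15.0],
        "Future_3_day_pct_ch": [0.010, 0.004, 0.002, -0.001, 0.006, 0.003],
    }
)

# Baseline: summary statistics for every row.
print(df["Future_3_day_pct_ch"].describe())

# Rows with RSI below 10.
filtered_df = df[df["RSI"] < 10]
print(filtered_df["Future_3_day_pct_ch"].describe())

# Rows with RSI between 10 and 20: wrap each condition in parentheses.
between_df = df[(df["RSI"] > 10) & (df["RSI"] < 20)]
print(between_df["Future_3_day_pct_ch"].describe())
```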
The table below provides more statistics using this technique. Average 3-day future returns are
listed for 10 buckets of RSI as well as the whole sample. The returns for each RSI bucket are
then plotted as a histogram.
It looks like low RSI readings do lead to higher average future 3-day returns. As such, we can
use this signal as a potential input to a complete trading model, which we will do in the next
chapter.
In this chapter, we will be walking through the steps necessary to create our first trading
algorithm. This chapter is for educational purposes and meant to teach the basics of the
Quantopian/Zipline backtesting environment. The purpose is not to develop a world-beating
trading strategy.
What is Zipline?
Zipline is an event-driven, open-source backtesting engine written in Python. It is one of the
most popular and full-featured Python backtesters available. Zipline is the backend that
powers Quantopian.com.
Since Zipline is open source, you can download it to your local machine and use your own data
sources, though this takes some technical maneuvering to get it working properly.
Since this book is meant to introduce you to Python programming as it applies to quantitative
trading, we will be working on the Quantopian website, which doesn’t require any downloads.
Keep in mind that everything we learn regarding writing Zipline algorithms can be applied to
Zipline on your local environment as well.
We will begin our introduction to Zipline by introducing the mandatory initialize() function. The
initialize() function is a required function for any trading strategy developed using Zipline.
What does the initialize() function do? It is used to “initialize”, meaning “set up”, our
algorithm. It is called only once, at the beginning of a backtest.
As such, the initialize() function is where you code one-time setup tasks, such as those covered in the sections that follow.
What is context?
An important and often confusing topic for those new to Zipline is the “context” object. The
context object is technically an augmented Python dictionary that is used to maintain state
throughout your algorithm.
For practical purposes, you can think of the context object as a way to maintain global variables
throughout your algorithm, as well as to reference portfolio-level statistics such as the number
of current positions, the cost basis of your positions, and the current cash available, just to
name three.
Global variables refer to variables that can be referenced anywhere in your algorithm, not just in
the current function you are working in. The maximum number of positions your strategy will
hold, the length of a moving average, and the lookback for RSI are three examples of values
that are well suited to global variables.
The context object is an augmented Python dictionary so that properties can be accessed using
dot notation.
As a simple example, let’s say you wanted to set a moving average length of 200 days. You
can set up the global variable “context.ma_len” in your initialize function:
From there, in your trading logic, you can reference that global variable when doing calculations.
You can also easily change the value of your variable, in this case the length of a moving
average, by changing just one number in your algorithm – “context.ma_len”.
This makes your code much more maintainable, dynamic, and robust.
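The idea can be tried outside Quantopian with a small stand-in. In the sketch below, Python's SimpleNamespace plays the role of the context object purely for illustration; it is not what Zipline actually uses, and the “trade” function here does nothing except read the value back.

```python
from types import SimpleNamespace


def initialize(context):
    # Set once, readable from every other function that receives context.
    context.ma_len = 200


def trade(context, data):
    # Trading logic would use the setting here, e.g. as a lookback length.
    return context.ma_len


# Stand-in for the object Zipline would create and pass in for us.
context = SimpleNamespace()
initialize(context)
print(trade(context, data=None))
```

Changing the single number inside initialize() changes the lookback everywhere it is used.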
If all of this seems confusing to you, fear not. Seeing an example will be a big help, and actually
setting up your own trading algos will be an even bigger learning experience.
Setting up Securities to Trade
Another task we must do in the initialize function is to set up the securities (stocks, ETFs, etc.)
that our trading algorithm will trade.
One way would be to use Zipline’s pipeline API to dynamically bring in securities, such as the
500 most liquid US stocks on a monthly basis. Covering the pipeline, however, is a bit beyond
the scope of this beginner e-book, so we are going to stick with manually setting up securities
here.
As your Python and Zipline skills advance, you will find the pipeline to be a useful and powerful
tool in developing sophisticated trading strategies.
context.SOME_NAME = sid(SOME_SID_NUMBER)
SID stands for “security ID” and is the way Quantopian will know what exact security you want to
trade.
To look up the correct SID for the security you want to transact in, simply type sid(). This will
then bring up a dynamic dropdown list for you to search for and choose the security you are
looking for.
Now if we want to reference SPY, either in our trading logic or to put in an order, we just
reference “context.spy”.
We must set the security to context.SOMETHING, as this context object is the way to set global
variables in our algorithm.
If we want to trade a basket of stocks or ETFs, another way to set this up is to create a Python
list, like we learned about in chapter 2. This is useful if we want to iterate through the securities
in our list using a for loop, which is common.
Notice the square brackets around the list of SIDs. Remember all Python lists have square
brackets.
Another task that is implemented in the initialize function is scheduling other functions to run.
For example, let's say you wanted your trading logic to run once a day, 10 minutes before the
market close. You would schedule that function (you give it a name) to run in your initialize
function.
The above “schedule_function” function schedules a function called “trade” to run every day, 10
minutes before the market closes.
Another example:
This “schedule_function” function schedules a function called “trade_weekly” which runs on the
last business day of every week, 5 minutes after the market opens.
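Putting the pieces of this section together, an initialize() function might look like the sketch below. It only defines functions and is meant for the Quantopian algorithm editor, where sid, schedule_function, date_rules and time_rules are assumed to be provided by the environment. The SID 8554 is believed to be SPY but should be confirmed with the sid() dropdown, and the name “context.basket” is hypothetical.

```python
def initialize(context):
    # Assumption: 8554 is the Quantopian SID for SPY; confirm it with the
    # sid() dropdown in the editor before relying on it.
    context.spy = sid(8554)

    # A basket is just a Python list of securities (here a one-item list).
    context.basket = [context.spy]

    # Run trade() every day, 10 minutes before the market closes.
    schedule_function(
        trade, date_rules.every_day(), time_rules.market_close(minutes=10)
    )

    # Run trade_weekly() on the last trading day of each week,
    # 5 minutes after the market opens.
    schedule_function(
        trade_weekly, date_rules.week_end(), time_rules.market_open(minutes=5)
    )


def trade(context, data):
    pass  # daily trading logic goes here


def trade_weekly(context, data):
    pass  # weekly trading logic goes here
```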
Once we have set up our initialize function, including selecting which securities we want to trade and
scheduling other functions (which will contain our trading logic) to run, we can explore how to
write some actual trading logic.
This is where we can put our newfound Python/Pandas skills to good use!
From there, we would need to write this “trade” function. All our scheduled functions take two
arguments - context and data.
Let’s now cover some common tasks required to do calculations, write trading logic, and place
trades.
Getting Current Data - data.current
Getting current and historical data is an obviously important step in almost any trading strategy.
Now by “current” here, we mean the most recent data point at that point in the backtest. We
do not mean the actual most current data as of the date you are writing your Python code.
This, after all, would be cheating. We can’t know the future, so our algorithm needs to take the
“current” (at that point in that backtest) statistics when executing trading logic and entering
orders. This avoids look-ahead bias.
To get the current price of a security, we use the data object and use dot notation to reference
the most current data - data.current.
Arguments passed into data.current include the security you want current data for and what field
you are requesting.
In the code below, we are pulling in the current price for SPY and saving it as a variable -
current_price.
To access a historical window of data, which we would need to calculate most if not all technical
analysis indicators (such as moving averages and RSI), we also need to use the data object,
this time data.history.
We now have both current and historical data we can then reference, transform, and use to
calculate indicators, control trading logic, and much more.
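A scheduled function that uses both calls might look like the following sketch. It assumes that context.spy and context.ma_len were set in initialize(), and that the data object is the one Zipline passes in. The attribute “context.above_ma” is a hypothetical name used only to store the result.

```python
def trade(context, data):
    # Most recent price as of this moment in the backtest.
    current_price = data.current(context.spy, "price")

    # Daily prices for the last context.ma_len bars, returned as a Pandas Series.
    price_history = data.history(context.spy, "price", context.ma_len, "1d")

    # Ordinary Pandas from here on: a simple moving average of that window.
    moving_average = price_history.mean()

    # Hypothetical flag that later logic could use to decide on an order.
    context.above_ma = current_price > moving_average
```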
Ordering
There are several ways to place orders, depending on what you are trying to achieve. Do you have
a number of shares you know you want to buy/sell? Or would you rather buy or sell based on
dollar value, and have Quantopian figure out how many shares that equates to? Or would you
want to control your trades by percentage of your portfolio - logic such as “buy XYZ stock with
10% of my portfolio”?
Below are a couple of examples of ordering in Quantopian. This is not an exhaustive list, but
something to get you started.
In the above examples, the first one, using the “order” function, simply buys a specified number
of shares of SPY. The second line of code, “order_value”, buys a specified amount of SPY
expressed in dollars. Finally, the last line of code, “order_percent” buys SPY based on a
percentage of your portfolio.
Note: if you want to sell, or sell short, simply change the values to negative numbers.
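To make the difference between the three approaches concrete, here is a rough sketch of the
arithmetic that turns a dollar value or a portfolio percentage into a share count. The helper
names, the price, and the portfolio value are all made up for illustration. Zipline does this
conversion for you, and its exact rounding may differ from the simple truncation assumed here.

```python
def shares_from_value(dollar_value, price):
    """Whole shares for a dollar amount (hypothetical helper, truncates toward zero)."""
    return int(dollar_value / price)


def shares_from_percent(percent, portfolio_value, price):
    """Whole shares for a fraction of the portfolio (hypothetical helper)."""
    return shares_from_value(percent * portfolio_value, price)


spy_price = 250.0           # assumed price, for illustration only
portfolio_value = 100000.0  # assumed portfolio value, for illustration only

print(shares_from_value(2000, spy_price))                     # 8
print(shares_from_percent(0.10, portfolio_value, spy_price))  # 40
print(shares_from_value(-2000, spy_price))                    # -8, a negative value means sell
```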
Other interesting ordering methods include the ability to “target” a specified amount. Instead of
just buying, say, $2,000 worth of SPY every time your logic is satisfied, this targets a position of
$2,000 worth of SPY in your portfolio.
Appropriate trades will then be taken to get your position value to $2,000, whether that involves
buying more or selling to get your allocation back in line. This obviously depends on your
current positions, which Zipline automatically checks for you.
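The idea behind a “target” order can be sketched with a little arithmetic: compare the target
value with what we already hold and trade only the difference. The function name and the
numbers below are hypothetical, and the truncation to whole shares is an assumption of this
sketch rather than a description of Zipline's internals.

```python
def shares_to_reach_target_value(target_value, shares_held, price):
    """Shares to buy (positive) or sell (negative) to hold target_value dollars."""
    current_value = shares_held * price
    return int((target_value - current_value) / price)


price = 250.0  # assumed SPY price, for illustration only

print(shares_to_reach_target_value(2000, 0, price))   # 8, no position yet so buy
print(shares_to_reach_target_value(2000, 12, price))  # -4, holding $3,000 so sell some
print(shares_to_reach_target_value(2000, 8, price))   # 0, already on target
```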
Other order types, such as limit orders, stop orders, etc. are controlled through an argument
called “style” that you pass into any of the order examples we previously covered.
Here is an example of an order targeting 20% exposure to SPY, but only if the price gets to a
certain level (limit order).
This order would target a 20% allocation to SPY only if SPY traded at or below $285.
Accessing portfolio level information is an important part of coding a complete trading strategy.
After all, we don't want our algorithm to just continually buy a security if a condition is met if we
don't have the capital available to make the purchase, right?
Accessing information like this is done through the context object, again using dot notation. We won’t
cover every bit of information you can pull here, just some common ones to get you started:
Data and analysis tools provided by Quantopian.com.
In the code above, notice we referenced the context object followed by .portfolio. From there,
we can see all of our options for things to reference in our algorithm.
Most of these are self-explanatory. It is easy to imagine we would want to reference such
information in our algorithms.
For example, let's say we wanted to check how much capital is currently being used by our
strategy to determine if we can take a new trade or not. This is easily accessed via
context.portfolio.capital_used.
Once you get the hang of referencing these portfolio level statistics it will become second
nature.
To reference statistics regarding individual positions, simply pass that security into square
brackets after “context.portfolio.positions” to see the options available to you.
For example, the code below retrieves the number of shares we are long or short for a specific
security, in this case SPY. This is useful to check if we already have a position in that security or
not, which we will use in our example in the next chapter.
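Here is a small sketch of that lookup that runs outside Quantopian. The PositionsStandIn class
and the context object built below are hypothetical stand-ins. Only the attribute path
context.portfolio.positions[security].amount is meant to mirror what we would type in a real
algorithm. Treating a security we do not hold as a position of 0 shares is an assumption of this
sketch.

```python
from types import SimpleNamespace


class PositionsStandIn(dict):
    """Hypothetical stand-in for context.portfolio.positions."""

    def __missing__(self, asset):
        # Assumption of this sketch: a security we do not hold reads as 0 shares.
        return SimpleNamespace(amount=0)


context = SimpleNamespace(
    portfolio=SimpleNamespace(
        positions=PositionsStandIn({"SPY": SimpleNamespace(amount=40)})
    )
)

spy_shares = context.portfolio.positions["SPY"].amount
agg_shares = context.portfolio.positions["AGG"].amount
already_in_spy = spy_shares != 0

print(spy_shares)      # 40
print(agg_shares)      # 0
print(already_in_spy)  # True
```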
In our final chapter, we are going to walk through our first trading strategy step by step. We will
then provide the whole source code followed by an inspection of the backtested results.
Chapter 9 – Case Study: Writing Your First Zipline Algorithm
In this chapter, we will put everything we learned together. We will use our new Python/Pandas
skills as well as our knowledge of the Zipline API to create our first trading strategy. As this is
an introductory book, we will keep the strategy relatively straightforward.
We will go step by step, explaining what we are doing and offering code snippets. At the end of
the chapter, we will present the entire source code of the strategy. In the Results section that
follows, we will present the backtested results.
For our example strategy, we are going to continue with the research we conducted in chapter
7.
Remember how we observed higher future returns when the 4-period RSI values are low? We
will use this insight to construct a simple trading model.
I want to reiterate; this model isn’t meant to be some world-beating algo. Instead, what we are
trying to do is tie in some of the techniques we have learned so far in this book.
● For loops
● If statements
● Python Lists
● Certain specific functions and methods such as len() and .mean()
● Scheduling functions
● Pulling both current and historical data
● Utilizing order methods, including “order_percent” and “order_target_percent”
● Pulling portfolio level information - in this case checking the number of shares we are
long/short in a security
Taking a step back and thinking about this strategy - what we are basically doing is investing in
high quality bonds (AGG) unless there is a mean reversion opportunity in one of the sector
ETFs.
When there is a mean reversion opportunity in one of the sector ETFs, as measured by an
RSI(4) value under 20 and the ETF’s current price being above its 200-day moving average, we
sell some of the bonds and invest in that sector. We hold that sector until its 4-period RSI is
above 70.
Given we are in bonds most of the time, we should expect this strategy to have relatively lower
returns and much lower risk, in the form of volatility as well as max drawdown, compared to SPY
itself.
In this trading strategy, we are again going to utilize the third-party package “TA_LIB” to
calculate our 4-period RSI value, just like we did in chapter 7.
We have to import this library to make it available for us to use. To do this, we simply use
“import” followed by the name of the package, “talib”. We then use “as” to give it a shorter
alias, in this case “ta”, to be used in our code.
Now when we reference “ta” in our code, we are utilizing the ta-lib package. This makes
calculating technical indicators much easier.
Remember, the initialize() function is a required function for our algorithm to run and is used to
set up the algorithm.
We are going to do a few things in the initialize() function for this algorithm including setting up
the securities we want to trade and schedule other functions to run, which will contain our actual
trading logic as well as place our orders.
We are going to set up a Python list called “context.assets” which contains our 10 US sector
ETFs. We are also going to set up our bond ETF, which I am calling “context.bonds”.
The next thing we are going to need to do is set up other functions to run which will contain our
trading logic. For this strategy, we are going to set up three additional functions called “entries”,
“exits” and “trade_bonds”, whose purpose is self-explanatory given their names.
We are going to schedule “exits” to run every day, 15 minutes before the close. We are then
going to schedule “entries” to run every day, 10 minutes before the close. Finally, we are going
to schedule “trade_bonds” which will run 5 minutes before the close.
That's all we are going to need to do for the initialize function. Our final initialize function looks
like this:
Next up will be our entries() function. This function will control whether our buy logic for the US
sector ETFs is met.
To achieve this, we are first going to set up a for loop. Our for loop will iterate through all 10
ETFs in our context.assets universe one by one. We will then pull current and historical data for
each, calculate the 200-day moving average and 4-period RSI, then execute our trading logic.
First things first, let's set up a for loop to iterate through the Python list context.assets which
contains our 10 sector ETFs.
Notice the use of the iterative variable which I named “x”. This “x” variable will be each ETF one
by one as the loop iterates through the python list “context.assets”. You can name this variable
whatever you want.
The next thing we will do is pull both the current price and the trailing 200 days of historical
closes using data.current() and data.history().
We saved the most recent price as the variable “current_price” and we saved the trailing
200-day window of closes as “closes_history”.
The next thing to do would be to calculate the 200-day moving average and the 4-period RSI.
Since we are automatically grabbing a trailing 200-day window of historical data, all we have to
do is take the average of that data to get the moving average. To achieve this, simply add on
the .mean() method.
For RSI, we are going to lean on the ta-lib library here. Since we already imported “talib as ta”
we write the ta-lib function for RSI - ta.RSI().
ta.RSI() takes two arguments: the prices we are using to calculate the RSI, and the period. We
pass in closes_history and the number 4 here.
We saved the 200-day moving average as the variable “sma_200_day” and the 4-period RSI as
“rsi”.
Notice that we typed [-1] after the RSI calculation. This is so we grab the most recent RSI value
in our logic.
Why don’t we use .iloc[-1] here like we learned? The reason is that this isn't a Pandas
DataFrame or Series; it's actually a NumPy array (which we didn’t cover). In NumPy, which is
the underlying package that Pandas was built on, to reference the last value in an array you
simply use [-1].
This is a quirk of ta-lib, so don’t let it trip you up. Once you get used to this subtle nuance,
it's not a big deal.
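If you want to see the two calculations without installing ta-lib, the sketch below computes a
simple moving average and a Wilder-style RSI in plain Python. The function wilder_rsi is a
hypothetical helper written for this illustration. It follows the usual Wilder smoothing, but we
make no claim that it matches ta.RSI() digit for digit, and returning 0 for a completely flat
price series is simply a choice made here. It returns a regular Python list rather than a NumPy
array, but [-1] grabs the last value in exactly the same way. The prices are made up and kept
short so the numbers are easy to check by hand.

```python
def wilder_rsi(prices, period):
    """Return RSI values aligned with prices (None until there is enough data)."""
    if len(prices) <= period:
        return [None] * len(prices)

    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    gains = [max(change, 0.0) for change in changes]
    losses = [max(-change, 0.0) for change in changes]

    # Seed the averages with a plain mean of the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        total = gain + loss
        return 100.0 * gain / total if total else 0.0

    values = [None] * period + [to_rsi(avg_gain, avg_loss)]
    # Wilder smoothing: blend each new gain/loss into the running averages.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        values.append(to_rsi(avg_gain, avg_loss))
    return values


closes_history = [10.0, 11.0, 12.0, 11.0, 12.0, 13.0]  # made-up closes

sma = sum(closes_history) / len(closes_history)  # average of the whole window
rsi = wilder_rsi(closes_history, 4)[-1]          # [-1] grabs the most recent value

print(sma)  # 11.5
print(rsi)  # 81.25
```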
Finally, the last thing to do is code our trading logic. We know that our rules are to buy each
sector ETF if its current price is greater than its 200-day moving average and its RSI(4) is below
20.
We are going to add one more rule to the code - make sure our position is currently flat (i.e. we
don’t hold this position already). This will help control the flow of the algorithm and make sure
that we aren't adding to our long positions if the conditions are again met the next day. We
reference “context” to pull this portfolio level information as we showed in the last chapter.
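Notice the use of the “if” statement and the “and” keyword. For the if statement to be “True”, all
three of our conditions have to be met:
● The current price is greater than the 200-day moving average
● The RSI(4) is below 20
● We do not already hold a position in that ETF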
If all of those conditions are true, we use “order_percent” to send an order. Our “order_percent”
call buys 10% of our portfolio in that ETF. The math is simple: since there are 10 possible
sector ETFs, we allocate 10% to each.
Next up is our exits() function. This function will, you guessed it, check if we need to
exit any current long positions in the sector ETFs.
Remember our rules are such that if we are long a US sector ETF, we will simply exit if the
RSI(4) is greater than 70.
To achieve this, we will again set up a for loop. This time we will iterate through all the ETFs in
our python list - “context.assets”. We then grab a window of historical data and calculate our
RSI the same way we did before.
From there, we again set up an if statement. This time two conditions have to be satisfied for
our if statement to be True:
● We are currently long the ETF
● The RSI(4) is greater than 70
If both of those conditions are True, we sell the ETF using order_target_percent() and passing
in the desired allocation - 0%.
Note the use of order_target_percent() here. Using this method ensures that we get out of the
whole position, no matter how long or short we are.
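Stripped of the data pulling and the order calls, the entry and exit rules boil down to two
small true/false checks. The sketch below writes them as plain Python functions so you can try
them with your own numbers. The function names should_enter and should_exit are hypothetical and
are used here only for illustration.

```python
def should_enter(current_price, sma_200_day, rsi, shares_held):
    """Entry rule: uptrend, oversold, and not already in the position."""
    return current_price > sma_200_day and rsi < 20 and shares_held == 0


def should_exit(rsi, shares_held):
    """Exit rule: we are long and the RSI(4) has risen above 70."""
    return shares_held > 0 and rsi > 70


print(should_enter(105.0, 100.0, 15.0, 0))   # True, all three conditions are met
print(should_enter(105.0, 100.0, 15.0, 50))  # False, we already hold the ETF
print(should_enter(95.0, 100.0, 15.0, 0))    # False, price is below the moving average
print(should_exit(75.0, 50))                 # True, long and RSI is above 70
print(should_exit(75.0, 0))                  # False, nothing to sell
```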
The last thing we are going to need to do is adjust our bond position. Remember, we are
allocating all unused cash here to high-quality bonds, in this case, AGG. This function runs
every day, 5 minutes before the close. It is purposefully the last function to run, allowing our
entries and exits functions to do their transactions first before we allocate the rest of the capital
to AGG.
The main thing we need to do in this function is to calculate the percentage of capital to put into
AGG. To illustrate how this works, imagine that we are long four US sector ETFs - let’s say
XLF, XLV, XLI, and XLK.
Since we are allocating 10% of capital to each position, that would mean we are allocating 40%
of capital here to sector ETFs. As such, we need to put the remaining 60% of our capital in
AGG, which is easily done via an order_target_percent() order.
Using the above example, we would first need to check how many sector ETFs we are currently
long. We would then multiply that number by 10% (0.10) and subtract it from 100% (1.0) to
calculate how much capital to put in AGG.
So, if we have 4 sector positions, then 40% (0.4) is allocated to the sectors:
4 * 0.10 = 0.40
An easy way to check the number of positions we are long is to use the len() function and pass
in “context.portfolio.positions”. This will give us our total number of current positions:
But Wait...
There is only one small problem with this approach. The code above will give us our total
number of long positions. This could, and most likely will, include the bond ETF we are long -
AGG.
We don't want to count the bond ETF however. We only want to know how many US Sector
ETFs we are long.
We can easily solve this problem by just checking if we are long AGG, which in our algo is
called “context.bonds”. If we are long AGG, we simply subtract 1 from the result of the
len(context.portfolio.positions) call.
The code looks like this:
Once we have the number of sector ETFs we are long, saved under the variable
“amount_of_current_positions”, we simply subtract that from 10 and multiply the result by 0.10.
For example, with 4 sector positions: (10 - 4) * 0.10 = 0.60.
This will get us to our desired number - the percentage to allocate to AGG which we
appropriately call “percent_to_allocate_to_bonds”.
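The bond allocation arithmetic can be sketched in a few lines of plain Python. Here a regular
dictionary stands in for context.portfolio.positions, the share counts are made up, and the
function name percent_for_bonds is hypothetical. Checking for AGG with “in” is a simplification
of this sketch. The variable names amount_of_current_positions and
percent_to_allocate_to_bonds follow the ones used in the text.

```python
def percent_for_bonds(positions, bond_asset):
    """Fraction of the portfolio to hold in bonds, given the open positions."""
    amount_of_current_positions = len(positions)
    if bond_asset in positions:
        # Do not count the bond ETF itself as a sector position.
        amount_of_current_positions -= 1
    return (10 - amount_of_current_positions) * 0.10


# Made-up holdings: four sector ETFs plus the bond ETF.
positions = {"XLF": 100, "XLV": 80, "XLI": 90, "XLK": 70, "AGG": 500}

percent_to_allocate_to_bonds = percent_for_bonds(positions, "AGG")
print(f"{percent_to_allocate_to_bonds:.0%}")  # 60%
```

The result is formatted as a percentage because decimal arithmetic on a computer can leave a
tiny rounding error in the raw number.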
There you have it. Find the complete code for our first algorithm below:
Results
In this section, we will display the test results of our sample trading algorithm. For such a simple
strategy, the results are quite good.
To calculate these performance statistics and make these charts, I utilized the open-source
Python library Pyfolio.
Pyfolio is a robust, open-source Python package used to analyze backtested results, also
developed by the good folks at Quantopian. You will find it to be a very useful package, allowing
you to do a deep dive into the historical test results of your strategy.
In the spirit of brevity, we won’t go into the code required to calculate these statistics and make
these performance charts. Just know that there is much less coding work here than you
probably assume, as Pyfolio has most of the statistics and charts already pre-programmed.
Our backtest spans the beginning of October 2004 to the end of August 2019.
Let's first take a look at some summary statistics, such as returns, volatility, max drawdown, and
Sharpe Ratio:
Here is the cumulative return chart of our strategy, or the theoretical growth of $1,000:
Here is a look at the number of positions we held through time. This is useful to see how often
we were taking mean reversion trades in the US Sector ETFs.
Let's now take a look at the results on a trade by trade basis. For this analysis, I excluded all
the transactions done in AGG, as this wouldn't make much sense to analyze. These trade
statistics only include mean reversion trades we did in the US Sector ETFs:
This shows we had 516 total trades in the US Sector ETFs, with a win rate of 74%. Note that a
“trade” here is a round trip trade, meaning a buy and a sell.
We can see that the average duration of our trades was 18 days with the median being 14 days.
It looks like we had one trade that took a while for us to exit - 72 days to be exact. Maybe
adding a stop rule would help that situation - something to potentially do more research on!
We can create a quick histogram to visualize the distribution of our trade durations:
A quick visual inspection of the chart shows that only a few trades lasted more than 50 days.
There are many more statistics and charts we can show, but I wanted to give you a taste of the
power of Pyfolio.
Chapter 10 – Conclusion and Next Steps
There you have it. I hope this e-book succeeded in giving you an introduction to Python
programming applied to the world of quantitative finance and trading. I hope you now realize the
potential power of this tool.
Notice that we used the same language, Python, to do our initial RSI research, code a full-blown
trading model, and analyze our backtested results. This is one of the many advantages of the
language: you can use Python from the beginning of your quant workflow all the way to the end.
I want to reiterate that as an introductory e-book, this just scratches the surface of what can be
done. Our trading model here was relatively simple, and we only used a static universe of
securities. The power of Python and the Zipline API allows for the creation of complex
strategies complete with a dynamically changing universe of securities, such as the 500 most
liquid US stocks at any given time, for example. Not only that, we can also use fundamental
data in our algorithms, such as various measures of profitability, valuations, and debt/leverage,
something we didn’t touch in this book.
If you would like to go much deeper into Python, quantitative research, and trading strategy
development, we encourage you to check out our Python class – Python Programming for
Traders. In the course, we go much deeper into all topics related to Python programming for
quantitative research and trading.
Below are some topics we cover in detail during our 5-week “Python Programming for Traders”
course:
Register today for our live Python Programming for Traders course. Please call
toll free 1-888-484-8220 ext. 616 (outside the U.S., please dial 973-494-7311 ext. 616).