Lucas INGLESE
Disclaimer: I am not authorized by any financial authority to give
investment advice. This book is for educational purposes only. I
disclaim all responsibility for any loss of capital on your part.
Moreover, 78.18% of private investors lose money trading CFDs.
Using the information and instructions in this work is at your own
risk. If any code samples or other technologies this work contains
or describes are subject to open-source licenses or the intellectual
property rights of others, it is your responsibility to ensure that your
use thereof complies with such licenses and rights. This book is not
intended as financial advice. Please consult a
qualified professional if you require financial advice. Past
performance is not indicative of future performance.
Who am I?
I am Lucas, an independent quantitative trader specializing in Machine
learning and data science and the founder of Quantreo, an algorithmic
trading E-learning website ( www.quantreo.com ).
To show you some realistic results, you can see the performance of my
latest portfolio of strategies in live trading: a 2.5% return for a 0.6%
drawdown, without leverage, over one and a half months.
Chapter 2: Prerequisites
This chapter discusses the prerequisites necessary to understand this
book thoroughly. First, we will cover some math, statistics, algebra,
and optimization basics. Then, we will review the leading financial
theories and the Python basics required to implement our trading strategies.
2.1.1. Algebra
Algebra is an important field to know when we work in finance.
Indeed, we will work with matrices all the time because they are at the
heart of algebra. Thus, it is required to understand the basics about them.
There are many theories about matrices in math, some more complex than
others, but for us it will be straightforward: a matrix will be a set of
data. Let us see an example of a matrix A with shape (n, m), where
m is the number of columns and n is the number of rows of the
matrix.
We can imagine that matrix A with a shape of (3,2) will give us the
daily return of some assets. Usually, the rows will be the days, and
the columns represent the assets. Thus, the matrix has two assets and
three daily returns for each.
There are many essential operations that we can apply to a matrix.
Let us see some of them:
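As a minimal sketch (the matrix below is a made-up example, not the book's data), here is how such a returns matrix and a few common operations look with NumPy:

import numpy as np

# A (3, 2) matrix: 3 rows (days) and 2 columns (assets) of daily returns
A = np.array([[0.010, -0.002],
              [0.004,  0.007],
              [-0.006, 0.001]])

print(A.shape)         # (3, 2): n=3 rows, m=2 columns
print(A.T)             # transpose: shape (2, 3)
print(A.T @ A)         # matrix product: shape (2, 2)
print(A.mean(axis=0))  # mean daily return of each asset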
2.1.2. Statistics
Statistics are mandatory when we work in quantitative finance. From
calculating the returns to the computation of probabilities, we will see
the necessary skills to work on a financial project with peace of mind.
First, we will see some statistical metrics, but do not worry if you do
not understand them 100% because we will discuss them later in the
book! To compute an example of each metric, we will work with
these vectors (vectors are matrices with a shape (1, n)),
where |-x| = |x| = x. It means that the absolute value keeps only the
magnitude of a number, not its sign.
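As a small illustration (the vector below is hypothetical, not the book's), the usual statistical metrics can be computed as follows:

import numpy as np
from scipy.stats import skew, kurtosis

x = np.array([0.01, -0.02, 0.015, 0.005, -0.01])

print(np.mean(x))     # mean
print(np.std(x))      # standard deviation (volatility)
print(skew(x))        # skewness
print(kurtosis(x))    # kurtosis (excess)
print(np.abs(-0.02))  # absolute value: |-x| = |x| = x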
Now, let us discuss statistical tests. There are many, but we will see
only one in this book, which is essential to understand: the
augmented Dickey-Fuller test. However, first, let us explain how a
statistical test generally works. Usually, there are two hypotheses: H0
and H1. The objective of the test is to test the H0 hypothesis.
We can see the CDF on the right and the PDF on the left of the figure.
However, the percentages within the standard deviation intervals
only hold for a normal distribution, not for the others.
2.1.3. Optimization
When we talk about portfolio management, it always implies talking
about optimization. Optimization methods allow us to minimize
or maximize a function under constraints. For example, in portfolio
management, we maximize the portfolio's utility under the constraint
that the portfolio's weights must sum to 100%. Let us take an example
of an optimization problem, explaining each part.
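As a hedged sketch of such a constrained problem (the returns are simulated and the utility function is a simple mean-variance criterion, not necessarily the book's exact one), scipy.optimize.minimize can be used like this:

import numpy as np
from scipy.optimize import minimize

np.random.seed(0)
returns = np.random.normal(0.0005, 0.01, size=(250, 3))  # 250 days, 3 assets

def criterion(weights, returns):
    portfolio_return = returns @ weights
    mean, std = portfolio_return.mean(), portfolio_return.std()
    return -(mean - 0.5 * std ** 2)  # negative utility, because we minimize

x0 = np.ones(3) / 3                  # initial weights
cons = ({"type": "eq", "fun": lambda x: np.sum(np.abs(x)) - 1},)  # weights sum to 100%
bounds = [(0, 1)] * 3

res = minimize(criterion, x0, args=(returns,), bounds=bounds, constraints=cons)
print(res.x)                         # optimal weights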
When we buy a CFD, we will pay some fees. These fees are either
commission or spread. The spread is the difference between the bid
price and the ask price. Moreover, the commissions are a fixed
amount that we pay the broker to enter and exit positions.
The last notion that we need to know is leverage: a financial tool,
available when we open an account with the broker, that allows us to
multiply the strategy's returns. Thus, it is good when we earn money
and bad when we lose it. Leverage is a powerful tool that can destroy
our capital in the wrong hands (reminder: with great power comes
great responsibility).
Usually, leverage is taken at the account level and not per trade. As
we said earlier, leverage increases the risk of losing money. However,
it allows people with little capital to invest in the market. So, in
practice, it is strongly recommended to work on accounts with capital
coverage, which means that the broker closes our positions before we
run out of capital so that we do not get into debt with the broker, even
though during extremely volatile movements we may owe the broker
money if he could not close the positions in time.
Pandas
Matplotlib
Time
    time.sleep(n): puts the computer on a break of n seconds
SciPy
    scipy.optimize.minimize(criterion, x, args=()): minimizes a function
StatsModels
Summary:
Asset-specific risk is the risk associated with an asset that does not
depend on market conditions. For example, an issue in a Tesla factory
can reduce the price of the Tesla stock, but only the Tesla stock,
because there is no reason for this issue to affect the Netflix stock
price, for example.
The systemic risk is a macroeconomic risk on all the economy's
assets. For example, suppose there is a significant recession caused
by geopolitical instability. In that case, all stock prices will decrease
because people do not want to consume because of the uncertainty of
the future. It is a systemic risk because it affects all the economy's
assets.
The big difference between specific and systemic risk is that the
systemic risk does not depend on the firm's actions. Thus, the
objective of the portfolio optimization methods is to reduce the
specific risk in the portfolio. To do that, we use diversification.
3.1.2. Diversification
Diversification is the core process of portfolio management. It allows
us to reduce the specific risk of the portfolio, as explained above. The
strategy aims to increase the number of assets in the portfolio to
make the specific risk of each asset insignificant. The explanation is
shown in figure 3.1.
The figure shows that the increase in the number of assets allows us
to reduce the portfolio's risk to the portfolio's systemic risk because
each asset's specific risk decreases with the number of assets. So, the
figure highlights the power of diversification to decrease the risk of a
portfolio.
Don’t worry, we will show you how to compute it in the next section!
3.2.2. Mean-variance criterion
In this subsection, we will implement a mean-variance optimization [2].
It is necessary to download a database with some assets' adjusted
close stock prices. To do the portfolio optimization, we will use these
assets: Facebook, Netflix, and Tesla. For the importation, we will use
the yfinance library, which allows us to import data very quickly.
Moreover, we transform the data into daily variations to put all assets
on the same scale.
# Parameters
Lambda = 3
W = 1
Wbar = 1 + 0.25 / 100

# Compute portfolio returns
...
criterion = Wbar ** (1 - Lambda) / (1 + Lambda) + Wbar ** (-Lambda) \
            * W * mean - Lambda / 2 * Wbar ** (-1 - Lambda) * W ** 2 * std ** 2
criterion = -criterion
return criterion
n = data.shape[1]

# Initialization weight value
x0 = np.ones(n)

# Optimization constraints problem
cons = ({'type': 'eq', 'fun': lambda x: sum(abs(x)) - 1})

# Set the bounds
With the previous code, we have found the best allocation for these
assets, and the vector of the weights is shown in figure 3.3.
portfolio_return_MV = portfolio_return_MV.sum(axis=1)

# Plot the cumulative return
plt.figure(figsize=(15, 8))
plt.plot(np.cumsum(portfolio_return_MV) * 100)
plt.xticks(size=15, fontweight="bold")
plt.yticks(size=15, fontweight="bold")
plt.title("Cumulative return of the mean variance portfolio", size=20)
In figure 3.4, we can see the plot of the cumulative return on the test
set. It is good to see the portfolio's return, but only in the next chapter
will we learn how to complete a risk analysis and a backtest.
Indeed, we will see many metrics, such as the Sharpe and Sortino ratios,
the drawdown, and the CAPM metrics, to give only a few examples.
# Parameters
Lambda = 3
W = 1
Wbar = 1 + 0.25 / 100

skewness = skew(portfolio_return, 0)
kurt = kurtosis(portfolio_return, 0)

# Compute the criterion
criterion = Wbar ** (1 - Lambda) / (1 + Lambda) + Wbar ** (-Lambda) \
            * W * mean - Lambda / 2 * Wbar ** (-1 - Lambda) * W ** 2 * std ** 2 \
            + Lambda * (Lambda + 1) / (6) * Wbar ** (-2 - Lambda) * W ** 3 * skewness \
            - Lambda * (Lambda + 1) * (Lambda + 2) / (24) * Wbar ** (-3 - Lambda) * W ** 4 * kurt
criterion = -criterion
return criterion
The Sharpe ratio is the best known of the financial metrics. Indeed, it
is a reference metric in the industry. It allows us to understand the
additional benefits for 1% more risk. We can compute the Sharpe
ratio with the following formula:
$Sharpe = \frac{r_p - r_f}{\sigma_p}$

where $r_p$ is the mean of the portfolio returns (annualized), $\sigma_p$ is the volatility
(annualized) and $r_f$ is the risk-free rate (we set 0 for the risk-free
rate because it is actually around this value).
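As a minimal sketch (assuming daily returns and a zero risk-free rate), the annualized Sharpe ratio could be computed like this:

import numpy as np

def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    mean = np.mean(returns) * periods_per_year          # annualized mean return
    vol = np.std(returns) * np.sqrt(periods_per_year)   # annualized volatility
    return (mean - risk_free) / vol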
Summary
In figure 4.1, we can see the cash flows of a long contract (a bet on the
increase) and a short contract (a bet on the decrease). The cash
flow at time $t_1$ is paid when purchasing the contract. For a long position, we buy the
stock at $t_1$. Thus, the cash flow is minus the stock price (with no
transaction costs), and when we close the position, we receive the actual
price of the stock, $S_{t_2}$. For the short contract, it is precisely the
opposite. Indeed, when we enter the selling contract, we sell a stock
that we do not yet own, with the obligation to buy
the stock back later (in our example at $t_2$). So, the profit of a long
contract is $S_{t_2} - S_{t_1}$, and for the short contract it is $S_{t_1} - S_{t_2}$.
4.1.3. Rebalancing
The rebalancing of a TAA is an essential thing. Indeed, at this
moment that the weights of each asset in the portfolio are determined.
In the following examples, we will use allocation with the same
weights for all assets to make things easier. The dynamic of the
strategy will be only the sign (positive if we long and negative if we
short).
Figure 4.3: Assets with their simple moving average (SMA 15)

Date     FB      NFLX   TSLA   SMA 15 FB   SMA 15 NFLX   SMA 15 TSLA
06-11    27.01   9.00   5.82   29.84       9.63          5.89
06-12    27.40   9.00   5.93   29.01       9.56          5.91
06-13    27.27   8.97   5.95   28.65       9.48          5.93

The moving average is often used to understand the trend of the asset.
Figure 4.4 shows the stock price with the SMA (simple moving average)
of 15 and the SMA of 60. So, it can be easy to
understand the upward and downward trends.
This figure shows the Facebook stock price with the SMA 15 and the
SMA 60. When the trend is up, we see that the fast SMA is above the
slow SMA and the opposite when the trend is down.
4.2.2. Moving average factor
In this subsection, we are going to build the moving average factor.
Indeed, we will use the SMA for three months and SMA for 12
months of each stock (Facebook, Netflix, Tesla) to create a monthly
signal to rebalance our portfolio.
We will use the same samples as in the previous chapter. We will use
70% of the data to create the strategy. Then, we will test it on the test
set (the other 30%). The difference between the previous chapter and
this one is that we keep the price in absolute value and not in
variations. Then, we can compute the factor as the difference
between the short SMA and the long SMA. Moreover, we compute
the z-score to normalize the factor. We can see how to do it in code
4.2 and the results in figure 4.5.
Code 4.2: Create the z-scores

list_tickers = ["FB", "NFLX", "TSLA"]

# We do a loop to create the SMAs for each asset
for col in list_tickers:
    data[f"pct {col}"] = data[col].pct_change(1)
    data[f"SMA3 {col}"] = data[col].rolling(3).mean().shift(1)
    data[f"SMA12 {col}"] = data[col].rolling(12).mean().shift(1)
    data[f"Momentum factor {col}"] = data[f"SMA3 {col}"] - data[f"SMA12 {col}"]

# Find the medians
median = train_set[columns].median()
We have also shifted the signal because, when we have a signal, we
check whether the following month is profitable. If we do not shift, we
would make a massive (and spurious) profit because we would be
predicting the past with the future, not the other way around. Now, let
us see in figure 4.6 the results of this strategy.
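As a minimal illustration of this idea (assuming the data DataFrame built in code 4.2; column names follow that code), the position must be shifted before being multiplied by the returns:

import numpy as np

data["position"] = np.sign(data["Momentum factor FB"])         # raw signal
data["strategy"] = data["position"].shift(1) * data["pct FB"]  # shifted position applied to the next return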
4.3.1. Correlation
This section will explain in detail how correlation works and why it
is essential in this strategy. Indeed, the correlation will allow us to
understand if there is some relation between the two-time series.
Correlation is an essential tool in finance. It allows us to understand
the relationship between assets. However, many people use it in the
wrong way. That is why we have a whole part dedicated to this
notion.
Many types of correlation exist, but the Pearson correlation is the
most useful. It calculates the linear correlation between time series x
and y with this formula:
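The formula itself is the standard Pearson correlation coefficient:

$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}}$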
To find the best assets, we will take ten assets, but we can take more
to have a better choice. Furthermore, we compute the autocorrelation
between the last 12 months' return and the next month's return to find
the best asset for the strategy on the train set. That means we search for
the assets with the best correlation between the lookback period and the
holding period (you can also find the best lookback and holding periods with
this technique).
corr.append(cor)
correlation = pd.DataFrame(corr, index=list_, columns=["Corr"])
correlation.sort_values(by="Corr", ascending=False)
Unlike the SMA strategy, after the corona crisis we can see a
decrease in the cumulative return. It can be explained by the fact that
the crisis distorts the last 12 months' returns.
Figure 4.9: Performance of the trend returns strategy
In this figure, we can see the profit of the trend return strategy.
Summary
A short position allows us to bet on the decrease of a stock.
The beta metric of a stock highlights the relation between the market,
represented by a market stock index like the S&P 500, and the portfolio.
For example, a beta of 1.15 implies that if the market has a variation
of 1%, the portfolio varies by 1.15%. It can be considered a metric
of systemic portfolio risk. It is computed using the following
formula:

$\beta = \frac{\operatorname{cov}(r_p, r_m)}{\operatorname{var}(r_m)}$

where $r_p$ is the asset's return and $r_m$ is the market index's return.
When we compute the beta, we will be in one of these situations:
β > 1: The portfolio is more volatile than the market and has
a significant systemic risk.
return cov/var
With this function, we have obtained a beta at 1.08 for the portfolio
using the mean-variance criterion. It means that when the market
(S&P 500) varies by 1%, the portfolio varies by 1.08%.
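As a hedged sketch of this computation (function and variable names are assumptions, not the book's exact code), the beta described above can be computed as the covariance of the portfolio with the benchmark divided by the benchmark's variance:

import numpy as np

def beta(portfolio_returns, market_returns):
    # Covariance matrix of the two return series
    matrix = np.cov(portfolio_returns, market_returns)
    # cov(portfolio, market) / var(market)
    return matrix[0][1] / matrix[1][1]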
Now let us talk about the alpha metric. It allows us to understand if a
portfolio outperforms or underperforms the benchmark. For example,
suppose we have a portfolio with an alpha of 0.0115. In that case, the
portfolio outperforms the benchmark by 1.15%, considering the
portfolio's returns and risk. It can be computed using the following
formula:
$\alpha = r_p - \left[ r_f + \beta \, (r_m - r_f) \right]$

where $r_p$ is the mean of the asset's returns, $r_m$ is the mean of the
market index returns, $r_f$ the return of the risk-free asset, and $\beta$ the
asset's beta.
Let us talk about the Sharpe ratio. This metric is widely used in finance;
as we said before, it allows us to understand the additional benefit of
taking 1% more risk.
return mean/std
Now, let us talk about the Sortino ratio. The Sortino ratio is an
excellent metric derived from the Sharpe ratio, but it only
considers the downward volatility. So, it allows us to understand the
additional benefit for 1% more downside risk. We can compute the
Sortino ratio with the following formula:

$Sortino = \frac{r_p - r_f}{\sigma_{down}}$

where $r_p$ is the mean of the portfolio returns (annualized), $\sigma_{down}$ is the
downward volatility (annualized), and $r_f$ is the risk-free rate (we set 0 for the
risk-free rate because it is actually around this value).
return mean/std
With this function, we have obtained a Sortino ratio of 1.87 for the
portfolio using the mean-variance criterion. This is an excellent
portfolio, and 1% of additional downward risk gives 1.87% of return.
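As a minimal sketch (assuming daily returns and a zero risk-free rate), the Sortino ratio uses only the negative returns in the denominator:

import numpy as np

def sortino_ratio(returns, periods_per_year=252):
    returns = np.asarray(returns)
    mean = np.mean(returns) * periods_per_year                  # annualized mean return
    downside = returns[returns < 0]                             # keep only the losses
    downward_vol = np.std(downside) * np.sqrt(periods_per_year) # annualized downward volatility
    return mean / downward_vol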
5.1.3. Drawdown
In this part, we will talk about drawdown. It is one of the best metrics
for a backtest. Indeed, it allows us to understand the significant loss
we suffer. It gives us the most prominent loss if we enter a position at
the worst time of this period. Now, it can be complicated to
understand, but it is a straightforward thing. Let us see. First, we need
to explain the formula to compute the drawdown:
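Consistent with the code that follows, the drawdown at time $t$ can be written as:

$DD_t = \frac{C_t}{\max_{s \leq t} C_s} - 1$

where $C_t$ is the cumulative return (wealth) at time $t$ and the denominator is its running maximum.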
running_max = np.maximum.accumulate(cum_rets.dropna())
running_max[running_max < 1] = 1
drawdown = (cum_rets / running_max - 1)
return drawdown
As we can see, the max drawdown is around 40%, during the corona
crisis. It is because it is a long-only strategy. Except for this period,
the drawdown is around 15%. It means that if we had entered a position
with this portfolio just before the crisis, we would have lost 40% of our
capital in one month. However, after May, we would have recovered our
capital at 100%.
The reflection with the inverse of the CDF function is a little bit
complicated. We can explain the concept more easily with figure 5.2.
The objective of the VaR is to find the value such that θ% of the
values lie beyond it. For example, if theta is equal to 5%, the VaR is the
value such that 5% of the worst ordered returns lie beyond it.
return var
Using this function, we can find the VaR for the mean-variance
portfolio. Indeed, for this portfolio, we found a VaR of 3.87% for a
day, 13.61% for a month, and 7.02% for a year. It means that in the
5% worst cases, we can lose more than 3.87% in a day, 13.61% in a
month, and 7.02% in a year with this strategy.
Indeed, the cVaR takes the mean of the values below the VaR. So, if
there are extreme values, they are taken into account. Let us give the
formula of the cVaR before a schema in figure 5.3 to make it easier to
understand:
In this figure, we can see how to compute the cVaR. Indeed, it is the
mean of the values below the VaR.
return cvar
Using this function, we can find the cVaR for the mean-variance
portfolio. Indeed, for this portfolio, we found a cVaR of 4.87% for a
day, 18.28% for a month, and 24.20% for a year. It means that in the
5% worst cases, we can lose more than 4.87% in a day, 18.28% in a
month, and 24.20% in a year with this strategy.
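As a hedged sketch (not the book's exact functions), the historical VaR and cVaR at a theta = 5% threshold, expressed as positive loss percentages, could be computed like this:

import numpy as np

def var_cvar(returns, theta=0.05):
    returns = np.sort(np.asarray(returns))
    var = -np.quantile(returns, theta)                # worst theta-quantile, as a positive loss
    cvar = -returns[returns <= -var].mean()           # mean of the tail below the VaR
    return var, cvar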
Using this function, we can find the risk contribution of each asset.
Indeed, Facebook represents 31.04% of the portfolio's risk, Netflix
30.61%, and Tesla 38.34%. We can also represent these values in a
graphic like in figure 5.4.
beta = cov / var

# Alpha
alpha = mean_stock_return - beta * mean_market_return

if CR:
    plt.figure(figsize=(10, 6))
    plt.scatter(columns, crs, linewidth=3, color="#B96553")
    plt.axhline(0, color="#53A7B9")
    plt.grid(axis="x")
    plt.title("RISK CONTRIBUTION PORTFOLIO", size=15)
    plt.xlabel("Assets")
    plt.ylabel("Risk contribution")
    plt.show()
This figure shows that this strategy performs very well because the alpha is equal
to 35.45%. However, a huge VaR implies a significant risk in this
strategy.
Let us discuss the two other static strategies, Sharpe and Sortino
portfolios. We can see in figures 5.8 and 5.9 the metrics of the
respective strategy.
Figure 5.8: Metrics for Sharpe optimization portfolio
Portfolio: ['FB', 'NFLX', 'TSLA']
As we can see, the two strategies have nearly the same metrics. We
cannot say if one or the other is better. The only difference is the risk
in each asset. If we prefer investing in Netflix stock and believe in
this stock, we take the Sortino portfolio. However, if we prefer Tesla,
we choose the Sharpe portfolio, for example.
But why? There are several answers to this question. First, it was
critical to understand that our dynamic portfolio is a fundamental
strategy.
Also, we did not make asset selection for the first strategy with the
SMA. The strategies are also straightforward, but the objective was
to enter smoothly into the field of long-short strategies. So, we will
see in the following chapters how to create short-term algorithmic
trading strategies that work much better than these. Let us see the
results of these strategies in figure 5.10 and figure 5.11.
Summary
The CAPM metrics, the alpha, and the beta are fascinating to
understand the links between the portfolio and the
benchmark.
The VaR and the cVaR allow us to estimate the worst
losses we can incur using the strategy.
Chapter 6: Advanced backtest
methods
This chapter discusses advanced backtest techniques such as take-profit
and stop-loss backtests, interesting metrics, and the trailing stop loss. It will
give us more ideas about the measures used in finance and trading.
1. Find some data, create new features, and split the dataset between
a train set and a test set.
2. Create a strategy using only the train set
3. Backtest the strategy on the test set
4. Repeat until we find a profitable strategy on the test set
1. Find the data, create new features, and split the dataset.
2. Create a strategy and OPTIMIZE it on the train set
3. Backtest the strategy on the test set: keep it if it is good, or stop
here. If the strategy is not profitable, do not change its parameters;
it does not matter, we will try another one!
We need to understand that the more we touch our test set to adapt the
strategy, the worse our performance will be in the future (in live
trading).
Indeed, the more uncommon an event is in the backtest, the lower the
probability of facing it again. So, suppose we base the
profitability of our strategy on these 2 or 3 big profits. In that case, the
probability of being profitable in live trading is very low.
To illustrate this point, let us look at figures 6.1 and 6.2, in
which we see two strategies with the same return over the period but
with very different behavior. The first one loses a significant part of
the trades but ends up positive thanks to three significant gains.
Conversely, the second strategy (figure 6.2) is much more interesting
because the capital growth is more stable.
The first thing to check is the HIT ratio, the winning trade percentage:
keeping a strategy with a HIT ratio adapted to our goals is essential.
For example, suppose we want to create a trading signal to publish for copy
trading (as on BullTrading). In that case, we should have the highest
HIT ratio possible, because people will be more confident if they see
many profits, even small ones. However, we can accept a lower
HIT ratio if we develop algorithms only for our own investments.
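As a minimal sketch (assuming a series of per-trade returns), the HIT ratio is simply the proportion of winning trades:

import numpy as np

def hit_ratio(trade_returns):
    # Proportion of winning trades
    trade_returns = np.asarray(trade_returns)
    return (trade_returns > 0).mean()

print(hit_ratio([0.02, -0.01, 0.015, -0.005]))  # 0.5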
These two metrics are suitable to classify the strategy and understand
if they are suitable for our investor profile or not. Moreover, we need
to look at more metrics like the trade lifetime and time underwater
(when the strategy drawdown is lower than 0, but we will explain it in
detail in the next section).
There is no universally good value for these metrics, even if a smaller
time underwater is generally better. It depends on our strategy
target: for a scalping strategy, a maximum time underwater of 2 months
is not good, but for a swing strategy, it can be
acceptable.
So, to fix this issue, we will use a little trick. Indeed, why do we need
the ticks? To find, for each position, whether the take-profit or the
stop-loss is touched first. So, for each candle, we need to know which
came first, the low or the high!
# Loop to find out which of the take-profit and stop-loss appears first
    except Exception as e:
        print(e)
        data.loc[start, "Low_time"] = start
        data.loc[start, "High_time"] = start

# Verify the number of rows without both TP and SL at the same time
percentage_garbage_row = len(data.loc[data["First"] == 0].dropna()) / len(data) * 100

#if percentage_garbage_row<95:
print(f"WARNINGS: Garbage row: {'%.2f' % percentage_garbage_row} %")
return data
# Extract data
row = data.iloc[i]

# VERIF
if buy:
    var_buy_high = (row["high"] - open_buy_price) / open_buy_price
    var_buy_low = (row["low"] - open_buy_price) / open_buy_price

    buy = False
    open_buy_price = None
    var_buy_high = 0
    var_buy_low = 0
    open_buy_date = None

# VERIF
if sell:
    var_sell_high = -(row["high"] - open_sell_price) / open_sell_price
    var_sell_low = -(row["low"] - open_sell_price) / open_sell_price

    sell = False
    open_sell_price = None
    var_sell_high = 0
    var_sell_low = 0
    open_sell_date = None
We have already run the backtest but do not yet know how to interpret
it. Indeed, we have a return series without clear information. That is
the point of the next section.
Just take one minute to analyze the backtest. Here, the strategy will go
directly to the bin. Why? Because of the risk it takes. The TUW equals
99.04%, with a max drawdown of 63.3%.
# Bonus
def profitable_month_return(p):
    total = 0
    positif = 0
    r = []

    # Loop on each different year
    for year in p.index.strftime("%y").unique():
        e = []
        nbm = p.loc[p.index.strftime("%y") == year].index.strftime("%m").unique()

        # Loop on each different month
        for mois in nbm:
            if s > 0:
                positif += 1
            else:
                pass
            total += 1
        else:
            pass
        e.append(sum_)
        r.append(e)

    r[0] = [0 for _ in range(12 - len(r[0]))] + r[0]
    r[-1] = r[-1] + [0 for _ in range(12 - len(r[-1]))]

    return pd.DataFrame(r, columns=["January", "February", "March", "April", "May", "June",
                                    "July", "August", "September", "October",
                                    "November", "December"],
                        index=p.index.strftime("%y").unique())

plt.figure(figsize=(20, 8))
pal = sns.color_palette("RdYlGn", n_colors=15)
sns.heatmap(htm, annot=True, cmap=pal, vmin=-100, vmax=100)
11. Sortino ratio: Same metric as the Sharpe ratio, but we compute
the risk using the downward volatility instead of the classic
volatility for the Sharpe ratio.
12. Beta: It will give us some indications about how much the
strategy is correlated to the market (the S&P 500, for example)
(more explanation in chapter 5).
16. VaR: Will give us the worst loss we can make with a 5% error
threshold. We can compute it on the period that we want: the
worst loss per day, month, or year, for example. (More explanation
in chapter 5)
17. cVaR: Like the VaR but with some differences in the computation.
(More explanation in chapter 5)
if method == "simple":
    df_ret = pd.DataFrame(random_returns).transpose().cumsum() * 100
    cur_ret = data["returns"].cumsum() * 100
else:
    df_ret = ((1 + pd.DataFrame(random_returns).transpose()).cumprod() - 1) * 100
    cur_ret = ((1 + data["returns"]).cumprod() - 1) * 100

plt.figure(figsize=(20, 8))
plt.legend()
plt.show()
When we have our Monte Carlo area, the best thing to see is a narrow
area close to an upward trending line. The smaller the area, the less
volatile the strategy.
Looking at the lower band of the Monte Carlo area, we can see that on one
of the worst paths, we can lose a lot of money after launching our strategy.
Moreover, the Monte Carlo area is too large.
6.3.3 Easy Trailing stop
A trailing stop loss requires ticks to be computed correctly. However,
here is a quick tip for finding the worst case possible when using a trailing
stop loss strategy. If we use the backtest function that finds the
worst case, we will see whether, in the worst case, we are profitable or not. If
yes, it is a good thing, because it means that all the big profits from the
trailing stop loss will be a bonus.
There are a lot of other trailing stop losses, like the classic trailing stop
loss and double profit targets. In our example, we will use the
threshold trailing stop loss. Once we have touched the TP, we begin
the Trailing Stop Loss (TSL) with a margin of 0.1%. It means that
when we touch the TP, we secure 1.4% with a TP of 1.5%, and then
we use a classic TSL: for each price increase, we raise the
stop loss to secure more and more profit. The goal will be to find the
worst-case returns, as if we never earned any bonus profit from the trailing
stop loss, and to check the strategy's profitability on that basis to increase
the probability of profitability in live trading.
As shown in figure 6.7, the random signal with the worst-case TSL is not
profitable. This is because of the poor robustness of the strategy and the
high volatility of the underlying.
Logically, the worst TSL case is not very good. However, the goal
will be to design something with acceptable returns in this worst case, which
will increase the probability of being profitable in the future. It will be
much more difficult during creation but much easier when we are in live
trading, and we need to know where we are willing to suffer.
Summary:
7.1.1. Stationarity
The stationarity of a time series is essential because it is a crucial
aspect of the behavior of a time series.
A time series is stationary if it is flat around its mean and
without trend. Moreover, it needs to have a constant variance in
time. [7] (There are many assumptions to check, but these are the
most important.)
We can check the stationarity of a time series using the augmented
Dickey-Fuller test. The test's null hypothesis (H0) is that the time series
is not stationary. So, if we have a p-value below the error threshold,
hypothesis H0 is rejected, and the time series is stationary. For example,
if we want to know whether the time series is stationary at an error
threshold of 10%, we compute an augmented Dickey-Fuller test and check
whether the p-value is below or above 10%.
So, there are the following possibilities:
As we can see in the title, the p-value of this time series is 1.0. It
means that there is no doubt that this series is not stationary.
On the next page, figure 7.2 shows the behavior of a stationary time
series. It is the most stationary data we can have. So, we can see
the differences between figure 7.1 and figure 7.2.
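As a hedged sketch with statsmodels (the series here is simulated white noise, which is stationary by construction, not the book's data), the augmented Dickey-Fuller test returns a p-value that we compare to the error threshold:

import numpy as np
from statsmodels.tsa.stattools import adfuller

series = np.random.normal(0, 1, 500)   # white noise, stationary by construction
p_value = adfuller(series)[1]           # the second element of the result is the p-value
print(f"p-value: {p_value:.4f}")        # below the threshold -> we reject H0 (non-stationarity)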
7.1.2. Cointegration
Much of the time, stock prices are not stationary. So, how can we
apply models which need stationary data? To do so, we are going to
talk about cointegration. Cointegration allows us to find a
stationary time series by combining non-stationary time series. [8]
def cointegration(x, y):
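Only the signature of this function survives in the extract above; as a minimal sketch of what such a function could do (this is an assumption, not the book's implementation), an Engle-Granger style test is available in statsmodels:

from statsmodels.tsa.stattools import coint

def cointegration(x, y):
    # p-value below the error threshold -> the two series are cointegrated
    score, p_value, _ = coint(x, y)
    return p_value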
Figure 7.3 shows the difference between two cointegrated time series.
Indeed, in the figure, we can see the residual between both time
series. In this example, neither series is stationary: the
p-value of the augmented Dickey-Fuller test for each of the two time series
equals 1. Moreover, the p-value of the augmented Dickey-Fuller test
for the residuals is 0. So, the residuals are stationary. Moreover, it is
possible to see that in the figure without computations.
Figure 7.3: Cointegration of two time series
As we can see, the residuals of these time series are stationary, which
implies that the two series are cointegrated, because it is also
easy to see that the two time series themselves are not stationary.
Once we have done this, we have the pair of assets to do the pair
trading strategy. It is necessary to compute z-scores using the
differences between the two time series for the dynamic portfolio (if
you do not remember how to do that, you can go to section 4.1).
Now that we have the z-score, we need to define the long and short
signals. There are many ways to do it. One of them is to enter
positions when we pass a standard deviation threshold and to exit the
positions when the spread returns to its mean.
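As a minimal sketch of this logic (the thresholds and variable names are assumptions, reusing the z-score defined earlier; by construction the z-score has mean 0 and standard deviation 1):

short_ts1 = zscore > 1              # spread too high: short ts1, long ts2
long_ts1 = zscore < -1              # spread too low: long ts1, short ts2
exit_position = abs(zscore) < 0.1   # spread back around its mean: close the position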
Figure 7.4 shows the spread of figure 7.3 with the standard deviation and
the mean of the values. The green circle represents a short position on
time series 1 and a long position on time series 2. Indeed, the spread is
ts1 minus ts2, so we need a decrease of time series 1 or an increase of
time series 2 for the spread to return to its mean. The red circle
represents the opposite positions because the spread is negative.
This figure shows the spread with the standard deviation and the mean
of the spread, and the entry signals.
7.2.2. Applications
In this part, we are going to compute a pair trading strategy. First, we
will use a function to find the cointegrate pairs and the correlation
between the two assets to choose the better pair.
Summary
# Yesterday
yts1 = train_set[ts1_symbol].values[-2]
yts2 = train_set[ts2_symbol].values[-2]

# Today
pts1 = train_set[ts1_symbol].values[-1]
pts2 = train_set[ts2_symbol].values[-1]

# Today data
spread = pts1 - pts2
zscore = (spread - train_set["spread"].mean()) / train_set["spread"].std()

# Yesterday
yspread = yts1 - yts2
yzscore = (yspread - train_set["spread"].mean()) / train_set["spread"].std()
# TS1
short_ts1 = False
long_ts1 = False
else:
    pass

# TS2
short_ts2 = False
long_ts2 = False
else:
    pass

# Positions
if pair == 1:
    buy, sell = long_ts1, short_ts1
else:
    buy, sell = long_ts2, short_ts2
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")

else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell} ")

time.sleep(1)
The live parameter sets the live trading mode (live = True) or
the screener mode (live = False).
Chapter 8: Auto Regressive
Moving Average model (ARMA)
In this chapter, we will explain how the ARMA models work. In
the first section, we will explore time series concepts. Then, we
will explain the AR model and the MA model. Moreover, we will
explain the ARMA model and its derivatives.
Figure 8.1: Upward trend on the stock price Google (2007 to 2021)
This figure shows an exponential upward trend for the Google stock
price from 2007 to 2021. Indeed, we can see that the stock price
increases even if there are some downward periods.
There is also the possibility of a downward trend in a stock price, but it
is only interesting if we can short the stock.
Now, let us talk about cycles in time series. To explain this, we will
create a fictive time series to highlight some points. A time series is
cyclical if there is a repeated pattern in the price with a non-determined
time between two repetitions of this behavior. Let us see an example in
figure 8.2.
We can see that the same behavior sometimes appears, but with a non-
determinate time between each repetition.
Now, let us see how to spot a seasonal time series. Seasonality is like
cyclicality, but we talk about seasonality when there is an equal time between
each repetition. Let us see the difference between the two concepts in figure
8.3.
Figure 8.3: Seasonal time series
In this figure, we can see the same time between two repetitions of the
behavior. So, it is a seasonal time series.
This figure shows that the period from 2005 to 2008 is better than the
period from 2017 to 2020 for a long-only strategy; the log price graphic
makes this visible because it puts the price into a much smaller range.
where $\hat{y}$ is the predicted value, $y$ the real value, and $\beta$ the
parameters of the model.
For example, if we take the model with S&P 500 and gold again, we
predict S&P 500 using gold in the linear regression. Now, we will
predict the S&P 500 at time t using S&P 500 at time t-1, which can
be extended to S&P 500 at time t-p. Let us see the AR(p) model
equation, where p is the number of previous data you take.
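In its standard form (the symbols below are the usual ones, not necessarily the book's notation), the AR(p) model is:

$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$

where $c$ is a constant, $\phi_1, \dots, \phi_p$ are the model parameters and $\varepsilon_t$ is a white-noise error term.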
# Define model
p = 1
model = ARIMA(train_set, order=(p, 0, 0))

# Make forecast
forecast = model_fit.forecast()
return forecast[0][0]

def AR(df, returns=True):
    """ Function to predict the test set with an AR model """
    apply(AR_predict_value)

    # Compute the returns and the signal of the asset
    if returns:
        1, -1)
        # compute strategy returns
In code 8.1, we can see that the p of the AR model is set to 1 (the number
of lags for the autoregressive model). But why not 3, 8, or 10? There
exist many ways to find the best value for p. The first is to compute the
model's error using the MSE or MAE (see section 1.1) for p from 1 to
15, for example, and choose the value with the lowest error. However, it
can take a very long time to compute all the models.
The other way is to use a partial autocorrelation graph to find the best
theoretical number of lags. Nevertheless, what is partial
autocorrelation? A partial autocorrelation summarizes the
relationship between an observation in a time series with previous
observations with the relationships of intervening observations
removed.
Let us see how to interpret a partial autocorrelation graph (you can
find the code to do it in the notebook associated with the chapter) in
figure 8.6.
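As a hedged sketch of how such a graph is typically produced (the notebook's exact code may differ; the train_set series is assumed from the chapter):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(train_set.dropna(), lags=15)  # partial autocorrelation up to 15 lags
plt.show()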
Now, let us talk about the strategy's performance with the AR model.
In figure 8.7, we can see the cumulative returns of a method using the
AR model on EURUSD. We could say that the model works well
enough because, in one year, it has made an 8% return, even if with a
high risk; but that is not the point highlighted here.
During the corona crisis, the market situation was different from
normal. Since the AR model is a predictive algorithm that uses
past data to predict the next value (a regressor algorithm), the training
data can have very different properties from the data we predict, as if
they were two different time series.
# Define model
q = 1

# Make forecast
forecast = model_fit.forecast()
return forecast[0][0]

return df
How do we choose q? We can also use the technique of testing all
q from 1 to 20 and seeing which one has the lowest error on the train set.
Alternatively, we can use the same method as for the partial
autocorrelation, but using the autocorrelation graph instead. Let us see the
autocorrelation graph of the EURUSD returns in figure 8.8.
# Define model
p = 1
q = 1

# Make forecast
forecast = model_fit.forecast()
return forecast[0][0]

return df
# Define model
p = 1
q = 1
d = 1
model = ARIMA(train_set, order=(p, d, q))

# Make forecast
forecast = model_fit.forecast()
return forecast[0][0]

def ARIMA_model(df, returns=True):
    """ Function to predict the test set with an ARIMA model """
    return df
Summary
The log price properties are fascinating to understand the
natural variation of the assets.
The moving average model uses the past error term to adjust
the model. We can find the optimal lag using an
autocorrelation graph.
# Define model
p = 1
q = 1
d = 1
model = ARIMA(train_set, order=(p, d, q))

# Make forecast
forecast = model_fit.forecast()
value_forecasted = forecast[0][0]

buy = train_set.iloc[-1] < value_forecasted
sell = not buy

if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")

info_order = {
    "Euro vs USdollar": ["EURUSD", 0.01]
}

else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell} ")

time.sleep(1)
The confusion matrix is one of the best metrics for testing
performance. It is a matrix with true positives, false positives,
false negatives, and true negatives. It is interesting for understanding in
which way the algorithm is wrong. Let us see in figure 9.1 a
theoretical confusion matrix.
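As a small illustration with scikit-learn (the labels below are made up, not the book's results); rows are the true classes and columns the predicted classes:

from sklearn.metrics import confusion_matrix

y_true = [1, -1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1]
print(confusion_matrix(y_true, y_pred))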
In the data, we add a new column with yesterday's price. It will be the
variable that we will use to predict the price of today. Let us see this
in code 9.1.
# Create a X
In statistics and machine learning, the data are divided into X and y.
X contains all the variables that we use to predict the variable y. In our
case, X is yesterday's return, and y is today's return. Moreover, we need to
split the data into a train set and a test set. This decomposition allows us to
train the model on the train set and test the performance on unknown
data, the test set. Usually, we use 80% of the data for the train set and
20% for the test set.
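As a hedged sketch of this decomposition (the column names are assumptions based on the chapter's feature engineering), the split can be done chronologically:

# X is yesterday's return, y is today's return
X = data[["returns t-1"]]
y = data["returns"]

# 80% of the rows for the train set, the remaining 20% for the test set
split = int(0.80 * len(data))
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]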
from the dataset and the prediction. In our case the equation is:
Summary
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")

info_order = {
    "Euro vs USdollar": ["EURUSD", 0.01]
}

else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell} ")

time.sleep(1)
Summary:
The question is: how can we help our algorithm better predict travel
time using the longitude and latitude data? One example would be
to compute the distance in kilometers between the start and end
points of each trip. We would then find an excellent relation between the
new feature (distance) and the target (travel time).
Feature engineering can also allow us to find the best features for
our problem by looking at the variance of the features. However,
these methods are less relevant in trading because we will carefully
create new variables to give our algorithms more information
about trends, short-term movements, volatility, etc.
Most of the new features in trading are technical indicators, price
action figures, or quantitative metrics. Here are some metrics that
will give us some ideas for projects:
In our car example, the model is like the car tires; even if we keep
our summer tires in winter, we can go anywhere we want. We will
take more time and increase the accident risk. So, finding the model
adapted to our problem is essential. However, it will be easy to find
with good features and clear target.
Moreover, feature and target engineering is one of the first
steps of a project, but not the least important: since we spend 80% of
the project time on this step, it is essential to do it carefully.
Quantitative features
N previous days : it is one of the most accessible variables but also
one of the most important because it will allow us to understand the
previous long or short-term variation depending on the period we
take.
We can see that the sign of the candle does not mean anything for a
Doji because it indicates an indecision movement.
# Amplitude
df["amplitude_abs"] = np.abs(df["Close"] - df["Open"])

df["Engulfing"] = 0
df.loc[
    # Yesterday red candlestick and today increase
    (df["candle_way"].shift(1) == -1) &\
    (df["candle_way"] == 1) &\
df.loc[
    # Yesterday green candlestick and today decrease
    (df["candle_way"].shift(1) == 1) &\
    (df["candle_way"] == -1) &\
As we can see, even this simple pattern requires several lines of code.
That is why it is much better to use a library or to create our own function
if we want to find a specific pattern.
Technical indicators
Resistance: The resistance can be computed in many different
ways. In this book, we will use the max value over the last 150 periods.
The support is the same thing but using the minimum.
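As a minimal sketch following this definition (column names are assumptions): the resistance as the rolling 150-period maximum, the support as the rolling 150-period minimum, shifted to avoid look-ahead:

df["resistance"] = df["high"].rolling(150).max().shift(1)
df["support"] = df["low"].rolling(150).min().shift(1)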
df[ "target_dummy" ] = 1
df.loc[df[ f "target_var_ {n} " ]< 0 , "target_dummy" ] = -1
# Dummy variable
df[ "target_dummy" ] = 0
df.loc[df[ f "target_var_ {n} " ]>centile_67, "target_dummy"
]= 1
df.loc[df[ f "target_var_ {n} " ]<centile_33, "target_dummy"
] = -1
Summary
We could also use the technical analysis library of Python (ta). In this
part, however, we will create the needed indicators ourselves.
# Volatility of returns
df["volatility returns 15"] = df[["returns"]].rolling(15).std().shift(1)
df["volatility returns 60"] = df[["returns"]].rolling(60).std().shift(1)
This figure shows that if we do not shift the data, we will have
interference because we would be predicting the 15th day while already
having the 15th day in the features. That is not realistic, and we could lose
a lot of money if we make this error.
11.1.2. Standardization
This subsection will explain the concept of standardization. We present
this in the SVM chapter because the SVM is a geometric algorithm, and it is
necessary to standardize the data for such algorithms. Still, we can
apply this method to other algorithms.
Standardization also speeds up the computation. Thus, it is
interesting for algorithms that demand many resources.
Sometimes, as with this dataset, the data are not on the same scale.
For example, we can have a volatility of 60% with a return of 1.75%.
So, the algorithm has many difficulties working with that. We need to
standardize the data to put it on the same scale. Let us see graphically
why we need to standardize the data in figure 11.2.
Now, let us see the formula to standardize the data and then standardize
the data using Python.

$z = \frac{x - \mu}{\sigma}$

where $z$ is the standardized value, $x$ the value of the observation,
$\mu$ the mean of the vector, and $\sigma$ the volatility of the vector.
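As a hedged sketch (variable names are assumptions): standardizing the features with scikit-learn, where the scaler is fitted on the train set only and then applied to the test set:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # fit the mean and std on the train set
X_test_scaled = sc.transform(X_test)        # reuse the same mean and std on the test set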
In this figure, we can see the question to which the SVC replies: " How
can the groups of points be separated optimally?".
We use margins to find the optimal line between the two groups.
These margins maximize the distance between the two
groups. Then, we take the middle of the two margins to find the
delimitation between the two groups. Moreover, the points
on the margins are called support vectors, and these points are the only
important points in the algorithm: if we remove all the other points,
the performance of the algorithm will be the same.
Now let us talk about the kernel of the model. The kernel is the way
our algorithm is trained. There are many ways to optimize an
SVC, but in finance the most used are the Gaussian kernel and the linear
kernel.
In this figure, we can see the intuition behind the SVR. The
most useful parameter is epsilon, because it manages the width of the
tube around the SVR.
The most used kernels in finance for both SVC and SVR are the
linear and the Gaussian kernels.
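As a hedged sketch (the hyper-parameters and variable names are assumptions, not the book's exact listing): a linear-kernel SVC trained on the standardized features:

from sklearn.svm import SVC

model = SVC(kernel="linear", C=1.0)
model.fit(X_train_scaled, y_train_cla)        # train on the standardized train set
predictions = model.predict(X_test_scaled)    # predict the classes of the test set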
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")

info_order = {

else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell} ")

time.sleep(1)
In this figure, we can see the different hyperplanes of the decision tree.
In this example, there are three hyperplanes, meaning the depth of this
decision tree is 3.
Then, with the trained model, we can make predictions and backtest the
strategy. As shown in figure 12.3, we obtain an excellent return using
a depth of 6, with nearly 30% of returns yearly.
Whereas, when we do not specify the depth, the algorithm can
go as deep as it wants, which is not good; we can see that the return
of the same model without a specified depth is -80% over the
period.
Depth_max = None
Beta: -0.034   Alpha: -20.95 %   Sharpe: -0.715   Sortino: -0.906
-----------------------------------------------------------------------------
VaR: 91.56 %   cVaR: 101.59 %   VaR/cVaR: 1.11   drawdown: 57.31 %
In this figure, we can see that the depth of the decision tree plays a
crucial role. Indeed, the return difference between the two strategies is
nearly 160% over the period. This highlights the overfitting problem
that arises when we leave the algorithm with no depth restriction.
The decision tree regressor follows nearly the same process as the
decision tree classifier. So, let us see this in a simple tree in figure
12.4.
We can see the strategy results in figure 12.5, and as we can see with
the constraint on the depth of the algorithm, it also gives good results.
To find the best value for the parameter max_depth, we will use the
GridSearchCV algorithm of scikit-learn. It is a straightforward
algorithm: it just tests all the possibilities and keeps the best one.
param = {"max_depth": [3, 6, 15]}

# Create the GridSearch
model = GridSearchCV(dtr, param_grid=param, cv=3)
# RETURNS
# Create returns criterion
To avoid repetition, we will not explain the same things again for this
algorithm: everything we said about the random forest classifier also
applies to the random forest regressor. So, let us see how to
implement it and highlight once again the overfitting problem of these
algorithms.
# SHARPE
# Create sharpe criterion
def sharpe(y, y_pred):
    r = np.sign(y_pred) * y
    return np.mean(r) / np.std(r)

model.fit(X_train.values, y_train_reg.values)
model.best_estimator_
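As a hedged sketch of how such a criterion might be wired into the grid search (this wiring is an assumption, not necessarily the book's code; it reuses the sharpe function above):

from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Wrap the custom criterion so GridSearchCV can use it as a scoring function
sharpe_scorer = make_scorer(sharpe, greater_is_better=True)

model = GridSearchCV(RandomForestRegressor(),
                     param_grid={"max_depth": [3, 6, 15]},
                     cv=3, scoring=sharpe_scorer)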
When the algorithms are sufficiently different, the prediction is
better when we aggregate their predictions. However, this requires some
mathematical assumptions, like the independence of the estimators, which
is nearly impossible to have in real life because the algorithms are trained
on the same data.
# Backtest
backtest_dynamic_portfolio(df[ "strategy" ].iloc[split:])
# Backtest
backtest_dynamic_portfolio(df[ "strategy" ].iloc[split:])
# Backtest
backtest_dynamic_portfolio(df[ "strategy" ].iloc[split:])
As we can see, the results of the ensemble methods are the same. It
means that we cannot make a better prediction using this data with this
type of algorithm. To overcome this issue, we will see something more
powerful using deep learning later.
Summary
To find the best separation between groups, the decision tree
cut the space with n hyperplanes (where n is the depth of the
tree).
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)

# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)

# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)

# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)

# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")

info_order = {
    "Google": ["Alphabet_Inc_C_(GOOG.O).a", 1.00]
}

start = datetime.now().strftime("%H:%M:%S")  # "23:59:59"
while True:
    # Verification for launch
    if datetime.now().weekday() not in (5, 6):
        is_time = datetime.now().strftime("%H:%M:%S") == start
    else:
        is_time = False

    # Launch the algorithm
    if is_time:

else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell} ")

time.sleep(1)
The live parameter sets the live trading (live = True) or the
screener mode (live = False).
Chapter 13: Deep Neural
Networks (DNN)
In this chapter, we see our first deep learning algorithm. We will
talk about the deep neural network (DNN), also called artificial neural
network (ANN). Deep learning is a field of machine learning which uses
the most powerful algorithms. However, it demands many resources to
run. To explain the DNN and create a trading strategy using this
algorithm, we will explain the intuition behind the DNN and how to
create a DNN classifier and a DNN regressor. Moreover, after this
chapter, we will be capable of making our own loss function. We will
implement the algorithms on the Apple stock price.
One neuron is like a linear regression. It has an input named X and a set
of parameters to predict an output y. The parameters are named
weights in deep learning, and the intercept (the $\beta_0$ of the linear
regression) is called the bias. The only difference between a neuron and
a linear regression is that the neuron puts the output into an activation
function (we will discuss the activation functions later). Let us see the
process behind a neuron in figure 13.1.
In this figure, we can see the process behind one neuron in a deep
neural network.
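In equation form (standard notation, not necessarily the book's), the output of a single neuron is:

$y = f(w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b) = f(Wx + b)$

where $W$ contains the weights, $b$ is the bias, and $f$ is the activation function.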
Now, let us discuss the activation function. Usually, it is just a way to
constrain the value of the output. However, we will explain later that in
the hidden layers the choice of the activation function is not "really"
essential, whereas it is crucial for the output layer. Let us see the different
activation functions in figure 13.2 to understand the notion better.
This figure shows some activation functions; the most used
are ReLU, the linear function, and the sigmoid function.
So, forward propagation consists of putting data into the neural network
and obtaining an output, the prediction. However, this process does not
train our algorithm. So, in the following parts, we will see how to train
the DNN.
To do it, we search with our feet for the direction with the steepest
downward slope, walk a few meters in that direction, and repeat this
process until we reach the bottom.
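This intuition corresponds to the standard gradient descent update (standard notation, not necessarily the book's):

$\theta \leftarrow \theta - \eta \, \nabla_{\theta} L(\theta)$

where $\theta$ are the weights, $\eta$ is the learning rate (the size of each step), and $\nabla_{\theta} L(\theta)$ is the gradient of the loss function.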
The only problem with gradient descent is that it can get stuck at a local
minimum. To schematize the local and global minimum, we can think of
a gold digger in a cave. There is a 1 kg pack of gold (the local minimum)
and only one 100 kg pack (the global minimum). Suppose we do not
know that the big pack is in the cave. There is a good chance of finding the
1 kg pack and leaving the cave without the 100 kg pack; this problem
depends on the cost function. Usually, for a regression task, we use the
MSE, which is convex, so there is no issue. However, the algorithm can
get stuck at a local minimum if we create our own loss function, as in the next part.
This figure shows that if the function is not strictly convex, the algorithm
can fall into a local minimum even if there is a global minimum
elsewhere.
# Features engineering
df["returns t-1"] = df[["returns"]].shift(1)

# Mean of returns
df["mean returns 15"] = df[["returns"]].rolling(15).mean().shift(1)
df["mean returns 60"] = df[["returns"]].rolling(60).mean().shift(1)

# Volatility of returns
df["volatility returns 15"] = df[["returns"]].rolling(15).std().shift(1)
df["volatility returns 60"] = df[["returns"]].rolling(60).std().shift(1)
# NORMALIZATION
# Import the class
from sklearn.preprocessing import StandardScaler
nb_hidden_layer = 1

# INITIALIZATION SEQUENTIAL MODEL
classifier = Sequential()

# ADD HIDDEN LAYER
for _ in range(nb_hidden_layer):
    classifier.add(Dense(75, input_shape=(X_train.shape[1],), activation="relu"))

# OUTPUT LAYER DENSE
classifier.add(Dense(1, activation="sigmoid"))

# COMPILE THE MODEL
classifier.compile(loss="binary_crossentropy", optimizer="adam")

# TRAINING
classifier.fit(X_train_scaled, y_train_cla, epochs=15, batch_size=32, verbose=1)
1 Define the number of hidden layers.
Now, we will explain this code. First, we need to initialize the model
as for the scikit-learn models. After that, we just have an empty
model; we need to build it. The loop allows us to create as many hidden
layers as we want. The Dense function of TensorFlow creates a layer
of neurons. We have to specify how many neurons
we want, the shape of the inputs, and the activation function.
Then we need to define a particular layer for the output because we
need to predict the class. So, we want to predict a 0 or a 1. We choose
the sigmoid to activate this neuron because the value will be between
0 and 1, and we want a probability.
Then we have to explain how we want to train the algorithm. We
choose the binary cross-entropy loss function and a stochastic gradient
descent optimizer named "Adam". To fit the model, we must choose the
number of epochs and the batch_size.
# Backtest
backtest_dynamic_portfolio(df[ "strategy" ].iloc[split:])
With this code, we can see the backtest of a strategy using the DNN
classifier on the Apple stock. As we can see, there is an excellent
performance because we win more than 30% annually. However,
we need to consider that we have not included the fees.
# TRAINING
regressor.fit(X_train_scaled, y_train_reg, epochs=15, batch_size=32, verbose=1)
We have chosen to train the model using the Mean Squared Error
(MSE) as a loss function.
If we are beginning in deep learning, it is advisable to keep the linear
activation function for the output of a regressor. If we
understand the concept well, we can try a tanh function
or others, as seen before.
For the MSE, the error is almost the same between 0.000001 and
-0.000001.
import tensorflow as tf

def mse_alpha(y_true, y_pred):
    # Series shifted by one step (assumed built with tf.roll) to get the direction of the move
    y_true_roll = tf.roll(y_true, shift=1, axis=0)
    y_pred_roll = tf.roll(y_pred, shift=1, axis=0)
    y_true_dif = tf.math.sign(y_true_roll - y_true)
    y_pred_dif = tf.math.sign(y_pred_roll - y_pred)
    # True when the predicted direction matches the real direction
    booleen_vector = y_true_dif == y_pred_dif
    # Weight of 1 for a correct direction, 3 for a wrong one
    alpha = tf.where(booleen_vector, 1, 3)
    alpha = tf.cast(alpha, dtype=tf.float32)
    # Squared error scaled by the directional weight
    mse = tf.square(y_true - y_pred)
    mse = tf.cast(mse, dtype=tf.float32)
    scale_mse = tf.multiply(alpha, mse)
    alpha_mse = tf.reduce_mean(scale_mse)
    return alpha_mse
This part is not mandatory for the book, but you should
know that creating a custom loss function is possible.
You can consult the TensorFlow documentation if you
want to go deeper into the subject.
As we can see in the figure, and as we said in the last part, the custom loss is not for beginners. Indeed, the strategy using the custom loss is good, but not better than the strategy using the MSE.
Figure 13.8: Backtest of the strategy using MSE and MSE alpha
With the DNN, we obtain excellent results, but the DNN does not take into account that we are working with a time series.
Summary
The process behind a neuron is like a linear regression whose output y is then passed through an activation function.
import os
from sklearn.preprocessing import StandardScaler
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
alg = ANN()
# TRAINING
alg.fit(X_train, y_train, epochs=13, batch_size=32, verbose=1)
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")
info_order = {
    "Apple": ["AAPL.a", 1.00]
}
start = datetime.now().strftime("%H:%M:%S")  # "23:59:59"
while True:
    # Verification before launch: do not run on Saturday (5) or Sunday (6)
    if datetime.now().weekday() not in (5, 6):
        is_time = datetime.now().strftime("%H:%M:%S") == start
    else:
        is_time = False
else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell}")
time.sleep(1)
This figure shows a representation of an ANN where each layer is drawn as a single square, instead of drawing all the neurons as on the left. However, it is the same ANN.
The problem with the RNN is that the model may have some trouble with the gradient: since there are many more derivatives to compute than in the ANN case, vanishing gradient issues can appear. However, researchers have found a solution to this problem. They have created a new neuron named the Long Short-Term Memory (LSTM) neuron. Its structure is slightly different from the neuron seen in the last section, so we need to study it.
Even if the GRU and the LSTM differ, you must test
both when creating your trading strategy on an RNN
model.
# Simple verification
if len(X_s) != len(y_s):
    print("Warnings")
We note that the shape of y_train will not change, but we pass it to the function so that its values are shifted according to the lag. Now, we have the data in a suitable format, and we can begin to implement the RNN classifier using TensorFlow.
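The book uses a helper called X_3d_RNN for this step; its exact implementation is not reproduced here, but a minimal sketch of the general idea, a sliding window of lag past rows per sample with the target aligned on the end of the window, could look like the following (the book's helper may handle the alignment differently):
import numpy as np

def to_rnn_3d(X, y, lag):
    # Build samples of shape (n_samples, lag, n_features) from a 2D feature matrix
    X_s, y_s = [], []
    for i in range(lag, len(X)):
        X_s.append(X[i - lag:i, :])   # the last `lag` rows of features
        y_s.append(y[i])              # target aligned with the end of the window
    return np.array(X_s), np.array(y_s)

# Example: 100 observations, 4 features, window of 15 steps
X_demo = np.random.randn(100, 4)
y_demo = np.random.randn(100)
X_3d, y_3d = to_rnn_3d(X_demo, y_demo, lag=15)
print(X_3d.shape, y_3d.shape)  # (85, 15, 4) (85,)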
In this figure, we can see that the capital grows well, except from 2018-07 to 2019-07, where there is a drawdown of 45%.
14.3. RNN regressor
This section will explain how to build an RNN regressor and create a function to automate the building. To do this, we will first make a small clarification about standardization. Then, we will see a new deep learning technique named dropout, and how our trading strategy performs when created with the RNN regressor.
Adding some dropout layers is very useful in two principal ways:
# LIBRARIES
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
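As a rough illustration of how these layers fit together (the number of units and the dropout rate below are arbitrary choices, not the book's exact architecture), an LSTM regressor with dropout can be assembled as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

def build_rnn_regressor(lag, n_features):
    regressor = Sequential()
    # Recurrent layer reading sequences of shape (lag, n_features)
    regressor.add(LSTM(units=50, input_shape=(lag, n_features)))
    # Dropout randomly switches off 20% of the neurons during training
    regressor.add(Dropout(0.2))
    # Linear output for a regression task
    regressor.add(Dense(1, activation="linear"))
    regressor.compile(loss="mse", optimizer="adam")
    return regressor

# regressor = build_rnn_regressor(lag=15, n_features=5)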
# Inverse transform
y_train_sc = sc_y.inverse_transform(regressor.predict(X_train_3d))
# Predictions
y_pred_train = np.concatenate((np.zeros([lag, 1]), y_train_sc), axis=0)
# Inverse transform
y_test_sc = sc_y.inverse_transform(regressor.predict(X_test_3d))
# Predictions
y_pred_test = np.concatenate((np.zeros([lag, 1]), y_test_sc), axis=0)
# Backtest
backtest_dynamic_portfolio(df["strategy"].iloc[split+lag:])
Before closing this chapter, there are some points about deep learning that need to be highlighted:
Summary
# Simple verification
if len(X_s) != len(y_s):
    print("Warnings")
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
lag = 15
X_train, y_train = X_3d_RNN(X_train, y_train.values, 15)
alg = RNN()
# TRAINING
alg.fit(X_train, y_train, epochs=1, batch_size=32, verbose=1)
# Save the model
print("Train the model because there are no existing weights")
alg.save_weights(os.path.join(path, f"Models/RNN_reg_{symbol}"))
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
X, _ = X_3d_RNN(X, y.values, 15)
X = X[-1:, :, :]
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")
info_order = {
    "Netflix": ["Netflix_Inc_(NFLX.O)", 1.00]
}
else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell}")
time.sleep(1)
N is the number of filters. The more filters there are, the more powerful the model is, but the longer it takes to train.
Our model is a little more complex than the others. Indeed, it combines RNN, CNN, and dropout.
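To give a rough idea of how such a combination can be assembled (this is only an illustrative sketch, not the book's exact architecture; the number of filters, the units, and the dropout rate are arbitrary), a Conv1D layer can be stacked before an LSTM and a dropout layer:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, Conv1D

def build_cnn_rnn(lag, n_features, n_filters=32):
    model = Sequential()
    # Convolution over the time dimension: n_filters filters scanning 3 consecutive steps
    model.add(Conv1D(filters=n_filters, kernel_size=3, activation="relu",
                     input_shape=(lag, n_features)))
    # Recurrent layer on top of the extracted local patterns
    model.add(LSTM(units=50))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation="linear"))
    model.compile(loss="mse", optimizer="adam")
    return model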
First, we will search among many assets for our strategy. Then, we will select the best assets using a Sharpe ratio criterion. Moreover, we will find the optimal stop loss, take profit, and leverage.
Moreover, we will work directly with the data of our broker because we need to be as close as possible to the market. In the previous chapters, we explored each technique using Yahoo data; the point there was to explain the techniques, not to create a strategy. However, when we work on a real-life project, it is better to use the broker's data through the MetaTrader 5 platform because it is closer to reality.
symbols.append(list(element)[-3])
sectors.append(list(element)[-1].split("\\")[0])
descriptions.append(list(element)[-7])
# Create a dataframe
informations = pd.DataFrame([symbols, sectors, descriptions],
                            index=["Symbol", "Sector", "Description"]).transpose()
Our strategy will take only the assets with a very low spread. It is a personal choice; we could choose another asset-selection criterion, such as the volatility of the asset.
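As a minimal sketch of such a filter (assuming the MetaTrader 5 connection has already been initialized and that symbols is the list of symbol names built above; the 0.1% threshold is only an example), we can compute a relative spread from the current bid and ask quotes:
import MetaTrader5 as mt5

def relative_spread(symbol):
    # Spread as a percentage of the mid price, from the current bid/ask quotes
    info = mt5.symbol_info(symbol)
    if info is None or info.bid == 0:
        return None
    return (info.ask - info.bid) / ((info.ask + info.bid) / 2)

low_spread_symbols = [s for s in symbols
                      if relative_spread(s) is not None
                      and relative_spread(s) < 0.001]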
# Features engineering
df["returns t-1"] = df[["returns"]].shift(1)
# Mean of returns
df["mean returns 15"] = df[["returns"]].rolling(15).mean().shift(1)
df["mean returns 60"] = df[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
df["volatility returns 15"] = df[["returns"]].rolling(15).std().shift(1)
df["volatility returns 60"] = df[["returns"]].rolling(60).std().shift(1)
# NORMALIZATION
# Import the class
from sklearn.preprocessing import StandardScaler
# PCA
# Import the class
from sklearn.decomposition import PCA
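As a minimal sketch of how these two classes are usually chained (assuming X_train, X_test, and X_valid already exist; the number of components is an arbitrary choice for illustration): fit on the training set only, then apply the same transformations to the other sets.
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Fit the scaler on the training set only, then apply it to the other sets
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
X_valid_scaled = sc.transform(X_valid)

# Same logic for the PCA: fit on the training set, transform the rest
pca = PCA(n_components=3)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)
X_valid_pca = pca.transform(X_valid_scaled)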
This project must have three sets because we work with predictive models: we need one set to train them and two sets for the portfolio management techniques. Let us see in figure 14.3 the utility of each set in our project.
Validation set: now that we have the best models and the best allocation of our capital between the models (the best portfolio allocation), we can backtest the portfolio on this unseen data.
We can see a little summary of the decision process used to find the best portfolio of trading strategies.
First, we will find the best strategies using a Sharpe ratio criterion. Then, we will take the most profitable strategies and use a voting method on the algorithms. Moreover, we will use a portfolio method on the test set data to find the best allocation among our portfolio strategies.
df = df.dropna()
# Create predictions for the whole dataset
df["prediction"] = model.predict(np.concatenate((X_train, X_test, X_valid), axis=0))
if reg == False:
    df["prediction"] = np.where(df["prediction"] == 0, -1, 1)
# Compute the strategy
df["strategy"] = np.sign(df["prediction"]) * df["returns"]
# Initialization
symbols = symbols[:150]  # ["EURUSD", "GBPAUD"]
lists = []
With this code, we have found the Sharpe ratio of each strategy. We will keep the assets with the best Sharpe ratios, i.e., those that give good results with the three algorithms.
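For reference, here is a minimal sketch of an annualized Sharpe ratio computed from a series of daily strategy returns; it assumes a zero risk-free rate and 252 trading days per year, which are simplifying assumptions:
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    # Annualized mean return divided by annualized volatility (risk-free rate assumed to be zero)
    returns = np.asarray(returns)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()

# sharpe = sharpe_ratio(df['strategy'].dropna())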
vot = VotingClassifier(estimators=[
    ('lr', lin), ("tree", tree), ("svr", svr)])
if reg == False:
    df["prediction"] = np.where(df["prediction"] == 0, -1, 1)
As we can see in figure 16.4, the return of each asset alone is not very profitable once we take into account the volatility of these strategies. The following subsection will apply a portfolio method to the trading strategies to decrease the investment risk.
# Apply the tp
pf["Return"] = np.where(pf["high"].values > tp, tp, pf["Return"].values)
pf["Return"] = np.where(pf["Return"].values > tp, tp, pf["Return"].values)
# Apply the sl
pf["Return"] = np.where(pf["low"].values < -sl, -sl, pf["Return"].values)
pf["Return"] = np.where(pf["Return"].values < -sl, -sl, pf["Return"].values)
So, suppose we work with our strategy using 2.1% for the tp threshold and 9% for the sl. In that case, suppose we accept a maximum drawdown of 15% because we are not risk lovers, while the strategy's maximum drawdown is 10%. Then the leverage for the strategy is 15/10 = 1.5.
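Here is a minimal sketch of this leverage rule, with a simple maximum drawdown computed on a non-compounded cumulative return curve; the numbers are the ones from the example above:
import numpy as np

def max_drawdown(returns):
    # Cumulative (simple) return curve and its running maximum
    cumulative = np.cumsum(returns)
    running_max = np.maximum.accumulate(cumulative)
    drawdowns = running_max - cumulative
    return drawdowns.max()

accepted_drawdown = 0.15            # what we are willing to lose
strategy_drawdown = 0.10            # e.g. max_drawdown(pf["Return"].values)
leverage = accepted_drawdown / strategy_drawdown   # 0.15 / 0.10 = 1.5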
Use ticks to compute the spread over time, and not on a single day, to select the asset, because a single day is not representative.
Use ticks when working with the stop loss afterward.
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
alg = VotingClassifier(estimators=[
    ('lr', lin), ("tree", tree), ("svr", svr)])
# Features engineering
data["returns t-1"] = data[["returns"]].shift(1)
# Mean of returns
data["mean returns 15"] = data[["returns"]].rolling(15).mean().shift(1)
data["mean returns 60"] = data[["returns"]].rolling(60).mean().shift(1)
# Volatility of returns
data["volatility returns 15"] = data[["returns"]].rolling(15).std().shift(1)
data["volatility returns 60"] = data[["returns"]].rolling(60).std().shift(1)
if live:
    current_account_info = mt5.account_info()
    print("-----------------------------------------------------------")
    print("Date: ", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    print(f"Balance: {current_account_info.balance} USD, \t"
          f"Equity: {current_account_info.equity} USD, \t"
          f"Profit: {current_account_info.profit} USD")
    print("-----------------------------------------------------------")
info_order = {
    "RUSSEL 2000": ["US2000", 1.1],
    "Bitcoin": ["Bitcoin", 0.1],
    "Nasdaq 100": ["NAS100", 0.3]
}
else:
    print(f"Symbol: {symbol} \t"
          f"Buy: {buy} \t"
          f"Sell: {sell}")
time.sleep(1)
Chapter 17: From nothing to live trading
This chapter discusses the process of creating a trading strategy and putting it into production, managing a live trading strategy, and combining strategies together.
As we are not doing manual trading here, the trading plan will be slightly different, as we already know. A trading plan gives us the entry and exit points, with the money management to follow, for each strategy. However, as we have automated all of this, we do not need to check it for each trade. So, what does a trading plan look like in algo trading?
We need to put all the strategy characteristics in the same place, i.e., the performance, the risk, how many trades we want to take (for example, a maximum of one trade or at least three trades in a day), which timeframe we want to use (daily, hourly, weekly), which assets we want to trade (low-spread assets, volatile assets), which markets we want to trade (forex, stocks, crypto), whether we want to be exposed overnight, etc.
To conclude, the trading plan allows us to decide which type of strategy we want before beginning the research process peacefully.
As we can see, the process seems simple, but we have taken the whole book to explain it. So, please do not underestimate the difficulty of this process; we will need at least 100 tries to find one good strategy (on average). Do not give up and stay consistent in your work; you will find profitable strategies, but it requires much time.
Analyze the trades manually and write what you think about them: positive points, and negative points to correct. Here are some examples to understand the utility better:
Stop-loss hit just above the support too many times: define another way to place the stop-loss.
False signals after some news: do not trade 1 hour before the news.
The return distribution in production is getting worse than in the backtest: upgrade the strategy or stop it.
Keep a history of the coding issues and how you solved them; it is valuable on other projects to solve our issues more easily.
17.2.1. Bet sizing
The first rule is that we will NEVER invest all our capital in one strategy. It
is simple, but it is essential. Moreover, there are several simple rules to
follow to minimize the risk of losing much money:
1. Never risk more than 1% of your capital in one trade (adapt the volume).
2. Depending on the strategy, stop the algo for the day after 3 losses, or stop it for the week after a 5% loss, etc.
These rules are not exhaustive; there are many similar rules to manage our risk and be a profitable trader.
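As an illustration of rule 1, here is a minimal sketch (not the book's exact code) that adapts the position size so that hitting the stop loss costs at most 1% of the capital; the stop-loss distance is expressed as a fraction of the entry price:
def position_size(capital, stop_loss_pct, risk_pct=0.01):
    # Maximum amount invested so that hitting the stop loss costs at most risk_pct of the capital
    return capital * risk_pct / stop_loss_pct

# Example: 10,000 USD capital, 2% stop loss, 1% risk per trade -> 5,000 USD position
print(position_size(10_000, 0.02))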
How can we tell whether the incubation returns are good or not? There are many ways, but here is the easiest one: we can compare the distribution of the backtest returns with the distribution of the incubation returns. The best way to do it is to compare the statistics of the incubation period with the theoretical values obtained in the backtest on the train and test sets. To do so, compute the returns of the incubation period using our backtest function.
As we can see, the first distributions are similar, unlike the second ones; in that case, we need to review all our code to find the issue (it is frequently in the backtest function).
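A minimal sketch of this comparison, assuming backtest_returns and incubation_returns are two pandas Series of strategy returns (both names are placeholders; the Kolmogorov-Smirnov test from SciPy is one possible, optional refinement):
import pandas as pd
from scipy.stats import ks_2samp

# Side-by-side descriptive statistics of the two return distributions
comparison = pd.DataFrame({
    "backtest": backtest_returns.describe(),
    "incubation": incubation_returns.describe(),
})
print(comparison)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the two distributions differ
stat, p_value = ks_2samp(backtest_returns, incubation_returns)
print(f"KS statistic: {stat:.3f}, p-value: {p_value:.3f}")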
However, we can find some tips for deciding when to stop the algo, especially computations based on the drawdown: this approach is called the drawdown management method.
I hope you have enjoyed the book. Do not hesitate to join the discord forum (QR code in chapter 1) if you have questions about the book or want to see the other traders' questions. I wish you all the best for what comes next!
Annex: Compounding versus simple interest
In this annex, we will learn the difference between the compound interest strategy and the simple interest strategy. We will explain why there is a difference between these two types of calculation. Then, we will see how to compute the cumulative returns using these methods.
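As a preview, here is a minimal sketch of both computations on an illustrative return series (the numbers are arbitrary):
import numpy as np

returns = np.array([0.02, -0.01, 0.03, 0.01])

# Simple interest: gains are not reinvested, so the cumulative return is the plain sum
simple_cumulative = np.cumsum(returns)

# Compound interest: gains are reinvested, so the returns multiply
compound_cumulative = np.cumprod(1 + returns) - 1

print(simple_cumulative[-1])    # 0.05
print(compound_cumulative[-1])  # ~ 0.0505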
# Save
import pickle
from joblib import dump, load
from sklearn.svm import SVC

svc = SVC()
alg_pickle = pickle.dumps(svc)
dump(alg_pickle, "svc.joblib")
# Load
alg = pickle.loads(load("svc.joblib"))
return classifier
import warnings
from datetime import datetime
import pandas as pd
import MetaTrader5 as mt5
warnings.filterwarnings( "ignore" )
mt5.initialize()
class MT5:
# Tuple to dataframe
rates_frame = pd.DataFrame(rates)
deviation = 20  # mt5.getSlippage(symbol)
# ************************ Open a trade **************************
if id_position == None:
    return summary
identifier = current_open_positions.loc[
    current_open_positions["symbol"] == symbol].values[0][0]
except:
    position = None
    identifier = None
# Close trades
if long == True and position == 0:
    long = False
else:
    pass
if short == True:
    res = MT5.orders(symbol, lot, buy=False, id_position=None)
    print(f"OPEN SHORT TRADE: {res}")
    print("-------------------------------------------------------")
else:
    res = MT5.orders(row["symbol"][0], row["volume"][0],
                     buy=False, id_position=row["ticket"][0])
Additional readings
Chapter 3
Markowitz’s “Portfolio Selection”: A Fifty-Year Retrospective, The University of Chicago Press.
https://www.jstor.org/stable/269777
Portfolio management: mean-variance analysis in the US
asset market, Narela (Bajram) Spaseski ,
https://www.researchgate.net/publication/264423979_PORTF
OLIO_MANAGEMENT_MEAN-
VARIANCE_ANALYSIS_IN_THE_US_ASSET_MARKET
Mean-variance-skewness-kurtosis based portfolio optimization, Kin Keung Lai, Shouyang Wang, Lean Yu.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.898.991&rep=rep1&type=pdf
Chapter 4
Tactical Asset Allocation (TAA), Adam Barone.
https://www.investopedia.com/terms/t/tacticalassetallocation.asp
Chapter 5
Optimization of conditional value-at-risk, R. Tyrrell
Rockafellar .
https://www.ise.ufl.edu/uryasev/files/2011/11/CVaR1_JOR.p
df
Chapter 7
Stationarity and differencing .
https://people.duke.edu/~rnau/411diff.htm
Cointegration, Niti Gupta.
https://www.wallstreetmojo.com/cointegration/
Pairs Trading, James Chen.
https://www.investopedia.com/terms/p/pairstrade.asp
Chapter 8
What Is a Time Series?, Adam Hayes.
https://www.investopedia.com/terms/t/timeseries.asp
Autoregressive–moving-average model, Wikipedia.
https://en.wikipedia.org/wiki/Autoregressive–moving-
average_model
Chapter 9
Linear Regression for Machine Learning, Jason
Brownlee. https://machinelearningmastery.com/linear-
regression-for-machine-learning/
Chapter 11
Support-vector machine, Wikipedia.
https://en.wikipedia.org/wiki/Support-vector_machine
Chapter 12
Decision Trees in Machine Learning, Prashant Gupta.
https://towardsdatascience.com/decision-trees-in-machine-
learning-641b9c4e8052
Ensemble Methods in Machine Learning: What are They
and Why Use Them? Evan Lutins.
https://towardsdatascience.com/ensemble-methods-in-
machine-learning-what-are-they-and-why-use-them-
68ec3f9fef5f
Chapter 13
Stochastic Gradient Descent — Clearly Explained !!,
Aishwarya V Srinivasan.
https://towardsdatascience.com/stochastic-gradient-descent-
clearly-explained-53d239905d31
Chapter 14
Recurrent Neural Networks cheatsheet, Afshine Amidi
and Shervine Amidi.
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-
recurrent-neural-networks
Chapter 15
Introduction to Convolutional Neural Networks (CNN),
Manav Mandal .
https://www.analyticsvidhya.com/blog/2021/05/convolutiona
l-neural-networks-cnn/
[1] Additional reading: Markowitz’s “Portfolio Selection”: A Fifty-Year Retrospective
[2] Additional reading: Portfolio management: mean-variance analysis in the US asset market
[3] Additional reading: Mean-variance-skewness-kurtosis based portfolio optimization
[4] Additional reading: Tactical Asset Allocation (TAA), Adam Barone
[5] Additional reading: https://en.wikipedia.org/wiki/Capital_asset_pricing_model
[6] Additional reading: Optimization of conditional value-at-risk, R. Tyrrell Rockafellar
[7] Additional reading: Stationarity and differencing
[8] Additional reading: Cointegration, Niti Gupta
[9] Additional reading: Pairs Trading, James Chen
[10] Additional reading: What Is a Time Series?, Adam Hayes
[11] Additional reading: Autoregressive–moving-average model, Wikipedia
[12] Additional reading: Linear Regression for Machine Learning, Jason Brownlee
[13] Additional reading: Support-vector machine, Wikipedia
[14] Additional reading: Decision Trees in Machine Learning, Prashant Gupta
[15] Additional reading: Ensemble Methods in Machine Learning: What are They and Why Use Them? Evan Lutins
[16] Additional reading: Stochastic Gradient Descent — Clearly Explained!!, Aishwarya V Srinivasan
[17] Additional reading: Introduction to Convolutional Neural Networks (CNN), Manav Mandal