Trading Results Analysis
October 6, 2020
[5]: df = raw.copy()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92 entries, 0 to 91
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ticket 92 non-null int64
1 Magic 92 non-null int64
2 Comment 92 non-null object
3 Open Datetime 92 non-null object
4 Open Price 92 non-null float64
5 Close Datetime 92 non-null object
6 Close Price 92 non-null float64
7 Type 92 non-null int64
8 Lots 92 non-null float64
9 Symbol 92 non-null object
10 Take Profit 92 non-null float64
11 Stop Loss 92 non-null float64
12 Profit 92 non-null float64
13 Swap 92 non-null float64
14 Commission 92 non-null int64
dtypes: float64(7), int64(4), object(4)
memory usage: 10.9+ KB
The names of the columns are fairly self-explanatory. However, a few of them deserve more detail:
* ‘Ticket’ is generated by the broker and carries no meaning.
* ‘Magic’ is used to distinguish between trading algorithms.
* ‘Comment’ will be used to determine the reason why a trade was closed.
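For illustration, the identifying columns can be inspected directly (a minimal sketch; the outputs are not part of this report):
# One 'Magic' value per trading algorithm; the broker appends a closure tag
# such as [[sl]] or [[tp]] to the end of 'Comment'.
print(df['Magic'].unique())
print(df['Comment'].head(3).tolist())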
A check for missing values shows none in any column:
[8]: Ticket 0
Magic 0
Comment 0
Open Datetime 0
Open Price 0
Close Datetime 0
Close Price 0
Type 0
Lots 0
Symbol 0
Take Profit 0
Stop Loss 0
Profit 0
Swap 0
Commission 0
dtype: int64
I then check for constant columns. I expect the ‘Commission’ column to be all zeroes.
[9]: df.var() == 0
[10]: 0
I expect all open prices to be positive, so I count how many are non-positive.
[11]: df[df['Open Price'] <= 0]['Open Price'].count()
[11]: 0
[12]: 0
[13]: 0
1.2.2 Transformations
For aesthetic reasons, I strip “+” and “.” from the symbol names.
[14]: df['Symbol'] = df['Symbol'].apply(lambda x: x.replace("+", ""))
df['Symbol'] = df['Symbol'].apply(lambda x: x.replace(".", ""))
Chronological order is of utmost importance since I shall be performing running calculations later.
[16]: df = df.sort_values('Open Datetime')
df = df.reset_index(drop=True)
I extract the same information as above for the “Close Datetime” column.
[18]: df['Close Date'] = df['Close Datetime'].dt.date
df['Close Hour'] = df['Close Datetime'].dt.hour
df['Close Day'] = df['Close Datetime'].dt.day_name()
df['Close Datetime Seconds'] = pd.to_datetime(df['Close Datetime'], origin='unix').astype('int')
I redefine the ‘Profit’ column as the net profit from a trade (profit plus swap plus commission).
[20]: df['Profit'] = df['Profit'] + df['Swap'] + df['Commission']
I need a ‘Profit Per Lot’ column because different trades have different lot sizes; dividing by (Lots / 0.01) expresses the profit per 0.01-lot position.
[21]: df['Profit Per Lot'] = df['Profit'] / ( df['Lots'] / 0.01)
I extract the reason why a trade was closed from the ‘Comment’ column. The reason for closure is
added by my broker to the end of my comment:
[23]: df['Comment'][0]
[23]: 'srl-1.03-242[[sl]]'
In order to extract the reason for closure I search for specific strings in the comment.
[24]: df['Stop Loss Hit'] = df['Comment'].apply(lambda x: 1 if "[[sl]]" in x else 0)
df['Take Profit Hit'] = df['Comment'].apply(lambda x: 1 if "[[tp]]" in x else 0)
I calculate the percentage difference between the take profit and the stop loss columns. I define a
helper function first:
[25]: def stops_dist(x):
    # x is assumed to unpack to (take profit, stop loss, type/direction flag);
    # the direction of the ratio depends on the trade type
    if x[2] == 1:
        return (x[0] / x[1] - 1) * 100
    else:
        return (x[1] / x[0] - 1) * 100
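The cell that applies this helper is not visible in this export; a minimal sketch of how the ‘Stops Distance’ column used later could be built, assuming x unpacks to (Take Profit, Stop Loss, Type) in that order:
# Hypothetical application of the helper; the column order passed to
# stops_dist is an assumption.
df['Stops Distance'] = df[['Take Profit', 'Stop Loss', 'Type']].apply(
    lambda row: stops_dist(row.to_numpy()), axis=1)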
After cleaning, the shape of the dataframe is:
[29]: df.shape
A preview of the first few rows:
[30]: Magic Open Datetime Open Price Close Datetime Close Price \
0 4444 2020-05-19 11:31:00 1.09543 2020-05-19 11:39:00 1.09538
1 4444 2020-05-19 12:17:00 1.09548 2020-05-19 12:23:00 1.09539
2 4444 2020-05-19 17:40:00 1.09250 2020-05-19 21:59:00 1.09257
3 4444 2020-05-20 13:03:00 1.09569 2020-05-20 13:20:00 1.09557
4 4444 2020-05-20 13:57:00 1.09630 2020-05-20 14:29:00 1.09628
Lots Symbol Profit Swap Open Date … Close Hour Close Day \
0 0.01 EURUSD 0.21 0.0 2020-05-19 … 11 Tuesday
1 0.01 EURUSD 0.38 0.0 2020-05-19 … 12 Tuesday
2 0.01 EURUSD 0.30 0.0 2020-05-19 … 21 Tuesday
3 0.01 EURUSD 0.50 0.0 2020-05-20 … 13 Wednesday
4 0.01 EURUSD 0.09 0.0 2020-05-20 … 14 Wednesday
Close Datetime Seconds Duration Profit Per Lot Order Type Direction \
0 1589888340000000000 8.0 0.21 Sell -1
1 1589890980000000000 6.0 0.38 Sell -1
2 1589925540000000000 259.0 0.30 Buy 1
3 1589980800000000000 17.0 0.50 Sell -1
4 1589984940000000000 32.0 0.09 Sell -1
[5 rows x 24 columns]
The first and last dates in the dataset, and the total number of trades, are:
[31]: '2020-05-19'
[32]: '2020-06-16'
[34]: 92
1.3.1 Profit
Total profit Over the period in question the algorithms achieved a profit of:
[35]: np.round(df['Profit'].sum(), 2)
[35]: 118.02
Total profit by market As can be seen, the bulk of the profit comes from ‘USDCHF’.
[36]: df_symbol = df[['Symbol', 'Profit']].groupby(['Symbol'], as_index=False).sum()
plt.figure(figsize=figsize)
plt.bar(df_symbol['Symbol'], df_symbol['Profit'])
plt.xticks(df_symbol['Symbol'], rotation=90)
plt.ylabel('Profit (zł)')
plt.xlabel('Symbol')
plt.show()
Total profit by market and trade direction The most money was made by shorting ‘USDCHF’. The most money was lost buying ‘US500’.
[37]: df_mkt = df[['Symbol', 'Order Type', 'Direction', 'Profit']]
df_mkt = df_mkt.groupby(['Symbol', 'Order Type'], as_index=False)
df_mkt = df_mkt.agg({"Direction": [np.sum], "Profit": [np.sum]})
df_mkt.columns = df_mkt.columns.droplevel(1)
df_mkt['Direction'] = np.abs(df_mkt['Direction'])
df_mkt = df_mkt.rename(columns={"Direction" : "Number of trades"})
df_mkt.sort_values('Profit', ascending=False)
The three most profitable days, grouped by close date, were:
[38]: df_cdate = df[['Close Date', 'Profit']].groupby(['Close Date'], as_index=False).sum()
df_cdate.sort_values('Profit', ascending=False).head(3)
[40]: -7.6
Profit Per Lot histogram As can be seen, the distribution of ‘Profit Per Lot’ is nothing like a normal distribution:
[41]: width = 400
height = 400
dpi = 100
plt.figure(figsize=(width/dpi, height/dpi))
plt.hist(df['Profit Per Lot'])
plt.ylabel('Frequency')
plt.xlabel('Profit Per Lot (zł)')
plt.tight_layout()
plt.savefig('./img/profit_histogram.png')
plt.show()
The mean of the distribution is:
[42]: np.round(np.mean(df['Profit Per Lot']), 2)
[42]: 0.49
However, the median is negative, so more than half of the trades were small losses:
[43]: np.round(np.quantile(df['Profit Per Lot'], 0.5), 2)
[43]: -7.5
[44]: 22.85
Interquartile range:
[45]: np.round(np.quantile(df['Profit Per Lot'], 0.75) - np.quantile(df['Profit Per Lot'], 0.25))
[45]: 14.0
The distribution exhibits positive skew of:
[46]: np.round(skew(df['Profit Per Lot']), 2)
[46]: 3.06
[47]: 15.45
Comment This is to be expected given the design of the algorithms. They have a predefined maximum loss per trade (so they take small losses more frequently) but can hold winning trades for up to a week (hence the big wins).
Profit Per Lot by symbol Given the presence of outliers, I think it is appropriate to analyze the median of ‘Profit Per Lot’ for the rest of this section. As can be seen, by far the worst market was ‘US500’.
[48]: df_ppl = df[['Symbol', 'Profit Per Lot']].groupby(['Symbol'], as_index=False).quantile(0.5)
df_ppl['Symbol'] = df_ppl['Symbol'].astype('str')
plt.figure(figsize=figsize)
plt.bar(df_ppl['Symbol'], df_ppl['Profit Per Lot'])
plt.xticks(df_ppl['Symbol'], rotation=90)
plt.ylabel('Median Profit Per Lot (zł)')
plt.xlabel('Symbol')
plt.show()
Profit Per Lot by day of week As can be seen, no particular day of the week is better for trading.
[49]: df_day = df[['Open Day', 'Profit Per Lot']].groupby(['Open Day'], as_index=False).quantile(0.5)
plt.figure(figsize=figsize)
plt.bar(df_day['Open Day'], df_day['Profit Per Lot'])
plt.xticks(df_day['Open Day'], rotation=90)
plt.ylabel('Median Profit Per Lot (zł)')
plt.xlabel('Day Of Week')
plt.show()
Profit Per Lot by open hour The median profit per lot is positive for 2pm and 4pm. This is very interesting because:
* Around 2pm macroeconomic news is typically released.
* The European session closes at 5pm.
[50]: df_hour = df[['Open Hour', 'Profit Per Lot']].groupby(['Open Hour'], as_index=False).quantile(0.5)
plt.figure(figsize=figsize)
plt.bar(df_hour['Open Hour'], df_hour['Profit Per Lot'])
plt.ylabel('Median Profit Per Lot (zł)')
plt.xlabel('Hour Of Day')
ax = plt.gca()
ax.xaxis.set_major_locator(plt.MaxNLocator(integer=True))
plt.show()
1.3.3 Trades
Best trades Here is a table with the 3 best trades.
[51]: df[['Symbol', 'Profit Per Lot', 'Profit']].sort_values('Profit Per Lot', ascending=False).head(3)
Average win A profitable trade on average earns:
[53]: 17.34
Average loss An unprofitable trade on average loses:
[54]: avg_loss = np.round(df[df['Profit Per Lot'] < 0]['Profit Per Lot'].mean(), 2)
avg_loss
[54]: -10.84
The ratio of the average win to the average loss is:
[55]: 1.6
Percent of winning trades As can be seen, winning trades occur about 40% of the time.
[56]: win_prob = np.round(df[df['Profit Per Lot'] >= 0]['Profit Per Lot'].count() / df.shape[0] * 100, 2)
win_prob
[56]: 40.22
The percentage of losing trades is then:
loss_prob = np.round(100 - win_prob, 2)
loss_prob
[57]: 59.78
Probability of observing consecutive losses I use the well-known formula for independent events: the probability of k consecutive losses is simply p^k, where p is the probability of a single loss. This calculation should be taken with a grain of salt because the probabilities are almost surely neither fixed nor independent.
[58]: X = geom(loss_prob / 100)
df_geom = pd.DataFrame([np.arange(1, 11), np.round(np.power(loss_prob / 100, np.arange(1, 11)), 2)]).T
6 7 0.03
7 8 0.02
8 9 0.01
9 10 0.01
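As a quick sanity check on the last row of the table above, the probability of ten consecutive losses can also be computed directly (a minimal sketch reusing the loss_prob value from earlier):
# P(10 losses in a row) = p_loss ** 10; roughly 0.0058, which rounds to the
# 0.01 shown in the table above.
p_loss = loss_prob / 100
print(round(p_loss ** 10, 4))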
Trade duration Next I inspect the histogram of the duration of individual trades.
[59]: plt.figure(figsize=figsize)
plt.hist(df['Duration'])
plt.ylabel('Frequency')
plt.xlabel('Time (min)')
plt.show()
I wonder what distribution this follows. I create a discrete random variable from the binned data, and fit both an exponential and a power-law distribution.
[60]: dur = df['Duration'].to_numpy()
h = np.histogram(dur, bins=1000)
D_emp = rv_histogram(h)
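The cell that fits the two distributions is not visible here; a minimal sketch of how D_exp, D_power and the evaluation grid X used below might be constructed with scipy.stats (the exact parameterization is an assumption):
from scipy.stats import expon, powerlaw

# Hypothetical reconstruction: fit an exponential and a power-law
# distribution to the raw durations, then freeze them for CDF evaluation.
D_exp = expon(*expon.fit(dur))
D_power = powerlaw(*powerlaw.fit(dur))

# Evaluation grid spanning the observed durations (used for plotting below).
X = np.linspace(dur.min(), dur.max(), 500)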
Unfortunately, as can be seen, the CDFs do not match, but they could serve as an approximation if need be.
[61]: plt.figure(figsize=figsize)
plt.plot(X, D_power.cdf(X), label="Power")
plt.plot(X, D_exp.cdf(X), label="Exponential")
plt.plot(X, D_emp.cdf(X), label="Empirical")
plt.legend(loc='best')
plt.show()
The three longest trades lasted:
[63]: Duration
8 3 days 18:01:00
49 3 days 09:33:00
9 3 days 08:06:00
The three shortest trades lasted:
[64]: Duration
31 00:02:00
30 00:02:00
65 00:03:00
Trade duration vs Profit Per Lot The scatter plot shows that there is only a minimal relationship between the duration of a trade and its profitability. This is to be expected: as mentioned earlier, the algorithms will hold on to winning trades.
[65]: plt.figure(figsize=figsize)
plt.scatter(df['Profit Per Lot'], df['Duration'] / 60, s=3)  # Duration is in minutes; divide by 60 to plot hours
plt.ylabel("Duration (h)")
plt.xlabel("Profit Per Lot (zł)")
plt.show()
The average number of trades per day is:
[67]: 4.84
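The cell that defines df_open is not visible in this export; a minimal sketch, assuming it simply counts how many trades were opened on each calendar date:
# Hypothetical reconstruction of df_open: number of trades opened per date.
df_open = (df.groupby('Open Date')
             .size()
             .reset_index(name='Trades Per Day'))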
[68]: plt.figure(figsize=figsize)
plt.hist(df_open['Trades Per Day'])
plt.ylabel('Frequency')
plt.xlabel('Trades Per Day')
plt.show()
Percentage of time spent in a trade, by symbol:
total = (stop - start).total_seconds() / 60  # assumed: length of the whole analysed period, in minutes
plt.figure(figsize=figsize)
plt.bar(df_time['Symbol'], df_time['Percent'])
plt.xticks(df_time['Symbol'], rotation=90)
plt.ylabel('Time In Trade (%)')
plt.xlabel('Symbol')
plt.show()
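The cell that builds df_time is not shown; a sketch under the assumption that ‘Percent’ is the share of the whole period (total, in minutes) each symbol spent in an open trade:
# Hypothetical reconstruction of df_time: minutes spent in trades per symbol,
# as a percentage of the total period length `total` computed above.
df_time = df.groupby('Symbol', as_index=False)['Duration'].sum()
df_time['Percent'] = df_time['Duration'] / total * 100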
1.3.5 Orders
There are 5 ways an order can get closed:
1. By me. (Did not occur)
2. By the algorithm.
3. The market moves above/below the take profit.
4. The market moves above/below the stop loss.
5. The broker closes trades open longer than one year. (Did not occur)
So I have 3 situations to examine.
[70]: 84.78
[71]: 11.96
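The two percentages above are not labelled in this export, but together with the 3.26% below they sum to 100%, so they are presumably the shares of trades closed by the stop loss and by the take profit. A sketch of how they can be computed from the flags created earlier (which output corresponds to which flag is left open):
# Share of trades whose comment carried the broker's [[sl]] / [[tp]] tag.
print(np.round(df['Stop Loss Hit'].sum() / df.shape[0] * 100, 2))
print(np.round(df['Take Profit Hit'].sum() / df.shape[0] * 100, 2))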
Percent closed by algorithms Specifically, this is the percentage of trades closed because the algorithm triggered logic other than its (moving) take profit and stop loss orders.
[72]: np.round(np.sum((df['Take Profit Hit'] == 0) & (df['Stop Loss Hit'] == 0)) / df.shape[0] * 100, 2)
[72]: 3.26
Stop order distance This histogram is very interesting, as it shows a way I could potentially improve the logic of my algorithms: placing stop orders 10% away from the market seems pointless for the type of strategy these algorithms trade.
[73]: plt.figure(figsize=figsize)
plt.hist(df['Stops Distance'])
plt.ylabel('Frequency')
plt.xlabel('Distance (%)')
plt.show()
1.3.6 Drawdown
In trading, drawdown refers to the difference between a high point in an equity curve and the succeeding low point. I am interested in the difference between the high point and the low point, as well as the duration of the drawdown.
size = arr.shape[0]
arr = np.cumsum(arr)            # running equity curve
start = np.argmax(arr)          # index of the high point
stop = np.argmin(arr[start:])   # offset (from the high point) of the subsequent low
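The surrounding cell is only partially visible; below is a self-contained sketch of the same idea (the high point of the cumulative profit curve and the lowest point after it), assuming this is how the d1, v1, d2 and v2 values used in the next plot are produced:
def max_drawdown(profits):
    # A sketch, not the original implementation: find the peak of the
    # cumulative profit curve and the lowest point that follows it.
    equity = np.cumsum(profits)
    d1 = int(np.argmax(equity))            # index of the high point
    d2 = d1 + int(np.argmin(equity[d1:]))  # index of the subsequent low point
    return d1, equity[d1], d2, equity[d2]

d1, v1, d2, v2 = max_drawdown(df['Profit'].to_numpy())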
Cumulative profit I draw a plot of the cumulative profit over time and mark the high and low
points.
[76]: width = 800
height = 400
dpi = 100
plt.figure(figsize=(width/dpi, height/dpi))
plt.plot(np.cumsum(df['Profit']))
plt.ylabel("Profit (zł)")
plt.xlabel("Number of trade")
plt.scatter(d1, v1, c="green", label="Start of max drawdown")
plt.scatter(d2, v2, c="red", label="End of max drawdown")
plt.legend(loc='best')
plt.title('Maximum drawdown', fontsize=18)
plt.tight_layout()
plt.savefig('./img/drawdown.png')
plt.show()
Max drawdown duration I find that the duration of the max drawdown was:
[77]: drawdown_dur = df.loc[d2, "Open Datetime"] - df.loc[d1, "Open Datetime"]
str(drawdown_dur)
Expressed in days, this is:
[78]: 14.85
Max drawdown amount The difference between the high point in the equity curve and the low
point is:
[79]: v1 - v2
[79]: 243.01
2 Monte Carlo
2.0.1 Simulating trades
Assumptions:
* Future trades will be similar to the ones in the dataset.
* The algorithms' trade size is fixed at 0.01 lots.
The ‘Profit Per Lot’ column is sampled with replacement and summed up to simulate possible outcomes for the next 100 trades.
I shall answer the following questions:
1. What is the probability that over the next 100 trades the account will grow?
2. How much can I expect to lose in the worst case?
3. How much can I expect to gain in the best case?
Below is a visual to explain the process. Each line represents a different ‘future’. I am interested
in the distribution of these ‘futures’.
[80]: plt.figure(figsize=figsize)
np.random.seed(12346)
for _ in range(10):
    # loop body reconstructed (not visible in this export): plot one resampled
    # cumulative-profit path of 100 trades
    plt.plot(np.cumsum(np.random.choice(df['Profit Per Lot'], size=100)))
For each of the 100,000 simulations, 100 trades are sampled with replacement and their profits summed to get the final profit.
[81]: nrows = 10**5 # Number of simulations
ncols = 10**2 # Number of trades
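# Hypothetical reconstruction (the resampling step itself is not visible in
# this export): draw an (nrows, ncols) array of 'Profit Per Lot' values with
# replacement and sum each row to obtain the simulated outcomes X.
profits = df['Profit Per Lot'].to_numpy()
X = np.random.choice(profits, size=(nrows, ncols), replace=True).sum(axis=1)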
plt.figure(figsize=(width/dpi, height/dpi))
plt.plot(bins[1:], X_unity, c='black')
plt.fill_between(x=bins[1:][mask], y1=0, y2=X_unity[mask], alpha=1/2)
plt.ylabel('Likelihood')
plt.xlabel('Profit (zł)')
plt.text(-250, 0.005, s=f'{loss_prob}%', color='white', fontsize=24)
plt.ylim(0)
plt.xlim(np.min(bins[1:]), np.max(bins[1:]))
plt.title('Probability of loss over the next 100 trades.', fontsize=18)
plt.tight_layout()
plt.savefig('./img/probability_of_loss.png')
plt.show()
2.0.4 Probability of profit
From the simulated data I can calculate the probability of making money over the next 100 trades:
[84]: np.round(np.sum(X > 0) / X.shape[0] * 100, 2)
[84]: 56.63
[85]: 443.62
[86]: 1305.32
[87]: -306.93
The minimum profit in the simulation was:
[88]: np.round(np.min(X))
[88]: -773.0