Algorithmic Trading With Learning
Abstract
We propose a model where an algorithmic trader takes a view on the distribution of prices
at a future date and then decides how to trade in the direction of her predictions using the
optimal mix of market and limit orders. As time goes by, the trader learns from changes in
prices and updates her predictions to tweak her strategy. Compared to a trader who cannot
learn from market dynamics or form a view of the market, the algorithmic trader's profits are
higher and more certain. Even though the trader executes a strategy based on a directional
view, the sources of profits are both from making the spread as well as capital appreciation of
inventories. Higher volatility of prices considerably impairs the trader's ability to learn from
price innovations, but this adverse effect can be circumvented by learning from a collection of
assets that co-move. Finally, we provide a proof of convergence of the numerical scheme to the
viscosity solution of the dynamic programming equations which uses new results for systems
of PDEs.
Keywords: Algorithmic Trading; High Frequency Trading; Nonlinear Filtration; Brownian
Bridge; Stochastic Optimal Control; Adverse Selection
1. Introduction
Information is a valuable asset, but being better informed than others is money. Modern
trading algorithms employ unprecedented computer power to process information and make
trading decisions. In this paper we show how an informed trader (IT) who has conceived a view
of the market encoded in a prior on price changes, devises a strategy to trade in a single stock
or group of stocks to maximize expected profits. The strategy is rooted in the IT's superior
Let $S_t^i$ denote the midprice of asset $i$ ($i = 1, \dots, n$, $n \in \mathbb{N}$) at time $t \in [0, T]$ and assume
that
$$S_T^i = S_0^i + D^i, \qquad (1)$$
where $D^i$ is a random variable which encodes the IT's prior belief on the future midprice
distribution of asset $i$. We require that $D^i$ have finite second moment, and that $\mathbb{P}(D^i < 0) \in (0, 1)$
and $\mathbb{P}(D^i > 0) \in (0, 1)$, i.e., there is non-zero probability on both positive and
negative outcomes to avoid arbitrage since interest rates are assumed zero. Hence, the IT
may use whatever methods she has at her disposal to form a prior (such as statistical machine
learning methods, price co-integration models, limit order book information, etc.) and the
price dynamics we propose below will reflect the IT's belief. The structure of the probability
space on which we work will be described below.
t (t, T ), and since the IT has access only to the filtration $\mathcal{F}_t$ generated by the collection of
$S_t^i$, she cannot decompose $S_t^i$ into $D^i$ and $\beta_{tT}^i$ except at $T$, where $\beta_{TT}^i = 0$ and $S_T^i = S_0^i + D^i$.
To simplify notation we write the vector of midprices as $S_t = (S_t^1, \dots, S_t^n)$, Brownian bridges
as $\beta_{tT} = (\beta_{tT}^1, \dots, \beta_{tT}^n)$, and terminal midprice changes as $D = (D^1, \dots, D^n)$.
The algorithmic and high frequency trading literature commonly assumes the midprice
follows an arithmetic Brownian motion; this is a particular case of our model if in (2) we
assume that $D^i$ are normally distributed with mean zero and variance $(\sigma^i)^2 T$. Further, our choice
of the price process is related to Doob's h-transform of the underlying Brownian motions (see,
e.g., Rogers and Williams (1987)). A similar model for asset prices was adopted by Brody et al.
(2008) whereby a process analogous to (2) represented the information flow in the economy
and asset prices are found via the conditional expectation with respect to this information, i.e.
$\mathbb{E}[S_0^i + D^i \mid \mathcal{F}_t]$, rather than by (2) directly. In Brody et al. (2008), asset prices are martingales,
while here the midprices are not martingales. For other applications of randomized bridges in
information-based asset modeling see, e.g., Brody et al. (2009) and Macrina (2014). Moreover,
similar models with applications to utility maximization were studied earlier in Baudoin (2002),
Baudoin (2003), and Baudoin and Nguyen-Ngo (2004).
In the proposition below, also contained implicitly in Brody et al. (2008) and Baudoin
(2002), we show the stochastic differential equation (SDE) satisfied by $S_t^i$.
Footnote 2: This independence assumption can easily be lifted. We impose it since, as will be shown later on, the
joint distribution of the price changes induces a non-linear co-integration of asset prices while correlating the
Brownian bridges will induce only an instantaneous correlation in price changes.
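Proposition 1. The asset's midprice process $S_t^i$ given by (2) satisfies the SDE
$$dS_t^i = A^i(t, S_t)\,dt + \sigma^i\,dW_t^i, \qquad (4)$$
where $W_t^i$ are pairwise independent $\mathcal{F}_t$-Brownian motions, and $S_{t=0}^i = S_0^i$ for $1 \le i \le n$.
Moreover,
$$A^i(t, S) = \frac{a^i(t, S) - (S^i - S_0^i)}{T - t}, \qquad (5)$$
where
$$a^i(t, S) = \mathbb{E}[\,D^i \mid S_t = S\,] = \frac{\displaystyle\int_{\mathbb{R}^n} x_i \prod_{j=1}^n \exp\left( x_j\,\frac{S^j - S_0^j}{(\sigma^j)^2 (T - t)} - \frac{1}{2}\,x_j^2\,\frac{t}{T (\sigma^j)^2 (T - t)} \right) dF(x)}{\displaystyle\int_{\mathbb{R}^n} \prod_{j=1}^n \exp\left( x_j\,\frac{S^j - S_0^j}{(\sigma^j)^2 (T - t)} - \frac{1}{2}\,x_j^2\,\frac{t}{T (\sigma^j)^2 (T - t)} \right) dF(x)}. \qquad (6)$$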
This proposition shows how the trader's prior and her learning from information updates
are encoded in the drift of the process (4). For instance, a trader who believes that prices are
independent arithmetic Brownian motions behaves in the same way as one who uses independent
normals with zero mean and variances of $(\sigma^i)^2 T$ as his prior. Such a trader will never learn
from the dynamics of the asset prices and $A(t, S) = 0$. A second simple example is if $F$ is
a Dirac mass at a single point $d = (d^1, \dots, d^n)$. In this case, $a^i(t, S) = d^i$, and therefore, the price process
is a Brownian bridge connecting the initial price $S_0$ to the terminal price $S_0 + d$. In general,
the drift term $A(t, S)$ is non-linear and captures the (non-linear) co-integration relationship
between asset prices. Throughout this paper, learning by the trader refers to the updates of
the prior by observing asset prices, which are not necessarily restricted to the assets the trader
is taking positions in.
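As a rough numerical illustration of (5) and (6), the sketch below evaluates the posterior mean and the drift for a single asset when the prior $F$ is a discrete distribution. The discrete prior, the function names, and the parameter values are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import math

def posterior_mean(t, s, s0, T, sigma, points, probs):
    """a(t, S) = E[D | S_t = s] for a discrete prior on D (single asset)."""
    dev = s - s0
    # Exponent of (6) for each candidate terminal change x.
    logw = [x * dev / (sigma**2 * (T - t)) - 0.5 * x**2 * t / (T * sigma**2 * (T - t))
            for x in points]
    top = max(logw)  # subtract the maximum to avoid overflow in exp
    w = [p * math.exp(lw - top) for p, lw in zip(probs, logw)]
    return sum(x * wi for x, wi in zip(points, w)) / sum(w)

def drift(t, s, s0, T, sigma, points, probs):
    """A(t, S) = (a(t, S) - (S - S0)) / (T - t), valid for t < T."""
    return (posterior_mean(t, s, s0, T, sigma, points, probs) - (s - s0)) / (T - t)

# Hypothetical two-point prior: +0.02 with probability 0.8, -0.02 with probability 0.2.
points, probs = [0.02, -0.02], [0.8, 0.2]
prior_mean = posterior_mean(0.0, 1.0, 1.0, 1.1, 0.01, points, probs)
after_rise = posterior_mean(0.5, 1.01, 1.0, 1.1, 0.01, points, probs)
```

At $t = 0$ the posterior mean equals the prior mean, and a price rise shifts it towards the upper mass point. For a Dirac prior the posterior mean stays at the mass point for every price level, so the drift reduces to that of a Brownian bridge, in line with the second example above.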
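a modification of model (2) to bound the rate of growth. More specifically, in the notation of
Proposition 1, the agent assumes that the midprice $S_t^i$ satisfies
$$dS_t^i = \tilde{A}^i(t, S_t)\,dt + \sigma^i\,dW_t^i, \qquad (7)$$
where
$$\tilde{A}^i(t, S) := \min\left\{ \overline{A}^i, \max\left\{ \underline{A}^i, A^i(t, S) \right\} \right\}, \qquad -\infty < \underline{A}^i < \overline{A}^i < \infty, \quad i = 1, \dots, n. \qquad (8)$$
This modification will come into play in the proof of convergence of the numerical scheme in
section 6.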
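The truncation in (8) is a clamp of the drift between a lower and an upper bound. The sketch below applies it inside a plain Euler discretization of a one-asset price path. The Euler scheme, the function names, and the arguments (`drift_fn`, `a_min`, `a_max`) are illustrative assumptions and are not part of the paper.

```python
import math
import random

def truncate(a, a_min, a_max):
    """Clamp the drift a to the interval [a_min, a_max], as in (8)."""
    return min(a_max, max(a_min, a))

def euler_path(s0, sigma, horizon, n_steps, drift_fn, a_min, a_max, rng):
    """Euler path of dS = truncated drift dt + sigma dW on [0, horizon]."""
    dt = horizon / n_steps
    path = [s0]
    for j in range(n_steps):
        t = j * dt
        a = truncate(drift_fn(t, path[-1]), a_min, a_max)
        shock = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + a * dt + shock)
    return path

# With zero volatility and a raw drift of 10 clamped to 1, the price gains 1 over one unit of time.
demo = euler_path(1.0, 0.0, 1.0, 100, lambda t, s: 10.0, -1.0, 1.0, random.Random(0))
```

Bounding the drift keeps each increment of the discretized path controlled even when the untruncated drift grows without bound as $t$ approaches $T$.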
Innovations in midprices are used by the IT to update her prior and then to adjust her
strategy by deciding how to use market and limit orders to trade in and out of positions between
now and the end of the trading horizon $\overline{T} < T$. Market orders guarantee immediate execution but are more expensive because
the trader must pay the spread in addition to liquidity taking fees. On the other hand, limit
orders do not pay fees (in some markets they even earn rebates, see Cartea and Jaimungal
(2013)) but execution is not guaranteed: the trader must wait until an incoming market order
fills her limit order.
Here we assume that the IT executes a strategy in a subset $\{1, \dots, k\}$ of the assets ($k \le n$), but
nonetheless uses the intelligence she acquires from all $n$ assets to form her trading strategy.
To pose the optimal control problem we require notation to describe the IT's submitted limit
and market orders, cash process, accumulated inventory, and finally, the market orders sent
by other participants.
The IT posts passive orders only at the touch (best bid or ask), where $\ell_t^{\pm} = \{\ell_t^{1\pm}, \dots, \ell_t^{k\pm}\} \in \{0, 1\}^k$
denotes her decision to post a sell ($+$) or a buy ($-$) limit order for one unit of asset
at time $t$, with $\ell_t^{i\pm} = 0$ meaning that there is no post. We also assume that every time there
is an incoming market order, the IT's limit order is filled with probability $p = \{p^1, \dots, p^k\}$.
Moreover, $m_t^{\pm} = \{m_t^{1\pm}, \dots, m_t^{k\pm}\} \in \mathbb{Z}_+^k$ counts the total number of market orders sent by the
IT up until time $t$,
$$m_t^{i\pm} = \sum_{j} 1_{\{\tau_j^{i\pm} \le t\}}, \qquad i = 1, \dots, k, \qquad (9)$$
where $+$ indicates a sell ($-$ a buy), $1_{\{\cdot\}}$ is the indicator function, and $0 < \tau_1^{i\pm} < \tau_2^{i\pm} < \dots$ are
increasing sequences of $\mathcal{H}$-stopping times representing the arrival times of the IT's executed
market orders in asset $i$. The filtration $\mathcal{H}$ is the expanded filtration taking into account the
arrival of other agents' market orders which fill the IT's posted limit orders and the assets'
midprices. More specifically, let the counting processes $N_t^{\pm} = \{N_t^{1\pm}, \dots, N_t^{k\pm}\}$ represent the
total number of buy and sell market orders other participants have sent in the assets which
the IT trades. We assume these are Poisson processes (mutually independent and independent
of the midprice processes) with intensities $\lambda^{\pm} > 0$. Those market orders which fill the IT's
posted limit orders will be denoted $\bar{N}_t^{\pm} = \{\bar{N}_t^{1\pm}, \dots, \bar{N}_t^{k\pm}\}$ and, due to the thinning property
of Poisson processes, are also independent Poisson processes with intensities $\bar{\lambda}^{\pm} = \lambda^{\pm} \odot p$
(where $x \odot y$ represents componentwise multiplication). The filtration on which strategies are
adapted is given by $\mathcal{H}_t = \sigma\left( \{S_u, N_u^{\pm}, \bar{N}_u^{\pm}\}_{0 \le u \le t} \right)$.
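The thinning property used above can be checked by simulation: if market orders arrive as a Poisson process with intensity $\lambda$ and each one fills the posted limit order independently with probability $p$, the fills form a Poisson process with intensity $\lambda p$. The sketch below is a minimal illustration; the function name and the parameter values are assumptions, not the paper's code.

```python
import random

def thinned_arrivals(lam, p, horizon, rng):
    """Simulate arrival times on [0, horizon] and the subset that fills a posted order."""
    t, all_times, fills = 0.0, [], []
    while True:
        t += rng.expovariate(lam)  # exponential inter-arrival times give Poisson arrivals
        if t > horizon:
            break
        all_times.append(t)
        if rng.random() < p:       # independent thinning with probability p
            fills.append(t)
    return all_times, fills

# Average number of fills over many runs; it should be close to lam * p * horizon = 15.
rng = random.Random(0)
n_runs = 2000
mean_fills = sum(len(thinned_arrivals(30.0, 0.5, 1.0, rng)[1]) for _ in range(n_runs)) / n_runs
```

The same routine with $p = 1$ returns every arrival as a fill, which corresponds to the case where the posted order is always at the front of the queue.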
The number of the IT's filled limit orders in asset $i$, up to time $t$, is given by $\int_0^t \ell_u^{i\pm}\,d\bar{N}_u^{i\pm}$.
Therefore the IT's inventory in asset $i$ at time $t$ is given by
$$q_t^i = -\int_0^t \ell_u^{i+}\,d\bar{N}_u^{i+} + \int_0^t \ell_u^{i-}\,d\bar{N}_u^{i-} - m_t^{i+} + m_t^{i-}, \qquad q_0^i = 0, \qquad (10)$$
and we assume that the IT restricts her accumulated inventory position so that $\underline{q}^i \le q_t^i \le \overline{q}^i$
for all $t$. The strategy can build short positions, so $\underline{q}^i < 0$.
The execution prices that the IT achieves for trading one unit of the asset using market
orders in asset $i$ are $S_t^i - \frac{\Delta^i}{2} - \varepsilon^i$ for a sell, and $S_t^i + \frac{\Delta^i}{2} + \varepsilon^i$ for a buy, where $\varepsilon^i \ge 0$ represents
liquidity taking fees and $\Delta^i$ is the spread between the best ask and best bid, which we assume
to be constant.
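The IT's cash process $X_t$ satisfies the SDE
$$
\begin{aligned}
dX_t = \sum_{i=1}^k \Big\{ & -\left( S_t^i - \tfrac{\Delta^i}{2} \right) \ell_t^{i-}\, 1_{\{q_t^i < \overline{q}^i\}}\, d\bar{N}_t^{i-} + \left( S_t^i + \tfrac{\Delta^i}{2} \right) \ell_t^{i+}\, 1_{\{q_t^i > \underline{q}^i\}}\, d\bar{N}_t^{i+} \\
& - \left( S_t^i + \tfrac{\Delta^i}{2} + \varepsilon^i \right) 1_{\{q_t^i < \overline{q}^i\}}\, dm_t^{i-} + \left( S_t^i - \tfrac{\Delta^i}{2} - \varepsilon^i \right) 1_{\{q_t^i > \underline{q}^i\}}\, dm_t^{i+} \Big\},
\end{aligned}
\qquad (11)
$$
with $X_0 = 0$. The first line accounts for changes in cash due to limit orders being filled by
other market participants' incoming market orders and the second line accounts for changes
in cash due to the market orders that the IT submits.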
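The bookkeeping in (10) and (11) can be sketched for a single asset as follows. Limit orders trade at the touch (half a spread away from the midprice), market orders additionally pay the fee, and an order is ignored when it would breach the inventory bounds. The class and function names, and the numerical values in the example, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:
    cash: float = 0.0
    inventory: int = 0

def limit_fill(acc, mid, spread, side, q_min, q_max):
    """Apply the fill of one posted limit order; side is 'buy' or 'sell'."""
    if side == "buy" and acc.inventory < q_max:
        acc.cash -= mid - spread / 2        # bought at the best bid
        acc.inventory += 1
    elif side == "sell" and acc.inventory > q_min:
        acc.cash += mid + spread / 2        # sold at the best ask
        acc.inventory -= 1

def market_order(acc, mid, spread, fee, side, q_min, q_max):
    """Execute one market order; side is 'buy' or 'sell'."""
    if side == "buy" and acc.inventory < q_max:
        acc.cash -= mid + spread / 2 + fee  # lifts the ask and pays the fee
        acc.inventory += 1
    elif side == "sell" and acc.inventory > q_min:
        acc.cash += mid - spread / 2 - fee  # hits the bid and pays the fee
        acc.inventory -= 1

# A round trip with limit orders at an unchanged midprice earns the spread.
maker = Account()
limit_fill(maker, 1.0, 0.01, "buy", -20, 20)
limit_fill(maker, 1.0, 0.01, "sell", -20, 20)

# The same round trip with market orders costs the spread plus two fees.
taker = Account()
market_order(taker, 1.0, 0.01, 0.0005, "buy", -20, 20)
market_order(taker, 1.0, 0.01, 0.0005, "sell", -20, 20)
```

The two round trips make the asymmetry between the order types concrete: with these illustrative numbers the limit order round trip earns 0.01 while the market order round trip loses 0.011.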
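Next, we turn to the IT's optimization problem. She seeks the strategy $\nu = \{\ell_t^{\pm}, m_t^{\pm}\}_{0 \le t \le \overline{T}}$
which maximizes her expected cash while penalizing running inventory. Thus, the IT's performance
criterion following a given strategy $\nu$ is given by
$$H^{\nu}(t, X, S, q) = \mathbb{E}_{t,X,S,q}\left[ X_{\overline{T}} + \sum_{i=1}^k \left\{ \mathcal{L}^i\big(q_{\overline{T}}^i, S_{\overline{T}}^i\big) - \phi^i \int_t^{\overline{T}} (q_u^i)^2\,du \right\} \right], \qquad (12)$$
where $\mathcal{L}^i$ represents the terminal liquidation costs / value
$$\mathcal{L}^i(q, S) = q \left( S - \left( \tfrac{\Delta^i}{2} + \varepsilon^i \right) \mathrm{sgn}(q) - \alpha^i q \right). \qquad (13)$$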
As is usual, we seek Markov optimal controls (and under quite general assumptions, the optimum
is indeed Markov).
In the right-hand side of the performance criterion (12) there are three components. The
first is terminal cash. The second component captures the costs that the IT incurs at the
terminal date $\overline{T}$ when liquidating $q_{\overline{T}}$. The terminal inventory is unwound with a market order,
so it picks up the costs of crossing the spread, as well as any other costs including liquidity
taking fees and (linear) market impact, both of which are captured by the parameter $\alpha^i \ge 0$.
The third component is a running penalty that the IT imposes to control her exposure to
inventory risk, see Cartea and Jaimungal (2015b) and Cartea et al. (2013) where the authors
demonstrate that this type of running penalty can be viewed as arising from model uncertainty
(ambiguity aversion) in the drift of the asset's midprice process.
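A small sketch of the terminal liquidation value in (13) may help fix the signs. It assumes the reading in which a position $q$ is unwound at the midprice less half the spread and the fee (in the direction of the trade) and less a linear impact term proportional to $q$. The function names and the numbers below are illustrative assumptions.

```python
def liquidation_value(q, mid, spread, fee, alpha):
    """Value of unwinding q units at the terminal date, following (13)."""
    sgn = (q > 0) - (q < 0)
    return q * (mid - (spread / 2 + fee) * sgn - alpha * q)

def terminal_penalty(q, spread, fee, alpha):
    """Liquidation value in excess of marking the position at the midprice."""
    return -(spread / 2 + fee) * abs(q) - alpha * q**2

long_value = liquidation_value(2, 1.0, 0.01, 0.0005, 0.0)    # sells two units below the midprice
short_value = liquidation_value(-2, 1.0, 0.01, 0.0005, 0.0)  # buys two units above the midprice
```

Subtracting the mark-to-midprice value $qS$ from the liquidation value leaves a term that is never positive and depends only on $q$, which is the form taken by the terminal condition of the value function once cash and marked inventory are separated out.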
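$$
\begin{aligned}
0 = \max\Big\{\; & \partial_t g + \sum_{i=1}^{k} \Big[ \tfrac{1}{2} (\sigma^i)^2\, \partial_{S^i S^i} g + A^i(t, S)\,\big( \partial_{S^i} g + q^i \big) - \phi^i (q^i)^2 \Big] \\
& + \sum_{i=k+1}^{n} \Big[ \tfrac{1}{2} (\sigma^i)^2\, \partial_{S^i S^i} g + A^i(t, S)\, \partial_{S^i} g \Big] \\
& + \sum_{i=1}^{k} 1_{\{q^i < \overline{q}^i\}}\, \bar{\lambda}^{i-} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta^i}{2}\,\ell + g(t, S, q + e_i \ell) - g(t, S, q) \Big] \\
& + \sum_{i=1}^{k} 1_{\{q^i > \underline{q}^i\}}\, \bar{\lambda}^{i+} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta^i}{2}\,\ell + g(t, S, q - e_i \ell) - g(t, S, q) \Big] \;; \\
& \max_{\xi \in D(q)} \Big[ -\sum_{i=1}^{k} \big( \tfrac{\Delta^i}{2} + \varepsilon^i \big) |\xi^i| + g(t, S, q + \xi) - g(t, S, q) \Big] \Big\},
\end{aligned}
\qquad (16)
$$
where $e_i$ denotes the $i$-th unit vector and $\xi = (\xi^1, \dots, \xi^k)$ the vector of market order sizes,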
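and the QVI is subject to the terminal condition
$$g(\overline{T}, S, q) = -\sum_{i=1}^{k} \Big[ \big( \tfrac{\Delta^i}{2} + \varepsilon^i \big) |q^i| + \alpha^i (q^i)^2 \Big], \qquad \underline{q}^i \le q^i \le \overline{q}^i. \qquad (17)$$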
In (16) the first line represents the flow of asset midprices and the updates of the priors in
the assets in which the IT trades, while the second line represents the flow of asset midprices
and the updates of the priors in the other assets. The third and fourth lines represent the
change in the value function due to execution of the agent's posted limit orders. The last line
corresponds to the execution of market orders, and D(q) is the set of allowed market order
executions which respect the inventory limits imposed by the IT.
To solve the QVI numerically, and obtain a unique $g$, we impose a grid in $S$ with boundaries
$\underline{S}^i < S^i < \overline{S}^i$ and the additional boundary conditions
$$\partial_{S^i S^i}\, g(t, \dots, \underline{S}^i, \dots, q) = 0, \qquad \partial_{S^i S^i}\, g(t, \dots, \overline{S}^i, \dots, q) = 0, \qquad (18)$$
In this section we use simulations to analyze the behavior and financial performance of the
IT's strategy. The details of the numerical scheme and its convergence are provided in section
6. Here we focus on the numerical results and their financial interpretation. To show how
much being better informed is worth, we compare her strategy to that of two other types of
traders:
i) An uninformed trader who does not have the capability to form a prior or learn from price
innovations and therefore assumes that the midprice dynamics are given by an arithmetic
Brownian motion; we label him the uninformed trader (UT). In this case, the UT's strategy
becomes that of an investor who is providing liquidity similar to the market maker models
described in the extant literature, see for example Avellaneda and Stoikov (2008), and Chapter
10 in Cartea et al. (2015).
ii) The other is a trader who has an uninformed prior, but is able to learn from market
dynamics; we label him the uninformed learner (UL). His prior is symmetric around the
current midprice, but it is not necessarily independent normal with mean zero and variance
$(\sigma^i)^2 T$ as is the UT's. As a result, the UL is able to learn from the dynamics of the midprice.
To gain insights into the performance of the traders' strategies, we first show results when
they trade and learn from only one asset, and then show results for $n = 2$ where traders learn
from two assets and take positions in only one of them.
where $\varphi(x, \mu, \varsigma)$ is a Normal pdf with mean $\mu$ and standard deviation $\varsigma$, $0 \le S_0^1 + \delta_d \le S_0^1 + \delta_u$
and $p_u + p_d = 1$. More specifically, for the numerical experiments below we choose
• IT's prior is that the pdf of $D^1$ is as in (19) with $\delta_u = 0.02$ and $\delta_d = -0.02$, with
probabilities $p_u = 0.8$ and $p_d = 0.2$, $\varsigma_u = \varsigma_d = 7 \times 10^{-2}$,
• UL's prior is that the pdf of $D^1$ is as in (19) with $\delta_u = 0.02$ and $\delta_d = -0.02$, with
probabilities $p_u = 0.5$ and $p_d = 0.5$, $\varsigma_u = \varsigma_d = 7 \times 10^{-2}$, and finally
• UT believes that $D^1 \sim N\big(0, (\sigma^1)^2 T\big)$. This choice results in the UT assuming the
midprice is a Brownian motion with volatility $\sigma^1$ so that $S_t^1 = S_0^1 + \sigma^1 W_t^1$.
In the simulations, we assume that the midprice follows the randomized Brownian bridge (rBb) (2) where at time $T$ the
asset's midprice will increase to $S_0^1 + \delta_u$, where $S_0^1 = 1$ and $\delta_u = 0.02$ (with probability 1), and
$\sigma^1 = 0.01$. We assume that all three traders know the correct $\sigma^1$ parameter and we see that in
this set-up, although the IT and UL learn from market dynamics using the same mechanism,
the IT has a more accurate prior than that of the UL. Market orders from other participants
which fill the traders' posted limit orders arrive with intensities $\bar{\lambda}^{\pm} = 30$. In the simulations the
actions of the traders are small enough not to alter the dynamics of the limit order book or the
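midprice. Finally, the other model parameters are: $\overline{T} = 1$, $T = 1.1$, $\Delta^1 = 10^{-2}$, $\varepsilon^1 = \Delta^1/20$,
$\phi^1 = 5 \times 10^{-3}$, $\overline{q}^1 = 20$, $\underline{q}^1 = -20$, and $\alpha^1 = 0$.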
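To make the simulation set-up concrete, the sketch below generates one midprice path that starts at 1 and is pinned to 1.02 at time 1.1 with volatility 0.01. It assumes the randomized Brownian bridge takes the form $S_t = S_0 + \sigma \beta_{tT} + (t/T) D$, with $\beta_{tT} = W_t - (t/T) W_T$ a standard Brownian bridge; this form, the function name, and the grid size are assumptions made for illustration.

```python
import math
import random

def simulate_rbb(s0, d, sigma, T, n_steps, rng):
    """One path of S_t = s0 + sigma * bridge_t + (t / T) * d on a uniform grid of [0, T]."""
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))  # Brownian motion W_t
    times = [j * dt for j in range(n_steps + 1)]
    # Brownian bridge: W_t - (t / T) * W_T vanishes at both ends of [0, T].
    path = [s0 + sigma * (w[j] - times[j] / T * w[-1]) + times[j] / T * d
            for j in range(n_steps + 1)]
    return times, path

times, path = simulate_rbb(1.0, 0.02, 0.01, 1.1, 1100, random.Random(1))
```

A strategy that closes at time 1 only sees the first part of such a path, so the terminal level 1.02 is approached but not yet reached when trading stops.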
Figure 1: The IT trading strategy. Maroon: execute a buy market order. Orange: execute a sell market order. Cyan:
post a buy limit order. Blue: post a sell limit order. Yellow: post both buy and sell limit orders. Dark Blue:
do not post any orders. Note: near maturity t = 1, there is a dark blue sliver when q = 0, a maroon sliver
when q = -2 and an orange sliver when q = 2.
Figure 1 shows the optimal strategy the IT uses to trade in the asset at all points in time
for a range of prices and for three inventory levels. As the agent moves from a short position to
a long position, the buy (sell) regions both for sending market orders and for posting limit
orders shrink (expand). There is a clear asymmetry in the strategy due to the IT having a
weight of 0.8 placed on the asset price increasing to $1.02. From the starting asset price of
$1.00, however, the agent does not immediately execute buy market orders, rather there is a
thin region around which the agent posts either buy limit orders only or both buy and sell
limit orders. Only when the price moves slightly upwards does she execute a buy market order.
On the other hand, if the price moves down sufficiently she cancels the buy limit order and
posts only sell limit orders. If the price moves even further down quickly, then (when q = 0 or
2) she executes a sell market order. If prices move upwards sufficiently quickly, the agent also
executes sell market orders since she has the view that the price cannot end higher than $1.02.
For comparison purposes, we also show in Figure 2 the optimal strategy when the agent
only posts limit orders. When q = 0, the regions where the agent posts only buy or sell limit
orders shrink, and the region where she posts both types of orders is expanded. The reason is
that she is not able to fully unwind using market orders, and is therefore more conservative on
taking directional bets. When q = +2 (q = -2) the sell (buy) limit order region expands to
increase the likelihood that she unwinds her position by the end of the trading horizon.
Figure 2: The IT is posting limit orders only (cannot execute market orders). Cyan: post a buy limit order.
Blue: post a sell limit order. Yellow: post both buy and sell limit orders. Dark Blue: do not post any limit
orders. Note: near maturity t = 1, there is a dark blue sliver when q = 0.

Figure 3 shows the strategies of the three traders for the same midprice path. On the left-hand
y-axis we show the midprice and on the right-hand y-axis we show accumulated inventory;
the x-axis records the time from the start of the strategy up until time $\overline{T} = 1$. In the figure,
stars denote the arrival of other participants' market orders, the solid circles denote the trader's
buy market orders, and the empty circles denote the trader's sell market orders. The green
(red) line shows the times when the trader is posting buy (sell) limit orders. Furthermore, we
see that the midprice is the path that starts at $S_0^1 = 1$ and ends at $S_{1.1}^1 = 1.02$ so that at $t = 1$
the midprice is very close to 1.02. Around the midprice path we see the trader's limit orders,
which are posted at the best bid and best ask, $\frac{1}{2}\Delta^1$ away from the midprice.
In panel (a) we see the strategy followed by the IT. Clearly, at the beginning of the strategy
the IT submits a number of market buy orders, as well as predominantly submitting one-sided
buy limit orders, to build up inventories because her prior is that the midprice will increase. As
time evolves and the IT has built a long inventory, the strategy stops relying on market orders
and concentrates on two-sided limit orders to control inventories whilst earning the spread.
Nearer the end of the strategy, the IT stops posting limit orders on the buy side of the book
and unwinds inventories using both limit and market sell orders.
In contrast, we see the strategy adopted by the UT. Panel (b) shows how the UT's inability
to learn stops him from behaving as aggressively as the IT. At no point does the UT use market
orders to build or unwind inventories. In fact, the UT devises a strategy where inventories
mean revert to zero very quickly to avoid exposure to inventory risk. Finally, the strategy
followed by the UL is somewhere in between that of the IT and UT, see panel (c).
The strategy of both the IT and UL is distinct from that of a market maker, who is expected
to continuously post two-sided quotes (not necessarily at the touch). Even though between the
start and final date of the strategy the IT and UL post on both sides of the limit order book,
this is a result of their information updates which induce the strategy to control inventory
exposure. The UT's strategy is closer to that of a market maker because he can only make
money from the spread since he has no directional view on the market.
Figure 3: Sample path of midprice (blue), inventories (thin green), and trading strategy (thick green and red
lines for when limit orders are posted, and green and red circles when market orders are executed).
Panels: (a) IT, (b) UT, (c) UL. Each panel plots the asset price (S) and the inventory (Q) against time (t).

The mix between limit and market orders used by the traders depends on how confident
they are about their prior as well as their ability to extract information from midprice dynamics.
Even though in the simulations we assume that the midprice ends at time $T = 1.1$, and the
strategy closes at $\overline{T} = 1$, the Brownian bridge adds noise to the path and the IT and UL need
to update their prior and decide whether to send market orders and/or to post one-sided or
two-sided limit orders and wait until these are canceled or filled. In Figure 4 we show the
average number, for $10^4$ midprice paths, of market and filled limit orders for the three traders.
To calculate the average number of orders we divide the time interval [0, 1] into 20 subintervals,
and compute within each subinterval the average number of filled limit orders and the average
number of submitted market orders. If a limit order is posted but not filled, then it is not
taken into account in these averages.
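The averaging described above amounts to binning event times and dividing by the number of simulated paths. A minimal sketch, assuming each path is summarized by a list of the times at which orders of a given type were filled or submitted (the input format and the function name are hypothetical):

```python
def average_counts(event_times_per_path, horizon=1.0, n_bins=20):
    """Average number of events per path in each of n_bins equal subintervals of [0, horizon]."""
    counts = [0] * n_bins
    for times in event_times_per_path:
        for t in times:
            if 0.0 <= t <= horizon:
                # An event exactly at the horizon is assigned to the last subinterval.
                k = min(int(t / horizon * n_bins), n_bins - 1)
                counts[k] += 1
    n_paths = len(event_times_per_path)
    return [c / n_paths for c in counts]

# Two toy paths: three events on the first, one on the second.
profile = average_counts([[0.01, 0.02, 0.99], [0.03]])
```

Running this separately for filled buy limit orders, filled sell limit orders, buy market orders, and sell market orders would give the four series shown per trader.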
The figure shows that the strategy used by the traders is not only based on a directional
view of the market trend (as shown by the market orders they employ), but it also executes
round-trip trades during the life of the strategy. It is clear that the lesser the quality of the
information used by the trader, the closer the strategy to that of a market maker who aims at
making the spread and not to bet on any particular direction the asset price might take, see
panel (b).
Figure 4: Average number of posted market orders (circles) and filled limit orders (dash). Horizontal axis:
# of time interval (0 to 20). Series shown in each panel: l.o. buy, l.o. sell, m.o. buy, m.o. sell.
What is the incremental value from being better informed? In Figure 5 we show the
risk-reward profiles for the three traders for different values of the maximum and minimum
inventory that they are willing to bear (panel (a)), and for different values of the running
inventory penalty (panel (b)). Clearly, the IT outperforms the other two traders in both the
mean and the standard deviation of her strategy's profit and loss (P&L). The financial gains
from this superiority are intensified if the traders are allowed to build larger inventory positions,
but we recall that there is a limit to these gains since we are assuming that the traders' strategies
are small enough not to move the market (see footnote 3). Furthermore, as panel (b) shows, if the running
inventory penalty is increased, the gains from better information decrease because the higher
the penalty, the more conservative is the strategy as a result of exerting tighter controls on the
inventory position, see Cartea and Jaimungal (2015b).
Figure 5: Risk-return profiles for different constraints on inventory and levels of the running inventory penalty.
Both panels plot the mean P&L against the standard deviation of P&L for the IT, UL, and UT; annotations mark the
directions in which the bounds on inventory and the running penalty are increasing.
Footnote 3: One issue we do not discuss here is the costs associated with acquiring information. Indeed, traders must
take into account the costs of processing and acquiring information and compare these to the profits that this
information generates.
Figure 6: Squares: trading in high volatility asset and learning from both high and low volatility assets.
Triangles: trading in high volatility asset without learning. Diamonds: trading in low volatility asset
without learning. Exes: trading in low volatility asset and learning from both high and low volatility assets.
The vertical axis shows the mean P&L; an annotation marks the direction in which the running penalty is increasing.
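where $\varphi(x_1, x_2; \mu_1, \mu_2, \varsigma_1, \varsigma_2)$ denotes the (independent) bi-variate normal pdf with mean $(\mu_1, \mu_2)$
and standard deviation $(\varsigma_1, \varsigma_2)$, and $p_{dd}, p_{du}, p_{ud}, p_{uu}$ and $\delta_d^1, \delta_u^1, \delta_d^2, \delta_u^2$ are provided in Table
1. We further assume $\overline{T} = 1$, $T = 1.1$, $\Delta^{1,2} = 10^{-2}$, $\varepsilon^{1,2} = \Delta^{1,2}/20$, $\phi^{1,2} = 5 \times 10^{-3}$, $\overline{q}^{1,2} = 20$,
$\underline{q}^{1,2} = -20$, $\varsigma^1 = \varsigma^2 = 7 \times 10^{-2}$ and $\alpha^{1,2} = 0$.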
                                       D^1
                          δ_d^1 = -0.03    δ_u^1 = +0.03
D^2    δ_d^2 = -0.02          0.45             0.05
       δ_u^2 = +0.02          0.05             0.45
If the IT focuses only on one asset, then the probabilities assigned to the asset ending at
$S_0^i + D^i$ are $p_u = p_d = 0.5$ for $i = 1, 2$. However, if the IT is able to learn from the dynamics of
both assets then she will be able to make better predictions of the midprice at a future date.
To show the incremental gains due to learning from both assets and trading in one we compare
the P&L resulting from strategies where the IT learns from one or two assets and trades in
one. Figure 6 shows the risk-reward profile for the four strategies for different values of the
running inventory penalty. Clearly, learning from both assets increases the profitability of the
strategy.

Figure 7: Average number of posted market orders (circles) and filled limit orders (dash) when trader learns
from both assets. Horizontal axis: # of time interval (0 to 20). Series shown in each panel: l.o. buy, l.o. sell,
m.o. buy, m.o. sell.
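The gain from watching a second asset can be illustrated with the joint probabilities in the table above. The sketch below replaces the normal mixture components by point masses at the four scenarios, which is a simplification, and uses hypothetical volatilities (0.02 and 0.01) and hypothetical observed price moves. It computes the posterior scenario probabilities with the same exponential weights that appear in the conditional expectation of the terminal price change.

```python
import math

def scenario_posterior(t, dev, T, sigmas, scenarios, probs):
    """Posterior probabilities of joint scenarios x given observed moves dev = S_t - S_0."""
    logw = []
    for x in scenarios:
        lw = 0.0
        for xj, dj, sj in zip(x, dev, sigmas):
            lw += xj * dj / (sj**2 * (T - t)) - 0.5 * xj**2 * t / (T * sj**2 * (T - t))
        logw.append(lw)
    top = max(logw)  # stabilize the exponentials
    w = [p * math.exp(lw - top) for p, lw in zip(probs, logw)]
    total = sum(w)
    return [wi / total for wi in w]

# Scenarios (D^1, D^2) with the joint probabilities of the table.
scenarios = [(-0.03, -0.02), (0.03, -0.02), (-0.03, 0.02), (0.03, 0.02)]
probs = [0.45, 0.05, 0.05, 0.45]
sigmas = (0.02, 0.01)  # hypothetical volatilities of the two assets

prior = scenario_posterior(0.0, (0.0, 0.0), 1.1, sigmas, scenarios, probs)
post = scenario_posterior(0.5, (0.0, 0.01), 1.1, sigmas, scenarios, probs)
p_up_prior = prior[1] + prior[3]  # probability that asset 1 ends up
p_up_post = post[1] + post[3]     # same probability after asset 2 has risen
```

In this toy setting asset 1 has not moved at all, yet the rise in asset 2 lifts the probability that asset 1 ends up from 0.5 to roughly 0.9, because the two terminal changes are strongly dependent under the prior.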
Moreover, Figure 7 shows the average number of market and filled limit orders for $10^4$
midprice paths when the trader takes a position in one asset but learns from both. Panel (a)
(panel (b)) shows the average orders when trading in the high (low) volatility asset. The figure
shows that the strategy for both assets relies heavily on making the spread using two-sided
limit orders with a slight tilt to more buy limit orders at the beginning because the prediction
is that the asset price will increase.
Thus far we have not incorporated the impact that other traders' actions have on the
midprice process. It is well known that when market orders arrive they tend to push prices
in the direction of trades and this feedback effect should be taken into account (see, e.g.,
Lipton et al. (2013), Bechler and Ludkovski (2014), Cartea and Jaimungal (2014), Cartea
et al. (2015)). Such actions can result in the IT being at least locally adversely selected: for example,
after a limit order purchase, the price goes down because the IT's limit order was matched
by a very fast informed trader who makes profits from ultra-fast short-term predictions. To
account for such effects we consider the following variant of the model and assume that prices
satisfy the SDE
$$dS_t^i = A^i(t, S_t)\,dt + \sigma^i\,dW_t^i + \kappa^i \big( dN_t^{i+} - dN_t^{i-} \big), \qquad (21)$$
where $\kappa^i \ge 0$ are price impact constants (these can be made random with little difficulty), and
recall that $N_t^{\pm}$ are Poisson processes counting the number of buy/sell market orders posted
by the other market participants up to time $t$. Here, for simplicity, we assume that the IT's
posted limit orders are filled with probability one ($p^i = 1$) if a market order arrives while they
are posted. It is not too difficult to lift this assumption, but in the interest of keeping the
discussion concise we assume that the limit orders are always at the front of the queue. Such
a model is similar in spirit to the one adopted in Cartea and Jaimungal (2015b) where the
authors show that market makers who can post anywhere in the limit order book modify their
strategies by posting deeper into the book to account for adverse selection costs.

Figure 8: Sample path of midprice (blue), inventories (dark green), and trading strategy (light green and red
lines for when limit orders are posted, and green and red circles when market orders are executed) in the
presence of adverse selection. The plot shows the asset price and the inventory (Q) against time (t).
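The IT still seeks the value function (14) and the ansatz $H(t, X, S, q) = X + \sum_{i=1}^k q^i S^i + \dots$
$$
\begin{aligned}
\cdots\; & + \sum_{i=k+1}^{n} \Big[ \tfrac{1}{2} (\sigma^i)^2\, \partial_{S^i S^i} g + A^i(t, S)\, \partial_{S^i} g \Big] \\
& + \sum_{i=1}^{k} 1_{\{q^i < \overline{q}^i\}}\, \lambda^{i-} \max_{\ell \in \{0,1\}} \Big[ \big( \tfrac{\Delta^i}{2} - \kappa^i \big) \ell + g(t, S, q + e_i \ell) - g(t, S, q) \Big] \\
& + \sum_{i=1}^{k} 1_{\{q^i > \underline{q}^i\}}\, \lambda^{i+} \max_{\ell \in \{0,1\}} \Big[ \big( \tfrac{\Delta^i}{2} - \kappa^i \big) \ell + g(t, S, q - e_i \ell) - g(t, S, q) \Big] \;; \\
& \max_{\xi \in D(q)} \Big[ -\sum_{i=1}^{k} \big( \tfrac{\Delta^i}{2} + \varepsilon^i \big) |\xi^i| + g(t, S, q + \xi) - g(t, S, q) \Big] \Big\}.
\end{aligned}
\qquad (22)
$$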
The key change is that in lines three and four the gain the IT makes from her posted limit
order being filled (relative to the midprice) decreases by the price impact $\kappa^i$ to $\frac{\Delta^i}{2} - \kappa^i$.
Figure 8 shows actions of the IT when other participants' market orders affect midprices as
in (21). The price impact parameter is $\kappa^1 = 3 \times 10^{-3}$. Notice that in this case the IT initially
posts limit orders symmetrically (in contrast to when there was no market impact, compare
with Figure 3(a)) because the presence of adverse selection makes the IT wait until the price
is pushed sufficiently upwards (in this case by the arrival of two market buy orders which have
positive price impact) before modifying her strategy to become directional.
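The price dynamics with impact in (21) can be sketched with one Euler step in which each arriving buy (sell) market order moves the midprice up (down) by the impact constant. The Bernoulli approximation of the Poisson arrivals, the function names, and the spread value 0.01 used in the example are assumptions for illustration.

```python
import math
import random

def step_with_impact(s, drift, sigma, kappa, lam_buy, lam_sell, dt, rng):
    """One Euler step of the midprice with price impact; returns (new_price, n_buy, n_sell)."""
    # For a short step, at most one arrival per side (requires lam * dt <= 1).
    n_buy = 1 if rng.random() < lam_buy * dt else 0
    n_sell = 1 if rng.random() < lam_sell * dt else 0
    ds = (drift * dt
          + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
          + kappa * (n_buy - n_sell))
    return s + ds, n_buy, n_sell

def limit_order_edge(spread, kappa):
    """Gain of a filled limit order relative to the midprice after the impact of the filling order."""
    return spread / 2 - kappa

# With no drift or noise and a certain buy arrival, the price moves up by exactly kappa.
pushed, n_buy, n_sell = step_with_impact(1.0, 0.0, 0.0, 0.003, 100.0, 0.0, 0.01, random.Random(0))
edge = limit_order_edge(0.01, 0.003)
```

With these illustrative numbers the edge of a filled limit order shrinks from 0.005 to 0.002, which is why posting becomes less attractive when fills are adversely selected.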
6. Convergence Results
We now address the problem of finding numerically the optimal trading strategy when the
asset midprices satisfy (7). The trader's inventory $q_t$, cash $X_t$, and value function $H(t, X, S, q)$
are defined as in Section 3 but with $S_t = (S_t^1, \dots, S_t^n)$ being the solution of (7). Then $H$ admits
the representation
$$H(t, X, S, q) = X + \sum_{i=1}^{k} q^i S^i + g(t, S, q) \qquad (23)$$
$$\cdots\; \max_{\xi \in D(q)} \Big[ -\sum_{i=1}^{k} \big( \tfrac{\Delta^i}{2} + \varepsilon^i \big) |\xi^i| + g(t, S, q + \xi) - (1 + \rho)\, g(t, S, q) \Big] \Big\}, \qquad (24)$$
$$g(\overline{T}, S, q) = -\sum_{i=1}^{k} \Big[ \big( \tfrac{\Delta^i}{2} + \varepsilon^i \big) |q^i| + \alpha^i (q^i)^2 \Big], \qquad \underline{q}^i \le q^i \le \overline{q}^i, \qquad (25)$$
Below we introduce a finite difference scheme that produces discrete approximations to the
viscosity solution $g$ of (16) on $[0, \overline{T}] \times \mathbb{R}^n$. In order to prove that it converges, we use the result
of Briani et al. (2012), an extension of the classical framework of Barles and Souganidis (1991)
to systems of PDEs satisfying certain monotonicity assumptions.
6.1. Preliminaries
To simplify notation, in what follows we restrict ourselves to the case of a single asset,
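i.e. $n = 1$. The general case of $n \ge 2$ assets is treated similarly.
Consider the system of equations
$$F_q\big( \tau, S, u, \partial_\tau u, \partial_S u, \partial_{SS} u \big) = 0, \qquad (26)$$
$$u_q(0, S) = \big( \tfrac{\Delta}{2} + \varepsilon \big) |q| + \alpha\, q^2, \qquad \underline{q} \le q \le \overline{q},$$
where the vectors $u = (u_q)$, $\partial_\tau u = (\partial_\tau u_q)$, $\partial_S u = (\partial_S u_q)$, $\partial_{SS} u = (\partial_{SS} u_q)$, and the function
$$
\begin{aligned}
F_q(\tau, S, u, c, p, X) = \max\Big\{\; & c_q - \tfrac{1}{2} \sigma^2 X_q + A(\overline{T} - \tau, S)\,(q - p_q) - \phi\, q^2 \\
& + 1_{\{q < \overline{q}\}}\, \bar{\lambda}^{-} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta}{2}\,\ell + (1 + \rho)\, u_q - u_{q + \ell} \Big] \\
& + 1_{\{q > \underline{q}\}}\, \bar{\lambda}^{+} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta}{2}\,\ell + (1 + \rho)\, u_q - u_{q - \ell} \Big] \;; \\
& \max_{\xi \in \{ -1_{\{q > \underline{q}\}},\, 0,\, 1_{\{q < \overline{q}\}} \}} \Big[ -\big( \tfrac{\Delta}{2} + \varepsilon \big) |\xi| + (1 + \rho)\, u_q - u_{q + \xi} \Big] \Big\},
\end{aligned}
$$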
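(A1) Let $u = (u_q)$, $v = (v_q) \in \mathbb{R}^n$. Suppose that $u_{q_0} - v_{q_0} = \max_q \{u_q - v_q\} \ge 0$. Then
$$F_{q_0}(\tau, S, u, c, p, X) - F_{q_0}(\tau, S, v, c, p, X) \ge \rho\,(u_{q_0} - v_{q_0}).$$
(A2) There exists $\gamma > 1$ and a continuous function $\omega : \mathbb{R}_+ \to \mathbb{R}_+$, $\omega(0) = 0$, such that for any
$X, X' \in \mathbb{R}$ satisfying (in the sense of quadratic forms)
$$-3\gamma \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \le \begin{pmatrix} X & 0 \\ 0 & -X' \end{pmatrix} \le 3\gamma \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix},$$
we have
$$F_q\big( \tau, S, u, \gamma(\tau' - \tau), \gamma(S' - S), X \big) - F_q\big( \tau', S', u, \gamma(\tau' - \tau), \gamma(S' - S), X' \big) \le \omega\Big( \gamma (\tau' - \tau)^2 + \gamma (S' - S)^2 + \frac{1}{\gamma} \Big)$$
for all $q$.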
6.2. Finite difference scheme
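We define a finite difference scheme $F^{\delta,h} = (F_q^{\delta,h}) : [0, \overline{T}] \times \mathbb{R} \times \mathbb{R}^m \times C([0, \overline{T}] \times \mathbb{R}) \to \mathbb{R}^m$
by the formula ($r = (r_q) \in \mathbb{R}^m$, $\varphi \in C([0, \overline{T}] \times \mathbb{R})$, where $m = \overline{q} - \underline{q} + 1$)
$$
\begin{aligned}
F_q^{\delta,h}(\tau, S, r, \varphi) := \max\Big\{\; & \frac{r_q - \varphi(\tau - \delta, S)}{\delta} - \tfrac{1}{2} \sigma^2\, \frac{\varphi(\tau, S + h) - 2 r_q + \varphi(\tau, S - h)}{h^2} + A(\overline{T} - \tau, S)\, q - \phi\, q^2 \\
& - A^{+}(\overline{T} - \tau, S)\, \frac{\varphi(\tau, S + h) - r_q}{h} + A^{-}(\overline{T} - \tau, S)\, \frac{r_q - \varphi(\tau, S - h)}{h} \\
& + 1_{\{q < \overline{q}\}}\, \bar{\lambda}^{-} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta}{2}\,\ell + (1 + \rho)\, r_q - r_{q + \ell} \Big] \\
& + 1_{\{q > \underline{q}\}}\, \bar{\lambda}^{+} \max_{\ell \in \{0,1\}} \Big[ \tfrac{\Delta}{2}\,\ell + (1 + \rho)\, r_q - r_{q - \ell} \Big] \;; \\
& \max_{\xi \in \{ -1_{\{q > \underline{q}\}},\, 0,\, 1_{\{q < \overline{q}\}} \}} \Big[ -\big( \tfrac{\Delta}{2} + \varepsilon \big) |\xi| + (1 + \rho)\, r_q - r_{q + \xi} \Big] \Big\},
\end{aligned}
$$
where $\delta$ and $h$ denote the time and space steps and $A^{\pm}$ the positive and negative parts of $A$.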
(C1) For all bounded functions u = (uq ), v = (vq ) with uq , vq C([0, T ] R) and uq 6 vq on
[0, T ] R, and any r, s Rn such that ( =) rq sq = maxq {rq sq } > 0 we have
Fq,h
(, S, r, uq + ) Fq,h
(, S, s, vq ) > .
(C2) For any bounded function C([0, T ] R), fixed (, S) [0, T ] R and r = (rq ) Rn ,
the functions
S 7 Fq,h (, S, r, ) , q 6 q 6 q,
are bounded and continuous, and the functions
r 7 Fq,h (, S, r, ) , q 6 q 6 q,
(C3) For every = (q ), q C([0, T ] Rn ) C 1,1 ((0, T ) R), (, S) [0, T ] R and > 0
there exist h, > 0 such that
(C4) For any $\delta, h > 0$ there exists a bounded solution $u^{\delta,h} = (u_q^{\delta,h})$, $u_q^{\delta,h} : \mathcal{T}_\delta \times \mathcal{S}_h \to \mathbb{R}$, of the
discrete problem (27).
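The role of the upwind differences in the scheme can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it evaluates only the continuation (no-jump, no-market-order) part of a scheme of this type at a single grid node and checks that it is non-increasing in the neighbouring grid values and increasing in the nodal value. The function name, the argument names, and the convention that the drift is split into its positive part `max(A, 0)` and negative part `max(-A, 0)` are assumptions made for illustration.

```python
def upwind_continuation(r_q, psi_prev, psi_up, psi_down, drift, q,
                        dt, h, sigma, phi):
    """Continuation part of an implicit upwind scheme at one node (illustrative).

    r_q      : unknown value at the node (tau, S)
    psi_prev : value at (tau - dt, S)
    psi_up   : value at (tau, S + h)
    psi_down : value at (tau, S - h)
    drift    : value of the drift coefficient A at the node (assumed given)
    """
    a_plus = max(drift, 0.0)    # assumed positive part of A
    a_minus = max(-drift, 0.0)  # assumed negative part of A
    time_diff = (r_q - psi_prev) / dt
    diffusion = -0.5 * sigma ** 2 * (psi_up - 2.0 * r_q + psi_down) / h ** 2
    running = drift * q - phi * q ** 2
    # Forward difference when the drift is positive, backward when negative.
    advection = -a_plus * (psi_up - r_q) / h + a_minus * (r_q - psi_down) / h
    return time_diff + diffusion + running + advection


base = dict(r_q=1.0, psi_prev=0.9, psi_up=1.2, psi_down=0.7,
            drift=0.5, q=2, dt=0.01, h=0.1, sigma=0.2, phi=0.01)
value = upwind_continuation(**base)
```

Raising any of `psi_prev`, `psi_up` or `psi_down` cannot raise the returned value, whatever the sign of `drift`; this is the discrete monotonicity that a condition such as (C1) relies on.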
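Armed with these results, we define the candidate super- and sub-solutions of (26) as
\[
\underline{u}_q(\tau, S) = \liminf_{\delta,\, h \to +0,\; (\tau', S') \to (\tau, S)} u_q^{\delta,h}(\tau', S') ,
\]
\[
\overline{u}_q(\tau, S) = \limsup_{\delta,\, h \to +0,\; (\tau', S') \to (\tau, S)} u_q^{\delta,h}(\tau', S') ,
\]
for $(\tau, S) \in [0,T]\times\mathbb{R}$, $\underline{q} \le q \le \overline{q}$.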
Propositions 3 and 4 allow us to apply the argument of Proposition 3.3 of Briani et al.
(2012): namely, $\underline{u} = (\underline{u}_q)$ and $\overline{u} = (\overline{u}_q)$ are indeed, respectively, a supersolution and a subsolution
of (26), and hence by the comparison principle (cf. remark before Proposition 3) $\overline{u}_q \le \underline{u}_q$ on
$[0,T]\times\mathbb{R}$. The opposite inequality holds by the definition of the lower and upper limits, and hence $u := \underline{u} = \overline{u}$ is the viscosity
solution of (26). The constructed solution is bounded, since a comparison principle holds and
(24)–(25) admits constant sub- and supersolutions; see Proposition 6.1 in the
Appendix of Briani et al. (2012).
7. Conclusions
In this paper we show how an algorithmic trader learns from market-wide price dynamics to
trade around directional predictions of the price of an asset or collection of assets. The profits
of the strategy come from both capital appreciation of inventories and earning the spread from
round-trip trades. When the quality of the information is impaired by the volatility of prices,
the strategy relies less on betting on the direction of the asset and relies more on two-sided
quotes to earn the spread. In extreme cases, where the trader is uninformed or the quality of
the information she gathers is too poor, the strategy behaves like that of a market maker who
is willing to bear little inventory risk.
We also show how learning from other assets that co-move increases the profits of the
trading strategies. This increase in profits is noticeable when the trader takes positions in an
asset which exhibits high volatility but improves the quality of the information she gathers
by learning from the dynamics of other assets that co-move. We also show how the trader's
strategy is modified to account for adverse selection costs.
Finally, we show that the numerical scheme does indeed converge to the unique viscosity
solution of the QVI that arises from the dynamic programming principle.
Appendix A. Proofs
where $F(x \mid S_t = y)$ is the joint conditional cdf of the random variables $D$. Using Bayes' formula,
\[
F(x \mid S_t = y) = \frac{\int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_{S_t}(y \mid D = z)\, dF(z)}{\int_{\mathbb{R}^n} p_{S_t}(y \mid D = z)\, dF(z)} ,
\]
where $p_{S_t}(y \mid D = z)$ is the joint conditional probability density function (pdf) of $S_t$. Conditional
on $D = z$, $S_t^i = \sigma^i \beta^i_{tT} + \frac{t}{T} z^i$ are independent Gaussian random variables with mean
$\frac{t}{T} z^i$ and variance $(\sigma^i)^2\, \frac{t}{T}\,(T - t)$. This provides an explicit formula for $p_{S_t}(y \mid D = z)$, and as a
result we obtain (6).
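The Bayes update above becomes a finite sum when the prior on $D$ is discrete. The following is a minimal single-asset sketch under that assumption; the function names, the two-point prior and all parameter values are hypothetical and chosen only for illustration, and `y` denotes the observed value of the Gaussian variable with mean $\frac{t}{T} z$ and variance $\sigma^2 \frac{t}{T}(T-t)$.

```python
import math


def bridge_likelihood(y, z, t, T, sigma):
    """Gaussian density with mean (t/T) z and variance sigma^2 (t/T) (T - t)."""
    mean = t / T * z
    var = sigma ** 2 * t * (T - t) / T
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior(prior, y, t, T, sigma):
    """Posterior P(D = z | observation y) for a discrete prior {z: P(D = z)}."""
    weights = {z: p * bridge_likelihood(y, z, t, T, sigma) for z, p in prior.items()}
    total = sum(weights.values())  # denominator in Bayes' formula
    return {z: w / total for z, w in weights.items()}


# Hypothetical two-point prior: the price ends either 0.5 below or 0.5 above.
prior = {-0.5: 0.5, 0.5: 0.5}
post_flat = posterior(prior, y=0.0, t=0.5, T=1.0, sigma=1.0)
post_mid = posterior(prior, y=0.1, t=0.5, T=1.0, sigma=1.0)
post_late = posterior(prior, y=0.4, t=0.99, T=1.0, sigma=1.0)
```

An observation at the midpoint of the two outcomes leaves the prior unchanged, a small positive move halfway through the horizon tilts the posterior mildly upward, and the same kind of observation close to the terminal date is almost fully revealing because the bridge variance vanishes as $t \to T$.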
2. Next, we show that $S_t$ satisfies (4). Fix some $i \in \{1, 2, \dots, n\}$; it suffices to show that
\[
W_t^i = \frac{1}{\sigma^i} \left( S_t^i - S_0^i + \int_0^t \frac{S_u^i - S_0^i}{T - u}\, du - \int_0^t \frac{a^i(u, S_u)}{T - u}\, du \right) , \tag{A.1}
\]
where the second and third equalities follow from (2) and the law of iterated expectations.
Furthermore, re-arranging (A.1) yields
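By definition,
\[
S_t^i - S_0^i = \mathbb{E}\bigl[ S_t^i - S_0^i \mid \mathcal{F}_t \bigr] = \sigma^i\, \mathbb{E}\bigl[ \beta^i_{tT} \mid \mathcal{F}_t \bigr] + \frac{t}{T}\, a^i(t, S) ,
\]
and hence we see from the previous formula that $W_t^i$ satisfies the martingale condition.
Finally, it follows from (2) that $d[W^i, W^j] = 0$ if $i \ne j$. Therefore, by Lévy's theorem,
the processes $W^i$, $i = 1, \dots, n$, are independent $\mathcal{F}_t$-Brownian motions.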
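The relation (A.1) between the price path and the driving Brownian motion can be checked on a simulated path. The sketch below is illustrative only and makes a strong simplifying assumption: the prior is degenerate, so that the conditional mean of the terminal price change is a known constant `z` and the path follows an Euler discretisation of a Brownian-bridge-type dynamics pinned towards `s0 + z`. All names and parameter values are hypothetical.

```python
import math
import random


def simulate_path(z, s0, sigma, T, n_steps, rng):
    """Euler path of dS = (z - (S - s0)) / (T - u) du + sigma dW, plus the noise W."""
    dt = T / n_steps
    prices, noise = [s0], [0.0]
    for k in range(n_steps):
        u = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        drift = (z - (prices[-1] - s0)) / (T - u)
        prices.append(prices[-1] + drift * dt + sigma * dw)
        noise.append(noise[-1] + dw)
    return prices, noise


def recover_innovation(prices, a, sigma, T):
    """Left-point discretisation of (A.1): rebuild W from the observed prices."""
    n_steps = len(prices) - 1
    dt = T / n_steps
    s0 = prices[0]
    w, int_price, int_a = [0.0], 0.0, 0.0
    for k in range(n_steps):
        u = k * dt
        int_price += (prices[k] - s0) / (T - u) * dt
        int_a += a(u, prices[k]) / (T - u) * dt
        w.append((prices[k + 1] - s0 + int_price - int_a) / sigma)
    return w


rng = random.Random(1)
z = 0.3
prices, noise = simulate_path(z=z, s0=10.0, sigma=0.2, T=1.0, n_steps=200, rng=rng)
recovered = recover_innovation(prices, a=lambda u, s: z, sigma=0.2, T=1.0)
max_error = max(abs(x - y) for x, y in zip(recovered, noise))
```

Because the same left-point rule is used in both directions, the recovered process coincides with the simulated noise up to floating-point error; with a non-degenerate prior the function `a` would instead be the posterior mean, which is not implemented here.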
Substituting the ansatz (15) into the above DPE leads to the stated QVI for g. In Section 6,
we prove that (A.3) has at least one solution g. By the comparison principle (also established
in Section 6), this solution is unique.
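Proof (Proposition 3). To simplify notation, we assume without loss of generality that
$\lambda^+ = \lambda^- = 1$. We write
\[
F_q(\tau, S, u, c, p, X) = \max \bigl\{ L_q(\tau, S, u, c, p, X) ,\; M_q(\tau, S, u, c, p, X) \bigr\} ,
\]
where
\[
\begin{aligned}
L_q(\tau, S, u, c, p, X) = {} & c_q - \tfrac12 \sigma^2 X_q + A(T-\tau, S)\,(q - p_q) - \phi\, q^2 \\
& + \mathbb{1}_{q<\overline{q}}\, \lambda^- \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, u_q - u_{q+\ell} \Bigr] \\
& + \mathbb{1}_{q>\underline{q}}\, \lambda^+ \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, u_q - u_{q-\ell} \Bigr] ,
\end{aligned}
\]
and
\[
M_q(\tau, S, u, c, p, X) = \max_{\xi\in\{-\mathbb{1}_{q>\underline{q}},\,0,\,\mathbb{1}_{q<\overline{q}}\}} \Bigl[ -\bigl(\tfrac{\Delta}{2}+\varepsilon\bigr)|\xi| + (1+\gamma)\, u_q - u_{q+\xi} \Bigr] ,
\]
for $\underline{q} \le q \le \overline{q}$.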
(A1) The required inequality can be stated as follows:
Depending on where the maximum is attained, the latter is equivalent to one of the following
inequalities:
1. $L_{q^*}(\tau, S, u, c, p, X) - L_{q^*}(\tau, S, v, c, p, X) \ge 0$ ,
2. $M_{q^*}(\tau, S, u, c, p, X) - L_{q^*}(\tau, S, v, c, p, X) \ge 0$ ,
3. $L_{q^*}(\tau, S, u, c, p, X) - M_{q^*}(\tau, S, v, c, p, X) \ge 0$ ,
4. $M_{q^*}(\tau, S, u, c, p, X) - M_{q^*}(\tau, S, v, c, p, X) \ge 0$ .
In fact, it is easy to see that the proof of 1) and 4) would suffice.
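Let us prove 1), i.e. that in case $\underline{q} < q^* < \overline{q}$
\[
\max\bigl\{0,\ \tfrac{\Delta}{2} + u_{q^*} - u_{q^*-1}\bigr\} - \max\bigl\{0,\ \tfrac{\Delta}{2} + v_{q^*} - v_{q^*-1}\bigr\} \ge 0 , \tag{A.5}
\]
\[
\max\bigl\{0,\ \tfrac{\Delta}{2} + u_{q^*} - u_{q^*+1}\bigr\} - \max\bigl\{0,\ \tfrac{\Delta}{2} + v_{q^*} - v_{q^*+1}\bigr\} \ge 0 \tag{A.6}
\]
(if $q^* = \underline{q}$ or $q^* = \overline{q}$, one of the above inequalities is void). We prove the last inequality: it can
be violated only if $\tfrac{\Delta}{2} + u_{q^*} < u_{q^*+1}$, $\tfrac{\Delta}{2} + v_{q^*} > v_{q^*+1}$. Then $-\tfrac{\Delta}{2} - v_{q^*} < -v_{q^*+1}$. Adding up
the latter and $\tfrac{\Delta}{2} + u_{q^*} < u_{q^*+1}$ we obtain $u_{q^*} - v_{q^*} < u_{q^*+1} - v_{q^*+1}$, which contradicts our
assumption that $u_{q^*} - v_{q^*} = \max_q (u_q - v_q)$.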
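The contradiction argument can be sanity-checked numerically. The sketch below is an illustration rather than part of the proof: it draws random vectors `u` and `v`, picks the index at which `u - v` is maximal, and evaluates the difference of the two truncated jump terms with the upper neighbour, which the argument shows cannot be negative. The helper names, the index convention (levels numbered `0, ..., n_levels - 1`) and the parameter `half_spread` are assumptions made for the example.

```python
import random


def jump_term(w, q, half_spread):
    """max{0, half_spread + w[q] - w[q + 1]} for a vector w indexed by inventory level."""
    return max(0.0, half_spread + w[q] - w[q + 1])


def smallest_difference(n_levels, half_spread, trials, seed=0):
    """Smallest value of jump_term(u) - jump_term(v) at the maximising index, over random draws."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        u = [rng.uniform(-5.0, 5.0) for _ in range(n_levels)]
        v = [rng.uniform(-5.0, 5.0) for _ in range(n_levels)]
        gaps = [a - b for a, b in zip(u, v)]
        q_star = max(range(n_levels), key=gaps.__getitem__)
        # Skip draws where the hypothesis fails or the upper neighbour does not exist.
        if gaps[q_star] < 0.0 or q_star == n_levels - 1:
            continue
        diff = jump_term(u, q_star, half_spread) - jump_term(v, q_star, half_spread)
        worst = min(worst, diff)
    return worst


worst_case = smallest_difference(n_levels=5, half_spread=0.05, trials=2000)
```

The returned value stays non-negative (up to rounding) because `x -> max(0, x)` is non-decreasing and the two arguments differ by the gap at the maximising index minus the gap at its neighbour.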
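Now, let us prove 4), i.e. that
\[
\begin{aligned}
& \max\bigl\{0,\ -\tfrac{\Delta}{2} + u_{q^*} - u_{q^*+1},\ -\tfrac{\Delta}{2} + u_{q^*} - u_{q^*-1}\bigr\} \\
& \qquad - \max\bigl\{0,\ -\tfrac{\Delta}{2} + v_{q^*} - v_{q^*+1},\ -\tfrac{\Delta}{2} + v_{q^*} - v_{q^*-1}\bigr\} \ge 0 .
\end{aligned}
\]
The proof is straightforward: for instance, suppose that the first maximum is attained at
the second argument, while the second maximum is attained at the third argument; then the
inequality that we need to prove becomes
\[
-\tfrac{\Delta}{2} + u_{q^*} - u_{q^*+1} + \tfrac{\Delta}{2} - v_{q^*} + v_{q^*-1} \ge 0 ,
\]
i.e. $u_{q^*} - u_{q^*+1} - (v_{q^*} - v_{q^*-1}) \ge 0$. Noting that $u_{q^*} - u_{q^*+1} \ge u_{q^*} - u_{q^*-1}$, and using our
assumption $u_{q^*} - v_{q^*} = \max_q (u_q - v_q)$, we obtain the required inequality.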
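(A2) The condition on $X, X' \in \mathbb{R}$, re-written in coordinates, is as follows:
\[
-3\eta\,(x^2 + x'^2) \le -X x^2 + X' x'^2 \le 3\eta\,(x - x')^2 \qquad \text{for all } x, x' \in \mathbb{R} . \tag{A.7}
\]
We need to prove
\[
\begin{aligned}
& F_q\bigl(\tau, S, u, \eta(\tau'-\tau), \eta(S'-S), X\bigr) - F_q\bigl(\tau', S', u, \eta(\tau'-\tau), \eta(S'-S), X'\bigr) \\
& \qquad \le \omega\bigl(\eta(\tau'-\tau)^2 + \eta(S'-S)^2 + \tfrac{1}{\eta}\bigr) .
\end{aligned}
\]
It suffices to prove the following two inequalities:
\[
\begin{aligned}
& L_q\bigl(\tau, S, u, \eta(\tau'-\tau), \eta(S'-S), X\bigr) - L_q\bigl(\tau', S', u, \eta(\tau'-\tau), \eta(S'-S), X'\bigr) \\
& \qquad \le \omega\bigl(\eta(\tau'-\tau)^2 + \eta(S'-S)^2 + \tfrac{1}{\eta}\bigr) , \\
& M_q\bigl(\tau, S, u, \eta(\tau'-\tau), \eta(S'-S), X\bigr) - M_q\bigl(\tau', S', u, \eta(\tau'-\tau), \eta(S'-S), X'\bigr) \\
& \qquad \le \omega\bigl(\eta(\tau'-\tau)^2 + \eta(S'-S)^2 + \tfrac{1}{\eta}\bigr) .
\end{aligned}
\]
The second inequality is trivially true for any continuous $\omega : \mathbb{R}_+ \to \mathbb{R}_+$, $\omega(0) = 0$, since its
left-hand side is identically zero. Let us prove the first inequality. We have
\[
\begin{aligned}
& L_q\bigl(\tau, S, u, \eta(\tau'-\tau), \eta(S'-S), X\bigr) - L_q\bigl(\tau', S', u, \eta(\tau'-\tau), \eta(S'-S), X'\bigr) \\
& \quad = \bigl[ A(T-\tau, S) - A(T-\tau', S') \bigr]\, q - A(T-\tau, S)\,\eta(S'-S) + A(T-\tau', S')\,\eta(S'-S) + \tfrac{\sigma^2}{2}\,(-X + X') .
\end{aligned}
\]
The last term is nonpositive, according to our assumption on $X, X'$, cf. (A.7) with $x = x' = 1$.
The rest now follows from the fact that $A(T-\tau, S)$ is globally Lipschitz in $(\tau, S)$, cf. Section
2.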
Proof (Proposition 4). The verification of conditions (C2), (C3) and (C4) is immediate.
Let us show that condition (C1) is satisfied as well. Denote
\[
F_q^{\delta,h}(\tau, S, r, \psi) = \max \bigl\{ L_q^{\delta,h}(\tau, S, r, \psi) ,\; M_q^{\delta,h}(\tau, S, r, \psi) \bigr\} .
\]
Similarly to the proof of Proposition 3, it suffices to prove the following two inequalities:
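\[
L_{q^*}^{\delta,h}(\tau, S, r, u_{q^*} + \theta) - L_{q^*}^{\delta,h}(\tau, S, s, v_{q^*}) \ge \gamma\,\theta ,
\]
\[
M_{q^*}^{\delta,h}(\tau, S, r, u_{q^*} + \theta) - M_{q^*}^{\delta,h}(\tau, S, s, v_{q^*}) \ge \gamma\,\theta .
\]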
The proof of the second inequality essentially repeats the proof of inequality 4) in the proof
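of Proposition 3. For the proof of the first inequality, let us write $L_{q^*}^{\delta,h}(\tau, S, r, u_{q^*} + \theta) - L_{q^*}^{\delta,h}(\tau, S, s, v_{q^*}) = I_1 + I_2$, where (writing $u = u_{q^*}$, $v = v_{q^*}$)
\[
\begin{aligned}
I_1 = {} & \frac{r_{q^*} - u(\tau-\delta, S) - \theta}{\delta} - \frac{s_{q^*} - v(\tau-\delta, S)}{\delta} \\
& - \tfrac12 \sigma^2\, \frac{u(\tau, S+h) - 2 r_{q^*} + u(\tau, S-h) + 2\theta}{h^2} + \tfrac12 \sigma^2\, \frac{v(\tau, S+h) - 2 s_{q^*} + v(\tau, S-h)}{h^2} \\
& - A^+(T-\tau, S) \left[ \frac{u(\tau, S+h) + \theta - r_{q^*}}{h} - \frac{v(\tau, S+h) - s_{q^*}}{h} \right] \\
& + A^-(T-\tau, S) \left[ \frac{r_{q^*} - u(\tau, S-h) - \theta}{h} - \frac{s_{q^*} - v(\tau, S-h)}{h} \right]
\end{aligned}
\]
and
\[
\begin{aligned}
I_2 = {} & \mathbb{1}_{q^*<\overline{q}}\, \lambda^- \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, r_{q^*} - r_{q^*+\ell} \Bigr] + \mathbb{1}_{q^*>\underline{q}}\, \lambda^+ \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, r_{q^*} - r_{q^*-\ell} \Bigr] \\
& - \mathbb{1}_{q^*<\overline{q}}\, \lambda^- \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, s_{q^*} - s_{q^*+\ell} \Bigr] - \mathbb{1}_{q^*>\underline{q}}\, \lambda^+ \max_{\ell\in\{0,1\}} \Bigl[ \tfrac{\Delta}{2}\,\ell + (1+\gamma)\, s_{q^*} - s_{q^*-\ell} \Bigr] .
\end{aligned}
\]
The proof that $I_2 \ge \gamma\,\theta$ is similar to the proof of inequality 1) in the proof of Proposition 3.
It remains to show that $I_1 \ge 0$. The latter is immediate from our assumptions $u \le v$ and
$\theta = r_{q^*} - s_{q^*} = \max_q \{r_q - s_q\} \ge 0$.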
References
Almgren, R. (2003). Optimal execution with nonlinear impact functions and trading-enhanced
risk. Applied Mathematical Finance 10(1), 1–18.
Avellaneda, M. and S. Stoikov (2008, November). High-frequency trading in a limit order book.
Quantitative Finance 8, 217–224.
Barles, G. and P. Souganidis (1991). Convergence of approximation schemes for fully nonlinear
second order equations. Asymptotic Analysis 4, 271–283.
Baudoin, F. (2002). Conditional SDE and applications in finance. Stochastic Processes and
their Applications 100, 109–145.
Baudoin, F. and L. Nguyen-Ngo (2004). The financial value of weak information. Finance and
Stochastics 8, 415–435.
Bayraktar, E. and M. Ludkovski (2014). Liquidation in limit order books with controlled
intensity. Mathematical Finance 24(4), 627–650.
Bechler, K. and M. Ludkovski (2014). Optimal execution with dynamic order flow imbalance.
arXiv preprint arXiv:1409.2618.
Briani, A., F. Camilli, and H. Zidani (2012). Approximation schemes for monotone systems
of nonlinear second order parabolic equations: convergence result and error estimate.
Differential Equations and Applications 4, 297–317.
Brody, D., L. Hughston, and A. Macrina (2008). Information-based asset pricing. International
Journal of Theoretical and Applied Finance 11, 107–142.
Cartea, Á., R. Donnelly, and S. Jaimungal (2013). Algorithmic trading with model uncertainty.
Social Science Research Network Working Paper Series.
Cartea, Á., R. F. Donnelly, and S. Jaimungal (2015). Enhancing trading strategies with order
book signals. Available at SSRN 2668277.
Cartea, Á. and S. Jaimungal (2013). Modeling asset prices for algorithmic and high frequency
trading. Applied Mathematical Finance 20(6), 512–547.
Cartea, Á. and S. Jaimungal (2014). Incorporating order-flow into optimal execution. Available
at SSRN 2557457.
Cartea, Á. and S. Jaimungal (2015a). Optimal execution with limit and market orders.
Quantitative Finance 15(8), 1279–1291.
Cartea, Á. and S. Jaimungal (2015b). Risk metrics and fine tuning of high-frequency trading
strategies. Mathematical Finance 25(3), 576–611.
Cartea, Á., S. Jaimungal, and J. Ricci (2014). Buy low, sell high: A high frequency trading
perspective. SIAM Journal on Financial Mathematics 5(1), 415–444.
Cartea, Á., S. Jaimungal, and J. Ricci (2015). Trading strategies within the edges of no-arbitrage.
Social Science Research Network Working Paper Series.
Gatheral, J., A. Schied, and A. Slynko (2012). Transient linear price impact and Fredholm
integral equations. Mathematical Finance 22(3), 445–474.
Guéant, O. and C.-A. Lehalle (2015). General intensity shapes in optimal liquidation.
Mathematical Finance 25(3), 457–495.
Guéant, O., C.-A. Lehalle, and J. Fernandez-Tapia (2012). Dealing with the inventory risk: a
solution to the market making problem. Mathematics and Financial Economics, 1–31.
Guilbaud, F. and H. Pham (2013). Optimal high-frequency trading with limit and market
orders. Quantitative Finance 13(1), 79–94.
Ho, T. and H. R. Stoll (1981). Optimal dealer pricing under transactions and return uncertainty.
Journal of Financial Economics 9, 47–73.
Ishii, H. and S. Koike (1991). Viscosity solutions for monotone systems of second order elliptic
PDEs. Communications in Partial Differential Equations 16, 1095–1128.
Kharroubi, I. and H. Pham (2010). Optimal portfolio liquidation with execution cost and risk.
SIAM Journal on Financial Mathematics 1, 897–931.
Lipton, A., U. Pesavento, and M. G. Sotiropoulos (2013). Trade arrival dynamics and quote
imbalance in a limit order book. arXiv:1312.0514.
Macrina, A. (2014). Heat kernel models for asset pricing. International Journal of Theoretical
and Applied Finance 17(07), 1450048.
Obizhaeva, A. A. and J. Wang (2013). Optimal trading strategy and supply/demand dynamics.
Journal of Financial Markets 16(1), 1–32.
Øksendal, B. and A. Sulem (2008). Applied stochastic control of jump diffusions, 3rd ed.
Springer.
Schied, A. (2013). Robust strategies for optimal order execution in the Almgren-Chriss framework.
Applied Mathematical Finance 20(3), 264–286.