Marginal Price Optimization
Stefan Loesch
Mark Richardson
[email protected]
27 January 2025
Abstract
We introduce a new framework for optimal routing and arbitrage in AMM-driven markets. This framework improves on the original best-practice convex optimization by restricting the search to the boundary of the optimal space. We can parameterize this boundary using a set of prices, and a potentially very high dimensional optimization problem (2 optimization variables per curve) is reduced to a much lower dimensional root finding problem (1 optimization variable per token, regardless of the number of curves). Our reformulation is similar to the dual problem of a reformulation of the original convex problem. We show that our reformulation of the problem is equivalent to the original formulation except in the case of infinitely concentrated liquidity, where we provide a suitable approximation. Our formulation performs far better than the original one in terms of speed – we obtain an improvement of up to 200x against Clarabel, the new CVXPY default solver – and robustness, especially on levered curves.
Contents

1 Introduction
  1.1 Problem statement
  1.2 Automated Market Makers
    1.2.1 Levered liquidity
    1.2.2 Directional liquidity and fees
    1.2.3 Limit orders and infinite leverage
  1.3 Optimization
2 Convex Optimization
  2.1 Setting up the problem
  2.2 Implementation
  2.3 Convergence issues
5 Performance comparison
  5.1 Performance on token pairs
  5.2 Scaling the number of curves
  5.3 Mathematical interlude
  5.4 Scaling the number of tokens
6 Conclusion
References
D Numerical Methods
  D.1 Bisection Method
    D.1.1 Finding roots
    D.1.2 Finding minima and maxima
    D.1.3 Convergence
    D.1.4 Higher dimensions
  D.2 Newton-Raphson Method
    D.2.1 Convergence
    D.2.2 Introducing the learning rate eta
    D.2.3 Higher dimensions
F Implementation example
1 Introduction
Part of Bancor’s product offering is the FastLane Arbitrage bot [15]. When developing this bot, we developed our own algorithms for solving what we call the “Arbitrage Problem”, loosely defined as making risk-free money out of a given set of AMMs by trading against them. We started with what we believe was the standard approach at the time, the regular convex optimization approach proposed in [6]. This approach worked very well on unlevered curves, but as soon as we applied it to levered curves, we ran into convergence issues that we could not ultimately solve. The main purpose of the FastLane bot was to support the Carbon DeFi protocol [18, 31], and when looking into the problem in more detail, we felt that there were structural reasons why the direct convex optimization approach from [6] would not work well on levered curves. We therefore developed our own algorithm, called the “Marginal Price” algorithm, which solves the same problem, but which is significantly faster and has more benign convergence properties. It is similar to solving the conjugate convex problem proposed in [12], which was published around the time we put the finishing touches on our arbitrage bot, but it goes somewhat further, and it is more founded in financial than in purely mathematical principles.

Before we go into the details we will provide key results from [6] that help us to better define the problem space. For clarity of presentation, the references in this section point to later sections in the paper. We do not want to hide the forest behind the trees, and whilst the definitions and results referenced are important enough to warrant formal treatment, the concepts are sufficiently widely understood that we are confident that a reader even vaguely familiar with the topic will understand terms like “AMM” without having to look up the formal definition.
Given a set of AMMs in a given state, the Arbitrage Problem is the problem of finding a sequence of trades that will result in a risk-free arbitrage profit^3. The Routing Problem is the problem of finding a sequence of trades on those AMMs that will result in the highest output (or lowest input) of a “target token” when all other token input and output quantities are fixed.

Proof. Solving the routing problem where no other tokens go into or out of the system is the definition of an arbitrage according to definition 10. ■

In other words – when solving the routing problem, one is generally also solving the arbitrage problem because arbitrages subsidize the desired exchange, and a pure arbitrage is optimally routing the “null” flow. In this paper, we mostly focus on arbitrage because of our product focus, but transposing the results to routing is generally straightforward.
1.2 Automated Market Makers

In this section, we briefly discuss the concept of Automated Market Makers (“AMMs”) and their bonding curves^4. Here we focus solely on constant product curves, including their levered variety, as they represent all features of interest for our purposes. We note, however, that most results of this paper generalize to other types of curves as well, although it helps when certain quantities can be computed analytically; otherwise the interdependent numerical methods can impose additional challenges and performance may be poor. We also note that, in our experience, all relevant curves can be approximated by constant product curves, in segments if need be, which is the approach we take in practice to deal with curves other than constant product curves.

Formally, an AMM is a smart contract that allows the user to trade two or more assets.
^3 See definition 10.
^4 See definition 7 for a formal definition of the concept.
The seminal version first introduced by Bancor [21, 22] was a generalized hypersurface S_{C,r} embedded in R^n, defined by the equation

    S_{C,r} = { x | ∏_{i=1}^{n} x_i^{r_i} = C }    (1.1)

where x = (x_1, x_2, . . . , x_n) ∈ R^n_+ is the vector of token balances and r ∈ R^n_+ is the associated vector of “reserve weights”. C is a constant.
Equivalently, using the log token balances z = log x, where the log function applies on a per-component basis, one can transform the curved hypersurface S_{C,r} into a hyperplane P_{C,r} given by the equation

    P_{C,r} = { z | Σ_{i=1}^{n} r_i z_i = log C }    (1.2)
This is the version implemented by Bancor [22] and later by Balancer [28]. The most
popular pools on Bancor v1 were all two-assets / same-weight. When Uniswap created
their first AMM, they froze this as a design principle [2, 3] and they, and many subsequent
AMMs including Bancor v2.1/3 and Carbon DeFi, relied on the well known simplified
version of the aforementioned constant product bonding curve. This simplified invariant
function is defined by the equation
x·y =k (1.3)
where x and y are the quantities of the two assets, and k is the pool invariant. What this
equation signifies is that – ignoring fees – the AMM in question will engage in any trade
(ie exchange x for y or vice versa) that keeps the pool invariant k constant. Adding or
removing liquidity, including via fees, will of course change k. We can express y as a
function of x as y(x) = k/x. It is easy to see that the marginal price in units of y per x
at a given point (x, y) is given by
    p_marg ≡ −dy/dx = k/x² = y²/k = y/x    (1.4)
from which it immediately follows that, at the marginal price, the value of the token holdings in x and y will always be the same, in units of x, or y, or any joint numeraire in which one chooses to express it.
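As an illustration of equation 1.4 and the equal-value property, the following sketch evaluates a hypothetical constant product pool in Python; the pool sizes are made up for illustration only and the code is not part of any protocol implementation.

    # Illustrative sketch of an unlevered constant product AMM x * y = k.
    # Hypothetical pool: 10 ETH (x) against 20,000 USDC (y).
    x, y = 10.0, 20_000.0
    k = x * y

    # Marginal price in units of y per x, eq. (1.4): p = k / x^2 = y^2 / k = y / x
    p_marg = y / x
    assert abs(p_marg - k / x**2) < 1e-9
    assert abs(p_marg - y**2 / k) < 1e-9

    # At the marginal price, both token holdings have the same value
    # (expressed here in units of y): x * p_marg == y.
    print(p_marg)            # 2000.0 y per x
    print(x * p_marg, y)     # 20000.0 20000.0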
The above curves – usually referred to as “unlevered” curves for reasons that will become
clear in a moment – trade over the entire possible price range. This is nice for symmetry
and scale invariance reasons, but it is not particularly efficient in terms of collateral
usage, as most tokens in the AMM are held in reserve for price points that, in realistic
markets, will never be reached. See figure 1.1 for an illustration of this.
Figure 1.1: Invariance and price curve for an unlevered AMM. Invariance (aka bonding)
curve (left) and associated price curve (right) for an unlevered AMM; the pink area
represents an example for a reasonable trading area, and all collateral outside of it is
rarely used.
This is where the concept of virtual token balances and “amplified” or “concentrated” liquidity curves comes in, first described by Bancor [23] in 2020, and later popularized by Uniswap [5] in 2021. The idea is simple: we just posited that a significant proportion of the collateral is allocated to price points that will never be reached, so we might as well remove it from the AMM, and if ever the AMM were in a price range where it had to pay out those removed tokens, it would just halt trading until prices returned to the range it was prepared to trade in. To formalize this concept, we introduce virtual token balances^5 x_v > 0 and y_v > 0, and with their help we rewrite the invariant equation 1.3 as

    (x_a + x_v) · (y_a + y_v) = k    (1.5)

where x_a ≥ 0 and y_a ≥ 0 are the actual token holdings of the AMM. The trading behavior of the AMM is now slightly modified in that – again ignoring fees – the AMM will accept any trade that (1) holds k constant, and importantly (2) keeps x_a and y_a non-negative. What this means in practice is that a levered AMM has two price boundaries – one for x_a = 0 and one for y_a = 0 – where all the AMM’s collateral is held in the respective other token. An example of a levered curve is shown in figure 1.2, where an amount of 5 ETH or 10,000 USDC (the exact mix depends on where we are on the curve) is deployed over a price range of slightly below 1,500 to 3,000 USDC per ETH.
This relatively wide range of a levered curve is typical for Bancor’s original concentrated
AMM, and its successor, Carbon DeFi. However, architecture and design decisions
made for the Uniswap v3 AMM mandate a much thinner range: the minimum tick size
depends on the fee tier, but the width of a single Uniswap v3 curve was initially between
10-200 basis points. What is usually referred to as “the” Uniswap v3 curve is actually a collection of independent curves that are located adjacent to each other, each of which has its own liquidity holdings.
^5 Some authors disagree on what exactly is being referred to as the virtual token balance, the quantity x_v or the sum x_v + x_a; herein, we use the former, but in any case it should always be clear from the context.
Figure 1.2: Invariance and price curve for a levered AMM. Invariance (aka bonding)
curve (left) and price curve (right) for a levered AMM; only the area in solid blue
corresponds to actual liquidity held by the AMM. The thin curve extending it depicts
where the associated unlevered AMM would trade.
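To make equation 1.5 concrete, the following sketch (our own illustration with made-up numbers, not protocol code) computes the price boundaries of a levered constant product curve from its actual and virtual balances; at y_a = 0 the curve sits at its lowest price, at x_a = 0 at its highest:

    # Hypothetical levered curve: actual holdings plus virtual balances.
    x_a, y_a = 2.0, 3_000.0       # actual ETH and USDC held by the curve
    x_v, y_v = 10.0, 21_000.0     # virtual balances (collateral that was "removed")

    k = (x_a + x_v) * (y_a + y_v)        # invariant of eq. (1.5)
    p_now = (y_a + y_v) / (x_a + x_v)    # current marginal price, USDC per ETH

    # Boundary prices: all collateral in ETH (y_a = 0) or all in USDC (x_a = 0).
    p_min = y_v**2 / k                   # price when y_a has been traded away
    p_max = k / x_v**2                   # price when x_a has been traded away

    print(round(p_now), round(p_min), round(p_max))   # 2000 1531 2880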
For those retained balances the Carbon AMM design then offers two choices:

• they are off-curve and inactive, and their only future use is that they can be withdrawn by the owner of the curve, or

• they are automatically placed onto another curve that trades in the opposite direction, with parameters independent of those of the first curve.

In a Carbon DeFi context, buy and sell curves are typically non-overlapping. For example, a curve may buy ETH against USDC between 2,500-2,600 and sell it between 2,900-3,000. However, overlapping strategies are also possible, where the parameters are chosen such that the prices of the buy and sell curve move in unison, and collateral that moved from the buy to the sell curve is offered back to the market at only a slightly higher price.
It is an important insight – and the reason why we wrote “formally” introduced above
– that functionally, overlapping strategies are not different from non-directional curves
with fees. For example, a curve that buys and sells the marginal ETH at 3,000 USDC
with a 1% fee is functionally equivalent to one curve that buys ETH at 2,970 USDC and
another that sells it at 3,030 USDC, and collateral moves from one curve to the other
upon trading.
Fees will become very important further down in this paper because they lead to curve scenarios that are numerically not particularly well behaved. We will get into the details of this further ahead, but the key reason is that in the presence of fees there are “holes” – price regions in which the curve does not exist. For example, the curve above would not trade at all for prices between 2,970 and 3,030 USDC, and any algorithm that performs a local analysis at such a price point risks failing.
Bancor, via its Carbon DeFi product, also introduced the concept of limit orders into the AMM space, where the entire liquidity of a curve is placed at a single price point. Formally, this can be considered as the limit x_v, y_v → ∞ in equation 1.5 and in our preprint [33]. In practical calculations this of course does not help us much, as we cannot deal numerically with numbers that are infinite. The Carbon DeFi implementation gets around this by reparametrizing the problem in terms of B = √p_min and A = √p_max − √p_min, where p_min and p_max are the marginal prices at the boundaries, which allows setting A = 0 for limit orders. However, in our implementation of the arbitrage bot (productized as Bancor’s “ArbFastLane protocol”), and throughout this paper, we do not allow A = 0 as this would lead to numerical instabilities in the calculations. Instead, we impute a minimum value for A that is big enough for numerical stability, and that is then corrected in a transaction fine-tuning process once arbitrage opportunities have been identified.
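The sketch below illustrates this reparametrization and the minimum-width imputation. The function name and the chosen minimum width are our own illustrative assumptions, not the values or interfaces used in the production bot.

    import math

    MIN_A = 1e-6   # illustrative minimum width; the production value is a tuning choice

    def carbon_params(p_min: float, p_max: float) -> tuple[float, float]:
        """Return (A, B) with B = sqrt(p_min) and A = sqrt(p_max) - sqrt(p_min).

        A limit order has p_min == p_max and therefore A == 0; for numerical
        stability we impute a small positive A instead.
        """
        B = math.sqrt(p_min)
        A = math.sqrt(p_max) - B
        return max(A, MIN_A), B

    print(carbon_params(2500.0, 2600.0))   # ordinary range order
    print(carbon_params(3000.0, 3000.0))   # limit order: A floored at MIN_A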
1.3 Optimization
Before we dive further into the issue at hand, we want to discuss optimization problems in general and start building intuition on how they work and how they can be solved^6. Generally, an optimization problem has (1) a target function, and (2) one or more constraints. We are then looking to maximize or minimize the target function while satisfying the constraints. The constraints can be equality constraints, inequality constraints, or both. Equality vs inequality does not usually make a difference, as the solution to a sufficiently well conditioned optimization problem is found on the boundary of the feasible region. Constraints can be of two kinds, which we call “(multi-)linear” and “explicit”, respectively. The former consist of multiple linear inequalities, all of which must be satisfied, leading to a non-differentiable hypersurface where the non-differentiable regions are the boundaries on which the active constraint switches. What we call explicit constraints are constraints that can be written in the form f(x) = 0 for some function f. We call an explicit constraint smooth if it can be written using a single smooth constraint function f. Note that the two forms of constraints are often interchangeable. For example, every multi-linear constraint can be converted into an explicit constraint using a piecewise-linear function f, but depending on the exact use case one or the other may be easier to deal with. Linear constraints have the advantage that the resulting set will be convex. Also, in higher dimensions it is usually hard to formulate an explicit constraint function matching multiple linear constraints.

^6 Please also refer to appendix D for a discussion of the numerical methods we use.
In figure 1.3 on the left hand panel, we have drawn an optimization problem in one dimension where the blue and orange lines represent the constraints, smooth and piecewise-linear respectively, and the grey lines represent level sets of the target function. We have also drawn the derivatives of the constraint and target functions in the same figure, on the right hand panel.
In one dimension, this problem is easy to solve: the point where the target function
is optimized in the smooth case is where the level set is tangent to the constraint. In
other words: we look for the point where the derivative of the target function equals
the derivative of the constraint, which in this case happens slightly below x = 1. For
the linear case, we encounter another well known result of optimization theory which is
that the solution to the optimization problem is usually found on the lowest-dimension
boundary, which in this case are the points where the different constraints meet. In this
case the solution is exactly at x = 1. Note that whilst this is not technically a point where the constraint is tangent to the target function, the derivative of the constraint flips from above to below the derivative of the target function. If we smoothed the corners of the constraint function – say by convolving with a C∞ kernel – we would find that the derivatives would again meet very close to x = 1, the exact point depending on the kernel used.
Ultimately, as we will see in what follows, this problem shows the essence of what we are aiming to do in this paper, except that the problem we are solving is higher dimensional, and the functions we consider are more complex, so that we do have to rely on numerical methods to find the solution. Specifically, the target function is the profit made, or rather the outflow in the target token chosen to collect the profit. The constraints are given by the AMM curves that form the market, plus the “self-financing constraint” that all token flows must be accounted for (ie that there is no net token leakage in or out other than the one explicitly accounted for). In the pure arbitrage case, this constraint is simply that, on a net basis, all flows in tokens other than that of the target token are zero. We mention en passant that for routing applications one can use other constraints, eg “there is a flow of 1,000 USDC into the system” (or “1,000 USDC and 3,000 DAI”), and answering the question “what is the maximum amount of ETH that can be extracted and how?” is the answer to the optimal routing problem. However, whilst the routing application is interesting and not particularly hard to execute once the arbitrage application has settled, it is not the focus of this paper.

Figure 1.3: Constraints and target functions. The left panel depicts a smooth constraint (blue) and a piecewise-linear one (orange) as well as the level areas of a linear target function (grey). The right panel shows the respective derivatives, showing that at the optimal point, the derivative of the target function equals the derivative of the constraint.
predict which one in advance^7. Specifically:

1. A market that only consists of unlevered curves is a smooth problem

2. A market that only consists of limit order curves (eg, Carbon curves with width zero) is a linear problem

3. Levered curves are a mix of the two, depending on their width, or rather: how fast the liquidity changes at different points in the curve (“ticks” in Uniswap v3 parlance).
We can think of the third case above as a “smoothed” problem where the width of the curves corresponds to the smoothing kernel used. We briefly explain the concept of smoothing kernels in figure 1.4. In its left panel, we have drawn a few Gaussian kernels with different width parameter λ. They satisfy the equation

    κ_λ(x) = √(λ/π) · e^(−λx²)    (1.6)
When we calculate the convolution of the kernel κ with the target function f (here, the
constraint function), we get
    f_κ(x) = (κ ∗ f)(x) = ∫ κ(x − y) f(y) dy    (1.7)
Note that, as mentioned above, because the kernel κ is C∞, the convolution f_κ is also C∞, which we can easily show by pulling the differentiation operator ∂_x inside the integral so that we get f_κ′(x) = (f ∗ κ′)(x), where κ′ denotes the derivative with respect to the variable x. The convolution of a piecewise-linear function with a C∞ kernel, together with the kernel examples, is shown in figure 1.4.
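As a numerical illustration of equations 1.6 and 1.7 (a self-contained sketch with arbitrarily chosen parameters, not part of our production code), one can smooth a piecewise-linear function by discrete convolution with a Gaussian kernel:

    import numpy as np

    def gaussian_kernel(x: np.ndarray, lam: float) -> np.ndarray:
        """Gaussian kernel of eq. (1.6): kappa_lambda(x) = sqrt(lam/pi) * exp(-lam x^2)."""
        return np.sqrt(lam / np.pi) * np.exp(-lam * x**2)

    # Piecewise-linear "constraint" function: a hinge with a kink at x = 1.
    xs = np.linspace(-3, 5, 2001)
    f = np.maximum(0.0, xs - 1.0)

    # Discrete approximation of eq. (1.7): (kappa * f)(x) = int kappa(x - y) f(y) dy.
    dx = xs[1] - xs[0]
    kernel = gaussian_kernel(np.arange(-3, 3, dx), lam=4.0)
    f_smooth = np.convolve(f, kernel, mode="same") * dx

    # The smoothed function is C-infinity and no longer has a kink at x = 1.
    print(float(f_smooth[np.searchsorted(xs, 1.0)]))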
We do not use convolution explicitly in our algorithms but we enforce a minimum width for limit orders which has the same effect: if we convert a buy-at-1,000 limit order to

^7 In this context see appendix A for the radome optimization problem that provides a geometric example for a problem exposing similar issues.
Figure 1.4: Convolution examples. The left panel shows a number of Gaussian C ∞
kernels of different widths. The other panels show a zoomed out (middle) and detailed
view of a piecewise linear function (blue) and its convolution with two kernels of different
widths (orange and green) which is a C ∞ function.
2 Convex Optimization
The marginal price optimization algorithm for arbitrage and routing, described in section 3, is the core of this paper. It is closely related to the convex optimization algorithm described in [6, 7, 12] that we were previously using. As described in section 2.3, when we applied that algorithm for Bancor’s ArbFastLane product [15] to markets containing important segments with levered curves, we ran into insurmountable convergence issues.

We will not go into the details of convex optimization here – the aforementioned references are excellent resources for that. However, a description of the convex optimization setup is important for setting the scene and context for our core theorem of this paper, the core equivalence theorem 19, which establishes the equivalence between our marginal price optimization algorithm and the previously known convex optimization algorithm. This will happen in the next subsection 2.1, and in the following subsection 2.2 we will discuss implementation and results of the convex optimization algorithm on unlevered curves, where it works extremely well. In the last subsection 2.3 we will finally discuss convergence issues, providing a specific example of a problematic case.

2.1 Setting up the problem

We will here briefly outline the methodology that we used as a basis for our implementation. It closely follows [6], so interested readers are encouraged to read both papers in conjunction.
For setting up a convex optimization problem, we need to define the optimization vari-
ables, the target function and a set of constraints. The target function is the function
that we want to optimize, and the constraints are the conditions that the variables must
satisfy.
Positivity Constraints. The basic constraint is that all token balances must be non-negative, ie

    ∀α = 0 . . . 2M − 1 : x_α ≥ 0    (2.1)
For unlevered curves, those constraints can often be omitted, because they are not bind-
ing. However, they are relevant in the case of levered AMMs where those balances are
virtual balances according to equation 1.5. Also, many solvers operate more efficiently
if positivity constraints are explicitly stated.
Curve Constraints (unlevered). The token balances satisfy the following curve constraints or curve (in)equalities

    ∀ν = 0 . . . M − 1 : x_{2ν} · x_{2ν+1} ≥ x̄_{2ν} · x̄_{2ν+1} ≡ k̄_ν    (2.2)

The barred quantities x̄_α are the initial values of the token balances, so those are not optimization variables but parameters of the problem. The redundant terms k̄_ν are only shown to link back to the curve equation 1.3.

Curve Constraints (levered). When using levered curves we recreate the step from equation 1.3 to equation 1.5 and we rewrite equation 2.2 as

    ∀ν = 0 . . . M − 1 : (x_{2ν} + x̄⁰_{2ν}) · (x_{2ν+1} + x̄⁰_{2ν+1}) ≥ (x̄_{2ν} + x̄⁰_{2ν}) · (x̄_{2ν+1} + x̄⁰_{2ν+1})    (2.3)

This equation states that the current state of the AMM is adjusted by the virtual base balances x̄⁰_α, which are additional, constant, parameters of the problem.
Technically, the constraints in 1.3 and 1.5 should be equalities, but as shown in [6],
inequalities are required to make the problem convex. However, the solution to the
optimization problem will be found on the boundary, so any solution will actually satisfy
the equality constraints as opposed to the inequality ones.
Token Flows. Up to here we have treated all token balances as independent variables, which misses one very important piece of information, notably what type of tokens they are (eg WETH, USDC etc). This information is provided in the form of self-financing constraints, which ensure that the sum of the balances of the same token across all DEXes is constant, with the exception of the target token, which is the token in which the profits are being extracted (see below). In other words: those constraints ensure that we do not move tokens other than the target token in or out of the system.

To express those constraints we use the token matrix T, defined by

    T_iα = 1 if the DEX balance x_α is denominated in token i, and T_iα = 0 otherwise    (2.4)

In other words, T_iα is the indicator function for the token associated with the DEX balance x_α being of token type i.
Using this token matrix, we can now express the token balance function B : R^{2M} → R^{N+1} as a function from the state space into the “balance space”, associating with each state x = (x_α) a balance vector B(x) = (B_i(x)), where B_i(x) is the sum of the token balances of token i across all DEXes. Using the token matrix we can write this in matrix form as

    B(x) = Tx    (2.5)

    B_i(x) = Σ_{α=0}^{2M−1} T_iα x_α    (2.6)
This finally allows us to define the aforementioned token flow function ϕ : R^{2M} → R^{N+1} as the difference between the token balance function after optimization, B(x), and the initial token balances B(x̄):

    ϕ(x) = B(x) − B(x̄)    (2.7)

The convention here is that outflows from the DEX system are negative, and inflows to the DEX system are positive.
Self Financing Constraints. The self-financing constraint we use for arbitrage calculation is that, other than for the target token i = 0, the flow must be zero, yielding

    ∀i = 1 . . . N : ϕ_i = 0    (2.8)

Note that we start at i = 1 because we excluded the target token from the token flow calculation. Whilst the optimal routing problem is out of scope for this paper, we note that if we are interested in routing instead of arbitrage, the above equation becomes

    ∀i = 1 . . . N : ϕ_i = w_i    (2.9)

where w = (0, w_1, . . . , w_N) is the vector of desired flows to route into or out of the system in tokens other than the target token 0.
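To illustrate the token matrix and the flow function, here is a toy example with made-up numbers (it only demonstrates the bookkeeping of equations 2.5–2.7; the balance changes shown do not satisfy the self-financing constraint and are not an arbitrage solution):

    import numpy as np

    # Two constant product curves on the pair WETH/USDC.
    # Token indices: 0 = USDC (the target token), 1 = WETH.
    # Balance layout: x = (USDC_1, WETH_1, USDC_2, WETH_2), two balances per curve.
    T = np.array([[1, 0, 1, 0],    # row 0: which balances are USDC
                  [0, 1, 0, 1]])   # row 1: which balances are WETH

    x_bar = np.array([20_000.0, 10.0, 11_000.0, 5.0])   # initial balances
    x_new = np.array([19_100.0, 10.5, 11_850.0, 4.6])   # balances after trading

    B = T @ x_new                   # token balances per token, eq. (2.5)/(2.6)
    phi = T @ x_new - T @ x_bar     # token flows, eq. (2.7)

    print(B)     # [30950.    15.1]
    print(phi)   # [-50.    0.1]  -> 50 USDC flowed out of the system, 0.1 WETH flowed in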
Target Function. Last but not least we need to define our target function (ie the function that we want the optimizer to minimize or maximize). In our case, we want to maximize our profit, and our profit is the outflow of the target token 0^8 from the system, which in our conventions is a negative number. Therefore our optimizer target is

    minimize ϕ_0(x)    (2.10)

We can tie all those definitions together in the following definition. As mentioned above, the proof that this is a convex optimization problem is given in [6].

^8 This definition is WLOG in that we could optimize for any other token but token 0, but this would unnecessarily complicate the formulas.
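For concreteness, the following is a minimal sketch of how a problem of this form can be set up in CVXPY for two unlevered constant product curves on a single pair, with USDC as the target token. It follows the structure described above, but the numbers and the exact encoding are our own illustration; it is not the implementation of [6] nor the code used in our bot.

    import cvxpy as cp
    import numpy as np

    # Two WETH/USDC constant product curves at different prices; USDC is target token 0.
    # Balance layout: (USDC_1, WETH_1, USDC_2, WETH_2).
    x_bar = np.array([20_000.0, 10.0, 11_000.0, 5.0])
    T = np.array([[1, 0, 1, 0],   # row 0: USDC (target token)
                  [0, 1, 0, 1]])  # row 1: WETH

    x = cp.Variable(4, nonneg=True)                 # positivity constraints (2.1)
    k1 = float(np.sqrt(x_bar[0] * x_bar[1]))        # sqrt of the pool invariants
    k2 = float(np.sqrt(x_bar[2] * x_bar[3]))
    constraints = [
        cp.geo_mean(x[0:2]) >= k1,                  # curve constraint (2.2), curve 1
        cp.geo_mean(x[2:4]) >= k2,                  # curve constraint (2.2), curve 2
        (T @ (x - x_bar))[1] == 0,                  # self-financing (2.8): zero net WETH flow
    ]

    # Target (2.10): minimize the flow of the target token; a negative optimum is a profit.
    prob = cp.Problem(cp.Minimize((T @ (x - x_bar))[0]), constraints)
    prob.solve()
    print(-prob.value)   # arbitrage profit in USDC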
2.2 Implementation
Figure 2.1: Single pair arbitrage with three curves. The left panel shows three curves of the same pair WETH/USDC, all at different prices (the stars are not aligned). The right panel shows the same curves after the arbitrage process, where all three stars are on the same line from the origin, at a price of 2377.
In the pair arbitrage figure 2.1 we show three curves, all operating in the same pair WETH/USDC. As shown in the left panel, initially they are at different prices. The result after running the optimization algorithm is shown in the right panel, where all three curves are now at the same price of 2377. Therefore, all three stars indicating the current state of the curves are on the same straight line through the origin. The associated trade instructions are in table 2.1.

    Table 2.1
                  WETH        USDC
    PRICE         2,377.2     1.0
    ETH1          -0.119      224.201
    ETH2          -0.083      180.452
    ETH3          0.201       -538.075
    AMMIn         0.201       404.653
    AMMOut        -0.201      -538.075
    TOTAL NET     -0.000      -133.423
We then look at a simple triangular arbitrage, where we add WBTC as a third token,
and where we have curves for each of the three constituent pairs, as shown in the left
column of charts in figure 2.2. Note that, if we multiply the first two prices, we do
not get the third one, meaning that there is a circular arbitrage opportunity. After the
arbitrage process, the stars indicating the state of the AMM changed their location, and
the product of the first two prices equals the last one, therefore no further arbitrage
opportunities are left. The associated trade instructions are in table 2.2.
Figure 2.2: Triangle arbitrage with one curve each. The three charts on the left represent curves in the triangle WETH/USDC/WBTC, at price points that allow a circular arbitrage. The right panel shows the same curves after the arbitrage process, where the stars are aligned so no circular arbitrage is possible.

Finally, in figure 2.3 in the left column, we look at a triangular scenario with multiple curves per pair, presenting both pairwise arbitrage opportunities (the stars are not on the same straight line from the origin) and triangular ones (the products of the prices once aligned do not match). In the right column of figure 2.3 we present the post arbitrage scenario, and the associated instructions are in table 2.3. Note that the stars indicating the current states of the AMMs are not all aligned: those on the unlevered curves are, but some of the levered curves are stuck at the boundary closest to the relevant price point. Again, the circular price equation is satisfied, but only for the aligned prices corresponding to interior points on the curves.

Figure 2.3: Combined triangle and pair arbitrage including levered curves. The curves on the left represent a market where both pair and triangular arbitrage is possible, the former being visible by the fact that the stars are not aligned. The right panel shows the same curves after the arbitrage process, where all stars are either aligned, or at the correct boundary in case of levered curves outside of their range.

    Table 2.3
                  WETH        WBTC        USDC
    PRICE         1,215.9     13,611.0    1.0
    ETH1          0.064                   -86.315
    ETH2          0.283                   -440.579
    ETH3          0.367                   -898.979
    BTC1                      0.201       -3,707.206
    BTC2                      0.067       -1,107.091
    BTC3                      0.009       -129.880
    BE1           15.463      -1.541
    BE2           -10.584     0.913
    BE3           -5.593      0.350
    AMMIn         16.177      1.541       0.000
    AMMOut        -16.177     -1.541      -6,370.051
    TOTAL NET     -0.000      -0.000      -6,370.051
In scenarios like the one above, that are dominated by unlevered or sufficiently wide levered curves, we have found that the algorithm converged well. It however ran into issues in scenarios dominated by narrow levered curves. We have shown a simple example for those types of curves in figure 2.4. On the left hand panel we see two levered curves that are in the money against each other: one curve buys WETH at around 2,500, one sells it at around 1,500, with an overall profit opportunity of around 17 USDC. None of our convex solvers would converge on that problem. However, if we add a reasonably sized unlevered sentinel curve – the unlevered curve added in the right panel of figure 2.4 – then convergence succeeds, even though the sentinel curve does not in this case participate in the arbitrage trade, proving that convergence on this problem should be possible^9 even without the sentinel curve. The associated trade instructions are in table 2.4.

^9 Looking ahead, the marginal price optimizer that is the subject of this paper and that we describe in section 3 does converge well in this case; the failure of the convex algorithm to converge on curves that we were interested in, rather than the improvement in speed, was the original reason why we investigated the development of a new algorithm to replace convex optimization.
Figure 2.4: Example for curves where convergence fails. The left hand panel shows two
levered curves that are in the money against each other and where the algorithm fails.
Adding a sufficiently large sentinel curve (right panel) allows the algorithm to converge,
with no trading on the sentinel curve.
    Table 2.4
                  WETH        USDC
    PRICE         1,800.2     1.0
    Sentinel      -0.000      0.502
    Lev1          0.018       -44.947
    Lev2          -0.018      27.268
    AMMIn         0.018       27.769
    AMMOut        -0.018      -44.947
    TOTAL NET     -0.000      -17.178
We have not formally shown this, but we suspect that the issues we see are related to the phenomenon we hinted at in a very simple case in section 1.3: there is a difference
between how to deal with smooth constraints and general linear constraints. Specifically, in the smooth case, a gradient descent method can be used, or some other numeric solver which allows us to solve for the condition that the gradient of the constraint equals the gradient of the target function. For piecewise-linear constraints this no longer works, and we need to use a method like the simplex method [29] that jumps between the vertices of the feasible region. In practice, our constraints are piecewise smooth, meaning sometimes we get interior solutions like in the smooth case, and sometimes we get corner solutions like in the linear case^10. We have not found a convex solver that can be relied upon to consistently perform well^11 under those circumstances.
Financially, the smooth vs corner solution cases are easily understood: an interior solu-
tion is a situation where – in the region of interest – the curves provide liquidity in both
directions, and no abrupt changes in liquidity occur when moving prices. On the other
hand, a solution where levered curves trade, but end up at their boundary is a corner
solution. An example for an interior solution is any collection of Uniswap v2 pools, or
Uniswap v3 pools with sufficient liquidity in and around the current tick that the trades
do not fully empty the current-tick pool in one direction. The archetypal example of a corner solution is a pair of Carbon DeFi limit orders that are in the money against each other (eg one curve selling 1 TKN at 100 USDC and another curve buying it at 105 USDC). This is a highly non-smooth problem where no gradient method will work. However, simplex methods usually work just fine.
The specific issue for using a convex optimization algorithm in a production environment is that it is very hard to predict which case we are in before we have actually solved it. We go back to the “sell TKN at 100 USDC, buy it at 105” example. If both curves are sufficiently narrow, the transaction will bring at least one of the two curves to its boundary, meaning that after the transaction at least one of the curves is either completely empty or completely full. However, if the curves are wide enough, then this does not happen. Instead, the transaction will stop at the point where the marginal prices on the two curves coincide, and, therefore, where every additional dollar transacted would have a negative marginal contribution to the transaction profit.

^10 See appendix A for a description of the radome optimization problem that is a very similar geometric problem.
^11 We note that because we developed the marginal price optimization algorithm before the publication of [12] we never tried the conjugate algorithm developed there.
Figure 2.5: Combining two invariant curves. The left panel shows two equally sized, levered invariant curves covering disjoint price ranges. The right panel shows the combined invariant curve (the dotted straight line shows the curvature of the combined curve).
one and then the other curve flips its liquidity to the other boundary. Depending on
the regime we are in, a gradient or simplex method is better suited to find the solution.
See appendix A for a discussion of a closely related geometric problem in higher dimensions, the “radome optimization problem”.
3 Marginal Price Optimization

We have seen in section 2 that the arbitrage and routing problem is a convex optimization problem, and that we therefore should be able to solve it using standard convex solvers. We have also seen in section 2.3 that there are potential pitfalls with respect to levered curves, where convergence may be problematic. For us, this was particularly problematic, because with the open source solvers of the CVXPY package we used [8], we often were not able to determine why the algorithm failed on a viable scenario, and what, if anything, we could have done to make it converge. Ultimately, we were not able to make them work reliably enough for the ArbFastLane [15] product, which motivated the development of an alternative method. This new method is the main focus of this paper: the Marginal Price method for arbitrage finding (and routing). It is similar to solving the conjugate convex problem proposed in [12], which was published around the time we put the finishing touches on our method, but it is based on financial rather than purely mathematical reasoning and somewhat further optimized for performance.

In this section, we describe the method starting from mathematical first principles. For this, we first define the mathematical framework and fix the notation for describing AMMs (section 3.1), and then we do the same with respect to arbitrage transactions (section 3.2). In section 3.3, finally, we move on to our central result – defining the Marginal Price Formulation of the arbitrage problem in definition 18, and showing the equivalence with its Convex Optimization Formulation from definition 3, in what we refer to as the Core Equivalence Theorem, theorem 19.
3.1 Definitions and results related to AMMs
A token basket y is said to be larger than a token basket x (denoted y > x) iff ∀i : y_i ≥ x_i and ∃i* : y_i* > x_i*. The definitions of smaller than and larger/smaller or equal are analogous.

It must be understood that the relation y > x only introduces a partial order on token baskets: for example, there is no relation between a basket that holds 1 WETH and one that holds 1,000 USDC.
This definition can be extended to token baskets in the obvious manner^13, and if the non-improving prices condition holds for all token baskets x, y then we define this as non-improving prices in every direction.

In other words, non-improving prices mean that the price for the “second dollar sold” cannot be better than the one for the “first dollar sold”, both for tokens and token baskets. However, the prices can be the same.

^13 We can reduce the multi-dimensional problem to a one-dimensional one by choosing two non-overlapping directions represented by x, y where x represents the inputs and y the outputs. If we fix the multiplier λ_x then the AMM will determine the multiplier λ_y such that it considers the exchange of λ_x · x against λ_y · y as fair. Non-improving prices in this context means that the “marginal basket exchange rate” dλ_y/dλ_x is decreasing in λ_x, ie whatever x, y, λ_x we have d²λ_y/dλ_x² < 0.
AMMs can be bi-directional if they trade in both directions and traded tokens automatically move “to the other side of the curve”, or they can be directed if tokens can only be traded once. Bi-directional AMMs can impose trading fees or not^15.

In other words, if the AMM is currently in the state represented by x, and the basket y represents a state such that f(y) = f(x) = k, then it will accept the basket of all tokens for which y_i − x_i > 0 in exchange for the basket of all tokens where y_i − x_i < 0.
Proof. Firstly, the fact that f must be constant (and is therefore just on the border of being convex) in the directions of tokens the AMM does not cover is trivial. The remainder is a well known result (eg referred to in [6]). To sketch a full proof, in the one dimensional case it is easy to see from the definition of the price p, which corresponds to the negative of the first derivative p = −f′, and the fact that for a convex function we have the second derivative f″ > 0. In the higher dimensional case we note that this must hold in every token direction and can therefore be reformulated with the λ_y(λ_x) functions defined above. ■

^14 Covering the full range of prices means that the function λ_y(λ_x) is defined for all λ_x ∈ (0, ∞) and bijective.
^15 Note that financially there is no difference between fees and a bid-ask spread, and for directed AMMs the notion of fees – the difference between the buying and selling price of the AMM – does not make sense.
Those definitions coincide with the AMM examples provided in section 1.2, where equation 1.1 is an example of the bonding curve of a multi-asset AMM operating on token balance vectors x, y, and both equations 1.3 and 1.5 are examples of bonding curves of single-asset AMMs operating on single token balances x, y. We note that all those bonding curves are convex. For levered AMMs, the values of x, y have to be restricted to the range covered by the AMM.
Note that in the presence of unlimited and cost free flash loans (ie loans that can
be taken out in unlimited size for all tokens and that have to be repaid at the end of
the transaction) the above condition can be simplified to the requirement that it is not
possible to start with a balance of x = 0 and end with a balance of y > 0 where the last
inequality is to be interpreted in line with definition 4.
In other words, the circularity condition states that if we exchange infinitesimal amounts17
of tokens along any closed loop, then we end up with the same token amount we started
with.
Proof. Before we move on to prove this we point out that this is a well known result in
finance, and we refer to [24] as one of many examples. However, as the proof is directly
linked to the main topic of this paper we present it here anyway, in a very condensed
form. Starting with (1), we note that bi-directional unlevered AMMs that do not charge
fees buy and sell at exactly the same price. Therefore, if we have two AMMs that operate
on the same token pair at two different prices p1 ̸= p2 then buying low, selling high would
be an arbitrage transaction. Note that here the convexity condition is important, ie the
first dollar traded must always be at the best price, otherwise we may be able to get
additional arbitrages by trading a larger amount. Similarly in (2), we first note that we
can move along the loop in either direction, and the resulting price in one direction p_+ will be the inverse of the price in the other direction, ie p_+ = 1/p_−. Unless the circular product
is unity we can always choose a direction in which the circular product in definition 11
is strictly greater than one, and thus we can make an arbitrage profit by trading along
the loop in that direction. This concludes the proof in both directions. ■
Proof. Proving that circularity follows from the existence of the p_i is straightforward, replacing p_marg(i, j) with p_i/p_j in definition 11 and observing that all terms cancel out. Going the other way, we define p_i ≡ p_marg(i, 1) as the price of token i in terms of the “numeraire token” 1. For every token pair i, j we can look at the loop (1, i, j, 1). Because it is a closed loop, the product of marginal prices is unity, and therefore we have

    p_marg(i, j) = p_marg(i, 1) / p_marg(j, 1) ≡ p_i/p_j    (3.1)
    π_ij = π_i/π_j

We define an equivalence relation between price vectors, denoted “=”, where two price vectors π^a and π^b are equivalent iff their price functions coincide, ie

    π^a = π^b  ⟺  ∀i, j : π^a_ij = π^b_ij

meaning the ratios π^a_i/π^a_j = π^b_i/π^b_j are the same.

The purpose of the above definition is to abstract the price information in a fully arbitraged market. We note that the price vector π itself should never be used directly because it is only defined up to a multiplicative constant. Instead, all usage of π should respect the associated equivalence relation, which can be assured by always using the associated price function π_ij instead of the components π_i.

However, the price vector π is a valid mathematical object that resides in the reduced dimensional space where all prices are positive and where the numeraire token is fixed at unity (ie π ∈ R^{N−1}_{>0} × {1}).
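A small numeric illustration of the price vector and the circularity condition (the numbers are made up): given a price vector quoted in a numeraire, all pairwise marginal prices follow from equation 3.1, and any circular product of such prices is automatically unity.

    # Hypothetical price vector for (USDC, WETH, WBTC), quoted in the numeraire USDC.
    pi = {"USDC": 1.0, "WETH": 2_000.0, "WBTC": 40_000.0}

    def pair_price(i: str, j: str) -> float:
        """Pairwise price pi_ij = pi_i / pi_j (units of j per i)."""
        return pi[i] / pi[j]

    # Circularity: the product of prices along the closed loop
    # USDC -> WETH -> WBTC -> USDC is exactly one, so no circular arbitrage exists.
    loop = ["USDC", "WETH", "WBTC", "USDC"]
    product = 1.0
    for a, b in zip(loop, loop[1:]):
        product *= pair_price(a, b)
    print(product)   # 1.0 (up to floating point error)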
At this stage, we are ready to move on to prove a more general proposition that covers
levered and unlevered AMMs:
Proof. First we note that, in case of unlevered AMMs, this proposition reduces to
proposition 12 because (a) the closest attainable price will simply be pi /pj , so (b) all
AMMs exchanging tokens i, j will be set at the same price pi /pj that (c) satisfies the
circularity condition in definition 11 because of proposition 13. For a levered AMM, we
firstly note that if the current marginal price is not at a boundary, for small trades the
levered AMM behaves like an unlevered AMM, therefore the same reasoning applies and
its marginal price must be at pi /pj as in the unlevered case. If the price of the AMM
was at the boundary away from pi /pj then someone trading against the AMM could buy
low at pi /pj from another curve and sell high into the AMM stuck at the far boundary,
thereby moving the price closer towards pi /pj . The only point where this trade is not
possible is at the boundary closest to pi /pj because at this point the AMM will no longer
buy. ■
Next we define the price response function (“PRF”) that indicates how an AMM – or a set of AMMs – responds to a change in price(s):
Definition 16 (Price response function). Given a set of AMMs ν = 1 . . . M and a set of tokens i = 1 . . . N, the individual PRF ρ_ν of AMM ν is an equivalence-respecting^18 function that maps a price vector π to a set of token changes

    ρ_ν(π) = (∆x_νi)

The aggregate PRF ρ of a set of AMMs is the sum of the individual PRFs

    ρ(π) = Σ_{ν=1}^{M} ρ_ν(π)

By convention, outflows from the AMM are negative, and inflows are positive, therefore the pre-trade balances x^pre and post-trade balances x^post satisfy

    x^post_νi = x^pre_νi + ∆x_νi
Arguably, the PRFs are the most important financial objects that we are dealing with. They are equivalent to, but more financially relevant than, the usual invariant functions in equations 1.1, 1.3, and 1.5. For a traditional AMM without fees they are path independent, meaning that aggregating the PRF results over any price path π_1, π_2, . . . π_N is the same as going directly to π_N, the end point^19. However, in the presence of fees, things change. Firstly, longer paths lead to a higher fee bleed. Moreover, if fees accumulate on the curve, the invariant curve changes and therefore, so does the PRF. Finally, in a directed AMM like Carbon DeFi, curves do not automatically reload^20, so in this case the PRF is usually highly path-dependent.
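For a single unlevered constant product curve, the PRF can be written down in closed form, which the following sketch illustrates (our own illustrative code with made-up balances; x is the risk-asset balance, y the numeraire balance, and prices are quoted in y per x):

    import math

    def prf_constant_product(x: float, y: float, price: float) -> tuple[float, float]:
        """Price response of an unlevered curve x*y = k moved to a target marginal price.

        Returns (dx, dy): token changes seen from the AMM, inflows positive and
        outflows negative, consistent with the convention of definition 16.
        """
        k = x * y
        x_new = math.sqrt(k / price)   # at marginal price p: x = sqrt(k/p), y = sqrt(k*p)
        y_new = math.sqrt(k * price)
        return x_new - x, y_new - y

    # Curve holding 10 ETH / 20,000 USDC (marginal price 2,000) moved to a price of 2,100:
    dx, dy = prf_constant_product(10.0, 20_000.0, 2_100.0)
    print(round(dx, 4), round(dy, 2))   # ETH flows out (negative), USDC flows in (positive)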
Definition 17 (Trade instruction matrix). A trade instruction matrix (“TIM”) for a set of AMMs ν = 1 . . . M and a set of tokens i = 1 . . . N is a matrix (∆x_νi) that describes the flows of token i into (positive) and out of (negative) the AMM ν. A TIM is called “respecting a set of self financing constraints” if its aggregate over all ν fulfils the constraints from equation 2.9. It is called an “arbitrage TIM” if it fulfils the arbitrage SFC in equation 2.8. In those two cases we refer to Σ_ν ∆x_νi as the result^21 of the arbitrage finding or routing process. In case of a pure arbitrage, the negative result (a positive number) is also referred to as the arbitrage profit.
We are now ready to present the mathematical core of this paper, the claim that the marginal price optimization problem described in this paper, which forms the basis of operations for the FastLane Arbitrage bot [15], is equivalent to the convex optimization problem described in section 2 based on [6]. To do this, we first formally define the “Marginal Price Formulation” of the problem:

Definition 18 (Marginal Price Formulation). The Marginal Price Formulation of the arbitrage problem is to find a price vector π such that

    ρ(π) = −λe_0

where e_0 is the unit vector of the zeroth vector component^22 and the “result” (in the sense of definition 17) λ ≥ 0 is a scalar determined by the algorithm via the trade instruction matrix that we will show to be non-negative below in proposition 20.

Note that whilst the optimal routing problem is out of scope in this paper, we still want to record the fact that for routing the above equation will become
    ρ(π) = −λe_0 + x

where the vector x contains the desired inflows and outflows in tokens other than token 0, as in equation 2.9.

^21 This terminology is driven by the usage of the term “result” within a convex optimization context.
^22 Again, like in the footnote to equation 2.10, the choice of token 0 is WLOG and for simplicity of presentation only; this condition states that this is a pure arbitrage transaction where the profits, if any, are taken in token 0.
We now show that the above Marginal Price Formulation is equivalent to the Convex Optimization Formulation that was the subject of section 2, and specifically defined in definition 3. En passant we note that this is essentially the well-known result from convex optimization theory linking a problem and its conjugate as used in [12], but as our focus is on finance as opposed to pure mathematics we want to provide a self-contained, more financial proof.
Theorem 19 (Core Equivalence). The problem of finding arbitrages (or routing optimally) in a set of AMMs in the Marginal Price Formulation as described in definition 18 is equivalent to the Convex Optimization Formulation as described in definition 3. Specifically, the trade instruction matrix (and therefore the arbitrage profit) obtained by both formulations will be the same.

Proof. To prove the above, we start with creating a list of items where the two formulations coincide, and where they differ. The formulations coincide in the following items:

1. They are solving the same arbitrage finding problem as presented in definition 1.

2. They start with a set of AMM curves satisfying invariant equations along the lines of definition 8, and holding the associated amounts of tokens pre-arbitrage.
The convex optimization formulation (“COF”) and marginal price formulation (“MPF”)
differ in the following way:
1. The COF seeks to minimize a target function in line with equation 2.10 whilst the
MPF seeks to find the root according to definition 18.
2. The COF algorithm operates directly on the AMM token holdings xνi whilst the
MPF algorithm operates on a marginal price vector πi according to definition 14.
To prove equivalence, we have to either (a) show that they are the same, or (b) show that the MPF solution satisfies the COF conditions and vice versa. We start with a non-rigorous argument for (a) by pointing out that both are solving the financial real-world arbitrage problem 1 which, in a non-path-dependent environment, has a unique solution^24. However, we have pointed out before that Carbon DeFi positions introduce path dependence, and we can only accept this as a proof when no directional curves are in the curve set.

For proving the theorem along the lines of (b), we first show that the optimal solution in the COF framework satisfies the MPF conditions. For this we point out that if the marginal prices were not to satisfy the price conditions in proposition 12, then additional profits could be generated by “buying low, selling high”, and therefore the convex optimization process would not have worked as advertised^25.
Now we go the other way and show that a solution of the MPF satisfies the constraints of the COF and minimizes the target function. By design, the MPF solution satisfies the positivity constraints from equation 2.1, the self-financing constraints from equation 2.8 and the curve constraints from either equation 2.2 or 2.3. To show that this also minimizes the (negative) target function, we note that by construction the state of the market is arbitrage free, and the existence of a state with a bigger outflow under the same self-financing constraints would imply that there were arbitrages available in the initial state. ■

^24 Unique in the sense that if there were multiple solutions one could always trade from one to the other, so any algorithm that gets stuck on a sub-optimal solution does not in fact solve the arbitrage problem; there can, however, be cases where more than one set of trade instructions yields the same arbitrage profit, in which case both algorithms may find either of them depending on starting conditions and algorithm details.
^25 We had some concern in cases where profit repatriation was an issue, but as we have shown in detail in appendix B that is in fact not the case.
Proof. For the may or may not exist part, nothing needs to be proved. All other
properties follow directly from the Core Equivalence Theorem 19, and the fact that the
empty solution (zero everywhere) will either dominate the convex solution, or be the
solution. ■
We want to briefly elaborate on the may or may not exist part, using financial arguments. Firstly, unlevered curves can take up any number of tokens, so any routing constraint pushing tokens into the system will not usually be a problem. Provided there is a route from all inputs to the desired output token, the routing problem will always have a solution. Constraints taking tokens out of the system are limited by the number of tokens available in the system, however, so if this number is increased, at some point there will no longer be a solution. Tokens that only live on levered curves will also have a maximum amount that can be pushed in.
Whilst the marginal price algorithm is in principle the same on token pairs and on token sets containing more than two tokens, there are important numerical differences. Most importantly, on token pairs the problem is a one-dimensional root finding problem, and according to the intermediate value theorem [1] we are guaranteed to find a root^26 if we can bracket it. Convergence will be in logarithmic time – every step will increase the precision by a factor of two.

^26 Or rather, a root location, if we also consider step functions that may not technically have a root, a case that is of practical importance for us; see appendix D.1 for details.
Having said this – our problem is more benign than the general mathematical framework may suggest. After all, we are solving a real world problem in finance, and as shown in [6], the problem is convex and therefore has a unique solution. Specifically, the arbitrage problem is always dominated by the null solution, so either a proper arbitrage solution exists, or “do nothing” is the formal solution to the arbitrage problem. The issue is therefore less the question of existence and uniqueness^27, but rather the question of how to find the solution within a reasonable amount of time.
We have split the discussion into multiple parts. First, we discuss the implementation in
the case of a single token pair using bisection in section 4.1. We then discuss the generic
case using Newton-Raphson / gradient in section 4.2, and finally we deal with the topic
of convergence in section 4.3.
On token pairs, the arbitrage problem boils down to finding a single price where the net flow of all tokens other than the target token is zero for arbitrage, or a specific number for optimal routing^28.

^27 Again, financially it is clear that trade instructions cannot be unique in the general case; for example, consider two zero-slippage curves covering the same pair where any routing of the required amount through the two curves will be a solution.
^28 The marginal price routing algorithm is the one used when trading on Carbon DeFi via the canonical user interface.

This means that in two dimensions, our general root finding problem without fees looks
generally like the different graphs depicted in figure 4.1: an unlevered curve is a simple
convex line (blue, 1), a single levered curve is a convex segment in between two flat
areas (orange, 2), and multiple levered curves at different prices correspond to a series
of convex segments separated by flat areas (green, 3).
Figure 4.1: Price response function with no fees. Price response function, defined as
change in token amounts against price for (1) a single unlevered curve, (2) a single
levered curve, and (3) multiple levered curves, all curves with no fees and all at current
price p = 100.
Note that if we include fees the picture changes considerably in that we get a flat area
inserted at the current price point which depicts the boundary between buying and
selling. The width of the flat area is the current price multiplied by the percentage fee
charged. This is shown in figure 4.2 for an unlevered curve in the left hand panel and a
levered curve in the right hand panel.
For one dimensional root finding problems, there are fundamentally two methods: the bisection method and the Newton-Raphson method.
Figure 4.2: Comparison of price response function with and without fees. Price response function, defined as change in token amounts against price for (1) unlevered curves, (2) levered curves, (a) without and (b) with fees, all at current price p = 100.

As discussed in appendix D.1.3, the bisection method is extremely robust in that it will either fail right from the start (because the bracketing did not yield two points with an opposite sign), or it is guaranteed to converge to the location of either a root in case the function is continuous (cf left hand panel in figure D.1), or a root location where the target function changes its sign in case the function is not continuous and jumps across the x-axis (cf right hand panel in figure D.1).
Depending on the shape of the function, the Newton-Raphson method can be much
faster than the bisection method. As we discuss in appendix D.2.1, if the function is
linear, convergence is in one step. Generally, on convex functions convergence is fast
regardless of the starting point (see figures D.4 and D.5 for a few examples). However,
for functions that change convexity, or that are not continuously differentiable, a number
of bad things can happen, notably the algorithm can go into an infinite cycle (see figure
D.5 bottom right panel), or the new sampling point can be catapulted to “infinity” when
the function is very flat at the current sampling point, and depending on the shape of
the function it can be impossible to recover from that.
For example, consider the green function (3) in figure 4.1 in the price area around p ≃ 80.
The function is flat and therefore any gradient descent will fail. We could regularize the
function somewhat (eg by adding a small slope to ensure that the gradient is never zero).
However, even this regularization would lead the algorithm very far out, into another
flat area, from where the regularization would lead it back. In this case we can have
either of two eventual outcomes: the algorithm may eventually end up in the price area
of the curve that contains the root29 , or it may end up in an infinite cycle. In practice,
we will most likely see a large number of iterations and we will run into our maximum
iteration boundary. In any case – those kinds of scenarios are extremely inefficient for a Newton-Raphson algorithm: either it takes a long time to converge, or it takes even longer until we decide that it does not converge in the allotted time.
Generally in the pair case, the improved speed of the Newton-Raphson method in our ex-
perience does not outweigh the risk of non-convergence, so we exclusively use a bisection
method when we deal with arbitrage or routing within a single token pair.
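To make the pair-mode procedure concrete, the following is a minimal sketch of the idea – not our production code. It assumes all curves are levered constant-product curves on the same pair, parameterized by a liquidity constant and a price range, and all names (Curve, net_x_flow, pair_arbitrage_price) are illustrative. The target function is the aggregate change in the non-target token when every curve is moved to a common marginal price p, and we bisect on it.

```python
import math

class Curve:
    """A levered constant-product curve on one pair, quoted as target token Y per token X.
    Illustrative parameterization: liquidity L on the price range [pa, pb], current price p0."""
    def __init__(self, L, pa, pb, p0):
        self.L, self.pa, self.pb, self.p0 = L, pa, pb, p0

    def x_holdings(self, p):
        """Virtual X holdings of the curve when its marginal price is p (clamped to the range)."""
        p = min(max(p, self.pa), self.pb)
        return self.L * (1.0 / math.sqrt(p) - 1.0 / math.sqrt(self.pb))

    def dx(self, p):
        """Change in the curve's X holdings when its price is moved from p0 to p."""
        return self.x_holdings(p) - self.x_holdings(self.p0)

def net_x_flow(p, curves):
    """Aggregate X flow into the curves at common marginal price p; zero at the arbitrage point."""
    return sum(c.dx(p) for c in curves)

def pair_arbitrage_price(curves, tol=1e-12):
    """Bisect on the aggregate price response to find the common marginal price."""
    lo, hi = min(c.pa for c in curves), max(c.pb for c in curves)
    f_lo = net_x_flow(lo, curves)
    if f_lo * net_x_flow(hi, curves) > 0:
        return None                      # no sign change on the bracket: the null solution wins
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        f_mid = net_x_flow(mid, curves)
        if f_mid * f_lo > 0:
            lo, f_lo = mid, f_mid        # root (location) is in the upper half of the bracket
        else:
            hi = mid                     # root (location) is in the lower half of the bracket
    return 0.5 * (lo + hi)

# two otherwise identical curves at different current prices create an arbitrage opportunity
curves = [Curve(L=1000, pa=90, pb=110, p0=95), Curve(L=1000, pa=90, pb=110, p0=105)]
p_star = pair_arbitrage_price(curves)    # close to 100 in this symmetric example
```

At p_star the net X flow is zero; the corresponding Y flows (not shown) then sum to the arbitrage profit in the target token.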
We do need to qualify the robustness claim made above though because, for non-
continuous functions, it only applies in price-space. However we seek to specify transac-
tions that live in token-amount-space, so convergence in price-space is not enough. This
is not an issue with the bisection algorithm specifically, but rather with converting a token space problem into a price space problem, which is the foundation of the marginal price method. It is particularly annoying, however, in the case of the bisection method, because it somewhat torpedoes the robustness properties of this algorithm.
The functions that pose problems are those that are discontinuous at their root location
(ie that jump from positive to negative or vice versa). In practical applications this also
includes functions that are numerically discontinuous, by which we mean functions
that have such a steep gradient that, within the resolution of the algorithm, they appear
to be discontinuous. Those functions are easily identified: they are what in Carbon DeFi
[31] we refer to as "limit orders" (ie orders where the parameter A = 0 and where the start and end price are the same). In practice, very narrow ranges with A ⪆ 0 also pose a problem, because they are what we referred to as "numerically discontinuous" above30.
29 This is the equivalent of ending up by chance on the correct face of the radome in the example of appendix A.
In the introductory section 1.3 on optimization, we have introduced, in equation 1.7, the
convolution method to regularize a function. We could apply this method here, but in
practice we find it easier to enforce a minimum width for limit orders. Specifically, we
enforce values of A, B so that they satisfy the condition above with ε0 ≃ 10−6 , and we
adjust both A and B if this is not the case, ensuring that the adjustment is such that
the effective price of the order, when fully executed, remains the same.
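The sketch below illustrates the idea of this regularization in terms of the order's price bounds rather than the A, B parameters themselves (the exact production rule differs); the width measure and the value of eps are illustrative. For a constant-product-style range order the effective price of full execution is the geometric mean of the two bounds, so widening the order geometrically around that mean leaves it unchanged.

```python
import math

def regularize_limit_order(p_start, p_end, eps=1e-6):
    """
    Widen a (numerically) degenerate limit order p_start ~= p_end into a
    minimal-width range order, keeping the geometric mean of the bounds
    (and hence the effective full-execution price) unchanged.
    """
    hi, lo = max(p_start, p_end), min(p_start, p_end)
    if (hi - lo) / hi >= eps:
        return p_start, p_end                    # already wide enough, leave untouched
    p_eff = math.sqrt(p_start * p_end)           # effective full-execution price
    scale = 1.0 / math.sqrt(1.0 - eps)           # relative half-width so that (hi-lo)/hi == eps
    return p_eff * scale, p_eff / scale          # widened (start, end) around the same p_eff

# a pure limit order at price 100 becomes a very narrow range centered (geometrically) on 100
print(regularize_limit_order(100.0, 100.0))
```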
We now move on to the higher dimensional case, ie everything where more than two
tokens are involved. As discussed in appendix D.1.4, there is no equivalent of the bisec-
tion method in dimensions higher than one, so we are forced to use a multi-dimensional
Newton-Raphson method as described in appendix D.2.3. If we have N + 1 tokens
i = 0 . . . N, this algorithm works as follows:
1. Calculate the aggregate price response function (the net token flows) and its Jacobian at the current price vector.
2. Invert the Jacobian to solve the linear approximation of the function, as described in equations D.7 and D.8.
3. Update the price vector according to equation D.9, taking into account the learning rate η if need be.
These steps are repeated until the convergence criteria discussed below are satisfied, or until the maximum number of iterations is exceeded.
This algorithm looks deceptively easy, but for it to work in a production setting, a
number of points need to be considered, most importantly the following:
Log Prices. We initially implemented the algorithm using actual prices. This ran
into a number of issues, most importantly that sometimes the algorithm ended up in a
situation where the price was negative. Also, because price levels in crypto can range from below 10−6 dollars to almost 100,000 dollars, the numerical conditioning is not ideal,
in particular for calculating the derivatives of the price response function (definition 16)
for the Jacobian. We therefore switched to log prices; we define x = log π and we perform
all calculations for the algorithm in the log space x, not in the price space π.
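For illustration, a minimal sketch of the resulting update in log-price space (not the production implementation; F and all names are placeholders): F maps the log-price vector x to the vector of net token flows that we drive to zero, its Jacobian is obtained numerically by perturbation, and the Newton-Raphson step follows equations D.7-D.9.

```python
import numpy as np

def jacobian(F, x, h=1e-6):
    """Numerical Jacobian of the aggregate price response F at the log-price vector x."""
    n = len(x)
    J = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = h
        J[:, j] = (F(x + dx) - F(x - dx)) / (2.0 * h)   # central differences, cf equation D.3
    return J

def newton_step(F, x, eta=1.0):
    """One Newton-Raphson update of the log prices; eta is the optional learning rate."""
    step = np.linalg.solve(jacobian(F, x), F(x))         # solve the linearized problem
    return x - eta * step
```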
Curves with no closed form solutions. Because we calculate the derivative of the
price response function (definition 16) numerically via perturbation, and this calcula-
tion is at the core of the algorithm, it is important that this calculation is fast and
consistent32 . Calculating derivatives from something that itself relies on numerical ap-
proximations is often slow and error prone – so in cases where we do not have a closed
form solution for the PRF of a specific AMM, we approximate the curve with a sufficient number of levered constant product AMMs placed next to each other and use those for
the calculations.
Singular Jacobian. The Jacobian can become singular, in which case the naïve algorithm fails. We have implemented a fallback algorithm that inverts the Jacobian only on its image, and does not attempt inversion on its null space. This improves performance in higher dimensional cases, because there is at least a chance that the algorithm either ends up in a non-singular place, or that the singularity corresponds to prices that ultimately do not matter for the arbitrage problem at hand. Note that singularities typically occur if at a certain price level there are no curves that allow trading the pair corresponding to that price. Because of the market structure on crypto markets, where most AMM curves are against one of (W)ETH, WBTC or a USD stable coin, this is particularly pertinent if an unusual target token is chosen, which in turn suggests that it may be better to choose one of the aforementioned tokens as target token.
31 We currently calculate derivatives along the blue (sum first) path we will define in diagram 5.1 in the next section, section 5. Please refer to the discussion in that section for the implications thereof.
32 Please refer to the next section, specifically section 5.4, for a discussion of how the calculation of the Jacobian is the main numerical effort of the current algorithm.
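A minimal sketch of such a fallback (assuming numpy; the production logic differs in detail): when the Jacobian is singular or numerically close to it, replace the full solve by a pseudo-inverse solve, which inverts the Jacobian on its image and leaves the null-space directions untouched instead of failing outright.

```python
import numpy as np

def solve_newton_step(J, f, rcond=1e-12):
    """Solve J @ step = f, falling back to the pseudo-inverse when J is (near) singular."""
    if np.linalg.cond(J) > 1.0 / rcond:              # near-singular: invert on the image only
        return np.linalg.pinv(J, rcond=rcond) @ f
    return np.linalg.solve(J, f)
```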
Learning rate. We have experimented with changing the learning rate η to improve convergence, but we found that it did not make a significant difference. In reality, the biggest issue is that, because of lack of liquidity in a certain price region, prices are catapulted towards infinity, and catapulting them towards "η times infinity" does not seem to lead to a significant improvement. On the other hand, choosing η < 1 is costly. In figure D.6 in the appendix we have illustrated the impact of the learning rate η on the Newton-Raphson algorithm. Abstracting the findings from this further, we have plotted in figure 4.3 the slowdown factor n for a given convergence level d as a function of the learning rate η. The convergence level d ∈ (0, 1) describes the residual distance to the real solution as a fraction of the original distance (eg d = 0.01 means that the distance has been reduced by a factor of 100). The slowdown factor n is the number of iterations required to reach this level of convergence. We note that the slowdown factor is a real cost: running the algorithm is computationally expensive, the slowdown goes directly to the running cost bottom line, and it can also affect latency, which matters on fast chains.
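For orientation only (an idealized back-of-the-envelope relation for a locally linear target function, in the spirit of appendix D.2.2, not a statement about the exact curves in figure 4.3): with learning rate η each step then shrinks the remaining distance by a factor (1 − η), so the number of iterations needed to reach convergence level d is roughly

\[
n(d, \eta) \;\approx\; \frac{\ln d}{\ln(1-\eta)},
\qquad\text{eg } n(0.01,\,0.5) \approx 6.6,
\]

whereas at η = 1 a single step suffices in this idealized case.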
Figure 4.3: Slowdown in convergence as a function of the learning rate. Slowdown factor (ie number of iterations required) n to achieve various convergence levels d as a function of the learning rate η, regular (left) and log scale (right).
Convergence criteria. We initially used a relative convergence criterion (ie that the algorithm no longer changes prices). In some ways this is a good criterion, because we operate in log space, so the criterion is effectively the average percentage change. The issue is that with this criterion the algorithm relatively often indicated it had converged, even though it had simply ended up in a region with no liquidity at all and, therefore, where the Jacobian was flat zero. We could have tested for this, but we ultimately chose an absolute criterion along the lines of "average violation of the self-financing constraints < 1 USD", which is more financially meaningful. The downside in this case was that we always needed to provide USD prices to the algorithm, even if no USD curves were involved in the arbitrage problem at hand. Also, detecting divergence can take longer if the algorithm is stuck close to infinity – the relative criterion would trigger right away, falsely claiming convergence, whereas the absolute criterion will eventually rightly report non-convergence, but it takes longer to do so because it runs empty cycles.
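A minimal sketch of such an absolute check (the data structures and the 1 USD threshold are illustrative):

```python
def has_converged(net_flows, usd_prices, threshold_usd=1.0):
    """Absolute criterion: average self-financing violation, expressed in USD, below a threshold.
    net_flows and usd_prices are dictionaries keyed by token."""
    violations_usd = [abs(q) * usd_prices[token] for token, q in net_flows.items()]
    return sum(violations_usd) / len(violations_usd) < threshold_usd
```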
Convergence issues. Regardless of the criterion used, the algorithm sometimes diverges,
even though a solution exists. We have already discussed the issue in relation to the
convex optimization algorithm in section 2.3, and whilst the marginal price algorithm
converges better, it still has a substantial failure rate when very thin curves are involved.
As in the example discussed in the intro to section 4, the algorithm generally creates a set of trade instructions that touches many, and possibly all, curves. Having a transaction with so
many curves has two drawbacks. Most importantly, the more curves are involved, the
more likely the whole transaction is to fail because the blockchain state has changed by
the time the transaction is included. Additionally, if there are limited flashloan oppor-
tunities and we have only limited token amounts, linearization33 of the transaction can
be complex. The latter problem can be dealt with, but the former remains: we usually
want to decompose the transaction into smaller ones, and prioritize their submission
based on profitability and complexity.
4.3 Convergence
Convergence of the marginal price algorithm is significantly better than that of the convex optimization algorithm from [6]. For example, the example discussed in section 2.3 converges well to the same result as in table 2.4, without need for a sentinel curve like in figure 2.4. However, even the marginal price algorithm can run into convergence issues. We have thus far identified two scenarios that do lead to divergence:
• escape scenario: the gradient catapults the algorithm into a region where no
curves are located, at which point it is either blocked, or jumps around erratically
• loop scenario: the algorithm enters an infinite loop without ever converging to a
solution, or at least not within the maximum iterations allowed
Both scenarios are shown in figure 4.4. All panels depict price response functions of
different scenarios, and in all cases the solution dx(p) = 0 is found in segment (2) and
convergence is fast provided the algorithm starts in, or ever reaches, segment (2). In
each of the cases however, whenever the algorithm starts in region (1) it will diverge.
The left and middle panels show escape scenarios. In the left panel, the curve segment
(1) is relatively steep so the next point will be in segment (3), close to segment (2).
33 The term "linearization" refers to the process of creating a transaction (based on own token holdings or flashloans) that can be executed in a linear manner without any intermediate token balances ever falling below zero.
Figure 4.4: Divergent scenarios for the marginal price optimizer. The panels above
show price response functions where the result is in segment (2) and they diverge when
starting in segment (1). The two left panels escape to segment (3), the left one closely,
the middle one far away. The right panel enters into an infinite loop (1), (3), (1), etc.
In the middle panel, the curve is very flat and the escape is towards infinity. In those
particular cases this does not make a big difference – both algorithms will detect a zero
gradient, at which point they will fail. What this shows however is that a learning rate
η (as discussed in section 4.2) may or may not be helpful: Whilst in the left panel, an
η ≃ 0.8 would bring us into segment (2) from where we would converge, in the middle
panel we would need a really small η to not end up in the empty region (3). However,
this would be too detrimental for the overall speed of convergence of the algorithm, as
discussed in appendix D.2.2.
The right panel shows a loop scenario similar to the one shown in the bottom right panel
of figure D.5. Here, we have three curve segments, so (4) instead of (3) indicates the
segment without curves. Here, if we start in segment (1), the algorithm will bring us to
(3). From there, we will return to (1) and the cycle restarts. Depending on the exact
starting point and shape of the curve, the algorithm may eventually escape to segment
(4) or its counterpart on the left, may hit segment (2) and will converge, or it may
be stuck in an infinite loop. In reality this does not usually matter because we cannot
afford to run the algorithm until we find out, so each of those three cases will hit the
maximum-iterations boundary.
Whilst neither of the scenarios converges, in the above two dimensional case there is an
important difference: the escape scenario diverges very quickly – as soon as the algorithm
enters the empty space it will detect a zero gradient, at which point it will know it has
failed and will terminate. The loop scenario is harder to detect, and in fact we only
detect it via hitting the maximum iterations threshold. This of course is expensive, and
careful management of this boundary is important because it will significantly impact the
performance of the algorithm. We note that the bona fide escape scenario is relatively
rare: whenever there is an unlevered curve present in the analysis, the gradient will
never be fully zero, in which case we will end up in a variation of the loop scenario.
In higher dimensions the issues are no different, except that one can have a convergent and all types of divergent scenarios present at the same time, depending on the direction. We currently only terminate if the Jacobian is zero. If it is singular – meaning that we are in the empty space in at least one of the directions – we instead invert the Jacobian on the invertible sub space only, in the hope that this returns us to a region where the entire Jacobian is invertible again. We are not currently certain whether or not this is the right choice, because if we do not quickly return to the core region of convergence we simply waste more iterations on the problem before we diverge anyway. Currently we are leaving it in because we value convergence over speed.
5 Performance comparison
In this section, we report the results of our performance comparison between the different
algorithms. The software and hardware details are in table 5.1. The performance of this
machine is comparable to the cloud servers we use in the production environment, so
the numbers, which range from milliseconds to seconds, are indicative of what we can
expect there.
Table 5.1: Software and hardware details.
            version
system      MacBook Air M3 2024
OS          MacOS 15.0
RAM         16 GB
python      3.12.4
cvxpy       1.5.2
numpy       1.26.4
pandas      1.5.3
matplotlib  3.9.1
networkx    3.3
Times are measured using simple wall clock time from start to end of the calculation, including for the profiling of the code in section 5.4, which used instruments introduced into the code that recorded wall clock readings along important waypoints of the execution and then aggregated the results into an overall time-spent-per-stage reading. We are well
aware of the theoretical limitations of this approach, and we have taken measures to
address those. For example, for measuring short running processes we have repeated
the measurements up to 10-100x. The ultimate size of the effect observed in the results
suggests that those measures were sufficient for our purposes: our core result shows a 20x
to 100x+ improvement for the marginal price algorithm over the convex ones from [6]34 ,
which is beyond any noise or bias that could possibly be introduced by our measurement
protocol.
We compare the following algorithms:
1. Marginal price optimization in pair mode, using the bisection method described in section 4.1
2. Marginal price optimization in full mode, using the Newton-Raphson method described in appendix D.2, which is what we call the full mode because there is no restriction on the number of tokens
3. Convex optimization with CVXPY [9, 13, 8] using the new default solver Clarabel
4. Ditto using the older solvers ECOS and SCS which ex post we group together
because their performance here is very similar
We start the analysis on token pairs, the only arena where all contestants, including
our marginal price pair mode, can compete. For this, we run the algorithms on two
tokens, and we vary the number of curves from 2 to 2,000. We note that whilst using
traditional AMMs even the most crowded pairs will not usually present more than a dozen curves, we have developed the marginal price algorithm for use together with the Carbon DeFi protocol [31, 32, 18], where every single trading position is a curve.
For busy markets, a few thousand positions is not a particularly high number.
The key results we have obtained for token pairs are summarized in figure 5.1. Note
that we limited our charts to 200 curves in this case – our analysis ran further (up to
2,000) but nothing meaningfully different happened there, so we chose the tighter range
for a more effective presentation. The left panel of figure 5.1 shows the performance of
all algorithms on a linear scale, and the right panel zooms into the two marginal price
algorithms only. We make the following observations, most of which are representative of the more general cases which we will discuss in the following sections.
1. The marginal price modes outperform the convex modes by a massive margin, to
the point that the curve of the full marginal price mode appears flat in the chart.
2. The pair mode is substantially slower than the full mode. The ratio is somewhat volatile but as the right panel shows, the speedup of the full mode versus the pair mode is at least 5-10x, except for the very smallest number of curves where the fixed costs dominate the algorithm.
Figure 5.1: Calculation time versus number of curves (token pairs). The left hand panel shows the calculation time in ms for the different algorithms (orange and blue marginal price; the others convex) on token pairs. The right hand panel shows the same data for the marginal price algorithms only, plus the speedup ratio between the two (red surface).
3. The convex algorithms group into ECOS and SCS on one hand, and Clarabel, the new default, on the other. None of them is competitive with even the pair mode, let alone the full mode, and Clarabel performs significantly worse than the older ECOS/SCS.
4. None of the convex algorithms displays a performance that would be usable for real life arbitrage purposes across a large number of curves – the calculation time required scales approximately linearly in the number of curves, at about 1.5 seconds per 1,000 curves for ECOS/SCS and almost 3 seconds for Clarabel. For comparison, pair mode is at 0.4 seconds, and full mode at about 0.04 seconds per 1,000 curves.
5.2 Scaling the number of curves
We now look more closely at what happens when we increase the number of curves whilst
holding the number of tokens constant. We have run the analysis for different numbers
of tokens between 2 and 20. The results we show here are for 10 tokens, and they are representative of what we have seen in the other cases.
The results are presented in figure 5.2 where the left panel is the same chart as in the
left panel of figure 5.1 except that the x-axis now goes up to 2,000 curves. Also, as the
number of tokens is above two, there is no marginal price pair mode. Fundamentally, the
results are the same as in observation 21, except that Clarabel seems to perform even
worse: whilst ECOS/SCS deteriorate linearly, the curve for Clarabel looks quadratic. At
more than 40 seconds to run a single analysis on 2,000 curves it is beyond any usefulness
for us in practical settings. It also dominates the chart in the left hand panel to the
extent that the performance figures for the other algorithms are hard to read.
Figure 5.2: Calculation time versus number of curves (10 tokens). Both panels show the
calculation time for the different algorithms in a market with 10 tokens as a function of
the number of curves. Blue is marginal price full mode, orange is Clarabel, and green/red
are ECOS/SCS. The left hand panel is on a linear scale, the right hand panel is on a
log/log scale.
We therefore redraw the same chart in the right hand panel of figure 5.2 on a log/log scale. The marginal price algorithm is the clear winner, with a consistent performance of about 10ms per 1,000 curves. The ECOS/SCS group is consistently behind by what looks like an order of magnitude along the entire curve, and Clarabel is another half order of magnitude behind that at the start, deteriorating to a full order of magnitude at the large-number-of-curves end of the spectrum.
We present the associated speedup numbers of the marginal price algorithm over the convex algorithms in table 5.2. We see that ECOS/SCS are behind the marginal price algorithm by a factor of 10-20x, with the distance getting larger at the larger-number-of-curves end. The speedup compared to Clarabel starts at 30x for 10 curves and reaches 200x+ for 2,000 curves, and most likely gets even worse beyond that. This makes CVXPY's new default solver definitely not a good choice for this task, even when compared to ECOS/SCS.
5.3 Mathematical interlude
The results of the following section 5.4 have surprised us at first. Before we continue, we
need to provide some additional context. As we discuss in a forthcoming paper [27], we
only ever use the optimization algorithm for two or three tokens at a time, and we loop
over the reasonable token combinations. Therefore, performance of our implementation
of the algorithm is not relevant for us for large token numbers, but we do care about the
performance for large numbers of curves. This means that in practice we operate in the
region covered by section 5.2.
However, when running numbers for this paper, we found that our implementation degrades significantly when the number of tokens gets large, to the point that, as shown in figure 5.3, we can fall behind even ECOS/SCS. When we looked into this, we concluded that this was an artefact of the way we implemented the algorithm rather than a fundamental limitation of the algorithm itself, for reasons that will become clear in a moment. We will at some stage improve the algorithm implementation, and we will report on the results in a future paper and update this one accordingly. Technically, the changes are not trivial, and we need to be careful making changes to our research system to ensure that it stays reasonably close to our production system.
The fundamental issue is that we have the derivative of a sum of functions, and the derivative and sum operations commute, leading to the commutative diagram in equation 5.1. This diagram needs some explanation. Firstly, all vector quantities are denoted by bold face, and all scalar quantities by regular face. We have a set of vector-valued functions fν(x) of a vector x, and the aggregate function F which is the sum over all the constituent functions, F = Σν fν. We also have a derivative operator, the Jacobian operator J, which has a matrix-valued result, where the element (Jf)ij of J applied to f is the partial derivative of the i-th component f i with respect to the j-th element xj, as seen in the top right corner of diagram 5.1. We also have a sum operator Σ that operates either at the vector level if it aggregates function values, or at the matrix level if it aggregates the Jacobian values. The diagram shows two paths to calculate the Jacobian of the aggregate function F, one in red and one in blue, and the diagram commutes.
\[
\begin{array}{ccc}
\mathbf{f}_\nu(\mathbf{x}) & \xrightarrow{\;J\;} & J\mathbf{f}_\nu(\mathbf{x})^i_{\;j} = \dfrac{\partial f_\nu(\mathbf{x})^i}{\partial x^j} \\[2ex]
\Big\downarrow{\scriptstyle\sum} & & \Big\downarrow{\scriptstyle\sum} \\[2ex]
\mathbf{F}(\mathbf{x}) = \sum_\nu \mathbf{f}_\nu(\mathbf{x}) & \xrightarrow{\;J\;} & J\mathbf{F}(\mathbf{x})^i_{\;j} = \sum_\nu J\mathbf{f}_\nu(\mathbf{x})^i_{\;j}
\end{array}
\tag{5.1}
\]
The blue path – sum first, then derivative – is easier to implement than the red one. For the blue path, all we have to do is to dispatch the relevant components of the vector x that fν needs into the function, and aggregate the result correctly on a coordinate level
in the sum function F. We can for example use dictionaries to create those sparse vector
structures which is easy to implement and produces very little overhead. The function F
then has the same interface as the component functions fν , and an unmodified Jacobian
algorithm can be fed with the aggregate function F.
The red path – derivative first, then sum – is somewhat harder to implement because
the aggregation for J happens at the matrix level as opposed to at the vector level, so
we need to use different aggregation algorithms for F and Jf .
However, the red path has one important advantage that massively reduces the computational effort: all our functions only depend on two variables, which for presentational purposes here we call y, z. Also, they only return two values. Therefore, the Jacobian of a single constituent function fν, when embedded in the larger K × K matrix, looks like equation 5.2, ie it is mostly zero.
\[
\begin{pmatrix}
\cdot & \cdot & \cdot & \cdot & \cdots & \cdot \\
\cdot & (\partial_y f)^y & \cdot & (\partial_z f)^y & \cdots & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdots & \cdot \\
\cdot & (\partial_y f)^z & \cdot & (\partial_z f)^z & \cdots & \cdot \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\cdot & \cdot & \cdot & \cdot & \cdots & \cdot
\end{pmatrix}
\tag{5.2}
\]
The consequences of this are important enough to warrant their own proposition:
Proposition 22 (Cost of the Jacobian calculation). For N curves and K tokens, the computational effort of calculating the Jacobian of the aggregate function along the red path (derivative first) scales only with the number of curves N; along the blue path (sum first) however it is O(N · K²).
In other words, the blue path is a really expensive way of calculating zeroes. In addition
to that, the function f 35 is actually a function of the ratio of its variables only (ie
f (y, z) = f¯(y/z)). Using the chain rule we obtain the following identities that allow us
to cut the number of calculations in half once more:
\[
y\,\partial_y f \;=\; -\,z\,\partial_z f \;=\; \frac{y}{z}\,\bar f'\!\left(\frac{y}{z}\right)
\tag{5.3}
\]
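The blue path is essentially the perturbation-based numerical Jacobian described in section 4.2: it differentiates the already-summed F and therefore touches every entry of the K × K matrix. The red path, in contrast, only ever needs the 2×2 block of each curve, which in turn can be obtained from a single derivative of f̄ via equation 5.3. A minimal sketch of the red path (Curve, token_indices and jacobian_block are assumed, illustrative interfaces):

```python
import numpy as np

def jacobian_red_path(curves, x, K):
    """
    'Derivative first, then sum': each curve depends only on the two (log) prices of
    the tokens it trades, so its Jacobian is a 2x2 block; we scatter-add the N blocks
    into the K x K aggregate Jacobian.  The work scales with the number of curves N.
    """
    J = np.zeros((K, K))
    for c in curves:
        i, j = c.token_indices                  # indices of the two tokens this curve trades
        J[np.ix_([i, j], [i, j])] += c.jacobian_block(x[i], x[j])   # 2x2 block of d(flows)/d(log p)
    return J
```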
With the mathematics out of the way, we can now discuss the scaling of our implemented
algorithm – which follows the blue path in diagram 5.1 – with respect to the number of
tokens. For the analysis in this section, we have kept the number of curves at a constant
1,000, which brings us to the region of 1 second calculation time for ECOS/SCS and 10
seconds for Clarabel. The results are presented in figure 5.3 and we make the following
observations:
• The marginal price algorithm starts out very well, at about 50ms.
• The performance however deteriorates quite dramatically with the number of tokens, as the log/log plot in figure 5.3 shows, and the crossover with ECOS/SCS is at about 100 tokens; our algorithm even seems to be on its way to catch up with Clarabel.
• The performance of the convex algorithms is almost flat, with ECOS/SCS at about 1 second, and with Clarabel at about 10 seconds and increasing at the end.
35 A reminder that the function F is the aggregate price response function as defined in definition 16.
Figure 5.3: Calculation time versus number of tokens (1,000 curves). This chart shows
the calculation time for the marginal price algorithm (blue), and the convex Clarabel
(orange) and ECOS/SCS (green/red) algorithms as a function of the number of tokens.
The chart is on a log/log scale.
This is what we would expect. Additional tokens for the convex solvers mean adding more constraints, but those constraints cover fewer variables per constraint because the overall number of variables is fixed by the number of curves. ECOS/SCS seem to be mostly oblivious to an increase in token numbers, and even for Clarabel the impact seems muted. However, as shown in proposition 22, the blue path algorithm we are using for the marginal price method is O(K²) with respect to the number of tokens, and this is what we see in figure 5.3.
We have not at this stage been able to prove this by implementing the red path algorithm,
for the reasons discussed above at the beginning of section 5.3. However, we did profile
our current algorithm and measured the cost of the Jacobian calculation as well as that
of the other operations, the results of which are shown in figure 5.4. We see that the
Jacobian calculation is already starting at about 50 percent of the total processing time
for only two tokens, and this increases dramatically with the number of tokens. At 100
Figure 5.4: Impact of calculating the Jacobian on the overall performance. The left hand
panel shows the calculation time as function of number of tokens (with 1000 curves) for
everything but the calculation of the Jacobian. The right hand panel redraws this on
a log/log scale, and adds the calculation time for the Jacobian (solid red). The surface
chart in the background shows the percentage of time spent on the Jacobian calculation
(linear scale).
tokens, virtually all processing time is consumed by the calculation of the Jacobian. We can see this by looking at the surface plot in the left panel, and also by the fact that the time requirements for the non-Jacobian calculations displayed in the left hand panel quickly become flat at about 50ms over all 1,000 curves, regardless of the number of tokens.
6 Conclusion
This concludes the first paper describing the mathematics and algorithms underlying our
FastLane Arbitrage bot [15]. This bot is monitoring various chains, and specifically the
DEXes and Carbon DeFi deployments thereon, and is looking for arbitrage opportunities
there (ie it is looking for trades that allow it to make a profit without taking any
risk). This first paper is focussing on the newly developed marginal price framework for
identifying all arbitrage opportunities in a specific market or submarket.
The core of this paper is section 3 where we have described the marginal price optimization algorithm36 for arbitrage and optimal routing and where we have shown in
theorem 19 that it is outcome-equivalent to the convex optimization problem developed
in [6, 7, 12] and described in section 2. This algorithm dramatically simplifies the
optimization problem by only searching on the optimal surface described by the marginal
prices between the tokens in the system. This reduces the number of variables from two
per curve to one per token 37 , and converts the optimization problem into an often better
conditioned root finding problem.
Whilst the algorithm can still fail to converge on problems that are dominated by certain configurations of levered curves, it is significantly more robust in this respect, and the reasons for divergence are well understood.
We have presented two different implementations of the marginal price algorithm, the
pair optimizer based on the bisection algorithm, described in section 4.1, and the full
optimizer that is based on the Newton-Raphson / gradient descent algorithm described
in section 4.2. As the name implies, the pair optimizer only works on pairs, and we have found that whilst it is slower than the full optimizer by a factor of up to 10x, it is more robust – in fact it always converges, regardless of market conditions. Therefore, the decision which optimizer to use for pairs can be hard39, depending on whether one values resource use and latency or robustness more.
This paper only covered part of the technology underlying the FastLane Arbitrage bot, and a number of related topics are notably missing.
We also developed another algorithm – the Graph Mode – that uses a completely
different approach to identifying arbitrage opportunities, and that does not suffer from
the same convergence issues as the marginal price algorithm. It however displays other
issues, notably around scaling, that we are still working through. Ultimately we will probably settle on a horse-for-courses approach where both algorithms are used in their respective spheres.
39 One may also consider combined applications that start with the full optimizer on pairs, and escalate cases of non-convergence to the pair optimizer.
All of the above are subject to forthcoming papers that we currently have in preparation,
and that we will publish in due course.
References
[1] Stephen Abbott. Understanding Analysis. Springer, New York, 2nd edition, 2015.
[2] Hayden Adams. Uniswap: A Constant Product Market Maker for decentralized finance. Technical report, Uniswap Labs, November 2018. Whitepaper.
[3] Hayden Adams. Uniswap whitepaper [work in progress]. Technical report, 2018.
[4] Hayden Adams, Noah Zinsmeister, and Dan Robinson. Uniswap v2 core. Technical report, Uniswap Labs, March 2020. Whitepaper.
[5] Hayden Adams, Noah Zinsmeister, Dan Robinson, Moody Salem, River Keefer, and Alex Martinelli. Uniswap v3 core. Technical report, Uniswap Labs, March 2021. Whitepaper.
[6] Guillermo Angeris, Akshay Agrawal, Alex Evans, Tarun Chitra, and Stephen Boyd. Constant Function Market Makers: Multi-asset trades via convex optimization, 2021.
[7] Guillermo Angeris, Tarun Chitra, Alex Evans, and Stephen Boyd. Optimal routing for Constant Function Market Makers, 2022.
[9] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[12] Theo Diamandis, Max Resnick, Tarun Chitra, and Guillermo Angeris. An efficient algorithm for optimal routing through Constant Function Market Makers, 2023.
[13] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
[14] Daniel Engel and Maurice Herlihy. Composing networks of Automated Market Makers. In Proceedings of the 3rd ACM Conference on Advances in Financial Technologies, AFT '21. ACM, September 2021.
[20] Herbert Goldstein, Charles Poole, and John Safko. Classical Mechanics. Addison-Wesley, San Francisco, 3rd edition, 2002.
[21] Eyal Hertzog, Guy Ben Artzi, Galia Benartzi, and Yehuda Levi. Methods for exchanging and evaluating virtual currency [US Patent 12045807B2], 2024.
[22] Eyal Hertzog, Guy Benartzi, and Galia Benartzi. Bancor Protocol: Continuous liquidity and asynchronous price discovery for tokens through their smart contracts; aka "Smart Tokens". Technical report, Bprotocol Foundation, 2017. Draft Version 0.99.
[23] Eyal Hertzog, Yehuda Levi, Barak Manos, Asaf Shachaf, and Guy Ben Artzi. Smart contract of a blockchain for management of cryptocurrencies [US Patent 20240119444A1], 2024.
[24] John C Hull. Options, futures, and other derivative securities. Prentice Hall, 2nd edition, 1993.
[25] Stefan Loesch. The quantitative finance aspects of Automated Market Makers in DeFi, 2022.
[26] Stefan Loesch, Nate Hindman, Mark Bentley Richardson, and Nicholas Welch. Impermanent Loss in Uniswap v3, 2021.
[27] Stefan Loesch and Mark Bentley Richardson. Decomposing arbitrage transactions, forthcoming.
[28] Fernando Martinelli and Nikolai Mushegian. A non-custodial portfolio manager, liquidity provider, and price sensor [Balancer Whitepaper]. Technical report, September 2019. Whitepaper.
[29] J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, 1965.
[31] Mark Bentley Richardson and Stefan Loesch. Carbon: A decentralized protocol for asymmetric liquidity and trading. Technical report, Carbon Protocol, November 2022. Last updated: 7 Jan 2023.
[32] Mark Bentley Richardson and Stefan Loesch. Carbon litepaper: A decentralized protocol for asymmetric liquidity and trading. Technical report, Carbon Protocol, November 2022. Litepaper.
[33] Mark Bentley Richardson and Stefan Loesch. DeFi's concentrated liquidity from scratch, 2024.
[34] Mark Bentley Richardson, Stefan Loesch, Barak Manos, and Asaf Shachaf. Customizable cryptocurrency trading [WO Patent 2024084480A1], 2024.
Appendix
A “radome” [10] is a structure that protects a radar antenna from the elements, and
typically has the geometry shown in figure A.1. It also provides an excellent example for
a class of convex optimization problems, referred to herein as the “radome optimization
problems”.
In terms of radome shapes we consider the following types: (a) the balloon type, a smooth surface that is convex in all directions; (b) the planar type, a collection of flat surfaces that is convex in all directions, as shown in figure A.1; and (c) the mixed type, a "mostly smooth" surface that can be thought of as replacing the flat surfaces in the planar type with convex surfaces, in a manner that retains the vertices and edges of the planar type and overall convexity.
The problems A1 and A2 are traditional smooth optimization problems that can be solved with Lagrange multipliers [20]: at the optimal point, the gradient of the constraint must be parallel to the gradient of the objective function, the latter being constant (0, 0, 1) in case of the "height" function in (1) and a vector pointing in the direction of the sun in
the general linear case. In case (2) the gradient is orthogonal to the usual equidistance
surfaces around the target point.
If we assume the radome is the unit sphere then the solution of A1 in the height case is
the north pole, and more generally x/||x|| where x is a vector pointing in the direction of the sun.
Figure A.1: Example for a radome. Example for a radar dome ("radome"), a structure that consists of flat or near flat segments touching one another along non-smooth connector lines (image credit Wikipedia).
The planar type radome – the usual form as depicted in figure A.1 – is an example for
a set of linear constraints: for each of the faces ν = 1 . . . M we have a linear condition
on the points x inside that radome that reads
nν · x ≤ nν · xν (A.1)
where nν is the normal vector of the face ν, and xν is any point on that face. We
note that the normal vector nν is the equivalent of the gradient of the constraint in
the smooth case. Therefore the Lagrange condition above cannot usually be strictly
satisfied40. If the objective function is also linear, we arrive at the well-known linear programming problem as described in many text books (eg [11]). In this case the solution is almost always41 at a vertex of the radome, which can be found, for example, using a simplex algorithm.
The interesting cases – and most relevant for us – are B2 and C1; B2 because it is
the most intuitive and C1 because it most closely relates to the actual problem we are
solving, which has a linear objective function on a mixed-type-constraint hyper-surface.
Starting with B2, the solution has two steps: we need to find the face that is closest to
the target point, and on the face we need to find the actual closest point. The algorithm
for finding the optimal face is similar to the simplex algorithm in linear programming,
except that we have a duality where faces now play the roles of vertices and vice versa.
Once we have identified the face we can then use a gradient algorithm to find the optimal
point on that face42 .
In case of C1 and C2, the constraint itself is neither linear nor entirely smooth. Depending on the convexity of the faces, there is now a finite chance43 of the solution being an "interior solution" (on one of the faces) and a finite chance of it being a "corner solution" (on a vertex or edge). Again we may use a simplex-type algorithm to identify the optimal face, and on the face we may use a gradient descent to lead us to the optimal point. Importantly, we cannot simply start with a gradient descent before we have identified the correct face because, due to the non-smoothness of the constraint function, the local geometry is not a good guide to the global geometry of the problem.
40 To formalize this statement we could introduce a measure on the space of gradients that is uniform on the unit sphere; when the sphere is deformed into a planar structure we find that all mass will be in the vertices, in the sense that faces and edges are of measure zero.
41 In the sense of the measure introduced in the previous footnote.
42 In case of a simple metric, we could of course do this part of the calculation analytically, but for didactical reasons we assume that we do not have this shortcut available to us.
Here we discuss the repatriation problem referred to in section 3.3 in more detail. At a high level, the issue is that an arbitrage opportunity exists, but the profits cannot be extracted because there is not sufficient capacity to trade into the chosen target token. For example, consider the following set of curves providing the arbitrage:
• Buy/sell USDC for USDT at 0.99 USDT per USDC (1m USDT capacity)
• Sell/buy USDC for USDT at 1.01 USDT per USDC (significantly larger capacity)
Assuming those curves are infinitesimally thin, one possible trade here is to start with 1m USDT, sell them for 1.01m USDC on the first curve, and to sell those for 1.02m USDT on the second curve, yielding a profit of 20k USDT. Alternatively one can start with 0.99m USDC, sell them for 1m USDT on the second curve, and sell those for 1.01m USDC on the first curve. Again, the profit is 20k USDC. Using thicker curves44 will somewhat blunt that opportunity, and for the purpose of the argument we assume that we can make USDx 15k in profit in either direction.
43 The term "chance" is again to be understood in terms of the measure on the gradients introduced above; here, because the faces have a non-zero convexity, there is a non-zero mass of the distribution of gradients on the faces, so not all of the mass is concentrated in the vertices.
44 The MPF algorithms will not be able to deal with infinitesimally thin curves to accurately determine trading volumes, so in practice we will need to regularize the curves by imposing a minimum width; the SOF algorithm operates on amounts rather than prices so in principle would be able to handle it; however we have found in practice that the SOF algorithms available to us would perform worse than our MPF algorithm.
In either the convex optimization formulation (“COF”) according to definition 3 or the
marginal price formulation according to definition 18 we need to define a target token
in which to extract the profits. If it is either USDC or USDT both will find the same
solution – a target token of USDT will yield the first of the solutions above, and a target
token of USDC will yield the second one.
We now assume that the target token is WETH, and we will consider a number of
different cases.
High capacity curve. The baseline positive case is that there is a high capacity curve
linking USDx to WETH. In this case both algorithms will converge nicely, the MPF one
by choosing the price on the WETH curve that absorbs the 15k profit, and the COF
that works on quantities rather than prices by pushing those 15k through the WETH
curve at whatever price that yields.
Low capacity unlevered curve. We now consider any unlevered curve linking USDx
to WETH. The magic of unlevered curves is that they can absorb any token amounts
– the only question is at which price. Fundamentally this is not different from the high
capacity curve case above: both algorithms will converge, one by pushing the 15k USDx
into the curve, the other one by finding a price point at which the 15k USDx can be
absorbed. The difference to the high capacity case is only that the curve is restricted by
its WETH token balance in what it can release – if it holds 1 WETH then max output
will be (below but possibly close to) 1 WETH; if it holds 0.1 WETH it will be 0.1
WETH and so on. In other words: the algorithms will converge45 , albeit possibly to a
solution that is not particularly advantageous.
Limited capacity levered curve. We now consider a levered curve that is linking
USDx into WETH. The key difference between this and the low capacity unlevered curve
case is that because prices are bounded there is a maximum amount of USDx that
this curve can absorb. Once it has dispensed all its WETH, and reached the, from
the trader's perspective, worst possible price in the range covered, it will simply stop trading. The COF algorithm should converge: provided the solver can deal with the boundary condition it will understand the maximum amount that can be repatriated and will adjust the trade amounts through the USDx curves accordingly. The MPF algorithm will do the same, provided that (a) it can adjust the prices on the USDx curves sufficiently finely to ensure the profit matches the capacity on the curve, and (b) the price adjustment of the USDx/WETH curve is done in a way that either it does not overshoot the boundary, or the algorithm can deal with overshooting into the no-man's land beyond the curve without failing.
45 Unless the MPF algorithm has a price cutoff that is being hit.
No curve. Finally we assume that there are no curves linking either of the USDx tokens
to WETH, directly or indirectly. How will this manifest itself in the two cases? In the
SOF it will depend on the solver. Ultimately, because there is no path into WETH
none of the operations the optimization algorithm performs on the USDx curves will
impact that target function. Therefore, ultimately it will fail because whatever it does,
the target function will remain unaffected. In the MPF case we can see this even more
clearly. The Jacobian defined in equation D.6 will be singular because all derivatives
between USDx and WETH will be zero46 . This in turn means that the update rule in
equation D.9 will fail47 and the algorithm will run into the max iteration limit without
being able to fulfil the solver condition of ∆USDC = 0 and ∆USDT = 0.
In summary – both algorithms, if sufficiently well done, converge to the same result
under the repatriation problem. There will be no convergence in case no repatriation is
possible because there are no curves, and if repatriation is limited then they will restrict
the amount of USDx arbitrage performed.
46 The "no indirect connections" condition here is important; if there are trade opportunities between USDx and WETH that go via other tokens then the Jacobian will not be singular.
47 Note that it will fail despite the modified update rule that ignores the null space when trying to invert the Jacobian, the issue being that the null space is the one connecting USDx and WETH.
C The Multiple Solutions Problem
Here we discuss the multiple solutions problem referred to in section 3.3 in more detail. At a high level, the issue is that there could be multiple solutions to the optimization problem that perform equally well in terms of arbitrage profits, but that result in different trading instructions. The archetypical example is buy-low-sell-high where there are multiple options to trade into at the same conditions and where the volume is limited on the other side. Consider the following set of three curves, all with a capacity of 1m USDT:
• Curve A: Buy/sell USDC for USDT at 0.99 USDT per USDC
• Curves B1, B2: Buy/sell USDC for USDT at 1.01 USDT per USDC48
The trade is buying 1m USDC on curve A at 0.99 USDT and selling it into B1 or
B2 or any combination thereof at 1.01 USDT, for a profit of 20k USDT. The multiple
solutions come from the fact that, provided there is no slippage, there is no difference
selling into B1, B2 or into any combination thereof that has the right capacity.
The point about no slippage is important here. If there were slippage on the curves,
there would be a unique solution to the optimization problem: partitioning the trade so
that the post-slippage marginal prices are the same across all curves. This issue only
arises because the marginal prices do not move with volume.
The algorithm under the convex optimization framework (“COF”) operates on quan-
tities, and moving quantities between B1 and B2 does not impact the target function.
Any well designed algorithm will not stumble over this, and the chosen partition will
depend on details of the algorithm and starting conditions. The algorithm under the
marginal price framework (“MPF”) however operates on prices instead of volumes – the
latter are implied. Therefore the MPF algorithm cannot operate on no-slippage curves, and in our implementation of the algorithm we regularize the curves by enforcing a certain minimum width. Therefore the MPF algorithm will always converge to a unique post-regularization solution, equalizing the marginal prices on all curves and distributing the trade volume accordingly.
48 The curves B1, B2 could be composite curves trading through one or multiple other tokens, and there could be more than two curves, none of which would substantially change the argument.
D Numerical Methods
In this appendix, we discuss the bisection and Newton-Raphson methods, and how they
can be used for finding minima and maxima and roots.
The bisection method is a root-finding method for one-dimensional functions that has
very interesting properties. Notably, it is very robust in that it is guaranteed to con-
verge (depending on the convergence criteria chosen) on a very wide range of functions,
including those where roots do not really exist.
The algorithm is very simple: given function f (x) and a bracketing interval [a, b] such
that f (a) and f (b) are of different sign, we can find a root of any continuous function
by repeatedly bisecting the interval, checking the sign of the mid point, and moving the
boundary of the interval that has the same sign as the mid point to the mid point.
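As a minimal illustration of this loop (a sketch only; it returns the final bracket rather than a point, in keeping with the interval sequence described below):

```python
def bisect_bracket(f, a, b, n_steps=50):
    """Halve the bracket [a, b] n_steps times; f(a) and f(b) must have opposite signs."""
    fa = f(a)
    assert fa * f(b) < 0, "initial bracket must show a sign change"
    for _ in range(n_steps):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm > 0:          # mid point has the same sign as a: root (location) in [m, b]
            a, fa = m, fm
        else:                     # opposite sign (or exact zero): root (location) in [a, m]
            b = m
    return a, b

print(bisect_bracket(lambda x: x**3 - 2.0, 0.0, 2.0))   # brackets the cube root of 2
```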
This algorithm yields a sequence (ai , bi ) of intervals such that f (ai ) and f (bi ) have
different signs, and the length of the interval |bi − ai | halves at every step, and therefore
converges to zero. Because of the intermediate value proposition, we know that if f is
continuous, there is (at least) one root in the interval [ai , bi ], and therefore the location
of the root can be approximated to arbitrary precision.
In figure D.1 we provide a number of example functions and briefly discuss how the bisection method can be used to find their roots. We start with the left hand panel where we have continuous functions, and we assume that the starting interval [a, b] is such that f(a) and f(b) have different signs, but not necessarily symmetric around the origin. For the highly regular "softsign" function, we can see that the bisection method will always converge to the root at x = 0. Moreover, because the derivative is uniformly bounded in the sense of equation D.1, we can propagate the error on the x axis to the y axis, meaning that we not only know the position of the root with a certain precision ε ≃ bi − ai, but we also know that the error on the y axis is approximately bounded by cε.
With the trigonometric “sine” function, we can see that the bisection method will con-
verge to a root as well, provided we start with an interval that has opposing signs.
However, the function has multiple roots, and to which of them we converge will depend
on the choice of the starting interval.
Finally we have the "root" function f(x) = ∛x. This function has a root at x = 0, but
at this point the derivative is unbounded. Therefore the error on the y axis can be quite
big, even if the convergence on the x axis is advanced.
In the right hand panel of figure D.1 we have some more pathological, specifically discontinuous, functions. Firstly, to make the obvious point: two out of the three functions do not have roots in the sense of a point where f(x) = 0. However, they have "root locations" x0 where they change sign.
Note that not all functions have this property. For example, consider f(x) = sin(1/x) with f(0) = 0 around x = 0. This function has an infinite number of roots that are dense around x = 0, so there is no ε that could separate them. However, if the functions we consider are continuous except for a finite number of points in every finite interval, the above property will apply and distinct root locations can be identified and found with the bisection method. The value at that location of course may be undefined, in the sense that the left and right limits do not coincide, and neither of them may be zero (the "sign" function), or those limits do not even exist (the "inverse" function f(x) = 1/x). However, especially in the "finite jump" case we note that we
can always regularize the function (eg by convolution), leading to something akin to the "softsign" function, which is C∞ with bounded derivative (for every fixed value of the regularization parameter), and which matches functions with finite-size jumps reasonably well.
As previously mentioned, this case is particularly important to us, because limit orders
correspond to discontinuous functions. To make them continuous, even differentiable,
we regularize these orders by converting them into very narrow range orders.
Figure D.1: Example functions for root search. Both panels show functions discussed in the text for the performance of the root search algorithm on them. The left hand panel has benign functions that have one or multiple regular roots. The right hand panel shows functions that have a root location but either a very badly conditioned root (blue), or no root at all, with a finite jump (orange) or an infinite one (green).
The bisection method can also be used to find (interior) minima and maxima of functions
by using the fact that the derivative of a function is zero and changes sign at those points.
In other words – if we are looking for minima or maxima we can use the bisection method
to find the roots of the derivative of the function. The derivative can be calculated either
analytically, or numerically for example using the finite difference formula D.3 for some
small value of h:
\[
f'(x) \;\simeq\; \frac{f(x+h) - f(x-h)}{2h}
\tag{D.3}
\]
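A minimal sketch of this (the example function and names are purely illustrative): we differentiate numerically as in equation D.3 and bisect on the sign change of the derivative, assuming the function is increasing at a and decreasing at b.

```python
def num_derivative(f, h=1e-6):
    """Central-difference derivative of f, cf equation D.3."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

def argmax_by_bisection(f, a, b, n_steps=60):
    """Locate an interior maximum of f by bisecting on the sign change of its derivative."""
    df = num_derivative(f)
    for _ in range(n_steps):
        m = 0.5 * (a + b)
        if df(m) > 0:            # still ascending: the maximum lies to the right of m
            a = m
        else:                     # descending (or flat): the maximum lies to the left of m
            b = m
    return 0.5 * (a + b)

print(argmax_by_bisection(lambda x: -(x - 1.0) ** 2, -3.0, 4.0))   # ~1.0
```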
Again, this algorithm is very robust. Specifically, we again consider the case of a non-
differentiable function like the one in figure 1.3. As we can see in the right hand panel,
the derivative is a step function that is constant between the steps (ie it locally looks like
the sign function in figure D.1). We have seen that those functions can be regularized
(eg using a convolution method), yielding functions similar to the softsign function in
the same figure. This function is differentiable and therefore the bisection method will
converge to the correct root of the regularized function. However, we actually do not
have to regularize the function for the algorithm to converge. As discussed above it will
converge, on the x axis, to the root location as defined above. The error is more benign
than in the root finding case. As we are looking for a maximum or minimum, by design
the area around the root is somewhat flat, therefore in practical applications the error
on the y axis is typically small. However, it must be stressed that the term “in practical
applications” is an important caveat here. In particular we can imagine a very spiky
function – think Dirac Delta. For example, if we look at a Gaussian kernel function as
defined in equation 1.6 and we take a very big value of λ – which is a popular choice for
a Dirac-Delta-like C ∞ function – then the bisection method will find the location of the
spike, but it may not be that good at estimating its exact height.
D.1.3 Convergence
To build on our previous discussions we now briefly discuss the convergence properties
of the algorithm. We only discuss the root finding case here, but the discussion can
be easily extended to the minima and maxima case by replacing the function with its
derivative.
Firstly, we know that, provided the start conditions are fulfilled, the algorithm always
converges “exponentially” on the x-axis, and we know exactly the speed of convergence:
after n steps, the size of the interval will be 2−n of the original interval size. See figure D.2.
Convergence on the y-axis depends on the type of function we are looking at.
An important subcase of this is where the function is twice differentiable everywhere, which implies that in any compact interval – and we only care about those for numerical applications because we need to choose our initial [a, b] – the derivative is bounded, which means the above condition applies.
For the next condition we introduce the notion of separated points, a variation of which we have already seen in equation D.2: a set of points {x_i} is separated if there exists an ε > 0 so that for all i ≠ j we have |x_i − x_j| > ε. If the function is discontinuous only at such a set of separated points, and the root location is one of those points, the algorithm will still converge to it on the x-axis, as defined in equation D.2, but we cannot make any predictions about the error on the y-axis. The right hand panel of figure D.1 shows examples of such functions. For example, if we use the sign function, unless we hit the root exactly by chance, the value at the midpoint will be either 1 or −1. Even worse is the case of the inverse function 1/x, where the value at the midpoint of the converged interval becomes larger in absolute value the smaller the interval is, on top of the uncertainty about its sign.
In other words – in this case, whilst we can be certain about convergence on the x-axis,
convergence on the y-axis is undetermined, and in the worst case the longer we run the
algorithm, the further the value diverges from zero.
There are more pathological cases of functions that we can consider here. For example, we can have a function with an infinite number of roots in any finite interval containing a specific point (eg, the aforementioned sin(1/x) around zero). In this case, the algorithm will still converge on the x-axis, but we do not know where to. Whilst this case is interesting from a mathematical point of view, it is not relevant for us, as we will only encounter functions of the first two types in our problem set.
Figure D.2: Bisection progress over time. Both panels show how the size of the bisection bracket contracts with the number of steps, compared to the initial size of the bracket. The left hand panel is on a linear scale, the right hand panel is on a log scale.
D.1.4 Higher dimensions
The bisection method is hard to extend to higher dimensions because the geometry of the problem is more complex, and because the intermediate value theorem no longer applies. To understand this, we want to look at the following two dimensional problem
\[
f(\hat{x}) = \begin{pmatrix} f_0(\hat{x}) \\ f_1(\hat{x}) \end{pmatrix}
= \begin{pmatrix} x + y - b \\ x^2 + y^2 - 1 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\tag{D.4}
\]
that depends on a parameter b. Note that of course in two dimensions we not only have two variables but also two functions, and we are looking for a joint root f(x̂) = 0. We have drawn the level sets for two different parameter values of b in figure D.3. In both panels we see the circle of radius one that is the level set of the second function. In the left panel we also see the line y = 2 − x, whilst in the right panel we see the line y = 1 − x.
Figure D.3: Attempting multi-dimensional bracketing. Both panels show the level sets f_i(x̂) = 0 (i = 0, 1) for the two components of the vector-valued function f. In the left panel, those level sets do not intersect, so the equation f(x̂) = 0 does not have a solution. In the right hand panel, the level sets intersect so there are solutions. The red rectangles represent an attempt at two-dimensional bracketing discussed in the text.
Firstly we note that the left panel does not have a solution to the problem: the two level sets do not intersect and therefore there is no joint root. However, the right panel does have two solutions, at the intersection points of the circle and the line.
We recall that for a bisection method we need a change of sign, and we note that above (below) the line and outside (inside) the circle the sign of the respective function is positive (negative). We note that only in the right panel can we identify a (hyper)rectangle where all possible signatures are present: if we start from the bottom left and go clockwise we have (−, −), (+, −), (+, +), (−, +). We also note that there is a root in this space. This is not by chance. If we can contract the rectangle to zero without changing the border signatures (as indicated with the smaller rectangle) we will eventually converge to a root location. However, in this process we encounter a number of problems:
• How do we know that there is a (joint) root? In high dimensions this problem is
much harder to solve than in one dimension.
• Relatedly, how do we identify the hyper-rectangle that satisfies the correct signa-
ture conditions at the boundary?
• Finally, if we have such a rectangle, how do we contract it without violating the boundary conditions?
None of those problems are unsolvable. However, they are hard enough that bisection
in higher dimensions is not a popular method for generic problems, and we will only
consider it in one dimension.
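The following minimal sketch (our own illustration; the rectangle coordinates and helper name are arbitrary choices) makes the signature test concrete for the system of equation D.4: a rectangle is a candidate bracket only if all four sign combinations of (f₀, f₁) appear among its corners.

```python
# Minimal illustrative sketch: corner sign signatures of a rectangle for the
# two-dimensional system of equation D.4,
#   f0(x, y) = x + y - b,   f1(x, y) = x**2 + y**2 - 1.
from itertools import product

def signatures(rect, b):
    """Return the set of corner sign signatures of (f0, f1) on the rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = rect
    sigs = set()
    for x, y in product((x_lo, x_hi), (y_lo, y_hi)):
        f0 = x + y - b
        f1 = x**2 + y**2 - 1.0
        sigs.add((f0 > 0, f1 > 0))
    return sigs

rect = ((0.5, 1.2), (-0.3, 0.6))          # a rectangle around the point (1, 0)
print(len(signatures(rect, b=2.0)))       # left panel (b = 2): no joint root, prints 2
print(len(signatures(rect, b=1.0)))       # right panel (b = 1): root inside, prints 4
```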
D.2 Newton-Raphson Method
The Newton-Raphson method (closely related to gradient descent) uses the first derivative (“gradient”, in higher dimensions) of a function to find its roots. A worked example is provided in figure D.4. Here the algorithm starts at a point x₀ ≃ 1 (1a). This point is transported along the orange tangent line to the point (1b) where that tangent line intersects the x axis. Point (2a) has the same x coordinate as (1b), and the process is repeated. We see that (4b) is already very close to the root, which is a general result of gradient-based methods. On benign functions they converge very quickly to the root – and on malignant functions they may not converge at all, but we will discuss this in more detail below.
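The following minimal sketch (our own, not the paper's implementation; names and tolerances are illustrative) implements the one-dimensional iteration just described.

```python
# Minimal illustrative sketch of the one-dimensional Newton-Raphson iteration.
def newton_raphson(f, df, x0, tol=1e-12, max_iter=100):
    """Iterate x -> x - f(x)/df(x) until |f(x)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / df(x)     # move along the tangent to its x-axis intercept
    raise RuntimeError("did not converge")

# Example: the root of f(x) = x**2 - 2 (ie sqrt(2)), starting near x0 = 1
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```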
Figure D.4: Newton-Raphson worked example. The above figure shows the Newton-Raphson method in action. The blue line represents the function whose root is to be determined. The algorithm starts at point (1a) and gets moved along the orange tangent to (1b). The process then repeats from (2a), which has the same x coordinate as (1b), via the black tangents up to the point (4b), which in our example is judged sufficiently close to being a root.
D.2.1 Convergence
The convergence behaviour of the Newton-Raphson method depends chiefly on the convexity of the function in the region the algorithm traverses:
• If the convexity is directed “towards” the x axis (ie what happens in figure D.4
in the area where f (x) > 0), and the root exists, convergence is guaranteed and
swift.
• If the convexity is directed “away” from the x axis (ie what happens in figure D.4
in the area where f (x) < 0), and the root exists, then the algorithm will move to
the other side of the root.
• If convexity has the same sign across the entire real axis, and the root exists, the
algorithm will converge, because either the convexity is directed towards the x axis
in the region of interest, or the first step transports the algorithm to the other side
of the root into a region where the convexity is directed towards the x axis.
Now we look at a few things that can go wrong. Firstly, we look at what happens if
there is no root to be found. This case is in figure D.5 in the top-left panel (a). We
start at point (1a) and because there is no root, step (2a) overshoots to the other side
of the minimum. Step (3a) then brings us back very close to the minimum, and because
the function there is very flat, step (4a) brings us very far to the left. In this case, the
algorithm will enter an infinite cycle that will only break if either we hit exactly the
minimum where f ′ (x) = 0 and we get a division by zero, or we leave the domain where
the function has been defined, or we hit a numerical limit. Note the top-right panel
which is the same function, except that it (just) has a root. There, the convergence is
just fine.
Figure D.5: Examples of Newton-Raphson with potentially problematic convergence. The four panels show how the Newton-Raphson algorithm performs on some problematic functions. The labels are as in figure D.4, except that we omit the (b) labels. The top left panel has no root, and the algorithm enters an infinite cycle. The top right panel, as a comparison, shows the same function with a root; there convergence is fine. The bottom left panel shows a function with a single inflection point where convergence is fine. The bottom right panel shows a function with two inflection points that has a perfectly well conditioned root, but where the algorithm enters an infinite cycle because of the badly chosen starting point.

What we have just seen is that the algorithm does not converge if there is no solution to be found. This can be a nuisance sometimes, but arguably it is not a major issue – after all there is no root to be found, and non-convergence is a good, albeit possibly expensive, indicator of that. The other case we look at is where there is a change in convexity. As the bottom-left panel (c) shows, this can be alright if there is a single inflection point. However, as the bottom-right panel (d) shows, multiple inflection points, especially with very high or even infinite convexity values, can lead the algorithm into an infinite cycle. This is particularly vexing as the function is generally very well behaved, and had we chosen a starting point in the inner region, convergence would have been immediate.
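To make the cycling failure mode concrete, the following minimal sketch uses the classic textbook example f(x) = x³ − 2x + 2 (our own choice, not one of the functions shown in figure D.5): starting from x₀ = 0, the undamped iteration cycles between 0 and 1 forever, even though the function has a perfectly good root near x ≈ −1.77. The learning rate introduced in the next section is one way to mitigate this kind of behaviour.

```python
# Minimal illustrative sketch (our own example): a 2-cycle of the undamped
# Newton-Raphson iteration on f(x) = x**3 - 2*x + 2, starting at x0 = 0.
f = lambda x: x**3 - 2 * x + 2
df = lambda x: 3 * x**2 - 2

x = 0.0
for step in range(6):
    x = x - f(x) / df(x)
    print(step, x)     # alternates 1.0, 0.0, 1.0, 0.0, ...
# Starting from x0 = -1.0 instead, the same iteration converges to the
# root near x = -1.7693 within a handful of steps.
```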
D.2.2 Introducing the learning rate η
We have seen above that in many instances the Newton-Raphson method can overshoot the root because it extrapolates local behaviour across the entire curve. This problem can be mitigated by introducing a learning rate η < 1 that reduces the step size of the update, allowing the algorithm to adapt to changes in the local conditions. This flexibility however comes at a cost, in that the convergence of the algorithm in the quasi-linear case is now slower.
For a linear function f(x) = mx + b, the Newton-Raphson update with learning rate η contracts the error by a constant factor at every step:
\[
x^{(n+1)} - x^{(\infty)} = (1-\eta)\left(x^{(n)} - x^{(\infty)}\right),
\qquad\text{ie}\qquad
x^{(n)} = x^{(\infty)} + (1-\eta)^n \left(x^{(0)} - x^{(\infty)}\right),
\]
where x^(∞) = −b/m is the actual root of the function. In other words, like Zeno’s arrow, at every step we get closer to the root by a constant percentage. However, unlike in Zeno’s paradox, every step takes a constant amount of time, so for η < 1 the algorithm will indeed never quite reach the target.
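The following minimal sketch (our own, with illustrative numbers) shows this Zeno-style contraction on a linear function: with η = 0.5 the error halves at every step and never quite reaches zero.

```python
# Minimal illustrative sketch: damped Newton-Raphson on f(x) = m*x + b.
# The error shrinks by a factor (1 - eta) per step; numbers are arbitrary.
m, b, eta = 2.0, -4.0, 0.5      # root at x = -b/m = 2
x = 0.0
for step in range(1, 11):
    x = x - eta * (m * x + b) / m     # damped Newton step
    print(step, x, abs(x - 2.0))      # error halves every step for eta = 0.5
```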
Figure D.6: Impact of the learning rate on the Newton Raphson algorithm. The left
panel shows the mechanics of the Newton Raphson algorithm with learning rate η = 1
(red) and η = 0.5 (green), in line with the depiction in figure D.4 but with the labels
(a), (b) omitted. The black line represents the first tangent. The red lines place the
points (a) directly above the intersection and continue from there. The green lines only
move the fraction η of the distance and place their new points (a) there. The right panel
shows the convergence of the algorithm on a linear function. For η = 1 convergence is
immediate, and for lower values of η it takes increasingly more steps to approach the
target value which is never reached.
In figure D.6 we provide some analysis and illustration of the impact of the parameter η. In the left hand panel we visualize how a non-linear function converges for different values of η: the red path is η = 1, and in this case convergence is very quick; the green path is η = 0.5, and here it takes significantly more steps to converge. In the right hand panel we are looking at the “Zeno convergence”, indicating how fast a linear function converges to the actual value. As we can see – at η = 1 convergence is perfect in step 1, but for any η < 1 convergence is slower and always asymptotic.
D.2.3 Higher dimensions
This algorithm easily generalizes to higher dimensions, where the gradient is replaced by
the Jacobian matrix, and the tangent line by the tangent space. In the simple implemen-
tation shown above, effectively the function is replaced by its best linear approximation,
and the root of that approximation is found. This “linear root” is then used to start the
next step of the process.
Here the Jacobian matrix is defined as
\[
J_{ij}(\hat{x}) = \frac{\partial f_i}{\partial x_j}(\hat{x})
\tag{D.6}
\]
The root of this linear approximation can be found by setting it to zero and solving for x. Specifically, if at step s we are at the point x̂^(s), then, using the linear approximation for x̂^(s+1), we get
\[
\hat{x}^{(s+1)} = \hat{x}^{(s)} - J^{-1}(\hat{x}^{(s)}) \cdot f(\hat{x}^{(s)}),
\]
where J^{-1}(x̂^(s)) is the inverse of the Jacobian matrix at the point x̂^(s). With a learning rate η the update step becomes
\[
\hat{x}^{(s+1)} = \hat{x}^{(s)} + \Delta \hat{x}^{(s)},
\qquad\text{where}\qquad
\Delta \hat{x}^{(s)} = -\eta \, J^{-1}(\hat{x}^{(s)}) \cdot f(\hat{x}^{(s)})
\tag{D.10}
\]
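The following minimal sketch (our own illustration, not the paper's solver; the example system, starting point and tolerances are arbitrary) implements the update of equation D.10 with NumPy on the system of equation D.4.

```python
# Minimal illustrative sketch of the multi-dimensional Newton-Raphson update
# (equation D.10), applied to the example system of equation D.4 with b = 1.
import numpy as np

def f(v, b=1.0):
    x, y = v
    return np.array([x + y - b, x**2 + y**2 - 1.0])

def jacobian(v):
    x, y = v
    return np.array([[1.0, 1.0],
                     [2.0 * x, 2.0 * y]])

def newton_nd(v0, eta=1.0, tol=1e-12, max_iter=50):
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        fv = f(v)
        if np.linalg.norm(fv) < tol:
            return v
        # Delta x = -eta * J^{-1} f; solve the linear system rather than inverting J
        v = v + eta * np.linalg.solve(jacobian(v), -fv)
    raise RuntimeError("did not converge")

print(newton_nd([1.1, -0.2]))    # converges to the root (1, 0)
```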
E Explanation of key charts and tables
Figure E.1: Representation of a single AMM curve with state. The left panel shows an unlevered invariance (bonding) curve on the WETH/USDC pair: the AMM is indifferent between all combinations of token holdings on this curve, and currently it is at the point indicated by the star. The right hand panel shows a levered curve where the interpretation is the same, except that the state cannot go beyond the solid part of the curve. The dotted part is the associated unlevered curve.
Our key means of representing the state of the world are the curve charts. Fundamentally, they represent the invariance curves of the respective AMMs, with the current position marked with a star. We consider two types of curves, unlevered and levered ones. In the former, for example described by the traditional AMM equation 1.3, liquidity is placed all along the curve, which therefore covers the entire price range between p = 0 and p = ∞. Such a curve is shown in the left hand panel of figure E.1. A levered curve on the other hand only places liquidity inside a certain price range, and the mechanism operates with virtual token balances as described in equation 1.5. Such a curve is shown in the right hand panel of figure E.1, where liquidity is only available for virtual balances between roughly 1 . . . 2 WETH and 400 . . . 700 USDC. In both cases, the current state of the AMM on the curve is depicted by a star. Ignoring fees, all trades must happen on the solid part of the respective curve. The dotted part on the right hand panel is for information purposes only – this is where the AMM would trade if the curve were unlevered with the same virtual token balances.
Here, both curves are trading WETH against USDC. The left, unlevered one has a pool constant k = 500, and the current price is p = 1500 USDC per WETH⁵⁰. The current state, as indicated by the blue star, is at virtual balances of 0.58 WETH and 866 USDC, the ratio of the two numbers corresponding to the price of 1500. This curve trades over the entire price range. The right hand curve is a levered curve that only trades in the small area where the curve is solid. The current price, again indicated by the blue star, is 3000, and the range is determined by the two ends of the solid curve at prices of 2,000 and 3,500 respectively.
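As a small worked check of these numbers (assuming the standard constant product form x·y = k with marginal price p = y/x, which is how we read equation 1.3):
\[
x = \sqrt{k/p} = \sqrt{500/1500} \approx 0.58 \ \text{WETH},
\qquad
y = \sqrt{k\,p} = \sqrt{500 \cdot 1500} \approx 866 \ \text{USDC},
\]
which reproduces the state marked by the star in the left hand panel.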
In section 2.2 we have run the convex optimization algorithm on a number of curves,
notably figure 2.1 (pair), figure 2.2 (triangle), figure 2.3 (triangle and pairs), and figure
2.4 (levered pair).
In table 2.1 (corresponding to figure 2.1) we see the trade instruction table associated
with the pair trade. It contains the following lines:
• Price line: the post trade price of the respective token, in units of the “target
token” in which the profit is taken (which can in this line be identified by a value
of 1).
• Curve lines (ETH1-3 ): the trading lines corresponding to specific curves, as seen
from the AMM. A negative number is an outflow of the respective token, and a
positive number an inflow.
• Net line: the difference between aggregate in- and outflows; in an arbitrage transaction all flows but one should be approximately zero, meaning that, in aggregate, no token flows happen in those tokens, and one flow (the one in the target token) should be negative, indicating the profit taken by the trader (see the sketch after this list).
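To make this concrete, the following minimal sketch (our own illustration with made-up numbers, shaped like the tables described above) checks that the per-token sum of all curve lines nets to approximately zero in every column except the target token, where it is negative.

```python
# Minimal illustrative sketch (made-up numbers): verifying the net line of a
# trade instruction table. Flows are as seen from the AMMs: negative = outflow.
curve_lines = {
    "curve1": {"USDC": 1500.0, "ETH": -1.0},   # AMM receives USDC, pays out ETH
    "curve2": {"USDC": -1520.0, "ETH": 1.0},   # AMM pays out USDC, receives ETH
}
target = "USDC"

net = {}
for flows in curve_lines.values():
    for token, amount in flows.items():
        net[token] = net.get(token, 0.0) + amount

for token, amount in net.items():
    if token == target:
        assert amount < 0, "profit shows up as a negative net flow in the target token"
    else:
        assert abs(amount) < 1e-6, "all other tokens must net to approximately zero"

print(net)   # {'USDC': -20.0, 'ETH': 0.0}  ->  20 USDC profit for the trader
```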
In table 2.2 (corresponding to figure 2.2) we are looking at a triangle arbitrage, meaning
that we have three tokens (WETH, WBTC and USDC) and one curve each for the
corresponding pairs. The structure of the table is the same as in table 2.1 except that
we have one additional token column. Finally, in table 2.3 (corresponding to figure 2.3) we have 3 tokens and 3 curves per pair, so a total of 9 curve lines.
F Implementation example
In this appendix we describe in more detail the assumptions made and data used in
section 4. We have 7 tokens, TKN0 to TKN6, and their base prices (ie the prices they
would have without deviations) are given in table F.1. For simplicity they follow a
geometric progression. For example, TKN4 has twice the price of TKN3, and four times
that of TKN2.
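In other words – under our reading of the tables, which treats TKN0 as worth approximately 1 USD – the base price of TKNi is
\[
p_i = 2^i \ \text{USD}, \qquad i = 0, \ldots, 6,
\]
consistent with the PRICE rows of tables F.4 and F.6 below.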
We can confirm that there is only a very small arbitrage in the original scenario corresponding to table F.2 when we look at the trade instructions in tables F.4 and F.5. The total arbitrage is about 400 USD, both when extracted via TKN0 and via TKN2. When we add the curve CaX from table F.3, however, the arbitrage increases massively to about 40,000 USD, as shown in tables F.6 and F.7. This is unsurprising, given that the price of this curve is 20% off the base price, that the capacity of the curve is 10 times bigger than that of the standard curves, and that it is commensurate with the curves Ca0 to Ca2 that close the arbitrage⁵¹. As expected, most of the trading activity happens in those four curves and their associated tokens TKN1 to TKN4, plus TKN0 if the profit is taken in that token.
⁵¹ Curve CaX trades TKN1 and TKN4, and the three other curves close this in a square via TKN2 and TKN3.
cid Pair Price Base Price Deviation L L USD
C00 TKN6/TKN3 8.1411 8 1.8% 46,341 1,048,576
C01 TKN6/TKN3 8.0320 8 0.4% 46,341 1,048,576
C02 TKN0/TKN2 0.2524 1/4 1.0% 524,288 1,048,576
C03 TKN4/TKN3 2.0448 2 2.2% 92,682 1,048,576
C04 TKN3/TKN2 2.0374 2 1.9% 185,364 1,048,576
C05 TKN3/TKN2 1.9805 2 -1.0% 185,364 1,048,576
C06 TKN4/TKN1 8.0760 8 1.0% 185,364 1,048,576
C07 TKN4/TKN1 7.9879 8 -0.2% 185,364 1,048,576
C08 TKN2/TKN1 1.9979 2 -0.1% 370,728 1,048,576
C09 TKN6/TKN0 64.2628 64 0.4% 131,072 1,048,576
C10 TKN4/TKN2 4.0058 4 0.1% 131,072 1,048,576
C11 TKN4/TKN5 0.5073 1/2 1.5% 46,341 1,048,576
C12 TKN6/TKN4 4.0304 4 0.8% 32,768 1,048,576
C13 TKN1/TKN2 0.5006 1/2 0.1% 370,728 1,048,576
C14 TKN0/TKN5 0.0314 1/32 0.4% 185,364 1,048,576
C15 TKN0/TKN5 0.0314 1/32 0.3% 185,364 1,048,576
C16 TKN2/TKN3 0.5075 1/2 1.5% 185,364 1,048,576
Ca0 TKN1/TKN2 0.5000 1/2 -0.0% 2,621,440 7,414,552
Ca1 TKN2/TKN3 0.5000 1/2 0.0% 3,707,276 20,971,520
Ca2 TKN3/TKN4 0.5000 1/2 0.0% 3,707,276 41,943,040
It is important to understand that the trade instructions form a connected system that
moves tokens around. Once those instructions are created, individual curves cannot
simply be removed, even if their contribution to the arbitrage is minimal. Each curve
plays a role, and if one is removed, the associated flows must be rerouted through other
curves. This rerouting may or may not significantly increase costs, depending on the
importance of the curve to a specific trade.
TKN0 TKN1 TKN2 TKN3 TKN4 TKN5 TKN6
PRICE 1.00 1.99 3.99 7.97 15.96 31.74 64.38
C00 -536 66
C01 353 -44
C02 3,190 -803
C03 -1,401 693
C04 -2,405 1,191
C05 1,315 -661
C06 -2,105 262
C07 777 -97
C08 352 -176
C09 979 -15
C10 -39 10
C11 284 -143
C12 29 -7
C13 400 -200
C14 -1,990 63
C15 -2,565 81
C16 1,968 -991
Ca0 575 -288
Ca1 628 -314
Ca2 2,360 -1,179
AMMIn 4,169 2,105 3,911 3,904 1,276 143 66
AMMOut -4,555 -2,105 -3,911 -3,904 -1,276 -143 -66
TOTAL NET -386 0 0 0 0 0 0
Table F.4: Trade instructions with little arbitrage, extracting via TKN0
TKN0 TKN1 TKN2 TKN3 TKN4 TKN5 TKN6
PRICE 0.25 0.50 1.00 2.00 4.00 7.96 16.15
C00 -541 67
C01 348 -43
C02 3,355 -844
C03 -1,402 693
C04 -2,407 1,192
C05 1,313 -660
C06 -2,109 262
C07 773 -97
C08 353 -177
C09 1,097 -17
C10 -42 11
C11 277 -140
C12 26 -7
C13 401 -201
C14 -1,939 61
C15 -2,513 79
C16 1,966 -990
Ca0 582 -291
Ca1 586 -293
Ca2 2,346 -1,173
AMMIn 4,452 2,109 3,865 3,886 1,269 140 67
AMMOut -4,452 -2,109 -3,962 -3,886 -1,269 -140 -67
TOTAL NET -0 -0 -97 -0 -0 -0 -0
Table F.5: Trade instructions with little arbitrage, extracting via TKN2
TKN0 TKN1 TKN2 TKN3 TKN4 TKN5 TKN6
PRICE 1.00 2.10 3.94 7.69 15.18 31.21 62.44
C00 -165 20
C01 724 -90
C02 -2,985 756
C03 -2,321 1,155
C04 -5,581 2,798
C05 -1,860 946
C06 -28,135 3,680
C07 -25,254 3,322
C08 -15,990 8,255
C09 -14,978 236
C10 -5,034 1,281
C11 1,393 -692
C12 680 -167
C13 -15,942 8,231
C14 -10,626 337
C15 -11,200 355
C16 -1,208 616
Ca0 -114,981 59,331
Ca1 -62,890 31,827
Ca2 -34,425 17,326
CaX 200,302 -28,838
AMMIn 0 200,302 76,573 36,911 28,838 692 257
AMMOut -39,789 -200,302 -76,573 -36,911 -28,838 -692 -257
TOTAL NET -39,789 -0 -0 -0 -0 -0 -0
Table F.6: Trade instructions with arbitrage curve, extracting via TKN0
TKN0 TKN1 TKN2 TKN3 TKN4 TKN5 TKN6
PRICE 0.25 0.53 1.00 1.95 3.85 7.75 15.71
C00 -692 85
C01 197 -24
C02 14,145 -3,523
C03 -2,348 1,169
C04 -5,763 2,891
C05 -2,042 1,039
C06 -28,331 3,707
C07 -25,450 3,349
C08 -15,731 8,117
C09 -2,863 45
C10 -5,266 1,342
C11 709 -356
C12 429 -106
C13 -15,683 8,093
C14 -5,354 169
C15 -5,928 187
C16 -1,389 709
Ca0 -113,146 58,354
Ca1 -66,520 33,687
Ca2 -35,484 17,863
CaX 198,340 -28,567
AMMIn 14,145 198,340 74,564 38,524 28,567 356 130
AMMOut -14,145 -198,340 -84,503 -38,524 -28,567 -356 -130
TOTAL NET -0 -0 -9,939 -0 -0 -0 -0
Table F.7: Trade instructions with arbitrage curve, extracting via TKN2
TKN0 TKN1 TKN2 TKN3 TKN4 TKN5 TKN6
PRICE 1.00 2.10 3.94 7.69 15.18 31.22 62.41
C01 660 -82
C02 -2,798 708
C03 -2,323 1,156
C04 -5,579 2,797
C05 -1,858 945
C06 -28,136 3,681
C07 -25,255 3,322
C08 -15,989 8,255
C09 -15,282 241
C10 -5,035 1,282
C11 1,385 -688
C12 649 -160
C13 -15,941 8,231
C14 -10,566 335
C15 -11,141 353
C16 -1,205 615
Ca0 -114,972 59,326
Ca1 -62,842 31,802
Ca2 -34,496 17,362
CaX 200,293 -28,837
AMMIn 0 200,293 76,519 36,819 28,837 688 241
AMMOut -39,787 -200,293 -76,519 -36,819 -28,837 -688 -241
TOTAL NET -39,787 0 0 0 0 0 0
Table F.8: Trade instructions with arbitrage curve, and removing C00