
MOSEK Portfolio Optimization Cookbook
Release 1.2.1

MOSEK ApS

13 September 2022
Contents

1 Preface
1.1 Purpose
1.2 Content
1.3 Code examples

2 Markowitz portfolio optimization
2.1 The mean–variance model
2.2 Constraints, modifications
2.3 Conic formulation
2.4 Example

3 Input data preparation
3.1 Scenarios
3.2 Modeling the distribution of returns
3.3 Extensions
3.4 Example

4 Dealing with estimation error
4.1 Estimation of mean
4.2 Estimation of the covariance matrix
4.3 Using constraints
4.4 Improve the optimization
4.5 Example

5 Factor models
5.1 Explicit factor models
5.2 Implicit factor models
5.3 Modeling considerations
5.4 Example

6 Transaction costs
6.1 Variable transaction costs
6.2 Fixed transaction costs
6.3 Market impact costs
6.4 Cardinality constraints
6.5 Buy-in threshold
6.6 Example

7 Benchmark relative portfolio optimization
7.1 Active return
7.2 Factor model on active returns
7.3 Optimization
7.4 Extensions
7.5 Example

8 Other risk measures
8.1 Deviation measures
8.2 Tail risk measures
8.3 Expected utility maximization
8.4 Example

9 Risk budgeting
9.1 Risk contribution
9.2 Risk budgeting portfolio
9.3 Risk budgeting with variance
9.4 Example

10 Appendix
10.1 Conic optimization refresher
10.2 Mixed-integer models
10.3 Quadratic cones and riskless solution
10.4 Monte Carlo Optimization Selection (MCOS)

11 Notation and definitions
11.1 Financial notations
11.2 Mathematical notations
11.3 Abbreviations

Bibliography
Chapter 1

Preface

1.1 Purpose
This book provides an introduction to the topic of portfolio optimization and discusses
several branches of practical interest from this broad subject.
We intended it to be a practical guide, a cookbook, that not only serves as a reference
but also supports the reader with practical implementation. We do not assume that the
reader is acquainted with portfolio optimization, thus the book can be used as a starting
point, while for the experienced reader it can serve as a review.
First we familiarize the reader with the basic concepts and the most relevant approaches in portfolio optimization, then we present computational examples with code to illustrate these concepts and to provide a basis for implementing more complex and more specific cases. We aim to keep the discussion concise and self-contained, covering only the main ideas and tools and the most important pitfalls, from both a theoretical and a technical perspective. References direct the reader towards further reading on each subject.

1.2 Content
Each of the chapters is organized around a specific subject:

• Sec. 2 is a general introduction, and is recommended to be read first to become familiar with the problem formulation and notation used throughout this book. Here we also present a code example showing how to use MOSEK to model the problem and how to solve it.

• Sec. 3 summarizes concepts and pitfalls related to the preparation of raw data.
Here we discuss how to arrive at security returns data suitable for optimization,
starting from raw data that is commonly available on the internet. This chapter is
recommended as a second read.

• Each of the subsequent chapters focuses on a specific topic: mitigating estimation error, using factor models, modeling transaction costs, and finally optimizing relative to a benchmark. These chapters are independent and thus can be read in any order.

• The book assumes a basic familiarity with conic optimization. Sec. 10.1 can be used as a quick refresher on this topic. It shows how to convert traditional quadratic optimization (QO) and quadratically constrained quadratic optimization (QCQO) problems into equivalent quadratic conic optimization models, and what the advantages of this conversion are. Then it briefly introduces other types of cones as well.

1.3 Code examples


The code examples are all in MOSEK Fusion API for Python, but all of them can be written in other languages too. The list of programming languages that MOSEK Fusion exists for can be found at

https://www.mosek.com/documentation/

The code examples appearing in this cookbook as well as other supplementary material
are available from

https://github.com/MOSEK/PortfolioOptimization

Chapter 2

Markowitz portfolio optimization

In this section we introduce the Markowitz model in portfolio optimization, and discuss
its different formulations and the most important input parameters.

2.1 The mean–variance model


Consider an investor who wishes to allocate capital among 𝑁 securities at time 𝑡 = 0
and hold them over a single period of time until 𝑡 = ℎ. We denote by 𝑝0,𝑖 the (known) price of security 𝑖 at the beginning of the investment period and by 𝑃ℎ,𝑖 the (random) price of security 𝑖 at the end of the investment period 𝑡 = ℎ. The rate of return of security 𝑖 over period ℎ is then modeled by the random variable 𝑅𝑖 = 𝑃ℎ,𝑖/𝑝0,𝑖 − 1, and its expected value is denoted by 𝜇𝑖 = E(𝑅𝑖). The risk-averse investor seeks to maximize the return of the investment, while trying to keep the investment risk, i. e., the uncertainty of the future security returns 𝑅𝑖, at an acceptably low level.
The part of investment risk called specific risk (the risk associated with each individual security) can be reduced (for a fixed expected rate of return) through diversification, which means the selection of multiple unrelated securities into a portfolio.¹ Modern Portfolio Theory formalized this process by using variance as a measure of portfolio risk, and constructing an optimization problem.
Thus we make the investment decision at time 𝑡 = 0 by specifying the 𝑁-dimensional decision vector x called portfolio, where 𝑥𝑖 is the fraction of funds invested into security 𝑖. We can then express the random portfolio return as 𝑅x = ∑𝑖 𝑥𝑖 𝑅𝑖 = xT 𝑅, where 𝑅 is the vector of security returns. The optimal x is given based on the following inputs of the portfolio optimization problem:

• The expected portfolio return

   𝜇x = E(𝑅x) = xT E(𝑅) = xT 𝜇.

• The portfolio variance

   𝜎x² = Var(𝑅x) = ∑𝑖,𝑗 Cov(𝑅𝑖, 𝑅𝑗)𝑥𝑖𝑥𝑗 = xT Σx.

¹ The other component of risk, called systematic risk, can only be reduced by decreasing the expected return.

Here 𝜇 is the vector of expected returns and Σ is the covariance matrix of returns, summarizing the risks associated with the securities. Note that the covariance matrix is symmetric positive semidefinite by definition, and if we assume that none of the securities is redundant², then it is positive definite. This can also be seen if we consider that the portfolio variance must always be a positive number. Based on these input parameters, the problem is also referred to as mean–variance optimization (MVO). The choice of variance as the risk measure results in MVO being a quadratic optimization (QO) problem.³
Using these input parameters, the MVO problem seeks to select a portfolio of securities
x in such a way that it finds the optimal tradeoff between expected portfolio return and
portfolio risk. In other words it seeks to maximize the return while limiting the level of
risk, or it wants to minimize the risk while demanding at least a given level of return.
Thus portfolio optimization can be seen as a bi-criteria optimization problem.

2.1.1 Solution of the mean–variance model


To be able to solve the portfolio optimization problem, we have to assume that the investor
knows the value of the expected return vector 𝜇 and covariance matrix Σ. In practice it
is only possible to have estimates, which we will denote by 𝜇 and Σ respectively. (Details
on the properties of these estimates and ways to improve them will be the topic of Sec.
4.)
In the simplest form of the MVO problem only risky securities are considered. Also the portfolio is fully invested with no initial holdings, implying the linear equality constraint ∑𝑖 𝑥𝑖 = 1T x = 1, where 1 denotes a vector of ones.
We can formulate this portfolio optimization problem in three equivalent ways:
1. Minimize the portfolio risk, with the constraint expressing a lower bound on the
portfolio return:
minimize xT Σx
subject to 𝜇T x ≥ 𝑟min , (2.1)
1T x = 1.
Here 𝑟min is the target expected return, which the investor wishes to reach. In this
problem the objective function is convex quadratic, all the constraints are linear,
thus it is a quadratic optimization (QO) problem. While QO problems are easy
to solve, this formulation is not ideal in practice because investors might prefer
specifying an upper bound on the portfolio risk instead of specifying a lower bound
on the expected portfolio return.
2. Maximize the expected portfolio return, with the constraint expressing an upper
bound on the portfolio risk:
maximize 𝜇T x
subject to xT Σx ≤ 𝛾 2 , (2.2)
1T x = 1.
² A security is redundant within the universe of securities in consideration, if there is a replicating portfolio, i. e., a combination of other securities in the universe, that has the same properties. It means that we can drop the redundant security from the universe without loss of opportunities for diversification. Keeping it in the portfolio could cause the covariance matrix Σ to be singular.
³ This was a reasonable choice from both the financial and the mathematical point of view, because modern portfolio theory and the theory of quadratic optimization started to develop about the same time in history, in the 1950s. However, the way we model risk can be different. See later in Sec. 8.

In this formulation it is possible to explicitly constrain the risk measure (the vari-
ance), which makes it more practical. We can also add constraints on multiple
types of risk measures more easily. However, the problem in this form is not a QO
because of the quadratic constraint. We can reformulate it as a conic problem; see
Sec. 2.3.

3. Maximize the utility function of the investor:

   maximize    𝜇T x − (𝛿/2) xT Σx
   subject to  1T x = 1.                    (2.3)

In this case we construct the (concave) quadratic utility function 𝜇T x − (𝛿/2) xT Σx to represent the risk-averse investor's preferred tradeoff between portfolio return and portfolio risk. This preference can be adjusted using the risk-aversion coefficient 𝛿. However, this parameter might not have an intuitive investment meaning for the investor.
We get a variant of this optimization problem by using the standard deviation instead of the variance:

   maximize    𝜇T x − 𝛿̃ √(xT Σx)
   subject to  1T x = 1.                    (2.4)

This form is favorable because then the standard deviation penalty term will be of the same scale as the portfolio return.
Moreover, if we assume portfolio return to be normally distributed, then 𝛿̃ has a more tangible meaning; it is the z-score⁴ of the portfolio return. Also, the objective function will be the (1 − 𝛼)-quantile of portfolio return, which is the opposite of the 𝛼 confidence level value-at-risk (VaR). The number 𝛼 comes from Φ(−𝛿̃) = 1 − 𝛼, where Φ is the cumulative distribution function of the normal distribution.
We can see that for 𝛿̃ = 0 we maximize expected portfolio return. Then by increasing 𝛿̃ we put more and more weight on tail risk, i. e., we maximize a lower and lower quantile of portfolio return. This makes the selection of 𝛿̃ more intuitive in practice. Note that computing quantiles is more complicated for other distributions, because in general they are not determined by only the mean and standard deviation.
These two problems will result in the same set of optimal solutions as the other two formulations. They also allow an easy computation of the entire efficient frontier. Problem (2.3) is again a QO, but problem (2.4) is not because of the square root. This latter problem can be solved using conic reformulation, as we will see in Sec. 2.3.
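As a quick illustration of the relationship Φ(−𝛿̃) = 1 − 𝛼, the following minimal sketch (assuming scipy is available; the variable names are ours) computes the risk-aversion coefficient 𝛿̃ corresponding to a few confidence levels 𝛼:

from scipy.stats import norm

# delta_tilde is the z-score belonging to the alpha confidence level,
# i.e. the solution of Phi(-delta_tilde) = 1 - alpha.
for alpha in [0.90, 0.95, 0.99]:
    delta_tilde = norm.ppf(alpha)
    print(f"alpha = {alpha:.2f}  =>  delta_tilde = {delta_tilde:.4f}")

For example, 𝛼 = 0.95 gives 𝛿̃ ≈ 1.645, i. e., the objective of (2.4) is then the 5%-quantile of portfolio return.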

The optimal portfolio x computed by the Markowitz model is efficient in the sense
that there is no other portfolio giving a strictly higher return for the same amount of
risk or a strictly lower risk for the same amount of return. In other words an efficient
portfolio is Pareto optimal. The collection of such points form the efficient frontier in
mean return–variance space.
The above introduced simple formulations are trivial to solve. Through the method of Lagrange multipliers we can get analytical formulas, which involve 𝜇 and Σ⁻¹ as parameters. Thus the solution is simply to invert the covariance matrix (see e. g. in [CJPT18]). This makes them useful as a demonstration example, but they have only limited investment value.

⁴ The z-score is the distance from the mean measured in units of standard deviation.
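For instance, for problem (2.3) the method of Lagrange multipliers gives x = (1/𝛿)Σ⁻¹(𝜇 − 𝜆1), with 𝜆 chosen so that 1T x = 1. A minimal numpy sketch of this closed-form solution (for demonstration only; the function name is ours):

import numpy as np

def analytical_mvo(m, S, delta):
    # Closed-form solution of problem (2.3) via Lagrange multipliers:
    # x = (1/delta) * S^-1 * (m - lam * 1).
    ones = np.ones(len(m))
    S_inv_m = np.linalg.solve(S, m)
    S_inv_1 = np.linalg.solve(S, ones)
    # lam is set so that the budget constraint 1'x = 1 holds.
    lam = (ones @ S_inv_m - delta) / (ones @ S_inv_1)
    return (S_inv_m - lam * S_inv_1) / delta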

2.2 Constraints, modifications


In investment practice additional constraints are essential, and it is common to add linear
equalities and inequalities, convex inequalities, and/or extra objective function terms to
the model, which represent various restrictions on the optimal security weights.
Constraints can reflect investment strategy or market outlook information, they can be
useful for imposing quality controls on the portfolio management process, or for control-
ling portfolio structure and avoiding inadvertent risk exposures. In these more realistic
use cases however, only numerical optimization algorithms are suitable for finding the
solution.
In the following, we discuss some of the constraints commonly added to portfolio
optimization problems.

2.2.1 Budget constraint


In general we can assume that we have a fraction x0 of initial holdings, and a fraction 𝑥f0 of (risk-free) initial wealth to invest into risky securities. Then 1T x0 + 𝑥f0 = 1 is an initial condition, meaning that both the risky and the risk-free securities take up some fraction of the initial portfolio. After portfolio optimization, the portfolio composition changes, but no cash is added to or taken out from the portfolio. Thus the condition

   1T x + 𝑥f = 1   (2.5)

has to hold. For the differences x̃ = x − x0 and 𝑥̃f = 𝑥f − 𝑥f0 it follows that 1T x̃ + 𝑥̃f = 0 has to hold.
If we do not want to keep any wealth in the risk-free security, then in addition 𝑥f must
be zero.

2.2.2 Diversification constraints


These constraints can help limit portfolio risk by limiting the exposure to individual positions, industries, or sectors.
For a single position, they can be given in the form

   𝑙𝑖 ≤ 𝑥𝑖 ≤ 𝑢𝑖.
For a group of securities such as an industry or sector, represented by the set ℐ, they will be

   𝑙ℐ ≤ ∑𝑖∈ℐ 𝑥𝑖 ≤ 𝑢ℐ.

We can also create a constraint that limits the total fraction of the 𝑚 largest investments to at most 𝑝. Based on Sec. 10.1.1 this is represented by the following set of constraints:

   𝑚𝑡 + 1T u ≤ 𝑝,  u + 𝑡 ≥ x,  u ≥ 0,

where u and 𝑡 are new variables.
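A minimal MOSEK Fusion sketch of this constraint set (assuming a model M, a portfolio variable x and the number of securities N as in the code examples of Sec. 2.4; m_largest and p are our illustrative names):

# Limit the total fraction of the m_largest biggest positions to at most p.
t = M.variable("t", 1, Domain.unbounded())
u = M.variable("u", N, Domain.greaterThan(0.0))

# m*t + 1'u <= p
M.constraint(Expr.add(Expr.mul(m_largest, t), Expr.sum(u)), Domain.lessThan(p))

# u + t >= x elementwise; the N x 1 matrix of ones broadcasts t to a vector.
M.constraint(Expr.sub(Expr.add(u, Expr.mul(np.ones((N, 1)), t)), x),
             Domain.greaterThan(0.0))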

2.2.3 Leverage constraints
Componentwise short sale limit
The simplest form of leverage constraint is when we do not allow any short selling. This
is the long-only constraint stated as

x ≥ 0.

We can allow some amount of short selling 𝑠𝑖 on security 𝑖 by stating

𝑥𝑖 ≥ −𝑠𝑖 .

Total short sale limit


We can also bound the total short selling by the constraint

   ∑𝑖 max(−𝑥𝑖, 0) ≤ 𝑆.

We can rewrite this as a set of linear constraints by modeling the maximum function based on Sec. 10.1.1, using an auxiliary vector t⁻:

   1T t⁻ ≤ 𝑆,  t⁻ ≥ −x,  t⁻ ≥ 0.

Collateralization requirement
An interesting variant of this constraint is the collateralization requirement, which limits
the total of short positions to a fraction of the total of long positions. We can model it
by introducing new variables x+ and x− for the long and short part of x, based on Sec.
10.1.1. Then the collateralization requirement will be:
   ∑𝑖 𝑥𝑖⁻ ≤ 𝑐 ∑𝑖 𝑥𝑖⁺.

Leverage strategy
A leverage strategy is to do short selling, then use the proceeds to buy other securities.
We can express such a strategy in general using two constraints:

• 1T x = 1,

• ∑𝑖 |𝑥𝑖| ≤ 𝑐, or equivalently ‖x‖1 ≤ 𝑐,

where 𝑐 = 1 yields the long-only constraint and 𝑐 = 1.6 means the 130/30 strategy. The 1-norm constraint is nonlinear, but can be modeled as a linear constraint based on Sec. 10.1.1: −z ≤ x ≤ z, 1T z = 𝑐.
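A minimal Fusion sketch of this 1-norm reformulation (assuming again a model M and variable x as in Sec. 2.4; c is our illustrative name for the exposure limit):

# Model ||x||_1 <= c via -z <= x <= z, 1'z = c (Sec. 10.1.1).
z = M.variable("z", N, Domain.unbounded())
M.constraint(Expr.sub(z, x), Domain.greaterThan(0.0))  # x <= z
M.constraint(Expr.add(z, x), Domain.greaterThan(0.0))  # -z <= x
M.constraint(Expr.sum(z), Domain.equalsTo(c))          # 1'z = c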

2.2.4 Turnover constraints
The turnover constraint limits the total change in the portfolio positions, which can help limit e. g. taxes and transaction costs.
Suppose x0 is the initial holdings vector. Then we can write the turnover constraint
as

‖x − x0 ‖1 ≤ 𝑐.

This nonlinear expression can be modeled as a linear constraint using Sec. 10.1.1, in the same way as the leverage constraint above; see the sketch below.
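A sketch of the same device applied to the turnover constraint (x0 is the initial holdings vector as above, c the turnover limit; the auxiliary variable name is ours):

# Model ||x - x0||_1 <= c with an auxiliary vector z.
z = M.variable("z_turnover", N, Domain.unbounded())
M.constraint(Expr.sub(z, Expr.sub(x, x0)), Domain.greaterThan(0.0))  # x - x0 <= z
M.constraint(Expr.add(z, Expr.sub(x, x0)), Domain.greaterThan(0.0))  # -z <= x - x0
M.constraint(Expr.sum(z), Domain.lessThan(c))                        # 1'z <= c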

2.2.5 Practical considerations


We can of course add many other constraints that might be imposed by regulations or
arise from specific investor preferences. As long as all of these are linear or conic, the
problem will not become significantly harder to solve. (See Sec. 10.1 for a summary on
conic constraints.)
However, we must not add too many constraints. Overconstraining a portfolio can lead to infeasibility of the optimization problem, or, even if a solution exists, its out-of-sample performance could be poor.
Throughout this book the general notation x ∈ ℱ will indicate any further constraints
added to the problem, that are not relevant to the given example. Then we can write the
general MVO problem in the form

maximize 𝜇T x
subject to xT Σx ≤ 𝛾 2 , (2.6)
x ∈ ℱ.

2.3 Conic formulation


We have seen that some of the portfolio optimization problems in Sec. 2.1.1 are quadratic optimization problems. Solving QO problems in their original form is popular and considered easy, because this model has been studied since the 1950s, making it a well-known and mature approach. However, more recent results show us that conic optimization models can improve on QO models both in the theoretical and in the practical sense. More on this in Sec. 10.1.2.
In this section we will show how to convert problems (2.1)-(2.4) into conic form. Assuming that the covariance matrix estimate Σ is positive definite, it is possible to decompose it as Σ = GGT, where G ∈ ℝ^{N×k}. We can do this for example by Cholesky decomposition; see Sec. 10.1.1 for a list of other possibilities. An interesting approach in financial applications is the factor model, which can have the advantage of a direct financial interpretation (see in Sec. 5).
Even if there are no redundant securities, Σ can be positive semidefinite if the number
of samples is lower than the number of securities. In this case the Cholesky decomposition
might not be computable.⁵ We can then apply regularization techniques (e. g. shrinkage, see in Sec. 4.2.1) to ensure positive definiteness. Otherwise, if suitable linear returns data (see in Sec. 3) is available, then a centered and normalized data matrix can also serve as matrix G.

⁵ The Cholesky decomposition is possible for positive semidefinite matrices, but software implementations might need positive definiteness depending on the algorithm used (e. g. Python numpy).
Using the decomposition we can write the portfolio variance as xT Σx = xT GGT x = ‖GT x‖₂². This leads to two different conic forms (see also in Sec. 10.1.1).

1. Modeling with the rotated quadratic cone:

We can directly model the squared norm constraint ‖GT x‖₂² ≤ 𝛾² using the rotated quadratic cone as (𝛾², 1/2, GT x) ∈ 𝒬_r^{𝑘+2}. This will give us the conic equivalent of e. g. problem (2.2):

   maximize    𝜇T x
   subject to  (𝛾², 1/2, GT x) ∈ 𝒬_r^{𝑘+2},   (2.7)
               1T x = 1.

2. Modeling with the quadratic cone:

We can also consider the equivalent norm constraint ‖GT x‖₂ ≤ 𝛾 instead of the squared norm, and model it using the quadratic cone as (𝛾, GT x) ∈ 𝒬^{𝑘+1}. The conic equivalent of problem (2.2) will then look like

   maximize    𝜇T x
   subject to  (𝛾, GT x) ∈ 𝒬^{𝑘+1},   (2.8)
               1T x = 1.

This way we can also model problem (2.4). We introduce the variable 𝑠 to represent the upper bound of the portfolio standard deviation, and model the constraint ‖GT x‖₂ ≤ 𝑠 using the quadratic cone as

   maximize    𝜇T x − 𝛿̃𝑠
   subject to  (𝑠, GT x) ∈ 𝒬^{𝑘+1},   (2.9)
               1T x = 1.

While modeling problem (2.3) using the rotated quadratic cone might seem more natural, it can also be reasonable to model it using the quadratic cone, for the benefits of the smaller scale and the intuitive interpretation of 𝛿̃. This is the same as the modeling of problem (2.4), which shows that conic optimization can also provide an efficient solution to problems that are inefficient to solve in their original form.
In general, transforming the optimization problem into the conic form has multiple
practical advantages. Solving the problem in this format will result in a more robust and
usually faster and more reliable solution process. Check Sec. 10.1.2 for details.

2.4 Example
We have seen how to transform a portfolio optimization problem into conic form. Now
we will present a detailed example in MOSEK Fusion. Assume that the input data
estimates 𝜇 and Σ are given. The latter is also positive definite. For methods and
examples on how to obtain these see Sec. 3.
Suppose we would like to create a long only portfolio of eight stocks. Assume that we
receive the following variables:

# Expected returns and covariance matrix
m = np.array(
[0.0720, 0.1552, 0.1754, 0.0898, 0.4290, 0.3929, 0.3217, 0.1838]
)
S = np.array([
[0.0946, 0.0374, 0.0349, 0.0348, 0.0542, 0.0368, 0.0321, 0.0327],
[0.0374, 0.0775, 0.0387, 0.0367, 0.0382, 0.0363, 0.0356, 0.0342],
[0.0349, 0.0387, 0.0624, 0.0336, 0.0395, 0.0369, 0.0338, 0.0243],
[0.0348, 0.0367, 0.0336, 0.0682, 0.0402, 0.0335, 0.0436, 0.0371],
[0.0542, 0.0382, 0.0395, 0.0402, 0.1724, 0.0789, 0.0700, 0.0501],
[0.0368, 0.0363, 0.0369, 0.0335, 0.0789, 0.0909, 0.0536, 0.0449],
[0.0321, 0.0356, 0.0338, 0.0436, 0.0700, 0.0536, 0.0965, 0.0442],
[0.0327, 0.0342, 0.0243, 0.0371, 0.0501, 0.0449, 0.0442, 0.0816]
])

The similar magnitude and positivity of all the covariances suggest that the selected securities are closely related. In Sec. 3.4.2 we will see that they are indeed from the same market and that the data was collected from a highly bullish time period, resulting in the large expected returns. Later in Sec. 5.4.1 we will analyze this covariance matrix more thoroughly.

2.4.1 Maximizing return


We will solve problem (2.2) with an additional constraint to prevent short selling. We specify a risk level of 𝛾² = 0.05 and assume that there are no transaction costs. The optimization problem will then be

   maximize    𝜇T x
   subject to  xT Σx ≤ 𝛾²,
               1T x = 1,                     (2.10)
               x ≥ 0.

Recall that by the Cholesky decomposition we have Σ = GGT. Then we can model the quadratic term xT Σx ≤ 𝛾² using the rotated quadratic cone as (𝛾², 1/2, GT x) ∈ 𝒬_r^{N+2}, and arrive at

   maximize    𝜇T x
   subject to  (𝛾², 1/2, GT x) ∈ 𝒬_r^{N+2},
               1T x = 1,                     (2.11)
               x ≥ 0.

In the code, G will be an input parameter, so we compute it first. We use the Python
package numpy for this purpose, abbreviated here as np.

N = m.shape[0]   # Number of securities

gamma2 = 0.05    # Risk limit

# Cholesky factor of S to use in conic risk constraint
G = np.linalg.cholesky(S)

Next we define the Fusion model for problem (2.11):

with Model("markowitz") as M:
# Settings
M.setLogHandler(sys.stdout)

# Decision variable (fraction of holdings in each security)


# The variable x is the fraction of holdings in each security.
# x must be positive, this imposes the no short-selling constraint.
x = M.variable("x", N, Domain.greaterThan(0.0))

# Budget constraint
M.constraint('budget', Expr.sum(x), Domain.equalsTo(1))

# Objective
M.objective('obj', ObjectiveSense.Maximize, Expr.dot(m, x))

# Imposes a bound on the risk


M.constraint('risk', Expr.vstack(gamma2, 0.5, Expr.mul(G.T, x)),
Domain.inRotatedQCone())

# Solve optimization
M.solve()

returns = M.primalObjValue()
portfolio = x.level()

The resulting portfolio return is 𝜇T x = 0.2767 and the portfolio composition is x = [0, 0.0913, 0.2691, 0, 0.0253, 0.3216, 0.1765, 0.1162].

2.4.2 Efficient frontier


In this section we compute the full efficient frontier using the same input variables. This is easiest to do based on formulation (2.4), by varying the parameter 𝛿̃. We also prevent short selling in this case and assume no transaction costs. The optimization problem becomes

   maximize    𝜇T x − 𝛿̃ √(xT Σx)
   subject to  1T x = 1,                     (2.12)
               x ≥ 0.

With the help of the Cholesky decomposition, we can model the term √(xT Σx) = √(xT GGT x) = ‖GT x‖₂ in the objective as a conic constraint. Let us introduce the variable 𝑠 and apply that ‖GT x‖₂ ≤ 𝑠 is equivalent to the quadratic cone membership (𝑠, GT x) ∈ 𝒬^{N+1}. Then the problem will take the form

   maximize    𝜇T x − 𝛿̃𝑠
   subject to  1T x = 1,
               x ≥ 0,                        (2.13)
               (𝑠, GT x) ∈ 𝒬^{N+1}.

First we compute the input variable G:

N = m.shape[0]   # Number of securities

# Cholesky factor of S to use in conic risk constraint
G = np.linalg.cholesky(S)

Next we define the Fusion model for problem (2.13):

with Model("Case study") as M:


# Settings
M.setLogHandler(sys.stdout)

# Variables
# The variable x is the fraction of holdings in each security.
# x must be positive, this imposes the no short-selling constraint.
x = M.variable("x", N, Domain.greaterThan(0.0))

# The variable s models the portfolio variance term in the objective.


s = M.variable("s", 1, Domain.unbounded())

# Budget constraint
M.constraint('budget', Expr.sum(x), Domain.equalsTo(1))

# Objective (quadratic utility version)


delta = M.parameter()
M.objective('obj', ObjectiveSense.Maximize,
Expr.sub(Expr.dot(m, x), Expr.mul(delta, s)))

# Conic constraint for the portfolio variance


M.constraint('risk', Expr.vstack(s, Expr.mul(G.T, x)), Domain.inQCone())

Here we use a feature in Fusion that allows us to define model parameters. The line
delta = M.parameter() in the code does this. Defining a parameter is ideal for the
computation of the efficient frontier, because it allows us to reuse an already existing
optimization model, by only changing its parameter value.
To do so, we define the range of values for the risk aversion parameter 𝛿 to be 20
numbers between 10−1 and 101.5 :

deltas = np.logspace(start=-1, stop=1.5, num=20)[::-1]

Then we will solve the optimization model repeatedly in a loop, setting a new value
for the parameter 𝛿 in each iteration, without having to redefine the whole model each
time.

columns = ["delta", "obj", "return", "risk"] + df_prices.columns.tolist()


df_result = pd.DataFrame(columns=columns)
for d in deltas:
# Update parameter
delta.setValue(d)

# Solve optimization
M.solve()
(continues on next page)

12
(continued from previous page)

# Save results
portfolio_return = m @ x.level()
portfolio_risk = s.level()[0]
row = pd.Series([d, M.primalObjValue(), portfolio_return,
portfolio_risk] + list(x.level()), index=columns)
df_result = df_result.append(row, ignore_index=True)

Then we can plot the results. See the risk-return plane on Fig. 2.1. We generate it
by the following code:

# Efficient frontier
df_result.plot(x="risk", y="return", style="-o",
xlabel="portfolio risk (std. dev.)",
ylabel="portfolio return", grid=True)

We can observe the portfolio composition for different risk-aversion levels on Fig. 2.2,
generated by

# Portfolio composition
my_cmap = LinearSegmentedColormap.from_list("non-extreme gray",
["#111111", "#eeeeee"], N=256, gamma=1.0)
df_result.set_index('risk').iloc[:, 3:].plot.area(colormap=my_cmap,
xlabel='portfolio risk (std. dev.)', ylabel="x")

Fig. 2.1: The efficient frontier.

Fig. 2.2: Portfolio composition x with varying level of risk-aversion 𝛿.

Chapter 3

Input data preparation

In Sec. 2.1.1 we mentioned that the expected return vector 𝜇 and the covariance matrix Σ of the securities need to be estimated. In this chapter we will discuss this topic through a more general approach.
Portfolio optimization problems are inherently stochastic because they contain uncer-
tain forward-looking quantities, most often the return of securities 𝑅. In the case of the
basic mean–variance optimization problems (2.1)-(2.3) the simple form of the objective
and constraint functions make it possible to isolate uncertainty from the decision vari-
ables and wrap it into the parameters. This allows us to separate stochastic optimization
into parameter estimation and deterministic optimization steps.
In many cases, for instance those involving different risk measures, however, it is
not possible to estimate the optimization inputs as a separate step. These objectives
or constraints can be a non-trivial function of portfolio returns, and thus we have to
explicitly model the randomness. See examples in Sec. 8. Therefore more generally our
goal is to estimate the distribution of returns over the investment time period. The
simplest way to do that is by using the concept of scenario.

3.1 Scenarios
A scenario is a possible realization z𝑘 of the 𝑁-dimensional random vector 𝑍 of a quantity corresponding to a given time period [MWOS15]. Thus we can also see a set of scenarios as a discretization of the distribution of 𝑍.
We denote the number of scenarios by 𝑇. The probability of scenario z𝑘 occurring will be 𝑝𝑘, with ∑_{𝑘=1}^{𝑇} 𝑝𝑘 = 1. In practice, when the scenarios come from historical data or are obtained via computer simulation, all scenarios are assumed to have the same probability 1/𝑇.
We commonly work with scenarios of the security returns 𝑅. Scenario 𝑘 of security returns will be r𝑘. Assuming that 𝑝𝑘 = 1/𝑇, we can arrange the scenarios as columns of the 𝑁 × 𝑇 matrix R.
Models involving certain risk measures can most easily be solved as convex opti-
mization problems using scenarios (see Sec. 8). This approach is called sample average
approximation in stochastic optimization.
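As a small illustration (assuming a pandas DataFrame df_returns of historical security returns, with one row per observation and one column per security; the names are ours), the scenario matrix R and the scenario probabilities can be formed as:

import numpy as np

# Each column of R is one scenario of the N security returns.
R = df_returns.to_numpy().T   # N x T
T = R.shape[1]

# Historical scenarios are assumed to be equally probable.
p = np.full(T, 1.0 / T)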

3.1.1 Scenario generation
Both the quality and the quantity of scenarios are important for obtaining reliable optimal portfolios. Scenarios must be representative, and there must also be enough of them to accurately model a random variable. Too few scenarios could lead to approximation errors, while too many scenarios could result in excessive computational cost. Next we discuss some common ways of generating scenarios.

3.1.2 Historical data


If we can assume that past realizations of a random quantity represents possible future
outcomes, we can simply use historical data to create scenarios. Observations are then
collected from a predetermined time window, with a given frequency. Each scenario would
correspond to a vector of simultaneous observations for all 𝑁 securities.
Using historical data is a simple non-parametric approach, but it could be inaccurate
or insufficient in some cases:

• The underlying assumption that past realizations represent the future might not be realistic if markets fundamentally change in the observed time period, or if unexpected extreme events happen. Such changes can happen at least every several years, often making the collection of enough representative historical data impossible.

• It can represent the tails of the distribution inaccurately because of the low number of samples corresponding to the low probability tail events. Also, these extreme values can highly depend on the given historical sample, and can fluctuate a lot when the sample changes.

• It depends on the observations being independent and identically distributed. Without this assumption, for example if the data exhibits autocorrelation, volatility clustering, etc., the distribution estimated based on historical scenarios will be inaccurate.

3.1.3 Monte Carlo simulation


With this approach we can generate a large number of scenarios according to a specified
distribution. It can be useful in cases when there is a lack of historical scenarios available
in the desired timeframe, but we have a model of the relevant probability distribution.
This allows us to generate scenarios through a parametric approach. It has the following
steps:

1. Select a (multivariate) probability distribution that explains all interesting features of the quantity of interest, e. g. fat tails and tail-dependence of returns.

2. Estimate the distribution parameters using historical (independent and identically distributed) data samples.

3. Generate scenarios by sampling from the fitted distribution.

Later we will see an example of Monte Carlo scenario generation in Sec. 5.4.1.
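A minimal sketch of these steps, under the assumption of normally distributed returns and with historical observations stored row-wise in a numpy array return_array (the distributional choice and the names are ours for illustration):

import numpy as np

# Step 2: estimate the distribution parameters from historical data.
mu_hat = np.mean(return_array, axis=0)
cov_hat = np.cov(return_array, rowvar=False)

# Step 3: generate scenarios by sampling from the fitted distribution.
rng = np.random.default_rng(seed=42)
scenarios = rng.multivariate_normal(mu_hat, cov_hat, size=10000)   # T x N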

3.1.4 Performance assessment
We advise having two separate sets of scenario data.

• The first one is the in-sample data, which we use for running the optimization and building the portfolio. We assume it is known at the time of portfolio construction.

• The other data set is the out-of-sample data, which is for assessing the performance of the optimal portfolio. Out-of-sample data is assumed to become available after the portfolio has been built.

3.2 Modeling the distribution of returns


In this section we describe how to estimate the distribution of security returns over the time period of investment. It can be in the form of the estimated expected return and covariance matrix of securities, or in the form of a set of return scenarios.
The simplest case is when we have a sufficient amount of return data for the desired time period; then we can estimate the distribution easily and proceed with the optimization. However, such data is seldom available, especially if the time period is very long, for reasons mentioned in Sec. 3.1.2. Therefore we often need to use higher frequency (shorter timeframe) data to estimate the distribution, then project the distribution to the longer timeframe of interest. This is typically not straightforward to do. We will describe this procedure first for stocks, then also outline the general method for other securities.

3.2.1 Notions of return


There are two different notions of return. Let 𝑃𝑡0 and 𝑃𝑡1 be the prices of a security at
the beginning and end of a time period.

Linear return
Also called simple return, calculated as

   𝑅lin = 𝑃𝑡1/𝑃𝑡0 − 1.

This return definition is the one we used so far in this book, the return whose distribution
we would like to compute over the time period of investment.
We can aggregate linear returns across securities, meaning that the linear return of a
portfolio is the average of the linear returns of its components:
∑︁
𝑅xlin = 𝑥𝑖 𝑅𝑖lin . (3.1)
𝑖

This is a property we need to be able to validly compute portfolio return and portfolio
risk. However, it is not easy to scale linear return from one time period to another, for
example to compute monthly return from daily return.

17
Logarithmic return
Also called continuously compounded return, calculated as
   𝑅log = log(𝑃𝑡1/𝑃𝑡0).

We can aggregate logarithmic returns across time, meaning that the total logarithmic return over 𝐾 time periods is the sum of all 𝐾 single-period logarithmic returns:

   𝑅𝐾log = log(𝑃𝑡𝐾/𝑃𝑡0) = ∑_{𝑘=1}^{𝐾} log(𝑃𝑡𝑘/𝑃𝑡𝑘−1).

This property makes it very easy to scale logarithmic return from one time period to another. However, we cannot average logarithmic returns the same way as linear returns, therefore they are unsuitable for the computation of portfolio return and portfolio risk.

Relationship between returns


The two notions of return are related by

   𝑅log = log(1 + 𝑅lin).   (3.2)

We must not confuse them, especially on longer investment time periods [Meu10a]. Log-
arithmic returns are suitable for projecting from shorter to longer time periods, while
linear returns are suitable for computing portfolio level quantities.
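A small numpy illustration of relationship (3.2) and of the time-aggregation property (the numbers are ours):

import numpy as np

# Convert linear returns to logarithmic returns via (3.2).
r_lin = np.array([0.02, -0.01, 0.03])   # single-period linear returns
r_log = np.log(1 + r_lin)

# Logarithmic returns aggregate across time:
total_log = r_log.sum()
# the corresponding multi-period linear return, equal to np.prod(1 + r_lin) - 1:
total_lin = np.exp(total_log) - 1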

3.2.2 Data preparation for stocks


Because both returns have properties necessary for portfolio optimization, the data prepa-
ration procedure for a stock portfolio is the following:

1. We start with logarithmic return data over the time period of estimation (e. g. one
day if the data has daily frequency).

2. We estimate the distribution of logarithmic returns on this time period. If we assume that logarithmic returns are normally distributed, then here we estimate their mean vector and covariance matrix.

3. Using the property of aggregation across time we scale this distribution to the time period of investment (e. g. one year). We can scale the mean and covariance of logarithmic returns by the “square-root rule”¹. This means that if ℎ is the time period of investment and 𝜏 is the time period of estimation, then

   𝜇ℎ = (ℎ/𝜏) 𝜇𝜏,   Σℎ = (ℎ/𝜏) Σ𝜏.

4. Finally, we convert it into a distribution of linear returns over the period of in-
vestment. Then we can use the property of cross-sectional aggregation to compute
portfolio return and portfolio risk. We show an example working with the assump-
tion of normal distribution for logarithmic returns in Sec. 3.4.
¹ Originally the square-root rule for stocks states that we can annualize the standard deviation 𝜎𝜏 of 𝜏-day logarithmic returns by 𝜎ann = 𝜎𝜏 √(252/𝜏), where 𝜎ann is the annualized standard deviation, and 252 is the number of business days in a year.

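A one-line numeric illustration of the square-root rule (the numbers are ours): a weekly logarithmic return volatility of 2% annualizes to about 14.4%.

import numpy as np

sigma_weekly = 0.02
# The square-root rule scales the standard deviation by the square root
# of the number of periods (52 weeks per year).
sigma_ann = sigma_weekly * np.sqrt(52)   # approximately 0.1442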
3.2.3 Data preparation in general
We can generalize the above procedure to other types of securities; see in detail in [Meu05].
The role of the logarithmic return in case of stocks is generalized by the concept of market
invariant. A market invariant is a quantity that determines the security price and is
repeating (approximately) identically across time. Thus we can model it as a sequence of
independent and identically distributed random variables. The steps will be the following:

1. Identify the market invariants. As we have seen, the invariant for the stock market
is the logarithmic return. However, for other markets it is not necessarily the return,
e. g. for the fixed-income market the invariant is the change in yield to maturity.

2. Estimate the joint distribution of the market invariant over the time period of es-
timation. Apart from historical data on the invariants, any additional information
can be used. We can use parametric assumptions, approximate through moments,
apply factor models, shrinkage estimators, etc., or simply use the empirical distri-
bution, i. e., a set of scenarios.

3. Project the distribution of invariants to the time period of investment. Here we will assume that the invariants are additive across time, i. e., the invariant corresponding to a larger time interval is the sum of invariants corresponding to smaller intervals (the property that logarithmic returns have). Then the projection is easy to do through the characteristic function (see in [Meu05]), or we can also scale the moments by applying the “square-root rule” (see in [Meu10b]).

4. Map the distribution of invariants into the distribution of security prices at the
investment horizon through a pricing function. Then compute the distribution of
linear returns from the distribution of prices. If we have a set of scenarios, we can
simply apply the pricing function on each of them.

This process will be followed in an example in the case studies section.

3.3 Extensions
In this section we discuss some use cases which need a more general definition of the
MVO problem.

3.3.1 Generalization of linear return


While it is common to use linear return in portfolio optimization, it is not the most general approach. In certain cases we need to extend its definition to be able to model the optimization problem [Meu10c]. As we have seen in Sec. 3.2.1, the key property of
linear return is the aggregation across securities (3.1). It allows us to define the expected
portfolio return and the portfolio variance. We repeat the formula here in detail:

   𝑅xlin = (𝑊ℎ − 𝑤0)/𝑤0 = ∑𝑖 𝑣𝑖(𝑃ℎ,𝑖 − 𝑝0,𝑖)/(vT p0) = ∑𝑖 [𝑣𝑖 𝑝0,𝑖/(∑𝑗 𝑣𝑗 𝑝0,𝑗)] · [(𝑃ℎ,𝑖 − 𝑝0,𝑖)/𝑝0,𝑖] = ∑𝑖 𝑥𝑖 𝑅𝑖lin,   (3.3)

where 𝑤0 = vT p0 is the initial wealth, 𝑊ℎ = vT 𝑃ℎ is the final wealth, v is the vector of security holdings in terms of quantity, p0 is the vector of initial security prices, and 𝑃ℎ
is the vector of final security prices. We can see that maximizing linear return implicitly
assumes that we wish to maximize final wealth.
Based on equation (3.3) there are two components in the aggregation formula that
have to be well defined:

• Linear return of a security: It is given for security 𝑖 as

   𝑅𝑖lin = (𝑃ℎ,𝑖 − 𝑝0,𝑖)/𝑝0,𝑖.   (3.4)

Linear return is not well defined if the price of the security 𝑝0,𝑖 is zero. This can occur for example with swaps and futures.

• Portfolio weight: It is given for security 𝑖 as

   𝑥𝑖 = 𝑣𝑖 𝑝0,𝑖/𝑤0.   (3.5)

Portfolio weight is not well defined if the initial wealth 𝑤0 (the total portfolio value at time 𝑡 = 0) is zero. This can occur for example in the case of dollar neutral long-short portfolios.

To be able to do portfolio optimization in the above mentioned special cases, we need to extend the definitions of portfolio weight and linear return such that we can apply the aggregation property (3.3).

• To extend the definition of linear return, we associate with each security a generalized normalizing quantity or basis 𝑏𝑖 that takes the role of the security price 𝑝0,𝑖 in the denominator of formula (3.4):

   𝑅𝑖lin = (𝑃ℎ,𝑖 − 𝑝0,𝑖)/𝑏𝑖.   (3.6)

The basis depends on the nature of the security; for swaps it can be the notional, for options it can be the strike or the underlying value, etc. It can be any value that results in an intuitive return definition to the portfolio manager. In most cases though it will be the price of the security, giving back the usual linear return definition (3.4).

• To extend the definition of the portfolio weight, we introduce a generalized portfolio basis 𝑏P that takes the role of the initial portfolio value 𝑤0 in the denominator of formula (3.5). We also use the relevant security basis 𝑏𝑖 in the numerator:

   𝑥𝑖 = 𝑣𝑖 𝑏𝑖/𝑏P.   (3.7)
In general, the basis quantities 𝑏𝑖 (𝑖 = 1, . . . , 𝑁 ) and 𝑏P have to satisfy the following
properties:

1. They are positive. This guarantees that for a long position, a positive net profit
(P&L) will correspond to a positive return.

2. They are measured in the same money units as the P&L. This ensures that the
return is dimensionless.

3. They are homogeneous, i.e. if 𝑏 is the basis for one unit of a security, then the basis for 𝑛 units of the security will be 𝑛𝑏. This ensures that the return is independent of the position size.

4. They are known at the beginning of the investment period (at time 𝑡 = 0). This
ensures that the return will correspond to the full investment period.

By defining a basis value for each security and a basis value for the portfolio, com-
puting the expected portfolio return and portfolio variance will be possible through the
generalized aggregation formula:
   𝑅xlin = ∑𝑖 (𝑣𝑖 𝑏𝑖/𝑏P) · (𝑃ℎ,𝑖 − 𝑝0,𝑖)/𝑏𝑖 = ∑𝑖 𝑥𝑖 𝑅𝑖lin,   (3.8)

We can observe that in formula (3.8), the security level basis values 𝑏𝑖 cancel out. This
means that the value of the security level basis does not affect the portfolio return. It
only matters in the context of interpreting the security weight and security return.

3.3.2 Portfolio scaling


During the introduction of the MVO problem in Sec. 2 we interpreted 𝑥𝑖 as the fraction
of wealth invested into security 𝑖, such that the sum of weights is 1. This is, however,
not the only possibility. From formula (3.8) we can see that the portfolio return and
its interpretation or scaling is only affected by the portfolio level basis value 𝑏P . We can
select it independently of the security level basis values 𝑏𝑖 (𝑖 = 1, . . . , 𝑁 ), and thus change
the scaling of the portfolio weights. Some natural choices:

• 𝑏P = ∑𝑖 𝑣𝑖 𝑏𝑖. This choice ensures that the portfolio weights 𝑥𝑖 (𝑖 = 1, . . . , 𝑁) sum to 1, and thus represent fractions of 𝑏P.

• 𝑏P = ∑𝑖 𝑣𝑖 𝑝0,𝑖 = 𝑤0. In this case we normalize the P&L with the total portfolio value. For a long only portfolio this is the same as the initial capital.

• 𝑏P = ∑𝑖 |𝑣𝑖|𝑝0,𝑖, the gross exposure. This provides an intuitive fraction interpretation of 𝑥𝑖 for long-short portfolios as well.

• 𝑏P = 𝐶init, the initial capital.

• 𝑏P = 1. In this case formula (3.8) will become the total net profit (P&L) in dollars, and the portfolio weights will also be measured in dollars.

In this book, we will use 𝑏P = ∑𝑖 𝑣𝑖 𝑏𝑖 by default. Any other choice will be explicitly stated.
3.3.3 Long-short portfolios
When working with a long-short portfolio, we also have to extend the MVO problem
slightly. If the investor can short sell and also use leverage (e. g. margin loan), then
the total value of the investment (the gross exposure) can be greater than the amount
of initial capital. In case of leverage, the gross exposure provides better insight into the
risks taken than the total capital, so being fully invested is no longer meaningful as a sole
constraint.
Therefore in the optimization problem formulation of typical long-short portfolios, we will replace the budget constraint (2.5) with a gross exposure limit, i. e., a leverage constraint:

   ∑𝑖 |𝑣𝑖|𝑝0,𝑖 ≤ 𝐿 · 𝐶init,

where 𝐶init is the initial invested capital, and 𝐿 is the leverage limit. If we normalize here
with 𝑏P = 𝐶init , then we get

‖x‖1 ≤ 𝐿.

For example, Regulation-T states that for long positions the margin requirement is 50%
of the position value, and for short positions the collateral requirement is 150% of the
position value (of which 100% comes from short sale proceeds). This translates into
𝐿 = 2. A special case of 𝐿 = 1 would mean that we can short sell but have no leverage.
A more general version of the leverage constraint is

   ∑𝑖 𝑚𝑖 |𝑥𝑖| ≤ 1,

where 𝑚𝑖 is the margin requirement of position 𝑖. In case of Reg-T we had 𝑚𝑖 = 1/2 for all 𝑖.
We might also impose an enhanced active equity structure on the portfolio, like 120/20
or 130/30. This is typically considered when it is possible to use the short sale proceeds
to purchase more securities (another form of leverage), so there is no need to use margin
loans. Such a portfolio has 100% market exposure, which is expressed by

   ∑𝑖 𝑣𝑖 𝑝0,𝑖 = 𝐶init,

or, writing the same using x after normalizing again with 𝑏P = 𝐶init, we get the usual budget constraint

   ∑𝑖 𝑥𝑖 = 1.

For example, a 130/30 type portfolio would have the constraints ‖x‖1 ≤ 1.6 and 1T x = 1.
We can also use factor neutrality constraints on a long-short portfolio. We can achieve
this simply by adding a linear equality constraint expressing that the portfolio should be
orthogonal to a vector 𝛽 of factor exposures:

𝛽 T x = 0.

A special case of this is when the factor exposures are all ones; then the portfolio will be
dollar neutral:

1T x = 0.

We have to note that in the case of dollar neutrality, the total portfolio value is 0.
Referring back to the discussion about portfolio weights in Sec. 3.3.1, in this case the
portfolio weights cannot be defined as (3.5). Instead, we need to define a 𝑏P that is
different from 𝑤0 . See also in Sec. 3.3.2.
Note also that if we model using the quadratic cone instead of the rotated quadratic
cone and x = 0 is a feasible solution, then there will be no optimal portfolios which
are not fully invested. The solutions will be either x = 0 or some risky portfolio with
‖x‖1 = 1. See a detailed discussion about this in Sec. 10.3.

3.4 Example
In this example we will show a case with real data, that demonstrates the steps in Sec.
3.2.3 and leads to the expected return vector and the covariance matrix we have seen in
Sec. 2.4.

3.4.1 Problem statement


Suppose that we wish to invest in the U.S. stock market with a long-only strategy. We
have chosen a set of eight stocks and we plan to re-invest any dividends. We start to
invest at time 𝑡 = 0 and the time period of investment will be ℎ = 1 year, ending at time
𝑡 = 1.
We also have a pre-existing allocation v0 measured in number of shares. We can
express this allocation also in fractions of total dollar value v0T p0 , by x0 = v0 ∘ p0 /v0T p0 .
Our goal is to derive a representation of the distribution of one year linear returns at
time 𝑡 = 1, and use it to solve the portfolio optimization problem. Here we will represent
the distribution with the estimate 𝜇 of the expected one year linear returns 𝜇 = E(𝑅)
and with the estimate Σ of the covariance of one year linear returns Σ = Cov(𝑅), at the
time point 𝑡 = 1.

3.4.2 Data collection


We obtain a time-series of daily stock prices from data providers, for a five year period
(specifically from 2016-03-21 until 2021-03-18), adjusted for splits and dividends. During
this time period the market was generally bullish, with only short downturn periods. This
will be reflected in the estimated mean return vector and in the covariance structure. See
a deeper analysis in Sec. 5.4.1.
We choose an estimation time period of one week as a balance between having enough
independent observations and the homogeneity of the data series. Thus 𝜏 = 1/52 year.
At this point we assume that the price data is available in the pandas DataFrame
df_prices. The graphs of the price time-series can be seen on Fig. 3.1:

Fig. 3.1: Daily prices of the 8 stocks in the example portfolio.

3.4.3 Estimation of moments


Now we will go over the steps in Sec. 3.2.3.

Market invariants
For stocks, both linear and logarithmic returns are market invariants; however, it is easier to project logarithmic returns to longer time periods. Logarithmic returns also have an approximately symmetrical distribution, which makes them easier to model. Therefore we resample each price time series to weekly frequency, and obtain a series of approximately 250 non-overlapping observations:

df_weekly_prices = df_prices.resample('W').last()

Next we compute weekly logarithmic returns from the weekly prices, followed by basic
handling of missing values:

df_weekly_log_returns = np.log(df_weekly_prices) - np.log(df_weekly_prices.shift(1))
df_weekly_log_returns = df_weekly_log_returns.dropna(how='all')
df_weekly_log_returns = df_weekly_log_returns.fillna(0)

Distribution of invariants
To estimate the distribution of market invariants, in this example we choose a parametric approach and fit the weekly logarithmic returns to a multivariate normal distribution. This requires the estimation of the distribution parameters 𝜇_𝜏^log and Σ_𝜏^log. For simplicity we use the sample statistics as estimates. The “log” superscript indicates that these statistics correspond to the logarithmic returns. In code this looks like

return_array = df_weekly_log_returns.to_numpy()
m_weekly_log = np.mean(return_array, axis=0)
S_weekly_log = np.cov(return_array.transpose())

Projection of invariants
We project the distribution of the weekly logarithmic returns, represented by 𝜇_𝜏^log and Σ_𝜏^log, to the one year investment horizon. Because the logarithmic returns are additive across time, the projected distribution will also be normal, with parameters 𝜇_ℎ^log = (ℎ/𝜏)𝜇_𝜏^log and Σ_ℎ^log = (ℎ/𝜏)Σ_𝜏^log.

m_log = 52 * m_weekly_log
S_log = 52 * S_weekly_log

Distribution of linear returns


To obtain the distribution of linear returns at the investment horizon ℎ, we first derive the distribution of security prices at the investment horizon. Using the characteristic function of the normal distribution, and the pricing function 𝑃ℎ = p0 ∘ exp(𝑅log), we get

   E(𝑃ℎ) = p0 ∘ exp(𝜇_ℎ^log + ½ diag(Σ_ℎ^log)),
   Cov(𝑃ℎ) = E(𝑃ℎ)E(𝑃ℎ)T ∘ (exp(Σ_ℎ^log) − 1).   (3.9)

In code, this will look like

p_0 = df_weekly_prices.iloc[0].to_numpy()
m_P = p_0 * np.exp(m_log + 1/2*np.diag(S_log))
S_P = np.outer(m_P, m_P) * (np.exp(S_log) - 1)

Then the estimated moments of the linear returns are easy to get by 𝜇 = (1/p0) ∘ E(𝑃ℎ) − 1 and Σ = (1/(p0 p0T)) ∘ Cov(𝑃ℎ), where ∘ denotes the elementwise product, and division by a vector or a matrix is also done elementwise.

m = 1 / p_0 * m_P - 1
S = 1 / np.outer(p_0, p_0) * S_P

Notice that in this case we could have computed the distribution of linear returns directly from the distribution of logarithmic returns, using the simple relationship (3.2). However, in general, and especially for derivatives, we derive the distribution of linear returns from the distribution of prices. This case study is designed to demonstrate the general procedure. See details in [Meu05], [Meu11].
Also, in particular for stocks we could have started with linear returns as market invariants, and modeled their distribution. Projecting it to the investment horizon, however,

would have been much more complicated. There exist no scaling formulas for the linear
return distribution or its moments as simple as the ones for logarithmic returns.

Chapter 4

Dealing with estimation error

As we have seen in Sec. 2, Markowitz portfolio optimization is a convenient and useful


framework. However, in practice there is one important limitation [MM08]. The optimal
solution is extremely sensitive to changes in the vector 𝜇 of estimated expected returns
and the estimated covariance matrix Σ. Small changes in these input parameters often
result in large changes in the optimal portfolio. The optimization basically magnifies the
impact of estimation errors, yielding unintuitive, questionable portfolios and extremely
small or large positions, which are difficult to implement. The instability causes unwar-
ranted turnover and transaction costs.
It would help to use a larger estimation window, but the necessary amount of data quickly
grows unrealistically large, because the size of the covariance matrix increases
quadratically with the number of securities 𝑁 [[Bra10], [CY16]].
Fortunately there are multiple approaches to mitigating this issue. These can signifi-
cantly reduce the impact of estimation errors and stabilize the optimization.
In the following we explore the main aspects of overcoming the estimation error issues.

4.1 Estimation of mean


The optimization is particularly sensitive to estimation errors in expected returns;
small variations in sample means can lead to sizeable portfolio readjustments. Estimation
based only on historical time-series can result in poor out-of-sample portfolio
performance. The sample mean is hard to estimate accurately from historical data, because
for all unbiased estimators the standard error fundamentally depends on the length
(duration) of the time-series [Dod12]. Therefore, to get better accuracy it is necessary
to extend the time horizon, which is often problematic, because data from a longer time
interval might contain more structural changes, i. e., lack stationarity in its
underlying parameters.
Despite these difficulties, the methods below offer an improvement on mean estimation.

4.1.1 Black-Litterman model
One way to deal with estimation error is to combine noisy data with prior informa-
tion. We can base prior beliefs on recent regulatory, political and socioeconomic events,
macroeconomy news, financial statements, analyst ratings, asset pricing theories, insights
about market dynamics, econometric forecasting methods, etc. [AZ10]
The Black-Litterman model considers the market equilibrium returns implied by
CAPM as a prior estimate, then allows the investor to update this information with
their own views on security returns. This way we do not need noisy historical data for
estimation [[Bra10], [CJPT18]].
Thus we assume the distribution of expected return 𝜇 to be 𝒩 (𝜇eq , Q), where 𝜇eq
is the equilibrium return vector, and the covariance matrix Q represents the investor’s
confidence in the equilibrium return vector as a realistic estimate.
The investor's views are then modeled by the equation B𝜇 = q + 𝜀, where 𝜀 ∼ 𝒩(0, V).
Each row of this equation represents a view in the form of a linear equation, allowing
for both absolute and relative statements on one or more securities. The views are also
uncertain; we can use the diagonal of V to set the confidence in them, with small
variance meaning strong confidence in a view.
By combining the prior with the views using Bayes's theorem we arrive at the estimate

𝜇^BL = Mq + (I − MB)𝜋,                                        (4.1)

where M = QB^T(BQB^T + V)^{−1}.


The prior 𝜇^eq is implied by the solution x = (1/𝛿)Σ^{−1}𝜇 of problem (2.3), by assuming
that x = x^mkt, i. e., weights according to market capitalization.
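As an illustration, here is a minimal numpy sketch of the update formula (4.1); all of
the input numbers are hypothetical placeholders:

import numpy as np

def black_litterman(mu_eq, Q, B, q, V):
    # M = Q B^T (B Q B^T + V)^{-1}, as defined below (4.1)
    M = Q @ B.T @ np.linalg.inv(B @ Q @ B.T + V)
    # mu_BL = M q + (I - M B) mu_eq
    return M @ q + (np.eye(len(mu_eq)) - M @ B) @ mu_eq

# Prior implied by market-capitalization weights: mu_eq = delta * Sigma x_mkt
delta = 2.0
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
x_mkt = np.array([0.6, 0.4])
mu_eq = delta * Sigma @ x_mkt
Q = 0.05 * Sigma                 # confidence in the prior
B = np.array([[1.0, -1.0]])      # one relative view: security 1 minus security 2
q = np.array([0.02])             # ... outperforms by 2% per year
V = np.array([[0.001]])          # uncertainty of the view
mu_bl = black_litterman(mu_eq, Q, B, q, V)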
We can also understand the Black–Litterman model as an application of generalized
least-squares to combine two information sources (e. g. historical data and exogenous
views) [[Bra10], [The71]]. Assuming the sources are independent, we write them into one
linear equation with block structure as

( 𝜋 )   ( I )       ( 𝜀_𝜋 )
(   ) = (   ) 𝜇  +  (    ),                                   (4.2)
( q )   ( B )       ( 𝜀_q )

where 𝜋 is the prior, 𝜀_𝜋 ∼ 𝒩(0, Q), 𝜀_q ∼ 𝒩(0, V), with Q and V nonsingular. The
solution will be

𝜇^GLS = (Q^{−1} + B^T V^{−1} B)^{−1}(Q^{−1}𝜋 + B^T V^{−1}q).  (4.3)

Substituting 𝜋 = 𝜇^eq we get a different form of 𝜇^BL.¹


Because the Black-Litterman model gives an expected return vector relative to the market
equilibrium return, the optimal portfolio will also be close to the market portfolio,
meaning it should not contain extreme positions. A drawback of this method is that the
exogenous views are assumed to be independent of historical data, which can be difficult
to satisfy in practice. The model is extended further in [BGP12] to views on volatility
and market dynamics.
¹ We can derive this using (A + B)^{−1} = A^{−1} − A^{−1}B(A + B)^{−1} for nonsingular
matrices.

4.1.2 Shrinkage estimation
Sample estimators are unbiased, but perform well only in the limit case of an infinite
number of observations. For small samples their variance is large. In contrast, constant
estimators have no variance, but display large bias. Shrinkage estimators improve the
sample estimator by adjusting (shrinking) the sample estimator towards a constant esti-
mator, the shrinkage target, basically finding the optimal tradeoff between variance and
bias.
Shrinkage can also be thought of as regularization or a form of empirical Bayes esti-
mation. Also the shrinkage target has to be consistent with other prior information and
optimization constraints.
1. James–Stein estimator:
This is the most widely known shrinkage estimator for the estimation of the mean vector
of a normal random variable with known covariance [[EM76], [LC98], [Ric99]]. The sample
mean 𝜇^sam is normally distributed with mean 𝜇 and covariance Σ/𝑇, where Σ is the
covariance matrix of the data. Assuming b as target, the James–Stein estimator will be

𝜇^JS = b + (1 − 𝛼)(𝜇^sam − b),

where the shrinkage coefficient is

𝛼 = (1/𝑇) (Ñ − 2) / ((𝜇^sam − b)^T Σ^{−1} (𝜇^sam − b)),

and Ñ = Tr(Σ)/𝜆_max(Σ) is the effective dimension. The estimator improves 𝜇^sam for
Ñ > 2, i. e., when Σ/𝑇 has not too different eigenvalues. Otherwise an estimator with a
coordinatewise different shrinkage coefficient is preferable, such as the one in [EM75].
The target b is arbitrary, but it is common to set it as the mean of means 1^T𝜇^sam/𝑁.
This depends on the data, making the number of parameters to estimate one less, so the
dimension Ñ should be adjusted accordingly.
We can further improve 𝜇^JS by the positive-part rule, where we set (1 − 𝛼) to zero
whenever it would be negative.
2. Jorion's estimator [Jor86]:
This estimator was derived in the context of portfolio analysis through an empirical
Bayes approach. It uses the target 𝑏1, where 𝑏 is the volatility-weighted mean of means:

𝑏 = (1^T Σ^{−1} 𝜇^sam) / (1^T Σ^{−1} 1).

The shrinkage coefficient in this case is

𝛼 = (𝑁 + 2) / (𝑁 + 2 + 𝑇(𝜇^sam − 𝑏1)^T Σ^{−1} (𝜇^sam − 𝑏1)).

If Σ is not known, it can be replaced by (𝑇 − 1)/(𝑇 − 𝑁 − 2) Σ̂, where Σ̂ is the sample
covariance matrix. The assumption of normality is not critical, but the estimator gives
more improvement over 𝜇^sam in "symmetric" situations, where variances are similar
across data series. A minimal code sketch follows below.
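The sketch, assuming m_sam is the sample mean vector, S is the (known or adjusted)
covariance matrix, and T is the number of observations; all names are placeholders:

import numpy as np

def jorion_estimator(m_sam, S, T):
    N = len(m_sam)
    iS = np.linalg.inv(S)
    ones = np.ones(N)
    # Volatility-weighted mean of means, b = 1' S^{-1} m / (1' S^{-1} 1)
    b = (ones @ iS @ m_sam) / (ones @ iS @ ones)
    # Shrinkage coefficient alpha
    d = m_sam - b * ones
    alpha = (N + 2) / (N + 2 + T * (d @ iS @ d))
    return b * ones + (1 - alpha) * d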

4.2 Estimation of the covariance matrix
In contrast to the estimation of the mean, the standard error of the sample variance
depends only on the sample size, making it possible to estimate from higher frequency
data. Still, if the number of securities is high relative to the number of samples, the
optimal solution can suffer from two issues [MM08]:
• Ill conditioning: We need to have a sufficient amount of data to get a well-conditioned
covariance estimate. A rule of thumb is that the number of observations 𝑇 should be an
order of magnitude greater than the number of securities 𝑁.

• Estimation error: The estimation error of the covariance matrix increases quadrat-
ically (in contrast to linear increase in the case of the mean), so if 𝑁/𝑇 is large it
can quickly become the dominant contributing factor to loss of accuracy.
When 𝑁/𝑇 is too large, the following regularization methods can offer an improvement
over the sample covariance matrix [CM13]. Moreover, we can also apply factor models in
this context; we discuss this in more detail in Sec. 5.

4.2.1 Covariance shrinkage


The idea is the same as in Sec. 4.1.2, we try to achieve a balanced bias–variance tradeoff
by shrinking the sample covariance matrix towards a shrinkage target. We can choose the
latter in many ways, but it is important to be consistent with other prior information, e.
g. what is implied by constraints.
The most well known shrinkage estimator is by Ledoit and Wolf; see [[LW03a],
[LW03b], [LW04]] and [CM13] for a quick review. It combines the sample covariance
matrix Σsam with the target B:

ΣLW = (1 − 𝛼)Σsam + 𝛼B. (4.4)

We have to estimate the shrinkage intensity parameter 𝛼 depending on the target B. In
practice it is also truncated so that 𝛼 ∈ [0, 1]. Ledoit and Wolf propose three types of
targets:
• Multiple of the identity matrix:

  B_I = 𝑠̄² I,                                                 (4.5)

  where 𝑠̄² is the average sample variance. This target is optimal among all targets of
  the form 𝑐I with 𝑐 ∈ ℝ. The corresponding optimal 𝛼 is

  𝛼 = ( (1/𝑇²) ∑_{k=1}^{T} Tr([z_k z_k^T − Σ^sam]²) ) / Tr([Σ^sam − B_I]²),

  where z_k is column 𝑘 of the centered data matrix Z. See the details in [LW04].
  The sample covariance matrix Σ^sam can be ill-conditioned, because estimation tends to
  scatter the sample eigenvalues further away from 𝜎̄², the mean of the true unknown
  eigenvalues. This raises the condition number of Σ^sam relative to that of the true
  matrix Σ, and this effect is stronger the better the condition number of Σ is, or the
  larger 𝑁/𝑇 grows.

30
  The above shrinkage estimator basically pulls all sample eigenvalues back towards
  their grand mean 𝑠̄², the estimate of 𝜎̄², thus stabilizing the matrix. We can also see
  this as a form of regularization.

• Single factor covariance matrices implied by the CAPM²:

  B_CAPM = 𝑠²_mkt 𝛽𝛽^T + Σ_𝜀,                                 (4.6)

  where 𝑠²_mkt is the sample variance of market returns, 𝛽 is the vector of factor
  exposures, and Σ_𝜀 is the diagonal matrix containing the variances of the residuals.
The single factor model is discussed in Sec. 5. In practice a proxy of the market
factor could be the equally-weighted or the market capitalization-weighted portfolio
of the constituents of a market index, resulting in two different shrinkage targets.
The optimal shrinkage intensity is derived in [LW03b].

• Constant correlation covariance matrix:

  [B_CC]_{i,j} = 𝑠_{i,i}              if 𝑖 = 𝑗,
                 𝑟̄ √(𝑠_{i,i}𝑠_{j,j})  if 𝑖 ≠ 𝑗,                (4.7)

  where 𝑠_{i,j} is the sample covariance between securities 𝑖 and 𝑗, and
  𝑟̄ = (2/(𝑛(𝑛−1))) ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} 𝑟_{i,j} is the average sample
  correlation. The optimal shrinkage intensity is derived in [LW03a]. A code sketch of
  this target follows below.
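A minimal numpy sketch building the constant correlation target (4.7) from a sample
covariance matrix S (a placeholder name):

import numpy as np

def constant_correlation_target(S):
    s = np.sqrt(np.diag(S))
    R = S / np.outer(s, s)                # sample correlation matrix
    n = S.shape[0]
    # Average off-diagonal correlation (the diagonal entries are all 1)
    r_bar = (R.sum() - n) / (n * (n - 1))
    B = r_bar * np.outer(s, s)            # off-diagonal: r_bar * s_i * s_j
    np.fill_diagonal(B, np.diag(S))       # diagonal: sample variances
    return B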

Notice that all three targets are positive definite matrices. Shrinkage towards a
positive definite target guarantees that the resulting estimate is also positive
definite, even when the sample covariance matrix itself is singular. A further advantage
of this estimator is thus that it works also in the case 𝑁 > 𝑇, when the sample
covariance matrix is singular.
In [LW20], the authors present a nonlinear version of this estimator, which can apply
different shrinkage intensity to each sample eigenvalue, resulting in even better perfor-
mance.

4.2.2 Covariance de-noising


According to [dP19], the shrinkage method in Sec. 4.2.1 does not discriminate between
the eigenvalues associated with noise and the eigenvalues associated with signal, thus it
weakens the signal as well. With the help of random matrix theory the method discussed
here shrinks only the part of the covariance matrix that is associated with noise.
Starting from the covariance matrix Σ, it consists of the following steps:

1. Compute the correlation matrix from the covariance matrix: C = D^{−1/2}ΣD^{−1/2},
where D = Diag(Σ).

2. Compute the eigendecomposition of C and compute the empirical distribution of


the eigenvalues using Kernel Density Estimation.
² The Capital Asset Pricing Model (CAPM) relates the expected return of securities
(particularly stocks) to their systematic risk. We can calculate it as
E(𝑅_𝑖) = 𝑟_f + 𝛽_𝑖(E(𝑅_mkt) − 𝑟_f), where E(𝑅_𝑖) is the expected return, 𝑟_f is the
risk-free rate, 𝛽_𝑖 is the measure of systematic risk, and E(𝑅_mkt) is the expected
return of the market.

3. Fit the Marčenko–Pastur distribution to the empirical distribution, which gives the
theoretical bounds 𝜆+ and 𝜆− on the eigenvalues associated with noise. This way
we separate the spectrum into a random matrix (noise) component and the signal
component. The eigenvalues of the latter will be above the theoretical upper bound
𝜆+ .

4. Change the eigenvalues below 𝜆+ to their average, arriving at the de-noised corre-
lation matrix C̃.

5. Recover the de-noised covariance matrix: Σ̃ = D^{1/2} C̃ D^{1/2}.
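A simplified numpy sketch of these steps; as a simplifying assumption it uses the
closed-form Marčenko–Pastur upper bound with unit noise variance, instead of the Kernel
Density Estimation based fit described above:

import numpy as np

def denoise_covariance(S, T):
    # Step 1: correlation matrix C = D^{-1/2} S D^{-1/2}
    d = np.sqrt(np.diag(S))
    C = S / np.outer(d, d)
    N = C.shape[0]
    # Steps 2-3: eigendecomposition; the Marcenko-Pastur upper bound with
    # noise variance 1 replaces the KDE-based fit (an assumption)
    lam, V = np.linalg.eigh(C)                  # ascending eigenvalues
    lam_plus = (1 + np.sqrt(N / T))**2
    # Step 4: replace the noise eigenvalues by their average
    noise = lam < lam_plus
    if noise.any():
        lam[noise] = lam[noise].mean()
    C_dn = V @ np.diag(lam) @ V.T
    c = np.sqrt(np.diag(C_dn))
    C_dn = C_dn / np.outer(c, c)                # rescale to unit diagonal
    # Step 5: recover the de-noised covariance matrix
    return C_dn * np.outer(d, d)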

4.2.3 Robust estimation


The reason for estimation error is partly that maximum likelihood estimators are highly
sensitive to deviations from the assumed (typically normal) distribution. The empirical
distribution of security returns usually deviates from the normal distribution [VD09].
Therefore it can be helpful to use robust estimation methods in the construction of mean-
variance portfolios. Robust estimators are not as efficient as the maximum likelihood
estimator but are stable even when the underlying model is not perfectly satisfied by the
available data.
There are two approaches to involve robust statistics in portfolio selection:

• One-step methods: We perform the robust estimation and the portfolio optimization in a
single step. We substitute the portfolio risk and portfolio return expressions by their
robust counterparts; see for example [VD09]. We discuss some examples in Sec. 8.

• Two-step methods: First we compute the robust estimates of the mean vector and the
covariance matrix of the data. This is followed by solving the usual portfolio
optimization problem with the parameters replaced by their robust estimates. There are
various robust covariance estimators, such as Minimum Covariance Determinant (MCD), 2D
Winsorization, or S-estimators; see for example [[RSTF20], [WZ07]]. We do not cover this
topic in detail in this book.

4.3 Using constraints


Solutions from unconstrained portfolio optimization – while having elegant analytical
formulas – can be unreliable in practice. The optimization can significantly overweight
securities with large estimated returns, negative correlations, and small variances. The
securities with the largest estimation error will get the largest positions. There will also
be many zero positions, contradicting diversification [[AZ10], [MM08]].
Financially meaningful constraints on the portfolio weights can also act as regular-
ization, reducing error in the optimal portfolio [CM13]. In [DGNU09], 𝐿1 and 𝐿2 norm
constraints are introduced for this purpose:

maximize    𝜇^T x − (𝛿/2) x^T Σx
subject to  1^T x = 1,                                        (4.8)
            ‖x‖_p^p ≤ 𝑐.

If 𝑝 = 1 and 𝑐 = 1 we get back the constraint x ≥ 0, which prevents short-selling.
In [JM03] the authors found that preventing short-selling or limiting position size is
equivalent to shrinking all covariances of securities for which the respective constraint
is binding. The short-selling constraint also implies shrinkage effect on the mean vector
toward zero. In a conic optimization problem, the 1-norm constraint can be modeled
based on Sec. 10.1.1.
For the case 𝑝 = 1 and 𝑐 > 1 with the constraint 1T x = 1 also present, we get a
“short-sale budget”, meaning that the total weight of short positions will be bounded by
(𝑐 − 1)/2. This allows the possibility of optimizing e. g. 130/30 type portfolios with
𝑐 = 1.6.
If 𝑝 = 2 and the only constraint is 1^T x = 1, then the minimum feasible value of 𝑐 is
1/𝑁, giving the x = 1/𝑁 portfolio as the only possible solution. With larger values of 𝑐
we get an effect equivalent to the Ledoit–Wolf shrinkage towards a multiple of the
identity. For other shrinkage targets B, the corresponding constraint will be x^T Bx ≤ 𝑐
or ‖G^T x‖₂² ≤ 𝑐, where GG^T = B. In a conic optimization problem, we can model the
2-norm constraint based on Sec. 10.1.1.
In general,

• adding an 𝐿1 norm-constraint is equivalent to shrinking all covariances of securities


that are being sold short. Instead of the elementwise short-sale constraints here we
impose a global short-sale budget using only one constraint. This means that the
implied shrinkage coefficient will be the same for all securities. According to the
usual behavior of LASSO regression, the 𝐿1 norm-constrained portfolios are likely
to assign a zero weight to at least some of the securities. This could be beneficial if
we wish to invest only into a small number of securities.

• adding an 𝐿2 norm-constraint is equivalent to shrinking the sample covariance ma-


trix towards the identity matrix. According to properties of ridge regression, the 𝐿2
norm-constrained portfolios will typically have a non-zero weight for all securities,
which is good if we want full diversification.
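For illustration, a minimal Fusion sketch of both norm constraints; it assumes a Model M
with an N-dimensional weight variable x, built as in Sec. 2.4, and a given limit c:

# L1 norm: model |x| <= z elementwise, then bound sum(z) by c
z = M.variable("z", N, Domain.unbounded())
M.constraint(Expr.sub(z, x), Domain.greaterThan(0.0))
M.constraint(Expr.add(z, x), Domain.greaterThan(0.0))
M.constraint(Expr.sum(z), Domain.lessThan(c))

# L2 norm: ||x||_2^2 <= c via a rotated quadratic cone, 2*(1/2)*c >= ||x||^2
M.constraint(Expr.vstack(0.5, c, x), Domain.inRotatedQCone())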

4.4 Improve the optimization


When dealing with estimation error, it is also a possibility to directly improve the
optimization procedure.

4.4.1 Resampled portfolio optimization


To overcome estimation uncertainty we can use nonparametric bootstrap resampling on
the available historical return data. This way we can create multiple instances of the
data and generate multiple expected return and covariance matrix estimates using any
estimation procedure. Then we compute the efficient frontier from all of them. Finally,
we average the obtained solutions to make the effect of the estimation errors cancel out.
This procedure is thoroughly analysed in [MM08].
Note that we should not average over portfolios of too different variances. Thus the
averaging procedure assumes that for each pair of inputs we generate 𝑚 points on the
efficient frontier, and rank them 1 to 𝑚. Then the portfolios with the same rank will be
averaged.
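A minimal numpy sketch of this procedure; the closed-form helper solve_mvo stands in for
a full optimizer call and is only a placeholder:

import numpy as np

def solve_mvo(m, S, delta):
    # Unconstrained mean-variance weights, x = (1/delta) S^{-1} m
    return np.linalg.solve(S, m) / delta

def resampled_frontier(returns, deltas, n_boot=100, seed=0):
    # returns: T x N matrix of historical return scenarios
    rng = np.random.default_rng(seed)
    T = returns.shape[0]
    frontiers = []
    for _ in range(n_boot):
        sample = returns[rng.integers(0, T, size=T)]    # bootstrap resample
        m, S = np.mean(sample, axis=0), np.cov(sample.T)
        frontiers.append([solve_mvo(m, S, d) for d in deltas])
    # Average the portfolios of the same rank (same delta) across runs
    return np.mean(np.array(frontiers), axis=0)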

In this process we can treat the number of simulated return samples in each batch of
resampled data as a parameter. It can be used to measure the level of certainty of the
expected return estimate. A large number of simulated returns is consistent with more
certainty, while a small number is consistent with less certainty.
While this method offers good results in terms of diversification and mitigating the
effects of estimation error, it has to be noted that it does this at the cost of intensive
computational work.
According to [MM08], improving the optimization procedure is more beneficial than using
improved estimates (e. g. shrinkage) as input parameters. However, the two approaches do
not exclude each other, so they can be used simultaneously.

4.4.2 Nested Clustering Optimization (NCO)


A recent method to tackle estimation error and error sensitivity of the optimization is
described in [dP19]. The NCO method is capable of controlling for noise in the inputs,
and also for instabilities arising from the covariance structure:
• Noise can arise from lack of data. As 𝑁/𝑇 approaches 1, the number of data points
will approach the number of degrees of freedom in covariance estimation. Also
based on random matrix theory, the smallest eigenvalue of the covariance matrix
will be close to zero, raising the condition number.
• Instability in the optimal solution comes from the correlation structure, the
existence of highly correlated clusters of securities. These will also raise the
condition number of the covariance matrix. The most favorable correlation structure is
that of the identity matrix.
The steps of NCO are the following:
1. Cluster the securities into highly-correlated groups based on the correlation matrix
to get a partition 𝑃 of the security universe. We could apply hierarchical clustering
methods, or the method recommended in [dPL19]. It could be worthwhile to do the
clustering also on the absolute value of the correlation matrix to see if it works
better.
2. Run portfolio optimization for each cluster separately. Let x_𝑘 denote the
𝑁-dimensional optimal weight vector of cluster 𝑃_𝑘, such that 𝑥_{𝑘,𝑖} = 0 for all
𝑖 ∉ 𝑃_𝑘.
3. Reduce the original covariance matrix Σ into a covariance matrix Σ^cl between
clusters: Σ^cl_{𝑘,𝑙} = x_𝑘^T Σ x_𝑙. The correlation matrix computed from Σ^cl will be
closer to the identity, making the inter-cluster optimization more accurate.
4. Run the inter-cluster portfolio optimization by treating each cluster as one security
and using the reduced covariance matrix Σ^cl. We will denote the resulting optimal
weight vector (one coordinate per cluster) by x^cl.
5. Generate the final optimal portfolio: x = ∑_𝑘 𝑥^cl_𝑘 x_𝑘.

By splitting the problem in the above way into two independent parts, the error caused
by intra-cluster noise cannot propagate across clusters.
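A compact sketch of these steps, using scipy hierarchical clustering and closed-form
minimum-variance weights within and between clusters; both choices are simplifying
assumptions, since [dP19] allows any clustering method and any optimizer:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def min_variance(S):
    # Closed-form minimum-variance weights, w = S^{-1} 1 / (1' S^{-1} 1)
    w = np.linalg.solve(S, np.ones(S.shape[0]))
    return w / w.sum()

def nco(S, n_clusters):
    N = S.shape[0]
    d = np.sqrt(np.diag(S))
    C = S / np.outer(d, d)
    # Step 1: cluster on a correlation-based distance
    dist = np.sqrt(0.5 * (1.0 - C))
    labels = fcluster(linkage(dist[np.triu_indices(N, 1)], 'ward'),
                      n_clusters, criterion='maxclust')
    # Step 2: intra-cluster optimization, embedded in N dimensions
    X = np.zeros((N, n_clusters))
    for k in range(1, n_clusters + 1):
        idx = np.where(labels == k)[0]
        X[idx, k - 1] = min_variance(S[np.ix_(idx, idx)])
    # Step 3: reduced covariance matrix between clusters
    S_cl = X.T @ S @ X
    # Steps 4-5: inter-cluster optimization and final combination
    return X @ min_variance(S_cl)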
In the paper the author also proposes a method for estimating the error in the optimal
portfolio weights. We can use it with any optimization method, and it facilitates the
comparison of methods in practice. The steps of this procedure can be found in Sec. 10.4.

4.5 Example
In this part we add shrinkage estimation to the case study introduced in the previous
chapters. We start at the point where the expected weekly logarithmic returns and their
covariance matrix are available:
return_array = df_weekly_log_returns.to_numpy()
m_weekly_log = np.mean(return_array, axis=0)
S_weekly_log = np.cov(return_array.transpose())

The eigenvalues of the covariance matrix are 1e-3 * [0.2993, 0.3996, 0.4156, 0.5468,
0.6499, 0.9179, 1.0834, 4.7805]. We apply the Ledoit–Wolf shrinkage with target (4.5):
N = S_weekly_log.shape[0]
T = return_array.shape[0]

# Ledoit--Wolf shrinkage
S = S_weekly_log
s2_avg = np.trace(S) / N
B = s2_avg * np.eye(N)
Z = return_array.T - m_weekly_log[:, np.newaxis]
alpha_num = alpha_numerator(Z, S)
alpha_den = np.trace((S - B) @ (S - B))
alpha = alpha_num / alpha_den
S_shrunk = (1 - alpha) * S + alpha * B

The function alpha_numerator computes the sum in the numerator of 𝛼:


def alpha_numerator(Z, S):
s = 0
T = Z.shape[1]
for k in range(T):
z = Z[:, k][:, np.newaxis]
X = z @ z.T - S
s += np.trace(X @ X)
s /= (T**2)
return s

In this example, we get 𝛼 = 0.0843, and the eigenvalues will become 1e-3 * [0.3699,
0.4618, 0.4764, 0.5965, 0.6909, 0.9363, 1.0879, 4.4732], closer to their sample
mean as expected.
Next we implement the James–Stein estimator for the expected return vector:
# James--Stein estimator
m = m_weekly_log[:, np.newaxis]
o = np.ones(N)[:, np.newaxis]
S = S_shrunk
iS = np.linalg.inv(S)
b = (o.T @ m / N) * o
N_eff = np.trace(S) / np.max(np.linalg.eigvalsh(S))
alpha_num = max(N_eff - 3, 0)
alpha_den = T * (m - b).T @ iS @ (m - b)
alpha = alpha_num / alpha_den
m_shrunk = b + max(1 - alpha, 0) * (m - b)
m_shrunk = m_shrunk[:, 0]

In this case though, it turns out that the effective dimension Ñ would be too low for
the estimator to give an improvement over the sample mean. This is because the
eigenvalues of the covariance matrix are spread out too much, meaning that the
uncertainty of the mean vector is primarily along the direction of one or two
eigenvectors, effectively reducing the dimensionality of the estimation.
We can check the improvement on the plot of the efficient frontier in Fig. 4.1. We can
observe the portfolio composition for different risk-aversion levels in Fig. 4.2.

Fig. 4.1: The efficient frontier.

Fig. 4.2: Portfolio composition x with varying level of risk-aversion 𝛿.

Chapter 5

Factor models

The purpose of factor models is to impose a structure on financial variables and their
covariance matrix by explaining them through a small number of common factors. This
can help to overcome estimation error by reducing the number of parameters, i. e., the
dimensionality of the estimation problem, making portfolio optimization more robust
against noise in the data. Factor models also provide a decomposition of financial risk to
systematic and security specific components [CM13].
Let 𝑍𝑡 be an 𝑁 -dimensional random vector representing the analyzed financial variable
at time 𝑡. We can write 𝑍𝑡 as a linear function of components as

𝑍𝑡 = 𝛽𝐹𝑡 + 𝜃𝑡 ,

where 𝐹𝑡 is a 𝐾-dimensional random vector representing the common factors at time 𝑡,


𝛽 is the 𝑁 × 𝐾 matrix of factor exposures (or loadings, betas), and 𝜃𝑡 = 𝛼 + 𝜀𝑡 is the
specific component such that E(𝜃𝑡) = 𝛼 and 𝜀𝑡 is white noise¹ with covariance Σ_𝜃. If 𝑍𝑡
represents security returns, then 𝛽E(𝐹𝑡 ) is the expected explained return and 𝛼 is the
expected unexplained return.
The factors 𝐹𝑡 should account for all common movement between pairs of variables. Thus
in the model the covariance of 𝑍𝑡 is determined only by the covariance of 𝐹𝑡. This
requires that Cov(𝜀_{𝑡₁}, 𝐹_{𝑡₂}) = 0 for all 𝑡₁, 𝑡₂ and that Cov(𝜀𝑡) = Σ_𝜀 is diagonal.
We call this an exact factor model. If Σ_𝜀 is not diagonal then we have an approximate
factor model.
The above model is also static because 𝐹𝑡 has only a contemporaneous effect on 𝑍𝑡.²
Another common assumption is that the sequence of random variables 𝐹𝑡 are weakly
stationary, so that E(𝐹𝑡 ) = 𝜇𝐹 and Cov(𝐹𝑡 ) = Σ𝐹 . This allows us to estimate the model
from (historical) scenarios.
With these assumptions, we can write the expected value of 𝑍𝑡 as

𝜇𝑍 = 𝛼 + 𝛽𝜇𝐹 .

The covariance matrix will be given by

Σ_𝑍 = 𝛽Σ_𝐹𝛽^T + Σ_𝜃.
¹ For a white noise process 𝜀𝑡 we have E(𝜀𝑡) = 0, and Cov(𝜀_{𝑡₁}, 𝜀_{𝑡₂}) = Ω if 𝑡₁ = 𝑡₂
and 0 otherwise, where Ω does not depend on 𝑡.
² Additional lags of 𝐹𝑡 in the 𝑍𝑡 equations can easily be allowed; then we obtain a
dynamic factor model.

This way the covariance matrix Σ𝑍 has only 𝑁 𝐾 + 𝑁 + 𝐾(𝐾 + 1)/2 parameters, which
is linear in 𝑁 , instead of having 𝑁 (𝑁 + 1)/2 covariances, which is quadratic in 𝑁 .
Note that because the common factors and the specific components are uncorrelated, we
can separate the risks associated with each. For a portfolio x the portfolio level
factor exposures are given by b = 𝛽^T x, and we can write the total portfolio variance
as b^T Σ_𝐹 b + x^T Σ_𝜃 x.
While the factor structure may introduce a bias on the covariance estimator, it will
also lower its variance owing to the reduced number of parameters to estimate.
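For instance, a small numpy sketch of the portfolio risk decomposition mentioned above,
with all inputs being hypothetical placeholders:

import numpy as np

# Placeholder factor model: N = 3 securities, K = 2 factors
beta = np.array([[1.0, 0.2], [0.8, -0.1], [1.2, 0.5]])   # factor exposures
S_F = np.array([[0.04, 0.01], [0.01, 0.02]])             # factor covariance
S_theta = np.diag([0.010, 0.020, 0.015])                 # specific covariance
x = np.array([0.5, 0.3, 0.2])                            # portfolio weights

b = beta.T @ x                           # portfolio level factor exposures
factor_var = b @ S_F @ b                 # systematic risk part
specific_var = x @ S_theta @ x           # security specific risk part
total_var = factor_var + specific_var    # x' (beta S_F beta' + S_theta) x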

5.1 Explicit factor models


In these models the factors are predefined based on financial or economic theory, indepen-
dently from the data. They are observed from e. g. macroeconomic quantities (inflation,
economic growth, etc.) or from security characteristics (industry, dividend yield, market
capitalization, etc.) [Con95]. The goal of these models is typically to explain security
returns.
In the case of macroeconomic factor models the historical factor time-series F can be
observed. Then we can apply time-series regression to get estimators for 𝛽, 𝛼, and 𝜀.
However, enough stationary data might not be available for an accurate regression.
The simplest macroeconomic factor models use a single factor as common risk compo-
nent. In Sharpe’s single factor model this common factor is the market risk. In practice,
we can approximate the market risk factor by the returns of a stock index. We can ei-
ther form an equally-weighted or a market capitalization-weighted portfolio of the index
constituents. The covariance matrix with single factor model structure has only 2𝑁 + 1
parameters to estimate.
For fundamental factor models there are two methods:

• BARRA method: We determine the estimate 𝛽̂ of the matrix 𝛽 from the observed security
specific attributes (assumed time invariant). Then we do a cross-sectional regression
for each 𝑡 to obtain the factor time-series F̂ and its estimated covariance matrix Σ̂_𝐹:
F̂_𝑡 = (𝛽̂^T Σ̂_𝜃^{−1} 𝛽̂)^{−1} 𝛽̂^T Σ̂_𝜃^{−1} z_𝑡, where Σ̂_𝜃 is the diagonal matrix of
estimated specific variances, accounting for the cross-sectional heteroskedasticity in
the model.

• Fama–French method: For each time 𝑡 we order the securities into quintiles by a given
security attribute. Then we go long the top quintile and short the bottom quintile. The
factor realization at time 𝑡 corresponding to the attribute will be the return of this
hedged portfolio. After obtaining the factor time-series corresponding to each
attribute, we can estimate 𝛽 using time-series regression.

A drawback of explicit factor models is the misspecification risk. The derivation of the
covariance matrix Σ𝑍 strongly relies on the orthogonality between the factors 𝐹𝑡 and the
residuals 𝜃𝑡 , which might not be satisfied if relevant factors are missing from the model.

5.2 Implicit factor models
Implicit factor models derive both the factors and the factor exposures from the data;
they do not need any external input. However, the statistical estimation procedures
involved are prone to discovering spurious correlations, and they can only work if we
assume the factor exposures to be time invariant over the estimation period. Mainly two
methods are used to derive the factors.

5.2.1 Factor analysis


This method assumes a more specific structure, a normal factor model with the additional
requirements E(𝐹𝑡 ) = 0 and Cov(𝐹𝑡 ) = I.
• For the first condition, we redefine 𝛼′ = 𝛼 + 𝛽E(𝐹𝑡 ).

• For the second condition, note that the variables 𝛽 and F𝑡 can only be determined
up to an invertible matrix H, because 𝛽F𝑡 = 𝛽H−1 HF𝑡 . Thus by decomposing the
factor covariance as Σ𝐹 = QQT and choosing H = Q−1 , we arrive at the desired
structure.
The covariance matrix will then become Σ_𝑍 = 𝛽𝛽^T + Σ_𝜃. The matrix Q is still
determined only up to a rotation. This freedom can help in finding an interpretation
for the factors.
Assuming that the variables 𝑍𝑡 are independent and identically normally distributed, we
can estimate 𝛼, 𝛽, and Σ_𝜃 using the maximum likelihood method, yielding the estimates
𝛼̂, 𝛽̂, and Σ̂_𝜃. Then we use cross-sectional regression (accounting for the
cross-sectional heteroskedasticity in 𝜀𝑡) to estimate the factor time-series:
F̂_𝑡 = (𝛽̂^T Σ̂_𝜃^{−1} 𝛽̂)^{−1} 𝛽̂^T Σ̂_𝜃^{−1} (z_𝑡 − 𝛼̂).
The number of factors can be determined e. g. using a likelihood ratio test. The
drawback of factor analysis is that it is not efficient for large problems.

5.2.2 Principal Component Analysis


The simplest way to estimate an implicit factor model is by principal component analysis
(PCA), a dimension reduction method having the form of an affine transformation.
We are looking for the best approximation Z̃_𝑡 = 𝛼 + 𝛽𝐹𝑡, assuming 𝐹𝑡 = d + A𝑍𝑡,
E(𝐹𝑡) = 0, and dim(𝐹𝑡) = 𝐾 < dim(𝑍𝑡) = 𝑁.
We can obtain this by the eigendecomposition of Σ = Cov(𝑍𝑡):

Σ = VΛV^T,

where the matrix Λ is diagonal with the eigenvalues 𝜆₁, . . . , 𝜆_𝑁 in it, ordered from
largest to smallest, and the matrix V contains the corresponding eigenvectors
v₁, . . . , v_𝑁 as columns.
If we partition the eigenvector matrix as V = [V_𝐾, V_{𝑁−𝐾}], where V_𝐾 consists of the
first 𝐾 eigenvectors, we can construct the solution: 𝛽 = V_𝐾, 𝛼 = E(𝑍𝑡), and the
principal components F_𝑡 = V_𝐾^T(𝑍𝑡 − E(𝑍𝑡)). For the residuals
𝜀𝑡 = 𝑍𝑡 − Z̃_𝑡 = V_{𝑁−𝐾}V_{𝑁−𝐾}^T(𝑍𝑡 − E(𝑍𝑡)) we have E(𝜀𝑡) = 0 and Cov(𝐹𝑡, 𝜀𝑡) = 0, as
required in the factor model definition. Also the factors will be uncorrelated:
Σ_𝐹 = (1/𝑇) ∑_𝑡 F_𝑡 F_𝑡^T = Λ_𝐾, containing the first 𝐾 eigenvalues in the diagonal.

The intuition behind PCA is that it does an orthogonal projection of 𝑍𝑡 onto the
hyperplane spanned by the 𝐾 directions corresponding to the 𝐾 largest eigenvalues. The
𝐾-dimensional projection Z̃_𝑡 contains the most randomness of the original variable 𝑍𝑡
that an affine transformation can preserve. Eigenvalue 𝜆_𝑖 measures the randomness of
principal component 𝐹_{𝑡,𝑖}, the randomness of 𝑍𝑡 along the direction v_𝑖. The ratio
∑_{𝑖=1}^{𝐾} 𝜆_𝑖 / ∑_{𝑖=1}^{𝑁} 𝜆_𝑖 is the fraction of randomness explained by the factor
model approximation.
We can also do PCA on the correlation matrix. We get the correlation matrix Ω from the
covariance matrix Σ by Ω = D^{−1/2}ΣD^{−1/2}, where D = Diag(Σ). Then we keep only the 𝐾
largest eigenvalues: Ω̃ = V_𝐾 Λ_𝐾 V_𝐾^T. To guarantee that this approximation will also
be a correlation matrix, we add a diagonal correction term Diag(I − Ω̃). Finally, we
transform it back into a covariance matrix. By this method we can get better performance
when the scale of the data variables is very different.
Advantages of PCA based factor models are that they are computationally simple, they
need no external data, and they avoid misspecification risk, because the residuals and
the factors are orthogonal by construction. They also yield factors ranked according to
their fraction of explained variance, which can further help in determining the number
of factors 𝐾 to use. The drawback of implicit factors is that they do not have a
financial interpretation, and the factor exposures can be unstable. Factor models can be
generalized in many ways; see [BS11].

Number of factors
In practice, the number of factors 𝐾 is unknown and has to be estimated. Too few factors
reduce the explanatory power of the model, too many could lead to retaining
insignificant factors. In [BN01] the authors propose an information criterion to
estimate 𝐾:

𝐾* = argmin_𝐾 log((1/(𝑁𝑇)) 𝜀_𝐾^T 𝜀_𝐾) + 𝐾 ((𝑁 + 𝑇)/(𝑁𝑇)) log(𝑁𝑇/(𝑁 + 𝑇)),

where 𝜀_𝐾 is the vector of residuals in the case of 𝐾 factors.


If we wish to keep the number of factors time invariant, then a heuristic method can
help: 𝐾 = ⌊log(𝑁 )⌋.

Practical considerations
In practice, PCA relies on the assumption that the number of observations 𝑇 is large
relative to the number of securities 𝑁. If this does not hold, i. e., 𝑁 > 𝑇, then we can
do PCA in a different way. Suppose that Z is the 𝑁 × 𝑇 data matrix, and 𝜇 is the mean of
the data. Then we do an eigendecomposition of the 𝑇 × 𝑇 matrix
(1/𝑁)(Z − 𝜇1^T)^T(Z − 𝜇1^T) = WΛW^T, and we get the principal component matrix as
F = ΛW^T.
The most efficient way to compute PCA is to use singular value decomposition (SVD)
directly on the data matrix Z. Then we get Z = VΛW^T. Now it is easy to see the
connection between the above two PCA approaches:

F = V^T Z = V^T VΛW^T = ΛW^T.
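A minimal numpy sketch of the SVD route, assuming Z is the N × T data matrix, centered
row-wise before the decomposition:

import numpy as np

def pca_via_svd(Z, K):
    # Center each row (security) across time
    Zc = Z - Z.mean(axis=1, keepdims=True)
    # Thin SVD: Zc = V diag(s) W^T
    V, s, Wt = np.linalg.svd(Zc, full_matrices=False)
    beta = V[:, :K]                  # exposures: first K left singular vectors
    F = np.diag(s[:K]) @ Wt[:K, :]   # factor time-series, F = Lambda W^T
    return beta, F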

5.3 Modeling considerations
We have arrived at the decomposition of the covariance matrix Σ_𝑍 = 𝛽Σ_𝐹𝛽^T + Σ_𝜃,
where Σ_𝐹 is the 𝐾 × 𝐾 sample factor covariance matrix and 𝐾 is the number of factors.
Typically we have only a few factors, so 𝐾 ≪ 𝑁 and thus Σ_𝐹 is much lower dimensional
than the 𝑁 × 𝑁 matrix Σ_𝑍. This makes it much cheaper to find the decomposition
Σ_𝐹 = FF^T, and consequently the factorization Σ_𝑍 = GG^T, where

G = [𝛽F, Σ_𝜃^{1/2}].                                          (5.1)

This form of G is typically very sparse. In practice sparsity is more important than the
number of variables and constraints in determining the computational efficiency of the
optimization problem. Thus it can be beneficial to focus on the number of nonzeros in the
matrix G and try to reduce it. Indeed, the 𝑁 × (𝑁 + 𝐾) dimensional G is larger than the
𝑁 × 𝑁 dimensional Cholesky factor of Σ𝑍 would be, but in G there are only 𝑁 (𝐾 + 1)
nonzeros in contrast to the 𝑁 (𝑁 + 1)/2 nonzeros in the Cholesky factor. Because in
practice 𝐾 tends to be a small number independent of 𝑁 , we can conclude that the usage
of a factor model reduced the number of nonzeros, and thus the storage requirement by
a factor of 𝑁 . This will in most cases also lead to a significant reduction in the solution
time.
In the risk minimization setting (2.1) of the Markowitz problem, according to Sec.
10.1.1 we can model the factor risk term x^T 𝛽Σ_𝐹𝛽^T x ≤ 𝑡₁ as
(1/2, 𝑡₁, F^T𝛽^T x) ∈ 𝒬_r^{𝐾+2}, and the specific risk term x^T Σ_𝜃 x ≤ 𝑡₂ as
(1/2, 𝑡₂, Σ_𝜃^{1/2} x) ∈ 𝒬_r^{𝑁+2}. The latter can be written in a computationally more
favorable way as (1/2, 𝑡₂, diag(Σ_𝜃)^{1/2} ∘ x) ∈ 𝒬_r^{𝑁+2}. The resulting risk
minimization problem will then look like

minimize    𝑡₁ + 𝑡₂
subject to  (1/2, 𝑡₁, F^T𝛽^T x) ∈ 𝒬_r^{𝐾+2},
            (1/2, 𝑡₂, diag(Σ_𝜃)^{1/2} ∘ x) ∈ 𝒬_r^{𝑁+2},       (5.2)
            𝜇^T x ≥ 𝑟_min,
            x ∈ ℱ.

We can also have prespecified limits 𝛾₁ for the factor risk and 𝛾₂ for the specific
risk. Then we can use the return maximization setting:

maximize    𝜇^T x
subject to  (1/2, 𝛾₁, F^T𝛽^T x) ∈ 𝒬_r^{𝐾+2},
            (1/2, 𝛾₂, diag(Σ_𝜃)^{1/2} ∘ x) ∈ 𝒬_r^{𝑁+2},       (5.3)
            x ∈ ℱ.

5.4 Example
In this chapter there are two examples. The first shows the application of macroeconomic
factor models on the case study developed in the preceding chapters. The second example
demonstrates the performance gain that factor models can yield on a large scale portfolio
optimization problem.

5.4.1 Single factor model
In the example in Sec. 4.5 we have seen that the effective dimension Ñ is low, indicating
that there are only one or two independent sources of risk for all securities. This suggests
that a single factor model might give a good approximation of the covariance matrix. In
this example we will use the return of the SPY ETF as market factor. The SPY tracks
the S&P 500 index and thus is a good proxy for the U.S. stock market. Then our model
will be:

𝑅𝑡 = 𝛼 + 𝛽𝑅M,𝑡 + 𝜀𝑡 ,

where 𝑅M is the return of the market factor. To estimate this model using time-series
regression, we need to obtain scenarios of linear returns on the investment time horizon
(ℎ = 1 year in this example), both for the individual securities and for SPY.
To get this, we add SPY as the ninth security, and proceed exactly the same way as in
Sec. 3.4 up to the point where we have the expected yearly logarithmic return vector
𝜇_ℎ^log and covariance matrix Σ_ℎ^log. Then we generate Monte Carlo scenarios using
these parameters.

scenarios_log = np.random.default_rng().multivariate_normal(m_log, S_log, 100000)

Next, we convert the scenarios to linear returns:

scenarios_lin = np.exp(scenarios_log) - 1

Then we do the linear regression using the OLS class of the statsmodels package. The
independent variable X will contain the scenarios for the market returns and a constant
term. The dependent variable y will be the scenarios for one of the security returns.

import statsmodels.api as sm  # needed for the OLS regression below

params = []
resid = []
X = np.zeros((scenarios_lin.shape[0], 2))
X[:, 0] = scenarios_lin[:, -1]
X[:, 1] = 1
for k in range(N):
y = scenarios_lin[:, k]
model = sm.OLS(y, X, hasconst=True).fit()
resid.append(model.resid)
params.append(model.params)
resid = np.array(resid)
params = np.array(params)

After that we derive the estimates 𝛼̂, 𝛽̂, 𝑠²_M, and diag(Σ̂_𝜃) of the parameters 𝛼, 𝛽,
𝜎²_M, and diag(Σ_𝜃) respectively.

a = params[:, 1]
B = params[:, 0]
s2_M = np.var(X[:, 0])
S_theta = np.cov(resid)
diag_S_theta = np.diag(S_theta)

At this point, we can compute the decomposition (5.1) of the covariance matrix:

43
G = np.block([[B[:, np.newaxis] * np.sqrt(s2_M), np.diag(np.sqrt(diag_S_theta))]])

We can see that in this 8 × 9 matrix there are only 16 nonzero elements:

G = np.array([
[0.174, 0.253, 0, 0, 0, 0, 0, 0, 0 ],
[0.189, 0, 0.205, 0, 0, 0, 0, 0, 0 ],
[0.171, 0, 0, 0.181, 0, 0, 0, 0, 0 ],
[0.180, 0, 0, 0, 0.190, 0, 0, 0, 0 ],
[0.274, 0, 0, 0, 0, 0.314, 0, 0, 0 ],
[0.225, 0, 0, 0, 0, 0, 0.201, 0, 0 ],
[0.221, 0, 0, 0, 0, 0, 0, 0.218, 0 ],
[0.191, 0, 0, 0, 0, 0, 0, 0, 0.213 ]
])

This sparsity can considerably speed up the optimization. See the next example in
Sec. 5.4.2 for a large scale problem where this speedup is apparent.
While MOSEK can handle the sparsity efficiently, we can exploit this structure also
at the model creation phase, if we handle the factor risk and specific risk parts separately,
avoiding a large matrix-vector product:

G_factor = B[:, np.newaxis] * np.sqrt(s2_M)
g_specific = np.sqrt(diag_S_theta)

factor_risk = Expr.mul(G_factor.T, x)
specific_risk = Expr.mulElm(g_specific, x)

Then we can form the risk constraint in the usual way:

M.constraint('risk', Expr.vstack(gamma2, 0.5, factor_risk, specific_risk),
             Domain.inRotatedQCone())

Finally, we compute the efficient frontier at the following points:

deltas = np.logspace(start=-1, stop=1.5, num=20)[::-1]

If we plot the efficient frontier on Fig. 5.1 and the portfolio composition on Fig. 5.2,
we can compare the results obtained with and without using a factor model.

5.4.2 Large scale factor model


In this example we will generate Monte Carlo based return data and compare optimization
runtime between the Cholesky decomposition based and factor model based representa-
tion of the covariance matrix.
The following function generates the data, and the factor model:

def random_factor_model(N, K, T):


# Generate K uncorrelated, zero mean factors, with weighted variance
S_F = np.diag(range(1, K + 1))
rng = np.random.default_rng(seed=1)
Z_F = rng.multivariate_normal(np.zeros(K), S_F, T).T

Fig. 5.1: The efficient frontier.

Fig. 5.2: Portfolio composition x with varying level of risk-aversion 𝛿.

# Generate random factor model parameters
B = rng.normal(size=(N, K))
a = rng.normal(loc=1, size=(N, 1))
e = rng.multivariate_normal(np.zeros(N), np.eye(N), T).T

# Generate N time-series from the factors


Z = a + B @ Z_F + e

# Residual covariance
S_theta = np.cov(e)
diag_S_theta = np.diag(S_theta)

# Optimization parameters
m = np.mean(Z, axis=1)
S = np.cov(Z)

return m, S, B, S_F, diag_S_theta

The following code computes the comparison by increasing the number of securities
𝑁 . The factor model in this example will have 𝐾 = 10 common factors, which is kept
constant, because it typically does not depend on 𝑁 .

# Risk limit
gamma2 = 0.1

# Number of factors
K = 10

# Generate runtime data


list_runtimes_orig = []
list_runtimes_factor = []
for n in range(5, 15):
N = 2**n
T = N + 2**(n-1)
m, S, B, S_F, diag_S_theta = random_factor_model(N, K, T)

F = np.linalg.cholesky(S_F)
G_factor = B @ F
g_specific = np.sqrt(diag_S_theta)

G_orig = np.linalg.cholesky(S)

optimum_orig, runtime_orig = Markowitz(N, m, G_orig, gamma2)


optimum_factor, runtime_factor = Markowitz(N, m, (G_factor, g_specific), gamma2)

list_runtimes_orig.append((N, runtime_orig))
list_runtimes_factor.append((N, runtime_factor))

tup_N_orig, tup_time_orig = list(zip(*list_runtimes_orig))


tup_N_factor, tup_time_factor = list(zip(*list_runtimes_factor))

The function call Markowitz(N, m, G, gamma2) runs the Fusion model, the same way
as in Sec. 2.4. If instead of the argument G we specify the tuple (G_factor, g_specific),
then the risk constraint in the Fusion model is created such that it exploits the risk factor
structure:

# M is the Fusion model object


factor_risk = Expr.mul(G_factor.T, x)
specific_risk = Expr.mulElm(g_specific, x)
total_risk = Expr.vstack(factor_risk, specific_risk)

M.constraint('risk', Expr.vstack(np.sqrt(gamma2), total_risk), Domain.inQCone())

To get the runtime of the optimization, we add the following line to the model:

time = M.getSolverDoubleInfo("optimizerTime")

The results can be seen on Fig. 5.3, showing that the factor model based covariance
matrix modeling can result in orders of magnitude faster solution times for large scale
problems.

Fig. 5.3: Runtimes with and without using a factor model.

Chapter 6

Transaction costs

Rebalancing a portfolio generates turnover, i. e., buying and selling of securities to change
the portfolio composition. The basic Markowitz model assumes that there are no costs
associated with trading, but in reality, turnover incurs expenses. In this chapter we extend
the basic model to take this into account in the form of transaction cost constraints. We
also show some practical constraints, which can also limit turnover through limiting
position sizes.
We can classify transaction costs into two types [WO11]:

• Fixed costs are independent of transaction volume. These include brokerage com-
missions and transfer fees.

• Variable costs depend on the transaction volume. These comprise execution costs
such as market impact, bid/ask spread, or slippage; and opportunity costs of failed
or incomplete execution.

Note that to be able to compare transaction costs with returns and risk, we need to
aggregate them over the length of the investment time period.
In the optimization problem, let x̃ = x − x₀ denote the change in the portfolio with
respect to the initial holdings x₀. Then in general we can take into account transaction
costs with the function 𝐶, where 𝐶(x̃) is the total transaction cost incurred by the
change x̃ in the portfolio. Here we assume that transaction costs are separable, i. e.,
the total cost is the sum of the costs associated with each security:
𝐶(x̃) = ∑_{𝑖=1}^{𝑁} 𝐶_𝑖(x̃_𝑖), where the function 𝐶_𝑖(x̃_𝑖) specifies the transaction
costs incurred for the change in the holdings of security 𝑖. We can then write the MVO
model with transaction cost in the following way:

maximize    𝜇^T x
subject to  1^T x + ∑_{𝑖=1}^{𝑁} 𝐶_𝑖(x̃_𝑖) = 1^T x₀,
            x^T Σx ≤ 𝛾²,                                      (6.1)
            x ∈ ℱ.

The constraint 1^T x + ∑_{𝑖=1}^{𝑁} 𝐶_𝑖(x̃_𝑖) = 1^T x₀ expresses the self-financing
property of the portfolio. This means that no external cash is put into or taken out of
the portfolio; we pay the costs from the existing portfolio components. We can e. g.
assign one of the securities to be a cash account.

6.1 Variable transaction costs
The simplest model that handles variable costs makes the assumption that costs grow
linearly with the trading volume [[BBD+17], [LMFB07]]. We can use linear costs, for
example, to model the cost related to the bid/ask spread, slippage, borrowing or shorting
cost, or fund management fees. Let the transaction cost function for security 𝑖 be given
by
𝐶_𝑖(x̃_𝑖) = 𝑣_𝑖⁺ x̃_𝑖   if x̃_𝑖 ≥ 0,
           −𝑣_𝑖⁻ x̃_𝑖   if x̃_𝑖 < 0,

where 𝑣_𝑖⁺ and 𝑣_𝑖⁻ are the cost rates associated with buying and selling security 𝑖. By
introducing the positive and negative part variables x̃_𝑖⁺ = max(x̃_𝑖, 0) and
x̃_𝑖⁻ = max(−x̃_𝑖, 0) we can linearize this constraint to
𝐶_𝑖(x̃_𝑖) = 𝑣_𝑖⁺ x̃_𝑖⁺ + 𝑣_𝑖⁻ x̃_𝑖⁻. We can handle any piecewise linear convex transaction
cost function in a similar way. After modeling the variables x̃_𝑖⁺ and x̃_𝑖⁻ as in Sec.
10.1.1, the optimization problem will then become

maximize    𝜇^T x
subject to  1^T x + ⟨v⁺, x̃⁺⟩ + ⟨v⁻, x̃⁻⟩ = 1,
            x̃ = x̃⁺ − x̃⁻,
            x̃⁺, x̃⁻ ≥ 0,                                       (6.2)
            x^T Σx ≤ 𝛾²,
            x ∈ ℱ.

In this model the budget constraint ensures that the variables x̃+ and x̃− will not both
become positive in any optimal solution.

6.2 Fixed transaction costs


We can extend the previous model with fixed transaction costs. Considering fixed costs is
a way to discourage trading very small amounts, thus obtaining a sparse portfolio vector,
i. e., one that has many zero entries.
Let 𝑓_𝑖⁺ and 𝑓_𝑖⁻ be the fixed costs associated with buying and selling security 𝑖. The
extended transaction cost function is given by

𝐶_𝑖(x̃_𝑖) = 0                 if x̃_𝑖 = 0,
           𝑓_𝑖⁺ + 𝑣_𝑖⁺ x̃_𝑖   if x̃_𝑖 > 0,
           𝑓_𝑖⁻ − 𝑣_𝑖⁻ x̃_𝑖   if x̃_𝑖 < 0.

This function is not convex, but we can still formulate a mixed integer optimization
problem based on Sec. 10.2.1 by introducing new variables. Let y+ and y− be binary

vectors. Then the optimization problem with transaction costs will become

maximize    𝜇^T x
subject to  1^T x + ⟨f⁺, y⁺⟩ + ⟨f⁻, y⁻⟩ + ⟨v⁺, x̃⁺⟩ + ⟨v⁻, x̃⁻⟩ = 1,
            x̃ = x̃⁺ − x̃⁻,
            x̃⁺, x̃⁻ ≥ 0,
            x̃⁺ ≤ u⁺ ∘ y⁺,                                     (6.3)
            x̃⁻ ≤ u⁻ ∘ y⁻,
            y⁺ + y⁻ ≤ 1,
            y⁺, y⁻ ∈ {0, 1}^𝑁,
            x^T Σx ≤ 𝛾²,
            x ∈ ℱ,

where u⁺ and u⁻ are vectors of upper bounds on the amounts of buying and selling in each
security and ∘ is the elementwise product. The products 𝑢_𝑖⁺ 𝑦_𝑖⁺ and 𝑢_𝑖⁻ 𝑦_𝑖⁻ ensure
that if security 𝑖 is traded (𝑦_𝑖⁺ = 1 or 𝑦_𝑖⁻ = 1), then both fixed and variable costs
are incurred, otherwise (𝑦_𝑖⁺ = 𝑦_𝑖⁻ = 0) the transaction cost is zero. Finally, the
constraint y⁺ + y⁻ ≤ 1 ensures that the transaction for each security is either a buy or
a sell, and never both.

6.3 Market impact costs


In reality, each trade alters the price of the security. This effect is called market impact. If
the traded quantity is small, the impact is negligible and we can assume that the security
prices are independent of the amounts traded. However, for large traded volumes we
should take market impact into account.
While there is no standard model for market impact, in practice an empirical power law
is applied [GK00] [p. 452]. Let d̃_𝑖 = 𝑑_𝑖 − 𝑑_{0,𝑖} be the traded dollar amount for
security 𝑖. Then the average relative price change is

Δ𝑝_𝑖/𝑝_𝑖 = ±𝑐_𝑖 𝜎_𝑖 (|d̃_𝑖|/𝑞_𝑖)^{𝛽−1},                        (6.4)

where 𝜎𝑖 is the volatility of security 𝑖 for a unit time period, 𝑞𝑖 is the average dollar
volume in a unit time period, and the sign depends on the direction of the trade. The
number 𝑐𝑖 has to be calibrated, but it is usually around one. Equation (6.4) is called the
“square-root” law, because 𝛽 − 1 is empirically shown to be around 1/2 [TLD+11].
The relative price difference (6.4) is the impact cost rate, assuming a d̃_𝑖 dollar
amount is traded. After actually trading this amount, we get the total market impact
cost as

𝐶_𝑖(d̃_𝑖) = (Δ𝑝_𝑖/𝑝_𝑖) d̃_𝑖 = 𝑎_𝑖 |d̃_𝑖|^𝛽,                     (6.5)

where 𝑎_𝑖 = ±𝑐_𝑖 𝜎_𝑖/𝑞_𝑖^{𝛽−1}. Thus if 𝛽 − 1 = 1/2, the market impact cost increases
with the 𝛽 = 3/2 power of the traded dollar amount.
We can also express the market impact cost in terms of the portfolio fraction x̃_𝑖
instead of d̃_𝑖, by normalizing 𝑞_𝑖 with the total portfolio value v^T p₀.

Using Sec. 10.1.1 we can model 𝑡_𝑖 ≥ |x̃_𝑖|^𝛽 with the power cone as
(𝑡_𝑖, 1, x̃_𝑖) ∈ 𝒫₃^{1/𝛽,(𝛽−1)/𝛽}. Hence, it follows that the total market impact cost
term ∑_{𝑖=1}^{𝑁} 𝑎_𝑖 |x̃_𝑖|^𝛽 can be modeled by ∑_{𝑖=1}^{𝑁} 𝑎_𝑖 𝑡_𝑖 under the constraints
(𝑡_𝑖, 1, x̃_𝑖) ∈ 𝒫₃^{1/𝛽,(𝛽−1)/𝛽}.
Note however, that in this model nothing forces 𝑡_𝑖 to be as small as possible, to
ensure that 𝑡_𝑖 = |x̃_𝑖|^𝛽 holds at the optimal solution. This freedom allows the
optimizer to try reducing portfolio risk by incorrectly treating 𝑎_𝑖 𝑡_𝑖 as a risk-free
security. Then it would allocate more weight to 𝑎_𝑖 𝑡_𝑖 while reducing the weight
allocated to risky securities, basically throwing away money.
There are two solutions, which can prevent this unwanted behavior:

• Adding a penalty term −𝛿 T t to the objective function to prevent excess growth


of the variables 𝑡𝑖 . We have to calibrate the hyper-parameter vector 𝛿 so that the
penalty would not become too dominant.

• Adding a risk-free security to the model. In this case the optimizer will prefer
to allocate to the risk-free security, which has positive return (the risk-free rate),
instead of allocating to 𝑎𝑖 𝑡𝑖 .

Let us denote the weight of the risk-free security by 𝑥_f and the risk-free rate of
return by 𝑟_f. Then the portfolio optimization problem accounting for market impact
costs will be

maximize    𝜇^T x + 𝑟_f 𝑥_f
subject to  1^T x + a^T t + 𝑥_f = 1,
            x^T Σx ≤ 𝛾²,                                      (6.6)
            (𝑡_𝑖, 1, x̃_𝑖) ∈ 𝒫₃^{1/𝛽,(𝛽−1)/𝛽}, 𝑖 = 1, . . . , 𝑁,
            x, 𝑥_f ∈ ℱ.

Note that if we model using the quadratic cone instead of the rotated quadratic cone
and a risk free security is present, then there will be no optimal portfolios for which
0 < 𝑥f < 1. The solutions will be either 𝑥f = 1 or some risky portfolio with 𝑥f = 0. See
a detailed discussion about this in Sec. 10.3.

6.4 Cardinality constraints


Investors often prefer portfolios with a limited number of securities. We do not need to
use all of the 𝑁 securities to achieve good diversification, and this way we can also reduce
costs significantly. We can create explicit limits to constrain the number of securities.
Suppose that we allow at most 𝐾 coordinates of the difference vector x̃ = x − x0 to
be non-zero, where 𝐾 is (much) smaller than the total number of securities 𝑁 .
We can again model this type of constraint based on Sec. 10.2.1 by introducing a binary
vector y to indicate |x̃| ≠ 0, and by bounding the sum of y. The basic Markowitz model
then gets updated as follows:

maximize    𝜇^T x
subject to  1^T x = 1,
            x̃ = x − x₀,
            x̃ ≤ u ∘ y,
            x̃ ≥ −u ∘ y,                                       (6.7)
            1^T y ≤ 𝐾,
            y ∈ {0, 1}^𝑁,
            x^T Σx ≤ 𝛾²,
            x ∈ ℱ,

where the vector u is some a priori chosen upper bound on the amount of trading in each
security.
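As an illustration, a minimal Fusion sketch of the cardinality constraints in (6.7); it
assumes a Model M with an N-dimensional weight variable x built as in Sec. 2.4, and
placeholder data x0, u and K:

y = M.variable("y", N, Domain.binary())
xt = Expr.sub(x, x0)                    # trade vector x - x0
# |x - x0| <= u * y, so y_i = 0 forces a zero trade in security i
M.constraint(Expr.sub(Expr.mulElm(u, y), xt), Domain.greaterThan(0.0))
M.constraint(Expr.add(Expr.mulElm(u, y), xt), Domain.greaterThan(0.0))
# At most K securities can be traded
M.constraint(Expr.sum(y), Domain.lessThan(K))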

6.5 Buy-in threshold


In the above examples we assumed that trades can be arbitrarily small. In reality, how-
ever, it can be meaningful to place lower bounds on traded amounts to avoid unrealisti-
cally small trades and to control the transaction cost. These constraints are called buy-in
threshold.
Let x̃ = x − x0 be the traded amount. Let also x̃+ = max(x̃, 0) and x̃− = max(−x̃, 0)
be the positive and negative part of x̃. These we model according to Sec. 10.2.1. Then the
buy-in threshold basically means that x̃± ∈ {0} ∪ [ℓ± , u± ], where ℓ± and u± are vectors
of lower and upper bounds on x̃+ and x̃− respectively.
This is a semi-continuous variable, which we can model based on Sec. 10.2.1. We
introduce binary variables y± and constraints ℓ± ∘ y± ≤ x̃± ≤ u± ∘ y± . The optimization
problem would then become a mixed integer problem of the form

maximize    𝜇^T x
subject to  1^T x = 1,
            x − x₀ = x̃⁺ − x̃⁻,
            x̃⁺, x̃⁻ ≥ 0,
            x̃⁺ ≤ u⁺ ∘ y⁺,
            x̃⁺ ≥ ℓ⁺ ∘ y⁺,
            x̃⁻ ≤ u⁻ ∘ y⁻,                                     (6.8)
            x̃⁻ ≥ ℓ⁻ ∘ y⁻,
            y⁺ + y⁻ ≤ 1,
            y⁺, y⁻ ∈ {0, 1}^𝑁,
            x^T Σx ≤ 𝛾²,
            x ∈ ℱ.

This model is of course compatible with the fixed plus linear transaction cost model
discussed in Sec. 6.2.

6.6 Example
In this chapter we show two examples. The first demonstrates the modeling of market
impact through the use of the power cone, while the second example presents fixed and
variable transaction costs and the buy-in threshold.

6.6.1 Market impact model


As a starting point, we refer back to problem (2.13). We will extend this problem with
the market impact cost model. To compute the coefficients 𝑎𝑖 in formula (6.5), we assume
that daily volume data is also available in the dataframe df_volumes. We also compute
the mean of the daily volumes, and the daily volatility for each security as the standard
deviation of daily linear returns:
# Compute average daily volume and daily volatility (std. dev.)
df_lin_returns = df_prices.pct_change()
vty = df_lin_returns.std()
vol = (df_volumes * df_prices).mean()

According to the data, the average daily dollar volumes are 1e8 * [3.9883, 4.2416,
6.0054, 4.2584, 30.4647, 34.5619, 5.0077, 8.4950], and the daily volatilities are
[0.0164, 0.0154, 0.0146, 0.0155, 0.0191, 0.0173, 0.0186, 0.0169]. Thus in this example
we will choose the size of our portfolio to be 10 billion dollars, so that we can see a
significant market impact.
Then we update the Fusion model introduced in Sec. 2.4.2 with new variables and
constraints:
def EfficientFrontier(N, m, G, deltas, a, beta, rf):

with Model("Case study") as M:


# Settings
M.setLogHandler(sys.stdout)

# Variables
# The variable x is the fraction of holdings in each security.
# x must be positive, this imposes the no short-selling constraint.
x = M.variable("x", N, Domain.greaterThan(0.0))

# Variable for risk-free security (cash account)


xf = M.variable("xf", 1, Domain.greaterThan(0.0))

# The variable s models the portfolio variance term in the objective.


s = M.variable("s", 1, Domain.unbounded())

# Auxiliary variable to model market impact


t = M.variable("t", N, Domain.unbounded())

# Budget constraint with transaction cost terms


terms = Expr.hstack(Expr.sum(x), xf, Expr.dot(a, t))
M.constraint('budget', Expr.sum(terms), Domain.equalsTo(1))

# Power cone to model market impact


M.constraint('mkt_impact', Expr.hstack(t, Expr.constTerm(N, 1.0), x),
Domain.inPPowerCone(1.0 / beta))

# Objective (quadratic utility version)


delta = M.parameter()
pf_return = Expr.add(Expr.dot(m, x), Expr.mul(rf, xf))
pf_risk = Expr.mul(delta, s)
M.objective('obj', ObjectiveSense.Maximize,
Expr.sub(pf_return, pf_risk))

# Conic constraint for the portfolio variance


M.constraint('risk', Expr.vstack(s, 1, Expr.mul(G.transpose(), x)),
Domain.inRotatedQCone())

columns = ["delta", "obj", "return", "risk", "t_resid",


"x_sum", "xf", "tcost"] + df_prices.columns.tolist()
df_result = pd.DataFrame(columns=columns)
for d in deltas:
# Update parameter
delta.setValue(d)

# Solve optimization
M.solve()

# Save results
portfolio_return = m @ x.level() + np.array([rf]) @ xf.level()
portfolio_risk = np.sqrt(2 * s.level()[0])
t_resid = t.level() - np.abs(x.level())**beta
row = pd.Series([d, M.primalObjValue(), portfolio_return,
portfolio_risk, sum(t_resid), sum(x.level()),
sum(xf.level()), t.level() @ a]
+ list(x.level()), index=columns)
df_result = df_result.append(row, ignore_index=True)

return df_result

The new lines are:

• The lines for the variable 𝑥_f, which represents the weight allocated to the cash
account. The annual return on it is assumed to be 𝑟_f = 1%. We constrain 𝑥_f to be
positive, meaning that borrowing money is not allowed.
• The line for the auxiliary variable t.
• The line for the market impact constraint modeled using the power cone.
We modified the budget constraints to include 𝑥f and the market impact cost aT t.
The objective also contains the risk-free part of portfolio return 𝑟f 𝑥f .
In this example, we start with 100% cash, meaning that 𝑥_{f,0} = 1 and x₀ = 0.
Transaction cost is thus incurred for the total weight x.

Next, we compute the efficient frontier with and without market impact costs. We
select 𝛽 = 3/2 and 𝑐𝑖 = 1. The following code produces the results:

deltas = np.logspace(start=-0.5, stop=2, num=20)[::-1]


portfolio_value = 10**10
rel_vol = vol / portfolio_value
a1 = np.zeros(N)
a2 = (c * vty / rel_vol**(beta - 1)).to_numpy()
ax = plt.gca()
for a in [a1, a2]:
df_result = EfficientFrontier(N, m, G, deltas, a, beta, rf)
mask = df_result < 0
mask.iloc[:, :2] = False
df_result[mask] = 0
df_result.plot(ax=ax, x="risk", y="return", style="-o",
xlabel="portfolio risk (std. dev.)",
ylabel="portfolio return", grid=True)
ax.legend(["return without price impact", "return with price impact"])

On Fig. 6.1 we can see the return-reducing effect of market impact costs. The left part
of the efficient frontier (up to the so-called tangency portfolio) is linear, because a
risk-free security was included. However, in this case borrowing is not allowed, so the
right part remains the usual parabola shape.

Fig. 6.1: The efficient frontier with risk-free security included, and market impact cost
taken into account.

6.6.2 Transaction cost models
In this example we show a problem that models fixed and variable transaction costs and the buy-in threshold. Note that we do not model the market impact here.

We will assume now that x can take negative values too (short-selling is allowed), up to the limit of 30% of the portfolio size. This way we can see how to apply different costs to buy and sell trades. We also assume that x^0 = 0, so x̃ = x.

The following code defines the variables xp and xm, representing the positive and negative parts of x, and the binary variables y^+, y^- indicating whether there is buying or selling in a security:
# Real variables
xp = M.variable("xp", N, Domain.greaterThan(0.0))
xm = M.variable("xm", N, Domain.greaterThan(0.0))

# Binary variables
yp = M.variable("yp", N, Domain.binary())
ym = M.variable("ym", N, Domain.binary())

Next we add two constraints. The first links xp and xm to x, so that they represent
the positive and negative parts. The second ensures that for each coordinate of yp and
ym only one of the values can be 1.
# Constraint assigning xp and xm to the positive and negative part of x.
M.constraint('pos-neg-part', Expr.sub(x, Expr.sub(xp, xm)),
             Domain.equalsTo(0.0))

# Exclusive buy-sell constraint
M.constraint('exclusion', Expr.add(yp, ym), Domain.lessThan(1.0))

We update the budget constraint with the variable and fixed transaction cost terms. The fixed costs of buy and sell trades are held by the variables fp and fm. These are typically given in dollars, and have to be divided by the total portfolio value. The variable cost coefficients are vp and vm. If these are given as percentages, then we do not have to modify them.
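As a quick illustration, the dollar-to-weight conversion could look like the following sketch; the dollar amounts and the portfolio value are made-up numbers:

# Hypothetical conversion of dollar-based fixed costs to weight units.
portfolio_value = 10**8
fp = 4000.0 / portfolio_value * np.ones(N)   # assumed $4000 fixed cost per buy trade
fm = 6000.0 / portfolio_value * np.ones(N)   # assumed $6000 fixed cost per sell trade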
# Budget constraint with transaction cost terms
fixcost_terms = Expr.add([Expr.dot(fp, yp), Expr.dot(fm, ym)])
varcost_terms = Expr.add([Expr.dot(vp, xp), Expr.dot(vm, xm)])
budget_terms = Expr.add([Expr.sum(x), varcost_terms, fixcost_terms])
M.constraint('budget', budget_terms, Domain.equalsTo(1.0))

Next, the 130/30 leverage constraint is added. Note that the transaction cost terms from the budget constraint should also appear here, otherwise the two constraints combined would allow a little more leverage than intended. (The sum of x would not reach 1 because of the cost terms, leaving more space in the leverage constraint for negative positions.)
# Auxiliary variable for 130/30 leverage constraint
z = M.variable("z", N, Domain.unbounded())

# 130/30 leverage constraint
M.constraint('leverage-gt', Expr.sub(z, x), Domain.greaterThan(0.0))
M.constraint('leverage-ls', Expr.add(z, x), Domain.greaterThan(0.0))
M.constraint('leverage-sum',
             Expr.add([Expr.sum(z), varcost_terms, fixcost_terms]),
             Domain.equalsTo(1.6))

Finally, to be able to differentiate between zero allocation (not incurring fixed cost) and nonzero allocation (incurring fixed cost), and to implement the buy-in threshold, we need bound constraints involving the binary variables:

# Bound constraints
M.constraint('ubound-p', Expr.sub(Expr.mul(up, yp), xp),
             Domain.greaterThan(0.0))
M.constraint('ubound-m', Expr.sub(Expr.mul(um, ym), xm),
             Domain.greaterThan(0.0))
M.constraint('lbound-p', Expr.sub(xp, Expr.mul(lp, yp)),
             Domain.greaterThan(0.0))
M.constraint('lbound-m', Expr.sub(xm, Expr.mul(lm, ym)),
             Domain.greaterThan(0.0))

The full updated model will then look like the following:

def EfficientFrontier(N, m, G, deltas, vp, vm, fp, fm, up, um,
                      lp, lm, pcoef):

    with Model("Case study") as M:
        # Settings
        M.setLogHandler(sys.stdout)

        # Real variables
        # The variable x is the fraction of holdings in each security.
        x = M.variable("x", N, Domain.unbounded())
        xp = M.variable("xp", N, Domain.greaterThan(0.0))
        xm = M.variable("xm", N, Domain.greaterThan(0.0))

        # Binary variables
        yp = M.variable("yp", N, Domain.binary())
        ym = M.variable("ym", N, Domain.binary())

        # Constraint assigning xp and xm to the pos. and neg. part of x.
        M.constraint('pos-neg-part', Expr.sub(x, Expr.sub(xp, xm)),
                     Domain.equalsTo(0.0))

        # s models the portfolio variance term in the objective.
        s = M.variable("s", 1, Domain.unbounded())

        # Auxiliary variable for 130/30 leverage constraint
        z = M.variable("z", N, Domain.unbounded())

        # Bound constraints
        M.constraint('ubound-p', Expr.sub(Expr.mul(up, yp), xp),
                     Domain.greaterThan(0.0))
        M.constraint('ubound-m', Expr.sub(Expr.mul(um, ym), xm),
                     Domain.greaterThan(0.0))
        M.constraint('lbound-p', Expr.sub(xp, Expr.mul(lp, yp)),
                     Domain.greaterThan(0.0))
        M.constraint('lbound-m', Expr.sub(xm, Expr.mul(lm, ym)),
                     Domain.greaterThan(0.0))

        # Exclusive buy-sell constraint
        M.constraint('exclusion', Expr.add(yp, ym), Domain.lessThan(1.0))

        # Budget constraint with transaction cost terms
        fixcost_terms = Expr.add([Expr.dot(fp, yp), Expr.dot(fm, ym)])
        varcost_terms = Expr.add([Expr.dot(vp, xp), Expr.dot(vm, xm)])
        budget_terms = Expr.add([Expr.sum(x), varcost_terms, fixcost_terms])
        M.constraint('budget', budget_terms, Domain.equalsTo(1.0))

        # 130/30 leverage constraint
        M.constraint('leverage-gt', Expr.sub(z, x), Domain.greaterThan(0.0))
        M.constraint('leverage-ls', Expr.add(z, x), Domain.greaterThan(0.0))
        M.constraint('leverage-sum',
                     Expr.add([Expr.sum(z), varcost_terms, fixcost_terms]),
                     Domain.equalsTo(1.6))

        # Objective (quadratic utility version)
        delta = M.parameter()
        penalty = Expr.mul(pcoef, Expr.sum(Expr.add(xp, xm)))
        M.objective('obj', ObjectiveSense.Maximize,
                    Expr.sub(Expr.sub(Expr.dot(m, x), penalty),
                             Expr.mul(delta, s)))

        # Conic constraint for the portfolio variance
        M.constraint('risk', Expr.vstack(s, 1, Expr.mul(G.transpose(), x)),
                     Domain.inRotatedQCone())

        columns = (["delta", "obj", "return", "risk", "x_sum", "tcost"]
                   + df_prices.columns.tolist())
        df_result = pd.DataFrame(columns=columns)
        for idx, d in enumerate(deltas):
            # Update parameter
            delta.setValue(d)

            # Solve optimization
            M.solve()

            # Save results
            portfolio_return = m @ x.level()
            portfolio_risk = np.sqrt(2 * s.level()[0])
            tcost = (np.dot(vp, xp.level()) + np.dot(vm, xm.level())
                     + np.dot(fp, yp.level()) + np.dot(fm, ym.level()))
            row = pd.Series([d, M.primalObjValue(), portfolio_return,
                             portfolio_risk, sum(x.level()), tcost]
                            + list(x.level()), index=columns)
            df_result = df_result.append(row, ignore_index=True)

        return df_result

Here we also used a penalty term in the objective to prevent excess growth of the positive part and negative part variables. The coefficient of the penalty has to be calibrated so that we do not overpenalize.

We also have to mention that because of the binary variables, we can only solve this as a mixed integer optimization (MIO) problem. Solving such a problem might not be as efficient as solving one with only continuous variables. See Sec. 10.2 for details regarding MIO problems.
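If solution time becomes an issue, we can also trade solution quality for speed by relaxing the stopping criteria of the mixed integer optimizer inside the model function. A minimal sketch; the parameter values below are illustrative, not recommendations:

# Allow the MIO solver to stop early; values are assumptions for illustration.
M.setSolverParam("mioMaxTime", 60.0)     # stop after 60 seconds
M.setSolverParam("mioTolRelGap", 1e-2)   # accept a 1% relative optimality gap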
We compute the efficient frontier with and without transaction costs. The following
code produces the results:

deltas = np.logspace(start=-0.5, stop=2, num=20)[::-1]

ax = plt.gca()
for a in [0, 1]:
    pcoef = a * 0.03
    fp = a * 0.005 * np.ones(N)  # Depends on portfolio value
    fm = a * 0.01 * np.ones(N)   # Depends on portfolio value
    vp = a * 0.01 * np.ones(N)
    vm = a * 0.02 * np.ones(N)
    up = 2.0
    um = 2.0
    lp = a * 0.05
    lm = a * 0.05

    df_result = EfficientFrontier(N, m, G, deltas, vp, vm, fp, fm, up, um,
                                  lp, lm, pcoef)
    df_result.plot(ax=ax, x="risk", y="return", style="-o",
                   xlabel="portfolio risk (std. dev.)",
                   ylabel="portfolio return", grid=True)
ax.legend(["return without transaction cost",
           "return with transaction cost"])

On Fig. 6.2 we can see the return-reducing effect of transaction costs. The overall return is higher because of the leverage.

Fig. 6.2: The efficient frontier with transaction costs taken into account.

Chapter 7

Benchmark relative portfolio optimization

We often measure the performance of portfolios relative to a benchmark. The benchmark is typically a market index representing an asset class or some segment of the market, and its composition is assumed known. Benchmark relative portfolio management can have two types of strategies: active or passive. The passive strategy tries to replicate the benchmark, i. e. match its performance as closely as possible. In contrast, active management seeks to consistently beat (outperform) the benchmark (after management fees). This section is based on [CJPT18].

7.1 Active return

Let the portfolio and benchmark weights be x and x_bm. In active management, quantities of interest are the return and the risk relative to the benchmark. The active portfolio return (return above the benchmark return) is

R_xa = x_a^T R = x^T R - x_bm^T R = R_x - R_xbm,

where x_a = x - x_bm is the vector of active holdings of the portfolio.


The active portfolio risk or tracking error will be the standard deviation of the active return:

σ_xa(R_x, R_xbm) = √(x_a^T Σ x_a),

where Σ is the covariance matrix of return. The tracking error measures how close the portfolio return is to the benchmark return. Outperforming the benchmark involves additional risk; if the tracking error is zero then there cannot be any active return at all. In general we can construct a class of functions for the purpose of measuring tracking error differently, from any deviation measure. See in Sec. 8.
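As a quick numerical illustration of these definitions, the tracking error can be estimated ex post from a matrix of return scenarios; all data below is randomly generated for demonstration:

import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 1000
R = rng.normal(0.08, 0.2, size=(N, T))    # security return scenarios, one per column
x_bm = np.ones(N) / N                     # benchmark weights
x = x_bm + rng.normal(0, 0.02, N)         # a portfolio close to the benchmark
x = x / x.sum()

xa = x - x_bm                             # active holdings
active_returns = xa @ R                   # active return in each scenario
tracking_error = active_returns.std(ddof=1)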

7.2 Factor model on active returns

In the active return there is still a systematic component attributable to the benchmark. We can account for that using a single factor model. First we create a factor model for security returns:

R = β R_xbm + θ,

where β_i = Cov(R_i, R_xbm)/Var(R_xbm) is the exposure of security i to the benchmark, Cov(θ) = 0, Cov(R_xbm, θ_i) = 0, and E(θ_i) = α_i is the expected residual return. Under this model, the covariance matrix of R will be

Σ = ββ^T σ_xbm^2 + Σ_θ,

where σ_xbm^2 is the benchmark variance and Σ_θ is the residual covariance matrix.

This factor model allows us to decompose the portfolio return into a systematic component, which is explained by the benchmark, and a residual component, which is specific to the portfolio:

R_x = β_x R_xbm + θ_x,

where β_x = β^T x, θ_x = θ^T x, and α_x = α^T x = E(θ_x).


Finally, by subtracting the benchmark return from both sides we can write the active return as

R_xa = (β_x - 1) R_xbm + θ_x,

where β_x - 1 is the active beta. After computing the variance of the decomposed active return, we can also write the square of the tracking error as

σ_xa^2(R_x, R_xbm) = (β_x - 1)^2 σ_xbm^2 + ω_x^2,

where σ_xbm^2 = Var(R_xbm) and ω_x^2 = Var(θ_x) is the residual risk.


We must note that if a different factor model is used to forecast risk and alpha
(which is often the case in practice), there could be factors included in one model but
not included in the other model. The optimizer will interpret such misalignments as
factor return without risk or factor risk without return and it will exploit them as false
opportunities, leading to unintended biases in the results. Risk and alpha factors are
considered aligned if alpha can be written as a linear combination of the risk factors. In
case of a misalignment, there are different approaches to mitigate it [KS13].

7.3 Optimization

An active investment strategy would try to gain higher portfolio alpha α_x at the cost of accepting a higher tracking error. It follows that portfolio optimization with respect to a benchmark will optimize the tradeoff between portfolio alpha and the square of the tracking error. Additional constraints specific to such problems can be bounds on the portfolio active beta or on the active holdings. We can see examples in the following model:

maximize    α̂^T x
subject to  (x - x_bm)^T Σ (x - x_bm) ≤ γ_TE^2,
            x - x_bm ≥ l_h,
            x - x_bm ≤ u_h,                          (7.1)
            β̂_x - 1 ≥ l_β,
            β̂_x - 1 ≤ u_β,
            x ∈ ℱ,

where α̂ and β̂ are estimates of α and β. To compute these estimates, we can do a linear regression. However, this tends to overestimate the betas of stocks with high benchmark exposure and underestimate the betas of stocks with low benchmark exposure. To improve the estimation, shrinkage towards one (the beta of the benchmark) can be helpful:

β̂_shrunk = (1 - c)β̂ + c·1,

where we must estimate the optimal shrinkage factor c.
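A sketch of this estimation step in NumPy: OLS betas against benchmark return scenarios, then shrinkage towards one. The data is synthetic and the shrinkage factor c is a placeholder; in practice c would itself be estimated:

import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 1000
r_bm = rng.normal(0.006, 0.04, T)              # benchmark return scenarios
R = r_bm + rng.normal(0, 0.05, size=(N, T))    # security returns, true beta = 1

var_bm = r_bm.var(ddof=1)
beta_ols = np.array([np.cov(R[i], r_bm, ddof=1)[0, 1] / var_bm
                     for i in range(N)])
c = 0.3                                        # assumed shrinkage factor
beta_shrunk = (1 - c) * beta_ols + c * np.ones(N)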

7.4 Extensions

7.4.1 Variance constraint

Optimizing alpha while constraining only the tracking error can increase total portfolio risk. According to [Jor04] it can be helpful to also constrain the total portfolio variance, especially in cases when the benchmark portfolio is relatively inefficient:

maximize    α̂^T x
subject to  (x - x_bm)^T Σ (x - x_bm) ≤ γ_TE^2,
            x^T Σ x ≤ γ^2,                           (7.2)
            x ∈ ℱ.

7.4.2 Passive management


In the case of passive management, the goal of optimization is to construct a benchmark
tracking portfolio, i. e., to have a tracking error as small as possible, given the constraints.
This usually also means that 𝛼x = 0. Of course without constraints or trading costs, the
optimal solution is the benchmark index. Therefore optimization in passive management
may only be useful when constraints are needed or liquidity issues are important [MM08].

7.4.3 Linear tracking error measures

The tracking error is one possible measure of closeness between the portfolio and the benchmark. However, there can be other measures [RWZ99]. For example, we can use the mean absolute deviation (MAD) measure, i. e., E(|R_x - R_xbm|). An advantage of this measure is its better robustness against outliers in the data. Applying scenario approximation to the expectation we get (1/T) Σ_{k=1}^T |r_k^T (x - x_bm)|. After modeling the absolute value function (see Sec. 10.1.1), we can formulate the benchmark tracking problem with this measure as an LO model:

maximize_{x,t}  α̂^T x
subject to      (1/T) Σ_{k=1}^T t_k ≤ γ,
                t + R^T (x - x_bm) ≥ 0,              (7.3)
                t - R^T (x - x_bm) ≥ 0,
                x ∈ ℱ,

where R is the return data matrix with one observation r_k, k = 1, ..., T in each column.
If the investor perceives risk as portfolio return being below the benchmark return, we can also use downside deviation measures. One example is the lower partial moment LPM_1 measure: E(max(-R_x + R_xbm, 0)). The scenario approximation of this expectation will be (1/T) Σ_{k=1}^T max(-r_k^T (x - x_bm), 0). After modeling the maximum (see Sec. 10.1.1) by defining a new variable t_k^- = max(-r_k^T (x - x_bm), 0), we can solve the problem as an LO:

maximize    α̂^T x
subject to  (1/T) Σ_{k=1}^T t_k^- ≤ γ,
            t^- ≥ -R^T (x - x_bm),                   (7.4)
            t^- ≥ 0,
            x ∈ ℱ.

Note that for a symmetric portfolio return distribution, this will be equivalent to the MAD model.
Linear models might also be preferable because of their more intuitive interpretation. By measuring the tracking error with a linear function, the measurement unit of the objective function is percentage instead of squared percentage.
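A minimal Fusion sketch of the MAD tracking problem (7.3) could look as follows. The function name is ours, and we assume the alpha estimate a, the scenario matrix R (with scenarios in columns), the benchmark weights xbm and the bound gamma are given as NumPy data, with ℱ reduced to the long-only budget constraint:

def MADTracking(N, T, a, R, xbm, gamma):
    with Model("MAD tracking") as M:
        x = M.variable("x", N, Domain.greaterThan(0.0))
        t = M.variable("t", T, Domain.unbounded())

        # Budget constraint
        M.constraint('budget', Expr.sum(x), Domain.equalsTo(1.0))

        # t_k >= |r_k'(x - xbm)| for each scenario, via two linear constraints
        active_ret = Expr.mul(R.T, Expr.sub(x, xbm))
        M.constraint('mad-pos', Expr.add(t, active_ret), Domain.greaterThan(0.0))
        M.constraint('mad-neg', Expr.sub(t, active_ret), Domain.greaterThan(0.0))

        # Bound on the mean absolute deviation
        M.constraint('mad', Expr.mul(1.0 / T, Expr.sum(t)), Domain.lessThan(gamma))

        # Maximize portfolio alpha
        M.objective('obj', ObjectiveSense.Maximize, Expr.dot(a, x))
        M.solve()
        return x.level()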

7.5 Example
Here we show an example of a benchmark relative optimization problem. The benchmark
will be the equally weighted portfolio of the eight stocks from the previous examples,
therefore xbm = 1/𝑁 . The benchmark is created by the following code by aggregating
the price data of the eight stocks:

# Create benchmark
df_prices['bm'] = df_prices.iloc[:-2, 0:8].mean(axis=1)

Then we follow the same Monte Carlo procedure as in Sec. 5.4.1, just with the benchmark instead of the market factor. This will yield scenarios of linear returns on the investment time horizon of h = 1 year, so that we can compute the estimates α̂ and β̂ of α and β using time-series regression.
In the Fusion model, we make the following modifications:
• We define the active holdings variable xa = x − xbm by

# Active holdings
xa = Expr.sub(x, xbm)

• We modify the constraint on risk to be the constraint on tracking error:

# Conic constraint for the portfolio variance
M.constraint('risk', Expr.vstack(s, 1, Expr.mul(G.T, xa)),
             Domain.inRotatedQCone())

• We also specify bounds on the active holdings and on the portfolio active beta:
# Constraint on active holdings
M.constraint('ubound-h', Expr.sub(uh, xa), Domain.greaterThan(0.0))
M.constraint('lbound-h', Expr.sub(xa, lh), Domain.greaterThan(0.0))

# Constraint on portfolio active beta
port_act_beta = Expr.sub(Expr.dot(B, x), 1)
M.constraint('ubound-b', Expr.sub(ub, port_act_beta),
             Domain.greaterThan(0.0))
M.constraint('lbound-b', Expr.sub(port_act_beta, lb),
             Domain.greaterThan(0.0))

• Finally, we modify the objective to maximize the portfolio alpha:

# Objective (quadratic utility version)
delta = M.parameter()
M.objective('obj', ObjectiveSense.Maximize,
            Expr.sub(Expr.dot(a, x), Expr.mul(delta, s)))

The complete Fusion model of the optimization problem (7.1) will then be
def EfficientFrontier(N, a, B, G, xbm, deltas, uh, ub, lh, lb):

    with Model("Case study") as M:
        # Settings
        M.setLogHandler(sys.stdout)

        # Variables
        # The variable x is the fraction of holdings in each security.
        # x must be positive, imposing the no short-selling constraint.
        x = M.variable("x", N, Domain.greaterThan(0.0))

        # Active holdings
        xa = Expr.sub(x, xbm)

        # s models the portfolio variance term in the objective.
        s = M.variable("s", 1, Domain.unbounded())

        # Budget constraint
        M.constraint('budget_x', Expr.sum(x), Domain.equalsTo(1))

        # Constraint on active holdings
        M.constraint('ubound-h', Expr.sub(uh, xa), Domain.greaterThan(0.0))
        M.constraint('lbound-h', Expr.sub(xa, lh), Domain.greaterThan(0.0))

        # Constraint on portfolio active beta
        port_act_beta = Expr.sub(Expr.dot(B, x), 1)
        M.constraint('ubound-b', Expr.sub(ub, port_act_beta),
                     Domain.greaterThan(0.0))
        M.constraint('lbound-b', Expr.sub(port_act_beta, lb),
                     Domain.greaterThan(0.0))

        # Conic constraint for the portfolio variance
        M.constraint('risk', Expr.vstack(s, 1, Expr.mul(G.T, xa)),
                     Domain.inRotatedQCone())

        # Objective (quadratic utility version)
        delta = M.parameter()
        M.objective('obj', ObjectiveSense.Maximize,
                    Expr.sub(Expr.dot(a, x), Expr.mul(delta, s)))

        # Create DataFrame to store the results. SPY is removed.
        columns = (["delta", "obj", "return", "risk"]
                   + df_prices.columns[:-1].tolist())
        df_result = pd.DataFrame(columns=columns)
        for d in deltas:
            # Update parameter
            delta.setValue(d)

            # Solve optimization
            M.solve()

            # Save results
            portfolio_return = a @ x.level()
            portfolio_risk = np.sqrt(2 * s.level()[0])
            row = pd.Series([d, M.primalObjValue(), portfolio_return,
                             portfolio_risk]
                            + list(x.level()), index=columns)
            df_result = df_result.append(row, ignore_index=True)

        return df_result

We give the input parameters and compute the efficient frontier using the following
code:

deltas = np.logspace(start=-0.5, stop=2, num=20)[::-1]

xbm = np.ones(N) / N
uh = np.ones(N) * 0.5
lh = -np.ones(N) * 0.5
ub = 0.5
lb = -0.5
df_result = EfficientFrontier(N, a, B, G, xbm, deltas, uh, ub, lh, lb)
mask = df_result < 0
mask.iloc[:, :2] = False
df_result[mask] = 0
df_result

On Fig. 7.1 we plot the efficient frontier and on Fig. 7.2 the portfolio composition.
On the latter we see that as the tracking error decreases, the portfolio gets closer to the
benchmark, i. e., the equal-weighted portfolio.

Fig. 7.1: The benchmark relative efficient frontier.

Fig. 7.2: Portfolio composition x with varying level of risk-aversion δ.

Chapter 8

Other risk measures

In the definitions (2.1)-(2.4) of the classical MVO problem the variance (or standard deviation) of portfolio return is used as the measure of risk, making computation easy and convenient. If we assume that the portfolio return is normally distributed, then the variance is in fact the optimal risk measure, because then MVO can take into account all information about return. Moreover, if T is large compared to N, the sample mean and covariance matrix are maximum likelihood estimates (MLE), implying that the optimal portfolio resulting from estimated inputs will also be an MLE of the true optimal portfolio [Lau01].
Empirical observations suggest however, that the distribution of linear return is often skewed, and even if it is elliptically symmetric, it can have fat tails and can exhibit tail dependence¹. As the distribution moves away from normality, the performance of a variance based portfolio estimator can quickly degrade.
No perfect risk measure exists, though. Depending on the distribution of return, different measures can be more appropriate in different situations, capturing different characteristics of risk. Returns of diversified equity portfolios are often approximately symmetric over periods of institutional interest. Options, swaps, hedge funds, and private equity have return distributions that are unlikely to be symmetric. Also less symmetric are distributions of fixed-income and real estate index returns, and of diversified equity portfolios over long time horizons [MM08].
The measures presented here are expectations, meaning that optimizing them would lead to stochastic optimization problems in general. As a simplification, we consider only their sample average approximation by assuming that a set of linear return scenarios is given.
¹ Tail dependence between two random variables means that their correlation is greater for more extreme events. This is typical in financial markets, where extreme events are likely to happen together for multiple securities.

8.1 Deviation measures
This class of measures quantify the variability around the mean of the return distribution,
similar to variance. If we associate investment risk with the uncertainty of return, then
such measures could be ideal.
Let 𝑋, 𝑌 be random variables of investment returns, and let 𝐷 be a mapping. The
general properties that deviation measures should satisfy are [RUZ06]:
1. Positivity: 𝐷(𝑋) ≥ 0 with 𝐷(𝑋) = 0 only if 𝑋 = 𝑐 ∈ R,
2. Translation invariance: 𝐷(𝑋 + 𝑐) = 𝐷(𝑋), 𝑐 ∈ R,
3. Subadditivity: 𝐷(𝑋 + 𝑌 ) ≤ 𝐷(𝑋) + 𝐷(𝑌 ),
4. Positive homogeneity: D(cX) = cD(X), c ≥ 0.

Note that positive homogeneity and subadditivity imply the convexity of D.

8.1.1 Robust statistics

A group of deviation measures are robust statistics. Maximum likelihood estimators are highly sensitive to deviations from the assumed distribution. Thus if the return is not normally distributed, robust portfolio estimators can be an alternative. These are less efficient than MLE for normal distribution, but their performance degrades less quickly under departures from normality. One suitable class of estimators to express portfolio risk are M-estimators [VD09]:

D_δ(R_x) = min_q E[δ(R_x - q)],

where δ is a convex loss function with a unique minimum at zero. It has the sample average approximation min_q (1/T) Σ_{k=1}^T δ(x^T r_k - q).

Variance

Choosing δ_var(y) = (1/2)y^2 we get D_δvar, a different expression for the portfolio variance. The optimal q in this case is the portfolio mean return μ_x. This gives us a way to perform standard mean–variance optimization directly using scenario data, without the need for a sample covariance matrix:

minimize    (1/2T) Σ_{k=1}^T (x^T r_k - q)^2         (8.1)
subject to  x ∈ ℱ.

By defining new variables t_k^+ = max(x^T r_k - q, 0) and t_k^- = max(-x^T r_k + q, 0), then using Sec. 10.1.1 we arrive at a QO model:

minimize    (1/2T)‖t^+‖^2 + (1/2T)‖t^-‖^2
subject to  x^T R - q = t^+ - t^-,                   (8.2)
            t^+, t^- ≥ 0,
            x ∈ ℱ,

where R = [r_1, ..., r_T] is the scenario matrix, in which each column is a scenario. The norms can be further modeled by the rotated quadratic cone; see in Sec. 10.1.1. While this formulation does not need covariance estimation, it can still suffer from computational inefficiency because the number of variables and the number of constraints depend on the sample size T.

α-shortfall

Choosing δ_sf,α(y) = αy - min(y, 0), or equivalently δ_sf,α(y) = α·max(y, 0) + (1 - α)·max(-y, 0) for α ∈ (0, 1), will give the α-shortfall D_δsf,α studied in [Lau01]. The optimal q will be the α-quantile of the portfolio return. α-shortfall is a deviation measure with favorable robustness properties when the portfolio return has an asymmetric distribution or a fat tailed symmetric distribution, or exhibits multivariate tail-dependence. The α parameter adjusts the level of asymmetry in the δ function, allowing us to penalize upside and downside returns differently.

Portfolio optimization using sample α-shortfall can be formulated as

minimize    (α/T) Σ_{k=1}^T (x^T r_k - q) - (1/T) Σ_{k=1}^T min(x^T r_k - q, 0)     (8.3)
subject to  x ∈ ℱ.

By defining new variables t_k^- = max(-x^T r_k + q, 0) and modeling according to Sec. 10.1.1, we arrive at an LO model:

minimize    α(x^T μ - q) + (1/T) Σ_{k=1}^T t_k^-
subject to  t_k^- ≥ -x^T r_k + q,  k = 1, ..., T,                                   (8.4)
            t^- ≥ 0,
            x ∈ ℱ.

A drawback of this model is that the number of constraints depends on the sample size T, resulting in computational inefficiency for a large number of samples.

For elliptically symmetric return distributions the α-shortfall is equivalent to the variance in the sense that the corresponding sample portfolio estimators will be estimating the same portfolio. In fact, for normal distribution the α-shortfall is proportional to the square root of the portfolio variance: D_δsf,α = φ(q_α)√(2 D_δvar), where φ(q_α) is the value of the standard normal density function at its α-quantile.

However, when return is symmetric but not normally distributed, the sample portfolio estimators for α-shortfall can have less estimation error than sample portfolio estimators for variance.

MAD

A special case of α-shortfall is α = 0.5. The function δ_MAD(y) = (1/2)|y| gives us D_δMAD, which is called the mean absolute deviation (MAD) measure or the L1-risk. In this case the optimal q will be the median of the portfolio return, and the sample MAD portfolio optimization problem can be formulated as

minimize    (1/2T) Σ_{k=1}^T |x^T r_k - q|           (8.5)
subject to  x ∈ ℱ.

After modeling the absolute value based on Sec. 10.1.1 we arrive at the following LO:

minimize    (1/2T) Σ_{k=1}^T t_k
subject to  x^T R - q ≥ -t,
            x^T R - q ≤ t,                           (8.6)
            x ∈ ℱ,

where R is the return data matrix with one observation r_k, k = 1, ..., T in each column. Note that the number of constraints in this LO problem again depends on the sample size T.

For normally distributed returns, the MAD is proportional to the square root of the variance: D_δMAD = √((2/π) D_δvar).

The L1-risk can also be applied without minimizing over q. We can just let q be the sample portfolio mean instead, i. e., q = x^T μ [KY91].

Risk measure from the Huber function

Another robust portfolio estimator can be obtained for the case of symmetric return distributions using the Huber function

δ_H(y) = { y^2,            |y| ≤ c,
         { 2c|y| - c^2,    |y| > c,

yielding the risk measure D_δH. A different form of the Huber function is δ_H(y) = min_u u^2 + 2c|y - u|, which leads to a QO formulation:

minimize    (1/T) Σ_{k=1}^T u_k^2 + (1/T) Σ_{k=1}^T 2c|x^T r_k - q - u_k|      (8.7)
subject to  x ∈ ℱ.

Modeling the absolute value based on Sec. 10.1.1 we get

minimize    (1/T) Σ_{k=1}^T u_k^2 + (1/T) Σ_{k=1}^T 2c t_k
subject to  x^T R - q - u ≥ -t,
            x^T R - q - u ≤ t,                                                  (8.8)
            x ∈ ℱ.

Note that the size of this problem depends on the number of samples T.

8.1.2 Downside deviation measures

In a financial context many distributions are skewed. Investors might only be concerned with negative deviations (losses) relative to the expected portfolio return μ_x, or in general with falling short of some benchmark return r_bm. In these cases downside deviation measures are more appropriate.

A class of downside deviation measures are the lower partial moments of order n:

LPM_n(r_bm) = E(max(r_bm - R_x, 0)^n),

where r_bm is some given target return. The discrete version of this measure is Σ_{k=1}^T p_k max(r_bm - x^T r_k, 0)^n, where r_k is a return scenario occurring with probability p_k.

We have the following special cases:

• LPM0 is the probability of loss relative to the target return 𝑟bm . LPM0 is an
incomplete measure of risk, because it does not provide any indication of how
severe the shortfall can be, should it occur. Therefore it is best used as a constraint
while optimizing for a different risk measure. This way LPM0 can still provide
information about the risk tolerance of the investor.

• LPM1 is the expected loss, also called target shortfall.

• LPM2 is called target semi-variance.
While lower partial moments only consider outcomes below the target r_bm, the optimization still uses the entire distribution of R_x. The right tail of the distribution (representing the outcomes above the target) is captured in the mean μ_x of the distribution.

The LPM_n optimization problem can be formulated as [Lau01]

minimize    Σ_{k=1}^T p_k max(r_bm - x^T r_k, 0)^n           (8.9)
subject to  x ∈ ℱ.

If we define the new variables t_k^- = max(r_bm - x^T r_k, 0), then for n = 1 we arrive at an LO and for n = 2 at a QO problem:

minimize    Σ_{k=1}^T p_k (t_k^-)^n
subject to  t^- ≥ r_bm - x^T R,                              (8.10)
            t^- ≥ 0,
            x ∈ ℱ,

where t^- is a T dimensional vector. Thus there will be N + T variables, and the number of constraints will also depend on the sample size T.
Contrary to the variance, LPM_n measures are consistent with more general classes of investor utility functions, and assume less about the return distribution. Thus they better reflect investor preferences and are valid under a broader range of conditions [Har91]. However, LPMs are only useful for skewed distributions. If the return distribution is symmetric, LPM_1- and LPM_2-based portfolio estimates will be equivalent to MAD- and variance-based ones. In particular, the MAD for a random variable with density f(x) will be the same as the LPM_1 of a random variable with density f(x) + f(-x). Also, the mean-LPM optimization model is very sensitive to changes in the target value.
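For concreteness, the scenario versions of the first few lower partial moments are straightforward to compute directly; the normal scenarios and the target below are made up:

import numpy as np

rng = np.random.default_rng(0)
port_ret = rng.normal(0.05, 0.15, size=100000)   # portfolio return scenarios
r_bm = 0.02                                      # target return
shortfall = np.maximum(r_bm - port_ret, 0.0)

lpm0 = np.mean(shortfall > 0)     # probability of loss relative to the target
lpm1 = shortfall.mean()           # expected loss (target shortfall)
lpm2 = (shortfall ** 2).mean()    # target semi-variance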

8.2 Tail risk measures

Tail risk measures try to capture the risk of extreme events. The measures described below are commonly defined for random variables treating loss as a positive number, so to apply them on security returns R we have to flip the sign and define the loss as L = -R.

Let X, Y be random variables of investment losses, and let τ be a mapping. The formal properties that a reasonable tail risk measure should satisfy are the following [FS02]:

1. Monotonicity: If X ≤ Y, then τ(X) ≤ τ(Y),
2. Translation invariance: τ(X + m) = τ(X) + m, m ∈ R,
3. Subadditivity: τ(X + Y) ≤ τ(X) + τ(Y),
4. Positive homogeneity: τ(cX) = cτ(X), c ≥ 0.

If τ satisfies these properties then it is called a coherent risk measure. If we relax conditions 3 and 4 to the property of convexity, then we get the more general class of convex risk measures. Not all risk measures used in practice satisfy these properties; violation of conditions 1 and/or 2 is typical. However, in the context of portfolio selection these are less critical, the property of convexity being the most important to achieve risk diversification [Lau01].

8.2.1 Value-at-Risk

Denote the α-quantile of a random variable L by q_α(L) = inf{x | P(L ≤ x) ≥ α}. If L is the loss over a given time horizon, then the value-at-risk (VaR) of L with confidence level α (or risk level 1 - α) is defined to be

VaR_α(L) = q_α(L).

This is the amount of loss (a positive number) over the given time horizon that will not be exceeded with probability α. However, VaR does not give information about the magnitude of loss in case of the 1 - α probability event when losses are greater than q_α(L). Also, it is not a coherent risk measure (it does not respect the convexity property), meaning it can discourage diversification.² In other words, portfolio VaR may not be lower than the sum of the VaRs of the individual securities.

8.2.2 Conditional Value-at-Risk

A modification of VaR that is a coherent risk measure is the conditional value-at-risk (CVaR). CVaR is the average of the 1 - α fraction of worst case losses, i. e., the losses equal to or exceeding VaR_α(L). Its most general definition (applicable also for loss distributions with possible discontinuities) can be found in [RU02]. Let λ_α = (1/(1-α)) (P(L ≤ q_α(L)) - α). Then the CVaR of L with confidence level α will be

CVaR_α(L) = λ_α q_α(L) + (1 - λ_α) E(L | L > q_α(L)),

which is a linear combination of the VaR and the quantity called mean shortfall. The latter is also not coherent on its own.

If the distribution function P(L ≤ q) is continuous at q_α(L) then λ_α = 0. If there is a discontinuity and we have to account for a probability mass concentrated at q_α(L), then λ_α is nonzero in general. This is often the case in practice, when the loss distribution is discrete, for example for scenario based approximations.

CVaR for discrete distribution

Suppose that the loss distribution is described by points q_1 < ... < q_T with nonzero probabilities p_1, ..., p_T, and i_α ∈ {1, ..., T} is the index such that Σ_{i=1}^{i_α - 1} p_i < α ≤ Σ_{i=1}^{i_α} p_i. Then VaR_α(L) = q_{i_α} and

CVaR_α(L) = λ_α q_{i_α} + (1 - λ_α) (1 / Σ_{i>i_α} p_i) Σ_{i>i_α} p_i q_i
          = (1/(1-α)) ( (Σ_{i=1}^{i_α} p_i - α) q_{i_α} + Σ_{i>i_α} p_i q_i ),      (8.11)

where λ_α = (1/(1-α)) (Σ_{i=1}^{i_α} p_i - α). As a special case, if we assume that p_i = 1/T, i. e., we have a sample average approximation, then the above formula simplifies with i_α = ⌈αT⌉.

It can be seen that VaR_α(L) ≤ CVaR_α(L) always holds. CVaR is also consistent with second-order stochastic dominance (SSD), i. e., the CVaR efficient portfolios are the ones actually preferred by some wealth-seeking risk-averse investors.
² VaR only becomes coherent when return is normally distributed, but in this case it is proportional to the standard deviation.

Portfolio optimization with CVaR

If we substitute portfolio loss scenarios into formula (8.11), we can see that the quantile (-R^T x)_{i_α} will depend on x. It follows that the ordering of the scenarios and the index i_α will also depend on x, making it difficult to optimize. However, note that formula (8.11) is actually a linear combination of the largest elements of the vector q. We can thus apply Sec. 10.1.1 to get the dual form of CVaR_α(L), which is an LO problem:

minimize    t + (1/(1-α)) p^T u
subject to  u + t ≥ q,                     (8.12)
            u ≥ 0.

Note that problem (8.12) is equivalent to

min_t  t + (1/(1-α)) Σ_i p_i max(0, q_i - t).      (8.13)

The convex function (8.13) is exactly the one found in [RU00], where it is proven to be valid also for continuous probability distributions.

Now we can substitute the portfolio return into q, and optimize over x to find the portfolio that minimizes CVaR:

minimize    t + (1/(1-α)) p^T u
subject to  u + t ≥ -R^T x,                (8.14)
            u ≥ 0,
            x ∈ ℱ.

Because CVaR is represented as a convex function in formula (8.13), we can also formulate an LO to maximize expected return, while limiting risk in terms of CVaR:

maximize    μ^T x
subject to  t + (1/(1-α)) p^T u ≤ γ,
            u + t ≥ -R^T x,                (8.15)
            u ≥ 0,
            x ∈ ℱ.

The drawback of optimizing CVaR using problems (8.14) or (8.15) is that both the number of variables and the number of constraints depend on the number of scenarios T. This can make the LO model computationally expensive for a very large number of samples. For example, if the distribution of return is not known, we might need to obtain or simulate a substantial amount of samples, depending on the confidence level α.

8.2.3 Entropic Value-at-Risk

A more general risk measure in this class is the entropic value-at-risk (EVaR). EVaR is also a coherent risk measure, with additional favorable monotonicity properties; see [AJ12]. It is defined as the tightest upper bound on VaR obtained from the Chernoff inequality:

EVaR_α(L) = inf_{s>0} (1/s) log( M_L(s) / (1-α) ),          (8.16)

where M_L(s) = E(e^{sL}) is the moment generating function of L. The EVaR incorporates losses both less and greater than the VaR simultaneously, thus it always holds that CVaR_α(L) ≤ EVaR_α(L).

EVaR for discrete distribution

Based on the definition (8.16), the discrete version of the EVaR will be

EVaR_α(L) = inf_{s>0} (1/s) log( (Σ_i p_i e^{s q_i}) / (1-α) ).      (8.17)

We can make formula (8.17) convex by substituting a new variable t = 1/s:

EVaR_α(L) = inf_{t>0} t log( Σ_i p_i e^{q_i/t} ) - t log(1-α).       (8.18)

We can transform formula (8.18) into a conic optimization problem by replacing the first term of the objective with a new variable z, bounded by the constraint z ≥ t log(Σ_i p_i e^{q_i/t}). Then we apply the rule Sec. 10.1.1:

minimize    z - t log(1-α)
subject to  Σ_{i=1}^T p_i u_i ≤ t,                           (8.19)
            (u_i, t, q_i - z) ∈ K_exp,  i = 1, ..., T,
            t ≥ 0.

Portfolio optimization with EVaR

Now we can substitute the portfolio return into q, and optimize over x to find the portfolio that minimizes EVaR:

minimize    z - t log(1-α)
subject to  Σ_{i=1}^T p_i u_i ≤ t,
            (u_i, t, -r_i^T x - z) ∈ K_exp,  i = 1, ..., T,     (8.20)
            t ≥ 0,
            μ^T x ≥ r_min,
            x ∈ ℱ.

Because EVaR is represented as a convex function (8.18), we can also formulate a conic problem to maximize expected return, while limiting risk in terms of EVaR:

maximize    μ^T x
subject to  z - t log(1-α) ≤ γ,
            Σ_{i=1}^T p_i u_i ≤ t,
            (u_i, t, -r_i^T x - z) ∈ K_exp,  i = 1, ..., T,     (8.21)
            t ≥ 0,
            x ∈ ℱ.

A disadvantage of the EVaR conic model is that it still depends on T in the number of exponential cone constraints.

Note that if we assume the return distribution to be a Gaussian mixture, we can find a different approach to computing EVaR; see Sec. 8.3.2.

8.2.4 Relationship between risk measures

Suppose that τ(X) is a coherent tail risk measure that additionally satisfies τ(X) > -E(X). Suppose also that D(X) is a deviation measure that additionally satisfies D(X) ≤ E(X) for all X ≥ 0. Then these subclasses of risk measures can be related through the following identities:

D(X) = τ(X - E(X)),
τ(X) = D(X) - E(X).                        (8.22)

For example, the α-shortfall and CVaR are related this way. Details can be found in [RUZ06].

8.2.5 Practical considerations

Many risk measures in this chapter can be computed in practice through scenario based approximations. The relevant optimization problems can then be conveniently formulated as LO or QO problems. The advantage of these models is that they do not need a covariance matrix estimate. The simplification of the problem structure, however, comes at the cost of increasing the problem dimension by introducing a number of new variables and constraints proportional to the number of scenarios T.

If T is large, this can lead to computational burden not only because of the increased number of variables and constraints, but also because the scenario matrix R, which also depends on T, becomes very non-sparse, making the solution process less efficient.

If T is too small, specifically T < N, then there will be fewer scenarios than parameters to estimate. This makes the portfolio optimization problem very imprecise, and prone to overfitting the data. Also, depending on other constraints, the solution of LO problems can have computational difficulties. To mitigate this, we can use regularization methods, such as L1 (lasso) or L2 (ridge) penalty terms in the objective:

minimize    ℛ_x + λ‖x - x_pri‖_p^p
subject to  μ^T x ≥ R,                     (8.23)
            1^T x = 1,

where ℛ_x is the portfolio risk, x_pri is a prior portfolio, and λ is the regularization strength parameter. The penalty term will ensure that extreme deviations from the prior are unlikely. Thus λ basically sets the investor's confidence in the portfolio x_pri. Using p = 1, 2 we arrive at LO and QO problems respectively. See details in [Lau01].

8.3 Expected utility maximization

Apart from selecting different risk measures, we can also approach portfolio optimization through the use of utility functions. We specify a concave and increasing utility function that measures the investor's preference for each specific outcome. Then the objective is to maximize the expected utility under the return distribution.

Expected utility maximization can take into account any type of return distribution, while MVO works only with the first two moments, therefore it is accurate only for normally distributed return. MVO is also equivalent to expected utility maximization if the utility function is quadratic, because such a utility models the investor's indifference to higher moments of return. The only advantage of MVO in this comparison is that it works without having to discretize the return distribution and work with scenarios.
8.3.1 Optimal portfolio using Gaussian mixture return

In [LB22] an expected utility maximization approach is taken, assuming that the return distribution is a Gaussian mixture (GM). The benefits of a GM distribution are that it can approximate any continuous distribution, including skewed and fat-tailed ones. Also, its components can be interpreted as return distributions given a specific market regime. Moreover, the expected utility maximization using this return model can be formulated as a convex optimization problem without needing return scenarios, making this approach as efficient and scalable as MVO.

We denote security returns having Gaussian mixture (GM) distribution with K components as

R ∼ 𝒢ℳ({μ_i, Σ_i, π_i}_{i=1}^K),                         (8.24)

where μ_i is the mean, Σ_i is the covariance matrix, and π_i is the probability of component i. As special cases, for K = 1 we get the normal distribution 𝒩(μ_1, Σ_1), and for Σ_i = 0, i = 1, ..., K we get a scenario distribution with K return scenarios {μ_1, ..., μ_K}.

Using definition (8.24), the distribution of the portfolio return R_x will also be GM:

R_x ∼ 𝒢ℳ({μ_{x,i}, σ_{x,i}^2, π_i}_{i=1}^K),             (8.25)

where μ_{x,i} = μ_i^T x and σ_{x,i}^2 = x^T Σ_i x.

To select the optimal portfolio we use the exponential utility function U_δ(x) = 1 - e^{-δx}, where δ > 0 is the risk aversion parameter. This choice allows us to write the expected utility of the portfolio return R_x using the moment generating function:

E(1 - e^{-δ R_x}) = 1 - M_{R_x}(-δ).                      (8.26)

Thus maximizing the function (8.26) is the same as minimizing the moment generating function of the portfolio return, or equivalently, we can minimize its logarithm, the cumulant generating function:

minimize    K_{R_x}(-δ)                                    (8.27)
subject to  x ∈ ℱ.

If R_x is assumed to be a GM random variable, then its cumulant generating function will be the following:

K_{R_x}(-δ) = log( Σ_{i=1}^K π_i exp( -δ μ_i^T x + (δ^2/2) x^T Σ_i x ) ).      (8.28)

The function (8.28) is convex in x because it is a composition of the convex and increasing log-sum-exp function and a convex quadratic function. Note that for K = 1, we get back the same quadratic utility objective as in version (2.3) of the MVO problem.

Assuming we have the GM distribution parameter estimates μ_i, Σ_i, i = 1, ..., K, we can apply Sec. 10.1.1 and Sec. 10.1.1 to arrive at the conic model of the utility maximization problem (8.27):

minimize    z
subject to  Σ_{i=1}^K π_i u_i ≤ 1,
            (u_i, 1, -δ μ_i^T x + δ^2 q_i - z) ∈ K_exp,  i = 1, ..., K,        (8.29)
            (q_i, 1, G_i^T x) ∈ 𝒬_r^{N+2},  i = 1, ..., K,
            x ∈ ℱ,

where q_i is an upper bound for the quadratic term (1/2) x^T Σ_i x, and G_i is such that Σ_i = G_i G_i^T.
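A minimal Fusion sketch of problem (8.29), under our own naming assumptions: mus[i] and Gs[i] hold the component means and factors with Σ_i = Gs[i] Gs[i]^T, pis holds the component probabilities, ℱ is reduced to the long-only budget constraint, and the usual mosek.fusion imports of the other examples are assumed:

def GMUtilityPortfolio(N, K, mus, Gs, pis, delta):
    with Model("GM utility") as M:
        x = M.variable("x", N, Domain.greaterThan(0.0))
        z = M.variable("z", 1, Domain.unbounded())
        u = M.variable("u", K, Domain.unbounded())
        q = M.variable("q", K, Domain.unbounded())

        # x in F: fully invested, long-only portfolio
        M.constraint('budget', Expr.sum(x), Domain.equalsTo(1.0))

        # sum_i pi_i * u_i <= 1
        M.constraint('mix', Expr.dot(pis, u), Domain.lessThan(1.0))

        for i in range(K):
            # u_i >= exp(-delta * mu_i'x + delta^2 * q_i - z)
            expo = Expr.add(Expr.mul(-delta, Expr.dot(mus[i], x)),
                            Expr.sub(Expr.mul(delta**2, q.index(i)), z))
            M.constraint(Expr.vstack(u.index(i), 1.0, expo),
                         Domain.inPExpCone())
            # q_i >= 1/2 * x' Sigma_i x
            M.constraint(Expr.vstack(q.index(i), 1.0, Expr.mul(Gs[i].T, x)),
                         Domain.inRotatedQCone())

        M.objective('obj', ObjectiveSense.Minimize, z)
        M.solve()
        return x.level()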

8.3.2 EVaR using Gaussian mixture return

We introduced the entropic value-at-risk (EVaR) in Sec. 8.2.3. EVaR can also be expressed using the cumulant generating function K_L(s) = log(M_L(s)) of the loss L:

EVaR_α(L) = inf_{s>0} (1/s) K_L(s) - (1/s) log(1-α).        (8.30)

After substituting the return R = -L instead of the loss, we see that K_{-R}(s) = K_R(-s). Assuming now that R = R_x is the portfolio return, we get the following optimization problem:

minimize    (1/s) K_{R_x}(-s) - (1/s) log(1-α)              (8.31)
subject to  x ∈ ℱ.

We can find a connection between the EVaR computation (8.31) and the maximization of expected exponential utility (8.27). Suppose that the pair (x*, s*) is optimal for problem (8.31). Then x* is also optimal for problem (8.27), with risk aversion parameter δ = s*.

By assuming a GM distribution for security return, we can optimize problem (8.31) without needing a scenario distribution. First, to formulate it as a convex optimization problem, define the new variable t = 1/s:

EVaR_α(L) = inf_{t>0} t K_{R_x}(-1/t) - t log(1-α).         (8.32)

We can observe that the first term in formula (8.32) is the perspective of K_{R_x}(-1). Therefore by substituting the cumulant generating function (8.28) of the GM portfolio return, we get a convex function in the portfolio x:

EVaR_α(L) = inf_{t>0} t log( Σ_{i=1}^K π_i exp( -(1/t) μ_i^T x + (1/(2t^2)) x^T Σ_i x ) ) - t log(1-α).      (8.33)

Assuming we have the GM distribution parameter estimates μ_i, Σ_i, i = 1, ..., K, we can apply Sec. 10.1.1 and Sec. 10.1.1 to arrive at the conic model of the EVaR minimization problem (8.31):

minimize    z - t log(1-α)
subject to  Σ_{i=1}^K π_i u_i ≤ t,
            (u_i, t, -μ_i^T x + q_i - z) ∈ K_exp,  i = 1, ..., K,        (8.34)
            (q_i, t, G_i^T x) ∈ 𝒬_r^{N+2},  i = 1, ..., K,
            t ≥ 0,
            x ∈ ℱ,

where q_i is such that q_i t bounds the quadratic term (1/2) x^T Σ_i x from above, and G_i is such that Σ_i = G_i G_i^T.

The huge benefit of this EVaR formulation is that its size does not depend on the number of scenarios, because it is derived without using a scenario distribution. It depends only on the number of GM components K.

8.4 Example

This example shows how we can compute the CVaR efficient frontier using the dual form of CVaR in MOSEK Fusion.

8.4.1 Scenario generation

The input data is again obtained the same way as detailed in Sec. 3.4.2, but we do only the steps until Sec. 3.4.3. This way we get the expected return estimate μ_log and the covariance matrix estimate Σ_log of yearly logarithmic returns. These returns have an approximately normal distribution, so we can easily generate T return scenarios from them, using the numpy default random number generator. Here we choose T = 99999. This number ensures that λ_α in formula (8.11) is nonzero, which is the most general case.
# Number of scenarios
T = 99999
# Generate logarithmic return scenarios assuming normal distribution
scenarios_log = np.random.default_rng().multivariate_normal(m_log, S_log, T)

Next, we convert the received logarithmic return scenarios to linear return scenarios
using the inverse of formula (3.2).
# Convert logarithmic return scenarios to linear return scenarios
R = np.exp(scenarios_log) - 1
R = R.T

We transpose the resulting matrix just to remain consistent with the notation in this chapter, namely that each column of R is a scenario. We also set the scenario probabilities to be 1/T:
# Scenario probabilities
p = np.ones(T) / T

8.4.2 The optimization problem

The optimization problem we solve here resembles problem (2.12), but we change the risk measure from the portfolio standard deviation to the portfolio CVaR:

maximize    μ^T x - δ · CVaR_α(-R^T x)
subject to  1^T x = 1,                     (8.35)
            x ≥ 0.

Applying the dual CVaR formula (8.13), we get:

maximize    μ^T x - δ ( t + (1/(1-α)) Σ_i p_i max(0, -r_i^T x - t) )
subject to  1^T x = 1,                     (8.36)
            x ≥ 0.

We know that formula (8.13) equals CVaR only if it is minimal in t. This will be satisfied in the optimal solution of problem (8.36), because the CVaR term is implicitly forced to become smaller, and t is independent of x.

Now we model the maximum function, and arrive at the following LO model of the mean-CVaR efficient frontier:

maximize    μ^T x - δt - (δ/(1-α)) Σ_i p_i u_i
subject to  u ≥ -R^T x - t,
            u ≥ 0,                         (8.37)
            1^T x = 1,
            x ≥ 0.

8.4.3 The Fusion model

Here we show the Fusion model of problem (8.37). We give the constraints modeling the maximum function in a separate function called maximum:

# max(x, y) <= t
def maximum(M, x, y, t):
    M.constraint(Expr.sub(t, x), Domain.greaterThan(0.0))
    M.constraint(Expr.sub(t, y), Domain.greaterThan(0.0))

def EfficientFrontier(N, T, m, R, p, alpha, deltas):

    with Model("Case study") as M:
        # Settings
        #M.setLogHandler(sys.stdout)

        # Variables
        # The variable x is the fraction of holdings relative to the
        # initial capital. It is constrained to take only positive values.
        x = M.variable("x", N, Domain.greaterThan(0.0))

        # Budget constraint
        M.constraint('budget', Expr.sum(x), Domain.equalsTo(1.0))

        # Auxiliary variables.
        t = M.variable("t", 1, Domain.unbounded())
        u = M.variable("u", T, Domain.unbounded())

        # Constraint modeling maximum
        maximum(M, Expr.sub(Expr.mul(-R.T, x), Expr.repeat(t, T, 0)),
                Expr.constTerm(T, 0.0), u)

        # Objective
        delta = M.parameter()
        cvar_term = Expr.add(t, Expr.mul(1/(1-alpha), Expr.dot(p, u)))
        M.objective('obj', ObjectiveSense.Maximize,
                    Expr.sub(Expr.dot(m, x), Expr.mul(delta, cvar_term)))

        # Create DataFrame to store the results.
        columns = (["delta", "obj", "return", "risk"]
                   + df_prices.columns.tolist())
        df_result = pd.DataFrame(columns=columns)
        for d in deltas:
            # Update parameter
            delta.setValue(d)

            # Solve optimization
            M.solve()

            # Save results
            portfolio_return = m @ x.level()
            portfolio_risk = t.level()[0] + 1/(1-alpha) * np.dot(p, u.level())
            row = pd.Series([d, M.primalObjValue(), portfolio_return,
                             portfolio_risk] + list(x.level()), index=columns)
            df_result = df_result.append(row, ignore_index=True)

        return df_result

Next, we compute the efficient frontier. We select the confidence level α = 0.95. The following code produces the optimization results:

alpha = 0.95

# Compute efficient frontier
deltas = np.logspace(start=-1, stop=2, num=20)[::-1]
df_result = EfficientFrontier(N, T, m, R, p, alpha, deltas)

On Fig. 8.1 we can see the risk-return plane, and on Fig. 8.2 the portfolio composition
for different levels of risk.

Fig. 8.1: The CVaR efficient frontier.

Fig. 8.2: Portfolio composition x with varying level of CVaR risk.

Chapter 9

Risk budgeting

Traditional MVO only considers the risk of the portfolio and ignores how the risk is distributed among individual securities. This might result in risk being concentrated into only a few securities, when these gain too much weight in the portfolio.

Risk budgeting techniques were developed to give portfolio managers more control over the amount of risk each security contributes to the total portfolio risk. They allocate the risk instead of the capital, thus diversifying risk more directly. Risk budgeting also makes the resulting portfolio less sensitive to estimation errors, because it excludes the use of expected return estimates [CK20], [Pal20].

9.1 Risk contribution

If the risk measure is a homogeneous function of degree one, then we can use the Euler decomposition to write the total portfolio risk ℛ_x as the sum of risk contributions rc_{x,i} of each security:

ℛ_x = Σ_{i=1}^N rc_{x,i} = Σ_{i=1}^N x_i mrc_{x,i} = Σ_{i=1}^N x_i ∂ℛ_x/∂x_i = x^T ∇ℛ_x,      (9.1)

where mrc_{x,i} = ∂ℛ_x/∂x_i is called the marginal risk contribution, because it measures the sensitivity of the portfolio standard deviation to changes in x_i.
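For example, with the standard deviation risk measure ℛ_x = √(x^T Σx) we have ∇ℛ_x = Σx/√(x^T Σx), and the decomposition (9.1) can be verified numerically on made-up data:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
S = A @ A.T                      # a made-up covariance matrix
x = np.ones(8) / 8

risk = np.sqrt(x @ S @ x)
mrc = S @ x / risk               # marginal risk contributions
rc = x * mrc                     # risk contributions
assert np.isclose(rc.sum(), risk)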

9.2 Risk budgeting portfolio

With the above definitions, the risk budgeting portfolio aims to allocate the risk contributions according to a predetermined risk profile b. In other words, it aims to find the portfolio x that solves the system

x ∘ ∇ℛ_x = b ℛ_x = b (x^T ∇ℛ_x),          (9.2)

where 1^T b = 1 and b ≥ 0. A special case of this is the risk parity portfolio, for which b_i = 1/N.

Using that ∇ℛ_cx = ∇ℛ_x for homogeneous ℛ_x of degree one and c ∈ R, we can see that if x_rb is a risk budgeting portfolio, then so is c x_rb. In some solution approaches it is useful to define a risk-normalized variable y = x/ℛ_x, and get

y ∘ ∇ℛ_y = b.                              (9.3)

Working with this form is typically easier. After solving these equations, we can recover the portfolio x by normalizing again.

9.3 Risk budgeting with variance

If we use the portfolio variance as risk measure, i. e., set ℛ_x = x^T Σx, then (9.2) will become

x ∘ Σx = b (x^T Σx),                       (9.4)

and (9.3) will become

y ∘ Σy = b.                                (9.5)

Solving this system is inherently a non-convex problem because it has quadratic equality constraints. However, we can still formulate convex optimization problems to find the solution.

In general, however, we have to be aware that the conditions defining the risk budgeting portfolio are very restrictive. Therefore we have very limited possibility to further constrain a risk budgeting problem without making the risk budgeting portfolio infeasible.

9.3.1 Convex formulation using quadratic cone

Suppose we allocate the risk budget b_m for each security i in a group of securities 𝒢_m, and we have M such groups. As special cases, we have M = N when each security has a unique risk budget, thus each 𝒢_m contains only one security. In contrast, we have M = 1 when all risk budgets b_i = 1/N are the same, leading to the risk parity portfolio.

Let us now focus on group 𝒢_m. We can define new variables γ_{u,m} and γ_{l,m} to bound the risk contributions in this group from above and from below. First we express the risk contribution of the group as the fraction b_m of the total risk x^T Σx. We can bound this from above to get the constraint

b_m x^T Σx ≤ γ_{u,m}^2,

and model it using the quadratic cone.

Next, we bound the risk contributions from below as

x_i (Σx)_i ≥ γ_{l,m}^2.

Assuming x_i ≥ 0 and (Σx)_i ≥ 0, we can model this using the power cone as (x_i, (Σx)_i, γ_{l,m}) ∈ 𝒫_3^{1/2,1/2}, or equivalently using the rotated quadratic cone as (x_i/√2, (Σx)_i/√2, γ_{l,m}) ∈ 𝒬_r^3.

Finally, minimizing the difference between the upper and lower bounds, we get the risk budgeting portfolio as the optimal solution. The full optimization problem will look like

minimize    Σ_{m=1}^M γ_{u,m} - γ_{l,m}
subject to  1^T x = 1,
            (γ_{u,m}/√b_m, G^T x) ∈ 𝒬^{k+1},  m = 1, ..., M,
            (x_i, (Σx)_i, γ_{l,m}) ∈ 𝒫_3^{1/2,1/2},  i ∈ 𝒢_m, m = 1, ..., M,      (9.6)
            x, Σx ≥ 0,
            γ_{u,m} ≥ γ_{l,m},  m = 1, ..., M,

where Σ = GG^T for G ∈ R^{N×k}.

Note that we have to be careful adding further constraints to problem (9.6), because these might make the risk budgeting portfolio infeasible.
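For the case M = N (one security per group), the lower bounds of (9.6) can be stated row-wise with a single power cone constraint in Fusion; a sketch, assuming x and gl are N-dimensional variables of an existing model M and Sigma is a NumPy covariance matrix:

# Each row (x_i, (Sigma x)_i, gl_i) lies in the power cone P3^(1/2,1/2),
# i.e. x_i * (Sigma x)_i >= gl_i^2, together with Sigma x >= 0.
Sx = Expr.mul(Sigma, x)
M.constraint('Sx-pos', Sx, Domain.greaterThan(0.0))
M.constraint('rc-lower', Expr.hstack(x, Sx, gl), Domain.inPPowerCone(0.5))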

9.3.2 Convex formulation using exponential cone

There also exists a different convex formulation of the risk budgeting problem, which works not only in the positive orthant, but in any (prespecified) orthant of the portfolio space. This allows us to work with long-short portfolios as well.

Observe that the risk budgeting equations (9.4) are the first-order optimality conditions of the optimization problem

minimize    (1/2) x^T Σx - c b^T log(z ∘ x)       (9.7)
subject to  z ∘ x ≥ 0.

We can use the parameter c to scale the sum of the vector b, leading to a different magnitude of the optimal solution. Thus we can directly find a solution that satisfies for example Σ_i x_i = 1, or in the long-short case Σ_i |x_i| = 1.

Problem (9.7) can be modeled using the exponential cone in the following way:

minimize    s - c b^T t
subject to  (s, 1, G^T x) ∈ 𝒬_r^{k+2},
            (z_i x_i, 1, t_i) ∈ K_exp,  i = 1, ..., N,       (9.8)
            z ∘ x ≥ 0,

where Σ = GG^T for G ∈ R^{N×k}.

While this convex approach works with any sign configuration z ∈ {+1, -1}^N of x, we have to specify z as a parameter, meaning we can still only compute the optimal solution for a given orthant. The reason is that by design, there are solutions to (9.4) in each orthant. There are 2^N (normalized) solutions, one for each possible z ∘ x > 0 [CK20], i. e., one in each orthant.¹ So there is a local optimal solution in each orthant, meaning that the problem is non-convex on the full space.

Also note that in this formulation we cannot add further constraints, because these could alter the optimality conditions (9.4), making the solution invalid as a risk budgeting portfolio.

¹ Of course the solutions in opposite orthants differ only by a global sign, so these can be considered the same solution.

9.3.3 Mixed integer formulation

In each orthant, the solution to (9.7) can have a different total risk. Therefore we can try to find a risk budgeting portfolio that has a low total risk. In order to do this, we can extend problem (9.7) to be mixed integer. This will allow us to optimize over the full space, including all long-short portfolios, without needing to prespecify the signs. After defining separate variables based on Sec. 10.2.1 for the long and the short part of the portfolio, we get:

minimize    (1/2) x^T Σx - c b^T log(x^+ + x^-)
subject to  x^+, x^- ≥ 0,
            x = x^+ - x^-,
            x^+ ≤ M z^+,                           (9.9)
            x^- ≤ M z^-,
            z^+ + z^- ≤ 1,
            z^+, z^- ∈ {0, 1}^N.

What makes sure that problem (9.9) will find a low risk solution? The first term of the objective function is the sum of risk budgets, thus in any optimal solution we must have x^T Σx = c, because the risk budgeting conditions are satisfied. It follows that the second term will decide the optimal objective value. This logarithmic term will become the lowest if the monomial |x|^{cb} = (‖x‖_1 |x̂|^b)^c is largest, where x̂ denotes a unit vector. Depending on b, this implies an optimal x with large 1-norm. It follows that after normalization, this x will yield a low value for the total risk x^T Σx. Note, however, that it is not guaranteed to get the lowest risk, because |x̂|^b can also vary.

A further tradeoff with the mixed integer approach is that we have no performance guarantee; finding the optimal solution can take a lot of time. Most likely we will have to settle for suboptimal portfolios in terms of risk.

9.4 Example
In this example we show how can we find a risk parity portfolio by solving a convex
optimization problem in MOSEK Fusion.
The input data is again obtained the same way as detailed in Sec. 3.4.2. Here we
assume that the covariance matrix estimate Σ of yearly returns is available.
The optimization problem we solve here is (9.8), which we repeat here:

    minimize      s − c b^T t
    subject to    (s, 1, G^T x) ∈ 𝒬_r^{k+2},
                  (z_i x_i, 1, t_i) ∈ K_exp,  i = 1, …, N,        (9.10)
                  z ∘ x ≥ 0.
We search for a solution in the positive orthant, so we set z = 1. We choose the risk
budget vector to be b = 1/𝑁 , so that all securities contribute the same amount of risk.
The Fusion model of (9.10):

import sys
import numpy as np
import pandas as pd
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

def RiskBudgeting(N, G, b, z, a):

    with Model('Risk budgeting') as M:
        # Settings
        M.setLogHandler(sys.stdout)

        # Portfolio weights
        x = M.variable("x", N, Domain.unbounded())

        # Orthant specifier constraint
        M.constraint("orthant", Expr.mulElm(z, x), Domain.greaterThan(0.0))

        # Auxiliary variables
        t = M.variable("t", N, Domain.unbounded())
        s = M.variable("s", 1, Domain.unbounded())

        # Objective function: 1/2 * x'Sx - a * b'log(z*x) becomes s - a * b't
        M.objective(ObjectiveSense.Minimize,
                    Expr.sub(s, Expr.mul(a, Expr.dot(b, t))))

        # Bound on the risk term
        M.constraint(Expr.vstack(s, 1, Expr.mul(G.T, x)), Domain.inRotatedQCone())

        # Bound on the log term: t <= log(z*x) becomes (z*x, 1, t) in K_exp
        M.constraint(Expr.hstack(Expr.mulElm(z, x), Expr.constTerm(N, 1.0), t),
                     Domain.inPExpCone())

        # Create DataFrame to store the results.
        columns = ["obj", "risk", "xsum", "bsum"] + df_prices.columns.tolist()
        df_result = pd.DataFrame(columns=columns)

        # Solve optimization
        M.solve()

        # Save results
        xv = x.level()

        # Check solution quality
        risk_budgets = xv * np.dot(G @ G.T, xv)

        # Renormalize to gross exposure = 1
        xv = xv / np.abs(xv).sum()

        # Compute portfolio metrics
        Gx = np.dot(G.T, xv)
        portfolio_risk = np.sqrt(np.dot(Gx, Gx))

        row = pd.Series([M.primalObjValue(), portfolio_risk, np.sum(z * xv),
                         np.sum(risk_budgets)] + list(xv), index=columns)
        df_result = df_result.append(row, ignore_index=True)

        row = pd.Series([None] * 4 + list(risk_budgets), index=columns)
        df_result = df_result.append(row, ignore_index=True)

        return df_result

The following code defines the parameters, including the matrix G such that Σ = GG^T.

# Number of securities
N = 8
# Risk budget
b = np.ones(N) / N
# Orthant selector
z = np.ones(N)
# Global setting for sum of b
c = 1
# Cholesky factor of the covariance matrix S
G = np.linalg.cholesky(S)

Finally, we produce the optimization results:

df_result = RiskBudgeting(N, G, b, z, c)

Fig. 9.1 shows the portfolio composition, and Fig. 9.2 the risk contributions of each security.

Fig. 9.1: The risk budgeting portfolio.

Fig. 9.2: The risk contributions of each security.

Chapter 10

Appendix

10.1 Conic optimization refresher


In this section we give a short summary of conic optimization. Read more about the topic from a mathematical and modeling point of view in our other publication, the MOSEK Modeling Cookbook [MOSEKApS21].
Conic optimization is a class of convex optimization problems, which contains and generalizes in a unified way many specific and well-known convex models. These models include linear optimization (LO), quadratic optimization (QO), quadratically constrained quadratic optimization (QCQO), second-order cone optimization (SOCO), and semidefinite optimization (SDO). The general form of a conic optimization problem is

    maximize      c^T x                                           (10.1)
    subject to    Ax + b ∈ 𝒦,

where 𝒦 is a product of the following basic types of cones:
• Linear cone:

      R^n,  R_+^n,  {0}.

  Linear cones model all traditional LO problems.

• Quadratic cone and rotated quadratic cone: The quadratic cone is the set

      𝒬^n = { x ∈ R^n | x_1 ≥ √(x_2^2 + ⋯ + x_n^2) }.

  The rotated quadratic cone is the set

      𝒬_r^n = { x ∈ R^n | 2 x_1 x_2 ≥ x_3^2 + ⋯ + x_n^2, x_1, x_2 ≥ 0 }.

  Modeling with these two cones covers the class of SOCO problems, which includes all traditional QO and QCQO problems as well. See Sec. 10.1.2 for more details.

• Primal power cone:

      𝒫_n^{α,1−α} = { x ∈ R^n | x_1^α x_2^{1−α} ≥ √(x_3^2 + ⋯ + x_n^2), x_1, x_2 ≥ 0 },

  or its dual cone.

• Primal exponential cone:

      K_exp = { x ∈ R^3 | x_1 ≥ x_2 exp(x_3 / x_2), x_1, x_2 ≥ 0 },

  or its dual cone.

• Semidefinite cone:

      𝒮_+^n = { X ∈ R^{n×n} | X is symmetric positive semidefinite }.

  Semidefinite cones model all traditional SDO problems.


Each of these cones allows formulating different types of convex constraints.

10.1.1 Selection of conic constraints


In the following we list the constraints appearing in a financial context in this book and show how we can convert them to conic form.

Maximum function
We can model the maximum constraint max(x_1, x_2, …, x_n) ≤ c using linear constraints by introducing an auxiliary variable t:

    t ≤ c,
    t ≥ x_1,
      ⋮                                                           (10.2)
    t ≥ x_n.

For instance, we could write a series of constraints max(x_i, 0) ≤ c_i, i = 1, …, n as

    t ≤ c,  t ≥ x,  t ≥ 0,                                        (10.3)

where t is an n-dimensional vector.
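
As a small illustration, the vectorized constraint (10.3) could be stated in Fusion along the following lines; the model name, the variable names and the data are ours:

import numpy as np
from mosek.fusion import Model, Domain, Expr

# Sketch of (10.3): max(x_i, 0) <= c_i for all i, for a given vector c.
c = np.array([0.1, 0.2, 0.3])
n = len(c)
M = Model("max-function")
x = M.variable("x", n, Domain.unbounded())
t = M.variable("t", n, Domain.unbounded())
M.constraint(Expr.sub(t, c), Domain.lessThan(0.0))     # t <= c
M.constraint(Expr.sub(t, x), Domain.greaterThan(0.0))  # t >= x
M.constraint(t, Domain.greaterThan(0.0))               # t >= 0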

Positive and negative part


A special case of modeling the maximum function is to model the positive part x^+ and negative part x^− of a variable x. We define these as x^+ = max(x, 0) and x^− = max(−x, 0).
We can model them explicitly based on Sec. 10.1.1 by relaxing the definitions to the inequalities x^+ ≥ max(x, 0) and x^− ≥ max(−x, 0), or we can model them implicitly by the following set of constraints:

    x = x^+ − x^−,
    x^+, x^− ≥ 0.                                                 (10.4)

Note, however, that in either case a freedom remains in the magnitude of x^+ and x^−. This is because in the explicit case we relaxed the equalities, and in the implicit case only the difference of the variables is constrained. In other words, it is possible for x^+ and x^− to be both positive, allowing optimal solutions where x^+ = max(x, 0) and x^− = max(−x, 0) do not hold. We could ensure that only either x^+ or x^− is positive, and never both, by stating the complementarity constraint x^+ x^− = 0 (or in the vector case ⟨x^+, x^−⟩ = 0), but unfortunately such an equality constraint is non-convex and cannot be modeled directly.
There are workarounds to ensure that the equalities x^+ = max(x, 0) and x^− = max(−x, 0) hold in the optimal solution. One approach is to penalize the magnitude of these two variables, so that if both are positive in any solution, the solver can always improve the objective by reducing them until either one becomes zero. Another possible workaround is to formulate a mixed-integer problem; see Sec. 10.2.1. A small sketch of the penalty approach follows.
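
In this self-contained sketch (the data vector and names are our illustrative choices), minimizing the sum x^+ + x^− subject to (10.4) recovers the exact positive and negative parts:

import numpy as np
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

data = np.array([1.5, -2.0, 0.3])
n = len(data)
with Model("posneg") as M:
    xp = M.variable("xp", n, Domain.greaterThan(0.0))  # x+
    xm = M.variable("xm", n, Domain.greaterThan(0.0))  # x-
    # x = x+ - x-, with x fixed to the given data for illustration
    M.constraint(Expr.sub(xp, xm), Domain.equalsTo(data))
    # Penalizing x+ + x- drives one variable of each pair to zero
    M.objective(ObjectiveSense.Minimize, Expr.sum(Expr.add(xp, xm)))
    M.solve()
    print(xp.level())  # approx. [1.5, 0.0, 0.3]
    print(xm.level())  # approx. [0.0, 2.0, 0.0]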

Absolute value
We can model the absolute value constraint |x| ≤ c using the maximum function by observing that |x| = max(x, −x) (see Sec. 10.1.1):

    −c ≤ x ≤ c.                                                   (10.5)

Another possibility is to model it using the quadratic cone:

    (c, x) ∈ 𝒬^2.                                                 (10.6)

Sum of largest elements


The sum of the m largest elements of a vector x is the optimal value of the LO problem

    maximize      x^T z
    subject to    1^T z = m,                                      (10.7)
                  0 ≤ z ≤ 1.

Here x cannot be a variable, because that would result in a nonlinear objective. Let us take the dual of problem (10.7):

    minimize      m t + 1^T u
    subject to    u + t ≥ x,                                      (10.8)
                  u ≥ 0.

Problem (10.8) is actually the same as min_t m t + ∑_i max(0, x_i − t). In this case x can be a variable, and thus it can also be optimized.
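
As an illustration, the following sketch computes the sum of the m largest entries of a given numeric vector via (10.7); when x is itself a variable, the dual form (10.8) would be used instead. The data and names are illustrative.

import numpy as np
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

x = np.array([0.3, -0.1, 0.7, 0.2, 0.5])
m = 2
with Model("sum-largest") as M:
    z = M.variable("z", len(x), Domain.inRange(0.0, 1.0))
    M.constraint(Expr.sum(z), Domain.equalsTo(m))
    M.objective(ObjectiveSense.Maximize, Expr.dot(x, z))
    M.solve()
    print(M.primalObjValue())  # 1.2 = 0.7 + 0.5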

Linear combination of largest elements


We can slightly extend the problem in Sec. 10.1.1 such that z can have an upper bound c ≥ 0, and there is a real number 0 ≤ b ≤ c_sum instead of the integer m, where c_sum = ∑_i c_i:

    maximize      x^T z
    subject to    1^T z = c_sum − b,                              (10.9)
                  0 ≤ z ≤ c.

This has the optimal objective c_{i_b}^frac x_{i_b} + ∑_{i>i_b} c_i x_i, where i_b is the index such that ∑_{i=1}^{i_b−1} c_i < b ≤ ∑_{i=1}^{i_b} c_i, and c_{i_b}^frac = ∑_{i=1}^{i_b} c_i − b < c_{i_b}.
If we take the dual of problem (10.9), we get:

    minimize      (c_sum − b) t + c^T u
    subject to    u + t ≥ x,                                      (10.10)
                  u ≥ 0,

which is the same as min_t (c_sum − b) t + ∑_i c_i max(0, x_i − t).

Manhattan norm
Let x ∈ R^n and observe that ‖x‖_1 = |x_1| + ⋯ + |x_n|. Then we can model the Manhattan norm or 1-norm constraint ‖x‖_1 ≤ c by modeling the absolute value for each coordinate:

    −z ≤ x ≤ z,   ∑_{i=1}^n z_i = c,                              (10.11)

where z is an auxiliary variable.

Euclidean norm

Let x ∈ R^n and observe that ‖x‖_2 = √(x_1^2 + ⋯ + x_n^2). Then we can model the Euclidean norm or 2-norm constraint ‖x‖_2 ≤ c using the quadratic cone:

    (c, x) ∈ 𝒬^{n+1}.                                             (10.12)

Squared Euclidean norm


Let x ∈ R^n and observe that ‖x‖_2^2 = x^T x = x_1^2 + ⋯ + x_n^2. Then we can model the squared Euclidean norm or sum-of-squares constraint ‖x‖_2^2 ≤ c using the rotated quadratic cone:

    (c, 1/2, x) ∈ 𝒬_r^{n+2}.                                      (10.13)

Quadratic form
Let x ∈ R^n and let Q ∈ 𝒮_+^n, i.e., a symmetric positive semidefinite matrix. Then we can model the quadratic form constraint (1/2) x^T Qx ≤ c using either the quadratic cone or the rotated quadratic cone. To see this, observe that there exists a matrix G ∈ R^{n×k} such that Q = GG^T. Of course this decomposition can be done in many ways, so the matrix G is not unique. The most interesting cases are when k ≪ n or when G is very sparse, because these make the optimization problem much easier to solve numerically (see Sec. 10.1.2).
The most common ways to compute the matrix G are the following:

• Cholesky decomposition: Q = CC^T, where C is a lower triangular matrix with nonnegative entries on the diagonal. From this decomposition we have G = C ∈ R^{n×n}.

• Eigenvalue decomposition: Q = VDV^T, where the diagonal matrix D contains the (nonnegative) eigenvalues of Q and the unitary matrix V contains the corresponding eigenvectors in its columns. From this decomposition we have G = VD^{1/2} ∈ R^{n×n}.

• Matrix square root: Q = Q^{1/2} Q^{1/2}, where Q^{1/2} is the symmetric positive semidefinite "square root" matrix of Q. From this decomposition we have G = Q^{1/2} ∈ R^{n×n}.

• Factor model: If Q is a covariance matrix of some data, then we can approximate the data series with the combination of k ≪ n common factors. Then we have the decomposition Q = βQ_F β^T + D, where Q_F ∈ R^{k×k} is the covariance of the factors, β ∈ R^{n×k} is the exposure of the data series to each factor, and D is diagonal. From this, by computing the Cholesky decomposition Q_F = FF^T we have G = [βF, D^{1/2}] ∈ R^{n×(n+k)}. The advantage of factor models is that G is very sparse and the factors have a direct financial interpretation (see Sec. 5 for details).

After obtaining the matrix G, we can write the quadratic form constraint as a sum-of-squares (1/2) x^T GG^T x ≤ c, which is a squared Euclidean norm constraint (1/2) ‖G^T x‖_2^2 ≤ c. We can choose to model this using the rotated quadratic cone as

    (c, 1, G^T x) ∈ 𝒬_r^{k+2},                                    (10.14)

or we can choose to model its square root using the quadratic cone as

    (√c, G^T x) ∈ 𝒬^{k+1}.                                        (10.15)

Whether to use the quadratic cone or the rotated quadratic cone for modeling can be decided based on which is more natural. Typically the quadratic cone is used to model 2-norm constraints, while the rotated quadratic cone is more natural for modeling quadratic functions. There can be exceptions, however; see for example Sec. 2.3. A small numpy sketch of the decompositions listed above follows.
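
A minimal numpy sketch of computing G, with illustrative data; the factor-model branch assumes given arrays beta, Q_F and a diagonal D:

import numpy as np

Q = np.array([[0.04, 0.01],
              [0.01, 0.09]])

# Cholesky decomposition (Q must be positive definite)
G_chol = np.linalg.cholesky(Q)

# Eigenvalue decomposition (works for any positive semidefinite Q)
eigval, eigvec = np.linalg.eigh(Q)
G_eig = eigvec @ np.diag(np.sqrt(np.maximum(eigval, 0.0)))

assert np.allclose(G_chol @ G_chol.T, Q)
assert np.allclose(G_eig @ G_eig.T, Q)

# Factor model Q = beta @ Q_F @ beta.T + D with diagonal D:
# F = np.linalg.cholesky(Q_F)
# G = np.hstack([beta @ F, np.sqrt(D)])   # sparse, of size N x (k + N)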

Power
Let x ∈ R and α > 1. Then we can model the power constraint c ≥ |x|^α, or equivalently c^{1/α} ≥ |x|, using the power cone:

    (c, 1, x) ∈ 𝒫_3^{1/α, (α−1)/α}.                               (10.16)

Exponential
Let x ∈ R. Then we can model the exponential constraint t ≥ e^x using the exponential cone:

    (t, 1, x) ∈ K_exp.                                            (10.17)

Log-sum-exp
Let x_1, …, x_n ∈ R. Then we can model the log-sum-exp constraint t ≥ log(∑_{i=1}^n e^{x_i}) by applying the rule in Sec. 10.1.1 n times:

    ∑_{i=1}^n u_i ≤ 1,                                            (10.18)
    (u_i, 1, x_i − t) ∈ K_exp,  i = 1, …, n.
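
A minimal Fusion sketch of (10.18), minimizing t so that it ends up equal to the log-sum-exp of a fixed illustrative vector x:

import numpy as np
from mosek.fusion import Model, Domain, Expr, ObjectiveSense, Var

x = np.array([0.0, 1.0, 2.0])
n = len(x)
with Model("log-sum-exp") as M:
    t = M.variable("t", 1, Domain.unbounded())
    u = M.variable("u", n, Domain.unbounded())
    M.constraint(Expr.sum(u), Domain.lessThan(1.0))
    # Rows (u_i, 1, x_i - t) must lie in the exponential cone
    M.constraint(Expr.hstack(u, Expr.constTerm(n, 1.0),
                             Expr.sub(x, Var.repeat(t, n))),
                 Domain.inPExpCone())
    M.objective(ObjectiveSense.Minimize, t)
    M.solve()
    print(t.level()[0], np.log(np.exp(x).sum()))  # both approx. 2.4076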

Perspective of function
The perspective of a function f(x) is defined as sf(x/s) for s > 0. From any conic representation of t ≥ f(x) we can reach a representation of t ≥ sf(x/s) by substituting all constants c with their homogenized counterpart sc.

Perspective of log-sum-exp
Let x_1, …, x_n ∈ R. Then we can model the perspective of the log-sum-exp constraint, t ≥ s log(∑_{i=1}^n e^{x_i/s}), by applying the rule in Sec. 10.1.1 to constraint (10.18):

    ∑_{i=1}^n u_i ≤ s,                                            (10.19)
    (u_i, s, x_i − t) ∈ K_exp,  i = 1, …, n.

10.1.2 Traditional quadratic models


Optimization problems involving quadratic functions often appear in practice. In this section we show how to convert traditional QO and QCQO problems into conic optimization form, because most of the time the solution of these conic models is computationally more efficient.

Quadratic optimization
The standard form of a quadratic optimization (QO) problem is the following:

    minimize      (1/2) x^T Qx + c^T x
    subject to    Ax = b,                                         (10.20)
                  x ≥ 0.

The matrix Q ∈ R^{n×n} must be symmetric positive semidefinite, otherwise the objective function would not be convex.
Assuming the factorization Q = GG^T with G ∈ R^{n×k}, we can reformulate problem (10.20) as a conic problem by applying the method described in Sec. 10.1.1:

    minimize      t + c^T x
    subject to    Ax = b,
                  x ≥ 0,                                          (10.21)
                  (t, 1, G^T x) ∈ 𝒬_r^{k+2}.
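
A minimal Fusion sketch of the conic reformulation (10.21), with small illustrative data:

import numpy as np
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([-1.0, -0.5])
G = np.linalg.cholesky(Q)

with Model("conic-qo") as M:
    x = M.variable("x", 2, Domain.greaterThan(0.0))
    t = M.variable("t", 1, Domain.unbounded())
    M.constraint(Expr.mul(A, x), Domain.equalsTo(b))
    # t >= 1/2 x'Qx via the rotated quadratic cone (t, 1, G'x)
    M.constraint(Expr.vstack(t, 1, Expr.mul(G.T, x)), Domain.inRotatedQCone())
    M.objective(ObjectiveSense.Minimize, Expr.add(t, Expr.dot(c, x)))
    M.solve()
    print(x.level())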

Quadratically constrained quadratic optimization


Consider the quadratically constrained quadratic optimization (QCQO) problem of the form

    minimize      (1/2) x^T Q_0 x + c_0^T x + a_0                 (10.22)
    subject to    (1/2) x^T Q_i x + c_i^T x + a_i ≤ 0,  i = 1, …, m.

The matrices Q_i ∈ R^{n×n}, i = 0, …, m must all be symmetric positive semidefinite, otherwise the optimization problem would not be convex.
Assuming the factorizations Q_i = G_i G_i^T with G_i ∈ R^{n×k_i}, we can reformulate problem (10.22) as a conic problem by applying the method described in Sec. 10.1.1 to both the objective and the constraints:

    minimize      t_0 + c_0^T x + a_0
    subject to    t_i + c_i^T x + a_i ≤ 0,  i = 1, …, m,          (10.23)
                  (t_i, 1, G_i^T x) ∈ 𝒬_r^{k_i+2},  i = 0, …, m.

Practical benefits of conic models


The key step in the model conversion is to transform the quadratic terms x^T Qx using the factorization Q = GG^T, where G ∈ R^{n×k}. Assuming k ≪ n, this results in the following benefits:

• The storage requirement nk of G can be much lower than the storage requirement n^2/2 of Q.

• The amount of work to evaluate x^T Qx is proportional to n^2, whereas the work to evaluate ‖G^T x‖_2^2 is proportional to nk only.

• There is no need to numerically validate positive semidefiniteness of the matrix Q. This is otherwise difficult owing to the presence of rounding errors that can make Q indefinite.

• Duality theory is much simpler for conic quadratic optimization.

In summary, the conic equivalents are not only as easy to solve as the original QO or QCQO problems, but in most cases also need less space and solution time.

10.2 Mixed-integer models


Mixed-integer optimization (MIO) is an extension of convex optimization, which introduces variables taking values only in the set of integers or in some subset of integers. This allows for solving a much wider range of optimization problems in addition to the ones convex optimization already covers. However, these problems are no longer convex. Handling of integer variables makes the optimization NP-hard and needs specific algorithms, thus the solution of MIO problems is much slower. In many practical cases we cannot expect to find the optimal solution in reasonable time, and only near-optimal solutions will be available.
A general mixed-integer optimization problem has the following form:

    maximize      c^T x
    subject to    Ax + b ∈ 𝒦,                                     (10.24)
                  x_i ∈ Z,  i ∈ ℐ,

where 𝒦 is a cone and ℐ ⊆ {1, …, n} contains the indices of the integer variables. We can model any finite range for the integer variable x_i by simply adding the extra constraint a_i ≤ x_i ≤ b_i.

10.2.1 Selection of integer constraints
In the following we list the integer constraints appearing in a financial context in this book and show how to model them.

Switch
In some practical cases we might wish to impose conditions on parts of our optimization model, for example, allowing a nonzero value for a variable only in the presence of some condition. We can model such situations using binary variables (or indicator variables), which can only take the values 0 or 1. The following set of constraints allows x to be positive only when the indicator variable y is equal to 1:

    x ≤ M y,                                                      (10.25)
    y ∈ {0, 1}.

The number M here is not related to the optimization problem, but it is necessary to form such a switchable constraint. If y = 0, then the upper limit of x is 0. If y = 1, then the upper limit of x is M. This modeling technique is called big-M. The choice of M can affect the solution performance, but a nice feature of it is that the problem cannot accidentally become infeasible.
If we have a vector variable x, then the switch will look like:

    x ≤ M ∘ y,                                                    (10.26)
    y ∈ {0, 1}^n,

where ∘ denotes the elementwise product. We accounted for a possibly different big-M value for each coordinate.
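
A Fusion sketch of the vectorized switch (10.26); the size and the big-M value are illustrative:

from mosek.fusion import Model, Domain, Expr

n, bigm = 5, 100.0
M = Model("switch")
x = M.variable("x", n, Domain.greaterThan(0.0))
y = M.variable("y", n, Domain.binary())
# x <= bigm * y: x_i can be positive only if y_i = 1
M.constraint(Expr.sub(x, Expr.mul(bigm, y)), Domain.lessThan(0.0))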

Semi-continuous variable
A slight extension of Sec. 10.2.1 is to model semi-continuous variables, i.e., when x ∈ {0} ∪ [a, b], where 0 < a ≤ b. We can model this by

    a y ≤ x ≤ b y,   y ∈ {0, 1}.                                  (10.27)

Cardinality
We might need to limit the number of nonzeros in a vector x to some number K < n. We can do this with the help of an indicator vector y of length n, which indicates x_i ≠ 0 (see Sec. 10.2.1). First we add a big-M bound M ∘ y on the absolute value, and model it based on Sec. 10.1.1. Then we limit the cardinality of x by limiting the cardinality of y:

    x ≤ M ∘ y,
    x ≥ −M ∘ y,                                                   (10.28)
    1^T y ≤ K,
    y ∈ {0, 1}^n.
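
A Fusion sketch of the cardinality constraint (10.28); n, K and the big-M value are illustrative:

from mosek.fusion import Model, Domain, Expr

n, K, bigm = 8, 3, 100.0
M = Model("cardinality")
x = M.variable("x", n, Domain.unbounded())
y = M.variable("y", n, Domain.binary())
# -bigm * y <= x <= bigm * y bounds |x| by the big-M switch
M.constraint(Expr.sub(x, Expr.mul(bigm, y)), Domain.lessThan(0.0))
M.constraint(Expr.add(x, Expr.mul(bigm, y)), Domain.greaterThan(0.0))
# At most K indicators can be on, hence at most K nonzeros in x
M.constraint(Expr.sum(y), Domain.lessThan(K))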

Positive and negative part
We introduced the positive part x^+ and negative part x^− of a variable x in Sec. 10.1.1. We noted that we need a way to ensure that only x^+ or x^− will be positive in the optimal solution, and not both. One such way is to use binary variables:

    x = x^+ − x^−,
    x^+ ≤ M y,
    x^− ≤ M (1 − y),                                              (10.29)
    x^+, x^− ≥ 0,
    y ∈ {0, 1}.

Here the binary variable y allows the positive (negative) part to become nonzero exactly when the negative (positive) part is zero.
Sometimes we need to handle separately the case when both the positive and the negative part are fixed to be zero. Then we need to introduce two binary variables y^+ and y^−, and include the constraint y^+ + y^− ≤ 1 to prevent both variables from being 1:

    x = x^+ − x^−,
    x^+ ≤ M y^+,
    x^− ≤ M y^−,                                                  (10.30)
    x^+, x^− ≥ 0,
    y^+ + y^− ≤ 1,
    y^+, y^− ∈ {0, 1}.

10.3 Quadratic cones and riskless solution


In Sec. 6 we consider portfolio optimization problems with a risk-free security. In this case there is a difference in the computation of the efficient frontier if we model using the quadratic cone instead of the rotated quadratic cone. Namely, the optimization problem will not have solutions that are a mix of the risk-free security and risky securities. Instead, it will only have either the 100% risk-free solution, or a portfolio of only risky securities. Along the derivation below we will see that the same behavior applies to problems not having a risk-free security but having x = 0 as a feasible solution.
Consider the following simple model:

    maximize      μ^T x + r^f x^f − δ̃ √(x^T Σx)
    subject to    1^T x + x^f = 1,                                (10.31)
                  x, x^f ≥ 0,

where r^f is the risk-free rate and x^f is the allocation to the risk-free asset.
We can transform this problem using x^f = 1 − 1^T x, such that we are only left with the variable x:

    maximize      (μ − r^f 1)^T x + r^f − δ̃ √(x^T Σx)
    subject to    1^T x ≤ 1,                                      (10.32)
                  x ≥ 0.

The feasible region in this form is a probability simplex. The solution x = 0 means that everything is allocated to the risk-free security, and the hyperplane 1^T x = 1 contains all the feasible solutions purely involving risky assets.

Let us denote the objective function value by obj(x). Then the directional derivative of the objective along a direction u > 0, ‖u‖ = 1 will be

    ∂_u obj(x) = (μ − r^f 1)^T u − δ̃ (x^T Σu) / √(x^T Σx).

This does not depend on the norm of x, meaning that ∂_u obj(cu) will be constant in c:

    ∂_u obj(cu) = (μ − r^f 1)^T u − δ̃ √(u^T Σu).

Thus the objective is linear along the 1-dimensional slice between x = 0 and any x > 0, meaning that the optimal solution is either x = 0 or some x > 0 satisfying 1^T x = 1. Furthermore, along every 1-dimensional slice represented by a direction u, there is a δ̃ threshold

    δ̃_u = (μ − r^f 1)^T u / √(u^T Σu).

This is the Sharpe ratio of all portfolios of the form cu. If δ̃ > δ̃_u, then ∂_u obj(u) turns negative.
In general, the larger we set δ̃, the fewer directions u will exist for which δ̃ < δ̃_u, i.e., ∂_u obj(u) > 0, and the optimal solution is a combination of risky assets. The largest δ̃ for which we still have such a direction is δ̃* = max_u δ̃_u, the maximal Sharpe ratio. The corresponding optimal portfolio is the one having the smallest risk while consisting purely of risky assets. For δ̃ > δ̃* we have ∂_u obj(u) < 0 in all directions, meaning that the optimal solution will always be x = 0, the 100% risk-free portfolio.

10.4 Monte Carlo Optimization Selection (MCOS)


We can use the following procedure to compare the accuracy of different portfolio optimization methods.
1. Inputs of the procedure are the estimated mean return μ and estimated covariance Σ. Treat this pair of inputs as the true values.

2. Compute the optimal asset allocation x for the original input pair (μ, Σ).

3. Repeat K_sim times:

   1. Do parametric bootstrap resampling, i.e., draw a new N × T return data sample R_k and derive a simulated pair (μ_k, Σ_k).

   2. De-noise the covariance matrix Σ_k using the method in Sec. 4.2.2.

   3. Compute the optimal portfolio x_k using the optimization method(s) that we wish to analyze.

4. To estimate the error for each security in the optimal portfolio, compute the standard deviation of the difference vectors x_k − x. Then by taking the mean we get a single number estimating the error of the optimization method.
Regarding the final step, we can also measure the average decay in performance by any of the following (a code sketch of the whole procedure is given after the list):

• the mean difference in expected outcomes: (1/K_sim) ∑_k (x_k − x)^T μ,

• the mean difference in variance: (1/K_sim) ∑_k (x_k − x)^T Σ (x_k − x),

• the mean difference in Sharpe ratio or another metric computed from the above statistics.
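
A minimal numpy sketch of the procedure; the optimizer callback, the normal resampling model and the omission of the de-noising step (3.2) are our simplifying assumptions:

import numpy as np

def mcos_error(mu, Sigma, T, optimizer, K_sim=100, seed=0):
    # `optimizer` maps an input pair (mu, Sigma) to a portfolio vector.
    rng = np.random.default_rng(seed)
    x = optimizer(mu, Sigma)                  # allocation for the "true" inputs
    diffs = []
    for _ in range(K_sim):
        # Parametric bootstrap: draw a new N x T sample and re-estimate
        R = rng.multivariate_normal(mu, Sigma, size=T)
        mu_k, Sigma_k = R.mean(axis=0), np.cov(R.T)
        diffs.append(optimizer(mu_k, Sigma_k) - x)
    diffs = np.array(diffs)                   # shape (K_sim, N)
    # Standard deviation per security, then the mean as a single error number
    return diffs.std(axis=0).mean()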

Chapter 11

Notation and definitions

Here we list some of the notations used in this book.

11.1 Financial notations


• N: Number of securities in the security universe considered.

• p_{0,i}: The known price of security i at the beginning of the investment period.

• p_0: The known vector of security prices at the beginning of the investment period.

• P_{h,i}: The random price of security i at the end of the investment period.

• P_h: The random vector of security prices at the end of the investment period.

• R_i: The random rate of return of security i.

• R: The random vector of security returns.

• r: The known vector of security returns.

• R: The sample return data matrix consisting of security return samples as columns.

• μ_i: The expected rate of return of security i.

• μ: The vector of expected security returns.

• Σ: The covariance matrix of security returns.

• x_i: The fraction of funds invested into security i.

• x: Portfolio vector.

• x_0: Initial portfolio vector at the beginning of the investment period.

• x̃: The change in the portfolio vector compared to x_0.

• x^f: The fraction of funds invested into the risk-free security.

• x_0^f: The fraction of initial funds invested into the risk-free security.

• x̃^f: The change in the risk-free investment compared to x_0^f.

• R_x: Portfolio return computed from portfolio x.

• Rx: Sample portfolio return computed from data matrix R and portfolio x.

• μ_x: Expected portfolio return computed from portfolio x.

• σ_x: Expected portfolio variance computed from portfolio x.

• μ̂: Estimate of the expected security return vector μ.

• Σ̂: Estimate of the security return covariance matrix Σ.

• T: Number of data samples or scenarios.

• h: Time period of investment.

• τ: Time period of estimation.

11.2 Mathematical notations


• x^T: Transpose of vector x. Vectors are all column vectors, so their transpose is always a row vector.

• E(R): Expected value of R.

• Var(R): Variance of R.

• Cov(R_i, R_j): Covariance of R_i and R_j.

• 1: Vector of ones.

• ℱ: Feasible region generated by a set of constraints.

• R^n: Set of n-dimensional real vectors.

• Z^n: Set of n-dimensional integer vectors.

• ⟨a, b⟩: Inner product of vectors a and b. Sometimes used instead of the notation a^T b.

• diag(S): Vector formed by taking the main diagonal of matrix S.

• Diag(x): Diagonal matrix with vector x in the main diagonal.

• Diag(S): Diagonal matrix formed by taking the main diagonal of matrix S.

• ℱ: Part of the feasible set of an optimization problem. Indicates that the problem can be extended with further constraints.

• e: Euler's number.

11.3 Abbreviations
• MVO: Mean–variance optimization

• LO: Linear optimization

• QO: Quadratic optimization

• QCQO: Quadratically constrained quadratic optimization

• SOCO: Second-order cone optimization

• SDO: Semidefinite optimization

Bibliography

[AJ12] A. Ahmadi-Javid. Entropic value-at-risk: a new coherent risk mea-


sure. Journal of Optimization Theory and Applications, 12 2012.
doi:10.1007/s10957-011-9968-2.

[AZ10] D. Avramov and G. Zhou. Bayesian portfolio analysis. Annual Review of


Financial Economics, 2(1):25–47, 2010.

[BN01] J. Bai and S. Ng. Determining the number of factors in approximate factor
models. Econometrica, 01 2001. doi:10.1111/1468-0262.00273.

[BS11] J. Bai and S. Shi. Estimating high dimensional covariance matrices and
its applications. Annals of Economics and Finance, 12:199–215, 11 2011.

[BGP12] D. Bertsimas, V. Gupta, and I. Paschalidis. Inverse optimization: a new perspective on the Black-Litterman model. Operations Research, 60:1389–1403, 12 2012. doi:10.2307/23323707.

[BBD+17] S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nys-


trup, and J. Speth. Multi-period trading via convex optimization. 2017.
arXiv:1705.00109.

[Bra10] M. W. Brandt. Chapter 5 - portfolio choice problems. In Handbook of


Financial Econometrics: Tools and Techniques, volume 1 of Handbooks
in Finance, pages 269–336. North-Holland, San Diego, 2010.

[CY16] J. Chen and M. Yuan. Efficient portfolio selection in a large


market. Journal of Financial Econometrics, 14:496–524, 06 2016.
doi:10.1093/jjfinec/nbw003.

[Con95] G. Connor. The three types of factor models: a comparison of their


explanatory power. Financial Analysts Journal, 51:42–46, 05 1995.
doi:10.2469/faj.v51.n3.1904.

[CM13] G. Coqueret and V. Milhau. Estimating covariance matrices for portfolio


optimization. 12 2013. EDHEC Risk Institute.

[CJPT18] G. Cornuéjols, J. Peña, and R. Tütüncü. Optimization Meth-


ods in Finance. Cambridge University Press, 2 edition, 2018.
doi:10.1017/9781107297340.

[CK20] Giorgio Costa and Roy H. Kwon. Generalized risk parity portfolio optimization: an ADMM approach. Journal of Global Optimization, 78(1):207–238, 09 2020.

[dP19] M. López de Prado. A robust estimator of the efficient frontier. SSRN
Electronic Journal, 01 2019. doi:10.2139/ssrn.3469961.

[dPL19] M. López de Prado and M. Lewis. Detection of false investment strategies


using unsupervised learning methods. Quantitative Finance, 19:1–11, 07
2019. doi:10.1080/14697688.2019.1622311.

[DGNU09] V. Demiguel, L. Garlappi, F. Nogales, and R. Uppal. A generalized


approach to portfolio optimization: improving performance by con-
straining portfolio norms. Management Science, 55:798–812, 05 2009.
doi:10.2139/ssrn.1004707.

[Dod12] J. Dodson. Why is it so hard to estimate expected returns? 2012. Uni-


versity of Minnesota, School of Mathematics. URL: https://www-users.
math.umn.edu/~dodso013/docs/dodson2012-lambda.pdf.

[EM75] B. Efron and C. Morris. Data analysis using Stein's estimator and its generalizations. Journal of the American Statistical Association, 70:311–319, 06 1975. doi:10.1080/01621459.1975.10479864.

[EM76] B. Efron and C. Morris. Families of minimax estimators of the mean


of a multivariate normal distribution. The Annals of Statistics, 01 1976.
doi:10.1214/aos/1176343344.

[FS02] Hans Foellmer and Alexander Schied. Convex measures of risk and
trading constraints. Finance and Stochastics, 6:429–447, 09 2002.
doi:10.1007/s007800200072.

[GK00] Richard C. Grinold and Ronald N. Kahn. Active portfolio management.


McGraw-Hill, New York, 2 edition, 2000.

[Har91] W.V. Harlow. Asset allocation in a downside-risk framework. Financial


Analysts Journal, 47(5):28–40, 1991. doi:10.2469/faj.v47.n5.28.

[JM03] R. Jagannathan and T. Ma. Risk reduction in large portfolios: why im-
posing the wrong constraint helps. Journal of Finance, 58:1651–1684, 08
2003. doi:10.1111/1540-6261.00580.

[Jor86] P. Jorion. Bayes-stein estimation for portfolio analysis. Journal of Finan-


cial and Quantitative Analysis, 21:279–292, 09 1986. doi:10.2307/2331042.

[Jor04] P. Jorion. Portfolio optimization with tracking-error constraints. Financial


Analysts Journal, 01 2004. doi:10.2469/faj.v59.n5.2565.

[KS13] R. Karels and M. Sun. Active portfolio construction when risk and alpha
factors are misaligned. In C. S. Wehn, C. Hoppe, and G. N. Gregoriou, ed-
itors, Rethinking Valuation and Pricing Models, pages 399–410. Academic
Press, 12 2013. doi:10.1016/B978-0-12-415875-7.00024-5.

[KY91] Hiroshi Konno and Hiroaki Yamazaki. Mean-absolute deviation portfolio optimization model and its applications to the Tokyo stock market. Management Science, 37(5):519–531, 1991.

[Lau01] G. J. Lauprête. Portfolio risk minimization under departures from normal-
ity. Ph.D. Thesis, Massachusetts Institute of Technology, Sloan School of
Management, Operations Research Center., 2001. URL: http://dspace.
mit.edu/handle/1721.1/7582.

[LW03a] O. Ledoit and M. Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 07 2003. doi:10.2139/ssrn.433840.

[LW03b] O. Ledoit and M. Wolf. Improved estimation of the covariance matrix of


stock returns with an application to portfolio selection. Journal of Em-
pirical Finance, 10(5):603–621, 2003. doi:https://doi.org/10.1016/S0927-
5398(03)00007-0.

[LW04] O. Ledoit and M. Wolf. A well-conditioned estimator for large-


dimensional covariance matrices. J. Multivariate Anal., 88(2):365–411,
2004. doi:https://doi.org/10.1016/S0047-259X(03)00096-4.

[LW20] O. Ledoit and M. Wolf. Analytical nonlinear shrinkage of


large-dimensional covariance matrices. The Annals of Statistics,
48(5):3043–3065, 2020. doi:10.1214/19-AOS1921.

[LC98] E. L. Lehmann and G. Casella. Theory of Point Estimation. Springer,


New York, NY, 2 edition, 1998. doi:https://doi.org/10.1007/b98854.

[LMFB07] M. S. Lobo, M. Fazel, and S. Boyd. Portfolio optimization with linear and
fixed transaction costs. Annals of Operations Research, 152:341–365, 03
2007. doi:10.1007/s10479-006-0145-1.

[LB22] E. Luxenberg and S. Boyd. Portfolio construction with gaussian mixture


returns and exponential utility via convex optimization. 2022. URL: https:
//arxiv.org/abs/2205.04563, doi:10.48550/ARXIV.2205.04563.

[MWOS15] R. Mansini, W. Ogryczak, and M. G. Speranza. Linear and Mixed Integer


Programming for Portfolio Optimization. Springer International Publish-
ing, 1 edition, 2015. doi:10.1007/978-3-319-18482-1.

[Meu05] A. Meucci. Risk and Asset Allocation. Springer-Verlag Berlin Heidelberg,


1 edition, 2005. doi:10.1007/978-3-540-27904-4.

[Meu10a] A. Meucci. Quant nugget 2: linear vs. compounded returns – common


pitfalls in portfolio management. GARP Risk Professional, pages 49–51,
04 2010. URL: https://ssrn.com/abstract=1586656.

[Meu10b] A. Meucci. Quant nugget 4: annualization and general projection of


skewness, kurtosis and all summary statistics. GARP Risk Professional
- "The Quant Classroom", pages 59–63, 08 2010. URL: https://ssrn.com/
abstract=1635484.

[Meu10c] A. Meucci. Quant nugget 5: return calculations for leveraged securities and
portfolios. GARP Risk Professional, pages 40–43, 10 2010. URL: https:
//ssrn.com/abstract=1675067.

[Meu11] A. Meucci. 'The Prayer': ten-step checklist for advanced risk and portfolio management. SSRN, 02 2011. URL: https://ssrn.com/abstract=1753788.
[MM08] Richard Michaud and Robert Michaud. Efficient Asset Management: A
Practical Guide to Stock Portfolio Optimization and Asset Allocation 2nd
Edition. Oxford University Press, 2 edition, 01 2008.
[Pal20] D. P. Palomar. Risk parity portfolio. 2020. Lecture notes, The Hong Kong
University of Science and Technology.
[Ric99] J. A. Richards. An introduction to james–stein estimation. 11 1999. M.I.T.
EECS Area Exam Report.
[RU00] R. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk.
Journal of risk, 2:21–42, 01 2000.
[RU02] R. T. Rockafellar and S. P. Uryasev. Conditional value-at-risk for general
loss distributions. Journal of Banking & Finance, 26(7):1443–1471, 2002.
doi:https://doi.org/10.1016/S0378-4266(02)00271-6.
[RUZ06] R. T. Rockafellar, S. P. Uryasev, and M. Zabarankin. Generalized devia-
tions in risk analysis. Finance and Stochastics, 10:51–74, 2006.
[RSTF20] D. Rosadi, E. Setiawan, M. Templ, and P. Filzmoser. Robust covari-
ance estimators for mean-variance portfolio optimization with trans-
action lots. Operations Research Perspectives, 7:100154, 06 2020.
doi:10.1016/j.orp.2020.100154.
[RWZ99] M. Rudolf, H.-J. Wolter, and H. Zimmermann. A linear model for tracking
error minimization. Journal of Banking & Finance, 23(1):85–103, 1999.
doi:https://doi.org/10.1016/S0378-4266(98)00076-4.
[The71] H. Theil. Principles of Econometrics. John Wiley and Sons, 1 edition, 06
1971.
[TLD+11] B. Tóth, Y. Lemperiere, C. Deremble, J. Lataillade, J. Kockelko-
ren, and J.-P. Bouchaud. Anomalous price impact and the critical na-
ture of liquidity in financial markets. Physical Review X, 05 2011.
doi:10.2139/ssrn.1836508.
[VD09] V. DeMiguel and F. J. Nogales. Portfolio selection with robust estimation. Operations Research, 57(3):560–577, 02 2009. doi:10.1287/opre.1080.0566.
[WZ07] R. Welsch and X. Zhou. Application of robust statistics to asset allocation
models. REVSTAT, 5:97–114, 03 2007.
[WO11] M. Woodside-Oriakhi. Portfolio Optimisation with Transaction Cost. PhD
thesis, School of Information Systems, Computing and Mathematics,
Brunel University, 01 2011.
[MOSEKApS21] MOSEK ApS. MOSEK Modeling Cookbook. MOSEK ApS, Fruebjergvej
3, Boks 16, 2100 Copenhagen O, 2021. Last revised June 2021. URL:
https://docs.mosek.com/modeling-cookbook/index.html.
