Portfolio Management Using Value at Risk - A Comparison Between Genetic Algorithms and Particle Swarm Optimization
First I would like to thank my supervisor Dr. Ir. Jan van den Berg for his help and
invaluable insights. It was really a pleasure to work with him.
Furthermore, I would like to thank my family for their love and unconditional
support: Lourdes, Valdemar, Luciana, Paulo, Rodrigo and Lucas.
Finally, there were many people that I met during my brief stay in Rotterdam who
made my experience abroad unforgettable. Thank you for all the fun, my friends!
Abstract
Contents
1 Introduction
2 Risk Measures
2.1 Introduction
2.2 The Mean-Variance Approach
2.3 Value at Risk
2.3.1 Parametric Method
2.3.2 Historical Simulation Method
2.3.3 Monte Carlo Method
2.4 Coherent Risk Measures
3 Nature Inspired Strategies for Optimization
3.1 Introduction
3.2 Particle Swarm Optimization
3.3 Genetic Algorithms
3.4 Conclusion
4 Experiment Set-Up
5 Empirical Results
6 Conclusions
References
List of Figures
C.1 Optimal portfolio weights, using different objective functions and different horizons for the data
C.2 Typical run of PSO using bumping strategy
C.3 Typical run of PSO using amnesia strategy
C.4 Typical run of PSO using random positioning strategy
C.5 Typical run of GA using roulette wheel selection and whole arithmetic crossover
C.6 Typical run of GA using tournament selection and basic crossover
List of Tables
Chapter 1
Introduction
The main idea of this Master Thesis is to investigate the applicability of Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) to risk management. A portfolio containing multiple assets reduces the overall risk by diversifying away the idiosyncratic risk. It is therefore desirable to consider as many assets as possible, within the limits imposed by the costs of maintaining such a varied portfolio. Calculating the optimal weights for the portfolio may be a computationally intensive task, and it is thus interesting to find heuristic optimization methods that are fast and yet reliable. To test the performance of PSO and GA in this task, subsets of the stocks of the Dow Jones Industrial Average are used here, and the percentage of the investment put in each of the assets (the weights) is defined by minimizing the Value at Risk (VaR) of the portfolio. Moreover, the constraint of no short-sales is added, which means that none of the weights can be negative.
Value at Risk is a measure of risk that tries to determine the maximum loss of a portfolio for a given confidence level. The VaR may also be interpreted as a quantile of a distribution: the value below which lie q% of the values, for a given time horizon. Although some people argue that it is not a good measure of risk, because of its lack of coherence (see section 2.4), it is widely used in practice, especially considering the BIS (Bank for International Settlements) requirement (Hawkins, 2000).
To solve the optimization problem of minimizing the variance (another common measure of risk), quadratic programming has often been used. But when the problem includes a large number of assets or constraints, finding the best solution becomes more time demanding. In these cases, different approaches have been employed, including PSO and GA. Xia et al. (2000) used a Genetic Algorithm for solving the mean-variance optimization problem with transaction costs. Chang et al. (2000) focused on calculating the mean-variance frontier with the added constraint of a portfolio only holding a limited number of assets; they used three heuristics to solve this problem, including a Genetic Algorithm. Finally, Kendall & Su (2005) maximize the Sharpe ratio using Particle Swarm Optimization, but for only a very limited number of assets. No articles applying GA or PSO to portfolio optimization using VaR were found in the literature, which shows the relevance of this Thesis.
The Particle Swarm Optimization algorithm is based on the behavior of fishes and birds, which collaboratively search an area to find food. Like Genetic Algorithms, it is a systematic random search, in the sense that the algorithm moves through the solution space towards the most promising area, but the exact path is not deterministic. PSO has a population consisting of various particles, with each particle representing a solution. The particles are all initialized in the search space with random values for the weights and the velocities. They then 'fly' through the search space, with the velocity in each iteration determined by a momentum term plus two random terms associated with the best solution found by the particle and with the best solution found by all the particles in the population. If constraints are imposed, it is possible that at some point particles will try to cross the boundaries of the feasible space, mainly because of their momenta. There are different strategies to ensure that the solutions found remain feasible. In this work, four of them are implemented and discussed: bumping, amnesia, random positioning and penalty function (see section 3.2 for more details about the strategies).
Genetic Algorithms are techniques that mimic biological evolution in Nature. Given an optimization problem to solve, GA maintains a population of potential solutions to that problem. To determine which solutions are fitter in a given generation, a fitness function (the objective function of the optimization) is used to quantitatively evaluate them. After using some selection strategy to choose the best chromosomes of the population, offspring are generated from them, in the hope that the offspring of two good solutions will be even better. One major decision in GA is the way the solutions are encoded. In this work real encoding is used, with the genes consisting of real numbers representing the weights for each of the assets (see section 3.3).
GA uses special operators (selection, crossover, mutation), and the design of these operators is a major issue in a GA implementation, usually done ad hoc. In this work, two selection operators (tournament and roulette-wheel selection) and two crossover operators (basic crossover and whole arithmetic crossover) are implemented. A mutation operator to assure genetic variability was also implemented. GA may also use elitism to make sure that the best solutions of each generation survive intact into the next generation.
To implement the algorithms, Matlab® version 7.0 was used. To estimate the VaR of a given portfolio, the historical simulation method was used. This method has the advantage that it does not require the simplifying assumption that the underlying distribution is normal, and yet does not add much computational burden.
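To illustrate the idea, the following is a minimal Matlab sketch of historical-simulation VaR. It is not the code used in the experiments; the function name and interface are hypothetical. It sorts the historical portfolio returns and reads off the empirical quantile corresponding to the chosen confidence level:

function v = historical_var(returns, weights, confidence)
% HISTORICAL_VAR  Value at Risk via historical simulation (illustrative sketch).
%   returns    - T-by-N matrix of historical asset returns
%   weights    - N-by-1 vector of portfolio weights (summing to one)
%   confidence - confidence level, e.g. 0.95

portRet = returns * weights;                    % historical portfolio returns
sorted  = sort(portRet);                        % ascending: worst losses first
T       = numel(sorted);
k       = max(1, floor((1 - confidence) * T)); % index of the (1-c) empirical quantile
v       = -sorted(k);                           % VaR reported as a positive loss
end

With 500 data points and a 95% confidence level, for example, this picks the 25th worst historical return, in line with the set-up described in chapter 4.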
The two characteristics used to evaluate the performance of the algorithms are consistency (the ability to always arrive at the global optimum of the problem) and the speed of convergence to the best solution. The influence of the number of particles/chromosomes on the quality of the solutions was also investigated, as was the sensitivity of the algorithms to the initial position of the particles/chromosomes.
This Thesis is structured as follows. Chapter 2 reviews risk and possible measures of it: variance (section 2.2), Value at Risk (section 2.3) and Conditional Value at Risk (section 2.4). Special attention is given to VaR and to three ways of calculating it: the parametric method, the historical simulation method and the Monte Carlo method.
The next part (chapter 3) deals with Nature inspired strategies for optimization, in particular Particle Swarm Optimization (section 3.2) and Genetic Algorithms (section 3.3). The basics of these methods are presented, together with strategies for handling the constraints of the portfolio optimization problem.
Chapter 4 explains the experiment set-up. Section 4.1 describes the data used in the empirical part. Section 4.2 describes the parameters chosen for the optimization methods and the initialization procedure. In section 4.3, the design of the experiments is discussed. Finally, in chapter 5, the results of the experiments are presented and discussed.
Chapter 2
Risk Measures
2.1 Introduction
Finding a general definition of risk in the literature is hard, as Holton (2004) points out; he defines risk as an "exposure to a proposition of which one is uncertain". He emphasizes that there are two components of risk: exposure and uncertainty. In general, there is exposure to a proposition when there are material consequences from it. As pointed out by Holton, the test to see if there are material consequences is: "would we care?" Or, in other words, "if we immediately considered the proposition, would we have a preference for it to be true or false?" It is possible that a person is exposed to a proposition without even knowing about it. Take the example of children playing with sharp knives: they are exposed to the possibility of getting cut, even though they are unaware of it. The second component of risk, uncertainty, means that the person does not know whether a proposition is true or false. Probability may be used as a metric of uncertainty, although it only quantifies perceived uncertainty. Very often the word risk is used to mean probability, measuring only negative outcomes. However, in the Risk Management area, risk refers to a combination of the probability of an event happening together with the possible outcome of that event.
Knight (1921) distinguishes risk from uncertainty:

"But Uncertainty must be taken in a sense radically distinct from the familiar notion of Risk, from which it has never been properly separated. The term 'risk', as loosely used in everyday speech and in economic discussion, really covers two things which, functionally at least, in their causal relations to the phenomena of economic organization, are categorically different. (...) The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomenon depending on which of the two is really present and operating. (...) It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all. We shall accordingly restrict the term 'uncertainty' to cases of the non-quantitative type."

(Available online at: http://www.econlib.org/library/Knight/knRUP.html)
Financial risks, as defined by Jorion (2001), are "those which relate to possible losses in financial markets, such as losses due to interest rate movements or defaults on financial obligations". It is part of the job of financial institutions to manage financial risks, so that they are kept at tolerable levels. In order to manage the risks taken, some metrics are needed. In this chapter, three possible risk measures are explained: the mean-variance framework, the Value at Risk (VaR) and the Conditional Value at Risk (CVaR). The VaR gained special importance after the Basel Capital Accord of 1988 and its later amendments, which set capital requirements for commercial banks to guard against credit and market risks, and allow banks to use their own VaR models to calculate the capital required.
The remainder of this chapter is structured as follows. Section 2.2 gives a quick review of the mean-variance approach introduced by Markowitz. Although it may be shown that the scope of application of the Markowitz framework is limited to some special cases, it is still used in practice and it is a standard in introductory risk textbooks. Section 2.3 explains the Value at Risk and three ways of calculating it: the parametric method (section 2.3.1), the historical simulation method (section 2.3.2) and the Monte Carlo method (section 2.3.3). Finally, section 2.4 explains the concept of coherent risk measures and defines a coherent risk measure: the Conditional Value at Risk.
2.2 The Mean-Variance Approach

The total value $V$ of a portfolio holding $n_i$ units of each asset $i$, with unit value $v_i$, is:

$$V = \sum_{i=1}^{N} n_i v_i. \qquad (2.4)$$

The vector of weights and the vector of expected returns are defined respectively as:

$$\vec{\omega} = \begin{bmatrix} \omega_1 \\ \omega_2 \\ \vdots \\ \omega_N \end{bmatrix} \quad \text{and} \quad \vec{\mu} = \begin{bmatrix} E[R_1] \\ E[R_2] \\ \vdots \\ E[R_N] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix}. \qquad (2.7)$$
It is possible to find a portfolio that minimizes the risk for a given expected return or, stated differently, a portfolio that maximizes the expected return for a given level of risk. A portfolio with this characteristic is called an efficient portfolio. The set of all efficient portfolios forms the efficient frontier. Figure 2.1 illustrates a typical efficient frontier. All efficient portfolios lie on the upper part of the solid line (in black) and all the others lie in the area delimited by it. The lower part of the solid line (in gray) forms the boundary of the area with the feasible portfolios, but the portfolios in that region are obviously not efficient, because it is always possible to find another one with lower risk for the same expected return.
Figure 2.1: All possible portfolios are in the area shaded in gray. The upper part of the parabolic region (black line) is called the efficient frontier of risky assets and is formed by the portfolios which minimize the variance for a given expected return.

Calculating the portfolios on the frontier can be done through an optimization problem stated as:

$$\min \sigma_p^2 = \sum_i \sum_j w_i w_j \sigma_{i,j}, \qquad (2.10)$$

subject to:

$$R_p = \sum_{i=1}^{N} w_i r_i \quad \text{and} \quad \sum_{i=1}^{N} w_i = 1. \qquad (2.11)$$

In the case where no short sales are allowed, there is the extra constraint:

$$0 \leq w_i \leq 1. \qquad (2.12)$$
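For concreteness, the problem (2.10)-(2.12) is a standard quadratic program, and in Matlab it could be solved directly with the Optimization Toolbox function quadprog. The sketch below is only illustrative (Sigma, mu and targetReturn are assumed to be given); the heuristic methods studied in this Thesis become attractive precisely when such solvers are too slow or the objective, like VaR, is not quadratic:

% Sigma: N-by-N covariance matrix; mu: N-by-1 expected returns;
% targetReturn: desired expected portfolio return (assumed given)
N   = length(mu);
H   = 2 * Sigma;               % quadprog minimizes (1/2)*w'*H*w, hence the factor 2
f   = zeros(N, 1);             % no linear term in the objective
Aeq = [ones(1, N); mu'];       % weights sum to one; expected return equals target
beq = [1; targetReturn];
lb  = zeros(N, 1);             % no short sales, constraint (2.12): w_i >= 0
ub  = ones(N, 1);              % w_i <= 1
w   = quadprog(H, f, [], [], Aeq, beq, lb, ub);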
The portfolio optimization problem can also be written as the dual problem, where the return is maximized given an investor's desired level of risk.
Two interesting portfolios on the efficient frontier are the minimum variance portfolio and the tangency portfolio. The minimum variance portfolio is simply the portfolio that has the minimum variance among all portfolios, not considering the returns, and is portfolio A in figure 2.2.
To define the tangency portfolio it is necessary first to define the Sharpe ratio. The Sharpe ratio combines the information from the mean and the variance of an asset and is defined as:

$$S_p = \frac{R_p - R_f}{\sigma_p}, \qquad (2.13)$$
where $S_p$ is the Sharpe ratio of the portfolio, $R_p$ is the return on the risky portfolio and $R_f$ is the risk-free rate. $R_p - R_f$ is the excess return of the portfolio and $\sigma_p$ is the standard deviation of the returns. The Sharpe ratio uses excess returns instead of nominal returns because, instead of showing how much return it is possible to get from a risky portfolio, it shows how much extra return may be obtained by going from a riskless to a risky position.

Figure 2.2: Efficient frontier of risky assets (solid line) and capital allocation line with riskless asset (dashed). Portfolio A is the portfolio with minimum variance and portfolio B is the tangency portfolio with maximum Sharpe ratio.
In figure 2.2, point B denotes the portfolio with maximum Sharpe ratio. By making a portfolio that combines the risky portfolio B with the risk-free asset, we have the return on the investment defined by:

$$R_i = (1 - p) R_f + p R_p, \qquad (2.14)$$

where $p$ represents the proportion of the money that is invested in the risky portfolio with respect to the amount that is invested in the riskless position. This equation generates the capital allocation line, shown as a dashed line in figure 2.2, which allows the investor to choose the level of risk he is willing to assume by simply changing the proportion of the investment between the riskless asset and the tangency risky portfolio (which remains constant).
The main problem of using the Markowitz mean-variance framework is that it is only suited to the case of elliptic distributions (distributions where the equidensity surfaces are ellipsoids). This is the case for the normal or the t-distribution with finite variances. A symmetric distribution is not necessarily elliptic. The use of this framework with assets whose returns follow non-elliptic distributions can seriously underestimate extreme events that may cause great losses (Szego, 2002).
2.3 Value at Risk

Being $X$ a real-valued random variable describing the returns of a portfolio, with cumulative distribution function $F_X(x)$, the VaR for a confidence level of $(1 - \alpha)$ is defined as (Inui & Kijima, 2005):

$$VaR_{(1-\alpha)} = -\inf\{x \in \mathbb{R} \mid F_X(x) > \alpha\}.$$
Figure 2.3a shows a normal probability density function of the returns of a portfolio for a given time period (for example, 1 day), with the VaR for a confidence level of 95%. VaR does not tell how much a portfolio will lose in that time period, but it shows that statistically only in 1 day out of 20 (5% of the days) will the losses of the portfolio be larger than the VaR indicated in the figure. The Conditional Value at Risk (CVaR) will be detailed in section 2.4.
The VaR may be interpreted as a quantile of a distribution: the value below which lie q% of the values (Bodie et al., 2005). For the normal distribution, it is easy to calculate the VaR. For example, considering q = 5% (a confidence level of 95%), the VaR will lie 1.65 standard deviations below the mean, and may be interpreted as meaning that with 5% probability there will be a loss equal to or larger than the VaR.
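As a small worked example, with illustrative numbers: for normally distributed daily returns with mean $\mu = 0.05\%$ and standard deviation $\sigma = 1\%$, on a position of value $V$, the 1-day 95% VaR is

$$VaR_{95\%} = -(\mu - 1.65\,\sigma)\,V = -(0.0005 - 1.65 \times 0.01)\,V \approx 0.016\,V,$$

that is, roughly 1.6% of the position value.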
Although in Figure 2.3a the distribution was considered to be normal, the VaR framework is not limited to this case. The normal distribution is a commonly used simplification to make the calculations easier. However, considering the distribution to be normal may significantly underestimate the risks involved when the actual distribution has "fat tails".
VaR is a standard measure of risk for banks, which usually use confidence levels of 99%, according to the BIS (Bank for International Settlements) requirement (Hawkins, 2000). However, the higher the confidence level (the closer to 100%), the rarer the events situated to the left of the VaR in the probability distribution. In this way, these events are less likely to have happened in the past and to be present in the historical data used in the model. Thus, it is harder to make accurate forecasts about these extreme events.

Figure 2.3: Two distributions with the same VaR but different CVaR. Conditional Value at Risk (CVaR) is the expected loss given that we are in the q% left tail of the distribution. It is treated in more detail in section 2.4.
VaR calculations assume that the portfolio composition will not be changed over the holding period. The most common choice of holding period in banks is 1 day. The BIS prescribes a 10-day holding period for VaR calculations, but it allows banks to calculate this value by multiplying the bank's 1-day VaR by the square root of 10, which is valid if market moves are independently distributed over time (Hawkins, 2000).
The popularization of VaR in the financial world is mainly due to JPMorgan, which in the late 1980s developed a VaR system that encompassed the whole company, modeling hundreds of risk factors. Quarterly, a covariance matrix was updated with historical data; and daily, the risk represented by the positions taken by the company was assessed and presented in a Treasury meeting. The clients of the company became interested in the system and, instead of selling the software, JPMorgan opted to publish the methodology and distribute the calculated covariance matrix. The service was named RiskMetrics® (Holton, 2003, p. 18).
However, there are some critiques of the use of VaR as a risk measure (Szego, 2002):

- it lacks subadditivity, in such a way that portfolio diversification may lead to an increase of risk (see section 2.4 for an explanation of coherent risk measures, and Tasche (2002) for an example of the lack of subadditivity in VaR);
- it is non-convex and has many local minima, making it hard to use in optimization problems. Figure 2.4 shows the VaR of a portfolio of two assets, as a function of the weight, allowing one to visualize the existence of local minima. It is also possible to see that the portfolios obtained differ depending on the risk measure used: minimizing VaR and estimating it with the parametric method, the best $\omega$ is found to be 0.46, while using historical simulation the best $\omega$ is found to be 0.32; and minimizing CVaR using historical simulation, the best $\omega$ is 0.50;
- it may provide results that are conflicting at different confidence levels.

Despite the critiques that the use of VaR receives in the academic literature, its use is prescribed by regulatory agencies because it is a compact representation of the risk level and allows the measurement of downside risk (Szego, 2002).

Figure 2.4: VaR and CVaR of portfolios of 2 assets, showing the existence of local minima for the VaR calculated using historical simulation. The two stocks used in the portfolio are American Express Co. and American International Group Inc., including 1250 data points with daily frequency (approximately 5 years), starting on 6/jun/2001. The proportion of the portfolio invested in American Express Co. is $\omega$ and the remainder $(1 - \omega)$ is invested in American International Group Inc.
To calculate the Value at Risk, three possible methodologies are (Holton, 2003, p. 3):

- the parametric method;
- the historical simulation method;
- and the Monte Carlo method.
2.3.1 Parametric Method

The parametric method relies on the assumption that the risk factors follow a known probability distribution. Suppose that the returns of the assets of a portfolio of value $V$ are such that the portfolio return is normally distributed,

$$R_p \sim N(\mu_p, \sigma_p^2),$$

with $R_p$ being the stochastic return of the portfolio. The 1- and 10-day VaR of the portfolio with a 99% confidence level are then:

$$VaR_p(1, 99\%) = 2.33\,\sigma_p\,V \qquad (2.16)$$

$$VaR_p(10, 99\%) = \sqrt{10}\;VaR_p(1, 99\%) = \sqrt{10} \times 2.33\,\sigma_p\,V. \qquad (2.17)$$

Obviously, given the same holding period and assuming that the distribution of returns is normal, the problem of finding the portfolio that minimizes the Value at Risk is the same as the problem of finding the portfolio with minimum variance, no matter which confidence level is used.
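A minimal Matlab sketch of the parametric calculation under this normality assumption (variable names are illustrative; 2.33 is the 99% quantile of the standard normal used in (2.16)):

% weights: N-by-1 portfolio weights; Sigma: N-by-N covariance of daily returns;
% V: current portfolio value (assumed given)
sigmaP = sqrt(weights' * Sigma * weights);   % daily portfolio volatility
var1d  = 2.33 * sigmaP * V;                  % 1-day 99% VaR, equation (2.16)
var10d = sqrt(10) * var1d;                   % 10-day VaR via square-root-of-time (2.17)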
From what was said, it would be expected that portfolios calculated by minimizing the VaR using the parametric method, and considering the distribution of returns to be normal, will be less conservative than portfolios calculated using non-parametric methods with the actual (or approximated) distribution of returns, because the normal distribution considers extreme events to be very improbable. Returns lower than 4 standard deviations below the expected return, for example, have a probability of happening of only 0.003167%. However, there is evidence that individual return distributions show fat tails, implying that extraordinary losses happen more frequently than predicted by the normal distribution.
On the other hand, if a well-diversified portfolio with many different risk factors is considered, even if the risk factor distributions are not normal, the central limit theorem says that the resulting distribution of the returns of the portfolio will converge to a normal distribution. In this way, it may be considered that the portfolio returns follow a normal distribution, as long as the portfolio is well diversified and the risk factor returns are independent from each other (Crouhy et al., 2001, p. 193).
2.3.3 Monte Carlo Method
The Monte Carlo method was invented by Stanislaw Ulam in 1946 (Eckhardt, 1987) and covers any technique of statistical sampling used to approximate solutions to quantitative problems. In the method, the random process under analysis is simulated repeatedly, where each simulation generates a scenario of possible values of the portfolio at the target horizon. By generating a large number of scenarios, eventually the distribution obtained through simulation will converge towards the true distribution. A good description of the method can be found, for example, in Holton (2003, chap. 5).
As advantages of the method, Crouhy et al. (2001) mention the fact that any distribution of the risk factors may be used; that the method can be used to model any complex portfolio; and that it allows the performance of sensitivity analyses and stress testing. As disadvantages, it may be mentioned that outliers are not incorporated into the distribution, and that it is very computer intensive.
2.4 Coherent Risk Measures

Given the set of properties above, VaR cannot be considered a coherent measure of risk. For example, only in the special case when the joint distribution of returns is elliptic is VaR subadditive; and in this case the portfolio that minimizes the VaR is the same that would be obtained by simply minimizing the variance.
An example of a risk measure that may be proved to satisfy the aforementioned four properties, and thus to be coherent, is the Conditional Value at Risk (CVaR), also called Mean Excess Loss, Mean Shortfall, or Tail VaR. It tries to answer the question: "if things do get bad, how much can we expect to lose?" (Hull, 2002). CVaR is the expected loss given that we are in the q% left tail of the distribution. Being $X$ a real-valued random variable, $f_X(x)$ the probability density function associated to it, and $\bar{x} = -VaR_{(1-\alpha)}$, the CVaR for a confidence level of $(1-\alpha)$ is defined as (Inui & Kijima, 2005):

$$CVaR_{(1-\alpha)} = E[X \mid X \leq \bar{x}] = \frac{1}{\alpha} \int_{-\infty}^{\bar{x}} x f_X(x)\,dx. \qquad (2.23)$$
The CVaR has several attractive properties:

- it is more conservative than VaR ($CVaR(p) \geq VaR(p)$, for any portfolio $p$);
- it is convex, which makes portfolio optimization easier (Figure 2.4 is an example where the CVaR is clearly convex);
- it is a coherent risk measure in the sense of Artzner et al. (1999);
- it is a better representation of the risks involved in extreme events: for example, in Fig. 2.3, two portfolios exhibit the same VaR, but clearly the portfolio shown in Fig. 2.3b is riskier than the one shown in Fig. 2.3a;
- linear programming can be used for its optimization.
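To make the relation between the two measures concrete, the following is a minimal Matlab sketch (the function name and interface are hypothetical, not the thesis code) that estimates both VaR and CVaR from the same vector of historical portfolio returns; the CVaR is simply the average of the returns in the left tail:

function [v, cv] = hist_var_cvar(portRet, confidence)
% HIST_VAR_CVAR  VaR and CVaR by historical simulation (illustrative sketch).
%   portRet    - vector of historical portfolio returns
%   confidence - confidence level, e.g. 0.95

sorted = sort(portRet(:));                             % ascending order
k      = max(1, floor((1 - confidence) * numel(sorted)));
v      = -sorted(k);                                   % loss at the (1-c) quantile
cv     = -mean(sorted(1:k));                           % average loss in the left tail
end

Since cv averages returns that are at least as bad as the quantile, this construction makes the property CVaR >= VaR visible directly in the code.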
Chapter 3
Nature Inspired Strategies for Optimization
3.1 Introduction
Computer Science and information technology have always used biological and natural processes as a source of inspiration. Some examples of Nature inspired strategies for problem solving are Artificial Ant Colonies, Swarm Intelligence and Evolutionary Computing.
In this research, special attention is given to two systematic random searches inspired by Nature: Particle Swarm Optimization (PSO) and Genetic Algorithms (GA). Both algorithms are intrinsically parallel, which means that they explore several locations of the solution space at the same time. Many other algorithms for solving the same kind of problems are serial and can only explore the solution space of a problem in one direction at a time. Another notable strength of PSO and GA is that they perform well in problems for which the search space is complex: those where the objective function is discontinuous, noisy, changes over time, or has many local optima. Most practical problems have a vast solution space, which is impossible to search exhaustively; the challenge then becomes how to avoid the local optima. PSO and GA, by their characteristic of simultaneously exploring different parts of the solution space, are less prone to converge to these local optima. Finally, these algorithms are flexible in handling constraints, which may be implemented more easily compared to 'standard' optimization techniques.
3.2 Particle Swarm Optimization
The Particle Swarm Optimization (PSO) algorithm was introduced by Kennedy & Eberhart (1995) and is based on the behavior of fishes and birds, which collaboratively search an area to find food. It has links to Genetic Algorithms and Evolutionary Computing, but does not suffer from some of the problems found in Genetic Algorithms: the interaction with the group improves the solution instead of detracting from the progress; and PSO has a memory, which is not the case in Genetic Algorithms, where even if elitism is used only a small number of individuals will preserve their identities (Eberhart & Kennedy, 1995).
PSO has a population consisting of various particles, with each particle representing a solution. The particles are initially positioned randomly in the search space (i.e. assigned random values for the weights). It is desirable that the particles are initially well spread, to assure a good exploration and minimize the risk of getting trapped in local optima.
Apart from its position, a particle in PSO also has a velocity, which is also initialized randomly. The velocity determines in which direction a particle will move and how far it will go. The position of particle $i$ in the next iteration is calculated as:

$$pos_{i,d,t+1} = pos_{i,d,t} + vel_{i,d,t+1}, \qquad (3.1)$$

with $vel_{i,d,t+1}$ being the velocity of particle $i$ in dimension $d$ at iteration $t+1$, calculated by:

$$vel_{i,d,t+1} = w_t\,vel_{i,d,t} + c_1 r_1 (pbest_{i,d} - pos_{i,d,t}) + c_2 r_2 (gbest_d - pos_{i,d,t}), \qquad (3.2)$$

where $w_t$ is a weight which gives the particle momentum; $r_1$ and $r_2$ are random numbers; $c_1$ and $c_2$ are scaling constants; $pbest_{i,d}$ is the best position of particle $i$ in dimension $d$ over all previous iterations; $gbest_d$ is the best position of the entire population in dimension $d$ over all previous iterations; and $pos_{i,d,t}$ is the position of particle $i$ in dimension $d$ at time $t$. This formulation is usually referred to as Particle Swarm Optimization with Inertia. $w_t$ is here defined as:

$$w_t = w_{max} - \frac{w_{max} - w_{min}}{N}\,t, \qquad (3.3)$$

with $w_{max}$ and $w_{min}$ being the values of $w_t$ for the first ($t = 0$) and last ($t = N$) iterations respectively, considering $w_{max} \geq w_{min}$. $N$ is the total number of iterations and $t$ the number of the present iteration.
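A minimal Matlab sketch of one velocity-and-position update for the whole swarm, following equations (3.1)-(3.3). The variable names are illustrative (pos, vel, pbest, gbest, c1, c2, wmax, wmin, t and N are assumed to exist), and for simplicity r1 and r2 are drawn uniformly from [0, 1] rather than from the shifted interval discussed below:

% pos, vel, pbest: nParticles-by-nDims matrices; gbest: 1-by-nDims vector
w   = wmax - (wmax - wmin) * t / N;   % linearly decaying inertia weight, eq. (3.3)
r1  = rand(size(pos));                % random factors for the cognitive term
r2  = rand(size(pos));                % random factors for the social term
vel = w * vel + c1 * r1 .* (pbest - pos) ...
    + c2 * r2 .* (repmat(gbest, size(pos, 1), 1) - pos);   % eq. (3.2)
pos = pos + vel;                      % eq. (3.1)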
In equation (3.3) it is possible to see that the influence of the momentum is higher at the beginning of the search, decreasing linearly until the end of the search, when it becomes less important to the direction of flight compared to the influence of $pbest_{i,d}$ and $gbest_d$. Higher values of $w_t$ increase the exploration of the search space, because the direction of flight of the particles will be guided mainly by the previous values of the velocity (remembering that the initial velocities are random). On the other hand, lower values of $w_t$ mean more exploitation of "good" regions in the solution space, because the particles will tend to move in the direction of the best solutions found both locally and globally.
To increase the exploration of the solution space, in this implementation the random numbers $r_1$ and $r_2$ are drawn from independent uniform distributions ranging from $(p_i - 1)$ to $p_i$, with $p_i \in [0, 1]$ and $i \in \{1, 2\}$. For this particular implementation of the algorithm, $p_1$ and $p_2$ were both chosen equal to $0.9$, in such a way that $r_1, r_2 \in [-0.1, 0.9]$. Of course, the better exploration of the solution space comes at the cost of reducing the exploitation of promising areas, as some particles may be diverted in the directions opposite to $pbest_{i,d}$ and/or $gbest_d$.
There are other formulations to calculate the acceleration of the particles, for example the constriction factor method (CFM), proposed by Clerc (1999), or the use of neighborhoods of particles (Eberhart & Kennedy, 1995), but these strategies are not used in this work.
In the case of the portfolio selection problem, a solution consists of a set of weights for the various assets in the portfolio. If there are $N$ possible assets to choose from, then a solution in the search space will contain $N - 1$ weights. Being $\omega_i$ the weight for asset $i$, the weight of the last asset is simply determined by manipulating equation (2.6) into:

$$\omega_N = 1 - \sum_{i=1}^{N-1} \omega_i. \qquad (3.4)$$

Figure 3.1: Feasible solution space for portfolio optimization with 3 assets.

In the specific case with 3 assets, the feasible solution space is depicted in Figure 3.1, corresponding to the triangle with vertices at the points $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$. As one of the weights may be calculated by knowing the other two (see equation (3.4)), the algorithm may concentrate on finding a solution lying in the shaded area of the figure, and calculate $\omega_3$ afterwards.
There are different strategies to make sure that the particles abide by the imposed constraints, and each has its advantages and disadvantages. Two conventional strategies to make sure that all the particles stay within the feasible space are here called bumping and random positioning (Zhang et al., 2004).
The bumping strategy resembles the effect of a bird hitting a window. As the particle reaches the boundary of the feasible space, it 'bumps': the particle stops on the edge of the feasible space and loses all of its velocity. Figure 3.2 shows a particle bumping into a boundary. The initial position of the particle was $p_t$ and the position for the next iteration was calculated to be $p_{t+1}$. However, $p_{t+1}$ is outside the feasible solution space, having $\omega_2 < 0$. To avoid the particle entering the infeasible space, its displacement is scaled down so that it stops exactly on the violated boundary:

$$\omega'_{d,t+1} = \omega_{d,t} + v_d\,\frac{\omega_{i,t}}{\omega_{i,t} - \omega_{i,t+1}} \quad \text{for } d = 1, \ldots, N-1, \qquad (3.6)$$

where $\omega_{d,t}$ is the weight $d$ of the particle in the last iteration; $\omega_{d,t+1}$ is the weight $d$ of the particle without bumping; $\omega'_{d,t+1}$ is the weight after bumping; $v_d$ is the velocity of the particle in dimension $d$; and $i$ is the dimension in which the boundary is crossed (in the example, $i = 2$), so that $\omega'_{i,t+1} = 0$. The weight $\omega'_{N,t+1}$ is calculated by:

$$\omega'_{N,t+1} = 1 - \sum_{i=1}^{N-1} \omega'_{i,t+1}, \qquad (3.7)$$

as usual.
It should be clear by looking at Figure 3.2 that (3.6) comes from triangle similitude, where the distance between $p_t$ and $p_{t+1}$ is the hypotenuse and $\omega_{i,t} - \omega_{i,t+1}$ is one of the catheti.
After bumping, in the next iteration the particle will gain velocity again, starting from velocity zero and applying (3.2). If the gbest is near the edge of the feasible space, bumping makes sure the particles remain near the edge and thus near the gbest. However, because of the loss of velocity caused by bumping into the boundaries, the particles may get 'trapped' at the current gbest and never reach the real global optimum, resulting in a premature convergence to a sub-optimal solution.
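A minimal Matlab sketch of the bumping idea (an illustrative fragment with hypothetical variable names; only the nonnegativity boundaries are handled here, for brevity). When the proposed step would drive some weight negative, the whole displacement is rescaled so that the particle stops on the first violated boundary, and the velocity is zeroed:

% posOld, velStep: 1-by-(N-1) vectors; the proposed new position may be infeasible
posNew = posOld + velStep;
viol   = posNew < 0;                          % dimensions whose boundary was crossed
if any(viol)
    % largest feasible fraction of the step before the first boundary is hit
    lambda  = min(posOld(viol) ./ (posOld(viol) - posNew(viol)));
    posNew  = posOld + lambda * velStep;      % stop on the boundary
    velStep = zeros(size(velStep));           % 'bump': lose all velocity
end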
The random positioning strategy simply changes the negative weight into a random, feasible weight, and normalizes the weights so that they add up to one. This strategy increases the exploration of the search space compared to bumping. The premature convergence which may occur with bumping does not occur here. However, the opposite may be true for random positioning: there may be no convergence to the real global optimum at all. Especially if the optimal solution is near the boundaries of the feasible region, it may happen that particles approaching it will be thrown back into different and worse areas. And when they fly back towards the promising area, if they try to violate a constraint they will be thrown again to a random position. Thus the particles may get stuck in a loop and never be able to come to a stop at the optimal solution.
Hu & Eberhart (2002) propose a different strategy, henceforth called amnesia. At initialization all the particles are feasible solutions, but during the iterations the particles are allowed to cross the boundaries and fly into the infeasible space. However, the particles will not remember the value of the objective function of the solutions found in the infeasible space: if they find a better solution there, it will be recorded neither as pbest nor as gbest. Because of this, pbest and gbest will always lie inside the feasible space and thus the particles will be attracted back there. A downside of amnesia may be the fact that the particles 'waste time' flying in the infeasible space, which in some cases may result in an inefficient search.
Finally, the penalty function strategy may be used, which is well known in optimization problems. It consists in adding a penalty to the evaluated objective function of a particle if it is located outside the feasible space. If the penalty is big enough, the points explored outside the feasible area will not be remembered as optimal solutions. In this implementation, the penalty added to the objective function is 1, which corresponds to a loss equal to 100% of the capital of a portfolio located outside the feasible area.
In the basic crossover, the cut point was chosen to be after the first weight. As the solutions may violate the constraint that the weights add up to one after the swap, the weights must be normalized to get the final offspring. Normalization, however, may bring an extra impact on the weights, because it moves the solution in the search space (and thus increases the randomness), resulting in an increase of the exploration of the space and in a reduction of the exploitation of good solutions. However, as a new offspring's weights can sum up to no more than 2, normalizing the weights will in the worst case result in dividing the weights by 2, which seems acceptable. The nonnegativity constraint, however, will never be violated with this crossover, because nothing is subtracted from the weights.
The whole arithmetic crossover is designed to be used with real encoding, instead of binary encoding. The first offspring is calculated as a fraction $p$ of the values of the genes of parent 1 and $1 - p$ from parent 2, with $p \in [0, 1]$. The second offspring is the opposite, being composed of a fraction $1 - p$ of parent 1 and $p$ of parent 2. Figure 3.5 gives an example of the use of the whole arithmetic crossover, using the same parents as the ones used in Figure 3.4 and supposing that $p = 0.6$. The advantage of this operator is that it automatically respects both constraints: the weights stay positive and add up to one.
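A minimal Matlab sketch of the whole arithmetic crossover (variable names are illustrative). Since both parents' weights are nonnegative and sum to one, any convex combination of them does too, which is why the operator needs no repair step:

% parent1, parent2: 1-by-N weight vectors (nonnegative, summing to one)
p      = 0.6;                               % mixing fraction, p in [0, 1]
child1 = p * parent1 + (1 - p) * parent2;   % convex combination: stays feasible
child2 = (1 - p) * parent1 + p * parent2;   % the complementary combination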
With both crossover methods presented, there is a transfer of information between successful candidate solutions: individuals can benefit from what others have learned, and solutions can be mixed and combined, with the potential to produce an offspring that has the strengths of both its parents. Of course, in the course of evolution it may happen that badly fit individuals are generated, but they may easily be discarded from one generation to another, so eventually the population of solutions will become fitter.
Finally, there is the mutation operator, which is analogous to the mutation of living beings that happens in Nature. The mutation operator in GA causes an alteration in an individual's chromosome. In the way it was implemented here, two genes (weights) are chosen randomly to be altered, by adding a small value to one and subtracting the same value from the other. Figure 3.6 gives an example of the use of the mutation operator, supposing that the second and the fifth genes were chosen to be mutated, with the random value 0.1 chosen to be subtracted from the second gene and added to the fifth gene.
When applying the mutation operator, the two constraints imposed must be respected (no negative weights and the sum of the weights equal to one). To make sure that the weights of a solution add up to one, if a subtraction causes a weight to become negative, or an addition causes a weight to become larger than one, the value of that weight is set to zero (or one, respectively) and all the weights of the solution are normalized.
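A minimal Matlab sketch of this mutation (illustrative names; the transfer amount is drawn at random here as an assumption). Transferring an amount between two randomly chosen genes keeps the sum at one by construction, and the clipping plus renormalization handles the boundary cases described above:

% chrom: 1-by-N weight vector (nonnegative, summing to one)
idx   = randperm(length(chrom));     % random ordering of the gene indices
g1 = idx(1);  g2 = idx(2);           % two distinct genes to mutate
delta = 0.1 * rand;                  % small amount to transfer between them
chrom(g1) = chrom(g1) - delta;
chrom(g2) = chrom(g2) + delta;
chrom = min(max(chrom, 0), 1);       % clip any violation to [0, 1] ...
chrom = chrom / sum(chrom);          % ... and renormalize to sum to one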
To avoid the best solutions being lost throughout the evolution process by the application of crossover and mutation in each generation, the elitism strategy may be used. Elitism assures that the best $n$ solutions of a generation (in this implementation, $n = 1$) are transmitted untouched to the next generation.
3.4 Conclusion
From what was shown in this quick review of the methods, it is expected that, when applying them to the portfolio optimization problem, PSO will present a more focused search than GA, with more emphasis on the exploitation of promising areas than on the exploration of the whole search space, because of the way the positions of the particles are updated based on the global and local optima. However, some extra exploration may be introduced by tuning the values of $p_1$ and $p_2$ mentioned in section 3.2. The use of the random positioning strategy is also a way to increase the exploration of the solution space. It is also expected that PSO will be more dependent on the initial position of the particles: if they are not well spread in the solution space, it is more likely that the algorithm will converge to local optima. Again, the random positioning strategy may contribute to avoiding this pitfall. However, given that the initial position of the particles is 'good', the other boundary handling strategies (bumping, amnesia and penalty function) will probably show a better performance than random positioning, in particular if the global optimum is near the boundaries of the feasible solution space. Amnesia and penalty function should perform similarly, as both allow the particles to fly out of the feasible region, not recording (or penalizing) the solutions found there.
From GA it is expected that the whole arithmetic crossover operator results in a better performance than the basic crossover, because it seems to better preserve the knowledge existent in the parent solutions. The basic crossover will probably introduce more randomness into the search and reduce the exploitation of good regions in the solution space. Based on the literature, a superior performance is expected from neither the tournament nor the roulette wheel selection. It is also expected that GA will show more robustness to initial conditions, due to its higher inherent randomness.
Chapter 4
Experiment Set-Up
The data used in the experiments was downloaded from Yahoo! Finance (http://finance.yahoo.com) and consists of the price series of the stocks composing the Dow Jones Industrial Average (DJIA, http://www.djindexes.com), which includes 30 major stocks in the U.S. market, with the split and dividend multipliers prescribed by the CRSP (Center for Research in Security Prices) standards. Data was gathered from 05/Jan/1987 to 30/May/2006 (approximately 19 years), totaling 4895 data points, but subsets of it were used for the experiments. The adjusted price series was used to calculate the log-returns of the stocks, according to equation (2.19). It was noticed that the historical series downloaded from Yahoo! Finance had some missing data points; to correct this, data extracted from the Datastream database was used (Datastream is a database provided by Thomson Financial, containing stock market data, company accounts and economics time series; http://www.datastream.com/). Datastream data was not used directly because a series adjusted for splits and dividends was not available.
A summary of the data is shown in Tables A.1 and A.2. It is visible that, depending on the horizon chosen, the return and risk associated with the stocks may vary significantly. For Verizon, for example, average daily returns vary from 0.006% when using 10 years of historical data to -0.015% when using only 1 year; and daily risk (measured by the standard deviation of returns) varies from 0.399% to 0.831%, for the 1-year and 10-year horizons respectively. Figure A.1 clearly shows how the distribution of returns changes according to the horizon used.
The particles/chromosomes are initialized according to the following procedure:

1. Generate the vector
$$\vec{s} = \begin{bmatrix} 1 & 2 & \cdots & N \end{bmatrix}^T,$$
where $N$ is the number of assets in the problem.

2. Generate the vector $\vec{s}\,'$ by making a random permutation of the vector $\vec{s}$.

3. Generate the weights in the sequence determined by $\vec{s}\,'$. Each weight is a random number from a uniform distribution, ranging from zero to one minus the sum of all the weights already attributed for that particle.

4. Normalize the resulting vector, making the sum of the weights add up to one.

This way, the first weight generated ($\omega_{s'(1)}$) belongs to the interval $[0, 1]$, the second weight belongs to the interval $[0, 1 - \omega_{s'(1)}]$, the third to $[0, 1 - (\omega_{s'(1)} + \omega_{s'(2)})]$, and so on. It should be noted here that $\omega_{s'(i)}$ refers to the weight indexed by the $i$-th element of the vector $\vec{s}\,'$.
As an example, imagine the case with 3 assets. For a given particle, a possible vector $\vec{s}\,'$ generated could be:

$$\vec{s}\,' = \begin{bmatrix} 3 & 1 & 2 \end{bmatrix}^T,$$

which means that the first weight to be initialized will be the one corresponding to the third asset, then the weight corresponding to the first asset, and finally the weight corresponding to the second asset. The first weight will be a random number in the interval $[0, 1]$, for example $0.7$, resulting in the (incomplete) vector of weights:

$$\vec{\omega} = \begin{bmatrix} 0 & 0 & 0.7 \end{bmatrix}^T.$$

In the next step, the weight corresponding to the first asset is initialized with a random value in the interval $[0, 0.3]$, for example $0.2$. The weight of the second asset is initialized last, with a random number belonging to the interval $[0, 0.1]$, for example $0.05$, resulting in the (non-normalized) vector of weights:

$$\vec{\omega} = \begin{bmatrix} 0.2 & 0.05 & 0.7 \end{bmatrix}^T.$$

To normalize the weights, the vector $\vec{\omega}$ is divided by the sum of all the weights (in this case, $0.95$), resulting in the normalized weight vector:

$$\vec{\omega} = \begin{bmatrix} 0.21 & 0.05 & 0.74 \end{bmatrix}^T.$$
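A minimal Matlab sketch of this initialization for one particle (illustrative; N = 5 is an example value):

N = 5;                            % number of assets (example value)
s = randperm(N);                  % random order in which to draw the weights
w = zeros(1, N);
remaining = 1;                    % upper bound for the next weight to be drawn
for k = 1:N
    w(s(k)) = remaining * rand;   % uniform on [0, remaining]
    remaining = remaining - w(s(k));
end
w = w / sum(w);                   % normalize so the weights sum to one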
Li-Ping et al. (2005) recommend that, for PSO, the population size for a higher dimensional problem should not be larger than 50 particles, and that for a less complicated problem a population of 20 to 30 particles should be enough, with 30 particles being a good choice. It was noticed that 30 particles seemed not to be enough for the problem under analysis, so for most of the experiments the size of the population was chosen to be 50 particles. GA was always compared to PSO using the same number of particles/chromosomes for both algorithms.
The parameters of PSO were tuned empirically, and the following values were chosen:

- $w_{max} = 0.9$,
- $w_{min} = 0.4$,
- $c_1 = 0.5$,
- $c_2 = 0.5$,
- $p_1 = 0.9$,
- $p_2 = 0.9$.

For the penalty function strategy, a value of 1 was chosen to be added to the objective function when the solution was infeasible, which means adding a loss of 100% of the capital invested.
The parameters to be tuned in GA are two probabilities. The first is the probability that a crossover will be performed on two selected chromosomes, and the second is the probability that a mutation will happen in the offspring of two solutions. The probability of a crossover occurring was chosen to be 0.8 (80%), and of a mutation, 0.01 (1%).
It was noticed that the population in the GA algorithm tended to become very similar after a certain number of iterations. It is interesting that the population converges, but at the same time having equal solutions in the population reduces the exploration of the solution space without adding to exploitation. It was then implemented that, when a new generation was calculated, the algorithm checked whether there were redundant individuals, with 4-digit precision. If any redundant individuals were found, they were mutated with a probability of 100%.
For calculating the Value at Risk and the Conditional Value at Risk, a confidence level of 95% was chosen. This means that, taking 2 years of data (500 data points) and applying the historical simulation method, 25 data points will be located to the left of the VaR (see Figure 2.3). Higher confidence levels would require a larger number of data points and could significantly slow down the experiments.
Chapter 5

Empirical Results

When more dimensions (more assets) are added, the complexity increases even more.

Figure 5.1: Contour plot showing the VaR of portfolios composed of stocks of 3M Co., American International Group Inc. and E. I. DuPont de Nemours & Co., calculated using historical simulation with 2 years of data. $\omega_1$ is the weight of the investment in 3M and $\omega_2$ the weight in American International Group, with the weight in DuPont equal to $1 - \omega_1 - \omega_2$. The triangular shaped area with dashed line delimits the feasible solutions area.
order of magnitude of the other strategies, except when using 20 assets, when it was slightly higher than for GA.
PSO using the amnesia or the penalty function strategies showed comparable results, as expected, both in terms of the number of iterations needed to converge and in terms of the errors compared to the global optimum. This is not surprising, as the way they are implemented leads to similar search behavior. The two strategies took longer than the bumping strategy to converge to a solution, because the particles tend to 'waste time' outside the feasible solution space. The quality of the solutions found was comparable to the other algorithms for 5 and 10 assets, and lower than the other strategies for 20 assets, as visible in $\varepsilon_{VaR}$ (higher values of $\varepsilon_{VaR}$ mean lower quality/consistency). This may be due to the fact that in the first iterations, which in general are more exploratory for PSO, many solutions found are not feasible.
PSO using the random positioning strategy had a bad performance compared to the other strategies with respect to the number of iterations needed to converge to a solution. The explanation for this is that, if the solutions are near the boundaries of the feasible solution space, it is very likely that particles approaching the area will eventually try to cross the boundaries and be thrown into a random position away from it. This makes it harder for the algorithm to converge. The average errors with respect to the best solution ($\varepsilon_{VaR}$) are slightly lower than for the other algorithms, due to the increased exploration performed by the strategy, showing a better exploratory power.
Analyzing the performance of the Genetic Algorithms in Table B.2, it is visible that their performance was worse than that of Particle Swarm Optimization in terms of the number of iterations needed to converge. The explanation is the fact that PSO is a much more focused search, while GA presents more randomness. The additional randomness in GA results in a larger exploration of the search space which, especially for a higher dimensional problem (as is the case of the portfolio optimization with 20 assets), results in solutions closer to the global optimum (lower $\varepsilon_{VaR}$). It shows that GA seems less likely to converge to local minima than PSO, with the exception of the random positioning strategy of PSO. However, the performance of the random positioning strategy for PSO seemed to be worse than the performance of GA, especially for a higher number of assets. Comparing the different operators used in GA, the basic crossover performed better than the arithmetic crossover, which is surprising, because it was expected that the arithmetic crossover would better preserve the knowledge existent in the population. However, the whole arithmetic crossover shows an 'averaging effect' on the solutions that makes GA lose part of its exploratory power. Comparing the different selection strategies, the tournament selection performed slightly better than the roulette wheel selection, but no strong conclusion can be drawn. The combination of roulette wheel selection with whole arithmetic crossover showed the worst performance of all the strategies regarding the number of iterations to converge, for all portfolio sizes, and also regarding $\varepsilon_{VaR}$ for 20 assets. The possible explanation for this is that this combination reduces the exploratory capacity of the GA without adding much to the exploitation.
It is also noticeable that, calculating $N'_{it} = \bar{N}_{it} + 2\sigma_N$, which gives the number of iterations needed to converge in 97.7% of the runs (considering a normal distribution of $N_{it}$), GA will not have converged before the limit of 2000 iterations is reached. The values of $N'_{it}$ can be seen in Table B.3. Therefore, the $\varepsilon_{VaR}$ observed for GA in the portfolio optimization with 20 assets may be the result of a brute force search. It may be said, then, that the algorithms may show a better consistency (lower $\varepsilon_{VaR}$), but this comes at a high price (higher $N'_{it}$).
To investigate how the solutions found by the algorithms improve through the iterations, the evolution of the best, average and worst solutions was recorded, shown in Figures C.2 (for PSO with bumping strategy), C.3 (for PSO with amnesia strategy), C.4 (for PSO with random positioning strategy), C.5 (for GA with roulette wheel selection and whole arithmetic crossover), and C.6 (for GA with tournament selection and basic crossover). The largest, average and smallest Euclidean distances between the particles/chromosomes in each iteration are also presented. The Euclidean distance between two points $P = (p_1, \ldots, p_n)$ and $Q = (q_1, \ldots, q_n)$ in the $n$-dimensional space is:

$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}.$$

(a) Good initialization: particles are well spread in the feasible solution space
Chapter 6

Conclusions

In this Thesis, the application of Particle Swarm Optimization and Genetic Algorithms to risk management was shown, in a constrained portfolio optimization problem where no short sales are allowed. The objective function to be minimized was the Value at Risk, calculated using historical simulation.
Several strategies to handle the constraints were implemented and the results were compared. For PSO, the strategies bumping, amnesia, random positioning and penalty function were implemented. For GA, two selection operators (roulette wheel and tournament), two crossover operators (basic crossover and arithmetic crossover), and a mutation operator were implemented.
The results showed that the methods are capable of finding good solutions in a reasonable amount of time. PSO showed itself to be faster than GA, both in terms of the number of iterations and in terms of total running time, which is explained by the more focused search performed by PSO. However, PSO demonstrated to be much more sensitive to the initial position of the particles than GA, and if a bad initialization is made, it is very likely that PSO will not converge to the global optimum.
Regarding the strategies used for PSO, bumping seems to be the best in terms of speed, followed closely by amnesia and penalty function. The random positioning strategy did not perform well in this sense, presenting problems to converge to the best solution, although it was more robust to the initial position of the particles, and its superior exploratory power allowed a good consistency compared to the others.
GA showed itself to be able to find good solutions too, but was worse than PSO in terms of speed. However, its less focused search (larger randomness) makes it less prone to be trapped in local minima, especially if the population is not initialized with the chromosomes well spread in the feasible solution space. The basic crossover showed itself to be better than the whole arithmetic crossover, preserving the diversity of the population and thus the exploration of the solution space. The combination of tournament selection with basic crossover was somewhat better than the other strategies and, on the other hand, the combination of roulette wheel selection with whole arithmetic crossover was the worst.
Tests were also made regarding the number of particles needed to solve the problem, and apparently using a big number of particles increases the consistency of the algorithms in finding the global optimum, but at the cost of a big increase in computational time. Using 50 particles/chromosomes seemed to be enough for problems of up to 20 assets. The bottom line is that the consistency of the algorithms in finding the global optimum can be increased, but this comes at a high price: longer execution times.
As suggestions for further research, it is proposed to:

- Evaluate the performance of the algorithms with the inclusion of criteria to detect convergence on-the-fly.
- Find a reformulation of the optimization problem such that the solution found is less sensitive to the use of different horizons of data, i.e. use a multi-criteria optimization that minimizes the objective function calculated over different horizons of data.
- Investigate how to use PSO/GA to add adaptability to portfolio management when dealing with different horizons.
- Check the effect of including the constriction coefficient proposed by Clerc (1999) on the performance of PSO for risk management.
- Test different encodings of the solutions for GA, for example the more traditional binary encoding.
- Compare PSO and GA with traditional methods.
Appendix A

Summary of the data used

Figure A.1: Distribution of daily returns of Verizon, for two different horizons
Table A.1: Average returns of the companies used in the experiments, for different horizons of the data, with daily frequency
Company 1 year 2 year 5 year 10 year
3M 0.018% 0.002% 0.015% 0.020%
Alcoa 0.026% 0.007% -0.008% 0.016%
Altria 0.018% 0.042% 0.021% 0.021%
American Express 0.024% 0.018% 0.015% 0.027%
Am. International 0.014% -0.016% -0.009% 0.018%
AT&T 0.027% 0.016% -0.009% 0.007%
Boeing 0.047% 0.053% 0.011% 0.013%
Caterpillar 0.075% 0.060% 0.038% 0.029%
Citigroup 0.010% 0.011% 0.005% 0.031%
Coca-Cola 0.001% -0.010% 0.001% 0.001%
DuPont -0.013% 0.005% 0.002% 0.006%
Exxon 0.013% 0.032% 0.014% 0.022%
GE -0.009% 0.013% -0.008% 0.018%
GM -0.020% -0.038% -0.019% -0.001%
Home Depot -0.008% 0.007% -0.008% 0.021%
Honeywell 0.024% 0.022% -0.002% 0.010%
HP 0.061% 0.038% 0.006% 0.011%
IBM 0.009% -0.006% -0.012% 0.022%
Intel -0.071% -0.035% -0.017% 0.013%
Johnson & Johnson -0.017% 0.008% 0.008% 0.019%
JP Morgan 0.036% 0.018% 0.001% 0.013%
McDonalds 0.013% 0.022% 0.006% 0.008%
Merck 0.012% -0.025% -0.020% 0.006%
Microsoft -0.017% 0.002% -0.011% 0.021%
Pfizer -0.025% -0.030% -0.017% 0.014%
Procter & Gamble -0.002% 0.004% 0.022% 0.018%
United 0.029% 0.036% 0.017% 0.028%
Verizon -0.015% -0.003% -0.013% 0.006%
Wal-Mart 0.003% -0.012% 0.000% 0.024%
Walt Disney 0.018% 0.022% 0.000% 0.008%
Table A.2: Standard deviations of the returns of the companies used in the experiments, for different horizons of the data, with daily frequency
Company 1 year 2 year 5 year 10 year
3M 0.429% 0.477% 0.588% 0.708%
Alcoa 0.674% 0.665% 0.908% 0.975%
Altria 0.459% 0.489% 0.735% 0.919%
American Express 0.464% 0.429% 0.797% 0.945%
Am. International 0.425% 0.597% 0.771% 0.823%
AT&T 0.388% 0.400% 0.789% 0.870%
Boeing 0.566% 0.560% 0.845% 0.930%
Caterpillar 0.691% 0.655% 0.787% 0.916%
Citigroup 0.352% 0.377% 0.781% 0.968%
Coca-Cola 0.315% 0.389% 0.542% 0.733%
DuPont 0.470% 0.466% 0.666% 0.829%
Exxon 0.560% 0.550% 0.627% 0.678%
GE 0.369% 0.384% 0.755% 0.809%
GM 1.253% 1.087% 1.021% 0.969%
Home Depot 0.531% 0.535% 0.882% 1.024%
Honeywell 0.537% 0.554% 0.953% 1.024%
HP 0.743% 0.753% 1.095% 1.238%
IBM 0.394% 0.436% 0.709% 0.916%
Intel 0.654% 0.713% 1.143% 1.288%
Johnson & Johnson 0.380% 0.380% 0.581% 0.681%
JP Morgan 0.394% 0.406% 0.903% 1.009%
McDonalds 0.619% 0.551% 0.752% 0.801%
Merck 0.608% 0.918% 0.831% 0.848%
Microsoft 0.541% 0.482% 0.802% 0.990%
Pfizer 0.608% 0.649% 0.737% 0.870%
Procter & Gamble 0.374% 0.402% 0.486% 0.774%
United 0.453% 0.449% 0.817% 0.847%
Verizon 0.399% 0.415% 0.740% 0.831%
Wal-Mart 0.448% 0.422% 0.625% 0.865%
Walt Disney 0.509% 0.505% 0.904% 0.961%
Appendix B
Tables with results
Table B.1: Comparison of different risk measures. Portfolios with the same 5 assets were optimized to minimize VaR, CVaR and the variance. The portfolios obtained were then measured using the different metrics. Minimized indicates which objective function was minimized; Variance, VaR and CVaR indicate the average deviation from the portfolio which minimized these criteria
Minimized Variance VaR CVaR
Variance - 5.56% 0.12%
VaR 0.77% - 0.49%
CVaR 0.09% 5.09% -
Table B.2: Comparison of the consistency of the PSO and GA algorithms for solving the portfolio optimization problem, running with 50 particles/chromosomes and a maximum of 2000 iterations. Na is the number of assets included in the portfolio; Algorithm identifies the algorithm to which the listed results refer; Nit is the average number of iterations that the algorithm took to find a solution within 0.1% of the minimum VaR found in a specific run; σN is the standard deviation of this measure; εVaR is the average error between the VaR found by the algorithm in the different runs and the minimum VaR found over all runs; and σε is the standard deviation of these errors
Na  Algorithm          Nit     σN     εVaR   σε
5   PSO Bumping        44.0    21.0   0.58%  0.79%
5   PSO Amnesia        79.7    31.0   0.55%  0.73%
5   PSO Random         562.7   522.6  0.37%  0.60%
5   PSO Penalty        93.8    54.1   0.37%  0.69%
5   GA Roul./Basic     142.0   168.7  0.52%  0.72%
5   GA Tourn./Basic    172.6   242.1  0.60%  0.78%
5   GA Roul./Arith.    472.6   574.2  0.91%  1.06%
5   GA Tourn./Arith.   269.1   415.0  1.02%  1.02%
10  PSO Bumping        102.1   53.3   3.43%  1.38%
10  PSO Amnesia        163.4   103.8  4.02%  2.30%
10  PSO Random         1473.3  427.5  2.29%  1.41%
10  PSO Penalty        190.4   88.1   3.34%  2.49%
10  GA Roul./Basic     793.0   518.1  2.65%  1.76%
10  GA Tourn./Basic    680.5   507.2  3.37%  1.79%
10  GA Roul./Arith.    1257.0  454.5  3.12%  1.96%
10  GA Tourn./Arith.   808.8   550.8  3.66%  1.76%
20  PSO Bumping        119.1   52.1   5.27%  2.50%
20  PSO Amnesia        320.6   96.8   6.77%  2.31%
20  PSO Random         1798.8  272.7  4.99%  2.40%
20  PSO Penalty        299.6   65.4   6.29%  3.26%
20  GA Roul./Basic     1239.9  506.4  3.62%  2.24%
20  GA Tourn./Basic    1078.4  510.8  3.46%  2.28%
20  GA Roul./Arith.    1615.2  272.6  5.72%  2.05%
20  GA Tourn./Arith.   1298.2  365.8  4.60%  2.78%
Table B.3: Comparison of the speed of the PSO and GA algorithms for solving the portfolio optimization problem, running with 50 particles/chromosomes. Na is the number of assets included in the portfolio; Algorithm identifies the algorithm to which the listed results refer; t/it is the average time per iteration, for the given number of assets and particles; Nit0 is the fixed number of iterations used to solve the problem; and t is the average time in seconds to solve the optimization problem
Na  Algorithm          t/it (ms)  Nit0   t (s)
5   PSO Bumping        34.2       86     2.9
5   PSO Amnesia        33.9       142    4.8
5   PSO Random         33.9       1 608  54.5
5   PSO Penalty        34.5       202    7.0
5   GA Roul./Basic     44.2       479    21.2
5   GA Tourn./Basic    43.7       657    28.7
5   GA Roul./Arith.    43.0       1 621  69.7
5   GA Tourn./Arith.   42.6       1 099  46.8
10  PSO Bumping        46.5       209    9.7
10  PSO Amnesia        46.0       371    17.1
10  PSO Random         46.0       2 328  107.1
10  PSO Penalty        46.6       367    17.1
10  GA Roul./Basic     69.6       1 829  127.3
10  GA Tourn./Basic    69.2       1 695  117.2
10  GA Roul./Arith.    68.3       2 166  148.0
10  GA Tourn./Arith.   68.0       1 910  129.9
20  PSO Bumping        70.8       223    15.8
20  PSO Amnesia        70.1       514    36.0
20  PSO Random         70.5       2 344  165.1
20  PSO Penalty        71.2       430    30.6
20  GA Roul./Basic     121.6      2 253  274.1
20  GA Tourn./Basic    121.3      2 100  254.6
20  GA Roul./Arith.    120.0      2 160  259.3
20  GA Tourn./Arith.   120.1      2 030  243.7
Table B.4: Comparison of the influence of the initial position of the particles on the consistency of the PSO and GA algorithms for solving the portfolio optimization problem, running with 50 particles/chromosomes and a maximum of 2000 iterations. The particles were initially concentrated in one area of the search space. Na is the number of assets included in the portfolio; Algorithm identifies the algorithm to which the listed results refer; Nit is the average number of iterations that the algorithm took to find a solution within 0.1% of the minimum VaR found in a specific run; σN is the standard deviation of this measure; εVaR is the average error between the VaR found by the algorithm in the different runs and the minimum VaR found over all runs; and σε is the standard deviation of these errors
Na  Algorithm          Nit     σN     εVaR    σε
5   PSO Bumping        82.1    80.2   4.02%   4.74%
5   PSO Amnesia        74.2    42.4   2.10%   3.65%
5   PSO Random         262.5   292.1  0.26%   0.47%
5   PSO Penalty        68.3    23.4   2.31%   3.73%
5   GA Roul./Basic     410.3   441.9  0.51%   0.87%
5   GA Tourn./Basic    390.3   519.2  0.96%   1.19%
5   GA Roul./Arith.    621.3   461.9  1.90%   2.11%
5   GA Tourn./Arith.   398.3   427.1  1.26%   1.51%
10  PSO Bumping        202.9   136.0  36.59%  27.92%
10  PSO Amnesia        194.7   76.0   12.72%  12.19%
10  PSO Random         1455.9  441.7  2.63%   1.78%
10  PSO Penalty        208.7   106.6  13.87%  15.46%
10  GA Roul./Basic     787.3   502.4  2.78%   2.36%
10  GA Tourn./Basic    636.0   465.1  4.17%   2.78%
10  GA Roul./Arith.    1087.8  493.3  3.64%   1.99%
10  GA Tourn./Arith.   951.1   499.1  3.78%   2.34%
20  PSO Bumping        107.5   77.4   76.87%  54.68%
20  PSO Amnesia        319.0   90.6   70.13%  38.39%
20  PSO Random         1726.7  142.7  3.92%   2.06%
20  PSO Penalty        320.2   86.0   74.45%  56.72%
20  GA Roul./Basic     1371.2  486.1  5.47%   4.03%
20  GA Tourn./Basic    933.3   389.0  4.56%   2.99%
20  GA Roul./Arith.    1492.9  341.7  6.60%   3.51%
20  GA Tourn./Arith.   1318.0  480.3  6.11%   3.68%
Table B.5: Comparison of the influence of the number of particles on the consistency of the PSO and GA algorithms for solving the portfolio optimization problem, running with 10 assets and a maximum of 2000 iterations. Np is the number of particles/chromosomes used; Algorithm identifies the algorithm to which the listed results refer; Nit is the average number of iterations that the algorithm took to find a solution within 0.1% of the minimum VaR found in a specific run; σN is the standard deviation of this measure; εVaR is the average error between the VaR found by the algorithm in the different runs and the minimum VaR found over all runs; and σε is the standard deviation of these errors
Np   Algorithm          Nit     σN     εVaR   σε
10   PSO Bumping        105.7   44.4   6.54%  2.89%
10   PSO Amnesia        153.6   44.2   6.04%  3.52%
10   PSO Random         1519.5  410.6  3.62%  1.86%
10   PSO Penalty        153.6   37.6   6.26%  3.31%
10   GA Roul./Basic     797.0   539.2  4.62%  2.62%
10   GA Tourn./Basic    863.6   507.4  4.22%  2.27%
10   GA Roul./Arith.    1063.9  451.6  4.09%  2.12%
10   GA Tourn./Arith.   1036.2  493.3  3.78%  2.33%
50   PSO Bumping        102.1   53.3   3.43%  1.38%
50   PSO Amnesia        163.4   103.8  4.02%  2.30%
50   PSO Random         1473.3  427.5  2.29%  1.41%
50   PSO Penalty        190.4   88.1   3.34%  2.49%
50   GA Roul./Basic     793.0   518.1  2.65%  1.76%
50   GA Tourn./Basic    680.5   507.2  3.37%  1.79%
50   GA Roul./Arith.    1257.0  454.5  3.12%  1.96%
50   GA Tourn./Arith.   808.8   550.8  3.66%  1.76%
250  PSO Bumping        58.8    38.9   1.27%  0.97%
250  PSO Amnesia        137.8   68.3   1.55%  1.10%
250  PSO Random         1494.4  341.6  1.81%  1.03%
250  PSO Penalty        126.6   43.8   1.46%  1.04%
250  GA Roul./Basic     650.0   414.0  0.62%  0.88%
250  GA Tourn./Basic    231.9   203.6  1.09%  0.90%
250  GA Roul./Arith.    1341.2  410.9  2.74%  1.12%
250  GA Tourn./Arith.   1047.8  601.7  2.66%  1.04%
Table B.6: Comparison of the consistency of the PSO and GA algorithms for solving the portfolio optimization problem, running with 2000 particles/chromosomes and a maximum of 50 iterations. Na is the number of assets included in the portfolio; Algorithm identifies the algorithm to which the listed results refer; Nit is the average number of iterations that the algorithm took to find a solution within 0.1% of the minimum VaR found in a specific run; σN is the standard deviation of this measure; εVaR is the average error between the VaR found by the algorithm in the different runs and the minimum VaR found over all runs; and σε is the standard deviation of these errors
Na  Algorithm          Nit   σN    εVaR   σε
5   PSO Bumping        13.5  4.4   0.04%  0.08%
5   PSO Amnesia        19.0  3.3   0.10%  0.14%
5   PSO Random         24.1  3.9   0.17%  0.26%
5   PSO Penalty        19.3  3.3   0.20%  0.38%
5   GA Roul./Basic     34.1  9.4   0.53%  0.29%
5   GA Tourn./Basic    19.0  8.6   0.41%  0.39%
5   GA Roul./Arith.    19.2  11.9  1.22%  0.66%
5   GA Tourn./Arith.   16.5  11.4  0.88%  0.64%
10  PSO Bumping        24.2  4.7   1.26%  1.14%
10  PSO Amnesia        34.6  4.0   1.55%  1.35%
10  PSO Random         44.6  4.5   2.08%  1.25%
10  PSO Penalty        35.3  3.2   1.57%  1.26%
10  GA Roul./Basic     41.9  6.3   2.55%  0.97%
10  GA Tourn./Basic    38.1  6.9   1.41%  1.10%
10  GA Roul./Arith.    15.2  10.3  5.85%  2.24%
10  GA Tourn./Arith.   29.7  14.7  3.31%  1.20%
20  PSO Bumping        32.0  3.5   2.87%  2.17%
20  PSO Amnesia        47.1  1.7   3.87%  1.87%
20  PSO Random         48.5  1.8   6.25%  1.80%
20  PSO Penalty        46.8  1.8   3.99%  2.39%
20  GA Roul./Basic     45.6  4.6   5.32%  1.34%
20  GA Tourn./Basic    46.8  2.7   1.97%  1.25%
20  GA Roul./Arith.    22.1  12.4  8.54%  1.69%
20  GA Tourn./Arith.   36.9  12.4  5.99%  1.38%
Table B.7: Comparison of the speed of the PSO and GA algorithms for solving the portfolio optimization problem, running with 2000 particles/chromosomes. Na is the number of assets included in the portfolio; Algorithm identifies the algorithm to which the listed results refer; t/it is the average time per iteration, for the given number of assets and particles; Nit0 is the fixed number of iterations used to solve the problem; and t is the average time in seconds to solve the optimization problem
Na  Algorithm          t/it (s)  Nit0  t (s)
5   PSO Bumping        1.36      23    31
5   PSO Amnesia        1.36      26    35
5   PSO Random         1.36      32    44
5   PSO Penalty        1.39      26    36
5   GA Roul./Basic     6.44      53    341
5   GA Tourn./Basic    6.02      37    223
5   GA Roul./Arith.    6.50      43    279
5   GA Tourn./Arith.   6.02      40    241
10  PSO Bumping        1.85      34    63
10  PSO Amnesia        1.85      43    80
10  PSO Random         1.85      54    100
10  PSO Penalty        1.88      42    79
10  GA Roul./Basic     8.49      55    467
10  GA Tourn./Basic    8.09      52    421
10  GA Roul./Arith.    8.46      36    305
10  GA Tourn./Arith.   8.15      60    489
20  PSO Bumping        2.86      39    112
20  PSO Amnesia        2.85      51    145
20  PSO Random         2.86      53    151
20  PSO Penalty        2.89      51    148
20  GA Roul./Basic     13.31     55    732
20  GA Tourn./Basic    12.97     53    687
20  GA Roul./Arith.    12.56     47    590
20  GA Tourn./Arith.   12.36     62    767
Appendix C
Pictures with results
Figure C.1: Optimal portfolio weights, using different objective functions and different horizons for the data
Figure C.2: Evolution of solutions for a typical run of PSO using the bumping strategy, for 5 assets (3M, Citigroup, Coca-Cola, GM and Microsoft)
Figure C.3: Evolution of solutions for a typical run of PSO using the amnesia strategy, for 5 assets (3M, Citigroup, Coca-Cola, GM and Microsoft)
Figure C.4: Evolution of solutions for a typical run of PSO using the random positioning strategy, for 5 assets (3M, Citigroup, Coca-Cola, GM and Microsoft)
Figure C.5: Evolution of solutions for a typical run of GA using roulette wheel selection and whole arithmetic crossover, for 5 assets (3M, Citigroup, Coca-Cola, GM and Microsoft)
Figure C.6: Evolution of solutions for a typical run of GA using tournament selection and basic crossover, for 5 assets (3M, Citigroup, Coca-Cola, GM and Microsoft)
References
Artzner, P., Delbaen, F., Eber, J.M. & Heath, D. (1999). Coherent measures of risk. Mathematical Finance, 9, 203–228.
Arumugam, M.S. & Rao, M. (2004). Novel hybrid approaches for real coded genetic algorithm to compute the optimal control of a single stage hybrid manufacturing systems. International Journal of Computational Intelligence, 1, 231–249.
Best, P.W. (1998). Implementing Value at Risk. John Wiley & Sons.
Blanco, A., Delgado, M. & Pegalajar, M.C. (2001). A real-coded genetic algorithm for training recurrent neural networks. Neural Networks, 14, 93–105.
Bodie, Z., Kane, A. & Marcus, A.J. (2005). Investments (International Edition).
McGraw-Hill, 6th edn.
Chang, T.J., Meade, N., Beasley, J. & Sharaiha, Y. (2000). Heuristics for cardinality constrained portfolio optimisation. Computers & Operations Research, 1271–1302.
Clerc, M. (1999). The swarm and the queen: Towards a deterministic and adaptive particle swarm optimization. Proceedings of the IEEE Congress on Evolutionary Computation, 1951–1957.
Eberhart, R. & Kennedy, J. (1995). A new optimizer using particle swarm theory. Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 39–43.
Eckhardt, R. (1987). Stan Ulam, John von Neumann, and the Monte Carlo method. Los Alamos Science, Special Issue, 131–137.
Hull, J.C. (2002). Fundamentals of Futures and Options Markets . Prentice Hall, 4th
edn.
Inui, K. & Kijima, M. (2005). On the significance of expected shortfall as a coherent risk measure. Journal of Banking & Finance, 29, 853–864.
Jorion, P. (2001). Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill, 2nd edn.
Kendall, G. & Su, Y. (2005). A particle swarm optimisation approach in the construction of optimal risky portfolios. Proceedings of the 23rd IASTED International Multi-Conference Artificial Intelligence and Applications.
Knight, F.H. (1921). Risk, Uncertainty, and Profit. Hart, Schaffner & Marx; Houghton Mifflin Company, Boston, MA.
Li-Ping, Z., Huan-Jun, Y. & Shang-Xu, H. (2005). Optimal choice of parameters for particle swarm optimization. Journal of Zhejiang University Science, 6A, 528–534.
Markowitz, H.M. (1952). Portfolio selection. Journal of Finance, 7, 77–91.
Szego, G. (2002). Measures of risk. Journal of Banking & Finance, 26, 1253–1272.
Tasche, D. (2002). Expected shortfall and beyond. Journal of Banking & Finance, 26, 1519–1533.
Xia, Y., Liu, B., Wang, S. & Lai, K.K. (2000). A model for portfolio selection with order of expected returns. Computers & Operations Research, 409–422.
Zhang, W.J., Xie, X.F. & Bi, D.C. (2004). Handling boundary constraints for numerical optimization by particle swarm flying in periodic search space. Congress on Evolutionary Computation (CEC), Oregon, USA, 2307–2311.