Column Understanding The Kelly Criterion

Edward O. Thorp: The Kelly Criterion has been used successfully in gambling and in investing. He says it's simple: bet or invest so as to maximize expected growth rate of capital. The details can be mathematically subtle.
Copyright
© Attribution Non-Commercial (BY-NC)

Column overall title: A Mathematician on Wall Street

Column 23

Understanding The Kelly Criterion

by Edward O. Thorp

Copyright 2008

In January 1961 I spoke at the annual meeting of the American

Mathematical Society on “Fortune’s Formula: The Game of Blackjack.” This

announced the discovery of favorable card counting systems for blackjack.

My 1962 book Beat the Dealer explained the detailed theory and practice.

The “optimal” way to bet in favorable situations was an important feature. In

Beat the Dealer I called this, naturally enough, “The Kelly gambling system,”

since I learned about it from the 1956 paper by John L. Kelly. (Claude

Shannon, who refereed the Kelly paper, brought it to my attention in

November of 1960.) I’ve continued to use it successfully in gambling and in

investing. Since 1966 I’ve called it “the Kelly criterion” in my articles. The

rising tide of theory about and practical use of the Kelly Criterion by several

leading money managers received further impetus from William Poundstone’s

readable book about the Kelly Criterion, Fortune’s Formula. (As this title

came from that of my 1961 talk, I was asked to approve the use of the title.)

At a value investor’s conference held in Los Angeles in May, 2007, my son

reported that “everyone” said they were using the Kelly Criterion.

The Kelly Criterion is simple: bet or invest so as to maximize (after

each bet) the expected growth rate of capital, which is equivalent to

maximizing the expected value of the logarithm of wealth. But the details

6/25/2010 1
can be mathematically subtle. Since they’re not covered in Poundstone

(2005) you may wish to refer to my article “The Kelly Criterion in Blackjack,

Sports Betting and the Stock Market,” Handbook of Asset and Liability

Management, Volume I, Zenios and Ziemba editors, Elsevier 2006 (also

available on my website www.EdwardOThorp.com).

Hedge fund manager Mohnish Pabrai, in his new book The Dhandho

Investor, gives examples of the use of the Kelly Criterion for investment

situations. (Pabrai won the bidding for this year’s lunch with Warren Buffett,

paying over $600,000.) Consider his investment in Stewart Enterprises

(pages 108-115). His analysis gave what he believed to be a list of worst

case scenarios and payoffs over the next 24 months which I summarize in

Table 1.

Table 1. Stewart Enterprises, Payoff Within 24 Months

Probability      Return
p1 = 0.80        R1 > 100%
p2 = 0.19        R2 > 0%
p3 = 0.01        R3 = −100%
______________________________
Sum = 1.00

The expected growth rate of capital $g(f)$ if we bet a fraction $f$ of our net worth is

(1) $$g(f) = \sum_{i=1}^{3} p_i \ln(1 + R_i f)$$

where ln means the logarithm to the base e. When we use Table 1 to replace the $p_i$ by their values and the $R_i$ by their lower bounds this gives the conservative estimate for $g(f)$ in equation (2), in which the middle term drops out because $0.19 \ln(1 + 0 \cdot f) = 0$:

(2) $$g(f) = 0.80 \ln(1 + f) + 0.01 \ln(1 - f)$$

Setting $g'(f) = 0$ gives $0.80/(1 + f) = 0.01/(1 - f)$, and solving gives the optimal Kelly fraction $f^* = 0.79/0.81 \approx 0.975$ noted by Pabrai. Not having heard of the Kelly Criterion back in 2000, Pabrai only bet 10% of his fund on Stewart. Would he have bet more, or less, if he had then known about Kelly’s Criterion? Would I have? Not necessarily. Here are some of the many reasons why.
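Before turning to those reasons, the Kelly fraction from equation (2) can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names and the bisection approach are my own illustrative choices, and the scenario list simply encodes the lower bounds from Table 1.

```python
import math

# (probability, return) pairs: the lower bounds from Table 1.
STEWART_LOWER_BOUNDS = [(0.80, 1.0), (0.19, 0.0), (0.01, -1.0)]


def growth_rate(f, scenarios):
    """Expected log growth g(f) = sum of p_i * ln(1 + R_i * f), as in equation (1)."""
    return sum(p * math.log(1.0 + r * f) for p, r in scenarios)


def growth_derivative(f, scenarios):
    """g'(f) = sum of p_i * R_i / (1 + R_i * f)."""
    return sum(p * r / (1.0 + r * f) for p, r in scenarios)


def kelly_fraction(scenarios, lo=0.0, hi=1.0 - 1e-12, iterations=200):
    """Find the root of g'(f) on [lo, hi] by bisection.

    g is concave in f, so g' is decreasing and has at most one root.
    """
    if growth_derivative(lo, scenarios) <= 0:
        return lo  # no favorable bet
    if growth_derivative(hi, scenarios) > 0:
        return hi  # growth still rising at the upper limit
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if growth_derivative(mid, scenarios) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    f_star = kelly_fraction(STEWART_LOWER_BOUNDS)
    print(f"f* = {f_star:.4f}, g(f*) = {growth_rate(f_star, STEWART_LOWER_BOUNDS):.4f}")
```

The upper limit is kept just below 1 because betting the whole bankroll makes $\ln(1 - f)$ undefined when a total loss is possible.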

(1) Opportunity costs. A simplistic example illustrates the idea. Suppose

Pabrai’s portfolio already had one investment which was statistically

independent of Stewart and with the same payoff probabilities. Then, by

symmetry, an optimal strategy is to invest in both equally. Call the optimal

Kelly fraction for each $f^*$. Then $2f^* < 1$ since $2f^* = 1$ has a positive

probability of total loss, which Kelly always avoids. So $f^* < 0.50$. The same

reasoning for $n$ such investments gives $f^* < 1/n$. Hence we need to know

the other investments currently in the portfolio, any candidates for new

investments, and their (joint) properties, in order to find the Kelly optimal

fraction for each new investment, along with possible revisions for existing

investments.
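The two-investment example can be sketched numerically. This is an illustration under stated assumptions only: it takes two independent investments with the Table 1 lower-bound payoffs, puts the same fraction in each, and uses a ternary search of my own choosing to maximize the joint expected log growth. The names are hypothetical.

```python
import math
from itertools import product

# (probability, return) pairs: the lower bounds from Table 1.
SCENARIOS = [(0.80, 1.0), (0.19, 0.0), (0.01, -1.0)]


def joint_growth(f, scenarios=SCENARIOS):
    """Expected log growth when a fraction f goes into each of two
    independent investments with identical payoff distributions."""
    total = 0.0
    for (pa, ra), (pb, rb) in product(scenarios, repeat=2):
        total += pa * pb * math.log(1.0 + f * (ra + rb))
    return total


def best_equal_fraction(lo=0.0, hi=0.5 - 1e-12, iterations=200):
    """Ternary search for the maximizer of the concave joint_growth.

    The upper limit stays below 0.5 because 2f = 1 risks total loss.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if joint_growth(m1) < joint_growth(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    f_each = best_equal_fraction()
    print(f"fraction in each investment: {f_each:.5f}")
    print(f"total invested: {2 * f_each:.5f}")
```

Under these assumptions the fraction for each investment comes out just under 0.5, far below the stand-alone figure of 0.975, which is the point of the opportunity cost argument.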

Pabrai’s discussion (e.g. pp. 78-81) of Buffett’s concentrated bets gives considerable evidence that Buffett thinks like a Kelly investor, citing Buffett bets of 25% to 40% of his net worth on single situations. Since $f^* < 1$ is necessary to avoid total loss, Buffett must be betting more than 0.25 to 0.40 of $f^*$ in these cases. The opportunity cost principle suggests it must be higher, perhaps much higher. Here’s what Buffett himself says, as reported in

http://undergroundvalue.blogspot.com/2008/02/notes-from-buffett-meeting-2152008_23.html,

notes from a Q & A session with business students.

Emory:

With the popularity of “Fortune’s Formula” and the Kelly

Criterion, there seems to be a lot of debate in the value community

regarding diversification vs. concentration. I know where you side

in that discussion, but was curious if you could tell us more about

your process for position sizing or averaging down.

Buffett:

I have 2 views on diversification. If you are a professional

and have confidence, then I would advocate lots of

concentration. For everyone else, if it’s not your game,

participate in total diversification.

If it’s your game, diversification doesn’t make sense. It’s crazy to

put money in your 20th choice rather than your 1st choice. “LeBron

James” analogy. If you have LeBron James on your team, don’t take

him out of the game just to make room for someone else.

Charlie and I operated mostly with 5 positions. If I were

running 50, 100, 200 million, I would have 80% in 5 positions, with

25% for the largest. In 1964 I found a position I was willing to go

heavier into, up to 40%. I told investors they could pull their money

out. None did. The position was American Express after the Salad

Oil Scandal. In 1951 I put the bulk of my net worth into GEICO.

With the spread between the on-the-run versus off-the-run 30 year

Treasury bonds, I would have been willing to put 75% of my

portfolio into it. There were various times I would have gone up to

75%, even in the past few years. If it’s your game and you really

know your business, you can load up.

This supports the assertion in Rachel and (fellow Wilmott columnist)

Bill Ziemba’s new book, Scenarios for Risk Management and Global

Investment Strategies, that Buffett thinks like a Kelly investor when choosing

the size of an investment. They discuss Kelly and investment scenarios at

length.

Computing $f^*$ without considering the available alternative investments is one of the most common oversights I’ve seen in the use of the Kelly Criterion. It is a dangerous error because it generally overestimates $f^*$.

(2) Risk tolerance. As discussed at length in Thorp (2006, op. cit.), “full Kelly” is too risky for the tastes of many, perhaps most, investors. Betting instead $f = cf^*$ with $0 < c < 1$, known as “fractional Kelly,” is much more to their liking. Full Kelly is characterized by drawdowns which are too large for the comfort of many investors.

(3) The “true” scenario is worse than the supposedly conservative lower bound estimate. Then we are inadvertently betting more than $f^*$ and, as discussed in Thorp (2006, op. cit.), we get more risk and less return, a strongly suboptimal result. Betting $f = cf^*$, $0 < c < 1$, gives some protection against this.
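The trade-off in points (2) and (3) can be illustrated with a simple even-money coin toss. This is a sketch under assumptions that are not in the text above: a win probability of 0.55 is chosen only for illustration, and the function names are hypothetical. It compares the expected growth rate at several multiples $c$ of the Kelly fraction.

```python
import math


def coin_growth(f, p):
    """Expected log growth per even-money bet of fraction f, win probability p."""
    q = 1.0 - p
    return p * math.log(1.0 + f) + q * math.log(1.0 - f)


def growth_at_multiples(p, multiples=(0.5, 1.0, 1.5, 2.0)):
    """Growth rate when betting c * f_star for each multiple c.

    For an even-money coin the Kelly fraction is f_star = p - q = 2p - 1.
    """
    f_star = 2.0 * p - 1.0
    return {c: coin_growth(c * f_star, p) for c in multiples}


if __name__ == "__main__":
    for c, g in growth_at_multiples(0.55).items():
        print(f"c = {c:.1f}: growth per bet = {g:+.5f}")
```

In this illustrative case half Kelly keeps roughly three quarters of the full Kelly growth rate, betting one and a half times Kelly grows no faster than half Kelly while taking more risk, and betting twice Kelly gives a slightly negative growth rate.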

(4) Black swans. As fellow Wilmott columnist Nassim Nicholas Taleb has

pointed out so eloquently in his new bestseller The Black Swan, humans tend

not to appreciate the effect of relatively infrequent unexpected high impact

events. Failing to allow for these “black swans,” scenarios often don’t

adequately consider the probabilities of large losses. These large loss

probabilities may substantially reduce $f^*$.
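To see how sensitive the Kelly fraction is to the large-loss probability, consider again the form of equation (2), $g(f) = p_1 \ln(1 + f) + p_3 \ln(1 - f)$. Setting $g'(f) = 0$ gives $f^* = (p_1 - p_3)/(p_1 + p_3)$. The sketch below evaluates this for a few total-loss probabilities. The alternative values of $p_3$ are hypothetical, chosen only for illustration, with the extra probability assumed to come out of the break-even scenario.

```python
def two_outcome_kelly(p_win, p_loss):
    """Kelly fraction for g(f) = p_win*ln(1+f) + p_loss*ln(1-f).

    Any remaining probability is assumed to return exactly 0%.
    """
    return (p_win - p_loss) / (p_win + p_loss)


if __name__ == "__main__":
    # 0.01 is the Table 1 value; the larger values are hypothetical.
    for p_loss in (0.01, 0.02, 0.05, 0.10):
        print(f"p3 = {p_loss:.2f}: f* = {two_outcome_kelly(0.80, p_loss):.3f}")
```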

(5) The “long run.” The Kelly Criterion’s superior properties are

asymptotic, appearing with increasing probability as time increases. For

instance:

As time t tends to infinity the Kelly bettor’s fortune will, with

probability tending to 1, permanently surpass that of any bettor

following an “essentially different” strategy.

The notion of “essentially different” has confounded some well

known quants so I’ll take time here to explore some of its subtleties.

Consider for simplicity repeated tosses of a favorable coin. The outcome of the $n$th trial is $X_n$ where $P(X_n = 1) = p > 1/2$ and $P(X_n = -1) = 1 - p = q > 0$. The $\{X_n\}$ are independent identically distributed random variables. The Kelly fraction is $f^* = p - q = E(X_n) > 0$. The Kelly strategy is to bet a fraction $f_n = f^*$ at each trial $n = 1, 2, \ldots$. Now consider a strategy which bets $g_n$, $n = 1, 2, \ldots$ at each trial with $g_n \ne f^*$ for some $n \le N$ and $g_n = f^*$ thereafter. So the $\{g_n\}$ strategy differs from Kelly on at least one of the first $N$ trials but copies it thereafter. There is a positive probability that $\{g_n\}$ is ahead of Kelly at time $N$, hence ahead for all $n \ge N$. For example consider the sequence of the first $N$ outcomes such that $X_n = 1$ if $g_n > f^*$ and $X_n = -1$ if $g_n \le f^*$. Then for this specific sequence, which has probability $\ge q^N$, $\{g_n\}$ gains more than Kelly for each $n \le N$ where $g_n \ne f^*$, hence exceeds Kelly for all $n \ge N$.
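The favorable sequence in this argument is easy to construct explicitly. The following sketch uses hypothetical numbers ($p = 0.6$, so $f^* = 0.2$, and an arbitrary list of deviating bets) to build the outcome sequence described above and compare the two fortunes after $N$ trials.

```python
def wealth(fractions, outcomes):
    """Fortune after betting fractions[i] on outcome outcomes[i] (+1 win, -1 loss)."""
    v = 1.0
    for f, x in zip(fractions, outcomes):
        v *= 1.0 + f * x
    return v


def favorable_path(deviant_bets, f_star):
    """Outcome sequence from the text: a win whenever the deviant bets
    more than Kelly, a loss whenever it bets Kelly or less."""
    return [1 if g > f_star else -1 for g in deviant_bets]


def path_probability(outcomes, p):
    """Probability of a specific outcome sequence for independent tosses."""
    prob = 1.0
    for x in outcomes:
        prob *= p if x == 1 else 1.0 - p
    return prob


if __name__ == "__main__":
    p, f_star = 0.6, 0.2                  # illustrative coin: f* = p - q
    deviant = [0.30, 0.10, 0.20, 0.25]    # hypothetical non-Kelly bets
    path = favorable_path(deviant, f_star)
    print("outcomes:", path)
    print("deviant fortune:", wealth(deviant, path))
    print("Kelly fortune:  ", wealth([f_star] * len(path), path))
    print("path probability:", path_probability(path, p))
```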

What if instead in this coin tossing example we require that

g n ≠ f ∗ for infinitely many n ? This question arose indirectly about 15

years ago in the newsletter Blackjack Forum when a well known anti

Kellyite, John Leib, challenged a well known blackjack expert with

(approximately) this proposition bet: Leib would produce a strategy

which differed from Kelly at every trial but would (with probability as

close to 1 as you wish), after a finite number of trials, get ahead of

Kelly and stay ahead forever. When I read the challenge I immediately

saw how Leib could win the bet.

Leib’s Paradox: Assuming capital is infinitely divisible, then given $\varepsilon > 0$ there is an $N > 0$ and a sequence $\{f_n\}$ with $f_n \ne f^*$ for all $n$, such that $P(V_n^* < V_n \text{ for all } n \ge N) > 1 - \varepsilon$ where $V_n = \prod_{i=1}^{n} (1 + f_i X_i)$ and $V_n^* = \prod_{i=1}^{n} (1 + f^* X_i)$. Furthermore there is a $b > 1$ such that $P(V_n / V_n^* \ge b,\ n \ge N) > 1 - \varepsilon$ and $P(V_n - V_n^* \to \infty) > 1 - \varepsilon$. That is, for some $N$ there is a non Kelly sequence that beats Kelly “infinitely badly” with probability $1 - \varepsilon$ for all $n \ge N$.

Note: The infinite divisibility of capital can be dealt with as

needed in examples where there is a minimum monetary unit by

choosing a sufficiently large starting capital.

PROOF. The proof has two parts. First we want to establish the assertion for $n = N$. Second we show that once we have an $\{f_n, n \le N\}$ that is ahead of Kelly at $n = N$, we can construct $\{f_n \ne f^*, n > N\}$ to stay ahead.

To see the second part, suppose $V_N > V_N^*$. Then $V_N \ge a + bV_N^*$ for some $a > 0$, $b > 1$. For instance $V_N - V_N^* \ge c > 0$ since there are only a finite number of sequences of outcomes in the first $N$ trials, hence only a finite number with $V_N > V_N^*$. So

$$V_N \ge c + V_N^* \ge c/2 + \left[(c/2) + V_N^*\right] = c/2 + \left[d \operatorname{Max} V_N^* + V_N^*\right] \ge c/2 + (d + 1) V_N^*$$

where $d \operatorname{Max} V_N^* = c/2$ defines $d > 0$ and $\operatorname{Max} V_N^*$ is over all sequences of the first $N$ trials such that $V_N > V_N^*$. Setting $c/2 = a > 0$ and $d + 1 = b > 1$ suffices. Once we have $V_N \ge a + bV_N^*$ we can, for bookkeeping purposes, partition our capital into two parts: $a$ and $bV_N^*$. For $n > N$ we bet $f_n = f^*$ from $bV_N^*$ and an additional amount $a/2^n$ from the $a$ part, for a total which is generally unequal to $f^*$ of our capital. If by chance for some $n$ the total equals $f^*$ of our total capital we simply revise $a/2^n$ to $a/3^n$ for that $n$. The portion $bV_N^*$ will become $bV_n^*$ for $n > N$ and the portion $a$ will never be exhausted so we have $V_n > bV_n^*$ for all $n > N$. Hence, since $P(V_n^* \to \infty) = 1$, we have $P(V_n / V_n^* \ge b) = 1$ from which it follows that $P(V_n - V_n^* \to \infty) = 1$.

To prove the first part, we show how to get ahead of Kelly with

probability 1 − ε within a finite number of trials. The idea is to begin by

betting less than Kelly by a very small amount. If the first outcome is a

loss, then we have more than Kelly and use the strategy from the proof

of the second part to stay ahead. If the first outcome is a win, we’re

behind Kelly and now underbet on the second trial by enough so that a

loss on the second trial will put us ahead of Kelly. We continue this

strategy until either there is a loss and we are ahead of Kelly or until

even betting 0 is not enough to surpass Kelly after a loss. Given any $N$, if our initial underbet is small enough, we can continue this strategy for up to $N$ trials. The probability of the strategy failing is $p^N$, $1/2 < p < 1$. Hence, given $\varepsilon > 0$, we can choose $N$ such that $p^N < \varepsilon$ and the strategy therefore succeeds on or before trial $N$ with probability $1 - p^N > 1 - \varepsilon$.

More precisely: suppose the first $n$ trials are wins and we have bet a fraction $f^* - a_i$ with $a_i > 0$, $i = 1, \ldots, n$, on the $i$th trial. Then

$$\frac{V_n}{V_n^*} = \frac{(1 + f^* - a_1) \cdots (1 + f^* - a_n)}{(1 + f^*) \cdots (1 + f^*)} = \left(1 - \frac{a_1}{1 + f^*}\right) \cdots \left(1 - \frac{a_n}{1 + f^*}\right) > (1 - a_1) \cdots (1 - a_n) \ge 1 - (a_1 + \cdots + a_n)$$

where the last inequality is proven easily by induction (it is an equality when $n = 1$ and strict for $n \ge 2$). Letting $a_1 + \cdots + a_n = a$, so $V_n / V_n^* > 1 - a$, what betting fraction $f^* - b$ will put us ahead of Kelly if the next trial is a loss? A sufficient condition is

$$\frac{V_{n+1}}{V_{n+1}^*} = \frac{V_n}{V_n^*} \cdot \frac{1 - f^* + b}{1 - f^*} > (1 - a)\left(1 + \frac{b}{1 - f^*}\right) > (1 - a)(1 + b) \ge 1 \quad \text{or} \quad b \ge \frac{a}{1 - a}$$

provided $b \le f^*$ and $0 < a < 1$. If $a \le 1/2$ then $b = 2a$ suffices. Proceeding recursively, we have these conditions on the $a_i$: choose $a_1 > 0$. Then $a_{n+1} = 2(a_1 + \cdots + a_n)$, $n = 1, 2, \ldots$ provided all the $a_n \le 1/2$. Letting $f(x) = a_1 x + a_2 x^2 + \cdots$ we get the equation

$$f(x) - a_1 x = 2x f(x)\left(1 + x + x^2 + \cdots\right) = 2x f(x)/(1 - x)$$

whose solution is $f(x) = a_1\left(x + 2\sum_{n=2}^{\infty} 3^{n-2} x^n\right)$ from which $a_n = 2a_1 3^{n-2}$ if $n \ge 2$. Then given $\varepsilon > 0$ and an $N$ such that $p^N < \varepsilon$ it suffices to choose $a_1$ so that $a_N = 2a_1 3^{N-2} \le \min(f^*, 1/2)$. Q.E.D.

Although Leib didn’t have the mathematical background to give

such a proof he understood the idea and indicated this sort of

procedure.
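The underbetting schedule from the proof can be checked with exact arithmetic. The sketch below is illustrative only: it assumes a coin with $p = 3/5$ (so $f^* = 1/5$) and $N = 10$, chooses $a_1$ so that $a_N = \min(f^*, 1/2)$, and verifies that a first loss on any of the $N$ trials leaves the underbettor ahead of Kelly. The function names are hypothetical.

```python
from fractions import Fraction


def underbet_schedule(f_star, n_trials):
    """Underbets a_1..a_N with a_{n+1} = 2*(a_1 + ... + a_n).

    a_1 is chosen so that a_N = 2*a_1*3**(N-2) equals min(f_star, 1/2).
    Requires n_trials >= 2.
    """
    cap = min(f_star, Fraction(1, 2))
    schedule = [cap / (2 * 3 ** (n_trials - 2))]
    while len(schedule) < n_trials:
        schedule.append(2 * sum(schedule))
    return schedule


def ratio_after_first_loss(f_star, schedule, wins_before_loss):
    """V / V* after `wins_before_loss` straight wins followed by one loss,
    when trial i is bet at f_star - schedule[i]."""
    ratio = Fraction(1)
    for a in schedule[:wins_before_loss]:
        ratio *= (1 + f_star - a) / (1 + f_star)
    b = schedule[wins_before_loss]
    ratio *= (1 - f_star + b) / (1 - f_star)
    return ratio


if __name__ == "__main__":
    p = Fraction(3, 5)          # illustrative win probability
    f_star = 2 * p - 1          # Kelly fraction p - q
    n_trials = 10
    schedule = underbet_schedule(f_star, n_trials)
    for k in range(n_trials):
        r = ratio_after_first_loss(f_star, schedule, k)
        print(f"first loss on trial {k + 1}: V/V* = {float(r):.6f}")
    print("probability of N straight wins:", float(p ** n_trials))
```

The strategy fails only if all $N$ trials are wins, which has probability $p^N$; increasing $N$ makes this as small as desired at the cost of a smaller initial underbet $a_1$.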

So far we’ve seen that all sequences which differ from Kelly for

only a finite number of trials, and some sequences which differ

infinitely often (even always), are not essentially different. How can

we tell, then, if a betting sequence is essentially different than Kelly?

Going to a more general setting than coin tossing, assume now for

simplicity that the payoff random variables $X_i$ are independent and

bounded below but not necessarily identically distributed.

At this point we come to an important distinction. In financial applications one commonly assumes that the $f_i$ are constants that are dependent only on the current period payoff random variable (or variables). Such “myopic strategies” might arise, for instance, by selecting a utility function and maximizing expected utility to determine the amount to bet. However, for gambling systems the amount depends on previous outcomes, i.e. $f_n = f_n(X_1, X_2, \ldots, X_{n-1})$, just as it does in the Leib example. As professor Stewart Ethier pointed out, our discussion of “essentially different” is for the constant $f_i$ case. For a more general case, including the Leib example and many of the classical gambling systems, I recommend Ethier’s forthcoming book on the mathematics of gambling, The Doctrine of Chances, Springer-Verlag, Berlin (2008 or 2009).

We assume $E(X_i) > 0$ for all $X_i$ from which it follows that $f_i^* > 0$ for all $i$. As before, $V_n = \prod_{i=1}^{n} (1 + f_i X_i)$ and $V_n^* = \prod_{i=1}^{n} (1 + f_i^* X_i)$ from which $\ln V_n = \sum_{i=1}^{n} \ln(1 + f_i X_i)$ and $\ln V_n^* = \sum_{i=1}^{n} \ln(1 + f_i^* X_i)$. Note from the definition of $f_i^*$ that $E \ln(1 + f_i^* X_i) \ge E \ln(1 + f_i X_i)$, where $E$ denotes the expected value, with equality if and only if $f_i = f_i^*$. Hence

$$E \ln\left(V_n^* / V_n\right) = E \sum_{i=1}^{n} \left\{ \ln(1 + f_i^* X_i) - \ln(1 + f_i X_i) \right\} = \sum_{i=1}^{n} a_i$$

where $a_i \ge 0$ and $a_i = 0$ if and only if $f_i = f_i^*$. This series of non-negative terms either increases to infinity or increases to a finite limit $M$. We say $\{f_i\}$ is essentially different from $\{f_i^*\}$ if and only if $\sum_{i=1}^{n} a_i$ tends to infinity as $n$ increases. Otherwise, $\{f_i\}$ is not essentially different from $\{f_i^*\}$. The basic idea here can be applied to more general settings.
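The definition can be illustrated for the coin toss with two hypothetical strategies, neither of which comes from the text above: one overbets Kelly by a constant 0.1 on every trial, the other overbets by $0.1/i$ on trial $i$. The sketch below, assuming $p = 0.6$, computes the partial sums $\sum a_i$ for each.

```python
import math


def coin_growth(f, p):
    """Expected log growth per even-money bet of fraction f, win probability p."""
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)


def cumulative_shortfall(deviations, p):
    """Sum of a_i = E ln(1 + f* X) - E ln(1 + f_i X), where f_i = f* + d_i."""
    f_star = 2.0 * p - 1.0
    g_star = coin_growth(f_star, p)
    return sum(g_star - coin_growth(f_star + d, p) for d in deviations)


def constant_overbet(n, size=0.1):
    """Deviation of `size` on every one of n trials."""
    return [size] * n


def shrinking_overbet(n, size=0.1):
    """Deviation of size / i on trial i, for i = 1..n."""
    return [size / i for i in range(1, n + 1)]


if __name__ == "__main__":
    p = 0.6  # illustrative win probability, so f* = 0.2
    for n in (100, 1000, 10000):
        s_const = cumulative_shortfall(constant_overbet(n), p)
        s_shrink = cumulative_shortfall(shrinking_overbet(n), p)
        print(f"n = {n:5d}: constant {s_const:9.4f}, shrinking {s_shrink:.6f}")
```

In this illustration the constant overbet has partial sums that grow without bound, so it is essentially different from Kelly. The shrinking overbet differs from Kelly on every trial, yet its partial sums level off, so by the definition above it is not essentially different.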

(6) Given a large fixed goal, e.g. to multiply your capital by 100, or 1000,

the expected time for the Kelly investor to get there tends to be least.

Is a wealth multiple of 100 or 1000 realistic? Indeed. In the 51½ years from 1956 to mid 2007, Warren Buffett has increased his wealth to about $5 × 10^10. If he had $2.5 × 10^4 in 1956, that’s a multiple of 2 × 10^6. We know he had about $2.5 × 10^7 in 1969 so his multiple over the last 38 years is about 2 × 10^3. Even my own efforts, as a late starter on a much smaller scale, have multiplied capital by more than 2 × 10^4 over the 41 years from 1967 to early 2007. I know many investors and hedge fund managers who have achieved such multiples.
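For scale, each of these multiples can be converted to an equivalent compound annual growth rate. The short sketch below does the arithmetic using the multiples and time spans quoted above. The function name is hypothetical, and the figures are rounded as in the text.

```python
import math


def annual_growth_rate(multiple, years):
    """Compound annual rate r such that (1 + r) ** years == multiple."""
    return math.exp(math.log(multiple) / years) - 1.0


if __name__ == "__main__":
    cases = [
        ("Buffett, 1956 to mid 2007", 2e6, 51.5),
        ("Buffett, 1969 to 2007", 2e3, 38),
        ("Thorp, 1967 to early 2007", 2e4, 41),
    ]
    for label, multiple, years in cases:
        rate = annual_growth_rate(multiple, years)
        print(f"{label}: about {100 * rate:.1f}% per year")
```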

The caveat here is that an investor or bettor may not choose to make,

or be able to make, enough Kelly bets for the probability to be “high enough”

for these asymptotic properties to prevail, i.e. he doesn’t have enough

opportunities to make it into this “long run.” In a subsequent article we’ll

explore for which investors Kelly or fractional Kelly may be a more or less

appropriate approach. An important consideration will be the investor’s

expected future wealth multiple.
