Zhidong Bai, Random Matrix Theory and Its Applications

RANDOM MATRIX THEORY AND

ITS APPLICATIONS
Multivariate Statistics and Wireless Communications
LECTURE NOTES SERIES
Institute for Mathematical Sciences, National University of Singapore

Series Editors: Louis H. Y. Chen and Ser Peow Tan


Institute for Mathematical Sciences
National University of Singapore
ISSN: 1793-0758

Published

Vol. 7 Markov Chain Monte Carlo: Innovations and Applications


edited by W. S. Kendall, F. Liang & J.-S. Wang
Vol. 8 Transition and Turbulence Control
edited by Mohamed Gad-el-Hak & Her Mann Tsai
Vol. 9 Dynamics in Models of Coarsening, Coagulation, Condensation
and Quantization
edited by Weizhu Bao & Jian-Guo Liu

Vol. 10 Gabor and Wavelet Frames


edited by Say Song Goh, Amos Ron & Zuowei Shen

Vol. 11 Mathematics and Computation in Imaging Science and


Information Processing
edited by Say Song Goh, Amos Ron & Zuowei Shen

Vol. 12 Harmonic Analysis, Group Representations, Automorphic Forms


and Invariant Theory: In Honor of Roger E. Howe
edited by Jian-Shu Li, Eng-Chye Tan, Nolan Wallach & Chen-Bo Zhu

Vol. 13 Econometric Forecasting and High-Frequency Data Analysis


edited by Roberto S. Mariano & Yiu-Kuen Tse
Vol. 14 Computational Prospects of Infinity Part I: Tutorials
edited by Chitat Chong, Qi Feng, Theodore A Slaman, W Hugh Woodin
& Yue Yang
Vol. 15 Computational Prospects of Infinity Part II: Presented Talks
edited by Chitat Chong, Qi Feng, Theodore A Slaman, W Hugh Woodin
& Yue Yang

Vol. 16 Mathematical Understanding of Infectious Disease Dynamics


edited by Stefan Ma & Yingcun Xia
Vol. 17 Interface Problems and Methods in Biological and Physical Flows
edited by Boo Cheong Khoo, Zhilin Li & Ping Lin
Vol. 18 Random Matrix Theory and Its Applications
edited by Zhidong Bai, Yang Chen & Ying-Chang Liang

*For the complete list of titles in this series, please go to


http://www.worldscibooks.com/series/LNIMSNUS



Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, Vol. 18

RANDOM MATRIX THEORY AND


ITS APPLICATIONS
Multivariate Statistics and Wireless Communications

Editors

Zhidong Bai
National University of Singapore, Singapore
and
Northeast Normal University, P. R. China

Yang Chen
Imperial College London, UK

Ying-Chang Liang
Institute for Infocomm Research, Singapore

World Scientific
NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore
Vol. 18
RANDOM MATRIX THEORY AND ITS APPLICATIONS
Multivariate Statistics and Wireless Communications
Copyright 2009 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.

ISBN-13 978-981-4273-11-4
ISBN-10 981-4273-11-2

Printed in Singapore.



March 17, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) contents-vol18

CONTENTS

Foreword vii

Preface ix

The Stieltjes Transform and its Role in Eigenvalue Behavior of


Large Dimensional Random Matrices
Jack W. Silverstein 1

Beta Random Matrix Ensembles


Peter J. Forrester 27

Future of Statistics
Zhidong Bai and Shurong Zheng 69

The η and Shannon Transforms: A Bridge between Random


Matrices and Wireless Communications
Antonia M. Tulino 95

The Replica Method in Multiuser Communications


Ralf R. Müller 139


FOREWORD

The Institute for Mathematical Sciences at the National University of


Singapore was established on 1 July 2000. Its mission is to foster mathemat-
ical research, both fundamental and multidisciplinary, particularly research
that links mathematics to other disciplines, to nurture the growth of mathe-
matical expertise among research scientists, to train talent for research in
the mathematical sciences, and to serve as a platform for research inter-
action between the scientific community in Singapore and the wider inter-
national community.
The Institute organizes thematic programs which last from one month
to six months. The theme or themes of a program will generally be of
a multidisciplinary nature, chosen from areas at the forefront of current
research in the mathematical sciences and their applications.
Generally, for each program there will be tutorial lectures followed by
workshops at research level. Notes on these lectures are usually made avail-
able to the participants for their immediate benefit during the program. The
main objective of the Institute's Lecture Notes Series is to bring these lectures to a wider audience. Occasionally, the Series may also include the proceedings of workshops and expository lectures organized by the Institute.
The World Scientific Publishing Company has kindly agreed to publish
the Lecture Notes Series. This Volume, Random Matrix Theory and Its
Applications: Multivariate Statistics and Wireless Communications, is the
eighteenth of this Series. We hope that through the regular publication
of these lecture notes the Institute will achieve, in part, its objective of
promoting research in the mathematical sciences and their applications.

February 2009 Louis H. Y. Chen


Ser Peow Tan
Series Editors


PREFACE

Random matrices first appeared in multivariate statistics with the work of Wishart, Hsu and others in the 1930s, and received tremendous impetus in the 1950s and 1960s through the important contributions of Dyson, Gaudin, Mehta and Wigner.
The 1990s and beyond saw a resurgent random matrix theory because
of the rapid development in low-dimensional string theory.
The next high-water mark was the discovery by Tracy and Widom of the probability laws of the extreme eigenvalues of certain families of large random matrices. These turned out to be expressible in terms of particular solutions of Painlevé equations, building on the work of Jimbo, Miwa, Môri, Sato, Mehta, Korepin, Its and others.
The current volume in the IMS series, resulting from a workshop held at the Institute for Mathematical Sciences of the National University of Singapore in 2006, has five extensive lectures on various aspects of random matrix theory and its applications to statistics and wireless communications.
Chapter 1 by Jack Silverstein studies the eigenvalues, in particular the eigenvalue density, of a general class of random matrices on whose entries only mild conditions are imposed, using the Stieltjes transform. This is followed by Chapter 2 by Peter Forrester, which deals with those classes of random matrices for which there is an explicit joint probability density of the eigenvalues, and in which the symmetry parameter β, describing the logarithmic repulsion between the eigenvalues, takes on general values. Chapter 3 by Zhidong Bai is a survey of the future of statistics, taking into account the impact of modern high-speed computing facilities and storage. In the next two chapters one finds applications of random matrix theory to wireless communications, typified by the multi-input multi-output situation commonly found, for example, in mobile phones. Chapter 4 by Antonia Tulino uses the Shannon transform, which is intimately related to the Stieltjes transform discussed in Chapter 1, to compute quantities of interest in wireless communication. In the last chapter, Ralf Müller makes use of the replica method, developed by Edwards and Anderson in their investigation of spin glasses, to tackle multiuser problems in wireless communications.

February 2009 Zhidong Bai


National University of Singapore, Singapore
& Northeast Normal University, P. R. China
Yang Chen
Imperial College London, UK
Ying-Chang Liang
Institute for Infocomm Research, Singapore
Editors

THE STIELTJES TRANSFORM AND ITS ROLE IN


EIGENVALUE BEHAVIOR OF LARGE DIMENSIONAL
RANDOM MATRICES

Jack W. Silverstein
Department of Mathematics
North Carolina State University
Box 8205, Raleigh, North Carolina 27695-8205, USA
E-mail: [email protected]

These lectures introduce the concept of the Stieltjes transform of a measure, an analytic function which uniquely characterizes the measure, and its importance to the spectral behavior of random matrices.

1. Introduction
Let $\mathcal{M}(\mathbb{R})$ denote the collection of all subprobability distribution functions on $\mathbb{R}$. We say $F_n \in \mathcal{M}(\mathbb{R})$ converges vaguely to $F \in \mathcal{M}(\mathbb{R})$ (written $F_n \xrightarrow{v} F$) if $\lim_{n\to\infty} F_n\{[a,b]\} = F\{[a,b]\}$ for all intervals $[a,b]$ whose endpoints $a,b$ are continuity points of $F$. We write $F_n \xrightarrow{D} F$ when $F_n$, $F$ are probability distribution functions (equivalent to $\lim_{n\to\infty} F_n(a) = F(a)$ for all continuity points $a$ of $F$).

For $F \in \mathcal{M}(\mathbb{R})$,
$$m_F(z) \equiv \int \frac{1}{x-z}\,dF(x), \qquad z \in \mathbb{C}^+ \equiv \{z \in \mathbb{C} : \Im z > 0\},$$
is defined as the Stieltjes transform of $F$.


Below are some fundamental properties of Stieltjes transforms:

(1) $m_F$ is an analytic function on $\mathbb{C}^+$.

(2) $\Im\, m_F(z) > 0$.

(3) $|m_F(z)| \le \dfrac{1}{\Im z}$.


(4) For continuity points $a<b$ of $F$,
$$F\{[a,b]\} = \lim_{\eta\to 0^+}\frac{1}{\pi}\int_a^b \Im\, m_F(\xi+i\eta)\,d\xi,$$
since the right hand side
$$=\lim_{\eta\to 0^+}\frac{1}{\pi}\int_a^b\int \frac{\eta}{(x-\xi)^2+\eta^2}\,dF(x)\,d\xi
=\lim_{\eta\to 0^+}\frac{1}{\pi}\int\int_a^b \frac{\eta}{(x-\xi)^2+\eta^2}\,d\xi\,dF(x)$$
$$=\lim_{\eta\to 0^+}\frac{1}{\pi}\int\left[\tan^{-1}\!\left(\frac{b-x}{\eta}\right)-\tan^{-1}\!\left(\frac{a-x}{\eta}\right)\right]dF(x)
=\int I_{[a,b]}\,dF(x) = F\{[a,b]\}.$$

(5) If, for $x_0\in\mathbb{R}$, $\Im\, m_F(x_0) \equiv \lim_{z\in\mathbb{C}^+\to x_0}\Im\, m_F(z)$ exists, then $F$ is differentiable at $x_0$ with derivative $\frac1\pi\,\Im\, m_F(x_0)$ ([9]).
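The inversion formula in property (4) is easy to check numerically. The following sketch (not part of the text; the two-point measure is an arbitrary illustration) recovers the mass a discrete measure assigns to an interval from its Stieltjes transform alone:

```python
import numpy as np

# A two-point probability measure: mass 0.3 at x = 1, mass 0.7 at x = 4.
atoms = np.array([1.0, 4.0])
weights = np.array([0.3, 0.7])

# m_F(z) = sum_k w_k / (x_k - z), evaluated on the grid xi + i*eta.
a, b, eta = 0.0, 2.0, 1e-4          # [a, b] contains only the atom at 1
xi = np.linspace(a, b, 200001)
z = xi + 1j * eta
m_vals = (weights / (atoms - z[:, None])).sum(axis=1)

# F{[a,b]} ~ (1/pi) * integral over [a,b] of Im m_F(xi + i*eta).
mass = m_vals.imag.sum() * (xi[1] - xi[0]) / np.pi
print(mass)  # close to 0.3
```

As $\eta$ shrinks the Poisson kernel concentrates on the atoms inside $[a,b]$, so the computed mass approaches $0.3$.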

Let $S\subset\mathbb{C}^+$ be countable with a cluster point in $\mathbb{C}^+$. Using ([4]), the fact that $F_n\xrightarrow{v}F$ is equivalent to
$$\int f(x)\,dF_n(x)\to\int f(x)\,dF(x)$$
for all continuous $f$ vanishing at $\pm\infty$, and the fact that an analytic function defined on $\mathbb{C}^+$ is uniquely determined by the values it takes on $S$, we have
$$F_n\xrightarrow{v}F \iff m_{F_n}(z)\to m_F(z)\ \text{ for all }z\in S.$$

The fundamental connection to random matrices is: for any Hermitian $n\times n$ matrix $A$, we let $F^A$ denote the empirical distribution function (e.d.f.) of its eigenvalues:
$$F^A(x)=\frac1n(\text{number of eigenvalues of }A\le x).$$
Then
$$m_{F^A}(z)=\frac1n\operatorname{tr}\,(A-zI)^{-1}.$$
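This identity is worth seeing concretely. A minimal sketch (not from the text; the random symmetric matrix is just an example) evaluates $m_{F^A}(z)$ both as an integral against the e.d.f. and as a normalized resolvent trace:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
G = rng.standard_normal((n, n))
A = (G + G.T) / 2          # a Hermitian (here real symmetric) matrix
z = 0.5 + 1.0j             # a point in C+

# Two equivalent evaluations of m_{F^A}(z):
eigs = np.linalg.eigvalsh(A)
m_from_eigs = np.mean(1.0 / (eigs - z))                        # integrate 1/(x-z) against F^A
m_from_trace = np.trace(np.linalg.inv(A - z * np.eye(n))) / n  # (1/n) tr (A - zI)^{-1}
print(m_from_eigs, m_from_trace)   # identical up to rounding
```

The trace form is what makes the Stieltjes transform tractable for random matrices: resolvents obey simple algebraic identities (Lemmas 2.1 and 2.2 below) that eigenvalues themselves do not.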

So, if we have a sequence $\{A_n\}$ of Hermitian random matrices, to show, with probability one, $F^{A_n}\xrightarrow{v}F$ for some $F\in\mathcal{M}(\mathbb{R})$, it is equivalent to show for any $z\in\mathbb{C}^+$
$$\frac1n\operatorname{tr}\,(A_n-zI)^{-1}\to m_F(z)\quad\text{a.s.}$$
The main goal of the lectures is to show the importance of the Stieltjes transform to limiting behavior of certain classes of random matrices. We will begin with an attempt at providing a systematic way to show a.s. convergence of the e.d.f.'s of the eigenvalues of three classes of large dimensional random matrices via the Stieltjes transform approach. Essential properties involved will be emphasized in order to better understand where randomness comes in and where basic properties of matrices are used.

Then it will be shown, via the Stieltjes transform, how the limiting distribution can be numerically constructed, how it can be explicitly (mathematically) derived in some cases, and, in general, how important qualitative information can be inferred. Other results will be reviewed, namely the exact separation properties of eigenvalues, and the distributional behavior of linear spectral statistics.

It is hoped that with this knowledge other ensembles can be explored for possible limiting behavior.
Each theorem below corresponds to a matrix ensemble. For each one the random quantities are defined on a common probability space. They all assume:

For $n=1,2,\ldots$, $X_n=(X^n_{ij})$ is $n\times N$, with $X^n_{ij}\in\mathbb{C}$ identically distributed for all $n,i,j$, independent across $i,j$ for each $n$, $\mathrm{E}|X^1_{11}-\mathrm{E}X^1_{11}|^2=1$, and $N=N(n)$ with $n/N\to c>0$ as $n\to\infty$.

Theorem 1.1. ([6], [8]). Assume:

(a) $T_n=\operatorname{diag}(t^n_1,\ldots,t^n_n)$, $t^n_i\in\mathbb{R}$, and the e.d.f. of $\{t^n_1,\ldots,t^n_n\}$ converges weakly, with probability one, to a nonrandom probability distribution function $H$ as $n\to\infty$.
(b) $A_n$ is a random $N\times N$ Hermitian matrix for which $F^{A_n}\xrightarrow{v}A$, where $A$ is nonrandom (possibly defective).
(c) $X_n$, $T_n$, and $A_n$ are independent.

Let $B_n=A_n+(1/N)X_n^*T_nX_n$. Then, with probability one, $F^{B_n}\xrightarrow{v}F$ as $n\to\infty$, where for each $z\in\mathbb{C}^+$, $m=m_F(z)$ satisfies
$$m=m_A\!\left(z-c\int\frac{t}{1+tm}\,dH(t)\right).\tag{1.1}$$
It is the only solution to (1.1) with positive imaginary part.

Theorem 1.2. ([10], [7]). Assume:

$T_n$ ($n\times n$) is random Hermitian nonnegative definite, independent of $X_n$, with $F^{T_n}\xrightarrow{D}H$ a.s. as $n\to\infty$, $H$ nonrandom.

Let $T_n^{1/2}$ denote any Hermitian square root of $T_n$, and define $B_n=(1/N)T_n^{1/2}X_nX_n^*T_n^{1/2}$. Then, with probability one, $F^{B_n}\xrightarrow{D}F$ as $n\to\infty$, where for each $z\in\mathbb{C}^+$, $m=m_F(z)$ satisfies
$$m=\int\frac{1}{t(1-c-czm)-z}\,dH(t).\tag{1.2}$$
It is the only solution to (1.2) in the set $\{m\in\mathbb{C}:-(1-c)/z+cm\in\mathbb{C}^+\}$.

Theorem 1.3. ([3]). Assume:

$R_n$ ($n\times N$) is random, independent of $X_n$, with $F^{(1/N)R_nR_n^*}\xrightarrow{D}H$ a.s. as $n\to\infty$, $H$ nonrandom.

Let $B_n=(1/N)(R_n+\sigma X_n)(R_n+\sigma X_n)^*$ where $\sigma>0$ is nonrandom. Then, with probability one, $F^{B_n}\xrightarrow{D}F$ as $n\to\infty$, where for each $z\in\mathbb{C}^+$, $m=m_F(z)$ satisfies
$$m=\int\frac{1}{\dfrac{t}{1+\sigma^2cm}-(1+\sigma^2cm)z+\sigma^2(1-c)}\,dH(t).\tag{1.3}$$
It is the only solution to (1.3) in the set $\{m\in\mathbb{C}^+:\Im(mz)\ge 0\}$.

Remark 1.4. In Theorem 1.1, if $A_n=0$ for all $n$ large, then $m_A(z)=-1/z$ and we find that $m_F$ has an inverse
$$z=-\frac1m+c\int\frac{t}{1+tm}\,dH(t).\tag{1.4}$$
Since
$$F^{(1/N)X_n^*T_nX_n}=\left(1-\frac nN\right)I_{[0,\infty)}+\frac nN\,F^{(1/N)T_n^{1/2}X_nX_n^*T_n^{1/2}},$$
we have
$$m_{F^{(1/N)X_n^*T_nX_n}}(z)=-\frac{1-n/N}{z}+\frac nN\,m_{F^{(1/N)T_n^{1/2}X_nX_n^*T_n^{1/2}}}(z),\qquad z\in\mathbb{C}^+,\tag{1.5}$$
so in the limit we have
$$m_{\underline F}(z)=-\frac{1-c}{z}+c\,m_F(z),\tag{1.6}$$
where $\underline F$ denotes the limit of $F^{(1/N)X_n^*T_nX_n}$. Using this identity, it is easy to see that (1.2) and (1.4) are equivalent.
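As a sketch (not from the text), the inverse relation (1.4) can be rearranged into the fixed-point map $m\mapsto -1/\big(z-c\int t\,dH(t)/(1+tm)\big)$, which sends $\mathbb{C}^+$ into itself; iterating it from any point of $\mathbb{C}^+$ converges, for fixed $z\in\mathbb{C}^+$, to $m_{\underline F}(z)$. The three-atom $H$ below is the example used again in Section 6; the starting value and iteration count are arbitrary choices of this sketch.

```python
import numpy as np

# Atoms of H and their masses (the Section 6 example), and the ratio c.
t = np.array([1.0, 3.0, 10.0])
w = np.array([0.2, 0.4, 0.4])
c = 0.1
z = 5.0 + 0.5j

# Iterate m <- -1 / (z - c * int t/(1+tm) dH(t)), a rearrangement of (1.4).
m = -1.0 / z
for _ in range(1000):
    m = -1.0 / (z - c * np.sum(w * t / (1.0 + t * m)))

# m should now satisfy (1.4); recover m_F from identity (1.6).
residual = abs(z - (-1.0 / m + c * np.sum(w * t / (1.0 + t * m))))
m_F = (m + (1.0 - c) / z) / c
print(m, residual)
```

This is the sense in which the limiting distribution can be "numerically constructed" from the Stieltjes transform equation alone, with no matrices simulated.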

2. Why These Theorems are True


We begin with three facts which account for most of why the limiting results
are true, and the appearance of the limiting equations for the Stieltjes
transforms.

Lemma 2.1. For $n\times n$ $A$, $q\in\mathbb{C}^n$, and $t\in\mathbb{C}$ with $A$ and $A+tqq^*$ invertible, we have
$$q^*(A+tqq^*)^{-1}=\frac{1}{1+tq^*A^{-1}q}\,q^*A^{-1}$$
(since $q^*A^{-1}(A+tqq^*)=(1+tq^*A^{-1}q)\,q^*$).

Corollary 2.1. For $q=a+b$, $t=1$ we have
$$a^*(A+(a+b)(a+b)^*)^{-1}=a^*A^{-1}-\frac{a^*A^{-1}(a+b)}{1+(a+b)^*A^{-1}(a+b)}\,(a+b)^*A^{-1}$$
$$=\frac{1+b^*A^{-1}(a+b)}{1+(a+b)^*A^{-1}(a+b)}\,a^*A^{-1}-\frac{a^*A^{-1}(a+b)}{1+(a+b)^*A^{-1}(a+b)}\,b^*A^{-1}.$$

Proof. Using Lemma 2.1, we have
$$(A+(a+b)(a+b)^*)^{-1}-A^{-1}=-(A+(a+b)(a+b)^*)^{-1}(a+b)(a+b)^*A^{-1}
=-\frac{1}{1+(a+b)^*A^{-1}(a+b)}\,A^{-1}(a+b)(a+b)^*A^{-1}.$$
Multiplying both sides on the left by $a^*$ gives the result.

Lemma 2.2. For $n\times n$ $A$ and $B$, with $B$ Hermitian, $z\in\mathbb{C}^+$, $t\in\mathbb{R}$, and $q\in\mathbb{C}^n$, we have
$$\Big|\operatorname{tr}\big[\big((B-zI)^{-1}-(B+tqq^*-zI)^{-1}\big)A\big]\Big|
=\left|t\,\frac{q^*(B-zI)^{-1}A(B-zI)^{-1}q}{1+tq^*(B-zI)^{-1}q}\right|\le\frac{\|A\|}{\Im z}.$$
Proof. The identity follows from Lemma 2.1. We have
$$\left|t\,\frac{q^*(B-zI)^{-1}A(B-zI)^{-1}q}{1+tq^*(B-zI)^{-1}q}\right|
\le\|A\|\,\frac{|t|\,\|(B-zI)^{-1}q\|^2}{|1+tq^*(B-zI)^{-1}q|}.$$
Write $B=\sum_i\lambda_ie_ie_i^*$, its spectral decomposition. Then
$$\|(B-zI)^{-1}q\|^2=\sum_i\frac{|e_i^*q|^2}{|\lambda_i-z|^2}$$
and
$$|1+tq^*(B-zI)^{-1}q|\ge|t|\,\Im\big(q^*(B-zI)^{-1}q\big)=|t|\,\Im z\sum_i\frac{|e_i^*q|^2}{|\lambda_i-z|^2}.$$

Lemma 2.3. For $X=(X_1,\ldots,X_n)^T$ with i.i.d. standardized entries and $C$ an $n\times n$ matrix, we have for any $p\ge 2$
$$\mathrm{E}\,|X^*CX-\operatorname{tr}C|^p\le K_p\Big[\big(\mathrm{E}|X_1|^4\operatorname{tr}CC^*\big)^{p/2}+\mathrm{E}|X_1|^{2p}\operatorname{tr}(CC^*)^{p/2}\Big],$$
where the constant $K_p$ does not depend on $n$, $C$, nor on the distribution of $X_1$. (Proof given in [1].)

From these properties, roughly speaking, we can make observations like the following: for $n\times n$ Hermitian $A$, $q=(1/\sqrt n)(X_1,\ldots,X_n)^T$, with $X_i$ i.i.d. standardized and independent of $A$, and $z\in\mathbb{C}^+$, $t\in\mathbb{R}$,
$$tq^*(A+tqq^*-zI)^{-1}q=\frac{tq^*(A-zI)^{-1}q}{1+tq^*(A-zI)^{-1}q}
=1-\frac{1}{1+tq^*(A-zI)^{-1}q}$$
$$\approx 1-\frac{1}{1+t\,(1/n)\operatorname{tr}\,(A-zI)^{-1}}
\approx 1-\frac{1}{1+t\,m_{F^{A+tqq^*}}(z)}.$$
Making this and other observations rigorous requires technical considerations, the first being truncation and centralization of the elements of $X_n$, and truncation of the eigenvalues of $T_n$ in Theorem 1.2 (not needed in Theorem 1.1) and of $(1/N)R_nR_n^*$ in Theorem 1.3, all at a rate slower than $n$ ($a\ln n$ for some positive $a$ is sufficient). The truncation and centralization steps will be outlined later. We are at this stage able to go through algebraic manipulations, keeping in mind the above three lemmas, and intuitively derive the equations appearing in each of the three theorems. At the same time we can see what technical details need to be worked out.
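The approximation driving the whole argument, $q^*(A-zI)^{-1}q\approx(1/n)\operatorname{tr}(A-zI)^{-1}$ for $q$ independent of $A$, is easy to observe in simulation. A sketch (the Wigner-type choice of $A$ and all sizes are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hermitian A, independent of q.
G = rng.standard_normal((n, n))
A = (G + G.T) / (2.0 * np.sqrt(n))
z = 0.3 + 0.5j
R = np.linalg.inv(A - z * np.eye(n))     # resolvent (A - zI)^{-1}

# q = (1/sqrt(n)) (X_1, ..., X_n)^T with i.i.d. standardized entries.
q = rng.standard_normal(n) / np.sqrt(n)

quad = q @ R @ q             # q^* (A - zI)^{-1} q
avg_trace = np.trace(R) / n  # (1/n) tr (A - zI)^{-1}
print(abs(quad - avg_trace))  # small; Lemma 2.3 makes this quantitative
```

Lemma 2.3 (with $C$ the resolvent, whose spectral norm is at most $1/\Im z$) is what turns this numerical observation into the moment bounds used below.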

Before continuing, two more basic properties of matrices are included here.

Lemma 2.4. Let $z_1,z_2\in\mathbb{C}^+$ with $\min(\Im z_1,\Im z_2)\ge v>0$, let $A$ and $B$ be $n\times n$ with $A$ Hermitian, and let $q\in\mathbb{C}^n$. Then
$$\big|\operatorname{tr}\,B\big((A-z_1I)^{-1}-(A-z_2I)^{-1}\big)\big|\le|z_2-z_1|\,n\,\|B\|\,\frac1{v^2},\quad\text{and}$$
$$\big|q^*B(A-z_1I)^{-1}q-q^*B(A-z_2I)^{-1}q\big|\le|z_2-z_1|\,\|q\|^2\,\|B\|\,\frac1{v^2}.$$

Consider first the $B_n$ in Theorem 1.1. Let $q_i$ denote $1/\sqrt N$ times the $i$th column of $X_n^*$. Then
$$(1/N)X_n^*T_nX_n=\sum_{i=1}^n t_iq_iq_i^*.$$
Let $B_{(i)}=B_n-t_iq_iq_i^*$. For any $z\in\mathbb{C}^+$ and $x\in\mathbb{C}$ we write
$$B_n-zI=A_n-(z-x)I+(1/N)X_n^*T_nX_n-xI.$$
Taking inverses we have
$$(A_n-(z-x)I)^{-1}=(B_n-zI)^{-1}+(A_n-(z-x)I)^{-1}\big((1/N)X_n^*T_nX_n-xI\big)(B_n-zI)^{-1}.$$
Dividing by $N$, taking traces and using Lemma 2.1 we find
$$m_{F^{A_n}}(z-x)-m_{F^{B_n}}(z)
=(1/N)\operatorname{tr}\,(A_n-(z-x)I)^{-1}\Big(\sum_{i=1}^n t_iq_iq_i^*-xI\Big)(B_n-zI)^{-1}$$
$$=(1/N)\sum_{i=1}^n\frac{t_i\,q_i^*(B_{(i)}-zI)^{-1}(A_n-(z-x)I)^{-1}q_i}{1+t_iq_i^*(B_{(i)}-zI)^{-1}q_i}
-x\,(1/N)\operatorname{tr}\,(B_n-zI)^{-1}(A_n-(z-x)I)^{-1}.$$
Notice when $x$ and $q_i$ are independent, Lemmas 2.2, 2.3 give us
$$q_i^*(B_{(i)}-zI)^{-1}(A_n-(z-x)I)^{-1}q_i\approx(1/N)\operatorname{tr}\,(B_n-zI)^{-1}(A_n-(z-x)I)^{-1}.$$
Letting
$$x=x_n=(1/N)\sum_{i=1}^n\frac{t_i}{1+t_i\,m_{F^{B_n}}(z)}$$
we have
$$m_{F^{A_n}}(z-x_n)-m_{F^{B_n}}(z)=(1/N)\sum_{i=1}^n\frac{t_i}{1+t_i\,m_{F^{B_n}}(z)}\,d_i\tag{2.1}$$
where
$$d_i=\frac{1+t_i\,m_{F^{B_n}}(z)}{1+t_iq_i^*(B_{(i)}-zI)^{-1}q_i}\,q_i^*(B_{(i)}-zI)^{-1}(A_n-(z-x_n)I)^{-1}q_i
-(1/N)\operatorname{tr}\,(B_n-zI)^{-1}(A_n-(z-x_n)I)^{-1}.$$


In order to use Lemma 2.3, for each $i$, $x_n$ is replaced by
$$x_{(i)}=(1/N)\sum_{j=1}^n\frac{t_j}{1+t_j\,m_{F^{B_{(i)}}}(z)}.$$

An outline of the remainder of the proof is given. It is easy to argue that if $A$ is the zero measure on $\mathbb{R}$ (that is, almost surely, only $o(N)$ eigenvalues of $A_n$ remain bounded), then the Stieltjes transforms of $F^{A_n}$ and $F^{B_n}$ converge a.s. to zero, the limits obviously satisfying (1.1). So we assume $A$ is not the zero measure. One can then show
$$\delta=\inf_n\Im\big(m_{F^{B_n}}(z)\big)$$
is positive almost surely.

Using Lemma 2.3 ($p=6$ is sufficient) and the fact that all matrix inverses encountered are bounded in spectral norm by $1/\Im z$, we have from standard arguments using Boole's and Chebyshev's inequalities, almost surely,
$$\max_{i\le n}\max\Big[\big|\|q_i\|^2-1\big|,\ \big|q_i^*(B_{(i)}-zI)^{-1}q_i-m_{F^{B_{(i)}}}(z)\big|,\tag{2.2}$$
$$\big|q_i^*(B_{(i)}-zI)^{-1}(A_n-(z-x_{(i)})I)^{-1}q_i-\tfrac1N\operatorname{tr}\,(B_{(i)}-zI)^{-1}(A_n-(z-x_{(i)})I)^{-1}\big|\Big]\to 0\ \text{ as }n\to\infty.$$
Consider now a realization for which (2.2) holds, $\delta>0$, $F^{T_n}\xrightarrow{D}H$, and $F^{A_n}\xrightarrow{v}A$. From Lemma 2.2 and (2.2) we have
$$\max_{i\le n}\max\big[\,|m_{F^{B_n}}(z)-m_{F^{B_{(i)}}}(z)|,\ |m_{F^{B_n}}(z)-q_i^*(B_{(i)}-zI)^{-1}q_i|\,\big]\to 0,\tag{2.3}$$
and subsequently
$$\max_{i\le n}\max\left[\left|\frac{1+t_i\,m_{F^{B_n}}(z)}{1+t_iq_i^*(B_{(i)}-zI)^{-1}q_i}-1\right|,\ |x_n-x_{(i)}|\right]\to 0.\tag{2.4}$$

Therefore, from Lemmas 2.2, 2.4, and (2.2)-(2.4), we get $\max_{i\le n}|d_i|\to 0$, and since
$$\left|\frac{t_i}{1+t_i\,m_{F^{B_n}}(z)}\right|\le\frac1\delta,$$
we conclude from (2.1) that
$$m_{F^{A_n}}(z-x_n)-m_{F^{B_n}}(z)\to 0.$$
Consider a subsequence $\{n_i\}$ on which $m_{F^{B_{n_i}}}(z)$ converges to a number $m$. It follows that
$$x_{n_i}\to c\int\frac{t}{1+tm}\,dH(t).$$
Therefore, $m$ satisfies (1.1). Uniqueness (to be discussed later) gives us, for this realization, $m_{F^{B_n}}(z)\to m$. This event occurs with probability one.
3. The Other Equations

Let us now derive the equation for the matrix $B_n=(1/N)T_n^{1/2}X_nX_n^*T_n^{1/2}$, after the truncation steps have been taken. Let $c_n=n/N$, $q_j=(1/\sqrt n)X_j$ ($X_j$ denoting the $j$th column of $X_n$), $r_j=(1/\sqrt N)T_n^{1/2}X_j$, and $B_{(j)}=B_n-r_jr_j^*$. Fix $z\in\mathbb{C}^+$ and let $m_n(z)=m_{F^{B_n}}(z)$, $\underline m_n(z)=m_{F^{(1/N)X_n^*T_nX_n}}(z)$. By (1.5) we have
$$\underline m_n(z)=-\frac{1-c_n}{z}+c_n\,m_n(z).\tag{3.1}$$
We first derive an identity for $\underline m_n(z)$. Write
$$B_n-zI+zI=\sum_{j=1}^N r_jr_j^*.$$
Taking the inverse of $B_n-zI$ on the right on both sides and using Lemma 2.1, we find
$$I+z(B_n-zI)^{-1}=\sum_{j=1}^N\frac{r_jr_j^*(B_{(j)}-zI)^{-1}}{1+r_j^*(B_{(j)}-zI)^{-1}r_j}.$$
Taking the trace on both sides and dividing by $N$ we have
$$c_n+zc_nm_n(z)=\frac1N\sum_{j=1}^N\frac{r_j^*(B_{(j)}-zI)^{-1}r_j}{1+r_j^*(B_{(j)}-zI)^{-1}r_j}
=\frac1N\sum_{j=1}^N\left(1-\frac{1}{1+r_j^*(B_{(j)}-zI)^{-1}r_j}\right).$$
Therefore
$$\underline m_n(z)=-\frac1N\sum_{j=1}^N\frac{1}{z\big(1+r_j^*(B_{(j)}-zI)^{-1}r_j\big)}.\tag{3.2}$$
Write $B_n-zI-\big(-z\underline m_n(z)T_n-zI\big)=\sum_{j=1}^N r_jr_j^*+z\underline m_n(z)T_n$. Taking inverses and using Lemma 2.1 and (3.2), we have
$$\big(-z\underline m_n(z)T_n-zI\big)^{-1}-(B_n-zI)^{-1}
=\big(-z\underline m_n(z)T_n-zI\big)^{-1}\Big[\sum_{j=1}^N r_jr_j^*+z\underline m_n(z)T_n\Big](B_n-zI)^{-1}$$
$$=-\sum_{j=1}^N\frac{(\underline m_n(z)T_n+I)^{-1}r_jr_j^*(B_{(j)}-zI)^{-1}}{z\big(1+r_j^*(B_{(j)}-zI)^{-1}r_j\big)}
+\frac1N\sum_{j=1}^N\frac{(\underline m_n(z)T_n+I)^{-1}T_n(B_n-zI)^{-1}}{z\big(1+r_j^*(B_{(j)}-zI)^{-1}r_j\big)}.$$
Taking the trace and dividing by $n$ we find
$$\frac1n\operatorname{tr}\big(-z\underline m_n(z)T_n-zI\big)^{-1}-m_n(z)
=-\frac1N\sum_{j=1}^N\frac{d_j}{z\big(1+r_j^*(B_{(j)}-zI)^{-1}r_j\big)}$$
where
$$d_j=q_j^*T_n^{1/2}(B_{(j)}-zI)^{-1}(\underline m_n(z)T_n+I)^{-1}T_n^{1/2}q_j
-\frac1n\operatorname{tr}\,(\underline m_n(z)T_n+I)^{-1}T_n(B_n-zI)^{-1}.$$


The derivation for Theorem 1.3 will proceed in a constructive way. Here we let $x_j$ and $r_j$ denote, respectively, the $j$th columns of $X_n$ and $R_n$ (after truncation). As before $m_n=m_{F^{B_n}}$, and let
$$\underline m_n(z)=m_{F^{(1/N)(R_n+\sigma X_n)^*(R_n+\sigma X_n)}}(z).$$
We have again the relationship (3.1). Notice then that equation (1.3) can be written
$$m=\int\frac{1}{\dfrac{t}{1+\sigma^2cm}-\sigma^2z\underline m-z}\,dH(t)\tag{3.3}$$
where
$$\underline m=-\frac{1-c}{z}+cm.$$
Let $B_{(j)}=B_n-(1/N)(r_j+\sigma x_j)(r_j+\sigma x_j)^*$. Then, as in (3.2), we have
$$\underline m_n(z)=-\frac1N\sum_{j=1}^N\frac{1}{z\big(1+(1/N)(r_j+\sigma x_j)^*(B_{(j)}-zI)^{-1}(r_j+\sigma x_j)\big)}.\tag{3.4}$$

Pick $z\in\mathbb{C}^+$. For any $n\times n$ $Y_n$ we write
$$B_n-zI-(Y_n-zI)=\frac1N\sum_{j=1}^N(r_j+\sigma x_j)(r_j+\sigma x_j)^*-Y_n.$$
Taking inverses, dividing by $n$ and using Lemma 2.1 we get
$$\frac1n\operatorname{tr}\,(Y_n-zI)^{-1}-m_n(z)
=\frac1N\sum_{j=1}^N\frac{\frac1n(r_j+\sigma x_j)^*(B_{(j)}-zI)^{-1}(Y_n-zI)^{-1}(r_j+\sigma x_j)}{1+\frac1N(r_j+\sigma x_j)^*(B_{(j)}-zI)^{-1}(r_j+\sigma x_j)}$$
$$\qquad-\frac1n\operatorname{tr}\,(Y_n-zI)^{-1}Y_n(B_n-zI)^{-1}.$$
The goal is to determine $Y_n$ so that each term goes to zero. Notice first that
$$\frac1n\,x_j^*(B_{(j)}-zI)^{-1}(Y_n-zI)^{-1}x_j\approx\frac1n\operatorname{tr}\,(B_n-zI)^{-1}(Y_n-zI)^{-1},$$
so from (3.4) we see that $Y_n$ should have a term $-\sigma^2z\underline m_n(z)I$.
Since for any $n\times n$ $C$ bounded in norm
$$\big|(1/n)x_j^*Cr_j\big|^2=(1/n^2)\,x_j^*Cr_jr_j^*C^*x_j,$$
we have from Lemma 2.3
$$\big|(1/n)x_j^*Cr_j\big|^2\approx(1/n^2)\operatorname{tr}\,Cr_jr_j^*C^*=(1/n^2)\,r_j^*C^*Cr_j=o(1)\tag{3.5}$$
(from truncation, $(1/N)\|r_j\|^2\le\ln n$), so the cross terms are negligible.
This leaves us $\frac1n r_j^*(B_{(j)}-zI)^{-1}(Y_n-zI)^{-1}r_j$. Recall Corollary 2.1:
$$a^*(A+(a+b)(a+b)^*)^{-1}
=\frac{1+b^*A^{-1}(a+b)}{1+(a+b)^*A^{-1}(a+b)}\,a^*A^{-1}-\frac{a^*A^{-1}(a+b)}{1+(a+b)^*A^{-1}(a+b)}\,b^*A^{-1}.$$
Identify $a$ with $(1/\sqrt N)r_j$, $b$ with $(1/\sqrt N)\sigma x_j$, and $A$ with $B_{(j)}-zI$. Using Lemmas 2.2, 2.3 and (3.5), we have
$$\frac1n r_j^*(B_n-zI)^{-1}(Y_n-zI)^{-1}r_j
\approx\frac{1+\sigma^2c_nm_n(z)}{1+\frac1N(r_j+\sigma x_j)^*(B_{(j)}-zI)^{-1}(r_j+\sigma x_j)}\cdot\frac1n r_j^*(B_{(j)}-zI)^{-1}(Y_n-zI)^{-1}r_j.$$
Therefore
$$\frac1N\sum_{j=1}^N\frac{\frac1n r_j^*(B_{(j)}-zI)^{-1}(Y_n-zI)^{-1}r_j}{1+\frac1N(r_j+\sigma x_j)^*(B_{(j)}-zI)^{-1}(r_j+\sigma x_j)}
\approx\frac1N\sum_{j=1}^N\frac{\frac1n r_j^*(B_n-zI)^{-1}(Y_n-zI)^{-1}r_j}{1+\sigma^2c_nm_n(z)}$$
$$=\frac{1}{1+\sigma^2c_nm_n(z)}\cdot\frac1n\operatorname{tr}\Big(\frac1N R_nR_n^*(B_n-zI)^{-1}(Y_n-zI)^{-1}\Big).$$
So we should take
$$Y_n=\frac{1}{1+\sigma^2c_nm_n(z)}\,\frac1N R_nR_n^*-\sigma^2z\underline m_n(z)I.$$
Then $(1/n)\operatorname{tr}\,(Y_n-zI)^{-1}$ will approach the right hand side of (3.3).

4. Proof of Uniqueness of (1.1)

For $m\in\mathbb{C}^+$ satisfying (1.1) with $z\in\mathbb{C}^+$ we have
$$m=\int\frac{1}{\tau-z+c\int\frac{t}{1+tm}\,dH(t)}\,dA(\tau),$$
and taking imaginary parts gives
$$\Im\,m=\left(\Im z+c\int\frac{t^2\,\Im m}{|1+tm|^2}\,dH(t)\right)\int\frac{dA(\tau)}{\big|\tau-z+c\int\frac{t}{1+tm}\,dH(t)\big|^2}.\tag{4.1}$$
Suppose $\tilde m\in\mathbb{C}^+$ also satisfies (1.1). Then
$$m-\tilde m=c\int\Big(\frac{t}{1+t\tilde m}-\frac{t}{1+tm}\Big)dH(t)
\int\frac{dA(\tau)}{\big(\tau-z+c\int\frac{t}{1+tm}dH(t)\big)\big(\tau-z+c\int\frac{t}{1+t\tilde m}dH(t)\big)}$$
$$=(m-\tilde m)\,c\int\frac{t^2}{(1+tm)(1+t\tilde m)}\,dH(t)
\int\frac{dA(\tau)}{\big(\tau-z+c\int\frac{t}{1+tm}dH(t)\big)\big(\tau-z+c\int\frac{t}{1+t\tilde m}dH(t)\big)}.\tag{4.2}$$
Using Cauchy-Schwarz and (4.1) we have
$$\left|c\int\frac{t^2\,dH(t)}{(1+tm)(1+t\tilde m)}
\int\frac{dA(\tau)}{\big(\tau-z+c\int\frac{t}{1+tm}dH\big)\big(\tau-z+c\int\frac{t}{1+t\tilde m}dH\big)}\right|$$
$$\le\left(c\int\frac{t^2\,dH(t)}{|1+tm|^2}\int\frac{dA(\tau)}{\big|\tau-z+c\int\frac{t}{1+tm}dH\big|^2}\right)^{1/2}
\left(c\int\frac{t^2\,dH(t)}{|1+t\tilde m|^2}\int\frac{dA(\tau)}{\big|\tau-z+c\int\frac{t}{1+t\tilde m}dH\big|^2}\right)^{1/2}$$
$$=\left(\frac{c\int\frac{t^2\,\Im m}{|1+tm|^2}dH(t)}{\Im z+c\int\frac{t^2\,\Im m}{|1+tm|^2}dH(t)}\right)^{1/2}
\left(\frac{c\int\frac{t^2\,\Im\tilde m}{|1+t\tilde m|^2}dH(t)}{\Im z+c\int\frac{t^2\,\Im\tilde m}{|1+t\tilde m|^2}dH(t)}\right)^{1/2}<1.$$
Therefore, from (4.2) we must have $m=\tilde m$.

5. Truncation and Centralization

We outline here the steps taken to enable us to assume in the proof of Theorem 1.1 that, for each $n$, the $X_{ij}$'s are bounded by a multiple of $\ln n$. The following lemmas are needed.

Lemma 5.1. Let $X_1,\ldots,X_n$ be i.i.d. Bernoulli with $p=\mathrm{P}(X_1=1)<1/2$. Then for any $\varepsilon>0$ such that $p+\varepsilon\le 1/2$ we have
$$\mathrm{P}\Big(\frac1n\sum_{i=1}^nX_i-p\ge\varepsilon\Big)\le e^{-\frac{n\varepsilon^2}{2(p+\varepsilon)}}.$$
Lemma 5.2. Let $A$ be $N\times N$ Hermitian, $Q,\tilde Q$ both $n\times N$, and $T,\tilde T$ both $n\times n$ Hermitian (with $\|\cdot\|$ below denoting the supremum norm on functions). Then
$$\text{(a)}\qquad\|F^{A+Q^*TQ}-F^{A+\tilde Q^*T\tilde Q}\|\le\frac2N\operatorname{rank}(Q-\tilde Q)$$
and
$$\text{(b)}\qquad\|F^{A+Q^*TQ}-F^{A+Q^*\tilde TQ}\|\le\frac1N\operatorname{rank}(T-\tilde T).$$

Lemma 5.3. For rectangular $A$, $\operatorname{rank}(A)\le$ the number of nonzero entries of $A$.

Lemma 5.4. For Hermitian $N\times N$ matrices $A$, $B$,
$$\sum_{i=1}^N\big(\lambda_i^A-\lambda_i^B\big)^2\le\operatorname{tr}\,(A-B)^2.$$

Lemma 5.5. Let $\{f_i\}$ be an enumeration of all continuous functions that take a constant value $\frac1m$ ($m$ a positive integer) on $[a,b]$, where $a,b$ are rational, take the value $0$ on $(-\infty,a-\frac1m]\cup[b+\frac1m,\infty)$, and are linear on each of $[a-\frac1m,a]$, $[b,b+\frac1m]$. Then

(a) for $F_1,F_2\in\mathcal{M}(\mathbb{R})$,
$$D(F_1,F_2)\equiv\sum_{i=1}^\infty\Big|\int f_i\,dF_1-\int f_i\,dF_2\Big|\,2^{-i}$$
is a metric on $\mathcal{M}(\mathbb{R})$ inducing the topology of vague convergence.

(b) For $F_N,G_N\in\mathcal{M}(\mathbb{R})$,
$$\lim_{N\to\infty}\|F_N-G_N\|=0\implies\lim_{N\to\infty}D(F_N,G_N)=0.$$

(c) For empirical distribution functions $F$, $G$ on the (respective) sets $\{x_1,\ldots,x_N\}$, $\{y_1,\ldots,y_N\}$,
$$D^2(F,G)\le\Big(\frac1N\sum_{j=1}^N|x_j-y_j|\Big)^2\le\frac1N\sum_{j=1}^N(x_j-y_j)^2.$$

Let $p_n=\mathrm{P}(|X_{11}|\ge\sqrt n)$. Since the second moment of $X_{11}$ is finite we have
$$np_n=o(1).\tag{5.1}$$


Let $\hat X_{ij}=X_{ij}I_{(|X_{ij}|<\sqrt n)}$ and $\hat B_n=A_n+(1/N)\hat X_n^*T_n\hat X_n$, where $\hat X_n=(\hat X_{ij})$. Then from Lemmas 5.2(a), 5.3, for any positive $\varepsilon$,
$$\mathrm{P}\big(\|F^{B_n}-F^{\hat B_n}\|\ge\varepsilon\big)
\le\mathrm{P}\Big(\frac2N\sum_{ij}I_{(|X_{ij}|\ge\sqrt n)}\ge\varepsilon\Big)
=\mathrm{P}\Big(\frac1{Nn}\sum_{ij}\big(I_{(|X_{ij}|\ge\sqrt n)}-p_n\big)\ge\frac{\varepsilon}{2n}-p_n\Big).$$
Then by Lemma 5.1, for all $n$ large,
$$\mathrm{P}\big(\|F^{B_n}-F^{\hat B_n}\|\ge\varepsilon\big)\le e^{-\frac{\varepsilon N}{16}},$$
which is summable. Therefore
$$\|F^{B_n}-F^{\hat B_n}\|\xrightarrow{\text{a.s.}}0.$$
Let $\tilde B_n=A_n+(1/N)\tilde X_n^*T_n\tilde X_n$ where $\tilde X_n=\hat X_n-\mathrm{E}\hat X_n$. Since $\operatorname{rank}(\mathrm{E}\hat X_n)\le 1$, we have from Lemma 5.2(a)
$$\|F^{\hat B_n}-F^{\tilde B_n}\|\to 0.$$
For $\alpha>0$ define $T_\alpha=\operatorname{diag}\big(t^n_1I_{(|t^n_1|\le\alpha)},\ldots,t^n_nI_{(|t^n_n|\le\alpha)}\big)$, and let $Q$ be any $n\times N$ matrix. If $\alpha$ and $-\alpha$ are continuity points of $H$, we have by Lemma 5.2(b)
$$\|F^{A_n+Q^*T_nQ}-F^{A_n+Q^*T_\alpha Q}\|
\le\frac1N\operatorname{rank}(T_n-T_\alpha)=\frac1N\sum_{i=1}^nI_{(|t^n_i|>\alpha)}\xrightarrow{\text{a.s.}}cH\{[-\alpha,\alpha]^c\}.$$
It follows that if $\alpha=\alpha_n\to\infty$ then
$$\|F^{A_n+Q^*T_nQ}-F^{A_n+Q^*T_{\alpha_n}Q}\|\xrightarrow{\text{a.s.}}0.$$

Let $\bar X_{ij}=\tilde X_{ij}I_{(|\tilde X_{ij}|<\ln n)}-\mathrm{E}\tilde X_{ij}I_{(|\tilde X_{ij}|<\ln n)}$, $\bar X_n=\big((1/\sqrt N)\bar X_{ij}\big)$, $\check X_{ij}=\tilde X_{ij}-\bar X_{ij}$, and $\check X_n=\big((1/\sqrt N)\check X_{ij}\big)$ (the factor $1/\sqrt N$ being absorbed into the matrices below). Then, from Lemmas 5.5(c) and 5.4 and simple applications of Cauchy-Schwarz, we have
$$D^2\big(F^{A_n+\tilde X_n^*T\tilde X_n},F^{A_n+\bar X_n^*T\bar X_n}\big)
\le\frac1N\operatorname{tr}\big(\tilde X_n^*T\tilde X_n-\bar X_n^*T\bar X_n\big)^2$$
$$\le\frac1N\Big[\operatorname{tr}\big(\check X_n^*T\check X_n\big)^2
+4\operatorname{tr}\big(\check X_n^*T\check X_n\,\bar X_n^*T\bar X_n\big)
+4\Big(\operatorname{tr}\big(\check X_n^*T\check X_n\,\bar X_n^*T\bar X_n\big)\operatorname{tr}\big(\check X_n^*T\check X_n\big)^2\Big)^{1/2}\Big].$$
We have
$$\operatorname{tr}\big(\check X_n^*T\check X_n\big)^2\le\alpha^2\operatorname{tr}\big(\check X_n^*\check X_n\big)^2$$
and
$$\operatorname{tr}\big(\check X_n^*T\check X_n\,\bar X_n^*T\bar X_n\big)
\le\alpha^2\Big(\operatorname{tr}\big(\check X_n^*\check X_n\big)^2\operatorname{tr}\big(\bar X_n^*\bar X_n\big)^2\Big)^{1/2}.$$
Therefore, to verify
$$D\big(F^{A_n+\tilde X_n^*T\tilde X_n},F^{A_n+\bar X_n^*T\bar X_n}\big)\xrightarrow{\text{a.s.}}0$$
it is sufficient to find a sequence $\{\alpha_n\}$ increasing to $\infty$ so that
$$\alpha_n^4\,\frac1N\operatorname{tr}\big(\check X_n^*\check X_n\big)^2\xrightarrow{\text{a.s.}}0
\quad\text{and}\quad\frac1N\operatorname{tr}\big(\bar X_n^*\bar X_n\big)^2=O(1)\ \text{a.s.}$$
The details are omitted.
Notice the matrix $\operatorname{diag}\big(\mathrm{E}|\bar X_{11}|^2t^n_1,\ldots,\mathrm{E}|\bar X_{11}|^2t^n_n\big)$ also satisfies assumption (a) of Theorem 1.1. Just substitute this matrix for $T_n$, and replace $\bar X_n$ by $\big(1/\sqrt{\mathrm{E}|\bar X_{11}|^2}\big)\bar X_n$. Therefore we may assume

(1) $X_{ij}$ are i.i.d. for fixed $n$,
(2) $|X_{11}|\le a\ln n$ for some positive $a$,
(3) $\mathrm{E}X_{11}=0$, $\mathrm{E}|X_{11}|^2=1$.

6. The Limiting Distributions

The Stieltjes transform provides a great deal of information about the nature of the limiting distribution $\underline F$ when $A_n=0$ in Theorem 1.1, and $F$ in Theorems 1.2, 1.3. For the first two,
$$z=-\frac1m+c\int\frac{t}{1+tm}\,dH(t)$$
is the inverse of $m=m_{\underline F}(z)$, the limiting Stieltjes transform of $F^{(1/N)X_n^*T_nX_n}$. Recall, when $T_n$ is nonnegative definite, the relationships between $\underline F$, the limit $F$ of $F^{(1/N)T_n^{1/2}X_nX_n^*T_n^{1/2}}$, and their transforms:
$$\underline F(x)=(1-c)I_{[0,\infty)}(x)+cF(x)$$
and
$$m_{\underline F}(z)=-\frac{1-c}{z}+c\,m_F(z).$$
Based solely on the inverse of $m_{\underline F}$, the following is shown in [9]:

(1) For all $x\in\mathbb{R}$, $x\ne 0$,
$$\lim_{z\in\mathbb{C}^+\to x}m_{\underline F}(z)\equiv\underline m_0(x)$$
exists. The function $\underline m_0$ is continuous on $\mathbb{R}\setminus\{0\}$. Consequently, by property (5) of Stieltjes transforms, $\underline F$ has a continuous derivative $\underline f$ on $\mathbb{R}\setminus\{0\}$ given by $\underline f(x)=\frac1\pi\,\Im\,\underline m_0(x)$ ($F$ subsequently has derivative $f=\frac1c\,\underline f$ there). The density $\underline f$ is analytic (possesses a power series expansion) at every $x\ne 0$ for which $\underline f(x)>0$. Moreover, for these $x$, $\pi\underline f(x)$ is the imaginary part of the unique $m\in\mathbb{C}^+$ satisfying
$$x=-\frac1m+c\int\frac{t}{1+tm}\,dH(t).$$

(2) Let $x_{\underline F}(m)$ denote the above function of $m$. It is defined and analytic on
$$B\equiv\{m\in\mathbb{R}: m\ne 0,\ -m^{-1}\in S_H^c\}$$
($S_G^c$ denoting the complement of the support of the distribution $G$). Then if $x\in S_{\underline F}^c$ we have $m=\underline m_0(x)\in B$ and $x_{\underline F}'(m)>0$. Conversely, if $m\in B$ and $x_{\underline F}'(m)>0$, then $x=x_{\underline F}(m)\in S_{\underline F}^c$.

We see then a systematic way of determining the support of $\underline F$: plot $x_{\underline F}(m)$ for $m\in B$; remove all intervals on the vertical axis corresponding to places where $x_{\underline F}$ is increasing. What remains is $S_{\underline F}$, the support of $\underline F$.
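This recipe can be sketched numerically for a discrete $H$ (the helper names, grid choices and tolerances below are mine, not the text's): evaluate $x_{\underline F}$ on a grid inside $B$ and take the $x$-values at the sign changes of its increments as the support boundaries. As a sanity check, for $H$ a point mass at 1 (the Marcenko-Pastur case, treated in closed form below) the edges must come out as $(1\mp\sqrt c)^2$.

```python
import numpy as np

def x_F(m, t, w, c):
    """x_F(m) = -1/m + c * int t/(1+tm) dH(t) for discrete H (atoms t, masses w)."""
    m = np.asarray(m, dtype=float)
    return -1.0 / m + c * (w * t / (1.0 + np.outer(m, t))).sum(axis=1)

def edge_values(t, w, c, m_lo, m_hi, npts=20001):
    """x_F at interior critical points (sign changes of x_F') on [m_lo, m_hi] in B."""
    m = np.linspace(m_lo, m_hi, npts)
    x = x_F(m, t, w, c)
    dx = np.diff(x)
    return [float(x[i]) for i in range(1, len(dx)) if dx[i - 1] * dx[i] < 0]

# Marcenko-Pastur check: H = point mass at 1, c = 0.1. B excludes m = 0 and m = -1.
c = 0.1
t, w = np.array([1.0]), np.array([1.0])
lower = edge_values(t, w, c, -1.9, -1.01)   # piece of B with m < -1
upper = edge_values(t, w, c, -0.99, -0.30)  # piece of B with -1 < m < 0
print(lower, upper)  # near (1-sqrt(0.1))^2 = 0.4675 and (1+sqrt(0.1))^2 = 1.7325
```

Each relative extremum of $x_{\underline F}$ on $B$ produces one support boundary, exactly as in figure (b) described below.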
Let us look at an example where $H$ places mass at 1, 3, and 10, with respective probabilities .2, .4, and .4, and $c=.1$. Figure (b) on the next page is the graph of
$$x_{\underline F}(m)=-\frac1m+.1\left(.2\,\frac{1}{1+m}+.4\,\frac{3}{1+3m}+.4\,\frac{10}{1+10m}\right).$$

We see the support boundaries occur at relative extreme values. These values were estimated, and for values of $x\in S_F$, $f(x)=\frac1{c\pi}\,\Im\,\underline m_0(x)$ was computed using Newton's method on $x=x_{\underline F}(m)$, resulting in figure (a).

It is possible for a support boundary to occur at a boundary of $B$, which would only happen for a nondiscrete $H$. However, we have:

(3) Suppose the support boundary $a$ is such that $\underline m_0(a)\in B$, and $a$ is a left-endpoint in the support of $\underline F$. Then for $x>a$ and near $a$,
$$\underline f(x)=\Big(\int_a^x g(t)\,dt\Big)^{1/2}$$
where $g(a)>0$ (an analogous statement holds for $a$ a right-endpoint in the support of $\underline F$). Thus, near support boundaries, $\underline f$ and the square root function share common features, as can be seen in figure (a).

It is remarked here that similar results have been obtained for the matrices in Theorem 1.3; see [4].

Explicit solutions can be derived in a few cases. Consider the Marcenko-Pastur distribution, where $T_n=I$. Then $m=\underline m_0(x)$ solves
$$x=-\frac1m+c\,\frac1{1+m},$$
resulting in the quadratic equation
$$xm^2+m(x+1-c)+1=0$$
with solution
$$m=\frac{-(x+1-c)\pm\sqrt{(x+1-c)^2-4x}}{2x}
=\frac{-(x+1-c)\pm\sqrt{x^2-2x(1+c)+(1-c)^2}}{2x}$$
$$=\frac{-(x+1-c)\pm\sqrt{\big(x-(1-\sqrt c)^2\big)\big(x-(1+\sqrt c)^2\big)}}{2x}.$$
We see the imaginary part of $m$ is zero when $x$ lies outside the interval $[(1-\sqrt c)^2,(1+\sqrt c)^2]$, and we conclude that
$$f(x)=\begin{cases}\dfrac{\sqrt{\big(x-(1-\sqrt c)^2\big)\big((1+\sqrt c)^2-x\big)}}{2\pi cx}, & x\in\big((1-\sqrt c)^2,(1+\sqrt c)^2\big)\\[6pt] 0, & \text{otherwise.}\end{cases}$$

The Stieltjes transform in the multivariate F matrix case, that is, when
Tn = ((1/N )X n X n )1 , X n n N containing i.i.d. standardized entries,
n/N c (0, 1), also satises a quadratic equation. Indeed, H now is
the distribution of the reciprocal of a Marcenko-Pastur distributed random
variable which well denote by Xc , the Stieltjes transform of its distribution
denoted by mXc . We have
 1   
1 Xc 1 1
x = + cE = + cE
m 1 + X1  m m Xc + m
c

1
= + cmXc (m).
m
From above we have

1 c (z + 1 c) + (z + 1 c)2 4z
mXc (z) = +
cz 2zc

z + 1 c + (z + 1 c )2 4z
=
2zc
(the square root dened so that the expression is a Stieltjes transform) so
that m = m0 (x) satises
  
1 m + 1 c + (m + 1 c)2 + 4m
x= +c .
m 2mc
It follows that m satises

m2 (c x2 + cx) + m(2c x c2 + c + cx(1 c )) + c + c(1 c ) = 0.


May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

20 J. W. Silverstein

Solving for m we conclude that, with


  2   2
1 1 (1 c)(1 c ) 1 1 (1 c)(1 c )
b1 = b2 =
1 c 1 c

(1c ) (xb1 )(b2 x)
2x(xc +c) b1 < x < b2
f (x) =
0 otherwise.

7. Other Uses of the Stieltjes Transform


We conclude these lectures with two results requiring Stieltjes transforms.
The rst concerns the eigenvalues of matrices in Theorem 1.2 outside the
support of the limiting distribution. The results mentioned so far clearly say
nothing about the possibility of some eigenvalues lingering in this region.
Consider this example with Tn given earlier, but now c = .05. Below is a
scatterplot of the eigenvalues from a simulation with n = 200 (N = 4000),
superimposed on the limiting density.

0.7

0.6

0.5 1 3 10
.2 .4 .4
0.4 c=.05 n=200

0.3

0.2

0.1
...... ...................... ...................................................
0.0
0 2 4 6 8 10 12 14

Here the entries of Xn are N (0, 1). All the eigenvalues appear to stay close
to the limiting support. Such simulations were the prime motivation to
prove
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

Stieltjes Transform and Random Matrices 21

Theorem 7.1. ([1]). Let, for any d > 0 and d.f. G, F d,G denote the lim-
iting e.d.f. of (1/N )Xn Tn Xn corresponding to limiting ratio d and limiting
F Tn G.
Assume in addition to the previous assumptions:
(a) EX11 = 0, E|X11 |2 = 1, and E|X11 |4 < .
(b) Tn is nonrandom and Tn is bounded in n.
(c) The interval [a, b] with a > 0 lies in an open interval outside the support
of F cn ,Hn for all large n, where Hn = F Tn .

Then
P(no eigenvalue of Bn appears in [a, b] for all large n) = 1.

Steps in proof: (1) Let B n = (1/N )Xn Tn Xn mn = mF Bn and m0n =


mF cn ,Hn . Then for z = x + ivn
sup |mn (z) m0n (z)| = o(1/N vn ) a.s.
x[a,b]

when vn = N 1/68 .
(2) The proof of (1) allows (1) to hold for Im(z) = 2vn , 3vn , . . . ,
34vn . Then almost surely

max sup |mn (x + i kvn ) m0n (x + i kvn )| = o(vn67 ).
k{1,...,34} x[a,b]

We take the imaginary part of these Stieltjes transforms and get


 
 d(F B n () F cn ,Hn ()) 
max sup   = o(vn66 ) a.s.
k{1,2...,34} x[a,b] (x )2 + kvn2 

Upon taking dierences we nd with probability one


 
 vn2 d(F B n () F cn ,Hn ()) 
max sup   = o(vn66 )
k1
=k2 x[a,b] ((x ) + k1 vn )((x ) + k2 vn ) 
2 2 2 2

 
 (vn2 )2 d(F B n () F cn ,Hn ()) 
max sup  2 2 2 2 2 2
 = o(vn66 )

k1,k2,k3
distinct x[a,b] ((x) +k 1 vn )((x) +k 2 vn )((x) +k3 vn )

..
.
 
 (vn2 )33 d(F B n () F cn ,Hn ()) 

sup   = o(vn66 ).
x[a,b] ((x)2 +v 2 )((x)2 +2v 2 ) ((x)2 +34v 2 ) 
n n n
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

22 J. W. Silverstein

Thus with probability one


 
 d(F B n () F cn ,Hn ()) 
sup    = o(1) .
x[a,b] ((x ) + vn )((x ) + 2vn ) ((x ) + 34vn ) 
2 2 2 2 2 2

Let 0 < a < a, b > b be such that [a , b ] is also in the open interval
outside the support of F cn ,Hn for all large n. We split up the integral and
get with probability one


 I[a ,b ]c () d(F B n () F cn ,Hn ())
sup 
x[a,b] ((x )2 + vn2 )((x )2 + 2vn2 ) ((x )2 + 34vn2 )
 
vn68 
+  = o(1).
((xj ) +vn )((xj ) +2vn ) ((xj ) +34vn ) 
2 2 2 2 2 2
j [a,b ]

Now if, for each term in a subsequence satisfying the above, there is
at least one eigenvalue contained in [a, b], then the sum, with x evaluated
at these eigenvalues, will be uniformly bounded away from 0. Thus, at
these same x values, the integral must also stay uniformly bounded away
from 0. But the integral MUST converge to zero a.s. since the integrand is
bounded and with probability one, both F B n and F cn ,Hn converge weakly
to the same limit having no mass on {a , b }. Contradiction!
The last result is on the rate of convergence of linear statistics of the
eigenvalues of Bn , that is, quantities of the form

1
n
Bn
f (x)dF (x) = f (i )
n i=1
where f is a function dened on [0, ), and the i s are the eigenvalues of
Bn . The result establishes the rate to be 1/n for analytic f . It considers
integrals of functions with respect to
Gn (x) = n[F Bn (x) F cn ,Hn (x)]
where for any d > 0 and d.f. G, F d,G is the limiting e.d.f. of Bn =
1/2 1/2
(1/N )Tn Xn Xn Tn corresponding to limiting ratio d and limiting F Tn
G.

Theorem 7.2. ([2]). Under the assumptions in Theorem 7.1, Let f1 , . . . , fr


be C 1 functions on R with bounded derivatives, and analytic on an open
interval containing

[lim inf Tmin
n
I(0,1) (c)(1 c)2 , lim sup Tmax
n
(1 + c)2 ].
n n
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

Stieltjes Transform and Random Matrices 23

Let m = mF . Then

(1) the random vector


  
f1 (x) dGn (x), . . . , fr (x) dGn (x) (7.1)

forms a tight sequence in n.


4
(2) If X11 and Tn are real and E(X11 ) = 3, then (7.1) converges weakly to
a Gaussian vector (Xf1 , . . . , Xfr ), with means
  m(z)3 t2 dH(t)
1 c (1+tm(z))3
EXf = f (z)   2 dz (7.2)
2i m(z)2 t2 dH(t)
1c (1+tm(z))2

and covariance function



1 f (z1 )g(z2 ) d d
Cov(Xf , Xg ) = 2 2
m(z1 ) m(z2 )dz1 dz2
2 (m(z1 ) m(z2 )) dz1 dz2
(7.3)
(f, g {f1 , . . . , fr }). The contours in (7.2) and (7.3) (two in (7.3)
which we may assume to be non-overlapping) are closed and are taken
in the positive direction in the complex plane, each enclosing the support
of F c,H .
2
(3) If X11 is complex with E(X11 ) = 0 and E(|X11 |4 ) = 2, then (2) also
holds, except the means are zero and the covariance function is 1/2 the
function given in (7.3).
(4) If the assumptions in (2) or (3) were to hold, then Gn , considered as
a random element in D[0, ) (the space of functions on [0, ) right-
continuous with left-hand limits, together with the Skorohod metric)
cannot form a tight sequence in D[0, ).

The proof relies on the identity


 
1
f (x)dG(x) = f (z)mG (z)dz
2i
(f analytic on the support of G, contour positively oriented around the
support), and establishes the following results on

Mn (z) = n[mF Bn (z) mF cn ,Hn (z)].

(a) {Mn (z)} forms a tight sequence for z in a suciently large contour
about the origin.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

24 J. W. Silverstein

2 4
(b) If X11 is complex with E(X11 ) = 0 and E(X11 ) = 2, then for z1 , . . . , zr
with nonzero imaginary parts,
(Re Mn (z1 ), Im Mn (z1 ), . . . , Re Mn (zr ), Im Mn (zr ))
converges weakly to a mean zero Gaussian vector. It follows that Mn ,
viewed as a random element in the metric space of continuous R2 -
valued functions with domain restricted to a contour in the complex
plane, converges weakly to a (2 dimensional) Gaussian process M . The
limiting covariance function can be derived from the formula
m (z1 )m (z2 ) 1
E(M (z1 )M (z2 )) = 2
.
(m(z1 ) m(z2 )) (z1 z2 )2
4
(c) If X11 is real and E(X11 ) = 3 then (b) still holds, except the limiting
mean can be derived from
 3 t2 dH(t)
c m(1+tm) 3
EM (z) =   m2 t2 dH(t) 2
1c (1+tm)2

and covariance function is twice that of the above function.


The dierence between (2) and (3), and the diculty in extending beyond
these two cases, arise from

E(X1 AX1 tr A)(X1 BX1 tr B)

= (E(|X11 |4 ) |E(X11
2 2
)| 2) 2 2
aii bii + |E(X11 )| tr AB T + tr AB,
i

valid for square matrices A and B.


One can show
   
1 t2 m2 (x)
(7.2) = f (x) arg 1 c dH(t) dx
2 (1 + tm(x))2
and
  
1  m(x) m(y) 
(7.3) = f (x)g (y) ln   dxdy
2 m(x) m(y) 
  
1 mi (x)mi (y)
= f (x)g (y) ln 1 + 4 dxdy
2 2 |m(x) m(y)|2
where mi = m.
For case (2) with H = I[1,) we have for f (x) = ln x and c (0, 1)
1
EXln = ln(1 c) and Var Xln = 2 ln(1 c).
2
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein

Stieltjes Transform and Random Matrices 25

Also, for c > 0


r  2
1 1 r
EXxr = ((1 c)2r + (1 + c)2r ) cj
4 2 j=0 j

and
1 1
r r2    
 k +k
r1 r2 1c 1 2
Cov(Xxr1 , Xxr2 ) = 2cr1 +r2
k1 k2 c
k1 =0 k2 =0

k1
r1   
2r1 1 (k1 +
) 2r2 1 k2 +


.
r1 1 r2 1
=1

(see [5]).

References
1. Z. D. Bai and J. W. Silverstein, No eigenvalues outside the support of the lim-
iting spectral distribution of large-dimensional sample covariance matrices,
Ann. Probab. 26(1) (1998) 316345.
2. Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large
dimensional sample covariance matrices, Ann. Probab. 32(1A) (2004) 553
605.
3. R. B. Dozier and J. W. Silverstein, On the empirical distribution of eigen-
values of large dimensional information-plus-noise type matrices, J. Multi-
variate Anal. 98(4) (2007) 678694.
4. R. B. Dozier and J. W. Silverstein, Analysis of the limiting spectral distri-
bution of large dimensional information-plus-noise type matrices, J. Multi-
variate Anal. 98(6) (2007) 10991122.
5. D. Jonsson, Some limit theorems for the eigenvalues of a sample covariance
matrix, J. Multivariate Anal. 12(1) (1982) 138.
6. V. A. Marcenko and L. A. Pastur, Distribution of eigenvalues for some sets
of random matrices, Math. USSR-Sb. 1 (1967) 457483.
7. J. W. Silverstein, Strong convergence of the empirical distribution of eigen-
values of large dimensional random matrices, J. Multivariate Anal. 55(2)
(1995) 331339.
8. J. W. Silverstein and Z. D. Bai, On the empirical distribution of eigenvalues
of a class of large dimensional random matrices, J. Multivariate Anal. 54(2)
(1995) 175192.
9. J. W. Silverstein and S. I. Choi, Analysis of the limiting spectral distribution
function of large dimensional random matrices, J. Multivariate Anal. 54(2)
(1995) 295309.
10. Y. Q. Yin, Limiting spectral distribution for of random matrices, J. Multi-
variate Anal. 20 (1986) 5068.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

BETA RANDOM MATRIX ENSEMBLES

Peter J. Forrester
Department of Mathematics and Statistics
University of Melbourne, Victoria 3010, Australia
E-mail: [email protected]

In applications of random matrix theory to physics, time reversal sym-


metry implies one of three exponents = 1, 2 or 4 for the repulsion s
between eigenvalues at spacing s, s 0. However, in the corresponding
eigenvalue probability density functions (p.d.f.s), is a natural positive
real variable. The general p.d.f.s have alternative physical interpre-
tations in terms of certain classical log-potential Coulomb systems and
quantum many body systems. Each of these topics is reviewed, along
with the problem of computing correlation functions for general . There
are also random matrix constructions which give rise to these general
p.d.f.s. An inductive approach to the latter topic using explicit formulas
for the zeros of certain random rational functions is given.

1. Introduction
1.1. Log-gas systems
In equilibrium classical statistical mechanics, the control variables are the
absolute temperature T and the particle density . The state of a system can
be calculated by postulating that the probability density function (p.d.f.)
for the event that the particles are at positions ~r1 , . . . , ~rN is proportional to
the Boltzmann factor eU (~r1 ,...,~rN ) . Here U (~r1 , . . . , ~rN ) denotes the total
potential energy of the system, while := 1/kB T with kB denoting Boltz-
manns constant is essentially the inverse temperature. Then for the system
confined to a domain , the canonical average of any function f (~r1 , . . . , ~rN )
(for example the energy itself) is given by
Z Z
1
hf i := d~r1 d~rN f (~r1 , . . . , ~rN )eU (~r1 ,...,~rN ) ,
ZN

27
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

28 P. J. Forrester

where
Z Z
ZN = d~r1 d~rN eU (r1 ,...,rN ) . (1.1)
I I
In the so called thermodynamic limit, N, || , N/|| = fixed,
such averages can display non-analytic behavior indicative of a phase tran-
sition.
The most common situation is when the potential energy U consists of
a sum of one body and two body terms,
N
X X
U (~r1 , . . . , ~rN ) = V1 (~rj ) + V2 (|~rk ~rj |).
j=1 1j<kN

Our interest is when the pair potential V2 is logarithmic,


V2 (|~rk ~rj |) = log |~rk ~rj |.
Physically this is the law of repulsion between two infinitely long wires,
which can be thought of as two-dimensional charges. Because the charges
are of the same sign, with strength taken to be unity, without the presence
of a confining potential they would repel to infinity. Indeed, well defined
thermodynamics requires that the system be overall charge neutral. This
can be achieved by immersing the charges in a smeared out neutralizing
density. In particular, let this density have profile b (~r). It is thus required
that
Z
b (~r) d~r = N

while the one body potential V1 is calculated according to
Z
V1 (~r) = log |~r ~rj |b (~r) d~r. (1.2)

Consider first some specific log-potential systems (log-gases for short)
confined to one-dimensional domains. Four distinct examples are relevant.
These are the system on the real line, with the neutralizing background
having profile
r
2N x2
b (x) = 1 , |x| < 2N ; (1.3)
2N
the system on a half line with image charges of the same sign in x <
0, a fixed charge of strength (a 1)/2 at the origin, and a neutralizing
background charge density in x > 0 with profile
r
4N x2
b (x) = 1 , 0 < x < 2 N; (1.4)
4N
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 29

the system on the unit interval [1, 1] with fixed charges of strengths (a
1)/2 + 1/ and (b 1)/2 + 1/ at y = 1 and y = 1 respectively and a
neutralizing background
N
b (x) = , |x| < 1; (1.5)
1 x2
and the system on a unit circle with a uniform neutralizing background
density.
Physically one expects charged systems to be locally charge neutral.
Accepting this, the particle densities must then, to leading order, coincide
with the background densities. In the above examples, this implies that in
the bulk of the system the particle densities are dependent on N , which in
turn means that there is not yet a well defined thermodynamic limit. To
overcome this, note the special property of the logarithmic potential that it
is unchanged, up to an additive constant, by the scaling of the coordinates
xj 7 cxj . Effectively the density is therefore not a control variable, as it
determines only the length scale in the logarithmic potential.
Making use of (1.2), for the four systems the total energy of the system
can readily be computed (see [13] for details) to give the corresponding
Boltzmann factors. They are proportional to
N
Y Y
2
e(/2)xj |xk xj | (1.6)
j=1 1j<kN
N
Y Y
2
|xj | exj /2 |x2k x2j | (1.7)
j=1 1j<kN
N
Y Y
(1 xj )a/2 (1 + xj )b/2 |xk xj | (1.8)
j=1 1j<kN
Y
|eik eij | (1.9)
1j<kN

respectively. We remark that changing variables xj = cos j , 0 < j < in


(1.8) the Boltzmann factor becomes proportional to
N
Y
(sin j /2)(a1/2)+1/2 (cos j /2)(b1/2)+1/2 (sin j )/2
j=1
Y
| sin((k j )/2) sin((k + j )/2)| . (1.10)
1j<kN
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

30 P. J. Forrester

This corresponds to the log-gas on a half circle 0 j , with image


charges of the same sign at 2 j , a charge of strength (a 1/2) + 1/ at
= 0, and a charge of strength (b 1/2) + 1/ at = .

1.2. Quantum many body systems


The setting of Section 1.1 was equilibrium statistical mechanics. As a gen-
eralization, let us suppose now that there are dynamics due to Brownian
motion in a (fictitious) viscous fluid with friction coefficient . The evolu-
tion of the p.d.f. p (x1 , . . . , xN ) for the joint density of the particles is given
by the solution of the Fokker-Planck equation (see e.g. [32])

XN  
p U 1
= Lp where L = + . (1.11)
j=1
j j j

In general the steady state solution of this equation is the Boltzmann factor
eU ,

LeU = 0. (1.12)

Another general property is the operator identity

N 
U/2 U/2
X 1 2  U 2 1 2 U 
e Le = 2 + , (1.13)
j=1
xj 4 xj 2 x2j

relating L to an Hermitian operator.


For the potentials implied by the Boltzmann factors (1.6), (1.7), (1.9)
and (1.10) the conjugation (1.13) gives

eU/2 LeU/2 = (H E0 )/ (1.14)

where H is a Schrodinger operator consisting of one and two body terms


only,

N N
X 2 X X
H= + v1 (xj ) + v2 (xj , xk ).
j=1
x2j j=1 1j<kN
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 31

Explicitly, one finds [2]


N
X 2
H = H (H) :=
j=1
x2j
N
2 X 2 X 1
+ x + (/2 1) (1.15)
4 j=1 j (xj xk )2
1j<kN
N N
!
(L)
X 2 X a0  a0 1 2 2
H=H := + 1 2 x
j=1
x2j j=1 2 2 xj 4 j
X  1 1

+ (/2 1) + (1.16)
(xk xj )2 (xk + xj )2
1j<kN
N
X 2
H = H (C) :=
j=1
j2
X 1
+ (/4)(/2 1) 2 (1.17)
1j<kN
sin ((k j )/2)
N N  0  0
X 2 X a a  1
H = H (J) := + 1
j=1
2j j=1
2 2 sin2 j
b0  b0  1 
+ 1
2 2 cos2 j
X  1 1

+ (/2 1) + (1.18)
1j<kN
sin2 (j k ) sin2 (j + k )

where a0 = a+1/, b0 = b+1/. Thus all the pair potentials are proportional
to 1/r2 , where r is the separation between particles or between particles
and their images. Such quantum many body systems were first studied by
Calogero [5] and Sutherland [35].
It follows from (1.12) and (1.14) that eU/2 is an eigenfunction of
H with eigenvalue E0 . Since eU/2 is non-negative, it must in fact be the
ground state. This suggests considering a conjugation of the Schrodinger op-
erators with respect to this eigenfunction. Consider for definiteness (1.17).
A direct computation gives

H (C) := eU/2 (H (C) E0 )eU/2


N 
X 2 N 1 2 X zj zk  
= zj + + (1.19)
j=1
zj zj zk zj zk
1j<kN
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

32 P. J. Forrester

where zj := eixj and = 2/. This operator admits a complete set of


symmetric polynomial eigenfunctions P (z) = P (z1 , . . . , zN ; ) labeled by
partitions := (1 , . . . , N ), 1 N 0, known as the symmetric
Jack polynomials [27, 13]. These polynomials have the structure
X
P (z) = m + b m
<

where m denotes the monomial symmetric function in the variables


z1 , . . . , zN associated with the partition (for example, with = 21 and
N = 2, m21 = z12 z2 + z1 z22 ), < denotes the dominance ordering for parti-
tions, and the coefficients b are independent of N .

1.3. Selberg correlation integrals


The symmetric Jack polynomials can be used as a basis to define a class of
generalized hypergeometric functions, which in turn have direct relevance
to the calculation of correlation functions in the log-gas. To set up the
definition of the former, introduce the generalized factorial function
N
Y (u 1 (j 1) + j )
[u]()
:= , (1.20)
j=1
(u 1 (j 1))

the quantity
Y  
d0 := (i j + 1) + (0j i) (1.21)
(i,j)

(here 0j denotes the length of column j in the diagram of ), and the


renormalized Jack polynomial
|| ||!
C() (x) := P (x; ). (1.22)
d0
()
In terms of these quantities, the generalized hypergeometric functions p Fq
are specified by
()
p Fq (a1 , . . . , ap , b1 , . . . , bq ; x1 , . . . , xm )
X 1 [a1 ]() ()
[ap ]
:= ()
C () (x1 , . . . , xm ).
()
(1.23)

||! [b1 ] [bq ]
() ()
Since in the one-variable case we have = k, Ck (x) = xk and [u]k =
()
(u)k , we see that with m = 1 the generalized hypergeometric function p Fq
reduces to the classical hypergeometric function p Fq .
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 33

()
There are two cases in which p Fq can be expressed in terms of ele-
mentary functions [24, 37]. These are the generalized binomial theorem
m
Y
()
1 F0 (a; x1 , . . . , xm ) = (1 xj )a (1.24)
j=1

and its limiting form


()
0 F0 (x1 , . . . , xm ) = ex1 ++xm . (1.25)
The latter can be deduced from the confluence relation
lim F () (a1 , . . . , ap ; b1 , . . . , bq ; x1 /ap , . . . , xm /ap )
ap p q
()
= p1 Fq (a1 , . . . , ap1 ; b1 , . . . , bq ; x1 , . . . , xm ), (1.26)
()
which follows from the explicit form (1.20) of [ap ] and the fact that
()
C (x) is homogeneous of degree ||.
()
We will now relate 2 F1 to a generalization of the Selberg integral
referred to as a Selberg correlation integral. First we recall that the Selberg
integral is the multidimensional generalization of the beta integral
Z 1 Z 1 N
Y Y
SN (1 , 2 , ) := dx1 dxN xl 1 (1 xl )2 |xk xj |2 .
0 0 l=1 1j<kN

This can be transformed to the trigonometric form


Z 1/2 Z 1/2 N
Y
MN (a, b, ) := d1 dN eil (ab) |1 + e2il |a+b
1/2 1/2 l=1
Y
2ik 2ij 2
|e e |
1j<kN

known as the Morris integral. The Selberg correlation integral refers to the
generalizations
Z 1 Z 1
1
SN (t1 , . . . , tm ; 1 , 2 , 1/) := dx1 dxN
SN (1 , 2 , 1/) 0 0
N
Y m
Y Y 2/
xl 1 (1 xl )2 (1 tl0 xl ) |xj xk | ,
l=1 l0 =1 j<k

Z 1 Z 1
1
SN (t1 , . . . , tm ; 1 , 2 , 1/) := dx1 dxN
SN (1 + m, 2 , 1/) 0 0
N
Y m
Y Y
xl 1 (1 xl )2 (tl0 xl ) |xj xk |2/ ,
l=1 l0 =1 j<k
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

34 P. J. Forrester

and its trigonometric form


Z 1/2 Z 1/2
1
MN (t1 , . . . , tm ; a, b, 1/) := dx1 dxN
MN (a, b, 1/) 1/2 1/2
N
Y m
Y Y
ixl (ab) 2ixl a+b
e |1 + e | (1 + tl0 e2ixl ) |e2ixk e2ixj |2/ .
l=1 l0 =1 j<k

These have the generalized hypergeometric function evaluations [24, 14]


(1/)
SN (t1 , . . . , tm ; 1 , 2 , 1/) = 2 F1 (N, (N 1) (1 + 1);
2(N 1) (1 + 2 + 2); t1 , . . . , tm ), (1.27)

(1/)
SN (t1 , . . . , tm ; 1 , 2 , 1/) = 2 F1 (N,
(N 1) + (1 + 2 + m + 1); (1 + m); t1 , . . . , tm ), (1.28)
and
(1/)
MN (t1 , . . . , tm ; a, b, 1/) = 2 F1 (N, b; (N 1)(1+a); t1 , . . . , tm ).
(1.29)
On the other hand, the generalized binomial expansion allows the gen-
()
eralized hypergeometric function 2 F1 in m variables to be expressed as
an m-dimensional integral, provided all the arguments are equal. Thus we
have [18, 15]
Z 1/2 Z 1/2 N
1 Y
dx1 dxN eixl (ab) |1 + e2ixl |a+b
MN (a, b, 1/) 1/2 1/2 l=1
Y
2ixl r 2ixk
(1 + te ) |e e2ixj |2/
1j<kN
1
()
= 2 F1 (r, b; (N 1) + a + 1; t1 , . . . , tN ) , (1.30)
t1 ==tN =t

Z 1 Z 1 N
1 Y
dx1 dxN xl 1 (1 xl )2 (1 txl )r
SN (1 , 2 , 1/) 0 0 l=1
Y 2/ () 1
|xj xk | = 2 F1 (r, (N 1) + 1 + 1;

j<k
2

(N 1) + 1 + 2 + 2; t1 , . . . , tN ) . (1.31)
t1 ==tN =t

In using (1.30) and (1.31) in (1.27) and (1.29) it may happen that the
parameters are such that the former are divergent. To overcome this, use can
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 35

()
be made of certain transformation formulas satisfied by the 2 F1 . One such
formula, which is restricted to cases in which the series (1.23) terminates,
is [14]
()
2 F1 (a, b; c; t1 , . . . , tm )
()
F (a, b; a + b + 1 + (m 1)/ c; 1 t1 , . . . , 1 tm )
= 2 1 . (1.32)
()
F
2 1 (a, b; a + b + 1 + (m 1)/ c; t 1 , . . . , t m )
t1 ==tm =1

Another, which generalizes one of the classical Kummer relations, reads [37]
()
2 F1 (a, b; c; t1 , . . . , tm )
m 
Y () t1 tm 
= (1 tj )a 2 F1 a, c b; c; ,..., . (1.33)
j=1
1 t1 1 tm

1.4. Correlation functions


For a one-dimensional statistical mechanical system with Boltzmann factor
eU confined to a domain I, the n-particle correlation function is defined
by
Z Z
N (N 1) (N n + 1)
(n) (~r1 , . . . , ~rn ) = drn+1 drN eU (r1 ,...,rN )
ZN I I
(1.34)

where ZN is specified by (1.1). In the case of the log-gas systems specified in


Section 1.1 with even , the Selberg correlation integral evaluations (1.28),
(1.29) allows this to be expressed in terms of generalized hypergeometric
functions.
Consider first the Boltzmann factor (1.8), but with the change of vari-
ables xj 7 1 2xj so that now 0 < xj < 1. Further set a/2 = 1 and
b/2 = 2 . The resulting p.d.f. is said to define the Jacobi -ensemble. Ac-
cording to (1.34), with N +n particles the corresponding n-point correlation
is given by
n n
(N + n)n Y Y
(n) (r1 , . . . , rn ) := rk1 (1 rk )2 |rk rj |
SN +n (1 , 2 , /2)
k=1 j<k
Z N n
!
Y Y Y
dx1 . . . dxN xj 1 (1 xj )2 |xj rk | |xk xj | .
[0,1]N j=1 k=1 1j<kN
(1.35)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

36 P. J. Forrester

In the case even, the factor |xj rk | is a polynomial. Making use of


(1.28) it follows that

(n) (r1 , . . . , rn )
n n
SN (1 + n, 2 , /2) Y 1 Y
= (N + n)n tk (1 tk )2 |tk tj |
SN +n (1 , 2 , /2)
k=1 j<k

(/2)
2 F1 (N, 2(1 + 2 + m + 1)/ + N 1; 2(1 + m)/; t1 , . . . , tn )
(1.36)

where

tk = r j for k = 1 + (j 1), . . . , j (j = 1, . . . , n). (1.37)

For n = 1 the arguments (1.37) are equal, and we have available the
-dimensional integral representation (1.30). The get a convergent integral
we must first apply the Kummer type transformation (1.33). Doing this
gives [13]

SN (1 + , 2 , /2)
(1) (r) = (N + 1)
SN +1 (1 , 2 , /2)

r1 (1 r)2

M (2(1 + 1)/ 1, 2(2 + 1)/ + N 1; 2/)
Z 1/2 Z 1/2
Y
dx1 dx eixl (2(1 2 )/) |1 + e2ixl |2(1 +2 +2)/+N 1
1/2 1/2 l=1

r ixl N Y
(eixl e ) |e2ixk e2ixj |4/ . (1.38)
1r
1j<k

The Boltzmann factor (1.7) with the change of variable x2j 7 xj , and
= a + 1 is said to specify the Laguerre -ensemble. It can be ob-
tained from the Jacobi -ensemble by the change of variables and limiting
procedure

xj 7 xj /L, 2 7 L/2, 1 7 a/2, L . (1.39)


May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 37

The confluence (1.26) allows the same limiting procedure to be applied to


(1.36). Thus we obtain
n
!
(N + n)n Y a/2 rj /2 Y
(n) (r1 , . . . , rn ) = rj e |rk rj |
Wa,,N +n j=1
1j<kn
Z N n
!
Y a/2 xj /2
Y

dx1 dxN xj e |rk xj |
(0,)N j=1 k=1
Y
|xk xj | , (1.40)
j<k

where
Z Z N
Y Y
a/2 xj /2
Wa,,N = dx1 dxN xj e |xk xj | .
0 0 j=1 j<k

Applying the limiting procedure to (1.38) gives for the one-point correlation
(i.e. the particle density) with even the -dimensional integral represen-
tation
Z 1/2 Z 1/2
Wa,,N ra/2 er/2
(1) (r) = (N + 1) dx1 dx
Wa+2,,N +1 M (2/ 1, N, /2) 1/2 1/2

Y 2ixl
eixl (2/1N ) |1 + e2ixl |N +2/1 ere
l=1

Y
|e2ixk e2ixj |4/ . (1.41)
j<k

For the Laguerre -ensemble, in addition to the correlations at even


being expressible in terms of the confluent hypergeometric function, one
(N )
can give similar evaluations of the probability E ((0, s)) that the interval
(0, s) is particle free,
(N )
E (0; (0, s))
Z Z N
1 Y a/2
Y
= dx1 dxN exj /2 xj |xk xj |
Wa,,N s s j=1 j<k
Z Z N
eN s/2 Y Y
= dx1 dxN exj /2 (xj + s)a/2 |xk xj |
Wa,,N 0 0 j=1 j<k
(1.42)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

38 P. J. Forrester

where the second equality follows by the change of variables xj 7 xj + s.


Thus it follows by applying the limiting procedure (1.39) to (1.36) that for
a/2 =: m Z0 [17]

(N ) (/2)
E (0; (0, s)) = eN s/2 1 F1

(N ; 2m/; x1 , . . . , xm ) . (1.43)
xj =s

A closely related quantity is the distribution of the particle closest to


the origin,
(N ) d (N )
p (0; s) = E (0; (0, s))
ds
Z Z
N eN s/2 a/2
= s dx1 dxN 1
Wa,,N 0 0
N
Y 1 Y
xj exj /2 (xj + s)a/2 |xk xj | (1.44)
j=1 j<k

where the second equality follows by differentiating the first equality in


(1.42) and then changing variables xj 7 xj + s. This multidimensional
integral can be evaluated in an analogous way to that in (1.42) to give [17]
(N ) Wa+2,,N 1
p (0; s) = N sm eN s/2
Wa,,N

(/2)
1 F1 (N + 1; 2m/ + 2; x1 , . . . , xm ) . (1.45)
xj =s

For the Gaussian -ensemble with N + n particles


n
(N + n)n Y rj2 /2 Y
(n) (r1 , . . . , rn ) = e |rk rj |
G,N j=1
1j<kN
Z N n
!
Y Y Y
x2j /2
dx1 dxN e |rk xj | |xk xj |
(,)N j=1 k=1 j<k

where
Z N
Y Y
2
G,N = dx1 dxN exj /2 |xk xj | .
(,)N j=1 j<k

The Gaussian -ensemble can be obtained from the Jacobi -ensemble on


[0, 1] through the change of variables
r
1 xj 
xj 7 1 , 1 = 2 = L 2
2 2 L
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 39

and taking the limit L . Applying this to (1.38) gives for the one-point
density [2]
2 Z
G,N er /2
(1) (r) = (N + 1) du1 du
G,N +1 G (,)

Y Y
2
(iuj + r)N euj |uk uj |4/ (1.46)
j=1 1j<k

where
Z
Y Y
2
G = dx1 dx exj |xk xj |4/ .
(,) j=1 j<k

The last case to consider is the circular -ensemble. With N +n particles


the n-point correlation function is given by
(N + n)n ((/2)!)N +n
(n) (r1 , . . . , rn ) =
Ln ((N + n)/2)!
Y
|e2irk /L e2irj /L | IN,n (; r1 , . . . , rn )
1j<kn
(1.47)
where
Z N Y
Y n
IN,n (; r1 , . . . , rn ) := dx1 dxN |1 e2i(xj rk /L) |
[0,1]N j=1 k=1
Y
2ixk
|e e2ixj | , (1.48)
1j<kN

and the angles have been scaled so that the circumference length of the
circle is equal to L. Use of (1.29) and the transformation formula (1.32)
shows [14] that for even
(N + n)n ((/2)!)N +n Y
(n) (r1 , . . . , rn ) = n
|e2irk /L e2irj /L |
L ((N + n)/2)!
1j<kn
Yn
iN (rk r1 )/L
MN (n/2, n/2, /2) e
k=2
(/2)
2 F1 (N, n; 2n; 1 t1 , . . . , 1 t(n1) ) (1.49)
where
tk := e2i(rj r1 )/L , k = 1 + (j 2), . . . , (j 1) (j = 2, . . . , n).
(1.50)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

40 P. J. Forrester

For the circular ensemble the one-point function is a constant, due to


translation invariance. The two-point function is therefore a function of
the single variable r2 r1 as is clear from (1.50). Moreover, use of (1.31)
allows the generalized hypergeometric function in (1.49) to be written as a
-dimensional integral [15]
(2) (r1 , r2 )
(N + 2)(N + 1) (N/2)!((/2)!)N +2 MN (n/2, n/2, /2)
=
L2 ((N + 2)/2)! S (1 2/, 1 2/, 2/)
Z
Y
(2 sin (r1 r2 )/L) eiN (r1 r2 )/L du1 du
[0,1] j=1

1+2/
Y
(1 (1 e2i(r1 r2 )/L )uj )N uj (1 uj )1+2/ |uk uj |4/ .
j<k
(1.51)

1.5. Scaled limits


As remarked in Section 1.1, the log-gas picture predicts the leading or-
der form of the density profile for the Jacobi, Laguerre and Gaussian -
ensembles. It must then be that these same functional forms are the leading
asymptotic form of the corresponding multidimensional integrals, appropri-
ately scaled, in (1.38), (1.41), (1.46). Saddle point analysis undertaken in
[16, 2] has verified that this is indeed the case. The analysis has been ex-
tended for the Gaussian and Laguerre -ensembles in [7] to the calculation
of correction terms. In particular, for the Gaussian -ensemble it is found
that
r
2 2 (1 + 2/) 1
(1) ( 2Nx) W (x) 6/1
N (W (x)) N 2/
   1 1 
cos 2N PW (x) + (1 2/)Arcsin x + O min , 8/
N N
(1.52)

where W (x) := 2 1 x2 and
Z x
x 1
PW (x) = W (t) dt = 1 + W (x) Arccos x.
1 2
The expansion (1.52) is an example of a global asymptotic form, in which
the expansion parameter varies macroscopically relative to the inter-particle
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester

Beta Random Matrix Ensembles 41

spacing. In contrast local asymptotic expansions fix the inter-particle spac-


ing to be order unity. In the circular -ensemble defined on a circle of
circumference length L as assumed in (1.47), this is achieved by taking the
limit N, L with N/L = fixed. This limiting procedure applied to
(1.49) gives [14]

$$
\rho_{(n)}^{\rm bulk}(r_1,\ldots,r_n) := \lim_{\substack{N,L\to\infty\\ N/L=\rho}}\rho_{(n)}(r_1,\ldots,r_n)
$$
$$
= \rho^n\,c_n(\beta)\prod_{1\le j<k\le n}|2\pi\rho(r_k-r_j)|^{\beta}\,
\prod_{k=2}^{n}e^{\pi i\beta\rho(r_k-r_1)}\;
{}_1F_1^{(\beta/2)}\big(n,2n;\,2\pi i\rho(r_2-r_1),\ldots,2\pi i\rho(r_n-r_1)\big)
$$
where in the argument of ${}_1F_1^{(\beta/2)}$ each $2\pi i\rho(r_j-r_1)$ $(j=2,\ldots,n)$ occurs β
times, and
$$
c_n(\beta) = (\beta/2)^{\beta n(n-1)/2}\,((\beta/2)!)^{n}\,
\prod_{k=0}^{n-1}\frac{\Gamma(k\beta/2+1)}{\Gamma((n+k)\beta/2+1)}.
$$

For the two-point function, applying the limit to (1.51) gives [15]
$$
\rho_{(2)}^{\rm bulk}(r_1,r_2) = \rho^2\,
\frac{((\beta/2)!)^3}{\beta!\,(3\beta/2)!\,S_\beta(-1+2/\beta,-1+2/\beta,2/\beta)}\,
e^{\pi i\beta\rho(r_1-r_2)}\big(2\pi\rho(r_1-r_2)\big)^{\beta}
$$
$$
\times\int_{[0,1]^\beta}du_1\cdots du_\beta\,
\prod_{j=1}^{\beta}e^{2\pi i\rho(r_1-r_2)u_j}\,
u_j^{-1+2/\beta}(1-u_j)^{-1+2/\beta}\prod_{j<k}|u_k-u_j|^{4/\beta}.
\qquad(1.53)
$$

Local expansions can also be performed in the neighborhood of the edge
of the support. There are two distinct cases: the spectrum edge at which the
eigenvalue density is strictly zero on one side, as near x = 0 in the Laguerre
β-ensemble or at both edges of the support of the Jacobi β-ensemble, and the
spectrum edge about which the density is non-zero in both directions for the
finite system. These are referred to as the hard and soft edges respectively.

42 P. J. Forrester

For the hard edge the necessary scaling is x ↦ X/4N. We see from
(1.40) and (1.26) that
$$
\rho_{(n)}^{\rm hard}(X_1,\ldots,X_n) = \lim_{N\to\infty}\Big(\frac{1}{4N}\Big)^{n}\rho_{(n)}(X_1/4N,\ldots,X_n/4N)
$$
$$
= A_n(\beta)\prod_{j=1}^{n}X_j^{\beta a/2}\prod_{1\le j<k\le n}|X_k-X_j|^{\beta}\;
{}_0F_1^{(\beta/2)}\big(a+2n;\,Y_1,\ldots,Y_n\big)\Big|_{\{Y_j\}\mapsto\{-X_j/4\}}
\qquad(1.55)
$$
where
$$
A_n(\beta) = 2^{-n(2+\beta a+\beta(n-1))}\,(\beta/2)^{n(1+\beta a+\beta(n-1))}\,
\frac{(\Gamma(1+\beta/2))^{n}}{\displaystyle\prod_{j=1}^{2n}\Gamma(1+\beta a/2+\beta(j-1)/2)}.
$$

With n = 1, and a = c − 2/β, c a positive integer, the generalized
hypergeometric function ${}_0F_1^{(\beta/2)}$ can be written as a β-dimensional integral to give
[17]
$$
\rho_{(1)}(X) = a(c,\beta)\,X^{\beta c/2-1}\int_{[-\pi,\pi]^{\beta}}
\prod_{j=1}^{\beta}e^{iX^{1/2}\cos\theta_j}\,e^{i(c-1)\theta_j}
\prod_{1\le j<k\le\beta}|e^{i\theta_k}-e^{i\theta_j}|^{4/\beta}\,
d\theta_1\cdots d\theta_\beta
\qquad(1.56)
$$
where
$$
a(c,\beta) = \frac{1}{2}\Big(\frac{\beta}{4}\Big)^{\beta}\,(-1)^{\beta(c-1)/2}\,(2\pi)^{-\beta}\,
\frac{(\Gamma(1+2/\beta))^{\beta}}{\Gamma(\beta)}.
$$
Similarly the hard edge scaled limits can be taken in the evaluations of
the distributions (1.43) and (1.45). Thus one finds [16]
$$
E_\beta^{\rm hard}(0;(0,s)) = e^{-\beta s/8}\;
{}_0F_1^{(\beta/2)}\big(2m/\beta;\,x_1,\ldots,x_m\big)\Big|_{x_j=s/4}
$$
$$
p_\beta^{\rm hard}(0;s) = A_{m,\beta}\,s^{m}\,e^{-\beta s/8}\;
{}_0F_1^{(\beta/2)}\big(2m/\beta+2;\,x_1,\ldots,x_m\big)\Big|_{x_j=s/4}
$$
where
$$
A_{m,\beta} = 4^{-(m+1)}\,(\beta/2)^{2m+1}\,
\frac{\Gamma(1+\beta/2)}{\Gamma(1+m)\,\Gamma(1+m+\beta/2)}.
$$
Note the similarity with (1.55) in the case n = 1. In particular we have
available m-dimensional integral representations.

At the soft edge the appropriate scaling is $x \mapsto \sqrt{2N} + \dfrac{x}{\sqrt{2}\,N^{1/3}}$. Starting
with the formula (1.46) one can show [7]
$$
\frac{1}{\sqrt{2}\,N^{1/3}}\,\rho_{(1)}\Big(\sqrt{2N}+\frac{x}{\sqrt{2}\,N^{1/3}}\Big)
\sim \frac{\Gamma(1+\beta/2)}{2\pi}\Big(\frac{4\pi}{\beta}\Big)^{\beta/2}
\prod_{j=1}^{\beta}\frac{\Gamma(1+2/\beta)}{\Gamma(1+2j/\beta)}\;
K_{\beta,\beta}(x) + O(N^{-1/3})
\qquad(1.57)
$$
where
$$
K_{n,\beta}(x) := \frac{1}{(2\pi i)^n}\int_{-i\infty}^{i\infty}dv_1\cdots\int_{-i\infty}^{i\infty}dv_n\,
\prod_{j=1}^{n}e^{v_j^3/3 - xv_j}\prod_{1\le k<l\le n}(v_k-v_l)^{4/\beta}.
$$

2. Physical Random Matrix Ensembles


2.1. Heavy nuclei and quantum mechanics
Random matrices were introduced into theoretical physics in the 1950s by
Wigner, via a random matrix approximation to the Hamiltonian
determining the highly excited states of heavy nuclei (see [30] for a collection
of many early works in the field). At the time it was thought that the com-
plex structure of heavy nuclei meant that in any basis the matrix elements
for the Hamiltonian determining the highly excited states would effectively
be random. (Subsequently [3] it has been learnt that a random matrix hy-
pothesis applies equally well to certain single particle quantum systems;
what is essential is that the underlying classical mechanics is chaotic.) One
crucial point was the understanding that the (global) time reversal sym-
metry exhibited by complex nuclei implied that the elements of the matrix
could be chosen to be symmetric, which since Hamiltonians are Hermitian
implied the relevant class of matrices to be real symmetric. Another crucial
point was the hypothesis of there being no preferential basis, in the sense
that the joint probability distribution of the independent elements of the
random matrix X should be independent of the basis vectors used to con-
struct the matrix in the first place. This effectively requires that the joint
probability distribution be unchanged upon the conjugation X ↦ OᵀXO,
where O is a real orthogonal matrix. Distributions with this property are
said to be orthogonally invariant, a typical example being

$$
\frac{1}{C}\,e^{-{\rm Tr}\,V(X)}
\qquad(2.1)
$$

where C denotes the normalization, and V(x) is a polynomial in x of even
degree with positive leading coefficient. It is a well-known result that if one
requires the independent elements to be independently distributed,
and the distribution to be orthogonally invariant, then the distribution
is necessarily of the form (2.1) with
$$
V(x) = ax^2 + bx, \qquad a > 0.
\qquad(2.2)
$$
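As a numerical illustration of the orthogonal invariance just described (a minimal sketch; the entry normalization is an assumption made for definiteness, matching (2.1) with V(x) = x²/2), one can sample a GOE matrix and check that conjugation by a random orthogonal matrix leaves the invariants Tr X and Tr X², and hence the density, unchanged:

```python
import numpy as np

def goe_sample(n, rng):
    """Real symmetric matrix with density prop. to exp(-Tr(X^2)/2):
    diagonal entries N(0,1), off-diagonal entries N(0,1/2)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2.0

rng = np.random.default_rng(0)
X = goe_sample(4, rng)
assert np.allclose(X, X.T)                      # X is symmetric

# conjugation X -> O^T X O by an orthogonal O leaves Tr X and Tr X^2,
# and hence exp(-Tr V(X)) with V as in (2.2), unchanged
O, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Y = O.T @ X @ O
assert np.isclose(np.trace(X), np.trace(Y))
assert np.isclose(np.trace(X @ X), np.trace(Y @ Y))
```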

Random matrix theory applied to quantum systems without time reversal
symmetry (typically due to the presence of a magnetic field) gives
the relevant class of matrices as being complex Hermitian. In this case the
hypothesis of there being no preferential basis requires invariance of the
joint probability distribution under the conjugation X ↦ U†XU, where U
is a unitary matrix. Distributions with this property are said to be unitary
invariant, and again (2.1) is a typical example, with X now a complex
Hermitian rather than real symmetric matrix.
Theoretically a time reversal operator T is any anti-unitary operator.
However physical considerations further restrict their form (see e.g. [23]),
requiring that for an even number or no spin 1/2 particles T² = 1 (a familiar
example being T = K where K denotes complex conjugation), while for a
finite dimensional system with an odd number of spin 1/2 particles T² = −1,
where T = Z_{2N}K, with
$$
Z_{2N} = 1_N \otimes \begin{pmatrix} 0 & 1\\ -1 & 0\end{pmatrix}.
$$

For a quantum system which commutes with T of this latter form, the
2N × 2N matrix X modelling the Hamiltonian must, in addition to being
Hermitian, have the property
$$
X = Z_{2N}\,\bar{X}\,Z_{2N}^{-1}.
$$

This means that X can be viewed as the 2N × 2N complex matrix formed
from an N × N array with the elements consisting of 2 × 2 blocks of the
form
$$
\begin{pmatrix} z & w\\ -\bar{w} & \bar{z}\end{pmatrix},
\qquad(2.3)
$$

which is the matrix representation of a real quaternion. For no preferential


basis one requires that the probability density function for X be invariant

under the conjugation X ↦ S†XS, where S is the 2N × 2N unitary matrix
formed out of an N × N unitary matrix with real quaternion blocks (2.3).

2.2. Dirac operators and QCD


Random Hermitian matrices with a special block structure occurred in
Verbaarschot's introduction [36] of a random matrix theory of massless Dirac
operators, in the context of quantum chromodynamics (QCD). Generally
the non-zero eigenvalues of the massless Dirac operator occur in pairs ±λ.
Furthermore, in the so-called chiral basis, all basis elements are eigenfunctions
of the γ-matrix γ₅ with eigenvalues +1 or −1, and matrix elements
between states with the same eigenvalue of γ₅ must vanish, leaving a block
structure with non-zero elements in the upper-right and lower-left blocks
only. Noting too that the application to QCD requires that the Dirac
operator has a given number, ν say, of zero eigenvalues, implies the structure
$$
\begin{pmatrix} 0_{n\times n} & X\\ X^{\dagger} & 0_{m\times m}\end{pmatrix}
\qquad(2.4)
$$
where X is an n × m (n ≥ m) matrix with n − m = ν. Moreover the positive
eigenvalues are given by the positive square roots of the eigenvalues of X†X.
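The spectral structure claimed for (2.4) is easy to confirm numerically. The following sketch (the dimensions and complex Gaussian entries are illustrative assumptions) checks that the eigenvalues come in pairs ±λ, that ν = n − m of them vanish, and that the positive ones are the square roots of the eigenvalues of X†X:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4                      # nu = n - m = 2 exact zero eigenvalues
X = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

# chiral block matrix (2.4)
D = np.block([[np.zeros((n, n)), X],
              [X.conj().T, np.zeros((m, m))]])
ev = np.sort(np.linalg.eigvalsh(D))

assert np.allclose(ev, -ev[::-1])          # non-zero eigenvalues in +/- pairs
assert np.allclose(ev[m:n], 0.0)           # nu = n - m zero eigenvalues

# positive eigenvalues = square roots of the eigenvalues of X^dagger X
sv = np.sqrt(np.linalg.eigvalsh(X.conj().T @ X))
assert np.allclose(ev[-m:], np.sort(sv))
```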
As in the application to chaotic quantum systems, the elements of X
in (2.4) must be real, complex or real quaternion according to there being
a time reversal symmetry with T² = 1, no time reversal symmetry, or
a time reversal symmetry with T² = −1 respectively. And due to their
origin in studying the Dirac equation with a chiral basis, the corresponding
ensembles are referred to as chiral random matrices.

2.3. Random scattering matrices


Problems in quantum physics also give rise to random unitary matrices. One
such problem is the scattering of plane waves within an irregular shaped
domain, or one containing random scattering impurities. The wave guide
connecting to the cavity is assumed to permit N distinct plane wave states,
and the corresponding amplitudes are denoted \(\vec{I}\) for the ingoing states and
\(\vec{O}\) for the outgoing states. By definition the scattering matrix S relates \(\vec{I}\)
and \(\vec{O}\),
$$
S\vec{I} = \vec{O}.
$$
The flux conservation requirement \(|\vec{I}|^2 = |\vec{O}|^2\) implies that S must be
unitary. For scattering matrices in quantum mechanics, or more generally
evolution operators, time reversal symmetry requires that
$$
T^{-1}ST = S^{\dagger}.
$$
For T² = 1 this implies
$$
S = S^{T}
\qquad(2.5)
$$
while for T² = −1,
$$
S = Z_{2N}S^{T}Z_{2N}^{-1} =: S^{D}.
\qquad(2.6)
$$
With U_N ∈ U(N) a general N × N unitary matrix, note that
$$
S = U_N^{T}U_N, \qquad S = U_{2N}^{D}U_{2N}
\qquad(2.7)
$$
respectively have the properties (2.5) and (2.6).
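These symmetry classes are straightforward to realize numerically. A common recipe (used here as an assumed sampling method, not taken from the text) draws a Haar unitary via a QR decomposition with a phase correction; S = UᵀU is then both unitary and symmetric, as (2.5) requires:

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed U(n) sample: QR of a complex Ginibre matrix,
    with the phases of the R-diagonal divided out."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

rng = np.random.default_rng(2)
U = haar_unitary(5, rng)
assert np.allclose(U.conj().T @ U, np.eye(5))   # U is unitary

S = U.T @ U                                     # the T^2 = 1 construction (2.7)
assert np.allclose(S, S.T)                      # symmetric: property (2.5)
assert np.allclose(S.conj().T @ S, np.eye(5))   # and still unitary
```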
For random scattering matrices it is hypothesized that the statistical
properties of S are determined solely by the global time reversal symmetry,
and are invariant under the same conjugations as the corresponding
Hamiltonians. In fact the measure on U(N) which is unchanged by both left and
right multiplication by another unitary matrix is unique. It is called the
Haar measure d_HU, and its volume form is
$$
(d_HU) = (U^{\dagger}dU).
\qquad(2.8)
$$
Similarly, for the symmetric and self dual quaternion unitary matrices in
(2.7) we have
$$
(d_HS) = ((U_N^{T})^{\dagger}\,dS\,U_N^{\dagger}), \qquad
(d_HS) = ((U_{2N}^{D})^{\dagger}\,dS\,U_{2N}^{\dagger})
\qquad(2.9)
$$
which are invariant under the mappings
$$
S \mapsto V_N^{T}SV_N, \qquad S \mapsto V_{2N}^{D}SV_{2N}
$$
for general unitary matrices V.

2.4. Quantum conductance problems


In the above problem of scattering within a cavity the incoming and out-
going wave is unchanged along the lead. A related setting is a quasi one-
dimensional conductor (lead) which contains internal scattering impurities
(see e.g. [34]). One supposes that there are n available scattering channels
at the left hand edge, m at the right hand edge, and that at each end there
is a reservoir which causes a current to flow.
The n-component vector \(\vec{I}\) and the m-component vector \(\vec{I}'\) are used to denote
the amplitudes of the plane wave states traveling into the left and right sides

of the wire respectively, while the n-component vector \(\vec{O}\) and m-component
vector \(\vec{O}'\) denote the amplitudes of the plane wave states traveling out of
the left and right sides of the wire. The (n + m) × (n + m) scattering matrix
S now relates the flux traveling into the conductor to the flux traveling out,
$$
S\begin{bmatrix}\vec{I}\\ \vec{I}'\end{bmatrix} = \begin{bmatrix}\vec{O}\\ \vec{O}'\end{bmatrix}.
$$
The scattering matrix is further decomposed in terms of reflection and
transmission matrices by
$$
S = \begin{pmatrix} r_{n\times n} & t'_{n\times m}\\ t_{m\times n} & r'_{m\times m}\end{pmatrix}.
\qquad(2.10)
$$
According to the Landauer scattering theory of electronic conduction,
the conductance G is given in terms of the transmission matrix t_{m×n} by
$$
G/G_0 = {\rm Tr}(t^{\dagger}t)
$$
where G₀ = 2e²/h is twice the fundamental quantum unit of conductance.
Thus of interest is the distribution of t†t in the case that S is a random unitary
matrix (no time reversal symmetry), a symmetric unitary matrix (time
reversal symmetry T² = 1), or has the self dual property (2.6) (time reversal
symmetry T² = −1).
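A sketch of the Landauer formula in action: draw S from the Haar measure on U(n + m) (the case of no time reversal symmetry; the sampling recipe is an assumption for illustration), read off the transmission block t of (2.10), and form G/G₀ = Tr(t†t). Unitarity of S forces every transmission eigenvalue into [0, 1], so 0 ≤ G/G₀ ≤ min(n, m):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 3
N = n + m
z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
q, r = np.linalg.qr(z)
S = q * (np.diagonal(r) / np.abs(np.diagonal(r)))  # Haar unitary scattering matrix

t = S[n:, :n]                            # m x n transmission block of (2.10)
T = np.linalg.eigvalsh(t.conj().T @ t)   # transmission eigenvalues
g = T.sum()                              # G/G_0 = Tr(t^dagger t)

assert np.all(T > -1e-10) and np.all(T < 1 + 1e-10)
assert 0.0 <= g <= min(n, m)
```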

2.5. Eigenvalue p.d.f.s for Hermitian matrices


Let X = [x_{jk}]_{j,k=1,…,N} be a real symmetric matrix. The diagonal and upper
triangular entries are independent variables. We know that real symmetric
matrices can be diagonalized according to
$$
X = OLO^{T}
$$
where L = diag(λ₁,…,λ_N) is the diagonal matrix of the eigenvalues (a
total of N independent variables), while O is a real orthogonal matrix formed
out of the eigenvectors (a total of N(N−1)/2 independent variables). We
seek the Jacobian for the change of variables from the independent elements
of X to the eigenvalues λ₁,…,λ_N and the N(N−1)/2 linearly
independent variables formed out of linear combinations of the elements
of O.
To calculate the Jacobian, it is useful to be familiar with the notion of
the wedge product,
$$
du_1\wedge\cdots\wedge du_n := \det[du_i(\vec{r}_j)]_{i,j=1,\ldots,n}.
\qquad(2.11)
$$

Now, when changing variables from {u₁,…,u_n} to {v₁,…,v_n}, since
$$
du_i = \sum_{l=1}^{n}\frac{\partial u_i}{\partial v_l}\,dv_l
$$
and
$$
\Big[\sum_{l=1}^{n}\frac{\partial u_i}{\partial v_l}\,dv_l(\vec{r}_j)\Big]_{i,j=1,\ldots,n}
= \Big[\frac{\partial u_i}{\partial v_j}\Big]_{i,j=1,\ldots,n}\,[dv_i(\vec{r}_j)]_{i,j=1,\ldots,n}
$$
it follows from (2.11) that
$$
du_1\wedge\cdots\wedge du_n = \det\Big[\frac{\partial u_i}{\partial v_j}\Big]_{i,j=1,\ldots,n}\,dv_1\wedge\cdots\wedge dv_n
$$
thus allowing the Jacobian to be read off.


Let dH denote the matrix of differentials of H. We have
$$
dH = dO\,LO^{T} + O\,dL\,O^{T} + OL\,dO^{T}.
$$
Noting from OᵀO = 1_N that dOᵀO = −OᵀdO, it follows from this that
$$
O^{T}dH\,O = O^{T}dO\,L - LO^{T}dO + dL
$$
$$
= \begin{bmatrix}
d\lambda_1 & (\lambda_2-\lambda_1)\vec{o}_1^{\,T}d\vec{o}_2 & \cdots & (\lambda_N-\lambda_1)\vec{o}_1^{\,T}d\vec{o}_N\\
(\lambda_2-\lambda_1)\vec{o}_1^{\,T}d\vec{o}_2 & d\lambda_2 & \cdots & (\lambda_N-\lambda_2)\vec{o}_2^{\,T}d\vec{o}_N\\
\vdots & \vdots & \ddots & \vdots\\
(\lambda_N-\lambda_1)\vec{o}_1^{\,T}d\vec{o}_N & (\lambda_N-\lambda_2)\vec{o}_2^{\,T}d\vec{o}_N & \cdots & d\lambda_N
\end{bmatrix}
$$
where \(\vec{o}_k\) denotes the kth column of O.
For H Hermitian, let (dH) denote (up to a sign) the wedge product of
all the independent elements, real and imaginary parts separately, of H. To
compute the wedge product on the left hand side, the following result is
required (see e.g. [13]).

Proposition 2.1. Let A and M be N × N matrices, where A is non-singular.
For A real (β = 1), complex (β = 2) or real quaternion (β = 4),
and M real symmetric (β = 1), complex Hermitian (β = 2) or real quaternion
Hermitian (β = 4),
$$
(A^{\dagger}\,dM\,A) = \big(\det A^{\dagger}A\big)^{\beta(N-1)/2+1}(dM).
$$

Making use of this result with β = 1 we see immediately that
$$
(dH) = \prod_{1\le j<k\le N}(\lambda_k-\lambda_j)\;\bigwedge_{j=1}^{N}d\lambda_j\;(O^{T}dO).
\qquad(2.12)
$$

A simple scaling argument can be used to predict the structure of (2.12).
Since there are N(N+1)/2 independent differentials in (dH), we see that
for c a scalar
$$
(d\,cH) = c^{N(N+1)/2}(dH).
$$
But cH = O\,cL\,Oᵀ, so we conclude that (d\,cH) is a homogeneous polynomial
of degree N(N−1)/2 in {λ_j} (note that dλ₁ ∧ ⋯ ∧ dλ_N contributes degree N).
Furthermore, because repeated eigenvalues occur with probability
zero, (dH) must vanish for λ_j = λ_k. These two facts together
tell us that the dependence on the eigenvalues is precisely as in (2.12).
The analogue of (2.12) for complex Hermitian matrices is
$$
(dH) = \prod_{1\le j<k\le N}(\lambda_k-\lambda_j)^{2}\;\bigwedge_{j=1}^{N}d\lambda_j\;(U^{\dagger}dU)
\qquad(2.13)
$$
while for Hermitian matrices with real quaternion entries it is
$$
(dH) = \prod_{1\le j<k\le N}(\lambda_k-\lambda_j)^{4}\;\bigwedge_{j=1}^{N}d\lambda_j\;(S^{\dagger}dS).
\qquad(2.14)
$$

It is of interest to understand (2.13) and (2.14) from the viewpoint of
scaling. Consider for definiteness (2.13) (the case of (2.14) is similar). Recalling
that for H complex Hermitian (dH) consists of the product of differentials of
the independent real and imaginary parts, it follows that (d\,cH) = c^{N²}(dH).
This tells us that the polynomial in {λ_j} in (2.13) is of degree N² − N, and
we know from the argument in the case of (2.12) that it contains a factor
of ∏_{j<k}(λ_k − λ_j). We want to deduce that in fact this factor is repeated
twice. For this we use the fact that the N × N complex Hermitian matrix
[x_{jk} + iy_{jk}]_{j,k=1,…,N} has the same eigenvalues as the 2N × 2N real matrix
$$
\begin{bmatrix} x_{jk} & -y_{jk}\\ y_{jk} & x_{jk}\end{bmatrix}_{j,k=1,\ldots,N}
\qquad(2.15)
$$
but with each eigenvalue doubly degenerate, due to the isomorphism between
the complex numbers and the 2 × 2 matrices exhibited in (2.15). From
this viewpoint the second factor then corresponds to a double degeneracy.
As a consequence of (2.12), (2.13), (2.14) it follows that the eigenvalue
p.d.f. for ensembles of Hermitian matrices weighted by (2.1) and with real
(β = 1), complex (β = 2) and real quaternion (β = 4) elements is
$$
\frac{1}{C}\,e^{-\sum_{j=1}^{N}V(\lambda_j)}\prod_{1\le j<k\le N}|\lambda_k-\lambda_j|^{\beta}.
$$
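For N = 2, β = 1 this eigenvalue p.d.f. can be probed directly. With V(x) = x²/2 (an assumed choice consistent with (2.1)) the spacing s = λ₂ − λ₁ = √((a−d)² + 4b²) of a 2 × 2 real symmetric matrix [[a, b], [b, d]] is Rayleigh distributed with mean √π, exhibiting the |λ_k − λ_j|^β repulsion at β = 1. A Monte Carlo sketch:

```python
import numpy as np

# density exp(-Tr(X^2)/2): a, d ~ N(0,1), off-diagonal b ~ N(0, 1/2);
# the spacing s = sqrt((a-d)^2 + 4b^2) is Rayleigh(sqrt(2)), mean sqrt(pi)
rng = np.random.default_rng(4)
n_samp = 40000
a = rng.standard_normal(n_samp)
d = rng.standard_normal(n_samp)
b = rng.standard_normal(n_samp) / np.sqrt(2)
s = np.sqrt((a - d) ** 2 + 4 * b ** 2)

assert abs(s.mean() - np.sqrt(np.pi)) < 0.05
```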

2.6. Eigenvalue p.d.f.s for Wishart matrices


For X a random n × m rectangular matrix (n ≥ m), the positive definite
matrix A := X†X is referred to as a Wishart matrix, after the application of
such matrices in multivariate statistics (see e.g. [28]). In the latter setting
there are m variables x₁,…,x_m, each measured n times, to give an array
of data \([x_k^{(j)}]_{j=1,\ldots,n;\;k=1,\ldots,m}\) which is thus naturally represented as a matrix. Then
$$
X^{T}X = \Big[\sum_{j=1}^{n}x_{k_1}^{(j)}x_{k_2}^{(j)}\Big]_{k_1,k_2=1,\ldots,m}
$$

is essentially an empirical approximation to the covariance matrix for the


data.
Fundamental to the computation of the eigenvalue p.d.f. for Wishart
matrices is the following result relating the Jacobian for changing variables
from the elements of X to the elements of A (and other associated variables
not explicitly stated).

Proposition 2.2. Let the n × m matrix X have real (β = 1), complex
(β = 2) or real quaternion (β = 4) elements, and suppose it has p.d.f. of
the form F(X†X). The p.d.f. of A := X†X is then proportional to
$$
F(A)\,\big(\det A\big)^{(\beta/2)(n-m+1-2/\beta)}.
$$

Proof. We follow [29]. The p.d.f. of A must be equal to
$$
F(A)\,h(A)
$$
for some h. Write A = B†VB where V is positive definite. Making use of
the result of Proposition 2.1 tells us that the p.d.f. of V is then
$$
F(B^{\dagger}VB)\,h(B^{\dagger}VB)\,\det(B^{\dagger}B)^{(\beta/2)(m-1+2/\beta)}.
\qquad(2.16)
$$
Now let X = YB, where Y is such that V = Y†Y. Noting that for \(\vec{x}^{\,T} = \vec{y}^{\,T}B\)
the Jacobian is (det B†B)^{β/2}, it follows that
$$
(dX) = (\det B^{\dagger}B)^{\beta n/2}(dY)
$$
and hence the p.d.f. of Y is
$$
F(B^{\dagger}Y^{\dagger}YB)\,(\det B^{\dagger}B)^{\beta n/2}.
$$
This is a function of Y†Y, so the p.d.f. of V = Y†Y is
$$
F(B^{\dagger}VB)\,(\det B^{\dagger}B)^{\beta n/2}\,h(V).
\qquad(2.17)
$$

Equating (2.16) and (2.17) gives
$$
h(B^{\dagger}VB) = h(V)\,(\det B^{\dagger}B)^{(\beta/2)(n-m+1-2/\beta)}.
$$
Setting V = 1_m and noting h(1_m) = c for some constant c implies the
sought result.

Suppose that, analogous to (2.1), the matrix X in (2.4) is distributed
according to
$$
\frac{1}{C}\,e^{-{\rm Tr}\,V(X^{\dagger}X)}
$$
where V(x) is a polynomial in x with positive leading term. Then as a
consequence of Proposition 2.2 and (2.12)–(2.14) the eigenvalue p.d.f. of
A = X†X is equal to
$$
\frac{1}{C}\,e^{-\sum_{j=1}^{m}V(\lambda_j)}\prod_{j=1}^{m}\lambda_j^{(\beta/2)(n-m+1-2/\beta)}\prod_{1\le j<k\le m}|\lambda_k-\lambda_j|^{\beta}
\qquad(2.18)
$$
where 0 ≤ λ_j < ∞ (j = 1,…,m). Because the eigenvalues of (2.4), {x_j}
say, are related to the eigenvalues {λ_j} by x_j² = λ_j, it follows that the {x_j}
have p.d.f.
$$
\frac{1}{C}\,e^{-\sum_{j=1}^{m}V(x_j^2)}\prod_{j=1}^{m}|x_j|^{\beta(n-m+1-2/\beta)+1}\prod_{1\le j<k\le m}|x_k^2-x_j^2|^{\beta}.
\qquad(2.19)
$$
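A quick numerical check of the β = 1 Wishart construction (the matrix sizes and N[0, 1] entries are illustrative assumptions): A = XᵀX is positive definite, its eigenvalues sum to Tr A, and E[Tr A] = nm since each squared entry has mean one:

```python
import numpy as np

n, m = 8, 5
rng = np.random.default_rng(5)
X = rng.standard_normal((n, m))
A = X.T @ X                              # real (beta = 1) Wishart matrix
lam = np.linalg.eigvalsh(A)

assert np.all(lam > 0)                   # positive definite (almost surely)
assert np.isclose(lam.sum(), np.trace(A))

# E[Tr A] = n*m, checked over repeated samples
tr = [np.trace(Z.T @ Z) for Z in rng.standard_normal((2000, n, m))]
assert abs(np.mean(tr) - n * m) < 2.0
```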

2.7. Eigenvalue p.d.f.s for unitary matrices


We now take up the problem of computing the Haar volume form (2.8) and
its analogues (2.9) for symmetric and self dual quaternion unitary matrices.
There are a number of possible approaches (see e.g. [13]). Here use will be
made of the Cayley transform
$$
H = i\,\frac{1_N - U}{1_N + U}
\qquad(2.20)
$$
which maps a unitary matrix U to an Hermitian matrix H. From this the
volume form (U†dU) can be computed in terms of (dH), and the decomposition
of the latter in terms of its eigenvalues and eigenvectors is already
known. To begin we invert (2.20) so it reads
$$
U = \frac{1_N + iH}{1_N - iH}.
$$

Making use of the general operator identity
$$
\frac{d}{da}(1-K)^{-1} = (1-K)^{-1}\frac{dK}{da}(1-K)^{-1},
$$
where K is assumed to be a smooth function of a, we deduce from this that
$$
U^{\dagger}dU = 2i\,(1_N + iH)^{-1}\,dH\,(1_N - iH)^{-1}.
\qquad(2.21)
$$
To be consistent with (2.7), (2.9), for U symmetric or self dual quaternion
we introduce the decompositions
$$
U = V^{T}V, \qquad U = V^{D}V
$$
for V ∈ U(N), V ∈ U(2N) respectively, and use (2.21) to calculate δU :=
V\,U†dU\,V†. This gives
$$
\delta U = \frac{i}{2}\,(V^{\#\dagger} + V)\,dH\,(V^{\#} + V^{\dagger})
$$
where # = T (β = 1), # = D (β = 4). Observe that the elements of V^{#†} + V
are real for # = T, real quaternion for # = D. This tells us that Proposition
2.1 can be applied to the right hand side of (2.21) with the appropriate β, and
thus
$$
(\delta U) = 2^{N(\beta(N-1)/2+1)}\,\big(\det(1_N + H^{2})\big)^{-\beta(N-1)/2-1}\,(dH).
\qquad(2.22)
$$

Let {λ_j} denote the eigenvalues of H and {e^{iθ_j}} denote the corresponding
eigenvalues of U, with H and U related by (2.20). Then
$$
e^{i\theta} = \frac{1 + i\lambda}{1 - i\lambda}.
\qquad(2.23)
$$
From (2.22) the corresponding eigenvalue p.d.f. of {λ_j} is
$$
\frac{1}{C}\prod_{l=1}^{N}\frac{1}{(1+\lambda_l^{2})^{\beta(N-1)/2+1}}\prod_{j<k}|\lambda_k-\lambda_j|^{\beta}.
\qquad(2.24)
$$
Changing variables according to (2.23) gives for the eigenvalue p.d.f. of
{e^{iθ_j}}
$$
\frac{1}{C}\prod_{j<k}|e^{i\theta_k}-e^{i\theta_j}|^{\beta}.
\qquad(2.25)
$$
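The Cayley transform and the eigenvalue map (2.23) can be verified directly in a few lines (the Hermitian test matrix below is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 4
z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (z + z.conj().T) / 2                         # Hermitian

# inverse Cayley transform: U = (1 + iH)(1 - iH)^{-1}
U = (np.eye(N) + 1j * H) @ np.linalg.inv(np.eye(N) - 1j * H)
assert np.allclose(U.conj().T @ U, np.eye(N))    # U is unitary

# eigenvalue map (2.23): e^{i theta} = (1 + i lambda)/(1 - i lambda)
lam = np.linalg.eigvalsh(H)
lhs = np.sort(np.angle(np.linalg.eigvals(U)))
rhs = np.sort(np.angle((1 + 1j * lam) / (1 - 1j * lam)))
assert np.allclose(lhs, rhs)
```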

2.8. Eigenvalue p.d.f.s for blocks of unitary matrices


We seek the distribution of the non-zero eigenvalues of t†t in the decomposition
(2.10). To compute this distribution, one approach is to consider the
singular value decomposition of each of the individual blocks, for example
t_{m×n} = U_tΛ_tV_t^{\dagger}, where Λ_t is a rectangular diagonal matrix with diagonal
entries consisting of the square roots of the non-zero eigenvalues of t†t,
and U_t and V_t are m × m and n × n unitary matrices. In terms of such
decompositions it is possible to parametrize (2.10) as
$$
S = \begin{pmatrix} U_r & 0\\ 0 & U_{r'}\end{pmatrix} L \begin{pmatrix} V_r & 0\\ 0 & V_{r'}\end{pmatrix}
\qquad(2.26)
$$
where
$$
L = \begin{bmatrix} \sqrt{1-\Lambda_t\Lambda_t^{T}} & i\Lambda_t\\ i\Lambda_t^{T} & \sqrt{1-\Lambda_t^{T}\Lambda_t}\end{bmatrix}.
$$
In the case that S is symmetric, it is further required that
$$
V_r = U_r^{T}, \qquad V_{r'} = U_{r'}^{T},
\qquad(2.27)
$$
while for S self dual quaternion
$$
V_r = U_r^{D}, \qquad V_{r'} = U_{r'}^{D}.
\qquad(2.28)
$$
From (2.26) the method of wedge products can be used to derive that
the non-zero elements λ_j of Λ_t have the distribution
$$
\prod_{j=1}^{m}\lambda_j^{\beta a}\prod_{1\le j<k\le m}|\lambda_k^2-\lambda_j^2|^{\beta}, \qquad a = n-m+1-2/\beta
\qquad(2.29)
$$
where 0 < λ_j < 1 (j = 1,…,m). But it turns out that the details of
the calculation are quite tedious [13, 19]. In the case β = 2 some alternative
derivations are possible [33, 6, 19], and a more general result can be derived.

Proposition 2.3. Let U be an N × N random unitary matrix chosen with
Haar measure. Decompose U into blocks
$$
U = \begin{pmatrix} A_{n_1\times n_2} & C_{n_1\times(N-n_2)}\\ B_{(N-n_1)\times n_2} & D_{(N-n_1)\times(N-n_2)}\end{pmatrix}
\qquad(2.30)
$$
where n₁ ≥ n₂. The eigenvalue p.d.f. of Y := A†A is proportional to
$$
\prod_{j=1}^{n_2}y_j^{n_1-n_2}(1-y_j)^{N-n_1-n_2}\prod_{j<k}^{n_2}(y_k-y_j)^{2}.
\qquad(2.31)
$$

Proof. We will use the matrix integral [22]
$$
\int e^{(i/2){\rm Tr}(HQ)}\,\big(\det(H-\lambda 1_m)\big)^{-n}\,(dH) \propto (\det Q)^{n-m}\,e^{(i/2)\lambda\,{\rm Tr}\,Q},
\qquad(2.32)
$$
valid for Q Hermitian and Im(λ) > 0, where the integration is over the space
of m × m Hermitian matrices. In (2.30) the fact that U is unitary tells us
that
$$
AA^{\dagger} + CC^{\dagger} = 1_{n_1}.
\qquad(2.33)
$$
Following an idea of [38], we regard (2.33) as a constraint in the space of
general n₁ × n₂ and n₁ × (N − n₂) complex rectangular matrices A and C,
which allows the distribution of A to be given by
$$
\int \delta(AA^{\dagger} + CC^{\dagger} - 1_{n_1})\,(dC).
\qquad(2.34)
$$
The delta function in (2.34) is a product of scalar delta functions, which in
turn is proportional to the matrix integral
$$
\int e^{i\,{\rm Tr}(H(AA^{\dagger}+CC^{\dagger}-1_{n_1}))}\,(dH),
\qquad(2.35)
$$
where the integration is over the space of n₁ × n₁ Hermitian matrices.
Substituting (2.35) in (2.34) and changing the order of integration,
the integration over C is a Gaussian integral and so can be computed
immediately. However for the resulting function of H to be integrable around
H = 0, the replacement H ↦ H − iε1_{n₁} in the exponent of (2.35) must be
made. Doing this we are able to deduce (2.34) to be proportional to
$$
\lim_{\epsilon\to 0^{+}}\int\big(\det(H - i\epsilon 1_{n_1})\big)^{-(N-n_2)}\,e^{i\,{\rm Tr}(H(1_{n_1}-AA^{\dagger}))}\,(dH)
\qquad(2.36)
$$
which in turn is proportional to
$$
\big(\det(1_{n_1}-AA^{\dagger})\big)^{N-n_1-n_2},
\qquad(2.37)
$$
where (2.37) follows from (2.36) using (2.32). Because the non-zero eigenvalues
of AA† and A†A are the same, we can replace AA† by A†A in (2.37).
Now using Proposition 2.2 in the case β = 2 gives that the distribution of
A is proportional to
$$
(\det Y)^{n_1-n_2}\,\big(\det(1_{n_2}-Y)\big)^{N-n_1-n_2}.
\qquad(2.38)
$$
Changing variables now to the eigenvalues and eigenvectors using (2.13)
gives the stated result.

The eigenvalue p.d.f. (2.31) reclaims (2.29) in the case β = 2 upon setting
n₁ = n₂ = m, N = n + m and changing variables 1 − y_j = λ_j².
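Proposition 2.3 can be spot-checked by simulation in the simplest case n₁ = n₂ = 1, where (2.31) reduces to the Beta(1, N−1) density ∝ (1−y)^{N−2} for y = |U₁₁|², with mean 1/N. The Haar sampling recipe below is an assumption, not part of the text:

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar U(n) sample via QR of a complex Ginibre matrix with phase fix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

N = 5
rng = np.random.default_rng(7)
y = np.array([abs(haar_unitary(N, rng)[0, 0]) ** 2 for _ in range(4000)])

assert np.all((y >= 0.0) & (y <= 1.0))      # eigenvalue of A^dagger A lies in [0, 1]
assert abs(y.mean() - 1.0 / N) < 0.02       # Beta(1, N-1) mean is 1/N
```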

2.9. Classical random matrix ensembles


Let β = 1, 2 or 4 according to the elements being real, complex or real
quaternion respectively. In the case of Hermitian matrices, the eigenvalue
p.d.f.s derived above all have the general form
$$
\frac{1}{C}\prod_{l=1}^{N}g(x_l)\prod_{1\le j<k\le N}|x_k-x_j|^{\beta}.
$$
Choosing the entries of the matrices to be independent Gaussians, when
there is a choice, the form of g(x) is, up to scaling x_l ↦ cx_l,
$$
g(x) = \begin{cases}
e^{-x^2}, & \text{Gaussian}\\
x^{a}e^{-x}\ (x>0), & \text{Laguerre}\\
x^{a}(1-x)^{b}\ (0<x<1), & \text{Jacobi}\\
(1+x^{2})^{-\alpha}, & \text{Cauchy.}
\end{cases}
$$
These are the four classical weight functions from orthogonal polynomial
theory, which can be characterized by the property that
$$
\frac{d}{dx}\log g(x) = \frac{a(x)}{b(x)}
$$
where
$$
\text{degree }a(x) \le 1, \qquad \text{degree }b(x) \le 2.
$$

Recall that stereographic projection of the Cauchy weight, for a certain α,
gives the circular ensemble, as noted in (2.23)–(2.25). Thus these are
essentially the same p.d.f.s encountered in the log-gas systems of Section
1.1, and the quantum many body systems of Section 1.2, except that β is
restricted to one of three values.
It is our objective in the rest of these notes to explore some eigenvalue
problems which relate to the Gaussian and Laguerre ensembles for general
β > 0.

3. -Ensembles of Random Matrices


3.1. Gaussian ensemble
We will base our construction on an inductive procedure. Let a be a scalar
chosen from a particular probability distribution, and let \(\vec{w}\) be an N × 1
column vector with each component drawn from a particular probability

density. Inductively define a sequence of matrices {M_j}_{j=1,2,…} by M₁ = a
and
$$
M_{N+1} = \begin{pmatrix} D_N & \vec{w}\\ \vec{w}^{\,T} & a\end{pmatrix}
\qquad(3.1)
$$
where D_N = diag(a₁,…,a_N) with {a_j} denoting the eigenvalues of M_N.
For example, suppose a ~ N[0, 1] and w_j ~ N[0, 1/√2] (j = 1,…,N).
Let O_N be the real orthogonal matrix which diagonalizes M_N, so that
M_N = O_N D_N O_N^{T}, and observe
$$
\begin{pmatrix} O_N & \vec{0}\\ \vec{0}^{\,T} & 1\end{pmatrix}^{T}
\begin{pmatrix} M_N & \vec{w}\\ \vec{w}^{\,T} & a\end{pmatrix}
\begin{pmatrix} O_N & \vec{0}\\ \vec{0}^{\,T} & 1\end{pmatrix}
\stackrel{d}{=}
\begin{pmatrix} D_N & \vec{w}\\ \vec{w}^{\,T} & a\end{pmatrix},
$$
the equality being in distribution since O_N^{T}\vec{w} has the same distribution as
\vec{w} for Gaussian \vec{w}. It follows that the construction (3.1) gives real symmetric
matrices M_N with distribution proportional to
$$
e^{-{\rm Tr}(M_N^{2})/2}
$$
and we know the corresponding eigenvalue p.d.f. is
$$
\frac{1}{C}\prod_{l=1}^{N}e^{-a_l^{2}/2}\prod_{1\le j<k\le N}|a_k-a_j|.
\qquad(3.2)
$$

Given the eigenvalues {a_j}_{j=1,…,N} of M_N we would like to compute the
eigenvalues {λ_j}_{j=1,…,N+1} of M_{N+1}. Now
$$
\det(\lambda 1_{N+1} - M_{N+1}) = \det\begin{pmatrix} \lambda 1_N - D_N & -\vec{w}\\ -\vec{w}^{\,T} & \lambda - a\end{pmatrix}
$$
$$
= \det\begin{pmatrix} \lambda 1_N - D_N & -\vec{w}\\ \vec{0}^{\,T} & \lambda - a - \vec{w}^{\,T}(\lambda 1_N - D_N)^{-1}\vec{w}\end{pmatrix}
= p_N(\lambda)\big(\lambda - a - \vec{w}^{\,T}(\lambda 1_N - D_N)^{-1}\vec{w}\big)
$$
where p_N(λ) is the characteristic polynomial for M_N. But λ1_N − D_N is
diagonal, so its inverse is also diagonal, allowing us to conclude
$$
\frac{p_{N+1}(\lambda)}{p_N(\lambda)} = \lambda - a - \sum_{i=1}^{N}\frac{q_i}{\lambda - a_i}, \qquad q_i := w_i^{2}.
\qquad(3.3)
$$

The eigenvalues of M_{N+1} are thus given by the zeros of the rational
function in (3.3). The corresponding p.d.f. can be computed for a certain
choice of the distribution of the q_i [12, 20].
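The bordering construction (3.1) and the secular equation (3.3) are easy to confirm numerically; the sketch below (with β = 1 entry distributions as in the example above) also exhibits the interlacing of the new eigenvalues with the a_j used in Proposition 3.1:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 5
a_vec = np.sort(rng.standard_normal(N))[::-1]   # eigenvalues of M_N, decreasing
w = rng.standard_normal(N) / np.sqrt(2)         # beta = 1 bordering vector
a = rng.standard_normal()

M = np.zeros((N + 1, N + 1))                    # the matrix (3.1)
M[:N, :N] = np.diag(a_vec)
M[:N, N] = w
M[N, :N] = w
M[N, N] = a

# secular equation (3.3), tested at a point away from the spectrum
lam = 10.0
lhs = np.linalg.det(lam * np.eye(N + 1) - M) / np.prod(lam - a_vec)
rhs = lam - a - np.sum(w ** 2 / (lam - a_vec))
assert np.isclose(lhs, rhs)

# interlacing l_1 > a_1 > l_2 > ... > a_N > l_{N+1}, cf. (3.5)
mu = np.sort(np.linalg.eigvalsh(M))[::-1]       # decreasing order
assert np.all(mu[:N] > a_vec) and np.all(mu[1:] < a_vec)
```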

Proposition 3.1. Let w_i² ~ Γ[β/2, 1], where Γ[s, σ] refers to the gamma
distribution, specified by the p.d.f. σ^{-s}x^{s-1}e^{-x/σ}/Γ(s) (x > 0). Given
$$
a_1 > a_2 > \cdots > a_N
$$
the p.d.f. for the zeros of the random rational function
$$
\lambda - a - \sum_{i=1}^{N}\frac{q_i}{\lambda - a_i}
$$
is equal to
$$
\frac{e^{a^{2}/2}}{(\Gamma(\beta/2))^{N}}\,
\frac{\displaystyle\prod_{1\le j<k\le N+1}(\lambda_j-\lambda_k)}{\displaystyle\prod_{1\le j<k\le N}(a_j-a_k)^{\beta-1}}\,
\prod_{j=1}^{N+1}\prod_{p=1}^{N}|\lambda_j-a_p|^{\beta/2-1}\,
\exp\Big(-\frac{1}{2}\Big(\sum_{j=1}^{N+1}\lambda_j^{2}-\sum_{j=1}^{N}a_j^{2}\Big)\Big)
\qquad(3.4)
$$
where
$$
\infty > \lambda_1 > a_1 > \lambda_2 > \cdots > a_N > \lambda_{N+1} > -\infty
\qquad(3.5)
$$
and
$$
\sum_{j=1}^{N+1}\lambda_j = \sum_{j=1}^{N}a_j + a.
\qquad(3.6)
$$
j=1 j=1

Proof. Because the q_i are positive, graphical considerations imply the
interlacing condition. Note too that the summation constraint is equivalent
to the statement that Tr M_{N+1} = Tr D_N + a, while the translations
λ_j ↦ λ_j − a, a_j ↦ a_j − a show it suffices to consider the case a = 0.
With a = 0 we have
$$
\lambda - \sum_{i=1}^{N}\frac{q_i}{\lambda - a_i} = \frac{\displaystyle\prod_{j=1}^{N+1}(\lambda-\lambda_j)}{\displaystyle\prod_{l=1}^{N}(\lambda-a_l)}.
$$
From the residue at λ = a_i it follows that
$$
q_i = -\frac{\displaystyle\prod_{j=1}^{N+1}(a_i-\lambda_j)}{\displaystyle\prod_{l=1,\,l\ne i}^{N}(a_i-a_l)}.
\qquad(3.7)
$$

We want to compute the change of variables from {q_i}_{i=1,…,N} to
{λ_j}_{j=1,…,N}. It follows immediately from (3.7) that, up to a sign,
$$
\bigwedge_{i=1}^{N}dq_i = \prod_{j=1}^{N}q_j\,\det\Big[\frac{1}{a_i-\lambda_j}\Big]_{i,j=1,\ldots,N}\,\bigwedge_{j=1}^{N}d\lambda_j.
\qquad(3.8)
$$
Hence after making use of the Cauchy double alternant identity the sought
Jacobian is seen to be equal to
$$
\prod_{j=1}^{N}q_j\;\frac{\displaystyle\prod_{1\le i<j\le N}(a_i-a_j)(\lambda_i-\lambda_j)}{\displaystyle\prod_{i,j=1}^{N}(a_i-\lambda_j)}.
\qquad(3.9)
$$
But the distribution of {w_j} is such that the {q_j} have p.d.f.
$$
\frac{1}{(\Gamma(\beta/2))^{N}}\prod_{j=1}^{N}q_j^{\beta/2-1}\,e^{-\sum_{j=1}^{N}q_j}.
\qquad(3.10)
$$
We must multiply (3.9) and (3.10), and write {q_j} in terms of {a_i, λ_j}. By
equating the coefficients of 1/λ on both sides of (3.3) and using (3.6) with
a = 0 we see
$$
\sum_{j=1}^{N}q_j = \frac{1}{2}\Big(\sum_{j=1}^{N+1}\lambda_j^{2} - \sum_{j=1}^{N}a_j^{2}\Big).
$$
Further, we can read off ∏_{i=1}^{N}q_i from (3.7). Substituting we deduce (3.4).

Suppose now that
$$
a \sim N[0,1].
\qquad(3.11)
$$
Then we see from Proposition 3.1 that the (conditional) eigenvalue p.d.f. of
{λ_j} is given by (3.4) with e^{a²/2} replaced by the constant 1/√(2π), and
the constraint (3.6) no longer present. Let this conditional eigenvalue
p.d.f. be denoted G_N({λ_j}, {a_k}), and denote its domain of support
(3.5) by R_N. Let {a_j} have p.d.f. p_N(a₁,…,a_N), and let {λ_j} have
p.d.f. p_{N+1}(λ₁,…,λ_{N+1}). With this notation, we read off from (3.4) that
$$
p_{N+1}(\lambda_1,\ldots,\lambda_{N+1}) = \int_{R_N}da_1\cdots da_N\,G_N(\{\lambda_j\},\{a_k\})\,p_N(a_1,\ldots,a_N).
\qquad(3.12)
$$
We seek the solution of this recurrence with p₀ = 1.

When β = 1 we know that the solution of (3.12) is given by (3.2). To
obtain its solution for general β, we begin by noting that, with c_i denoting
the top entry of the normalized eigenvector corresponding to the eigenvalue
λ_i of M_N, we have
$$
\sum_{j=1}^{N}\frac{c_j^{2}}{\lambda-\lambda_j} = \big[(\lambda 1_N - M_N)^{-1}\big]_{11} = \frac{p_{N-1}(\lambda)}{p_N(\lambda)}.
\qquad(3.13)
$$
Here the first equality follows from the spectral decomposition, while the
second follows from Cramer's rule. Because the matrix (3.1) is real symmetric
and thus orthogonally diagonalizable, we must have
$$
\sum_{i=1}^{N}c_i^{2} = 1
$$
which is consistent with (3.13).


In the case β = 1 the matrix M_N is orthogonally invariant and so we
have
$$
c_i^{2} \stackrel{d}{=} \frac{w_i^{2}}{w_1^{2}+\cdots+w_N^{2}} =: \xi_i
$$
where each w_i² ~ Γ[1/2, 1]. Generally, if
$$
w_i^{2} \sim \Gamma[\beta/2, 1]
\qquad(3.14)
$$
then the p.d.f. of ξ₁,…,ξ_N is equal to the Dirichlet distribution
$$
\frac{\Gamma(N\beta/2)}{(\Gamma(\beta/2))^{N}}\prod_{j=1}^{N}\xi_j^{\beta/2-1}
\qquad(3.15)
$$
where each ξ_j is positive and Σ_{j=1}^{N}ξ_j = 1.
Let us then consider the distribution of the roots of (3.13) implied by
the c_i² having the Dirichlet distribution (3.15) [8, 1].

Proposition 3.2. Let {ξ_i} have the Dirichlet distribution
$$
\frac{\Gamma(N\beta/2)}{(\Gamma(\beta/2))^{N}}\prod_{j=1}^{N}\xi_j^{\beta/2-1}
$$
and let {b_j} be given. The roots of the random rational function
$$
\sum_{j=1}^{N}\frac{\xi_j}{x-b_j},
$$

denoted {x₁,…,x_{N−1}} say, have the p.d.f.
$$
\frac{\Gamma(N\beta/2)}{(\Gamma(\beta/2))^{N}}\,
\frac{\displaystyle\prod_{1\le j<k\le N-1}(x_j-x_k)}{\displaystyle\prod_{1\le j<k\le N}(b_j-b_k)^{\beta-1}}\,
\prod_{j=1}^{N-1}\prod_{p=1}^{N}|x_j-b_p|^{\beta/2-1}
\qquad(3.16)
$$
where
$$
b_1 > x_1 > b_2 > x_2 > \cdots > x_{N-1} > b_N.
\qquad(3.17)
$$

Proof. As is consistent with (3.13) write
$$
\sum_{j=1}^{N}\frac{\xi_j}{x-b_j} = \frac{\displaystyle\prod_{l=1}^{N-1}(x-x_l)}{\displaystyle\prod_{l=1}^{N}(x-b_l)}.
$$
For a particular j, taking the limit x → b_j shows
$$
\xi_j = \frac{\displaystyle\prod_{l=1}^{N-1}(b_j-x_l)}{\displaystyle\prod_{l=1,\,l\ne j}^{N}(b_j-b_l)}.
\qquad(3.18)
$$
Our task is to change variables from {ξ_j}_{j=1,…,N−1} to {x_j}_{j=1,…,N−1}.
Analogous to (3.8) we have, up to a sign,
$$
\bigwedge_{j=1}^{N-1}d\xi_j = \prod_{j=1}^{N-1}\xi_j\,\det\Big[\frac{1}{b_j-x_k}\Big]_{j,k=1,\ldots,N-1}\,\bigwedge_{j=1}^{N-1}dx_j
$$
and thus the corresponding Jacobian is equal to
$$
\prod_{j=1}^{N-1}\xi_j\;\frac{\displaystyle\prod_{1\le j<k\le N-1}(b_k-b_j)(x_k-x_j)}{\displaystyle\prod_{j,k=1}^{N-1}(b_j-x_k)}.
\qquad(3.19)
$$
The result now follows immediately upon multiplying (3.19) with (3.15), and
substituting for ξ_j using (3.18).

Because, with respect to {x_j}, (3.16) is a p.d.f., integrating over the
region (3.17) (R′_{N−1} say) must give unity, and so we have the integration
formula
$$
\int_{R'_{N-1}}dx_1\cdots dx_{N-1}\prod_{1\le j<k\le N-1}(x_j-x_k)\prod_{j=1}^{N-1}\prod_{p=1}^{N}|x_j-b_p|^{\beta/2-1}
= \frac{(\Gamma(\beta/2))^{N}}{\Gamma(N\beta/2)}\prod_{1\le j<k\le N}(b_j-b_k)^{\beta-1}.
\qquad(3.20)
$$
This allows us to verify the solution of the recurrence (3.12).

Proposition 3.3. The solution of the recurrence (3.12) is given by
$$
p_N(x_1,\ldots,x_N) = \frac{1}{m_N(\beta)}\prod_{j=1}^{N}e^{-x_j^{2}/2}\prod_{1\le j<k\le N}|x_k-x_j|^{\beta}
\qquad(3.21)
$$
where
$$
N!\,m_N(\beta) = (2\pi)^{N/2}\prod_{j=0}^{N-1}\frac{\Gamma(1+(j+1)\beta/2)}{\Gamma(1+\beta/2)}.
$$

Proof. Substituting (3.21) in the r.h.s. of (3.12) gives
$$
\frac{1}{\sqrt{2\pi}}\,\frac{1}{(\Gamma(\beta/2))^{N}}\,\frac{1}{m_N(\beta)}\,
e^{-\frac{1}{2}\sum_{j=1}^{N+1}\lambda_j^{2}}\prod_{1\le j<k\le N+1}(\lambda_j-\lambda_k)
\int_{R_N}da_1\cdots da_N\prod_{1\le j<k\le N}(a_j-a_k)\prod_{j=1}^{N+1}\prod_{p=1}^{N}|\lambda_j-a_p|^{\beta/2-1}.
$$
The integral is precisely the N ↦ N + 1 case of (3.20). Substituting its
value we obtain p_{N+1} as specified by (3.21).

3.2. Three term recurrence and tridiagonal matrices

According to the working of the previous section, the characteristic poly-
nomial p_N(x) := \prod_{l=1}^{N} (x - x_l), where {x_j} is distributed according to the
Gaussian β-ensemble, satisfies the recurrence relation

\[
\frac{p_{N-1}(x)}{p_N(x)} = \sum_{j=1}^{N} \frac{c_j}{x - x_j}
\]

where

\[
c_j = w_j^2 / (w_1^2 + \cdots + w_N^2), \qquad w_j^2 \sim \Gamma[\beta/2, 1],
\]
as well as the further recurrence relation

\[
\frac{p_{N+1}(x)}{p_N(x)} = x - a - \sum_{j=1}^{N} \frac{w_j^2}{x - x_j},
\qquad a \sim N[0, 1].
\]

The two recurrences together give the random coefficient three term recur-
rence

\[
p_{N+1}(x) = (x - a)\, p_N(x) - b_N^2\, p_{N-1}(x),
\qquad b_N^2 := w_1^2 + \cdots + w_N^2 \sim \Gamma[N\beta/2, 1].   (3.22)
\]
The three term recurrence (3.22) occurs in the study of tridiagonal ma-
trices. Thus consider a general real symmetric tridiagonal matrix

\[
T_n = \begin{pmatrix}
a_1 & b_1 & & & \\
b_1 & a_2 & b_2 & & \\
 & b_2 & a_3 & \ddots & \\
 & & \ddots & \ddots & b_{n-1} \\
 & & & b_{n-1} & a_n
\end{pmatrix} .   (3.23)
\]

By forming \lambda 1_n - T_n and expanding the determinant along the bottom row
one sees

\[
\det(\lambda 1_n - T_n)
= (\lambda - a_n) \det(\lambda 1_{n-1} - T_{n-1})
- b_{n-1}^2 \det(\lambda 1_{n-2} - T_{n-2}) .
\]

Comparison with (3.22) shows the Gaussian β-ensemble is realized by the
eigenvalue p.d.f. of random tridiagonal matrices with

\[
a_j \sim N[0, 1], \qquad b_j^2 \sim \Gamma[j\beta/2, 1] .   (3.24)
\]
This result was first obtained using different methods in [9]. The present
derivation is a refinement of the approach in [20].
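This realization is easy to experiment with numerically. The sketch below (the helper names are ours, not from the text) samples the tridiagonal model (3.23)-(3.24) and confirms the bottom-row expansion of det(λ1_n − T_n) against a direct determinant evaluation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tridiagonal(n, beta):
    # (3.24): a_j ~ N[0, 1], b_j^2 ~ Gamma[j*beta/2, 1]
    a = rng.standard_normal(n)
    b = np.sqrt(rng.gamma(shape=np.arange(1, n) * beta / 2.0, scale=1.0))
    return a, b

def charpoly(lam, a, b):
    # three term recurrence: p_k = (lam - a_k) p_{k-1} - b_{k-1}^2 p_{k-2}
    p_prev, p = 1.0, lam - a[0]
    for k in range(1, len(a)):
        p_prev, p = p, (lam - a[k]) * p - b[k - 1] ** 2 * p_prev
    return p

n, beta, lam = 8, 2.0, 0.7
a, b = sample_tridiagonal(n, beta)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
direct = np.linalg.det(lam * np.eye(n) - T)
print(direct, charpoly(lam, a, b))
```

The two evaluations agree up to floating-point round-off for any sampled T_n.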

4. Laguerre Ensemble

A recursive construction of the Hermite ensemble was motivated by con-
sideration of a recursive structure inherent in the GOE. Likewise, to moti-
vate a recursive construction of the Laguerre ensemble we first examine
the case of the LOE. As noted in Section 2.6 this is realized by matrices
X_{(n)}^T X_{(n)} where X_{(n)} is an n × N rectangular matrix with Gaussian entries
N[0, 1]. Such matrices satisfy the recurrence

\[
X_{(n+1)}^T X_{(n+1)} = X_{(n)}^T X_{(n)} + \vec{x}_{(1)} \vec{x}_{(1)}^{\,T} .   (4.1)
\]

This suggests inductively defining a sequence of N × N positive definite
matrices indexed by (n) according to

\[
A_{(n+1)} = \mathrm{diag}\, A_{(n)} + \vec{x}_{(1)} \vec{x}_{(1)}^{\,T}   (4.2)
\]
where diag A_{(n)} refers to the diagonal form of A_{(n)} and A_{(0)} = [0]_{N \times N}.
Noting that A_{(n)} will have N − n zero eigenvalues, it is therefore necessary
to study the eigenvalues of the N × N matrix

\[
Y := \mathrm{diag}(a_1, ..., a_n, \underbrace{a_{n+1}, ..., a_{n+1}}_{N-n}) + \vec{x}\vec{x}^{\,T} .
\]

Since

\[
\det(\lambda 1_N - Y)
= \det(\lambda 1_N - A)\, \det\!\big(1_N - (\lambda 1_N - A)^{-1} \vec{x}\vec{x}^{\,T}\big)
\]

it follows

\[
\frac{\det(\lambda 1_N - Y)}{\det(\lambda 1_N - A)}
= 1 - \sum_{j=1}^{n} \frac{x_j^2}{\lambda - a_j}
- \frac{\sum_{j=n+1}^{N} x_j^2}{\lambda - a_{n+1}} .   (4.3)
\]

One is thus led to ask about the density of zeros of the random rational
function

\[
1 - \sum_{j=1}^{n+1} \frac{w_j^2}{\lambda - a_j},   (4.4)
\]

where, since the sum of squares of Gaussian distributed variables is a gamma
distributed variable,

\[
w_j^2 \sim \Gamma[s_j, 1] .   (4.5)
\]
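The determinant step leading to (4.3) is the standard rank-one update formula det(λ1_N − D − x x^T) = det(λ1_N − D)(1 − x^T(λ1_N − D)^{-1} x), which a few lines of numpy confirm (the dimensions, seed and evaluation point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
d = rng.uniform(0.0, 2.0, N)   # diagonal entries, standing in for diag A
x = rng.standard_normal(N)
lam = 5.0                      # chosen away from the d_j so nothing vanishes

Y = np.diag(d) + np.outer(x, x)
ratio = np.linalg.det(lam * np.eye(N) - Y) / np.prod(lam - d)
rational = 1.0 - np.sum(x**2 / (lam - d))
print(ratio, rational)
```

The identity holds for any λ distinct from the diagonal entries, which is what makes the reduction to the rational function (4.4) possible.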

Proposition 4.1. The zeros of the rational function (4.4) have p.d.f.

\[
\frac{1}{\Gamma(s_1) \cdots \Gamma(s_{n+1})}\,
e^{-\sum_{j=1}^{n+1} (\lambda_j - a_j)}
\prod_{1 \le i < j \le n+1}
\frac{\lambda_i - \lambda_j}{(a_i - a_j)^{s_i + s_j - 1}}
\prod_{i,j=1}^{n+1} |\lambda_i - a_j|^{s_j - 1}   (4.6)
\]

where

\[
\lambda_1 > a_1 > \lambda_2 > \cdots > \lambda_{n+1} > a_{n+1} .
\]

This result can be proved [20] by following the general strategy used to
establish Propositions 3.1 and 3.2.
The case of interest is

\[
s_1 = \cdots = s_n = \beta/2, \qquad s_{n+1} = \beta(N - n)/2, \qquad a_{n+1} = 0 .   (4.7)
\]

Let us denote (4.6) with these parameters by

\[
G_\beta(\{\lambda_j\}_{j=1,...,n+1}; \{a_j\}_{j=1,...,n}) .
\]
Let the p.d.f. of \{\lambda_j\}_{j=1,...,n+1} be denoted p_{n+1}(\{\lambda_j\}). For n < N the
recursive construction of \{A_{(n)}\} gives that

\[
p_{n+1}(\{\lambda_j\})
= \int_{\lambda_1 > a_1 > \cdots > \lambda_{n+1} > 0} da_1 \cdots da_n\,
G_\beta(\{\lambda_j\}_{j=1,...,n+1}; \{a_j\}_{j=1,...,n})\, p_n(\{a_j\})   (4.8)
\]

subject to the initial condition p_0 = 1.
With β = 1 the LOE recursion (4.1) tells us that the recurrence (4.8) is
satisfied by the eigenvalue p.d.f. for the non-zero eigenvalues of the Wishart
matrices X_{(n)}^T X_{(n)}. This in turn is equal to the eigenvalue p.d.f. of the full
rank matrices X_{(n)} X_{(n)}^T, which according to (2.18) is given by

\[
p_n(\{\lambda_j\}) = \frac{1}{C_n} \prod_{l=1}^{n}
\lambda_l^{(N-n-1)/2} e^{-\lambda_l}
\prod_{1 \le j < k \le n} |\lambda_k - \lambda_j|   (4.9)
\]

(here the choice V(x) = x in (2.18) has been introduced to account for the
scale factor σ = 1 in the distribution Γ[s_j, 1] used in (4.4)).
For general β > 0, we want to check that (4.8) has as its solution

\[
p_n(\{\lambda_j\}) = \frac{1}{C_{n,\beta}} \prod_{l=1}^{n}
\lambda_l^{\beta(N-n+1)/2 - 1} e^{-\lambda_l}
\prod_{1 \le j < k \le n} |\lambda_k - \lambda_j|^{\beta} .   (4.10)
\]

Since

\[
G_\beta(\{\lambda_j\}_{j=1,...,n+1}; \{a_j\}_{j=1,...,n})
= \frac{1}{(\Gamma(\beta/2))^n\, \Gamma(\beta(N-n)/2)}\,
e^{-\sum_{j=1}^{n+1} \lambda_j + \sum_{j=1}^{n} a_j}\,
\frac{\prod_{1 \le i < j \le n+1} (\lambda_i - \lambda_j)}
{\prod_{1 \le i < j \le n} (a_i - a_j)^{\beta - 1}}\,
\prod_{i=1}^{n+1} \lambda_i^{\beta(N-n)/2 - 1}\,
\prod_{i=1}^{n} a_i^{-(\beta(N-n+1)/2 - 1)}\,
\prod_{i=1}^{n+1} \prod_{j=1}^{n} |\lambda_i - a_j|^{\beta/2 - 1}
\]

we see we need to evaluate

\[
\int_{\lambda_1 > a_1 > \cdots > \lambda_{n+1} > 0} da_1 \cdots da_n\,
\prod_{1 \le i < j \le n} (a_i - a_j)\,
\prod_{i=1}^{n+1} \prod_{j=1}^{n} |\lambda_i - a_j|^{\beta/2 - 1} .
\]

This is precisely the integral (3.20) with N ↦ n + 1, and so is equal to

\[
\frac{(\Gamma(\beta/2))^{n+1}}{\Gamma((n+1)\beta/2)}
\prod_{1 \le j < k \le n+1} (\lambda_j - \lambda_k)^{\beta - 1},
\]

leaving us with

\[
\frac{C_{n+1,\beta}\, \Gamma(\beta/2)}
{C_{n,\beta}\, \Gamma((n+1)\beta/2)\, \Gamma(\beta(N-n)/2)}\,
p_{n+1}(\{\lambda_j\}) .
\]

Thus (4.10) with

\[
C_{n,\beta} = \prod_{k=0}^{n-1}
\frac{\Gamma((k+1)\beta/2)\, \Gamma(\beta(N-k)/2)}{\Gamma(\beta/2)}   (4.11)
\]

is indeed the solution of the recurrence (4.8).


Let p_n(\lambda) = \prod_{l=1}^{n} (\lambda - x_l), where \{x_l\} have the p.d.f. (4.10). We see
from (4.3)-(4.5) and (4.7) that p_n satisfies the recurrence

\[
\frac{p_{n+1}(\lambda)}{\lambda\, p_n(\lambda)}
= 1 - \sum_{j=1}^{n} \frac{w_j^2}{\lambda - x_j}
- \frac{w_{n+1}^2}{\lambda}   (4.12)
\]

where

\[
w_j^2 \sim \Gamma[\beta/2, 1] \;\; (j = 1, ..., n), \qquad
w_{n+1}^2 \sim \Gamma[\beta(N - n)/2, 1] .
\]
In addition, as for the matrix M_N introduced in (3.1), the matrix A_{(n)} in
(4.2) must satisfy the first equality in (3.13), thus implying the companion
recurrence

\[
\frac{p_{n-1}(\lambda)}{p_n(\lambda)} = \sum_{j=1}^{n} \frac{c_j}{\lambda - x_j}   (4.13)
\]

where

\[
c_j = w_j^2 / (w_1^2 + \cdots + w_n^2) .
\]

Comparing (4.12) and (4.13) gives the three term recurrence with random
coefficients [20]

\[
p_{n+1}(\lambda) = (\lambda - w_{n+1}^2)\, p_n(\lambda) - b_n \lambda\, p_{n-1}(\lambda)   (4.14)
\]

where

\[
w_{n+1}^2 \sim \Gamma[\beta(N - n)/2, 1], \qquad
b_n = w_1^2 + \cdots + w_n^2 \sim \Gamma[n\beta/2, 1] .
\]
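As a consistency check on (4.12)-(4.14), one can fix arbitrary zeros x_j and weights, form p_{n+1} and p_{n-1} from the two rational-function recurrences, and verify the three term recurrence at a test point. A sketch (all concrete values are arbitrary choices of ours, and b_n denotes w_1² + ... + w_n² as above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, N = 5, 2.0, 9

x = rng.uniform(0.5, 4.0, n)                    # stand-ins for the zeros of p_n
w2 = rng.gamma(beta / 2.0, 1.0, n)              # w_j^2 ~ Gamma[beta/2, 1]
w2_top = rng.gamma(beta * (N - n) / 2.0, 1.0)   # w_{n+1}^2 ~ Gamma[beta(N-n)/2, 1]
b_n = w2.sum()                                  # b_n = w_1^2 + ... + w_n^2
c = w2 / b_n                                    # the c_j of (4.13)

def p_n(lam):
    return np.prod(lam - x)

def p_n_plus(lam):   # from (4.12)
    return lam * p_n(lam) * (1.0 - np.sum(w2 / (lam - x)) - w2_top / lam)

def p_n_minus(lam):  # from (4.13)
    return p_n(lam) * np.sum(c / (lam - x))

lam = 6.3
lhs = p_n_plus(lam)
rhs = (lam - w2_top) * p_n(lam) - b_n * lam * p_n_minus(lam)
print(lhs, rhs)
```

Since both sides are polynomials in λ of degree n + 1, agreement as rational functions at generic points is the content of (4.14).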

5. Recent Developments

The whole topic of explicit constructions of β-random ensembles is recent,
with the first paper on the subject appearing in 2002 [9]. In that work the
motivation came from considerations in numerical linear algebra, whereby
the form of a GOE matrix after the application of Householder transforma-
tions to tridiagonal form was sought. In the case of unitary matrices,
the viewpoint of numerical linear algebra suggests seeking the Hessenberg
form. Doing this [25] leads to a random matrix construction of the circular
β-ensemble. Similarly, seeking the Hessenberg form of real orthogonal
matrices from O^+(N) leads to a random matrix construction of the Jacobi
β-ensemble [25]. An alternative approach to the latter involves the
cosine-sine block decomposition of unitary matrices [10].
Recurrence relations with random coefficients for the characteristic poly-
nomials of the circular and Jacobi β-ensembles following from the underly-
ing Hessenberg matrices are given in [25]. Using methods similar to those
presented in Sections 3.1 and 4 for the Gaussian and Laguerre β-ensembles,
different three term recurrences with random coefficients for the Jacobi β-
ensemble and the circular β-ensemble have been given [20, 21].
Most recently [31, 11, 26] the continuum limit of various of the recur-
rences has been shown to be given by certain differential operators with
random noise terms. In the case of the Gaussian β-ensemble this can be
anticipated by viewing the corresponding tridiagonal matrix (3.23) as the
discretization of a certain random Schrödinger operator [4]. This allows the
scaled distributions of the particles to be described in terms of the eigen-
values of the corresponding random differential operator.

Acknowledgments
This work was supported by the Australian Research Council.

References
1. G. W. Anderson, A short proof of Selberg's generalized beta formula, Forum
Math. 3 (1991) 415-417.
2. T. H. Baker and P. J. Forrester, The Calogero-Sutherland model and gener-
alized classical polynomials, Commun. Math. Phys. 188 (1997) 175-216.
3. O. Bohigas, M. J. Giannoni and C. Schmit, Characterization of chaotic quan-
tum spectra and universality of level fluctuation laws, Phys. Rev. Lett. 52
(1984) 1-4.
4. J. Breuer, P. J. Forrester and U. Smilansky, Random discrete Schrödinger
operators from random matrix theory, arXiv:math-ph/0507036 (2005).
5. F. Calogero, Solution of the three-body problem in one dimension, J. Math.
Phys. 10 (1969) 2191-2196.
6. B. Collins, Product of random projections, Jacobi ensembles and universality
problems arising from free probability, Prob. Theory Rel. Fields 133 (2005)
315-344.
7. P. Desrosiers and P. J. Forrester, Hermite and Laguerre β-ensembles: Asymp-
totic corrections to the eigenvalue density, Nucl. Phys. B 743 (2006) 307-332.
8. A. L. Dixon, Generalizations of Legendre's formula KE' − (K − E)K' = ½π,
Proc. London Math. Soc. 3 (1905) 206-224.
9. I. Dumitriu and A. Edelman, Matrix models for beta ensembles, J. Math.
Phys. 43 (2002) 5830-5847.
10. A. Edelman and B. D. Sutton, The beta-Jacobi matrix model, the CS de-
composition, and generalized singular value problems, preprint (2006).
11. A. Edelman and B. D. Sutton, From random matrices to stochastic operators,
math-ph/0607038 (2006).
12. R. J. Evans, Multidimensional beta and gamma integrals, Contemp. Math.
166 (1994) 341-357.
13. P. J. Forrester, Log-gases and random matrices,
www.ms.unimelb.edu.au/matpjf/matpjf.html.
14. P. J. Forrester, Selberg correlation integrals and the 1/r^2 quantum many
body system, Nucl. Phys. B 388 (1992) 671-699.
15. P. J. Forrester, Exact integral formulas and asymptotics for the correla-
tions in the 1/r^2 quantum many body system, Phys. Lett. A 179 (1993)
127-130.
16. P. J. Forrester, Exact results and universal asymptotics in the Laguerre ran-
dom matrix ensemble, J. Math. Phys. 35 (1993) 2539-2551.
17. P. J. Forrester, Recurrence equations for the computation of correlations in
the 1/r^2 quantum many body system, J. Stat. Phys. 72 (1993) 39-50.
18. P. J. Forrester, Addendum to Selberg correlation integrals and the 1/r^2 quan-
tum many body system, Nucl. Phys. B 416 (1994) 377-385.
19. P. J. Forrester, Quantum conductance problems and the Jacobi ensemble,
J. Phys. A 39 (2006) 6861-6870.
20. P. J. Forrester and E. M. Rains, Interpretations of some parameter dependent
generalizations of classical matrix ensembles, Probab. Theory Relat. Fields
131 (2005) 1-61.
21. P. J. Forrester and E. M. Rains, Jacobians and rank 1 perturbations relating
to unitary Hessenberg matrices, math.PR/0505552 (2005).
22. Y. V. Fyodorov, Negative moments of characteristic polynomials of random
matrices: Ingham-Siegel integral as an alternative to Hubbard-Stratonovich
transformation, Nucl. Phys. B 621 (2002) 643-674.
23. F. Haake, Quantum Signatures of Chaos (Springer, Berlin, 1992).
24. J. Kaneko, Selberg integrals and hypergeometric functions associated with
Jack polynomials, SIAM J. Math. Anal. 24 (1993) 1086-1110.
25. R. Killip and I. Nenciu, Matrix models for circular ensembles, Int. Math.
Res. Not. 50 (2004) 2665-2701.
26. R. Killip and M. Stoiciu, Eigenvalue statistics for CMV matrices: From Pois-
son to clock via circular beta ensembles, Duke Math. J. 146 (2009) 361-399.
27. I. G. Macdonald, Hall Polynomials and Symmetric Functions, 2nd edn.
(Oxford University Press, Oxford, 1995).
28. R. J. Muirhead, Aspects of Multivariate Statistical Theory (Wiley, New York,
1982).
29. I. Olkin, The 70th anniversary of the distribution of random matrices: A sur-
vey, Linear Algebra Appl. 354 (2002) 231-243.
30. C. E. Porter, Statistical Theories of Spectra: Fluctuations (Academic Press,
New York, 1965).
31. J. Ramirez, B. Rider and B. Virag, Beta ensembles, stochastic Airy spectrum,
and a diffusion, math.PR/0607331 (2006).
32. H. Risken, The Fokker-Planck Equation (Springer, Berlin, 1992).
33. S. H. Simon and A. L. Moustakas, Crossover from conserving to lossy trans-
port in circular random-matrix ensembles, Phys. Rev. Lett. 96 (2006) 136805.
34. A. D. Stone, P. A. Mello, K. A. Muttalib and J.-L. Pichard, Random matrix
theory and maximum entropy models for disordered conductors, in Meso-
scopic Phenomena in Solids, eds. P. A. Lee, B. L. Altshuler and R. A. Webb
(North Holland, Amsterdam, 1991), pp. 369-448.
35. B. Sutherland, Quantum many body problem in one dimension, J. Math.
Phys. 12 (1971) 246-250.
36. J. J. M. Verbaarschot, The spectrum of the Dirac operator near zero virtuality
for N_c = 2 and chiral random matrix theory, Nucl. Phys. B 426 (1994)
559-574.
37. Z. Yan, A class of generalized hypergeometric functions in several variables,
Canad. J. Math. 44 (1992) 1317-1338.
38. K. Zyczkowski and H.-J. Sommers, Truncations of random unitary matrices,
J. Phys. A 33 (2000) 2045-2057.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong

FUTURE OF STATISTICS

Zhidong Bai and Shurong Zheng



National University of Singapore
6 Science Drive 2, 117546, Singapore

Northeast Normal University
Changchun, Jilin 130024, P. R. China
E-mails: [email protected]

[email protected]

With the rapid development of modern computer science, large dimensional
data analysis has become more and more important and has therefore
received increasing attention from statisticians. In this note, we
shall illustrate the difference between large dimensional data analysis
and classical data analysis by means of some examples, and we show
the importance of random matrix theory and its applications to large
dimensional data analysis.

1. Introduction
What are the future aspects of modern statistics and in which direction
will it develop? To answer this, we shall have a look at what has influenced
statistical research in recent decades. We strongly believe that, in every
discipline, the most impacting factor has been and still is the rapid
development and wide application of computer technology and computing
sciences. It has become possible to collect, store and analyze huge amounts
of data of large dimensionality. As a result, more and more measurements
are collected with large dimension, e.g. data in curves, images and movies,
and statisticians have to face the task of analyzing these data. But com-
puter technology also offers big advantages. We are now in a position to do
many things that were not possible 20 years ago, such as making spectral
decompositions of a matrix of order 1000 × 1000, searching for patterns in
DNA sequences and much more. However, it also confronts us with the big

challenge that classical limit theorems are no longer suitable to deal with
large dimensional data and we have to develop new limit theorems to cope
with this. As a result, many statisticians have now become interested in
this research topic.
Typically, large dimensional problems involve a large dimension p and
a small sample size n. However, in a real problem, they both are given
integers. It is then natural to ask for which size the dimension p has to be
taken as fixed or tending to infinity and what we should do if we cannot
justify that p is fixed. Is it reasonable to claim p is fixed if the ratio of
dimension and sample size p/n is small, say, less than 0.001? If we cannot
say p is fixed, can any limit theorems be used for large dimensional data
analysis?
To discuss these questions, we shall provide some examples of multi-
variate analysis. We illustrate the difference between traditional tests and
the new approaches of large dimensional data by considering tests on the
difference of two population means and tests on the equality of a popula-
tion covariance matrix and a given matrix. By means of simulations, we
will show how the new approaches are superior to the traditional ones.
At present, large dimensional random matrix theory (RMT) is the only
systematic theory which is applicable to large dimensional problems. The
RMT is different from the classical limit theories because it is built on
the assumption that p/n → y > 0 regardless of what y is, provided y is in
the applicable range, say y ∈ (0, 1) for the T² statistic. The RMT shows that the
classical limit theories behave very poorly or are even inapplicable to large-
dimensional problems, especially when the dimension is growing propor-
tionally with the sample size [see Dempster (1958), Bai (1993a,b, 1999),
Bai and Saranadasa (1996), Bai and Silverstein (2004, 2006), Bai and Yin
(1993), Bai, Yin and Krishnaiah (1988)]. In this paper, we will show how
to deal with large dimensional problems with the help of RMT, especially
the CLT of Bai and Silverstein (2004).

2. A Multivariate Two-Sample Problem


In this section, we revisit the T² test for the two-sample problem. Suppose
that x_{i,j} ~ N_p(μ_i, Σ), j = 1, ..., N_i, i = 1, 2, are two independent samples.
To test the hypotheses H_0: μ_1 = μ_2 vs H_1: μ_1 ≠ μ_2, traditionally one
uses Hotelling's famous T²-test, which is defined by

\[
T^2 = \alpha\, (\bar{x}_1 - \bar{x}_2)' A^{-1} (\bar{x}_1 - \bar{x}_2),   (2.1)
\]



where \bar{x}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_{i,j}, i = 1, 2,
A = \sum_{i=1}^{2} \sum_{j=1}^{N_i} (x_{i,j} - \bar{x}_i)(x_{i,j} - \bar{x}_i)'
and \alpha = n \frac{N_1 N_2}{N_1 + N_2} with n = N_1 + N_2 - 2. It is well known that, under the null
hypothesis, the T² statistic (suitably scaled) has an F distribution with degrees of freedom
p and n − p + 1.
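In matrix terms T² is a few lines of numpy. The sketch below (the function name is ours; the sample sizes echo the simulation study of Section 2.4) computes the statistic for a small p, and also exhibits the defect discussed below: when p exceeds the degrees of freedom n, the pooled matrix A is singular:

```python
import numpy as np

rng = np.random.default_rng(3)

def hotelling_T2(X1, X2):
    # T^2 = alpha (xbar1 - xbar2)' A^{-1} (xbar1 - xbar2), alpha = n N1 N2/(N1+N2)
    N1, N2 = X1.shape[0], X2.shape[0]
    n = N1 + N2 - 2
    d = X1.mean(axis=0) - X2.mean(axis=0)
    C1, C2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    A = C1.T @ C1 + C2.T @ C2
    return (n * N1 * N2 / (N1 + N2)) * d @ np.linalg.solve(A, d)

# p = 5, well below n = 43: the statistic is defined
T2 = hotelling_T2(rng.standard_normal((25, 5)), rng.standard_normal((20, 5)))

# p = 50 > n = 43: A (p x p) has rank at most n, hence is singular
X1, X2 = rng.standard_normal((25, 50)), rng.standard_normal((20, 50))
C1, C2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
A = C1.T @ C1 + C2.T @ C2
print(T2, np.linalg.matrix_rank(A))
```

Since A is a sum of N_1 − 1 plus N_2 − 1 rank-one terms, its rank can never exceed n, which is why T² breaks down for p > n.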
The advantages of the T²-test include the properties that it is invariant
under affine transformations, has an exact known null distribution, and is
most powerful when the dimension of data is sufficiently small compared
to its sample size. However, Hotelling's test has the serious defect that the
T² statistic is undefined when the dimension of data is greater than the
sample degrees of freedom. Looking for remedies, Chung and Fraser (1958)
proposed a nonparametric test and Dempster (1958, 1960) discussed the
so-called non-exact significance test (NET). Dempster (1960) also con-
sidered the so-called randomization test. Not only is the NET a remedy when
the T² is undefined: Bai and Saranadasa (1996) also found that, even if
T² is well defined, the NET is more powerful than the T² test when the
dimension is close to the sample degrees of freedom. Both the T² test
and Dempster's NET strongly rely on the normality assumption. Moreover,
Dempster's non-exact test statistic involves a complicated estimation of r,
the degrees of freedom for the chi-square approximation. To simplify the
testing procedure, a new method, the Asymptotic Normality Test (ANT), is
proposed in Bai and Saranadasa (1996). It is proven there that the asymp-
totic power of ANT is equivalent to that of Dempster's NET. Simulation
results further show that the new approach is slightly more powerful than
Dempster's NET. We believe that the estimation of r and its rounding to
an integer in Dempster's procedure may cause an error of order O(1/n).
This might indicate that the new approach is superior to Dempster's test in
the second order term in some Edgeworth-type expansions (see Babu and
Bai (1993) and Bai and Rao (1991) for references on Edgeworth expansions).

2.1. Asymptotic power of the T² test


The purpose of this section is to investigate the asymptotic power of
Hotelling's test when p/n → y ∈ (0, 1) and to compare it with other NETs
given in later sections. To derive the asymptotic power of Hotelling's test,
we first derive an asymptotic expression for the threshold of the test. It is
well known that under the null hypothesis, \frac{n-p+1}{np} T^2 has an F-distribution
with degrees of freedom p and n − p + 1. Let the significance level be cho-
sen as α and the threshold be denoted by F_α(p, n − p + 1). By elementary
calculations, we can prove the following.
Lemma 1. We have

\[
\frac{p}{n - p + 1}\, F_\alpha(p, n - p + 1)
= \frac{y_n}{1 - y_n}
+ \sqrt{\frac{2y}{(1-y)^3}}\, \frac{z_\alpha}{\sqrt{n}} + o(1/\sqrt{n}),
\]

where y_n = p/n, \lim_{n \to \infty} y_n = y ∈ (0, 1) and z_α is the 1 − α
quantile of the standard normal distribution.

Now, to describe the asymptotic behavior of the T² statistic under the
alternative hypothesis H_1, one can easily show that the distribution of the
T² statistic is the same as that of

\[
n\, (w + \nu^{-1/2}\eta)'\, U^{-1}\, (w + \nu^{-1/2}\eta),   (2.2)
\]

where \eta = \Sigma^{-1/2}(\mu_1 - \mu_2), U = \sum_{i=1}^{n} u_i u_i', w = (w_1, ..., w_p)' and
u_i, i = 1, ..., n, are i.i.d. N(0, I_p) random vectors, \nu = \frac{N_1 + N_2}{N_1 N_2}, and Σ is the
covariance matrix of the original population. Denote the spectral decompo-
sition of U^{-1} by O diag[d_1, ..., d_p] O' with eigenvalues d_1 ≥ ⋯ ≥ d_p > 0.
Then, (2.2) becomes

\[
n\, (Ow + \nu^{-1/2}\|\eta\| v)'\, \mathrm{diag}[d_1, ..., d_p]\, (Ow + \nu^{-1/2}\|\eta\| v),   (2.3)
\]

where v = O\eta/\|\eta\|. Since U has the Wishart distribution W(n, I_p), the
orthogonal matrix O has the Haar distribution on the group of all orthogo-
nal p-matrices, and hence the vector v is uniformly distributed on the unit
p-sphere. Note that the conditional distribution of Ow given O is N(0, I_p),
the same as that of w, which is independent of O. This shows that Ow is
independent of v. Therefore, replacing Ow in (2.3) by w does not change
the joint distribution of Ow, v and the d_i's. Consequently, T² has the same
distribution as

\[
\xi_n = n \sum_{i=1}^{p} \big( w_i^2 + 2 w_i v_i\, \nu^{-1/2}\|\eta\|
+ \nu^{-1}\|\eta\|^2 v_i^2 \big)\, d_i,   (2.4)
\]

where v = (v_1, ..., v_p)' is uniformly distributed on the unit sphere of R^p
and is independent of w and the d_i's.

Lemma 2. Using the above notation, we have
\sum_{i=1}^{p} d_i - \frac{y_n}{1 - y_n} \to 0
and n \sum_{i=1}^{p} d_i^2 \to \frac{y}{(1-y)^3} in probability.

Now we are in a position to express the approximation of the power
function of Hotelling's test.

Theorem 3. If y_n = p/n → y ∈ (0, 1), N_1/(N_1 + N_2) → κ ∈ (0, 1) and
\|\eta\| = o(1), then

\[
\beta_H(\eta) - \Phi\Big( -z_\alpha
+ \sqrt{\frac{n(1-y)}{2y}}\, \kappa(1-\kappa)\, \|\eta\|^2 \Big) \to 0,   (2.5)
\]
where β_H(η) is the power function of Hotelling's test and Φ is the distribu-
tion function of a standard normal random variable.

Remark 4. If the alternative hypothesis is considered in limit theorems,
it is typically assumed that \sqrt{n}\,\|\eta\|^2 \to a > 0. Under this additional as-
sumption, it follows from (2.5) that the limiting power of Hotelling's test
is given by \beta_H(\eta) \to \Phi\big( -z_\alpha + \sqrt{(1-y)/(2y)}\, \kappa(1-\kappa)\, a \big).
This formula shows that the limiting power of Hotelling's test increases only
slowly for y close to 1 as the non-central parameter a increases.

2.2. Dempster's NET

Dempster (1958, 1960) proposed a non-exact test for the hypothesis H_0
with the dimension of data possibly greater than the sample degrees
of freedom. Let us briefly describe his test. Denote N = N_1 + N_2,
X' = (x_{11}, x_{12}, ..., x_{1N_1}; x_{21}, ..., x_{2N_2}) and by

\[
H' = \Big( \tfrac{1}{\sqrt{N}} J_N,\;
\Big( \sqrt{\tfrac{N_2}{N_1 N}}\, J'_{N_1},\,
-\sqrt{\tfrac{N_1}{N_2 N}}\, J'_{N_2} \Big)',\;
h_3, ..., h_N \Big)
\]

a suitably chosen orthogonal matrix, where J_d is a d-dimensional column
vector of 1's. Let Y = HX = (y_1, ..., y_N)'. Then, the vectors y_1, ..., y_N are
independent normal random vectors with E(y_1) = (N_1\mu_1 + N_2\mu_2)/\sqrt{N},
E(y_2) = \nu^{-1/2}(\mu_1 - \mu_2), E(y_j) = 0 for 3 ≤ j ≤ N, and Cov(y_j) = Σ,
1 ≤ j ≤ N. Dempster proposed the NET statistic
F = Q_2 / \big( \sum_{i=3}^{N} Q_i / n \big), where Q_i = y_i' y_i and n = N − 2. He used
the so-called χ²-approximation technique, assuming Q_i is approximately
distributed as m\chi_r^2, where the parameters m and r may be found by the
method of moments. Then, the distribution of F is approximately F_{r,nr}.
But generally the parameter r (its explicit form is given in (2.8) below) is
unknown. Dempster estimated r by either of the following two ways.
Approach 1: \hat{r}_1 is the solution of the equation

\[
t = \Big( \frac{1}{\hat{r}_1} + \frac{1 + n^{-1}}{3 \hat{r}_1^2} \Big)(n - 1).   (2.6)
\]

Approach 2: \hat{r}_2 is the solution of the equation

\[
t + w = \Big( \frac{1}{\hat{r}_2} + \frac{1 + n^{-1}}{3 \hat{r}_2^2} \Big)(n - 1)
+ \Big( \frac{1}{\hat{r}_2} + \frac{3}{2 \hat{r}_2^2} \Big) \binom{n}{2},   (2.7)
\]
where t = n \ln\big( \frac{1}{n} \sum_{i=3}^{N} Q_i \big) - \sum_{i=3}^{N} \ln Q_i,
w = -\sum_{3 \le i < j \le N} \ln \sin^2 \theta_{ij} and
\theta_{ij} is the angle between the vectors y_i and y_j, 3 ≤ i < j ≤ N. Dempster's
test is then to reject H_0 if F > F_\alpha(\hat{r}, n\hat{r}).
By elementary calculus, we have

\[
r = \frac{(\mathrm{tr}\,\Sigma)^2}{\mathrm{tr}(\Sigma^2)}
\quad \text{and} \quad
m = \frac{\mathrm{tr}(\Sigma^2)}{\mathrm{tr}\,\Sigma} .   (2.8)
\]

From (2.8) and the Cauchy-Schwarz inequality, it follows that r ≤ p. On the
other hand, under regular conditions, both tr(Σ) and tr(Σ²) are of the order
O(n), and hence, r is of the same order. Under the wider conditions (2.12) and
(2.13) given in Theorem 6 below, it can be proven that r → ∞. Further, we
may prove that t \sim (n/r)\, N(1, 1/n) and
w \sim \frac{n(n-1)}{2r}\, N\big(1, \frac{4}{n(n-1)} + \frac{8}{nr}\big). From
these estimates, one may conclude that both \hat{r}_1 and \hat{r}_2 are ratio-consistent
(in the sense that \hat{r}/r → 1). Therefore, the solutions of equations (2.6) and
(2.7) should satisfy

\[
\hat{r}_1 = \frac{n}{t} + O(1)   (2.9)
\]

and

\[
\hat{r}_2 = \frac{1}{w} \binom{n}{2} + O(1),   (2.10)
\]

respectively. Since the random effect may cause an error of order O(1), one
may simply choose the estimates of r as n/t or \frac{1}{w}\binom{n}{2}.
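Formula (2.8) is immediate to evaluate for a concrete Σ. For the compound-symmetry covariance Σ = (1 − ρ)I_p + ρJ_p used later in the simulations (p = 40, ρ = 0.5), the effective degrees of freedom r are far smaller than p:

```python
import numpy as np

p, rho = 40, 0.5
Sigma = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
r = np.trace(Sigma) ** 2 / np.trace(Sigma @ Sigma)   # (2.8)
m = np.trace(Sigma @ Sigma) / np.trace(Sigma)
print(r, m)
```

Here tr Σ = p and tr Σ² = (p − 1)(1 − ρ)² + (1 + (p − 1)ρ)², giving r = 1600/430 ≈ 3.72, well below p = 40: strong coordinate dependence collapses the degrees of freedom of the χ² approximation.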
To describe the asymptotic power function of Dempster's NET, we as-
sume that p/n → y > 0, N_1/N → κ ∈ (0, 1) and that the parameter r is
known. The reader should note that the limiting ratio y is allowed to be
greater than one in this case. When r is unknown, substituting r by the
estimators \hat{r}_1 or \hat{r}_2 may cause an error of high order smallness in the ap-
proximation of the power function of Dempster's NET. Similar to Lemma 1
one may show the following.

Lemma 5. When n, r → ∞,

\[
F_\alpha(r, nr) = 1 + \sqrt{2/r}\, z_\alpha + o(1/\sqrt{r}).   (2.11)
\]

Then we have the following approximation of the power function of Demp-
ster's NET.
Theorem 6. If

\[
\delta' \Sigma \delta = o\big( \sqrt{\mathrm{tr}\,\Sigma^2} \big),   (2.12)
\]

\[
\lambda_{\max} = o\big( \sqrt{\mathrm{tr}\,\Sigma^2} \big),   (2.13)
\]

(λ_max denoting the largest eigenvalue of Σ) and r is known, then

\[
\beta_D(\delta) - \Phi\Big( -z_\alpha
+ \frac{n \kappa(1-\kappa) \|\delta\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} \Big) \to 0,   (2.14)
\]

where δ = μ_1 − μ_2.

Remark 7. In usual cases when considering the asymptotic power of
Dempster's test, the quantity \|\delta\|^2 is typically assumed to have the same
order as 1/\sqrt{n} and tr(Σ²) to have the order n. Thus, the quantities
n\|\delta\|^2/\sqrt{\mathrm{tr}\,\Sigma^2} and \sqrt{n}\,\|\delta\|^2 are both bounded away from zero and infin-
ity. The expression of the asymptotic power of Hotelling's test involves a
factor \sqrt{1-y} which disappears in the expression of the asymptotic power
of Dempster's test. This reveals the reason why the power of the Hotelling
test increases much more slowly than that of the Dempster test as the non-central
parameter increases if y is close to one.

2.3. Bai and Saranadasa's ANT

In this section, we describe the results for Bai and Saranadasa's ANT. We
shall not assume the normality of the underlying distributions. We assume:
(a) x_{ij} = \Gamma z_{ij} + \mu_i; j = 1, ..., N_i, i = 1, 2, where Γ is a p × m matrix (m ≤ ∞)
with \Gamma\Gamma' = \Sigma and z_{ij} are i.i.d. random m-vectors with independent
components satisfying E z_{ij} = 0, Var(z_{ij}) = I_m, E z_{ijk}^4 = 3 + \Delta < \infty
and E \prod_{k=1}^{m} z_{ijk}^{l_k} = 0 (and 1) when there is at least one l_k = 1 (there
are two l_k's equal to 2, correspondingly), whenever l_1 + \cdots + l_m = 4.
(b) p/n → y > 0 and N_1/N → κ ∈ (0, 1).
(c) (2.12) and (2.13) are true.
Here and later, it should be noted that all random variables and param-
eters depend on n. For simplicity we omit the subscript n from all random
variables except those statistics defined later.
Now, we begin to construct the ANT proposed in Bai and Saranadasa
(1996). Consider the statistic

\[
M_n = (\bar{x}_1 - \bar{x}_2)'(\bar{x}_1 - \bar{x}_2) - \nu\, \mathrm{tr}\, S_n,   (2.15)
\]

where S_n = \frac{1}{n} A, and \bar{x}_1, \bar{x}_2 and A are defined in previous sections. Under
H_0, we have E M_n = 0. If Conditions (a)-(c) hold, it can be proven that,
under H_0,

\[
Z_n = \frac{M_n}{\sqrt{\mathrm{Var}\, M_n}} \to N(0, 1), \quad \text{as } n \to \infty.   (2.16)
\]
If the underlying distributions are normal, then, again under H_0, we have

\[
\sigma_M^2 := \mathrm{Var}\, M_n
= 2\nu^2 \Big( 1 + \frac{1}{n} \Big)\, \mathrm{tr}\,\Sigma^2 .   (2.17)
\]

If the underlying distributions are not normal but satisfy Conditions (a)-
(c), one may show that

\[
\mathrm{Var}\, M_n = \sigma_M^2 (1 + o(1)).   (2.18)
\]

Hence (2.16) is still true if the denominator of Z_n is replaced by σ_M. There-
fore, to complete the construction of the ANT statistic, we only need to find
a ratio-consistent estimator of tr(Σ²) and substitute it into the denominator
of Z_n. It seems that a natural estimator of tr Σ² should be tr S_n². However,
unlike the case where p is fixed, tr S_n² is generally neither unbiased nor
ratio-consistent even under the normality assumption. If n S_n ~ W_p(n, Σ),
it is routine to verify that

\[
B_n^2 = \frac{n^2}{(n+2)(n-1)}
\Big( \mathrm{tr}\, S_n^2 - \frac{1}{n} (\mathrm{tr}\, S_n)^2 \Big)
\]

is an unbiased and ratio-consistent estimator of tr Σ². Here, it should be
noted that tr S_n² − \frac{1}{n}(tr S_n)² ≥ 0, by the Cauchy-Schwarz inequality. It is
not difficult to prove that B_n² is also a ratio-consistent estimator of tr Σ²
under Conditions (a)-(c). Replacing tr Σ² in (2.17) by the ratio-consistent
estimator B_n², we obtain the ANT statistic

\[
Z = \frac{(\bar{x}_1 - \bar{x}_2)'(\bar{x}_1 - \bar{x}_2) - \nu\, \mathrm{tr}\, S_n}
{\nu \sqrt{\dfrac{2(n+1)n}{(n+2)(n-1)}
\Big( \mathrm{tr}\, S_n^2 - \frac{1}{n} (\mathrm{tr}\, S_n)^2 \Big)}}
= \frac{\frac{N_1 N_2}{N} (\bar{x}_1 - \bar{x}_2)'(\bar{x}_1 - \bar{x}_2)
- \mathrm{tr}\, S_n}
{\sqrt{\dfrac{2(n+1)}{n}}\, B_n}
\to N(0, 1).   (2.19)
\]

Due to (2.19) the test rejects H_0 if Z > z_α. Regarding the asymptotic power
of our new test, we have the following theorem.

Theorem 8. Under Conditions (a)-(c),

\[
\beta_{BS}(\delta) - \Phi\Big( -z_\alpha
+ \frac{n \kappa(1-\kappa) \|\delta\|^2}{\sqrt{2\,\mathrm{tr}\,\Sigma^2}} \Big) \to 0.   (2.20)
\]
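The statistic (2.19) requires no matrix inversion, so it remains defined even when p exceeds the degrees of freedom. A numpy sketch (the function name is ours; p = 60 against n = 43, where T² would be undefined):

```python
import numpy as np

rng = np.random.default_rng(4)

def ant_Z(X1, X2):
    # the ANT statistic Z of (2.19), in its second (N1 N2 / N) form
    N1, N2 = X1.shape[0], X2.shape[0]
    N = N1 + N2
    n = N - 2
    d = X1.mean(axis=0) - X2.mean(axis=0)
    C1, C2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    Sn = (C1.T @ C1 + C2.T @ C2) / n
    Bn2 = n**2 / ((n + 2) * (n - 1)) * (np.trace(Sn @ Sn) - np.trace(Sn) ** 2 / n)
    return (N1 * N2 / N * d @ d - np.trace(Sn)) / np.sqrt(2.0 * (n + 1) / n * Bn2)

p = 60  # exceeds n = 43; no inverse of Sn is ever taken
Z0 = ant_Z(rng.standard_normal((25, p)), rng.standard_normal((20, p)))        # under H0
Z1 = ant_Z(rng.standard_normal((25, p)) + 1.0, rng.standard_normal((20, p)))  # mean shift
print(Z0, Z1)
```

Under H_0 the value Z0 is approximately standard normal, while the shifted alternative produces a very large Z1, illustrating Theorem 8 qualitatively.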
2.4. Conclusions and simulations

Comparing Theorems 3, 6 and 8, we find that, from the point of view of
large sample theory, Hotelling's test is less powerful than the other two tests
when y is close to one and that the latter two tests have the same asymptotic
power function. Our simulation results show that even for moderate sample
and dimension sizes, Hotelling's test is still less powerful than the other two
tests when the underlying covariance structure is reasonably regular (i.e.
the structure of Σ does not cause too large a difference between \delta'\Sigma^{-1}\delta
and n\|\delta\|^2/\sqrt{\mathrm{tr}(\Sigma^2)}), whereas the Type I error does not change much in
the latter two tests. It would not be hard to see that, using the approach of
this paper, one may easily derive similar results for the one-sample problem,
namely, Hotelling's test is less powerful than NET and ANT when the
dimension of data is large. Now, let us shed some light on this phenomenon.
The reason for Hotelling's test being less powerful is the inaccuracy of the
estimator of the covariance matrix. Let x_1, ..., x_n be i.i.d. random p-vectors
of mean 0 and variance-covariance matrix I_p. By the law of large numbers,
the sample covariance matrix S_n = \frac{1}{n} \sum_{i=1}^{n} x_i x_i' should be close to the
identity I_p with an error of the order O_p(1/\sqrt{n}) when p is fixed. However,
when p is proportional to n (say p/n → y ∈ (0, 1)), the ratio of the largest
and the smallest eigenvalues of S_n tends to (1+\sqrt{y})^2/(1-\sqrt{y})^2 (see e.g. Bai,
Silverstein and Yin (1988), Bai and Yin (1993), Geman (1980), Silverstein
(1985) and Yin, Bai and Krishnaiah (1988)). More precisely, in the theory of
spectral analysis of large dimensional random matrices, it has been proven
that the empirical distribution of the eigenvalues of S_n tends to a limiting
distribution spreading over [(1-\sqrt{y})^2, (1+\sqrt{y})^2] as n → ∞ (see e.g.
Jonsson (1982), Wachter (1978), Yin (1986) and Yin, Bai and Krishnaiah
(1983)). This implies that S_n is not close to I_p. Especially when y is "close
to one", then S_n has many small eigenvalues and hence S_n^{-1} has many huge
eigenvalues. This will cause the deficiency of the T² test. We believe that in
many other multivariate statistical inferences with an inverse of a sample
covariance matrix involved the same phenomenon should exist (as another
example, see Saranadasa (1993)). Let us now explain our quotation-marked
"close to one". Note that the limiting ratio between the largest and smallest
eigenvalues of S_n tends to (1+\sqrt{y})^2/(1-\sqrt{y})^2. For our simulation example,
y = 0.93 and the ratio of the extreme eigenvalues is about 3039. This is
very serious. Even for y as small as 0.1 or 0.01, the ratio can be as large
as 3.705 and 1.494, which shows that it is not even necessary to require
the dimension of data to be very close to the degrees of freedom to make
the effect of high dimension visible. In fact, this has been shown by our
simulation for p = 4.
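The quoted ratios follow from the limit (1 + √y)²/(1 − √y)²; a one-line check:

```python
def eig_ratio(y):
    # limiting ratio of largest to smallest eigenvalue of S_n when p/n -> y
    return (1.0 + y ** 0.5) ** 2 / (1.0 - y ** 0.5) ** 2

print(round(eig_ratio(0.93)), round(eig_ratio(0.1), 3), round(eig_ratio(0.01), 3))
```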
Dempster's test statistic depends on the choice of vectors h3 , h4 , . . . , hN
because different choices of these vectors would result in different estimates
of the parameter r. On the other hand, the estimation of r and the round-
ing of the estimates may cause an error (probably an error of second order
smallness) in Dempster's test. Thus, we conjecture that our new test can be
more powerful than Dempster's in the second terms of an Edgeworth type
expansion of their power functions. This conjecture was strongly supported
by our simulation results. Because our test statistic is mathematically sim-
ple, it is not difficult to get an Edgeworth expansion by using the results
obtained in Babu and Bai (1993), Bai and Rao (1991) or Bhattacharya and
Ghosh (1978). It seems difficult to get a similar expansion for Dempster's
test due to his complicated estimation of r.
We conducted a simulation study to compare the powers of the three
tests for both normal and non-normal cases with the dimensions N1 = 25,
N2 = 20, and p = 40. For the non-normal case, observations were generated
by the following moving average model. Let {Uijk} be a set of independent
gamma variables with shape parameter 4 and scale parameter 1. Define

xijk = Uijk + ρ Ui,j,k+1 + μik   (k = 1, . . . , p; j = 1, . . . , Ni; i = 1, 2),

where ρ and the μik's are constants. Under this model, Σ = (σij) with
σii = 4(1 + ρ²), σi,i+1 = 4ρ and σij = 0 for |i − j| > 1. For the normal case,
the covariance matrices were chosen to be Σ = Ip and Σ = (1 − ρ)Ip + ρJp,
with ρ = 0.5, where Jp is a p × p matrix with all entries 1. A simulation
was also conducted for small p (chosen as p = 4). The tests were made for
size α = 0.05 with 1000 repetitions. The power was evaluated at the
standardized parameter η = ‖μ1 − μ2‖² / √(tr Σ²). The simulation for the
non-normality case was conducted for ρ = 0, 0.3, 0.6 and 0.9 (Figure 1).
All three tests have almost the same significance level. Under the alternative
hypothesis, the power curves of Dempster's test and our test are rather
close, but that of our test is always higher than Dempster's. Theoretically,
the power function for Hotelling's test should increase very slowly when the
noncentral parameter increases. This is also demonstrated by our simulation
results. The reader should note that there are only 1000 repetitions for each
value of the noncentral parameter in our simulation, which may cause an
error of 1/√1000 ≈ 0.0316 by the Central Limit Theorem. Hence, it is not
surprising that the simulated power function of Hotelling's test, whose
magnitude is only around 0.05, seems not to be increasing at some points
Future of Statistics 79

Fig. 1. Simulated powers of the three tests with multivariate Gamma distributions.

of the noncentral parameter. Similar tables are presented for the normal
case (Figure 2). For higher dimension cases the power functions of Demp-
sters test and our test are almost the same, and our method is not worse
than Hotellings test even for p = 4.

3. A Likelihood Ratio Test on Covariance Matrix


In multivariate analysis, the second important test is about the covariance
matrix. Assume that xi = (x1i , . . . , xpi )′ is a sample from a multivariate
normal population with mean vector 0p and variance-covariance matrix
Σp×p for i = 1, . . . , n. Now, consider the hypotheses

H0 : Σ = Ip×p   vs.   H1 : Σ ≠ Ip×p.   (3.1)

3.1. Classical tests


It is known that the sufficient and complete statistic for Σ is the sample
covariance matrix, which is defined by

Sn = (1/n) ∑_{i=1}^n xi xi′.   (3.2)
Fig. 2. Simulated powers of the three tests with multivariate normal distributions.

A classical test statistic is the log-determinant of Sn ,

T1 = log(|Sn |).   (3.3)

Another important test is the likelihood ratio test (LRT), whose test statistic
is given by

T2 = n (tr(Sn ) − log(|Sn |) − p)   (3.4)

(see Anderson 1984). To test H0 , we have the following limiting theorem.

Theorem 9. Under the null hypothesis, for any fixed p, as n → ∞,

√(n/(2p)) T1 →_D N(0, 1),
T2 →_D χ²_{p(p+1)/2}.

The limiting distributions of T1 and T2 in Theorem 9 are valid even without
the normality assumption, under existence of the 4th moment. It only needs
the assumption of fixed dimension, or even weaker, p/n → 0. However, both
p and n are given numbers in a real testing problem. How could one justify
the assumption p/n → 0? That is, for what pairs of (p, n) can Theorem 9
be applied to the tests?
For large-dimensional problems, the approximation of T1 and T2 by their
respective asymptotic limits can cause severe errors. Let us have a look
at the simulation results given in the following table, in which
empirical Type I errors are listed with significance level α = 5% and 40 000
simulations.

Type I errors
n = 500
p 5 10 50 100 300
T1 0.0567 0.0903 1.0 1.0 1.0
T2 0.0253 0.0273 0.1434 0.9523 1.0
n = 1000
p 5 10 50 100 300
T1 0.0530 0.0666 0.9830 1.0 1.0
T2 0.0255 0.0266 0.0678 0.4221 1.0
p/n = 0.05
(n, p) (250, 12) (500, 25) (1000, 50) (2000, 100) (6000,300)
T1 0.1835 0.5511 0.9830 1.0 1.0
T2 0.0306 0.0417 0.0678 0.1366 0.7186

The simulation results show that the Type I errors of the classical methods T1
and T2 are close to 1 when p/n → y ∈ (0, 1) or p/n is large. This shows that the
classical methods T1 and T2 behave very poorly and are even inapplicable
for testing problems with large dimension or with dimension increasing with
sample size.
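The inflation of the Type I error is easy to reproduce. The following Monte Carlo sketch (written for this note, not the authors' code; the replication count and the pair (n, p) = (500, 100) are arbitrary) applies the χ² approximation to T2 under H0 and counts rejections:

```python
import numpy as np
from scipy.stats import chi2

# Type I error of the classical LRT T2 = n(tr(Sn) - log|Sn| - p) when the
# chi-square approximation with p(p+1)/2 degrees of freedom is used.
rng = np.random.default_rng(1)
n, p, reps = 500, 100, 200
crit = chi2.ppf(0.95, p * (p + 1) / 2)   # one-sided 5% critical value

rejections = 0
for _ in range(reps):
    X = rng.standard_normal((p, n))      # H0: Sigma = I
    S = X @ X.T / n
    _, logdet = np.linalg.slogdet(S)
    T2 = n * (np.trace(S) - logdet - p)
    rejections += T2 > crit
print(rejections / reps)                 # far above the nominal 0.05
```

The empirical rejection rate is close to the value reported for (n, p) = (500, 100) in the table above, not to the nominal 5%.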
Bai and Silverstein (2004) have revealed the reason for the above phe-
nomenon. They show that, with probability 1,

√(n/(2p)) T1 = √(n/(2p)) log(|Sn |) → −∞   as p/n → y ∈ (0, 1).   (3.5)

We can similarly show that

T2 = n (tr(Sn ) − log(|Sn |) − p) → +∞   as p/n → y ∈ (0, 1).   (3.6)

These two results show that Theorem 9 is not applicable when p is large,
and we have to seek new limit theorems to test the hypothesis (3.1).
3.2. Random matrix theory


In this section, we shall cite some important results of RMT related to our
work. For more details on RMT, the reader is referred to Bai (1999), Bai
and Silverstein (1998, 2004), Bai, Yin and Krishnaiah (1983), Marcenko
and Pastur (1967).
Before dealing with the large-dimensional testing problem (3.1), we first
introduce some basic concepts and notation and present some well-known
results. To begin with, suppose that {xij , i, j = 1, 2, . . . } is a double array
of i.i.d. complex random variables with mean zero and variance 1. Write
xk = (x1k , . . . , xpk )′ for k = 1, . . . , n and

Bp = Tp^{1/2} ( (1/n) ∑_{k=1}^n xk xk* ) Tp^{1/2},

where Ex11 = 0, E|x11 |² = 1, and Tp^{1/2} is a p × p random non-negative
definite Hermitian matrix, with (x1 , . . . , xn ) and Tp^{1/2} being independent.
Let F^A denote the empirical spectral distribution (ESD) of the eigenvalues
of the square matrix A; that is, if A is p × p, then

F^A (x) = (number of eigenvalues of A ≤ x) / p.
Silverstein (1995) proved that under certain conditions, with probability 1,
F^{Bp} tends to a limiting distribution, called the limiting spectral distribution
(LSD). To describe his result, we define the Stieltjes transform of a c.d.f.
G by

m_G (z) = ∫ 1/(λ − z) dG(λ),   z ∈ C⁺ = {z : z ∈ C, ℑ(z) > 0}.   (3.7)

Let Hp = F^{Tp} and H denote the ESD and limiting spectral distribution
(LSD) of Tp , respectively. Also, let F^{y,H} denote the LSD of F^{Bp}. Further,
let F^{yp,Hp} denote the LSD F^{y,H} with y = yp and H = Hp .
Let m(·) and m_{F^{yp,Hp}}(·) denote the Stieltjes transforms of the c.d.f.s
F̲^{y,H} = (1 − y)I_[0,+∞) + y F^{y,H} and F̲^{yp,Hp} = (1 − yp)I_[0,+∞) +
yp F^{yp,Hp}, respectively. Clearly, F̲^{Bp} = (1 − yp)I_[0,+∞) + yp F^{Bp} is the
ESD of the matrix

B̲p = (1/n) ( xj* Tp xk )_{j,k=1}^n .

Therefore, F̲^{y,H} and m are the LSD of F̲^{Bp} and its Stieltjes transform,
and F̲^{yp,Hp} and m_{F^{yp,Hp}} are the corresponding versions with y = yp
and H = Hp . Silverstein (1995) proved
Theorem 10. If Hp → H and H is a proper probability distribution, then
with probability 1, the ESD of B̲p tends to a LSD whose Stieltjes
transform m is the unique solution of the equation

z = −1/m(z) + y ∫ t/(1 + t m(z)) dH(t)   (3.8)

on the upper half plane m ∈ C⁺.

When H = 1_[1,∞) , that is, Tp = Ip , a special case of Theorem 10 is due to
Marcenko and Pastur. For this special case, the LSD is called the MP law,
whose density is given by

F′_y (x) = (1/(2πxy)) √((b − x)(x − a))   if a < x < b,   and 0 otherwise,   (3.9)

where a, b = (1 ∓ √y)². If y > 1, there is a point mass 1 − 1/y at 0. The
Stieltjes transform associated with the MP law is given by

m(z) = [ −(1 + z − y) + √((1 + z − y)² − 4z) ] / (2z).   (3.10)
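As a numerical sanity check (a sketch written for this note; the values of y and z are arbitrary), one can verify that the root of the quadratic implied by (3.8) with H = δ₁ agrees with the Stieltjes transform of the companion measure (1 − y)δ₀ + yF_y computed directly from the density (3.9):

```python
import numpy as np
from scipy.integrate import quad

y = 0.3
z = 1.2 + 0.7j                       # a point in the upper half plane
a, b = (1 - np.sqrt(y))**2, (1 + np.sqrt(y))**2

# Equation (3.8) with H = delta_1 reduces to z*m^2 + (z + 1 - y)*m + 1 = 0;
# the Stieltjes-transform branch is the root in the upper half plane.
roots = np.roots([z, z + 1 - y, 1])
m_root = roots[np.argmax(roots.imag)]

# Direct computation: m(z) = -(1 - y)/z + y * integral of f_y(x)/(x - z),
# where f_y is the Marcenko-Pastur density (3.9).
f = lambda x: np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * y)
re = quad(lambda x: (f(x) / (x - z)).real, a, b)[0]
im = quad(lambda x: (f(x) / (x - z)).imag, a, b)[0]
m_direct = -(1 - y) / z + y * (re + 1j * im)
print(m_root, m_direct)              # the two values agree
```

Solving the quadratic and selecting the root with positive imaginary part sidesteps the branch choice hidden in the closed form (3.10).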
Next, we introduce the central limit theorem (CLT) for linear spectral statis-
tics (LSS) due to Bai and Silverstein (2004). Suppose we are concerned with
a parameter θ = ∫ f(x) dF(x). As an estimator of θ, one may employ the
integral

θ̂ = ∫ f(x) dFn(x),

which will be called an LSS, where Fn(x) is the ESD of the random matrix
computed from the data and F(x) is the limiting spectral distribution (LSD)
of Fn , i.e. Fn →_D F.
Bai and Silverstein (2004) established the following theorem.

Theorem 11. Assume:

(a) For each n, x^{(n)}_{ij}, i ≤ p, j ≤ n, are independent, and for all n, i, j they
are identically distributed, with Ex11 = 0, E|x11 |² = 1 and E|x11 |⁴ < ∞;
(b) yp = p/n → y;
(c) Tp is p × p non-random Hermitian nonnegative definite with spectral
norm bounded in p, with F^{Tp} →_D H, a proper c.d.f., where →_D denotes
convergence in distribution;
(d) the functions f1 , . . . , fk are analytic on an open region of C containing
the real interval

[ lim inf_p λ^{Tp}_min I_(0,1)(y) (1 − √y)²,  lim sup_p λ^{Tp}_max (1 + √y)² ].

Then

(i) the random vector

( ∫ f1 (x) dGp (x), . . . , ∫ fk (x) dGp (x) )   (3.11)

forms a tight sequence in p, where Gp (x) = p [F^{Bp}(x) − F^{yp,Hp}(x)].
(ii) If x11 is real and E(x11⁴) = 3, then (3.11) converges weakly to a Gaus-
sian vector (Xf1 , . . . , Xfk ) with means

EXf = (1/(2πi)) ∮ f(z) [ y ∫ m(z)³ t² (1 + t m(z))⁻³ dH(t) ] /
      [ 1 − y ∫ m(z)² t² (1 + t m(z))⁻² dH(t) ]² dz   (3.12)

and covariance function

Cov(Xf , Xg ) = −(1/(2π²)) ∮∮ [ f(z1) g(z2) / (m(z1) − m(z2))² ]
                m′(z1) m′(z2) dz1 dz2   (3.13)

(f, g ∈ {f1 , . . . , fk }). The contours in (3.12) and (3.13) (two in (3.13),
which we may assume to be nonoverlapping) are closed and are taken in
the positive direction in the complex plane, each enclosing the support
of F^{y,H}.
(iii) If x11 is complex with E(x11²) = 0 and E(|x11 |⁴) = 2, then (ii) also holds,
except that the means are zero and the covariance function is 1/2 of the func-
tion given in (3.13).

In the following, we consider the special case of Tp = Ip×p , that is,

Sp = Bp = (1/n) ∑_{k=1}^n xk xk* ;

then the ESD and LSD Hp (t) = H(t) of F^{Tp} are the distribution degenerate
at 1. Applying the theorem of Bai and Silverstein (2004), we obtain the
following theorem.
Theorem 12. If Tp is the p × p identity matrix Ip×p , then the results in
Theorem 11 specialize to

∫ f(x) dF^{yp,Hp}(x) = (1/(2πyp)) ∫_{a(yp)}^{b(yp)} (f(x)/x) √((b(yp) − x)(x − a(yp))) dx,

EXf = (f(a(y)) + f(b(y)))/4 − (1/(2π)) ∫_{a(y)}^{b(y)} f(x) / √(4y − (x − 1 − y)²) dx

and

Cov(Xf , Xg ) = −(1/(2π²)) ∮∮ f(z(m1)) g(z(m2)) / (m1 − m2)² dm1 dm2 ,

where

yp = p/n → y ∈ (0, 1),
z(mi) = −1/mi + y/(1 + mi)   for i = 1, 2,
a(yp) = (1 − √yp)²,   b(yp) = (1 + √yp)²,

and the m1 , m2 contours, nonintersecting and both taken in the positive
direction, enclose the points 1/(√yp − 1) and −1/(√yp + 1).

3.3. Testing based on RMT limiting CLT


In this section, we present a new testing method for the hypothesis H0 by
renormalizing T1 and T2 using the CLT for LSS of large sample covariance
matrices. We would like to point out to the reader that our new approach
applies to both cases of large dimension and small dimension, provided
p ≥ 5. From simulation comparisons, one can see that, when the classical
methods work well, our new approach is not only as good as the classical
ones, but it also performs well when the classical methods fail.
Based on the MP law (see Theorem 2.5 in Bai (1999)), we have the
following lemma.

Lemma 13. As yp = p/n → y ∈ (0, 1), with probability 1, we have

(tr(Sn ) − log(|Sn |) − p) / p → d2 (y)   under H0 ,

where

d2 (y) = ∫_{a(y)}^{b(y)} [ (x − log(x) − 1) / (2πyx) ] √((b(y) − x)(x − a(y))) dx
       = 1 − ((y − 1)/y) log(1 − y) > 0.

The proof is routine and omitted.
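The closed form d2(y) = 1 − ((y − 1)/y) log(1 − y) can be checked against direct quadrature of the MP integral (a sketch written for this note; SciPy is assumed):

```python
import numpy as np
from scipy.integrate import quad

def d2(y):
    """Closed form of the limit: d2(y) = 1 - ((y - 1)/y) * log(1 - y)."""
    return 1 - (y - 1) / y * np.log(1 - y)

def d2_numeric(y):
    """Integrate (x - log x - 1) against the Marcenko-Pastur density."""
    a, b = (1 - np.sqrt(y))**2, (1 + np.sqrt(y))**2
    f = lambda x: (x - np.log(x) - 1) / (2 * np.pi * y * x) * \
        np.sqrt((b - x) * (x - a))
    return quad(f, a, b)[0]

for y in (0.1, 0.5, 0.9):
    print(y, d2(y), d2_numeric(y))   # the two columns agree, and both are > 0
```

That d2(y) > 0 is exactly why T2 = n(tr(Sn) − log|Sn| − p) ≈ n·p·d2(y) blows up at the rate np, overwhelming the χ² critical values.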

This limit theoretically confirms our finding that the classical methods
based on T1 and T2 will lead to very serious errors in the large-dimensional
testing problem (3.1); that is, the Type I error is almost 1. It suggests
that one has to find a new normalization of the statistics T1 and T2 such
that the hypothesis H0 can be tested by the newly normalized versions of
T1 and T2 .
Applying Theorem 12 to T1 and T2 , we have the following theorem.

Theorem 14. When yp = p/n → y ∈ (0, 1), we have

T3 = [ log(|Sn |) − p d1 (yp) − μ1 (yp) ] / σ1 (yp) →_D N(0, 1),
T4 = [ tr(Sn ) − log(|Sn |) − p − p d2 (yp) − μ2 (yp) ] / σ2 (yp) →_D N(0, 1),

where

d1 (yp) = ((yp − 1)/yp) log(1 − yp) − 1,
μ1 (yp) = log(1 − yp)/2,
σ1²(yp) = −2 log(1 − yp),
d2 (yp) = 1 − ((yp − 1)/yp) log(1 − yp),
μ2 (yp) = −log(1 − yp)/2,
σ2²(yp) = −2 log(1 − yp) − 2yp .

The proof of the theorem is a simple application of Theorem 12 and
hence is omitted. We just present some simulation results to demonstrate
how the new approach performs better than the original approaches.
3.4. Simulation results


In this section we present the simulation results for T1 , T2 , T3 and T4 .
We first investigate whether the Type I errors of the four methods can be
controlled at the significance level. Then we investigate which of the four
methods has the largest power. The purpose of this section is to show that
the new hypothesis testing method T4 provides a useful tool for both small-
and large-dimensional problems (3.1):

H0 : Σp×p = Ip×p   vs.   H1 : Σp×p ≠ Ip×p .

Let us first give some detailed explanations. Firstly, we introduce the
four simulated testing statistics with their respective approximations:

T1 = √(n/(2p)) log(|Sn |) →_D N(0, 1),   under H0 ,

T2 = n (tr(Sn ) − log(|Sn |) − p) →_D χ²_{p(p+1)/2},   under H0 ,

T3 = [ log(|Sn |) − p d1 (yp) − μ1 (yp) ] / σ1 (yp) →_D N(0, 1),   under H0 ,

T4 = [ tr(Sn ) − log(|Sn |) − p − p d2 (yp) − μ2 (yp) ] / σ2 (yp) →_D N(0, 1),   under H0 .

Secondly, in order to illustrate the detailed behaviors of the four statistics
T1 to T4 , we use not only the two-sided rejection regions but also the
one-sided rejection regions in our simulation study.

Method 1:
  |√(n/(2p)) log(|Sn |)| ≥ ζ0.975   (two-sided)
  √(n/(2p)) log(|Sn |) ≤ ζ0.05   (reject left)
  √(n/(2p)) log(|Sn |) ≥ ζ0.95   (reject right)

Method 2:
  n (tr(Sn ) − log(|Sn |) − p) ≥ χ0.975 or ≤ χ0.025   (two-sided)
  n (tr(Sn ) − log(|Sn |) − p) ≤ χ0.05   (reject left)
  n (tr(Sn ) − log(|Sn |) − p) ≥ χ0.95   (reject right)

Method 3:
  |log(|Sn |) − p d1 (yp) − μ1 (yp)| / σ1 (yp) ≥ ζ0.975   (two-sided)
  (log(|Sn |) − p d1 (yp) − μ1 (yp)) / σ1 (yp) ≤ ζ0.05   (reject left)
  (log(|Sn |) − p d1 (yp) − μ1 (yp)) / σ1 (yp) ≥ ζ0.95   (reject right)

Method 4:
  |tr(Sn ) − log(|Sn |) − p − p d2 (yp) − μ2 (yp)| / σ2 (yp) ≥ ζ0.975   (two-sided)
  (tr(Sn ) − log(|Sn |) − p − p d2 (yp) − μ2 (yp)) / σ2 (yp) ≤ ζ0.05   (reject left)
  (tr(Sn ) − log(|Sn |) − p − p d2 (yp) − μ2 (yp)) / σ2 (yp) ≥ ζ0.95   (reject right)

where ζ0.975 , ζ0.05 and ζ0.95 are the 97.5%, 5% and 95% quantiles of
N(0, 1), and χ0.975 , χ0.025 , χ0.05 and χ0.95 are the corresponding quantiles of
χ²_{p(p+1)/2} .
Thirdly, samples X1 , . . . , Xn are drawn from the population
N(0p , Σp×p ). To compute Type I errors, we draw samples X1 , . . . , Xn from
N(0p , Ip×p ), and, to compute powers, we take samples X1 , . . . , Xn from
N(0p , Σp×p ), where Σ = (σij )p×p with

σij = 1 if i = j,   σij = 0.05 if i ≠ j,

for i, j = 1, . . . , p. The sample size n takes the values 500 or 1000. The
dimension p takes the values 5, 10, 50, 100 or 300. We also consider the
case where the dimension p increases with the sample size n. The parameter
setups are (n, p) = (6000, 300), (2000, 100), (1000, 50), (500, 25), (250, 12),
with p/n = 0.05.
The results of the 40 000 simulations are summarized in the following
three tables.
Table 1. Type I errors and powers.

(n = 500, p = 300)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 1.0 0.0 1.0 1.0 0.0 1.0
Method 3 0.0513 0.0508 0.0528 1.0 1.0 0.0
Method 4 0.0507 0.0521 0.0486 1.0 0.0 1.0
(n = 500, p = 100)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.9523 0.0 0.9753 1.0 0.0 1.0
Method 3 0.0516 0.0514 0.0499 0.9969 1.0 0.0
Method 4 0.0516 0.0488 0.0521 1.0 0.0 1.0
(n = 500, p = 50)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.1484 0.0064 0.2252 1.0 0.0 1.0
Method 3 0.0488 0.0471 0.0504 0.7850 0.8660 0.0
Method 4 0.0515 0.0494 0.0548 1.0 0.0 1.0
(n = 500, p = 10)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0903 0.1406 0.0136 0.1712 0.2610 0.0023
Method 2 0.0546 0.0458 0.0538 0.8985 0.0 0.9391
Method 3 0.0507 0.0524 0.0489 0.0732 0.1169 0.0168
Method 4 0.0585 0.0441 0.0668 0.9252 0.0 0.9470
(n = 500, p = 5)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0567 0.0777 0.0309 0.0651 0.1038 0.0190
Method 2 0.0506 0.0489 0.0511 0.4169 0.0014 0.5188
Method 3 0.0507 0.0517 0.0497 0.0502 0.0695 0.0331
Method 4 0.0625 0.0368 0.0807 0.5237 0.0007 0.5940

From the simulation results, one can see the following:


(1) Under all setups of (n, p), the simulated Type I errors of testing
Methods 3 and 4 are close to the significance level α = 0.05 while those of
testing Methods 1 and 2 are not. Moreover, when the ratio of dimension
to sample size p/n is large, the Type I errors of Methods 1 and 2 are close to 1.
Table 2. Type I errors and powers.


(n = 1000, p = 300)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 1.0 0.0 1.0 1.0 0.0 1.0
Method 3 0.0496 0.0496 0.0493 1.0 1.0 0.0
Method 4 0.0512 0.0492 0.0499 1.0 0.0 1.0
(n = 1000, p = 100)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.4221 0.0003 0.5473 1.0 0.0 1.0
Method 3 0.0508 0.0509 0.0515 1.0 1.0 0.0
Method 4 0.0522 0.0492 0.0535 1.0 0.0 1.0
(n = 1000, p = 50)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.9830 0.9915 0.0 1.0 1.0 0.0
Method 2 0.0778 0.0179 0.1166 1.0 0.0 1.0
Method 3 0.0471 0.0495 0.0499 0.9779 0.9886 0.0
Method 4 0.0524 0.0473 0.0575 1.0 0.0 1.0
(n = 1000, p = 10)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0666 0.1067 0.0209 0.1801 0.2623 0.0037
Method 2 0.0532 0.0470 0.0517 0.9994 0.0 0.9995
Method 3 0.0506 0.0498 0.0504 0.0969 0.1591 0.0116
Method 4 0.0582 0.0440 0.0669 0.9994 0.0 0.9996
(n = 1000, p = 5)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0530 0.0696 0.0360 0.0664 0.1040 0.0203
Method 2 0.0510 0.0491 0.0498 0.8086 0.0 0.8736
Method 3 0.0508 0.0530 0.0494 0.0542 0.0784 0.0288
Method 4 0.0622 0.0356 0.0790 0.8780 0.0 0.9114

Furthermore, when the ratio p/n → y ∈ (0, 1), even if y is very small,
the Type I errors of testing Methods 1 and 2 still tend to 1 as the sample
size becomes large.
(2) Under all choices of (n, p), the powers of testing Methods 2 and 4 are
much higher than those of testing Methods 1 and 3, respectively. Moreover,
almost all powers of testing Method 4 are higher than those of the other
methods.
Table 3. Type I errors and powers.

(n = 6000, p = 300)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.7186 0.0 0.8131 1.0 0.0 1.0
Method 3 0.0476 0.0465 0.0469 1.0 1.0 0.0
Method 4 0.0505 0.0525 0.0466 1.0 0.0 1.0
(n = 2000, p = 100)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.1366 0.0062 0.2144 1.0 0.0 1.0
Method 3 0.0501 0.0506 0.0515 1.0 1.0 0.0
Method 4 0.0525 0.0505 0.0531 1.0 0.0 1.0
(n = 1000, p = 50)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.9830 0.9915 0.0 1.0 1.0 0.0
Method 2 0.0778 0.0179 0.1166 1.0 0.0 1.0
Method 3 0.0471 0.0495 0.0499 0.9779 0.9886 0.0
Method 4 0.0524 0.0473 0.0575 1.0 0.0 1.0
(n = 500, p = 25)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.5511 0.6653 0.0 0.9338 0.9656 0.0
Method 2 0.0817 0.0313 0.0765 1.0 0.0 1.0
Method 3 0.0518 0.0539 0.0506 0.2824 0.3948 0.0013
Method 4 0.0552 0.0472 0.0558 1.0 0.0 1.0
(n = 250, p = 12)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.1835 0.2729 0.0033 0.3040 0.4151 0.0006
Method 2 0.0612 0.0442 0.0606 0.6129 0.0002 0.7141
Method 3 0.0483 0.0499 0.0486 0.0670 0.1089 0.0183
Method 4 0.0574 0.0507 0.0617 0.6369 0.0003 0.7192

(3) Comparing the Type I errors and powers for all choices of (n, p),
the testing Method 4 has better Type I errors and higher powers. Al-
though Method 2 has higher powers, its Type I errors are almost 1. Al-
though Method 3 has lower Type I errors, its powers are lower than those
of Method 4.
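The power entries in the tables can be spot-checked under the alternative Σ with unit diagonal and 0.05 off-diagonal (a sketch written for this note; the replication count is arbitrary and far below the 40 000 used above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 500, 100, 100
yp = p / n

# Alternative covariance: ones on the diagonal, 0.05 off the diagonal.
Sigma = np.full((p, p), 0.05) + 0.95 * np.eye(p)
C = np.linalg.cholesky(Sigma)

d2 = 1 - (yp - 1) / yp * np.log(1 - yp)
mu2 = -np.log(1 - yp) / 2
sigma2 = np.sqrt(-2 * np.log(1 - yp) - 2 * yp)

rej = 0
for _ in range(reps):
    X = C @ rng.standard_normal((p, n))   # columns ~ N(0, Sigma)
    S = X @ X.T / n
    _, logdet = np.linalg.slogdet(S)
    T4 = (np.trace(S) - logdet - p - p * d2 - mu2) / sigma2
    rej += abs(T4) > 1.96
print(rej / reps)   # Table 1 reports power 1.0 for Method 4 at (500, 100)
```

The single spiked eigenvalue of Σ (≈ 0.95 + 0.05p) shifts the centered statistic by many multiples of σ2(yp), so the empirical power is essentially 1, consistent with the tables.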
In conclusion, our simulation results show that T1 and T2 are inapplicable
to large-dimensional problems or to small-dimensional problems whose
sample size is large. Although both statistics, T3 and T4 , can be applied
to the large-dimensional problem (3.1), T4 is better than T3 from the view-
point of power under the same significance level. This shows that T4 provides
a robust test for both large-dimensional and small-dimensional problems.

4. Conclusions
In this paper, both theoretically and by simulation, we have shown that
classical approaches to hypothesis testing do not apply to large-dimensional
problems and that the newly proposed methods perform much better than
the classical ones. It is interesting that the new methods do not perform
much worse than the classical methods for small-dimensional cases. There-
fore, we would strongly recommend the new approaches even for moderately
large dimensional cases provided that p ≥ 4 or 5, REGARDLESS of the
ratio between dimension and data size.
We would also like to emphasize that the large dimension of data may
cause low efficiency of classical inference methods. In such cases, we would
strongly recommend non-exact procedures with high efficiency, such as
Dempster's NET and Bai and Saranadasa's ANT, rather than the classical
ones with low efficiency.

Acknowledgment
The authors of this chapter would like to express their thanks to Dr. Adrian
Roellin for his careful proofreading of the chapter and valuable comments.

References
1. G. J. Babu and Z. D. Bai, Edgeworth expansions of a function of sample
means under minimal moment conditions and partial Cramér's conditions,
Sankhyā Ser. A 55 (1993) 244–258.
2. Z. D. Bai, Convergence rate of expected spectral distributions of large random
matrices. Part I. Wigner matrices, Ann. Probab. 21 (1993) 625–648.
3. Z. D. Bai, Convergence rate of expected spectral distributions of large random
matrices. Part II. Sample covariance matrices, Ann. Probab. 21 (1993) 649–672.
4. Z. D. Bai, Methodologies in spectral analysis of large dimensional random
matrices. A review, Statistica Sinica 9 (1999) 611–677.
5. Z. D. Bai and C. R. Rao, Edgeworth expansion of a function of sample means,
Ann. Statist. 19 (1991) 1295–1315.
6. Z. D. Bai and J. W. Silverstein, No eigenvalues outside the support of the lim-
iting spectral distribution of large dimensional sample covariance matrices,
Ann. Probab. 26 (1998) 316–345.
7. Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large-
dimensional sample covariance matrices, Ann. Probab. 32 (2004) 553–605.
8. Z. D. Bai and Y. Q. Yin, Limit of the smallest eigenvalue of large dimensional
sample covariance matrix, Ann. Probab. 21 (1993) 1275–1294.
9. Z. D. Bai, Y. Q. Yin and P. R. Krishnaiah, On the limiting empirical distri-
bution function of the eigenvalues of a multivariate F-matrix, Theory Probab.
Appl. 32 (1987) 490–500.
10. J. H. Chung and D. A. S. Fraser, Randomization tests for a multivariate
two-sample problem, J. Amer. Statist. Assoc. 53 (1958) 729–735.
11. A. P. Dempster, A high dimensional two sample significance test, Ann. Math.
Statist. 29 (1958) 995–1010.
12. A. P. Dempster, A significance test for the separation of two highly multi-
variate small samples, Biometrics 16 (1960) 41–50.
13. V. A. Marcenko and L. A. Pastur, Distribution of eigenvalues for some sets
of random matrices, Math. USSR-Sb. 1 (1967) 457–483.
14. H. Saranadasa, Asymptotic expansion of the misclassification probabilities
of D- and A-criteria for discrimination from two high dimensional populations
using the theory of large dimensional random matrices, J. Multivariate Anal.
46 (1993) 154–174.
15. Y. Q. Yin, Z. D. Bai and P. R. Krishnaiah, Limiting behavior of the eigen-
values of a multivariate F-matrix, J. Multivariate Anal. 13 (1983) 508–516.
16. Y. Q. Yin, Z. D. Bai and P. R. Krishnaiah, On the limit of the largest
eigenvalue of the large dimensional sample covariance matrix, Probab. Theory
Related Fields 78 (1988) 509–521.
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino

THE η AND SHANNON TRANSFORMS:
A BRIDGE BETWEEN RANDOM MATRICES AND
WIRELESS COMMUNICATIONS

Antonia M. Tulino
Dip. di Ing. Elettronica e delle Telecomunicazioni
Università degli Studi di Napoli, Federico II
Via Claudio 21, Napoli, Italy
E-mail: [email protected]

The landmark contributions to the theory of random matrices of Wishart
(1928), Wigner (1955), and Marcenko-Pastur (1967) were motivated to
a large extent by their applications. In this paper, we report on two
transforms motivated by the application of random matrices to various
problems in the information theory of noisy communication channels:
the η and Shannon transforms. Originally introduced in [1, 2], their applica-
tions to random matrix theory and engineering applications have been
developed in [3]. In this paper, we give a summary of their main prop-
erties and applications in random matrix theory.

1. Introduction
The rst studies of random matrices stemmed from the multivariate statis-
tical analysis at the end of the 1920s, primarily with the work of Wishart
(1928) on xed-size matrices with Gaussian entries. After a slow start, the
subject gained prominence when Wigner introduced the concept of statis-
tical distribution of nuclear energy levels in 1950. In the past half century,
classical random matrix theory has been developed, widely and deeply, into
a huge body, eectively used in many branches of physics and mathematics.
Of late, random matrices have attracted great interest in the engineering
community because of their applications in the context of information the-
ory and signal processing, which include among others: wireless communica-
tions channels, learning and neural networks, capacity of ad hoc networks,
direction of arrival estimation in sensor arrays, etc.
The earliest applications to wireless communication were the pioneering
works of Foschini and Telatar in the mid-90s on characterizing the capacity

96 A. M. Tulino

of multi-antenna channels. With works like [4–6] which, initially, called at-
tention to the effectiveness of asymptotic random matrix theory in wireless
communication theory, interest in the study of random matrices began, and
the singular-value densities of random matrices and their asymptotics, as
the matrix size tends to infinity, became an active research area in infor-
mation/communication theory. In the last few years a considerable body of
results on the fundamental information-theoretic limits of various wireless
communication channels that makes substantial use of asymptotic random
matrix theory has emerged in the communications and information theory
literature. For an extended survey of contributions on these results see [3].
In the same way that the original contributions of Wishart and Wigner
were motivated by their applications, such is also the driving force behind
the efforts of information-theoreticians and engineers. The Shannon and
the η transforms, introduced for the first time in [1, 2], are prime examples:
these transforms, which were motivated by the application of random ma-
trix theory to various problems in the information theory of noisy commu-
nication channels [3], characterize the spectrum of a random matrix while
providing direct engineering insight.
In this paper, using the η and Shannon transforms of the singular-value
distributions of large dimensional random matrices, we characterize, for both
the ergodic and non-ergodic regimes, the fundamental limits of a general class of
noisy multi-input multi-output (MIMO) wireless channels which are char-
acterized by random matrices that admit various statistical descriptions
depending on the actual application. For these channels, a number of ex-
amples and asymptotic closed-form expressions of their fundamental limits
are provided. For both the ergodic and non-ergodic regimes, we illustrate
the power of random matrix results in the derivation of the fundamental
limits of wireless channels and we show the applicability of our results to
real-world problems, where the asymptotic behaviors are shown to be ex-
cellent approximations of the behavior of actual systems with very modest
numbers of antennas.

2. Wireless Communication Channels


A typical wireless communication channel is described by the usual linear
vector memoryless channel:
y = Hx + n (2.1)
where x is the K-dimensional vector of the signal input, y is the N -
dimensional vector of the signal output, and the N -dimensional vector n is
η and Shannon Transforms: Theory and Applications 97

the additive Gaussian noise, whose components are independent complex
Gaussian random variables with zero mean and independent real and imag-
inary parts with the same variance σ²/2 (i.e., circularly distributed). H, in
turn, is the N × K complex random matrix describing the channel.
Model (2.1) encompasses a variety of channels of interest in wire-
less communications such as multi-access channels, linear channels with
frequency-selective and/or frequency-dispersive fading, multidimensional
channels (multi-sensor reception, multi-cellular systems with cooperative de-
tection, etc.), crosstalk in digital subscriber lines, signal space diversity, etc.
In each of these cases, N, K and H take different meanings. For example, K
and N may indicate the number of transmit and receive antennas while H
describes the fading between each pair of transmit and receive antennas; or
the spreading gain and the number of users, while H is the signature matrix;
or they may both represent time/frequency slots, while H is the tone matrix.
In Section 5 we detail some of the more representative wireless chan-
nels described by (2.1) that capture various features of interest in wireless
communications and we demonstrate how random matrix results along
with the η and Shannon transforms have been used to characterize the
fundamental limits of the various channels that arise in wireless communi-
cations.

3. Why Asymptotic Random Matrix Theory?


In this section we illustrate the role of random matrices and their singular values
in wireless communication through the derivation of some key performance
measures, which are determined by the distribution of the singular values
of the channel matrix.
The empirical cumulative distribution function (c.d.f.) of the eigenvalues
(also referred to as the empirical spectral distribution (ESD)) of an N × N
Hermitian matrix A is defined as

    F_A^N(x) = (1/N) Σ_{i=1}^N 1{λ_i(A) ≤ x}                        (3.1)

where λ_1(A), . . . , λ_N(A) are the eigenvalues of A and 1{·} is the indicator
function. If F_A^N(·) converges almost surely (a.s.) as N → ∞, then the
corresponding limit (asymptotic ESD) is denoted by F_A(·).
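As a concrete illustration, the empirical c.d.f. in (3.1) is straightforward to evaluate numerically. The sketch below (hypothetical sizes, using NumPy) computes F_A^N(x) for a Hermitian test matrix:

```python
import numpy as np

def esd(A, x):
    """Empirical spectral distribution F_A^N(x) of a Hermitian matrix A:
    the fraction of eigenvalues lambda_i(A) not exceeding x, as in (3.1)."""
    eig = np.linalg.eigvalsh(A)   # real eigenvalues of a Hermitian matrix
    return np.mean(eig <= x)

rng = np.random.default_rng(0)
N = 200
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (G + G.conj().T) / 2          # Hermitian test matrix

print(esd(A, 0.0))                # fraction of nonpositive eigenvalues
print(esd(A, np.inf))             # -> 1.0
```

The function returns a value in [0, 1] and is nondecreasing in x, as any c.d.f. must be.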
The first performance measure that we are going to consider is the mutual
information. The mutual information, first introduced by Shannon in
1948, determines the maximum amount of data per unit bandwidth (in

98 A. M. Tulino

bits/s/Hz) that can be transmitted reliably over a specific channel realization
H. If the channel is known by the receiver, and the input x is Gaussian
with independent and identically distributed (i.i.d.) entries, the normalized
mutual information in (2.1) conditioned on H is given by [7, 8]

    I(SNR) = (1/N) I(x; y|H)                                        (3.2)
           = (1/N) log det( I + SNR HH† )                           (3.3)
           = (1/N) Σ_{i=1}^N log( 1 + SNR λ_i(HH†) )                (3.4)
           = ∫_0^∞ log(1 + SNR x) dF_{HH†}^N(x)                     (3.5)

with the transmitted signal-to-noise ratio (SNR)

    SNR = ( N E[||x||²] ) / ( K E[||n||²] ),                        (3.6)

and λ_i(HH†) equal to the ith squared singular value of H.
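The equality of (3.3) and (3.4) is just the identity det(I + SNR HH†) = Π_i (1 + SNR λ_i(HH†)). A quick numerical check (illustrative sizes only; natural logarithms, since the base only changes the units) is:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, snr = 8, 5, 2.0
# channel with i.i.d. complex entries of variance 1/N
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)

# (3.3): normalized log-det form of the mutual information
sign, ld = np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))
I_logdet = ld / N

# (3.4): same quantity from the eigenvalues of HH^†
lam = np.linalg.eigvalsh(H @ H.conj().T)
I_eig = np.mean(np.log1p(snr * lam))

print(abs(I_logdet - I_eig))  # agrees to machine precision
```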


If the channel is known at the receiver and its variation over time is
stationary and ergodic, then the expectation of (3.2) over the distribution
of H is the ergodic mutual information (normalized to the number of receive
antennas or the number of degrees of freedom per symbol in the CDMA
channel).
For SNR → ∞, a regime of interest in short-range applications, the normalized
mutual information admits the following affine expansion [9, 10]

    I(SNR) = S_∞ ( log SNR + L_∞ ) + o(1)                           (3.7)

where the key measures are the high-SNR slope

    S_∞ = lim_{SNR→∞} I(SNR) / log SNR                              (3.8)

which for most channels gives S_∞ = min{K/N, 1}, and the power offset

    L_∞ = lim_{SNR→∞} ( I(SNR)/S_∞ − log SNR )                      (3.9)

which essentially boils down to log det(HH†) or log det(H†H) depending
on whether K > N or K < N.

Another important performance measure for (2.1) is the minimum


mean-square-error (MMSE) achieved by a linear receiver, which deter-
mines the maximum achievable output signal-to-interference-and-noise ra-
tio (SINR). For an i.i.d. input, the arithmetic mean over the users (or transmit
antennas) of the MMSE is given, as a function of H, by [4]

    MMSE(SNR) = (1/K) min_{M ∈ C^{K×N}} E[ ||x − My||² ]            (3.10)
              = (1/K) tr{ (I + SNR H†H)^{-1} }                      (3.11)
              = (1/K) Σ_{i=1}^K 1 / ( 1 + SNR λ_i(H†H) )            (3.12)
              = ∫_0^∞ 1/(1 + SNR x) dF_{H†H}^K(x)
              = (N/K) ∫_0^∞ 1/(1 + SNR x) dF_{HH†}^N(x) − (N−K)/K   (3.13)

where the expectation in (3.10) is over x and n, while (3.13) follows from

    N F_{HH†}^N(x) − N u(x) = K F_{H†H}^K(x) − K u(x)               (3.14)

where u(x) is the unit-step function (u(x) = 0, x ≤ 0; u(x) = 1, x > 0).


Note, incidentally, that both performance measures as a function of SNR are
coupled through

    (d/dSNR) log_e det( I + SNR HH† ) = ( K − tr{ (I + SNR H†H)^{-1} } ) / SNR .
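This coupling is easy to verify numerically with a finite-difference derivative (a sketch with arbitrary small dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, snr, h = 6, 4, 1.5, 1e-6
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)

def logdet(s):
    """log_e det(I + s HH^†)."""
    return np.linalg.slogdet(np.eye(N) + s * (H @ H.conj().T))[1]

# left-hand side: numerical derivative with respect to SNR
lhs = (logdet(snr + h) - logdet(snr - h)) / (2 * h)

# right-hand side: (K - tr[(I + SNR H^†H)^{-1}]) / SNR
rhs = (K - np.trace(np.linalg.inv(np.eye(K) + snr * (H.conj().T @ H))).real) / snr

print(abs(lhs - rhs))  # limited only by the finite-difference step
```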

As we see in (3.5) and (3.13), both fundamental performance measures (mutual
information and MMSE) are dictated by the empirical distribution of the
(squared) singular values of the random channel matrix. It is
thus of paramount importance, in order to evaluate these and other
performance measures, to be able to express this empirical distribution.
Since F_{HH†}^N clearly depends on the specific realization of H, so do (3.2) and
(3.10) above. In terms of engineering insight, however, it is crucial to obtain
expressions for the performance measures that do not depend on the single
matrix realization, to which end two approaches are possible:

The first approach is to study the average behaviorᵃ by taking an expectation
of the performance measures over H, which requires assigning a probabilistic
structure to it.
The second approach is to consider an operative regime where the performance
measures (3.2) and (3.10) do not depend on the specific choice
of signatures.

Asymptotic analysis (in the sense of large dimensional systems, i.e.,
K, N → ∞ with K/N → β) is where both these approaches meet. First, the
computation of the average performance measures simplifies as the dimensions
grow to infinity. Second, the asymptotic regime turns out to be the
operative regime where the dependencies of (3.2) and (3.10) on the realization
of H disappear. Specifically, in most cases, asymptotic random
matrix theory guarantees that as the dimensions of H go to infinity with
their ratio kept constant, its empirical singular-value distribution displays
the following properties, which are key to the applicability to wireless
communication problems:

• Insensitivity of the asymptotic eigenvalue distribution to the probability
density function of the random matrix entries.
• An ergodic nature, in the sense that the eigenvalue histogram of any
matrix realization converges almost surely to the asymptotic eigenvalue
distribution.
• Fast convergence rate of the empirical singular-value distribution to its
asymptotic limit [11, 12], which implies that even for small values of
the parameters, the asymptotic results come close to the finite-parameter
results (cf. Fig. 1).

All these properties are very attractive in terms of analysis but are also
of paramount importance at the design level. In fact:

• The ergodicity enables the design of a receiver that, when optimized in
the asymptotic regime, has a structure depending weakly, or even not
at all, on the specific realization of H. As a consequence, less
a priori knowledge and a lower level of complexity are required (see [3]
and references therein).

ᵃ It is worth emphasizing that, in many cases, resorting to the expected value of the
mutual information is motivated by the stronger consideration that, in problems such
as aperiodic DS-CDMA or multi-antenna transmission with an ergodic channel, it is precisely the
expected capacity that has real operational meaning.

[Figure 1: four panels, for N = 3, 5, 15, 50, each plotting curves versus SNR ∈ [0, 10].]
Fig. 1. Several realizations of the left hand side of (3.3) are compared to the asymptotic
limit in the right hand side of (4.10) in the case of β = 1 for N = 3, 5, 15, 50.

• The fast convergence ensures that the performance of the asymptotically
designed receiver, operating even for very small values of the system dimensions,
is very close to that of the optimized receiver.
• Finally, the insensitivity property, along with the fast convergence,
leads to receiver structures that, for finite dimensionality, are very robust
to the probability distribution of H. Examples are the cases of DS-CDMA
subject to fading or a single-user multiple-antenna link, where the results
do not depend on the fading statistics.
As already mentioned, closely related to the MMSE is the signal-to-
interference-and-noise ratio, SINR, achieved at the output of a linear MMSE
receiver. Denote by x̂_k the MMSE estimate of the kth component of x and
by MMSE_k the corresponding MMSE; the SINR for the kth component is:

    SINR_k = ( E[|x_k|²] − MMSE_k ) / MMSE_k .                      (3.15)

Typically, the estimator sets E[|x_k|²] = 1 and thus

    SINR_k = (1 − MMSE_k) / MMSE_k
           = SNR h_k† ( I + SNR Σ_{j≠k} h_j h_j† )^{-1} h_k         (3.16)

with the aid of the matrix inversion lemma. Normalized by the single-user
signal-to-noise ratio (SNR ||h_k||²), SINR_k gives the so-called MMSE
multiuser efficiency, denoted by η_k^MMSE(SNR) [4]:

    η_k^MMSE(SNR) = SINR_k / ( SNR ||h_k||² ) .                     (3.17)
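Both expressions in (3.16) can be checked against each other directly. In the sketch below (hypothetical small dimensions), MMSE_k is read off as the kth diagonal entry of the error covariance (I + SNR H†H)^{-1}, consistent with the trace expression (3.11):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, snr, k = 8, 5, 3.0, 2
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)

# MMSE_k: kth diagonal entry of (I + SNR H^†H)^{-1}
E = np.linalg.inv(np.eye(K) + snr * (H.conj().T @ H))
sinr_from_mmse = (1 - E[k, k].real) / E[k, k].real   # (3.15) with E|x_k|^2 = 1

# (3.16): matrix-inversion-lemma form, with the kth column removed
Hk = np.delete(H, k, axis=1)
R = np.eye(N) + snr * (Hk @ Hk.conj().T)
hk = H[:, k]
sinr_direct = snr * (hk.conj() @ np.linalg.solve(R, hk)).real

print(abs(sinr_from_mmse - sinr_direct))  # -> ~0
```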

For K, N → ∞ with K/N → β, both the SINR and the MMSE multiuser efficiency
can be written as a function of the asymptotic ESD of HH†.
The ergodic mutual information, obtained by averaging (3.2) over the
channel fading coefficients, represents the fundamental operational limit in
the regime where the fading is such that the statistics of the channel are
revealed to the receiver during the span of a codeword.
Often, however, we may encounter channels that change slowly, so that
H is held approximately constant during the transmission of a codeword. In
this case, the average mutual information has no operational significance
and a more suitable performance measure is the so-called outage capacity
[13] (cumulative distribution of the mutual information), which coincides
with the classical Shannon-theoretic notion of ε-capacity [14], namely
the maximal rate for which block error probability ε is attainable. Under
certain conditions, the outage capacity can be obtained through the
probability that the transmission rate R exceeds the input-output mutual
information (conditioned on the channel realization) [15, 16, 13]. Thus, given
a rate R, an outage occurs when the random variable

    I = log det( I + SNR HH† )                                      (3.18)

whose distribution is induced by H, falls below R. A central result in random
matrix theory derived by Bai and Silverstein (2004) [17] establishes a
law of large numbers and a central limit theorem for linear statistics of a
suitable class of Hermitian random matrices. Using this result, in Section 4
the asymptotic normality of the unnormalized mutual information (3.18) is
proved for arbitrary signal-to-noise ratios and fading distributions, allowing
for correlation between either transmit or receive antennas.

4. η and Shannon Transforms: Theory and Applications


Motivated by the intuition drawn from various applications of random matrices
to problems in the information theory of noisy communication channels,
the η and Shannon transforms, which are closely related to a more classical
transform in random matrix theory, the Stieltjes transform [18], turn out
to be quite helpful in clarifying the exposition as well as the statement of
many results. In particular, the η transform leads to compact definitions
of other transforms used in random matrix theory, such as the R and S
transforms [3].

Definition 4.1. Given an N × N nonnegative definite random matrix A
whose ESD converges a.s., its η transform is

    η_A(γ) = E[ 1/(1 + γX) ]                                        (4.1)

while its Shannon transform is defined as

    V_A(γ) = E[ log(1 + γX) ]                                       (4.2)

where X is a nonnegative random variable whose distribution is the asymptotic
ESD of A, while γ is a nonnegative real number.

Then, η_A(γ) can be regarded as a generating function for the asymptotic
moments of A [3]. Furthermore, from the definition, 0 < η_A(γ) ≤ 1.
For notational convenience, we refer to the η transform of a matrix and
the η transform of its asymptotic ESD interchangeably.

Lemma 4.2. For any N × K matrix A and K × N matrix B such that AB
is nonnegative definite, for K, N → ∞ with K/N → β, if the spectra converge,

    η_AB(γ) = 1 − β + β η_BA(γ) .                                   (4.3)

As it turns out, the Shannon and η transforms are intimately related to
each other and to the classical Stieltjes transform:

    (γ / log e) (d/dγ) V_A(γ) = 1 − (1/γ) S_A(−1/γ) = 1 − η_A(γ)

where A is an N × N Hermitian matrix whose ESD converges a.s. to F_A(·)
and S_A(·) is its Stieltjes transform, defined as [18]:

    S_A(z) = ∫ 1/(λ − z) dF_A(λ) .                                  (4.4)
Before introducing the η and Shannon transforms of various random
matrices, some justification for their relevance to wireless communications

is in order. The rationale for introducing the η and Shannon transforms
can be succinctly explained by considering a hypothetical wireless communication
channel where the random channel matrix H in (2.1) is such that,
as K, N → ∞ with K/N → β, the ESD of HH† converges a.s. to a nonrandom
limit. Based on Definition 4.1 we immediately recognize from (3.5) and
(3.13) that, for an i.i.d. Gaussian input x, as K, N → ∞ with K/N → β, the
normalized mutual information and the MMSE of (2.1) are related to the η and
Shannon transforms of HH† by the following relationships:

    I(SNR) → V_{HH†}(SNR)                                           (4.5)

    MMSE(SNR) → η_{H†H}(SNR)                                        (4.6)
              = 1 − ( 1 − η_{HH†}(SNR) ) / β                        (4.7)

where (4.7) follows from (3.14). It is thus of vital interest, in the information-
theoretic and signal-processing analysis of the wireless communication channels
of contemporary interest, to evaluate the η and Shannon transforms
of the various random (channel) matrices that arise in the linear
model (2.1).
A classical result in random matrix theory states that
Theorem 4.3 ([5]). If the entries of H are zero-mean i.i.d. with variance
1/N, then as K, N → ∞ with K/N → β, the ESD of HH† converges a.s. to the
Marcenko-Pastur law, whose density function is

    f_β(x) = (1 − β)⁺ δ(x) + √( (x − a)⁺ (b − x)⁺ ) / (2πx)         (4.8)

where (z)⁺ = max{0, z} and

    a = (1 − √β)²,    b = (1 + √β)² .
For this simple statistical structure of H, the η and Shannon transforms
admit the following nice and compact closed-form expressions:

Theorem 4.4 ([5]). The η and Shannon transforms of the Marcenko-
Pastur law, whose density function is (4.8), are

    η_{HH†}(γ) = 1 − F(γ, β)/(4γ)                                   (4.9)

and

    V_{HH†}(γ) = β log( 1 + γ − ¼F(γ, β) )
               + log( 1 + γβ − ¼F(γ, β) ) − ( log e / (4γ) ) F(γ, β)   (4.10)

with

    F(x, z) = ( √( x(1 + √z)² + 1 ) − √( x(1 − √z)² + 1 ) )² .
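The closed forms of Theorem 4.4 can be checked by Monte Carlo at moderate sizes. The sketch below (complex Gaussian entries, one of many distributions satisfying the hypotheses of Theorem 4.3; parameters arbitrary) compares (4.9) with the sample average of 1/(1 + γλ):

```python
import numpy as np

def F(x, z):
    """F(x, z) of Theorem 4.4."""
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

def eta_mp(g, beta):
    """η transform of the Marcenko-Pastur law, eq. (4.9)."""
    return 1 - F(g, beta) / (4 * g)

rng = np.random.default_rng(4)
N, beta, g = 1000, 0.5, 2.0
K = int(beta * N)
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
lam = np.linalg.eigvalsh(H @ H.conj().T)
eta_emp = np.mean(1 / (1 + g * lam))

print(eta_mp(g, beta), eta_emp)  # already close at N = 1000
```

The agreement at N = 1000 illustrates the fast-convergence property discussed in Section 3.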

However, as has been well known since the work of Marcenko and Pastur [19],
it is rarely the case that the limiting empirical distribution of the squared
singular values of random matrices (whose aspect ratio converges to a constant)
admits a closed-form expression. Rather, [19] proved a very general
result in which the characterization of the solution is accomplished through
a fixed-point equation involving the Stieltjes transform. Later, this result
was strengthened in [20]. Consistent with our emphasis, this result is
formulated here in terms of the η transform rather than the Stieltjes transform
used in [20], as follows:

Theorem 4.5 ([19, 20]). Let S be an N × K matrix whose entries are
i.i.d. complex random variables with zero mean and variance 1/N. Let T be a
K × K real diagonal random matrix whose empirical eigenvalue distribution
converges a.s. to a nonrandom limit. Let W₀ be an N × N Hermitian
complex random matrix with empirical eigenvalue distribution converging
a.s. to a nonrandom distribution. If S, T, and W₀ are independent, the
empirical eigenvalue distribution of

    W = W₀ + STS†                                                   (4.11)

converges, as K, N → ∞ with K/N → β, a.s. to a nonrandom limiting distribution
whose η transform η = η_W(γ) is the solution of the following pair of equations:

    γ η = φ η₀(φ)                                                   (4.12)

    η₀(φ) = η + β ( 1 − η_T(γη) )                                   (4.13)

with η₀ and η_T the η transforms of W₀ and T respectively.

In the following we give some of the more representative results on
the η and Shannon transforms, in which these transforms lead
to particularly simple characterizations of the limiting empirical distribution of
the squared singular values of random matrices with either dependent or
independent entries.

Theorem 4.6 ([3]). Let S be an N × K complex random matrix whose
entries are i.i.d. with variance 1/N. Let T be a K × K nonnegative definite
random matrix whose ESD converges a.s. to a nonrandom distribution. The
ESD of STS† converges a.s., as K, N → ∞ with K/N → β, to a distribution
whose η transform satisfies

    β = (1 − η) / ( 1 − η_T(γη) )                                   (4.14)

where we have compactly abbreviated η_{STS†}(γ) = η. The corresponding
Shannon transform is

    V_{STS†}(γ) = β V_T(γη) + log(1/η) + (η − 1) log e .            (4.15)
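Equation (4.14) is a scalar fixed point that can be solved by simple iteration. A sketch (choosing T = I, so that the answer can be checked against the Marcenko-Pastur closed form (4.9)):

```python
import numpy as np

def eta_sts(gamma, beta, eta_T, iters=200):
    """Iterate 1 - η = β(1 - η_T(γη)), a rearrangement of eq. (4.14).
    eta_T is the η transform of T, supplied as a callable."""
    eta = 1.0
    for _ in range(iters):
        eta = 1 - beta * (1 - eta_T(gamma * eta))
    return eta

def F(x, z):
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

beta, g = 0.75, 4.0
eta_fp = eta_sts(g, beta, lambda x: 1 / (1 + x))   # T = I, so η_T(x) = 1/(1+x)
eta_cf = 1 - F(g, beta) / (4 * g)                  # Marcenko-Pastur closed form
print(eta_fp, eta_cf)  # both 0.5 for these parameters
```

The iteration is a contraction here, so a few hundred passes reach machine precision.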

Theorem 4.7 ([21]). Define H = CSA, where S is an N × K matrix whose
entries are i.i.d. complex random variables with variance 1/N. Let C and A
be, respectively, N × N and K × K random matrices such that the asymptotic
spectra of D = CC† and T = AA† converge a.s. to nonrandom limits.
If C, A and S are independent, then as K, N → ∞ with K/N → β, the Shannon
transform of HH† is given by:

    V_{HH†}(γ) = V_D(γ_d) + β V_T(γ_t) − ( γ_d γ_t / γ ) log e      (4.16)

where γ_d(γ) and γ_t(γ) are the solutions to

    γ_d γ_t / γ = β ( 1 − η_T(γ_t) ),    γ_d γ_t / γ = 1 − η_D(γ_d),   (4.17)

while the η transform of HH† can be obtained as

    η_{HH†}(γ) = η_D( γ_d(γ) )                                      (4.18)

where γ_d(γ) is the solution to (4.17).

The asymptotic fraction of zero eigenvalues of HH† equals

    lim_{γ→∞} η_{HH†}(γ) = 1 − min{ β P[T ≠ 0], P[D ≠ 0] } .


Moreover, it has been proved in [3] and [21] that:

Theorem 4.8 ([21]). Let H be an N × K matrix defined as in Theorem 4.7,
whose jth column is h_j. As K, N → ∞ with K/N → β,

    (1/||h_j||²) h_j† ( I + γ Σ_{ℓ≠j} h_ℓ h_ℓ† )^{-1} h_j  →  γ_t(γ) / ( γ E[D] )   a.s.   (4.19)

with γ_t(γ) satisfying (4.17).



According to (4.17), it is easy to verify that γ_t(γ) is the solution to

    γ_t = γ E[ D / ( 1 + γβ D E[ T/(1 + γ_t T) ] ) ]                (4.20)

where for notational simplicity we have abbreviated γ_t(γ) = γ_t.
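Equation (4.20) can be solved by fixed-point iteration once the distributions of D and T are represented, e.g., by samples. A sketch (the exponential spectra below are hypothetical choices, not taken from the text):

```python
import numpy as np

def gamma_t(gamma, beta, D, T, iters=500):
    """Fixed-point iteration for eq. (4.20); D and T are arrays of samples
    from the asymptotic spectra of CC^† and AA^†."""
    gt = gamma * np.mean(D)                    # initial guess
    for _ in range(iters):
        inner = np.mean(T / (1 + gt * T))      # E[T/(1 + γ_t T)]
        gt = gamma * np.mean(D / (1 + gamma * beta * D * inner))
    return gt

g, beta = 2.0, 0.5
# sanity check: D = T = 1 must give γ_t = γ η_{HH†} for the Marcenko-Pastur law
gt_mp = gamma_t(g, beta, np.ones(1), np.ones(1))
print(gt_mp / g)  # η_{HH†}(2) with β = 1/2, i.e. 1/sqrt(2) ≈ 0.707

rng = np.random.default_rng(5)
D = rng.exponential(size=50_000)               # hypothetical correlation spectra
T = rng.exponential(size=50_000)
gt = gamma_t(g, beta, D, T)
print(gt)
```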


Notice that, given a linear memoryless vector channel as in (2.1) with
the channel matrix H defined as in Theorem 4.7, the signal-to-interference-
and-noise ratio SINR_k, incurred in estimating the kth component of the channel
input based on its noisy received observations, is given by

    SINR_k = SNR h_k† ( I + SNR Σ_{ℓ≠k} h_ℓ h_ℓ† )^{-1} h_k         (4.21)

where SNR represents the transmitted signal-to-noise ratio.


Thus, from Theorem 4.8 it follows that the multiuser efficiency of the
kth user achieved by the MMSE receiver, η_k^MMSE(SNR), converges a.s. to:

    η_k^MMSE(SNR) = SINR_k / ( SNR ||h_k||² )                       (4.22)
                 →  γ_t(SNR) / ( SNR E[D] )   a.s.                  (4.23)

A special case of Theorem 4.7 is H = SA (i.e., C = I). Then,
according to (4.20) and (4.14), we have that

    γ_t(γ) = γ η_{STS†}(γ)

and consequently the MMSE multiuser efficiency η_k^MMSE(SNR) of a channel as
in (2.1) with H = SA converges a.s. to the η transform of STS† evaluated at
SNR:ᵇ

    η_k^MMSE(SNR) → η_{STS†}(SNR)   a.s.                            (4.24)
Theorem 4.9 ([21, 22]). Let H be an N × K matrix defined as in Theorem
4.7. Defining

    α = β P[T ≠ 0] / P[D ≠ 0] ,

we have

    lim_{γ→∞} ( V_{HH†}(γ) / min{ β P[T ≠ 0], P[D ≠ 0] } − log( β P[T ≠ 0] γ ) ) = L_∞   (4.25)
ᵇ The conventional notation for multiuser efficiency is η (cf. [4]); the relationship in (5.6)
is the motivation for the choice of the η transform terminology introduced in this section.

with

    L_∞ =  E[log(D̄/e)] + α V_T̄(ζ_∞) − log(α ζ_∞)      if α > 1
           E[log(T̄ D̄ / e)]                              if α = 1      (4.26)
           E[log(T̄/e)] + (1/α) V_D̄(ξ_∞) − log ξ_∞      if α < 1

with ζ_∞ and ξ_∞, respectively, the solutions to

    η_T̄(ζ_∞) = 1 − 1/α ,    η_D̄(ξ_∞) = 1 − α                        (4.27)

and with D̄ and T̄ the restrictions of D and T to the events D ≠ 0 and
T ≠ 0.

The foregoing result gives the power offset (3.9) of the linear vector
memoryless channel in (2.1) when H is defined as in Theorem 4.7.

Theorem 4.10. As γ → ∞, we have that

    lim_{γ→∞} γ_t(γ)/γ = (β/ξ_∞) P[T > 0]                           (4.28)

where γ_t(γ) is the solution to (4.17) and ξ_∞ is the solution to (4.27)
when α < 1, while the limit equals 0 otherwise.

Definition 4.11. An N × K matrix P is asymptotically row-regular if

    lim_{K→∞} (1/K) Σ_{j=1}^K 1{P_{i,j} ≤ α}

is independent of i for all α ∈ R, as K, N → ∞ with the aspect ratio
K/N converging to a constant. A matrix whose transpose is asymptotically
row-regular is called asymptotically column-regular. A matrix that
is both asymptotically row-regular and asymptotically column-regular is
called asymptotically doubly-regular and satisfies

    lim_{N→∞} (1/N) Σ_{i=1}^N P_{i,j} = lim_{K→∞} (1/K) Σ_{j=1}^K P_{i,j} .   (4.29)

If (4.29) is equal to 1, then P is standard asymptotically doubly-regular.



Theorem 4.12 ([21, 3]). Define an N × K complex random matrix H
whose entries are independent complex random variables (arbitrarily distributed)
with identical means. Let their second moments be

    E[ |H_{i,j}|² ] = P_{i,j} / N                                   (4.30)

with P an N × K deterministic standard asymptotically doubly-regular matrix
whose entries are uniformly bounded for any N. The asymptotic empirical
eigenvalue distribution of HH† converges a.s. to the Marcenko-Pastur
distribution, whose density is given by (4.8).

Using Lemma 2.6 in [23], Theorem 4.12 can be extended to matrices
whose mean has rank r, with r > 1, but such that

    lim_{N→∞} r/N = 0 .

Definition 4.13. Consider an N × K random matrix H whose entries have
variances

    Var[H_{i,j}] = P_{i,j} / N                                      (4.31)

with P an N × K deterministic matrix whose entries are uniformly bounded.
For each N, let

    v^N : [0, 1) × [0, 1) → R

be the variance profile function given by

    v^N(x, y) = P_{i,j},    (i−1)/N ≤ x < i/N,  (j−1)/K ≤ y < j/K .   (4.32)

Whenever v^N(x, y) converges uniformly to a limiting bounded measurable
function, v(x, y), we define this limit as the asymptotic variance profile
of H.

Theorem 4.14 ([24–26]). Let H be an N × K complex random matrix
whose entries are independent zero-mean complex random variables (arbitrarily
distributed) with variances

    E[ |H_{i,j}|² ] = P_{i,j} / N                                   (4.33)

where P is an N × K deterministic matrix whose entries are uniformly
bounded and from which the asymptotic variance profile of H, denoted
v(x, y), can be obtained as per Definition 4.13. As K, N → ∞ with K/N → β,
the empirical eigenvalue distribution of HH† converges a.s. to a limiting
distribution whose η transform is

    η_{HH†}(γ) = E[ Γ_{HH†}(X, γ) ]                                 (4.34)

with Γ_{HH†}(x, γ) and Υ_{HH†}(y, γ) satisfying the equations

    Γ_{HH†}(x, γ) = 1 / ( 1 + γβ E[ v(x, Y) Υ_{HH†}(Y, γ) ] )       (4.35)

    Υ_{HH†}(y, γ) = 1 / ( 1 + γ E[ v(X, y) Γ_{HH†}(X, γ) ] )        (4.36)

where X and Y are independent random variables uniform on [0, 1].
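The coupled fixed points (4.34)–(4.36) can be solved by alternating iteration on a grid. A sketch (grid size and parameters are arbitrary), which for a flat profile v ≡ 1 must fall back to the Marcenko-Pastur η transform of (4.9):

```python
import numpy as np

def eta_profile(gamma, beta, v, n=100, iters=200):
    """Alternating iteration of (4.35)-(4.36) on an n-point grid,
    returning η_{HH†}(γ) = E[Γ(X, γ)] as in (4.34)."""
    x = (np.arange(n) + 0.5) / n
    V = v(x[:, None], x[None, :])          # samples of the variance profile
    Gam = np.ones(n)                       # Γ(x, γ)
    Ups = np.ones(n)                       # Υ(y, γ)
    for _ in range(iters):
        Gam = 1 / (1 + gamma * beta * (V @ Ups) / n)   # average over Y
        Ups = 1 / (1 + gamma * (V.T @ Gam) / n)        # average over X
    return np.mean(Gam)

def F(x, z):
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

g, beta = 3.0, 0.6
eta_flat = eta_profile(g, beta, lambda x, y: np.ones_like(x + y))
print(eta_flat, 1 - F(g, beta) / (4 * g))  # both ≈ 0.6116
```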

The zero-mean hypothesis in Theorem 4.14 can be relaxed using Lemma
2.6 in [23]. Specifically, if the rank of E[H] is o(N), then Theorem 4.14 still
holds.

Theorem 4.15 ([21]). Let H be an N × K matrix defined as in Theorem
4.14. Further define

    Ψ^(N)(y, γ) = (1/||h_j||²) h_j† ( I + γ Σ_{ℓ≠j} h_ℓ h_ℓ† )^{-1} h_j,   (j−1)/K ≤ y < j/K .

As K, N → ∞, Ψ^(N)(y, γ) converges a.s. to Ψ(y, γ)/E[v(X, y)], with Ψ(y, γ) the
solution to the fixed-point equation

    Ψ(y, γ) = E[ v(X, y) / ( 1 + γβ E[ v(X, Y)/(1 + γ Ψ(Y, γ)) | X ] ) ],   y ∈ [0, 1] .   (4.37)

The Shannon transform of the asymptotic spectrum of HH† is given by
the following result.

Theorem 4.16 ([27, 3]). Let H be an N × K complex random matrix
defined as in Theorem 4.14. The Shannon transform of the asymptotic spectrum
of HH† is

    V_{HH†}(γ) = β E[ log( 1 + γ E[ v(X, Y) Γ_{HH†}(X, γ) | Y ] ) ]
               + E[ log( 1 + γβ E[ v(X, Y) Υ_{HH†}(Y, γ) | X ] ) ]
               − γβ E[ v(X, Y) Γ_{HH†}(X, γ) Υ_{HH†}(Y, γ) ] log e   (4.38)

with Γ_{HH†}(·, γ) and Υ_{HH†}(·, γ) satisfying (4.35) and (4.36).



Theorem 4.17 ([21]). Let H be an N × K complex random matrix defined
as in Theorem 4.14. Then, denoting

    α = β P[ E[v(X, Y)|Y] ≠ 0 ] / P[ E[v(X, Y)|X] ≠ 0 ] ,

we have that

    lim_{γ→∞} ( V_{HH†}(γ) / min{ β P[E[v(X,Y)|Y] ≠ 0], P[E[v(X,Y)|X] ≠ 0] }
                − log( β P[E[v(X,Y)|Y] ≠ 0] γ ) ) = L_∞

with

    L_∞ =  E[ log( (1/e) E[ v(X̄, Ȳ)/(1 + Λ_∞(Ȳ)) | X̄ ] ) ] + α E[ log(1 + Λ_∞(Ȳ)) ]   if α > 1
           E[ log( v(X̄, Ȳ)/e ) ]                                                        if α = 1
           E[ log( Λ_∞(Ȳ)/e ) ] + (1/α) E[ log( 1 + E[ v(X̄, Ȳ)/Λ_∞(Ȳ) | X̄ ] ) ]       if α < 1

with X̄ and Ȳ the restrictions of X and Y to the events E[v(X, Y)|X] ≠ 0 and
E[v(X, Y)|Y] ≠ 0, respectively. The function Λ_∞(·) is the solution, for α > 1, of

    Λ_∞(y) = (1/α) E[ v(X̄, y) / E[ v(X̄, Ȳ)/(1 + Λ_∞(Ȳ)) | X̄ ] ]    (4.39)

whereas, for α < 1, it is the solution of

    E[ 1 / ( 1 + E[ v(X̄, Ȳ)/Λ_∞(Ȳ) | X̄ ] ) ] = 1 − α .             (4.40)

As we will see in the next section, Theorems 4.14–4.17 give the MMSE
performance, the mutual information and the power offset of a large class of
vector channels of interest in wireless communications, which are described
by random matrices with either correlated or independent entries.

Let STS† be an N × N random matrix, with S and T, respectively,
N × K and K × K random matrices as stated in Theorem 4.6. We have seen
that the ESD of STS† converges a.s. to a nonrandom limit whose Shannon
and η transforms satisfy (4.15) and (4.14), respectively.

From this limiting behavior of the ESD of STS†, it follows immediately
that linear spectral statistics of the form

    (1/N) Σ_{i=1}^N g(λ_i)                                          (4.41)

with g(·) a continuous function on the real line with bounded and continuous
derivatives, converge a.s. to a nonrandom quantity. A recent central
result in random matrix theory by Bai and Silverstein (2004) [17] shows
their rate of convergence to be 1/N. Moreover, they show that:
Theorem 4.18 ([17]). Let S be an N × K complex matrix defined as in
Theorem 4.6 and such that its (i, j)th entry satisfies:

    E[S_{i,j}] = 0,    E[ |S_{i,j}|⁴ ] = 2/N² .                     (4.42)

Let T be a K × K matrix defined as in Theorem 4.6 whose spectral norm
is bounded. Let g(·) be a continuous function on the real line with bounded
and continuous derivatives, analytic on an open set containing the intervalᶜ

    [ lim inf_K λ_K max²{0, 1 − √β},  lim sup_K λ_1 (1 + √β)² ]

where λ_1 ≥ ⋯ ≥ λ_K are the eigenvalues of T. Denoting by λ_i and F_{STS†}(·),
respectively, the ith eigenvalue and the asymptotic ESD of STS†, the random
variable

    Ξ_N = Σ_{i=1}^N g(λ_i) − N ∫ g(x) dF_{STS†}(x)                  (4.43)

converges, as K, N → ∞ with K/N → β, to a zero-mean Gaussian random
variable with variance

    E[Ξ²] = −(1/4π²) ∮∮ g(Z(γ₁)) g(Z(γ₂)) / (γ₁ − γ₂)² dγ₁ dγ₂      (4.44)

where

    Z(γ) = −(1/γ) ( 1 − β (1 − η_T(γ)) ) .                          (4.45)

In (4.44) the integration variables γ₁ and γ₂ follow closed contours, which
we may take to be non-overlapping and counterclockwise, such that the corresponding
contours mapped through Z(·) enclose the support of F_{STS†}(·).

ᶜ In [28] this interval is shown to contain the spectral support of S†ST.



Using the foregoing result, we have that:

Theorem 4.19 ([29]). Let S be an N × K complex matrix defined as in
Theorem 4.18. Let T be a Hermitian random matrix, independent of S,
with bounded spectral norm and whose asymptotic ESD converges a.s. to a
nonrandom limit. Denote by V_{STS†}(γ) the Shannon transform of STS†. As
K, N → ∞ with K/N → β, the random variable

    Ξ_N = log det( I + γ STS† ) − N V_{STS†}(γ)                     (4.46)

is asymptotically zero-mean Gaussian with variance

    E[Ξ²] = −log( 1 − β E[ ( γ T η_{STS†}(γ) / (1 + γ T η_{STS†}(γ)) )² ] )

where the expectation is over the nonnegative random variable T whose
distribution is given by the asymptotic ESD of T.

From Jensen's inequality and (4.14), a tight lower bound for the variance
of Ξ in Theorem 4.19 is given by [3, Eq. 2.239]:

    E[Ξ²] ≥ −log( 1 − (1 − η_{STS†}(γ))² / β )                      (4.47)

with equality if T = I. In fact, Theorem 4.19 can be particularized
to the case T = I to obtain:

Theorem 4.20. Let S be an N × K complex matrix as in Theorem 4.19. As
K, N → ∞ with K/N → β, the random variable

    Ξ_N = log det( I + γ SS† ) − N V_{SS†}(γ)                       (4.48)

is asymptotically zero-mean Gaussian with variance

    E[Ξ²] = −log( 1 − (1 − η_{SS†}(γ))² / β )                       (4.49)

where η_{SS†}(γ) and V_{SS†}(γ) are given in (4.9) and (4.10).
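Theorem 4.20 is easy to probe by simulation: the variance of the unnormalized log-det is O(1), not O(N), and approaches (4.49). A sketch (all logarithms natural, so the variance is in nats²; the trial count and matrix size are arbitrary, and the complex Gaussian entries satisfy (4.42)):

```python
import numpy as np

def F(x, z):
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

rng = np.random.default_rng(6)
N, beta, g, trials = 100, 1.0, 4.0, 300
K = int(beta * N)

eta = 1 - F(g, beta) / (4 * g)                  # η_{SS†}(γ), eq. (4.9)
var_pred = -np.log(1 - (1 - eta) ** 2 / beta)   # eq. (4.49)

logdets = np.empty(trials)
for t in range(trials):
    S = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    logdets[t] = np.linalg.slogdet(np.eye(N) + g * (S @ S.conj().T))[1]

print(np.var(logdets), var_pred)  # same order, close despite N = 100
```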

5. Applications to Wireless Communications


In this section we focus our attention on some of the major wireless channels
that are simple yet practically very relevant and able to capture various
features of contemporary interest:
A. Randomly spread Code Division Multiple Access (CDMA) channels subject
to either frequency-flat or frequency-selective fading.
B. Single-user multiantenna channels subject to frequency-flat fading.

Naturally, random matrices also arise in models that incorporate more


than one of the above features (multiuser, multiantenna, fading, wideband).
Although realistic models do include several (if not all) of the above features
it is conceptually advantageous to start by deconstructing them into their
essential ingredients.
In the next subsections we describe the foregoing scenarios and show
how the distribution of the squared singular values of certain matrices determines
communication limits in both the coded regime (Shannon capacity)
and the uncoded regime (probability of error). Each of the above channels is
analyzed in the asymptotic regime where K (number of transmit antennas
or number of users) and N (number of receive antennas or number
of degrees of freedom per symbol in the CDMA channel) go to infinity
while their ratio goes to a constant. In this regime, for each of these channels,
we derive several performance measures of engineering interest, which
are determined by the distribution of the singular values of the channel
matrix.

Unless otherwise stated, the analysis applies to coherent reception, and
thus it is presumed that the state of the channel is perfectly tracked by
the receiver. The degree of channel knowledge at the transmitter, on the
other hand, as well as the rapidity of the fading fluctuations (ergodic or
non-ergodic regime), are specified for each individual setting.

5.1. CDMA

An application that is very suitable is the code-division multiple-access
(CDMA) channel, where each user is assigned a signature vector,
known at the receiver, which can be seen as an element of an N-dimensional
signal space. Based on the nature of this signal space we can distinguish
between:

• Direct-sequence CDMA, used in many current cellular systems (IS-95,
cdma2000, UMTS);
• Multi-carrier CDMA, being considered for the fourth generation of cellular
systems.

5.1.1. DS-CDMA frequency-flat fading

Concerning DS-CDMA, we first focus on channels whose response is
flat over the signal bandwidth, which implies that the received signature of
each user is just a scaled version of the transmitted one, where the scaling
factors are the independent fading coefficients for each user.

Considering the basic synchronous DS-CDMA [4] with K users and
spreading factor N in a frequency-flat fading environment, the vector x
contains the symbols transmitted by the K users while the role of H is
played by the product of two matrices, S and A, where S is an N × K
matrix whose columns are the spreading sequences

    S = [ s₁ | . . . | s_K ]                                        (5.1)

and A is a K × K diagonal matrix whose kth diagonal entry is the complex
fading coefficient of the kth user. The model thus specializes to

    y = SAx + n .                                                   (5.2)

The standard random signature model [4] assumes that the entries of S
are chosen independently and equiprobably on {−1/√N, +1/√N}. Moreover, the
random signature model is often generalized to encompass non-binary (e.g.,
Gaussian) distributions for the amplitudes that modulate the chip waveforms.
With that, the randomness in the received sequence can also reflect
the impact of fading. One motivation for modeling the signatures as random is
the use of "long sequences" in some commercial CDMA systems, where the
period of the pseudo-random sequence spans many symbols. Another motivation
is to provide a baseline of comparison for systems that use signature
waveform families with low cross-correlations.
The arithmetic mean of the MMSEs for the K users satisfies [4]

    (1/K) Σ_{k=1}^K MMSE_k = (1/K) tr{ (I + SNR A†S†SA)^{-1} }      (5.3)
                           → η_{A†S†SA}(SNR)                        (5.4)

whereas the MMSE multiuser efficiency of the kth user, η_k^MMSE(SNR), given
in (3.17), is:

    η_k^MMSE(SNR) = s_k† ( I + SNR Σ_{i≠k} |A_i|² s_i s_i† )^{-1} s_k   (5.5)
                 → η_{SAA†S†}(SNR)                                  (5.6)

where the limit follows from (4.24). According to Theorem 4.6, the MMSE
multiuser efficiency, abbreviated as

    η = η_{SAA†S†}(SNR),                                            (5.7)


is the solution to the fixed-point equation

1 − η = β [ 1 − η_{|A|²}(SNR η) ],   (5.8)

where η_{|A|²}(·) is the η-transform of the asymptotic empirical distribution of
{|A_1|², . . . , |A_K|²}.
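For a discrete amplitude distribution, the fixed point (5.8) can be solved by simple iteration. The sketch below is our illustration (the function name and sample choices are not from the text); for equal-power users it checks the iterate against the positive root of the quadratic to which (5.8) then reduces.

```python
import math

def eta_fixed_point(snr, beta, a2_samples, iters=2000):
    """Iterate (5.8): 1 - eta = beta * (1 - E[1/(1 + SNR*eta*|A|^2)]),
    with the expectation taken over discrete samples of |A|^2."""
    eta = 1.0
    for _ in range(iters):
        eta_transform = sum(1.0 / (1.0 + snr * eta * a) for a in a2_samples) / len(a2_samples)
        eta = 1.0 - beta * (1.0 - eta_transform)
    return eta

# Equal-power users (|A_k| = 1): (5.8) reduces to the quadratic
# SNR*eta^2 + (1 + beta*SNR - SNR)*eta - 1 = 0.
snr, beta = 4.0, 0.5
eta = eta_fixed_point(snr, beta, [1.0])
b = 1 + beta * snr - snr
closed_form = (-b + math.sqrt(b * b + 4 * snr)) / (2 * snr)
print(eta, closed_form)
```

The iteration is a contraction on (0, 1], so a few dozen passes already give machine precision.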
The capacity of the optimum receiver (normalized by the spreading
factor N), which is given by [4]:

C^opt(β, SNR) = lim_{N→∞} (1/N) log det( I + SNR SAA†S† ),

according to Theorem 4.6, equals (4.15).
It has been proved in [30] that in the asymptotic regime the normalized
spectral efficiency of the MMSE receiver converges to

C^MMSE(β, SNR) = lim_{N→∞} (1/N) Σ_{k=1}^{K} E[ log(1 + SINR_k) ]   (5.9)

from which it follows, using (4.22), that

C^MMSE(β, SNR) = β E[ log( 1 + |A|² SNR η_{SAA†S†}(SNR) ) ].   (5.10)
Based on (5.10), the capacity of the optimum receiver can be characterized
in terms of the MMSE spectral efficiency [30]:

C^opt(β, SNR) = C^MMSE(β, SNR) + log( 1/η_{SAA†S†}(SNR) )
+ ( η_{SAA†S†}(SNR) − 1 ) log e.   (5.11)
The unfaded equal-power case is obtained from the above model by assum-
ing A = AI, where A is the transmitted amplitude, equal for all users. In this
case, the channel matrix in (5.2) has independent identically distributed
entries and thus, according to Theorem 4.3, its asymptotic ESD converges
to the Marcenko-Pastur law. Thus the normalized capacity achieved with
the optimum receiver in the asymptotic regime is (cf. Theorem 4.4):

C^opt(β, SNR) = β log( 1 + SNR − F(SNR, β)/4 )
+ log( 1 + β SNR − F(SNR, β)/4 ) − ( F(SNR, β)/(4 SNR) ) log e,
(5.12)
while the MMSE converges to

1 − F(SNR, β)/(4 β SNR)   (5.13)

with F(·, ·) defined in (4.11). Using (4.24) and (4.9), the maximum SINR
(achieved by the MMSE linear receiver) converges to [4]

SNR − F(SNR, β)/4.   (5.14)
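These closed forms are easy to check numerically. The following sketch is our illustration (not code from the text): it implements F(·,·) as defined in (4.11), verifies that η = 1 − F(SNR, β)/(4 SNR) solves the equal-power version of (5.8), and confirms that (5.12) agrees with the decomposition (5.11), working in nats so that log e = 1.

```python
import math

def F(x, z):
    # F(x, z) as defined in (4.11)
    return (math.sqrt(x * (1 + math.sqrt(z))**2 + 1)
            - math.sqrt(x * (1 - math.sqrt(z))**2 + 1))**2

snr, beta = 3.0, 0.5
eta = 1 - F(snr, beta) / (4 * snr)        # multiuser efficiency
sinr = snr - F(snr, beta) / 4             # (5.14): asymptotic MMSE SINR

# eta solves (5.8) with |A| = 1:  1 - eta = beta * (1 - 1/(1 + SNR*eta))
residual = (1 - eta) - beta * (1 - 1 / (1 + snr * eta))

# (5.12) in nats, and the decomposition (5.11): C_opt = C_mmse + log(1/eta) + (eta - 1)
c_opt = (beta * math.log(1 + snr - F(snr, beta) / 4)
         + math.log(1 + beta * snr - F(snr, beta) / 4)
         - F(snr, beta) / (4 * snr))
c_mmse = beta * math.log(1 + sinr)        # (5.10) with |A| = 1
gap = c_opt - (c_mmse + math.log(1 / eta) + (eta - 1))
print(residual, gap)
```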
Let us consider a synchronous DS-CDMA downlink with K active users
employing random spreading codes and operating over a frequency-selective
fading channel. Then H in (2.1) particularizes to

H = CSA   (5.15)

where A is a K × K deterministic diagonal matrix containing the amplitudes
of the users and C is an N × N Toeplitz matrix defined as

(C)_{i,j} = (1/W_c) c( (i − j)/W_c )   (5.16)

with c(·) the impulse response of the channel.

Using Theorem 4.8 and with the aid of an auxiliary function (SNR),
abbreviated as , we obtain that the MMSE multiuser eciency of the kth
user, abbreviated as = MMSE (SNR), is the solution to

1 |C|2 ( )
= (5.17)
E[|C|2 ]
1 |A|2 (SNR E[|C|2 ])
= (5.18)
E[|C|2 ]

where |C|2 and |A|2 are independent random variables with distributions
given by the asymptotic spectra of CC and AA , respectively, while
|C|2 () and |A|2 () represent their respective transforms. Note that, using
(4.20), instead of (5.18) and (5.17), we may write [31, 32]

|C|2
= E
 
. (5.19)
|A|2
1 + SNR |C|2 E
1 + SNR |A|2
From Theorem 4.7 and (4.3) we have that:

(1/K) Σ_{k=1}^{K} MMSE_k = (1/K) tr{ ( I + SNR H†H )^{-1} }
= 1 − ( 1 − η_{|C|²}( ζ(SNR) ) ) / β   (5.20)

with ζ(·) the solution to (5.18) and (5.17).
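The combined form (5.19) lends itself to iteration over samples of |C|² and |A|². The sketch below is our illustration (names and sample values are not from the text); when the channel is flat (|C|² ≡ 1) the equation collapses to the frequency-flat fixed point (5.8), which serves as a check.

```python
import math

def downlink_efficiency(c2, a2, beta, snr, iters=2000):
    """Solve (5.19) by iteration, with |C|^2 and |A|^2 given by discrete samples:
       eta*E[|C|^2] = E[|C|^2 / (1 + SNR*beta*|C|^2 *
                                 E[|A|^2 / (1 + SNR*|A|^2*eta*E[|C|^2])])]."""
    Ec2 = sum(c2) / len(c2)
    eta = 1.0
    for _ in range(iters):
        inner = sum(a / (1.0 + snr * a * eta * Ec2) for a in a2) / len(a2)
        eta = sum(c / (1.0 + snr * beta * c * inner) for c in c2) / len(c2) / Ec2
    return eta

snr, beta = 4.0, 0.5
eta_flat = downlink_efficiency([1.0], [1.0], beta, snr)
# flat-fading reference: (5.8) with equal power, SNR*eta^2 + (1+beta*SNR-SNR)*eta - 1 = 0
b = 1 + beta * snr - snr
eta_ref = (-b + math.sqrt(b * b + 4 * snr)) / (2 * snr)
eta_sel = downlink_efficiency([0.5, 1.5], [1.0], beta, snr)  # a frequency-selective example
print(eta_flat, eta_ref, eta_sel)
```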


The special case of (5.19) for equal-power users was given in [33].
For contributions on the asymptotic analysis of uplink DS-CDMA
systems in frequency-selective fading channels see [3, 27, 32]. In the context of
CDMA channels, asymptotic random matrix theory also finds application
in channel estimation and the design of reduced-complexity receivers (see [3]
for a tutorial overview of this topic).
5.1.2. Multi-carrier CDMA

If the channel is not flat over the signal bandwidth, then the received sig-
nature of each user is not simply a scaled version of the transmitted one.
In this case, we can insert suitable transmit and receive interfaces and
choose the signature space in such a way that the equivalent channel that
encompasses the actual channel plus the interfaces can be modeled as a
random matrix H given by:

H = (C ⊙ S) A   (5.21)
= G ⊙ S   (5.22)

where ⊙ denotes the Hadamard (element-wise) product [34], S is the ran-
dom signature matrix in the frequency domain, while G is an N × K ma-
trix whose columns are independent N-dimensional random vectors whose
(ℓ, k)th element satisfies

|G_{ℓ,k}|² = |C_{ℓ,k}|² |A_k|²   (5.23)

where A_k indicates the received amplitude of the kth user, which accounts
for its average path loss, and C_{ℓ,k} denotes the fading for the ℓth subcarrier
of the kth user, independent across the users. For this scenario, the linear
model (2.1) specializes to

y = (G ⊙ S)x + n.   (5.24)
The SINR at the output of the MMSE receiver is

SINR_k^MMSE = SNR |A_k|² (c_k ⊙ s_k)† ( I + SNR H_k H_k† )^{-1} (c_k ⊙ s_k)

where H_k indicates the matrix H with the kth column removed.
Let ν(·, ·) be the two-dimensional channel profile of G. Using Theorem
4.15 and (4.22), the multiuser efficiency is given by the following result.

Theorem 5.1 ([27]). For 0 ≤ y ≤ 1, the multiuser efficiency of the MMSE
receiver for the ⌊yK⌋th user converges a.s., as K, N → ∞ with K/N → β, to

lim_{K→∞} η_{⌊yK⌋}(SNR) = η^MMSE(y, SNR) = Γ(y, SNR) / E[ν(X, y)]   (5.25)

where Γ(·, ·) is a positive function solution to

Γ(y, SNR) = E[ ν(X, y) / ( 1 + SNR β E[ ν(X, Y) / (1 + SNR Γ(Y, SNR)) | X ] ) ]   (5.26)

and the expectations are with respect to independent random variables X
and Y, both uniform on [0, 1].
Most quantities of interest, such as the multiuser efficiency and the ca-
pacity, approach their asymptotic behaviors very rapidly as K and N grow
large. Hence, we can get an extremely accurate approximation of the mul-
tiuser efficiency, and consequently of the capacity, with an arbitrary number
of users, K, and a finite processing gain, N, simply by resorting to their
asymptotic approximation with ν(x, y) replaced in Theorem 5.1 by

ν(x, y) ≈ |A_k|² |C_{ℓ,k}|²,   (ℓ−1)/N ≤ x < ℓ/N,  (k−1)/K ≤ y < k/K.

Thus, we have that the multiuser efficiency of uplink MC-CDMA is closely
approximated by

η_k^MMSE(SNR) ≈ Γ_k(SNR) / ( (1/N) Σ_{ℓ=1}^{N} |C_{ℓ,k}|² )   (5.27)

with

Γ_k(SNR) = (1/N) Σ_{ℓ=1}^{N} |C_{ℓ,k}|² / ( 1 + SNR (K/N) (1/K) Σ_{j=1}^{K} |A_j|² |C_{ℓ,j}|² / ( 1 + SNR |A_j|² Γ_j(SNR) ) ).   (5.28)
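The finite-size recipe (5.27)-(5.28), as reconstructed above, can be computed by simple iteration. The following sketch is our illustration (names are not from the text); as a sanity check, with flat fading and equal powers it reduces to the same Marcenko-Pastur efficiency given by (5.8).

```python
import math

def mc_cdma_efficiency(C2, A2, snr, iters=500):
    """Iterate (5.28) and return the per-user efficiencies (5.27).
    C2[l][k] = |C_{l,k}|^2 (N x K); A2[j] = |A_j|^2."""
    N, K = len(C2), len(C2[0])
    G = [1.0] * K
    for _ in range(iters):
        G_new = []
        for k in range(K):
            s = 0.0
            for l in range(N):
                inner = sum(A2[j] * C2[l][j] / (1.0 + snr * A2[j] * G[j])
                            for j in range(K)) / N
                s += C2[l][k] / (1.0 + snr * inner)
            G_new.append(s / N)
        G = G_new
    return [G[k] / (sum(C2[l][k] for l in range(N)) / N) for k in range(K)]

snr, N, K = 4.0, 4, 2                       # beta = K/N = 0.5
flat = [[1.0] * K for _ in range(N)]        # |C|^2 = 1: flat fading
eta = mc_cdma_efficiency(flat, [1.0] * K, snr)
b = 1 + (K / N) * snr - snr                 # flat-fading reference from (5.8)
eta_ref = (-b + math.sqrt(b * b + 4 * snr)) / (2 * snr)
print(eta[0], eta_ref)
```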
From (5.9), using Theorem 5.1, the MMSE spectral efficiency converges,
as K, N → ∞, to

C^MMSE(β, SNR) = β E[ log( 1 + SNR Γ(Y, SNR) ) ]   (5.29)

where the function Γ(·, ·) is the solution of (5.26).
As an application of Theorem 4.16, the capacity of a multicarrier CDMA
channel is obtained.

Theorem 5.2 ([27]). The capacity of the optimum receiver is

C^opt(β, SNR) = C^MMSE(β, SNR)
+ E[ log( 1 + SNR β E[ν(X, Y)Υ(Y, SNR)|X] ) ]
− SNR β E[ Γ(Y, SNR)Υ(Y, SNR) ] log e   (5.30)

with Γ(·, ·) and Υ(·, ·) satisfying the coupled fixed-point equations

Γ(y, SNR) = E[ ν(X, y) / ( 1 + SNR β E[ν(X, Y)Υ(Y, SNR)|X] ) ]   (5.31)
Υ(y, SNR) = 1 / ( 1 + SNR Γ(y, SNR) )   (5.32)

where X and Y are independent random variables uniform on [0, 1].

Note that (5.30) appears as a function of quantities with immediate en-
gineering meaning. More precisely, SNR Γ(y, SNR) is easily recognized from
Theorem 5.1 as the SINR exhibited by the ⌊yK⌋th user at the output
of a linear MMSE receiver. In turn, Υ(y, SNR) is the corresponding mean-
square error. An alternative characterization of the capacity (inspired by
the optimality of successive cancellation with MMSE protection against
uncancelled users) is given by

C^opt(β, SNR) = β E[ log( 1 + SNR Ω(Y, SNR) ) ]   (5.33)

where

Ω(y, SNR) = E[ ν(X, y) / ( 1 + SNR β (1 − y) E[ ν(X, Z) / (1 + SNR Ω(Z, SNR)) | X ] ) ]   (5.34)

where X and Z are independent random variables uniform on [0, 1] and
[y, 1], respectively.
For the downlink, the structure of the transmitted MC-CDMA signal is
identical to that of the uplink, but the difference with (5.21) is that every
user experiences the same channel and thus c_k = c for all 1 ≤ k ≤ K. As a
result,

H = CSA

with C = diag(c) and A = diag(a). Consequently, Theorems 4.7-4.10 can
be used for the asymptotic analysis of the MC-CDMA downlink. For an ex-
tended survey of contributions on the asymptotic analysis of MC-CDMA
channels see [3] and references therein.

5.2. Multi-antenna channels


Let us now consider a single-user channel where the transmitter has n_T
antennas and the receiver has n_R antennas.
In this case, x contains the symbols transmitted from the n_T transmit
antennas and y the symbols received by the n_R receive antennas. With
frequency-flat fading, the entries of H represent the fading coefficients be-
tween each transmit and each receive antenna, typically modelled as zero-
mean complex Gaussian and normalized such that

E[ tr{HH†} ] = n_R.   (5.35)

If all antennas are co-polarized, the entries of H are identically distributed
and thus the resulting variance of each entry is 1/n_T. (See [16, 35] for the
initial contributions on this topic and [36-40] for recent articles of a tutorial
nature.)
In contrast with the multiaccess scenarios, in this case the signals trans-
mitted by different antennas can be advantageously correlated and thus the
covariance of x becomes relevant. Normalized by its energy per dimension,
the input covariance is denoted by

Φ = E[xx†] / ( (1/n_T) E[‖x‖²] )   (5.36)

where the normalization ensures that E[tr{Φ}] = n_T. It is useful to decom-
pose this input covariance into its eigenvectors and eigenvalues, Φ = VPV†.
Each eigenvalue represents the (normalized) power allocated to the cor-
responding signalling eigenvector. Associated with P, we define an input
power profile

P^(n_R)(t, SNR) = P_{j,j},   j/n_R ≤ t < (j+1)/n_R
supported on t ∈ (0, β]. This profile specifies the power allocation at each
SNR. As the number of antennas is driven to infinity, P^(n_R)(t, SNR) converges
uniformly to a nonrandom function, P(t, SNR), which we term the asymptotic
power profile.

The capacity per receive antenna is given by the maximum over Φ of
the Shannon transform of the averaged empirical distribution of HΦH†, i.e.

C(SNR) = max_{Φ: tr{Φ} = n_T} V_{HΦH†}(SNR)   (5.37)

where

SNR = E[‖x‖²] / ( (1/n_R) E[‖n‖²] ).   (5.38)
If full CSI is available at the transmitter, then V should coincide with
the eigenvector matrix of H†H and P should be obtained through a waterfill
process on the eigenvalues of H†H [16, 41-43]. The resulting jth diagonal
entry of P is

P_{j,j} = ( ν − 1/( SNR λ_j(H†H) ) )^+   (5.39)

where ν is such that tr{P} = n_T. Then, substituting in (5.37),

C(SNR) = (1/n_R) log det( I + SNR ΛP )   (5.40)
= β ∫ ( log(SNR ν λ) )^+ dF^{n_T}_{H†H}(λ)   (5.41)

with Λ equal to the diagonal eigenvalue matrix of H†H.
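The waterfill recipe (5.39)-(5.41) is straightforward to implement for a given set of eigenvalues by bisecting on the water level ν. The sketch below is our illustration (not code from the text); with equal eigenvalues every user gets unit power, which gives an easy check.

```python
import math

def waterfill(eigs, snr, n_T):
    """Bisect for nu such that sum_j (nu - 1/(SNR*lambda_j))^+ = n_T, as in (5.39)."""
    lo, hi = 0.0, n_T + max(1.0 / (snr * l) for l in eigs)
    for _ in range(200):
        nu = 0.5 * (lo + hi)
        if sum(max(nu - 1.0 / (snr * l), 0.0) for l in eigs) > n_T:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    return nu, [max(nu - 1.0 / (snr * l), 0.0) for l in eigs]

# Example: n_T = 4 equal unit eigenvalues of H'H and n_R = 2, so each P_jj = 1.
snr, n_R = 3.0, 2
eigs = [1.0, 1.0, 1.0, 1.0]
nu, P = waterfill(eigs, snr, len(eigs))
C = sum(math.log(1 + snr * l * p) for l, p in zip(eigs, P)) / n_R   # (5.40), in nats
C_alt = sum(max(math.log(snr * nu * l), 0.0) for l in eigs) / n_R   # discrete form of (5.41)
print(nu, C, C_alt)
```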


If, instead, only statistical CSI is available, then V should be set, for
all the channels that we will consider, to coincide with the eigenvectors
of E[H H] while the capacity-achieving power allocation, P, can be found
iteratively [44].

5.3. Separable correlation model


Antenna correlation at the transmitter and at the receiver, that is, between
the columns and between the rows of H, respectively, can be accounted for
through corresponding correlation matrices T and R [4547]. According
to this model, which is referred to as separable correlation model, an nR nT
matrix Hw , whose entries are i.i.d. zero-mean with variance n1T , is pre- and
post-multiplied by the square root of deterministic matrices, T and R ,
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino

and Shannon Transforms: Theory and Applications 123

whose entries represent, respectively, the correlation between the transmit


antennas and between the receive antennas:
1/2 1/2
H = R Hw T (5.42)
Implied by this model is that the correlation between two transmit antennas
is the same regardless of the receive antenna at which the observation is
made and vice versa. The validity of this model has been conrmed by
a number of experimental measurements conducted in various scenarios
[4854].
With full CSI at the transmitter, the asymptotic capacity is [55]

C(SNR) = ∫_0^∞ ( log(SNR ν λ) )^+ dG(λ)   (5.43)

where ν satisfies

∫_0^∞ ( ν − 1/(SNR λ) )^+ dG(λ) = 1   (5.44)

with G(·) the asymptotic spectrum of H†H, whose η-transform can be de-
rived using Theorem 4.7 and Lemma 4.2. Invoking Theorem 4.9, the capac-
ity in (5.43) can be evaluated as follows.
Theorem 5.3 ([56]). Let Λ_R and Λ_T be independent random variables
whose distributions are the asymptotic spectra of the full-rank matrices Θ_R
and Θ_T respectively. Further define

Λ_1 = { Λ_T,  β < 1;  Λ_R,  β > 1 }    Λ_2 = { Λ_R,  β < 1;  Λ_T,  β > 1 }   (5.45)

and let λ̃ be the infimum (excluding any mass point at zero) of the support
of the asymptotic spectrum of H†H. For

SNR ≥ (1/λ̃) E[1/Λ_1],   (5.46)

the asymptotic capacity of a channel with separable correlations and full
CSI at the transmitter is

C(SNR) = β E[log(Λ_T/e)] + V_{Λ_R}(ς) + β log( (SNR + ς E[1/Λ_T]) / ς ),   β < 1

C(SNR) = E[log(Λ_R/e)] + β V_{Λ_T}(ξ) + log( (SNR + ξ E[1/Λ_R]) / ξ ),   β > 1

with ς and ξ the solutions to

η_{Λ_R}(ς) = 1 − β,    η_{Λ_T}(ξ) = 1 − 1/β,

i.e., to η_{Λ_2}(·) = 1 − min{β, 1/β}.
No asymptotic characterization of the capacity with full CSI at the
transmitter is known for β = 1 and arbitrary SNR.
When the correlation is present only at either the transmit or the receive
end of the link, the solutions in Theorem 5.3 sometimes become explicit:

Corollary 5.4. With correlation at the end of the link with the fewest
antennas, the capacity per antenna with full CSI at the transmitter con-
verges to

C = β E[log(Λ_T/e)] + log( 1/(1−β) ) + β log( SNR (1−β)/β + E[1/Λ_T] ),   β < 1, Θ_R = I

C = E[log(Λ_R/e)] + β log( β/(β−1) ) + log( SNR (β−1) + E[1/Λ_R] ),   β > 1, Θ_T = I.

Finally, a single-user multi-antenna channel with no correlation (i.e.
Θ_R = Θ_T = I) is commonly referred to as the canonical channel; its capacity
per antenna with full CSI at the transmitter converges as follows:

Theorem 5.5 ([56]). For

SNR ≥ 2 min{1, β^{3/2}} / ( |1 − β| |1 − √β| )   (5.47)

the capacity of the canonical channel with full CSI at the transmitter con-
verges a.s. to

C(SNR) = β log( SNR/β + 1/(1−β) ) + (1−β) log( 1/(1−β) ) − β log e,   β < 1

C(SNR) = log( β SNR + β/(β−1) ) + (β−1) log( β/(β−1) ) − log e,   β > 1.
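The constants in Theorem 5.5 come from moments of the Marcenko-Pastur law: under all-modes-active waterfilling the β < 1 branch can be rewritten as β(E[log Λ] + log(SNR + E[1/Λ])), with Λ distributed as the asymptotic spectrum of H†H. The sketch below is our numerical consistency check (not from the text), integrating the Marcenko-Pastur density, in nats.

```python
import math

def mp_density(x, c):
    # Marcenko-Pastur density with ratio c < 1 (mean 1, no mass at zero)
    a, b = (1 - math.sqrt(c))**2, (1 + math.sqrt(c))**2
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * c * x)

beta, snr, n = 0.5, 10.0, 100000
a, b = (1 - math.sqrt(beta))**2, (1 + math.sqrt(beta))**2
dx = (b - a) / n
mass = mean_log = mean_inv = 0.0
for i in range(n):                      # midpoint rule
    x = a + (i + 0.5) * dx
    f = mp_density(x, beta) * dx
    mass += f
    mean_log += math.log(x) * f
    mean_inv += f / x

# For i.i.d. entries of variance 1/n_T, H'H = (1/beta)*MP_beta, so
# E[log Lambda] = mean_log - log(beta) and E[1/Lambda] = beta * mean_inv.
C_int = beta * (mean_log - math.log(beta) + math.log(snr + beta * mean_inv))
C_thm = (beta * math.log(snr / beta + 1 / (1 - beta))
         + (1 - beta) * math.log(1 / (1 - beta)) - beta)   # Theorem 5.5, beta < 1
print(mass, C_int, C_thm)
```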

With statistical CSI at the transmitter, achieving capacity requires that
the eigenvectors of the input covariance, Φ, coincide with those of Θ_T
[57, 58]. Consequently, denoting by Λ_T and Λ_R the diagonal eigenvalue
matrices of Θ_T and Θ_R, respectively, we have that

C(Φ, SNR) = (1/N) log det( I + SNR Λ_R^{1/2} H_w Λ_T^{1/2} P Λ_T^{1/2} H_w† Λ_R^{1/2} )
where P is the capacity-achieving power allocation [44]. Applying Theorem
4.7, we obtain:

Theorem 5.6 ([21]). The capacity of a Rayleigh-faded channel with sep-
arable transmit and receive correlation matrices Θ_T and Θ_R and statistical
CSI at the transmitter converges to

C(β, SNR) = β E[log(1 + SNR Λ Γ(SNR))] + E[log(1 + SNR Λ_R Υ(SNR))]
− β SNR Γ(SNR)Υ(SNR) log e   (5.48)

where

Γ(SNR) = (1/β) E[ Λ_R / ( 1 + SNR Λ_R Υ(SNR) ) ]   (5.49)

Υ(SNR) = E[ Λ / ( 1 + SNR Λ Γ(SNR) ) ]   (5.50)

with expectation over Λ and Λ_R, whose distributions are given by the asymp-
totic empirical eigenvalue distributions of Θ_T P and Θ_R, respectively.
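Equations (5.48)-(5.50) are readily solved by alternating iteration over eigenvalue samples. The sketch below is our illustration (names are not from the text). As a consistency check for the canonical channel (Θ_T = Θ_R = I, P = I), under the convention that the entries of H_w have variance 1/n_T, the result should match the Marcenko-Pastur closed form of Theorem 4.4 evaluated at load β and SNR/β; this is our check, not a formula from the text.

```python
import math

def separable_capacity(lamT, lamR, beta, snr, iters=5000):
    """Solve (5.49)-(5.50) by iteration and evaluate (5.48) in nats.
    lamT: eigenvalue samples of Theta_T * P; lamR: samples of Theta_R."""
    ET = lambda f: sum(f(l) for l in lamT) / len(lamT)
    ER = lambda f: sum(f(l) for l in lamR) / len(lamR)
    g = u = 1.0
    for _ in range(iters):
        g = ER(lambda l: l / (1.0 + snr * l * u)) / beta      # (5.49)
        u = ET(lambda l: l / (1.0 + snr * l * g))             # (5.50)
    return (beta * ET(lambda l: math.log(1 + snr * l * g))
            + ER(lambda l: math.log(1 + snr * l * u))
            - beta * snr * g * u)                              # (5.48)

def F(x, z):  # (4.11)
    return (math.sqrt(x * (1 + math.sqrt(z))**2 + 1)
            - math.sqrt(x * (1 - math.sqrt(z))**2 + 1))**2

beta, snr = 2.0, 1.0
C = separable_capacity([1.0], [1.0], beta, snr)
g0 = snr / beta
C_mp = (beta * math.log(1 + g0 - F(g0, beta) / 4)
        + math.log(1 + beta * g0 - F(g0, beta) / 4)
        - F(g0, beta) / (4 * g0))
print(C, C_mp)
```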

If the input is isotropic, the achievable mutual information is easily
found from the foregoing result.

Corollary 5.7 ([59]). Consider a channel defined as in Theorem 5.6 and
an isotropic input. Expression (5.48) yields the mutual information with the
distribution of Λ given by the asymptotic empirical eigenvalue distribution
of Θ_T.

This corollary is illustrated in Fig. 2, which depicts the mutual infor-
mation (bits/s/Hz) achieved by an isotropic input for a wide range of SNR.
The channel is Rayleigh-faded with n_T = 4 correlated antennas and n_R = 2
uncorrelated antennas. The correlation between the ith and jth transmit
antennas is

(Θ_T)_{i,j} = e^{−0.05 d² (i−j)²}   (5.51)

which corresponds to a uniform linear array with antenna separation d
(wavelengths) exposed to a broadside Gaussian azimuth angular spectrum
with a 2° root-mean-square spread [60]. Such an angular spread is typical of an
elevated base station in rural or suburban areas. The solid lines depict the
analytical solution obtained by applying Theorem 5.6 with P = I and Θ_R =
I and with the expectations over Λ replaced with arithmetic averages over
the eigenvalues of Θ_T. The circles, in turn, show the result of Monte-Carlo
simulations. Notice the excellent agreement even for such small numbers of
antennas.
Fig. 2. Mutual information achieved by an isotropic input on a Rayleigh-faded channel
with nT = 4 and nR = 2. The transmitter is a uniform linear array whose antenna
correlation is given by (5.51) where d is the spacing (wavelengths) between adjacent
antennas. The receive antennas are uncorrelated.

A Ricean term can be incorporated in the model (5.42) through
an additional deterministic matrix H_0 containing unit-magnitude entries
[61-63]. With proper weighting of the random and deterministic matrices,
the model particularizes to

y = ( √(1/(K+1)) Θ_R^{1/2} H_w Θ_T^{1/2} + √(K/(K+1)) H_0 ) x + n   (5.52)

with H_w an i.i.d. N(0, 1) matrix and with the Ricean K-factor quantify-
ing the ratio between the deterministic (unfaded) and the random (faded)
energies [64].
If we assume that H_0 has rank r where r > 1 but such that

lim_{N→∞} r/N = 0   (5.53)
then all the foregoing results can be extended to the Ricean channel by simply
replacing Λ_R and Λ_T with independent random variables whose distribu-
tions are the asymptotic spectra of the full-rank matrices (1/(K+1)) Θ_R and
(1/(K+1)) Θ_T respectively.

5.4. Non-separable correlation model


While the separable correlation model is relatively simple and analytically
appealing, it also has clear limitations, particularly in terms of representing
indoor propagation environments [65]. Also, it does not accommodate di-
versity mechanisms such as polarization^d and radiation pattern diversity^e
that are becoming increasingly popular as they enable more compact ar-
rays. The use of different polarizations and/or radiation patterns creates
correlation structures that cannot be represented through the separable
model.
A broader range of correlations can be encompassed if we model the
channel as

H = U_R H̃ U_T†   (5.54)

where U_R and U_T are unitary while the entries of H̃ are independent zero-
mean Gaussian. This model is advocated and experimentally supported
in [68] and its capacity is characterized asymptotically in [21]. For the more
restrictive case where U_R and U_T are Fourier matrices, the model (5.54)
was proposed earlier in [69].
Since the spectra of H̃ and H coincide, every result derived for matrices
with independent non-identically distributed entries (cf. Theorems 4.12-
4.17) applies immediately to H.
As it turns out, the asymptotic spectral efficiency of H is fully charac-
terized by the variances of its entries, which we assemble in a matrix G
such that G_{i,j} = n_T E[|H̃_{i,j}|²] with

Σ_{i,j} G_{i,j} = n_T n_R.   (5.55)
^d Polarization diversity: Antennas with orthogonal polarizations are used to ensure low
levels of correlation with minimum or no antenna spacing [63, 66] and to make the
communication link robust to polarization rotations in the channel [67].
^e Pattern diversity: Antennas with different radiation patterns or with rotated versions
of the same pattern are used to discriminate different multipath components and reduce
correlation.
Invoking Definition 4.13, we introduce the variance profile of H̃, which
maps the entries of G onto a two-dimensional piece-wise constant function

G^(n_R)(r, t) = G_{i,j},   i/n_R ≤ r < (i+1)/n_R,  j/n_T ≤ t < (j+1)/n_T   (5.56)

supported on r, t ∈ [0, 1]. We can interpret r and t as normalized receive
and transmit antenna indices. It is assumed that, as the number of antennas
grows, G^(n_R)(r, t) converges uniformly to the asymptotic variance profile,
G(r, t). The normalization condition in (5.55) implies that

E[G(R, T)] = 1   (5.57)

with R and T independent random variables uniform on [0, 1].
With full CSI at the transmitter, the asymptotic capacity is given by
(5.43) and (5.44) with G(·) representing the asymptotic spectrum of H†H.
Using Theorem 4.17, an explicit expression for C(SNR) can be obtained for
sufficiently high SNR.
With statistical CSI at the transmitter, the eigenvectors of the capacity-
achieving input covariance coincide with the columns of U_T in (5.54)
[70, 71]. Consequently, the capacity is given by:

C(β, SNR) = lim_{N→∞} (1/N) log det( I + SNR H̃PH̃† ).   (5.58)

Denote by P(t, SNR) the asymptotic power profile of the capacity-achieving
power allocation at each SNR. In order to characterize (5.58), we invoke
Theorem 4.16 to obtain the following.
Theorem 5.8 ([21]). Consider the channel H = U_R H̃ U_T†, where U_R and
U_T are unitary while the entries of H̃ are zero-mean Gaussian and indepen-
dent. Denote by G(r, t) the asymptotic variance profile of H̃. With statistical
CSI at the transmitter, the asymptotic capacity is

C(β, SNR) = β E[ log( 1 + SNR E[G(R, T)P(T, SNR)Ψ(R, SNR) | T] ) ]
+ E[ log( 1 + E[G(R, T)P(T, SNR)Ω(T, SNR) | R] ) ]
− β E[ G(R, T)P(T, SNR)Ψ(R, SNR)Ω(T, SNR) ] log e

with expectation over the independent random variables R and T uniform
on [0, 1], and with

Ψ(r, SNR) = (1/β) · 1 / ( 1 + E[G(r, T)P(T, SNR)Ω(T, SNR)] )

Ω(t, SNR) = SNR / ( 1 + SNR E[G(R, t)P(t, SNR)Ψ(R, SNR)] ).
If there is no correlation but antennas with different polarizations are
used, the entries of H are no longer identically distributed because of the
different power transfer between co-polarized and differently polarized an-
tennas. In this case, we can model the channel matrix as

H = A ⊙ H_w   (5.59)

where ⊙ indicates Hadamard (element-wise) multiplication, H_w is composed
of zero-mean i.i.d. Gaussian entries with variance 1/n_T, and A is a determin-
istic matrix containing the square roots of the second-order moments of the
entries of H, which are given by the relative polarizations of the corresponding
antenna pairs. If all antennas are co-polar, then every entry of A equals 1.
The asymptotic capacity with full CSI at the transmitter can be found,
for sufficiently high SNR, by invoking Theorem 4.17.
Since the entries of H are independent, the input covariance that
achieves capacity with statistical CSI is diagonal [70, 71]. The correspond-
ing asymptotic capacity per antenna equals the one given in Theorem 5.8
with G(r, t) the asymptotic variance profile of H. Furthermore, these solu-
tions do not require that the entries of H be Gaussian, but only that their
variances be uniformly bounded.
A common structure for A, arising when the transmit and receive arrays
have an equal number of antennas on each polarization, is that of a doubly-
regular form (cf. Definition 4.11). For such channels, the capacity-achieving
input is not only diagonal but isotropic and, applying Theorem 4.12, the
capacity admits an explicit form.
Theorem 5.9. Consider a channel H = A ⊙ H_w where the entries of A
are deterministic and nonnegative while those of H_w are zero-mean and
independent, with variance 1/n_T but not necessarily identically distributed.
If A is doubly-regular (cf. Definition 4.11), the asymptotic capacity per
antenna, with full CSI or with statistical CSI at the transmitter, coincides
with that of the canonical channel, given respectively in Theorem 5.5 and
in Eq. (4.10), with γ = SNR/β in the latter.
A very practical example of the applicability of the above result is given
by the following wireless channel.

Example 5.10. Consider the wireless channel as in Fig. 3, where each
transmitter and receiver has its antennas split between two orthogonal polar-
izations. Denoting by α the gain between copolar antennas, different from
Fig. 3. Laptop computers equipped with a 16-antenna planar array. Two orthogonal
polarizations used.

the gain between crosspolar antennas, χ, we can model the channel matrix as
in (5.59), where P = A ⊙ A equals

P = [ α χ α χ . . .
      χ α χ α . . .
      α χ α χ . . .
      . . . . . . . ]   (5.60)

which is asymptotically mean doubly-regular.

Again, the zero-mean multi-antenna channel model analyzed thus far can
be made Ricean by incorporating an additional deterministic component
H̄ [61-63], which leads to the following general model

y = ( √(1/(K+1)) H + √(K/(K+1)) H̄ ) x + n   (5.61)

with the scalar Ricean factor K quantifying the ratio between the Frobenius
norm of the deterministic (unfaded) component and the expected Frobe-
nius norm of the random (faded) component. Considered individually, each
(i, j)th channel entry has a Ricean factor given by

K |H̄_{i,j}|² / E[|H_{i,j}|²].
Using Lemma 2.6 in [23], the next result follows straightforwardly.

Theorem 5.11. Consider a channel with a Ricean term whose rank is
finite. The asymptotic capacity per antenna, C^rice(β, SNR), equals the cor-
responding asymptotic capacity per antenna in the absence of the Ricean
component, C(β, SNR), with a simple SNR penalty:

C^rice(β, SNR) = C( β, SNR/(K+1) ).   (5.62)

Note that, while the value of the capacity depends on the degree of CSI
available at the transmitter, (5.62) holds regardless.

5.5. Non-ergodic channels


The results on large random matrices surveyed in Section 4 show that the
mutual information per receive antenna converges a.s. to its expectation as
the number of antennas goes to infinity (with a given ratio of transmit to
receive antennas). Thus, as the number of antennas grows, a self-averaging
mechanism hardens the mutual information to its expected value. However,
the non-normalized mutual information still suffers random fluctuations
that, although small with respect to the mean, are of vital interest in the
study of the outage capacity.
An interesting property of the distribution of the non-normalized mu-
tual information in (3.18) is the fact that, for many of the multi-antenna
channels of interest, it can be approximated as Gaussian as the number of
antennas grows. A number of authors have explored this property. Argu-
ments supporting the normality of the c.d.f. (cumulative distribution func-
tion) of the mutual information for large numbers of antennas were given
in [29, 72-74].^f Ref. [72] used the replica method from statistical physics
(which has yet to find a rigorous justification), [73] showed the asymptotic
normality only in the asymptotic regimes of low and high signal-to-noise
ratios, while in [74] the normality of the outage capacity is proved for the
canonical channel using [17]. Theorem 4.19 proves the asymptotic normality
of the unnormalized mutual information for arbitrary signal-to-noise ratios
and fading distributions, allowing for correlation between the antennas at
either the transmitter or the receiver. Theorem 4.20 (a proof of which can
be found in [29]) provides succinct expressions for the asymptotic mean
and variance of the mutual information in terms of the η and Shannon
transforms of the correlation matrix. Using Theorem 4.20 we can get an
extremely accurate approximation of the cumulative distribution of (3.18)
with an arbitrary number of transmit and receive antennas. More specifi-
cally, we have that the cumulative distribution of the unnormalized mutual

f For additional references, see [3].


information of a MIMO channel with correlation at the transmitter, for ar-
bitrary signal-to-noise ratios and fading distributions, is well approximated
by a Gaussian distribution with mean μ and variance σ² given by

μ = Σ_{j=1}^{n_T} log( 1 + SNR λ_j(Θ_T) Γ ) − N log(βΓ) + N (βΓ − 1) log e   (5.63)

σ² = −log( 1 − (1/n_T) Σ_{j=1}^{n_T} [ SNR λ_j(Θ_T) Γ / ( 1 + SNR λ_j(Θ_T) Γ ) ]² )   (5.64)

where Γ = Γ(SNR) is the solution of (5.49)-(5.50) with Λ_R = 1 and P = I
and with the expectations over Λ replaced by arithmetic averages over the
eigenvalues of Θ_T, i.e., Γ = (1/β)(1 + SNR Υ)^{-1} with
Υ = (1/n_T) Σ_{j=1}^{n_T} λ_j(Θ_T) / ( 1 + SNR λ_j(Θ_T) Γ ).
In order to illustrate the power of this result with some examples, we
will consider correlated MIMO channels with a transmit correlation matrix
Θ_T such that

(Θ_T)_{i,j} = e^{−0.8 (i−j)²}   (5.65)

which is a typical structure for an elevated base station in suburbia. The
receive antennas are uncorrelated. For the examples, we will compare the
cumulative distributions of the unnormalized mutual information of such
a channel with a Gaussian distribution whose mean and variance are given in
(5.63) and (5.64). Figures 4 and 5 compare the 10% point in the cumulative
[Figure 4 appears here; its inset table lists simulated vs. asymptotic values
of 0.52 vs. 0.50 at SNR = 0 dB and 2.28 vs. 2.27 at SNR = 10 dB.]
Fig. 4. 10%-outage capacity for a Rayleigh-faded channel with nT = nR = 2. The trans-


mit antennas are correlated as per (5.65) while the receive antennas are uncorrelated.
Solid line indicates the corresponding limiting Gaussian distribution.

Fig. 5. 10%-outage capacity for a Rayleigh-faded channel with nT = 4 and nR =


2. The transmit antennas are correlated as per (5.65) while the receive antennas are
uncorrelated. Solid line indicates the corresponding limiting Gaussian distribution.

distribution of the mutual information for SNR between 0 and 40 dB for
n_R = 2 and different numbers of transmit antennas. The solid line indicates
the simulation while the circles indicate the Gaussian distribution. Notice
the remarkable agreement despite having such a small number of antennas.
For channels with both transmit and receive correlation, the character-
istic function found through the replica method yields the expression of
σ² given in [72].
References
1. S. Verdu, Random matrices in wireless communication, proposal to the National Science Foundation (Feb. 1999).
2. S. Verdu, Large random matrices and wireless communications, 2002 MSRI Information Theory Workshop (Feb 25-Mar 1, 2002).
3. A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Communications, Foundations and Trends in Communications and Information Theory, Volume 1, Issue 1 (Now Publishers Inc., 2004).
4. S. Verdu, Multiuser Detection (Cambridge University Press, Cambridge, UK, 1998).
5. S. Verdu and S. Shamai, Spectral efficiency of CDMA with random spreading, IEEE Trans. Information Theory 45(2) (1999) 622-640.
6. D. Tse and S. Hanly, Linear multiuser receivers: Effective interference, effective bandwidth and user capacity, IEEE Trans. Information Theory 45(2) (1999) 641-657.
7. M. S. Pinsker, Information and Information Stability of Random Variables and Processes (Holden-Day, San Francisco, CA, 1964).
8. S. Verdu, Capacity region of Gaussian CDMA channels: The symbol synchronous case, in Proc. Allerton Conf. on Communication, Control and Computing, Monticello, IL (Oct. 1986), pp. 1025-1034.
9. A. Lozano, A. M. Tulino and S. Verdu, High-SNR power offset in multi-antenna communication, Bell Labs Technical Memorandum (June 2004).
10. A. Lozano, A. M. Tulino and S. Verdu, High-SNR power offset in multi-antenna communication, IEEE Trans. Information Theory 51(12) (2005) 4134-4151.
11. Z. D. Bai, Convergence rate of expected spectral distributions of large random matrices. Part I: Wigner matrices, Annals of Probability 21(2) (1993) 625-648.
12. F. Hiai and D. Petz, Asymptotic freeness almost everywhere for random matrices, Acta Sci. Math. Szeged 66 (2000) 801-826.
13. E. Biglieri, J. Proakis and S. Shamai, Fading channels: Information-theoretic and communications aspects, IEEE Trans. Information Theory 44(6) (1998) 2619-2692.
14. I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems (Academic, New York, 1981).
15. G. J. Foschini, Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas, Bell Labs Technical Journal 1 (1996) 41-59.
16. E. Telatar, Capacity of multi-antenna Gaussian channels, Euro. Trans. Telecommunications 10(6) (1999) 585-595.
17. Z. D. Bai and J. W. Silverstein, CLT of linear spectral statistics of large dimensional sample covariance matrices, Annals of Probability 32(1A) (2004) 553-605.
18. T. J. Stieltjes, Recherches sur les fractions continues, Annales de la Faculte des Sciences de Toulouse 8(9) (1894 (1895)), no. A (J), pp. 1-47 (1-122).
19. V. A. Marcenko and L. A. Pastur, Distributions of eigenvalues for some sets of random matrices, Math. USSR-Sbornik 1 (1967) 457-483.
20. J. W. Silverstein and Z. D. Bai, On the empirical distribution of eigenvalues of a class of large dimensional random matrices, J. Multivariate Analysis 54 (1995) 175-192.
21. A. M. Tulino, A. Lozano and S. Verdu, Impact of correlation on the capacity of multi-antenna channels, IEEE Trans. Information Theory 51(7) (2005) 2491-2509.
22. A. Lozano, A. M. Tulino and S. Verdu, High-SNR power offset in multi-antenna communication, in Proc. IEEE Int. Symp. on Information Theory (ISIT'04), Chicago, IL (June 2004).
23. Z. D. Bai, Methodologies in spectral analysis of large dimensional random matrices, Statistica Sinica 9(3) (1999) 611-661.
24. V. L. Girko, Theory of Random Determinants (Kluwer Academic Publishers, Dordrecht, 1990).
25. A. Guionnet and O. Zeitouni, Concentration of the spectral measure for large matrices, Electronic Communications in Probability 5 (2000) 119-136.
26. D. Shlyankhtenko, Random Gaussian band matrices and freeness with amalgamation, Int. Math. Res. Note 20 (1996) 1013-1025.
27. L. Li, A. M. Tulino and S. Verdu, Spectral efficiency of multicarrier CDMA,
IEEE Trans. Information Theory 51(2) (2005) 479505.
28. Z. D. Bai and J. W. Silverstein, Exact separation of eigenvalues of large
dimensional sample covariance matrices, Annals of Probability 27(3) (1999)
15361555.
29. A. M. Tulino and S. Verdu, Asymptotic outage capacity of multiantenna
channels, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing
(ICASSP05), Philadelphia, PA, USA (Mar. 2005).
30. S. Shamai and S. Verdu, The eect of frequency-at fading on the spectral
eciency of CDMA, IEEE Trans. Information Theory 47(4) (2001) 1302
1327.
31. J. M. Chaufray, W. Hachem and P. Loubaton, Asymptotic analysis of opti-
mum and sub-optimum CDMA MMSE receivers, Proc. IEEE Int. Symp. on
Information Theory (ISIT02) (July 2002), p. 189.
32. L. Li, A. M. Tulino and S. Verdu, Design of reduced-rank MMSE multiuser
detectors using random matrix methods, IEEE Trans. Information Theory
50(6) (2004).
33. M. Debbah, W. Hachem, P. Loubaton and M. de Courville, MMSE analysis of
certain large isometric random precoded systems, IEEE Trans. Information
Theory 49(5) (2003) 12931311.
34. R. Horn and C. Johnson, Matrix Analysis (Cambridge University Press,
1985).
35. G. Foschini and M. Gans, On limits of wireless communications in fading en-
vironment when using multiple antennas, Wireless Personal Communications
6(6) (1998) 315335.
36. S. N. Diggavi, N. Al-Dhahir, A. Stamoulis and A. R. Calderbank, Great
expectations: The value of spatial diversity in wireless networks, Proc. IEEE
92(2) (2004) 219270.
37. A. Goldsmith, S. A. Jafar, N. Jindal and S. Vishwanath, Capacity limits of
MIMO channels, IEEE J. Selected Areas in Communications 21(5) (2003)
684702.
38. D. Gesbert, M. Sha, D. Shiu, P. J. Smith and A. Naguib, From theory
to practice: An overview of MIMO spacetime coded wireless systems, J.
Selected Areas in Communications 21(3) (2003) 281302.
39. E. Biglieri and G. Taricco, Large-system analyses of multiple-antenna
system capacities, Journal of Communications and Networks 5(2) (2003)
5764.
40. E. Biglieri and G. Taricco, Transmission and reception with multiple an-
tennas: Theoretical foundations, submitted to Foundations and Trends in
Communications and Information Theory (2004).
41. B. S. Tsybakov, The capacity of a memoryless Gaussian vector channel, Prob-
lems of Information Transmission 1 (1965) 1829.
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino

136 A. M. Tulino

42. T. M. Cover and J. A. Thomas, Elements of Information Theory (John Wiley


and Sons, Inc., 1991).
43. G. Raleigh and J. M. Cio, Spatio-temporal coding for wireless communica-
tions, IEEE Trans. on Communications 46(3) (1998) 357366.
44. A. M. Tulino, A. Lozano and S. Verdu, Power allocation in multi-antenna
communication with statistical channel information at the transmitter, in
Proc. IEEE Int. Conf. on Personal, Indoor and Mobile Radio Communica-
tions (PIMRC04), Barcelona, Catalonia, Spain (Sep. 2004).
45. D.-S. Shiu, G. J. Foschini, M. J. Gans and J. M. Kahn, Fading correlation and
its eects on the capacity of multi-element antenna systems, IEEE Trans. on
Communications 48(3) (2000) 502511.
46. D. Chizhik, F. R. Farrokhi, J. Ling and A. Lozano, Eect of antenna separa-
tion on the capacity of BLAST in correlated channels, IEEE Communications
Letters 4(11) (2000) 337339.
47. K. I. Pedersen, J. B. Andersen, J. P. Kermoal and P. E. Mogensen, A stochas-
tic multiple-input multiple-output radio channel model for evaluations of
space-time coding algorithms, in Proc. IEEE Vehicular Technology Conf.
(VTC2000 Fall) (Sep. 2000), pp. 893897.
48. C. C. Martin, J. H. Winters and N. R. Sollenberger, Multiple-input multiple-
output (MIMO) radio channel measurements, in Proc. IEEE Vehic. Techn.
Conf. (VTC2000), Boston, MA, USA (Sep. 2000).
49. H. Xu, M. J. Gans, N. Amitay and R. A. Valenzuela, Experimental veri-
cation of MTMR system capacity in a controlled propagation environment,
Electronics Letters 37 (2001).
50. J. P. Kermoal, L. Schumacher, P. E. Mogensen and K. I. Pedersen, Exper-
imental investigation of correlation properties of MIMO radio channels for
indoor picocell scenarios, in Proc. IEEE Vehic. Tech. Conf. (VTC2000 Fall)
(2000).
51. D. Chizhik, G. J. Foschini, M. J. Gans and R. A. Valenzuela, Propagation
and capacities of multi-element transmit and receive antennas, in Proc. 2001
IEEE AP-S Int. Symp. and USNC/URSI National Radio Science Meeting,
Boston, MA (July 2001).
52. J. Ling, D. Chizhik, P. W. Wolniansky, R. A. Valenzuela, N. Costa and
K. Huber, Multiple transmit multiple receive (MTMR) capacity survey in
Manhattan, IEE Electronics Letters 37(16) (2001) 10411042.
53. D. Chizhik, J. Ling, P. Wolniansky, R. A. Valenzuela, N. Costa and K. Huber,
Multiple-input multiple-output measurements and modelling in Manhattan,
IEEE J. Selected Areas in Communications 21(3) (2003) 321331.
54. V. Erceg, P. Soma, D. S. Baum and A. Paulraj, Capacity obtained from
multiple-input multiple-output channel measurements in xed wireless envi-
ronments at 2.5 GHz, Int. Conf. on Commun. (ICC02), New York City, NY
(Apr. 2002).
55. C. Chuah, D. Tse, J. Kahn and R. Valenzuela, Capacity scaling in dual-
antenna-array wireless systems, IEEE Trans. Information Theory 48(3)
(2002) 637650.
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino

and Shannon Transforms: Theory and Applications 137

56. A. M. Tulino, A. Lozano and S. Verdu, MIMO capacity with channel state
information at the transmitter, in Proc. IEEE Int. Symp. on Spread Spectrum
Tech. and Applications (ISSSTA04) (Aug. 2004).
57. E. Visotsky and U. Madhow, Space-time transmit precoding with imperfect
feedback, IEEE Trans. Information Theory 47 (2001) 26322639.
58. S. A. Jafar, S. Vishwanath and A. J. Goldsmith, Channel capacity and beam-
forming for multiple transmit and receive antennas with covariance feed-
back, in Proc. IEEE Int. Conf. on Communications (ICC01), Vol. 7 (2001),
pp. 22662270.
59. A. M. Tulino, S. Verdu and A. Lozano, Capacity of antenna arrays with
space, polarization and pattern diversity, in Proc. 2003 IEEE Information
Theory Workshop (ITW03) (Apr. 2003), pp. 324327.
60. T.-S. Chu and L. J. Greenstein, A semiempirical representation of antenna
diversity gain at cellular and PCS base stations, IEEE Trans. on Communi-
cations 45(6) (1997) 644656.
61. P. Driessen and G. J. Foschini, On the capacity formula for multiple-input
multiple-output channels: A geometric interpretation, IEEE Trans. on Com-
munications 47(2) (1999) 173176.
62. F. R. Farrokhi, G. J. Foschini, A. Lozano and R. A. Valenzuela, Link-optimal
space-time processing with multiple transmit and receive antennas, IEEE
Communications Letters 5(3) (2001) 8587.
63. P. Soma, D. S. Baum, V. Erceg, R. Krishnamoorthy and A. Paulraj, Anal-
ysis and modelling of multiple-input multiple-output (MIMO) radio channel
based on outdoor measurements conducted at 2.5 GHz for xed BWA appli-
cations, in Proc. IEEE Int. Conf. on Communications (ICC02), New York
City, NY (28 Apr.2 May 2002), pp. 272276.
64. S. Rice, Mathematical analysis of random noise, Bell System Technical Jour-
nal 23 (1944) 282332.
65. H. Ozcelik, M. Herdin, W. Weichselberger, G. Wallace and E. Bonek, De-
ciencies of the Kronecker MIMO channel model, IEE Electronic Letters 39
(2003) 209210.
66. W. C. Y. Lee and Y. S. Yeh, Polarization diversity system for mobile radio,
IEEE Trans. on Communications 20(5) (1972) 912923.
67. S. A. Bergmann and H. W. Arnold, Polarization diversity in portable
communications environment, IEE Electronic Letters 22(11) (1986) 609
610.
68. W. Weichselberger, M. Herdin, H. Ozcelik and E. Bonek, Stochastic MIMO
channel model with joint correlation of both link ends, IEEE Trans. Wireless
Communications 5(1) (2006) 90100.
69. A. Sayeed, Deconstructing multi-antenna channels, IEEE Trans. Signal Pro-
cessing 50(10) (2002) 25632579.
70. A. M. Tulino, A. Lozano and S. Verdu, Capacity-achieving input covari-
ance for single-user multi-antenna channels, Bell Labs Tech. Memorandum
ITD-04-45193Y, also in IEEE Trans. Wireless Communications 5(3) (2006)
662671.
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino

138 A. M. Tulino

71. V. V. Veeravalli, Y. Liang and A. Sayeed, Correlated MIMO Rayleigh fading


channels: Capacity, optimal signalling and asymptotics, IEEE Trans. Infor-
mation Theory 51(6) (2005) 20582072.
72. A. L. Moustakas, S. H. Simon and A. M. Sengupta, MIMO capacity through
correlated channels in the presence of correlated interferers and noise:
A (not so) large N analysis, IEEE Trans. Information Theory 49(10) (2003)
25452561.
73. B. M. Hochwald, T. L. Marzetta and V. Tarokh, Multi-antenna channel hard-
ening and its implications for rate feedback and scheduling, IEEE Trans.
Information Theory 50(9) (2004) 18931909.
74. M. Kamath, B. Hughes and Y. Xinying, Gaussian approximations for the
capacity of MIMO Rayleigh fading channels, in Asilomar Conf. on Signals,
Systems and Computers (Nov. 2002).
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 05-Muller

THE REPLICA METHOD IN MULTIUSER COMMUNICATIONS

Ralf R. Müller
Department of Electronics and Telecommunications
Norwegian University of Science and Technology
7491 Trondheim, Norway
E-mail: [email protected]

This review paper gives a tutorial overview of the use of the replica
method in multiuser communications. It introduces the self-averaging
principle, the free energy and other physical quantities, and gives
them a meaning in the context of multiuser communications. The
technical issues of the replica method are explained to a non-physics
audience. An isomorphism between receiver metrics and the fundamental
laws of physics is drawn. The overview is illustrated with the example
of detecting code-division multiple-access with random signature
sequences.

1. Introduction
Multiuser communication systems which are driven by Gaussian distributed
signals can be fully characterized by the distribution of the singular values
of the channel matrix in the large user limit. In digital communications,
however, transmitted signals are chosen from finite, often binary, sets. In
those cases, knowledge of the asymptotic spectrum of large random
matrices is, in general, not sufficient to gain valuable insight into the behavior
of characteristic performance measures such as bit error probabilities and
supported data rates. We will see that the quantized nature of signals gives
rise to the totally unexpected occurrence of phase transitions in multiuser
communications which can, by no means, be inferred from the asymptotic
convergence of eigenvalue spectra of large random matrices.
In order to analyze and design large dimensional communication systems
which cannot be described by eigenvalues and eigenvectors alone, but
depend on statistics of the transmitted signal, e.g. minimum distances between
signal points, a more powerful machinery than random matrix and
free probability theory is needed. Such a machinery was developed in
statistical physics for the analysis of some particular magnetic materials called
spin glasses and is known as the replica method [1]. Additionally, the replica
method is well tailored to cope with receivers whose knowledge of the
channel and/or input statistics is impaired.
The replica method is able to reproduce many of the results which were
found by means of random matrix and free probability theory, but the
calculations based on the replica method are often much more involved.
Moreover, it is still lacking mathematical rigor in certain respects. However,
due to its success in explaining physical phenomena and its consistency
with engineering results from random matrix and free probability theory,
we can trust that its predictions in other engineering applications are
correct. Nevertheless, we should always exercise particular care when
interpreting new results based on the replica method. Establishing a rigorous
mathematical basis for the replica method is a topic of current research in
mathematics and theoretical physics.

2. Self Average
While random matrix theory and recently also free probability theory
[2, 3] prove the (almost sure) convergence of some random variables to
deterministic values in the large matrix limit, statistical physics does not
always do so. It is considered a fundamental principle of statistical physics
that there are microscopic and macroscopic variables. Microscopic variables
are physical properties of microscopically small particles, e.g. the speed of a
gas molecule or the spin of an electron. Macroscopic variables are physical
properties of compound objects that contain many microscopic particles,
e.g. the temperature or pressure of a gas, the radiation of a hot object,
or the magnetic field of a piece of ferromagnetic material. From a physics
point of view, it is clear which variables are macroscopic and which ones are
microscopic. An explicit proof that a particular variable is self-averaging,
i.e. that it converges to a deterministic value in the large system limit, is a nice
result, if it is found, but it is not particularly important to the physics
community. When applying the replica method, systems are often only
assumed to be self-averaging. The replica method itself must be seen as a tool
to enable the calculation of macroscopic properties by averaging over the
microscopic properties.
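Self-averaging is easy to watch numerically. The toy model below (non-interacting spins in random fields, an illustrative assumption of this sketch, not a system from the text) has a free energy per spin that is an average of independent microscopic contributions, so its fluctuations across disorder realizations shrink as the system grows:

```python
import numpy as np

# Toy self-averaging demo: N spins s_k = +/-1 in random fields h_k ~ N(0,1),
# energy E(s) = -sum_k h_k s_k, temperature T.  The partition function
# factorizes over spins, so the free energy per spin is
#   f_N = -(T/N) * sum_k log(2*cosh(h_k / T)),
# which concentrates around a deterministic value as N grows (law of
# large numbers): a macroscopic quantity built from microscopic randomness.
def free_energy_per_spin(h, T=1.0):
    return -(T / h.size) * np.sum(np.log(2.0 * np.cosh(h / T)))

rng = np.random.default_rng(0)
trials = 200
std_small = np.std([free_energy_per_spin(rng.standard_normal(20))
                    for _ in range(trials)])
std_large = np.std([free_energy_per_spin(rng.standard_normal(2000))
                    for _ in range(trials)])
print(std_small, std_large)   # fluctuations drop roughly like 1/sqrt(N)
```

In this factorizing model self-averaging is provable; the point of the replica method is to reach such macroscopic averages when the energy does not factorize.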

3. Free Energy
The second law of thermodynamics demands the entropy of any physical
system with conserved energy to converge to its maximum as time evolves.
If the system is described by a density p_X(x) of states X \in \mathbb{R}, this means
that in the thermodynamic equilibrium the (differential) entropy

    H(X) = - \int \log p_X(x) \, dP_X(x)    (3.1)

is maximized while keeping the energy

    E(X) = \int \|x\| \, dP_X(x)    (3.2)

constant. Hereby, the energy function \|x\| can be any measure which is
uniformly bounded from below.
The density at thermodynamic equilibrium is easily shown by the
method of Lagrange multipliers to be

    p_X(x) = \frac{e^{-\frac{1}{T}\|x\|}}{\int e^{-\frac{1}{T}\|x\|} \, dx}    (3.3)

and is called the Boltzmann distribution. The parameter T is called the
temperature of the system and is determined by (3.2). For a Euclidean energy
measure, the Boltzmann distribution takes on the form of a Gaussian
distribution, which is well known in information theory to maximize entropy
for a given average signal power.
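This Gaussian connection can be verified numerically (the grid and the temperature below are illustrative choices, not values from the text): for the Euclidean energy \|x\| = x^2, the normalized Boltzmann density coincides with a zero-mean Gaussian of variance T/2.

```python
import numpy as np

# Boltzmann density (3.3) for the Euclidean energy ||x|| = x^2 on a grid;
# the temperature T = 0.8 is an arbitrary illustrative choice.
T = 0.8
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
w = np.exp(-x**2 / T)
p_boltzmann = w / (w.sum() * dx)          # normalized e^{-||x||/T}

# zero-mean Gaussian density with variance T/2
p_gauss = np.exp(-x**2 / T) / np.sqrt(np.pi * T)

variance = np.sum(x**2 * p_boltzmann) * dx    # second moment of the density
print(variance)                               # close to T/2
```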
A helpful quantity in statistical mechanics is the (normalized) free
energy^a defined as

    F(X) = E(X) - T H(X)    (3.4)
         = -T \log \int e^{-\frac{1}{T}\|x\|} \, dx .    (3.5)

In the thermodynamic equilibrium, the entropy is maximized and the free
energy is minimized since the energy is constant. The free energy normalized
to the dimension of the system is a self-averaging quantity.
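The identity between (3.4) and (3.5) is exact at the Boltzmann distribution and can be checked in a few lines on a finite state space analog (the energy values below are arbitrary illustrative numbers):

```python
import numpy as np

# Discrete Boltzmann equilibrium: p_i = exp(-E_i/T) / Z, Z = sum_i exp(-E_i/T).
# At this distribution, E - T*H equals -T*log(Z) exactly, cf. (3.4)-(3.5).
T = 1.5
energies = np.array([0.3, 1.1, 2.0, 0.7, 1.6])   # arbitrary state energies
Z = np.sum(np.exp(-energies / T))
p = np.exp(-energies / T) / Z

mean_energy = np.sum(p * energies)          # E(X), cf. (3.2)
entropy = -np.sum(p * np.log(p))            # H(X), cf. (3.1)
free_energy = mean_energy - T * entropy     # (3.4)
print(free_energy, -T * np.log(Z))          # the two expressions agree
```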
As we will see in the next section, the task of receivers in digital
communications is to minimize an energy function for a given received signal.
In the terminology of statistical physics, they minimize the energy for a
given entropy. Formulating this optimization problem in Lagrangian form,
we find that the free energy (3.4) is the objective function to be minimized
and the temperature of the system is a Lagrange multiplier. We conclude
that, also from an engineering point of view, the free energy is a natural
quantity to look at.

^a The free energy is not related to freeness in free probability theory.

4. The Meaning of the Energy Function


In statistical physics, the free energy characterizes the energy of a
system at given entropy via the introduction of a Lagrange multiplier which
is called temperature (3.4). This establishes the usefulness of the free
energy for information-theoretic tasks like calculations of channel capacities.
Moreover, the free energy is a tool to analyze various types of multiuser
detectors. In fact, the free energy is such a powerful concept that it does
not need any coding to be involved in the communication system to yield
striking results. The only condition it requires to be fulfilled is the existence of
macroscopic variables, microscopic random variables and the existence of
an energy function. For communication systems, this requires, in practice,
nothing more than their size growing above all bounds.
In physics, the energy function is determined by the fundamental forces
of physics. It can represent kinetic energy, or energy contained in electric,
magnetic or nuclear fields. The broad applicability of the statistical mechanics
approach to communication systems stems from the validity of (3.4) for
any definition of the energy function. The energy function can be interpreted
as the metric of a detector. Thus, any detector parameterized by a
certain metric can be analyzed with the tools of statistical mechanics in
the large system limit. There is no need for the performance measures of
the detectors to depend only on the eigenvalues of the channel matrix in the
large system limit. However, there is a practical limit to the applicability
of the statistical mechanics framework to the analysis of large communication
systems: the analytical calculations required to solve the equations
arising from (3.4) are not always feasible. The replica method was introduced
to circumvent such difficulties. Some cases, however, have remained
intractable until the present time.
Consider a communication channel uniquely characterized by a
conditional probability density p_{Y|X}(y, x) and a source uniquely characterized
by a prior density p_X(x). Consider a detector for the output of this channel
characterized by an assumed channel transition probability \tilde{p}_{Y|X}(y, x) and
an assumed prior distribution \tilde{p}_X(x). Let the detector minimize some kind
of cost function, e.g. bit error probability, subject to its hypotheses on the
channel transition probability \tilde{p}_{Y|X}(y, x) and the prior distribution \tilde{p}_X(x).
If the assumed distributions equal the true distributions, the detector is
optimum with respect to its cost function. If the assumed distributions differ
from the true ones, the detector is mismatched in some sense. The mismatch
can arise from insufficient knowledge at the detector due to channel
fluctuations or due to detector complexity. If the optimum detector requires an
exhaustive search to solve an NP-complete optimization, approximations to the
true prior distribution often lead to suboptimal detectors with reduced
complexity. Many popular detectors can be described within this framework.
The minimization of a cost function subject to some hypothesis on the
channel transition probability and some hypothesis on the prior distribution
defines a metric which is to be optimized. This metric corresponds to
the energy function in thermodynamics and determines the distribution of
the microscopic variables in the thermodynamic equilibrium. In analogy to
(3.3), we find

    p_{X|Y}(x, y) = \frac{e^{-\frac{1}{T}\|x\|}}{\int e^{-\frac{1}{T}\|x\|} \, dx}    (4.1)

where the dependency on y and the assumed prior distribution is implicit
via the definition of the energy function \|\cdot\|. The energy function reflects
the properties of the detector. Using Bayes' law, the appropriate energy
function corresponding to particular hypotheses on the channel transition
function and the prior distribution can be calculated via (4.1). While the
energy function in statistical physics is uniquely defined by the fundamental
forces of physics, the energy function in digital communications
characterizes the algorithm run in the detector. Thus, every different algorithm
potentially run in a detector uniquely defines the statistical physics of a
corresponding imaginary toy universe where the natural forces of physics have
been replaced by some imaginary alternatives characterizing a particular
detection algorithm.
In order to study macroscopic properties of the system, we must
calculate the free energy of the system. For that purpose, we make use of the
self-averaging property of the thermodynamic equilibrium and (3.5):

    F(X) = \mathop{\mathrm{E}}_{Y} F(X|Y)    (4.2)
         = -T \int \log \left( \int e^{-\frac{1}{T}\|x\|} \, dx \right) dP_Y(y) .    (4.3)

Note that, inside the logarithm, expectations are taken with respect to the
assumed distribution via the definition of the energy function, while, outside
the logarithm, expectations are taken with respect to the true distribution.
In the case of matched detection, i.e. when the assumed distributions equal
the true distributions, the argument of the logarithm in (4.3) becomes p_Y(y)
up to a normalizing factor. Thus, the free energy becomes the (differential)
entropy of Y up to a scaling factor and an additive constant.
Statistical mechanics provides an excellent framework to study not only
matched, but also mismatched detection. The analysis of mismatched
detection in large communication systems which is purely based on asymptotic
properties of large random matrices and does not exploit the tools provided
by statistical mechanics has been very limited so far. One exception
is the asymptotic SINR of linear MMSE multiuser detectors with erroneous
assumptions on the powers of interfering users in [4].

5. Replica Continuity
The explicit evaluation of the free energy turns out to be very complicated
in many cases of interest. One major obstacle is the occurrence of the
expectation of the logarithm of some function f(\cdot) of a random variable Y:

    \mathop{\mathrm{E}}_{Y} \log f(Y) .    (5.1)

In order to circumvent this expectation, which also appears frequently in
information theory, the following identities are helpful:

    \log Y = \lim_{n \to 0} \frac{Y^n - 1}{n}    (5.2)
           = \lim_{n \to 0} \frac{\partial}{\partial n} Y^n .    (5.3)

Under the assumption that limit and expectation can be interchanged, this
gives

    \mathop{\mathrm{E}}_{Y} \log f(Y) = \lim_{n \to 0} \frac{\partial}{\partial n} \mathop{\mathrm{E}}_{Y} [f(Y)]^n    (5.4)
                                      = \lim_{n \to 0} \frac{\partial}{\partial n} \log \mathop{\mathrm{E}}_{Y} [f(Y)]^n    (5.5)

and reduces the problem to the calculation of the nth moment of the function
of the random variable Y in the neighborhood of n = 0. Note that
the expectation must be calculated for real-valued variables n in order to
perform the limit operation.
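The replica limit is easy to watch whenever the moments are available in closed form for real n. For a lognormal Y = e^G with G ~ N(mu, s^2) (an illustrative choice, not from the text), E[Y^n] = exp(n*mu + n^2 s^2/2) and E[log Y] = mu, so the limit in (5.2) applied to the moments recovers the expected logarithm without ever averaging a logarithm directly:

```python
import numpy as np

# Replica limit: E[log Y] = lim_{n->0} (E[Y^n] - 1)/n, cf. (5.2) and (5.4).
# Y = exp(G), G ~ N(mu, s^2): E[Y^n] = exp(n*mu + n^2*s^2/2), E[log Y] = mu.
mu, s = 0.7, 1.3

def moment(n):
    """E[Y^n] for real n (the analytic continuation in n)."""
    return np.exp(n * mu + 0.5 * n**2 * s**2)

estimates = [(moment(n) - 1.0) / n for n in (1e-1, 1e-3, 1e-6)]
print(estimates)   # approaches E[log Y] = mu as n -> 0
```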

At this point, it is customary to assume analytic continuity of the
function \mathrm{E}_Y [f(Y)]^n. That is, the expectation is calculated for integer n only,
but the resulting formula is trusted to hold for arbitrary real variables n
in the neighborhood of n = 0. Note that analytic continuity is just an
assumption. There is no mathematical theorem which states under which
exact conditions this assumption is true or false. In fact, establishing a
rigorous mathematical fundament for this step in the replica analysis is a
topic of ongoing research. However, in all physical problems where replicas
have been introduced this procedure seems to work and leads to reasonable
solutions [5].
Relying on the analytic continuity, let the function

    f(Y) = \int e^{-\|x\|} \, dx    (5.6)

take the form of a partition function where the dependency on Y is implicit
via the definition of the energy function \|\cdot\|. Since the variable of integration
is arbitrary, this implies

    \mathop{\mathrm{E}}_{Y} [f(Y)]^n = \mathop{\mathrm{E}}_{Y} \left[ \int e^{-\|x\|} \, dx \right]^n    (5.7)
    = \mathop{\mathrm{E}}_{Y} \prod_{a=1}^{n} \int e^{-\|x_a\|} \, dx_a    (5.8)
    = \int \mathop{\mathrm{E}}_{Y} \, e^{-\sum_{a=1}^{n} \|x_a\|} \prod_{a=1}^{n} dx_a .    (5.9)

Thus, instead of calculating the nth power of f(Y), replicas of x are
generated. These replicated variables x_a are arbitrary and can be assigned
helpful properties. Often they are assumed to be independent random
variables. Moreover, the replica trick allows us to interchange integration and
expectation, although the expectation in (5.7) is to be taken over a nonlinear
function of the integral.

6. Saddle Point Integration


Typically, integrals arising from the replica ansatz are solved by saddle
point integration. The general idea of saddle point integration is as follows:
consider an integral of the form

    \frac{1}{K} \log \int e^{K f(x)} \, dx .    (6.1)

Note that we can write this integral as

    \frac{1}{K} \log \int e^{K f(x)} \, dx = \log \left( \int \exp(f(x))^K \, dx \right)^{1/K}    (6.2)

which shows that it is actually the logarithm of the K-norm of the function
\exp(f(x)). For K \to \infty, we get the maximum norm and thus obtain

    \lim_{K \to \infty} \frac{1}{K} \log \int e^{K f(x)} \, dx = \max_x f(x) .    (6.3)

That means the integral can be solved by maximizing the argument of the
exponential function.

Some authors also refer to saddle point integration as the saddle point
approximation and motivate it by a series expansion of the function f(x)
in the exponent. Making use of the identity (5.5) instead of (5.4), we can
argue via the infinity norm and need not study under which conditions on
the function f(x) the saddle point approximation is accurate.
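The limit (6.3) can be observed directly by numerical quadrature (the quartic f and the grid below are illustrative choices): as K grows, the normalized log-integral creeps up to max_x f(x) = 0.

```python
import numpy as np

# Saddle point limit (6.3): (1/K) * log( integral of e^{K f(x)} dx )
# tends to max_x f(x) as K grows.  f(x) = -(x^2 - 1)^2 has its maxima
# at x = +/-1 with max f = 0.
def f(x):
    return -(x**2 - 1.0)**2

x = np.linspace(-3.0, 3.0, 20001)
dx = x[1] - x[0]
vals = [np.log(np.sum(np.exp(K * f(x))) * dx) / K for K in (10, 100, 1000)]
print(vals)   # increases towards max_x f(x) = 0
```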

7. Replica Symmetry
If the function in the exponent is multivariate (typically all replicated
random variables are its arguments), one would need to find the extremum
of a multivariate function for an arbitrary number of arguments. This can
easily become a hopeless task, unless one can exploit some symmetries of
the optimization problem.

Assuming replica symmetry means that one concludes from the symmetry
of the exponent, e.g. f(x_1, x_2) = f(x_2, x_1) in the bivariate case, that
the extremum appears when all variables take on the same value. Then, the
multivariate optimization problem reduces to a single-variate one, e.g.

    \max_{x_1, x_2} f(x_1, x_2) = \max_{x} f(x, x)    (7.1)

for the bivariate case. This is the most critical assumption when applying
the replica method. In fact, it is not always true, even in practically relevant
cases. Figure 1 shows both an example and a counterexample. The general
way to circumvent this trouble is to assume replica symmetry at hand and
to prove later, having found a replica-symmetric solution, that it is correct.

With the example of Fig. 1 in mind, it might seem that replica
symmetry is a very odd assumption. However, the functions to be extremized
arise from replication of identical integrals, see (5.9). Given the particular

[Figure 1: surface plot of f(x_1, x_2) for x_1, x_2 \in [-2, 2], vertical axis from -0.2 to 0.2]

Fig. 1. Graph of the function f(x_1, x_2) = -\sin(x_1 x_2) \exp(-x_1^2 - x_2^2). It is totally
symmetric with respect to the exchange of x_1 and x_2. It shows symmetry with respect
to its minima, but breaks symmetry with respect to its maxima.

structure of the optimization problem

    \max_{x_1, \dots, x_n} \sum_{a=1}^{n} f(x_a)    (7.2)

it seems rather odd that replica symmetry might not hold. However, writing
our problem in the form of (7.2) assumes that the parameter n is an integer,
despite the fact that it is actually a real number in the neighborhood of zero.
Thus, our intuition suggesting not to question replica symmetry cheats on
us. In fact, there are even practically relevant cases without sensible replica
symmetric solutions, e.g. cases where the replica symmetric solution implies
the entropy to be negative. Such phenomena are labeled replica symmetry
breaking and a rich theory exists in the statistical mechanics literature to deal
with them [6, 5, 1]. For the introductory character of this work, however,
replica symmetry breaking is too advanced an issue.
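The picture of Fig. 1 can be reproduced with a coarse grid search (a sketch; the function g below is an exchange-symmetric example whose sign is chosen so that, as described in the figure caption, the minima respect replica symmetry while the maxima do not). Restricting the search to the replica-symmetric line x_1 = x_2 recovers the true minimum but badly misses the true maxima, which lie on x_1 = -x_2:

```python
import numpy as np

# Exchange-symmetric test function: g(x1, x2) = g(x2, x1).
def g(x1, x2):
    return -np.sin(x1 * x2) * np.exp(-x1**2 - x2**2)

t = np.linspace(-2.0, 2.0, 801)
X1, X2 = np.meshgrid(t, t)
G = g(X1, X2)

min_full, max_full = G.min(), G.max()   # search over the whole plane
diag = g(t, t)                          # replica-symmetric line x1 = x2
min_diag, max_diag = diag.min(), diag.max()

print(min_full, min_diag)   # agree: the minima are replica symmetric
print(max_full, max_diag)   # differ: the maxima break replica symmetry
```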

8. Example: Analysis of Large CDMA Systems


The replica method was introduced into multiuser communications by the
landmark paper of Tanaka [7] for the purpose of studying the performance of
the maximum a posteriori detector. Subsequently, his work was generalized
and extended to other problems in multiuser communications by himself and
Saad [8], Guo and Verdu [9], Müller et al. [10, 11], Caire et al. [4], Tanaka
and Okada [12], Kabashima [13], Li and Poor [14], Guo [15], and Wen and
Wong [16]. Additionally, the replica method has also been successfully used
for the design and analysis of error correction codes.
Consider a vector-valued real additive white Gaussian noise channel
characterized by the conditional probability distribution^b

    p_{y|x,H}(y, x, H) = \frac{e^{-\frac{1}{2\sigma_0^2}(y - Hx)^{\mathrm T}(y - Hx)}}{(2\pi\sigma_0^2)^{N/2}}    (8.1)

with x, y, N, \sigma_0^2, H denoting the channel input, the channel output, the
latter's number of components, the noise variance, and the channel matrix,
respectively. Moreover, let the detector be characterized by the assumed
conditional probability distribution

    \tilde{p}_{y|x,H}(y, x, H) = \frac{e^{-\frac{1}{2\sigma^2}(y - Hx)^{\mathrm T}(y - Hx)}}{(2\pi\sigma^2)^{N/2}}    (8.2)

and the assumed prior distribution \tilde{p}_x(x). Let the entries of H be
independent and zero-mean with vanishing odd-order moments and variances w_{ck}/N for
row c and column k. Moreover, let w_{ck} be uniformly bounded from above.
Applying Bayes' law, we find

    \tilde{p}_{x|y,H}(x, y, H) = \frac{e^{-\frac{1}{2\sigma^2}(y - Hx)^{\mathrm T}(y - Hx) + \log \tilde{p}_x(x)}}{\int e^{-\frac{1}{2\sigma^2}(y - Hx)^{\mathrm T}(y - Hx)} \, d\tilde{P}_x(x)} .    (8.3)

Since (3.3) holds for any temperature T, we set without loss of generality
T = 1 in (3.3) and find the appropriate energy function to be

    \|x\| = \frac{1}{2\sigma^2} (y - Hx)^{\mathrm T}(y - Hx) - \log \tilde{p}_x(x) .    (8.4)

This choice of the energy function ensures that the thermodynamic
equilibrium models the detector defined by the assumed conditional and prior
distributions.
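To make (8.3) and (8.4) concrete, here is a tiny brute-force instance (the sizes, the noise level, the uniform BPSK prior, and the matched assumption that the detector uses the true noise variance are all illustrative choices of this sketch): with three binary users, the Boltzmann form (4.1) at T = 1 is evaluated by enumerating all 2^K candidate vectors.

```python
import itertools
import numpy as np

# Exhaustive posterior for a toy CDMA channel y = Hx + noise with BPSK
# inputs; energy (8.4) with a uniform prior (log p(x) is then a constant
# that cancels in the normalization).
rng = np.random.default_rng(1)
K, N, sigma2 = 3, 4, 0.5                        # illustrative sizes
H = rng.standard_normal((N, K)) / np.sqrt(N)    # random spreading matrix
x_true = np.array([1.0, -1.0, 1.0])
y = H @ x_true + np.sqrt(sigma2) * rng.standard_normal(N)

candidates = np.array(list(itertools.product([-1.0, 1.0], repeat=K)))
energies = np.array([(y - H @ x) @ (y - H @ x) / (2.0 * sigma2)
                     for x in candidates])
weights = np.exp(-(energies - energies.min()))  # shift for numerical stability
posterior = weights / weights.sum()             # Boltzmann form (4.1), T = 1

print(posterior)                        # a distribution over the 8 candidates
print(candidates[posterior.argmax()])   # MAP decision
```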
Let K denote the number of users, that is, the dimensionality of the input
vector x. Applying successively (4.3) with (8.1), (5.5), replica continuity

^b In this example, we do not use upper case and lower case notation to distinguish random
variables and their realizations, in order not to mix up vectors and matrices.

(5.9), and (8.4), we find for the free energy per user

    \frac{F(x)}{K} = -\frac{1}{K}\, \mathop{\mathrm{E}}_{H} \int \frac{e^{-\frac{1}{2\sigma_0^2}(y-Hx)^{\mathrm{T}}(y-Hx)}}{(2\pi\sigma_0^2)^{N/2}} \log \int_{\mathbb{R}^K} e^{-||x'||}\,\mathrm{d}x' \;\mathrm{d}y\,\mathrm{d}P_x(x)

    = -\lim_{K\to\infty}\lim_{n\to 0} \frac{1}{Kn} \log \mathop{\mathrm{E}}_{H} \int \Big( \int_{\mathbb{R}^K} e^{-||x'||}\,\mathrm{d}x' \Big)^{n}\, \frac{e^{-\frac{1}{2\sigma_0^2}(y-Hx)^{\mathrm{T}}(y-Hx)}}{(2\pi\sigma_0^2)^{N/2}}\,\mathrm{d}y\,\mathrm{d}P_x(x)    (8.5)

    = -\lim_{K\to\infty}\lim_{n\to 0} \frac{1}{Kn} \log \Xi_n

with

    \Xi_n = \frac{1}{(2\pi\sigma_0^2)^{N/2}} \int_{\mathbb{R}^N} \mathop{\mathrm{E}}_{H} \int \prod_{a=0}^{n} e^{-\frac{1}{2\sigma_a^2}(y-Hx_a)^{\mathrm{T}}(y-Hx_a)}\,\mathrm{d}P_a(x_a)\;\mathrm{d}y,

where \sigma_a = \sigma for a \ge 1, P_0(x) = P_x(x) is the true prior, and P_a(x), a \ge 1, equals the assumed prior distribution.
The following calculations are a generalization of the derivations by Tanaka [7], Caire et al. [4], and Guo and Verdu [9]. They can also be found in a recent work of Guo [15]. The integral in (8.5) is given by

    \Xi_n = \prod_{c=1}^{N} \frac{1}{\sqrt{2\pi\sigma_0^2}} \int_{\mathbb{R}} \mathop{\mathrm{E}}_{H} \prod_{a=0}^{n} \exp\left( -\frac{1}{2\sigma_a^2} \Big( y_c - \sum_{k=1}^{K} h_{ck}\, x_{ak} \Big)^{2} \right) \mathrm{d}P_a(x_a)\,\mathrm{d}y_c,    (8.6)

with y_c, x_{ak}, and h_{ck} denoting the cth component of y, the kth component of x_a, and the (c,k)th entry of H, respectively. The integrand depends on x_a only through

    v_{ac} = \frac{1}{\sqrt{\beta}} \sum_{k=1}^{K} h_{ck}\, x_{ak}, \qquad a = 0, \dots, n,    (8.7)

with the load being defined as \beta = K/N. Following [7], the quantities v_{ac} can be regarded, in the limit K \to \infty, as jointly Gaussian random variables with zero mean and covariances

    Q_{ab}[c] = \mathop{\mathrm{E}}_{H} v_{ac}\, v_{bc} = \frac{1}{K} \langle x_a, x_b \rangle^{(c)},    (8.8)

where we defined the following inner products:

    \langle x_a, x_b \rangle^{(c)} = \sum_{k=1}^{K} x_{ak}\, x_{bk}\, w_{ck}.    (8.9)
In order to perform the integration in (8.6), the K(n+1)-dimensional space spanned by the replicas and the vector x_0 is split into subshells

    S\{Q[\cdot]\} = \Big\{ x_0, \dots, x_n \;\Big|\; \langle x_a, x_b \rangle^{(c)} = K\, Q_{ab}[c] \Big\},    (8.10)

where the inner product of two different vectors x_a and x_b is constant.^c The splitting of the K(n+1)-dimensional space depends on the chip time c. With this splitting of the space, we find^d

    \Xi_n = \int_{\mathbb{R}^{N(n+1)(n+2)/2}} e^{K I\{Q[\cdot]\}} \prod_{c=1}^{N} e^{G\{Q[c]\}} \prod_{a \le b} \mathrm{d}Q_{ab}[c],    (8.11)

where

    e^{K I\{Q[\cdot]\}} = \int \prod_{c=1}^{N} \prod_{a \le b} \delta\left( \frac{\langle x_a, x_b \rangle^{(c)}}{K} - Q_{ab}[c] \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_a)    (8.12)

denotes the probability weight of the subshell and

    e^{G\{Q[c]\}} = \frac{1}{\sqrt{2\pi\sigma_0^2}} \int_{\mathbb{R}} \mathop{\mathrm{E}} \exp\left( -\sum_{a=0}^{n} \frac{\big( y_c - \sqrt{\beta}\, v_{ac}\{Q[c]\} \big)^2}{2\sigma_a^2} \right) \mathrm{d}y_c.    (8.13)

This procedure is a change of integration variables in multiple dimensions


where the integration of an exponential function over the replicas has been
replaced by integration over the variables {Q[]}. In the following the two
exponential terms in (8.11) are evaluated separately.
First, we turn to the evaluation of the measure eKI{Q[]} . Since for some
t R, we have the Fourier expansions of the Dirac measure
    \delta\left( \frac{\langle x_a, x_b \rangle^{(c)}}{K} - Q_{ab}[c] \right) = \frac{\beta}{2\pi\mathrm{j}} \int_{J} \exp\left( \tilde Q_{ab}[c] \Big( \frac{\langle x_a, x_b \rangle^{(c)}}{N} - \beta\, Q_{ab}[c] \Big) \right) \mathrm{d}\tilde Q_{ab}[c]    (8.14)

^c The notation f\{Q[\cdot]\} expresses the dependency of the function f(\cdot) on all Q_{ab}[c], 0 \le a \le b \le n, 1 \le c \le N.
^d The notation \prod_{a \le b} is used as a shortcut for \prod_{a=0}^{n} \prod_{b=a}^{n}.
with J = (t - \mathrm{j}\infty;\; t + \mathrm{j}\infty), the measure e^{K I\{Q[\cdot]\}} can be expressed as
    e^{K I\{Q[\cdot]\}} = \int \prod_{c=1}^{N} \prod_{a \le b} \exp\left( \tilde Q_{ab}[c] \Big( \frac{\langle x_a, x_b \rangle^{(c)}}{N} - \beta\, Q_{ab}[c] \Big) \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_a) \prod_{c=1}^{N} \prod_{a \le b} \frac{\beta\,\mathrm{d}\tilde Q_{ab}[c]}{2\pi\mathrm{j}}    (8.15)

    = \int_{J^{N(n+2)(n+1)/2}} e^{-\beta \sum_{c=1}^{N} \sum_{a \le b} \tilde Q_{ab}[c]\, Q_{ab}[c]} \prod_{k=1}^{K} M_k\{\tilde Q[\cdot]\} \prod_{c=1}^{N} \prod_{a \le b} \frac{\beta\,\mathrm{d}\tilde Q_{ab}[c]}{2\pi\mathrm{j}}    (8.16)

with

    M_k\{\tilde Q[\cdot]\} = \int \exp\left( \frac{1}{N} \sum_{c=1}^{N} \sum_{a \le b} \tilde Q_{ab}[c]\, x_{ak}\, x_{bk}\, w_{ck} \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak}).    (8.17)
In the limit K \to \infty, one of the exponential terms in (8.11) will dominate over all others. Thus, only the maximum value of the correlation Q_{ab}[c] is relevant for the calculation of the integral, as shown in Section 6.
At this point, we assume that the replicas have a certain symmetry, as outlined in Section 7. This means that, in order to find the maximum of the objective function, we consider only a subset of the potential possibilities that the variables Q_{ab}[\cdot] could take. Here, we restrict them to the following four different possibilities: Q_{00}[c] = p_{0c}; Q_{0a}[c] = m_c, a \ne 0; Q_{aa}[c] = p_c, a \ne 0; and Q_{ab}[c] = q_c, 0 \ne a \ne b \ne 0. One case distinction has been made, as zero and non-zero replica indices correspond to the true and the assumed distributions, respectively, and thus will differ, in general. Another case distinction has been made to distinguish correlations Q_{ab}[\cdot] corresponding to different and to identical replica indices. This gives four cases to consider in total. We apply the same idea to the correlation variables in the Fourier domain and set \tilde Q_{00}[c] = G_{0c}/2; \tilde Q_{aa}[c] = G_c/2, a \ne 0; \tilde Q_{0a}[c] = E_c, a \ne 0; and \tilde Q_{ab}[c] = F_c, 0 \ne a \ne b \ne 0.
At this point the crucial benefit of the replica method becomes obvious. Assuming replica continuity, we have managed to reduce the evaluation of a continuous function to sampling it at integer points. Assuming replica symmetry, we have further reduced the task of evaluating infinitely many integer points to calculating eight different correlations (four in the original and four in the Fourier domain).
The assumption of replica symmetry leads to

    \sum_{a \le b} \tilde Q_{ab}[c]\, Q_{ab}[c] = n E_c m_c + \frac{n(n-1)}{2} F_c q_c + \frac{G_{0c}\, p_{0c}}{2} + \frac{n}{2} G_c p_c    (8.18)

and

    M_k\{E, F, G, G_0\} = \int \exp\left( \frac{1}{N} \sum_{c=1}^{N} w_{ck} \Big( \frac{G_{0c}}{2} x_{0k}^2 + \sum_{a=1}^{n} E_c\, x_{0k} x_{ak} + \frac{G_c}{2} \sum_{a=1}^{n} x_{ak}^2 + \sum_{a=1}^{n} \sum_{b=a+1}^{n} F_c\, x_{ak} x_{bk} \Big) \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak})

    = \int \exp\left( \frac{G_{0k}}{2} x_{0k}^2 + \sum_{a=1}^{n} E_k\, x_{0k} x_{ak} + \frac{G_k}{2} \sum_{a=1}^{n} x_{ak}^2 + \sum_{a=1}^{n} \sum_{b=a+1}^{n} F_k\, x_{ak} x_{bk} \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak}),    (8.19)

where

    E_k = \frac{1}{N} \sum_{c=1}^{N} E_c\, w_{ck}    (8.20)

    F_k = \frac{1}{N} \sum_{c=1}^{N} F_c\, w_{ck}    (8.21)

    G_k = \frac{1}{N} \sum_{c=1}^{N} G_c\, w_{ck}    (8.22)

    G_{0k} = \frac{1}{N} \sum_{c=1}^{N} G_{0c}\, w_{ck}.    (8.23)
Note that the prior distribution enters the free energy only via (8.19). We will return to this term after the other terms have been dealt with.
For the evaluation of e^{G\{Q[c]\}} in (8.11), we can use the replica symmetry to construct the correlated Gaussian random variables v_{ac} out of independent zero-mean, unit-variance Gaussian random variables u_c, t_c, z_{ac} by

    v_{0c} = \sqrt{p_{0c} - \frac{m_c^2}{q_c}}\; u_c + \frac{m_c}{\sqrt{q_c}}\; t_c    (8.24)

    v_{ac} = \sqrt{p_c - q_c}\; z_{ac} + \sqrt{q_c}\; t_c, \qquad a > 0.    (8.25)
With that substitution, we get

    e^{G(m_c, q_c, p_c, p_{0c})} = \frac{1}{\sqrt{2\pi\sigma_0^2}} \int\!\!\int \exp\left( -\frac{1}{2\sigma_0^2} \Big( y_c - \sqrt{\beta} \Big( \sqrt{p_{0c} - \tfrac{m_c^2}{q_c}}\; u_c + \tfrac{m_c}{\sqrt{q_c}}\; t_c \Big) \Big)^{2} \right) \mathrm{D}u_c    (8.26)

    \qquad \times \left( \int_{\mathbb{R}} \exp\left( -\frac{1}{2\sigma^2} \Big( y_c - \sqrt{\beta} \big( \sqrt{p_c - q_c}\; z_c + \sqrt{q_c}\; t_c \big) \Big)^{2} \right) \mathrm{D}z_c \right)^{n} \mathrm{D}t_c\,\mathrm{d}y_c

    = \sqrt{ \frac{ \big( 1 + \frac{\beta}{\sigma^2}(p_c - q_c) \big)^{1-n} }{ 1 + \frac{\beta}{\sigma^2}(p_c - q_c) + \frac{n}{\sigma^2} \big( \sigma_0^2 + \beta (p_{0c} - 2 m_c + q_c) \big) } }    (8.27)

with the Gaussian measure \mathrm{D}z = \exp(-z^2/2)\,\mathrm{d}z / \sqrt{2\pi}. Since the integral in (8.11) is dominated by the maximum argument of the exponential function, the derivatives of

    \frac{1}{N} \sum_{c=1}^{N} \left( G\{Q[c]\} - \beta \sum_{a \le b} \tilde Q_{ab}[c]\, Q_{ab}[c] \right)    (8.28)

with respect to m_c, q_c, p_c, and p_{0c} must vanish as N \to \infty. Taking derivatives after plugging (8.18) and (8.27) into (8.28), solving for E_c, F_c, G_c, and G_{0c}, and letting n \to 0 yields for all c

    E_c = \frac{1}{\sigma^2 + \beta (p_c - q_c)}    (8.29)

    F_c = \frac{\sigma_0^2 + \beta (p_{0c} - 2 m_c + q_c)}{\big[ \sigma^2 + \beta (p_c - q_c) \big]^2}    (8.30)

    G_c = F_c - E_c    (8.31)

    G_{0c} = 0.    (8.32)
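The collapse of (8.26) into (8.27), and hence the saddle-point equations above, rests on the elementary Gaussian integral \int_{\mathbb{R}} \exp(-(y - a z - b)^2/(2\sigma^2))\,\mathrm{D}z = \sqrt{\sigma^2/(\sigma^2 + a^2)}\, \exp(-(y - b)^2/(2(\sigma^2 + a^2))), applied once per replica. A quick numerical sanity check of this identity (a minimal sketch; the parameter values and grid bounds are arbitrary):

```python
import numpy as np

def lhs(y, a, b, sigma2):
    # integrate exp(-(y - a z - b)^2 / (2 sigma2)) against the Gaussian
    # measure Dz = exp(-z^2/2) dz / sqrt(2 pi), on a fine grid
    z = np.linspace(-10.0, 10.0, 20001)
    dz = z[1] - z[0]
    f = np.exp(-(y - a * z - b) ** 2 / (2.0 * sigma2))
    Dz = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    return float(np.sum(f * Dz) * dz)

def rhs(y, a, b, sigma2):
    # closed form: a Gaussian in y with the variances added
    s = sigma2 + a ** 2
    return float(np.sqrt(sigma2 / s) * np.exp(-(y - b) ** 2 / (2.0 * s)))

assert abs(lhs(0.7, 1.3, -0.4, 0.5) - rhs(0.7, 1.3, -0.4, 0.5)) < 1e-8
```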

In the following, the calculations are shown explicitly for Gaussian and
binary priors. Additionally, a general formula for arbitrary priors is given.

8.1. Gaussian prior distribution


Assume a Gaussian prior distribution
1
pa(xak ) = exak /2
2
a. (8.33)
2
Thus, the integration in (8.19) can be performed explicitly, and with [7, (87)] we find

    M_k\{E, F, G, G_0\} = \sqrt{ \frac{ (1 + F_k - G_k)^{1-n} }{ (1 - G_{0k})\, (1 + F_k - G_k - n F_k) - n E_k^2 } }.    (8.34)
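As a sanity check on (8.34), note that for a single replica (n = 1) the F_k-term of (8.19), a sum over pairs a < b, is empty, and M_k reduces to a two-dimensional Gaussian integral that can be evaluated directly; consistently, F_k drops out of (8.34) at n = 1. A minimal numerical sketch (the parameter values are arbitrary and chosen so that (1 - G_0)(1 - G) > E^2):

```python
import numpy as np

E, G, G0 = 0.4, 0.3, 0.2   # arbitrary values with (1 - G0)(1 - G) > E^2
F = 0.25                   # must drop out of (8.34) at n = 1

# closed form (8.34) evaluated at n = 1
closed = np.sqrt((1 + F - G) ** (1 - 1)
                 / ((1 - G0) * (1 + F - G - 1 * F) - 1 * E ** 2))

# direct evaluation: two i.i.d. N(0,1) variables x0 (true) and x1 (replica)
x = np.linspace(-8.0, 8.0, 1201)
dx = x[1] - x[0]
x0, x1 = np.meshgrid(x, x, indexing="ij")
prior = np.exp(-(x0 ** 2 + x1 ** 2) / 2.0) / (2.0 * np.pi)
numeric = np.sum(np.exp(G0 / 2 * x0 ** 2 + E * x0 * x1 + G / 2 * x1 ** 2)
                 * prior) * dx * dx

assert abs(numeric - closed) < 1e-3
```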
In the large system limit, the integral in (8.16) is also dominated by that value of the integration variables which maximizes the argument of the exponential function, under some weak conditions on the variances w_{ck}. Thus, the partial derivatives of

    \sum_{k=1}^{K} \log M_k\{E, F, G, G_0\} - \beta \sum_{c=1}^{N} \left( n E_c m_c + \frac{n(n-1)}{2} F_c q_c + \frac{G_{0c}\, p_{0c}}{2} + \frac{n}{2} G_c p_c \right)    (8.35)

with respect to E_c, F_c, G_c, G_{0c} must vanish for all c as N \to \infty. An explicit calculation of these derivatives yields

    m_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck}\, \frac{E_k}{1 + E_k}    (8.36)

    q_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck}\, \frac{E_k^2 + F_k}{(1 + E_k)^2}    (8.37)

    p_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck}\, \frac{E_k^2 + E_k + F_k + 1}{(1 + E_k)^2}    (8.38)

    p_{0c} = \frac{1}{K} \sum_{k=1}^{K} w_{ck}    (8.39)
in the limit n \to 0, using (8.31) and (8.32). Surprisingly, if we let the true prior be binary and only the replicas be Gaussian, we also find (8.36) to (8.39). This setting corresponds to linear MMSE detection [17].
Returning to our initial goal, the evaluation of the free energy, and collecting our previous results, we find

    \frac{F(x)}{K} = -\lim_{n\to 0} \frac{1}{Kn} \log \Xi_n    (8.40)

    = -\lim_{n\to 0} \frac{1}{Kn} \left[ \sum_{c=1}^{N} \left( G(m_c, q_c, p_c, p_{0c}) - \beta \Big( n E_c m_c + \frac{n(n-1)}{2} F_c q_c + \frac{n}{2} G_c p_c \Big) \right) + \sum_{k=1}^{K} \log M_k\{E, F, G, 0\} \right]    (8.41)

    = \lim_{n\to 0} \frac{1}{2K} \sum_{c=1}^{N} \left[ \log\Big( 1 + \frac{\beta}{\sigma^2}(p_c - q_c) \Big) + \frac{1}{n} \log\Big( 1 + n\, \frac{\sigma_0^2 + \beta (p_{0c} - 2 m_c + q_c)}{\sigma^2 + \beta (p_c - q_c)} \Big) + \beta \big( 2 E_c m_c + (n-1) F_c q_c + G_c p_c \big) \right]
    \qquad + \lim_{n\to 0} \frac{1}{2K} \sum_{k=1}^{K} \left[ \log(1 + E_k) + \frac{1}{n} \log\Big( 1 - n\, \frac{E_k^2 + F_k}{1 + E_k} \Big) \right]    (8.42)

    = \frac{1}{2K} \sum_{c=1}^{N} \left[ \log\Big( 1 + \frac{\beta}{\sigma^2}(p_c - q_c) \Big) + \beta \big( 2 E_c m_c - F_c q_c + G_c p_c \big) + \frac{F_c}{E_c} \right] + \frac{1}{2K} \sum_{k=1}^{K} \left[ \log(1 + E_k) - \frac{E_k^2 + F_k}{1 + E_k} \right].    (8.43)
This is the final result for the free energy of the mismatched detector assuming noise variance \sigma^2 instead of the true noise variance \sigma_0^2. The six macroscopic parameters E_c, F_c, G_c, m_c, q_c, p_c are implicitly given by the simultaneous solution of the system of equations (8.29) to (8.31) and (8.36) to (8.38) with the definitions (8.20) to (8.22) for all chip times c. This system of equations can only be solved numerically.
Specializing our result to the matched detector assuming the true noise variance by letting \sigma \to \sigma_0, we have F_c \to E_c, G_c \to G_{0c}, q_c \to m_c, p_c \to p_{0c}. This makes the free energy simplify to

    \frac{F(x)}{K} = \frac{1}{2K} \sum_{c=1}^{N} \left[ \sigma_0^2 E_c - \log\big( \sigma_0^2 E_c \big) \right] + \frac{1}{2K} \sum_{k=1}^{K} \log(1 + E_k)    (8.44)

with

    E_c = \frac{1}{\displaystyle \sigma_0^2 + \frac{\beta}{K} \sum_{k=1}^{K} \frac{w_{ck}}{1 + E_k}}.    (8.45)
This result is more compact: it only requires solving (8.45) numerically, which is conveniently done by fixed-point iteration.
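Equation (8.45) couples the chip-wise parameters E_c to the per-user parameters E_k of (8.20), so a fixed-point iteration solves both simultaneously. A minimal sketch (the system size, the variance profile w_ck, and the noise level are arbitrary illustrative choices, not values from the text; note that beta/K = 1/N):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 32                 # chips and users, so load beta = K / N = 0.5
beta = K / N
sigma0sq = 0.1                # true noise variance (illustrative)
w = rng.uniform(0.5, 1.5, size=(N, K))   # bounded variance profile w_ck

# iterate E_k = (1/N) sum_c E_c w_ck together with
# E_c = 1 / (sigma0^2 + (beta/K) sum_k w_ck / (1 + E_k))
Ec = np.ones(N)
for _ in range(200):
    Ek = (Ec @ w) / N         # per-user SINR, definition (8.20)
    Ec_new = 1.0 / (sigma0sq + (beta / K) * (w / (1.0 + Ek)).sum(axis=1))
    if np.max(np.abs(Ec_new - Ec)) < 1e-12:
        Ec = Ec_new
        break
    Ec = Ec_new

# verify that the fixed point has been reached
Ek = (Ec @ w) / N
resid = np.max(np.abs(
    Ec - 1.0 / (sigma0sq + (beta / K) * (w / (1.0 + Ek)).sum(axis=1))))
assert resid < 1e-8
```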
It can be shown that the parameter E_k is actually the signal-to-interference-and-noise ratio of user k. It has been derived independently by Hanly and Tse [18] in the context of CDMA with macro-diversity, using a result from random matrix theory by Girko [19]. Note that (8.45) and (8.20) are actually formally equivalent to the result Girko found for random matrices.
The similarity of the free energy with the entropy of the channel output mentioned at the end of Section 4 is expressed by the simple relationship

    \frac{I(x; y)}{K} = \frac{F(x)}{K} - \frac{1}{2}    (8.46)

between the (normalized) free energy and the (normalized) mutual information between the channel input signal x and the channel output signal y given the channel matrix H. Assuming that the channel is perfectly known to the receiver, but totally unknown to the transmitter, (8.46) gives the channel capacity per user.
8.2. Binary prior distribution

Now, we assume a non-uniform binary prior distribution

    p_a(x_{ak}) = \frac{1 + t_k}{2}\, \delta(x_{ak} - 1) + \frac{1 - t_k}{2}\, \delta(x_{ak} + 1).    (8.47)
Plugging the prior distribution into (8.19), we find

    M_k\{E, F, G, G_0\} = \int_{\mathbb{R}^{n+1}} \exp\left( \frac{G_{0k} + n G_k}{2} + E_k\, x_{0k} \sum_{a=1}^{n} x_{ak} + F_k \sum_{a=1}^{n} \sum_{b=a+1}^{n} x_{ak} x_{bk} \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak})    (8.48)

    = e^{\frac{1}{2}(G_{0k} + n G_k)} \sum_{\{x_{ak},\, a=1,\dots,n\}} \left[ \frac{1 + t_k}{2} \exp\left( E_k \sum_{a=1}^{n} x_{ak} + F_k \sum_{a=1}^{n} \sum_{b=a+1}^{n} x_{ak} x_{bk} \right) + \frac{1 - t_k}{2} \exp\left( -E_k \sum_{a=1}^{n} x_{ak} + F_k \sum_{a=1}^{n} \sum_{b=a+1}^{n} x_{ak} x_{bk} \right) \right] \prod_{a=1}^{n} \Pr(x_{ak})    (8.49)

    = e^{\frac{1}{2}(G_{0k} + n G_k - n F_k)} \sum_{\{x_{ak},\, a=1,\dots,n\}} \left[ \frac{1 + t_k}{2} \exp\left( \frac{F_k}{2} \Big( \sum_{a=1}^{n} x_{ak} \Big)^{2} + E_k \sum_{a=1}^{n} x_{ak} \right) + \frac{1 - t_k}{2} \exp\left( \frac{F_k}{2} \Big( \sum_{a=1}^{n} x_{ak} \Big)^{2} - E_k \sum_{a=1}^{n} x_{ak} \right) \right] \prod_{a=1}^{n} \Pr(x_{ak}),    (8.50)
where we can use the following property of the Gaussian measure,

    \exp\left( F_k\, \frac{S^2}{2} \right) = \int \exp\left( \sqrt{F_k}\, z S \right) \mathrm{D}z \qquad \forall S \in \mathbb{R},    (8.51)

which is also called the Hubbard-Stratonovich transform, to linearize the exponents:

    M_k\{E, F, G, G_0\} = e^{\frac{1}{2}(G_{0k} + n G_k - n F_k)} \int \sum_{\{x_{ak},\, a=1,\dots,n\}} \left[ \frac{1 + t_k}{2} \exp\left( \Big( z \sqrt{F_k} + E_k \Big) \sum_{a=1}^{n} x_{ak} \right) + \frac{1 - t_k}{2} \exp\left( -\Big( z \sqrt{F_k} + E_k \Big) \sum_{a=1}^{n} x_{ak} \right) \right] \mathrm{D}z \prod_{a=1}^{n} \Pr(x_{ak}).    (8.52)
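The Hubbard-Stratonovich identity (8.51) is easy to verify numerically, since its right-hand side is just the moment-generating function of a standard Gaussian evaluated at \sqrt{F_k} S. A minimal sketch (the values of F and S are arbitrary; the grid bounds suffice for moderate arguments):

```python
import numpy as np

def gaussian_average(F, S):
    # right-hand side of (8.51): average of exp(sqrt(F) z S) over z ~ N(0,1)
    z = np.linspace(-10.0, 10.0, 20001)
    dz = z[1] - z[0]
    Dz = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    return float(np.sum(np.exp(np.sqrt(F) * z * S) * Dz) * dz)

F, S = 0.7, 1.5
assert abs(gaussian_average(F, S) - np.exp(F * S ** 2 / 2.0)) < 1e-6
```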
Since

    f_n = \sum_{\{x_{ka},\, a=1,\dots,n\}} \exp\left( \Big( z \sqrt{F_k} + E_k \Big) \sum_{a=1}^{n} x_{ka} \right) \prod_{a=1}^{n} \Pr(x_{ka})    (8.53)

    = f_{n-1} \sum_{x_{kn}} \Pr(x_{kn}) \exp\left( \Big( z \sqrt{F_k} + E_k \Big)\, x_{kn} \right)    (8.54)

    = f_{n-1}\, \frac{\cosh\big( \lambda_k/2 + z \sqrt{F_k} + E_k \big)}{\cosh(\lambda_k/2)}    (8.55)

    = \frac{\cosh^n\big( \lambda_k/2 + z \sqrt{F_k} + E_k \big)}{\cosh^n(\lambda_k/2)}    (8.56)

with t_k = \tanh(\lambda_k/2), we find

    M_k\{E, F, G, G_0\} = \frac{ \displaystyle\int \left[ \frac{1 + t_k}{2} \cosh^n\Big( z \sqrt{F_k} + E_k + \frac{\lambda_k}{2} \Big) + \frac{1 - t_k}{2} \cosh^n\Big( z \sqrt{F_k} + E_k - \frac{\lambda_k}{2} \Big) \right] \mathrm{D}z }{ \cosh^n\big( \frac{\lambda_k}{2} \big) \exp\big( \frac{n F_k - G_{0k} - n G_k}{2} \big) }.    (8.57)
In the large system limit, the integral in (8.16) is dominated by that value of the integration variables which maximizes the argument of the exponential function, under some weak conditions on the variances w_{ck}. Thus, the partial derivatives of (8.35) with respect to E_c, F_c, G_c, G_{0c} must vanish for all c as N \to \infty. An explicit calculation of these derivatives gives
    m_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int \left[ \frac{1 + t_k}{2} \tanh\Big( z \sqrt{F_k} + E_k + \frac{\lambda_k}{2} \Big) + \frac{1 - t_k}{2} \tanh\Big( z \sqrt{F_k} + E_k - \frac{\lambda_k}{2} \Big) \right] \mathrm{D}z    (8.58)

    q_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int \left[ \frac{1 + t_k}{2} \tanh^2\Big( z \sqrt{F_k} + E_k + \frac{\lambda_k}{2} \Big) + \frac{1 - t_k}{2} \tanh^2\Big( z \sqrt{F_k} + E_k - \frac{\lambda_k}{2} \Big) \right] \mathrm{D}z    (8.59)

    p_c = p_{0c} = \frac{1}{K} \sum_{k=1}^{K} w_{ck}    (8.60)
in the limit n \to 0. In order to obtain (8.59), note from (8.50) that the first-order derivative of M_k \exp(n F_k/2) with respect to F_c is identical to half of the second-order derivative of M_k \exp(n F_k/2) with respect to E_c.
Returning to our initial goal, the evaluation of the free energy, and collecting our previous results, we find
    \frac{F(x)}{K} = -\lim_{n\to 0} \frac{1}{Kn} \log \Xi_n

    = -\lim_{n\to 0} \frac{1}{Kn} \left[ \sum_{c=1}^{N} \left( G(m_c, q_c, p_c, p_{0c}) - \beta \Big( n E_c m_c + \frac{n(n-1)}{2} F_c q_c + \frac{n}{2} G_c p_c \Big) \right) + \sum_{k=1}^{K} \log M_k\{E, F, G, 0\} \right]    (8.61)

    = \frac{1}{2K} \sum_{c=1}^{N} \left[ \log\Big( 1 + \frac{\beta}{\sigma^2}(p_c - q_c) \Big) + \beta \big( E_c (2 m_c - p_c) + F_c (p_c - q_c) \big) + \frac{F_c}{E_c} \right]
    \qquad - \frac{1}{K} \sum_{k=1}^{K} \left\{ \int \left[ \frac{1 + t_k}{2} \log\cosh\Big( z \sqrt{F_k} + E_k + \frac{\lambda_k}{2} \Big) + \frac{1 - t_k}{2} \log\cosh\Big( z \sqrt{F_k} + E_k - \frac{\lambda_k}{2} \Big) \right] \mathrm{D}z - \frac{E_k}{2} + \frac{1}{2} \log\big( 1 - t_k^2 \big) \right\}.    (8.62)
This is the final result for the free energy of the mismatched detector assuming noise variance \sigma^2 instead of the true noise variance \sigma_0^2. The five macroscopic parameters E_c, F_c, m_c, q_c, p_c are implicitly given by the simultaneous solution of the system of equations (8.29), (8.30), and (8.58) to (8.60) with the definitions (8.20) to (8.22) for all chip times c. This system of equations can only be solved numerically. Moreover, it can have multiple solutions. In case of multiple solutions, the correct solution is the one which minimizes the free energy, since in the thermodynamic equilibrium the free energy is always minimized, cf. Section 3.
Specializing our result to the matched detector assuming the true noise variance by letting \sigma \to \sigma_0, we have F_c \to E_c, G_c \to G_{0c}, q_c \to m_c, which makes the free energy simplify to
    \frac{F(x)}{K} = \frac{1}{2K} \sum_{c=1}^{N} \left[ \sigma_0^2 E_c - \log\big( \sigma_0^2 E_c \big) \right] - \frac{1}{K} \sum_{k=1}^{K} \left\{ \frac{1}{2} \log\big( 1 - t_k^2 \big) - E_k + \int \left[ \frac{1 + t_k}{2} \log\cosh\Big( z \sqrt{E_k} + E_k + \frac{\lambda_k}{2} \Big) + \frac{1 - t_k}{2} \log\cosh\Big( z \sqrt{E_k} + E_k - \frac{\lambda_k}{2} \Big) \right] \mathrm{D}z \right\},    (8.63)

where the macroscopic parameters E_c are given by

    \frac{1}{E_c} = \sigma_0^2 + \frac{\beta}{K} \sum_{k=1}^{K} w_{ck}\, \big( 1 - t_k^2 \big) \int \frac{ 1 - \tanh\big( z \sqrt{E_k} + E_k \big) }{ 1 - t_k^2 \tanh^2\big( z \sqrt{E_k} + E_k \big) }\, \mathrm{D}z.    (8.64)
Similar to the case of Gaussian priors, E_k can be shown to be a kind of signal-to-interference-and-noise ratio, in the sense that the bit error probability of user k is given by

    \Pr(\hat x_k \ne x_k) = \int_{\sqrt{E_k}}^{\infty} \mathrm{D}z.    (8.65)
In fact, it can even be shown that in the large system limit, an equivalent additive white Gaussian noise channel can be defined to model the multiuser interference [10, 9]. Constraining the input alphabet of the channel to follow the non-uniform binary distribution (8.47) and assuming channel state information to be available only at the receiver, the channel capacity is given by (8.46) with the free energy given in (8.63).
Large system results for binary priors (even for the uniform binary prior) have not yet been derived by means of rigorous mathematics, despite intense efforts to do so. Only for the case of vanishing noise variance was a fully mathematically rigorous result found by Tse and Verdu [20], which does not rely on the replica method.
8.3. Arbitrary prior distribution

Consider now an arbitrary prior distribution. As shown by Guo and Verdu [9], this still allows one to reduce the multi-dimensional integration over all replicated random variables to a scalar integration over the prior distribution. Consider (8.19), which gives the only term that involves the prior distribution, and apply the Hubbard-Stratonovich transform (8.51):
    M_k\{E, F, G, G_0\} = \int \exp\left( \frac{G_{0k}}{2} x_{0k}^2 + \sum_{a=1}^{n} E_k\, x_{0k} x_{ak} + \frac{G_k}{2} \sum_{a=1}^{n} x_{ak}^2 + \sum_{a=1}^{n} \sum_{b=a+1}^{n} F_k\, x_{ak} x_{bk} \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak})    (8.66)

    = \int \exp\left( \frac{G_{0k}}{2} x_{0k}^2 + \frac{F_k}{2} \Big( \sum_{a=1}^{n} x_{ak} \Big)^{2} + \sum_{a=1}^{n} E_k\, x_{0k} x_{ak} + \frac{G_k - F_k}{2} \sum_{a=1}^{n} x_{ak}^2 \right) \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak})    (8.67)

    = \int\!\!\int \exp\left( \frac{G_{0k}}{2} x_{0k}^2 + \sum_{a=1}^{n} \Big( E_k\, x_{0k} x_{ak} + \sqrt{F_k}\, z\, x_{ak} + \frac{G_k - F_k}{2} x_{ak}^2 \Big) \right) \mathrm{D}z \prod_{a=0}^{n} \mathrm{d}P_a(x_{ak})    (8.68)

    = \int\!\!\int e^{\frac{G_{0k}}{2} x_k^2} \left( \int e^{E_k x_k \tilde x_k + \sqrt{F_k}\, z\, \tilde x_k + \frac{G_k - F_k}{2} \tilde x_k^2}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) \right)^{n} \mathrm{D}z\, \mathrm{d}P_{x_k}(x_k),    (8.69)

where x_k is distributed according to the true prior and \tilde x_k according to the assumed prior.
In the large system limit, the integral in (8.16) is dominated by that value of the integration variables which maximizes the argument of the exponential function, under some weak conditions on the variances w_{ck}. Thus, the partial derivatives of (8.35) with respect to E_c, F_c, G_c, G_{0c} must vanish for all c as N \to \infty. While taking derivatives with respect to E_c, G_c, and G_{0c} straightforwardly leads to suitable results, the derivative with respect to F_c requires a little trick: note that for the integrand I_k in (8.67), viewed as a function of the per-user parameters (8.20) to (8.23), we have

    \frac{\partial I_k}{\partial F_k} = \frac{1}{2 x_{0k}^2}\, \frac{\partial^2 I_k}{\partial E_k^2} - \frac{\partial I_k}{\partial G_k}.    (8.70)
With the help of (8.70), an explicit calculation of the four derivatives gives the following expressions for the macroscopic parameters m_c, q_c, p_c, and p_{0c}:

    m_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int\!\!\int x_k\, \frac{ \int \tilde x_k\, e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) }{ \int e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) }\; \mathrm{D}z\, \mathrm{d}P_{x_k}(x_k)    (8.71)

    q_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int\!\!\int \left( \frac{ \int \tilde x_k\, e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) }{ \int e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) } \right)^{2} \mathrm{D}z\, \mathrm{d}P_{x_k}(x_k)    (8.72)

    p_c = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int\!\!\int \frac{ \int \tilde x_k^2\, e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) }{ \int e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) }\; \mathrm{D}z\, \mathrm{d}P_{x_k}(x_k)    (8.73)

    p_{0c} = \frac{1}{K} \sum_{k=1}^{K} w_{ck} \int x_k^2\, \mathrm{d}P_{x_k}(x_k)    (8.74)

with (8.31) and (8.32) in the limit n \to 0.
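As a consistency check, plugging the binary prior (8.47) in as the assumed prior collapses the inner ratio of integrals in (8.71) to the tanh expression of (8.58): since the assumed symbol takes values +1 and -1, its square is 1, the common factor exp(-E_k/2) cancels, and the ratio equals tanh(E_k x_k + sqrt(F_k) z + lambda_k/2). A numerical sketch (parameter values are arbitrary):

```python
import math

def inner_ratio(E, F, z, x, t):
    # numerator over denominator of the ratio in (8.71) for the binary
    # prior (8.47); xtilde in {+1, -1}, so exp(-E xtilde^2 / 2) cancels
    alpha = E * x + math.sqrt(F) * z
    num = (1 + t) / 2 * math.exp(alpha) - (1 - t) / 2 * math.exp(-alpha)
    den = (1 + t) / 2 * math.exp(alpha) + (1 - t) / 2 * math.exp(-alpha)
    return num / den

E, F, z, x, t = 0.8, 0.5, 0.3, 1.0, 0.2
lam = 2.0 * math.atanh(t)   # t = tanh(lambda / 2)
assert abs(inner_ratio(E, F, z, x, t)
           - math.tanh(E * x + math.sqrt(F) * z + lam / 2.0)) < 1e-12
```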
Returning to our initial goal, the evaluation of the free energy, and collecting our previous results, we find

    \frac{F(x)}{K} = \frac{1}{2K} \sum_{c=1}^{N} \left[ \log\Big( 1 + \frac{\beta}{\sigma^2}(p_c - q_c) \Big) + \beta \big( E_c (2 m_c - p_c) + F_c (p_c - q_c) \big) + \frac{F_c}{E_c} \right]
    \qquad - \frac{1}{K} \sum_{k=1}^{K} \int\!\!\int \log\left( \int e^{E_k x_k \tilde x_k - \frac{E_k}{2} \tilde x_k^2 + \sqrt{F_k}\, z\, \tilde x_k}\, \mathrm{d}P_{\tilde x_k}(\tilde x_k) \right) \mathrm{D}z\, \mathrm{d}P_{x_k}(x_k).    (8.75)

This is the nal result for the free energy of the mismatched detector as-
suming noise variance 2 instead of the true noise variance 02 . The ve
macroscopic parameters Ec , Fc , mc , qc , pc are implicitly given by the simul-
taneous solution of the system of equations (8.29), (8.30) and (8.58) to
(8.60) with the denitions (8.20) to (8.22) for all chip times c. This system
of equations can only be solved numerically. Moreover, it can have multi-
ple solutions. In case of multiple solutions, the correct solution is that one
which minimizes the free energy, since in the thermodynamic equilibrium the free energy is always minimized, cf. Section 3.
9. Phase Transitions

In thermodynamics, the occurrence of phase transitions, e.g. melting ice becoming water, is a well-known phenomenon. In digital communications, however, such phenomena are less well known, though they do occur. The similarity between thermodynamics and multiuser detection pointed out in Section 4 should be sufficient to convince the reader that phase transitions in digital communications do occur. Phase transitions in turbo decoding and in detection of CDMA were found in [21] and [7], respectively.
The phase transitions in digital communications are similar to the hysteresis in ferromagnetic materials. They occur if the equations determining the macroscopic parameters, e.g. E_c determined by (8.64), have multiple solutions. Then, it is the free energy that decides which of the solutions corresponds to the thermodynamic equilibrium. If a system parameter, e.g. the load or the noise variance, changes, the free energy may shift its favor from one solution to another one. Since each solution corresponds to a different macroscopic property of the system, changing the valid solution means that a phase transition takes place.
In digital communications, a popular macroscopic property is the bit error probability. It is related to the macroscopic parameter E_k in (8.64) by (8.65) for the case considered in Section 8. Numerical results are depicted in Fig. 2. The thick curve shows the bit error probability of the individually optimum detector as a function of the load. The thin curves show alternative solutions for the bit error probability corresponding to alternative solutions to the equations for the macroscopic variable E_k. Only for a certain interval of the load, approximately 1.73 \le \beta \le 3.56 in Fig. 2, do multiple solutions coexist. As expected, the bit error probability increases with the load. At a load of approximately \beta = 1.986 a phase transition occurs and lets the bit error probability jump. Unlike in ferromagnetic materials, there is no hysteresis effect for the bit error probability of the individually optimum detector, but only a phase transition. This is because the external magnetic field corresponds to the channel output observed by the receiver. Unlike an external magnetic field, the channel output is a statistical variable and cannot be designed to undergo certain trajectories.
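To see multiple solutions concretely, consider the uniform binary prior (t_k = 0) with equal powers (w_{ck} = 1), for which (8.64) collapses to the single scalar equation 1/E = sigma_0^2 + beta * Int (1 - tanh(z sqrt(E) + E)) Dz. The sketch below iterates this map from a low and a high initial guess; in the coexistence region the two runs can settle at different fixed points, and the free energy then decides which one is the thermodynamic equilibrium. (The load and the noise value, taken as a stand-in for 6 dB, are illustrative assumptions; the exact E_s/N_0 convention depends on the system model.)

```python
import numpy as np

def fp_map(E, beta, sigma0sq):
    # scalar specialization of (8.64) for t_k = 0 and w_ck = 1:
    # returns 1 / (sigma0^2 + beta * Int (1 - tanh(z sqrt(E) + E)) Dz)
    z = np.linspace(-10.0, 10.0, 4001)
    dz = z[1] - z[0]
    Dz = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    integral = float(np.sum((1.0 - np.tanh(z * np.sqrt(E) + E)) * Dz) * dz)
    return 1.0 / (sigma0sq + beta * integral)

def iterate(E0, beta, sigma0sq, iters=5000, tol=1e-13):
    E = E0
    for _ in range(iters):
        En = fp_map(E, beta, sigma0sq)
        if abs(En - E) < tol:
            return En
        E = En
    return E

beta, sigma0sq = 2.5, 10.0 ** (-0.6)   # load inside the coexistence region
E_lo = iterate(0.01, beta, sigma0sq)   # low-SINR initialization
E_hi = iterate(100.0, beta, sigma0sq)  # high-SINR initialization

# both limits are valid fixed points of the map; the map is monotone,
# so the ordering of the initializations is preserved
for E in (E_lo, E_hi):
    assert abs(E - fp_map(E, beta, sigma0sq)) < 1e-6
assert E_lo <= E_hi + 1e-8
```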
In order to observe a hysteresis behavior, we can expand our scope to neural networks. Consider a Hopfield neural network [22] implementation of
[Figure 2 plots the bit error probability (logarithmic scale, from 10^0 down to 10^{-4}) versus the load \beta (from 0 to 5); the regions labeled a) and b) are referred to in the text.]

Fig. 2. Bit error probability for the individually optimum detector with uniform binary prior distribution versus system load \beta for 10 \log_{10}(E_s/N_0) = 6 dB.
the individually optimum multiuser detector, which is an algorithm based on non-linear gradient search maximizing the energy function associated with the detector. Its application to the problem of multiuser detection is discussed in [23]. With an appropriate definition of the energy function, such a detector will achieve the performance of the upper curve in Fig. 2 in the large system limit. Thus, in the interval 1.73 \le \beta \le 1.986, where the free energy favors the curve with lower bit error probability, the Hopfield neural network is suboptimum (labeled with a).^e The curve labeled with b) can also be achieved by the Hopfield neural network, but only with the help of a genie. In order to achieve a point in that area, cancel with the help of a genie as many interferers as needed to push the load below the area where multiple solutions occur, i.e. \beta < 1.73. Then, initialize the Hopfield neural network with the received vector where the interference has been canceled and let it converge to the thermodynamic equilibrium. Then, slowly add back one by one the interferers you had canceled with the help of the genie, while the Hopfield neural network remains in the thermodynamic equilibrium by performing iterations. If all the interference suppressed by the genie has been added again, the targeted point on the lower curve in area b) is

^e Note that in a system with a finite number of users, the Hopfield neural network is suboptimal at any load.
reached. The Hopfield neural network follows the lower curve if interference is added, and it follows the upper curve if it is removed.
It should be remarked that a hysteresis behavior of the Hopfield neural network detector does not occur for all definitions of the energy function and all prior distributions of the data to be detected; additional conditions on the microscopic configuration of the system need to be fulfilled.
References
1. H. Nishimori, Statistical Physics of Spin Glasses and Information Processing (Oxford University Press, Oxford, U.K., 2001).
2. F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy (American Mathematical Society, Providence, RI, 2000).
3. S. Thorbjørnsen, Mixed moments of Voiculescu's Gaussian random matrices, J. Funct. Anal. 176 (2000) 213-246.
4. G. Caire, R. R. Muller and T. Tanaka, Iterative multiuser joint decoding: Optimal power allocation and low-complexity implementation, IEEE Trans. Inform. Theory 50(9) (2004) 1950-1973.
5. K. H. Fischer and J. A. Hertz, Spin Glasses (Cambridge University Press, Cambridge, U.K., 1991).
6. M. Mezard, G. Parisi and M. A. Virasoro, Spin Glass Theory and Beyond (World Scientific, Singapore, 1987).
7. T. Tanaka, A statistical mechanics approach to large-system analysis of CDMA multiuser detectors, IEEE Trans. Inform. Theory 48(11) (2002) 2888-2910.
8. T. Tanaka and D. Saad, A statistical-mechanics analysis of coded CDMA with regular LDPC codes, in Proc. of IEEE International Symposium on Information Theory (ISIT), Yokohama, Japan (June/July 2003), p. 444.
9. D. Guo and S. Verdu, Randomly spread CDMA: Asymptotics via statistical physics, IEEE Trans. Inform. Theory 51(6) (2005) 1983-2010.
10. R. R. Muller and W. H. Gerstacker, On the capacity loss due to separation of detection and decoding, IEEE Trans. Inform. Theory 50(8) (2004) 1769-1778.
11. R. R. Muller, Channel capacity and minimum probability of error in large dual antenna array systems with binary modulation, IEEE Trans. Signal Process. 51(11) (2003) 2821-2828.
12. T. Tanaka and M. Okada, Approximate belief propagation, density evolution, and statistical neurodynamics for CDMA multiuser detection, IEEE Trans. Inform. Theory 51(2) (2005) 700-706.
13. Y. Kabashima, A CDMA multiuser detection algorithm on the basis of belief propagation, J. Phys. A 36 (2003) 11111-11121.
14. H. Li and H. Vincent Poor, Impact of channel estimation errors on multiuser detection via the replica method, EURASIP J. Wireless Commun. Networking 2005(2) (2005) 175-186.
15. D. Guo, Performance of multicarrier CDMA in frequency-selective fading via statistical physics, IEEE Trans. Inform. Theory 52(4) (2006) 1765-1774.
16. C.-K. Wen and K.-K. Wong, Asymptotic analysis of spatially correlated MIMO multiple-access channels with arbitrary signaling inputs for joint and separate decoding, submitted to IEEE Trans. Inform. Theory (2004).
17. S. Verdu, Multiuser Detection (Cambridge University Press, New York, 1998).
18. S. V. Hanly and D. N. C. Tse, Resource pooling and effective bandwidth in CDMA networks with multiuser receivers and spatial diversity, IEEE Trans. Inform. Theory 47(4) (2001) 1328-1351.
19. V. L. Girko, Theory of Random Determinants (Kluwer Academic Publishers, Dordrecht, The Netherlands, 1990).
20. D. N. C. Tse and S. Verdu, Optimum asymptotic multiuser efficiency of randomly spread CDMA, IEEE Trans. Inform. Theory 46(7) (2000) 2718-2722.
21. D. Agrawal and A. Vardy, The turbo decoding algorithm and its phase trajectories, IEEE Trans. Inform. Theory 47(2) (2001) 699-722.
22. J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79 (1982) 2554-2558.
23. G. I. Kechriotis and E. S. Manolakos, Hopfield neural network implementation of the optimal CDMA multiuser detector, IEEE Trans. Neural Networks 7(1) (1996) 131-141.