Zhi Dong Bai, Random Matrix Theory and Its Applications
Zhi Dong Bai, Random Matrix Theory and Its Applications
Zhi Dong Bai, Random Matrix Theory and Its Applications
ITS APPLICATIONS
Multivariate Statistics and Wireless Communications
LECTURE NOTES SERIES
Institute for Mathematical Sciences, National University of Singapore
Published
Editors
Zhidong Bai
National University of Singapore, Singapore
and
Northeast Normal University, P. R. China
Yang Chen
Imperial College London, UK
Ying-Chang Liang
Institute for Infocomm Research, Singapore
World Scientific
NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TA I P E I CHENNAI
A-PDF Merger DEMO : Purchase from www.A-PDF.com to remove the watermark
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore
Vol. 18
RANDOM MATRIX THEORY AND ITS APPLICATIONS
Multivariate Statistics and Wireless Communications
Copyright 2009 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN-13 978-981-4273-11-4
ISBN-10 981-4273-11-2
Printed in Singapore.
CONTENTS
Foreword vii
Preface ix
Future of Statistics
Zhidong Bai and Shurong Zheng 69
v
This page intentionally left blank
March 10, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) foreword-vol18
FOREWORD
vii
This page intentionally left blank
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) preface-vol18
PREFACE
ix
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) preface-vol18
x Preface
Jack W. Silverstein
Department of Mathematics
North Carolina State University
Box 8205, Raleigh, North Carolina 27695-8205, USA
E-mail: [email protected]
1. Introduction
Let M(R) denote the collection of all subprobability distribution func-
tions on R. We say for {Fn } M(R), Fn converges vaguely to F
v
M(R) (written Fn F ) if for all [a, b], a, b continuity points of F ,
D
limn Fn {[a, b]} = F {[a, b]}. We write Fn F , when Fn , F are prob-
ability distribution functions (equivalent to limn Fn (a) = F (a) for all
continuity points a of F ).
For F M(R),
1
mF (z) dF (x), z C+ {z C : z > 0}
xz
1
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
2 J. W. Silverstein
b
1
= lim ddF (x)
0+ a (x )2 + 2
1 1 bx 1 ax
= lim Tan Tan dF (x)
0+
= I[a,b] dF (x) = F {[a, b]}.
(5) If, for x0 R, mF (x0 ) limzC+ x0 mF (z) exists, then F is dier-
entiable at x0 with value ( 1 )mF (x0 ) ([9]).
for all continuous f vanishing at , and the fact that an analytic function
dened on C+ is uniquely determined by the values it takes on S, we have
v
Fn F mFn (z) mF (z) for all z S.
1
tr (An zI)1 mF (z) a.s.
n
The main goal of the lectures is to show the importance of the Stielt-
jes transform to limiting behavior of certain classes of random matrices.
We will begin with an attempt at providing a systematic way to show a.s.
convergence of the e.d.f.s of the eigenvalues of three classes of large di-
mensional random matrices via the Stieltjes transform approach. Essential
properties involved will be emphasized in order to better understand where
randomness comes in and where basic properties of matrices are used.
Then it will be shown, via the Stieltjes transform, how the limiting dis-
tribution can be numerically constructed, how it can explicitly (mathemat-
ically) be derived in some cases, and, in general, how important qualitative
information can be inferred. Other results will be reviewed, namely the
exact separation properties of eigenvalues, and distributional behavior of
linear spectral statistics.
It is hoped that with this knowledge other ensembles can be explored
for possible limiting behavior.
Each theorem below corresponds to a matrix ensemble. For each one
the random quantities are dened on a common probability space. They all
assume:
n
For n = 1, 2, . . . Xn = (Xij ), n N , Xij
n
C, i.d. for all n, i, j, independent
1 1 2
across i, j for each n, E|X1 1 EX1 1 | = 1, and N = N (n) with n/N c > 0
as n .
(a) Tn = diag(tn1 , . . . , tnn ), tni R, and the e.d.f. of {tn1 , . . . , tnn } converges
weakly, with probability one, to a nonrandom probability distribution
function H as n .
v
(b) An is a random N N Hermitian random matrix for which F An A
where A is nonrandom (possibly defective).
(c) Xn , Tn , and An are independent.
4 J. W. Silverstein
n , H nonrandom.
Let Bn = (1/N )(Rn + Xn )(Rn + Xn ) where > 0, nonrandom.
Then, with probability one, F Bn F as n where for each z C+
D
m = mF (z) satisfies
1
m= 2 cm)z + 2 (1 c)
dH(t) . (1.3)
t
1+2 cm (1 +
Remark 1.4. In Theorem 1.1, if An = 0 for all n large, then mA (z) = 1/z
and we nd that mF has an inverse
1 t
z = +c dH(t). (1.4)
m 1 + tm
Since
n n 1/2
F (1/N )Xn Tn Xn = I[0,) + F (1/N )Tn Xn Xn Tn
1/2
1
N N
we have
1 n/N n
mF (1/N )Xn Tn Xn (z) = + m (1/N )Tn1/2 Xn X Tn1/2 (z) z C+ ,
z N F n
(1.5)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
so we have
1c
mF (z) = + cmF (z). (1.6)
z
Using this identity, it is easy to see that (1.2) and (1.4) are equivalent.
a A1 (a + b)
a (A + (a + b)(a + b) )1 = a A1 (a + b) A1
1 + (a + b) A1 (a + b)
1 + b A1 (a + b) 1 a A1 (a + b)
= a A b A1 .
1 + (a + b) A1 (a + b) 1 + (a + b) A1 (a + b)
1
= A1 (a + b)(a + b) A1 .
1 + (a + b) A1 (a + b)
Multiplying both sides on the left by a gives the result.
6 J. W. Silverstein
i
|i z|2
and
|e q|2
|1 + tq (B zI)1 q| |t|(q (B zI)1 q) = |t|z i
.
i
|i z|2
1 1
=1 1
1 + tq (A zI)1 q 1 + t(1/n)tr (A zI)1
1
1 .
1 + t mA+tqq (z)
Making this and other observations rigorous requires technical consid-
erations, the rst being truncation and centralization of the elements of
Xn , and truncation of the eigenvalues of Tn in Theorem 1.2 (not needed
in Theorem 1.1) and (1/n)Rn Rn in Theorem 1.3, all at a rate slower than
n (a ln n for some positive a is sucient). The truncation and centraliza-
tion steps will be outlined later. We are at this stage able to go through
algebraic manipulations, keeping in mind the above three lemmas, and in-
tuitively derive the equations appearing in each of the three theorems. At
the same time we can see what technical details need to be worked out.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
i=1
n
ti qi (B(i) zI)1 (An (z x)I)1 qi
= (1/N )
i=1
1 + ti qi (B(i) zI)1 qi
8 J. W. Silverstein
we have
n
ti
mF An (z xn ) mF Bn (z) = (1/N ) di (2.1)
i=1
1 + ti mF Bn (z)
where
1 + ti mF Bn (z)
di = q (B(i) zI)1 (An (z xn )I)1 qi
1 + ti qi (B(i) zI)1 qi i
1
|qi (B(i) zI)1 (An (zx(i) )I)1 qi tr (B(i) zI)1 (An (zx(i) )I)1 |]
N
0 as n .
D
Consider now a realization for which (2.2) holds, > 0, F Tn H, and
v
F An A. From Lemma 2.2 and (2.2) we have
max max[|mF Bn (z)mF B(i) (z)|, |mF Bn (z)qi (B(i) zI)1 qi |] 0, (2.3)
in
and subsequently
1 + ti mF Bn (z)
max max
1 , |x x(i) | 0. (2.4)
in 1 + ti q (B zI)1 qi
i (i)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
1
N
1
= 1 .
N j=1 1 + rj (B(j) zI)1 rj
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
10 J. W. Silverstein
Therefore
1
N
1
mn (z) = . (3.2)
N j=1 z(1 + rj (B(j) zI)1 rj )
N
Write Bn zI zmn (z)Tn zI = j=1 rj rj (zmn (z))Tn . Taking
inverses and using Lemma 2.1, (3.2) we have
(zmn (z)Tn zI)1 (Bn zI)1
N
= (zmn (z)Tn zI)1
rj rj (zmn (z))Tn (Bn zI)1
j=1
N
1
= (B 1 r )
(mn (z)Tn + I)1 rj rj (B(j) zI)1
j=1
z(1 + rj (j) zI) j
(1/N )(mn (z)Tn + I)1 Tn (Bn zI)1 .
Taking the trace and dividing by n we nd
1
N
1 1
(1/n)tr (zmn (z)Tn zI) mn (z) = dj
N j=1 z(1 + rj (B(j) zI)1 rj )
where
dj = qj Tn1/2 (B(j) zI)1 (mn (z)Tn + I)1 Tn1/2 qj
1
N
1
mn (z) = . (3.4)
N j=1 z(1 + (1/N )(rj + xj ) (B(j) zI)1 (rj + xj ))
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
1
N
Bn zI (Yn zI) = (rj + xj )(rj + xj ) Yn .
N j=1
The goal is to determine Yn so that each term goes to zero. Notice rst that
(1/n)xj (B(j) zI)1 (Yn zI)1 xj (1/n)tr (Bn zI)1 (Yn zI)1 ,
2 zmn (z)I.
|(1/n)xj Crj |2 (1/n2 )tr Crj rj C = (1/n2 )rj C Crj = o(1) (3.5)
a (A + (a + b)(a + b) )1
1 + b A1 (a + b) a A1 (a + b)
= 1
a A1 b A1 .
1 + (a + b) A (a + b) 1 + (a + b) A1 (a + b)
Identify a with (1/ N )rj , b with (1/ N )xj , and A with B(j) . Using
Lemmas 2.2, 2.3 and (3.5), we have
1 + 2 cn mn (z) 1
1 1
r (B(j) zI)1 (Yn zI)1 rj .
1+ N (rj+xj ) (B(j)zI) (rj+xj ) n j
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
12 J. W. Silverstein
Therefore
1 (1/n)rj (B(j) zI)1 (Yn zI)1 rj
N
1
= (1/n) tr (1/N )Rn Rn (Bn zI)1 (Yn zI)1 .
1 + 2 cn mn (z)
So we should take
1
Yn = (1/N )Rn Rn 2 zmn (z)I.
1+ 2 c n mn (z)
Then (1/n)tr (Yn zI)1 will approach the right hand side of (3.3).
1
= dA( ) .
t2 m
zc t
1+tm dH(t) i z + c |1+tm|2 dH(t)
Therefore
t2 m 1
m = z + c dH(t) t 2 dA( ) .
|1 + tm|2
z + c 1+tm dH(t)
(4.1)
+
Suppose m C also satises (1.1). Then
t
t
1+tm 1+tm dH(t)
mm = c t t dA( )
z + c 1+tm dH(t) z + c 1+tm dH(t)
t2
(m m)c dH(t) (4.2)
(1 + tm)(1 + tm)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
1
dA( ).
z+c t
1+tm dH(t) z+c t
1+tm dH(t)
t2
c dH(t)
(1 + tm)(1 + tm)
1
dA( )
z+c t
1+tm dH(t) z+c t
1+tm dH(t)
1/2
t2 1
c dH(t) 2 dA( )
|1 + tm|2
z + c 1+tm dH(t)
t
1/2
2
t 1
c dH(t) 2 dA( )
|1 + tm|2
z + c 1+tm dH(t)
t
1/2
2
t m
= c dH(t) t2 m
|1 + tm|2 z + c dH(t)
|1+tm|2
1/2
t2 m
c dH(t) t2 m < 1.
|1 + tm|2 z + c |1+tm|2 dH(t)
14 J. W. Silverstein
Let pn = P(|X11 | n). Since the second moment of X11 is nite we
have
1
=P I(|Xij |n) pn pn .
N n ij 2n
1
n
1 a.s.
rank(Tn T ) = I(|tni |>) cH{[, ]c }.
N N i=1
1
[tr (X n T X n )2 + 4tr (X n T X n X n T X n )
N
+ 4(tr (X n T X n X n T X n )tr (X n T X n )2 )1/2 ].
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
16 J. W. Silverstein
We have
tr (X n T X n )2 2 tr (X X )2
and
tr (X n T X n XT X ) (4 tr (X X )2 tr (X X )2 )1/2 .
Therefore, to verify
a.s.
D(F A+XT X , F A+XT X ) 0
it is sucient to nd a sequence {n } increasing to so that
1 1
4n tr (X X )2 0 and tr (X X )2 = O(1) a.s.
a.s.
N N
The details are omitted.
Notice the matrix diag(E|X 1 1 |2 tn1 , . . . , E|X 1 1 |2 tnn ) also satises assump-
of Theorem 1.1. Just substitute this matrix for Tn , and replace X n
tion (a)
by (1/ E|X 1 1 |2 )X n . Therefore we may assume
(1) Xij are i.i.d. for xed n,
(2) |X11 | a ln n for some positive a,
(3) EX11 = 0, E|X11 |2 = 1.
x 1/2
f (x) = g(t)dt
a
18 J. W. Silverstein
It is remarked here that similar results have been obtained for the ma-
trices in Theorem 1.3. See [4].
Explicit solutions can be derived in a few cases. Consider the Marcenko-
Pastur distribution, where Tn = I. Then m = m0 (x) solves
1 1
x= +c ,
m 1+m
resulting in the quadratic equation
xm2 + m(x + 1 c) + 1 = 0
with solution
(x + 1 c) (x + 1 c)2 4x
m=
2x
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
(x + 1 c) x2 2x(1 + c) + (1 c)2
=
2x
(x + 1 c) (x (1 c)2 )(x (1 + c)2 )
= .
2x
We see the imaginary part of m is zero when x lies outside the interval
[(1 c)2 , (1 + c)2 ], and we conclude that
(x(1 c)2 )((1+ c)2 x)
2cx x ((1 c)2 , (1 + c)2 )
f (x) =
0 otherwise.
The Stieltjes transform in the multivariate F matrix case, that is, when
Tn = ((1/N )X n X n )1 , X n n N containing i.i.d. standardized entries,
n/N c (0, 1), also satises a quadratic equation. Indeed, H now is
the distribution of the reciprocal of a Marcenko-Pastur distributed random
variable which well denote by Xc , the Stieltjes transform of its distribution
denoted by mXc . We have
1
1 Xc 1 1
x = + cE = + cE
m 1 + X1 m m Xc + m
c
1
= + cmXc (m).
m
From above we have
1 c (z + 1 c) + (z + 1 c)2 4z
mXc (z) = +
cz 2zc
z + 1 c + (z + 1 c )2 4z
=
2zc
(the square root dened so that the expression is a Stieltjes transform) so
that m = m0 (x) satises
1 m + 1 c + (m + 1 c)2 + 4m
x= +c .
m 2mc
It follows that m satises
20 J. W. Silverstein
0.7
0.6
0.5 1 3 10
.2 .4 .4
0.4 c=.05 n=200
0.3
0.2
0.1
...... ...................... ...................................................
0.0
0 2 4 6 8 10 12 14
Here the entries of Xn are N (0, 1). All the eigenvalues appear to stay close
to the limiting support. Such simulations were the prime motivation to
prove
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
Theorem 7.1. ([1]). Let, for any d > 0 and d.f. G, F d,G denote the lim-
iting e.d.f. of (1/N )Xn Tn Xn corresponding to limiting ratio d and limiting
F Tn G.
Assume in addition to the previous assumptions:
(a) EX11 = 0, E|X11 |2 = 1, and E|X11 |4 < .
(b) Tn is nonrandom and Tn is bounded in n.
(c) The interval [a, b] with a > 0 lies in an open interval outside the support
of F cn ,Hn for all large n, where Hn = F Tn .
Then
P(no eigenvalue of Bn appears in [a, b] for all large n) = 1.
when vn = N 1/68 .
(2) The proof of (1) allows (1) to hold for Im(z) = 2vn , 3vn , . . . ,
34vn . Then almost surely
max sup |mn (x + i kvn ) m0n (x + i kvn )| = o(vn67 ).
k{1,...,34} x[a,b]
(vn2 )2 d(F B n () F cn ,Hn ())
max sup 2 2 2 2 2 2
= o(vn66 )
k1,k2,k3
distinct x[a,b] ((x) +k 1 vn )((x) +k 2 vn )((x) +k3 vn )
..
.
(vn2 )33 d(F B n () F cn ,Hn ())
sup = o(vn66 ).
x[a,b] ((x)2 +v 2 )((x)2 +2v 2 ) ((x)2 +34v 2 )
n n n
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
22 J. W. Silverstein
Let 0 < a < a, b > b be such that [a , b ] is also in the open interval
outside the support of F cn ,Hn for all large n. We split up the integral and
get with probability one
I[a ,b ]c () d(F B n () F cn ,Hn ())
sup
x[a,b] ((x )2 + vn2 )((x )2 + 2vn2 ) ((x )2 + 34vn2 )
vn68
+ = o(1).
((xj ) +vn )((xj ) +2vn ) ((xj ) +34vn )
2 2 2 2 2 2
j [a,b ]
Now if, for each term in a subsequence satisfying the above, there is
at least one eigenvalue contained in [a, b], then the sum, with x evaluated
at these eigenvalues, will be uniformly bounded away from 0. Thus, at
these same x values, the integral must also stay uniformly bounded away
from 0. But the integral MUST converge to zero a.s. since the integrand is
bounded and with probability one, both F B n and F cn ,Hn converge weakly
to the same limit having no mass on {a , b }. Contradiction!
The last result is on the rate of convergence of linear statistics of the
eigenvalues of Bn , that is, quantities of the form
1
n
Bn
f (x)dF (x) = f (i )
n i=1
where f is a function dened on [0, ), and the i s are the eigenvalues of
Bn . The result establishes the rate to be 1/n for analytic f . It considers
integrals of functions with respect to
Gn (x) = n[F Bn (x) F cn ,Hn (x)]
where for any d > 0 and d.f. G, F d,G is the limiting e.d.f. of Bn =
1/2 1/2
(1/N )Tn Xn Xn Tn corresponding to limiting ratio d and limiting F Tn
G.
Let m = mF . Then
(a) {Mn (z)} forms a tight sequence for z in a suciently large contour
about the origin.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 01-Silverstein
24 J. W. Silverstein
2 4
(b) If X11 is complex with E(X11 ) = 0 and E(X11 ) = 2, then for z1 , . . . , zr
with nonzero imaginary parts,
(Re Mn (z1 ), Im Mn (z1 ), . . . , Re Mn (zr ), Im Mn (zr ))
converges weakly to a mean zero Gaussian vector. It follows that Mn ,
viewed as a random element in the metric space of continuous R2 -
valued functions with domain restricted to a contour in the complex
plane, converges weakly to a (2 dimensional) Gaussian process M . The
limiting covariance function can be derived from the formula
m (z1 )m (z2 ) 1
E(M (z1 )M (z2 )) = 2
.
(m(z1 ) m(z2 )) (z1 z2 )2
4
(c) If X11 is real and E(X11 ) = 3 then (b) still holds, except the limiting
mean can be derived from
3 t2 dH(t)
c m(1+tm) 3
EM (z) = m2 t2 dH(t) 2
1c (1+tm)2
and
1 1
r r2
k +k
r1 r2 1c 1 2
Cov(Xxr1 , Xxr2 ) = 2cr1 +r2
k1 k2 c
k1 =0 k2 =0
k1
r1
2r1 1 (k1 +
) 2r2 1 k2 +
.
r1 1 r2 1
=1
(see [5]).
References
1. Z. D. Bai and J. W. Silverstein, No eigenvalues outside the support of the lim-
iting spectral distribution of large-dimensional sample covariance matrices,
Ann. Probab. 26(1) (1998) 316345.
2. Z. D. Bai and J. W. Silverstein, CLT for linear spectral statistics of large
dimensional sample covariance matrices, Ann. Probab. 32(1A) (2004) 553
605.
3. R. B. Dozier and J. W. Silverstein, On the empirical distribution of eigen-
values of large dimensional information-plus-noise type matrices, J. Multi-
variate Anal. 98(4) (2007) 678694.
4. R. B. Dozier and J. W. Silverstein, Analysis of the limiting spectral distri-
bution of large dimensional information-plus-noise type matrices, J. Multi-
variate Anal. 98(6) (2007) 10991122.
5. D. Jonsson, Some limit theorems for the eigenvalues of a sample covariance
matrix, J. Multivariate Anal. 12(1) (1982) 138.
6. V. A. Marcenko and L. A. Pastur, Distribution of eigenvalues for some sets
of random matrices, Math. USSR-Sb. 1 (1967) 457483.
7. J. W. Silverstein, Strong convergence of the empirical distribution of eigen-
values of large dimensional random matrices, J. Multivariate Anal. 55(2)
(1995) 331339.
8. J. W. Silverstein and Z. D. Bai, On the empirical distribution of eigenvalues
of a class of large dimensional random matrices, J. Multivariate Anal. 54(2)
(1995) 175192.
9. J. W. Silverstein and S. I. Choi, Analysis of the limiting spectral distribution
function of large dimensional random matrices, J. Multivariate Anal. 54(2)
(1995) 295309.
10. Y. Q. Yin, Limiting spectral distribution for of random matrices, J. Multi-
variate Anal. 20 (1986) 5068.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
Peter J. Forrester
Department of Mathematics and Statistics
University of Melbourne, Victoria 3010, Australia
E-mail: [email protected]
1. Introduction
1.1. Log-gas systems
In equilibrium classical statistical mechanics, the control variables are the
absolute temperature T and the particle density . The state of a system can
be calculated by postulating that the probability density function (p.d.f.)
for the event that the particles are at positions ~r1 , . . . , ~rN is proportional to
the Boltzmann factor eU (~r1 ,...,~rN ) . Here U (~r1 , . . . , ~rN ) denotes the total
potential energy of the system, while := 1/kB T with kB denoting Boltz-
manns constant is essentially the inverse temperature. Then for the system
confined to a domain , the canonical average of any function f (~r1 , . . . , ~rN )
(for example the energy itself) is given by
Z Z
1
hf i := d~r1 d~rN f (~r1 , . . . , ~rN )eU (~r1 ,...,~rN ) ,
ZN
27
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
28 P. J. Forrester
where
Z Z
ZN = d~r1 d~rN eU (r1 ,...,rN ) . (1.1)
I I
In the so called thermodynamic limit, N, || , N/|| = fixed,
such averages can display non-analytic behavior indicative of a phase tran-
sition.
The most common situation is when the potential energy U consists of
a sum of one body and two body terms,
N
X X
U (~r1 , . . . , ~rN ) = V1 (~rj ) + V2 (|~rk ~rj |).
j=1 1j<kN
the system on the unit interval [1, 1] with fixed charges of strengths (a
1)/2 + 1/ and (b 1)/2 + 1/ at y = 1 and y = 1 respectively and a
neutralizing background
N
b (x) = , |x| < 1; (1.5)
1 x2
and the system on a unit circle with a uniform neutralizing background
density.
Physically one expects charged systems to be locally charge neutral.
Accepting this, the particle densities must then, to leading order, coincide
with the background densities. In the above examples, this implies that in
the bulk of the system the particle densities are dependent on N , which in
turn means that there is not yet a well defined thermodynamic limit. To
overcome this, note the special property of the logarithmic potential that it
is unchanged, up to an additive constant, by the scaling of the coordinates
xj 7 cxj . Effectively the density is therefore not a control variable, as it
determines only the length scale in the logarithmic potential.
Making use of (1.2), for the four systems the total energy of the system
can readily be computed (see [13] for details) to give the corresponding
Boltzmann factors. They are proportional to
N
Y Y
2
e(/2)xj |xk xj | (1.6)
j=1 1j<kN
N
Y Y
2
|xj | exj /2 |x2k x2j | (1.7)
j=1 1j<kN
N
Y Y
(1 xj )a/2 (1 + xj )b/2 |xk xj | (1.8)
j=1 1j<kN
Y
|eik eij | (1.9)
1j<kN
30 P. J. Forrester
XN
p U 1
= Lp where L = + . (1.11)
j=1
j j j
In general the steady state solution of this equation is the Boltzmann factor
eU ,
LeU = 0. (1.12)
N
U/2 U/2
X 1 2 U 2 1 2 U
e Le = 2 + , (1.13)
j=1
xj 4 xj 2 x2j
N N
X 2 X X
H= + v1 (xj ) + v2 (xj , xk ).
j=1
x2j j=1 1j<kN
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
where a0 = a+1/, b0 = b+1/. Thus all the pair potentials are proportional
to 1/r2 , where r is the separation between particles or between particles
and their images. Such quantum many body systems were first studied by
Calogero [5] and Sutherland [35].
It follows from (1.12) and (1.14) that eU/2 is an eigenfunction of
H with eigenvalue E0 . Since eU/2 is non-negative, it must in fact be the
ground state. This suggests considering a conjugation of the Schrodinger op-
erators with respect to this eigenfunction. Consider for definiteness (1.17).
A direct computation gives
32 P. J. Forrester
the quantity
Y
d0 := (i j + 1) + (0j i) (1.21)
(i,j)
()
There are two cases in which p Fq can be expressed in terms of ele-
mentary functions [24, 37]. These are the generalized binomial theorem
m
Y
()
1 F0 (a; x1 , . . . , xm ) = (1 xj )a (1.24)
j=1
known as the Morris integral. The Selberg correlation integral refers to the
generalizations
Z 1 Z 1
1
SN (t1 , . . . , tm ; 1 , 2 , 1/) := dx1 dxN
SN (1 , 2 , 1/) 0 0
N
Y m
Y Y 2/
xl 1 (1 xl )2 (1 tl0 xl ) |xj xk | ,
l=1 l0 =1 j<k
Z 1 Z 1
1
SN (t1 , . . . , tm ; 1 , 2 , 1/) := dx1 dxN
SN (1 + m, 2 , 1/) 0 0
N
Y m
Y Y
xl 1 (1 xl )2 (tl0 xl ) |xj xk |2/ ,
l=1 l0 =1 j<k
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
34 P. J. Forrester
(1/)
SN (t1 , . . . , tm ; 1 , 2 , 1/) = 2 F1 (N,
(N 1) + (1 + 2 + m + 1); (1 + m); t1 , . . . , tm ), (1.28)
and
(1/)
MN (t1 , . . . , tm ; a, b, 1/) = 2 F1 (N, b; (N 1)(1+a); t1 , . . . , tm ).
(1.29)
On the other hand, the generalized binomial expansion allows the gen-
()
eralized hypergeometric function 2 F1 in m variables to be expressed as
an m-dimensional integral, provided all the arguments are equal. Thus we
have [18, 15]
Z 1/2 Z 1/2 N
1 Y
dx1 dxN eixl (ab) |1 + e2ixl |a+b
MN (a, b, 1/) 1/2 1/2 l=1
Y
2ixl r 2ixk
(1 + te ) |e e2ixj |2/
1j<kN
1
()
= 2 F1 (r, b; (N 1) + a + 1; t1 , . . . , tN ) , (1.30)
t1 ==tN =t
Z 1 Z 1 N
1 Y
dx1 dxN xl 1 (1 xl )2 (1 txl )r
SN (1 , 2 , 1/) 0 0 l=1
Y 2/ () 1
|xj xk | = 2 F1 (r, (N 1) + 1 + 1;
j<k
2
(N 1) + 1 + 2 + 2; t1 , . . . , tN ) . (1.31)
t1 ==tN =t
In using (1.30) and (1.31) in (1.27) and (1.29) it may happen that the
parameters are such that the former are divergent. To overcome this, use can
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
()
be made of certain transformation formulas satisfied by the 2 F1 . One such
formula, which is restricted to cases in which the series (1.23) terminates,
is [14]
()
2 F1 (a, b; c; t1 , . . . , tm )
()
F (a, b; a + b + 1 + (m 1)/ c; 1 t1 , . . . , 1 tm )
= 2 1 . (1.32)
()
F
2 1 (a, b; a + b + 1 + (m 1)/ c; t 1 , . . . , t m )
t1 ==tm =1
Another, which generalizes one of the classical Kummer relations, reads [37]
()
2 F1 (a, b; c; t1 , . . . , tm )
m
Y () t1 tm
= (1 tj )a 2 F1 a, c b; c; ,..., . (1.33)
j=1
1 t1 1 tm
36 P. J. Forrester
(n) (r1 , . . . , rn )
n n
SN (1 + n, 2 , /2) Y 1 Y
= (N + n)n tk (1 tk )2 |tk tj |
SN +n (1 , 2 , /2)
k=1 j<k
(/2)
2 F1 (N, 2(1 + 2 + m + 1)/ + N 1; 2(1 + m)/; t1 , . . . , tn )
(1.36)
where
For n = 1 the arguments (1.37) are equal, and we have available the
-dimensional integral representation (1.30). The get a convergent integral
we must first apply the Kummer type transformation (1.33). Doing this
gives [13]
SN (1 + , 2 , /2)
(1) (r) = (N + 1)
SN +1 (1 , 2 , /2)
r1 (1 r)2
M (2(1 + 1)/ 1, 2(2 + 1)/ + N 1; 2/)
Z 1/2 Z 1/2
Y
dx1 dx eixl (2(1 2 )/) |1 + e2ixl |2(1 +2 +2)/+N 1
1/2 1/2 l=1
r ixl N Y
(eixl e ) |e2ixk e2ixj |4/ . (1.38)
1r
1j<k
The Boltzmann factor (1.7) with the change of variable x2j 7 xj , and
= a + 1 is said to specify the Laguerre -ensemble. It can be ob-
tained from the Jacobi -ensemble by the change of variables and limiting
procedure
where
Z Z N
Y Y
a/2 xj /2
Wa,,N = dx1 dxN xj e |xk xj | .
0 0 j=1 j<k
Applying the limiting procedure to (1.38) gives for the one-point correlation
(i.e. the particle density) with even the -dimensional integral represen-
tation
Z 1/2 Z 1/2
Wa,,N ra/2 er/2
(1) (r) = (N + 1) dx1 dx
Wa+2,,N +1 M (2/ 1, N, /2) 1/2 1/2
Y 2ixl
eixl (2/1N ) |1 + e2ixl |N +2/1 ere
l=1
Y
|e2ixk e2ixj |4/ . (1.41)
j<k
38 P. J. Forrester
where
Z N
Y Y
2
G,N = dx1 dxN exj /2 |xk xj | .
(,)N j=1 j<k
and taking the limit L . Applying this to (1.38) gives for the one-point
density [2]
2 Z
G,N er /2
(1) (r) = (N + 1) du1 du
G,N +1 G (,)
Y Y
2
(iuj + r)N euj |uk uj |4/ (1.46)
j=1 1j<k
where
Z
Y Y
2
G = dx1 dx exj |xk xj |4/ .
(,) j=1 j<k
and the angles have been scaled so that the circumference length of the
circle is equal to L. Use of (1.29) and the transformation formula (1.32)
shows [14] that for even
(N + n)n ((/2)!)N +n Y
(n) (r1 , . . . , rn ) = n
|e2irk /L e2irj /L |
L ((N + n)/2)!
1j<kn
Yn
iN (rk r1 )/L
MN (n/2, n/2, /2) e
k=2
(/2)
2 F1 (N, n; 2n; 1 t1 , . . . , 1 t(n1) ) (1.49)
where
tk := e2i(rj r1 )/L , k = 1 + (j 2), . . . , (j 1) (j = 2, . . . , n).
(1.50)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
40 P. J. Forrester
1+2/
Y
(1 (1 e2i(r1 r2 )/L )uj )N uj (1 uj )1+2/ |uk uj |4/ .
j<k
(1.51)
bulk
(n) (r1 , . . . , rn ) := lim (n) (r1 , . . . , rn )
N,L
N/L=
Y n
Y
= n cn () |2(rk rj )| ei(rk r1 )
1j<kn k=2
(/2)
1 F1 (n, 2n; 2i(r2 r1 ), . . . , 2i(rn r1 ))
(/2)
where in the argument of 1 F1 each 2i(rj r1 ) (j = 2, . . . , n) occurs
times, and
n1
Y (k/2 + 1)
cn () = (/2)n(n1)/2 ((/2)!)n .
((n + k)/2 + 1)
k=0
For the 2-point function, applying the limit to (1.51) gives [15]
42 P. J. Forrester
For the hard edge the necessary scaling is x 7 X/4N . We see from
(1.40) and (1.26) that
1 n
hard
(n) (X1 , . . . , Xn ) = lim (n) (X1 /4N, . . . , Xn /4N )
N 4N
n
Y Y
a/2
= An () Xj |Xk Xj |
j=1 1j<kn
(/2)
0 F1 (a + 2n; Y1 , . . . , Yn ) (1.55)
{Yj }7{Xj /4}
where
An () = 2n(2+a+(n1)) (/2)n(1+a+(n1))
((1 + /2))n
2n .
Y
(1 + a/2 + (j 1)/2)
j=1
where
1 ((1 + 2/))
a(c, ) = (1)(c1)/2 (2) .
2 4 ()
Similarly the hard edge scaled limits can be taken in the evaluations of
the distributions (1.43) and (1.45). Thus one finds [16]
(/2)
E (0; (0, s)) = es/8 0 F1
(2m/; x1 , . . . , xm )
xj =s/4
(/2)
p (0; s) = Am, sm es/8 0 F1
(2m/ + 2; ; x1 , . . . , xm )
xj =s/4
where
(1 + /2)
Am, = 4(m+1) (/2)2m+1 .
(1 + m)(1 + m + /2)
Note the similarity with (1.55) in the case n = 1. In particular we have
available m-dimensional integral representations.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
x
At the soft edge the appropriate scaling is x 7 2N + . Starting
2N 1/3
with the formula (1.46) one can show [7]
1 x
(1) 2N +
2N 1/3 2N 1/3
(1 + /2) 4 /2 Y (1 + 2/)
K, (x) + O(N 1/3 ) (1.57)
2 j=1
(1 + 2j/)
where
Z i Z i n
1 Y 3 Y
Kn, (x) := dv1 dvn evj /3xvj (vk vj )4/ .
(2i)n i i j=1 1k<ln
1 Tr(V (X))
e (2.1)
C
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
44 P. J. Forrester
For a quantum system which commutes with T of this latter form, the
2N 2N matrix X modelling the Hamiltonian must, in addition to being
Hermitian, have the property
X = Z2N XZ1
2N .
S I~ = O.
~
46 P. J. Forrester
48 P. J. Forrester
and
" n
#
X ui h u i
i
dvl (~rj ) = [dvi (~rj )]i,j=1,...,n
vl vj i,j=1,...,n
l=1 i,j=1,...,n
50 P. J. Forrester
52 P. J. Forrester
U = V T V, U = V DV
1 + i
ei = . (2.23)
1 i
From (2.22) the corresponding eigenvalues p.d.f. of {j } is
N
1 Y 1 Y
2 (N 1)/2+1
|k j | . (2.24)
C
l=1
(1 + l ) j<k
where 0 < j < 1 (j = 1, . . . , m). But it turns out that the details of
the calculation are quite tedious [13, 19]. In the case = 2 some alternative
derivations are possible [33, 6, 19], and a more general result can be derived.
54 P. J. Forrester
valid for Q Hermitian and Im() > 0, and the integration is over the space
of m m Hermitian matrices. In (2.30) the fact that U is unitary tells us
that
AA + CC = 1n1 . (2.33)
Following an idea of [38], we regard (2.33) as a constraint in the space of
general n1 n2 and n1 (N n2 ) complex rectangular matrices A and C,
which allows the distribution of A to be given by
Z
(AA + CC 1n2 )(dC). (2.34)
These are the four classical weight functions from orthogonal polynomial
theory, which can be characterized by the property that
d a(x)
log g(x) =
dx b(x)
where
56 P. J. Forrester
Proposition 3.1. Let wi2 [/2, 1] where [s, ] refers to the gamma
distribution, specified by the p.d.f. s xs1 ex/ /(s) (x > 0). Given
a1 > a 2 > > a N
the p.d.f. for the zeros of the random rational function
N
X qi
a
i=1
ai
is equal to
Y
(j k )
2 N N
+1 Y
ea /2 1j<kN +1 Y
Y |j ap |/21
((/2))N (aj ak )1 j=1 p=1
1j<kN
N +1 N
!!
1 X X
exp 2j a2j (3.4)
2 j=1 j=1
where
> 1 > a1 > 2 > > aN > N +1 > (3.5)
and
N
X +1 N
X
j = aj + a. (3.6)
j=1 j=1
58 P. J. Forrester
Hence after making use of the Cauchy double alternant identity the sought
Jacobian is seen to be equal to
Y
(ai aj )(i j )
YN
1i<jN
qj N
. (3.9)
Y
j=1
(ai j )
i,j=1
We must multiply (3.9) and (3.10), and write {qj } in terms of {ai , j }. By
equating the coefficients of 1/ on both sides of (3.3) and using (3.6) with
a = 0 we see
N N +1 N
!
X 1 X 2 X 2
qj = .
j=1
2 j=1 j j=1 j
QN
Further, we can read off i=1 qi from (3.7). Substituting we deduce (3.4).
Here the first equality follows from the spectral decomposition, while the
second follows from Cramers rule. Because the matrix (3.1) is real sym-
metric and thus orthogonally diagonalizable, we must have
N
X
2i = 1
j=1
and let {bj } be given. The roots of the random rational function
N
X j
,
j=1
x bj
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
60 P. J. Forrester
where
N
(x xl )
X j
= l=1 .
j=1
x bj YN
(x bl )
l=1
The result now follows immediately upon multiplying (3.19) with (3.9), and
substituting for j using (3.18).
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
((/2))N Y
= (bj bk )1 . (3.20)
(N /2)
1j<kN
where
N 1
Y (1 + (j + 1)/2)
N !mN () = (2)N/2 .
j=0
(1 + /2)
where
j wj2 /(w12 + + wN
2
), wj2 [/2, 1],
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
62 P. J. Forrester
The two recurrences together give the random coefficient three term recur-
rence
pN +1 (x) = (x a)pN (x) b2N pN (x). (3.22)
The three term recurrence (3.22) occurs in the study of tridiagonal ma-
trices. Thus consider a general real symmetric tridiagonal matrix
a 1 b1
b1 a 2 b2
b2 a 3
Tn = . (3.23)
..
. b n1
bn1 an
By forming 1n Tn and expanding the determinant along the bottom row
one sees
det(1n Tn ) = ( an ) det(1n1 Tn1 ) b2n1 det(1n2 Tn2 ).
Comparison with (3.22) shows the Gaussian -ensemble is realized by the
eigenvalue p.d.f. of random tridiagonal matrices with
aj N[0, 1] b2j [j/2, 1]. (3.24)
This result was first obtained using different methods in [9]. The present
derivation is a refinement of the approach in [20].
4. Laguerre Ensemble
A recursive construction of the Hermite ensemble was motivated by con-
sideration of a recursive structure inherent in the GOE. Likewise, to moti-
vate a recursive construction of the Laguerre ensemble we first examine
the case of the LOE. As noted in Section 2.6 this is realized by matrices
T
X(n) X(n) where X(n) is an n N rectangular matrix with Gaussian entries
N[0, 1]. Such matrices satisfy the recurrence
T T
X(n+1) X(n+1) = X(n) X(n) + ~x(1) ~x T(1) . (4.1)
This suggests inductively defining a sequence of N N positive definite
matrices indexed by (n) according to
A(n+1) = diag A(n) + ~x(1) ~x T(1) (4.2)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
where diag A(n) refers to the diagonal form of A(n) and A(0) = [0]N N .
Noting that A(n) will have N n zero eigenvalues, it is therefore necessary
to study the eigenvalues of the N N matrix
Y := diag(a1 , . . . , an , an+1 , . . . , an+1 ) + ~x~x T .
| {z }
N n
Since
det(1N Y ) = det(1N A) det(1N (1N A)1 ~x~x T )
it follows
N
X
x2j
n
det(1N Y ) X x2j j=n+1
=1 . (4.3)
det(1N A) j=1
aj an+1
One is thus led to ask about the density of zeros of the random rational
function
n+1
X wj2
1 , (4.4)
j=1
aj
where, since the sum of squares of Gaussian distributed variables are gamma
distributed variables,
wj2 [sj , 1]. (4.5)
Proposition 4.1. The zeros of the rational function (4.4) have p.d.f.
1 Pn+1
e j=1 (j aj )
(s1 ) (sn+1 )
n+1
Y (i j ) Y
s +s 1
|i aj |sj 1 (4.6)
(ai aj ) i j
i,j=1
1i<jn+1
where
1 > a1 > 2 > > n+1 > an+1 .
This result can be proved [20] by following the general strategy used to
establish Propositions 3.1 and 3.2.
The case of interest is
s1 = = sn = /2, sn+1 = (N n)/2, an+1 = 0. (4.7)
Let us denote (4.6) with these parameters by
G({j }j=1,...,n+1 ; {aj }j=1,...,n ) .
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
64 P. J. Forrester
Let the p.d.f. of {j }j=1,...,n+1 be denoted pn+1 ({aj }). For n < N the
recursive construction of {A(n) } gives that
Z
pn+1 ({j }) = da1 dan
1 >a1 >>n+1 >0
G({j }j=1,...,n+1 ; {aj }j=1,...,n )pn ({aj }) (4.8)
subject to the initial condition p0 = 1.
With = 1 the LOE recursion (4.1) tells us that the recurrence (4.8) is
satisfied by the eigenvalue p.d.f. for the non-zero eigenvalues of the Wishart
T
matrices X(n) X(n) . This in turn is equal to the eigenvalue p.d.f. of the full
T
rank matrices X(n) X(n) , which according to (2.18) is given by
n
1 Y (N n1)/2 l Y
pn ({j }) = l e |k j | (4.9)
Cn
l=1 1j<kn
(here the choice V (x) = x in (2.18) has been introduced to account for the
scale factor = 1 in the distribution [sj , ] used in (4.4)).
For general > 0, we want to check that (4.8) has as its solution
n
1 Y (N n+1)/21 l
Y
pn ({j }) = l e |k j | . (4.10)
Cn,
l=1 1j<kn
Since
G({j }j=1,...,n+1 ; {aj }j=1,...,n ) n+1
Y
(i j )
1 P
n ( a ) i<j
= e j=1 j j n+1
n
((/2))n ((N n)/2) Y
(ai aj )1
n+1 i<j
Y (N n)/2+1
i n
i=1
Y
(N n+1)/21
|i aj |/21
ai i,j=1
leaving us with
Cn+1, (/2)
pn+1 ({j }).
Cn, ((n + 1)/2)((N n)//2)
Thus (4.9) with
n
Y ((k + 1)/2)((N k)//2)
Cn, = (4.11)
(/2)
k=0
where
wj2 [/2, 1] (j = 1, . . . , n), 2
wn+1 [(N n)/2, 1].
In addition, as for the matrix MN introduced in (3.1), the matrix A(n) in
(4.2) must satisfy the first equality in (3.13), thus implying the companion
recurrence
n
pn1 () X j
= (4.13)
pn () j=1
xj
where
j wj2 /(w12 + + wn2 ).
Comparing (4.12) and (4.13) gives the three term recurrence with random
coefficients [20]
2
pn+1 () = ( wn+1 )pn () bn pn1 () (4.14)
where
2
wn+1 [(N n)/2, 1], bn [n/2, 1].
5. Recent Developments
The whole topic of explicit constructions of -random ensembles is recent,
with the first paper on the subject appearing in 2002 [9]. In that work the
motivation came from considerations in numerical linear algebra, whereby
the form of a GOE matrix after the application of Householder transfor-
mations to tridiagonal form was sought. In the case of unitary matrices,
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
66 P. J. Forrester
Acknowledgments
This work was supported by the Australian Research Council.
References
1. G. W. Anderson, A short proof of Selbergs generalized beta formula, Forum
Math. 3 (1991) 415417.
2. T. H. Baker and P. J. Forrester, The Calogero-Sutherland model and gener-
alized classical polynomials, Commun. Math. Phys. 188 (1997) 175216.
3. O. Bohigas, M. J. Giannoni and C. Schmit, Characterization of chaotic quan-
tum spectra and universality of level fluctuation laws, Phys. Rev. Lett. 52
(1984) 14.
4. J. Breuer, P. J. Forrester and U. Smilansky, Random discrete Schrodinger
operators from random matrix theory, arXiv:math-ph/0507036 (2005).
5. F. Calogero, Solution of the three-body problem in one dimension, J. Math.
Phys. 10 (1969) 21912196.
6. B. Collins, Product of random projections, Jacobi ensembles and universality
problems arising from free probability, Prob. Theory Rel. Fields 133 (2005)
315344.
7. P. Desrosiers and P. J. Forrester, Hermite and Laguerre -ensembles: Asymp-
totic corrections to the eigenvalue density, Nucl. Phys. B 743 (2006) 307332.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 02-Forrester
68 P. J. Forrester
FUTURE OF STATISTICS
1. Introduction
What are the future aspects of modern statistics and in which direction
will it develop? To answer this, we shall have a look at what has influenced
statistical research in recent decades. We strongly believe that, in every
discipline, the most impacting factor has been and still is the rapid
development and wide application of computer technology and computing
sciences. It has become possible to collect, store and analyze huge amounts
of data of large dimensionality. As a result, more and more measurements
are collected with large dimension, e.g. data in curves, images and movies,
and statisticians have to face the task of analyzing these data. But com-
puter technology also offers big advantages. We are now in a position to do
many things that were not possible 20 years ago, such as making spectral
decompositions of a matrix of order 1000 1000, searching patterns in a
DNA sequences and much more. However, it also confronts us with the big
69
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
challenge that classical limit theorems are no longer suitable to deal with
large dimensional data and we have to develop new limit theorems to cope
with this. As a result, many statisticians have now become interested in
this research topic.
Typically, large dimensional problems involve a large dimension p and
a small sample size n. However, in a real problem, they both are given
integers. It is then natural to ask for which size the dimension p has to be
taken as fixed or tending to infinity and what we should do if we cannot
justify p is fixed. Is it reasonable to claim p is fixed if the ratio of
dimension and sample size p/n is small, say, less than 0.001? If we cannot
say p is fixed, can any limit theorems be used for large dimensional data
analysis?
To discuss these questions, we shall provide some examples of multi-
variate analysis. We illustrate the difference between traditional tests and
the new approaches of large dimensional data by considering tests on the
difference of two population means and tests on the equality of a popula-
tion covariance matrix and a given matrix. By means of simulations, we
will show how the new approaches are superior to the traditional ones.
At present, large dimensional random matrix theory (RMT) is the only
systematic theory which is applicable to large dimensional problems. The
RMT is different from the classical limit theories because it is built on
the assumption that p/n y > 0 regardless what y is, provided where
it is applicable, say y (0, 1) for T 2 statistic. The RMT shows that the
classical limit theories behave very poorly or are even inapplicable to large-
dimensional problems, especially when the dimension is growing propor-
tionally with the sample size [see Dempster (1958), Bai (1993a,b, 1999),
Bai and Saranadasa (1996), Bai and Silverstein (2004, 2006), Bai and Yin
(1993), Bai, Yin and Krishnaiah (1988)]. In this paper, we will show how
to deal with large dimensional problems with the help of RMT, especially
the CLT of Bai and Silverstein (2004).
Future of Statistics 71
PNi P2 PNi
where xi = N1i j=1 xi,j , i = 1, 2, A = i=1 j=1 (xi,j xi )(xi,j xi )0
N1 N2
and = n N1 +N2 with n = N1 +N2 2. It is well known that, under the null
hypothesis, the T 2 statistic has an F distribution with degrees of freedom
p and n p + 1.
The advantages of the T 2 -test include the properties that it is invariant
under affine transformations, has an exact known null distribution, and is
most powerful when the dimension of data is sufficiently small compared
to its sample size. However, Hotellings test has the serious defect that the
T 2 statistic is undefined when the dimension of data is greater than the
sample degrees of freedom. Looking for remedies, Chung and Fraser (1958)
proposed a nonparametric test and Dempster (1958, 1960) discussed the
so-called non-exact significance test (NET). Dempster (1960) also con-
sidered the so-called randomization test. Not only being a remedy when
the T 2 is undefined, Bai and Saranadasa (1996) also found that, even if
T 2 is well defined, the NET is more powerful than the T 2 test when the
dimension is close to the sample degrees of freedom. Both, the T 2 test
and Dempsters NET, strongly rely on the normality assumption. Moreover,
Dempsters non-exact test statistic involves a complicated estimation of r,
the degrees of freedom for the chi-square approximation. To simplify the
testing procedure, a new method, the Asymptotic Normality Test (ANT), is
proposed in Bai and Sarahadasa (1996). It is proven there that the asymp-
totic power of ANT is equivalent to that of Dempsters NET. Simulation
results further show that the new approach is slightly more powerful than
Dempsters NET. We believe that the estimation of r and its rounding to
an integer in Dempsters procedure may cause an error of order O(1/n).
This might indicate that the new approach is superior to Dempsters test in
the second order term in some Edgeworth-type expansions (see Babu and
Bai (1993) and Bai and Rao (1991) for reference of Edgeworth expansions).
p yn
p
Lemma 1. We have np+1 F (p, n p + 1) = 1y + 2y/(1 y)3 nz +
n
o(1/ n), where yn = p/n, limn yn = y (0, 1) and z is the 1
quantile of the standard normal distribution.
Future of Statistics 73
PN PN P
where t = n[ln( n1 i=3 Qi )] i=3 ln Qi , w = 3i<jN ln sin2 ij and
ij is the angle between the vectors of yi , yj , 3 i < j N . Dempsters
test is then to reject H0 if F > F (r, nr).
By elementary calculus, we have
(tr())2 tr(2 )
r= and m = . (2.8)
tr(2 ) tr
From (2.8) and the Cauchy-Schwarz inequality, it follows that r p. On the
other hand, under regular conditions, both tr() and tr(2 ) are of the order
O(n), and hence, r is of the same order. Under wider conditions (2.12) and
(2.13) given in Theorem 6 below, it can be proven that r . Further, we
may prove that t (n/r)N (1, 1n ) and w n(n1)
2r
4
N (1, n(n1) 8
+ nr ). From
these estimates, one may conclude that both r1 and r2 are ratio-consistent
(in the sense that r/r 1). Therefore, the solutions of equations (2.6) and
(2.7) should satisfy
n
r1 = + O(1) (2.9)
t
and
1 n
r2 = + O(1), (2.10)
w 2
respectively. Since the random effect may cause anerror of order O(1), one
may simply choose the estimates of r as nt or w1 n2 .
To describe the asymptotic power function of Dempsters NET, we as-
sume that p/n y > 0, N1 /N (0, 1) and that the parameter r is
known. The reader should note that the limiting ratio y is allowed to be
greater than one in this case. When r is unknown, substituting r by the
estimators r1 or r2 may cause an error of high order smallness in the ap-
proximation of the power function of Dempsters NET. Similar to Lemma 1
one may show the following.
Lemma 5. When n, r ,
p
F (r, nr) = 1 + 2/rz + o(1/ r). (2.11)
Then we have the following approximation of the power function of Demp-
sters NET.
Theorem 6. If
0 = o( tr 2 ), (2.12)
max = o( tr 2 ), (2.13)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
Future of Statistics 75
If the underlying distributions are not normal but satisfy Conditions (a)
(c), one may show that
2
Var Mn = M (1 + o(1)). (2.18)
n2 1
Bn2 = tr Sn2 (tr Sn )2
(n + 2)(n 1) n
(x1 x2 )0 (x1 x2 ) tr Sn
Z= s
2(n + 1)n
tr Sn2 n1 (tr Sn )2
(n + 2)(n 1)
N1 N2
(x1 x2 )0 (x1 x2 ) tr Sn
= N r N (0, 1). (2.19)
2(n + 1)
Bn
n
Due to (2.19) the test rejects H0 if Z > z . Regarding the asymptotic power
of our new test, we have the following theorem.
Future of Statistics 77
the effect of high dimension visible. In fact, this has been shown by our
simulation for p = 4.
Dempsters test statistic depends on the choice of vectors h3 , h4 , . . . , hN
because different choices of these vectors would result in different estimates
of the parameter r. On the other hand, the estimation of r and the round-
ing of the estimates may cause an error (probably an error of second order
smallness) in Dempsters test. Thus, we conjecture that our new test can be
more powerful than Dempsters in their second terms of an Edgeworth type
expansion of their power functions. This conjecture was strongly supported
by our simulation results. Because our test statistic is mathematically sim-
ple, it is not difficult to get an Edgeworth expansion by using the results
obtain in Babu and Bai (1993), Bai and Rao (1991) or Bhattacharya and
Ghosh (1978). It seems difficult to get a similar expansion for Dempsters
test due to his complicated estimation of r.
We conducted a simulation study to compare the powers of the three
tests for both normal and non-normal cases with the dimensions N1 = 25,
N2 = 20, and p = 40. For the non-normal case, observations were generated
by the following moving average model. Let {Uijk } be a set of independent
gamma variables with shape parameter 4 and scale parameter 1. Define
where and the s are constants. Under this model, = (ij ) with
ii = 4(1 + 2 ), i,i+1 = 4 and ij = 0 for |i j| > 1. For the normal case,
the covariance matrices were chosen to be = Ip and = (1 )Ip + Jp ,
with = 0.5, where J is a p p matrix with all entries 1. A simulation
was also conducted for small p (chosen as p = 4). The tests were made for
size = 0.05 with 1000 repetitions.
The power was evaluated at standard
parameter = k1 2 k2 / tr 2 . The simulation for the non-normality
case was conducted for = 0, 0.3, 0.6 and 0.9 (Figure 1). All three tests
have almost the same significance level. Under the alternative hypothesis,
the power curves of Dempsters test and our test are rather close but that
of our test is always higher than Dempsters test. Theoretically, the power
function for Hotellings test should increase very slowly when the noncentral
parameter increases. This is also demonstrated by our simulation results.
The reader should note that there are only 1000 repetitions for each value
of thenoncentral parameter in our simulation which may cause an error
of 1/ 1000 = 0.0316 by the Central Limit Theorem. Hence, it is not sur-
prising that the simulated power function of the Hotellings test, whose
magnitude is only around 0.05, seems not to be increasing at some points
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
Future of Statistics 79
Fig. 1. Simulated powers of the three tests with multivariate Gamma distributions.
of the noncentral parameter. Similar tables are presented for the normal
case (Figure 2). For higher dimension cases the power functions of Demp-
sters test and our test are almost the same, and our method is not worse
than Hotellings test even for p = 4.
Fig. 2. Simulated powers of the three tests with multivariate normal distributions.
Future of Statistics 81
Type I errors
n = 500
p 5 10 50 100 300
T1 0.0567 0.0903 1.0 1.0 1.0
T2 0.0253 0.0273 0.1434 0.9523 1.0
n = 1000
p 5 10 50 100 300
T1 0.0530 0.0666 0.9830 1.0 1.0
T2 0.0255 0.0266 0.0678 0.4221 1.0
p/n = 0.05
(n, p) (250, 12) (500, 25) (1000, 50) (2000, 100) (6000,300)
T1 0.1835 0.5511 0.9830 1.0 1.0
T2 0.0306 0.0417 0.0678 0.1366 0.7186
The simulation results show that Type I errors for the classical methods T1
and T2 are close to 1 as p/n y (0, 1) or p/n is large. It shows that the
classical methods T1 and T2 behave very poorly and are even inapplicable
for the testing problems with large dimension or dimension increasing with
sample size.
Bai and Silverstein (2004) have revealed the reason of the above phe-
nomenon. They show that, with probability 1,
r
n
T1 = log(|Sn |) as p/n y (0, 1). (3.5)
2p
These two results show that Theorem 9 is not applicable when p is large
and we have to seek for new limit theorems to test the Hypothesis 3.1.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
2 1/2
where Ex11 = 0, E|x11 | = 1, Tp is p p random non-negative definite
1/2
being independent. Let F A denote
Hermitian with (x1 , . . . , xn ) and Tp
the empirical spectral distribution (ESD) of the eigenvalues of the square
matrix A, that is, if A is p p, then
(number of eigenvalues of A x)
F A (x) = .
p
Silverstein (1995) proved that under certain conditions, with probability 1,
F Bp tends to a limiting distribution, called the limiting spectral distribution
(LSD). To describe his result, we define the Stieltjes transform for the c.d.f.
G by
Z
1
mG (z) dG(), z C+ = {z : z C, =(z) > 0}. (3.7)
z
Let Hp = F Tp and H denote the ESD and limiting spectral distribution
(LSD) of Tp , respectively. Also, let F {y,H} denote the LSD of F Bp . Further,
let F {yp .Hp } denote the LSD F {y,H} with y = yp and H = Hp .
Let m() and mF {yp ,Hp } () denote the Stieltjes transforms of the c.d.f.s
{y,H}
F (1 y)I[0,+) + yF {y,H} and F {yp ,Hp } (1 yp )I[0,+) +
yp F {yp ,Hp }
, respectively. Clearly, F Bp = (1 yp )I[0,+) + yp F Bp is the
ESD of the matrix
1 n
Bp = xj T x k .
n j,k=1
Therefore, F {y.H} and m are the LSD of F Bp and its Stieltjes transform
and F {yp .Hp } and mF {yp .Hp } are the corresponding versions with y = yp
and H = Hp . Silverstein (1995) proved
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
Future of Statistics 83
which will be called a LSS, where Fn (x) is the ESD of the random matrix
computed from data and F (x) is the limiting spectral distribution (LSD)
D
of Fn F .
Bai and Silverstein (2004) established the following theorem.
Then
forms a tight sequence in p, where Gp (x) = p[F Bp (x) F {yp ,Hp } (x)].
(ii) If x11 is real and E(x411 ) = 3, then (3.11) converges weakly to a Gaus-
sian vector (Xf1 , . . . , Xfk ) with means
Z
I y m(z)3 t2 (1 + tm(z))3 dH(t)
1
EXf = f (z) Z 2 dz
2i
1 y m(z)2 t2 (1 + tm(z))2 dH(t)
(3.12)
and covariance function
II
1 f (z1 )g(z2 ) d d
Cov(Xf , Xg )= 2 m(z1 ) m(z2 )dz1 dz2
2 (m(z1 )m(z2 ))2 dz1 dz2
(3.13)
(f, g {f1 , . . . , fk }). The contours in (3.12) and (3.13) (two in (3.13),
which we may assume to be nonoverlapping) are closed and are taken in
the positive direction in the complex plane, each enclosing the support
of F c,H .
2 4
(iii) If X11 is complex with E(X11 ) and E(|X11 |) = 2, then (ii) also holds,
except the means are zero and the covariance function is 12 of the func-
tion given in (3.13).
Future of Statistics 85
Z b(y
Z p) q
yp ,Hp f (x)
f (x)dF (x) = (b(yp ) x)(x a(yp ))dx,
2yp x
a(yp )
b(y)
Z
f (a(y)) + f (b(y)) 1 f (x)
EXf = p dx
4 2 4y (x 1 y)2
a(y)
and
I I
1 f (z(m1 )) g(z(m2 ))
Cov(Xf , Xg ) = 2 dm1 dm2 ,
2 (m1 m2 )2
where
yp = p/n y (0, 1),
1 y
z(mi ) = + f or i = 1, 2,
mi 1 + mi
a(yp ) = (1 yp )2 , b(yp ) = (1 + yp )2
where
Zb
x log(x) 1 p
d2 (y) = (b(y) x)(x a(y))dx
2yx
a
y1
= 1 log(1 y) < 0.
y
This limit theoretically confirms our findings that the classical methods
of using T1 and T2 will lead to a very serious error in large-dimensional
testing problems (3.1), that is, the Type I errors is almost 1. It suggests
that one has to find a new normalization of the statistics T1 and T2 such
that the hypothesis H0 can be tested by the newly normalized versions of
T1 and T2 .
Applying Theorem 12 to T1 and T2 , we have the following theorem.
where
yp 1
d1 (yp ) = log(1 yp ) 1,
yp
log(1 yp )
1 (yp ) = ,
2
12 (yp ) = 2 log(1 yp ),
yp 1
d2 (yp ) = 1 log(1 yp ),
yp
log(1 yp )
2 (yp ) = ,
2
22 (yp ) = 2 log(1 yp ) 2yp .
Future of Statistics 87
T2 =n (tr(Sn ) log(|Sn |)p)0.975 or 0.025 (two sided)
Method 2 T2 = n (tr(Sn ) log(|Sn |) p) 0.05 (reject left)
T2 = n (tr(Sn ) log(|Sn |) p) 0.95 (reject right)
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
| log(|Sn |) p d1 (yp ) 1 (yp )|
T3 = 0.975 (two-sided)
1 (yp )
(log(|Sn |) p d1 (yp ) 1 (yp ))
Method 3 T3 = 0.05 (reject left)
1 (yp )
(log(|Sn |) p d1 (yp ) 1 (yp ))
T3 = 0.95 (reject right)
1 (yp )
| tr(Sn ) log(|Sn |) p p d2 (yp ) 2 (yp )|
T4 = 0.975
2 (yp )
(two-sided)
T = (tr(Sn ) log(|Sn |) p p d2 (yp ) 2 (yp ))
4 0.05
Method 4 2 (yp )
(reject left)
(tr(Sn ) log(|Sn |) p p d2 (yp ) 2 (yp ))
T4 = 0.95
2 (yp )
(reject right)
where 0.975 , 0.05 and 0.95 are the 97.5%, 5% and 95% quantiles of
N (0, 1); 0.975 , 0.05 and 0.95 are the 97.5%, 5% and 95% quantiles of
2p(p+1)/2 .
Thirdly, samples X1 , . . . , Xn are drawn from the population
N (0p , pp ). To compute Type I errors, we draw samples X1 , . . . , Xn from
N (0p , Ipp ), and, to compute powers, we take samples X1 , . . . , Xn from
N (0p , pp ) where = (ij )pp
(
1, i=j
ij =
0.05, i 6= j
Future of Statistics 89
(n = 500, p = 300)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 1.0 0.0 1.0 1.0 0.0 1.0
Method 3 0.0513 0.0508 0.0528 1.0 1.0 0.0
Method 4 0.0507 0.0521 0.0486 1.0 0.0 1.0
(n = 500, p = 100)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.9523 0.0 0.9753 1.0 0.0 1.0
Method 3 0.0516 0.0514 0.0499 0.9969 1.0 0.0
Method 4 0.0516 0.0488 0.0521 1.0 0.0 1.0
(n = 500, p = 50)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 1.0 1.0 0.0 1.0 1.0 0.0
Method 2 0.1484 0.0064 0.2252 1.0 0.0 1.0
Method 3 0.0488 0.0471 0.0504 0.7850 0.8660 0.0
Method 4 0.0515 0.0494 0.0548 1.0 0.0 1.0
(n = 500, p = 10)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0903 0.1406 0.0136 0.1712 0.2610 0.0023
Method 2 0.0546 0.0458 0.0538 0.8985 0.0 0.9391
Method 3 0.0507 0.0524 0.0489 0.0732 0.1169 0.0168
Method 4 0.0585 0.0441 0.0668 0.9252 0.0 0.9470
(n = 500, p = 5)
Type I error Power
Two-sided Reject left Reject right Two-sided Reject left Reject right
Method 1 0.0567 0.0777 0.0309 0.0651 0.1038 0.0190
Method 2 0.0506 0.0489 0.0511 0.4169 0.0014 0.5188
Method 3 0.0507 0.0517 0.0497 0.0502 0.0695 0.0331
Method 4 0.0625 0.0368 0.0807 0.5237 0.0007 0.5940
Furthermore, when the ratio p/n y (0, 1), even if y is very small,
Type I errors of testing Methods 1 and 2 still tend to 1 as the sample size
is becoming large.
(2) Under all choices of (n, p), powers of testing Methods 2 and 4 are
much higher than those of testing Methods 1 and 3, respectively. Moreover,
almost all powers of testing Method 4 are higher than others.
May 5, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 03-BaiZhidong
Future of Statistics 91
(n = 6000, p = 300)
                  Type I error                            Power
              Two-sided  Reject left  Reject right   Two-sided  Reject left  Reject right
Method 1        1.0        1.0          0.0            1.0        1.0          0.0
Method 2        0.7186     0.0          0.8131         1.0        0.0          1.0
Method 3        0.0476     0.0465       0.0469         1.0        1.0          0.0
Method 4        0.0505     0.0525       0.0466         1.0        0.0          1.0

(n = 2000, p = 100)
                  Type I error                            Power
              Two-sided  Reject left  Reject right   Two-sided  Reject left  Reject right
Method 1        1.0        1.0          0.0            1.0        1.0          0.0
Method 2        0.1366     0.0062       0.2144         1.0        0.0          1.0
Method 3        0.0501     0.0506       0.0515         1.0        1.0          0.0
Method 4        0.0525     0.0505       0.0531         1.0        0.0          1.0

(n = 1000, p = 50)
                  Type I error                            Power
              Two-sided  Reject left  Reject right   Two-sided  Reject left  Reject right
Method 1        0.9830     0.9915       0.0            1.0        1.0          0.0
Method 2        0.0778     0.0179       0.1166         1.0        0.0          1.0
Method 3        0.0471     0.0495       0.0499         0.9779     0.9886       0.0
Method 4        0.0524     0.0473       0.0575         1.0        0.0          1.0

(n = 500, p = 25)
                  Type I error                            Power
              Two-sided  Reject left  Reject right   Two-sided  Reject left  Reject right
Method 1        0.5511     0.6653       0.0            0.9338     0.9656       0.0
Method 2        0.0817     0.0313       0.0765         1.0        0.0          1.0
Method 3        0.0518     0.0539       0.0506         0.2824     0.3948       0.0013
Method 4        0.0552     0.0472       0.0558         1.0        0.0          1.0

(n = 250, p = 12)
                  Type I error                            Power
              Two-sided  Reject left  Reject right   Two-sided  Reject left  Reject right
Method 1        0.1835     0.2729       0.0033         0.3040     0.4151       0.0006
Method 2        0.0612     0.0442       0.0606         0.6129     0.0002       0.7141
Method 3        0.0483     0.0499       0.0486         0.0670     0.1089       0.0183
Method 4        0.0574     0.0507       0.0617         0.6369     0.0003       0.7192
(3) Comparing the Type I errors and powers for all choices of (n, p), testing Method 4 has better Type I errors together with higher powers. Although Method 2 has high powers, its Type I errors can be close to 1. Although Method 3 has accurate Type I errors, its powers are lower than those of Method 4.
4. Conclusions
In this chapter, both theoretically and by simulation, we have shown that classical approaches to hypothesis testing do not apply to large-dimensional problems and that the newly proposed methods perform much better than the classical ones. It is interesting that the new methods do not perform much worse than the classical methods in small-dimensional cases. Therefore, we would strongly recommend the new approaches even for moderately large dimensional cases, provided that p ≥ 4 or 5, regardless of the ratio between dimension and sample size.
We would also like to emphasize that large data dimension may cause low efficiency of classical inference methods. In such cases, we would strongly recommend non-exact procedures with high efficiency, such as Dempster's NET and Bai and Saranadasa's ANT, rather than the classical procedures with low efficiency.
Acknowledgment
The authors of this chapter would like to express their thanks to Dr. Adrian
Roellin for his careful proofreading of the chapter and valuable comments.
Antonia M. Tulino
Dip. di Ing. Elettronica e delle Telecomunicazioni
Università degli Studi di Napoli Federico II
Via Claudio 21, Napoli, Italy
E-mail: [email protected]
1. Introduction
The first studies of random matrices stemmed from multivariate statistical analysis at the end of the 1920s, primarily with the work of Wishart (1928) on fixed-size matrices with Gaussian entries. After a slow start, the subject gained prominence when Wigner introduced the concept of statistical distribution of nuclear energy levels in 1950. In the past half century, classical random matrix theory has been developed, widely and deeply, into a huge body of results, effectively used in many branches of physics and mathematics. Of late, random matrices have attracted great interest in the engineering community because of their applications in the context of information theory and signal processing, which include, among others, wireless communication channels, learning and neural networks, capacity of ad hoc networks, and direction-of-arrival estimation in sensor arrays.
The earliest applications to wireless communication were the pioneering works of Foschini and Telatar in the mid-90s on characterizing the capacity of multi-antenna channels. With works like [4-6], which initially called attention to the effectiveness of asymptotic random matrix theory in wireless communication theory, interest in random matrices took hold in the communications community, and the singular-value densities of random matrices and their asymptotics, as the matrix size tends to infinity, became an active research area in information/communication theory. In the last few years a considerable body of results on the fundamental information-theoretic limits of various wireless communication channels, making substantial use of asymptotic random matrix theory, has emerged in the communications and information theory literature. For an extended survey of these results, see [3].
In the same way that the original contributions of Wishart and Wigner were motivated by their applications, such is also the driving force behind the efforts of information theorists and engineers. The Shannon and η transforms, introduced for the first time in [1, 2], are prime examples: these transforms, which were motivated by the application of random matrix theory to various problems in the information theory of noisy communication channels [3], characterize the spectrum of a random matrix while providing direct engineering insight.
In this paper, using the η and Shannon transforms of the singular-value distributions of large-dimensional random matrices, we characterize, for both the ergodic and the non-ergodic regime, the fundamental limits of a general class of noisy multi-input multi-output (MIMO) wireless channels described by random matrices that admit various statistical descriptions depending on the actual application. For these channels, a number of examples and asymptotic closed-form expressions of their fundamental limits are provided. For both the ergodic and non-ergodic regimes, we illustrate the power of random matrix results in the derivation of the fundamental limits of wireless channels, and we show the applicability of our results to real-world problems, where the asymptotic behaviors are shown to be excellent approximations of the behavior of actual systems with very modest numbers of antennas.
The empirical spectral distribution (ESD) of an N × N Hermitian matrix A is defined as
 F_A^N(x) = (1/N) Σ_{i=1}^{N} 1{ λ_i(A) ≤ x }   (3.1)
where λ_1(A), ..., λ_N(A) are the eigenvalues of A and 1{·} is the indicator function. If F_A^N(·) converges almost surely (a.s.) as N → ∞, then the corresponding limit (asymptotic ESD) is denoted by F_A(·).
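As a quick numerical illustration of (3.1), the following Python sketch computes the ESD of one realization of HH† for H with i.i.d. complex entries of variance 1/N and compares it with the Marcenko-Pastur limit for aspect ratio β = K/N ≤ 1; the closed-form bulk density is standard, while the matrix sizes and evaluation points are arbitrary choices of ours.

import numpy as np

rng = np.random.default_rng(1)

N, K = 600, 300
beta = K / N
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
eigs = np.linalg.eigvalsh(H @ H.conj().T)        # N eigenvalues; N-K of them are ~0

a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
x = np.linspace(a, b, 2001)
bulk = np.sqrt((b - x) * (x - a)) / (2 * np.pi * x)     # Marcenko-Pastur bulk density of HH^dagger
F_mp = (1 - beta) + np.concatenate(([0.0], np.cumsum((bulk[1:] + bulk[:-1]) / 2 * np.diff(x))))

for q in (0.5, 1.0, 1.5, 2.0):
    F_emp = np.mean(eigs <= q)                   # empirical F^N_{HH^dagger}(q), cf. (3.1)
    print(f"x = {q:3.1f}   ESD = {F_emp:.3f}   limit = {np.interp(q, x, F_mp):.3f}")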
The first performance measure that we are going to consider is the mutual information. The mutual information, first introduced by Shannon in 1948, determines the maximum amount of data per unit bandwidth (in bits/s/Hz) that can be reliably transmitted over a given channel.
 SNR = N E[‖x‖²] / ( K E[‖n‖²] ) ,   (3.6)
 S_∞ = lim_{SNR→∞} I(SNR) / log SNR   (3.8)
which for most channels gives S_∞ = min{K/N, 1}, and the power offset
 L_∞ = lim_{SNR→∞} ( log SNR − I(SNR)/S_∞ )   (3.9)
 MMSE(SNR) = (1/K) min_{M ∈ C^{K×N}} E[ ‖x − My‖² ]   (3.10)
           = (1/K) tr{ (I + SNR H†H)⁻¹ }   (3.11)
           = (1/K) Σ_{i=1}^{K} 1/(1 + SNR λ_i(H†H))   (3.12)
           = ∫_0^∞ 1/(1 + SNR x) dF^K_{H†H}(x)
           = (N/K) ∫_0^∞ 1/(1 + SNR x) dF^N_{HH†}(x) − (N − K)/K   (3.13)
where the expectation in (3.10) is over x and n, while (3.13) follows from
 N F^N_{HH†}(x) − N u(x) = K F^K_{H†H}(x) − K u(x)   (3.14)
and
 (d/d SNR) log_e det(I + SNR HH†) = [ K − tr{ (I + SNR H†H)⁻¹ } ] / SNR .
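The chain of identities above is easy to verify numerically. The short Python sketch below checks (3.11)-(3.12) and the derivative identity following (3.14) for one realization of an N × K matrix H with i.i.d. real entries of variance 1/N; the dimensions and the SNR are arbitrary.

import numpy as np

rng = np.random.default_rng(2)

N, K, SNR = 64, 32, 4.0
H = rng.standard_normal((N, K)) / np.sqrt(N)
G = H.T @ H                                                   # H^dagger H (K x K)

mmse_trace = np.trace(np.linalg.inv(np.eye(K) + SNR * G)) / K         # (3.11)
mmse_eigs = np.mean(1.0 / (1.0 + SNR * np.linalg.eigvalsh(G)))        # (3.12)
print(mmse_trace, mmse_eigs)                                  # identical up to round-off

# d/dSNR log_e det(I + SNR*HH^dagger) = (K - tr[(I + SNR*H^dagger H)^(-1)]) / SNR
eps = 1e-6
logdet = lambda s: np.linalg.slogdet(np.eye(N) + s * H @ H.T)[1]
lhs = (logdet(SNR + eps) - logdet(SNR - eps)) / (2 * eps)
rhs = (K - K * mmse_trace) / SNR
print(lhs, rhs)                                               # agree to ~1e-6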
All these properties are very attractive in terms of analysis but are also of paramount importance at the design level. In fact:
ᵃ It is worth emphasizing that, in many cases, resorting to the expected value of the mutual information is motivated by the stronger consideration that, in problems such as aperiodic DS-CDMA or multi-antenna communication with an ergodic channel, it is precisely the expected capacity that has real operational meaning.
Fig. 1. Several realizations of the left-hand side of (3.3) are compared to the asymptotic limit in the right-hand side of (4.10), in the case of β = 1, for N = 3, 5, 15, 50 (four panels, one per value of N; horizontal axis: SNR from 0 to 10).
with the aid of the matrix inversion lemma. Normalized by the single-user signal-to-noise ratio (SNR ‖h_k‖²), SINR_k gives the so-called MMSE multiuser efficiency, denoted by η_k^MMSE(SNR) [4]:
 η_k^MMSE(SNR) = SINR_k / ( SNR ‖h_k‖² ) .   (3.17)
with
 F(x, z) = ( √( x(1 + √z)² + 1 ) − √( x(1 − √z)² + 1 ) )² .
However, as has been well known since the work of Marcenko and Pastur [19], it is rarely the case that the limiting empirical distribution of the squared singular values of random matrices (whose aspect ratio converges to a constant) admits a closed-form expression. Rather, [19] showed a very general result in which the characterization of the solution is accomplished through a fixed-point equation involving the Stieltjes transform. This result was later strengthened in [20]. Consistent with our emphasis, this result is formulated in terms of the η-transform, rather than the Stieltjes transform used in [20], as follows:
 W = W_0 + STS†   (4.11)
= 0 () (4.12)
= 0 () (1 T ( )) (4.13)
HH () = D ( d ()) (4.18)
1
1 a.s. t ()
hj I + h h hj (4.19)
hj 2 E[D]
=j
a.s. t (SNR)
. (4.23)
SNR E[D]
ᵇ The conventional notation for multiuser efficiency is η (cf. [4]); the relationship in (5.6) is the motivation for the choice of the η-transform terminology introduced in this section.
with
P[T=0]D
E log VT () > 1
e
L = E log T eD = 1 (4.26)
E log T 1 V P[T=0] < 1
e D
The foregoing result gives the power oset (3.9) of the linear vector
memoryless channel in (2.1) when H is dened as in Theorem 4.7.
1
K
lim 1{Pi,j }
K K
j=1
 lim_{N→∞} (1/N) Σ_{i=1}^{N} P_{i,j} = lim_{K→∞} (1/K) Σ_{j=1}^{K} P_{i,j} .   (4.29)
 v^N : [0, 1) × [0, 1) → R
HH () = E [ HH (X, ) ] (4.34)
(y,)
As K, N , (N ) converges a.s. to E[v(X,y)] , with (y, ) solution to the
fixed-point equation
v(X, y)
(y, ) = E y [0, 1]. (4.37)
v(X,Y)
1 + E 1+ (Y,) |X
P[E[v(X, Y)|Y] = 0]
= ,
P[E[v(X, Y)|X] = 0]
we have that
VHH ()
lim log() = L
min{P[E[v(X, Y)|Y] = 0], P[E[v(X, Y)|X] = 0]}
with
1 v(X , Y )
E log E |X E [log (1 + (Y ))] > 1
e 1 + (Y )
a.s. E log v(X , Y ) = 1
L
e
(Y ) v(X , Y )
E
1
| < 1
log E log 1 + E X
e (Y )
with X and Y the restrictions of X and Y to the events E[v(X, Y)|X]=0 and
E[v(X, Y)|Y]=0, respectively. The function () is the solution, for >1, of
1 v(X , y)
(y) = E
(4.39)
v(R , Y )
E |X
1 + (Y )
whereas () is the solution, for <1, of
1
E
= 1 . (4.40)
v(X , Y )
1+E |X
(Y )
As we will see in the next section, Theorems 4.14-4.17 give the MMSE performance, the mutual information and the power offset of a large class of vector channels of interest in wireless communications, which are described by random matrices with either correlated or independent entries.
Let STS† be an N × N random matrix, with S and T respectively N × K and K × K random matrices as stated in Theorem 4.6. We have seen that the ESD of STS† converges a.s. to a nonrandom limit whose Shannon and η transforms satisfy (4.15) and (4.14), respectively.
 (1/N) Σ_{i=1}^{N} g(λ_i)   (4.41)
with g(·) a continuous function on the real line with bounded and continuous derivatives, converge a.s. to a nonrandom quantity. A recent central result in random matrix theory by Bai and Silverstein (2004) [17] shows their rate of convergence to be 1/N. Moreover, they show that:
Theorem 4.18 ([17]). Let S be an N × K complex matrix defined as in Theorem 4.6 and such that its (i, j)th entry satisfies
 E[S_{i,j}] = 0 ,   E[|S_{i,j}|⁴] = 2/N² .   (4.42)
Let T be a K × K matrix defined as in Theorem 4.6 whose spectral norm is bounded. Let g(·) be a continuous function on the real line with bounded and continuous derivatives, analytic on an open set containing the intervalᶜ
 [ lim inf_K λ_min(T) max²{0, 1 − √β} ,  lim sup_K λ_max(T) (1 + √β)² ] .
From Jensen's inequality and (4.14), a tight lower bound for the variance of Δ in Theorem 4.19 is given by [3, Eq. 2.239]:
 E[Δ²] ≥ −log( 1 − (1 − η_{STS†}(SNR))² / β )   (4.47)
with strict equality if T = I. In fact, Theorem 4.19 can be particularized to the case T = I to obtain:
Theorem 4.20. Let S be an N × K complex matrix as in Theorem 4.19. As K, N → ∞ with K/N → β, the random variable
 Δ_N = log det(I + SNR SS†) − N V_{SS†}(SNR)   (4.48)
is asymptotically zero-mean Gaussian with variance
 E[Δ²] = −log( 1 − (1 − η_{SS†}(SNR))² / β )   (4.49)
where η_{SS†}(·) and V_{SS†}(·) are given in (4.9) and (4.10).
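A Monte Carlo experiment makes the Gaussian behavior of Theorem 4.20 tangible. In the Python sketch below, S has i.i.d. complex Gaussian entries of variance 1/N (so that E|S_ij|⁴ = 2/N²); the closed forms used for V_{SS†}, η_{SS†} and the variance follow the reconstructions (4.9), (4.10) and (4.49) adopted above and should be read as assumptions of this sketch (natural logarithms throughout).

import numpy as np

rng = np.random.default_rng(3)

N, K, SNR, trials = 200, 100, 10.0, 500
beta = K / N

F = (np.sqrt(SNR * (1 + np.sqrt(beta)) ** 2 + 1) - np.sqrt(SNR * (1 - np.sqrt(beta)) ** 2 + 1)) ** 2
eta = 1 - F / (4 * SNR)                                    # eta-transform of SS^dagger (assumed)
V = beta * np.log(1 + SNR - F / 4) + np.log(1 + SNR * beta - F / 4) - F / (4 * SNR)

delta = np.empty(trials)
for t in range(trials):
    S = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    delta[t] = np.linalg.slogdet(np.eye(N) + SNR * S @ S.conj().T)[1] - N * V

print("mean(Delta_N)             :", delta.mean())         # close to 0
print("empirical var(Delta_N)    :", delta.var())
print("-log(1 - (1-eta)^2 / beta):", -np.log(1 - (1 - eta) ** 2 / beta))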
5.1. CDMA
An application for which these tools are very well suited is the code-division multiple-access (CDMA) channel, where each user is assigned a signature vector, known at the receiver, which can be seen as an element of an N-dimensional signal space. Based on the nature of this signal space we can distinguish between:
Direct-sequence CDMA, used in many current cellular systems (IS-95, cdma2000, UMTS);
Multi-carrier CDMA, being considered for the fourth generation of cellular systems.
S = [ s1 | . . . |sK ] (5.1)
y = SAx + n. (5.2)
The standard random signature model [4] assumes that the entries of S are chosen independently and equiprobably on {−1/√N, +1/√N}. Moreover, the random signature model is often generalized to encompass non-binary (e.g. Gaussian) distributions for the amplitudes that modulate the chip waveforms. With that, the randomness in the received sequence can also reflect the impact of fading. One motivation for modeling the signatures as random is the use of "long sequences" in some commercial CDMA systems, where the period of the pseudo-random sequence spans many symbols. Another motivation is to provide a baseline of comparison for systems that use signature waveform families with low cross-correlations.
The arithmetic mean of the MMSEs for the K users satisfies [4]
 (1/K) Σ_{k=1}^{K} MMSE_k = (1/K) tr{ (I + SNR A†S†SA)⁻¹ }   (5.3)
 → η_{A†S†SA}(SNR)   (5.4)
whereas the MMSE multiuser efficiency of the kth user, η_k^MMSE(SNR), given in (3.17), is
 η_k^MMSE(SNR) = s_k^T ( I + SNR Σ_{i≠k} |A_i|² s_i s_i^T )⁻¹ s_k   (5.5)
where the limit follows from (4.24). According to Theorem 4.6, the MMSE multiuser efficiency, abbreviated as
 C^MMSE(β, SNR) = lim_{N→∞} (1/N) Σ_{k=1}^{K} E[ log(1 + SINR_k) ]   (5.9)
The unfaded equal-power case is obtained from the above model by setting A = A I, where A is the transmitted amplitude, equal for all users. In this case, the channel matrix in (5.24) has independent identically distributed entries and thus, according to Theorem 4.3, its asymptotic ESD converges to the Marcenko-Pastur law. Thus the normalized capacity achieved with the optimum receiver in the asymptotic regime is (cf. Theorem 4.4):
 C^opt(β, SNR) = β log( 1 + SNR − F(SNR, β)/4 ) + log( 1 + SNR β − F(SNR, β)/4 ) − ( F(SNR, β)/(4 SNR) ) log e   (5.12)
and the corresponding MMSE multiuser efficiency (cf. (4.9)) is
 η(SNR) = 1 − F(SNR, β)/(4 SNR)   (5.13)
with F(·, ·) defined in (4.11). Using (4.24) and (4.9), the maximum SINR (achieved by the MMSE linear receiver) converges to [4]
 SNR − F(SNR, β)/4 .   (5.14)
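The closed forms above are easy to evaluate and to compare against finite-size simulation. The Python sketch below computes F(SNR, β) and the asymptotic MMSE output SINR SNR − F/4 of (5.14), and contrasts the latter with the SINR measured for one realization of random ±1/√N signatures; the sizes and the SNR are arbitrary choices of ours.

import numpy as np

rng = np.random.default_rng(4)

def F(x, z):
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1) - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

N, K, SNR = 128, 64, 8.0
beta = K / N
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)       # random binary signatures

# Output SINR of user 0 for the MMSE receiver (unit powers, noise variance 1/SNR)
others = np.delete(np.arange(K), 0)
R = np.eye(N) / SNR + S[:, others] @ S[:, others].T
sinr_0 = S[:, 0] @ np.linalg.solve(R, S[:, 0])

print("simulated SINR of user 0 :", sinr_0)
print("asymptotic SNR - F/4     :", SNR - F(SNR, beta) / 4)
print("eta = 1 - F/(4 SNR)      :", 1 - F(SNR, beta) / (4 * SNR))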
Let us consider a synchronous DS-CDMA downlink with K active users
employing random spreading codes and operating over a frequency-selective
fading channel. Then H in (2.1) particularizes to
H = CSA (5.15)
1 |C|2 ( )
= (5.17)
E[|C|2 ]
1 |A|2 (SNR E[|C|2 ])
= (5.18)
E[|C|2 ]
where |C|² and |A|² are independent random variables with distributions given by the asymptotic spectra of CC† and AA†, respectively, while η_{|C|²}(·) and η_{|A|²}(·) denote their respective η-transforms. Note that, using (4.20), instead of (5.18) and (5.17) we may write [31, 32]
|C|2
= E
. (5.19)
|A|2
1 + SNR |C|2 E
1 + SNR |A|2
1 1 1
K
MMSEk = tr I + SNR H H
K K
k=1
1 1
= 1 + |C|2 ((SNR)) (5.20)
H = C SA (5.21)
= GS (5.22)
where A_k indicates the received amplitude of the kth user, which accounts for its average path loss, and C_{ℓ,k} denotes the fading of the ℓth subcarrier of the kth user, independent across users. For this scenario, the linear model (2.1) specializes to
 y = (G ∘ S) x + n .   (5.24)
Most quantities of interest, such as the multiuser efficiency and the capacity, approach their asymptotic behaviors very rapidly as K and N grow large. Hence, we can get an extremely accurate approximation of the multiuser efficiency, and consequently of the capacity, for an arbitrary number of users K and a finite processing gain N, simply by resorting to their asymptotic approximation, with the two-dimensional profile in Theorem 5.1 replaced by
 |A_k|² |C_{ℓ,k}|²   for  (k−1)/K ≤ x < k/K ,  (ℓ−1)/N ≤ y < ℓ/N .
Thus, we have that the multiuser efficiency of uplink MC-CDMA is closely approximated by
N
k (SNR)
kMMSE (SNR) (5.27)
1 N
|C,k |2
N
=1
with
1
N
|C,k |2
N
k (SNR) = . (5.28)
N K
=1 |Aj |2
1 + SNR
K j=1 1 + SNR N j (SNR)
From (5.9), using Theorem 5.1, the MMSE spectral efficiency converges, as K, N → ∞, to
C MMSE (, SNR) = E [log (1 + SNR (Y, SNR))] (5.29)
where the function (, ) is the solution of (5.26).
As an application of Theorem 4.16, the capacity of a multicarrier CDMA
channel is obtained.
Theorem 5.2 ([27]). The capacity of the optimum receiver is
C opt (, SNR) = C MMSE (, SNR)
+ E [log(1 + SNR E [(X, Y)(Y, SNR)|X]]
SNR E [(Y, SNR)(Y, SNR)] log e (5.30)
with (, ) and (, ) satisfying the coupled xed-point equations
(X, y)
(y, SNR) = E (5.31)
1 + SNR E[(X, Y)(Y, SNR)|X]
1
(y, SNR) = (5.32)
1 + SNR (y, SNR)
where X and Y are independent random variables uniform on [0, 1].
H = CSA
where
 SNR = E[‖x‖²] / ( (1/n_R) E[‖n‖²] ) .   (5.38)
If full CSI is available at the transmitter, then V should coincide with the eigenvector matrix of H†H and P should be obtained through a water-filling process on the eigenvalues of H†H [16, 41-43]. The resulting jth diagonal entry of P is
 P_{j,j} = ( ν − 1/( SNR λ_j(H†H) ) )⁺   (5.39)
with the water level ν chosen to meet the power constraint.
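A small routine suffices to carry out the water-filling in (5.39). The Python sketch below allocates powers over the eigenvalues of H†H, with the water level ν found by bisection; the normalization (average power one per transmit dimension) is our assumption and should be replaced by whatever power constraint is actually in force.

import numpy as np

def waterfill(eigs, SNR, total_power=None):
    # P_jj = max(nu - 1/(SNR*lambda_j), 0), with nu set by bisection so that
    # the allocated powers sum to total_power.
    eigs = np.asarray(eigs, dtype=float)
    if total_power is None:
        total_power = len(eigs)              # average power 1 per dimension (assumption)
    inv = 1.0 / (SNR * eigs)
    lo, hi = 0.0, inv.max() + total_power
    for _ in range(100):
        nu = (lo + hi) / 2
        if np.maximum(nu - inv, 0.0).sum() < total_power:
            lo = nu
        else:
            hi = nu
    return np.maximum(nu - inv, 0.0)

rng = np.random.default_rng(5)
nR, nT, SNR = 4, 4, 5.0
H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
lam = np.linalg.eigvalsh(H.conj().T @ H)
P = waterfill(lam, SNR)
print("eigenvalues of H^dagger H:", np.round(lam, 3))
print("water-filling powers P_jj:", np.round(P, 3), " sum =", round(P.sum(), 3))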
Corollary 5.4. With correlation at the end of the link with the fewest
antennas, the capacity per antenna with full CSI at the transmitter con-
verges to
T 1 1 1 <1
E log e + log 1 + log SNR + E T R = 1
C=
R 1
E log log + log SNR( 1) + E
1 >1
e R T = 1 .
Fig. 2. Mutual information achieved by an isotropic input on a Rayleigh-faded channel with nT = 4 and nR = 2, plotted versus SNR (dB); analytical results are compared with simulation (curves for d = 1 shown). The transmitter is a uniform linear array whose antenna correlation is given by (5.51), where d is the spacing (wavelengths) between adjacent antennas. The receive antennas are uncorrelated.
with H_w an i.i.d. N(0, 1) matrix and with the Ricean K-factor quantifying the ratio between the deterministic (unfaded) and the random (faded) energies [64].
If we assume that H0 has rank r, where r > 1 but such that
 lim_{N→∞} r/N = 0 ,   (5.53)
then all the foregoing results can be extended to the Ricean channel by simply replacing Λ_R and Λ_T with independent random variables whose distributions are the asymptotic spectra of the full-rank matrices (1/(K+1)) Θ_R and (1/(K+1)) Θ_T, respectively.
 H = U_R H̃ U_T†   (5.54)
where U_R and U_T are unitary while the entries of H̃ are independent zero-mean Gaussian. This model is advocated and experimentally supported in [68] and its capacity is characterized asymptotically in [21]. For the more restrictive case where U_R and U_T are Fourier matrices, the model (5.54) was proposed earlier in [69].
Since the spectra of H and H̃ coincide, every result derived for matrices with independent non-identically distributed entries (cf. Theorems 4.12-4.17) applies immediately to H.
As it turns out, the asymptotic spectral efficiency of H is fully characterized by the variances of its entries, which we assemble in a matrix G such that G_{i,j} = n_T E[ |H̃_{i,j}|² ] with
 Σ_{i,j} G_{i,j} = n_T n_R .   (5.55)
ᵈ Polarization diversity: antennas with orthogonal polarizations are used to ensure low levels of correlation with minimum or no antenna spacing [63, 66] and to make the communication link robust to polarization rotations in the channel [67].
ᵉ Pattern diversity: antennas with different radiation patterns or with rotated versions of the same pattern are used to discriminate different multipath components and reduce correlation.
Denote by P(t, SNR) the asymptotic power profile of the capacity-achieving power allocation at each SNR. In order to characterize (5.58), we invoke Theorem 4.16 to obtain the following.
Theorem 5.8 ([21]). Consider the channel H = U_R H̃ U_T† where U_R and U_T are unitary while the entries of H̃ are zero-mean Gaussian and independent. Denote by G(r, t) the asymptotic variance profile of H̃. With statistical CSI at the transmitter, the asymptotic capacity is
C(, SNR) = E [log(1 + SNR E [G(R, T)P(T, SNR)(R, SNR)| T])]
+ E [log(1 + E[G(R, T)P(T, SNR)(T, SNR)|R])]
E [G(R, T)P(T, SNR)(R, SNR)(T, SNR)] log e
with expectation over the independent random variables R and T uniform
on [0, 1] and with
1
(r, SNR) =
1 + E[G(r, T)P(T, SNR)(T, SNR)]
SNR
(t, SNR) = .
1 + SNR E [G(R, t)P(t, SNR)(R, SNR)]
May 21, 2009 Master Review Vol. 9in x 6in (for Lecture Note Series, IMS, NUS) 04-Tulino
H = A Hw (5.59)
Fig. 3. Laptop computers equipped with a 16-antenna planar array. Two orthogonal
polarizations used.
 P = [entries omitted in this reproduction]   (5.60)
which is asymptotically mean doubly regular.
Again, the zero-mean multi-antenna channel model analyzed thus far can be made Ricean by incorporating an additional deterministic component H̄ [61-63], which leads to the following general model
 y = ( √(1/(K+1)) H + √(K/(K+1)) H̄ ) x + n   (5.61)
with the scalar Ricean factor K quantifying the ratio between the Frobenius norm of the deterministic (unfaded) component and the expected Frobenius norm of the random (faded) component. Considered individually, each (i, j)th channel entry has a Ricean factor given by
 K |H̄_{i,j}|² / E[ |H_{i,j}|² ] .
Using Lemma 2.6 in [23] the next result follows straightforwardly.
%
nT 2 &
1 j (T) SNR
2 = log 1 . (5.64)
nT j=1 1 + j (T) SNR
[Figure: 10% outage capacity (bits/s/Hz) versus SNR (dB) from 0 to 40 dB, comparing simulation with the Gaussian approximation, for a transmitter/receiver configuration with K = 2, N = 2 (top panel) and K = 4, N = 2 (bottom panel). Inset values: at SNR = 0 dB, simulation 0.52 versus asymptotic 0.50; at SNR = 10 dB, 2.28 versus 2.27.]
References
1. S. Verdu, Random matrices in wireless communication, proposal to the Na-
tional Science Foundation (Feb. 1999).
2. S. Verdu, Large random matrices and wireless communications, 2002 MSRI
Information Theory Workshop (Feb 25Mar 1, 2002).
3. A. M. Tulino and S. Verdu, Random Matrix Theory and Wireless Com-
munications, Foundations and Trends in Communications and Information
Theory, Volume 1, Issue 1 (Now Publishers Inc., 2004).
4. S. Verdu, Multiuser Detection (Cambridge University Press, Cambridge, UK,
1998).
5. S. Verdu and S. Shamai, Spectral eciency of CDMA with random spreading,
IEEE Trans. Information Theory 45(2) (1999) 622640.
6. D. Tse and S. Hanly, Linear multiuser receivers: Eective interference, eec-
tive bandwidth and user capacity, IEEE Trans. Information Theory 45(2)
(1999) 641657.
25. A. Guionnet and O. Zeitouni, Concentration of the spectral measure for large
matrices, Electronic Communications in Probability 5 (2000) 119136.
26. D. Shlyankhtenko, Random Gaussian band matrices and freeness with amal-
gamation, Int. Math. Res. Note 20 (1996) 10131025.
27. L. Li, A. M. Tulino and S. Verdu, Spectral eciency of multicarrier CDMA,
IEEE Trans. Information Theory 51(2) (2005) 479505.
28. Z. D. Bai and J. W. Silverstein, Exact separation of eigenvalues of large
dimensional sample covariance matrices, Annals of Probability 27(3) (1999)
15361555.
29. A. M. Tulino and S. Verdu, Asymptotic outage capacity of multiantenna
channels, in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing
(ICASSP05), Philadelphia, PA, USA (Mar. 2005).
30. S. Shamai and S. Verdu, The eect of frequency-at fading on the spectral
eciency of CDMA, IEEE Trans. Information Theory 47(4) (2001) 1302
1327.
31. J. M. Chaufray, W. Hachem and P. Loubaton, Asymptotic analysis of opti-
mum and sub-optimum CDMA MMSE receivers, Proc. IEEE Int. Symp. on
Information Theory (ISIT02) (July 2002), p. 189.
32. L. Li, A. M. Tulino and S. Verdu, Design of reduced-rank MMSE multiuser
detectors using random matrix methods, IEEE Trans. Information Theory
50(6) (2004).
33. M. Debbah, W. Hachem, P. Loubaton and M. de Courville, MMSE analysis of
certain large isometric random precoded systems, IEEE Trans. Information
Theory 49(5) (2003) 12931311.
34. R. Horn and C. Johnson, Matrix Analysis (Cambridge University Press,
1985).
35. G. Foschini and M. Gans, On limits of wireless communications in fading en-
vironment when using multiple antennas, Wireless Personal Communications
6(6) (1998) 315335.
36. S. N. Diggavi, N. Al-Dhahir, A. Stamoulis and A. R. Calderbank, Great
expectations: The value of spatial diversity in wireless networks, Proc. IEEE
92(2) (2004) 219270.
37. A. Goldsmith, S. A. Jafar, N. Jindal and S. Vishwanath, Capacity limits of
MIMO channels, IEEE J. Selected Areas in Communications 21(5) (2003)
684702.
38. D. Gesbert, M. Sha, D. Shiu, P. J. Smith and A. Naguib, From theory
to practice: An overview of MIMO spacetime coded wireless systems, J.
Selected Areas in Communications 21(3) (2003) 281302.
39. E. Biglieri and G. Taricco, Large-system analyses of multiple-antenna
system capacities, Journal of Communications and Networks 5(2) (2003)
5764.
40. E. Biglieri and G. Taricco, Transmission and reception with multiple an-
tennas: Theoretical foundations, submitted to Foundations and Trends in
Communications and Information Theory (2004).
41. B. S. Tsybakov, The capacity of a memoryless Gaussian vector channel, Prob-
lems of Information Transmission 1 (1965) 1829.
56. A. M. Tulino, A. Lozano and S. Verdu, MIMO capacity with channel state
information at the transmitter, in Proc. IEEE Int. Symp. on Spread Spectrum
Tech. and Applications (ISSSTA04) (Aug. 2004).
57. E. Visotsky and U. Madhow, Space-time transmit precoding with imperfect
feedback, IEEE Trans. Information Theory 47 (2001) 26322639.
58. S. A. Jafar, S. Vishwanath and A. J. Goldsmith, Channel capacity and beam-
forming for multiple transmit and receive antennas with covariance feed-
back, in Proc. IEEE Int. Conf. on Communications (ICC01), Vol. 7 (2001),
pp. 22662270.
59. A. M. Tulino, S. Verdu and A. Lozano, Capacity of antenna arrays with
space, polarization and pattern diversity, in Proc. 2003 IEEE Information
Theory Workshop (ITW03) (Apr. 2003), pp. 324327.
60. T.-S. Chu and L. J. Greenstein, A semiempirical representation of antenna
diversity gain at cellular and PCS base stations, IEEE Trans. on Communi-
cations 45(6) (1997) 644656.
61. P. Driessen and G. J. Foschini, On the capacity formula for multiple-input
multiple-output channels: A geometric interpretation, IEEE Trans. on Com-
munications 47(2) (1999) 173176.
62. F. R. Farrokhi, G. J. Foschini, A. Lozano and R. A. Valenzuela, Link-optimal
space-time processing with multiple transmit and receive antennas, IEEE
Communications Letters 5(3) (2001) 8587.
63. P. Soma, D. S. Baum, V. Erceg, R. Krishnamoorthy and A. Paulraj, Anal-
ysis and modelling of multiple-input multiple-output (MIMO) radio channel
based on outdoor measurements conducted at 2.5 GHz for xed BWA appli-
cations, in Proc. IEEE Int. Conf. on Communications (ICC02), New York
City, NY (28 Apr.2 May 2002), pp. 272276.
64. S. Rice, Mathematical analysis of random noise, Bell System Technical Jour-
nal 23 (1944) 282332.
65. H. Ozcelik, M. Herdin, W. Weichselberger, G. Wallace and E. Bonek, De-
ciencies of the Kronecker MIMO channel model, IEE Electronic Letters 39
(2003) 209210.
66. W. C. Y. Lee and Y. S. Yeh, Polarization diversity system for mobile radio,
IEEE Trans. on Communications 20(5) (1972) 912923.
67. S. A. Bergmann and H. W. Arnold, Polarization diversity in portable
communications environment, IEE Electronic Letters 22(11) (1986) 609
610.
68. W. Weichselberger, M. Herdin, H. Ozcelik and E. Bonek, Stochastic MIMO
channel model with joint correlation of both link ends, IEEE Trans. Wireless
Communications 5(1) (2006) 90100.
69. A. Sayeed, Deconstructing multi-antenna channels, IEEE Trans. Signal Pro-
cessing 50(10) (2002) 25632579.
70. A. M. Tulino, A. Lozano and S. Verdu, Capacity-achieving input covari-
ance for single-user multi-antenna channels, Bell Labs Tech. Memorandum
ITD-04-45193Y, also in IEEE Trans. Wireless Communications 5(3) (2006)
662671.
Ralf R. Muller
Department of Electronics and Telecommunications
Norwegian University of Science and Technology
7491 Trondheim, Norway
E-mail: [email protected]
This review paper gives a tutorial overview of the usage of the replica method in multiuser communications. It introduces the self-averaging principle, the free energy and other physical quantities, and gives them a meaning in the context of multiuser communications. The technical issues of the replica method are explained to a non-physics audience. An isomorphism between receiver metrics and the fundamental laws of physics is drawn. The overview is illustrated with the example of detection of code-division multiple access with random signature sequences.
1. Introduction
Multiuser communication systems which are driven by Gaussian distributed signals can be fully characterized by the distribution of the singular values of the channel matrix in the large-user limit. In digital communications, however, transmitted signals are chosen from finite, often binary, sets. In those cases, knowledge of the asymptotic spectrum of large random matrices is, in general, not sufficient to get valuable insight into the behavior of characteristic performance measures such as bit error probabilities and supported data rates. We will see that the quantized nature of the signals gives rise to the totally unexpected occurrence of phase transitions in multiuser communications, which can by no means be inferred from the asymptotic convergence of eigenvalue spectra of large random matrices.
In order to analyze and design large-dimensional communication systems which cannot be described by eigenvalues and eigenvectors alone, but
2. Self Average
While random matrix theory and recently also free probability theory
[2, 3] prove the (almost sure) convergence of some random variables to
deterministic values in the large matrix limit, statistical physics does not
always do so. It is considered a fundamental principle of statistical physics
that there are microscopic and macroscopic variables. Microscopic variables
are physical properties of microscopically small particles, e.g. the speed of a
gas molecule or the spin of an electron. Macroscopic variables are physical
properties of compound objects that contain many microscopic particles,
e.g. the temperature or pressure of a gas, the radiation of a hot object,
or the magnetic eld of a piece of ferromagnetic material. From a physics
point of view, it is clear which variables are macroscopic and which ones are
microscopic. An explicit proof that a particular variable is self-averaging,
i.e. it converges to a deterministic value in the large system limit, is a nice
result, if it is found, but it is not particularly important to the physics
community. When applying the replica method, systems are often only as-
sumed to be self-averaging. The replica method itself must be seen as a tool
to enable the calculation of macroscopic properties by averaging over the
microscopic properties.
3. Free Energy
The second law of thermodynamics demands that the entropy of any physical system with conserved energy converge to its maximum as time evolves. If the system is described by a density p_X(x) of states X ∈ R, this means that in the thermodynamic equilibrium the (differential) entropy
 H(X) = − ∫ log p_X(x) dP_X(x)   (3.1)
is maximized subject to the constraint that the energy
 E(X) = ∫ ‖x‖ dP_X(x)   (3.2)
is kept constant. Hereby, the energy function ‖x‖ can be any measure which is uniformly bounded from below.
The density at thermodynamic equilibrium is easily shown, by the method of Lagrange multipliers, to be
 p_X(x) = e^{−‖x‖/T} / ∫ e^{−‖x‖/T} dx   (3.3)
and is called the Boltzmann distribution. The parameter T is called the temperature of the system and is determined by (3.2). For a Euclidean energy measure, the Boltzmann distribution takes on the form of a Gaussian distribution, which is well known in information theory to maximize entropy for a given average signal power.
A helpful quantity in statistical mechanics is the (normalized) free energyᵃ defined as
 F(X) = E(X) − T H(X)   (3.4)
      = −T log ∫ e^{−‖x‖/T} dx .   (3.5)
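For a Euclidean energy measure the above quantities have simple closed forms, which the following Python sketch confirms numerically for a scalar system with ‖x‖ = x²/2: the Boltzmann distribution (3.3) is then N(0, T), and (3.4) and (3.5) both evaluate to −(T/2) log(2πT). The toy setting and the quadrature grid are our choices.

import numpy as np

T = 1.7
x = np.linspace(-40.0, 40.0, 400001)
dx = x[1] - x[0]
Z = np.sum(np.exp(-x ** 2 / (2 * T))) * dx          # partition function, integral of exp(-||x||/T)
F_from_3_5 = -T * np.log(Z)                         # free energy via (3.5)

E = T / 2                                           # E(X) = E[x^2/2] under N(0, T)
H = 0.5 * np.log(2 * np.pi * np.e * T)              # differential entropy of N(0, T)
F_from_3_4 = E - T * H                              # free energy via (3.4)

print(F_from_3_5, F_from_3_4, -(T / 2) * np.log(2 * np.pi * T))   # all three agree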
of a cost function, e.g. bit error probability, subject to its hypotheses on the channel transition probability p_{Y|X}(y, x) and the prior distribution p_X(x). If the assumed distributions equal the true distributions, the detector is optimum with respect to its cost function. If the assumed distributions differ from the true ones, the detector is mismatched in some sense. The mismatch can arise from insufficient knowledge at the detector due to channel fluctuations, or from detector complexity: if the optimum detector requires an exhaustive search to solve an NP-complete optimization, approximations to the true prior distribution often lead to suboptimal detectors with reduced complexity. Many popular detectors can be described within this framework.
The minimization of a cost function subject to some hypothesis on the channel transition probability and some hypothesis on the prior distribution defines a metric which is to be optimized. This metric corresponds to the energy function in thermodynamics and determines the distribution of the microscopic variables in the thermodynamic equilibrium. In analogy to (3.3), we find an equilibrium distribution proportional to e^{−‖x‖/T}.
Note that, inside the logarithm, expectations are taken with respect to the assumed distribution via the definition of the energy function, while, outside the logarithm, expectations are taken with respect to the true distribution.
In the case of matched detection, i.e. when the assumed distributions equal the true distributions, the argument of the logarithm in (4.3) becomes p_Y(y) up to a normalizing factor. Thus, the free energy becomes the (differential) entropy of Y up to a scaling factor and an additive constant.
Statistical mechanics provides an excellent framework to study not only matched, but also mismatched detection. The analysis of mismatched detection in large communication systems which is purely based on asymptotic properties of large random matrices, and does not exploit the tools provided by statistical mechanics, has been very limited so far. One exception is the asymptotic SINR of linear MMSE multiuser detectors with erroneous assumptions on the powers of interfering users in [4].
5. Replica Continuity
The explicit evaluation of the free energy turns out to be very complicated in many cases of interest. One major obstacle is the occurrence of the expectation of the logarithm of some function f(·) of a random variable Y,
 E_Y[ log f(Y) ] .   (5.1)
Under the assumption that limit and expectation can be interchanged, this gives
 E_Y[ log f(Y) ] = lim_{n→0} (∂/∂n) E_Y[ f^n(Y) ]   (5.4)
                = lim_{n→0} (∂/∂n) log E_Y[ f^n(Y) ]   (5.5)
and reduces the problem to the calculation of the nth moment of the function of the random variable Y in the neighborhood of n = 0. Note that the expectation must be calculated for real-valued n in order to perform the limit operation.
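The replica identity is easy to test on a toy example. The Python sketch below compares E[log f(Y)] with a finite-difference estimate of the n-derivative of log E[f^n(Y)] near n = 0, for the arbitrary choice f(y) = 1 + y² and Y standard normal.

import numpy as np

rng = np.random.default_rng(6)
Y = rng.standard_normal(2_000_000)
f = 1.0 + Y ** 2

lhs = np.mean(np.log(f))                                       # E log f(Y)

n = 1e-3                                                       # real-valued n close to zero
rhs = (np.log(np.mean(f ** n)) - np.log(np.mean(f ** (-n)))) / (2 * n)

print(lhs, rhs)                                                # agree to a few decimals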
which shows that it is actually the logarithm of the K-norm of the function exp(f(x)). For K → ∞, we get the maximum norm and thus obtain
 lim_{K→∞} (1/K) log ∫ e^{K f(x)} dx = max_x f(x) .   (6.3)
That means the integral can be solved by maximizing the argument of the exponential function.
Some authors also refer to saddle point integration as the saddle point approximation and motivate it by a series expansion of the function f(x) in the exponent. Making use of the identity (5.5) instead of (5.4), we can argue via the infinity norm and need not study under which conditions on the function f(x) the saddle point approximation is accurate.
7. Replica Symmetry
If the function in the exponent is multivariate (typically all replicated random variables are arguments), one would need to find the extremum of a multivariate function for an arbitrary number of arguments. This can easily become a hopeless task, unless one can exploit some symmetries of the optimization problem.
Assuming replica symmetry means that one concludes from the symmetry of the exponent, e.g. f(x1, x2) = f(x2, x1) for the bi-variate case, that the extremum appears when all variables take on the same value. Then the multivariate optimization problem reduces to a single-variate one, e.g. finding the extremum of f(x, x) in the bi-variate case. This is the most critical assumption when applying the replica method. In fact, it is not always true, even in practically relevant cases. Figure 1 shows both an example and a counterexample. The general way to circumvent this trouble is to assume replica symmetry at hand and to prove later, having found a replica-symmetric solution, that it is correct.
With the example of Fig. 1 in mind, it might seem that replica symmetry is a very odd assumption. However, the functions to be extremized arise from replication of identical integrals, see (5.9). Given the particular
[Fig. 1: surface plots of f(x1, x2) for x1, x2 in [-2, 2], illustrating an example and a counterexample of replica symmetry.]
it seems rather odd that replica symmetry might not hold. However, writing our problem in the form of (7.2) assumes that the parameter n is an integer, despite the fact that it is actually a real number in the neighborhood of zero. Thus, our intuition suggesting not to question replica symmetry cheats on us. In fact, there are even practically relevant cases without sensible replica-symmetric solutions, e.g. cases where the replica-symmetric solution implies the entropy to be negative. Such phenomena are labeled replica symmetry breaking, and a rich theory exists in the statistical mechanics literature to deal with them [6, 5, 1]. For the introductory character of this work, however, replica symmetry breaking is too advanced an issue.
and Okada [12], Kabashima [13], Li and Poor [14], Guo [15], and Wen and
Wong [16]. Additionally, the replica method has also been successfully used
for the design and analysis of error correction codes.
Consider a vector-valued real additive white Gaussian noise channel characterized by the conditional probability distributionᵇ
 p_{y|x,H}(y, x, H) = e^{−(y − Hx)ᵀ(y − Hx)/(2σ₀²)} / (2πσ₀²)^{N/2}   (8.1)
while the detector assumes the conditional distribution
 p_{y|x,H}(y, x, H) = e^{−(y − Hx)ᵀ(y − Hx)/(2σ²)} / (2πσ²)^{N/2}   (8.2)
and the assumed prior distribution p_x(x). Let the entries of H be independent zero-mean with vanishing odd-order moments and variances w_ck/N for row c and column k. Moreover, let w_ck be uniformly bounded from above.
Applying Bayes' law, we find
 p_{x|y,H}(x, y, H) = e^{−(y − Hx)ᵀ(y − Hx)/(2σ²)} p_x(x) / ∫ e^{−(y − Hx)ᵀ(y − Hx)/(2σ²)} dP_x(x) .   (8.3)
Since (3.3) holds for any temperature T, we set without loss of generality T = 1 in (3.3) and find the appropriate energy function to be
 ‖x‖ = (1/(2σ²)) (y − Hx)ᵀ(y − Hx) − log p_x(x) .   (8.4)
This choice of the energy function ensures that the thermodynamic equilibrium models the detector defined by the assumed conditional and prior distributions.
Let K denote the number of users, that is, the dimensionality of the input vector x. Applying successively (4.3) with (8.1), (5.5), replica continuity
ᵇ In this example, we do not use upper-case and lower-case notation to distinguish random variables and their realizations, in order not to mix up vectors and matrices.
n
1
= lim log E e ||x|| dx
K n0 n H RN RK
1
(yHx)T (yHx)
22
e 0
N dydPx(x) (8.5)
(202 ) 2
n
212 (yHxa )T (yHxa )
E e a dPa(xa ) dy
1 RN H a=0
= lim log N
K n0 n (202 ) 2
=n
where
N (c)
n
x x
Qab [c]
a b
eKI{Q[]} = dPa (xa ) (8.12)
c=1
N a=0
ab
ᶜ The notation f{Q[·]} expresses the dependency of the function f(·) on all Q_ab[c], 0 ≤ a ≤ b ≤ n, 1 ≤ c ≤ N.
ᵈ The notation Σ_{a≤b} is used as a shortcut for Σ_{a=0}^{n} Σ_{b=a}^{n}.
dQab [c]
xa xb n
Q
KI{Q[]} Qab [c] N ab [c]
e = e dPa (xa )
c=1 J 2j a=0
ab
(8.15)
N
Qab [c]Qab [c]
c=1 ab
= e
J N (n+2)(n+1)/2
K N
dQab [c]
Mk Q[] (8.16)
c=1
2j
k=1 ab
with
N n
1
Mk Q[] = exp Qab [c]xak xbk wck dPa (xak ). (8.17)
N c=1 a=0
ab
and
Mk {E, F, G, G0 }
1 N
G0c 2 n
Ec x0k xak + G2c x2ak +
n
Fc xak xbk n
N wck 2 x0k +
= e c=1 a=1 b=a+1
dPa (xak )
a=0
G0k
x20k +
n
G
Ek x0k xak + 2k x2ak +
n
Fk xak xbk
n
2
a=1 b=a+1
= e dPa (xak ) (8.19)
a=0
where
 E_k = (1/N) Σ_{c=1}^{N} E_c w_ck   (8.20)
 F_k = (1/N) Σ_{c=1}^{N} F_c w_ck   (8.21)
 G_k = (1/N) Σ_{c=1}^{N} G_c w_ck   (8.22)
 G_{0k} = (1/N) Σ_{c=1}^{N} G_{0c} w_ck .   (8.23)
Note that the prior distribution enters the free energy only via (8.19). We will focus on this later on, after having finished with the other terms.
For the evaluation of e^{G{Q[c]}} in (8.11), we can use the replica symmetry to construct the correlated Gaussian random variables v_ac out of independent zero-mean, unit-variance Gaussian random variables u_c, t_c, z_ac by
 v_0c = u_c √( p_0c − m_c²/q_c ) + t_c m_c/√q_c   (8.24)
 v_ac = z_ac √( p_c − q_c ) + t_c √q_c ,   a > 0 .   (8.25)
2 n
yc
exp 2 zc pc qc tc qc Dzc Dtc dyc
R 2
"
#
2 (pc% qc ))
# (1 + 1n
=$ 2
& (8.27)
1+
2 (pc qc ) + n 2 0 + p0c 2mc + qc
with the Gaussian measure Dz = e^{−z²/2} dz/√(2π). Since the integral in (8.11) is dominated by the maximum argument of the exponential function, the derivatives of
1
N
G{Q[c]} Qab [c]Qab [c] (8.28)
N c=1
ab
In the following, the calculations are shown explicitly for Gaussian and
binary priors. Additionally, a general formula for arbitrary priors is given.
"
# % &1n
#
# 1 + Fk Gk
Mk {E, F, G, G0 } = #
$% &% & . (8.34)
1 G0k 1 + Fk Gk nFk nEk2
K
N
n(n1) G0c p0c n
log Mk {E, F, G, G0 } nEc mc + Fc qc + + Gc pc
c=1
2 2 2
k=1
(8.35)
1
K
Ek
mc = wck (8.36)
K 1 + Ek
k=1
1
K
E 2 + Fk
qc = wck % k &2 (8.37)
K
k=1 1 + Ek
1
K
E 2 + Ek + Fk + 1
pc = wck k % &2 (8.38)
K
k=1 1 + Ek
1
K
p0c = wck (8.39)
K
k=1
in the limit n 0 with (8.31) and (8.32). Surprisingly, if we let the true
prior to be binary and only the replicas to be Gaussian we also nd (8.36)
to (8.39). This setting corresponds to linear MMSE detection [17].
Returning to our initial goal, the evaluation of the free energy, and collecting our previous results, we find
F(x) 1
= lim log n (8.40)
K K n0 n
N
1 n(n 1)
= lim G(mc , qc , pc , p0c ) + Fc qc
K n0 n c=1 2
K
n
+ nEc mc + Gc pc log Mk {E, F, G, 0} (8.41)
2
k=1
N
1
= lim log 1 + 2 (pc qc ) + 2Ec mc + Gc pc
2K n0 c=1
02 + (p0c 2mc + qc )
+(2n 1)Fc qc + 2
+ (pc qc ) + n02 + n(p0c 2mc + qc )
1 K % & Ek2 + Fk
+ lim log 1 + Ek (8.42)
2K n0
k=1
1 + Ek nEk2 nFk
N
1
= log 1 + 2 (pc qc ) + 2Ec mc Fc qc
2K c=1
Ec 1
K % & E 2 + F
k
+Gc pc + + log 1 + Ek k . (8.43)
Fc 2K 1 + Ek
k=1
This is the final result for the free energy of the mismatched detector assuming noise variance σ² instead of the true noise variance σ₀². The six macroscopic parameters E_c, F_c, G_c, m_c, q_c, p_c are implicitly given by the simultaneous solution of the system of equations (8.29) to (8.31) and (8.36) to (8.38) with the definitions (8.20) to (8.22) for all chip times c. This system of equations can only be solved numerically.
Specializing our result to the matched detector assuming the true noise variance, by letting σ → σ₀, we have F_c → E_c, G_c → G_{0c}, q_c → m_c, p_c → p_{0c}. This makes the free energy simplify to
N
F(x) 1 2 ' 2 ( 1
K % &
= 0 Ec log 0 Ec + log 1 + Ek (8.44)
K 2K c=1 2K
k=1
with
 E_c = 1 / ( σ₀² + (β/K) Σ_{k=1}^{K} w_ck/(1 + E_k) ) .   (8.45)
This result is more compact, and it requires only the numerical solution of (8.45), which is conveniently done by fixed-point iteration.
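The following Python sketch performs such a fixed-point iteration for the matched-detector equations, written as E_c = 1/(σ₀² + (β/K) Σ_k w_ck/(1 + E_k)) together with E_k = (1/N) Σ_c E_c w_ck, i.e. the reconstructed forms of (8.45) and (8.20); the β/K = 1/N normalization of the variance profile and the toy profile itself are assumptions of this sketch.

import numpy as np

rng = np.random.default_rng(7)
N, K, sigma0_sq = 64, 32, 0.1
w = rng.uniform(0.5, 1.5, size=(N, K))                 # toy variance profile w_ck

E_k = np.ones(K)
for _ in range(500):
    E_c = 1.0 / (sigma0_sq + (w / (1.0 + E_k)).sum(axis=1) / N)   # cf. (8.45)
    E_k_new = (E_c[:, None] * w).sum(axis=0) / N                  # cf. (8.20)
    if np.max(np.abs(E_k_new - E_k)) < 1e-12:
        E_k = E_k_new
        break
    E_k = E_k_new

print("per-user SINRs E_k (first five):", np.round(E_k[:5], 4))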
It can be shown that the parameter E_k is actually the signal-to-interference-and-noise ratio of user k. It has been derived independently by Hanly and Tse [18] in the context of CDMA with macro-diversity, using a result from random matrix theory by Girko [19]. Note that (8.45) and (8.20) are actually formally equivalent to the result Girko found for random matrices.
The similarity of the free energy with the entropy of the channel output mentioned at the end of Section 4 is expressed by the simple relationship
I(x, y) F(x) 1
= (8.46)
K K 2
between the (normalized) free energy and the (normalized) mutual infor-
mation between channel input signal x and channel output signal y given
the channel matrix H. Assuming that the channel is perfectly known to the
receiver, but totally unknown to the transmitter, (8.46) gives the channel
capacity per user.
Mk {E, F, G, G0 }
G0k +nGk n
n
2 + Ek x0k xak + Fk xak xbk
n
a=1 b=a+1
= e dPa (xak ) (8.48)
Rn+1 a=0
) n
1 1 + tk n
= e 2 (G0k +nGk ) exp Ek xak + Fk xak xbk
2
{xak ,a=1,...,n} a=1 b=a+1
n * n
1 tk n
+ exp Ek xak + Fk xak xbk Pr(xak ) (8.49)
2
a=1 b=a+1 a=1
) n 2
1
(G0k +nGk nFk )
1 + tk Fk n
=e 2 exp xak + Ek xak
2 2
{xak ,a=1,...,n} a=1 a=1
2 *
1 tk Fk
n
n
n
+ exp xak Ek xak Pr(xak ) (8.50)
2 2
a=1 a=1 a=1
Mk {E, F, G, G0 }
+
1 + tk
n
= e (G0k +nGk nFk )
1
2 exp z Fk + Ek xak
2
{xak ,a=1,...,n} a=1
+
1 tk
n
n
+ exp z Fk + Ek xak Dz Pr(xak ). (8.52)
2 a=1 a=1
Since
+ n
n
fn = exp z Fk + Ek xka Pr(xka ) (8.53)
{xka ,a=1,...,n} a=1 a=1
+
= Pr(xkn )fn1 exp z Fk + Ek xkn (8.54)
xkn
% , &
cosh k /2 + z Fk + Ek
= fn1 (8.55)
cosh (k /2)
% , &
coshn k /2 + z Fk + Ek
= (8.56)
coshn (k /2)
with tk = tanh(k /2), we nd
Mk {E, F, G, G0 }
- 1+tk % , & % , &
2 coshn
z Fk + Ek + k
2 + 1tk
2 coshn
z Fk + Ek k
2 Dz
= ' ( % & .
coshn 2k exp nFk G20k nGk
(8.57)
in the limit n → 0. In order to obtain (8.59), note from (8.50) that the first-order derivative of M_k exp(nF_k/2) with respect to F_c is identical to half of the second-order derivative of M_k exp(nF_k/2) with respect to E_c.
Returning to our initial goal, the evaluation of the free energy, and collecting our previous results, we find
F(x) 1
= lim log n
K K n0 n
N
1
= lim G(mc , qc , pc , p0c ) + nEc mc
K n0 n c=1
K
n(n 1) n
+ Fc qc + Gc pc log Mk {E, F, G, 0} (8.61)
2 2
k=1
N
1
= log 1 + 2 (pc qc )
2K c=1
Ec
+ Ec (2mc +pc ) + Fc (pc qc ) +
Fc
K +
1 1 + tk k Ek
log cosh z Fk + Ek +
K 2 2 2
k=1
+
1 tk k 1 ' (
+ log cosh z Fk + Ek Dz + log 1 t2k .
2 2 2
(8.62)
This is the final result for the free energy of the mismatched detector assuming noise variance σ² instead of the true noise variance σ₀². The five macroscopic parameters E_c, F_c, m_c, q_c, p_c are implicitly given by the simultaneous solution of the system of equations (8.29), (8.30) and (8.58) to (8.60) with the definitions (8.20) to (8.22) for all chip times c. This system of equations can only be solved numerically. Moreover, it can have multiple solutions. In case of multiple solutions, the correct solution is the one which minimizes the free energy, since in the thermodynamic equilibrium the free energy is always minimized, cf. Section 3.
Specializing our result to the matched detector assuming the true noise variance, by letting σ → σ₀, we have F_c → E_c, G_c → G_{0c}, q_c → m_c, which makes the free energy simplify to
1 . 2 ' (/ 1 1 ' (
N K
F(x)
= 0 Ec log 02 Ec log 1 t2k Ek
K 2K c=1 K 2
k=1
+
1 + tk k
+ log cosh z Ek + Ek + (8.63)
2 2
+
1 tk k
+ log cosh z Ek + Ek Dz
2 2
In fact, it can even be shown that in the large-system limit an equivalent additive white Gaussian noise channel can be defined to model the multiuser interference [10, 9]. Constraining the input alphabet of the channel to follow the non-uniform binary distribution (8.47), and assuming channel state information to be available only at the receiver, the channel capacity is given by (8.46) with the free energy given in (8.63).
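The per-user rate over such an equivalent scalar Gaussian channel is then essentially the mutual information of a binary-input AWGN channel evaluated at the user's effective SINR (cf. [9, 10]). The Python sketch below computes that scalar mutual information by numerical integration for a (possibly non-uniform) prior P(X = +1) = p; it is a generic helper, not a transcription of (8.63), and the integration grid is our choice.

import numpy as np

def binary_awgn_mi(snr, p=0.5, num=20001):
    # I(X;Y) in nats for Y = sqrt(snr)*X + Z, Z ~ N(0,1), X in {-1,+1}, P(X=+1) = p.
    a = np.sqrt(snr)
    y = np.linspace(-a - 10.0, a + 10.0, num)
    dy = y[1] - y[0]
    gauss = lambda m: np.exp(-(y - m) ** 2 / 2) / np.sqrt(2 * np.pi)
    py = p * gauss(a) + (1 - p) * gauss(-a)            # output density
    h_y = -np.sum(py * np.log(py)) * dy                # differential entropy of Y
    return h_y - 0.5 * np.log(2 * np.pi * np.e)        # subtract the noise entropy

for snr_db in (-5, 0, 5, 10):
    snr = 10.0 ** (snr_db / 10.0)
    print(f"SINR = {snr_db:3d} dB   rate = {binary_awgn_mi(snr) / np.log(2):.3f} bit/symbol")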
Large-system results for binary priors (even for the uniform binary prior) have not yet been derived by means of rigorous mathematics, despite intense efforts to do so. Only for the case of vanishing noise variance was a fully mathematically rigorous result found by Tse and Verdu [20], which does not rely on the replica method.
Mk {E, F, G, G0 }
G0k
x20k +
n
Ek x0k xak +
Gk
x2ak +
n
Fk xak xbk
n
2 2
a=1 b=a+1
= e dPa (xak ) (8.66)
a=0
2
G0k Fk
n
n
Gk Fk
n
2 x20k + 2 xak + Ek x0k xak + 2 x2ak
= e a=1 a=1 dPa (xak ) (8.67)
a=0
n
G0k
2 x20k + Ek x0k xak + Fk zxak +
Gk Fk
2 x2ak
n
= e a=1 Dz dPa (xak ) (8.68)
a=0
n
G0k Gk Fk
x2k Ek xk xk + Fk z xk + x2k
= e 2 e 2 dPxk(xk ) DzdPxk(xk ) .
(8.69)
1 2
Ik = 2 2
Ik Ik . (8.70)
Fc 2x0k Ec Gc
e dPxk(xk )
(8.73)
1
K
p0c = wck x2k dPxk(xk ) (8.74)
K
k=1
K
1
x2
k
Ek xk xk + Fk z xk
2
log e dPxk(xk ) DzdPxk(xk ) .
K
k=1
(8.75)
This is the final result for the free energy of the mismatched detector assuming noise variance σ² instead of the true noise variance σ₀². The five macroscopic parameters E_c, F_c, m_c, q_c, p_c are implicitly given by the simultaneous solution of the system of equations (8.29), (8.30) and (8.58) to (8.60) with the definitions (8.20) to (8.22) for all chip times c. This system of equations can only be solved numerically. Moreover, it can have multiple solutions. In case of multiple solutions, the correct solution is the one
9. Phase Transitions
In thermodynamics, the occurrence of phase transitions, e.g. melting ice becoming water, is a well-known phenomenon. In digital communications, however, such phenomena are less known, though they do occur. The similarity between thermodynamics and multiuser detection pointed out in Section 4 should be sufficient to convince the reader that phase transitions in digital communications do occur. Phase transitions in turbo decoding and in detection of CDMA were found in [21] and [7], respectively.
The phase transitions in digital communications are similar to the hysteresis in ferromagnetic materials. They occur if the equations determining the macroscopic parameters, e.g. E_c determined by (8.64), have multiple solutions. Then it is the free energy that decides which of the solutions corresponds to the thermodynamic equilibrium. If a system parameter, e.g. the load or the noise variance, changes, the free energy may shift its favor from one solution to another. Since each solution corresponds to a different macroscopic property of the system, a change of the valid solution means that a phase transition takes place.
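The mechanism is easy to reproduce with a scalar stand-in. The Python sketch below scans the load β and counts the solutions of the decoupled equal-power fixed point E = 1/(σ₀² + β · mmse(E)), where mmse(E) is the minimum mean-square error of estimating a ±1 symbol in Gaussian noise at SNR E (in the spirit of [7, 9]); this toy equation, and the translation of Es/N0 = 6 dB into σ₀², are assumptions of this sketch, not a literal transcription of (8.64).

import numpy as np

z = np.linspace(-8.0, 8.0, 4001)
dz = z[1] - z[0]
phi = np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

def mmse_bpsk(E):
    # MMSE of a +-1 symbol observed in Gaussian noise at SNR E.
    return 1.0 - np.sum(np.tanh(E + np.sqrt(E) * z) * phi) * dz

def fixed_points(beta, sigma0_sq, grid=np.linspace(1e-3, 60.0, 6000)):
    # Sign changes of g(E) = E - 1/(sigma0^2 + beta*mmse(E)) locate the solutions.
    g = np.array([E - 1.0 / (sigma0_sq + beta * mmse_bpsk(E)) for E in grid])
    idx = np.where(np.sign(g[:-1]) * np.sign(g[1:]) < 0)[0]
    return grid[idx]

sigma0_sq = 10.0 ** (-6.0 / 10.0) / 2.0        # rough real-valued-model reading of Es/N0 = 6 dB
for beta in (1.0, 2.0, 3.0, 4.0):
    print(f"beta = {beta:.1f}   candidate solutions E ~", np.round(fixed_points(beta, sigma0_sq), 2))

Whenever several solutions coexist, the one that minimizes the free energy is the thermodynamically valid one, as discussed above.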
In digital communications, a popular macroscopic property is the bit error probability. It is related to the macroscopic parameter E_k in (8.64) by (8.65) for the case considered in Section 8. Numerical results are depicted in Fig. 2. The thick curve shows the bit error probability of the individually optimum detector as a function of the load. The thin curves show alternative solutions for the bit error probability, corresponding to alternative solutions of the equations for the macroscopic variable E_k. Only for a certain interval of the load, approximately 1.73 < β < 3.56 in Fig. 2, do multiple solutions coexist. As expected, the bit error probability increases with the load. At a load of approximately β = 1.986 a phase transition occurs and lets the bit error probability jump. Unlike in ferromagnetic materials, there is no hysteresis effect for the bit error probability of the individually optimum detector, but only a phase transition. This is because the external magnetic field corresponds to the channel output observed by the receiver. Unlike an external magnetic field, the channel output is a statistical variable and cannot be designed to undergo certain trajectories.
In order to observe a hysteresis behavior, we can expand our scope to neural networks. Consider a Hopfield neural network [22] implementation of
Fig. 2. Bit error probability for the individually optimum detector with uniform binary prior distribution versus system load β, for 10 log10(Es/N0) = 6 dB (bit error probability on a logarithmic scale from 10^0 down to 10^-4; load from 0 to 5).
ᵉ Note that in a system with a finite number of users, the Hopfield neural network is suboptimal at any load.
reached. The Hopfield neural network follows the lower curve if interference is added, and it follows the upper line if it is removed.
It should be remarked that a hysteresis behavior of the Hopfield neural network detector does not occur for all definitions of the energy function and all prior distributions of the data to be detected; additional conditions on the microscopic configuration of the system need to be fulfilled.
References
1. H. Nishimori, Statistical Physics of Spin Glasses and Information Processing
(Oxford University Press, Oxford, U.K., 2001).
2. F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy
(American Mathematical Society, Providence, RI, 2000).
3. S. Thorbjrnsen, Mixed moments of Voiculescus Gaussian random matrices,
J. Funct. Anal. 176 (2000) 213246.
4. G. Caire, R. R. Muller and T. Tanaka, Iterative multiuser joint decoding:
Optimal power allocation and low-complexity implementation, IEEE Trans.
Inform. Theory 50(9) (2004) 19501973.
5. K. H. Fischer and J. A. Hertz, Spin Glasses (Cambridge University Press,
Cambridge, U.K., 1991).
6. M. Mezard, G. Parisi and M. A. Virasoro, Spin Glass Theory and Beyond
(World Scientic, Singapore, 1987).
7. T. Tanaka, A statistical mechanics approach to large-system analysis of
CDMA multiuser detectors, IEEE Trans. Inform. Theory 48(11) (2002)
28882910.
8. T. Tanaka and D. Saad, A statistical-mechanics analysis of coded CDMA
with regular LDPC codes, in Proc. of IEEE International Symposium on
Information Theory (ISIT), Yokohama, Japan (June/July 2003), p. 444.
9. D. Guo and S. Verdu, Randomly spread CDMA: Asymptotics via statistical
physics, IEEE Trans. Inform. Theory 51(6) (2005) 19832010.
10. R. R. Muller and W. H. Gerstacker, On the capacity loss due to separation
of detection and decoding, IEEE Trans. Inform. Theory 50(8) (2004) 1769
1778.
11. R. R. Muller, Channel capacity and minimum probability of error in large
dual antenna array systems with binary modulation, IEEE Trans. Signal
Process. 51(11) (2003) 28212828.
12. T. Tanaka and M. Okada, Approximate belief propagation, density evolution,
and statistical neurodynamics for CDMA multiuser detection, IEEE Trans.
Inform. Theory 51(2) (2005) 700706.
13. Y. Kabashima, A CDMA multiuser detection algorithm on the basis of belief
propagation, J. Phys. A 36 (2003) 1111111121.
14. H. Li and H. Vincent Poor, Impact of channel estimation errors on multiuser
detection via the replica method, EURASIP J. Wireless Commun. Network-
ing 2005(2) (2005) 175186.