Random Matrix Theory and Wireless Communications

Foundations and Trends™ in Communications and Information Theory, Volume 1, Issue 1, 2004. Editorial Board Editor-in-Chief: Sergio Verdú, Department of Electrical Engineering, Princeton University, Princeton, New Jersey 08544, USA.


Foundations and Trends™ in
Communications and Information Theory

Volume 1 Issue 1, 2004

Editorial Board
Editor-in-Chief: Sergio Verdú
Department of Electrical Engineering
Princeton University
Princeton, New Jersey 08544, USA
[email protected]

Editors
Venkat Anantharam (Berkeley) Amos Lapidoth (ETH Zurich)
Ezio Biglieri (Torino) Bob McEliece (Caltech)
Giuseppe Caire (Eurecom) Neri Merhav (Technion)
Roger Cheng (Hong Kong) David Neuhoff (Michigan)
K.C. Chen (Taipei) Alon Orlitsky (San Diego)
Daniel Costello (Notre Dame) Vincent Poor (Princeton)
Thomas Cover (Stanford) Kannan Ramchandran (Berkeley)
Anthony Ephremides (Maryland) Bixio Rimoldi (EPFL)
Andrea Goldsmith (Stanford) Shlomo Shamai (Technion)
Dave Forney (MIT) Amin Shokrollahi (EPFL)
Georgios Giannakis (Minnesota) Gadiel Seroussi (HP-Palo Alto)
Joachim Hagenauer (Munich) Wojciech Szpankowski (Purdue)
Te Sun Han (Tokyo) Vahid Tarokh (Harvard)
Babak Hassibi (Caltech) David Tse (Berkeley)
Michael Honig (Northwestern) Ruediger Urbanke (EPFL)
Johannes Huber (Erlangen) Steve Wicker (Georgia Tech)
Hideki Imai (Tokyo) Raymond Yeung (Hong Kong)
Rodney Kennedy (Canberra) Bin Yu (Berkeley)
Sanjeev Kulkarni (Princeton)
Editorial Scope
Foundations and Trends™ in Communications and Information Theory
will publish survey and tutorial articles in the following topics:

• Coded modulation • Multiuser detection


• Coding theory and practice • Multiuser information theory
• Communication complexity • Optical communication channels
• Communication system design • Pattern recognition and learning
• Cryptology and data security • Quantization
• Data compression • Quantum information processing
• Data networks • Rate-distortion theory
• Demodulation and equalization • Shannon theory
• Denoising • Signal processing for communications
• Detection and estimation • Source coding
• Information theory and statistics • Storage and recording codes
• Information theory and computer science • Speech and image compression
• Joint source/channel coding • Wireless communications
• Modulation and signal design

Information for Librarians

Foundations and Trends™ in Communications and Information Theory, 2004,
Volume 1, 4 issues. ISSN paper version 1567-2190 (USD 200 N. America; EUR 200
Outside N. America). ISSN online version 1567-2328 (EUR 250 N. America; EUR
250 Outside N. America). Also available as a combined paper and online subscription
(USD N. America; EUR 300 Outside N. America).
Random Matrix
Theory and Wireless
Communications

Antonia M. Tulino

Dept. Ingegneria Elettronica e delle Telecomunicazioni
Università degli Studi di Napoli “Federico II”
Naples 80125, Italy
[email protected]

Sergio Verdú

Dept. Electrical Engineering


Princeton University
Princeton, New Jersey 08544, USA
[email protected]
Foundations and Trends™ in
Communications and Information Theory

Published, sold and distributed by:


PO Box 179
2600 AD Delft
The Netherlands
Tel: +31-6-51115274
www.nowpublishers.com
[email protected]

in North America:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510

Printed on acid-free paper

ISSNs: Paper version 1567-2190; Electronic version 1567-2328


© 2004 A.M. Tulino and S. Verdú

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any
means, mechanical, photocopying, recording or otherwise, without prior
written permission of the publishers.

Now Publishers Inc. has an exclusive licence to publish this material
worldwide. Permission to use this content must be obtained from
the copyright licence holder. Please apply to now Publishers, PO Box
179, 2600 AD Delft, The Netherlands; www.nowpublishers.com; e-mail:
[email protected]

Printed in Great Britain by Antony Rowe Limited.


Foundations and Trends™ in
Communications and Information Theory
Vol 1, No 1 (2004) 1-182
© 2004 A.M. Tulino and S. Verdú

Random Matrix
Theory and Wireless
Communications
Antonia M. Tulino¹, Sergio Verdú²

¹ Dept. Ingegneria Elettronica e delle Telecomunicazioni, Università degli Studi di
Napoli “Federico II”, Naples 80125, Italy
² Dept. Electrical Engineering, Princeton University, Princeton, New Jersey 08544,
USA

Abstract
Random matrix theory has found many applications in physics, statis-
tics and engineering since its inception. Although early developments
were motivated by practical experimental problems, random matrices
are now used in fields as diverse as the Riemann hypothesis, stochastic
differential equations, condensed matter physics, statistical physics,
chaotic systems, numerical linear algebra, neural networks, multivari-
ate statistics, information theory, signal processing and small-world
networks. This article is a tutorial on random matrices that
gives an overview of the theory and brings together in one source
the most significant results recently obtained. Furthermore, the appli-
cation of random matrix theory to the fundamental limits of wireless
communication channels is described in depth.
Table of Contents

Section 1 Introduction
1.1 Wireless Channels
1.2 The Role of the Singular Values
1.3 Random Matrices: A Brief Historical Account

Section 2 Random Matrix Theory
2.1 Types of Matrices and Non-Asymptotic Results
2.2 Transforms
2.3 Asymptotic Spectrum Theorems
2.4 Free Probability
2.5 Convergence Rates and Asymptotic Normality

Section 3 Applications to Wireless Communications
3.1 Direct-Sequence CDMA
3.2 Multi-Carrier CDMA
3.3 Single-User Multi-Antenna Channels
3.4 Other Applications

Section 4 Appendices
4.1 Proof of Theorem 2.39
4.2 Proof of Theorem 2.42
4.3 Proof of Theorem 2.44
4.4 Proof of Theorem 2.49
4.5 Proof of Theorem 2.53

References

1 Introduction

From its inception, random matrix theory has been heavily influenced
by its applications in physics, statistics and engineering. The landmark
contributions to the theory of random matrices of Wishart (1928) [311],
Wigner (1955) [303], and Marčenko and Pastur (1967) [170] were motivated
to a large extent by practical experimental problems. Nowadays,
random matrices find applications in fields as diverse as the Riemann
hypothesis, stochastic differential equations, condensed matter physics,
statistical physics, chaotic systems, numerical linear algebra, neural
networks, multivariate statistics, information theory, signal processing,
and small-world networks. Despite the widespread applicability of the
tools and results in random matrix theory, there is no tutorial reference
that gives an accessible overview of the classical theory as well as the
recent results, many of which have been obtained under the umbrella
of free probability theory.
In the last few years, a considerable body of work has emerged in the
communications and information theory literature on the fundamental
limits of communication channels that makes substantial use of results
in random matrix theory.
The purpose of this monograph is to give a tutorial overview of random
matrix theory with particular emphasis on asymptotic theorems
on the distribution of eigenvalues and singular values under various
assumptions on the joint distribution of the random matrix entries. While
results for matrices with fixed dimensions are often cumbersome and
offer limited insight, as the matrices grow large with a given aspect
ratio (number of columns to number of rows), a number of powerful
and appealing theorems ensure convergence of the empirical eigenvalue
distributions to deterministic functions.
The organization of this monograph is the following. Section 1.1
introduces the general class of vector channels of interest in wireless
communications. These channels are characterized by random matrices
that admit various statistical descriptions depending on the actual ap-
plication. Section 1.2 motivates interest in large random matrix theory
by focusing on two performance measures of engineering interest: Shan-
non capacity and linear minimum mean-square error, which are deter-
mined by the distribution of the singular values of the channel matrix.
The power of random matrix results in the derivation of asymptotic
closed-form expressions is illustrated for channels whose matrices have
the simplest statistical structure: independent identically distributed
(i.i.d.) entries. Section 1.3 gives a brief historical tour of the main re-
sults in random matrix theory, from the work of Wishart on Gaussian
matrices with fixed dimension, to the recent results on asymptotic spec-
tra. Section 2 gives a tutorial account of random matrix theory. Section
2.1 focuses on the major types of random matrices considered in the
literature, as well as on the main fixed-dimension theorems. Section 2.2 gives
an account of the Stieltjes, η, Shannon, Mellin, R- and S-transforms.
These transforms play key roles in describing the spectra of random
matrices. Motivated by the intuition drawn from various applications
in communications, the η and Shannon transforms turn out to be quite
helpful at clarifying the exposition as well as the statement of many
results. Considerable emphasis is placed on examples and closed-form
expressions. Section 2.3 uses the transforms defined in Section 2.2 to
state the main asymptotic distribution theorems. Section 2.4 presents
an overview of the application of Voiculescu’s free probability theory
to random matrices. Recent results on the speed of convergence to the
asymptotic limits are reviewed in Section 2.5. Section 3 applies the
results in Section 2 to the fundamental limits of wireless communication
channels described by random matrices. Section 3.1 deals with direct-
sequence code-division multiple-access (DS-CDMA), with and without
fading (both frequency-flat and frequency-selective) and with single
and multiple receive antennas. Section 3.2 deals with multi-carrier code-
division multiple access (MC-CDMA), which is the time-frequency dual
of the model considered in Section 3.1. Channels with multiple receive
and transmit antennas are reviewed in Section 3.3 using models that
incorporate nonideal effects such as antenna correlation, polarization,
and line-of-sight components.

1.1 Wireless Channels


The last decade has witnessed a renaissance in the information theory
of wireless communication channels. Two prime reasons for the strong
level of activity in this field can be identified. The first is the grow-
ing importance of the efficient use of bandwidth and power in view
of the ever-increasing demand for wireless services. The second is the
fact that some of the main challenges in the study of the capacity of
wireless channels have only been successfully tackled recently. Fading,
wideband, multiuser and multi-antenna are some of the key features
that characterize wireless channels of contemporary interest. Most of
the information theoretic literature that studies the effect of those fea-
tures on channel capacity deals with linear vector memoryless channels
of the form

y = Hx + n (1.1)

where x is the K-dimensional input vector, y is the N-dimensional
output vector, and the N-dimensional vector n models the additive
circularly symmetric Gaussian noise. All these quantities are, in gen-
eral, complex-valued. In addition to input constraints, and the degree
of knowledge of the channel at receiver and transmitter, (1.1) is char-
acterized by the distribution of the N × K random channel matrix H
whose entries are also complex-valued.
The nature of the K and N dimensions depends on the actual
application. For example, in the single-user narrowband channel with n_T
and n_R antennas at transmitter and receiver, respectively, we identify
K = n_T and N = n_R; in the DS-CDMA channel, K is the number of
users and N is the spreading gain.
In the multi-antenna case, H models the propagation coefficients
between each pair of transmit-receive antennas. In the synchronous DS-
CDMA channel, in contrast, the entries of H depend on the received
signature vectors (usually pseudo-noise sequences) and the fading
coefficients seen by each user. For a channel with J users each transmitting
with n_T antennas using spread-spectrum with spreading gain G and a
receiver with n_R antennas, K = n_T J and N = n_R G.
Naturally, the simplest example is the one where H has i.i.d. entries,
which constitutes the canonical model for the single-user narrowband
multi-antenna channel. The same model applies to the randomly spread
DS-CDMA channel not subject to fading. However, as we will see, in
many cases of interest in wireless communications the entries of H are
not i.i.d.
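
The canonical i.i.d. case of the model (1.1) can be illustrated with a short simulation. The following is a minimal sketch, not taken from the original text: the function name `simulate_channel`, the unit-power Gaussian input, and the normalization of the entries of H to variance 1/N are assumptions made for illustration.

```python
import numpy as np

def simulate_channel(N, K, snr, rng):
    """Draw one realization of y = Hx + n with i.i.d. complex Gaussian H."""
    # Assumed normalization: zero-mean entries of H with variance 1/N.
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    # Assumed input: i.i.d. unit-variance complex Gaussian, so E[||x||^2] = K.
    x = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    # Per-component noise variance 1/snr, so that
    # N E[||x||^2] / (K E[||n||^2]) equals the requested snr.
    sigma2 = 1.0 / snr
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y = H @ x + n
    return H, x, n, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, x, n, y = simulate_channel(N=64, K=32, snr=10.0, rng=rng)
    print(H.shape, y.shape)
```

Here K = 32 and N = 64 are merely example dimensions; in the multi-antenna reading they would play the roles of n_T and n_R.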

1.2 The Role of the Singular Values


Assuming that the channel matrix H is completely known at the re-
ceiver, the capacity of (1.1) under input power constraints depends on
the distribution of the singular values of H. We focus on the simplest
setting to illustrate this point as crisply as possible: suppose that the
entries of the input vector x are i.i.d. For example, this is the case
in a synchronous DS-CDMA multiaccess channel or for a single-user
multi-antenna channel where the transmitter cannot track the channel.
The empirical cumulative distribution function of the eigenvalues
(also referred to as the spectrum or empirical distribution) of an n × n
Hermitian matrix A is denoted by $F_{\mathbf{A}}^{n}$ and defined as¹

\[ F_{\mathbf{A}}^{n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1\{\lambda_i(\mathbf{A}) \le x\}, \tag{1.2} \]

where λ1 (A), . . . , λn (A) are the eigenvalues of A and 1{·} is the indi-
cator function.
¹ If $F_{\mathbf{A}}^{n}$ converges as n → ∞, then the corresponding limit (asymptotic empirical distribution
or asymptotic spectrum) is simply denoted by $F_{\mathbf{A}}(x)$.
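
The definition (1.2) translates directly into a few lines of code. This is an illustrative sketch only; the function name `empirical_spectral_cdf` is hypothetical and the input matrix is assumed to be Hermitian.

```python
import numpy as np

def empirical_spectral_cdf(A, x):
    """Evaluate the empirical eigenvalue distribution (1.2) of a Hermitian A at the points x."""
    eigs = np.linalg.eigvalsh(A)  # real eigenvalues of a Hermitian matrix
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # For each evaluation point, the fraction of eigenvalues not exceeding it.
    return np.mean(eigs[None, :] <= x[:, None], axis=1)

if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 3.0, 4.0])
    print(empirical_spectral_cdf(A, [0.5, 2.5, 10.0]))
```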

Now, consider an arbitrary N × K matrix H. Since the nonzero
singular values of H and H† are identical, we can write

\[ N F_{\mathbf{H}\mathbf{H}^\dagger}^{N}(x) - N u(x) = K F_{\mathbf{H}^\dagger\mathbf{H}}^{K}(x) - K u(x) \tag{1.3} \]

where u(x) is the unit-step function (u(x) = 0, x ≤ 0; u(x) = 1, x > 0).
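With an i.i.d. Gaussian input, the normalized input-output mutual
information of (1.1) conditioned on H is²

\[ \frac{1}{N} I(\mathbf{x};\mathbf{y}\,|\,\mathbf{H}) = \frac{1}{N} \log\det\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^\dagger\right) \tag{1.4} \]
\[ = \frac{1}{N} \sum_{i=1}^{N} \log\left(1 + \mathsf{SNR}\,\lambda_i(\mathbf{H}\mathbf{H}^\dagger)\right) \]
\[ = \int_0^\infty \log\left(1 + \mathsf{SNR}\,x\right) dF_{\mathbf{H}\mathbf{H}^\dagger}^{N}(x) \tag{1.5} \]

with the transmitted signal-to-noise ratio (SNR)

\[ \mathsf{SNR} = \frac{N\,E[\|\mathbf{x}\|^2]}{K\,E[\|\mathbf{n}\|^2]}, \tag{1.6} \]

and with λi(HH†) equal to the ith squared singular value of H.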
If the channel is known at the receiver and its variation over time
is stationary and ergodic, then the expectation of (1.4) over the dis-
tribution of H is the channel capacity (normalized to the number of
receive antennas or the number of degrees of freedom per symbol in
the CDMA channel). More generally, the distribution of the random
variable (1.4) determines the outage capacity (e.g. [22]).
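
The equality between the log-det form (1.4) and the sum over squared singular values that leads to (1.5) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the logarithm is taken as the natural logarithm (nats), which is an assumption since the text leaves the base unspecified.

```python
import numpy as np

def mutual_information_logdet(H, snr):
    """(1/N) log det(I + snr * H H^dagger), in nats, as in (1.4)."""
    N = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))
    return logdet / N

def mutual_information_spectrum(H, snr):
    """Same quantity from the squared singular values of H.

    Zero eigenvalues of H H^dagger contribute log(1) = 0, so only the
    min(N, K) singular values of H are needed.
    """
    N = H.shape[0]
    sq_singular_values = np.linalg.svd(H, compute_uv=False) ** 2
    return np.sum(np.log1p(snr * sq_singular_values)) / N

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(12)
    print(mutual_information_logdet(H, 5.0), mutual_information_spectrum(H, 5.0))
```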
² The celebrated log-det formula has a long history: In 1964, Pinsker [204] gave a general
log-det formula for the mutual information between jointly Gaussian random vectors but
did not particularize it to the linear model (1.1). Verdú [270] in 1986 gave the explicit form
(1.4) as the capacity of the synchronous DS-CDMA channel as a function of the signature
vectors. The 1991 textbook by Cover and Thomas [47] gives the log-det formula for the
capacity of the power constrained vector Gaussian channel with arbitrary noise covariance
matrix. In the mid 1990s, Foschini [77] and Telatar [250] gave (1.4) for the multi-antenna
channel with i.i.d. Gaussian entries. Even prior to those works, the conventional analyses
of Gaussian channels with memory via vector channels (e.g. [260, 31]) used the fact that
the capacity can be expressed as the sum of the capacities of independent channels whose
signal-to-noise ratios are governed by the singular values of the channel matrix.

Another important performance measure for (1.1) is the minimum
mean-square-error (MMSE) achieved by a linear receiver, which determines
the maximum achievable output signal-to-interference-and-noise
ratio (SINR). For an i.i.d. input, the arithmetic mean over the users (or
transmit antennas) of the MMSE is given, as a function of H, by [271]

\[ \frac{1}{K} \min_{\mathbf{M} \in \mathbb{C}^{K \times N}} E\left[\|\mathbf{x} - \mathbf{M}\mathbf{y}\|^2\right] = \frac{1}{K}\,\mathrm{tr}\left\{\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}^\dagger\mathbf{H}\right)^{-1}\right\} \tag{1.7} \]
\[ = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{1 + \mathsf{SNR}\,\lambda_i(\mathbf{H}^\dagger\mathbf{H})} \tag{1.8} \]
\[ = \int_0^\infty \frac{1}{1 + \mathsf{SNR}\,x}\, dF_{\mathbf{H}^\dagger\mathbf{H}}^{K}(x) \]
\[ = \frac{N}{K} \int_0^\infty \frac{1}{1 + \mathsf{SNR}\,x}\, dF_{\mathbf{H}\mathbf{H}^\dagger}^{N}(x) - \frac{N-K}{K} \tag{1.9} \]
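where the expectation in (1.7) is over x and n while (1.9) follows from
(1.3). Note, incidentally, that both performance measures as a function
of SNR are coupled through

\[ \mathsf{SNR}\,\frac{d}{d\,\mathsf{SNR}} \log_e \det\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^\dagger\right) = K - \mathrm{tr}\left\{\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}^\dagger\mathbf{H}\right)^{-1}\right\}. \]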
As we see in (1.5) and (1.9), both fundamental performance measures
(capacity and MMSE) are dictated by the empirical
(squared) singular value distribution of the random channel matrix.
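
The trace form (1.7), the spectral form (1.9), and the coupling between the log-det and the MMSE can all be verified on a single matrix realization. The following sketch is illustrative: the function names are hypothetical, and the derivative is approximated by a central finite difference, which is a choice made here and not something prescribed by the text.

```python
import numpy as np

def mmse_trace(H, snr):
    """(1/K) tr{(I + snr * H^dagger H)^{-1}}, as in (1.7)."""
    K = H.shape[1]
    return np.trace(np.linalg.inv(np.eye(K) + snr * (H.conj().T @ H))).real / K

def mmse_from_outer_spectrum(H, snr):
    """Same quantity from the N eigenvalues of H H^dagger, as in (1.9)."""
    N, K = H.shape
    lam = np.linalg.eigvalsh(H @ H.conj().T)
    return (N / K) * np.mean(1.0 / (1.0 + snr * lam)) - (N - K) / K

def logdet_nats(H, snr):
    """Natural-log determinant of I + snr * H H^dagger."""
    N = H.shape[0]
    return np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))[1]

def snr_times_derivative(H, snr, rel_step=1e-5):
    """snr * d/dsnr of logdet_nats, using a central difference (an approximation)."""
    h = rel_step * snr
    return snr * (logdet_nats(H, snr + h) - logdet_nats(H, snr - h)) / (2 * h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = (rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))) / 4.0
    K = H.shape[1]
    print(mmse_trace(H, 3.0), mmse_from_outer_spectrum(H, 3.0))
    print(snr_times_derivative(H, 3.0), K - K * mmse_trace(H, 3.0))
```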
In the simplest case of H having i.i.d. Gaussian entries, the density
function corresponding to the expected value of $F_{\mathbf{H}\mathbf{H}^\dagger}^{N}$ can be expressed
explicitly in terms of the Laguerre polynomials. Although the integrals
in (1.5) and (1.9) with respect to such a probability density function
(p.d.f.) lead to explicit solutions, limited insight can be drawn from
either the solutions or their numerical evaluation. Fortunately, much
deeper insights can be obtained using the tools provided by asymptotic
random matrix theory. Indeed, a rich body of results exists analyzing
the asymptotic spectrum of H as the numbers of columns and rows go
to infinity while the aspect ratio of the matrix is kept constant.
Before introducing the asymptotic spectrum results, some justifica-
tion for their relevance to wireless communication problems is in order.
In CDMA, channels with K and N between 32 and 64 would be fairly
typical. In multi-antenna systems, arrays of 8 to 16 antennas would be
at the forefront of what is envisioned to be feasible in the foreseeable
future. Surprisingly, even considerably smaller system sizes are large enough for
the asymptotic limit to be an excellent approximation. Furthermore,
not only do the averages of (1.4) and (1.9) converge to their limits
surprisingly fast, but the randomness in those functionals due to the
random outcome of H disappears extremely quickly. Naturally, such
robustness has welcome consequences for the operational significance
of the resulting formulas.

Fig. 1.1 The Marčenko-Pastur density function (1.10) for β = 1, 0.5, 0.2.

As we will see in Section 2, a central result in random matrix theory
states that when the entries of H are zero-mean i.i.d. with variance 1/N,
the empirical distribution of the eigenvalues of H†H converges almost
surely, as K, N → ∞ with K/N → β, to the so-called Marčenko-Pastur
law whose density function is

\[ f_\beta(x) = \left(1 - \frac{1}{\beta}\right)^{+} \delta(x) + \frac{\sqrt{(x-a)^{+}(b-x)^{+}}}{2\pi\beta x} \tag{1.10} \]

where (z)^+ = max(0, z) and

\[ a = (1 - \sqrt{\beta})^2, \qquad b = (1 + \sqrt{\beta})^2. \tag{1.11} \]
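
As a rough numerical illustration of this convergence, the continuous part of the Marčenko-Pastur density can be compared with the eigenvalues of H†H for one moderately sized realization. The sketch below assumes the band edges a = (1 − √β)² and b = (1 + √β)², complex Gaussian entries of variance 1/N, and a simple midpoint rule for the integral; the function names are hypothetical.

```python
import numpy as np

def mp_density(x, beta):
    """Continuous part of the Marcenko-Pastur density; any mass at 0 is omitted."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    a = (1.0 - np.sqrt(beta)) ** 2
    b = (1.0 + np.sqrt(beta)) ** 2
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    out[inside] = np.sqrt((xi - a) * (b - xi)) / (2.0 * np.pi * beta * xi)
    return out

def mp_cdf_continuous(t, beta, points=200_000):
    """Midpoint-rule integral of mp_density from the lower edge a up to t."""
    a = (1.0 - np.sqrt(beta)) ** 2
    edges = np.linspace(a, t, points + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(mp_density(mids, beta)) * (t - a) / points

def gram_eigenvalues(N, K, rng):
    """Eigenvalues of H^dagger H for an N x K matrix with variance-1/N entries."""
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    return np.linalg.eigvalsh(H.conj().T @ H)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    eigs = gram_eigenvalues(400, 200, rng)  # beta = 0.5
    print(np.mean(eigs <= 1.0), mp_cdf_continuous(1.0, 0.5))
```

For β > 1 the continuous part integrates to 1/β, with the remaining mass 1 − 1/β sitting at zero.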

Fig. 1.2 The Marčenko-Pastur density function (1.12) for β = 10, 1, 0.5, 0.2. Note that the
mass points at 0, present in some of them, are not shown.

Analogously, the empirical distribution of the eigenvalues of HH†
converges almost surely to a nonrandom limit whose density function
is (cf. Fig. 1.2)

\[ \tilde{f}_\beta(x) = (1 - \beta)\,\delta(x) + \beta f_\beta(x) = (1 - \beta)^{+}\,\delta(x) + \frac{\sqrt{(x-a)^{+}(b-x)^{+}}}{2\pi x}. \tag{1.12} \]
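Using the asymptotic spectrum, the following closed-form expressions
for the limits of (1.4) [275] and (1.7) [271] can be obtained:

\[ \frac{1}{N} \log\det\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}\mathbf{H}^\dagger\right) \to \beta \int_a^b \log(1 + \mathsf{SNR}\,x)\, f_\beta(x)\, dx \tag{1.13} \]
\[ = \beta \log\left(1 + \mathsf{SNR} - \frac{1}{4}\mathcal{F}(\mathsf{SNR}, \beta)\right) + \log\left(1 + \mathsf{SNR}\,\beta - \frac{1}{4}\mathcal{F}(\mathsf{SNR}, \beta)\right) - \frac{\log e}{4\,\mathsf{SNR}}\,\mathcal{F}(\mathsf{SNR}, \beta) \tag{1.14} \]

\[ \frac{1}{K}\,\mathrm{tr}\left\{\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}^\dagger\mathbf{H}\right)^{-1}\right\} \to \int_a^b \frac{1}{1 + \mathsf{SNR}\,x}\, f_\beta(x)\, dx \tag{1.15} \]
\[ = 1 - \frac{\mathcal{F}(\mathsf{SNR}, \beta)}{4\,\beta\,\mathsf{SNR}} \tag{1.16} \]

with

\[ \mathcal{F}(x, z) = \left(\sqrt{x(1+\sqrt{z})^2 + 1} - \sqrt{x(1-\sqrt{z})^2 + 1}\right)^2. \tag{1.17} \]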
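
The closed-form limits can be evaluated directly and set against a single finite-size realization. The sketch below is an illustration under stated assumptions: F(x, z) is taken as (√(x(1+√z)²+1) − √(x(1−√z)²+1))², logarithms are natural (so that log e = 1), H has complex Gaussian entries of variance 1/N, and all function names are hypothetical.

```python
import numpy as np

def F(x, z):
    """Auxiliary function appearing in the closed-form limits."""
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

def asymptotic_capacity(snr, beta):
    """Closed-form limit of (1/N) log det(I + snr H H^dagger), in nats."""
    f4 = F(snr, beta) / 4.0
    return (beta * np.log(1 + snr - f4)
            + np.log(1 + snr * beta - f4)
            - f4 / snr)

def asymptotic_mmse(snr, beta):
    """Closed-form limit of (1/K) tr{(I + snr H^dagger H)^{-1}}."""
    return 1.0 - F(snr, beta) / (4.0 * beta * snr)

def finite_size_values(N, K, snr, rng):
    """Capacity per receive dimension (nats) and average MMSE for one random H."""
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    cap = np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))[1] / N
    mmse = np.trace(np.linalg.inv(np.eye(K) + snr * (H.conj().T @ H))).real / K
    return cap, mmse

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(finite_size_values(200, 100, 10.0, rng))
    print(asymptotic_capacity(10.0, 0.5), asymptotic_mmse(10.0, 0.5))
```

For β = 1 and SNR = 1 the expressions reduce to quantities involving the golden ratio, which gives a convenient sanity check on an implementation.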


Fig. 1.3 Several realizations of the left-hand side of (1.13) are compared to the asymptotic
limit in the right-hand side of (1.13) in the case of β = 1 for sizes: N = 3, 5, 15, 50.
(Four panels, one per value of N; horizontal axis: SNR.)

The convergence of the singular values of H exhibits several key
features with engineering significance:

• Insensitivity of the asymptotic eigenvalue distribution to the
shape of the p.d.f. of the random matrix entries. This property
implies, for example, that in the case of a single-user
multi-antenna link, the results obtained asymptotically hold
for any type of fading statistics. It also implies that restricting
the CDMA waveforms to be binary-valued incurs no loss
in capacity asymptotically.³
• Ergodic behavior: it suffices to observe a single matrix realiza-
tion in order to obtain convergence to a deterministic limit.
In other words, the eigenvalue histogram of any matrix re-
alization converges almost surely to the average asymptotic
eigenvalue distribution. This “hardening” of the singular val-
ues lends operational significance to the capacity formulas
even in cases where the random channel parameters do not
vary ergodically within the span of a codeword.
• Fast convergence of the empirical singular-value distribution
to its asymptotic limit. Asymptotic analysis is especially use-
ful when the convergence is so fast that, even for small values
of the parameters, the asymptotic results come close to the
finite-size results (cf. Fig. 1.3). Recent works have shown that
the convergence rate is of the order of the reciprocal of the
number of entries in the random matrix [8, 110].
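
The first two features in the list above (insensitivity to the entry distribution and hardening of the singular values) can be observed in a small Monte Carlo experiment. The sketch below is illustrative only: the function names, the choice β = 1, SNR = 10, and the comparison between complex Gaussian and real ±1/√N entries are assumptions made for the example.

```python
import numpy as np

def capacity_per_dimension(H, snr):
    """(1/N) log det(I + snr H H^dagger), in nats."""
    N = H.shape[0]
    return np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))[1] / N

def draw_matrix(N, K, rng, binary=False):
    """Zero-mean i.i.d. entries with variance 1/N: binary +-1/sqrt(N) or complex Gaussian."""
    if binary:
        return rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
    return (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)

def capacity_samples(N, K, snr, trials, rng, binary=False):
    """Capacity per dimension over independent matrix realizations."""
    return np.array([capacity_per_dimension(draw_matrix(N, K, rng, binary), snr)
                     for _ in range(trials)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for N in (8, 64):
        gauss = capacity_samples(N, N, 10.0, 200, rng)
        binary = capacity_samples(N, N, 10.0, 200, rng, binary=True)
        print(N, gauss.mean(), gauss.std(), binary.mean(), binary.std())
```

In such an experiment the spread across realizations shrinks as N grows, and the averages for the two entry distributions approach each other.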

It is crucial for the explicit expressions of asymptotic capacity and
MMSE shown in (1.14) and (1.16), respectively, that the channel matrix
entries be i.i.d. Outside that model, explicit expressions for the asymp-
totic singular value distribution such as (1.10) are exceedingly rare.
Fortunately, in other random matrix models, the asymptotic singular
value distribution can indeed be characterized, albeit not in explicit
form, in ways that enable the analysis of capacity and MMSE through
the numerical solution of nonlinear equations.
³ The spacing between consecutive eigenvalues, when properly normalized, was conjectured
in [65, 66] to converge in distribution to a limit that does not depend on the shape of the
p.d.f. of the entries. The universality of the level spacing distribution and other microscopic
(local) spectral characteristics has been extensively discussed in recent theoretical physics
and mathematical literature [174, 106, 200, 52, 54].

The first applications of random matrix theory to wireless communications
were the works of Foschini [77] and Telatar [250] on narrowband
multi-antenna capacity; Verdú [271] and Tse-Hanly [256] on the
optimum SINR achievable by linear multiuser detectors for CDMA;
Verdú [271] on optimum near-far resistance; Grant-Alexander [100],
Verdú-Shamai [275, 217], Rapajic-Popescu [206], and Müller [185] on
the capacity of CDMA. Subsequently, a number of works, surveyed in
Section 3, have successfully applied random matrix theory to a vari-
ety of problems in the design and analysis of wireless communication
systems.
Not every result of interest in the asymptotic analysis of channels of
the form (1.1) has made use of the asymptotic eigenvalue tools that are
of central interest in this paper. For example, the analysis of single-user
matched filter receivers [275] and the analysis of the optimum asymp-
totic multiuser efficiency [258] have used various versions of the central-
limit theorem; the analysis of the asymptotic uncoded error probability
as well as the rates achievable with suboptimal constellations have used
tools from statistical physics such as the replica method [249, 103].

1.3 Random Matrices: A Brief Historical Account


In this subsection, we provide a brief introduction to the main devel-
opments in the theory of random matrices. A more detailed account
of the theory itself, with particular emphasis on the results that are
relevant for wireless communications, is given in Section 2.
Random matrices have been a part of advanced multivariate statis-
tical analysis since the end of the 1920s with the work of Wishart [311]
on fixed-size matrices with Gaussian entries. The first asymptotic re-
sults on the limiting spectrum of large random matrices were obtained
by Wigner in the 1950s in a series of papers [303, 305, 306] motivated by
nuclear physics. Replacing the self-adjoint Hamiltonian operator in an
infinite-dimensional Hilbert space by an ensemble of very large Hermi-
tian matrices, Wigner was able to bypass the Schrödinger equation and
explain the statistics of experimentally measured atomic energy levels
in terms of the limiting spectrum of those random matrices. Since then,
research on the limiting spectral analysis of large-dimensional random
matrices has continued to attract interest in probability, statistics and
physics.
Wigner [303] initially dealt with an n×n symmetric matrix A whose
diagonal entries are 0 and whose upper-triangle entries are independent
and take the values ±1 with equal probability. Through a combinatorial
derivation of the asymptotic eigenvalue moments involving the Catalan
numbers, Wigner showed that, as n → ∞, the averaged empirical
distribution of the eigenvalues of $\frac{1}{\sqrt{n}}\mathbf{A}$ converges to the semicircle law
whose density is

\[ w(x) = \begin{cases} \dfrac{1}{2\pi}\sqrt{4 - x^2} & \text{if } |x| \le 2 \\ 0 & \text{if } |x| > 2 \end{cases} \tag{1.18} \]

Fig. 1.4 The semicircle law density function (1.18) compared with the histogram of the
average of 100 empirical density functions for a Wigner matrix of size n = 100.

Later, Wigner [305] realized that the same result would be obtained if
the random selection was sampled from a zero-mean (real or complex)
Gaussian distribution. In that case, it is even possible to find an exact
formula for the joint distribution of the eigenvalues as a function of
n [176]. The matrices treated in [303] and [305] are special cases of
Wigner matrices, defined as Hermitian matrices whose upper-triangle
entries are zero-mean and independent. In [306], Wigner showed that
the asymptotic distribution of any Wigner matrix is the semicircle law
(1.18) even if only a unit second-moment condition is placed on its
entries.
Figure 1.4 compares the semicircle law density function (1.18) with
the average of 100 empirical density functions of the eigenvalues of a
10 × 10 Wigner matrix whose diagonal entries are 0 and whose upper-
triangle entries are independent and take the values ±1 with equal
probability.
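
An experiment in the spirit of Figure 1.4 can be sketched as follows. The code is illustrative only: the function names are hypothetical, the matrix size is a free parameter, and the check against the second and fourth moments of the semicircle law (1 and 2, the first two Catalan numbers) is a choice made here for brevity in place of a plotted histogram.

```python
import numpy as np

def wigner_pm1(n, rng):
    """Symmetric matrix with zero diagonal and independent +-1 upper-triangle entries."""
    signs = rng.choice([-1.0, 1.0], size=(n, n))
    upper = np.triu(signs, k=1)  # strictly upper triangle
    return upper + upper.T

def semicircle_density(x):
    """Semicircle density: sqrt(4 - x^2) / (2 pi) on [-2, 2], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 2.0,
                    np.sqrt(np.clip(4.0 - x ** 2, 0.0, None)) / (2.0 * np.pi),
                    0.0)

def scaled_eigenvalues(n, rng):
    """Eigenvalues of A / sqrt(n) for one Wigner matrix A."""
    return np.linalg.eigvalsh(wigner_pm1(n, rng) / np.sqrt(n))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eigs = scaled_eigenvalues(300, rng)
    print(np.mean(eigs ** 2), np.mean(eigs ** 4))  # close to 1 and 2
```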
If no attempt is made to symmetrize the square matrix A and all
its entries are chosen to be i.i.d., then the eigenvalues of $\frac{1}{\sqrt{n}}\mathbf{A}$ are
asymptotically uniformly distributed on the unit disk of the complex
plane. This is commonly referred to as Girko’s full-circle law, which is
exemplified in Figure 1.5. It has been proved in various degrees of rigor
and generality in [173, 197, 85, 68, 9]. If the off-diagonal entries $A_{i,j}$ and
$A_{j,i}$ are Gaussian and pairwise correlated with correlation coefficient
ρ, then [238] shows that the eigenvalues of $\frac{1}{\sqrt{n}}\mathbf{A}$ are asymptotically
uniformly distributed on an ellipse in the complex plane whose axes
coincide with the real and imaginary axes and have semi-axis lengths 1 + ρ and
1 − ρ, respectively. When ρ = 1, the projection on the real axis of such
an elliptic law is equal to the semicircle law.
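
The full-circle law can be probed numerically: if the eigenvalues fill the region enclosed by the unit circle (the unit disk) uniformly, the fraction with modulus at most r should be close to r². The sketch below assumes zero-mean, unit-variance complex Gaussian entries; the function names are hypothetical.

```python
import numpy as np

def scaled_iid_eigenvalues(n, rng):
    """Eigenvalues of A / sqrt(n), where A has i.i.d. unit-variance complex Gaussian entries."""
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return np.linalg.eigvals(A / np.sqrt(n))

def fraction_within(eigs, r):
    """Fraction of eigenvalues with modulus at most r."""
    return np.mean(np.abs(eigs) <= r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eigs = scaled_iid_eigenvalues(300, rng)
    for r in (0.25, 0.5, 0.75, 1.0):
        print(r, fraction_within(eigs, r), r ** 2)
```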


Fig. 1.5 The full-circle law and the eigenvalues of a realization of a matrix of size n = 500.

Most of the results surveyed above pertain to the eigenvalues of
square matrices with independent entries. However, as we saw in
Section 1.2, key problems in wireless communications involve the singular
values of rectangular matrices H; even if those matrices have independent
entries, the matrices HH† whose eigenvalues are of interest do not
have independent entries.
When the entries of H are zero-mean i.i.d. Gaussian, HH† is com-
monly referred to as a Wishart matrix. The analysis of the joint dis-
tribution of the entries of Wishart matrices is as old as random matrix
theory itself [311]. The joint distribution of the eigenvalues of such ma-
trices is known as the Fisher-Hsu-Roy distribution and was discovered
simultaneously and independently by Fisher [75], Hsu [120], Girshick
[89] and Roy [210]. The corresponding marginal distributions can be
expressed in terms of the Laguerre polynomials [125].
The asymptotic theory of singular values of rectangular matrices
has concentrated on the case where the matrix aspect ratio converges
to a constant
\[ \frac{K}{N} \to \beta \tag{1.19} \]
as the size of the matrix grows.
The first success in the quest for the limiting empirical singular
value distribution of rectangular random matrices is due to Marc̆enko
and Pastur [170] in 1967. This landmark paper considers matrices of
the form
W = W0 + HTH† (1.20)
where T is a real diagonal matrix independent of H, W0 is a determin-
istic Hermitian matrix, and the columns of the N × K matrix H are
i.i.d. random vectors whose distribution satisfies a certain symmetry
condition (encompassing the cases of independent entries and uniform
distribution on the unit sphere). In the special case where W0 = 0,
T = I, and H has i.i.d. entries with variance 1/N, the limiting spectrum
of W found in [170] is the density in (1.10). In the special case of square
H, the asymptotic density function of the singular values, corresponding
to the square root of the random variable whose p.d.f. is (1.10) with
β = 1, is equal to the quarter circle law:

\[ q(x) = \frac{1}{\pi}\sqrt{4 - x^2}, \qquad 0 \le x \le 2. \tag{1.21} \]
As we will see in Section 2, in general (W0 ≠ 0 or T ≠ I) no closed-form
expression is known for the limiting spectrum. Rather, [170] characterized
it indirectly through its Stieltjes transform,⁴ which uniquely determines
the distribution function. Since [170], this transform, which can
be viewed as an iterated Laplace transform, has played a fundamental
role in the theory of random matrices.

Fig. 1.6 The quarter circle law compared with a histogram of the average of 100 empirical
singular value density functions of a matrix of size 100 × 100.

Figure 1.6 compares the quarter circle law density function (1.21)
with the average of 100 empirical singular value density functions of
a 100 × 100 square matrix H with independent zero-mean complex
Gaussian entries with variance 1/100.
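
An experiment of the kind summarized in Figure 1.6 can be sketched as follows. The quarter circle density is taken as q(x) = √(4 − x²)/π on [0, 2]; the number of trials, the bin layout on [0, 2.5], and all function names are assumptions made for this illustration and are not taken from the original figure.

```python
import numpy as np

def quarter_circle_density(x):
    """Quarter circle density: sqrt(4 - x^2) / pi on [0, 2], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0.0) & (x <= 2.0)
    return np.where(inside, np.sqrt(np.clip(4.0 - x ** 2, 0.0, None)) / np.pi, 0.0)

def singular_values(n, rng):
    """Singular values of an n x n complex Gaussian matrix with entry variance 1/n."""
    H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    return np.linalg.svd(H, compute_uv=False)

def averaged_histogram(n, trials, bins, rng):
    """Average of the normalized singular-value histograms over independent trials."""
    edges = np.linspace(0.0, 2.5, bins + 1)
    total = np.zeros(bins)
    for _ in range(trials):
        counts, _ = np.histogram(singular_values(n, rng), bins=edges, density=True)
        total += counts
    return edges, total / trials

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges, hist = averaged_histogram(100, 20, 25, rng)
    mids = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - quarter_circle_density(mids))))
```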
Despite the ground-breaking nature of Marc̆enko and Pastur’s con-
tribution, it remained in obscurity for quite some time. For example, in
1977 Grenander and Silverstein [101] rediscovered (1.10) motivated by
a neural network problem where the entries of H take only two values.
Also unaware of the in-probability convergence result of [170], in 1978
Wachter [296] arrived at the same solution but in the stronger sense of
almost sure convergence under the condition that the entries of H have
uniformly bounded central moments of order higher than 2 as well as
the same means and variances within a row. The almost sure convergence
for the model (1.20) considered in [170] was shown in [227]. Even
as late as 1991, rediscoveries of the Marčenko-Pastur law can be found
in the Physics literature [50].

4 The Stieltjes transform is defined in Section 2.2.1. The Dutch mathematician T. J. Stieltjes
(1856-1894) provided the first inversion formula for this transform in [246].

The case where W0 = 0 in (1.20), T is not necessarily diagonal but
Hermitian, and H has i.i.d. entries was solved by Silverstein [226], also
in terms of the Stieltjes transform.
The special case of (1.20) where W0 = 0, H has zero-mean i.i.d.
Gaussian entries and

$$ \mathbf{T} = (\mathbf{Y}\mathbf{Y}^\dagger)^{-1} $$

where the K × m matrix Y also has zero-mean i.i.d. Gaussian entries
with variance 1/m, independent of H, is called a (central) multivariate
F-matrix. Because of the statistical applications of such matrices, its
asymptotic spectrum has received considerable attention, culminating
in the explicit expression found by Silverstein [223] in 1985.
The speed of convergence to the limiting spectrum is studied in
[8]. For our applications it is more important, however, to assess the
speed of convergence of the performance measures (e.g. capacity and
MMSE) to their asymptotic limits. Note that the sums in the right
side of (1.4) involve dependent terms. Thanks to that dependence, the
convergence in (1.13) and (1.15) is quite remarkable: the deviations
from the respective limits multiplied by N converge to Gaussian random
variables with fixed mean5 and variance. This has been established
for general continuous functions, not just the logarithmic and rational
functions of (1.13) and (1.15), in [15] (see also [131]).
The matrix of eigenvectors of Wishart matrices is known to be
uniformly distributed on the manifold of unitary matrices (the so-
called Haar measure) (e.g. [125, 67]). In the case of HH† where H
has i.i.d. non-Gaussian entries, much less success has been reported in
the asymptotic characterization of the eigenvectors [153, 224, 225].

5 The mean is zero in the interesting special case where H has i.i.d. complex Gaussian
entries [15].

For matrices whose entries are Gaussian and correlated according
to a Toeplitz structure, an integral equation is known for the Stieltjes
transform of the asymptotic spectrum as a function of the Fourier
transform of the correlation function [147, 198, 55]. Other results on
random matrices with correlated and weakly dependent entries can be
found in [170, 196, 146, 53, 199, 145]. Reference [191], in turn, consid-
ers a special class of random matrices with dependent entries that falls
outside the Marc̆enko-Pastur framework and that arises in the context
of the statistical physics of disordered systems.
Incidentally, another application of the Stieltjes transform approach
is the generalization of Wigner’s semicircle law to the sum of a Wigner
matrix and a deterministic Hermitian matrix. Provided Lindeberg-type
conditions are satisfied by the entries of the random component, [147]
obtained the deformed semicircle law, which is only known in closed-
form in the Stieltjes transform domain.
Sometimes, an alternative to the characterization of asymptotic
spectra through the Stieltjes transform is used, based on the proof
of convergence and evaluation of moments such as (1/N) tr{(HH†)^k}. For
most cases of practical interest, the limiting spectrum has bounded
support. Thus, the moment convergence theorem can be applied
to obtain results on the limiting spectrum through its moments
[297, 314, 315, 313].
An important recent development in asymptotic random matrix
analysis has been the realization that the non-commutative free prob-
ability theory introduced by Voiculescu [283, 285] in the mid-1980s is
applicable to random matrices. In free probability, the classical notion
of independence of random variables is replaced by that of “freeness”
or “free independence”.
The power of the concept of free random matrices is best illustrated
by the following setting. In general, we cannot find the eigenvalues of
the sums of random matrices from the eigenvalues of the individual
matrices (unless they have the same eigenvectors), and therefore the
asymptotic spectrum of the sum cannot be obtained from the indi-
vidual asymptotic spectra. An obvious exception is the case of inde-
pendent diagonal matrices in which case the spectrum of the sum is
simply the convolution of the spectra. When the random matrices are
asymptotically free [287], the asymptotic spectrum of the sum is also
obtainable from the individual asymptotic spectra. Instead of convolution
(or equivalently, summing the logarithms of the individual Fourier
transforms), the “free convolution” is obtained through the sum of
the so-called R-transforms introduced by Voiculescu [285]. Examples
of asymptotically free random matrices include independent Gaussian
random matrices, and A and UBU∗ where A and B are Hermitian
and U is uniformly distributed on the manifold of unitary matrices
and independent of A and B.
In free probability, the role of the Gaussian distribution in classical
probability is taken by the semicircle law (1.18) in the sense of the free
analog of the central limit theorem [284]: the spectrum of the normal-
ized sum of free random matrices (with given spectrum) converges to
the semicircle law (1.18). Analogously, the spectrum of the normalized
sum of free random matrices with unit rank converges to the Marc̆enko-
Pastur law (1.10), which then emerges as the free counterpart of the
Poisson distribution [239, 295]. In the general context of free random
variables, Voiculescu has found an elegant definition of free-entropy
[288, 289, 291, 292, 293]. A number of structural properties have been
shown for free-entropy in the context of non-commutative probabil-
ity theory (including the counterpart of the entropy-power inequality
[248]). The free counterpart to Fisher’s information has been investi-
gated in [289]. However, a free counterpart to the divergence between
two distributions is yet to be discovered.
A connection between random matrices and information theory was
made by Balian [17] in 1968 considering the inverse problem in which
the distribution of the entries of the matrix must be determined while
being consistent with certain constraints. Taking a maximum entropy
method, the ensemble of Gaussian matrices is the solution to the prob-
lem where only a constraint on the energy of the singular values is
placed.
2
Random Matrix Theory

In this section, we review a wide range of existing mathematical results


that are relevant to the analysis of the statistics of random matrices
arising in wireless communications. We also include some new results on
random matrices that were inspired by problems of engineering interest.
Throughout the monograph, complex Gaussian random variables
are always circularly symmetric, i.e., with uncorrelated real and imagi-
nary parts, and complex Gaussian vectors are always proper complex.1

2.1 Types of Matrices and Non-Asymptotic Results


We start by providing definitions for the most important classes of
random matrices: Gaussian, Wigner, Wishart and Haar matrices. We
also collect a number of results that hold for arbitrary (non-asymptotic)
matrix sizes.

1 In the terminology introduced in [188], a random vector with real and imaginary
components x and y, respectively, is proper complex if E[(x − E[x]) (y − E[y])^T] = 0.


2.1.1 Gaussian Matrices


Definition 2.1. A standard real/complex Gaussian m × n matrix H
has i.i.d. real/complex zero-mean Gaussian entries with identical
variance σ² = 1/m. The p.d.f. of a complex Gaussian matrix with i.i.d.
zero-mean Gaussian entries with variance σ² is

$$ (\pi\sigma^2)^{-mn} \exp\left[-\frac{\mathrm{tr}\{\mathbf{H}\mathbf{H}^\dagger\}}{\sigma^2}\right]. \tag{2.1} $$

The following result is the complex counterpart of those given in [18,


78, 27, 245] and [182, Thm. 3.2.14]:

Lemma 2.1. [104] Let H be an m × n standard complex Gaussian
matrix with n ≥ m. Denote its QR-decomposition by H = QR. The
upper triangular matrix R is independent of Q, which is uniformly
distributed over the manifold2 of complex m × n matrices such that
QQ† = I. The entries of R are independent and its diagonal entries,
R_{i,i} for i ∈ {1, . . . , m}, are such that 2m R_{i,i}² are χ² random variables
with 2(n − i + 1) degrees of freedom while the off-diagonal entries, R_{i,j}
for i < j, are independent zero-mean complex Gaussian with variance 1/m.

The proof of Lemma 2.1 uses the expression of the p.d.f. of H given
in (2.1) and [67, Theorem 3.1].
The p.d.f. of the eigenvalues of standard Gaussian matrices is studied
in [32, 68]. If the n × n matrix coefficients are real, [69] gives an exact
expression for the expected number of real eigenvalues, which grows as
√(2n/π).

2.1.2 Wigner Matrices


Definition 2.2. An n × n Hermitian matrix W is a Wigner matrix if its
upper-triangular entries are independent zero-mean random variables
with identical variance. If the variance is 1/n, then W is a standard
Wigner matrix.

2 This is called the Stiefel manifold and it is a subspace of dimension 2mn − m² with total
volume $2^m \pi^{mn - \frac{1}{2}m(m-1)} \prod_{i=1}^{m} \frac{1}{(n-i)!}$.

Theorem 2.2. Let W be an n × n complex Wigner matrix whose
(diagonal and upper-triangle) entries are i.i.d. zero-mean Gaussian with
unit variance.3 Then, its p.d.f. is

$$ 2^{-n/2}\, \pi^{-n^2/2} \exp\left[-\frac{\mathrm{tr}\{\mathbf{W}^2\}}{2}\right] \tag{2.2} $$

while the joint p.d.f. of its ordered eigenvalues λ1 ≥ . . . ≥ λn is

$$ \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}\lambda_i^2} \prod_{i=1}^{n-1}\frac{1}{i!} \prod_{i<j}^{n}(\lambda_i - \lambda_j)^2. \tag{2.3} $$

Theorem 2.3. [307] Let W be an n × n complex Gaussian Wigner
matrix defined as in Theorem 2.2. The marginal p.d.f. of the unordered
eigenvalues is

$$ \frac{1}{n}\sum_{i=0}^{n-1} \frac{1}{2^i\, i!\, \sqrt{2\pi}} \left(e^{-\frac{x^2}{4}} H_i(x)\right)^2 \tag{2.4} $$

with H_i(·) the ith Hermite polynomial [1].

As shown in [304, 172, 81, 175], the spacing between adjacent eigenvalues
of a Wigner matrix exhibits an interesting behavior. With the
eigenvalues of a Gaussian Wigner matrix sorted in ascending order, denote
by L the spacing between adjacent eigenvalues relative to the mean
eigenvalue spacing. The density of L in the large-dimensional limit is
accurately approximated by4

$$ f_L(s) \approx \frac{\pi}{2}\, s\, e^{-\frac{\pi}{4}s^2}. \tag{2.5} $$

For small values of s, (2.5) approaches zero implying that very
small spacings are unlikely and that the eigenvalues somehow repel
each other.
3 Such matrices are often referred to as simply Gaussian Wigner matrices.
4 Wigner postulated (2.5) in [304] by assuming that the energy levels of a nucleus behave
like a modified Poisson process. Starting from the joint p.d.f. of the eigenvalues of a
Gaussian Wigner matrix, (2.5) has been proved in [81, 175] where its exact expression has
been derived. Later, Dyson conjectured that (2.5) may also hold for more general random
matrices [65, 66]. This conjecture has been proved by [129] for a certain subclass of not
necessarily Gaussian Wigner matrices.

2.1.3 Wishart Matrices

Definition 2.3. The m × m random matrix A = HH† is a (central)
real/complex Wishart matrix with n degrees of freedom and covariance
matrix Σ, (A ∼ Wm (n, Σ)), if the columns of the m × n matrix H are
zero-mean independent real/complex Gaussian vectors with covariance
matrix Σ.5 The p.d.f. of a complex Wishart matrix A ∼ Wm (n, Σ) for
n ≥ m is [244, p. 84], [182, 125]6

$$ f_{\mathbf{A}}(\mathbf{B}) = \frac{\pi^{-m(m-1)/2}}{\det\mathbf{\Sigma}^{n} \prod_{i=1}^{m}(n-i)!} \exp\left[-\mathrm{tr}\left\{\mathbf{\Sigma}^{-1}\mathbf{B}\right\}\right] \det\mathbf{B}^{n-m}. \tag{2.6} $$

2.1.4 Haar Matrices

Definition 2.4. A square matrix U is unitary if

UU† = U† U = I.

Definition 2.5. [107] An n × n random matrix U is a Haar matrix7 if
it is uniformly distributed on the set, U(n), of n × n unitary matrices.8
Its density function on U(n) is given by [107, 67]

$$ 2^{-n}\, \pi^{-\frac{1}{2}n(n+1)} \prod_{i=1}^{n}(n-i)! \tag{2.7} $$

Lemma 2.4. [107] The eigenvalues, ζ_i for i ∈ {1, . . . , n}, of an n × n
Haar matrix lie on the unit circle, i.e., ζ_i = e^{jθ_i}, and their joint p.d.f. is

$$ \frac{1}{n!}\prod_{i<\ell}|\zeta_i - \zeta_\ell|^2. \tag{2.8} $$

Lemma 2.5. (e.g. [110]) If 1 ≤ i, j, k, ℓ ≤ n, i ≠ k, j ≠ ℓ, and U is an
n × n (complex) Haar matrix, then

$$ E[|U_{ij}|^2] = \frac{1}{n} $$
$$ E[|U_{ij}|^4] = \frac{2}{n(n+1)} $$
$$ E[|U_{ij}|^2 |U_{kj}|^2] = E[|U_{ij}|^2 |U_{i\ell}|^2] = \frac{1}{n(n+1)} $$
$$ E[|U_{ij}|^2 |U_{k\ell}|^2] = \frac{1}{n^2-1} $$
$$ E[U_{ij} U_{k\ell} U^*_{i\ell} U^*_{kj}] = -\frac{1}{n(n^2-1)}. $$

5 If the entries of H have nonzero mean, HH† is a non-central Wishart matrix.
6 The case n < m is studied in [267].
7 Also called isotropic in the multi-antenna literature [171].
8 A real Haar matrix is uniformly distributed on the set of real orthogonal matrices.


A way to generate a Haar matrix is the following: let H be an n×n stan-


dard complex Gaussian matrix and let R be the upper triangular ma-
trix obtained from the QR decomposition of H chosen such that all its
diagonal entries are nonnegative. Then, as a consequence of Lemma 2.1,
HR−1 is a Haar matrix [245].
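The QR-based construction above can be sketched in a few lines. This is an illustrative sketch assuming NumPy; the helper name `haar_unitary` is hypothetical. Since `numpy.linalg.qr` does not promise a nonnegative diagonal for R, the columns of Q are rescaled by the phases of that diagonal, which is the same as choosing R with nonnegative diagonal entries and forming HR⁻¹.

```python
import numpy as np

def haar_unitary(n, rng):
    """Draw an n x n Haar-distributed unitary matrix via the QR decomposition."""
    # Standard complex Gaussian matrix: i.i.d. entries with variance 1/n (Definition 2.1).
    h = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    q, r = np.linalg.qr(h)
    # Move the phases of diag(R) into Q so that the implied R has a nonnegative diagonal.
    d = np.diagonal(r)
    phases = d / np.abs(d)
    return q * phases  # multiplies column j of Q by phases[j]

rng = np.random.default_rng(0)
n = 4
U = haar_unitary(n, rng)

# Empirical check of the first two moments in Lemma 2.5 for a single entry.
samples = np.array([abs(haar_unitary(n, rng)[0, 0]) ** 2 for _ in range(5000)])
m2 = samples.mean()          # should be close to 1/n
m4 = (samples ** 2).mean()   # should be close to 2/(n(n+1))
print(m2, 1 / n)
print(m4, 2 / (n * (n + 1)))
```

Skipping the phase correction would still return a unitary matrix, but its distribution would depend on the sign convention of the QR routine rather than being Haar.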

2.1.5 Unitarily Invariant Matrices


Definition 2.6. A Hermitian random matrix W is called unitarily in-
variant if the joint distribution of its entries equals that of VWV† for
any unitary matrix V independent of W.

Example 2.1. A Haar matrix is unitarily invariant.

Example 2.2. A Gaussian Wigner matrix is unitarily invariant.

Example 2.3. A central Wishart matrix W ∼ Wm (n, I) is unitarily


invariant.

Lemma 2.6. (e.g. [111]) If W is unitarily invariant, then it can be
decomposed as

W = UΛU†

with U a Haar matrix independent of the diagonal matrix Λ.

Lemma 2.7. [110, 111] If W is unitarily invariant and f (·) is a real


continuous function defined on the real line, then f (W), given via the
functional calculus, is also unitarily invariant.

Definition 2.7. A rectangular random matrix H is called bi-unitarily


invariant if the joint distribution of its entries equals that of UHV†
for any unitary matrices U and V independent of H.

Example 2.4. A standard Gaussian random matrix is bi-unitarily in-


variant.

Lemma 2.8. [111] If H is a bi-unitarily invariant square random ma-


trix, then it admits a polar decomposition H = UC where U is a Haar
matrix independent of the unitarily-invariant nonnegative definite ran-
dom matrix C.

In the case of a rectangular m × n matrix H, with m ≤ n, Lemma


2.8 also applies with C an n×n unitarily-invariant nonnegative definite
random matrix and with U uniformly distributed over the manifold of
complex m × n matrices such that UU† = I.

2.1.6 Properties of Wishart Matrices


In this subsection we collect a number of properties of central and non-
central Wishart matrices and, in some cases, their inverses. We begin
by considering the first and second order moments of a central Wishart
matrix and its inverse.

Lemma 2.9. [164, 96] For a central Wishart matrix W ∼ Wm (n, I),
E[tr{W}] = mn
E[tr{W2 }] = mn (m + n)
E[tr2 {W}] = mn (mn + 1).
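The three moments of Lemma 2.9 are easy to check by simulation. The sketch below assumes NumPy and uses a hypothetical helper `wishart_trace_moments`; the dimensions, trial count and seed are arbitrary illustrative values.

```python
import numpy as np

def wishart_trace_moments(m, n, trials=20000, seed=0):
    """Monte Carlo estimates of E[tr W], E[tr W^2] and E[tr^2 W] for W ~ W_m(n, I)."""
    rng = np.random.default_rng(seed)
    # Unit-variance complex Gaussian entries, one m x n matrix per trial.
    h = (rng.standard_normal((trials, m, n))
         + 1j * rng.standard_normal((trials, m, n))) / np.sqrt(2)
    w = h @ h.conj().transpose(0, 2, 1)
    tr1 = np.trace(w, axis1=1, axis2=2).real
    tr2 = np.trace(w @ w, axis1=1, axis2=2).real
    return tr1.mean(), tr2.mean(), (tr1 ** 2).mean()

m, n = 2, 3
est = wishart_trace_moments(m, n)
exact = (m * n, m * n * (m + n), m * n * (m * n + 1))
for e, x in zip(est, exact):
    print(f"simulated {e:.3f}   Lemma 2.9 {x}")
```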

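Lemma 2.10. [164, 96] (see also [133]) For a central Wishart matrix
W ∼ Wm (n, I) with n > m,

$$ E\left[\mathrm{tr}\left\{\mathbf{W}^{-1}\right\}\right] = \frac{m}{n-m} \tag{2.9} $$

while, for n > m + 1,

$$ E\left[\mathrm{tr}\left\{\mathbf{W}^{-2}\right\}\right] = \frac{mn}{(n-m)^3 - (n-m)} $$

$$ E\left[\mathrm{tr}^2\left\{\mathbf{W}^{-1}\right\}\right] = \frac{m}{n-m}\left(\frac{n}{(n-m)^2-1} + \frac{m-1}{n-m+1}\right). $$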

For higher order moments of Wishart and generalized inverse Wishart


matrices, see [96].

From Lemma 2.1, we can derive several formulas on the determinant


and log-determinant of a Wishart matrix.

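Theorem 2.11. [182, 131]9 A central complex Wishart matrix W ∼
Wm (n, I), with n ≥ m, satisfies

$$ E\left[\det\mathbf{W}^{k}\right] = \prod_{\ell=0}^{m-1}\frac{\Gamma(n-\ell+k)}{\Gamma(n-\ell)} \tag{2.10} $$

and hence the moment-generating function of log_e det W for ζ ≥ 0 is

$$ E\left[e^{\zeta\log_e\det\mathbf{W}}\right] = \prod_{\ell=0}^{m-1}\frac{\Gamma(n-\ell+\zeta)}{\Gamma(n-\ell)} \tag{2.11} $$

with Γ(·) denoting the Gamma function [97]

$$ \Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\, dt $$

which, for integer arguments, satisfies Γ(n + 1) = n!. From (2.11),

$$ E[\log_e\det\mathbf{W}] = \sum_{\ell=0}^{m-1}\psi(n-\ell) \tag{2.12} $$

$$ \mathrm{Var}[\log_e\det\mathbf{W}] = \sum_{\ell=0}^{m-1}\dot\psi(n-\ell) \tag{2.13} $$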

where ψ(·) is Euler’s digamma function [97], which for natural arguments
can be expressed as

$$ \psi(m) = \psi(1) + \sum_{\ell=1}^{m-1}\frac{1}{\ell} = \psi(m-1) + \frac{1}{m-1} \tag{2.14} $$

with −ψ(1) = 0.577215... the Euler-Mascheroni constant. The derivative
of ψ(·), in turn, can be expressed as

$$ \dot\psi(m+1) = \dot\psi(m) - \frac{1}{m^2} \tag{2.15} $$

with ψ̇(1) = π²/6.

9 Note that [182, 131] derive the real counterpart of Theorem 2.11, from which the complex
case follows immediately.
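As a numerical illustration of (2.12) to (2.15), the sketch below evaluates ψ and its derivative at integer arguments using only the recursions stated above, and compares the resulting mean and variance of log_e det W with a Monte Carlo estimate. It assumes NumPy; the helper names, dimensions and seed are illustrative.

```python
import numpy as np

def digamma_int(k):
    """psi(k) for a positive integer k via (2.14): psi(1) plus the harmonic sum."""
    return -np.euler_gamma + sum(1.0 / l for l in range(1, k))

def trigamma_int(k):
    """Derivative of psi at a positive integer k via (2.15), starting from pi^2/6."""
    return np.pi ** 2 / 6 - sum(1.0 / l ** 2 for l in range(1, k))

def logdet_samples(m, n, trials=20000, seed=0):
    """Samples of log_e det W for a central complex Wishart W ~ W_m(n, I), n >= m."""
    rng = np.random.default_rng(seed)
    h = (rng.standard_normal((trials, m, n))
         + 1j * rng.standard_normal((trials, m, n))) / np.sqrt(2)
    w = h @ h.conj().transpose(0, 2, 1)
    _, logabsdet = np.linalg.slogdet(w)  # W is positive definite, so det W > 0
    return logabsdet

m, n = 2, 4
th_mean = sum(digamma_int(n - l) for l in range(m))   # (2.12)
th_var = sum(trigamma_int(n - l) for l in range(m))   # (2.13)
x = logdet_samples(m, n)
mc_mean, mc_var = x.mean(), x.var()
print(th_mean, mc_mean)
print(th_var, mc_var)
```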

If Σ and Φ are positive definite deterministic matrices and H is
an n × n complex Gaussian matrix with independent zero-mean
unit-variance entries, then W = ΣHΦH† satisfies (using (2.10))

$$ E\left[\det\mathbf{W}^{k}\right] = \det(\mathbf{\Sigma}\mathbf{\Phi})^{k} \prod_{\ell=0}^{n-1}\frac{(n-\ell+k-1)!}{(n-\ell-1)!} \tag{2.16} $$

The generalization of (2.16) for rectangular H is derived in [165, 219].


Analogous relationships for the non-central Wishart matrix are derived
in [5].

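Theorem 2.12. [166] Let H be an n × m complex Gaussian matrix
with zero-mean unit-variance entries and let W be a complex Wishart
matrix W ∼ Wn (p, I), with m ≤ n ≤ p. Then, for ζ ∈ (−1, 1),

$$ E\left[\det(\mathbf{H}^\dagger\mathbf{W}^{-1}\mathbf{H})^{\zeta}\right] = \prod_{\ell=0}^{m-1}\frac{\Gamma(m+p-n-\zeta-\ell)\,\Gamma(n+\zeta-\ell)}{\Gamma(n-\ell)\,\Gamma(m+p-n-\ell)} $$

$$ E\left[\log\det(\mathbf{H}^\dagger\mathbf{W}^{-1}\mathbf{H})\right] = \sum_{\ell=0}^{m-1}\left(\psi(n-\ell) - \psi(m+p-n-\ell)\right). $$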

Additional results on quadratic functions of central and non-central


Wishart matrices can be found in [141, 142, 144] and the references
therein.
Some results on the p.d.f. of complex pseudo-Wishart matrices10
and their corresponding eigenvalues can be found in [58, 59, 168].

Next, we turn our attention to the determinant and log-determinant


of matrices that can be expressed as a multiple of the identity plus
a Wishart matrix, a familiar form in the expressions of the channel
capacity.

10 W = HH† is a pseudo-Wishart matrix if H is an m × n Gaussian matrix and the correlation
matrix of the columns of H has a rank strictly larger than n [244, 267, 94, 58, 59].

Theorem 2.13. A complex Wishart matrix W ∼ Wm (n, I), with n ≥ m,
satisfies

$$ E[\det(\mathbf{I} + \gamma\mathbf{W})] = \sum_{i=0}^{m}\binom{m}{i}\frac{n!}{(n-i)!}\,\gamma^{i}. \tag{2.17} $$
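A quick sanity check of (2.17) compares the finite sum with a simulated average. This is a sketch assuming NumPy; the function names and the values m = 2, n = 3, γ = 0.5 are illustrative only.

```python
import math
import numpy as np

def expected_det_formula(m, n, gamma):
    """Right-hand side of (2.17)."""
    return sum(math.comb(m, i) * math.factorial(n) / math.factorial(n - i) * gamma ** i
               for i in range(m + 1))

def expected_det_monte_carlo(m, n, gamma, trials=50000, seed=0):
    """Sample mean of det(I + gamma W) for W = HH^dagger with unit-variance entries."""
    rng = np.random.default_rng(seed)
    h = (rng.standard_normal((trials, m, n))
         + 1j * rng.standard_normal((trials, m, n))) / np.sqrt(2)
    w = h @ h.conj().transpose(0, 2, 1)
    return np.linalg.det(np.eye(m) + gamma * w).real.mean()

m, n, gamma = 2, 3, 0.5
formula = expected_det_formula(m, n, gamma)   # 1 + 3 + 1.5 for these values
simulated = expected_det_monte_carlo(m, n, gamma)
print(formula, simulated)
```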

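Theorem 2.14. [38, 299] Let W be a central Wishart matrix W ∼
Wm (n, I) and let t = min{n, m} and r = max{n, m}. The moment-generating
function of log_e det(I + γW) is

$$ E\left[e^{\zeta\log_e\det(\mathbf{I}+\gamma\mathbf{W})}\right] = \frac{\det\mathbf{G}(\zeta)}{\prod_{i=1}^{t}(r-i)!} \tag{2.18} $$

with G(ζ) a t × t Hankel matrix whose (i, k)th entry is

$$ G_{i,k} = \int_0^\infty (1+\gamma\lambda)^{\zeta}\,\lambda^{d-1} e^{-\lambda}\, d\lambda $$

$$ = \frac{\pi}{\sin(\pi(d-1+\zeta))}\left[\frac{\gamma^{-d}\,(d-1)!}{\Gamma(-\zeta)\,\Gamma(1+d+\zeta)}\; {}_1F_1\!\left(d,\, 1+d+\zeta,\, \tfrac{1}{\gamma}\right) - \frac{\gamma^{\zeta}}{\Gamma(1-d-\zeta)}\; {}_1F_1\!\left(-\zeta,\, 1-d-\zeta,\, \tfrac{1}{\gamma}\right)\right] \tag{2.19} $$

with 1F1(·) the confluent hypergeometric function [97] and with d =
r − t + i + k + 1.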

For a non-central Wishart matrix with covariance matrix equal to


the identity, a series expression for E[log det(I + γW)] has been com-
puted in [3] while the moment-generating function (2.18) has been com-
puted in [134] in terms of the integral of hypergeometric functions.
For a central Wishart matrix W ∼ Wm (n, Σ) where Σ is posi-
tive definite with distinct eigenvalues, the moment-generating function
(2.18) has been computed in [234] and [135].11

Theorem 2.15. [192] If H is an m × m zero-mean unit-variance complex
Gaussian matrix and Σ and Υ are positive definite matrices having
distinct eigenvalues a_i and φ_i, respectively, then for ζ ≤ 0

$$ E\left[\det\left(\mathbf{I} + \mathbf{\Sigma}\mathbf{H}\mathbf{\Upsilon}\mathbf{H}^\dagger\right)^{\zeta}\right] = {}_2F_0(-\zeta, m \mid -\mathbf{\Sigma}, \mathbf{\Upsilon}) \tag{2.20} $$

where the hypergeometric function with matrix arguments [192] is

$$ {}_2F_0(-\zeta, m \mid -\mathbf{\Sigma}, \mathbf{\Upsilon}) = \frac{\det\left(\left\{{}_2F_0(-\zeta-m+1, 1 \mid -a_i\phi_j)\right\}\right)}{\prod_{k=1}^{m-1}(-\zeta-k)^{k}\, \prod_{i<j}^{m}(\phi_i-\phi_j)\, \prod_{i<j}^{m}(a_j-a_i)} $$

with 2F0(·, ·|·) denoting the scalar hypergeometric function [1].12

11 Reference [234] evaluates (2.18) in terms of Gamma functions for m > n while reference
[135] evaluates it for arbitrary m and n, in terms of confluent hypergeometric functions
of the second kind [97].

For Υ = I (resp. Σ = I), (2.20) still holds but with 2F0(s, m | −Σ, I)
(resp. 2F0(−ζ, m | I, −Υ)) replaced by [192]

$$ {}_2F_0(-\zeta, m \mid \mathbf{\Theta}) = \frac{\det\left(\left\{\theta_j^{m-i}\; {}_2F_0(-\zeta-i+1,\, m-i+1 \mid \theta_j)\right\}\right)}{\prod_{i<j}^{n}(\theta_i-\theta_j)} \tag{2.21} $$

with Θ = −Σ (resp. Θ = −Υ).

The counterpart of Theorem 2.15 for a rectangular matrix H is as


follows.

Theorem 2.16. [148, 150] Let H be an m × n complex Gaussian matrix
with zero-mean unit-variance entries with m ≤ n and define

$$ M(\zeta) = E\left[e^{\zeta\log\det(\mathbf{I}+\gamma\mathbf{\Sigma}\mathbf{H}\mathbf{\Upsilon}\mathbf{H}^\dagger)}\right] $$

with Σ and Υ positive definite matrices having distinct eigenvalues a_i
and φ_i, respectively. Then for ζ ≤ 0

$$ M(\zeta) = \frac{\det\mathbf{G}(\zeta)\, \det\mathbf{\Sigma}^{-d}}{(-1)^{\frac{d(d-1)}{2}}\, (-\gamma)^{\frac{n(n-1)}{2}}} \prod_{i=0}^{n-1}\frac{1}{\left(\zeta\log\frac{1}{e} - i\right)^{i}} \prod_{i<j}^{n}\frac{1}{\phi_i-\phi_j} \prod_{i<j}^{m}\frac{1}{a_i-a_j} $$

with d = n − m and with G(ζ) an n × n matrix whose (i, j)th entry is

$$ G_{i,j}(\zeta) = \begin{cases} {}_2F_0\!\left(\zeta\log\frac{1}{e} - n + 1,\, 1 \mid -\gamma\phi_j a_i\right) & i \in \{1,\ldots,m\},\ j \in \{1,\ldots,n\} \\ (-\gamma\phi_j)^{i-1-m}\left[\zeta\log\frac{1}{e} - n + 1\right]_{i-1-m} & i \in \{m+1,\ldots,n\},\ j \in \{1,\ldots,n\} \end{cases} $$

where [b]_k = Γ(b + k)/Γ(b) indicates the Pochhammer symbol.13

12 In the remainder, det({f (i, j)}) denotes the determinant of a matrix whose (i, j)th entry
is f (i, j).
13 If b is an integer, [b]_k = b(b + 1) . . . (b − 1 + k).

An alternative expression for the moment-generating function in The-


orem 2.16 can be found in [231].


Fig. 2.1 Joint p.d.f. of the unordered positive eigenvalues of the Wishart matrix HH† with
r = 3 and t = 2. (Scaled version of (2.22).)

To conclude the exposition on properties of Wishart matrices, we


summarize several results on the non-asymptotic distribution of their
eigenvalues.

Theorem 2.17. [75, 120, 89, 210] Let the entries of H be i.i.d. complex
Gaussian with zero mean and unit variance. The joint p.d.f. of the
ordered strictly positive eigenvalues of the Wishart matrix HH†, λ1 ≥
. . . ≥ λt, equals

$$ e^{-\sum_{i=1}^{t}\lambda_i} \prod_{i=1}^{t}\frac{\lambda_i^{r-t}}{(t-i)!\,(r-i)!} \prod_{i<j}^{t}(\lambda_i-\lambda_j)^2 \tag{2.22} $$

where t and r are the minimum and maximum of the dimensions of H.


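The marginal p.d.f. of the unordered eigenvalues is14 (e.g. [32])

$$ g_{r,t}(\lambda) = \frac{1}{t}\sum_{k=0}^{t-1}\frac{k!}{(k+r-t)!}\left[L_k^{r-t}(\lambda)\right]^2 \lambda^{r-t} e^{-\lambda} \tag{2.23} $$

where the Laguerre polynomials are

$$ L_k^{n}(\lambda) = \frac{e^{\lambda}}{k!\,\lambda^{n}}\frac{d^k}{d\lambda^k}\left(e^{-\lambda}\lambda^{n+k}\right). \tag{2.24} $$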
Figure 2.1 depicts the joint p.d.f. of the unordered positive eigenval-
ues of the Wishart matrix HH† , λ1 > 0, . . . , λt > 0, which is obtained
by dividing the joint p.d.f. of the ordered positive eigenvalues by t!
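The marginal density (2.23) can be evaluated directly. The sketch below, assuming NumPy, uses the standard explicit-sum form of the associated Laguerre polynomial (equivalent to the Rodrigues formula (2.24)) and checks on a grid that the density integrates to one and has mean r, as expected since E[tr{HH†}] = rt. The helper names and grid are illustrative.

```python
import math
import numpy as np

def laguerre(k, a, x):
    """Associated Laguerre polynomial L_k^a(x), integer a >= 0, from its explicit sum."""
    return sum((-1) ** j * math.comb(k + a, k - j) * x ** j / math.factorial(j)
               for j in range(k + 1))

def wishart_marginal_pdf(lam, r, t):
    """Marginal p.d.f. (2.23) of an unordered positive eigenvalue of HH^dagger."""
    a = r - t
    total = sum(math.factorial(k) / math.factorial(k + a) * laguerre(k, a, lam) ** 2
                for k in range(t))
    return total * lam ** a * np.exp(-lam) / t

r, t = 3, 2
lam = np.linspace(0.0, 60.0, 60001)
dx = lam[1] - lam[0]
pdf = wishart_marginal_pdf(lam, r, t)
mass = np.sum(pdf) * dx          # Riemann sum; the tail beyond 60 is negligible here
mean = np.sum(lam * pdf) * dx
print(mass, mean)
```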

Theorem 2.18. Let W be a central complex Wishart matrix W ∼
Wm (n, Σ) with n ≥ m, where the eigenvalues of Σ are distinct and
their ordered values are a1 > . . . > am > 0. The joint p.d.f. of the
ordered positive eigenvalues of W, λ1 ≥ . . . ≥ λm, equals [125]

$$ \frac{\det\left(\left\{e^{-\lambda_j/a_i}\right\}\right)}{\det\mathbf{\Sigma}^{n}} \prod_{\ell=1}^{m}\frac{\lambda_\ell^{n-m}}{(n-\ell)!} \prod_{k<\ell}^{m} a_\ell a_k\,\frac{\lambda_k-\lambda_\ell}{a_k-a_\ell}. \tag{2.25} $$

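The marginal p.d.f. of the unordered eigenvalues is [2]

$$ q_{m,n}(\lambda) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{m} D(i,j)\, \lambda^{n-m+j-1} e^{-\lambda/a_i}}{m\,\det\mathbf{\Sigma}^{n} \prod_{\ell=1}^{m}(n-\ell)! \prod_{k<\ell}^{m}\left(\frac{1}{a_\ell}-\frac{1}{a_k}\right)} \tag{2.26} $$

where D(i, j) is the (i, j)th cofactor of the matrix D with entries

$$ D_{\ell,k} = \frac{(n-m+k-1)!}{a_\ell^{-n+m-k}}. \tag{2.27} $$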
Figure 2.2 contrasts a histogram obtained via Monte Carlo simulation
with the marginal p.d.f. of the unordered eigenvalues of W ∼
Wm (n, Σ) with n = 3 and m = 2 and with the correlation matrix Σ
chosen such that15

$$ \Sigma_{i,j} = e^{-0.2(i-j)^2}. \tag{2.28} $$

14 An alternative expression for (2.23) can be found in [183, B.7].
15 The correlation in (2.28) is typical of a base station in a wireless cellular system.


Fig. 2.2 Marginal p.d.f. of the unordered eigenvalues of W ∼ Wm (n, Σ) with n = 3, m = 2
and Σ_{i,j} = e^{−0.2(i−j)²}, compared to a histogram obtained via Monte Carlo simulation.

Theorem 2.19. Let W be a central complex Wishart matrix W ∼
Wm (n, Σ) with m > n, where the eigenvalues of Σ are distinct and
their ordered values are a1 > . . . > am > 0. The joint p.d.f. of the
unordered strictly positive eigenvalues of W, λ1, . . . , λn, equals [80]

$$ \det(\mathbf{\Xi}) \prod_{\ell=1}^{n}\frac{1}{\ell!} \prod_{k<\ell}^{m}\frac{1}{a_\ell-a_k} \prod_{k<\ell}^{n}(\lambda_\ell-\lambda_k) \tag{2.29} $$

with

$$ \mathbf{\Xi} = \begin{bmatrix} 1 & a_1 & \cdots & a_1^{m-n-1} & a_1^{m-n-1}e^{-\lambda_1/a_1} & \cdots & a_1^{m-n-1}e^{-\lambda_n/a_1} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & a_m & \cdots & a_m^{m-n-1} & a_m^{m-n-1}e^{-\lambda_1/a_m} & \cdots & a_m^{m-n-1}e^{-\lambda_n/a_m} \end{bmatrix}. $$

The marginal p.d.f. of the unordered eigenvalues is given in [2].

Let H be an m × m zero-mean unit-variance complex Gaussian
matrix and Σ and Υ be nonnegative definite matrices. Then the joint
p.d.f. of the eigenvalues of ΣHΥH† is computed in [209] while the
marginal p.d.f. has been computed in [230].
The distributions of the largest and smallest eigenvalues of a central
and non-central Wishart matrix W ∼ Wm (n, I) are given in [67] and
[140, 143, 136]. The counterpart for a central Wishart matrix W ∼
Wm (n, Σ) with n ≥ m can be found in [208].

2.1.7 Rank Results


Lemma 2.20. For any N × K matrices A, B,

rank(A + B) ≤ rank(A) + rank(B).

Moreover, the rank of A is less than or equal to the number of nonzero


entries of A.

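Lemma 2.21. For any Hermitian N × N matrices A and B,

$$ \sum_{i=1}^{N}\left(\lambda_i(\mathbf{A}) - \lambda_i(\mathbf{B})\right)^2 \le \mathrm{tr}\left\{(\mathbf{A}-\mathbf{B})^2\right\}. $$

Lemma 2.22. [313, 10] For any N × K matrices A and B,

$$ N \sup_{x\ge 0}\left|F^{N}_{\mathbf{A}\mathbf{A}^\dagger}(x) - F^{N}_{\mathbf{B}\mathbf{B}^\dagger}(x)\right| \le \mathrm{rank}(\mathbf{A}-\mathbf{B}). \tag{2.30} $$

Lemma 2.23. [313, 10] For any N × N Hermitian matrices A and B,

$$ N \sup_{x\ge 0}\left|F^{N}_{\mathbf{A}}(x) - F^{N}_{\mathbf{B}}(x)\right| \le \mathrm{rank}(\mathbf{A}-\mathbf{B}). \tag{2.31} $$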
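The rank inequality (2.31) is straightforward to check numerically. The sketch below, assuming NumPy, perturbs a random Hermitian matrix by a rank-2 Hermitian term and measures the largest gap between the two empirical eigenvalue distribution functions. The helper name and matrix sizes are illustrative, and the supremum is taken over the whole real line, which can only make the left side larger.

```python
import numpy as np

def esd_sup_distance(eig_a, eig_b):
    """sup_x |F_A(x) - F_B(x)| for two empirical eigenvalue distributions."""
    # Both step functions only change at eigenvalues, so checking those points suffices.
    grid = np.sort(np.concatenate([eig_a, eig_b]))
    fa = np.searchsorted(np.sort(eig_a), grid, side="right") / len(eig_a)
    fb = np.searchsorted(np.sort(eig_b), grid, side="right") / len(eig_b)
    return np.max(np.abs(fa - fb))

rng = np.random.default_rng(0)
N = 50
g = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (g + g.conj().T) / 2                      # random Hermitian matrix
u = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
B = A + u @ u.conj().T                        # Hermitian perturbation of rank 2

rank_diff = np.linalg.matrix_rank(A - B)
lhs = N * esd_sup_distance(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B))
print(lhs, "<=", rank_diff)
```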

2.1.8 Karhunen-Loève Expansion


As will be illustrated in Section 3, this transformation, widely used in
image processing, is a very convenient tool that facilitates the applica-
tion of certain random matrix results to channels of practical interest.

Definition 2.8. Let A be an N × K random matrix. Denote the correlation
between the (i, j)th and (i′, j′)th entries of A by

$$ r_{\mathbf{A}}(i, j; i', j') = E\left[A_{i,j} A^*_{i',j'}\right]. \tag{2.32} $$

The Karhunen-Loève expansion of A yields an N × K image random
matrix Ã whose entries are

$$ \tilde{A}_{k,\ell} = \sum_{i=1}^{N}\sum_{j=1}^{K} A_{i,j}\,\psi_{k,\ell}(i,j) $$

where the so-called expansion kernel {ψ_{k,ℓ}(i, j)} is a set of complete
orthonormal discrete basis functions formed by the eigenfunctions of
the correlation function of A, i.e., this kernel must satisfy for all k ∈
{1, . . . , N } and ℓ ∈ {1, . . . , K}

$$ \sum_{i'=1}^{N}\sum_{j'=1}^{K} r_{\mathbf{A}}(i, j; i', j')\,\psi_{k,\ell}(i', j') = \lambda_{k,\ell}(r_{\mathbf{A}})\,\psi_{k,\ell}(i,j) \tag{2.33} $$

where we indicate the eigenvalues of r_A by λ_{k,ℓ}(r_A).

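Lemma 2.24. The entries of a Karhunen-Loève image are, by construction,
uncorrelated and with variances given by the eigenvalues of
the correlation of the original matrix, i.e.,

$$ E\left[\tilde{A}_{k,\ell}\tilde{A}^*_{j,i}\right] = \begin{cases} \lambda_{k,\ell}(r_{\mathbf{A}}) & \text{if } k = j \text{ and } \ell = i, \\ 0 & \text{otherwise.} \end{cases} \tag{2.34} $$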

Lemma 2.25. If the expansion kernel can be factored as

$$ \psi_{k,\ell}(i,j) = u_k(i)\, v_\ell(j), \tag{2.35} $$

then

$$ \mathbf{A} = \mathbf{U}\tilde{\mathbf{A}}\mathbf{V}^\dagger $$

with U_{k,i} = u_k(i) and V_{j,ℓ} = v*_ℓ(j), which renders the matrices U and V
unitary. As a consequence, A and its Karhunen-Loève image, Ã, have
the same singular values.

Thus, with the Karhunen-Loève expansion we can map the singular


values of a matrix with correlated Gaussian entries and factorable ker-
nel to those of another Gaussian matrix whose entries are independent.

Definition 2.9. The correlation of a random matrix A is said to
be separable if r_A(i, j; i′, j′) can be expressed as the product of two
marginal correlations16 that are functions, respectively, of (i, j) and
(i′, j′).

If the correlation of A is separable, then the kernel is automatically
factorable17 and, furthermore, λ_{k,ℓ}(r_A) = λ_k λ_ℓ where λ_k and λ_ℓ are,
respectively, the kth and ℓth eigenvalues of the two marginal correlations
whose product equals r_A.
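The product rule for the eigenvalues in the separable case can be illustrated with the Kronecker-product description of the correlation of the stacked columns. The sketch below assumes NumPy; the two marginal correlation matrices use an exponential-quadratic profile in the spirit of (2.28), and all names and parameter values are illustrative rather than taken from the text.

```python
import numpy as np

def exp_quadratic_correlation(size, rho):
    """Marginal correlation with entries exp(-rho * (i - j)^2), cf. (2.28)."""
    idx = np.arange(size)
    return np.exp(-rho * (idx[:, None] - idx[None, :]) ** 2)

R_rows = exp_quadratic_correlation(3, 0.2)   # correlation between the rows (N = 3)
R_cols = exp_quadratic_correlation(4, 0.5)   # correlation between the columns (K = 4)

# Correlation of the stacked columns of A under a separable model.
R_full = np.kron(R_cols, R_rows)

eig_full = np.sort(np.linalg.eigvalsh(R_full))
eig_products = np.sort(np.outer(np.linalg.eigvalsh(R_cols),
                                np.linalg.eigvalsh(R_rows)).ravel())
print(np.max(np.abs(eig_full - eig_products)))
```

The N·K eigenvalues of the full correlation coincide with the pairwise products of the marginal eigenvalues, which is the content of the relation λ_{k,ℓ}(r_A) = λ_k λ_ℓ.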

2.1.9 Regular Matrices


Definition 2.10. An N × K matrix P is asymptotically row-regular if

$$ \lim_{K\to\infty}\frac{1}{K}\sum_{j=1}^{K} 1\{P_{i,j} \le \alpha\} $$

is independent of i for all α ∈ R, as the aspect ratio K/N converges to
a constant. A matrix whose transpose is asymptotically row-regular is
called asymptotically column-regular. A matrix that is both asymptotically
row-regular and asymptotically column-regular is called asymptotically
doubly-regular and satisfies

$$ \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} P_{i,j} = \lim_{K\to\infty}\frac{1}{K}\sum_{j=1}^{K} P_{i,j}. \tag{2.36} $$

If (2.36) is equal to 1, then P is standard asymptotically doubly-regular.

Example 2.5. An N × K rectangular Toeplitz matrix

Pi,j = ϕ(i − j)

with K ≥ N is an asymptotically row-regular matrix. If either the func-


tion ϕ is periodic or N = K, then the Toeplitz matrix is asymptotically
doubly-regular.
16 Equivalently, the correlation matrix of the vector obtained by stacking up the columns
of A can be expressed as the Kronecker product of two separate matrices that describe,
respectively, the correlation between the rows and between the columns of A.
17 Another relevant example of a factorable kernel occurs with shift-invariant correlation
functions such as r_A(i, j; i′, j′) = r_A(i − i′, j − j′), for which the Karhunen-Loève image
is equivalent to a two-dimensional Fourier transform.

2.1.10 Cauchy-Binet Theorem


The result reported below, which is the continuous analog of the
Cauchy-Binet formula [121], has been applied in several contributions
[39, 166, 2, 231, 219] in order to compute the capacity of multi-antenna
channels and the marginal distributions of the singular values of ma-
trices with correlated Gaussian entries.

Theorem 2.26. [144] (see also [6]) Let F and G be n × n matrices
parametrized by a real n-vector (w1, . . . , wn):

$$ F_{i,j} = f_j(w_i) \tag{2.37} $$
$$ G_{i,j} = g_j(w_i) \tag{2.38} $$

where f_j and g_j, j = 1, . . . , n, are real-valued functions defined on the
real line. Then, for 0 < a < b,

$$ \int_a^b \cdots \int_a^b \det\mathbf{F}\,\det\mathbf{G}\; dw_1 \cdots dw_n = n!\,\det\mathbf{A} $$

where A is another n × n matrix whose (i, j)th entry is

$$ A_{i,j} = \int_a^b f_i(w)\, g_j(w)\, dw. $$

Note that, in [144], the factor n! does not appear because the variables
w1, . . . , wn are ordered.
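A small numerical experiment makes the identity concrete. The sketch below, assuming NumPy, replaces each integral over [a, b] by a Gauss-Legendre quadrature sum. The identity holds for any integration measure, including the discrete one defined by the quadrature nodes and weights, so the two sides agree to rounding error and not just to quadrature accuracy. The test functions f_j, g_j and the interval are arbitrary illustrative choices.

```python
import itertools
import math
import numpy as np

def check_cauchy_binet(fs, gs, a, b, nodes=6):
    """Return both sides of Theorem 2.26 with every integral replaced by quadrature."""
    n = len(fs)
    x, w = np.polynomial.legendre.leggauss(nodes)
    x = 0.5 * (b - a) * x + 0.5 * (b + a)   # map nodes from [-1, 1] to [a, b]
    w = 0.5 * (b - a) * w
    lhs = 0.0
    for idx in itertools.product(range(nodes), repeat=n):
        pts = x[list(idx)]
        F = np.array([[f(p) for f in fs] for p in pts])   # F[i, j] = f_j(w_i)
        G = np.array([[g(p) for g in gs] for p in pts])   # G[i, j] = g_j(w_i)
        lhs += np.prod(w[list(idx)]) * np.linalg.det(F) * np.linalg.det(G)
    A = np.array([[np.sum(w * f(x) * g(x)) for g in gs] for f in fs])
    rhs = math.factorial(n) * np.linalg.det(A)
    return lhs, rhs

fs = [np.sin, np.cos, np.exp]
gs = [lambda v: v, lambda v: v ** 2, lambda v: v ** 3]
lhs, rhs = check_cauchy_binet(fs, gs, a=0.5, b=2.0)
print(lhs, rhs)
```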

2.1.11 Lyapunov Exponent


The celebrated result in this subsection, although outside the main fo-
cus of this monograph, has been used in several engineering applications
[114, 122, 83].
As n → ∞, the growth of the maximum singular value of the prod-
uct of n random matrices is exponential with a rate of increase given
by the following result.

Theorem 2.27. [79, 193, 29, 44] Denote the maximum singular value
of A (spectral norm of A) by ρ(A). Let A1, . . . , An, . . . be a stationary
ergodic sequence of random matrices for which

$$ E[\log(\max\{\rho(\mathbf{A}_n), 1\})] < \infty. $$

Then, there exists a deterministic constant λ (the so-called Lyapunov
exponent) such that almost surely18

$$ \lim_{n\to\infty}\frac{1}{n}\log\rho\left(\prod_{i=1}^{n}\mathbf{A}_i\right) = \lambda. \tag{2.39} $$
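In practice the limit in (2.39) is estimated from a long product, renormalising at every step so that the running product neither overflows nor underflows. The sketch below assumes NumPy; the function name is hypothetical, new factors are applied on the left (an ordering convention chosen here), and the i.i.d. Gaussian sequence is only an example of a stationary ergodic sequence.

```python
import numpy as np

def lyapunov_estimate(matrices):
    """Estimate (1/n) log rho(A_n ... A_1) from an iterable of square matrices."""
    p = None
    log_norm = 0.0
    count = 0
    for a in matrices:
        p = a if p is None else a @ p
        s = np.linalg.norm(p, 2)      # spectral norm of the running product
        log_norm += np.log(s)         # the logs of the scale factors add up exactly
        p = p / s                     # renormalise to unit spectral norm
        count += 1
    return log_norm / count

# Deterministic check: for a constant sequence the exponent is log rho(A).
const = [np.diag([2.0, 0.5])] * 200
lam_const = lyapunov_estimate(const)

# Random example: i.i.d. real Gaussian 2 x 2 matrices (value not asserted here).
rng = np.random.default_rng(0)
lam_random = lyapunov_estimate(rng.standard_normal((5000, 2, 2)))
print(lam_const, np.log(2.0), lam_random)
```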

2.2 Transforms
As mentioned in Section 1.3, it is often the case that the solution for the
limiting spectrum is obtained in terms of a transform of its distribution.
In this section, we review the most useful transforms including the
Shannon transform and the η-transform which, suggested by problems
of interest in communications, are introduced in this monograph.
For notational convenience, we refer to the transform of a random
variable and the transform of its cumulative distribution or density
function interchangeably. If the distribution of such variable equals
the asymptotic spectrum of a random matrix, then we refer to the
transform of the matrix and the transform of its asymptotic spectrum
interchangeably.

2.2.1 Stieltjes Transform


Let X be a real-valued random variable with distribution FX (·). Its
Stieltjes transform is defined for complex arguments as19

$$ S_X(z) = E\left[\frac{1}{X-z}\right] = \int_{-\infty}^{\infty}\frac{1}{\lambda-z}\, dF_X(\lambda). \tag{2.40} $$

Although (2.40) is an analytic function on the complement of the sup-


port of FX (·) on the complex plane, it is customary to further restrict
the domain of SX (z) to arguments having positive imaginary parts.
According to the definition, the signs of the imaginary parts of z and
SX (z) coincide. In the following examples, the sign of the square root
should be chosen so that this property is satisfied.
18 This property is satisfied by any conventional norm.
19 The Stieltjes transform is also known as the Cauchy transform and it is equal to −π
times the Hilbert transform when defined on the real line. As with the Fourier transform
there is no universal agreement on its definition, as sometimes the Stieltjes transform is
defined as SX (−z) or −SX (z).

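Example 2.6. The Stieltjes transform of the semi-circular law $w(\cdot)$ in (1.18) is
$$S_w(z) = \frac{1}{2\pi} \int_{-2}^{2} \frac{\sqrt{4 - \lambda^2}}{\lambda - z}\, d\lambda = \frac{1}{2}\left[-z \pm \sqrt{z^2 - 4}\right]. \tag{2.41}$$

Example 2.7. The Stieltjes transform of the Marc̆enko-Pastur law $f_\beta(\cdot)$ in (1.10) is
$$S_{f_\beta}(z) = \int_a^b \frac{1}{\lambda - z}\, f_\beta(\lambda)\, d\lambda = \frac{1 - \beta - z \pm \sqrt{z^2 - 2(\beta + 1)z + (\beta - 1)^2}}{2\beta z}. \tag{2.42}$$

Example 2.8. The Stieltjes transform of $\tilde f_\beta(\cdot)$ in (1.12) is
$$S_{\tilde f_\beta}(z) = \int_a^b \frac{1}{\lambda - z}\, \tilde f_\beta(\lambda)\, d\lambda = \frac{-1 + \beta - z \pm \sqrt{z^2 - 2(\beta + 1)z + (\beta - 1)^2}}{2z}. \tag{2.43}$$

Example 2.9. The Stieltjes transform of the averaged empirical eigenvalue distribution of the unit-rank matrix $\mathbf{s}\mathbf{s}^\dagger$ is equal to
$$S(z) = \frac{1}{N} S_P(z) - \left(1 - \frac{1}{N}\right)\frac{1}{z} \tag{2.44}$$
where $N$ is the dimension of $\mathbf{s}$ and $S_P$ is the Stieltjes transform of the
random variable $\|\mathbf{s}\|^2$.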

Given $S_X(\cdot)$, the inversion formula that yields the p.d.f. of $X$ is [246, 222]
$$f_X(\lambda) = \lim_{\omega \to 0^+} \frac{1}{\pi}\, \mathrm{Im}\left[S_X(\lambda + j\omega)\right]. \tag{2.45}$$
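To make the branch selection and the inversion formula (2.45) concrete, the following sketch evaluates the closed form (2.41) slightly above the real axis, picks the root whose imaginary part has the same sign as that of $z$, and compares the recovered density with the semicircle density $\frac{1}{2\pi}\sqrt{4-\lambda^2}$. The helper names and the value of the small offset `omega` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def stieltjes_semicircle(z):
    """Closed form (2.41), choosing the sign so that Im S(z) and Im z agree."""
    z = np.asarray(z, dtype=complex)
    root = np.sqrt(z * z - 4)
    s_plus = 0.5 * (-z + root)
    s_minus = 0.5 * (-z - root)
    return np.where(np.sign(s_plus.imag) == np.sign(z.imag), s_plus, s_minus)

def density_from_stieltjes(stieltjes, lam, omega=1e-6):
    """Approximate (2.45) by evaluating at a small positive imaginary offset."""
    return np.imag(stieltjes(lam + 1j * omega)) / np.pi

lam = np.linspace(-1.9, 1.9, 9)
recovered = density_from_stieltjes(stieltjes_semicircle, lam)
exact = np.sqrt(4 - lam**2) / (2 * np.pi)
print(np.max(np.abs(recovered - exact)))
```

The limit in (2.45) is replaced by a fixed small offset, so the agreement is only up to an error of the order of `omega`.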

Assuming $F_X(\cdot)$ has compact support, we can expand $S_X(\cdot)$ in a
Laurent series involving the moments of $X$. Expanding $\frac{1}{\lambda - z}$ with respect
to $z$, exchanging summation and integration and using analytical
extension, (2.40) can be written as
$$S_X(z) = -\frac{1}{z} \sum_{k=0}^{\infty} \frac{E[X^k]}{z^k}. \tag{2.46}$$

If the distribution of $X$ is the averaged empirical eigenvalue distribution
of an $N \times N$ random matrix $\mathbf{A}$, then $E[X^k]$ can be regarded
as the $k$th moment $E\left[\frac{1}{N} \mathrm{tr}\{\mathbf{A}^k\}\right]$. As a consequence, $S_X(\cdot)$ can be regarded
as a generating function for the moments of the random matrix
whose averaged empirical eigenvalue distribution is $F_X$.
As indicated at the onset of Section 2.2, we often denote the Stielt-
jes transform of the asymptotic empirical distribution of a matrix A
by SA (·). However, as in Examples 2.6, 2.7 and 2.8, it is sometimes
convenient to subscript S(·) by its corresponding asymptotic empirical
distribution or density function. Similar notational conventions will be
applied to the transforms to be defined in the sequel.

2.2.2 η-transform
In the applications of interest, it is advantageous to consider a trans-
form that carries some engineering intuition, while at the same time is
closely related to the Stieltjes transform.
Interestingly, this transform, which has not been used so far in the
random matrix literature, simplifies many derivations and statements
of results.20

Definition 2.11. The η-transform of a nonnegative random variable $X$ is
$$\eta_X(\gamma) = E\left[\frac{1}{1 + \gamma X}\right] \tag{2.47}$$
where $\gamma$ is a nonnegative real number and thus $0 < \eta_X(\gamma) \le 1$.

The rationale for introducing this quantity can be succinctly explained
by considering a hypothetical situation where the sum of three
components is observed (for example, at the output of a linear receiver):
“desired signal” with strength γ, “background noise” with unit
strength, and “multiuser interference” with strength γX. The multiuser
interference strength is scaled by γ because, in many systems, the power
of the users either is equal (perfect power control) or scales linearly.
The expected SINR divided by the single-user (i.e. X = 0) signal-to-noise
ratio is given by (2.47). Since this notion is reminiscent of the multiuser
efficiency [271], we have chosen the notation η standard in multiuser detection.

20 The η-transform was first used in [273].
Either with analytic continuation or including the negative real line
in the domain of definition of the Stieltjes transform, we obtain the
simple relationship with the η-transform:
$$\eta_X(\gamma) = \frac{S_X\left(-\frac{1}{\gamma}\right)}{\gamma}. \tag{2.48}$$
Given the η-transform, (2.48) gives the Stieltjes transform by analytic
continuation in the whole positive upper complex half-plane, and
then the distribution of $X$ through the inversion formula (2.45).
From (2.46) and (2.48), the η-transform can be written in terms of
the moments of $X$:
$$\eta_X(\gamma) = \sum_{k=0}^{\infty} (-\gamma)^k E[X^k], \tag{2.49}$$
whenever the moments of $X$ exist and the series in (2.49) converges.
From (1.8) it follows that the MMSE considered in Section 1.2 is
equal to the η-transform of the empirical distribution of the eigenvalues
of H† H.
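Simple properties of the η-transform that prove useful are:
• $\eta_X(\gamma)$ is strictly monotonically decreasing with $\gamma \ge 0$ from 1
to $P[X = 0]$.²¹
• $\gamma\,\eta_X(\gamma)$ is strictly monotonically increasing with $\gamma \ge 0$ from
0 to $E\left[\frac{1}{X}\right]$.

Thus, the asymptotic fraction of zero eigenvalues of $\mathbf{A}$ is
$$\lim_{\gamma \to \infty} \eta_{\mathbf{A}}(\gamma) \tag{2.50}$$
while
$$\lim_{n \to \infty} \frac{1}{n}\, \mathrm{tr}\{\mathbf{A}^{-1}\} = \lim_{\gamma \to \infty} \gamma\, \eta_{\mathbf{A}}(\gamma). \tag{2.51}$$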

21 Note from (2.47) that it is easy (and, it will turn out, sometimes useful) to extend the
definition of the η-transform to (generalized or defective) distributions that put some
nonzero mass at +∞. In this case, $\eta_X(0) = P[X < \infty]$.

Example 2.10. [271, p. 303] The η-transform of the Marc̆enko-Pastur
law given in (1.10) is
$$\eta(\gamma) = 1 - \frac{\mathcal{F}(\gamma, \beta)}{4\beta\gamma}. \tag{2.52}$$

Fig. 2.3 η-transform of the Marc̆enko-Pastur law (1.10) evaluated for β = 0.1, 0.5, 1, 2, 10. (Plot of η(γ), ranging from 0 to 1, against γ from 0 to 10, with one curve per value of β.)
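The closed form (2.52) can be cross-checked against a direct numerical integration of the definition (2.47). The sketch below does so under two assumptions that are not stated in this section: that $\mathcal{F}(x, z) = \left(\sqrt{x(1+\sqrt{z})^2+1} - \sqrt{x(1-\sqrt{z})^2+1}\right)^2$ is the function referred to as (1.17), and that the Marc̆enko-Pastur density is the one in (2.100), with support edges $(1 \mp \sqrt{\beta})^2$ and a mass $(1 - 1/\beta)^+$ at zero. The substitution $x = c + r\sin\theta$ removes the square-root behavior at the edges of the support; all function names are hypothetical.

```python
import numpy as np

def F(x, z):
    """Assumed form of the function F(x, z) referenced as (1.17)."""
    return (np.sqrt(x * (1 + np.sqrt(z))**2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z))**2 + 1))**2

def eta_mp_closed_form(gamma, beta):
    """Closed form (2.52)."""
    return 1 - F(gamma, beta) / (4 * beta * gamma)

def eta_mp_quadrature(gamma, beta, n=20000):
    """E[1/(1 + gamma X)] for the Marcenko-Pastur law, by the midpoint rule."""
    a, b = (1 - np.sqrt(beta))**2, (1 + np.sqrt(beta))**2
    centre, radius = (a + b) / 2, (b - a) / 2
    theta = (np.arange(n) + 0.5) * np.pi / n - np.pi / 2   # midpoints in (-pi/2, pi/2)
    x = centre + radius * np.sin(theta)
    # sqrt((x-a)(b-x)) dx becomes (radius*cos(theta))**2 dtheta after the substitution
    integrand = (radius * np.cos(theta))**2 / (2 * np.pi * beta * x * (1 + gamma * x))
    continuous_part = integrand.sum() * np.pi / n
    mass_at_zero = max(0.0, 1 - 1 / beta)                  # contributes 1/(1 + 0) each
    return mass_at_zero + continuous_part

for beta in (0.5, 2.0):
    print(beta, eta_mp_closed_form(2.0, beta), eta_mp_quadrature(2.0, beta))
```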

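Example 2.11. The η-transform of the averaged empirical eigenvalue
distribution of the unit-rank matrix $\mathbf{s}\mathbf{s}^\dagger$ is equal to
$$\eta(\gamma) = 1 - \frac{1}{N}\left(1 - \eta_P(\gamma)\right) \tag{2.53}$$
where $N$ is the dimension of $\mathbf{s}$, and $\eta_P$ is the η-transform of the random
variable $\|\mathbf{s}\|^2$.

Example 2.12. The η-transform of the exponential distribution with
unit mean is
$$\eta(\gamma) = -\frac{e^{\frac{1}{\gamma}}}{\gamma}\, \mathrm{Ei}\left(-\frac{1}{\gamma}\right) \tag{2.54}$$
with $\mathrm{Ei}(\cdot)$ denoting the exponential integral
$$\mathrm{Ei}(z) = -\int_{-z}^{\infty} \frac{e^{-t}}{t}\, dt.$$

Example 2.13. Let $\mathbf{Q}$ be an $N \times K$ matrix uniformly distributed over
the manifold of $N \times K$ complex matrices such that $\mathbf{Q}^\dagger \mathbf{Q} = \mathbf{I}$. Then
$$\eta_{\mathbf{Q}\mathbf{Q}^\dagger}(\gamma) = 1 - \beta + \frac{\beta}{1 + \gamma}.$$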

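Lemma 2.28. For any $N \times K$ matrix $\mathbf{A}$ and $K \times N$ matrix $\mathbf{B}$ such
that $\mathbf{A}\mathbf{B}$ is nonnegative definite,
$$N\left(1 - \eta_{F^N_{\mathbf{AB}}}(\gamma)\right) = K\left(1 - \eta_{F^K_{\mathbf{BA}}}(\gamma)\right). \tag{2.55}$$
Consequently, for $K, N \to \infty$ with $\frac{K}{N} \to \beta$, if the spectra converge,
$$\eta_{\mathbf{AB}}(\gamma) = 1 - \beta + \beta\,\eta_{\mathbf{BA}}(\gamma). \tag{2.56}$$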
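The finite-dimensional identity (2.55) is easy to check numerically, since $\mathbf{AB}$ and $\mathbf{BA}$ share their nonzero eigenvalues and differ only in the number of zero eigenvalues. The sketch below is a minimal illustration under the assumption $\mathbf{B} = \mathbf{A}^\dagger$ (a convenient way to make $\mathbf{AB}$ nonnegative definite); the dimensions, seed and helper name are arbitrary choices for this example.

```python
import numpy as np

def eta_empirical(eigenvalues, gamma):
    """eta-transform (2.47) of an empirical eigenvalue distribution."""
    return np.mean(1.0 / (1.0 + gamma * eigenvalues))

rng = np.random.default_rng(1)
N, K, gamma = 6, 4, 1.7
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
B = A.conj().T                           # assumption: B = A^dagger, so AB >= 0

eig_AB = np.linalg.eigvalsh(A @ B)       # N eigenvalues, N - K of them (numerically) zero
eig_BA = np.linalg.eigvalsh(B @ A)       # K eigenvalues

lhs = N * (1 - eta_empirical(eig_AB, gamma))
rhs = K * (1 - eta_empirical(eig_BA, gamma))
print(lhs, rhs)
```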

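Lemma 2.29.

(a) Let the components of the $N$-dimensional vector $\mathbf{x}$ be zero-mean
and uncorrelated with second-order moment $\frac{1}{N}$. Then, for any
$N \times N$ deterministic nonnegative definite matrix $\mathbf{A}$,
$$E\left[\mathbf{x}^\dagger (\mathbf{I} + \gamma \mathbf{A})^{-1} \mathbf{x}\right] = \eta_{F^N_{\mathbf{A}}}(\gamma).$$

(b) [13] Let the components of the $N$-dimensional vector $\mathbf{x}$ be zero-mean
and independent with variance $\frac{1}{N}$. For any $N \times N$ nonnegative
definite random matrix $\mathbf{B}$ independent of $\mathbf{x}$ whose spectrum
converges almost surely,
$$\lim_{N \to \infty} \mathbf{x}^\dagger (\mathbf{I} + \gamma \mathbf{B})^{-1} \mathbf{x} = \eta_{\mathbf{B}}(\gamma) \quad \text{a.s.} \tag{2.57}$$
$$\lim_{N \to \infty} \mathbf{x}^\dagger (\mathbf{B} - z\mathbf{I})^{-1} \mathbf{x} = S_{\mathbf{B}}(z) \quad \text{a.s.} \tag{2.58}$$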

2.2.3 Shannon Transform


Another transform motivated by applications is the following.22

Definition 2.12. The Shannon transform of a nonnegative random


variable X is defined as
VX (γ) = E[log(1 + γX)] (2.59)
where γ is a nonnegative real number.

The Shannon transform is intimately related to the Stieltjes and
η-transforms:
$$\frac{\gamma}{\log e} \frac{d}{d\gamma} V_X(\gamma) = 1 - \frac{1}{\gamma}\, S_X\left(-\frac{1}{\gamma}\right) \tag{2.60}$$
$$= 1 - \eta_X(\gamma). \tag{2.61}$$
Since VX (0) = 0, VX (γ) can be obtained for all γ > 0 by integrating
the derivative obtained in (2.60). The Shannon transform contains the
same information as the distribution of X, either through the inversion
of the Stieltjes transform or from the fact that all the moments of X
are obtainable from VX (γ).
As we saw in Section 1.2, the Shannon transform of the empirical
distribution of the eigenvalues of HH† gives the capacity of various
communication channels of interest.

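Example 2.14. [275] The Shannon transform of the Marc̆enko-Pastur
law $f_\beta(\cdot)$ in (1.10) is
$$V(\gamma) = \log\left(1 + \gamma - \frac{1}{4}\mathcal{F}(\gamma, \beta)\right) + \frac{1}{\beta} \log\left(1 + \gamma\beta - \frac{1}{4}\mathcal{F}(\gamma, \beta)\right) - \frac{\log e}{4\beta\gamma}\, \mathcal{F}(\gamma, \beta). \tag{2.62}$$

Example 2.15. [131] Denoting by $V(\gamma)$ the Shannon transform of the
Marc̆enko-Pastur law $f_\beta(\cdot)$ in (1.10) with $\beta \le 1$,
$$\lim_{\gamma \to \infty} \left(\log \gamma - V(\gamma)\right) = \frac{1 - \beta}{\beta} \log(1 - \beta) + \log e. \tag{2.63}$$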
22 The Shannon transform was first introduced in [272, 273].
Fig. 2.4 Shannon transform of the Marc̆enko-Pastur law (1.10) for β = 0.1, 0.5, 1, 2, 10. (Plot of V(γ), ranging from 0 to 3.5, against γ from 0 to 10, with one curve per value of β.)

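Example 2.16. The Shannon transform of the averaged empirical
eigenvalue distribution of the unit-rank matrix $\mathbf{s}\mathbf{s}^\dagger$ equals
$$V(\gamma) = \frac{1}{N} V_P(\gamma) \tag{2.64}$$
where $N$ is the dimension of $\mathbf{s}$ and $V_P$ is the Shannon transform of the
random variable $\|\mathbf{s}\|^2$.

Example 2.17. [61] The Shannon transform of $g_{r,t}(\cdot)$ in (2.23) is²³
$$V(\gamma) = \sum_{k=0}^{t-1} \sum_{\ell_1=0}^{k} \sum_{\ell_2=0}^{k} \binom{k}{\ell_1} \frac{(k + r - t)!\, (-1)^{\ell_1 + \ell_2}\, I_{\ell_1 + \ell_2 + r - t}(\gamma)}{(k - \ell_2)!\, (r - t + \ell_1)!\, (r - t + \ell_2)!\, \ell_2!}$$
with $I_0(\gamma) = -e^{\frac{1}{\gamma}}\, \mathrm{Ei}\left(-\frac{1}{\gamma}\right)$ while
$$I_n(\gamma) = n I_{n-1}(\gamma) + (-\gamma)^{-n} \left( I_0(\gamma) + \sum_{k=1}^{n} (k - 1)!\, (-\gamma)^k \right). \tag{2.65}$$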

23 Related expressions in terms of the exponential integral function [97] and the Gamma
function can be found in [219] and [126], respectively.

An analytical expression for the Shannon transform of the marginal


distribution, qm,n (·) in (2.26), of the eigenvalues of a central complex
Wishart matrix W ∼ Wm (n, Σ) with n ≥ m can be found in [2, 135].
For the converse case, n ≤ m, defined in Theorem 2.19, the correspond-
ing Shannon transform can be found in [2, 234, 135].

Example 2.18. [148] The Shannon transform of the asymptotic eigen-


value distribution of ΣHΦH† as defined in Theorem 2.16 is
 

log e n−1 i! 
n
1 m
1 m
X
i=1 in
V(γ) = det
(−1) 2 γ 2 i<j φi − φj i<j ai − aj =1
d(d−1) n(n−1)
Y

where X is a m × n matrix whose (i, j)th entry, for i ∈ {1 . . . , m} and


j ∈ {1 . . . , n}, is
⎧ “ ”
⎪ (γφj )n−1 γφj ai
1
⎨−(n − 1)! a1−m e Ei − γφ1 a
j i
i=
i
(X )i,j = X (−γφj ai )k
n−1

⎩ [1 − n]k i = 
k=n−m i an−m

and Y is an (n − m) × n matrix whose (i, j)th entry, for j ∈ {1 . . . , n}


and i ∈ {1 . . . , n − m}, is

(Y)i,j = [1 − n]i−1 (−γφj )i−1 .

Example 2.19. The Shannon transform of the exponential distribu-


tion plays an important role in the capacity of fading channels and can
be written in terms of its η-transform given in (2.54):

V(γ) = γη(γ). (2.66)
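Relation (2.66) follows from an integration by parts of $E[\log(1 + \gamma X)]$ against the density $e^{-x}$, and it can be confirmed numerically without evaluating the exponential integral. The sketch below assumes natural logarithms (so that $\log e = 1$) and approximates both expectations with a plain trapezoidal rule on a truncated interval; the truncation point, grid size and helper name are assumptions made for this illustration.

```python
import numpy as np

def expectation_unit_exponential(g, upper=60.0, n=600001):
    """Trapezoidal approximation of E[g(X)] for X exponential with unit mean."""
    x = np.linspace(0.0, upper, n)
    y = g(x) * np.exp(-x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

gamma = 3.0
eta = expectation_unit_exponential(lambda x: 1.0 / (1.0 + gamma * x))   # (2.47)
V_nats = expectation_unit_exponential(lambda x: np.log1p(gamma * x))    # (2.59), in nats
print(V_nats, gamma * eta)
```

With a logarithm in another base, both sides of the comparison simply pick up the factor $\log e$.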

2.2.4 Mellin Transform


The Mellin transform has been used in the non-asymptotic theory of
random matrices. As we will see, it is related to the Shannon transform
and can be used to find the capacity of multi-antenna channels with
finite number of antennas in closed form.

Definition 2.13. The Mellin transform of a positive random variable
$X$ is given by
$$M_X(z) = E[X^{z-1}] \tag{2.67}$$
where $z$ belongs to a strip of the complex plane where the expectation
is finite.

The inverse Mellin transform of $\Omega(z)$ is given by
$$M^{-1}_{\Omega}(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} t^{-z}\, \Omega(z)\, dz. \tag{2.68}$$
Notice that
$$M^{-1}_{M_X}(x) = f_X(x)$$
with $f_X(\cdot)$ denoting the p.d.f. of $X$.
Another interesting property of the Mellin transform is that the
Mellin transform of the product of two independent random variables
is equal to the product of the Mellin transforms:

MXY = MX MY . (2.69)

Example 2.20. If $X$ is exponentially distributed with mean $\frac{1}{\mu}$, then
$$M_X(z) = \mu^{1-z}\, \Gamma(z).$$

Example 2.21. If $X$ is Nakagami distributed with parameter $\nu$,
$f_\nu(r) = \frac{2\nu^\nu}{\Gamma(\nu)}\, r^{2\nu - 1} e^{-\nu r^2}$, then for $1 - z < \nu$
$$M_{X^2}(z) = \frac{\nu^{1-z}}{\Gamma(\nu)}\, \Gamma(\nu + z - 1).$$

Example 2.22. [126] The Mellin transform of $g_{r,r}(\cdot)$ in (2.23) is
$$M_{g_{r,r}}(1 - z) = \frac{r^{-1}}{\Gamma(z)} \sum_{n=0}^{r-1} \frac{\Gamma^2(1 - z + n)}{\Gamma(1 - z)\, (n!)^2} \sum_{\ell=0}^{r-1-n} \frac{\Gamma(z + \ell)}{\ell!}.$$

Example 2.23. The Mellin transform of qm,n (·) in (2.26) is

m−1  ak a   D(i, j) Γ(z + n − m + j − 1)


m m m
Mqm,n (z) = 
detΣn ak − a aim−z−n−j+1 m =1 (n −
)!
k< i=1 j=1

with D(·, ·) given in (2.27).



Theorem 2.30. [126]
$$V_X(\gamma) = M^{-1}_{\Upsilon}(\gamma) \tag{2.70}$$
where $M^{-1}_{\Upsilon}$ is the inverse Mellin transform of
$$\Upsilon(z) = z^{-1}\, \Gamma(z)\, \Gamma(1 - z)\, M_X(1 - z). \tag{2.71}$$

Using Theorem 2.30, an explicit expression for the Shannon trans-


form of gr,r (·) in (2.23) has been derived in [126].

2.2.5 R-transform

Another handy transform, on which we elaborate next, is the R-


transform. In particular, as we shall see in detail in Section 2.4 once the
concept of asymptotic freeness has been introduced, the R-transform
enables the characterization of the asymptotic spectrum of a sum of
suitable matrices (such as independent unitarily invariant matrices)
from their individual asymptotic spectra.

Definition 2.14. [285] Let $S_X^{-1}(z)$ denote the inverse (with respect
to the composition of functions) of the Stieltjes transform of $X$, i.e.,
$z = S_X^{-1}(S_X(z))$. The R-transform of $X$ is defined as the complex-valued
function of complex argument
$$R_X(z) = S_X^{-1}(-z) - \frac{1}{z}. \tag{2.72}$$

As a consequence of (2.72), a direct relationship between the R-transform
and the Stieltjes transform exists, namely
$$s = \frac{1}{R_X(-s) - z} \tag{2.73}$$
where for notational simplicity we used $s = S_X(z)$. For positive random
variables, letting $z = -\frac{1}{\gamma}$ in (2.73), we obtain from (2.48) the following
relationship between the R-transform and the η-transform:
$$\eta_X(\gamma) = \frac{1}{1 + \gamma\, R_X(-\gamma\, \eta_X(\gamma))}. \tag{2.74}$$

A consequence of (2.74) is that the R-transform (restricted to the
negative real axis) can be equivalently defined as
$$R_X(\varphi) = \frac{\eta_X(\gamma) - 1}{\varphi} \tag{2.75}$$
with $\gamma$ and $\varphi$ satisfying
$$\varphi = -\gamma\, \eta_X(\gamma). \tag{2.76}$$

Example 2.24. The R-transform of a unit mass at a is


R(z) = a. (2.77)

Example 2.25. The R-transform of the semicircle law is


R(z) = z. (2.78)

Example 2.26. The R-transform of the Marc̆enko-Pastur law $f_\beta(\cdot)$ in
(1.10) is
$$R(z) = \frac{1}{1 - \beta z}. \tag{2.79}$$

Example 2.27. The R-transform of $\tilde f_\beta(\cdot)$ in (1.12) is
$$R(z) = \frac{\beta}{1 - z}. \tag{2.80}$$

Example 2.28. The R-transform of the averaged empirical eigenvalue
distribution of the $N$-dimensional unit-rank matrix $\mathbf{s}\mathbf{s}^\dagger$ such that $\|\mathbf{s}\|^2$
has η-transform $\eta_P$, satisfies the implicit equation
$$R\left(-\frac{\gamma}{N}\, \eta_P(\gamma) - \gamma + \frac{\gamma}{N}\right) = \frac{1}{\gamma}\, \frac{1 - \eta_P(\gamma)}{N - 1 + \eta_P(\gamma)}. \tag{2.81}$$
In the special case where the norm is deterministic, $\|\mathbf{s}\|^2 = c$,
$$\eta_P(\gamma) = \frac{1}{1 + \gamma c},$$
and an explicit expression for the R-transform can be obtained from
(2.81) as
$$R(z) = \frac{-1 + cz + \sqrt{\frac{4cz}{N} + (1 - cz)^2}}{2z} = \frac{c}{(1 - cz)N} + O(N^{-2}). \tag{2.82}$$

Theorem 2.31. For any a > 0,

RaX (z) = aRX (az). (2.83)

We now outline how to obtain the moments of $X$ from $R_X(z)$. When
the random variable $X$ is compactly supported, the R-transform can be
represented as a series (for those values in the region of convergence):
$$R_X(z) = \sum_{k=1}^{\infty} c_k z^{k-1} \tag{2.84}$$
where the coefficients $c_k$, called the free cumulants of $X$, play a role
akin to that of the classical cumulants. As in the classical case, the
coefficients $c_k$ are polynomial functions of the moments $E[X^p]$ with
$0 \le p \le k$. Given the free cumulants $c_k$, the moments of $X$ can be
obtained by the so-called free cumulant formula [241]
$$E[X^m] = \sum_{k=1}^{m} c_k \sum_{m_1 + \cdots + m_k = m} E\left[X^{m_1 - 1}\right] \cdots E\left[X^{m_k - 1}\right]. \tag{2.85}$$

Note that $c_1 = E[X]$, $c_2 = \mathrm{Var}(X)$, and $R_X(0) = E[X]$.
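The recursion implied by (2.85) is straightforward to implement: the inner sum over $m_1 + \cdots + m_k = m$ (with every $m_i \ge 1$) is the coefficient of $t^{m-k}$ in the $k$th power of the moment series $\sum_j E[X^j]\, t^j$. The sketch below uses this observation; the function name is hypothetical, and the two test cases rely on the free cumulants read off from (2.78) and (2.80), namely $c_2 = 1$ (all others zero) for the semicircle law and $c_k = \beta$ for $\tilde f_\beta$.

```python
import numpy as np

def moments_from_free_cumulants(cumulants, max_order):
    """Return [E[X^0], ..., E[X^max_order]] from c_1, c_2, ... via (2.85).

    cumulants[k - 1] holds c_k and must have at least max_order entries.
    """
    moments = [1.0]
    for m in range(1, max_order + 1):
        series = np.array(moments)             # E[X^0], ..., E[X^(m-1)]
        power = np.array([1.0])                # series raised to the power 0
        total = 0.0
        for k in range(1, m + 1):
            # k-th power of the moment series, truncated below degree m
            power = np.convolve(power, series)[:m]
            total += cumulants[k - 1] * power[m - k]
        moments.append(total)
    return moments

semicircle = moments_from_free_cumulants([0, 1, 0, 0, 0, 0], 6)
beta = 0.5
mp_tilde = moments_from_free_cumulants([beta] * 3, 3)
print(semicircle)   # even-order entries are the Catalan numbers 1, 1, 2, 5
print(mp_tilde)
```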


As hinted at the beginning of this section, the main usefulness of the
R-transform stems from Theorem 2.192 stating that, for an important
class of random matrices, the R-transform of the asymptotic spectrum
of the sum is the sum of R-transforms of the individual spectra.

2.2.6 S-transform

Definition 2.15. The S-transform of a nonnegative random variable
$X$ is²⁴
$$\Sigma_X(x) = -\frac{x + 1}{x}\, \eta_X^{-1}(1 + x), \tag{2.86}$$
which maps $(-1, 0)$ onto the positive real line.

24 A less compact definition of the S-transform on the complex plane is given in the literature
(since the η-transform had not been used before) for arbitrary random variables with
nonzero mean. Note that the restriction to nonnegative random variables stems from the
definition of the η-transform.
Example 2.29. The S-transform of the Marc̆enko-Pastur law $f_\beta(\cdot)$ in
(1.10) is
$$\Sigma(x) = \frac{1}{1 + \beta x}. \tag{2.87}$$

Example 2.30. The S-transform of $\tilde f_\beta(\cdot)$ in (1.12) is
$$\Sigma(x) = \frac{1}{\beta + x}. \tag{2.88}$$

Example 2.31. The S-transform of the averaged empirical eigenvalue
distribution of the $N$-dimensional unit-rank matrix $\mathbf{s}\mathbf{s}^\dagger$ such that
$\|\mathbf{s}\|^2 = c$ is equal to
$$\Sigma(x) = \frac{1 + x}{c\,(x + 1/N)}. \tag{2.89}$$

The S-transform was introduced by Voiculescu [286] in 1987. As we


will see, its main usefulness lies in the fact that the S-transform of the
product of certain random matrices is the product of the corresponding
S-transforms in the limit.
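From (2.56), we obtain
$$\eta_{\mathbf{AB}}^{-1}(\gamma) = \eta_{\mathbf{BA}}^{-1}\left(\frac{\gamma - 1}{\beta} + 1\right) \tag{2.90}$$
and hence the S-transform counterpart to (2.56):

Theorem 2.32. For any $N \times K$ matrix $\mathbf{A}$ and $K \times N$ matrix $\mathbf{B}$ such
that, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, the spectra converge while $\mathbf{AB}$ is
nonnegative definite,
$$\Sigma_{\mathbf{AB}}(x) = \frac{x + 1}{x + \beta}\, \Sigma_{\mathbf{BA}}\left(\frac{x}{\beta}\right). \tag{2.91}$$

Example 2.32. Let $\mathbf{Q}$ be an $N \times K$ matrix uniformly distributed over
the manifold of $N \times K$ complex matrices such that $\mathbf{Q}^\dagger \mathbf{Q} = \mathbf{I}$. Then
$$\Sigma_{\mathbf{Q}\mathbf{Q}^\dagger}(x) = \frac{1 + x}{\beta + x}. \tag{2.92}$$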

2.3 Asymptotic Spectrum Theorems


In this section, we give the main results on the limit of the empirical
distributions of the eigenvalues of various random matrices of inter-
est. For pedagogical purposes we will give results in increasing level of
generality.

2.3.1 The Semicircle Law


Theorem 2.33. [308, 305] Consider an $N \times N$ standard Wigner matrix
$\mathbf{W}$ such that, for some constant $\kappa$, and sufficiently large $N$
$$\max_{1 \le i \le j \le N} E\left[|W_{i,j}|^4\right] \le \frac{\kappa}{N^2}. \tag{2.93}$$
Then, the empirical distribution of $\mathbf{W}$ converges almost surely to the
semicircle law whose density is
$$w(x) = \frac{1}{2\pi} \sqrt{4 - x^2} \tag{2.94}$$
with $|x| \le 2$.

Wigner’s original proof [305] of the convergence to the semicircle law
consisted of showing convergence of the empirical moments $\frac{1}{N} \mathrm{tr}\{\mathbf{W}^{2k}\}$
to the even moments of the semicircle law, namely, the Catalan numbers:
$$\lim_{N \to \infty} \frac{1}{N}\, \mathrm{tr}\{\mathbf{W}^{2k}\} = \int_{-2}^{2} x^{2k} w(x)\, dx = \frac{1}{k + 1} \binom{2k}{k}. \tag{2.95}$$
The zero-mean assumption in the definition of a Wigner matrix can be
relaxed to an identical-mean condition using Lemma 2.23. In fact, it
suffices that the rank of the mean matrix does not grow linearly with
N for Theorem 2.33 to hold.
Assuming for simplicity that the diagonal elements of the Wigner
matrix are zero, we can give a simple sketch of the proof of Theorem
2.33 based on the matrix inversion lemma:
$$(\mathbf{A}^{-1})_{i,i} = \frac{1}{A_{i,i} - \mathbf{a}_i^\dagger \mathbf{A}_i^{-1} \mathbf{a}_i} \tag{2.96}$$
with $\mathbf{a}_i$ representing the $i$th column of $\mathbf{A}$ excluding its $i$th element and
$\mathbf{A}_i$ indicating the $(n - 1) \times (n - 1)$ submatrix obtained by eliminating
from $\mathbf{A}$ the $i$th column and the $i$th row. Thus
$$\frac{1}{N}\, \mathrm{tr}\left\{(-z\mathbf{I} + \mathbf{W})^{-1}\right\} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{-z - \mathbf{w}_i^\dagger (-z\mathbf{I} + \mathbf{W}_i)^{-1} \mathbf{w}_i}. \tag{2.97}$$
Moreover, $\mathbf{W}_i$ is independent of $\mathbf{w}_i$, whose entries are independent
with identical variance $\frac{1}{n}$. Then, taking the limit of (2.97) and applying
(2.58) to the right-hand side, we obtain the quadratic equation
$$S_{\mathbf{W}}(z) = \frac{1}{-z - S_{\mathbf{W}}(z)}$$
which admits the closed-form solution given in (2.41).

Condition (2.93) on the entries of $\sqrt{N}\, \mathbf{W}$ can be replaced by the
Lindeberg-type condition on the whole matrix [10, Thm. 2.4]:
$$\frac{1}{N} \sum_{i,j} E\left[|W_{i,j}|^2\, 1\{|W_{i,j}| \ge \delta\}\right] \to 0 \tag{2.98}$$
for any $\delta > 0$.

2.3.2 The Full-Circle Law


Theorem 2.34. [173, 197, 85, 68, 9] Let $\mathbf{H}$ be an $N \times N$ complex
random matrix whose entries are independent random variables with
identical mean, variance $\frac{1}{N}$ and finite $k$th moments for $k \ge 4$. Assume
that the joint distributions of the real and imaginary parts of the entries
have uniformly bounded densities. Then, the asymptotic spectrum of
$\mathbf{H}$ converges almost surely to the circular law, namely the uniform
distribution over the unit disk on the complex plane $\{\zeta \in \mathbb{C} : |\zeta| \le 1\}$
whose density is given by
$$f_c(\zeta) = \frac{1}{\pi}, \qquad |\zeta| \le 1. \tag{2.99}$$
Theorem 2.34 also holds for real matrices replacing the assumption
on the joint distribution of real and imaginary parts with the
one-dimensional distribution of the real-valued entries.

2.3.3 The Marc̆enko-Pastur Law and its Generalizations


Theorem 2.35. [170, 296, 131, 10] Consider an $N \times K$ matrix $\mathbf{H}$ whose
entries are independent zero-mean complex (or real) random variables
with variance $\frac{1}{N}$ and fourth moments of order $O\left(\frac{1}{N^2}\right)$. As $K, N \to \infty$
with $\frac{K}{N} \to \beta$, the empirical distribution of $\mathbf{H}^\dagger \mathbf{H}$ converges almost surely
to a nonrandom limiting distribution with density
$$f_\beta(x) = \left(1 - \frac{1}{\beta}\right)^+ \delta(x) + \frac{\sqrt{(x - a)^+ (b - x)^+}}{2\pi\beta x} \tag{2.100}$$
where
$$a = (1 - \sqrt{\beta})^2, \qquad b = (1 + \sqrt{\beta})^2.$$

The above limiting distribution is the Marc̆enko-Pastur law with ratio
index $\beta$. Using Lemma 2.22, the zero-mean condition can be relaxed
to having identical mean. The condition on the fourth moments can be
relaxed [10, Thm. 2.8] to a Lindeberg-type condition:
$$\frac{1}{K} \sum_{i,j} E\left[|H_{i,j}|^2\, 1\{|H_{i,j}| \ge \delta\}\right] \to 0 \tag{2.101}$$
for any $\delta > 0$.


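Using (1.3) and (2.100), the empirical distribution of $\mathbf{H}\mathbf{H}^\dagger$, with $\mathbf{H}$
as in Theorem 2.35, converges almost surely to a nonrandom limiting
distribution with density (1.12) whose moments are given by
$$\int_a^b x^k\, \tilde f_\beta(x)\, dx = \sum_{i=1}^{k} \frac{1}{k} \binom{k}{i} \binom{k}{i - 1} \beta^i \tag{2.102}$$
$$= \lim_{N \to \infty} \frac{1}{N}\, \mathrm{tr}\left\{(\mathbf{H}\mathbf{H}^\dagger)^k\right\}. \tag{2.103}$$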
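The moment formula (2.102) and the limit (2.103) can be compared in a small simulation. The sketch below draws one $N \times K$ matrix with i.i.d. complex Gaussian entries of variance $\frac{1}{N}$ (a particular choice satisfying the hypotheses of Theorem 2.35) and compares the empirical moments of $\mathbf{H}\mathbf{H}^\dagger$ with (2.102) evaluated at $\beta = K/N$; the dimensions, the seed and the helper name are arbitrary assumptions, and at finite size the agreement is only approximate.

```python
import numpy as np
from math import comb

def mp_moment(k, beta):
    """Right-hand side of (2.102)."""
    return sum(comb(k, i) * comb(k, i - 1) * beta**i for i in range(1, k + 1)) / k

rng = np.random.default_rng(2)
N, K = 400, 200
# Complex Gaussian entries: real and imaginary parts each have variance 1/(2N).
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)

eigenvalues = np.linalg.eigvalsh(H @ H.conj().T)
empirical = [np.mean(eigenvalues**k) for k in (1, 2, 3)]
theory = [mp_moment(k, K / N) for k in (1, 2, 3)]
print(empirical)
print(theory)
```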

Furthermore, from Lemma 2.10, it follows straightforwardly that the
first and second order asymptotic moments of $(\mathbf{H}\mathbf{H}^\dagger)^{-1}$ with $\beta > 1$
converge to
$$\lim_{N \to \infty} \frac{1}{N}\, \mathrm{tr}\left\{(\mathbf{H}\mathbf{H}^\dagger)^{-1}\right\} = \frac{1}{\beta - 1} \tag{2.104}$$
$$\lim_{N \to \infty} \frac{1}{N}\, \mathrm{tr}\left\{(\mathbf{H}\mathbf{H}^\dagger)^{-2}\right\} = \frac{\beta}{(\beta - 1)^3}. \tag{2.105}$$

The convergence in (2.103)–(2.105) is almost sure. If $\mathbf{H}$ is square,
then the empirical distribution of its singular values converges almost
surely to the quarter circle law with density $q(\cdot)$ given in (1.21). The
even moments of the quarter circle law coincide with the corresponding
moments of the semicircle law. Unlike those of the semicircle law, the
odd moments of the quarter circle law do not vanish. For all positive
integers $k$ the moments of the quarter circle law are given by
$$\int_0^2 x^k q(x)\, dx = \frac{2^k\, \Gamma\left(\frac{1 + k}{2}\right)}{\sqrt{\pi}\, \Gamma\left(2 + \frac{k}{2}\right)}. \tag{2.106}$$
In the important special case of square $\mathbf{H}$ with independent Gaussian
entries, the speed at which the minimum singular value vanishes
(and consequently the growth of the condition number) is characterized
by the following result.

Theorem 2.36. [67, Thm. 5.1], [218] Consider an $N \times N$ standard complex
Gaussian matrix $\mathbf{H}$. The minimum singular value of $\mathbf{H}$, $\sigma_{\min}$, satisfies
$$\lim_{N \to \infty} P[N \sigma_{\min} \ge x] = e^{-x - x^2/2}. \tag{2.107}$$

A summary of related results for both the minimum and maximum


singular values of H can be found in [67, 10].

The following theorem establishes a link between asymptotic ran-


dom matrix theory and recent results on the asymptotic distribution
of the zeros of classical orthogonal polynomials.

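Theorem 2.37. [57] Let $\lambda_1 \le \ldots \le \lambda_K$ denote the ordered eigenvalues
of $\mathbf{H}^\dagger \mathbf{H}$ with $\mathbf{H}$ an $N \times K$ standard complex Gaussian matrix
and let $x_1 \le \ldots \le x_K$ denote the zeros of the Laguerre polynomial
$L_K^{N-K+1}(Nx)$. If $K, N \to \infty$ with $\frac{K}{N} \to \beta \in (0, \infty)$, then almost surely
$$\frac{1}{K} \sum_{\ell=1}^{K} |\lambda_\ell - x_\ell|^2 \xrightarrow{\text{a.s.}} 0. \tag{2.108}$$
Moreover, if $d_1 \le d_2 \le \ldots \le d_K$ denote the ordered differences $|\lambda_i - x_i|$,
then
$$d_{\lfloor yK \rfloor} \xrightarrow{\text{a.s.}} 0 \tag{2.109}$$
for all $y \in (0, 1)$. For the smallest and largest eigenvalues of $\mathbf{H}^\dagger \mathbf{H}$, and
for the smallest and largest zero of the polynomial $L_K^{N-K+1}(Nx)$, we
have that almost surely
$$\lim_{K \to \infty} x_1 = \lim_{K \to \infty} \lambda_1 = (1 - \sqrt{\beta})^2 \tag{2.110}$$
$$\lim_{K \to \infty} x_K = \lim_{K \to \infty} \lambda_K = (1 + \sqrt{\beta})^2 \tag{2.111}$$
for $\beta \le 1$ while, for $\beta > 1$,
$$\lim_{K \to \infty} x_{K-N+1} = \lim_{K \to \infty} \lambda_{K-N+1} = (1 - \sqrt{\beta})^2. \tag{2.112}$$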

Theorem 2.37 in conjunction with recent results on the asymptotic
distribution of the zeros of scaled generalized Laguerre polynomials,
$L_K^{N-K+1}(Nx)$, also provides an alternative proof of the semicircle and
Marc̆enko-Pastur laws.
In [57], using results on the asymptotics of classical orthogonal polynomials,
results analogous to Theorem 2.37 are also derived for centered
sample covariance matrices
$$\sqrt{\frac{N}{K}} \left(\mathbf{H}^\dagger \mathbf{H} - \kappa \mathbf{I}\right) \tag{2.113}$$
with $\kappa = \max\left\{1, \frac{K}{N}\right\}$. For such matrices, it is proved that if $K, N \to \infty$
with $\frac{K}{N} \to \infty$ or with $\frac{N}{K} \to 0$, the extremal eigenvalues converge almost
surely to 2 and −2, while the corresponding eigenvalue distribution
converges to the semicircle law (cf. Example 2.50).

Theorem 2.38. [170, 227] Let $\mathbf{H}$ be an $N \times K$ matrix whose entries
are i.i.d. complex random variables with zero-mean and variance $\frac{1}{N}$. Let
$\mathbf{T}$ be a $K \times K$ real diagonal random matrix whose empirical eigenvalue
distribution converges almost surely to the distribution of a random
variable $\mathsf{T}$. Let $\mathbf{W}_0$ be an $N \times N$ Hermitian complex random matrix
with empirical eigenvalue distribution converging almost surely to a
nonrandom distribution whose Stieltjes transform is $S_0$. If $\mathbf{H}$, $\mathbf{T}$, and
$\mathbf{W}_0$ are independent, the empirical eigenvalue distribution of
$$\mathbf{W} = \mathbf{W}_0 + \mathbf{H}\mathbf{T}\mathbf{H}^\dagger \tag{2.114}$$
converges, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, almost surely to a nonrandom
limiting distribution whose Stieltjes transform $S(\cdot)$ satisfies
$$S(z) = S_0\left(z - \beta\, E\left[\frac{\mathsf{T}}{1 + \mathsf{T}\, S(z)}\right]\right). \tag{2.115}$$
The case W0 = 0 merits particular attention. Using the more con-
venient η-transform and Shannon transform, we derive the following
result from [226]. (The proof is given in Appendix 4.1 under stronger
assumptions on T.)

Theorem 2.39. Let $\mathbf{H}$ be an $N \times K$ matrix whose entries are i.i.d.
complex random variables with variance $\frac{1}{N}$. Let $\mathbf{T}$ be a $K \times K$ Hermitian
nonnegative random matrix, independent of $\mathbf{H}$, whose empirical
eigenvalue distribution converges almost surely to a nonrandom
limit. The empirical eigenvalue distribution of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ converges almost
surely, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, to a distribution whose η-transform
satisfies
$$\beta = \frac{1 - \eta}{1 - \eta_{\mathbf{T}}(\gamma\eta)} \tag{2.116}$$
where for notational simplicity we have abbreviated $\eta_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(\gamma) = \eta$.
The corresponding Shannon transform satisfies²⁵
$$V_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(\gamma) = \beta\, V_{\mathbf{T}}(\eta\gamma) + \log\frac{1}{\eta} + (\eta - 1) \log e. \tag{2.117}$$
The condition of i.i.d. entries can be relaxed to independent entries
with common mean and variance $\frac{1}{N}$ satisfying the Lindeberg-type
condition (2.101). The $m$th moment of the empirical distribution of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$
converges almost surely to [313, 116, 158]:
$$\sum_{i=1}^{m} \beta^i \sum_{\substack{m_1 + \cdots + m_i = m \\ m_1 \le \cdots \le m_i}} \frac{m!}{(m - i + 1)!\, f(m_1, \ldots, m_i)}\, E[\mathsf{T}^{m_1}] \cdots E[\mathsf{T}^{m_i}] \tag{2.118}$$
where $\mathsf{T}$ is a random variable with distribution equal to the asymptotic
spectrum of $\mathbf{T}$ and, $\forall\, 1 \le \ell \le m$,
$$f(i_1, \ldots, i_\ell) = f_1! \cdots f_m! \tag{2.119}$$
with $f_i$ the number of entries of the vector $[i_1, \ldots, i_\ell]$ equal to $i$.²⁶
Figure 2.5 depicts the Shannon transform of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ given in (2.117)
for $\beta = \frac{2}{3}$ and $\mathsf{T}$ exponentially distributed.

25 The derivation of (2.117) from (2.116) is given in Section 3.1.2.

Fig. 2.5 Shannon transform of the asymptotic spectrum of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ for $\beta = \frac{2}{3}$ and $\mathsf{T}$ exponentially
distributed. The stars indicate the Shannon transform, obtained via Monte Carlo
simulation, of the averaged empirical distribution of the eigenvalues of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ where $\mathbf{H}$ is
3 × 2. (Plot of the Shannon transform, ranging from 0 to 1.8, against γ from 0 to 10.)

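If $\mathbf{T} = \mathbf{I}$, then $\eta_{\mathbf{T}}(\gamma) = \frac{1}{1 + \gamma}$, and (2.116) becomes
$$\eta = 1 - \beta + \frac{\beta}{1 + \gamma\eta} \tag{2.120}$$
whose explicit solution is the η-transform of the Marc̆enko-Pastur
distribution, $\tilde f_\beta(\cdot)$, in (1.12):
$$\eta(\gamma) = 1 - \frac{\mathcal{F}(\gamma, \beta)}{4\gamma}. \tag{2.121}$$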
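When no closed form is available, the fixed-point equation (2.116) can be solved numerically for each $\gamma$. A minimal sketch follows, rewriting (2.116) as $h(\eta) = \eta - 1 + \beta\,(1 - \eta_{\mathbf{T}}(\gamma\eta)) = 0$ and using bisection on $[0, 1]$, which is safe because $h$ is increasing in $\eta$ with $h(0) = -1$ and $h(1) \ge 0$. The solver name is hypothetical. The check uses $\mathbf{T} = \mathbf{I}$ and compares with $1 - \mathcal{F}(\gamma, \beta)/(4\gamma)$, which is what (2.52) and (2.56) give for $\mathbf{H}\mathbf{H}^\dagger$; the expression used for $\mathcal{F}$ is assumed to be the one referenced as (1.17).

```python
import numpy as np

def F(x, z):
    """Assumed form of the function F(x, z) referenced as (1.17)."""
    return (np.sqrt(x * (1 + np.sqrt(z))**2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z))**2 + 1))**2

def solve_eta(gamma, beta, eta_T, tol=1e-12):
    """Solve (2.116) for eta by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # h(mid) < 0 means the root lies to the right of mid
        if mid - 1 + beta * (1 - eta_T(gamma * mid)) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma, beta = 5.0, 2.0 / 3.0
eta_numeric = solve_eta(gamma, beta, lambda g: 1.0 / (1.0 + g))   # T = I
eta_closed = 1 - F(gamma, beta) / (4 * gamma)
print(eta_numeric, eta_closed)
```

Any other η-transform, such as (2.54) for exponentially distributed $\mathsf{T}$, can be passed as `eta_T` in the same way.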

Equation (2.116) admits an explicit solution in a few other cases, one
of which is illustrated by the result that follows.

26 For example, f (1, 1, 4, 2, 1, 2) = 3! · 2! · 1!.


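Theorem 2.40. [223] If, in Theorem 2.39, $\mathbf{T} = (\mathbf{Y}\mathbf{Y}^\dagger)^{-1}$ with $\mathbf{Y}$ a
$K \times m$ ($K \le m$) Gaussian random matrix whose entries have zero-mean
and variance $\frac{1}{m}$, then, using (2.121),²⁷
$$\eta_{\mathbf{T}}(\gamma) = \frac{\gamma}{4\tilde\beta}\, \mathcal{F}\left(\frac{1}{\gamma}, \tilde\beta\right) \tag{2.122}$$
where $\frac{K}{m} \to \tilde\beta$. Thus, solving (2.116) we find that the asymptotic spectrum
of $\mathbf{W} = \mathbf{H}(\mathbf{Y}\mathbf{Y}^\dagger)^{-1}\mathbf{H}^\dagger$ is given by
$$f_{\mathbf{W}}(x) = \left(1 - \frac{1}{\beta}\right)^+ \delta(x) + \frac{(1 - \tilde\beta) \sqrt{(x - a^2)^+ (b^2 - x)^+}}{2\pi x\,(x\tilde\beta + \beta)} \tag{2.123}$$
with
$$a = \frac{1 - \sqrt{1 - (1 - \beta)(1 - \tilde\beta)}}{1 - \tilde\beta}, \qquad b = \frac{1 + \sqrt{1 - (1 - \beta)(1 - \tilde\beta)}}{1 - \tilde\beta}.$$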

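Using (2.56) and (2.116), we can give an equivalent expression for
the η-transform of the asymptotic spectrum of $\mathbf{T}^{1/2}\mathbf{H}^\dagger\mathbf{H}\mathbf{T}^{1/2}$:
$$\eta_{\mathbf{T}}(\gamma(1 - \beta + \beta\eta)) = \eta \tag{2.124}$$
where $\eta = \eta_{\mathbf{T}^{1/2}\mathbf{H}^\dagger\mathbf{H}\mathbf{T}^{1/2}}(\gamma)$. Note that, as $\beta \to 0$,
$$\eta_{\mathbf{T}^{1/2}\mathbf{H}^\dagger\mathbf{H}\mathbf{T}^{1/2}}(\gamma) \to \eta_{\mathbf{T}}(\gamma) \tag{2.125}$$
and thus the spectrum of $\mathbf{T}^{1/2}\mathbf{H}^\dagger\mathbf{H}\mathbf{T}^{1/2}$ converges to that of $\mathbf{T}$.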

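Theorem 2.41. [178] Let $\mathbf{\Sigma}$ be a positive definite matrix whose
asymptotic spectrum has the p.d.f.
$$f_{\mathbf{\Sigma}}(\lambda) = \frac{1}{2\pi\mu\lambda^2} \sqrt{\left(\frac{\lambda}{\sigma_1} - 1\right)\left(1 - \frac{\lambda}{\sigma_2}\right)} \tag{2.126}$$
with $\sigma_1 \le \lambda \le \sigma_2$ and
$$\mu = \frac{(\sqrt{\sigma_2} - \sqrt{\sigma_1})^2}{4\sigma_1\sigma_2}. \tag{2.127}$$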
27 Although [223] obtained (2.123) with the condition that Y be Gaussian, it follows from
(2.121) and Theorem 2.39 that this condition is not required for (2.122) and (2.123) to
hold.

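If $\mathbf{H}$ is an $N \times K$ standard complex Gaussian matrix, then, as $K, N \to \infty$
with $\frac{K}{N} \to \beta$, the asymptotic spectrum of $\mathbf{W} = \mathbf{\Sigma}^{1/2}\mathbf{H}\mathbf{H}^\dagger\mathbf{\Sigma}^{1/2}$ has
the p.d.f.²⁸
$$f_{\mathbf{W}}(\lambda) = (1 - \beta)^+ \delta(\lambda) + \frac{\sqrt{(\lambda - a)^+ (b - \lambda)^+}}{2\pi\lambda(1 + \lambda\mu)} \tag{2.128}$$
with
$$a = 1 + \beta + 2\mu\beta - 2\sqrt{\beta\,(1 + \mu)(1 + \mu\beta)} \tag{2.129}$$
$$b = 1 + \beta + 2\mu\beta + 2\sqrt{\beta\,(1 + \mu)(1 + \mu\beta)}. \tag{2.130}$$
The Shannon transform of (2.128) is
$$V_{\mathbf{W}}(\gamma) = \log(\gamma\,\omega_1(\gamma, \beta, \mu)) + \frac{1}{\mu} \log|1 - \mu\,\omega_2(\gamma, \beta, \mu)| - (\beta - 1) \log|\omega_3(\gamma, \beta, \mu)| \tag{2.131}$$
with
$$\omega_1(\gamma, \beta, \mu) = \frac{(1 + (1 + \beta)\mu)\left[1 + \gamma(1 + \beta) + \sqrt{\omega_4}\right] - 2\mu\beta(\gamma - \mu)}{2\beta\gamma\left[1 + (1 + \beta)\mu + \beta\mu^2\right]}$$
$$\omega_2(\gamma, \beta, \mu) = \frac{\beta + \gamma(1 + \beta) - \sqrt{\omega_4} + 2\gamma\beta\mu}{2\gamma\left[1 + (1 + \beta)\mu + \beta\mu^2\right]}$$
$$\omega_3(\gamma, \beta, \mu) = \begin{cases} \dfrac{1 + (1 - \beta)\gamma + 2\mu\beta - \sqrt{\omega_4}}{2\beta(\gamma - \mu)} & \text{if } \gamma \ne \mu, \\[2ex] -\dfrac{1 + \gamma\beta}{1 + (1 + \beta)\gamma} & \text{if } \gamma = \mu \end{cases}$$
$$\omega_4 = (1 + (1 + \beta)\gamma)^2 - 4\beta\gamma(\gamma - \mu).$$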

Returning to the setting of Theorem 2.38 but interchanging the as-


sumptions on W0 and T, i.e., with W0 diagonal and T Hermitian, the
result that follows (proved in Appendix 4.2) states that the asymptotic
spectrum in Theorem 2.38 still holds under the condition that W0 and
T be nonnegative definite. Consistent with our emphasis, this result is
formulated in terms of the η-transform rather than the Stieltjes trans-
form used in Theorem 2.38.
28 Theorem 2.39 indicates that (2.128) holds even without the Gaussian condition on H.
Theorem 2.42. Let $\mathbf{H}$ be an $N \times K$ matrix whose entries are i.i.d.
complex random variables with zero-mean and variance $\frac{1}{N}$. Let $\mathbf{T}$ be
a $K \times K$ positive definite random matrix whose empirical eigenvalue
distribution converges almost surely to a nonrandom limit. Let $\mathbf{W}_0$ be
an $N \times N$ nonnegative definite diagonal random matrix with empirical
eigenvalue distribution converging almost surely to a nonrandom limit.
Assuming that $\mathbf{H}$, $\mathbf{T}$, and $\mathbf{W}_0$ are independent, the empirical eigenvalue
distribution of
$$\mathbf{W} = \mathbf{W}_0 + \mathbf{H}\mathbf{T}\mathbf{H}^\dagger \tag{2.132}$$
converges almost surely, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, to a nonrandom
limiting distribution whose η-transform is the solution of the following
pair of equations:
$$\gamma\,\eta = \varphi\,\eta_0(\varphi) \tag{2.133}$$
$$\eta = \eta_0(\varphi) - \beta\,(1 - \eta_{\mathbf{T}}(\gamma\,\eta)) \tag{2.134}$$
with $\eta_0$ and $\eta_{\mathbf{T}}$ the η-transforms of $\mathbf{W}_0$ and $\mathbf{T}$ respectively.

Notice that the function $\eta(\gamma)$ can be immediately evaluated from
(2.133) and (2.134) since every $\varphi \in (0, \infty)$ determines a pair of values
$(\gamma, \eta(\gamma)) \in (0, \infty) \times [0, 1]$: the product $(\gamma\,\eta)$ is obtained from (2.133)
(which is strictly monotonically increasing in $\varphi$), then $\eta$ is obtained
from (2.134) and, finally, $\gamma = \frac{(\gamma\eta)}{\eta}$.
Figure 2.6 shows the η-transform of $\mathbf{W} = \mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ where the asymptotic
spectrum of $\mathbf{T}$ converges almost surely to an exponential distribution.
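The parametric evaluation described after Theorem 2.42 translates directly into code: sweep $\varphi$, compute the product $\gamma\eta$ from (2.133), then $\eta$ from (2.134), and recover $\gamma$ as their ratio. The sketch below is one possible rendering with hypothetical names; the two choices of $\eta_0$ (a zero matrix $\mathbf{W}_0$, and a $\mathbf{W}_0$ whose eigenvalues are 0 or 1 with equal probability) and the choice $\mathbf{T} = \mathbf{I}$ are assumptions made only to exercise it, and the expression for $\mathcal{F}$ is assumed to be the one referenced as (1.17).

```python
import numpy as np

def eta_curve(phis, beta, eta_0, eta_T):
    """Return (gamma, eta) pairs on the curve defined by (2.133) and (2.134)."""
    phis = np.asarray(phis, dtype=float)
    gamma_eta = phis * eta_0(phis)                        # (2.133): the product gamma*eta
    eta = eta_0(phis) - beta * (1 - eta_T(gamma_eta))     # (2.134)
    return gamma_eta / eta, eta                           # gamma = (gamma*eta) / eta

def F(x, z):
    """Assumed form of the function F(x, z) referenced as (1.17)."""
    return (np.sqrt(x * (1 + np.sqrt(z))**2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z))**2 + 1))**2

beta = 0.5
phis = np.linspace(0.1, 10.0, 50)
eta_T = lambda g: 1.0 / (1.0 + g)                         # T = I

# W0 = 0: eta_0 is identically 1 and the curve must reduce to (2.120)-(2.121).
gammas, etas = eta_curve(phis, beta, lambda g: np.ones_like(g), eta_T)
closed_form = 1 - F(gammas, beta) / (4 * gammas)

# W0 with eigenvalues 0 or 1, each with probability 1/2 (illustrative choice).
gammas_w0, etas_w0 = eta_curve(phis, beta, lambda g: 0.5 + 0.5 / (1.0 + g), eta_T)
print(np.max(np.abs(etas - closed_form)))
```

Since $\gamma$ is an output of the sweep rather than an input, evaluating $\eta$ at a prescribed $\gamma$ requires interpolating the returned pairs or a one-dimensional root search over $\varphi$.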

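Theorem 2.43. [86, 55, 159] Define $\mathbf{H} = \mathbf{C}\mathbf{S}\mathbf{A}$ where $\mathbf{S}$ is an $N \times K$
matrix whose entries are independent complex random variables
(arbitrarily distributed) satisfying the Lindeberg condition (2.101) with
identical means and variance $\frac{1}{N}$. Let $\mathbf{C}$ and $\mathbf{A}$ be, respectively, $N \times N$
and $K \times K$ random matrices such that the asymptotic spectra of $\mathbf{D} = \mathbf{C}\mathbf{C}^\dagger$
and $\mathbf{T} = \mathbf{A}\mathbf{A}^\dagger$ converge almost surely to compactly supported
measures.²⁹ If $\mathbf{C}$, $\mathbf{A}$ and $\mathbf{S}$ are independent, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$,
the η-transform of $\mathbf{H}\mathbf{H}^\dagger$ is
$$\eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = E\left[\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{D}, \gamma)\right] \tag{2.135}$$
where $\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(d, \gamma)$ satisfies
$$\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(d, \gamma) = \frac{1}{1 + \gamma\,\beta\, d\, E\left[\dfrac{\mathsf{T}}{1 + \gamma\,\mathsf{T}\, E[\mathsf{D}\,\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{D}, \gamma)]}\right]} \tag{2.136}$$
with $\mathsf{D}$ and $\mathsf{T}$ independent random variables whose distributions are the
asymptotic spectra of $\mathbf{D}$ and $\mathbf{T}$ respectively. The asymptotic fraction
of zero eigenvalues of $\mathbf{H}\mathbf{H}^\dagger$ equals
$$\lim_{\gamma \to \infty} \eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = 1 - \min\{\beta\, P[\mathsf{T} \ne 0],\, P[\mathsf{D} \ne 0]\}.$$

29 In the case that $\mathbf{C}$ and $\mathbf{A}$ are diagonal deterministic matrices, Theorem 2.43 is a special
case of Theorem 2.50.

Fig. 2.6 η-transform of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ with $\beta = \frac{2}{3}$ and $\eta_{\mathbf{T}}$ given by (2.54). The stars indicate the
η-transform of the averaged empirical spectrum of $\mathbf{H}\mathbf{T}\mathbf{H}^\dagger$ for a 3 × 2 matrix $\mathbf{H}$. (Plot of the η-transform, ranging from 0.55 to 0.95, against γ from 0 to 10.)

The following result, proved in Appendix 4.3, finds the Shannon
transform of $\mathbf{H}\mathbf{H}^\dagger$ in terms of the Shannon transforms of $\mathbf{D}$ and $\mathbf{T}$.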
2.3. Asymptotic Spectrum Theorems 63

Theorem 2.44. Let H be an N × K matrix as defined in Theorem


2.43. The Shannon transform of HH† is given by:
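\[
V_{\mathbf{HH}^\dagger}(\gamma) = V_{\mathbf{D}}(\beta\gamma_d) + \beta\, V_{\mathbf{T}}(\gamma_t) - \beta\,\frac{\gamma_d\,\gamma_t}{\gamma}\,\log e \tag{2.137}
\]
where
\[
\frac{\gamma_d\,\gamma_t}{\gamma} = 1 - \eta_{\mathbf{T}}(\gamma_t), \qquad \beta\,\frac{\gamma_d\,\gamma_t}{\gamma} = 1 - \eta_{\mathbf{D}}(\beta\gamma_d). \tag{2.138}
\]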

From (2.138), an alternative expression for ηHH† (γ) with H as in


Theorem 2.43, can be obtained as

ηHH† (γ) = ηD (β γd (γ)) (2.139)

where γd (γ) is the solution to (2.138).
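A possible way to evaluate (2.137)-(2.139) numerically is sketched below. It is an illustration under stated assumptions rather than the authors' procedure: the laws of D and T are taken to be discrete (lists of (value, probability) pairs), logarithms are natural so that log e = 1, and the helper name `shannon_product_channel` is hypothetical. The variables `a` and `b` stand for γt and βγd; the two updates are rearrangements of the two equations in (2.138).

```python
import math


def shannon_product_channel(gamma, beta, d_atoms, t_atoms, iters=20000, tol=1e-14):
    """Return (V, eta) in nats for H = CSA using (2.137)-(2.139)."""
    a = 0.0  # plays the role of gamma_t
    b = 0.0  # plays the role of beta * gamma_d
    for _ in range(iters):
        # First equation of (2.138), rewritten as b = gamma*beta*E[T/(1+aT)].
        b_new = gamma * beta * sum(p * t / (1.0 + a * t) for t, p in t_atoms)
        # Second equation of (2.138), rewritten as a = gamma*E[D/(1+bD)].
        a_new = gamma * sum(p * d / (1.0 + b_new * d) for d, p in d_atoms)
        done = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    v_d = sum(p * math.log(1.0 + b * d) for d, p in d_atoms)  # V_D(beta*gamma_d)
    v_t = sum(p * math.log(1.0 + a * t) for t, p in t_atoms)  # V_T(gamma_t)
    eta = sum(p / (1.0 + b * d) for d, p in d_atoms)          # (2.139)
    # beta*gamma_d*gamma_t/gamma equals a*b/gamma in this parametrization.
    return v_d + beta * v_t - a * b / gamma, eta


if __name__ == "__main__":
    print(shannon_product_channel(10.0, 0.5, [(1.0, 1.0)], [(1.0, 1.0)]))
```

A useful consistency check is the general identity γ dV/dγ = 1 − η(γ) (in nats), which follows from the definitions of the two transforms and can be verified by finite differences.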

Theorem 2.45. [262, 165] Let H be an N × K matrix defined as in


Theorem 2.43. Defining
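\[
\beta' = \beta\,\frac{P[\mathsf{T} \neq 0]}{P[\mathsf{D} \neq 0]},
\]
\[
\lim_{\gamma\to\infty}\left( \log(\gamma\beta) - \frac{V_{\mathbf{HH}^\dagger}(\gamma)}{\min\{\beta\, P[\mathsf{T} \neq 0],\; P[\mathsf{D} \neq 0]\}} \right) = L_\infty \tag{2.140}
\]
with
\[
L_\infty =
\begin{cases}
-E\!\left[\log \dfrac{\mathsf{D}'}{\alpha\,\beta'\, e}\right] - \beta'\, V_{\mathsf{T}'}(\alpha) & \beta' > 1 \\[2ex]
-E\!\left[\log \dfrac{\mathsf{T}'\,\mathsf{D}'}{e}\right] & \beta' = 1 \\[2ex]
-E\!\left[\log \dfrac{\Gamma_\infty \mathsf{T}'}{e}\right] - \dfrac{1}{\beta'}\, V_{\mathsf{D}'}\!\left(\dfrac{1}{\Gamma_\infty}\right) & \beta' < 1
\end{cases}
\tag{2.141}
\]
with α and Γ∞, respectively, solutions to
\[
\eta_{\mathsf{T}'}(\alpha) = 1 - \frac{1}{\beta'}, \qquad \eta_{\mathsf{D}'}\!\left(\frac{1}{\Gamma_\infty}\right) = 1 - \beta' \tag{2.142}
\]
and with D′ and T′ the restrictions of D and T to the events D ≠ 0 and T ≠ 0.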

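Corollary 2.1. As γ → ∞, we have that
\[
\lim_{\gamma\to\infty} \gamma\, \eta_{\mathbf{HH}^\dagger}(\gamma) =
\begin{cases}
E\!\left[\dfrac{1}{\mathsf{D}}\right]\alpha, & \beta' > 1 \text{ and } P[\mathsf{D} > 0] = 1 \\[1.5ex]
\infty, & \text{otherwise}
\end{cases}
\]
with α solution to (2.142).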

Theorem 2.46. [262] Let H be an N × K matrix defined as in Theo-


rem 2.43. Further define
⎛ ⎞−1
1 
(N ) (y, γ) = h†j ⎝I + γ h h† ⎠ hj with j−1 ≤ y < Kj .
hj 2 K
 =j

As K, N → ∞, (N ) (y, γ) converges almost surely to

a.s. γt (γ)
(N ) (y, γ) → y ∈ [0, 1]
γE[D]

with γt (γ) satisfying (2.138).

Corollary 2.2. As γ → ∞, we have that
\[
\lim_{\gamma\to\infty} \frac{\gamma_t(\gamma)}{\gamma} = \beta\, P[\mathsf{T} > 0]\, \Gamma_\infty \tag{2.143}
\]
where γt(γ) is the solution to (2.138) while Γ∞ is the solution to (2.142) for β′ < 1 and 0 otherwise.

Theorem 2.47. [159] Let H be an N × K matrix defined as in Theorem 2.43. The mth moment of the empirical eigenvalue distribution of HH† converges almost surely to
\[
\sum_{k=1}^{m} \beta^k
\sum_{\substack{m_1+\cdots+m_k=m \\ m_1\le\cdots\le m_k}}\;
\sum_{\substack{n_1+\cdots+n_{m+1-k}=m \\ n_1\le\cdots\le n_{m+1-k}}}
B(m_1,\ldots,m_k,n_1,\ldots,n_{m+1-k})\,
E[\mathsf{T}^{m_1}]\cdots E[\mathsf{T}^{m_k}]\,
E[\mathsf{D}^{n_1}]\cdots E[\mathsf{D}^{n_{m+1-k}}]
\tag{2.144}
\]
with D and T defined as in Theorem 2.43, f(i_1, ..., i_ℓ) defined as in (2.119), while30
\[
B(m_1,\ldots,m_k,n_1,\ldots,n_{m+1-k}) = \frac{m\,(m-k)!\,(k-1)!}{f(m_1,\ldots,m_k)\, f(n_1,\ldots,n_{m+1-k})}.
\]

Equation (2.144) is obtained in [159] using combinatorial tools. An alternative derivation can be obtained using Theorem 2.55, from which the nth moment of HH† given by (2.144) is also seen to equal E[m̃n(D)] with m̃n admitting the following recursive re-formulation:
\[
\tilde m_n(d) = \sum_{\ell=1}^{n} c_\ell(d) \sum_{n_1+n_2+\cdots+n_\ell=n} \tilde m_{n_1-1}(d)\cdots \tilde m_{n_\ell-1}(d) \tag{2.145}
\]
with
\[
c_{\ell+1}(d) = \beta\, d\, E[\mathsf{T}^{\ell+1}]\, E[\mathsf{D}]^{\ell}.
\]
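The recursion (2.145) lends itself to a short numerical sketch. The code below is illustrative only and rests on explicit assumptions: the coefficient is read as c_ℓ(d) = β d E[T^ℓ] E[D]^(ℓ−1), the initial value is taken as m̃_0(d) = 1, the inner sum runs over ordered tuples of positive integers, and D and T have discrete laws given as (value, probability) pairs. The function names are hypothetical.

```python
def compositions(n, parts):
    """Yield all ordered tuples of `parts` positive integers summing to n."""
    if parts == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - parts + 2):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def moments_product_channel(n_max, beta, d_atoms, t_atoms):
    """Asymptotic moments 1..n_max of HH^dagger via the recursion (2.145)."""
    e_d = sum(p * d for d, p in d_atoms)

    def e_t_pow(k):
        return sum(p * t ** k for t, p in t_atoms)

    def c(ell, d):
        # Assumed reading: c_ell(d) = beta * d * E[T^ell] * E[D]^(ell-1).
        return beta * d * e_t_pow(ell) * e_d ** (ell - 1)

    # m_tilde[d][n] holds m~_n(d); m~_0(d) = 1 is an assumption of the sketch.
    m_tilde = {d: [1.0] for d, _ in d_atoms}
    out = []
    for n in range(1, n_max + 1):
        for d, _ in d_atoms:
            total = 0.0
            for ell in range(1, n + 1):
                inner = 0.0
                for comp in compositions(n, ell):
                    prod = 1.0
                    for ni in comp:
                        prod *= m_tilde[d][ni - 1]
                    inner += prod
                total += c(ell, d) * inner
            m_tilde[d].append(total)
        # n-th moment is E[m~_n(D)].
        out.append(sum(p * m_tilde[d][n] for d, p in d_atoms))
    return out


if __name__ == "__main__":
    print(moments_product_channel(4, 0.5, [(1.0, 1.0)], [(1.0, 1.0)]))
```

For D = T = 1 the output should reproduce the moments β, β + β², β + 3β² + β³, β + 6β² + 6β³ + β⁴ of the Marčenko-Pastur law, and for general laws the third moment can be compared against (2.144).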

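Theorem 2.48. [159] Let H be an N × K matrix defined as in Theorem 2.43 whose jth column is hj. Further define
\[
\delta_n^{(N)}(y) = \frac{1}{\|\mathbf{h}_j\|^2}\, \mathbf{h}_j^\dagger \Bigl( \sum_{\ell \neq j} \mathbf{h}_\ell \mathbf{h}_\ell^\dagger \Bigr)^{n} \mathbf{h}_j
\qquad \text{with } \tfrac{j-1}{K} \le y < \tfrac{j}{K} \tag{2.146}
\]
then, as K, N → ∞ with K/N → β, almost surely
\[
\delta_n^{(N)}(y) \;\xrightarrow{\text{a.s.}}\; \frac{E[\mathsf{D}\, m_n(\mathsf{D})]}{E[\mathsf{D}]} = \frac{\xi_n}{E[\mathsf{D}]} \tag{2.147}
\]
where ξn can be computed through the following recursive equation
\[
\xi_n = \beta \sum_{\ell=1}^{n} E\!\left[\mathsf{D}^2\, m_{\ell-1}(\mathsf{D})\right]
\sum_{\substack{n_1+\cdots+n_i=n-\ell \\ 1\le i\le n-\ell}} E\!\left[\mathsf{T}^{i+1}\right] \xi_{n_1-1}\cdots\xi_{n_i-1}
\]
with
\[
m_n(d) = \beta\, d \sum_{\ell=1}^{n} m_{\ell-1}(d)
\sum_{\substack{n_1+\cdots+n_i=n-\ell \\ 1\le i\le n-\ell}} E\!\left[\mathsf{T}^{i+1}\right] \xi_{n_1-1}\cdots\xi_{n_i-1}. \tag{2.148}
\]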

30 Note that B(m_1, ..., m_k, n_1, ..., n_{m+1−k}) can be interpreted as the number of non-crossing partitions (cf. Section 2.4.4) on {1, ..., m} satisfying the conditions:
(i) the cardinalities of the subsets in the partition, in increasing order, are m_1, ..., m_k,
(ii) the cardinalities of the subsets in the complementation map (cf. Section 2.4.4) of the partition are, in increasing order, n_1, ..., n_{m+1−k}.

Moreover, E[mn (D)] yields yet another way to compute the nth moment
of the asymptotic spectrum of HH† .
Under mild assumptions on the distribution of the independent en-
tries of H, the following convergence result is shown in Appendix 4.4.

Theorem 2.49. Define an N × K complex random matrix H whose entries are independent complex random variables (arbitrarily distributed) satisfying the Lindeberg condition (2.101) and with identical means. Let their variances be
\[
\mathrm{Var}[H_{i,j}] = \frac{P_{i,j}}{N} \tag{2.149}
\]
with P an N × K deterministic standard asymptotically doubly-regular matrix whose entries are uniformly bounded for any N. The asymptotic empirical eigenvalue distribution of H†H converges almost surely to the Marčenko-Pastur distribution whose density is given by (2.100).

Using Lemma 2.22, Theorem 2.49 can be extended to matrices whose mean has rank r where r > 1 but such that
\[
\lim_{N\to\infty} \frac{r}{N} = 0.
\]

Definition 2.16. Consider an N × K random matrix H whose entries have variances
\[
\mathrm{Var}[H_{i,j}] = \frac{P_{i,j}}{N} \tag{2.150}
\]
with P an N × K deterministic matrix whose entries are uniformly bounded. For each N, let
\[
v^N : [0,1) \times [0,1) \to \mathbb{R}
\]
be the variance profile function given by
\[
v^N(x,y) = P_{i,j}, \qquad \tfrac{i-1}{N} \le x < \tfrac{i}{N}, \quad \tfrac{j-1}{K} \le y < \tfrac{j}{K}. \tag{2.151}
\]
Whenever v^N(x, y) converges uniformly to a limiting bounded measurable function, v(x, y), we define this limit as the asymptotic variance profile of H.
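As a small illustration of (2.151), the piecewise-constant function v^N can be built directly from the matrix P. The sketch below is not from the original text; the name `variance_profile` is hypothetical and P is assumed to be given as a list of N rows of length K.

```python
def variance_profile(P):
    """Return the function v^N(x, y) of (2.151) for an N x K matrix P."""
    N = len(P)
    K = len(P[0])

    def v(x, y):
        if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
            raise ValueError("x and y must lie in [0, 1)")
        # Zero-based indices: row i covers i/N <= x < (i+1)/N, and similarly
        # for columns. The min() guards against floating-point round-up.
        i = min(int(x * N), N - 1)
        j = min(int(y * K), K - 1)
        return P[i][j]

    return v


if __name__ == "__main__":
    v = variance_profile([[1, 2], [3, 4], [5, 6]])
    print(v(0.0, 0.0), v(0.34, 0.5), v(0.99, 0.49))
```

In the one-based indexing of (2.151), the zero-based pair (i, j) returned here corresponds to entry P_{i+1,j+1}.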

Theorem 2.50. [86, 102, 221] Let H be an N × K random matrix whose entries are independent zero-mean complex random variables (arbitrarily distributed) satisfying the Lindeberg condition (2.101) and with variances
\[
E\!\left[|H_{i,j}|^2\right] = \frac{P_{i,j}}{N} \tag{2.152}
\]
where P is an N × K deterministic matrix whose entries are uniformly bounded and from which the asymptotic variance profile of H, denoted v(x, y), can be obtained as per Definition 2.16. As K, N → ∞ with K/N → β, the empirical eigenvalue distribution of HH† converges almost surely to a limiting distribution whose η-transform is
\[
\eta_{\mathbf{HH}^\dagger}(\gamma) = E\!\left[\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\right] \tag{2.153}
\]
with ΓHH†(x, γ) satisfying the equations
\[
\Gamma_{\mathbf{HH}^\dagger}(x,\gamma) = \frac{1}{1 + \beta\,\gamma\, E\!\left[v(x,\mathsf{Y})\,\Upsilon_{\mathbf{HH}^\dagger}(\mathsf{Y},\gamma)\right]} \tag{2.154}
\]
\[
\Upsilon_{\mathbf{HH}^\dagger}(y,\gamma) = \frac{1}{1 + \gamma\, E\!\left[v(\mathsf{X},y)\,\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\right]} \tag{2.155}
\]
where X and Y are independent random variables uniform on [0, 1].
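The coupled equations (2.154)-(2.155) can be approximated on a grid. The following sketch is an illustration under assumptions, not the authors' method: the profile v(x, y) is replaced by the piecewise-constant v^N of Definition 2.16 built from a finite N × K matrix P, so expectations over X and Y become row and column averages, and β is taken as K/N. The function name `eta_variance_profile` is hypothetical.

```python
def eta_variance_profile(gamma, P, iters=5000, tol=1e-13):
    """Approximate eta-transform of HH^dagger from (2.153)-(2.155)."""
    N, K = len(P), len(P[0])
    beta = K / N
    gam = [1.0] * N  # Gamma(x, gamma) on the N row cells
    for _ in range(iters):
        # (2.155): E over X becomes an average over rows i.
        ups = [
            1.0 / (1.0 + gamma * sum(P[i][j] * gam[i] for i in range(N)) / N)
            for j in range(K)
        ]
        # (2.154): E over Y becomes an average over columns j.
        new = [
            1.0 / (1.0 + beta * gamma * sum(P[i][j] * ups[j] for j in range(K)) / K)
            for i in range(N)
        ]
        change = max(abs(a - b) for a, b in zip(new, gam))
        gam = new
        if change < tol:
            break
    # (2.153): average Gamma over X.
    return sum(gam) / N


if __name__ == "__main__":
    ones = [[1.0, 1.0] for _ in range(4)]  # constant profile, beta = 1/2
    print(eta_variance_profile(10.0, ones))
```

With a constant profile the iteration collapses to the scalar fixed point of the Marčenko-Pastur case, which gives a simple check of the discretization.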

The zero-mean hypothesis in Theorem 2.50 can be relaxed using


Lemma 2.22. Specifically, if the rank of E[H] is o(N ), then Theorem
2.50 still holds.
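The asymptotic fraction of zero eigenvalues of HH† is equal to
\[
\lim_{\gamma\to\infty} \eta_{\mathbf{HH}^\dagger}(\gamma) = 1 - \min\left\{\beta\, P\bigl[E[v(\mathsf{X},\mathsf{Y})|\mathsf{Y}] \neq 0\bigr],\; P\bigl[E[v(\mathsf{X},\mathsf{Y})|\mathsf{X}] \neq 0\bigr]\right\}.
\]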

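Lemma 2.51. [86] Let H be an N × K complex random matrix defined as in Theorem 2.50. For each a, b ∈ [0, 1], a < b
\[
\frac{1}{N} \sum_{i=\lfloor aN \rfloor}^{\lfloor bN \rfloor} \left(\gamma\, \mathbf{HH}^\dagger + \mathbf{I}\right)^{-1}_{i,i} \;\to\; \int_a^b \Gamma_{\mathbf{HH}^\dagger}(x,\gamma)\, dx. \tag{2.156}
\]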

Theorem 2.52. [262] Let H be an N × K matrix defined as in Theo-


rem 2.50. Further define
⎛ ⎞−1
1 
(N ) (y, γ) = h† ⎝I + γ h h† ⎠ hj , j−1 j
K ≤ y < K.
hj 2 j
 =j

(y,γ)
As K, N → ∞, (N ) converges almost surely to E[v(X,y)] , with (y, γ)
solution to the fixed-point equation
⎡ ⎤
v(X, y)
(y, γ) = E ⎣   ⎦ y ∈ [0, 1]. (2.157)
v(X,Y)
1 + γ β E 1+γ (Y,γ) |X

The transform of the asymptotic spectrum of HH† is given by the


following result proved in Appendix 4.5.

Theorem 2.53. Let H be an N ×K complex random matrix defined as


in Theorem 2.50. The Shannon transform of the asymptotic spectrum
of HH† is

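\[
\begin{aligned}
V_{\mathbf{HH}^\dagger}(\gamma) ={}& \beta\, E\!\left[\log\bigl(1 + \gamma\, E[v(\mathsf{X},\mathsf{Y})\,\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\,|\,\mathsf{Y}]\bigr)\right] \\
&+ E\!\left[\log\bigl(1 + \gamma\,\beta\, E[v(\mathsf{X},\mathsf{Y})\,\Upsilon_{\mathbf{HH}^\dagger}(\mathsf{Y},\gamma)\,|\,\mathsf{X}]\bigr)\right] \\
&- \gamma\,\beta\, E\!\left[v(\mathsf{X},\mathsf{Y})\,\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\,\Upsilon_{\mathbf{HH}^\dagger}(\mathsf{Y},\gamma)\right] \log e
\end{aligned}
\tag{2.158}
\]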

with ΓHH† (·, ·) and ΥHH† (·, ·) satisfying (2.154) and (2.155).

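Theorem 2.54. [262] Let H be an N × K complex random matrix defined as in Theorem 2.50. Then, denoting
\[
\beta' = \beta\,\frac{P\bigl[E[v(\mathsf{X},\mathsf{Y})|\mathsf{Y}] \neq 0\bigr]}{P\bigl[E[v(\mathsf{X},\mathsf{Y})|\mathsf{X}] \neq 0\bigr]},
\]
we have that
\[
\lim_{\gamma\to\infty}\left( \log(\gamma\beta) - \frac{V_{\mathbf{HH}^\dagger}(\gamma)}{\min\bigl\{\beta\, P[E[v(\mathsf{X},\mathsf{Y})|\mathsf{Y}] \neq 0],\; P[E[v(\mathsf{X},\mathsf{Y})|\mathsf{X}] \neq 0]\bigr\}} \right) = L_\infty
\]
with
\[
L_\infty =
\begin{cases}
-E\!\left[\log\!\left(\dfrac{1}{e}\, E\!\left[\dfrac{v(\mathsf{X}',\mathsf{Y}')}{1+\alpha(\mathsf{Y}')}\,\Big|\,\mathsf{X}'\right]\right)\right] - \beta'\, E\!\left[\log\bigl(1+\alpha(\mathsf{Y}')\bigr)\right] & \beta' > 1 \\[2.5ex]
-E\!\left[\log \dfrac{v(\mathsf{X}',\mathsf{Y}')}{e}\right] & \beta' = 1 \\[2.5ex]
-E\!\left[\log \dfrac{\Gamma_\infty(\mathsf{Y}')}{e}\right] - \dfrac{1}{\beta'}\, E\!\left[\log\!\left(1 + E\!\left[\dfrac{v(\mathsf{X}',\mathsf{Y}')}{\Gamma_\infty(\mathsf{Y}')}\,\Big|\,\mathsf{X}'\right]\right)\right] & \beta' < 1
\end{cases}
\]
with X′ and Y′ the restrictions of X and Y to the events E[v(X, Y)|X] ≠ 0 and E[v(X, Y)|Y] ≠ 0, respectively. The function α(·) is the solution, for β′ > 1, of
\[
\alpha(y) = \frac{1}{\beta'}\, E\!\left[\frac{v(\mathsf{X}',y)}{E\!\left[\dfrac{v(\mathsf{X}',\mathsf{Y}')}{1+\alpha(\mathsf{Y}')}\,\Big|\,\mathsf{X}'\right]}\right] \tag{2.159}
\]
whereas Γ∞(·) is the solution, for β′ < 1, of
\[
E\!\left[\frac{1}{1 + E\!\left[\dfrac{v(\mathsf{X}',\mathsf{Y}')}{\Gamma_\infty(\mathsf{Y}')}\,\Big|\,\mathsf{X}'\right]}\right] = 1 - \beta'. \tag{2.160}
\]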

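Corollary 2.3. As γ → ∞, if β′ > 1 and P[E[v(X, Y)|X] > 0] = 1, then
\[
\lim_{\gamma\to\infty} \gamma\, \eta_{\mathbf{HH}^\dagger}(\gamma) = \frac{1}{\beta\, P\bigl[E[v(\mathsf{X},\mathsf{Y})|\mathsf{Y}] \neq 0\bigr]}\; E\!\left[\frac{1}{E\!\left[\dfrac{v(\mathsf{X}',\mathsf{Y}')}{1+\alpha(\mathsf{Y}')}\,\Big|\,\mathsf{X}'\right]}\right] \tag{2.161}
\]
with α(·) solution to (2.159). Otherwise the limit in (2.161) diverges.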

Corollary 2.4. As γ → ∞, we have that


lim (y, γ) = β P[E[v(X, Y)|Y ] = 0]Γ∞ (y) (2.162)
γ→∞

where Γ∞ (y) is the solution to (2.160) for β  < 1 and 0 otherwise while
(y, γ) is the solution to (2.157).

Theorem 2.55. [159] Let H be an N × K matrix defined as in Theorem 2.50. The nth moment of the empirical eigenvalue distribution of HH† converges almost surely to
\[
\lim_{N\to\infty} \frac{1}{N}\, \mathrm{tr}\!\left((\mathbf{HH}^\dagger)^n\right) = E[m_n(\mathsf{X})] \tag{2.163}
\]
with mn(x) satisfying the recursive equation
\[
m_n(x) = \beta \sum_{\ell=1}^{n} m_{\ell-1}(x)
\sum_{\substack{n_1+\cdots+n_i=n-\ell \\ 1\le i\le n-\ell}}
E\Bigl[\, v(x,\mathsf{Y})\, E\bigl[v(\mathsf{X},\mathsf{Y})\, m_{n_1-1}(\mathsf{X})\,|\,\mathsf{Y}\bigr] \cdots E\bigl[v(\mathsf{X},\mathsf{Y})\, m_{n_i-1}(\mathsf{X})\,|\,\mathsf{Y}\bigr] \Bigr] \tag{2.164}
\]
where m0(x) = 1 and where, in the second summation, the nk's with k ∈ {1, ..., i} are strictly positive integers. In turn, X and Y are independent random variables uniform on [0, 1].

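Theorem 2.56. [159] Consider an N × K matrix H defined as in Theorem 2.50 whose jth column is hj. As K, N → ∞, the quadratic form
\[
\delta_n^{(N)}(y) = \frac{1}{\|\mathbf{h}_j\|^2}\, \mathbf{h}_j^\dagger \Bigl( \sum_{\ell \neq j} \mathbf{h}_\ell \mathbf{h}_\ell^\dagger \Bigr)^{n} \mathbf{h}_j,
\qquad \tfrac{j-1}{K} \le y < \tfrac{j}{K} \tag{2.165}
\]
converges almost surely to a function δn(y) given by
\[
\delta_n(y) = \frac{E[m_n(\mathsf{X})\, v(\mathsf{X},y)]}{E[v(\mathsf{X},y)]} = \frac{\xi_n(y)}{E[v(\mathsf{X},y)]} \tag{2.166}
\]
where X is a random variable uniform on [0, 1] and mn(x) is given by (2.164) in Theorem 2.55.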

From Theorems 2.55 and 2.56 it follows that:

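Corollary 2.5. The relationships between the moments, E[mn(X)], and ξn(y) are:
\[
E[m_n(\mathsf{X})] = \beta \sum_{\ell=1}^{n} E\Bigl[\, \xi_{\ell-1}(\mathsf{Y})
\sum_{\substack{n_1+\cdots+n_i=n-\ell \\ 1\le i\le n-\ell}} \xi_{n_1-1}(\mathsf{Y}) \cdots \xi_{n_i-1}(\mathsf{Y}) \Bigr] \tag{2.167}
\]
with ξn(y) = E[mn(X) v(X, y)].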

In the case that v(x, y) factors as v(x, y) = vX(x)vY(y), then (2.164) becomes
\[
m_n(r) = \beta\, r \sum_{\ell=1}^{n} m_{\ell-1}(r)
\sum_{\substack{n_1+\cdots+n_i=n-\ell \\ 1\le i\le n-\ell}}
E[\mathsf{D}^{i+1}]\, E[\mathsf{C}\, m_{n_1-1}(\mathsf{C})] \cdots E[\mathsf{C}\, m_{n_i-1}(\mathsf{C})]
\]
where C and D are independent random variables whose distributions equal the distributions of vX(X) and vY(Y), respectively, with X and Y uniform on [0, 1]. From the above recursive formula, the closed-form expression given in (2.144) can be found by resorting to techniques of non-crossing partitions and the complementation map.

Remark 2.3.1. If v(x, y) factors, Theorems 2.50-2.56 admit simpler


formulations. The Shannon transform, η-transform, (y, γ) and mo-
ments of the asymptotic spectrum of HH† , with H defined as in The-
orem 2.50, coincide with those of Theorems 2.43-2.48: in this case D
and T represent independent random variables whose distributions are
given by the distributions of vX (X) and vY (Y), respectively.

An example of v(x, y) that factors is when the N × K matrix of variances, P, introduced in (2.152), is the outer product of two vectors
\[
\mathbf{P} = \mathbf{d}\,\mathbf{t}^{T} \tag{2.168}
\]
where the N-vector d and the K-vector t have nonnegative deterministic entries.

Definition 2.17. Let B be an N × K random matrix with independent columns. Denoting by ⌊·⌋ the closest smaller integer, B behaves ergodically if, for a given x ∈ [0, 1), the empirical distribution of
\[
|(\mathbf{B})_{\lfloor xN \rfloor,1}|^2, \ldots, |(\mathbf{B})_{\lfloor xN \rfloor,K}|^2
\]
converges almost surely to a nonrandom limit Fx(·) and, for a given y ∈ [0, 1), the empirical distribution of
\[
|(\mathbf{B})_{1,\lfloor yK \rfloor}|^2, \ldots, |(\mathbf{B})_{N,\lfloor yK \rfloor}|^2
\]
converges almost surely to a nonrandom limit Fy(·).

Definition 2.18. Let B be a random matrix that behaves ergodically


in the sense of Definition 2.17. Assuming that Fx (·) and Fy (·) have all
their moments bounded, the two-dimensional channel profile of B is
defined as the function ρ(x, y) : [0, 1]2 → R such that, if X is uniform
on [0, 1], the distribution of ρ(X, y) equals Fy (·) whereas, if Y is uniform
on [0, 1], then the distribution of ρ(x, Y) equals Fx (·).

Analogously, the one-dimensional channel profile of B for a given k is the function ρk(x) : [0, 1] → R such that, if X is uniform on [0, 1], the distribution of ρk(X) equals the nonrandom asymptotic empirical distribution of |(B)_{1,k}|², ..., |(B)_{N,k}|².

Theorem 2.57. [159, 160] Consider an N × K matrix H = S ◦ B with ◦ denoting the Hadamard (element-wise) product and with S and B independent N × K random matrices. The entries of S are zero-mean i.i.d. complex random variables arbitrarily distributed with variance 1/N while B is as in Definition 2.18 with Fx(·) and Fy(·) having all their moments bounded. Denoting by ρB(x, y) the channel profile of B, then, as K, N → ∞ with K/N → β, the empirical eigenvalue distribution of HH† converges almost surely to a nonrandom limit whose η-transform, Shannon transform and moments are given by (2.153), (2.158) and (2.163)-(2.164) respectively with v(x, y) replaced by ρB(x, y). Analogous
considerations hold for the functions (y, γ) and δn (y).

Theorem 2.58. [262] Consider an N × K matrix H whose entries are zero-mean correlated Gaussian random variables with correlation function rH(i, j; i′, j′) whose eigenvalues are λi,j(rH), for 1 ≤ i ≤ N and 1 ≤ j ≤ K (cf. Definition 2.8) and whose kernel factors as in (2.35). Assume that N λi,j(rA) are uniformly bounded for any N. Theorems 2.49-2.56 hold by redefining v(x, y) as the asymptotic variance profile of the Karhunen-Loève image of H, which corresponds to the limit for N → ∞ of
\[
v^N(x,y) = N \lambda_{i,j}(r_{\mathbf{H}}), \qquad \tfrac{i-1}{N} \le x < \tfrac{i}{N}, \quad \tfrac{j-1}{K} \le y < \tfrac{j}{K}.
\]
Therefore, the asymptotic spectrum of H is fully characterized by the variances of the entries of its Karhunen-Loève image.

A special case of Theorem 2.58 is illustrated in [55] for rH(i, j; i′, j′) = f(i − i′, j − j′), in which case H is termed a band matrix.

Theorem 2.59. [159] Consider the N × K random matrix
\[
\mathbf{H} = [\mathbf{A}_1 \mathbf{s}_1, \ldots, \mathbf{A}_K \mathbf{s}_K]\, \bar{\mathbf{A}} \tag{2.169}
\]
where S = [s1 ... sK] is an N × K matrix with zero-mean i.i.d. entries with variance 1/N, Ā is a deterministic diagonal matrix and Ak, k ∈ {1, ..., K}, are either finite order or infinite order absolutely summable N × N Toeplitz independent matrices, independent of S. Let ρ(x, y) be the two-dimensional channel profile of the N × K matrix Λ whose (i, j)th entry is31
\[
\Lambda_{i,j} = |\bar{A}_j|^2\, \lambda_i(\mathbf{A}_j) \tag{2.170}
\]
with λi(Aj) the ith eigenvalue of Aj Aj†. As K, N → ∞ with K/N → β, the empirical eigenvalue distribution of HH† converges almost surely to a nonrandom limiting distribution whose η-transform is [159]
\[
\eta_{\mathbf{HH}^\dagger}(\gamma) = E\!\left[\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\right] \tag{2.171}
\]
where ΓHH†(·, ·) satisfies the equations
\[
\Gamma_{\mathbf{HH}^\dagger}(x,\gamma) = \frac{1}{1 + \beta\,\gamma\, E\!\left[\rho(x,\mathsf{Y})\,\Upsilon_{\mathbf{HH}^\dagger}(\mathsf{Y},\gamma)\right]} \tag{2.172}
\]
\[
\Upsilon_{\mathbf{HH}^\dagger}(y,\gamma) = \frac{1}{1 + \gamma\, E\!\left[\rho(\mathsf{X},y)\,\Gamma_{\mathbf{HH}^\dagger}(\mathsf{X},\gamma)\right]} \tag{2.173}
\]
with X and Y independent random variables uniform on [0, 1].

Consequently, Theorems 2.49-2.56 still hold with the function v(x, y) replaced by ρ(x, y).

Define
\[
R(N,m) = \frac{1}{N} \sum H^{*}_{i_1,j_m} H_{i_1,j_1} \cdots H^{*}_{i_m,j_{m-1}} H_{i_m,j_m}, \tag{2.174}
\]
where the summation ranges over all 2m-tuples i1 , . . . , im , j1 , . . . , jm
satisfying 1 ≤ i ≤ N and 1 ≤ j ≤ K, such that the cardinality of
the set of distinct values of i plus the cardinality of the set of distinct
values of j equals k + 1, and such that there is one-to-one pairing of
the unconjugate and the conjugate terms in the products.
31 The existence of ρ(x, y) implies that Λ is a matrix that behaves ergodically in the sense of Definition 2.17.

Lemma 2.60. [296] Let H be an N × K real or complex random matrix whose entries are independent with
\[
E[H_{i,j}] = \frac{\mu_i}{\sqrt{N}}
\]
regardless of j and with
  κ2
E |Hi,j − √µi |2+δ <
N N 1+δ/2
for some δ > 0 and κ > 0. The empirical eigenvalue distribution of HH†
converges almost surely to a nonrandom limit FHH† (·) if and only if,
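for each m, E[R(N, m)] in (2.174) converges as N → ∞. Furthermore,
\[
\int \lambda^m\, dF_{\mathbf{HH}^\dagger}(\lambda) = \lim_{N\to\infty} \int \lambda^m\, dF^{N}_{\mathbf{HH}^\dagger}(\lambda) \tag{2.175}
\]
\[
\phantom{\int \lambda^m\, dF_{\mathbf{HH}^\dagger}(\lambda)} = \lim_{N\to\infty} E[R(N,m)]. \tag{2.176}
\]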

2.4 Free Probability


In the last few years, a large fraction of the new results on the asymp-
totic convergence of the eigenvalues of random matrices has been ob-
tained using the tools of free probability. This is a discipline founded
by Voiculescu [283] in the 1980s that spawned from his work on opera-
tor algebras. Unlike classical scalar random variables, random matrices
are noncommutative objects whose large-dimension asymptotics have
provided the major applications of the theory of free probability.
Knowing the eigenvalues of two matrices is, in general, not enough
to find the eigenvalues of the sum of the two matrices (unless they
commute). However, it turns out that free probability identifies a cer-
tain sufficient condition (called asymptotic freeness) under which the
asymptotic spectrum of the sum can be obtained from the individual
asymptotic spectra without involving the structure of the eigenvectors
of the matrices.
When two matrices are asymptotically free, there exists a rule to
compute any asymptotic moment of the sum of the matrices (and thus
the asymptotic spectrum) as a function of the individual moments.
The combinatorics of the rule are succinctly described by recourse to
the R-transform. Indeed, the central result in the application of free

probability to random matrices is that the R-transform of the asymp-


totic spectrum of the sum of asymptotically free matrices is equal to the
sum of the individual R-transforms. Analogously, the S-transform of the
product of asymptotically free random matrices is equal to the prod-
uct of the individual S-transforms. Computation of the R-transform,
S-transform and the mixed moments of random matrices is often aided
by a certain combinatorial construct based on noncrossing partitions
due to Speicher [240, 241, 242].
The power of free probability is evident not only in the new results on random matrices it unveils, but also in the fresh view it provides on established results. For example, it shows that the semicircle law and the Marčenko-Pastur law are the free counterparts of the Gaussian and Poisson distributions, respectively, in classical probability. Furthermore, using the central R-transform result it is possible to provide different proof techniques for the major results reviewed in Section 2.3.

2.4.1 Asymptotic Freeness


For notational convenience, we define the following functional for sequences of Hermitian matrices:
\[
\phi(\mathbf{A}) = \lim_{N\to\infty} \frac{1}{N}\, E[\mathrm{tr}\,\mathbf{A}]. \tag{2.177}
\]
Note that the expected asymptotic pth moment of A is φ(A^p) and φ(I) = 1.

Definition 2.19. [287] The Hermitian random matrices A and B are asymptotically free if for all ℓ and for all polynomials pi(·) and qi(·) with 1 ≤ i ≤ ℓ such that32
\[
\phi(p_i(\mathbf{A})) = \phi(q_i(\mathbf{B})) = 0, \tag{2.178}
\]
we have
\[
\phi\bigl(p_1(\mathbf{A})\, q_1(\mathbf{B}) \cdots p_\ell(\mathbf{A})\, q_\ell(\mathbf{B})\bigr) = 0. \tag{2.179}
\]

Definition 2.19 generalizes to several random matrices as follows.


32 This includes polynomials with constant (zero-order) terms.

Definition 2.20. The Hermitian random matrices A1, ..., Am are asymptotically free if, for all ℓ ∈ N and all polynomials p1(·), ..., pℓ(·),
\[
\phi\bigl(p_1(\mathbf{A}_{j(1)}) \cdot p_2(\mathbf{A}_{j(2)}) \cdots p_\ell(\mathbf{A}_{j(\ell)})\bigr) = 0 \tag{2.180}
\]
whenever
\[
\phi\bigl(p_i(\mathbf{A}_{j(i)})\bigr) = 0 \qquad \forall\, i = 1, \ldots, \ell \tag{2.181}
\]
where j(i) ≠ j(i + 1) (i.e., consecutive indices are distinct, but non-neighboring indices are allowed to be equal).

It is also of interest to define asymptotic freeness between pairs of


Hermitian random matrices.

Definition 2.21. [287] The pairs of Hermitian matrices {A1, A2} and {B1, B2} are asymptotically free if, for all ℓ and for all polynomials pi(·) and qi(·) in two noncommuting indeterminates with 1 ≤ i ≤ ℓ such that
\[
\phi(p_i(\mathbf{A}_1,\mathbf{A}_2)) = \phi(q_i(\mathbf{B}_1,\mathbf{B}_2)) = 0, \tag{2.182}
\]
we have
\[
\phi\bigl(p_1(\mathbf{A}_1,\mathbf{A}_2)\, q_1(\mathbf{B}_1,\mathbf{B}_2) \cdots p_\ell(\mathbf{A}_1,\mathbf{A}_2)\, q_\ell(\mathbf{B}_1,\mathbf{B}_2)\bigr) = 0. \tag{2.183}
\]

As a shorthand, when {A1 , A2 } and {B1 , B2 } are asymptotically free,


we will say that ({A1 , A2 }, {B1 , B2 }) are asymptotically free.

Let us now incorporate, in the definition of asymptotic freeness, the


class of non-Hermitian matrices. If H1 and H2 are rectangular non-
Hermitian matrices, we say that {H1 , H†1 } and {H2 , H†2 } are asymp-
totically free, or equivalently that H1 and H2 are asymptotically *-free,
if the relations given in Definition 2.21 apply with pi (H1 , H†1 ) and
qi (H2 , H†2 ) polynomials of two noncommuting variables.

The definition of asymptotic freeness is somewhat reminiscent of


the concept of independent random variables. However, as the follow-
ing example shows, statistical independence does not imply asymptotic
freeness.

Example 2.33. Suppose that X1 and X2 are independent zero-mean


random variables with nonzero variance. Then, X1 I and X2 I are not
asymptotically free. More generally, if two matrices are asymptotically
free and they commute, then one of them is necessarily deterministic.

An alternative to the foregoing definitions is obtained by dropping


the expectation from the definition of the operator φ in (2.177) and
assuming that the spectra of the matrices converge almost surely to
a nonrandom limit. This notion is known as almost surely asymptotic
freeness [110, 111]. As will be pointed out, some of the properties and
examples discussed in the sequel for asymptotic freeness also hold for
almost surely asymptotic freeness.
To illustrate the usefulness of the definition of asymptotic freeness,
we will start by computing various mixed moments of random matrices.
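If A1, ..., Aℓ are asymptotically free random matrices, a number of useful relationships can be obtained by particularizing the following identity:
\[
\phi\Bigl( \bigl(\mathbf{A}_1^{k_1} - \phi(\mathbf{A}_1^{k_1})\mathbf{I}\bigr) \cdot \bigl(\mathbf{A}_2^{k_2} - \phi(\mathbf{A}_2^{k_2})\mathbf{I}\bigr) \cdots \bigl(\mathbf{A}_\ell^{k_\ell} - \phi(\mathbf{A}_\ell^{k_\ell})\mathbf{I}\bigr) \Bigr) = 0 \tag{2.184}
\]
which is obtained from (2.180) by considering the polynomials
\[
p_i(\mathbf{A}_i) = \mathbf{A}_i^{k_i} - \phi(\mathbf{A}_i^{k_i})\,\mathbf{I}
\]
which obviously satisfy φ(pi(Ai)) = 0.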


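Applying (2.184), we can easily obtain the following relationships for asymptotically free A and B:
\[
\phi(\mathbf{A}^k \mathbf{B}^\ell) = \phi(\mathbf{A}^k)\,\phi(\mathbf{B}^\ell) \tag{2.185}
\]
\[
\phi(\mathbf{ABAB}) = \phi^2(\mathbf{B})\,\phi(\mathbf{A}^2) + \phi^2(\mathbf{A})\,\phi(\mathbf{B}^2) - \phi^2(\mathbf{A})\,\phi^2(\mathbf{B}). \tag{2.186}
\]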
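As a worked instance of how (2.184) is particularized, the factorization (2.185) follows by taking two factors, expanding by linearity of φ, and using φ(I) = 1. The derivation below is a sketch added for illustration; it uses only the identity (2.184) and the properties of φ stated after (2.177).

```latex
\begin{aligned}
0 &= \phi\Bigl( \bigl(\mathbf{A}^{k} - \phi(\mathbf{A}^{k})\mathbf{I}\bigr)\bigl(\mathbf{B}^{\ell} - \phi(\mathbf{B}^{\ell})\mathbf{I}\bigr) \Bigr) \\
  &= \phi(\mathbf{A}^{k}\mathbf{B}^{\ell})
     - \phi(\mathbf{B}^{\ell})\,\phi(\mathbf{A}^{k})
     - \phi(\mathbf{A}^{k})\,\phi(\mathbf{B}^{\ell})
     + \phi(\mathbf{A}^{k})\,\phi(\mathbf{B}^{\ell})\,\phi(\mathbf{I}) \\
  &= \phi(\mathbf{A}^{k}\mathbf{B}^{\ell}) - \phi(\mathbf{A}^{k})\,\phi(\mathbf{B}^{\ell}).
\end{aligned}
```

The same expansion with four alternating centered factors, combined with (2.185) for the lower-order terms, leads to the expression for φ(ABAB).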

As mentioned, one approach to characterize the asymptotic spec-


trum of a random matrix is to obtain its moments of all orders. Fre-
quent applications of the concept of asymptotic freeness stem from the
fact that the moments of a noncommutative polynomial p(A, B) of two

asymptotically free random matrices can be computed from the indi-


vidual moments of A and B. Thus, if p(A, B), A, B are Hermitian,
the asymptotic spectrum of p(A, B) depends only on those of A and
B even if they do not have the same eigenvectors. To illustrate this
point, when p(A, B) = A + B we can use (2.184) to obtain the first
few moments:

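\[
\phi(\mathbf{A}+\mathbf{B}) = \phi(\mathbf{A}) + \phi(\mathbf{B}) \tag{2.187}
\]
\[
\phi((\mathbf{A}+\mathbf{B})^2) = \phi(\mathbf{A}^2) + \phi(\mathbf{B}^2) + 2\phi(\mathbf{A})\phi(\mathbf{B}) \tag{2.188}
\]
\[
\phi((\mathbf{A}+\mathbf{B})^3) = \phi(\mathbf{A}^3) + \phi(\mathbf{B}^3) + 3\phi(\mathbf{A})\phi(\mathbf{B}^2) + 3\phi(\mathbf{B})\phi(\mathbf{A}^2) \tag{2.189}
\]
\[
\begin{aligned}
\phi((\mathbf{A}+\mathbf{B})^4) ={}& \phi(\mathbf{A}^4) + \phi(\mathbf{B}^4) + 4\phi(\mathbf{A})\phi(\mathbf{B}^3) + 4\phi(\mathbf{B})\phi(\mathbf{A}^3) \\
&+ 2\phi^2(\mathbf{B})\phi(\mathbf{A}^2) + 2\phi^2(\mathbf{A})\phi(\mathbf{B}^2) + 4\phi(\mathbf{B}^2)\phi(\mathbf{A}^2) - 2\phi^2(\mathbf{A})\phi^2(\mathbf{B}).
\end{aligned}
\tag{2.190}
\]
In (2.190), the four words of the form AABB contribute 4φ(A²)φ(B²) by (2.185), and the two words ABAB and BABA contribute 2φ(ABAB) as given by (2.186).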

All other higher moments can be computed analogously. As we will


see below, the R-transform defined in Section 2.2.5 circumvents the in-
creasingly cumbersome derivations required to derive other moments.33
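The moments of a free sum can also be generated mechanically through free cumulants, which are the coefficients of the R-transform. The sketch below is illustrative and brings in one ingredient not stated in this section: the standard moment and free-cumulant relation m_n = Σ_s κ_s Σ m_{i_1} ··· m_{i_s}, where the inner sum runs over nonnegative i_1 + ... + i_s = n − s. All function names are hypothetical. The fourth moment is cross-checked against a word count: four words of type AABB, each equal to φ(A²)φ(B²), plus two words of type ABAB given by (2.186).

```python
from math import prod


def weak_compositions(total, parts):
    """Yield tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, parts - 1):
            yield (first,) + rest


def _bracket(mom, n, s):
    # Sum over i_1 + ... + i_s = n - s of m_{i_1} ... m_{i_s}, with m_0 = 1.
    return sum(prod(mom[i] for i in idx) for idx in weak_compositions(n - s, s))


def moments_to_free_cumulants(moments):
    mom = [1] + list(moments)
    kappa = [0]  # index 0 unused
    for n in range(1, len(mom)):
        lower = sum(kappa[s] * _bracket(mom, n, s) for s in range(1, n))
        kappa.append(mom[n] - lower)  # the s = n bracket equals 1
    return kappa[1:]


def free_cumulants_to_moments(cumulants):
    kappa = [0] + list(cumulants)
    mom = [1]
    for n in range(1, len(kappa)):
        mom.append(sum(kappa[s] * _bracket(mom, n, s) for s in range(1, n + 1)))
    return mom[1:]


def free_sum_moments(mom_a, mom_b):
    """Moments of A + B for asymptotically free A, B (cumulants add)."""
    ka = moments_to_free_cumulants(mom_a)
    kb = moments_to_free_cumulants(mom_b)
    return free_cumulants_to_moments([x + y for x, y in zip(ka, kb)])


def fourth_moment_by_words(a, b):
    """phi((A+B)^4) from word counting, using (2.185) and (2.186)."""
    abab = b[0] ** 2 * a[1] + a[0] ** 2 * b[1] - a[0] ** 2 * b[0] ** 2  # (2.186)
    return (a[3] + b[3] + 4 * a[0] * b[2] + 4 * b[0] * a[2]
            + 4 * a[1] * b[1] + 2 * abab)


if __name__ == "__main__":
    a = [2, 5, 14, 41]  # moments of a law with mass 1/2 at 1 and at 3
    b = [1, 2, 4, 8]    # moments of a law with mass 1/2 at 0 and at 2
    print(free_sum_moments(a, b), fourth_moment_by_words(a, b))
```

In this example the centered parts of A and B are free symmetric two-point laws, whose free sum is the arcsine law on [−2, 2]; this provides an independent check of the fourth moment.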
Next, we compile a list of some of the most useful instances of
asymptotic freeness that have been shown so far. In order to ease the
exposition, we state them without including all the technical sufficient
conditions (usually on the higher order moments of the matrix entries)
under which they have been proved so far. For the exact technical
conditions, the reader can refer to the pertinent citations.

Example 2.34. Any random matrix and the identity are asymptoti-
cally free.

Example 2.35. [287] Independent Gaussian standard Wigner matri-


ces are asymptotically free.

Example 2.36. [287] Let X and Y be independent standard Gaussian


matrices. Then {X, X† } and {Y, Y † } are asymptotically free.
33 Notice that the first three moments of A + B can be obtained from formulas identical to those pertaining to classical independent random variables. A difference appears from the fourth moment (2.190) on.

Historically, Examples 2.35 and 2.36 are the first results on the
freeness of random matrices.

Example 2.37. [63] Independent standard Wigner matrices are


asymptotically free.

Example 2.38. [63] A standard Wigner matrix and a diagonal deter-


ministic matrix (or a block diagonal deterministic matrix with bounded
block size) are asymptotically free.

Example 2.39. [211] Let X and Y be independent square matrices


whose entries are zero-mean independent random variables (arbitrarily
distributed), with variance vanishing inversely proportionally to the
size. Then, ({X, X† }, {Y, Y † }) are asymptotically free. Furthermore,
these matrices and block diagonal deterministic matrices with bounded
block size are also asymptotically free.

Example 2.40. Suppose that the N-vectors hi, i ∈ {1, ..., ℓ}, are independent and have independent entries with variances equal to 1/N and identical means. Furthermore, let X1, ..., Xℓ be independent random variables with finite moments of all orders and also independent of the random vectors. Then,
\[
X_1 \mathbf{h}_1 \mathbf{h}_1^\dagger,\; X_2 \mathbf{h}_2 \mathbf{h}_2^\dagger,\; \ldots,\; X_\ell \mathbf{h}_\ell \mathbf{h}_\ell^\dagger
\]
are asymptotically free.

Example 2.41. [287] If U and V are independent Haar matrices, then


({U, U† }, {V, V† }) are asymptotically free.

Example 2.42. [287] If U is a Haar matrix and D is a deterministic


matrix with bounded eigenvalues, then ({U, U† }, {D, D† }) are asymp-
totically free.

Example 2.43. [294] Let X be a standard Gaussian matrix and


let D be a deterministic matrix with bounded eigenvalues. Then
({X, X† }, {D, D† }) are asymptotically free.

Example 2.44. [240] UAU† and B are asymptotically free if A and


B are Hermitian matrices whose asymptotic averaged empirical eigen-
value distributions are compactly supported and U is a Haar matrix
independent of A and B.

Example 2.45. [240] A unitarily invariant matrix with compactly


supported asymptotic spectrum and a deterministic matrix with
bounded eigenvalues are asymptotically free.

Example 2.46. [295] Independent unitarily invariant matrices with


compactly supported asymptotic spectra are asymptotically free.

Example 2.47. [295] Let A and B be N × K independent bi-


unitarily invariant random matrices whose asymptotic averaged em-
pirical singular value distributions are compactly supported. Then,
({A, A† }, {B, B† }, {D, D† }) are asymptotically free for any determin-
istic N × K matrix D with bounded eigenvalues.

Example 2.48. Let H1 and H2 be independent standard Gaussian


matrices and let T be a random Hermitian matrix independent of
H1 and H2 with compactly supported asymptotic averaged empirical
eigenvalue distribution. Then it follows from Lemma 2.7 and Examples
2.45–2.46 that (f1 (H1 TH†1 ), f2 (H2 TH†2 ), {D, D† }) are asymptotically
free for any real continuous functions f1 (·) and f2 (·), defined on the real
line, and any deterministic square matrix D with bounded asymptotic
spectrum.

Examples 2.41–2.48 are not only instances of asymptotic freeness,


but also of almost surely asymptotic freeness [111]. In particular, for
Example 2.48 the almost sure convergence holds if the asymptotic empirical eigenvalue distribution of T converges almost surely to a com-
pactly supported probability measure. Note also that Examples 2.35
and 2.36 are special cases of Example 2.46 and 2.47, respectively.

Theorem 2.61. [64] Let (A, {P1, V1, ..., Pℓ, Vℓ}) be asymptotically free. If
\[
\mathbf{P}_i \mathbf{V}_i = \mathbf{V}_i \mathbf{P}_i = \mathbf{I} \quad \text{and} \quad \phi(\mathbf{P}_i \mathbf{V}_j) = 0
\]
for all i ∈ {1, ..., ℓ} and i ≠ j, then P1AV1, ..., PℓAVℓ are asymptotically free.

Example 2.49. [73] Let Pℓ be the permutation matrix corresponding to a cyclic shift by ℓ − 1 entries, and S be a complex standard Gaussian matrix. Notice that Pℓ Pℓ† = I and that, for ℓ ≠ 1 (mod N), tr{Pℓ} = 0. Consequently, for N → ∞

tr{Pi P†j } − tr{Pi−j } = δi,j . (2.191)

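Since SS† and {P1, P1†, ..., PL, PL†} are asymptotically free (e.g. Example 2.45), it follows from Theorem 2.61 that
\[
\mathbf{P}_1 \mathbf{S}\mathbf{S}^\dagger \mathbf{P}_1^\dagger,\; \ldots,\; \mathbf{P}_L \mathbf{S}\mathbf{S}^\dagger \mathbf{P}_L^\dagger
\]
are asymptotically free. Let S1, ..., SL be independent complex standard Gaussian matrices. The foregoing asymptotic freeness together with the fact that the asymptotic distribution of the asymptotically free matrices Pℓ Sℓ Sℓ† Pℓ† does not depend on ℓ, implies that the asymptotic averaged empirical distributions of
\[
\sum_{\ell=1}^{L} \mathbf{P}_\ell \mathbf{S}\mathbf{S}^\dagger \mathbf{P}_\ell^\dagger
\quad \text{and} \quad
\sum_{\ell=1}^{L} \mathbf{P}_\ell \mathbf{S}_\ell \mathbf{S}_\ell^\dagger \mathbf{P}_\ell^\dagger
\]
are the same.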

Theorem 2.62. [290, 190] Let (P, {W1, ..., Wℓ}) be asymptotically free Hermitian random matrices. PW1P, ..., PWℓP are asymptotically free if P is idempotent.

We note that, under the condition that P and V are unitary Haar
matrices, Theorems 2.61 and 2.62 hold not only in terms of asymptotic
freeness but also in terms of almost surely asymptotic freeness.

Theorem 2.63. [290, 190] Let W be a random matrix whose averaged spectrum converges to the circular law (2.99). Let P1, ..., Pℓ be a family of Hermitian random matrices asymptotically free of W such that Pi Pj = Pj Pi = δi,j Pi, then WP1W†, ..., WPℓW† are asymptotically free. This result also holds if the spectrum of W converges to the quarter circle law (1.21) or to the semicircle law (2.94), in which case the spectrum of WPjW† converges to the Marčenko-Pastur law.

2.4.2 Sums of Asymptotically Free Random Matrices

Much of the practical usefulness of free probability stems from the


following result.

Theorem 2.64. [285] If A and B are asymptotically free random ma-


trices, then the R-transform of their sum satisfies

RA+B (z) = RA (z) + RB (z). (2.192)

As a simple application of this important result, and in view of Example


2.24, we can verify the translation property

RA+γI (z) = RA (z) + RγI (z) = RA (z) + γ. (2.193)

Using Theorem 2.64 and the relationship between the R-transform


and the η-transform (2.75)–(2.76) we can obtain:

Theorem 2.65. The η-transform of the sum of asymptotically free


random matrices is

ηA+B (γ) = ηA (γa ) + ηB (γb ) − 1 (2.194)

with γa , γb and γ satisfying the following pair of equations:

γa ηA (γa ) = γ ηA+B (γ) = γb ηB (γb ). (2.195)

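As a simple application of Theorem 2.64, let us sketch a heuristic argument for the key characterization (2.116) of the η-transform of the asymptotic spectrum of HTH†. Let us assume that H is an N × K matrix whose entries are independent random variables with common variance 1/N, while T is a deterministic positive real diagonal matrix. According to Example 2.40, we can write HTH† as the sum of asymptotically free matrices
\[
\mathbf{H}\mathbf{T}\mathbf{H}^\dagger = \sum_{k=1}^{K} T_k\, \mathbf{h}_k \mathbf{h}_k^\dagger. \tag{2.196}
\]
Thus, with ζ ≥ 0
\[
R_{\mathbf{HTH}^\dagger}(-\zeta) = \lim_{K\to\infty} \sum_{k=1}^{K} R_{T_k \mathbf{h}_k \mathbf{h}_k^\dagger}(-\zeta) \tag{2.197}
\]
\[
\phantom{R_{\mathbf{HTH}^\dagger}(-\zeta)} = \lim_{K\to\infty} \frac{\beta}{K} \sum_{k=1}^{K} \frac{T_k}{1 + T_k \zeta} \tag{2.198}
\]
\[
\phantom{R_{\mathbf{HTH}^\dagger}(-\zeta)} = \beta\, \frac{1 - \eta_{\mathbf{T}}(\zeta)}{\zeta} \tag{2.199}
\]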
where (2.198) follows from (2.82) whereas (2.199) follows from the
law of large numbers. Finally, using the relationship between the η-
transform and the R-transform in (2.74) we obtain (2.116) letting
ζ = γηHTH† (γ), i.e.

ηHTH† (γ) = 1 − β (1 − ηT (γ ηHTH† (γ))) . (2.200)
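Since (2.200) is a scalar fixed-point equation in η, it can be solved by a simple root search. The sketch below is an illustration under assumptions: the asymptotic spectrum of T is taken to be a discrete law given as (value, probability) pairs with nonnegative values, and the function name `eta_hth` is hypothetical. Bisection is used because the left side minus the right side of (2.200) is increasing in η on [0, 1].

```python
def eta_hth(gamma, beta, t_atoms, steps=200):
    """Solve eta = 1 - beta * (1 - eta_T(gamma * eta)), i.e. (2.200)."""

    def eta_t(x):
        # eta-transform of the (assumed discrete) law of T.
        return sum(p / (1.0 + x * t) for t, p in t_atoms)

    def residual(eta):
        return eta - 1.0 + beta * (1.0 - eta_t(gamma * eta))

    lo, hi = 0.0, 1.0  # residual(0) = -1 < 0 and residual(1) >= 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # T = I: the solution is the eta-transform of the Marcenko-Pastur law.
    print(eta_hth(10.0, 0.5, [(1.0, 1.0)]))
```

For T = I, (2.200) reduces to a quadratic in γη, which gives a closed-form value to compare against.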

Note that (2.197) has not been rigorously justified above, since it
involves both the limit in the size of the matrices which is the basis for
the claim of asymptotic freeness and a limit in the number of matrices.
The more general result (2.133)–(2.134) can be readily obtained
from (2.194), (2.195) and (2.200).
For T = I, we recover the η-transform in (2.121) of the Marčenko-Pastur law. It is interesting to note that, in this special case, we are summing unit-rank matrices whose spectra consist of a 1 − 1/N mass at 0 and a 1/N mass at a location that converges to 1. If we were to take the Nth classical convolution (inverting the sum of log-moment generating functions) of those distributions we would obtain asymptotically the Poisson distribution; however, the distribution we obtain by taking the Nth free convolution (inverting the sum of R-transforms) is the Marčenko-Pastur law. Thus, we can justifiably claim that the Marčenko-Pastur law is the free analog of the classical Poisson law.

The free analog of the Gaussian law is the semicircle law according
to the celebrated free probability central limit theorem:

Theorem 2.66. [284] Let A1, A2, ... be a sequence of N × N asymptotically free random matrices. Assume that φ(Ai) = 0 and φ(Ai²) = 1. Further assume that sup_i |φ(Ai^k)| < ∞ for all k. Then, as m, N → ∞, the asymptotic spectrum of
\[
\frac{1}{\sqrt{m}} (\mathbf{A}_1 + \mathbf{A}_2 + \cdots + \mathbf{A}_m) \tag{2.201}
\]
converges in distribution to the semicircle law, that is, for every k,
\[
\phi\!\left( \frac{(\mathbf{A}_1 + \mathbf{A}_2 + \cdots + \mathbf{A}_m)^k}{m^{k/2}} \right) \to
\begin{cases}
0 & k \text{ odd} \\[1ex]
\dfrac{1}{1+\frac{k}{2}} \dbinom{k}{\frac{k}{2}} & k \text{ even.}
\end{cases}
\]
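A simple sketch of the main idea behind the proof of this result
can be given in the case of identically distributed asymptotically free
matrices. In this case, Theorem 2.64 implies that the R-transform of
(2.201) equals

$$\begin{aligned}
\sqrt{m}\, R_{\mathbf{A}_1}\!\left( \frac{z}{\sqrt{m}} \right) &= \sqrt{m}\, \phi(\mathbf{A}_1) + z\, \phi(\mathbf{A}_1^2) + \sqrt{m} \sum_{k=3}^{\infty} c_k \left( \frac{z}{\sqrt{m}} \right)^{k-1} && (2.202)\\
&\to z && (2.203)
\end{aligned}$$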
which is the R-transform of the semicircle law (Example 2.25). Note
that (2.202) follows from (2.84) while (2.203) follows from the fact that
the free cumulants are bounded because of the assumption in Theorem
2.66. A similar approach can be followed to prove that the spectra of
Gaussian Wigner matrices converge to the semicircle law. The key idea
is that a Gaussian standard Wigner matrix can be written as the sum
of two independent rescaled Gaussian standard Wigner matrices
$$\mathbf{W} = \frac{1}{\sqrt{2}} (\mathbf{X}_1 + \mathbf{X}_2). \qquad (2.204)$$
Since the two matrices in the right side of (2.204) are asymptotically
free, the R-transforms satisfy
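$$\begin{aligned}
R_{\mathbf{W}}(z) &= R_{\frac{\mathbf{X}_1}{\sqrt{2}}}(z) + R_{\frac{\mathbf{X}_2}{\sqrt{2}}}(z) \\
&= \sqrt{2}\, R_{\mathbf{W}}\!\left( \frac{z}{\sqrt{2}} \right) && (2.205)
\end{aligned}$$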
which admits the solution (cf. Example 2.25)
RW (z) = z. (2.206)
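Theorem 2.66 can be illustrated numerically. The sketch below is only an illustration under assumptions made here: the matrices are A_i = U_i D U_i† with D a fixed diagonal of ±1 entries (so that φ(A_i) = 0 and φ(A_i²) = 1) and U_i independent Haar unitaries, a standard construction of asymptotically free matrices; the Haar matrices are drawn by QR factorization with a phase correction. The low-order moments of the normalized sum are then compared with those of the semicircle law (0, 1, 0, 2).

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed unitary via QR of a complex Gaussian matrix with phase fix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale each column by a unit-modulus phase

def free_clt_moments(n=200, m=16, seed=0):
    """First four eigenvalue moments of (A_1 + ... + A_m)/sqrt(m)."""
    rng = np.random.default_rng(seed)
    d = np.concatenate([np.ones(n // 2), -np.ones(n - n // 2)])
    s = np.zeros((n, n), dtype=complex)
    for _ in range(m):
        u = haar_unitary(n, rng)
        s += (u * d) @ u.conj().T  # U diag(d) U^dagger
    s /= np.sqrt(m)
    lam = np.linalg.eigvalsh(s)
    return [float(np.mean(lam ** k)) for k in range(1, 5)]

if __name__ == "__main__":
    print(free_clt_moments())
```

For finite m the fourth moment sits slightly below 2, since the fourth free cumulant of each summand is negative and only vanishes after normalization as m grows.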

Example 2.50. Let H be an N × m random matrix whose entries are
zero-mean i.i.d. Gaussian random variables with variance $\frac{1}{\sqrt{mN}}$ and

denote N1 m = ς. Using Example 2.46, Theorem 2.66, and the fact
that we can represent

$$\mathbf{H}\mathbf{H}^\dagger = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} \mathbf{s}_i \mathbf{s}_i^\dagger$$

with si an N-dimensional vector whose entries are zero-mean i.i.d. with
variance $\frac{1}{\sqrt{N}}$, it can be shown that as N, m → ∞ with N/m → 0, the
asymptotic spectrum of the matrix

HH† − ς N I

is the semicircle law. This result was also found using the moment
approach, based on combinatorial tools, in [16] (without invoking
Gaussianity) and in [57] using results on the asymptotic distribution of the
zeros of Laguerre polynomials $L_N^{m}(\sqrt{Nm}\, x + m + N)$.

2.4.3 Products of Asymptotically Free Matrices


The S-transform plays an analogous role to the R-transform for prod-
ucts (instead of sums) of asymptotically free matrices, as the following
theorem shows:34

Theorem 2.67. Let A and B be nonnegative asymptotically free ran-


dom matrices. The S-transform of their product satisfies

ΣAB (x) = ΣA (x)ΣB (x). (2.207)

Because of (2.69), it follows straightforwardly that the S-transform is


the free analog of the Mellin transform in classical probability theory,
whereas recall that the R-transform is the free analog of the log-moment
generating function in classical probability theory.

Theorem 2.67 together with (2.86) yields

34 Given the definition of the S-transform, we shall consider only nonnegative random ma-
trices whose trace does not vanish asymptotically.

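Theorem 2.68. Let A and B be nonnegative asymptotically free random
matrices, then for 0 < γ < 1,

$$\eta_{\mathbf{A}\mathbf{B}}^{-1}(\gamma) = \frac{\gamma}{1-\gamma}\, \eta_{\mathbf{A}}^{-1}(\gamma)\, \eta_{\mathbf{B}}^{-1}(\gamma). \qquad (2.208)$$

In addition, the following implicit relation is also useful:

$$\eta_{\mathbf{A}\mathbf{B}}(\gamma) = \eta_{\mathbf{A}}\!\left( \frac{\gamma}{\Sigma_{\mathbf{B}}(\eta_{\mathbf{A}\mathbf{B}}(\gamma) - 1)} \right). \qquad (2.209)$$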
As an application of (2.209), we can obtain the key relation (2.116)
from the S-transform of the Marc̆enko-Pastur law in (2.87)
$$\Sigma_{\mathbf{H}^\dagger \mathbf{H}}(x) = \frac{1}{1 + \beta x}$$

provided that T and H† H are asymptotically free. According to (2.209)

ηTH† H (γ) = ηT (γ (1 − β + βηTH† H (γ)))


= ηT (γ ηHTH† (γ)) (2.210)

where (2.210) follows from (2.56). Applying (2.56) again,

ηHTH† (γ) = 1 − β + β ηT (γ ηHTH† (γ)) . (2.211)

From (2.209), Examples 2.13, 2.32 and 2.45 we obtain the following
result.

Example 2.51. Let Q be a N × K matrix uniformly distributed over


the manifold of N × K complex matrices such that Q† Q = I and let
A be an N × N nonnegative Hermitian random matrix independent of
Q whose empirical eigenvalue distribution converges almost surely to
a compactly supported measure. Then

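$$\eta_{\mathbf{Q}\mathbf{Q}^\dagger \mathbf{A}}(\gamma) = \eta_{\mathbf{A}}\!\left( \gamma + \gamma\, \frac{\beta - 1}{\eta_{\mathbf{Q}\mathbf{Q}^\dagger \mathbf{A}}(\gamma)} \right) \qquad (2.212)$$

with K/N → β.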

Example 2.52. Define two N × N independent random matrices, H1 and H2,
each having zero-mean i.i.d. entries with variance 1/N and higher order
moments of order o(1/N). From Example 2.39, ({H1, H1†}, {H2, H2†}) are
asymptotically free and, consequently, we can compute the S-transform of
A2 = H1 H2 H2† H1† by simply applying Example 2.29 and Theorems 2.67 and
2.32:

$$\Sigma_{\mathbf{A}_2}(x) = \frac{1}{(x+1)^2} \qquad (2.213)$$

from which it follows that the η-transform of A2 , ηA2 (γ), is the solution
of the fixed-point equation
η(1 + γη 2 ) = 1. (2.214)
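The fixed-point equation (2.214) has a single root in (0, 1] for every γ > 0, since its left-hand side is increasing in η. A minimal sketch of its numerical solution follows, together with a small simulation for comparison. The simulation assumes complex Gaussian entries and a moderate matrix size, and the function names are illustrative only.

```python
import numpy as np

def eta_A2(gamma, tol=1e-14):
    """Unique root in (0, 1] of eta*(1 + gamma*eta**2) = 1, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + gamma * mid * mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eta_A2_empirical(gamma, n=200, seed=0):
    """(1/n) tr (I + gamma*A2)^{-1} for A2 = H1 H2 H2^dagger H1^dagger."""
    rng = np.random.default_rng(seed)

    def iid():
        # zero-mean complex entries with variance 1/n
        return (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

    h1, h2 = iid(), iid()
    a2 = h1 @ h2 @ h2.conj().T @ h1.conj().T
    lam = np.linalg.eigvalsh(a2)
    return float(np.mean(1.0 / (1.0 + gamma * lam)))

if __name__ == "__main__":
    for g in (0.5, 1.0, 5.0):
        print(g, eta_A2(g), eta_A2_empirical(g))
```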

Example 2.52 can be extended as follows.


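Example 2.53. [184] Let H and T be as in Theorem 2.39. Then,

$$\begin{aligned}
\Sigma_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(x) &= \frac{x+1}{x+\beta}\, \Sigma_{\mathbf{H}^\dagger \mathbf{H}}\!\left( \frac{x}{\beta} \right) \Sigma_{\mathbf{T}}\!\left( \frac{x}{\beta} \right) && (2.215)\\
&= \frac{1}{x+\beta}\, \Sigma_{\mathbf{T}}\!\left( \frac{x}{\beta} \right) && (2.216)
\end{aligned}$$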
where (2.216) follows from Example 2.29.

Example 2.53 follows from the fact that, if H in Theorem 2.39 is a


standard complex Gaussian matrix, then ({H, H† }, T) are asymptoti-
cally free (cf. Example 2.43) and thus it follows from Theorem 2.67 that
the S-transform of HTH† is given by (2.216). On the other hand, since
the validity of Theorem 2.39 depends on the distribution of H only
through the first and second order moments, every matrix HTH† de-
fined as in Theorem 2.39 with H arbitrarily distributed admits the same
asymptotic spectrum and the same R- and S-transforms and hence Ex-
ample 2.53 follows straightforwardly. Analogous considerations hold for
Theorems 2.38, 2.42 and 2.43. More precisely, the hypotheses in those
theorems are sufficient to guarantee the additivity of the R-transforms
and factorability of the S-transforms therein. Note, however, that the
factorability of the S-transforms in (2.216) and the additivity of the R-
transforms in Theorem 2.38 do not imply, in general, that ({H, H† }, T)
are asymptotically free.

2.4.4 Freeness and Non-Crossing Partitions


The combinatorial description of the freeness developed by Speicher in
[241, 243] and in some of his joint works with A. Nica [189] has suc-
ceeded in obtaining a number of new results in free probability theory.
It is well known that there exists a combinatorial description of the
classical cumulants that is related to the partition theory of sets. In
the same way, a noncommutative analogue to the classical cumulants,
the so-called free cumulants, can be also described combinatorially. The
key difference with the classical case is that one has to replace the par-
titions by so-called non-crossing partitions [241, 243].

Definition 2.22. Consider the set {1, . . . , n} and let ϖ be a partition
of this set,

ϖ = {V1 , . . . , Vk },

where each Vi is called a block of ϖ. A partition ϖ is called non-crossing
if the following does not occur: there exist 1 ≤ p1 ≤ q1 ≤ p2 ≤ q2 such
that p1 and p2 belong to the same block, q1 and q2 belong to the same
block, but q1 and p2 do not belong to the same block.
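Definition 2.22 translates directly into a test on pairs of blocks. The following sketch is an illustration (the function name is hypothetical): it searches for indices p1 < q1 < p2 < q2 with p1, p2 in one block and q1, q2 in a different block, and declares the partition non-crossing when no such indices exist.

```python
from itertools import combinations

def is_noncrossing(partition):
    """Return True if the partition (an iterable of blocks) has no crossing."""
    blocks = [sorted(b) for b in partition]
    for a, b in combinations(blocks, 2):
        # try both assignments of (p-block, q-block)
        for first, second in ((a, b), (b, a)):
            for p1, p2 in combinations(first, 2):  # p1 < p2 in the same block
                between = any(p1 < q1 < p2 for q1 in second)
                beyond = any(q2 > p2 for q2 in second)
                if between and beyond:
                    return False
    return True

if __name__ == "__main__":
    print(is_noncrossing([{1, 3}, {2}, {4}]))
    print(is_noncrossing([{1, 3}, {2, 4}]))
```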

Example 2.54. Consider the set {1, 2, 3, 4} and the non-crossing
partition ϖ = {{1, 3}, {2}, {4}}. Definition 2.22 is interpreted graphically
in Figure 2.7(a) by connecting elements in the same block with a line.
The fact that these lines do not cross evidences the non-crossing nature
of the partition. In contrast, the crossing partition ϖ = {{1, 3}, {2, 4}}
of the same set is also shown in Figure 2.7(b).

Example 2.55. Consider the set {1, 2, . . . , 7}. Let V1 , V2 and V3 be a


partition of {1, 2, . . . , 7} with V1 = {1, 5, 7}, V2 = {2, 3, 4}, and V3 =
{6}. Then {V1 , V2 , V3 } is a non-crossing partition.

Every non-crossing partition ϖ can be associated to a complementation
map [154], denoted by K(ϖ). Figure 2.8 depicts the non-crossing
partition ϖ = {{1, 5, 7}, {2, 3, 4}, {6}} and the corresponding
complementation map K(ϖ) = {{1, 4}, {2}, {3}, {5, 6}, {7}}. The
complementation map K(ϖ) can be found graphically as follows: duplicate the
elements of the set placing them between the elements of the old set;
then connect with a line as many elements of the new set as possible
without crossing the lines of the original partition.

Fig. 2.7 Figures (a) and (b) depict a non-crossing and a crossing partition respectively.

Fig. 2.8 The non-crossing partition ϖ = {{1, 5, 7}, {2, 3, 4}, {6}} and the complementation
map K(ϖ) = {{1, 4}, {2}, {3}, {5, 6}, {7}} obtained with the repeated integers.
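The graphical construction also has an algebraic counterpart. The sketch below is an assumption-laden illustration rather than the construction of [154]: it uses the standard permutation description in which each block is read as a cycle in increasing order, giving a permutation π, and the blocks of the complementation map are taken to be the cycles of π⁻¹ composed with the cyclic shift i → i + 1 (mod n). The function name is hypothetical. On the partition of Figure 2.8 it returns the complementation map quoted in the text.

```python
def complementation_map(partition, n):
    """Blocks of the complementation map of a non-crossing partition of {1, ..., n}."""
    # pi sends each element to the next element of its block (cyclically)
    pi = {}
    for block in partition:
        b = sorted(block)
        for i, x in enumerate(b):
            pi[x] = b[(i + 1) % len(b)]
    pi_inv = {v: k for k, v in pi.items()}
    # sigma = pi^{-1} applied after the shift i -> i + 1 (n wraps around to 1)
    sigma = {i: pi_inv[i % n + 1] for i in range(1, n + 1)}
    # the cycles of sigma are the blocks of the complementation map
    seen, blocks = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = sigma[x]
        blocks.append(sorted(cycle))
    return sorted(blocks)

if __name__ == "__main__":
    print(complementation_map([{1, 5, 7}, {2, 3, 4}, {6}], 7))
```

A useful sanity check is that the numbers of blocks of a non-crossing partition and of its complementation map add up to n + 1.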
The number of non-crossing partitions of the set {1, 2, . . . , n} into
i blocks equals35

$$Q_i = \frac{1}{n} \binom{n}{i} \binom{n}{i-1}.$$

Moreover, the number of non-crossing partitions of {1, 2, . . . , n} equals
the nth Catalan number. This follows straightforwardly from the fact
that

$$\sum_{i=1}^{n} Q_i = \frac{1}{n+1} \binom{2n}{n}.$$

35 Note that $\sum_{i=1}^{n} Q_i \beta^i$ equals the n-th moment of f̃β (·) given in (1.12).
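The two counting formulas are easily checked for small n. A minimal sketch using exact integer arithmetic follows; the function names are illustrative, and the polynomial in β is the sum that footnote 35 identifies with the n-th moment of f̃β.

```python
from math import comb

def q_blocks(n, i):
    """Number of non-crossing partitions of {1, ..., n} into i blocks."""
    return comb(n, i) * comb(n, i - 1) // n

def catalan(n):
    """n-th Catalan number, (1/(n+1)) * C(2n, n)."""
    return comb(2 * n, n) // (n + 1)

def moment_polynomial(n, beta):
    """sum_i Q_i * beta**i, the quantity appearing in footnote 35."""
    return sum(q_blocks(n, i) * beta ** i for i in range(1, n + 1))

if __name__ == "__main__":
    for n in range(1, 7):
        counts = [q_blocks(n, i) for i in range(1, n + 1)]
        print(n, counts, sum(counts), catalan(n))
```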

The following result gives a general expression of the joint moments


of asymptotically free random matrices.

Theorem 2.69. [20, 21] Consider matrices A1 , . . . , Aℓ whose size is
such that the product A1 . . . Aℓ is defined. Some of these matrices are
allowed to be identical. Omitting repetitions, assume that the matrices
are asymptotically free.36 Let ϱ be the partition of {1, . . . , ℓ}
determined by the equivalence relation37 j ≡ k if ij = ik . For each partition
ϖ of {1, . . . , ℓ}, let

$$\phi_{\varpi} = \prod_{\substack{\{j_1, \ldots, j_r\} \in \varpi \\ j_1 < \ldots < j_r}} \phi(\mathbf{A}_{j_1} \cdots \mathbf{A}_{j_r}).$$

There exist universal coefficients c(ϖ, ϱ) such that

$$\phi(\mathbf{A}_1 \cdots \mathbf{A}_\ell) = \sum_{\varpi \le \varrho} c(\varpi, \varrho)\, \phi_{\varpi}$$

where ϖ ≤ ϱ indicates that ϖ is finer 38 than ϱ.

Finding an explicit formula for the coefficients c(ϖ, ϱ) is a nontrivial
combinatorial problem which has been solved by Speicher [241, 243].
From Theorem 2.69 it follows that φ(A1 . . . Aℓ ) is completely
determined by the moments of the individual matrices.
It is useful to highlight a special case of Theorem 2.69.

Theorem 2.70. [111] Assume that A and B are asymptotically free
random matrices. Then, the moments of A + B are expressed by the
free cumulants of A and B as

$$\phi((\mathbf{A} + \mathbf{B})^n) = \sum_{\varpi} \prod_{V \in \varpi} \left( c_{|V|}(\mathbf{A}) + c_{|V|}(\mathbf{B}) \right) \qquad (2.217)$$

where the summation is over all non-crossing partitions of {1, . . . , n},
cℓ (A) denotes the ℓth free cumulant of A (cf. Section 2.2.5) and |V |
denotes the cardinality of V .

36 For example, (A1 , . . . , A4 ) = (B, C, C, B) with B and C asymptotically free.
37 If an equivalence relation is given on the set Ω, then the set of all equivalence classes
forms a partition of Ω. Conversely, if a partition ϖ1 is given on Ω, we can define an
equivalence relation on Ω by writing x ≡ y if and only if there exists a member of ϖ1
which contains both x and y. The notions of “equivalence relation” and “partition” are
thus essentially equivalent.
38 Given two partitions ϖ1 and ϖ2 of a given set Ω, we say that ϖ1 is finer than ϖ2 if it
splits the set Ω into smaller blocks, i.e., if every element of ϖ1 is a subset of an element
of ϖ2 . In that case, one writes ϖ1 ≤ ϖ2 .

Theorem 2.70 is based on the fact that, if A and B are asymptotically
free random matrices, the free cumulants of the sum satisfy
cℓ (A + B) = cℓ (A) + cℓ (B).
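For small n, the sum in (2.217) can be evaluated by brute force. The sketch below is an illustration only: it enumerates all set partitions, keeps the non-crossing ones, and takes the free cumulants as user-supplied functions (hypothetical arguments cum_a and cum_b). As test cases it uses two standard facts, stated here as assumptions: the semicircle law has second free cumulant one and all others zero, and the free Poisson (Marc̆enko-Pastur, β = 1) law has all free cumulants equal to one.

```python
from itertools import combinations
from math import prod

def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def is_noncrossing(partition):
    """True if no p1 < q1 < p2 < q2 has p1, p2 in one block and q1, q2 in another."""
    blocks = [sorted(b) for b in partition]
    for a, b in combinations(blocks, 2):
        for first, second in ((a, b), (b, a)):
            for p1, p2 in combinations(first, 2):
                if any(p1 < q < p2 for q in second) and any(q > p2 for q in second):
                    return False
    return True

def moment_of_sum(n, cum_a, cum_b):
    """phi((A+B)^n) from (2.217); cum_x(k) returns the k-th free cumulant."""
    total = 0
    for part in set_partitions(list(range(1, n + 1))):
        if is_noncrossing(part):
            total += prod(cum_a(len(v)) + cum_b(len(v)) for v in part)
    return total

if __name__ == "__main__":
    semicircle = lambda k: 1 if k == 2 else 0
    print([moment_of_sum(n, semicircle, semicircle) for n in range(1, 7)])
```

The enumeration grows like the Bell numbers, so this is only practical for small n; it is meant as a check of the formula rather than as an efficient algorithm.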
The counterpart of Theorem 2.70 for the product of two asymptoti-
cally free random matrices A and B is given by the following theorem.

Theorem 2.71. [111] Assume that A and B are asymptotically free
random matrices. Then the moments of AB are expressed by the free
cumulants of A and B as follows:

$$\phi((\mathbf{A}\mathbf{B})^n) = \sum_{\varpi_1, \varpi_2} \prod_{V_1 \in \varpi_1} c_{|V_1|}(\mathbf{A}) \prod_{V_2 \in \varpi_2} c_{|V_2|}(\mathbf{B}) \qquad (2.218)$$

where the summation is over all non-crossing partitions of {1, . . . , n}.

2.5 Convergence Rates and Asymptotic Normality


Most of the literature on large random matrices has focused on the
existence of the limiting spectral distributions employing the moment
convergence theorem, i.e., verifying the convergence of the kth moments
of the N × N random matrix to the moments of the target distribution
either almost surely or in probability. While this method guarantees
convergence, it gives no information on the speed of convergence. Loose
bounds on the convergence rate to the semicircle law were put forth in
1998 by Girko [88]. A sharper result, but probably not the final word
on the matter, was obtained recently:

Theorem 2.72. [95] Let W be an N × N Gaussian standard Wigner
matrix. The maximal absolute difference between the expected empirical
eigenvalue distribution of W and the semicircle law, Fw , whose
density is given in (2.94), vanishes as

$$\left\| E[F^N_{\mathbf{W}}] - F_w \right\| \le \kappa N^{-2/3} \qquad (2.219)$$

with κ a positive constant and with ‖f − g‖ = supx |f (x) − g(x)|.


For an arbitrary deterministic sequence aN , the notation

ξN = Op (aN ) (2.220)

means39 that, for any ε, there exists an ς > 0 such that

$$\sup_{N} P[|\xi_N| \ge \varsigma a_N] < \varepsilon. \qquad (2.221)$$

Similarly, the notation

ξN = o(aN ) a. s. (2.222)

means that $a_N^{-1} \xi_N \to 0$ almost surely.

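Theorem 2.73. [11] Let W be an N × N standard Wigner matrix such
that $\sup_{i,j,N} E[|\sqrt{N}\, W_{i,j}|^8] < \infty$ and that, for any positive constant δ,

$$\sum_{i,j} E\left[ \left| \sqrt{N}\, W_{i,j} \right|^8 1\{|W_{i,j}| \ge \delta\} \right] = o(N^2). \qquad (2.223)$$

Then,

$$\left\| F^N_{\mathbf{W}} - F_w \right\| = O_p(N^{-2/5}). \qquad (2.224)$$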

If we further assume that all entries of √N W have finite moments of
all orders, then for any η > 0, the empirical distribution of the Wigner
matrix tends to the semicircle law as

$$\left\| F^N_{\mathbf{W}} - F_w \right\| = o(N^{-2/5+\eta}) \quad \text{a. s.} \qquad (2.225)$$

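If we relax the assumption on the entries of √N W to simply finite
fourth-order moments, then the convergence rates for $F^N_{\mathbf{W}}$ and $E[F^N_{\mathbf{W}}]$
have been proved in [8] to reduce to

$$\begin{aligned}
\left\| E[F^N_{\mathbf{W}}] - F_w \right\| &= O(N^{-1/4}) && (2.226)\\
\left\| F^N_{\mathbf{W}} - F_w \right\| &= O_p(N^{-1/4}). && (2.227)
\end{aligned}$$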
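The sup-norm distance appearing in these statements can be measured on a single realization. The sketch below is illustrative only: it draws one complex Gaussian Wigner matrix with entry variance 1/N (a choice made here), uses the closed-form distribution function of the semicircle law on [−2, 2], and evaluates the maximal gap between the empirical and limiting distribution functions at the eigenvalues. A single draw at a few sizes shows the trend but does not, of course, establish any of the rates above.

```python
import numpy as np

def semicircle_cdf(x):
    """Distribution function of the semicircle law with density sqrt(4 - x^2)/(2*pi)."""
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def wigner(n, rng):
    """Hermitian matrix whose entries have zero mean and variance 1/n."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (g + g.conj().T) / np.sqrt(2 * n)

def sup_distance(n, seed=0):
    """sup_x |F_N(x) - F_w(x)| for one realization, evaluated at the jump points."""
    rng = np.random.default_rng(seed)
    lam = np.sort(np.linalg.eigvalsh(wigner(n, rng)))
    f = semicircle_cdf(lam)
    k = np.arange(1, n + 1)
    return float(max(np.max(k / n - f), np.max(f - (k - 1) / n)))

if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, sup_distance(n))
```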

In the context of random matrices of the form HH† the following
results have been obtained.

39 It is common in the literature to say that a sequence of random variables is tight if it is
Op (1).

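Theorem 2.74. [12] Let H be an N × K matrix whose entries are
mutually independent with zero mean and variance 1/N. Assume that

$$\sup_{i,j,N} E\left[ |\sqrt{N}\, H_{i,j}|^8 \right] < \infty \qquad (2.228)$$

and for any positive constant δ

$$\sum_{i,j} E\left[ \left| \sqrt{N}\, H_{i,j} \right|^8 1\{|H_{i,j}| \ge \delta\} \right] = o(N^2). \qquad (2.229)$$

Then, the maximal absolute difference between the expected empirical
eigenvalue distribution of H† H and the Marc̆enko-Pastur law, Fβ ,
whose density is given in (1.10), vanishes as

$$\left\| E[F^N_{\mathbf{H}^\dagger \mathbf{H}}] - F_\beta \right\| = O\!\left( \frac{N^{-\frac{1}{4\theta+2}}}{1 - \sqrt{\beta} + N^{-\frac{1}{8\theta+4}}} \right) \qquad (2.230)$$

and

$$\left\| F^N_{\mathbf{H}^\dagger \mathbf{H}} - F_\beta \right\| = O_p\!\left( \max\left\{ \frac{N^{-\frac{2}{5+\theta}}}{1 - \sqrt{\beta} + N^{-\frac{1}{5+\theta}}},\ \frac{N^{-\frac{1}{4\theta+2}}}{1 - \sqrt{\beta} + N^{-\frac{1}{8\theta+4}}} \right\} \right) \qquad (2.231)$$

with

$$\theta = \begin{cases}
\dfrac{-2 \log\left( 1 - \sqrt{K/N} \right)}{\log N + 4 \log\left( 1 - \sqrt{K/N} \right)} & \text{if } \dfrac{K}{N} \le \left( 1 - N^{-\frac{1}{8}} \right)^2, \\[10pt]
\dfrac{1}{2} & \text{otherwise.}
\end{cases} \qquad (2.232)$$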

Summarizing, if β < 1 then θ ∼ c/ log N and hence the convergence
rates in (2.230) and (2.231) are O(N −1/2 ) and Op (N −2/5 ), respectively.
When β > 1, θ = 1/2 and the rates are O(N −1/8 ) and Op (N −1/8 ),
respectively. For β = 1, the exact speed at which K/N → 1 matters as far
as Theorem 2.74 is concerned.

Theorem 2.75. [87, 15] Let H be an N × K complex matrix whose
entries are i.i.d. zero-mean random variables with variance 1/N such that
$E[|\sqrt{N}\, H_{i,j}|^4] = 2$. Define the random variable

$$\begin{aligned}
\Delta_N &= \log \det(\mathbf{H}^\dagger \mathbf{H}) - K \int_a^b \log(x)\, f_\beta(x)\, dx && (2.233)\\
&= \log \det(\mathbf{H}^\dagger \mathbf{H}) + K \left( \frac{1-\beta}{\beta} \log(1 - \beta) + \log e \right)
\end{aligned}$$

with fβ (·) the density of the Marc̆enko-Pastur law in (1.10). As K, N →
∞ with K/N → β ≤ 1, ∆N converges to a Gaussian random variable with
zero mean and variance

$$E\left[ |\Delta|^2 \right] = \log \frac{1}{1-\beta}. \qquad (2.234)$$

The counterpart of Theorem 2.75 for real H was first derived by Jonsson
in [131] for a real zero-mean matrix with Gaussian i.i.d. entries
and an analogous result has been found by Girko in [87] for a real
(possibly nonzero-mean) matrix with i.i.d. entries and variance 1/N. In the
special case of Gaussian entries, Theorem 2.75 can be easily obtained
following [131] using the expression of the moment-generating function
of log det(H† H) in (2.11). In the general case, Theorem 2.75 can be
easily verified using the result given in [15].

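Theorem 2.76. [15] Let H be an N × K complex matrix whose entries
are i.i.d. zero-mean random variables with variance 1/N such that
$E[|H_{i,j}|^4] = \frac{2}{N^2}$. Denote by Vβ (γ) the Shannon transform of f̃β (·)
(Example 2.117). As K, N → ∞ with K/N → β, the random variable

$$\begin{aligned}
\Delta_N &= \log \det(\mathbf{I} + \gamma \mathbf{H}\mathbf{H}^\dagger) - N \int_a^b \log(1 + \gamma x)\, \tilde{f}_\beta(x)\, dx \\
&= \log \det(\mathbf{I} + \gamma \mathbf{H}\mathbf{H}^\dagger) - N\, V_\beta(\gamma) && (2.235)
\end{aligned}$$

is asymptotically Gaussian with zero mean and variance

$$\begin{aligned}
E\left[ \Delta^2 \right] &= -\log\left( 1 - \frac{(1 - \eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma))^2}{\beta} \right) \\
&= -\log\left( 1 - \frac{1}{\beta} \left( \frac{F(\gamma, \beta)}{4\gamma} \right)^2 \right). && (2.236)
\end{aligned}$$

Notice that

$$\lim_{\gamma \to \infty} \frac{F(\gamma, \beta)}{4\gamma} = \min\{1, \beta\} \qquad (2.237)$$

and Theorem 2.75 can be obtained as special case.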
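The variance expression (2.236) and its limit (2.237) lend themselves to a quick numerical check. The following is a minimal sketch under assumptions made here: natural logarithms are used, the expression coded for F(·, ·) is recalled from (1.17) and should be read as an assumption, and the Monte Carlo part uses complex Gaussian entries (which satisfy the fourth-moment condition of Theorem 2.76) with illustrative sizes.

```python
import numpy as np

def F(x, z):
    """Assumed form of F in (1.17)."""
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

def logdet_variance(gamma, beta):
    """Asymptotic variance (2.236) of log det(I + gamma*H*H^dagger), in nats^2."""
    return float(-np.log(1.0 - (F(gamma, beta) / (4.0 * gamma)) ** 2 / beta))

def simulate_logdet(gamma, n, k, trials, seed=0):
    """Samples of log det(I + gamma*H*H^dagger) for N x K complex Gaussian H."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials)
    for t in range(trials):
        h = (rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))) / np.sqrt(2 * n)
        _, out[t] = np.linalg.slogdet(np.eye(n) + gamma * h @ h.conj().T)
    return out

if __name__ == "__main__":
    samples = simulate_logdet(1.0, 40, 40, 1500)
    print("simulated variance:", samples.var())
    print("formula (2.236):   ", logdet_variance(1.0, 1.0))
    print("large-gamma limit: ", logdet_variance(1e8, 0.5), np.log(1 / (1 - 0.5)))
```

For β < 1 the large-γ value approaches log(1/(1 − β)), in agreement with (2.234).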

Theorem 2.77. [15] Let H be an N × K complex matrix defined as in
Theorem 2.76. Let T be an Hermitian random matrix independent of H
with bounded spectral norm and whose asymptotic spectrum converges
almost surely to a nonrandom limit. Denote by VHTH† (γ) the Shannon
transform of HTH† . As K, N → ∞ with K/N → β, the random variable

∆N = log det(I + γHTH† ) − N VHTH† (γ) (2.238)

is asymptotically zero-mean Gaussian with variance

$$E[\Delta^2] = -\log\left( 1 - \frac{(1 - \eta_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(\gamma))^2}{\beta} \right). \qquad (2.239)$$

More general results (for functions other than log(1+γx)) are given
in [15].

Theorem 2.78. [15] Let H be an N × K complex matrix defined as
in Theorem 2.76. Let T be a K × K nonnegative definite deterministic
matrix defined as in Theorem 2.77. Let g(·) be a continuous function
on the real line with bounded and continuous derivatives, analytic on
an open set containing the interval40

$$\left[ \liminf_{N} \phi_N\, \max\{0, 1 - \sqrt{\beta}\}^2,\ \limsup_{N} \phi_1\, (1 + \sqrt{\beta})^2 \right] \qquad (2.240)$$

where φ1 ≥ . . . ≥ φN are the eigenvalues of T. Denoting by λi the ith
eigenvalue of HTH† , the random variable

$$\Delta_N = \sum_{i=1}^{N} g(\lambda_i) - N \int g(x)\, dF_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(x) \qquad (2.241)$$

converges, as K, N → ∞ with K/N → β, to a zero-mean Gaussian random
variable.41

40 In [14, 13, 170, 222] this interval contains the spectral support of H† HT.
41 See [15] for an expression of the variance of the limit.
3 Applications to Wireless Communications

In this section, we detail some of the more representative problems


described by (1.1) that capture various features of interest in wireless
communications and we show how random matrix results have been
used to characterize the fundamental limits of the various channels
that arise in wireless communications.
Unless otherwise stated, the analysis applies to coherent reception
and thus it is presumed that the state of the channel is perfectly tracked
by the receiver. The degree of channel knowledge at the transmitter,
on the other hand, is specified for each individual setting.

3.1 Direct-Sequence CDMA


The analysis of randomly-spread DS-CDMA in the asymptotic regime
of number of users, K, and spreading gain, N , going to infinity with
K/N → β provides valuable insight into the behavior of multiuser
receivers for large DS-CDMA systems employing pseudo-noise spreading
sequences (e.g. [167, 275, 256, 100, 217, 30]).
The standard random signature model [271, Sec. 2.3.5] assumes
that the entries of the matrix S, whose columns are the spreading
sequences, are chosen independently and equiprobably on {−1/√N , 1/√N }.
A motivation for this is the use of “long sequences” in commercial
CDMA systems, where the period of the pseudo-random sequence spans
many symbols. Another motivation is to provide a baseline of compar-
ison for systems that use signature waveform families with low cross-
correlations. Sometimes (particularly when the random sequence set-
ting is used to model to some extent nonideal effects such as asynchro-
nism and the frequency selectivity of the channel) the signatures are as-
sumed to be uniformly distributed on the unit Euclidean N -dimensional
sphere (a case for which the Marc̆enko-Pastur law also applies). In the
analysis that follows, the only condition on the signature sequences is
that their entries be i.i.d. zero-mean with variance 1/N.
Specializing the general model in (1.1) to DS-CDMA, the vector x
contains the symbols transmitted by the K users, which have zero-mean
and equal variance. The entries of x correspond to different users and
are therefore independent. (Unequal-power users will be accommodated
by pre-multiplying x by an additional diagonal matrix of amplitudes.)

3.1.1 Unfaded Equal-Power DS-CDMA


With equal-power transmission at every user and no fading, the multi-
access channel model becomes [271, Sec. 2.9.2]
y = Sx + n, (3.1)
where the energy per symbol transmitted from each user divided by
the noise variance per chip is denoted by SNR , i.e.,
E[ x 2 ]
SNR = .
N E[ n ]
1 2

Asymptotic analyses have been reported in the literature for various


receivers, including:

• Single-user matched filter


• Decorrelator
• MMSE
• Optimum
• Iterative nonlinear.

The asymptotic analysis of the single-user matched filter (both un-


coded error probability and capacity) has relied on the central limit
theorem rather than on random matrix techniques [275]. The asymp-
totic analysis of the uncoded error probability has not used random
matrix techniques either: [258] used large-deviation techniques to ob-
tain the asymptotic efficiency and [249] used the replica method of
statistical physics to find an expression for the uncoded bit error rate
(see also [103]). The optimum near-far resistance and the MMSE were
obtained in [271] using the Marc̆enko-Pastur law (Theorem 2.35). Re-
call, from (1.12), that the asymptotic fraction of zero eigenvalues of
HH† is given by (1 − β)+ . Then, for β ≤ 1, using (2.57), the decorre-
lator achieves an output SINR that converges asymptotically to [271,
(4.111)]
(1 − β) SNR . (3.2)
When β > 1, the Moore-Penrose generalized-inverse decorrelator [271,
Sec. 5.1] is shown in [70] (also using the Marc̆enko-Pastur law) to attain
an asymptotic SINR equal to

$$\frac{\beta - 1}{(\beta - 1)^2 + \beta/\mathsf{SNR}}. \qquad (3.3)$$

Using (2.57) and (2.121), the maximum SINR (achieved by the MMSE
linear receiver) converges to [271, (6.59)]

$$\mathsf{SNR} - \frac{F(\mathsf{SNR}, \beta)}{4} \qquad (3.4)$$

with F(·, ·) defined in (1.17) while the MMSE converges to

$$1 - \frac{F(\mathsf{SNR}, \beta)}{4\, \mathsf{SNR}\, \beta}. \qquad (3.5)$$

Incidentally, note that, as SNR → ∞, (3.3) and (3.4) converge to the
same quantity if β > 1.
The total capacity (sum-rate) of the multiaccess channel (3.1) was
obtained in [275] for the linear receivers listed above and the optimum
receiver also using the Marc̆enko-Pastur law. These expressions for the
decorrelator and MMSE receiver are
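$$C^{\mathrm{dec}}(\beta, \mathsf{SNR}) = \beta \log\left( 1 + \mathsf{SNR}\, (1 - \beta) \right), \qquad 0 \le \beta \le 1 \qquad (3.6)$$

and

$$C^{\mathrm{mmse}}(\beta, \mathsf{SNR}) = \beta \log\left( 1 + \mathsf{SNR} - \frac{F(\mathsf{SNR}, \beta)}{4} \right) \qquad (3.7)$$

while the capacity achieved with the optimum receiver is (1.14)

$$\begin{aligned}
C^{\mathrm{opt}}(\beta, \mathsf{SNR}) = {} & \beta \log\left( 1 + \mathsf{SNR} - \frac{F(\mathsf{SNR}, \beta)}{4} \right) \\
& + \log\left( 1 + \mathsf{SNR}\, \beta - \frac{F(\mathsf{SNR}, \beta)}{4} \right) - \frac{F(\mathsf{SNR}, \beta)}{4\, \mathsf{SNR}} \log e.
\end{aligned} \qquad (3.8)$$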
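The closed-form expressions (3.2) through (3.8) are straightforward to evaluate. The sketch below is a hedged illustration: logarithms are taken in base 2 so that spectral efficiencies come out in bits, the expression coded for F(·, ·) is an assumption recalled from (1.17), and all function names are hypothetical.

```python
import numpy as np

def F(x, z):
    """Assumed form of F in (1.17)."""
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

def sinr_decorrelator(snr, beta):
    """(3.2) for beta <= 1 and (3.3) for beta > 1."""
    if beta <= 1.0:
        return (1.0 - beta) * snr
    return (beta - 1.0) / ((beta - 1.0) ** 2 + beta / snr)

def sinr_mmse(snr, beta):
    """Asymptotic output SINR of the MMSE receiver, (3.4)."""
    return float(snr - F(snr, beta) / 4.0)

def c_dec(snr, beta):
    """(3.6), valid for 0 <= beta <= 1."""
    return float(beta * np.log2(1.0 + snr * (1.0 - beta)))

def c_mmse(snr, beta):
    """(3.7)."""
    return float(beta * np.log2(1.0 + sinr_mmse(snr, beta)))

def c_opt(snr, beta):
    """(3.8)."""
    f4 = F(snr, beta) / 4.0
    return float(beta * np.log2(1.0 + snr - f4)
                 + np.log2(1.0 + snr * beta - f4)
                 - f4 / snr * np.log2(np.e))

if __name__ == "__main__":
    snr = 10.0
    for beta in (0.25, 0.5, 0.75, 1.0):
        print(beta, c_dec(snr, beta), c_mmse(snr, beta), c_opt(snr, beta))
```

Note that Figure 3.1 fixes Eb/N0 rather than SNR, so reproducing it would additionally require solving β SNR/C(β, SNR) = Eb/N0 for SNR at each β; that step is not attempted here.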

Fig. 3.1 Capacity of CDMA without fading for Eb/N0 = 10dB. (Spectral efficiency in
bits/s/Hz versus β; curves: No Spreading, Optimal, MMSE, Decorrelator, Matched Filter.)

Figure 3.1 (from [275]) compares (3.6), (3.7) and (3.8) as a function
of the ratio of the number of users to the spreading gain, β, choosing
SNR so that β SNR /C(β, SNR ) = Eb/N0 = 10.

3.1.2 DS-CDMA with Frequency-Flat Fading


When the users are affected by different attenuations which may vary
from symbol to symbol, it is convenient to model the channel gains
seen by each user as random quantities {|A1 |2 , . . . , |AK |2 } whose em-
pirical distribution converges almost surely to a nonrandom limit as
the number of users goes to infinity. In this case, the channel matrix
H can be written as the product of the N × K matrix S containing
the spreading sequences with a K × K diagonal matrix A of complex
fading coefficients such that the linear model in (1.1) becomes

y = SAx + n. (3.9)

Here, the role of the received signal-to-noise ratio of the kth user is
taken by |Ak |2 SNR .
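The η-transform is intimately related to the performance of MMSE
multiuser detection of (3.1). The arithmetic mean of the MMSEs for
the K users satisfies [271, (6.27)]

$$\begin{aligned}
\frac{1}{K} \sum_{k=1}^{K} \mathsf{MMSE}_k &= \frac{1}{K}\, \mathrm{tr}\left\{ \left( \mathbf{I} + \mathsf{SNR}\, \mathbf{A}^\dagger \mathbf{S}^\dagger \mathbf{S} \mathbf{A} \right)^{-1} \right\} && (3.10)\\
&\to \eta_{\mathbf{A}^\dagger \mathbf{S}^\dagger \mathbf{S} \mathbf{A}}(\mathsf{SNR}) && (3.11)
\end{aligned}$$

whereas the multiuser efficiency of the kth user (output SINR relative
to the single-user signal-to-noise ratio) achieved by the MMSE receiver,
ηkmmse (SNR ), is1

$$\begin{aligned}
\eta_k^{\mathrm{mmse}}(\mathsf{SNR}) &= \mathbf{s}_k^T \left( \mathbf{I} + \mathsf{SNR} \sum_{i \ne k} |A_i|^2\, \mathbf{s}_i \mathbf{s}_i^T \right)^{-1} \mathbf{s}_k && (3.12)\\
&\to \eta_{\mathbf{S}\mathbf{A}\mathbf{A}^\dagger \mathbf{S}^\dagger}(\mathsf{SNR}) && (3.13)
\end{aligned}$$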

where the limit follows from (2.57). According to Theorem 2.39, the
MMSE multiuser efficiency, abbreviated as

η = ηSAA† S† (SNR ), (3.14)

is the solution to the fixed-point equation

$$1 - \eta = \beta \left( 1 - \eta_{|A|^2}(\mathsf{SNR}\, \eta) \right), \qquad (3.15)$$

where η|A|2 is the η-transform of the asymptotic empirical distribution
of {|A1 |2 , . . . , |AK |2 }. A fixed-point equation equivalent to (3.15) was
given in [256] and its generalization to systems with symbol-level
asynchronism (but still chip-synchronous) is studied in [152].

1 The conventional notation for multiuser efficiency is η [271]; the relationship in (3.13) is
the motivation for the choice of the η-transform terminology introduced in Section 2.2.2.
The distribution of the output SINR is asymptotically Gaussian
[257], in the sense of Theorem 2.78, and its variance decreases as 1/N.
The same holds for the decorrelator. Closed-form expressions for the
asymptotic mean are SNR ηSAA† S† for the MMSE receiver and SNR (1 −
β P[|A| > 0]) for the decorrelator with β < 1 while the variance, for
both receivers, is obtained in [257].2
In [217], the spectral efficiencies achieved by the MMSE receiver
and the decorrelator are given respectively by

$$C^{\mathrm{mmse}}(\beta, \mathsf{SNR}) = \beta\, E\left[ \log\left( 1 + |A|^2\, \mathsf{SNR}\, \eta_{\mathbf{S}\mathbf{A}\mathbf{A}^\dagger \mathbf{S}^\dagger}(\mathsf{SNR}) \right) \right] \qquad (3.16)$$

and, for β ≤ 1,

$$C^{\mathrm{dec}}(\beta, \mathsf{SNR}) = \beta\, E\left[ \log\left( 1 + |A|^2\, \mathsf{SNR}\, (1 - \beta\, P[|A| > 0]) \right) \right] \qquad (3.17)$$

where the distribution of |A|2 is given by the asymptotic empirical dis-


tribution of AA† and (3.17) follows from Corollary 2.2 using the fact
that the multiuser efficiency of the kth user achieved by the decorrela-
tor, ηkdec , equals that of the MMSE as the noise vanishes [271].
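Also in [217], the capacity of the optimum receiver is characterized
in terms of the MMSE spectral efficiency: 3

$$C^{\mathrm{opt}}(\beta, \mathsf{SNR}) = C^{\mathrm{mmse}}(\beta, \mathsf{SNR}) + \log \frac{1}{\eta_{\mathbf{S}\mathbf{A}\mathbf{A}^\dagger \mathbf{S}^\dagger}(\mathsf{SNR})} + \left( \eta_{\mathbf{S}\mathbf{A}\mathbf{A}^\dagger \mathbf{S}^\dagger}(\mathsf{SNR}) - 1 \right) \log e. \qquad (3.18)$$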

2 Although most fading distributions of practical interest do not have any point masses
at zero, we express various results without making such an assumption on the fading
distribution. For example, the inactivity of certain users or groups of users can be modelled
by nonzero point masses in the fading distribution.
3 Equation (3.18) also holds for the capacity with non-Gaussian inputs, as shown in [186]

and [103] using statistical-physics methods.



This result can be immediately obtained by specializing Theorem 2.44


to the case where T = AA† and D = I. Here we give the derivation
in [217], which illustrates the usefulness of the interplay between the η
and Shannon transforms. From the definition of Shannon transform, the
capacity of the optimum receiver coincides with the Shannon transform
of the matrix evaluated at SNR , i.e.,
C opt (β, SNR ) = VSAA† S† (SNR ). (3.19)
Furthermore, also from the definition of Shannon transform and (3.16),
it follows that
C mmse (β, SNR ) = β VAA† (SNR ηSAA† S† (SNR )) (3.20)
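and we know from (2.61) that

$$\frac{\gamma}{\log e} \frac{d}{d\gamma} V_{\mathbf{X}}(\gamma) = 1 - \eta_{\mathbf{X}}(\gamma). \qquad (3.21)$$

Thus, using the shorthand in (3.14) and denoting by η̇ the derivative of η
with respect to SNR,

$$\begin{aligned}
\frac{d}{d\mathsf{SNR}} C^{\mathrm{mmse}}(\mathsf{SNR}, \beta) &= \beta\, \frac{1 - \eta_{\mathbf{A}\mathbf{A}^\dagger}(\mathsf{SNR}\, \eta)}{\mathsf{SNR}} \left( 1 + \frac{\mathsf{SNR}\, \dot{\eta}}{\eta} \right) \log e \\
&= \frac{1 - \eta}{\mathsf{SNR}} \left( 1 + \frac{\mathsf{SNR}\, \dot{\eta}}{\eta} \right) \log e && (3.22)
\end{aligned}$$

where we used (3.15) to write (3.22). The derivative of (3.19) yields

$$\frac{d}{d\mathsf{SNR}} C^{\mathrm{opt}}(\beta, \mathsf{SNR}) = \frac{1 - \eta}{\mathsf{SNR}} \log e. \qquad (3.23)$$

Subtracting the right-hand sides of (3.22) and (3.23),

$$\frac{d}{d\mathsf{SNR}} C^{\mathrm{opt}}(\beta, \mathsf{SNR}) - \frac{d}{d\mathsf{SNR}} C^{\mathrm{mmse}}(\mathsf{SNR}, \beta) = \dot{\eta} \left( 1 - \frac{1}{\eta} \right) \log e, \qquad (3.24)$$

which is equivalent to (3.18) since, at SNR = 0, both functions equal 0.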
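The chain (3.15), (3.16), (3.18) and (3.23) can be verified numerically for an arbitrary fading law. The sketch below is an illustration under assumptions made here: the law of |A|² is represented by a finite array of samples (`gains`, a hypothetical argument), the fixed point is found by bisection, logarithms are in base 2, and the derivative in (3.23) is approximated by a central difference.

```python
import numpy as np

def mmse_efficiency(snr, beta, gains, iters=200):
    """Root in [0, 1] of 1 - eta = beta*(1 - eta_{|A|^2}(snr*eta)), i.e. (3.15)."""
    gains = np.asarray(gains, dtype=float)

    def g(eta):
        return eta - 1.0 + beta * (1.0 - np.mean(1.0 / (1.0 + snr * eta * gains)))

    lo, hi = 0.0, 1.0  # g(0) = -1 < 0, g(1) >= 0, g increasing
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_mmse(snr, beta, gains):
    """MMSE spectral efficiency (3.16), in bits."""
    gains = np.asarray(gains, dtype=float)
    eta = mmse_efficiency(snr, beta, gains)
    return float(beta * np.mean(np.log2(1.0 + gains * snr * eta)))

def c_opt(snr, beta, gains):
    """Optimum-receiver capacity via (3.18), in bits."""
    eta = mmse_efficiency(snr, beta, gains)
    return c_mmse(snr, beta, gains) + float(np.log2(1.0 / eta) + (eta - 1.0) * np.log2(np.e))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gains = rng.exponential(1.0, size=2000)  # illustrative unit-mean fading powers
    snr, beta, h = 5.0, 0.8, 1e-4
    lhs = (c_opt(snr + h, beta, gains) - c_opt(snr - h, beta, gains)) / (2 * h)
    rhs = (1.0 - mmse_efficiency(snr, beta, gains)) / snr * np.log2(np.e)
    print(lhs, rhs)  # the two sides of (3.23)
```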
Random matrix methods have also been used to optimize power
control laws in DS-CDMA, as the number of users goes to infinity,
for various receivers: matched filter, decorrelator, MMSE and optimum
receiver [217, 281].
Departing from the usual setup where the channel and spreading
sequences are known by the receiver, the performance analysis of blind and
group-blind linear multiuser receivers that have access only to the received
spreading sequence of the user of interest is carried out via random
matrix techniques in [318]. The asymptotic SINR at the output of direct
matrix inversion blind MMSE, subspace blind MMSE and group-blind
MMSE receivers with binary random spreading is investigated and an
interesting saturation phenomenon is observed. This indicates that the
performance of blind linear multiuser receivers is limited not only by
interference, but by estimation errors as well. The output residual
interference is shown to be zero-mean and Gaussian with variance depending
on the type of receiver.

3.1.3 DS-CDMA with Flat Fading and Antenna Diversity


Let us now study the impact of having, in addition to frequency-flat
fading, L receive antennas at the base station. The channel matrix is
now the N L × K array

$$\mathbf{H} = \begin{bmatrix} \mathbf{S}\mathbf{A}_1 \\ \vdots \\ \mathbf{S}\mathbf{A}_L \end{bmatrix} \qquad (3.25)$$

where

$$\mathbf{A}_\ell = \mathrm{diag}\{A_{1,\ell}, \ldots, A_{K,\ell}\}, \qquad \ell = 1, \ldots, L \qquad (3.26)$$

and {Ak,ℓ } indicates the i.i.d. fading coefficients of the kth user at the
ℓth antenna.
Assuming that the fading coefficients are bounded,4 using Lemma
2.60, [108] shows that the asymptotic averaged empirical singular value
distribution of (3.25) is the same as that of

$$\begin{bmatrix} \mathbf{S}_1 \mathbf{A}_1 \\ \vdots \\ \mathbf{S}_L \mathbf{A}_L \end{bmatrix}$$

where Sk for k ∈ {1, . . . , L} are i.i.d. matrices. Consequently, Theorem
2.50 leads to the conclusion that

$$1 - \eta_{\mathbf{H}\mathbf{H}^\dagger} = \frac{\beta}{L} \left( 1 - \eta_P(\mathsf{SNR}\, \eta_{\mathbf{H}\mathbf{H}^\dagger}) \right), \qquad (3.27)$$

where ηP is the η-transform of the asymptotic empirical distribution
of P1 , . . . , PK with $P_k = \sum_{\ell=1}^{L} |A_{k,\ell}|^2$. This result admits the pleasing
engineering interpretation that the effective spreading gain is equal to
the CDMA spreading gain times the number of receive antennas (but,
of course, the bandwidth only grows with the CDMA spreading gain).

4 This assumption is dropped in [160].
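From the above result it follows that the expected arithmetic mean
of the MMSE’s for the K users converges to

$$\begin{aligned}
\frac{1}{K} \sum_{k=1}^{K} E[\mathsf{MMSE}_k] &= \frac{1}{K}\, E\left[ \mathrm{tr}\left\{ \left( \mathbf{I} + \mathsf{SNR}\, \mathbf{H}^\dagger \mathbf{H} \right)^{-1} \right\} \right] && (3.28)\\
&\to \eta_{\mathbf{H}^\dagger \mathbf{H}}(\mathsf{SNR}). && (3.29)
\end{aligned}$$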

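Moreover, the MMSE multiuser efficiency, ηkmmse (SNR ), converges in
probability as K, N → ∞ to [108]

$$\eta_k^{\mathrm{mmse}}(\mathsf{SNR}) \to \eta_{\mathbf{H}\mathbf{H}^\dagger} \qquad (3.30)$$

while the asymptotic multiuser efficiency is given by

$$\lim_{\mathsf{SNR} \to \infty} \eta_k^{\mathrm{mmse}}(\mathsf{SNR}) = 1 - \min\left\{ \frac{\beta}{L}\, P[P \ne 0],\ 1 \right\} \qquad (3.31)$$
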
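where P is a random variable distributed according to the asymptotic
empirical distribution of P1 , . . . , PK . The spectral efficiency for MMSE
and decorrelator and the capacity of the optimum receiver are

$$\begin{aligned}
C^{\mathrm{mmse}}(\beta, \mathsf{SNR}) &= \beta\, V_P(\mathsf{SNR}\, \eta_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{SNR})) \\
&= \beta\, E\left[ \log\left( 1 + \mathsf{SNR}\, P\, \eta_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{SNR}) \right) \right] && (3.32)
\end{aligned}$$

and, using Corollary 2.2, for β ≤ 1

$$C^{\mathrm{dec}}(\beta, \mathsf{SNR}) = \beta\, E\left[ \log\left( 1 + \mathsf{SNR}\, P \left( 1 - \frac{\beta}{L}\, P[P > 0] \right) \right) \right] \qquad (3.33)$$

while

$$C^{\mathrm{opt}}(\beta, \mathsf{SNR}) = C^{\mathrm{mmse}}(\beta, \mathsf{SNR}) + \log \frac{1}{\eta_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{SNR})} + \left( \eta_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{SNR}) - 1 \right) \log e. \qquad (3.34)$$

Note the parallel between (3.32–3.34) and (3.16–3.18).
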

3.1.4 DS-CDMA with Frequency-Selective Fading


Let us consider a synchronous DS-CDMA uplink with K active users
employing random spreading codes and operating over a frequency-
selective fading channel. The base station is equipped with a single
receive antenna.
Assuming that the symbol duration (Ts ≈ N/Wc with Wc the
chip-bandwidth) is much larger than the delay spread, we can disregard
the intersymbol interference. In this case, the channel matrix in (1.1)
particularizes to

$$\mathbf{H} = [\mathbf{C}_1 \mathbf{s}_1, \ldots, \mathbf{C}_K \mathbf{s}_K]\, \mathbf{A} \qquad (3.35)$$

where A is a K × K deterministic diagonal matrix containing the
amplitudes of the users and Ck is an N × N Toeplitz matrix defined as

$$(\mathbf{C}_k)_{i,j} = \frac{1}{W_c}\, c_k\!\left( \frac{i - j}{W_c} \right) \qquad (3.36)$$

with ck (·) the impulse response of the channel for the kth user,
independent across users.
Let Λ be an N × K matrix whose (i, j)th entry is

$$\Lambda_{i,j} = \lambda_i(\mathbf{C}_j)\, |A_j|^2$$

with λi (Cj ) the ith eigenvalue of Cj C†j . Assuming that Λ behaves


ergodically (cf. Definition 2.17), from Theorem 2.59 it follows that the
arithmetic mean of the MMSE’s satisfies
 −1 
1 
K
1 †
MMSEk = tr I + SNR H H (3.37)
K K
k=1
→ ηH† H (SNR ) (3.38)
1 1
= 1 − + E [ ΓHH† (X, SNR ) ] (3.39)
β β
where in (3.39) we have used (2.56). The function ΓHH† (·, ·), in turn,
satisfies the fixed-point equation

ρ(x, Y)ΓHH† (x, SNR )
ΓHH† (x, SNR ) + β SNR E = 1
1 + SNR E[ρ(X, Y)ΓHH† (X, SNR )|Y]
(3.40)
where $\mathsf{X}$ and $\mathsf{Y}$ are independent random variables uniform on [0, 1] and $\rho(\cdot, \cdot)$ is the channel profile of $\boldsymbol{\Lambda}$ (cf. Definition 2.18). Note that the received signal-to-noise ratio of the kth user is $\mathsf{SNR}\,\|\mathbf{h}_k\|^2$ with
$$\|\mathbf{h}_k\|^2 \to |A_k|^2 \lim_{N\to\infty} \frac{1}{N}\,\mathrm{tr}\{\mathbf{C}_k^\dagger \mathbf{C}_k\} = E[\rho_k(\mathsf{X})] \tag{3.41}$$

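with $E[\rho_k(\mathsf{X})]$ representing the one-dimensional channel profile (cf. Definition 2.18) of $\boldsymbol{\Lambda}$. The multiuser efficiency of the kth user achieved by the MMSE receiver is [159]
$$\eta_k^{\rm mmse}(\mathsf{SNR}) = \frac{\mathsf{SINR}_k}{\mathsf{SNR}\,\|\mathbf{h}_k\|^2} \tag{3.42}$$
$$= \frac{\mathbf{h}_k^\dagger \left(\mathbf{I} + \mathsf{SNR} \sum_{i \neq k} \mathbf{h}_i \mathbf{h}_i^\dagger\right)^{-1} \mathbf{h}_k}{\|\mathbf{h}_k\|^2} \tag{3.43}$$
$$\to \frac{\Psi(y, \mathsf{SNR})}{E[\rho_k(\mathsf{X})]} \tag{3.44}$$
with $\frac{k-1}{K} \le y < \frac{k}{K}$ and $\Psi(\cdot, \cdot)$ defined as the solution to the fixed-point equation (cf. (2.157))
$$\Psi(y, \mathsf{SNR}) = E\left[\frac{\rho(\mathsf{X}, y)}{1 + \mathsf{SNR}\,\beta\, E\left[\dfrac{\rho(\mathsf{X}, \mathsf{Y})}{1 + \mathsf{SNR}\,\Psi(\mathsf{Y}, \mathsf{SNR})} \,\Big|\, \mathsf{X}\right]}\right]. \tag{3.45}$$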

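Let the ratio between the effective number of users and the effective processing gain be defined as
$$\beta' = \beta\, \frac{\mathrm{P}\big[E[\rho(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{Y}] > 0\big]}{\mathrm{P}\big[E[\rho(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{X}] > 0\big]}. \tag{3.46}$$
Using Corollary 2.4, we obtain that the asymptotic MMSE multiuser efficiency admits the following expression for $\beta' < 1$:
$$\eta_k^{\rm dec} = \lim_{\mathsf{SNR}\to\infty} \eta_k^{\rm mmse}(\mathsf{SNR}) = \frac{\beta\, \mathrm{P}\big[E[\rho(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{Y}] \neq 0\big]\, \Gamma_\infty(y)}{E[\rho_k(\mathsf{X})]} \tag{3.47}$$
where $\Gamma_\infty(\cdot)$ satisfies (2.142) with the role of $v(x, y)$ played by $\rho(x, y)$.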

Specializing (3.39) to the case that the signal transmitted by each user propagates through L discrete i.i.d. chip-spaced paths (where L does not grow with N), the η-transform of the asymptotic averaged eigenvalue distribution of $\mathbf{H}\mathbf{H}^\dagger$, $\eta_{\mathbf{H}\mathbf{H}^\dagger}$, satisfies the fixed-point equation [159]
$$1 - \eta_{\mathbf{H}\mathbf{H}^\dagger} = \beta\,\big(1 - \eta_{\mathsf{P}}(\mathsf{SNR}\,\eta_{\mathbf{H}\mathbf{H}^\dagger})\big) \tag{3.48}$$
where $\eta_{\mathsf{P}}$ is the η-transform of the almost sure asymptotic empirical distribution of⁵
! L /
/2 L /
/2 0
|A1 |2  //
/
/ , ... , |AK |2  /
/cK
/
/ .
/ c / / 2Wc /
1
Wc2 2Wc Wc2
=1 =1

Using this result, [159] concludes that, asymptotically as N → ∞, each multipath interferer with a fixed number of resolvable paths acts like a single path interferer with received power equal to the total received power from all the paths of that user. From this it follows that, in the special case of a fixed number of i.i.d. resolvable paths, the expressions obtained for the SINR at the output of the decorrelator and MMSE receiver in a frequency-selective channel are equivalent to those for a flat fading channel. This result has been found also in [73] under the assumption that the spreading sequences are either independent across users and paths or independent across users and cyclically shifted across the paths (cf. Section 3.1.5).
In the downlink, every user experiences the same frequency-selective
fading, i.e., Ck = C ∀k, where the empirical distribution of CC† con-
verges almost surely to a nonrandom limit F|C|2 . Consequently, (3.35)
particularizes to

H = CSA. (3.49)

Using Theorem 2.46 and with the aid of an auxiliary function χ(SNR ),
abbreviated as χ, we obtain that the MMSE multiuser efficiency of the

5 Whenever we refer to an almost sure asymptotic empirical distribution, we are implic-


itly assuming that the corresponding empirical distribution converges almost surely to a
nonrandom limit.

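kth user, abbreviated as $\eta = \eta^{\rm mmse}(\mathsf{SNR})$, is the solution to
$$\beta\,\eta\,\chi = \frac{1 - \eta_{|\mathsf{C}|^2}(\beta\chi)}{E[|\mathsf{C}|^2]} \tag{3.50}$$
$$\eta\,\chi = \frac{1 - \eta_{|\mathsf{A}|^2}\big(\mathsf{SNR}\,E[|\mathsf{C}|^2]\,\eta\big)}{E[|\mathsf{C}|^2]} \tag{3.51}$$
where $|\mathsf{C}|^2$ and $|\mathsf{A}|^2$ are independent random variables with distributions given by the asymptotic spectra of $\mathbf{C}\mathbf{C}^\dagger$ and $\mathbf{A}\mathbf{A}^\dagger$, respectively, while $\eta_{|\mathsf{C}|^2}(\cdot)$ and $\eta_{|\mathsf{A}|^2}(\cdot)$ represent their respective η-transforms. Note that, instead of (3.51) and (3.50), we may write [37, 159]
$$\eta = \frac{1}{E[|\mathsf{C}|^2]}\, E\left[\frac{|\mathsf{C}|^2}{1 + \beta\,\mathsf{SNR}\,|\mathsf{C}|^2\, E\left[\dfrac{|\mathsf{A}|^2}{1 + \mathsf{SNR}\,E[|\mathsf{C}|^2]\,|\mathsf{A}|^2\,\eta}\right]}\right]. \tag{3.52}$$
From Corollary 2.2 we have that, for
$$\beta\, \frac{\mathrm{P}[|\mathsf{A}| > 0]}{\mathrm{P}[|\mathsf{C}| > 0]} \le 1,$$
$\eta_k^{\rm dec}$ converges almost surely to the solution to
$$1 = E\left[\frac{|\mathsf{C}|^2}{\eta^{\rm dec}\,E[|\mathsf{C}|^2] + \beta\,\mathrm{P}[|\mathsf{A}| > 0]\,|\mathsf{C}|^2}\right]. \tag{3.53}$$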
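Note that both the MMSE and the decorrelator multiuser efficiencies are asymptotically the same for every user.
From Theorem 2.43, the downlink counterpart of (3.39) is [159]
$$\frac{1}{K}\sum_{k=1}^{K} \mathsf{MMSE}_k = \frac{1}{K}\,\mathrm{tr}\left\{\left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}^\dagger\mathbf{H}\right)^{-1}\right\} = 1 - \frac{1}{\beta} + \frac{1}{\beta}\,\eta_{|\mathsf{C}|^2}\big(\beta\chi(\mathsf{SNR})\big) \tag{3.54}$$
with $\chi(\cdot)$ solution to (3.51) and (3.50). The special case of (3.52) for equal-power users was given in [56].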
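The fixed-point equation (3.52) lends itself to simple numerical iteration. The following is a minimal sketch, not taken from [37, 159]: the function name `downlink_mmse_efficiency` is hypothetical, and the limiting distributions of $|\mathsf{C}|^2$ and $|\mathsf{A}|^2$ are assumed to be represented by equally weighted samples.

```python
import numpy as np


def downlink_mmse_efficiency(c2, a2, beta, snr, tol=1e-12, max_iter=100_000):
    """Iterate the right-hand side of (3.52) until eta stops changing.

    c2, a2: equally weighted samples standing in for the limiting
    distributions of |C|^2 and |A|^2 (an assumption of this sketch).
    """
    c2 = np.asarray(c2, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    mean_c2 = c2.mean()
    eta = 1.0  # the efficiency cannot exceed 1, so start at the upper end
    for _ in range(max_iter):
        # Inner expectation over |A|^2 in (3.52).
        inner = np.mean(a2 / (1.0 + snr * mean_c2 * a2 * eta))
        # Outer expectation over |C|^2, normalized by E[|C|^2].
        new_eta = np.mean(c2 / (1.0 + beta * snr * c2 * inner)) / mean_c2
        if abs(new_eta - eta) < tol:
            return float(new_eta)
        eta = new_eta
    return float(eta)


# Flat fading and equal powers: (3.52) collapses to a scalar equation.
eta_flat = downlink_mmse_efficiency([1.0], [1.0], beta=0.5, snr=10.0)
# A two-point frequency-selective profile with the same average gain.
eta_sel = downlink_mmse_efficiency([0.2, 1.8], [1.0], beta=0.5, snr=10.0)
```

With $|\mathsf{C}|^2 = |\mathsf{A}|^2 = 1$ the equation reduces to $\mathsf{SNR}\,\eta^2 + (1 + \beta\,\mathsf{SNR} - \mathsf{SNR})\,\eta - 1 = 0$, which gives a convenient sanity check on the iteration.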

For the sake of brevity, we will not explicitly extend the analysis to the case in which both frequency selectivity and multiple receive antennas are present. This can be done by blending the results obtained in Sections 3.1.3 and 3.1.4. Moreover, multiple transmit antennas can be further incorporated as done explicitly in [169], where analytical tools already leveraged in Sections 3.1.2–3.1.3 are applied to the asymptotic characterization of the single-user matched filter and MMSE detectors. It is found that DS-CDMA, even with single-user decoding, can outperform orthogonal multiaccess with multiple antennas provided the number of receive antennas is sufficiently large.

In most of the literature, the DS-CDMA channel spans only the users within a particular system cell with the users in other cells regarded as a collective source of additive white Gaussian noise. While it is reasonable to preclude certain forms of multiuser detection of users in other cells, on the basis that their codebooks may be unknown, the structure in the signals of those other-cell users can be exploited even without access to their codebooks. This, however, requires more refined models that incorporate this structure explicitly within the noise. For some simple such models, the performance of various receivers has been evaluated asymptotically in [317, 237]. Since the expression for the capacity of a DS-CDMA channel with colored noise parallels that of the corresponding multi-antenna channel, we defer the details to Section 3.3.8.

3.1.5 Channel Estimation for DS-CDMA


Reference [73] applies the concept of asymptotic freeness to the same
setup of Section 3.1.4 (linear DS-CDMA receivers and a fading channel
with L discrete chip-spaced paths), but departing from the usual as-
sumption that the receiver has perfect side information about the state
of the channel. Incorporating channel estimation, the receiver consists
of two distinct parts:
• The channel estimator, which provides linear MMSE joint
estimates of the channel gains for every path of every user.
• The data estimator, which uses those channel estimates to
detect the transmitted data using a one-shot linear receiver.

In order to render the problem analytically tractable, the delay spread is considered small relative to the symbol time and, more importantly, the time delays of the resolvable paths of all users are assumed known. Thus, the channel estimation encompasses only the path gains and it is further conditioned on the data (a hypothesis that is valid during training or with error-free data detection). The joint estimation of the channel path gains for all the users is performed over an estimation window of Q symbols, presumed small relative to the channel coherence time. For the ith symbol within this window, the output of the chip matched filter is
$$\mathbf{y}(i) = \sum_{k=1}^{K} \sum_{\ell=1}^{L} C_{k,\ell}\, \mathbf{s}_{k,\ell}(i)\, (\mathbf{x}(i))_k + \mathbf{n}(i) \tag{3.55}$$
where $C_{k,\ell}$ represents the channel fading coefficient for path $\ell$ of user k such that $E[|C_{k,\ell}|^2] = \frac{1}{L}$, $\mathbf{s}_{k,\ell}(i)$ is the spreading sequence for the $\ell$th path of the kth user for the ith symbol interval, $\mathbf{n}(i)$ is the additive Gaussian noise in the ith symbol interval, and $(\mathbf{x}(i))_k$ represents the ith symbol of the kth user.
With long (i.e., changing from symbol to symbol) random spreading sequences independent across users and paths, [73] shows using Theorem 2.38 that, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, the mean-square error of the estimation of every path gain coefficient converges to
$$\xi^2 = \left[\mathsf{SNR}\,\frac{Q - \beta L}{2} + \frac{L}{2} + \sqrt{\mathsf{SNR}^2\,\frac{(Q - \beta L)^2}{4} + \mathsf{SNR}\,L\,\frac{Q + \beta L}{2} + \frac{L^2}{4}}\,\right]^{-1}.$$

This result, in fact, holds under alternative conditions as well:

• If the spreading sequences are independent across users and paths but they repeat from symbol to symbol, i.e., $\mathbf{s}_{k,\ell}(i) = \mathbf{s}_{k,\ell}\ \forall i$ (this can be proved using Theorem 2.61).
• The sequences received over the L paths are cyclically shifted versions of each other but independent across users, i.e., $\mathbf{s}_{k,\ell}(i)$ is a cyclically shifted replica of $\mathbf{s}_{k,1}(i)$ by $\ell - 1$ chips (this can be proved using Example 2.49).
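As a quick numerical illustration of the limiting estimation error, the sketch below evaluates $\xi^2$ as read from the expression above (the square-root grouping is a reconstruction from the flattened formula, so it should be checked against [73]). The function name `path_gain_mse` and the parameter values are assumptions made for illustration.

```python
import math


def path_gain_mse(snr, beta, L, Q):
    """Limiting per-path channel estimation error xi^2 (reconstructed form)."""
    half_linear = snr * (Q - beta * L) / 2.0 + L / 2.0
    under_root = (snr**2 * (Q - beta * L) ** 2 / 4.0
                  + snr * L * (Q + beta * L) / 2.0
                  + L**2 / 4.0)
    return 1.0 / (half_linear + math.sqrt(under_root))


# Illustrative values: load 0.5, three paths, windows of 10 and 20 symbols.
xi2_short = path_gain_mse(snr=10.0, beta=0.5, L=3, Q=10)
xi2_long = path_gain_mse(snr=10.0, beta=0.5, L=3, Q=20)
xi2_low_snr = path_gain_mse(snr=1e-9, beta=0.5, L=3, Q=10)
```

Two limiting cases help interpret the expression: as $\mathsf{SNR} \to 0$ the error tends to the prior variance $1/L$ of each path gain, and a longer estimation window Q lowers the error. Algebraically, $\xi^2$ is the positive root of $\mathsf{SNR}\,\beta L^2\,\xi^4 + (\mathsf{SNR}(Q - \beta L) + L)\,\xi^2 - 1 = 0$.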

The linear receiver performing data estimation operates under the belief that the estimate of the $\ell$th path gain of the kth user has mean $\bar{C}_{k,\ell}$ and variance $\xi_k^2$. These estimates are further assumed uncorrelated and with equal variance for all paths of each user. (When the channel is perfectly known, $C_{k,\ell} = \bar{C}_{k,\ell}$ and $\xi_k^2 = 0$ and the results reduce to their counterparts in Section 3.1.4.) The linear receiver is designed with all expectations being conditional on the spreading sequences and the mean and variance supplied by the channel estimator.
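From Theorem 2.46, the output SINR for user k converges asymptotically in probability to
$$\frac{1}{1 + \xi_k^2\,\mathsf{SINR}_d} \sum_{\ell=1}^{L} |\bar{C}_{k,\ell}|^2\, \mathsf{SINR}_d \tag{3.56}$$
where $\mathsf{SINR}_d$ is the corresponding output SINR without the effect of other-user channel estimation errors. Implicit expressions for $\mathsf{SINR}_d$, depending on the type of linear receiver, are
$$\frac{1}{\mathsf{SINR}_d} = \begin{cases} \dfrac{1}{\mathsf{SNR}} + \beta L\, E\left[\dfrac{\mathsf{P}}{1 + \mathsf{P}\,\mathsf{SINR}_d}\right] & \text{MMSE} \\[2ex] \dfrac{1}{\mathsf{SNR}} + \dfrac{1}{\mathsf{SNR}}\,\dfrac{\beta L}{1 - \beta L} & \text{decorrelator} \\[2ex] \dfrac{1}{\mathsf{SNR}} + \beta L\, E[\mathsf{P}] & \text{single-user matched filter} \end{cases} \tag{3.57}$$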

with expectation over $\mathsf{P}$, whose distribution equals the asymptotic empirical eigenvalue distribution of the matrix $E[\mathrm{diag}(\mathbf{c}_2 \mathbf{c}_2^\dagger, \ldots, \mathbf{c}_K \mathbf{c}_K^\dagger)]$ (assumed to converge to a nonrandom limit) with $\mathbf{c}_k = [C_{k,1} \ldots C_{k,L}]^T$. The main finding of the analysis in [73] is that, provided the channel estimation window (in symbols) exceeds the number of resolvable paths, the resulting estimates enable near-optimal performance of the linear data estimator.
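The three implicit expressions in (3.57) can be evaluated numerically as follows. This is a sketch under stated assumptions: the function name `sinr_without_estimation_error` is hypothetical, the distribution of $\mathsf{P}$ is represented by equally weighted samples, and the MMSE case is solved by plain fixed-point iteration.

```python
import numpy as np


def sinr_without_estimation_error(receiver, snr, beta, L, p,
                                  tol=1e-12, max_iter=100_000):
    """Evaluate SINR_d from (3.57) for one of three linear receivers.

    p: equally weighted samples standing in for the distribution of P.
    """
    p = np.asarray(p, dtype=float)
    load = beta * L  # effective load: K users times L paths per chip
    if receiver == "matched_filter":
        return 1.0 / (1.0 / snr + load * p.mean())
    if receiver == "decorrelator":
        if load >= 1.0:
            raise ValueError("the decorrelator expression needs beta * L < 1")
        # 1/SNR + (1/SNR) * load / (1 - load) = 1 / (SNR * (1 - load))
        return snr * (1.0 - load)
    if receiver == "mmse":
        sinr = snr  # interference-free value, an upper bound on the solution
        for _ in range(max_iter):
            new = 1.0 / (1.0 / snr + load * np.mean(p / (1.0 + p * sinr)))
            if abs(new - sinr) < tol * max(1.0, new):
                return float(new)
            sinr = new
        return float(sinr)
    raise ValueError(f"unknown receiver: {receiver}")


# Illustrative values only: SNR = 10, beta * L = 0.3, P identically 1.
s_mmse = sinr_without_estimation_error("mmse", 10.0, 0.1, 3, [1.0])
s_dec = sinr_without_estimation_error("decorrelator", 10.0, 0.1, 3, [1.0])
s_mf = sinr_without_estimation_error("matched_filter", 10.0, 0.1, 3, [1.0])
```

For this unit-power example the MMSE value dominates both the decorrelator and the matched filter, as expected of the SINR-maximizing linear receiver.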
In [46], the impact of channel estimator errors on the performance
of the linear MMSE multistage receiver (cf. Section 3.1.6) for large
multiuser systems with random spreading sequences is analyzed.

3.1.6 Reduced-Rank Receivers for DS-CDMA


Both the MMSE and the decorrelator receivers need to invert a matrix whose dimensionality is equal to either the number of users or the spreading gain. In large-dimensional systems, this is a computationally intensive operation. It is therefore of interest to pursue receiver structures that approach the performance of these linear receivers at a lower computational cost.
Invoking the Cayley-Hamilton Theorem,⁶ the MMSE receiver can be synthesized as a polynomial expansion that yields the soft estimate of the kth user symbol in (1.1) as
$$\hat{x}_k = \mathbf{h}_k^\dagger \sum_{m=0}^{D-1} w_m \mathbf{R}^m \mathbf{y} \tag{3.58}$$

where $\mathbf{R} = \mathbf{H}\mathbf{H}^\dagger$ and $D = N$ (the rank or the number of stages of the receiver). Since the coefficients $w_m$, $m \in \{0, \ldots, D-1\}$, must be obtained from the characteristic polynomial of the matrix whose inverse is being expanded, this expansion by itself does not reduce the computational complexity. It does, however, enable the possibility of a flexible tradeoff between performance and complexity controlled through D. The first proposal for a reduced-complexity receiver built around this idea came in [179], where it was suggested to approximate (3.58) with $D < N$ and with the coefficients $w_m$ computed using as cost function the mean-square error between $\hat{x}_k$ obtained with the chosen D and the actual $\hat{x}_k$ obtained with a true MMSE receiver. Then, the $w_m$'s become a function of the first D moments of the empirical distribution of $\mathbf{R}$. With $D < N$, the linear receiver in (3.58) projects the received vector on the subspace (of the signal space) spanned by the vectors $\{\mathbf{h}_k, \mathbf{R}\mathbf{h}_k, \ldots, \mathbf{R}^{D-1}\mathbf{h}_k\}$.⁷ Reduced-rank receivers have been put forth for numerous signal processing applications such as array processing, radar, and model order reduction (e.g., [214, 215, 130]), where the signal is effectively projected onto a lower-dimensional subspace and the filter optimization then occurs within that subspace. This subspace can be chosen using a variety of criteria:

Principal components. The projection occurs onto an estimate of the lower-dimensional signal subspace with the largest energy [298, 115, 247].
Cross-spectral method. The eigenvector basis which minimizes the mean-square error is chosen [34, 92] based on an eigenvalue decomposition of the correlation matrix.
Partial despreading. The lower-dimensional subspace of the reduced-rank receiver is spanned by non-overlapping segments of the matched filter [232].
Reduced-rank multistage Wiener filter. The multi-stage Wiener (MSW) filter and its reduced-rank version were proposed in [91, 93].

6 The Cayley-Hamilton Theorem ensures that the inverse of a K × K nonsingular matrix can always be expressed as a (K − 1)th order polynomial [117].
7 These vectors are also known as a Krylov sequence [117]. For a given matrix $\mathbf{A}$ and vector $\mathbf{x}$, the sequence of vectors $\mathbf{x}, \mathbf{A}\mathbf{x}, \mathbf{A}^2\mathbf{x}, \ldots$ or a truncated portion of this sequence is known as the Krylov sequence of $\mathbf{A}$. The subspace spanned by a Krylov sequence is called the Krylov space of $\mathbf{A}$.

These various techniques have been analyzed asymptotically, in terms of SINR, in [116]. In particular, it is shown in [116] for the MSW filter with equal-power users that, as $K, N \to \infty$, the output SINR converges in probability to a nonrandom limit
$$\mathsf{SINR}_{D+1} = \frac{\mathsf{SNR}}{1 + \beta\,\dfrac{\mathsf{SNR}}{1 + \mathsf{SINR}_D}} \tag{3.59}$$
for $D \ge 0$, where $\mathsf{SINR}_0 = 0$ and $\mathsf{SINR}_1 = \frac{\mathsf{SNR}}{1 + \beta\,\mathsf{SNR}}$ is the SINR at the output of the matched filter. The analysis for unequal-power users can be found in [253, 255]. A generalization of the analysis in [116] and [253] can be found in [162], where a connection between the asymptotic behavior of the SINR at the output of the reduced-rank Wiener filter and the theory of orthogonal polynomials for the so-called power moments is established. It is further demonstrated in [116] and [162], numerically and analytically respectively, that the number of stages D needed in the reduced-rank MSW filter to achieve a desired output SINR does not scale with the dimensionality; in fact, a few stages are usually sufficient to achieve near-full-rank output SINR regardless of the dimension of the signal space. However, the weights of the reduced-rank receiver do depend on the spreading sequences. Therefore, in long-sequence CDMA they have to be reevaluated from symbol to symbol, which hampers real-time implementation.
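The recursion (3.59) is easy to iterate, which makes the "few stages suffice" observation concrete. The sketch below is illustrative only: the function names are hypothetical, and the full-rank reference is taken as the fixed point of (3.59), i.e., the positive root of $s^2 + (1 + \beta\,\mathsf{SNR} - \mathsf{SNR})\,s - \mathsf{SNR} = 0$.

```python
import math


def msw_sinr(snr, beta, stages):
    """Asymptotic output SINR of the rank-`stages` MSW filter via (3.59)."""
    sinr = 0.0  # SINR_0 = 0
    for _ in range(stages):
        sinr = snr / (1.0 + beta * snr / (1.0 + sinr))
    return sinr


def full_rank_sinr(snr, beta):
    """Fixed point of (3.59): positive root of s^2 + b s - snr = 0."""
    b = 1.0 + beta * snr - snr
    return (-b + math.sqrt(b * b + 4.0 * snr)) / 2.0


# Illustrative operating point: SNR = 10 (linear scale), beta = 0.5.
sinr_by_rank = [msw_sinr(10.0, 0.5, d) for d in range(1, 9)]
sinr_limit = full_rank_sinr(10.0, 0.5)
```

At this operating point the sequence increases monotonically from the matched-filter value toward the full-rank limit, and eight stages already come within one percent of it.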
To lift the burden of computing the weights from the spreading sequences for every symbol interval, [187, 265, 159] proposed the asymptotic reduced-rank MMSE receiver, which replaces the weights in (3.58) with their limiting values in the asymptotic regime. Following this approach, various scenarios described by (1.1) have been evaluated in [45, 105, 158, 159, 187, 265].⁸ For all these different scenarios it has been proved that, in contrast with the exact weights, the asymptotic weights do not depend on the realization of $\mathbf{H}$ and hence they do not need to be updated from symbol to symbol. The asymptotic weights are determined only by the number of users per chip and by the asymptotic moments of $\mathbf{H}\mathbf{H}^\dagger$ and thus, in order to compute these weights explicitly, it is only necessary to obtain explicit expressions for the asymptotic eigenvalue moments of the interference autocorrelation matrix. Numerical results show that the asymptotic weights work well for even modest dimensionalities.
Alternative low-complexity implementations of both the decorrela-
tor and the MMSE receiver can be realized using the concepts of itera-
tive linear interference cancellation [84, 124, 33, 207, 71, 72], which rely
on well-known iterative methods for the solution of systems of linear
equations (and consequently for matrix inversion) [7]. This connection
has been recently established in [99, 251, 72]. In particular, parallel
interference cancellation receivers are an example of application of the
Jacobi method, first- and second-order stationary methods and Cheby-
shev methods, while serial interference cancellation receivers are an
example of application of Gauss-Seidel and successive relaxation meth-
ods. For all these linear (parallel and serial) interference cancellation
receivers, the convergence properties to the true decorrelator or MMSE
solution have been studied in [99] for large systems. For equal-power
users, the asymptotic convergence of the output SINR of the linear
multistage parallel interference cancellation receiver (based on the first

8 In [187], DS-CDMA with equal-power users and no fading is studied. In turn, [158] considers the more general scenario of DS-CDMA with unequal-power users and flat-fading.
Related results in the context of the reduced-rank MSW and of the receiver originally pro-
posed by [179] were reported in [45]. In [158, 159], the analysis is extended to multi-antenna
receivers and further extended to include frequency selectivity in [105, 159]. Specifically,
the frequency-selective CDMA downlink is studied in [105] with the restriction that the
signature matrix be unitarily invariant with i.i.d. entries. In [159], in contrast, the analysis
with frequency-selectivity is general enough to encompass uplink and downlink as well as
signature matrices whose entries are independent with common mean and variance but
otherwise arbitrarily distributed. The case of frequency-selective CDMA downlink with
orthogonal signatures has been treated in [105].
and second-order stationary linear iterative method) to a nonrandom limit has been analyzed in [252, 254].
We now summarize some of the results on linear polynomial MMSE receivers for DS-CDMA. The linear expansion of the MMSE receiver is built using a finite-order Krylov sequence of the matrix $\mathbf{H}_k\mathbf{H}_k^\dagger + \sigma^2\mathbf{I}$ and the coefficients of the expansion are chosen to minimize MSE. The soft estimate of the kth user symbol is given by (3.58) with $\mathbf{R}$ replaced by
$$\sum_{i \neq k} \mathbf{h}_i \mathbf{h}_i^\dagger + \sigma^2\mathbf{I} = \mathbf{H}_k\mathbf{H}_k^\dagger + \sigma^2\mathbf{I} \tag{3.60}$$

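where $\mathbf{H}_k$ indicates the matrix $\mathbf{H}$ with the kth column removed. The weights that minimize the mean-squared error are
$$\mathbf{w} = \begin{bmatrix} \mathcal{H}_1 + \mathcal{H}_0\mathcal{H}_0 & \cdots & \mathcal{H}_D + \mathcal{H}_{D-1}\mathcal{H}_0 \\ \vdots & \ddots & \vdots \\ \mathcal{H}_D + \mathcal{H}_{D-1}\mathcal{H}_0 & \cdots & \mathcal{H}_{2D-1} + \mathcal{H}_{D-1}\mathcal{H}_{D-1} \end{bmatrix}^{-1} \begin{bmatrix} \mathcal{H}_0 \\ \vdots \\ \mathcal{H}_{D-1} \end{bmatrix} \tag{3.61}$$
where the $(i, j)$th entry of the above matrix is $\mathcal{H}_{i+j-1} + \mathcal{H}_{i-1}\mathcal{H}_{j-1}$ with
$$\mathcal{H}_m = \mathbf{h}_k^\dagger \left(\mathbf{H}_k\mathbf{H}_k^\dagger + \sigma^2\mathbf{I}\right)^m \mathbf{h}_k. \tag{3.62}$$
Denoting the asymptotic value of $\mathcal{H}_m$ as
$$\mathcal{H}_m^\infty = \lim_{K\to\infty} \mathcal{H}_m, \tag{3.63}$$
the asymptotic weights are given by (3.61) where each $\mathcal{H}_m$ is replaced by its asymptotic counterpart, $\mathcal{H}_m^\infty$. The calculation of these asymptotic weights is closely related to the evaluation of the asymptotic eigenvalue moments of $\mathbf{H}\mathbf{H}^\dagger$, which can be done using the results laid down in Section 2.3. In the following, all the hypotheses made in the previous sections dealing with DS-CDMA are upheld.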
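To make (3.61)–(3.62) concrete, the following sketch computes the exact (non-asymptotic) weights for one random realization of $\mathbf{H}$ and compares the resulting mean-square error with that of the full-rank MMSE receiver. It assumes unit-power uncorrelated symbols and noise covariance $\sigma^2\mathbf{I}$; the function names, the random seed, and the dimensions are illustrative choices, not part of the original analysis.

```python
import numpy as np


def krylov_moments(H, k, sigma2, count):
    """Return h_k^dagger R^m h_k for m = 0..count-1, as in (3.62)."""
    h = H[:, k].astype(complex)
    Hk = np.delete(H, k, axis=1)  # H with the kth column removed
    R = Hk @ Hk.conj().T + sigma2 * np.eye(H.shape[0])
    moments = np.empty(count)
    v = h.copy()
    for m in range(count):
        moments[m] = np.vdot(h, v).real  # real because R is Hermitian
        v = R @ v
    return moments


def reduced_rank_weights(H, k, sigma2, D):
    """Solve (3.61) for the D weights and return (weights, resulting MSE)."""
    mom = krylov_moments(H, k, sigma2, 2 * D)
    idx = np.arange(D)
    # 0-based entry (i, j) equals moment i+j+1 plus moment i times moment j.
    G = mom[idx[:, None] + idx[None, :] + 1] + np.outer(mom[:D], mom[:D])
    w = np.linalg.solve(G, mom[:D])
    return w, 1.0 - mom[:D] @ w


def full_rank_mse(H, k, sigma2):
    """MSE of the exact linear MMSE receiver for user k."""
    h = H[:, k]
    cov = H @ H.conj().T + sigma2 * np.eye(H.shape[0])
    return 1.0 - np.vdot(h, np.linalg.solve(cov, h)).real


rng = np.random.default_rng(0)
N, K, sigma2 = 8, 4, 0.1
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
mse_by_rank = [reduced_rank_weights(H, 0, sigma2, D)[1] for D in range(1, K + 1)]
mse_full = full_rank_mse(H, 0, sigma2)
```

Because $\mathbf{H}_k\mathbf{H}_k^\dagger + \sigma^2\mathbf{I}$ has at most K distinct eigenvalues here, the Krylov space saturates at dimension K and the rank-K receiver already matches the full-rank MMSE error; larger D would make the system in (3.61) singular for this small example.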
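In the case of unfaded equal power DS-CDMA, with $\mathbf{H} = \mathbf{S}$ as in Section 3.1.1, using (2.102) we have that [187, 158]
$$\mathcal{H}_m^\infty = \sum_{n=0}^{m} \binom{m}{n}\, \sigma^{2m-2n} \sum_{i=1}^{n} \binom{n}{i}\binom{n}{i-1} \frac{\beta^i}{n}. \tag{3.64}$$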
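A direct evaluation of (3.64) is sketched below. The function names are hypothetical, and the $n = 0$ term of the inner sum, which the formula leaves implicit, is taken to be 1 (an assumption, consistent with $\|\mathbf{h}_k\|^2 \to 1$ for unit-energy signatures).

```python
from math import comb


def spreading_moment(n, beta):
    """Inner sum of (3.64): nth asymptotic eigenvalue moment of S S^dagger."""
    if n == 0:
        return 1.0  # assumed convention for the empty inner sum
    return sum(comb(n, i) * comb(n, i - 1) * beta**i for i in range(1, n + 1)) / n


def asymptotic_krylov_moment(m, beta, sigma2):
    """Evaluate (3.64) for unfaded equal-power DS-CDMA."""
    return sum(comb(m, n) * sigma2 ** (m - n) * spreading_moment(n, beta)
               for n in range(m + 1))


# Illustrative values: beta = 0.5 users per chip, noise variance 0.1.
first_moments = [spreading_moment(n, 0.5) for n in range(4)]
h_inf = [asymptotic_krylov_moment(m, 0.5, 0.1) for m in range(3)]
```

The first moments come out as $1$, $\beta$, $\beta + \beta^2$ and $\beta + 3\beta^2 + \beta^3$. Substituting the resulting values for the exact moments in (3.61) yields weights that no longer depend on the realization of the spreading sequences.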
In the case of faded DS-CDMA with a single receive antenna, where $\mathbf{H} = \mathbf{S}\mathbf{A}$ as in Section 3.1.2,
$$\mathcal{H}_m^\infty = \sum_{n=0}^{m} \binom{m}{n}\, \sigma^{2m-2n}\, |A_k|^2\, \mu_n \tag{3.65}$$
with $\mu_n$, from (2.118), given by

n  n!    
µn = β n−i E |A|2m1 . . . E |A|2mi . (3.66)
m1 ! . . . mi ! i!
i=1

where $|\mathsf{A}|$ is a random variable whose distribution equals the asymptotic empirical singular value distribution of $\mathbf{A}$ and the inner sum is over all i-tuples of nonnegative integers $(m_1, \ldots, m_i)$ such that [158, 45]
$$\sum_{\ell=1}^{i} m_\ell = n - i + 1 \tag{3.67}$$
$$\sum_{\ell=1}^{i} \ell\, m_\ell = n. \tag{3.68}$$

A similar result holds for the faded DS-CDMA with antenna diver-
sity described in Section 3.1.3 with |A| now equal to the square root
of the random variable whose distribution is given by the asymptotic
empirical distribution of P1 , . . . , PK as defined in Section 3.1.3.
For the frequency-selective faded downlink, applying Theorem 2.48 to the model in Section 3.1.4 we have [159]
$$\mathcal{H}_m^\infty = \sum_{n=0}^{m} \binom{m}{n}\, \sigma^{2m-2n}\, |A_k|^2\, E\big[|\mathsf{C}|^2\, m_n(|\mathsf{C}|^2)\big] \tag{3.69}$$
where
$$m_n(r) = \beta \sum_{\ell=1}^{n} r\, m_{\ell-1}(r) \sum_{\substack{n_1 + \cdots + n_i = n - \ell \\ 1 \le i \le n - \ell}} E\big[|\mathsf{A}|^{2i+2}\big]\, E\big[|\mathsf{C}|^2\, m_{n_1 - 1}(|\mathsf{C}|^2)\big] \cdots E\big[|\mathsf{C}|^2\, m_{n_i - 1}(|\mathsf{C}|^2)\big] \tag{3.70}$$
with $|\mathsf{C}|^2$ as in Section 3.1.4 and with $|\mathsf{A}|$ representing a random variable, independent of $|\mathsf{C}|^2$, whose distribution equals the asymptotic empirical singular value distribution of $\mathbf{A}$. The counterpart of (3.69) for orthogonal Haar distributed spreading signatures and for unitarily invariant i.i.d. spreading sequences has been analyzed in [105], where the asymptotic weights are calculated using free probability.
In the frequency-selective faded uplink, in turn, $\mathbf{H}$ is given by (3.35) and straight application of Theorem 2.59 yields
$$\mathcal{H}_m^\infty = \sum_{n=0}^{m} \binom{m}{n}\, \sigma^{2m-2n}\, \delta_{n,k} = \sum_{n=0}^{m} \binom{m}{n}\, \sigma^{2m-2n}\, E[\rho(\mathsf{X}, k)]\, E[m_n(\mathsf{X})\,\rho_k(\mathsf{X})] \tag{3.71}$$
with $\rho(\cdot, \cdot)$ and $\rho_k(\cdot)$ as in Section 3.1.4 and with $m_n(\cdot)$ obtained through the recursive equation given by (2.164) in Theorem 2.55.

3.2 Multi-Carrier CDMA


Multi-Carrier CDMA (MC-CDMA) is the frequency dual of DS-CDMA. Hence, an MC-CDMA transmitter uses a given spreading sequence to spread the original signal in the frequency domain. In other words, each fraction of the symbol corresponding to a chip of the spreading code is transmitted through a different subcarrier. It is essential that the sub-band corresponding to each subcarrier be narrow enough for its fading to be frequency non-selective. The basic transmitter structure of MC-CDMA is similar to that of OFDM [109], with the main difference being that the MC-CDMA scheme transmits the same symbol in parallel through the various subcarriers whereas an OFDM scheme transmits different symbols. The spreading gain N is equal to the number of frequency subcarriers. Each symbol of the data stream generated by user k is replicated into N parallel copies. Each copy is then multiplied by a chip from the corresponding spreading sequence. Finally, an inverse discrete Fourier transform (IDFT) is used to convert those N parallel copies back into serial form for transmission. A cyclic or empty prefix is appended to facilitate demodulation, at the expense of some loss in efficiency. A possible receiver front-end consists of N matched filters, one for each subcarrier.
Since in the case of frequency-flat fading the analysis of MC-CDMA is mathematically equivalent to that of its DS-CDMA counterpart (see Section 3.1.2), we proceed directly to consider the more general case of frequency-selective fading.

3.2.1 MC-CDMA Uplink


In synchronous MC-CDMA with K active users and frequency-selective fading, the vector $\mathbf{x}$ contains the signals transmitted by each of the users and the kth column of $\mathbf{H}$ is
$$\mathbf{h}_k = [h_k^{(1)}, \ldots, h_k^{(N)}]^T \tag{3.72}$$
where
$$h_k^{(\ell)} = A_k\, C_{\ell,k}\, s_k^{(\ell)}, \tag{3.73}$$
with $\mathbf{s}_k = [s_k^{(1)}, \ldots, s_k^{(N)}]^T$ denoting the unit-energy transmitted spreading sequence of the kth user, $A_k$ indicating the received amplitude of that kth user, which accounts for its average path loss, and with $C_{\ell,k}$ denoting the fading for the $\ell$th subcarrier of the kth user, independent across the users. In this subsection we refer to $\mathbf{h}_k$ as the received signature of the kth user. Notice that $\mathbf{H}$ incorporates both the spreading and the frequency-selective fading. More precisely, denoting by $\mathbf{C}$ the $N \times K$ matrix whose $(\ell, k)$th entry is $C_{\ell,k}$, we can write the received signature matrix $\mathbf{H}$ as
$$\mathbf{H} = \mathbf{C} \circ \mathbf{S}\mathbf{A} \tag{3.74}$$
with $\circ$ denoting element-wise (Hadamard) product and
$$\mathbf{A} = \mathrm{diag}(A_1, \ldots, A_K) \tag{3.75}$$
$$\mathbf{S} = [\,\mathbf{s}_1 \,|\, \ldots \,|\, \mathbf{s}_K\,] \tag{3.76}$$
$$\mathbf{C} = [\,\mathbf{c}_1 \,|\, \ldots \,|\, \mathbf{c}_K\,] \tag{3.77}$$
where the entries of $\mathbf{S}$ are i.i.d. zero-mean with variance $\frac{1}{N}$ and thus the general model becomes
$$\mathbf{y} = (\mathbf{C} \circ \mathbf{S}\mathbf{A})\,\mathbf{x} + \mathbf{n}. \tag{3.78}$$
Each user experiences independent fading and hence the columns of $\mathbf{C}$ are independent. The relationship between the fading at different subcarriers of any given user, in turn, is dictated by the power-delay response of the channel. More precisely, we can define a frequency covariance matrix of the kth user as
$$\mathbf{M}_k = E[\mathbf{c}_k \mathbf{c}_k^\dagger]. \tag{3.79}$$
The $(p, q)$th entry of $\mathbf{M}_k$ is given by the correlation between the channel response at subcarriers p and q, separated by frequency $(p - q)\Delta_f$, i.e.,
$$(\mathbf{M}_k)_{p,q} = \int_{-\infty}^{\infty} \phi_k(\tau)\, e^{-j2\pi(p-q)\tau\Delta_f}\, d\tau = \Phi_k\big((p - q)\Delta_f\big) \tag{3.80}$$
with $\phi_k$ and $\Phi_k$ the power-delay response and the frequency correlation function of the kth user channel, respectively.
The received energy at the $\ell$th subcarrier, $\ell \in \{1, \ldots, N\}$, for the kth user, $k \in \{1, \ldots, K\}$, is $|C_{\ell,k} A_k|^2$.
Let $\mathbf{B}$ be the $N \times K$ matrix whose $(i, j)$th element is
$$B_{i,j} = C_{i,j} A_j \tag{3.81}$$
and let $\upsilon(\cdot, \cdot)$ be the two-dimensional channel profile of $\mathbf{B}$ assumed to behave ergodically (cf. Definition 2.17). Then, the SINR at the output of the MMSE receiver is
$$\mathsf{SINR}_k^{\rm mmse} = \mathsf{SNR}\,|A_k|^2\, (\mathbf{c}_k \circ \mathbf{s}_k)^\dagger \left(\mathbf{I} + \mathsf{SNR}\,\mathbf{H}_k\mathbf{H}_k^\dagger\right)^{-1} (\mathbf{c}_k \circ \mathbf{s}_k)$$
where, recall from the DS-CDMA analysis, $\mathbf{H}_k$ indicates the matrix $\mathbf{H}$ with the kth column removed. Using Theorems 2.57 and 2.52, the multiuser efficiency is given by the following result.

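Theorem 3.1. [160] For $0 \le y \le 1$, the multiuser efficiency of the MMSE receiver for the $yK$th user converges almost surely, as $K, N \to \infty$ with $\frac{K}{N} \to \beta$, to
$$\lim_{K\to\infty} \eta_{yK}^{\rm mmse}(\mathsf{SNR}) = \frac{\Psi(y, \mathsf{SNR})}{E[\upsilon(\mathsf{X}, y)]} \tag{3.82}$$
where $\Psi(\cdot, \cdot)$ is a positive function solution to
$$\Psi(y, \mathsf{SNR}) = E\left[\frac{\upsilon(\mathsf{X}, y)}{1 + \mathsf{SNR}\,\beta\, E\left[\dfrac{\upsilon(\mathsf{X}, \mathsf{Y})}{1 + \mathsf{SNR}\,\Psi(\mathsf{Y}, \mathsf{SNR})} \,\Big|\, \mathsf{X}\right]}\right] \tag{3.83}$$
where the expectations are with respect to independent random variables $\mathsf{X}$ and $\mathsf{Y}$ both uniform on [0, 1].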
Most quantities of interest such as the multiuser efficiency and the capacity approach their asymptotic behaviors very rapidly as K and N grow large. Hence, we can get an extremely accurate approximation of the multiuser efficiency and consequently of the capacity with an arbitrary number of users, K, and a finite processing gain, N, simply by resorting to their asymptotic approximation with $\upsilon(x, y)$ replaced in Theorem 3.1 by
$$\upsilon(x, y) \approx |A_k|^2\, |C_{\ell,k}|^2, \qquad \frac{\ell - 1}{N} \le x < \frac{\ell}{N}, \quad \frac{k - 1}{K} \le y < \frac{k}{K}.$$
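Thus, we have that the multiuser efficiency of uplink MC-CDMA is closely approximated by
$$\eta_k^{\rm mmse}(\mathsf{SNR}) \approx \frac{\Phi_k^N(\mathsf{SNR})}{\frac{1}{N}\sum_{\ell=1}^{N} |C_{\ell,k}|^2} \tag{3.84}$$
with
$$\Phi_k^N(\mathsf{SNR}) = \frac{1}{N}\sum_{\ell=1}^{N} \frac{|C_{\ell,k}|^2}{1 + \mathsf{SNR}\,\dfrac{\beta}{K}\displaystyle\sum_{j=1}^{K} \dfrac{|A_j|^2\, |C_{\ell,j}|^2}{1 + \mathsf{SNR}\,|A_j|^2\,\Phi_j^N(\mathsf{SNR})}}. \tag{3.85}$$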
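The finite-dimensional approximation (3.84)–(3.85) can be computed by iterating (3.85) over all users at once. The sketch below is one possible implementation, not the procedure of [160]: the function name is hypothetical, $\beta$ is taken as $K/N$, and every user is assumed to have at least one unfaded subcarrier so that the normalization in (3.84) is well defined.

```python
import numpy as np


def uplink_mmse_efficiency(C, A, snr, tol=1e-12, max_iter=100_000):
    """Approximate per-user MMSE multiuser efficiencies via (3.84)-(3.85).

    C: N x K array of subcarrier fading coefficients C_{l,k}.
    A: length-K array of received amplitudes A_k.
    """
    c2 = np.abs(np.asarray(C)) ** 2
    a2 = np.abs(np.asarray(A)) ** 2
    N, K = c2.shape
    beta = K / N  # assumed finite-size stand-in for the limiting load
    phi = np.ones(K)
    for _ in range(max_iter):
        # (beta / K) * sum_j |A_j|^2 |C_{l,j}|^2 / (1 + SNR |A_j|^2 Phi_j)
        interference = (c2 * (a2 / (1.0 + snr * a2 * phi))).sum(axis=1) * beta / K
        # (1 / N) * sum_l |C_{l,k}|^2 / (1 + SNR * interference_l)
        new_phi = (c2 / (1.0 + snr * interference)[:, None]).mean(axis=0)
        if np.max(np.abs(new_phi - phi)) < tol:
            phi = new_phi
            break
        phi = new_phi
    return phi / c2.mean(axis=0)


# Flat channel, equal powers, N = 8 subcarriers, K = 4 users (beta = 0.5).
eta_flat = uplink_mmse_efficiency(np.ones((8, 4)), np.ones(4), snr=10.0)

# A random frequency-selective example with unequal powers.
rng = np.random.default_rng(1)
C_sel = (rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))) / np.sqrt(2)
eta_sel = uplink_mmse_efficiency(C_sel, np.array([1.0, 0.5, 2.0, 1.0]), snr=10.0)
```

In the flat, equal-power case every user gets the same efficiency, the solution of $\mathsf{SNR}\,\eta^2 + (1 + \beta\,\mathsf{SNR} - \mathsf{SNR})\,\eta - 1 = 0$; with frequency selectivity the efficiencies differ across users, in line with the uplink discussion.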

From Theorem 3.1, the MMSE spectral efficiency converges, as $K, N \to \infty$, to
$$C^{\rm mmse}(\beta, \mathsf{SNR}) = \beta\, E\big[\log\big(1 + \mathsf{SNR}\,\Psi(\mathsf{Y}, \mathsf{SNR})\big)\big] \tag{3.86}$$
where the function $\Psi(\cdot, \cdot)$ is the solution of (3.83).
Let the ratio between the effective number of users and the effective processing gain be defined as
$$\beta' = \beta\, \frac{\mathrm{P}\big[E[\upsilon(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{Y}] > 0\big]}{\mathrm{P}\big[E[\upsilon(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{X}] > 0\big]} \tag{3.87}$$
where only the contribution of users and subcarriers that are active and not completely faded is accounted for. For all y, if $\beta' < 1$, as SNR goes to infinity, the solution to (3.83), $\Psi(y, \mathsf{SNR})$, converges to $\Psi_\infty(\cdot)$, which is the solution to the fixed-point equation
$$\Psi_\infty(y) = E\left[\frac{\upsilon(\mathsf{X}, y)}{1 + \beta\, E\left[\dfrac{\upsilon(\mathsf{X}, \mathsf{Y})}{\Psi_\infty(\mathsf{Y})} \,\Big|\, \mathsf{X}\right]}\right]. \tag{3.88}$$
If $\beta' < 1$, the spectral efficiency of the decorrelator is
$$C^{\rm dec}(\beta, \mathsf{SNR}) = \beta\, E\big[\log\big(1 + \mathsf{SNR}\,\Psi_\infty(\mathsf{Y})\big)\big]. \tag{3.89}$$
As an application of Theorem 2.53, the following generalization of (3.18) to the multicarrier CDMA channel is obtained.

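Theorem 3.2. [160] The capacity of the optimum receiver is
$$C^{\rm opt}(\beta, \mathsf{SNR}) = C^{\rm mmse}(\beta, \mathsf{SNR}) + E\big[\log\big(1 + \mathsf{SNR}\,\beta\, E[\upsilon(\mathsf{X}, \mathsf{Y})\,\Upsilon(\mathsf{Y}, \mathsf{SNR}) \,|\, \mathsf{X}]\big)\big] - \beta\,\mathsf{SNR}\, E\big[\Psi(\mathsf{Y}, \mathsf{SNR})\,\Upsilon(\mathsf{Y}, \mathsf{SNR})\big] \log e \tag{3.90}$$
with $\Psi(\cdot, \cdot)$ and $\Upsilon(\cdot, \cdot)$ satisfying the coupled fixed-point equations
$$\Psi(y, \mathsf{SNR}) = E\left[\frac{\upsilon(\mathsf{X}, y)}{1 + \beta\,\mathsf{SNR}\, E[\upsilon(\mathsf{X}, \mathsf{Y})\,\Upsilon(\mathsf{Y}, \mathsf{SNR}) \,|\, \mathsf{X}]}\right] \tag{3.91}$$
$$\Upsilon(y, \mathsf{SNR}) = \frac{1}{1 + \mathsf{SNR}\,\Psi(y, \mathsf{SNR})} \tag{3.92}$$
where $\mathsf{X}$ and $\mathsf{Y}$ are independent random variables uniform on [0, 1].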

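As an alternative to (3.90), the asymptotic capacity per dimension can also be expressed as
$$C^{\rm opt}(\beta, \mathsf{SNR}) = C^{\rm mmse}(\beta, \mathsf{SNR}) + E\left[\log\frac{1}{D(\mathsf{X}, \mathsf{SNR})}\right] + \big(E[D(\mathsf{X}, \mathsf{SNR})] - 1\big)\log e \tag{3.93}$$
with $D(\cdot, \cdot)$ the solution to
$$D(x, \mathsf{SNR}) = \frac{1}{1 + \mathsf{SNR}\,\beta\, E\left[\dfrac{\upsilon(x, \mathsf{Y})}{1 + \mathsf{SNR}\, E[D(\mathsf{X}, \mathsf{SNR})\,\upsilon(\mathsf{X}, \mathsf{Y}) \,|\, \mathsf{Y}]}\right]}. \tag{3.94}$$
This alternative expression can be easily derived from (3.90) by virtue of the fact that $\Psi(\cdot, \cdot)$ and $D(\cdot, \cdot)$ relate through
$$\Psi(y, \mathsf{SNR}) = E[\upsilon(\mathsf{X}, y)\,D(\mathsf{X}, \mathsf{SNR})].$$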
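The equivalence of (3.90) and (3.93) can be checked numerically on a discretized channel profile. The sketch below is illustrative only: the function names are hypothetical, the profile $\upsilon(x, y)$ is replaced by an arbitrary positive grid of values with $\mathsf{X}$ and $\mathsf{Y}$ uniform over its rows and columns, and capacities are computed in nats so that $\log e = 1$.

```python
import numpy as np


def solve_profile(v, beta, snr, tol=1e-13, max_iter=100_000):
    """Solve (3.91)-(3.92) on a grid; v[i, j] stands in for upsilon(x_i, y_j)."""
    v = np.asarray(v, dtype=float)
    psi = v.mean(axis=0)  # upper bound on the solution, used as starting point
    for _ in range(max_iter):
        ups = 1.0 / (1.0 + snr * psi)                             # (3.92)
        d = 1.0 / (1.0 + beta * snr * (v * ups).mean(axis=1))     # D(x_i, SNR)
        new_psi = (v * d[:, None]).mean(axis=0)                   # (3.91)
        converged = np.max(np.abs(new_psi - psi)) < tol
        psi = new_psi
        if converged:
            break
    ups = 1.0 / (1.0 + snr * psi)
    d = 1.0 / (1.0 + beta * snr * (v * ups).mean(axis=1))
    return psi, ups, d


def optimum_capacity_two_ways(v, beta, snr):
    """Return (C_mmse, C_opt via (3.90), C_opt via (3.93)) in nats."""
    v = np.asarray(v, dtype=float)
    psi, ups, d = solve_profile(v, beta, snr)
    c_mmse = beta * np.mean(np.log1p(snr * psi))                  # (3.86)
    via_390 = (c_mmse
               + np.mean(np.log1p(beta * snr * (v * ups).mean(axis=1)))
               - beta * snr * np.mean(psi * ups))
    via_393 = c_mmse + np.mean(-np.log(d)) + (d.mean() - 1.0)
    return c_mmse, via_390, via_393


rng = np.random.default_rng(2)
profile = rng.uniform(0.2, 2.0, size=(16, 8))  # arbitrary illustrative profile
c_mmse, c_opt_390, c_opt_393 = optimum_capacity_two_ways(profile, beta=0.5, snr=10.0)
```

The two expressions agree to numerical precision, and the gap $C^{\rm opt} - C^{\rm mmse} = E[D - 1 - \log D]$ is nonnegative because $D - 1 - \log D \ge 0$ for every $D > 0$.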

Although (3.90) and (3.93) are equivalent, they admit different interpretations. The latter is a generalization of the capacity given in (3.18). The former, on the other hand, appears as a function of quantities with immediate engineering meaning. More precisely, $\mathsf{SNR}\,\Psi(y, \mathsf{SNR})$ is easily recognized from Theorem 3.1 as the SINR exhibited by the $yK$th user at the output of a linear MMSE receiver. In turn, $\Upsilon(y, \mathsf{SNR})$ is the corresponding mean-square error.
An alternative characterization of the capacity (inspired by the op-
timality by successive cancellation with MMSE protection against un-
cancelled users) is given by

C opt (β, SNR ) = βE [log(1 + SNR (Y, SNR ))] (3.95)

where
⎡ ⎤
υ(X, y)
(y, SNR ) = E ⎣  ⎦ (3.96)
υ(X,Z)
1 + SNR β(1 − y)E 1+SNR (Z,SNR ) |X

where $\mathsf{X}$ and $\mathsf{Z}$ are independent random variables uniform on [0, 1] and [y, 1], respectively.

A slight variation of the standard uplink MC-CDMA setup, namely a multicode version where users are allowed to signal using several simultaneous spreading signatures, is treated in [201]. The asymptotic output SINR of the linear MMSE receiver and the corresponding spectral efficiency with both i.i.d. and orthogonal signatures are computed accounting also for frequency selectivity in the channel. The derivations rely on approximating the user covariance matrices with suitable asymptotically free independent unitarily invariant matrices having compactly supported asymptotic spectra (cf. Example 2.46). The accuracy of this approximation is verified through simulation.

3.2.2 MC-CDMA Downlink

We now turn our attention to the MC-CDMA downlink, where the results take simpler forms.
For the downlink, the structure of the transmitted MC-CDMA signal is identical to that of the uplink, but the difference with (3.74) is that every user experiences the same channel and thus $\mathbf{c}_k = \mathbf{c}$ for all $1 \le k \le K$. As a result, the use of easily detectable orthogonal spreading sequences becomes enticing. We shall thus consider, in addition to sequences with i.i.d. entries, a scenario where the transmitted spreading matrix $\mathbf{S}$ is an $N \times K$ isotropic unitary matrix $\mathbf{Q}$ and thus
$$\mathbf{H} = \mathbf{C}\mathbf{Q}\mathbf{A} \tag{3.97}$$
with $\mathbf{C} = \mathrm{diag}(\mathbf{c})$.
The role of the received signal-to-noise ratio of the kth user is, in this scenario, taken by $|A_k|^2\,\mathsf{SNR}\,E[|\mathsf{C}|^2]$ where $|\mathsf{C}|$ is a random variable whose distribution equals the asymptotic empirical singular value distribution of $\mathbf{C}$.
In our asymptotic analysis, we assume that the empirical singular value distributions of $\mathbf{A}$ and $\mathbf{C}$ converge almost surely to respective nonrandom limiting distributions $F_{|\mathsf{A}|}$ and $F_{|\mathsf{C}|}$.

3.2.2.1 Sequences with i.i.d. Entries


It follows from Remark 2.3.1 that the results for the downlink can be obtained as special cases of those derived for the uplink in Section 3.2.1.
Application of Theorems 2.43 and 2.46 yields the following:

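Theorem 3.3. The multiuser efficiency, $\eta_k^{\rm mmse}$, of the MMSE receiver for the kth user converges almost surely to the solution, $\eta^{\rm mmse}(\mathsf{SNR})$, of the fixed-point equation
$$\eta^{\rm mmse} = \frac{1}{E[|\mathsf{C}|^2]}\, E\left[\frac{|\mathsf{C}|^2}{1 + \mathsf{SNR}\,\beta\,|\mathsf{C}|^2\, E\left[\dfrac{|\mathsf{A}|^2}{1 + |\mathsf{A}|^2\,\mathsf{SNR}\,E[|\mathsf{C}|^2]\,\eta^{\rm mmse}}\right]}\right]. \tag{3.98}$$
In the equal-power case, [202] arrived at (3.98) for a specific choice of the distribution of $|\mathsf{C}|$.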
Unlike in the uplink, in the downlink the asymptotic multiuser effi-
ciency is the same for every user. This means that, asymptotically, all
the users are equivalent. The asymptotic Gaussianity of the multiaccess
interference at the output of the MMSE transformation [275] leads to
the following asymptotic spectral efficiency for the MMSE receiver:

$$C^{\mathrm{mmse}}(\beta,\mathsf{SNR}) = \beta\, E\!\left[\log\!\left(1 + |A|^2\,\mathsf{SNR}\,E[|C|^2]\,\eta^{\mathrm{mmse}}(\mathsf{SNR})\right)\right]. \tag{3.99}$$
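As an illustration of how (3.98) and (3.99) might be evaluated numerically, the following minimal sketch iterates the fixed-point equation directly. It assumes the distributions of |A|² and |C|² are supplied as equally weighted sample arrays; the function names, the starting point and the stopping rule are illustrative choices, not part of the original analysis.

```python
import numpy as np

def mmse_multiuser_efficiency(snr, beta, a2, c2, tol=1e-12, max_iter=10_000):
    """Iterate the fixed-point equation (3.98).

    a2, c2: equally weighted samples of |A|^2 and |C|^2 (an assumption of
    this sketch; any discretization of F_|A| and F_|C| would do).
    """
    a2 = np.asarray(a2, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    mean_c2 = c2.mean()
    eta = 1.0  # start at the single-user value; the iterates decrease
    for _ in range(max_iter):
        # inner expectation over |A| in (3.98)
        inner = np.mean(a2 / (1.0 + a2 * snr * mean_c2 * eta))
        # outer expectation over |C| in (3.98)
        new_eta = np.mean(c2 / (1.0 + snr * beta * c2 * inner)) / mean_c2
        if abs(new_eta - eta) < tol:
            return new_eta
        eta = new_eta
    return eta

def mmse_spectral_efficiency(snr, beta, a2, c2):
    """Evaluate (3.99) in bits, using the solution of (3.98)."""
    eta = mmse_multiuser_efficiency(snr, beta, a2, c2)
    a2 = np.asarray(a2, dtype=float)
    mean_c2 = np.mean(c2)
    return beta * np.mean(np.log2(1.0 + a2 * snr * mean_c2 * eta))
```

With |A| = |C| = 1 the equation collapses to a quadratic in the multiuser efficiency, which gives a convenient sanity check for the iteration.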
Let β′ be the ratio between the effective number of users and the
effective processing gain:

$$\beta' = \beta\,\frac{P[|A| > 0]}{P[|C| > 0]}.$$

The asymptotic spectral efficiency of the decorrelator for β′ ≤ 1 is

$$C^{\mathrm{dec}} = \beta\, E\!\left[\log\!\left(1 + \mathsf{SNR}\,\eta_0\,|A|^2\right)\right] \tag{3.100}$$

where η0 is the decorrelator multiuser efficiency, the positive solution to
(cf. Corollary 2.2)

$$E\!\left[\frac{|C|^2}{E[|C|^2]\,\eta_0 + \beta\,P[|A| > 0]\,|C|^2}\right] = 1. \tag{3.101}$$
Applying Theorem 2.44, we obtain the central characterization of the
capacity of downlink MC-CDMA.

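Theorem 3.4. In the MC-CDMA downlink, the capacity of the optimum
receiver admits the expression

$$C^{\mathrm{opt}}(\beta,\mathsf{SNR}) = C^{\mathrm{mmse}}(\beta,\mathsf{SNR}) + E\!\left[\log(1 + \beta|C|^2\rho)\right] - \beta\,\theta\,\rho\,\log e$$

where

$$\theta\,\rho = 1 - \eta_{|A|^2}(\mathsf{SNR}\,\theta) \tag{3.102}$$
$$\beta\,\theta\,\rho = 1 - \eta_{|C|^2}(\rho\beta). \tag{3.103}$$

Note that $\theta(\mathsf{SNR}) = E[|C|^2]\,\eta^{\mathrm{mmse}}(\mathsf{SNR})$.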

3.2.2.2 Orthogonal Sequences


In this setting we assume that K ≤ N and the channel matrix H can
be written as the product of the N × N diagonal matrix C = diag(c),
an N × K matrix Q containing the spreading sequences and the K × K
diagonal matrix A of complex fading coefficients:
y = CQAx + n.   (3.104)

Here, Q is independent of C and of A and uniformly distributed over
the manifold⁹ of complex N × K matrices such that Q†Q = I.
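The arithmetic mean of the MMSEs for the K users satisfies

$$\frac{1}{K}\sum_{k=1}^{K}\mathrm{MMSE}_k = \frac{1}{K}\,\mathrm{tr}\!\left\{\left(I + \mathsf{SNR}\,A^\dagger Q^\dagger C^\dagger C Q A\right)^{-1}\right\} \tag{3.105}$$
$$\xrightarrow{\text{a.s.}}\ \eta_{A^\dagger Q^\dagger C^\dagger CQA}(\mathsf{SNR}) \tag{3.106}$$
$$= 1 - \frac{1}{\beta}\left(1 - \eta_{CQAA^\dagger Q^\dagger C^\dagger}(\mathsf{SNR})\right) \tag{3.107}$$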
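where (3.107) comes from (2.56). For equal-power users (A = I), from
Example 2.51 we have that

$$\eta_{CQQ^\dagger C^\dagger}(\mathsf{SNR}) = \eta_{CC^\dagger}\!\left(\mathsf{SNR}\,\frac{\beta - 1 + \eta_{CQQ^\dagger C^\dagger}(\mathsf{SNR})}{\eta_{CQQ^\dagger C^\dagger}(\mathsf{SNR})}\right). \tag{3.108}$$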
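From

$$\frac{1}{K}\sum_{k=1}^{K}\mathrm{MMSE}_k = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{1 + \mathsf{SINR}_k} \tag{3.109}$$

it follows that, as K, N → ∞,

$$\frac{1}{K}\sum_{k=1}^{K}\frac{1}{1 + \mathsf{SINR}_k}\ \xrightarrow{\text{a.s.}}\ 1 - \frac{1}{\beta}\left(1 - \eta_{CQQ^\dagger C^\dagger}(\mathsf{SNR})\right). \tag{3.110}$$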

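For equal-power users, the unitary invariance of Q results in each user
admitting the same limiting MMSE and, from $\mathrm{MMSE}_k = \frac{1}{1+\mathsf{SINR}_k}$, the
same limiting SINR:

$$\frac{1}{1 + \mathsf{SINR}_k}\ \xrightarrow{\text{a.s.}}\ \frac{1}{1 + \mathsf{SINR}}. \tag{3.111}$$

Consequently, (3.110) implies that

$$\beta\,\frac{\mathsf{SINR}}{1 + \mathsf{SINR}} = 1 - \eta_{CQQ^\dagger C^\dagger}(\mathsf{SNR})$$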
which, in conjunction with (3.108), means that SINR is the solution to

$$\beta\,\frac{\mathsf{SINR}}{1 + \mathsf{SINR}} = 1 - \eta_{CC^\dagger}\!\left(\mathsf{SNR}\,\frac{\beta}{1 + \mathsf{SINR}\,(1 - \beta)}\right) \tag{3.112}$$

whereas the multiuser efficiency of the kth user achieved by the MMSE
receiver, $\eta_k^{\mathrm{mmse}}(\mathsf{SNR})$, converges almost surely to

$$\eta_k^{\mathrm{mmse}}(\mathsf{SNR}) \to \eta^{\mathrm{mmse}}\!\left(\mathsf{SNR}\,E[|C|^2]\right)$$

where the right side is the solution to the following equation at the
point $\tau = \mathsf{SNR}\,E[|C|^2]$:

$$\frac{\eta^{\mathrm{mmse}}}{1 + \tau\,\eta^{\mathrm{mmse}}} = E\!\left[\frac{|\tilde{C}|^2}{\beta\tau|\tilde{C}|^2 + 1 + (1 - \beta)\,\tau\,\eta^{\mathrm{mmse}}}\right] \tag{3.113}$$

with $|\tilde{C}|^2 = \frac{|C|^2}{E[|C|^2]}$. A fixed-point equation equivalent to (3.113) was
derived in [56].

9 This is called the Stiefel manifold (cf. Section 2, Footnote 2).
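For equal-power users, the spectral efficiencies achieved by the
MMSE receiver and the decorrelator are

$$C^{\mathrm{mmse}}(\beta,\mathsf{SNR}) = \beta\,\log\!\left(1 + \mathsf{SNR}\,E[|C|^2]\,\eta^{\mathrm{mmse}}(\mathsf{SNR})\right) \tag{3.114}$$

and, for 0 ≤ β ≤ 1,

$$C^{\mathrm{dec}}(\beta,\mathsf{SNR}) = \beta\,\log\!\left(1 + \mathsf{SNR}\,E[|C|^2]\,(1 - \beta)\right). \tag{3.115}$$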

In parallel with [217, Eqn. (141)], the capacity of the optimum receiver
is characterized in terms of the η-transform of HH† = CQQ†C†

$$C^{\mathrm{opt}}(\beta,\mathsf{SNR}) = \int_0^{\mathsf{SNR}} \frac{1}{x}\left(1 - \eta_{CQQ^\dagger C^\dagger}(x)\right) dx \tag{3.116}$$

with $\eta_{CQQ^\dagger C^\dagger}(\cdot)$ satisfying (3.108). An alternative characterization of
the capacity (inspired by the optimality of successive cancellation with
MMSE protection against uncancelled users) is given by

$$C^{\mathrm{opt}}(\beta,\mathsf{SNR}) = \beta\,E\!\left[\log\left(1 + \gimel(\mathsf{Y},\mathsf{SNR})\right)\right] \tag{3.117}$$

with

$$\frac{\gimel(y,\mathsf{SNR})}{1 + \gimel(y,\mathsf{SNR})} = E\!\left[\frac{\mathsf{SNR}\,|C|^2}{\beta y\,\mathsf{SNR}\,|C|^2 + 1 + (1 - \beta y)\,\gimel(y,\mathsf{SNR})}\right] \tag{3.118}$$

where Y is a random variable uniform on [0, 1].
The case of unequal-power users has been analyzed in [37] with the
restrictive setup of a finite number of user classes where the power
is allowed to vary across classes but not over users within each class.
Reference [37] shows that the SINR of the kth user at the output of
the MMSE receiver, $\mathsf{SINR}_k$, and consequently $\eta_k^{\mathrm{mmse}}(\mathsf{SNR})$, converge
almost surely to nonrandom limits. Specifically, the multiuser efficiency
converges to the solution η of

$$E\!\left[\frac{|\tilde{C}|^2}{\beta|\tilde{C}|^2\left(1 - \eta_{|A|^2}(\tau\eta)\right) + \eta\left(1 - \beta + \beta\,\eta_{|A|^2}(\tau\eta)\right)}\right] = 1 \tag{3.119}$$

with $\tau = \mathsf{SNR}\,E[|C|^2]$. From the multiuser efficiency, the capacity can be
readily obtained using the optimality of successive interference
cancellation as done in (3.117).

3.2.2.3 Orthogonal Sequences vs i.i.d. Sequences


The multiuser efficiency achieved by the MMSE receiver where i.i.d.
spreading sequences are utilized, given in (3.98), can be rewritten as

$$\frac{\eta^{\mathrm{mmse}}}{1 + \tau\,\eta^{\mathrm{mmse}}} = E\!\left[\frac{|\tilde{C}|^2}{\beta\tau|\tilde{C}|^2 + 1 + \tau\,\eta^{\mathrm{mmse}}}\right] \tag{3.120}$$

with $\tau = \mathsf{SNR}\,E[|C|^2]$. A comparison of (3.120) and (3.113) reveals that,
for a fixed β > 0, the SINR in the i.i.d. case is always less than in
the orthogonal case. Moreover, the performance gain induced by the
use of orthogonal instead of i.i.d. spreading sequences grows when β
approaches 1. If β ∼ 0, then the output SINR in the two cases is
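basically equal. Moreover, from (3.98) and (3.113) it follows respectively
that

$$\mathsf{SINR}^{\mathrm{i.i.d.}} = E\!\left[\frac{|\tilde{C}|^2}{\dfrac{1}{\tau} + \beta\,\dfrac{|\tilde{C}|^2}{1 + \mathsf{SINR}^{\mathrm{i.i.d.}}}}\right] \tag{3.121}$$

and

$$\mathsf{SINR}^{\mathrm{orth}} = E\!\left[\frac{|\tilde{C}|^2}{\dfrac{1}{\tau}\left(1 - \beta\,\dfrac{\mathsf{SINR}^{\mathrm{orth}}}{1 + \mathsf{SINR}^{\mathrm{orth}}}\right) + \beta\,\dfrac{|\tilde{C}|^2}{1 + \mathsf{SINR}^{\mathrm{orth}}}}\right]. \tag{3.122}$$

Notice, by comparing (3.121) and (3.122), that in the latter the term
$\frac{1}{\tau} = \frac{1}{\mathsf{SNR}\,E[|C|^2]}$ is multiplied by $1 - \beta\,\frac{\mathsf{SINR}^{\mathrm{orth}}}{\mathsf{SINR}^{\mathrm{orth}} + 1}$, which is less than 1.
Accordingly, for a given SINR the required SNR is reduced with respect
to the one required with i.i.d. sequences.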
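The comparison between (3.121) and (3.122) can be explored numerically. The sketch below is a hedged illustration: it solves both equations by plain fixed-point iteration started from zero, with the distribution of the normalized gain |C̃|² passed as an equally weighted sample array of unit mean. The function names and the iteration count are assumptions of this sketch.

```python
import numpy as np

def sinr_iid(tau, beta, c2n, iters=5000):
    """Iterate (3.121); c2n holds samples of |C~|^2 with unit mean."""
    c2n = np.asarray(c2n, dtype=float)
    s = 0.0
    for _ in range(iters):
        s = np.mean(c2n / (1.0 / tau + beta * c2n / (1.0 + s)))
    return s

def sinr_orth(tau, beta, c2n, iters=5000):
    """Iterate (3.122); meaningful for beta <= 1 (K <= N)."""
    c2n = np.asarray(c2n, dtype=float)
    s = 0.0
    for _ in range(iters):
        # factor that multiplies 1/tau in (3.122); it never exceeds 1
        shrink = 1.0 - beta * s / (1.0 + s)
        s = np.mean(c2n / (shrink / tau + beta * c2n / (1.0 + s)))
    return s
```

Without fading (|C̃|² = 1), (3.122) is solved by SINR = τ, i.e. orthogonal users do not interfere, while (3.121) reduces to a quadratic whose root is strictly smaller.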

3.2.3 Reduced Rank Receiver for MC-CDMA


In the downlink, the fading experienced by the N subcarriers is common
to all users. The asymptotic weights of the rank-D MMSE receiver for
the downlink can be easily derived from

$$H_m^{\infty} = |A_k|^2 \sum_{n=0}^{m} \binom{m}{n}\,\sigma^{2m-2n}\,\xi_n \tag{3.123}$$

where, in the case of i.i.d. spreading sequences,

$$\xi_n = \beta\,E\!\left[\sum_{\ell=1}^{n} m_{\ell-1}(|C|^2)\,|C|^4 \sum_{\substack{n_1+\cdots+n_i = n-\ell \\ 1\le i\le n-\ell}} E\!\left[|A|^{2i+2}\right]\xi_{n_1-1}\cdots\xi_{n_i-1}\right] \tag{3.124}$$

and

$$m_n(r) = \beta\,r \sum_{\ell=1}^{n} m_{\ell-1}(r) \sum_{\substack{n_1+\cdots+n_i = n-\ell \\ 1\le i\le n-\ell}} E\!\left[|A|^{2i+2}\right]\xi_{n_1-1}\cdots\xi_{n_i-1} \tag{3.125}$$

with |C| and |A| random variables whose distributions equal the asymp-
totic empirical distributions of the singular values of C and A, respec-
tively. In the case of orthogonal sequences, the counterparts of (3.124)
and (3.125) can be found in [105].
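For the uplink, the binomial expansion (3.62) becomes

$$H_m^{\infty} = \sum_{n=0}^{m} \binom{m}{n}\,\sigma^{2m-2n}\,\xi_{n,k} \tag{3.126}$$

where

$$\xi_{n,k} = E[m_n(\mathsf{X})\,v_k(\mathsf{X})] \tag{3.127}$$

with $m_n(\cdot)$ solution to the recursive equation

$$m_n(x) = \beta \sum_{\ell=1}^{n} m_{\ell-1}(x) \sum_{\substack{n_1+\cdots+n_i = n-\ell \\ 1\le i\le n-\ell}} E\Big[\,v(x,\mathsf{Y})\,E\left[v(\mathsf{X},\mathsf{Y})\,m_{n_1-1}(\mathsf{X})\,|\,\mathsf{Y}\right] \cdots E\left[v(\mathsf{X},\mathsf{Y})\,m_{n_i-1}(\mathsf{X})\,|\,\mathsf{Y}\right]\Big] \tag{3.128}$$

where v(·, ·) is the two-dimensional channel profile of B as defined in
Section 3.2.1.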

3.3 Single-User Multi-Antenna Channels


Let us now consider the problem of a single-user channel where the
transmitter has nT antennas while the receiver has nR antennas. (See
[250, 76] for the initial contributions on this topic and [60, 90, 82, 24, 23]
for recent articles of tutorial nature.)

3.3.1 Preliminaries

With reference to the general model in (1.1), x contains the symbols
transmitted from the nT transmit antennas and y the symbols received
by the nR receive antennas with $\frac{n_T}{n_R} \to \beta$ when nT and nR grow large.
The entries of H represent the fading coefficients between each transmit
and each receive antenna normalized such that¹⁰

$$E\!\left[\mathrm{tr}\{HH^\dagger\}\right] = n_R \tag{3.129}$$

while

$$\mathsf{SNR} = \frac{E[\|x\|^2]}{\frac{1}{n_R}\,E[\|n\|^2]}. \tag{3.130}$$

In contrast with the multiaccess scenarios, in this case the signals trans-
mitted by different antennas can be advantageously correlated and thus
the covariance of x becomes relevant. Normalized by its energy per di-
mension, the input covariance is denoted by

$$\Phi = \frac{E[xx^\dagger]}{\frac{1}{n_T}\,E[\|x\|^2]} \tag{3.131}$$

where the normalization ensures that E[tr{Φ}] = nT. It is useful to
decompose this input covariance in its eigenvectors and eigenvalues,
Φ = VPV†. Each eigenvalue represents the (normalized) power allo-
cated to the corresponding signalling eigenvector. Associated with P,
we define an input power profile

$$\mathcal{P}^{(n_R)}(t,\mathsf{SNR}) = P_{j,j}, \qquad \frac{j}{n_R} \le t < \frac{j+1}{n_R}$$

supported on t ∈ (0, β]. This profile specifies the power allocation at
each SNR. As the number of antennas is driven to infinity, $\mathcal{P}^{(n_R)}(t,\mathsf{SNR})$
converges uniformly to a nonrandom function, $\mathcal{P}(t,\mathsf{SNR})$, which we term
asymptotic power profile.

10 Although, in most of the multi-antenna literature, E[tr{HH†}] = nT nR, for consistency
with the rest of the paper we use the normalization in (3.129). In the case that the entries
of H are identically distributed, the resulting variance of each entry is 1/nT.
In order to achieve capacity, the input covariance Φ must be prop-
erly determined depending on the channel-state information (CSI)
available to the transmitter. In this respect, there are three main
regimes of interest:

• The transmitter has full CSI, i.e., access to H instanta-
neously. In this case, Φ can be made a function of H. This
operational regime applies, for example, to fixed wireless ac-
cess systems where transmitter and receiver are stationary
(backhaul, local loop, broadband residential) and to low-
mobility systems (local-area networks, pedestrians). It is par-
ticularly appealing whenever uplink and downlink are recip-
rocal (time-duplexed systems) [48].
• The transmitter has only statistical CSI, i.e., access to the
distribution of H but not to its realization. In this case,
Φ cannot depend on H. This is the usual regime in high-
mobility and wide-area systems, especially if link reciprocity
does not hold.
• The transmitter has no CSI whatsoever.

For all these scenarios, the capacity per receive antenna is given
by the maximum over Φ of the Shannon transform of the averaged
empirical distribution of HΦH†, i.e.,

$$C(\mathsf{SNR}) = \max_{\Phi:\ \mathrm{tr}\,\Phi = n_T} \mathcal{V}_{H\Phi H^\dagger}(\mathsf{SNR}). \tag{3.132}$$

If full CSI is available at the transmitter, then V should coincide
with the eigenvector matrix of H†H and P should be obtained through
a waterfill process on the eigenvalues of H†H [260, 47, 250, 205]. The
resulting jth diagonal entry of P is

$$P_{j,j} = \left(\nu - \frac{1}{\mathsf{SNR}\,\lambda_j(H^\dagger H)}\right)^{+} \tag{3.133}$$

where ν is such that tr{P} = nT. Then, substituting in (3.132),

$$C(\mathsf{SNR}) = \frac{1}{n_R}\,\log\det(I + \mathsf{SNR}\,P\Lambda) \tag{3.134}$$
$$= \beta \int \left(\log(\mathsf{SNR}\,\nu\lambda)\right)^{+} dF^{n_T}_{H^\dagger H}(\lambda) \tag{3.135}$$

with Λ equal to the diagonal eigenvalue matrix of H†H.
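For a single channel realization, the waterfill in (3.133) and the resulting capacity (3.134) can be computed as in the following minimal sketch. The water level ν is found by bisection so that tr{P} = nT; the function names, the bisection bounds and the fixed number of bisection steps are illustrative assumptions rather than a prescribed algorithm.

```python
import numpy as np

def waterfill(eigs, snr, n_t, steps=200):
    """Return (diag of P, nu) satisfying (3.133) with tr{P} = n_t.

    eigs: eigenvalues of H^dagger H (zeros allowed, at least one positive).
    """
    lam = np.asarray(eigs, dtype=float)
    pos = lam > 0
    inv = 1.0 / (snr * lam[pos])        # thresholds 1 / (SNR * lambda_j)
    lo, hi = 0.0, n_t + inv.max()       # tr{P} is 0 at lo and >= n_t at hi
    for _ in range(steps):              # bisection on the water level nu
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - inv, 0.0).sum() > n_t:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    p = np.zeros_like(lam)
    p[pos] = np.maximum(nu - inv, 0.0)
    return p, nu

def capacity_full_csi(H, snr):
    """Evaluate (3.134) in bits per receive antenna for one realization H."""
    n_r, n_t = H.shape
    lam = np.linalg.eigvalsh(H.conj().T @ H).clip(min=0.0)
    p, _ = waterfill(lam, snr, n_t)
    return np.sum(np.log2(1.0 + snr * p * lam)) / n_r
```

For example, with eigenvalues (2, 0.5, 0) and SNR = 1 both nonzero modes are active and ν = 2.75; at SNR = 0.1 only the strongest mode receives power.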

If, instead, only statistical CSI is available, then V should be set, for
all the channels that we will consider, to coincide with the eigenvectors
of E[H† H] while the capacity-achieving power allocation, P, can be
found iteratively [264].

With no CSI, the most reasonable strategy is to transmit an
isotropic signal (Φ = I) [195, 300]. In fact, because of its simplicity
and because many space-time coding schemes conform to it, this strat-
egy may be appealing even if some degree of CSI is available.

3.3.2 Canonical Model


The pioneering analyses that ignited research on this topic [250, 76]
started with H having i.i.d. zero-mean complex Gaussian random en-
tries (all antennas implicitly assumed identical and uncorrelated).
For this canonical channel, the capacity with full CSI converges
asymptotically to [43, 98, 177, 212]

$$C(\mathsf{SNR}) = \beta \int_{\max\{a,\,\nu^{-1}\}}^{b} \log\!\left(\frac{\nu\,\mathsf{SNR}}{\beta}\,\lambda\right) f_\beta(\lambda)\,d\lambda \tag{3.136}$$

where ν satisfies

$$\int_{\max\{a,\,\nu^{-1}\}}^{b} \left(\nu - \frac{\beta}{\mathsf{SNR}\,\lambda}\right)^{+} f_\beta(\lambda)\,d\lambda = 1 \tag{3.137}$$

with a, b and $f_\beta(\cdot)$ given in (1.10).
If $\nu \ge \frac{\beta}{\mathsf{SNR}\,a}$, then the integrals in (3.136) and (3.137) admit closed-
form expressions. Since, with full CSI at the transmitter, the capacity
is reciprocal in terms of the roles played by transmitter and receiver
[250], we have that

$$C(\beta,\mathsf{SNR}) = \beta\,C\!\left(\tfrac{1}{\beta},\mathsf{SNR}\right) \tag{3.138}$$

and thus we need only solve the integrals for β < 1. Applying Example
2.15 to (3.136) and Theorem 2.10 to (3.137) and exploiting (3.138), the
following result is obtained.

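Theorem 3.5. [263] For

$$\mathsf{SNR} \ge \frac{2\,\min\{1,\ \beta^{3/2}\}}{|1 - \sqrt{\beta}|\,|1 - \beta|} \tag{3.139}$$

the capacity of the canonical channel with full CSI at the transmitter
converges almost surely to

$$C(\mathsf{SNR}) = \begin{cases} \beta\,\log\!\left(\dfrac{\mathsf{SNR}}{\beta} + \dfrac{1}{1-\beta}\right) + (1-\beta)\,\log\dfrac{1}{1-\beta} - \beta\,\log e & \beta < 1 \\[2ex] \log\!\left(\beta\,\mathsf{SNR} + \dfrac{\beta}{\beta-1}\right) + (\beta-1)\,\log\dfrac{\beta}{\beta-1} - \log e & \beta > 1. \end{cases}$$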

Theorem 3.5 is illustrated in Figure 3.2 for various numbers of an-
tennas. The solid lines indicate the asymptotic solutions, with the role
of β played by $\frac{n_T}{n_R}$, while the circles show the outcome of corresponding
Monte-Carlo simulations. Notice the power of the asymptotic analysis
for SNR levels satisfying (3.139).
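The closed form of Theorem 3.5 and its range of validity (3.139) are simple to evaluate. The following sketch is one possible transcription, with capacity expressed in bits per receive antenna; the function names and the choice to raise an error outside the range (3.139) are assumptions made for illustration.

```python
import math

def snr_threshold(beta):
    """Right-hand side of (3.139); not defined at beta = 1."""
    return (2.0 * min(1.0, beta ** 1.5)
            / (abs(1.0 - math.sqrt(beta)) * abs(1.0 - beta)))

def capacity_full_csi_asymptotic(beta, snr):
    """Theorem 3.5 in bits per receive antenna, beta = n_T / n_R != 1."""
    if beta == 1.0:
        raise ValueError("Theorem 3.5 does not cover beta = 1")
    if snr < snr_threshold(beta):
        raise ValueError("SNR below the range required by (3.139)")
    log2e = math.log2(math.e)
    if beta < 1.0:
        return (beta * math.log2(snr / beta + 1.0 / (1.0 - beta))
                + (1.0 - beta) * math.log2(1.0 / (1.0 - beta))
                - beta * log2e)
    return (math.log2(beta * snr + beta / (beta - 1.0))
            + (beta - 1.0) * math.log2(beta / (beta - 1.0))
            - log2e)
```

As a usage example, the configuration nT = 2, nR = 4 of Figure 3.2 corresponds to `capacity_full_csi_asymptotic(0.5, snr)`, and multiplying the result by nR gives the total capacity predicted by the asymptotic formula. The two branches are tied together by the reciprocity relation (3.138).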
For β = 1, the asymptotic capacity with full CSI is known only
for SNR → ∞, in which case it coincides with the mutual information
achieved by an isotropic input, presented later in this section [43].
Non-asymptotically in the number of antennas, the capacity with
full transmit CSI is studied in [4, 127]. In [127], in particular, an explicit
expression is given although as function of a parameter that must be
solved for numerically.

[Figure 3.2: capacity (bits/s/Hz) versus SNR (dB), analytical curves and simulation points, for (nT, nR) = (4, 6), (2, 6) and (2, 4).]
Fig. 3.2 Capacity of a canonical channel with various numbers of transmit and receive
antennas. The arrows indicate the SNR above which (3.139) is satisfied.

With statistical CSI, it was shown in [250] that capacity is achieved
with Φ = I. For fixed number of antennas, [250] gave an integral ex-
pression (integrating log(1 + SNR λ) with respect to the p.d.f. in (2.23))
for the expected capacity as a function of the signal-to-noise ratio and
the number of transmit and receive antennas. This integral involving
the Laguerre polynomials lends itself to an explicit expression. This has
been accomplished in [219, 61, 126]. In particular, [126] uses the Mellin
transform and Theorem 2.30 to arrive at a closed-form expression, and
[61] gives the expression in Example 2.17.
Asymptotically, as the numbers of transmit and receive antennas
grow with ratio β, the capacity per receive antenna converges almost
surely to [275, 206]

$$C(\beta,\mathsf{SNR}) = \beta\,\log\!\left(1 + \frac{\mathsf{SNR}}{\beta} - \frac{1}{4}\,\mathcal{F}\!\left(\frac{\mathsf{SNR}}{\beta},\beta\right)\right) + \log\!\left(1 + \mathsf{SNR} - \frac{1}{4}\,\mathcal{F}\!\left(\frac{\mathsf{SNR}}{\beta},\beta\right)\right) - \beta\,\frac{\log e}{4\,\mathsf{SNR}}\,\mathcal{F}\!\left(\frac{\mathsf{SNR}}{\beta},\beta\right) \tag{3.140}$$

with F(·, ·) given in (1.17). Notice that this capacity coincides, except
for a signal-to-noise scaling, with that of an unfaded equal-power DS-
CDMA channel.¹¹
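If β = 1, the asymptotic capacity per receive antenna with statisti-
cal CSI at the transmitter is equal to

$$C(\beta,\mathsf{SNR}) = 2\,\log\!\left(\frac{1 + \sqrt{1 + 4\,\mathsf{SNR}}}{2}\right) - \frac{\log e}{4\,\mathsf{SNR}}\left(\sqrt{1 + 4\,\mathsf{SNR}} - 1\right)^2$$

evidencing the linear growth with the number of antennas originally
observed in [250, 76]. Further insight can be drawn, for arbitrary β,
from the high-SNR behavior of the capacity (cf. Example 2.15):

$$C(\mathsf{SNR}) = \begin{cases} \log\dfrac{\mathsf{SNR}}{e} - (\beta-1)\,\log\dfrac{\beta-1}{\beta} + o(1) & \beta > 1 \\[2ex] \log\dfrac{\mathsf{SNR}}{e} + o(1) & \beta = 1 \\[2ex] \beta\,\log\dfrac{\mathsf{SNR}}{\beta e} - (1-\beta)\,\log(1-\beta) + o(1) & \beta < 1. \end{cases}$$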
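The formulas above can be cross-checked numerically. The sketch below evaluates (3.140), its β = 1 specialization and the high-SNR expansion in bits. It assumes that the function F of (1.17), which is not restated in this section, has the usual form F(x, z) = (√(x(1+√z)² + 1) − √(x(1−√z)² + 1))²; all function names are illustrative.

```python
import math

def F(x, z):
    """Assumed form of F(x, z) from (1.17); see the lead-in above."""
    root_z = math.sqrt(z)
    return (math.sqrt(x * (1.0 + root_z) ** 2 + 1.0)
            - math.sqrt(x * (1.0 - root_z) ** 2 + 1.0)) ** 2

def capacity_isotropic(beta, snr):
    """Evaluate (3.140) in bits per receive antenna."""
    f = F(snr / beta, beta)
    return (beta * math.log2(1.0 + snr / beta - f / 4.0)
            + math.log2(1.0 + snr - f / 4.0)
            - beta * math.log2(math.e) * f / (4.0 * snr))

def capacity_isotropic_square(snr):
    """The beta = 1 expression, in bits."""
    root = math.sqrt(1.0 + 4.0 * snr)
    return (2.0 * math.log2((1.0 + root) / 2.0)
            - math.log2(math.e) * (root - 1.0) ** 2 / (4.0 * snr))

def capacity_high_snr(beta, snr):
    """High-SNR expansion without the o(1) term, in bits."""
    if beta > 1.0:
        return (math.log2(snr / math.e)
                - (beta - 1.0) * math.log2((beta - 1.0) / beta))
    if beta == 1.0:
        return math.log2(snr / math.e)
    return (beta * math.log2(snr / (beta * math.e))
            - (1.0 - beta) * math.log2(1.0 - beta))
```

At β = 1 the general expression and the specialized one agree to machine precision, and for β ≠ 1 the expansion is already accurate to a small fraction of a bit at moderate SNR.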
Besides asymptotically in the number of antennas, the high-SNR capac-
ity can be characterized for fixed nT and nR via (2.12) in Theorem 2.11.
Also in this case, the capacity is seen to scale linearly with the num-
ber of antennas, more precisely with min(nT , nR ). While this scaling
makes multi-antenna communication highly appealing, it hinges on the
validity of the idealized canonical channel model. Much of the research
that has ensued, surveyed in the remainder of this section, is geared
precisely at accounting for various nonidealities (correlation, determin-
istic channel components, etc) that have the potential of compromising
this linear scaling.

3.3.3 Separable Correlation Model


The most immediate effect that results from locating various antennas
in close proximity is that their signals tend to be, to some extent,
correlated. In its full generality, the correlation between the (i, j) and
(i′, j′) entries of H is given by

$$r_H(i, i', j, j') = E\!\left[H_{i,j}\,H^*_{i',j'}\right]. \tag{3.141}$$

In a number of interesting cases, however, correlation turns out to
be a strictly local phenomenon that can be modeled in a simplified
manner. To that end, the so-called separable (also termed Kronecker
or product) correlation model was proposed by several authors [220, 40,
203]. According to this model, an nR × nT matrix Hw, whose entries
are i.i.d. zero-mean with variance 1/nT, is pre- and post-multiplied by
the square root of deterministic matrices, ΘT and ΘR, whose entries
represent, respectively, the correlation between the transmit antennas
and between the receive antennas:

$$H = \Theta_R^{1/2}\,H_w\,\Theta_T^{1/2}. \tag{3.142}$$

11 In addition to its role in the analysis of multiaccess and single-user multi-antenna chan-
nels, (3.140) also plays a role in the analysis of the total capacity of the Gaussian
broadcast channel with multiple antennas at the transmitter [112, 282]. As shown in
[35, 316, 276, 280, 128, 259, 278, 36, 277, 302], in various degrees of generality, the multi-
antenna broadcast channel capacity region is equal to the union of capacity regions of the
dual multiaccess channel, where the union is taken over all individual power constraints
that sum to the averaged power constraint.

Implied by this model is that the correlation between two transmit
antennas is the same regardless of the receive antenna at which the
observation is made and viceversa. As confirmed experimentally in [41],
this condition is often satisfied in outdoor environments if the arrays are
composed by antennas with similar polarization and radiation patterns.
When (3.142) holds, the correlation in (3.141) can be expressed (cf.
Definition 2.9) as

$$r_H(i, i', j, j') = \frac{(\Theta_R)_{i,i'}\,(\Theta_T)_{j,j'}}{n_T}. \tag{3.143}$$
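To make the separable model concrete, the following sketch draws realizations of H according to (3.142), taking Hw to be complex Gaussian (one admissible choice of i.i.d. zero-mean entries with variance 1/nT). The helper names `psd_sqrt` and `sample_separable_channel` are hypothetical, and the matrix square root is taken as the Hermitian square root via an eigendecomposition.

```python
import numpy as np

def psd_sqrt(theta):
    """Hermitian square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(theta)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def sample_separable_channel(theta_r, theta_t, rng):
    """Draw H = Theta_R^{1/2} H_w Theta_T^{1/2} as in (3.142).

    H_w is complex Gaussian with variance 1/n_T per entry (assumption).
    """
    n_r, n_t = theta_r.shape[0], theta_t.shape[0]
    hw = (rng.standard_normal((n_r, n_t))
          + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2.0 * n_t)
    return psd_sqrt(theta_r) @ hw @ psd_sqrt(theta_t)
```

Averaging products of entries over many draws approximates (3.141), which under this model should approach the product form (3.143) for real symmetric correlation matrices.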
Results on the asymptotic capacity and mutual information, with vari-
ous levels of transmitter information, of channels that obey this model
can be found in [181, 262, 43, 263, 178]. Analytical non-asymptotic
expressions have also been reported: in [208, 209, 2], the capacity of
one-sided correlated channels is obtained starting from the joint dis-
tribution of the eigenvalues of a Wishart matrix ∼ Wm (n, Σ) given in
Theorem 2.18 and (2.19). References [135, 234, 149, 39] compute the
moment generating function of the mutual information of a one-sided
correlated MIMO channel, constraining the eigenvalues of the correla-
tion matrix to be distinct. The two-sided correlated MIMO channel is
analyzed in [148, 231, 149] also through the moment generating func-
tion of the mutual information (cf. (2.16)).
With full CSI at the transmitter, the asymptotic capacity is [43]

$$C(\mathsf{SNR}) = \beta \int_0^{\infty} \left(\log(\mathsf{SNR}\,\nu\lambda)\right)^{+} dG(\lambda) \tag{3.144}$$

where ν satisfies

$$\int_0^{\infty} \left(\nu - \frac{1}{\mathsf{SNR}\,\lambda}\right)^{+} dG(\lambda) = 1 \tag{3.145}$$
with G(·) the asymptotic spectrum of H† H whose η-transform can be
derived using Theorem 2.43 and Lemma 2.28. Invoking Theorem 2.45,
the capacity in (3.144) can be evaluated as follows.

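Theorem 3.6. [263] Let ΛR and ΛT be independent random variables
whose distributions are the asymptotic spectra of the full-rank matrices
ΘR and ΘT respectively. Further define

$$\Lambda_1 = \begin{cases} \Lambda_T & \beta < 1 \\ \Lambda_R & \beta > 1 \end{cases} \qquad \Lambda_2 = \begin{cases} \Lambda_R & \beta < 1 \\ \Lambda_T & \beta > 1 \end{cases} \tag{3.146}$$

and let κ be the infimum (excluding any mass point at zero) of the
support of the asymptotic spectrum of H†H. For

$$\mathsf{SNR} \ge \frac{1}{\kappa} - \delta\,E\!\left[\frac{1}{\Lambda_1}\right] \tag{3.147}$$

with δ satisfying

$$\eta_{\Lambda_2}(\delta) = 1 - \min\{\beta, \tfrac{1}{\beta}\},$$

the asymptotic capacity of a channel with separable correlations and
full CSI at the transmitter is

$$C(\mathsf{SNR}) = \begin{cases} \beta\,E\!\left[\log\dfrac{\Lambda_T}{e\vartheta}\right] + \mathcal{V}_{\Lambda_R}(\vartheta) + \beta\,\log\!\left(\mathsf{SNR} + \vartheta\,E\!\left[\dfrac{1}{\Lambda_T}\right]\right) & \beta < 1 \\[2ex] E\!\left[\log\dfrac{\Lambda_R}{\alpha e}\right] + \beta\,\mathcal{V}_{\Lambda_T}(\alpha) + \log\!\left(\mathsf{SNR} + \alpha\,E\!\left[\dfrac{1}{\Lambda_R}\right]\right) & \beta > 1 \end{cases}$$

with α and ϑ the solutions to

$$\eta_{\Lambda_T}(\alpha) = 1 - \frac{1}{\beta} \qquad \eta_{\Lambda_R}(\vartheta) = 1 - \beta.$$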

As for the canonical channel, no asymptotic characterization of the
capacity with full CSI at the transmitter is known for β = 1 and arbi-
trary SNR.

When the correlation is present only at either the transmit or re-
ceive ends of the link, the solutions in Theorem 3.6 sometimes become
explicit:

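Corollary 3.1. With correlation at the end of the link with the fewest
antennas, the capacity per antenna with full CSI at the transmitter
converges to

$$C = \begin{cases} \beta\,E\!\left[\log\dfrac{\Lambda_T}{e}\right] + \log\dfrac{1}{1-\beta} + \beta\,\log\!\left(\dfrac{1-\beta}{\beta}\,\mathsf{SNR} + E\!\left[\dfrac{1}{\Lambda_T}\right]\right) & \beta < 1,\ \Lambda_R = 1 \\[2ex] E\!\left[\log\dfrac{\Lambda_R}{e}\right] - \beta\,\log\dfrac{\beta-1}{\beta} + \log\!\left(\mathsf{SNR}\,(\beta-1) + E\!\left[\dfrac{1}{\Lambda_R}\right]\right) & \beta > 1,\ \Lambda_T = 1. \end{cases}$$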

With statistical CSI at the transmitter, achieving capacity requires
that the eigenvectors of the input covariance, Φ, coincide with those
of ΘT [279, 123]. Consequently, denoting by ΛT and ΛR the diagonal
eigenvalue matrices of ΘT and ΘR, respectively, we have that

$$C(\beta,\mathsf{SNR}) = \frac{1}{N}\,\log\det\!\left(I + \mathsf{SNR}\,\Lambda_R^{1/2}\,H_w\,\Lambda_T^{1/2}\,P\,\Lambda_T^{1/2}\,H_w^\dagger\,\Lambda_R^{1/2}\right)$$

where P is the capacity-achieving power allocation [264]. Applying The-
orem 2.44, we obtain:

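Theorem 3.7. [262] The capacity of a Rayleigh-faded channel with
separable transmit and receive correlation matrices ΘT and ΘR and
statistical CSI at the transmitter converges to

$$C(\beta,\mathsf{SNR}) = \beta\,E\!\left[\log(1 + \mathsf{SNR}\,\Lambda\,\Gamma(\mathsf{SNR}))\right] + E\!\left[\log(1 + \mathsf{SNR}\,\Lambda_R\,\Upsilon(\mathsf{SNR}))\right] - \beta\,\mathsf{SNR}\,\Gamma(\mathsf{SNR})\,\Upsilon(\mathsf{SNR})\,\log e \tag{3.148}$$

where

$$\Gamma(\mathsf{SNR}) = \frac{1}{\beta}\,E\!\left[\frac{\Lambda_R}{1 + \mathsf{SNR}\,\Lambda_R\,\Upsilon(\mathsf{SNR})}\right] \tag{3.149}$$
$$\Upsilon(\mathsf{SNR}) = E\!\left[\frac{\Lambda}{1 + \mathsf{SNR}\,\Lambda\,\Gamma(\mathsf{SNR})}\right] \tag{3.150}$$

with expectation over Λ and ΛR whose distributions are given by the
asymptotic empirical eigenvalue distributions of ΛT P and ΘR, respec-
tively.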
If the input is isotropic, the achievable mutual information is easily
found from the foregoing result.

[Figure 3.3: mutual information (bits/s/Hz) versus SNR (dB), analytical curves and simulation points, for transmit antenna spacings d = 1 and d = 2 and for the i.i.d. case.]
Fig. 3.3 Mutual information achieved by an isotropic input on a Rayleigh-faded channel
with nT = 4 and nR = 2. The transmitter is a uniform linear array whose antenna corre-
lation is given by (3.151) where d is the spacing (wavelengths) between adjacent antennas.
The receive antennas are uncorrelated.

Corollary 3.2. [266] Consider a channel defined as in Theorem 3.7


and an isotropic input. Expression (3.148) yields the mutual infor-
mation with the distribution of Λ given by the asymptotic empirical
eigenvalue distribution of ΘT .

This corollary is illustrated in Fig. 3.3, which depicts the mutual
information (bits/s/Hz) achieved by an isotropic input for a wide range
of SNR. The channel is Rayleigh-faded with nT = 4 correlated antennas
and nR = 2 uncorrelated antennas. The correlation between the ith
and jth transmit antennas is

$$(\Theta_T)_{i,j} = e^{-0.05\,d^2\,(i-j)^2} \tag{3.151}$$

which corresponds to a uniform linear array with antenna separation d
(wavelengths) exposed to a broadside Gaussian azimuth angular spec-
trum with a 2° root-mean-square spread [42]. Such angular spread is
typical of an elevated base station in rural or suburban areas. The solid
lines depict the analytical solution obtained by applying Theorem 3.7
with P = I and ΘR = I and with the expectations over Λ replaced
with arithmetic averages over the eigenvalues of ΘT. The circles, in
turn, show the result of Monte-Carlo simulations. Notice the excellent
agreement even for such small numbers of antennas.
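The analytical curves described above can be reproduced along the following lines. This is a hedged sketch: it solves the coupled equations (3.149) and (3.150) by alternating substitution and then evaluates (3.148) in bits, with expectations replaced by arithmetic averages over supplied eigenvalues. The function names and the fixed iteration count are assumptions of this sketch.

```python
import numpy as np

def ula_correlation(n, d):
    """Transmit correlation matrix (3.151) for n antennas with spacing d."""
    idx = np.arange(n)
    return np.exp(-0.05 * d ** 2 * (idx[:, None] - idx[None, :]) ** 2)

def capacity_separable(snr, beta, lam, lam_r, iters=2000):
    """Evaluate (3.148) in bits, solving (3.149)-(3.150) by iteration.

    lam:   eigenvalues standing in for Lambda (those of Theta_T when P = I)
    lam_r: eigenvalues standing in for Lambda_R
    """
    lam = np.asarray(lam, dtype=float)
    lam_r = np.asarray(lam_r, dtype=float)
    gamma, upsilon = 1.0, 1.0
    for _ in range(iters):
        gamma = np.mean(lam_r / (1.0 + snr * lam_r * upsilon)) / beta  # (3.149)
        upsilon = np.mean(lam / (1.0 + snr * lam * gamma))             # (3.150)
    return (beta * np.mean(np.log2(1.0 + snr * lam * gamma))
            + np.mean(np.log2(1.0 + snr * lam_r * upsilon))
            - beta * snr * gamma * upsilon * np.log2(np.e))
```

For the setting of Fig. 3.3 one would call, for instance, `capacity_separable(snr, 2.0, np.linalg.eigvalsh(ula_correlation(4, 1.0)), [1.0, 1.0])` and multiply by nR = 2 to obtain the total mutual information in bits/s/Hz. With ΘT = I and ΘR = I the result reduces to the canonical expression (3.140).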

The high-SNR behaviors of the capacity with statistical CSI and of


the mutual information achieved by an isotropic input can be charac-
terized, asymptotically in the number of antennas, using Theorem 2.45.
For arbitrary nT and nR , such characterizations can be found in [165].

3.3.4 Non-Separable Correlations


While the separable correlation model is relatively simple and analyt-
ically appealing, it also has clear limitations, particularly in terms of
representing indoor propagation environments [194]. Also, it does not
accommodate diversity mechanisms such as polarization12 and radia-
tion pattern diversity13 that are becoming increasingly popular as they
enable more compact arrays. The use of different polarizations and/or
radiation patterns creates correlation structures that cannot be repre-
sented through the separable model.
In order to encompass a broader range of correlations, we model the
channel as

$$H = U_R\,\tilde{H}\,U_T^\dagger \tag{3.152}$$

where UR and UT are unitary while the entries of H̃ are independent
zero-mean Gaussian. This model is advocated and experimentally sup-
ported in [301] and its capacity is characterized asymptotically in [262].
For the more restrictive case where UR and UT are Fourier matrices,
the model (3.152) was proposed earlier in [213].

12 Polarization diversity: Antennas with orthogonal polarizations are used to ensure low
levels of correlation with minimum or no antenna spacing [156, 236] and to make the
communication link robust to polarization rotations in the channel [19].
13 Pattern diversity: Antennas with different radiation patterns or with rotated versions of
the same pattern are used to discriminate different multipath components and reduce
correlation.

The matrices H and H̃ are directly related through the Karhunen-
Loève expansion (cf. Lemma 2.25) with the variances of the entries of
H̃ given by the eigenvalues of rH obtained by solving the system of
equations in (2.33). Furthermore, from Theorem 2.58, the asymptotic
spectrum of H is fully characterized by the variances of the entries of
H̃, which we assemble in a matrix G such that $G_{i,j} = n_T\,E[|\tilde{H}_{i,j}|^2]$ with

$$\sum_{i,j} G_{i,j} = n_T\,n_R. \tag{3.153}$$

Invoking Definition 2.16, we introduce the variance profile of H̃,
which maps the entries of G onto a two-dimensional piece-wise constant
function

$$\mathcal{G}^{(n_R)}(r,t) = G_{i,j}, \qquad \frac{i}{n_R} \le r < \frac{i+1}{n_R},\quad \frac{j}{n_T} \le t < \frac{j+1}{n_T} \tag{3.154}$$
supported on r, t ∈ [0, 1]. We can interpret r and t as normalized re-
ceive and transmit antenna indices. It is assumed that, as the number
of antennas grows, G (nR ) (r, t) converges uniformly to the asymptotic
variance profile, G(r, t). The normalization condition in (3.153) implies
that
E[G(R, T)] = 1 (3.155)
with R and T independent random variables uniform on [0, 1].

With full CSI at the transmitter, the asymptotic capacity is given by


(3.144) and (3.145) with G(·) representing the asymptotic spectrum of
H† H. Using Theorems 2.58 and 2.54, an explicit expression for C(SNR )
can be obtained for sufficiently high SNR .

With statistical CSI at the transmitter, the eigenvectors of the


capacity-achieving input covariance coincide with the columns of UT
in (3.152) [261, 268]. In order to characterize the capacity, we invoke
Theorem 2.53 to obtain the following.

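Theorem 3.8. [262] Consider the channel H = UR H̃ U†T where UR
and UT are unitary while the entries of H̃ are zero-mean Gaussian and
independent. Denote by G(r, t) the asymptotic variance profile of H̃.
With statistical CSI at the transmitter, the asymptotic capacity is

$$C(\beta,\mathsf{SNR}) = \beta\,E\!\left[\log\!\left(1 + \mathsf{SNR}\,E[\mathcal{G}(\mathsf{R},\mathsf{T})\,\mathcal{P}(\mathsf{T},\mathsf{SNR})\,\Gamma(\mathsf{R},\mathsf{SNR})\,|\,\mathsf{T}]\right)\right]$$
$$\qquad + E\!\left[\log\!\left(1 + E[\mathcal{G}(\mathsf{R},\mathsf{T})\,\mathcal{P}(\mathsf{T},\mathsf{SNR})\,\Upsilon(\mathsf{T},\mathsf{SNR})\,|\,\mathsf{R}]\right)\right]$$
$$\qquad - \beta\,E\!\left[\mathcal{G}(\mathsf{R},\mathsf{T})\,\mathcal{P}(\mathsf{T},\mathsf{SNR})\,\Gamma(\mathsf{R},\mathsf{SNR})\,\Upsilon(\mathsf{T},\mathsf{SNR})\right]\log e$$

with expectation over the independent random variables R and T uni-
form on [0, 1] and with

$$\beta\,\Gamma(r,\mathsf{SNR}) = \frac{1}{1 + E[\mathcal{G}(r,\mathsf{T})\,\mathcal{P}(\mathsf{T},\mathsf{SNR})\,\Upsilon(\mathsf{T},\mathsf{SNR})]}$$
$$\Upsilon(t,\mathsf{SNR}) = \frac{\mathsf{SNR}}{1 + \mathsf{SNR}\,E[\mathcal{G}(\mathsf{R},t)\,\mathcal{P}(t,\mathsf{SNR})\,\Gamma(\mathsf{R},\mathsf{SNR})]}$$

where P(t, SNR) is the asymptotic power profile of the capacity-achieving
power allocation at each SNR.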

[Figure 3.4: mutual information (bits/s/Hz) versus SNR (dB), analytical curve and simulation points, for the variance matrix G of (3.156).]
Fig. 3.4 Mutual information achieved by an isotropic input on a Rayleigh-faded channel
with nT = 3 and nR = 2 for the variance matrix G in (3.156).
Corollary 3.3. [266] Consider a channel defined as in Theorem 3.8 but
with an isotropic input. Theorem 3.8 yields the mutual information by
setting P(t, SNR) = 1.

This corollary is illustrated in Fig. 3.4 for a Rayleigh-faded channel
with nT = 3 and nR = 2 where H = UR H̃ U†T with the entries of H̃
being independent with zero-mean and variances given by

$$G = \begin{bmatrix} 0.4 & 3.6 & 0.5 \\ 0.3 & 1 & 0.2 \end{bmatrix}. \tag{3.156}$$

Despite the very small numbers of antennas, there is full agreement


between the analytical values (obtained by applying Theorem 3.8 with
P(t, SNR ) = 1 and with the expectations replaced by arithmetic averages
over the entries of G) and the outcome of corresponding Monte-Carlo
simulations.
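The finite-dimensional evaluation just described, with expectations replaced by arithmetic averages over the entries of G, can be sketched as follows. The code solves the coupled equations of Theorem 3.8 by alternating substitution and returns the result in bits per receive antenna; the function name, the optional power vector and the iteration count are illustrative assumptions.

```python
import numpy as np

def capacity_variance_profile(snr, G, P=None, iters=5000):
    """Theorem 3.8 with averages over the entries of an n_R x n_T matrix G.

    P: per-transmit-antenna powers (defaults to the isotropic input P = 1).
    """
    G = np.asarray(G, dtype=float)
    n_r, n_t = G.shape
    beta = n_t / n_r
    P = np.ones(n_t) if P is None else np.asarray(P, dtype=float)
    GP = G * P[None, :]
    gamma = np.full(n_r, 1.0 / beta)
    upsilon = np.full(n_t, float(snr))
    for _ in range(iters):
        # beta * Gamma(r) = 1 / (1 + E[G(r,T) P(T) Upsilon(T)])
        gamma = 1.0 / (beta * (1.0 + GP @ upsilon / n_t))
        # Upsilon(t) = SNR / (1 + SNR E[G(R,t) P(t) Gamma(R)])
        upsilon = snr / (1.0 + snr * (GP.T @ gamma) / n_r)
    term_t = np.mean(np.log2(1.0 + snr * (GP.T @ gamma) / n_r))
    term_r = np.mean(np.log2(1.0 + GP @ upsilon / n_t))
    cross = np.mean(GP * gamma[:, None] * upsilon[None, :])
    return beta * term_t + term_r - beta * cross * np.log2(np.e)
```

For the matrix in (3.156), `capacity_variance_profile(snr, [[0.4, 3.6, 0.5], [0.3, 1.0, 0.2]])` multiplied by nR = 2 gives the total mutual information in bits/s/Hz. When every entry of G equals 1 the computation reduces to the canonical formula (3.140), which offers a check on the iteration.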

Asymptotic characterizations of the high-SNR capacity with statis-


tical CSI and of the mutual information achieved by an isotropic input
can be obtained via Theorem 2.54.
Asymptotic spectrum results have also been used in [161] to charac-
terize the wideband capacity of correlated multi-antenna channel using
the tools of [274].

3.3.5 Polarization Diversity

A particularly interesting channel is generated if antennas with mixed


polarizations are used and there is no correlation, in which case the
entries of H are independent but not identically distributed because of
the different power gain between co-polarized and differently polarized
antennas. In this case, the eigenvalues of rH coincide with the variance
of the entries of H, which we can model as

H = A ◦ Hw (3.157)

where ◦ indicates Hadamard (element-wise) multiplication, Hw is com-
posed of zero-mean i.i.d. Gaussian entries with variance 1/nT and A is
a deterministic matrix with nonnegative entries. Each $|A_{i,j}|^2$ symbol-
izes the power gain between the jth transmit and ith receive antennas,
determined by their relative polarizations.¹⁴

The asymptotic capacity with full CSI at the transmitter can be


found, for sufficiently high SNR , by invoking Theorems 2.58 and 2.54.

Since the entries of H are independent, the input covariance that


achieves capacity with statistical CSI is diagonal [261, 268]. The cor-
responding asymptotic capacity per antenna equals the one given in
Theorem 3.8 with G(r, t) the asymptotic variance profile of H. Corol-
lary 3.3 holds similarly. Furthermore, these solutions do not require
that the entries of H be Gaussian but only that their variances be
uniformly bounded.

A common structure for A, arising when the transmit and receive


arrays have an equal number of antennas on each polarization, is that
of a doubly-regular form (cf. Definition 2.10). For such channels, the
capacity-achieving input is not only diagonal but isotropic and, apply-
ing Theorem 2.49, the capacity admits an explicit form.

Theorem 3.9. Consider a channel H = A ◦ Hw where the entries of
A are deterministic and nonnegative while those of Hw are zero-mean
and independent, with variance 1/nT but not necessarily identically dis-
tributed. If A is doubly-regular (cf. Definition 2.10), the asymptotic
capacity per antenna, with full CSI or with statistical CSI at the trans-
mitter, coincides with that of the canonical channel, given in Theorem
3.5 and Eq. (3.140) respectively.

3.3.6 Progressive Scattering


Let us postulate the existence of L−1 clusters of scatterers each with n
scattering objects, 1 ≤
≤ L − 1, such that the signal propagates from
the transmit array to the first cluster, from there to the second cluster
and so on, until it is received from the (L − 1)th cluster by the receiver

14 If all antennas are co-polar, then every entry of A equals 1.


144 Applications to Wireless Communications

array. This practically motivated model provides a nice application of


the S-transform.
The matrix H describing the communication link with progressive
scattering be written as the product of L independent random matrices
[184]


L
H= H (3.158)
=1

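where the n_ℓ × n_{ℓ−1} matrix H_ℓ describes the subchannel between the
(ℓ − 1)th and ℓth clusters. (We conventionally denote as 0th and Lth
clusters the transmit and the receive arrays themselves.) If the
matrices H_ℓ are mutually independent with zero-mean i.i.d. entries having
variance 1/n_ℓ, and defining β_ℓ = n_ℓ/n_L, the S-transform of the matrix

\begin{align}
\mathbf{A}_L &= \left( \prod_{\ell=1}^{L} \mathbf{H}_\ell \right) \left( \prod_{\ell=1}^{L} \mathbf{H}_\ell \right)^{\dagger} \tag{3.159} \\
&= \mathbf{H}_L \mathbf{A}_{L-1} \mathbf{H}_L^{\dagger} \tag{3.160}
\end{align}

can be computed using Example 2.53 as

\[ \Sigma_{\mathbf{A}_L}(x) = \frac{1}{x + \beta_{L-1}} \, \Sigma_{\mathbf{A}_{L-1}}\!\left( \frac{x}{\beta_{L-1}} \right) \tag{3.161} \]

which, applying Example 2.53 iteratively, yields

\[ \Sigma_{\mathbf{A}_L}(x) = \prod_{\ell=1}^{L} \frac{\beta_\ell}{x + \beta_{\ell-1}} \tag{3.162} \]

from which it follows that the η-transform of A_L is the solution to

\[ \mathsf{SNR} \, \frac{\eta_{\mathbf{A}_L}(\mathsf{SNR})}{1 - \eta_{\mathbf{A}_L}(\mathsf{SNR})} = \prod_{\ell=1}^{L} \frac{\beta_\ell}{\eta_{\mathbf{A}_L}(\mathsf{SNR}) + \beta_{\ell-1} - 1}. \tag{3.163} \]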
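For given size ratios, (3.163) can be solved numerically. The following is a minimal Python sketch and not part of the original text: the function name `eta_progressive`, the argument layout and the use of bisection are assumptions made for illustration. It assumes the relevant root lies in the interval max(0, 1 − min β_ℓ) < η < 1, where the left-hand side of (3.163) increases with η and the right-hand side decreases, so that at most one root exists there.

```python
def eta_progressive(snr, betas, tol=1e-12):
    """Solve (3.163) for eta = eta_{A_L}(SNR) by bisection.

    betas = [beta_0, beta_1, ..., beta_L] with beta_l = n_l / n_L,
    so the last entry is expected to equal 1 (illustrative convention).
    """
    num_stages = len(betas) - 1

    def residual(eta):
        # Left-hand side minus right-hand side of (3.163).
        lhs = snr * eta / (1.0 - eta)
        rhs = 1.0
        for l in range(1, num_stages + 1):
            rhs *= betas[l] / (eta + betas[l - 1] - 1.0)
        return lhs - rhs

    # Search interval on which every denominator in the product is positive.
    lo = max(0.0, 1.0 - min(betas[:-1]))
    hi = 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # One stage (L = 1): no scattering clusters, beta_0 = nT / nR = 0.5.
    print(eta_progressive(10.0, [0.5, 1.0]))
    # Two stages (L = 2): one intermediate cluster with n_1 / n_2 = 0.5.
    print(eta_progressive(5.0, [2.0, 0.5, 1.0]))
```

For L = 1 the equation reduces to a quadratic in η, which offers a simple way to sanity-check the solver.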

3.3.7 Ricean Channel


Every zero-mean multi-antenna channel model analyzed thus far can be
made Ricean by incorporating an additional deterministic component
H̄ [62, 74, 236]. With proper weighting of the random and deterministic
components so that condition (3.129) is preserved, the general model
then becomes

\[ \mathbf{y} = \left( \sqrt{\frac{1}{K+1}} \, \mathbf{H} + \sqrt{\frac{K}{K+1}} \, \bar{\mathbf{H}} \right) \mathbf{x} + \mathbf{n} \tag{3.164} \]

with the scalar Ricean factor K quantifying the ratio between the
Frobenius norm of the deterministic (unfaded) component and the
expected Frobenius norm of the random (faded) component. Considered
individually, each (i, j)th channel entry has a Ricean factor given by

\[ K \, \frac{|\bar{H}_{i,j}|^2}{E[|H_{i,j}|^2]} . \]

Using Lemma 2.22 the next result follows straightforwardly.

Theorem 3.10. Consider a channel with a Ricean term whose rank
is finite. The asymptotic capacity per antenna, C^rice(β, SNR), equals the
corresponding asymptotic capacity per antenna in the absence of the
Ricean component, C(β, SNR), with a simple SNR penalty:

\[ C^{\mathrm{rice}}(\beta, \mathsf{SNR}) = C\!\left( \beta, \frac{\mathsf{SNR}}{K+1} \right). \tag{3.165} \]

Note that, while the value of the capacity depends on the degree of CSI
available at the transmitter, (3.165) holds regardless.
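The penalty in (3.165) amounts to dividing the SNR by K + 1 before evaluating whatever zero-mean capacity expression applies. The short Python sketch below illustrates this under stated assumptions: the helper names are hypothetical, and `capacity_fn` stands for any function of (β, SNR) supplied by the reader; the stand-in used in the demonstration is a placeholder and not the canonical-channel capacity.

```python
import math


def ricean_effective_snr(snr, k_factor):
    """SNR at which the zero-mean capacity is evaluated, per (3.165)."""
    return snr / (k_factor + 1.0)


def ricean_capacity(capacity_fn, beta, snr, k_factor):
    """Apply (3.165): C_rice(beta, SNR) = C(beta, SNR / (K + 1)).

    capacity_fn(beta, snr) is a user-supplied asymptotic capacity for the
    channel without the Ricean component (hypothetical interface).
    """
    return capacity_fn(beta, ricean_effective_snr(snr, k_factor))


def snr_penalty_db(k_factor):
    """The SNR penalty of (3.165) expressed in decibels."""
    return 10.0 * math.log10(k_factor + 1.0)


if __name__ == "__main__":
    # Placeholder capacity used only to exercise the helpers.
    stand_in = lambda beta, snr: math.log2(1.0 + snr)
    print(ricean_capacity(stand_in, 1.0, 10.0, 4.0))
    print(snr_penalty_db(4.0))
```

A Ricean factor of K = 1, for instance, corresponds to a penalty of about 3 dB.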

Further applications of random matrix methods to Ricean multi-antenna
channels in the non-asymptotic regime can be found in [134,
137, 3, 118, 151, 269].

3.3.8 Interference-limited Channel

Since efficient bandwidth utilization requires aggressive frequency reuse
across adjacent cells and sectors, mature wireless systems tend to be, by
design, limited by out-of-cell interference rather than by thermal noise.
Unlike thermal noise, which is spatially and temporally white,
interference is in general spatially colored. The impact of colored interference
on the capacity has been studied asymptotically in [163, 181, 51], and
non-asymptotically in [28, 138].

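Out-of-cell interference can be incorporated into the model (1.1) by
representing the noise as

\[ \mathbf{n} = \sum_{\ell=1}^{L} \mathbf{H}_\ell \mathbf{x}_\ell + \mathbf{n}_{\mathrm{th}} \tag{3.166} \]

where L is the number of interferers, x_ℓ the signal transmitted by the
ℓth interferer, H_ℓ the channel from such interferer and n_th the underlying
thermal noise. Thus, (1.1) becomes

\[ \mathbf{y} = \mathbf{H}\mathbf{x} + \sum_{\ell=1}^{L} \mathbf{H}_\ell \mathbf{x}_\ell + \mathbf{n}_{\mathrm{th}}. \tag{3.167} \]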

In what follows, we consider a homogeneous system where the entries
of x_ℓ, ℓ ∈ {1, . . . , L}, are i.i.d. zero-mean Gaussian and the number of
transmit antennas at each interferer coincides with nT. Furthermore,
the channels H and H_ℓ, ℓ ∈ {1, . . . , L}, are modeled as canonical. We
define the signal-to-interference ratio with respect to each interferer as

\[ \mathsf{SIR}_\ell = \frac{E[\|\mathbf{x}\|^2]}{E[\|\mathbf{x}_\ell\|^2]} \tag{3.168} \]

and use SNR to specify the signal-to-thermal-noise ratio. With that, the
overall SINR satisfies

\[ \frac{1}{\mathsf{SINR}} = \frac{1}{\mathsf{SNR}} + \sum_{\ell=1}^{L} \frac{1}{\mathsf{SIR}_\ell} \tag{3.169} \]

and the capacity can be expressed as

\[ C = \frac{1}{n_R} \, E\left[ \log\det\left( \mathbf{I} + \mathbf{H}\mathbf{H}^\dagger \left( \sum_{\ell=1}^{L} \frac{1}{\mathsf{SIR}_\ell} \mathbf{H}_\ell \mathbf{H}_\ell^\dagger + \frac{n_T}{\mathsf{SNR}} \mathbf{I} \right)^{-1} \right) \right] \tag{3.170} \]

with expectation over the distributions of H and H_ℓ, ℓ ∈ {1, . . . , L}.
The impact of interference on the capacity essentially mirrors that of
receive correlations except for the fact that the interference is subject
to fading. Asymptotically, however, this becomes immaterial and hence
Theorem 2.44 can be applied to obtain:

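Theorem 3.11. [163, 317]15 Consider a Rayleigh-faded channel with
i.i.d. zero-mean unit-variance entries exposed to L i.i.d. Gaussian
interferers whose channels are similarly distributed. Let the user of
interest and each interferer be equipped with nT transmit antennas. As
nT, nR → ∞ with nT/nR → β, the capacity converges to

\begin{align}
C(\beta, \mathsf{SNR}, \{\mathsf{SIR}_\ell\}) ={}& \beta \sum_{\ell=1}^{L} \log \frac{\mathsf{SIR}_\ell + \frac{\mathsf{SNR}}{\beta} \eta_1}{\mathsf{SIR}_\ell + \frac{\mathsf{SNR}}{\beta} \eta_2} + \beta \log\left( 1 + \frac{\mathsf{SNR}}{\beta} \eta_1 \right) \notag \\
& + \log \frac{\eta_2}{\eta_1} + (\eta_1 - \eta_2) \log e \tag{3.171}
\end{align}

with η1 and η2 solutions to

\begin{align}
\eta_1 + \frac{\mathsf{SNR} \, \eta_1}{\frac{\mathsf{SNR}}{\beta} \eta_1 + 1} + \sum_{\ell=1}^{L} \frac{\mathsf{SNR} \, \eta_1}{\frac{\mathsf{SNR}}{\beta} \eta_1 + \mathsf{SIR}_\ell} &= 1 \tag{3.172} \\
\eta_2 + \sum_{\ell=1}^{L} \frac{\mathsf{SNR} \, \eta_2}{\frac{\mathsf{SNR}}{\beta} \eta_2 + \mathsf{SIR}_\ell} &= 1. \tag{3.173}
\end{align}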

Obtaining explicit expressions requires solving for η1 and η2 in
equations of order L+2 and L+1, respectively. Hence, the complexity of the
solution is directly determined by the number of interferers.
Nonetheless, solving for η1 and η2 becomes trivial in some limiting cases [163]:

• For growing β,

\begin{align}
\lim_{\beta \to \infty} \eta_1 &= \frac{1}{1 + \mathsf{SNR} \left( 1 + \sum_{\ell=1}^{L} \frac{1}{\mathsf{SIR}_\ell} \right)} \tag{3.174} \\
\lim_{\beta \to \infty} \eta_2 &= \frac{1}{1 + \mathsf{SNR} \sum_{\ell=1}^{L} \frac{1}{\mathsf{SIR}_\ell}} \tag{3.175}
\end{align}

which are functions of only the relative powers of the desired
user, the interferers and the thermal noise. Plugging these
into (3.171) yields an asymptotic capacity that is identical
to that which would be attained if the interference were
replaced with white noise. Hence, as the total number of
interfering antennas grows much larger than the number of
receive antennas, the progressively fine color of the
interference cannot be discerned. The capacity depends only on the
total interference-plus-thermal power, irrespective of how it
breaks down.
• For diminishing β and finite L,

\[ \lim_{\beta \to 0} \eta_1 = \lim_{\beta \to 0} \eta_2 = 1 \tag{3.176} \]

indicating that the capacity penalty due to a fixed number
of interfering antennas vanishes as the number of receive
antennas grows without bound. The performance becomes
dictated only by the underlying thermal noise, irrespective of
the existence of the interference [309, 310].

15 Although the analysis in [317] considers multicell DS-CDMA, the expression for the
capacity maps exactly onto (3.170) except for a simple SNR scaling.
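Outside these limiting cases, η1 and η2 have to be found numerically. The Python sketch below is an illustration rather than the authors' procedure: the function names are hypothetical and bisection is an arbitrary choice, justified by the fact that the left-hand sides of (3.172) and (3.173) increase with η from 0 at η = 0 to a value above 1 at η = 1. Logarithms are taken in base 2, so the capacity is returned in bits per receive antenna.

```python
import math


def overall_sinr(snr, sirs):
    """Overall SINR from (3.169)."""
    return 1.0 / (1.0 / snr + sum(1.0 / s for s in sirs))


def _root_in_unit_interval(g, tol=1e-13):
    """Bisection for an increasing function g with g(0) < 0 < g(1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def interference_etas(beta, snr, sirs):
    """Solve (3.172) and (3.173) for eta_1 and eta_2."""
    ratio = snr / beta

    def interferer_sum(eta):
        return sum(snr * eta / (ratio * eta + s) for s in sirs)

    eta1 = _root_in_unit_interval(
        lambda e: e + snr * e / (ratio * e + 1.0) + interferer_sum(e) - 1.0)
    eta2 = _root_in_unit_interval(lambda e: e + interferer_sum(e) - 1.0)
    return eta1, eta2


def interference_capacity(beta, snr, sirs):
    """Evaluate (3.171) in bits per receive antenna."""
    eta1, eta2 = interference_etas(beta, snr, sirs)
    ratio = snr / beta
    cap = beta * sum(
        math.log2((s + ratio * eta1) / (s + ratio * eta2)) for s in sirs)
    cap += beta * math.log2(1.0 + ratio * eta1)
    cap += math.log2(eta2 / eta1) + (eta1 - eta2) * math.log2(math.e)
    return cap


if __name__ == "__main__":
    snr, sirs = 10.0, [2.0, 5.0]
    for beta in (0.1, 1.0, 10.0):
        print(beta, interference_etas(beta, snr, sirs),
              interference_capacity(beta, snr, sirs))
```

For very large β the returned values approach (3.174) and (3.175), and the capacity approaches log2(1 + SINR) with SINR as in (3.169), in line with the white-noise interpretation given in the first bullet.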

3.3.9 Outage Capacity


The ergodic capacity has operational meaning only if the distribution
of H is revealed by the realizations encountered by each codeword.
In some situations, however, H is held approximately constant during
the transmission of a codeword, in which case a more suitable
performance measure is the outage capacity, which coincides with the classical
Shannon-theoretic notion of ε-capacity [49], namely the maximal rate
for which block error probability ε is attainable. Under certain
conditions, the outage capacity can be obtained through the probability that
the transmission rate R exceeds the input-output mutual information
(conditioned on the channel realization) [77, 250, 22]. Thus, given a
rate R an outage occurs when the random variable

\[ \mathcal{I} = \log\det\left( \mathbf{I} + \mathsf{SNR} \, \mathbf{H} \boldsymbol{\Phi} \mathbf{H}^\dagger \right) \]

whose distribution is induced by H, falls below R. Establishing the
input covariance that maximizes the rate supported at some chosen
outage level is a problem not easily tackled analytically. (Some results
on the eigenvectors of Φ can be found in [229].) Hereafter Φ is allowed
to be an arbitrary deterministic matrix except where otherwise noted.
The distribution of I can be obtained via its moment-generating
function

\[ M(\zeta) = E\left[ e^{\zeta \mathcal{I}} \right] \tag{3.177} \]

which, for the canonical channel with Φ = I, is given by (2.18) as
derived in [38, 299]. The corresponding function for one-sided correlation,
in the case of square channels, is for ζ ≤ 0

\[ M(\zeta) = {}_2F_0\!\left( \zeta \log\tfrac{1}{e}, \, m \,\middle|\, -\gamma \boldsymbol{\Theta} \right) \tag{3.178} \]

where 2F0(·, · | ·) is given by (2.21) with Θ = ΘR if the correlation takes
place at the receiver whereas Θ = ΘT^{1/2} Φ ΘT^{1/2} if it takes place at the
transmitter. With both transmit and receive correlations, M(·) is given
by Theorem 2.16 with Σ = ΘR and Υ = ΘT^{1/2} Φ ΘT^{1/2}.
For uncorrelated Ricean channels with Φ = I, M (·) is provided in
[134] in terms of the integral of hypergeometric functions.
For nR = 1, the distribution of I is found directly, bypassing the
moment-generating function, for correlated Rayleigh-faded channels in
[180, 132] and for uncorrelated Ricean channels in [180, 233].16
An interesting property of the distribution of I is the fact that, for
many of the multi-antenna channels of interest, it can be approximated
as Gaussian as the number of antennas grows. A number of authors have
explored this property using two distinct approaches in the engineering
literature:

(1) The mean and variance of I are obtained through the mo-
ment generating function (for fixed number of antennas). A
Gaussian distribution with such mean and variance is then
compared, through Monte Carlo simulation, to the empirical
distribution of I. This approach is followed in [235, 299, 26]
for the canonical channel, in [234] for channels with one-sided
correlation, and in [235] for uncorrelated Ricean channels.
Although, in every case, the match is excellent, no proof of
asymptotic Gaussianity is provided. Only for SNR → ∞ with
Φ = I and with H being a real Gaussian matrix with i.i.d.
entries has it been shown that I − E[I] converges to a Gaus-
sian random variable [87].

16 The input covariance is constrained to be Φ = I in [233], which also gives the corre-
sponding distribution of I for min(nT , nR ) = 2 and arbitrary max(nT , nR ) although in
the form of an involved integral expression.

(2) The random variable

\[ \Delta_{n_R} = \mathcal{I}(\mathsf{SNR}) - n_R \, \mathcal{V}_{\mathbf{H}\boldsymbol{\Phi}\mathbf{H}^\dagger}(\mathsf{SNR}) \tag{3.179} \]

is either shown or conjectured to converge to a zero-mean
Gaussian random variable as nR → ∞. For Rayleigh-faded
channels with one-sided correlation (at either transmitter or
receiver), the asymptotic Gaussianity of ∆nR follows from
Theorem 2.77.17 The convergence rate to the Gaussian
distribution is analyzed in [15]. With both transmit and receive
correlations, the asymptotic Gaussianity of ∆nR is
conjectured in [216, 181] by observing the behavior of the second-
and third-order moments obtained via the replica method.

The appeal of the Gaussian behavior, of course, is that its
characterization entails finding only the mean and variance of I. In how these
are found, and in some other respects, the differences between both
approaches are subtle but important:

• When approximating I as a Gaussian random variable for
finite nT and nR, the first approach uses exact expressions
for its mean and variance. These expressions, which can be
obtained from the moment-generating function, tend to be
rather involved and are often not in closed form. The sec-
ond approach, on the other hand, relies on functionals of the
asymptotic spectrum. Although exact only in the limit, these
functionals are tight even for small values of nT and nR and
tend to have simpler and more insightful forms.
• If the moment convergence theorem does not apply to the
asymptotic spectrum, as in the case of Ricean channels where
the rank of E[H] is o(nR ), then the second approach results
in a bias that stems from the fact that E[H] is not reflected
in the asymptotic spectrum (cf. Lemma 2.22).

Denoting ∆ = lim_{nR→∞} ∆nR, E[∆²] can be found by applying [15,
(1.17)]. For the canonical channel, this yields (cf. Theorem 2.76)

\begin{align}
E[\Delta^2] &= -\log\left( 1 - \frac{\left( 1 - \eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) \right)^2}{\beta} \right) \notag \\
&= -\log\left( 1 - \frac{1}{\beta} \left( \frac{\mathcal{F}(\gamma, \beta)}{4\gamma} \right)^2 \right). \tag{3.180}
\end{align}

With Rayleigh fading and correlation at the transmitter, in turn,

\[ E[\Delta^2] = -\log\left( 1 - \frac{\left( 1 - \eta_{\mathbf{H}\mathbf{T}\mathbf{H}^\dagger}(\gamma) \right)^2}{\beta} \right) \tag{3.181} \]

where T = ΘT^{1/2} Φ ΘT^{1/2} with Φ the capacity-achieving power allocation.

17 The more restrictive case of a canonical channel at either low or high SNR is analyzed in
[113].

Fig. 3.5 Histogram of ∆nR for a Rayleigh-faded channel with nT = 5 and nR = 10. The
transmit antennas are correlated as per (3.182) while the receive antennas are uncorrelated.
Solid line indicates the corresponding limiting Gaussian distribution.

Figure 3.5 compares the limiting Gaussian distribution of ∆ with a
histogram of ∆nR for nT = 5 and nR = 10 with a transmit correlation
matrix ΘT such that

\[ (\boldsymbol{\Theta}_T)_{i,j} = e^{-0.8 (i-j)^2}. \tag{3.182} \]

For channels with both transmit and receive correlation, the
characteristic function found through the replica method yields the
expression of E[∆²] given in [181].
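Once a mean and a variance are available, the Gaussian approximation turns an outage computation into a single evaluation of the normal distribution function. The Python sketch below illustrates this for the canonical channel and rests on several assumptions that are not stated in this section: the function names are hypothetical; F(x, z) is taken to be (sqrt(x(1+√z)² + 1) − sqrt(x(1−√z)² + 1))², the definition used elsewhere in the text; the logarithm in (3.180) is taken as natural, so that the variance is in squared nats; and the mean passed to `gaussian_outage` must be supplied by the reader in matching units.

```python
import math


def mp_F(x, z):
    """Assumed definition of F(x, z) used in (3.180)."""
    a = math.sqrt(x * (1.0 + math.sqrt(z)) ** 2 + 1.0)
    b = math.sqrt(x * (1.0 - math.sqrt(z)) ** 2 + 1.0)
    return (a - b) ** 2


def delta_variance(gamma, beta):
    """E[Delta^2] for the canonical channel, second line of (3.180)."""
    t = mp_F(gamma, beta) / (4.0 * gamma)
    return -math.log(1.0 - t * t / beta)


def gaussian_outage(rate, mean, variance):
    """P(I < rate) when I is approximated as Gaussian(mean, variance)."""
    z = (rate - mean) / math.sqrt(2.0 * variance)
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    var = delta_variance(10.0, 1.0)
    print(var)
    # Illustrative numbers only: a mean of 20 nats and a target of 18 nats.
    print(gaussian_outage(18.0, 20.0, var))
```

Because the variance in (3.180) does not grow with the number of antennas while the mean of I does, the approximate outage probability at a fixed fraction of the mean rate falls quickly as the arrays grow.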

3.3.10 Space-Time Coding


Besides the characterizations of the capacity for the various channels
described throughout this section, random matrix theory (and specifi-
cally free probability) has also been used to obtain design criteria for
space-time codes [25]. In [25], the behavior of space-time codes is char-
acterized asymptotically in the number of antennas. Specifically, the
behavior of pairwise error probabilities is determined with three types
of receivers: maximum-likelihood (ML), decorrelator and linear MMSE.
It is shown that with ML or linear receivers the asymptotic performance
of space-time codes is determined by the Euclidean distances between
codewords. This holds for intermediate signal-to-noise ratios even when
the number of antennas is small. Simulations show that the asymptotic
results are quite accurate in the non-asymptotic regime. This has the
interesting implication that even for few antennas, off-the-shelf codes
may outperform special-purpose space-time codes.

3.4 Other Applications


In addition to the foregoing recent applications of random matrix the-
ory in characterizing the fundamental limits of wireless communication
channels, several other applications of the results in Section 2 can be
found in the information theory, communications and signal processing
literature:

• Speed of convergence of iterative algorithms for multiuser
detection [312].
• Direction of arrival estimation in sensor arrays [228].
• Learning and neural networks [50].
• Capacity of ad hoc networks [157].
• Data mining and multivariate time series modelling and anal-
ysis [155, 139].
• Principal components analysis [119].
• Maximal entropy methods [17, 292].
• Information theory and free probability [288, 289, 248, 292,
293].
4
Appendices

4.1 Proof of Theorem 2.39


In this section we give a multiuser-detection argument for the proof
of Theorem 2.39 in the special case where T is diagonal. To use the
standard notation in multiuser detection [271], we replace H with S
and let T = AA† .
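An important non-asymptotic relationship between the eigenvalues
λ1, . . . , λN of the matrix STS† and the signal-to-interference ratios
achieved by the MMSE detector SIR_1, . . . , SIR_K is [256]

\[ \sum_{i=1}^{N} \frac{\lambda_i}{\lambda_i + \sigma^2} = \sum_{k=1}^{K} \frac{\mathsf{SIR}_k}{\mathsf{SIR}_k + 1} \tag{4.1} \]

where σ² is the variance of the noise components in (3.1). To show
(4.1), we can write its left-hand side as

\begin{align*}
\sum_{i=1}^{N} \frac{\lambda_i}{\lambda_i + \sigma^2}
&= \mathrm{tr}\left\{ \left( \sigma^2 \mathbf{I} + \mathbf{S}\mathbf{T}\mathbf{S}^\dagger \right)^{-1} \mathbf{S}\mathbf{T}\mathbf{S}^\dagger \right\} \\
&= \mathrm{tr}\left\{ \left( \sigma^2 \mathbf{I} + \mathbf{S}\mathbf{T}\mathbf{S}^\dagger \right)^{-1} \sum_{k=1}^{K} T_k \mathbf{s}_k \mathbf{s}_k^\dagger \right\}
\end{align*}

which can be further elaborated into

\begin{align}
\sum_{i=1}^{N} \frac{\lambda_i}{\lambda_i + \sigma^2}
&= \sum_{k=1}^{K} T_k \, \mathbf{s}_k^\dagger \left( \sigma^2 \mathbf{I} + \mathbf{S}\mathbf{T}\mathbf{S}^\dagger \right)^{-1} \mathbf{s}_k \notag \\
&= \sum_{k=1}^{K} \frac{\mathsf{SIR}_k}{\mathsf{SIR}_k + 1} \tag{4.2}
\end{align}

where (4.2) follows from [271, (6.40)].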


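Denote for brevity

\[ \eta = \eta_{\mathbf{S}\mathbf{T}\mathbf{S}^\dagger}\!\left( \frac{1}{\sigma^2} \right). \tag{4.3} \]

From the fact that the η-transform of STS† evaluated at σ^{−2} is the
multiuser efficiency achieved by each of the users asymptotically,

\[ \mathsf{SIR}_k = \eta \, \frac{T_k}{\sigma^2} \tag{4.4} \]

we obtain

\begin{align}
\lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \frac{\mathsf{SIR}_k}{\mathsf{SIR}_k + 1}
&= 1 - \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \frac{1}{\frac{T_k}{\sigma^2} \eta + 1} \notag \\
&= 1 - \eta_{\mathbf{T}}\!\left( \frac{\eta}{\sigma^2} \right) \tag{4.5}
\end{align}

almost surely, by the law of large numbers and the definition of
η-transform. Also by definition of η-transform,

\[ \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \frac{\lambda_i}{\lambda_i + \sigma^2} = 1 - \eta. \tag{4.6} \]

Equations (4.1), (4.5) and (4.6) lead to the sought-after relationship

\[ \beta \left( 1 - \eta_{\mathbf{T}}\!\left( \frac{\eta}{\sigma^2} \right) \right) = 1 - \eta. \tag{4.7} \]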
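Since (4.1) holds for every finite N and K, it can be checked directly on a random draw. The sketch below is an illustration that assumes NumPy is available; the function name and the parameter values are hypothetical, and the MMSE output SIR of user k is computed from the standard expression T_k s_k†(σ²I + Σ_{j≠k} T_j s_j s_j†)^{−1} s_k, which is not written out in this appendix.

```python
import numpy as np


def check_mmse_identity(N=8, K=5, sigma2=0.5, seed=0):
    """Return both sides of (4.1) for one random S and diagonal T."""
    rng = np.random.default_rng(seed)
    # Complex signatures with i.i.d. entries of variance 1/N.
    S = (rng.standard_normal((N, K))
         + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    T = rng.uniform(0.5, 2.0, size=K)      # diagonal entries T_k
    STS = (S * T) @ S.conj().T             # S T S^dagger

    lam = np.linalg.eigvalsh(STS)          # eigenvalues lambda_i
    lhs = float(np.sum(lam / (lam + sigma2)))

    rhs = 0.0
    for k in range(K):
        s_k = S[:, k]
        # Interference-plus-noise covariance seen by user k.
        R_k = sigma2 * np.eye(N) + STS - T[k] * np.outer(s_k, s_k.conj())
        sir_k = T[k] * np.real(s_k.conj() @ np.linalg.solve(R_k, s_k))
        rhs += sir_k / (sir_k + 1.0)
    return lhs, rhs


if __name__ == "__main__":
    print(check_mmse_identity())
```

The two returned numbers agree up to floating-point error for any choice of dimensions, which is the non-asymptotic content of (4.1); the asymptotic statement (4.7) then follows by normalizing each side as in (4.5) and (4.6).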

4.2 Proof of Theorem 2.42


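The first step in the proof is to convert the problem to one where T
is replaced by a diagonal matrix D_T of the same size and with the
same limiting empirical eigenvalue distribution. To that end, denote
the diagonal matrix

\[ \mathbf{Q} = \mathbf{I} + \gamma \mathbf{W}_0 \tag{4.8} \]

and note that

\[ \det\left( \mathbf{I} + \gamma (\mathbf{W}_0 + \mathbf{H}\mathbf{T}\mathbf{H}^\dagger) \right) = \det(\mathbf{T}) \, \det(\mathbf{Q}) \cdot \det\left( \mathbf{T}^{-1} + \gamma (\mathbf{H}\mathbf{Q}^{-1}\mathbf{H}^\dagger) \right). \tag{4.9} \]

Using Theorem 2.38 with W0 and T therein equal to T^{−1} and Q^{−1}
(this is a valid choice since Q^{−1} is diagonal), it follows that the
asymptotic spectrum of T^{−1} + γ(HQ^{−1}H†) depends on T^{−1} only through its
asymptotic spectrum. Therefore, when we take (1/N) log of both sides of
(4.9) we are free to replace T by D_T. Thus,

\begin{align}
\mathcal{V}_{\mathbf{W}}(\gamma) &= \lim_{N \to \infty} \frac{1}{N} \log\det\left( \mathbf{I} + \gamma (\mathbf{W}_0 + \mathbf{H}\mathbf{T}\mathbf{H}^\dagger) \right) \tag{4.10} \\
&= \lim_{N \to \infty} \frac{1}{N} \log\det\left( \mathbf{I} + \gamma (\mathbf{W}_0 + \mathbf{H}\mathbf{D}_T\mathbf{H}^\dagger) \right) \tag{4.11} \\
&= \mathcal{V}_{\mathbf{W}_0 + \mathbf{H}\mathbf{D}_T\mathbf{H}^\dagger}(\gamma). \tag{4.12}
\end{align}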
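Since the Shannon transforms are identical, so are the η-transforms.
Using Theorem 2.38 and (2.48), it follows that the η-transform of W0 +
HD_T H† and consequently of W is

\[ \eta \gamma = E\left[ \frac{1}{\mathsf{W}_0 + \frac{1}{\gamma} + \beta \, E\left[ \frac{\mathsf{T}}{1 + \mathsf{T} \eta \gamma} \right]} \right] \tag{4.13} \]

where T and W0 are independent random variables whose distributions
equal the asymptotic empirical eigenvalue distributions of T and W0,
respectively. From (4.13),

\[ \eta \gamma = \varphi \, \eta_0(\varphi) \tag{4.14} \]

with

\begin{align}
\varphi &= \frac{\gamma}{1 + \beta \gamma \, E\left[ \frac{\mathsf{T}}{1 + \mathsf{T} \eta \gamma} \right]} \tag{4.15} \\
&= \frac{\gamma}{1 + \frac{\beta}{\eta} \left( 1 - \eta_{\mathbf{T}}(\eta \gamma) \right)} \tag{4.16}
\end{align}

from which

\begin{align}
\eta \varphi + \varphi \beta \left( 1 - \eta_{\mathbf{T}}(\eta \gamma) \right) &= \gamma \eta \tag{4.17} \\
&= \varphi \, \eta_0(\varphi). \tag{4.18}
\end{align}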

4.3 Proof of Theorem 2.44


From Theorem 2.43 and from Remark 2.3.1 it follows straightforwardly
that the η-transform of HH† with H = CSA equals the η-transform of
a matrix H̃ whose entries are independent zero-mean random variables
with variance

\[ E[|\tilde{H}_{i,j}|^2] = \frac{P_{i,j}}{N} \]

and whose variance profile is

\[ v(x, y) = v_X(x) \, v_Y(y) \]

with vX(x) and vY(y) such that the distributions of vX(X) and vY(Y)
(with X and Y independent random variables uniform on [0, 1]) equal
the asymptotic empirical distributions of D and T respectively. In turn,
(2.137) can be proved as a special case of (2.158) when the function
v(x, y) can be factored. In this case, the expressions of Γ_HH†(x, γ) and
Υ_HH†(y, γ) given by Equations (2.154) and (2.155) in Theorem 2.50
become

\begin{align}
\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(x, \gamma) &= \frac{1}{1 + \beta \gamma \, v_X(x) \, E[v_Y(\mathsf{Y}) \, \Upsilon_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{Y}, \gamma)]} \notag \\
&= \frac{1}{1 + \beta \gamma \, v_X(x) \, \tilde{\Upsilon}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma)} \tag{4.19}
\end{align}

where we have denoted

\[ \tilde{\Upsilon}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = E[v_Y(\mathsf{Y}) \, \Upsilon_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{Y}, \gamma)]. \]

For convenience, in the following we drop the subindices from Γ_HH†,
Υ_HH†, Υ̃_HH†. Let us further denote

\[ \tilde{\Gamma}(\gamma) = E[v_X(\mathsf{X}) \, \Gamma(\mathsf{X}, \gamma)]. \]

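Using (2.154) we obtain

\begin{align}
\tilde{\Gamma}(\gamma) &= E\left[ \frac{v_X(\mathsf{X})}{1 + \beta \gamma \, v_X(\mathsf{X}) \, E[v_Y(\mathsf{Y}) \, \Upsilon(\mathsf{Y}, \gamma)]} \right] \notag \\
&= E\left[ \frac{v_X(\mathsf{X})}{1 + \beta \gamma \, v_X(\mathsf{X}) \, \tilde{\Upsilon}(\gamma)} \right] \notag \\
&= E\left[ \frac{\Lambda_D}{1 + \beta \gamma \, \Lambda_D \tilde{\Upsilon}(\gamma)} \right] \notag \\
&= \frac{1}{\beta \gamma \tilde{\Upsilon}(\gamma)} \left( 1 - \eta_{\mathbf{D}}(\beta \gamma \tilde{\Upsilon}(\gamma)) \right) \tag{4.20}
\end{align}

where we have indicated by Λ_D a nonnegative random variable whose
distribution is given by the asymptotic spectrum of D. Likewise, using
the definition of Υ(y, γ) in (2.155) we obtain

\begin{align}
\tilde{\Upsilon}(\gamma) &= E\left[ \frac{v_Y(\mathsf{Y})}{1 + \gamma \, v_Y(\mathsf{Y}) \, \tilde{\Gamma}(\gamma)} \right] \notag \\
&= E\left[ \frac{\Lambda_T}{1 + \gamma \, \Lambda_T \tilde{\Gamma}(\gamma)} \right] \notag \\
&= \frac{1}{\gamma \tilde{\Gamma}(\gamma)} \left( 1 - \eta_{\mathbf{T}}(\gamma \tilde{\Gamma}(\gamma)) \right) \tag{4.21}
\end{align}

where we have denoted by Λ_T a nonnegative random variable whose
distribution is given by the asymptotic spectrum of the matrix T.
Notice also that

\begin{align}
\log\left( 1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}] \right) &= \log\left( 1 + \gamma \, v_Y(\mathsf{Y}) \, E[v_X(\mathsf{X}) \Gamma(\mathsf{X}, \gamma)] \right) \notag \\
&= \log\left( 1 + \gamma \, v_Y(\mathsf{Y}) \, \tilde{\Gamma}(\gamma) \right) \tag{4.22}
\end{align}

and thus

\begin{align}
E\left[ \log\left( 1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}] \right) \right] &= E\left[ \log\left( 1 + \gamma \Lambda_T \tilde{\Gamma}(\gamma) \right) \right] \notag \\
&= \mathcal{V}_{\mathbf{T}}(\gamma \tilde{\Gamma}(\gamma)). \tag{4.23}
\end{align}

Likewise,

\begin{align}
E\left[ \log\left( 1 + \gamma \beta E[v(\mathsf{X}, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma) | \mathsf{X}] \right) \right] &= E\left[ \log\left( 1 + \gamma \beta \Lambda_D \tilde{\Upsilon}(\gamma) \right) \right] \notag \\
&= \mathcal{V}_{\mathbf{D}}(\gamma \beta \tilde{\Upsilon}(\gamma)). \tag{4.24}
\end{align}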
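Moreover,

\begin{align}
\gamma \beta \, E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) \Upsilon(\mathsf{Y}, \gamma)] &= \gamma \beta \, E[v_X(\mathsf{X}) \, v_Y(\mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \Upsilon(\mathsf{Y}, \gamma)] \notag \\
&= \gamma \beta \, \tilde{\Gamma}(\gamma) \, \tilde{\Upsilon}(\gamma). \tag{4.25}
\end{align}

Defining

\[ \gamma_t = \gamma \tilde{\Gamma}(\gamma), \qquad \gamma_d = \gamma \tilde{\Upsilon}(\gamma), \tag{4.26} \]

plugging (4.25), (4.24), (4.23) into (2.158), and using (4.26), (4.21) and
(4.20), the expression for V_HH† in Theorem 2.44 is found.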

4.4 Proof of Theorem 2.49


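From (2.153) it follows that

\[ \eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = E[\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{X}, \gamma)] \]

with Γ_HH†(·, ·) satisfying the equation

\[ \Gamma_{\mathbf{H}\mathbf{H}^\dagger}(x, \gamma) = \frac{1}{1 + \beta \gamma \, E\left[ \frac{v(x, \mathsf{Y})}{1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{X}, \gamma) | \mathsf{Y}]} \right]} \tag{4.27} \]

where X and Y are independent random variables uniformly
distributed on [0, 1]. Again for convenience, in the following we abbreviate
Γ_HH†(·, ·) and Υ_HH†(·, ·) as Γ(·, ·) and Υ(·, ·).
From the definition of doubly-regular matrix, we have that
E[1{v(X, t) ≤ x}] does not depend on t and thus E[v(X, t)Γ(X, γ)] does
not depend on t. At the same time, from the definition of doubly-regular
matrix, E[1{v(r, Y) ≤ x}] does not depend on r and thus the expectation

\[ E\left[ \frac{v(r, \mathsf{Y})}{1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}]} \right] \]

does not depend on r. Consequently, Γ(r, γ) = Γ(γ) for all r. Thus, we
can rewrite the fixed-point equation in (4.27) as

\begin{align*}
\Gamma(\gamma) &= \frac{1}{1 + \beta \gamma \, E\left[ \frac{v(x, \mathsf{Y})}{1 + \gamma \Gamma(\gamma) E[v(\mathsf{X}, \mathsf{Y}) | \mathsf{Y}]} \right]} \\
&= \frac{1}{1 + \beta \gamma \, E\left[ \frac{v(x, \mathsf{Y})}{1 + \gamma \Gamma(\gamma) \mu} \right]} \\
&= \frac{1}{1 + \beta \gamma \, \frac{E[v(x, \mathsf{Y})]}{1 + \gamma \Gamma(\gamma) \mu}}
\end{align*}

resulting in

\[ \Gamma(\gamma) = \frac{1}{1 + \beta \gamma \mu \, \frac{1}{1 + \gamma \Gamma(\gamma) \mu}} \]

with µ = E[v(X, y)] = E[v(x, Y)] = 1 since we have assumed P to be a
standard doubly-regular matrix. The above equation can be solved to
yield the η-transform of HH† as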

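\[ \eta_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = 1 - \frac{\mathcal{F}(\gamma, \beta)}{4 \gamma}, \]

which is the root in [0, 1] of the quadratic γΓ² + (1 + γ(β − 1))Γ − 1 = 0
obtained from the above equation with µ = 1 and which agrees with the
identity 1 − η_HH†(γ) = F(γ, β)/(4γ) used in (3.180).
Using (2.48) and the inverse Stieltjes formula, the claim is proved.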

4.5 Proof of Theorem 2.53


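From (2.59), the Shannon transform of HH† is given by

\[ \mathcal{V}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) = \int \log(1 + \gamma \lambda) \, dF_{\mathbf{H}\mathbf{H}^\dagger}(\lambda) \]

where F_HH†(·) represents the limiting distribution to which the
empirical eigenvalue distribution of HH† converges almost surely. The
derivative with respect to γ is

\begin{align}
\dot{\mathcal{V}}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) &= \log e \int \frac{\lambda}{1 + \gamma \lambda} \, dF_{\mathbf{H}\mathbf{H}^\dagger}(\lambda) \notag \\
&= \frac{\log e}{\gamma} \int \left( 1 - \frac{1}{1 + \gamma \lambda} \right) dF_{\mathbf{H}\mathbf{H}^\dagger}(\lambda) \notag \\
&= \frac{\log e}{\gamma} \left( 1 - \int \frac{1}{1 + \gamma \lambda} \, dF_{\mathbf{H}\mathbf{H}^\dagger}(\lambda) \right) \notag \\
&= \frac{\log e}{\gamma} \left( 1 - E[\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{X}, \gamma)] \right) \tag{4.28}
\end{align}

where, in the last equality, we have invoked Theorem 2.50 and where
Γ_HH†(·, ·) satisfies the equations given in (2.154) and (2.155), namely

\begin{align}
\Gamma_{\mathbf{H}\mathbf{H}^\dagger}(x, \gamma) &= \frac{1}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{Y}, \gamma)]} \tag{4.29} \\
\Upsilon_{\mathbf{H}\mathbf{H}^\dagger}(y, \gamma) &= \frac{1}{1 + \gamma E[v(\mathsf{X}, y) \Gamma_{\mathbf{H}\mathbf{H}^\dagger}(\mathsf{X}, \gamma)]} \tag{4.30}
\end{align}

with X and Y independent random variables uniform on [0, 1]. For
brevity, we drop the subindices from Γ_HH† and Υ_HH†. Using (4.29)
we can write

\[ \frac{1 - \Gamma(x, \gamma)}{\gamma} = \frac{\beta E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]}, \]

which, after adding and subtracting to the right-hand side

\[ \frac{\beta \gamma E[v(x, \mathsf{Y}) \dot{\Upsilon}(\mathsf{Y}, \gamma)]}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]}, \]

becomes

\begin{align}
\frac{1 - \Gamma(x, \gamma)}{\gamma} &= \frac{\beta E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)] + \beta \gamma E[v(x, \mathsf{Y}) \dot{\Upsilon}(\mathsf{Y}, \gamma)]}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]} - \frac{\beta \gamma E[v(x, \mathsf{Y}) \dot{\Upsilon}(\mathsf{Y}, \gamma)]}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]} \notag \\
&= \frac{d}{d\gamma} \ln\left( 1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)] \right) - \frac{\beta \gamma E[v(x, \mathsf{Y}) \dot{\Upsilon}(\mathsf{Y}, \gamma)]}{1 + \beta \gamma E[v(x, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma)]} \tag{4.31}
\end{align}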

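where Υ̇(·, γ) = (d/dγ) Υ(·, γ). From (4.28) and (4.29) it follows that

\begin{align}
\dot{\mathcal{V}}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) ={}& \frac{d}{d\gamma} E\left[ \log\left( 1 + \beta \gamma E[v(\mathsf{X}, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma) | \mathsf{X}] \right) \right] \notag \\
& - \beta \gamma \, E\left[ v(\mathsf{X}, \mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \, \dot{\Upsilon}(\mathsf{Y}, \gamma) \right] \log e. \tag{4.32}
\end{align}

Notice that

\begin{align}
-\gamma E\left[ v(\mathsf{X}, \mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \, \dot{\Upsilon}(\mathsf{Y}, \gamma) \right] ={}& -\frac{d}{d\gamma} \left( \gamma E[v(\mathsf{X}, \mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \, \Upsilon(\mathsf{Y}, \gamma)] \right) \notag \\
& + E\left[ \gamma \, v(\mathsf{X}, \mathsf{Y}) \, \dot{\Gamma}(\mathsf{X}, \gamma) \, \Upsilon(\mathsf{Y}, \gamma) \right] \notag \\
& + E\left[ v(\mathsf{X}, \mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \, \Upsilon(\mathsf{Y}, \gamma) \right] \tag{4.33}
\end{align}

with Γ̇(·, γ) = (d/dγ) Γ(·, γ). From (4.30),

\begin{align*}
E\left[ v(\mathsf{X}, \mathsf{Y}) \left( \gamma \dot{\Gamma}(\mathsf{X}, \gamma) + \Gamma(\mathsf{X}, \gamma) \right) \Upsilon(\mathsf{Y}, \gamma) \right]
&= E\left[ \frac{v(\mathsf{X}, \mathsf{Y}) \left( \gamma \dot{\Gamma}(\mathsf{X}, \gamma) + \Gamma(\mathsf{X}, \gamma) \right)}{1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}]} \right] \\
&= E\left[ \frac{E\left[ v(\mathsf{X}, \mathsf{Y}) \left( \gamma \dot{\Gamma}(\mathsf{X}, \gamma) + \Gamma(\mathsf{X}, \gamma) \right) | \mathsf{Y} \right]}{1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}]} \right]
\end{align*}

where the last expression is the derivative with respect to γ of
E[ln(1 + γE[v(X, Y)Γ(X, γ)|Y])], from which integrating (4.32) with
respect to γ and using (4.33) we have that

\begin{align}
\mathcal{V}_{\mathbf{H}\mathbf{H}^\dagger}(\gamma) ={}& E\left[ \log\left( 1 + \beta \gamma E[v(\mathsf{X}, \mathsf{Y}) \Upsilon(\mathsf{Y}, \gamma) | \mathsf{X}] \right) \right] \notag \\
& - \beta \gamma \, E\left[ v(\mathsf{X}, \mathsf{Y}) \, \Gamma(\mathsf{X}, \gamma) \, \Upsilon(\mathsf{Y}, \gamma) \right] \log e \notag \\
& + \beta \, E\left[ \log\left( 1 + \gamma E[v(\mathsf{X}, \mathsf{Y}) \Gamma(\mathsf{X}, \gamma) | \mathsf{Y}] \right) \right] + \kappa \tag{4.34}
\end{align}

with κ the integration constant which must be set to κ = 0 so that
V_HH†(0) = 0.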
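Equations (4.29), (4.30) and (4.34) lend themselves to direct numerical evaluation once the variance profile is sampled on a grid. The following Python sketch is an illustration under stated assumptions, not a procedure from the text: it assumes NumPy, takes the profile as an N × K array `P` whose (i, j) entry plays the role of v(x, y), replaces expectations over X and Y by averages over rows and columns, uses a fixed number of plain fixed-point iterations started from Γ = Υ = 1, and works in nats so that log e = 1. The function name is hypothetical.

```python
import numpy as np


def profile_eta_and_shannon(P, gamma, iters=500):
    """Evaluate eta = E[Gamma] and the Shannon transform (4.34), in nats.

    P is an N x K array sampling the variance profile v(x, y), with
    beta = K / N (illustrative discretization).
    """
    P = np.asarray(P, dtype=float)
    N, K = P.shape
    beta = K / N
    Gam = np.ones(N)   # Gamma(x, gamma) sampled at the N row positions
    Ups = np.ones(K)   # Upsilon(y, gamma) sampled at the K column positions
    for _ in range(iters):
        Ups = 1.0 / (1.0 + gamma * (P.T @ Gam) / N)          # (4.30)
        Gam = 1.0 / (1.0 + beta * gamma * (P @ Ups) / K)     # (4.29)

    term1 = np.mean(np.log(1.0 + beta * gamma * (P @ Ups) / K))
    term2 = beta * gamma * np.mean(P * np.outer(Gam, Ups))
    term3 = beta * np.mean(np.log(1.0 + gamma * (P.T @ Gam) / N))
    return float(Gam.mean()), float(term1 - term2 + term3)


if __name__ == "__main__":
    # Flat profile (all entries equal to 1) with beta = 0.5.
    print(profile_eta_and_shannon(np.ones((4, 2)), 10.0))
    # A simple non-flat profile, for comparison.
    print(profile_eta_and_shannon(np.array([[1.5, 0.5], [0.5, 1.5]]), 10.0))
```

With a flat profile the iteration reduces to a scalar quadratic, so the output can be compared against the closed-form solution of that quadratic.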
Acknowledgements

The authors gratefully acknowledge helpful suggestions by Prof. Z. D.
Bai, National University of Singapore, Dr. A. Lozano, Bell Laboratories,
Prof. R. Speicher, Queen’s University at Kingston, Prof. D.
Voiculescu, University of California at Berkeley, and the anonymous
reviewers.

References

[1] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. Dover
Publications, Inc., New York, 1964.
[2] G. Alfano, A. Lozano, A. M. Tulino, and S. Verdú, “Capacity of MIMO chan-
nels with one-sided correlation,” in Proc. IEEE Int. Symp. on Spread Spectrum
Tech. and Applications (ISSSTA’04), Aug. 2004.
[3] G. Alfano, A. Lozano, A. M. Tulino, and S. Verdú, “Mutual information and
eigenvalue distribution of MIMO Ricean channels,” in Proc. IEEE Int. Symp.
Information Theory & Applications (ISITA’04), (Parma, Italy), Oct. 2004.
[4] J. B. Andersen, “Array gain and capacity for known random channels with
multiple element arrays at both ends,” IEEE J. on Selected Areas in Commu-
nications, vol. 18, pp. 2172–2178, Nov. 2000.
[5] T. W. Anderson, “The non-central Wishart distribution and certain problems
of multivariate statistics,” Annals of Math. Statistics, vol. 17, no. 4, pp. 409–
431, 1946.
[6] M. Andréief, “Note sur une relation entre les intégrals définies des produits
des fonctions,” Mém. de la Soc. Sci. de Bordeaux, vol. 2, pp. 1–14, 1883.
[7] O. Axelsson, Iterative Solution Methods. Cambridge University Press, 1994.
[8] Z. D. Bai, “Convergence rate of expected spectral distributions of large ran-
dom matrices. Part I: Wigner matrices,” Annals of Probability, vol. 21, no. 2,
pp. 625–648, 1993.
[9] Z. D. Bai, “The circle law,” Annals of Probability, vol. 25, pp. 494–529, 1997.
[10] Z. D. Bai, “Methodologies in spectral analysis of large dimensional random
matrices,” Statistica Sinica, vol. 9, no. 3, pp. 611–661, 1999.

[11] Z. D. Bai, B. Miao, and J. Tsay, “Convergence rates of spectral distributions
of large Wigner matrices,” Int. Mathematical Journal, vol. 1, no. 1, pp. 65–90,
2002.
[12] Z. D. Bai, B. Miao, and J. Yao, “Convergence rates of spectral distributions
of large sample covariance matrices,” SIAM J. of Matrix Analysis and Appli-
cations, vol. 25, no. 1, pp. 105–127, 2003.
[13] Z. D. Bai and J. W. Silverstein, “No eigenvalues outside the support of the lim-
iting spectral distribution of large dimensional sample covariance matrices,”
Annals of Probability, vol. 26, pp. 316–345, 1998.
[14] Z. D. Bai and J. W. Silverstein, “Exact separation of eigenvalues of large
dimensional sample covariance matrices,” Annals of Probability, vol. 27, no. 3,
pp. 1536–1555, 1999.
[15] Z. D. Bai and J. W. Silverstein, “CLT of linear spectral statistics of large di-
mensional sample covariance matrices,” Annals of Probability, vol. 32, no. 1A,
pp. 553–605, 2004.
[16] Z. D. Bai and Y. Q. Yin, “Convergence to the semicircle law,” Annals of
Probability, vol. 16, no. 2, pp. 863–875, 1988.
[17] R. Balian, “Random matrices in information theory,” Il Nuovo Cimento,
pp. 183–193, Sep. 1968.
[18] M. S. Bartlett, “On the theory of statistical regression,” in Proc. R. Soc. Edinb.
53, pp. 260–283, 1933.
[19] S. A. Bergmann and H. W. Arnold, “Polarization diversity in portable com-
munications environment,” IEE Electronics Letters, vol. 22, pp. 609–610, May
1986.
[20] P. Biane, “Minimal factorizations of cycle and central multiplicative functions
on the infinite symmetric group,” J. Combinatorial Theory, vol. A 76, no. 2,
pp. 197–212, 1996.
[21] P. Biane, “Free probability for probabilists,” Report MSRI
(http://arXiv.org/abs/math/9809193), Sep. 1998.
[22] E. Biglieri, J. Proakis, and S. Shamai, “Fading channels: Information-theoretic
and communications aspects,” IEEE Trans. on Information Theory, vol. 44,
pp. 2619–2692, Oct. 1998.
[23] E. Biglieri and G. Taricco, “Transmission and reception with multiple anten-
nas: Theoretical foundations,” submitted to Foundations and Trends in Com-
munications and Information Theory, 2004.
[24] E. Biglieri and G. Taricco, “Large-system analyses of multiple-antenna system
capacities,” Journal of Communications and Networks, vol. 5, no. 2, pp. 57–64,
June 2003.
[25] E. Biglieri, G. Taricco, and A. M. Tulino, “Performance of space-time codes
for a large number of antennas,” IEEE Trans. on Information Theory, vol. 48,
pp. 1794–1803, July 2002.
[26] E. Biglieri, A. M. Tulino, and G. Taricco, “How far away is infinity? Using
asymptotic analyses in multiple antenna systems,” Proc. IEEE Int. Symp. on
Spread Spectrum Techn. and Applications (ISSSTA’02), vol. 1, pp. 1–6, Sep.
2002.
[27] G. Birkhoff and S. Gulati, “Isotropic distributions of test matrices,” J. Appl.
Math. Physics, vol. 30, pp. 148–157, 1979.
[28] R. S. Blum, “MIMO capacity with interference,” IEEE J. on Selected Areas
in Communications, vol. 21, pp. 793–801, June 2003.
[29] P. Bougerol and J. Lacroix, Random Products of Matrices with Applications
to Schrödinger Operators. Basel, Switzerland: Birkhäuser, 1985.
[30] J. Boutros and G. Caire, “Iterative multiuser joint decoding: unified frame-
work and asymptotic analysis,” IEEE Trans. on Information Theory, vol. 48,
pp. 1772–1793, July 2002.
[31] L. H. Brandenburg and A. D. Wyner, “Capacity of the Gaussian channel
with memory: The multivariate case,” Bell System Technical Journal, vol. 53,
pp. 745–778, May-June 1974.
[32] B. Bronk, “Exponential ensembles for random matrices,” J. of Math. Physics,
vol. 6, pp. 228–237, 1965.
[33] R. Buehrer and B. Woerner, “The asymptotic multiuser efficiency of m-stage
interference cancellation receivers,” in Proc. IEEE Int. Symp. on Personal,
Indoor and Mobile Radio Communications (PIMRC’97), (Helsinki, Finland),
pp. 570–574, Sep. 1997.
[34] K. A. Byerly and R. A. Roberts, “Output power based partial adaptive array
design,” in Proc. Asilomar Conf. on Signals, Systems and Computers, (Pacific
Grove, CA), pp. 576–580, Oct. 1989.
[35] G. Caire and S. Shamai, “On achievable rates in a multi-antenna broadcast
downlink,” in Proc. Allerton Conf. on Communications, Control and Com-
puting, pp. 1188–1193, Oct. 2000.
[36] G. Caire and S. Shamai, “On the achievable throughput of a multi-antenna
Gaussian broadcast channel,” IEEE Trans. on Information Theory, vol. 49,
no. 7, pp. 1691–1706, 2003.
[37] J. M. Chaufray, W. Hachem, and P. Loubaton, “Asymptotic analysis of opti-
mum and sub-optimum CDMA MMSE receivers,” Proc. IEEE Int. Symp. on
Information Theory (ISIT’02), p. 189, July 2002.
[38] M. Chiani, “Evaluating the capacity distribution of MIMO Rayleigh fading
channels,” Proc. IEEE Int. Symp. on Advances in Wireless Communications,
pp. 3–4, Sep. 23-24 2002, Victoria, Canada.
[39] M. Chiani, M. Z. Win, and A. Zanella, “On the capacity of spatially corre-
lated MIMO Rayleigh-fading channels,” IEEE Trans. on Information Theory,
vol. 49, pp. 2363–2371, Oct. 2003.
[40] D. Chizhik, F. R. Farrokhi, J. Ling, and A. Lozano, “Effect of antenna separa-
tion on the capacity of BLAST in correlated channels,” IEEE Communications
Letters, vol. 4, pp. 337–339, Nov. 2000.
[41] D. Chizhik, J. Ling, P. Wolniansky, R. A. Valenzuela, N. Costa, and K. Huber,
“Multiple-input multiple-output measurements and modelling in Manhattan,”
IEEE J. on Selected Areas in Communications, vol. 21, pp. 321–331, Apr.
2003.
[42] T.-S. Chu and L. J. Greenstein, “A semiempirical representation of antenna
diversity gain at cellular and PCS base stations,” IEEE Trans. on Communi-
cations, vol. 45, pp. 644–656, June 1997.
[43] C. Chuah, D. Tse, J. Kahn, and R. Valenzuela, “Capacity scaling in
dual-antenna-array wireless systems,” IEEE Trans. on Information Theory, vol. 48,
pp. 637–650, March 2002.
[44] J. Cohen, H. Kesten, and M. Newman, “Oseledec’s multiplicative ergodic
theorem: a proof,” in Random matrices and their applications, (J. Cohen,
H. Kesten, and M. Newman, eds.), Providence, RI: American Mathematical
Society, 1986.
[45] L. Cottatellucci and R. R. Müller, “Asymptotic design and analysis of mul-
tistage detectors with unequal powers,” in Proc. IEEE Information Theory
Workshop (ITW’02), (Bangalore, India), Oct. 2002.
[46] L. Cottatellucci and R. R. Müller, “Asymptotic design and analysis of multi-
stage detectors and multistage channel estimators for multipath fading chan-
nels,” in Proc. IEEE Int. Symp. on Information Theory (ISIT’02), (Yoko-
hama, Japan), June 2003.
[47] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley
and Sons, Inc., 1991.
[48] D. Cox, “Universal digital portable radio communications,” Proc. IEEE,
vol. 75, No. 4, pp. 436–477, Apr. 1987.
[49] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete
Memoryless Systems. New York: Academic, 1981.
[50] Y. L. Cun, I. Kanter, and S. A. Solla, “Eigenvalues of covariance matri-
ces: Application to neural-network learning,” Physical Review Letters, vol. 66,
pp. 2396–2399, 1991.
[51] H. Dai, A. F. Molisch, and H. V. Poor, “Downlink capacity of interference-
limited MIMO systems with joint detection,” IEEE Trans. on Wireless Com-
munications, vol. 3, pp. 442– 453, Mar. 2004.
[52] A. B. de Monvel and A. Khorunzhy, “Asymptotic distribution of smoothed
eigenvalue density. I. Gaussian random matrices & II. Wigner random matri-
ces,” Random Operators and Stochastic Equations, vol. 7, pp. 1–22 & 149–168,
1999.
[53] A. B. de Monvel and A. Khorunzhy, “On the norm and eigenvalue distribution
of large random matrices,” Annals of Probability, vol. 27, no. 2, pp. 913–944,
1999.
[54] A. B. de Monvel and A. Khorunzhy, “On universality of smoothed eigenvalue
density of large random matrices,” J. of Physics A: Mathematical and General,
vol. 32, pp. 413–417, 1999.
[55] A. B. de Monvel, A. Khorunzhy, and V. Vasilchuk, “Limiting eigenvalue dis-
tribution of random matrices with correlated entries,” Markov Processes and
Related Fields, vol. 2, no. 2, pp. 607–636, 1996.
[56] M. Debbah, W. Hachem, P. Loubaton, and M. de Courville, “MMSE anal-
ysis of certain large isometric random precoded systems,” IEEE Trans. on
Information Theory, vol. 49, pp. 1293–1311, May 2003.
[57] H. Dette, “Strong approximation of the eigenvalues of large dimensional
Wishart matrices by roots of generalized Laguerre polynomials,” J. of Ap-
proximation Theory, vol. 118, no. 2, pp. 290–304, 2002.
[58] J. A. Díaz-García and J. R. Gutierrez, “Proof of the conjectures of H. Uhlig
on the singular multivariate beta and the Jacobian of a certain matrix
transformation,” Annals of Statistics, vol. 25, pp. 2018–2023, 1997.
[59] J. A. Díaz-García, J. R. Gutierrez, and K. V. Mardia, “Wishart and
pseudo-Wishart distributions and some applications to shape theory,” J. of
Multivariate Analysis, vol. 63, pp. 73–87, 1997.
[60] S. N. Diggavi, N. Al-Dhahir, A. Stamoulis, and A. R. Calderbank, “Great
expectations: The value of spatial diversity in wireless networks,” Proc. IEEE,
vol. 92, pp. 219–270, Feb. 2004.
[61] M. Dohler and H. Aghvami, “A closed form expression of MIMO capacity over
ergodic narrowband channels,” preprint, 2003.
[62] P. Driessen and G. J. Foschini, “On the capacity formula for multiple-input
multiple-output channels: A geometric interpretation,” IEEE Trans. Commu-
nications, vol. 47, pp. 173–176, Feb. 1999.
[63] K. J. Dykema, “On certain free product factors via an extended matrix
model,” J. of Funct. Analysis, vol. 112, pp. 31–60, 1993.
[64] K. J. Dykema, “Interpolated free group factors,” Pacific J. Math., vol. 163,
no. 1, pp. 123–135, 1994.
[65] F. Dyson, “Statistical theory of the energy levels of complex systems,” J. of
Math. Physics, vol. 3, pp. 140–175, 1962.
[66] F. Dyson, “A class of matrix ensembles,” J. of Math. Physics, vol. 13, p. 90,
1972.
[67] A. Edelman, Eigenvalues and condition number of random matrices. PhD
thesis, Dept. Mathematics, MIT, Cambridge, MA, 1989.
[68] A. Edelman, “The probability that a random real Gaussian matrix has k
real eigenvalues, related distributions, and the circular law,” J. Multivariate
Analysis, vol. 60, pp. 203–232, 1997.
[69] A. Edelman, E. Kostlan, and M. Shub, “How many eigenvalues of a random
matrix are real?,” J. Amer. Math. Soc., vol. 7, pp. 247–267, 1994.
[70] Y. C. Eldar and A. M. Chan, “On the asymptotic performance of the decor-
relator,” IEEE Trans. on Information Theory, vol. 49, pp. 2309–2313, Sep.
2003.
[71] H. Elders-Boll, A. Busboom, and H. Schotten, “Implementation of linear
multiuser detectors for asynchronous CDMA systems by linear interference
cancellation algorithms,” in Proc. IEEE Vehicular Technology Conf. (VTC’98),
(Ottawa, Canada), pp. 1849–1853, May 1998.
[72] H. Elders-Boll, H. Schotten, and A. Busboom, “Efficient implementation of
linear multiuser detectors for asynchronous CDMA systems by linear inter-
ference cancellation,” Euro. Trans. Telecommunications, vol. 9, pp. 427–438,
Sep./Oct. 1998.
[73] J. Evans and D. Tse, “Large system performance of linear multiuser receivers
in multipath fading channels,” IEEE Trans. on Information Theory, vol. 46,
pp. 2059–2078, Sep. 2000.
[74] F. R. Farrokhi, G. J. Foschini, A. Lozano, and R. A. Valenzuela, “Link-optimal
space-time processing with multiple transmit and receive antennas,” IEEE
Communications Letters, vol. 5, pp. 85–87, Mar. 2001.
[75] R. A. Fisher, “The sampling distribution of some statistics obtained from
non-linear equations,” Annals of Eugenics, vol. 9, pp. 238–249, 1939.
[76] G. Foschini and M. Gans, “On limits of wireless communications in a fading
environment when using multiple antennas,” Wireless Personal
Communications, vol. 6, no. 3, pp. 311–335, Mar. 1998.
[77] G. J. Foschini, “Layered space-time architecture for wireless communication in
a fading environment when using multi-element antennas,” Bell Labs Technical
Journal, vol. 1, pp. 41–59, 1996.
[78] D. A. S. Fraser, The Structure of Inference. John Wiley and Sons, New York,
1968.
[79] H. Furstenberg and H. Kesten, “Products of random matrices,” Annals of
Math. Statistics, pp. 457–469, 1960.
[80] H. Gao and P. J. Smith, “A determinant representation for the distribution of
quadratic forms in complex normal vectors,” J. Multivariate Analysis, vol. 73,
pp. 155–165, May 2000.
[81] M. Gaudin, “Sur la loi limite de l’espacement des valeurs propres d’une matrice
aléatoire,” Nuclear Physics, vol. 25, pp. 447–455, 1961.
[82] D. Gesbert, M. Shafi, D. Shiu, P. J. Smith, and A. Naguib, “From theory to
practice: An overview of MIMO space-time coded wireless systems,” IEEE J. on
Selected Areas in Communications, vol. 21, pp. 281–302, Apr. 2003.
[83] R. Gharavi and V. Anantharam, “An upper bound for the largest Lyapunov
exponent of a Markovian product of nonnegative matrices,” submitted to The-
oretical Computer Science, 2004.
[84] V. Ghazi-Moghadam, L. Nelson, and M. Kaveh, “Parallel interference can-
cellation for CDMA systems,” in Proc. Allerton Conf. on Communication,
Control and Computing, (Monticello, IL), pp. 216–224, Oct. 1995.
[85] V. L. Girko, “Circular law,” Theory Prob. Appl., vol. 29, pp. 694–706, 1984.
[86] V. L. Girko, Theory of Random Determinants. Dordrecht: Kluwer Academic
Publishers, 1990.
[87] V. L. Girko, “A refinement of the central limit theorem for random determi-
nants,” Theory Prob. Appl., vol. 42, no. 1, pp. 121–129, 1997.
[88] V. L. Girko, “Convergence rate of the expected spectral functions of
symmetric random matrices equal to o(n^{-1/2}),” Random Operators and
Stochastic Equations, vol. 6, pp. 359–408, 1998.
[89] M. A. Girshick, “On the sampling theory of roots of determinantal equations,”
Annals of Math. Statistics, vol. 10, pp. 203–204, 1939.
[90] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits
of MIMO channels,” IEEE J. on Selected Areas in Communications, vol. 21,
pp. 684–702, June 2003.
[91] J. S. Goldstein and I. S. Reed, “A new method of Wiener filtering and its
application to interference mitigation for communications,” in Proc. IEEE
MILCOM, (Monterey, CA), pp. 1087–1091, Nov. 1997.
[92] J. S. Goldstein and I. S. Reed, “Reduced rank adaptive filtering,” IEEE Trans.
Signal Processing, vol. 45, pp. 492–496, Feb. 1997.
[93] J. S. Goldstein, I. S. Reed, and L. L. Scharf, “A multistage representation of
the Wiener filter based on orthogonal projections,” IEEE Trans. on
Information Theory, vol. 44, pp. 2943–2959, Nov. 1998.
[94] C. R. Goodall and K. V. Mardia, “Multivariate aspects of shape theory,”
Annals of Statistics, vol. 21, pp. 848–866, 1993.
[95] F. Götze and A. N. Tikhomirov, “Rate of convergence to the semicircular
law for the Gaussian unitary ensemble,” Theory Prob. Appl., vol. 47, no. 2,
pp. 323–330, 2003.
[96] P. Graczyk, G. Letac, and H. Massam, “The complex Wishart distribution
and the symmetric group,” Annals of Statistics, vol. 31, no. 1, pp. 287–309,
2003.
[97] S. Gradshteyn and I. Ryzhik, Table of Integrals, Series and Products. New
York: Academic, 1965.
[98] A. Grant, “Rayleigh fading multi-antenna channels,” EURASIP J. on Applied
Signal Processing, vol. 3, pp. 316–329, 2002.
[99] A. Grant and C. Schlegel, “Convergence of linear interference cancellation
multiuser receivers,” IEEE Trans. on Communications, vol. 49, no. 10,
pp. 1824–1834, Oct. 2001.
[100] A. J. Grant and P. D. Alexander, “Random sequences multisets for syn-
chronous code-division multiple-access channels,” IEEE Trans. on Informa-
tion Theory, vol. 44, pp. 2832–2836, Nov. 1998.
[101] U. Grenander and J. W. Silverstein, “Spectral analysis of networks with ran-
dom topologies,” SIAM J. of Applied Mathematics, vol. 32, pp. 449–519, 1977.
[102] A. Guionnet and O. Zeitouni, “Concentration of the spectral measure for
large matrices,” Electronic Communications in Probability, vol. 5, pp. 119–
136, 2000.
[103] D. Guo and S. Verdú, “Multiuser detection and statistical mechanics,” in Com-
munications Information and Network Security, (V. Bhargava, H. V. Poor,
V. Tarokh, and S. Yoon, eds.), pp. 229–277, Ch. 13, Kluwer Academic Pub-
lishers, 2002.
[104] A. K. Gupta and D. K. Nagar, “Matrix variate distributions,” in Monographs
and Surveys in pure and applied mathematics, Boca Raton, FL: Chapman and
Hall/CRC, 2000.
[105] W. Hachem, “Low complexity polynomial receivers for downlink CDMA,” in
Proc. Asilomar Conf. on Systems, Signals and Computers, (Pacific Grove,
CA), Nov. 2002.
[106] G. Hackenbroich and H. A. Weidenmueller, “Universality of random-matrix
results for non-Gaussian ensembles,” Physical Review Letters, vol. 74,
pp. 4118–4121, 1995.
[107] P. R. Halmos, Measure Theory. Van Nostrand, Princeton, NJ, 1950.
[108] S. V. Hanly and D. N. C. Tse, “Resource pooling and effective bandwidths in
CDMA networks with multiuser receivers and spatial diversity,” IEEE Trans.
on Information Theory, vol. 47, pp. 1328–1351, May 2001.
[109] S. Hara and R. Prasad, “Overview of multicarrier CDMA,” IEEE Communi-
cations Magazine, vol. 35, pp. 126–133, Dec. 1997.
[110] F. Hiai and D. Petz, “Asymptotic freeness almost everywhere for random
matrices,” Acta Sci. Math. Szeged, vol. 66, pp. 801–826, 2000.
[111] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy.
American Mathematical Society, 2000.
[112] B. Hochwald and S. Vishwanath, “Space-time multiple access: Linear growth
in the sum-rate,” in Proc. Allerton Conf. on Communication, Control and
Computing, (Monticello, IL), Oct. 2002.
[113] B. M. Hochwald, T. L. Marzetta, and V. Tarokh, “Multi-antenna channel
hardening and its implications for rate feedback and scheduling,” IEEE Trans.
on Information Theory, submitted May 2002.
[114] T. Holliday, A. J. Goldsmith, and P. Glynn, “On entropy and Lyapunov ex-
ponents for finite state channels,” submitted to IEEE Trans. on Information
Theory, 2004.
[115] M. L. Honig, “Adaptive linear interference suppression for packet DS-CDMA,”
Euro. Trans. Telecommunications, vol. 9, pp. 173–182, Mar./Apr. 1998.
[116] M. L. Honig and W. Xiao, “Performance of reduced-rank linear interference
suppression for DS-CDMA,” IEEE Trans. on Information Theory, vol. 47,
pp. 1928–1946, July 2001.
[117] R. Horn and C. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[118] D. Hösli and A. Lapidoth, “The capacity of a MIMO Ricean channel is mono-
tonic in the singular values of the mean,” 5th Int. ITG Conf. on Source and
Channel Coding, Jan. 2004.
[119] D. C. Hoyle and M. Rattray, “Principal component analysis eigenvalue spectra
from data with symmetry breaking structure,” Physical Review E, vol. 69,
026124, 2004.
[120] P. L. Hsu, “On the distribution of roots of certain determinantal equations,”
Annals of Eugenics, vol. 9, pp. 250–258, 1939.
[121] L. K. Hua, Harmonic analysis of functions of several complex variables in the
classical domains. Providence, RI: American Mathematical Society, 1963.
[122] P. Jacquet, G. Seroussi, and W. Szpankowski, “On the entropy of a hidden
Markov process,” in Proc. Data Compression Conference, Mar. 23–25 2004.
[123] S. A. Jafar, S. Vishwanath, and A. J. Goldsmith, “Channel capacity and beam-
forming for multiple transmit and receive antennas with covariance feedback,”
Proc. IEEE Int. Conf. on Communications (ICC’01), vol. 7, pp. 2266–2270,
2001.
[124] K. Jamal and E. Dahlman, “Multi-stage interference cancellation for DS-
CDMA,” in Proc. IEEE Vehicular Technology Conf. (VTC’96), (Atlanta, GA),
pp. 671–675, Apr. 1996.
[125] A. T. James, “Distributions of matrix variates and latent roots derived from
normal samples,” Annals of Math. Statistics, vol. 35, pp. 475–501, 1964.
[126] R. Janaswamy, “Analytical expressions for the ergodic capacities of certain
MIMO systems by the Mellin transform,” Proc. IEEE Global Telecomm. Conf.
(GLOBECOM’03), vol. 1, pp. 287–291, Dec. 2003.
[127] S. K. Jayaweera and H. V. Poor, “Capacity of multiple-antenna systems with
both receiver and transmitter channel state information,” IEEE Trans. on
Information Theory, vol. 49, pp. 2697–2709, Oct. 2003.
[128] N. Jindal, S. Vishwanath, and A. Goldsmith, “On the duality of Gaussian
multiple-access and broadcast channels,” IEEE Trans. on Information Theory,
vol. 50, pp. 768–783, May 2004.
[129] K. Johansson, “Universality of the local spacing distribution in certain ensem-
bles of Hermitian Wigner matrices,” Comm. Math. Phys., vol. 215, pp. 683–
705, 2001.
[130] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and
Techniques. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[131] D. Jonsson, “Some limit theorems for the eigenvalues of a sample covariance
matrix,” J. Multivariate Analysis, vol. 12, pp. 1–38, 1982.
[132] E. Jorswieck and H. Boche, “On transmit diversity with imperfect channel
state information,” Proc. IEEE Conf. Acoustics, Speech, and Signal Processing
(ICASSP’02), vol. 3, pp. 2181–2184, May 2002.
[133] V. Jungnickel, T. Haustein, E. Jorswieck, and C. von Helmolt, “On linear pre-
processing in multi-antenna systems,” in Proc. IEEE Global Telecomm. Conf.
(GLOBECOM’02), pp. 1012–1016, Dec. 2002.
[134] M. Kang and M. S. Alouini, “On the capacity of MIMO Rician channels,”
in Proc. 40th Annual Allerton Conference on Communication, Control, and
Computing, (Monticello, IL), Oct. 2002.
[135] M. Kang and M. S. Alouini, “Impact of correlation on the capacity of MIMO
channels,” in Proc. IEEE Int. Conf. in Communications (ICC’03),
pp. 2623–2627, May 2003.
[136] M. Kang and M. S. Alouini, “Largest eigenvalue of complex Wishart matrices
and performance analysis of MIMO MRC systems,” IEEE J. on Selected Areas
in Communications, vol. 21, pp. 418–426, Apr. 2003.
[137] M. Kang, L. Yang, and M. S. Alouini, “Capacity of MIMO Rician channels
with multiple correlated Rayleigh co-channel interferers,” in Proc. 2003 IEEE
Globecom, (San Francisco, CA), Dec. 2003.
[138] M. Kang, L. Yang, and M. S. Alouini, “Performance analysis of MIMO sys-
tems in presence of co-channel interference and additive Gaussian noise,” in
Proc. 37th Annual Conf. on Information Sciences and Systems (CISS’2003),
(Baltimore, MD), Mar. 2003.
[139] H. Kargupta, K. Sivakumar, and S. Ghosh, “Dependency detection in Mo-
biMine and random matrices,” Proc. 6th European Conference on Principles
and Practice of Knowledge Discovery in Databases, pp. 250–262, 2002.
[140] C. G. Khatri, “Distribution of the largest or the smallest characteristic root
under null hypothesis concerning complex multivariate normal populations,”
Annals of Math. Statistics, vol. 35, pp. 1807–1810, Dec. 1964.
[141] C. G. Khatri, “Some results on the non-central multivariate beta distribution
and moments of trace of two matrices,” Annals of Math. Statistics, vol. 36,
no. 5, pp. 1511–1520, 1965.
[142] C. G. Khatri, “On certain distribution problems based on positive definite
quadratic functions in normal vectors,” Annals of Math. Statistics, vol. 37,
pp. 467–479, Apr. 1966.
[143] C. G. Khatri, “Non-central distributions of i-th largest characteristic roots of
three matrices concerning complex multivariate normal populations,” Annals
of the Inst. Statist. Math., vol. 21, pp. 23–32, 1969.
[144] C. G. Khatri, “On the moments of traces of two matrices in three situations
for complex multivariate normal population,” Sankhya: Indian J. of Statistics,
vol. 32, series A, pp. 65–80, 1970.
[145] A. Khorunzhy, “On spectral norm of large band random matrices,” preprint,
Apr. 2004.
[146] A. M. Khorunzhy, B. A. Khoruzhenko, L. A. Pastur, and M. V. Shcherbina,
“The large-n limit in statistical mechanics and the spectral theory of disor-
dered systems,” Phase Transitions, vol. 15, pp. 73–239, 1992.
[147] A. M. Khorunzhy and L. A. Pastur, “Random eigenvalue distributions,” Adv.
Soviet Math., vol. 19, pp. 97–127, 1994.
[148] M. Kiessling and J. Speidel, “Exact ergodic capacity of MIMO channels in
correlated Rayleigh fading environments,” in Proc. Int. Zurich Seminar on
Communications (IZS), (Zurich, Switzerland), Feb. 2004.
[149] M. Kiessling and J. Speidel, “Mutual information of MIMO channels in cor-
related Rayleigh fading environments - a general solution,” in Proc. of IEEE
Int. Conf. in Communications. (ICC’04), (Paris, France), June 2004.
[150] M. Kiessling and J. Speidel, “Unifying analysis of ergodic MIMO capacity in
correlated Rayleigh fading environments,” in Fifth European Wireless Confer-
ence Mobile and Wireless Systems beyond 3G, (Barcelona, Catalonia, Spain),
Feb. 24-27 2004.
[151] Y.-H. Kim and A. Lapidoth, “On the log-determinant of non-central Wishart
matrices,” Proc. 2003 IEEE Int. Symp. on Information Theory, p. 54,
July 2003.
[152] Kiran and D. Tse, “Effective interference and effective bandwidth of linear
multiuser receivers in asynchronous systems,” IEEE Trans. on Information
Theory, vol. 46, pp. 1426–1447, July 2000.
[153] T. Kollo and H. Neudecker, “Asymptotics of eigenvalues and unit-length eigen-
vectors of sample variance and correlation matrices,” J. Multivariate Analysis,
vol. 47, pp. 283–300, 1993.
[154] G. Kreweras, “Sur les partitions non-croisées d’un cycle,” Discrete Math.,
vol. 1, pp. 333–350, 1972.
[155] L. Laloux, P. Cizeau, M. Potters, and J. P. Bouchaud, “Random matrix theory
and financial correlations,” Int. J. of Theoretical and Applied Finance, vol. 3,
no. 3, pp. 391–397, 2000.
[156] W. C. Y. Lee and Y. S. Yeh, “Polarization diversity system for mobile radio,”
IEEE Trans. on Communications, vol. 20, pp. 912–923, Oct. 1972.
[157] O. Lévêque, E. Telatar, and D. Tse, “Upper bounds on the capacity of ad-hoc
wireless networks,” in Proc. 2003 Winter School on Coding and Information
Theory, (Monte Verità, Switzerland), Feb. 24-27, 2003.
[158] L. Li, A. M. Tulino, and S. Verdú, “Asymptotic eigenvalue moments for linear
multiuser detection,” Communications in Information and Systems, vol. 1,
pp. 273–304, Sep. 2001.
[159] L. Li, A. M. Tulino, and S. Verdú, “Design of reduced-rank MMSE multiuser
detectors using random matrix methods,” IEEE Trans. on Information
Theory, vol. 50, June 2004.
[160] L. Li, A. M. Tulino, and S. Verdú, “Spectral efficiency of multicarrier CDMA,”
submitted to IEEE Trans. on Information Theory, 2004.
[161] K. Liu, V. Raghavan, and A. M. Sayeed, “Capacity and spectral efficiency
in wideband correlated MIMO channels,” Proc. 2003 IEEE Int. Symp. on
Information Theory (ISIT 2003), July 2003.
[162] P. Loubaton and W. Hachem, “Asymptotic analysis of reduced rank Wiener
filter,” Proc. IEEE Information Theory Workshop (ITW’03), pp. 328–332,
Paris, France, 2003.
[163] A. Lozano and A. M. Tulino, “Capacity of multiple-transmit multiple-
receive antenna architectures,” IEEE Trans. on Information Theory, vol. 48,
pp. 3117–3128, Dec. 2002.
[164] A. Lozano, A. M. Tulino, and S. Verdú, “Multiple-antenna capacity in the low-
power regime,” IEEE Trans. on Information Theory, vol. 49, pp. 2527–2544,
Oct. 2003.
[165] A. Lozano, A. M. Tulino, and S. Verdú, “High-SNR power offset in multi-
antenna communication,” in Proc. IEEE Int. Symp. on Information Theory
(ISIT’04), (Chicago, IL), June 2004.
[166] A. Lozano, A. M. Tulino, and S. Verdú, “High-SNR power offset in multi-
antenna communication,” Bell Labs Technical Memorandum, June 2004.
[167] U. Madhow and M. Honig, “On the average near-far resistance for MMSE
detection of direct sequence CDMA signals with random spreading,” IEEE
Trans. on Information Theory, vol. 45, pp. 2039–2045, Sep. 1999.
[168] R. K. Mallik, “The pseudo-Wishart distribution and its application to MIMO
systems,” IEEE Trans. on Information Theory, vol. 49, pp. 2761–2769, Oct.
2003.
[169] A. Mantravadi, V. V. Veeravalli, and H. Viswanathan, “Spectral efficiency of
MIMO multiaccess systems with single-user decoding,” IEEE J. on Selected
Areas in Communications, vol. 21, pp. 382–394, Apr. 2003.
[170] V. A. Marčenko and L. A. Pastur, “Distributions of eigenvalues for some sets
of random matrices,” Math. USSR-Sbornik, vol. 1, pp. 457–483, 1967.
[171] T. Marzetta and B. Hochwald, “Capacity of mobile multiple-antenna commu-
nication link in a Rayleigh flat-fading environment,” IEEE Trans. on Infor-
mation Theory, vol. 45, pp. 139–157, Jan. 1999.
[172] M. L. Mehta, “On the statistical properties of the level-spacings in nuclear
spectra,” Nuclear Physics, vol. 18, p. 395, 1960.
[173] M. L. Mehta, Random Matrices and the Statistical Theory of Energy Levels.
New York, Academic Press, 1967.
[174] M. L. Mehta, “Power series of level spacing functions of random matrix en-
sembles,” Z. Phys. B, vol. 86, pp. 258–290, 1992.
[175] M. L. Mehta and F. Dyson, “Statistical theory of the energy levels of complex
systems,” J. Math. Physics, vol. 4, pp. 713–719, 1963.
[176] M. L. Mehta and M. Gaudin, “On the density of the eigenvalues of a random
matrix,” Nuclear Physics, vol. 18, pp. 420–427, 1960.
[177] X. Mestre, Space processing and channel estimation: performance analysis and
asymptotic results. PhD thesis, Dept. de Teoria del Senyal i Comunicacions,
Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain, 2002.
[178] X. Mestre, J. R. Fonollosa, and A. Pages-Zamora, “Capacity of MIMO
channels: asymptotic evaluation under correlated fading,” IEEE J. on Selected
Areas in Communications, vol. 21, pp. 829–838, June 2003.
[179] S. Moshavi, E. G. Kanterakis, and D. L. Schilling, “Multistage linear receivers
for DS-CDMA systems,” Int. J. of Wireless Information Networks, vol. 3,
no. 1, pp. 1–17, 1996.
[180] A. L. Moustakas and S. H. Simon, “Optimizing multiple-input single-output
(MISO) communication systems with general Gaussian channels: nontrivial
covariance and nonzero mean,” IEEE Trans. on Information Theory, vol. 49,
pp. 2770–2780, Oct. 2003.
[181] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, “MIMO capacity through
correlated channels in the presence of correlated interferers and noise: a (not
so) large N analysis,” IEEE Trans. on Information Theory, vol. 49, pp. 2545–
2561, Oct. 2003.
[182] R. J. Muirhead, Aspects of multivariate statistical theory. New York, Wiley,
1982.
[183] R. R. Müller, Power and Bandwidth Efficiency of Multiuser Systems with
Random Spreading. PhD thesis, Universität Erlangen-Nürnberg, Erlangen,
Germany, Nov. 1998.
[184] R. R. Müller, “On the asymptotic eigenvalue distribution of concatenated
vector-valued fading channels,” IEEE Trans. on Information Theory, vol. 48,
pp. 2086–2091, July 2002.
[185] R. R. Müller, “Multiuser receivers for randomly spread signals: Fundamen-
tal limits with and without decision-feedback,” IEEE Trans. on Information
Theory, vol. 47, no. 1, pp. 268–283, Jan. 2001.
[186] R. R. Müller and W. Gerstacker, “On the capacity loss due to separation of
detection and decoding in large CDMA systems,” in IEEE Information Theory
Workshop (ITW), p. 222, Oct. 2002.
[187] R. R. Müller and S. Verdú, “Design and analysis of low-complexity interference
mitigation on vector channels,” IEEE J. on Selected Areas in
Communications, vol. 19, pp. 1429–1441, Aug. 2001.
[188] F. D. Neeser and J. L. Massey, “Proper complex random processes with appli-
cations to information theory,” IEEE Trans. on Information Theory, vol. 39,
pp. 1293–1302, July 1993.
[189] A. Nica, R-transforms in free probability. Paris, France: Henri Poincaré
Institute, 1999.
[190] A. Nica and R. Speicher, “On the multiplication of free n-tuples of non-
commutative random variables,” American J. Math., vol. 118, no. 4, pp. 799–
837, 1996.
[191] B. Niederhauser, “Norms of certain random matrices with dependent entries,”
Random Operators and Stochastic Equations, vol. 11, no. 1, pp. 83–101, 2003.
[192] A. Y. Orlov, “New solvable matrix integrals,” Acta Sci. Math, vol. 63, pp. 383–
395, 1997.
[193] V. I. Oseledec, “A multiplicative ergodic theorem. Lyapunov characteristic
numbers for dynamical systems,” Trans. Moscow Math. Soc., vol. 19,
pp. 197–231, 1968.
[194] H. Ozcelik, M. Herdin, W. Weichselberger, G. Wallace, and E. Bonek, “De-
ficiencies of the Kronecker MIMO channel model,” IEE Electronics Letters,
vol. 39, pp. 209–210, Aug. 2003.
[195] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Uniform power allocation in
MIMO channels: A game theoretic approach,” IEEE Trans. on Information
Theory, vol. 49, pp. 1707–1727, July 2003.
[196] L. A. Pastur, “On the spectrum of random matrices,” Teoret. Mat. Fiz. (En-
glish translation: Theoretical and Mathematical Physics), vol. 10, pp. 67–74,
1972.
[197] L. A. Pastur, “Spectra of random self-adjoint operators,” Uspekhi Mat. Nauk.
(Russian Math. Surveys), vol. 28, pp. 1–67, 1973.
[198] L. A. Pastur, “Eigenvalue distribution of random matrices: some recent re-
sults,” Annals Inst. Henri Poincaré, vol. 64, pp. 325–337, 1996.
[199] L. A. Pastur, “Random matrices as paradigm,” Mathematical Physics 2000,
pp. 216–265, 2000.
[200] L. A. Pastur and M. Shcherbina, “Universality of the local eigenvalue statistics
for a class of unitary invariant ensembles,” J. Stat. Physics, vol. 86, pp. 109–
147, 1997.
[201] M. Peacock, I. Collings, and M. L. Honig, “Asymptotic spectral efficiency of
multi-user multi-signature CDMA in frequency-selective channels,” preprint,
2004.
[202] M. Peacock, I. Collings, and M. L. Honig, “Asymptotic SINR analysis of
single-user MC-CDMA in Rayleigh fading,” Proc. IEEE Int. Symp. on Spread
Spectrum Systems and Applications (ISSSTA’02), pp. 338–342, Sep. 2002.
[203] K. I. Pedersen, J. B. Andersen, J. P. Kermoal, and P. E. Mogensen, “A
stochastic multiple-input multiple-output radio channel model for evaluations
of space-time coding algorithms,” Proc. IEEE Vehicular Technology Conf.
(VTC’2000 Fall), pp. 893–897, Sep. 2000.
[204] M. S. Pinsker, Information and Information Stability of Random Variables
and Processes. San Francisco, CA: Holden-Day, 1964.
[205] G. Raleigh and J. M. Cioffi, “Spatio-temporal coding for wireless communi-
cations,” IEEE Trans. on Communications, vol. 46, pp. 357–366, Mar. 1998.
[206] P. Rapajic and D. Popescu, “Information capacity of a random signature
multiple-input multiple-output channel,” IEEE Trans. on Communications,
vol. 48, pp. 1245–1248, Aug. 2000.
[207] L. Rasmussen, A. Johansson, and T. Lim, “One-shot filtering equivalence for
linear successive interference cancellation in CDMA,” in Proc. IEEE
Vehicular Technology Conf. (VTC’97), (Phoenix, AZ), pp. 2163–2167, May 1997.
[208] T. Ratnarajah, R. Vaillancourt, and M. Alvo, “Complex random matrices and
applications,” Math. Rep. of the Acad. of Sci. of the Royal Soc. of Canada,
vol. 25, pp. 114–120, Dec. 2003.
[209] T. Ratnarajah, R. Vaillancourt, and M. Alvo, “Complex random matrices
and Rayleigh channel capacity,” Communications in Information and Systems,
pp. 119–138, Oct. 2003.
[210] S. N. Roy, “p-statistics or some generalizations in the analysis of variance
appropriate to multivariate problems,” Sankhya: Indian J. of Statistics, vol. 4,
pp. 381–396, 1939.
[211] O. Ryan, “On the limit distributions of random matrices with independent or
free entries,” Communications in Mathematical Physics, vol. 193, pp. 595–626,
1998.
[212] U. Sacoglu and A. Scaglione, “Asymptotic capacity of space-time coding for
arbitrary fading: a closed form expression using Girko’s law,” in Proc. Intl.
Conf. on Acoust. Speech and Signal Proc., ICASSP(2001), (Salt Lake City,
UT), May 7-12 2001.
[213] A. Sayeed, “Deconstructing multi-antenna channels,” IEEE Trans. on Signal
Processing, vol. 50, pp. 2563–2579, Oct. 2002.
[214] L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time
Series Analysis. New York, NY: Addison-Wesley, 1990.
[215] L. L. Scharf, “The SVD and reduced rank signal processing,” Signal Process-
ing, vol. 24, pp. 113–133, Nov. 1991.
[216] A. M. Sengupta and P. P. Mitra, “Capacity of multivariate channels with
multiplicative noise: I. Random matrix techniques and large-N expansions for
full transfer matrices,” LANL arXiv:physics, Oct. 2000.
[217] S. Shamai and S. Verdú, “The effect of frequency-flat fading on the spectral
efficiency of CDMA,” IEEE Trans. on Information Theory, vol. 47, pp. 1302–
1327, May 2001.
[218] J. Shen, “On the singular values of Gaussian random matrices,” Linear Algebra
and its Applications, vol. 326, no. 1-3, pp. 1–14, 2001.
[219] H. Shin and J. H. Lee, “Capacity of multiple-antenna fading channels: Spatial
fading correlation, double scattering and keyhole,” IEEE Trans. on Informa-
tion Theory, vol. 49, pp. 2636–2647, Oct. 2003.
[220] D.-S. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation
and its effects on the capacity of multi-element antenna systems,” IEEE Trans.
on Communications, vol. 48, pp. 502–511, Mar. 2000.
[221] D. Shlyakhtenko, “Random Gaussian band matrices and freeness with
amalgamation,” Int. Math. Res. Note, vol. 20, pp. 1013–1025, 1996.
[222] J. Silverstein and S. Choi, “Analysis of the limiting spectral distribution of
large dimensional random matrices,” J. of Multivariate Analysis, vol. 54(2),
pp. 295–309, 1995.
[223] J. W. Silverstein, “The limiting eigenvalue distribution of a multivariate F-
matrix,” SIAM J. of Math. Analysis, vol. 16, pp. 641–646, 1985.
[224] J. W. Silverstein, “On the eigenvectors of large dimensional sample covariance
matrices,” J. of Multivariate Analysis, vol. 30, pp. 1–16, 1989.
[225] J. W. Silverstein, “Weak-convergence of random functions defined by the
eigenvectors of sample covariance matrices,” Annals of Probability, vol. 18,
pp. 1174–1194, July 1990.
[226] J. W. Silverstein, “Strong convergence of the empirical distribution of
eigenvalues of large dimensional random matrices,” J. of Multivariate Analysis,
vol. 55, pp. 331–339, 1995.
[227] J. W. Silverstein and Z. D. Bai, “On the empirical distribution of eigenvalues
of a class of large dimensional random matrices,” J. of Multivariate Analysis,
vol. 54, pp. 175–192, 1995.
[228] J. W. Silverstein and P. L. Combettes, “Signal detection via spectral theory
of large dimensional random matrices,” IEEE Trans. on Signal Processing,
vol. 40, pp. 2100–2105, Aug. 1992.
[229] S. H. Simon and A. L. Moustakas, “Optimizing MIMO systems with channel
covariance feedback,” IEEE J. on Selected Areas in Communications, vol. 21,
pp. 406–417, Apr. 2003.
[230] S. H. Simon and A. L. Moustakas, “Eigenvalue density of correlated complex
random Wishart matrices,” Physical Review E, vol. 69, to appear, 2004.
[231] S. H. Simon, A. L. Moustakas, and L. Marinelli, “Capacity and character
expansions: Moment generating function and other exact results for MIMO
correlated channels,” Bell Labs Technical Memorandum ITD-04-45211T, Mar.
2004.
[232] R. Singh and L. Milstein, “Interference suppression for DS-CDMA,” IEEE
Trans. on Communications, vol. 47, pp. 446–453, Mar. 1999.
[233] P. J. Smith and L. M. Garth, “Exact capacity distribution for dual MIMO
systems in Ricean fading,” IEEE Communications Letters, vol. 8, pp. 18–20,
Jan. 2004.
[234] P. J. Smith, S. Roy, and M. Shafi, “Capacity of MIMO systems with semi-
correlated flat-fading,” IEEE Trans. on Information Theory, vol. 49, pp. 2781–
2788, Oct. 2003.
[235] P. J. Smith and M. Shafi, “On a Gaussian approximation to the capac-
ity of wireless MIMO systems,” Proc. IEEE Int. Conf. in Communications.
(ICC’02), pp. 406–410, Apr. 2002.
[236] P. Soma, D. S. Baum, V. Erceg, R. Krishnamoorthy, and A. Paulraj, “Analysis
and modelling of multiple-input multiple-output (MIMO) radio channel based
on outdoor measurements conducted at 2.5 GHz for fixed BWA applications,”
Proc. IEEE Int. Conf. on Communications (ICC’02), New York City, NY,
pp. 272–276, 28 Apr.-2 May 2002.
[237] O. Somekh, B. M. Zaidel, and S. Shamai, “Spectral efficiency of joint multiple
cell-site processors for randomly spread DS-CDMA systems,” in Proc. IEEE
Int. Symp. on Information Theory (ISIT’04), (Chicago, IL), June 2004.
[238] H. J. Sommers, A. Crisanti, H. Sompolinsky, and Y. Stein, “Spectrum of large
random asymmetric matrices,” Physical Review Letters, vol. 60, pp. 1895–
1899, May 1988.
[239] R. Speicher, “A new example of independence and white noise,” Probability
Theory and Related Fields, vol. 84, pp. 141–159, 1990.
[240] R. Speicher, “Free convolution and the random sum of matrices,” Publ. Res.
Inst. Math. Sci., vol. 29, pp. 731–744, 1993.
[241] R. Speicher, Free probability theory and non-crossing partitions. 39e Séminaire
Lotharingien de Combinatoire, 1997.
[242] R. Speicher, “Free calculus,” in Summer school on Quantum Probability,
(Grenoble, France), 1998.
[243] R. Speicher, Freie Wahrscheinlichkeitstheorie (WS 97/98). Heidelberg Uni-
versity, 1998.
[244] M. S. Srivastava and C. G. Khatri, An Introduction to Multivariate Statistics.
North-Holland, Amsterdam, 1979.
[245] G. W. Stewart, “The efficient generation of random orthogonal matrices with
an application to conditional estimation,” SIAM J. Numer. Analysis, vol. 17,
pp. 403–409, June 1980.
[246] T. J. Stieltjes, “Recherches sur les fractions continues,” Annales de la Faculté
des Sciences de Toulouse, vol. 8 (9), no. A (J), pp. 1–47 (1–122), 1894 (1895).
[247] E. G. Ström and S. L. Miller, “Properties of the single-bit single-user MMSE
receiver for DS-CDMA system,” IEEE Trans. on Communications, vol. 47,
pp. 416–425, Mar. 1999.
[248] S. Szarek and D. Voiculescu, “Volumes of restricted Minkowski sums and the
free analogue of the entropy power inequality,” Communications in Mathe-
matical Physics, vol. 178, pp. 563–570, July 1996.
[249] T. Tanaka, “A statistical-mechanics approach to large-system analysis of
CDMA multiuser detectors,” IEEE Trans. on Information Theory, vol. 48,
pp. 2888–2910, Nov. 2002.
[250] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Euro. Trans.
Telecommunications, vol. 10, pp. 585–595, Nov.-Dec. 1999.
[251] L. Trichard, I. Collings, and J. Evans, “Parameter selection for multiuser
receivers based on iterative methods,” in Proc. IEEE Vehicular Techn. Conf.
(VTC’00), (Tokyo, Japan), pp. 926–930, May 2000.
[252] L. Trichard, J. Evans, and I. Collings, “Large system analysis of linear mul-
tistage parallel interference cancellation,” IEEE Trans. on Communications,
vol. 50, pp. 1778–1786, Nov. 2002.
[253] L. Trichard, J. Evans, and I. Collings, “Optimal linear multistage receivers
and the recursive large system SIR,” in Proc. IEEE Int. Symp. on Information
Theory (ISIT’02), (Lausanne, Switzerland), p. 21, June 2002.
[254] L. Trichard, J. Evans, and I. Collings, “Large system analysis of second-order
linear multistage CDMA receivers,” IEEE Trans. on Wireless Communica-
tions, vol. 2, pp. 591–600, May 2003.
[255] L. Trichard, J. Evans, and I. Collings, “Optimal linear multistage receivers
with unequal power users,” in Proc. IEEE Int. Symp. on Information Theory
(ISIT’03), (Yokohama, Japan), p. 21, June 2003.
[256] D. Tse and S. Hanly, “Linear multiuser receivers: Effective interference, ef-
fective bandwidth and user capacity,” IEEE Trans. on Information Theory,
vol. 45, pp. 641–657, Mar. 1999.
[257] D. Tse and O. Zeitouni, “Linear multiuser receivers in random environments,”
IEEE Trans. on Information Theory, vol. 46, pp. 171–188, Jan. 2000.
[258] D. N. Tse and S. Verdú, “Optimum asymptotic multiuser efficiency of ran-
domly spread CDMA,” IEEE Trans. on Information Theory, vol. 46, no. 6,
pp. 2718–2723, Nov. 2000.
[259] D. N. Tse and P. Viswanath, “On the capacity of the multiple antenna broad-
cast channel,” in Multiantenna channels: Capacity, Coding and Signal Pro-
cessing, (G. Foschini and S. Verdú, eds.), pp. 87–106, American Mathematical
Society Press, 2003.
[260] B. S. Tsybakov, “The capacity of a memoryless Gaussian vector channel,”
Problems of Information Transmission, vol. 1, pp. 18–29, 1965.
[261] A. M. Tulino, A. Lozano, and S. Verdú, “Capacity-achieving input covariance
for single-user multi-antenna channels,” Bell Labs Tech. Memorandum ITD-
04-45193Y (also submitted to IEEE Trans. on Wireless Communications.),
Sep. 2003.
[262] A. M. Tulino, A. Lozano, and S. Verdú, “Impact of correlation on the capacity
of multi-antenna channels,” Bell Labs Technical Memorandum ITD-03-44786F
(also submitted to IEEE Trans. on Information Theory), Sep. 2003.
[263] A. M. Tulino, A. Lozano, and S. Verdú, “MIMO capacity with channel state
information at the transmitter,” in Proc. IEEE Int. Symp. on Spread Spectrum
Tech. and Applications (ISSSTA’04), Aug. 2004.
[264] A. M. Tulino, A. Lozano, and S. Verdú, “Power allocation in multi-antenna
communication with statistical channel information at the transmitter,” in
Proc. IEEE Int. Conf. on Personal, Indoor and Mobile Radio Communica-
tions. (PIMRC’04), (Barcelona, Catalonia, Spain), Sep. 2004.
[265] A. M. Tulino and S. Verdú, “Asymptotic analysis of improved linear receivers
for BPSK-CDMA subject to fading,” IEEE J. on Selected Areas in Commu-
nications, vol. 19, pp. 1544–1555, Aug. 2001.
[266] A. M. Tulino, S. Verdú, and A. Lozano, “Capacity of antenna arrays with
space, polarization and pattern diversity,” Proc. 2003 IEEE Information The-
ory Workshop (ITW’03), pp. 324–327, Apr. 2003.
[267] H. Uhlig, “On singular Wishart and singular multivariate beta distributions,”
Annals of Statistics, vol. 22, pp. 395–405, 1994.
[268] V. V. Veeravalli, Y. Liang, and A. Sayeed, “Correlated MIMO Rayleigh fading
channels: Capacity, optimal signalling and asymptotics,” submitted to IEEE
Trans. on Information Theory, 2003.
[269] S. Venkatesan, S. H. Simon, and R. A. Valenzuela, “Capacity of a Gaussian
MIMO channel with nonzero mean,” Proc. 2003 IEEE Vehicular Technology
Conf. (VTC’03), Oct. 2003.
[270] S. Verdú, “Capacity region of Gaussian CDMA channels: The symbol syn-
chronous case,” in Proc. Allerton Conf. on Communication, Control and Com-
puting, (Monticello, IL), pp. 1025–1034, Oct. 1986.
[271] S. Verdú, Multiuser Detection. Cambridge, UK: Cambridge University Press,
1998.
[272] S. Verdú, “Random matrices in wireless communication, proposal to the Na-
tional Science Foundation,” Feb. 1999.
[273] S. Verdú, “Large random matrices and wireless communications,” 2002 MSRI
Information Theory Workshop, Feb 25–Mar 1, 2002.
[274] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Trans. on In-
formation Theory, vol. 48, no. 6, pp. 1319–1343, June 2002.
[275] S. Verdú and S. Shamai, “Spectral efficiency of CDMA with random spread-
ing,” IEEE Trans. on Information Theory, vol. 45, pp. 622–640, Mar. 1999.
[276] S. Vishwanath, N. Jindal, and A. Goldsmith, “On the capacity of multiple
input multiple output broadcast channels,” in Proc. IEEE Int. Conf. in Com-
munications (ICC’02), pp. 1444–1450, Apr. 2002.
[277] S. Vishwanath, N. Jindal, and A. Goldsmith, “Duality, achievable rates and
sum-rate capacity of Gaussian MIMO broadcast channels,” IEEE Trans. on
Information Theory, vol. 49, pp. 2658–2668, Oct. 2003.
[278] S. Vishwanath, G. Kramer, S. Shamai, S. Jafar, and A. Goldsmith, “Capacity
bounds for Gaussian vector broadcast channels,” in Multiantenna channels:
Capacity, Coding and Signal Processing, (G. Foschini and S. Verdú, eds.),
pp. 107–122, American Mathematical Society Press, 2003.
[279] E. Visotsky and U. Madhow, “Space-time transmit precoding with imperfect
feedback,” IEEE Trans. on Information Theory, vol. 47, pp. 2632–2639, Sep.
2001.
[280] P. Viswanath and D. N. Tse, “Sum capacity of the multiple antenna Gaussian
broadcast channel,” in Proc. IEEE Int. Symp. Information Theory (ISIT’02),
p. 497, June 2002.
[281] P. Viswanath, D. N. Tse, and V. Anantharam, “Asymptotically optimal water-
filling in vector multiple-access channels,” IEEE Trans. on Information The-
ory, vol. 47, pp. 241–267, Jan. 2001.
[282] H. Viswanathan and S. Venkatesan, “Asymptotics of sum rate for dirty paper
coding and beamforming in multiple antenna broadcast channels,” in Proc.
Allerton Conf. on Communication, Control and Computing, (Monticello, IL),
Oct. 2003.
[283] D. Voiculescu, “Asymptotically commuting finite rank unitary operators with-
out commuting approximants,” Acta Sci. Math., vol. 45, pp. 429–431, 1983.
[284] D. Voiculescu, “Symmetries of some reduced free product c∗ -algebra,” in Op-
erator algebras and their connections with topology and ergodic theory, Lecture
Notes in Mathematics, vol. 1132, pp. 556–588, Berlin: Springer, 1985.
[285] D. Voiculescu, “Addition of certain non-commuting random variables,” J.
Funct. Analysis, vol. 66, pp. 323–346, 1986.
[286] D. Voiculescu, “Multiplication of certain non-commuting random variables,”
J. Operator Theory, vol. 18, pp. 223–235, 1987.
[287] D. Voiculescu, “Limit laws for random matrices and free products,” Inven-
tiones Mathematicae, vol. 104, pp. 201–220, 1991.
[288] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure
in free probability theory, I,” Communications in Math. Physics, vol. 155,
pp. 71–92, July 1993.
[289] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure
in free probability theory, II,” Inventiones Mathematicae, vol. 118, pp. 411–
440, Nov. 1994.
[290] D. Voiculescu, “Alternative proofs for the type II free Poisson variables and
for the free compression results (appendix to a paper by A. Nica and R.
Speicher),” American J. Math., vol. 118, pp. 832–837, 1996.
[291] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure
in free probability theory III: The absence of Cartan subalgebras,” Geometric
and Functional Analysis, vol. 6, pp. 172–199, 1996.
[292] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure
in free probability theory, IV: Maximum entropy and freeness,” in Free Proba-
bility Theory, (D. Voiculescu, ed.), pp. 293–302, Fields Inst. Communications,
Amer. Math. Soc., 1997.
[293] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure
in free probability theory V: noncommutative Hilbert transforms,” Inventiones
Mathematicae, vol. 132, pp. 189–227, Apr. 1998.
[294] D. Voiculescu, “A strengthened asymptotic freeness result for random matrices
with applications to free entropy,” Int. Math. Res. Notices, vol. 1, pp. 41–63,
1998.
[295] D. Voiculescu, “Lectures on free probability theory,” in Lectures on Probability
theory and Statistics: École d’Été de Probabilités; Lecture Notes in Mathematics,
pp. 283–349, Springer, 2000.
[296] K. W. Wachter, “The strong limits of random matrix spectra for sample ma-
trices of independent elements,” Annals of Probability, vol. 6, no. 1, pp. 1–18,
1978.
[297] K. W. Wachter, “The limiting empirical measure of multiple discriminant
ratios,” Annals of Statistics, vol. 8, pp. 937–957, 1980.
[298] X. Wang and H. V. Poor, “Blind multiuser detection: A subspace approach,”
IEEE Trans. on Information Theory, vol. 44, pp. 677–690, Mar. 1998.
[299] Z. Wang and G. Giannakis, “Outage mutual information of space-time MIMO
channels,” IEEE Trans. on Information Theory, vol. 50, pp. 657–663, Apr.
2004.
[300] S. Wei and D. Goeckel, “On the minimax robustness of the uniform trans-
mission power strategy in MIMO systems,” IEEE Communications Letters,
vol. 7, pp. 523–524, Nov. 2003.
[301] W. Weichselberger, M. Herdin, H. Ozcelik, and E. Bonek, “Stochastic MIMO
channel model with joint correlation of both link ends,” to appear in IEEE
Trans. on Wireless Communications, 2004.
[302] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the
Gaussian MIMO broadcast channel,” Proc. Conf. on Information Sciences
and Systems (CISS’04), pp. 7–12, Mar. 2004.
[303] E. Wigner, “Characteristic vectors of bordered matrices with infinite dimen-
sions,” Annals of Mathematics, vol. 62, pp. 546–564, 1955.
[304] E. Wigner, “Results and theory of resonance absorption,” in Conference on
Neutron Physics by Time-of-Flight, Nov. 1-2 1956. Oak Ridge National Lab.
Report ORNL-2309.
[305] E. Wigner, “On the distribution of roots of certain symmetric matrices,” An-
nals of Mathematics, vol. 67, pp. 325–327, 1958.
[306] E. Wigner, “Statistical properties of real symmetric matrices with many di-
mensions,” Proc. 4th Canadian Math. Congress, pp. 174–176, 1959.
[307] E. Wigner, “Distribution laws for the roots of a random Hermitian matrix,”
in Statistical Theories of Spectra: Fluctuations, (C. E. Porter, ed.), New York:
Academic, 1965.
[308] E. Wigner, “Random matrices in physics,” SIAM Review, vol. 9, pp. 1–123,
1967.
[309] J. H. Winters, “Optimum combining in digital mobile radio with cochannel
interference,” IEEE J. on Selected Areas in Communications, vol. 2, pp. 528–
539, July 1984.
[310] J. H. Winters, J. Salz, and R. D. Gitlin, “The impact of antenna diversity on
the capacity of wireless communication systems,” IEEE Trans. on Communi-
cations, vol. 42, pp. 1740–1751, Feb./Mar./Apr. 1994.
[311] J. Wishart, “The generalized product moment distribution in samples from a
normal multivariate population,” Biometrika, vol. 20 A, pp. 32–52, 1928.
[312] W. Xiao and M. L. Honig, “Large system convergence analysis of adaptive
reduced- and full-rank least squares algorithms,” IEEE Trans. on Information
Theory, 2004, to appear.
[313] Y. Q. Yin, “Limiting spectral distribution for a class of random matrices,” J.
of Multivariate Analysis, vol. 20, pp. 50–68, 1986.
[314] Y. Q. Yin and P. R. Krishnaiah, “A limit theorem for the eigenvalues of
product of two random matrices,” J. of Multivariate Analysis, vol. 13, pp. 489–
507, 1984.
[315] Y. Q. Yin and P. R. Krishnaiah, “Limit theorem for the eigenvalues of the
sample covariance matrix when the underlying distribution is isotropic,” The-
ory Prob. Appl., vol. 30, pp. 861–867, 1985.
[316] W. Yu and J. Cioffi, “Trellis precoding for the broadcast channel,” in Proc.
IEEE Global Telecomm. Conf. (GLOBECOM’01), pp. 1344–1348, Oct. 2001.
[317] B. M. Zaidel, S. Shamai, and S. Verdú, “Multicell uplink spectral efficiency of
coded DS-CDMA with random signatures,” IEEE Journal on Selected Areas
in Communications, vol. 19, pp. 1556–1569, Aug. 2001.
[318] J. Zhang and X. Wang, “Large-system performance analysis of blind and
group-blind multiuser receivers,” IEEE Trans. on Information Theory, vol. 48,
pp. 2507–2523, Sep. 2002.
