Chapter 3

PROBABILITY AND STOCHASTIC PROCESSES

God does not play dice with the universe.


—Albert Einstein

Not only does God definitely play dice, but He sometimes confuses us by throwing them
where they cannot be seen.
—Stephen Hawking

Abstract This chapter aims to provide a cohesive overview of basic probability concepts,
starting from the axioms of probability, covering events, event spaces, joint and
conditional probabilities, and leading to the introduction of random variables,
discrete probability distributions and probability density functions (PDFs).
Then follows an exploration of random variables and stochastic processes,
including cumulative distribution functions (CDFs), moments, joint densities,
marginal densities, transformations and the algebra of random variables. A number of
useful univariate densities (Gaussian, chi-square, non-central chi-square, Rice,
etc.) are then studied in turn. Finally, an introduction to multivariate statistics
is provided, including the exterior product (used instead of the conventional
determinant in matrix r.v. transformations), Jacobians of random matrix
transformations, and culminating with the introduction of the Wishart distribution.
Multivariate statistics are instrumental in characterizing multidimensional problems
such as array processing operating over random vector or matrix channels.
Throughout the chapter, an emphasis is placed on complex random variables,
vectors and matrices, because of their unique importance in digital communications.


This material is to appear as part of the book Space-Time Methods, vol. 1:
Space-Time Processing, © 2009 by Sébastien Roy. All rights reserved.

1. Introduction
The theory of probability and stochastic processes is of central importance
in many core aspects of communication theory, including modeling of informa-
tion sources, of additive noise, and of channel characteristics and fluctuations.
Ultimately, through such modeling, probability and stochastic processes are in-
strumental in assessing the performance of communication systems, wireless
or otherwise.
It is expected that most readers will have at least some familiarity with this
vast topic. If that is not the case, the brief overview provided here may be
less than satisfactory. Interested readers who wish for a fuller treatment of
the subject from an engineering perspective may consult a number of classic
textbooks [Papoulis and Pillai, 2002], [Leon-Garcia, 1994], [Davenport, 1970].
The subject matter of this book calls for the development of a perhaps lesser-
known, but increasingly active, branch of statistics, namely multivariate statis-
tical theory. It is an area whose applications — in the field of communication
theory in general, and in array processing / MIMO systems in particular —
have grown considerably in recent years, mostly thanks to the usefulness and
versatility of the Wishart distribution. To supplement the treatment given
here, readers are directed to the excellent textbooks [Muirhead, 1982] and
[Anderson, 1958].

2. Probability
Experiments, events and probabilities
Definition 3.1. A probability experiment or statistical experiment consists
in performing an action which may result in a number of possible outcomes,
the actual outcome being randomly determined.
For example, rolling a die and tossing a coin are probability experiments. In
the first case, there are 6 possible outcomes while in the second case, there are
2.
Definition 3.2. The sample space S of a probability experiment is the set of
all possible outcomes.
In the case of a coin toss, we have

S = {t, h} , (3.1)

where t denotes “tails” and h denotes “heads”.


Another, slightly more sophisticated, probability experiment could consist
in tossing a coin five times and defining the outcome as being the total number
of heads obtained. Hence, the sample space would be
S = {0, 1, 2, 3, 4, 5} . (3.2)
Definition 3.3. An event E occurs if the outcome of the experiment is part of
a predetermined subset (as defined by E) of the sample space S.
For example, let the event A correspond to obtaining an even number
of heads in the five-toss experiment. Therefore, we have
A = {0, 2, 4} . (3.3)
A single outcome can also be considered an event. For instance, let the event
B correspond to the obtention of three heads in the five-toss experiment, i.e.
B = {3}. This is also called a single event or a sample point (since it is a
single element of the sample space).
Definition 3.4. The complement of an event X consists of all the outcomes
(sample points) in the sample space S that are not in event X.
For example, the complement of event A consists in obtaining an odd num-
ber of heads, i.e.
Ā = {1, 3, 5} . (3.4)
Two events are considered mutually exclusive if they have no outcome in
common. For instance, A and Ā are mutually exclusive. In fact, an event and
its complement must by definition be mutually exclusive. The events C = {1}
and D = {3, 5} are also mutually exclusive, but B = {3} and D are not.
Definition 3.5. Probability (simple definition): If an experiment has N pos-
sible and equally-likely exclusive outcomes, and M ≤ N of these outcomes
constitute an event E, then the probability of E is

P (E) = M/N.   (3.5)
Example 3.1
Consider a die roll. If the die is fair, all sample points are equally likely. Defin-
ing event Ek as corresponding to outcome / sample point k, where k ranges
from 1 to 6, we have

P (Ek ) = 1/6,   k = 1 . . . 6.

Given an event F = {1, 2, 3, 4}, we find

P (F ) = 4/6 = 2/3,

and

P (F̄ ) = 2/6 = 1/3.
Definition 3.6. The sum or union of two events is an event that contains all
the outcomes in the two events.
For example,

C ∪ D = {1} ∪ {3, 5} = {1, 3, 5} = Ā. (3.6)

Therefore the event Ā corresponds to the occurrence of event C or event D.


Definition 3.7. The product or intersection of two events is an event that
contains only the outcomes that are common in the two events.
For example
Ā ∩ B = {1, 3, 5} ∩ {3} = {3} , (3.7)
and
{1, 2, 3, 4} ∩ Ā = {1, 2, 3, 4} ∩ {1, 3, 5} = {1, 3} . (3.8)
Therefore, the intersection corresponds to the occurrence of one event and
the other.
It is noteworthy that the intersection of two mutually exclusive events yields
the null event, e. g.
E ∩ Ē = ∅, (3.9)
where ∅ denotes the null event.
A more rigorous definition of probability calls for the statement of four pos-
tulates.
Definition 3.8. Probability (rigorous definition): Given that each event E is
associated with a corresponding probability P (E),
Postulate 1: The probability of a given event E is such that P (E) ≥ 0.
Postulate 2: The probability associated with the null event is zero, i.e.
P (∅) = 0.
Postulate 3: The probability of the event corresponding to the entire sample
space (referred to as the certain event) is 1, i.e. P (S) = 1.
Postulate 4: Given a number N of mutually exclusive events X1 , X2 ,
. . . XN , then the probability of the union of these events is given by

P (∪_{i=1}^{N} Xi ) = Σ_{i=1}^{N} P (Xi ).   (3.10)
From postulates 1, 3 and 4, it is easy to deduce that the probability of an
event E must necessarily satisfy the condition

P (E) ≤ 1.

The proof is left as an exercise.

Twin experiments
It is often of interest to consider two separate experiments as a whole. For
example, if one probability experiment consists of a coin toss with sample space
S = {h, t}, two consecutive (or simultaneous) coin tosses constitute twin ex-
periments or a joint experiment. The event space of such a joint experiment is
therefore
S2 = {(h, h), (h, t), (t, h), (t, t)} . (3.11)
Let Xi (i = 1, 2) correspond to an outcome on the first coin toss and Yj (j =
1, 2) correspond to an outcome on the second coin toss. To each joint outcome
(Xi , Yj ) is associated a joint probability P (Xi , Yj ). This corresponds naturally
to the probability that events Xi and Yj occurred and it can therefore also be
written P (Xi ∩ Yj ). Suppose that given only the set of joint probabilities, we
wish to find P (X1 ) and P (X2 ), that is, the marginal probabilities of events X1
and X2 .
Definition 3.9. The marginal probability of an event E is, in the context of
a joint experiment, the probability that event E occurs irrespective of the other
constituent experiments in the joint experiment.
In general, a twin experiment has outcomes Xi , i = 1, 2, . . . , N1 , for the
first experiment and Yj , j = 1, 2, . . . , N2 for the second experiment. If all the
Yj are mutually exclusive, the marginal probability of Xi is given by
P (Xi ) = Σ_{j=1}^{N2} P (Xi , Yj ) .   (3.12)

By the same token, if all the Xi are mutually exclusive, we have


P (Yj ) = Σ_{i=1}^{N1} P (Xi , Yj ) .   (3.13)

Now, suppose that the outcome of one of the two experiments is known, but
not the other.
Definition 3.10. The conditional probability of an event Xi given an event
Yj is the probability that event Xi will occur given that Yj has occured.
The conditional probability of Xi given Yj is defined as

P (Xi |Yj ) = P (Xi , Yj ) / P (Yj ).   (3.14)

Likewise, we have

P (Yj |Xi ) = P (Xi , Yj ) / P (Xi ).   (3.15)
A more general form of the above two relations is known as Bayes’ rule.
Definition 3.11. Bayes’ rule: Given {Y1 , . . . , YN }, a set of N mutually ex-
clusive events whose union forms the entire sample space S, and X is any
arbitrary event in S with non-zero probability (P (X) > 0), then

P (Yj |X) = P (X, Yj ) / P (X)
         = P (Yj ) P (X|Yj ) / P (X)   (3.16)
         = P (Yj ) P (X|Yj ) / Σ_{i=1}^{N} P (Yi ) P (X|Yi ).   (3.17)
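As a concrete numerical illustration of the marginal, conditional and Bayes relations (3.12)-(3.17), the following is a minimal Python sketch; the joint-probability table used is an arbitrary made-up example, not taken from the text:

import numpy as np

# Hypothetical joint probability table P(X_i, Y_j) for a toy twin experiment
# (rows index the X outcomes, columns the Y outcomes); entries sum to 1.
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.15, 0.15]])

P_X = P_XY.sum(axis=1)            # marginal P(X_i), eq. (3.12)
P_Y = P_XY.sum(axis=0)            # marginal P(Y_j), eq. (3.13)

P_X_given_Y = P_XY / P_Y          # conditional P(X_i | Y_j), eq. (3.14)
P_Y_given_X = (P_XY.T / P_X).T    # conditional P(Y_j | X_i), eq. (3.15)

# Bayes' rule (3.17): posterior P(Y_j | X = x_0) from priors and likelihoods.
x0 = 0
posterior = P_Y * P_X_given_Y[x0, :] / np.sum(P_Y * P_X_given_Y[x0, :])
print(np.allclose(posterior, P_Y_given_X[x0, :]))   # True: both routes agree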

Another important concern in a twin experiment is whether or not the
occurrence of one event (Xi ) influences the probability that another event (Yj )
will occur, i.e. whether the events Xi and Yj are independent.
Definition 3.12. Two events X and Y are said to be statistically independent
if P (X|Y ) = P (X) and P (Y |X) = P (Y ).
For instance, in the case of the twin coin toss joint experiment, the events
X1 = {h} and X2 = {t} — being the potential outcomes of the first coin
toss — are independent of the events Y1 = {h} and Y2 = {t}, the potential
outcomes of the second coin toss. However, X1 is certainly not independent
of X2 since the occurrence of X1 precludes the occurrence of X2 .
Typically, independence results if the two events considered are generated
by physically separate probability experiments (e. g. two different coins).
Example 3.2
Consider a twin die toss. The result on one die is independent of the other
since there is no physical linkage between the two dice. However, if we refor-
mulate the joint experiment as follows:
Experiment 1: the outcome is the sum of the two dice and corresponds to
the event Xi , i=1, 2, . . . , 11. The corresponding sample space is SX =
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

Experiment 2: the outcome is the magnitude of the difference between the


two dice and corresponds to the event Yj , j=1, 2, . . . , 6. The corresponding
sample space is SY = {0, 1, 2, 3, 4, 5}.
In this case, the two experiments are linked since they are both derived from
the same dice toss. For example, if Xi = 4, then Yj can only take the values
{0, 2}. We therefore have statistical dependence.
From (3.14), we find the following:
Definition 3.13. Multiplicative rule: Given two events X and Y , the proba-
bility that they both occur is

P (X, Y ) = P (X)P (Y |X) = P (Y )P (X|Y ). (3.18)

Hence, the probability that X and Y occur is the probability that one of
these events occurs times the probability that the other event occurs, given that
the first one has occurred.
Furthermore, should events X and Y be independent, the above reduces to
(special multiplicative rule):

P (X, Y ) = P (X)P (Y ). (3.19)

3. Random variables
In the preceding section, it was seen that a statistical experiment is any op-
eration or physical process by which one or more random measurements are
made. In general, the outcome of such an experiment can be conveniently
represented by a single number.
Let the function X(s) constitute such a mapping, i.e. X(s) takes on a value
on the real line as a function of s where s is an arbitrary sample point in the
sample space S. Then X(s) is a random variable.
Definition 3.14. A function whose value is a real number determined by an
element chosen from a sample space S is a random variable or r.v.
Given a die toss, the sample space is simply

S = {1, 2, 3, 4, 5, 6} . (3.20)

In a straightforward manner, we can define a random variable X(s) = s


which takes on a value determined by the number of spots found on the top
surface of the die after the roll.
A slightly less obvious mapping would be the following

X(s) = { −1 if s = 1, 3, 5;  1 if s = 2, 4, 6 },   (3.21)

which is an r.v. that takes on a value of 1 if the die roll is even and -1 otherwise.
Random variables need not be defined directly from a probability experi-
ment, but can actually be derived as functions of other r.v.’s. Going back to
example 3.2, and letting D1 (s) = s be an r.v. associated with the first die and
D2 (s) = s with the second die, we can define the r.v’s

X = D1 + D2 , (3.22)
Y = |D1 − D2 |, (3.23)

where X is an r.v. corresponding to the sum of the dice and defined as X(sX ) =
sX , where sX is an element of the sample space SX = {2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12}. The same holds for Y , but with respect to sample space
SY = {0, 1, 2, 3, 4, 5}.
Example 3.3
Consider a series of N consecutive coin tosses. Let us define the r.v. X as
being the total number of heads obtained. Therefore, the sample space of X is

SX = {0, 1, 2, 3, . . . , N } . (3.24)

Furthermore, we define an r.v. Y = X/N , it being the ratio of the number of heads
to the total number of tosses. If N = 1, the sample space of Y corresponds to the
set {0, 1}. If N = 2, the sample space becomes {0, 1/2, 1}, and if N = 10, it is
{0, 1/10, 2/10, . . . , 1}.
Hence, the variable Y is always constrained between 0 and 1; however, if
we let N tend towards infinity, it can take an infinite number of values within
this interval (its sample space is infinite) and it thus becomes a continuous r.v.
Definition 3.15. A continuous random variable is an r.v. which is not re-
stricted to a discrete set of values, i.e. it can take any real value within a
predetermined interval or set of intervals.
It follows that a discrete random variable is an r.v. restricted to a countable
(and often finite) set of values, and corresponds to the type of r.v. and
probability experiments discussed so far.
Typically, discrete r.v.’s are used to represent countable data (number of
heads, number of spots on a die, number of defective items in a sample set,
etc.) while continuous r.v.’s are used to represent measurable data (heights,
distances, temperatures, electrical voltages, etc.).

Probability distribution
Since each value that a discrete random variable can assume corresponds to
an event or some quantification / mapping / function of an event, it follows that
each such value is associated with a probability of occurrence.

Consider the variable X in example 3.3. If N = 4, we have the following:

x           0      1      2      3      4
P (X = x)   1/16   4/16   6/16   4/16   1/16

Assuming a fair coin, the underlying sample space is made up of sixteen
outcomes

S = {tttt, ttth, ttht, tthh, thtt, thth, thht, thhh, httt, htth, htht,
hthh, hhtt, hhth, hhht, hhhh} ,   (3.25)

and each of its sample points is equally likely with probability 1/16. Only one of
these outcomes (tttt) has no heads; it follows that P (X = 0) = 1/16. However,
four outcomes have one head and three tails (httt, thtt, ttht, ttth), leading
to P (X = 1) = 4/16. Furthermore, there are 6 possible combinations of 2 heads
and 2 tails, yielding P (X = 2) = 6/16. By symmetry, we have P (X = 3) =
P (X = 1) = 4/16 and P (X = 4) = P (X = 0) = 1/16.
Knowing that the number of combinations of N distinct objects taken n at
a time, denoted C(N, n), is

C(N, n) = N ! / (n!(N − n)!),   (3.26)

the probabilities tabulated above (for N = 4) can be expressed with a single
formula, as a function of x:

P (X = x) = C(4, x) · 1/16,   x = 0, 1, 2, 3, 4.   (3.27)
Such a formula constitutes the discrete probability distribution of X.
Suppose now that the coin is not necessarily fair and is characterized by
a probability p of getting heads and a probability 1 − p of getting tails. For
arbitrary N , we have
P (X = x) = C(N, x) p^x (1 − p)^{N − x} ,   x ∈ {0, 1, . . . , N } ,   (3.28)
which is known as the binomial distribution.
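The binomial probabilities in (3.28) can be checked against a brute-force enumeration of coin-toss sequences; the minimal Python sketch below (illustrative only, for N = 4 and a fair coin) does exactly that:

from itertools import product
from math import comb

N, p = 4, 0.5

# Binomial distribution (3.28)
pmf = [comb(N, x) * p**x * (1 - p)**(N - x) for x in range(N + 1)]

# Brute-force check: enumerate all 2^N equally likely head/tail sequences
# and count how many contain exactly x heads.
counts = [0] * (N + 1)
for seq in product("ht", repeat=N):
    counts[seq.count("h")] += 1
enum = [c / 2**N for c in counts]

print(pmf)    # [0.0625, 0.25, 0.375, 0.25, 0.0625], i.e. 1/16, 4/16, 6/16, ...
print(enum)   # identical values obtained by enumeration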

Probability density function


Consider again example 3.3. We have found that the variable X has a dis-
crete probability distribution of the form (3.28). Figure 3.1 shows histograms
of P (X = x) for increasing values of N . It can be seen that as N gets large,
the histogram approaches a smooth curve and its “tails” spread out. Ultimately,
if we let N tend towards infinity, we will observe a continuous curve like the
one in Figure 3.1d.

The underlying r.v. is of the continuous variety, and Figure 3.1d is a graph-
ical representation of its probability density function (PDF). While a PDF
cannot be tabulated like a discrete probability distribution, it certainly can be
expressed as a mathematical function. In the case of Figure 3.1d, the PDF is
expressed
fX (x) = (1 / (√(2π) σX )) e^{−x² / (2σX²)} ,   (3.29)
where σX² is the variance of the r.v. X. This density function is the all-

important normal or Gaussian distribution. See the Appendix for a derivation


of this PDF which ties in with the binomial distribution.
Figure 3.1. Histograms of the discrete probability function of the binomial distributions with
(a) N = 8; (b) N = 16; (c) N = 32; and (d) a Gaussian PDF with the same mean and variance
as the binomial distribution with N = 32.

A PDF has two important properties:

1. fX (x) ≥ 0 for all x, since a negative probability density makes no sense;


2. ∫_{−∞}^{∞} fX (x)dx = 1, which is the continuous version of postulate 3 from
section 2.

It is often of interest to determine the probability that an r.v. takes on a


value which is smaller than a predetermined threshold. Mathematically, this is
expressed
FX (x) = P (X ≤ x) = ∫_{−∞}^{x} fX (α)dα,   −∞ < x < ∞,   (3.30)

where fX (x) is the PDF of variable X and FX (x) is its cumulative distribu-
tion function (CDF). A CDF has three outstanding properties:
1. FX (−∞) = 0 (obvious from (3.30));
2. FX (∞) = 1 (a consequence of property 2 of PDFs);
3. FX (x) increases in a monotone fashion from 0 to 1 (from (3.30) and prop-
erty 1 of PDFs).
Furthermore, the relation (3.30) can be inverted to yield
fX (x) = dFX (x) / dx.   (3.31)
It is also often useful to determine the probability that an r.v. X takes on a
value falling in an interval [x1 , x2 ]. This can be easily accomplished using the
PDF of X as follows:
P (x1 < X ≤ x2 ) = ∫_{x1}^{x2} fX (x)dx = ∫_{−∞}^{x2} fX (x)dx − ∫_{−∞}^{x1} fX (x)dx
                 = FX (x2 ) − FX (x1 ).   (3.32)
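As a quick numerical check of (3.32), the following Python sketch (illustrative only, assuming a zero-mean, unit-variance Gaussian and arbitrary interval endpoints) compares FX (x2 ) − FX (x1 ) with a direct numerical integration of the PDF over [x1 , x2 ]:

import numpy as np
from math import erf, sqrt, pi

def gauss_pdf(x, sigma=1.0):
    return np.exp(-x**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def gauss_cdf(x, sigma=1.0):
    # Closed-form Gaussian CDF via the error function.
    return 0.5 * (1 + erf(x / (sigma * sqrt(2))))

x1, x2 = -0.5, 1.3

# Right-hand side of (3.32): difference of CDF values.
p_cdf = gauss_cdf(x2) - gauss_cdf(x1)

# Left-hand side of (3.32): numerical integral of the PDF over [x1, x2].
x = np.linspace(x1, x2, 10001)
p_int = np.trapz(gauss_pdf(x), x)

print(p_cdf, p_int)   # both close to 0.59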
This leads us to the following counter-intuitive observation. If the PDF is
continuous and we wish to find P (X = x1 ), it is insightful to proceed as
follows:

P (X = x1 ) = lim_{x2 → x1} P (x1 < X ≤ x2 )
            = lim_{x2 → x1} ∫_{x1}^{x2} fX (x)dx
            = fX (x1 ) lim_{x2 → x1} ∫_{x1}^{x2} dx
            = fX (x1 ) lim_{x2 → x1} (x2 − x1 )
            = 0.   (3.33)

Hence, the probability that X takes on exactly a given value x1 is null.


Intuitively, this is a consequence of the fact that X can take on an infinity of
values and the “sum” (integral) of the associated probabilities must be equal to
1.
However, if the PDF is not continuous, the above observation doesn’t nec-
essarily hold. A discrete r.v., for example, not only has a discrete probability
distribution, but also a corresponding PDF. Since the r.v. can only take on a
finite number of values, its PDF is made up of Dirac impulses at the locations
of the said values.
For the binomial distribution, we have

fX (x) = Σ_{n=0}^{N} C(N, n) p^n (1 − p)^{N −n} δ(x − n),   (3.34)

and, for a general discrete r.v. with a sample space of N elements, we have

fX (x) = Σ_{n=1}^{N} P (X = xn ) δ(x − xn ).   (3.35)

Moments and characteristic functions


What exactly is the mean of a random variable? One intuitive answer is that
it is the “most likely” outcome of the underlying experiment. However, while
this is not far from the truth for well-behaved PDFs (which have a single peak
at, or in the vicinity of, their mean), it is misleading since the mean, in fact,
may not even be a part of the PDF’s support (i.e. its sample space or range of
possible values). Nonetheless, we refer to the mean of an r.v. as its expected
value, denoted by the expectation operator ⟨·⟩. It is defined

⟨X⟩ = µX = ∫_{−∞}^{∞} x fX (x)dx,   (3.36)

and it also happens to be the first moment of the r.v. X.


The expectation operator bears its name because it is indeed the best “edu-
cated guess” one can make a priori about the outcome of an experiment given
the associated PDF. While it may indeed lie outside the set of allowable values
of the r.v., it is the quantity that minimizes the mean squared error between X
and its a priori estimate X̂ = ⟨X⟩. Mathematically, this is expressed

⟨X⟩ = arg min_{X̂} ⟨(X − X̂)²⟩.   (3.37)

Although this is a circular definition, it is insightful, especially if we expand
the right-hand side into its integral representation:

⟨X⟩ = arg min_{X̂} ∫_{−∞}^{∞} (x − X̂)² fX (x)dx.   (3.38)

For another angle on this concept, consider a series of N trials of the same
experiment. If each trial is unaffected by the others, the outcomes are asso-
ciated with a set of N independent and identically distributed (i.i.d.) r.v.’s
X1 , . . . , XN . The average of these outcomes is itself an r.v. given by

Y = (1/N ) Σ_{n=1}^{N} Xn .   (3.39)

As N gets large, Y will tend towards the mean of the Xn ’s and in the limit

Y = lim_{N →∞} (1/N ) Σ_{n=1}^{N} Xn = ⟨X⟩ ,   (3.40)

where ⟨X⟩ = ⟨X1 ⟩ = . . . = ⟨XN ⟩. This is known in statistics as the weak
law of large numbers.
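The law of large numbers is easy to visualize by simulation; the minimal Python sketch below (illustrative only, using exponentially distributed samples with a true mean of 2) shows the running average of (3.39) drifting toward ⟨X⟩ as N grows:

import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.0

samples = rng.exponential(scale=true_mean, size=100_000)    # i.i.d. X_1, ..., X_N
running_avg = np.cumsum(samples) / np.arange(1, samples.size + 1)

for N in (10, 1_000, 100_000):
    print(N, running_avg[N - 1])   # approaches the true mean 2.0 as N grows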
In general, the k th raw moment is defined as

⟨X^k⟩ = ∫_{−∞}^{∞} x^k fX (x)dx,   (3.41)

where the law of large numbers still holds since

⟨X^k⟩ = ∫_{−∞}^{∞} x^k fX (x)dx = ∫_{−∞}^{∞} z fZ (z)dz = ⟨Z⟩ ,   (3.42)

where fZ (z) is the PDF of Z = X^k.


In the same manner, we can find the expectation of any arbitrary function
Y = g(X) as follows

⟨Y⟩ = ⟨g(X)⟩ = ∫_{−∞}^{∞} g(x)fX (x)dx.   (3.43)

One such useful function is Y = (X − µX )^k where µX = ⟨X⟩ is the mean
of X.
Definition 3.16. The expectation of Y = (X − µX )k is the k th central mo-
ment of the r.v. X.
From (3.43), we have

⟨Y⟩ = ⟨(X − µX )^k⟩ = ∫_{−∞}^{∞} (x − µX )^k fX (x)dx.   (3.44)

Definition 3.17. The 2nd central moment ⟨(X − µX )²⟩ is called the variance
and its square root is the standard deviation.
The variance of X, denoted σX², is given by

σX² = ∫_{−∞}^{∞} (x − µX )² fX (x)dx,   (3.45)

and it is useful because (like the standard deviation) it provides a measure of
the degree of dispersion of the r.v. X about its mean.
Expanding the quadratic form (x − µX )² in (3.45) and integrating term-by-
term, we find

σX² = ⟨X²⟩ − 2µX ⟨X⟩ + µX²
    = ⟨X²⟩ − 2µX² + µX²
    = ⟨X²⟩ − µX².   (3.46)
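The identity (3.46) is convenient numerically as well, since ⟨X²⟩ and µX can be estimated from the same set of samples; a minimal Monte Carlo sketch in Python (illustrative values only) is given below:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=200_000)   # mean 3, standard deviation 2

mean_est = np.mean(x)
var_direct = np.mean((x - mean_est)**2)        # definition (3.45)
var_identity = np.mean(x**2) - mean_est**2     # identity (3.46)

print(var_direct, var_identity)   # both close to 4.0 = sigma_X^2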

In deriving the above, we have implicitly exploited two of the properties
of the expectation operator:
Property 3.1. The expectation of a sum, where each term may or may not
involve the same random variable, is equal to the sum of the expectations,
i.e.

⟨X + Y⟩ = ⟨X⟩ + ⟨Y⟩ ,
⟨X + X²⟩ = ⟨X⟩ + ⟨X²⟩ .

Property 3.2. Given the expectation of the product of an r.v. by a deterministic
quantity α, the said quantity can be removed from the expectation, i.e.

⟨αX⟩ = α ⟨X⟩ .

These two properties stem readily from (3.43) and the properties of integrals.
Theorem 3.1. The expectation of a product of random variables is equal
to the product of the expectations, i.e.

⟨XY⟩ = ⟨X⟩ ⟨Y⟩ ,

if the two variables X and Y are independent.

Proof. We can readily generalize (3.43) to the multivariable case as follows:

⟨g(X1 , X2 , · · · , XN )⟩ = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x1 , x2 , · · · , xN )
                           fX1 ,··· ,XN (x1 , · · · , xN ) dx1 · · · dxN   (N -fold).   (3.47)

Therefore, we have

⟨XY⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX,Y (x, y)dxdy,   (3.48)

where, if X and Y are independent, the density factors to yield

⟨XY⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX (x)fY (y)dxdy
     = ∫_{−∞}^{∞} x fX (x)dx ∫_{−∞}^{∞} y fY (y)dy
     = ⟨X⟩ ⟨Y⟩ .   (3.49)

The above theorem can be readily extended to the product of N indepen-
dent variables, or of N expressions of independent random variables, by
repeated application of (3.49).
Definition 3.18. The characteristic function (C. F.) of an r.v. X is defined

φX (jt) = ⟨e^{jtX}⟩ = ∫_{−∞}^{∞} e^{jtx} fX (x)dx,   (3.50)

where j = √−1.
It can be seen that the integral is in fact an inverse Fourier transform in the
variable t. It follows that the inverse of (3.50) is

fX (x) = (1/2π) ∫_{−∞}^{∞} φX (jt)e^{−jtx} dt.   (3.51)
Characteristic functions play a role similar to the Fourier transform. In other
words, some operations are easier to perform in the characteristic function do-
main than in the PDF domain. For example, there is a direct relationship be-
tween the C. F. and the moments of a random variable, allowing the latter to be
obtained without integration (if the C. F. is known).
The said relationship involves evaluation of the k th derivative of the C. F. at
t = 0, i. e.

⟨X^k⟩ = (−j)^k [ d^k φX (jt) / dt^k ]_{t=0}.   (3.52)
As will be seen later, characteristic functions are also useful in determining
the PDF of sums of random variables. Furthermore, since the crossing into the
C. F. domain is, in fact, an inverse Fourier transform, all properties of Fourier
transforms hold.
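As an illustration of (3.50), the following minimal Python sketch estimates ⟨e^{jtX}⟩ by Monte Carlo for a zero-mean Gaussian X with variance σ² (an assumption made here because its closed-form C. F. e^{−σ²t²/2} is well known) and compares the estimate with that closed form:

import numpy as np

rng = np.random.default_rng(2)
sigma = 1.5
x = rng.normal(scale=sigma, size=500_000)

t = np.linspace(-3, 3, 7)
cf_mc = np.array([np.mean(np.exp(1j * ti * x)) for ti in t])   # <e^{jtX}>, eq. (3.50)
cf_exact = np.exp(-sigma**2 * t**2 / 2)                        # known Gaussian C.F.

print(np.max(np.abs(cf_mc - cf_exact)))   # small (Monte Carlo error only)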

Functions of one r.v.


Consider an r.v. Y defined as a function of another r.v. X, i.e.

Y = g(X). (3.53)

If this function is uniquely invertible (and, for simplicity, monotonically increasing),
we have X = g −1 (Y ) and

FY (y) = P (Y ≤ y) = P (g(X) ≤ y) = P (X ≤ g −1 (y))
       = FX (g −1 (y)).   (3.54)

Differentiating with respect to y allows us to relate the PDFs of X and Y :

fY (y) = (d/dy) FX (g −1 (y))   (3.55)
       = [ d FX (g −1 (y)) / d(g −1 (y)) ] · [ d g −1 (y) / dy ]
       = fX (g −1 (y)) · [ d g −1 (y) / dy ].   (3.56)
In the general case, the equation Y = g(X) may have more than one root.
If there are N real roots denoted x1 (y), x2 (y), . . . , xN (y), the PDF of Y is
given by
fY (y) = Σ_{n=1}^{N} fX (xn (y)) |∂xn (y)/∂y| .   (3.57)

Example 3.4
Let Y = AX + B where A and B are arbitrary constants. Therefore, we have
X = (Y − B)/A and

∂((y − B)/A) / ∂y = 1/A.

It follows that

fY (y) = (1/|A|) fX ((y − B)/A).   (3.58)

Suppose that X follows a Gaussian distribution with a mean of zero and a
variance of σX². Hence,

fX (x) = (1 / (√(2π) σX )) e^{−x² / (2σX²)},   (3.59)

and

fY (y) = (1 / (√(2π) σX |A|)) e^{−(y−B)² / (2A²σX²)}.   (3.60)

The r.v. Y is still a Gaussian variate, but its variance is A²σX² and its mean
is B.
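Example 3.4 is easy to verify by simulation; the Python sketch below (illustrative values A = −2, B = 1, σX = 1, not from the text) draws samples of X, maps them through Y = AX + B and compares the sample histogram of Y with the density (3.60):

import numpy as np

rng = np.random.default_rng(3)
sigma_x, A, B = 1.0, -2.0, 1.0

x = rng.normal(scale=sigma_x, size=300_000)
y = A * x + B                                   # transformation of Example 3.4

print(np.mean(y), np.var(y))                    # ~B and ~A^2 sigma_x^2, i.e. 1 and 4

# Compare the histogram of Y against the analytical PDF (3.60).
hist, edges = np.histogram(y, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-(centers - B)**2 / (2 * A**2 * sigma_x**2)) / (
        np.sqrt(2 * np.pi) * sigma_x * abs(A))
print(np.max(np.abs(hist - pdf)))               # small discrepancy only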

Pairs of random variables


In performing multiple related experiments or trials, it becomes necessary
to manipulate multiple r.v.’s. Consider two r.v.’s X1 and X2 which stem from
the same experiment or from twin related experiments. The probability that
X1 ≤ x1 and X2 ≤ x2 is determined from the joint CDF

P (X1 ≤ x1 , X2 ≤ x2 ) = FX1 ,X2 (x1 , x2 ).   (3.61)

Furthermore, the joint PDF can be obtained by differentiation:

fX1 ,X2 (x1 , x2 ) = ∂² FX1 ,X2 (x1 , x2 ) / (∂x1 ∂x2 ).   (3.62)

Following the concepts presented in section 2 for probabilities, the PDF
of the r.v. X1 irrespective of X2 is termed the marginal PDF of X1 and is
obtained by “averaging out” the contribution of X2 to the joint PDF, i.e.

fX1 (x1 ) = ∫_{−∞}^{∞} fX1 ,X2 (x1 , x2 )dx2 .   (3.63)

According to Bayes’ rule, the conditional PDF of X1 given that X2 = x2
is given by

fX1 |X2 (x1 |x2 ) = fX1 ,X2 (x1 , x2 ) / fX2 (x2 ).   (3.64)

Transformation of two random variables


Given 2 r.v.’s X1 and X2 with joint PDF fX1 ,X2 (x1 , x2 ), let Y1 = g1 (X1 , X2 ),
Y2 = g2 (X1 , X2 ), where g1 (X1 , X2 ) and g2 (X1 , X2 ) are 2 arbitrary single-
valued continuous functions of X1 and X2 .
Let us also assume that g1 and g2 are jointly invertible, i.e.

X1 = h1 (Y1 , Y2 ),
(3.65)
X2 = h2 (Y1 , Y2 ),

where h1 and h2 are also single-valued and continuous.


The application of the transformation defined by g1 and g2 amounts to a
change in the coordinate system. If we consider an infinitesimal rectangle of
dimensions ∆y1 ×∆y2 in the new system, it will in general be mapped through
the inverse transformation (defined by h1 and h2 ) to a four-sided curved region
in the original system (see Figure 3.2). However, since the region is infinitesi-
mally small, the curvature induced by the transformation can be abstracted out.
We are left with a parallelogram and we wish to calculate its area.
Figure 3.2. Coordinate system change under transformation (g1 , g2 ).

This can be performed by relying on the tangential vectors v and w. The
sought-after area is then given by

A = ∆x1 ∆x2 = ‖v‖₂ ‖w‖₂ sin α ∆y1 ∆y2
  = |det [ v[1]  w[1] ; v[2]  w[2] ]| ∆y1 ∆y2 ,   (3.66)

where the scaling factor

J = A / (∆y1 ∆y2 ) = |det [ v[1]  w[1] ; v[2]  w[2] ]|   (3.67)

is the Jacobian J of the transformation. The Jacobian embodies the scaling
effects of the transformation and ensures that the new PDF will integrate to
unity.
Thus, the joint PDF of Y1 and Y2 is given by

fY1 ,Y2 (y1 , y2 ) = J fX1 ,X2 (g1−1 (y1 , y2 ), g2−1 (y1 , y2 )).   (3.68)

From the tangential vectors, the Jacobian is defined

J = |det [ ∂h1 (y1 , y2 )/∂y1   ∂h2 (y1 , y2 )/∂y1 ; ∂h1 (y1 , y2 )/∂y2   ∂h2 (y1 , y2 )/∂y2 ]|
  = |det (∂x/∂y)| .   (3.69)

In the above, it has been assumed that the mapping between the original
and the new coordinate system was one-to-one. What if several regions in the
original domain map to the same rectangular area in the new domain? This
implies that the system

Y1 = g1 (X1 , X2 ),
Y2 = g2 (X1 , X2 ),   (3.70)

has several solutions (x1^{(1)} , x2^{(1)}), (x1^{(2)} , x2^{(2)}), . . . , (x1^{(K)} , x2^{(K)}). Then all
these solutions (or roots) contribute additively to the new PDF, i.e.

fY1 ,Y2 (y1 , y2 ) = Σ_{k=1}^{K} fX1 ,X2 (x1^{(k)} , x2^{(k)}) J(x1^{(k)} , x2^{(k)}).   (3.71)
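A classical illustration of (3.68)-(3.69) is the Cartesian-to-polar transformation of two independent zero-mean Gaussians, for which the Jacobian of the inverse map is r and the resulting densities of R and Θ are Rayleigh and uniform, respectively. The Python sketch below (illustrative only) checks this by simulation:

import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
x1 = rng.normal(scale=sigma, size=400_000)
x2 = rng.normal(scale=sigma, size=400_000)

# Forward transformation: Y1 = R, Y2 = Theta.
r = np.hypot(x1, x2)
theta = np.arctan2(x2, x1)

# From (3.68) with Jacobian J = r:
#   f_R(r) = (r / sigma^2) exp(-r^2 / (2 sigma^2))   (Rayleigh)
#   f_Theta(theta) = 1 / (2 pi)                      (uniform on (-pi, pi])
hist, edges = np.histogram(r, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
rayleigh = centers / sigma**2 * np.exp(-centers**2 / (2 * sigma**2))
print(np.max(np.abs(hist - rayleigh)))                       # small

theta_hist = np.histogram(theta, bins=60, density=True)[0]
print(np.mean(np.abs(theta_hist - 1 / (2 * np.pi))))         # close to flat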

Multiple random variables


Transformation of N random variables
It is often the case (and especially in multidimensional signal processing
problems, which constitute a main focal point of the present book) that a rel-
atively large set of random variables must be considered collectively. Such a
set of variables can then be regarded as representing the state of a single
stochastic process.
Given an arbitrary number N of random variables X1 , . . . , XN which col-
lectively behave according to joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ), a trans-
formation is defined by the system of equations

y = g(x), (3.72)

where x and y are N × 1 vectors, and g is an N × 1 vector function of vector


x.
The reverse transformation may have one or more solutions, i.e.

x(k) = gk−1 (y), k ∈ [1, 2, · · · , K], (3.73)

where K is the number of solutions.


The Jacobian corresponding to the kth solution is given by

Jk = |det ( ∂gk−1 (y) / ∂y )| ,   (3.74)

which is simply a generalization (with a slight change in notation) of (3.69).
This directly leads to

fy (y) = Σ_{k=1}^{K} fx (gk−1 (y)) Jk .   (3.75)

Joint characteristic functions


Given a set of N random variables characterized by a joint PDF, it is relevant
to define a corresponding joint characteristic function.

Definition 3.19. The joint characteristic function of a set of r.v.’s X1 , X2 ,
. . . , XN with joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ) is given by

φX1 ,X2 ,··· ,XN (t1 , t2 , . . . , tN ) = ⟨e^{j(t1 X1 + t2 X2 + ··· + tN XN )}⟩
= ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} e^{j(t1 x1 + ··· + tN xN )} fX1 ,··· ,XN (x1 , . . . , xN )dx1 · · · dxN   (N -fold).

Algebra of random variables


Sum of random variables
Suppose we want to characterize an r.v. which is defined as the sum of two
other independent r.v.’s, i.e.

Y = X1 + X2 . (3.76)

One way to attack this problem is to fix one r.v., say X2 , and treat this as a
transformation from X1 to Y . Given X2 = x2 , we have

Y = g(X1 ), (3.77)

where
g(X1 ) = X1 + x2 , (3.78)
and
g −1 (Y ) = Y − x2 . (3.79)
It follows that

fY |X2 (y|x2 ) = fX1 (y − x2 ) ∂g −1 (y)/∂y
             = fX1 (y − x2 ).   (3.80)

Furthermore, we know that

fY (y) = ∫_{−∞}^{∞} fY,X2 (y, x2 )dx2
       = ∫_{−∞}^{∞} fY |X2 (y|x2 )fX2 (x2 )dx2   (3.81)

which, by substituting (3.80), becomes

fY (y) = ∫_{−∞}^{∞} fX1 (y − x2 )fX2 (x2 )dx2 .   (3.82)

This is the general formula for the sum of two independent r.v.’s and it can
be observed that it is in fact a Fourier convolution of the two underlying PDFs.
It is common knowledge that a convolution becomes a simple multiplica-
tion in the Fourier transform domain. The same principle applies here with
characteristic functions and it is easily demonstrated.
Consider a sum of N independent random variables:

Y = Σ_{n=1}^{N} Xn .   (3.83)

The characteristic function of Y is given by

φY (jt) = ⟨e^{jtY}⟩ = ⟨e^{jt Σ_{n=1}^{N} Xn}⟩.   (3.84)

Assuming that the Xn ’s are independent, we have

φY (jt) = ⟨Π_{n=1}^{N} e^{jtXn}⟩ = Π_{n=1}^{N} ⟨e^{jtXn}⟩ = Π_{n=1}^{N} φXn (jt).   (3.85)

Therefore, the characteristic function of a sum of independent r.v.’s is the product
of the constituent C.F.’s. The corresponding PDF is obtainable via the Fourier
transform

fY (y) = (1/2π) ∫_{−∞}^{∞} Π_{n=1}^{N} φXn (jt) e^{−jyt} dt.   (3.86)
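Equation (3.82) can be checked numerically: the Python sketch below (illustrative only) convolves the PDFs of two independent uniform r.v.'s on [0, 1) on a grid and compares the result with a histogram of sampled sums, which follows the well-known triangular density:

import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo: histogram of Y = X1 + X2 with X1, X2 uniform on [0, 1).
y = rng.random(500_000) + rng.random(500_000)
hist, edges = np.histogram(y, bins=80, range=(0, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Numerical convolution of the two uniform PDFs on a grid, eq. (3.82).
dx = 0.001
x = np.arange(0, 1, dx)
f1 = np.ones_like(x)                     # uniform PDF on [0, 1)
f2 = np.ones_like(x)
f_sum = np.convolve(f1, f2) * dx         # triangular PDF on [0, 2)
grid = np.arange(f_sum.size) * dx
f_at_centers = np.interp(centers, grid, f_sum)

print(np.max(np.abs(hist - f_at_centers)))   # small (sampling and grid error)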

What if the random variables are not independent and exhibit correlation?
Consider again the case of two random variables; this time, the problem is
conveniently addressed by an adequate joint transformation. Let
Y = g1 (X1 , X2 ) = X1 + X2 , (3.87)
Z = g2 (X1 , X2 ) = X2 , (3.88)
where Z is not a useful r.v. per se, but was included to allow a 2 × 2 transfor-
mation.
The corresponding Jacobian is

J = |det [ 1  0 ; −1  1 ]| = 1.   (3.89)

Hence, we have

fY,Z (y, z) = fX1 ,X2 (y − z, z),   (3.90)

and we can find the marginal PDF of Y with the usual integration procedure,
i.e.

fY (y) = ∫_{−∞}^{∞} fY,Z (y, z)dz
       = ∫_{−∞}^{∞} fX1 ,X2 (y − x2 , x2 )dx2 .   (3.91)

Products of random variables


A similar approach applies to products of r.v.’s. Consider the following
product of two r.v.’s:

Z = X1 X2 .   (3.92)

We can again fix X2 and obtain the conditional PDF of Z given X2 = x2 as
was done for the sum of two r.v.’s. Given that g(X1 ) = X1 x2 and g −1 (Z) =
Z/x2 , this approach yields

fZ|X2 (z|x2 ) = fX1 (z/x2 ) ∂(z/x2 )/∂z
             = (1/|x2 |) fX1 (z/x2 ).   (3.93)

Therefore, we have

fZ (z) = ∫_{−∞}^{∞} fZ|X2 (z|x2 )fX2 (x2 )dx2
       = ∫_{−∞}^{∞} fX1 (z/x2 ) fX2 (x2 ) (1/|x2 |) dx2 .   (3.94)
This formula happens to be the Mellin convolution of fX1 (x) and fX2 (x)
and it is related to the Mellin transform in the same way that Fourier convolution
is related to the Fourier transform (see the overview of Mellin transforms in
subsection 3).
Hence, given a product of N independent r.v.’s

Z = Π_{n=1}^{N} Xn ,   (3.95)

we can immediately conclude that the Mellin transform of fZ (z) is the product
of the Mellin transforms of the constituent PDFs, i.e.

M fZ (s) = Π_{n=1}^{N} M fXn (s) = [M fX (s)]^N ,   (3.96)

where the last equality holds when the Xn are identically distributed. It follows
that we can find the corresponding PDF through the inverse Mellin transform.
This is expressed

fZ (z) = (1/2πj) ∫_{L} M fZ (s) z^{−s} ds,   (3.97)

where L denotes one particular integration path in the complex s-plane; others
are possible (see chapter 2, section 3, subsection on Mellin transforms).

4. Stochastic processes
Many observable parameters that are considered random in the world around
us are actually functions of time, e.g. ambient temperature and pressure, stock
market prices, etc. In the field of communications, actual useful message sig-
nals are typically considered random, although this might seem counterintu-
itive. The randomness here relates to the unpredictability that is inherent in
useful communications. Indeed, if it is known in advance what the message
is (the message is predetermined or deterministic), there is no point in trans-
mitting it at all. On the other hand, the lack of any fore-knowledge about the
message implies that from the point-of-view of the receiver, the said message
is a random process or stochastic process. Moreover, the omnipresent white
noise in communication systems is also a random process, as well as the chan-
nel gains in multipath fading channels, as will be seen in chapter 4.
Definition 3.20. Given a random experiment with a sample space S, compris-
ing outcomes λ1 , λ2 , . . . , λN , and a mapping between every possible outcome
λ and a set of corresponding functions of time X(t, λ), then this family of
functions, together with the mapping and the random experiment, constitutes a
stochastic process.

In fact, a stochastic process is a function of two variables; the outcome


variable λ which necessarily belongs to sample space S, and time t, which can
take any value between −∞ and ∞.

Definition 3.21. To a specific outcome λi in a stochastic process corresponds


a single time function X(t, λi ) = xi (t) called a member function or sample
function of the said process.

Definition 3.22. The set of all sample functions in a stochastic process is called
an ensemble.

We have established that for a given outcome λi , X(t, λi ) = xi (t) is a


predetermined function out of the ensemble of possible functions. If we fix
t to some value t1 instead of fixing the outcome λ, then X(t1 , λ) becomes
a random variable. At another time instant, we find another random vari-
able X(t2 , λ) which is most likely correlated with X(t1 , λ). It follows that
a stochastic process can also be seen as a succession of infinitely many joint
random variables (one for each defined instant in time) with a given joint dis-
tribution. Any set of instances for these r.v.’s constitutes one of the member
functions. While this view is conceptually helpful, it is hardly practical for
manipulating processes given the infinite number of individual r.v.’s required.
Instead of continuous time, it is often sufficient to consider only a prede-
termined set of time instants t1 , t2 , . . . , tN . In that case, the set of random
variables X(t1 ), X(t2 ), . . . , X(tN ) becomes a random vector x with a PDF
fx (x), where x = [X(t1 ), X(t2 ), · · · , X(tN )]T . Likewise, a CDF can be
defined:

Fx (x) = P (X(t1 ) ≤ x1 , X(t2 ) ≤ x2 , . . . , X(tN ) ≤ xN ) . (3.98)

Sometimes, it is more productive to consider how the process is generated.


Indeed, a large number of infinitely precise time functions can in some cases
result from a relatively simple random mechanism, and such simple models
are highly useful to the communications engineer, even when they are approx-
imations or simplifications of reality. Such parametric modeling as a function
of one or more random variables can be expressed as follows:

X(t, λ) = g1 (Y1 , Y2 , · · · , YN , t), (3.99)

where g1 is a function of the underlying r.v.’s Y1 , Y2 , . . . , YN and time t, and

λ = g2 (Y1 , Y2 , · · · , YN ), (3.100)

where g2 is a function of the r.v.’s {Yn }, which together uniquely determine


the outcome λ.
Since at a specific time t a process is essentially a random variable, it follows
that a mean can be defined as

⟨X(t)⟩ = µX (t) = ∫_{−∞}^{∞} x fX(t) (x)dx.   (3.101)

Other important statistics are defined below.

Definition 3.23. The joint moment

RXX (t1 , t2 ) = ⟨X(t1 )X(t2 )⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX(t1 ),X(t2 ) (x1 , x2 )dx1 dx2

is known as the autocorrelation function.

Definition 3.24. The joint central moment

µXX (t1 , t2 ) = ⟨(X(t1 ) − µX (t1 )) (X(t2 ) − µX (t2 ))⟩
             = RXX (t1 , t2 ) − µX (t1 )µX (t2 )

is known as the autocovariance function.

The mean, autocorrelation, and autocovariance functions as defined above


are designated ensemble statistics since the averaging is performed with re-
spect to the ensemble of possible functions at a specific time instant. Other
joint moments can be defined in a straightforward manner.

Example 3.5
Consider the stochastic process defined by

X(t) = sin(2πfc t + Φ),   (3.102)

where fc is a fixed frequency and Φ is a uniform random variable taking values
between 0 and 2π. This is an instance of a parametric description where the
process is entirely defined by a single random variable Φ.
The ensemble mean is

µX (t) = ∫_{−∞}^{∞} x fX(t) (x)dx
       = ∫_{0}^{2π} sin(2πfc t + φ) fΦ (φ)dφ
       = (1/2π) ∫_{0}^{2π} sin(2πfc t + φ)dφ
       = 0,   (3.103)

and the autocorrelation function is

RXX (t1 , t2 ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX(t1 ),X(t2 ) (x1 , x2 )dx1 dx2
             = ∫_{0}^{2π} sin(2πfc t1 + φ) sin(2πfc t2 + φ) fΦ (φ)dφ
             = (1/4π) [ ∫_{0}^{2π} cos (2πfc (t2 − t1 )) dφ − ∫_{0}^{2π} cos (2πfc (t1 + t2 ) + 2φ) dφ ]
             = (1/4π) [ 2π cos (2πfc (t2 − t1 )) − 0 ]
             = (1/2) cos (2πfc (t2 − t1 )) .   (3.104)
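The ensemble statistics of Example 3.5 are easily reproduced by Monte Carlo over the random phase Φ; the Python sketch below (illustrative values fc = 2 Hz and arbitrary time instants) estimates µX (t) and RXX (t1 , t2 ) and compares the latter with (1/2) cos(2πfc (t2 − t1 )):

import numpy as np

rng = np.random.default_rng(6)
fc = 2.0
phi = rng.uniform(0, 2 * np.pi, size=200_000)    # ensemble of outcomes of Phi

t1, t2 = 0.13, 0.41
x1 = np.sin(2 * np.pi * fc * t1 + phi)           # X(t1) over the ensemble
x2 = np.sin(2 * np.pi * fc * t2 + phi)           # X(t2) over the ensemble

print(np.mean(x1))                               # ~0, as in (3.103)
print(np.mean(x1 * x2))                          # ensemble estimate of R_XX(t1, t2)
print(0.5 * np.cos(2 * np.pi * fc * (t2 - t1)))  # analytical value, (3.104)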
Definition 3.25. If all statistical properties of a stochastic process are invariant
to a change in time origin, i.e. X(t, λ) is statistically equivalent to X(t +
T, λ), for any t, and T is any arbitrary time shift, then the process is said to be
stationary in the strict sense.
Stationarity in the strict sense implies that for any set of N time instants t1 ,
t2 , . . . , tN , the joint PDF of X(t1 ), X(t2 ), . . . , X(tN ) is identical to the joint
PDF of X(t1 + T ), X(t2 + T ), . . . , X(tN + T ). In particular, for a strictly
stationary process,

⟨X^k (t)⟩ = ⟨X^k (0)⟩ ,   for all t, k.   (3.105)

However, a less stringent definition of stationarity is often useful, since strict


stationarity is both rare and difficult to determine.
Definition 3.26. If the ensemble mean and autocorrelation function of a stochas-
tic process are invariant to a change of time origin, i.e.

⟨X(t)⟩ = ⟨X(t + T )⟩ ,   RXX (t1 , t2 ) = RXX (t1 + T, t2 + T ),

for any t, t1 , t2 and T is any arbitrary time shift, then the process is said to be
stationary in the wide-sense or wide-sense stationary.
In the field of communications, it is typically considered sufficient to satisfy
the wide-sense stationarity (WSS) conditions. Hence, the expression “station-
ary process” in much of the literature (and in this book!) usually implies WSS.
Gaussian processes constitute a special case of interest; indeed, a Gaussian
process that is WSS is also automatically stationary in the strict sense. This is
by virtue of the fact that all high-order moments of the Gaussian distribution
are functions solely of its mean and variance.
Since the point of origin t1 becomes irrelevant, the autocorrelation function
of a stationary process can be specified as a function of a single argument, i.e.

RXX (t1 , t2 ) = RXX (t1 − t2 ) = RXX (τ ), (3.106)

where τ is the delay variable.


Property 3.3. The autocorrelation function of a stationary process is an even
function (symmetric about τ = 0), i.e.

RXX (τ ) = ⟨X(t1 )X(t1 − τ )⟩ = ⟨X(t1 + τ )X(t1 )⟩ = RXX (−τ ).   (3.107)

Property 3.4. The autocorrelation function with τ = 0 yields the average
power of the process, i.e.

RXX (0) = ⟨X² (t)⟩ .

Property 3.5. (Cauchy-Schwarz inequality)

|RXX (τ )| ≤ RXX (0).
Property 3.6. If X(t) is periodic, then RXX (τ ) is also periodic, i.e.

X(t) = X(t + T ) → RXX (τ ) = RXX (τ + T ).

It is straightforward to verify whether a process is stationary or not if an


analytical expression is available for the process. For instance, the process of
example 3.5 is obviously stationary. Otherwise, typical data must be collected
and various statistical tests performed on it to determine stationarity.
Besides ensemble statistics, time statistics can be defined with respect to a
given member function.
Definition 3.27. The time-averaged mean of a stochastic process X(t) is
given by

MX = lim_{T →∞} (1/T ) ∫_{−T /2}^{T /2} X(t, λi )dt,

where X(t, λi ) is a member function.

Definition 3.28. The time-averaged autocorrelation function is defined

ℛXX (τ ) = lim_{T →∞} (1/T ) ∫_{−T /2}^{T /2} X(t, λi )X(t + τ, λi )dt.

Unlike the ensemble statistics, the time-averaged statistics are random vari-
ables; their actual values depend on which member function is used for time
averaging.
Obtaining ensemble statistics requires averaging over all member functions
of the sample space S. This requires either access to all said member functions,
or perfect knowledge of all the joint PDFs characterizing the process. This
is obviously not possible when observing a real-world phenomenon behaving
according to a stochastic process. The best that we can hope for is access to
time recordings of a small set of member functions. This makes it possible to
compute time-averaged statistics.
The question that arises is: can a time-averaged statistic be employed as an
approximation of the corresponding ensemble statistic?

Definition 3.29. Ergodicity is the property of a stochastic process by virtue of


which all its ensemble statistics are equal to its corresponding time-averaged
statistics.

Not all processes are ergodic. It is also very difficult to determine whether a
process is ergodic in the strict sense, as defined above. However, it is sufficient
in practice to determine a limited form of ergodicity, i.e. with respect to one
or two basic statistics. For example, a process X(t) is said to be ergodic in
the mean if µX = MX . Similarly, a process is said to be ergodic in the au-
tocorrelation if RXX (τ ) = ℛXX (τ ). An ergodic process is necessarily stationary,
although stationarity does not imply ergodicity.
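For the random-phase sinusoid of Example 3.5, which is ergodic in the mean and in the autocorrelation, the time averages of definitions 3.27 and 3.28 computed from a single member function indeed match the ensemble results; the minimal Python sketch below (illustrative parameter values only) makes the comparison:

import numpy as np

rng = np.random.default_rng(7)
fc, fs, T_obs = 2.0, 1000.0, 200.0               # frequency, sampling rate, window
t = np.arange(0, T_obs, 1 / fs)

phi_i = rng.uniform(0, 2 * np.pi)                # one outcome -> one member function
x = np.sin(2 * np.pi * fc * t + phi_i)

# Time-averaged mean (definition 3.27) vs ensemble mean (= 0).
print(np.mean(x))

# Time-averaged autocorrelation at lag tau (definition 3.28)
# vs ensemble value 0.5 cos(2 pi fc tau) from (3.104).
tau = 0.1
lag = int(tau * fs)
print(np.mean(x[:-lag] * x[lag:]), 0.5 * np.cos(2 * np.pi * fc * tau))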

Joint and complex stochastic processes


Given two joint stochastic process X(t) and Y (t), they are fully defined at
two respective sets of time instants {t1,1 , t1,2 , . . . , t1,M } and {t2,1 , t2,2 , . . . , t2,N }
by the joint PDF

fX(t1,1 ),X(t1,2 ),··· ,X(t1,M ),Y (t2,1 ),Y (t2,2 ),··· ,Y (t2,N ) (x1 , x2 , . . . , xM , y1 , y2 , . . . , yN ) .

Likewise, a number of useful joint statistics can be defined.

Definition 3.30. The joint moment

RXY (t1 , t2 ) = ⟨X(t1 )Y (t2 )⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX(t1 ),Y (t2 ) (x, y)dxdy

is the cross-correlation function of the processes X(t) and Y (t).

Definition 3.31. The cross-covariance of X(t) and Y (t) is

µXY (t1 , t2 ) = ⟨(X(t1 ) − µX (t1 )) (Y (t2 ) − µY (t2 ))⟩ = RXY (t1 , t2 ) − µX (t1 )µY (t2 ).



X(t) and Y (t) are jointly wide sense stationary if

⟨X^m (t)Y^n (t)⟩ = ⟨X^m (0)Y^n (0)⟩ ,   for all t, m and n.   (3.108)

If X(t) and Y (t) are individually and jointly WSS, then

RXY (t1 , t2 ) = RXY (τ ),   (3.109)
µXY (t1 , t2 ) = µXY (τ ).   (3.110)

Property 3.7. If X(t) and Y (t) are individually and jointly WSS, then

RXY (τ ) = RY X (−τ ),   µXY (τ ) = µY X (−τ ).

The above results from the fact that ⟨X(t)Y (t + τ )⟩ = ⟨X(t − τ )Y (t)⟩.
The two processes are said to be statistically independent if and only if
their joint distribution factors, i.e.
fX(t1 ),Y (t2 ) (x, y) = fX(t1 ) (x)fY (t2 ) (y). (3.111)
They are uncorrelated if and only if µXY (τ ) = 0 and orthogonal if and
only if RXY (τ ) = 0.
Property 3.8. (triangle inequality)

|RXY (τ )| ≤ (1/2) [RXX (0) + RY Y (0)] .
Definition 3.32. A complex stochastic process is defined
Z(t) = X(t) + jY (t),
where X(t) and Y (t) are joint, real stochastic processes.
Definition 3.33. The complex autocorrelation function (autocorrelation func-
tion of a complex process) is defined

RZZ (t1 , t2 ) = (1/2) ⟨Z(t1 )Z ∗ (t2 )⟩
             = (1/2) ⟨[X(t1 ) + jY (t1 )] [X(t2 ) − jY (t2 )]⟩
             = (1/2) [RXX (t1 , t2 ) + RY Y (t1 , t2 )]
               + (j/2) [RY X (t1 , t2 ) − RXY (t1 , t2 )] .
Property 3.9. If Z(t) is WSS (implying that X(t) and Y (t) are individually
and jointly WSS), then
RZZ (t1 , t2 ) = RZZ (t1 − t2 ) = RZZ (τ ). (3.112)
Definition 3.34. The complex crosscorrelation function of two complex ran-
dom processes Z1 (t) = X1 (t) + jY1 (t) and Z2 (t) = X2 (t) + jY2 (t) is defined

RZ1 Z2 (t1 , t2 ) = (1/2) ⟨Z1 (t1 )Z2∗ (t2 )⟩
               = (1/2) [RX1 X2 (t1 , t2 ) + RY1 Y2 (t1 , t2 )]
                 + (j/2) [RY1 X2 (t1 , t2 ) − RX1 Y2 (t1 , t2 )] .
Property 3.10. If the real and imaginary parts of two complex stochastic pro-
cesses Z1 (t) and Z2 (t) are individually and pairwise WSS, then

R∗Z1 Z2 (τ ) = (1/2) ⟨Z1∗ (t)Z2 (t − τ )⟩
            = (1/2) ⟨Z1∗ (t + τ )Z2 (t)⟩
            = RZ2 Z1 (−τ ).   (3.113)
Property 3.11. If the real and imaginary parts of a complex random process
Z(t) are individually and jointly WSS, then

RZZ (τ ) = R∗ZZ (−τ ).   (3.114)

Linear systems and power spectral densities


How does a linear system behave when a stochastic process is applied at
its input? Consider a linear, time-invariant system with impulse response h(t)
having for input a stationary random process X(t), as depicted in Figure 3.3. It
is logical to assume that its output will be another stationary stochastic process
Y (t) and that it will be defined by the standard convolution integral

Y (t) = ∫_{−∞}^{∞} X(τ )h(t − τ )dτ.   (3.115)

Figure 3.3. Linear system with impulse response h(t) and stochastic process X(t) at its input.

The expectation of Y (t) is then given by

⟨Y (t)⟩ = ⟨∫_{−∞}^{∞} X(τ )h(t − τ )dτ⟩
       = ∫_{−∞}^{∞} ⟨X(τ )⟩ h(t − τ )dτ,   (3.116)

where, by virtue of the stationarity of X(t), the remaining expectation is actu-
ally a constant, and we have

⟨Y (t)⟩ = µX ∫_{−∞}^{∞} h(t − τ )dτ
       = µX ∫_{−∞}^{∞} h(τ )dτ
       = µX H(0),   (3.117)
where H(f ) is the Fourier transform of h(t).
Let us now determine the crosscorrelation function of Y (t) and X(t). We
have

RY X (t, τ ) = ⟨Y (t)X ∗ (t − τ )⟩
           = ⟨X ∗ (t − τ ) ∫_{−∞}^{∞} X(t − a)h(a)da⟩
           = ∫_{−∞}^{∞} ⟨X ∗ (t − τ )X(t − a)⟩ h(a)da
           = ∫_{−∞}^{∞} RXX (τ − a)h(a)da.   (3.118)

Since the right-hand side of the above is independent of t, it can be deduced
that X(t) and Y (t) are jointly stationary. Furthermore, the last line is also a
convolution integral, i.e.

RY X (τ ) = RXX (τ ) ∗ h(τ ).   (3.119)
The autocorrelation function of Y (t) can be derived in the same fashion:

RY Y (τ ) = ⟨Y (t)Y ∗ (t − τ )⟩
         = ⟨Y ∗ (t − τ ) ∫_{−∞}^{∞} X(t − a)h(a)da⟩
         = ∫_{−∞}^{∞} ⟨Y ∗ (t − τ )X(t − a)⟩ h(a)da
         = ∫_{−∞}^{∞} RY X (a − τ )h(a)da
         = RY X (−τ ) ∗ h(τ )
         = RXX (−τ ) ∗ h(−τ ) ∗ h(τ )
         = RXX (τ ) ∗ h(−τ ) ∗ h(τ ).   (3.120)
Given that X(t) at any given instant is a random variable, how can the spec-
trum of X(t) be characterized? Intuitively, it can be assumed that there is

a different spectrum X(f, λ) for every member function X(t, λ). However,
stochastic processes are in general infinite energy signals which implies that
their Fourier transform in the strict sense does not exist. In the time domain, a
process is characterized essentially through its mean and autocorrelation func-
tion. In the frequency domain, we resort to the power spectral density.
Definition 3.35. The power spectral density (PSD) of a random process X(t)
is a spectrum giving the average (in the ensemble statistic sense) power in the
process at every frequency f .
The PSD can be found simply by taking the Fourier transform of the auto-
correlation function, i.e.

SXX (f ) = ∫_{−∞}^{∞} RXX (τ )e^{−j2πf τ} dτ,   (3.121)

which obviously implies that the autocorrelation function can be found from
the PSD SXX (f ) by performing an inverse transform, i.e.

RXX (τ ) = ∫_{−∞}^{∞} SXX (f )e^{j2πf τ} df.   (3.122)

This bilateral relation is known as the Wiener-Khinchin theorem and its


proof is left as an exercise.
Definition 3.36. The cross power spectral density (CPSD) between two ran-
dom processes X(t) and Y (t) is a spectrum giving the ensemble average prod-
uct between every frequency component of X(t) and every corresponding fre-
quency component of Y (t).
As could be expected, the CPSD can also be computed via a Fourier trans-
form

SXY (f ) = ∫_{−∞}^{∞} RXY (τ )e^{−j2πf τ} dτ.   (3.123)
By taking the Fourier transform of properties 3.10 and 3.11, we find the
following:
Property 3.12.

S∗XY (f ) = SY X (f ).

Property 3.13.

S∗XX (f ) = SXX (f ), i.e. the PSD is real.
The CPSD between the input process X(t) and the output process Y (t) of
a linear system can be found by simply taking the Fourier transform of (3.119).
Thus, we have

SY X (f ) = H(f )SXX (f ).   (3.124)
Likewise, the PSD of Y (t) is obtained by taking the Fourier transform of
(3.120), which yields

SY Y (f ) = H(f )H ∗ (f )SXX (f )
         = |H(f )|² SXX (f ).   (3.125)
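Relation (3.125) can be illustrated with a simple discrete-time experiment: white noise is passed through a short FIR filter and the output PSD, estimated by averaging periodograms, is compared with |H(f )|² times the flat input PSD. The Python sketch below is illustrative only; the filter taps and block sizes are arbitrary choices:

import numpy as np

rng = np.random.default_rng(8)
h = np.array([0.5, 1.0, 0.5])                 # arbitrary FIR impulse response
n_blocks, n_fft = 400, 1024
sigma2 = 1.0                                  # input PSD is flat at sigma2

psd_est = np.zeros(n_fft)
for _ in range(n_blocks):
    x = rng.normal(scale=np.sqrt(sigma2), size=n_fft)
    y = np.convolve(x, h, mode="same")        # filtered (output) process
    psd_est += np.abs(np.fft.fft(y))**2 / n_fft
psd_est /= n_blocks                           # averaged periodogram of Y

H = np.fft.fft(h, n_fft)
psd_theory = np.abs(H)**2 * sigma2            # |H(f)|^2 S_XX(f), eq. (3.125)

print(np.max(np.abs(psd_est - psd_theory)) / np.max(psd_theory))  # small relative error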

It is noteworthy that, since RXX (τ ) is an even function, its Fourier trans-
form SXX (f ) is necessarily real. This is logical, since according to the defini-
tion, the PSD yields the average power at every frequency (and complex power
makes no sense). It also follows that the average total power of X(t) is given
by

P = ⟨X² (t)⟩ = RXX (0)
  = ∫_{−∞}^{∞} SXX (f )df.   (3.126)

Discrete stochastic processes


If a stochastic process is bandlimited (i.e. its PSD is limited to a finite
interval of frequencies) either because of its very nature or because it results
from passing another process through a bandlimiting filter, then it is possible to
characterize it fully with a finite set of time instants by virtue of the sampling
theorem.
Given a deterministic signal x(t), it is bandlimited if

X(f ) = F{x(t)} = 0, for |f | > W , (3.127)

where W corresponds to the highest frequency in x(t). Recall that according
to the sampling theorem, x(t) can be uniquely determined by the set of its
samples taken at a rate of fs ≥ 2W samples / s, where the latter inequality
constitutes the Nyquist criterion and the minimum rate 2W samples / s is
known as the Nyquist rate. Sampling at the Nyquist rate, the sampled signal
is

xs (t) = Σ_{k=−∞}^{∞} x(k/2W ) δ(t − k/2W ),   (3.128)

and it corresponds to the discrete sequence

x[k] = x(k/2W ).   (3.129)
Sampling theory tells us that x[k] contains all the necessary information
to reconstruct x(t). Since the member functions of a stochastic process are
individually deterministic, the same is true for each of them and, by extension,
for the process itself. Hence, a bandlimited process X(t) is fully characterized

by the sequence of random variables X[k] corresponding to a sampling set at


the Nyquist rate. Such a sequence X[k] is an instance of a discrete stochastic
process.
Definition 3.37. A discrete stochastic process X[k] is an ensemble of discrete
sequences (member functions) x1 [k], x2 [k], . . . , xN [k] which are mapped to
the outcomes λ1 , λ2 , . . . , λN making up the sample space S of a corresponding
random experiment.
The mth moment of X[k] is

⟨X^m [k]⟩ = ∫_{−∞}^{∞} x^m fX[k] (x)dx,   (3.130)

its autocorrelation is defined

RXX [k1 , k2 ] = (1/2) ⟨X[k1 ]X ∗ [k2 ]⟩ = (1/2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy ∗ fX[k1 ],X[k2 ] (x, y)dxdy,   (3.131)

and its autocovariance is given by

µXX [k1 , k2 ] = RXX [k1 , k2 ] − ⟨X[k1 ]⟩ ⟨X[k2 ]⟩ .   (3.132)
If the process is stationary, then we have
RXX [k1 , k2 ] = RXX [k1 − k2 ],
µXX [k1 , k2 ] = µXX [k1 − k2 ] = RXX [k1 − k2 ] − µ2X . (3.133)
The power spectral density of a discrete process is, naturally enough, com-
puted using the discrete-time Fourier transform, i.e.

SXX (f ) = Σ_{k=−∞}^{∞} RXX [k]e^{−j2πf k} ,   (3.134)

with the inverse relationship being

RXX [k] = ∫_{−1/2}^{1/2} SXX (f )e^{j2πf k} df.   (3.135)

Hence, as with any discrete-time Fourier transform, the PSD SXX (f ) is periodic.
More precisely, we have SXX (f ) = SXX (f + n) where n is any integer.
Given a discrete-time linear time-invariant system with a discrete impulse
response h[k] = h(tk ), the output process Y [k] of this system when a process
X[k] is applied at its input is given by

Y [k] = Σ_{n=−∞}^{∞} h[n]X[k − n],   (3.136)

which constitutes a discrete convolution.


The mean of the output can be computed as follows:

µY = ⟨Y [k]⟩
   = ⟨Σ_{n=−∞}^{∞} h[n]X[k − n]⟩
   = Σ_{n=−∞}^{∞} h[n] ⟨X[k − n]⟩
   = µX Σ_{n=−∞}^{∞} h[n] = µX H(0),   (3.137)

where H(0) is the DC component of the system’s frequency transfer function.


Likewise, the autocorrelation function of the output is given by

R_{YY}[k] = \frac{1}{2}\left\langle Y^*[n]\,Y[n+k]\right\rangle
          = \frac{1}{2}\left\langle \sum_{m=-\infty}^{\infty} h^*[m]X^*[n-m] \sum_{l=-\infty}^{\infty} h[l]X[k+n-l]\right\rangle
          = \sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h^*[m]\,h[l]\,\frac{1}{2}\left\langle X^*[n-m]X[k+n-l]\right\rangle
          = \sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} h^*[m]\,h[l]\,R_{XX}[k+m-l],    (3.138)

which is in fact a double discrete convolution.


Taking the discrete Fourier transform of the above expression, we obtain
SY Y (f ) = SXX (f ) |Hs (f )|2 , (3.139)
which is exactly the same as for continuous processes, except that SXX (f ) and
SY Y (f ) are the periodic PSDs of discrete processes, and Hs (f ) is the periodic
spectrum of the sampled version of h(t).
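As an illustrative aside (not part of the original development), the following minimal Python sketch passes white noise through a short FIR filter and checks numerically that the output PSD tracks |H_s(f)|^2 S_{XX}(f), as stated in (3.139). The filter taps and all parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2**18)        # white process, S_XX(f) is flat
h = np.array([0.5, 1.0, 0.5])         # arbitrary FIR impulse response h[k]
y = np.convolve(x, h, mode="same")

def welch_psd(v, nseg=256):
    """Crude Welch-style PSD estimate (average of windowed periodograms)."""
    segs = v[: len(v) // nseg * nseg].reshape(-1, nseg)
    win = np.hanning(nseg)
    return np.mean(np.abs(np.fft.rfft(segs * win, axis=1)) ** 2, axis=0) / np.sum(win**2)

Sxx, Syy = welch_psd(x), welch_psd(y)
H = np.fft.rfft(h, 256)               # frequency response on the same grid
ratio = Syy / (np.abs(H) ** 2 * Sxx + 1e-12)
print(np.median(ratio))               # should be close to 1
```

The median ratio stays near unity except at frequencies where H(f) has a null, where both sides of (3.139) vanish.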

Cyclostationarity
The modeling of signals carrying digital information implies stochastic pro-
cesses which are not quite stationary, although it is possible in a sense to treat
them as stationary and thus obtain the analytical convenience associated with
this property.
Definition 3.38. A cyclostationary stochastic process is a process with non-constant mean (it is therefore not stationary, neither in the strict sense nor in the wide sense) such that the mean and autocorrelation function are periodic in time with a given period T.
Consider a random process

S(t) = \sum_{k=-\infty}^{\infty} a[k]\,g(t-kT),    (3.140)

where \{a[k]\} is a sequence of complex random variables having mean \mu_A and autocorrelation function R_{AA}[n] (so that the sequence A[k] = \{a[k]\} is a stationary discrete random process), and g(t) is a real pulse-shaping function considered to be 0 outside the interval t \in [0,T]. The mean of S(t) is given by

\mu_S(t) = \mu_A\sum_{n=-\infty}^{\infty} g(t-nT),    (3.141)

and it can be seen that it is periodic with period T.


Likewise, the autocorrelation function is given by

R_{SS}(t,t+\tau) = \frac{1}{2}\left\langle S^*(t)\,S(t+\tau)\right\rangle
               = \frac{1}{2}\left\langle \sum_{k=-\infty}^{\infty} a^*[k]\,g(t-kT)\sum_{l=-\infty}^{\infty} a[l]\,g(t+\tau-lT)\right\rangle
               = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} g(t-kT)\,g(t+\tau-lT)\,\frac{1}{2}\left\langle a^*[k]\,a[l]\right\rangle
               = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} g(t-kT)\,g(t+\tau-lT)\,R_{AA}[l-k].    (3.142)

It is easy to see that


RSS (t, t + τ ) = RSS (t + kT, t + τ + kT ), for any integer k. (3.143)
The fact that such processes are not stationary is inconvenient. For example, it is awkward to derive a PSD from the above autocorrelation function, because two variables are involved (t and \tau) and this calls for a two-dimensional Fourier transform. However, it is possible to sidestep this issue by observing that averaging the autocorrelation function over one period T removes any dependence upon t.
Definition 3.39. The period-averaged autocorrelation function of a cyclostationary process is defined as

\bar{R}_{SS}(\tau) = \frac{1}{T}\int_{-T/2}^{T/2} R_{SS}(t,t+\tau)\,dt.
It is noteworthy that the period-averaged autocorrelation function is an ensemble statistic and should not be confused with the time-averaged statistics defined earlier. Based on this, the power spectral density can simply be defined as

S_{SS}(f) = \mathcal{F}\left\{\bar{R}_{SS}(\tau)\right\} = \int_{-\infty}^{\infty} \bar{R}_{SS}(\tau)\,e^{-j2\pi f\tau}\,d\tau.    (3.144)

5. Typical univariate distributions


In this section, we examine various types of random variables which will
be found useful in the following chapters. We will start by studying three
fundamental distributions: the binomial, Gaussian and uniform laws. Because
complex numbers play a fundamental role in the modeling of communication
systems, we will then introduce complex r.v.’s and the associated distributions.
Most importantly, we will dwell on the complex Gaussian distribution, which
will then serve as a basis for deriving other useful distributions: chi-square,
F-distribution, Rayleigh and Rice. Finally, the Nakagami-m and lognormal
distributions complete our survey.

Binomial distribution
We have already seen in section 2 that a discrete random variable following a binomial distribution is characterized by the PDF

f_X(x) = \sum_{n=0}^{N}\binom{N}{n} p^n (1-p)^{N-n}\,\delta(x-n).    (3.145)

We can find the CDF by simply integrating the above. Thus we have

F_X(x) = \int_{-\infty}^{x}\sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\delta(\alpha-n)\,d\alpha = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\int_{-\infty}^{x}\delta(\alpha-n)\,d\alpha,    (3.146)

where

\int_{-\infty}^{x}\delta(\alpha-n)\,d\alpha = \begin{cases} 1, & \text{if } n < x,\\ 0, & \text{otherwise.}\end{cases}    (3.147)

It follows that

F_X(x) = \sum_{n=0}^{\lfloor x\rfloor}\binom{N}{n}p^n(1-p)^{N-n},    (3.148)
where \lfloor x\rfloor denotes the floor operator, i.e. the largest integer smaller than or equal to x.
The characteristic function is given by

\phi_X(jt) = \int_{-\infty}^{\infty}\sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\delta(x-n)\,e^{jtx}\,dx = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}e^{jtn},    (3.149)

which, according to the binomial theorem, reduces to

\phi_X(jt) = \left(1-p+pe^{jt}\right)^N.    (3.150)
Furthermore, the Mellin transform of the binomial PDF is given by

M_{f_X}(s) = \int_{-\infty}^{\infty}\sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\delta(x-n)\,x^{s-1}\,dx = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}n^{s-1},    (3.151)

which implies that the kth moment is

\langle X^k\rangle = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}n^k.    (3.152)
If k = 1, we have

\langle X\rangle = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\,n = \sum_{n=0}^{N}\frac{N!}{(N-n)!\,(n-1)!}\,p^n(1-p)^{N-n} = Np\sum_{n'=-1}^{N-1}\binom{N-1}{n'}p^{n'}(1-p)^{N-1-n'},    (3.153)

where n' = n-1 and, noting that \binom{N-1}{-1} = 0, we have

\langle X\rangle = Np\sum_{n'=0}^{N-1}\binom{N-1}{n'}p^{n'}(1-p)^{N-1-n'} = Np(1-p+p)^{N-1} = Np.    (3.154)
Likewise, if k = 2, we have

\langle X^2\rangle = \sum_{n=0}^{N}\binom{N}{n}p^n(1-p)^{N-n}\,n^2 = Np\sum_{n'=0}^{N-1}\binom{N-1}{n'}p^{n'}(1-p)^{N-1-n'}(n'+1)
 = Np\left[(N-1)p\sum_{n''=0}^{N-2}\binom{N-2}{n''}p^{n''}(1-p)^{N-2-n''} + \sum_{n'=0}^{N-1}\binom{N-1}{n'}p^{n'}(1-p)^{N-1-n'}\right],    (3.155)

where n' = n-1 and n'' = n-2 and, applying the binomial theorem,

\langle X^2\rangle = N(N-1)p^2 + Np = Np(1-p) + N^2p^2.    (3.156)

It follows that the variance is given by

\sigma_X^2 = \langle X^2\rangle - \langle X\rangle^2 = Np(1-p) + N^2p^2 - N^2p^2 = Np(1-p).    (3.157)
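A quick numerical check of (3.154) and (3.157) — not part of the original text, and using arbitrary values of N and p — can be done by direct simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 12, 0.3
samples = rng.binomial(N, p, size=200_000)
print(samples.mean(), N * p)            # sample mean vs. Np
print(samples.var(), N * p * (1 - p))   # sample variance vs. Np(1-p)
```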

Uniform distribution
A continuous r.v. X characterized by a uniform distribution can take any value within an interval [a,b], and this is denoted X \sim U(a,b). Given a smaller interval [\Delta_a,\Delta_b] of width \Delta such that \Delta_a \ge a and \Delta_b \le b, we find that, by definition,

P(X\in[\Delta_a,\Delta_b]) = P(\Delta_a \le X \le \Delta_b) = \frac{\Delta}{b-a},    (3.158)

regardless of the position of the interval [\Delta_a,\Delta_b] within the range [a,b].
The PDF of such an r.v. is simply

f_X(x) = \begin{cases}\frac{1}{b-a}, & \text{if } x\in[a,b],\\ 0, & \text{elsewhere,}\end{cases}    (3.159)

or, in terms of the step function u(x),

f_X(x) = \frac{1}{b-a}\left[u(x-a) - u(x-b)\right].    (3.160)
The corresponding CDF is

F_X(x) = \begin{cases} 0, & \text{if } x < a,\\ \frac{x-a}{b-a}, & \text{if } x\in[a,b],\\ 1, & \text{if } x > b,\end{cases}    (3.161)

which, in terms of the step function, is conveniently expressed as

F_X(x) = \frac{x-a}{b-a}\left[u(x-a) - u(x-b)\right] + u(x-b).    (3.162)
b−a
The characteristic function is obtained as follows:
� b
1 jtx
φX (jt) = e dx
a b−a
� jtx �b
1 e
=
b − a jt a
1 � �
= ejtb − ejta . (3.163)
jt(b − a)
The Mellin transform is equally elementary:

M_{f_X}(s) = \int_a^b \frac{x^{s-1}}{b-a}\,dx = \frac{1}{b-a}\left[\frac{x^s}{s}\right]_a^b = \frac{b^s-a^s}{s(b-a)}.    (3.164)

It follows that

\langle X^k\rangle = \frac{b^{k+1}-a^{k+1}}{(k+1)(b-a)}.    (3.165)

Gaussian distribution
From Appendix 3.A and Example 3.4, we know that the PDF of a Gaussian r.v. is

f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}},    (3.166)

where \sigma_X^2 is the variance and \mu_X is the mean of X.
The corresponding CDF is

F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\,d\alpha = \frac{1}{\sqrt{2\pi}\,\sigma_X}\int_{-\infty}^{x} e^{-\frac{(\alpha-\mu_X)^2}{2\sigma_X^2}}\,d\alpha,    (3.167)

which, with the variable substitution u = \frac{\alpha-\mu_X}{\sqrt{2}\,\sigma_X}, becomes

F_X(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{x-\mu_X}{\sqrt{2}\sigma_X}} e^{-u^2}\,du
       = \begin{cases}\frac{1}{2}\left[1+\mathrm{erf}\!\left(\frac{x-\mu_X}{\sqrt{2}\sigma_X}\right)\right], & \text{if } x-\mu_X > 0,\\[4pt]
         \frac{1}{2}\left[1-\mathrm{erf}\!\left(\frac{|x-\mu_X|}{\sqrt{2}\sigma_X}\right)\right] = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{|x-\mu_X|}{\sqrt{2}\sigma_X}\right), & \text{otherwise,}\end{cases}    (3.168)

where erf(·) is the error function and erfc(·) is the complementary error function.
Figure 3.4 shows the PDF and CDF of the Gaussian distribution.

Figure 3.4. The PDF and CDF of the real Gaussian distribution with zero mean and a variance of 10: (a) probability density function f_X(x); (b) cumulative distribution function F_X(x).
The characteristic function of a Gaussian r.v. with mean \mu_X and variance \sigma_X^2 is

\phi_X(jt) = e^{jt\mu_X - \frac{t^2\sigma_X^2}{2}}.    (3.169)


The kth central moment is given by

\langle (X-\mu_X)^k\rangle = \int_{-\infty}^{\infty}(x-\mu_X)^k\,\frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\,dx,    (3.170)

where the integral can be reduced by substituting u = \frac{(x-\mu_X)^2}{2\sigma_X^2} to yield

\langle (X-\mu_X)^k\rangle = \frac{\left(2\sigma_X^2\right)^{k/2}}{2\sqrt{\pi}}\left[1-(-1)^{k+1}\right]\int_0^{\infty} u^{\frac{k-1}{2}}e^{-u}\,du,    (3.171)

which was obtained by treating separately the cases (x-\mu_X) > 0 and (x-\mu_X) < 0. It can be observed that all odd-order central moments are null. For the even moments, we have

\langle (X-\mu_X)^k\rangle = \frac{2^{k/2}\sigma_X^k}{\sqrt{\pi}}\int_0^{\infty} u^{\frac{k-1}{2}}e^{-u}\,du, \quad \text{if } k \text{ is even},    (3.172)

which, according to the definition of the Gamma function, is

\langle (X-\mu_X)^k\rangle = \frac{2^{k/2}\sigma_X^k}{\sqrt{\pi}}\,\Gamma\!\left(\frac{k+1}{2}\right).    (3.173)

By virtue of identity (2.197), the above becomes

\langle (X-\mu_X)^k\rangle = \frac{2^{k/2}\sigma_X^k}{\sqrt{\pi}}\,\frac{1\cdot3\cdot5\cdots(k-1)}{2^{k/2}}\sqrt{\pi}, \quad \text{if } k \text{ is even},    (3.174)

which finally reduces to

\langle (X-\mu_X)^k\rangle = \begin{cases} 0, & \text{if } k \text{ is odd},\\ \sigma_X^k\prod_{i=1}^{k/2}(2i-1), & \text{if } k \text{ is even}.\end{cases}    (3.175)

The raw moments are best obtained as a function of the central moments. For the kth raw moment, assuming k an integer, we have

\langle X^k\rangle = \int_{-\infty}^{\infty} x^k\,\frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\,dx.    (3.176)

This can be expressed as a function of x-\mu_X thanks to the binomial theorem as follows:

\langle X^k\rangle = \int_{-\infty}^{\infty}(x-\mu_X+\mu_X)^k\,\frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\,dx
 = \sum_{n=0}^{k}\binom{k}{n}\mu_X^{k-n}\int_{-\infty}^{\infty}(x-\mu_X)^n\,\frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\,dx
 = \sum_{n=0}^{k}\binom{k}{n}\mu_X^{k-n}\left\langle(X-\mu_X)^n\right\rangle
 = \sum_{n'=0}^{\lfloor k/2\rfloor}\binom{k}{2n'}\mu_X^{k-2n'}\sigma_X^{2n'}\prod_{i=1}^{n'}(2i-1).    (3.177)

The Mellin transform does not follow readily because the above assumes that k is an integer. The Mellin transform variable s, on the other hand, is typically not an integer and can assume any value in the complex plane. Furthermore, it may sometimes be useful to find fractional moments. From (3.176), we have

\langle X^{s-1}\rangle = \sum_{n=0}^{\infty}\binom{s-1}{n}\mu_X^{s-1-n}\left\langle(X-\mu_X)^n\right\rangle,    (3.178)

where the summation now runs from 0 to \infty to account for the fact that s might not be an integer. If it is an integer, terms for n > s-1 are nulled because the combination operator has (s-1-n)! in the denominator and the factorial of a negative integer is infinite. It follows that the above equation is true regardless of the value of s.
Applying the identity (2.197), we find

\langle X^{s-1}\rangle = \sum_{n=0}^{\infty}\frac{\Gamma(1-s+n)}{\Gamma(1-s)\,n!}\,\mu_X^{s-1-n}\left\langle(X-\mu_X)^n\right\rangle,    (3.179)

which, by virtue of (3.173) and bearing in mind that the odd central moments are null, becomes

\langle X^{s-1}\rangle = \sum_{n'=0}^{\infty}\frac{\Gamma(1-s+2n')}{\Gamma(1-s)\,(2n')!}\,\mu_X^{s-1-2n'}\,\frac{2^{n'}\sigma_X^{2n'}}{\sqrt{\pi}}\,\Gamma\!\left(n'+\frac{1}{2}\right).    (3.180)

The Gamma function and the factorial in 2n' can be expanded by virtue of property 2.68 to yield

\langle X^{s-1}\rangle = \sum_{n'=0}^{\infty}\frac{\Gamma\!\left(n'+\frac{1-s}{2}\right)\Gamma\!\left(n'+1-\frac{s}{2}\right)}{\Gamma(1-s)\,n'!}\,2^{n'}\sigma_X^{2n'}\,\mu_X^{s-1-2n'}.    (3.181)

Expanding \Gamma(1-s) in the same way allows the formation of two Pochhammer symbols by combination with the numerator Gamma functions; this results in

\langle X^{s-1}\rangle = \mu_X^{s-1}\sum_{n'=0}^{\infty}\frac{\left(\frac{1-s}{2}\right)_{n'}\left(1-\frac{s}{2}\right)_{n'}}{n'!}\left(\frac{2\sigma_X^2}{\mu_X^2}\right)^{n'} = \mu_X^{s-1}\,{}_2F_0\!\left(\frac{1-s}{2},\,1-\frac{s}{2};\;\frac{2\sigma_X^2}{\mu_X^2}\right).    (3.182)

A unique property of this distribution is that a sum of Gaussian r.v.'s is itself a Gaussian variate. Consider

Y = \sum_{i=1}^{N} X_i,    (3.183)

where each X_i is a Gaussian r.v. with mean \mu_i and variance \sigma_i^2. The corresponding characteristic function is

\phi_Y(jt) = \prod_{i=1}^{N} e^{jt\mu_i - \frac{t^2\sigma_i^2}{2}} = \left(\prod_{i=1}^{N} e^{jt\mu_i}\right)\left(\prod_{i=1}^{N} e^{-\frac{t^2\sigma_i^2}{2}}\right) = e^{jt\sum_{i=1}^{N}\mu_i - \frac{t^2}{2}\sum_{i=1}^{N}\sigma_i^2},    (3.184)

which, of course, is the C.F. of a Gaussian r.v. with mean \sum_{i=1}^{N}\mu_i and variance \sum_{i=1}^{N}\sigma_i^2.
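As a brief illustration of (3.184) (a sketch added here, with arbitrary means and variances), summing independent Gaussian samples indeed yields a variate whose mean and variance are the sums of the individual means and variances:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
sig = np.array([0.5, 1.0, 2.0])
X = rng.normal(mu, sig, size=(500_000, 3))   # columns: X_1, X_2, X_3
Y = X.sum(axis=1)
print(Y.mean(), mu.sum())        # means add
print(Y.var(), (sig**2).sum())   # variances add
```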

Complex Gaussian distribution


In communications, quantities (channel gains, data symbols, filter coeffi-
cients) are typically complex thanks in no small part to the widespread use of
complex baseband notation (see chapter 4). It follows that it is useful to handle
complex r.v.’s. This is simple enough to do, at least in principle, given that a
complex r.v. comprises two jointly-distributed real r.v.’s corresponding to the
real and imaginary part of the complex quantity.
Hence, a complex Gaussian variate is constructed from two jointly dis-
tributed real Gaussian variates. Consider

Z = A + jB, (3.185)
where A and B are real Gaussian variates. It was found in [Goodman, 1963] that such a complex normal r.v. has desirable analytic properties if

\langle(A-\mu_A)^2\rangle = \langle(B-\mu_B)^2\rangle,    (3.186)
\langle(A-\mu_A)(B-\mu_B)\rangle = 0.    (3.187)

The latter implies that A and B are uncorrelated and (being jointly Gaussian) independent.
Furthermore, the variance of Z is defined as

\sigma_Z^2 = \langle|Z-\mu_Z|^2\rangle = \langle(Z-\mu_Z)(Z-\mu_Z)^*\rangle,    (3.188)

where \mu_Z = \mu_A + j\mu_B. Hence, we have

\sigma_Z^2 = \langle[(A-\mu_A)+j(B-\mu_B)][(A-\mu_A)-j(B-\mu_B)]\rangle = \langle(A-\mu_A)^2\rangle + \langle(B-\mu_B)^2\rangle = \sigma_A^2 + \sigma_B^2 = 2\sigma_A^2.    (3.189)
It follows that A and B obey the same marginal distribution, which is

f_A(a) = \frac{1}{\sqrt{2\pi}\,\sigma_A}\,e^{-\frac{(a-\mu_A)^2}{2\sigma_A^2}}.    (3.190)

Hence, the joint distribution of A and B is

f_{A,B}(a,b) = \frac{1}{2\pi\sigma_A\sigma_B}\,e^{-\frac{(a-\mu_A)^2}{2\sigma_A^2}}\,e^{-\frac{(b-\mu_B)^2}{2\sigma_B^2}} = \frac{1}{2\pi\sigma_A^2}\,e^{-\frac{(a-\mu_A)^2+(b-\mu_B)^2}{2\sigma_A^2}},    (3.191)

which directly leads to

f_Z(z) = \frac{1}{\pi\sigma_Z^2}\,e^{-\frac{(z-\mu_Z)(z-\mu_Z)^*}{\sigma_Z^2}}.    (3.192)
Figure 3.5 shows the PDF fZ (z) plotted as a surface above the complex
plane.
While we can conveniently express many complex PDFs as a function of a
single complex r.v., it is important to remember that, from a formal standpoint,
a complex PDF is actually a joint PDF since the real and imaginary parts are
separate r.v.’s. Hence, we have
fZ (z) = fA,B (a, b) = fRe{Z},Im{Z} (Re{z}, Im{z}). (3.193)
Figure 3.5. The bidimensional PDF of the unit-variance complex Gaussian Z (\sigma_Z^2 = 1) over the complex plane.

The fact that there are two underlying r.v.’s becomes inescapable when we
consider the CDF of a complex r.v. Indeed, the event Z ≤ z is ambiguous.
Therefore, the CDF must be expressed as a joint CDF of the real and imaginary
parts, i.e.

FRe{Z},Im{Z} (a, b) = P (Re{Z} ≤ a, Im{Z} ≤ b), (3.194)

which, provided that the CDF is differentiable along its two dimensions, can lead us back to the complex PDF according to

f_Z(z) = f_{\mathrm{Re}\{Z\},\mathrm{Im}\{Z\}}(a,b) = \frac{\partial^2}{\partial a\,\partial b}F_{\mathrm{Re}\{Z\},\mathrm{Im}\{Z\}}(a,b).    (3.195)

Accordingly, the C.F. of a complex r.v. Z is actually a joint C.F. defined by

\phi_{\mathrm{Re}\{Z\},\mathrm{Im}\{Z\}}(jt_1,jt_2) = \left\langle e^{jt_1\mathrm{Re}\{Z\}+jt_2\mathrm{Im}\{Z\}}\right\rangle.    (3.196)

In the case of a complex Gaussian variate, we have

\phi_{\mathrm{Re}\{Z\},\mathrm{Im}\{Z\}}(jt_1,jt_2) = e^{j(t_1\mu_A + t_2\mu_B)}\,e^{-\frac{\sigma_Z^2}{4}\left(t_1^2+t_2^2\right)}.    (3.197)
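The following short sketch (added for illustration; the value of σ_Z^2 is arbitrary) generates samples of a circularly-symmetric complex Gaussian variate and checks the defining properties (3.186)–(3.189): equal-variance, uncorrelated real and imaginary parts, and total variance σ_Z^2.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_Z = 4.0
n = 10**6
Z = np.sqrt(sigma2_Z / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
A, B = Z.real, Z.imag
print(A.var(), B.var(), sigma2_Z / 2)       # equal variances sigma_Z^2 / 2
print(np.mean(A * B))                       # ~0: A and B uncorrelated
print(np.mean(np.abs(Z) ** 2), sigma2_Z)    # total variance sigma_Z^2
```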

Rayleigh distribution
The Rayleigh distribution characterizes the amplitude of a narrowband noise process n(t), as originally derived by Rice [Rice, 1944], [Rice, 1945]. In the context of wireless communications, however, the Rayleigh distribution is most often associated with multipath fading (see chapter 4).

Consider that the noise process n(t) is made up of a real and an imaginary part (as per complex baseband notation — see chapter 4):

N(t) = A(t) + jB(t),    (3.198)

and that N(t) follows a central (zero-mean) complex Gaussian distribution. It follows that if N(t) is now expressed in phasor notation, i.e.

N(t) = R(t)\,e^{j\Theta(t)},    (3.199)

where R(t) is the modulus (amplitude) and \Theta(t) is the phase, their respective PDFs can be derived through transformation techniques.
We have

f_{A,B}(a,b) = \frac{1}{\pi\sigma_N^2}\,e^{-\frac{a^2+b^2}{\sigma_N^2}},    (3.200)

where \sigma_N^2 = 2\sigma_A^2 = 2\sigma_B^2, and the transformation is

R = g_1(A,B) = \sqrt{A^2+B^2}, \quad R\in[0,\infty),    (3.201)
\Theta = g_2(A,B) = \arctan\!\left(\frac{B}{A}\right), \quad \Theta\in[0,2\pi),    (3.202)

being the conversion from cartesian to polar coordinates. The inverse transformation is

A = g_1^{-1}(R,\Theta) = R\cos\Theta, \quad A\in(-\infty,\infty),    (3.203)
B = g_2^{-1}(R,\Theta) = R\sin\Theta, \quad B\in(-\infty,\infty).    (3.204)
Hence, the Jacobian is given by

J = \det\begin{bmatrix}\frac{\partial(R\cos\Theta)}{\partial R} & \frac{\partial(R\sin\Theta)}{\partial R}\\[2pt] \frac{\partial(R\cos\Theta)}{\partial\Theta} & \frac{\partial(R\sin\Theta)}{\partial\Theta}\end{bmatrix} = \det\begin{bmatrix}\cos\Theta & \sin\Theta\\ -R\sin\Theta & R\cos\Theta\end{bmatrix} = R\left(\cos^2\Theta + \sin^2\Theta\right) = R.    (3.205)
It follows that

f_{R,\Theta}(r,\theta) = \frac{r}{\pi\sigma_N^2}\,e^{-\frac{r^2}{\sigma_N^2}}\,u(r)\left[u(\theta)-u(\theta-2\pi)\right],    (3.206)

and

f_\Theta(\theta) = \int_0^{\infty} f_{R,\Theta}(r,\theta)\,dr = \frac{1}{2\pi}\left[u(\theta)-u(\theta-2\pi)\right],    (3.207)

f_R(r) = \int_0^{2\pi} f_{R,\Theta}(r,\theta)\,d\theta = \frac{r\,e^{-\frac{r^2}{\sigma_N^2}}}{\pi\sigma_N^2}\,u(r)\int_0^{2\pi} d\theta = \frac{2r}{\sigma_N^2}\,e^{-\frac{r^2}{\sigma_N^2}}\,u(r).    (3.208)

Hence, the modulus R is Rayleigh-distributed and the phase is uniformly distributed. Furthermore, the two variables are independent since f_{R,\Theta}(r,\theta) = f_R(r)\,f_\Theta(\theta).
The Mellin transform of f_R(r) is

M_{f_R}(s) = \langle R^{s-1}\rangle = \sigma_N^{s-1}\,\Gamma\!\left(1+\frac{s-1}{2}\right),    (3.209)

and the kth raw moment is given by

\langle R^k\rangle = \sigma_N^k\,\Gamma\!\left(1+\frac{k}{2}\right).    (3.210)

Consequently, the mean is

\langle R\rangle = \sigma_N\,\Gamma\!\left(\frac{3}{2}\right) = \frac{\sigma_N\sqrt{\pi}}{2},    (3.211)

and the variance is

\sigma_R^2 = \langle(R-\mu_R)^2\rangle = \langle R^2\rangle - \mu_R^2 = \sigma_N^2\,\Gamma(2) - \frac{\sigma_N^2\pi}{4} = \sigma_N^2\,\frac{4-\pi}{4}.    (3.212)

The CDF is simply

F_R(r) = P(R < r) = \int_0^{r}\frac{2\alpha}{\sigma_N^2}\,e^{-\frac{\alpha^2}{\sigma_N^2}}\,d\alpha = 1 - e^{-\frac{r^2}{\sigma_N^2}}.    (3.213)
Figure 3.6 depicts typical PDFs and CDFs for Rayleigh-distributed vari-
ables.
Finally, the characteristic function is given by

\phi_R(jt) = 1 - \frac{\sqrt{\pi}}{2}\,\sigma_N t\,e^{-\frac{\sigma_N^2 t^2}{4}}\left[\mathrm{erfi}\!\left(\frac{\sigma_N t}{2}\right) - j\right],    (3.214)

where \mathrm{erfi}(x) = -j\,\mathrm{erf}(jx) is the imaginary error function.
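To corroborate the Rayleigh moments (3.211)–(3.213), one can take the modulus of zero-mean complex Gaussian samples, as in the following sketch (added for illustration; the value of σ_N^2 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2_N = 2.0
n = 10**6
N_noise = np.sqrt(sigma2_N / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
R = np.abs(N_noise)                                       # Rayleigh-distributed modulus
print(R.mean(), np.sqrt(sigma2_N) * np.sqrt(np.pi) / 2)   # (3.211)
print(R.var(), sigma2_N * (4 - np.pi) / 4)                # (3.212)
print(np.mean(R < 1.0), 1 - np.exp(-1.0 / sigma2_N))      # CDF at r = 1, (3.213)
```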

Rice distribution
If a signal is made up of the addition of a pure sinusoid and narrowband
noise, i.e. its complex baseband representation is
S(t) = C + A(t) + jB(t), (3.215)
where C is a constant and N (t) = A(t) + jB(t) is the narrowband noise
(as characterized in the previous subsection), then the envelope R(t) follows a
Rice distribution.
Figure 3.6. The PDF and CDF of the Rayleigh distribution for various variances \sigma_R^2 = \sigma_N^2\frac{4-\pi}{4} (\sigma_N^2 = 1, 2, 3, 4, 6): (a) probability density function f_R(r); (b) cumulative distribution function F_R(r).

We note that \mathrm{Re}\{S(t)\} = C + A(t), \mathrm{Im}\{S(t)\} = B(t) and

R(t) = \sqrt{(C+A(t))^2 + B(t)^2},    (3.216)
\Theta(t) = \arctan\!\left(\frac{B(t)}{C+A(t)}\right).    (3.217)

The inverse transformation is given by

A(t) = R(t)\cos\Theta(t) - C,    (3.218)
B(t) = R(t)\sin\Theta(t).    (3.219)

It follows that the Jacobian is identical to that found for the Rayleigh distribution, i.e. J = R.
Therefore, we have

f_{R,\Theta}(r,\theta) = \frac{r}{\pi\sigma_N^2}\,e^{-\frac{(r\cos\theta - C)^2 + r^2\sin^2\theta}{\sigma_N^2}} = \frac{r}{\pi\sigma_N^2}\,e^{-\frac{r^2 - 2rC\cos\theta + C^2}{\sigma_N^2}},    (3.220)

and the marginal distribution f_R(r) is given by

f_R(r) = \int_0^{2\pi}\frac{r}{\pi\sigma_N^2}\,e^{-\frac{r^2-2rC\cos\theta+C^2}{\sigma_N^2}}\,d\theta = \frac{r}{\pi\sigma_N^2}\,e^{-\frac{r^2+C^2}{\sigma_N^2}}\int_0^{2\pi} e^{\frac{2rC\cos\theta}{\sigma_N^2}}\,d\theta,    (3.221)–(3.222)

where the remaining integral can be reduced (by virtue of the definition of the modified Bessel function of the first kind) to yield

f_R(r) = \frac{2r}{\sigma_N^2}\,e^{-\frac{r^2+C^2}{\sigma_N^2}}\,I_0\!\left(\frac{2rC}{\sigma_N^2}\right)u(r).    (3.223)
Finding the marginal PDF of \Theta is a bit more challenging:

f_\Theta(\theta) = \int_0^{\infty}\frac{r}{\pi\sigma_N^2}\,e^{-\frac{r^2-2rC\cos\theta+C^2}{\sigma_N^2}}\,dr = \frac{e^{-\frac{C^2}{\sigma_N^2}}}{\pi\sigma_N^2}\int_0^{\infty} r\,e^{-\frac{r^2-2rC\cos\theta}{\sigma_N^2}}\,dr.    (3.224)–(3.225)

According to [Prudnikov et al., 1986a, 2.3.15-3], the above integral reduces to

f_\Theta(\theta) = \frac{e^{-\frac{C^2}{\sigma_N^2}}}{2\pi}\,\exp\!\left(\frac{C^2\cos^2\theta}{2\sigma_N^2}\right)D_{-2}\!\left(-\frac{\sqrt{2}\,C\cos\theta}{\sigma_N}\right),    (3.226)

where we can get rid of D_{-2}(z) (the parabolic cylinder function of order -2) by virtue of the following identity [Prudnikov et al., 1986b, App. II.8]:

D_{-2}(z) = e^{-\frac{z^2}{4}} - \sqrt{\frac{\pi}{2}}\,z\,e^{\frac{z^2}{4}}\,\mathrm{erfc}\!\left(\frac{z}{\sqrt{2}}\right)    (3.227)

to finally obtain

f_\Theta(\theta) = \frac{e^{-\frac{C^2}{\sigma_N^2}}}{2\pi}\left[1 + \frac{\sqrt{\pi}\,C\cos\theta}{\sigma_N}\,e^{\frac{C^2\cos^2\theta}{\sigma_N^2}}\,\mathrm{erfc}\!\left(-\frac{C\cos\theta}{\sigma_N}\right)\right]\left[u(\theta)-u(\theta-2\pi)\right].    (3.228)
In the derivation, we have assumed (without loss of generality and for analytical convenience) that the constant term was real, whereas in fact the constant term can have a phase (Ce^{j\theta_0}). This does not change the Rice amplitude PDF, but it displaces the Rice phase distribution so that it is centered at \theta_0. Therefore, in general, we have

f_\Theta(\theta) = \frac{e^{-\frac{C^2}{\sigma_N^2}}}{2\pi}\left[1 + \frac{\sqrt{\pi}\,C\cos(\theta-\theta_0)}{\sigma_N}\,e^{\frac{C^2\cos^2(\theta-\theta_0)}{\sigma_N^2}}\,\mathrm{erfc}\!\left(-\frac{C\cos(\theta-\theta_0)}{\sigma_N}\right)\right]\left[u(\theta)-u(\theta-2\pi)\right].    (3.229)

It is noteworthy that the second raw moment (a quantity of some importance since it reflects the average power of the Rician process R(t)) can easily be found without integration. Indeed, we have from (3.216)

\langle R^2\rangle = \langle(C+A)^2 + B^2\rangle = \langle C^2 + 2AC + A^2 + B^2\rangle = C^2 + \langle A^2\rangle + \langle B^2\rangle = C^2 + \sigma_N^2.    (3.230)

One important parameter associated with Rice distributions is the so-called K factor, which constitutes a measure of the importance of the random portion of the variable (N(t) = A + jB) relative to the constant factor C. It is defined simply as

K = \frac{C^2}{\sigma_N^2}.    (3.231)
Figure 3.7 shows typical PDFs for R and \Theta for various values of the K factor, as well as their geometric relation with the underlying complex Gaussian variate. For the PDF of R, the K factor was varied while keeping \langle R^2\rangle = 1.
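As a small numerical aside (not in the original text; C and σ_N^2 are arbitrary), the second raw moment (3.230) and the K factor (3.231) are easily checked by simulating the Rician envelope directly from its definition (3.215):

```python
import numpy as np

rng = np.random.default_rng(5)
C, sigma2_N = 2.0, 1.5
n = 10**6
N_noise = np.sqrt(sigma2_N / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
R = np.abs(C + N_noise)                  # Rician envelope
print(np.mean(R**2), C**2 + sigma2_N)    # (3.230)
print(C**2 / sigma2_N)                   # K factor, (3.231)
```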

Central chi-square distribution


Consider the r.v.

Y = \sum_{n=1}^{N} X_n^2,    (3.232)

where \{X_1, X_2, \ldots, X_N\} form a set of i.i.d. zero-mean real Gaussian variates. Then Y is a central chi-square variate with N degrees-of-freedom.
This distribution is best derived in the characteristic function domain. We start with the case N = 1, i.e.

Y = X^2,    (3.233)
Figure 3.7. The Rice distributions and their relationship with the complex random variable Z = (C\cos\theta + A) + j(C\sin\theta + B): (a) the distribution of Z in the complex plane (a non-central complex Gaussian distribution, pictured here with concentric isoprobability curves) and its relation to the variables R and \theta; (b) PDF of R; (c) PDF of \Theta (K = 1/16, 1, 2, 10).

where the C.F. is simply

\phi_Y(jt) = \langle e^{jtY}\rangle = \langle e^{jtX^2}\rangle = \int_{-\infty}^{\infty}\frac{e^{jtx^2}}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{x^2}{2\sigma_X^2}}\,dx = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma_X}\int_0^{\infty} e^{\left(jt-\frac{1}{2\sigma_X^2}\right)x^2}\,dx.    (3.234)
The integral can be solved by applying the substitution u = -\left(jt-\frac{1}{2\sigma_X^2}\right)x^2, which leads to

\phi_Y(jt) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\,\frac{1}{\sqrt{\frac{1}{2\sigma_X^2}-jt}}\int_0^{\infty}\frac{e^{-u}}{\sqrt{u}}\,du = \frac{1}{\sqrt{\pi\left(1-2jt\sigma_X^2\right)}}\int_0^{\infty}\frac{e^{-u}}{\sqrt{u}}\,du.    (3.235)

According to the definition of the Gamma function, the latter reduces to

\phi_Y(jt) = \frac{\Gamma\!\left(\frac{1}{2}\right)}{\sqrt{\pi\left(1-2jt\sigma_X^2\right)}} = \left(1-2jt\sigma_X^2\right)^{-1/2}.    (3.236)

Going back now to the general case of N degrees of freedom, we find that

\phi_Y(jt) = \prod_{n=1}^{N}\left(1-2jt\sigma_X^2\right)^{-1/2} = \left(1-2jt\sigma_X^2\right)^{-N/2}.    (3.237)

The PDF can be obtained by taking the inverse transform of the above, i.e.

f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(1-2jt\sigma_X^2\right)^{-N/2}e^{-jty}\,dt,    (3.238)

which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to

f_Y(y) = \frac{1}{\left(2\sigma_X^2\right)^{N/2}\Gamma(N/2)}\,y^{N/2-1}\,e^{-y/2\sigma_X^2}\,u(y).    (3.239)
The shape of the chi-square PDF for 1 to 10 degrees-of-freedom is shown
in Figure 3.8. It should be noted that for N = 2, the chi-square distribution
reduces to the simpler exponential distribution.
The chi-square distribution can also be derived from complex Gaussian
r.v.’s.
Let

Y = \sum_{n=1}^{N}|Z_n|^2,    (3.240)

where \{Z_1, Z_2, \ldots, Z_N\} are i.i.d. complex Gaussian variates with variance \sigma_Z^2.
If we start again with the case N = 1, we find that the C.F. is

\phi_Y(jt) = \langle e^{jtY}\rangle = \langle e^{jt|Z_1|^2}\rangle = \langle e^{jt(A^2+B^2)}\rangle = \left(1-jt\sigma_Z^2\right)^{-1},    (3.241)
Figure 3.8. The chi-square PDF for values of N between 1 and 10.

where A = \mathrm{Re}\{Z\} and B = \mathrm{Im}\{Z\}. Hence, it is a chi-square variate with 2 degrees of freedom, which can be related to (3.237) by noting that \sigma_Z^2 = 2\sigma_A^2 = 2\sigma_B^2.
It follows that in the general case of N variates, we have

\phi_Y(jt) = \left(1-jt\sigma_Z^2\right)^{-N},    (3.242)

which naturally leads to

f_Y(y) = \frac{1}{\left(\sigma_Z^2\right)^{N}\Gamma(N)}\,y^{N-1}\,e^{-y/\sigma_Z^2}\,u(y),    (3.243)

which is a \chi^2 variate with 2N degrees-of-freedom. Equivalently, it can also be said that this r.v. has N complex degrees-of-freedom.
The kth raw moment is given by

\langle Y^k\rangle = \int_0^{\infty}\frac{y^{N-1+k}}{\left(\sigma_Z^2\right)^N\Gamma(N)}\,e^{-y/\sigma_Z^2}\,dy = \frac{\left(\sigma_Z^2\right)^{k}}{\Gamma(N)}\int_0^{\infty} u^{N-1+k}\,e^{-u}\,du = \frac{\left(\sigma_Z^2\right)^k}{\Gamma(N)}\,\Gamma(N+k).    (3.244)

It follows that the mean is

\mu_Y = \langle Y\rangle = \sigma_Z^2\,\frac{\Gamma(N+1)}{\Gamma(N)} = N\sigma_Z^2,    (3.245)
and the variance is given by

\sigma_Y^2 = \langle(Y-\mu_Y)^2\rangle = \langle Y^2\rangle - \langle Y\rangle^2 = \frac{\sigma_Z^4\,\Gamma(N+2)}{\Gamma(N)} - N^2\sigma_Z^4 = \sigma_Z^4(N+1)N - N^2\sigma_Z^4 = N\sigma_Z^4.    (3.246)
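A direct simulation (added here as a sketch; N and σ_Z^2 are arbitrary) of the sum of squared moduli of complex Gaussian variates confirms the mean (3.245) and variance (3.246):

```python
import numpy as np

rng = np.random.default_rng(6)
N_dof, sigma2_Z, trials = 4, 2.0, 300_000
Z = np.sqrt(sigma2_Z / 2) * (rng.standard_normal((trials, N_dof))
                             + 1j * rng.standard_normal((trials, N_dof)))
Y = np.sum(np.abs(Z) ** 2, axis=1)       # central chi-square, N complex dof
print(Y.mean(), N_dof * sigma2_Z)        # (3.245)
print(Y.var(), N_dof * sigma2_Z**2)      # (3.246)
```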

Non-central chi-square distributions


The non-central chi-square distribution characterizes a sum of squared non-zero-mean Gaussian variates. Let

Y = \sum_{n=1}^{N} X_n^2,    (3.247)

where \{X_1, X_2, \ldots, X_N\} form a set of independent real Gaussian r.v.'s with equal variance \sigma_X^2 and means \{\mu_1, \mu_2, \ldots, \mu_N\}, and

\sum_{n=1}^{N}|\mu_n| \neq 0,    (3.248)

i.e. at least one Gaussian r.v. has a non-zero mean.


As a first step in deriving the distribution, consider

Y_n = X_n^2,    (3.249)

where

f_{X_n}(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\,e^{-\frac{(x-\mu_n)^2}{2\sigma_X^2}}.    (3.250)

According to (3.57) and since X_n = \pm\sqrt{Y_n}, we find that

f_{Y_n}(y) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\left[e^{-\frac{(\sqrt{y}-\mu_n)^2}{2\sigma_X^2}} + e^{-\frac{(-\sqrt{y}-\mu_n)^2}{2\sigma_X^2}}\right]\frac{1}{2\sqrt{y}}
 = \frac{1}{2\sqrt{2\pi}\,\sigma_X\sqrt{y}}\,e^{-\frac{y}{2\sigma_X^2}}\,e^{-\frac{\mu_n^2}{2\sigma_X^2}}\left[e^{\frac{\sqrt{y}\,\mu_n}{\sigma_X^2}} + e^{-\frac{\sqrt{y}\,\mu_n}{\sigma_X^2}}\right]
 = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{y}}\,e^{-\frac{y}{2\sigma_X^2}}\,e^{-\frac{\mu_n^2}{2\sigma_X^2}}\cosh\!\left(\frac{\sqrt{y}\,\mu_n}{\sigma_X^2}\right).    (3.251)
To find the characteristic function, we start by expanding the hyperbolic cosine into an infinite series:

\phi_{Y_n}(jt) = \langle e^{jtY_n}\rangle = \int_0^{\infty}\frac{e^{jt\alpha}}{\sqrt{2\pi}\,\sigma_X\sqrt{\alpha}}\,e^{-\alpha/2\sigma_X^2}\,e^{-\mu_n^2/2\sigma_X^2}\sum_{k=0}^{\infty}\frac{\left(\alpha\mu_n^2\right)^k}{(2k)!}\left(\frac{1}{\sigma_X^4}\right)^k d\alpha.    (3.252)

Applying the identity (2k)! = \frac{2^{2k}}{\sqrt{\pi}}\,\Gamma\!\left(k+\frac{1}{2}\right)\Gamma(k+1), the above becomes

\langle e^{jtY_n}\rangle = \frac{e^{-\frac{\mu_n^2}{2\sigma_X^2}}}{\sqrt{2}\,\sigma_X}\sum_{k=0}^{\infty}\frac{\mu_n^{2k}\left(4\sigma_X^4\right)^{-k}}{\Gamma\!\left(k+\frac{1}{2}\right)k!}\int_0^{\infty} e^{jt\alpha}\,e^{-\frac{\alpha}{2\sigma_X^2}}\,\alpha^{k-\frac{1}{2}}\,d\alpha,    (3.253)

where the resulting integral can be solved in a manner similar to that of subsection 6 to give

\langle e^{jtY_n}\rangle = \frac{e^{-\frac{\mu_n^2}{2\sigma_X^2}}}{\sqrt{2}\,\sigma_X}\sum_{k=0}^{\infty}\frac{\mu_n^{2k}\left(2\sigma_X^2\right)^{k+1/2}}{k!\left(1-2j\sigma_X^2 t\right)^{k+1/2}}\left(\frac{1}{4\sigma_X^4}\right)^k
 = \frac{e^{-\mu_n^2/2\sigma_X^2}}{\left(1-2j\sigma_X^2 t\right)^{1/2}}\,e^{\frac{\mu_n^2/2\sigma_X^2}{1-2j\sigma_X^2 t}}
 = \frac{e^{\frac{jt\mu_n^2}{1-2j\sigma_X^2 t}}}{\left(1-2j\sigma_X^2 t\right)^{1/2}}.    (3.254)

In the general case, the C.F. is given by

\phi_Y(jt) = \langle e^{jtY}\rangle = \left\langle e^{jt\sum_{n=1}^{N}Y_n}\right\rangle = \prod_{n=1}^{N}\langle e^{jtY_n}\rangle.    (3.255)

Substituting (3.254) in (3.255) yields

\phi_Y(jt) = \prod_{n=1}^{N}\frac{e^{\frac{jt\mu_n^2}{1-2j\sigma_X^2 t}}}{\left(1-2j\sigma_X^2 t\right)^{1/2}} = \frac{e^{\frac{jtU}{1-2j\sigma_X^2 t}}}{\left(1-2j\sigma_X^2 t\right)^{N/2}},    (3.256)

where

U = \sum_{n=1}^{N}\mu_n^2.    (3.257)
To find the corresponding PDF by taking the Fourier transform, we once again resort to an infinite series expansion, i.e.

f_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-jyt}\,\phi_Y(jt)\,dt
 = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-jyt}\,e^{\frac{jtU}{1-2j\sigma_X^2 t}}}{\left(1-2j\sigma_X^2 t\right)^{N/2}}\,dt
 = \frac{e^{-U/2\sigma_X^2}}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-jyt}}{\left(1-2j\sigma_X^2 t\right)^{N/2}}\,e^{\frac{U/2\sigma_X^2}{1-2j\sigma_X^2 t}}\,dt
 = \frac{e^{-U/2\sigma_X^2}}{2\pi}\sum_{k=0}^{\infty}\frac{\left[U/2\sigma_X^2\right]^k}{k!}\int_{-\infty}^{\infty}\frac{e^{-jyt}}{\left(1-2j\sigma_X^2 t\right)^{N/2+k}}\,dt,    (3.258)

which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to

f_Y(y) = e^{-U/2\sigma_X^2}\sum_{k=0}^{\infty}\frac{\left[U/2\sigma_X^2\right]^k}{k!}\,\frac{y^{N/2+k-1}\,e^{-y/2\sigma_X^2}}{\left(2\sigma_X^2\right)^{N/2+k}\,\Gamma\!\left(\frac{N}{2}+k\right)}\,u(y).    (3.259)

We note, however, that

\sum_{k=0}^{\infty}\frac{\left[Uy/4\sigma_X^4\right]^k}{k!\,\Gamma\!\left(\frac{N}{2}+k\right)} = \frac{1}{\Gamma\!\left(\frac{N}{2}\right)}\,{}_0F_1\!\left(\frac{N}{2};\,\frac{Uy}{4\sigma_X^4}\right),    (3.260)

which leads directly to

f_Y(y) = \frac{y^{N/2-1}\,e^{-(y+U)/2\sigma_X^2}}{\left(2\sigma_X^2\right)^{N/2}\,\Gamma\!\left(\frac{N}{2}\right)}\,{}_0F_1\!\left(\frac{N}{2};\,\frac{Uy}{4\sigma_X^4}\right)u(y).    (3.261)
From (3.261), it is useful to observe that

f_Y(y) = e^{-\frac{U}{2\sigma_X^2}}\sum_{k=0}^{\infty}\frac{\left[U/2\sigma_X^2\right]^k}{k!}\,g_{N+2k}\!\left(y,\sigma_X^2\right),    (3.262)

where g_{N+2k}(y,\sigma_X^2) is the PDF of a central chi-square variate with N+2k degrees-of-freedom and a variance of \sigma_X^2. This formulation of a non-central \chi^2 PDF as a mixture of central \chi^2 PDFs is often useful to extend results from the central to the non-central case.
Like in the central case, the non-central \chi^2 distribution can be defined from complex normal r.v.'s. Given the sum of the squared moduli of N complex normal r.v.'s with the same variance \sigma_Z^2 and means \{\mu_1,\mu_2,\ldots,\mu_N\}, the PDF is given by

f_Y(y) = e^{-\frac{U}{\sigma_Z^2}}\,\frac{y^{N-1}\,e^{-\frac{y}{\sigma_Z^2}}}{\sigma_Z^{2N}\,\Gamma(N)}\,{}_0F_1\!\left(N;\,\frac{Uy}{\sigma_Z^4}\right)u(y).    (3.263)

It follows that the corresponding C.F. is

\phi_Y(jt) = \frac{e^{\frac{jtU}{1-j\sigma_Z^2 t}}}{\left(1-j\sigma_Z^2 t\right)^N}.    (3.264)

From (3.262) and (3.243), the kth raw moment is

\langle Y^k\rangle = e^{-\frac{U}{\sigma_Z^2}}\,\sigma_Z^{2k}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!}\,\frac{\Gamma(N+m+k)}{\Gamma(N+m)} = e^{-U/\sigma_Z^2}\,\sigma_Z^{2k}\,{}_1F_1\!\left(N+k;\,N;\,\frac{U}{\sigma_Z^2}\right)\frac{\Gamma(N+k)}{\Gamma(N)},    (3.265)

which, by virtue of Kummer's identity, becomes

\langle Y^k\rangle = \sigma_Z^{2k}\,{}_1F_1\!\left(-k;\,N;\,-\frac{U}{\sigma_Z^2}\right)\frac{\Gamma(N+k)}{\Gamma(N)}.    (3.266)

A most helpful fact at this point is that a hypergeometric function with a single negative integer upper argument is in fact a hypergeometric polynomial, i.e. it can be expressed as a finite sum. Indeed, we have

{}_1F_1\!\left(-k;\,n_1;\,-A\right) = \sum_{m=0}^{\infty}\frac{\Gamma(-k+m)\,\Gamma(n_1)\,(-A)^m}{\Gamma(n_1+m)\,\Gamma(-k)\,m!},    (3.267)

which, by virtue of the Gamma reflection formula, becomes

{}_1F_1(\cdots) = \sum_{m=0}^{\infty}\frac{\Gamma(n_1)\,(-1)^m\,\Gamma(1+k)\,(-A)^m}{\Gamma(n_1+m)\,\Gamma(1+k-m)\,m!} = \sum_{m=0}^{k}\binom{k}{m}\frac{\Gamma(n_1)\,A^m}{\Gamma(n_1+m)},    (3.268)

where the terms for m > k are zero because they are characterized by a Gamma function in the denominator (\Gamma(1+k-m)) with an integer argument which is zero or negative; such a Gamma function has an infinite magnitude.
Therefore, we have

\langle Y^k\rangle = \sigma_Z^{2k}\,\Gamma(N+k)\sum_{m=0}^{k}\binom{k}{m}\frac{\left[U/\sigma_Z^2\right]^m}{\Gamma(N+m)}.    (3.269)

From this finite-sum representation, the mean is easily found to be

\langle Y\rangle = \sigma_Z^2 N\left(1+\frac{U}{\sigma_Z^2 N}\right).    (3.270)

Likewise, the second raw moment is

\langle Y^2\rangle = \sigma_Z^4\left[(N+1)\left(N+\frac{2U}{\sigma_Z^2}\right)+\frac{U^2}{\sigma_Z^4}\right].    (3.271)

It follows that the variance is

\sigma_Y^2 = \langle Y^2\rangle - \langle Y\rangle^2 = \sigma_Z^4\left(N+\frac{2U}{\sigma_Z^2}\right).    (3.272)
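The non-central moments (3.270) and (3.272) can be checked in the same spirit (a sketch with arbitrary, hypothetical means µ_n and variance σ_Z^2):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0 + 1.0j, 0.5 - 2.0j, 0.0])   # hypothetical complex means
sigma2_Z, trials = 1.5, 300_000
U = np.sum(np.abs(mu) ** 2)                    # non-centrality parameter
W = np.sqrt(sigma2_Z / 2) * (rng.standard_normal((trials, mu.size))
                             + 1j * rng.standard_normal((trials, mu.size)))
Y = np.sum(np.abs(mu + W) ** 2, axis=1)
print(Y.mean(), sigma2_Z * mu.size + U)                      # (3.270)
print(Y.var(), sigma2_Z**2 * (mu.size + 2 * U / sigma2_Z))   # (3.272)
```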

Non-central F -distribution
Consider

Y = \frac{Z_1/n_1}{Z_2/n_2},    (3.273)

where Z_1 is a non-central \chi^2 variate with n_1 complex degrees-of-freedom and a non-centrality parameter U, and Z_2 is a central \chi^2 variate with n_2 complex degrees-of-freedom; then Y is said to follow a non-central F-distribution parameterized on n_1, n_2 and U.
Given the distributions

f_{Z_1}(x) = \frac{1}{\Gamma(n_1)}\,x^{n_1-1}\,e^{-x}\,e^{-U}\,{}_0F_1\!\left(n_1;\,Ux\right)u(x),    (3.274)
f_{Z_2}(x) = \frac{1}{\Gamma(n_2)}\,x^{n_2-1}\,e^{-x}\,u(x),    (3.275)

we apply two simple univariate transformations:

A_1 = \frac{Z_1}{n_1}, \qquad A_2 = \frac{n_2}{Z_2},    (3.276)

such that Y = A_1 A_2.
It is straightforward to show that

f_{A_1}(x) = \frac{n_1^{n_1}}{\Gamma(n_1)}\,x^{n_1-1}\,e^{-n_1x}\,e^{-U}\,{}_0F_1\!\left(n_1;\,Un_1x\right)u(x),    (3.277)
f_{A_2}(x) = \frac{n_2^{n_2}}{\Gamma(n_2)}\,x^{-n_2-1}\,e^{-\frac{n_2}{x}}\,u(x).    (3.278)
It then becomes possible to compute the PDF of Y through the Mellin convolution, i.e.

f_Y(y) = \int_0^{\infty} f_{A_2}\!\left(\frac{y}{x}\right)f_{A_1}(x)\,\frac{dx}{x}, \quad y\ge0,
 = \int_0^{\infty}\frac{n_2^{n_2}}{\Gamma(n_2)}\left(\frac{x}{y}\right)^{n_2+1}e^{-\frac{n_2x}{y}}\,\frac{n_1^{n_1}}{\Gamma(n_1)}\,x^{n_1-2}\,e^{-n_1x}\,e^{-U}\,{}_0F_1\!\left(n_1;\,Un_1x\right)dx
 = \frac{n_2^{n_2}\,n_1^{n_1}\,e^{-U}}{y^{n_2+1}\,\Gamma(n_1)\Gamma(n_2)}\int_0^{\infty} x^{n_1+n_2-1}\,e^{-\left(\frac{n_2}{y}+n_1\right)x}\,{}_0F_1\!\left(n_1;\,Un_1x\right)dx,    (3.279)

where the integral can be interpreted as the Laplace transform of x^{\nu}\,{}_0F_1(n;Kx) (see property 2.79) and yields

f_Y(y) = \frac{e^{-U}\,\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\,\frac{\left(\frac{n_1}{n_2}\right)^{n_1}y^{n_1-1}}{\left(1+\frac{n_1}{n_2}y\right)^{n_1+n_2}}\,{}_1F_1\!\left(n_1+n_2;\,n_1;\,\frac{\frac{n_1}{n_2}Uy}{1+\frac{n_1}{n_2}y}\right)u(y).    (3.280)
(3.280)
It is noteworthy that letting U = 0 in the above reduces it to the density of a central F-distribution, it being the ratio of two central \chi^2 variates, i.e.

f_Y(y) = \frac{1}{B(n_1,n_2)}\,\frac{\left(\frac{n_1}{n_2}\right)^{n_1}y^{n_1-1}}{\left(1+\frac{n_1}{n_2}y\right)^{n_1+n_2}}\,u(y).    (3.281)

Examples of the shape of the F-distribution are shown in Figure 3.9.


The PDF (3.280) could also have been obtained by representing the PDF of Z_1 as an infinite series of central chi-square PDFs according to (3.262) and proceeding to find the PDF of a ratio of central \chi^2 variates. Hence, we have

f_{Z_1}(z) = e^{-\frac{U}{\sigma_Z^2}}\sum_{k=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^k}{k!}\,h_{n_1+k}\!\left(z,\sigma_Z^2\right),    (3.282)

where we have included the variance parameter \sigma_Z^2 (being the variance of the underlying complex Gaussian variates; it was implicitly equal to 1 in the preceding development leading to (3.280)) and h_n(z,\sigma_Z^2) = g_{2n}\!\left(z,\frac{\sigma_Z^2}{2}\right) is the density in z of a central \chi^2 variate with n complex degrees-of-freedom and a variance of the underlying complex Gaussian variates of \sigma_Z^2.
Figure 3.9. The F-distribution PDF with n_1 = n_2 = 2 and various values of the non-centrality parameter U (U = 0, 1, 3, 5).

Given that the PDF of Z_2 is (3.275), we apply (for the sake of diversity with respect to the preceding development) the following joint transformation:

Y = \frac{n_2}{n_1}\,\frac{Z_1}{Z_2}, \qquad X = Z_2.    (3.283)

The Jacobian is J = \frac{n_1 x}{n_2}. It follows that the joint density of Y and X is

f_{Y,X}(y,x) = \frac{n_1 x}{n_2}\,f_{Z_1}\!\left(\frac{n_1}{n_2}xy\right)f_{Z_2}(x)
 = \frac{n_1 x}{n_2}\,e^{-\frac{U}{\sigma_Z^2}}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!}\,h_{n_1+m}\!\left(\frac{n_1}{n_2}xy,\sigma_Z^2\right)\frac{x^{n_2-1}e^{-x}}{\Gamma(n_2)}
 = \frac{n_1\,x^{n_2}}{n_2\,\Gamma(n_2)\,\sigma_Z^2}\,e^{-x\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)}\,e^{-\frac{U}{\sigma_Z^2}}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!}\,\frac{1}{\Gamma(n_1+m)}\left(\frac{n_1xy}{n_2\sigma_Z^2}\right)^{n_1+m-1}u(x)u(y).    (3.284)
The marginal PDF of Y is, as usual, obtained by integrating over X. In this case, it is found convenient to exchange the order of the summation and integration operations, i.e.

f_Y(y) = \int_0^{\infty} f_{Y,X}(y,x)\,dx = \frac{n_1\,e^{-\frac{U}{\sigma_Z^2}}}{\Gamma(n_2+1)\,\sigma_Z^2}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!\,\Gamma(n_1+m)}\left(\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+m-1}\int_0^{\infty} x^{n_1+n_2+m-1}\,e^{-x\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)}\,dx.    (3.285)

Letting v = x\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right), we find that the remaining integrand follows the definition of the Gamma function. Hence, we have

I = \int_0^{\infty} x^{n_1+n_2+m-1}\,e^{-x\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)}\,dx = \frac{1}{\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+n_2+m}}\int_0^{\infty} v^{n_1+n_2+m-1}\,e^{-v}\,dv = \frac{\Gamma(n_1+n_2+m)}{\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+n_2+m}}.    (3.286)

Substituting the above in (3.285), we obtain

f_Y(y) = \frac{n_1\,e^{-\frac{U}{\sigma_Z^2}}}{\Gamma(n_2+1)\,\sigma_Z^2}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!\,\Gamma(n_1+m)}\,\frac{\Gamma(n_1+n_2+m)\left(\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+m-1}}{\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+n_2+m}}\,u(y),    (3.287)

where the resulting series defines a {}_1F_1(\cdot) (Kummer) confluent hypergeometric function, thus leading to the compact representation

f_Y(y) = \frac{e^{-\frac{U}{\sigma_Z^2}}\,u(y)}{\sigma_Z^{2n_1}\,B(n_1,n_2)}\,\frac{\left(\frac{n_1}{n_2}\right)^{n_1}y^{n_1-1}}{\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+n_2}}\,{}_1F_1\!\left(n_1+n_2;\,n_1;\,\frac{\frac{n_1}{n_2}Uy}{\sigma_Z^4+\sigma_Z^2\frac{n_1}{n_2}y}\right).    (3.288)
It is noteworthy that (3.287) can be expressed as a mixture of central F-distributions, i.e.

f_Y(y) = e^{-\frac{U}{\sigma_Z^2}}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!}\,h^{(F)}_{n_1+m,\,n_2}\!\left(y,\,\sigma_Z^2\left(1+\frac{m}{n_1}\right)\right),    (3.289)

where

h^{(F)}_{\nu_1,\nu_2}\!\left(y,\sigma_F^2\right) = \frac{1}{B(\nu_1,\nu_2)}\,\frac{\left(\frac{\nu_1}{\nu_2\sigma_F^2}\right)^{\nu_1}y^{\nu_1-1}}{\left(1+\frac{\nu_1 y}{\nu_2\sigma_F^2}\right)^{\nu_1+\nu_2}}\,u(y)    (3.290)

is the PDF of a central F-distributed variate in y with \nu_1 and \nu_2 degrees-of-freedom and where the numerator has a mean of \sigma_F^2.^1
The raw moments are best calculated based on the series representation (3.287). Thus, after inverting the order of the summation and integration operations, we have

\langle Y^k\rangle = \frac{n_1\,e^{-\frac{U}{\sigma_Z^2}}}{\Gamma(n_2+1)\,\sigma_Z^2}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m\,\Gamma(n_1+n_2+m)}{m!\,\Gamma(n_1+m)}\left(\frac{n_1}{n_2\sigma_Z^2}\right)^{n_1+m-1}\int_0^{\infty}\frac{y^{k+n_1+m-1}}{\left(1+\frac{n_1y}{n_2\sigma_Z^2}\right)^{n_1+n_2+m}}\,dy.    (3.291)

Letting u = \frac{\frac{n_1y}{n_2\sigma_Z^2}}{1+\frac{n_1y}{n_2\sigma_Z^2}}, we find that

y = \frac{n_2\sigma_Z^2}{n_1}\,\frac{u}{1-u},    (3.292)
dy = \frac{n_2\sigma_Z^2}{n_1}\,\frac{du}{(1-u)^2},    (3.293)

which leads to

I = \int_0^1 u^{n_1+m+k-1}\,(1-u)^{n_2-k-1}\left(\frac{n_2\sigma_Z^2}{n_1}\right)^{k+n_1+m}du = \left(\frac{n_2\sigma_Z^2}{n_1}\right)^{k+n_1+m}B\!\left(k+m+n_1,\,n_2-k\right), \quad n_2 > k.    (3.294)

Substituting (3.294) in (3.291), we find

\langle Y^k\rangle = \frac{\left(\frac{n_2\sigma_Z^2}{n_1}\right)^k e^{-\frac{U}{\sigma_Z^2}}}{\Gamma(n_2)}\sum_{m=0}^{\infty}\frac{\left[U/\sigma_Z^2\right]^m}{m!\,\Gamma(n_1+m)}\,\Gamma(k+m+n_1)\,\Gamma(n_2-k), \quad n_2 > k,    (3.295)

^1 The mean of the numerator is indeed \nu_1 times the variance of the underlying Gaussians, by virtue of (3.245).
which also has a compact representation similar to (3.288):

\langle Y^k\rangle = \left(\frac{n_2\sigma_Z^2}{n_1}\right)^k e^{-\frac{U}{\sigma_Z^2}}\,\frac{\Gamma(k+n_1)\,\Gamma(n_2-k)}{\Gamma(n_1)\,\Gamma(n_2)}\,{}_1F_1\!\left(k+n_1;\,n_1;\,\frac{U}{\sigma_Z^2}\right).    (3.296)

According to Kummer's identity, the above can be transformed as follows:

\langle Y^k\rangle = \left(\frac{n_2\sigma_Z^2}{n_1}\right)^k\,\frac{\Gamma(k+n_1)\,\Gamma(n_2-k)}{\Gamma(n_1)\,\Gamma(n_2)}\,{}_1F_1\!\left(-k;\,n_1;\,-\frac{U}{\sigma_Z^2}\right).    (3.297)
(3.297)
By virtue of (3.267) and (3.268), a hypergeometric function with an upper negative integer argument (and no lower negative integer argument) is representable as a finite sum. Thus, we have

\langle Y^k\rangle = \left(\frac{n_2\sigma_Z^2}{n_1}\right)^k\,\frac{\Gamma(k+n_1)\,\Gamma(n_2-k)}{\Gamma(n_2)}\sum_{m=0}^{k}\binom{k}{m}\frac{\left[U/\sigma_Z^2\right]^m}{\Gamma(n_1+m)}.    (3.298)

It follows that the mean is

\langle Y\rangle = \frac{n_2\sigma_Z^2\left(n_1+\frac{U}{\sigma_Z^2}\right)}{n_1(n_2-1)}, \quad n_2 > 1.    (3.299)

In the same fashion, we can show that the second raw moment is

\langle Y^2\rangle = \frac{n_2^2\,\sigma_Z^4}{n_1^2(n_2-1)(n_2-2)}\left[n_1(n_1+1) + 2(n_1+1)\frac{U}{\sigma_Z^2} + \frac{U^2}{\sigma_Z^4}\right], \quad n_2 > 2.    (3.300)
(3.300)
It follows that the variance is

\sigma_Y^2 = \frac{n_2^2\,\sigma_Z^4}{n_1^2(n_2-1)(n_2-2)}\left[\frac{\left(n_1+\frac{U}{\sigma_Z^2}\right)^2}{n_2-1} + 2\frac{U}{\sigma_Z^2} + n_1\right], \quad n_2 > 2.    (3.301)
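As a sanity check of the mean (3.299) — a simulation sketch added here, with σ_Z^2 = 1 and arbitrary n_1, n_2 and means — the non-central F variate can be built directly from its definition (3.273):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, trials = 3, 6, 400_000
mu1 = np.array([1.0, 0.5 + 0.5j, -0.3j])     # hypothetical means for Z_1 (sigma_Z^2 = 1)
U = np.sum(np.abs(mu1) ** 2)
cn = lambda shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
Z1 = np.sum(np.abs(mu1 + cn((trials, n1))) ** 2, axis=1)   # non-central chi-square, n1 complex dof
Z2 = np.sum(np.abs(cn((trials, n2))) ** 2, axis=1)         # central chi-square, n2 complex dof
Y = (Z1 / n1) / (Z2 / n2)
print(Y.mean(), n2 * (n1 + U) / (n1 * (n2 - 1)))           # (3.299) with sigma_Z^2 = 1
```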

Beta distribution
Let Y_1 and Y_2 be two independent central chi-square random variables with n_1 and n_2 complex degrees-of-freedom, respectively, and where the variance of the underlying complex Gaussians is the same and is equal to \sigma_Z^2. Furthermore, we impose that

A^2 = Y_1 + Y_2,    (3.302)
BA^2 = Y_1.    (3.303)

Then A and B are independent and B follows a beta distribution with PDF

f_B(b) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\,b^{n_1-1}(1-b)^{n_2-1}\left[u(b)-u(b-1)\right].    (3.304)
As a first step in deriving this PDF, consider the following bivariate transformation:

C_1 = Y_1 + Y_2,    (3.305)
C_2 = Y_1,    (3.306)

where it can be verified that the corresponding Jacobian is 1.
Since Y_1 and Y_2 are independent, their joint PDF is

f_{Y_1,Y_2}(y_1,y_2) = \frac{y_1^{n_1-1}\,y_2^{n_2-1}\,e^{-\frac{y_1+y_2}{\sigma_Z^2}}}{\sigma_Z^{2(n_1+n_2)}\,\Gamma(n_1)\Gamma(n_2)}\,u(y_1)u(y_2),    (3.307)

which directly leads to

f_{C_1,C_2}(c_1,c_2) = \frac{c_2^{n_1-1}\,(c_1-c_2)^{n_2-1}\,e^{-\frac{c_1}{\sigma_Z^2}}}{\sigma_Z^{2(n_1+n_2)}\,\Gamma(n_1)\Gamma(n_2)}\,u(c_1)u(c_2).    (3.308)
Consider now a second bivariate transformation:

T = C_1,    (3.309)
BT = C_2,    (3.310)

where the Jacobian is simply t. We find that

f_{B,T}(b,t) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\,b^{n_1-1}(1-b)^{n_2-1}\,\frac{t^{n_1+n_2-1}\,e^{-\frac{t}{\sigma_Z^2}}}{\sigma_Z^{2(n_1+n_2)}\,\Gamma(n_1+n_2)}\,u(b)u(t).    (3.311)

Since this joint distribution factors, B and T are independent and B is said to follow a beta distribution with n_1 and n_2 complex degrees-of-freedom and PDF given by

f_B(b) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\,b^{n_1-1}(1-b)^{n_2-1}\,u(b),    (3.312)
Γ(n1 )Γ(n2 )
while T follows a central chi-square distribution with n_1+n_2 complex degrees-of-freedom and an underlying variance of \sigma_Z^2.
Representative instances of the beta PDF are shown in Figure 3.10.

Figure 3.10. The beta PDF for various integer combinations of [n_1, n_2].
The CDF of a beta-distributed variate is given by

F_B(x) = P(B < x) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\int_0^x b^{n_1-1}(1-b)^{n_2-1}\,db.    (3.313)

Applying integration by parts, we obtain

F_B(x) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\left[\frac{x^{n_1}(1-x)^{n_2-1}}{n_1} + \frac{n_2-1}{n_1}\int_0^x b^{n_1}(1-b)^{n_2-2}\,db\right],    (3.314)

where it can be observed that the remaining integral is of the same form as the original, except that the exponent of b has been incremented while the exponent of (1-b) has been decremented. It follows that integration by parts can be applied iteratively until the last integral term contains only a power of b and is

thus expressible in closed form:

F_B(x) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\left[\frac{x^{n_1}(1-x)^{n_2-1}}{n_1} + \frac{n_2-1}{n_1}\left(\frac{x^{n_1+1}(1-x)^{n_2-2}}{n_1+1} + \frac{n_2-2}{n_1+1}\left(\cdots + \frac{x^{n_1+n_2-2}(1-x)}{n_1+n_2-2} + \frac{x^{n_1+n_2-1}}{(n_1+n_2-2)(n_1+n_2-1)}\right)\cdots\right)\right].    (3.315)

By inspection, a pattern can be identified in the above, thus leading to the following finite series expression:

F_B(x) = x^{n_1}\,\Gamma(n_1+n_2)\sum_{m=0}^{n_2-1}\frac{(1-x)^{n_2-1-m}\,x^m}{\Gamma(n_1+m+1)\,\Gamma(n_2-m)}.    (3.316)

However, the above development is valid only if the degrees-of-freedom are integers. But the PDF exists even if that is not the case. In general, we have

F_B(x) = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1+1)\Gamma(n_2)}\,x^{n_1}\,{}_2F_1\!\left(n_1,\,1-n_2;\,1+n_1;\,x\right).    (3.317)
The kth raw moment is given by

\langle B^k\rangle = \frac{\Gamma(n_1+n_2)}{\Gamma(n_1)\Gamma(n_2)}\int_0^1 b^{k+n_1-1}(1-b)^{n_2-1}\,db = \frac{\Gamma(n_1+n_2)\,\Gamma(k+n_1)}{\Gamma(n_1)\,\Gamma(k+n_1+n_2)}.    (3.318)
Therefore, the mean is given by

\langle B\rangle = \frac{n_1}{n_1+n_2},    (3.319)

the second raw moment by

\langle B^2\rangle = \frac{n_1(1+n_1)}{(n_1+n_2)(1+n_1+n_2)},    (3.320)

and the variance by

\sigma_B^2 = \langle B^2\rangle - \langle B\rangle^2 = \frac{n_1(1+n_1)}{(n_1+n_2)(1+n_1+n_2)} - \frac{n_1^2}{(n_1+n_2)^2} = \frac{n_1 n_2}{(n_1+n_2)^2(1+n_1+n_2)}.    (3.321)
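The construction (3.302)–(3.303) and the moments (3.319)–(3.321) lend themselves to a simple simulation check (a sketch with arbitrary n_1, n_2 and σ_Z^2 = 1):

```python
import numpy as np

rng = np.random.default_rng(9)
n1, n2, trials = 3, 5, 400_000
cn = lambda shape: (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
Y1 = np.sum(np.abs(cn((trials, n1))) ** 2, axis=1)   # central chi-square, n1 complex dof
Y2 = np.sum(np.abs(cn((trials, n2))) ** 2, axis=1)   # central chi-square, n2 complex dof
B = Y1 / (Y1 + Y2)                                   # beta-distributed
print(B.mean(), n1 / (n1 + n2))                              # (3.319)
print(B.var(), n1 * n2 / ((n1 + n2) ** 2 * (1 + n1 + n2)))   # (3.321)
```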

Nakagami-m distribution
The Nakagami-m distribution was originally proposed by Nakagami [Nak-
agami, 1960] to model a wide range of multipath fading behaviors. Joining
the Rayleigh and Rice PDFs, the Nakagami-m distribution is one of the three
most encountered models for multipath fading (see chapter 4 for more details).
Unlike the other two, however, the Nakagami PDF was obtained through em-
pirical fitting with measured RF data. It is of interest that the Nakagami-m
distribution is very flexible, being controlled by its m parameter. Thus, it in-
cludes the Rayleigh PDF as a special case, and can be made to approximate
closely the Rician PDF.
The Nakagami-m PDF is given by

f_Y(y) = \frac{2}{\Gamma(m)}\left(\frac{m}{\Omega}\right)^m y^{2m-1}\,e^{-\frac{m}{\Omega}y^2}\,u(y),    (3.322)

where m is the distribution's parameter, which takes on any real value between 1/2 and \infty, and \Omega is the second raw moment, i.e.

\Omega = \langle Y^2\rangle.    (3.323)
In general, the kth raw moment of Y is given by

\langle Y^k\rangle = \frac{\Gamma\!\left(\frac{k}{2}+m\right)}{\Gamma(m)}\left(\frac{\Omega}{m}\right)^{k/2}.    (3.324)

Likewise, the variance is given by

\sigma_Y^2 = \langle Y^2\rangle - \langle Y\rangle^2 = \Omega\left[1 - \frac{1}{m}\,\frac{\Gamma^2\!\left(\frac{1}{2}+m\right)}{\Gamma^2(m)}\right].    (3.325)

The Nakagami distribution reduces to a Rayleigh PDF if m = 1. If m \in \left[\frac{1}{2},1\right), distributions which are more spread out (i.e. have longer tails) than the Rayleigh result. In fact, a half-Gaussian distribution is obtained with the minimum value m = \frac{1}{2}. Values of m above 1 result in distributions which are more concentrated than the Rayleigh PDF. The distribution is illustrated in Figure 3.11 for various values of m.
It was found in [Nakagami, 1960] that a very good approximation of the Rician PDF can be obtained by relating the Rice factor K and the Nakagami parameter m as follows:

K = \frac{\sqrt{m^2-m}}{m-\sqrt{m^2-m}}, \quad m > 1,    (3.326)

with the inverse relationship being

m = \frac{(K+1)^2}{2K+1} = 1 + \frac{K^2}{2K+1}.    (3.327)

Figure 3.11. The Nakagami distribution for various values of the parameter m (m = 1/2, 3/4, 1, 3/2, 2, 5/2, 3) with \Omega = 1.
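The mapping between the Rice K factor and the Nakagami parameter m, (3.326)–(3.327), is easily seen to be self-consistent; the following short check (added for illustration, with arbitrary K values) applies (3.327) and then inverts it with (3.326):

```python
import numpy as np

K = np.array([0.5, 1.0, 2.0, 10.0])                      # arbitrary K factors
m = (K + 1) ** 2 / (2 * K + 1)                           # (3.327)
K_back = np.sqrt(m**2 - m) / (m - np.sqrt(m**2 - m))     # (3.326)
print(np.allclose(K, K_back))                            # True: the relations are inverses
```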

Lognormal distribution
The lognormal distribution, while being analytically awkward to use, is
highly important in wireless communications because it characterizes very
well the shadowing phenomenon which impacts outdoor wireless links. In-
terestingly, other variates which follow the lognormal distribution include the
weight and blood pressure of humans, and the number of words in the sen-
tences of the works of George Bernard Shaw!
When only path loss and shadowing are taken into account (see chapter 4),
it is known that the received power Ω(dB) at the end of a wireless transmission
over a certain distance approximately follows a noncentral normal distribution,
i.e.
f_{\Omega_{(dB)}}(p) = \frac{1}{\sqrt{2\pi}\,\sigma_P}\exp\!\left(-\frac{(p-\mu_P)^2}{2\sigma_P^2}\right),    (3.328)

where

\Omega_{(dB)} = 10\log_{10}(P).    (3.329)
After an appropriate transformation of r.v.'s, the lognormal distribution of P is found to be

f_P(x) = \frac{\eta}{\sqrt{2\pi}\,\sigma_P\,x}\exp\!\left(-\frac{\left(10\log_{10}(x)-\mu_P\right)^2}{2\sigma_P^2}\right)u(x),    (3.330)

where \eta = \frac{10}{\ln(10)}.
This density, however, is very difficult to integrate. Nonetheless, the kth raw moment can be found by using a variable substitution to revert to the Gaussian form in the integrand:

\langle P^k\rangle = \int_0^{\infty} x^k f_P(x)\,dx = \frac{\eta}{\sqrt{2\pi}\,\sigma_P}\int_0^{\infty} x^{k-1}\exp\!\left(-\frac{\left(10\log_{10}(x)-\mu_P\right)^2}{2\sigma_P^2}\right)dx
 = \frac{1}{\sqrt{2\pi}\,\sigma_P}\int_{-\infty}^{\infty} 10^{ky/10}\exp\!\left(-\frac{(y-\mu_P)^2}{2\sigma_P^2}\right)dy
 = \frac{1}{\sqrt{2\pi}\,\sigma_P}\int_{-\infty}^{\infty} e^{ky/\eta}\exp\!\left(-\frac{(y-\mu_P)^2}{2\sigma_P^2}\right)dy
 = \frac{\eta}{\sqrt{2\pi}\,\sigma_P}\int_{-\infty}^{\infty} e^{kz}\exp\!\left(-\frac{\eta^2(z-\mu_P/\eta)^2}{2\sigma_P^2}\right)dz,    (3.331)

which, by virtue of [Prudnikov et al., 1986a, 2.3.15-11], yields

\langle P^k\rangle = \exp\!\left(\frac{1}{2}\frac{\sigma_P^2}{\eta^2}k^2 + \frac{\mu_P}{\eta}k\right).    (3.332)
It follows that the variance of P is given by

\mathrm{Var}(P) = \langle P^2\rangle - \langle P\rangle^2 = \exp\!\left(\frac{2\sigma_P^2}{\eta^2}+\frac{2\mu_P}{\eta}\right) - \exp\!\left(\frac{\sigma_P^2}{\eta^2}+\frac{2\mu_P}{\eta}\right) = \exp\!\left(\frac{2\mu_P}{\eta}+\frac{\sigma_P^2}{\eta^2}\right)\left[\exp\!\left(\frac{\sigma_P^2}{\eta^2}\right)-1\right].    (3.333)
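A Monte-Carlo check of the moment formula (3.332) — a sketch added here, with arbitrary dB-domain mean and standard deviation — simply exponentiates Gaussian samples:

```python
import numpy as np

rng = np.random.default_rng(10)
mu_P, sigma_P = 3.0, 6.0                 # hypothetical dB-domain mean and std
eta = 10 / np.log(10)
omega_dB = rng.normal(mu_P, sigma_P, size=10**6)
P = 10 ** (omega_dB / 10)                # lognormal power
for k in (1, 2):
    print(np.mean(P**k),
          np.exp(0.5 * sigma_P**2 * k**2 / eta**2 + k * mu_P / eta))   # (3.332)
```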

6. Multivariate statistics
Significant portions of this section follow the developments in the first few
chapters of [Muirhead, 1982] with additions, omissions and alterations to suit
our purposes. One notable divergence is the emphasis on complex quantities.
Random vectors
Given an M \times 1 random vector \mathbf{x}, its mean is defined by the vector

\boldsymbol{\mu} = \langle\mathbf{x}\rangle = \left[\langle x_1\rangle\ \ \langle x_2\rangle\ \ \cdots\ \ \langle x_M\rangle\right]^T.    (3.334)

The central second moments of \mathbf{x}, analogous to the variance of a scalar variate, are defined by the covariance matrix

\boldsymbol{\Sigma}_x = \left[\sigma_{mn}\right]_{m,n=1,\ldots,M} = \left\langle(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^H\right\rangle,    (3.335)

where

\sigma_{mn} = \left\langle(x_m-\mu_m)(x_n-\mu_n)^*\right\rangle.    (3.336)
Lemma 3.1. An M \times M matrix \boldsymbol{\Sigma}_x is a covariance matrix iff it is non-negative definite (\boldsymbol{\Sigma}_x \ge 0).
Proof. Consider the variance of \mathbf{a}^H\mathbf{x}, where \mathbf{a} is considered constant. We have

\mathrm{Var}(\mathbf{a}^H\mathbf{x}) = \left\langle\mathbf{a}^H(\mathbf{x}-\boldsymbol{\mu}_x)(\mathbf{x}-\boldsymbol{\mu}_x)^H\mathbf{a}\right\rangle = \mathbf{a}^H\left\langle(\mathbf{x}-\boldsymbol{\mu}_x)(\mathbf{x}-\boldsymbol{\mu}_x)^H\right\rangle\mathbf{a} = \mathbf{a}^H\boldsymbol{\Sigma}_x\mathbf{a}.    (3.337)

The quantity \left|\mathbf{a}^H(\mathbf{x}-\boldsymbol{\mu}_x)\right|^2 is by definition always non-negative. The variance above is the expectation of this expression and is therefore also non-negative. It follows that the quadratic form \mathbf{a}^H\boldsymbol{\Sigma}_x\mathbf{a} \ge 0, proving that \boldsymbol{\Sigma}_x is non-negative definite.
The above does not rule out the possibility that \mathbf{a}^H\boldsymbol{\Sigma}_x\mathbf{a} = 0 for some vector \mathbf{a}. This is only possible if the linear combination of the elements of \mathbf{x} defined by \mathbf{a} is deterministic (it has zero variance), in which case the distribution of \mathbf{x} is confined to a lower-dimensional affine subspace of \mathbb{R}^{2M} (of \mathbb{R}^M if \mathbf{x} is real).

Multivariate normal distribution


Definition 3.40. An M \times 1 vector \mathbf{x} has an M-variate real Gaussian (normal) distribution if the distribution of \mathbf{a}^T\mathbf{x} is Gaussian for all \mathbf{a} \in \mathbb{R}^M, \mathbf{a} \neq \mathbf{0}.
Theorem 3.2. Given an M \times 1 vector \mathbf{x} whose elements are real Gaussian variates with mean

\boldsymbol{\mu}_x = \langle\mathbf{x}\rangle,    (3.338)

and covariance matrix

\boldsymbol{\Sigma}_x = \left\langle(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right\rangle,    (3.339)

then \mathbf{x} is said to follow a multivariate Gaussian distribution, denoted \mathbf{x} \sim \mathcal{N}_M(\boldsymbol{\mu},\boldsymbol{\Sigma}), and the associated PDF is

f_{\mathbf{x}}(\mathbf{x}) = (2\pi)^{-\frac{M}{2}}\det(\boldsymbol{\Sigma})^{-\frac{1}{2}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).    (3.340)
Proof. Consider a vector \mathbf{u} whose elements are zero-mean i.i.d. Gaussian variates, with \mathbf{u} \sim \mathcal{N}(\mathbf{0},\mathbf{I}). We have the transformation

\mathbf{x} = \mathbf{A}\mathbf{u} + \boldsymbol{\mu},    (3.341)

which implies that

\boldsymbol{\Sigma}_x = \left\langle(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right\rangle = \left\langle(\mathbf{A}\mathbf{u})(\mathbf{A}\mathbf{u})^T\right\rangle = \mathbf{A}\left\langle\mathbf{u}\mathbf{u}^T\right\rangle\mathbf{A}^T = \mathbf{A}\mathbf{A}^T.    (3.342)

Assuming that \mathbf{A} is nonsingular, the inverse transformation is

\mathbf{u} = \mathbf{B}(\mathbf{x}-\boldsymbol{\mu}),    (3.343)

where \mathbf{B} = \mathbf{A}^{-1}. The Jacobian is

J = \det\!\begin{bmatrix}\frac{\partial u_1}{\partial x_1}&\cdots&\frac{\partial u_1}{\partial x_M}\\ \vdots&\ddots&\vdots\\ \frac{\partial u_M}{\partial x_1}&\cdots&\frac{\partial u_M}{\partial x_M}\end{bmatrix} = \det(\mathbf{B}) = \det(\mathbf{A}^{-1}) = \det(\mathbf{A})^{-1} = \det(\boldsymbol{\Sigma})^{-\frac{1}{2}}.    (3.344)

Given that

f_{\mathbf{u}}(\mathbf{u}) = \prod_{m=1}^{M}(2\pi)^{-\frac{1}{2}}e^{-\frac{1}{2}u_m^2} = (2\pi)^{-\frac{M}{2}}\exp\!\left(-\frac{1}{2}\sum_{m=1}^{M}u_m^2\right) = (2\pi)^{-\frac{M}{2}}\exp\!\left(-\frac{1}{2}\mathbf{u}^T\mathbf{u}\right),    (3.345)
applying the transformation, we get

f_{\mathbf{x}}(\mathbf{x}) = (2\pi)^{-\frac{M}{2}}\det(\boldsymbol{\Sigma})^{-\frac{1}{2}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\left(\mathbf{A}^{-1}\right)^T\mathbf{A}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),    (3.346)

which, noting that \boldsymbol{\Sigma}^{-1} = \left(\mathbf{A}^{-1}\right)^T\mathbf{A}^{-1}, constitutes the sought-after result.

Theorem 3.3. Given an M \times 1 vector \mathbf{x} whose elements are complex Gaussian variates with mean \mathbf{u} and covariance matrix

\boldsymbol{\Sigma} = \left\langle(\mathbf{x}-\mathbf{u})(\mathbf{x}-\mathbf{u})^H\right\rangle,    (3.347)

then \mathbf{x} is said to follow a complex multivariate Gaussian distribution, denoted \mathbf{x} \sim \mathcal{CN}_M(\mathbf{u},\boldsymbol{\Sigma}). The associated PDF is

f_{\mathbf{x}}(\mathbf{x}) = \pi^{-M}\det(\boldsymbol{\Sigma})^{-1}\exp\!\left(-(\mathbf{x}-\mathbf{u})^H\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\mathbf{u})\right).    (3.348)

The proof is almost identical to the real case and is left as an exercise.
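In practice, samples from CN_M(u, Σ) are generated by coloring i.i.d. unit-variance complex Gaussian entries with a Cholesky factor of Σ, so that the covariance LL^H = Σ is recovered. The sketch below (added for illustration; the mean vector and Σ are arbitrary) verifies this empirically:

```python
import numpy as np

rng = np.random.default_rng(11)
M, trials = 3, 200_000
u = np.array([1.0, -1.0j, 0.5 + 0.5j])                     # hypothetical mean vector
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T                                     # a valid (non-negative definite) covariance
L = np.linalg.cholesky(Sigma)
w = (rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
x = u + w @ L.T                                            # rows are samples of CN_M(u, Sigma)
d = x - u
Sigma_hat = d.T @ d.conj() / trials                        # sample covariance <(x-u)(x-u)^H>
print(np.max(np.abs(Sigma_hat - Sigma)))                   # small compared to the entries of Sigma
```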
Theorem 3.4. If \mathbf{x} \sim \mathcal{CN}_M(\mathbf{u},\boldsymbol{\Sigma}), \mathbf{B} is a fixed P \times M matrix and \mathbf{b} is a fixed P \times 1 vector, then

\mathbf{y} = \mathbf{B}\mathbf{x} + \mathbf{b} \sim \mathcal{CN}_P\!\left(\mathbf{B}\mathbf{u}+\mathbf{b},\,\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^H\right).    (3.349)

Proof. From definition 3.40, it is clear that any linear combination of the elements of a Gaussian vector, real or complex, is a univariate Gaussian r.v. It directly follows that \mathbf{y} is a multivariate normal vector. Furthermore, we have

\langle\mathbf{y}\rangle = \langle\mathbf{B}\mathbf{x}+\mathbf{b}\rangle = \mathbf{B}\langle\mathbf{x}\rangle + \mathbf{b} = \mathbf{B}\mathbf{u} + \mathbf{b},    (3.350)

\left\langle(\mathbf{y}-\langle\mathbf{y}\rangle)(\mathbf{y}-\langle\mathbf{y}\rangle)^H\right\rangle = \left\langle\left(\mathbf{B}(\mathbf{x}-\mathbf{u})\right)\left(\mathbf{B}(\mathbf{x}-\mathbf{u})\right)^H\right\rangle = \mathbf{B}\left\langle(\mathbf{x}-\mathbf{u})(\mathbf{x}-\mathbf{u})^H\right\rangle\mathbf{B}^H = \mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^H,    (3.351)

which concludes the proof.


Theorem 3.5. Consider \mathbf{x} \sim \mathcal{CN}_M(\mathbf{u},\boldsymbol{\Sigma}) and the partitioning of \mathbf{x}, its mean vector, and its covariance matrix as follows:

\mathbf{x} = \begin{bmatrix}\mathbf{x}_1\\\mathbf{x}_2\end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix}\mathbf{u}_1\\\mathbf{u}_2\end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11}&\mathbf{0}\\\mathbf{0}&\boldsymbol{\Sigma}_{22}\end{bmatrix},    (3.352)

where \mathbf{x}_1 is P \times 1 and has mean vector \mathbf{u}_1 and covariance matrix \boldsymbol{\Sigma}_{11}, and \mathbf{x}_2 is (M-P) \times 1 and corresponds to mean vector \mathbf{u}_2 and covariance matrix \boldsymbol{\Sigma}_{22}. Then \mathbf{x}_1 \sim \mathcal{CN}_P(\mathbf{u}_1,\boldsymbol{\Sigma}_{11}) and \mathbf{x}_2 \sim \mathcal{CN}_{M-P}(\mathbf{u}_2,\boldsymbol{\Sigma}_{22}). Furthermore, \mathbf{x}_1 and \mathbf{x}_2 are not only uncorrelated, they are also independent.
Proof. To prove this theorem, it suffices to introduce the partitioning of \mathbf{x} into \mathbf{x}_1 and \mathbf{x}_2 into the PDF of \mathbf{x}. Hence,

f_{\mathbf{x}}\!\left(\begin{bmatrix}\mathbf{x}_1\\\mathbf{x}_2\end{bmatrix}\right) = \pi^{-M}\det\!\begin{bmatrix}\boldsymbol{\Sigma}_{11}&\mathbf{0}\\\mathbf{0}&\boldsymbol{\Sigma}_{22}\end{bmatrix}^{-1}\exp\!\left(-\begin{bmatrix}\mathbf{x}_1-\mathbf{u}_1\\\mathbf{x}_2-\mathbf{u}_2\end{bmatrix}^H\begin{bmatrix}\boldsymbol{\Sigma}_{11}&\mathbf{0}\\\mathbf{0}&\boldsymbol{\Sigma}_{22}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{x}_1-\mathbf{u}_1\\\mathbf{x}_2-\mathbf{u}_2\end{bmatrix}\right)
 = \pi^{-M}\det(\boldsymbol{\Sigma}_{11})^{-1}\det(\boldsymbol{\Sigma}_{22})^{-1}\exp\!\left(-(\mathbf{x}_1-\mathbf{u}_1)^H\boldsymbol{\Sigma}_{11}^{-1}(\mathbf{x}_1-\mathbf{u}_1)\right)\exp\!\left(-(\mathbf{x}_2-\mathbf{u}_2)^H\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2-\mathbf{u}_2)\right)
 = f_{\mathbf{x}_1}(\mathbf{x}_1)\,f_{\mathbf{x}_2}(\mathbf{x}_2).    (3.353)
Since the density factors, independence is established. Furthermore, the
PDFs of x1 and x2 have the expected forms, thus completing the proof.
The preceding theorem establishes a very important property of Gaussian variates: if two (or more) jointly Gaussian r.v.'s are uncorrelated, they are also automatically independent. While it may seem counterintuitive, this is not in general true for arbitrarily-distributed variables.
Theorem 3.6. Consider \mathbf{x} \sim \mathcal{CN}_M(\mathbf{u},\boldsymbol{\Sigma}) and the partitioning of \mathbf{x}, its mean vector, and its covariance matrix as follows:

\mathbf{x} = \begin{bmatrix}\mathbf{x}_1\\\mathbf{x}_2\end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix}\mathbf{u}_1\\\mathbf{u}_2\end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\\boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{bmatrix},    (3.354)

where \mathbf{x}_1 is P \times 1 and has mean vector \mathbf{u}_1 and covariance matrix \boldsymbol{\Sigma}_{11}, and \mathbf{x}_2 is (M-P) \times 1 and corresponds to mean vector \mathbf{u}_2 and covariance matrix \boldsymbol{\Sigma}_{22}. Then

\mathbf{y} = \mathbf{x}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{+}\mathbf{x}_2 \sim \mathcal{CN}_P\!\left(\mathbf{u}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{+}\mathbf{u}_2,\,\boldsymbol{\Sigma}_{11.2}\right),    (3.355)

where \boldsymbol{\Sigma}_{11.2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{+}\boldsymbol{\Sigma}_{21}, and \mathbf{y} is independent from \mathbf{x}_2. Furthermore, the conditional distribution of \mathbf{x}_1 given \mathbf{x}_2 is also complex Gaussian, i.e.

(\mathbf{x}_1|\mathbf{x}_2) \sim \mathcal{CN}_P\!\left(\mathbf{u}_1+\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{+}(\mathbf{x}_2-\mathbf{u}_2),\,\boldsymbol{\Sigma}_{11.2}\right).    (3.356)
Proof. To simplify the proof, it will be assumed that \boldsymbol{\Sigma}_{22} is positive definite, thus implying that \boldsymbol{\Sigma}_{22}^{+} = \boldsymbol{\Sigma}_{22}^{-1}. For a more general proof, see [Muirhead, 1982, theorem 1.2.11].
Defining the matrix

\mathbf{B} = \begin{bmatrix}\mathbf{I}_P & -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\\ \mathbf{0} & \mathbf{I}_{M-P}\end{bmatrix},    (3.357)

theorem 3.4 indicates that

\mathbf{y} = \mathbf{B}\mathbf{x} = \begin{bmatrix}\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2\\ \mathbf{x}_2\end{bmatrix}    (3.358)

is a Gaussian vector with mean

\mathbf{v} = \langle\mathbf{y}\rangle = \begin{bmatrix}\mathbf{u}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{u}_2\\ \mathbf{u}_2\end{bmatrix}    (3.359)

and covariance matrix

\mathbf{B}\boldsymbol{\Sigma}\mathbf{B}^H = \left\langle(\mathbf{y}-\mathbf{v})(\mathbf{y}-\mathbf{v})^H\right\rangle = \begin{bmatrix}\mathbf{I}_P & -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\\ \mathbf{0} & \mathbf{I}_{M-P}\end{bmatrix}\begin{bmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\\boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{bmatrix}\begin{bmatrix}\mathbf{I}_P & \mathbf{0}\\ -\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} & \mathbf{I}_{M-P}\end{bmatrix} = \begin{bmatrix}\boldsymbol{\Sigma}_{11.2}&\mathbf{0}\\\mathbf{0}&\boldsymbol{\Sigma}_{22}\end{bmatrix}.    (3.360)

Random matrices
Consider an M \times N random matrix \mathbf{A} = [\mathbf{a}_1\ \mathbf{a}_2\ \cdots\ \mathbf{a}_N], where

\langle\mathbf{a}_m\rangle = \mathbf{0},
\langle\mathbf{a}_m\mathbf{a}_m^H\rangle = \boldsymbol{\Sigma}, \quad m = 1,\ldots,N,
\langle\mathbf{a}_m\mathbf{a}_n^H\rangle = \mathbf{0}, \quad m \neq n.

To fully characterize the covariances of a random matrix, it is necessary to vectorize it, i.e.

\left\langle\mathrm{vec}(\mathbf{A})\,\mathrm{vec}(\mathbf{A})^H\right\rangle = \begin{bmatrix}\boldsymbol{\Sigma}&\mathbf{0}&\cdots&\mathbf{0}\\\mathbf{0}&\boldsymbol{\Sigma}&\cdots&\mathbf{0}\\\vdots&\vdots&\ddots&\vdots\\\mathbf{0}&\mathbf{0}&\cdots&\boldsymbol{\Sigma}\end{bmatrix} = \mathbf{I}_N\otimes\boldsymbol{\Sigma},    (3.361)

where the above matrix has dimensions M N × M N and IN is the N × N


identity matrix.
It follows that if Σ = IM , the overall covariance is IN ⊗ IM = IM N .
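The block-diagonal structure (3.361) is easy to confirm numerically. The sketch below (added for illustration; Σ and the dimensions are arbitrary) draws a matrix whose columns are i.i.d. CN(0, Σ) vectors and compares the sample covariance of vec(A) with I_N ⊗ Σ:

```python
import numpy as np

rng = np.random.default_rng(12)
M, N, trials = 2, 3, 200_000
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T                     # common column covariance
L = np.linalg.cholesky(Sigma)
W = (rng.standard_normal((trials, M, N)) + 1j * rng.standard_normal((trials, M, N))) / np.sqrt(2)
A = np.einsum('ij,tjn->tin', L, W)         # each column of A[t] is L @ w ~ CN(0, Sigma)
v = A.transpose(0, 2, 1).reshape(trials, M * N)   # vec(A): columns stacked
cov = v.T @ v.conj() / trials
print(np.max(np.abs(cov - np.kron(np.eye(N), Sigma))))   # small: cov ~ I_N (x) Sigma
```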
Given the transformation \mathbf{B} = \mathbf{R}\mathbf{A}\mathbf{S}, where the P \times M matrix \mathbf{R} and the N \times Q matrix \mathbf{S} are fixed, we have

\langle\mathbf{B}\rangle = \mathbf{R}\langle\mathbf{A}\rangle\mathbf{S}.    (3.362)

From lemma (2.3), we have

\mathrm{vec}(\mathbf{B}) = \left(\mathbf{S}^H\otimes\mathbf{R}\right)\mathrm{vec}(\mathbf{A}),    (3.363)

which implies

\langle\mathrm{vec}(\mathbf{B})\rangle = \left(\mathbf{S}^H\otimes\mathbf{R}\right)\langle\mathrm{vec}(\mathbf{A})\rangle,    (3.364)

and

\left\langle\mathrm{vec}(\mathbf{B})\,\mathrm{vec}(\mathbf{B})^H\right\rangle = \left\langle\left[\left(\mathbf{S}^H\otimes\mathbf{R}\right)\mathrm{vec}(\mathbf{A})\right]\left[\left(\mathbf{S}^H\otimes\mathbf{R}\right)\mathrm{vec}(\mathbf{A})\right]^H\right\rangle = \left(\mathbf{S}^H\otimes\mathbf{R}\right)\left\langle\mathrm{vec}(\mathbf{A})\,\mathrm{vec}(\mathbf{A})^H\right\rangle\left(\mathbf{S}\otimes\mathbf{R}^H\right) = \left(\mathbf{S}^H\otimes\mathbf{R}\right)\left(\mathbf{I}_N\otimes\boldsymbol{\Sigma}\right)\left(\mathbf{S}\otimes\mathbf{R}^H\right),    (3.365)

where property 2.60 was applied to move from the first to the second line. Applying property 2.61 twice, we finally get

\boldsymbol{\Sigma}_{\mathrm{vec}(\mathbf{B})} = \left\langle\mathrm{vec}(\mathbf{B})\,\mathrm{vec}(\mathbf{B})^H\right\rangle = \mathbf{S}^H\mathbf{S}\otimes\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^H.    (3.366)

It follows that, if \boldsymbol{\Sigma} = \mathbf{I}_M, the overall covariance matrix can be conveniently expressed as

\boldsymbol{\Sigma}_{\mathrm{vec}(\mathbf{B})} = \mathbf{S}^H\mathbf{S}\otimes\mathbf{R}\mathbf{R}^H,    (3.367)

where \mathbf{R}\mathbf{R}^H is the column covariance matrix and \mathbf{S}^H\mathbf{S} is the row covariance matrix.

Gaussian matrices
Theorem 3.7. Given a P \times Q complex Gaussian matrix \mathbf{X} such that \mathbf{X} \sim \mathcal{CN}(\mathbf{M},\mathbf{C}\otimes\mathbf{D}), where \mathbf{C} is a Q \times Q positive definite matrix, \mathbf{D} is a P \times P positive definite matrix, and \langle\mathbf{X}\rangle = \mathbf{M}, the PDF of \mathbf{X} is

f_{\mathbf{X}}(\mathbf{X}) = \frac{1}{\pi^{PQ}\left(\det\mathbf{C}\right)^P\left(\det\mathbf{D}\right)^Q}\,\mathrm{etr}\!\left(-\mathbf{C}^{-1}(\mathbf{X}-\mathbf{M})^H\mathbf{D}^{-1}(\mathbf{X}-\mathbf{M})\right),    (3.368)

where \mathrm{etr}(\mathbf{X}) = \exp(\mathrm{tr}(\mathbf{X})).
Proof. Given the random vector \mathbf{x} = \mathrm{vec}(\mathbf{X}) with mean \mathbf{m} = \langle\mathrm{vec}(\mathbf{X})\rangle, we know from theorem 3.3 that its PDF is

f_{\mathbf{x}}(\mathbf{x}) = \frac{1}{\pi^{PQ}\det(\mathbf{C}\otimes\mathbf{D})}\exp\!\left(-(\mathbf{x}-\mathbf{m})^H(\mathbf{C}\otimes\mathbf{D})^{-1}(\mathbf{x}-\mathbf{m})\right).    (3.369)
Equivalence of the above with the PDF stated in the theorem is demonstrated first by observing that

\det(\mathbf{C}\otimes\mathbf{D}) = \left(\det\mathbf{C}\right)^P\left(\det\mathbf{D}\right)^Q,    (3.370)

which is a consequence of property 2.64.
Second, we have

(\mathbf{x}-\mathbf{m})^H(\mathbf{C}\otimes\mathbf{D})^{-1}(\mathbf{x}-\mathbf{m}) = (\mathbf{x}-\mathbf{m})^H\left(\mathbf{C}^{-1}\otimes\mathbf{D}^{-1}\right)(\mathbf{x}-\mathbf{m}),    (3.371)

which, according to lemma 2.4(c), becomes

(\mathbf{x}-\mathbf{m})^H\left(\mathbf{C}^{-1}\otimes\mathbf{D}^{-1}\right)(\mathbf{x}-\mathbf{m}) = \mathrm{tr}\!\left(\mathbf{C}^{-1}(\mathbf{X}-\mathbf{M})^H\mathbf{D}^{-1}(\mathbf{X}-\mathbf{M})\right),    (3.372)

which completes the proof.

7. Transformations, Jacobians and exterior products


Given an M \times 1 vector \mathbf{x} which follows a PDF f_{\mathbf{x}}(\mathbf{x}), we introduce a transformation \mathbf{y} = g(\mathbf{x}), thus finding that

f_{\mathbf{y}}(\mathbf{y}) = f_{\mathbf{x}}\!\left(g^{-1}(\mathbf{y})\right)\left|J(\mathbf{x}\to\mathbf{y})\right|,    (3.373)

where the Jacobian is

J(\mathbf{x}\to\mathbf{y}) = \det\!\begin{bmatrix}\frac{\partial x_1}{\partial y_1}&\cdots&\frac{\partial x_1}{\partial y_M}\\ \vdots&\ddots&\vdots\\ \frac{\partial x_M}{\partial y_1}&\cdots&\frac{\partial x_M}{\partial y_M}\end{bmatrix}.    (3.374)

The above definition of the Jacobian, based on a determinant of partial derivatives, certainly works and was used in deriving the multivariate normal distribution. However, this definition is cumbersome for large numbers of variables.
We will illustrate an alternative approach based on the multiple integral below. Our outline follows [James, 1954] as interpreted in [Muirhead, 1982]. We have

I = \int_S f_{\mathbf{x}}(x_1,\ldots,x_M)\,dx_1\cdots dx_M,    (3.375)

where S is a subset of \mathbb{R}^M and I is the probability that the vector \mathbf{x} takes on a value in the subset S.
Given a one-to-one invertible transformation

\mathbf{y} = g(\mathbf{x}),    (3.376)

the integral becomes

I = \int_{S^{\dagger}} f_{\mathbf{x}}\!\left(g^{-1}(\mathbf{y})\right)\left|J(\mathbf{x}\to\mathbf{y})\right|\,dy_1\cdots dy_M,    (3.377)
where S^{\dagger} is the image of S under the transformation, i.e. S^{\dagger} = g(S).
We wish to find an alternative expression of dx_1\cdots dx_M as a function of dy_1\cdots dy_M. Such an expression can be derived from the differential forms

dx_m = \frac{\partial x_m}{\partial y_1}\,dy_1 + \frac{\partial x_m}{\partial y_2}\,dy_2 + \cdots + \frac{\partial x_m}{\partial y_M}\,dy_M.    (3.378)

These can be substituted directly in (3.377). Consider for example the case where M = 2. We get

I = \int_{S^{\dagger}} f_{\mathbf{x}}\!\left(g^{-1}(\mathbf{y})\right)\left(\frac{\partial x_1}{\partial y_1}\,dy_1+\frac{\partial x_1}{\partial y_2}\,dy_2\right)\left(\frac{\partial x_2}{\partial y_1}\,dy_1+\frac{\partial x_2}{\partial y_2}\,dy_2\right).    (3.379)
(3.379)
The problem at hand is to find a means of carrying out the product of the
two differential form in order to fall back to the Jacobian. If the multiplication
is carried out in the conventional fashion, we find
� �� �
∂x1 ∂x1 ∂x2 ∂x2
dy1 + dy2 dy1 + dy2 =
∂y1 ∂y2 ∂y1 ∂y2
∂x1 ∂x2 ∂x1 ∂x2 ∂x1 ∂x2
dy1 dy1 + dy1 dy2 + dy2 dy1 +
∂y1 ∂y1 ∂y1 ∂y2 ∂y2 ∂y1
∂x1 ∂x2
dy2 dy2 . (3.380)
∂y2 ∂y2
Comparing the above to the Jacobian

J(\mathbf{x}\to\mathbf{y})\,dy_1dy_2 = \det\!\begin{bmatrix}\frac{\partial x_1}{\partial y_1}&\frac{\partial x_1}{\partial y_2}\\ \frac{\partial x_2}{\partial y_1}&\frac{\partial x_2}{\partial y_2}\end{bmatrix}dy_1dy_2 = \left(\frac{\partial x_1}{\partial y_1}\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\frac{\partial x_2}{\partial y_1}\right)dy_1dy_2,    (3.381)

we find that the two expressions ((3.379) and (3.381)) can only be reconciled if we impose non-standard multiplication rules and use a noncommutative, alternating product where

dy_m\,dy_n = -dy_n\,dy_m.    (3.382)

This implies that dy_m\,dy_m = -dy_m\,dy_m = 0. This product is termed the exterior product and is denoted by the symbol \wedge. According to this wedge product, the product (3.379) becomes

dx_1\,dx_2 = \left(\frac{\partial x_1}{\partial y_1}\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\frac{\partial x_2}{\partial y_1}\right)dy_1\wedge dy_2.    (3.383)
Theorem 3.8. Given two M \times 1 real random vectors \mathbf{x} and \mathbf{y}, as well as the transformation \mathbf{y} = \mathbf{A}\mathbf{x} where \mathbf{A} is a nonsingular M \times M matrix, we have d\mathbf{y} = \mathbf{A}\,d\mathbf{x} and

\bigwedge_{m=1}^{M} dy_m = \det(\mathbf{A})\bigwedge_{m=1}^{M} dx_m.
Proof. Given the properties of the exterior product, it is clear that

\bigwedge_{m=1}^{M} dy_m = p(\mathbf{A})\bigwedge_{m=1}^{M} dx_m,    (3.384)

where p(\mathbf{A}) is a polynomial in the elements of \mathbf{A}. Indeed, the elements of \mathbf{A} are the coefficients of the elements of \mathbf{x} and, as such, are extracted by the partial derivatives (see e.g. (3.383)).
The following outstanding properties of p(\mathbf{A}) can be observed:
(i) If any row of \mathbf{A} is multiplied by a scalar factor \alpha, then p(\mathbf{A}) is likewise multiplied by \alpha.
(ii) If the positions of two variates y_r and y_s are reversed, then the positions of dy_r and dy_s are also reversed in the exterior product, leading to a change of sign (by virtue of (3.382)). This, however, is equivalent to interchanging rows r and s in \mathbf{A}. It follows that interchanging two rows of \mathbf{A} reverses the sign of p(\mathbf{A}).
(iii) If \mathbf{A} = \mathbf{I}, then p(\mathbf{I}) = 1 since this corresponds to the identity transformation.
However, these three properties correspond to properties 2.29–2.31 of determinants. In fact, this set of properties is sufficiently restrictive to define determinants; they actually correspond to Weierstrass' axiomatic definition of determinants [Knobloch, 1994]. Therefore, we must have p(\mathbf{A}) = \det(\mathbf{A}).
Rewriting (3.375) using exterior product notation, we have

I= fx (x1 , . . . , xM )dx1 ∧ dx2 . . . ∧ dxM , (3.385)
S

From (3.378), we know that


 ∂x1 ∂x1 
∂y1 ··· ∂yM

dx =  .. .. .. 
(3.386)
. . . .
∂xM ∂xM
∂y1 ··· ∂yM

Therefore, theorem 3.8 implies that


M
�� � � M
� ∂xm m=1,··· ,M �
dxm = det dym , (3.387)
∂yn n=1,··· ,M
m=1 m=1
170 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

where the Jacobian is the magnitude of the right-hand side determinant.


In general, we have
M
� M

dxm = J(x → y) dym . (3.388)
m=1 m=1

For an M × N real, arbitrary random matrix X, we have in general


 
dx11 ··· dx1N
dX = 
 .. .. ..  , (3.389)
. . . 
dxM 1 · · · dxM N

which is the matrix of differentials, and we denote


N �
� M
(dX) = dxmn , (3.390)
n=1 m=1

the exterior product of the differentials.


If X is M × M and symmetric, it really has only 12 M (M + 1) distinct
elements (the diagonal and above-diagonal elements). Therefore, we define
M �
� n �
(dX) = dxmn = dxmn . (3.391)
n=1 m=1 m≤n

In the same fashion, if X is M × M and skew-symmetric, then there are


only 12 M (M − 1) (above-diagonal) elements and

M n−1
� � �
(dX) = dxmn = dxmn . (3.392)
n=1 m=1 m<n

It should be noted that a transformation on a complex matrix is best treated


by decomposing it into a real and imaginary part, i.e. X = �{X} + j�{X} =
Xr + jXi which naturally leads to

(dX) = (dXr )(dXi ), (3.393)

since exterior product rules eliminate any cross-terms. We offer no detailed


proof, but the proof of theorem (3.16) provides some insight into this matter.
It is also noteworthy that

d(AB) = AdB + dAB. (3.394)


Probability and stochastic processes 171

Theorem 3.9. Given two N × 1 complex random vectors u and v, as well as


the transformation v = Au where A is a nonsingular M × M matrix, we
have dv = Adu and
(dv) = det(A)2 (du) .
Proof. Given the vector version of (3.393), we have

(dv) = (d�{v}) (d�{v}) , (3.395)

and, applying theorem 3.8, we obtain

(d�{v}) = |det(A)| (d�{u})


(d�{v}) = |det(A)| (d�{u}) ,

which naturally leads to

(dv) = (d�{v}) (d�{v})


= det(A)2 (d�{u}) (d�{u}) (3.396)
= det(A)2 (du) . (3.397)

Selected Jacobians
What follows is a selection of Jacobians of random matrix transformations
which will help to support forthcoming derivations. For convenience and clar-
ity of notation, it is the inverse transformations that will be given.
Portions of this section follow [Muirhead, 1982, chapter 2] and [Ratnarajah
et al., 2004] (for complex matrices).
Theorem 3.10. If X = BY where X and Y are N ×M real random matrices
and B is a fixed positive definite N × N matrix, then

(dX) = |det(B)|M (dY), (3.398)

which implies that


J(X → Y) = |det(B)|M . (3.399)
Proof. The equation X = BY implies dX = BdY. Letting

dX = [dx1 , · · · , dxM ] ,
dY = [dy1 , · · · , dyM ] ,

we find
dxm = Bdym , m = 1, · · · , M, (3.400)
172 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

which, by virtue of theorem 3.8, implies that


N
� N

dxnm = det(B) dynm , m = 1, · · · , M. (3.401)
n=1 n=1

Therefore, we have
M �
� N
(dX) = dxnm
m=1 n=1
�M N

= det(B) dynm
m=1 n=1
M
� �N
= det(B)M dynm
m=1 n=1
= det(B)M (dY).

Theorem 3.11. If X = BYBT where X and Y are M × M symmetric real


random matrices and B is a non-singular fixed M × M matrix, then

(dX) = |det(B)|M +1 (dY). (3.402)

Proof. The equation X = BYBT implies dX = BdYBT . It follows that


� �
(dX) = BdYBT = p(B)(dY). (3.403)

where p(B) is a polynomial in the elements of B. Furthermore it can be shown


that if
B = B1 B2 , (3.404)
then
p(B) = p(B1 )p(B2 ). (3.405)
Indeed, we have

p(B)(dY) = (B1 B2 dY(B1 B2 )T )


= (B1 B2 dYBT2 BT1 )
= p(B1 )(B2 dYBT2 )
= p(B1 )p(B2 )(dY),

where (3.403) was applied twice.


Probability and stochastic processes 173

It turns out that the only polynomials in the elements of B that can be fac-
torized as above are the powers of det(B) (see prop. 2.40). Therefore, we
have
p(B) = (det(B))k ,
where k is some integer.
We can isolate k by letting B = det (β, 1, · · · , 1) such that
 2 
β y11 by12 · · · by1M
 βy12 y22 · · · by2M 
 
BdYBT =  .. .. . . ..  . (3.406)
 . . . . 
by1M y2M · · · yM M

It follows that the exterior product of the distinct elements (diagonal and
above-diagonal elements) of dX is

(dX) = (BdYBT ) = β M +1 (dY).

Given that p(B) = β M +1 = det(B)M +1 , the proof is complete.


Theorem 3.12. If X = BYBT where X and Y are M × M skew-symmetric
real random matrices and B is a non-singular fixed M × M matrix, then

(dX) = |det(B)|M −1 (dY). (3.407)

Proof. The proof follows that of theorem 3.11 except that by definition, we
take the exterior product of the above diagonal elements only in (3.406) since
X and Y are skew-symmetric.
Theorem 3.13. If X = BYBH where X and Y are M × M positive definite
complex random matrices and B is a non-singular fixed M × M matrix, then

(dX) = |det(B)|2M (dY). (3.408)

Proof. The proof follows from the fact that real and imaginary parts of a Her-
mitian matrix are symmetric and skew-symmetric, respectively. It follows that,
by virtue of theorems 3.11 and 3.12,

(dXr ) = det(B)M +1 (dYr ) (3.409)


(dXi ) = det(B)M −1 (dYi ). (3.410)

It follows that

(dX) = (dXr )(dXi ) = det(B)2M (dYr )(dYi ) = det(B)2M (dY).


174 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

Theorem 3.14. If X = Y−1 where X and Y are M × M complex non-


singular random matrices and Y is Hermitian, then

(dX) = det(Y)−2M (dY), (3.411)

Proof. Since XY = I and applying (3.394), we have

d(XY) = XdY + dX · Y = dI = 0. (3.412)

Therefore,

dX · Y = −XdY,
dX = −XdY · Y−1 ,
dX = −Y−1 dY · Y−1 , (3.413)
(3.414)

which implies that

(dX) = (Y−1 dY · Y−1 )


= det(Y)2M (dY),

by virtue of theorem 3.13.

Theorem 3.15. If A is a real M × M positive definite random matrix, there


exists a decomposition A = TT T (Cholesky decomposition) where T is upper
triangular. Furthermore, we have
M

−m+1
(dA) = 2M tM
mm (dT), (3.415)
m=1

where tmm is the mth element of the diagonal of T.


Proof. The decomposition has the form
 
a11 a12 · · · a1M
 a12 a22 · · · a2M 
 
 .. .. .. ..  =
 . . . . 
a1M a2M · · · aM M
  
t11 0 ··· 0 t11 t12 · · · t1M
 t12 t22 ··· 0  0 t22 · · · t2M 
  
 .. .. .. ..  .. .. . . ..  . (3.416)
 . . . .  . . . . 
t1M t2M · · · tM M 0 0 · · · tM M
Probability and stochastic processes 175

We proceed by expressing each distinct element of A as a function of the


elements of T and then taking their respective differentials. Thus, we have
a11 = t211 ,
a12 = t11 t12 ,
a1M = t11 t1M ,
a22 = t212 + t222
..
.
a2M = t12 t1M + t22 t2M ,
..
.
aM M = t21M + · · · + t2M M ,
where we note that
da11 = 2t11 dt11 ,
da12 = t11 dt12 + t12 dt11 ,
..
.
da1M = t11 dt1M + t1M dt11 .
When taking the exterior product of the above differentials, the second term
in da12 , · · · , da1M disappears since dt211 = 0 (according to the product rules)
so that we have:
M
� M

da1j = 2t11 M
dt1M . (3.417)
n=0 n=0
In the same manner, we find that
M
� M

−m+1
damn = 2tM
mm dtmM , (3.418)
n=m−1 n=m−1

thus leading naturally to

� M
� M

(dA) = damn = damn
n≤m m=0 n=m−1
M
� �
−m+1
= 2 M
tM
mm dtmn
m=1 n≤m
M

−m+1
= 2M tM
mm (dT).
m=1
176 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

Theorem 3.16. If A is a complex M × M positive definite random matrix,


there exists a decomposition A = TH T (Cholesky decomposition, def. 2.53)
where T is upper triangular. Furthermore, we have
M

−2m+1
(dA) = 2M t2M
mm (dT), (3.419)
m=1

where tmm is the mth element of the diagonal of T.


Proof. Letting amn = αmn +jβmn and tmn = τmn +jµmn and in accordance
with def. 2.53 (where the diagonal imaginary elements µmm are set to zero),
we have
 
α11 α12 · · · α1M
 α12 α22 · · · a2M 
 
 .. .. .. ..  = (3.420)
 . . . . 
α1M α2M · · · αM M
 
t211 τ11 τ12 ··· τ11 τ1M
 2 + τ2
.. 
 τ12 τ11 τ22 11 ··· . 
 .. .. .. .. 
 . . . . 
M + ···
τ1M τ11 ··· 2
· · · τM
and
 
0 β12 · · · β1M
 β12 0 · · · a2M 
 
 .. .. .. .. = (3.421)
 . . . . 
β1M β2M ··· 0
 
0 τ11 µ12 · · · τ11 µ1M
 −µ12 τ11 0 · · · τ12 µ1M + τ22 µM 2 − µ12 τ1M 
 
 .. .. .. .. .
 . . . . 
−µ1M τ11 ··· ··· 0
Taking the differentials of the diagonal and the upper-diagonal elements of
the real part of A, we have
dα11 = 2τ11 dτ11
dα12 = τ11 dτ12
..
.
dαM M = 2τM M dτM M .
Probability and stochastic processes 177

Likewise, the imaginary part yields

dβ12 = τ11 dµ12


dβ13 = τ11 dµ13
..
.
dβ1M = τ11 dµ1M .
dβM −1,M = τM −1,M −1 dµM −1,M + · · ·

It is noteworthy that the differential expressions above do not contain all the
terms they should. This is because they are meant to be multiplied together
using exterior product rules (so that repeated differentials yield 0) and redun-
dant terms have been removed. Hence, only differentials in τmn were taken
into account for the real part, and only differentials in µmn were kept for the
imaginary part. Likewise, the first appearance of a given differential form pre-
cludes its reappearance in successive expressions. For example, dτ11 appears
in the expression for dα11 so that all terms with dτ11 are omitted in successive
expressions dα12 , . . . dαM M .
Hence, we have

(dA) = (d�{A})(d�{A})
M
� �
= dαmn dβmn
m≤n m<n
 
M

M −1
= 2M tM
11 t22 · · · tM M dτmn  ×
m≤n
� M


−1 M −2
tM
11 t22 · · · tM −1,M −1 dµmn
m<n
M

2M −2m+1
= 2M tmm (d�{T})(d�{T})
m=1
�M
2M −2m+1
= 2M tmm (dT). (3.422)
m=1

Theorem 3.17. Given a complex N × M matrix Z of full rank where N ≥ M


and let it be defined as the product Z = U1 T where U1 is an N × M unitary
matrix such that UH 1 U1 = IM , and T is an M × M upper triangular matrix
178 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

(QR decomposition), then we have


M

−2m+1
(dZ) = t2N
mm 1 dU1 ).
(dT)(UH (3.423)
m=1

Proof. We postulate a matrix U2 , being a function of U1 , of dimensions N ×


(N − M ) such that
U = [ U1 | U2 ] = [u1 , u2 , · · · , uN ] , (3.424)
is a unitary N × N matrix.
Furthermore, we have
M �
� N
(UH
1 U1 ) = uH
n dum . (3.425)
m=1 n=m

We note that, according to the chain rule


dZ = dU1 T + U1 dT, (3.426)
and � � � �
UH 1 dU1 T + U1 U1 dT
UH H
UH
1 dZ = 1 dZ = .
UH
2 U2 dU1 T + UH
H
2 U1 dT

1 U1 = IM and U2 U1 = 0, reduces to
which, noting that UH H

� H �
U1 dU1 T + dT
UH dZ = . (3.427)
1 UH2 dU1 T

The exterior product of the left-hand side of (3.427) is given by


� H �
U1 dZ = (detU1 )2M (dZ) = (dZ) , (3.428)
by virtue of theorem (3.10), (3.393) and the fact that U1 is unitary.
Considering now the lower part of the right-hand side of (3.427), the mth
row of UH 2 dU1 T is
� H �
um du1 , · · · , uH
m duM T, m ∈ [M + 1, N ]. (3.429)
Applying theorem 3.9, the exterior product of the elements in this row is
M

2
|detT| uH
m dun , (3.430)
n=1

and the exterior product of all elements of UH


2 dU1 T is
M

det(T)2 uH
m dun , (3.431)
n=1
Probability and stochastic processes 179

and the exterior product of all elements of UH2 dU1 T is

N
� M

� H � � �
U2 dU1 T = det(T)2 uH
m dun
m=M +1 n=1
N
� M

2(N −M )
= det uH
m dun . (3.432)
m=M +1 n=1

We now turn our attention to the upper part of the right-hand side of (3.427),
1 dU1 T + dT. Since U1 is unitary, we have
the latter being UH

1 U1 = IM .
UH

1 dU1 dU1 U1 = 0 which


Taking the differential of the above yields UH H

implies that
� H �H
UH 1 dU1 = −dU1 U1 = − U1 U1
H
. (3.433)
For the above to hold, UH 1 dU1 must be skew-Hermitian, i.e. its real part is
skew-symmetric and its imaginary part is symmetric.
It follows that the real part of UH
1 dU1 can be written as
 
0 −�{uH H
2 du1 } · · · −�{uM du1 }
 �{uH du1 } 0 
 2 · · · −�{uH M u2 } 
�{UH dU } =  . . .. . 
1 1
 .. .. . .. 
M u1 }
�{uH M u2 }
�{uH ··· 0
(3.434)
Postmultiplying the above by T and retaining only the terms contributing to
the exterior product, we find that the subdiagonal elements are
 
0 ··· ··· ··· ···
 �{uH du1 }t11 ··· ··· ··· ··· 
 2 
 �{uH du1 }t11 �{u3 du2 }t22 · · · · · · · · · 
 3 .
 .. .. .. . . .
. 
 . . . . . 
�{uH M du1 }t11 �{uM du2 }t22 · · · �{uM uM −1 }tM −1,M −1 · · ·
H

(3.435)
� �
Thus, we find that the exterior product of the subdiagonal elements of �{ UH 1 U1 T}
is given by
−1 �M M −2 M

tM
11 m=2 �{um du1 }t2 2
H H
m=3 �{um du2 } · · ·
tM −1,M −1 �{uHM duM −1 } =
�� �� �
M M −m M M
m=1 tmm
H
m=1 n=m+1 �{un dum }. (3.436)
180 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

In the same fashion, we find that the exterior product of the diagonal and
subdiagonal elements of �{UH 1 dU1 }T is
� M � M M
� � �
M −m+1
tmm �{uH n dum }. (3.437)
m=1 m=1 n=m
� �
Clearly, the above-diagonal and diagonal elements of �{ UH 1 dU1 T} con-
tribute nothing to the exterior product since they all involve an element of dU1
and all such elements already appear in the subdiagonal � portion. � The same
argument appears to the above diagonal elements of �{ UH 1 dU 1 T. Further-
more, it can be verified that the inclusion of dT in U1 U1 + dT amounts to
H

multiplying the exterior product by


M

(dT) = dtmn . (3.438)
m≤n

Multiplying (3.431), (3.435), (3.437) and (3.438), we finally find that


M
� � �
(dZ) = t2N −2m+1 (dT) UH
1 dU1 . (3.439)
m=1

Theorem 3.18. Given a real N × M matrix X of full rank where N ≥ M and


let it be defined as the product X = H1 T where H1 is an N × M orthogonal
matrix such that HH 1 H1 = IM , and T is an M × M real upper triangular
matrix (QR decomposition), then we have
M

−m
(dX) = mm (dT)(H1 dH1 ),
tN H
(3.440)
m=1

where
M
� N

(HH
1 dH1 ) = hH
n dhm . (3.441)
m=1 n=m+1

The proof is left as an exercise.

Multivariate Gamma function


Definition 3.41. The multivariate Gamma function is a generalization of the
Gamma function and is given by the multivariate integral

ΓM (a) = etr(−A)det(A)a−(M +1)/2 (dA), (3.442)
A>0
Probability and stochastic processes 181

where the integral is carried out over the space of real positive definite (sym-
metric) matrices.
It is a point of interest that it can be verified that Γ1 (a) = Γ(a).
Definition 3.42. The complex multivariate Gamma function is defined

Γ̃M (a) = etr(−A)det(A)a−M (dA), (3.443)
A>0
where the integral is carried out over the space of Hermitian matrices.
Theorem 3.19. The complex multivariate Gamma function can be computed
in terms of standard Gamma functions by virtue of
M

Γ̃M (a) = π M (M −1)/2 Γ(a − m + 1). (3.444)
m=1

Proof. Given the definition 3.19, we apply the transformation A = TH T to


obtain (by virtue of theorem 3.16)
� �M
� � −2m+1
Γ̃M (a) = 2 M
etr −TH T det(TH T)a−M t2M
mm (dT),
T,TH T>0 m=1
(3.445)
where we note that a triangular matrix has its eigenvalues along its diagonal.
Therefore,
M

det(TH T) = t2mm , (3.446)
m=1
and � � �
tr −TH T = |tmn |2 . (3.447)
m≤n
Using (3.446) and (3.447), (3.445) can be rewritten in decoupled form, i.e.
 
� M
� M
� M

Γ̃M (a) = 2M exp − |tmn |2  t2a−2m+1
mm dtmn
T,TH T>0 m≤n m=1 m≤n
M ��
� ∞ �
2
= 2M e−�{tmn } d�{tmn } ×
m<n −∞
M
� �� ∞ �
−�{tmn }2
e d�{tmn } ×
m<n −∞
M ��
� ∞ �
2
e−tmm t2a−2m+1
mm dt mm . (3.448)
m=1 0
182 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

Knowing that � ∞ √
2
e−t dt = π, (3.449)
−∞

and applying the substitution um = t2mm , we find


M ��
� ∞ �
M (M −1)/2
Γ̃M (a) = π e−um ua−m
m dum , (3.450)
m=1 0

where the remaining integrals correspond to standard Gamma functions, thus


yielding
M

M (M −1)/2
Γ̃M (a) = π Γ(a − m + 1). (3.451)
m=1

Stiefel manifold
Consider the matrix H1 in theorem 3.18. It is an N × M matrix with or-
thonormal columns. The space of all such matrices is called the Stiefel mani-
fold and it is denoted VM,N . Mathematically, this is stated
� �
VM,N = H1 ∈ RN ×M ; HH 1 H1 = IM . (3.452)

However, the complex counterpart of this concept will be found more useful
in the study of space-time multidimensional communication systems.
Definition 3.43. The N × M complex Stiefel manifold is the space spanned
by all N × M unitary matrices (denoted U1 ) and it is denoted ṼM,N , i.e.
� �
ṼM,N = U1 ∈ CN ×M ; UH 1 U1 = IM .

Theorem 3.20. The volume of the complex Stiefel manifold is


� � � � H � 2M π M N
Vol ṼM,N = U1 dU1 = .
ṼM,N Γ̃M (N )

Proof. If the real parts and imaginary parts of all the elements of U1 are treated
as individual coordinates, then a given instance of U1 defines a point in 2M N -
dimensional Euclidian space.
Furthermore, the constraining equation UH 1 U1 = IM can be decomposed
into its real and imaginary part. Since UH 1 U 1 is necessarily Hermitian, the
real part is symmetric and leads to 12 M (M + 1) constraints on the elements
of U1 . The imaginary part is skew-symmetric and thus leads to 12 M (M − 1)
constraints. It follows that there are M 2 constraints on the position of the point
Probability and stochastic processes 183

in 2M N -dimensional Euclidian space corresponding to U1 . Hence, the point


lies on a 2M N − M 2 -dimensional surface in 2M N space.
Moreover, the constraining equation implies
M �
� N
u2mn = M. (3.453)
m=1 n=1

Geometrically,
√ this means that the said surface is a portion of a sphere of
radius M .
Given an N × M complex Gaussian matrix X with N ≥ M such that
X ∼ CN (0, IN ⊗ IM ). By virtue of theorem 3.7, its density is
� �
fX (X) = π −N M etr −XH X . (3.454)
Since a density function integrates to 1, it is clear that

� �
etr −XH X = π M N . (3.455)
X

Applying the transformation X = U1 T where U1 ∈ ṼM,N and T is upper


triangular with positive diagonal elements (since it is nonsingular in accor-
dance with the definition of QR-decomposition), then we have
M
� � � � �
tr XH X = tr TH T = |tmn |2
m≤n
M

−2m+1
� �
(dX) = t2N
mm (dT) UH
1 dU1 . (3.456)
m=1

Then, eq. (3.455) becomes


� � � �� � � H �
M M
H
T,T T>0 exp − m≤n |t mn |2 2N −2m+1 (dT)
m=1 tmm ṼM,N U1 dU1

= πM N . (3.457)
An integral having the same form as the integral above over the elements of
T was solved in the proof of theorem 3.19. Applying this result, we find that
� � � �� � � H �
M M
H
T,T T>0 exp − m≤n |t mn |2 2N −2m+1 (dT)
m=1 tmm ṼM,N U1 dU1
Γ̃M (N )
= 2M
. (3.458)
From the above and (3.455), it is obvious that
� � 2M π M N
Vol ṼM,N = . (3.459)
Γ̃M (N )
184 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

Wishart matrices
As a preamble to this subsection, we introduce yet another theorem related
to the complex multivariate Gamma function.
Theorem 3.21. Given an Hermitian M × M matrix C and a scalar a such
that �{a} > M − 1, then

� �
etr −C−1 A det(A)a−M (dA) = Γ̃M (a)det(C)a ,
A>0

where integration is carried out over the space of Hermitian matrices.


Proof. Applying the transformation A = C1/2 BC1/2 , where C1/2 is the posi-
tive square root (e.g. obtained through eigenanalysis and taking the square root
of the eigenvalues) of A, theorem 3.13 implies that (dA) = det(C)M (dB).
Hence, the integral becomes

� �
I = etr −C−1 A det(A)a−M (dA)
�A>0 � �
= etr −C−1 C1/2 BC1/2 det(B)a−M (dB)det(C)a−M +M
�B>0
= etr (−B) det(B)a−M (dB)det(C)a , (3.460)
B>0

Thus, the transformed integral now coincides with the definition of Γ̃M (a),
leading us directly to
I = Γ̃M (a)det(C)a . (3.461)

From the above proof, it is easy to deduce that


1 � �
fA (A) = etr −C−1 A det(A)a−M (dA), (3.462)
Γ̃M (a)det(C)a

is a multivariate PDF since it integrates to 1 and is positive given the constraint


A > 0. This is an instance of the Wishart distribution, as detailed hereafter.
Definition 3.44. Given an N × M complex Gaussian matrix Z ∼ CN (0, IN ⊗
Σ) where N ≥ M , its PDF is given by
1 � �
fZ (Z) = N
etr −Σ−1 ZH Z (dZ), Z > 0,
πM N |det(Σ)|

then A = ZH Z is of size M × M and follows a complex Wishart density


with N degrees of freedom and denoted A ∼ CW M (N, Σ).
Probability and stochastic processes 185

Unless stated otherwise, it will be assumed in the following that the number
of degrees-of-freedom N is equal or superior to the Wishart matrix dimension
M , i.e. the Wishart matrix is nonsingular.
Theorem 3.22. If A ∼ CW M (N, Σ), then its PDF is given by

etr(−Σ−1 A) |det(A)|N −M
fA (A) = .
Γ̃M (N ) |det(Σ)|N

Proof. Given that A = ZH Z in accordance with definition (3.44), we wish


to apply a transformation Z = U1 T where U1 is N × M and unitary (i.e.
UH1 U1 = IM ) and T is M × M and upper triangular.
From the PDF of Z given in definition 3.44 and theorem 3.17, we have
1 � �
fU1 ,T (U1 , T) = N
etr −Σ−1 TH T ×
π M N |det (Σ)|
M

−2m+1
� �
t2N
mm (dT) UH
1 dU1 , (3.463)
m=1

1 U1 T = T T and U1 can be
where we note that A = ZH Z = TH UH H

removed by integration, i.e.


1 � −1 H

fT (T) = etr −Σ T T ×
π M N |det (Σ)|N
�M �
2N −2m+1
� H �
tmm (dT) U1 dU1 , (3.464)
m=1 ṼM,N

which, according to theorem 3.20, yields


2M � −1 H

fT (T) = etr −Σ T T ×
Γ̃(N ) |det (Σ)|N
M

−2m+1
t2N
mm (dT) . (3.465)
m=1

Applying a second transformation A = TH T and by virtue of theorem


3.16, we find that
2M � �
fA (A) = N
etr −Σ−1 A ×
Γ̃(N ) |det (Σ)|
M

−M )
t2(N
mm (dA) , (3.466)
m=1
186 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
� 2(N −M )
where M m=1 tmm = |det (T)|2(N −M ) = |det (A)|N −M , which con-
cludes the proof.
While the density above was derived assuming that N is an integer, the
distribution exists for non-integer values of N as well.
Theorem 3.23. If A ∼ CW M (N, Σ), then its characteristic function is

φA (jΘ) = �etr (jΘA)� = det (I − jΘΣ)−N ,

where
 
2t11 t12 · · · t1M
 t21 2t22 · · · t2M 
 
Θ =  .. .. .. .. 
 . . . . 
tM 1 tM 2 · · · 2tM M
= [tmn ]n=1,··· ,M
m=1,··· ,M + diag (t11 , t22 , · · · , tM M ) ,

and tmn is the variable in the characteristic function domain associated with
element amn of the matrix A. Since A is Hermitian, tmn = tnm .
Several aspects of the above theorem are interesting. First, a new defini-
tion of characteristic functions — using matrix Θ — is introduced for random
matrices, allowing for convenient and concise C. F. expressions. Second, we
note that if M = 1, the complex Wishart matrix reduces to a scalar a and its
characteristic function according to the above theorem is (1 − jtσ 2 )−N , i.e.
the C. F. of a 2N degrees-of-freedom chi-square variate.
Proof. We have

φA (jΘ) = �etr (jΘA)� = etr (jΘA) fA (A)(dA)
A

etr((jΘ − Σ−1 )A) |det(A)|N −M
= (dA), (3.467)
A Γ̃M (N ) |det(Σ)|N
where the integral can be solved by virtue of theorem 3.21 to yield
� �−N
φA (jΘ) = det (Σ)−N Σ−1 − jΘ = det (I − jΘΣ)−N . (3.468)

Theorem 3.24. If A ∼ CW M (N, Σ) and given a K × M matrix X of rank K


(thus implying that K ≤ M ), then the product
� XAX H
� also follows a complex
Wishart distribution denoted by CW K N, XΣX . H
Probability and stochastic processes 187

Proof. The C. F. of XΣXH is given by


� � ��
φ (jΘ) = etr jXAXH Θ , (3.469)
where, according to a property of traces, the order of matrices can be rotated
to yield
� � ��
φ (jΘ) = etr jAXH ΘX
� �−N
= det IM − jXH ΘXΣ , (3.470)
which, according to property 2.35, is equivalent to
� �−N
φ (jΘ) = det IM − jΘXΣXH . (3.471)
� �
Since this is the C. F. of a CW M N, XΣXH variate, the proof is complete.

Theorem 3.25. If A ∼ CW M (N, Σ) and given equivalent partitionings of A


and Σ, i.e.
� � � �
A11 A12 Σ11 Σ12
A= Σ= ,
AH12 A22 ΣH12 Σ22

where A11 and Σ11 are of size M � × M � , then submatrix A11 follows a
CW M � (N, Σ11 ) distribution.
Proof. Letting X = [IM � 0] (an M � × M matrix) in theorem 3.24, we find that
XAXH = A11 and XΣXH = Σ11 , thus completing the proof.
Theorem 3.26. If A ∼ CW M (N, Σ) and x is any M × 1 random vector
independant of A such that the probability that x = 0 is null, then
xH Ax
∼ χ22N ,
xH Σx
and is independent of x.
Proof. The distribution naturally derives from applying theorem 3.24 and let-
ting X = x therein.
1
To show independance, let y = Σ 2 x. Thus, we have
xH Ax yH By
= , (3.472)
xH Σx yH y
where B follows a CW M (N, IM ) distribution. This is also equivalent to
yH By
= zH Bz, (3.473)
yH y
188 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

where z = √ yH is obviously a unit vector. It follows that the quadratic form


y y
is independant of the length of y.
Likewise, B has the same distribution as UH BU where U is any M × M
unitary matrix (i.e. a rotation in space). It follows that zH Bz is also inde-
pendant of the angle of y in M -dimensional Hermitian space. Therefore, the
quadratic form is totally independant of y and, by extension, of x.

Theorem 3.27. Suppose that A ∼ CW M (N, Σ) where N ≥ M and given the


following partitionings,
� � � �
A11 A12 Σ11 Σ12
A= Σ= ,
AH12 A22 ΣH12 Σ22

where A11 and Σ11 are P × P , A22 and Σ22 are Q × Q, and P + Q = M ,
then
the matrix A1.2 = A11 − A12 A−1 22 A12 follows a CW P (M − Q, Σ1.2 )
H

distribution, where Σ1.2 = Σ1 1 − Σ12 Σ−1


22 Σ12 ;
H

A1.2 is independent from A12 and A22 ;


the PDF of A22 is CW Q (N, Σ22 );
� �
the PDF of A12 conditioned on A22 is CN Σ12 Σ−1
22 A22 , A22 ⊗ Σ1.2 .

Proof. Given the distribution of A, it can be expressed A = XXH , where X


is CN (0, Σ ⊗ IN ) and N × M . Furthermore, X is partitionned as follows:
� �
X = XH 1 XH2 , (3.474)

where X1 is P × N and X2 is Q × N . Since Y2 is made up of uncorrelated


columns, its rank is Q, Therefore, theorem 2.6 garantees the existence of a
matrix B of size (N − Q) × N of rank N − Q such that X2 BH = 0, BBH =
IN −Q , and � �
Y = XH 2 BH , (3.475)
is nonsingular.
Observe that

A22 = X2 XH
2 A12 = X1 XH
2 (3.476)

and

A11.2 = A11 − A12 A−1


22 A21
� � � �
H −1
= X 1 IN − X H
2 X X
2 2 X H
2 X1 . (3.477)
Probability and stochastic processes 189

Starting from
� �−1
YH YYH Y = IN , (3.478)
which derives directly from the nonsingularity of Y, it can be shown that
� �
H −1
XH2 X2 X2 X 2 + BH X = I N , (3.479)

by simply expanding the matrix Y into its partitions in (3.478) and exploiting
the properties of X2 and B to perform various simplifications.
Substituting (3.479) into (3.476), we find that

A11.2 = X1 BH BXH
1 . (3.480)

Given that X is CN (0, Σ ⊗ IN ), it follows


� according to theorem �3.6 that
the distribution of X1 conditioned on CN Σ12 Σ−1 22 X2 , IN ⊗ Σ1.2 where
−1
Σ1.2 = Σ11 − Σ12 Σ22 Σ21 and the corresponding distribution is
1
fX1 |X2 (X1 |X2 ) = × (3.481)
π P N det(Σ11.2 )P
� �
etr −Σ−1 11.2 (X1 − M)(X1 − M)H ,

where M = Σ12 Σ−1 22 X2 . � �


Applying the transformation Z = A12 C = X1 YH , the Jacobian is
found to be
� �2P � �P
J(X1 → A12 , C) = det YH = det YYH = det (A22 )P (3.482)

by exploiting theorems 3.10 and 3.9. Furthermore, the argument of etr(·) in


(3.484) can be expressed as follows
�� � �� �−1
(X1 − M) (X1 − M)H = A12 C − MY YYH ×
�� � �H
A12 C − MY (3.483)
H
= (A12 − M) A−1
22 (A12 − M) + CC ,
H

thus yielding
1
fA12 ,C|X2 (A12 , C|X2 ) = × (3.484)
π P N det(Σ11.2 )P det(A22 )P

etr −Σ−1 H
11.2 CC −,

H
Σ−1
11.2 (A 12 − M) A −1
22 (A 12 − M) .

� (0, IN
Since the above density factors, this implies that C follows a CN
−1
−Q ⊗
Σ1.2 ) distribution and that A12 conditioned on X2 follows a CN Σ12 Σ22 A22 ,
190 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

A22 ⊗ Σ1.2 ) distribution. Furthermore, C is independently distributed from


both A12 and X2 . This, in turn, implies that A1.2 = CCH is independently
distributed from A12 and A22 . It readily follows that A1.2 = CCH ∼
CW P (N − Q,� Σ1.2 ), A22 ∼ CW Q (N, �Σ22 ) and (A12 conditioned on A22
follows a CN Σ12 Σ−1 22 A22 , A22 ⊗ Σ1.2 , thus completing the proof.

Theorem 3.28. If A ∼ CW M (N, Σ) and given a K × M matrix X of rank


� �
−1 H −1
K where K ≤ M � , then B = XA X follows a CW K (N − M + K,
� � −1
XΣ−1 XH distribution.
1 1
Proof. Applying the transformation A2 = Σ− 2 AΣ− 2 , then A2 ∼ CW M (N ,
IM ) as a consequence of theorem 3.24. The problem can thus be recast in
terms of A2 (which has the virtue of exhibiting no correlations, thus simplify-
ing the subsequent proof), i.e.
� �−1 � �
H −1
XA−1 XH = YA−1 2 Y , (3.485)
1
where Y = XΣ 2 . It follows that
� �−1 � �−1
XΣ−1 X = YYH . (3.486)

Performing the SVD of Y according to (2.115), we have


� � � �
Y = U S 0 VH = US IK 0 VH . (3.487)

Therefore, we have
� �
H −1
YA−1
2 Y
� � � �−1
� � IK
= US IK 0 V H
A−1
2 V SU H
0
� � ��−1
� �
H −1
� � IK
= SU IK 0 V H
A−1
2 V (US)−1
0
� � ��−1
� � IK
= US−1 IK 0 D−1 S−1 UH , (3.488)
0

where D = VH A2 V and, according to theorem 3.24, its PDF is also CW M (N ,


IM ).
It is clear that the product inside the parentheses above yields the upper-left
K × K partition of D−1 . According to lemma 2.1, this is equivalent to
� �
H −1
YA−1
2 Y = US−1 D11.2 S−1 UH , (3.489)
Probability and stochastic processes 191

where D11.2 = D11 − D12 D−122 D21 which, by virtue of theorem 3.27, follows
CW K (N
� − M + K, IK ).−1It immediately
� follows that US−1 D11.2 S−1 UH is
CW K N − M + K, US S U where−1

� �−1
US−1 S−1 UH = YYH
� �−1
= XΣ−1 X , (3.490)

thus completing the proof.


A highly useful consequence of the above theorem follows.
Theorem 3.29. Given a matrix A ∼ CW M (N, Σ) and x is any random vector
independent of A such that P (x = 0) = 0, then

xH Σ−1 x
∼ χ22N −2M +2 ,
xH A−1 x
and is independent of x.
The proof is left as an exercise.

Problems
3.1. From the axioms of probability (see definition 3.8), demonstrate that the
probability of an event E must necessarily satisfy P (X) ≤ 1.
3.2. What is the link between the transformation technique of a random vari-
able used to obtain a new PDF and the basic calculus technique of variable
substitution used in symbolic integration?
3.3. Given your answer in 3.2, show that Jacobians can be used to solve multiple
integrals by permitting multivariate substitution.
3.4. Show that if Y follows a uniform distribution within arbitrary bounds a and
b (Y ∼ U (a, b)), then it can be found as a tranformation upon X ∼ U (0, 1).
Find the transformation function.
3.5. Propose a proof for the Wiener-Khinchin theorem (eq. 3.121).
3.6. Given that the characteristic function of a central 2 degrees-of-freedom chi-
square variate with variance P is given by
1
Ψx (jv) = ,
1 − jπvP
derive the infinite-series representation for the Rice PDF (envelope).
3.7. Demonstrate (3.317).
192 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

3.8. From (3.317), derive the finite sum form which applies if n2 is an integer.
3.9. Show that
� �
−M − 12 1 T −1
fx (x) = (2π) 2 det(Σ) exp − (x − µ) Σ (x − µ) . (3.491)
2
is indeed a PDF. Hint: Integrate over all elements of x to show that the area
under the surface of fx (x) is equal to 1. To do so, use a transformation of
random variables to decouple the multiple integrals.
3.10. Given two χ2 random variables Z1 and Z2 with PDFs
1
fZ1 (x) = xn1 −1 e−x ,
Γ(n1 )
1
fZ2 (x) = xn2 −1 e−x/2.5 ,
Γ(n2 )2.5n2
(a) What is the PDF of the ratio Y = Z1
Z2 ?
(b) If n1 = 2 and n2 = 2, derive the PDF of Y = Z1 + Z2 . Hint: Use
characteristic functions and expansions into partial fractions.
3.11. Write a proof for theorem 3.18.
3.12. If X = BY, where X and Y are N × M complex random matrices and B
is a fixed positive definite N × N matrix, prove that the Jacobian is given
by
(dX) = |det(B)|2M (dY). (3.492)

3.13. Show that if W = CW M (N, IM ), then its diagonal elements w11 , w22 ,
. . . , wM M are independent and identically distributed according to a χ2
law with 2M degrees of freedom.
3.14. Write a proof for theorem 3.29. Hint: start by studying theorems 3.25 and
3.26.

APPENDIX 3.A: Derivation of the Gaussian distribution


Given the binomial discrete probability distribution
„ «
N
g(x) = P (X = x) = px (1 − p)N −x , (3.A.1)
x

where x is an integer between 0 and N , we wish to show that it tends towards the Gaussian
distribution as N gets large.
In fact, Abraham de Moivre originally derived the Gaussian distribution as an approximation
to the binomial distribution and as a means of quickly calculating cumulative probabilities (e.g.
the probability that no more than 7 heads are obtained in 10 coin tosses). The first appearance
APPENDIX 3.A: Derivation of the Gaussian distribution 193

of the normal distribution and the associated CDF (the probability integral) occurs in a latin
pamphlet published by de Moivre in 1733 [Daw and Pearson, 1972]. This original derivation
hinges on the approximation to the Gamma function known as Stirling’s formula (which was,
in fact, discovered in a simpler form and used by de Moivre before Stirling). It is the approach
we follow here. The normal distribution was rediscovered by Laplace in 1778 as he derived the
central limit theorem, and rediscovered independently by Adrian in 1808 and Gauss in 1809 in
the process of characterizing the statistics of errors in astronomical measurements.
We start by expanding log(g(x)) into its Taylor series representation about its point xm
where g(x) is maximum. Since the relation between g(x) and log(g(x)) is monotonic, log(g(x))
is also maximum at x = xm . Hence, we have:
» –
d log(g(x))
log(g(x)) = log(g(xm )) + (x − xm ) +
dx x=xm
» –
1 d2 log(g(x))
(x − xm )2 +
2! dx2 x=xm
» –
1 d3 log(g(x))
(x − xm )3 + · · · ,
3! dx3 x=xm

X » –
1 dk log(g(x))
= (x − xm )k . (3.A.2)
k! dxk x=xm
k=0

Taking the logarithm of g(x), we find

log(g(x)) = log(N !) − log(x!) − log((N − x)!) + x log(p) + (N − x) log(1 − p). (3.A.3)

Since N is large by hypothesis and the expansion is about xm which is distant from both 0
and N (since it corresponds to the maximum of g(x)), thus making x and N − x also large,
Stirling’s approximation formula can be applied to the factorials x! and (N − x)! to yield

1
log(x!) ≈ log(2πx) + x(log x − 1)
2
≈ x(log(x) − 1). (3.A.4)

Thus, we have
d log(x!)
≈ (log(x) − 1) + 1 = log(x), (3.A.5)
dx
and

d log(N − x)! d
≈ [(N − x)(log(N − x) − 1)]
dx dx
−1
= −(log(N − x) − 1) + (N − x)
N −x
= − log(N − x), (3.A.6)

which, combined with (3.A.3), leads to

d log(g(x))
≈ − log(x) + log(N − x) + log(p) − log(1 − p). (3.A.7)
dx
194 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING

The maximum point xm can be easily found by setting the above derivative to 0 and solving
it for x. Hence,
„ «
p N − xm
log = 0
1 − p xm
p N − xm
= 1
1 − p xm
(N − xm )p = (1 − p)xm
xm (p + (1 − p)) = Np
xm = N p. (3.A.8)

We can now find the first coefficients in the Taylor series expansion. First, we have
» –
d log(g(x))
= 0, (3.A.9)
dx x=xm

by definition, since xm is the maximum.


Using (3.A.8), we have
» 2 –
d log(g(x)) 1 1 1
= − − =− (3.A.10)
dx2 x=xm x m N − xm N p(1 − p)
» 3 –
d log(g(x)) 1 1 1 − 2p
= − = 2 2 , (3.A.11)
dx3 x=xm x2m (N − xm )2 p N (1 − p)2

where we note that the 3rd coefficient is much smaller (by a factor proportional to N1 ) than the
second one as N and xm grow large.
Since their contribution is not significant, all terms in the Taylor expansion beyond k = 2
are neglected. Taking the exponential of (3.A.2), we find
(x−x m )2
− 2N p(1−p)
g(x) = g(xm )e , (3.A.12)

which is a smooth function of x. It only needs to be normalized to become a PDF, i.e.

fX (x) = Kg(x), (3.A.13)

where »Z ∞ –
K= g(x)dx , (3.A.14)
−∞
and
Z ∞ Z ∞ (x−xm )2

g(x)dx = g(xm )e 2N p(1−p) dx
−∞ −∞
Z ∞
− u2
= g(xm ) e 2N p(1−p) du
−∞
Z ∞
1 − v
= 2g(xm ) √ e 2N p(1−p) dv
0 2 v
p
= g(xm ) 2πN p(1 − p). (3.A.15)

Therefore, the distribution is


1 (x−µ)2

fX (x) = √ e 2σ2 , (3.A.16)
2πσ
REFERENCES 195

where the variance is


σ 2 = N p(1 − p), (3.A.17)
and the mean is given by
µ = xm = N p, (3.A.18)
since the distribution is symmetric about its peak, which is situated at xm = N p.

References

[Andersen,1958] T. W. Andersen, An introduction to multivariate statistical analysis. New


York: J. Wiley & Sons.
[Davenport, 1970] W. B. Davenport, Jr., Probability and random processes. New York:
McGraw-Hill.
[Ratnarajah et al., 2004] T. Ratnarajah, R. Vaillancourt and M. Alvo, “Jacobians and hyper-
geometric functions in complex multivariate analysis,” to appear in Can. Appl. Math.
Quaterly.
[Daw and Pearson, 1972] R. H. Saw and E. S. Pearson, “Studies in the history of probabil-
ity and statistics XXX : Abraham de Moivre’s 1733 derivation of the normal curve : a
bibliographical note,” Biometrika, vol. 59, pp. 677-680.
[Goodman, 1963] N. R. Goodman, “Statistical analysis based on a certain multivariate com-
plex Gaussian distribution,” Ann. Math. Statist., vol. 34, pp. 152-177.
[Knobloch, 1994] E. Knobloch, “From Gauss to Weierstrass: determinant theory and its eval-
uations,” in The intersection of history and mathematics, vol. 15 of Sci. Networks Hist.
Stud., pp. 51-66.
[Leon-Garcia, 1994] A. Leon-Garcia, Probability and random processes for electrical engi-
neering, 2nd ed. Addison-Wesley.
[James, 1954] A. T. James, “Normal multivariate analysis and the orthogonal group,” Ann.
Math. Statist., vol. 25, pp. 40-75.
[Muirhead, 1982] R. J. Muirhead, Aspects of mutivariate statistical theory. New York: J. Wiley
& Sons.
[Nakagami, 1960] M. Nakagami, “The m-distribution, a general formula of intensity distribu-
tion of rapid fading,” in Statistical Methods in Radio Wave Propagation, W. G. Hoffman,
Ed., Oxford: Pergamon.
[Prudnikov et al., 1986a] A. P. Prudnikov, Y. U. Brychkov and O. I. Marichev, Integrals and
Series, vol. 2: Elementary Functions, Amsterdam: Gordon and Breach.
[Prudnikov et al., 1986b] A. P. Prudnikov, Y. U. Brychkov and O. I. Marichev, Integrals and
Series, vol. 2: Special Functions, Amsterdam: Gordon and Breach.
[Papoulis and Pillai, 2002] A. Papoulis and S. U. Pillai, Probability, random variables, and
stochastic processes, 4th ed. New York: McGraw-Hill.
[Rice, 1944] S. O. Rice, ‘Mathematical analysis of random noise,” Bell Syst. Tech. J., vol. 23,
pp. 282-332.
[Rice, 1945] S. O. Rice, ‘Mathematical analysis of random noise,” Bell Syst. Tech. J., vol. 24,
pp. 46-156. [Reprinted with [Rice, 1944] in Selected papers on noise and stochastic pro-
cesses, N. Wax, ed. New York: Dover, pp. 133-294].

You might also like