Chapter 3
Chapter 3
Not only does God definitely play dice, but He sometimes confuses us by throwing them
where they cannot be seen.
—Stephen Hawking
Abstract This chapter aims to provide a cohesive overview of basic probability concepts,
starting from the axioms of probability, covering events, event spaces, joint and
conditional probabilities, leading to the introduction of random variables, dis-
crete probability distributions (PDP) and probability density functions (PDFs).
Then follows an exploration of random variables and stochastic processes in-
cluding cumulative distribution functions (CDFs), moments, joint densities, mar-
ginal densities, transformations and algebra of random variables. A number of
useful univariate densities (Gaussian, chi-square, non-central chi-square, Rice,
etc.) are then studied in turn. Finally, an introduction to multivariate statis-
tics is provided, including the exterior product (used instead of the conventional
determinant in matrix r.v. transformations), Jacobians of random matrix trans-
formations and culminating with the introduction of the Wishart distribution.
Multivariate statistics are instrumental in characterizing multidimensional prob-
lems such as array processing operating over random vector or matrix channels.
Throughout the chapter, an emphasis is placed on complex random variables,
vectors and matrices, because of their unique importance in digital communica-
tions.
91
92 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
1. Introduction
The theory of probability and stochastic processes is of central importance
in many core aspects of communication theory, including modeling of informa-
tion sources, of additive noise, and of channel characteristics and fluctuations.
Ultimately, through such modeling, probability and stochastic processes are in-
strumental in assessing the performance of communication systems, wireless
or otherwise.
It is expected that most readers will have at least some familiarity with this
vast topic. If that is not the case, the brief overview provided here may be
less than satisfactory. Interested readers who wish for a fuller treatment of
the subject from an engineering perspective may consult a number of classic
textbooks [Papoulis and Pillai, 2002], [Leon-Garcia, 1994], [Davenport, 1970].
The subject matter of this book calls for the development of a perhaps lesser-
known, but increasingly active, branch of statistics, namely multivariate statis-
tical theory. It is an area whose applications — in the field of communication
theory in general, and in array processing / MIMO systems in particular —
have grown considerably in recent years, mostly thanks to the usefulness and
polyvalence of the Wishart distribution. To supplement the treatment given
here, readers are directed to the excellent textbooks [Muirhead, 1982] and [An-
dersen,1958].
2. Probability
Experiments, events and probabilities
Definition 3.1. A probability experiment or statistical experiment consists
in performing an action which may result in a number of possible outcomes,
the actual outcome being randomly determined.
For example, rolling a die and tossing a coin are probability experiments. In
the first case, there are 6 possible outcomes while in the second case, there are
2.
Definition 3.2. The sample space S of a probability experiment is the set of
all possible outcomes.
In the case of a coin toss, we have
S = {t, h} , (3.1)
Probability and stochastic processes 93
and
2 1
P (F̄ ) =
= .
6 3
Definition 3.6. The sum or union of two events is an event that contains all
the outcomes in the two events.
For example,
P (X) ≤ 1.
Twin experiments
It is often of interest to consider two separate experiments as a whole. For
example, if one probability experiment consists of a coin toss with event space
S = {h, t}, two consecutive (or simultaneous) coin tosses constitute twin ex-
periments or a joint experiment. The event space of such a joint experiment is
therefore
S2 = {(h, h), (h, t), (t, h), (t, t)} . (3.11)
Let Xi (i = 1, 2) correspond to an outcome on the first coin toss and Yj (j =
1, 2) correspond to an outcome on the second coin toss. To each joint outcome
(Xi , Yj ) is associated a joint probability P (Xi , Yj ). This corresponds naturally
to the probability that events Xi and Yj occurred and it can therefore also be
written P (Xi ∩ Yj ). Suppose that given only the set of joint probabilities, we
wish to find P (X1 ) and P (X2 ), that is, the marginal probabilities of events X1
and X2 .
Definition 3.9. The marginal probability of an event E is, in the context of
a joint experiment, the probability that event E occurs irrespective of the other
constituent experiments in the joint experiment.
In general, a twin experiment has outcomes Xi , i = 1, 2, . . . , N1 , for the
first experiment and Yj , j = 1, 2, . . . , N2 for the second experiment. If all the
Yj are mutually exclusive, the marginal probability of Xi is given by
N2
�
P (Xi ) = P (Xi , Yj ) . (3.12)
j=1
Now, suppose that the outcome of one of the two experiments is known, but
not the other.
Definition 3.10. The conditional probability of an event Xi given an event
Yj is the probability that event Xi will occur given that Yj has occured.
96 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Hence, the probability that X and Y occur is the probability that one of
these events occurs times the probability that the other event occurs, given that
the first one has occurred.
Furthermore, should events X and Y be independent, the above reduces to
(special multiplicative rule):
3. Random variables
In the preceding section, it was seen that a statistical experiment is any op-
eration or physical process by which one or more random measurements are
made. In general, the outcome of such an experiment can be conveniently
represented by a single number.
Let the function X(s) constitute such a mapping, i.e. X(s) takes on a value
on the real line as a function of s where s is an arbitrary sample point in the
sample space S. Then X(s) is a random variable.
Definition 3.14. A function whose value is a real number and is a function of
an element chosen in a sample space S is a random variable or r.v..
Given a die toss, the sample space is simply
S = {1, 2, 3, 4, 5, 6} . (3.20)
which is an r.v. that takes on a value of 1 if the die roll is even and -1 otherwise.
Random variables need not be defined directly from a probability experi-
ment, but can actually be derived as functions of other r.v.’s. Going back to
example 3.2, and letting D1 (s) = s be an r.v. associated with the first die and
D2 (s) = s with the second die, we can define the r.v’s
X = D1 + D2 , (3.22)
Y = |D1 − D2 |, (3.23)
where X is an r.v. corresponding to the sum of the dice and defined as X(sX ) =
sX , where sX is an element of the sample space SX = {2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12}. The same holds for Y , but with respect to sample space
SY = {0, 1, 2, 3, 4, 5}.
Example 3.3
Consider a series of N consecutive coin tosses. Let us define the r.v. X as
being the total number of heads obtained. Therefore, the sample space of X is
SX = {0, 1, 2, 3, . . . , N } . (3.24)
Probability distribution
Since each value that a discrete random variable can assume corresponds to
an event or some quantification / mapping / function of an event, it follows that
each such value is associated with a probability of occurrence.
Probability and stochastic processes 99
x 0 1 2 3 4
P (X = x) 1
16
4
16
6
16
4
16
1
16
S = {tttt, ttth, ttht, tthh, thtt, thth, thht, thhh, httt, htth, htht,
hthh, hhtt, hhth, hhht, hhhh} , (3.25)
1
and each of its sample points is equally likely with probability 16 . Only one of
these outcomes (tttt) has no heads; it follows that P (X = 0) = 16 1
. However,
four outcomes have one head and three tails (httt, thtt, ttht, ttth), leading to
to P (X = 1) = 16 4
. Furthermore, there are 6 possible combinations of 2 heads
and 2 tails, yielding P (X = 2) = 16 6
. By symmetry, we have P (X = 3) =
P (X = 1) = 16 and P (X = 4) = P (X = 0) = 16
4 1
.
Knowing that the number of combinations of N distinct objects taken n at
a time is � �
N N!
= , (3.26)
n n!(N − n)!
the probabilities tabulated above (for N = 4) can be expressed with a single
formula, as a function of x:
� �
4 1
P (X = x) = , x = 0, 1, 2, 3, 4. (3.27)
x 16
Such a formula constitutes the discrete probability distribution of X.
Suppose now that the coin is not necessarily fair and is characterized by
a probability p of getting heads and a probability 1 − p of getting tails. For
arbitrary N , we have
� �
N
P (X = x) = px (1 − p)N −x , x ∈ {0, 1, . . . , N } , (3.28)
x
which is known as the binomial distribution.
The underlying r.v. is of the continuous variety, and Figure 3.1d is a graph-
ical representation of its probability density function (PDF). While a PDF
cannot be tabulated like a discrete probability distribution, it certainly can be
expressed as a mathematical function. In the case of Figure 3.1d, the PDF is
expressed
2
1 − x2
fX (x) = √ e 2σX , (3.29)
2πσX
where σX 2 is the variance of the r.v. X. This density function is the all-
0.25
0.15
0.2
P (X = x)
P (X = x)
0.15
0.1
0.1
0.05
0.05
0 0
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
x x
(a) N = 8 (b) N = 16
0.14
0.14
0.12
0.12
0.1
0.1
P (X = x)
P (X = x)
0.08
0.08
0.06 0.06
0.04 0.04
0.02 0.02
0 0
1 5 10 15 20 25 30 33 0 5 10 15 20 25 30
x x
Figure 3.1. Histograms of the discrete probability function of the binomial distributions with
(a) N = 8; (b) N = 16; (c) N = 32; and (d) a Gaussian PDF with the same mean and variance
as the binomial distribution with N = 32.
where fX (x) is the PDF of variable X and FX (x) is its cumulative distribu-
tion function (CDF). A CDF has three outstanding properties:
1. FX (−∞) = 0 (obvious from (3.30));
2. FX (∞) = 1 (a consequence of property 2 of PDFs);
3. FX (x) increases in a monotone fashion from 0 to 1 (from (3.30) and prop-
erty 1 of PDFs).
Furthermore, the relation (3.30) can be inverted to yield
dFX (x)
fX (x) = . (3.31)
dx
It is also often useful to determine the probability that an r.v. X takes on a
value falling in an interval [x1 , x2 ]. This can be easily accomplished using the
PDF of X as follows:
� x2 � x2 � x1
P (x1 < X ≤ x2 ) = fX (x)dx = fX (x)dx − fX (x)dx
x1 −∞ −∞
= FX (x2 ) − FX (x1 ). (3.32)
This leads us to the following counter-intuitive observation. If the PDF is
continuous and we wish to find P (X = x1 ), it is insightful to proceed as
folllows:
N �
� �
N
fX (x) = pn (1 − p)N −n δ(x − n), (3.34)
n
n=0
and, for a general discrete r.v. with a sample space of N elements, we have
N
�
fX (x) = P (X = xn )δ(x − xn ). (3.35)
n=1
�� ��
� �
�X� = min �X − X̂ � . (3.37)
X̂
Probability and stochastic processes 103
For another angle to this concept, consider a series of N trials of the same
experiment. If each trial is unaffected by the others, the outcomes are asso-
ciated with a set of N independent and identically distributed (i.i.d.) r.v.’s
X1 , . . . , XN . The average of these outcomes is itself an r.v. given by
N
� Xn
Y = . (3.39)
N
n=1
As N gets large, Y will tend towards the mean of the Xn ’s and in the limit
N
� Xn
Y = lim = �X� , (3.40)
N →∞ N
n=1
� ∞
σX =
2
(x − µX )2 fX (x)dx, (3.45)
−∞
�X + Y � = �X� + �Y � ,
� � � �
X + X 2 = �X� + X 2 .
�αX� = α �X� .
These two properties stem readily from the (3.43) and the properties of in-
tegrals.
Theorem 3.1. Given the expectation of a product of random variables, it is
equal to the product of expectations, i.e.
�XY � = �X� �Y � ,
Proof. We can readily generalize (3.43) for the multivariable case as follows:
� ∞ � ∞
�g(X1 , X2 , · · · , XN )� = ··· g (X1 , X2 , · · · , XN ) ×
−∞
� �� −∞�
N -fold
fX1 ,··· ,XN (x1 , · · · xN ) dx1 · · · dxN . (3.47)
Therefore, we have
� ∞ � ∞
�XY � = xyfX,Y (x, y)dxdy, (3.48)
−∞ −∞
where, if and only if X and Y are independent, the density factors to yield
� ∞� ∞
�XY � = xyfX (x)fY (y)dxdy
−∞ −∞
� ∞ � ∞
= xfX (x)dx yfY (y)dy
−∞ ∞
= �X� �Y � . (3.49)
Y = g(X). (3.53)
Example 3.4
Let Y = AX + B where A and B are arbitrary constants. Therefore, we have
X = Y −B
A and � �
y−B
∂ A 1
=
∂y A
Probability and stochastic processes 107
.
It follows that
� �
1 y−B
fY (y) = fX . (3.58)
A A
Suppose that X follows a Gaussian distribution with a mean of zero and a
variance of σX
2 . Hence,
2
1 − x2
fX (x) = √ e 2σX , (3.59)
2πσX
and
(y−B)2
1 − 2 2
fY (y) = √ e 2A σX . (3.60)
2πσX A
is B.
X1 = h1 (Y1 , Y2 ),
(3.65)
X2 = h2 (Y1 , Y2 ),
∆x2
h1 , h2
∆y2
w α
∆x1 g1 , g2
v y1
and is the Jacobian J of the transformation. The Jacobian embodies the scaling
effects of the transformation and ensures that the new PDF will integrate to
unity.
Probability and stochastic processes 109
fY1 ,Y2 (y1 , y2 ) = JfX1 ,X2 (g1−1 (y1 , y2 ), g2−1 (y1 , y2 )). (3.68)
In the above, it has been assumed that the mapping between the original
and the new coordinate system was one-to-one. What if several regions in the
original domain map to the same rectangular area in the new domain? This
implies that the system
Y1 = h1 (X1 , X2 ),
(3.70)
Y2 = h2 (X1 , X2 ),
� � � � � �
(1) (1) (2) (2) (K) (K)
has several solutions x1 , x2 , x1 , x2 , . . . , x1 , x2 . Then all
these solutions (or roots) contribute equally to the new PDF, i.e.
K
� � � � �
(k) (k) (k) (k)
fY1 ,Y2 (y1 , y2 ) = fX1 ,X2 x1 , x2 J x1 , x2 . (3.71)
k=1
y = g(x), (3.72)
Y = X1 + X2 . (3.76)
One way to attack this problem is to fix one r.v., say X2 , and treat this as a
transformation from X1 to Y . Given X2 = x2 , we have
Y = g(X1 ), (3.77)
where
g(X1 ) = X1 + x2 , (3.78)
and
g −1 (Y ) = Y − x2 . (3.79)
Probability and stochastic processes 111
It follows that
∂g −1 (y)
fY |X2 (y|x2 ) = fX1 (y − x2 )
∂y
= fX1 (Y − x2 ). (3.80)
This is the general formula for the sum of two independent r.v.’s and it can
be observed that it is in fact a Fourier convolution of the two underlying PDFs.
It is common knowledge that a convolution becomes a simple multiplica-
tion in the Fourier transform domain. The same principle applies here with
characteristic functions and it is easily demonstrated.
Consider a sum of N independent random variables:
N
�
Y = Xn . (3.83)
n=1
What if the random variables are not independent and exhibit correlation?
Consider again the case of two random variables; this time, the problem is
conveniently addressed by an adequate joint transformation. Let
Y = g1 (X1 , X2 ) = X1 + X2 , (3.87)
Z = g2 (X1 , X2 ) = X2 , (3.88)
where Z is not a useful r.v. per se, but was included to allow a 2 × 2 transfor-
mation.
The corresponding Jacobian is
� �
� 1 0 �
J =� � � = 1. (3.89)
−1 1 �
Hence, we have
fY,Z (y, z) = fX1 ,X2 (y − z, z), (3.90)
and we can find the marginal PDF of Y with the usual integration procedure,
i.e.
� ∞
fY (y) = fY,Z (y, z)fZ (z)dz
−∞
� ∞
= fX1 ,X2 (y − x2 , x2 )fX2 (x2 )dx2 . (3.91)
−∞
This formula happens to be the Mellin convolution of fX1 (x) and fX2 (x)
and it is related to the Mellin tranform in the same way that Fourier convolution
is related to the Fourier transform (see the overview of Mellin transforms in
subsection 3).
Hence, given a product of N independent r.v.’s
N
�
Z= Xn , (3.95)
n=1
we can immediately conclude that the Mellin transform of fZ (z) is the product
of the Mellin transforms of the constituent r.v.’s, i.e.
N
�
MfZ (s) = MfX (s) = [MfX (s)]N . (3.96)
n=1
It follows that we can find the corresponding PDF through the inverse Mellin
transform. This is expressed
�
1
fZ (z) = MfZ (s) x−s dx, (3.97)
2π L±∞
where one particular integration path was chosen, but others are possible (see
chapter 2, section 3, subsection on Mellin transforms).
4. Stochastic processes
Many observable parameters that are considered random in the world around
us are actually functions of time, e.g. ambient temperature and pressure, stock
market prices, etc. In the field of communications, actual useful message sig-
nals are typically considered random, although this might seem counterintu-
itive. The randomness here relates to the unpredictability that is inherent in
useful communications. Indeed, if it is known in advance what the message
is (the message is predetermined or deterministic), there is no point in trans-
mitting it at all. On the other hand, the lack of any fore-knowledge about the
message implies that from the point-of-view of the receiver, the said message
is a random process or stochastic process. Moreover, the omnipresent white
noise in communication systems is also a random process, as well as the chan-
nel gains in multipath fading channels, as will be seen in chapter 4.
Definition 3.20. Given a random experiment with a sample space S, compris-
ing outcomes λ1 , λ2 , . . . , λN , and a mapping between every possible outcome
λ and a set of corresponding functions of time X(t, λ), then this family of
functions, together with the mapping and the random experiment, constitutes a
stochastic process.
114 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Definition 3.22. The set of all sample functions in a stochastic process is called
an ensemble.
λ = g2 (Y1 , Y2 , · · · , YN ), (3.100)
Example 3.5
Consider the stochastic process defined by
for any t, t1 , t2 and T is any arbitrary time shift, then the process is said to be
stationary in the wide-sense or wide-sense stationary.
In the field of communications, it is typically considered sufficient to satisfy
the wide-sense stationarity (WSS) conditions. Hence, the expression “station-
ary process” in much of the literature (and in this book!) usually implies WSS.
Gaussian processes constitute a special case of interest; indeed, a Gaussian
process that is WSS is also automatically stationary in the strict sense. This is
Probability and stochastic processes 117
by virtue of the fact that all high-order moments of the Gaussian distribution
are functions solely of its mean and variance.
Since the point of origin t1 becomes irrelevant, the autocorrelation function
of a stationary process can be specified as a function of a single argument, i.e.
.
Property 3.6. If X(t) is periodic, then RXX (τ ) is also periodic, i.e.
Unlike the ensemble statistics, the time-averaged statistics are random vari-
ables; their actual values depend on which member function is used for time
averaging.
Obtaining ensemble statistics requires averaging over all member functions
of the sample space S. This requires either access to all said member functions,
or perfect knowledge of all the joint PDFs characterizing the process. This
is obviously not possible when observing a real-world phenomenon behaving
according to a stochastic process. The best that we can hope for is access to
time recordings of a small set of member functions. This makes it possible to
compute time-averaged statistics.
The question that arises is: can a time-averaged statistic be employed as an
approximation of the corresponding ensemble statistic?
Not all processes are ergodic. It is also very difficult to determine whether a
process is ergodic in the strict sense, as defined above. However, it is sufficient
in practice to determine a limited form of ergodicity, i.e. with respect to one
or two basic statistics. For example, a process X(t) is said to be ergodic in
the mean if µX = MX . Similarly, a process is said to be ergodic in the au-
tocorrelation if RXX = RXX . An ergodic process is necessarily stationary,
although stationarity does not imply ergodicity.
fX(t1,1 ),X(t1,2 ),··· ,X(t1,M ),Y (t2,1 ),Y (t2,2 ),··· ,Y (t2,N ) (x1 , x2 , . . . , xM , y1 , y2 , . . . , yN ) .
X(t) Y (t)
h(t)
Figure 3.3. Linear system with impulse response h(t) and stochastic process X(t) at its input.
a different spectrum X(f, λ) for every member function X(t, λ). However,
stochastic processes are in general infinite energy signals which implies that
their Fourier transform in the strict sense does not exist. In the time domain, a
process is characterized essentially through its mean and autocorrelation func-
tion. In the frequency domain, we resort to the power spectral density.
Definition 3.35. The power spectral density (PSD) of a random process X(t)
is a spectrum giving the average (in the ensemble statistic sense) power in the
process at every frequency f .
The PSD can be found simply by taking the Fourier transform of the auto-
correlation function, i.e.
� ∞
SXX (f ) = RXX (τ )e−j2πf τ dτ, (3.121)
−∞
which obviously implies that the autocorrelation function can be found from
the PSD SXX (f ) by performing an inverse transform, i.e.
� ∞
RXX (τ ) = SXX (f )ej2πf τ df. (3.122)
−∞
SY Y (f ) = H(f ) ∗ H ∗ (f )SXX (f )
= |H(f )|2 SXX (f ). (3.125)
Hence, as with any discrete Fourier transform, the PSD SXX (f ) is periodic.
More precisely, we have SXX (f ) = SXX (f + n) where n is any integer.
Given a discrete-time linear time-invariant system with a discrete impulse
response h[k] = h(tk ), the output process Y [k] of this system when a process
X[k] is applied at its input is given by
∞
�
Y [k] = h[n]X[k − n], (3.136)
n=−∞
Probability and stochastic processes 125
Cyclostationarity
The modeling of signals carrying digital information implies stochastic pro-
cesses which are not quite stationary, although it is possible in a sense to treat
them as stationary and thus obtain the analytical convenience associated with
this property.
Definition 3.38. A cyclostationary stochastic process is a process with non-
constant mean (it is therefore not stationary, neither in the strict sense nor the
126 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
wide-sense) such that the mean and autocorrelation function are periodic in
time with a given period T .
Consider a random process
∞
�
S(t) = a[k]g(t − kT ), (3.140)
k=−∞
Binomial distribution
We have already seen in section 2 that a discrete random variable following
a binomial distribution is characterized by the PDF
N �
� �
N
fX (x) = pn (1 − p)N −n δ(x − n). (3.145)
n
n=0
We can find the CDF by simply integrating the above. Thus we have
� x N �
� �
N
FX (x) = pn (1 − p)N −n δ(α − n)dα
−∞ n
n=0
N
� � � � x
N
= pn (1 − p)N −n δ(α − n)dα, (3.146)
n −∞
n=0
where � �
x
1, if n < x
δ(α − n)dα = (3.147)
−∞ 0, otherwise
It follows that
�x� � �
� N
FX (x) = pn (1 − p)N −n , (3.148)
n
n=0
128 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where �x� denotes the floor operator, i.e. it corresponds to the nearest integer
which is smaller or equal to x.
The characteristic function is given by
� ∞� N � �
N
φX (jt) = pn (1 − p)N −n δ(x − n)ejtx dx
−∞ n
n=0
N
� � �
N
= pn (1 − p)N −n ejtn , (3.149)
n
n=0
which, according to the binomial theorem, reduces to
φX (jt) = (1 − p + pejt )N . (3.150)
Furthermore, the Mellin transform of the binomial PDF is given by
� ∞� N � �
N
MfX (s) = pn (1 − p)N −n δ(x − n)xs−1 dx
−∞ n
n=0
N
� � �
N
= pn (1 − p)N −n ns−1 , (3.151)
n
n=0
which implies that the kth moment is
� � � N � �
N
X =k
pn (1 − p)N −n nk . (3.152)
n
n=0
If k = 1, we have
N �
� �
N
�X� = pn (1 − p)N −n n
n
n=0
�N
N!
= pn (1 − p)N −n
(N − n)!(n − 1)!
n=0
N� −1 � �
N −1 �
= Np � pn (1 − p)N −1−n (3.153)
n
n� =−1
� �
N −1
where n = n − 1 and, noting that
� = 0, we have
−1
N
� −1 � �
N −1 � �
�X� = N p � pn (1 − p)N −1−n
�
n
n =0
= N p(1 − p + p)N −1
= N p. (3.154)
Probability and stochastic processes 129
Likewise, if k = 2, we have
N �
� �
� � N
X 2
= pn (1 − p)N −n n2
n
n=0
N
� −1 � �
N −1 � �
= Np pn (1 − p)N −1−n (n� + 1)
n�
n� =0
� N −2 � �
� N −2 �� ��
= N p (N − 1)p pn (1 − p)N −2−n +
n��
n�� =0
N −1 � � �
� N −1 n� N −1−n�
p (1 − p) , (3.155)
n�
n� =0
Uniform distribution
A continuous r.v. X characterized by a uniform distribution can take any
value within an interval [a, b] and this is denoted X ∼ U (a, b). Given a smaller
interval [∆a, ∆b] of width ∆ such that ∆a ≥ a and ∆b ≤ b, we find that, by
definition
∆
P (X ∈ [∆a , ∆b ]) = P (∆a ≤ X ≤ ∆b ) = , (3.158)
b−a
regardless of the position of the interval [∆a , ∆b ] within the range [a, b].
The PDF of such an r.v. is simply
� 1
fX (x) = b−a , if x ∈ [a, b], (3.159)
0, elsewhere,
Gaussian distribution
From Appendix 3.A and Example 3.4, we know that the PDF of a Gaussian
r.v. is
(x−µX )2
1 − 2
fX (x) = √ e 2σX , (3.166)
2πσX
where σX 2 is the variance and µ is the mean of X.
X
The corresponding CDF is
� x
FX (x) = fX (α)dα
−∞
�
1 x α−µX
−
2σ 2
= √ e X dα, (3.167)
2πσX −∞
Probability and stochastic processes 131
α−µ
which, with the variable substitution u = √ X,
2σX
becomes
� x−µ
√ X
1 2σX 2
FX (x) = √ e−u du
π
�−∞ � ��
1
1 + erf x−µ
√ X , if x − µX > 0,
2�
� 2σX ��
|x−µX |
= 2 1 − erf
1 √ (3.168)
� 2σX�
= 1 erfc |x−µ |
2
√ X
2σ
otherwise,
X
where erf(·) is the error function and erfc(·) is the complementary error func-
tion.
Figure 3.4, shows the PDF and CDF of the Gaussian distribution.
0.12
0.1
0.08
fX (x)
0.06
0.04
0.02
0
−15 −10 −5 0 5 10 15
x
(a) Probability density function fX (x)
1
P (X ≤ x) = FX (x)
0.8
0.6
0.4
0.2
0
−15 −10 −5 0 5 10 15
x
(b) Cumulative density function fX (x)
Figure 3.4. The PDF and CDF of the real Gaussian distribution with zero mean and a variance
of 10.
132 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
� � � (x−µX )2
∞
1 −
2σ 2
(X − µX ) k
= (x − µX ) √ k
e X , (3.170)
−∞ 2πσX
(x−µX )2
where the integral can be reduced by substituting u = 2
2σX
to yield
� � �k�
2 2 � � � ∞ k−1
2σX
(X − µX )k = √ 1 − (−1)k+1
u 2 e−u du, (3.171)
π 0
The raw moments are best obtained as a function of the central moments.
For the k th raw moment, assuming k an integer, we have:
� � � (x−µX )2
∞
1 −
2σ 2
X k
= x √ e k X dx. (3.176)
−∞ 2πσX
Probability and stochastic processes 133
The Mellin transform does not follow readily because the above assumes
that k is an integer. The Mellin transform variable s, on the other hand, is typ-
ically not an integer and can assume any value in the complex plane. Further-
more, it may sometimes be useful to find fractional moments. From (3.176),
we have
∞ � �
� s−1 � � s−1 � �
X = µs−1−n
X (X − µX )s−1 , (3.178)
n
n=0
where the summation now runs form 0 to ∞ to account for the fact that s
might not be an integer. If it is an integer, terms for n > s − 1 will be nulled
because the combination operator has (s − 1 − n)! at the denominator and the
factorial of a negative integer is infinite. It follows that the above equation is
true regardless of the value of s.
Applying the identity (2.197), we find
∞
� � � Γ(1 − s + n) s−1−n � �
X s−1 = µX (X − µX )s−1 , (3.179)
Γ(1 − s)n!
n=0
which, by virtue of 3.173 and bearing in mind that the odd central moments
are null, becomes
∞ � 2n� � �
� s−1 � � Γ(1 − s + 2n� ) s−1−2n� 2n σX � 1
X = µ √ Γ n + . (3.180)
�
Γ(1 − s)(2n� )! X π 2
n =0
The Gamma function and the factorial in 2n� can be expanded by virtue of
property 2.68 to yield
∞ � � � � � � 2n�
� s−1 � � Γ n� + 1−s Γ n + 1 − 2s 2n σX
X = 2
. (3.181)
�
Γ(1 − s)n� !
n =0
134 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Expanding Γ(1 − s) in the same way allows the formation of two Pochham-
mer functions by combination with the numerator Gamma functions; this re-
sults in
∞ � 1−s � � �
� s−1 � �
2 n � 1 − s
2 n� n� 2n�
X = 2 σX
n!
n� =0
� �
1−s s
= 2 F0 , 1 − ; 2σX 2
. (3.182)
2 2
N
� t2 σi2
φY (jt) = ejtµi − 2
i=1
�N �� N �
� � 2σ2
−t
= ejtµi
e i
2
i=1 i=1
P t2 PN
jt N − 2
= e µ
i=1 i 2 i=1 σi , (3.184)
�N
which, of course, is the C. F. of a Gaussian r.v. with mean i=1 µi and vari-
�
ance N i=1 σi .
2
Z = A + jB, (3.185)
Probability and stochastic processes 135
where A and B are real Gaussian variates. It was found in [Goodman, 1963]
that such a complex normal r.v. has desirable analytic properties if
� � � �
(A − µA )2 = (B − µB )2 , (3.186)
�(A − µA )(B − µB )� = = 0. (3.187)
The latter implies that A and B are uncorrelated and independent.
Furthermore, the variance of Z is defined
� �
σZ2 = |Z − µZ |2
= �(Z − µZ )(Z − µZ )∗ � , (3.188)
where µZ = µA + jµB . Hence, we have
σZ2 = �[(A − µA ) + j(B − µB )] [(A − µA ) − j(B − µB )]�
� � � �
= (A − µA )2 + (B − µB )2
= σA
2
+ σB
2
= 2σA .
2
(3.189)
It follows that A and B obey the same marginal distribution which is
(a−µA )2
1 −
2σ 2
fA (a) = √ e A . (3.190)
2πσA
Hence, the joint distribution of A and B is
(a−µA )2 (b−µB )2
1 −
2σ 2
−
2σ 2
fA,B (a, b) = e A e B
2πσA σB
(a−µA )2 +(b−µB )2
1 − 2σ 2
= 2 e
A , (3.191)
2πσA
which directly leads to
(z−µZ )(z−µZ )∗
1 − σ2
fZ (z) = e Z . (3.192)
πσZ2
Figure 3.5 shows the PDF fZ (z) plotted as a surface above the complex
plane.
While we can conveniently express many complex PDFs as a function of a
single complex r.v., it is important to remember that, from a formal standpoint,
a complex PDF is actually a joint PDF since the real and imaginary parts are
separate r.v.’s. Hence, we have
fZ (z) = fA,B (a, b) = fRe{Z},Im{Z} (Re{z}, Im{z}). (3.193)
136 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
0.3
fZ (z) 0.2 2
0.1 1
0
−2 0
Re{z}
−1
0 −1
Im{z} 1
2 −2
Figure 3.5. 2
The bidimensional PDF of unit variance Z (σZ = 1) on the complex plane.
The fact that there are two underlying r.v.’s becomes inescapable when we
consider the CDF of a complex r.v. Indeed, the event Z ≤ z is ambiguous.
Therefore, the CDF must be expressed as a joint CDF of the real and imaginary
parts, i.e.
which, provided that the CDF is differentiable along its two dimensions, can
lead us back to the complex PDF according to
∂2
fZ (z) = fRe{Z},Im{Z} (a, b) = F (a, b). (3.195)
∂a∂b Re{Z},Im{Z}
Accordingly, the C. F. of a complex r.v. Z is actually a joint C.F. defined by
� �
φRe{Z},Im{Z} (jt1 , jt2 ) = ejt1 Re{Z}+jt2 Im{Z} . (3.196)
Rayleigh distribution
The Rayleigh distribution characterizes the amplitute of a narrowband noise
process n(t), as originally derived by Rice [Rice, 1944], [Rice, 1945]. In
the context of wireless communications, however, the Rayleigh distribution is
most often associated with multipath fading (see chapter 4).
Probability and stochastic processes 137
Consider that the noise process n(t) is made up of a real and imaginary part
(as per complex baseband notation — see chapter 4):
N (t) = A(t) + jB(t), (3.198)
and that N (t) follows a central (zero-mean) complex Gaussian distribution. It
follows that if N (t) is now expressed in phasor notation, i.e.
N (t) = R(t)ejΘ(t) , (3.199)
where R(t) is the modulus (amplitude) and Θ(t) is the phase, their respective
PDFs can be derived through transformation techniques.
We have 2 2
1 − a σ+b2
fA,B (a, b) = 2 e
N , (3.200)
πσN
2 = 2σ 2 = 2σ 2 and the transformation is
where σN A B
�
R = g1 (A, B) = A2 + B 2 , R ∈ [0, ∞], (3.201)
� �
B
Θ = g2 (A, B) = arctan , Θ ∈ [0, 2π), (3.202)
A
being the conversion from cartesian to polar coordinates. The inverse transfor-
mation is
A = g1−1 (R, Θ) = R cos Θ, A ∈ [−∞, ∞] (3.203)
B = g2−1 (R, Θ) = R sin Θ, B ∈ [−∞, ∞]. (3.204)
Hence, the Jacobian is given by
� ∂R cos Θ ∂R sin Θ � � �
cos Θ sin Θ
J = det ∂R cos Θ ∂R sin Θ = det
∂R ∂R
∂Θ ∂Φ
−R sin Θ R cos Θ
= R(cos Θ + sin Θ) = R.
2 2
(3.205)
It follows that
2
r − σr2
fR,Θ (r, θ) = 2 e
N u(r) [u(θ) − u(θ − 2π)] , (3.206)
πσN
and
� ∞
1
fΘ (θ) = fR,Θ (r, θ)dr =
[u(θ) − u(θ − 2π)] , (3.207)
0 2π
2
� 2π −r � 2π
re σn2
fR (r) = fR,Θ (r, θ)dθ = 2 u(r) dθ
0 πσN 0
2
2r − σr2
= 2 e
N u(r). (3.208)
σN
138 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Rice distribution
If a signal is made up of the addition of a pure sinusoid and narrowband
noise, i.e. its complex baseband representation is
S(t) = C + A(t) + jB(t), (3.215)
where C is a constant and N (t) = A(t) + jB(t) is the narrowband noise
(as characterized in the previous subsection), then the envelope R(t) follows a
Rice distribution.
Probability and stochastic processes 139
0.8
2 =1
σN
0.6
fR (r) 2 =2
σN
0.4
0.2 2 =4
σN
0
0 2 4 6 8 10
r
(a) Probability density function fR (r)
0.8
P (R ≤ r) = FR (r)
2 =1
σN
0.6
2 =2
σN
0.4 2 =3
σN
2 =4
σN
0.2
2 =6
σN
0
0 2 4 6 8 10
r
(b) Cumulative density function FR (r)
Figure 3.6.
` 4−π
2
´ The PDF and CDF of the Rayleigh distribution with various variances σR =
2
σN 4 .
It follows that the Jacobian is identical to that found for the Rayleigh distri-
bution, i.e. J = R.
140 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Therefore, we have
(r cos θ−C)2 +r 2 sin2 θ
r − σ2
fR,Θ (r, θ) = 2 e
N
πσN
2 −2rC cos θ+C 2
r −r σ2
= 2 e
N , (3.220)
πσN
and the marginal distribution fR (r) is given by
� 2π 2 θ+C 2
r − r −2rCσcos 2
fR (r) = 2 e
N dθ (3.221)
0 πσN
2 2 �
2π 2rC cos θ
r − r σ+C 2 2
= 2 e N e σN dθ, (3.222)
πσN 0
where the remaining integral can be reduced (by virtue of the definition of the
modified Bessel function of the first kind) to yield
2 2 � �
r − r σ+C2 2rC
fR (r) = 2 e N 2I0
2 u(r). (3.223)
σN σN
Finding the marginal PDF of Θ is a bit more challenging:
� ∞ 2 θ+C 2
r − r −2rCσcos
2
fΘ (θ) = 2 e N dr (3.224)
0 πσN
2
− C2 � 2 −2rC cos θ
σ ∞ −r
e N
σ2
= 2 re N dr. (3.225)
πσN 0
According to [Prudnikov et al., 1986a, 2.3.15-3], the above integral reduces
to
− C2
2
� � � √ �
e σ
N C 2 cos2 θ 2C cos θ
fΘ (θ) = exp D−2 − , (3.226)
2π 2σN2 σN
where we can get rid of D−2 (z) (the parabolic cylinder function of order -2)
by virtue of the following identity [Prudnikov et al., 1986b, App. II.8]:
� � �
π z2 z z2
D−2 (z) = ze 4 erfc √ + e− 4 (3.227)
2 2
to finally obtain
− C2
2
�√ 2 cos2 θ � � �
e σ
N πC cos(θ) C σ2
C cos θ
fΘ (θ) = e N erfc − +1
2π σN σN
× [u(θ) − u(θ − 2π)] . (3.228)
Probability and stochastic processes 141
In the derivation, we have assumed (without loss of generality and for ana-
lytical convenience) that the constant term was real where in fact, the constant
term can have a phase (Cejθ0 ). This does not change the Rice amplitude PDF,
but displaces the Rice phase distribution so that it is centered at θ0 . Therefore,
in general, we have
− C2
2
� √ C 2 cos2 (θ−θ )
e σ
N πC cos(θ − θ0 ) σ2
0
fΘ (θ) = 1+ e N × (3.229)
2π σN
� ��
C cos (θ − θ0 )
erfc − [u(θ) − u(θ − 2π)] .
σN
It is noteworthy that the second raw moment (a quantity of some importance
since it reflects the average power of Rician process R(t)) can easily be found
without integration. Indeed, we have from (3.216)
� 2� � �
R = (C + A)2 + B 2
� �
= C 2 + 2AC + A2 + B 2
� � � �
= C 2 + A2 + B 2
= C 2 + σN
2
. (3.230)
Y = X 2, (3.233)
142 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
�{Z}
θ0 θ Cejθ
0
�{Z}
0
(a) Geometry of the Rice amplitude and phase distributions in the complex
plane
1.75
1.75
1.5 K = 10 K = 10
1.5
1.25
1.25 K=2
K=2
fR (r)
1
fΘ (θ)
1
K= 1 K=1
16
0.75 K=1 0.75
0.5 0.5
K= 1
16
0.25 0.25
0 0
−3 −2 −1 0 1 2 3 0 0.5 1 1.5 2 2.5
θ − θ0 (radians) r(m)
Figure 3.7. The Rice distributions and their relationship with the complex random variable
Z = (C cos(θ) + A) + j (C sin(θ) + B); (a) the distribution of Z in the complex plane (a
non-central complex Gaussian distribution pictured here with concentric isoprobability curves)
and relations with variables R and θ; (b) PDF of R; (c) PDF of Θ.
Going back now to the general case of N degrees of freedom, we find that
N
� � �−1/2 � �
2 −N/2
φY (jt) = 1 − 2jtσX
2
= 1 − 2jtσX . (3.237)
n=1
The PDF can be obtained by taking the transform of the above, i.e.
� ∞
1 � �
2 −N/2 −jty
fY (y) = 1 − 2jtσX e dt, (3.238)
2π −∞
which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to
1 2
fY (y) = � �N/2 y N/2−1 e−y/2σX u(y). (3.239)
2σX
2 Γ (N/2)
The shape of the chi-square PDF for 1 to 10 degrees-of-freedom is shown
in Figure 3.8. It should be noted that for N = 2, the chi-square distribution
reduces to the simpler exponential distribution.
The chi-square distribution can also be derived from complex Gaussian
r.v.’s.
Let
�N
Y = |Zn |2 , (3.240)
n=1
where {Z1 , Z2 , . . . , ZN } are i.i.d. complex Gaussian variates with variance
σZ2 .
If we start again with the case N = 1, we find that the C.F. is
� � � � � � � �−1
φY (jt) = ejtY = ejt|Z1 | = ejt(A +B ) = 1 − jtσZ2
2 2 2
, (3.241)
144 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
2
N = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
1.5
fY (y)
0.5
0
0 2 4 6 8 10
y
Figure 3.8. The chi-square PDF for values of N between 1 and 10.
N
�
|µn | =
� 0, (3.248)
n=1
Yn = Xn2 , (3.249)
where
−(x−µ )2
1 2
n
fXn (x) = √ e 2σX . (3.250)
2πσX
√
According to (3.57) and since Xn = ± Yn , we find that
� −(√y−µ )2 √
−( y−µn )2
�
1 2σ 2
n
2σ 2
1
fYn (y) = √ e X +e X √
2πσX 2 y
µ2
� √ √ �
1
yµn yµ
− y2 − n2 − 2n
σ2
= √ √ e Xe X e X +e X
2σ 2σ σ
2 2πσX y
µ2 �√ �
1 − y2 − n2 yµn
= √ √ e X e X cosh
2σ 2σ
2 . (3.251)
2πσX y σX
146 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where the resulting integral can be solved in a manner similar to that of sub-
section 6 to give
µn
−
2σ 2 ∞
� � �k
� � e X µ2k n (2σX
2 )k+1/2
1
ejtYn = √
2σX k (1 − j2σX2 t)k+1/2 4σX
2
k=0
2 µ2
n /2
e−µn /2σX 1−j2σ 2 t
= e X
(1 − j2σX
2 t)1/2
jtµ2
n
1−j2σ 2 t
e X
= . (3.254)
(1 − j2σX
2 t)1/2
jtU
2
e 1−jt2σX t
= , (3.256)
(1 − jt2σX 2 t)N/2
where
N
�
U= µ2n . (3.257)
n=1
Probability and stochastic processes 147
� 2
U/2σX
1 −U/2σ2 ∞ e−jyt 1−j2σ 2 t
= e X e X dt
2π −∞ (1 − j2σX t)
2 N/2
∞ � � � 2 � �k
1 −U/2σ2 � 1 ∞ e−jyt U/ 2σX
= e X dt
2π k! −∞ (1 − j2σX 2 t)N/2 1 − j2σ 2 t
X
k=0
∞ � 2�k� ∞
1 −U/2σ2 � [U/ 2σX ] e−jyt
= e X dt
2π k! −∞ (1 − j2σ 2 t)N/2+k
X
k=0
(3.258)
Like in the central case, the non-central χ2 distribution can be defined from
complex normal r.v.’s. Given the sum of the moduli of N squared complex
normal r.v.’s with the same variance σZ2 and means {µ1 , µ2 , . . . , µN }, the PDF
is given by
− U
σ2
� � �
e Z y N −1 − y
�Uy
fY (y) = e σ2
Z 0 F1
�
N � 4 u(y). (3.263)
σZ2N σZ
It follows that the corresponding C.F. is
jtU
2
e 1−jσZ t
φY (jt) = � �N . (3.264)
1 − jσZ2 t
where the terms for m > k are zero because they are characterized by a
Gamma function at the denominator (Γ(1 + k − m)) with an integer argu-
ment which is zero or negative. Indeed, such a Gamma function has an infinite
magnitude.
Probability and stochastic processes 149
Therefore, we have
� � k �
� �� �m
k U/σZ2
Y k
= σZ2k Γ(N + k) . (3.269)
m m!
m=0
From this finite sum representation, the mean is easily found to be
� �
U
�Y � = σZ N 1 + 2
2
. (3.270)
σZ N
Likewise, the second raw moment is
� � � �
� 2� 2U U2
Y = σZ (N + 1) N + 2 + 4 .
4
(3.271)
σZ σZ
It follows that the variance is
�
�
σY2 = Y 2 − �Y �2
� �
2U
= σZ N + 2 .
4
(3.272)
σZ
Non-central F -distribution
Consider
Z1 /n1
Y =, (3.273)
Z2 /n2
where Z1 is a non-central χ2 variate with n1 complex degrees-of-freedom and
a non-centrality parameter U , and Z2 is a central χ2 variate with n2 complex
degrees-of-freedom, then Y is said to follow a non-central F -distribution pa-
rameterized on n1 , n2 and U .
Given the distributions
1
fZ1 (x) = xn1 −1 e−x e−U 0 F1 (n1 |U x ) u(x), (3.274)
Γ(n1 )
1
fZ2 (x) = xn2 −1 e−x u(x), (3.275)
Γ(n2 )
we apply two simple univariate transformations:
Z1 n2
A1 = , A2 = , (3.276)
n1 Z2
such that Y = A1 A2 .
It is straightforward to show that
nn1 1 n1 −1 −n1 x −U
fA1 (x) = x e e 0 F1 (n1 |U n1 x ) u(x), (3.277)
Γ(n1 )
nn2 2 −n2 −1 − n2
fA2 (x) = x e x u(x). (3.278)
Γ(n2 )
150 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
It then becomes possible to compute the PDF of Y through the Mellin con-
volution, i.e.
� �y�
∞
1
fY (y) = fA2 fA1 (x) dx, y ≥ 0
0 x x
� ∞ n2 � �n2 +1 n x n1
n2 x − 2 n1
= e y xn1 −2 e−n1 x e−U ×
0 Γ(n2 ) y Γ(n1 )
0 F1 (n1 |U n1 x ) dx
� ∞ “ ”
nn2 2 nn1 1 e−U n2
n1 +n2 −1 − y +n1 x
= x e ×
y n2 +1 Γ(n1 )Γ(n2 ) 0
0 F1 (n1 |U n1 x ) dx, (3.279)
∞ �
�
�k
− U
σ2
U/σZ2
fZ1 (z) = e Z hn1 +k (z, σZ2 ), (3.282)
k!
k=0
where we have included the variance parameter σZ2 (being the variance of the
underlying complex gaussian variates; it was implicitely equal to 1 in the pre-
σ2
ceding development leading to (3.280)) and hn (z, σZ2 ) = g2n (z, 2Z ) is the
density in z of a central χ2 variate with n complex degrees-of-freedom and a
variance of the underlying complex Gaussian variates of σZ2 .
Probability and stochastic processes 151
0.6
0.5
0.4
fY (y)
0.3
0.2 U = 0, 1, 3, 5
0.1
0
0 2 4 6 8 10
y
Figure 3.9. The F-distribution PDF with n1 = n2 = 2 and various values of the non-centrality
parameter U .
Given that the PDF of Z2 is (3.275), we apply (for the sake of diversity with
respect to the preceding development) the following joint transformation:
n2 Z1
Y = ,
n1 Z2
(3.283)
X = Z2 .
The Jacobian is J = n1 x
n2 . It follows that the joint density of y and x is
� �
n1 x n1
fY,X (y, x) = fZ xy fZ2 (x)
n2 1 n2
�∞ � �m � �
n1 x − σU2 U/σ 2 Z n1
= e Z hn1 +m xy, σZ2 ×
n2 m! n2
m=0
1
xn2 −1 e−x
Γ(n2 )
„ «
∞ � �
2 m
− U2 � U/σZ
n y
n1 x −x 1+ 1 2
n2 −1
= x e n2 σ
Z e
σ
Z ×
n2 Γ(n2 )σZ2 m!
k=0
� n1 �n1 +m−1
n2 xy 1
u(x)u(y). (3.284)
σZ2 Γ (n1 + m)
152 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Γ (n1 + n2 + m)
= � �n1 +n2 +m . (3.286)
n1 y
1 + n σ2
2 Z
where
� �ν 1
� � 1 ν1 y ν1 −1
h(F )
ν1 ,ν2 y, σF2 = � �ν1 +ν2 u(y), (3.290)
B (ν1 , ν2 ) ν2 σF2 ν1 y
1+ 2
ν 2 σF
n1
y
n2 σ 2
Letting u = 1+
Z
n1 y , we find that
n2 σ 2
Z
n2 σZ2 u
y = , (3.292)
n1 (1 − u)
n1
dy = (1 − u)2 du, (3.293)
n2 σZ2
which leads to
� �−k−n1 −m
� 1 un1 +m+k−1 n1
2
n2 σZ
I = k+1−n2
du
0 (1 − u)
� �−k−n1 −m
n1
2
n2 σZ
= , n2 > k. (3.294)
B (k + m + n1 , −k + n2 )
Substituting (3.293) in (3.290), we find
� 2 �k
� � n2 σZ ∞ � �m
n1 − U2 � U/σZ2
Y k
= σ
e Z ×
Γ(n2 ) m!Γ(n1 + m)
m=0
Γ (k + m + n1 ) Γ (−k + n2 ) , n2 > k, (3.295)
1 The mean of the numerator is indeed ν times the variance of the underlying Gaussians by virtue of (3.245)
1
154 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
In the same fashion, we can show that the second raw moment is
�� � �
� 2 � n22 σZ4 1 U U2
Y = 2 2 + n1 (n1 + 1) + 4 .
n21 (n2 − 1)(n2 − 2) σZ σZ
(3.300)
It follows that the variance is
� �2
2 4
n + U
n2 σ Z 1 2
σZ U
σY = 2
2
+ 2 2 + n1 . (3.301)
n1 (n2 − 1)(n2 − 2) n2 − 1 σZ
Beta distribution
Let Y1 and Y2 be two independent central chi-square random variables with
n1 and n2 complex degrees-of-freedom, respectively and where the variance of
the underlying complex Gaussians is the same and is equal to σZ2 . Furthermore,
Probability and stochastic processes 155
we impose that
A2 = Y1 + Y2 , (3.302)
BA2 = Y1 . (3.303)
Then A and B are independent and B follows a beta distribution with PDF
Γ(n1 + n2 ) n1 −1
fB (b) = b (1 − b)n2 −1 [u(b) − u(b − 1)] . (3.304)
Γ(n1 )Γ(n2 )
As a first step in deriving this PDF, consider the following bivariate trans-
formation:
C1 = Y1 + Y2 , (3.305)
C2 = Y1 , (3.306)
T = C1 , (3.309)
BT = C2 , (3.310)
5
[2, 1], [3, 1], [4, 1], [5, 1]
4
[4, 2]
3
[4, 4]
fB (b)
[3, 2]
2 [2, 2]
0
0 0.2 0.4 0.6 0.8 1
b
Figure 3.10. The beta PDF for various integer combinations of [n1 , n2 ].
�
Γ(n1 + n2 ) x
FB (x) = P (B < x) = bn1 −1 (1 − b)n2 −1 db. (3.313)
Γ(n1 )Γ(n2 ) 0
� � �
Γ(n1 + n2 ) xn1 (1 − x)n2 −1 n2 − 1 x n1 n2 −2
FB (x) = + b (1 − b) db ,
Γ(n1 )Γ(n2 ) n1 n1 0
(3.314)
where it can be observed that the remaining integral is of the same form as the
original except that the exponent of b has been incremented while the exponent
of (1 − b) has been decremented. It follows that integration by parts can be
applied iteratively until the last integral term contains only a power of b and is
Probability and stochastic processes 157
Nakagami-m distribution
The Nakagami-m distribution was originally proposed by Nakagami [Nak-
agami, 1960] to model a wide range of multipath fading behaviors. Joining
the Rayleigh and Rice PDFs, the Nakagami-m distribution is one of the three
most encountered models for multipath fading (see chapter 4 for more details).
Unlike the other two, however, the Nakagami PDF was obtained through em-
pirical fitting with measured RF data. It is of interest that the Nakagami-m
distribution is very flexible, being controlled by its m parameter. Thus, it in-
cludes the Rayleigh PDF as a special case, and can be made to approximate
closely the Rician PDF.
The Nakagami-m PDF is given by
2 � m �m 2m−1 −m y
fY (y) = y e Ω u(y), (3.322)
Γ(m) Ω
where m is the distribution’s parameter which takes on any real value between
2 and ∞, and Ω is the second raw moment, i.e.
1
� �
Ω= Y2 . (3.323)
m2 − m
K= √ , m > 1, (3.326)
m − m2 − m
Probability and stochastic processes 159
(K + 1)2 K2
m= =1+ . (3.327)
2K + 1 2K + 1
2
m = 12 , 34 , 1, 32 , 2, 52 , 3
1.5
fY (y)
0.5
0
0 1 2 3 4 5 6
y
Figure 3.11. The Nakagami distribution for various values of the parameter m with Ω = 1.
Lognormal distribution
The lognormal distribution, while being analytically awkward to use, is
highly important in wireless communications because it characterizes very
well the shadowing phenomenon which impacts outdoor wireless links. In-
terestingly, other variates which follow the lognormal distribution include the
weight and blood pressure of humans, and the number of words in the sen-
tences of the works of George Bernard Shaw!
When only path loss and shadowing are taken into account (see chapter 4),
it is known that the received power Ω(dB) at the end of a wireless transmission
over a certain distance approximately follows a noncentral normal distribution,
i.e.
� �
1 (p − µP )2
fΩ(dB) (p) = √ exp − , (3.328)
2πσΩ 2σP2
where
Ω(dB) = 10 log10 (P ). (3.329)
160 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where η = ln(10)
10
.
This density, however, is very difficult to integrate. Nonetheless, the kth raw
moment can be found by using variable substitution to revert to the Gaussian
form in the integrand:
� � � ∞
P k
= xk fP (x)dx
0
� ∞ � �
η (10 log10 (p) − µP )2
= √ xk−1
exp − dx
2πσΩ 0 2σP2
� ∞ � �
1 (y − µP )2
= √ 10ky/10 exp − dy
2πσΩ −∞ 2σP2
� ∞ � �
1 (y − µP )2
= √ e ky/η
exp − dy
2πσΩ −∞ 2σP2
� ∞ � 2 �
η η (z − µP /η)2
= √ e exp −
kz
dz (3.331)
2πσΩ −∞ 2σP2
which, by virtue of [Prudnikov et al., 1986a, 2.3.15-11], yields
� � � 2 �
1 σP 2 µP
P = exp
k
k + k . (3.332)
2 η2 η
It follows that the variance is given by
� �
σP2 = P 2 − �P �2
� 2 � � 2 �
2σP 2µP 2 σP µP
= exp + − exp +
η2 η 2η 2 η
� 2 � � 2 �
2σP 2µP σP 2µP
= exp + − exp 2 +
η2 η η η
� �� � 2� �
2µP σ 2 σ
= exp + P2 exp P2 − 1 . (3.333)
η η η
6. Multivariate statistics
Significant portions of this section follow the developments in the first few
chapters of [Muirhead, 1982] with additions, omissions and alterations to suit
our purposes. One notable divergence is the emphasis on complex quantities.
Probability and stochastic processes 161
Random vectors
Given an M × M random vector x, its mean is defined by the vector
�x1 �
�x2 �
µ = �x� = .. . (3.334)
.
�xM �
The central second moments of x, analogous to the variance of a scalar
variate, are defined by the covariance matrix:
� �
Σx = [σmn ]m=1,··· ,M
n=1,··· ,M = (x − µ) (x − µ) ,
H
(3.335)
where
σmn = �(xm − µm )(xn − µn )∗ � . (3.336)
Lemma 3.1. An M ×M matrix Σx is a covariance matrix iff it is non-negative
definite (Σ ≥ 0).
Proof. Consider the variance of aH x where a is considered constant. We have:
� �
Var(aH x) = aH (x − µx ) (x − µx )H a
� �
= aH (x − µx ) (x − µx )H a
= aH Σx a. (3.337)
� H �2
The quantity �a (x − µx )� is by definition always non-negative. The vari-
ance above is the expectation of this expression and is therefore also non-
negative. It follows that the quadratic form aH Σx a ≥ 0, proving that Σx
is non-negative definite.
The above does not rule out the possibility that aH Σx a = 0 for some vector
a. This is only possible if there is some linear combination of the elements of
x, defined by vector a, which always yields 0. This is in turn only possible
if all elements of x are fully defined by x1 , i.e. there is only a single random
degree of freedom. In such a case, the complex vector x is constrained to a
fixed 2D hyperplane in R2M . If x is real, then it is constrained to a line in RM .
The proof is almost identical to the real case and is left as an exercise.
Theorem 3.4. If x ∼ CN M (u, Σ) and A is a fixed P × M matrix and b is a
fixed K × 1 vector, then
� �
y = Bx + b ∼ CN P Bu + b, BΣBH . (3.349)
Proof. From definition 3.40, it is clear that any linear combination of the el-
ements of a Gaussian vector, real or complex, is a univariate Gaussian r.v. It
directly follows that y is a multivariate normal vector. Furthermore, we have
�y� = �AX + b�
= A �X� + b
= Au + b (3.350)
� H� � �
yy = (Ax + b − (Au + b)) (Ax + b − (Au + b))H
� �
= (A(x − u)) (A(x − u))H
� �
= A (x − u)(x − u)H AH
= AΣAH , (3.351)
Random matrices
Given an M × N matrix A = [a1 a2 · · · aN ], where
�am � = 0
� �
am aH
m = Σ, m = 1, . . . , M
� �
H
am an = 0, m �= n.
Gaussian matrices
Theorem 3.7. Given a P × Q complex Gaussian matrix X such that X ∼
CN (M, C ⊗ D) where C is a Q × Q positive definite matrix, D is a P × P
positive definite matrix, and �X� = M, then the PDF of X is
1
fX (X) = × (3.368)
π P Q (det(C))P (det(D))Q
� �
etr −C−1 (X − M)H D−1 (X − M) , X > 0,
where etr(X) = exp(tr(X)).
Proof. Given the random vector x = vec(X) with mean m = �vec(X)�, we
know from theorem 3.2 that its PDF is
1 � �
fx (x) = P Q exp −(x − m)H (C ⊗ D)−1 (x − m) .
π det (C ⊗ D)
(3.369)
Probability and stochastic processes 167
Equivalence of the above with the PDF stated in the theorem is demonstrated
first by observing that
det(C ⊗ D) = (det(C))P (det(D))Q , (3.370)
which is a consequence of property 2.64.
Second, we have
� �
(x−m)H (C ⊗ D)−1 (x−m) = (x−m)H C−1 ⊗ D−1 (x−m), (3.371)
which, according to lemma 2.4(c), becomes
� � � �
(x − m)H C−1 ⊗ D−1 (x − m) = tr C−1 (X − M)H D−1 (X − M) ,
(3.372)
which completes the proof.
value in subset S.
Given a one-to-one invertible transformation
y = g(x), (3.376)
the integral becomes
�
I= fx (g −1 (x)) |J(x → y| dy1 . . . dyM , (3.377)
S†
168 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
This implies that dym dym = −dym dym = 0. This product is termed the
exterior product and denoted by the symbol ∧. According to this wedge
product, the product (3.379) becomes
� �
∂x1 ∂x2 ∂x1 ∂x2
dx1 dx2 = − dy1 ∧ dy2 . (3.383)
∂y1 ∂y2 ∂y2 ∂y1
Theorem 3.8. Given two N × 1 real random vectors x and y, as well as the
transformation y = Ax where A is a nonsingular M × M matrix, we have
Probability and stochastic processes 169
dy = Adx and
M
� M
�
dym = det(A) dxm .
m=1 m=1
Proof. Given the properties of the exterior product, it is clear that
M
� M
�
dym = p(A) dxm , (3.384)
m=1 m=1
M n−1
� � �
(dX) = dxmn = dxmn . (3.392)
n=1 m=1 m<n
Selected Jacobians
What follows is a selection of Jacobians of random matrix transformations
which will help to support forthcoming derivations. For convenience and clar-
ity of notation, it is the inverse transformations that will be given.
Portions of this section follow [Muirhead, 1982, chapter 2] and [Ratnarajah
et al., 2004] (for complex matrices).
Theorem 3.10. If X = BY where X and Y are N ×M real random matrices
and B is a fixed positive definite N × N matrix, then
dX = [dx1 , · · · , dxM ] ,
dY = [dy1 , · · · , dyM ] ,
we find
dxm = Bdym , m = 1, · · · , M, (3.400)
172 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Therefore, we have
M �
� N
(dX) = dxnm
m=1 n=1
�M N
�
= det(B) dynm
m=1 n=1
M
� �N
= det(B)M dynm
m=1 n=1
= det(B)M (dY).
It turns out that the only polynomials in the elements of B that can be fac-
torized as above are the powers of det(B) (see prop. 2.40). Therefore, we
have
p(B) = (det(B))k ,
where k is some integer.
We can isolate k by letting B = det (β, 1, · · · , 1) such that
2
β y11 by12 · · · by1M
βy12 y22 · · · by2M
BdYBT = .. .. . . .. . (3.406)
. . . .
by1M y2M · · · yM M
It follows that the exterior product of the distinct elements (diagonal and
above-diagonal elements) of dX is
Proof. The proof follows that of theorem 3.11 except that by definition, we
take the exterior product of the above diagonal elements only in (3.406) since
X and Y are skew-symmetric.
Theorem 3.13. If X = BYBH where X and Y are M × M positive definite
complex random matrices and B is a non-singular fixed M × M matrix, then
Proof. The proof follows from the fact that real and imaginary parts of a Her-
mitian matrix are symmetric and skew-symmetric, respectively. It follows that,
by virtue of theorems 3.11 and 3.12,
It follows that
Therefore,
dX · Y = −XdY,
dX = −XdY · Y−1 ,
dX = −Y−1 dY · Y−1 , (3.413)
(3.414)
� M
� M
�
(dA) = damn = damn
n≤m m=0 n=m−1
M
� �
−m+1
= 2 M
tM
mm dtmn
m=1 n≤m
M
�
−m+1
= 2M tM
mm (dT).
m=1
176 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
It is noteworthy that the differential expressions above do not contain all the
terms they should. This is because they are meant to be multiplied together
using exterior product rules (so that repeated differentials yield 0) and redun-
dant terms have been removed. Hence, only differentials in τmn were taken
into account for the real part, and only differentials in µmn were kept for the
imaginary part. Likewise, the first appearance of a given differential form pre-
cludes its reappearance in successive expressions. For example, dτ11 appears
in the expression for dα11 so that all terms with dτ11 are omitted in successive
expressions dα12 , . . . dαM M .
Hence, we have
(dA) = (d�{A})(d�{A})
M
� �
= dαmn dβmn
m≤n m<n
M
�
M −1
= 2M tM
11 t22 · · · tM M dτmn ×
m≤n
� M
�
�
−1 M −2
tM
11 t22 · · · tM −1,M −1 dµmn
m<n
M
�
2M −2m+1
= 2M tmm (d�{T})(d�{T})
m=1
�M
2M −2m+1
= 2M tmm (dT). (3.422)
m=1
1 U1 = IM and U2 U1 = 0, reduces to
which, noting that UH H
� H �
U1 dU1 T + dT
UH dZ = . (3.427)
1 UH2 dU1 T
N
� M
�
� H � � �
U2 dU1 T = det(T)2 uH
m dun
m=M +1 n=1
N
� M
�
2(N −M )
= det uH
m dun . (3.432)
m=M +1 n=1
We now turn our attention to the upper part of the right-hand side of (3.427),
1 dU1 T + dT. Since U1 is unitary, we have
the latter being UH
1 U1 = IM .
UH
implies that
� H �H
UH 1 dU1 = −dU1 U1 = − U1 U1
H
. (3.433)
For the above to hold, UH 1 dU1 must be skew-Hermitian, i.e. its real part is
skew-symmetric and its imaginary part is symmetric.
It follows that the real part of UH
1 dU1 can be written as
0 −�{uH H
2 du1 } · · · −�{uM du1 }
�{uH du1 } 0
2 · · · −�{uH M u2 }
�{UH dU } = . . .. .
1 1
.. .. . ..
M u1 }
�{uH M u2 }
�{uH ··· 0
(3.434)
Postmultiplying the above by T and retaining only the terms contributing to
the exterior product, we find that the subdiagonal elements are
0 ··· ··· ··· ···
�{uH du1 }t11 ··· ··· ··· ···
2
�{uH du1 }t11 �{u3 du2 }t22 · · · · · · · · ·
3 .
.. .. .. . . .
.
. . . . .
�{uH M du1 }t11 �{uM du2 }t22 · · · �{uM uM −1 }tM −1,M −1 · · ·
H
(3.435)
� �
Thus, we find that the exterior product of the subdiagonal elements of �{ UH 1 U1 T}
is given by
−1 �M M −2 M
�
tM
11 m=2 �{um du1 }t2 2
H H
m=3 �{um du2 } · · ·
tM −1,M −1 �{uHM duM −1 } =
�� �� �
M M −m M M
m=1 tmm
H
m=1 n=m+1 �{un dum }. (3.436)
180 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
In the same fashion, we find that the exterior product of the diagonal and
subdiagonal elements of �{UH 1 dU1 }T is
� M � M M
� � �
M −m+1
tmm �{uH n dum }. (3.437)
m=1 m=1 n=m
� �
Clearly, the above-diagonal and diagonal elements of �{ UH 1 dU1 T} con-
tribute nothing to the exterior product since they all involve an element of dU1
and all such elements already appear in the subdiagonal � portion. � The same
argument appears to the above diagonal elements of �{ UH 1 dU 1 T. Further-
more, it can be verified that the inclusion of dT in U1 U1 + dT amounts to
H
where
M
� N
�
(HH
1 dH1 ) = hH
n dhm . (3.441)
m=1 n=m+1
where the integral is carried out over the space of real positive definite (sym-
metric) matrices.
It is a point of interest that it can be verified that Γ1 (a) = Γ(a).
Definition 3.42. The complex multivariate Gamma function is defined
�
Γ̃M (a) = etr(−A)det(A)a−M (dA), (3.443)
A>0
where the integral is carried out over the space of Hermitian matrices.
Theorem 3.19. The complex multivariate Gamma function can be computed
in terms of standard Gamma functions by virtue of
M
�
Γ̃M (a) = π M (M −1)/2 Γ(a − m + 1). (3.444)
m=1
Knowing that � ∞ √
2
e−t dt = π, (3.449)
−∞
Stiefel manifold
Consider the matrix H1 in theorem 3.18. It is an N × M matrix with or-
thonormal columns. The space of all such matrices is called the Stiefel mani-
fold and it is denoted VM,N . Mathematically, this is stated
� �
VM,N = H1 ∈ RN ×M ; HH 1 H1 = IM . (3.452)
However, the complex counterpart of this concept will be found more useful
in the study of space-time multidimensional communication systems.
Definition 3.43. The N × M complex Stiefel manifold is the space spanned
by all N × M unitary matrices (denoted U1 ) and it is denoted ṼM,N , i.e.
� �
ṼM,N = U1 ∈ CN ×M ; UH 1 U1 = IM .
Proof. If the real parts and imaginary parts of all the elements of U1 are treated
as individual coordinates, then a given instance of U1 defines a point in 2M N -
dimensional Euclidian space.
Furthermore, the constraining equation UH 1 U1 = IM can be decomposed
into its real and imaginary part. Since UH 1 U 1 is necessarily Hermitian, the
real part is symmetric and leads to 12 M (M + 1) constraints on the elements
of U1 . The imaginary part is skew-symmetric and thus leads to 12 M (M − 1)
constraints. It follows that there are M 2 constraints on the position of the point
Probability and stochastic processes 183
Geometrically,
√ this means that the said surface is a portion of a sphere of
radius M .
Consider an N × M complex Gaussian matrix X with N ≥ M such that X ∼ CN(0, I_N ⊗ I_M). By virtue of theorem 3.7, its density is

f_X(X) = \pi^{-NM}\, \mathrm{etr}\left(-X^H X\right).   (3.454)

Since a density function integrates to 1, it is clear that

\int_X \mathrm{etr}\left(-X^H X\right) (dX) = \pi^{MN}.   (3.455)
Decomposing X as X = U_1 T, where U_1 ∈ \tilde{V}_{M,N} and T is upper triangular with real, positive diagonal elements, and applying the corresponding change of variables, the left-hand side of (3.455) becomes

\int_{T, T^H T>0} \exp\left(-\sum_{m\leq n} |t_{mn}|^2\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT) \int_{\tilde{V}_{M,N}} (U_1^H dU_1) = \pi^{MN}.   (3.457)

An integral having the same form as the integral above over the elements of T was solved in the proof of theorem 3.19. Applying this result, we find that

\int_{T, T^H T>0} \exp\left(-\sum_{m\leq n} |t_{mn}|^2\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT) \int_{\tilde{V}_{M,N}} (U_1^H dU_1) = \frac{\tilde{\Gamma}_M(N)}{2^M} \int_{\tilde{V}_{M,N}} (U_1^H dU_1).   (3.458)

From the above and (3.455), it is obvious that

\mathrm{Vol}\left(\tilde{V}_{M,N}\right) = \frac{2^M \pi^{MN}}{\tilde{\Gamma}_M(N)}.   (3.459)
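As an aside (an illustrative sketch only; the function name is hypothetical), (3.459) is easy to evaluate in logarithmic form using the product formula (3.444):

import math

def log_stiefel_volume(M: int, N: int) -> float:
    """log Vol(V~_{M,N}) = M log 2 + M N log(pi) - log Gamma~_M(N), from (3.459)."""
    log_gamma_tilde = (M * (M - 1) / 2) * math.log(math.pi) + sum(
        math.lgamma(N - m + 1) for m in range(1, M + 1)
    )
    return M * math.log(2.0) + M * N * math.log(math.pi) - log_gamma_tilde

# For M = N = 1 the manifold is the unit circle in the complex plane, whose
# "volume" (circumference) is 2*pi.
print(math.isclose(log_stiefel_volume(1, 1), math.log(2 * math.pi)))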
Wishart matrices
As a preamble to this subsection, we introduce yet another theorem related
to the complex multivariate Gamma function.
Theorem 3.21. Given a positive definite Hermitian M × M matrix C and a scalar a such that \Re\{a\} > M - 1, then

\int_{A>0} \mathrm{etr}\left(-C^{-1}A\right) \det(A)^{a-M} (dA) = \tilde{\Gamma}_M(a) \det(C)^a,
Thus, the transformed integral now coincides with the definition of Γ̃M (a),
leading us directly to
I = \tilde{\Gamma}_M(a) \det(C)^a.   (3.461)
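For M = 1 the identity reduces to the familiar scalar result \int_0^\infty e^{-x/c} x^{a-1} dx = \Gamma(a) c^a, which is easy to verify numerically (an illustrative sketch, not part of the original text):

import math
from scipy.integrate import quad

a, c = 3.5, 2.0
val, _ = quad(lambda x: math.exp(-x / c) * x ** (a - 1), 0, math.inf)
print(math.isclose(val, math.gamma(a) * c ** a, rel_tol=1e-6))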
Unless stated otherwise, it will be assumed in the following that the number of degrees of freedom N is greater than or equal to the Wishart matrix dimension M, i.e. the Wishart matrix is nonsingular.
Theorem 3.22. If A ∼ CW_M(N, \Sigma), then its PDF is given by

f_A(A) = \frac{\mathrm{etr}\left(-\Sigma^{-1}A\right) |\det(A)|^{N-M}}{\tilde{\Gamma}_M(N)\, |\det(\Sigma)|^{N}}.
where we note that A = Z^H Z = T^H U_1^H U_1 T = T^H T, and U_1 can be
where

\Theta = \begin{bmatrix}
2t_{11} & t_{12} & \cdots & t_{1M} \\
t_{21} & 2t_{22} & \cdots & t_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
t_{M1} & t_{M2} & \cdots & 2t_{MM}
\end{bmatrix}
= [t_{mn}]_{m,n=1,\cdots,M} + \mathrm{diag}\left(t_{11}, t_{22}, \cdots, t_{MM}\right),

and t_{mn} is the variable in the characteristic function domain associated with element a_{mn} of the matrix A. Since A is Hermitian, t_{mn} = t_{nm}.
Several aspects of the above theorem are interesting. First, a new definition of the characteristic function, based on the matrix \Theta, is introduced for random matrices, allowing for convenient and concise C. F. expressions. Second, we note that if M = 1, the complex Wishart matrix reduces to a scalar a and its characteristic function according to the above theorem is (1 - jt\sigma^2)^{-N}, i.e. the C. F. of a chi-square variate with 2N degrees of freedom.
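This M = 1 case is simple to check by simulation. The sketch below is illustrative only; it assumes the unit-variance convention of (3.454) for the underlying complex Gaussian entries (each entry has variance \sigma^2, split equally between real and imaginary parts) and compares a Monte Carlo estimate of the characteristic function with the closed form.

import numpy as np

rng = np.random.default_rng(3)
N, sigma2, t, trials = 4, 1.5, 0.3, 200000

# Scalar complex Wishart variate: a = sum_n |z_n|^2 with z_n ~ CN(0, sigma2).
z = (rng.standard_normal((trials, N)) + 1j * rng.standard_normal((trials, N))) * np.sqrt(sigma2 / 2)
a = np.sum(np.abs(z) ** 2, axis=1)

cf_mc = np.mean(np.exp(1j * t * a))        # Monte Carlo estimate of E{exp(j t a)}
cf_th = (1 - 1j * t * sigma2) ** (-N)      # closed form (1 - j t sigma^2)^(-N)
print(np.round(cf_mc, 3), np.round(cf_th, 3))   # should agree to within Monte Carlo error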
Proof. We have

\phi_A(j\Theta) = E\left\{\mathrm{etr}\left(j\Theta A\right)\right\} = \int_A \mathrm{etr}\left(j\Theta A\right) f_A(A) (dA)
= \int_A \frac{\mathrm{etr}\left((j\Theta - \Sigma^{-1})A\right) |\det(A)|^{N-M}}{\tilde{\Gamma}_M(N)\, |\det(\Sigma)|^{N}} (dA),   (3.467)

where the integral can be solved by virtue of theorem 3.21 to yield

\phi_A(j\Theta) = \det(\Sigma)^{-N} \det\left(\Sigma^{-1} - j\Theta\right)^{-N} = \det\left(I - j\Theta\Sigma\right)^{-N}.   (3.468)
where A_{11} and \Sigma_{11} are of size M' × M', then submatrix A_{11} follows a CW_{M'}(N, \Sigma_{11}) distribution.
Proof. Letting X = [I_{M'} \; 0] (an M' × M matrix) in theorem 3.24, we find that XAX^H = A_{11} and X\Sigma X^H = \Sigma_{11}, thus completing the proof.
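The first moment of a complex Wishart matrix, E{A} = N\Sigma, together with the submatrix property above, is easy to check by Monte Carlo. The sketch below is illustrative only and again assumes the unit-variance convention used earlier for the complex Gaussian entries.

import numpy as np

rng = np.random.default_rng(2)
M, N, trials = 2, 6, 20000

Sigma = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])
L = np.linalg.cholesky(Sigma)                      # Sigma = L L^H

A_mean = np.zeros((M, M), dtype=complex)
for _ in range(trials):
    # Z is N x M with independent rows distributed as CN(0, Sigma), so A = Z^H Z ~ CW_M(N, Sigma).
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A_mean += Z.conj().T @ Z / trials

print(np.round(A_mean, 2))                 # should be close to N * Sigma
print(np.round(N * Sigma, 2))
# In particular, the leading 1 x 1 block has mean N * Sigma_11, consistent with
# A_11 ~ CW_1(N, Sigma_11) as stated by theorem 3.25.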
Theorem 3.26. If A ∼ CW_M(N, \Sigma) and x is any M × 1 random vector independent of A such that the probability that x = 0 is null, then

\frac{x^H A x}{x^H \Sigma x} \sim \chi^2_{2N},
and is independent of x.
Proof. The distribution naturally derives from applying theorem 3.24 and letting X = x therein.
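A quick Monte Carlo sketch (illustrative only) supports the result: under the unit-variance convention used above, the 2N-degree-of-freedom complex chi-square variate of the theorem is a sum of N unit-mean exponential variates, so the normalized quadratic form should have mean and variance close to N regardless of the particular x and \Sigma chosen.

import numpy as np

rng = np.random.default_rng(4)
M, N, trials = 3, 5, 50000
Sigma = np.diag([3.0, 1.0, 0.2]).astype(complex)
L = np.linalg.cholesky(Sigma)
x = np.array([1.0, -2.0, 0.5j])            # any fixed nonzero vector

r = np.empty(trials)
for k in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T                     # rows ~ CN(0, Sigma), so A = Z^H Z ~ CW_M(N, Sigma)
    A = Z.conj().T @ Z
    r[k] = np.real(x.conj() @ A @ x) / np.real(x.conj() @ Sigma @ x)

print(np.mean(r), np.var(r))               # both should be close to N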
To show independence, let y = \Sigma^{1/2} x. Thus, we have

\frac{x^H A x}{x^H \Sigma x} = \frac{y^H B y}{y^H y},   (3.472)

where B follows a CW_M(N, I_M) distribution. This is also equivalent to

\frac{y^H B y}{y^H y} = z^H B z,   (3.473)
where A_{11} and \Sigma_{11} are P × P, A_{22} and \Sigma_{22} are Q × Q, and P + Q = M, then the matrix A_{11.2} = A_{11} - A_{12} A_{22}^{-1} A_{12}^H follows a CW_P(N - Q, \Sigma_{11.2}) distribution.

A_{22} = X_2 X_2^H, \qquad A_{12} = X_1 X_2^H,   (3.476)
and
Starting from

Y^H \left(Y Y^H\right)^{-1} Y = I_N,   (3.478)

which derives directly from the nonsingularity of Y, it can be shown that

X_2^H \left(X_2 X_2^H\right)^{-1} X_2 + B^H B = I_N,   (3.479)

by simply expanding the matrix Y into its partitions in (3.478) and exploiting the properties of X_2 and B to perform various simplifications.
Substituting (3.479) into (3.476), we find that

A_{11.2} = X_1 B^H B X_1^H.   (3.480)
thus yielding

f_{A_{12},C|X_2}(A_{12}, C|X_2) = \frac{1}{\pi^{PN} \det(\Sigma_{11.2})^{P} \det(A_{22})^{P}}\, \mathrm{etr}\left(-\Sigma_{11.2}^{-1} C C^H - \Sigma_{11.2}^{-1}(A_{12} - M) A_{22}^{-1} (A_{12} - M)^H\right).   (3.484)

Since the above density factors, this implies that C follows a CN(0, I_{N-Q} ⊗ \Sigma_{11.2}) distribution and that A_{12} conditioned on X_2 follows a CN(\Sigma_{12}\Sigma_{22}^{-1}A_{22},
Therefore, we have

\left(Y A_2^{-1} Y^H\right)^{-1}
= \left( U S \left[\, I_K \;\; 0 \,\right] V^H A_2^{-1} V \begin{bmatrix} I_K \\ 0 \end{bmatrix} S U^H \right)^{-1}
= \left(S U^H\right)^{-1} \left( \left[\, I_K \;\; 0 \,\right] V^H A_2^{-1} V \begin{bmatrix} I_K \\ 0 \end{bmatrix} \right)^{-1} (U S)^{-1}
= U S^{-1} \left( \left[\, I_K \;\; 0 \,\right] D^{-1} \begin{bmatrix} I_K \\ 0 \end{bmatrix} \right)^{-1} S^{-1} U^H,   (3.488)

where D_{11.2} = D_{11} - D_{12} D_{22}^{-1} D_{21}, which, by virtue of theorem 3.27, follows a CW_K(N - M + K, I_K) distribution. It immediately follows that U S^{-1} D_{11.2} S^{-1} U^H is CW_K\left(N - M + K,\, U S^{-1} S^{-1} U^H\right) where

U S^{-1} S^{-1} U^H = \left(Y Y^H\right)^{-1} = \left(X \Sigma^{-1} X^H\right)^{-1},   (3.490)
\frac{x^H \Sigma^{-1} x}{x^H A^{-1} x} \sim \chi^2_{2N-2M+2},

and is independent of x.
The proof is left as an exercise.
Problems
3.1. From the axioms of probability (see definition 3.8), demonstrate that the probability of an event E must necessarily satisfy P(E) ≤ 1.
3.2. What is the link between the transformation technique of a random vari-
able used to obtain a new PDF and the basic calculus technique of variable
substitution used in symbolic integration?
3.3. Given your answer in 3.2, show that Jacobians can be used to solve multiple
integrals by permitting multivariate substitution.
3.4. Show that if Y follows a uniform distribution within arbitrary bounds a and b (Y ∼ U(a, b)), then it can be found as a transformation of X ∼ U(0, 1). Find the transformation function.
3.5. Propose a proof for the Wiener-Khinchin theorem (eq. 3.121).
3.6. Given that the characteristic function of a central 2 degrees-of-freedom chi-square variate with variance P is given by

\Psi_x(jv) = \frac{1}{1 - jvP},

derive the infinite-series representation for the Rice PDF (envelope).
3.7. Demonstrate (3.317).
3.8. From (3.317), derive the finite sum form which applies if n/2 is an integer.
3.9. Show that

f_x(x) = (2\pi)^{-\frac{M}{2}} \det(\Sigma)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1}(x - \mu)\right)   (3.491)

is indeed a PDF. Hint: Integrate over all elements of x to show that the area under the surface of f_x(x) is equal to 1. To do so, use a transformation of random variables to decouple the multiple integrals.
3.10. Given two \chi^2 random variables Z_1 and Z_2 with PDFs

f_{Z_1}(x) = \frac{1}{\Gamma(n_1)} x^{n_1-1} e^{-x},
f_{Z_2}(x) = \frac{1}{\Gamma(n_2)\, 2.5^{n_2}} x^{n_2-1} e^{-x/2.5},

(a) What is the PDF of the ratio Y = Z_1/Z_2?
(b) If n_1 = 2 and n_2 = 2, derive the PDF of Y = Z_1 + Z_2. Hint: Use characteristic functions and expansions into partial fractions.
3.11. Write a proof for theorem 3.18.
3.12. If X = BY, where X and Y are N × M complex random matrices and B
is a fixed positive definite N × N matrix, prove that the Jacobian is given
by
(dX) = |\det(B)|^{2M} (dY).   (3.492)
3.13. Show that if W ∼ CW_M(N, I_M), then its diagonal elements w_{11}, w_{22}, . . . , w_{MM} are independent and identically distributed according to a \chi^2 law with 2N degrees of freedom.
3.14. Write a proof for theorem 3.29. Hint: start by studying theorems 3.25 and
3.26.
where x is an integer between 0 and N , we wish to show that it tends towards the Gaussian
distribution as N gets large.
In fact, Abraham de Moivre originally derived the Gaussian distribution as an approximation
to the binomial distribution and as a means of quickly calculating cumulative probabilities (e.g.
the probability that no more than 7 heads are obtained in 10 coin tosses). The first appearance
of the normal distribution and the associated CDF (the probability integral) occurs in a Latin pamphlet published by de Moivre in 1733 [Daw and Pearson, 1972]. This original derivation
hinges on the approximation to the Gamma function known as Stirling’s formula (which was,
in fact, discovered in a simpler form and used by de Moivre before Stirling). It is the approach
we follow here. The normal distribution was rediscovered by Laplace in 1778 as he derived the
central limit theorem, and rediscovered independently by Adrain in 1808 and Gauss in 1809 in
the process of characterizing the statistics of errors in astronomical measurements.
We start by expanding \log(g(x)) into its Taylor series representation about the point x_m where g(x) is maximum. Since the relation between g(x) and \log(g(x)) is monotonic, \log(g(x)) is also maximum at x = x_m. Hence, we have:

\log(g(x)) = \log(g(x_m)) + (x - x_m)\left[\frac{d \log(g(x))}{dx}\right]_{x=x_m} + \frac{1}{2!}(x - x_m)^2\left[\frac{d^2 \log(g(x))}{dx^2}\right]_{x=x_m} + \frac{1}{3!}(x - x_m)^3\left[\frac{d^3 \log(g(x))}{dx^3}\right]_{x=x_m} + \cdots
= \sum_{k=0}^{\infty} \frac{1}{k!}\left[\frac{d^k \log(g(x))}{dx^k}\right]_{x=x_m} (x - x_m)^k.   (3.A.2)
Since N is large by hypothesis and the expansion is about x_m, which is distant from both 0 and N (since it corresponds to the maximum of g(x)), x and N - x are also large in the region of interest, and Stirling's approximation formula can be applied to the factorials x! and (N - x)! to yield

\log(x!) \approx \frac{1}{2}\log(2\pi x) + x(\log x - 1)
\approx x(\log(x) - 1).   (3.A.4)
Thus, we have

\frac{d \log(x!)}{dx} \approx (\log(x) - 1) + 1 = \log(x),   (3.A.5)

and

\frac{d \log((N - x)!)}{dx} \approx \frac{d}{dx}\left[(N - x)(\log(N - x) - 1)\right]
= -(\log(N - x) - 1) + (N - x)\frac{-1}{N - x}
= -\log(N - x),   (3.A.6)

so that

\frac{d \log(g(x))}{dx} \approx -\log(x) + \log(N - x) + \log(p) - \log(1 - p).   (3.A.7)
The maximum point x_m can be easily found by setting the above derivative to 0 and solving it for x. Hence,

\log\left(\frac{p}{1 - p}\,\frac{N - x_m}{x_m}\right) = 0
\frac{p}{1 - p}\,\frac{N - x_m}{x_m} = 1
(N - x_m)\,p = (1 - p)\,x_m
x_m\,(p + (1 - p)) = N p
x_m = N p.   (3.A.8)
We can now find the first coefficients in the Taylor series expansion. First, we have

\left[\frac{d \log(g(x))}{dx}\right]_{x=x_m} = 0,   (3.A.9)

since x_m is the maximum point. Differentiating (3.A.7) further and evaluating at x_m = Np gives

\left[\frac{d^2 \log(g(x))}{dx^2}\right]_{x=x_m} = -\frac{1}{x_m} - \frac{1}{N - x_m} = -\frac{1}{Np(1 - p)},   (3.A.10)

\left[\frac{d^3 \log(g(x))}{dx^3}\right]_{x=x_m} = \frac{1}{x_m^2} - \frac{1}{(N - x_m)^2} = \frac{1 - 2p}{N^2 p^2 (1 - p)^2},   (3.A.11)

where we note that the 3rd coefficient is much smaller (by a factor proportional to 1/N) than the second one as N and x_m grow large.
Since their contribution is not significant, all terms in the Taylor expansion beyond k = 2 are neglected. Taking the exponential of (3.A.2), we find

g(x) = g(x_m)\, e^{-\frac{(x - x_m)^2}{2Np(1-p)}},   (3.A.12)
where

K = \left[\int_{-\infty}^{\infty} g(x)\,dx\right],   (3.A.14)

and

\int_{-\infty}^{\infty} g(x)\,dx = g(x_m)\int_{-\infty}^{\infty} e^{-\frac{(x - x_m)^2}{2Np(1-p)}}\,dx
= g(x_m)\int_{-\infty}^{\infty} e^{-\frac{u^2}{2Np(1-p)}}\,du
= 2\,g(x_m)\int_{0}^{\infty} \frac{1}{2\sqrt{v}}\, e^{-\frac{v}{2Np(1-p)}}\,dv
= g(x_m)\,\sqrt{2\pi Np(1-p)}.   (3.A.15)
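The quality of the resulting Gaussian approximation to the binomial distribution is easy to inspect numerically. The short sketch below (illustrative only) compares the exact binomial probabilities with the normal density of mean Np and variance Np(1 - p):

import math

N, p = 50, 0.3
mean, var = N * p, N * p * (1 - p)

for x in range(5, 26, 5):
    binom = math.comb(N, x) * p ** x * (1 - p) ** (N - x)                 # exact binomial pmf
    gauss = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    print(f"x = {x:2d}   binomial = {binom:.5f}   Gaussian approximation = {gauss:.5f}")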
References