Appendix A

Probability and Statistics
This appendix provides a summary of probability and random-process basics. The intent is to provide a useful
reference for items used throughout the book, not a primary source for teaching
probability and/or statistics, for which there are many excellent sources, including one for electrical
engineers [1], a time-tested classic [2], an advanced treatment [3], and an introductory treatment [5].
Section A.1 samples and overviews prerequisite results from basic random variables and probability.
Readers viewing this should have some previous exposure to random variables and basic systems analysis.
This section is meant only as a refresher and notational consolidator. Section A.2 proceeds to random
processes as time-indexed random variables, reviewing results in stationarity (both strict and wide-sense)
and reviewing some basic linear-systems results. Section A.3 completes this appendix with specifics of
the first two sections when considering complex passband (or baseband) signals.
A.1 Random Variables and Their Probability
Random variables take values that are not specific, or not deterministic. This means several possibilities
exist, each with a certain likelihood (probability). This text's data-transmission study has a few types
of random variables of interest, both discrete- and continuous-valued.
In general engineering and beyond, of course, there are many more examples, but these are
this text's focus. A good reference with more significant detail and proofs is [?].

A discrete random variable x takes values in a finite alphabet C = {x_0, ..., x_{|C|-1}} with probability mass
function p_x(i) ≜ Pr{x = x_i}, where 0 ≤ p_x(i) ≤ 1 ∀ i = 0, ..., |C| − 1. The probability that the random variable is in a subset
C' ⊆ C = {x_0 ... x_{|C|-1}}, with corresponding (possibly relabeled) indices j = 0, ..., |C'| − 1 ≤ |C| − 1, is

Pr\{C'\} = \sum_{j \in C'} p_x(j) \le 1 ,   (A.3)

with Pr{C} = 1.
Example probability mass functions include the discrete uniform mass function

p_x(i) = \frac{1}{|C|} ,   (A.4)

and the Poisson distribution that characterizes events occurring at rate R in time T,

p_x(i) = \frac{(R \cdot T)^i \cdot e^{-R \cdot T}}{i!}   ∀ i ≥ 0 and R > 0, T ≥ 0 .   (A.5)

The Bernoulli distribution has only two values, x = 1 or x = 0, with respective probability-mass-function
values p and q = 1 − p, while the geometric distribution has countably infinite
discrete values i = 1, ..., ∞ with probabilities

p_x(i) = p \cdot (1-p)^{i-1} ,   (A.6)

often corresponding to the first occurrence of a 1 on the i-th experimental sample from a Bernoulli
distribution. Another somewhat unusual distribution has value i occurring with decaying probability
p_x(i) = \frac{1}{i(i+1)}, but with infinite average (see Subsection A.1.2).
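As a quick numerical illustration (a sketch not from the text, written here in Python with numpy), the snippet below checks that the p_x(i) = 1/(i(i+1)) mass function above sums toward one while its partial-sum mean keeps growing, consistent with the claim of an infinite average:

```python
import numpy as np

# Partial sums for the pmf p_x(i) = 1/(i*(i+1)), i = 1, 2, ...
# The probabilities telescope toward 1, but the mean sum_i i*p_x(i) = sum_i 1/(i+1) diverges.
for N in (10, 1_000, 100_000):
    i = np.arange(1, N + 1, dtype=float)
    p = 1.0 / (i * (i + 1))
    print(f"N={N:>7d}   sum of p_x = {p.sum():.6f}   partial mean = {(i * p).sum():.3f}")
```

The probability sum approaches 1 while the partial mean grows roughly like ln(N), which is the sense in which the average is infinite.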
A continuous real probability density function, p_x(u) ≥ 0, measures a continuous random
variable's relative likelihood. Continuous random variables take values as real numbers x ∈ R, or
as complex numbers x ∈ C, with domain x ∈ D_x ⊆ C. The probability that the random variable is in a
continuous region (or set of such regions) D_x' ⊆ D_x is

Pr\{D_x'\} = \int_{u \in D_x'} p_x(u) \cdot du \le 1 ,   (A.7)

with Pr{D_x} = 1. Often the term probability distribution more generally describes either or both of
probability mass function and probability density function. In such a context, the integral notation most
often appears and is equivalent to the sum if the integral is more broadly constructed as a Stieltjes
integral (https://en.wikipedia.org/wiki/Riemann–Stieltjes_integral).
The cumulative distribution function for both discrete and continuous real or complex random
variables is

F_x(X) \triangleq Pr\{x \le X\} .   (A.8)

When x is continuous, then

p_x(u) = \frac{d}{du} F_x(u) ,   (A.9)

and when x is discrete, the probability mass function's values are the successive differences between
the cumulative distribution function's values.
Example probability densities include the continuous uniform density

p_x(u) = \frac{1}{d}   ∀ u ∈ [−d/2, d/2] ,   (A.10)

and the exponential distribution (parametrized by λ)

p_x(u) = \begin{cases} \lambda \cdot e^{-\lambda u} & u \ge 0 \\ 0 & u < 0 \end{cases} .   (A.11)

Functions of random variables are random variables. Thus, f(x) is a random variable if x is a random
variable. The new distribution is

p_y(y) = p_x\left(f^{-1}(y)\right) \cdot \left|\frac{d f^{-1}(y)}{dy}\right| ,   (A.12)

where the last expression essentially reverses the axes in finding area under a curve. The mean
E[x] = \int_{D_x} u \cdot p_x(u) \cdot du is also the first moment; higher-order moments (about zero) for real variables are

E[x^n] = \int_{D_x} u^n \cdot p_x(u) \cdot du   (A.15)
(the integral is replaced by the sum for the discrete case). When n = 2, this is the mean-square value,
or the autocorrelation of x if generalized to complex variables as E[|x|²] = E[x · x*]. The expectation
operator E is linear in that E[x + y] = E[x] + E[y]. The variance is the second moment about the
random variable's mean:

\sigma_x^2 \triangleq E\left[(x - E[x])^2\right] = \int_{D_x} (u - E[x])^2 \cdot p_x(u) \cdot du .   (A.16)

The standard deviation σ_x is the (positive) square root of the variance.
A.1.3 Moment Generating Functions
The moment-generating or characteristic function is the (frequency-reversed) Fourier Transform
(see Appendix C) of the probability distribution function

\phi_x(r) \triangleq E\left[e^{rx}\right] = \int_{u \in D_{x,r}} p_x(u) \cdot e^{r \cdot u} \cdot du ,   (A.17)

where D_{x,r} is a domain of convergence for the integral that depends on the distribution and the choice of r.
Often r = jω, and then Φ_x(ω) = φ_x(jω) is the (frequency-reversed) Fourier Transform of the probability density function, P_x(−ω).
The moment generating function helps generate moments according to (with r = jω)

E[x^n] = (-j)^n \cdot \frac{d^n \Phi_x(\omega)}{d\omega^n}\bigg|_{\omega = 0} ,   (A.18)

essentially replacing (A.15)'s integration with (often simpler) differentiation to generate the random vari-
able's moments (see https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory)).
There is also the semi-invariant moment generating function

\gamma_x(r) \triangleq \ln \phi_x(r) ,   (A.19)

where

\frac{d\gamma_x(r)}{dr} = \frac{1}{\phi_x(r)} \cdot \frac{d\phi_x(r)}{dr} = \frac{E[x \cdot e^{rx}]}{\phi_x(r)} ,   (A.20)

so, with ' denoting derivative,

\gamma_x'(0) = E[x] .   (A.21)

Similarly,

\frac{d^2\gamma_x(r)}{dr^2} = \frac{\phi_x(r) \cdot E[x^2 \cdot e^{rx}] - E[x \cdot e^{rx}] \cdot E[x \cdot e^{rx}]}{\phi_x^2(r)} ,   (A.22)

so that

\gamma_x''(0) = \sigma_x^2 ,   (A.23)

and \gamma_x''(r) > 0, so \gamma_x(r) is convex in r.
Common distributions and their characteristic functions are tabulated below:

Name                  p_x                                         Φ_x(ω) = P_x(−ω)
Bernoulli             p,  1 − p                                   1 − p + p·e^{jω}
Binomial (N trials)   \binom{N}{i}·p^i·(1−p)^{N−i}                (1 − p + p·e^{jω})^N
Poisson               (R·T)^i·e^{−R·T}/i!                         e^{R·T·(e^{jω}−1)}
Discrete uniform      1/(b−a+1),  a, b ∈ Z, b > a                 (e^{jaω} − e^{j(b+1)ω}) / [(b−a+1)·(1−e^{jω})]
Continuous uniform    1/(b−a) on [a, b]                           (e^{jbω} − e^{jaω}) / [jω·(b−a)]
Exponential           λ·e^{−λx},  x ≥ 0                           1/(1 − jω/λ)
Gaussian              e^{−(x−μ_x)²/(2σ_x²)} / √(2πσ_x²)           e^{jωμ_x − σ_x²·ω²/2}
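As a small symbolic check of (A.18), and assuming the characteristic-function convention Φ_x(ω) = E[e^{jωx}] used in the table, the following sympy sketch differentiates the Gaussian entry and recovers the first two moments:

```python
import sympy as sp

w, mu = sp.symbols('omega mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Gaussian characteristic function from the table: Phi(w) = exp(j*w*mu - sigma^2*w^2/2)
Phi = sp.exp(sp.I * w * mu - sigma**2 * w**2 / 2)

# (A.18): E[x^n] = (-j)^n * d^n Phi / d w^n evaluated at w = 0
moment = lambda n: sp.simplify((-sp.I)**n * sp.diff(Phi, w, n).subs(w, 0))

print(moment(1))             # mu (the mean)
print(sp.expand(moment(2)))  # mu**2 + sigma**2, so the variance is sigma**2
```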
The probability that x and y are both in region D_{x,y} is

Pr\{(x,y) \in D_{x,y}\} = \int\!\!\int_{D_{x,y}} p_{x,y}(u,v) \cdot du \cdot dv .   (A.26)

Also,

p_y(v) = \int_{D_x} p_{y/x}(v/u) \cdot p_x(u) \cdot du .   (A.35)

Expectations apply to any distribution, and hence to conditional distributions. The implied expectation
variable may be written as a subscript, for instance

E_y\{E_x[x/y]\} = E[x] = E_x[x] .   (A.36)
A.1.6 The random vector
The random vector

x = \begin{bmatrix} x_N \\ \vdots \\ x_1 \end{bmatrix}   (A.37)

has N random-variable components. These N random variables' joint distribution is p_x(u). The components
are independent if p_x(u) = \prod_{n=1}^{N} p_{x_n}(u_n). If any two of the components are independent
individually in the joint marginal distribution p_{x_i,x_j}, for i ≠ j, the components are said to be pair-
wise independent. Pairwise independence does not necessarily imply independence of all components,
but independence necessarily includes pairwise independence. The concept of mean expands to the
N-dimensional mean vector

\mu_x \triangleq E[x] = \begin{bmatrix} E[x_N] \\ \vdots \\ E[x_1] \end{bmatrix} ,   (A.38)

while the second-moment concept expands to the N × N autocorrelation matrix

R_{xx} = E[x \cdot x^*] .   (A.39)

A covariance matrix subtracts the mean vectors so that

\Sigma_{xx} = E[(x - \mu_x) \cdot (x - \mu_x)^*] = R_{xx} - \mu_x \cdot \mu_x^* .   (A.40)

Similarly there is a cross-correlation matrix between two random vectors x and y,

R_{xy} = E[x \cdot y^*] ,   (A.41)

which is the all-zeros matrix when the two (zero-mean) random vectors are uncorrelated. Again, if x and y are
independent, then p_{x,y}(u,v) = p_x(u) \cdot p_y(v). When the vectors are jointly Gaussian (Subsection A.1.7),
uncorrelatedness and independence are the same thing, but not in general.
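The distinction between uncorrelated and independent can be seen in a small simulation (a sketch, not from the text): with x uniform on [−1, 1] and y = x², the pair is uncorrelated because E[x³] = 0, yet y is completely determined by x:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 500_000)
y = x**2                         # a deterministic function of x, so certainly not independent

print("sample cov(x, y) ~", np.cov(x, y)[0, 1])        # ~ 0: uncorrelated
# Dependence shows up in conditional behavior: knowing |x| is large changes the mean of y
print("E[y] =", y.mean(), "  E[y | |x| > 0.9] =", y[np.abs(x) > 0.9].mean())
```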
The Chain Rule of probability recursively extends Bayes Theorem to random vectors so that

p_x(u) = \prod_{n=1}^{N} p_{x_n/[x_{n-1} \dots x_1]}(u_n, u_{n-1}, \dots, u_1) ,   (A.42)

with p_{x_1/x_0} \triangleq p_{x_1}. The vector elements in the chain rule can be in any order.
The Q-function is the probability that a zero-mean, unit-variance Gaussian random variable exceeds its
argument, Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du. It appears in Figures A.3, A.1, and A.2 for very low SNR (−10 to 0
dB), low SNR (0 to 10 dB), and high SNR (10 to 16 dB), using a very accurate approximation (less than 1% error) formula
from Leon-Garcia [1]:

Q(x) \approx \frac{1}{\frac{\pi - 1}{\pi} \cdot x + \frac{1}{\pi}\sqrt{x^2 + 2\pi}} \cdot \frac{e^{-x^2/2}}{\sqrt{2\pi}} .   (A.44)
For the mathematician at heart, Q(x) = 0.5 · erfc(x/√2), where erfc is known as the complementary error
function. The integral cannot be evaluated in closed form for arbitrary x, but Q(0) = 0.5 and Q(−∞) = 1.
Matlab's q.m function evaluates it directly.

Figures A.1, A.2, and A.3 have their horizontal axes in dB (20·log_{10}(x)). Q(−x) = 1 − Q(x), so
evaluation need only be for positive arguments. The following bounds apply:

\left(1 - \frac{1}{x^2}\right) \cdot \frac{e^{-x^2/2}}{\sqrt{2\pi x^2}} \le Q(x) \le \frac{e^{-x^2/2}}{\sqrt{2\pi x^2}} .   (A.45)

The readily computed upper bound in (A.45) is easily seen to be a very close approximation for x ≥ 3.
Computation of the probability that a Gaussian random variable u with mean μ_x and variance σ_x²
exceeds some value d then uses the Q-function as follows:

P\{u \ge d\} = Q\left(\frac{d - \mu_x}{\sigma_x}\right) .   (A.46)
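The following Python sketch (assuming scipy is available; the routine names below are not from the text) evaluates the Q-function via erfc, the approximation (A.44), and the bounds (A.45), and then applies (A.46):

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Q-function via the complementary error function: Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * erfc(x / np.sqrt(2))

def Q_approx(x):
    """Leon-Garcia approximation (A.44), accurate to within about 1% for x >= 0."""
    return (1.0 / ((1 - 1/np.pi) * x + (1/np.pi) * np.sqrt(x**2 + 2*np.pi))) \
           * np.exp(-x**2 / 2) / np.sqrt(2*np.pi)

for x in (1.0, 3.0, 5.0):
    upper = np.exp(-x**2 / 2) / np.sqrt(2*np.pi * x**2)   # upper bound in (A.45)
    lower = (1 - 1/x**2) * upper                          # lower bound in (A.45)
    print(f"x={x}: Q={Q(x):.3e}  approx={Q_approx(x):.3e}  bounds=({lower:.3e}, {upper:.3e})")

# (A.46): probability that a Gaussian with mean mu_x and standard deviation sigma_x exceeds d
mu_x, sigma_x, d = 0.0, 2.0, 5.0     # example values, not from the text
print("P{u >= d} =", Q((d - mu_x) / sigma_x))
```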
The sum of N independent, identically distributed random variables x_n, each with mean μ_x and variance σ_x²,
approaches a Gaussian probability density with variance N·σ_x² and mean N·μ_x; equivalently, the zero-mean,
scaled sum \frac{1}{\sqrt{N}}\sum_{n=1}^{N}(x_n - \mu_x) approaches a zero-mean Gaussian density with variance σ_x².

Proof: The sample mean \langle x \rangle = \frac{1}{N}\sum_{n=1}^{N} x_n trivially has mean E[\langle x \rangle] = \frac{1}{N} \cdot N \cdot \mu_x = \mu_x, and similarly, since the
random variables are independent, variance \sigma^2_{\langle x \rangle} = \frac{1}{N^2} \cdot N \cdot \sigma_x^2 = \frac{1}{N}\sigma_x^2, so the variance of
\sqrt{N} \cdot \langle x \rangle is \sigma_x^2. The Gaussian probability density follows from recognizing that the convolution
of probability densities corresponds to the product of their characteristic functions, N times.
When a random variable is scaled by a > 0, the probability density (see (A.12)) is \frac{1}{a} \cdot p_x\left(\frac{u}{a}\right).
This means the terms x' = \frac{1}{\sqrt{N}} \cdot x_n in the sum have probability densities p_{x'}(u) = \sqrt{N} \cdot p_x(\sqrt{N} \cdot u),
and thus characteristic functions \phi_x\left(\frac{\omega}{\sqrt{N}}\right). Further, the mean \mu_x can be subtracted from
each term in the sum (leaving overall zero mean) for this next proof
segment. With independent samples added, the characteristic functions (which are all the
same) multiply, so

\phi_{\langle x \rangle}(\omega) = \prod_{n=1}^{N} \phi_x\left(\frac{\omega}{\sqrt{N}}\right) = \left[\phi_x\left(\frac{\omega}{\sqrt{N}}\right)\right]^N .   (A.48)

Using a Taylor series expansion (see Appendix B), recalling that \phi_x(\omega) = E(e^{j\omega x}), and taking \sigma_x = 1 without
loss of generality (the \omega/\sqrt{N} term zeros because the mean has been removed),

\phi_x\left(\frac{\omega}{\sqrt{N}}\right) = 1 - \frac{\omega^2}{2N} + o\left(\frac{\omega^2}{2N}\right) ,   (A.49)

where the o\left(\frac{\omega^2}{2N}\right) term decays more rapidly than \frac{\omega^2}{2N} as N → ∞ and the Fourier Transform
presumably does not increase with frequency (as that would correspond to infinite energy, see
the Paley-Wiener Criterion in Appendix D). Then

\lim_{N \to \infty} \left[1 - \frac{\omega^2}{2N} + o\left(\frac{\omega^2}{2N}\right)\right]^N = \lim_{N \to \infty}\left[1 - \frac{\omega^2}{2N}\right]^N = e^{-\frac{1}{2}\omega^2} .   (A.50)

The latter is recognized as the characteristic function of a unit-variance Gaussian probability
density. QED.
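A short Monte Carlo sketch (not from the text; any i.i.d. distribution with finite variance would do) illustrates the theorem with uniform samples, whose zero-mean, normalized sum behaves like a Gaussian with variance σ_x² = 1/12:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 100, 200_000

# x_n uniform on [0,1): mu_x = 0.5, sigma_x^2 = 1/12
x = rng.random((trials, N))
z = (x.sum(axis=1) - N * 0.5) / np.sqrt(N)     # zero-mean sum scaled by 1/sqrt(N)

sigma_x = np.sqrt(1 / 12)
print("sample variance of z:", z.var())         # ~ 1/12 ~ 0.0833
# Tail probability compared with the Gaussian prediction Q(1) ~ 0.1587
print("P{z > sigma_x}: empirical", (z > sigma_x).mean(), " Gaussian ~ 0.1587")
```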
Chebyshev's inequality states that Pr\{|x - \mu_x| \ge k \cdot \sigma_x\} \le \frac{1}{k^2} for any k > 0.

Proof:

\sigma_x^2 = E\left[(x - \mu_x)^2\right]
           = E\left[(x - \mu_x)^2 \, / \, k \cdot \sigma_x \le |x - \mu_x|\right] \cdot Pr\{k \cdot \sigma_x \le |x - \mu_x|\}
             + E\left[(x - \mu_x)^2 \, / \, k \cdot \sigma_x > |x - \mu_x|\right] \cdot Pr\{k \cdot \sigma_x > |x - \mu_x|\}
           \ge (k \cdot \sigma_x)^2 \cdot Pr\{k \cdot \sigma_x \le |x - \mu_x|\} ,

and dividing both sides by k²σ_x² gives the inequality. QED.
Chebyshev's inequality helps prove the law of large numbers (LLN), which basically says that the
sample average converges to the mean if the samples are independent and drawn from the same distri-
bution.

Strong Form:

\lim_{N \to \infty} Pr\{\langle x \rangle_N = \mu_x\} = 1 .   (A.57)

Proof: Already established are E[\langle x \rangle_N] = \mu_x and \sigma^2_{\langle x \rangle_N} = \frac{\sigma_x^2}{N}. Chebyshev's inequality
as N → ∞ establishes the weak form. The strong form follows from

Pr\{|\langle x \rangle_N - \mu_x| < \epsilon\} = 1 - Pr\{|\langle x \rangle_N - \mu_x| \ge \epsilon\} \ge 1 - \frac{\sigma_x^2}{N \cdot \epsilon^2} ,   (A.58)

which tends to 1 as N becomes large. QED.
The Chernoff bound applies the Markov inequality to the nonnegative random variable e^{rx} for any r > 0:

Pr\{x \ge x_0\} = Pr\{e^{rx} \ge e^{r x_0}\} \le \frac{E[e^{rx}]}{e^{r x_0}} = \frac{\phi_x(r)}{e^{r x_0}} ,   (A.59)

which can be made tight by finding the r value that makes the bound as low as possible,
Pr\{x \ge x_0\} \le \min_{r > 0} e^{-r x_0} \cdot \phi_x(r).
This bound, when optimized (see [3]), is found to be exponentially tight (exact in the exponent) for the sample mean \langle x \rangle as N → ∞.
Chapter 2 largely uses the LLN in constructing the AEP, which essentially avoids much of the need for
the other bounds in this section, but they appear here for completeness.
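As a hedged numerical sketch of (A.59) (the distribution and numbers are illustrative, not from the text), the bound can be optimized over r for a zero-mean, unit-variance Gaussian and compared with the exact tail probability:

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import minimize_scalar

mu, sigma2, x0 = 0.0, 1.0, 4.0                         # example Gaussian and threshold

phi = lambda r: np.exp(r * mu + r**2 * sigma2 / 2)     # Gaussian moment generating function
bound = lambda r: phi(r) * np.exp(-r * x0)             # Chernoff bound (A.59) for a given r > 0

res = minimize_scalar(bound, bounds=(1e-6, 50.0), method='bounded')
exact = 0.5 * erfc((x0 - mu) / np.sqrt(2 * sigma2))    # exact tail Q((x0 - mu)/sigma)

print("optimized Chernoff bound:", res.fun)            # exp(-(x0-mu)^2/(2*sigma2)) ~ 3.4e-4
print("exact tail probability  :", exact)              # ~ 3.2e-5
```

The optimized bound captures the correct exponential decay, which is why it is useful for large-deviation arguments even though it is loose by a subexponential factor.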
A.1.10 Memoryless Random Variable
A memoryless random variable is such that

Pr\{x > s + t \, / \, x > s\} = Pr\{x > t\}   ∀ s, t ≥ 0 .

The only probability density that satisfies this property is the exponential distribution, for which
Pr\{x > u\} = 1 - F_x(u) = e^{-\lambda \cdot u}.
A.2 Random Processes
Special thanks are due to Dr. James Aslanis who wrote a version of this section when a Professor at the
University of Hawaii.
Random processes add a time-index to random variables, aggregating into a vector or time series
multiple samples from the random variable’s density/distribution at different times. In the most general
case, the probability density/distribution px may vary at the different sampling times, leading to a
time-variant random process as opposed to a stationary random process when the distribution
is invariant to time shifts.
The natural mathematical construct to describe a noisy communication signal is a random process.
Random processes are simply a generalization of random variables. Consider a single sample of a random
process, which is described by a random variable x with probability density px (u). (Often the random
process is assumed to be Gaussian but not always). This random variable only characterizes the random
process for a single instant of time. For a finite set of time instants, a random vector x = [xt1 , . . . , xtn ]
with a joint probability density function Px (u) describes the random process. Extension to a countably
infinite set of random variable samples, indexed by n, defines a discrete-time random process.
The random variables in a random process need not be independent nor identically distributed,
although the random variables all sample the same domain. Similarly, the mean E[f (Xt )] is a function
of the index. The mean, also known as an ensemble average, should not be confused with averaging
over the time index (sometimes referred to as the sample mean). For example the mean value of a
random variable associated with the tossing of a die is approximated by averaging the values over many
independent tosses.
When the sample mean converges to the ensemble average, the random process is mean ergodic.
In a random process, each random variable in the collection of random variables may have a different
probability density function. Thus, time averaging, or the sample mean, over successive samples may
not yield any information about the ensemble averages.
Random processes are classified by statistical properties that their density functions obey, the most
important of which is stationarity.
A.2.1 Stationarity

A random process x_t is strict-sense stationary (SSS) if its joint distributions are invariant to any time shift t:

p_{x_{t_1},...,x_{t_n}}(u_{t_1}, \dots, u_{t_n}) = p_{x_{t_1+t},...,x_{t_n+t}}(u_{t_1}, \dots, u_{t_n})   ∀ n, t, \{t_1, \dots, t_n\} .   (A.62)

Roughly speaking, the statistics of x_t are invariant to a time shift; i.e., the placement of the origin t = 0
is irrelevant.
This text next considers commonly calculated functions of a random process. These functions en-
capsulate properties of the random process, and a linear-systems analysis sometimes calculates these
functions without knowing the exact process probability density. Certain random processes,
such as stationary Gaussian processes, are completely described by a collection of these functions.
Definition A.2.3 (Mean) The mean of a random process x_t is

E(x_t) \triangleq \int_{-\infty}^{\infty} u_t \cdot p_{x_t}(u_t) \cdot du_t = \mu_x(t) .   (A.63)

The autocorrelation function of a random process is r_x(t_1, t_2) \triangleq E(x_{t_1} \cdot x^*_{t_2}). In general, the autocorrelation
is a two-dimensional function of the pair \{t_1, t_2\}. For a stationary
process, the autocorrelation is a one-dimensional function of the time difference \tau \triangleq t_1 - t_2 only:

r_x(\tau) = E(x_t \cdot x^*_{t-\tau}) .

The stationary process' autocorrelation also satisfies a Hermitian property, r_x(\tau) = r_x^*(-\tau).
Using the mean and autocorrelation functions, also known as the first- and second-order moments,
designers often define a weaker form of stationarity.
Definition A.2.5 [Wide Sense Stationarity] A random process Xt is called wide sense
stationary (WSS) if
1. E(xt ) = constant,
2. E(xt1 · x∗t2 ) = rx (t1 − t2 ) = rx (τ ), i.e. a function of the time difference only.
While SSS ⇒ WSS, WSS ⇏ SSS. Often, random processes' analysis only considers their first- and
second-order statistics. Such results do not reveal anything about the random process' higher-order
statistics; the Gaussian random process, however, is completely determined by these lower-order statistics. In particular,
Definition A.2.6 [Gaussian Random Process] The joint probability density func-
tion of a stationary real Gaussian random process for any set of n indices \{t_1, \dots, t_n\}
is

p_x(u) = \frac{1}{(2\pi)^{n/2} |\Sigma_{xx}|^{1/2}} \exp\left(-\frac{1}{2}(u - \mu_x)\Sigma_{xx}^{-1}(u - \mu_x)'\right) .   (A.66)

A complex Gaussian random variable has independent Gaussian random variables in both
the real and imaginary parts, both with the same variance, which is half the variance of
the complex random variable. Then, the distribution is

p_x(u) = \frac{1}{\pi^n |\Sigma_{xx}|} \exp\left(-(u - \mu_x)\Sigma_{xx}^{-1}(u - \mu_x)^*\right) .   (A.67)
For a Gaussian random process, the set of random variables {xt1 , . . . , xtn } are jointly Gaussian. A
Gaussian random process also satisfies the following two important properties:
1. The output response of a linear time-invariant system to a Gaussian input is also a Gaussian
random process.
2. A WSS, real-valued, Gaussian random process is SSS.
Much of this textbook's analysis considers Gaussian random processes passed through linear time-
invariant systems. As a result of the properties listed above, the designer only requires these processes'
mean and autocorrelation functions for complete characterization. Fortunately, the designer can calculate
the effect of linear time-invariant systems on the random process without explicitly using the probability
densities/distributions.
In particular, for a linear time-invariant system defined by an impulse response h(t), the mean of the
output random process Y_t is

\mu_y(t) = \mu_x(t) * h(t) .   (A.68)

The output autocorrelation function is

r_y(\tau) = h(\tau) * h^*(-\tau) * r_x(\tau) .   (A.69)

In addition, many analyses use the cross-correlation between the input and output random processes,
r_{xy}(t_1, t_2) \triangleq E[x_{t_1} \cdot y^*_{t_2}]. For jointly WSS random processes, the cross-correlation only depends on the time difference,

r_{xy}(t_1, t_2) = r_{xy}(t_1 - t_2) = r_{xy}(\tau) .   (A.71)

The cross-correlation r_{xy}(\tau) does not satisfy the Hermitian property that the autocorrelation obeys,
but

r_{xy}(\tau) = r_{yx}^*(-\tau) .   (A.72)

Further,

r_{xy}(\tau) = r_x(\tau) * h^*(-\tau) ,   (A.73)
r_{yx}(\tau) = r_x(\tau) * h(\tau) .   (A.74)
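A simulation sketch (using numpy/scipy; the filter and noise parameters are illustrative, not from the text) can verify (A.69) by passing white noise, for which r_x(τ) = σ²·δ(τ), through a short FIR filter and comparing the estimated output autocorrelation against h(τ) ∗ h*(−τ) ∗ r_x(τ):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
sigma2 = 1.0
x = rng.normal(scale=np.sqrt(sigma2), size=2_000_000)   # white WSS input, r_x(tau) = sigma2*delta(tau)
h = np.array([1.0, 0.5, -0.25])                          # example FIR impulse response
y = signal.lfilter(h, [1.0], x)

# Theory: r_y(k) = sigma2 * sum_n h[n]*h[n-k]  (the deterministic autocorrelation of h)
r_theory = sigma2 * np.correlate(h, h, mode='full')      # lags -2 .. +2

lags = range(-2, 3)
r_est = [np.mean(y[2:-2] * y[2 - k:len(y) - 2 - k]) for k in lags]
for k, rt, re in zip(lags, r_theory, r_est):
    print(f"lag {k:+d}: theory {rt:+.4f}   estimate {re:+.4f}")
```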
A more general form of stationarity is cyclostationarity, wherein the random process' density/distribution
is invariant only to specific index shifts:

p_{x_{t_1},...,x_{t_n}}(u_{t_1}, \dots, u_{t_n}) = p_{x_{t_1+T},...,x_{t_n+T}}(u_{t_1}, \dots, u_{t_n})   ∀ n, \{t_1, \dots, t_n\} .   (A.75)

That is, x_{t+kT} is statistically equivalent to x_t ∀ t, k. Cyclostationarity accounts for the regularity
in communication transmissions that repeat a particular operation at specific time intervals; however,
within a particular time interval, the statistics are allowed to vary arbitrarily. As with stationarity, a
weaker form of cyclostationarity depends only on the first- and second-order statistics of the random
process:
Definition A.2.9 [Wide Sense Cyclostationarity] A random process is wide sense
cyclostationary if
1. E(xt ) = E(xt+kT ) ∀ t, k.
2. rx (t + τ, t) = rx (t + τ + kT, t + kT ) ∀ t, τ, k.
Thus, the mean and autocorrelation functions of a WS cyclostationary process are periodic functions
with period T . Many random signals in communications, such as an ensemble of modulated waveforms,
satisfy the WS cyclostationarity properties.
The periodicity of a WS cyclostationary random process would complicate the study of modulated
signals without use of the following convenient property. Given a WS cyclostationary random process
x_t with period T, the random process x_{t+θ} is WSS if θ is a uniform random variable over the interval
[0, T]. Thus, analysis often shall include (or assume) a random phase θ to yield a WSS random process.

Alternatively, for a WS cyclostationary random process, there is a time-averaged autocorrelation
function,

r_x(\tau) \triangleq \frac{1}{T} \int_0^T r_x(t + \tau, t) \cdot dt .   (A.76)

Since the autocorrelation function r_x(t + τ, t) is periodic, integration could be over any closed interval
of length T in (A.76).
As in the study of deterministic signals and systems, frequency-domain descriptions are often useful
for analyzing random processes. First, this appendix continues with the definitions for deterministic
signals.

Definition A.2.11 [Energy Spectral Density] The Energy Spectral Density of a
finite-energy deterministic signal x(t) is |X(ω)|², where

X(\omega) \triangleq \int_{-\infty}^{\infty} x(t) \cdot e^{-j\omega t} dt = \mathcal{F}\{x(t)\} .   (A.77)
If the finite-energy signal x(t) is nonzero for only a finite time interval, say T , then the time average
power in the signal equals Px = Ex /T .
Communication signals are usually modeled as repeated patterns extending from (−∞, ∞), in which
case the energy is infinite, although the time average power may be finite.
Definition A.2.12 [Power Spectral Density] The power spectral density of a finite-
power signal is defined as

S_x(\omega) = \lim_{T \to \infty} \frac{|X_T(\omega)|^2}{T} ,   (A.79)

where X_T(\omega) = \mathcal{F}\{x_T(t)\} is the Fourier transform of the truncated signal

x_T(t) = \begin{cases} x(t) & |t| < \frac{T}{2} \\ 0 & \text{otherwise} \end{cases} .   (A.80)

Thus, the time-average power is calculable as
P_x \triangleq \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} |x(t)|^2 \cdot dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_x(\omega) \cdot d\omega < \infty.
For deterministic signals, the power Px is a time-averaged quantity.
For a random process, strictly speaking, the Fourier transform does not exactly specify the power
spectral density. Even if the random process is well-behaved, the result would be another random
process. Instead, ensemble averages are required for frequency-domain analysis.
∆
For a random process xt , the ensemble average power, Pxt = E[|xt |2 ], may vary instantaneously over
time. For a WSS random process, however, Pxt = Px is a constant.
For a WS cyclostationary random process xt the autocorrelation function rx (t + τ, t), for a fixed time
lag τ , is periodic in t with period T . Consequently the autocorrelation function can be expanded using
a Fourier series.
r_x(t + \tau, t) = \sum_{n=-\infty}^{\infty} \gamma_n(\tau) \cdot e^{j 2\pi n t / T} ,   (A.82)

where the n = 0 coefficient \gamma_0(\tau) is the time-averaged autocorrelation function of (A.76). The function
G_0(f) = \mathcal{F}\{\gamma_0(\tau)\} is the power spectral density of the WS cyclostationary random process x_t associated
with the time-averaged autocorrelation.

For a nonstationary random process, the average power must be calculated by both time and ensemble
averaging, i.e., P_{x_t} = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} E[|x_t|^2] \, dt.
These same relations hold in discrete time as well, with the Fourier transform replaced by its discrete-time
counterpart (frequency entering through e^{jω}). For more on Fourier Transforms and well-behaved functions, see Appendix D.
A.3 Passband Processes
This appendix investigates properties of the correlation functions for a WSS passband random process
x(t) in its several representations. For a brief introduction to the definitions of random processes see
Section A.2.
Definition A.3.1 (Hilbert Transform) The Hilbert Transform of x(t) is denoted x̌(t)
and is given by

\check{x}(t) = \hbar(t) * x(t) = \int_{-\infty}^{\infty} \frac{x(u)}{\pi(t - u)} \cdot du ,   (A.88)

where

\hbar(t) = \begin{cases} \frac{1}{\pi t} & t \ne 0 \\ 0 & t = 0 \end{cases} .   (A.89)

Equation (A.94) shows that a frequency component at a positive frequency is shifted in phase by −90°,
while a component at a negative frequency is shifted by +90°. Summarizing, the Hilbert transform's frequency
response is H(\omega) = \mathcal{F}\{\hbar(t)\} = -j \cdot \text{sgn}(\omega).
Since |H(ω)| = 1 ∀ ω ≠ 0, then |X(ω)| = |X̌(ω)|, assuming X(0) = 0. This text only considers passband
signals with no energy present at DC (ω = 0). Thus, the Hilbert Transform only affects the phase and
not the magnitude of a passband signal.
A.3.1.1 Examples

Let

x(t) = \cos(\omega_c t) = \frac{1}{2} \cdot \left(e^{j\omega_c t} + e^{-j\omega_c t}\right) ;   (A.96)

then

\check{x}(t) = \frac{1}{2} \cdot \left(-j e^{j\omega_c t} + j e^{-j\omega_c t}\right) = \frac{1}{2j} \cdot \left(e^{j\omega_c t} - e^{-j\omega_c t}\right) = \sin(\omega_c t) .   (A.97)

Let

x(t) = \sin(\omega_c t) = \frac{1}{2j} \cdot \left(e^{j\omega_c t} - e^{-j\omega_c t}\right) ;   (A.98)

then

\check{x}(t) = \frac{1}{2j} \cdot \left(-j e^{j\omega_c t} - j e^{-j\omega_c t}\right) = -\frac{1}{2}\left(e^{j\omega_c t} + e^{-j\omega_c t}\right) = -\cos(\omega_c t) .   (A.99)

Note that

\check{\check{x}}(t) = \hbar(t) * \hbar(t) * x(t) = -x(t) ,   (A.100)

since (-j \cdot \text{sgn}(\omega))^2 = -1 ∀ ω ≠ 0. A correct interpretation of the Hilbert transform is that every
sinusoidal component is passed with the same amplitude, but with its phase reduced by 90 degrees.
The inverse Hilbert transform thus recovers x(t) = -\hbar(t) * \check{x}(t), or

\hbar^{-1}(t) = -\hbar(t) = \begin{cases} -\frac{1}{\pi t} & t \ne 0 \\ 0 & t = 0 \end{cases} .   (A.102)
For a passband signal x(t) = x_I(t) \cdot \cos(\omega_c t) - x_Q(t) \cdot \sin(\omega_c t), with analytic signal x_A(t) = x(t) + j\check{x}(t),

\check{x}(t) = x_I(t) \cdot \sin(\omega_c t) + x_Q(t) \cdot \cos(\omega_c t) = \Im\{x_A(t)\} .   (A.106)
Thus, a WSS random process and its Hilbert Transform have the same autocorrelation function and the
same power spectral density.
By Equation (A.74),

r_{\check{x},x}(\tau) = r_x(\tau) * \hbar(\tau) = \check{r}_x(\tau) .   (A.110)

The cross-correlation between the random process x(t) and its Hilbert transform x̌(t) is thus the Hilbert
transform of the autocorrelation function of the random process x(t); similarly, by Equation (A.73),
r_{x,\check{x}}(\tau) = r_x(\tau) * \hbar^*(-\tau) = -\check{r}_x(\tau), since \hbar(t) is real and odd. Thus also,

r_{\check{x},x}(\tau) = \hbar(\tau) * r_x(\tau) = \hbar^*(\tau) * r_x^*(-\tau) = r^*_{x,\check{x}}(-\tau) = r_{x,\check{x}}(-\tau) .   (A.111)

By using Equations (A.109), (A.110) and (A.111),

r_{\check{x},x}(\tau) = \check{r}_x(\tau) = -r_{x,\check{x}}(\tau) = -r_{\check{x},x}(-\tau) .   (A.112)
Equation (A.112) implies that rx̌,x (τ ) is an odd function, and thus
rx̌,x (0) = 0 . (A.113)
That is, a real-valued random process and its Hilbert Transform are uncorrelated at any particular point
in time.
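A brief numerical sketch (assuming scipy.signal.hilbert, which returns the analytic signal x_A(t) = x(t) + j·x̌(t); the parameters are illustrative) reproduces the cosine example of (A.96) and (A.97) and the zero lag-0 correlation of (A.113):

```python
import numpy as np
from scipy.signal import hilbert

fs, fc = 1000.0, 50.0                      # sample rate and tone frequency (whole cycles in 1 s)
t = np.arange(0, 1, 1 / fs)
x = np.cos(2 * np.pi * fc * t)

x_check = np.imag(hilbert(x))              # Hilbert transform of x (imaginary part of analytic signal)

print("max |x_check - sin| :", np.max(np.abs(x_check - np.sin(2 * np.pi * fc * t))))  # ~ 0
print("sample corr at lag 0:", np.mean(x * x_check))    # ~ 0, consistent with (A.113)
```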
Standard trigonometric identities simplify (A.125) to

r_x(\tau) = \frac{1}{2} \cdot [r_I(\tau) + r_Q(\tau)] \cdot \cos(\omega_c \tau)
          + \frac{1}{2} \cdot [r_{IQ}(\tau) - r_{QI}(\tau)] \cdot \sin(\omega_c \tau)
          - \frac{1}{2} \cdot [r_Q(\tau) - r_I(\tau)] \cdot \cos(\omega_c (2t - \tau))
          - \frac{1}{2} \cdot [r_{IQ}(\tau) + r_{QI}(\tau)] \cdot \sin(\omega_c (2t - \tau)) .   (A.126)
Strictly speaking, most modulated waveforms are WS cyclostationary with period T_c = \frac{2\pi}{\omega_c}, i.e.,
E[x(t) \cdot x^*(t - \tau)] = r_x(t, t - \tau) = r_x(t + T_c, t + T_c - \tau). For cyclostationary random processes, a time-
averaged autocorrelation function of one variable τ can be defined by r_x(\tau) = \frac{1}{T_c}\int_{-T_c/2}^{T_c/2} r_x(t, t - \tau)\,dt,
and this new time-averaged autocorrelation function will satisfy the properties derived thus far in this
section. The next set of properties requires the random process to be WSS, not WS cyclostationary –
or, equivalently, the time-averaged autocorrelation function r_x(\tau) = \frac{1}{T_c}\int_{-T_c/2}^{T_c/2} r_x(t, t - \tau)\,dt can be used.
An example of a WSS random process is AWGN. Modulated signals often have equal-energy in-phase
and quadrature components with the in-phase and quadrature signals derived independently from the
incoming bit stream; the modulated signal is then WSS.
For x(t) to be WSS, the last two terms in (A.126) must equal zero. Thus, rI (τ ) = rQ (τ ) and rIQ (τ ) =
−rQI (τ ) = −rIQ (−τ ). The latter equality shows that rIQ (τ ) is an odd function of τ and thus rIQ (0) = 0.
For x(t) to be WSS, the in-phase and quadrature components of x(t) have the same autocorrelation and
are uncorrelated at any particular instant in time. Substituting back into Equation (A.126),

r_x(\tau) = r_I(\tau) \cdot \cos(\omega_c \tau) + r_{IQ}(\tau) \cdot \sin(\omega_c \tau) .
A.4 Markov Processes
Markov processes often characterize the distribution dynamics of time-varying processes, but have a more
general mathematical description, as in Section 1.6. While the Markov process is time-varying, it often
has a stationary distribution that characterizes the likelihood of being in any one of a number, |A|, of
different states. The Markov process specifically has a probability of being in a next state that depends
only on the previous state:

p_{k/i} = p_{k/i, i-1, \dots, 0} .   (A.134)
The discrete (special case of a) Markov model has an |A| × |A| probability-transition matrix with
entries p_{k/j} representing the probability that state k will occur when the current/last state is j. The
probability matrix is then

P \triangleq \begin{bmatrix} p_{|A|-1/|A|-1} & p_{|A|-1/|A|-2} & \cdots & p_{|A|-1/0} \\ \vdots & \vdots & \ddots & \vdots \\ p_{0/|A|-1} & p_{0/|A|-2} & \cdots & p_{0/0} \end{bmatrix} .   (A.135)
Figure A.4: Markov State-Machine Model (|A| = 4 example), with non-zero transition probabilities shown.
Distant past is unimportant in Markov processes; only the last state matters. The selection of a next
state may be associated with a random variable, with probability distribution characterized by an input
that determines to which state the Markov process next proceeds. That input has a distribution that
corresponds to a column of P. When, as always the case in this text, that input is stationary, P is a
constant matrix of nonnegative entries. Each column also sums to unity. There may be a random
value associated with each state that is observed rather than the state itself, where that value is
a function of the state, perhaps jointly with another random variable that is independent of the state,
which then creates a Hidden Markov Process (HMP). This latter HMP often characterizes a time-varying
channel where the channel H varies according to P but independent noise is added to create a channel output.
In this case, the transition between states does not depend on an input. However, in other situations
the channel H may be fixed and the transmitted symbol causes the state to change; the number of
states is then |A| = M (the number of symbol-constellation values). This second example often leads
to maximum-likelihood sequence decoders, and again the noise is independent of the channel input. Yet
another (not hidden) example uses a Markov process to characterize the interarrival times between
message packets/symbols at the channel input, where the arrival of the next packet only depends on the
time since the last arrival. Markov processes are often also characterized by a state-probability distribution
p_k, where

p_k = P \cdot p_{k-1} .   (A.136)
The matrix P has all nonnegative entries, as does p_k. Because of the possibility of zero entries in P,
non-degenerate Markov processes satisfy an additional condition that

P^n_{k/i} > 0 \text{ for some } n > 0 \quad \text{and} \quad P^{n'}_{i/k} > 0 \text{ for some } n' > 0 ,   (A.137)

which means that any state will eventually be reached from any other state with nonzero probability³.
That is, the Markov process is irreducible. A periodic state i occurs when P^n_{i/i} \ne 0 only if n = k · d for
k, d ∈ Z⁺, and d > 1 is the period. If d = 1, the state is aperiodic. If no states are periodic, the Markov process
is aperiodic. If the MP is both irreducible and aperiodic, then for a large enough number of successive
transitions, starting at any point in time, some integer power of the matrix P will contain all
positive (nonzero) entries. Then, by Perron-Frobenius theory⁴, such a matrix has a largest positive-real
eigenvalue that dominates P^{n→∞} and a corresponding all-positive-entry eigenvector. This eigenvector
π also satisfies

P \cdot \pi = \pi ,   (A.138)
and is the stationary vector or probability distribution (when normalized so that it sums to one). When
a stationary distribution exists, the Markov process is balanced, so that the sum of the probabilities of
going to other states out of any state j equals the probability of arriving in state j from all other states,
thus satisfying the balance equation:

\left(\sum_i p_{i/j}\right) \cdot p_j = \sum_i p_{j/i} \cdot p_i .   (A.139)

The balance equation simplifies for many arrival-process situations as in Section A.5.
P \cdot p_0 = P^t \cdot p_0 .   (A.143)
³ Equivalently, for some large n'' ≥ max(n, n'), P^{n''} > 0 elementwise, so it is as if the transition matrix for n'' transitions is all
positive and then satisfies the Perron-Frobenius condition of all positive entries.
⁴ See the Wikipedia page at https://en.wikipedia.org/wiki/Perron–Frobenius_theorem.
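A small numpy sketch (the transition matrix is hypothetical, chosen to be irreducible and aperiodic) finds the stationary distribution of (A.138) from the eigenvector at eigenvalue 1 and cross-checks it against a large power of P:

```python
import numpy as np

# Hypothetical 4-state column-stochastic matrix: P[k, j] = p_{k/j} = Pr{next state k | current state j}
P = np.array([[0.5, 0.2, 0.0, 0.3],
              [0.5, 0.0, 0.4, 0.0],
              [0.0, 0.8, 0.1, 0.2],
              [0.0, 0.0, 0.5, 0.5]])

# Stationary vector: eigenvector of P for eigenvalue 1 (Perron-Frobenius), normalized to sum to one
vals, vecs = np.linalg.eig(P)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()

print("stationary pi :", pi)
print("P @ pi        :", P @ pi)                                  # equals pi, per (A.138)
print("column of P^50:", np.linalg.matrix_power(P, 50)[:, 0])     # every column of P^n approaches pi
```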
A.5 Queuing Theory
In this text, renewal processes are basically random processes that attempt to model the arrival or
departure of messages. The interarrival interval is random, essentially signalling
a burst of messages to be sent. Often these messages are queued with perhaps a steady output flow
(perhaps augmented by dummy packets) into a modulator that transmits b bits/symbol. The depth
of the queue indicates roughly the need for use/service of the channel. For this text's purposes, the
queue can be possibly infinite in length/delay to avoid consideration of "buffer overflow" errors that
typically find solution at higher layers. Instead, this appendix simply investigates random modeling of
the arrivals, by the common Poisson distribution or other potential distributions. The book by Gallager
[3] (Chapter 2) is an excellent, more detailed reference on this subject. See also Giambene.
The instants t_n theoretically need not be integer multiples of some basic time unit T (like a sampling
clock's zero crossings), but in practice any system would likely approach that integer multiple, with arrivals
slightly delayed until the start of the next symbol as would be necessary in most implementations. A
continuous-time random process B(t) can count the number of arrivals that have occurred before or at any
time t > 0, so B(t) is the largest n for which t_n ≤ t.
The discrete-valued distribution for B(t) derives from looking at the integral of the Erlang density
over the infinitesimally small interval (t < t_{n+1} < t + δ) as in [3], which then provides the Poisson
probability mass function

p_{B(t)}(n) = \frac{(\lambda \cdot t)^n \cdot e^{-\lambda \cdot t}}{n!} .   (A.152)
Figure A.5: M/M/1 Queue - simple birth-death Markov Process state-machine model.
The M/M/1 queue's balance equations yield

\lambda \cdot \pi_n = b \cdot \pi_{n+1} ,   (A.153)

so that, with \rho \triangleq \lambda / b,

\pi_n = \rho^n \cdot (1 - \rho) .   (A.154)
The average delay, queueing delay, and queue backlog for the M/M/1 queue are then

\Delta = \frac{1}{b - \lambda} ,   (A.156)

\Delta_q = \frac{\rho}{b - \lambda} ,   (A.157)

B_q = \frac{\rho^2}{1 - \rho} .   (A.158)
The variance of the delay depends on the scheduling policy and can be complicated to compute. The
M/M/1 queue is sometimes called a statistical multiplexer because, as the users arrive randomly, they
are served by (typically) a high-speed channel. To understand this better, the input may be viewed as
U users' messages/symbols all multiplexed onto one channel. Each of the users may have an average data
rate of λ/U, which of course sums over all users to λ. The aggregate as well as the individual message
streams are Poisson processes; the users are just slower individually. A deterministic multiplexer
instead equally apportions b/U of the channel bandwidth to each user (presuming tacitly that the capacity
region allows such equal apportionment). It then has delay

\Delta_{det} = \frac{1}{b/U - \lambda/U} = \frac{U}{b - \lambda} = U \cdot \Delta_{stat} .   (A.159)

The statistical multiplexer thus has less delay than would, say, a TDM (or OFDM - see Chapter
4) system. However, a slightly different queue is the D/D/1 queue, where each user is presumably
streaming at data rate λ/U and matched to its turn in the channel that runs at b ≥ λ. The D/D/1 system
has constant delay of one symbol, no variation, and no chance of buffer overflow. While the M/M/1
has significantly lower average delay for U > 1, very large delays can have nonzero probability. Thus,
the trade-off in using statistical multiplexing (over deterministic) is not so simple, eventually depending
on the relative user rates, the scheduling method, and the maximum tolerable delay. Systems with many
users continually streaming (e.g., video), especially when the aggregate data rate approaches the channel
sum-capacity, are better served by deterministic scheduling. Other systems, with more infrequent use
(whether low data rate or high data rate for any particular user), may be better served by a statistical
multiplexer. Intermediate to Chapter 2's QPS (queue-proportional scheduling), which considers
the channel capacity region, is a queue of interest that models a system with
U parallel channels each carrying b bits/symbol, as in the next subsection.
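The comparison in (A.156)-(A.159) is easy to tabulate; the sketch below (example rates only, not from the text) computes the M/M/1 statistical-multiplexer metrics and the corresponding deterministic-multiplexer delay:

```python
def mm1_metrics(lam, b):
    """M/M/1 with aggregate arrival rate lam and service rate b (symbols per unit time), rho = lam/b < 1."""
    rho = lam / b
    assert rho < 1, "the queue is unstable unless lam < b"
    return {"rho": rho,
            "delay": 1.0 / (b - lam),               # (A.156)
            "queue_delay": rho / (b - lam),         # (A.157)
            "queue_backlog": rho**2 / (1 - rho)}    # (A.158)

lam, b, U = 8.0, 10.0, 4                            # example aggregate rate, channel rate, user count
stat = mm1_metrics(lam, b)
det_delay = U / (b - lam)                           # (A.159): U times the statistical delay

print("statistical multiplexer :", stat)
print("deterministic mux delay :", det_delay)
```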
Figure A.6: M/M/U Queue - U-user birth-death Markov Process state-machine model.
The M/M/U queue's backlog, delay, queueing delay, and queue backlog are then

B = \rho + \frac{\pi_0 \cdot \rho^U}{(U-1)! \cdot (U - \rho)^2} ,   (A.162)

\Delta = \frac{1}{b} + \frac{1}{\lambda} \cdot \frac{\pi_0 \cdot \rho^{U+1}}{(U-1)! \cdot (U - \rho)^2} ,   (A.163)

\Delta_q = \frac{1}{\lambda} \cdot \frac{\pi_0 \cdot \rho^{U+1}}{(U-1)! \cdot (U - \rho)^2} ,   (A.164)

B_q = \frac{\pi_0 \cdot \rho^{U+1}}{(U-1)! \cdot (U - \rho)^2} .   (A.165)

Another related quantity of interest is the probability that a user has to wait in queue,

P_Q = \sum_{n=U}^{\infty} \pi_n = \frac{\pi_0 \cdot \rho^U}{(U-1)! \cdot (U - \rho)} ,   (A.166)
which is known as the Erlang C formula [4]. The M/M/U queue also has an interesting large-number-
of-users property in that the system essentially presents a Poisson process to whatever comes next:

\lim_{U \to \infty} \pi_n = \pi_0 \cdot \frac{1}{n!}\left(\frac{\lambda}{b}\right)^n   (A.167)
                          = \frac{e^{-\lambda/b}}{n!} \cdot \left(\frac{\lambda}{b}\right)^n ,   (A.168)

\lim_{U \to \infty} B = \rho .   (A.169)
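A short Python sketch (parameter values are illustrative) evaluates π_0 and the Erlang C probability (A.166), and recovers the queue backlog (A.165) from it via B_q = P_Q · ρ/(U − ρ):

```python
import math

def erlang_c(U, rho):
    """Erlang C (A.166) for an M/M/U queue with offered load rho = lambda/b < U."""
    inv_pi0 = sum(rho**n / math.factorial(n) for n in range(U)) \
              + rho**U / (math.factorial(U - 1) * (U - rho))     # normalization giving 1/pi_0
    pi0 = 1.0 / inv_pi0
    return pi0 * rho**U / (math.factorial(U - 1) * (U - rho))

U, lam, b = 4, 3.0, 1.0          # example: 4 servers of rate b, aggregate arrival rate lam, rho = 3
rho = lam / b
PQ = erlang_c(U, rho)
Bq = PQ * rho / (U - rho)        # (A.165) rewritten in terms of P_Q
print(f"P_Q = {PQ:.4f}   B_q = {Bq:.4f}")
```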
Figure A.7: M/M/U Queue with blocking - U-user birth-death Markov Process state-machine model (states 0 through U).
The relationship ρ < U (or ρ ≤ 1 for a single user, meaning the channel must run faster than the inputs
on average) need no longer hold, but there is a nonzero probability that messages are simply not sent,
which is π_U. The balance equations provide in this case

\pi_n = \pi_0 \cdot \frac{\rho^n}{n!} ,   (A.170)

which then yields (probabilities for states 0 to U must add to 1)

\pi_0 = \frac{1}{\sum_{n=0}^{U} \rho^n / n!} .   (A.171)

The blocking probability is then P_b = \pi_U = \frac{\rho^U / U!}{\sum_{n=0}^{U} \rho^n / n!}, known as the Erlang B formula⁵.
The backlog (carried load) and delay are then

B = \rho \cdot (1 - P_b) ,   (A.174)

\Delta = \frac{1}{b} ,   (A.175)

so the delay is basically the channel processing time for a symbol, but that may occur with unacceptable
blocking probability.
⁵ The inverse form has less dynamic range in its quantities, so it is easier to calculate in finite precision.
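The blocking probability P_b can be computed stably with the well-known Erlang B recursion, which works with 1/P_b (an inverse form in the spirit of the footnote); this sketch uses illustrative numbers:

```python
def erlang_b(U, rho):
    """Erlang B blocking probability P_b = (rho^U/U!) / sum_{n=0}^U rho^n/n!,
    computed via the standard recursion on 1/P_b to avoid large factorials."""
    inv_b = 1.0                              # 1/P_b for a system with zero servers
    for m in range(1, U + 1):
        inv_b = 1.0 + (m / rho) * inv_b      # inverse of P_b(m) = rho*P_b(m-1)/(m + rho*P_b(m-1))
    return 1.0 / inv_b

U, lam, b = 10, 8.0, 1.0                     # example: 10 channels, offered load rho = 8
rho = lam / b
Pb = erlang_b(U, rho)
print(f"P_b = {Pb:.4f}   carried load B = {rho * (1 - Pb):.3f}")   # (A.174)
```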
A.6 Gram-Schmidt Orthonormalization Procedure
This appendix illustrates the construction of a set of orthonormal basis functions ϕ_n(t) from a set of
modulated waveforms {x_i(t), i = 0, ..., M − 1}. The process for doing so, while achieving minimal dimen-
sionality, is called Gram-Schmidt Orthonormalization.
Step 1:
Find a signal in the set of modulated waveforms with nonzero energy and call it x_0(t). Let

\varphi_1(t) \triangleq \frac{x_0(t)}{\sqrt{\mathcal{E}_{x_0}}} ,   (A.176)

where \mathcal{E}_x = \int_{-\infty}^{\infty} [x(t)]^2 dt. Then x_0 = \left[\sqrt{\mathcal{E}_{x_0}} \;\; 0 \;\; \dots \;\; 0\right].
Step i (for i = 2, ..., M):

• Compute x_{i-1,n} for n = 1, ..., i − 1, where x_{i-1,n} \triangleq \int_{-\infty}^{\infty} x_{i-1}(t)\varphi_n(t)\,dt.

• Compute

\theta_i(t) \triangleq x_{i-1}(t) - \sum_{n=1}^{i-1} x_{i-1,n}\varphi_n(t) .   (A.177)

• If \theta_i(t) has nonzero energy \mathcal{E}_{\theta_i}, let \varphi_i(t) = \theta_i(t)/\sqrt{\mathcal{E}_{\theta_i}}; otherwise set \varphi_i(t) = 0.

Final Step:
Delete all components, n, for which ϕ_n(t) = 0 to achieve the minimum-dimensional basis-function set,
and reorder indices appropriately.
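The procedure translates directly to sampled waveforms; the numpy sketch below (example signals only) approximates the integrals by sums and returns the minimal orthonormal basis:

```python
import numpy as np

def gram_schmidt(waveforms, dt, tol=1e-9):
    """Orthonormalize sampled waveforms (rows), approximating integrals as sums with spacing dt."""
    basis = []
    for x in waveforms:
        theta = x.astype(float).copy()
        for phi in basis:                         # subtract projections onto existing basis functions
            theta -= (np.sum(x * phi) * dt) * phi
        energy = np.sum(theta**2) * dt
        if energy > tol:                          # keep only nonzero residuals (the final step)
            basis.append(theta / np.sqrt(energy))
    return np.array(basis)

# Example: four waveforms on [0, 1) that span only a 2-dimensional space
dt = 1 / 1000
t = np.arange(0, 1, dt)
s = np.array([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t),
              np.cos(2 * np.pi * t) + np.sin(2 * np.pi * t), -np.cos(2 * np.pi * t)])

phi = gram_schmidt(s, dt)
print("number of basis functions:", len(phi))              # 2
print("Gram matrix:\n", np.round(phi @ phi.T * dt, 6))     # identity
```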
Bibliography

[1] Alberto Leon-Garcia. "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd
Edition. Pearson Education, USA, 2007.

[2] Athanasios Papoulis. "Probability, Random Variables, and Stochastic Processes". McGraw-Hill,
New York, 1965.

[3] Robert G. Gallager. "Stochastic Processes: Theory for Applications". Cambridge University Press,
UK, 2013. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-262-discrete-stochastic-processes-spring-2011/

[4] Dimitri Bertsekas and Robert Gallager. "Data Networks", Second Edition. Prentice Hall, Englewood
Cliffs, NJ, 1992.

[5] Sheldon Ross. "Introduction to Probability and Statistics for Engineers and Scientists", 6th Edition.
Elsevier, Academic Press, 2020.