
Appendix A - Probability and

Statistics

A Probability and Statistics 501


A.1 Random Variables and Their Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
A.1.1 The random variable, probability distributions and densities . . . . . . . . . . . . . 502
A.1.2 Mean, Moments, and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
A.1.3 Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
A.1.4 Joint random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
A.1.5 Conditional Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 505
A.1.6 The random vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
A.1.7 The Gaussian and the Q-function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
A.1.8 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
A.1.9 The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
A.1.10 Memoryless Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
A.2 Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
A.2.1 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
A.2.2 Random Processes and Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . 516
A.3 Passband Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
A.3.1 The Hilbert Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
A.3.1.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
A.3.1.2 Inverse Hilbert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
A.3.2 Hilbert Transform of Passband Signals . . . . . . . . . . . . . . . . . . . . . . . . . 518
A.3.2.1 Hilbert Transform of a Random Process . . . . . . . . . . . . . . . . . . . 518
A.3.2.2 Quadrature Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 519
A.4 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
A.4.1 Birth-Death Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
A.5 Queuing Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
A.5.1 Arrival Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
A.5.2 Renewal and the Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 523
A.5.3 Queue Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
A.5.3.1 M/M/1 Queue: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
A.5.3.2 M/M/U Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
A.5.3.3 Finite Queues and Blocking . . . . . . . . . . . . . . . . . . . . . . . . . . 526
A.6 Gram-Schmidt Orthonormalization Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 528

Bibliography 529

Index 530

500
Appendix A

Probability and Statistics

This appendix provides a summary of probability and random-process basics. The intent is to offer a useful
reference for items used throughout the book, not to serve as a primary source for teaching probability
and/or statistics, for which there are many excellent sources, including one for electrical engineers [1],
a time-tested classic [2], an advanced treatment [3], and an introductory treatment [5].
Section A.1 samples and overviews prerequisite results on basic random variables and probability.
Readers should have some previous exposure to random variables and basic systems analysis; this section
is meant only as a refresher and notational consolidator. Section A.2 proceeds to random processes as
time-indexed random variables, reviewing results on stationarity (both strict and wide-sense) and some
basic linear-systems results. Section A.3 then specializes the first two sections to complex passband (or
baseband) signals. Sections A.4 and A.5 summarize Markov processes and basic queuing theory, and
Section A.6 reviews the Gram-Schmidt orthonormalization procedure.

A.1 Random Variables and Their Probability
Random variables take values that are not specific, or are not deterministic. This means several possibilities
exist, each with a certain likelihood (probability). This text's data transmission study has a few types
of random variables of interest:

Random Variables in Data Transmission

• Data messages m ∈ {0, ..., M − 1} or their consequent symbol representations x_i ∈ C = {x_0, ..., x_{|C|−1}} (M = |C| when uncoded)
• Noise n ∈ C
• Channel gain h ∈ C
• User index u ∈ {1, ..., U}
In general engineering and beyond, of course, there are many more examples, but those above are
this text's focus. A good reference with more significant detail and proofs is [?].

A.1.1 The random variable, probability distributions and densities


The possibilities for a discrete random variable in some finite or countably infinite set C with |C| ≤ ∞
elements are
x ∈ C = {x_0 , ... , x_{|C|−1}} ,   (A.1)
where i is an integer index, i ∈ Z+. In general, any label can apply to any finite set to characterize
the random variable (not just a data symbol in R or C), and the probability mass function, p_x(i),
specifies the probability that each random-variable value may occur:
p_x(i) , i = 0, ..., |C| − 1 ,   (A.2)
where 0 ≤ p_x(i) ≤ 1 ∀ i = 0, ..., |C| − 1. The probability that the random variable is in a subset
C' ⊆ C = {x_0 , ... , x_{|C|−1}}, with corresponding (possibly relabeled) indices j = 0, ..., |C'| − 1 ≤ |C| − 1, is
Pr{C'} = Σ_{j ∈ C'} p_x(j) ≤ 1 ,   (A.3)
with Pr{C} = 1.
Example probability mass functions include the discrete uniform mass function
p_x(i) = 1/|C| ,   (A.4)
and the Poisson Distribution that characterizes events occurring at rate R in time T:
p_x(i) = (R·T)^i · e^{−R·T} / i!   ∀ i ≥ 0 and R > 0 , T ≥ 0 .   (A.5)
The Bernoulli Distribution has only two values, x = 1 or x = 0, with probability mass function values
p and q = 1 − p respectively, while the geometric distribution has countably infinite discrete values
i = 1, ..., ∞ with probabilities
p_x(i) = p · (1 − p)^{i−1} ,   (A.6)
often corresponding to the first occurrence of a 1 on the i-th experimental sample from a Bernoulli
distribution. Another somewhat unusual distribution has value i occurring with decaying probability
p_x(i) = 1/(i·(i+1)) but infinite average (see Subsection A.1.2).

A continuous real probability density function, p_x(u) ≥ 0, measures a continuous random
variable's relative likelihood. Continuous random variables take values as real numbers x ∈ R, or
as complex numbers x ∈ C, with domain x ∈ D_x ⊆ C. The probability that the random variable is in a
continuous region (or set of such regions), D'_x ⊆ D_x, is
Pr{D'_x} = ∫_{u ∈ D'_x} p_x(u) · du ≤ 1 ,   (A.7)
with Pr{D_x} = 1. Often the term probability distribution more generally describes either or both of the
probability mass function and the probability density function. In such a context, the integral notation most
often appears and is equivalent to the sum if the integral is more broadly constructed as a Stieltjes
Integral (https://en.wikipedia.org/wiki/Riemann-Stieltjes_integral).
The cumulative distribution function for both discrete and continuous real or complex random
variables is

Fx (X) = P r{x ≤ X} . (A.8)
When x is continuous, then
p_x(u) = dF_x(u)/du ,   (A.9)
and when x is discrete, the probability distribution function’s values are the successive differences between
the cumulative distribution function’s values.
Example probability densities include the continuous uniform density
p_x(u) = 1/d   ∀ u ∈ [−d/2, d/2]   (A.10)
and the exponential distribution (parametrized by λ)
p_x(u) = λ · e^{−λu} for u ≥ 0, and p_x(u) = 0 for u < 0.   (A.11)
Functions of random variables are random variables. Thus, f(x) is a random variable if x is a random
variable. For invertible f, the new distribution is
p_y(y) = p_x(f^{−1}(y)) · | df^{−1}(y)/dy | .   (A.12)

A.1.2 Mean, Moments, and Variance


A discrete random variable’s mean value is
|C|−1

X
µx = E [x] = i · px (i) (A.13)
i=0

and for the continuous case,


Z Z

µx = E [x] = u · px (u) · du = udFx (u) , (A.14)
Dx Dx

where the last expression essentially reverses the axes in finding area under a curve. The mean is also
the first moment; higher order moments (about zero) are for real variables:
Z
n
E [x ] = E [x] = un · px (u) · du (A.15)
Dx

(the integral is replaced by the sum for the discrete case). When n = 2, this is the mean-square value
or the autocorrelation of x if generalized to complex variables as E[|x|2 ] = E[x · x∗ ]. The expectation
operator E is linear in that E[x + y] = E[x] + E[y]. The variance is the second moment about the
random variable’s mean:
σ_x² ≜ E[(x − E[x])²] = ∫_{D_x} (u − E[x])² · p_x(u) · du .   (A.16)

The standard deviation is the (positive) square root of the variance, σ_x.
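As a quick numeric illustration (a sketch, not part of the original development; it assumes a Python environment with numpy and scipy), integrating (A.14) and (A.16) for the exponential density of (A.11) recovers the mean 1/λ and variance 1/λ²:

```python
# A sketch (not from the text): mean and variance of the exponential density (A.11)
# by numerical integration of (A.14) and (A.16); assumes numpy and scipy.
import numpy as np
from scipy.integrate import quad

lam = 2.0
p = lambda u: lam * np.exp(-lam * u)                        # exponential density, u >= 0

mean, _ = quad(lambda u: u * p(u), 0, np.inf)               # (A.14)
var, _ = quad(lambda u: (u - mean)**2 * p(u), 0, np.inf)    # (A.16)
print(mean, 1 / lam)      # both ~0.5
print(var, 1 / lam**2)    # both ~0.25
```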

A.1.3 Moment Generating Functions
The moment-generating or characteristic function is the (frequency-reversed) Fourier Transform
(see Appendix C) of the probability distribution function
φ_x(r) = E[e^{rx}] = ∫_{u ∈ D_{x,r}} p_x(u) · e^{r·u} · du = P_x(r) ,   (A.17)
where D_{x,r} is a domain of convergence for the integral that depends on the distribution and the choice of r.
Often r = jω is used, and then Φ_x(ω) is the (frequency-reversed) Fourier Transform of the probability
density function. The moment generating function helps generate moments according to (with r = jω)
E[x^n] = (−j)^n · d^n φ_x(ω)/dω^n |_{ω=0} ,   (A.18)
essentially replacing (A.15)'s integration with (often simpler) differentiation to generate the random
variable's moments (see https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory)).
There is also the semi-invariant moment generating function
γ_x(r) = ln[φ_x(r)] ,   (A.19)
where
dγ_x(r)/dr = (1/φ_x(r)) · dφ_x(r)/dr = E[x · e^{rx}] / φ_x(r) ,   (A.20)
so, with ' denoting derivative,
γ_x'(0) = E[x] .   (A.21)
Similarly,
d²γ_x(r)/dr² = ( φ_x(r) · E[x² · e^{rx}] − E[x · e^{rx}] · E[x · e^{rx}] ) / φ_x²(r) ,   (A.22)
so that
γ_x''(0) = σ_x² ,   (A.23)
and γ_x''(r) > 0, so γ_x is convex in r.
Common distributions, their probability functions p_x, and characteristic functions Φ_x(ω) = E[e^{jωx}]:

• Bernoulli (parameter p): p_x = p, 1 − p ; Φ_x(ω) = 1 − p + p·e^{jω}
• Binomial (p, N trials): Φ_x(ω) = (1 − p + p·e^{jω})^N
• Poisson (rate R, time T): p_x(i) = (R·T)^i·e^{−R·T}/i! ; Φ_x(ω) = e^{R·T·(e^{jω}−1)}
• Discrete uniform on {a, ..., b}, a, b ∈ Z, b > a: p_x = 1/(b−a+1) ; Φ_x(ω) = (e^{jaω} − e^{j(b+1)ω}) / [(b−a+1)·(1−e^{jω})]
• Continuous uniform on [a, b]: p_x = 1/(b−a) ; Φ_x(ω) = (e^{jbω} − e^{jaω}) / [jω·(b−a)]
• Exponential (rate λ): p_x(x) = λ·e^{−λx} ; Φ_x(ω) = 1/(1 − jω/λ)
• Gaussian (mean µ_x, variance σ_x²): p_x(x) = e^{−(x−µ_x)²/(2σ_x²)} / √(2πσ_x²) ; Φ_x(ω) = e^{jωµ_x − σ_x²·ω²/2}
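As a check of (A.18) (a sketch, not from the text; it assumes the sympy package is available), differentiating the exponential distribution's characteristic function from the list above recovers E[x] = 1/λ and E[x²] = 2/λ²:

```python
# A sketch (not from the text) using sympy to verify (A.18) for the exponential
# distribution: moments follow from derivatives of the characteristic function.
import sympy as sp

w, lam = sp.symbols('omega lambda', positive=True)
j = sp.I

# Characteristic function of the exponential distribution (list above).
phi = 1 / (1 - j * w / lam)

for n in (1, 2):
    moment = sp.simplify((-j)**n * sp.diff(phi, w, n).subs(w, 0))
    print(f"E[x^{n}] =", moment)   # expect 1/lambda and 2/lambda**2
```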

A.1.4 Joint random variables


Two random variables also have a joint distribution p_{x,y}(u, v); this subsection simply uses the
continuous form, but the discrete forms follow with the obvious replacement of integrals by sums. The
original marginal distributions are obtained by
p_x(u) = ∫_{D_y} p_{x,y}(u, v) · dv   (A.24)
p_y(v) = ∫_{D_x} p_{x,y}(u, v) · du .   (A.25)

The probability that x and y are both in region D_{x,y} is
Pr{(x, y) ∈ D_{x,y}} = ∫_{D_{x,y}} p_{x,y}(u, v) · du · dv .   (A.26)
Extension to more than 2 random variables appears in Subsection A.1.6.


When the probability distribution factors as
p_{x,y}(u, v) = p_x(u) · p_y(v) ,   (A.27)
the two random variables are independent. The cross correlation of two random variables is
r_xy = E[x · y*] .   (A.28)
With independent random variables, trivially r_xy = E[x]·E[y*]. The converse does not necessarily imply
independence (but it does for Gaussian random variables in Subsection A.1.7). The covariance is
σ²_xy = E[(x − E[x]) · (y − E[y])*] = r_xy − E[x]·E[y*] .   (A.29)
The sum of two random variables has variance
σ²_{x+y} = σ_x² + σ_y² + 2·ℜ{σ²_xy} .   (A.30)
Thus, the variance of a sum is the sum of the variances, so σ²_{x+y} = σ_x² + σ_y² if x and y are independent.
Also, the probability density for the sum of two independent random variables is the convolution of
their probability densities ([1]):
p_{x+y}(u) = p_x(u) ∗ p_y(u) .   (A.31)

A.1.5 Conditional Probability Distributions


The conditional probability distribution furthers the joint distribution in providing a distribution for
one random variable y, given a specific value's observation, x = u, for the other random variable. This
conditional distribution is the ratio of the joint distribution p_{x,y}(u, v) at the specific value x = u
to the marginal distribution p_x(u):
p_{y/x}(u, v) = p_{x,y}(u, v) / p_x(u) .   (A.32)
(A.32)'s inherent symmetry then leads to
p_{y/x}(u, v) · p_x(u) = p_{x,y}(u, v) = p_{x/y}(u, v) · p_y(v) ,   (A.33)
sometimes known as Bayes Theorem:

Theorem A.1.1 [Bayes Theorem] The conditional probability distributions satisfy
p_{y/x}(u, v) = p_{x/y}(u, v) · p_y(v) / p_x(u) .   (A.34)

Proof: Rewrite (A.32). QED.

Also,
p_y(v) = ∫_{D_x} p_{y/x}(u, v) · p_x(u) · du .   (A.35)
Expectations apply to any distribution, and hence to conditional distributions. The implied expectation
variable may be written as a subscript, for instance
E_y{ E_x[x/y] } = E[x] = E_x[x] .   (A.36)

A.1.6 The random vector
The random vector
x = [ x_N  x_{N−1}  ...  x_1 ]^T   (A.37)
has N random-variable components. These N random variables' joint distribution is p_x(u). The
components are independent if p_x(u) = Π_{n=1}^{N} p_{x_n}(u_n). If any two of the components are independent
individually in the joint marginal distribution p_{x_i,x_j}, for i ≠ j, the components are said to be pairwise
independent. Pairwise independence does not necessarily imply independence of all components,
but independence necessarily includes pairwise independence. The concept of mean expands to the
N-dimensional mean vector
µ_x = E[x] = [ E[x_N]  E[x_{N−1}]  ...  E[x_1] ]^T ,   (A.38)
while the second moment concept expands to the N × N autocorrelation matrix:
Rxx = E [x · x∗ ] . (A.39)
A covariance matrix subtracts the mean vectors so that
Σxx = E [(x − µx ) · (x − µx )∗ ] = Rxx − µx · µ∗x . (A.40)
Similarly there is a cross-correlation matrix between two random vectors x and y
Rxy = E [x · y ∗ ] , (A.41)
which is the all-zeros matrix when the two random vectors are uncorrelated. Again, if x and y are
independent, then p_{x,y}(u, v) = p_x(u)·p_y(v). When the vectors are jointly Gaussian (Subsection A.1.7),
uncorrelatedness and independence are the same thing, but not in general.
The Chain Rule of probability recursively extends Bayes Theorem to random vectors so that
p_x(u) = Π_{n=1}^{N} p_{x_n/[x_{n−1} ... x_1]}(u_n, u_{n−1}, ..., u_1) ,   (A.42)
with p_{x_1/x_0} = p_{x_1}. The vector elements in the chain rule can be in any order.

A.1.7 The Gaussian and the Q-function


The Q Function is used to evaluate error probability in digital communication. It is the integral of
a zero-mean, unit-variance Gaussian random variable's probability density from some specified argument
to ∞:

Definition A.1.1 [Q Function]
Q(x) = ∫_x^∞ (1/√(2π)) · e^{−u²/2} · du .   (A.43)

The Q-function appears in Figures A.3, A.1, and A.2 for very low SNR (-10 to 0 dB), low SNR (0 to
10 dB), and high SNR (10 to 16 dB) using a very accurate approximation (less than 1% error) formula
from Leon-Garcia [1]:
Q(x) ≈ [ ((π−1)/π)·x + (1/π)·√(x² + 2π) ]^{−1} · e^{−x²/2} / √(2π) .   (A.44)


For the mathematician at heart, Q(x) = 0.5 · erfc(x/√2), where erfc is known as the complementary error
function. The integral cannot be evaluated in closed form for arbitrary x, but Q(0) = 0.5 and Q(−∞) = 1.
Matlab's q.m function evaluates it directly.

Figure A.1: Low SNR Q-Function Values (horizontal axis: argument in dB).

Figures A.1, A.2, and A.3 have their horizontal axes in dB (20·log10(x)). Q(−x) = 1 − Q(x), so
evaluation need only be for positive arguments. The following bounds apply:
(1 − 1/x²) · e^{−x²/2} / √(2πx²) ≤ Q(x) ≤ e^{−x²/2} / √(2πx²) .   (A.45)
The readily computed upper bound in (A.45) is easily seen to be a very close approximation for x ≥ 3.
Computation of the probability that a Gaussian random variable u with mean µx and variance σx2
exceeds some value d then uses the Q-function as follows:
P{u ≥ d} = Q( (d − µ_x) / σ_x ) .   (A.46)
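The following sketch (not from the text; it assumes numpy and scipy, and uses the Q(x) = 0.5·erfc(x/√2) identity above) compares the exact Q-function, the Leon-Garcia approximation (A.44), and the upper bound of (A.45):

```python
# A sketch (not from the text) comparing the exact Q-function, the approximation
# (A.44), and the upper bound in (A.45); assumes numpy and scipy.
import numpy as np
from scipy.special import erfc

def Q_exact(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

def Q_approx(x):
    # (A.44): Q(x) ~ [((pi-1)/pi)x + (1/pi)sqrt(x^2 + 2pi)]^{-1} exp(-x^2/2)/sqrt(2pi)
    denom = ((np.pi - 1) / np.pi) * x + (1 / np.pi) * np.sqrt(x**2 + 2 * np.pi)
    return np.exp(-x**2 / 2) / (np.sqrt(2 * np.pi) * denom)

def Q_upper(x):
    # Upper bound in (A.45), very tight for x >= 3.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi * x**2)

for x in (1.0, 3.0, 5.0):
    print(x, Q_exact(x), Q_approx(x), Q_upper(x))
```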

A.1.8 The Central Limit Theorem


The Central Limit Theorem is as follows (see also Wikipedia).

Theorem A.1.2 (Central Limit Theorem) A scaled sample average of random variables
independently drawn from the same distribution,
√N · <x> = (1/√N) · Σ_{n=0}^{N−1} x_n ,   (A.47)
approaches a Gaussian probability density with variance σ_x² and mean √N · µ_x. (The scaling
by √N here prevents the sum's variance from going to zero as N grows.)
Figure A.2: High SNR Q-Function Values (horizontal axis: argument in dB).

Figure A.3: Very Low SNR Q-Function Values (horizontal axis: argument in dB).
Proof: The sample average trivially has mean E[<x>] = (1/N)·N·µ_x = µ_x, and similarly, since the
random variables are independent, variance σ²_{<x>} = (1/N²)·N·σ_x² = σ_x²/N, so the variance of
√N·<x> is σ_x². The Gaussian probability density follows from recognizing that the convolution
of probability densities corresponds to the product of their characteristic functions, N times.
When a random variable is scaled by a > 0, the probability density (see A.12) is (1/a)·p_x(u/a).
This means the terms x' = (1/√N)·x_n in the sum have probability densities p_{x'}(u) = √N·p_x(√N·u),
and thus characteristic functions φ_x(ω/√N). Further, the mean µ_x can be subtracted from
each term in the sum (leaving overall zero mean) for the rest of this proof. With independent
samples added, the characteristic functions (which are all the same) multiply, so
φ_{<x>}(ω) = [ φ_x(ω/√N) ]^N .   (A.48)
Using a Taylor Series Expansion (see Appendix B), recalling that φ_x(ω) = E[e^{jωx}], and using the
exponential function's simple Taylor series expansion (the ω/√N term vanishes because the mean is
zero, and σ_x = 1 is taken without loss of generality),
φ_x(ω/√N) = 1 − ω²/(2N) + o(ω²/(2N)) ,   (A.49)
where the o(ω²/(2N)) term decays more rapidly than ω²/(2N) as N → ∞, and the Fourier Transform
presumably does not increase with frequency (as that would correspond to infinite energy; see
the Paley-Wiener Criterion in Appendix D). Then
lim_{N→∞} [ 1 − ω²/(2N) + o(ω²/(2N)) ]^N = lim_{N→∞} [ 1 − ω²/(2N) ]^N = e^{−ω²/2} .   (A.50)
The latter is recognized as the characteristic function of a unit-variance Gaussian probability
density. QED.
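A small Monte Carlo sketch (not from the text; it assumes numpy, and the zero-mean uniform input distribution is an arbitrary choice) illustrates the theorem: scaled sample averages behave like a Gaussian with variance σ_x².

```python
# A sketch (not from the text): Monte Carlo illustration of the CLT. Scaled sample
# averages of zero-mean uniform samples are compared to a Gaussian; assumes numpy.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 100, 200_000
sigma2 = 1.0 / 12.0                      # variance of uniform on [-1/2, 1/2]

x = rng.uniform(-0.5, 0.5, size=(trials, N))
z = np.sqrt(N) * x.mean(axis=1)          # sqrt(N) * <x>, as in (A.47)

print("sample variance:", z.var(), "vs sigma_x^2 =", sigma2)
# Fraction exceeding 2 standard deviations vs the Gaussian tail Q(2) ~ 0.0228.
print("P{z > 2*sigma}:", np.mean(z > 2 * np.sqrt(sigma2)))
```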

A.1.9 The Law of Large Numbers


The Markov Inequality, for a nonnegative random variable x, relates that
Pr{x > x_0} ≤ E[x] / x_0 ,   (A.51)
which implies that larger values are less probable. For instance, a value greater than twice the mean
has probability less than one half of occurring. This follows trivially from
E[x] = E[x / x ≥ x_0] · Pr{x ≥ x_0} + E[x / x < x_0] · Pr{x < x_0}   (A.52)
     ≥ x_0 · Pr{x ≥ x_0} ,   (A.53)
so that
E[x] / x_0 ≥ Pr{x ≥ x_0} .   (A.54)
Chebychev’s inequality also follows from conditional expectations and relates the probability that
a random variable deviates more than k standard deviations from its mean to the variance and k.

Lemma A.1.1 [Chebyshev’s Inequality]


1
P r{|x − µx | > k · σx } ≤ . (A.55)
k2

Proof:
σ_x² = E[(x − µ_x)²]
     = E[(x − µ_x)² / k·σ_x ≤ |x − µ_x|] · Pr{k·σ_x ≤ |x − µ_x|} + E[(x − µ_x)² / k·σ_x > |x − µ_x|] · Pr{k·σ_x > |x − µ_x|}
     ≥ (k·σ_x)² · Pr{k·σ_x ≤ |x − µ_x|} + 0 · Pr{k·σ_x > |x − µ_x|} ,
so
Pr{k·σ_x ≤ |x − µ_x|} ≤ 1/k² .
QED.

Chebyshev’s inequality helps prove the law of large numbers (LLN), which basically says that the
sample average converges to the mean if the samples are independent and drawn from the same distri-
bution.

Theorem A.1.3 [Law of Large Numbers]


Weak Form:
lim P r{|< x > −µx | < } = 1 . (A.56)
N →∞

Strong Form:
lim P r{< x >= µx } = 1 . (A.57)
N →∞

Proof: Already established are E[<x>_N] = µ_x and σ²_{<x>_N} = σ_x²/N. Chebyshev's inequality
as N → ∞ establishes the weak form. The strong form follows from
Pr{ |<x>_N − µ_x| < ε } = 1 − Pr{ |<x>_N − µ_x| ≥ ε } ≥ 1 − σ_x²/(N·ε²) ,   (A.58)
which tends to 1 as N becomes large. QED.

Estimating probability distribution values by counting (binning) the number of occurrences of a
specific value (or of small ranges of values in the continuous case) computes the relative frequency. Each
such calculation corresponds to a sample mean of the indicator function (1 if the value occurs, 0 otherwise).
Indeed, the strong LLN then implies that these relative frequencies converge with probability 1 to the
actual probability values. Thus, computing relative frequencies (see for instance Chapter 4) can be an
accurate means for computing probability distributions when models themselves may not be accurate.
The Chernoff bound follows directly from Markov's inequality:
Pr{x ≥ x_0} = Pr{e^{rx} ≥ e^{r·x_0}} ≤ E[e^{rx}] / e^{r·x_0} = φ_x(r) / e^{r·x_0} ,   (A.59)
which can be made tight by finding the r value that makes the bound as low as possible:
Pr{x ≥ x_0} ≤ min_{r ≥ 0} e^{−r·x_0} · φ_x(r) .   (A.60)
This bound, when optimized (see [3]), is found to be asymptotically exact in the exponent for the sample
mean <x> as N → ∞. Chapter 2 largely uses the LLN in constructing the AEP, which essentially avoids
much of the need for the other bounds in this section, but they appear here for completeness.
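As a sketch (not from the text; it assumes numpy and scipy), the Chernoff bound (A.60) can be evaluated for a zero-mean, unit-variance Gaussian, for which φ_x(r) = e^{r²/2}; the minimizing r is r = x_0, giving the bound e^{−x_0²/2}, compared here with the exact Q(x_0):

```python
# A sketch (not from the text): the Chernoff bound (A.60) for a zero-mean,
# unit-variance Gaussian; phi_x(r) = exp(r^2/2), so the bound's minimum over r
# is exp(-x0^2/2), attained at r = x0. Assumes numpy and scipy.
import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

x0 = 3.0
r = np.linspace(0, 10, 10_001)
bound = np.min(np.exp(-r * x0) * np.exp(r**2 / 2))   # numerical minimization over r
print("Chernoff bound:", bound, "closed form:", np.exp(-x0**2 / 2), "exact Q:", Q(x0))
```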

A.1.10 Memoryless Random Variable
A memoryless random variable is such that

P r{x > t + u} = P r{x > u} · P r{x > t} . (A.61)

The only probability density that satisfies this property is the exponential distribution, for which
Pr{x > u} = 1 − F_x(u) = e^{−λ·u}.

A.2 Random Processes
Special thanks are due to Dr. James Aslanis who wrote a version of this section when a Professor at the
University of Hawaii.
Random processes add a time-index to random variables, aggregating into a vector or time series
multiple samples from the random variable’s density/distribution at different times. In the most general
case, the probability density/distribution px may vary at the different sampling times, leading to a
time-variant random process as opposed to a stationary random process when the distribution
is invariant to time shifts.
The natural mathematical construct to describe a noisy communication signal is a random process.
Random processes are simply a generalization of random variables. Consider a single sample of a random
process, which is described by a random variable x with probability density px (u). (Often the random
process is assumed to be Gaussian but not always). This random variable only characterizes the random
process for a single instant of time. For a finite set of time instants, a random vector x = [xt1 , . . . , xtn ]
with a joint probability density function Px (u) describes the random process. Extension to a countably
infinite set of random variable samples, indexed by n, defines a discrete-time random process.

Definition A.2.1 (Discrete-Time Random Process) A Discrete-Time Random


Process is a countably infinite (or finite), indexed set of random variables described by
a joint density function px (u) where x = [xti , . . . , xti+n ], where i is an integer and n
is a positive integer. u contains particular sample values as inputs to the probability
density/distribution.

The random variables in a random process need not be independent nor identically distributed,
although the random variables all sample the same domain. Similarly, the mean E[f (Xt )] is a function
of the index. The mean, also known as an ensemble average, should not be confused with averaging
over the time index (sometimes referred to as the sample mean). For example the mean value of a
random variable associated with the tossing of a die is approximated by averaging the values over many
independent tosses.
When the sample mean converges to the ensemble average, the random process is mean ergodic.
In a random process, each random variable in the collection of random variables may have a different
probability density function. Thus, time averaging, or the sample mean, over successive samples may
not yield any information about the ensemble averages.
Random processes are classified by statistical properties that their density functions obey, the most
important of which is stationarity.

A.2.1 Stationarity

Definition A.2.2 [Strict Sense Stationarity] A discrete random process x_t is called
strict sense stationary (SSS) if the joint probability density function

pxt1 ,...,xtn (ut1 , . . . , utn ) = pxt1 +t ,...,xtn +t (ut1 +t , . . . , utn +t ) ∀ n, t, {t1 , . . . , tn } . (A.62)

Roughly speaking, the statistics of Xt are invariant to a time shift; i.e. the placement of the origin t = 0
is irrelevant.
This text next considers commonly calculated functions of a random process. These functions encapsulate
properties of the random process, and a linear-systems analysis sometimes calculates these functions
without knowing the exact process probability density. Certain random processes, such as stationary
Gaussian processes, are completely described by a collection of these functions.

Definition A.2.3 (Mean) The mean of a random process X_t is
E(x_t) = ∫_{−∞}^{∞} u_t · p_{x_t}(u_t) · du_t = µ_X(t) .   (A.63)

A second-moment-like quantity is the autocorrelation function:

Definition A.2.4 [Autocorrelation Function] A random process x_t has an autocorrelation function:
E(x_{t1} · x*_{t2}) = ∫∫ u_{t1} · u*_{t2} · p_{x_{t1} x_{t2}}(u_{t1}, u_{t2}) · du_{t1} · du_{t2} ≜ r_x(t1, t2) .   (A.64)

In general, the autocorrelation is a two-dimensional function of the pair {t1 , t2 }. For a stationary

process, the autocorrelation is a one-dimensional function of the time difference τ = t1 − t2 only:

E(xt1 · x∗t2 ) = rx (t1 − t2 ) = rx (τ ) (A.65)

The stationary process’ autocorrelation also satisfies a Hermitian property rx (τ ) = rx∗ (−τ ).
Using the mean and autocorrelation functions, also known as the first- and second-order moments,
designers often define a weaker form of stationarity.
Definition A.2.5 [Wide Sense Stationarity] A random process Xt is called wide sense
stationary (WSS) if
1. E(xt ) = constant,
2. E(xt1 · x∗t2 ) = rx (t1 − t2 ) = rx (τ ), i.e. a function of the time difference only.
While SSS ⇒ WSS, WSS does not imply SSS. Often, random processes' analysis only considers their first- and
second-order statistics. Such results do not reveal anything about the random process' higher-order
statistics; however, the Gaussian needs only the lower-order statistics. In particular,

Definition A.2.6 [Gaussian Random Process] The joint probability density function
of a stationary real Gaussian random process for any set of n indices {t_1, . . . , t_n}
is
p_x(u) = 1 / ( (2π)^{n/2} · |Σ_xx|^{1/2} ) · exp( −(1/2) · (u − µ_x)^T Σ_xx^{−1} (u − µ_x) ) .   (A.66)

A complex Gaussian random variable has independent Gaussian random variables in both
the real and imaginary parts, both with the same variance, which is half the variance of
the complex random variable. Then, the distribution is
p_x(u) = 1 / ( π^n · |Σ_xx| ) · exp( −(u − µ_x)^* Σ_xx^{−1} (u − µ_x) ) .   (A.67)

For a Gaussian random process, the set of random variables {xt1 , . . . , xtn } are jointly Gaussian. A
Gaussian random process also satisfies the following two important properties:

1. The output response of a linear time-invariant system to a Gaussian input is also a Gaussian
random process.
2. A WSS, real-valued, Gaussian random process is SSS.
Much of this textbook’s analysis will considers Gaussian random processes passed through linear time-
invariant systems. As a result of the properties listed above, the designer only requires these processes’
mean and autocorrelation functions for complete characterization. Fortunately, the designer can calculate
the effect of linear time-invariant systems on the randcom process without explicitly using the probability
densities/distributions.
In particular, for a linear time-invariant system defined by an impulse response h(t), the mean of the
output random process Yt is
µy (t) = µx (t) ∗ h(t) . (A.68)
The output autocorrelation function is
ry (τ ) = h(τ ) ∗ h∗ (−τ ) ∗ rx (τ ) . (A.69)
In addition, many analyses use the correlation between the input and output random processes:

Definition A.2.7 [Cross-correlation] The cross-correlation between the random


processes Xt and Yt is given by

E(xt1 yt∗2 ) = rxy (t1 , t2 ) . (A.70)

For jointly WSS random processes, the cross-correlation only depends on the time difference:
r_xy(t_1, t_2) = r_xy(t_1 − t_2) = r_xy(τ) .   (A.71)
The cross-correlation r_xy(τ) does not satisfy the Hermitian property that the autocorrelation obeys,
but
r_xy(τ) = r*_yx(−τ) .   (A.72)
Further,
r_xy(τ) = r_x(τ) ∗ h*(−τ)   (A.73)
r_yx(τ) = r_x(τ) ∗ h(τ) .   (A.74)
A more general form of stationarity is cyclostationarity, wherein the random process’ density/distribution
is invariant only to specific index shifts.

Definition A.2.8 [Strict-Sense Cyclostationarity] A random process is strict


sense cyclostationary if the joint probability density function satisfies

pxt1 ,...,xtn (ut1 , . . . , utn ) = pxt1 +T ,...,xtn +T (ut1 +T , . . . , utn +T ) ∀ n, {t1 , . . . , tn }, (A.75)

where T is called the period of the process.

That is, xt+kT is statistically equivalent to xt ∀ t, k. Cyclostationarity accounts for the regularity
in communication transmissions that repeat a particular operation at specific time intervals; however,
within a particular time interval, the statistics are allowed to vary arbitrarily. As with stationarity, a
weaker form of cyclostationarity that depends only on the first- and second-order statistics of the random
process follows.

Definition A.2.9 [Wide Sense Cyclostationarity] A random process is wide sense
cyclostationary if
1. E(xt ) = E(xt+kT ) ∀ t, k.
2. rx (t + τ, t) = rx (t + τ + kT, t + kT ) ∀ t, τ, k.
Thus, the mean and autocorrelation functions of a WS cyclostationary process are periodic functions
with period T . Many random signals in communications, such as an ensemble of modulated waveforms,
satisfy the WS cyclostationarity properties.
The periodicity of a WS cyclostationary random process would complicate the study of modulated
signals without use of the following convenient property. Given a WS cyclostationary random process
xt with period T , the random process xt+θ is WSS if θ is a uniform random variable over the interval
[0, T ]. Thus, analysis often shall include (or assume) a random phase θ to yield a WSS random process.
Alternatively for a WS cyclostationary random process, there is a time-averaged autocorrelation
function.

Definition A.2.10 [Time-Averaged Autocorrelation] The time-averaged autocorrelation of a WS cyclostationary random process x_t is
r̄_x(τ) = (1/T) · ∫_{−T/2}^{T/2} r_x(t + τ, t) · dt .   (A.76)

Since the autocorrelation function rX (t+τ, t) is periodic, integration could be over any closed interval
of length T in A.76.
As in the study of deterministic signals and systems, frequency-domain descriptions are often useful
for analyzing random processes. First, this appendix continues with the definitions for deterministic
signals.
Definition A.2.11 [Energy Spectral Density] The Energy Spectral Density of a
finite-energy deterministic signal x(t) is |X(ω)|², where
X(ω) = ∫_{−∞}^{∞} x(t) · e^{−jωt} · dt = F{x(t)}   (A.77)
is the Fourier transform of x(t). Thus, the energy is calculable as
E_x ≜ ∫_{−∞}^{∞} |x(t)|² · dt = (1/2π) · ∫_{−∞}^{∞} |X(ω)|² · dω < ∞ .   (A.78)

If the finite-energy signal x(t) is nonzero for only a finite time interval, say T , then the time average
power in the signal equals Px = Ex /T .
Communication signals are usually modeled as repeated patterns extending from (−∞, ∞), in which
case the energy is infinite, although the time average power may be finite.
Definition A.2.12 [Power Spectral Density] The power spectral density of a finite-power signal is defined as
S_x(ω) = lim_{T→∞} |X_T(ω)|² / T ,   (A.79)
where X_T(ω) = F{x_T(t)} is the Fourier transform of the truncated signal
x_T(t) = x(t) for |t| < T/2, and x_T(t) = 0 otherwise.   (A.80)
Thus, the time-average power is calculable as P_x ≜ lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} |x(t)|² · dt = ∫_{−∞}^{∞} S_x(ω) · dω < ∞.

For deterministic signals, the power Px is a time-averaged quantity.
For a random process, strictly speaking, the Fourier transform does not exactly specify the power
spectral density. Even if the random process is well-behaved, the result would be another random
process. Instead, ensemble averages are required for frequency-domain analysis.

For a random process xt , the ensemble average power, Pxt = E[|xt |2 ], may vary instantaneously over
time. For a WSS random process, however, Pxt = Px is a constant.

Definition A.2.13 [Power Spectral Density] For a WSS continuous-time random process X_t, the power spectral density is
S_X(ω) = F{r_x(τ)} .   (A.81)
Also, P_x = ∫_{−∞}^{∞} S_X(ω) · dω.

For a WS cyclostationary random process x_t, the autocorrelation function r_x(t + τ, t), for a fixed time
lag τ, is periodic in t with period T. Consequently the autocorrelation function can be expanded using
a Fourier series:
r_x(t + τ, t) = Σ_{n=−∞}^{∞} γ_n(τ) · e^{j2πnt/T} ,   (A.82)
where γ_n(τ) are the Fourier coefficients
γ_n(τ) = (1/T) · ∫_{−T/2}^{T/2} r_x(t + τ, t) · e^{−j2πnt/T} · dt .   (A.83)
The time-average autocorrelation function is then
r̄_x(τ) = (1/T) · ∫_{−T/2}^{T/2} r_x(t + τ, t) · dt = γ_0(τ) .   (A.84)
The average power for the WS cyclostationary random process is
P_{X_t} = (1/T) · ∫_{−T/2}^{T/2} E[|X_t|²] · dt = r̄_X(0) = γ_0(0) = ∫_{−∞}^{∞} G_0(f) · df ,   (A.85)
where G_0(f) is the Fourier transform of the n = 0 Fourier coefficient:
F{γ_0(τ)} = G_0(f) = F{r̄_X(τ)} .   (A.86)
The function G_0(f) is the power spectral density of the WS cyclostationary random process X_t associated
with the time-average autocorrelation.
For a nonstationary random process, the average power must be calculated by both time and ensemble
averaging, i.e. P_{X_t} = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} E[|X_t|²] · dt.

A.2.2 Random Processes and Linear Systems


Following (A.69), the frequency-domain equivalent is
S_yy(ω) = |H(ω)|² · S_xx(ω) .   (A.87)
These same relations hold in discrete time, with the transforms evaluated at e^{−jω}. For more on Fourier
Transforms and well-behaved functions, see Appendix D.
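A brief sketch (not from the text; it assumes numpy and scipy, and the FIR response h is an arbitrary example) passes white noise through an LTI filter and checks that the output power equals σ²·Σ|h|², as (A.87) predicts for a flat input spectrum:

```python
# A sketch (not from the text): white noise through an LTI filter; by (A.87) the
# output power is sigma^2 * sum(|h|^2) because S_xx is flat. Assumes numpy/scipy.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
sigma2 = 2.0
h = np.array([0.5, 1.0, -0.25])                  # example FIR impulse response

x = rng.normal(0.0, np.sqrt(sigma2), 500_000)    # WSS white Gaussian input
y = lfilter(h, [1.0], x)

print("measured output power:", y.var())
print("sigma^2 * sum|h|^2    :", sigma2 * np.sum(h**2))
```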

A.3 Passband Processes
This appendix investigates properties of the correlation functions for a WSS passband random process
x(t) in its several representations. For a brief introduction to the definitions of random processes see
Section A.2.

A.3.1 The Hilbert Transform


The Hilbert Transform finds frequent use in passband signal processing, and is a linear operator that
shifts the phase of a sinusoid by 90°:

Definition A.3.1 (Hilbert Transform) The Hilbert Transform of x(t) is denoted x̌(t)
and is given by
x̌(t) = ~(t) ∗ x(t) = ∫_{−∞}^{∞} x(u) / (π(t − u)) · du ,   (A.88)
where
~(t) = 1/(πt) for t ≠ 0, and ~(t) = 0 for t = 0.   (A.89)

The Fourier Transform of ~(t) is
H(ω) = ∫_{−∞}^{∞} e^{−jωt}/(πt) · dt   (A.90)
     = ∫_{−∞}^{∞} −j·sin(ωt)/(πt) · dt   (A.91)
     = −j·sgn(ω) · ∫_{−∞}^{∞} sin(πu)/(πu) · du   (A.92)
     = −j·( ∫_{−∞}^{∞} sinc(u) · du ) · sgn(ω)   (A.93)
     = −j·sgn(ω) .   (A.94)

Equation (A.94) shows that a frequency component at a positive frequency is shifted in phase by −90o ,
while a component at a negative frequency is shifted by +90o . Summarizing

X̌(ω) = − · sgn(ω)X(ω) . (A.95)

Since |H(ω)| = 1 ∀ ω 6= 0, then |X(ω)| = |X̌(ω)|, assuming X(0) = 0. This text only considers passband
signals with no energy present at DC (ω = 0). Thus, the Hilbert Transform only affects the phase and
not the magnitude of a passband signal.

A.3.1.1 Examples
Let
x(t) = cos(ω_c t) = (1/2)·(e^{jω_c t} + e^{−jω_c t}) ,   (A.96)
then
x̌(t) = (1/2)·(−j·e^{jω_c t} + j·e^{−jω_c t}) = (1/(2j))·(e^{jω_c t} − e^{−jω_c t}) = sin(ω_c t) .   (A.97)
Let
x(t) = sin(ω_c t) = (1/(2j))·(e^{jω_c t} − e^{−jω_c t}) ,   (A.98)
then
x̌(t) = (1/(2j))·(−j·e^{jω_c t} − j·e^{−jω_c t}) = −(1/2)·(e^{jω_c t} + e^{−jω_c t}) = −cos(ω_c t) .   (A.99)
Note
x̌̌(t) = ~(t) ∗ ~(t) ∗ x(t) = −x(t) ,   (A.100)
since (−j·sgn(ω))² = −1 ∀ ω ≠ 0. A correct interpretation of the Hilbert transform is that every
sinusoidal component is passed with the same amplitude, but with its phase reduced by 90 degrees.
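A short numerical check of (A.97) (a sketch, not from the text; it assumes scipy): scipy.signal.hilbert returns the analytic signal x(t) + j·x̌(t), so its imaginary part is the Hilbert transform, which for x(t) = cos(ω_c t) should be very close to sin(ω_c t).

```python
# A sketch (not from the text) verifying (A.97) numerically with scipy's
# FFT-based analytic-signal routine; its imaginary part is the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

fs, fc = 10_000.0, 100.0
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * fc * t)

x_hat = np.imag(hilbert(x))              # Hilbert transform of x
err = np.max(np.abs(x_hat - np.sin(2 * np.pi * fc * t)))
print("max deviation from sin:", err)    # should be very small
```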

A.3.1.2 Inverse Hilbert


The inverse Hilbert Transform is easily specified in the frequency domain as
H^{−1}(ω) = j·sgn(ω) ,   (A.101)
or then
~^{−1}(t) = −~(t) = −1/(πt) for t ≠ 0, and 0 for t = 0.   (A.102)

A.3.2 Hilbert Transform of Passband Signals


Given a passband signal x(t), form the quadrature decomposition
x(t) = x_I(t)·cos(ω_c t) − x_Q(t)·sin(ω_c t)   (A.103)
and transform x(t) into the frequency domain:
X(ω) = (1/2)·[X_I(ω + ω_c) + X_I(ω − ω_c)] − (1/(2j))·[X_Q(ω − ω_c) − X_Q(ω + ω_c)] .   (A.104)
Equation (A.104) shows that if X(ω) = 0 ∀ |ω| > 2ω_c, then X_I(ω) = 0 and X_Q(ω) = 0 ∀ |ω| > ω_c
(recall that x_I(t) and x_Q(t) are real signals). Using this fact, the Hilbert transform X̌(ω) is given by
X̌(ω) = (j/2)·[X_I(ω + ω_c) − X_I(ω − ω_c)] + (1/2)·[X_Q(ω − ω_c) + X_Q(ω + ω_c)] .   (A.105)
The inverse Fourier Transform of X̌(ω) then yields
x̌(t) = x_I(t)·sin(ω_c t) + x_Q(t)·cos(ω_c t) = ℑ{x_A(t)} ,   (A.106)
where ℑ denotes the imaginary part.

A.3.2.1 Hilbert Transform of a Random Process


Let x(t) be a WSS real-valued random process and x̌(t) = ~(t) ∗ x(t) be its Hilbert transform. By
Equation (A.69), the autocorrelation of x̌t is

rx̌ (τ ) = ~(τ ) ∗ ~∗ (−τ ) ∗ rx (τ ) = rx (τ ). (A.107)

Since |H(ω)|2 = 1 ∀ ω 6= 0, and assuming SX (0) = 0, then

Sx̌ (ω) = Sx (ω) . (A.108)

Thus, a WSS random process and its Hilbert Transform have the same autocorrelation function and the
same power spectral density.
By Equation (A.74)

rx̌,x (τ ) = ~(τ ) ∗ rx (τ ) = řx (τ ) (A.109)


rx,x̌ (τ ) = ~∗ (−τ ) ∗ rx (τ ) = −~(τ ) ∗ rx (τ ) = −řx (τ ) . (A.110)

The cross correlation between the random process x(t) and its Hilbert transform x̌(t) is the Hilbert
transform of the autocorrelation function of the random process x(t).
Thus also,
r_{x̌,x}(τ) = ~(τ) ∗ r_x(τ) = ~*(τ) ∗ r_x(τ) = ~*(τ) ∗ r_x(−τ) = r*_{x,x̌}(−τ) = r_{x,x̌}(−τ) .   (A.111)
By using Equations (A.109), (A.110) and (A.111),
rx̌,x (τ ) = řx (τ ) = −rx,x̌ (τ ) = −rx̌,x (−τ ) . (A.112)
Equation (A.112) implies that rx̌,x (τ ) is an odd function, and thus
rx̌,x (0) = 0 . (A.113)
That is, a real-valued random process and its Hilbert Transform are uncorrelated at any particular point
in time.

A.3.2.2 Quadrature Decomposition


The quadrature decomposition for any real-valued WSS passband random process and its Hilbert trans-
form is
x(t) = xI (t) · cos ωc t − xQ (t) · sin ωc t (A.114)
x̌(t) = xI (t) · sin ωc t + xQ (t) · cos ωc t. (A.115)
The baseband equivalent complex-valued random process is
x_bb(t) = x_I(t) + j · x_Q(t)   (A.116)
and the analytic equivalent complex-valued random process is
x_A(t) = x(t) + j · x̌(t) = x_bb(t) · e^{jω_c t} .   (A.117)
The original random process can be recovered as
x(t) = < {xA (t)} . (A.118)
The autocorrelation of x_A(t) is
r_A(τ) = E{x_A(t) · x*_A(t − τ)}   (A.119)
       = 2 · (r_x(τ) + j·ř_x(τ)) .   (A.120)
The right-hand side of Equation (A.120) is twice the analytic equivalent of the autocorrelation function
r_x(τ). The power spectral density is
S_A(ω) = 4 · S_x(ω) , ω > 0 .   (A.121)
The functions in the quadrature decomposition of x(t) also have autocorrelation functions:
r_I(τ) = E{x_I(t) · x*_I(t − τ)}   (A.122)
r_Q(τ) = E{x_Q(t) · x*_Q(t − τ)}   (A.123)
r_IQ(τ) = E{x_I(t) · x*_Q(t − τ)}   (A.124)
Then also,
rx (τ ) = E {x(t) · x∗ (t − τ )} (A.125)
= rI (τ ) · cos(ωc t) cos(ωc (t − τ ))
− rIQ (τ ) · cos(ωc t) · sin(ωc (t − τ ))
− rQI (τ ) · sin(ωc t) · cos(ωc (t − τ ))
+ rQ (τ ) · sin(ωc t) · sin ωc (t − τ )

Standard trigonometric identities simplify (A.125) to
r_x(τ) = (1/2)·[r_I(τ) + r_Q(τ)]·cos(ω_c τ)
       + (1/2)·[r_IQ(τ) − r_QI(τ)]·sin(ω_c τ)
       − (1/2)·[r_Q(τ) − r_I(τ)]·cos(ω_c(2t − τ))
       − (1/2)·[r_IQ(τ) + r_QI(τ)]·sin(ω_c(2t − τ)) .   (A.126)
Strictly speaking, most modulated waveforms are WS cyclostationary with period T_c = 2π/ω_c, i.e.
E[x(t)·x*(t − τ)] = r_x(t, t − τ) = r_x(t + T_c, t + T_c − τ). For cyclostationary random processes, a time-averaged
autocorrelation function of one variable τ can be defined by r_x(τ) = (1/T_c)·∫_{−T_c/2}^{T_c/2} r_x(t, t − τ)·dt,
and this new time-averaged autocorrelation function will satisfy the properties derived thus far in this
section. The next set of properties requires the random process to be WSS, not just WS cyclostationary
(or, equivalently, the time-averaged autocorrelation function can be used).
An example of a WSS random process is AWGN. Modulated signals often have equal-energy inphase
and quadrature components, with the inphase and quadrature signals derived independently from the
incoming bit stream; thus, the modulated signal is then WSS.
For x(t) to be WSS, the last two terms in (A.126) must equal zero. Thus, rI (τ ) = rQ (τ ) and rIQ (τ ) =
−rQI (τ ) = −rIQ (−τ ). The latter equality shows that rIQ (τ ) is an odd function of τ and thus rIQ (0) = 0.
For x(t) to be WSS, the inphase and quadrature components of x(t) have the same autocorrelation and
are uncorrelated at any particular instant in time. Substituting back into Equation A.126,

rx (τ ) = rI (τ ) · cos(ωc τ ) − rQI (τ ) · sin(ωc τ ) (A.127)

Equation A.127 expresses the autocorrelation rx (τ ) in a quadrature decomposition and thus

řx (τ ) = rI (τ ) · sin(ωc τ ) + rQI (τ ) · cos(ωc τ ) (A.128)

Further algebra leads to
r_bb(τ) = E{x_bb(t) · x*_bb(t − τ)}   (A.129)
        = 2 · (r_I(τ) + j·r_QI(τ))   (A.130)
r_A(τ) = r_bb(τ) · e^{jω_c τ} .   (A.131)
The power spectral density is
S_A(ω) = S_bb(ω − ω_c) .   (A.132)
If Sx (ω) is symmetric about ωc , then Sbb (ω) is symmetric about ω = 0 (recall that the spectrum
SA (ω) is a scaled version of the positive frequencies of SX (ω) and Sbb (ω) is SA (ω) shifted down by
ωc ). In this case rbb (τ ) is real, and using Equation A.130, rQI (τ ) = 0. Equivalently, the inphase and
quadrature components of a random process are uncorrelated at any lag τ (not just τ = 0) if the power
spectral density is symmetric about the carrier frequency. Finally,
r_x(τ) = (1/2) · ℜ{r_bb(τ)·e^{jω_c τ}} = (1/2) · ℜ{r_A(τ)} .   (A.133)
If x(t) is a random modulated waveform, by construction it is usually true that rI (τ ) = rQ (τ ) and
rIQ (τ ) = −rQI (τ ) = 0, so that the constructed x(t) is WSS. For AWGN, n(t) is usually WSS so that
rI (τ ) = rQ (τ ) and rIQ (τ ) = −rQI (τ ) = 0. When a QAM waveform is such that rIQ (τ ) 6= −rQI (τ ) or
rI (τ ) 6= rQ (τ ), then x(t) is WS cyclostationary with period π/ωc .

A.4 Markov Processes
Markov processes often characterize the distribution dynamics of time-varying processes, but have more
general mathematical use, as in Section 1.6. While the Markov Process is time-varying, it often
has a stationary distribution that characterizes the likelihood of being in any one of a number, |A|, of
different states. The Markov process specifically has a probability of being in a next state that only
depends on the previous state:
p_{k/i} = p_{k/i,i−1,...,0} .   (A.134)
The discrete (special case of a) Markov model has an |A| × |A| probability-transition matrix with
entries p_{k/j} representing the probability that state k will occur when the current/last state is j. The
probability matrix is then
P ≜ [ p_{|A|−1/|A|−1}  p_{|A|−1/|A|−2}  ...  p_{|A|−1/0} ;  ...  ;  p_{0/|A|−1}  p_{0/|A|−2}  ...  p_{0/0} ] .   (A.135)

Every non-zero conditional-probability entry in P corresponds to a transition possibility in Figure A.4's
directed graph of the state machine. Often channel uses are ordered in time, but of course the order could
also most generally include any path through sets of used dimensions in space, time, and/or frequency.
Zero-probability entries have no transition shown.

Figure A.4: Markov State-Machine Model (a four-state example, |A| = 4), with non-zero transition probabilities shown.

Distant past is unimportant in Markov processes; only the last state matters. The selection of a next
state may be associated with a random variable whose probability distribution is characterized by an input
that determines to which state the Markov process next proceeds. That input has a distribution that
corresponds to a column of P. When, as is always the case in this text, that input is stationary, P is a
constant matrix of nonnegative entries. Each column also sums to unity. There may be a random
value associated with each state that is observed, and not directly the state itself, where that value is
a function of the state and perhaps jointly of another random variable that is independent of the state,
which then creates a Hidden Markov Process. This latter HMP often characterizes a time-varying
channel where the H varies according to P but independent noise is added to create a channel output.
In this case, the transition between states does not depend on an input. However, in other situations
the channel H may be fixed and the transmitted symbol causes the state to change; the number of
states is |A| = M (the number of symbol constellation values). This second example often corresponds to
maximum-likelihood sequence decoders, and again the noise is independent of the channel input. Yet
another (not hidden) example uses a Markov process to characterize the interarrival times between
message packets/symbols at the channel input, where the arrival of the next packet only depends on the
time since the last arrival. Markov processes are often also characterized by a state-probability distribution
p_k where
p_k = P · p_{k−1} .   (A.136)
The matrix P has all nonnegative entries, as does p. Because of the possibility of zero entries in P,
non-degenerate Markov processes satisfy an additional condition that
P^n_{k/i} > 0 for some n > 0 and P^{n'}_{i/k} > 0 for some n' > 0 ,   (A.137)
which means that any state will eventually be reached from any other state with nonzero probability³.
That is, the Markov Process is irreducible. A periodic state i occurs when P^n_{i/i} ≠ 0 only if n = k·d for
k, d ∈ Z+, and d > 1 is the period. If d = 1, the state is aperiodic. If no states are periodic, the Markov Process
is aperiodic. If the MP is both irreducible and aperiodic, then for a large enough number of successive
transitions, starting at any point in time, some integer power of the matrix P will contain all
positive (nonzero) entries. Then, by Perron-Frobenius theory⁴, such a matrix has a largest positive-real
eigenvalue that dominates P^{n→∞} and a corresponding all-positive-entry eigenvector. This eigenvector
π also satisfies
P · π = π ,   (A.138)
and is the stationary vector or probability distribution (when normalized so that it sums to one). When
a stationary distribution exists, the Markov Process is balanced so that the sum of the probabilities of
going to other states out of any state j equals the probability of arriving in state j from all other states,
thus satisfying the balance equation:
Σ_i p_{i/j} · p_j = Σ_i p_{j/i} · p_i .   (A.139)

The balance equation simplifies for many arrival-process situations as in Section A.5.
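A small sketch (not from the text; the 3-state column-stochastic matrix P below is hypothetical, and numpy is assumed) computes the stationary distribution of (A.138) as the eigenvector of P for eigenvalue 1:

```python
# A sketch (not from the text): stationary distribution pi of (A.138) for a
# column-stochastic P (columns sum to one), with P[k, j] = Pr{next = k | current = j}.
import numpy as np

P = np.array([[0.5, 0.2, 0.0],     # hypothetical example, not Figure A.4's matrix
              [0.5, 0.5, 0.4],
              [0.0, 0.3, 0.6]])

w, v = np.linalg.eig(P)
idx = np.argmin(np.abs(w - 1.0))        # the eigenvalue equal to 1
pi = np.real(v[:, idx])
pi = pi / pi.sum()                      # normalize so the probabilities sum to one
print(pi, P @ pi)                       # P @ pi reproduces pi, per (A.138)
```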

A.4.1 Birth-Death Processes


Birth-Death Markov Processes are a special case of interest to queuing. They always have a stationary
distribution and satisfy the properties:
P_{i/j} = 0 if |i − j| > 1 ,   (A.140)
P_{i/i+1} > 0 ,   (A.141)
P_{i+1/i} > 0 .   (A.142)
Birth-death processes that model arrival processes well necessarily also satisfy
P · p_0 = P^t · p_0 .   (A.143)

3 Equivalently, for some large n'' ≥ max(n, n'), P^{n''} has all positive entries, so it is as if the transition matrix for n'' transitions is all
positive and then satisfies the Perron-Frobenius condition of all positive entries.
4 See the Wikipedia page at https://en.wikipedia.org/wiki/Perron-Frobenius_theorem.

A.5 Queuing Theory
In this text, renewal processes are basically random processes that attempt to model the arrival or
departure of messages. There is an interarrival interval that is random, essentially signalling
a burst of messages to be sent. Often these messages are queued with perhaps a steady output flow
(perhaps augmented by dummy packets) into a modulator that transmits b bits/symbol. The depth
of the queue indicates roughly the need for use/service by the channel. For this text's purposes, the
queue can be possibly infinite in length/delay to avoid consideration of "buffer overflow" errors that
typically find solution at higher layers. Instead, this appendix simply investigates random modeling of
the arrivals, by the common Poisson distribution or other potential distributions. The book by Gallager
[3] (Chapter 2) is an excellent, more detailed reference on this subject. See also Giambene.

A.5.1 Arrival Processes


An arrival process is a discrete random process that models the times at which a message arrives
(more generally, when an event occurs), t_i > 0, i = 1, ..., ∞, with t_j > t_i if j > i. The stationary random-
variable distribution from which this process's samples are selected permits only positive values. These
"times" have separations or interarrival times

τi = ti − ti−1 > 0 , (A.144)

which is also a random process with only positive values. Clearly,
t_n = Σ_{i=1}^{n} τ_i .   (A.145)

The instants tn theoretically need not be integer multiples of some basic time unit T (like a sampling
clock’s zero crossings), but in practice any system would likely approach that integer multiple if arrivals
are slightly delayed until the start of the next symbol as would be necessary in most practice. A
continuous-time random process can count the number of arrivals that have occurred before or at any
time t > 0, so

B(t) = n , t ∈ [t_n, t_{n+1}) ,   (A.146)
or in terms of sets,
t_n ≤ t ⇒ B(t) ≥ n .   (A.147)

So, t1 > t implies that B(t) = 0.

A.5.2 Renewal and the Poisson Distribution


A renewal process is an arrival process where the interarrival-time random-process samples τ_i are
independent of each other. A Poisson renewal process has the arrival increments distributed exponentially
with arrival rate λ ∈ R+, so
p_τ(u) = λ · e^{−λ·u} , u ≥ 0 ,   (A.148)
or equivalently
Pr{τ > τ'} = e^{−λ·τ'} .   (A.149)
As earlier, the exponential distribution is memoryless so

P r{τ > u + v} = P r{τ > u} · P r{τ > v} . (A.150)


Since t_n = Σ_{i=1}^{n} τ_i is a sum of independent random variables, the distribution of t_n derives as the
convolution of the exponential distribution n − 1 times, which is the Erlang density function (a continuous
random variable)
p_{t_n}(t) = λ^n · t^{n−1} · e^{−λ·t} / (n − 1)! .   (A.151)
The discrete-valued distribution for B(t) derives from looking at the integral of the Erlang density
over the infinitesimally small interval (t < t_{n+1} < t + δ) as in [3], which then provides the Poisson
probability mass function
p_{B(t)}(n) = (λ·t)^n · e^{−λ·t} / n! .   (A.152)
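As a sketch (not from the text, assuming numpy; λ and the observation window are arbitrary example values), simulating exponential interarrival times per (A.148) yields counts whose mean and variance both approach λ·t, as the Poisson mass function (A.152) requires:

```python
# A sketch (not from the text): a Poisson arrival process built from exponential
# interarrival times, checked against E[B(t)] = Var[B(t)] = lambda*t. Assumes numpy.
import numpy as np

rng = np.random.default_rng(2)
lam, t_obs, trials = 3.0, 5.0, 100_000       # example rate and observation window

# 64 interarrival times per trial is ample to cover t_obs = 5 when 1/lam = 1/3.
inter = rng.exponential(1 / lam, size=(trials, 64))
arrivals = np.cumsum(inter, axis=1)          # arrival instants t_n, per (A.145)
counts = (arrivals <= t_obs).sum(axis=1)     # B(t_obs) for each trial

print("mean count:", counts.mean(), "vs lambda*t =", lam * t_obs)
print("count variance:", counts.var(), "(Poisson implies it is also lambda*t)")
```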

A.5.3 Queue Modelling


Queuing convention describes (known as Kendall’s notation https://en.wikipedia.org/wiki/Kendall%
27s_notation) queues according to a notation, which is simplified here for this text’s purposes, arrival-
distribution/channel-processing-distribution/U /queue-depth. This text uses only the letters M , G, or
D for the arrival and channel-processing distributions, where M corresponds to Poisson arrivals (so ex-
ponential inter-arrival or channel-processing times), G for general distribution, and D for deterministic
(non random). Very common in this text is the notation M/D/U queue, which means Poisson arrival
process, the channel immediately processes its inputs without delay, and U users. The simplest queue
is the M/M/1 queue (which will behave like the M/G/1 queue for this text’s purposes, leaving some
formulas for M/M/1 that simplify for M/D/1.
The Poisson processes’ exponentially distributed inter-arrival time has probability for some measured
τ since last arrival P r{t > τ } = 1 = e−λ·τ . Description by a state-machine or Markov model, with matrix
entires pn/n±1 requires looking at the instantaneous probability because multiple arrivals could occur in
any finite time. For infinitely small time increment dt, the probability becomes λ · dt, and similarly for
the channel-entry times, b · dt.

A.5.3.1 M/M/1 Queue:


Figure A.5 illustrates the corresponding state machine.

Figure A.5: M/M/1 Queue - simple birth-death Markov Process state-machine model (state n transitions to n + 1 with probability λ·dτ and to n − 1 with probability b·dτ).

The balance equation simplifies in this case to
λ · π_n = b · π_{n+1} .   (A.153)
Defining ρ = λ/b, it is clear that λ ≤ b is required for queue stability, and
π_n = ρ^n · (1 − ρ) .   (A.154)
This leads to calculation of the delays and number of symbols (B) as
B   = ρ/(1 − ρ) = λ/(b − λ)   (A.155)
Δ   = 1/(b − λ)   (A.156)
Δ_q = ρ/(b − λ)   (A.157)
B_q = ρ²/(1 − ρ) .   (A.158)
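A quick numeric sketch (not from the text; λ and b below are arbitrary example values) evaluates (A.155)-(A.158):

```python
# A sketch (not from the text): evaluating the M/M/1 quantities (A.155)-(A.158).
lam, b = 8.0, 10.0                 # example arrival rate and service rate
rho = lam / b                      # utilization; must satisfy rho < 1 for stability

B  = rho / (1 - rho)               # (A.155) average number of symbols in the system
D  = 1 / (b - lam)                 # (A.156) average delay
Dq = rho / (b - lam)               # (A.157) average wait in the queue
Bq = rho**2 / (1 - rho)            # (A.158) average number waiting in the queue
print(B, D, Dq, Bq)                # 4.0, 0.5, 0.4, 3.2
```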
The variance of the delay depends on scheduling policy and can be complicated to compute. The
M/M/1 queue is sometimes called a statistical multiplexer because as the users arrive randomly they
are served by (typically) a high-speed channel. To understand this better, the input may be viewed as
U users’ messages/symbols all multiplexed to one channel. Each of the users may have an average data
rate of λ/U , which of course sum over all users to λ. The aggregate as well as the individual message
streams are Poisson processes, just the users are slower individually. A deterministic multiplexer
instead equally apportions b/U of the channel bandwidth to each user (presuming tacitly the capacity
region allows such equal apportionment). It has then delay

Δ_det = 1/(b/U − λ/U) = U/(b − λ) = U · Δ_stat .   (A.159)

The statistical multiplexer thus has less delay than would have say a TDM (or OFDM - see Chapter
4) system. However, a slightly different queue is the D/D/1 queue where each user is presumably
streaming at data rate λ/U and matched to its turn in the channel that runs at b ≥ λ. The D/D/1 system
has constant delay of one symbol and no variation, and no chance of buffer overflow. While the M/M/1
has significantly lower average delay for U > 1, very large delays can have nonzero probability. Thus,
the trade-off to use statistical multiplexing (over deterministic) is not so simple, eventually depending
on the relative user rates and the scheduling method and maximum tolerable delay. Systems with many
users continually streaming (e.g., video), especially when the aggregate data rate approaches the channel
sum-capacity, are better served by deterministic scheduling. Other systems, with more infrequent use
(whether low data rate or high data rate for any particular user) may be better served by a statistical
multiplexer. A method intermediate to Chapter 2's QPS (queue-proportional scheduling), which considers
the channel capacity region, is a queue of interest for a system in which there are U parallel channels,
each carrying b bits/symbol, as in the next subsection.

A.5.3.2 M/M/U Queue


The M/M/U system views a MAC as having U users on a single channel (common transmitter and
receiver), each of rate b_u = b, so the sum rate is b_sum = U·b. This does not directly match any
multiuser physical-layer channel, any of which have either or both of physically isolated transmitters
and/or receivers. Nonetheless, the theory naturally follows the M/M/1 queue and so appears here.
The state machine has a return from state u to state u − 1 that can relieve up to u ≤ U users' needs
simultaneously, which the u·b·dt in the lower transitions represents. The quantity ρ remains λ/b,
but now can have a value as large as U. Otherwise, analysis proceeds similarly to the M/M/1 case, and
the balance equation becomes
λ · π_{n−1} = n·b·π_n for n ≤ U , and λ · π_{n−1} = U·b·π_n for n > U .   (A.160)
The solution, using much algebra, for the steady-state probabilities is
π_0 = [ Σ_{n=0}^{U−1} ρ^n/n! + U·ρ^U/(U!·(U − ρ)) ]^{−1} ,
π_n = π_0 · ρ^n/n! for 0 ≤ n ≤ U , and π_n = π_0 · ρ^n/(U!·U^{n−U}) for n ≥ U .   (A.161)
Figure A.6: M/M/U Queue - U-user birth-death Markov Process state-machine model (state u returns to u − 1 with probability u·b·dτ for u ≤ U).

This leads to calculation of the delays and number of symbols (B) as
B   = ρ + π_0·ρ^{U+1} / ((U − 1)!·(U − ρ)²)   (A.162)
Δ   = 1/b + (1/λ)·π_0·ρ^{U+1} / ((U − 1)!·(U − ρ)²)   (A.163)
Δ_q = (1/λ)·π_0·ρ^{U+1} / ((U − 1)!·(U − ρ)²)   (A.164)
B_q = π_0·ρ^{U+1} / ((U − 1)!·(U − ρ)²) .   (A.165)
Another related quantity of interest is the probability that a user has to wait in queue,
P_Q = Σ_{n=U}^{∞} π_n = π_0·ρ^U / ((U − 1)!·(U − ρ)) ,   (A.166)

which is known as the Erlang C formula [4]. The M/M/U queue also has an interesting large-number-
of-users property that the system essentially creates a Poisson arrival process to whatever comes next:
lim_{U→∞} π_n = π_0 · (1/n!) · (λ/b)^n   (A.167)
             = (λ/b)^n · e^{−λ/b} / n!   (A.168)
lim_{U→∞} B = ρ .   (A.169)
A.5.3.3 Finite Queues and Blocking


Any implementable queue must have finite memory and can only handle a finite number of users, so
the M/M/U queue then adjusts to the closely related M/M/U /U queue that blocks user inputs when in
state U , as in Figure A.7.

Figure A.7: M/M/U/U Queue - U-user birth-death Markov Process state-machine model with blocking in state U.

The relationship ρ < U (or ρ ≤ 1 for a single user, meaning the channel must run faster than the inputs
on average) need no longer hold, but there is a nonzero probability that messages are simply not sent,
which is π_U. The balance equations provide in this case
π_n = π_0 · ρ^n / n! ,   (A.170)
which then yields (probabilities for states 0 to U must add to 1)
π_0 = 1 / ( Σ_{n=0}^{U} ρ^n/n! ) .   (A.171)
The blocking probability is then
P_B = π_U = π_0 · ρ^U / U! ,   (A.172)
which is the Erlang B formula. This formula can be recursively computed in the number of users
according to⁵
P_B^{−1}(ρ, U) = 1 + U / ( ρ · P_B(ρ, U − 1) ) .   (A.173)
Further algebra leads to
B = ρ · (1 − P_B)   (A.174)
Δ = 1/b ,   (A.175)
so the delay is basically the channel processing time for a symbol, but service may come with unacceptable
blocking probability.

5 The inverse form has less dynamic range in quantities so is easier to calculate in finite precision.
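A short sketch (not from the text; ρ and U are example values) computes the blocking probability both directly from (A.171)-(A.172) and with the inverse recursion (A.173), which should agree:

```python
# A sketch (not from the text): the Erlang B blocking probability via the direct
# formula (A.172) and via the inverse recursion (A.173); uses Python's math module.
from math import factorial

def erlang_b_direct(rho, U):
    pi0 = 1.0 / sum(rho**n / factorial(n) for n in range(U + 1))   # (A.171)
    return pi0 * rho**U / factorial(U)                             # (A.172)

def erlang_b_recursive(rho, U):
    inv = 1.0                        # P_B^{-1}(rho, 0) = 1: zero servers always block
    for u in range(1, U + 1):
        inv = 1.0 + u / rho * inv    # (A.173)
    return 1.0 / inv

print(erlang_b_direct(8.0, 10), erlang_b_recursive(8.0, 10))   # should match
```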

A.6 Gram-Schmidt Orthonormalization Procedure
This appendix illustrates the construction of a set of orthonormal basis functions ϕn (t) from a set of
modulated waveforms {xi (t), i = 0, ..., M − 1}. The process for doing so, and achieving minimal dimen-
sionality is called Gram-Schmidt Orthonormalization.

Step 1:

Find a signal in the set of modulated waveforms with nonzero energy and call it x_0(t). Let
ϕ_1(t) ≜ x_0(t) / √(E_{x_0}) ,   (A.176)
where E_x = ∫_{−∞}^{∞} [x(t)]² dt. Then x_0 = [ √(E_{x_0})  0  ...  0 ].

Step i, for i = 2, ..., M:

• Compute x_{i−1,n} for n = 1, ..., i − 1 (x_{i−1,n} ≜ ∫_{−∞}^{∞} x_{i−1}(t)·ϕ_n(t)·dt).
• Compute
  θ_i(t) = x_{i−1}(t) − Σ_{n=1}^{i−1} x_{i−1,n}·ϕ_n(t) .   (A.177)
• If θ_i(t) = 0, then ϕ_i(t) = 0; skip to step i + 1.
• If θ_i(t) ≠ 0, compute
  ϕ_i(t) = θ_i(t) / √(E_{θ_i}) ,   (A.178)
  where E_{θ_i} = ∫_{−∞}^{∞} [θ_i(t)]² dt. Then x_{i−1} = [ x_{i−1,1}  ...  x_{i−1,i−1}  √(E_{θ_i})  0  ...  0 ]'.

Final Step:

Delete all components, n, for which ϕ_n(t) = 0 to achieve the minimum-dimensional basis function set,
and reorder indices appropriately.
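A compact sketch (not from the text; it assumes numpy and works on sampled waveforms, so the integrals become sums scaled by dt) implements the procedure above and verifies orthonormality on a hypothetical three-waveform example that spans only two dimensions:

```python
# A sketch (not from the text): Gram-Schmidt orthonormalization on sampled
# waveforms. Each row of X is one waveform x_i(t) sampled at rate fs; integrals
# become sums scaled by dt = 1/fs. Assumes numpy.
import numpy as np

def gram_schmidt(X, dt, tol=1e-9):
    basis = []
    for x in X:
        theta = x.copy()
        for phi in basis:
            theta -= (np.sum(x * phi) * dt) * phi     # subtract projections (A.177)
        energy = np.sum(theta**2) * dt
        if energy > tol:
            basis.append(theta / np.sqrt(energy))     # normalize (A.178)
    return np.array(basis)                            # minimal orthonormal set

# Hypothetical example: three waveforms on [0, 1) spanning only two dimensions.
fs = 1000
t = np.arange(0, 1, 1 / fs)
X = np.array([np.cos(2 * np.pi * t),
              np.sin(2 * np.pi * t),
              np.cos(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * t)])
Phi = gram_schmidt(X, 1 / fs)
print(Phi.shape[0], "basis functions; Gram matrix:\n", Phi @ Phi.T / fs)
```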

Bibliography

[1] Alberto Leon-Garcia. "Probability, Statistics, and Random Processes for Electrical Engineering", 3rd Edition. Pearson Education, USA, 2007.

[2] Athanasios Papoulis. "Probability, Random Variables, and Stochastic Processes". McGraw-Hill, New York, 1965.

[3] Robert G. Gallager. "Stochastic Processes: Theory for Applications". Cambridge University Press, UK, 2013. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-262-discrete-stochastic-processes-spring-2011/

[4] Dimitri Bertsekas and Robert Gallager. "Data Networks", Second Edition. Prentice Hall, Englewood Cliffs, NJ, 1992.

[5] Sheldon Ross. "Introduction to Probability and Statistics for Engineers and Scientists", 6th Edition. Elsevier, Academic Press, 2020.
