Lecture Notes 1
1 Introduction
You have possibly learned something about signals, done some implementation, and had
some mathematics in your undergraduate days. This course essentially explains how they
come together in equal proportions in modern digital communication systems.
While there is a lot to be written about the evolution of digital communication, we will only make a few brief notes here. Pioneering works from Maxwell, Hertz, Bell and others led to long distance communication becoming a reality, with the synergy of theory and practice crucial to this evolution. It was Marconi's turn, at the dawn of the past century, to push communication out of its wired confinement. Unlike the analog telephone, Marconi's radio was about digital communication: transmitting Morse code through wireless telegraphic links. The less written part is that Marconi's efforts in commercializing communication techniques led to unprecedented developments, whose footprints still remain in the modern digital era. In particular, the development of diodes, triodes, transistors, ICs, etc. were all defining points in the race for communication superiority. History repeats itself!
Look around you: all sorts of communication gadgets spring to life, from tiny RFIDs to sophisticated smart phones, all striving to keep abreast of the latest communication demands. Furthermore, at the time of this writing, communication is driving the computational market too, from GPUs to cloud computers. It will be interesting at this point to read Chapter 1 of Wozencraft and Jacobs; it was written 50 years ago!
While Marconi is known as the father of radio, that attribute for digital communication unequivocally goes to Claude Shannon. Among the most influential figures of the twentieth century, Shannon went several steps further from the initial stones laid by Hartley and Nyquist, and the resulting theory had far-reaching consequences. Unlike the other big development of the twentieth century, which went nuclear, the communication revolution went the ballistic way. Above all, this was a giant leap in enabling communication theory to become a mathematical discipline, facilitating analysis and computation even before actual systems are practically rolled out. That we could be sure of communicating with a Mars shuttle, modulo mishaps, even before launching a test run is a testament to the resounding nature of communication theory. Following Shannon's footsteps, in order to study or devise a communication technique for an environment, the following course of action is advocated.
• abstract the practical constraints to meaningful mathematical models for the signals
and systems.
• solve the mathematical objective to obtain optimal communication schemes.
• translate the optimal schemes into practical design guidelines, which can be realized by existing apparatus.
• test the performance in operating environments, and refine the design and implemen-
tation.
Most communication systems that you see around have gone through such an evolutionary
path, only to be named as a revolutionary technology later.
2 Some Notations
Let us list some notations that we will try to stick to throughout the course. In particular, random variables will be denoted by capital letters, and their realizations will be written in lower case. For example, the random variable Y takes values y ∈ Y. Here Y is the sample space of Y.
Unless otherwise specified, a vector will be considered as a column vector. Thus, a vector u is an n × 1 vector, and its transpose u^T (a row vector) will be written as u⃗. In communication theory, we often have to deal with complex numbers. So we will define the dot product of vectors over the complex field. For n−dimensional column vectors u and v, we define the dot product as

⟨u, v⟩ = ∑_{i=1}^{n} u_i^* v_i ,

where (⋅)^* denotes complex conjugation.
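As a quick numerical illustration of this definition, here is a minimal sketch in Python/NumPy; the vectors are arbitrary example values, and note that np.vdot conjugates its first argument, matching the convention above.

    import numpy as np

    # Two example 3-dimensional complex vectors (arbitrary values).
    u = np.array([1 + 1j, 2 - 1j, 0 + 3j])
    v = np.array([2 + 0j, 1 + 1j, 0 - 1j])

    # <u, v> = sum_i conj(u_i) * v_i
    dot_manual = np.sum(np.conj(u) * v)
    dot_numpy = np.vdot(u, v)   # np.vdot conjugates the first argument

    print(dot_manual, dot_numpy)   # identical complex numbers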
3 A Basic Communication System

[Figure 1: block diagram of a communication system: message W → encoder → X^n → medium → Y^n → decoder → Ŵ]

In the above figure, W is a random variable, which denotes a set of information symbols (usually bits), known as a message. The sample space for W can be taken as the index set {1, ⋯, M}, where M is some finite number¹. It is usual to assume that the messages are uniformly distributed, i.e.
P(W = i) = 1/M,  1 ≤ i ≤ M.
¹ The sigma field is not explicitly specified whenever it is understood from the context; it is the power set in this example.
The encoder translates the message W to an n−dimensional vector X_1, ⋯, X_n in an appropriate field. We will denote the vector X_1, ⋯, X_n by X_1^n, or sometimes X^n, in this course. In particular, X_i ∈ X are symbols which are suitable for transmission over the given medium. Another way to visualize X^n is to consider its entries as samples of an underlying continuous-time waveform sent over the medium, typically voltage/current waveforms in the baseband circuitry. We will say more on this view later. A vector of values Y^n ∶= Y_1, ⋯, Y_n is received, and the decoder declares Ŵ ∈ {∅, 1, ⋯, M} as the estimate of the transmitted message. This is a high level view of the communication systems considered in this course.
3.1 Encoder
The encoder can be described by an M × n matrix, where the rows are indexed by the messages in {1, ⋯, M }. This matrix is also known as a codebook, where each row is called a codeword. Thus, each message is mapped to an n−dimensional codeword; this is demonstrated in Figure 2.
[Figure 2: the codebook as an M × n matrix; the row indexed by message W = M is the codeword x_{M1}, x_{M2}, ⋯, x_{Mn}]
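To make the codebook picture concrete, here is a minimal sketch (Python/NumPy) of the encoder as a table lookup; the alphabet, sizes and entries below are arbitrary illustrative choices, not taken from the notes.

    import numpy as np

    M, n = 4, 3   # 4 messages, codewords of length 3 (illustrative sizes)

    # The codebook: an M x n matrix with one codeword per row (arbitrary entries).
    codebook = np.array([
        [+1, +1, +1],
        [+1, -1, -1],
        [-1, +1, -1],
        [-1, -1, +1],
    ])

    def encode(w):
        """Map the message w in {1, ..., M} to its n-dimensional codeword."""
        return codebook[w - 1]

    print(encode(2))   # the codeword transmitted when W = 2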
3.2 Decoder
The observed output vector Y n ∈ Y n is also random. The randomness here is induced by
two sources. Since W is random, so is the transmitted codeword X n (W ), and the resulting
output Y n . Moreover, even when M = 1, the output can be random due to the randomness
induced by the medium. The decoder Ŵ is a surjective mapping Ŵ (Y n ) from the space
Y n to {∅, 1, ⋯, M }. The first element asserts the receiver’s freedom to say ‘I do not know’,
or declare an erasure. Nevertheless, in most cases we demand the decoder to declare one
of the messages.
[Figure 3: the output space partitioned into the decision regions D_1, D_2, ⋯, D_M]
In Figure 3, the region Di represents the set of Y n ∈ Y n which are mapped to message i
by the decoder, i.e.
Di = {y n ∈ Y n ∣Ŵ (y n ) = i}.
More generally, a decoding rule is a partition of the space Y n into M disjoint subsets.
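For a concrete (if arbitrary) instance of such a partition, here is a minimal sketch of a minimum-distance decoder for the toy codebook sketched in Section 3.1; choosing D_i as the set of outputs closest to codeword i is just one possible decision rule, assumed here purely for illustration.

    import numpy as np

    codebook = np.array([
        [+1, +1, +1],
        [+1, -1, -1],
        [-1, +1, -1],
        [-1, -1, +1],
    ])

    def decode(y):
        """Return the index i of the decision region D_i that contains y^n.

        Here D_i is the set of outputs closest in Euclidean distance to
        codeword i (ties broken towards the smaller index)."""
        distances = np.linalg.norm(codebook - y, axis=1)
        return int(np.argmin(distances)) + 1   # messages are 1-indexed

    print(decode(np.array([0.9, -1.2, -0.8])))   # lies in D_2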
We have left out the medium in the above description; modelling it appropriately is paramount to the success of our design. Down the line, we will treat this more rigorously. In fact, the system model for the medium can be very much application dependent. As an exercise, can you look around and identify three different communication systems that are used in our day-to-day life? Though long haul optical cables and high speed LAN cables still adorn our offices and infrastructure, the ubiquitous spread of mobile devices has made digital communication almost synonymous with wireless communication.
[Figure 4: a memoryless channel, mapping the input X to the output Y according to the transition law p(y∣x)]
The above representation is that of a memoryless channel, where the input symbol x ∈ X
is mapped to one of the output symbols y ∈ Y, with probability p(y∣x). Thus, the medium
is specified by a collection of probability laws p(y∣x), one for each x ∈ X . Notice that we did not insist on X or Y being scalars, or even complex valued. This gives us enough flexibility
as a generic model suitable for communication theoretic analysis. If you are not familiar
with probability, don’t worry, we can make the representation even shorter, and depict the
communication medium literally as a pipe from X to Y , as in Figure 5.
[Figure 5: the communication medium depicted as a pipe carrying X to Y]
Such a representation fits systems where the output takes the form Y = f (X, Z), where
Z is some randomness (often noise) independent of the transmissions. The function f (⋅, ⋅)
captures the effect of the transmitted symbol at the receiver. In wireless and wireline
systems, it often makes sense to assume that f (⋅, ⋅) is a linear function of the arguments.
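As an illustration, here is a minimal sketch of such a channel with a linear f, namely Y = hX + Z with a complex gain h and additive Gaussian noise Z independent of the input; the particular values of h and the noise level are arbitrary assumptions for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    def channel(x, h=0.8 + 0.6j, noise_std=0.1):
        """Memoryless channel Y = f(X, Z) = h*X + Z, with Z independent of X."""
        z = noise_std * (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape))
        return h * x + z

    x = np.array([1 + 0j, -1 + 0j, 1j, -1j])   # example transmitted symbols
    print(channel(x))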
Let us now demonstrate the components of a communication system using an example, albeit a futuristic one, i.e. something that is still in the development phase. The idea of interference cancellation is among the latest developments in digital communication.
4 Interference Cancellation
Interference is something we all worry about. While it is not immediately clear how to model this, the question in one sentence is: can we have several simultaneous communications over a shared medium? By
appealing to physical laws, we know that simultaneous transmissions will cause superposi-
tion of the EM waves at the receiving terminal. Since we are yet to introduce the details of
the medium, our immediate approach is to model the superposition of transmissions by an
appropriate graphical representation, which can be visualized as an extension of Figure 5
to several communication links.
[Figure 6: two transmitter-receiver pairs sharing the medium; each receiver observes a superposition of both transmitted symbols X1 and X2]
Imagine a cellular infrastructure with a frequency reuse factor of unity. Thus, the
transmitter-receiver pairs in the neighboring cells may interfere with each other, as depicted
by the graph in Figure 6. Transmitter i ∈ {1, 2} emits the scalar symbols Xi ∈ C. In
our notation, the transmitted vector for n consecutive transmission instants from user i is X_{i1}, X_{i2}, ⋯, X_{in} ∶= X_{i1}^n. At each instant j, receiver k observes

Y_{kj} = f_k(X_{1j}, X_{2j}),  k = 1, 2,    (1)
where the functions f1 (⋅, ⋅) and f2 (⋅, ⋅) model appropriate superpositions (linear). For ex-
ample, we use the popular choice of fk (x, y) = α1k x+α2k y, k = 1, 2 in our illustrations below.
The coefficients αik are called the fading coefficients, usually assumed to be complex scalars.
Notice that we excluded any additive noise in (1); this is purely to illustrate the concept of interference management, and we can add noise later in our discussions. Also, let us somewhat naively assume that interference results in collision and data loss. How do we communicate in this situation? Sounds familiar! Imagine so many people talking with their respective
counterparts across a conference table, or in a crowded party room. Such interference is also the subject of user management within a cell. In such situations, the good old TDMA (used in GSM) and the more recent CDMA come to our rescue. For instructive purposes,
let us go through some abstract details.
4.1 TDMA
The essential idea here is to orthogonalize the communication resources in time, i.e. the users take turns in transmitting. Suppose d_{ij}, j ≥ 1 is the data available at user i. In a simple model where the two users are given alternate transmission instants, user 1 transmits {X_{1j}, j ≥ 1} = {d_{11}, 0, d_{12}, 0, ⋯}, while user 2 sends {X_{2j}, j ≥ 1} = {0, d_{21}, 0, d_{22}, ⋯}, where the zero symbol stands for no transmission. For simplicity, assume that the data symbols are real, though our discussion easily extends to complex values.
Observe that in order to send the data symbol d_{1i}, user 1 multiplies the symbol d_{1i} with the row vector u⃗_1 = [1, 0], and schedules the resulting vector for the next two transmissions. In particular, we can write

[X_{1,2i−1}, X_{1,2i}] = d_{1i} u⃗_1 ,   [X_{2,2i−1}, X_{2,2i}] = d_{2i} u⃗_2 ,

so that the first two received samples at receiver 1 are [Y_{11}, Y_{12}] = α_{11} d_{11} u⃗_1 + α_{21} d_{21} u⃗_2. In order to get d_{11} back at user 1, we first combine the elements of Y_{11} and Y_{12}. This can be achieved by taking the dot product with an appropriate weight vector v⃗ = [v_{11} v_{12}]. Thus,

⟨[Y_{11}, Y_{12}], v⃗⟩ = α_{11} d_{11} ⟨u⃗_1, v⃗⟩ + α_{21} d_{21} ⟨u⃗_2, v⃗⟩.
The vector v⃗ is also called a combiner. It will also be called a zero-forcer if all data other than the intended one is cancelled at the output of the combiner. It is easy to see that the choice of u⃗_1 = v⃗_1 and u⃗_2 = v⃗_2 is sufficient for zero-forcing. In other words,

    U = [ u⃗_1 ] = [ 1  0 ]   and   V = [ v⃗_1 ] = [ 1  0 ]        (2)
        [ u⃗_2 ]   [ 0  1 ]             [ v⃗_2 ]   [ 0  1 ]
will guarantee that the data symbols are transmitted without interference from the other
user.
In general, we call the matrix U with rows u⃗_i, 1 ≤ i ≤ n, a beamforming matrix. Similarly, V with rows v⃗_i, 1 ≤ i ≤ n, is called the zero-forcing matrix. To illustrate, consider n users operating in TDMA mode. We can collect n symbols at the output of receiver k as a vector y⃗_k, given by

y⃗_k = ∑_{i=1}^{n} α_{ik} d_i u⃗_i .    (3)
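A minimal numerical sketch of the two-user scheme in (2) and (3); the data symbols and fading coefficients below are arbitrary illustrative values.

    import numpy as np

    # Beamforming and zero-forcing matrices of (2): TDMA uses the identity.
    U = np.eye(2)   # rows are the beamformers u_1, u_2
    V = np.eye(2)   # rows are the zero-forcers v_1, v_2

    d = np.array([1.5, -0.7])          # data symbols d_1, d_2 (arbitrary)
    alpha = np.array([[0.9, 0.3],      # alpha[i, k]: fading from user i+1
                      [0.4, 1.1]])     # to receiver k+1 (arbitrary reals)

    for k in range(2):
        # Receiver k collects the samples y_k = sum_i alpha[i, k] d_i u_i, as in (3).
        y_k = sum(alpha[i, k] * d[i] * U[i] for i in range(2))
        # Zero-forcing: project onto v_k; the other user's symbol is cancelled.
        print("receiver", k + 1, ": d =", np.dot(y_k, V[k]) / alpha[k, k])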
4.2 CDMA
In the beamforming and zero-forcing employed in the previous section, the key property ensuring interference-free transmission is that

⟨u⃗_k, v⃗_i⟩ = δ_{k,i} .
Generalizing this, we can pick any orthonormal n × n matrix as U, and then take V = U. Recall that u⃗_i (the ith row of U) is the beamforming vector at transmitter i, and v⃗_j is the zero-forcer at receiver j. From (3), we have

⟨y⃗_j, v⃗_j⟩ = ∑_{i=1}^{n} (α_{ij} d_i)^* ⟨u⃗_i, v⃗_j⟩ = α_{jj}^* d_j^* .
Notice that any unitary U is good enough, and the choice U = I does indeed give TDMA. A popular technique which designs an orthogonal U using values from the set {−1, +1} is code division multiple access, or plain CDMA.
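A minimal sketch of this idea with a 4 × 4 Walsh-Hadamard matrix as U, normalized so that its rows are orthonormal; the data symbols are arbitrary, and the link gains are taken as unity to keep the illustration short.

    import numpy as np

    # 4 x 4 Walsh-Hadamard matrix with +/-1 entries, scaled to be orthonormal.
    H2 = np.array([[1, 1], [1, -1]])
    U = np.kron(H2, H2) / 2.0   # beamforming matrix; rows are the spreading codes
    V = U                       # zero-forcing matrix (V = U since U is orthonormal)

    d = np.array([0.5, -1.0, 2.0, 0.25])   # data symbols of the four users

    # Every receiver sees the superposition of all spread symbols (unit gains).
    y = sum(d[i] * U[i] for i in range(4))

    # Despreading at receiver j: <y, v_j> returns d_j free of interference.
    print([float(np.dot(y, V[j])) for j in range(4)])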
[Figure 7: an interference graph with five transmitter-receiver pairs; dashed lines are the intended links i → i, solid lines are the interference links]
In the interference graph shown, there are five transmitter-receiver pairs participating in communication. Transmitter i intends to communicate with receiver i, shown by the dashed
line. The solid lines represent the additive interference structure. Thus, user 1 causes
additive disturbance at receivers 3 as well as 4. The link coefficients αij are taken to be
identically unity for a simple exposition.
While TDMA/CDMA will achieve an efficiency of 1/5 data symbols per transmission, better efficiencies are feasible in the above network. For example, users 1 and 2 can transmit simultaneously without interference. However, what more can be done is slightly unclear at this point. Let us build a more formal mechanism to analyze this model. To this end, let U be an m × n beamforming matrix. Thus, the beamformer at transmitter i is u⃗_i, which is m−dimensional. Our intention is to get m as small as possible. By collecting m samples as a vector at receiver i,

y⃗_i = ∑_{j∈A_i} d_j u⃗_j ,
where A_i is the set of transmitters which are connected to receiver i. After combining or zero-forcing,

⟨y⃗_i, v⃗_i⟩ = ⟨ ∑_{j∈A_i} d_j u⃗_j , v⃗_i ⟩
           = d_i ⟨u⃗_i, v⃗_i⟩ + ∑_{j∈A_i, j≠i} d_j ⟨u⃗_j, v⃗_i⟩.

Thus, in order to recover d_i free of interference at every receiver i, it suffices to choose the beamformers and zero-forcers such that

⟨u⃗_j, v⃗_i⟩ = δ_{i,j} for j ∈ A_i.    (4)

Writing the inner products ⟨u⃗_j, v⃗_i⟩ as the (i, j)-th entries of a matrix, the constraints in (4) for the interference graph of this example require the matrix to be of the form

    [ 1  ×  0  0  × ]
    [ ×  1  ×  ×  0 ]
    [ 0  0  1  ×  × ]     (5)
    [ 0  0  ×  1  × ]
    [ ×  ×  0  0  1 ]
where × inside the matrix stands for a don't care condition. In other words, we are free to fill those entries with any values that we wish, reminiscent of the so-called matrix completion problem. In this particular example, we take an easy way out by filling all the don't care values with 1 in the first 4 columns. Fortunately, this makes two pairs of columns identical, and taking the fifth column as the difference of the second and third columns only changes the don't care values. We can then write
    [ 1  1  0  0  1 ]   [ 1  0 ]
    [ 1  1  1  1  0 ]   [ 1  1 ]   [ 1  1  0  0  1 ]
    [ 0  0  1  1 −1 ] = [ 0  1 ]   [ 0  0  1  1 −1 ] ,
    [ 0  0  1  1 −1 ]   [ 0  1 ]
    [ 1  1  0  0  1 ]   [ 1  0 ]
which gives a rank-2 decomposition, where the rows of the left factor are the combiners v⃗_i and the columns of the right factor are the beamformers u⃗_j. The interpretation of this from the transmission side is as follows. Users 1, 2 and 5 send their respective data symbols in the odd time-slots, while the even time-slots are occupied by users 3, 4 and 5 (user 5 sending the negative of its symbol in the even slot). By taking the dot product with v⃗_i at receiver i, the data symbol d_i can be recovered.
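A small numerical check of this scheme (a sketch only): the interference sets A_i below are the ones consistent with the constraint matrix in (5), since the original interference graph figure is not reproduced in these notes, and the link coefficients are taken as unity as in the text; the data symbols are arbitrary.

    import numpy as np

    # Beamformers u_i (read off the columns of the right factor above) and
    # combiners v_i (read off the rows of the left factor), one pair per user.
    U = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, -1]])
    V = np.array([[1, 0], [1, 1], [0, 1], [0, 1], [1, 0]])

    # A[i]: transmitters heard at receiver i, inferred from the zeros in (5).
    A = {1: [1, 3, 4], 2: [2, 5], 3: [1, 2, 3], 4: [1, 2, 4], 5: [3, 4, 5]}

    d = np.array([1.0, -2.0, 0.5, 3.0, -1.5])   # data symbols (arbitrary)

    for i, heard in A.items():
        # Receiver i sees a superposition over the two slots (unit link gains).
        y_i = sum(d[j - 1] * U[j - 1] for j in heard)
        print("receiver", i, ": recovered", float(np.dot(y_i, V[i - 1])))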
5 Matrix Decomposition
In the above interference management problem, there were two aspects that we used in getting the decomposition. The first was the matrix completion problem, where we completed the matrix while maintaining a low rank. The second was in finding a decomposition. We elaborate more on the latter part now. In particular, we will learn the singular value decomposition (SVD).
5.1 SVD
Theorem 1. An m × n matrix A can be decomposed as A = U∆V†, where U and V are unitary and ∆ is an m × n diagonal matrix.
Before we embark on the proof, the reader should be reassured that the proof uses nothing but some elementary techniques from linear algebra. In particular, the Eigen vectors and Eigen values of the matrix AA† play a key role. Recall that x is an Eigen vector corresponding to the Eigen value λ of AA† if ∣∣x∣∣ = 1 and

AA†x = λx.
Lemma 1. All the Eigen values of AA† are non-negative, and the Eigen vectors form an
orthonormal set.
Proof. Notice that x†AA†x = x†λx = λ∣∣x∣∣² = λ. On the other hand, x†AA†x = ∣∣A†x∣∣² ≥ 0, being the squared norm of a vector. This proves the first assertion. For the second, take λ_1 ≠ λ_2 as two Eigen values, with x and y the corresponding Eigen vectors. It is easy to see that

y†AA†x = (AA†y)†x.

The LHS above is nothing but y†λ_1x = λ_1 y†x, whereas the RHS is λ_2 y†x. For equality, we need y†x = 0, which is the intended result.
For the case where λ_1 = λ_2 = ⋯ = λ_l, with the other Eigen values different, clearly u_1, ⋯, u_l will be orthogonal to all other Eigen vectors u_i with λ_i ≠ λ_1. Thus u_1, ⋯, u_l will span a sub-space, and we will choose an orthonormal basis of this sub-space as the corresponding Eigen vectors.
Let us now write the Eigen values of AA†, arranged in descending order, as [λ_1, λ_2, ⋯, λ_r, 0, ⋯, 0], and let U = [u_1, u_2, ⋯, u_m] denote the matrix of corresponding Eigen vectors. Here r is called the rank of the matrix A, with r ≤ min{m, n}.
Looks like we are being partial to AA† . To change that perception, we can collect the
ordered Eigen values of A† A as [λ′1 , ⋯, λ′r , 0, ⋯, 0] and let the corresponding Eigen vectors
be V = [v1 , ⋯, vn ]. By the above arguments V contains an orthonormal set of vectors as
well.
Lemma 2. The matrices AA† and A†A have the same non-zero Eigen values, and we can take v_j = (√λ_j)^{−1} A†u_j.

Proof. Observe that

A†AA†u_i = A†λ_i u_i = λ_i A†u_i.

Thus, by taking v_i = A†u_i / ∣∣A†u_i∣∣, we get

A†A v_i = λ_i v_i,

thus proving both the statements, since ∣∣A†u_i∣∣² = u_i†AA†u_i = λ_i.
The following lemma is now straightforward.
Lemma 3.

u_i†Av_j = √λ_j δ_{i,j},

where δ_{⋅,⋅} is the Kronecker delta.
Proof. By Lemma 2, we have u_i†Av_j = (A†u_i)†v_j = √λ_i v_i†v_j = √λ_j δ_{i,j}.
Now, in order to obtain the proof for SVD, let us compute U † AV with U and V defined
as above. Assume m ≥ n for simplicity.
U†AV = U†A [v_1, ⋯, v_n]                                    (6)
     = U† [Av_1, ⋯, Av_n]                                   (7)

       [ u_1† ]
       [ u_2† ]
     = [  ⋮   ] [Av_1, ⋯, Av_n].                            (8)
       [ u_m† ]

The entry at index (i, j) of this matrix is nothing but u_i†Av_j = √λ_j δ_{i,j}. Thus
U † AV = ∆,
where

    [ √λ_1    0    ⋯    0   ]
    [   0   √λ_2   ⋯    0   ]
    [   ⋮          ⋱    ⋮   ]
∆ = [   0     0    ⋯  √λ_n  ] .
    [   0     0    ⋯    0   ]
    [   ⋮               ⋮   ]
    [   0     0    ⋯    0   ]
Notice that U†AV = ∆ will imply that A = U∆V†, since U and V are unitary matrices, i.e. U†U = UU† = I and V†V = VV† = I. Notice however that UU† and VV† may be of different dimensions.
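To see the construction in action, here is a minimal numerical sketch that builds U, ∆ and V from the Eigen decompositions of AA† and A†A, following the proof above, and checks that A = U∆V†; the test matrix is a random example, and m ≥ n is assumed as in the proof.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Eigen decomposition of A A^dagger gives U and the Eigen values lambda_i.
    lam, U = np.linalg.eigh(A @ A.conj().T)   # ascending order
    lam, U = lam[::-1], U[:, ::-1]            # reorder to descending

    # Lemma 2: v_i = A^dagger u_i / ||A^dagger u_i|| for the non-zero Eigen values.
    V = np.zeros((n, n), dtype=complex)
    for i in range(n):
        w = A.conj().T @ U[:, i]
        V[:, i] = w / np.linalg.norm(w)

    # Delta is m x n with sqrt(lambda_i) on its diagonal.
    Delta = np.zeros((m, n))
    Delta[:n, :n] = np.diag(np.sqrt(np.clip(lam[:n], 0, None)))

    print(np.allclose(A, U @ Delta @ V.conj().T))   # True: A = U Delta V^dagger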