Comtheory
Fall 2003
Group Signal Processing Systems, Electrical Engineering Department, Eindhoven University of Technology.
Contents
I Some History 7
1 Historical facts related to communication 8
1.1 Telegraphy and telephony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Wireless communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Classical modulation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Fundamental bounds and concepts . . . . . . . . . . . . . . . . . . . . . . . . . 11
7 Matched filters 62
7.1 Matched-filter receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2 Direct receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.3 Signal-to-noise ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.4 Parseval relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8 Orthogonal signaling 72
8.1 Orthogonal signal structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.2 Optimum receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.3 Error probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.4 Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
VI Appendices 158
A An upper bound for the Q-function 159
H Decibels 179
Part I
Some History
Chapter 1
Historical facts related to communication
SUMMARY: In this chapter we mention some historical facts related to telegraphy and
telephony, wireless communication, electronics, and modulation. Furthermore we briefly
discuss some classical fundamental bounds and concepts relevant to digital communication.
1.1 Telegraphy and telephony
1819 Hans Christian Oersted (Denmark 1777 - 1851) Discovery of the fact that an electric
current generates a magnetic field. This can be regarded as the first electro-magnetic result.
Although it was known at the time that a stroke of lightning could magnetize objects, it took
twenty years after Volta's invention to realize and prove that this effect exists.
1827 Joseph Henry (U.S. 1797 - 1878) Constructed electromagnets using coils of wire.
1838 Samuel Morse (U.S. 1791 - 1872) Demonstration of the electric telegraph in Morristown,
New Jersey. The first telegraph line linked Washington with Baltimore. It became opera-
tional in May 1844. By 1848 every state east of the Mississippi was linked by telegraph
lines. The alphabet that was designed by Morse (a portrait-painter) transforms letters into
variable-length sequences (code words) of dots and dashes (see table 1.1). A dash should
last roughly three times as long as a dot. Frequent letters (e.g. E) get short code words,
less frequently occurring letters (e.g. J, Q, and Y) are represented by longer words. The
first transcontinental telegraph line was completed in 1861. A transatlantic cable became
operational in 1866; a cable that had already been completed in 1858 failed after a few weeks.
It should be mentioned that in 1833 C.F. Gauss and W.E. Weber demonstrated an electro-
magnetic telegraph in Göttingen, Germany.
1875 Alexander Graham Bell (Scotland 1847 - 1922 U.S.) Invention of the telephone. Bell
was a teacher of the deaf. His invention was patented in 1876 (electro-magnetic tele-
phone). In 1877 Bell established the Bell Telephone Company. Early versions provided
service over several hundred miles. Advances in quality resulted e.g. from the invention
of the carbon microphone.
1900 Michael Pupin (Yugoslavia 1858 - 1935 U.S.) Obtained a patent on loading coils for the
improvement of telephone communications. Adding these Pupin-coils at specified inter-
vals along a telephone line reduced attenuation significantly. The same invention was
disclosed by Campbell two days later than Pupin. Pupin sold his patent for $455,000 to
AT&T Co. Pupin (and Campbell) were the first to set up a theory of transmission lines.
The implications of this theory were not at all obvious to "experimentalists".
1.2 Wireless communications
1873 James Clerk Maxwell (Scotland 1831 - 1879, England) The publication of “Treatise on
Electricity and Magnetism”. Maxwell combined the results obtained by Oersted and Fara-
day into a single theory. From this theory he could predict the existence of electromagnetic
waves.
1886 Heinrich Hertz (Germany 1857 - 1894) Demonstration of the existence of electromag-
netic waves. In his laboratory at the university of Karlsruhe, Hertz used a spark-transmitter
to generate the waves and a resonator to detect them.
1890 Edouard Branly (France 1844 - 1940) Origination of the coherer. This is a receiving de-
vice for electromagnetic waves based on the principle that, while most powdered metals are
poor direct current conductors, metallic powder becomes conductive when high-frequency
current is applied.
1896 Aleksander Popov (Russia 1859 - 1906) Wireless telegraph transmission between two
buildings (200 meters) was shown to be possible. Marconi did similar experiments at
roughly the same time.
1901 Guglielmo Marconi (Italy 1874 - 1937) A radio signal was received at St. John’s, New-
foundland. The radio signal had originated from Cornwall, England, 1700 miles away.
1955 John R. Pierce (U.S. 1910 - 2002) Pierce (Bell Laboratories) proposed the use of satel-
lites for communications and did pioneering work in this area. Originally this idea came
from Arthur C. Clarke who suggested already in 1945 the idea to use earth-orbiting satel-
lites as relay points between earth stations. The first satellite, Telstar I, built by Bell Lab-
oratories, was launched in 1962. It served as a relay station for TV programs across the
Atlantic. Pierce also headed the Bell-Labs team that developed the transistor and suggested
the name for it.
1.3 Electronics
1874 Karl Ferdinand Braun (Germany 1850 - 1918) Observation of rectification at metal con-
tacts to galena (lead sulfide). These semiconductor devices were the forerunners of the
“cat’s whisker” diodes used for detection of radio channels (crystal detectors).
1904 John Ambrose Fleming (England 1849 - 1945) Invention of the thermionic diode, “the
valve”, that could be used as rectifier. A rectifier can convert trains of high-frequency
oscillations into trains of intermittent but unidirectional current. A telephone can then be
used to produce a sound with the frequency of the trains of the sparks.
1906 Lee DeForest (U.S. 1873 - 1961) Invention of the vacuum triode, a diode with grid con-
trol. This invention made practical the cascade amplifier, the triode oscillator and the
regenerative feedback circuit. As a result, transcontinental telephone transmission became
operational in 1915. A transatlantic telephone cable was not laid until 1953.
1948 John Bardeen, Walter Brattain and William Shockley Development of the theory of the
junction transistor. This transistor was first fabricated in 1950. The point contact transistor
was invented in December 1947 at Bell Telephone Labs.
1958 Jack Kilby and Robert Noyce (U.S.) Invention of the integrated circuit.
1.4 Classical modulation methods
1909 George A. Campbell (U.S. 1870 - 1954) Internal AT&T memorandum on the “Electric
Wave Filter”. Campbell there discussed band-pass filters that would reject frequencies
other than those in a narrow band. A patent application filed in 1915 was awarded [2].
1915 Edwin Howard Armstrong (U.S. 1890 - 1954) Invention of regenerative amplifiers and
oscillators. Regenerative amplification considerably increased the sensitivity of the re-
ceiver. DeForest claimed the same invention.
1915 Hendrik van der Bijl, Ralph V.L. Hartley, and Raymond A. Heising In a patent that
was filed in 1915 van der Bijl showed how the non-linear portion of a vacuum tube could be
utilized to modulate and demodulate. Heising participated in a project (in Arlington, VA) in
which a vacuum-tube transmitter based on the van der Bijl modulator was designed. Heis-
ing invented the constant-current modulator. Hartley also participated in the Arlington
project and designed the receiver. He invented an oscillator that was named after him. In a
paper published in 1923 Hartley explained how suppression of the carrier signal and using
only one of the sidebands could economize transmitter power and reduce interference.
1915 John R. Carson Publication of a mathematical analysis of the modulation and demodula-
tion process. Description of single-sideband and suppressed carrier methods.
1918 Edwin Howard Armstrong Invention of the superheterodyne radio receiver. The combi-
nation of the received signal with a local oscillation resulted in an audio beat-note.
1922 John R. Carson “Notes on the Theory of Modulation [3]”. Carson compares amplitude
modulation (AM) and frequency modulation (FM) . He came to the conclusion that FM
was inferior to AM from the perspective of bandwidth requirement and distortion. He
overlooked the possibility that wide-band FM might offer some advantages over AM as
was later demonstrated by Armstrong.
1933 Edwin Howard Armstrong Demonstration of an FM system to RCA. A patent was granted
to Armstrong. Despite the larger bandwidth that is needed, FM gives considerably better
performance than AM.
Figure 1.1: A signal s(t) corrupted by additive noise n(t); a linear filter produces the estimate ŝ(t).
1.5 Fundamental bounds and concepts
1928 Ralph V.L. Hartley (U.S. 1888 - 1970) Determined the number of distinguishable pulse
amplitudes [8]. Suppose that the amplitudes are confined to the interval [−A, +A], and
that the receiver can estimate an amplitude reliably only to an accuracy of ±Δ. Then
the number of distinguishable pulses is roughly 2(A + Δ)/(2Δ) = A/Δ + 1. Clearly Hartley was
concerned with digital communication. He realized that inaccuracy (due to noise) limited
the amount of information that could be transmitted.
1938 Alec Reeves Invention of Pulse Code Modulation (PCM) for digital encoding of speech
signals. In World War II PCM-encoding allowed transmission of encrypted speech (Bell
Laboratories). In [16] Oliver, Pierce and Shannon compared PCM to classical modulation
systems.
1939 Homer Dudley Description of a vocoder. This system overthrew the idea that communication
requires a bandwidth at least as wide as that of the signal to be communicated.
1942 Norbert Wiener (U.S. 1894 - 1964) Investigated the problem shown in figure 1.1. The
linear filter is to be chosen such that its output ŝ(t) is the best mean-square approximation
to s(t) given the statistical properties of the processes S(t) and N(t). A drawback of
Wiener's result was that modulation did not fit into his model and could not be analyzed.
1943 Dwight O. North Discovery of the matched filter. This filter was shown to be the optimum
detector of a known signal in additive white noise.
1947 Vladimir A. Kotelnikov (Russia 1908 - ) Analysis of several modulation systems. His
noise immunity work [11] dealt with how to design receivers (but not how to choose the
transmitted signals) to minimize error probability in the case of digital signals or mean
square error in the case of analog signals. Kotelnikov had discovered the sampling theorem
in 1933, independently of Nyquist.
1948 Claude E. Shannon (U.S. 1916 - 2001) Publication of “A Mathematical Theory of Com-
munication [20].” Shannon (see figure 1.2) showed that noise does not place an inescapable
restriction on the accuracy of communication. He showed that noise properties, channel
bandwidth, and restricted signal magnitude can be incorporated into a single parameter C
which he called the channel capacity. If the cardinality |M| of the message set grows exponentially
with the transmission time at a rate below the capacity C, the error probability can be made
arbitrarily small.
Figure 1.2: Claude E. Shannon, founder of Information Theory. Photo IEEE-IT Soc. Newsl.,
Summer 1998.
Chapter 2
Decision rules for discrete channels
SUMMARY: In this chapter we investigate a discrete channel, i.e. a channel with a
discrete input and output alphabet. For this discrete channel we determine the optimum
receiver, i.e. the receiver that minimizes the error probability PE . The so-called maximum
a-posteriori (MAP) receiver turns out to be optimum. When all messages are equally likely
the optimum receiver reduces to a maximum-likelihood (ML) receiver. In the last section
of this chapter we distinguish between discrete scalar and vector channels.
Figure 2.1: Communication system based on a discrete channel: source, transmitter, discrete channel, receiver, destination.
DISCRETE CHANNEL The channel produces an output r that assumes a value from a discrete
alphabet R. If the channel input is signal s ∈ S the output r ∈ R occurs with conditional
probability Pr{R = r |S = s}. This channel output is directed to the receiver. The random
variable associated with the channel output is denoted by R.
When message m ∈ M occurs, the transmitter chooses s_m as channel input. Therefore
Pr{R = r | M = m} = Pr{R = r | S = s_m}, for all r ∈ R and m ∈ M. (2.1)
These conditional probabilities describe the behavior of the transmitter followed by the
channel.
RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) by looking
at the received channel output r ∈ R, hence m̂ = f (r ). The random variable correspond-
ing to this estimate is called M̂. We assume here that m̂ ∈ M = {1, 2, · · · , |M|}, thus the
receiver has to choose one of the possible messages (and cannot, for example, declare an error).
Note that we call our system discrete because the channel is discrete. It has a discrete input
and output alphabet.
The performance of our communication system is evaluated by considering the probability
of error. This performance depends on the receiver, i.e. on the mapping f (·).
We are now interested in choosing a decision rule f (·) that minimizes PE . A decision rule that
minimizes the error probability PE is called optimum. The corresponding receiver is called an
optimum receiver.
The probability of correct (right) decision is denoted as PC and obviously PE + PC = 1.
Example 2.1 We assume that |M| = 2, i.e. there are two possible messages. Their a-priori probabilities
can be found in the following table:
m Pr{M = m}
1 0.4
2 0.6
CHAPTER 2. DECISION RULES FOR DISCRETE CHANNELS 17
The two signals corresponding to the messages are s1 and s2 and the conditional probabilities Pr{R =
r |S = sm } are given in the table below for the values of r and m that can occur. There are three values that
r can assume, i.e. R = {a, b, c}.
r a b c
f (r ) 1 1 2
This means that the receiver outputs the estimate m̂ = 1 if the channel output is a or b and m̂ = 2 if the
channel output is c. To determine the probability of error PE that is achieved by this rule we first compute
the joint probabilities Pr{M = m, R = r} = Pr{M = m} Pr{R = r | S = s_m}, where we used (2.1).
We list these joint probabilities in the following table:
This decision rule is not optimum. To see why, note that for every r ∈ R a certain message m̂ ∈
M is chosen. In other words the decision rule selects in each column exactly one joint probability.
These selected probabilities are added together to form PC . Our decision rule selects the joint probability
Pr{M = 1, R = b} = 0.16 in the column that corresponds to R = b. A larger PC is obtained if for
output R = b the message 2 is chosen instead of message 1. In that case the larger joint probability
Pr{M = 2, R = b} = 0.18 is selected, and the probability of correct decision increases by 0.18 − 0.16 = 0.02.
We may conclude that, in general, to maximize PC , we have to choose the m that achieves the largest
Pr{M = m, R = r } for the r that was received. This will be made more precise in the following section.
This upper bound is achieved if for all r the estimate f(r) corresponds to the message m that
achieves the maximum in max_m Pr{M = m, R = r}, hence an optimum decision rule for the
discrete channel achieves
Pr{M = f(r), R = r} = max_{m∈M} Pr{M = m, R = r}
for all r ∈ R that can be received, i.e. that have Pr{R = r} > 0. Note that it is possible that for
some channel outputs r ∈ R more than one decision f(r) is optimum.
Definition 2.3 For a communication system based on a discrete channel the joint probabilities
Pr{M = m, R = r} = Pr{M = m} Pr{R = r | S = s_m},
which are, given the received output value r, indexed by m ∈ M, are called the decision variables.
An optimum decision rule f (·) is based on these variables.
RESULT 2.1 (MAP decision rule) To minimize the probability of error PE, the decision rule
f(·) should produce for each received r a message m having the largest decision variable. Hence
for r ∈ R that actually can occur, i.e. that have Pr{R = r} > 0,
f(r) = arg max_{m∈M} Pr{M = m} Pr{R = r | S = s_m}.
For such r we can divide all decision variables by Pr{R = r}. This results in the a-posteriori
probabilities of the messages m ∈ M given the channel output r. By the Bayes rule
Pr{M = m | R = r} = Pr{M = m} Pr{R = r | S = s_m} / Pr{R = r}, for all m ∈ M, (2.10)
therefore the optimum receiver chooses as m̂ a message that has maximum a-posteriori proba-
bility (MAP). This decision rule is called the MAP-decision rule, the receiver is called a MAP-
receiver.
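To make the MAP and ML rules concrete, here is a small sketch in Python (NumPy assumed). The priors and transition probabilities are hypothetical illustration values, not the (partly missing) numbers of example 2.1.

import numpy as np

# Hypothetical a-priori probabilities and channel transition probabilities
# (illustration values only, not those of example 2.1).
priors = np.array([0.4, 0.6])                  # Pr{M = m} for m = 1, 2
P_r_given_s = np.array([[0.5, 0.4, 0.1],       # Pr{R = r | S = s_1}, r = a, b, c
                        [0.1, 0.3, 0.6]])      # Pr{R = r | S = s_2}

# Decision variables Pr{M = m, R = r} = Pr{M = m} Pr{R = r | S = s_m}
joint = priors[:, None] * P_r_given_s

# MAP rule: for every output column pick the message with the largest decision variable
map_rule = joint.argmax(axis=0) + 1            # messages numbered 1, 2

# ML rule: ignore the priors and pick the largest likelihood
ml_rule = P_r_given_s.argmax(axis=0) + 1

# Probability of a correct decision: sum of the selected joint probabilities
P_C = joint.max(axis=0).sum()
print("MAP decisions per output:", map_rule, " PE =", 1 - P_C)
print("ML decisions per output :", ml_rule)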
Example 2.2 Below are the a-posteriori probabilities that correspond to the example in the previous sec-
tion (obtained from Pr{R = a} = 0.26, Pr{R = b} = 0.34, and Pr{R = c} = 0.4). Note that the
probabilities in a column add up to one now.
r a b c
f (r ) 1 2 2
The resulting error probability is PE = Σ_{m∈M} Pr{M = m} Pr{M̂ ≠ m | M = m},
where Pr{M̂ ≠ m | M = m} = Σ_{r: f(r) ≠ m} Pr{R = r | S = s_m} is the probability of error conditional on the
fact that message m was sent.
RESULT 2.2 (ML decision rule) If the a-priori message probabilities are all equal, then in order to
minimize the error probability PE, a decision rule f(·) has to be applied that satisfies
f(r) = arg max_{m∈M} Pr{R = r | S = s_m},
for r ∈ R that can occur, i.e. for r with Pr{R = r} > 0. Such a decision rule is called a maximum
likelihood (ML) decision rule; the resulting receiver is a maximum likelihood receiver.
Given the received channel output r ∈ R the receiver chooses a message m̂ ∈ M for which
the received r has maximum likelihood. The transition probabilities Pr{R = r |S = sm } for
r ∈ R and m ∈ M are called likelihoods. A receiver that operates like this (no matter whether
or not the messages are indeed equally likely) is called a maximum likelihood receiver. It should
be noted that such a receiver is less complex than a maximum a-posteriori receiver since it does
not have to multiply the transition probabilities Pr{R = r |S = sm } by the a-priori probabilities
Pr{M = m}.
Example 2.3 Consider again the transition probabilities of the example in section 2.2:
Figure 2.2: Communication system based on a discrete vector channel.
So far we have considered in this chapter a discrete channel that accepts a single input s ∈ S
and produces a single output r ∈ R. This channel could be called a discrete scalar channel.
There is also a vector variant of this channel, see figure 2.2. The components of a discrete vector
system are described below.
TRANSMITTER Let N be a positive integer. Then the vector transmitter sends a signal vector
s m = (sm1 , sm2 , · · · , sm N ) of N components from the discrete alphabet S if message m
is to be conveyed. This signal vector is input to the discrete vector channel. The random
variable corresponding to the signal vector is denoted by S. The collection of used
signal vectors is s_1, s_2, · · · , s_|M|.
DISCRETE VECTOR CHANNEL The channel produces an output vector r with N components from a
discrete alphabet. If the channel input is the signal vector s, the output vector r occurs with
conditional probability Pr{R = r | S = s}. This channel output vector is directed to the vector receiver. The random
variable associated with the channel output is denoted by R.
When message m ∈ M occurs, s_m is chosen by the transmitter as channel input vector.
Then the conditional probabilities
Pr{R = r | M = m} = Pr{R = r | S = s_m}
describe the behavior of the vector transmitter followed by the discrete vector channel.
RECEIVER The vector receiver forms an estimate m̂ of the transmitted message (or signal) by
looking at the received channel output vector r ∈ R N , hence m̂ = f (r ). The mapping
f (·) is again called the decision rule.
It will not be a big surprise that we define the decision variables for the discrete vector channel
as follows:
Definition 2.4 For a communication system based on a discrete vector channel the decision
variables are again the joint probabilities
Pr{M = m, R = r} = Pr{M = m} Pr{R = r | S = s_m}, for m ∈ M.
RESULT 2.3 (MAP decision rule for the discrete vector channel) To minimize the probability
of error PE, the decision rule f(·) should produce for each received vector r a message m
having the largest decision variable. Hence for vectors r that actually can occur, i.e. that
have Pr{R = r} > 0,
f(r) = arg max_{m∈M} Pr{M = m} Pr{R = r | S = s_m}.
2.6 Exercises
1. Consider the noisy discrete communication channel illustrated in figure 2.3.
(a) If Pr{M = 1} = 0.7 and Pr{M = 2} = 0.3, determine the optimum decision rule
(assignment of a, b or c to 1, 2) and the resulting probability of error PE .
The channel of figure 2.3 has inputs 1 and 2 and outputs a, b, and c, with transition probabilities Pr{R = a | S = s_1} = 0.7, Pr{R = b | S = s_1} = 0.2, Pr{R = c | S = s_1} = 0.1, Pr{R = a | S = s_2} = 0.3, Pr{R = b | S = s_2} = 0.2, and Pr{R = c | S = s_2} = 0.5.
Figure 2.3: A noisy discrete communication system with input M and output R.
(b) There are eight decision rules. Plot the probability of error for each decision rule
versus Pr{M = 1} on one graph.
(c) Each decision rule has a maximum probability of error which occurs for some least
favorable a-priori probability Pr{M = 1}. The decision rule whose maximum probability
of error is smallest is called the minimax decision rule. Which of the eight rules is minimax?
2. A message m ∈ M is transmitted over a channel with additive discrete noise, so that the channel output is
r = s_m + n. (2.19)
The noise variable N is independent of the channel input and can have the values −1, 0,
and +1 only. It is given that Pr{N = 0} = 0.4 and Pr{N = −1} = Pr{N = +1} = 0.3.
(a) Note that the channel output assumes values in {0, 1, 2, 3, 4, 5, 6}. Determine the
probability Pr{R = 2}. For each message m ∈ M compute the a-posteriori probabil-
ity Pr{M = m|R = 2}.
(b) Note that an optimum receiver minimizes the probability of error PE . Give for each
possible channel output the estimate m̂ that an optimum receiver will make. Deter-
mine the resulting error probability.
(c) Consider a maximum-likelihood receiver. Give for each possible channel output the
estimate m̂ that a maximum-likelihood receiver will make. Again determine the re-
sulting probability of error.
(a) Suppose that the detector chooses m̂ such that the probability of error PE = Pr{ M̂ 6=
M} is minimized. For what values of k does the detector choose m̂ = 1 and when
does it choose m̂ = 2?
(b) Consider a maximum-likelihood detector. For what values of k does this detector
choose m̂ = 1 and when does it choose m̂ = 2 now?
(c) Assume that λoff = 0. Consider a maximum-a-posteriori detector. For what values
of pon does the detector choose m̂ = 2 no matter what k is? What is in that case the
error probability PE ?
Figure 2.4: The binary symmetric channel: an input digit is received correctly with probability 1 − p and inverted (0 → 1, 1 → 0) with probability p.
4. Consider transmission over a binary symmetric channel with cross-over probability 0 <
p < 1/2 (see figure 2.4). All messages m ∈ M are equally likely. Each message m
now corresponds to a signal vector (codeword) s m = (sm1 , sm2 , · · · , sm N ) consisting of N
binary digits, hence smi ∈ {0, 1} for i = 1, N . Such a codeword (sequence) is transmitted
over the binary symmetric channel, the resulting channel output sequence is denoted by r .
The Hamming-distance dH (x, y) between two sequences x and y, both consisting of N
binary digits, is the number of positions at which they differ.
(a) Show that the optimum receiver should choose the codeword that has minimum Ham-
ming distance to the received channel output sequence.
(b) What is the optimum receiver for 1/2 < p < 1?
(c) Assume that the messages are not equally likely and that the cross-over probability
is p = 1/2. What is the minimum error probability now? What does the corresponding
receiver do?
Chapter 3
Decision rules for the real scalar channel
SUMMARY: Here we will consider transmission of information over a channel with a single
real-valued input and output. Again the optimum receiver is determined. As an example
we investigate the additive Gaussian noise channel, i.e. the channel that adds Gaussian
noise to the input signal.
Figure 3.1: Communication system based on a real scalar channel.
A communication system based on a channel with real-valued input and output alphabet, i.e.
a real scalar channel does not differ very much from a system based on a discrete channel (see
figure 3.1).
REAL SCALAR CHANNEL The channel now produces an output r in the range (−∞, ∞).
When the input signal is the real-valued scalar s the channel output is generated according
to a conditional probability density function p_R(r | S = s). When message m ∈ M occurs, the
transmitter chooses s_m as channel input, and the conditional density
p_R(r | M = m) = p_R(r | S = s_m)
describes the behavior of the transmitter followed by the channel.
RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) based on
the received real-valued scalar channel output r , hence m̂ = f (r ). The mapping f (·) is
called the decision rule.
An optimum decision rule is obtained if, after receiving the scalar R = r, the decision f(r) is
taken in such a way that the conditional probability of a correct decision, Pr{M = f(r) | R = r}, is as large as possible.
Also this reasoning leads to the definition of the decision variables below.
Definition 3.1 For a system based on a real scalar channel the products
Pr{M = m} p_R(r | S = s_m),
which are, given the received output r, indexed by m ∈ M, are called the decision variables.
The optimum decision rule f(·) is again based on these variables. It is possible that for certain
channel outputs r more than one decision f(r) is optimum.
RESULT 3.1 (MAP) To minimize the error probability PE, the decision rule f(·) should be
such that for each received r a message m is chosen with the largest decision variable. Hence
for r that can be received, an optimum decision rule f(·) should satisfy
Pr{M = f(r)} p_R(r | S = s_{f(r)}) ≥ Pr{M = m} p_R(r | S = s_m), for all m ∈ M. (3.3)
Both sides of the inequality (3.3) can be divided by p R (r ) for values of r that actually did occur.
Then we obtain
f(r) = arg max_{m∈M} Pr{M = m | R = r}, (3.7)
for r for which p R (r ) > 0. This rule is again called maximum a-posteriori (MAP) decision rule.
RESULT 3.2 (ML) When all messages have equal a-priori probabilities, i.e. when Pr{M =
m} = 1/|M| for all m ∈ M, we observe from (3.3) that the optimum receiver has to choose
f(r) = arg max_{m∈M} p_R(r | S = s_m),
for all r with p_R(r) > 0. This rule is referred to as the maximum likelihood (ML) decision rule.
Figure 3.2: The scalar additive Gaussian noise channel: r = s + n.
Definition 3.2 The scalar additive Gaussian noise (AGN) channel adds Gaussian noise N to
the input signal S . This Gaussian noise N has variance σ 2 and mean 0. The probability density
function of the noise is defined to be
p_N(n) = (1/√(2πσ²)) exp(−n²/(2σ²)). (3.9)
The noise variable N is assumed to be independent of the signal S .
Assume now that |M| = 2, with signals s1 and s2 where s1 > s2. According to result 3.1 an optimum receiver decides m̂ = 1 if
Pr{M = 1} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)) ≥ Pr{M = 2} (1/√(2πσ²)) exp(−(r − s2)²/(2σ²)), (3.11)
or equivalently
ln Pr{M = 1} − (r − s1)²/(2σ²) ≥ ln Pr{M = 2} − (r − s2)²/(2σ²),
2σ² ln Pr{M = 1} − (r − s1)² ≥ 2σ² ln Pr{M = 2} − (r − s2)²,
2σ² ln Pr{M = 1} + 2r s1 − s1² ≥ 2σ² ln Pr{M = 2} + 2r s2 − s2²,
2r s1 − 2r s2 ≥ 2σ² ln(Pr{M = 2}/Pr{M = 1}) + s1² − s2². (3.12)
RESULT 3.3 (Optimum receiver for the scalar additive Gaussian noise channel) A receiver
that decides m̂ = 1 if
r ≥ r∗ = (σ²/(s1 − s2)) ln(Pr{M = 2}/Pr{M = 1}) + (s1 + s2)/2, (3.13)
and m̂ = 2 otherwise, is optimum. When the a-priori probabilities Pr{M = 1} and Pr{M = 2}
are equal the optimum threshold is
r∗ = (s1 + s2)/2. (3.14)
Example 3.1 Assume that σ² = 1, s1 = +1 and s2 = −1. If the a-priori message probabilities are
equal, i.e. when Pr{M = 1} = Pr{M = 2}, we obtain an optimum receiver if we decide for m̂ = 1
exactly when r ≥ r∗ = (s1 + s2)/2. The decision changes exactly halfway between s1 and s2.
Figure 3.3: Decision variables as a function of r for equal a-priori probabilities.
Figure 3.4: Decision variables as a function of r for a-priori probabilities 1/4 and 3/4.
The intervals (−∞, r∗) and (r∗, ∞) are called decision intervals. In our case, since s2 = −s1, the threshold is r∗ = 0. This can
also be seen from figure 3.3, where the decision variables
Pr{M = 1} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)) and Pr{M = 2} (1/√(2πσ²)) exp(−(r − s2)²/(2σ²)) (3.15)
are plotted as a function of r assuming that Pr{M = 1} = Pr{M = 2} = 1/2.
Next assume that the a-priori probabilities are not equal but let Pr{M = 1} = 3/4 and Pr{M = 2} =
1/4. Now the decision variables change (see figure 3.4) and we must also change the decision rule. It
turns out that for r ≥ r∗ = −(ln 3)/2 ≈ −0.5493 the optimum receiver should choose m̂ = 1 (see again figure
3.4). The threshold r∗ has moved away from the more probable signal s1.
Note that, no matter how much the a-priori probabilities differ, there is always a value of r, the
threshold r∗, for which (3.11) is satisfied with equality. For r > r∗ the left side of
(3.11) is larger than the right side; for r < r∗ the right side is the larger.
Figure 3.5: Gaussian probability density function for mean 0 and variance 1. The shaded area
corresponds to Q(1).
Here Q(x) = ∫_x^∞ (1/√(2π)) exp(−u²/2) du denotes the probability that a zero-mean, unit-variance
Gaussian random variable exceeds x. In appendix A an upper bound for Q(x) is derived. In table 3.1 we tabulated Q(x) for several
values of x.
The error probability achieved by the threshold receiver is
PE = Pr{M = 1} Pr{R < r∗ | M = 1} + Pr{M = 2} Pr{R ≥ r∗ | M = 2}, (3.18)
where
Pr{R ≥ r∗ | M = 2} = ∫_{r∗}^∞ (1/√(2πσ²)) exp(−(r − s2)²/(2σ²)) dr
= ∫_{r∗}^∞ (1/√(2π)) exp(−((r/σ) − s2/σ)²/2) d(r/σ)
= ∫_{r∗/σ − s2/σ}^∞ (1/√(2π)) exp(−μ²/2) dμ = Q(r∗/σ − s2/σ), (3.19)
and similarly
Pr{R < r∗ | M = 1} = ∫_{−∞}^{r∗} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)) dr
= ∫_{−∞}^{r∗} (1/√(2π)) exp(−((r/σ) − s1/σ)²/2) d(r/σ)
= ∫_{−∞}^{r∗/σ − s1/σ} (1/√(2π)) exp(−μ²/2) dμ
= 1 − Q(r∗/σ − s1/σ) = Q(s1/σ − r∗/σ), (3.20)
where the last step uses the symmetry 1 − Q(x) = Q(−x).
RESULT 3.4 (Minimum probability of error) If we combine (3.18), (3.19), and (3.20), we obtain
PE = Pr{M = 1} Q((s1 − r∗)/σ) + Pr{M = 2} Q((r∗ − s2)/σ). (3.21)
If the a-priori message probabilities Pr{M = 1} and Pr{M = 2} are equal then, according to
(3.14), we get r∗ = (s1 + s2)/2 and
(s1 − r∗)/σ = (s1 − (s1 + s2)/2)/σ = (s1 − s2)/(2σ),
(r∗ − s2)/σ = ((s1 + s2)/2 − s2)/σ = (s1 − s2)/(2σ), (3.22)
so that in this case PE = Q((s1 − s2)/(2σ)).
In the situation of example 3.1 (s1 = +1, s2 = −1, σ² = 1) equal a-priori probabilities thus give PE = Q(1) ≈ 0.1587.
For Pr{M = 1} = 3/4 and Pr{M = 2} = 1/4 we get r∗ = −(ln 3)/2 ≈ −0.5493 and
PE = (3/4) Q(1 + (ln 3)/2) + (1/4) Q(1 − (ln 3)/2)
≈ 0.75 Q(1.5493) + 0.25 Q(0.4507)
≈ 0.75 · 0.0607 + 0.25 · 0.3261 ≈ 0.1270. (3.25)
Note that this is smaller than Q(1) ≈ 0.1587 which would be the error probability after ML-detection
(which is suboptimum here).
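The threshold (3.13) and the error probability (3.21) are easy to verify numerically. Below is a minimal sketch (Python with SciPy assumed, using norm.sf for the Q-function); it reproduces the numbers of example 3.1.

import numpy as np
from scipy.stats import norm

def Q(x):
    # Q(x) = probability that a zero-mean, unit-variance Gaussian exceeds x
    return norm.sf(x)

def threshold(s1, s2, sigma2, p1, p2):
    # Optimum threshold r* of (3.13); assumes s1 > s2
    return sigma2 / (s1 - s2) * np.log(p2 / p1) + (s1 + s2) / 2

def error_probability(s1, s2, sigma2, p1, p2):
    # Minimum error probability (3.21)
    r_star = threshold(s1, s2, sigma2, p1, p2)
    sigma = np.sqrt(sigma2)
    return p1 * Q((s1 - r_star) / sigma) + p2 * Q((r_star - s2) / sigma)

# Example 3.1: s1 = +1, s2 = -1, sigma^2 = 1
print(error_probability(1.0, -1.0, 1.0, 0.5, 0.5))    # ~0.1587 = Q(1)
print(error_probability(1.0, -1.0, 1.0, 0.75, 0.25))  # ~0.1270, threshold -(ln 3)/2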
3.4 Exercises
Figure 3.6: The conditional densities p_R(r | M = 1) (triangular with peak 1 at r = 1 on the interval [0, 2]) and p_R(r | M = 2) (constant 1/4 on the interval [−1, 3]).
1. A communication system is used to transmit one of two equally likely messages, 1 and 2.
The channel output is a real-valued random variable R, the conditional density functions of
which are shown in figure 3.6. Determine the optimum receiver decision rule and compute
the resulting probability of error.
(Exercise 2.23 from Wozencraft and Jacobs [25].)
2. The noise n in figure 3.7a is Gaussian with zero mean, i.e. E[N] = 0. If one of two equally
likely messages is transmitted, using the signals of 3.7b, an optimum receiver yields PE =
0.01.
(a) What is the minimum attainable probability of error PEmin when the channel of 3.7a
is used with three equally likely messages and the signals of 3.7c? And with four
equally likely messages and the signals of 3.7d?
(b) How do the answers to the previous questions change if it is known that E[N ] = 1
instead of 0?
Figure 3.7: (a) The channel r = s + n; (b) two signals at −2 and +2; (c) three signals at −4, 0, +4; (d) four signals at −4, 0, +4, +8.
Chapter 4
Real vector channels
SUMMARY: In this chapter we discuss the channel whose input and output are vectors of
real-valued components. An important example of such a real vector channel is the additive
Gaussian noise (AGN) vector channel. This channel adds zero-mean Gaussian noise to each
input signal component. All noise samples are assumed to be independent of each other and
have the same variance. For the AGN vector channel we determine the optimum receiver.
If all messages are equally likely this receiver chooses the signal vector that is closest to the
received vector in Euclidean sense. We also investigate the error behavior of this channel.
The last part of this chapter deals with the theorem of reversibility.
Figure 4.1: Communication system based on a real vector channel.
In a communication system based on a real vector channel (see figure 4.1) the channel input
and output are vectors with real-valued components.
SOURCE The information source generates the message m ∈ M = {1, 2, · · · , |M|} with a-
priori probability Pr{M = m} for m ∈ M. M is the random variable associated with this
mechanism.
TRANSMITTER When message m ∈ M is to be conveyed, the transmitter sends the signal vector
s_m = (s_m1, s_m2, · · · , s_mN) with N real-valued components. The random variable associated with this signal vector is denoted
by S. The vector signal is input to the channel. The collection of used signal vectors is
s_1, s_2, · · · , s_|M|.
REAL VECTOR CHANNEL The channel produces an output vector r with N real-valued components.
When the input is the signal vector s, the output is generated according to a conditional probability
density function p_R(r | S = s). When message m ∈ M occurs the conditional density
p_R(r | M = m) = p_R(r | S = s_m)
describes the behavior of the cascade of the vector transmitter and the vector channel.
Receiver The receiver forms m̂ based on the received real-valued channel output vector r , hence
m̂ = f (r ). Mapping f (·) is called the decision rule.
In other words, upon receiving r, the optimum receiver produces an estimate m̂ that corresponds
to a largest decision variable. Given r, the decision variables for the real vector channel are
Pr{M = m} p_R(r | M = m), for m ∈ M.
That this is MAP-decoding follows from the Bayes rule which says that
Pr{M = m | R = r} = Pr{M = m} p_R(r | M = m) / p_R(r) (4.4)
for r with p R (r ) > 0. Note that p R (r ) > 0 for an output vector r that has actually been received.
Figure 4.2: Three two-dimensional signal vectors and their decision regions.
Example 4.1 Suppose (see figure 4.2) that we have three messages i.e. M = {1, 2, 3}, and the corre-
sponding signal-vectors are two-dimensional i.e. s 1 = (1, 2), s 2 = (2, 1), and s 3 = (1, −3). A possible
partitioning of the observation space into the three decision regions I1 , I2 , and I3 is shown in the figure.
Figure 4.3: The additive Gaussian noise vector channel: r = s + n, where the noise has density p_N(n) = (2πσ²)^{−N/2} exp(−‖n‖²/(2σ²)).
Definition 4.2 For the output of the additive Gaussian noise (AGN) vector channel we have
that
r = s + n, (4.6)
where n = (n_1, n_2, · · · , n_N) is an N-dimensional noise vector. We denote this random noise
vector by N. The noise vector N is assumed to be independent of the signal vector S. Moreover
the N components of the noise vector are assumed to be independent of each other. All noise
components have mean 0 and the same variance σ². Therefore the joint density function of the
noise vector is given by
p_N(n) = Π_{i=1,N} (1/√(2πσ²)) exp(−n_i²/(2σ²)) = (1/(2πσ²)^{N/2}) exp(−(1/(2σ²)) Σ_{i=1,N} n_i²). (4.7)
With the dot (inner) product (a · b) = Σ_{i=1,N} a_i b_i of the vectors a = (a_1, a_2, · · · , a_N) and
b = (b_1, b_2, · · · , b_N) we have ‖n‖² = (n · n) = Σ_{i=1,N} n_i², and thus
p_N(n) = (1/(2πσ²)^{N/2}) exp(−‖n‖²/(2σ²)). (4.9)
Since the noise is independent of the transmitted signal vector, the conditional density of the channel output is
p_R(r | M = m) = p_R(r | S = s_m) = p_N(r − s_m | S = s_m) = p_N(r − s_m). (4.10)
Now we can easily determine the decision variables, one for each m ∈ M:
Pr{M = m} p_R(r | M = m) = Pr{M = m} p_N(r − s_m)
= Pr{M = m} (1/(2πσ²)^{N/2}) exp(−‖r − s_m‖²/(2σ²)). (4.11)
Note that the factor (2πσ²)^{N/2} is independent of m. Hence maximizing over the decision variables
in (4.11) is equivalent to minimizing
‖r − s_m‖² − 2σ² ln Pr{M = m} (4.12)
over all m ∈ M. We obtain a very simple decision rule if all messages are equally likely.
RESULT 4.1 (Minimum Euclidean distance decision rule) If the a-priori message probabilities
are all equal, the optimum receiver has to minimize the squared Euclidean distance
‖r − s_m‖² over all m ∈ M.
In other words the receiver has to choose the message m̂ with corresponding signal vector s_m̂
that is closest in Euclidean sense to the received vector r.
Example 4.2 Consider again (see figure 4.4) three two dimensional signal vectors s 1 = (1, 2), s 2 =
(2, 1), and s 3 = (1, −3).
The optimum decision regions can be found by drawing the perpendicular bisectors of the sides of the
signal triangle. These are the boundaries of the decision regions I1 , I2 , and I3 (see figure). Note that the
three bisectors have a single point in common.
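A decision rule like result 4.1 takes only a few lines in code. The sketch below (Python, NumPy assumed) uses the signal vectors of example 4.2 and an arbitrary received vector; the map_decision variant implements the more general rule (4.12).

import numpy as np

# Signal vectors of example 4.2 (one row per message)
signals = np.array([[1.0,  2.0],
                    [2.0,  1.0],
                    [1.0, -3.0]])

def min_distance_decision(r, signals):
    # Equal priors: choose the signal vector closest to r (result 4.1)
    d2 = np.sum((signals - r) ** 2, axis=1)          # squared Euclidean distances
    return int(np.argmin(d2)) + 1                    # messages numbered 1, 2, 3

def map_decision(r, signals, priors, sigma2):
    # General rule (4.12): minimize ||r - s_m||^2 - 2 sigma^2 ln Pr{M = m}
    d2 = np.sum((signals - r) ** 2, axis=1) - 2 * sigma2 * np.log(priors)
    return int(np.argmin(d2)) + 1

r = np.array([1.4, 0.2])                             # an arbitrary received vector
print(min_distance_decision(r, signals))
print(map_decision(r, signals, np.array([1/3, 1/3, 1/3]), sigma2=1.0))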
Figure 4.4: Three two-dimensional signal vectors and the corresponding optimum decision re-
gions for the additive Gaussian noise channel.
Figure 4.5: A signal point (a, b), a line at distance Δ from it, the region I on the far side of the line, and rotated coordinates u_1 (perpendicular to the line) and u_2 (parallel to the line).
Consider a two-dimensional signal point (a, b) and a straight line at distance Δ from it, and let I denote
the region on the far side of the line (see figure 4.5). To compute the probability P_I that the additive noise
carries the received vector into I, introduce translated and rotated coordinates
r_1 = a + u_1 cos θ − u_2 sin θ,   r_2 = b + u_1 sin θ + u_2 cos θ,
where (a, b) is the center of a Cartesian system with coordinates u 1 and u 2 , and θ the rotation-
angle. Coordinate u 1 is perpendicular to and coordinate u 2 is parallel to the line in the figure.
Note that
P_I = ∫_I (1/(2πσ²)) exp(−((r_1 − a)² + (r_2 − b)²)/(2σ²)) dr_1 dr_2. (4.16)
Elementary calculus (see e.g. [26], p. 386, theorem 7.7.13) tells us that
∫_I f(r_1, r_2) dr_1 dr_2 = ∫_I f(r_1(u_1, u_2), r_2(u_1, u_2)) |J| du_1 du_2, (4.17)
where J is the Jacobian of the transformation. Note that |J| = cos²θ + sin²θ = 1 and (r_1 − a)² + (r_2 − b)² = u_1² + u_2². Therefore
P_I = ∫_I (1/(2πσ²)) exp(−(u_1² + u_2²)/(2σ²)) du_1 du_2
= ∫_{u_1=Δ}^∞ ∫_{u_2=−∞}^∞ (1/(2πσ²)) exp(−(u_1² + u_2²)/(2σ²)) du_2 du_1
= (∫_{u_1=Δ}^∞ (1/√(2πσ²)) exp(−u_1²/(2σ²)) du_1) · (∫_{u_2=−∞}^∞ (1/√(2πσ²)) exp(−u_2²/(2σ²)) du_2)
= Q(Δ/σ) · 1 = Q(Δ/σ). (4.19)
The conclusion is that the probability depends only on the distance Δ from the signal point (a, b) to
the line. This result carries over to more than two dimensions:
RESULT 4.2 For the additive Gaussian noise vector channel, the probability that the noise
pushes a signal to the wrong side of a hyperplane is
P_I = Q(Δ/σ), (4.20)
where Δ is the distance from the signal point to the hyperplane and σ² is the variance of each
noise component. All noise components are assumed to be zero-mean.
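Result 4.2 can be checked with a short Monte-Carlo experiment: place a signal point at distance Δ from the hyperplane u_1 = 0 and count how often the spherically symmetric noise pushes the received vector across it. A sketch (Python, NumPy/SciPy assumed):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

N = 3              # dimension of the vector channel
sigma = 1.0        # standard deviation of each noise component
delta = 1.5        # distance from the signal point to the hyperplane {x : x[0] = 0}
trials = 200_000

# Signal point at distance delta from the hyperplane; by spherical symmetry
# only the component perpendicular to the hyperplane matters.
s = np.zeros(N)
s[0] = delta
noise = rng.normal(0.0, sigma, size=(trials, N))
r = s + noise

crossed = np.mean(r[:, 0] < 0.0)     # fraction pushed to the wrong side
print("simulated      :", crossed)
print("Q(delta/sigma) :", norm.sf(delta / sigma))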
THEOREM 4.3 (Theorem of reversibility) The minimum attainable probability of error is not
affected by the introduction of a reversible operation at the output of a channel, see figure 4.6.
Figure 4.6: If G is reversible then a ”good” decision can be made from r′ = G(r).
Proof: Consider the receiver for r′ that is depicted in figure 4.7. This receiver first recovers
r = G^{-1}(r′). This is possible since the mapping G from r to r′ is reversible. Then an optimum
receiver for r is used to determine m̂. The receiver constructed in this way for r′ yields the same
error probability as an optimum receiver that observes r directly, thus a reversible operation does
not (necessarily) lead to an increase of PE. □
Figure 4.7: A receiver for r′ that first recovers r = G^{-1}(r′) and then uses an optimum receiver for r.
4.6 Exercises
1. One of four equally likely messages is to be communicated over a vector channel which
adds a (different) statistically independent zero-mean Gaussian random variable with vari-
ance N0 /2 to each transmitted vector component. Assume that the transmitter uses the
signal vectors shown in figure 4.8 and express the PE produced by an optimum receiver in
terms of the function Q(x).
(Exercise 4.2 from Wozencraft and Jacobs [25].)
2. In the communication system diagrammed in figure 4.9 the transmitted signal sm and the
noises n 1 and n 2 are all random voltages and all statistically independent. Assume that
Figure 4.8: Four two-dimensional signal vectors s_1, s_2, s_3, s_4, one in each quadrant, with component magnitude √(E_s/2).
Figure 4.9: The transmitted signal s_m is observed through two noisy branches: r_1 = s_m + n_1 and r_2 = s_m + n_2 are both available to the optimum receiver.
Figure 4.10: A receiver that forms r_1 + r_2 and feeds it to a threshold device producing m̂.
(a) Show that the optimum receiver can be realized as diagrammed in figure 4.10. De-
termine the optimum threshold setting and the value m̂ for r1 + r2 larger than the
threshold.
Figure 4.11: The transmitter output is observed as r_1 = s_m + n_1; in addition r_2 = n_1 + n_2 is available to the optimum receiver.
Figure 4.12: A receiver that forms r_1 + a·r_2 and feeds it to a threshold device producing m̂.
3. In the communication system diagrammed in figure 4.11 the transmitted signal s and the
noises n 1 and n 2 are all random voltages and all statistically independent. Assume that
|M| = 2 i.e. M = {1, 2} and that
(a) Show that the optimum receiver can be realized as diagrammed in figure 4.12 where
a is an appropriately chosen constant.
(b) What is the optimum value for a?
(c) What is the optimum threshold setting?
(d) Express the resulting PE in terms of Q(x).
(e) By what factor would E s have to be increased to yield this same probability of error
if the receiver were restricted to observing only r1 .
4. Consider a set of signal vectors {s 1 , s 2 , · · · , s |M| }. Result 4.1 says that if the a-priori
probabilities of the corresponding messages are equal, the minimum Euclidean distance
decision rule is optimum if the channel is an additive Gaussian noise vector channel (defi-
nition 4.2). In this case the decision regions can be described by a set of hyperplanes (one
for each message pair). Show that this is also the case if the a-priori probabilities are not
all equal.
Chapter 5
Waveform channels, correlation receiver
SUMMARY: All topics discussed so far are necessary to analyze the additive white Gaus-
sian noise waveform channel. In a waveform channel continuous-time waveforms are trans-
mitted over a channel that adds continuous-time white Gaussian noise to them. We show in
this chapter that an optimum receiver for such a waveform channel correlates the received
waveform with all possible transmitted waveforms and makes a decision based on all these
correlations. Together these correlations form a sufficient statistic.
TRANSMITTER The transmitter chooses the waveform sm (t) for 0 ≤ t < T , if message m
is to be transmitted. The set of used waveforms is therefore {s1 (t), s2 (t), · · · , s|M| (t)}.
The interval [0, T ) is called the observation interval. The chosen waveform is input to the
channel. We assume here that all these waveforms have finite-energy, i.e.
E_{s_m} = ∫_0^T s_m²(t) dt < ∞ for all m ∈ M, (5.1)
where E sm is the energy of waveform sm (t). To convince yourself that this is a reasonable
Figure 5.1: Communication over an additive white Gaussian noise waveform channel.
definition, assume that s_m(t) is the voltage across (or the current through) a resistor of 1 Ω.
The total energy that is dissipated in this resistor is then equal to E_{s_m}.
WAVEFORM CHANNEL The channel accepts the input waveform s_m(t) and adds white Gaussian
noise n_w(t) to it, i.e. for the received waveform r(t) we have that
r(t) = s_m(t) + n_w(t). (5.2)
The noise n_w(t) is supposed to be generated by the random process N_w(t) which is zero-mean
(i.e. E[N_w(t)] = 0 for all t), stationary, white, and Gaussian¹. The autocorrelation
function of the noise is
R_{N_w}(t, s) = E[N_w(t) N_w(s)] = (N_0/2) δ(t − s), (5.3)
and depends only on the time difference t − s (by the stationarity). The noise has power
spectral density
S_{N_w}(f) = N_0/2 for −∞ < f < ∞, (5.4)
i.e. it is white, see appendix D. Note that f is the frequency in Hz.
RECEIVER The receiver forms the message estimate m̂ based on the observed waveform
r (t), 0 ≤ t < T .
The transmission of messages over a waveform channel is what we actually want to study in
the first part of these course notes. It is a model for many digital communication systems (e.g.
a telephone line including the modems on both sides, wireless communication links, recording
channels). In the next sections we will show that the optimum receiver should correlate the
received signal r (t) with all possible transmitted waveforms sm (t), m ∈ M to form a good
message estimate. In a second chapter on waveform communication we will study alternative
implementations of optimum receivers.
GAUSSIAN NOISE is present always and everywhere. It originates in the thermal fluctuations of all
matter in the universe. It creates randomly varying electromagnetic fields that will produce a
fluctuating voltage across the terminals of an antenna. The random movements of electrons in
e.g. resistors will create random voltages in the electronics of the receiver. Since these voltages
are the sum of many minuscule effects, by the central-limit theorem, they are Gaussian.
Figure 5.2: Four time-translated orthonormal pulses.
Consider a set of functions {f_i(t), i = 1, 2, · · · } that are orthonormal over the interval [0, T)
in the sense that
∫_0^T f_i(t) f_j(t) dt = 1 if i = j, and 0 if i ≠ j. (5.5)
Some well-known sets of orthonormal functions are given in the example that follows.
Example 5.1 In figure 5.2 four functions are shown that are time-translated orthonormal pulses. Note
that the energy in a pulse, i.e. ∫_0^T f_i²(t) dt, equals 1 for i = 1, 4. In figure 5.3 five functions are shown on
a one-second time interval: a pulse with amplitude 1 and four sine and cosine waves whose amplitude is
√2. All these functions are again mutually orthogonal and have unit energy. Note that functions like these
form the building blocks used in Fourier series.
Now assume² that all outcomes n_w(t) of the random process N_w(t) can be represented in
terms of the orthonormal functions f_1(t), f_2(t), · · · as
n_w(t) = Σ_{i=1}^∞ n_i f_i(t), (5.6)
where the coefficients are given by n_i = ∫_0^T n_w(t) f_i(t) dt. (5.7)
To see why, substitute (5.6) in the right side of (5.7) and integrate making use of (5.5). Next
assume that also all signals can be expressed in terms of the same set of orthonormal functions,
² In this discussion we will focus only on the main concepts and we do not bother very much about mathematical
details such as the existence of such a “complete set” of orthonormal functions, extra conditions on the random
process N_w(t) and on the waveforms s_m(t), the convergence of series, the interchanging of orders of summation and
integration, etc. For a more precise treatment we refer e.g. to Gallager [7], Helstrom [9], or Lee and Messerschmitt
[13].
Figure 5.3: Example of orthonormal functions: first five functions in a Fourier series.
i.e.
s_m(t) = Σ_{i=1}^∞ s_mi f_i(t), for m ∈ {1, 2, · · · , |M|}, (5.8)
where the coefficients s_mi, m ∈ M, i = 1, 2, · · · , are given by
s_mi = ∫_0^T s_m(t) f_i(t) dt. (5.9)
Then also the received signal r (t) = sm (t) + n w (t) can be expanded as
r(t) = Σ_{i=1}^∞ r_i f_i(t), (5.10)
where
r_i = s_mi + n_i = ∫_0^T s_m(t) f_i(t) dt + ∫_0^T n_w(t) f_i(t) dt = ∫_0^T r(t) f_i(t) dt. (5.11)
Instead of the waveform r(t) the receiver can therefore base its decision on the vector r^∞ = (r_1, r_2, r_3, · · · ) without increasing the
smallest possible achievable error probability PE. Similarly we can assume that instead of the
signal s_m(t) the vector s_m^∞ = (s_m1, s_m2, s_m3, · · · ) is transmitted in order to convey the message
m ∈ M; the reason for this is that s_m^∞ and s_m(t) are equivalent. All this implies that we have
transformed our waveform channel into a vector channel (with an infinite number of dimensions
however). Since
r^∞ = s_m^∞ + n^∞, (5.12)
with n^∞ = (n_1, n_2, n_3, · · · ), the resulting vector channel adds noise to the input vector. What
can we say now about the noise components N_i, i = 1, 2, · · · ?
Note first that each component results from linear operations on the Gaussian process Nw (t).
Therefore the noise components are jointly Gaussian.
For the mean of component N_i for i = 1, 2, · · · , we find that
E[N_i] = E[∫_0^T N_w(t) f_i(t) dt] = ∫_0^T E[N_w(t)] f_i(t) dt = 0. (5.13)
As a consequence of all this our vector channel is an additive Gaussian noise vector channel
as described in definition 4.2.
where we have substituted N_0/2 for the variance σ². The receiver has to minimize (5.16) over
m ∈ M. With (a^∞ · b^∞) = Σ_{i=1}^∞ a_i b_i for a^∞ = (a_1, a_2, · · · ) and b^∞ = (b_1, b_2, · · · ), we
rewrite
‖r^∞ − s_m^∞‖² = ((r^∞ − s_m^∞) · (r^∞ − s_m^∞))
= (r^∞ · r^∞) − 2 (r^∞ · s_m^∞) + (s_m^∞ · s_m^∞)
= ‖r^∞‖² − 2 (r^∞ · s_m^∞) + ‖s_m^∞‖². (5.17)
Since ‖r^∞‖² does not depend on m, the optimum receiver should maximize
(r^∞ · s_m^∞) + c_m over m ∈ M = {1, 2, · · · , |M|} (5.18)
with
c_m = (N_0/2) ln Pr{M = m} − ‖s_m^∞‖²/2. (5.19)
Now it would be nice if the receiver could use some simple method to determine the dot
product (r^∞ · s_m^∞) and the squared vector length ‖s_m^∞‖², which are both sums of an infinite
number of components. We will see next that this is possible and quite easy. For the dot product
we find
∫_0^T r(t) s_m(t) dt = ∫_0^T r(t) Σ_{i=1}^∞ s_mi f_i(t) dt = Σ_{i=1}^∞ s_mi ∫_0^T r(t) f_i(t) dt = Σ_{i=1}^∞ s_mi r_i = (r^∞ · s_m^∞), (5.20)
and similarly, for the squared length, ‖s_m^∞‖² = Σ_{i=1}^∞ s_mi² = ∫_0^T s_m²(t) dt = E_{s_m}. (5.21)
RESULT 5.1 The optimum receiver for transmission of the signals s_m(t), m ∈ M, over an
additive white Gaussian noise waveform channel has to maximize
∫_0^T r(t) s_m(t) dt + c_m over m ∈ M = {1, 2, · · · , |M|}, (5.22)
where
c_m = (N_0/2) ln Pr{M = m} − E_{s_m}/2. (5.23)
This receiver is called a correlation receiver (see figure 5.4).
Figure 5.4: Correlation receiver, an optimum receiver for waveforms in white Gaussian noise.
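In discrete time the correlation receiver of figure 5.4 amounts to a bank of inner products plus the bias terms c_m. The following sketch (Python, NumPy assumed) uses two hypothetical antipodal rectangular waveforms; the integrals are approximated by sums over samples.

import numpy as np

def correlation_receiver(r, signals, priors, N0, dt):
    # Discrete-time sketch of result 5.1: maximize  integral r(t) s_m(t) dt + c_m
    energies = np.sum(signals ** 2, axis=1) * dt            # E_{s_m}
    c = 0.5 * N0 * np.log(priors) - 0.5 * energies          # bias terms (5.23)
    correlations = signals @ r * dt                         # correlator outputs
    return int(np.argmax(correlations + c)) + 1             # messages numbered from 1

# Two hypothetical antipodal rectangular pulses on [0, T)
T, n = 1.0, 1000
dt = T / n
signals = np.vstack([np.ones(n), -np.ones(n)])

rng = np.random.default_rng(1)
N0 = 0.5
# White-noise samples with variance N0/(2 dt), so that the correlator noise
# components have the variance N0/2 assumed in the text.
r = signals[0] + rng.normal(0.0, np.sqrt(N0 / (2 * dt)), n)
print(correlation_receiver(r, signals, np.array([0.5, 0.5]), N0, dt))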
5.6 Exercises
1. Consider a waveform communication system in which one of two equally likely messages
is transmitted, hence M = {1, 2} and Pr{M = 1} = Pr{M = 2} = 1/2. The waveforms
s1 (t) and s2 (t) corresponding to the messages both have energy E, i.e.
∫_0^T s_1²(t) dt = ∫_0^T s_2²(t) dt = E,
where [0, T ) is the observation interval. The channel adds zero-mean additive Gaussian
noise with power spectral density S Nw ( f ) = 1 to the transmitted waveform.
(a) Find an expression for the error probability obtained by a correlation receiver when
s1 (t) = −s2 (t).
(b) What is the error probability of a correlation receiver when ∫_0^T s_1(t) s_2(t) dt = 0?
Chapter 6
Waveform channels, building-blocks
SUMMARY: In this chapter we show that when the signal waveforms are linear combinations
of a set of orthonormal waveforms (building-block waveforms) the waveform channel
can be transformed into an AGN vector channel without loss of optimality.
waveform there corresponds a vector in what is called the signal space. It is possible to
make an optimum decision based on the correlations of the channel output waveform with
the building-block waveforms. The difference between this vector of correlations and the
transmitted vector is a noise vector which is Gaussian and spherically distributed.
Now the correlation of r(t) and the signal s_m(t) for m ∈ M can be expressed as
∫_0^T r(t) s_m(t) dt = ∫_0^T r(t) Σ_{i=1,N} s_mi ϕ_i(t) dt = Σ_{i=1,N} s_mi ∫_0^T r(t) ϕ_i(t) dt = Σ_{i=1,N} s_mi r_i, (6.4)
with
r_i = ∫_0^T r(t) ϕ_i(t) dt for i = 1, N. (6.5)
Observe that all the correlations ∫_0^T r(t) s_m(t) dt, m ∈ M, can be computed from the N
correlations ∫_0^T r(t) ϕ_i(t) dt, i = 1, N. Therefore these N correlations are
a sufficient statistic, since from these N numbers an optimum decision can be made.
This is summarized in the following theorem.
RESULT 6.1 If the transmitted waveforms s_m(t), m ∈ M, are linear combinations of the N
building-block waveforms ϕ_i(t), i = 1, N, i.e. if
s_m(t) = Σ_{i=1,N} s_mi ϕ_i(t), for m ∈ M, (6.6)
then the N correlations r_i = ∫_0^T r(t) ϕ_i(t) dt, i = 1, N, form a sufficient statistic: an optimum
decision can be based on them.
Figure 6.1: Optimum receiver based on the building-block correlations: a bank of correlators with ϕ_i(t) produces r_1, ..., r_N; a weighting matrix forms Σ_{i=1,N} r_i s_mi for each m, the constants c_m are added, and the largest result determines m̂.
RESULT 6.2 (Gram-Schmidt) For an arbitrary signal collection, i.e. a collection of waveforms
{s_1(t), s_2(t), · · · , s_|M|(t)} on the interval [0, T), we can construct a set of N ≤ |M|
building-block waveforms {ϕ_1(t), ϕ_2(t), · · · , ϕ_N(t)} and find coefficients s_mi such that for m =
1, 2, · · · , |M| the signals can be synthesized as s_m(t) = Σ_{i=1,N} s_mi ϕ_i(t).
The proof of this result can be found in appendix E. There is also an example there where for
three signals we determine a set of building-blocks.
A certain set of signals can be expanded in many different sets of building-block waveforms,
also in sets with a larger dimensionality in general. What remains the same however is the geo-
metrical configuration of the vector representations of the signals (see exercise 3 in this chapter).
The Gram-Schmidt procedure always yields a base with the smallest possible dimension.
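In sampled form the Gram-Schmidt construction of result 6.2 is just repeated projection and normalization. A minimal sketch (Python, NumPy assumed), applied to three hypothetical rectangular waveforms that span only two dimensions:

import numpy as np

def gram_schmidt(signals, dt, tol=1e-10):
    # Build orthonormal building-block waveforms from the sampled signals and
    # return them together with the coefficients s_mi (result 6.2, in discrete time).
    basis = []
    for s in signals:
        residual = s.astype(float).copy()
        for phi in basis:
            residual -= (np.sum(residual * phi) * dt) * phi   # remove projections
        energy = np.sum(residual ** 2) * dt
        if energy > tol:                                      # a new dimension is needed
            basis.append(residual / np.sqrt(energy))
    basis = np.array(basis)
    coeffs = signals @ basis.T * dt                           # s_mi = integral s_m(t) phi_i(t) dt
    return basis, coeffs

n = 100
dt = 1.0 / n
s1 = np.r_[np.ones(n // 2), np.zeros(n // 2)]
s2 = np.r_[np.zeros(n // 2), np.ones(n // 2)]
s3 = s1 + s2
basis, coeffs = gram_schmidt(np.vstack([s1, s2, s3]), dt)
print(basis.shape[0])   # 2 building-block waveforms suffice
print(coeffs)           # the coefficients s_mi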
Figure 6.2: A modulator forms the waveform sm (t) from the vector s m .
Now, instead of the waveform sm (t) we can say that the vector s m is transmitted. A
modulator (see figure 6.2) performs operation (6.8) that transforms a vector into the transmitted
waveform.
RESULT 6.3 If the waveforms s1 (t), s2 (t), · · · , s|M| (t) are linear combinations of N building-
blocks, the collection of waveforms can be represented as a collection of vectors s 1 , s 2 , · · · , s |M|
in an N -dimensional space. This space is called the signal space. The collection of vectors
s 1 , s 2 , · · · , s |M| is called the signal structure.
Figure 6.3: Four signals represented as vectors in a two-dimensional space. The length of all signal vectors is √(E_s).
for 0 ≤ t < T and zero elsewhere, we obtain the following vector representations for the signals:
s_1 = (0, −√E_s),  s_2 = (−√E_s, 0),  s_3 = (0, √E_s),  s_4 = (√E_s, 0). (6.13)
These vectors are depicted in figure 6.3. Note that the four waveforms shown in figure 6.4 have the same
signal structure (with respect to the base also shown there) as the signals defined in (6.10). We will learn
later in this chapter that both sets of waveforms also have identical error behavior.
Figure 6.4: Another set of waveforms that leads to the vector diagram in figure 6.3.
Figure 6.5: Recovering the vector r = (r1 , r2 , · · · , r N ) from the received waveform r (t).
Consider the first part (demodulator) of the optimum receiver for waveform communication
working with sufficient statistics (see figure 6.5). This receiver observes
r(t) = s_m(t) + n_w(t), (6.14)
where s_m(t) for some m ∈ M is the transmitted waveform and n_w(t) is a realization of a white
Gaussian noise process. The first part of the receiver determines the N-dimensional vector r =
(r_1, r_2, · · · , r_N) whose components are defined as (see figure 6.5)
r_i = ∫_0^T r(t) ϕ_i(t) dt for i = 1, 2, · · · , N. (6.15)
The components of this vector are the sufficient statistics mentioned in section 6.1. So we can
say that the optimum receiver makes a decision based on the vector r instead of on the
waveform r (t).
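A demodulator in the sense of figure 6.5 is the same kind of bank of inner products, now with the building-block waveforms. A small self-contained sketch (Python, NumPy assumed) with a hypothetical two-dimensional base of rectangular pulses:

import numpy as np

def demodulate(r, basis, dt):
    # r_i = integral of r(t) phi_i(t) dt over [0, T), approximated by a sum (eq. 6.15)
    return basis @ r * dt

n = 100
dt = 1.0 / n
phi1 = np.r_[np.ones(n // 2), np.zeros(n // 2)] * np.sqrt(2.0)   # unit-energy pulses
phi2 = np.r_[np.zeros(n // 2), np.ones(n // 2)] * np.sqrt(2.0)
basis = np.vstack([phi1, phi2])

# Transmit the vector s_m = (3, -1), i.e. s_m(t) = 3 phi1(t) - phi2(t), plus white noise
rng = np.random.default_rng(2)
N0 = 0.1
s_m = 3.0 * phi1 - 1.0 * phi2
r = s_m + rng.normal(0.0, np.sqrt(N0 / (2 * dt)), n)

print(demodulate(r, basis, dt))    # close to (3, -1); each error has variance ~ N0/2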
Assume that message m was transmitted, so that r(t) = s_m(t) + n_w(t) was
received. We will now consider the vector channel that has s_m as input and r as output. Since
r(t) = s_m(t) + n_w(t) we have that
r_i = ∫_0^T r(t) ϕ_i(t) dt = ∫_0^T s_m(t) ϕ_i(t) dt + ∫_0^T n_w(t) ϕ_i(t) dt = s_mi + n_i, (6.16)
with, for i = 1, N,
n_i = ∫_0^T n_w(t) ϕ_i(t) dt. (6.17)
Hence again, as in section 5.3, we get
r = s m + n, (6.18)
RESULT 6.4 The joint density function of the N-dimensional noise vector n, that is added to
the input of a vector channel which is derived from a waveform communication system, is
p_N(n) = (1/(πN_0)^{N/2}) exp(−‖n‖²/N_0), (6.21)
hence the noise is spherically symmetric: the density depends on the magnitude but not on the direction
of n. The noise projected on each dimension i = 1, N has variance N_0/2.
We may conclude that our channel with input s and output r is an additive Gaussian noise
(AGN) vector channel as was described in definition 4.2.
Note that by result 6.4 we must substitute N_0/2 for σ² here. Inspection now shows that an
optimum vector receiver should maximize
(r · s_m) + (N_0/2) ln Pr{M = m} − ‖s_m‖²/2 = Σ_{i=1,N} r_i s_mi + (N_0/2) ln Pr{M = m} − ‖s_m‖²/2 (6.23)
over m ∈ M. From this we can observe that the back end of the receiver shown in figure 6.1 performs exactly as we
expect.
The modulator maps the signal vector s_m onto the waveform s_m(t); this waveform is input to the channel that adds white Gaussian noise. The demodulator op-
erates on the received waveform r (t) and forms the vector r . A vector receiver determines the
estimate m̂ of the message that was sent from r . The combination modulator - waveform channel
- demodulator is an additive Gaussian noise vector channel. In the previous chapter we have
seen how an optimum receiver for such a channel can be constructed. The decision function that
applies to this situation can be found in (4.12).
We may conclude that, by converting the problem of designing an optimum receiver for
waveform communication to designing an optimum receiver for an additive white Gaussian noise
vector channel, we have solved it. We want to end this chapter with a very important observation:
RESULT 6.5 The error behavior of the waveform communication system is determined only by the collection of signal vectors (the signal structure) and N0/2. Different sets of waveforms that have the same signal structure, possibly with respect to different building-block waveforms, therefore yield the same error performance. Which building-block waveforms are actually chosen by the communication engineer may depend on other factors, e.g. bandwidth requirements, as we will see in later parts of these course notes.
6.9 Exercises
1. For each set of waveforms s1 (t), s2 (t), · · · , s|M| (t) we can apply the Gram-Schmidt pro-
cedure to construct an orthonormal base such that each waveform sm (t) for m ∈ M is a
linear combination of the building-block waveforms ϕ1 (t), ϕ2 (t), · · · , ϕ N (t). Show that
for a given set of waveforms the Gram-Schmidt procedure results in a base with the small-
est possible dimension N .
2. If we apply the Gram-Schmidt procedure we must assume that the waveforms sm (t), for
m ∈ M, have finite energy. In the construction it is essential that the energy Eθm corresponding to the auxiliary signal θm(t) is finite. Show that
Eθm ≤ ∫_0^T sm²(t)dt < ∞. (6.25)
3. Show that, independent of the choice of the building-block waveforms, the length of any signal vector sm for m ∈ M is the same. Moreover show that the Euclidean distance between two signal vectors sm and sm′ for m, m′ ∈ M does not depend on this choice either. What does this imply for the energy and the expected error probability of the system?
4. (a) Calculate PEmin when the signal sets (a), (b), and (c) specified in figure 6.7 are used to
communicate one of two equally likely messages over a channel disturbed by additive
Gaussian noise with Sn ( f ) = 0.15. Note that the third set is specified by the Fourier
transforms (see appendix B) of the waveforms. Use Parseval’s equation.
(b) Also calculate the PEmin for the same three sets when the a-priori message probabili-
ties are 1/4, 3/4.
(Based on exercise 4.8 from Wozencraft and Jacobs [25].)
[Figure 6.7: The three binary signal sets (a), (b), and (c). Set (c) is specified by the spectra S1(f) and S2(f) of its waveforms.]
Matched filters
[Figure 7.1: A filter with impulse response hi(t) = ϕi(T − t), input r(t) and output ui(t).]
Consider (see figures 7.1 and 7.2) filters with an impulse response h i (t) = ϕi (T − t) for
i = 1, N . Note that h i (t) ≡ 0 for t ≤ 0 (i.e. the filter is causal) and also for t > T . For the
output of such a filter we can write
ui(t) = r(t) ∗ hi(t) = ∫_{−∞}^{∞} r(α) hi(t − α)dα = ∫_{−∞}^{∞} r(α) ϕi(T − t + α)dα. (7.1)
Figure 7.2: A building-block waveform ϕi(t) and the impulse response ϕi(T − t) of the corresponding matched filter.
If we sample the output of this filter at time t = T we obtain
ui(T) = ∫_{−∞}^{∞} r(α)ϕi(α)dα = ∫_0^T r(t)ϕi(t)dt = ri, (7.2)
hence we can determine the i-th component of the vector r in this way. Note that ri is the i-th component of the sufficient statistic corresponding to the building-blocks.
Now we have an alternative method for computing the vector r . This results in the receiver
shown in figure 7.3. Note that for all m ∈ M
(r · sm) = Σ_{i=1,N} ri smi = ∫_0^T r(t)sm(t)dt. (7.3)
A filter whose impulse response is a delayed time-reversed version of a signal ϕi (t) is called
”matched to” ϕi (t). A receiver that is equipped with such filters is called a matched-filter receiver.
This gives another method to form an optimum receiver (see figure 7.4). This receiver is called a
direct receiver since the filters are matched directly to the signals {sm (t), m ∈ M}.
Note that a direct receiver is usually more expensive than a receiver with filters matched to the building-block waveforms, since always |M| ≥ N and in practice often even |M| ≫ N. The weighting-matrix operations are not needed here however.
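A small numerical check, with an arbitrarily chosen building-block waveform and noise realization, that sampling the matched-filter output at t = T gives exactly the correlation ri of (6.15) and (7.3):

```python
import numpy as np

T, N_samples = 1.0, 800
dt = T / N_samples
t = np.arange(N_samples) * dt

phi = np.sqrt(2 / T) * np.sin(2 * np.pi * t / T)       # a unit-energy building-block on [0, T)
rng = np.random.default_rng(1)
r_t = 0.7 * phi + rng.normal(0.0, 1.0, N_samples)       # some received waveform

# Direct correlation: r_i = integral of r(t) * phi(t) dt over [0, T).
r_corr = np.sum(r_t * phi) * dt

# Matched filter h(t) = phi(T - t): convolve and sample the output at t = T.
h = phi[::-1]
u = np.convolve(r_t, h) * dt                            # u[k] approximates the filter output at t = k*dt
r_mf = u[N_samples - 1]                                 # sample at (approximately) t = T

print(r_corr, r_mf)                                     # the two numbers agree
```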
[Figure 7.3: A matched-filter receiver. The received waveform r(t) is fed to filters with impulse responses ϕ1(T − t), · · · , ϕN(T − t); the outputs are sampled at t = T to obtain r1, · · · , rN, a weighting matrix forms (r · sm) = Σ_i ri smi for every m, the constants c1, · · · , c|M| are added, and the largest result determines m̂.]
with
us(T) = ∫_{−∞}^{∞} s(T − α)h(α)dα, and (7.6)
un(T) = ∫_{−∞}^{∞} nw(T − α)h(α)dα, (7.7)
[Figure 7.4: A direct receiver. The received waveform r(t) is fed to filters matched directly to the signals, with impulse responses s1(T − t), · · · , s|M|(T − t); the outputs are sampled at t = T, the constants c1, · · · , c|M| are added, and the largest result determines m̂.]
where us(T) is the signal component and un(T) the noise component in the sampled filter output.
RESULT 7.1 If we substitute (7.9) and (7.6) in (7.8) we obtain for the maximum attainable
signal-to-noise ratio
S/N = [∫_{−∞}^{∞} s(T − α)h(α)dα]² / [(N0/2) ∫_{−∞}^{∞} h²(α)dα]
    ≤(∗) [∫_{−∞}^{∞} s²(T − α)dα · ∫_{−∞}^{∞} h²(α)dα] / [(N0/2) ∫_{−∞}^{∞} h²(α)dα]
    = ∫_{−∞}^{∞} s²(T − α)dα / (N0/2) = ∫_0^T s²(t)dt / (N0/2). (7.10)
The inequality (∗) in this derivation comes from the Schwarz inequality (see appendix F). Equality is obtained if and only if h(t) = Cs(T − t) for some constant C, i.e. if the filter h(t) is matched to the signal s(t).
Note that the maximum signal-to-noise ratio depends only on the energy of the waveform
s(t) and not on its specific shape.
It should be noted that in this section we have demonstrated a weaker form of optimality for
the matched filter than the one that we have obtained in the previous chapter. The matched filter
is not only the filter that maximizes signal-to-noise ratio, but can be used for optimum detection
as well.
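Result 7.1 can also be verified numerically. In the sketch below the signal s(t), the candidate filters, and the value of N0 are arbitrary choices; the code evaluates (7.10) for the matched filter and for a few other filters, and only the matched filter attains 2Es/N0.

```python
import numpy as np

T, N = 1.0, 500
dt = T / N
t = np.arange(N) * dt
N0 = 0.2

s = np.where(t < 0.5 * T, 1.0, -0.5)             # some signal s(t) on [0, T)

def output_snr(h):
    # S/N from (7.10): (integral s(T-a) h(a) da)^2 / ((N0/2) integral h(a)^2 da)
    num = (np.sum(s[::-1] * h) * dt) ** 2         # s(T - a) sampled as the reversed signal
    den = 0.5 * N0 * np.sum(h ** 2) * dt
    return num / den

rng = np.random.default_rng(2)
candidates = {"matched": s[::-1],                 # h(t) = s(T - t)
              "rectangular": np.ones(N),
              "ramp": t,
              "random": rng.normal(size=N)}
for name, h in candidates.items():
    print(f"{name:12s} S/N = {output_snr(h):.3f}")
# The matched filter attains 2*Es/N0; by the Schwarz inequality every other filter gives less.
print("2*Es/N0      =", 2 * np.sum(s ** 2) * dt / N0)
```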
f = ( f 1 , f 2 , · · · , f N ) and
g = (g1 , g2 , · · · , g N ). (7.12)
Now:
7.5 Notes
Figure 7.6: Dwight O. North, inventor of the matched filter. Photo IEEE-IT Soc. Newsl., Dec.
1998.
The matched filter as a filter for maximizing the signal-to-noise ratio was invented by North
[14] in 1943. The result was published in a classified report at RCA Labs in Princeton. The
name “matched filter” was coined by Van Vleck and Middleton who independently published
the result a year later in a Harvard Radio Research Lab report [24].
7.6 Exercises
1. One of two equally likely messages is to be transmitted over an additive white Gaussian
noise channel with Sn ( f ) = 0.05 by means of binary pulse position modulation. Specifi-
cally
s1 (t) = p(t)
s2 (t) = p(t − 2), (7.19)
in which the pulse p(t) is shown in figure 7.7.
[Figure 7.7: The pulse p(t): a triangular pulse on 0 ≤ t ≤ 2 with peak value 2 at t = 1.]
(d) Suggest another pair of waveforms that require the same energy as the binary pulse-
position waveforms and yield the same average error probability and a pair that
achieves smaller error probability.
(e) Calculate the minimum attainable average error probability if
s1 (t) = p(t) and s2 (t) = p(t − 1). (7.20)
Repeat this for
s1 (t) = p(t) and s2 (t) = − p(t − 1). (7.21)
[Figure 7.8: Two signals s1(t) and s2(t) on 0 ≤ t ≤ 7, with amplitudes √(Es/2) and ±√(Es/7) respectively.]
2. Specify a matched filter for each of the signals shown in figure 7.8 and sketch each filter
output as a function of time when the signal matched to it is the input. Sketch the output
of the filter matched to s2 (t) when the input is s1 (t).
(Exercise 4.14 from Wozencraft and Jacobs [25].)
3. Consider a transmitter that sends the signal
sa(t) = Σ_{k=1,K} ak h(t − (k − 1)τ),
Figure 7.9: Pulse transmission and detection with a sampled-matched-filter receiver. Signals are also shown.
4. One of two equally likely messages is to be transmitted over an additive white Gaussian
noise channel with Sn ( f ) = N0 /2 = 1 by means of binary pulse-position modulation.
Specifically
s1 (t) = p(t),
s2 (t) = p(t − 2),
(a) Describe (and sketch) an optimum receiver for this case. Express the resulting error
probability in terms of Q(·).
(b) Give the implementation of an optimum receiver which uses a single linear filter
followed by a sampler and comparison device. Assume that two samples from the
filter output are fed into the comparison device. Sketch the impulse response of the
[Figure: the pulse p(t), triangular on 0 ≤ t ≤ 2 with peak value 2 at t = 1.]
appropriate filter. What is the output of the filter at both sample moments when the
filter input is s1 (t)? What are these outputs for filter input s2 (t)?
(c) Calculate the minimum attainable average error probability if
Orthogonal signaling
SUMMARY: Here we investigate orthogonal signaling. In this case the waveforms that
correspond to the messages are orthogonal. We determine the average error probability PE
for these signals. It appears that PE can be made arbitrarily small by increasing the number of waveforms, provided that the energy per transmitted bit is larger than N0 ln 2. Note that this is
a capacity-result!
Definition 8.1 All signals in an orthogonal set are assumed to have equal energy and to be orthogonal, i.e.
sm = √Es ϕm for m ∈ M, (8.1)
where ϕm is the unit vector corresponding to dimension m. There are as many building-block waveforms ϕm(t) and dimensions in the signal space as there are messages.
The signals that we have defined are now orthogonal since
∫_{−∞}^{∞} sm(t)sk(t)dt = (sm · sk) = Es (ϕm · ϕk) = Es δmk for m ∈ M and k ∈ M. (8.2)
‖r − sm‖² = ‖r‖² + ‖sm‖² − 2(r · sm) = ‖r‖² + Es − 2√Es rm. (8.3)
RESULT 8.1 Since only the term −2√Es rm depends on m, an optimum receiver has to choose m̂ such that
rm̂ ≥ rm for all m ∈ M. (8.4)
Now we can find an expression for the error probability.
Note that the noise vector n = (n 1 , n 2 , · · · , n |M| ) consists of |M| independent components all
with mean 0 and variance N0 /2.
Suppose that the first component of the received vector is α. Then we can write for the correct
probability, conditional on the fact that message 1 was sent and that the first component of r is
α,
with µ = α/√(N0/2). Furthermore
∫_{−∞}^{µ√(N0/2)} (1/√(π N0)) exp(−β²/N0) dβ = ∫_{−∞}^{µ√(N0/2)} (1/√(2π)) exp(−(β/√(N0/2))²/2) dβ/√(N0/2) = ∫_{−∞}^{µ} (1/√(2π)) exp(−λ²/2) dλ (8.9)
with λ = β/√(N0/2). Therefore
PC = ∫_{−∞}^{∞} p(µ − b) [∫_{−∞}^{µ} p(λ)dλ]^{|M|−1} dµ, (8.10)
with p(γ) = (1/√(2π)) exp(−γ²/2) and b = √(2Es/N0). Note that p(γ) is the probability density function of a Gaussian random variable with mean zero and variance 1.
From (8.10) we conclude that for a given |M| the correct probability PC depends only on
b i.e. on the ratio E s /N0 . This can be regarded as a signal to noise ratio since E s is the signal
energy and N0 is twice the variance of the noise in each dimension.
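The integral (8.10) can be evaluated by straightforward numerical quadrature. The following sketch, whose grid limits and step size are arbitrary choices, computes PE = 1 − PC for a few values of |M| at a fixed Es/N0, showing the behavior depicted in figure 8.1.

```python
import numpy as np
from math import erf

def pe_orthogonal(M, Es_over_N0, num_points=4001, lim=12.0):
    """Numerically evaluate PE = 1 - PC with PC given by (8.10)."""
    b = np.sqrt(2.0 * Es_over_N0)
    mu = np.linspace(-lim, lim + b, num_points)
    dmu = mu[1] - mu[0]
    p = np.exp(-(mu - b) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)            # p(mu - b)
    Phi = np.array([0.5 * (1.0 + erf(x / np.sqrt(2.0))) for x in mu])  # inner integral of p(lambda)
    PC = np.sum(p * Phi ** (M - 1)) * dmu
    return 1.0 - PC

for k in (1, 2, 4, 8):
    print(f"log2|M| = {k:2d}, Es/N0 = 4 : PE = {pe_orthogonal(2 ** k, 4.0):.4f}")
# For fixed Es/N0 the error probability PE grows with |M|, as in figure 8.1.
```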
Example 8.1 In figure 8.1 the error probability PE = 1 − PC is depicted for values of log2 |M| =
1, 2, · · · , 16 and as a function of E s /N0 .
8.4 Capacity
Consider the following experiment. We keep increasing |M| and want to know how we should
increase E s /N0 such that the error probability PE gets smaller and smaller. It turns out that it is
the energy per bit that counts. We define Eb, the energy per transmitted bit of information, as
Eb = Es / log2|M|. (8.11)
Figure 8.1: Error probability PE for |M| orthogonal signals as a function of Es/N0 in dB for log2|M| = 1, 2, · · · , 16. Note that PE increases with |M| for fixed Es/N0.
Example 8.2 In figure 8.2 we have plotted the error probability PE as a function of the ratio E b /N0 . It
appears that for ratios larger than ln 2 = −1.5917 dB (this number is called the Shannon limit) the error
probability decreases by making |M| larger.
Figure 8.2: Error probability for |M| = 2, 4, 8, · · · , 32768, now as a function of the ratio Eb/N0 in dB.
8.5 Exercises
1. A Hadamard matrix is a matrix whose elements are ±1. When n is a power of 2, an n × n
Hadamard matrix is constructed by means of the recursion:
H2 = [ +1 +1 ; +1 −1 ],
H2n = [ +Hn +Hn ; +Hn −Hn ]. (8.13)
(a) Show that the signal set {s m , m ∈ M} consists of orthogonal vectors all having
energy E s .
(b) What is the error probability PE if the signal vectors correspond to waveforms that
are transmitted over a waveform channel with additive white Gaussian noise having
spectral density N0/2?
(c) What is the advantage of using the Hadamard signal set over the orthogonal set from
definition 8.1 if we assume that the building-block waveforms are in both cases time-
shifts of a pulse? And the disadvantage?
defines a set of 2n signals which is called bi-orthogonal. Determine the error probability of this signal set if the energy of each signal is Es.
A bi-orthogonal “code” with n = 32 was used for an early deep-space mission (Mariner,
1969). A fast Hadamard transform was used as decoding method [6].
For t < 0 and t ≥ 1 all signals are zero. The signals are transmitted over an additive white
Gaussian noise waveform channel. The power spectral density of the noise is Sn ( f ) =
N0 /2 for all frequencies f .
(a) First show that the signals sm (t), m ∈ {1, 2, · · · , 8} are orthogonal1 . Give the ener-
gies of the signals. What are the building-block waveforms ϕ1 (t), ϕ2 (t), · · · , ϕ8 (t)
that result in the signal vectors
s1 = (A, 0, 0, 0, 0, 0, 0, 0),
s2 = (0, A, 0, 0, 0, 0, 0, 0),
···
s8 = (0, 0, 0, 0, 0, 0, 0, A)?
(b) The optimum receiver first determines the correlations ri = ∫_0^1 r(t)ϕi(t)dt for i = 1, 2, · · · , 8. Here r(t) is the received waveform hence r(t) = sm(t) + nw(t). For
what values of the vector r = (r1 , r2 , · · · , r8 ) does the receiver decide that message
m was transmitted?
(c) Give an expression for the error probability PE obtained by the optimum receiver.
(d) Next consider a system with 16 messages all having the same a-priori probability.
The signal waveform corresponding to message m is now given by
sm(t) = A√2 cos(2πmt), for 0 ≤ t < 1, for m = 1, 2, · · · , 8,
sm(t) = −A√2 cos(2π(m − 8)t), for 0 ≤ t < 1, for m = 9, 10, · · · , 16.
For t < 0 and t ≥ 1 these signals are again zero. What are the signal vectors s1, s2, · · · , s16 if we use the building-block waveforms mentioned in part (a) again?
1 Hint: 2 cos(a) cos(b) = cos(a − b) + cos(a + b).
(e) The optimum receiver again determines the correlations ri = ∫_0^1 r(t)ϕi(t)dt for i =
1, 2, · · · , 8. For what values of the vector r = (r1 , r2 , · · · , r8 ) does the receiver now
decide that message m was transmitted? Give an expression for the error probability
PE obtained by the optimum receiver now.
Transmitter Optimization
Chapter 9
Signal energy considerations
therefore we only need to know the collection of signal vectors, the signal structure, and the
message probabilities to determine the average signal energy which is defined as
Eav = Σ_{m∈M} Pr{M = m} Esm = Σ_{m∈M} Pr{M = m} ‖sm‖² = E[‖S‖²]. (9.2)
Figure 9.1: Translation of a signal structure, moving the origin of the coordinate system to a.
We now have to find out how we should choose the translation vector a that minimizes
E av (a). It turns out that the best choice is
a = Σ_{m=1,|M|} Pr{M = m} sm = E[S]. (9.4)
This follows from considering an alternative translation vector b and evaluating the energy Eav(b) of the signal structure translated over b.
Figure 9.2: The two signal sets of (9.6) and (9.7) in waveform representation.
where equality in the third line follows from a = E[S]. Observe that E av (b) is minimized only
for b = a = E[S].
RESULT 9.1 This implies that to minimize the average signal energy we should choose the
center of gravity of the signal structure as the origin of the coordinate system.
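A quick numerical illustration of result 9.1, using an arbitrary signal structure and arbitrary message probabilities chosen only for this example: the average energy Eav(a) is smallest when the origin is moved to the center of gravity E[S].

```python
import numpy as np

rng = np.random.default_rng(3)
signals = rng.normal(size=(4, 2)) * 2.0          # an arbitrary signal structure: 4 vectors in 2 dimensions
probs = np.array([0.1, 0.2, 0.3, 0.4])           # message probabilities

def average_energy(a):
    # E_av(a) = sum over m of Pr{M=m} * ||s_m - a||^2, the energy after moving the origin to a.
    return np.sum(probs * np.sum((signals - a) ** 2, axis=1))

center_of_gravity = probs @ signals              # a = E[S], see (9.4)
print("E_av(0)    =", average_energy(np.zeros(2)))
print("E_av(E[S]) =", average_energy(center_of_gravity))
for _ in range(3):                               # any other translation b does worse
    b = center_of_gravity + rng.normal(size=2)
    print("E_av(b)    =", average_energy(b), "(some other b)")
```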
In what follows we will discuss some examples.
Figure 9.3: Vector representation of the orthogonal (a) and the antipodal signal set (b).
Figure 9.4: Probability of error for binary antipodal and orthogonal signaling as a function of Es/N0 in dB. It is assumed that Pr{M = 1} = Pr{M = 2} = 1/2. Observe the difference of 3 dB between both curves.
and a second signal set of two antipodal signals (see the bottom row sub-figures in figure 9.2)
s1(t) = √(2Es) sin(10πt),
s2(t) = −√(2Es) sin(10πt), also for 0 ≤ t < 1. (9.7)
Note that binary FSK (frequency-shift keying) is the same as orthogonal signaling, while binary
PSK (phase-shift keying) is identical to antipodal signaling.
The vector representations of the signal sets are
s1 = (√Es, 0),
s2 = (0, √Es), (9.8)
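The error-probability expressions for these two binary signal sets are not repeated here; the sketch below assumes the standard formulas PE = Q(√(Es/N0)) for the orthogonal set and PE = Q(√(2Es/N0)) for the antipodal set, which is consistent with the 3 dB difference noted in figure 9.4.

```python
import numpy as np
from math import erfc, sqrt

def Q(x):
    # Q-function expressed via the complementary error function.
    return 0.5 * erfc(x / sqrt(2.0))

for snr_db in (0, 3, 6, 9, 12):
    snr = 10 ** (snr_db / 10.0)                  # Es/N0
    pe_orth = Q(sqrt(snr))                       # binary orthogonal (e.g. FSK), distance sqrt(2 Es)
    pe_anti = Q(sqrt(2.0 * snr))                 # binary antipodal (e.g. PSK), distance 2 sqrt(Es)
    print(f"Es/N0 = {snr_db:2d} dB : orthogonal {pe_orth:.2e}, antipodal {pe_anti:.2e}")
# Antipodal signaling at Es/N0 performs like orthogonal signaling at Es/N0 + 3 dB.
```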
9.6 Exercises
1. Either of the two waveform sets illustrated in figures 9.5 and 9.6 may be used to commu-
nicate one of four equally likely messages over an additive white Gaussian noise channel.
[Figure 9.5: A set of four waveforms s1(t), · · · , s4(t) on 0 ≤ t ≤ 4 with amplitude √(Es/2).]
[Figure 9.6: A second set of four waveforms s1(t), · · · , s4(t) on 0 ≤ t ≤ 4 with amplitude √Es.]
[Figure 9.7: A transmitter produces (sm1, sm2); noise n1 and n2 is added, and an optimum receiver forms m̂ from r1 = sm1 + n1 and r2 = sm2 + n2.]
[Figure 9.8: A receiver that forms r1 + αr2 and feeds it to a threshold device to produce m̂.]
(a) Show that the optimum receiver can be realized as diagrammed in figure 9.8. Deter-
mine α, the optimum threshold setting, and the value m̂ for r1 + αr2 larger than the
threshold.
(b) Express the resulting PE in terms of Q(·).
(c) What is the average energy Eav of the signals? By translation of the signal structure
we can reduce this average energy without changing the error probability. Determine
the smallest possible average signal energy that can be obtained in this way.
(Exam Communication Principles, July 5, 2002.)
Chapter 10
Signaling for message sequences
10.1 Introduction
In the previous part we have considered transmission of a single randomly-chosen message
m ∈ M, over a waveform channel during the time-interval 0 ≤ t < T . The problem that was to
be solved there was to determine the optimum receiver, i.e. the receiver that minimizes the error
probability PE for a channel with additive white Gaussian noise. We have found the following
solution:
1. First form an orthonormal base for the set of signals.
2. The optimum receiver determines the finite-dimensional vector representation of the re-
ceived waveform, i.e. it projects the received waveform onto the orthonormal signal base.
3. Then using this vector representation of the received waveform, it determines the message
that has the (or a) largest a-posteriori probability. This is done by minimizing expression
(4.12) over all m ∈ M, hence by acting as optimum vector receiver.
In the present part we shall investigate the more practical situation where we have a con-
tinuous stream of messages that are to be transmitted over our additive white Gaussian noise
waveform channel.
[Figure: a sequence of messages m1 ∈ M, m2 ∈ M, m3 ∈ M, · · · , one transmitted in each interval of length T.]
Definition 10.2 For each signal sm(t) for m ∈ M the available energy is Es hence
Es ≥ ∫_0^T sm²(t)dt Joule. (10.1)
Therefore the available power is
Ps = Es/T Joule/second. (10.2)
Definition 10.3 The transmission rate R is defined as
R = log2|M| / T bit/second. (10.3)
Definition 10.4 Now the available energy per transmitted bit is defined as
Eb = Es / log2|M| Joule/bit. (10.4)
From the above definitions we can deduce for the available energy per transmitted bit that
Eb = Es T / (T log2|M|) = Ps/R, (10.5)
hence Ps = R Eb.
Our problem is now to determine the maximum rate at which we can communicate reli-
ably over a waveform channel when the available power is Ps . What are the signals that are to
be used to achieve this maximum rate? In the next sections we will first consider two extremal
situations, namely bit-by-bit signaling and block-orthogonal signaling.
[Figure 10.2: A pulse x(t) of duration τ with energy ∫_{−∞}^{∞} x²(t)dt = Eb, and a signal s(t) composed of shifted pulses ±x(t − (i − 1)τ).]
We can realize this by transmitting a signal s(t) that is composed out of K pulses x(t) that
are shifted in time (see figure 10.2). More precisely
s(t) = Σ_{i=1,K} si x(t − (i − 1)τ) with si = −1 if bi = 0 and si = +1 if bi = 1. (10.7)
and we have an orthonormal base. Hence the building-block waveforms are the time-shifts over multiples of τ of the normalized pulse x(t)/√Eb as is shown in figure 10.3. Now we can deter-
mine the signal structures that correspond to all the signals. For K = 1, 2, 3 we have shown the
signal structures for bit-by-bit signaling in figure 10.4. The structure is always a K -dimensional
hypercube.
To see what the decision regions look like, note first that
s1 = (−√Eb, −√Eb, · · · , −√Eb). (10.10)
[Figure 10.3: The building-block waveforms for bit-by-bit signaling: time-shifts over multiples of τ of the normalized pulse x(t)/√Eb.]
[Figure 10.4: The signal structures for bit-by-bit signaling with K = 1, 2, 3: a K-dimensional hypercube with edge length 2√Eb.]
Now for messages other than m = 1, similar arguments show that the decision regions are
separated by hyperplanes ri = 0 for i = 1, K .
To find an expression for PE note that the signal hypercube is symmetrical and assume that s1 was transmitted. No error occurs if ri = −√Eb + ni < 0 for all i = 1, K or, in other words, if
ni < √Eb for all i = 1, K, (10.12)
hence, noting that σ² = N0/2, we get
PC = (1 − Q(√(2Eb/N0)))^K. (10.13)
Therefore
PE = 1 − (1 − Q(√(2Eb/N0)))^K. (10.14)
Observe that to estimate bit bi for i = 1, K , an optimum receiver only needs to consider the
received signal ri in dimension i.
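Expression (10.14) is easy to evaluate. The following sketch, with an arbitrarily chosen but fixed Eb/N0, shows how PE tends to 1 as K = RT grows.

```python
import numpy as np
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

Eb_over_N0 = 4.0                                  # fixed, since Eb = Ps/R is fixed
p_bit = Q(sqrt(2.0 * Eb_over_N0))                 # per-bit error probability, see (10.13)
for K in (1, 10, 100, 1000, 10000):
    PE = 1.0 - (1.0 - p_bit) ** K                 # (10.14): error probability for a block of K bits
    print(f"K = RT = {K:6d} : PE = {PE:.4f}")
# PE approaches 1 as K (i.e. T) grows: bit-by-bit signaling is not made reliable by increasing T.
```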
Based on this expression for PE we can now investigate what we should do to improve the
reliability of our system.
First note that we should have K = RT ≥ 1. For K = 1 we get PE = Q(√(2Ps/(R N0))).
There are two conclusions that can be drawn now:
• For fixed average power Ps and fixed rate R the average error probability increases and
approaches 1 if we increase T . We cannot improve reliability by increasing T therefore.
• For fixed T the average error probability PE can only be decreased by increasing the power
Ps or by decreasing the rate R.
This seemed to be ”the end of the story” for communication engineers before Shannon presented
his ideas in [20] and [21].
² To see why, consider a signal sm for some m ≠ 1. Consider a dimension i for which smi = √Eb. Note that s1i = −√Eb, hence for ri < 0 we have (ri − smi)² = (ri − √Eb)² > (ri + √Eb)² = (ri − s1i)². For dimensions i with smi = −√Eb = s1i we get (ri − smi)² = (ri − s1i)². Taking together all dimensions i = 1, K we obtain ‖r − sm‖² > ‖r − s1‖² and therefore, as claimed, m̂ = 1 is chosen by an optimum receiver if ri < 0 for all i = 1, K.
where ϕ(t) has energy 1 and duration not more than τ with
τ = T/2^K. (10.17)
All signals within the block [0, T ) are orthogonal and have energy E s . That is why we call our
signaling method block-orthogonal.
i.e. we are willing to spend slightly more than N0 ln 2 Joule per transmitted bit of information.
Now we obtain from (8.12) that
PE ≤ 2 exp(−log2|M| [√((1 + ε)² ln 2) − √(ln 2)]²) = 2 exp(−log2|M| ε² ln 2). (10.19)
What does (10.18) imply for the rate R? Note that Eb = Ps/R (see (10.5)). This results in Ps/R = (1 + ε)² N0 ln 2, in other words
R = Ps / ((1 + ε)² N0 ln 2). (10.22)
Based on (10.20) and (10.22) we can now investigate what we should do to improve the reliability
of our system. This leads to the following result.
THEOREM 10.1 For an available average power Ps we can achieve rates R smaller than but
arbitrarily close to
C∞ = Ps / (N0 ln 2) bit/second (10.23)
N0 ln 2
while the error probability PE can be made arbitrarily small by increasing T . Observe that C∞ ,
the capacity, depends only on the available power Ps and spectral density N0 /2 of the noise.
This is a Shannon-type of result. The reliability can be increased not only by increasing the
power Ps or decreasing the rate R but also by increasing the “codeword-lengths” T . It is also
important to note that only rates up to the capacity C∞ can be achieved reliably. We will see
later that rates larger than C∞ are indeed not achievable. Therefore it is correct to use the term
capacity here.
One might get the feeling that this could finish our investigations. We have found the capacity
of the waveform channel with additive white Gaussian noise. In the next chapter we will see that
so far we have ignored an important property of signal sets: their dimensionality.
10.5 Exercises
1. Rewrite the bound in (8.12) on PE in terms of R, T and C∞. Consider
E(R) = lim_{T→∞} −(1/T) ln PE(T, R), (10.24)
for the best possible signaling methods with rate not smaller than R having signals with
duration T . The function E(R) of R is called the reliability function.
Now compute a lower bound on the reliability function E(R) and draw a plot of this lower
bound. It can be shown that E(R) is equal to the lower bound that we have just derived
(see Gallager [7]).
Chapter 11
Time, bandwidth, and dimensionality
SUMMARY: Here we first discuss the fact that, to obtain a small error probability PE ,
block-orthogonal signaling requires a huge number of dimensions per second. Then we
show that a bandwidth W can only accommodate roughly 2W dimensions per second.
Hence block-orthogonal signaling is only practical when the available bandwidth is very
large.
RESULT 11.1 (Dimensionality theorem) Let {φi(t), for i = 1, N} denote any set of orthogonal waveforms such that for all i = 1, N,
1. φi(t) ≡ 0 outside the time interval [−T/2, T/2], and
2. also
∫_{−W}^{W} |Φi(f)|² df ≥ (1 − ηW) ∫_{−∞}^{∞} |Φi(f)|² df (11.1)
for the spectrum Φi(f) of φi(t). The parameter W is called bandwidth (in Hz).
The theorem says that if we require almost all spectral energy of the waveforms to be in the
frequency range [−W, W ], the number of waveforms cannot be much more than roughly 2W T .
In other words the number of dimensions is not much more than 2W per second.
The “definition” of bandwidth in the dimensionality theorem may seem somewhat arbitrary, but what is important is that the number of dimensions grows not faster than linearly in T.
Note that instead of [0, T) as in previous chapters, we restrict ourselves here to the interval [−T/2, T/2]. The reason for this is that the analysis is easier this way.
Next we want to show the converse statement, i.e. that the number of dimensions N can grow
linearly with T . Therefore consider 2K + 1 orthogonal waveforms that are always zero except
for − T2 ≤ t ≤ T2 . In that case the waveforms are defined as
φ0(t) = 1
φ1c(t) = √2 cos(2πt/T) and φ1s(t) = √2 sin(2πt/T)
φ2c(t) = √2 cos(4πt/T) and φ2s(t) = √2 sin(4πt/T)
· · ·
φKc(t) = √2 cos(2Kπt/T) and φKs(t) = √2 sin(2Kπt/T). (11.3)
We now determine the spectra of all these waveforms.
1. The spectrum of φ0(t) is
Φ0(f) = ∫_{−T/2}^{T/2} exp(−j2πf t)dt = sin(πf T)/(πf). (11.4)
Note that this is the so-called sinc-function. In figure 11.1 the signal φ0(t) and the corresponding spectrum Φ0(f) are shown for T = 1.
Figure 11.1: The signal φ0(t) and the corresponding spectrum Φ0(f) for T = 1.
2. The spectrum of the cosine-waveform φkc (t) for a specified k can be determined by observ-
ing that
φkc(t) = φ0(t) √2 cos(2πkt/T) = φ0(t) (√2/2) [exp(j2πkt/T) + exp(−j2πkt/T)] (11.5)
In figure 11.2 the signal φ5c(t) and the corresponding spectrum Φ5c(f) are shown, again for T = 1.
We now want to find out how much energy is in certain frequency bands. E.g. MATLAB tells us that
∫_{−∞}^{∞} (sin(πf T)/(πf))² df = T and
∫_{−1/T}^{1/T} (sin(πf T)/(πf))² df = 0.9028 T, (11.7)
hence less than 10 % of the energy of the sinc-function is outside the frequency band [−1/T, 1/T].
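The numbers in (11.7) can be reproduced by simple numerical integration. In the sketch below the truncation frequency and the grid step are arbitrary choices; it approximates both integrals for T = 1.

```python
import numpy as np

T = 1.0
f = np.linspace(1e-9, 200.0, 2_000_000)               # positive frequencies; the spectrum is even in f
df = f[1] - f[0]
X2 = (np.sin(np.pi * f * T) / (np.pi * f)) ** 2        # squared modulus of the sinc spectrum

total  = 2.0 * np.sum(X2) * df                         # close to T (the total pulse energy)
inband = 2.0 * np.sum(X2[f <= 1.0 / T]) * df           # energy inside [-1/T, 1/T]
print("total  :", round(total, 4))                     # ~ 1.0
print("inband :", round(inband, 4))                    # ~ 0.9028, i.e. less than 10 % out of band
```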
Figure 11.2: The signal φ5c(t) and the corresponding spectrum Φ5c(f) for T = 1.
Numerical analysis shows that never more than 12 % of the energy of Φkc(f) is outside the frequency band [−(k + 1)/T, (k + 1)/T] for k = 1, 2, · · · . Similarly we can show that the spectrum of the sine-waveform φks(t) has never more than 5 % of its energy outside the frequency band [−(k + 1)/T, (k + 1)/T] for such k. For large k both percentages approach roughly 5 %.
The frequency band needed to accommodate all 2K + 1 waveforms is [−(K + 1)/T, (K + 1)/T]. Now suppose that the available bandwidth is W. Then, to have not more than 12 % out-of-band energy, K should satisfy
W ≥ (K + 1)/T. (11.8)
If, for a certain bandwidth W, we take the largest K satisfying (11.8), the number of orthogonal waveforms is N = 2K + 1 ≥ 2W T − 3.
RESULT 11.2 There are at least N = 2W T − 3 orthogonal waveforms over [−T/2, T/2] such that less than 12 % of their spectral energy is outside the frequency band [−W, W]. This implies that a fixed bandwidth W can accommodate 2W dimensions per second for large T.
The dimensionality theorem and the converse result show that block-orthogonal signaling has
a very unpleasant property that concerns bandwidth. This is demonstrated by the next example.
Example 11.1 For block-orthogonal signaling the number of required dimensions in an interval of T seconds is 2^{RT}. If the channel bandwidth is W, the number of available dimensions is roughly 2W T, hence the bandwidth W should be such that W ≥ 2^{RT}/(2T).
Now consider subsection 10.4.2 and take ε = 0.1. Then (see (10.22)) R = (100/121) C∞. To get an acceptable PE we have to take RT ≥ 100 (see (10.20)), hence
W ≥ 2^{RT}/(2T) = (2^{RT}/(2RT)) R ≥ (2^{100}/200) R. (11.10)
Even for very small values of the rate R this causes a big problem. We can also conclude that only very small spectral efficiencies R/W can be realized¹.
11.3 Remark
In this chapter we have studied building-block waveforms that are time-limited. As a conse-
quence these waveforms have a spectrum which is not frequency-limited. We can also investi-
gate waveforms that are frequency-limited. However this implies that these waveforms are not
time-limited anymore. Chapter 14 on pulse modulation deals with such building blocks.
11.4 Exercises
1. Consider the Gaussian pulse x(t) = (2πσ²)^{−1/2} exp(−t²/(2σ²)) and signals such as
sm(t) = Σ_{i=1,N} smi x(t − iτ), for m = 1, 2, · · · , |M|, (11.11)
constructed from successive τ-second translates of x(t). Constrain the inter-pulse interference by requiring that
∫_{−∞}^{∞} x(t − kτ)x(t − lτ)dt ≤ 0.05 ∫_{−∞}^{∞} x²(t)dt, for all k ≠ l, (11.12)
and constrain the signal bandwidth W by requiring that x(t) have no more than 10 % of its energy outside the frequency interval [−W, W]. Determine the largest possible value of the coefficient c in the equation N = cT W when N ≫ 1.
(Exercise 5.4 from Wozencraft and Jacobs [25].)
¹ Note however that e.g. for ε = 1 the required number of dimensions can be acceptable. However then the rate R is only one fourth of the capacity Ps/(N0 ln 2).
Chapter 12
Bandlimited transmission
SUMMARY: Here we determine the capacity of the bandlimited additive white Gaussian
noise waveform channel. We approach this problem in a geometrical way. The key idea is
random code generation (Shannon, 1948).
12.1 Introduction
In the previous chapter we have seen that for a waveform channel with bandwidth W , the num-
ber of available dimensions per second is roughly 2W . We have also seen that reliable block-
orthogonal signaling requires, already for very modest rates, many more dimensions per second.
The reason for this was that, for rates close to capacity, although the error probability decreases
exponentially with T , the number of necessary dimensions increases exponentially with T . On
the other hand bit-by-bit signaling requires as many dimensions as there are bits to be transmit-
ted, which is acceptable from the bandwidth perspective. But bit-by-bit signaling can only be made reliable by increasing the transmitter power Ps or by decreasing the rate R.
This raises the question whether we can achieve reliable transmission at certain rates R by
increasing T , when both the bandwidth W and available power Ps are fixed. The following
sections show that this is indeed possible. We will describe the results obtained by Shannon
[21]. His analysis has a strongly geometrical flavor. These early results may not be as strong as
possible but they certainly are very surprising and give the right intuition.
We will determine the capacity per dimension, not per unit of bandwidth since the definition
of bandwidth is somewhat arbitrary. We will work with vectors as we learned in the previous
part of these course notes. Note that all messages are equiprobable.
Definition 12.1 The energies of all signals are upper-bounded by E s which is defined to be the
number of dimensions N times E N , i.e.
‖sm‖² = Σ_{i=1,N} smi² ≤ Es = N EN, for all m ∈ M. (12.1)
[Figure: the normalized vectors in signal space; the received vector r′ = s′m + n′ is the sum of the signal vector s′m and the noise vector n′.]
It has advantages to consider here normalized versions of the signal, noise and received vectors. These normalized versions are defined as s′m = sm/√N, n′ = n/√N, and r′ = r/√N. The squared length ‖N′‖² of the normalized noise vector then has mean N0/2 and a variance proportional to 1/N. Observe that this variance approaches zero for N → ∞. Now Chebyshev's inequality can be used to show that the probability of observing a normalized noise vector with squared length smaller than N0/2 − ε or larger than N0/2 + ε for any fixed ε > 0 approaches zero for N → ∞.
‖r′‖² = ‖s′m + n′‖² = ‖s′m‖² + 2(s′m · n′) + ‖n′‖² = ‖s′m‖² + (2/N) Σ_{i=1,N} smi ni + ‖n′‖². (12.4)
Note that the term ‖s′m‖² is "non-random" for a given message m while the other two terms in (12.4) depend on the noise vector N and are therefore random variables. For the expected squared length of the random normalized received vector R′ we therefore obtain that
E[‖R′‖²] = ‖s′m‖² + (2/N) Σ_{i=1,N} smi E[Ni] + E[‖N′‖²] = ‖s′m‖² + N0/2 ≤(a) EN + N0/2. (12.5)
Here (a) follows from the fact that ‖s′m‖² = ‖sm‖²/N ≤ Es/N = EN by definition 12.1. It can be shown that also the variance of the squared length of R′ approaches zero for N → ∞. See also exercise 2 at the end of this chapter.
Figure 12.2: A hypersphere with radius r − ε concentric within a hypersphere of radius r; the difference is a shell with thickness ε.
RESULT 12.3 The ratio of the volume of the "interior" of a hypersphere in N dimensions and the hypersphere itself is
V_{r−ε}/V_r = ((r − ε)/r)^N (12.6)
which approaches zero for N → ∞. Consequently for N → ∞ the shell volume V_r − V_{r−ε} constitutes the entire volume V_r of the hypersphere, no matter how small the thickness ε > 0 of the shell is.
RESULT 12.4 For reliable transmission, the number of signals |M| can not be larger than the volume of the hypersphere containing the received vectors, divided by the volume of the noise hyperspheres, hence
|M| ≤ BN (EN + N0/2)^{N/2} / (BN (N0/2)^{N/2}) = ((EN + N0/2)/(N0/2))^{N/2}. (12.7)
Here BN is again the constant in the formula V_r = BN r^N for the volume V_r of an N-dimensional hypersphere with radius r.
We now have our upper bound on the number |M| of signal vectors. It should be noted that
in the book of Wozencraft and Jacobs [25] a more detailed proof can be found.
[Figure 12.3: The vectors s′m, n′, and r′ = s′m + n′. The "lens" formed by the intersection of the signal hypersphere of radius √EN and the hypersphere of radius ‖n′‖ around r′ is contained in a hypersphere of radius h.]
Consider figure 12.3. Suppose that the signal s′m for some fixed m ∈ M was actually sent. The noise vector n′ is added to the signal vector and hence the received vector turns out to be r′ = s′m + n′. Now an optimum receiver will decode the message m only if there are no other signals s′k, k ≠ m, inside a hypersphere with radius ‖n′‖ around r′. This is a consequence of result
4.1 that says that minimum Euclidean distance decoding is optimum if all messages are equally
likely.
(i) Note that result 12.3 implies that, with probability approaching one for N → ∞, the signal s′m was chosen on the surface of a sphere with radius √EN. This is a consequence of the selection procedure of the signal vectors which is uniform over the hypersphere with radius √EN. Therefore we may assume that ‖s′m‖² = EN. (ii) By result 12.2 the normalized received vector r′ is now, with probability approaching one, on the surface of a sphere with radius √(EN + N0/2) centered at the origin, and thus ‖r′‖² = EN + N0/2. (iii) Moreover the normalized noise vector n′ is on the surface of a sphere with radius √(N0/2) hence ‖n′‖² = N0/2 by the sphere-hardening argument (see result 12.1).
First we have to determine the probability that the signal corresponding to a fixed message k ≠ m is inside the sphere with center r′ and radius ‖n′‖. Therefore we need to know the volume of the "lens" in figure 12.3. Determining this volume is not so easy but we know that it is not
larger than the volume of a sphere with radius h, see figure 12.3, and center coinciding with the
center of the lens. This hypersphere contains the lens. Our next problem is therefore to find out
how large h is.
Observing the lengths of s′m, n′, and r′, we may conclude that the angle between the normalized signal vector s′m and the normalized noise vector n′ is π/2 (Pythagoras). Now we can use simple geometry to determine the length h:
h = √EN · √(N0/2) / √(EN + N0/2), (12.8)
and consequently
Vlens ≤ BN h^N = BN (EN (N0/2)/(EN + N0/2))^{N/2}. (12.9)
The probability that a signal sk for a specific k ≠ m was selected inside the lens is (by the uniform distribution over the hypersphere with radius √EN) now given by the volume of the lens divided by the volume of the sphere. This ratio can be bounded as follows
Vlens / (BN EN^{N/2}) ≤ ((N0/2)/(EN + N0/2))^{N/2}. (12.10)
By the union bound¹ the probability that any signal sk for k ≠ m is chosen inside the lens is at
most |M| − 1 times as large as the ratio (probability) considered in (12.10). Therefore we obtain
1 For the K events E 1 , E 2 , · · · , E K the probability Pr{E 1 ∪ E 2 ∪ · · · ∪ E K } ≤ Pr{E 1 } + Pr{E 2 } + · · · + Pr{E K }.
for the error probability averaged over the ensemble of signal sets
PEav ≤ |M| ((N0/2)/(EN + N0/2))^{N/2}. (12.11)
then
lim_{N→∞} PEav ≤ lim_{N→∞} 2^{−δN} = 0. (12.13)
This implies the existence of signal sets, one for each value of N , with |M| as in (12.12), for
which lim N →∞ PE = 0.
Note that relation (12.12) makes reliable transmission possible. Since better methods could
exist, it serves as a lower bound on |M| when reliable transmission is required.
THEOREM 12.6 (Shannon, 1949) Reliable communication is possible over a waveform channel, when the available energy per dimension is EN and the spectral density of the noise is N0/2, for all rates RN satisfying
R N < C N bit/dimension. (12.17)
Rates R N larger than C N are not realizable with reliable methods. The quantity C N is called the
capacity of the waveform channel in bit per dimension.
In the previous chapter we have seen that a waveform channel with bandwidth W can accommodate at most 2W dimensions per second. With 2W dimensions per second, the available energy per dimension is EN = Ps/(2W) if Ps is the available transmitter power. In that case the channel capacity in bit per dimension is
CN = (1/2) log2((Ps/(2W) + N0/2)/(N0/2)) = (1/2) log2(1 + Ps/(W N0)). (12.18)
Therefore we obtain the following result:
THEOREM 12.7 For a waveform channel with spectral noise density N0/2, frequency bandwidth W, and available transmitter power Ps, the capacity in bit per second is
C = 2W CN = W log2(1 + Ps/(W N0)) bit/second. (12.19)
Thus reliable communication is possible for rates R in bit per second smaller than C, while rates larger than C are not realizable with arbitrarily small PE.
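Theorem 12.7 is easily evaluated numerically. The sketch below uses arbitrary example values for the power and noise level; it computes C for several bandwidths and shows the saturation at Ps/(N0 ln 2) that is the subject of the next chapter.

```python
import numpy as np

def capacity(W, Ps, N0):
    # Shannon capacity (12.19) of the bandlimited AWGN waveform channel, in bit/second.
    return W * np.log2(1.0 + Ps / (W * N0))

Ps, N0 = 1.0, 1e-4                                   # an arbitrary power budget and noise level
for W in (100.0, 1_000.0, 10_000.0, 100_000.0):
    snr = Ps / (W * N0)
    print(f"W = {W:8.0f} Hz, S/N = {snr:8.2f} : C = {capacity(W, Ps, N0):10.1f} bit/s")
# Doubling W does not double C once S/N is small; C saturates at Ps/(N0 ln 2).
print("Ps/(N0 ln 2) =", Ps / (N0 * np.log(2.0)), "bit/s")
```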
12.8 Exercises
1. Let N be a random variable with density pN(n) = (1/√(2πσ²)) exp(−n²/(2σ²)). Show that E[N⁴] = 3σ⁴.
2. Show that the variance of the squared length of the normalized received vector R 0 ap-
proaches zero for N → ∞.
[Figure: a channel in which the transmitted signal reaches the receiver over two branches, each with its own additive white Gaussian noise nw,a(t) and nw,b(t).]
Wideband transmission
SUMMARY: We determine the capacity of the additive white Gaussian noise waveform
channel in the case of unlimited bandwidth resources. The result that is obtained in this
way shows that block-orthogonal signaling is optimal.
13.1 Introduction
We have seen that with block-orthogonal signaling we can achieve reliable communication at
rates
R < Ps/(N0 ln 2) bit/second, (13.1)
if we only had access to as much bandwidth as we would want. We will prove in this chapter
that rates larger than Ps /(N0 ln 2) are not realizable with reliable signaling schemes. Therefore
we may indeed call Ps /(N0 ln 2) the capacity of the wideband waveform channel.
RESULT 13.1 The capacity C∞ of the wideband waveform channel with additive white Gaus-
sian noise with power density N0 /2 for −∞ < f < ∞, when the transmitter power is Ps , follows
Figure 13.1: The ratio ln(1 + S/N)/(S/N) and ln(1 + S/N) as a function of S/N.
from:
C∞ = lim_{W→∞} W log2(1 + Ps/(W N0)) = lim_{W→∞} (Ps/(N0 ln 2)) · [ln(1 + Ps/(W N0)) / (Ps/(W N0))] = Ps/(N0 ln 2). (13.3)
Here S/N is the signal-to-noise ratio of the waveform channel. Note that the total noise power
is equal to the frequency band 2W times the power spectral density N0 /2 of the noise. We have
plotted the ratio ln(1 + S/N )/S/N in figure 13.1.
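The limit in (13.3) can also be checked numerically. In the sketch below Ps and N0 are arbitrary; it tabulates ln(1 + S/N)/(S/N) and C/C∞ for increasing W.

```python
import numpy as np

Ps, N0 = 1.0, 1.0                                    # arbitrary units; only the ratio Ps/(W N0) matters
C_inf = Ps / (N0 * np.log(2.0))
for W in (1.0, 10.0, 100.0, 1e4, 1e6):
    x = Ps / (W * N0)                                # the signal-to-noise ratio S/N
    C = W * np.log2(1.0 + x)
    print(f"W = {W:10.0f} : ln(1+S/N)/(S/N) = {np.log1p(x)/x:.6f},  C/C_inf = {C/C_inf:.6f}")
# Both ratios tend to 1 as W grows, in agreement with (13.3) and figure 13.1.
```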
RESULT 13.2 If we note that C = W log2(1 + S/N) then we can distinguish two cases:
C ≈ C∞ if S/N ≪ 1, and C ≈ W log2(S/N) if S/N ≫ 1. (13.5)
The case S/N ≪ 1 is called the power-limited case. There is enough bandwidth. When on the other hand S/N ≫ 1 we speak about bandwidth-limited channels. Increasing the power by a factor of two does not double the capacity but increases it only by one bit per second per Hz.
Finally we give some other ways to express the signal-to-noise ratio (under the assumption
that the number of dimensions per second is exactly 2W ):
S/N = Ps/(W N0) = EN/(N0/2) = 2RN Eb/N0 = (R/W)(Eb/N0). (13.6)
13.4 Exercises
1. To obtain result 12.7 we have assumed that the number of dimensions that a channel with
bandwidth W can accommodate per second is 2W . What would happen with the wideband
capacity if instead the channel would give us αW dimensions per second?
2. Find out whether a telephone line channel (W = 3400Hz, C = 34000 bit/sec) is power- or
bandwidth-limited by determining its S/N . Also determine E b /N0 and R N .
3. For a fixed ratio E b /N0 , for some values of R/W reliable transmission is possible, for
other values of R/W this is not possible. Therefore the E b /N0 × R/W - plane can be
divided in a region in which reliable transmission is possible and a region where this is
not possible. Find out what function of R/W describes the boundary between these two
regions.
(The ratio R/W is sometimes called the bandwidth efficiency. Note that R/W = 2R N
where R N is the rate per dimension. Hence a similar partitioning is possible for the
E b /N0 × R N plane.)
Part IV
Channel Models
Chapter 14
Pulse amplitude modulation
SUMMARY: In this chapter we discuss serial pulse amplitude modulation. This method
provides us with a new channel dimension every T seconds if we choose the pulse according
to the so-called Nyquist criterion. The Nyquist criterion implies that the required band-
width W is larger than 1/(2T ). Multi-pulse transmission is also considered in this chapter.
Just as in the single-pulse case a bandwidth of 1/(2T ) Hz per used pulse is required. This
corresponds again to 2W = 1/T dimensions per second.
14.1 Introduction
[Figure 14.1: Serial pulse transmission: impulses with amplitudes a0, a1, · · · , aK−1 excite a filter with impulse response p(t); white Gaussian noise nw(t) is added to the resulting signal sa(t), giving the received waveform r(t). Also shown are a triangular pulse p(t) on [0, 2T] and an example signal sa(t).]
Here we will investigate whether it is possible to get a new dimension every 1/(2W) seconds. We allow the building blocks to have a non-finite duration. Moreover all these building-block waveforms are time shifts of a "pulse". The subject of this chapter is therefore serial pulse transmission.
Consider figure 14.1. We there assume that the transmitter sends a signal s(t) that consists
of amplitude-modulated time shifts of a pulse p(t) by an integer multiple k of the so-called
modulation interval T , hence
sa(t) = Σ_{k=0,K−1} ak p(t − kT). (14.1)
The vector of amplitudes a = (a0 , a1 , · · · , a K −1 ) consists of symbols ak , k = 0, K − 1 taking
values in the alphabet A. We call this modulation method serial pulse-amplitude modulation
(PAM).
where H(f) = P(f)P∗(f) = ‖P(f)‖² is the Fourier transform of h(t) = p(t) ∗ p(−t). Note that ‖P(f)‖ is the modulus of P(f). Moreover Z(f) is called the 1/T-aliased spectrum of H(f).
Later in this section we will give the proof of this theorem. First we will discuss it however
and consider an important consequence of it.
Figure 14.2: The spectrum H(f) = ‖P(f)‖² corresponding to a pulse p(t) that does not satisfy the Nyquist criterion.
Figure 14.3: The ideally bandlimited spectrum H(f) = ‖P(f)‖². Note that P(f) is a spectrum with the smallest possible bandwidth satisfying the Nyquist criterion.
14.2.2 Discussion
• Since p(t) is a real signal, the real part of its spectrum P( f ) is even in f , and the imaginary
part of this spectrum is odd. Therefore the modulus kP( f )k of P( f ) is an even function
of the frequency f .
• If the bandwidth W of the pulse p(t) is strictly smaller than 1/(2T ) (see figure 14.2), then
the Nyquist criterion, which is based on H ( f ) = kP( f )k2 , can not be satisfied. Thus
no pulse p(t) that satisfies a bandwidth-W constraint, can lead to orthogonal signaling if
W < 1/(2T ).
• The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion is 1/(2T ).
The ”basic” pulse with bandwidth 1/(2T ) for which the Nyquist criterion holds has a so-
called ideally bandlimited spectrum, which is given by
P(f) = √T for |f| < 1/(2T), P(f) = 0 for |f| > 1/(2T), (14.4)
and P(f) = √(T/2) for |f| = 1/(2T). The corresponding H(f) = ‖P(f)‖² and 1/T-aliased spectrum Z(f) can be found in figure 14.3.
The basic pulse p(t) that corresponds to the ideally bandlimited spectrum is shown in figure 14.4. It is the well-known sinc-pulse
p(t) = (1/√T) · sin(πt/T)/(πt/T). (14.5)
Figure 14.4: The sinc-pulse that corresponds to T = 1, i.e. p(t) = sin(πt)/(πt).
Figure 14.5: A spectrum H(f) with excess bandwidth that satisfies the Nyquist criterion.
• A pulse with a bandwidth larger than 1/2T can also satisfy the Nyquist criterion as can
be seen in figure 14.5. The so-called excess bandwidth can be larger than 1/2T . Square-
root raised-cosine pulses p(t) have a spectrum H ( f ) = |P( f )|2 that satisfies the Nyquist
criterion and an excess bandwidth that can be controlled.
RESULT 14.2 The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion is W = 1/(2T). The sinc-pulse p(t) = (1/√T) · sin(πt/T)/(πt/T) has this property. Note that this way of serial pulse transmission leads to exactly 2W dimensions per second.
Moreover observe that unlike before in chapter 11, our pulses and signals are not time-limited!
We now break up the integral in parts, a part for each integer m, and obtain
h(kT) = Σ_{m=−∞}^{∞} ∫_{(2m−1)/(2T)}^{(2m+1)/(2T)} H(f) exp(j2πf kT) df
      = Σ_{m=−∞}^{∞} ∫_{−1/(2T)}^{1/(2T)} H(f + m/T) exp(j2πf kT) df
      = ∫_{−1/(2T)}^{1/(2T)} [Σ_{m=−∞}^{∞} H(f + m/T)] exp(j2πf kT) df
      = ∫_{−1/(2T)}^{1/(2T)} Z(f) exp(j2πf kT) df (14.7)
Observe that Z ( f ) is a periodic function in f with period 1/T . Therefore it can be expanded in
terms of its Fourier series coefficients · · · , z −1 , z 0 , z 1 , z 2 , · · · as
Z(f) = Σ_{k=−∞}^{∞} zk exp(j2πk f T) (14.9)
where
zk = T ∫_{−1/(2T)}^{1/(2T)} Z(f) exp(−j2πf kT) df. (14.10)
If we now combine (14.7) and (14.10) we obtain that
T h(−kT ) = z k , (14.11)
for all integers k. Condition (14.3) now tells us that only z 0 = T and all other z k are zero. This
implies that
Z ( f ) = T, (14.12)
or equivalently
Σ_{m=−∞}^{∞} H(f + m/T) = T. (14.13)
□
Figure 14.6: The optimum receiver front-end for detection of serially transmitted orthonormal pulses: a filter with impulse response p(Tp − t), sampled at t = Tp + kT, produces rk.
which is what the optimum receiver should determine, i.e. the correlation of r (t) with the pulse
p(t − kT ). This leads to the very simple receiver structure shown in figure 14.6. Processing the
samples rk, k = 1, K, should be done in the usual way.
When there is no noise r(t) = Σ_{k′=0,K−1} ak′ p(t − k′T). Then at time t = Tp + kT we see at the filter output
rk = u(Tp + kT) = ∫_{−∞}^{∞} r(α) p(α − kT)dα = Σ_{k′=0,K−1} ak′ ∫_{−∞}^{∞} p(α − k′T) p(α − kT)dα = ak, (14.16)
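The orthonormality exploited in (14.16) can be illustrated numerically. In the sketch below the time grid, the truncation of the pulses, and the amplitude sequence are arbitrary choices; it builds a noiseless PAM signal from the sinc-pulse (14.5) and recovers the amplitudes by correlation, as in figure 14.6.

```python
import numpy as np

T = 1.0
dt = T / 50.0
t = np.arange(-30.0 * T, 30.0 * T, dt)                 # a long grid so that pulse truncation is mild

def p(t):
    # The sinc-pulse of (14.5); np.sinc(x) = sin(pi x)/(pi x) handles the point t = 0.
    return (1.0 / np.sqrt(T)) * np.sinc(t / T)

a = np.array([+1.0, -1.0, -1.0, +1.0, -1.0])            # amplitudes a_0, ..., a_{K-1}
s = sum(a_k * p(t - k * T) for k, a_k in enumerate(a))  # serial PAM signal, see (14.1)

# Receiver front end: r_k = integral of s(t) * p(t - kT) dt for every k.
r = np.array([np.sum(s * p(t - k * T)) * dt for k in range(len(a))])
print(np.round(r, 3))                                   # close to the transmitted amplitudes, see (14.16)
```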
for akj ∈ A. If we want all pulses and their time-shifts to form an orthonormal basis, then the time-shifts of all pulses should be orthogonal to the pulses pj(t), j = 1, J. In other words, we
require the J pulses to satisfy
Z ∞
0 1 if j = j 0 and k = k 0 ,
p j (t − kT ) p j 0 (t − k T )dt = (14.18)
−∞ 0 elsewhere,
for all j = 1, J, j′ = 1, J, and all integer k and k′. A set of pulses pj(t) for j = 1, 2, · · · , J that satisfies this restriction is
pj(t) = (1/√T) · [sin(πt/(2T))/(πt/(2T))] · cos((2j − 1)πt/(2T)). (14.19)
For J = 4, and assuming that T = 1, these pulses and their spectra are shown in figure 14.7.
Observe that in this example the total bandwidth of the J pulses is J/(2T ).
RESULT 14.3 The smallest possible bandwidth that can be achieved for orthogonal multi-pulse
signaling with J pulses each T seconds is J/(2T ). Note therefore that also bandwidth-efficient
serial multi-pulse transmission leads to exactly 2W dimensions per second.
Proof: (Lee and Messerschmitt [13], p. 266) Just like in the proof of theorem 14.1 we can show
that the pulse-spectra P j ( f ) for j = 1, J must satisfy
(1/T) Σ_{m=−∞}^{∞} Pj(f + m/T) P*j′(f + m/T) = δjj′. (14.20)
Now fix a frequency f ∈ [−1/(2T), +1/(2T)) and define for each j = 1, J the vector
Pj = (· · · , Pj(f − 1/T), Pj(f), Pj(f + 1/T), Pj(f + 2/T), · · · ). (14.21)
Here P j ( f +m/T ) is the component at position m. With this definition condition (14.20) implies
that the vectors P j , j = 1, J are orthogonal. Assume for a moment that there are less than J
positions m for which at least one component P j ( f + m/T ) for some j = 1, J is non-zero. This
would imply that the J vectors P j , j = 1, J would be dependent. Contradiction!
Thus for each frequency interval d f ∈ [−1/(2T ), 1/(2T )) we may conclude that the J pulses
”fill” at least J disjoint intervals of size d f of the frequency spectrum. In total the J pulses fill a
part J/T of the spectrum. The minimally occupied bandwidth is therefore J/(2T ). 2
Note that the optimum receiver for multi-pulse transmission can be implemented with J
matched filters that are sampled each T seconds.
Figure 14.7: Time domain and frequency domain plots of the pulses in (14.19) for j = 1, 2, 3, 4 and T = 1.
14.4 Exercises
1. Determine the spectra of the pulses p j (t) for j = 1, 2, · · · , J defined in (14.19). Show
that these pulses satisfy (14.18). Determine the total bandwidth of these J pulses.
Chapter 15
Bandpass channels
SUMMARY: Here we consider transmission over an ideal bandpass channel with band-
width 2W . We show that building-block waveforms for transmission over a bandpass chan-
nel can be constructed from baseband building-block waveforms having a bandwidth not
larger than W . A first set of orthonormal baseband building-block waveforms can be mod-
ulated on a cosine carrier, a second orthonormal set on a sine carrier, and the resulting
bandpass waveforms are all orthogonal to each other. This technique is called quadrature
multiplexing. We determine the optimum receiver for this signaling method and the capac-
ity of the bandpass channel. We finally discuss quadrature amplitude modulation (QAM).
15.1 Introduction
So far we have only considered baseband communication. In baseband communication the chan-
nel allows signaling only in the frequency band [−W, W ]. But what should we do if the channel
only accepts signals with a spectrum in the frequency band ±[ f 0 − W, f 0 + W ]? In the present
chapter we will describe a method that can be used for signaling over such a channel. We start
by discussing this so-called bandpass channel. First we give a definition of the ideal bandpass
channel.
Definition 15.1 Consider figure 15.1. The input signal s(t) to the bandpass channel is first sent
through a filter W0 ( f ). This filter W0 ( f ) is an ideal bandpass filter with bandwidth 2W and
Figure 15.1: Ideal bandpass channel with additive white Gaussian noise.
The noise process Nw(t) is assumed to be stationary, Gaussian, zero-mean, and white. The noise has power spectral density function
SNw(f) = N0/2 for −∞ < f < ∞, (15.2)
and autocorrelation function RNw(t, s) = E[Nw(t)Nw(s)] = (N0/2) δ(t − s). The noise-waveform nw(t) is added to the output of the bandpass filter W0(f) which results in r(t), the output of the bandpass channel.
In the next section we will describe a signaling method for a bandpass channel. This method
is called quadrature multiplexing.
[Figure: quadrature multiplexing; sc(t) is multiplied by √2 cos 2πf0t, ss(t) by √2 sin 2πf0t, and the two products are added to form s(t).]
write:
sm(t) = smc(t) √2 cos(2πf0t) + sms(t) √2 sin(2πf0t)
      = Σ_{i=1,Nc} s^c_{mi} φi(t) √2 cos(2πf0t) + Σ_{j=1,Ns} s^s_{mj} ψj(t) √2 sin(2πf0t). (15.4)
Since both sets {φi(t), i = 1, Nc} and {ψj(t), j = 1, Ns} form an orthonormal base, using the Parseval relation, we have for all i and j and all i′ and j′ that
δi,i′ = ∫_{−∞}^{∞} φi(t)φi′(t)dt = ∫_{−∞}^{∞} Φi(f)Φ*i′(f)df,
δj,j′ = ∫_{−∞}^{∞} ψj(t)ψj′(t)dt = ∫_{−∞}^{∞} Ψj(f)Ψ*j′(f)df. (15.6)
1 Note
that we do not assume here that the baseband building blocks are limited in time as in the chapters on
waveform communication.
Figure 15.3: Spectra after modulation with √2 cos 2πf0t and √2 sin 2πf0t. For simplicity it is assumed that the baseband spectra are real.
Now we are ready to determine the spectrum Φ_{c,i}(f) of a cosine or in-phase building-block waveform
φ_{c,i}(t) = φ_i(t) √2 cos 2π f_0 t   (15.7)
and the spectrum Ψ_{s,j}(f) of a sine or quadrature building-block waveform
ψ_{s,j}(t) = ψ_j(t) √2 sin 2π f_0 t.   (15.8)
We express these spectra in terms of the baseband spectra Φ_i(f) and Ψ_j(f) (see also figure 15.3).
Φ_{c,i}(f) = ∫_{−∞}^{∞} φ_i(t) √2 cos(2π f_0 t) exp(−j2π f t) dt
           = (1/√2) (Φ_i(f − f_0) + Φ_i(f + f_0)),
Ψ_{s,j}(f) = ∫_{−∞}^{∞} ψ_j(t) √2 sin(2π f_0 t) exp(−j2π f t) dt
           = (1/(j√2)) (Ψ_j(f − f_0) − Ψ_j(f + f_0)).   (15.9)
Note (see again figure 15.3) that the spectra Φ_{c,i}(f) and Ψ_{s,j}(f) are zero outside the passband ±[f_0 − W, f_0 + W].
We now first consider the correlation between φ_{c,i}(t) and φ_{c,i′}(t) for all i and i′. This leads to
∫_{−∞}^{∞} φ_{c,i}(t) φ_{c,i′}(t) dt = ∫_{−∞}^{∞} Φ_{c,i}(f) Φ*_{c,i′}(f) df
 = (1/2) ∫_{−∞}^{∞} [Φ_i(f − f_0) + Φ_i(f + f_0)][Φ*_{i′}(f − f_0) + Φ*_{i′}(f + f_0)] df
 = (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Φ*_{i′}(f + f_0) df
   + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Φ*_{i′}(f + f_0) df
 (a)= (1/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Φ*_{i′}(f − f_0) df + (1/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Φ*_{i′}(f + f_0) df
 (b)= (1/2) δ_{i,i′} + (1/2) δ_{i,i′} = δ_{i,i′}.   (15.10)
Here equality (a) follows from the fact that f_0 > W and the observation that for all i = 1, N_c the spectra Φ_i(f) ≡ 0 for |f| ≥ W. Therefore the cross-terms are zero, see also figure 15.3. Equality (b) follows from (15.6), which holds since the baseband building blocks φ_i(t) for i = 1, N_c form an orthonormal base.
In a similar way we can show that for all j and j′
∫_{−∞}^{∞} ψ_{s,j}(t) ψ_{s,j′}(t) dt = δ_{j,j′}.   (15.11)
What remains to be investigated is the correlation between all in-phase (cosine) building-block waveforms φ_{c,i}(t) for i = 1, N_c and all quadrature (sine) building-block waveforms ψ_{s,j}(t) for j = 1, N_s:
∫_{−∞}^{∞} φ_{c,i}(t) ψ_{s,j}(t) dt = ∫_{−∞}^{∞} Φ_{c,i}(f) Ψ*_{s,j}(f) df
 = (j/2) ∫_{−∞}^{∞} [Φ_i(f − f_0) + Φ_i(f + f_0)][Ψ*_j(f − f_0) − Ψ*_j(f + f_0)] df
 = (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Ψ*_j(f + f_0) df
   + (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Ψ*_j(f + f_0) df
 (c)= (j/2) ∫_{−∞}^{∞} Φ_i(f − f_0)Ψ*_j(f − f_0) df − (j/2) ∫_{−∞}^{∞} Φ_i(f + f_0)Ψ*_j(f + f_0) df
 = 0.   (15.12)
Here equality (c) follows from the fact that f_0 > W and the observation that for all i = 1, N_c the spectra Φ_i(f) ≡ 0 for |f| ≥ W and for all j = 1, N_s the spectra Ψ_j(f) ≡ 0 for |f| ≥ W. Therefore the cross-terms are zero. See figure 15.3.
RESULT 15.1 We have shown that all in-phase building-block waveforms φc,i (t) for i = 1, Nc
and all quadrature building-block waveforms ψs, j (t) for j = 1, Ns together form an orthonor-
mal base.
Moreover the spectra Φ_{c,i}(f) and Ψ_{s,j}(f) of all these building-block waveforms are zero outside the passband ±[f_0 − W, f_0 + W]. Therefore none of these building-block waveforms is hindered by the bandpass filter W_0(f) when they are sent over our bandpass channel.
It is important to note that the baseband building-block waveforms φ_i(t) and ψ_j(t), for any i and j, need not be orthogonal. Multiplication of these baseband waveforms by √2 cos 2π f_0 t resp. √2 sin 2π f_0 t results in the orthogonality of the bandpass building-block waveforms φ_{c,i}(t) and ψ_{s,j}(t).
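The orthogonality established above (result 15.1) is easy to check numerically. The following Python sketch, not part of the original notes, uses two illustrative band-limited baseband pulses that are deliberately correlated and an assumed carrier frequency f_0 far above their bandwidth:

```python
import numpy as np

# Correlated baseband pulses still give orthonormal bandpass waveforms after
# modulation onto sqrt(2)*cos and sqrt(2)*sin carriers, provided f0 > W.
fs, T = 10_000.0, 1.0                  # sampling rate and observation interval (assumed)
t = np.arange(0.0, T, 1.0 / fs)
f0 = 200.0                             # carrier frequency, far above the pulse bandwidth

phi = np.cos(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
psi = np.cos(2 * np.pi * 5 * t) - 0.3 * np.cos(2 * np.pi * 17 * t)
phi /= np.sqrt(np.trapz(phi**2, t))    # normalize to unit energy
psi /= np.sqrt(np.trapz(psi**2, t))

phi_c = phi * np.sqrt(2) * np.cos(2 * np.pi * f0 * t)   # in-phase building block
psi_s = psi * np.sqrt(2) * np.sin(2 * np.pi * f0 * t)   # quadrature building block

print(np.trapz(phi * psi, t))          # clearly nonzero: baseband pulses are correlated
print(np.trapz(phi_c**2, t))           # ~1
print(np.trapz(psi_s**2, t))           # ~1
print(np.trapz(phi_c * psi_s, t))      # ~0: the bandpass waveforms are orthogonal
```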
Also note that all the bandpass building-block waveforms have unit energy. This follows from (15.10) and (15.11). Therefore the energy of s_m(t) is equal to the squared length of the vector
s_m = (s^c_m, s^s_m) = (s^c_{m1}, s^c_{m2}, · · · , s^c_{mN_c}, s^s_{m1}, s^s_{m2}, · · · , s^s_{mN_s}).   (15.13)
After having determined the received vector r = (r^c, r^s) = (r^c_1, r^c_2, · · · , r^c_{N_c}, r^s_1, r^s_2, · · · , r^s_{N_s}) we can form the dot-products (r · s_m), where the signal vector s_m = (s^c_m, s^s_m) = (s^c_{m1}, s^c_{m2}, · · · , s^c_{mN_c}, s^s_{m1}, s^s_{m2}, · · · , s^s_{mN_s}):
(r · s_m) = Σ_{i=1,N_c} r^c_i s^c_{mi} + Σ_{j=1,N_s} r^s_j s^s_{mj}.   (15.15)
[Figure 15.4: Optimum receiver for quadrature-multiplexed signals transmitted over a bandpass channel: r(t) is multiplied by √2 cos 2π f_0 t and correlated with φ_1(t), · · · , φ_{N_c}(t) to obtain r^c_1, · · · , r^c_{N_c}, and multiplied by √2 sin 2π f_0 t and correlated with ψ_1(t), · · · , ψ_{N_s}(t) to obtain r^s_1, · · · , r^s_{N_s}; a weighting matrix then forms (r · s_m) + c_m for every m ∈ M and the message with the largest value is selected as m̂.]
Adding the constants c_m is now the last step before selecting m̂ as the message m that maximizes (r · s_m) + c_m, where (see result (6.1))
c_m = (N_0/2) ln Pr{M = m} − ‖s_m‖²/2,   (15.16)
with
‖s_m‖² = ‖s^c_m‖² + ‖s^s_m‖² = Σ_{i=1,N_c} (s^c_{mi})² + Σ_{j=1,N_s} (s^s_{mj})²
       = ∫_{−∞}^{∞} (s^c_m(t))² dt + ∫_{−∞}^{∞} (s^s_m(t))² dt.   (15.17)
Note that ‖s_m‖² = ∫_{−∞}^{∞} s_m²(t) dt.
All this leads to the receiver implementation shown in figure 15.4.
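As an aside, the decision rule of (15.15)-(15.16) fits in a few lines of code. The sketch below uses illustrative signal vectors and priors (not from the notes) and assumes the receiver has already computed the vector r:

```python
import numpy as np

# Sketch of the decision rule: pick the message m maximizing (r . s_m) + c_m.
rng = np.random.default_rng(0)
N0 = 1.0
S = rng.normal(size=(4, 6))                  # |M| = 4 signal vectors of dimension N_c + N_s = 6
priors = np.array([0.4, 0.3, 0.2, 0.1])      # Pr{M = m}

m_true = 2
r = S[m_true] + rng.normal(scale=np.sqrt(N0 / 2), size=6)    # r = s_m + noise

c = (N0 / 2) * np.log(priors) - 0.5 * np.sum(S**2, axis=1)   # constants c_m of (15.16)
m_hat = int(np.argmax(S @ r + c))                            # (r . s_m) + c_m
print("decoded message:", m_hat)
```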
RESULT 15.2 Therefore the capacity in bit per dimension of the bandpass channel with bandwidth ±[f_0 − W, f_0 + W] is
C_N = (1/2) log₂(1 + P_s/(2N_0W)),   (15.21)
if the transmitter power is P_s and the noise spectral density is N_0/2 for all f. Consequently the capacity per second is
C = 4W C_N = 2W log₂(1 + P_s/(2N_0W)) bit/second.   (15.22)
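For concreteness, the capacity expressions (15.21)-(15.22) can be evaluated directly; the numbers below are purely illustrative:

```python
import numpy as np

Ps, N0, W = 1e-3, 1e-9, 1e6          # transmit power, noise density N0, bandwidth parameter W (assumed)
snr = Ps / (2 * N0 * W)              # Ps / (2 N0 W)
C_N = 0.5 * np.log2(1 + snr)         # bit per dimension, (15.21)
C = 4 * W * C_N                      # = 2 W log2(1 + snr) bit/second, (15.22)
print(f"C_N = {C_N:.3f} bit/dim, C = {C / 1e6:.3f} Mbit/s")
```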
[Figure 15.5: Two QAM signal structures, (a) 16-QAM, (b) 32-QAM. Note that 32-QAM is not square!]
15.9 Exercises
1. A signal structure must contain M signal points. These points are to be chosen on an integer grid.
Now consider a hypercube and a hypersphere in N dimensions. Both have their center in
the origin of the coordinate system.
We can either choose our signal structure as all the grid points inside the sphere or all the
grid points inside the cube. Assume that the dimensions of the sphere and cube are such
that they contain equally many signal points.
Let M ≫ 1 so that the signal points can be assumed to be uniformly distributed over the sphere and the cube.
(a) Let N = 2. Find the ratio between the average signal energies of the sphere-structure
and that of the cube-structure. What is the difference in dB?
(b) Let N be even. What is then the ratio for N → ∞? In dB?
Hint: The volume of an N-dimensional sphere with radius R is π^{N/2} R^N/(N/2)! for even N.
Chapter 16
Random carrier-phase
SUMMARY: Here we consider carrier modulation under the assumption that the carrier-
phase is not known to the receiver. Only one baseband signal s b (t) will be transmitted. First
we determine the optimum receiver for this case. Then we consider the implementation of
the optimum receiver when all signals have equal energy. We investigate envelope detection
and work out an example in which one out of two orthogonal signals is transmitted.
16.1 Introduction
Because of oscillator drift or differences in propagation time it is not always reasonable to assume that the receiver knows the phase of the wave that is used as carrier of the message, as would be required for quadrature multiplexing. In that case coherent demodulation is not possible.
To investigate this situation we assume that the transmitter modulates with the baseband signal¹ s^b(t) a wave with a random phase θ, i.e.
s(t) = s^b(t) √2 cos(2π f_0 t − θ),   (16.1)
where the random phase Θ is assumed to be uniform over [0, 2π), hence
p_Θ(θ) = 1/(2π), for 0 ≤ θ < 2π.   (16.2)
More precisely, for each message m ∈ M, let the signal s^b_m(t) correspond to the vector s_m = (s_{m1}, s_{m2}, · · · , s_{mN}), hence
s^b_m(t) = Σ_{i=1,N} s_{mi} φ_i(t).   (16.3)
We assume as before that the spectrum of the signals s^b_m(t) for all m ∈ M is zero outside [−W, W]. Note that the same holds for the building-block waveforms φ_i(t) for i = 1, N. This
¹ Note that there is only one baseband signal involved here. The type of modulation is called double sideband suppressed carrier (DSB-SC) modulation.
is because the building-block waveforms are linear combinations of the signals. If we now use the trigonometric identity cos(a − b) = cos a cos b + sin a sin b we obtain
s_m(t) = s^b_m(t) √2 cos(2π f_0 t − θ)
       = s^b_m(t) cos θ √2 cos(2π f_0 t) + s^b_m(t) sin θ √2 sin(2π f_0 t)
       = Σ_{i=1,N} s_{mi} cos θ φ_i(t) √2 cos(2π f_0 t) + Σ_{i=1,N} s_{mi} sin θ φ_i(t) √2 sin(2π f_0 t).   (16.4)
If we now turn to the vector approach we can say that, although θ is unknown, this is equivalent
to quadrature multiplexing where vector-signals s^c_m and s^s_m are transmitted satisfying
s^c_m = s_m cos θ,
s^s_m = s_m sin θ.   (16.5)
After receiving the signal r(t) = s_m(t) + n_w(t) an optimum receiver forms the vectors r^c and r^s for which
r^c = s^c_m + n^c = s_m cos θ + n^c,
r^s = s^s_m + n^s = s_m sin θ + n^s,   (16.6)
where both n c and n s are independent Gaussian vectors with independent components all having
mean zero and variance N0 /2. In the next section we will further determine the optimum receiver
for this situation.
and note that both ‖r^c‖² and ‖r^s‖² do not depend on the message m and can be ignored. Therefore the relevant part of the decision variable is
∫_θ p_Θ(θ) exp( (2/N_0)[(r^c · s_m) cos θ + (r^s · s_m) sin θ] ) dθ · exp(−E_m/N_0),   (16.10)
where E_m = ‖s_m‖². Note that we have to maximize the decision variable over all m ∈ M.
We next consider (r^c · s_m) cos θ + (r^s · s_m) sin θ. Therefore, for each m ∈ M, we first regard (r^c · s_m) and (r^s · s_m) as the two components of a vector with length X_m and angle γ_m, i.e. (r^c · s_m) = X_m cos γ_m and (r^s · s_m) = X_m sin γ_m (see figure 16.1).
[Figure 16.1: The quantities (r^c · s_m) and (r^s · s_m) interpreted as the components of a vector with length X_m and angle γ_m.]
[Figure 16.2: A plot of I_0(x) = (1/2π) ∫_0^{2π} exp(x cos θ) dθ as a function of x.]
RESULT 16.1 Combining everything we obtain that the optimum receiver for “random-phase transmission” (incoherent detection) has to choose the message m ∈ M that maximizes
I_0(2X_m/N_0) exp(−E_m/N_0).   (16.15)
Note that
(r^c · s_m) = Σ_{i=1,N} r^c_i s_{mi} = Σ_{i=1,N} ( ∫_{−∞}^{∞} r(t) φ_{c,i}(t) dt ) s_{mi}
            = Σ_{i=1,N} ( ∫_{−∞}^{∞} r(t) φ_i(t) √2 cos(2π f_0 t) dt ) s_{mi}
            = ∫_{−∞}^{∞} r(t) √2 cos(2π f_0 t) Σ_{i=1,N} s_{mi} φ_i(t) dt
            = ∫_{−∞}^{∞} r(t) √2 cos(2π f_0 t) s^b_m(t) dt.   (16.17)
A similar result holds for the product (r s · s m ). All this suggests the implementation shown in
figure 16.3. As usual there is also a matched-filter version of this incoherent receiver.
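A compact way to see the decision rule (16.15) at work is the following sketch. The signal vectors are illustrative; equal energies are enforced so that the comparison effectively reduces to comparing the envelopes X_m:

```python
import numpy as np

rng = np.random.default_rng(1)
N0, N = 1.0, 8
S = rng.normal(size=(4, N))
S /= np.linalg.norm(S, axis=1, keepdims=True)      # equal energy E_m = 1 for all m

theta = rng.uniform(0, 2 * np.pi)                  # carrier phase unknown to the receiver
m_true = 3
rc = S[m_true] * np.cos(theta) + rng.normal(scale=np.sqrt(N0 / 2), size=N)
rs = S[m_true] * np.sin(theta) + rng.normal(scale=np.sqrt(N0 / 2), size=N)

X = np.sqrt((S @ rc) ** 2 + (S @ rs) ** 2)         # envelopes X_m, see (16.25)
E = np.sum(S**2, axis=1)                           # energies E_m
metric = np.i0(2 * X / N0) * np.exp(-E / N0)       # decision variable (16.15)
print("decoded message:", int(np.argmax(metric)))
```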
[Figure 16.3: Correlation receiver for equal-energy signals with random phase: for each m ∈ M the received signal r(t) is correlated with s^b_m(t)√2 cos(2π f_0 t) and with s^b_m(t)√2 sin(2π f_0 t) to obtain (r^c · s_m) and (r^s · s_m); both are squared and added to form X_m², and the message with the largest X_m² is selected as m̂.]
The signals u^c_m(t) and u^s_m(t) are baseband signals. Why? The reason is that we can regard e.g. u^c_m(t) as the output of the filter with impulse response s^b_m(T − t) when the excitation is r(t) √2 cos(2π f_0 t). It is clear that signal components outside the frequency band [−W, W] cannot pass the filter s^b_m(T − t) since s^b_m(t) is a baseband signal.
We now use the trick of section 16.2 again. Write the matched-filter output as
u_m(t) = X_m(t) cos[2π f_0 t − γ_m(t)]   (16.21)
where
X_m(t) = √( (u^c_m(t))² + (u^s_m(t))² )   (16.22)
and the angle γ_m(t) is such that
u^c_m(t) = X_m(t) cos γ_m(t),
u^s_m(t) = X_m(t) sin γ_m(t).   (16.23)
At the sampling instant t = T we have
u^c_m(T) = (r^c · s_m),
u^s_m(T) = (r^s · s_m),   (16.24)
hence
X_m(T) = √( (r^c · s_m)² + (r^s · s_m)² ).   (16.25)
Therefore, for equal-energy signals, we can construct an optimum receiver by sampling the envelope of the outcomes of the matched filters h_m(t) = s^b_m(T − t) √2 cos(2π f_0 t) for all m ∈ M and comparing the samples. This leads to the implementation shown in figure 16.4.
[Figure 16.4: Envelope-detector receiver for equal-energy signals with random phase: for each m ∈ M the received signal is passed through the matched filter h_m(t), the envelope of the filter output is sampled at t = T, and the message with the largest sample is selected.]
Note that instead of using a matched filter with (passband) impulse response s^b_m(T − t) √2 cos(2π f_0 t) we can first multiply the received signal r(t) by cos(2π f_0 t) and then use a matched filter with (baseband) impulse response s^b_m(T − t).
Example 16.1 For some m ∈ M assume that s^b_m(t) = 1 for 0 ≤ t ≤ 1 and zero elsewhere. If the random phase turns out to be θ = π/2 then
s_m(t) = s^b_m(t) √2 cos(2π f_0 t − π/2) = s^b_m(t) √2 sin(2π f_0 t).   (16.26)
If we assume that there is no noise, hence r(t) = s_m(t), the output u_m(t) of the matched filter will be
u_m(t) = ∫_{−∞}^{∞} r(α) h_m(t − α) dα
       = ∫_{−∞}^{∞} s^b_m(α) √2 sin(2π f_0 α) s^b_m(1 − t + α) √2 cos(2π f_0 (t − α)) dα.   (16.28)
[Figure 16.5: The signals s^b_m(t), s_m(t), u_m(t), and the impulse response h_m(t) = s^b_m(1 − t) √2 cos(2π f_0 t) for example 16.1.]
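Example 16.1 is easy to reproduce numerically. The sketch below assumes a carrier frequency f_0 = 10 (any f_0 with f_0 · T an integer behaves similarly) and shows that the matched-filter output sampled at t = T is essentially zero, while the envelope at t = T equals the signal energy:

```python
import numpy as np

fs, f0 = 2000.0, 10.0                       # sample rate and carrier (assumed values)
t = np.arange(-0.5, 2.5, 1.0 / fs)

sb = ((t >= 0) & (t <= 1)).astype(float)    # s_m^b(t): unit pulse on [0, 1]
r = sb * np.sqrt(2) * np.sin(2 * np.pi * f0 * t)               # received signal, theta = pi/2, no noise
h = np.where((1 - t >= 0) & (1 - t <= 1), 1.0, 0.0) \
    * np.sqrt(2) * np.cos(2 * np.pi * f0 * t)                  # h_m(t) = s_m^b(1-t) sqrt(2) cos(2 pi f0 t)

u = np.convolve(r, h) / fs                  # matched-filter output u_m(t), cf. (16.28)
tu = np.arange(len(u)) / fs + 2 * t[0]      # time axis of the convolution
print("u_m(T) =", round(u[np.argmin(np.abs(tu - 1.0))], 3))    # ~0

rc = np.trapz(r * np.sqrt(2) * np.cos(2 * np.pi * f0 * t) * sb, t)   # (r^c . s_m)
rs = np.trapz(r * np.sqrt(2) * np.sin(2 * np.pi * f0 * t) * sb, t)   # (r^s . s_m)
print("X_m(T) =", round(np.hypot(rc, rs), 3))                        # ~1 = E_s
```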
(r^c · s_1)² + (r^s · s_1)² > (r^c · s_2)² + (r^s · s_2)².   (16.30)
r^c = s_m cos θ + n^c,
r^s = s_m sin θ + n^s.   (16.31)
When M = 1 then the vector components of r^c = (r^c_1, r^c_2) and r^s = (r^s_1, r^s_2) are
r^c_1 = √E_s cos θ + n^c_1,
r^s_1 = √E_s sin θ + n^s_1,
r^c_2 = n^c_2,
r^s_2 = n^s_2,   (16.32)
where n^c = (n^c_1, n^c_2) and n^s = (n^s_1, n^s_2). Now by s_1 = (√E_s, 0) and s_2 = (0, √E_s) the optimum receiver decodes m̂ = 1 if and only if
(r^c_1)² + (r^s_1)² > (r^c_2)² + (r^s_2)².   (16.33)
The noise components are all statistically independent Gaussian variables with density function
p_N(n) = (1/√(πN_0)) exp(−n²/N_0).   (16.34)
Now fix some θ. Assume that (r^c_1)² + (r^s_1)² = ρ². What is now the probability of error given that M = 1 and Θ = θ? Therefore consider
Pr{M̂ = 2 | Θ = θ, M = 1, R^c_1 = r^c_1, R^s_1 = r^s_1} = Pr{(r^c_2)² + (r^s_2)² > ρ²} = exp(−ρ²/N_0).   (16.35)
We obtain the error probability for the case where M = 1 by averaging over all R^c_1 and R^s_1, hence
Pr{M̂ = 2 | Θ = θ, M = 1}
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p_{R^c_1,R^s_1}(r^c_1, r^s_1 | Θ = θ, M = 1) exp(−((r^c_1)² + (r^s_1)²)/N_0) dr^c_1 dr^s_1
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) p_{R^s_1}(r^s_1 | Θ = θ, M = 1) exp(−((r^c_1)² + (r^s_1)²)/N_0) dr^c_1 dr^s_1
 = ∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) exp(−(r^c_1)²/N_0) dr^c_1 ·
   ∫_{−∞}^{∞} p_{R^s_1}(r^s_1 | Θ = θ, M = 1) exp(−(r^s_1)²/N_0) dr^s_1.   (16.36)
Consider the first factor. Note that r^c_1 = √E_s cos θ + n^c_1. Therefore
∫_{−∞}^{∞} p_{R^c_1}(r^c_1 | Θ = θ, M = 1) exp(−(r^c_1)²/N_0) dr^c_1
 = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(α − √E_s cos θ)²/N_0) exp(−α²/N_0) dα.   (16.37)
With m = √E_s cos θ this integral becomes
∫_{−∞}^{∞} (1/√(πN_0)) exp(−(α − m)²/N_0) exp(−α²/N_0) dα
 = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(2α² − 2αm + m²)/N_0) dα
 = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(2(α² − αm + m²/4) + m²/2)/N_0) dα
 = (exp(−m²/(2N_0))/√2) ∫_{−∞}^{∞} (1/√(πN_0/2)) exp(−(α − m/2)²/(N_0/2)) dα = exp(−m²/(2N_0))/√2
 = exp(−E_s cos²θ/(2N_0))/√2.   (16.38)
Combining this with a similar result for the second factor we obtain
Pr{M̂ = 2 | Θ = θ, M = 1} = (exp(−E_s cos²θ/(2N_0))/√2) · (exp(−E_s sin²θ/(2N_0))/√2) = (1/2) exp(−E_s/(2N_0)).   (16.39)
Although we have fixed θ, this probability is independent of θ. Averaging over θ therefore yields
Pr{M̂ = 2 | M = 1} = (1/2) exp(−E_s/(2N_0)).   (16.40)
Based on symmetry we can obtain a similar result for Pr{M̂ = 1 | M = 2}.
RESULT 16.2 Therefore we obtain for the error probability of an incoherent receiver for two equally likely orthogonal signals, both having energy E_s, that
P_E = (1/2) exp(−E_s/(2N_0)).   (16.41)
Note that for coherent reception of two equally likely orthogonal signals we have obtained before that
P_E = Q(√(E_s/N_0)) ≤ (1/2) exp(−E_s/(2N_0)).   (16.42)
Here we have used the bound on the Q-function that was derived in appendix A. Figure 16.6 shows that the error probabilities are comparable however.
[Figure 16.6: Probability of error for coherent and incoherent reception of two equally likely orthogonal signals of energy E_s, as a function of E_s/N_0 in dB. Note that the probability of error for incoherent reception is the larger of the two.]
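The two curves of figure 16.6 can be regenerated with a few lines of Python (scipy is assumed for the Q-function):

```python
import numpy as np
from scipy.stats import norm

EsN0_dB = np.arange(-15, 16)
EsN0 = 10.0 ** (EsN0_dB / 10.0)

PE_coherent = norm.sf(np.sqrt(EsN0))          # Q(sqrt(Es/N0)), see (16.42)
PE_incoherent = 0.5 * np.exp(-EsN0 / 2.0)     # (1/2) exp(-Es/(2 N0)), result 16.2

for db, pc, pi in zip(EsN0_dB[::5], PE_coherent[::5], PE_incoherent[::5]):
    print(f"Es/N0 = {db:3d} dB: coherent {pc:.2e}, incoherent {pi:.2e}")
```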
16.6 Exercises
1. Let X and Y be two independent Gaussian random variables with common variance σ². The mean of X is m and Y is a zero-mean random variable. We define the random variable V as V = √(X² + Y²). Show that
p_V(v) = (v/σ²) I_0(mv/σ²) exp(−(v² + m²)/(2σ²)), for v > 0,   (16.43)
and 0 for v ≤ 0. Here I_0(·) is the modified Bessel function of the first kind and zero order. The distribution of V is known as the Rician distribution. In the special case where m = 0, the Rician distribution simplifies to the Rayleigh distribution.
(Exercise 3.31 from Proakis and Salehi [17].)
Note that s1 and s2 are signals in one-dimensional “vector”-notation. The signals are
equiprobable i.e. Pr{M = 1} = Pr{M = 2} = 1/2.
These signals are transmitted over a bandpass channel with a carrier with random phase θ with p_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π. The power density of the noise is N_0/2 for all frequencies.
An optimum receiver first determines r^c and r^s for which we can write
r^c = s_m cos θ + n^c,
r^s = s_m sin θ + n^s,   (16.45)
where both n^c and n^s are independent zero-mean Gaussian vectors with variance N_0/2.
Now let E/N_0 = 10 dB.
Coded Modulation
Chapter 17
Coding for bandlimited channels
[Figure 17.1: The discrete-time additive Gaussian noise channel: the i-th output is r_i = s_i + n_i.]
the i-th channel output ri is equal to the i-th input si to which noise n i is added, hence
ri = si + n i , for i = 1, 2, · · · . (17.1)
We assume that the noise variables Ni are independent and normally distributed, with mean 0
and variance σ 2 = N0 /2.
This AWGN-channel models transmission over a waveform channel with additive white
Gaussian noise with power density S Nw ( f ) = N0 /2 for −∞ < f < ∞ as we have seen
before (see section 6.8).
Note that i is the index of the dimensions. If e.g. the waveform channel has bandwidth W, we can get a new dimension every 1/(2W) seconds (see chapter 14). This determines the relation between the actual time t and the dimension index i, but we will ignore this in what follows.
In this chapter we call a dimension a transmission. We assume that the energy “invested” in
each transmission (dimension) is E N . It is our objective to investigate coding techniques for this
channel. First we consider uncoded transmission.
[Figure 17.2: An |A|-PAM constellation (|A| even) with the |A| elementary symbols −|A|+1, · · · , −5, −3, −1, 1, 3, 5, · · · , |A|−1.]
Each symbol value has a probability of occurrence of 1/|A|. Therefore the information rate
per transmission is
R N = log2 |A| bits per transmission. (17.4)
For the average symbol energy E_N per transmission we now have that
E_N = (d_0²/4) E_pam,   (17.5)
where E_pam is the elementary energy of an |A|-PAM constellation, which is defined as
E_pam = (1/|A|) Σ_{a=−|A|+1,−|A|+3,··· ,|A|−1} a².   (17.6)
E_pam = (|A|² − 1)/3.   (17.7)
The result holds both for even and odd |A|.
It now follows from the above result that, since s_i = (d_0/2)a_i, the average symbol energy is
E_N = (d_0²/4) · (|A|² − 1)/3.   (17.8)
Example 17.1 Consider the 4-PAM constellation
The rate corresponding to this constellation is R_N = log₂ 4 = 2 bits per transmission. For the elementary energy per transmission we obtain
E_pam = (1/4)((−3)² + (−1)² + (+1)² + (+3)²) = (4² − 1)/3 = 5.   (17.10)
Apart from the rate R_N and the average symbol energy E_N, a third important parameter is the symbol error probability P_Es. This error probability is Q(d_0/(2σ)) for the outer two signal points and 2Q(d_0/(2σ)) for the inner signal points. Therefore
P_Es = ((2(|A| − 2) + 2)/|A|) Q(d_0/(2σ)) = (2(|A| − 1)/|A|) Q(d_0/(2σ)).   (17.11)
By (17.8) and the fact that σ² = N_0/2, the square of the argument of the Q-function can be written as
d_0²/(4σ²) = (3/(|A|² − 1)) · E_N/(N_0/2).   (17.12)
Therefore we can express the symbol error probability as
P_Es = (2(|A| − 1)/|A|) Q(√( (3/(|A|² − 1)) · E_N/(N_0/2) )) = (2(|A| − 1)/|A|) Q(√( (3/(|A|² − 1)) · S/N )),   (17.13)
if we note that the signal-to-noise ratio S/N = E_N/(N_0/2), i.e. the average symbol energy divided by the variance of the noise in a transmission.
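A short sketch evaluating (17.13), with scipy providing the Q-function and purely illustrative signal-to-noise ratios:

```python
import numpy as np
from scipy.stats import norm

def pam_symbol_error(A, snr):
    """Symbol error probability (17.13) of uncoded |A|-PAM at S/N = E_N/(N0/2)."""
    return 2 * (A - 1) / A * norm.sf(np.sqrt(3.0 * snr / (A**2 - 1)))

for snr_dB in (10, 15, 20, 25):
    snr = 10 ** (snr_dB / 10)
    print(f"|A| = 4, S/N = {snr_dB} dB: P_Es = {pam_symbol_error(4, snr):.2e}")
```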
capacity C_N per dimension. If we write the capacity per transmission (see 12.17) as a function of S/N = E_N/(N_0/2) we obtain
R_N < C_N = (1/2) log₂(1 + S/N).   (17.14)
Rewriting this inequality in the reverse direction we obtain the following lower bound for S/N as a function of the rate R_N:
S/N > 2^{2R_N} − 1 = S/N_min.   (17.15)
Definition 17.1 This immediately suggests defining the normalized signal-to-noise ratio as
S/N_norm = (S/N)/(S/N_min) = (S/N)/(2^{2R_N} − 1).   (17.16)
This definition allows us to compare coding methods with different rates with each other.
Now it follows from (17.15) that for reliable transmission S/N norm > 1 must hold. Good
signaling methods achieve small error probabilities for S/N norm close to 1 while lousy methods
need a larger S/N norm to realize reliable transmission. It is the objective of the communication
engineer to design systems that achieve acceptable error probabilities for a S/N norm as close as
possible to 1.
[Figure 17.3: Average symbol error probability P_Es versus S/N_norm for uncoded |A|-PAM, under the assumption that |A| is large. Horizontally S/N_norm in dB, vertically log₁₀ P_Es.]
where
K_d = (1/|S|) Σ_{s∈S} #{ s′ ∈ S : ‖s − s′‖ = d },   (17.22)
i.e. the average number of signals at distance d from a coded sequence. Inequality (17.21) is
referred to as the union bound for the sequence error probability.
The function Q(x) decays exponentially not slower than exp(−x²/2). This follows from appendix A. Therefore, when K_d does not increase too rapidly with d, for small σ the union bound is dominated by its first term, which is called the union bound estimate
P_E ≈ K_min Q(d_min/(2σ)),   (17.23)
where d_min is the minimum Euclidean distance between any two sequences in S and K_min is the average number of signals at distance d_min from a coded sequence. K_min is called the error coefficient.
where d_H(c, c′) is the Hamming distance between two codewords c and c′, i.e. the number of positions at which they differ.
¹ We write P̃_Es with a tilde since we are not dealing with concrete symbols.
² The sum c of two codewords c′ and c″ is the component-wise mod 2 sum of both codewords. A codeword c times a scalar is the codeword c itself if the scalar is 1, or the all-zero codeword 0, 0, · · · , 0 if the scalar is 0.
We will use these codes in the next sections of this chapter, just because it is convenient to do so.
Example 17.2 For q = 3 we obtain a code with parameters (N , K , dH,min ) = (8, 4, 4). Indeed this code
can be constructed from the following K = 4 independent codewords of length N = 8:
c1 = (1, 0, 0, 0, 0, 1, 1, 1),
c2 = (0, 1, 0, 0, 1, 0, 1, 1),
c3 = (0, 0, 1, 0, 1, 1, 0, 1),
c4 = (0, 0, 0, 1, 1, 1, 1, 0). (17.29)
The code consists of the 16 linear combinations of these 4 codewords. Why is the minimum Hamming distance d_H,min of this code 4? To see this, note that the difference of two codewords c ≠ c′ is a non-zero codeword itself. The Hamming distance between two codewords is the number of ones, i.e. the Hamming weight, of this difference. The minimal Hamming weight of a nonzero codeword is at least 4. This can be checked easily.
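The claim in example 17.2 can also be verified by brute force; the generators below are exactly those of (17.29):

```python
import itertools
import numpy as np

G = np.array([[1, 0, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1, 1, 0, 1],
              [0, 0, 0, 1, 1, 1, 1, 0]])

codewords = [(np.array(u) @ G) % 2 for u in itertools.product((0, 1), repeat=4)]
weights = sorted(int(c.sum()) for c in codewords if c.any())
print("number of codewords:", len(codewords))             # 16
print("minimum nonzero weight (= d_H,min):", weights[0])  # 4
```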
[Figure 17.4: Partitioning of the 8-PAM set A = {−7, −5, −3, −1, 1, 3, 5, 7} into two subsets A0 = {−7, −3, 1, 5} and A1 = {−5, −1, 3, 7}, each containing four elementary symbols and each having a minimum within-subset distance twice as large as that of the original PAM set.]
The set A is partitioned into a subset A0 and a subset A1, both containing four elementary symbols, and such that the minimum distance between two elementary symbols in A0 or two symbols in A1, i.e. the minimum within-subset distance, is 4, which is twice as large as the minimum distance between two elementary symbols in A, which is 2 (see figure 17.4). Note that the minimum distance between any symbol from A0 and any symbol from A1 is still 2.
How can set-partitioning lead to more reliable transmission? To see this we consider the example encoder shown in figure 17.5. This encoder transforms 20 binary information digits b_1, b_2, · · · , b_20 into 8 elementary 8-PAM symbols a_1, a_2, · · · , a_8. The digits b_1, b_2, b_3, b_4 are encoded by the (8, 4, 4) extended Hamming code into the codeword c = (c_1, c_2, · · · , c_8); code digit c_i selects the subset A_{c_i}, and a pair of the remaining information digits selects, via a table, one of the four elementary symbols of that subset, which yields a_i.
[Figure 17.5: Encoder mapping 20 information bits onto 8 elementary 8-PAM symbols: b_1, · · · , b_4 are encoded by the (8, 4, 4) extended Hamming code into c_1, · · · , c_8; for each i the code digit c_i together with two further information bits selects the symbol a_i from subset A_{c_i}.]
1. First consider all 2^16 elementary sequences a that result from a fixed codeword c from the extended Hamming code. For such a sequence each component a_i ∈ A_{c_i} for i = 1, 2, · · · , 8. The within-subset distance in subsets A0 and A1 is 4. Therefore the minimum distance between two elementary sequences a and a′ resulting from the same codeword c is 4.
2. Next consider two elementary sequences a and a′ that correspond to two different codewords c and c′ from the extended Hamming code. Then there are at least d_H,min = 4 positions i ∈ {1, 2, · · · , 8} at which c_i ≠ c′_i. Note that at these positions the elementary symbol a_i ∈ A_{c_i} while the symbol a′_i ∈ A_{c′_i}. The minimum distance between any symbol from A0 and any symbol from A1 is 2, and thus at these positions |a_i − a′_i| ≥ 2 and hence
‖a − a′‖² = Σ_{i=1,8} (a_i − a′_i)² ≥ 4 · 4 = 16.   (17.31)
Therefore the distance between two elementary sequences a and a′ that correspond to two different codewords c and c′ is at least 4.
We conclude that the Euclidean distance between two elementary sequences a and a′ is at least 4. Therefore the Euclidean distance between two sequences s and s′ is at least 4(d_0/2) = 2d_0. Hence d_min = 2d_0, which is a factor of two larger than in the uncoded case.
The average symbol energy is again
E_N = (d_0²/4) · (|A|² − 1)/3,   (17.33)
where it is important to note that although we use coding, the PAM symbols remain equiprobable.
Moreover
d_min²/(4σ²) = (d_min²/d_0²) · d_0²/(4σ²) = (d_min²/d_0²) · (3/(|A|² − 1)) · E_N/(N_0/2) = (d_min²/d_0²) · (3/(|A|² − 1)) · S/N.   (17.34)
Therefore
P_E = K_min Q(√( (d_min²/d_0²) · (3/(|A|² − 1)) · S/N )) = K_min Q(√( (d_min²/d_0²) · ((2^{2R_N} − 1)/(|A|² − 1)) · 3 S/N_norm )),   (17.35)
where R_N is the rate of the signal set S, i.e.
R_N = log₂|S|/N = K/N + log₂(|A|/2)
    = (2^q − q − 1)/2^q + log₂|A| − 1 = log₂|A| − (q + 1)/2^q.   (17.36)
Note that the rate R_N approaches log₂|A| for q → ∞, i.e. if we increase the complexity of the extended Hamming code.
The argument of the Q(√·)-function in terms of S/N_norm was 3 S/N_norm in the uncoded PAM case. If we use Ungerboeck coding as we described here, it is (d_min²/d_0²) · ((2^{2R_N} − 1)/(|A|² − 1)) · 3 S/N_norm, hence we have achieved an increase of the argument by a factor G_cod^asymp of
G_cod^asymp = (d_min²/d_0²) · (2^{2R_N} − 1)/(|A|² − 1) = 4 · (2^{2R_N} − 1)/(|A|² − 1).   (17.37)
This factor is called the asymptotic (or nominal) coding gain of our coding system with respect
to the baseline performance.
The factor d_min²/d_0² = 4 corresponds to the increase of the squared Euclidean distance between the sequences. This factor is referred to as the distance gain. This increase of the distance however was achieved since we have introduced redundancy by using an error-correcting code. The factor (2^{2R_N} − 1)/(|A|² − 1) that corresponds to this loss is therefore called the redundancy loss.
Example 17.3 The encoder shown in figure 17.5 is based on the (8, 4, 4) extended Hamming code. The rate of this extended Hamming code, corresponding to q = 3, is
R_ext.Hamm. = K/N = (2^q − q − 1)/2^q = (2³ − 3 − 1)/2³ = 1/2.   (17.38)
Therefore the total rate R_N = R_ext.Hamm. + 2 = 5/2 bits/transmission. The additional 2 bits come from the fact that the subsets A0 and A1 each contain four symbols.
The distance gain of this method is
d_min²/d_0² = 4,   (17.39)
which is 10 log₁₀ 4 = 6.02 dB. The redundancy loss however is
(2^{2R_N} − 1)/(|A|² − 1) = (2^{2·5/2} − 1)/(8² − 1) = 31/63,   (17.40)
which is 10 log₁₀(31/63) = −3.08 dB. Therefore the asymptotic coding gain for this code is
G_cod^asymp = 10 log₁₀(4 · 31/63) = 6.02 dB − 3.08 dB = 2.94 dB.   (17.41)
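The asymptotic coding gain of example 17.3 generalizes easily to other values of q; the helper below simply evaluates (17.36) and (17.37) for 8-PAM:

```python
import math

def asymptotic_coding_gain(q, A=8):
    """Rate (17.36) and asymptotic coding gain (17.37) of the scheme built on the
    (2^q, 2^q - q - 1, 4) extended Hamming code with |A|-PAM."""
    R_N = math.log2(A) - (q + 1) / 2**q
    gain = 4 * (2**(2 * R_N) - 1) / (A**2 - 1)
    return R_N, 10 * math.log10(gain)

for q in range(2, 7):
    R_N, gain_dB = asymptotic_coding_gain(q)
    print(f"q={q}: R_N = {R_N:.3f} bit/transmission, G_cod^asymp = {gain_dB:.2f} dB")
```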
Note that the normalized error coefficient is K_min/N. This normalized error coefficient K_min/N is larger than the coefficient 2 in the uncoded case. What is the effect of this larger coefficient in dB? In other words, how much do we have to increase S/N_norm to compensate for the larger K_min/N? Note that we are interested in achieving a symbol error probability of 10⁻⁶. To achieve the baseline performance the argument α of Q(√·) has to satisfy
2Q(√α) = 10⁻⁶,   (17.44)
in other words α = 23.928. For an alternative coding method with normalized error coefficient K_min/N we need an argument β that satisfies
(K_min/N) Q(√β) = 10⁻⁶,   (17.45)
or an S/N_norm that is a factor β/α larger, to get the same performance. Therefore the coding loss L that we have to take into account is
L = 10 log₁₀(β/α).   (17.46)
In practice it turns out that if the normalized error coefficient K_min/N is increased by a factor of two, we have to accept an additional loss of roughly 0.2 dB (at error rate 10⁻⁶).
The resulting coding gain, called the effective coding gain, is the difference of the asymptotic coding gain and the loss, hence
G_cod^eff = G_cod^asymp − L = G_cod^asymp − 10 log₁₀(β/α).   (17.47)
[Figure 17.6: Rate, asymptotic coding gain (in dB), normalized error coefficient, and effective coding gain (in dB) versus q.]
Example 17.4 Again consider the encoder shown in figure 17.5 which is based on the (8, 4, 4) extended Hamming code. We can show that K_min(q), i.e. the number of “nearest neighbors” in the extended Hamming code, for q = 3, is
K_min(3) = 14.   (17.48)
This results in the following bound for the error coefficient in the set S of coded sequences:
K_min ≤ 2N + 2⁴ · K_min(3) = 2 · 8 + 16 · 14 = 240.   (17.49)
The normalized error coefficient K_min/N is therefore at most 240/8 = 30. Now we have to solve
(K_min/N) Q(√β) = 30 · Q(√β) = 10⁻⁶.   (17.50)
It turns out that β = 29.159 and thus the loss is 10 log₁₀(29.159/23.928) = 0.86 dB. Therefore the effective coding gain G_cod^eff = G_cod^asymp − 0.86 dB = 2.94 dB − 0.86 dB = 2.08 dB.
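The numbers of example 17.4 (and of the table below) follow from a small computation; scipy's inverse Q-function is assumed:

```python
import numpy as np
from scipy.stats import norm

def effective_coding_gain(G_asymp_dB, Kmin_over_N, Pe=1e-6):
    """Effective coding gain (17.47) at symbol error rate Pe."""
    alpha = norm.isf(Pe / 2) ** 2              # baseline: 2 Q(sqrt(alpha)) = Pe
    beta = norm.isf(Pe / Kmin_over_N) ** 2     # coded:    (Kmin/N) Q(sqrt(beta)) = Pe
    loss_dB = 10 * np.log10(beta / alpha)
    return G_asymp_dB - loss_dB, loss_dB

gain, loss = effective_coding_gain(2.94, 30)   # q = 3, K_min/N = 30
print(f"loss = {loss:.2f} dB, effective coding gain = {gain:.2f} dB")   # ~0.86 dB, ~2.08 dB
```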
q    N     R_N      G_cod^asymp   K_min/N   L (dB)   G_cod^eff
2    4     2.250    1.38 dB       6         0.37     1.01 dB
3    8     2.500    2.94 dB       30        0.86     2.08 dB
4    16    2.687    4.10 dB       142       1.29     2.81 dB
5    32    2.813    4.87 dB       622       1.66     3.21 dB
6    64    2.891    5.35 dB       2606      1.99     3.36 dB
The results from the table are also shown in figure 17.6. It can be observed that the rate R N of
our code tends to 3 bits/transmission if q increases. Moreover we see that the asymptotic coding
gain approaches 6dB. The effective coding gain is considerably smaller however. The reason for
this is that the normalized error coefficient increases very quickly with q.
17.9 Remark
What we did not do is the following. To keep things simple we have only discussed one-
dimensional signal structures. Ungerboeck’s method of set-partitioning also applies to two or
more-dimensional structures.
17.10 Exercises
1. Consider instead of the extended Hamming code a so-called single-parity-check code of
length 4. This code contains 8 codewords and has distance dH,min = 2. A codeword
now has four binary components c1 , c2 , c3 , c4 and is determined by three information bits
b1 , b2 , b3 .
[Figure: partitioning of a 16-QAM constellation A into two subsets A0 and A1.]
As before we use set-partitioning, but now we partition a 16-QAM constellation into two
subsets A0 and A1 (see figure). Note that the signals in these sets are two-dimensional.
(a) What is the minimum Euclidean distance dmin now? What is the rate R N of this code
in bits per transmission? Give the asymptotic coding gain of this code (relative to the
baseline performance).
Part VI
Appendices
Appendix A
An upper bound for the Q-function
For x ≥ 0 the Q-function satisfies
Q(x) ≤ (1/2) exp(−x²/2).   (A.2)
Proof: Note that for α ≥ x, and since x ≥ 0,
α² = (α − x)² + 2αx − x²
   ≥ (α − x)² + 2x² − x²
   = (α − x)² + x².   (A.3)
Therefore
Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα
     ≤ ∫_x^∞ (1/√(2π)) exp(−((α − x)² + x²)/2) dα
     = exp(−x²/2) ∫_x^∞ (1/√(2π)) exp(−(α − x)²/2) dα
     = (1/2) exp(−x²/2).   (A.4)
∎
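A quick numerical comparison of Q(x) with the bound (A.2), using scipy for Q(x):

```python
import numpy as np
from scipy.stats import norm

for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
    q = norm.sf(x)                       # Q(x)
    bound = 0.5 * np.exp(-x**2 / 2.0)    # (1/2) exp(-x^2 / 2)
    print(f"x = {x:.1f}: Q(x) = {q:.3e} <= {bound:.3e}")
```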
[Figure: Q(x) and the upper bound (1/2) exp(−x²/2) as a function of x.]
Appendix B
The Fourier transform
B.1 Definition
THEOREM B.1 If the signal x(t) satisfies the Dirichlet conditions, that is,
1. The signal x(t) is absolutely integrable on the real line, i.e. ∫_{−∞}^{∞} |x(t)| dt < ∞.
2. The number of maxima and minima of x(t) in any finite interval on the real line is finite.
3. The number of discontinuities of x(t) in any finite interval on the real line is finite.
4. When x(t) is discontinuous at t then x(t) = (x(t⁺) + x(t⁻))/2,
then the Fourier transform
X(f) = ∫_{−∞}^{∞} x(t) exp(−j2π f t) dt   (B.1)
exists and the original signal can be obtained from its Fourier transform by
x(t) = ∫_{−∞}^{∞} X(f) exp(j2π f t) df.   (B.2)
Proof: Note that for a real signal g(t) the Fourier transform G(f) satisfies G(−f) = G*(f). Then
∫_{−∞}^{∞} f(t) g(t) dt = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} F(f) exp(j2π f t) df ) ( ∫_{−∞}^{∞} G(f′) exp(j2π f′ t) df′ ) dt
 = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} F(f) exp(j2π f t) df ) ( ∫_{−∞}^{∞} G*(f′) exp(−j2π f′ t) df′ ) dt
 = ∫_{−∞}^{∞} F(f) ( ∫_{−∞}^{∞} G*(f′) ( ∫_{−∞}^{∞} exp(j2π t (f − f′)) dt ) df′ ) df.   (B.4)
We know that
∫_{−∞}^{∞} exp(j2π t (f − f′)) dt = δ(f − f′),   (B.5)
therefore
∫_{−∞}^{∞} f(t) g(t) dt = ∫_{−∞}^{∞} F(f) ( ∫_{−∞}^{∞} G*(f′) δ(f − f′) df′ ) df
 = ∫_{−∞}^{∞} F(f) G*(f) df.   (B.6)
∎
Appendix C
Impulse signal, filters
We call this property the sifting property of the impulse signal. Note that we defined the impulse signal δ(t) by describing its action on the test function f(t) and not by specifying its value for different values of t.
Definition C.2 The impulse response h(t) of a system L is the response of the system to an
impulse input δ(t), thus
h(t) = L[δ(t)]. (C.3)
The response of the time-invariant system to a unit impulse applied at time τ, i.e. δ(t − τ), is obviously h(t − τ).
Now we can determine the response of a linear time-invariant system L to an input signal x(t) as follows:
y(t) = L[x(t)]
     = L[ ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ ]
     = ∫_{−∞}^{∞} x(τ) L[δ(t − τ)] dτ
     = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ.   (C.4)
The final integral is called the convolution of the signal x(t) and the impulse response h(t).
A linear time-invariant system is often called a filter.
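In discrete time the convolution (C.4) is a single library call; the impulse response and input below are arbitrary illustrations:

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25])        # an example impulse response
x = np.array([1.0, 0.0, 0.0, 2.0])    # an example input signal
y = np.convolve(x, h)                 # y[n] = sum_k x[k] h[n - k]
print(y)                              # [1.   0.5  0.25 2.   1.   0.5 ]
```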
Appendix D
Correlation functions, power spectra
We will investigate here what the influence is of a filter with impulse response h(t) on the spec-
trum Sx ( f ) of the input random process X (t). Then we study the consequences of our findings.
Consider figure D.1. The filter produces the random process Y (t) at its output while the input
process is X (t).
[Figure D.1: A filter with impulse response h(t); the input is the random process X(t), the output is Y(t).]
with ν = t − α and µ = s − β. Observe that now also R y (t, s) only depends on the time
difference τ = t − s hence R y (t, s) = R y (τ ).
Next consider the Fourier transforms S_x(f) and S_y(f) of R_x(τ) and R_y(τ) respectively (see appendix B):
S_x(f) = ∫_{−∞}^{∞} R_x(τ) exp(−j2π f τ) dτ and R_x(τ) = ∫_{−∞}^{∞} S_x(f) exp(j2π f τ) df,
S_y(f) = ∫_{−∞}^{∞} R_y(τ) exp(−j2π f τ) dτ and R_y(τ) = ∫_{−∞}^{∞} S_y(f) exp(j2π f τ) df.   (D.4)
Note that these Fourier transforms can only be defined if the correlation functions depend only
on the time-difference τ = t − s. Next we obtain
R_y(τ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_x(τ + µ − ν) h(ν) h(µ) dν dµ
       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} S_x(f) exp(j2π f (τ + µ − ν)) df ) h(ν) h(µ) dν dµ
       = ∫_{−∞}^{∞} S_x(f) ( ∫_{−∞}^{∞} h(ν) exp(−j2π f ν) dν ) ( ∫_{−∞}^{∞} h(µ) exp(j2π f µ) dµ ) exp(j2π f τ) df
       = ∫_{−∞}^{∞} S_x(f) H(f) H*(f) exp(j2π f τ) df,   (D.5)
hence
S_y(f) = S_x(f) H(f) H*(f) = S_x(f) |H(f)|².   (D.6)
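Relation (D.6) can be illustrated numerically by filtering (approximately) white noise and estimating the spectra; the filter, sample rate and signal length below are arbitrary choices, and scipy is assumed:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 1000.0
x = rng.normal(size=200_000)                     # approximately white input process
b, a = signal.butter(4, 100.0, fs=fs)            # an arbitrary lowpass filter h(t)
y = signal.lfilter(b, a, x)

f, Sx = signal.welch(x, fs=fs, nperseg=4096)     # estimated input spectrum S_x(f)
_, Sy = signal.welch(y, fs=fs, nperseg=4096)     # estimated output spectrum S_y(f)
_, H = signal.freqz(b, a, worN=f, fs=fs)         # transfer function H(f)

for k in (10, 100, 300):                         # a few frequencies
    print(f"f = {f[k]:6.1f} Hz: Sy = {Sy[k]:.2e}, Sx*|H|^2 = {Sx[k] * abs(H[k])**2:.2e}")
```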
D.3 Interpretation
First note that the mean square value of the filter output process Y (t) is time-independent if
Rx (t, s) = Rx (t − s). This follows from
[Figure D.2: Transfer function H(f) of an ideal bandpass filter with passband f_1 ≤ |f| ≤ f_2.]
Therefore R_y(0) is the expected value of the power that is dissipated in a resistor of 1 Ω connected to the output of the filter, at any time instant. Next note that
R_y(0) = ∫_{−∞}^{∞} S_y(f) df = ∫_{−∞}^{∞} S_x(f) |H(f)|² df.   (D.8)
Consider a filter with transfer function H(f) shown in figure D.2. This is a bandpass filter and
H(f) = 1 for f_1 ≤ |f| ≤ f_2, and H(f) = 0 elsewhere.   (D.9)
Then we obtain
R_y(0) = ∫_{−f_2}^{−f_1} S_y(f) df + ∫_{f_1}^{f_2} S_y(f) df.   (D.10)
Now let f_1 = f and f_2 = f + Δf. The filter H(f) now only transfers components of X(t) in the frequency band (f, f + Δf) and stops all other components. Since a power spectrum is always even, see section D.6, the expected power at the output of the filter is approximately 2 S_x(f) Δf. This implies that S_x(f) is the distribution of average power in the process X(t) over the frequencies. Therefore we call S_x(f) the power spectral density function of X(t).
m_z(t) = constant and R_z(t, s) = R_z(t − s).   (D.11)
For a WSS process Z (t) it is meaningful to consider its power spectral density Sz ( f ). For
a WSS process Y (t) the expected power can be expressed as in equation (D.7). When a WSS
process X (t) is the input of a filter then the filter output process Y (t) is also WSS and its power
spectral density S y ( f ) is related to the input power spectral density Sx ( f ) as given by (D.8).
These consequences already justify the concept wide-sense stationarity.
b) Next we show that S_x(f) is a real even function of f. Since R_x(τ) is an even function and sin(2π f τ) an odd function of τ,
∫_{−∞}^{∞} R_x(τ) sin(2π f τ) dτ = 0.   (D.13)
Therefore
S_x(f) = ∫_{−∞}^{∞} R_x(τ) exp(−j2π f τ) dτ
       = ∫_{−∞}^{∞} R_x(τ) [cos(2π f τ) − j sin(2π f τ)] dτ
       = ∫_{−∞}^{∞} R_x(τ) cos(2π f τ) dτ,   (D.14)
Appendix E
Gram-Schmidt procedure, proof, example
We will only prove result 6.2 for signal sets with signals s_m(t) ≢ 0 for all m ∈ M. The statement then also holds for sets that do contain all-zero signals.
1. First we will show that the induction hypothesis holds for one signal. Therefore consider s_1(t) and take
ϕ_1(t) = s_1(t)/√E_{s1} with E_{s1} = ∫_0^T s_1²(t) dt.   (E.1)
2. Now suppose that the induction hypothesis holds for m − 1 ≥ 1 signals. Thus there exists an orthonormal basis ϕ_1(t), ϕ_2(t), · · · , ϕ_{n−1}(t) for the signals s_1(t), s_2(t), · · · , s_{m−1}(t) with n − 1 ≤ m − 1. Now consider the next signal s_m(t) and an auxiliary signal θ_m(t) which is defined as follows:
θ_m(t) = s_m(t) − Σ_{i=1,n−1} s_{mi} ϕ_i(t)   (E.3)
with
s_{mi} = ∫_0^T s_m(t) ϕ_i(t) dt for i = 1, n − 1.   (E.4)
We can distinguish between two cases now:
(a) If θ_m(t) ≡ 0 then s_m(t) = Σ_{i=1,n−1} s_{mi} ϕ_i(t) and the induction hypothesis also holds for m signals in this case.
(b) If on the other hand θ_m(t) ≢ 0 then take a new building-block waveform
ϕ_n(t) = θ_m(t)/√E_{θm} with E_{θm} = ∫_0^T θ_m²(t) dt.   (E.5)
By doing so
∫_0^T ϕ_n²(t) dt = ∫_0^T (1/E_{θm}) θ_m²(t) dt = 1,   (E.6)
i.e. also ϕ_n(t) has unit energy. Moreover for all i = 1, n − 1
∫_0^T ϕ_n(t) ϕ_i(t) dt = (1/√E_{θm}) ∫_0^T θ_m(t) ϕ_i(t) dt
 = (1/√E_{θm}) ( ∫_0^T s_m(t) ϕ_i(t) dt − Σ_{j=1,n−1} s_{mj} ∫_0^T ϕ_j(t) ϕ_i(t) dt )
 = (1/√E_{θm}) ( s_{mi} − Σ_{j=1,n−1} s_{mj} δ_{ji} ) = 0,   (E.7)
It should be noted that, when using the Gram-Schmidt procedure, any ordering of the
signals other than s1 (t), s2 (t), · · · , s|M| (t) will yield a basis, i.e. a set of building-block
waveforms, of the smallest possible dimensionality (see exercise 1 in chapter 6), however
in general with different building-block waveforms.
E.2 An example
We will next discuss an example in which we actually carry out the Gram-Schmidt procedure for
a given set of waveforms.
[Figure E.1: The three waveforms s_1(t), s_2(t), s_3(t) (with T = 3), the resulting building-block waveforms ϕ_1(t), ϕ_2(t), and the auxiliary signal θ_3(t) ≡ 0.]
Example E.1 Consider the three waveforms s_m(t), m = 1, 2, 3, shown in figure E.1. Note that T = 3. We first determine the energy E_1 of the first signal s_1(t):
E_1 = 4 + 4 + 4 = 12.   (E.9)
[Figure: the signal vectors s_1, s_2, and s_3 in the plane spanned by ϕ_1 and ϕ_2.]
We have now obtained three vectors of coefficients, one for each signal:
s_1 = (s_{11}, s_{12}) = (2√3, 0),
s_2 = (s_{21}, s_{22}) = (−√3, 2√2), and
s_3 = (s_{31}, s_{32}) = (−√3, −2√2).   (E.19)
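The procedure of this appendix is straightforward to implement for sampled waveforms. The sketch below uses three illustrative waveforms on [0, 3] (not those of example E.1):

```python
import numpy as np

def gram_schmidt(signals, dt):
    """Return an orthonormal basis (list of arrays) for the given sampled signals."""
    basis = []
    for s in signals:
        theta = s - sum(np.trapz(s * phi, dx=dt) * phi for phi in basis)   # cf. (E.3)
        energy = np.trapz(theta**2, dx=dt)
        if energy > 1e-12:                         # case (b): a new building block
            basis.append(theta / np.sqrt(energy))
    return basis

dt = 0.001
t = np.arange(0, 3, dt)
s1 = 2.0 * np.ones_like(t)
s2 = np.where(t < 1, 2.0, -1.0)
s3 = np.where(t < 2, 1.0, -2.0)

basis = gram_schmidt([s1, s2, s3], dt)
print("dimensionality:", len(basis))
print(np.round([[np.trapz(a * b, dx=dt) for b in basis] for a in basis], 3))   # ~ identity
```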
Appendix F
Schwarz inequality
LEMMA F.1 (Schwarz inequality) For two finite-energy waveforms a(t) and b(t) the inequality
( ∫_{−∞}^{∞} a(t) b(t) dt )² ≤ ( ∫_{−∞}^{∞} a²(t) dt ) ( ∫_{−∞}^{∞} b²(t) dt )   (F.1)
holds. Equality is obtained only if b(t) ≡ Ca(t) for some constant C.
Proof: Form an orthonormal expansion for a(t) and b(t), i.e.
If we can prove that (a · b)² ≤ ‖a‖² ‖b‖² we get the Schwarz inequality.
We start by stating the triangle inequality for a and b:
‖a + b‖ ≤ ‖a‖ + ‖b‖.   (F.4)
Moreover
‖a + b‖² = (a + b)² = ‖a‖² + ‖b‖² + 2(a · b),   (F.5)
hence
‖a‖² + ‖b‖² + 2(a · b) ≤ ‖a‖² + ‖b‖² + 2‖a‖‖b‖,   (F.6)
and therefore
(a · b) ≤ ‖a‖‖b‖.   (F.7)
or
(a · b) ≥ −‖a‖‖b‖.   (F.9)
We therefore may conclude that −‖a‖‖b‖ ≤ (a · b) ≤ ‖a‖‖b‖, or (a · b)² ≤ ‖a‖²‖b‖², and this proves the Schwarz inequality.
Equality in the Schwarz inequality is achieved only when equality in the triangle inequality is obtained. This is the case when (first part of proof) b = Ca or (second part) −b = Ca for some non-negative C. ∎
Appendix G
A bound on the error probability for orthogonal signaling
The term in square brackets is the probability that at least one of the |M| − 1 noise components exceeds µ. This probability is trivially at most 1, and by the union bound it is not larger than the sum of the probabilities that individual components exceed µ, thus
1 − (1 − Q(µ))^{|M|−1} ≤ min[1, (|M| − 1) Q(µ)] ≤ min[1, |M| Q(µ)].   (G.4)
When µ is small Q(µ) is large and then the unity bound is tight. On the other hand, for large µ the bound |M| Q(µ) is tighter. Therefore we split the integration range into two parts, (−∞, a) and (a, ∞):
P_E ≤ ∫_{−∞}^a p(µ − b) dµ + |M| ∫_a^∞ p(µ − b) Q(µ) dµ.   (G.5)
If we take a ≥ 0 we can use the upper bound Q(µ) ≤ exp(−µ²/2), see the proof in appendix A, and we obtain
P_E ≤ ∫_{−∞}^a p(µ − b) dµ + |M| ∫_a^∞ p(µ − b) exp(−µ²/2) dµ.   (G.6)
PE ≤ P1 + |M|P2 . (G.7)
To minimize the bound we choose a such that the derivative of (G.6) with respect to a is equal to 0, i.e.
0 = d/da [P_1 + |M| P_2]
  = p(a − b) − |M| p(a − b) exp(−a²/2).   (G.8)
This results in
exp(a²/2) = |M|.   (G.9)
Therefore the value a = √(2 ln|M|) achieves a (at least local) minimum of P_E. Note that a ≥ 0 as was required. Now we can consider the separate terms. For the first term we can write
P_1 = ∫_{−∞}^a (1/√(2π)) exp(−(µ − b)²/2) dµ
    = ∫_{−∞}^{a−b} (1/√(2π)) exp(−γ²/2) dγ
    = Q(b − a)
 (∗) ≤ exp(−(b − a)²/2) if 0 ≤ a ≤ b.   (G.10)
Here the inequality (∗) follows from appendix A. For the second term we get
P_2 = ∫_a^∞ (1/√(2π)) exp(−(µ − b)²/2) exp(−µ²/2) dµ
 (∗∗) = exp(−b²/4) ∫_a^∞ (1/√(2π)) exp(−(µ − b/2)²) dµ
    = exp(−b²/4) (1/√2) ∫_a^∞ (1/√(2π)) exp(−(µ√2 − (b/2)√2)²/2) d(µ√2)
    = exp(−b²/4) (1/√2) ∫_{a√2}^∞ (1/√(2π)) exp(−(γ − (b/2)√2)²/2) dγ
    = exp(−b²/4) (1/√2) Q(a√2 − (b/2)√2).   (G.11)
Here (∗∗) follows from
(1/2)(µ − b)² + µ²/2 = µ²/2 − µb + b²/2 + µ²/2
 = µ² − µb + b²/4 + b²/4
 = (µ − b/2)² + b²/4.   (G.12)
From (G.11) we get
P_2 ≤ exp(−b²/4) for 0 ≤ a ≤ b/2,
P_2 ≤ exp(−b²/4 − (a − b/2)²) for a ≥ b/2.   (G.13)
If we now collect the bounds (G.10) and (G.13) and substitute the optimal value of a found in (G.9) we get
P_E ≤ exp(−(b − a)²/2) + exp(a²/2) exp(−b²/4) for 0 ≤ a < b/2,
P_E ≤ exp(−(b − a)²/2) + exp(a²/2) exp(−b²/4 − (a − b/2)²) for b/2 ≤ a ≤ b.   (G.14)
where we used the following definition for E_b, i.e. the energy per transmitted bit of information:
E_b = E_s/log₂|M|.   (G.18)
Appendix H
Decibels
An energy ratio X expressed in decibels equals 10 log₁₀ X. It is easy to work with decibels since by doing so we transform multiplications into additions. Table H.1 shows for some energy ratios X the equivalent ratio in dB.
X X |in dB
0.01 -20.0 dB
0.1 -10.0 dB
1.0 0.0 dB
2.0 3.0 dB
3.0 4.8 dB
5.0 7.0 dB
10.0 10.0 dB
100.0 20.0 dB
Table H.1: Decibels.
Appendix I
Elementary energy E_pam of an |A|-PAM constellation
E_pam = (1/|A|) Σ_{s=−|A|+1,−|A|+3,··· ,|A|−1} s².   (I.1)
We want to express E_pam in terms of the number of symbols |A|. We will only consider the case where |A| is even. A similar treatment can be given for odd |A|. For even |A|
E_pam = (2/|A|) Σ_{k=1,|A|/2} (2k − 1)².   (I.2)
We prove by induction that
Σ_{k=1,H} (2k − 1)² = (4H³ − H)/3.   (I.3)
Note that the induction hypothesis holds for H = 1. Next suppose that it holds for H = h ≥ 1. Now it turns out that
Σ_{k=1,h+1} (2k − 1)² = Σ_{k=1,h} (2k − 1)² + (2h + 1)²
 = (4h³ − h)/3 + (2h + 1)²
 = (4h³ − h + 12h² + 12h + 3)/3
 = (4h³ + 12h² + 12h + 4 − h − 1)/3 = (4(h + 1)³ − (h + 1))/3,   (I.4)
and thus the induction hypothesis also holds for h + 1, and thus for all H. Therefore we obtain
E_pam = (2/|A|) · (4(|A|/2)³ − |A|/2)/3 = (|A|² − 1)/3.
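The closed form just obtained agrees with a direct computation for any even |A|:

```python
# Quick check of (17.7): E_pam = (|A|^2 - 1)/3 for the symbols -|A|+1, ..., -1, 1, ..., |A|-1.
for A in (2, 4, 8, 16):
    E_pam = sum(a * a for a in range(-A + 1, A, 2)) / A
    assert E_pam == (A * A - 1) / 3
    print(A, E_pam)
```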
Bibliography
[1] A.R. Calderbank, T.A. Lee, and J.E. Mazo, “Baseband Trellis Codes with a Spectral Null at Zero,” IEEE Trans. Inform. Theory, vol. IT-34, pp. 425-434, May 1988.
[2] G.A. Campbell, U.S. Patent 1,227,113, May 22, 1917, “Basic Types of Electric Wave Fil-
ters.”
[3] J.R. Carson, “Notes on the Theory of Modulation,” Proc. IRE, vol. 10, pp. 57-64, Febru-
ary 1922. Reprinted in Proc. IEEE, Vol. 51, pp. 893-896, June 1963.
[4] T.M. Cover, “Enumerative Source Encoding,” IEEE Trans. Inform. Theory, vol. IT-19, pp.
73-77, Jan. 1973.
[5] G.D. Forney, Jr., R.G. Gallager, G.R. Lang, F.M. Longstaff, and S.U. Quereshi, “Efficient
Modulation for Band-Limited Channels,” IEEE Journ. Select. Areas Comm., vol. SAC-2, pp.
632-647, Sept. 1984.
[6] G.D. Forney, Jr., and G. Ungerboeck, “Modulation and Coding for Linear Gaussian Chan-
nels,” IEEE Trans. Inform. Theory, vol. 44, pp. 2384 - 2415, October 1998.
[7] R.G. Gallager, Information Theory and Reliable Communication, Wiley and Sons, 1968.
[8] R.V.L. Hartley, “Transmission of Information,” Bell Syst. Tech. J., vol. 7, pp. 535 - 563, July
1928.
[9] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, 1995.
[10] S. Haykin, Communication Systems. John Wiley & Sons, 4th edition, 2000.
[11] V.A. Kotelnikov, The Theory of Optimum Noise Immunity. Dover Publications, 1960.
[13] E.A. Lee and D.G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic,
1994.
[14] D.O. North, Analysis of the Factors which determine Signal/Noise Discrimination in Radar.
RCA Laboratories, Princeton, Technical Report, PTR-6C, June 1943. Reprinted in Proc.
IEEE, vol. 51, July 1963.
[15] H. Nyquist, “Certain factors affecting telegraph speed,” Bell Syst. Tech. J., vol. 3, pp. 324 -
346, April 1924.
[16] B.M. Oliver, J.R. Pierce, and C.E. Shannon, “The Philosophy of PCM,” Proc. IRE, vol. 36,
pp.1324-1331, Nov. 1948.
[17] J.G. Proakis and M. Salehi, Communication Systems Engineering. Prentice Hall, 1994.
[19] J.P.M. Schalkwijk, “An Algorithm for Source Coding,” IEEE Trans. Inform. Theory, vol.
IT-18, pp. 395-399, May 1972.
[20] C.E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Tech. J., vol. 27, pp.
379 - 423 and 623 - 656, July and October 1948. Reprinted in the Key Papers on Information
Theory.
[21] C.E. Shannon, “Communication in the Presence of Noise,” Proc. IRE, vol. 37, pp. 10 - 21,
January 1949. Reprinted in the Key Papers on Information Theory.
[22] G. Ungerboeck, “Channel Coding with Multilevel/Phase Signals,” IEEE Trans. Inform.
Theory, vol. IT-28, pp. 55-67, Jan. 1982.
[24] J.H. Van Vleck and D. Middleton, “A Theoretical Comparison of Visual, Aural, and Meter
Reception of Pulsed Signals in the Presence of Noise,” Journal of Applied Physics, vol. 17,
pp. 940-971, November 1946.
[25] J.M. Wozencraft and I.M. Jacobs, Principles of Communication Engineering, Wiley, New
York, 1965.
[26] S.T.M. Ackermans and J.H. van Lint, Algebra en Analyse, Academic Service, Den Haag,
1976.
Index
telegraph, 9
telephone, 9
Telstar I, 10
transistor, 10
transmitter, 15
triode, 10