
COMMUNICATION THEORY

Frans M.J. Willems

Fall 2003

Group Signal Processing Systems, Electrical Engineering Department, Eindhoven University of Technology.
Contents

I Some History 7
1 Historical facts related to communication 8
1.1 Telegraphy and telephony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Wireless communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Electronics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Classical modulation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Fundamental bounds and concepts . . . . . . . . . . . . . . . . . . . . . . . . . 11

II Optimum Receiver Principles 14


2 Decision rules for discrete channels 15
2.1 System description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Decision rules, an example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 The “maximum a-posteriori” decision rule . . . . . . . . . . . . . . . . . . . . . 18
2.4 The “maximum likelihood” decision rule . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Scalar and vector channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Decision rules for the real scalar channel 25


3.1 System description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 MAP and ML decision rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 The scalar additive Gaussian noise channel . . . . . . . . . . . . . . . . . . . . 27
3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 The MAP decision rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.3 The Q-function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3.4 Probability of error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4 Real vector channels 34


4.1 System description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Decision variables, MAP decoding . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Decision regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36


4.4 The additive Gaussian noise vector channel . . . . . . . . . . . . . . . . . . . . 37


4.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.2 Minimum Euclidean distance decision rule . . . . . . . . . . . . . . . . 38
4.4.3 Error probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5 Theorem of reversibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5 Waveform channels, correlation receiver 45


5.1 System description, additive white Gaussian noise . . . . . . . . . . . . . . . . . 45
5.2 Waveform expansions in series of orthonormal functions . . . . . . . . . . . . . 46
5.3 An equivalent vector channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 A correlation receiver is optimum . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Sufficient statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

6 Waveform channels, building-blocks 52


6.1 Another sufficient statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2 The Gram-Schmidt orthogonalization procedure . . . . . . . . . . . . . . . . . . 54
6.3 Transmitting vectors instead of waveforms . . . . . . . . . . . . . . . . . . . . . 54
6.4 Signal space, signal structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.5 Receiving vectors instead of waveforms . . . . . . . . . . . . . . . . . . . . . . 57
6.6 The additive Gaussian vector channel . . . . . . . . . . . . . . . . . . . . . . . 57
6.7 Processing the received vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.8 Relation between waveform and vector channel . . . . . . . . . . . . . . . . . . 59
6.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

7 Matched filters 62
7.1 Matched-filter receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.2 Direct receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.3 Signal-to-noise ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.4 Parseval relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

8 Orthogonal signaling 72
8.1 Orthogonal signal structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.2 Optimum receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.3 Error probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.4 Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

III Transmitter Optimization 79


9 Signal energy considerations 80
9.1 Translation and rotation of signal structures . . . . . . . . . . . . . . . . . . . . 80
9.2 Signal energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
9.3 Translating a signal structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.4 Comparison of orthogonal and antipodal signaling . . . . . . . . . . . . . . . . . 82
9.5 Energy of |M| orthogonal signals . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

10 Signaling for message sequences 88


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10.2 Definitions, problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
10.3 Bit-by-bit signaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.3.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.3.2 Probability of error considerations . . . . . . . . . . . . . . . . . . . . . 92
10.4 Block-orthogonal signaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.4.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.4.2 Probability of error considerations . . . . . . . . . . . . . . . . . . . . . 93
10.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

11 Time, bandwidth, and dimensionality 95


11.1 Number of dimensions for bit-by-bit and block-orthogonal signaling . . . . . . . 95
11.2 Dimensionality as a function of T . . . . . . . . . . . . . . . . . . . . . . . . . 95
11.3 Remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

12 Bandlimited transmission 100


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
12.2 Sphere hardening (noise vector) . . . . . . . . . . . . . . . . . . . . . . . . . . 101
12.3 Sphere hardening (received vector) . . . . . . . . . . . . . . . . . . . . . . . . . 102
12.4 The interior volume of a hypersphere . . . . . . . . . . . . . . . . . . . . . . . . 102
12.5 An upper bound for the number |M| of signal vectors . . . . . . . . . . . . . . . 103
12.6 A lower bound to the number |M| of signal vectors . . . . . . . . . . . . . . . . 104
12.6.1 Generating a random code . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.6.2 Error probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
12.6.3 Achievability result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
12.7 Capacity of the bandlimited channel . . . . . . . . . . . . . . . . . . . . . . . . 106
12.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

13 Wideband transmission 108


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
13.2 Capacity of the wideband channel . . . . . . . . . . . . . . . . . . . . . . . . . 108

13.3 Relation between capacities, signal-to-noise ratio . . . . . . . . . . . . . . . . . 109


13.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

IV Channel Models 111


14 Pulse amplitude modulation 112
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
14.2 Orthonormal pulses: the Nyquist criterion . . . . . . . . . . . . . . . . . . . . . 113
14.2.1 The Nyquist result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
14.2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
14.2.3 Proof of the Nyquist result . . . . . . . . . . . . . . . . . . . . . . . . . 115
14.2.4 Receiver implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 117
14.3 Multi-pulse transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
14.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

15 Bandpass channels 120


15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
15.2 Quadrature multiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
15.3 Optimum receiver for quadrature multiplexing . . . . . . . . . . . . . . . . . . . 125
15.4 Transmission of a complex signal . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.5 Dimensions per second . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
15.6 Capacity of the bandpass channel . . . . . . . . . . . . . . . . . . . . . . . . . . 128
15.7 Quadrature amplitude modulation . . . . . . . . . . . . . . . . . . . . . . . . . 128
15.8 Serial quadrature amplitude modulation . . . . . . . . . . . . . . . . . . . . . . 128
15.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

16 Random carrier-phase 130


16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
16.2 Optimum incoherent reception . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
16.3 Signals with equal energy, receiver implementation . . . . . . . . . . . . . . . . 133
16.4 Envelope detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
16.5 Probability of error for two orthogonal signals . . . . . . . . . . . . . . . . . . . 137
16.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

V Coded Modulation 142


17 Coding for bandlimited channels 143
17.1 AWGN channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
17.2 |A|-Pulse amplitude modulation . . . . . . . . . . . . . . . . . . . . . . . . . . 144
17.3 Normalized S/N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
17.4 Baseline performance in terms of S/N norm . . . . . . . . . . . . . . . . . . . . 146

17.5 The promise of coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147


17.6 Union bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
17.6.1 Sequence error probability . . . . . . . . . . . . . . . . . . . . . . . . . 148
17.6.2 Symbol error probability . . . . . . . . . . . . . . . . . . . . . . . . . . 149
17.7 Extended Hamming codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
17.8 Coding by set-partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
17.8.1 Construction of the signal set . . . . . . . . . . . . . . . . . . . . . . . . 150
17.8.2 Distance between the signals . . . . . . . . . . . . . . . . . . . . . . . . 151
17.8.3 Asymptotic coding gain . . . . . . . . . . . . . . . . . . . . . . . . . . 152
17.8.4 Effective coding gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
17.8.5 Coding gains for more complex codes . . . . . . . . . . . . . . . . . . . 155
17.9 Remark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
17.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

VI Appendices 158
A An upper bound for the Q-function 159

B The Fourier transform 161


B.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
B.2 Some properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
B.2.1 Parseval’s relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

C Impulse signal, filters 163


C.1 The impulse or delta signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
C.2 Linear time-invariant systems, filters . . . . . . . . . . . . . . . . . . . . . . . . 163

D Correlation functions, power spectra 165


D.1 Expectation of an integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
D.2 Power spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
D.3 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
D.4 Wide-sense stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
D.5 Gaussian processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
D.6 Properties of Sx ( f ) and R x (τ ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

E Gram-Schmidt procedure, proof, example 169


E.1 Proof of result 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
E.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

F Schwarz inequality 174

G Bound error probability orthogonal signaling 176



H Decibels 179

I Elementary energy E pam of an |A|-PAM constellation 180


Part I

Some History

Chapter 1

Historical facts related to communication

SUMMARY: In this chapter we mention some historical facts related to telegraphy and
telephony, wireless communication, electronics, and modulation. Furthermore we shortly
discuss some classic fundamental bounds and concepts relevant for digital communication.

1.1 Telegraphy and telephony


1800 Alessandro Volta (Italy 1745 - 1827) Announcement of the electric battery (Volta’s pile).
This battery made constant-current electricity possible. Motivated by the “frog-leg” exper-
iments of Luigi Galvani, anatomy professor at the university of Bologna, Volta discovered
that it was possible to construct a battery from a plate of silver and a plate of zinc separated
by spongy matter impregnated with a saline solution, repeated thirty or forty times.

1819 Hans Christian Oersted (Denmark 1777 - 1851) Discovery of the fact that an electric
current generates a magnetic field. This can be regarded as the first electro-magnetic re-
sult. It took twenty years after Volta’s invention to realize and prove that this effect exists
although it was known at the time that a stroke of lightning magnetized objects.

1827 Joseph Henry (U.S. 1797 - 1878) Constructed electromagnets using coils of wire.

1838 Samuel Morse (U.S. 1791 - 1872) Demonstration of the electric telegraph in Morristown,
New Jersey. The first telegraph line linked Washington with Baltimore. It became opera-
tional in May 1844. By 1848 every state east of the Mississippi was linked by telegraph
lines. The alphabet that was designed by Morse (a portrait-painter) transforms letters into
variable-length sequences (code words) of dots and dashes (see table 1.1). A dash should
last roughly three times as long as a dot. Frequent letters (e.g. E) get short code words,
less frequently occurring letters (e.g. J, Q, and Y) are represented by longer words. The
first transcontinental telegraph line was completed in 1861. A transatlantic cable became
operational in 1866. However already in 1858 a cable was completed but it failed after a
few weeks.


A .- G --. M -- S ... Y -.--


B -... H .... N -. T - Z --..
C -.-. I .. O --- U ..- period .-.-.-
D -.. J .--- P .--. V ...- ? ..--..
E . K -.- Q --.- W .--
F ..-. L .-.. R .-. X -..-
Table 1.1: Morse code.

It should be mentioned that in 1833 C.F. Gauss and W.E. Weber demonstrated an electro-
magnetic telegraph in Göttingen, Germany.

1875 Alexander Graham Bell (Scotland 1847 - 1922 U.S.) Invention of the telephone. Bell
was a teacher of the deaf. His invention was patented in 1876 (electro-magnetic tele-
phone). In 1877 Bell established the Bell Telephone Company. Early versions provided
service over several hundred miles. Advances in quality resulted e.g. from the invention
of the carbon microphone.

1900 Michael Pupin (Yugoslavia 1858 - 1935 U.S.) Obtained a patent on loading coils for the
improvement of telephone communications. Adding these Pupin-coils at specified inter-
vals along a telephone line reduced attenuation significantly. The same invention was
disclosed by Campbell two days later than Pupin. Pupin sold his patent for $455,000 to
AT&T Co. Pupin (and Campbell) were the first to set up a theory of transmission lines.
The implications of this theory were not at all obvious to "experimentalists".

1.2 Wireless communications


1831 Michael Faraday (England 1791 - 1867) Discovery of the fact that a changing magnetic
field induces an electric current in a conducting circuit. This is the famous law of induction.

1873 James Clerk Maxwell (Scotland 1831 - 1879, England) The publication of “Treatise on
Electricity and Magnetism”. Maxwell combined the results obtained by Oersted and Fara-
day into a single theory. From this theory he could predict the existence of electromagnetic
waves.

1886 Heinrich Hertz (Germany 1857 - 1894) Demonstration of the existence of electromag-
netic waves. In his laboratory at the university of Karlsruhe, Hertz used a spark-transmitter
to generate the waves and a resonator to detect them.

1890 Edouard Branly (France 1844 - 1940) Origination of the coherer. This is a receiving de-
vice for electromagnetic waves based on the principle that, while most powdered metals are
poor direct current conductors, metallic powder becomes conductive when high-frequency
current is applied.

1896 Aleksander Popov (Russia 1859 - 1906) Wireless telegraph transmission between two
buildings (200 meters) was shown to be possible. Marconi did similar experiments at
roughly the same time.

1901 Guglielmo Marconi (Italy 1874 - 1937) A radio signal was received at St. John’s, New-
foundland. The radio signal had originated from Cornwall, England, 1700 miles away.

1955 John R. Pierce (U.S. 1910 - 2002) Pierce (Bell Laboratories) proposed the use of satel-
lites for communications and did pioneering work in this area. Originally this idea came
from Arthur C. Clarke who suggested already in 1945 the idea to use earth-orbiting satel-
lites as relay points between earth stations. The first satellite, Telstar I, built by Bell Lab-
oratories, was launched in 1962. It served as a relay station for TV programs across the
Atlantic. Pierce also headed the Bell-Labs team that developed the transistor and suggested
the name for it.

1.3 Electronics
1874 Karl Ferdinand Braun (Germany 1850 - 1918) Observation of rectification at metal con-
tacts to galena (lead sulfide). These semiconductor devices were the forerunners of the
“cat’s whisker” diodes used for detection of radio channels (crystal detectors).

1904 John Ambrose Fleming (England 1849 - 1945) Invention of the thermionic diode, “the
valve”, that could be used as rectifier. A rectifier can convert trains of high-frequency
oscillations into trains of intermittent but unidirectional current. A telephone can then be
used to produce a sound with the frequency of the trains of the sparks.

1906 Lee DeForest (U.S. 1873 - 1961) Invention of the vacuum triode, a diode with grid con-
trol. This invention made practical the cascade amplifier, the triode oscillator and the
regenerative feedback circuit. As a result transcontinental telephone transmission became
operational in 1915. A transatlantic cable was not laid until 1953.

1948 John Bardeen, Walter Brattain and William Shockley Development of the theory of the
junction transistor. This transistor was first fabricated in 1950. The point contact transistor
was invented in December 1947 at Bell Telephone Labs.

1958 Jack Kilby and Robert Noyce (U.S.) Invention of the integrated circuit.

1.4 Classical modulation methods


Beginning of the 20th century Carrier-frequency currents were generated by electric arcs or
by high-frequency alternators. These currents were modulated by carbon transmitters and
demodulated (or converted to audio) by a crystal detector or another kind of rectifier.

1909 George A. Campbell (U.S. 1870 - 1954) Internal AT&T memorandum on the “Electric
Wave Filter”. Campbell there discussed band-pass filters that would reject frequencies
other than those in a narrow band. A patent application filed in 1915 was awarded [2].

1915 Edwin Howard Armstrong (U.S. 1890 - 1954) Invention of regenerative amplifiers and
oscillators. Regenerative amplification considerably increased the sensitivity of the re-
ceiver. DeForest claimed the same invention.

1915 Hendrik van der Bijl, Ralph V.L. Hartley, and Raymond A. Heising In a patent that
was filed in 1915 van der Bijl showed how the non-linear portion of a vacuum tube could be
utilized to modulate and demodulate. Heising participated in a project (in Arlington, VA) in
which a vacuum-tube transmitter based on the van der Bijl modulator was designed. Heis-
ing invented the constant-current modulator. Hartley also participated in the Arlington
project and designed the receiver. He invented an oscillator that was named after him. In a
paper published in 1923 Hartley explained how suppression of the carrier signal and using
only one of the sidebands could economize transmitter power and reduce interference.

1915 John R. Carson Publication of a mathematical analysis of the modulation and demodula-
tion process. Description of single-sideband and suppressed carrier methods.

1918 Edwin Howard Armstrong Invention of the superheterodyne radio receiver. The combi-
nation of the received signal with a local oscillation resulted in an audio beat-note.

1922 John R. Carson “Notes on the Theory of Modulation [3]”. Carson compared amplitude
modulation (AM) and frequency modulation (FM). He came to the conclusion that FM
was inferior to AM from the perspective of bandwidth requirement and distortion. He
overlooked the possibility that wide-band FM might offer some advantages over AM as
was later demonstrated by Armstrong.

1933 Edwin Howard Armstrong Demonstration of an FM system to RCA. A patent was gran-
ted to Armstrong. Despite the larger bandwidth that is needed, FM gives considerably
better performance than AM.

1.5 Fundamental bounds and concepts


1924 Harry Nyquist (Sweden 1889 - 1976 U.S.) Shows that the number of resolvable (non-
interfering) pulses that can be transmitted per second over a bandlimited channel is pro-
portional to the channel bandwidth [15]. If the bandwidth is W Hz the number of pulses
per second cannot exceed 2W . This is strongly related to the sampling theorem which says
that time is essentially discrete. If a time-continuous signal has bandwidth not larger than
W Hz then it is completely specified by samples taken from it at discrete time instants
1/2W seconds apart.

[Block diagram: the signal s(t) and the noise n(t) are added; the sum passes through a linear filter whose output is ŝ(t).]

Figure 1.1: The communication problem considered by Wiener.

1928 Ralph V.L. Hartley (U.S. 1888 - 1970) Determined the number of distinguishable pulse
amplitudes [8]. Suppose that the amplitudes are confined to the interval [−A, +A], and
that the receiver can estimate an amplitude reliably only to an accuracy of ±1. Then
the number of distinguishable pulses is roughly 2(A + 1)/2 = A + 1. Clearly Hartley was
concerned with digital communication. He realized that inaccuracy (due to noise) limited
the amount of information that could be transmitted.

1938 Alec Reeves Invention of Pulse Code Modulation (PCM) for digital encoding of speech
signals. In World War II PCM-encoding allowed transmission of encrypted speech (Bell
Laboratories). In [16] Oliver, Pierce and Shannon compared PCM to classical modulation
systems.

1939 Homer Dudley Description of a vocoder. This system overthrew the idea that communication requires a bandwidth at least as wide as that of the signal to be communicated.

1942 Norbert Wiener (U.S. 1894 - 1964) Investigated the problem shown in figure 1.1. The
linear filter is to be chosen such that its output ŝ(t) is the best mean-square approximation
to s(t) given the statistical properties of the processes S(t) and N (t). A drawback of
the result of Wiener was that modulation did not fit into his model and could not be
analyzed.

1943 Dwight O. North Discovery of the matched filter. This filter was shown to be the optimum
detector of a known signal in additive white noise.

1947 Vladimir A. Kotelnikov (Russia 1908 - ) Analysis of several modulation systems. His
noise immunity work [11] dealt with how to design receivers (but not how to choose the
transmitted signals) to minimize error probability in the case of digital signals or mean
square error in the case of analog signals. Kotelnikov discovered the sampling theorem in 1933, independently of Nyquist.

1948 Claude E. Shannon (U.S. 1916 - 2001) Publication of “A Mathematical Theory of Com-
munication [20].” Shannon (see figure 1.2) showed that noise does not place an inescapable
restriction on the accuracy of communication. He showed that noise properties, channel
bandwidth, and restricted signal magnitude can be incorporated into a single parameter C
which he called the channel capacity. If the cardinality |M| of the message M grows as a
function of the signal duration T slowly enough such that

(1/T) log2 |M| < C,

then arbitrarily high communication accuracy is possible by increasing T , i.e. by taking
longer and longer signals (code words). Conversely Shannon showed that reliable communication is not possible when

(1/T) log2 |M| > C.

Figure 1.2: Claude E. Shannon, founder of Information Theory. Photo IEEE-IT Soc. Newsl., Summer 1998.
Therefore a channel can be considered as a pipe through which bits can reliably be transmitted up to rate C.
Other results of Shannon include the application of Boole’s algebra to switching circuits
(M.S. thesis, MIT, 1938). It was also Shannon who introduced the sampling theorem to
the engineering community in 1949.
Part II

Optimum Receiver Principles

Chapter 2

Decision rules for discrete channels

SUMMARY: In this first chapter we investigate a discrete channel, i.e. a channel with a
discrete input and output alphabet. For this discrete channel we determine the optimum
receiver, i.e. the receiver that minimizes the error probability PE . The so-called maximum
a-posteriori (MAP) receiver turns out to be optimum. When all messages are equally likely
the optimum receiver reduces to a maximum-likelihood (ML) receiver. In the last section
of this chapter we distinguish between discrete scalar and vector channels.

2.1 System description

[Block diagram: source → transmitter → discrete channel → receiver → destination, with message m, signal sm, channel output r, and estimate m̂.]

Figure 2.1: Elements in a communication system based on a discrete channel.

We start by considering a very simple communication system based on a discrete channel. It consists of the following elements (see figure 2.1):

SOURCE An information source produces message m ∈ M = {1, 2, · · · , |M|}, one out of
|M| alternatives. Message m occurs with probability Pr{M = m} for m ∈ M. This
probability is called the a-priori message probability and M is the name of the random
variable associated with this mechanism.

TRANSMITTER The transmitter sends a signal sm if message m is to be transmitted. This


signal is input to the channel. It assumes values from the discrete channel input alphabet
S. The random variable corresponding to the signal is denoted by S. The collection of
used signals is s1 , s2 , · · · , s|M| .


DISCRETE CHANNEL The channel produces an output r that assumes a value from a discrete
alphabet R. If the channel input is signal s ∈ S the output r ∈ R occurs with conditional
probability Pr{R = r |S = s}. This channel output is directed to the receiver. The random
variable associated with the channel output is denoted by R.
When message m ∈ M occurs, the transmitter chooses sm as channel input. Therefore

Pr{R = r |M = m} = Pr{R = r |S = sm } for all r ∈ R. (2.1)

These conditional probabilities describe the behavior of the transmitter followed by the
channel.

RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) by looking
at the received channel output r ∈ R, hence m̂ = f (r ). The random variable correspond-
ing to this estimate is called M̂. We assume here that m̂ ∈ M = {1, 2, · · · , |M|}, thus the
receiver has to choose one of the possible messages (and cannot, for example, declare an error).

DESTINATION The destination accepts the estimate m̂.

Note that we call our system discrete because the channel is discrete. It has a discrete input
and output alphabet.
The performance of our communication system is evaluated by considering the probability
of error. This performance depends on the receiver, i.e. on the mapping f (·).

Definition 2.1 The mapping f (·) is called the decision rule.

Definition 2.2 The probability of error is defined as


PE = Pr{M̂ ≠ M}.   (2.2)

We are now interested in choosing a decision rule f (·) that minimizes PE . A decision rule that
minimizes the error probability PE is called optimum. The corresponding receiver is called an
optimum receiver.
The probability of correct (right) decision is denoted as PC and obviously PE + PC = 1.

2.2 Decision rules, an example


To get more familiar with the properties of our system we study an example first.

Example 2.1 We assume that |M| = 2, i.e. there are two possible messages. Their a-priori probabilities
can be found in the following table:

m Pr{M = m}
1 0.4
2 0.6

The two signals corresponding to the messages are s1 and s2 and the conditional probabilities Pr{R =
r |S = sm } are given in the table below for the values of r and m that can occur. There are three values that
r can assume, i.e. R = {a, b, c}.

m Pr{R = a|S = sm } Pr{R = b|S = sm } Pr{R = c|S = sm }


1 0.5 0.4 0.1
2 0.1 0.3 0.6

Note that the row sums are equal to one.


Now suppose that we use the decision rule f (·) that is given by:

r a b c
f (r ) 1 1 2

This means that the receiver outputs the estimate m̂ = 1 if the channel output is a or b and m̂ = 2 if the
channel output is c. To determine the probability of error PE that is achieved by this rule we first compute
the joint probabilities

Pr{M = m, R = r } = Pr{M = m} Pr{R = r |M = m}


= Pr{M = m} Pr{R = r |S = sm }, (2.3)

where we used (2.1). We list these joint probabilities in the following table:

m Pr{M = m, R = a} Pr{M = m, R = b} Pr{M = m, R = c}


1 0.20 0.16 0.04
2 0.06 0.18 0.36

Now we can determine the probability of correct decision PC = 1 − PE which is:

PC = Pr{M = 1, R = a} + Pr{M = 1, R = b} + Pr{M = 2, R = c}


= 0.20 + 0.16 + 0.36 = 0.72. (2.4)

This decision rule is not optimum. To see why, note that for every r ∈ R a certain message m̂ ∈
M is chosen. In other words the decision rule selects in each column exactly one joint probability.
These selected probabilities are added together to form PC . Our decision rule selects the joint probability
Pr{M = 1, R = b} = 0.16 in the column that corresponds to R = b. A larger PC is obtained if for
output R = b instead of message 1 the message 2 is chosen. In that case the larger joint probability
Pr{M = 2, R = b} = 0.18 is selected. Now the probability of correct decision becomes

PC = Pr{M = 1, R = a} + Pr{M = 2, R = b} + Pr{M = 2, R = c}


= 0.20 + 0.18 + 0.36 = 0.74. (2.5)

We may conclude that, in general, to maximize PC , we have to choose the m that achieves the largest
Pr{M = m, R = r } for the r that was received. This will be made more precise in the following section.
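
This bookkeeping is easy to automate. The short Python sketch below (a minimal illustration only; the probability tables are those of example 2.1 and the helper names are made up) evaluates PC for any decision rule and reproduces the values 0.72 and 0.74.

    # Probability of correct decision P_C for a given decision rule (example 2.1).
    prior = {1: 0.4, 2: 0.6}                       # Pr{M = m}
    channel = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},  # Pr{R = r | S = s_m}
               2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

    def prob_correct(f):
        # P_C = sum over r of Pr{M = f(r)} Pr{R = r | S = s_f(r)}.
        return sum(prior[f[r]] * channel[f[r]][r] for r in f)

    print(prob_correct({'a': 1, 'b': 1, 'c': 2}))  # 0.72, the rule of the example
    print(prob_correct({'a': 1, 'b': 2, 'c': 2}))  # 0.74, the improved rule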

2.3 The “maximum a-posteriori” decision rule


We can upper bound the probability of correct decision as follows:

PC = ∑_r Pr{M = f (r), R = r} ≤ ∑_r max_m Pr{M = m, R = r}.   (2.6)

This upper bound is achieved if for all r the estimate f (r ) corresponds to the message m that
achieves the maximum in maxm Pr{M = m, R = r }, hence an optimum decision rule for the
discrete channel achieves

Pr{M = f (r ), R = r } ≥ Pr{M = m, R = r }, for all m ∈ M, (2.7)

for all r ∈ R that can be received, i.e. that have Pr{R = r } > 0. Note that it is possible that for
some channel outputs r ∈ R more than one decision f (r ) is optimum.

Definition 2.3 For a communication system based on a discrete channel the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m} = Pr{M = m} Pr{R = r|S = sm},   (2.8)

indexed by m ∈ M for the given received output value r, are called the decision variables.
An optimum decision rule f (·) is based on these variables.

RESULT 2.1 (MAP decision rule) To minimize the probability of error PE , the decision rule
f (·) should produce for each received r a message m having the largest decision variable. Hence
for r ∈ R that actually can occur, i.e. that have Pr{R = r } > 0,

f (r) = arg max_{m∈M} Pr{M = m} Pr{R = r|S = sm}.   (2.9)

For such r we can divide all decision variables by Pr{R = r }. This results in the a-posteriori
probabilities of the message m ∈ M given the channel output r . By the Bayes rule
Pr{M = m} Pr{R = r|S = sm} / Pr{R = r} = Pr{M = m|R = r}, for all m ∈ M,   (2.10)
therefore the optimum receiver chooses as m̂ a message that has maximum a-posteriori proba-
bility (MAP). This decision rule is called the MAP-decision rule, the receiver is called a MAP-
receiver.

Example 2.2 Below are the a-posteriori probabilities that correspond to the example in the previous sec-
tion (obtained from Pr{R = a} = 0.26, Pr{R = b} = 0.34, and Pr{R = c} = 0.4). Note that the
probabilities in a column add up to one now.

m Pr{M = m|R = a} Pr{M = m|R = b} Pr{M = m|R = c}


1 20/26 16/34 4/40
2 6/26 18/34 36/40

From this table it is clear that the optimum decision rule is

r a b c
f (r ) 1 2 2

This leads to an error probability


PE = ∑_m Pr{M = m} Pr{M̂ ≠ m|M = m} = 0.4(0.4 + 0.1) + 0.6 · 0.1 = 0.26,   (2.11)

where Pr{M̂ ≠ m|M = m} = ∑_{r: f(r)≠m} Pr{R = r|S = sm} is the probability of error conditional on the fact that message m was sent.
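
The MAP rule (2.9) can be applied mechanically to such tables. The following minimal Python sketch (same illustrative tables as above) reproduces the optimum rule f(a) = 1, f(b) = 2, f(c) = 2 and the error probability 0.26.

    # MAP decision rule: for each r pick the m maximizing Pr{M = m} * Pr{R = r | S = s_m}.
    prior = {1: 0.4, 2: 0.6}
    channel = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},
               2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

    f_map = {r: max(prior, key=lambda m: prior[m] * channel[m][r])
             for r in ('a', 'b', 'c')}
    print(f_map)                                   # {'a': 1, 'b': 2, 'c': 2}

    # Error probability P_E = 1 - P_C of the MAP rule.
    p_correct = sum(prior[f_map[r]] * channel[f_map[r]][r] for r in f_map)
    print(1.0 - p_correct)                         # approximately 0.26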

2.4 The “maximum likelihood” decision rule


When all messages are equally likely, i.e. when

Pr{M = m} = 1/|M| for all m ∈ M = {1, 2, · · · , |M|},   (2.12)

we get for the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m} = (1/|M|) Pr{R = r|S = sm}.   (2.13)
Considering (2.7) we obtain:

RESULT 2.2 (ML decision rule) If the a-priori message probabilities are all equal, in order to
minimize the error probability PE , a decision rule f (·) has to be applied that satisfies

f (r) = arg max_{m∈M} Pr{R = r|S = sm},   (2.14)

for r ∈ R that can occur, i.e. for r with Pr{R = r} > 0. Such a decision rule is called a maxi-
mum likelihood (ML) decision rule, the resulting receiver is a maximum likelihood receiver.
Given the received channel output r ∈ R the receiver chooses a message m̂ ∈ M for which
the received r has maximum likelihood. The transition probabilities Pr{R = r |S = sm } for
r ∈ R and m ∈ M are called likelihoods. A receiver that operates like this (no matter whether
or not the messages are indeed equally likely) is called a maximum likelihood receiver. It should
be noted that such a receiver is less complex than a maximum a-posteriori receiver since it does
not have to multiply the transition probabilities Pr{R = r |S = sm } by the a-priori probabilities
Pr{M = m}.

Example 2.3 Consider again the transition probabilities of the example in section 2.2:

m Pr{R = a|S = sm } Pr{R = b|S = sm } Pr{R = c|S = sm }


1 0.5 0.4 0.1
2 0.1 0.3 0.6
From this table follows the maximum-likelihood decision rule which is
r a b c
f (r ) 1 1 2
When the messages 1 and 2 both have a-priori probability 1/2 the probability of correct decision is

PC = ∑_m Pr{M = m} Pr{M̂ = m|M = m} = (1/2)(0.5 + 0.4) + (1/2) · 0.6 = 0.75.   (2.15)

Here Pr{M̂ = m|M = m} = ∑_{r: f(r)=m} Pr{R = r|S = sm} is the probability of a correct decision conditioned on the fact that message m was sent. Note that the maximum likelihood decision rule is optimum here since both messages have the same a-priori probability.
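
The ML rule simply drops the a-priori probabilities from the maximization. A minimal Python sketch (same illustrative tables as before) gives the rule above and, for equal priors, PC = 0.75.

    # ML decision rule: for each r pick the m maximizing the likelihood Pr{R = r | S = s_m}.
    channel = {1: {'a': 0.5, 'b': 0.4, 'c': 0.1},
               2: {'a': 0.1, 'b': 0.3, 'c': 0.6}}

    f_ml = {r: max(channel, key=lambda m: channel[m][r]) for r in ('a', 'b', 'c')}
    print(f_ml)                                            # {'a': 1, 'b': 1, 'c': 2}

    # With equal priors Pr{M = 1} = Pr{M = 2} = 1/2 the ML rule is also the MAP rule:
    print(sum(0.5 * channel[f_ml[r]][r] for r in f_ml))    # P_C = 0.75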

2.5 Scalar and vector channels

[Block diagram: source → transmitter → discrete vector channel → receiver → destination; the transmitter sends the vector sm = (sm1, . . . , smN), the channel delivers r = (r1, . . . , rN).]

Figure 2.2: Elements in a communication system based on a discrete vector channel.

So far we have considered in this chapter a discrete channel that accepts a single input s ∈ S
and produces a single output r ∈ R. This channel could be called a discrete scalar channel.
There is also a vector variant of this channel, see figure 2.2. The components of a discrete vector
system are described below.

TRANSMITTER Let N be a positive integer. Then the vector transmitter sends a signal vector
s m = (sm1 , sm2 , · · · , sm N ) of N components from the discrete alphabet S if message m
is to be conveyed. This signal vector is input to the discrete vector channel. The random
variable corresponding to the signal vector is denoted by S. The collection of used
signal vectors is s 1 , s 2 , · · · , s |M| .

DISCRETE VECTOR CHANNEL This channel produces an output vector r = (r1 , r2 , · · · ,


r N ) with N components all assuming a value from a discrete alphabet R. If the channel
input was signal s ∈ S N the output r ∈ R N occurs with conditional probability Pr{R =

r |S = s}. This channel output vector is directed to the vector receiver. The random
variable associated with the channel output is denoted by R.
When message m ∈ M occurs, s m is chosen by the transmitter as channel input vector.
Then the conditional probabilities

Pr{R = r |M = m} = Pr{R = r |S = s m } for all r ∈ R, (2.16)

describe the behavior of the vector transmitter followed by the discrete vector channel.

RECEIVER The vector receiver forms an estimate m̂ of the transmitted message (or signal) by
looking at the received channel output vector r ∈ R N , hence m̂ = f (r ). The mapping
f (·) is again called the decision rule.

It will not be a big surprise that we define the decision variables for the discrete vector channel
as follows:

Definition 2.4 For a communication system based on a discrete vector channel the decision
variables are again the joint probabilities

Pr{M = m, R = r} = Pr{M = m} Pr{R = r|M = m} = Pr{M = m} Pr{R = r|S = sm},   (2.17)

which are, given the received output vector r, indexed by m ∈ M.


The following result tells us what the optimum decision rule should be for the discrete vector
channel:

RESULT 2.3 (MAP decision rule for the discrete vector channel) To minimize the probabil-
ity of error PE , the decision rule f (·) should produce for each received vector r a message m
having the largest decision variable. Hence for vectors r ∈ R N that actually can occur, i.e. that
have Pr{R = r } > 0,

f (r) = arg max_{m∈M} Pr{M = m} Pr{R = r|S = sm}.   (2.18)

This decision rule is the MAP-decision rule.


There is obviously also a "maximum likelihood" version of this result.
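
As an illustration of rule (2.18) only, the sketch below assumes a memoryless discrete vector channel, so that Pr{R = r|S = s} factors into per-component transition probabilities; this factorization and the helper names are assumptions made for the example, not statements from the text.

    # MAP decoding over a discrete vector channel, assuming (for this sketch only) a
    # memoryless channel: Pr{R = r | S = s} = product over i of Pr{R_i = r_i | S_i = s_i}.
    from math import prod

    def map_decode(r, signals, prior, trans):
        # signals[m] is the vector s_m; trans[(s, y)] = Pr{R_i = y | S_i = s}.
        def joint(m):
            return prior[m] * prod(trans[(s, y)] for s, y in zip(signals[m], r))
        return max(signals, key=joint)

    # Toy example: two binary codewords on a binary symmetric channel with cross-over p = 0.1.
    p = 0.1
    trans = {(0, 0): 1 - p, (0, 1): p, (1, 0): p, (1, 1): 1 - p}
    signals = {1: (0, 0, 0), 2: (1, 1, 1)}
    prior = {1: 0.5, 2: 0.5}
    print(map_decode((0, 1, 0), signals, prior, trans))    # 1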

2.6 Exercises
1. Consider the noisy discrete communication channel illustrated in figure 2.3.

(a) If Pr{M = 1} = 0.7 and Pr{M = 2} = 0.3, determine the optimum decision rule
(assignment of a, b or c to 1, 2) and the resulting probability of error PE .

[Channel diagram: input 1 goes to outputs a, b, c with probabilities 0.7, 0.2, 0.1; input 2 goes to outputs a, b, c with probabilities 0.3, 0.2, 0.5.]

Figure 2.3: A noisy discrete communication system with input M and output R.

(b) There are eight decision rules. Plot the probability of error for each decision rule
versus Pr{M = 1} on one graph.
(c) Each decision rule has a maximum probability of error which occurs for some least
favorable a-priori probability Pr{M = 1}. The decision rule whose maximum probability of
error is smallest is called the minimax decision rule. Which of the eight rules is minimax?

(Exercise 2.12 from Wozencraft and Jacobs [25].)

2. Consider a communication system based on a discrete channel. The number of messages


is five i.e. M = {1, 2, 3, 4, 5}. The a-priori probabilities are Pr{M = 1} = Pr{M = 5} =
0.1, Pr{M = 2} = Pr{M = 4} = 0.2, and Pr{M = 3} = 0.4. The signal corresponding to
message m is equal to m, thus sm = m, for m ∈ M. This signal is the input for the discrete
channel.
The channel adds the noise n to the signal, hence the channel output

r = sm + n. (2.19)

The noise variable N is independent of the channel input and can have the values −1, 0,
and +1 only. It is given that Pr{N = 0} = 0.4 and Pr{N = −1} = Pr{N = +1} = 0.3.

(a) Note that the channel output assumes values in {0, 1, 2, 3, 4, 5, 6}. Determine the
probability Pr{R = 2}. For each message m ∈ M compute the a-posteriori probabil-
ity Pr{M = m|R = 2}.
(b) Note that an optimum receiver minimizes the probability of error PE . Give for each
possible channel output the estimate m̂ that an optimum receiver will make. Deter-
mine the resulting error probability.
(c) Consider a maximum-likelihood receiver. Give for each possible channel output the
estimate m̂ that a maximum-likelihood receiver will make. Again determine the re-
sulting probability of error.

(Exam Communication Principles, July 5, 2002.)



3. Consider an optical communication system. A binary message is transmitted by switching


the intensity λ of a laser ”on” or ”off” within the time-interval [0, T ]. For message m = 1
(”off”) the intensity λ = λoff ≥ 0. For message m = 2 (”on”) the intensity λ = λon > λoff .
The laser produces photons that are counted by a detector. The number K of photons is a
non-negative random variable, and the corresponding probabilities are
Pr{K = k} = ((λT)^k / k!) exp(−λT), for k = 0, 1, 2, · · · .
This probability distribution is known as Poisson distribution.
The a-priori message probabilities are Pr{M = 1} = poff and Pr{M = 2} = pon . It is
assumed that 0 < pon = 1 − poff < 1.
The detector forms the message estimate m̂ based on the number k of photons that are
counted.

(a) Suppose that the detector chooses m̂ such that the probability of error PE = Pr{ M̂ 6=
M} is minimized. For what values of k does the detector choose m̂ = 1 and when
does it choose m̂ = 2?
(b) Consider a maximum-likelihood detector. For what values of k does this detector
choose m̂ = 1 and when does it choose m̂ = 2 now?
(c) Assume that λoff = 0. Consider a maximum-a-posteriori detector. For what values
of pon does the detector choose m̂ = 2 no matter what k is? What is in that case the
error probability PE ?

(Exam Communication Principles, July 8, 2003.)

[Channel diagram: inputs 0 and 1; each input is received correctly with probability 1 − p and flipped with probability p.]

Figure 2.4: A binary symmetric channel with cross-over probability p.

4. Consider transmission over a binary symmetric channel with cross-over probability 0 <
p < 1/2 (see figure 2.4). All messages m ∈ M are equally likely. Each message m
now corresponds to a signal vector (codeword) s m = (sm1 , sm2 , · · · , sm N ) consisting of N
binary digits, hence smi ∈ {0, 1} for i = 1, . . . , N. Such a codeword (sequence) is transmitted
over the binary symmetric channel, the resulting channel output sequence is denoted by r .
The Hamming-distance dH (x, y) between two sequences x and y, both consisting of N
binary digits, is the number of positions at which they differ.

(a) Show that the optimum receiver should choose the codeword that has minimum Ham-
ming distance to the received channel output sequence.
(b) What is the optimum receiver for 1/2 < p < 1?
(c) Assume that the messages are not equally likely and that the cross-over probability
is p = 1/2. What is the minimum error probability now? What does the corresponding
receiver do?

(Exam Communication Principles, October 6, 2003.)


Chapter 3

Decision rules for the real scalar channel

SUMMARY: Here we will consider transmission of information over a channel with a single
real-valued input and output. Again the optimum receiver is determined. As an example
we investigate the additive Gaussian noise channel, i.e. the channel that adds Gaussian
noise to the input signal.

3.1 System description

[Block diagram: source → transmitter → real channel → receiver → destination, with message m, signal sm, channel output r, and estimate m̂.]

Figure 3.1: Elements in a communication system based on a real scalar channel.

A communication system based on a channel with real-valued input and output alphabet, i.e.
a real scalar channel, does not differ very much from a system based on a discrete channel (see
figure 3.1).

SOURCE An information source produces the message m ∈ M = {1, 2, · · · , |M|} with a-


priori probability Pr{M = m} for m ∈ M. Again M is the name of the random variable
associated with this mechanism.

TRANSMITTER The transmitter now sends a real-valued scalar signal sm if message m is


to be transmitted. The scalar signal is input to the channel. It assumes a value in the
range (−∞, ∞). The random variable corresponding to the signal is denoted by S. The
collection of used signals is s1 , s2 , · · · , s|M| .

REAL SCALAR CHANNEL The channel now produces an output r in the range (−∞, ∞).
When the input signal is the real-valued scalar s the channel output is generated according


to the conditional probability density function p R (r |S = s) and thus R is a real-valued


scalar random variable. The probability of receiving a channel output r ≤ R < r + dr is
equal to p R (r |S = s)dr for an infinitely small interval dr .
When message m ∈ M occurs, signal sm is chosen by the transmitter as channel input.
Then the conditional probability density function

p R (r |M = m) = p R (r |S = sm ) for all r ∈ (−∞, ∞), (3.1)

describes the behavior of the transmitter followed by the channel.

RECEIVER The receiver forms an estimate m̂ of the transmitted message (or signal) based on
the received real-valued scalar channel output r , hence m̂ = f (r ). The mapping f (·) is
called the decision rule.

DESTINATION The destination accepts the estimate m̂.

3.2 MAP and ML decision rules


For the real scalar channel we can write the probability of correct decision as
PC = ∫_{−∞}^{∞} Pr{M = f (r)} pR(r|M = f (r)) dr.   (3.2)

An optimum decision rule is obtained if, after receiving the scalar R = r , the decision f (r ) is
taken in such a way that

Pr{M = f (r )} p R (r |M = f (r )) ≥ Pr{M = m} p R (r |M = m) for all m ∈ M. (3.3)

This leads to the definition of the decision variables given below.


Alternatively we can assume that a value r ≤ R < r + dr was received. Then the decision
variables would be

Pr{M = m, r ≤ R < r + dr } = Pr{M = m} Pr{r ≤ R < r + dr |S = sm }


= Pr{M = m} p R (r |S = sm )dr. (3.4)

Also this reasoning leads to the definition of the decision variables below.

Definition 3.1 For a system based on a real scalar channel the products

Pr{M = m} p R (r |M = m) = Pr{M = m} p R (r |S = sm ) (3.5)

indexed by m ∈ M for the given received output r, are called the decision variables.
The optimum decision rule f (·) is again based on these variables. It is possible that for certain
channel outputs r more than one decision f (r) is optimum.

RESULT 3.1 (MAP) To minimize the error probability PE , the decision rule f (·) should be
such that for each received r a message m is chosen with the largest decision variable. Hence
for r that can be received, an optimum decision rule f (·) should satisfy

f (r) = arg max_{m∈M} Pr{M = m} pR(r|S = sm).   (3.6)

Both sides of the inequality (3.3) can be divided by p R (r ) for values of r that actually did occur.
Then we obtain
f (r) = arg max_{m∈M} Pr{M = m|R = r},   (3.7)
for r for which p R (r ) > 0. This rule is again called maximum a-posteriori (MAP) decision rule.

RESULT 3.2 (ML) When all messages have equal a-priori probabilities, i.e. when Pr{M = m} = 1/|M| for all m ∈ M, we observe from (3.6) that the optimum receiver has to choose

f (r) = arg max_{m∈M} pR(r|S = sm),   (3.8)

for all r with pR(r) > 0. This rule is referred to as the maximum likelihood (ML) decision rule.

3.3 The scalar additive Gaussian noise channel


3.3.1 Introduction
Gaussian noise is probably the most important kind of impairment. Therefore we will investigate
a simple communication situation based on a channel that adds a Gaussian noise sample n to the
real scalar channel input signal s (see figure 3.2).
[Block diagram: the channel adds noise n to the input s, giving r = s + n; the noise has density pN(n) = (1/√(2πσ²)) exp(−n²/(2σ²)).]

Figure 3.2: Scalar additive Gaussian noise channel.

Definition 3.2 The scalar additive Gaussian noise (AGN) channel adds Gaussian noise N to
the input signal S . This Gaussian noise N has variance σ 2 and mean 0. The probability density
function of the noise is defined to be
pN(n) = (1/√(2πσ²)) exp(−n²/(2σ²)).   (3.9)
The noise variable N is assumed to be independent of the signal S .

3.3.2 The MAP decision rule


We assume e.g. that |M| = 2, i.e. there are two messages and M can be either 1 or 2. The
corresponding two signals s1 and s2 are the inputs of our channel. Without loss of generality let
s1 > s2 .
The conditional probability density function of receiving R = r given the message m depends
only on the value n that the noise variable N gets. In order to receive R = r when the signal is
sm , the noise variable N should have value r − sm . The noise N is independent of the signal S
(and the message M). Therefore (assuming that dr is infinitely small)

pR(r|S = sm) = Pr{r ≤ R < r + dr|S = sm}/dr
= Pr{r − sm ≤ N < r − sm + dr}/dr
= pN(r − sm)
= (1/√(2πσ²)) exp(−(r − sm)²/(2σ²)), for m = 1, 2.   (3.10)

We obtain an optimum (maximum a-posteriori) receiver when m̂ = 1 is chosen if

Pr{M = 1} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)) ≥ Pr{M = 2} (1/√(2πσ²)) exp(−(r − s2)²/(2σ²)),   (3.11)

and m̂ = 2 otherwise. The following inequalities are all equivalent to (3.11):

ln Pr{M = 1} − (r − s1)²/(2σ²) ≥ ln Pr{M = 2} − (r − s2)²/(2σ²),
2σ² ln Pr{M = 1} − (r − s1)² ≥ 2σ² ln Pr{M = 2} − (r − s2)²,
2σ² ln Pr{M = 1} + 2r s1 − s1² ≥ 2σ² ln Pr{M = 2} + 2r s2 − s2²,
2r s1 − 2r s2 ≥ 2σ² ln(Pr{M = 2}/Pr{M = 1}) + s1² − s2².   (3.12)
RESULT 3.3 (Optimum receiver for the scalar additive Gaussian noise channel) A receiver
that decides m̂ = 1 if
r ≥ r* = (σ²/(s1 − s2)) ln(Pr{M = 2}/Pr{M = 1}) + (s1 + s2)/2,   (3.13)

and m̂ = 2 otherwise, is optimum. When the a-priori probabilities Pr{M = 1} and Pr{M = 2} are equal the optimum threshold is

r* = (s1 + s2)/2.   (3.14)
Example 3.1 Assume that σ² = 1 and s1 = +1 and s2 = −1. If the a-priori message probabilities are equal, i.e. when Pr{M = 1} = Pr{M = 2}, we obtain an optimum receiver if we decide for m̂ = 1 only when r ≥ r* = (s1 + s2)/2. The decision changes exactly halfway between s1 and s2. The intervals (−∞, r*) and (r*, ∞) are called decision intervals. In our case, since s2 = −s1, the threshold r* = 0. This can also be seen from figure 3.3 where the decision variables

Pr{M = 1} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)),   Pr{M = 2} (1/√(2πσ²)) exp(−(r − s2)²/(2σ²))   (3.15)

are plotted as a function of r assuming that Pr{M = 1} = Pr{M = 2} = 1/2.

Figure 3.3: Decision variables for equal a-priori probabilities as a function of r.

Next assume that the a-priori probabilities are not equal but let Pr{M = 1} = 3/4 and Pr{M = 2} = 1/4. Now the decision variables change (see figure 3.4) and we must also change the decision rule. It turns out that for r ≥ r* = −(ln 3)/2 ≈ −0.5493 the optimum receiver should choose m̂ = 1 (see again figure 3.4). The threshold r* has moved away from the more probable signal s1.

Figure 3.4: Decision variables as a function of r for a-priori probabilities 1/4 and 3/4.

Note that, no matter how much the a-priori probabilities differ, there is always a value of r, which we called the threshold r* before, for which (3.11) is satisfied with equality. For r > r* the left side in (3.11) is larger than the right side, for r < r* the right side is the largest.
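
A minimal Python sketch of the threshold rule (3.13), evaluated for the numbers used in this example (the function name is made up):

    # MAP threshold receiver for the scalar AGN channel with two messages, see (3.13).
    from math import log

    def threshold(s1, s2, sigma2, p1, p2):
        # Decide m_hat = 1 iff r >= r*; assumes s1 > s2.
        return (sigma2 / (s1 - s2)) * log(p2 / p1) + (s1 + s2) / 2

    print(threshold(+1, -1, 1.0, 0.5, 0.5))     # 0.0
    print(threshold(+1, -1, 1.0, 0.75, 0.25))   # -(ln 3)/2, approximately -0.5493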

3.3.3 The Q-function


To compute the probability of error we introduce the so-called Q-function (see figure 3.5).

Definition 3.3 (Q-function) This function of x ∈ (−∞, ∞) is defined as


Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα.   (3.16)
It is the probability that a Gaussian random variable with mean 0 and variance 1 assumes a value
larger than x .

Figure 3.5: Gaussian probability density function for mean 0 and variance 1. The shaded area
corresponds to Q(1).

A useful property of the Q-function is

Q(x) + Q(−x) = 1. (3.17)

In appendix A an upper bound for Q(x) is derived. In table 3.1 we tabulated Q(x) for several
values of x.

3.3.4 Probability of error


We now want to find an expression for the error probability of our scalar system with two mes-
sages. We write

PE = Pr{M = 1} Pr{R < r ∗ |M = 1} + Pr{M = 2} Pr{R ≥ r ∗ |M = 2}, (3.18)



x     Q(x)       x     Q(x)            x      Q(x)
0.00  0.50000    1.5   6.6681·10^−2    5.0    2.8665·10^−7
0.25  0.40129    2.0   2.2750·10^−2    6.0    9.8659·10^−10
0.50  0.30854    2.5   6.2097·10^−3    7.0    1.2798·10^−12
0.75  0.22663    3.0   1.3499·10^−3    8.0    6.2210·10^−16
1.00  0.15866    4.0   3.1671·10^−5    10.0   7.6200·10^−24
Table 3.1: Table of Q(x) for some values of x.

where
Pr{R ≥ r*|M = 2} = ∫_{r=r*}^{∞} (1/√(2πσ²)) exp(−(r − s2)²/(2σ²)) dr
= ∫_{r=r*}^{∞} (1/√(2π)) exp(−((r/σ) − s2/σ)²/2) d(r/σ)
= ∫_{µ=r*/σ−s2/σ}^{∞} (1/√(2π)) exp(−µ²/2) dµ = Q(r*/σ − s2/σ),   (3.19)

and similarly
Pr{R < r*|M = 1} = ∫_{−∞}^{r=r*} (1/√(2πσ²)) exp(−(r − s1)²/(2σ²)) dr
= ∫_{−∞}^{r=r*} (1/√(2π)) exp(−((r/σ) − s1/σ)²/2) d(r/σ)
= ∫_{−∞}^{µ=r*/σ−s1/σ} (1/√(2π)) exp(−µ²/2) dµ
= 1 − Q(r*/σ − s1/σ) (∗)= Q(s1/σ − r*/σ).   (3.20)

Note that the last equality (*) follows from (3.17).

RESULT 3.4 (Minimum probability of error) If we combine (3.18), (3.19), and (3.20), we obtain

PE = Pr{M = 1} Q((s1 − r*)/σ) + Pr{M = 2} Q((r* − s2)/σ).   (3.21)

If the a-priori message probabilities Pr{M = 1} and Pr{M = 2} are equal then, according to (3.14), we get r* = (s1 + s2)/2 and

(s1 − r*)/σ = (s1 − (s1 + s2)/2)/σ = (s1 − s2)/(2σ),
(r* − s2)/σ = ((s1 + s2)/2 − s2)/σ = (s1 − s2)/(2σ),   (3.22)

hence for the minimum probability of error we obtain

PE = Q((s1 − s2)/(2σ)).   (3.23)

Example 3.2 For s1 = +1, s2 = −1, and σ 2 = 1, we obtain if Pr{M = 1} = Pr{M = 2} that

PE = Q(1) ≈ 0.1587. (3.24)

For Pr{M = 1} = 3/4 and Pr{M = 2} = 1/4 we get that r* = −(ln 3)/2 ≈ −0.5493 and

PE = (3/4) Q(1 + (ln 3)/2) + (1/4) Q(−(ln 3)/2 + 1)
≈ 0.75 Q(1.5493) + 0.25 Q(0.4507)
≈ 0.75 · 0.0607 + 0.25 · 0.3261 ≈ 0.1270.   (3.25)

Note that this is smaller than Q(1) ≈ 0.1587 which would be the error probability after ML-detection
(which is suboptimum here).
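The numbers in example 3.2 can be checked by simulation. The following Python sketch is illustrative only and uses the stated parameters (s1 = +1, s2 = −1, σ² = 1, Pr{M = 1} = 3/4); it implements the threshold receiver and compares the measured error rate with (3.21).

import random
from math import erfc, log, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

s1, s2, sigma = 1.0, -1.0, 1.0
p1 = 0.75                                   # Pr{M = 1}; Pr{M = 2} = 1 - p1
r_star = -log(3.0) / 2.0                    # MAP threshold for these parameters

pe_formula = p1 * Q((s1 - r_star) / sigma) + (1 - p1) * Q((r_star - s2) / sigma)

random.seed(1)
trials, errors = 200_000, 0
for _ in range(trials):
    m = 1 if random.random() < p1 else 2
    r = (s1 if m == 1 else s2) + random.gauss(0.0, sigma)
    m_hat = 1 if r >= r_star else 2         # decide m_hat = 1 for r >= r*
    errors += (m_hat != m)

print(f"formula (3.21): {pe_formula:.4f}   simulation: {errors / trials:.4f}")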

3.4 Exercises
Figure 3.6: Conditional probability density functions: pR(r|M = 1) is a triangular density of height 1 on the interval [0, 2]; pR(r|M = 2) is a uniform density of height 1/4 on the interval [−1, 3].

1. A communication system is used to transmit one of two equally likely messages, 1 and 2.
The channel output is a real-valued random variable R, the conditional density functions of
which are shown in figure 3.6. Determine the optimum receiver decision rule and compute
the resulting probability of error.
(Exercise 2.23 from Wozencraft and Jacobs [25].)

2. The noise n in figure 3.7a is Gaussian with zero mean, i.e. E[N] = 0. If one of two equally likely messages is transmitted, using the signals of figure 3.7b, an optimum receiver yields PE = 0.01.

(a) What is the minimum attainable probability of error PEmin when the channel of 3.7a
is used with three equally likely messages and the signals of 3.7c? And with four
equally likely messages and the signals of 3.7d?
(b) How do the answers to the previous questions change if it is known that E[N ] = 1
instead of 0?

Figure 3.7: (a) The channel adds the noise n to the signal s, producing r = s + n; (b) two signals at −2 and +2; (c) three signals at −4, 0, +4; (d) four signals at −4, 0, +4, +8.

(Exercise 4.1 from Wozencraft and Jacobs [25].)


3. Show that p(n) = (1/√(2π)) exp(−n²/2) for −∞ < n < ∞ is a probability density function (non-negative, integrating to 1).
Chapter 4

Real vector channels

SUMMARY: In this chapter we discuss the channel whose input and output are vectors of
real-valued components. An important example of such a real vector channel is the additive
Gaussian noise (AGN) vector channel. This channel adds zero-mean Gaussian noise to each
input signal component. All noise samples are assumed to be independent of each other and
have the same variance. For the AGN vector channel we determine the optimum receiver.
If all messages are equally likely this receiver chooses the signal vector that is closest to the
received vector in Euclidean sense. We also investigate the error behavior of this channel.
The last part of this chapter deals with the theorem of reversibility.

4.1 System description


Figure 4.1: A communication system based on a real vector channel.

In a communication system based on a real vector channel (see figure 4.1) the channel input
and output are vectors with real-valued components.

SOURCE The information source generates the message m ∈ M = {1, 2, · · · , |M|} with a-
priori probability Pr{M = m} for m ∈ M. M is the random variable associated with this
mechanism.

TRANSMITTER The transmitter sends a vector-signal s m = (sm1 , sm2 , · · · , sm N ) consisting


of N components if message m is to be transmitted. Each component assumes a value in
the range (−∞, ∞). The random variable corresponding to the signal vector is denoted


by S. The vector-signal is input to the channel. The collection of used signal vectors is
s 1 , s 2 , · · · , s |M| .

REAL VECTOR CHANNEL The channel produces an output vector r = (r1 , r2 , · · · , r N )


consisting of N components. We assume here that all these components assume values
from (−∞, ∞). The conditional probability density function of the channel output vector
r when the input vector s is transmitted is p R (r |S = s). Thus, for S = s, the probability
of receiving an N -dimensional channel output vector R with components Rn for n = 1, N
satisfying rn ≤ Rn < rn + drn is equal to p R (r |S = s)dr for an infinitely small dr =
dr1 dr2 · · · dr N and r = (r1 , r2 , · · · , r N ).
When message m ∈ M is sent, signal s m is chosen by the transmitter as channel input
vector. Then the conditional probability density function

p R (r |M = m) = p R (r |S = s m ) for all r ∈ (−∞, ∞) N , (4.1)

describes the behavior of the cascade of the vector transmitter and the vector channel.

Receiver The receiver forms m̂ based on the received real-valued channel output vector r , hence
m̂ = f (r ). Mapping f (·) is called the decision rule.

Destination The destination accepts m̂.

4.2 Decision variables, MAP decoding


The optimum receiver upon receiving r determines which one of the possible messages has
maximum a-posteriori probability. It therefore chooses the decision rule f (r ) such that for all r
that actually can be received

Pr{M = f (r )} p R (r |M = f (r )) ≥ Pr{M = m} p R (r |M = m), for all m ∈ M. (4.2)

In other words, upon receiving r , the optimum receiver produces an estimate m̂ that corresponds
to a largest decision variable. Given r , the decision variables for the real vector channel are

Pr{M = m} p R (r |M = m) = Pr{M = m} p R (r |S = s m ) for all m ∈ M. (4.3)

That this is MAP-decoding follows from the Bayes rule, which says that
Pr{M = m|R = r} = Pr{M = m} pR(r|M = m) / pR(r)   (4.4)

for r with p R (r ) > 0. Note that p R (r ) > 0 for an output vector r that has actually been received.

4.3 Decision regions


The optimum receiver, upon receiving r , determines the maximum of all decision variables which
are given in (4.3). To compute these decision variables the a-priori probabilities Pr{M = m}
must be known to the receiver and also the conditional density functions p R (r |S = s m ) for all
m ∈ M. This calculation can be carried out for every r in the observation space, i.e. the set of
all possible output vectors r . Each point r in the observation space is therefore assigned to one of
the possible messages m ∈ M, the message that is actually chosen by the receiver. This results
in a partitioning of the observation space in at most |M| regions.

Definition 4.1 Given the decision rule f(·) we can write
Im ≜ {r | f(r) = m},   (4.5)
where Im is called the decision region that corresponds to message m ∈ M.


Note that in the example in subsection 3.3.2 of the previous chapter we considered decision
intervals. A decision region is an N -dimensional generalization of a (one-dimensional) decision
interval.


Figure 4.2: Three two-dimensional signal vectors and their decision regions.

Example 4.1 Suppose (see figure 4.2) that we have three messages i.e. M = {1, 2, 3}, and the corre-
sponding signal-vectors are two-dimensional i.e. s 1 = (1, 2), s 2 = (2, 1), and s 3 = (1, −3). A possible
partitioning of the observation space into the three decision regions I1 , I2 , and I3 is shown in the figure.

4.4 The additive Gaussian noise vector channel


4.4.1 Introduction
The actual shape of the decision regions is determined by the a-priori message probabilities
Pr{M = m}, the signals s m , and the conditional density functions p R (r |S = s m ) for all m ∈
M. A relatively simple but again quite important situation is the case where the channel adds


Figure 4.3: Additive Gaussian noise vector channel.

Gaussian noise to the signal components (see figure 4.3).

Definition 4.2 For the output of the additive Gaussian noise (AGN) vector channel we have
that
r = s + n,   (4.6)
where n = (n 1 , n 2 , · · · , n N ) is an N -dimensional noise vector. We denote this random noise
vector by N . The noise vector N is assumed to be independent of the signal vector S . Moreover
the N components of the noise vector are assumed to be independent of each other. All noise
components have mean 0 and the same variance σ 2 . Therefore the joint density function of the
noise vector is given by
pN(n) = Π_{i=1,N} (1/√(2πσ²)) exp(−ni²/(2σ²)) = (1/(2πσ²)^{N/2}) exp(−(1/(2σ²)) Σ_{i=1,N} ni²).   (4.7)

The notation in the definition can be contracted by noting that


Σ_{i=1,N} ni² = (n · n) = ‖n‖²,   (4.8)

where (a · b) ≜ Σ_{i=1,N} ai bi is the dot (inner) product of the vectors a = (a1, a2, · · · , aN) and b = (b1, b2, · · · , bN). Thus

pN(n) = (1/(2πσ²)^{N/2}) exp(−‖n‖²/(2σ²)).   (4.9)

4.4.2 Minimum Euclidean distance decision rule


If the channel output is r and the channel input was s m then the noise vector must have been
n = r − s m . This and the independence of the noise vector and the signal vector yields that

p R (r |M = m) = p R (r |S = s m ) = p N (r − s m |S = s m ) = p N (r − s m ). (4.10)

Now we can easily determine the decision variables, one for each m ∈ M:

Pr{M = m} pR(r|M = m) = Pr{M = m} pN(r − sm)
= Pr{M = m} (1/(2πσ²)^{N/2}) exp(−‖r − sm‖²/(2σ²)).   (4.11)

Note that the factor (2πσ²)^{N/2} is independent of m. Hence maximizing over the decision vari-
ables in (4.11) is equivalent to minimizing

kr − s m k2 − 2σ 2 ln Pr{M = m} (4.12)

over all m ∈ M. We obtain a very simple decision rule if all messages are equally likely.

RESULT 4.1 (Minimum Euclidean distance decision rule) If the a-priori message probabili-
ties are all equal, the optimum receiver has to minimize the squared Euclidean distance

kr − s m k2 , over all m ∈ M. (4.13)

In other words the receiver has to choose the message m̂ with corresponding signal vector s m̂
that is closest in Euclidean sense to the received vector r .

Example 4.2 Consider again (see figure 4.4) three two dimensional signal vectors s 1 = (1, 2), s 2 =
(2, 1), and s 3 = (1, −3).
The optimum decision regions can be found by drawing the perpendicular bisectors of the sides of the
signal triangle. These are the boundaries of the decision regions I1 , I2 , and I3 (see figure). Note that the
three bisectors have a single point in common.
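The rule of result 4.1, and the more general weighted rule (4.12), can be written down directly in code. The Python sketch below is illustrative; it uses the signal vectors of example 4.2 and an assumed noise variance, returns the message that minimizes ‖r − sm‖² − 2σ² ln Pr{M = m}, and with equal a-priori probabilities this reduces to the minimum Euclidean distance decision.

from math import log

def map_decision(r, signals, priors, sigma2):
    """Minimize ||r - s_m||^2 - 2*sigma^2*ln Pr{M=m} over the messages, cf. (4.12)."""
    best_m, best_metric = None, float("inf")
    for m, (s, p) in enumerate(zip(signals, priors), start=1):
        dist2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
        metric = dist2 - 2.0 * sigma2 * log(p)
        if metric < best_metric:
            best_m, best_metric = m, metric
    return best_m

signals = [(1.0, 2.0), (2.0, 1.0), (1.0, -3.0)]      # s_1, s_2, s_3 of example 4.2
print(map_decision((1.2, 1.8), signals, [1/3, 1/3, 1/3], sigma2=1.0))     # 1: closest to s_1
print(map_decision((1.2, 1.8), signals, [0.05, 0.90, 0.05], sigma2=1.0))  # 2: a strong prior can move the decision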

4.4.3 Error probabilities


The error probability of an additive Gaussian noise vector channel is determined by the location
of the signals s 1 , s 2 , · · · , s |M| , and, more importantly, the hyperplanes that separate these signal-
vectors. An error occurs if the noise "pushes" a signal-point to the wrong side of a hyperplane. To study this behavior we investigate a simple example in two dimensions, i.e. N = 2.
Consider a signal vector s = (a, b), see figure 4.5. This signal vector is corrupted by the additive
Gaussian noise vector n = (n 1 , n 2 ) with independent components that both have mean zero and
variance σ 2 . What is now the probability PI that the received vector r = (r1 , r2 ) = (a, b) +
(n 1 , n 2 ) is in region I, the region above the line, see the figure?

Figure 4.4: Three two-dimensional signal vectors and the corresponding optimum decision re-
gions for the additive Gaussian noise channel.

Figure 4.5: The signal point (a, b), the separating line at angle θ, and the rotated coordinates u1 (perpendicular to the line) and u2 (parallel to the line); the received point is (r1, r2).



To solve this problem we have to change the coordinate system. Let

r1 = a + u 1 cos θ − u 2 sin θ (4.14)


r2 = b + u 1 sin θ + u 2 cos θ, (4.15)

where (a, b) is the center of a Cartesian system with coordinates u 1 and u 2 , and θ the rotation-
angle. Coordinate u 1 is perpendicular to and coordinate u 2 is parallel to the line in the figure.
Note that
PI = ∫∫_I (1/(2πσ²)) exp(−((r1 − a)² + (r2 − b)²)/(2σ²)) dr1 dr2.   (4.16)
Elementary calculus (see e.g. [26], p. 386, theorem 7.7.13) tells us that
∫∫_I f(r1, r2) dr1 dr2 = ∫∫_I f(r1(u1, u2), r2(u1, u2)) |J| du1 du2,   (4.17)

where |J| is the determinant of the Jacobian J, which is
J = ( ∂r1/∂u1   ∂r1/∂u2 ; ∂r2/∂u1   ∂r2/∂u2 ) = ( cos θ   −sin θ ; sin θ   cos θ ).   (4.18)

Note that |J| = cos²θ + sin²θ = 1 and (r1 − a)² + (r2 − b)² = u1² + u2². Therefore
PI = ∫∫_I (1/(2πσ²)) exp(−(u1² + u2²)/(2σ²)) du1 du2
= ∫_{u1=Δ}^∞ ∫_{u2=−∞}^∞ (1/(2πσ²)) exp(−(u1² + u2²)/(2σ²)) du1 du2
= ∫_{u1=Δ}^∞ (1/√(2πσ²)) exp(−u1²/(2σ²)) du1 · ∫_{u2=−∞}^∞ (1/√(2πσ²)) exp(−u2²/(2σ²)) du2
= Q(Δ/σ) · 1 = Q(Δ/σ).   (4.19)
The conclusion is that the probability depends only on the distance Δ from the signal point (a, b) to the line. This result carries over to more than two dimensions:

RESULT 4.2 For the additive Gaussian noise vector channel, the probability that the noise pushes a signal to the wrong side of a hyperplane is
PI = Q(Δ/σ),   (4.20)
where Δ is the distance from the signal-point to the hyperplane and σ² is the variance of each noise component. All noise components are assumed to be zero-mean.
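Result 4.2 is easy to confirm with a small simulation. In the sketch below (illustrative code; the hyperplane orientation and the distance Δ are arbitrary assumed values) the noise is two-dimensional zero-mean Gaussian with independent components, and the signal point ends up on the wrong side whenever the noise component along the unit normal exceeds Δ.

import random
from math import cos, erfc, sin, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

sigma, delta, theta = 1.0, 1.5, 0.7      # noise std per component, distance to the hyperplane, orientation
normal = (cos(theta), sin(theta))        # unit normal of the hyperplane

random.seed(2)
trials, crossings = 200_000, 0
for _ in range(trials):
    n1, n2 = random.gauss(0.0, sigma), random.gauss(0.0, sigma)
    # the point is pushed across the hyperplane iff the noise projected on the normal exceeds delta
    if n1 * normal[0] + n2 * normal[1] > delta:
        crossings += 1

print(f"simulated P_I = {crossings / trials:.4f}   Q(delta/sigma) = {Q(delta / sigma):.4f}")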

4.5 Theorem of reversibility


It is interesting to investigate what kind of processing of the output of a channel is allowed, in the
sense that it does not degrade the performance. An important result corresponding to this issue
is stated below.

THEOREM 4.3 (Theorem of reversibility) The minimum attainable probability of error is not
affected by the introduction of a reversible operation at the output of a channel, see figure 4.6.

Figure 4.6: If G is reversible then a "good" decision can be made from r′ = G(r).

Proof: Consider the receiver for r′ that is depicted in figure 4.7. This receiver first recovers r = G^{-1}(r′). This is possible since the mapping G from r to r′ is reversible. Then an optimum receiver for r is used to determine m̂. The receiver constructed in this way for r′ yields the same error probability as an optimum receiver that observes r directly, thus a reversible operation does not (necessarily) lead to an increase of PE. □

Figure 4.7: An optimum receiver for r′.

4.6 Exercises
1. One of four equally likely messages is to be communicated over a vector channel which
adds a (different) statistically independent zero-mean Gaussian random variable with vari-
ance N0 /2 to each transmitted vector component. Assume that the transmitter uses the
signal vectors shown in figure 4.8 and express the PE produced by an optimum receiver in
terms of the function Q(x).
(Exercise 4.2 from Wozencraft and Jacobs [25].)

2. In the communication system diagrammed in figure 4.9 the transmitted signal sm and the
noises n 1 and n 2 are all random voltages and all statistically independent. Assume that


Figure 4.8: Signal structure.

|M| = 2 i.e. M = {1, 2} and that


Pr{M = 1} = Pr{M = 2} = 1/2,
s1 = √(2Es),
s2 = 0,
pN1(n) = pN2(n) = (1/√(2πσ²)) exp(−n²/(2σ²)).   (4.21)


Figure 4.9: Communication system.


Figure 4.10: Detector structure.

(a) Show that the optimum receiver can be realized as diagrammed in figure 4.10. De-
termine the optimum threshold setting and the value m̂ for r1 + r2 larger than the
threshold.


Figure 4.11:


Figure 4.12:

(b) Express the resulting PE in terms of Q(·).


(c) By what factor would Es have to be increased to yield this same probability of error if the receiver were restricted to observing only r1?

(Exam Communication Principles, July 4, 2001.)

3. In the communication system diagrammed in figure 4.11 the transmitted signal s and the
noises n 1 and n 2 are all random voltages and all statistically independent. Assume that
|M| = 2 i.e. M = {1, 2} and that

Pr{M = 1} = Pr{M = 2} = 1/2,
s1 = −s2 = √(Es),
pN1(n) = pN2(n) = (1/√(2πσ²)) exp(−n²/(2σ²)).   (4.22)

(a) Show that the optimum receiver can be realized as diagrammed in figure 4.12 where
a is an appropriately chosen constant.
(b) What is the optimum value for a?
(c) What is the optimum threshold setting?
(d) Express the resulting PE in terms of Q(x).

(e) By what factor would Es have to be increased to yield this same probability of error if the receiver were restricted to observing only r1?

(Exercise 4.6 from Wozencraft and Jacobs [25].)

4. Consider a set of signal vectors {s 1 , s 2 , · · · , s |M| }. Result 4.1 says that if the a-priori
probabilities of the corresponding messages are equal, the minimum Euclidean distance
decision rule is optimum if the channel is an additive Gaussian noise vector channel (defi-
nition 4.2). In this case the decision regions can be described by a set of hyperplanes (one
for each message pair). Show that this is also the case if the a-priori probabilities are not
all equal.
Chapter 5

Waveform channels, correlation receiver

SUMMARY: All topics discussed so far are necessary to analyze the additive white Gaus-
sian noise waveform channel. In a waveform channel continuous-time waveforms are trans-
mitted over a channel that adds continuous-time white Gaussian noise to them. We show in
this chapter that an optimum receiver for such a waveform channel correlates the received
waveform with all possible transmitted waveforms and makes a decision based on all these
correlations. Together these correlations form a sufficient statistic.

5.1 System description, additive white Gaussian noise


In a communication system based on an additive white Gaussian noise waveform channel (see
figure 5.1) the channel input and output signals are waveforms.

TRANSMITTER The transmitter chooses the waveform sm (t) for 0 ≤ t < T , if message m
is to be transmitted. The set of used waveforms is therefore {s1 (t), s2 (t), · · · , s|M| (t)}.
The interval [0, T ) is called the observation interval. The chosen waveform is input to the
channel. We assume here that all these waveforms have finite-energy, i.e.
Esm ≜ ∫_0^T sm²(t) dt < ∞ for all m ∈ M,   (5.1)

where E sm is the energy of waveform sm (t). To convince yourself that this is a reasonable


Figure 5.1: Communication over an additive white Gaussian noise waveform channel.


definition, assume that sm(t) is the voltage across (or the current through) a resistor of 1 Ω. The total energy that is dissipated in this resistor is then equal to Esm.

WAVEFORM CHANNEL The channel accepts the input waveform sm (t) and adds white Gaus-
sian noise n w (t) to it, i.e. for the received waveform r (t) we have that

r (t) = sm (t) + n w (t). (5.2)

The noise nw(t) is supposed to be generated by the random process Nw(t) which is zero-mean (i.e. E[Nw(t)] = 0 for all t), stationary, white, and Gaussian¹. The autocorrelation function of the noise is
RNw(t, s) ≜ E[Nw(t)Nw(s)] = (N0/2) δ(t − s),   (5.3)
and depends only on the time difference t − s (by the stationarity). The noise has power
spectral density
SNw(f) = N0/2 for −∞ < f < ∞,   (5.4)
i.e. it is white, see appendix D. Note that f is the frequency in Hz.

RECEIVER The receiver forms the message estimate m̂ based on the observed waveform
r (t), 0 ≤ t < T .

The transmission of messages over a waveform channel is what we actually want to study in
the first part of these course notes. It is a model for many digital communication systems (e.g.
a telephone line including the modems on both sides, wireless communication links, recording
channels). In the next sections we will show that the optimum receiver should correlate the
received signal r (t) with all possible transmitted waveforms sm (t), m ∈ M to form a good
message estimate. In a second chapter on waveform communication we will study alternative
implementations of optimum receivers.
GAUSSIAN NOISE is always everywhere. It originates in the thermal fluctuations of all
matter in the universe. It creates randomly varying electromagnetic fields that will produce a
fluctuating voltage across the terminals of an antenna. The random movements of electrons in
e.g. resistors will create random voltages in the electronics of the receiver. Since these voltages
are the sum of many minuscule effects, by the central-limit theorem, they are Gaussian.

5.2 Waveform expansions in series of orthonormal functions


To determine the optimum receiver some new techniques have to be introduced. It is our aim
to transform the waveform channel into vector channel first and then use our knowledge of the
vector channel to determine an optimum receiver. Therefore we first expand the noise process
N (t) and the signals sm (t) in terms of a countable set of orthonormal functions (waveforms).
1 One can argue that white noise is always Gaussian, see Helstrom [9] p.36.


Figure 5.2: Example of orthonormal functions: four time-translated pulses.

Consider a set of functions { f i (t), i = 1, 2, · · · } that are orthonormal over the interval [0, T )
in the sense that
∫_0^T fi(t) fj(t) dt = { 1 if i = j, 0 if i ≠ j }.   (5.5)
Some well-known sets of orthonormal functions are given in the example that follows.

Example 5.1 In figure 5.2 four functions are shown that are time-translated orthonormal pulses. Note that the energy in a pulse, i.e. ∫_0^T fi²(t) dt, equals 1 for i = 1, 4. In figure 5.3 five functions are shown on a one-second time interval: a pulse with amplitude 1 and four sine and cosine waves whose amplitude is √2. All these functions are again mutually orthogonal and have unit energy. Note that functions like these form the building blocks used in Fourier series.

Now assume2 that all outcomes n w (t) of the random process Nw (t) can be represented in
terms of the orthonormal functions f 1 (t), f 2 (t), · · · as

nw(t) = Σ_{i=1}^∞ ni fi(t).   (5.6)

Then the coefficients n i , i = 1, 2, · · · , will satisfy


ni = ∫_0^T nw(t) fi(t) dt.   (5.7)

To see why, substitute (5.6) in the right side of (5.7) and integrate making use of (5.5). Next
assume that also all signals can be expressed in terms of the same set of orthonormal functions,
2 In this discussion we will focus only on the main concepts and we do not bother very much about mathematical
details as e.g. the existence of such a “complete set” of orthonormal functions, extra conditions on the random
process Nw (t) and on the waveforms sm (t), the convergence of series, the interchanging of orders of summation and
integration, etc. . For a more precise treatment we refer e.g. to Gallager [7], Helstrom [9], or Lee and Messerschmitt
[13].


Figure 5.3: Example of orthonormal functions: first five functions in a Fourier series.

i.e.
sm(t) = Σ_{i=1}^∞ smi fi(t), for m ∈ {1, 2, · · · , |M|},   (5.8)
where the coefficients smi, m ∈ M, i = 1, 2, · · · , are given by
smi = ∫_0^T sm(t) fi(t) dt.   (5.9)

Then also the received signal r(t) = sm(t) + nw(t) can be expanded as
r(t) = Σ_{i=1}^∞ ri fi(t),   (5.10)
with coefficients ri, i = 1, 2, · · · , that satisfy
ri = smi + ni = ∫_0^T sm(t) fi(t) dt + ∫_0^T nw(t) fi(t) dt = ∫_0^T r(t) fi(t) dt.   (5.11)

5.3 An equivalent vector channel


An optimum receiver can now make a decision based on the vector r ∞ = (r1 , r2 , r3 , · · · ) instead
of the waveform r (t). By the theorem of reversibility (Theorem 4.3) this does not affect the

smallest possible achievable error probability PE. Similarly we can assume that instead of the signal sm(t) the vector sm^∞ = (sm1, sm2, sm3, · · · ) is transmitted in order to convey the message m ∈ M; the reason for this is that sm^∞ and sm(t) are equivalent. All this implies that we have transformed our waveform channel into a vector channel (with an infinite number of dimensions however). Since
r^∞ = sm^∞ + n^∞,   (5.12)
with n^∞ = (n1, n2, n3, · · · ), the resulting vector channel adds noise to the input vector. What can we say now about the noise components Ni, i = 1, 2, · · · ?
Note first that each component results from linear operations on the Gaussian process Nw (t).
Therefore the noise components are jointly Gaussian.
For the mean of component Ni for i = 1, 2, · · · , we find that
E[Ni] = E[∫_0^T Nw(t) fi(t) dt] = ∫_0^T E[Nw(t)] fi(t) dt = 0.   (5.13)

For the correlation of component Ni, i = 1, 2, · · · , and component Nj, j = 1, 2, · · · , we obtain
E[Ni Nj] = E[∫_0^T ∫_0^T Nw(α)Nw(β) fi(α) fj(β) dα dβ]
= ∫_0^T ∫_0^T E[Nw(α)Nw(β)] fi(α) fj(β) dα dβ
= ∫_0^T ∫_0^T (N0/2) δ(α − β) fi(α) fj(β) dα dβ
= (N0/2) ∫_0^T fi(α) fj(α) dα = (N0/2) δij,   (5.14)
and thus the noise variables all have variance N0 /2 and are uncorrelated and thus independent of
each other. In our derivation we used the Kronecker delta function which is defined as

1 1 if i = j,
δi j = (5.15)
0 if i 6= j.

As a consequence of all this our vector channel is an additive Gaussian noise vector channel
as described in definition 4.2.

5.4 A correlation receiver is optimum


In the previous subsection we have seen that our waveform channel is equivalent to an additive
Gaussian noise vector channel (with an infinite number of dimensions). From (4.12) we know
what the decision variables are for this situation. The decision variable for m ∈ M is expressed
as
‖r^∞ − sm^∞‖² − 2σ² ln Pr{M = m} = ‖r^∞ − sm^∞‖² − N0 ln Pr{M = m},   (5.16)

where we have substituted N0/2 for the variance σ². The receiver has to minimize (5.16) over m ∈ M. With (a^∞ · b^∞) ≜ Σ_{i=1}^∞ ai bi for a^∞ = (a1, a2, · · · ) and b^∞ = (b1, b2, · · · ), we rewrite
‖r^∞ − sm^∞‖² = ((r^∞ − sm^∞) · (r^∞ − sm^∞))
= (r^∞ · r^∞) − 2(r^∞ · sm^∞) + (sm^∞ · sm^∞)
= ‖r^∞‖² − 2(r^∞ · sm^∞) + ‖sm^∞‖².   (5.17)
Since ‖r^∞‖² does not depend on m, the optimum receiver should maximize
(r^∞ · sm^∞) + cm over m ∈ M = {1, 2, · · · , |M|}   (5.18)
with
cm ≜ (N0/2) ln Pr{M = m} − ‖sm^∞‖²/2.   (5.19)
Now it would be nice if the receiver could use some simple method to determine the dot product (r^∞ · sm^∞) and the squared vector length ‖sm^∞‖², which are both sums of an infinite number of components. We will see next that this is possible and quite easy. For the dot product we find
∫_0^T r(t)sm(t) dt = ∫_0^T r(t) Σ_{i=1}^∞ smi fi(t) dt = Σ_{i=1}^∞ smi ∫_0^T r(t) fi(t) dt = Σ_{i=1}^∞ smi ri = (r^∞ · sm^∞).   (5.20)

Similarly we get for the vector length
Esm = ∫_0^T sm²(t) dt = ∫_0^T sm(t) Σ_{i=1}^∞ smi fi(t) dt = Σ_{i=1}^∞ smi ∫_0^T sm(t) fi(t) dt = Σ_{i=1}^∞ smi² = ‖sm^∞‖².   (5.21)

This leads to an important result.

RESULT 5.1 The optimum receiver for transmission of the signals sm(t), m ∈ M, over an additive white Gaussian noise waveform channel has to maximize
∫_{−∞}^∞ r(t)sm(t) dt + cm over m ∈ M = {1, 2, · · · , |M|}.   (5.22)
Here
cm ≜ (N0/2) ln Pr{M = m} − Esm/2.   (5.23)
This receiver is called a correlation receiver (see figure 5.4).


Figure 5.4: Correlation receiver, an optimum receiver for waveforms in white Gaussian noise.
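To see the correlation receiver of figure 5.4 at work, the NumPy sketch below simulates it for two equally likely antipodal waveforms; the waveforms, interval, and N0 are assumed example values. Sampled white Gaussian noise with power spectral density N0/2 is approximated by independent samples of variance (N0/2)·fs, and the receiver maximizes the correlation plus cm; the measured error rate is compared with Q(d/(2σ)), the two-signal result of section 3.3.4 applied along the line joining the two signal points.

import numpy as np
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

rng = np.random.default_rng(0)
T, fs, N0 = 1.0, 1000, 0.5                     # assumed interval, sampling rate, noise level
t = np.arange(0.0, T, 1.0 / fs)
dt = 1.0 / fs

# two equally likely antipodal waveforms with energy E = 1
s = np.array([np.sqrt(2.0) * np.cos(2 * np.pi * 5 * t),
              -np.sqrt(2.0) * np.cos(2 * np.pi * 5 * t)])
E = (s ** 2).sum(axis=1) * dt
c = 0.5 * N0 * np.log([0.5, 0.5]) - 0.5 * E    # the constants c_m of (5.23)

trials, errors = 20_000, 0
for _ in range(trials):
    m = rng.integers(2)
    nw = rng.normal(0.0, np.sqrt(0.5 * N0 * fs), size=t.size)   # sampled white noise, PSD N0/2
    r = s[m] + nw
    metrics = (r * s).sum(axis=1) * dt + c     # correlate with every candidate waveform, add c_m
    errors += int(np.argmax(metrics) != m)

d = 2.0 * np.sqrt(E[0])                        # distance between the two signal vectors
print(errors / trials, Q(d / (2.0 * np.sqrt(0.5 * N0))))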

5.5 Sufficient statistic


In the previous section we have seen that the receiver can make an optimum decision by looking
only at the correlations
∫_0^T r(t)sm(t) dt, for all m ∈ M.   (5.24)
These |M| correlations together form a so-called sufficient statistic for making an optimum de-
cision. It is unnecessary to store the received waveform r (t) after having determined the cor-
relations. Or, to put it in another way, given the correlations the received waveform r (t) is not
relevant anymore.

5.6 Exercises
1. Consider a waveform communication system in which one of two equally likely messages
is transmitted, hence M = {1, 2} and Pr{M = 1} = Pr{M = 2} = 1/2. The waveforms
s1 (t) and s2 (t) corresponding to the messages both have energy E, i.e.
∫_0^T s1²(t) dt = ∫_0^T s2²(t) dt = E,
where [0, T ) is the observation interval. The channel adds zero-mean additive Gaussian
noise with power spectral density S Nw ( f ) = 1 to the transmitted waveform.
(a) Find an expression for the error probability obtained by a correlation receiver when
s1 (t) = −s2 (t).
(b) What is the error probability of a correlation receiver when ∫_0^T s1(t)s2(t) dt = 0?
Chapter 6

Waveform channels, building-blocks

SUMMARY: In this chapter we show that when signal waveforms are linear combinations of a set of orthonormal waveforms (building-block waveforms) the waveform channel can be transformed into an AGN vector channel without loss of optimality. To each input waveform there corresponds a vector in what is called the signal space. It is possible to make an optimum decision based on the correlations of the channel output waveform with the building-block waveforms. The difference between this vector of correlations and the transmitted vector is a noise vector which is Gaussian and spherically distributed.

6.1 Another sufficient statistic


Suppose that the number |M| of waveforms sm(t) is large. Then the receiver has to determine |M| correlations, which is quite complex. The question that now pops up is whether an optimum decision can be based on a smaller number of correlations. This turns out to be true when
tions and N < |M|. We call these orthonormal functions building-block waveforms or building
blocks.
The decomposition of waveforms into building-blocks is also essential when we are inter-
ested in evaluating the error probability of a system.

Definition 6.1 The building-block waveforms {ϕi (t), i = 1, N } satisfy by definition


∫_0^T ϕi(t)ϕj(t) dt = { 1 if i = j, 0 if i ≠ j }.   (6.1)
Note that such a set of building blocks is just a set of functions orthonormal over the interval
[0, T ).
We say that a basis of N building-block waveforms exists for a collection s1 (t), s2 (t), · · · ,
s|M| (t) of transmitted waveforms if we can decompose these transmitted waveforms as
sm(t) = Σ_{i=1,N} smi ϕi(t), for m ∈ M = {1, 2, · · · , |M|}.   (6.2)


Note that the coefficient smi corresponding to a building-block ϕi(t) satisfies
∫_0^T sm(t)ϕi(t) dt = ∫_0^T Σ_{j=1,N} smj ϕj(t)ϕi(t) dt = Σ_{j=1,N} smj ∫_0^T ϕj(t)ϕi(t) dt = smi.   (6.3)

Now the correlation of r(t) and the signal sm(t) for m ∈ M can be expressed as
∫_0^T r(t)sm(t) dt = ∫_0^T r(t) Σ_{i=1,N} smi ϕi(t) dt = Σ_{i=1,N} smi ∫_0^T r(t)ϕi(t) dt = Σ_{i=1,N} smi ri,   (6.4)
with
ri ≜ ∫_0^T r(t)ϕi(t) dt for i = 1, N.   (6.5)
Observe that all the correlations ∫_0^T r(t)sm(t) dt, m ∈ M, can be computed from the N correlations ∫_0^T r(t)ϕi(t) dt, i = 1, N. Therefore also these N correlations ∫_0^T r(t)ϕi(t) dt, i = 1, N, are a sufficient statistic since from these N numbers an optimum decision can be made. This is summarized in the following theorem.

RESULT 6.1 If the transmitted waveforms sm(t), m ∈ M, are linear combinations of the N building-block waveforms ϕi(t), i = 1, N, i.e. if
sm(t) = Σ_{i=1,N} smi ϕi(t), for m ∈ M,   (6.6)
then the receiver can make an optimum decision from
ri = ∫_0^T r(t)ϕi(t) dt, for i = 1, N.   (6.7)
The numbers r1, r2, · · · , rN therefore form a sufficient statistic. The optimum receiver produces as estimate the message m that maximizes Σ_{i=1,N} ri smi + cm over m ∈ M. Note that cm is given by (5.23).
All this results in the receiver that is shown in figure 6.1. There we first see a bank of N multipliers and integrators. This bank yields the correlations ri = ∫_0^T r(t)ϕi(t) dt, i = 1, N. Then a matrix multiplication yields Σ_{i=1,N} ri smi = ∫_0^T r(t)sm(t) dt for all m ∈ M. Adding the constants cm and choosing as message estimate the m that achieves the largest sum yields again an optimum receiver.


Figure 6.1: Correlation receiver based on building blocks.

6.2 The Gram-Schmidt orthogonalization procedure


It is not so difficult to construct an orthonormal basis, i.e a set of N building-block waveforms,
for a given set of |M| finite-energy signaling waveforms. The Gram-Schmidt procedure can be
used.

RESULT 6.2 (Gram-Schmidt) For an arbitrary signal collection, i.e. a collection of waveforms {s1(t), s2(t), · · · , s|M|(t)} on the interval [0, T), we can construct a set of N ≤ |M| building-block waveforms {ϕ1(t), ϕ2(t), · · · , ϕN(t)} and find coefficients smi such that for m = 1, 2, · · · , |M| the signals can be synthesized as sm(t) = Σ_{i=1,N} smi ϕi(t).
The proof of this result can be found in appendix E. There is also an example there where for three signals we determine a set of building-blocks.
A certain set of signals can be expanded in many different sets of building-block waveforms,
also in sets with a larger dimensionality in general. What remains the same however is the geo-
metrical configuration of the vector representations of the signals (see exercise 3 in this chapter).
The Gram-Schmidt procedure always yields a base with the smallest possible dimension.
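For completeness, a sampled version of the Gram-Schmidt procedure is sketched below in Python/NumPy (illustrative code with assumed example waveforms; integrals are again Riemann sums). Waveforms that are linearly dependent on earlier ones contribute no new building block, so the resulting base has the smallest possible dimension.

import numpy as np

def gram_schmidt(waveforms, dt, tol=1e-9):
    """Return a list of sampled orthonormal building-block waveforms for the given sampled signals."""
    basis = []
    for s in waveforms:
        theta = s.astype(float).copy()
        for phi in basis:
            theta -= (np.sum(s * phi) * dt) * phi     # subtract the projection on phi
        energy = np.sum(theta ** 2) * dt
        if energy > tol:                              # only non-degenerate directions give a new block
            basis.append(theta / np.sqrt(energy))
    return basis

fs = 1000
t = np.arange(0.0, 1.0, 1.0 / fs)
s1 = np.ones_like(t)
s2 = np.where(t < 0.5, 1.0, -1.0)
s3 = s1 + s2                                          # linearly dependent on s1 and s2
basis = gram_schmidt([s1, s2, s3], dt=1.0 / fs)
print(len(basis))                                     # 2, so N < |M| here
gram = np.array([[np.sum(a * b) / fs for b in basis] for a in basis])
print(np.round(gram, 3))                              # identity matrix: orthonormal, cf. (6.1)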

6.3 Transmitting vectors instead of waveforms


In the previous section we have seen that for each collection of signals s1 (t), s2 (t), · · · , s|M| (t)
we can construct a finite orthonormal base such that each waveform sm (t) for m ∈ M is a linear
combination of the building-block waveforms ϕ1 (t), ϕ2 (t), · · · , ϕ N (t) that form the base, i.e.
sm(t) = Σ_{i=1,N} smi ϕi(t).   (6.8)


Figure 6.2: A modulator forms the waveform sm (t) from the vector s m .

Hence to each waveform sm (t) there corresponds a vector consisting of N coefficients

s m = (sm1 , sm2 , · · · , sm N ). (6.9)

Now, instead of the waveform sm (t) we can say that the vector s m is transmitted. A
modulator (see figure 6.2) performs operation (6.8) that transforms a vector into the transmitted
waveform.

6.4 Signal space, signal structure


The next result applies to the set of all transmitted waveforms.

RESULT 6.3 If the waveforms s1 (t), s2 (t), · · · , s|M| (t) are linear combinations of N building-
blocks, the collection of waveforms can be represented as a collection of vectors s 1 , s 2 , · · · , s |M|
in an N -dimensional space. This space is called the signal space. The collection of vectors
s 1 , s 2 , · · · , s |M| is called the signal structure.

Example 6.1 Consider for m = 1, 2, 3, 4 the set of phase-modulated transmitter waveforms
sm(t) = √(2Es/T) cos(2πf0t + mπ/2) for 0 ≤ t < T, and 0 elsewhere,   (6.10)
where f0 is an integral multiple of 1/T. Note that
cos(2πf0t + mπ/2) = cos(2πf0t) cos(mπ/2) − sin(2πf0t) sin(mπ/2).   (6.11)

Figure 6.3: Four signals represented as vectors in a two-dimensional space. The length of all signal vectors is √Es.

If we now take as building-block waveforms
ϕ1(t) = √(2/T) cos(2πf0t), and
ϕ2(t) = √(2/T) sin(2πf0t),   (6.12)

for 0 ≤ t < T and zero elsewhere, we obtain the following vector-representations for the signals:
s1 = (0, −√Es),
s2 = (−√Es, 0),
s3 = (0, √Es),
s4 = (√Es, 0).   (6.13)

These vectors are depicted in figure 6.3. Note that the four waveforms shown in figure 6.4 have the same
signal structure (with respect to the base also shown there) as the signals defined in (6.10). We will learn
later in this chapter that both sets of waveforms also have identical error behavior.


Figure 6.4: Another set of waveforms that leads to the vector diagram in figure 6.3.
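The vector representations (6.13) can also be obtained numerically by projecting the waveforms (6.10) onto the building blocks (6.12), as in (6.3). The sketch below is illustrative and uses assumed values Es = 2, T = 1 and f0 = 3 (any integer multiple of 1/T works).

import numpy as np

T, fs, Es, f0 = 1.0, 10_000, 2.0, 3.0
t = np.arange(0.0, T, 1.0 / fs)
dt = 1.0 / fs

phi1 = np.sqrt(2.0 / T) * np.cos(2 * np.pi * f0 * t)
phi2 = np.sqrt(2.0 / T) * np.sin(2 * np.pi * f0 * t)

for m in (1, 2, 3, 4):
    sm = np.sqrt(2.0 * Es / T) * np.cos(2 * np.pi * f0 * t + m * np.pi / 2)
    sm1 = np.sum(sm * phi1) * dt          # projection onto phi_1, cf. (6.3)
    sm2 = np.sum(sm * phi2) * dt          # projection onto phi_2
    print(m, round(sm1, 3), round(sm2, 3))
# with sqrt(Es) = sqrt(2) this reproduces (6.13):
# s_1 = (0, -sqrt(Es)), s_2 = (-sqrt(Es), 0), s_3 = (0, sqrt(Es)), s_4 = (sqrt(Es), 0)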

6.5 Receiving vectors instead of waveforms


In the previous sections we have shown that a waveform-transmitter actually sends a vector out of
an N -dimensional signal space. In this section we will see that the optimum waveform-receiver
as derived in section 6.1, which operates with sufficient statistics corresponding to the building-
blocks, actually receives a vector in this N -dimensional signal space.


Figure 6.5: Recovering the vector r = (r1 , r2 , · · · , r N ) from the received waveform r (t).

Consider the first part (demodulator) of the optimum receiver for waveform communication
working with sufficient statistics (see figure 6.5). This receiver observes

r (t) = sm (t) + n w (t), (6.14)

where sm (t) for some m ∈ M is the transmitted waveform and n w (t) is a realization of a white
Gaussian noise process. The first part of the receiver determines the N -dimensional vector r =
(r1 , r2 , · · · , r N ) whose components are defined as (see figure 6.5)
ri ≜ ∫_0^T r(t)ϕi(t) dt for i = 1, 2, · · · , N.   (6.15)

The components of this vector are the sufficient statistics mentioned in section 6.1. So we can
say that the optimum receiver makes a decision based on the vector r instead of on the
waveform r (t).

6.6 The additive Gaussian vector channel


In the previous sections we have seen that without losing optimality we may assume that an
N -dimensional vector s m for some m ∈ M is transmitted and an N -dimensional vector r is

received. We will now consider the vector-channel that has s m as input and r as output. Since
r (t) = sm (t) + n w (t) we have that
ri = ∫_0^T r(t)ϕi(t) dt = ∫_0^T sm(t)ϕi(t) dt + ∫_0^T nw(t)ϕi(t) dt = smi + ni,   (6.16)
with, for i = 1, N,
ni ≜ ∫_0^T nw(t)ϕi(t) dt.   (6.17)
Hence again, as in section 5.3, we get

r = s m + n, (6.18)

with r = (r1 , r2 , · · · , r N ), s = (s1 , s2 , · · · , s N ), and n = (n 1 , n 2 , · · · , n N ). Note that the


dimension N of the vectors is the dimension of the signal space (the number of building-block
waveforms), which is now finite. In section 5.3 the dimension of the vectors was infinite.
What can we say about the noise vector n that disturbs the transmitted vector s m ? Just
like before, in section 5.3, the noise components are jointly Gaussian. They result from linear
operations on the Gaussian process Nw (t). For the mean of component Ni , i = 1, N , we again
find that
E[Ni] = E[∫_0^T Nw(t)ϕi(t) dt] = ∫_0^T E[Nw(t)]ϕi(t) dt = 0.   (6.19)
By the orthonormality of the building-block waveforms we get for the correlation of component Ni, i = 1, N, and component Nj, j = 1, N, that
E[Ni Nj] = E[∫_0^T ∫_0^T Nw(α)Nw(β)ϕi(α)ϕj(β) dα dβ]
= ∫_0^T ∫_0^T E[Nw(α)Nw(β)]ϕi(α)ϕj(β) dα dβ
= ∫_0^T ∫_0^T (N0/2) δ(α − β)ϕi(α)ϕj(β) dα dβ
= (N0/2) ∫_0^T ϕi(α)ϕj(α) dα = (N0/2) δij,   (6.20)
and thus all N noise variables have variance N0 /2 and are uncorrelated and since they are jointly
Gaussian also independent of each other.

RESULT 6.4 The joint density function of the N -dimensional noise vector n, that is added to
the input of a vector channel, which is derived from a waveform communication system, is
pN(n) = (1/(πN0)^{N/2}) exp(−‖n‖²/N0),   (6.21)

hence the noise is spherically symmetric and depends on the magnitude but not on the direction
of n. The noise projected on each dimension i = 1, N has variance N0 /2.
We may conclude that our channel with input s and output r is an additive Gaussian noise
(AGN) vector channel as was described in definition 4.2.

6.7 Processing the received vector


Note (see 4.12) that after having determined the vector r , an optimum vector receiver should
search for the message m that minimizes

kr − s m k2 − 2σ 2 ln Pr{M = m}. (6.22)

Note that by result 6.4 we must substitute N0 /2 for σ 2 here. Inspection now shows that an
optimum vector receiver should maximize

(r · sm) + (N0/2) ln Pr{M = m} − ‖sm‖²/2 = Σ_{i=1,N} ri smi + (N0/2) ln Pr{M = m} − ‖sm‖²/2   (6.23)

over m ∈ M. If we note that


Esm = ∫_0^T sm²(t) dt = ∫_0^T sm(t) Σ_{i=1,N} smi ϕi(t) dt = Σ_{i=1,N} smi ∫_0^T sm(t)ϕi(t) dt = Σ_{i=1,N} smi² = ‖sm‖²,   (6.24)

we can observe that the back end of the receiver shown in figure 6.1 performs exactly as we
expect.

6.8 Relation between waveform and vector channel


We have seen that to each set of waveforms there corresponds a finite set of building-block wave-
forms such that each waveform is a linear combination of these building-block waveforms. We
have observed that instead of the waveform sm (t) the vector s m is transmitted over the channel.
We have also seen that an optimum receiver only needs to consider the projections (correla-
tions) of the received signal r (t) onto the building-block waveforms. The receiver can base its
decision only on the vector r , a sufficient statistic. It is important to note that these conclusions
depend on the fact that the channel adds white Gaussian noise to the input waveform.
From the above we may conclude that transmission over an additive white Gaussian wave-
form channel can actually be considered as transmission over an additive Gaussian noise vector
channel (see figure 6.6). A vector transmitter produces the vector s m when message m is gen-
erated by the source. This vector is transformed into the waveform sm (t) by a modulator. The


Figure 6.6: Reduction of waveform channel to vector channel.

waveform sm (t) is input to the channel that adds white Gaussian noise. The demodulator op-
erates on the received waveform r (t) and forms the vector r . A vector receiver determines the
estimate m̂ of the message that was sent from r . The combination modulator - waveform channel
- demodulator is an additive Gaussian noise vector channel. In the previous chapter we have
seen how an optimum receiver for such a channel can be constructed. The decision function that
applies to this situation can be found in (4.12).
We may conclude that, by converting the problem of designing an optimum receiver for
waveform communication to designing an optimum receiver for an additive white Gaussian noise
vector channel, we have solved it. We want to end this chapter with a very important observation:

RESULT 6.5 The error behavior of the waveform communication system is only determined by
the collection of signal vectors (the signal structure) and N0 /2. We should realize that depending
on the chosen building-block waveforms different sets of waveforms having the same signal struc-
ture yield the same error performance. What building-block waveforms actually are chosen by
the communication engineer depends possibly on other factors as e.g. bandwidth requirements,
as we will see in later parts of these course notes.

6.9 Exercises
1. For each set of waveforms s1 (t), s2 (t), · · · , s|M| (t) we can apply the Gram-Schmidt pro-
cedure to construct an orthonormal base such that each waveform sm (t) for m ∈ M is a
linear combination of the building-block waveforms ϕ1 (t), ϕ2 (t), · · · , ϕ N (t). Show that
for a given set of waveforms the Gram-Schmidt procedure results in a base with the small-
est possible dimension N .

2. If we apply the Gram-Schmidt procedure we must assume that the waveforms sm(t), for m ∈ M, have finite energy. In the construction it is essential that the energy Eθm corresponding to the auxiliary signal θm(t) is finite. Show that
Eθm ≤ ∫_0^T sm²(t) dt < ∞.   (6.25)

3. Show that independent of the choice of the building-block waveforms, the length of any
signal vector s m for m ∈ M is constant. Moreover show that the Euclidean distance
between two signal vectors s m and s m 0 for m, m 0 ∈ M is constant. What does this imply
for energy and expected error probability of the system?

4. (a) Calculate PEmin when the signal sets (a), (b), and (c) specified in figure 6.7 are used to
communicate one of two equally likely messages over a channel disturbed by additive
Gaussian noise with Sn ( f ) = 0.15. Note that the third set is specified by the Fourier
transforms (see appendix B) of the waveforms. Use Parseval’s equation.
(b) Also calculate the PEmin for the same three sets when the a-priori message probabili-
ties are 1/4, 3/4.
(Based on exercise 4.8 from Wozencraft and Jacobs [25].)


Figure 6.7: Si ( f ), the Fourier transform of si (t), is real.


Chapter 7

Matched filters

SUMMARY: Here we discuss matched-filter receiver structures for optimum waveform


communication. The optimum receiver can be implemented as a matched-filter receiver,
so a matched filter not only maximizes the signal-to-noise ratio but it can also be used to
minimize the error probability. In the last part of this chapter we consider the Parseval
relation between correlation and dot product.

7.1 Matched-filter receiver


Note that for all i = 1, N the building-block waveforms are such that ϕi (t) ≡ 0 for t < 0 and
t ≥ T.
We can now replace the N multipliers and integrators by N matched filters and samplers.
This can be advantageous in analog implementations since accurate multipliers are in that case
more expensive than filters. Note that appendix C contains some elementary material on filters.


Figure 7.1: A filter matched to the building-block waveform ϕi (t).

Consider (see figures 7.1 and 7.2) filters with an impulse response h i (t) = ϕi (T − t) for
i = 1, N . Note that h i (t) ≡ 0 for t ≤ 0 (i.e. the filter is causal) and also for t > T . For the
output of such a filter we can write
ui(t) = r(t) ∗ hi(t) = ∫_{−∞}^∞ r(α)hi(t − α) dα = ∫_{−∞}^∞ r(α)ϕi(T − t + α) dα.   (7.1)



Figure 7.2: A building-block waveform ϕi (t) and the impulse response ϕi (T − t) of the corre-
sponding matched filter.

At t = T the matched-filter output
ui(T) = ∫_{−∞}^∞ r(α)ϕi(α) dα = ∫_0^T r(α)ϕi(α) dα = ri,   (7.2)

hence we can determine the i-th component of the vector r in this way. Note that ri is the i-th
component of the sufficient statistic corresponding to the building-blocks.
Now we have an alternative method for computing the vector r . This results in the receiver
shown in figure 7.3. Note that for all m ∈ M
(r · sm) = Σ_{i=1,N} ri smi = ∫_0^T r(t)sm(t) dt.   (7.3)

A filter whose impulse response is a delayed time-reversed version of a signal ϕi (t) is called
”matched to” ϕi (t). A receiver that is equipped with such filters is called a matched-filter receiver.
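That sampling a matched filter at t = T yields exactly the correlation ri of (7.2) is easy to verify numerically. The sketch below is illustrative, with an assumed building block and an assumed received waveform; it compares the direct correlation with the sampled output of the filter hi(t) = ϕi(T − t).

import numpy as np

T, fs = 1.0, 1000
t = np.arange(0.0, T, 1.0 / fs)
dt = 1.0 / fs

phi = np.sqrt(2.0 / T) * np.sin(2 * np.pi * t / T)   # a unit-energy building block on [0, T)
rng = np.random.default_rng(3)
r = 0.7 * phi + rng.normal(0.0, 1.0, t.size)         # some received waveform (assumed example)

r_corr = np.sum(r * phi) * dt                        # r_i = int_0^T r(t) phi(t) dt

h = phi[::-1]                                        # matched filter h(t) = phi(T - t)
u = np.convolve(r, h) * dt                           # u(t) = (r * h)(t) on the sampling grid
u_T = u[t.size - 1]                                  # sample at t = T (up to the grid spacing)

print(round(r_corr, 6), round(u_T, 6))               # the two numbers coincide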

7.2 Direct receiver


Note that, just like the building-blocks, for all m ∈ M the transmitted waveforms are such that
sm (t) ≡ 0 for t < 0 and t ≥ T .
Now for all m ∈ M consider a filter with impulse response sm (T − t), let the waveform
channel output r (t) be the input of all these filters and sample the |M| filter outputs at t = T .
Then we obtain
∫_{−∞}^∞ r(α)sm(T − t + α) dα |_{t=T} = ∫_{−∞}^∞ r(α)sm(α) dα = ∫_0^T r(α)sm(α) dα.   (7.4)

This gives another method to form an optimum receiver (see figure 7.4). This receiver is called a
direct receiver since the filters are matched directly to the signals {sm (t), m ∈ M}.
Note that a direct receiver is usually more expensive than a receiver with filters matched to the building-block waveforms, since always M ≥ N and in practice even often M ≫ N. The weighting-matrix operations are not needed here however.


Figure 7.3: Matched-filter receiver based on building-blocks.

7.3 Signal-to-noise ratio


We have seen in the previous section that a matched-filter receiver is optimum, i.e. minimizes
the error probability PE . Not only in this sense is the matched filter optimum, we will show
next that it also maximizes the signal-to-noise ratio. To see what we mean by this, consider
the communication situation shown in figure 7.5. A signal s(t) is assumed to be non-zero only
for 0 ≤ t < T . This signal is observed in additive white noise, i.e. the observer receives
r (t) = s(t) + n w (t). The process Nw (t) is a zero-mean white Gaussian noise process with power
spectral density Sw ( f ) = N0 /2 for all −∞ < f < ∞.
The problem is now e.g. to decide whether the signal s(t) was present in the noise or not
(or in a similar setting to decide whether s(t) or −s(t) was seen). Therefore the observer uses a
linear time-invariant filter with impulse response h(t) and samples the filter output at time t = T .
For the sampled filter output u(t) at time t = T we can write

u(T) = r(t) ∗ h(t)|_{t=T} = ∫_{−∞}^∞ r(T − α)h(α) dα = us(T) + un(T),   (7.5)
with
us(T) ≜ ∫_{−∞}^∞ s(T − α)h(α) dα, and   (7.6)
un(T) ≜ ∫_{−∞}^∞ nw(T − α)h(α) dα,   (7.7)


Figure 7.4: Direct matched-filter receiver.


Figure 7.5: A matched filter maximizes S/N .

where us(T) is the signal component and un(T) the noise component in the sampled filter output.

Definition 7.1 We can now define the signal-to-noise ratio as
S/N ≜ us²(T) / E[Un²(T)],   (7.8)
i.e. the ratio between signal energy and noise variance.
The noise variance can be expressed as
E[Un²(T)] = E[∫_{−∞}^∞ Nw(T − α)h(α) dα ∫_{−∞}^∞ Nw(T − β)h(β) dβ]
= ∫_{−∞}^∞ ∫_{−∞}^∞ E[Nw(T − α)Nw(T − β)]h(α)h(β) dα dβ
= (N0/2) ∫_{−∞}^∞ ∫_{−∞}^∞ δ(β − α)h(α)h(β) dα dβ = (N0/2) ∫_{−∞}^∞ h²(α) dα.   (7.9)

RESULT 7.1 If we substitute (7.9) and (7.6) in (7.8) we obtain for the maximum attainable signal-to-noise ratio
S/N = (∫_{−∞}^∞ s(T − α)h(α) dα)² / ((N0/2) ∫_{−∞}^∞ h²(α) dα)
(∗)
≤ (∫_{−∞}^∞ s²(T − α) dα · ∫_{−∞}^∞ h²(α) dα) / ((N0/2) ∫_{−∞}^∞ h²(α) dα)
= ∫_{−∞}^∞ s²(T − α) dα / (N0/2) = ∫_0^T s²(T − α) dα / (N0/2).   (7.10)
The inequality (∗) in this derivation comes from the Schwarz inequality (see appendix F). Equality is obtained if and only if h(t) = Cs(T − t) for some constant C, i.e. if the filter h(t) is matched to the signal s(t).
Note that the maximum signal-to-noise ratio depends only on the energy of the waveform
s(t) and not on its specific shape.
It should be noted that in this section we have demonstrated a weaker form of optimality for
the matched filter than the one that we have obtained in the previous chapter. The matched filter
is not only the filter that maximizes signal-to-noise ratio, but can be used for optimum detection
as well.
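The bound in result 7.1 can be illustrated numerically. The sketch below (with an assumed pulse s(t) and noise level) evaluates S/N of (7.8) via (7.6) and (7.9) for the matched filter h(t) = s(T − t) and for an arbitrary other filter; only the matched filter reaches 2E/N0.

import numpy as np

T, fs, N0 = 1.0, 1000, 1.0
t = np.arange(0.0, T, 1.0 / fs)
dt = 1.0 / fs

s = np.where(t < 0.5, 1.0, -0.5)                 # an arbitrary finite-energy pulse on [0, T)
E = np.sum(s ** 2) * dt

def snr(h):
    us_T = np.sum(s[::-1] * h) * dt              # u_s(T) = int s(T - a) h(a) da, cf. (7.6)
    noise_var = 0.5 * N0 * np.sum(h ** 2) * dt   # E[U_n^2(T)], cf. (7.9)
    return us_T ** 2 / noise_var

print(round(snr(s[::-1]), 4), round(2 * E / N0, 4))   # matched filter h(t) = s(T - t) reaches 2E/N0
print(round(snr(np.ones_like(t)), 4))                 # any other filter gives a smaller S/N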

7.4 Parseval relation


Equation (7.3) demonstrates that a correlation can be expressed as the dot product of the corre-
sponding vectors. We will investigate this phenomenon more closely here. Therefore consider a
set of orthonormal waveforms {ϕi (t), i = 1, 2, · · · , N } over interval [0, T ) and two waveforms
f(t) and g(t) that can be expressed in terms of these building-blocks, i.e.
f(t) ≜ Σ_{i=1,N} fi ϕi(t),
g(t) ≜ Σ_{i=1,N} gi ϕi(t).   (7.11)

The vector-representations that correspond to f (t) and g(t) are

f = ( f 1 , f 2 , · · · , f N ) and
g = (g1 , g2 , · · · , g N ). (7.12)

Now:

RESULT 7.2 (Parseval relation)
∫_0^T f(t)g(t) dt = ∫_0^T Σ_{i=1,N} fi ϕi(t) Σ_{j=1,N} gj ϕj(t) dt
= Σ_{i=1,N} Σ_{j=1,N} fi gj ∫_0^T ϕi(t)ϕj(t) dt
= Σ_{i=1,N} Σ_{j=1,N} fi gj δij = Σ_{i=1,N} fi gi = (f · g).   (7.13)
This result says that the correlation of f (t) and g(t), which is defined as the integral of their
product, is equal to the dot product of the corresponding vectors. Note that this result can be
regarded as an analogue to the Parseval relation in Fourier analysis (see appendix B) which says
that
∫_{−∞}^∞ f(t)g(t) dt = ∫_{−∞}^∞ F(f)G∗(f) df,   (7.14)
with
F(f) = ∫_{−∞}^∞ f(t) exp(−j2πft) dt, and
G(f) = ∫_{−∞}^∞ g(t) exp(−j2πft) dt.   (7.15)
Here G ∗ ( f ) is the complex conjugate of G( f ).
The consequences of result 7.2 are:
• Take g(t) ≡ f(t); then
∫_0^T f²(t) dt = (f · f) = ‖f‖².   (7.16)
This means that the energy of waveform f(t) is simply the square of the length of the corresponding vector f. We therefore also call the squared length of a vector its energy. We have seen before that
Esm ≜ ∫_0^T sm²(t) dt = ‖sm‖²,   (7.17)
the energy corresponding to the waveform sm(t), for m ∈ M = {1, 2, · · · , |M|}.
• Consider
∫_0^T r(t)sm(t) dt = ∫_0^T r(t) Σ_{i=1,N} smi ϕi(t) dt = Σ_{i=1,N} smi ∫_0^T r(t)ϕi(t) dt = Σ_{i=1,N} smi ri = (sm · r).   (7.18)
This result is similar to the Parseval result 7.2 but not identical, since r(t) ≠ Σ_{i=1,N} ri ϕi(t), i.e. r(t) can not be expressed as a linear combination of building-block waveforms. Note that (7.18) is identical to equation (7.3).

7.5 Notes

Figure 7.6: Dwight O. North, inventor of the matched filter. Photo IEEE-IT Soc. Newsl., Dec.
1998.

The matched filter as a filter for maximizing the signal-to-noise ratio was invented by North
[14] in 1943. The result was published in a classified report at RCA Labs in Princeton. The
name “matched filter” was coined by Van Vleck and Middleton who independently published
the result a year later in a Harvard Radio Research Lab report [24].

7.6 Exercises
1. One of two equally likely messages is to be transmitted over an additive white Gaussian
noise channel with Sn ( f ) = 0.05 by means of binary pulse position modulation. Specifi-
cally
s1 (t) = p(t)
s2 (t) = p(t − 2), (7.19)
in which the pulse p(t) is shown in figure 7.7.

(a) What mathematical operations are performed by the optimum receiver?


(b) What is the resulting average error probability?
(c) Indicate two methods of implementing the receiver, each of which uses a single linear
filter followed by a sampler and comparison device. Method I requires that two sam-
ples from the filter output be fed into the comparison device. Method II requires that
just one sample be used. For each method sketch the impulse response of the appro-
priate filter and its response to p(t). Which of the methods is most easily extended to
M-ary pulse position modulation, where sm (t) = p(t − 2m + 2), m = 1, 2, · · · , M?

Figure 7.7: A pulse: p(t) is a triangular pulse of height 2 on the interval [0, 2], peaking at t = 1.

(d) Suggest another pair of waveforms that require the same energy as the binary pulse-
position waveforms and yield the same average error probability and a pair that
achieves smaller error probability.
(e) Calculate the minimum attainable average error probability if
s1 (t) = p(t) and s2 (t) = p(t − 1). (7.20)
Repeat this for
s1 (t) = p(t) and s2 (t) = − p(t − 1). (7.21)

(Exercise 4.10 from Wozencraft and Jacobs [25].)


Figure 7.8: The waveforms s1 (t) and s2 (t).

2. Specify a matched filter for each of the signals shown in figure 7.8 and sketch each filter
output as a function of time when the signal matched to it is the input. Sketch the output
of the filter matched to s2 (t) when the input is s1 (t).
(Exercise 4.14 from Wozencraft and Jacobs [25].)
3. Consider a transmitter that sends the signal
   s_a(t) = Σ_{k=1,K} a_k h(t − (k − 1)τ),

Figure 7.9: Pulse transmission and detection with a sampled-matched-filter receiver. Signals are
also shown. (The amplitudes a_1, · · · , a_K drive impulses into the pulse filter h(t), producing s_a(t); white noise n_w(t) is added; the receiver filters with h(T − t) and samples the output at t = T + (k − 1)τ.)

when the message a = a1 , a2 , · · · , a K has to be conveyed to the decoder (i.e. pulse-


transmission, see figure 7.9). Assume that a_k ∈ {−1, +1} for k = 1, K, thus there are 2^K
messages. The signal is sent over an additive white Gaussian noise waveform channel.
Assume that h(t) = 0 for t < 0 and t ≥ T . The receiver now uses a matched filter
h(T − t) and samples its output at t = T + (k − 1)τ , for k = 1, K . These K samples are
then processed to form an estimate of the transmitted message (sequence).
Show that this receiver is optimum if processing of the samples is done in the right way.
What is optimum processing here?

4. One of two equally likely messages is to be transmitted over an additive white Gaussian
noise channel with Sn ( f ) = N0 /2 = 1 by means of binary pulse-position modulation.
Specifically

s1 (t) = p(t),
s2 (t) = p(t − 2),

in which the pulse p(t) is shown in figure 7.10.

(a) Describe (and sketch) an optimum receiver for this case. Express the resulting error
probability in terms of Q(·).
(b) Give the implementation of an optimum receiver which uses a single linear filter
followed by a sampler and comparison device. Assume that two samples from the
filter output are fed into the comparison device. Sketch the impulse response of the

Figure 7.10: A rectangular pulse. (p(t) has amplitude 2 for 0 ≤ t < 2 and is zero elsewhere.)

appropriate filter. What is the output of the filter at both sample moments when the
filter input is s1 (t)? What are these outputs for filter input s2 (t)?
(c) Calculate the minimum attainable average error probability if

s1 (t) = p(t) and s2 (t) = p(t − 1).

(Exam Communication Principles, October 6, 2003.)


Chapter 8

Orthogonal signaling

SUMMARY: Here we investigate orthogonal signaling. In this case the waveforms that
correspond to the messages are orthogonal. We determine the average error probability PE
for these signals. It appears that PE can be made arbitrarily small by increasing the number
of waveforms, provided that the energy per transmitted bit is larger than N0 ln 2. Note that this is
a capacity result!

8.1 Orthogonal signal structure


Consider |M| signals sm (t), or in vector representation s m , with a-priori probabilities Pr{M =
m} = 1/|M| for m ∈ M = {1, 2, · · · , |M|}. We now define an orthogonal signal set in the
following way:

Definition 8.1 All signals in an orthogonal set are assumed to have equal energy and are orthogonal, i.e.
   s_m = √(E_s) φ_m   for m ∈ M,    (8.1)
where φ_m is the unit-vector corresponding to dimension m. There are as many building-block
waveforms φ_m(t) and dimensions in the signal space as there are messages.
The signals that we have defined are now orthogonal since
   ∫_{−∞}^{∞} s_m(t) s_k(t) dt = (s_m · s_k) = E_s (φ_m · φ_k) = E_s δ_mk   for m ∈ M and k ∈ M.    (8.2)

Note that all signals have energy equal to E s .


So far we have not been very explicit about the actual signals. However we can e.g. think of
(disjoint) shifts of a pulse (pulse-position modulation, PPM) or sines and cosines with an integer
number of periods over [0, T ) (frequency-shift keying, FSK).


8.2 Optimum receiver


How does the optimum receiver decide when it receives the vector r = (r_1, r_2, · · · , r_|M|)? It has
to choose the message m ∈ M that minimizes the squared Euclidean distance between s_m and
r, i.e.
   ‖r − s_m‖² = ‖r‖² + ‖s_m‖² − 2(r · s_m)
              = ‖r‖² + E_s − 2√(E_s) r_m.    (8.3)

RESULT 8.1 Since only the term −2√(E_s) r_m depends on m, an optimum receiver has to choose
m̂ such that
   r_m̂ ≥ r_m   for all m ∈ M.    (8.4)
Now we can find an expression for the error probability.
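Before doing so, here is a minimal Monte Carlo sketch of the decision rule (8.4) in vector form; the values of |M|, E_s and N_0 below are arbitrary examples, not taken from the text.

    # Simulated error rate of the max-correlation receiver of result 8.1.
    import numpy as np

    def simulate_PE(M, Es, N0, trials=200_000, seed=1):
        rng = np.random.default_rng(seed)
        sigma = np.sqrt(N0 / 2.0)                  # noise std per dimension
        # by symmetry we may always send message 1: s_1 = (sqrt(Es), 0, ..., 0)
        r = rng.normal(scale=sigma, size=(trials, M))
        r[:, 0] += np.sqrt(Es)
        decisions = np.argmax(r, axis=1)           # choose m with the largest r_m
        return np.mean(decisions != 0)

    print(simulate_PE(M=8, Es=4.0, N0=1.0))        # estimate of PE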

8.3 Error probability


To determine the error probability for orthogonal signaling we may assume, because of symmetry,
that signal s_1(t) was actually sent. Then
   r_1 = √(E_s) + n_1,
   r_m = n_m,   for m = 2, 3, · · · , |M|,
   with   p_N(n) = Π_{m=1,|M|} (1/√(πN_0)) exp(−n_m²/N_0).    (8.5)

Note that the noise vector n = (n 1 , n 2 , · · · , n |M| ) consists of |M| independent components all
with mean 0 and variance N0 /2.
Suppose that the first component of the received vector is α. Then we can write for the correct
probability, conditional on the fact that message 1 was sent and that the first component of r is
α,

   Pr{M̂ = 1 | M = 1, R_1 = α} = Pr{N_2 < α, N_3 < α, · · · , N_|M| < α}
                               = (Pr{N_2 < α})^{|M|−1}
                               = ( ∫_{−∞}^{α} p_N(β) dβ )^{|M|−1},    (8.6)

and therefore the correct probability is

   P_C = ∫_{−∞}^{∞} p_N(α − √(E_s)) ( ∫_{−∞}^{α} p_N(β) dβ )^{|M|−1} dα.    (8.7)

We rewrite this correct probability as follows:

   P_C = ∫_{−∞}^{∞} (1/√(πN_0)) exp(−(α − √(E_s))²/N_0) ( ∫_{−∞}^{α} · · · dβ )^{|M|−1} dα
       = ∫_{−∞}^{∞} (1/√(2π)) exp(−(α/√(N_0/2) − √(E_s)/√(N_0/2))²/2) ( ∫_{−∞}^{α} · · · dβ )^{|M|−1} dα/√(N_0/2)
       = ∫_{−∞}^{∞} (1/√(2π)) exp(−(μ − √(2E_s/N_0))²/2) ( ∫_{−∞}^{μ√(N_0/2)} · · · dβ )^{|M|−1} dμ    (8.8)

with μ = α/√(N_0/2). Furthermore

   ∫_{−∞}^{μ√(N_0/2)} (1/√(πN_0)) exp(−β²/N_0) dβ = ∫_{−∞}^{μ√(N_0/2)} (1/√(2π)) exp(−(β/√(N_0/2))²/2) dβ/√(N_0/2)
                                                  = ∫_{−∞}^{μ} (1/√(2π)) exp(−λ²/2) dλ    (8.9)

with λ = β/√(N_0/2). Therefore

   P_C = ∫_{−∞}^{∞} p(μ − b) ( ∫_{−∞}^{μ} p(λ) dλ )^{|M|−1} dμ,    (8.10)

with p(γ) = (1/√(2π)) exp(−γ²/2) and b = √(2E_s/N_0). Note that p(γ) is the probability density
function of a Gaussian random variable with mean zero and variance 1.
From (8.10) we conclude that for a given |M| the correct probability P_C depends only on
b, i.e. on the ratio E_s/N_0. This can be regarded as a signal-to-noise ratio since E_s is the signal
energy and N_0 is twice the variance of the noise in each dimension.

Example 8.1 In figure 8.1 the error probability PE = 1 − PC is depicted for values of log2 |M| =
1, 2, · · · , 16 and as a function of E s /N0 .
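The curves of figure 8.1 can be reproduced by evaluating (8.10) numerically; the sketch below uses a simple Riemann sum together with the standard normal density and distribution from SciPy (the grid limits and the example value of E_s/N_0 are arbitrary choices).

    # Error probability of |M| orthogonal signals via numerical integration of (8.10).
    import numpy as np
    from scipy.stats import norm

    def PE_orthogonal(M, Es_over_N0):
        b = np.sqrt(2.0 * Es_over_N0)
        mu = np.linspace(-10.0, b + 10.0, 4001)            # integration grid
        dmu = mu[1] - mu[0]
        integrand = norm.pdf(mu - b) * norm.cdf(mu) ** (M - 1)
        PC = np.sum(integrand) * dmu                       # approximates (8.10)
        return 1.0 - PC

    for log2M in (1, 4, 16):
        print(log2M, PE_orthogonal(2 ** log2M, Es_over_N0=10.0))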

8.4 Capacity
Consider the following experiment. We keep increasing |M| and want to know how we should
increase E s /N0 such that the error probability PE gets smaller and smaller. It turns out that it is
the energy per bit that counts. We define E b i.e. the energy per transmitted bit of information

   E_b = E_s / log2 |M|.    (8.11)

Reliable communication is possible if E b > N0 ln 2. More precisely:


Figure 8.1: Error probability PE for |M| orthogonal signals as a function of E_s/N_0 in dB for
log2 |M| = 1, 2, · · · , 16. Note that PE increases with |M| for fixed E_s/N_0. (Vertical axis: PE from 10^0 down to 10^−4; horizontal axis: E_s/N_0 from −4 to 16 dB.)

RESULT 8.2 The error probability for orthogonal signalling satisfies

   PE ≤ 2 exp(−log2 |M| [E_b/(2N_0) − ln 2]),          if E_b/N_0 ≥ 4 ln 2,
   PE ≤ 2 exp(−log2 |M| [√(E_b/N_0) − √(ln 2)]²),      if ln 2 ≤ E_b/N_0 ≤ 4 ln 2.    (8.12)

The proof of result 8.2 can be found in appendix G.


The consequence of (8.12) is that if E_b, i.e. the energy per bit, is larger than N_0 ln 2 we can
get an arbitrarily small error probability by increasing |M|, the dimensionality of the signal space.
This is a capacity result!
Note that the number of bits that we transmit each time is log2 |M| and this number grows
much slower than |M| itself.

Example 8.2 In figure 8.2 we have plotted the error probability PE as a function of the ratio E b /N0 . It
appears that for ratios larger than ln 2 = −1.5917 dB (this number is called the Shannon limit) the error
probability decreases by making |M| larger.

Figure 8.2: Error probability for |M| = 2, 4, 8, · · · , 32768, now as a function of the ratio E_b/N_0
in dB. (Vertical axis: PE from 10^0 down to 10^−4; horizontal axis: E_b/N_0 from −4 to 16 dB.)

8.5 Exercises
1. A Hadamard matrix is a matrix whose elements are ±1. When n is a power of 2, an n × n
Hadamard matrix is constructed by means of the recursion:

      H_2 = [ +1  +1 ]
            [ +1  −1 ],

      H_2n = [ +H_n  +H_n ]
             [ +H_n  −H_n ].    (8.13)

Let n be a power of 2 and M = {1, 2, · · · , n}. Consider for m ∈ M the signal vectors
s_m = √(E_s/n) h_m, where h_m is the m-th row of the Hadamard matrix H_n.

(a) Show that the signal set {s m , m ∈ M} consists of orthogonal vectors all having
energy E s .
(b) What is the error probability PE if the signal vectors correspond to waveforms that
are transmitted over a waveform channel with additive white Gaussian noise having
spectral density N0 /2.
(c) What is the advantage of using the Hadamard signal set over the orthogonal set from
definition 8.1 if we assume that the building-block waveforms are in both cases time-
shifts of a pulse? And the disadvantage?

(d) The 2n × n matrix

       H*_n = [ +H_n ]
              [ −H_n ]    (8.14)

    defines a set of 2n signals which is called bi-orthogonal. Determine the error probability
    of this signal set if the energy of each signal is E_s.

A bi-orthogonal “code” with n = 32 was used for an early deep-space mission (Mariner,
1969). A fast Hadamard transform was used as decoding method [6].

2. Consider a communication system based on frequency-shift keying (FSK). There are 8


equiprobable messages, hence Pr{M = m} = 1/8 for m ∈ {1, 2, · · · , 8}. The signal
waveform corresponding to message m is
   s_m(t) = A√2 cos(2πmt),   for 0 ≤ t < 1.

For t < 0 and t ≥ 1 all signals are zero. The signals are transmitted over an additive white
Gaussian noise waveform channel. The power spectral density of the noise is Sn ( f ) =
N0 /2 for all frequencies f .

(a) First show that the signals sm (t), m ∈ {1, 2, · · · , 8} are orthogonal1 . Give the ener-
gies of the signals. What are the building-block waveforms ϕ1 (t), ϕ2 (t), · · · , ϕ8 (t)
that result in the signal vectors

s1 = (A, 0, 0, 0, 0, 0, 0, 0),
s2 = (0, A, 0, 0, 0, 0, 0, 0),
···
s8 = (0, 0, 0, 0, 0, 0, 0, A)?
(b) The optimum receiver first determines the correlations r_i = ∫_0^1 r(t)φ_i(t) dt for i =
    1, 2, · · · , 8. Here r(t) is the received waveform, hence r(t) = s_m(t) + n_w(t). For
    what values of the vector r = (r_1, r_2, · · · , r_8) does the receiver decide that message
    m was transmitted?
(c) Give an expression for the error probability PE obtained by the optimum receiver.
(d) Next consider a system with 16 messages all having the same a-priori probability.
The signal waveform corresponding to message m is now given by
       s_m(t) = A√2 cos(2πmt),   for 0 ≤ t < 1,  for m = 1, 2, · · · , 8,
       s_m(t) = −A√2 cos(2π(m − 8)t),   for 0 ≤ t < 1,  for m = 9, 10, · · · , 16.

For t < 0 and t ≥ 1 these signals are again zero. What are the signal vectors
s 1 , s 2 , · · · , s 16 if we use the building vectors mentioned in part (a) again?
1 Hint: 2 cos(a) cos(b) = cos(a − b) + cos(a + b).
(e) The optimum receiver again determines the correlations r_i = ∫_0^1 r(t)φ_i(t) dt for i =
    1, 2, · · · , 8. For what values of the vector r = (r_1, r_2, · · · , r_8) does the receiver now
decide that message m was transmitted? Give an expression for the error probability
PE obtained by the optimum receiver now.

(Exam Communication Principles, October 6, 2003.)


Part III

Transmitter Optimization

Chapter 9

Signal energy considerations

SUMMARY: The collection of vectors that corresponds to the signal-waveforms is called
the signal structure. If the signal structure is translated or rotated the error probability
need not change. In this chapter we determine the translation vector that minimizes the
average signal energy. After that we demonstrate that orthogonal signaling achieves a certain
error performance with twice as much energy as antipodal signaling. The energy loss of
orthogonal sets with many signals can be neglected, however.

9.1 Translation and rotation of signal structures


In the additive white1 Gaussian noise case, if a signal structure, i.e. the collection of vectors
s 1 , s 2 , · · · , s |M| , is translated or rotated the error probability PE need not change. To see why,
just assume that the corresponding decision regions I1 , I2 , · · · , I|M| (which need not be opti-
mum) are translated or rotated in the same way. Then, because the additive white Gaussian
noise-vector is spherically symmetric in all dimensions in the signal space, and since distances
between decoding regions and signal points did not change, the error probability remains the
same. However, in general, the average signal energy changes if a signal structure is translated.
Rotation about the origin has no effect on the average signal energy.

9.2 Signal energy


Here we want to stress again that
   E_{sm} = ∫_0^T s_m²(t) dt = ‖s_m‖²,    (9.1)
1 Although we are concerned with an AGN vector channel, we use the word ”white” to emphasize that this vector

channel is equivalent to a waveform channel with additive white Gaussian noise.


therefore we only need to know the collection of signal vectors, the signal structure, and the
message probabilities to determine the average signal energy, which is defined as
   E_av = Σ_{m∈M} Pr{M = m} E_{sm} = Σ_{m∈M} Pr{M = m} ‖s_m‖² = E[‖S‖²].    (9.2)

9.3 Translating a signal structure


The smallest possible average error probability of a waveform communication system only de-
pends on the signal structure, i.e. on the collection of vectors s 1 , s 2 , · · · , s |M| . It does not
change if the entire signal structure is translated. The optimum decision regions simply translate
too. Translation may affect the average signal energy, however. We next determine the translation
vector that minimizes the average signal energy.

Figure 9.1: Translation of a signal structure, moving the origin of the coordinate system to a.


Consider a certain signal structure. Let E_av(a) denote the average signal energy when the
structure is translated over −a, or equivalently when the origin of the coordinate system is moved
to a (see figure 9.1), then
   E_av(a) = Σ_{m∈M} Pr{M = m} ‖s_m − a‖² = E[‖S − a‖²].    (9.3)

We now have to find out how we should choose the translation vector a that minimizes
E_av(a). It turns out that the best choice is
   a = Σ_{m=1,|M|} Pr{M = m} s_m = E[S].    (9.4)

This follows from considering an alternative translation vector b.

Figure 9.2: The two signal sets of (9.6) and (9.7) in waveform representation. (Four panels, showing s1(t)/√(E_s) and s2(t)/√(E_s) for each set, plotted over 0 ≤ t ≤ 1.)

The energy of the signal structure after translation by b is

   E_av(b) = E[‖S − b‖²] = E[‖(S − a) + (a − b)‖²]
           = E[‖S − a‖² + 2(S − a) · (a − b) + ‖a − b‖²]
           = E[‖S − a‖²] + 2(E[S] − a) · (a − b) + ‖a − b‖²
           = E[‖S − a‖²] + ‖a − b‖²
           ≥ E[‖S − a‖²],    (9.5)

where equality in the third line follows from a = E[S]. Observe that E av (b) is minimized only
for b = a = E[S].

RESULT 9.1 This implies that to minimize the average signal energy we should choose the
center of gravity of the signal structure as the origin of the coordinate system.
In what follows we will discuss some examples.
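Before turning to the examples, here is a tiny numerical sanity check of result 9.1; the signal vectors and message probabilities below are made-up example values.

    # Verify that the center of gravity a = E[S] minimizes the average energy.
    import numpy as np

    S = np.array([[3.0, 2.0], [-1.0, -2.0], [0.5, 1.0]])   # signal vectors s_m
    P = np.array([0.5, 0.3, 0.2])                          # message probabilities

    def E_av(b):
        # average signal energy after moving the origin to b, cf. (9.3)
        return float(np.sum(P * np.sum((S - b) ** 2, axis=1)))

    a = P @ S                                              # center of gravity E[S]
    rng = np.random.default_rng(2)
    for b in rng.normal(size=(5, 2)):
        assert E_av(b) >= E_av(a) - 1e-12                  # a is never beaten
    print(E_av(np.zeros(2)), E_av(a))                      # energy saved by translating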

9.4 Comparison of orthogonal and antipodal signaling


Let M = {1, 2} and Pr{M = 1} = Pr{M = 2} = 1/2. Consider a first signal set of two
orthogonal waveforms (see the two sub-figures in the top row of figure 9.2)
   s1(t) = √(2E_s) sin(10πt),
   s2(t) = √(2E_s) sin(12πt),   for 0 ≤ t < 1,    (9.6)

Figure 9.3: Vector representation of the orthogonal (a) and the antipodal signal set (b).

Figure 9.4: Probability of error for binary antipodal and orthogonal signaling as a function of
E_s/N_0 in dB. It is assumed that Pr{M = 1} = Pr{M = 2} = 1/2. Observe the difference of 3
dB between both curves. (Vertical axis: PE from 10^0 down to 10^−7; horizontal axis: E_s/N_0 from −15 to 15 dB.)

and a second signal set of two antipodal signals (see the bottom row sub-figures in figure 9.2)
   s1(t) = √(2E_s) sin(10πt),
   s2(t) = −√(2E_s) sin(10πt),   also for 0 ≤ t < 1.    (9.7)

Note that binary FSK (frequency-shift keying) is the same as orthogonal signaling, while binary
PSK (phase-shift keying) is identical to antipodal signaling.
The vector representations of the signal sets are
   s_1 = (√(E_s), 0),
   s_2 = (0, √(E_s)),    (9.8)
for the first (orthogonal) set, and
   s_1 = (√(E_s), 0),
   s_2 = (−√(E_s), 0),    (9.9)
for the second (antipodal) set.


The average signal energy for both sets is equal to E_s. For the error probabilities in the case
of additive white Gaussian noise with power spectral density N_0/2 we get

   PE^{orthog.} = Q( √(2E_s) / (2√(N_0/2)) ) = Q( √(E_s/N_0) ),
   PE^{antipod.} = Q( 2√(E_s) / (2√(N_0/2)) ) = Q( √(2E_s/N_0) ).    (9.10)

Note that the distance d between the signal points differs by a factor √2 (a factor 2 in squared distance), and that PE = Q(d/(2σ)).
These error probabilities are depicted in figure 9.4. We observe a difference in the error
probabilities of 3.0 dB². By this we mean that, in order to achieve a certain value of PE, we have
to make the signal energy twice as large for orthogonal signaling as for antipodal signaling.
The better performance of antipodal signaling relative to orthogonal signaling is best explained
by the fact that for antipodal signaling the center of gravity is the origin of the coordinate system
while for orthogonal signaling this is certainly not the case.
We may conclude that binary PSK modulation achieves a better error-performance than bi-
nary FSK. An advantage of FSK modulation is however that efficient FSK receivers can be
designed that do not recover the phase. PSK on the other hand requires coherent demodulation.
2 For a description of the meaning of decibels etc. we refer to appendix H.
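A short numerical illustration of the 3 dB gap in (9.10), using Q(x) = ½ erfc(x/√2); the target error probability is an arbitrary example.

    # Orthogonal (FSK) versus antipodal (PSK) signaling, cf. (9.10).
    import numpy as np
    from scipy.special import erfc, erfcinv

    def Q(x):
        return 0.5 * erfc(x / np.sqrt(2.0))

    x = np.sqrt(2.0) * erfcinv(2 * 1e-5)         # x such that Q(x) = 1e-5
    EsN0_orth = x ** 2                           # required Es/N0, orthogonal
    EsN0_anti = x ** 2 / 2                       # required Es/N0, antipodal
    print(10 * np.log10(EsN0_orth), 10 * np.log10(EsN0_anti))   # gap is 3.0 dB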

9.5 Energy of |M| orthogonal signals


The average energy E_av of the signals in an orthogonal set (all signals having energy E_s) is
   E_av = E[‖S‖²] = Σ_{m∈M} Pr{M = m} ‖s_m‖² = E_s.    (9.11)

The center of gravity of our orthogonal signal structure is
   a = (1/|M|, 1/|M|, · · · , 1/|M|) √(E_s).    (9.12)

We know from (9.5) that
   E_av(0) = E_av(a) + ‖a‖²,    (9.13)
thus, since E_av(0) = E_av,
   E_s = E_av(a) + E_s/|M|,    (9.14)
or
   E_av(a) = E_s (1 − 1/|M|).    (9.15)
Observe that E_av(a) is the average signal energy after translating the signal structure such that
the origin is the center of gravity. For |M| → ∞ the difference between E_av(a) and E_s can
be neglected. Therefore we conclude that an orthogonal signal set is not optimal in the sense that,
for a given error performance, it does not need the smallest possible average signal energy, but the
difference diminishes as |M| → ∞.

9.6 Exercises
1. Either of the two waveform sets illustrated in figures 9.5 and 9.6 may be used to commu-
nicate one of four equally likely messages over an additive white Gaussian noise channel.

(a) Show that both sets use the same energy.


(b) Exploit the union bound to show that the set of figure 9.6 uses energy almost 3 dB
more effectively than the set of figure 9.5 when a small PE is required.

(Exercise 4.18 from Wozencraft and Jacobs [25].)

2. In the communication system diagrammed in figure 9.7 the transmitted signal-vectors


(sm1 , sm2 ) and the noises n 1 and n 2 are all statistically independent. Assume that |M| = 2

Figure 9.5: Signal set (a). (Four waveforms s1(t), · · · , s4(t), each with amplitude √(E_s/2) on part of the interval 0 ≤ t ≤ 4.)

Figure 9.6: Signal set (b). (Four waveforms s1(t), · · · , s4(t), each with amplitude √(E_s) on part of the interval 0 ≤ t ≤ 4.)



i.e. M = {1, 2} and that

   Pr{M = 1} = Pr{M = 2} = 1/2,
   (s11, s12) = (3√E, 2√E),
   (s21, s22) = (−√E, −2√E),
   p_{N1}(n) = p_{N2}(n) = (1/√(2πσ²)) exp(−n²/(2σ²)).    (9.16)

Figure 9.7: Communication system. (The transmitter maps m to (s_m1, s_m2); noise n_1 is added to s_m1 giving r_1 = s_m1 + n_1, and noise n_2 is added to s_m2 giving r_2 = s_m2 + n_2; the optimum receiver produces m̂.)

Figure 9.8: Detector structure. (The detector forms r_1 + αr_2 and feeds it to a threshold device that outputs m̂.)

(a) Show that the optimum receiver can be realized as diagrammed in figure 9.8. Deter-
mine α, the optimum threshold setting, and the value m̂ for r1 + αr2 larger than the
threshold.
(b) Express the resulting PE in terms of Q(·).
(c) What is the average energy E av of the signals. By translation of the signal structure
we can reduce this average energy without changing the error probability. Determine
the smallest possible average signal energy that can be obtained in this way.
(Exam Communication Principles, July 5, 2002.)
Chapter 10

Signaling for message sequences

SUMMARY: In this chapter we will describe the problem of transmitting a continuous


stream of messages. How should we use the available signal power Ps in such a way that we
achieve a large information rate R together with a small error probability PE ?

10.1 Introduction
In the previous part we have considered transmission of a single randomly-chosen message
m ∈ M, over a waveform channel during the time-interval 0 ≤ t < T . The problem that was to
be solved there was to determine the optimum receiver, i.e the receiver that minimizes the error
probability PE for a channel with additive white Gaussian noise. We have found the following
solution:
1. First form an orthonormal base for the set of signals.
2. The optimum receiver determines the finite-dimensional vector representation of the re-
ceived waveform, i.e. it projects the received waveform onto the orthonormal signal base.
3. Then using this vector representation of the received waveform, it determines the message
that has the (or a) largest a-posteriori probability. This is done by minimizing expression
(4.12) over all m ∈ M, hence by acting as optimum vector receiver.
In the present part we shall investigate the more practical situation where we have a con-
tinuous stream of messages that are to be transmitted over our additive white Gaussian noise
waveform channel.

10.2 Definitions, problem statement


Definition 10.1 We assume that each T seconds one out of |M| messages is to be transmitted.
The messages are equally likely1 i.e. Pr{M = m} = 1/|M| for all m ∈ M.
1 Only for the sake of simplicity we assume that all messages are equally likely. This is not essential however.


Figure 10.1: A continuous stream of messages. (Message m_1 ∈ M occupies [0, T), m_2 ∈ M occupies [T, 2T), m_3 ∈ M occupies [2T, 3T), and so on.)

The waveform that corresponds to message m ∈ M is sm (t). It is non-zero only for 0 ≤


t < T . If the message in the k -th interval [(k − 1)T, kT ) is m , the signal in that interval is
sk,m (t) = sm (t − (k − 1)T ).

Definition 10.2 For each signal s_m(t), m ∈ M, the available energy is E_s, hence
   E_s ≥ ∫_0^T s_m²(t) dt   Joule.    (10.1)
Therefore the available power is
   P_s = E_s/T   Joule/second.    (10.2)

Definition 10.3 The transmission rate R is defined as
   R = log2 |M| / T   bit/second.    (10.3)

Definition 10.4 Now the available energy per transmitted bit is defined as
   E_b = E_s / log2 |M|   Joule/bit.    (10.4)

From the above definitions we can deduce for the available energy per transmitted bit that
   E_b = E_s / log2 |M| = (E_s/T) · (T / log2 |M|) = P_s / R,    (10.5)
hence Ps = R E b .
Our problem is now to determine the maximum rate at which we can communicate reli-
ably over a waveform channel when the available power is Ps . What are the signals that are to
be used to achieve this maximum rate? In the next sections we will first consider two extremal
situations, namely bit-by-bit signaling and block-orthogonal signaling.

10.3 Bit-by-bit signaling


10.3.1 Description
Suppose that in T seconds we want to transmit K bits b_1 b_2 · · · b_K. Then
   |M| = 2^K,
   R = log2 |M| / T = K/T.    (10.6)

Figure 10.2: A bit-by-bit waveform for K = 5 and message 11010. (Top: the basic pulse x(t) of duration τ with ∫ x²(t) dt = E_b; bottom: the signal s(t), a sum of time-shifted copies ±x(t − (i − 1)τ) with signs given by the message bits.)

We can realize this by transmitting a signal s(t) that is composed out of K pulses x(t) that
are shifted in time (see figure 10.2). More precisely
   s(t) = Σ_{i=1,K} s_i x(t − (i − 1)τ),   with   s_i = −1 if b_i = 0,   s_i = +1 if b_i = 1.    (10.7)

Here x(t) is a pulse with energy E_b and duration τ.


To evaluate the performance of bit-by-bit signaling we first have to determine the building-
block waveforms that are the base of our signals. It will be clear that if we take, for i =
1, 2, · · · , K,
   φ_i(t) = x(t − (i − 1)τ) / √(E_b),    (10.8)
then for i = 1, 2, · · · , K and j = 1, 2, · · · , K
   ∫_{−∞}^{∞} φ_i(t) φ_j(t) dt = δ_ij,    (10.9)

and we have an orthonormal base. Hence the building-block waveforms are the time-shifts over
multiples of τ of the normalized pulse x(t)/√(E_b), as is shown in figure 10.3. Now we can deter-
mine the signal structures that correspond to all the signals. For K = 1, 2, 3 we have shown the
signal structures for bit-by-bit signaling in figure 10.4. The structure is always a K -dimensional
hypercube.
To see what the decision regions look like, note first that
   s_1 = (−√(E_b), −√(E_b), · · · , −√(E_b)).    (10.10)

Figure 10.3: Our K = 5 building-block waveforms. (The waveforms φ_1(t), · · · , φ_5(t) are the time-shifts of the normalized pulse x(t)/√(E_b) over multiples of τ.)

Figure 10.4: Bit-by-bit signal structure for K = 1, 2, 3. (The 2^K signal points lie at the corners (±√(E_b), · · · , ±√(E_b)) of a K-dimensional hypercube with side 2√(E_b).)



We claim² that the optimum receiver decides m̂ = 1 if
   r_i < 0,   for all i = 1, K.    (10.11)

For messages other than m = 1 similar arguments show that the decision regions are
separated by the hyperplanes r_i = 0 for i = 1, K.
To find an expression for PE note that the signal hypercube is symmetrical and assume that
s_1 was transmitted. No error occurs if r_i = −√(E_b) + n_i < 0 for all i = 1, K or, in other words,
if
   n_i < √(E_b)   for all i = 1, K,    (10.12)
hence, noting that σ² = N_0/2, we get
   P_C = ( 1 − Q(√(2E_b/N_0)) )^K.    (10.13)
Therefore
   PE = 1 − ( 1 − Q(√(2E_b/N_0)) )^K.    (10.14)
Observe that to estimate bit bi for i = 1, K , an optimum receiver only needs to consider the
received signal ri in dimension i.

10.3.2 Probability of error considerations


Note that for bit-by-bit signaling K = RT (see equation (10.6)) and that E_b = P_s/R (see
equation (10.5)). Substituting this in (10.14), our expression for PE, we obtain
   PE = 1 − ( 1 − Q(√(2P_s/(R N_0))) )^{RT}.    (10.15)

Based on this expression for PE we can now investigate what we should do to improve the
reliability of our system.
First note that we should have K = RT ≥ 1. For K = 1 we get PE = Q(√(2P_s/(R N_0))).
There are two conclusions that can be drawn now:

• For fixed average power Ps and fixed rate R the average error probability increases and
approaches 1 if we increase T . We cannot improve reliability by increasing T therefore.

• For fixed T the average error probability PE can only be decreased by increasing the power
Ps or by decreasing the rate R.
This seemed to be ”the end of the story” for communication engineers before Shannon presented
his ideas in [20] and [21].
² To see why, consider a signal s_m for some m ≠ 1 and a dimension i for which s_mi = √(E_b). Note that
s_1i = −√(E_b), hence for r_i < 0 we have (r_i − s_mi)² = (r_i − √(E_b))² > (r_i + √(E_b))² = (r_i − s_1i)². For dimensions
i with s_mi = −√(E_b) = s_1i we get (r_i − s_mi)² = (r_i − s_1i)². Taking together all dimensions i = 1, K we obtain
‖r − s_m‖² > ‖r − s_1‖², and therefore, as claimed, m̂ = 1 is chosen by an optimum receiver if r_i < 0 for all i = 1, K.
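A quick numerical check of (10.15) makes the first conclusion above concrete; the power, rate and noise level below are arbitrary example values.

    # Bit-by-bit signaling: PE from (10.15) grows towards 1 as T increases.
    import numpy as np
    from scipy.special import erfc

    def Q(x):
        return 0.5 * erfc(x / np.sqrt(2.0))

    Ps, R, N0 = 4.5, 1.0, 1.0                 # example power, rate, noise level
    p_bit = Q(np.sqrt(2.0 * Ps / (R * N0)))   # per-dimension error probability
    for T in (1, 100, 10_000):
        PE = 1.0 - (1.0 - p_bit) ** (R * T)
        print(T, PE)                          # PE approaches 1 for growing T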

10.4 Block-orthogonal signaling


10.4.1 Description

Figure 10.5: A block-orthogonal waveform for K = 5 and message 11010, or m = 27. (Left: the basic pulse φ(t) of duration τ with ∫ φ²(t) dt = 1; right: the signal s_27(t), a shifted copy of √(E_s) φ(t) inside [0, 32τ), with ∫ s_27²(t) dt = E_s = T P_s.)

Next we will consider block-orthogonal signaling. We again want to transmit K bits in T
seconds. We do this by sending one out of 2^K orthogonal pulses every T seconds. If we use
pulse-position modulation (PPM) the |M| = 2^K signals are
   s_m(t) = √(E_s) φ(t − (m − 1)τ),   for m = 1, · · · , 2^K,    (10.16)
where φ(t) has energy 1 and duration not more than τ, with
   τ = T / 2^K.    (10.17)
All signals within the block [0, T ) are orthogonal and have energy E s . That is why we call our
signaling method block-orthogonal.

10.4.2 Probability of error considerations


To determine the error probability PE for block-orthogonal signaling we assume that
   E_b/N_0 = (1 + ε)² ln 2,   for 0 ≤ ε ≤ 1,    (10.18)
i.e. we are willing to spend slightly more than N_0 ln 2 Joule per transmitted bit of information.
Now we obtain from (8.12) that
   PE ≤ 2 exp(−log2 |M| [√((1 + ε)² ln 2) − √(ln 2)]²)
      = 2 exp(−log2 |M| ε² ln 2).    (10.19)

Finally, substitution of log2 |M| = RT (see (10.3)) yields
   PE ≤ 2 exp(−ε² RT ln 2).    (10.20)

What does (10.18) imply for the rate R? Note that E_b = P_s/R (see (10.5)). This results in
   E_b/N_0 = P_s/(R N_0) = (1 + ε)² ln 2,    (10.21)
in other words
   R = P_s / ((1 + ε)² N_0 ln 2).    (10.22)
Based on (10.20) and (10.22) we can now investigate what we should do to improve the reliability
of our system. This leads to the following result.

THEOREM 10.1 For an available average power P_s we can achieve rates R smaller than but
arbitrarily close to
   C_∞ = P_s / (N_0 ln 2)   bit/second,    (10.23)
while the error probability PE can be made arbitrarily small by increasing T. Observe that C_∞,
the capacity, depends only on the available power P_s and the spectral density N_0/2 of the noise.
This is a Shannon-type of result. The reliability can be increased not only by increasing the
power Ps or decreasing the rate R but also by increasing the “codeword-lengths” T . It is also
important to note that only rates up to the capacity C∞ can be achieved reliably. We will see
later that rates larger than C∞ are indeed not achievable. Therefore it is correct to use the term
capacity here.
One might get the feeling that this could finish our investigations. We have found the capacity
of the waveform channel with additive white Gaussian noise. In the next chapter we will see that
so far we have ignored an important property of signal sets: their dimensionality.
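A small numerical sketch of theorem 10.1, based on (10.20) and (10.22); the choice ε = 0.1 and the values of P_s, N_0 and T are arbitrary examples.

    # Block-orthogonal signaling: rate close to C_inf with PE driven to zero by T.
    import numpy as np

    Ps, N0, eps = 1.0, 1.0, 0.1
    C_inf = Ps / (N0 * np.log(2))                  # capacity (10.23), bit/second
    R = Ps / ((1 + eps) ** 2 * N0 * np.log(2))     # rate from (10.22)
    print(R / C_inf)                               # = 1/(1+eps)^2, about 0.83

    for T in (100, 1000, 10_000):
        PE_bound = 2 * np.exp(-eps ** 2 * R * T * np.log(2))
        print(T, PE_bound)                         # bound (10.20) tends to zero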

10.5 Exercises
1. Rewrite the bound in (8.12) on PE in terms of R, T and C_∞. Consider³
      E(R) = lim_{T→∞} −(1/T) ln PE(T, R),    (10.24)
for the best possible signaling methods with rate not smaller than R having signals with
duration T . The function E(R) of R is called the reliability function.
Now compute a lower bound on the reliability function E(R) and draw a plot of this lower
bound. It can be shown that E(R) is equal to the lower bound that we have just derived
(see Gallager [7]).

3 Just assume that the limit in this definition exists.


Chapter 11

Time, bandwidth, and dimensionality

SUMMARY: Here we first discuss the fact that, to obtain a small error probability PE ,
block-orthogonal signaling requires a huge number of dimensions per second. Then we
show that a bandwidth W can only accommodate roughly 2W dimensions per second.
Hence block-orthogonal signaling is only practical when the available bandwidth is very
large.

11.1 Number of dimensions for bit-by-bit and block-orthogonal


signaling
We again assume that in an interval of duration T we have to transmit K bits.
Then for bit-by-bit signaling we need K = RT dimensions per block. For block-orthogonal
signaling we need 2^K = 2^{RT} dimensions per block.
Therefore for bit-by-bit signaling we need K/T = R dimensions per second. For block-
orthogonal signaling we need 2^K/T = 2^{RT}/T dimensions per second.
Note that for block-orthogonal signaling the number of dimensions per second explodes by
increasing T . In the next section we will see that a channel with a finite bandwidth cannot
accommodate all these dimensions. Increasing T is however necessary to improve the reliability
of a block-orthogonal system hence finite bandwidth creates a problem.

11.2 Dimensionality as a function of T


The following theorem will not be proved. For more information on this subject check Wozen-
craft and Jacobs [25]. The Fourier transform is described shortly in appendix B.

RESULT 11.1 (Dimensionality theorem) Let {φ_i(t), for i = 1, N} denote any set of orthogonal
waveforms such that for all i = 1, N,

1. φ_i(t) = 0 for t outside [−T/2, T/2),


2. and also
      ∫_{−W}^{W} |Φ_i(f)|² df ≥ (1 − η_W²) ∫_{−∞}^{∞} |Φ_i(f)|² df    (11.1)
   for the spectrum Φ_i(f) of φ_i(t). The parameter W is called the bandwidth (in Hz).

Then the dimensionality N of the set of orthogonal waveforms is upper-bounded by
   N ≤ 2WT / (1 − η_W²).    (11.2)

The theorem says that if we require almost all spectral energy of the waveforms to be in the
frequency range [−W, W ], the number of waveforms cannot be much more than roughly 2W T .
In other words the number of dimensions is not much more than 2W per second.
The “definition” of bandwidth in the dimensionality theorem may seem somewhat arbitrary,
but what is important is that the number of dimensions grows not faster than linear in T .
Note that instead of [0, T) as in previous chapters, we have restricted ourselves here to the
interval [−T/2, T/2]. The reason for this is that the analysis is easier this way.
Next we want to show the converse statement, i.e. that the number of dimensions N can grow
linearly with T. Therefore consider 2K + 1 orthogonal waveforms that are always zero except
for −T/2 ≤ t ≤ T/2. In that case the waveforms are defined as
   φ_0(t) = 1,
   φ_1^c(t) = √2 cos(2πt/T)    and   φ_1^s(t) = √2 sin(2πt/T),
   φ_2^c(t) = √2 cos(4πt/T)    and   φ_2^s(t) = √2 sin(4πt/T),
   · · ·
   φ_K^c(t) = √2 cos(2Kπt/T)   and   φ_K^s(t) = √2 sin(2Kπt/T).    (11.3)
We now determine the spectra of all these waveforms.

1. The spectrum of the first waveform φ_0(t) is
      Φ_0(f) = ∫_{−T/2}^{T/2} exp(−j2πft) dt
             = (−1/(j2πf)) ∫_{−T/2}^{T/2} d exp(−j2πft)
             = (−1/(j2πf)) [ exp(−j2πf T/2) − exp(j2πf T/2) ] = sin(πfT) / (πf).    (11.4)
   Note that this is the so-called sinc-function. In figure 11.1 the signal φ_0(t) and the corresponding
   spectrum Φ_0(f) are shown for T = 1.

Figure 11.1: The signal φ_0(t) and the corresponding spectrum Φ_0(f) for T = 1. (Left panel: the rectangular pulse φ_0(t) on [−1/2, 1/2]; right panel: the sinc-shaped spectrum Φ_0(f).)

2. The spectrum of the cosine-waveform φ_k^c(t) for a specified k can be determined by observing
   that
      φ_k^c(t) = φ_0(t) √2 cos(2πkt/T)
               = φ_0(t) √2 · (1/2) [ exp(j2πkt/T) + exp(−j2πkt/T) ],    (11.5)
   hence (see appendix B)
      Φ_k^c(f) = (1/√2) [ Φ_0(f − k/T) + Φ_0(f + k/T) ].    (11.6)
   In figure 11.2 the signal φ_5^c(t) and the corresponding spectrum Φ_5^c(f) are shown, again for
   T = 1.

We now want to find out how much energy is in certain frequency bands. E.g. MATLAB
tells us that
   ∫_{−∞}^{∞} ( sin(πfT)/(πf) )² df = T   and
   ∫_{−1/T}^{1/T} ( sin(πfT)/(πf) )² df = 0.9028 T,    (11.7)
hence less than 10 % of the energy of the sinc-function is outside the frequency band [−1/T, 1/T].
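The same number can be checked with a few lines of Python (the grid resolution is an arbitrary choice):

    # Fraction of the sinc spectrum's energy inside [-1/T, 1/T], cf. (11.7).
    import numpy as np

    T = 1.0
    f = np.linspace(1e-9, 1.0 / T, 200_000)          # positive frequencies only
    df = f[1] - f[0]
    Phi0_sq = (np.sin(np.pi * f * T) / (np.pi * f)) ** 2
    in_band = 2.0 * np.sum(Phi0_sq) * df             # factor 2: the spectrum is even
    print(in_band / T)                               # approximately 0.9028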

Figure 11.2: The signal φ_5^c(t) and the corresponding spectrum Φ_5^c(f) for T = 1. (Left panel: the windowed cosine φ_5^c(t) on [−1/2, 1/2]; right panel: its spectrum, two shifted sinc-functions centered at f = ±5.)

Numerical analysis shows that never more than 12 % of the energy of Φ_k^c(f) is outside
the frequency band [−(k + 1)/T, (k + 1)/T] for k = 1, 2, · · ·. Similarly we can show that the spectrum
of the sine-waveform φ_k^s(t) has never more than 5 % of its energy outside the frequency band
[−(k + 1)/T, (k + 1)/T] for such k. For large k both percentages approach roughly 5 %.
The frequency band needed to accommodate all 2K + 1 waveforms is [−(K + 1)/T, (K + 1)/T]. Now
suppose that the available bandwidth is W. Then, to have not more than 12 % out-of-band energy,
K should satisfy
   W ≥ (K + 1)/T.    (11.8)
If, for a certain bandwidth W, we take the largest K, the number N of orthogonal waveforms is
   N = 2K + 1 = 2⌊WT − 1⌋ + 1 > 2(WT − 2) + 1 = 2WT − 3.    (11.9)

RESULT 11.2 There are at least N = 2WT − 3 orthogonal waveforms over [−T/2, T/2] such that
less than 12 % of their spectral energy is outside the frequency band [−W, W]. This implies that
a fixed bandwidth W can accommodate 2W dimensions per second for large T.
The dimensionality theorem and the converse result show that block-orthogonal signaling has
a very unpleasant property that concerns bandwidth. This is demonstrated by the next example.

Example 11.1 For block-orthogonal signaling the number of required dimensions in an interval of T
seconds is 2^{RT}. If the channel bandwidth is W, the number of available dimensions is roughly 2WT,
hence the bandwidth W should be such that W ≥ 2^{RT}/(2T).
Now consider subsection 10.4.2 and take ε = 0.1. Then (see (10.22)) R = (100/121) C_∞. To get an
acceptable PE we have to take RT ≫ 100 (see (10.20)), hence
   W ≥ 2^{RT}/(2T) = (2^{RT}/(2RT)) R ≫ (2^{100}/200) R.    (11.10)
Even for very small values of the rate R this causes a big problem. We can also conclude that only very
small spectral efficiencies R/W can be realized¹.

11.3 Remark
In this chapter we have studied building-block waveforms that are time-limited. As a conse-
quence these waveforms have a spectrum which is not frequency-limited. We can also investi-
gate waveforms that are frequency-limited. However this implies that these waveforms are not
time-limited anymore. Chapter 14 on pulse modulation deals with such building blocks.

11.4 Exercises

1. Consider the Gaussian pulse x(t) = (√(2πσ²))^{−1} exp(−t²/(2σ²)) and signals such as
      s_m(t) = Σ_{i=1,N} s_mi x(t − iτ),   for m = 1, 2, · · · , |M|,    (11.11)

constructed from successive τ -second translates of x(t). Constrain the inter-pulse interfer-
ence by requiring that
      ∫_{−∞}^{∞} x(t − kτ) x(t − lτ) dt ≤ 0.05 ∫_{−∞}^{∞} x²(t) dt,   for all k ≠ l,    (11.12)

and constrain the signal bandwidth W by requiring that x(t) have no more than 10 % of
its energy outside the frequency interval [−W, W ]. Determine the largest possible value
of the coefficient c in the equation N = cTW when N ≫ 1.
(Exercise 5.4 from Wozencraft and Jacobs [25].)

¹ Note however that e.g. for ε = 1 the required number of dimensions can be acceptable. However, then the rate
R is only one fourth of the capacity P_s/(N_0 ln 2).
Chapter 12

Bandlimited transmission

SUMMARY: Here we determine the capacity of the bandlimited additive white Gaussian
noise waveform channel. We approach this problem in a geometrical way. The key idea is
random code generation (Shannon, 1948).

12.1 Introduction
In the previous chapter we have seen that for a waveform channel with bandwidth W , the num-
ber of available dimensions per second is roughly 2W . We have also seen that reliable block-
orthogonal signaling requires, already for very modest rates, many more dimensions per second.
The reason for this was that, for rates close to capacity, although the error probability decreases
exponentially with T , the number of necessary dimensions increases exponentially with T . On
the other hand bit-by-bit signaling requires as many dimensions as there are bits to be transmit-
ted, which is acceptable from the bandwidth perspective. But bit-by-bit signaling can only made
reliable by increasing the transmitter power Ps or by decreasing the rate R.
This raises the question whether we can achieve reliable transmission at certain rates R by
increasing T , when both the bandwidth W and available power Ps are fixed. The following
sections show that this is indeed possible. We will describe the results obtained by Shannon
[21]. His analysis has a strongly geometrical flavor. These early results may not be as strong as
possible but they certainly are very surprising and give the right intuition.
We will determine the capacity per dimension, not per unit of bandwidth since the definition
of bandwidth is somewhat arbitrary. We will work with vectors as we learned in the previous
part of these course notes. Note that all messages are equiprobable.

Definition 12.1 The energies of all signals are upper-bounded by E_s, which is defined to be the
number of dimensions N times E_N, i.e.
   ‖s_m‖² = Σ_{i=1,N} s_mi² ≤ E_s = N E_N,   for all m ∈ M.    (12.1)


Figure 12.1: Hardening of the noise sphere around a signal point. (A hypersphere of radius √(N_0/2) is centered at the normalized signal vector s_m′; the received vector r′ = s_m′ + n′ lies essentially on its surface.)
It has advantages to consider here normalized versions of the signal, noise and received vectors.
These normalized versions are defined in the following way:

Definition 12.2 The normalized version r 0 of r is defined by


1 √
r0 = r/ N.
As usual N is the number of components in r or the number of dimensions. Consequently also
1 √
R 0 = R/ N for the random variables R 0 and R that correspond to the realizations r 0 and r .

12.2 Sphere hardening (noise vector)


The random noise vector N has N components, each with mean 0 and variance σ² = N_0/2. The
normalized noise vector is N′ = N/√N. Its expected squared length is
   E[‖N′‖²] = E[(1/N) Σ_{i=1,N} N_i²] = (1/N) Σ_{i=1,N} E[N_i²] = (1/N) Σ_{i=1,N} N_0/2 = N_0/2.    (12.2)

For the variance of the squared length of N′ we find that
   var[‖N′‖²] = (1/N²) var[ Σ_{i=1,N} N_i² ] = (1/N²) Σ_{i=1,N} var[N_i²]
              = (1/N²) Σ_{i=1,N} ( E[N_i⁴] − (E[N_i²])² )
              = (1/N²) Σ_{i=1,N} ( 3σ⁴ − (σ²)² ) = 2σ⁴/N = (2/N)(N_0/2)².    (12.3)

Observe that this variance approaches zero for N → ∞. Now Chebyshev’s inequality can be
used to show that the probability of observing a normalized noise vector with squared length
smaller than N0 /2 −  or larger than N0 /2 +  for any fixed  > 0 approaches zero for N → ∞.

RESULT 12.1 We have demonstrated that the normalized received vector r′ is (roughly speaking)
on the surface of a hypersphere with radius √(N_0/2) around the normalized signal vector s_m′. Small
fluctuations are possible, however for N → ∞ these fluctuations diminish.
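A Monte Carlo sketch of this sphere-hardening effect; the value of N_0 and the sample sizes are arbitrary choices.

    # The squared length of the normalized noise vector concentrates around N0/2.
    import numpy as np

    rng = np.random.default_rng(3)
    N0 = 1.0
    for N in (10, 100, 1000):
        n = rng.normal(scale=np.sqrt(N0 / 2.0), size=(2000, N))
        sq_len = np.sum((n / np.sqrt(N)) ** 2, axis=1)     # ||n'||^2 per trial
        print(N, sq_len.mean(), sq_len.std())              # mean -> N0/2, std -> 0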

12.3 Sphere hardening (received vector)


The received vector r = s_m + n has N components. The normalized received vector is r′ =
s_m′ + n′ where s_m′ = s_m/√N. The squared length of the normalized received vector can now be
expressed as
   ‖r′‖² = ‖s_m′ + n′‖² = ‖s_m′‖² + 2(s_m′ · n′) + ‖n′‖²
         = ‖s_m′‖² + (2/N) Σ_{i=1,N} s_mi n_i + ‖n′‖².    (12.4)

Note that the term ‖s_m′‖² is "non-random" for a given message m while the other two terms in
(12.4) depend on N and are therefore random variables. For the expected squared length of the
random normalized received vector R′ we therefore obtain that
   E[‖R′‖²] = ‖s_m′‖² + (2/N) Σ_{i=1,N} s_mi E[N_i] + E[‖N′‖²]
            = ‖s_m′‖² + N_0/2  ≤(a)  E_N + N_0/2.    (12.5)
Here (a) follows from the fact that ‖s_m′‖² = ‖s_m‖²/N ≤ E_s/N = E_N by definition 12.1. It can
be shown that also the variance of the squared length of R′ approaches zero for N → ∞. See
also exercise 2 at the end of this chapter.

RESULT 12.2 We have demonstrated that the normalized received vector r′ is, roughly speaking,
within a sphere with radius √(E_N + N_0/2) and center at the origin of the coordinate system.
For a normalized signal vector with energy equal to E_N the received vectors are, however, on the
surface of this hypersphere.

12.4 The interior volume of a hypersphere


Consider in N dimensions a hypersphere with radius r. The volume of this sphere is V_r = B_N r^N,
where B_N is some constant depending on N (see e.g. appendix 5D in Wozencraft and Jacobs
[25]). Now also consider a hypersphere with the same center but with a slightly smaller radius
r − ε for some ε > 0, see figure 12.2. The volume of this smaller hypersphere is V_{r−ε} =
B_N (r − ε)^N. This leads to the following result.

Figure 12.2: A hypersphere with radius r − ε concentric within a hypersphere of radius r; the
difference is a shell with thickness ε.

RESULT 12.3 The ratio of the volume of the "interior" of a hypersphere in N dimensions and
the volume of the hypersphere itself is
   V_{r−ε} / V_r = ( (r − ε)/r )^N,    (12.6)
which approaches zero for N → ∞. Consequently, for N → ∞ the shell volume V_r − V_{r−ε}
constitutes essentially the entire volume V_r of the hypersphere, no matter how small the thickness ε > 0 of
the shell is.

12.5 An upper bound for the number |M| of signal vectors


For reliable transmission, the (disjoint) decision regions I_m, m ∈ M, should not be essentially
smaller than a hypersphere with radius (slightly larger than) √(N_0/2). This follows from result
12.1. Note that we are referring to normalized vectors. On the other hand we know that all
received vectors are inside a hypersphere of radius (slightly larger than) √(E_N + N_0/2). This was
a consequence of result 12.2. To obtain an upper bound on the number |M| of signal vectors that
allow reliable communication, we can now apply a so-called sphere-packing argument.

RESULT 12.4 For reliable transmission, the number of signals |M| can not be larger than the
volume of the hypersphere containing the received vectors, divided by the volume of the noise
hyperspheres, hence
   |M| ≤ B_N (E_N + N_0/2)^{N/2} / ( B_N (N_0/2)^{N/2} ) = ( (E_N + N_0/2) / (N_0/2) )^{N/2}.    (12.7)

Here B_N is again the constant in the formula V_r = B_N r^N for the volume V_r of an N-dimensional
hypersphere with radius r.
We now have our upper bound on the number |M| of signal vectors. It should be noted that
in the book of Wozencraft and Jacobs [25] a more detailed proof can be found.

Figure 12.3: Random coding situation. (The normalized signal vector s_m′ lies on a sphere of radius √(E_N) around the origin; the noise n′ moves it to the received vector r′; the shaded "lens" is the intersection of the sphere of radius ‖n′‖ around r′ with the signal sphere, and h is the radius of a small hypersphere containing this lens.)

12.6 A lower bound to the number |M| of signal vectors


Shannon [21] constructed an existence proof to show that there are signal sets that allow reli-
able communication and that contain surprisingly many signals. His argument involved random
coding.

12.6.1 Generating a random code


Fix the number of signal vectors |M| and their length N. To obtain a signal set, choose |M|
normalized signal vectors s_1′, s_2′, · · · , s_|M|′ at random, independently of each other, uniformly
within a hypersphere centered at the origin having radius √(E_N). Consider the ensemble of all
signal sets that can be chosen in this way.

12.6.2 Error probability


We are now interested in PEav , i.e. the error probability PE averaged over the ensemble of signal
sets. The averaging corresponds to the choice of the signal sets. Once we know PEav we claim
that there exists at least one signal set with error probability PE ≤ PEav . Therefore we will first
show that PEav is small enough.

Consider figure 12.3. Suppose that the signal s_m′ for some fixed m ∈ M was actually sent.
The noise vector n′ is added to the signal vector and hence the received vector turns out to be
r′ = s_m′ + n′. Now an optimum receiver will decode the message m only if there are no other
signals s_k′, k ≠ m, inside a hypersphere with radius ‖n′‖ around r′. This is a consequence of result
4.1 that says that minimum Euclidean distance decoding is optimum if all messages are equally
likely.

(i) Note that result 12.3 implies that, with probability approaching one for N → ∞, the signal
s_m′ was chosen on the surface of a sphere with radius √(E_N). This is a consequence of the selection
procedure of the signal vectors, which is uniform over the hypersphere with radius √(E_N).
Therefore we may assume that ‖s_m′‖² = E_N. (ii) By result 12.2 the normalized received vector
r′ is now, with probability approaching one, on the surface of a sphere with radius √(E_N + N_0/2)
centered at the origin, and thus ‖r′‖² = E_N + N_0/2. (iii) Moreover the normalized noise vector
n′ is on the surface of a sphere with radius √(N_0/2), hence ‖n′‖² = N_0/2, by the sphere-hardening
argument (see result 12.1).
First we have to determine the probability that the signal corresponding to a fixed message
k ≠ m is inside the sphere with center r′ and radius ‖n′‖. Therefore we need to know the volume
of the "lens" in figure 12.3. Determining this volume is not so easy, but we know that it is not
larger than the volume of a sphere with radius h (see figure 12.3) and center coinciding with the
center of the lens. This hypersphere contains the lens. Our next problem is therefore to find out
how large h is.
Observing the lengths of s_m′, n′, and r′, we may conclude that the angle between the normalized
signal vector s_m′ and the normalized noise vector n′ is π/2 (Pythagoras). Now we can use
simple geometry to determine the length h:
   h = √(E_N) √(N_0/2) / √(E_N + N_0/2),    (12.8)
and consequently
   V_lens ≤ B_N h^N = B_N ( E_N (N_0/2) / (E_N + N_0/2) )^{N/2}.    (12.9)
The probability that a signal s_k for a specific k ≠ m was selected inside the lens is (by the
uniform distribution over the hypersphere with radius √(E_N)) given by the volume of the
lens divided by the volume of the sphere. This ratio can be bounded as follows:
   V_lens / (B_N E_N^{N/2}) ≤ ( (N_0/2) / (E_N + N_0/2) )^{N/2}.    (12.10)

By the union bound¹ the probability that any signal s_k for k ≠ m is chosen inside the lens is at
most |M| − 1 times as large as the ratio (probability) considered in (12.10). Therefore we obtain
¹ For the K events E_1, E_2, · · · , E_K the probability Pr{E_1 ∪ E_2 ∪ · · · ∪ E_K} ≤ Pr{E_1} + Pr{E_2} + · · · + Pr{E_K}.

for the error probability averaged over the ensemble of signal sets
   PE^av ≤ |M| ( (N_0/2) / (E_N + N_0/2) )^{N/2}.    (12.11)

12.6.3 Achievability result


RESULT 12.5 Fix any δ > 0. Suppose that the number of signals |M| as a function of the
number of dimensions N is given by
   |M| = 2^{−δN} ( (E_N + N_0/2) / (N_0/2) )^{N/2},    (12.12)
then
   lim_{N→∞} PE^av ≤ lim_{N→∞} 2^{−δN} = 0.    (12.13)
This implies the existence of signal sets, one for each value of N, with |M| as in (12.12), for
which lim_{N→∞} PE = 0.
Note that relation (12.12) makes reliable transmission possible. Since better methods could
exist, it serves as a lower bound on |M| when reliable transmission is required.

12.7 Capacity of the bandlimited channel


We can now combine results 12.4 and 12.5. Define
   C_N = (1/2) log2( (E_N + N_0/2) / (N_0/2) ),    (12.14)
then result 12.4 (upper bound) shows that reliable communication is only possible if the rate R_N
per dimension satisfies
   R_N = log2 |M| / N ≤ C_N.    (12.15)
On the other hand result 12.5 (lower bound) demonstrates the existence of signal sets that allow
reliable communication with rates
   R_N ≥ C_N − δ,    (12.16)
for any value of δ > 0. Since we can take δ as small as we like we get the following theorem:

THEOREM 12.6 (Shannon, 1949) Reliable communication is possible over a waveform channel,
when the available energy per dimension is E_N and the spectral density of the noise is N_0/2,
for all rates R_N satisfying
   R_N < C_N   bit/dimension.    (12.17)
Rates R_N larger than C_N are not realizable with reliable methods. The quantity C_N is called the
capacity of the waveform channel in bit per dimension.

In the previous chapter we have seen that a waveform channel with bandwidth W can accommodate
at most 2W dimensions per second. With 2W dimensions per second, the available
energy per dimension is E_N = P_s/(2W) if P_s is the available transmitter power. In that case the
channel capacity in bit per dimension is
   C_N = (1/2) log2( (P_s/(2W) + N_0/2) / (N_0/2) ) = (1/2) log2( 1 + P_s/(W N_0) ).    (12.18)
Therefore we obtain the following result:

THEOREM 12.7 For a waveform channel with spectral noise density N_0/2, frequency bandwidth
W, and available transmitter power P_s, the capacity in bit per second is
   C = 2W C_N = W log2( 1 + P_s/(W N_0) )   bit/second.    (12.19)
Thus reliable communication is possible for rates R in bit per second smaller than C, while rates
larger than C are not realizable with arbitrarily small PE.
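The formula (12.19) is easy to evaluate; the sketch below uses made-up values of W, P_s and N_0 and also prints the wideband limit P_s/(N_0 ln 2) for comparison.

    # Capacity of the bandlimited AWGN channel, cf. (12.19).
    import numpy as np

    def capacity(W, Ps, N0):
        return W * np.log2(1.0 + Ps / (W * N0))        # bit/second

    Ps, N0 = 1.0, 1.0e-3
    for W in (3.4e3, 1.0e5, 1.0e7):
        print(W, capacity(W, Ps, N0))                  # grows with W, but saturates
    print(Ps / (N0 * np.log(2)))                       # wideband limit, about 1443 bit/s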

12.8 Exercises
1. Let N be a random variable with density p_N(n) = (1/√(2πσ²)) exp(−n²/(2σ²)). Show that
   E[N⁴] = 3σ⁴.
2. Show that the variance of the squared length of the normalized received vector R 0 ap-
proaches zero for N → ∞.

Figure 12.4: Water-filling situation. (The transmitter output is split over two parallel channels; channel a adds white Gaussian noise n_w,a(t), channel b adds n_w,b(t); both outputs reach the receiver, which produces m̂.)

3. A transmitter having power Ps = 10 Watt is connected to a receiver by two waveform


channels (see figure 12.4). Each of these channels adds white Gaussian noise to its input
waveform. The power densities of the two noise processes are N0a /2 = 1 and N0b /2 = 2
for −∞ < f < ∞, while the frequency bandwidth W = 1Hz. What is the total capacity
of this parallel channel in bit per second? What happens if Ps = 1 Watt? The power
allocation procedure that should be used here is called “water-filling”.
Chapter 13

Wideband transmission

SUMMARY: We determine the capacity of the additive white Gaussian noise waveform
channel in the case of unlimited bandwidth resources. The result that is obtained in this
way shows that block-orthogonal signaling is optimal.

13.1 Introduction
We have seen that with block-orthogonal signaling we can achieve reliable communication at
rates
   R < P_s/(N_0 ln 2)   bit/second,    (13.1)
if we only had access to as much bandwidth as we would want. We will prove in this chapter
that rates larger than Ps /(N0 ln 2) are not realizable with reliable signaling schemes. Therefore
we may indeed call Ps /(N0 ln 2) the capacity of the wideband waveform channel.

13.2 Capacity of the wideband channel


In the previous chapter the capacity of a waveform channel with frequency bandwidth limited to
W was shown to be
   W log2( 1 + P_s/(W N_0) )   bit/second,    (13.2)
see theorem 12.7. We can easily determine the wideband capacity by letting W → ∞.

RESULT 13.1 The capacity C∞ of the wideband waveform channel with additive white Gaus-
sian noise with power density N0 /2 for −∞ < f < ∞, when the transmitter power is Ps , follows


Figure 13.1: The ratio ln(1 + S/N)/(S/N) and ln(1 + S/N) as a function of S/N. (Both curves are plotted for S/N between 0 and 2.5.)

from:
   C_∞ = lim_{W→∞} W log2( 1 + P_s/(W N_0) )
       = lim_{W→∞} ( P_s/(N_0 ln 2) ) · ln(1 + P_s/(W N_0)) / (P_s/(W N_0))
       = P_s/(N_0 ln 2).    (13.3)

Expressed in nats per second this capacity is equal to Ps /N0 .

13.3 Relation between capacities, signal-to-noise ratio


Note that we can express the capacity of the bandlimited channel as
   C = W log2( 1 + P_s/(W N_0) )
     = ( P_s/(N_0 ln 2) ) · ln(1 + P_s/(W N_0)) / (P_s/(W N_0))
     = C_∞ · ln(1 + S/N) / (S/N),   with   S/N = P_s/(W N_0).    (13.4)

Here S/N is the signal-to-noise ratio of the waveform channel. Note that the total noise power
is equal to the frequency band 2W times the power spectral density N0 /2 of the noise. We have
plotted the ratio ln(1 + S/N )/S/N in figure 13.1.

RESULT 13.2 If we note that C = W log2(1 + S/N) then we can distinguish two cases:
   C ≈ C_∞,              if S/N ≪ 1,
   C ≈ W log2(S/N),      if S/N ≫ 1.    (13.5)

The case S/N ≪ 1 is called the power-limited case: there is enough bandwidth. When, on the
other hand, S/N ≫ 1 we speak about bandwidth-limited channels. Increasing the power by a
factor of two then does not double the capacity but increases it only by one bit per second per Hz.
Finally we give some other ways to express the signal-to-noise ratio (under the assumption
that the number of dimensions per second is exactly 2W):
   S/N = P_s/(W N_0) = E_N/(N_0/2) = 2R_N E_b/N_0 = (R/W) · E_b/N_0.    (13.6)
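The behaviour in (13.5) is easy to see numerically; the sketch below evaluates the ratio C/C_∞ = ln(1 + S/N)/(S/N) from (13.4) for a few illustrative signal-to-noise ratios.

    # Power-limited versus bandwidth-limited regimes, cf. result 13.2.
    import numpy as np

    for SN in (0.01, 0.1, 1.0, 10.0, 100.0):
        ratio = np.log(1.0 + SN) / SN          # C / C_inf from (13.4)
        print(SN, ratio)                       # -> 1 for S/N << 1, small for S/N >> 1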

13.4 Exercises
1. To obtain result 12.7 we have assumed that the number of dimensions that a channel with
bandwidth W can accommodate per second is 2W . What would happen with the wideband
capacity if instead the channel would give us αW dimensions per second?

2. Find out whether a telephone line channel (W = 3400Hz, C = 34000 bit/sec) is power- or
bandwidth-limited by determining its S/N . Also determine E b /N0 and R N .

3. For a fixed ratio E b /N0 , for some values of R/W reliable transmission is possible, for
other values of R/W this is not possible. Therefore the E b /N0 × R/W - plane can be
divided in a region in which reliable transmission is possible and a region where this is
not possible. Find out what function of R/W describes the boundary between these two
regions.
(The ratio R/W is sometimes called the bandwidth efficiency. Note that R/W = 2R N
where R N is the rate per dimension. Hence a similar partitioning is possible for the
E b /N0 × R N plane.)
Part IV

Channel Models

Chapter 14

Pulse amplitude modulation

SUMMARY: In this chapter we discuss serial pulse amplitude modulation. This method
provides us with a new channel dimension every T seconds if we choose the pulse according
to the so-called Nyquist criterion. The Nyquist criterion implies that the required band-
width W is larger than 1/(2T ). Multi-pulse transmission is also considered in this chapter.
Just as in the single-pulse case a bandwidth of 1/(2T ) Hz per used pulse is required. This
corresponds again to 2W = 1/T dimensions per second.

14.1 Introduction

Figure 14.1: A serial pulse-amplitude modulation system. The amplitudes a0 , a1 , · · · , a K −1 excite a filter with impulse response p(t); white Gaussian noise n w (t) is added to the resulting signal sa (t) to produce r (t).

In chapter 11 we could observe that a channel with a bandwidth of W Hz can accommodate


roughly 2W T dimensions each T seconds (for large enough T ). We considered there building-
block waveforms of finite duration, therefore the bandwidth constraint could only be met approximately. Here we will investigate whether it is possible to get a new dimension every 1/(2W )
seconds. We allow the building blocks to have a non-finite duration. Moreover all these building-
block waveforms are time shifts of a ”pulse”. The subject of this chapter is therefore serial pulse
transmission.
Consider figure 14.1. There we assume that the transmitter sends a signal sa (t) that consists
of amplitude-modulated time shifts of a pulse p(t) by integer multiples k of the so-called
modulation interval T , hence

    s_a(t) = \sum_{k=0,K-1} a_k \, p(t - kT).    (14.1)
The vector of amplitudes a = (a0 , a1 , · · · , a K −1 ) consists of symbols ak , k = 0, K − 1 taking
values in the alphabet A. We call this modulation method serial pulse-amplitude modulation
(PAM).
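The construction in (14.1) is easy to mimic numerically. The following Python sketch builds a serial PAM signal on a discrete time grid; the sample rate, the default sinc pulse of (14.5) and the function name are my own choices:

```python
import numpy as np

def pam_signal(a, T=1.0, fs=50.0, pulse=None):
    # Build s_a(t) = sum_k a_k p(t - kT) of (14.1) on a discrete time grid.
    # By default p(t) is the unit-energy sinc pulse of (14.5).
    a = np.asarray(a, dtype=float)
    K = len(a)
    t = np.arange(-5 * T, (K + 5) * T, 1.0 / fs)
    if pulse is None:
        pulse = lambda tt: (1.0 / np.sqrt(T)) * np.sinc(tt / T)  # sin(pi x)/(pi x)
    s = np.zeros_like(t)
    for k, ak in enumerate(a):
        s += ak * pulse(t - k * T)
    return t, s

t, s = pam_signal([+1, +1, -1, +1])
print(t.shape, s.shape)
```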

14.2 Orthonormal pulses: the Nyquist criterion


14.2.1 The Nyquist result
If we have the possibility to choose the pulse p(t) ourselves we can choose it in such a way that
all time shifts of the pulse form an orthonormal base. In that case the pulse p(t) has to satisfy
    \int_{-\infty}^{\infty} p(t - kT)\, p(t - k'T)\, dt = \begin{cases} 1 & \text{if } k = k', \\ 0 & \text{if } k \neq k', \end{cases}    (14.2)

for integer k and k'. This is equivalent to

    \int_{-\infty}^{\infty} p(\tau)\, p(\tau - kT)\, d\tau = p(t) * p(-t)\big|_{t=kT} = h(kT) = \begin{cases} 1 & \text{if } k = 0, \\ 0 & \text{if } k \neq 0, \end{cases}    (14.3)
where h(t) = p(t) ∗ p(−t). This time-domain restriction on the pulse p(t) is called the zero-
forcing (ZF) criterion. Later we will see why.
THEOREM 14.1 The frequency-domain equivalent to (14.3) which is known as the Nyquist
criterion for zero intersymbol interference is
    Z(f) = \frac{1}{T} \sum_{m=-\infty}^{\infty} H(f + m/T) = \frac{1}{T} \sum_{m=-\infty}^{\infty} \|P(f + m/T)\|^2 = 1 \quad \text{for all } f,

where H ( f ) = P( f )P ∗ ( f ) = ‖P( f )‖2 is the Fourier transform of h(t) = p(t) ∗ p(−t). Note
that ‖P( f )‖ is the modulus of P( f ). Moreover Z ( f ) is called the 1/T -aliased spectrum of
H ( f ).
Later in this section we will give the proof of this theorem. First we will discuss it however
and consider an important consequence of it.
Figure 14.2: The spectrum H ( f ) = ‖P( f )‖2 corresponding to a pulse p(t) that does not satisfy the Nyquist criterion.

Figure 14.3: The ideally bandlimited spectrum H ( f ) = ‖P( f )‖2 . Note that P( f ) is a spectrum with the smallest possible bandwidth satisfying the Nyquist criterion.

14.2.2 Discussion
• Since p(t) is a real signal, the real part of its spectrum P( f ) is even in f , and the imaginary
part of this spectrum is odd. Therefore the modulus kP( f )k of P( f ) is an even function
of the frequency f .

• If the bandwidth W of the pulse p(t) is strictly smaller than 1/(2T ) (see figure 14.2), then
the Nyquist criterion, which is based on H ( f ) = kP( f )k2 , can not be satisfied. Thus
no pulse p(t) that satisfies a bandwidth-W constraint, can lead to orthogonal signaling if
W < 1/(2T ).

• The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion is 1/(2T ).
The ”basic” pulse with bandwidth 1/(2T ) for which the Nyquist criterion holds has a so-
called the ideally bandlimited spectrum, which is given by

    P(f) = \begin{cases} \sqrt{T} & \text{if } |f| < 1/(2T), \\ 0 & \text{if } |f| > 1/(2T), \end{cases}    (14.4)

and \sqrt{T}/2 for | f | = 1/(2T ). The corresponding H ( f ) = ‖P( f )‖2 and 1/T -aliased
spectrum Z ( f ) can be found in figure 14.3.
The basic pulse p(t) that corresponds to the ideally bandlimited spectrum is shown in
figure 14.4. It is the well-known sinc-pulse
    p(t) = \frac{1}{\sqrt{T}} \, \frac{\sin(\pi t/T)}{\pi t/T}.    (14.5)
Figure 14.4: The sinc-pulse that corresponds to T = 1, i.e. p(t) = sin(π t)/(π t).

Figure 14.5: A spectrum H ( f ) with excess bandwidth that satisfies the Nyquist criterion.

• A pulse with a bandwidth larger than 1/2T can also satisfy the Nyquist criterion as can
be seen in figure 14.5. The so-called excess bandwidth can be larger than 1/2T . Square-
root raised-cosine pulses p(t) have a spectrum H ( f ) = |P( f )|2 that satisfies the Nyquist
criterion and an excess bandwidth that can be controlled.

At the end of this subsection we state the implication of the previous discussion.

RESULT 14.2 The smallest possible bandwidth W of a pulse that satisfies the Nyquist criterion
is W = 1/(2T ). The sinc-pulse p(t) = \frac{1}{\sqrt{T}}\,\frac{\sin(\pi t/T)}{\pi t/T} has this property. Note that this way of serial
pulse transmission leads to exactly 2W dimensions per second.
Moreover observe that unlike before in chapter 11, our pulses and signals are not time-
limited!
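The orthonormality condition (14.3) for the sinc pulse can be verified numerically. The following Python sketch approximates the correlation integrals by Riemann sums; the sample rate and truncation window are arbitrary choices, so the values are only close to 1 and 0:

```python
import numpy as np

# Numerical check of (14.3) for the sinc pulse of (14.5): the correlations of
# p(t) with its shifts p(t - kT) should be (close to) 1 for k = 0 and 0 otherwise.
T, fs = 1.0, 200.0                        # modulation interval and sample rate
t = np.arange(-200 * T, 200 * T, 1.0 / fs)
p = (1.0 / np.sqrt(T)) * np.sinc(t / T)

for k in range(4):
    shifted = (1.0 / np.sqrt(T)) * np.sinc((t - k * T) / T)
    corr = np.sum(p * shifted) / fs       # Riemann approximation of the integral
    print(f"k = {k}: integral of p(t) p(t - kT) dt  ~ {corr:+.4f}")
```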

14.2.3 Proof of the Nyquist result


We will next give the proof of theorem 14.1, the Nyquist result.
Proof: Since H ( f ) is the Fourier transform of h(t), the condition (14.3) can be rewritten as

    h(kT) = \int_{-\infty}^{\infty} H(f) \exp(j 2\pi f k T)\, df = \begin{cases} 1 & \text{if } k = 0, \\ 0 & \text{if } k \neq 0. \end{cases}    (14.6)
We now break up the integral in parts, a part for each integer m, and obtain

    h(kT) = \sum_{m=-\infty}^{\infty} \int_{(2m-1)/2T}^{(2m+1)/2T} H(f) \exp(j 2\pi f k T)\, df
          = \sum_{m=-\infty}^{\infty} \int_{-1/2T}^{1/2T} H\left(f + \frac{m}{T}\right) \exp(j 2\pi f k T)\, df
          = \int_{-1/2T}^{1/2T} \left[ \sum_{m=-\infty}^{\infty} H\left(f + \frac{m}{T}\right) \right] \exp(j 2\pi f k T)\, df
          = \int_{-1/2T}^{1/2T} Z(f) \exp(j 2\pi f k T)\, df    (14.7)

where we have defined Z ( f ) by

    Z(f) = \sum_{m=-\infty}^{\infty} H\left(f + \frac{m}{T}\right).    (14.8)

Observe that Z ( f ) is a periodic function in f with period 1/T . Therefore it can be expanded in
terms of its Fourier series coefficients · · · , z −1 , z 0 , z 1 , z 2 , · · · as

    Z(f) = \sum_{k=-\infty}^{\infty} z_k \exp(j 2\pi k f T)    (14.9)

where

    z_k = T \int_{-1/2T}^{1/2T} Z(f) \exp(-j 2\pi f k T)\, df.    (14.10)
If we now combine (14.7) and (14.10) we obtain that

    T h(-kT) = z_k,    (14.11)

for all integers k. Condition (14.3) now tells us that z 0 = T and that all other z k are zero. This
implies that

    Z(f) = T,    (14.12)

or equivalently

    \sum_{m=-\infty}^{\infty} H\left(f + \frac{m}{T}\right) = T.    (14.13)

□
Figure 14.6: The optimum receiver front-end for detection of serially transmitted orthonormal pulses: r (t) is filtered by the matched filter p(T p − t) and the output is sampled at t = T p + kT to give rk .

14.2.4 Receiver implementation


If we use serial PAM with orthonormal pulses p(t − kT ), k = 0, K − 1, then we can use a
single matched-filter m(t) = p(T p − t) for the computations of the correlations of the received
waveform r (t) with all building-block waveforms, i.e. with all pulses. Assume that the delay T p
is chosen in such a way that m(t) is (effectively) causal. The output of this filter m(t) when r (t)
is the input signal is
    u(t) = \int_{-\infty}^{\infty} r(\alpha)\, m(t - \alpha)\, d\alpha = \int_{-\infty}^{\infty} r(\alpha)\, p(T_p - t + \alpha)\, d\alpha.    (14.14)

At time t = T p + kT we see at the filter output


    r_k = u(T_p + kT) = \int_{-\infty}^{\infty} r(\alpha)\, p(\alpha - kT)\, d\alpha,    (14.15)

which is what the optimum receiver should determine, i.e. the correlation of r (t) with the pulse
p(t − kT ). This leads to the very simple receiver structure shown in figure 14.6. Processing the
samples rk , k = 1, K , should be done in the usual way.
When there is no noise r (t) = \sum_{k=0,K-1} a_k p(t − kT ). Then at time t = T p + kT we see at
the filter output

    r_k = u(T_p + kT) = \int_{-\infty}^{\infty} r(\alpha)\, p(\alpha - kT)\, d\alpha
        = \int_{-\infty}^{\infty} \sum_{k'=0,K-1} a_{k'}\, p(\alpha - k'T)\, p(\alpha - kT)\, d\alpha
        = \sum_{k'=0,K-1} a_{k'} \int_{-\infty}^{\infty} p(\alpha - k'T)\, p(\alpha - kT)\, d\alpha = a_k,    (14.16)

by the orthonormality of the pulses. The conclusion is that there is no intersymbol interference
present in the samples. In other words the Nyquist criterion forces the intersymbol interference
to be zero (zero-forcing (ZF)).
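The zero-ISI property of (14.16) can be seen in a small numerical experiment. The Python sketch below computes the correlations (14.15) directly (instead of literally implementing the filter-and-sample front-end of figure 14.6); in the noiseless case the samples reproduce the transmitted amplitudes. The sample rate and truncation are arbitrary choices:

```python
import numpy as np

T, fs = 1.0, 100.0
a = np.array([+1.0, -1.0, +1.0, +1.0, -1.0])         # transmitted amplitudes
t = np.arange(-20 * T, (len(a) + 20) * T, 1.0 / fs)

# Noiseless received signal r(t) = sum_k a_k p(t - kT) with the sinc pulse (14.5).
r = np.zeros_like(t)
for k, ak in enumerate(a):
    r += ak * (1.0 / np.sqrt(T)) * np.sinc((t - k * T) / T)

# r_k = integral of r(alpha) p(alpha - kT) d alpha, approximated by a sum (14.15).
for k in range(len(a)):
    rk = np.sum(r * (1.0 / np.sqrt(T)) * np.sinc((t - k * T) / T)) / fs
    print(f"r_{k} ~ {rk:+.3f}   (transmitted a_{k} = {a[k]:+.0f})")
```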

14.3 Multi-pulse transmission


Instead of using shifted versions of a single pulse we can apply the shifts of a set of orthonormal
pulses. Consider J pulses p1 (t), p2 (t), · · · , p J (t) which are orthonormal. The shifted versions
of these pulses can be used to convey a stream of messages. Again the shifts are over multiples
of T and T is called the modulation interval again. Thus a signal s(t) can be written as
    s(t) = \sum_{k=0,K-1} \sum_{j=1,J} a_{kj}\, p_j(t - kT),    (14.17)

for ak j ∈ A. If we want all pulses and their time-shifts to form an orthonormal basis, then the
time-shifts of all pulses should be orthogonal to the pulses p j (t), j = 1, J . In other words, we
require the J pulses to satisfy

    \int_{-\infty}^{\infty} p_j(t - kT)\, p_{j'}(t - k'T)\, dt = \begin{cases} 1 & \text{if } j = j' \text{ and } k = k', \\ 0 & \text{elsewhere,} \end{cases}    (14.18)
for all j = 1, J , j 0 = 1, J , and all integer k and k 0 . A set of pulses p j (t) for j = 1, 2, · · · , J
that satisfies this restriction is
    p_j(t) = \frac{1}{\sqrt{T}}\, \frac{\sin(\pi t/(2T))}{\pi t/(2T)}\, \cos\left((2j - 1)\pi t/(2T)\right).    (14.19)
For J = 4, and assuming that T = 1, these pulses and their spectra are shown in figure 14.7.
Observe that in this example the total bandwidth of the J pulses is J/(2T ).

RESULT 14.3 The smallest possible bandwidth that can be achieved for orthogonal multi-pulse
signaling with J pulses each T seconds is J/(2T ). Note therefore that also bandwidth-efficient
serial multi-pulse transmission leads to exactly 2W dimensions per second.
Proof: (Lee and Messerschmitt [13], p. 266) Just like in the proof of theorem 14.1 we can show
that the pulse-spectra P j ( f ) for j = 1, J must satisfy

    \frac{1}{T} \sum_{m=-\infty}^{\infty} P_j\left(f + \frac{m}{T}\right) P_{j'}^*\left(f + \frac{m}{T}\right) = \delta_{jj'}.    (14.20)

Now fix a frequency f ∈ [−1/(2T ), +1/(2T )) and define for each j = 1, J the vector

    P_j = (\cdots, P_j(f - 1/T), P_j(f), P_j(f + 1/T), P_j(f + 2/T), \cdots).    (14.21)
Here P j ( f +m/T ) is the component at position m. With this definition condition (14.20) implies
that the vectors P j , j = 1, J are orthogonal. Assume for a moment that there are less than J
positions m for which at least one component P j ( f + m/T ) for some j = 1, J is non-zero. This
would imply that the J vectors P j , j = 1, J would be dependent. Contradiction!
Thus for each frequency interval d f ∈ [−1/(2T ), 1/(2T )) we may conclude that the J pulses
”fill” at least J disjoint intervals of size d f of the frequency spectrum. In total the J pulses fill a
part J/T of the spectrum. The minimally occupied bandwidth is therefore J/(2T ). □
Note that the optimum receiver for multi-pulse transmission can be implemented with J
matched filters that are sampled each T seconds.
Figure 14.7: Time domain and frequency domain plots of the pulses in (14.19) for j = 1, 2, 3, 4 and T = 1.

14.4 Exercises
1. Determine the spectra of the pulses p j (t) for j = 1, 2, · · · , J defined in (14.19). Show
that these pulses satisfy (14.18). Determine the total bandwidth of these J pulses.
Chapter 15

Bandpass channels

SUMMARY: Here we consider transmission over an ideal bandpass channel with band-
width 2W . We show that building-block waveforms for transmission over a bandpass chan-
nel can be constructed from baseband building-block waveforms having a bandwidth not
larger than W . A first set of orthonormal baseband building-block waveforms can be mod-
ulated on a cosine carrier, a second orthonormal set on a sine carrier, and the resulting
bandpass waveforms are all orthogonal to each other. This technique is called quadrature
multiplexing. We determine the optimum receiver for this signaling method and the capac-
ity of the bandpass channel. We finally discuss quadrature amplitude modulation (QAM).

15.1 Introduction
So far we have only considered baseband communication. In baseband communication the chan-
nel allows signaling only in the frequency band [−W, W ]. But what should we do if the channel
only accepts signals with a spectrum in the frequency band ±[ f 0 − W, f 0 + W ]? In the present
chapter we will describe a method that can be used for signaling over such a channel. We start
by discussing this so-called bandpass channel. First we give a definition of the ideal bandpass
channel.

Definition 15.1 Consider figure 15.1. The input signal s(t) to the bandpass channel is first sent
through a filter W0 ( f ). This filter W0 ( f ) is an ideal bandpass filter with bandwidth 2W and

Figure 15.1: Ideal bandpass channel with additive white Gaussian noise: the input s(t) is filtered by the bandpass filter W0 ( f ), after which the noise waveform n w (t) is added to give r (t).


center frequency f 0 > W . It is defined by the transfer function



    W_0(f) = \begin{cases} 1 & \text{if } f_0 - W < |f| < f_0 + W, \\ 0 & \text{elsewhere.} \end{cases}    (15.1)

The noise process Nw (t) is assumed to be stationary, Gaussian, zero-mean, and white. The noise
has power spectral density function
    S_{N_w}(f) = \frac{N_0}{2} \quad \text{for } -\infty < f < \infty,    (15.2)

and autocorrelation function R_{N_w}(t, s) = E[N_w(t) N_w(s)] = \frac{N_0}{2}\,\delta(t - s). The noise-waveform
n w (t) is added to the output of the bandpass filter W0 ( f ) which results in r (t), the output of the
bandpass channel.
In the next section we will describe a signaling method for a bandpass channel. This method
is called quadrature multiplexing.

15.2 Quadrature multiplexing


In quadrature multiplexing we combine two waveforms with a baseband spectrum into a wave-
form with a spectrum that matches with the bandpass channel. We assume that the two baseband
waveforms together contain a single message.
Consider a first set of baseband waveforms {s1c (t), s2c (t), · · · , s|M|c
(t)} with waveform smc (t)
for all m ∈ {1, 2, · · · , |M|}, having bandwidth smaller than W , i.e. with a spectrum Smc ( f ) ≡ 0
for | f | ≥ W . To this set of baseband-waveforms there corresponds a set of building-block wave-
forms {φi (t), i = 1, 2, · · · , Nc }. For all i = {1, 2, · · · , Nc } the spectrum of the corresponding
building-block waveform Φi ( f ) ≡ 0 for | f | ≥ W since the building-block waveforms are linear
combinations of the signal waveforms (Gram-Schmidt procedure, see appendix E).
Also consider a second set of baseband waveforms {s1s (t), s2s (t), · · · , s|M| s
(t)} with wave-
s
form sm (t) for all m ∈ {1, 2, · · · , |M|} having bandwidth smaller than W , i.e. Sms ( f ) ≡ 0
for | f | ≥ W . To this second set of waveforms there corresponds a set of building-block wave-
forms {ψ j (t), j = 1, 2, · · · , Ns }. For all j = {1, 2, · · · , Ns } the spectrum of the corresponding
building-block waveform Ψ j ( f ) ≡ 0 for | f | ≥ W .
Now, to obtain a passband signal, the two waveforms s c (t) and s s (t) are combined into a
signal s(t) by modulating them on a cosine resp. sine wave with frequency f 0 and then adding
the resulting signals together (see figure 15.2), thus
    s(t) = s^c(t)\,\sqrt{2}\cos(2\pi f_0 t) + s^s(t)\,\sqrt{2}\sin(2\pi f_0 t).    (15.3)
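A minimal numerical sketch of the modulator in (15.3) is given below (the carrier frequency, sample rate and test waveforms are arbitrary choices and only serve as an illustration):

```python
import numpy as np

def quadrature_multiplex(s_c, s_s, f0, fs):
    # Combine two baseband sample streams into the passband signal of (15.3):
    # s(t) = s_c(t) sqrt(2) cos(2 pi f0 t) + s_s(t) sqrt(2) sin(2 pi f0 t).
    t = np.arange(len(s_c)) / fs
    return (s_c * np.sqrt(2) * np.cos(2 * np.pi * f0 * t)
            + s_s * np.sqrt(2) * np.sin(2 * np.pi * f0 * t))

fs, f0 = 1000.0, 100.0
t = np.arange(0, 1, 1 / fs)
s_c = np.cos(2 * np.pi * 2 * t)      # slowly varying baseband waveform
s_s = np.sin(2 * np.pi * 3 * t)      # slowly varying baseband waveform
s = quadrature_multiplex(s_c, s_s, f0, fs)
print(s[:5])
```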

We now make (15.3) more precise. Note that to each message m ∈ M there corresponds a
waveform s_m^c(t) = \sum_{i=1,N_c} s_{mi}^c \phi_i(t) and a waveform s_m^s(t) = \sum_{j=1,N_s} s_{mj}^s \psi_j(t). Therefore we
Figure 15.2: Quadrature multiplexing: s^c (t) is multiplied by √2 cos 2π f 0 t, s^s (t) by √2 sin 2π f 0 t, and the two products are added to form s(t).

write:

    s_m(t) = s_m^c(t)\,\sqrt{2}\cos(2\pi f_0 t) + s_m^s(t)\,\sqrt{2}\sin(2\pi f_0 t)
           = \sum_{i=1,N_c} s_{mi}^c \phi_i(t)\,\sqrt{2}\cos(2\pi f_0 t) + \sum_{j=1,N_s} s_{mj}^s \psi_j(t)\,\sqrt{2}\sin(2\pi f_0 t).    (15.4)

This means that the signal sm (t) can be regarded as a linear combination of “building-block”
waveforms φi (t)√2 cos(2π f 0 t) and ψ j (t)√2 sin(2π f 0 t). To see whether these waveforms are
actually building blocks we have to check whether they form an orthonormal set.
To this end consider the Fourier transforms of the building block waveforms1 φi (t) and ψ j (t):

    \Phi_i(f) = \int_{-\infty}^{\infty} \phi_i(t) \exp(-j 2\pi f t)\, dt,
    \Psi_j(f) = \int_{-\infty}^{\infty} \psi_j(t) \exp(-j 2\pi f t)\, dt.    (15.5)

Since both sets {φi (t), i = 1, Nc } and {ψ j (t), j = 1, Ns } form an orthonormal base, using the
Parseval relation, we have for all i and j and all i' and j' that

    \delta_{i,i'} = \int_{-\infty}^{\infty} \phi_i(t)\phi_{i'}(t)\, dt = \int_{-\infty}^{\infty} \Phi_i(f)\Phi_{i'}^*(f)\, df,
    \delta_{j,j'} = \int_{-\infty}^{\infty} \psi_j(t)\psi_{j'}(t)\, dt = \int_{-\infty}^{\infty} \Psi_j(f)\Psi_{j'}^*(f)\, df.    (15.6)

1 Note
that we do not assume here that the baseband building blocks are limited in time as in the chapters on
waveform communication.
Figure 15.3: Spectra after modulation with √2 cos 2π f 0 t and √2 sin 2π f 0 t. For simplicity it is assumed that the baseband spectra are real.

Now we are ready to determine the spectrum Φc,i ( f ) of a cosine or in-phase building-block
waveform

    \phi_{c,i}(t) = \phi_i(t)\,\sqrt{2}\cos 2\pi f_0 t    (15.7)

and the spectrum Ψs, j ( f ) of a sine or quadrature building-block waveform

    \psi_{s,j}(t) = \psi_j(t)\,\sqrt{2}\sin 2\pi f_0 t.    (15.8)

We express these spectra in terms of the baseband spectra Φi ( f ) and Ψ j ( f ) (see also figure 15.3):

    \Phi_{c,i}(f) = \int_{-\infty}^{\infty} \phi_i(t)\,\sqrt{2}\cos(2\pi f_0 t)\exp(-j 2\pi f t)\, dt
                  = \frac{1}{\sqrt{2}}\left(\Phi_i(f - f_0) + \Phi_i(f + f_0)\right),
    \Psi_{s,j}(f) = \int_{-\infty}^{\infty} \psi_j(t)\,\sqrt{2}\sin(2\pi f_0 t)\exp(-j 2\pi f t)\, dt
                  = \frac{1}{j\sqrt{2}}\left(\Psi_j(f - f_0) - \Psi_j(f + f_0)\right).    (15.9)
Note (see again figure 15.3) that the spectra Φc,i ( f ) and Ψs, j ( f ) are zero outside the passband
±[ f 0 − W, f 0 + W ].
We now first consider the correlation between φc,i (t) and φc,i' (t) for all i and i'. This leads to

    \int_{-\infty}^{\infty} \phi_{c,i}(t)\,\phi_{c,i'}(t)\, dt = \int_{-\infty}^{\infty} \Phi_{c,i}(f)\,\Phi_{c,i'}^*(f)\, df
      = \frac{1}{2}\int_{-\infty}^{\infty} [\Phi_i(f - f_0) + \Phi_i(f + f_0)][\Phi_{i'}^*(f - f_0) + \Phi_{i'}^*(f + f_0)]\, df
      = \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Phi_{i'}^*(f - f_0)\, df + \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Phi_{i'}^*(f + f_0)\, df
        + \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Phi_{i'}^*(f - f_0)\, df + \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Phi_{i'}^*(f + f_0)\, df
      \stackrel{(a)}{=} \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Phi_{i'}^*(f - f_0)\, df + \frac{1}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Phi_{i'}^*(f + f_0)\, df
      \stackrel{(b)}{=} \frac{1}{2}\delta_{i,i'} + \frac{1}{2}\delta_{i,i'} = \delta_{i,i'}.    (15.10)
Here equality (a) follows from the fact that f 0 > W and the observation that for all i = 1, Nc
the spectra Φi ( f ) ≡ 0 for | f | ≥ W . Therefore the cross-terms are zero, see also figure 15.3.
Equality (b) follows from (15.6) which holds since the baseband building-blocks φi (t) for i =
1, Nc form an orthonormal base.
In a similar way we can show that for all j and j'

    \int_{-\infty}^{\infty} \psi_{s,j}(t)\,\psi_{s,j'}(t)\, dt = \delta_{j,j'}.    (15.11)

What remains to be investigated is the correlation between all in-phase (cosine) building-
block waveforms φc,i (t) for i = 1, Nc and all quadrature (sine) building-block waveforms
ψs, j (t) for j = 1, Ns :
    \int_{-\infty}^{\infty} \phi_{c,i}(t)\,\psi_{s,j}(t)\, dt = \int_{-\infty}^{\infty} \Phi_{c,i}(f)\,\Psi_{s,j}^*(f)\, df
      = \frac{j}{2}\int_{-\infty}^{\infty} [\Phi_i(f - f_0) + \Phi_i(f + f_0)][\Psi_j^*(f - f_0) - \Psi_j^*(f + f_0)]\, df
      = \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Psi_j^*(f - f_0)\, df - \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Psi_j^*(f + f_0)\, df
        + \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Psi_j^*(f - f_0)\, df - \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Psi_j^*(f + f_0)\, df
      \stackrel{(c)}{=} \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f - f_0)\Psi_j^*(f - f_0)\, df - \frac{j}{2}\int_{-\infty}^{\infty} \Phi_i(f + f_0)\Psi_j^*(f + f_0)\, df
      = 0.    (15.12)
Here equality (c) follows from the fact that f 0 > W and the observation that for all i = 1, Nc
the spectra Φi ( f ) ≡ 0 for | f | ≥ W and for all j = 1, Ns the spectra Ψ j ( f ) ≡ 0 for | f | ≥ W .
Therefore the cross-terms are zero. See figure 15.3.

RESULT 15.1 We have shown that all in-phase building-block waveforms φc,i (t) for i = 1, Nc
and all quadrature building-block waveforms ψs, j (t) for j = 1, Ns together form an orthonor-
mal base.
Moreover the spectra Φc,i ( f ) and Ψs, j ( f ) of all these building-block waveforms are zero outside
the passband ±[ f 0 − W, f 0 + W ]. Therefore none of these building-block waveforms is hindered
by the bandpass filter W0 ( f ) when they are sent over our bandpass channel.
It is important to note that the baseband building-block waveforms φi (t) and ψ j (t), for any
i and j, need not be orthogonal. Multiplication of these baseband waveforms by √2 cos 2π f 0 t
resp. √2 sin 2π f 0 t results in the orthogonality of the bandpass building-block waveforms φc,i (t)
and ψs, j (t).
Also note that all the bandpass building block waveforms have unit energy. This follows from
(15.10) and (15.11). Therefore the energy of sm (t) is equal to the squared length of the vector
    s_m = (s_m^c, s_m^s) = (s_{m1}^c, s_{m2}^c, \cdots, s_{mN_c}^c, s_{m1}^s, s_{m2}^s, \cdots, s_{mN_s}^s).    (15.13)

15.3 Optimum receiver for quadrature multiplexing


The result of the previous section actually states that by applying quadrature multiplexing for a
bandpass channel we obtain a “normal” waveform channel communication problem. Quadrature
multiplexing apparently yields the right building-block waveforms! The building-block wave-
forms have spectra that are matched to the bandpass channel. Each message m ∈ M results
in a signal sm (t) that is a linear combination of Nc cosine and Ns sine building-block wave-
forms. Therefore the bandpass filter at the input of the bandpass channel has no effect on the
transmission of the signals.
We have seen before that a waveform channel is actually equivalent to a vector channel. Also
we know how the optimum receiver for a waveform channel should be constructed. If we restrict
ourselves for a moment to the receiver that correlates the received signal r (t) with the building
blocks, we know that it should start with Nc + Ns multipliers followed by integrators.
    \int_{-\infty}^{\infty} r(t)\,\phi_{c,i}(t)\, dt = \int_{-\infty}^{\infty} r(t)\,\sqrt{2}\cos(2\pi f_0 t)\,\phi_i(t)\, dt = r_i^c, \quad \text{and}
    \int_{-\infty}^{\infty} r(t)\,\psi_{s,j}(t)\, dt = \int_{-\infty}^{\infty} r(t)\,\sqrt{2}\sin(2\pi f_0 t)\,\psi_j(t)\, dt = r_j^s.    (15.14)

After having determined the received vector r = (r^c, r^s) = (r_1^c, r_2^c, \cdots, r_{N_c}^c, r_1^s, r_2^s, \cdots, r_{N_s}^s) we
can form the dot-products (r · s m ) where the signal vector s_m = (s_m^c, s_m^s) = (s_{m1}^c, s_{m2}^c, \cdots, s_{mN_c}^c, s_{m1}^s, s_{m2}^s, \cdots, s_{mN_s}^s):

    (r \cdot s_m) = \sum_{i=1,N_c} r_i^c s_{mi}^c + \sum_{j=1,N_s} r_j^s s_{mj}^s.    (15.15)
Figure 15.4: Optimum receiver for quadrature-multiplexed signals that are transmitted over a bandpass channel.
Adding the constants cm is now the last step before selecting m̂ as the message m that maximizes
(r · s m ) + cm where (see result (6.1))

    c_m = \frac{N_0}{2}\ln \Pr\{M = m\} - \frac{\|s_m\|^2}{2},    (15.16)

with

    \|s_m\|^2 = \|s_m^c\|^2 + \|s_m^s\|^2 = \sum_{i=1,N_c} (s_{mi}^c)^2 + \sum_{j=1,N_s} (s_{mj}^s)^2
              = \int_{-\infty}^{\infty} (s_m^c(t))^2\, dt + \int_{-\infty}^{\infty} (s_m^s(t))^2\, dt.    (15.17)

Note that \|s_m\|^2 = \int_{-\infty}^{\infty} s_m^2(t)\, dt.
All this leads to the receiver implementation shown in figure 15.4.
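The decision stage of this receiver is easy to sketch in code. The following Python fragment assumes the correlations have already produced the stacked vector r = (r^c, r^s); the function name and the toy numbers are my own choices, and the sketch only illustrates maximizing (r · s m ) + c m with c m from (15.16):

```python
import numpy as np

def qm_decide(r, signals, priors, N0):
    # Pick the message m maximizing (r . s_m) + c_m with c_m from (15.16).
    # r and each entry of signals are the stacked (cosine, sine) coordinates.
    r = np.asarray(r, dtype=float)
    best_m, best_val = None, -np.inf
    for m, (s_m, prior) in enumerate(zip(signals, priors), start=1):
        s_m = np.asarray(s_m, dtype=float)
        c_m = 0.5 * N0 * np.log(prior) - 0.5 * np.dot(s_m, s_m)
        val = np.dot(r, s_m) + c_m
        if val > best_val:
            best_m, best_val = m, val
    return best_m

# Two equally likely messages in Nc = Ns = 1 dimensions (toy numbers).
signals = [(+1.0, 0.0), (0.0, +1.0)]
print(qm_decide([0.9, 0.1], signals, [0.5, 0.5], N0=1.0))   # -> 1
```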

15.4 Transmission of a complex signal


After the previous section we may think of quadrature multiplexing as a method for transmitting
two real-valued signals smc (t) and sms (t). However we can also see quadrature multiplexing as a
technique to transmit a single complex waveform

    s_m^{cs}(t) = s_m^c(t) - j\, s_m^s(t).    (15.18)

Then we can write

    s_m(t) = \Re\!\left[s_m^{cs}(t)\,\sqrt{2}\exp(j 2\pi f_0 t)\right]
           = s_m^c(t)\,\sqrt{2}\cos(2\pi f_0 t) + s_m^s(t)\,\sqrt{2}\sin(2\pi f_0 t),    (15.19)

where ℜ[x] denotes the real part of x.


The notion of energy of the complex signal s_m^{cs}(t) is in accordance with the energy that we
have used in expressing the constants cm , hence

    \int_{-\infty}^{\infty} \|s_m^{cs}(t)\|^2\, dt = \int_{-\infty}^{\infty} (s_m^c(t))^2\, dt + \int_{-\infty}^{\infty} (s_m^s(t))^2\, dt.    (15.20)

15.5 Dimensions per second


By applying quadrature multiplexing we can not obtain more than 4W dimensions per sec-
ond from our bandpass channel. By choosing both the baseband building-block waveform sets
{φi (t), i = 1, Nc } and {ψ j (t), j = 1, Ns } as large (rich) as possible given the bandwidth con-
straint of W Hz, i.e. by taking 2W dimensions per second for each of the building-block wave-
form sets, we obtain the upper bound of 4W dimensions per second. Note however that, as usual,
this corresponds to 2 dimensions per Hz per second.

15.6 Capacity of the bandpass channel


In the previous section we saw that the number of dimensions in the case of bandpass transmis-
sion is at most 4W per second. Moreover there exist baseband building-block sets that achieve
this maximum.

RESULT 15.2 Therefore the capacity in bit per dimension of the bandpass channel with band-
width ±[ f 0 − W, f 0 + W ] is
    C_N = \frac{1}{2}\log_2\left(1 + \frac{P_s}{2 N_0 W}\right),    (15.21)

if the transmitter power is Ps and the noise spectral density is N0 /2 for all f . Consequently the
capacity per second is

    C = 4W\, C_N = 2W \log_2\left(1 + \frac{P_s}{2 N_0 W}\right) \quad \text{bit/second}.    (15.22)

15.7 Quadrature amplitude modulation


In general we take Nc = Ns = N and the baseband in-phase and quadrature building-block
waveforms equal, i.e. φi (t) = ψi (t) for all i = 1, N . Then the two dimensions corresponding
to the passband building-blocks φi (t)√2 cos(2π f 0 t) and ψi (t)√2 sin(2π f 0 t) for i = 1, N are
in some way linked together. This will become clear in chapter 16 on random carrier-phase
communication. We refer to this method as quadrature amplitude modulation (QAM).

15.8 Serial quadrature amplitude modulation


It is important to realize that we could use serial pulse amplitude modulation with a pulse p(t)
satisfying the Nyquist criterion (see theorem 14.1) on both carriers (cosine and sine). A Nyquist
pulse corresponds to an orthonormal basis consisting of time shifts of this pulse. Two such sets
(based on the same pulse p(t)) can be used in a QAM scheme. Note that this signaling method
leads to an extremely simple receiver structure.
Example 15.1 QAM modulation results in two-dimensional signal structures. In serial QAM we can take
as baseband building blocks
    \phi_1(t) = \psi_1(t) = \sqrt{2W}\, \frac{\sin(2\pi W t)}{2\pi W t},    (15.23)
and all other baseband building blocks are shifts over multiples of 1/(2W ) seconds of this sinc-pulse.
Therefore the frequency bandwidth of all building blocks is W . After modulation a bandpass spectrum
is obtained, thus the spectra of the signals fit into ±[ f 0 − W, f 0 + W ]. In each pair of dimensions
corresponding to a shift over (k − 1)/(2W ) seconds we can observe a two-dimensional part (skc , sks ) of the
signal vector. This part assumes values in a two-dimensional signal structure.
In figure 15.5 two signal structures are shown. The signal points are placed on a rectangular grid.
This implies that their minimum Euclidean distance is not smaller than a certain value d. It is important to
place the required number of signal points in such a way on the grid that their average energy is minimal.
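As a small illustration of this placement problem, the Python sketch below constructs a square QAM structure on a grid with minimum distance d and evaluates its average energy per two dimensions; the function name and grid construction are my own choices (and 32-QAM, being non-square, would need a different construction):

```python
import numpy as np

def square_qam(M, d=2.0):
    # Return the M points of a square QAM structure (M must be a perfect
    # square) on a grid with minimum distance d, centred on the origin.
    side = int(round(np.sqrt(M)))
    assert side * side == M
    levels = d * (np.arange(side) - (side - 1) / 2.0)
    return np.array([(x, y) for x in levels for y in levels])

pts = square_qam(16)
E_avg = np.mean(np.sum(pts ** 2, axis=1))
print(f"16-QAM with d = 2: average energy per two dimensions = {E_avg}")  # 10.0
```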
Figure 15.5: Two QAM signal structures, (a) 16-QAM, (b) 32-QAM. Note that 32-QAM is not square!

15.9 Exercises
1. A signal structure must contain M signal points. These points are to be chosen on an integer
grid.
Now consider a hypercube and a hypersphere in N dimensions. Both have their center in
the origin of the coordinate system.
We can either choose our signal structure as all the grid points inside the sphere or all the
grid points inside the cube. Assume that the dimensions of the sphere and cube are such
that they contain equally many signal points.
Let M ≫ 1 so that the signal points can be assumed to be uniformly distributed over the
sphere and the cube.

(a) Let N = 2. Find the ratio between the average signal energies of the sphere-structure
and that of the cube-structure. What is the difference in dB?
(b) Let N be even. What is then the ratio for N → ∞? In dB?
Hint: The volume of an N -dimensional sphere is \pi^{N/2} R^N / (N/2)! for even N .

The computed ratios are called shaping gains.


Chapter 16

Random carrier-phase

SUMMARY: Here we consider carrier modulation under the assumption that the carrier-
phase is not known to the receiver. Only one baseband signal s b (t) will be transmitted. First
we determine the optimum receiver for this case. Then we consider the implementation of
the optimum receiver when all signals have equal energy. We investigate envelope detection
and work out an example in which one out of two orthogonal signals is transmitted.

16.1 Introduction
Because of oscillator drift or differences in propagation time it is not always reasonable to as-
sume that the receiver knows the phase of the wave that is used as carrier of the message, when
quadrature multiplexing would be used. In that case coherent demodulation is not possible.
To investigate this situation we assume that the transmitter modulates with the baseband
signal s^b (t)1 a wave with a random phase θ, i.e.

    s(t) = s^b(t)\,\sqrt{2}\cos(2\pi f_0 t - \theta),    (16.1)

where the random phase Θ is assumed to be uniform over [0, 2π), hence

    p_\Theta(\theta) = \frac{1}{2\pi}, \quad \text{for } 0 \le \theta < 2\pi.    (16.2)

More precisely, for each message m ∈ M, let the signal smb (t) correspond to the vector
s m = (sm1 , sm2 , · · · , sm N ) hence
    s_m^b(t) = \sum_{i=1,N} s_{mi}\, \phi_i(t).    (16.3)

We assume as before that the spectrum of the signals smb (t) for all m ∈ M is zero outside
[−W, W ]. Note that the same holds for the building-block waveforms φi (t) for i = 1, N . This
1 Note that there is only one baseband signal involved here. The type of modulation is called double sideband
suppressed carrier (DSB-SC) modulation.

is since the building-block waveforms are linear combinations of the signals. If we now use the
trigonometric identity cos(a − b) = cos a cos b + sin a sin b we obtain

    s_m(t) = s_m^b(t)\,\sqrt{2}\cos(2\pi f_0 t - \theta)
           = s_m^b(t)\cos\theta\,\sqrt{2}\cos(2\pi f_0 t) + s_m^b(t)\sin\theta\,\sqrt{2}\sin(2\pi f_0 t)
           = \sum_{i=1,N} s_{mi}\cos\theta\,\phi_i(t)\,\sqrt{2}\cos(2\pi f_0 t) + \sum_{i=1,N} s_{mi}\sin\theta\,\phi_i(t)\,\sqrt{2}\sin(2\pi f_0 t).    (16.4)

If we now turn to the vector approach we can say that, although θ is unknown, this is equivalent
to quadrature multiplexing where vector-signals s cm and s sm are transmitted satisfying
s cm = s m cos θ,
s sm = s m sin θ. (16.5)
After receiving the signal r (t) = sm (t) + n w (t) an optimum-receiver forms the vectors r c and r s
for which
r c = s cm + n c = s m cos θ + n c ,
r s = s sm + n s = s m sin θ + n s , (16.6)
where both n c and n s are independent Gaussian vectors with independent components all having
mean zero and variance N0 /2. In the next section we will further determine the optimum receiver
for this situation.

16.2 Optimum incoherent reception


To determine the optimum receiver we first write out the decision variable for message m, i.e.
    \Pr\{M = m\}\, p_{R^c,R^s}(r^c, r^s \mid M = m)
       = \Pr\{M = m\} \int_0^{2\pi} p_\Theta(\theta)\, p_{R^c,R^s}(r^c, r^s \mid M = m, \Theta = \theta)\, d\theta.    (16.7)

Note that we average over the unknown phase θ. Assume first that all messages are equally
likely. Therefore only the integral is relevant. The informative term inside the integral is

    p_{R^c,R^s}(r^c, r^s \mid M = m, \Theta = \theta)
       = p_{N^c,N^s}(r^c - s_m\cos\theta,\; r^s - s_m\sin\theta)
       = \left(\frac{1}{\pi N_0}\right)^{N} \exp\left(-\frac{\|r^c - s_m\cos\theta\|^2 + \|r^s - s_m\sin\theta\|^2}{N_0}\right).    (16.8)
Therefore we investigate
kr c − s m cos θk2 + kr s − s m sin θk2
= kr c k2 − 2(r c · s m ) cos θ + ks m k2 cos2 θ + kr s k2 − 2(r s · s m ) sin θ + ks m k2 sin2 θ
= kr c k2 + kr s k2 − 2(r c · s m ) cos θ − 2(r s · s m ) sin θ + ks m k2 , (16.9)
and note that both \|r^c\|^2 and \|r^s\|^2 do not depend on the message m and can be ignored. Therefore the relevant part of the decision variable is

    \int_\theta p_\Theta(\theta) \exp\left(\frac{2}{N_0}\left[(r^c \cdot s_m)\cos\theta + (r^s \cdot s_m)\sin\theta\right]\right) d\theta \;\cdot\; \exp\left(-\frac{E_m}{N_0}\right),    (16.10)
where E m = ks m k2 . Note that we have to maximize the decision variable over all m ∈ M.
We next consider (r c · s m ) cos θ + (r s · s m ) sin θ . Therefore, for each m ∈ M, we first make
"
"
"
"
X m ""
" (r s , s m )
"
"
"
"
"
" B γm
"
"
(r c , s m )

Figure 16.1: Polar transformation of matched-filter outputs.

a transformation of the matched-filter outputs (r^c · s m ) and (r^s · s m ) from rectangular into
polar coordinates (amplitude X m and phase γm ). Define (see figure 16.1):

    X_m = \sqrt{(r^c \cdot s_m)^2 + (r^s \cdot s_m)^2}    (16.11)

and let the angle γm be such that


(r c · s m ) = X m cos γm ,
(r s · s m ) = X m sin γm . (16.12)
Therefore
(r c · s m ) cos θ + (r s · s m ) sin θ = X m cos γm cos θ + X m sin γm sin θ
= X m cos(θ − γm ), (16.13)
and

    \int_0^{2\pi} p_\Theta(\theta) \exp\left(\frac{2}{N_0}\left[(r^c \cdot s_m)\cos\theta + (r^s \cdot s_m)\sin\theta\right]\right) d\theta
      = \int_0^{2\pi} \frac{1}{2\pi} \exp\left(\frac{2X_m}{N_0}\cos(\theta - \gamma_m)\right) d\theta
      \stackrel{(a)}{=} \int_0^{2\pi} \frac{1}{2\pi} \exp\left(\frac{2X_m}{N_0}\cos\theta\right) d\theta = I_0\!\left(\frac{2X_m}{N_0}\right).    (16.14)
Here (a) follows from the periodicity of the cosine. Note that therefore the angle γm is not
relevant! The integral denoted by I0 (·) is called the “zero-order modified Bessel function of the
first kind” (see figure 16.2).
Figure 16.2: A plot of I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} \exp(x\cos\theta)\, d\theta as a function of x.

RESULT 16.1 Combining everything we obtain that the optimum receiver for “random-phase
transmission” (incoherent detection) has to choose the message m ∈ M that maximizes

    I_0\!\left(\frac{2X_m}{N_0}\right) \exp\left(-\frac{E_m}{N_0}\right).    (16.15)
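A minimal sketch of this decision rule in Python is given below. It evaluates (16.15) in the log domain using numpy's modified Bessel function np.i0; the function name and the toy numbers are my own choices:

```python
import numpy as np

def incoherent_decide(r_c, r_s, signals, N0):
    # Choose m maximizing I0(2 X_m / N0) exp(-E_m / N0) as in (16.15),
    # with X_m from (16.11).  Computed as log I0(.) - E_m/N0 (fine for moderate X_m).
    r_c, r_s = np.asarray(r_c, float), np.asarray(r_s, float)
    metrics = []
    for s_m in signals:
        s_m = np.asarray(s_m, float)
        X_m = np.hypot(np.dot(r_c, s_m), np.dot(r_s, s_m))
        E_m = np.dot(s_m, s_m)
        metrics.append(np.log(np.i0(2.0 * X_m / N0)) - E_m / N0)
    return int(np.argmax(metrics)) + 1

signals = [(1.0, 0.0), (0.0, 1.0)]                # two orthogonal signals
print(incoherent_decide([0.8, 0.1], [0.5, 0.0], signals, N0=1.0))   # -> 1
```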

16.3 Signals with equal energy, receiver implementation


If all the signals have the same energy the optimum receiver only has to maximize I0 (2X m /N0 ).
However since I0 (x) is a monotonically increasing function of x this is equivalent to maximizing
X m or its square
    X_m^2 = (r^c \cdot s_m)^2 + (r^s \cdot s_m)^2.    (16.16)

How can we now determine e.g. (r^c · s m )? We can use the standard approach as shown in
figure 15.4 and correlate r (t) with the bandpass building-block waveforms φi (t)√2 cos(2π f 0 t)
to obtain the components of r c and then form the dot product, but there is also a more direct way.
Note that

    (r^c \cdot s_m) = \sum_{i=1,N} r_i^c s_{mi} = \sum_{i=1,N} \left(\int_{-\infty}^{\infty} r(t)\,\phi_{c,i}(t)\, dt\right) s_{mi}
       = \sum_{i=1,N} \left(\int_{-\infty}^{\infty} r(t)\,\phi_i(t)\,\sqrt{2}\cos(2\pi f_0 t)\, dt\right) s_{mi}
       = \int_{-\infty}^{\infty} r(t)\,\sqrt{2}\cos(2\pi f_0 t) \sum_{i=1,N} s_{mi}\,\phi_i(t)\, dt
       = \int_{-\infty}^{\infty} r(t)\,\sqrt{2}\cos(2\pi f_0 t)\, s_m^b(t)\, dt.    (16.17)

A similar result holds for the product (r s · s m ). All this suggests the implementation shown in
figure 16.3. As usual there is also a matched-filter version of this incoherent receiver.

Figure 16.3: Correlation receiver for equal-energy signals with random phase.
16.4 Envelope detection

Fix an m ∈ M. The matched filter for the (bandpass) signal s_m(t) = s_m^b(t)\sqrt{2}\cos(2\pi f_0 t - \theta)
has impulse response s_m(T - t) = s_m^b(T - t)\sqrt{2}\cos(2\pi f_0 (T - t) - \theta). Here it is assumed that
the signal sm (t) is zero outside [0, T ]. Since the phase θ of the transmitted signal is unknown to
the receiver, the impulse response

    h_m(t) = s_m^b(T - t)\,\sqrt{2}\cos(2\pi f_0 t)    (16.18)

is probably a good matched filter. Let’s investigate this!
The response of this filter to r (t) is

    u_m(t) = \int_{-\infty}^{\infty} r(\alpha)\, h_m(t - \alpha)\, d\alpha
           = \int_{-\infty}^{\infty} r(\alpha)\, s_m^b(T - t + \alpha)\,\sqrt{2}\cos(2\pi f_0 (t - \alpha))\, d\alpha
           = \int_{-\infty}^{\infty} r(\alpha)\, s_m^b(T - t + \alpha)\,\sqrt{2}\left[\cos(2\pi f_0 t)\cos(2\pi f_0 \alpha) + \sin(2\pi f_0 t)\sin(2\pi f_0 \alpha)\right] d\alpha
           = \cos(2\pi f_0 t)\int_{-\infty}^{\infty} r(\alpha)\, s_m^b(T - t + \alpha)\,\sqrt{2}\cos(2\pi f_0 \alpha)\, d\alpha
             + \sin(2\pi f_0 t)\int_{-\infty}^{\infty} r(\alpha)\, s_m^b(T - t + \alpha)\,\sqrt{2}\sin(2\pi f_0 \alpha)\, d\alpha
           = u_m^c(t)\cos(2\pi f_0 t) + u_m^s(t)\sin(2\pi f_0 t),    (16.19)

with

    u_m^c(t) = \int_{-\infty}^{\infty} r(\alpha)\,\sqrt{2}\cos(2\pi f_0 \alpha)\, s_m^b(T - t + \alpha)\, d\alpha,
    u_m^s(t) = \int_{-\infty}^{\infty} r(\alpha)\,\sqrt{2}\sin(2\pi f_0 \alpha)\, s_m^b(T - t + \alpha)\, d\alpha.    (16.20)

The signals u_m^c(t) and u_m^s(t) are baseband signals. Why? The reason is that we can regard
e.g. u_m^c(t) as the output of the filter with impulse response s_m^b(T − t) when the excitation is
r (t)√2 cos(2π f 0 t). It is clear that signal-components outside the frequency band [−W, W ] can
not pass the filter s_m^b(T − t) since s_m^b(t) is a baseband signal.
We now use the trick of section 16.2 again. Write the matched-filter output as

    u_m(t) = X_m(t)\cos[2\pi f_0 t - \gamma_m(t)]    (16.21)

where

    X_m(t) = \sqrt{(u_m^c(t))^2 + (u_m^s(t))^2}    (16.22)
and angle γm (t) is such that
u cm (t) = X m (t) cos γm (t),
u sm (t) = X m (t) sin γm (t). (16.23)
Now X m (t) is a slowly-varying “envelope” (amplitude) and γm (t) a slowly-varying “phase” of
the output u m (t) of the matched filter. By slowly-varying we mean within bandwidth [−W, W ].
Next let us consider what happens at t = T . Observe from equations (16.17) that

    u_m^c(T) = (r^c \cdot s_m),
    u_m^s(T) = (r^s \cdot s_m),    (16.24)

hence

    X_m(T) = \sqrt{(r^c \cdot s_m)^2 + (r^s \cdot s_m)^2}.    (16.25)

Therefore, for equal-energy signals, we can construct an optimum receiver by sampling the
envelope of the outcomes of the matched filters h_m(t) = s_m^b(T − t)√2 cos(2π f 0 t) for all m ∈ M
and comparing the samples. This leads to the implementation shown in figure 16.4.

Figure 16.4: Envelope-detector receiver for equal energy signals with random phase: r (t) is filtered by h m (t), the envelope X m (t) is extracted and sampled at t = T , and the largest sample determines m̂.

Note that instead of using a matched filter with (passband) impulse response s_m^b(T − t)√2 cos(2π f 0 t)
we can first multiply the received signal r (t) by cos(2π f 0 t) and then use a matched
filter with (baseband) impulse response s_m^b(T − t).

Example 16.1 For some m ∈ M assume that smb (t) = 1 for 0 ≤ t ≤ 1 and zero elsewhere. If the random
phase turns out to be θ = π/2 then
    s_m(t) = s_m^b(t)\,\sqrt{2}\cos(2\pi f_0 t - \pi/2) = s_m^b(t)\,\sqrt{2}\sin(2\pi f_0 t).    (16.26)

For the matched-filter impulse response, noting that T = 1, we can write

    h_m(t) = s_m^b(1 - t)\,\sqrt{2}\cos(2\pi f_0 t).    (16.27)
If we assume that there is no noise, hence r (t) = sm (t), the output u m (t) of the matched filter will be

    u_m(t) = \int_{-\infty}^{\infty} r(\alpha)\, h_m(t - \alpha)\, d\alpha
           = \int_{-\infty}^{\infty} s_m^b(\alpha)\,\sqrt{2}\sin(2\pi f_0 \alpha)\; s_m^b(1 - t + \alpha)\,\sqrt{2}\cos(2\pi f_0 (t - \alpha))\, d\alpha.    (16.28)

All these signals are shown in figure 16.5 for f 0 = 7Hz.


Note that the envelope of u m (t) has the triangular shape which is the output of a matched filter for a
rectangular pulse if the input of this matched filter is this rectangular pulse.

Figure 16.5: Signals s_m^b(t), s_m(t), and u_m(t), and impulse-response h_m(t), for f 0 = 7 Hz.

16.5 Probability of error for two orthogonal signals


An incoherent receiver does not consider phase information. Therefore it cannot achieve as small
an error probability as a coherent receiver. To illustrate this we will now calculate the probability
of error for white Gaussian noise when one of two equally likely messages is communicated
over a system utilizing an incoherent receiver and DSB-SC modulated equal-energy orthogonal
baseband signals
    s_1^b(t) = \sqrt{E_s}\,\phi_1(t), \quad \text{hence } s_1 = (\sqrt{E_s}, 0), \text{ and}
    s_2^b(t) = \sqrt{E_s}\,\phi_2(t), \quad \text{hence } s_2 = (0, \sqrt{E_s}).    (16.29)

An optimum receiver (see result 16.1) now chooses m̂ = 1 if and only if

(r c · s 1 )2 + (r s · s 1 )2 > (r c · s 2 )2 + (r s · s 2 )2 . (16.30)

Here all vectors are two-dimensional, more precisely

r c = s m cos θ + n c
r s = s m sin θ + n s . (16.31)

When M = 1 then the vector components of r c = (r1c , r2c ) and r s = (r1s , r2s ) are
    r_1^c = \sqrt{E_s}\cos\theta + n_1^c,
    r_1^s = \sqrt{E_s}\sin\theta + n_1^s,
    r_2^c = n_2^c,
    r_2^s = n_2^s,    (16.32)

where n^c = (n_1^c, n_2^c) and n^s = (n_1^s, n_2^s). Now by s_1 = (\sqrt{E_s}, 0) and s_2 = (0, \sqrt{E_s}) the optimum
receiver decodes m̂ = 1 if and only if

(r1c )2 + (r1s )2 > (r2c )2 + (r2s )2 . (16.33)

The noise components are all statistically independent Gaussian variables with density function

    p_N(n) = \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{n^2}{N_0}\right).    (16.34)

Now fix some θ. Assume that (r_1^c)^2 + (r_1^s)^2 = \rho^2. What is now the probability of error given
that M = 1 and Θ = θ? Therefore consider

    \Pr\{\hat{M} = 2 \mid \Theta = \theta, M = 1, (R_1^c)^2 + (R_1^s)^2 = \rho^2\} = \Pr\{(R_2^c)^2 + (R_2^s)^2 \ge \rho^2\}
      = \iint_{\alpha^2 + \beta^2 \ge \rho^2} p_N(\alpha)\, p_N(\beta)\, d\alpha\, d\beta
      = \iint_{\alpha^2 + \beta^2 \ge \rho^2} \frac{1}{\pi N_0} \exp\left(-\frac{\alpha^2 + \beta^2}{N_0}\right) d\alpha\, d\beta
      = \int_\rho^\infty \int_0^{2\pi} \frac{1}{\pi N_0} \exp\left(-\frac{r^2}{N_0}\right) r\, dr\, d\theta
      = \exp\left(-\frac{\rho^2}{N_0}\right).    (16.35)
We obtain the error probability for the case where M = 1 by averaging over all R_1^c and R_1^s, hence

    \Pr\{\hat{M} = 2 \mid \Theta = \theta, M = 1\}
      = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{R_1^c,R_1^s}(r_1^c, r_1^s \mid \Theta = \theta, M = 1) \exp\left(-\frac{(r_1^c)^2 + (r_1^s)^2}{N_0}\right) dr_1^c\, dr_1^s
      = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_{R_1^c}(r_1^c \mid \Theta = \theta, M = 1)\, p_{R_1^s}(r_1^s \mid \Theta = \theta, M = 1) \exp\left(-\frac{(r_1^c)^2 + (r_1^s)^2}{N_0}\right) dr_1^c\, dr_1^s
      = \int_{-\infty}^{\infty} p_{R_1^c}(r_1^c \mid \Theta = \theta, M = 1) \exp\left(-\frac{(r_1^c)^2}{N_0}\right) dr_1^c
        \cdot \int_{-\infty}^{\infty} p_{R_1^s}(r_1^s \mid \Theta = \theta, M = 1) \exp\left(-\frac{(r_1^s)^2}{N_0}\right) dr_1^s.    (16.36)

Consider the first factor. Note that r_1^c = \sqrt{E_s}\cos\theta + n_1^c. Therefore

    \int_{-\infty}^{\infty} p_{R_1^c}(r_1^c \mid \Theta = \theta, M = 1) \exp\left(-\frac{(r_1^c)^2}{N_0}\right) dr_1^c
      = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{(\alpha - \sqrt{E_s}\cos\theta)^2}{N_0}\right) \exp\left(-\frac{\alpha^2}{N_0}\right) d\alpha.    (16.37)

With m = \sqrt{E_s}\cos\theta this integral becomes

    \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{(\alpha - m)^2}{N_0}\right) \exp\left(-\frac{\alpha^2}{N_0}\right) d\alpha
      = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{2\alpha^2 - 2\alpha m + m^2}{N_0}\right) d\alpha
      = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{2(\alpha^2 - \alpha m + m^2/4) + m^2/2}{N_0}\right) d\alpha
      = \frac{\exp(-\frac{m^2}{2N_0})}{\sqrt{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0/2}} \exp\left(-\frac{(\alpha - m/2)^2}{N_0/2}\right) d\alpha = \frac{\exp(-\frac{m^2}{2N_0})}{\sqrt{2}}
      = \frac{\exp(-\frac{E_s\cos^2\theta}{2N_0})}{\sqrt{2}}.    (16.38)
Combining this with a similar result for the second factor we obtain

    \Pr\{\hat{M} = 2 \mid \Theta = \theta, M = 1\} = \frac{\exp(-\frac{E_s\cos^2\theta}{2N_0})}{\sqrt{2}} \cdot \frac{\exp(-\frac{E_s\sin^2\theta}{2N_0})}{\sqrt{2}} = \frac{1}{2}\exp\left(-\frac{E_s}{2N_0}\right).    (16.39)

Although we have fixed θ this probability is independent of θ. Averaging over θ therefore yields

    \Pr\{\hat{M} = 2 \mid M = 1\} = \frac{1}{2}\exp\left(-\frac{E_s}{2N_0}\right).    (16.40)
Based on symmetry we can obtain a similar result for Pr{ M̂ = 1|M = 2}.
RESULT 16.2 Therefore we obtain for the error probability for an incoherent receiver for two
equally likely orthogonal signals, both having energy E s , that

    P_E = \frac{1}{2}\exp\left(-\frac{E_s}{2N_0}\right).    (16.41)

Note that for coherent reception of two equally likely orthogonal signals we have obtained
before that

    P_E = Q\!\left(\sqrt{\frac{E_s}{N_0}}\right) \le \frac{1}{2}\exp\left(-\frac{E_s}{2N_0}\right).    (16.42)
Here we have used the bound on the Q-function that was derived in appendix A. Figure 16.6
shows that the error probabilities are comparable however.
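The two expressions (16.41) and (16.42) are easy to tabulate. The Python sketch below does so, using Q(x) = ½ erfc(x/√2); the chosen Es/N0 range is arbitrary:

```python
from math import erfc, exp, sqrt

def Q(x):
    # Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2)).
    return 0.5 * erfc(x / sqrt(2.0))

print(" Es/N0 [dB]   coherent Q(sqrt(Es/N0))   incoherent 0.5*exp(-Es/(2N0))")
for snr_db in range(0, 16, 3):
    snr = 10 ** (snr_db / 10.0)
    print(f" {snr_db:10d}   {Q(sqrt(snr)):22.3e}   {0.5 * exp(-snr / 2.0):28.3e}")
```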

Figure 16.6: Probability of error for coherent and incoherent reception of two equally likely orthogonal signals of energy E s as a function of E s /N0 in dB. Note that the probability of error for incoherent reception is the largest.

Another disadvantage of incoherent transmission of two orthogonal signals is that we lose
3 dB relative to antipodal signaling. Also the bandwidth efficiency of random-phase transmission
of two orthogonal signals is a factor of two smaller than that of coherent transmission. A
transmitted dimension spreads out over two received dimensions.
16.6 Exercises
1. Let X and Y be two independent Gaussian random variables with common variance σ 2 .
The mean of X is m and Y is a zero-mean random variable. We define the random variable
V as V = \sqrt{X^2 + Y^2}. Show that

    p_V(v) = \frac{v}{\sigma^2}\, I_0\!\left(\frac{m v}{\sigma^2}\right) \exp\left(-\frac{v^2 + m^2}{2\sigma^2}\right), \quad \text{for } v > 0,    (16.43)

and 0 for v ≤ 0. Here I0 (·) is the modified Bessel function of the first kind and zero order.
The distribution of V is known as the Rician distribution. In the special case where m = 0,
the Rician distribution simplifies to the Rayleigh distribution.
(Exercise 3.31 from Proakis and Salehi [17].)

2. In on-off keying of a carrier-modulated signal, the two possible signals are


    s_1^b(t) = \sqrt{E}\,\phi(t), \quad \text{hence } s_1 = \sqrt{E},
    s_2^b(t) = 0, \quad \text{hence } s_2 = 0.    (16.44)

Note that s1 and s2 are signals in one-dimensional “vector”-notation. The signals are
equiprobable i.e. Pr{M = 1} = Pr{M = 2} = 1/2.
These signals are transmitted over a bandpass channel with a carrier with random phase θ
with p_\Theta(\theta) = 1/(2\pi) for 0 ≤ θ < 2π . Power density of the noise is N_0/2 for all frequencies.
An optimum receiver first determines r c and r s for which we can write

r c = sm cos θ + n c ,
r s = sm sin θ + n s , (16.45)
where both n^c and n^s are independent zero-mean Gaussian vectors with variance N_0/2.
Now let E/N0 = 10dB.

(a) Determine the optimum decision regions Im for m = 1, 2.


Hint: Note that I0 (12.1571) = 22025.
(b) Determine the expected error probability that is achieved by the optimum receiver.
Hint: Note that \int_0^{2.7184} x \exp(-\frac{x^2}{2})\, I_0(x\sqrt{20})\, dx = 635.72.
Part V

Coded Modulation

Chapter 17

Coding for bandlimited channels

SUMMARY: Modulation makes it possible to transmit sequences of symbols over a waveform
channel. Here we will investigate coding, i.e. we will study which sequences of symbols
should be used to achieve efficient transmission. We study only coding methods, i.e. tech-
niques that create distance between sequences. Shaping techniques, i.e. methods that aim
at achieving spherical signal structures will not be discussed.

17.1 AWGN channel


Consider transmission over an additive white Gaussian noise (AWGN) channel. For this channel
Figure 17.1: Additive white Gaussian noise channel: r_i = s_i + n_i with noise density p_N(n_i) = \frac{1}{\sqrt{\pi N_0}}\exp(-\frac{n_i^2}{N_0}).

the i-th channel output ri is equal to the i-th input si to which noise n i is added, hence

ri = si + n i , for i = 1, 2, · · · . (17.1)

We assume that the noise variables Ni are independent and normally distributed, with mean 0
and variance σ 2 = N0 /2.
This AWGN-channel models transmission over a waveform channel with additive white
Gaussian noise with power density S Nw ( f ) = N0 /2 for −∞ < f < ∞ as we have seen
before (see section 6.8).
Note that i is the index of the dimensions. If e.g. the waveform channel has bandwidth W ,
we can get a new dimension every 1/(2W ) seconds (see chapter 14). This determines the relation
between the actual time t and the dimension index i but we will ignore this in what follows.


In this chapter we call a dimension a transmission. We assume that the energy “invested” in
each transmission (dimension) is E N . It is our objective to investigate coding techniques for this
channel. First we consider uncoded transmission.

17.2 |A|-Pulse amplitude modulation


In this section we will consider pulse amplitude modulation (PAM) again. In chapter 14 the mod-
ulation properties of this technique were discussed. Here we will investigate the error behavior
of this method which can be regarded as uncoded transmission. We will use (uncoded) PAM as
a reference for evaluating alternative signaling methods, methods that do involve coding.
In the case of PAM, the channel input symbols si for i = 1, 2, · · · assume values in the
symbol alphabet (or signal structure or signal constellation)

A = {−|A| + 1, −|A| + 3, · · · , |A| − 1}, (17.2)

scaled by a factor d0 /2. We write


    s_i = \frac{d_0}{2}\, a_i \quad \text{where } a_i \in A.    (17.3)
We will here call ai an elementary PAM-symbol.
The constellation A is called an |A|-PAM constellation. It contains |A| equidistant elemen-
tary symbols centered on the origin and having minimum Euclidean distance 2. Therefore d0 is
the minimum Euclidean distance between the symbols si . We assume in this chapter that |A| is
even. In figure 17.2 an |A|-PAM constellation is shown.

Figure 17.2: An |A|-PAM constellation (|A| even) with the |A| elementary symbols −|A| + 1, · · · , −5, −3, −1, 1, 3, 5, · · · , |A| − 1.

Each symbol value has a probability of occurrence of 1/|A|. Therefore the information rate
per transmission is
R N = log2 |A| bits per transmission. (17.4)
For the average symbol energy E N per transmission we now have that

    E_N = \frac{d_0^2}{4}\, E_{\mathrm{pam}}    (17.5)

where E pam is the elementary energy of an |A|-PAM constellation which is defined as

    E_{\mathrm{pam}} = \frac{1}{|A|} \sum_{a=-|A|+1,-|A|+3,\cdots,|A|-1} a^2.    (17.6)
RESULT 17.1 In appendix I it is shown that

    E_{\mathrm{pam}} = \frac{|A|^2 - 1}{3}.    (17.7)

The result holds both for even and odd |A|.
It now follows from the above result that, since si = (d0 /2)ai , the average symbol energy

    E_N = \frac{d_0^2}{4} \cdot \frac{|A|^2 - 1}{3}.    (17.8)
Example 17.1 Consider the 4-PAM constellation

A = {−3, −1, +1, +3}. (17.9)

The rate corresponding to this constellation is R N = log2 4 = 2 bits per transmission. For the elementary
energy per transmission we obtain

    E_{\mathrm{pam}} = \frac{1}{4}\left((-3)^2 + (-1)^2 + (+1)^2 + (+3)^2\right) = \frac{4^2 - 1}{3} = 5.    (17.10)
Apart from the rate R N and the average symbol energy E N , a third important parameter is
the symbol error probability P_E^s. This error probability is Q(\frac{d_0/2}{\sigma}) for the outer two signal points
and 2Q(\frac{d_0/2}{\sigma}) for the inner signal points. Therefore

    P_E^s = \frac{2(|A| - 2) + 2}{|A|}\, Q\!\left(\frac{d_0/2}{\sigma}\right) = \frac{2(|A| - 1)}{|A|}\, Q\!\left(\frac{d_0/2}{\sigma}\right).    (17.11)

By (17.8) and the fact that σ 2 = N0 /2, the square of the argument of the Q-function can be
written as

    \frac{d_0^2}{4\sigma^2} = \frac{3}{|A|^2 - 1} \cdot \frac{E_N}{N_0/2}.    (17.12)

Therefore we can express the symbol error probability as

    P_E^s = \frac{2(|A| - 1)}{|A|}\, Q\!\left(\sqrt{\frac{3}{|A|^2 - 1} \cdot \frac{E_N}{N_0/2}}\right) = \frac{2(|A| - 1)}{|A|}\, Q\!\left(\sqrt{\frac{3}{|A|^2 - 1}\, S/N}\right),    (17.13)

if we note that the signal-to-noise ratio S/N = E N /(N0 /2), i.e. the average symbol energy
divided by the variance of the noise in a transmission.
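Expression (17.13) is straightforward to evaluate numerically. The Python sketch below does so with Q(x) = ½ erfc(x/√2); the chosen alphabet sizes and the S/N value are arbitrary:

```python
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def pam_symbol_error(A, snr):
    # Symbol error probability (17.13) of |A|-PAM at S/N = E_N / (N0/2).
    return 2.0 * (A - 1) / A * Q(sqrt(3.0 * snr / (A ** 2 - 1)))

snr_db = 20.0
snr = 10 ** (snr_db / 10.0)
for A in (2, 4, 8):
    print(f"|A| = {A}: P_E^s at S/N = {snr_db} dB is {pam_symbol_error(A, snr):.2e}")
```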

17.3 Normalized S/N


We now want to look more closely at signal-to-noise ratios. Therefore note that reliable trans-
mission is only possible if the rate R N (in bits per transmission, dimension) is smaller than the
capacity C N per dimension. If we write the capacity per transmission (see 12.17) as a function
of S/N = E N /(N0 /2) we obtain

    R_N < C_N = \frac{1}{2}\log_2(1 + S/N).    (17.14)

Rewriting this inequality the other way around we obtain the following lower bound for
S/N as a function of the rate R N :

    S/N > 2^{2 R_N} - 1 = S/N_{\min}.    (17.15)

Definition 17.1 This immediately suggests defining the normalized signal-to-noise ratio as

    S/N_{\mathrm{norm}} = \frac{S/N}{S/N_{\min}} = \frac{S/N}{2^{2 R_N} - 1}.    (17.16)
This definition allows us to compare coding methods with different rates with each other.
Now it follows from (17.15) that for reliable transmission S/N norm > 1 must hold. Good
signaling methods achieve small error probabilities for S/N norm close to 1 while lousy methods
need a larger S/N norm to realize reliable transmission. It is the objective of the communication
engineer to design systems that achieve acceptable error probabilities for a S/N norm as close as
possible to 1.

17.4 Baseline performance in terms of S/N norm


In a previous section we have determined the symbol error probability of |A|-PAM as a function
of S/N . This symbol error probability can be expressed in terms of S/N norm in a very simple
way as we see soon.
Since we are considering bandlimited transmission in this chapter, we may assume that
S/N ≫ 1. We are aiming at achieving

    \log_2 |A| = R_N \approx C_N = \frac{1}{2}\log_2(1 + S/N)    (17.17)

or |A|^2 ≈ 1 + S/N and hence we make the assumption that also |A| ≫ 1. Therefore we may
conclude for the symbol error probability for |A|-PAM that

    P_E^s \approx 2Q\!\left(\sqrt{\frac{3}{|A|^2 - 1}\, S/N}\right) = 2Q\!\left(\sqrt{\frac{3}{2^{2 R_N} - 1}\, S/N}\right) = 2Q\!\left(\sqrt{3\, S/N_{\mathrm{norm}}}\right)    (17.18)
by definition 17.1 of the normalized signal-to-noise ratio. This symbol error probability for PAM
is depicted in figure 17.3. It will serve as reference performance for the alternative methods that
will be investigated soon. We call it therefore the baseline performance. Note once more that the
baseline performance corresponds to uncoded PAM transmission.
Figure 17.3: Average symbol error probability P_E^s versus S/N_norm for uncoded |A|-PAM, under the assumption that |A| is large. Horizontally S/N_norm in dB, vertically log10 P_E^s .

17.5 The promise of coding


It can be observed from figure 17.3 that a S/N norm of approximately 9 dB is needed to achieve
a reliable performance when PAM is used. It is more or less common practice to assume that
PEs ≈ 10−6 corresponds to reliable transmission.
Shannon’s results imply that reliable transmission is possible for S/N norm ≈ 1. Therefore
for |A|-PAM the S/N norm is roughly 9 dB, a factor of eight, larger than necessary. There exist
methods that perform up to 9 dB better than our baseline method, i.e. up to 9 dB better than
uncoded PAM.
Why do we prefer small signal-to-noise ratios? In a radio transmission system a smaller S/N
gives the opportunity to use less transmitter power or to apply simpler and/or smaller antennas.
Using less transmitter power makes smaller batteries possible, results in a smaller bill from the
electricity distribution company, or makes dissipation in circuits lower, etc. In line transmission
using smaller transmitter power is also advantageous for the same reasons. Moreover we have
the effect of cross-talk between different users. We can not combat the noise that results from
this cross-talk by increasing the transmitter power. In this case it is better to achieve a better error
performance given the S/N that occurs there.

17.6 Union bound


17.6.1 Sequence error probability
In what follows we will investigate several coding methods and evaluate their performance. Since
we want to compare the performance of the proposed codes with the baseline performance it
would be nice to know the symbol error probabilities of the proposed codes. But since it is not
clear yet what this means, we consider the (sequence) error probability PE of a coding method
first. The union bound can be used to give a good estimate of this error probability. This bound
is based on the evaluating the pair-wise error probabilities between pairs of coded sequences.
On the AWGN channel the received sequence r = s + n is the sum of the transmitted coded
sequence (signal) s and the noise vector n consisting of N independent components each having
mean zero and variance σ 2 = N0 /2. Now consider two coded sequences s ∈ S and s 0 ∈ S
where S is the set of sequences (signals). They differ by the Euclidean distance ks − s 0 k. If the
sequence s is transmitted, the probability that the received vector r = s + n will be closer to s 0
than to s is given by  
ks − s 0 k/2
Q . (17.19)
σ
The probability that r will be closer to s 0 than to s for any s 0 6= s in S which is precisely the
probability of error PE (s) for minimum distance decoding when s is sent, is therefore upper-
bounded by
X  ks − s 0 k/2 
PE (s) ≤ Q . (17.20)
0
σ
s 6 =s
The error probability averaged over all coded sequences s is now upper-bounded by

    PE = (1/|S|) Σ_{s∈S} PE(s) ≤ (1/|S|) Σ_{s∈S} Σ_{s′≠s} Q( ‖s − s′‖/(2σ) ) = Σ_d K_d Q( d/(2σ) ),    (17.21)

where

    K_d = (1/|S|) Σ_{s∈S} #{ s′ : ‖s − s′‖ = d },                                  (17.22)
i.e. the average number of signals at distance d from a coded sequence. Inequality (17.21) is
referred to as the union bound for the sequence error probability.
The function Q(x) decays exponentially, not slower than exp(−x²/2); this follows from
appendix A. Therefore, when K_d does not increase too rapidly with d, for small σ the union
bound is dominated by its first term. This first term is called the union bound estimate

    PE ≈ K_min Q( d_min/(2σ) ),                                                    (17.23)
where dmin is the minimum Euclidean distance between any two sequences in S and K min the
average number of signals at distance dmin from a coded sequence. K min is called the error
coefficient.
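Once the set S of coded sequences is available as a list of vectors, d_min, K_min and the union bound estimate (17.23) can be computed directly. The sketch below is only an illustration; the tiny two-dimensional signal set and the noise level in it are made up.

```python
# Sketch: union bound estimate P_E ~ K_min * Q(d_min / (2*sigma))
# for an explicitly listed (hypothetical) set of signal vectors.
import numpy as np
from scipy.stats import norm

def union_bound_estimate(S, sigma):
    S = np.asarray(S, dtype=float)
    # all pairwise Euclidean distances between different sequences
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
    d = d[~np.eye(len(S), dtype=bool)].reshape(len(S), -1)
    d_min = d.min()
    # average number of nearest neighbours at distance d_min (error coefficient)
    K_min = np.isclose(d, d_min).sum(axis=1).mean()
    return d_min, K_min, K_min * norm.sf(d_min / (2 * sigma))

# a made-up 4-point set in two dimensions, just to exercise the function
S = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(union_bound_estimate(S, sigma=0.3))   # d_min = 2, K_min = 2
```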

17.6.2 Symbol error probability


To compare a coded system with the uncoded baseline system, we would like to know, instead
of the sequence error probability PE for this coded system, the symbol error probability PEs .
A first problem now appears: What are the symbols? Usually in a coded system a bunch of
bits is transmitted in a single sequence during N transmissions. In the baseline system however,
a symbol is the amount of information that is sent in a single transmission. Therefore in the
coded case we assume also that in each transmission a symbol is sent. Such an abstract symbol
corresponds to a fraction 1/N of the information bits. Thus the coded sequence contains N of
these symbols.
Can we now express the sequence error probability PE in terms of the error probability P̃Es
for these symbols¹ and vice versa? Here we meet a second problem: how do these symbol errors
depend on each other? To make life easy we assume that the N symbol errors are independent
of each other. Then we easily obtain

    PE = 1 − (1 − P̃Es)^N ≈ N · P̃Es                                                (17.24)

or

    P̃Es ≈ PE/N.                                                                   (17.25)
If we now use the union bound estimate (17.23) obtained in the previous subsection we get

    P̃Es ≈ PE/N ≈ (K_min/N) Q( d_min/(2σ) ).                                        (17.26)

The ratio K_min/N is called the normalized error coefficient (see [6], p. 2399). The symbol error
probability P̃Es, which is also referred to as the sequence error probability per transmission, can now
be used for comparison with the uncoded PAM case.

17.7 Extended Hamming codes


To construct sets of coded sequences we can make use of a binary error-correcting code. Such a
code C is a set of |C| codewords {c1 , c2 , · · · , c|C| }. Each codeword c is a sequence consisting of
N symbols c1 , c2 , · · · , c N taking values in the alphabet {0, 1}.
A code can be linear. In that case all codewords are linear combinations of K independent
codewords². Such a linear code contains 2^K codewords.

Definition 17.2 The minimum Hamming distance of a code is defined as

    dH,min = min_{c≠c′} dH(c, c′),                                                 (17.27)

where dH(c, c′) is the Hamming distance between two codewords c and c′, i.e. the number of
positions at which they differ.
¹ We write P̃Es with a tilde since we are not dealing with concrete symbols.
² The sum c of two codewords c′ and c″ is the component-wise mod 2 sum of both codewords. A codeword c
times a scalar is the codeword c itself if the scalar is 1, or the all-zero codeword (0, 0, · · ·, 0) if the scalar is 0.

So-called extended Hamming codes exist and have parameters

    (N, K, dH,min) = (2^q, 2^q − q − 1, 4),   for q = 2, 3, · · ·.                  (17.28)

We will use these codes in the next sections of this chapter, just because it is convenient to do so.

Example 17.2 For q = 3 we obtain a code with parameters (N , K , dH,min ) = (8, 4, 4). Indeed this code
can be constructed from the following K = 4 independent codewords of length N = 8:

c1 = (1, 0, 0, 0, 0, 1, 1, 1),
c2 = (0, 1, 0, 0, 1, 0, 1, 1),
c3 = (0, 0, 1, 0, 1, 1, 0, 1),
c4 = (0, 0, 0, 1, 1, 1, 1, 0). (17.29)

The code consists of the 16 linear combinations of these 4 codewords. Why is the minimum Hamming
distance dH,min of this code 4? To see this, note that the difference of two codewords c ≠ c′ is a non-zero
codeword itself. The Hamming distance between the two codewords is the number of ones, i.e. the Hamming
weight, of this difference. The minimal Hamming weight of a nonzero codeword is 4, as can easily be
checked.
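The following sketch illustrates this check; it builds all 16 codewords from the generators in (17.29) and verifies the weights by brute force (the code itself is only our own illustration).

```python
# Sketch: build the (8,4,4) extended Hamming code from the generators of
# example 17.2 and check its minimum Hamming weight / distance.
import itertools
import numpy as np

G = np.array([[1,0,0,0,0,1,1,1],
              [0,1,0,0,1,0,1,1],
              [0,0,1,0,1,1,0,1],
              [0,0,0,1,1,1,1,0]])

# all 2^4 = 16 linear combinations (mod 2) of the four generators
code = np.array([np.mod(np.array(u) @ G, 2) for u in itertools.product([0,1], repeat=4)])

weights = code.sum(axis=1)
print(sorted(weights))                   # [0, 4, 4, ..., 4, 8]: fourteen weight-4 words
print(min(w for w in weights if w > 0))  # minimum weight = minimum distance = 4
```

The fourteen codewords of weight 4 found here will reappear in subsection 17.8.4 as K_min(3) = 14.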

17.8 Coding by set-partitioning


17.8.1 Construction of the signal set
The idea of “mapping by set-partitioning” comes from Ungerboeck [22]. He proposed to par-
tition the constellation A into subsets A0 and A1 in such a way that the minimum distance of
the elementary symbols in the subsets becomes larger than that of the elementary symbols in
the original constellation. E.g. an 8-PAM constellation A can be partitioned into a subset A0

Figure 17.4: Partitioning of an 8-PAM set A = {−7, −5, −3, −1, +1, +3, +5, +7} into two subsets
A0 = {−7, −3, +1, +5} and A1 = {−5, −1, +3, +7}, each containing four elementary symbols and
each having a minimum within-subset distance twice as large as that of the original PAM set.

and a subset A1 both containing four elementary symbols, and such that the minimum distance
between two elementary symbols in A0 or two symbols in A1 , i.e. the minimum within-subset

distance, is 4, which is twice as large as the minimum distance between two elementary symbols
in A, which is 2 (see figure 17.4). Note that the minimum distance between any symbol from A0
and any symbol from A1 is still 2.
How can set-partitioning lead to more reliable transmission? To see this, consider the
example encoder shown in figure 17.5. This encoder transforms 20 binary information digits
b1, b2, · · ·, b20 into 8 elementary 8-PAM symbols a1, a2, · · ·, a8. The digits b1, b2, b3, b4 are
Figure 17.5: An encoder that maps by set-partitioning.

transformed into a codeword c1, c2, · · ·, c8 from an (8,4,4) extended Hamming code. Digit ci
together with two information digits b2i+3 and b2i+4 is used to form the PAM symbol ai. For this
the following table can be used; it lists ai for all combinations of ci and b2i+3, b2i+4.

    ci \ b2i+3 b2i+4 :   00    01    11    10
    ci = 0           :   −7    −3    +1    +5
    ci = 1           :   −5    −1    +3    +7

Note that according to this table essentially one out of the four elementary symbols from the set
A0 is chosen if ci = 0, or one out of the four elementary symbols from A1 if ci = 1. The binary
digits b2i+3 and b2i+4 determine which symbol is picked.
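A compact sketch of this encoder, reusing the generator matrix of example 17.2 and the table above, could look as follows (the function and table names are our own):

```python
# Sketch: Ungerboeck-style encoder of figure 17.5 -- 20 info bits -> 8 8-PAM symbols.
import numpy as np

G = np.array([[1,0,0,0,0,1,1,1],
              [0,1,0,0,1,0,1,1],
              [0,0,1,0,1,1,0,1],
              [0,0,0,1,1,1,1,0]])

# symbol a_i as a function of (c_i, b_{2i+3} b_{2i+4}), column order 00, 01, 11, 10
TABLE = {0: {(0,0): -7, (0,1): -3, (1,1): +1, (1,0): +5},
         1: {(0,0): -5, (0,1): -1, (1,1): +3, (1,0): +7}}

def encode(b):                                  # b: 20 information bits b1..b20
    assert len(b) == 20
    c = np.mod(np.array(b[:4]) @ G, 2)          # (8,4,4) extended Hamming codeword
    pairs = [tuple(b[4 + 2*i : 6 + 2*i]) for i in range(8)]
    return [TABLE[int(ci)][p] for ci, p in zip(c, pairs)]

print(encode([0]*20))                           # all-zero input -> eight times -7
```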

17.8.2 Distance between the signals


How about the minimum distance d_min between the signals s = s1, s2, · · ·, s8 that is achieved in
this way? Note that there are 2^20 such signals. Just like before each signal (vector)

    s = (d0/2)a,                                                                   (17.30)

thus each signal s is a scaled version of an elementary signal (vector) a = a1, a2, · · ·, a8. To
determine the minimum Euclidean distance between two signals s ≠ s′ we determine the minimum
distance between two elementary signals a ≠ a′ first. We consider two cases.

1. First consider all 2^16 elementary sequences a that result from a fixed codeword c from
   the extended Hamming code. For such a sequence each component ai ∈ A_{ci} for i =
   1, 2, · · ·, 8. The within-subset distance in subsets A0 and A1 is 4. Therefore the minimum
   distance between two elementary sequences a and a′ resulting from the same codeword c
   is 4.

2. Next consider two elementary sequences a and a′ that correspond to two different code-
   words c and c′ from the extended Hamming code. Then there are at least dH,min = 4
   positions i ∈ {1, 2, · · ·, 8} at which ci ≠ c′i. Note that at these positions the elementary
   symbol ai ∈ A_{ci} while the symbol a′i ∈ A_{c′i}. The minimum distance between any symbol
   from A0 and any symbol from A1 is 2 and thus at these positions |ai − a′i| ≥ 2 and hence

       ‖a − a′‖² = Σ_{i=1,8} (ai − a′i)² ≥ 4 · 4 = 16.                             (17.31)

   Therefore the distance between two elementary sequences a and a′ that correspond to two
   different codewords c and c′ is at least 4.

We conclude that the Euclidean distance between two elementary sequences a and a′ is at least
4. Therefore the Euclidean distance between two sequences s and s′ is at least 4 · (d0/2) = 2d0.
Hence d_min = 2d0, which is a factor of two larger than in the uncoded case.
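Both cases reduce to per-position minimizations, which is easy to check numerically; the following sketch (our own illustration, using the subsets of figure 17.4) recovers the within-subset distance 4, the between-subset distance 2, and the resulting minimum squared elementary distance 16.

```python
# Sketch: the two elementary distances behind d_min = 2*d0.
import itertools

A0, A1 = [-7, -3, 1, 5], [-5, -1, 3, 7]

within  = min(abs(a - b) for a, b in itertools.permutations(A0, 2))     # = 4
between = min(abs(a - b) for a in A0 for b in A1)                       # = 2

# same codeword: at least one position differs within a subset -> distance^2 >= within^2 = 16
# different codewords: at least dH,min = 4 positions differ across subsets -> >= 4 * between^2 = 16
print(within, between, min(within**2, 4 * between**2))                  # 4 2 16
```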

17.8.3 Asymptotic coding gain


We have seen in the previous section that the minimum distance between any two coded se-
quences for the Ungerboeck coding method dmin = 2d0 . In uncoded PAM, our baseline method,
the distance between any two symbols was at least d0 . We may conclude that by using the idea
of Ungerboeck the minimum Euclidean distance has increased by a factor of two.
Let us now compute the sequence error probability for our Ungerboeck coding technique. Is it
indeed smaller than the baseline symbol error probability for the uncoded PAM case that was
described by (17.18)? To answer this, first note that by the union bound estimate
(17.23) we can write for our coding method that

    PE ≈ K_min Q( d_min/(2σ) ).                                                    (17.32)
Now we express PE in terms of E_N by noting that again

    E_N = (d0²/4) · (|A|² − 1)/3,                                                  (17.33)
where it is important to note that although we use coding, the PAM symbols remain equiprobable.
Moreover

    d²_min/(4σ²) = (d²_min/d0²) · d0²/(4σ²) = (d²_min/d0²) · 3/(|A|²−1) · E_N/(N0/2) = (d²_min/d0²) · 3/(|A|²−1) · S/N.    (17.34)

Therefore

    PE = K_min Q( √( (d²_min/d0²) · 3/(|A|²−1) · S/N ) ) = K_min Q( √( (d²_min/d0²) · (2^{2R_N}−1)/(|A|²−1) · 3 · S/N_norm ) ),    (17.35)
where R_N is the rate of the signal set S, i.e.

    R_N = log2|S| / N = K/N + log2(|A|/2)
        = (2^q − q − 1)/2^q + log2|A| − 1 = log2|A| − (q + 1)/2^q.                 (17.36)
Note that the rate R_N approaches log2|A| for q → ∞, i.e. if we increase the complexity of the
extended Hamming code.
The argument of the Q(√·)-function in terms of S/N_norm was 3 · S/N_norm in the uncoded
PAM case. If we use Ungerboeck coding as we described here, it is (d²_min/d0²) · (2^{2R_N}−1)/(|A|²−1) · 3 · S/N_norm, hence
we have achieved an increase of the argument by a factor G^asymp_cod of

    G^asymp_cod = (d²_min/d0²) · (2^{2R_N} − 1)/(|A|² − 1) = 4 · (2^{2R_N} − 1)/(|A|² − 1).    (17.37)
This factor is called the asymptotic (or nominal) coding gain of our coding system with respect
to the baseline performance.
The factor d²_min/d0² = 4 corresponds to the increase of the squared Euclidean distance
between the sequences. This factor is referred to as the distance gain. This increase of the distance,
however, was achieved because we have introduced redundancy by using an error-correcting code.
The factor (2^{2R_N} − 1)/(|A|² − 1) that accounts for this loss is therefore called the redundancy
loss.
Example 17.3 The encoder shown in figure 17.5 is based on the (8, 4, 4) extended Hamming code. The
rate of this extended Hamming code, corresponding to q = 3, is

    R_ext.Hamm. = K/N = (2^q − q − 1)/2^q = (2³ − 3 − 1)/2³ = 1/2.                  (17.38)
Therefore the total rate R_N = R_ext.Hamm. + 2 = 5/2 bits/transmission. The additional 2 bits come from the
fact that the subsets A0 and A1 each contain four symbols.
The distance gain of this method is

    d²_min/d0² = 4,                                                                (17.39)

which is 10 log10 4 = 6.02 dB. The redundancy loss however is

    (2^{2R_N} − 1)/(|A|² − 1) = (2^{2·5/2} − 1)/(8² − 1) = 31/63,                   (17.40)

which is 10 log10(31/63) = −3.08 dB. Therefore the asymptotic coding gain for this code is

    G^asymp_cod = 10 log10( 4 · 31/63 ) = 6.02 dB − 3.08 dB = 2.94 dB.              (17.41)
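These numbers are easily reproduced; the sketch below (our own illustration) evaluates (17.36) and (17.37) for this example.

```python
# Sketch: asymptotic coding gain of example 17.3 (q = 3, 8-PAM, (8,4,4) code).
import numpy as np

q = 3
A = 8                                               # 8-PAM
R_N = np.log2(A) - (q + 1) / 2**q                   # = 2.5 bits/transmission, eq. (17.36)
distance_gain = 4                                   # d_min^2 / d_0^2
redundancy_loss = (2**(2 * R_N) - 1) / (A**2 - 1)   # = 31/63
G_asymp = 10 * np.log10(distance_gain * redundancy_loss)
print(R_N, G_asymp)                                 # 2.5, about 2.94 dB
```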

17.8.4 Effective coding gain


In the previous subsection we have considered only the asymptotic coding gain, i.e. we have
only concentrated on the argument of the Q-function (in the expressions of both PE and P̃Es)
and ignored its coefficient. How large are these coefficients? It can be shown that for the
extended Hamming code with parameter q

    K_min(q) = [ C(2^q, 4) + (2^q − 1) · C(2^{q−1}, 2) ] / 2^q,   where C(n, k) denotes the binomial coefficient,    (17.42)
is the number of codewords at Hamming distance 4 from a codeword. We have already seen that
the minimum Euclidean distance of our Ungerboeck code is 2d0. How many sequences have
distance 2d0 to a given sequence, in other words how large is our K_min? To see this, suppose that
a sequence s results from the extended Hamming codeword c. Note first that there are at most 2N
signals s′ that result from the same codeword c having distance 2d0 from s. Moreover for each
extended Hamming codeword c′ at Hamming distance 4 from c there are at most 2^4 sequences s′
having distance 2d0 from s. Therefore in total

    K_min ≤ 2N + 2^4 · K_min(q).                                                   (17.43)

Note that the normalized error coefficient K_min/N is larger than the coefficient 2 in the uncoded
case. What is the effect of this larger coefficient in dB? In other words, how much do we have to
increase the S/N_norm to compensate for the larger K_min/N? Recall that we are interested in
achieving a symbol error probability of 10^−6. To achieve the baseline performance the argument
α of Q(√·) has to satisfy

    2Q(√α) = 10^−6,                                                                (17.44)

in other words α = 23.928. For an alternative coding method with normalized error coefficient
K_min/N we need an argument β that satisfies

    (K_min/N) · Q(√β) = 10^−6,                                                     (17.45)

or a S/N_norm that is a factor β/α larger, to get the same performance. Therefore the coding loss
L that we have to take into account is

    L = 10 log10(β/α).                                                             (17.46)
In practice it turns out that if the normalized error coefficient K_min/N is increased by a factor of
two, we have to accept an additional loss of roughly 0.2 dB (at error rate 10^−6).
The resulting coding gain, called the effective coding gain, is the difference of the asymptotic
coding gain and this loss, hence

    G^eff_cod = G^asymp_cod − L = G^asymp_cod − 10 log10(β/α).                      (17.47)

Figure 17.6: Rate, asymptotic coding gain (in dB), normalized error coefficient, and effective
coding gain (in dB) versus q.

Example 17.4 Again consider the encoder shown in figure 17.5 which is based on the (8, 4, 4) extended
Hamming code. We can show that K_min(q), i.e. the number of "nearest neighbors" in the extended Hamming
code, for q = 3 equals

    K_min(3) = [ C(2³, 4) + (2³ − 1) · C(2^{3−1}, 2) ] / 2³ = 14.                   (17.48)
This results in the following bound for the error coefficient in the set S of coded sequences

    K_min ≤ 2N + 2^4 · K_min(3) = 2 · 8 + 16 · 14 = 240.                            (17.49)

The normalized error coefficient K_min/N is therefore at most 240/8 = 30. Now we have to solve

    (K_min/N) · Q(√β) = 30 · Q(√β) = 10^−6.                                         (17.50)

It turns out that β = 29.159 and thus the loss is 10 log10(29.159/23.928) = 0.86 dB. Therefore the
effective coding gain G^eff_cod = G^asymp_cod − 0.86 dB = 2.94 dB − 0.86 dB = 2.08 dB.

17.8.5 Coding gains for more complex codes


For the values 2, 3, · · · , 6 of the parameter q that determines the extended Hamming code we
have computed rates, normalized error coefficients, and asymptotic and effective coding gains of
our Ungerboeck codes. These data can be found in the table below.

    q     N     R_N      G^asymp_cod    K_min/N    L (dB)    G^eff_cod
    2     4     2.250    1.38 dB        6          0.37      1.01 dB
    3     8     2.500    2.94 dB        30         0.86      2.08 dB
    4     16    2.687    4.10 dB        142        1.29      2.81 dB
    5     32    2.813    4.87 dB        622        1.66      3.21 dB
    6     64    2.891    5.35 dB        2606       1.99      3.36 dB
The results from the table are also shown in figure 17.6. It can be observed that the rate R_N of
our code tends to 3 bits/transmission as q increases. Moreover we see that the asymptotic coding
gain approaches 6 dB. The effective coding gain is considerably smaller, however. The reason for
this is that the normalized error coefficient increases very quickly with q.
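All columns of the table follow from the formulas of the previous subsections. The sketch below (our own illustration) regenerates them, up to rounding, from the parameter q.

```python
# Sketch: regenerate the table above from q (8-PAM, extended Hamming codes).
import numpy as np
from scipy.special import comb
from scipy.stats import norm

alpha = norm.isf(0.5e-6) ** 2                       # baseline argument, 2Q(sqrt(alpha)) = 1e-6
for q in range(2, 7):
    N = 2 ** q
    R_N = 3 - (q + 1) / N                           # eq. (17.36) with |A| = 8
    G_asymp = 10 * np.log10(4 * (2**(2 * R_N) - 1) / 63)
    Kmin_q = (comb(N, 4) + (N - 1) * comb(N // 2, 2)) / N        # eq. (17.42)
    Kmin_over_N = (2 * N + 16 * Kmin_q) / N                      # eq. (17.43) divided by N
    beta = norm.isf(1e-6 / Kmin_over_N) ** 2
    L = 10 * np.log10(beta / alpha)
    print(q, N, round(R_N, 3), round(G_asymp, 2),
          int(Kmin_over_N), round(L, 2), round(G_asymp - L, 2))
```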

17.9 Remark
To keep things simple we have only discussed one-dimensional signal structures. Ungerboeck's
method of set-partitioning, however, also applies to two- or more-dimensional structures.

17.10 Exercises
1. Consider instead of the extended Hamming code a so-called single-parity-check code of
length 4. This code contains 8 codewords and has distance dH,min = 2. A codeword
now has four binary components c1 , c2 , c3 , c4 and is determined by three information bits
b1 , b2 , b3 .
Figure 17.7: Partitioning a 16-QAM constellation into two subsets.

As before we use set-partitioning, but now we partition a 16-QAM constellation into two
subsets A0 and A1 (see figure). Note that the signals in these sets are two-dimensional.

Each component ci , i = 1, 2, 3, 4 of the chosen codeword c determines the used sub-


set Aci , i.e. if ci = 0 then A0 is used, if ci = 1 we use A1 . Three binary dig-
its b3i+1 , b3i+2 , b3i+3 then determine which one of the (two-dimensional) signal points
a2i−1 , a2i from the subset Aci is transmitted. Note that N = 8, since four two-dimensional
signals are transmitted.
As before the elementary signal a = a1 , a2 , · · · , a8 is multiplied by d0 /2 to get the actual
signal s.

(a) What is the minimum Euclidean distance dmin now? What is the rate R N of this code
in bits per transmission? Give the asymptotic coding gain of this code (relative to the
baseline performance).
Part VI

Appendices

Appendix A

An upper bound for the Q-function

Note that the Q-function is defined as


    Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα.                                          (A.1)
The lemma below gives a useful upper bound for Q(x). In figure A.1 the Q-function and the
derived upper bound are plotted.

LEMMA A.1 For x ≥ 0 the Q-function is bounded as

    Q(x) ≤ (1/2) exp(−x²/2).                                                       (A.2)
Proof: Note that for α ≥ x, and since x ≥ 0,

    α² = (α − x)² + 2αx − x² ≥ (α − x)² + 2x² − x² = (α − x)² + x².                (A.3)

Therefore

    Q(x) = ∫_x^∞ (1/√(2π)) exp(−α²/2) dα
         ≤ ∫_x^∞ (1/√(2π)) exp(−((α − x)² + x²)/2) dα
         = exp(−x²/2) ∫_x^∞ (1/√(2π)) exp(−(α − x)²/2) dα
         = (1/2) exp(−x²/2).                                                       (A.4)
                                                                                    □


Figure A.1: The Q-function and the derived upper bound.
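A numerical comparison like the one in figure A.1 takes only a few lines; the sketch below (our own illustration) checks the bound of lemma A.1 on a grid of points.

```python
# Sketch: verify Q(x) <= 0.5*exp(-x^2/2) for x >= 0 (lemma A.1, cf. figure A.1).
import numpy as np
from scipy.stats import norm

x = np.linspace(0, 5, 501)
Q = norm.sf(x)                        # the Q-function
bound = 0.5 * np.exp(-x**2 / 2)
print(np.all(Q <= bound))             # True
print(Q[::100], bound[::100])         # compare a few sample points
```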


Appendix B

The Fourier transform

B.1 Definition
THEOREM B.1 If the signal x(t) satisfies the Dirichlet conditions, that is,

1. The signal x(t) is absolutely integrable on the real line, i.e. ∫_{−∞}^{∞} |x(t)| dt < ∞.

2. The number of maxima and minima of x(t) in any finite interval on the real line is finite.

3. The number of discontinuities of x(t) in any finite interval on the real line is finite.

4. When x(t) is discontinuous at t then x(t) = (x(t+) + x(t−))/2,

then the Fourier transform (or Fourier integral) of x(t), defined by

    X(f) = ∫_{−∞}^{∞} x(t) exp(−j2πft) dt,                                         (B.1)

exists and the original signal can be obtained from its Fourier transform by

    x(t) = ∫_{−∞}^{∞} X(f) exp(j2πft) df.                                          (B.2)

B.2 Some properties


B.2.1 Parseval’s relation
RESULT B.2 If the Fourier transforms of the signals f(t) and g(t) are denoted by F(f) and
G(f) respectively, then

    ∫_{−∞}^{∞} f(t)g(t) dt = ∫_{−∞}^{∞} F(f)G*(f) df.                              (B.3)


Proof: Note that for a real signal g(t) the Fourier transform G(f) satisfies G(−f) = G*(f).
Then

    ∫_{−∞}^{∞} f(t)g(t) dt = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} F(f) exp(j2πft) df ] [ ∫_{−∞}^{∞} G(f′) exp(j2πf′t) df′ ] dt
                           = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} F(f) exp(j2πft) df ] [ ∫_{−∞}^{∞} G*(f′) exp(−j2πf′t) df′ ] dt
                           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} F(f) G*(f′) [ ∫_{−∞}^{∞} exp(j2πt(f − f′)) dt ] df′ df.    (B.4)

We know that

    ∫_{−∞}^{∞} exp(j2πt(f − f′)) dt = δ(f − f′),                                    (B.5)

therefore

    ∫_{−∞}^{∞} f(t)g(t) dt = ∫_{−∞}^{∞} F(f) [ ∫_{−∞}^{∞} G*(f′) δ(f − f′) df′ ] df = ∫_{−∞}^{∞} F(f)G*(f) df.    (B.6)
                                                                                    □
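Relation (B.3) can also be checked numerically on sampled signals with the discrete Fourier transform. The sketch below is our own illustration; the two Gaussian-pulse test signals are arbitrary choices.

```python
# Sketch: numerical check of Parseval's relation (B.3) on sampled signals.
import numpy as np

dt = 1e-3
t = np.arange(-5, 5, dt)
f_t = np.exp(-t**2)                    # an arbitrary real test signal f(t)
g_t = np.exp(-(t - 0.5)**2 / 0.3)      # an arbitrary real test signal g(t)

# Continuous FT approximated by DFT: X(f) ~ dt * FFT, at frequencies fftfreq(n, dt)
F = dt * np.fft.fft(f_t)
G = dt * np.fft.fft(g_t)
df = 1.0 / (len(t) * dt)

lhs = np.sum(f_t * g_t) * dt
rhs = np.real(np.sum(F * np.conj(G)) * df)
print(lhs, rhs)                        # the two sides agree closely
```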
Appendix C

Impulse signal, filters

C.1 The impulse or delta signal


In the mathematical sense (see e.g. [17]) the delta signal δ(t) is not a function but a distribution or a
generalized function. A distribution is defined in terms of its effect on another function (usually
called a "test function"). The impulse distribution can be defined by its effect on the test function
f(t), which is supposed to be continuous at the origin, by the relation

    ∫_{−∞}^{∞} f(t)δ(t) dt = f(0).                                                 (C.1)

We call this property the sifting property of the impulse signal. Note that we defined the impulse
signal δ(t) by describing its action on the test function f(t) and not by specifying its value for
different values of t.

C.2 Linear time-invariant systems, filters


Definition C.1 A system L is linear if and only if for any two legitimate input signals x1 (t)
and x2 (t) and any two scalars α1 and α2 , the linear combination α1 x1 (t) + α2 x2 (t) is again a
legitimate input, and

L[α1 x1 (t) + α2 x2 (t)] = α1 L[x1 (t)] + α2 L[x2 (t)]. (C.2)

A system that does not satisfy this relation is nonlinear.

Definition C.2 The impulse response h(t) of a system L is the response of the system to an
impulse input δ(t), thus

    h(t) = L[δ(t)].                                                                (C.3)

The response of a time-invariant system to a unit impulse applied at time τ, i.e. δ(t − τ), is obviously h(t − τ).


Now we can determine the response of a linear time-invariant system L to an input signal
x(t) as follows:

    y(t) = L[x(t)]
         = L[ ∫_{−∞}^{∞} x(τ)δ(t − τ) dτ ]
         = ∫_{−∞}^{∞} x(τ) L[δ(t − τ)] dτ
         = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ.                                            (C.4)

The final integral is called the convolution of the signal x(t) and the impulse response h(t).
A linear time-invariant system is often called a filter.
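On sampled signals the convolution integral (C.4) becomes a discrete convolution scaled by the sample spacing. The sketch below is our own illustration; the sinusoidal input and the exponential impulse response are arbitrary examples.

```python
# Sketch: discrete approximation of y(t) = integral x(tau) h(t - tau) dtau.
import numpy as np

dt = 1e-3
t = np.arange(0, 5, dt)
x = np.sin(2 * np.pi * 1.0 * t)            # example input signal
h = np.exp(-t / 0.2)                        # example (causal) impulse response

y = np.convolve(x, h)[:len(t)] * dt         # filter output on the same time grid
print(y[:5])
```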
Appendix D

Correlation functions, power spectra

We will investigate here what the influence is of a filter with impulse response h(t) on the spec-
trum Sx ( f ) of the input random process X (t). Then we study the consequences of our findings.
Consider figure D.1. The filter produces the random process Y (t) at its output while the input
process is X (t).

Figure D.1: Filtering the noise process X(t): the input process X(t) passes through a filter with
impulse response h(t), producing the output process Y(t).

D.1 Expectation of an integral


For the expected value of the random process Y(t) we can write¹

    m_y(t) = E[Y(t)] = E[ ∫_{−∞}^{∞} X(α)h(t − α) dα ] = ∫_{−∞}^{∞} E[X(α)] h(t − α) dα.    (D.1)

The autocorrelation function of the random process is

    R_y(t, s) = E[Y(t)Y(s)] = E[ ∫_{−∞}^{∞} X(α)h(t − α) dα · ∫_{−∞}^{∞} X(β)h(s − β) dβ ]
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} E[X(α)X(β)] h(t − α) h(s − β) dα dβ
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_x(α, β) h(t − α) h(s − β) dα dβ.            (D.2)

¹ Note that we do not assume (yet) that the process X(t) is Gaussian.


D.2 Power spectrum


To study the effect of filtering a random process we assume that R_x(t, s) = R_x(τ) with τ =
t − s, i.e. the autocorrelation function R_x(t, s) of the input random process depends only on the
difference in time t − s between the sample times t and s. If this condition is satisfied we can
investigate the distribution of the mean power of a random process as a function of frequency.
Now

    R_y(t, s) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_x(α − β) h(t − α) h(s − β) dα dβ
              = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_x(t − s + µ − ν) h(ν) h(µ) dν dµ,           (D.3)

with ν = t − α and µ = s − β. Observe that now also R y (t, s) only depends on the time
difference τ = t − s hence R y (t, s) = R y (τ ).
Next consider the Fourier transforms S_x(f) and S_y(f) of R_x(τ) and R_y(τ) respectively (see
appendix B):

    S_x(f) = ∫_{−∞}^{∞} R_x(τ) exp(−j2πfτ) dτ   and   R_x(τ) = ∫_{−∞}^{∞} S_x(f) exp(j2πfτ) df,
    S_y(f) = ∫_{−∞}^{∞} R_y(τ) exp(−j2πfτ) dτ   and   R_y(τ) = ∫_{−∞}^{∞} S_y(f) exp(j2πfτ) df.    (D.4)

Note that these Fourier transforms can only be defined if the correlation functions depend only
on the time difference τ = t − s. Next we obtain

    R_y(τ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_x(τ + µ − ν) h(ν) h(µ) dν dµ
           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} S_x(f) exp(j2πf(τ + µ − ν)) df ] h(ν) h(µ) dν dµ
           = ∫_{−∞}^{∞} S_x(f) [ ∫_{−∞}^{∞} h(ν) exp(−j2πfν) dν ] [ ∫_{−∞}^{∞} h(µ) exp(j2πfµ) dµ ] exp(j2πfτ) df
           = ∫_{−∞}^{∞} S_x(f) H(f) H*(f) exp(j2πfτ) df,                            (D.5)

hence

    S_y(f) = S_x(f) H(f) H*(f) = S_x(f) |H(f)|².                                   (D.6)
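A quick simulation check of (D.6) is possible with discrete-time white noise through an FIR filter, for which E[Y²] = σ² Σ_n h[n]² by the discrete counterpart of (D.8). The sketch below is our own illustration; the filter coefficients are arbitrary.

```python
# Sketch: output power of filtered white noise, cf. S_y(f) = S_x(f)|H(f)|^2 and (D.8).
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 2.0
x = rng.normal(scale=np.sqrt(sigma2), size=1_000_000)    # white noise, S_x = sigma2
h = np.array([0.5, 1.0, -0.3, 0.2])                       # an arbitrary FIR impulse response

y = np.convolve(x, h, mode='valid')
print(y.var())                        # empirical output power
print(sigma2 * np.sum(h**2))          # sigma2 * integral |H(f)|^2 df (discrete Parseval)
```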

D.3 Interpretation
First note that the mean square value of the filter output process Y (t) is time-independent if
Rx (t, s) = Rx (t − s). This follows from

E[Y 2 (t)] = R y (t, t) = R y (t − t) = R y (0). (D.7)



Figure D.2: An ideal bandpass filter with passband f1 ≤ |f| ≤ f2.

Therefore R_y(0) is the expected value of the power that is dissipated in a resistor of 1 Ω connected
to the output of the filter, at any time instant. Next note that

    R_y(0) = ∫_{−∞}^{∞} S_y(f) df = ∫_{−∞}^{∞} S_x(f) |H(f)|² df.                   (D.8)

Consider a filter with transfer function H(f) shown in figure D.2. This is a bandpass filter and

    H(f) = 1   for f1 ≤ |f| ≤ f2,        H(f) = 0   elsewhere.                     (D.9)

Then we obtain

    R_y(0) = ∫_{−f2}^{−f1} S_y(f) df + ∫_{f1}^{f2} S_y(f) df.                       (D.10)
Now let f1 = f and f2 = f + Δf. The filter H(f) now only transfers components of X(t)
in the frequency band (f, f + Δf) and stops all other components. Since a power spectrum
is always even, see section D.6, the expected power at the output of the filter is approximately
2S_x(f)Δf. This implies that S_x(f) is the distribution of average power in the process X(t) over
the frequencies. Therefore we call S_x(f) the power spectral density function of X(t).

D.4 Wide-sense stationarity


Definition D.1 A random process Z (t) is called wide-sense stationary (WSS) if and only if

m z (t) = constant
and Rz (t, s) = Rz (t − s). (D.11)

For a WSS process Z(t) it is meaningful to consider its power spectral density S_z(f). For
a WSS process Y(t) the expected power can be expressed as in equation (D.7). When a WSS
process X(t) is the input of a filter then the filter output process Y(t) is also WSS and its power
spectral density S_y(f) is related to the input power spectral density S_x(f) as given by (D.6).
These consequences already justify the concept of wide-sense stationarity.

D.5 Gaussian processes


Recall that a Gaussian process Z (t) is completely specified by its mean m z (t) and autocorrelation
function Rz (t, s) for all t and s. Therefore a WSS (wide-sense stationary) Gaussian process Z (t)
is also (strict-sense) stationary.
Furthermore note that when a Gaussian process X (t) is the input of a filter then the filter
output process Y(t) is also Gaussian. Remember that the output process is WSS when the input
process is WSS.

D.6 Properties of Sx ( f ) and Rx (τ )


Let X (t) be a WSS random process again.
a) First observe that Rx (τ ) is a (real) even function of τ . This follows from

Rx (−τ ) = E[X (t − τ )X (t)] = E[X (t)X (t − τ )] = R x (τ ). (D.12)

b) Next we show that S_x(f) is a real even function of f. Since R_x(τ) is an even function and
sin(2πfτ) an odd function of τ,

    ∫_{−∞}^{∞} R_x(τ) sin(2πfτ) dτ = 0.                                            (D.13)

Therefore

    S_x(f) = ∫_{−∞}^{∞} R_x(τ) exp(−j2πfτ) dτ
           = ∫_{−∞}^{∞} R_x(τ) [cos(2πfτ) − j sin(2πfτ)] dτ
           = ∫_{−∞}^{∞} R_x(τ) cos(2πfτ) dτ,                                       (D.14)

which is even and real.


c) The power spectral density S_x(f) is a non-negative function of the frequency f. To see
why this statement is true suppose that S_x(f) is negative for f1 < |f| < f2. Then consider an
ideal bandpass filter H(f) as given by (D.9). The expected power of the output Y(t) of this filter
when the input is the random process X(t) is given by

    E[Y²(t)] = R_y(0) = 2 ∫_{f1}^{f2} S_x(f) df < 0,                                (D.15)

which yields a contradiction.


d) Without proof we give another property of Rx (τ ). It can be shown that

|Rx (τ )| ≤ Rx (0). (D.16)


Appendix E

Gram-Schmidt procedure, proof, example

We will only prove result 6.2 for signal sets with signals sm(t) ≢ 0 for all m ∈ M. The statement
then also holds for sets that do contain all-zero signals.

E.1 Proof of result 6.2


The proof is based on induction.

1. First we will show that the induction hypothesis holds for one signal. Therefore consider
   s1(t) and take

       ϕ1(t) = s1(t)/√(E_{s1})   with   E_{s1} = ∫_0^T s1²(t) dt.                  (E.1)

   Note that by doing so

       ∫_0^T ϕ1²(t) dt = (1/E_{s1}) ∫_0^T s1²(t) dt = 1,                           (E.2)

   i.e. ϕ1(t) has unit energy (is normal). Since s1(t) = √(E_{s1}) ϕ1(t) the coefficient s11 = √(E_{s1}).

2. Now suppose that the induction hypothesis holds for m − 1 ≥ 1 signals. Thus there exists an
   orthonormal basis ϕ1(t), ϕ2(t), · · ·, ϕ_{n−1}(t) for the signals s1(t), s2(t), · · ·, s_{m−1}(t) with
   n − 1 ≤ m − 1. Now consider the next signal sm(t) and an auxiliary signal θm(t) which is
   defined as follows:

       θm(t) = sm(t) − Σ_{i=1,n−1} smi ϕi(t)                                       (E.3)

   with

       smi = ∫_0^T sm(t) ϕi(t) dt   for i = 1, · · ·, n − 1.                        (E.4)

   We can distinguish between two cases now:

   (a) If θm(t) ≡ 0 then sm(t) = Σ_{i=1,n−1} smi ϕi(t) and the induction hypothesis also holds
       for m signals in this case.


   (b) If on the other hand θm(t) ≢ 0 then take a new building-block waveform

           ϕn(t) = θm(t)/√(E_{θm})   with   E_{θm} = ∫_0^T θm²(t) dt.              (E.5)

       By doing so

           ∫_0^T ϕn²(t) dt = (1/E_{θm}) ∫_0^T θm²(t) dt = 1,                       (E.6)

       i.e. also ϕn(t) has unit energy. Moreover for all i = 1, · · ·, n − 1

           ∫_0^T ϕn(t)ϕi(t) dt = (1/√(E_{θm})) ∫_0^T θm(t)ϕi(t) dt
                               = (1/√(E_{θm})) [ ∫_0^T sm(t)ϕi(t) dt − Σ_{j=1,n−1} smj ∫_0^T ϕj(t)ϕi(t) dt ]
                               = (1/√(E_{θm})) [ smi − Σ_{j=1,n−1} smj δji ] = 0,   (E.7)

       and therefore ϕn(t) is orthogonal to all ϕi(t).


       Note that now

           sm(t) = θm(t) + Σ_{i=1,n−1} smi ϕi(t)
                 = √(E_{θm}) ϕn(t) + Σ_{i=1,n−1} smi ϕi(t) = Σ_{i=1,n} smi ϕi(t)   (E.8)

       with smn = √(E_{θm}). Thus also in this case for m signals there exists an orthonormal
       basis with n ≤ m building-block waveforms. Therefore the induction hypothesis also
       holds for m signals.

It should be noted that, when using the Gram-Schmidt procedure, any ordering of the
signals other than s1 (t), s2 (t), · · · , s|M| (t) will yield a basis, i.e. a set of building-block
waveforms, of the smallest possible dimensionality (see exercise 1 in chapter 6), however
in general with different building-block waveforms.

E.2 An example
We will next discuss an example in which we actually carry out the Gram-Schmidt procedure for
a given set of waveforms.

Figure E.1: An example of the Gram-Schmidt procedure: the waveforms s1(t), s2(t), s3(t), the
building-block waveforms ϕ1(t), ϕ2(t), and the auxiliary signals θ2(t) and θ3(t) ≡ 0, all on 0 ≤ t ≤ 3.

Example E.1 Consider the three waveforms sm(t), m = 1, 2, 3, shown in figure E.1. Note that T = 3.
We first determine the energy E1 of the first signal s1(t):

    E1 = 4 + 4 + 4 = 12.                                                           (E.9)

Therefore the first building-block waveform ϕ1(t) is

    ϕ1(t) = s1(t)/√(E1) = s1(t)/(2√3),   and thus s11 = 2√3.                        (E.10)

Now we continue with the second signal s2(t). First we determine the projection s21 of s2(t) on the first
dimension, i.e. the dimension that corresponds to ϕ1(t):

    s21 = ∫_0^T s2(t)ϕ1(t) dt = (1/√3) · (−1 − 3 + 1) = −√3,                        (E.11)

Figure E.2: A vector representation of the signals sm(t), m = 1, 2, 3, of figure E.1 in the
(ϕ1, ϕ2)-plane: s1 = (2√3, 0), s2 = (−√3, 2√2), s3 = (−√3, −2√2).

hence the auxiliary signal θ2(t) is now

    θ2(t) = s2(t) + √3 ϕ1(t).                                                      (E.12)

Next we determine the energy E_{θ2} of the auxiliary signal θ2(t):

    E_{θ2} = 4 + 4 = 8,                                                            (E.13)

and the second building-block waveform ϕ2(t) is

    ϕ2(t) = θ2(t)/√(E_{θ2}) = θ2(t)/(2√2).                                          (E.14)

Now we obtain for the signal s2(t):

    s2(t) = θ2(t) − √3 ϕ1(t) = 2√2 ϕ2(t) − √3 ϕ1(t),   thus s22 = 2√2.              (E.15)
For the third signal s3(t) we can now determine the projections s31 and s32 and the auxiliary signal θ3(t):

    s31 = ∫_0^T s3(t)ϕ1(t) dt = (1/√3) · (−1 + 1 − 3) = −√3,
    s32 = ∫_0^T s3(t)ϕ2(t) dt = (1/√2) · (−1 − 3) = −2√2,   hence
    θ3(t) = s3(t) + √3 ϕ1(t) + 2√2 ϕ2(t) ≡ 0.                                       (E.16)

Since this auxiliary signal is always 0 we can express the third signal s3(t) in terms of ϕ1(t) and ϕ2(t) as
follows:

    s3(t) = −√3 ϕ1(t) − 2√2 ϕ2(t),                                                  (E.17)

hence the coefficients corresponding to s3(t) are

    s31 = −√3   and   s32 = −2√2.                                                   (E.18)

We have now obtained three vectors of coefficients, one for each signal:

    s1 = (s11, s12) = (2√3, 0),
    s2 = (s21, s22) = (−√3, 2√2),   and
    s3 = (s31, s32) = (−√3, −2√2).                                                  (E.19)

These vectors are depicted in figure E.2.
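The procedure of section E.1 is easy to implement for waveforms represented by vectors of samples. The sketch below is our own illustration; the two piecewise-constant test waveforms are made up and are not the waveforms of figure E.1.

```python
# Sketch: Gram-Schmidt orthonormalization of sampled waveforms (section E.1).
import numpy as np

def gram_schmidt(signals, dt):
    """Return building-block waveforms phi_i and the coefficient vectors s_m."""
    basis, coeffs = [], []
    for s in signals:
        proj = [np.sum(s * phi) * dt for phi in basis]     # s_{mi} = int s_m(t) phi_i(t) dt
        theta = s - sum(c * phi for c, phi in zip(proj, basis))
        energy = np.sum(theta**2) * dt                     # E_theta
        if energy > 1e-12:                                 # theta not identically zero
            basis.append(theta / np.sqrt(energy))
            proj.append(np.sqrt(energy))                   # s_{mn} = sqrt(E_theta)
        coeffs.append(proj)
    return basis, coeffs

dt = 1e-3
t = np.arange(0, 3, dt)
s1 = np.where(t < 1.5, 2.0, -2.0)          # a made-up piecewise-constant waveform
s2 = np.where(t < 1.0, 1.0, 3.0)           # another made-up waveform
basis, coeffs = gram_schmidt([s1, s2], dt)
print(len(basis), coeffs)                  # 2 basis waveforms and the coefficient lists
```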


Appendix F

Schwarz inequality

LEMMA F.1 (Schwarz inequality) For two finite-energy waveforms a(t) and b(t) the inequality

    ( ∫_{−∞}^{∞} a(t)b(t) dt )² ≤ ∫_{−∞}^{∞} a²(t) dt · ∫_{−∞}^{∞} b²(t) dt          (F.1)

holds. Equality is obtained only if b(t) ≡ Ca(t) for some constant C.
Proof: Form an orthonormal expansion for a(t) and b(t), i.e.

    a(t) = a1 φ1(t) + a2 φ2(t),
    b(t) = b1 φ1(t) + b2 φ2(t),                                                    (F.2)

with ∫_{−∞}^{∞} φi(t)φj(t) dt = δij for i = 1, 2 and j = 1, 2.
Then with a = (a1, a2) and b = (b1, b2) and the Parseval results (a · b) = ∫_{−∞}^{∞} a(t)b(t) dt,
(a · a) = ∫_{−∞}^{∞} a²(t) dt, and (b · b) = ∫_{−∞}^{∞} b²(t) dt, we find for the vectors a and b that

    (a · b) / (‖a‖‖b‖) = ∫_{−∞}^{∞} a(t)b(t) dt / ( √(∫_{−∞}^{∞} a²(t) dt) · √(∫_{−∞}^{∞} b²(t) dt) ).    (F.3)

If we can now prove that (a · b)² ≤ ‖a‖²‖b‖² we obtain the Schwarz inequality.
We start by stating the triangle inequality for a and b:

    ‖a + b‖ ≤ ‖a‖ + ‖b‖.                                                           (F.4)

Moreover

    ‖a + b‖² = (a + b) · (a + b) = ‖a‖² + ‖b‖² + 2(a · b),                          (F.5)

hence

    ‖a‖² + ‖b‖² + 2(a · b) ≤ ‖a‖² + ‖b‖² + 2‖a‖‖b‖,                                 (F.6)

and therefore

    (a · b) ≤ ‖a‖‖b‖.                                                              (F.7)


From this it also follows that

    −(a · b) = (a · (−b)) ≤ ‖a‖‖−b‖ = ‖a‖‖b‖,                                       (F.8)

or

    (a · b) ≥ −‖a‖‖b‖.                                                             (F.9)

We therefore may conclude that −‖a‖‖b‖ ≤ (a · b) ≤ ‖a‖‖b‖, or (a · b)² ≤ ‖a‖²‖b‖², and this
proves the Schwarz inequality.
Equality in the Schwarz inequality is achieved only when equality in the triangle inequality is
obtained. This is the case when (first part of the proof) b = Ca or (second part) −b = Ca for
some non-negative C. □
Appendix G

Bound on the error probability of orthogonal signaling

Note that the probability of a correct decision satisfies

    PC = ∫_{−∞}^{∞} p(µ − b) (1 − Q(µ))^{|M|−1} dµ,                                 (G.1)

where b = √(2E_s/N0), hence

    PE = 1 − PC = ∫_{−∞}^{∞} p(µ − b) [1 − (1 − Q(µ))^{|M|−1}] dµ.                  (G.2)

The term in square brackets is the probability that at least one of |M| − 1 noise components
exceeds µ. By the union bound this probability is not larger than the sum of the probabilities that
individual components exceed µ, thus

    1 − (1 − Q(µ))^{|M|−1} ≤ (|M| − 1)Q(µ) ≤ |M|Q(µ).                               (G.3)

Moreover, since this term is a probability we can also write

    1 − (1 − Q(µ))^{|M|−1} ≤ 1.                                                     (G.4)

When µ is small Q(µ) is large and then the unity bound is tight. On the other hand for large µ
the bound |M|Q(µ) is tighter. Therefore we split the integration range into two parts, (−∞, a)
and (a, ∞):

    PE ≤ ∫_{−∞}^{a} p(µ − b) dµ + |M| ∫_{a}^{∞} p(µ − b) Q(µ) dµ.                   (G.5)

If we take a ≥ 0 we can use the upper bound Q(µ) ≤ exp(−µ²/2), see the proof in appendix
A, and we obtain

    PE ≤ ∫_{−∞}^{a} p(µ − b) dµ + |M| ∫_{a}^{∞} p(µ − b) exp(−µ²/2) dµ.             (G.6)


If we denote the first integral by P1 and the second by P2 , we get

PE ≤ P1 + |M|P2 . (G.7)

To minimize the bound we choose a such that the derivative of (G.6) with respect to a is equal
to 0, i.e.

    0 = d/da [P1 + |M|P2] = p(a − b) − |M| p(a − b) exp(−a²/2).                     (G.8)

This results in

    exp(a²/2) = |M|.                                                               (G.9)

Therefore the value a = √(2 ln|M|) achieves a (at least local) minimum of PE. Note that a ≥ 0
as was required. Now we can consider the separate terms. For the first term we can write

    P1 = ∫_{−∞}^{a} (1/√(2π)) exp(−(µ − b)²/2) dµ
       = ∫_{−∞}^{a−b} (1/√(2π)) exp(−γ²/2) dγ
       = Q(b − a)
       ≤ exp(−(b − a)²/2)   if 0 ≤ a ≤ b.   (∗)                                     (G.10)
Here the inequality (∗) follows from appendix A. For the second term we get
    P2 = ∫_{a}^{∞} (1/√(2π)) exp(−(µ − b)²/2) exp(−µ²/2) dµ
       = exp(−b²/4) ∫_{a}^{∞} (1/√(2π)) exp(−(µ − b/2)²) dµ   (∗∗)
       = exp(−b²/4) (1/√2) ∫_{a}^{∞} (1/√(2π)) exp(−(µ√2 − (b/2)√2)²/2) d(µ√2)
       = exp(−b²/4) (1/√2) ∫_{a√2}^{∞} (1/√(2π)) exp(−(γ − (b/2)√2)²/2) dγ
       = exp(−b²/4) (1/√2) Q(a√2 − (b/2)√2).                                        (G.11)

Here the equality (∗∗) follows from

    (µ − b)²/2 + µ²/2 = µ²/2 − µb + b²/2 + µ²/2
                      = µ² − µb + b²/4 + b²/4
                      = (µ − b/2)² + b²/4.                                          (G.12)
From (G.11) we get

    P2 ≤ exp(−b²/4)                     for 0 ≤ a ≤ b/2,
    P2 ≤ exp(−b²/4 − (a − b/2)²)        for a ≥ b/2.                                (G.13)

If we now collect the bounds (G.10) and (G.13) and substitute the optimal value of a found in
(G.9) we get

    PE ≤ exp(−(b − a)²/2) + exp(a²/2) exp(−b²/4)                   for 0 ≤ a < b/2,
    PE ≤ exp(−(b − a)²/2) + exp(a²/2) exp(−b²/4 − (a − b/2)²)      for b/2 ≤ a ≤ b.  (G.14)

Now we note that

    (a − b)²/2 − ( b²/4 − a²/2 ) = (a − b/2)² ≥ 0.                                  (G.15)
Therefore for 0 ≤ a < b/2 the first term is not larger than the second one while for b/2 ≤ a ≤ b
both terms are equal. This leads to

    PE ≤ 2 exp(−b²/4 + a²/2)        for 0 ≤ a < b/2,
    PE ≤ 2 exp(−(b − a)²/2)         for b/2 ≤ a ≤ b.                                (G.16)

Now we rewrite both a and b as

    a = √(2 ln|M|) = √(2 ln 2 · log2|M|),
    b = √(2E_s/N0) = √(2E_b/N0 · log2|M|),                                          (G.17)

where we used the following definition for E_b, i.e. the energy per transmitted bit of information:

    E_b = E_s / log2|M|.                                                            (G.18)

This finally leads to

    PE ≤ 2 exp( −log2|M| [ E_b/(2N0) − ln 2 ] )               for E_b/N0 ≥ 4 ln 2,
    PE ≤ 2 exp( −log2|M| [ √(E_b/N0) − √(ln 2) ]² )           for ln 2 ≤ E_b/N0 ≤ 4 ln 2.    (G.19)
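Bound (G.19) shows that PE can be driven to zero by increasing |M| whenever E_b/N0 > ln 2. The small sketch below (our own illustration) evaluates the bound for a few values of |M|.

```python
# Sketch: evaluate the upper bound (G.19) on P_E for orthogonal signaling.
import numpy as np

def pe_bound(M, EbN0):
    k = np.log2(M)
    if EbN0 >= 4 * np.log(2):
        return 2 * np.exp(-k * (EbN0 / 2 - np.log(2)))
    if EbN0 >= np.log(2):
        return 2 * np.exp(-k * (np.sqrt(EbN0) - np.sqrt(np.log(2)))**2)
    return None                      # bound not valid below ln 2 (about -1.6 dB)

for M in (2**4, 2**8, 2**16):
    print(M, pe_bound(M, EbN0=2.0))  # Eb/N0 = 2 (about 3 dB): the bound shrinks with |M|
```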
Appendix H

Decibels

Definition H.1 An energy ratio X > 0 can be expressed in decibels as follows:


    X|in dB = 10 log10 X.                                                          (H.1)

It is easy to work with decibels since by doing so we transform multiplications into additions.
Table H.1 shows for some energy ratios X the equivalent ratio in dB.

X X |in dB
0.01 -20.0 dB
0.1 -10.0 dB
1.0 0.0 dB
2.0 3.0 dB
3.0 4.8 dB
5.0 7.0 dB
10.0 10.0 dB
100.0 20.0 dB
Table H.1: Decibels.

Note that 1 dB = 0.1 Bel.

Appendix I

Elementary energy E_pam of an |A|-PAM constellation

The elementary energy E_pam of an |A|-PAM constellation was defined as

    E_pam = (1/|A|) Σ_{s=−|A|+1,−|A|+3,··· ,|A|−1} s².                              (I.1)

We want to express E_pam in terms of the number of symbols |A|. We will only consider the case
where |A| is even. A similar treatment can be given for odd |A|. For even |A|

    E_pam = (2/|A|) Σ_{k=1,··· ,|A|/2} (2k − 1)².                                   (I.2)

By induction we will show that now

    Σ_{k=1,··· ,H} (2k − 1)² = (4H³ − H)/3.                                         (I.3)

Note that the induction hypothesis holds for H = 1. Next suppose that it holds for H = h ≥ 1.
Now it turns out that

    Σ_{k=1,··· ,h+1} (2k − 1)² = Σ_{k=1,··· ,h} (2k − 1)² + (2h + 1)²
                               = (4h³ − h)/3 + (2h + 1)²
                               = (4h³ − h + 12h² + 12h + 3)/3
                               = (4h³ + 12h² + 12h + 4 − h − 1)/3 = (4(h + 1)³ − (h + 1))/3,    (I.4)


and thus the induction hypothesis also holds for h + 1 and thus for all H. Therefore we obtain

    E_pam = (2/|A|) Σ_{k=1,··· ,|A|/2} (2k − 1)² = (2/|A|) · (4(|A|/2)³ − |A|/2)/3 = (|A|² − 1)/3.    (I.5)
Bibliography

[1] A.R. Calderbank, T.A. Lee, and J.E. Mazo, “Baseband Trellis Codes with a Spectral Null at
Zero,” IEEE Trans. Inform. Theory, vol. IT-34, pp. 425-434, May 1988.

[2] G.A. Campbell, U.S. Patent 1,227,113, May 22, 1917, “Basic Types of Electric Wave Fil-
ters.”

[3] J.R. Carson, “Notes on the Theory of Modulation,” Proc. IRE, vol. 10, pp. 57-64, Febru-
ary 1922. Reprinted in Proc. IEEE, Vol. 51, pp. 893-896, June 1963.

[4] T.M. Cover, “Enumerative Source Encoding,” IEEE Trans. Inform. Theory, vol. IT-19, pp.
73-77, Jan. 1973.

[5] G.D. Forney, Jr., R.G. Gallager, G.R. Lang, F.M. Longstaff, and S.U. Quereshi, “Efficient
Modulation for Band-Limited Channels,” IEEE Journ. Select. Areas Comm., vol. SAC-2, pp.
632-647, Sept. 1984.

[6] G.D. Forney, Jr., and G. Ungerboeck, “Modulation and Coding for Linear Gaussian Chan-
nels,” IEEE Trans. Inform. Theory, vol. 44, pp. 2384 - 2415, October 1998.

[7] R.G. Gallager, Information Theory and Reliable Communication, Wiley and Sons, 1968.

[8] R.V.L. Hartley, “Transmission of Information,” Bell Syst. Tech. J., vol. 7, pp. 535 - 563, July
1928.

[9] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, 1995.

[10] S. Haykin, Communication Systems. John Wiley & Sons, 4th edition, 2000.

[11] V.A. Kotelnikov, The Theory of Optimum Noise Immunity. Dover Publications, 1960.

[12] R. Laroia, N. Farvardin, S. Tretter, “On Optimal Shaping of Multidimensional Constella-


tions”, submitted to IEEE Trans. on Inform. Theory, vol. IT-40, pp. 1044-1056, July 1994.

[13] E.A. Lee and D.G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic,
1994.


[14] D.O. North, Analysis of the Factors which determine Signal/Noise Discrimination in Radar.
RCA Laboratories, Princeton, Technical Report, PTR-6C, June 1943. Reprinted in Proc.
IEEE, vol. 51, July 1963.

[15] H. Nyquist, “Certain factors affecting telegraph speed,” Bell Syst. Tech. J., vol. 3, pp. 324 -
346, April 1924.

[16] B.M. Oliver, J.R. Pierce, and C.E. Shannon, “The Philosophy of PCM,” Proc. IRE, vol. 36,
pp.1324-1331, Nov. 1948.

[17] J.G. Proakis and M. Salehi, Communication Systems Engineering. Prentice Hall, 1994.

[18] J.G. Proakis, Digital Communications. McGraw-Hill, 4th Edition, 2001.

[19] J.P.M. Schalkwijk, “An Algorithm for Source Coding,” IEEE Trans. Inform. Theory, vol.
IT-18, pp. 395-399, May 1972.

[20] C.E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Tech. J., vol. 27, pp.
379 - 423 and 623 - 656, July and October 1948. Reprinted in the Key Papers on Information
Theory.

[21] C.E. Shannon, “Communication in the Presence of Noise,” Proc. IRE, vol. 37, pp. 10 - 21,
January 1949. Reprinted in the Key Papers on Information Theory.

[22] G. Ungerboeck, “Channel Coding with Multilevel/Phase Signals,” IEEE Trans. Inform.
Theory, vol. IT-28, pp. 55-67, Jan. 1982.

[23] S. Verdu, Multi-user detection, Cambridge, 1998.

[24] J.H. Van Vleck and D. Middleton, “A Theoretical Comparison of Visual, Aural, and Meter
Reception of Pulsed Signals in the Presence of Noise,” Journal of Applied Physics, vol. 17,
pp. 940-971, November 1946.

[25] J.M. Wozencraft and I.M. Jacobs, Principles of Communication Engineering, Wiley, New
York, 1965.

[26] S.T.M. Ackermans and J.H. van Lint, Algebra en Analyse, Academic Service, Den Haag,
1976.