Capacity Limits of Optical Fiber Networks

Ren-Jean Essiambre, Senior Member, IEEE, Fellow, OSA, Gerhard Kramer, Fellow, IEEE,
Peter J. Winzer, Fellow, IEEE, Gerard J. Foschini, Fellow, IEEE, and Bernhard Goebel, Student Member, IEEE
(Invited Paper)

Abstract: We describe a method to estimate the capacity limit of fiber-optic communication systems (or fiber channels) based
on information theory. This paper is divided into two parts. Part
1 reviews fundamental concepts of digital communications and
information theory. We treat digitization and modulation followed by information theory for channels both without and with
memory. We provide explicit relationships between the commonly
used signal-to-noise ratio and the optical signal-to-noise ratio. We
further evaluate the performance of modulation constellations
such as quadrature-amplitude modulation, combinations of amplitude-shift keying and phase-shift keying, exotic constellations,
and concentric rings for an additive white Gaussian noise channel
using coherent detection. Part 2 is devoted specifically to the fiber
channel. We review the physical phenomena present in transmission over optical fiber networks, including sources of noise, the
need for optical filtering in optically-routed networks, and, most
critically, the presence of fiber Kerr nonlinearity. We describe
various transmission scenarios and impairment mitigation techniques, and define a fiber channel deemed to be the most relevant
for communication over optically-routed networks. We proceed to
evaluate a capacity limit estimate for this fiber channel using ring
constellations. Several scenarios are considered, including uniform
and optimized ring constellations, different fiber dispersion maps,
and varying transmission distances. We further present evidence
that point to the physical origin of the fiber capacity limitations
and provide a comparison of recent record experiments with our
capacity limit estimation.
Index Terms: Amplified spontaneous emission, Brillouin scattering, channel coding, detection, fiber nonlinearity, information
rates, information theory, modulation, noise, optical networks,
Raman scattering.




Determining an ultimate limit to the rate at which one can reliably transmit information over a physical medium in a given environment is an endeavor having both fundamental and practical interests [1]-[4]. Such a limit is referred to as the channel capacity, and the process of evaluating this limit leads to a better understanding of the technologies needed to approach it.
Capacity evaluations require information theory [1] that must be adapted to the specific characteristics of the channel under study. A pragmatic approach is to define the channel as that part of a communication system that the designer is unable or unwilling to change [5], [6]. Using this approach, capacity evaluations have been performed for a variety of physical media such as twisted-pair copper cables [7]-[10], coaxial cables [9], [11], wireless [12]-[15], and satellite communications [16]-[18]. The goal of Part 1 of this paper is to introduce basic concepts for evaluating channel capacities. We refer to these tools in the second part of the paper where we evaluate the capacity of the fiber channel.
A. Spectra and Sampling
Information is usually transmitted using electromagnetic waves over a physical medium (copper wires, coaxial cable, atmosphere, space, etc.). One often represents such waves by real, analog signals $x(t)$ referred to as waveforms. Suppose $x(t)$ is bandlimited to $W$ Hz, i.e., the support of its Fourier transform, or spectrum, $X(f)$ is within the frequency set $[-W, W]$ as shown in Fig. 1(a). We can then represent $x(t)$ by regularly spaced signal samples $x(k/(2W))$ taken at the Nyquist rate of $2W$ samples per second [1], [19]-[21]. The signal may be reconstructed from the samples by multiplying the samples by a sinc function defined as $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$, i.e., we have (see Fig. 2)
$$x(t) = \sum_{k=-\infty}^{\infty} x\!\left(\frac{k}{2W}\right) \mathrm{sinc}(2Wt - k). \tag{1}$$

24, 2010. This work was supported by the Defense Advanced Research Projects
Agency under Grant HR0011-06-C-0098.
R.-J. Essiambre, G. J. Foschini, and P. J. Winzer are with Bell Laboratories, Alcatel-Lucent, Holmdel, NJ 07733 USA.
G. Kramer was with Bell Laboratories, Alcatel-Lucent, Murray Hill, NJ
07974 USA. He is now with the Department of Electrical Engineering, University of Southern California, Los Angeles CA 90089-2565 USA.
B. Goebel is with the Institute for Communications Engineering (LNT), Technische Universität München, D-80290 Munich, Germany.
Color versions of one or more of the figures in this paper are available online
Digital Object Identifier 10.1109/JLT.2009.2039464

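Fig. 1. Spectral amplitudes of baseband and passband signals. (a) Real baseband signal spectrum. (b) Complex signal spectrum (single-sideband spectrum, real and imaginary parts). (c) Complex baseband signal spectrum (real and imaginary parts). (d) Real passband signal spectrum (cosine and sine parts).

Fig. 2. Two adjacent sinc pulses. The value of a sinc pulse at regular sampling instants is zero except at one sampling instant for each pulse. This property makes sinc pulses free from intersymbol interference (ISI) between symbols at these sampling instants.

Another representation of the real signal $x(t)$ follows because its spectrum satisfies $X(-f) = X^*(f)$, where $X^*(f)$ is the complex conjugate of $X(f)$. In other words, the negative frequency components of $X(f)$ are redundant for real signals. Thus, we may represent $x(t)$ by using $X(f)$ at positive frequencies only, as shown in Fig. 1(b). This spectrum is referred to as the single sideband (SSB) version of $X(f)$.
Suppose next that we are interested in a passband signal $x(t)$ bandlimited to $W$ Hz, i.e., $X(f)$ is within the frequency set $[-f_c - W/2, -f_c + W/2] \cup [f_c - W/2, f_c + W/2]$, where $f_c$ is the carrier frequency and $f_c > W/2$. We can write
$$x(t) = \sqrt{2}\, x_c(t) \cos(2\pi f_c t) - \sqrt{2}\, x_s(t) \sin(2\pi f_c t) \tag{2}$$
where $x_c(t)$ and $x_s(t)$ are real baseband signals bandlimited to $W/2$ Hz. The factor $\sqrt{2}$ normalizes the energy of the $\cos(\cdot)$ and $\sin(\cdot)$ functions. We may represent (2) by defining a complex baseband signal
$$x_b(t) = x_c(t) + j\, x_s(t) \tag{3}$$
and writing
$$x(t) = \sqrt{2}\, \mathrm{Re}\left\{ x_b(t)\, e^{j 2\pi f_c t} \right\} \tag{4}$$
where $\mathrm{Re}\{z\}$ is the real part of $z$ and $j = \sqrt{-1}$. The real signals $x_c(t)$ and $x_s(t)$ are sometimes referred to as in-phase (I) and quadrature (Q) components. The signal $x_b(t)$ is sometimes referred to as the complex envelope of $x(t)$.
The complex signal $x_b(t)$ has spectral support $[-W/2, W/2]$ as shown in Fig. 1(c), while the signal $x(t)$ has spectral support $[-f_c - W/2, -f_c + W/2] \cup [f_c - W/2, f_c + W/2]$ as shown in Fig. 1(d). We again say that $x(t)$ has bandwidth $W$ Hz. We can sample $x_c(t)$ and $x_s(t)$ at their Nyquist rate of $W$ samples per second and these samples $x_c(k/W)$ and $x_s(k/W)$ may be represented as complex samples $x_k = x_b(k/W)$ of the signal $x_b(t)$. The signal $x(t)$ is reconstructed from the complex samples as follows [19, see Fig. 2], [13, Ch. 2], [26, Ch. 6], and [27, Ch. 8]:
$$x(t) = \sqrt{2}\, \mathrm{Re}\left\{ \sum_{k=-\infty}^{\infty} x_k\, \mathrm{sinc}(Wt - k)\, e^{j 2\pi f_c t} \right\}. \tag{5}$$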
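The sinc interpolation behind (1) and (5) can be checked numerically. The following is a minimal sketch for the baseband case (1), assuming a hypothetical test waveform $\mathrm{sinc}^2(Wt)$ (whose spectrum is confined to $[-W, W]$), an illustrative bandwidth `W = 1.0`, and a truncation of the infinite sum to `K = 200` terms on each side; none of these choices come from the paper.

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc, sin(pi u)/(pi u), with sinc(0) = 1."""
    if u == 0.0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples: dict, W: float, t: float) -> float:
    """Evaluate a truncated version of (1): sum_k x(k/(2W)) sinc(2Wt - k)."""
    return sum(x_k * sinc(2.0 * W * t - k) for k, x_k in samples.items())

W = 1.0  # bandwidth in Hz (illustrative value, an assumption)

def x(t: float) -> float:
    """Hypothetical test waveform: sinc^2(Wt) is bandlimited to W Hz."""
    return sinc(W * t) ** 2

K = 200  # truncation of the infinite sum (assumption)
samples = {k: x(k / (2.0 * W)) for k in range(-K, K + 1)}

t_test = 0.3  # a time instant between two sampling instants
error = abs(reconstruct(samples, W, t_test) - x(t_test))
print(f"reconstruction error at t = {t_test}: {error:.2e}")
```

The residual error comes only from truncating the sum; at the sampling instants themselves every sinc pulse but one vanishes, which is the ISI-free property illustrated in Fig. 2.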
The reader might wonder why (1) and (5) have the sampling rates $2W$ and $W$, respectively. In fact, the sampling in (5) also represents a sampling rate of $2W$ real samples per second since the samples in (5) are complex numbers. Mathematically, the passband signal of (2) is projected onto the signals


, rather than the signals

as is done for passband signals [13, Ch. 2], [26, Ch. 6], and [27, Ch. 7]. The difference in the representations of (1) and (5) can be seen by considering the baseband signal of (1) to be a passband signal at the carrier frequency $f_c = W/2$, assuming now that $f_c = W/2$ is permitted. In both representations (1) and (5), a sampling rate of $2W$ real samples per second is necessary and sufficient.

Fig. 3. Symbol rate $R_s$, spectral support $W$, and wavelength-division multiplexing (WDM) channel bandwidth $B$ for a transmit pulse with a square-root raised-cosine spectrum.

One sometimes encounters an alternative way of viewing passband signals, related to SSB modulation. For instance, we can generate a passband signal with the spectrum shown in Fig. 1(d) corresponding to a real-valued baseband signal $x(t)$ with the spectrum shown in Fig. 1(a) by using one of the following approaches.
1) Strip off one sideband of $x(t)$ by using a Hilbert filter [26, p. 200], [27, Ch. 7] to generate a complex signal with the spectral support shown in Fig. 1(b). Now frequency up-convert the signal by $f_c - W/2$ Hz and transmit the real part of the resulting signal.
2) Modulate $x(t)$ with a carrier, then eliminate one sideband by using a bandpass filter.
We observe that SSB modulation, being effectively an alternative way of generating passband signals, does not gain capacity over dual sideband modulation with complex baseband signals, as described above.
We remark that we distinguish between the signal spectral support $W$, the frequency bandwidth $B$ assigned to the signal within the optical network, and the symbol rate $R_s$ at which one is modulating the transmit pulse. The symbol rate is $R_s = 1/T_s$, where $T_s$ is the symbol period. For example, consider Fig. 3 and
suppose we use passband communication, a transmit pulse with
a square-root raised-cosine spectrum [25], [23] and a roll-off
, and that is 20% larger than . In this case,
we have
. In general, we consider $R_s \le W \le B$.


Fig. 4. Schematic representation of the main functions of a digital communication channel. The lighter boxes represent the source compression and decompression
functions while the darker boxes represent the communication channel and the related coding and modulation functions. The thicker arrows represent analog

B. Transmitter: Digital-to-Analog
The discrete-time signal representations described earlier are eminently practical: to generate a passband signal $x(t)$ that is bandlimited to $W$ Hz one may generate $W$ complex symbols per second, multiply each symbol by a pulse that is bandlimited to $W$ Hz, and transmit these pulses in sequence. In other words,
signal generation may be separated into two distinct parts: modulation defined by a discrete set of values called the modulation alphabet or constellation, and pulse shaping to create
the pulse waveforms [3]. The size of the constellation determines the maximum information that each symbol can carry,
while pulse shaping affects the spectral width occupied by the
signal. The pulse shaping can be synthesized digitally using a
digital-to-analog converter (DAC) [28], [29]. Constellations are
treated in Section IV, so we consider pulse shaping next.
The pulse shape may be chosen so that there is no ISI between successive symbols. For instance, the sinc pulse in Fig. 2 is zero for all sampling instants $t = kT_s$ except for $k = 0$, where $T_s$ is the symbol period. However, the sinc pulse amplitude decays slowly (as $1/|t|$) so that there is significant ISI if the sampling is imperfect. Another commonly used pulse shape has the raised-cosine spectrum [26, Ch. 6]
$$P(f) = \begin{cases} T_s, & 0 \le |f| \le \dfrac{1-\beta}{2T_s} \\ \dfrac{T_s}{2}\left[1 + \cos\!\left(\dfrac{\pi T_s}{\beta}\left(|f| - \dfrac{1-\beta}{2T_s}\right)\right)\right], & \dfrac{1-\beta}{2T_s} < |f| \le \dfrac{1+\beta}{2T_s} \\ 0, & |f| > \dfrac{1+\beta}{2T_s} \end{cases} \tag{6}$$
where $0 \le \beta \le 1$ is the roll-off factor. The pulse shape corresponding to (6) is
$$p(t) = \mathrm{sinc}\!\left(\frac{t}{T_s}\right) \frac{\cos(\pi \beta t / T_s)}{1 - 4\beta^2 t^2 / T_s^2}. \tag{7}$$
Clearly, we have $p(kT_s) = 0$ for all $k \ne 0$ so there is no ISI. Moreover, the pulse decays much more quickly (as $1/|t|^3$) than the sinc pulse if $\beta > 0$, thereby significantly reducing ISI with imperfect sampling. The price paid is that the spectrum has a larger bandwidth $(1+\beta)/T_s$ than the sinc pulse. The choice $\beta = 0$ recovers the sinc pulse.
A third commonly used pulse has the square-root raised-cosine spectrum $\sqrt{P(f)}$ with $P(f)$ given by (6). The corresponding time-domain pulse is not zero at the sampling times $t = kT_s$, $k \ne 0$, and, therefore, exhibits ISI. The reason for using this pulse shape is that by placing a square-root raised-cosine spectrum filter at both the transmitter and receiver, the overall pulse shape has a raised-cosine spectrum [15, Ch. 11]. Furthermore, the receiver filter now acts as a matched filter for the transmit pulse. We use the matched filter receiver because it maximizes the signal-to-noise ratio (SNR) (see Section III-C) for channels with additive noise. However, note that the nonlinear fiber channel studied in Part 2 of this paper may have a different optimum receiver structure, which is a topic of further investigation. We refer to [26, Ch. 6], [14, Ch. 5], and [15, Ch. 11] for more discussion on pulse shapes and matched filters.
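The zero-ISI property and the faster decay of the raised-cosine pulse can be illustrated with a short numerical sketch. It assumes the standard textbook form of the time-domain raised-cosine pulse with a roll-off factor called `beta` here (the symbol name and the example values `Ts = 1.0`, `beta = 0.25` and `beta = 0.5` are illustrative assumptions, not values from the paper).

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc, sin(pi u)/(pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def raised_cosine(t: float, Ts: float, beta: float) -> float:
    """Time-domain raised-cosine pulse; beta = 0 gives the sinc pulse."""
    u = t / Ts
    denom = 1.0 - (2.0 * beta * u) ** 2
    if abs(denom) < 1e-12:
        # Removable singularity at |t| = Ts/(2 beta): the limit of
        # cos(pi beta u) / (1 - 4 beta^2 u^2) is pi/4.
        return (math.pi / 4.0) * sinc(u)
    return sinc(u) * math.cos(math.pi * beta * u) / denom

Ts = 1.0  # symbol period (illustrative)

# Zero ISI: the pulse vanishes at all nonzero multiples of Ts.
isi = max(abs(raised_cosine(k * Ts, Ts, 0.25)) for k in range(1, 6))

# Faster decay: compare tails halfway between two distant sampling instants.
tail_rc = abs(raised_cosine(10.5 * Ts, Ts, 0.5))
tail_sinc = abs(sinc(10.5))
print(f"max |p(kTs)| for k = 1..5: {isi:.1e}")
print(f"tail at t = 10.5 Ts: raised cosine {tail_rc:.1e}, sinc {tail_sinc:.1e}")
```

The smaller tail is what makes the raised-cosine pulse more tolerant to sampling-time errors, at the cost of the excess bandwidth controlled by the roll-off factor.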
Finally, we remark that the signal spectrum can be narrowed below the minimum bandwidth associated with the symbol rate by using correlative methods such as partial-response signaling [30], [31] or continuous phase modulation
(CPM) [32], [33]. Correlative methods introduce memory. For
instance, the duobinary signaling in [34, Fig. 6] modulates a
pulse twice as fast as usual, perhaps resulting in
and thereby introducing memory into successive Nyquist-rate
symbols while doubling the data rate in a given spectral band
[34]. However, as was pointed out in [35], such modulations
are better viewed as an encoding operation followed by a
memoryless modulator. In other words, correlative methods
such as duobinary signaling or CPM are better viewed as coded
versions of the usual signaling.
For example, the duobinary signaling in [23] and [24] with a
pulse shape
is equivalent to a unit-memory, rate 1, digital
encoder followed by a memoryless modulator with a constellation size of 3 and a pulse shape
.1 A closely related CPM
method is known as minimum-shift keying (MSK) [36], and it
can also be represented as a unit-memory, rate 1, digital encoder
followed by a memoryless modulator and a pulse-shaper [37],
[38]. We can, therefore, focus our attention on finding the capacity of signaling with memoryless modulators and the usual
pulse shaping.
C. Receiver: Analog-to-Digital
At a receiver we may capture all the information in a noisy bandlimited signal by filtering the signal to reject noise and interference outside the band of interest, and subsequently sampling the signal at its Nyquist rate. As an additional practical step, one usually quantizes the amplitude of the signal samples at detection to a discrete and finite set of values that are represented by sampling bits [23]. This is done by an analog-to-digital converter (ADC) [39], [29].
Consider the transmitted waveform $x(t)$ and suppose that each symbol $x_k$ in (5) takes on one of $M$ complex values. The combination of an ADC and demodulator that puts out more than $M$ values per sample is called a soft-decision detector and it leads to two scenarios for the digital demodulator and subsequent decoder shown in Fig. 4 (see [26, Ch. 8], [14, Ch. 8], and [27, Ch. 29]). In the first scenario, called hard-decision decoding, the demodulator decides which modulation symbol was transmitted and passes its decision to the decoder (see Fig. 4); the decoder operates on these hard decisions. In the second scenario, called soft-decision decoding, some or all of the sampling bits are passed to the decoder and the decoder uses this soft information to decode [40]. In other words, the digital demodulator in Fig. 4 is effectively removed. Using soft-decision decoding with many quantization levels is preferable for performance, while using hard-decision decoding with few quantization levels reduces complexity. A summary of the performance and challenges of high-speed ADCs can be found in [41]-[43].
1 See Section III and Fig. 4 for the meaning of unit-memory, rate 1, and digital encoder. The encoder for duobinary signaling shapes the spectrum and this type of coding is called line coding.
As we shall see shortly, using discrete-time and discrete-alphabet signals makes sense at both the transmitter and the receiver because noise limits our ability to extract information. The process of converting a continuous-time and amplitude signal to a discrete-time and discrete-amplitude-and-phase signal is referred to as digitization. One of the many key insights provided by Shannon's information theory is that it suffices to consider digitized signals to approach the ultimate capacity limits of noisy channels [1]. Shannon's work is generally recognized as having given birth to digital communications and laying the foundation of today's computer and information


The process of transmitting information between an information source and a receiver can be represented by the basic building blocks shown in Fig. 4 [1], [25]. The process can be divided into two sets of operations: source encoding and decoding (light boxes in Fig. 4) [44] and the channel with its encoder, modulator, pre-equalizer, pulse shaper and inverse functions (dark boxes in Fig. 4) [2], [3] (see footnote 2). The process of source encoding can be loosely described as a process to remove redundancy in the information source so as to produce a purely random data source. In this paper, we assume that the data are already compressed and can, therefore, be represented by a sequence of independent and identically distributed (i.i.d.) bits, each of which is equally likely to be 0 or 1. Shannon showed in [1] that separating source and channel coding as shown in Fig. 4 incurs no loss in communication rate.
Consider, therefore, the channel encoder and decoder. The channel encoder takes as input a stream of bits and puts out a stream of bits with added redundancy, usually in the form of parity checks [40]. The encoder rate $\tilde{R}_c$ represents the number of input bits per output bit (see footnote 3) and satisfies $0 < \tilde{R}_c \le 1$, where $\tilde{R}_c = 1$ corresponds to uncoded data. The code overhead is defined as $(1 - \tilde{R}_c)/\tilde{R}_c$ and is usually given as a percentage. For instance, a code with a 6.7% overhead has a rate of about 0.94. The encoder memory refers to the number of successive input bits, in excess of a single bit, required to compute each output bit. For example, a unit-memory encoder requires two successive input bits to compute each output bit (see footnote 4).
The encoder's output bits are mapped onto a discrete set of values from a modulation constellation of size $M$. The number of (coded) bits per modulation symbol is, therefore, $\log_2 M$ and the number of information bits per modulation symbol is $\tilde{R}_c \log_2 M$. This combination, or any other combination, of channel encoding and modulation is referred to as coded modulation.
2 Line coding for spectral shaping and security coding for secrecy and authentication are other types of coding. Line coding was treated briefly in Section II-B; we consider neither source nor security coding.

Fig. 5. Schematic representation of the channel probability distributions.

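The bookkeeping between code rate, overhead, and bits per symbol can be sketched in a few lines. The example values (a rate of 15/16 and a 16-point constellation) are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math

def code_overhead(rate: float) -> float:
    """Overhead (1 - R)/R of a code with rate 0 < R <= 1, as a fraction."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("code rate must lie in (0, 1]")
    return (1.0 - rate) / rate

def info_bits_per_symbol(rate: float, M: int) -> float:
    """Information bits per modulation symbol, R * log2(M)."""
    return rate * math.log2(M)

# Hypothetical example values (not taken from the paper):
rate = 15.0 / 16.0  # 15 information bits for every 16 coded bits
M = 16              # constellation size

overhead_percent = 100.0 * code_overhead(rate)
bits = info_bits_per_symbol(rate, M)
print(f"overhead: {overhead_percent:.2f}%, information bits/symbol: {bits}")
```

Note that the overhead is measured relative to the information bits, so a rate-1/2 code has a 100% overhead rather than 50%.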
The pre- and post-equalization functions serve as generalized filters that can remove channel memory or perform nonlinear operations, such as described in Section X-D.
The receiver quantizes each received pulse onto a (perhaps large) discrete set of values, as described in Section II-C. The channel from the digital modulator's output $X$ to the digital demodulator's input $Y$ is, therefore, a discrete-input, discrete-output channel. For example, Fig. 5 depicts a channel with modulation constellation $\mathcal{X}$, output alphabet $\mathcal{Y}$, and channel probabilities $P_{Y|X}(y|x)$, i.e., the probability of $Y = y$ under the condition that $X = x$ was sent.
We shall review the notion of channel capacity for discrete-input, discrete-output channels in the following section. However, it is insightful, and computationally useful, to consider what is possible if one were permitted to increase the sizes of the modulation and receiver sampling alphabets without limit. For this reason, we will also consider the following:
1) complex-input, complex-output channels;
2) discrete-input, complex-output channels; and
3) ring-input, complex-output channels.
The motivation for considering the first of these channels is to determine what is ultimately possible without considering transmitter or receiver complexity. The motivation for considering the second channel is to see what is possible if the receiver can quantize as finely as desired. One reason for considering ring constellations is to later take advantage of channel rotational symmetries that simplify capacity computations (see Section X-C). Finally, in what follows we also discuss channels with memory to motivate and justify our approach for finding a capacity lower bound estimate in Part 2 of this paper.
3 A tilde distinguishes rates that are not per second (input bits/output bits, bits/symbol) from rates that are per second (bits/second, symbols/second).
4 More generally, an encoder can take in a stream of symbols and put out a stream of symbols. For example, a symbol might represent a block of bits. A unit-memory encoder might then be viewed as requiring two successive input symbols to compute each output symbol.

A. Discrete-Input, Discrete-Output Channels
Consider the discrete-time, discrete-input, discrete-output channel depicted in Fig. 5. The channel takes as input a sequence $X_1, X_2, \ldots$ and puts out a sequence $Y_1, Y_2, \ldots$, where $Y_k$ is a noisy function of $X_k$. Every $X_k$ and $Y_k$ takes on a value in $\mathcal{X}$ and $\mathcal{Y}$, respectively. The noise is described by the conditional probability distribution $P_{Y|X}(\cdot|\cdot)$ that is time-invariant, i.e., the probability that the input $X_k = x$ produces the output $Y_k = y$ is $P_{Y|X}(y|x)$ for all $k$. Following [2], we shall usually refer to random variables with uppercase letters $X, Y$ and realizations of these variables using lowercase letters $x, y$. For notational convenience, we shall sometimes use the sequence notation $X^n = X_1, X_2, \ldots, X_n$ and similarly $Y^n = Y_1, Y_2, \ldots, Y_n$.
To describe capacity, it is useful to define the entropy of the input random variable $X$ as (see [1]-[3])
$$H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x) \tag{8}$$
where $P_X(x)$ is the probability that the random variable $X$ takes the value $x$. Symbols with $P_X(x) = 0$ are not included in the summation and we follow this convention below. We have further taken the logarithm to the base 2 so that our units are in binary digits or bits. If the natural logarithm is used, then the units are called nats. One nat is $\log_2 e \approx 1.443$ bits. We remark that we can alternatively write (8) as
$$H(X) = E\left[-\log_2 P_X(X)\right] \tag{9}$$
where $E[\cdot]$ is the expectation operator [3, Ch. 2].
The entropy $H(X)$ can be considered to be the uncertainty about $X$. Similarly, the uncertainty of one random variable relative to the realization of another random variable can be captured by introducing a quantity called conditional entropy. For $Y = y$, conditional entropy is defined as
$$H(X|Y=y) = -\sum_{x \in \mathcal{X}} P_{X|Y}(x|y) \log_2 P_{X|Y}(x|y) \tag{10}$$
where $P_{X|Y}(x|y)$ is the probability that $x$ was transmitted given that we observe $y$ at the receiver. Using Bayes' rule [3], [45], [46], we have $P_{X|Y}(x|y) = P_{XY}(x,y)/P_Y(y)$, where $P_{XY}(x,y)$ is the joint probability that $X = x$ and $Y = y$. The average conditional entropy is written as
$$H(X|Y) = \sum_{y \in \mathcal{Y}} P_Y(y)\, H(X|Y=y). \tag{11}$$
Using (8) and (11), we define a quantity that measures the information between two random variables $X$ and $Y$, in the sense that it measures how much knowing $Y$ reduces the uncertainty about $X$. This quantity is referred to as mutual information and is defined as [2, Ch. 2], [3, Ch. 2]
$$I(X;Y) = H(X) - H(X|Y). \tag{12}$$
Observe that we can also write $I(X;Y) = H(Y) - H(Y|X)$, which is the reason for calling this quantity the mutual information. The capacity of a channel is the maximum mutual information, where the maximization is performed over all possible input distributions $P_X$, i.e., the capacity in bits per (channel input or output) symbol or channel use is [1], [2, Ch. 4], and [3, Ch. 7]
$$C = \max_{P_X} I(X;Y). \tag{13}$$
It turns out that reliable communication, i.e., communication with arbitrarily small nonzero error probability, is possible at the rate $\tilde{R}$ bits per symbol if $\tilde{R} < C$ and is impossible if $\tilde{R} > C$ (the case $\tilde{R} = C$ behaves differently depending on the channel). Observe from (12) and (13) that the capacity is a maximum entropy difference.
B. Complex-Input, Complex-Output Channels
We next focus our attention on the complex additive white Gaussian noise (AWGN) channel (see Fig. 6 and [1], [2, Ch. 7], and [3, Ch. 9]). The channel input $X$ and output $Y$ are complex random variables with $Y = X + Z$, where $Z = Z_c + jZ_s$ is noise, and $Z_c$ and $Z_s$ are independent Gaussian random variables each having zero mean and variance $N/2$. In other words, for $z = z_c + j z_s$ we have
$$p_Z(z) = \frac{1}{\pi N}\, e^{-|z|^2/N}. \tag{14}$$
The input and noise are independent, and the channel conditional probability density is, therefore,
$$p_{Y|X}(y|x) = p_Z(y - x) = \frac{1}{\pi N}\, e^{-|y-x|^2/N}. \tag{15}$$

Fig. 6. Schematic representation of the AWGN channel.

Suppose that $X$ and $Y$ have a joint density $p_{XY}(x,y)$. The entropies and mutual information are now defined as [3, Ch. 9] (see footnote 5)
$$H(X) = -\int p_X(x) \log_2 p_X(x)\, dx \tag{16}$$
$$H(X|Y) = -\iint p_{XY}(x,y) \log_2 p_{X|Y}(x|y)\, dx\, dy \tag{17}$$
$$I(X;Y) = H(X) - H(X|Y) \tag{18}$$
$$I(X;Y) = H(Y) - H(Y|X) \tag{19}$$
assuming that all of these integrals exist. For the complex AWGN channel, we compute
$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - \log_2(\pi e N) \tag{20}$$
where the last step follows by inserting (15) into (17) with $p_{Y|X}$ replacing $p_{X|Y}$.
5 The entropy of a continuous random variable is called a differential entropy and is often represented using the notation $h(\cdot)$ rather than $H(\cdot)$. We here use the notation $H(\cdot)$ for entropy and differential entropy.
Suppose we have the input constraint $E[|X|^2] \le P$ (see footnote 6; we soon interpret $P$ as a power). The capacity in bits per symbol is now [1], [2, Ch. 7], and [3, Ch. 9]
$$C = \max_{p_X:\; E[|X|^2] \le P} I(X;Y). \tag{21}$$
It is well known from Shannon's work that the optimum $X$ has a density that is a bidimensional Gaussian distribution of the form [1], [2, Ch. 7], and [3, Ch. 9]
$$p_X(x) = \frac{1}{\pi P}\, e^{-|x|^2/P}. \tag{22}$$
The resulting capacity in bits per modulation symbol is
$$C = \log_2\!\left(1 + \frac{P}{N}\right). \tag{23}$$
6 The constraint may be interpreted as $E[|X_k|^2] \le P$ for all $k$. A less stringent constraint is $\frac{1}{n}\sum_{k=1}^{n} E[|X_k|^2] \le P$. However, the capacity in either case turns out to be given by (21).
Suppose we use a sinc pulse (see Fig. 2) with symbol period $T_s = 1/W$, bandwidth $W$, and energy $E_s$ Joules for signaling, and the same sinc pulse but with unit energy as a receiver (matched) filter. Suppose the noise is a Gaussian random process with a flat (two-sided) power spectral density across all (positive and negative) frequencies of interest [26, Sec. 7.7], [27, Sec. 25.15]. The noise power after the receiver sinc filter is then $N = N_0 W$ Watts, where $N_0$ is the noise energy per sample in Joules, and the noise samples are independent. Using (23), the capacity in bits per second is [26, Ch. 7], [13, Ch. 5]
$$C = W \log_2\!\left(1 + \frac{P}{N}\right). \tag{24}$$
Alternatively, since the signal and noise energies are $E_s = P/W$ and $N_0 = N/W$ Joules/symbol, respectively, we can write (24) as
$$C = W \log_2\!\left(1 + \frac{E_s}{N_0}\right). \tag{25}$$
Observe from (24) that, for a fixed ratio $P/N$, $C$ is proportional to $W$ bits per second. In other words, capacity increases linearly with $W$ if we can use all frequency bands. Moreover, the spectral efficiency SE in bits per second per Hz is [26, Ch. 7]
$$\mathrm{SE} = \frac{C}{W} = \log_2\!\left(1 + \frac{P}{N}\right). \tag{26}$$
Suppose next that we use a pulse with energy $E_s$ but at the symbol rate $R_s$ that is less than $W$, which is less than the WDM channel bandwidth $B$ (see Fig. 3). For the same energy $E_s$, the signal power is reduced by a factor of $R_s/W$ as compared to the previous case with sinc pulses at symbol rate $W$. Suppose we filter the received signal by using a unit-energy matched filter before sampling. The noise energy is again $N_0$ by using Parseval's theorem [26, p. 115] and the noise power is $N = N_0 R_s$. The capacity of (24) in bits/second, thus, reduces to
$$C = R_s \log_2\!\left(1 + \frac{P}{N}\right). \tag{27}$$
The spectral efficiency of (26) in bits/second/Hz reduces to
$$\mathrm{SE} = \frac{R_s}{B} \log_2\!\left(1 + \frac{P}{N}\right). \tag{28}$$
Note again that by keeping the energy $E_s$ constant the power $P$ in (27)-(28) is smaller than the $P$ in (24) and (26).
C. Signal-to-Noise Ratio
We develop three SNR definitions that are commonly used. The second of these leads to an important insight based on the spectral efficiency of (28). The first definition is taken from (25) or (27), namely, the ratio
$$\mathrm{SNR} = \frac{E_s}{N_0} = \frac{P}{N} \tag{29}$$
is referred to as an SNR. Equation (28) can thus be expressed as
$$\mathrm{SE} = \frac{R_s}{B} \log_2(1 + \mathrm{SNR}). \tag{30}$$
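As a numerical illustration of the AWGN capacity in bits per symbol and of the spectral efficiency scaled by the ratio of symbol rate to channel bandwidth, consider the sketch below. The SNR of 20 dB, the symbol rate of 25 Gbaud, and the channel bandwidth of 30 GHz are hypothetical values chosen for the example, and the function names are assumptions.

```python
import math

def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from decibels to linear units."""
    return 10.0 ** (value_db / 10.0)

def capacity_bits_per_symbol(snr: float) -> float:
    """AWGN capacity log2(1 + SNR) in bits per symbol (SNR in linear units)."""
    return math.log2(1.0 + snr)

def spectral_efficiency(snr: float, symbol_rate: float, channel_bandwidth: float) -> float:
    """(R_s / B) * log2(1 + SNR) in bits/second/Hz."""
    if symbol_rate > channel_bandwidth:
        raise ValueError("symbol rate must not exceed the channel bandwidth")
    return (symbol_rate / channel_bandwidth) * capacity_bits_per_symbol(snr)

# Hypothetical operating point (not taken from the paper):
snr = db_to_linear(20.0)
cap = capacity_bits_per_symbol(snr)  # about 6.66 bits/symbol
se = spectral_efficiency(snr, symbol_rate=25e9, channel_bandwidth=30e9)
print(f"capacity: {cap:.3f} bits/symbol, spectral efficiency: {se:.3f} bits/s/Hz")
```

The guardband between symbol rate and channel bandwidth lowers the spectral efficiency by the same factor at every SNR, whereas the SNR itself enters only logarithmically.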
Second, the SNR in (29) is based on the energy or power per modulation symbol. For a fair comparison among modulation formats, it is convenient to consider the SNR per information bit which we denote by $\mathrm{SNR}_b$. Recall that the number of information bits per modulation symbol is $\tilde{R}_c \log_2 M$ so we have
$$\mathrm{SNR}_b = \frac{\mathrm{SNR}}{\tilde{R}_c \log_2 M}. \tag{31}$$
We remark that $\mathrm{SNR}_b$ is often referred to as $E_b/N_0$ since the energy per information bit is $E_b = E_s/(\tilde{R}_c \log_2 M)$ [26, Ch. 7]. Note that both SNR and $\mathrm{SNR}_b$ are defined here in a single mode (i.e., single polarization) with complex signal and noise representation. Inserting (31) into (30) for a system operating at capacity, i.e., with $\tilde{R}_c \log_2 M = C$ and $\mathrm{SE} = (R_s/B)\, C$, we have
$$C = \log_2(1 + C \cdot \mathrm{SNR}_b). \tag{32}$$
One can check that (32) has nonzero solutions only if $\mathrm{SNR}_b > \ln 2$. In other words, the minimum possible $\mathrm{SNR}_b$, expressed in decibels (dB), is $10 \log_{10}(\ln 2) \approx -1.59$ dB [26, Ch. 7], [13, Ch. 5]. Note that $C$ in (32) has the units of bits/symbol. The spectral efficiency in bits/second/Hz is the largest solution for $C$ in (32) reduced by the factor $R_s/B$ symbols/second/Hz. For instance, when signaling with sinc pulses without guardband we have $R_s/B = 1$ (see Fig. 3), while the pulse shape and guardband of the example in Section II give $R_s/B < 1$.
Third, in optical communications one usually uses a quantity called the optical SNR (OSNR). The quantities involved in the definition of OSNR, along with the corresponding quantities for the SNR definitions, are shown schematically in Fig. 7. The definition of OSNR is
$$\mathrm{OSNR} = \frac{P}{2 N_{\mathrm{ASE}} B_{\mathrm{ref}}} \tag{33}$$
where $P$ is the total average signal power summed over the two states of polarization, $N_{\mathrm{ASE}}$ is the spectral density of amplified spontaneous emission (ASE, see Section IX-B1) in one polarization and the reference bandwidth $B_{\mathrm{ref}}$ is commonly taken to be 12.5 GHz, corresponding to a 0.1 nm resolution bandwidth of optical spectrum analyzers at 1550 nm carrier wavelength (193.4 THz carrier frequency). The factor 2 in (33) is often interpreted as accounting for both polarizations of ASE. The definition of OSNR differs from SNR by a normalization factor based on the particular choice for the fixed reference noise bandwidth as well as by how one accounts for signal and noise polarization modes. Using (29) and (33), one can relate SNR and OSNR directly as
$$\mathrm{OSNR} = \frac{p\, R_s}{2 B_{\mathrm{ref}}}\, \mathrm{SNR} \tag{34}$$
where $p = 1$ for a singly polarized signal and $p = 2$ for a polarization-multiplexed signal, and where $N_{\mathrm{ASE}}$ and $N_0$ are assumed to be equivalent (see Section IX-B1).
We remark that the information bit rate, in bits per second, is given by
$$R_b = p\, R_s\, \tilde{R}_c \log_2 M \tag{35}$$
where a symbol includes the field of both polarizations. Using (31) and (35), one can, therefore, express (34) as
$$\mathrm{OSNR} = \frac{R_b}{2 B_{\mathrm{ref}}}\, \mathrm{SNR}_b. \tag{36}$$
Note that this relationship between OSNR and $\mathrm{SNR}_b$ depends only on the information bit rate $R_b$ and is independent of whether one uses polarization multiplexing or not.

Fig. 7. Quantities involved in the definition of (a) signal-to-noise ratio (SNR) and (b) optical SNR (OSNR). l.u.: linear units, AWGN: additive white Gaussian noise, ASE: amplified spontaneous emission.

D. Discrete-Input, Complex-Output Channels
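Sections III-B and III-C showed what is possible if we could transmit using infinitely fine modulation alphabets and infinitely fine receiver quantization. We now take a step back from these idealizations and use a discrete modulation alphabet $\mathcal{X}$. The mutual information of (19) becomes (see [47])
$$I(X;Y) = \sum_{x \in \mathcal{X}} P_X(x) \int p_{Y|X}(y|x) \log_2 \frac{p_{Y|X}(y|x)}{p_Y(y)}\, dy \tag{37}$$
where
$$p_Y(y) = \sum_{x \in \mathcal{X}} P_X(x)\, p_{Y|X}(y|x). \tag{38}$$
Alternatively, we can simply use (20) with (38). The capacity is thus a maximization of (37) over the input distribution $P_X$.
For example, suppose we use $M$-ary phase-shift keying (M-PSK), which means that $\mathcal{X} = \{\sqrt{P}\, e^{j 2\pi m/M} : m = 0, 1, \ldots, M-1\}$. The optimal input distribution turns out to be $P_X(x) = 1/M$ for all $x \in \mathcal{X}$. This intuitively pleasing result follows by the rotational symmetry of the channel: (37) remains the same if we replace $X$ by $X e^{j\phi}$ for any fixed phase $\phi$. Furthermore, we know that $I(X;Y)$ is concave in $P_X$ if $p_{Y|X}$ is held fixed [3, p. 33]. We can thus generate a new channel input $\bar{X}$ that is a probabilistic mixture of several $X e^{j\phi}$ for different $\phi$ and apply Jensen's inequality [3, Sec. 2.6] to show that $I(\bar{X};\bar{Y}) \ge I(X;Y)$, where $\bar{Y}$ is the channel output when the channel input is $\bar{X}$. For M-PSK, one simply chooses a uniform mixture of the $M$ phases $\phi = 2\pi m/M$, $m = 0, 1, \ldots, M-1$.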
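The mutual information of a discrete constellation on the complex AWGN channel has no closed form, but it can be estimated by simulation. The sketch below is one possible Monte Carlo estimator for uniformly distributed M-PSK, not the authors' numerical method. It relies on the identity $I(X;Y) = \log_2 M - E\left[\log_2 \sum_{x'} e^{-(|Y-x'|^2 - |Y-X|^2)/N}\right]$, which holds for equiprobable inputs and Gaussian noise of total variance $N$; the function name, sample count, and seed are assumptions.

```python
import cmath
import math
import random

def mpsk_mutual_information(M: int, snr: float, num_samples: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate (bits/symbol) of I(X;Y) for uniform M-PSK on complex AWGN."""
    rng = random.Random(seed)
    N = 1.0                     # total complex noise variance (normalized)
    amp = math.sqrt(snr * N)    # constellation radius so that P/N = snr
    points = [amp * cmath.exp(2j * math.pi * m / M) for m in range(M)]
    sigma = math.sqrt(N / 2.0)  # standard deviation of each noise quadrature
    acc = 0.0
    for _ in range(num_samples):
        x = rng.choice(points)  # equiprobable input symbols
        y = x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        d0 = abs(y - x) ** 2
        # Likelihood of every candidate symbol relative to the transmitted one.
        s = sum(math.exp(-(abs(y - xp) ** 2 - d0) / N) for xp in points)
        acc += math.log2(s)
    return math.log2(M) - acc / num_samples

high = mpsk_mutual_information(4, snr=100.0)  # 20 dB: close to log2(4) = 2 bits
low = mpsk_mutual_information(4, snr=0.1)     # -10 dB: far below 2 bits
print(f"QPSK mutual information: {high:.3f} bits at 20 dB, {low:.3f} bits at -10 dB")
```

At high SNR the estimate saturates at $\log_2 M$ because the constellation size caps the information per symbol, while at low SNR it follows the Gaussian-input capacity closely.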
E. Ring-Input, Complex-Output Channels
In Part 2 of this paper, we consider inputs $X$ that are continuous in phase but discrete in amplitude. We refer to such $X$ as ring constellations, with an alphabet consisting of all points of one fixed amplitude for one ring, or of the union of several such rings for some choice of ring powers. We can again use (20) to compute the mutual information and to show that the best $X$ have a uniformly-distributed phase on every ring by using the rotational symmetry of the channel (use the arguments outlined above for M-PSK). One could further optimize the ring powers, or ring amplitudes. We provide the details of the calculation of the mutual information for ring constellations in Section IV-B and in Appendix A.
There are several reasons for choosing ring constellations [48], [49]. First, such constellations approximate the Gaussian distribution as the number of rings increases if we choose the ring amplitudes appropriately. A simple choice is to have the rings equally spaced in optical field amplitude and with equal frequencies of occupation (equal probability for choosing a transmitted symbol from any of the rings). The ring amplitudes and occupation frequencies could both still be optimized to better approximate (22) but, as we shall see in Figs. 16 and 17, this pragmatic constellation choice already gives a mutual information very close to the AWGN channel capacity. We treat our choice of ring constellations in more detail in Section X-C.
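The pragmatic ring constellation described above (equally spaced amplitudes, equiprobable rings, uniform phase) can be sketched as follows. The exact placement of the rings is an assumption: radii are taken as $r_i = i\, r_1$ for $i = 1, \ldots, n$, scaled so that the average symbol power equals a target value. The function names and the example of four rings with unit power are illustrative only.

```python
import cmath
import math
import random

def ring_amplitudes(num_rings: int, power: float) -> list:
    """Equally spaced radii r_i = i * r_1 with equiprobable rings and mean power `power`."""
    mean_square_index = sum(i * i for i in range(1, num_rings + 1)) / num_rings
    r1 = math.sqrt(power / mean_square_index)
    return [i * r1 for i in range(1, num_rings + 1)]

def sample_ring_symbol(radii: list, rng: random.Random) -> complex:
    """Draw one symbol: a ring chosen uniformly at random, then a uniform phase."""
    r = rng.choice(radii)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return cmath.rect(r, phi)

radii = ring_amplitudes(num_rings=4, power=1.0)
mean_power = sum(r * r for r in radii) / len(radii)

rng = random.Random(0)
symbols = [sample_ring_symbol(radii, rng) for _ in range(1000)]
print("ring radii:", [round(r, 4) for r in radii])
```

Optimizing the radii and the ring probabilities, as mentioned in the text, would replace the fixed spacing and the uniform ring choice in this sketch.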

where the second step follows because the $X_k$ are independent and by using the chain rule for expanding entropy [3, Sec. 2.5], and the third step follows because conditioning cannot increase entropy [3, p. 29]. We now use the fact that $I(X;Y)$ is convex in $p_{Y|X}$ if the distribution of $X$ is held fixed [3, p. 33]. Observe that the $X_k$, $k = 1, 2, \ldots, n$, are identically distributed with, say, the distribution of $X$. Applying Jensen's inequality [3, Sec. 2.6] to (44), we thus have

F. Channels With Memory

The AWGN channel is memoryless because the output
at time depends only on the input
at time ,
is independent of previous channel inputs
[3, Ch. 7]. However, many channels
in practice do not fulfill these conditions and have memory.
For channels with memory, the capacity can be calculated by
replacing the symbols and with larger and larger blocks of
[2, Ch. 4] (recall that
). For instance, if
has the
is usually very time and resource intensive, even if is as small as 3 or 4. We remark that when the
channel is memoryless, (40) normalized by gives the same
capacity as if one uses the mutual information (19) for memoryless channels.
For propagation over optical fiber considered in Part 2 of
this paper, memory is introduced by fiber chromatic dispersion
and fiber nonlinearity (see Section IX-D). However, our fiber
channel includes reverse propagation (or back-propagation) of
a single channel using digital signal processing (DSP) that removes a large portion of the channel memory (see Section X-D).
Some additional memory may remain, e.g., due to imperfect
channel modeling, but for simplicity we will use a memoryless
model to evaluate fiber capacity in Part 2. The following paragraphs demonstrate that using such a memoryless model results
in a capacity lower bound.
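Suppose that the optimal input is $X^n$ and that the corresponding
output is $Y^n$. Let $\tilde{X}^n$ be an input with independent
and identically distributed (i.i.d.) symbols
$\tilde{X}_1, \ldots, \tilde{X}_n$. Let $\tilde{Y}^n$ be the channel
output corresponding to $\tilde{X}^n$. The capacity is then bounded as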

is the capacity-achieving input. We further have

is the output of a channel with input
the channel density is
Inserting (45) into (41), we find that

and where

Summarizing, we obtain a lower bound on the capacity of a
channel with memory by using i.i.d. inputs
and computing (say by simulation) the mutual information over
the averaged channel. We use precisely this approach in Part
2 to compute a fiber capacity estimate that is a lower bound
estimate of the actual capacity. The bound is an estimate because
our calculations in Part 2 are based on particular choices for
noise models, as well as numerically obtained estimates for the
parameters of these models.
In Section II, we discussed analog waveforms upon which
one can imprint symbols while in Section III, we discussed the
information content associated with sets of symbols when optimum coding is used. This section deals with the various ways
symbols can be arranged in constellations, the performance of
these constellations in terms of bit-error ratio (BER) without
coding, and the capacity of these constellations on the AWGN
channel assuming optimum coding.
Once a pulse shape has been chosen to represent the signal
[50], one can generate each symbol pulse using two oscillators
in quadrature (the sin and cos oscillators described in
Section II). The two oscillator outputs can be separated without
introducing crosstalk since they form an orthogonal basis [22]–[25], [51].
Each symbol can be represented by a complex
number, one real number for each quadrature. Since symbol
values are real numbers, constellations can be continuous sets.
The continuous amplitude of each quadrature is motivated by
the optimality of the bidimensional Gaussian distribution for
the AWGN channel (see (22) and left plot in Fig. 14). Practical considerations compel one, however, to use a set of discrete
constellation points (a discrete symbol alphabet).
A. Discrete Constellations


We show in Fig. 8 examples of discrete constellations using
the real part of the field only while Fig. 9 shows constellations
that utilize both field quadratures. These constellations generally carry a different number of bits per symbol, depending on
the number of symbols $M$. A constellation can carry a maximum of
$\log_2(M)$ information bits per symbol. This maximum
is achieved when all the points in a constellation are used at the
same frequency and in the absence of coding. One can design transmitters that use some constellation points more often than others, in which case different frequencies of occupation are associated with different constellation points. The conveyed information is then less than
$\log_2(M)$ bits per symbol for such transmitters.

Fig. 8. Examples of constellations using only one quadrature of the field (here
the real part). The number of bits/symbol is given by $\log_2(M)$, where $M$ is the
total number of symbols. The number $\log_2(M)$ is used as the first
digit of the format label.

Fig. 9. Examples of constellations that use both quadratures of the field.

Fig. 10. BER as a function of SNR for the modulation formats of Figs. 8 and 9.
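The effect of unequal frequencies of occupation can be illustrated with a short Python sketch that evaluates the entropy of the occupation frequencies of a four-point constellation. The function name and the unequal probabilities below are hypothetical and chosen only for illustration; they do not appear in the paper.

```python
import math

def bits_per_symbol(probabilities):
    """Entropy (in bits/symbol) of the constellation-point occupation frequencies."""
    assert abs(sum(probabilities) - 1.0) < 1e-12, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # four points used at the same frequency
shaped = [0.4, 0.3, 0.2, 0.1]        # hypothetical unequal occupation

print(bits_per_symbol(uniform))      # equals log2(4) = 2 bits/symbol
print(bits_per_symbol(shaped))       # strictly less than 2 bits/symbol
```

Equal occupation attains the maximum of log2 of the number of points; any other set of occupation frequencies conveys less.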
The average power associated with a constellation is given
by its mean squared symbol amplitude, i.e., the average of the square of all symbol amplitudes. This means that, at a fixed average power, constellations with a larger number of points must have
their points more closely spaced together. One, therefore, expects different constellations to have different BER versus SNR
curves for the AWGN channel. BER curves for the constellations of Figs. 8 and 9 are shown in Fig. 10. These BER curves
apply in the absence of coding and for an identical frequency of
occupation for each constellation point. Moreover,
the bits are mapped into the constellation using Gray mapping
(after Frank Gray, who used the term "reflected binary code" [52])
that minimizes the number of bit errors for a given symbol error
probability under the assumption that a symbol is most likely
mistaken for one of its immediate neighbors [23]–[25].
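A minimal sketch of Gray mapping, assuming the standard reflected binary code construction, is given below in Python. It labels eight consecutive amplitude levels (a hypothetical 3-bit example) and checks that neighboring levels differ in exactly one bit, which is the property that limits a nearest-neighbor symbol error to a single bit error.

```python
def gray_code(n):
    """Reflected binary (Gray) code of the nonnegative integer n."""
    return n ^ (n >> 1)

def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")

# Labels for eight consecutive levels along one quadrature (illustrative choice)
labels = [gray_code(k) for k in range(8)]
print([format(g, "03b") for g in labels])

# Adjacent levels differ in exactly one bit
neighbor_distances = [hamming_distance(a, b) for a, b in zip(labels, labels[1:])]
print(neighbor_distances)
```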
The BER curve with the lowest SNR requirement in Fig. 10
represents binary phase shift keying (BPSK) (1a). The next
formats having the lowest SNR requirements are the on/off
keying (OOK) (1b) and 4-quadrature amplitude modulation
(QAM) or quaternary phase-shift keying (QPSK, 2b) formats
that have identical SNR requirements of 3 dB higher than
BPSK. Therefore, between the two 1 bit/symbol formats considered, BPSK (1a) and OOK (1b), BPSK has the lowest SNR
requirement by 3 dB. The next formats with low SNR requirements are 2-ASK/2-PSK (2a), where ASK stands for amplitude
shift keying, and 2-ASK/4-PSK (3d) that have nearly identical
SNR requirements at
. The 2-bits/symbol format
with the lowest SNR requirement in Fig. 10 is, therefore,
4-QAM or QPSK (2b). The 8-PSK (3c) format follows with a
requirement of about 1.2 dB higher SNR than 2-ASK/4-PSK
(3d) at
BER. The format 4-ASK/2-PSK (3a) that also
supports 3 bits/symbol has a considerably higher required SNR
than the two other 3 bits/symbol formats. 2-ASK/4-PSK is
therefore the 3 bits/symbol format with the lowest required
SNR shown here. Among the 4 bits/symbol formats (4b, 4d,
and 4e), 16-QAM (4b) has the lowest SNR requirement. The
QAM formats considered for 5 bits/symbol, 32-QAM (5b),
and 6 bits/symbol, 64-QAM (6b), have considerably higher
required SNR than lower bits/symbol formats.
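The 3-dB difference between BPSK and both OOK and QPSK can be checked with a simple minimum-distance argument. The Python sketch below normalizes each constellation to unit average power and compares minimum Euclidean distances; the helper names are hypothetical, and equating the SNR penalty with the squared distance ratio is an approximation that ignores the number of nearest neighbors (it is an assumption of this sketch, not a statement taken from the paper).

```python
import math

def min_distance_unit_power(points):
    """Minimum Euclidean distance after scaling the constellation to unit average power."""
    avg_power = sum(abs(p) ** 2 for p in points) / len(points)
    scaled = [p / math.sqrt(avg_power) for p in points]
    return min(abs(a - b) for i, a in enumerate(scaled) for b in scaled[i + 1:])

def snr_penalty_db(points, reference):
    """Extra SNR (dB) needed by `points` to reach the minimum distance of `reference`."""
    ratio = min_distance_unit_power(reference) / min_distance_unit_power(points)
    return 20 * math.log10(ratio)

bpsk = [-1, 1]
ook = [0, 1]
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

print(round(snr_penalty_db(ook, bpsk), 2))   # about 3.01 dB
print(round(snr_penalty_db(qpsk, bpsk), 2))  # about 3.01 dB
```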
The BER plot of Fig. 10 considers only the SNR and disregards the difference in the number of bits that different constellations can carry. In order to take into account the number
of bits a constellation can carry, we plot the BER as a function of the SNR per bit
(see (31)). The BER curves as a function of the SNR per bit are shown
in Fig. 11. One notices on these curves that the three pairs of
formats, BPSK (1a) and 4-QAM or QPSK (2b), 2-ASK/2-PSK
(2a) and 16-QAM (4b), and 4-ASK/2-PSK (3a) and 64-QAM
(6b) have identical SNR per bit requirements. This can be understood by considering that the second format is an orthogonal
bidimensional version of the 1-D first format. Since the two
field quadratures are orthogonal, modulating both quadratures
conveys twice as many bits while doubling the noise variance.
The SNR per bit requirements are thus identical (see [25, Sec.
4.8.4] for more explanations on BPSK and 4-QAM or QPSK).

Fig. 11. BER as a function of SNR per bit for the modulation formats of Figs. 8
and 9.

The identical SNR per bit requirement when using both quadratures is a compelling reason to make use of both quadratures
for signaling. The same situation is found in optical communications when making use of polarization multiplexing: modulating both orthogonal polarizations of the optical field doubles
the conveyed information but also doubles the noise power by
including noise from the second polarization. As a net result, at
a fixed aggregate bit rate and modulation format, the OSNR required to achieve a certain BER is the same whether one uses
polarization multiplexing or not (see (36)).
The BER curves of Figs. 10 and 11, respectively, do not
take into account channel coding [1]. As explained in previous
sections, coding is a powerful way to reduce the BER to an
arbitrarily small nonzero value. Fig. 12 shows the capacity
in bits/symbol (see (23)) for the formats considered so far. The
Shannon capacity limit of (23) that uses a continuous bidimensional Gaussian constellation is also shown as a reference. At
high SNR, all formats with a finite number of constellation
points saturate to $\log_2(M)$ bits per symbol, where $M$ is the number of constellation points, the maximum
capacity (and maximum entropy) of the respective constellation. At SNRs below the constellation capacity, one observes
that constellations using both field quadratures approach their
$\log_2(M)$ limits much faster than constellations that use only one
quadrature (cf. constellations 1a, 1b, 2a, and 3a). For instance,
one can see in Fig. 12 that 4-QAM or QPSK (2b) approaches its
limit much faster than 2-ASK/2-PSK (2a). Note
also in Fig. 12 that to achieve a capacity of 3 bits/symbol, for
instance, it is preferable to use larger constellations and code
them with the appropriate (high) redundancy to achieve a low
SNR requirement for the desired capacity. This is in contrast to
BER curves, where larger constellations always require higher
SNRs than the best smaller constellations.
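The saturation of a discrete constellation can be reproduced with a small Monte Carlo estimate of the mutual information of equiprobable points on a complex AWGN channel. The Python sketch below is only an illustration under stated assumptions: SNR is taken as average symbol energy over complex noise variance, the function name, sample count, and seed are hypothetical, and the estimator is the standard textbook expression rather than the paper's own numerical procedure.

```python
import math
import random

def constellation_capacity(points, snr_db, n_samples=4000, seed=1):
    """Monte Carlo estimate (bits/symbol) of the mutual information of
    equiprobable constellation points on a complex AWGN channel."""
    rng = random.Random(seed)
    avg_power = sum(abs(p) ** 2 for p in points) / len(points)
    x = [p / math.sqrt(avg_power) for p in points]   # unit average power
    n0 = 10 ** (-snr_db / 10)                        # complex noise variance
    sigma = math.sqrt(n0 / 2)                        # per-quadrature standard deviation
    total = 0.0
    for _ in range(n_samples):
        noise = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        for xi in x:
            y = xi + noise
            # Likelihood of each candidate point relative to the transmitted one
            s = sum(math.exp(-(abs(y - xj) ** 2 - abs(noise) ** 2) / n0) for xj in x)
            total += math.log2(s)
    return math.log2(len(x)) - total / (n_samples * len(x))

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
c_high = constellation_capacity(qpsk, 30.0)   # saturates at log2(4) = 2 bits/symbol
c_low = constellation_capacity(qpsk, 0.0)     # close to log2(1 + SNR) = 1 bit/symbol
print(c_high, c_low)
```

The nested loops scale with the square of the number of points, so this form is practical only for small constellations.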


Fig. 12. Capacity as a function of SNR for the modulation formats of Figs. 8
and 9.

Fig. 13. Capacity as a function of SNR per bit of information for the modulation formats of Figs. 8 and 9.

The capacity as a function of SNR per bit is shown in Fig. 13. It
shows more clearly than Fig. 12 the performance of different
formats at low values of capacity. At low capacities,
all formats require an SNR per bit
of $10\log_{10}(\ln 2) \approx -1.59$ dB [47] except for the OOK
format (1b) that requires 3 dB higher SNR [47]. One may note
that the three other formats that use only one quadrature of the
field (1a, 2a, and 3a) approach the low capacity region at half
the slope of all the other formats that use both field quadratures
[53]. Formats that use both field quadratures also approach the
Shannon limit the closest even at higher capacities. From these
capacity plots, one concludes that large constellations can be
coded so as to require the minimum SNR and SNR per bit.
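The low-capacity limit of about -1.59 dB follows from the Shannon formula. The Python sketch below assumes that the SNR per bit is the SNR divided by the capacity in bits/symbol (an assumption about the normalization used in (31)) and evaluates the minimum SNR per bit on the Shannon limit for several capacities.

```python
import math

def shannon_snr_per_bit_db(capacity_bits):
    """SNR per bit (dB) on the Shannon limit C = log2(1 + SNR),
    assuming SNR per bit = SNR / C."""
    snr = math.expm1(capacity_bits * math.log(2))   # 2**C - 1, accurate for small C
    return 10 * math.log10(snr / capacity_bits)

limit_db = 10 * math.log10(math.log(2))             # limit as the capacity tends to zero
print(round(limit_db, 2))                           # about -1.59 dB

for capacity in (1e-6, 0.5, 1.0, 2.0, 4.0):
    print(capacity, round(shannon_snr_per_bit_db(capacity), 2))
```

The required SNR per bit grows monotonically with the targeted capacity, so the -1.59 dB value is approached only at vanishing capacity.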
Finally, it is worth pointing out that all real-valued constellations and real-valued modulation waveforms could also be
transmitted in their single-sideband version (see discussion
along with Fig. 1). In this case, and if we were to plot the
capacity in terms of spectral efficiency (in bits/s/Hz) as opposed to
bits/symbol, these formats would have the exact same limiting
capacity as their complex equivalents, e.g., 2-ASK/2-PSK
would have the exact same limiting capacity of 4 bits/s/Hz as
16-QAM.

Fig. 14. Bidimensional Gaussian constellation (left plot) optimum for the
AWGN channel and a four ring constellation approximation (right plot).

Fig. 15. Ring constellations with various numbers of rings r from 1 to 16.
In ring constellations, only discrete values of amplitude are allowed while the
phase can assume an arbitrary value (continuous phase). The amplitude of the
outer rings is here equal to a multiple of the amplitude of the inner ring for
constellations larger than one ring.

We have considered in this section a number of commonly
used constellations and capacity calculations assuming soft-decision decoding. In Appendix C, we consider a more extensive set of constellations, including constellations of larger sizes
and different shapes. Appendix C also includes the impact of
hard-decision decoding on capacity.
B. Ring Constellations
The constellations considered in Figs. 8 and 9
are used in practice because of their discrete nature that facilitates their generation. However, their capacity is limited to
$\log_2(M)$ bits/symbol, where $M$ is the number of constellation points. Such capacity limitations are alleviated by
continuous constellations such as the bidimensional Gaussian
constellation that leads to the Shannon capacity formula of (23).
One way to understand this is that a continuous constellation
allows increasing arbitrarily the effective number of constellation points in both quadratures. It is interesting to note that a
format remains unbounded in capacity as SNR increases even
if only one of the two dimensions of the constellation is continuous while the other is discretized. One way to produce a constellation not bounded in capacity is to discretize the bidimensional Gaussian constellation in amplitude to create concentric
rings. Such a discretization allows one to take advantage of continuous rotational symmetries of certain channels, i.e., the fact that
constellation points rotated by an arbitrary value of phase are
equivalent for these channels.
A schematic representation of the discretization in concentric
rings is shown in Fig. 14. A constellation with a single ring is
referred to as phase-shift keying (PSK) while constellations with
two or more rings can be referred to as r-ASK/PSK, where r is
the number of rings. Ring constellations having 1, 2, 4, 8, and
16 rings are shown in Fig. 15 along with their names and labels.
The capacities of the ring constellations as a function of SNR
are shown in Fig. 16. We consider equidistant rings here, i.e.,
constellations where the radii of the outer rings are given by
an integer multiple of the radius of the inner ring. The number
of points on each ring is assumed to be large enough so as to
be considered continuously distributed in phase, with identical
frequency of occupation on each ring. The details of the ring
capacity calculations are given in Appendix A.
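The equidistant ring constellation described above can be sketched in a few lines of Python. The function names are hypothetical, and the normalization to unit average power is a convenience of this sketch rather than a convention taken from the paper; radii are integer multiples of the inner radius, rings are equally occupied, and the phase is drawn uniformly.

```python
import cmath
import math
import random

def ring_radii(num_rings):
    """Radii k * a for k = 1..num_rings, with a chosen so that equally
    occupied rings give unit average power."""
    mean_square = sum(k * k for k in range(1, num_rings + 1)) / num_rings
    a = 1.0 / math.sqrt(mean_square)
    return [k * a for k in range(1, num_rings + 1)]

def ring_symbol(radii, rng):
    """Draw one symbol: a ring chosen with equal probability, phase uniform in [0, 2*pi)."""
    radius = rng.choice(radii)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return cmath.rect(radius, phase)

radii = ring_radii(4)
rng = random.Random(0)
symbols = [ring_symbol(radii, rng) for _ in range(1000)]
print(radii)
print(sum(r * r for r in radii) / len(radii))   # average power of the constellation
```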

Fig. 16. Capacity as a function of SNR for the ring constellations of Fig. 15.
The capacities of constellations with more than 16 rings are also presented.

The one-ring constellation starts to depart from the Shannon
limit at an SNR of a few decibels, where the capacity is slightly
above one bit/symbol. For more rings, departure occurs at higher
SNRs and capacities. One notes that at high SNRs, the capacity
for any number of rings increases at a fixed rate of 0.5 bits/symbol
for every doubling of the SNR in contrast to the rate
of 1 bit/symbol for every doubling in the SNR observed for
the Shannon limit. This can be understood by the fact that, for
a fixed number of rings, the capacity can only increase by increasing the number of points on the rings as the SNR increases.
This confines the growth of the number of points to effectively
one dimension of the constellation in contrast to the two dimensions of the bidimensional Gaussian.
We plot the capacity as a function of SNR per bit for the ring
constellations in Fig. 17. As for the discrete constellations (except OOK (1b), see Fig. 13), ring constellations require a minimum SNR per bit of
$-1.59$ dB to transmit at any capacity. Note
that the capacity curves for each ring start to change slope near
a capacity of
bits/symbol, where $r$ is the number
of rings, for both Figs. 16 and 17.
Finally, we bring together the concepts presented in Part 1
by giving a numerical example in Fig. 18 of an optical field
of a two-ring constellation using sinc pulses. The optical spectrum is displayed in Fig. 18(a). It is bandlimited to the bandwidth corresponding to the symbol rate
expressed in Hz. To
facilitate visualization, a limited number of phase values on a
32-point angular grid is used to represent a continuous phase
(empty circles). A finer grid can be used to represent more accurately a continuous phase as needed. The constellation points
actually used are shown as full circles on the angular grid. Only
32 symbols of the waveform are shown in Fig. 18(c) to facilitate visualization. The sampling instants of the time waveform
of Fig. 18(c) show where the symbols are located. Notice that
in Fig. 18(c), the amplitude between sampling points can considerably exceed the highest value of the symbols. This phenomenon is the result of coherent addition of the sinc pulses
between sampling instants and gives a peaky waveform, no
matter how many rings are used or how many points are present
in the constellation, and occurs even for binary formats. This is
due to the fact that the sinc pulse extends well beyond a single
symbol duration.

Fig. 17. Same as Fig. 16 but with capacity as a function of SNR per information bit. All constellations converge to an SNR per information bit of
$10\log_{10}(\ln 2) \approx -1.59$ dB at low capacity.

Fig. 18. Example of signal field using sinc pulses and a ring constellation. (a)
Spectrum, (b) Constellation, and (c) Waveform. A small number of symbols
(32) and rings (2) are represented for clarity.
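The peaky behavior between sampling instants can be reproduced numerically. The Python sketch below assumes a unit symbol period, 32 binary symbols of amplitude +1 or -1, and a deliberately chosen worst-case pattern; all of these are illustrative assumptions and not the waveform of Fig. 18.

```python
import math

def sinc(t):
    """Normalized sinc pulse sin(pi t) / (pi t) with unit symbol period."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def waveform(symbols, t):
    """Sum of sinc pulses, one per symbol, centered at t = 0, 1, 2, ..."""
    return sum(a * sinc(t - k) for k, a in enumerate(symbols))

n_symbols = 32
t_mid = 15.5   # halfway between two sampling instants
# Worst-case binary pattern: every pulse adds with the same sign at t_mid
symbols = [1.0 if sinc(t_mid - k) >= 0 else -1.0 for k in range(n_symbols)]

at_sample = waveform(symbols, 15.0)   # recovers the symbol value at a sampling instant
peak = abs(waveform(symbols, t_mid))  # coherent addition between sampling instants
print(at_sample, peak)
```

At the sampling instants all other pulses vanish, whereas halfway between them the pulse tails add up to roughly three times the symbol amplitude for this pattern.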
In Part 1 of this paper, we have presented basic concepts
of digitization of analog waveforms, information theory, and
multi-level constellations, both discrete and using ring constellations. The performance of these constellations has been evaluated in terms of BER and capacity with hard and soft decisions
for the AWGN channel. These concepts, and in particular the
ring constellations, will be used in Part 2 of the paper to calculate an estimate of the capacity limits of optical fibers in optically routed networks.




The vast majority of worldwide data and voice traffic is transported using optical fibers, interconnected to form global fiber-optic networks. As the demand for bandwidth continues to increase exponentially at about 60% per year [54], it is of great
interest to study the transmission capacity between two locations in such optical networks. The aim of Part 2 of this paper
is to provide the most accurate capacity estimate possible for a
fiber channel defined in the context of transporting information in optically-routed networks (ORNs).
Since its foundation [1], information theory has been applied
to several communication channels with great success. The
capacity analysis presented here is also based on information
theory, with specific adaptations to the optical fiber channel.
The most important difference between the optical fiber and
other transmission media that have been considered for capacity analyses is the presence of Kerr nonlinearity, i.e., the
propagation properties of the medium change with increasing
signal power. As we shall see, this property has important
consequences. While linear physical media perturbed by additive noise generally result in channel capacities that increase
monotonically with transmit power owing to an increasing
SNR, we may find that the negative impact of nonlinear signal
distortions grows at a faster rate than the SNR capacity gain
at high signal powers and for a band-limited channel. This
behavior may turn the channel capacity into a nonmonotonic
function of the transmit power, and the channel capacity will
exhibit a pronounced maximum at a given (finite) signal power
level or SNR. Our fiber channel capacity estimate exhibits such
behavior and therefore differs fundamentally from linear channels whose capacities have been extensively studied [5]–[18].
Applying information theory to the fiber channel faces several major challenges. An important difficulty originates from
the presence of three phenomena in the fiber channel: noise,
filtering, and Kerr nonlinearity, as visualized in Fig. 19. These
three phenomena are distinct in nature, occur simultaneously,
are distributed along the propagation path, and influence each
other. Note that fiber chromatic dispersion is a form of all-pass
filter and can introduce substantial memory into the channel.



Fig. 19. List of the physical phenomena present in the optical path classified
in three groups: 1) Fiber nonlinearities, 2) filtering, and 3) noise. All the phenomena mentioned in the figure are discussed in this paper. The most important
phenomena limiting fiber capacity are in bold.

The various interactions between these physical phenomena

may lead to deterministic as well as (at least partially) stochastic
There have been previous studies on the capacity limits of
fibers accounting for the presence of fiber Kerr nonlinearity.
Some rely on empirical approaches [55]–[58], approximate
solutions assuming that fiber nonlinearity is low [59]–[65] or is
heuristically considered as a particular form of multiplicative
noise [59], [60], [62], while others are limited to specific
nonlinear propagation effects [66]. Some fiber capacity studies
[67]–[75] solve numerically the equation of propagation in
fibers to fully capture all Kerr instantaneous nonlinear effects
that include the signal and noise. Some of these studies have
been performed using a model with memory [67]–[70] but
were limited to modulation formats with at most a few bits per
symbol, while others have used modulation formats that are
not limited in the number of bits per symbol and are maximally
compact [71]–[75]. Capacity limits when using the OFDM
format for transmission over fibers have also been reported [76].
In Part 2 of this paper, we discuss in detail the channel model
studied here and present the results of fiber capacity estimates
using ring constellations that incorporate input constellation optimization, various dispersion maps, and the effect of propagation distance on channel capacity.
Experimental demonstrations of high-capacity transmission
over a single fiber strand using state-of-the-art technologies
have generated considerable excitement over more than two
decades. These record capacity experiments are typically reported at the postdeadline sessions of major conferences on
optical communication such as the annual Optical Fiber Communications Conference (OFC) or the European Conference
on Optical Communication (ECOC). These experiments, often
dubbed "hero experiments," use the latest technologies in
transmitters, receivers, optical fibers, and optical amplifiers to
maximize fiber capacity.
The term capacity, in the sense of Shannon, is defined as the
maximum information rate (averaged over long data sequences)
that can be guaranteed for an arbitrarily low but nonzero BER.
The system is then declared error free. For hero experiments,
the minimum acceptable BER was initially set to
(at a
time when forward-error correction (FEC) was not used in fiber-optic communication). With the introduction of FEC in the mid-1990s, system experiments started to include a coding bit-rate
overhead of typically 7%, and declared error-free transmission if the measured BER was at least as good as the value
needed at the input of state-of-the-art (hard-decision) FEC devices such that the FEC output BER would be between
in the late 1990s to as low as
today. With advances in
FEC technology [77]–[79] the required value for the measured
input BER has shifted from
[80] for first-generation
7% Reed–Solomon FEC to
for second-generation
FECs [79]. Codes with higher overheads (on the order of 25%)
have been investigated and used mostly in the context of submarine systems so far. They require about a BER of
at the FEC input [79]. Note that all these codes guarantee the correction at the prescribed input BER only for uncorrelated errors
such as for the AWGN channel. On the nonlinear fiber channel,
however, different noise statistics as well as burst errors may
be encountered, which can tighten the BER requirements of a
code [72], [78], [81]–[84]. This fact is neglected in virtually all
hero experiments due to the experimental difficulties associated with testing full end-to-end transmission including FEC.

Fig. 20. (Top) Historical evolution of record capacity and (bottom) spectral
efficiency of hero experiments in fiber-optic communication systems.

Fig. 20 plots the capacity and the spectral efficiency of
hero experiments since the mid-1980s [85]. The lower curve
in the top plot shows the transmission bit rate that could be
obtained on a single optical wavelength and on a single polarization using electronically time-division multiplexed (ETDM)
transmitters. The experienced growth is about one order of
magnitude over two decades or about 12% per year, which
would have been insufficient to fuel the bandwidth demand
of modern data services that grows at about 60% per year
[54]. The highest ETDM bit rate reported today is 100 Gb/s
[86], which is much lower than the bandwidth supported by
an optical fiber. The fiber bandwidth that is considered usable
for long distance transmission occupies the wavelength range
from 1300 to 1700 nm, where fiber loss is moderate or low
(see Fig. 21). This corresponds to a full channel
bandwidth of 54 THz. In practical implementations, the
usable bandwidth is limited by the bandwidth of amplification
technologies, which is on the order of 5–10 THz (see EDFA
box in Fig. 21). Multiple amplification technologies can be
used in parallel to form multiband amplifiers and transmission
systems [87]–[89] (see the various amplification bands in Fig.
21). The large difference between single-channel ETDM bit
rates and the available optical amplification bandwidth suggests
dividing the usable fiber bandwidth in smaller frequency bands
and populating these bands using WDM [90]. The spectral
layout of the WDM channels is shown in Fig. 22 along with the
broadband noise generated by optical amplifiers along the path
(see Section IX-B). Each WDM channel occupies a bandwidth
that defines the channel spacing. Note that WDM channels
are not to be confused with a "channel" in the sense of Shannon
and as represented in Fig. 4 of Part 1.
WDM technologies were developed in the mid-1990s and
allowed parallel transmission of many WDM channels on the
same fiber. The upper curve in the top plot in Fig. 20 shows
the total fiber capacity evolution using WDM, with a growth
rate of about 78% per year for over 10 years, backed by the
steady increase in the bandwidth of optical amplifiers as well as
the increase in spectral efficiency due to improvements in laser
and optical filtering technologies. Around the turn of the millennium, the bandwidth of optical amplifiers approached its maximum value allowed by the material, and the capacity growth
began to slow down. Capacity growth became mainly driven by
an increase in spectral efficiency (see bottom plot of Fig. 20)
brought by advanced modulation formats that have quickly been
replacing the prevailing OOK systems in the long-haul transport

Fig. 21. Spectral dependence of the fiber loss coefficient for a typical low-loss
optical fiber (SSMF) and a fiber without the water absorption peak (Allwave).
The origin of the main sources of loss are indicated along with the names of the
amplification bands and their wavelength ranges. (Courtesy of D. Peckham.)

Fig. 22. Spectral layout of the WDM channel superposed to the noisy field
originating from ASE.

Fig. 23. Schematic representation of an ORN in mesh. The optical functionality
of a reconfigurable optical add-drop multiplexer (ROADM) is to multiplex (add)
or demultiplex (drop) frequency bands from a WDM signal from one optical
fiber to another.

Early fiber-optic transmission systems provided point-to-point
transmission [91] with all WDM channels co-propagating
over the same optical path. These WDM systems have evolved
to now use reconfigurable optical add-drop multiplexers
(ROADMs) [92] at network nodes to form optically-routed
networks (ORNs) [93]–[96] such as represented in Fig. 23.
The granularity of WDM channels has an important effect
on the design of ORNs. Because the routing granularity of
ROADMs cannot be smaller than the granularity of WDM
channels unless expensive optical-electronic-optical (OEO)
conversion is performed, it becomes economical to establish a
hierarchy of nodes in a network. Nodes with insufficient traffic
to fill a WDM channel are aggregated in a single, bigger node
called a core node. These core nodes are linked together to
form a core ORN as represented in Fig. 23. ROADMs can then
add and drop channels at the WDM channel granularity.
In this paper, we consider an ORN with a generic mesh
network topology [94], [95] as schematically represented in
Fig. 23. The figure illustrates that independent WDM channels can share some optical fiber spans on their propagation
path from their respective transmitters (Tx) to their respective
receivers (Rx), distorting each other's waveforms through
fiber Kerr nonlinearity when sharing the same fiber. Individual
users generally do not have access to other WDM channels
at either the transmitter or the receiver, since this would (in
the most general case) involve the exchange of the full optical
field information between all transponders physically separated
by large distances in the ORN. We further assume that the
total usable fiber bandwidth is entirely filled with uncorrelated
WDM channels (see Fig. 30 for a schematic representation of
interacting WDM channels), which is what we believe to be the
worst-case scenario for a WDM system.
The capacity estimation in this paper is based on the assumption that the usable fiber bandwidth is divided in frequency
bands as shown in Fig. 22. We study a typical WDM channel,
or channel of interest (COI) from which one can calculate
the spectral efficiency (SE), defined in (28), by dividing the
capacity of that channel by the bandwidth allocated to it. If
desired, the aggregate fiber capacity can then be obtained by
multiplying the spectral efficiency by the total fiber bandwidth
assumed to be supported by the system. In this study, we will
use the term "in-band" to refer to any field that falls within the
frequency band of the COI. Fields outside this
frequency band are referred to as "out-of-band." The guardband
represents the difference between the bandwidth allocated
to each channel and the channel's spectral support at the
transmitter. Note that in the nonlinear regime of transmission
we are considering (pseudolinear transmission regime [97],
[98]), the spectral support is approximately conserved after
transmission (see discussion in Section X-B). Note also that
we are considering only unidirectional transmission over one
and the same optical fiber strand. Bidirectional transmission is
in principle possible and has been reported, e.g., in [99]–[103].
Severe limitations from Rayleigh scattering have been observed
in these systems [99], [101].
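The bookkeeping described in this section, a usable bandwidth obtained from a wavelength range and an aggregate capacity obtained as spectral efficiency times bandwidth, can be sketched in Python as follows. The wavelength range is the 1300 to 1700 nm window quoted earlier in Part 2; the spectral efficiency and amplifier bandwidth in the second part of the example are hypothetical values chosen only to show the arithmetic.

```python
C_LIGHT = 299_792_458.0   # speed of light in vacuum (m/s)

def bandwidth_thz(lambda_min_nm, lambda_max_nm):
    """Frequency bandwidth (THz) spanned by a wavelength range given in nm."""
    f_high = C_LIGHT / (lambda_min_nm * 1e-9)
    f_low = C_LIGHT / (lambda_max_nm * 1e-9)
    return (f_high - f_low) / 1e12

def aggregate_capacity_tbps(spectral_efficiency, total_bandwidth_thz):
    """Aggregate capacity (Tb/s) = spectral efficiency (bits/s/Hz) x bandwidth (THz)."""
    return spectral_efficiency * total_bandwidth_thz

full_window = bandwidth_thz(1300.0, 1700.0)
print(round(full_window, 1))                  # about 54 THz

# Hypothetical example: 2 bits/s/Hz over a 5-THz amplification band
print(aggregate_capacity_tbps(2.0, 5.0))      # 10 Tb/s
```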
In this section, we discuss the various physical phenomena
present in the fiber channel, including the most important optical and opto-electronic noise sources, Kerr nonlinearity, and
the presence of optical filtering originating from fiber chromatic
dispersion (CD) as well as from bandpass filters (at ROADMs)
within ORNs.
A. Signal and Noise in Coherent Optical Receivers
In all studies presented in this paper, we consider ideal
coherent demodulation, and in particular a perfectly balanced,
ideal homodyne receiver. As discussed in Appendix B, ideal
homodyne demodulation linearly translates the optical field
of the WDM COI into the electronic complex baseband for
further processing using standard communication engineering
methods. The electrical signal at the receiver output for in-phase
(I) and quadrature (Q) components, and prior to any electronic
(matched) filtering, reads (Eqs. (94) and (95) of Appendix B)
is the photodetector's responsivity (in [A/W]);
is the complex envelope of the optical signal field;
is the
temporally constant complex envelope of the LO field acting in
the ideal receiver both as a constant gain multiplier as well as
a perfect phase reference;
denotes the complex envelope
of any other stochastic optical field that is not actively compensated for within the receiver and is therefore contributing to detection noise. The physical origin of the two most important
random optical fields, ASE and double-Rayleigh backscatter
(DRB), is discussed in detail in Section IX-B.
1) Beat Noise: The first term on the right-hand side of (47)
and (48) is the desired signal term, and the second term is the
beat term between the LO and the optical noise field. Note that
this term is the only source of beat noise in this idealized receiver. As is shown in Appendix B, the beat noise between the
signal and the optical noise field as well as the beat noise of
the optical noise field with itself are fully suppressed by ideal
balanced detection. The linear conversion of signal and noise
optical fields into the electrical regime also implies that the statistics of the optical noise field are fully preserved. In particular, a circularly symmetric complex Gaussian (ccG) optical
noise field (such as ASE, as we shall see in Section IX-B1)
remains ccG in the electrical domain. The linear translation of
optical signal and noise fields into the electronic domain using
the LO as a perfect phase reference is the key differentiating
factor between coherent demodulation and direct demodulation
[105]–[110] with one [111], [112] or more [113] delay interferometers, where the electrical noise will no longer be Gaussian.
The variance of the beat noise term between the LO and the
optical noise field is derived in Appendix B and reads
where $B_e$ is the power-equivalent bandwidth of the entire
receiver opto-electronics (including the matched filter),
$N_N$ is the power spectral density of the noise field $E_N$
derived in Section IX-B, and
$P_{LO}$ is the optical LO power.
2) Shot Noise: While the existence of beat noise requires
the presence of a stochastic classical optical field in addition to
the signal optical field, shot noise is always and fundamentally
present in any optical receiver. Shot noise is a direct manifestation of the quantum nature of light [114]–[119] and is perceived
as random fluctuations of the detected photocurrent, even if the
classical optical field by itself is deterministic.$^{8}$ Shot noise in
fiber-optic systems cannot be obtained directly from classical
optical field descriptions (classical Maxwell's equations); its
understanding requires at a minimum a quantized model for
light–matter interactions. For example, the power $P$ of ideal,
unmodulated laser radiation is a constant, deterministic quantity,
resulting in a constant, deterministic photocurrent in a fully
classical picture. However, if the quantization of light–matter
interactions is taken into account, the photocurrent produced by
such a radiation source will exhibit random fluctuations. If we
assume a perfectly integrating detector, the statistics of these
fluctuations follow a Poissonian probability distribution. For
more general receiver characteristics, the shot noise variance is
given by [114], [117]
$$\sigma_{\mathrm{shot}}^2 = 2 q R P B_e,$$
where $q$ is the elementary charge, $R$ the responsivity, and $B_e$ the power-equivalent receiver bandwidth. If the optical power is time-varying (as is necessarily
the case at the photodetector of all optical communication receivers), the shot noise process itself becomes time-varying and
hence nonstationary. The resulting general expressions for the
shot noise variance are discussed, e.g., in [104], [117], [119],

$^{7}$In optical communications, the term beat noise is used to refer to that part
of the noise associated with a photodetected signal that directly originates from
the classical interference (beating) of optical fields at the photodetector, where
at least one of these optical fields is viewed as a stochastic process [104].
$^{8}$If the optical field itself is random, it can be shown [119] that, quite remarkably, the total noise variance splits additively into a beat noise term and a shot
noise term.

In contrast to the cancellation of all but one beat noise term
in an ideal, perfectly balanced homodyne receiver, shot noise
is generated by the total optical power reaching each photodetector, as given by (88) through (93). Owing to the statistical
independence of the interactions between classical light and
matter within any two detectors, these shot noise fluctuations
add up statistically. Hence, we find that for the shot noise variances associated with the two (I/Q) outputs of a balanced homodyne receiver, we have (see Appendix B)

Although shot noise is identified as a nonstationary noise source,
we see from this equation that in a coherent receiver shot noise
can be made arbitrarily stationary by increasing the LO power,
which eventually lets the LO power term dominate and
dwarf the shot noise contributions of signal power and noise
power. In the limit of large LO power, only this LO-induced shot noise contribution remains.
Furthermore, we note that at high LO powers, the probability
distribution of shot noise rapidly converges from its Poissonian
nature towards a Gaussian.
3) Is Shot Noise or Beat Noise More Important to Fiber Capacity?: Having specified beat noise and shot noise variances,
we are now in the position to answer the question whether shot
noise or beat noise is the more fundamental noise source in
an ideal homodyne receiver when it comes to evaluating fiber
capacity. We can establish a fundamental connection between
the shot noise and beat noise variances, given by (52),
where we used $R = q/(h\nu)$ for the responsivity of a detector
with perfect unity quantum efficiency; $h$ is Planck's constant,
and $\nu$ is the optical frequency ($\nu$ is used to represent optical frequencies in this paper). Note that (52) is independent of both
the LO power and the receiver bandwidth. As we will see in Section IX-B, the noise power spectral density
at the receiver for perfectly (ideal) distributed
optical amplification is given by (56). Inserting
this expression into (52), we see that shot noise is essentially
negligible compared to beat noise whenever $\alpha L \gg 1$, where
$\alpha$ is the fiber loss coefficient and $L$ the system length, a condition
that is very well satisfied for any reasonable fiber-optic transport
system length. We may therefore neglect shot noise in our further studies
on fiber-optic transport capacities.$^{9}$

$^{9}$In certain other applications, e.g., in optical satellite communication links
[18], the optical noise power spectral density at the receiver can be much lower
than in an amplified fiber-optic system, which can make shot noise the dominant
noise contribution in such systems.

4) Thermal and Electronics Noise: Finally, we acknowledge that practical receivers are fundamentally associated with
thermal noise as well as with other noise sources of purely
electronic origin (transistor shot noise, $1/f$ noise, etc.) [121],
[122]. We lump all these statistically independent noise sources
into a single electronics noise variance at the two
outputs of the balanced receivers. However, as we have seen
above, the variance of both shot noise as well as LO-N beat
noise can be made arbitrarily large by choosing high enough LO
powers, which in turn dwarfs the electronics noise component.
As a consequence, while being of great practical importance in
designing opto-electronic receiver circuits, electronics noise is
not of fundamental interest for a capacity limit estimate and is
hence neglected in our studies.
B. Optical Noise Fields
The most important optical noise fields related to fiber-optic
transmission are ASE and DRB. We will show here that the most
fundamental source of the two is ASE.
Note that signal distortions involving fiber nonlinearity that
are not or cannot be compensated for can also be considered a
source of random optical fields and hence can be viewed as a
source of noise albeit with statistical characteristics and correlation properties that can be very different from ASE or DRB.
The intricate randomness associated with these nonlinear interactions is taken into account in our work through Monte Carlo
simulations, and the effects are discussed in the fiber nonlinearity Section IX-D.
1) Amplified Spontaneous Emission: As mentioned in the
context of Fig. 21, the loss coefficient $\alpha_{dB}$ of optical fibers is
0.2 dB/km in the 1550-nm wavelength region. Transmission
over a distance $L$ at such wavelengths experiences $\alpha_{dB} L$ dB of
loss. For 2000 km, the accumulated loss is 400 dB, an incredibly large power attenuation of $10^{-40}$. Clearly, such an enormous
attenuation cannot be bridged at a reasonable transmit power
(even when leaving aside the detrimental impact of fiber nonlinearities) using modulation formats with a reasonable spectral efficiency in optical fiber. Therefore, optical amplification
is required along the optical path if frequent opto-electronic regenerations are to be avoided. On the downside, optical amplifiers produce ASE together with signal amplification. One may
therefore understand ASE generation in the fiber channel from
the fundamental fact that the optical fiber is a lossy transmission medium.
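The loss arithmetic quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the 40 km and 120 km values are the typical EDFA span lengths mentioned later in this section.

```python
import math


def accumulated_loss_db(alpha_db_per_km: float, length_km: float) -> float:
    """Total fiber loss in dB for a uniform loss coefficient."""
    return alpha_db_per_km * length_km


def db_to_power_ratio(loss_db: float) -> float:
    """Convert a loss in dB to the linear output/input power ratio."""
    return 10.0 ** (-loss_db / 10.0)


ALPHA_DB_PER_KM = 0.2  # loss coefficient quoted in the text (1550-nm region)

# Unamplified 2000-km link: accumulated loss and linear attenuation.
link_loss_db = accumulated_loss_db(ALPHA_DB_PER_KM, 2000.0)
link_attenuation = db_to_power_ratio(link_loss_db)

# Loss of a single span for the shortest and longest typical EDFA spacings.
span_losses_db = [accumulated_loss_db(ALPHA_DB_PER_KM, s) for s in (40.0, 120.0)]

print(f"2000 km: {link_loss_db:.0f} dB, power ratio {link_attenuation:.1e}")
print(f"40 km and 120 km spans: {span_losses_db[0]:.0f} dB and {span_losses_db[1]:.0f} dB")
```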
Spontaneous emission is the result of a spontaneous transition from an excited state to a lower energy state in a physical
medium, accompanied by the emission of a photon [119], [123],
[124]. At the same time, stimulated emission is responsible for
amplifying a photon within an optical amplifier. Since stimulated emission itself takes place at random, each signal photon
passing through an optical amplifier will experience a random
multiplication factor, in addition to being accompanied by a
number of randomly multiplied spontaneously emitted photons
(see, e.g., [110], [125]–[127]). The resulting quantum-mechanical optical field fluctuations are summarized under the term
ASE. Remarkably, in [128] and [129], Gordon showed that ASE
can be well represented by a random classical optical field that
has the statistical properties of additive Gaussian noise. The establishment of this equivalence in turn allows the modeling of
ASE as a circularly symmetric complex Gaussian (ccG) noise process. A ccG process
is fully characterized by its autocorrelation [130]. For ideal distributed Raman amplification, it is given by [124], [131], [132]

where $E[\cdot]$ is the expectation operator [3, Ch. 2], $\delta(\cdot)$ is the Dirac
functional, and
$N_{ASE}$ is the power spectral density of the ASE
after a transmission distance $L$ [125], [133], [134] given by (56).
An amplification scheme widely used in fiber-optic communication consists of amplifying the signal periodically at
discrete locations along the optical path. This is done by
inserting optical amplifiers, generally Erbium-doped fiber amplifiers (EDFAs) [125], to interconnect passive fiber spans. This
discrete EDFA amplification scheme is shown in Fig. 24(a).
EDFAs are typically unidirectional as they include optical
isolators, nonreciprocal components that allow propagation in
one direction while blocking propagation in the opposite direction [117], [135]. Fiber span lengths between EDFAs typically
range between 40 and 120 km, depending on the network type.
This corresponds to between 8 and 24 dB of loss per fiber
span before amplification by an EDFA can take place. EDFAs
today closely approach the 3-dB theoretical noise figure limit
dictated by quantum mechanics.
To improve the OSNR beyond the capabilities of EDFAs,
one can transform the passive (lossy) fiber into an amplifying
medium by injecting optical pump power (see Fig. 24(b) and
Section IX-D2). Such optical pumps provide gain through a
stimulated Raman scattering (SRS) process [136], [137] in the
transmission fiber and prevent the signal power from dropping
along the optical path, which results in improved delivered
OSNR [74].

Fig. 24. Two optical amplification schemes for optical transmission over fibers;
based (a) on EDFAs and (b) on distributed Raman amplification. FWP: forward
pumps, BWP: backward pumps.

We next calculate the delivered OSNR for both the discrete
EDFA and the distributed Raman systems depicted in Fig. 24.
For periodically spaced discrete EDFAs, the noise spectral density per state of polarization
$N_{ASE}^{EDFA}$, generated at the end of a
transmission line composed of a chain of
$N_A$ amplifiers spaced
by fiber spans of length
$L_A$, is given by
$$N_{ASE}^{EDFA} = N_A \left(e^{\alpha L_A} - 1\right) h\nu_s\, n_{sp} \qquad (54)$$
where $\alpha$ is the fiber loss coefficient given by
$$\alpha = \frac{\ln 10}{10}\,\alpha_{dB} \qquad (55)$$
and $n_{sp}$ is the spontaneous emission factor [125], [133],
[134]. The quantity
$h\nu_s$ is the photon energy.
For the case of periodically pumped distributed Raman amplification [Fig. 24(b)] we consider ideal distributed Raman amplification (IDRA) where the Raman gain continuously compensates for the fiber loss [74], [138], [139], i.e., the signal
maintains constant average power along the entire transmission
span. One can derive the spectral density of noise for
ideal distributed Raman amplification from (54), considering
$L_A = L/N_A$, where $L$ is the length of the transmission
line, and by taking the limit
$N_A \to \infty$. We obtain [140]
$$N_{ASE}^{IDRA} = \alpha L\, h\nu_s\, K_T \qquad (56)$$
where $n_{sp}$ is replaced by
$K_T$, the phonon occupancy factor. It
is given by
$K_T = 1 + \eta(T)$, where
$\eta(T) = 1/\{\exp[h(\nu_p - \nu_s)/(k_B T)] - 1\}$
[137] with
$k_B$ the Boltzmann constant, $T$ the
fiber temperature and
$\nu_p$ the optical frequency of the Raman
pump providing the distributed gain. The factor $K_T$
is approximately 1.13 for Raman amplification of fiber-optic communication systems at room temperature. Experimental demonstrations
of nearly ideal distributed gain can be found in [141], [142].
For this capacity limit study, we choose the ideal distributed
Raman amplification scheme with Raman gain exactly compensating the fiber intrinsic loss, as it maximizes OSNR (and
the SNR) at fixed nonlinear phase, as will be discussed in
Section IX-D2. The delivered OSNR and SNR can be calculated using (33) and (34).
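The relation between discrete and distributed amplification noise can be illustrated numerically. The sketch below assumes the standard per-amplifier ASE expression $(G-1)h\nu_s n_{sp}$ with span gain $G = e^{\alpha L_A}$ for the EDFA chain, its many-short-spans limit $\alpha L h\nu_s$ for ideal distributed amplification, and a Bose-Einstein form for the phonon occupancy factor. The function names, the 13-THz pump offset, and the 293 K temperature are illustrative assumptions, not values taken from this section.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant [J s]
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant [J/K]
C_LIGHT = 299792458.0       # speed of light [m/s]


def alpha_per_km(alpha_db_per_km: float) -> float:
    """Convert a loss coefficient from dB/km to 1/km."""
    return alpha_db_per_km * math.log(10.0) / 10.0


def ase_psd_edfa(n_amps: int, span_km: float, alpha_db_per_km: float,
                 nu_s_hz: float, n_sp: float = 1.0) -> float:
    """ASE PSD per polarization [W/Hz] after n_amps EDFAs, each compensating one span."""
    a = alpha_per_km(alpha_db_per_km)
    return n_amps * math.expm1(a * span_km) * H_PLANCK * nu_s_hz * n_sp


def ase_psd_idra(length_km: float, alpha_db_per_km: float,
                 nu_s_hz: float, k_t: float = 1.0) -> float:
    """ASE PSD per polarization [W/Hz] for ideal distributed amplification."""
    a = alpha_per_km(alpha_db_per_km)
    return a * length_km * H_PLANCK * nu_s_hz * k_t


def phonon_occupancy_factor(delta_nu_hz: float, temperature_k: float) -> float:
    """K_T = 1 + 1/(exp(h*delta_nu/(k_B*T)) - 1), assumed Bose-Einstein form."""
    return 1.0 + 1.0 / math.expm1(H_PLANCK * delta_nu_hz / (K_BOLTZMANN * temperature_k))


nu_s = C_LIGHT / 1550e-9           # signal frequency for a 1550-nm carrier
idra = ase_psd_idra(2000.0, 0.2, nu_s)

# 20 spans of 100 km versus the distributed limit (n_sp = K_T = 1).
ratio_100km = ase_psd_edfa(20, 100.0, 0.2, nu_s) / idra
# Very many very short spans should approach the distributed result.
ratio_1m = ase_psd_edfa(2_000_000, 0.001, 0.2, nu_s) / idra

k_t = phonon_occupancy_factor(13e12, 293.0)  # assumed pump offset and temperature

print(f"EDFA/IDRA noise ratio, 100-km spans: {10 * math.log10(ratio_100km):.1f} dB")
print(f"EDFA/IDRA noise ratio, 1-m spans:    {ratio_1m:.6f}")
print(f"K_T at 293 K, 13-THz offset:         {k_t:.3f}")
```

The comparison is at fixed launch power only; it does not account for the nonlinear-phase constraint discussed in Section IX-D2.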
2) Double Rayleigh Scattering: As described in Section X,
Rayleigh scattering can be an important source of fiber loss, but
it can also be an important source of noise [119], [143]–[145].
A fraction of the Rayleigh scattering of the forward propagating signal is recaptured into the guided mode of the fiber and
propagates in the opposite direction of the signal. A fraction
of that back-propagating light is then Rayleigh scattered and
recaptured into the guided mode of forward propagation, hence
co-propagating as double Rayleigh backscatter (DRB) along
with the signal. This double-scattering process is distributed
over the entire fiber length and creates a continuum of echoes
that act as multipath interference (MPI) on the signal [146].
Since we deny the receiver knowledge of the amplitudes and
phases of this continuum of echoes, MPI is considered as a
fundamental source of noise in this context [146].
The power $P_{DRB}$ of the DRB light for a lossy fiber with a power
loss coefficient per unit length due to Rayleigh scattering of
$\alpha_R$ is given by (57) [146],
where $P_s$ is the signal input power to a fiber of length $L$ and $S$
is the dimensionless backscatter recapture fraction that defines
how much of the scattered light is recaptured into the guided
fiber mode for a particular optical fiber type [137]. The parameters $g$ and $\alpha$ are the distributed gain and fiber loss coefficients per
unit of length, respectively, both assumed to be constant along
the fiber.
From (57), for ideal distributed Raman amplification, i.e., for
$g = \alpha$, we obtain

Depending on the fiber length $L$, the power $P_{DRB}$
can represent a significant fraction of the signal power $P_s$
and limit the
effectiveness of Raman amplification for large gain [98], [137],
[147]. It is important to note that the DRB power depends
quadratically on fiber length when allowing propagation in
both directions in a fiber segment. As a result, double Rayleigh
backscattering can be reduced dramatically by inserting optical
isolators [117], [135], [148]. Dividing the fiber link of length $L$
into $N_I$ elements of length
$L/N_I$, the DRB power at the
end of the line can now be expressed as

Choosing $N_I$ sufficiently large can reduce DRB to an arbitrarily
low level where its impact on capacity is well below that of the
ASE. For this reason, DRB is not considered as fundamental a
limitation as ASE for our optical fiber capacity evaluation, and
we will ignore DRB in the rest of this paper.
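The benefit of isolators follows directly from the quadratic length dependence stated above. The sketch below assumes only that the DRB power of one isolated segment scales as the square of its length and that the contributions of different segments add in power; the function name and segment counts are hypothetical.

```python
import math


def relative_drb_power(total_length_km: float, n_segments: int) -> float:
    """DRB power relative to the isolator-free link.

    Assumes each segment contributes in proportion to its length squared
    and that segment contributions add in power.
    """
    segment_km = total_length_km / n_segments
    return n_segments * segment_km ** 2 / total_length_km ** 2


LINK_KM = 2000.0

no_isolators = relative_drb_power(LINK_KM, 1)
twenty_segments = relative_drb_power(LINK_KM, 20)

for n in (1, 2, 20, 200):
    reduction_db = -10.0 * math.log10(relative_drb_power(LINK_KM, n))
    print(f"{n:4d} segments: DRB reduced by {reduction_db:5.1f} dB")
```

Under these assumptions the DRB power falls inversely with the number of isolated segments, which is why it can be pushed below the ASE level.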
C. Optical Filtering
There are two classes of optical filters that enter the
problem of establishing a fiber channel capacity: all-pass and
bandpass filters. The first class is represented by chromatic
dispersion (CD), originating from the dispersive nature of
optical fibers. The second class is represented by the presence
of optical bandpass filters at ROADMs to separate and route
individual WDM channels in an ORN. These two classes of filters are very different in nature and impact capacity differently.
1) Fiber Chromatic Dispersion: There are two distinct
origins to the dispersive nature of single-mode optical fibers:
material and waveguide [149]. Optical fibers are made of
fused silica, a material that exhibits inherent CD. Standard
single-mode fibers (SSMFs) have a waveguide dispersion
smaller than the material dispersion with a combined dispersion of about 17 ps/(nm·km) in the 1550-nm region. The CD of fibers can be altered
dramatically by designing advanced waveguide structures, with
waveguide dispersion largely exceeding material dispersion
[150, Chs. 2–4], [151, Ch. 2]. For instance, dispersion-compensating fibers with very negative values of total dispersion
have been engineered using advanced
waveguide structures [152]–[156] to compensate the (generally
positive) dispersion of transmission fibers [156], [157].
Note that the variation of dispersion with frequency, such as
the dispersion slope [158], is neglected here because a fiber can
be engineered to have a nonzero value of dispersion over a broad
frequency range [159]. The zero dispersion region is generally
to be avoided as the effects of fiber nonlinearity are enhanced
dramatically [158], [160], [161].
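Independent of the origins of dispersion, the equation describing dispersive propagation in fibers can be written as
$$\frac{\partial E}{\partial z} + \frac{i}{2}\beta_2 \frac{\partial^2 E}{\partial t^2} = 0 \qquad (60)$$
where $\beta_2$ is the group-velocity dispersion (GVD) parameter. CD
and GVD are related by
$$D = -\frac{2\pi c}{\lambda^2}\,\beta_2 \qquad (61)$$
Equation (60) can be solved in the spectral domain to give
$$\tilde{E}(z,\omega) = \tilde{E}(0,\omega)\,\exp\!\left(\frac{i}{2}\beta_2 \omega^2 z\right)$$
where $\tilde{E}(z,\omega)$ is the Fourier
transform of $E(z,t)$.

Fig. 25. Concatenation of optical filters. (a) Optical filter with significant amplitude roll-off: spectral narrowing occurs as a result of repeated optical filtering
(ten times). (b) Idealized rectangular optical filter: absence of amplitude narrowing for an arbitrary number n of filters.
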
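As a worked example of the relation between CD and GVD, the following sketch converts a dispersion value in ps/(nm km) to $\beta_2$ in ps$^2$/km and back. It assumes the standard relation $D = -(2\pi c/\lambda^2)\beta_2$; the 17 ps/(nm km) value at 1550 nm is a typical SSMF figure used here for illustration, and the function names are hypothetical.

```python
import math

C_LIGHT = 299792458.0  # speed of light [m/s]


def beta2_from_dispersion(d_ps_per_nm_km: float, wavelength_nm: float) -> float:
    """Return beta_2 [ps^2/km] from D [ps/(nm km)] via D = -(2*pi*c/lambda^2)*beta_2."""
    d_si = d_ps_per_nm_km * 1e-12 / (1e-9 * 1e3)       # s/m^2
    wavelength_m = wavelength_nm * 1e-9
    beta2_si = -d_si * wavelength_m ** 2 / (2.0 * math.pi * C_LIGHT)  # s^2/m
    return beta2_si * 1e24 * 1e3                        # ps^2/km


def dispersion_from_beta2(beta2_ps2_per_km: float, wavelength_nm: float) -> float:
    """Inverse conversion: D [ps/(nm km)] from beta_2 [ps^2/km]."""
    beta2_si = beta2_ps2_per_km * 1e-24 / 1e3           # s^2/m
    wavelength_m = wavelength_nm * 1e-9
    d_si = -2.0 * math.pi * C_LIGHT * beta2_si / wavelength_m ** 2    # s/m^2
    return d_si * 1e12 * 1e-9 * 1e3                     # ps/(nm km)


beta2_ssmf = beta2_from_dispersion(17.0, 1550.0)
d_roundtrip = dispersion_from_beta2(beta2_ssmf, 1550.0)

print(f"beta_2 = {beta2_ssmf:.2f} ps^2/km")
print(f"D (round trip) = {d_roundtrip:.2f} ps/(nm km)")
```

Positive (anomalous) dispersion $D$ thus corresponds to a negative $\beta_2$.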
As its name suggests, CD produces a spread in time of the
various frequency components of a signal due to the difference
in group velocity experienced by each frequency component. As
CD accumulates, neighboring symbols start to overlap in time,
with the number of symbols overlapping increasing with the accumulation of CD. In terms of information theory, CD introduces memory to the channel. The memory, expressed in number
of symbols, is the spreading in time of a signal of spectral support and is given approximately by
is the speed of light and
is the
maximum excursion of dispersion [162]. This memory expression assumes that the fiber nonlinearity remains sufficiently low.
2) ROADM Filtering: Routing individual WDM channels
in an ORN (see Fig. 23) requires optical bandpass filters in
ROADMs. The number of ROADMs needed to route the signal
from a transmitter to a receiver can vary widely in an ORN. In
order to accommodate a varying number of ROADMs in the various optical paths, optical filters should be cascadable in their
amplitude response. Fig. 25 shows the amplitude response of
two types of optical filters. The first type (left plot) has a smooth
amplitude roll-off. One can see that concatenating such optical
filters ($n = 10$ in the figure) can result in considerable spectral narrowing.
In contrast, an idealized rectangular optical filter can be
concatenated an arbitrary number of times without any spectral narrowing (provided that all filters have the same bandwidth
and center frequency). If one chooses sinc pulses for modulation (see Fig. 2 and Section II), a rectangular optical filter
of bandwidth equal to the symbol rate matches the transmitted
signal's modulation spectrum. An ideal rectangular optical filter
suppresses neighboring WDM channels completely.
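The contrast between the two filter types can be reproduced with a small numerical experiment. The sketch below uses a Gaussian power response as a stand-in for "a filter with smooth roll-off" (an assumption; the filter shape in Fig. 25 is not specified here) and compares the 3-dB bandwidth of ten cascaded filters with that of ten cascaded ideal rectangular filters. The 50-GHz bandwidth and function names are illustrative.

```python
import math


def gaussian_power_response(f_ghz: float, b3_ghz: float) -> float:
    """Power response of a Gaussian filter with full 3-dB bandwidth b3_ghz."""
    return math.exp(-math.log(2.0) * (2.0 * f_ghz / b3_ghz) ** 2)


def rectangular_power_response(f_ghz: float, b_ghz: float) -> float:
    """Power response of an ideal rectangular filter of full bandwidth b_ghz."""
    return 1.0 if abs(f_ghz) <= b_ghz / 2.0 else 0.0


def cascaded_3db_bandwidth(power_response, n_filters: int,
                           f_max_ghz: float, tol: float = 1e-9) -> float:
    """Full 3-dB bandwidth of n identical cascaded filters.

    Found by bisection on the positive frequency axis; the response is
    assumed to be non-increasing in |f| and to drop below 1/2 before f_max_ghz.
    """
    lo, hi = 0.0, f_max_ghz
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_response(mid) ** n_filters >= 0.5:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo


B_GHZ = 50.0  # single-filter bandwidth, illustrative value

gauss_10 = cascaded_3db_bandwidth(lambda f: gaussian_power_response(f, B_GHZ), 10, 100.0)
rect_10 = cascaded_3db_bandwidth(lambda f: rectangular_power_response(f, B_GHZ), 10, 100.0)

print(f"10 cascaded Gaussian filters:    {gauss_10:.2f} GHz")
print(f"10 cascaded rectangular filters: {rect_10:.2f} GHz")
```

For the Gaussian shape the cascade bandwidth shrinks as the inverse square root of the number of filters, whereas the rectangular filter is unchanged by concatenation.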



An ideal rectangular filter is noncausal [25] and hence not

physically realizable. However, it is possible to generate filters
with characteristics that approach those of an ideal rectangular
optical filter as closely as desired by using finite-impulse response (FIR) filters and sufficient time delays to ensure causality
[163], [164].
D. Fiber Kerr Nonlinearity
The Kerr effect can be described by a change of refractive
index experienced by a medium when traversed by an electric
field. It was first observed in 1875 by John Kerr [165] by applying an external electric field. In optical fibers, the electromagnetic field of the signal itself can reach a sufficient intensity
(of 1 GW/m$^2$)$^{10}$ so as to change the refractive index of the fused
silica through the optical Kerr effect. One can distinguish two
types of optical Kerr nonlinearity: instantaneous and noninstantaneous. Both types are discussed in the following.
1) Instantaneous Kerr Nonlinearity: The fast change in the
fiber medium refractive index that occurs in the presence of an
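intense electric field is referred to as the instantaneous Kerr nonlinearity. Propagation of a signal field $E$ in the presence of
loss, gain and instantaneous Kerr nonlinearity (neglecting dispersion) can be represented as [98], [158], [166]
$$\frac{\partial E}{\partial z} + \frac{\alpha - g}{2}\,E - i\gamma |E|^2 E = 0 \qquad (63)$$
where, for simplicity, the $z$ and $t$ dependences of $E$ have been
omitted in (63). The nonlinear coefficient $\gamma$ is given by
$$\gamma = \frac{n_2\,\omega_s}{c\,A_{\mathrm{eff}}} \qquad (64)$$
where $n_2$ is the fiber nonlinear refractive index [158],
$\omega_s$ is the angular optical frequency, $c$ the speed of light and
$A_{\mathrm{eff}}$ the fiber effective area [158].
Equation (63) has the following exact solution
$$E(L,t) = E(0,t)\,\exp\!\left[-\frac{(\alpha - g)L}{2}\right]\exp(i\phi_{NL})$$
where the integrated nonlinear phase $\phi_{NL}$ is defined as
$$\phi_{NL} = \gamma P L_{\mathrm{eff}}$$
where $\gamma$ is the nonlinear coefficient, $P = |E(0,t)|^2$ is the input signal power, and $L_{\mathrm{eff}}$ is the effective
length defined as
$$L_{\mathrm{eff}} = \frac{1 - \exp[-(\alpha - g)L]}{\alpha - g}$$
where $L$ represents the fiber length. When gain compensates loss
exactly, i.e., $g = \alpha$, we have $L_{\mathrm{eff}} = L$.
The integrated nonlinear phase for an arbitrary signal power
evolution is defined as [158]
$$\phi_{NL} = \gamma \int_0^L P(z)\,dz \qquad (68)$$
where $P(z)$ is the signal power evolution. Another
measure that relates to nonlinear transmission is the integrated
nonlinear phase spectral density, whose definition involves the WDM channel spacing. The integrated nonlinear
phase spectral density is a measure of the nonlinear phase that
takes into account the spectral density of the WDM signal.

$^{10}$For 100 mW of signal power propagating in a fiber of 100 $\mu$m$^2$ effective
area $A_{\mathrm{eff}}$ [158], the intensity of the field is 100 mW/100 $\mu$m$^2$ = 1 GW/m$^2$.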
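The quantities introduced in this subsection can be evaluated numerically. The sketch below assumes the textbook forms $\gamma = n_2\omega/(cA_{\mathrm{eff}})$, $L_{\mathrm{eff}} = (1 - e^{-(\alpha-g)L})/(\alpha-g)$ and $\phi_{NL} = \gamma P L_{\mathrm{eff}}$. The parameter values ($n_2 = 2.5\times10^{-20}$ m$^2$/W, $A_{\mathrm{eff}} = 80\ \mu$m$^2$, 1550 nm, 1 mW, 100 km) and function names are illustrative assumptions, not values stated in this section.

```python
import math

C_LIGHT = 299792458.0  # speed of light [m/s]


def nonlinear_coefficient(n2_m2_per_w: float, wavelength_m: float, a_eff_m2: float) -> float:
    """gamma = n2 * omega / (c * A_eff), returned in 1/(W m)."""
    omega = 2.0 * math.pi * C_LIGHT / wavelength_m
    return n2_m2_per_w * omega / (C_LIGHT * a_eff_m2)


def effective_length_km(length_km: float, alpha_per_km: float, gain_per_km: float = 0.0) -> float:
    """L_eff = (1 - exp(-(alpha - g) L)) / (alpha - g); equals L when g == alpha."""
    net = alpha_per_km - gain_per_km
    if net == 0.0:
        return length_km
    return -math.expm1(-net * length_km) / net


def nonlinear_phase_rad(gamma_per_w_km: float, power_w: float, l_eff_km: float) -> float:
    """Integrated nonlinear phase gamma * P * L_eff for an exponential power profile."""
    return gamma_per_w_km * power_w * l_eff_km


alpha = 0.2 * math.log(10.0) / 10.0                               # 0.2 dB/km in 1/km
gamma_w_km = nonlinear_coefficient(2.5e-20, 1550e-9, 80e-12) * 1e3  # 1/(W km)

l_eff_passive = effective_length_km(100.0, alpha)                 # lossy span, no gain
l_eff_ideal = effective_length_km(100.0, alpha, gain_per_km=alpha)  # gain equals loss

phi_passive = nonlinear_phase_rad(gamma_w_km, 1e-3, l_eff_passive)
phi_ideal = nonlinear_phase_rad(gamma_w_km, 1e-3, l_eff_ideal)

intensity_w_per_m2 = 0.1 / 100e-12  # 100 mW over 100 um^2, as in the footnote

print(f"gamma = {gamma_w_km:.2f} 1/(W km)")
print(f"L_eff = {l_eff_passive:.1f} km (passive), {l_eff_ideal:.1f} km (g = alpha)")
print(f"phi_NL = {phi_passive:.4f} rad (passive), {phi_ideal:.4f} rad (g = alpha)")
print(f"intensity = {intensity_w_per_m2:.1e} W/m^2")
```

At equal launch power, the distributed-gain span accumulates more nonlinear phase than the passive span because the signal power never drops.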
2) Noninstantaneous Kerr Nonlinearity: The noninstantaneous part of the Kerr effect in optical fibers leads to
Brillouin [91], [167]–[170] and Raman scatterings [91], [136],
[168]–[170]. These processes can be spontaneous [171] or
stimulated by the presence of an input wave [145], [172]. Both
phenomena can be interpreted as mechanical waves, of low
frequencies (acoustic phonons) for the Brillouin scattering and
of high frequencies (optical phonons) for Raman scattering.
The Raman effect is often modeled as a delayed nonlinear
response [173]–[178].
The most important phenomenon associated with stimulated
Brillouin scattering (SBS) in fibers is the presence of optical
amplification in the backward direction downshifted by about
10 GHz in frequency from the signal and of bandwidth less
than 100 MHz [171]. This amplification is generally detrimental
to transmission but can be efficiently suppressed using various
techniques with minimal impact on capacity, using for instance
slow (a few tens of kHz) frequency dithering [179]. It is worth
pointing out that SBS gain has some polarization dependence
[99], [180]. Stimulated Raman scattering (SRS) also leads to
optical amplification, but in contrast to SBS, SRS gain is very
broad (on the order of 10 THz) in fibers. As for SBS, SRS gain is also polarization-dependent [172], [174]. We consider below the two
main amplification mechanisms resulting from SRS.
a) Interchannel Stimulated Raman Scattering: In WDM
transmission, SRS can occur between the different wavelengths
of the WDM spectrum. An important resulting effect of this interchannel SRS is the creation of a tilt of the WDM spectrum
[91]. Such a gain tilt results in the high-frequency WDM channels being depleted and the low-frequency WDM channels
being amplified. The WDM spectrum tilt can be calculated using
relations derived in [181] and [182]. Some capacity limitations
are expected from the WDM spectrum tilt, but tilt compensation
through gain tilt in the opposite direction and pre-equalization
[183] can greatly reduce its impact.
The gain tilt represents only the average effect of
interchannel SRS. The gain provided by interchannel SRS
originates from the WDM channels that are data and polarization modulated and experience propagation effects. As a result,
the waveform has power and polarization variations in time,
which produce time-varying gain [170], [184]–[186]. These
effects are not incorporated in our analysis. Further studies
are needed to assess their importance in the context of fiber capacity.
b) Ideal Distributed Raman Amplification: Stimulated
Raman scattering can be exploited to create fiber Raman
amplifiers by pumping the optical fiber at a frequency about
13 THz above the desired frequency of gain (see [187] and
[188]). An important application of SRS in systems is to generate distributed gain [189] in passive transmission fibers. This
amplification scheme is referred to as distributed Raman amplification, see Fig. 24(b). We will show that such an amplification
scheme provides significantly higher delivered OSNR than the
discrete Erbium-doped amplification scheme of Fig. 24(a).

Fig. 26. Delivered OSNR for EDFA and ideal distributed amplification for a
typical system: (a) delivered OSNR at fixed input power, (b) nonlinear phase,
and (c) delivered OSNR at constant nonlinear phase.

We first consider the case with unconstrained signal power
and then the case where power is limited to a given value of the
integrated nonlinear phase [defined in (68)] [71]. Without
loss of generality, we can initially set the signal power to
a fixed value of 0 dBm per channel. The amplifier spontaneous
emission factor $n_{sp}$
is set to 1 (noise figure 3 dB at large gain
[190]) for EDFAs and the phonon occupancy factor $K_T$
is set to 1 (lowest possible value)
for ideal distributed Raman amplification. The system length
is 2000 km. Fig. 26(a) shows the delivered OSNR as a function of amplifier spacing (or fiber span length) for fixed launch
power for EDFA and ideal distributed Raman amplification. The
EDFA system produces lower OSNR even for a span length of a
few kilometers. However, a fairer comparison between the two
amplification schemes should account for the integrated nonlinear phase of (68) to represent the impact of fiber Kerr nonlinearity. Fig. 26(b) shows the nonlinear phase for the two amplification schemes at a fixed input power and Fig. 26(c) the delivered OSNR when the power
is adjusted so that transmission
takes place at a fixed nonlinear phase.
As seen in Fig. 26(c), for short amplifier spacings, the
EDFA (full line) and ideal distributed Raman amplification
(dashed line) systems produce similar OSNRs at constant
nonlinear phase.
For large amplifier spacings (50 km and above), the OSNR
obtained for discrete amplification is much lower than for ideal distributed Raman amplification, resulting in a 10 dB difference
for 100 km amplifier spacing. This illustrates the benefit of using
distributed Raman amplification to achieve low noise at constant
nonlinear phase. One can easily show that maximizing OSNR
also maximizes SNR using (34).

E. Fiber Propagation
The equation that describes the evolution of the optical field $E$
(that contains all WDM channels) in a fiber using ideal
distributed Raman amplification (gain continuously compensates fiber loss) with ASE generation can be represented by the
stochastic generalized nonlinear Schrödinger equation (GNSE)
[131], [132]
$$\frac{\partial E}{\partial z} + \frac{i}{2}\beta_2 \frac{\partial^2 E}{\partial t^2} - i\gamma |E|^2 E = iN \qquad (70)$$
The terms involving
$\beta_2$, $\gamma$, and $N$ are the source of fiber channel
memory, nonlinearity, and noise generation, respectively. As for
(63), for simplicity, the $z$ and $t$ dependences of $E$ and $N$
have been
omitted in (70). The term
$iN$ in (70) is the term describing
ASE noise generation for ideal distributed Raman amplification.
The nonlinear interactions resulting from the instantaneous
Kerr nonlinearity can be classified in two broad categories, intrachannel and interchannel nonlinearities [97]. The expression
intrachannel nonlinearities describes nonlinear interactions involving only fields present in the frequency band of the WDM
COI, while we refer to interchannel nonlinearities when the interaction involves at least one field outside the frequency band of the COI,
as shown in Fig. 27.

Fig. 27. Decomposition of the instantaneous fiber Kerr nonlinearities into two
categories: intrachannel and interchannel nonlinearities. A list of elementary
nonlinear interactions for each category is provided. NLPN: nonlinear phase
noise, SPM: self-phase modulation, NL: nonlinear, MI: modulation instability,
XPM: cross-phase modulation, FWM: four-wave mixing, IXPM: intrachannel
XPM, IFWM: intrachannel FWM.

Each nonlinearity type is further decomposed into
signal–signal, signal–noise and noise–noise nonlinear interactions depending on the fields involved in the nonlinear
interactions. Further decomposition into elementary nonlinear
interactions is possible and is shown in Fig. 27. It is beyond
the scope of this paper to discuss these elementary
nonlinear interactions in detail (see for instance [98], [158], [166] for
a detailed description). Also, our capacity analyses are general in
the sense that they simultaneously include all of these interactions, without the need for a decomposition or classification.
Nevertheless, such a decomposition may be useful to better
understand the specific fields and the specific nonlinear interactions that are responsible for signal distortions, as well as to
devise effective means of counteracting them. As mentioned in
previous sections, uncompensated signal distortions from the
fiber Kerr nonlinearity are considered a source of noise that
limits capacity.

This section describes the choices we made in terms of modulation, constellation and digital signal processing to mitigate
nonlinear distortions for our fiber capacity estimate.


Fig. 28. Schematic of the distribution of physical effects for a COI for the fiber
channel considered in this paper. Signal and noise fields are represented for both
in-band and out-of-band frequencies.

A. Distributed Impairments
The physical effects present during the propagation from the
transmitter (Tx) to the receiver (Rx) are represented schematically in Fig. 28 in the context of where they appear along
the propagation path. The signal experiences distributed noise
(ASE), fiber nonlinearity, chromatic dispersion (CD), and
periodic filtering from optical bandpass filters (OFs). Two
out-of-band fields are also shown. They stand as a representation of any other WDM channels added and dropped at random
locations in ORNs. In this study, we assume that all neighboring
WDM channels co-propagate with the WDM COI during the
entire optical path but are not available at either the Tx or Rx.
The in-band fields, signal and noise, propagate all the way
to the COI receiver while out-of-band fields may be dropped
or added along the way. Fully capturing and understanding the
impact of the distributed nature of the impairments of Fig. 28
requires a full solution of (70).
B. Choice of Modulation
Studies of capacity limits for band-limited channels [1] have
been developed for linear channels that conserve the signal spectral support. A nonlinear channel can, in general, create new frequencies falling outside the originally transmitted signal spectral support and eventually the channel bandwidth. In the case of
the ORN fiber channel, the signal spectrum is repeatedly confined by optical bandpass filters at ROADMs. Even though it
is possible to reconstruct a signal truncated by filtering in some
scenarios in the absence of noise [191], [192], we surmise that,
from a capacity standpoint, it is generally preferable to avoid
spectral broadening altogether in a band-limited fiber channel.
Our approach to deal with these difficulties is to place ourselves in a nonlinear regime that limits spectral broadening, i.e., a regime in which the
(power dependent) transmission length over which a non-negligible amount of spectral components is generated beyond
the signal spectral support exceeds the transmission length $L$.
To make this spectral-broadening length
large, we operate in the regime $L_D \ll L_{NL}$,
often referred to as the pseudo-linear regime of transmission
[97], [98]. The dispersion length is
$L_D = T^2/|\beta_2|$ [158], where
$T$ is the symbol duration and
$\beta_2$ the GVD, related to
the dispersion $D$ by (61). The nonlinear length [158] is
$L_{NL} = 1/(\gamma P)$, where $\gamma$ is the nonlinear coefficient defined in (64)
and $P$ is the signal power. This regime produces a waveform
that changes very rapidly with propagation, helping reduce nonlinear effects. However, the large spreading of the waveform
causes a large number of symbols to interact nonlinearly, creating channel memory [162]. One operates in this regime when
transmitting at high symbol rates (25 Gbaud and above)
over fibers with high dispersion,
which covers the vast majority of commercial fibers. In
our studies, we numerically verified that our capacity estimate
results were not limited by spectral broadening for WDM systems as shown later in Fig. 36.
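To give a feel for the pseudo-linear regime, the sketch below compares the dispersion length $T^2/|\beta_2|$ with the nonlinear length $1/(\gamma P)$, using the textbook definitions of both. The numerical values ($\beta_2 = -21.7$ ps$^2$/km, $\gamma = 1.27$ (W km)$^{-1}$, 1 mW of signal power) are typical single-mode-fiber figures assumed for illustration, and the function names are hypothetical.

```python
def dispersion_length_km(symbol_rate_gbaud: float, beta2_ps2_per_km: float) -> float:
    """L_D = T^2 / |beta_2| with the symbol duration T = 1 / symbol rate."""
    symbol_duration_ps = 1e3 / symbol_rate_gbaud
    return symbol_duration_ps ** 2 / abs(beta2_ps2_per_km)


def nonlinear_length_km(gamma_per_w_km: float, power_w: float) -> float:
    """L_NL = 1 / (gamma * P)."""
    return 1.0 / (gamma_per_w_km * power_w)


BETA2_PS2_PER_KM = -21.7  # assumed GVD, roughly 17 ps/(nm km) at 1550 nm
GAMMA_PER_W_KM = 1.27     # assumed nonlinear coefficient
POWER_W = 1e-3            # assumed signal power (0 dBm)

l_d_25g = dispersion_length_km(25.0, BETA2_PS2_PER_KM)
l_d_100g = dispersion_length_km(100.0, BETA2_PS2_PER_KM)
l_nl = nonlinear_length_km(GAMMA_PER_W_KM, POWER_W)

print(f"L_D at 25 Gbaud:  {l_d_25g:.1f} km")
print(f"L_D at 100 Gbaud: {l_d_100g:.1f} km")
print(f"L_NL at 0 dBm:    {l_nl:.1f} km")
```

With these assumed values the dispersion length is already about ten times shorter than the nonlinear length at 25 Gbaud, and the gap widens with the square of the symbol rate.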
Operating under the conditions of small spectral broadening
allows us to use conventional compact spectrum modulation.
We studied a range of modulations with square-root raised-cosine spectra with different roll-off factors from 0 to 0.25 (see
Section II-B). The primary impact on SE of using a nonzero roll-off factor
is a reduction of the spectral filling by a factor of one plus the roll-off factor to
avoid spectrum overlap. We performed SE estimates with low
roll-off factors of 0.02, 0.01, and 0 and found no significant differences besides the spectral filling factor. For this reason, we use
a roll-off factor of 0, i.e., the sinc pulses described in Section II, for essentially all
calculations in this paper. Moreover, we assume that the laser
linewidths have negligible impact.
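The effect of the roll-off factor on spectral filling can be made concrete with a short sketch. It assumes that adjacent channels are spaced at exactly (1 + roll-off) times the symbol rate so that their spectra just avoid overlapping; the function name is hypothetical.

```python
def spectral_filling(roll_off):
    """Fraction of the channel spacing occupied by the symbol rate when
    channels are spaced at (1 + roll_off) times the symbol rate."""
    if not 0.0 <= roll_off <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    return 1.0 / (1.0 + roll_off)


for roll_off in (0.0, 0.01, 0.02, 0.25):
    print(f"roll-off {roll_off:4.2f}: SE scaled by {spectral_filling(roll_off):.4f}")
```

For the small roll-off factors of 0.01 and 0.02 the reduction is on the order of one to two percent, which is consistent with the statement that the results differ only by the spectral filling factor.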
C. Choice of Constellation
Since accurate numerical solutions of (70) are involved
and take a long time, it is exceedingly difficult to gather
enough numerical statistics for all constellation points individually. As a consequence, we make use of the statistical
rotational invariance of the AWGN channel and surmise
that this invariance also applies to the nonlinear case, as
given by (70). By statistical rotational invariance we mean
that the channel probability distribution has the property
$p_{Y|X}(y e^{j\phi} \mid x e^{j\phi}) = p_{Y|X}(y \mid x)$, i.e., for a certain transmit symbol
$x$, sending the symbol $x e^{j\phi}$
will produce the same output density but
now rotated by $\phi$
around the origin of the complex plane.
The fact that we are using a statistically rotationally invariant
ccG process as noise and statistically rotationally invariant
symbol constellations as our signal and our interferers fosters
our confidence that we are, in fact, dealing with a statistically
rotationally invariant situation. This, in turn, allows us to treat
all points on the same ring as statistically equivalent, and we
can numerically accumulate statistics by considering only the
relative transmission induced displacement of a constellation
point with respect to its transmit angle by back-rotating each
received symbol by its corresponding transmit angle.
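The back-rotation step can be sketched in a few lines. The example below is a minimal illustration under assumed conditions: a single ring of radius 2 with uniformly random phases and additive circular Gaussian noise standing in for the channel; it is not the simulation used in this paper, and the function name is hypothetical.

```python
import cmath
import math
import random


def back_rotate(received, transmitted):
    """Rotate each received sample by minus the phase of its transmitted symbol,
    so that all symbols of one ring are referred to the positive real axis."""
    return [y * cmath.exp(-1j * cmath.phase(x))
            for x, y in zip(transmitted, received)]


rng = random.Random(0)
radius = 2.0          # assumed ring radius
noise_std = 0.1       # assumed noise standard deviation per quadrature

tx = [cmath.rect(radius, rng.uniform(0.0, 2.0 * math.pi)) for _ in range(1000)]
rx = [x + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) for x in tx]

cloud = back_rotate(rx, tx)
cloud_mean = sum(cloud) / len(cloud)
print("cloud center after back-rotation:", cloud_mean)
```

Because the assumed noise is rotationally invariant, the back-rotated samples form a single cloud centered near the ring radius on the real axis, which is what allows statistics to be accumulated per ring rather than per constellation point.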
The process of back-rotation of constellation points used
for numerical evaluation of a capacity estimate is shown in
Fig. 29(a,b). Fig. 29(a) shows an original constellation at the
transmitter. Each point of the constellation is back-rotated to the
positive real axis. In the absence of noise, all symbols belonging
to the same ring degenerate to a single point on the real axis.
With noise, there is a spreading of the points on each ring that
now form clouds [see Fig. 29(c)] associated with each ring.
With noise and nonlinearity [see Fig. 29(d)], there is a common
rotation for all points and further spreading of the clouds. The
average rotation of the clouds is referred to as
. The
noisy and nonlinearly distorted clouds after transmission are
fitted for each ring to bivariate Gaussian probability distribution
functions (PDFs) whose covariance matrices also capture the
non-circularity of clouds due to nonlinear signal distortions.
From the discretized version of these PDFs, one can calculate
capacity estimates using (37). We also explored fitting various
types of non-Gaussian PDF shapes, both with and without
circular symmetry. These PDFs led to only marginally different
capacities from those obtained for the bivariate Gaussian PDFs. The
AWGN channel capacities for ring constellations calculated
using the numerical technique described give identical results
to Fig. 16 obtained using a semianalytic approach derived in
Appendix A.

Fig. 29. Example of a four-ring constellation used for numerical evaluation of capacity: (a) original constellation; (b) after back-rotation (BR) without any impairment; (c) after back-rotation with ASE; and (d) after back-rotation with ASE, CD, and fiber nonlinearity.

Fig. 30. Schematic representation of interacting fields in nonlinear fiber transmission.
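Fitting a bivariate Gaussian PDF to a back-rotated cloud amounts to estimating a mean vector and a 2x2 covariance matrix of the real and imaginary parts. The sketch below shows this estimation on a synthetic, deliberately non-circular cloud; the cloud parameters and the function name are assumptions made for illustration only.

```python
import random


def fit_bivariate_gaussian(cloud):
    """Return the sample mean (re, im) and the 2x2 sample covariance matrix
    of the real and imaginary parts of a list of complex samples."""
    n = len(cloud)
    mean_re = sum(z.real for z in cloud) / n
    mean_im = sum(z.imag for z in cloud) / n
    var_re = sum((z.real - mean_re) ** 2 for z in cloud) / (n - 1)
    var_im = sum((z.imag - mean_im) ** 2 for z in cloud) / (n - 1)
    cov = sum((z.real - mean_re) * (z.imag - mean_im) for z in cloud) / (n - 1)
    return (mean_re, mean_im), ((var_re, cov), (cov, var_im))


# Synthetic cloud: wider in the imaginary (phase-like) direction than in the
# real (amplitude-like) direction, mimicking a non-circular distortion.
rng = random.Random(1)
cloud = [complex(2.0 + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.3)) for _ in range(5000)]

mean, covariance = fit_bivariate_gaussian(cloud)
print("mean:", mean)
print("covariance:", covariance)
```

Unequal diagonal entries or a nonzero off-diagonal entry of the covariance matrix indicate a non-circular cloud, which a single scalar noise variance could not capture.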
D. Nonlinearity Compensation Using DSP
Digital signal processing (DSP) can compensate signal distortions in the electrical domain either as pre- or post-equalization (see Fig. 4). Such compensation needs to be performed in
the context of capacity calculations for ORNs. Fig. 30 shows
a schematic representation of various fields present in the optical path of a WDM COI (labeled 6 in Fig. 30). The ASE
noise generated in the same frequency band as the COI, the
in-band noise, is shown at the bottom of the plot (labeled 7).
The signal of the COI is available at both the transmitter (Tx)
and receiver (Rx), while the noise is available only at the receiver. The out-of-band fields are shown by the first five fields
of Fig. 30. Some fields may be available at the transmitter (labeled 5), or at the receiver (labeled 3 and 4) or at neither the
transmitter nor receiver (labeled 1 and 2). The scenario that we
believe limits the capacity of ORNs the most is when only the
in-band fields are available at the transmitter and receiver. This
is the ORN scenario we consider in this paper. Because only the
in-band fields are assumed available, we focus our attention on
compensation of nonlinear distortions from intrachannel nonlinearities. As shown in Fig. 27, one can separate intrachannel
nonlinearities into nonlinear interactions that involve only the
signal itself (signal-signal intrachannel nonlinearities) and all
other intrachannel nonlinear interactions that involve the noise
(signal-noise and noise-noise).
We compensate intrachannel nonlinear interactions by using
reverse propagation (or back-propagation) on the fields
present in the COI bandwidth. In the absence of optical noise
and optical bandpass filtering, back-propagation can undo exactly the simultaneous impact of nonlinearity and dispersion on
the signal. Back-propagation can be applied at the transmitter,
at the receiver, or both. The back-propagation equation can
be obtained by setting the right-hand side of (70) to zero and
changing $z$ to $-z$. The new equation can be solved in the electronic domain by using digital signal processing to implement
the split-step Fourier method (SSFM) [158], which consists
essentially of a succession of fast Fourier transforms (FFTs)
and complex multiplications (the SSFM is explained in [158,
Sec. 2.4.1]). Given a sufficient number of steps of the SSFM, all
signal-signal intrachannel nonlinearities can be undone by this
process. We will see in Section XI-E that the power level at which
signal-noise and noise-noise nonlinear interactions, as well as
the presence of optical filtering, start to reduce the effectiveness
of back-propagation is almost an order of magnitude higher
than the power at which WDM nonlinearities limit capacity.
Note that we assume that the fiber parameters entering the back-propagation equation
are known at both the transmitter and receiver.
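The following sketch illustrates back-propagation with a symmetric split-step Fourier method in the noiseless, unfiltered case. It assumes a lossless scalar nonlinear Schrödinger equation with only GVD and Kerr nonlinearity (a simplification; the exact form of (70) is not reproduced here), uses a small pure-Python FFT to stay self-contained, and takes illustrative fiber parameters. All function names are hypothetical.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two), unnormalized."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    return [v / len(values) for v in fft(values, inverse=True)]


def split_step(field, dt_ps, beta2, gamma, dz_km, n_steps):
    """Symmetric SSFM: half dispersion step, full nonlinear step, half dispersion
    step. A negative dz_km propagates backward (z -> -z)."""
    n = len(field)
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt_ps)
             for k in range(n)]
    half_dispersion = [cmath.exp(0.25j * beta2 * w * w * dz_km) for w in omega]
    out = list(field)
    for _ in range(n_steps):
        out = ifft([h * s for h, s in zip(half_dispersion, fft(out))])
        out = [a * cmath.exp(1j * gamma * abs(a) ** 2 * dz_km) for a in out]
        out = ifft([h * s for h, s in zip(half_dispersion, fft(out))])
    return out


# Assumed parameters: Gaussian pulse, 10 mW peak power, 50 km in 1 km steps.
n, dt_ps, beta2, gamma = 128, 5.0, -21.7, 1.3
tx = [math.sqrt(0.01) * math.exp(-((k - n // 2) * dt_ps) ** 2 / (2 * 20.0 ** 2)) + 0j
      for k in range(n)]
rx = split_step(tx, dt_ps, beta2, gamma, dz_km=1.0, n_steps=50)
recovered = split_step(rx, dt_ps, beta2, gamma, dz_km=-1.0, n_steps=50)

print("max distortion after fiber:     ", max(abs(a - b) for a, b in zip(rx, tx)))
print("max error after back-propagation:", max(abs(a - b) for a, b in zip(recovered, tx)))
```

In this idealized setting the backward pass inverts the forward pass to numerical precision. Once noise or optical filtering enters between the forward and backward passes, the inversion is no longer exact, which is the limitation discussed in the text.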
In our calculations of fiber capacity estimates for ORNs, we
consider SSMF whose parameters are given in Table I. The CD value
is converted to the GVD, expressed in ps$^2$/km,
using the relation given in (61). The amplification scheme is
ideal distributed Raman amplification with the characteristics
given in Table II. We assume that there is a ROADM at every
Raman pumping station (see Fig. 24), where dispersion compensating fibers can optionally be inserted. The signal and optical bandpass filter characteristics in ROADMs, multiplexers,
and demultiplexers are given in Table III. We restrict ourselves
to a single state of polarization for the signal, and most of the
numerical simulations were performed with copolarized noise
only. It was verified that the same results were obtained when
polarization effects were included. Finally, the modulation parameters are given in Table IV.

Fig. 31. Definition of parameters in singly periodic dispersion maps. RDPS: residual dispersion per span, NRD: net residual dispersion.

We modeled transmission with a large number of WDM channels and found that increasing the number of WDM channels beyond five only slightly impacts our capacity calculations for the
parameters considered; we, therefore, use five WDM channels
in the following calculations and study the central channel as our
COI. We use constellation points that are randomly chosen on
the ring constellation structures, using time sequences varying
from 2048 to 8192 symbols per simulation. The large computation time prevented using larger numbers of points, but repeated trials with different noise, data realizations, and time offsets (including time offsets of a fraction of the symbol duration) led to variations in capacity estimates of only a few tenths
of bits/s/Hz. Back-propagation applied at both the transmitter
and receiver in variable ratios also produced capacity estimates
within a few tenths of bits/s/Hz for all the scenarios presented.
A. Conventional Dispersion Map
Optical transmission systems that are limited mainly by
single-channel nonlinear transmission generally benefit greatly from dispersion mapping, where some level of dispersion
compensation is periodically applied along the link [98], [193].
The parameters defining a dispersion map are shown in Fig. 31.
Optimized values [194] of the parameters of a singly-periodic
dispersion map for ideal distributed amplification in the absence
of nonlinearity compensation are given in Table V. Back-propagation is then used after coherent detection to compensate for
CD and fiber nonlinearity.

Fig. 32. Spectral efficiency after transmission over 2000 km for uniform ring constellations and a conventional dispersion map.

The fiber channel capacity per unit bandwidth [i.e., the spectral efficiency, SE, defined in (28)] for the system studied here is
displayed in Fig. 32 for various numbers of rings [72]. For each
ring constellation, the capacity increases following its AWGN
capacity (see Fig. 16) at low SNRs. At moderate
SNR (about 20 dB), the capacities for each ring constellation peak
and eventually decrease at higher SNRs.
Note that the signal power level is displayed in Fig. 32 because,
unlike for the AWGN channel, not only the SNR but also the absolute signal power is required to calculate fiber capacity. The
maximum spectral efficiency is achieved using 16
rings, slightly above the capacity for 8 rings. We note that the
capacity of the one-ring constellation cuts across the capacities of richer constellations, suggesting that additional capacity
may be available in that regime by optimizing the input constellation; this is discussed in the following sections. The statistical variation in capacity near the peak is about 0.2 bits/s/Hz.
This was estimated from ten simulations of each case: 1) different noise seeds, 2) different random data, and 3) different
timing between WDM channels. The largest variations were observed from different data. Slightly larger variations have been
observed beyond the capacity peak, a region of large nonlinear
distortions. One should point out that even though we present
the fiber capacity results as a function of SNR, unlike for the
AWGN channel, the signal power and noise levels may need to
be considered separately for the fiber channel.

Fig. 33. Spectral efficiency after transmission over 2000 km for ring constellations optimized as described in this paper.

B. Constellation Shaping
The capacity calculations in Section XI-A use uniform ring
constellations, i.e., the ring radii are integer multiples of the
inner ring radius, and each transmitted symbol is taken from
any of the rings with equal probability (i.e., we assume an equal
probability of occupation on each ring). As pointed out in Part
1 of this paper, the unconstrained channel capacity involves an
optimization of the input constellation [see (13)]. We now optimize the ring constellations by varying the ring spacing and
the frequency of occupation on each ring. The optimization is
summarized in Table VI and a detailed description can be found
in [195]. The capacity results with optimized constellations are
shown in Fig. 33. Here, the multiring cases always exceed the
one-ring capacity values, as expected for optimized constellations (compare to the crossings observed in Fig. 32). At high
SNRs, the capacities of the optimized multi-ring constellations
become identical to that of the one-ring constellation, as the
multiring constellations degenerate to a one-ring constellation
for severe nonlinear distortions [195]. The optimization hardly
benefits constellations with more than two rings, so the maximum capacity is not increased. Note that the calculations shown
in Fig. 33 were performed with different data realizations (i.e.,
WDM waveforms) than in the uniform ring constellation case
of Fig. 32, resulting in slight statistical variations in capacities
for the one-ring case.

Fig. 34. Spectral efficiency after transmission over 2000 km for various values of residual dispersion per span.

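A uniform ring constellation of the kind used before optimization can be sketched as follows: radii are integer multiples of the inner ring radius, every ring is chosen with equal probability, and the inner radius is scaled to meet an average power target. The continuous uniform phase on each ring and the function names are assumptions made for this illustration.

```python
import cmath
import math
import random


def uniform_ring_radii(n_rings, average_power):
    """Radii r_i = i * r_1 (i = 1..n_rings) for equiprobable rings, with r_1
    chosen so that the mean of |x|^2 equals average_power."""
    mean_square_index = sum(i * i for i in range(1, n_rings + 1)) / n_rings
    inner_radius = math.sqrt(average_power / mean_square_index)
    return [i * inner_radius for i in range(1, n_rings + 1)]


def draw_symbols(radii, n_symbols, rng):
    """Pick a ring with equal probability and a uniformly random phase on it."""
    return [cmath.rect(rng.choice(radii), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_symbols)]


radii = uniform_ring_radii(n_rings=4, average_power=1.0)
symbols = draw_symbols(radii, n_symbols=2048, rng=random.Random(0))
print("radii:", [round(r, 4) for r in radii])
print("empirical average power:", sum(abs(s) ** 2 for s in symbols) / len(symbols))
```

Constellation shaping in the sense of this section would then replace the equal ring probabilities and equal ring spacing by optimized values; that optimization is not attempted in this sketch.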
C. Effect of Dispersion Map
A measure of the impact of dispersion mapping is presented
in Fig. 34 for 16-ring constellations, a number of rings sufficient for our capacity estimate. The residual dispersion per span
(RDPS) has been varied from full dispersion compensation per
span to the total absence of any in-line dispersion
compensation. In each case, a
dispersion precompensation equal to half the accumulated link
dispersion is used and the dispersion is brought back to zero before coherent detection. One observes that increasing the value
of RDPS (reducing in-line dispersion compensation) increases
capacity, with the maximum capacity being reached in the absence of dispersion compensation.
Two reasons explain this behavior. The first is that back-propagation eliminates all signal-signal intrachannel nonlinearities,
rendering extraneous the function of the dispersion map to reduce intrachannel nonlinearities in such systems. The second
reason is that periodic in-line broadband dispersion compensation recorrelates WDM channels in time, producing a coherent
accumulation of nonlinear distortions rather than the statistical averaging that takes place when no realignment occurs in the absence of in-line
dispersion compensation. A coherent addition of impairments is
more damaging to capacity than a random addition of the same
impairments. It is possible to avoid this recorrelation of WDM
channels by using dispersion compensators that are channelized, i.e., that compensate dispersion independently for each
WDM channel without compensating the relative time delay between them [196]–[199].

Fig. 35. Spectral efficiency after transmission for various distances. All links are without dispersion compensation.

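The difference between coherent and random accumulation can be illustrated with a toy model in which each span contributes a unit-amplitude distortion phasor. This is a deliberately simplified assumption, not the paper's propagation model: identical phases stand in for recorrelated WDM channels, and independent random phases stand in for the absence of realignment.

```python
import cmath
import math
import random


def accumulated_power(contributions):
    """Power of the sum of per-span distortion phasors."""
    return abs(sum(contributions)) ** 2


n_spans = 20
rng = random.Random(0)

# Coherent case: every span adds the same phasor.
coherent_power = accumulated_power([1 + 0j] * n_spans)

# Random case: phases are independent from span to span (averaged over trials).
n_trials = 2000
random_power = sum(
    accumulated_power([cmath.rect(1.0, rng.uniform(0.0, 2.0 * math.pi))
                       for _ in range(n_spans)])
    for _ in range(n_trials)
) / n_trials

print("coherent accumulation:", coherent_power)   # grows as n_spans squared
print("random accumulation:  ", random_power)     # grows as n_spans on average
```

In this toy model the coherent sum grows with the square of the number of spans, while the random sum grows only linearly on average, which is the qualitative reason a coherent addition of impairments is more damaging.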
The optimization of the input constellation described in
Section XI-B and [195] has been applied to systems without
in-line dispersion compensation. We were unable to observe
any statistically significant increase in capacity.
D. Effect of Distance
We evaluated the dependence of capacity on distance, from
500 to 8000 km. The capacity results are displayed in Fig. 35.
The number of rings is 16 for all distances and the number of
symbols used here is 8192. We verified by calculating the capacity of 32 rings for a few SNR points that the number of rings
was sufficient to capture the maximum capacity, even for 500
km. One observes that the SNR at which the capacity peaks decreases by 3 dB for every doubling in distance. Since the noise
level also increases by 3 dB when doubling the distance, the
optimum signal launch power is virtually independent of distance. This can be understood by realizing that higher capacities
are achieved using richer (denser) constellations that are more
sensitive to nonlinear distortions, preventing raising the signal
power even when transmission distances are shortened.
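The distance argument is simple decibel bookkeeping, sketched below. The reference values (a peak SNR of 30 dB and a noise level of -40 dBm at 500 km) are hypothetical placeholders chosen only to make the arithmetic visible, and 3 dB is used as the usual approximation for a factor of two.

```python
import math


def optimum_launch_power_dbm(distance_km, ref_distance_km,
                             ref_peak_snr_db, ref_noise_dbm):
    """Launch power at the capacity peak, assuming the peak SNR drops by 3 dB
    and the noise level rises by 3 dB for every doubling of distance."""
    doublings = math.log2(distance_km / ref_distance_km)
    peak_snr_db = ref_peak_snr_db - 3.0 * doublings
    noise_dbm = ref_noise_dbm + 3.0 * doublings
    return peak_snr_db + noise_dbm  # signal power = SNR (dB) + noise (dBm)


distances_km = (500, 1000, 2000, 4000, 8000)
launch_powers = [optimum_launch_power_dbm(d, 500, 30.0, -40.0) for d in distances_km]
for d, p in zip(distances_km, launch_powers):
    print(f"{d:5d} km: optimum launch power {p:.1f} dBm")
```

The two 3 dB trends cancel, so the computed launch power is the same at every distance, in line with the observation that the optimum signal launch power is virtually independent of distance.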
E. Origin of Capacity Limitations
To determine the origin of fiber capacity limitations, we calculated the capacity of various signal and noise scenarios for the
2000 km case (the middle curve of Fig. 35). The scenario (1) in
Fig. 36 is exactly the middle curve of Fig. 35, which corresponds
to the case of WDM transmission with noise (ASE) and optical
filters (OFs) every 100 km, and no in-line dispersion compensation. Scenario (2) is an unphysical case where ASE is neglected.
At high signal power (SNR greater than 30 dB), the capacity
is identical to scenario (1) where ASE is included. This indicates that capacity is limited by signal-signal interchannel nonlinearities (see Fig. 27) at these SNR values. At lower SNRs, the
capacity increases, even above the linear Shannon limit, since
there is no ASE to limit capacity for this unphysical scenario.
The last two scenarios are for single-channel transmission;
both have higher capacities than the first scenario.

Fig. 36. Spectral efficiency for four signal and noise scenarios for the 2000 km transmission of Fig. 35. (ch: channel.)

The capacity of the single-channel case with OFs [scenario
(3)] rolls off more abruptly than if the OFs are removed [scenario (4)], indicating that the spectral truncation from the presence of optical bandpass filters limits single-channel transmission more strongly than nonlinear distortions between signal
and noise (see Fig. 27). Note that single-channel transmission
can operate at around 10 dB higher SNR (i.e., power) than the
WDM cases. Probably the most important understanding garnered from Fig. 36 is that signal-signal interchannel nonlinearities are responsible for limiting capacity in the WDM systems considered, and that a moderate increase in capacity could
be gained if WDM nonlinear effects could be suppressed.
It is interesting to study the output constellations after
transmission for various powers around the optimum capacity.
All constellation points are individually back-rotated by their
transmit angle since we assume rotational symmetry (see
Section X-C). These constellations are displayed in Fig. 37.
We show four-ring constellations here to facilitate visualization
and because the shapes of the clouds are similar to those of the 16-ring
case that produces slightly larger capacities. At low powers,
the SNR is low, explaining
the large sizes of the clouds. The cloud sizes decrease with
increasing signal power until, at high powers, the
clouds increase in size due to nonlinear effects. The capacity
has already decreased when this power level has been reached.
F. Comparison to Record Capacities
The spectral efficiencies of recent record experiments that have
propagated over more than 300 km and operated at 50 Gb/s
and above are shown in Fig. 38. The capacity limit estimate
for 500 km transmission over SSMF with a loss coefficient of
0.2 dB/km and effective area of 80 μm² is shown for
comparison. The record experiments are about a factor of three
from the calculated capacity limit estimates. Note, however, that
record experiments most often use state-of-the-art optical fibers
that have effective areas larger than 80 μm² and loss coefficients
lower than 0.2 dB/km.

Fig. 37. Four-ring constellations (back-rotated) in the absence of ASE and nonlinearity, and after WDM transmission for various values of input power per
channel P . The optimum power for our capacity estimate is between
3 dB m.

Fig. 38. Spectral efficiency results for recent record experiments. The capacity
limit estimate curve for 500 km transmission of Fig. 35 is shown for comparison. There is about a factor of three between the capacity limit estimate and the
record capacities. The experimental data, labeled (1)–(5) in the legend, are from
[200]–[204]. The upper axes apply only to the capacity limit estimation curve.

We refer to the evaluation of a fiber capacity in this paper
as an estimate because of approximations made along the way.
First and foremost, the general problem of estimating the capacity of a nonlinear channel that includes distributed Kerr-type nonlinearity, dispersion, noise, and optical bandpass filtering has no general formulation yet. In this paper, we operate in a propagation regime (pseudo-linear transmission [97],
[98]) where the signal spectrum remains highly confined. We
use this property to justify using information theory for bandlimited AWGN channels. We are also not searching systematically for an optimum input distribution, potentially overlooking
some capacity gain. In addition, our capacity calculations do
not attempt to take advantage of any memory remaining in the
channel after back-propagation is used, potentially underestimating capacity.
For instance, one possible way to better approach capacity
for a channel that has memory, with or without back-propagation, is to constitute a number of different subchannels through
periodic extractions of symbols in time, each starting at a different symbol, from the 0th to the last. For a sufficiently
long extraction period, the first subchannel involves the symbols transmitted
in the first time slot of every period, the second subchannel consists of the symbols transmitted in the second time slot of every period, and so on up to the last subchannel. Any one of these subchannels, for a sufficiently long extraction period, can by itself be treated as an independent memoryless channel, even though the received symbols suffer impairment through memory effects from symbols
sent in neighboring time slots. The symbols in the first subchannel can be
decoded and any of the now-known nonlinear influence from
symbols exclusively in that subchannel subtracted from the received signals for the remaining subchannels. Then
the symbols for the second subchannel
are decoded and the now-known nonlinear influence of the transmitted symbols decoded in the first two subchannels
together is subtracted from
the received symbols for the subchannels still to be decoded, and
so on. Finally, the last subchannel
would have the now-known nonlinear influences from all the previously decoded subchannel symbols
removed. Since more
and more nonlinear impairment is removed, the capacities of the
subchannels increase with increasing subchannel index. By processing
in this way it is clear we have increased the capacity over just
treating each subchannel without subtracting away impairments
due to already decoded subchannels. What we do in this paper
is pessimistically use the capacity of the first subchannel for every subchannel. Pursuing
such advanced, but much more involved, forms of processing
is beyond the scope of this paper. Indeed, as nonlinear interferences are progressively removed, it suggests that progressively more refined signal constellations be used for subchannels
with larger indexes.
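The periodic extraction that defines the subchannels, together with the order in which previously decoded subchannels become available for interference subtraction, can be sketched as follows. The number of subchannels and the function names are hypothetical, and the actual estimation and subtraction of nonlinear interference is not modeled.

```python
def split_into_subchannels(symbols, n_subchannels):
    """Subchannel k takes the symbols in time slots k, k + K, k + 2K, ...
    where K is the number of subchannels (the extraction period)."""
    return [symbols[k::n_subchannels] for k in range(n_subchannels)]


def known_before_decoding(n_subchannels):
    """For each subchannel, the indexes of the subchannels already decoded
    (and hence available for interference subtraction) when it is processed."""
    return {k: list(range(k)) for k in range(n_subchannels)}


time_slots = list(range(10))          # stand-in for a transmitted symbol sequence
subchannels = split_into_subchannels(time_slots, n_subchannels=4)
for k, slots in enumerate(subchannels):
    print(f"subchannel {k}: time slots {slots}, "
          f"already decoded: {known_before_decoding(4)[k]}")
```

The first subchannel is decoded with no interference knowledge at all, which is why taking its capacity for every subchannel gives a pessimistic estimate.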



One should note that additional capacity is also expected to be
available if WDM channels other than the WDM COI are available at the transmitter or receiver for digital signal processing
for nonlinearity compensation. The availability of additional capacity is suggested by Fig. 36, which shows single-channel transmission having significantly larger capacity limit estimates than
WDM systems in ORNs. As mentioned earlier, the availability of the other WDM channels
is not guaranteed in ORNs (see Fig. 30). In the case
of point-to-point transmission, the full WDM spectrum would
be available for processing.
Finally, unlike many channels, the fiber channel can be
molded to increase the capacity of the channel. This can
be done for instance by reducing fiber loss or increasing fiber
effective area, as discussed in [75].

A framework for the study of the capacity limits of the fiber
channel in optically routed networks has been described. Using
a series of advanced technologies, including advanced modulation formats, digital signal processing for fiber back-propagation, flat square bandpass optical filters and optimum coding,
we showed that a spectral efficiency per polarization of about
9 bits/s/Hz is achievable over 500 km of standard single-mode

Consider one ring with $X = r e^{j\Phi}$, where $\Phi$ is uniformly
distributed over the interval $[0, 2\pi)$. Using (20), we need to compute the entropy $h(Y)$. We have

where $p_{Y|X}$ is given by (16) and $\delta(\cdot)$ is the Dirac-$\delta$ functional, defined by its integral over an open interval.
We continue by making the change of variables

The Jacobian of the transformation is

so the integral in (71) becomes

where $I_0(\cdot)$ is the modified Bessel function of the first kind of
order zero. We thus have

We compute (76) numerically as follows. We generate a large
number of $x = r e^{j\phi}$, where $\phi$ is chosen uniformly over
$[0, 2\pi)$, and then generate $y$ by adding complex Gaussian noise.
Finally, we compute $p_Y(y)$ and the average in (76). For large
arguments, we use the approximation [39, p. 47] $I_0(z) \approx e^{z}/\sqrt{2\pi z}$
so that the term inside the expectation in (76) becomes

A. Information Rates for Several Rings

Suppose next that we have several rings, i.e., we have

where $p_i$ is the probability of choosing ring $i$ that has radius $r_i$. It is easy to check that $p_Y(y)$ in (76) is now

and we have

We compute (80) numerically as described above for one ring.
It remains to maximize $I(X;Y)$ subject to the power constraint

Note that $h(Y)$ is concave in $p_Y$, which is linear in the $p_i$ as seen in (79). Thus, once the $r_i$ are chosen one could optimize the $p_i$ by using convex optimization methods. The best $r_i$
will be a function of the SNR.
A simple approach for choosing the $r_i$ is to use uniform
spacing in the field, i.e., choose the $r_i$ as integer multiples of the inner ring radius, which is chosen to satisfy (81). For example, if the $p_i$ are equal for all $i$ we compute

A better approach for choosing the $r_i$ for a large number of
rings is to use a spacing that approximates Gaussian signaling
in the complex plane. For example, one such choice is a square-root logarithmic spacing, whose scale factor is again chosen to satisfy (81).

The structure of a (single-polarization) balanced coherent optical I/Q receiver is shown in Fig. 39. It consists of a 90-degree optical
hybrid, which combines the incident optical field with an LO in
both quadratures using two beam splitters as well as two pairs
of balanced photodetectors. Their difference signal constitutes
the output signal of the receiver.

Fig. 39. General setup of a coherent optical receiver. Time variables for the various signals are omitted for visual clarity.

With reference to Fig. 39, the optical fields at the four detectors can be written in the following form [117]

where the sign of the signal term originates from energy conservation within the lossless beam splitters with a given power transmission (ideally, 1/2), and the multiplication by the imaginary unit is due
to the 90-degree
phase shift of the LO within the 90-degree optical hybrid. After square-law photodetection with the detector responsivities
(in [A/W]; ideally equal for all detectors), the four electrical signals read

where we omit the time variable for notational simplicity, the
asterisk denotes complex conjugation, and Re and Im stand for the
real and imaginary parts, respectively. The difference signals at
the two outputs of the balanced receivers then read

For an ideally balanced receiver, all
direct detection terms vanish, and we are left with

Finally, these expressions are convolved with the opto-electronic front-end's impulse response, which can, e.g.,
implement a matched filter.
The first term on the right-hand side of (94) and (95) is the
desired signal term, and the second term is the beat term between the LO and the optical noise field, which is the only beat
noise term of relevance. Note that both the beat term between
signal and optical noise as well as the noise-noise beat term are
fully eliminated by ideal balanced detection. The linear conversion of signal and noise optical fields into the electrical regime
also implies that the statistics of the noise are fully preserved.
In particular, a Gaussian optical noise field (ASE) will remain
Gaussian in the electrical domain, in contrast to direct detection
receivers or imbalanced coherent receivers, which generally exhibit non-Gaussian detection noise statistics [105]–[108].
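The cancellation of direct-detection terms by ideal balanced detection can be checked numerically. The sketch below assumes ideal 50/50 couplers, unit responsivities, and the convention that the two detectors of a balanced pair see the sum and the difference of the incident field and the LO (with the LO shifted by 90 degrees in the quadrature branch). Overall scale factors from splitting the power between the two branches are ignored, and the function names are hypothetical.

```python
def balanced_pair(field, lo, responsivity=1.0):
    """Difference current of two square-law detectors behind an ideal
    50/50 coupler fed with the incident field and the LO."""
    upper = abs(field + lo) ** 2 / 2.0
    lower = abs(field - lo) ** 2 / 2.0
    return responsivity * (upper - lower)


def iq_receiver(field, lo):
    """In-phase and quadrature outputs; the quadrature branch uses the LO
    multiplied by j (a 90-degree phase shift)."""
    return balanced_pair(field, lo), balanced_pair(field, 1j * lo)


signal = 0.3 - 0.4j       # assumed signal sample
noise = 0.02 + 0.01j      # assumed optical noise sample
lo = 2.0 + 0.0j           # assumed constant LO field

in_phase, quadrature = iq_receiver(signal + noise, lo)
beat = (signal + noise) * lo.conjugate()
print("in-phase:  ", in_phase, "expected 2*Re:", 2 * beat.real)
print("quadrature:", quadrature, "expected 2*Im:", 2 * beat.imag)
```

Under these assumptions the outputs depend linearly on the incident field (signal plus noise) and contain no squared-magnitude terms, so the signal-noise and noise-noise beat terms do not appear and a Gaussian optical noise field stays Gaussian after detection.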
The variance of the beat noise term between the LO and the
optical noise field can be directly calculated from (94) and (95),
as outlined, e.g., in [119] and [205]. One starts by taking the
expectation of the squared magnitude of the beat-noise term,
assuming that the optical noise field is a zero-mean stochastic process. (An
equivalent procedure can be performed for the beat noise in
the quadrature component.) Writing out the convolution in
its integral form, expanding the real part into the sum of two
complex conjugated terms, expressing the squared magnitude
as a multiplication with the complex conjugate, assuming the
optical noise field to have a vanishing pseudo-autocorrelation, which holds,
e.g., for circularly symmetric ccG noise by the moment theorem
of Gaussian random variables, and taking note of the fact that
the LO field is temporally constant, we arrive at

Further simplifications can be obtained through certain assumptions on the noise autocorrelation
and the detector's impulse response. For example, if we
assume the noise to be white
over the opto-electronic detection bandwidth, the noise autocorrelation reduces to a Dirac-$\delta$ functional weighted by the noise power spectral density, and consequently

where we made use of the power equivalent bandwidth of the real-valued impulse response. This result agrees with the commonly used
beat noise variance approximation derived in [206].
The capacity-achieving input distribution for the AWGN
channel under an average power constraint is circularly symmetric
complex Gaussian (see (22) and Fig. 14). Any other
constellation, discrete or continuous, is sub-optimum for the
AWGN channel. As we shall see in Subsection D below, one
can approach the AWGN channel capacity (23) with discrete constellations by using a sufficiently large number
of well-placed points. Any remaining SNR gap between the
capacity (23) and the information rate (37) of constellations is
called the shaping gain [207], [208]. For example, for $M$-QAM
with uniform input probabilities, the shaping gain grows to 1.53
dB asymptotically for large $M$ (or large SNR). The gap in SNR
per information bit between uncoded and coded transmission
at a certain error ratio is known as the effective (or
net) coding gain [208].
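The 1.53 dB figure is the well-known asymptotic ratio $\pi e/6$ between the average power of a uniform distribution over a square region and that of a Gaussian distribution with the same entropy. A one-line check:

```python
import math

# Ultimate shaping gain: 10*log10(pi*e/6) in dB.
ultimate_shaping_gain_db = 10.0 * math.log10(math.pi * math.e / 6.0)
print(f"ultimate shaping gain: {ultimate_shaping_gain_db:.2f} dB")
```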
Discrete constellations are often compared in terms of their
uncoded symbol or bit error ratio (SER/BER) performance and
their coded information rates (12) or (37) with hard- or soft-decision receivers, respectively. Different constellations may be
optimum at different SNR values; generally, a globally optimum constellation cannot be found. The problem of placing
points in the signal space such that the capacity at a given
SNR is maximized (or the BER is minimized) is nontrivial and
attracted much attention in the early days of digital communications [209]–[211]. An overview of the historical development
of two-dimensional constellations can be found in [212].
For a given constellation, error ratios and capacities depend
on how the noisy samples are processed in the receiver. Upon
observing the channel output $y$, the receiver might make a
decision on the transmitted channel input $x$. The probability
of a symbol error is minimized by the maximum a posteriori
(MAP) criterion, i.e., by a receiver that, for an observed channel
output $y$, decides on that input symbol $x$ which maximizes
$p_{X|Y}(x|y)$. For equally probable input symbols, this is equivalent to a maximum likelihood (ML) decision, which maximizes
$p_{Y|X}(y|x)$ [24]. A receiver that makes such hard decisions
maps each point in the received signal space onto a discrete
symbol; the set of values that are mapped onto a given symbol
form its decision region. If a received value is outside the decision region of the transmitted symbol, a symbol error occurs.
For the AWGN channel, the MAP (or ML) criterion reduces
to deciding on the symbol that is closest to the received value,
i.e., the decision regions are regions of minimum distance [24]
(so-called Voronoi regions [213]; see footnote 11). The SER of an uncoded
hard-decision system is given by

$$\mathrm{SER} = 1 - \sum_i P_X(x_i) \int_{\mathcal{D}_i} p_{Y|X}(y|x_i)\, dy \qquad (99)$$

where $\mathcal{D}_i$ denotes the decision region corresponding to input
symbol $x_i$. For the AWGN channel, $p_{Y|X}$ is a complex
Gaussian PDF (16).

Fig. 40. Capacities of various binary and quaternary formats with soft (solid lines) and hard decision (dashed lines): OOK (1b), BPSK (1a), 2-ASK/2-PSK (2a), QPSK or 4-QAM (2b), ring-1-3 (2c).
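A minimum-distance hard decision and a Monte Carlo estimate of the resulting symbol error ratio can be sketched as follows. QPSK with points at ±1 ± j and a noise standard deviation of 0.5 per quadrature are assumed purely for illustration, and the function names are hypothetical.

```python
import random


def ml_decision(received, constellation):
    """Minimum-distance decision: ML for the AWGN channel, and MAP when all
    symbols are equally probable."""
    return min(constellation, key=lambda symbol: abs(received - symbol))


def simulate_ser(constellation, noise_std, n_trials, rng):
    """Estimate the symbol error ratio by counting wrong hard decisions."""
    errors = 0
    for _ in range(n_trials):
        sent = rng.choice(constellation)
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        if ml_decision(sent + noise, constellation) != sent:
            errors += 1
    return errors / n_trials


qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
ser = simulate_ser(qpsk, noise_std=0.5, n_trials=20000, rng=random.Random(0))
print("estimated SER:", ser)
```

The Monte Carlo count plays the role of the integral over decision regions: a symbol error is recorded whenever the received value falls outside the Voronoi region of the transmitted symbol.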
The most likely error event is to receive within a decision
region that neighbors that of the transmitted symbol. To reduce
the BER, the bit sequences assigned to adjacent symbols should
differ only in one digit; such an assignment is called a Gray
mapping (after Frank Gray, who used the term reflected binary
code [52]). For constellations with one degree of freedom (ASK,
PSK), a Gray mapping is easily found. Square QAM constellations (with an even number of bits per symbol) can be Gray encoded hierarchically
by subdividing the constellation into smaller blocks [212]. Similarly, we have produced Gray mappings for ring constellations,
i.e., ASK/PSK combinations, by encoding the phase and amplitude separately and concatenating the code words appropriately. There are constellations for which a perfect Gray mapping cannot be found; among them are cross-QAM constellations (with an odd number of bits per symbol) [212]. Another simple example for
which a perfect Gray mapping does not exist is a quaternary constellation with 3 points on a ring and 1 point in the origin. We
call this constellation ring-1-3 in the following. For a given bit
mapping, the BER of an uncoded hard-decision system is calculated by integrating over the decision regions similar to (99)
and weighting each probability with the corresponding number
of wrongly decided bits (see Fig. 10).
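For constellations with one degree of freedom, a Gray mapping can be generated directly from the reflected binary code. The following Python sketch labels the points of an $M$-PSK constellation in order of increasing phase and counts the bit differences between neighboring points. The function names are hypothetical, and $M$ is assumed to be a power of two so that the labeling also wraps around the circle.

```python
def gray_code(i):
    """Reflected binary (Gray) code of the non-negative integer i."""
    return i ^ (i >> 1)


def hamming(a, b):
    """Number of bit positions in which the labels a and b differ."""
    return bin(a ^ b).count("1")


def psk_gray_labels(m):
    """Bit labels for the m points of a PSK constellation, ordered by phase."""
    return [gray_code(k) for k in range(m)]


labels = psk_gray_labels(8)
# Bit differences between each point and its next neighbor on the circle.
neighbor_bit_diffs = [hamming(labels[k], labels[(k + 1) % 8]) for k in range(8)]
```

Every entry of `neighbor_bit_diffs` equals one, so a decision in favor of an adjacent symbol causes a single bit error.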
To compare various modulation schemes, it is instructive to
express the bit or symbol error ratio, and the capacity, in terms
of the SNR per information bit, or $\mathrm{SNR}_b$, as in Fig. 11. Recall
from (31) that $\mathrm{SNR}_b = \mathrm{SNR}/\log_2 M$ for uncoded BER and
SER evaluations.

11 The minimum distance decision rule is optimum only for circularly symmetric and monotonically decreasing noise PDFs; the noise encountered in optical communication systems, in particular those with direct-detection receivers,
may deviate significantly from this ideal.

At the receiver, one can distinguish between hard decision
and soft decision systems (see Section II-C). The latter pass
the received continuous channel output to the decoder. This
permits the use of soft-decision error-correcting codes such as
Turbo or low-density parity-check (LDPC) codes. In contrast,
hard-decision receivers make a definitive decision as to which
input symbol the output symbol corresponds to before decoding. The hard decision, which is performed according to the
MAP (or ML) criterion as described above, can be regarded as
a quantization into Voronoi regions or Voronoi cells [213]. In
this quantization process, information is inevitably lost, so that
the achievable information rate (12) is generally less than the
rate achieved by soft decision (37). Note that the bit mapping
function of the digital modulator (see Fig. 4) can be an arbitrary
bijective mapping without loss of information. Therefore, the
achievable information rates for both hard- and soft-decision
systems are independent of the bit mapping used by the digital
modulator.
A geometric property that is used to characterize modulation
schemes is their minimum distance $d_{\min}$, i.e., the smallest Euclidean
distance between any two symbols of the constellation.
The values for $d_{\min}$ given in this Appendix are normalized to
unity average symbol energy. At large values of SNR, the BER
performance is dominated by the minimum distance, whereas
this is not necessarily the case for lower SNRs, where a received
symbol may be mistaken for a symbol that is further away than
the nearest neighbor. A general geometric property of any optimum $M$-ary modulation scheme is that its center of gravity
is the origin, as this minimizes the average symbol energy [22],
[210]. Well-known modulation schemes include amplitude-shift
keying ($M/2$-ASK/2-PSK in this paper), phase-shift keying (PSK),
and quadrature amplitude modulation (QAM). A combination
of amplitude and phase modulation (ASK/PSK) yields signal
constellations where the $M$ points are located on several rings, with
the same number of symbols on each ring. Depending on the rings' amplitudes
and phase offsets, this scheme describes a large number
of constellations, e.g., all different 8-QAM constellations. In
earlier references, this scheme is called a Type I system [212].
To achieve better minimum-distance properties, it is preferable to
allow a variable number of symbols on each ring (Type II in
the literature). Such constellations usually have one point in the
origin and a growing number of points on the outer rings and
are often referred to by the number of points on their rings, e.g.,
ring-1-3-5-7 for the 16-ary constellation depicted in Fig. 44 with
one point in the origin, three points on the first ring, five points
on the second ring, and seven points on the outermost (third)
ring. The densest 2-D lattice is hexagonal (try penny packing
[211]); hexagonal constellations therefore maximize $d_{\min}$ and
are asymptotically (i.e., for large SNR) optimal. Such constellations become symmetrical when the outermost hexagon is completely filled, i.e., for
$M = 7, 19, 37$, etc.
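The normalization to unity average symbol energy can be made concrete with a small Python sketch that computes $d_{\min}$ for a few simple constellations. The helper names `normalize`, `min_distance`, and `psk` are hypothetical, and the point sets below are assumptions about the geometry (for example, ring-1-3 is taken as one point at the origin plus three equally spaced points on a single ring).

```python
import cmath
import math
from itertools import combinations


def normalize(points):
    """Scale a constellation to unity average symbol energy (equiprobable symbols)."""
    energy = sum(abs(p) ** 2 for p in points) / len(points)
    return [p / math.sqrt(energy) for p in points]


def min_distance(points):
    """Smallest Euclidean distance between any two symbols after normalization."""
    pts = normalize(points)
    return min(abs(a - b) for a, b in combinations(pts, 2))


def psk(m, radius=1.0, offset=0.0):
    """m equally spaced points on a ring of the given radius."""
    return [cmath.rect(radius, offset + 2 * math.pi * k / m) for k in range(m)]


ook = [0, 1]                # on-off keying
bpsk = [-1, 1]
qpsk = psk(4)
ring_1_3 = [0] + psk(3)     # assumed geometry: origin plus three points on a ring
psk8 = psk(8)
```

For instance, `min_distance(bpsk)` evaluates to 2 and `min_distance(qpsk)` to $\sqrt{2}$, because both constellations already have unity average symbol energy.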
The error ratio and capacity results presented in the remainder
of this Appendix have been obtained as follows. To calculate the
SER at a given SNR value, the integral in (99) (with $p_{Y|X}$
given by (16)) was evaluated numerically for every point $x_k$
of the constellation. The decision regions $\mathcal{D}_k$ are regions of
minimum distance, i.e., every point in the complex plane that
is closer to $x_k$ than to any other constellation point belongs to
$\mathcal{D}_k$. Using the same decision regions, the BER is numerically
evaluated as

$$\mathrm{BER} = \frac{1}{\log_2 M} \sum_{k=1}^{M} P_X(x_k) \sum_{l \neq k} P(x_l|x_k)\, n_{kl} \qquad (100)$$

where the probability for receiving $x_l$ given that $x_k$ was sent is
given by

$$P(x_l|x_k) = \int_{\mathcal{D}_l} p_{Y|X}(y|x_k)\, dy \qquad (101)$$

and $n_{kl}$ is the number of bit errors made for that symbol error. Using a Gray mapping decreases $n_{kl}$
and hence the BER. To obtain the soft-decision capacity value, (37) was integrated numerically for every input
$x_k$, using (16). To obtain the hard-decision capacity, the transitional probabilities (101) are used to evaluate the mutual information (12) of the resulting discrete channel.

Fig. 41. Symbol and bit error ratios (SER: solid lines, BER: dashed lines) of
various binary and quaternary formats: OOK (1b), BPSK (1a), 2-ASK/2-PSK
(2a), QPSK or 4-QAM (2b), ring-1-3 (2c). The BER of QPSK (2b) is equal to
that of BPSK (1a).

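The last step, turning transition probabilities into a hard-decision information rate, can be sketched in Python as the mutual information of a discrete memoryless channel. The function names are hypothetical. As a simple special case, hard-decision BPSK is assumed to behave as a binary symmetric channel with crossover probability $Q(\sqrt{2\,\mathrm{SNR}})$, which holds for unit symbol energy and complex Gaussian noise of total variance 1/SNR.

```python
import math


def mutual_information(px, trans):
    """I(X;Y) in bits for input probabilities px and trans[k][l] = P(l | k)."""
    n_out = len(trans[0])
    py = [sum(px[k] * trans[k][l] for k in range(len(px))) for l in range(n_out)]
    info = 0.0
    for k, pk in enumerate(px):
        for l in range(n_out):
            joint = pk * trans[k][l]
            if joint > 0.0:  # terms with zero probability contribute nothing
                info += joint * math.log2(trans[k][l] / py[l])
    return info


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bpsk_hard_capacity(snr_db):
    """Hard-decision rate of BPSK with equiprobable inputs (assumed BSC model)."""
    snr = 10 ** (snr_db / 10)
    p = q_function(math.sqrt(2 * snr))  # crossover probability
    return mutual_information([0.5, 0.5], [[1 - p, p], [p, 1 - p]])
```

For an arbitrary constellation, `trans` would hold the numerically integrated transition probabilities between decision regions; the rate is then evaluated for equiprobable inputs in the same way.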
A. Binary and Quaternary Constellations
For binary ($M = 2$) and quaternary ($M = 4$) constellations,
those constellations with the largest minimum distance yield
the best performance in terms of error ratio or capacity at all
values of SNR. The distance between the symbols for OOK [see
Fig. 42(1b)] is $d_{\min} = \sqrt{2}$, whereas BPSK [see Fig. 42(1a)]
has $d_{\min} = 2$ (all at unity average symbol energy). QPSK [see
Fig. 42(2b)] achieves $d_{\min} = \sqrt{2}$, which exceeds the minimum distances
of 2-ASK/2-PSK [see Fig. 42(2a)] and
ring-1-3 [see Fig. 42(2c)]. Fig. 40 depicts the capacities of
these formats for soft- and hard-decision receivers as a function
of the SNR per information bit. All schemes achieve $\log_2 M$
bits/symbol for large SNR. At lower SNRs, QPSK/4-QAM
(2b) has the highest capacity, followed by ring-1-3 (2c) and
2-ASK/2-PSK (2a). The dashed lines show the capacity values
for hard-decision, so that the difference between corresponding
solid and dashed line pairs can be interpreted as the amount
of information that is destroyed by the hard decision process
or, equivalently, as the SNR gain that can be achieved by
processing soft values in the receiver.

Fig. 42. Examples of constellations with $M = 2$, $M = 4$, and $M = 8$ points.

Fig. 43. Capacities of 8-ary formats with soft (solid lines) and hard decision
(dashed lines): 4-ASK/2-PSK (3a), rectangular 8-QAM (3b), 8-PSK (3c),
2-ASK/4-PSK (3d) at a fixed ring amplitude ratio $r_2/r_1$, ring-1-7 (3f), hexagonal (3g), and star
8-QAM (3h).

Fig. 44. Constellations with $M = 16$ points.

Fig. 45. Symbol and bit error ratios (SER: solid lines, BER: dashed lines)
of 16-ary formats: hexagonal (4g), QAM (4b), PSK (4c), ring-1-3-5-7 (4f),
2-ASK/8-PSK (4d), 4-ASK/4-PSK (4e).

Fig. 41 shows the SER for these formats. For binary constellations,
the SER and BER are identical. For quaternary constellations ($M = 4$), at high
SNRs, the dominant error event is the detection of a neighbor
symbol. With Gray mapping, this induces only a single bit error,
so that, for such schemes, BER and SER are asymptotically equal. This can be observed in Fig. 41, where the BER
and SER curves of QPSK (2b) and 2-ASK/2-PSK (2a) merge
for large SNRs.
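The statement in the caption of Fig. 41 that the BER of QPSK equals that of BPSK can be checked with closed-form expressions. The Python sketch below uses the standard textbook formulas for minimum-distance detection on an AWGN channel rather than the numerical integration described above, and it assumes Gray-mapped QPSK in which each quadrature carries one bit. The function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def bpsk_ber(snr_b_db):
    """BER (equal to SER) of BPSK as a function of the SNR per bit in dB."""
    snr_b = 10 ** (snr_b_db / 10)
    return q_function(math.sqrt(2 * snr_b))


def qpsk_ser_ber(snr_b_db):
    """(SER, BER) of Gray-mapped QPSK as a function of the SNR per bit in dB."""
    snr = 2 * 10 ** (snr_b_db / 10)   # SNR per symbol: two bits per symbol
    p = q_function(math.sqrt(snr))    # error probability in one quadrature
    ser = 1 - (1 - p) ** 2            # symbol is correct only if both quadratures are
    ber = p                           # each quadrature decides one bit
    return ser, ber
```

At any given SNR per bit, the second value returned by `qpsk_ser_ber` coincides with `bpsk_ber`, in agreement with the caption of Fig. 41.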
B. 8-ary Constellations
Examples for constellations with $M = 8$ points are depicted
in Fig. 42.
Using only a single quadrature results in bad performance at
$M = 8$; 4-ASK/2-PSK (3a) has a minimum distance of only
$d_{\min} = 2/\sqrt{21} \approx 0.44$. This compares to $d_{\min} = 2\sin(\pi/8) \approx 0.77$ for 8-PSK (3c)
and to the value obtained for 2-ASK/4-PSK (3d), which depends on the ring amplitude
ratio $r_2/r_1$. 2-ASK/4-PSK (3d) subsumes other
well-known constellations: one value of $r_2/r_1$ yields square
8-QAM, and another delivers an optimum minimum distance.
Star 8-QAM (3h) is 2-ASK/4-PSK with equal phase angles in both rings. Rectangular 8-QAM (3b),
which consists of eight points on a 2-by-4 grid, achieves the same minimum
distance as square 8-QAM, $d_{\min} = 2/\sqrt{6} \approx 0.82$. More unusual
8-ary formats are ring-1-7 (3f)
and 8-hex (3g). The latter is the optimum 8-ary constellation
in terms of minimum distance [210]. Notice that the constellation is dc-free; because of its asymmetry, this implies that the
innermost constellation point is not located at the origin.
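The dependence of the minimum distance of 2-ASK/4-PSK on the ring amplitude ratio can be explored with a short Python sketch. The names `two_ring_8ary`, `min_distance`, and `best_ratio` are hypothetical. The geometry is an assumption: four equally spaced points on each ring, with the outer ring rotated by 45 degrees relative to the inner ring by default and by zero degrees for the star arrangement. The optimum ratio is found by a simple grid scan, not by the authors' procedure.

```python
import cmath
import math
from itertools import combinations


def two_ring_8ary(ratio, offset=math.pi / 4):
    """2-ASK/4-PSK: four points on each of two rings with radii 1 and `ratio`.

    `offset` is the assumed rotation of the outer ring relative to the inner ring.
    """
    inner = [cmath.rect(1.0, math.pi / 2 * k) for k in range(4)]
    outer = [cmath.rect(ratio, offset + math.pi / 2 * k) for k in range(4)]
    return inner + outer


def min_distance(points):
    """Minimum Euclidean distance, normalized to unity average symbol energy."""
    energy = sum(abs(p) ** 2 for p in points) / len(points)
    return min(abs(a - b) for a, b in combinations(points, 2)) / math.sqrt(energy)


def best_ratio(offset=math.pi / 4, steps=3000):
    """Grid scan over ring amplitude ratios from 1.0 to 3.0."""
    ratios = [1.0 + 2.0 * i / steps for i in range(steps + 1)]
    return max(ratios, key=lambda r: min_distance(two_ring_8ary(r, offset)))
```

Under these assumptions, the scan with rotated rings settles at a ratio close to $(\sqrt{2}+\sqrt{6})/2 \approx 1.93$ with a normalized minimum distance of about 0.92, whereas the arrangement with equal phase angles peaks near $1+\sqrt{2}$ with a smaller minimum distance.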
Fig. 43 shows soft- and hard-decision capacity values for
8-ary modulation schemes. It is noteworthy that despite its
having the largest minimum distance, 8-hex performs slightly
worse than ring-1-7 for a wide SNR range. The SER of 8-hex
is smaller than that of ring-1-7 only for very large SNRs;
in this range, the capacities of both schemes have
already saturated at 3 bits/symbol.



Fig. 46. Capacities of various 16-ary formats with soft (solid lines) and hard
decision (dashed lines): hexagonal (4g), QAM (4b), PSK (4c), ring-1-3-5-7 (4f),
2-ASK/8-PSK (4d).

Fig. 47. QAM constellations with 32, 64, 128, 256, 512, and 1024 points.

Fig. 48. Capacities of $M$-QAM constellations ($M$ = 8, 16, 32, 64, 128, 256,
512, 1024) with soft (solid lines) and hard decision (dashed lines).

C. 16-ary Constellations
The most prominent 16-ary constellations are (sorted by
minimum distance): 16-hex (4g), 16-QAM (4b), 4-ASK/4-PSK (4e), ring-1-3-5-7 (4f),
2-ASK/8-PSK (4d), and 16-PSK (4c). Fig. 44 contains
illustrations of these constellations.
The error ratio and capacity performance of selected 16-ary modulation schemes is
depicted in Figs. 45 and 46, respectively. As expected, the modulation schemes perform according to their minimum distance
at large SNRs. The hexagonal constellation achieves the highest
capacity; however, the gain over 16-QAM is marginal. Because
of the simpler implementation in practical systems, 16-QAM is
generally preferred over 16-hex. Across the SNR range shown in
Fig. 46, 2-ASK/8-PSK and 4-ASK/4-PSK achieve virtually the
same capacity values. 16-PSK performs worst in terms of error
ratio or capacity at all values of SNR shown. With increasing
$M$, the gain obtained from soft decoding increases.

Fig. 49. SER (solid) and BER (dashed) curves for $M$-QAM constellations.

D. Higher Order Constellations

As $M$ increases, the trend observed for 16-ary constellations continues.
At large SNRs, those constellations with the largest minimum
distance achieve the highest capacity. These are hexagonal
constellations, but QAM constellations perform only slightly
worse. Ring constellations and ASK/PSK combinations are optimum at low SNRs. In absolute values, however, their capacity
gain over QAM is very small. The soft- and hard-decision capacities of the $M$-QAM constellations
are shown in Fig. 48, and the corresponding SER/BER curves in Fig. 49.

ACKNOWLEDGMENT
The authors would like to gratefully acknowledge discussions with T. Freckmann, J. P. Gordon, B. Basch, R. W. Tkach,
A. R. Chraplyvy, A. A. M. Saleh, M. Joindot, S. Korotky,
M. Magarini, S. Colas, A. Gnauck, D. Caplan, P. Andrekson,
G. P. Agrawal, A. Chowdhury, and H. Kogelnik, and many
others not mentioned here.











