Prob Shaping

The document discusses probabilistic constellation shaping (PCS), which has enabled recent record-setting optical fiber communication experiments by providing rate adaptability and sensitivity gains. It examines the performance of PCS compared to other shaping techniques and the impact of non-ideal PCS and forward error correction. It also reviews key assumptions and derivation of optimal parameters for PCS and FEC to maximize information rate.

1590 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 37, NO. 6, MARCH 15, 2019

Probabilistic Constellation Shaping for Optical Fiber Communications
Junho Cho, Member, IEEE, and Peter J. Winzer, Fellow, IEEE, Fellow, OSA

(Invited Paper)

Abstract—We review probabilistic constellation shaping (PCS), which has been a key enabler for several recent record-setting optical fiber communications experiments. PCS provides both fine-grained rate adaptability and energy efficiency (sensitivity) gains. We discuss the reasons for the fundamentally better performance of PCS over other constellation shaping techniques that also achieve rate adaptability, such as time-division hybrid modulation, and examine in detail the impact of sub-optimum shaping and forward error correction (FEC) on PCS systems. As performance metrics for systems with PCS, we compare information-theoretic measures such as mutual information (MI), generalized MI (GMI), and normalized GMI, which enable optimization and quantification of the information rate (IR) that can be achieved by PCS and FEC. We derive the optimal parameters of PCS and FEC that maximize the IR for both ideal and non-ideal PCS and FEC. To avoid plausible pitfalls in practice, we carefully revisit key assumptions that are typically made for ideal PCS and FEC systems.

Fig. 1. (a) Geometric and (b) probabilistic constellation shaping.
Index Terms—Modulation, optical fiber communication, probabilistic constellation shaping, quadrature amplitude modulation.

Manuscript received December 6, 2018; revised January 9, 2019 and February 7, 2019; accepted February 8, 2019. Date of publication February 12, 2019; date of current version March 28, 2019. (Corresponding author: Junho Cho.)
The authors are with Nokia Bell Labs, Holmdel, NJ 07733 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2019.2898855
0733-8724 © 2019 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

IT HAS been known since 1948, when information theory was first established in Shannon’s seminal paper [1], that a continuous Gaussian source distribution achieves the capacity of the additive white Gaussian noise (AWGN) channel when ideal forward error correction (FEC) is assumed. Between the late 1980s and the early 1990s, many studies developed discrete modulation techniques to mimic continuous Gaussian signaling, commonly referred to as constellation shaping [2]–[6]. Constellation shaping, however, did not find broad applications, except for the V.34 voice band modem over telephone lines that was standardized by the International Telecommunications Union (ITU) in 1994 [7]. While constellation shaping attempts to approach the Shannon limit from a modulation perspective, approaching the Shannon limit from a coding perspective saw a new wave of substantial progress with the invention of turbo codes in 1993 [8]. The success of turbo codes led to the rediscovery of low-density parity-check (LDPC) codes [9]–[11], which reduced the coding gap to the (modulation-constrained) Shannon limit to within tenths of a decibel. Remarkably, capacity-approaching soft-decision (SD) FEC codes have a good structure for low-cost parallel application-specific integrated circuit (ASIC) implementation, and have hence been adopted as a quasi-standard in almost every field of communications [12]–[18]. A tremendous amount of research has been published in the golden era of FEC since 1993, and research on constellation shaping was relatively unpopular except for a small number of isolated papers, e.g., [19]–[25]. This may be partly because the shaping gain relative to a square quadrature amplitude modulated (QAM) constellation is fundamentally limited to $\sim$1.53 dB, while the coding gain with modern SD FEC codes easily reaches 10 dB at a bit error ratio (BER) of $10^{-15}$, and partly because there was no effective method to implement capacity-approaching constellation shaping up until very recently.

In the context of optical communications, geometric constellation shaping (GCS) in the form of multi-ring constellations was used to estimate the Shannon limit of the nonlinear optical fiber channel [26], and in the form of iterative polar modulation (IPM) to achieve experimental spectral efficiency (SE) records [27], [28]. Using GCS, the location of the constellation points in the complex plane is arranged to approximate a Gaussian distribution, cf. Fig. 1(a). However, GCS has some serious practical disadvantages that have prevented its commercialization: (i) there is no simple solution to finding locations of the GCS constellation points for arbitrary channel conditions; (ii) the irregular constellation points of GCS increase the complexity of coherent digital signal processing (DSP) for robust signal recovery prior to decoding; and (iii) the general infeasibility of
CHO AND WINZER: PROBABILISTIC CONSTELLATION SHAPING FOR OPTICAL FIBER COMMUNICATIONS 1591

Gray mapping increases the complexity of demapping symbols to soft-decision bit metrics.

It is only four years ago that constellation shaping began to attract significant attention, both in research and in rapidly following productization, in the form of probabilistic constellation shaping (PCS), which shapes the probability of occurrence of the constellation points rather than their locations to approximate Gaussian signaling, as shown in Fig. 1(b). In contrast to GCS, (i) it is simple to optimize these probabilities through a single parameter to match any given channel condition, (ii) constellation points are placed on the rectilinear grid of a square QAM template, which facilitates coherent DSP by robust state-of-the-art square-QAM algorithms, and (iii) Gray mapping facilitates symbol demapping for subsequent SD FEC.

Combinations of PCS and GCS have also been studied in the context of optical communications [29], [30], but these have yielded little gain over pure PCS based on square QAM templates, which already approach the Shannon limit to within 0.1 dB in the AWGN channel. Nevertheless, the combination of GCS and PCS to combat channel nonlinearities [31], [32] is not yet a completely resolved problem.

Fig. 2. Architectures for PCS.

PCS is practically enabled by the probabilistic amplitude shaping (PAS) architecture [33], which shows capacity-approaching performance with a practical shaping and coding implementation and elegantly resolves the long-standing problem of PCS in terms of combining shaping and coding, as visualized in Fig. 2: The problem with previously known PCS architectures is that performing coding after shaping at the transmitter distorts the shaped symbol distribution, as FEC parity bits are generally not shaped, see Fig. 2(a). On the other hand, performing coding before shaping at the transmitter can cause error bursts upon de-shaping erroneously received symbols at the receiver, see Fig. 2(b). The PAS architecture elegantly circumvents this problem by optimally intertwining shaping and coding in a capacity-approaching and efficiently implementable way, cf. Fig. 2(c). Coding and shaping are decoupled through a parallel transmitter architecture (as reviewed in Section II-A) such that their independent optimization leads to jointly optimal performance. This greatly simplifies the implementation of encoder and decoder by allowing the use of off-the-shelf modern SD FEC codes, with minimum to no specific tailoring for the use in a PCS application.

PCS based on the PAS architecture in optical communications was first demonstrated by full-field simulations [34] and transmission experiments [35] in 2015. Record SEs using PCS were then demonstrated across a wide range of transmission distances from 500 km to 4,000 km [36], and a capacity of 65 Tb/s was demonstrated at a record SE using PCS, exploiting C and L bands over 6,600 km in a laboratory experiment [37]. The first field trial over a trans-oceanic submarine cable using PCS achieved a record SE over 5,500 km and 11,000 km [38]. Over a short distance of 50 km, a record SE of 17.3 b/s/Hz was demonstrated using PCS on a 10-subcarrier superchannel [39], [40]. The first commercial transponder using PCS was recently announced [41]. The first real-time experimental demonstration of PCS was reported in [42]. The numerous milestones that have been achieved in only 4 years and the rapid adoption of PCS in the commercial sector bear testimony to the significance of PCS in improving the performance of optical fiber communications.

II. BENEFITS OF PCS IN OPTICAL TRANSMISSION

A. Fiber Channel Capacity and Achievable Information Rates

Fig. 3. Schematic illustration of the AIR of the auxiliary AWGN channel modeling an optical fiber channel. Upper solid line: Gaussian signaling (i.e., AWGN capacity), lower solid line: uniform QAMs with arbitrarily rate-adaptable FEC (i.e., modulation-constrained AIR), staircase lines: uniform QAMs with nine different fixed-rate FEC codes (i.e., modulation- and code-constrained AIRs).

The trade-off between the achievable information rate (AIR) and the transmission distance in a fiber-optic transmission system is illustrated in Fig. 3; as the figure merely visualizes general trade-offs, the exact axis labels that vary depending on the underlying system assumptions are omitted. While the nonlinear fiber channel is a non-AWGN channel with memory, whose general capacity has been estimated but is not exactly known [26], [43], it can under certain assumptions be accurately modeled as a memoryless AWGN channel [26], [44]–[46]. The AIR for this auxiliary AWGN channel can then be maximized over all possible input distributions, assuming ideal FEC coding with infinite code length and unlimited decoder complexity, leading to a capacity estimate of the fiber channel as represented by its auxiliary AWGN channel. The capacity of the auxiliary
AWGN channel, however, does not represent the fundamental fiber channel capacity, but rather a lower bound of it, in the sense that a higher AIR may be obtained if one could further exploit intra- and inter-channel nonlinear interference to enhance the signal-to-noise ratio (SNR). The largest recovered SNR of a fiber channel depends on the network scenario and on assumptions about what information is and is not known to the various transponders within the network. This leads to a variety of capacity estimates for the optical fiber channel [26], [46]. Regardless of the sophistication of the optical fiber channel model, it is a general observation that capacity is maximized by a certain optical signal power. Furthermore, as both optical amplifier noise [26] and nonlinear interference noise (NLIN) at optimized optical channel powers [44]–[46] are, either exactly or to an excellent approximation for Gaussian signaling, linearly proportional to the transmission reach, the channel capacity decreases logarithmically with transmission distance in the high SNR regime, as illustrated by the upper solid line in Fig. 3 [46]. Achieving the auxiliary AWGN channel capacity implies, at each transmission distance, the use of the optimally chosen variance of a Gaussian-shaped modulation as well as optimal FEC performance at an optimally chosen code rate $R_c \in (0, 1]$; hence, attaining the capacity involves the continuous adaptation of both modulation and FEC code rate. If we restrict ourselves to uniform square QAM constellations, the modulation-constrained AIR is decreased to below the modulation-unconstrained AIR (i.e., the capacity of the auxiliary channel), as indicated by the lower solid line in Fig. 3, suffering a loss called the shaping gap due to the non-Gaussianity of the signal. In principle, the QAM-constrained AIR can be reached by optimizing the FEC code rate for each transmission distance with uniform square QAM formats. However, in practical ASIC implementations, only a few code rates may be available, which lets the AIR decrease in the form of a staircase function versus distance, as shown for nine different FEC rates $R_c = 1/2, 2/3, \ldots, 9/10$ in Fig. 3. Despite these many FEC rates, there is still a significant gap to the optimal AIR, as well as a step-like rate/reach trade-off. Compared to uniform QAM, PCS achieves both an arbitrarily fine rate/reach trade-off, even for a single FEC code rate, and bridges the shaping gap to closely approach ultimate performance. These two distinct benefits of PCS will be discussed in the context of contending techniques in the subsequent Sections II-B and II-C.

B. Rate Adaptation

1) Uniform Square QAM With Multi-Rate FEC: In order to perform rate adaptation by FEC alone, as discussed along with Fig. 3, the most common way in communication standards is to use a small family of base matrices for LDPC coding, which are highly optimized using, e.g., density evolution [11] or extrinsic information transfer (EXIT) chart analyses [47], to approach the (modulation-constrained) AIR. Every matrix in the family of FEC codes is made to be a sub-matrix of a larger matrix to establish a good structure for ASIC implementation. The base matrices are then lifted by replacing each non-zero element with a $z \times z$ circulant matrix such that larger matrices can be derived for actual LDPC codes. This construction limits the derived code rates to the form

$$R_c = \frac{z k_b}{z n_b} = \frac{k_c}{n_c}, \tag{1}$$

resulting in a coding overhead of $(n_b - k_b)/k_b$, with $k_b$ and $n_b$ being small positive integers. Hence, practically achievable code rates have a relatively coarse granularity and do not fall on a uniform grid; e.g., the 9 code rates of Fig. 3, $R_c = 1/2, 2/3, \ldots, 9/10$, have increments of 0.167, 0.083, ..., 0.011. Together with a set of uniform $M^2$-ary QAM constellations, this leads to IRs of¹

$$\mathrm{IR} = 2 m R_c \tag{2}$$

in bits/symbol (per two dimensions: in-phase I and quadrature Q), where $m = \log_2 M$. Therefore, with uniform QAM and multi-rate FEC, one can only obtain coarse and irregular IR increments, as shown in Fig. 3.

¹ Note that the IR is a property of the transponder parameters alone, while the AIR is a property of the channel, possibly constrained by assumptions on the transponder as well (cf. Table I in Section III).

Fig. 4. (a) Optimal code rate $R_c^*$ for uniform QAM, and (b) optimal code rate $R_c^*$ (solid lines) and optimal shaping rate $R_s^*$ normalized by $m$ (dashed lines) for PS QAM.

The AIR is determined through the mutual information (MI) or generalized MI (GMI), which will be discussed in Section III in more detail. We denote the AIR under a given transponder constraint by $\mathrm{IR}^*$. Once $\mathrm{IR}^*$ is obtained for a given QAM order and SNR, the required code rate $R_c^*$ is found as, cf. (2),

$$R_c^* = \mathrm{IR}^* / (2m), \tag{3}$$

which is depicted in Fig. 4(a) for various uniform square QAM formats as a function of the SNR. Note that $R_c^*$ denotes the theoretically largest code rate that leads to error-free decoding; any actually used FEC code must have a rate smaller than $R_c^*$. The available code rates may potentially be far from the optimum rate $R_c^*$, which consequently leads to the step function behavior of Fig. 3. In order to obtain finer granularity than given by the “mother codes”, codes can be shortened or punctured [48]–[52]. By shortening or puncturing $s$ code symbols in each codeword, with $s \ll n_c$, code rates of $(k_c - s)/n_c < R_c$ or $k_c/(n_c - s) > R_c$ can be derived with a step size $\Delta R_c \approx 1/n_c$, letting the resulting code rate more closely approach $R_c^*$. Since the code length $n_c$ is generally beyond tens of thousands in
optical fiber communications, the rate discrepancy $\le 1/(2n_c)$ between the optimal $R_c^*$ and the realized $R_c$ could then be made negligible and one could thereby make the steps finer and more closely approach the modulation-constrained AIR of Fig. 3. However, shortening or puncturing induces two problems in practice: (i) shortened or punctured codes generally have a wider gap to the AIR than the mother code [48]–[50], which can often be significant in practice [51], [52], because the optimal degree distribution for the rate of children codes may not necessarily be derived by shortening or puncturing the mother codes, and (ii) their error floor may be raised compared to the mother codes due to the change of their cycle properties, whose adverse effect must be minimized by a laborious optimization process [53]. The impact of suboptimum codes on system performance will be discussed in detail in Section IV.

Fig. 5. The PAS architecture [33].

2) PCS With Variable-Rate and Fixed-Rate FEC: As an alternative to uniform square QAM with variable-rate FEC, PCS can be used for rate adaptation in conjunction with variable-rate or even with fixed-rate FEC. As shown in Fig. 5, the PAS architecture [33] achieves PCS by independently shaping each signal dimension on an $M$-ary pulse amplitude modulation (PAM) template to construct a probabilistically-shaped (PS) $M^2$-QAM constellation. This is possible since the in-phase and quadrature dimensions of a modulated signal are orthogonal.

In what follows, we use the convention that a scalar random variable is denoted by a capital letter (e.g., $X$), a realization of a scalar random variable by a lowercase letter (e.g., $x$), and an alphabet (i.e., a set of allowed symbols) by a script letter (e.g., $\mathcal{X}$, with elements $x_i$). A vector of random variables is denoted by a boldface capital letter (e.g., $\mathbf{X}$), and a realization of a vector random variable by a boldface lowercase letter (e.g., $\mathbf{x}$).

Given the $M$-PAM symbol set $\mathcal{X} = \{\pm 1, \pm 3, \ldots, \pm(M - 1)\}$, the probability of a constellation point $x \in \mathcal{X}$ is commonly generated according to the Maxwell-Boltzmann (MB) distribution

$$P_X(x) = \frac{e^{-\lambda x^2}}{\sum_{x' \in \mathcal{X}} e^{-\lambda x'^2}} \tag{4}$$

with $\lambda \ge 0$, which is the maximum-entropy distribution for $\mathcal{X}$ under an average-power constraint. The rate parameter $\lambda$ controls the entropy rate² $2H(X)$ of the PS QAM signal in bits/symbol, where $H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x)$ denotes the binary entropy. When $\lambda = 0$, the MB distribution degenerates to a uniform distribution with $H(X) = m$ bits/symbol per dimension. As $\lambda$ increases, the MB distribution contains fractional numbers of $1 \le H(X) < m$ bits/symbol per dimension, hence realizing rate adaptation with a reduced average symbol energy. The functional block that performs rate-adaptive shaping in the PAS architecture is the distribution matcher (DM), which transforms uniformly distributed input information bits to MB-distributed PAM output symbols, cf. Fig. 5. The DM generates only the positive amplitudes of the $M$-PAM symbols (a “half-PAM” constellation). A binary systematic FEC encoder generates parity bits that are equally distributed in $\{-1, +1\}$. Since the FEC code is systematic, it does not affect the information bits, so the positive-amplitude DM output remains unchanged by FEC encoding. A symmetric $M$-PAM distribution is then created by multiplying each of the half-PAM symbols with a parity bit acting as a sign bit. In some cases, the sign bit stream also includes some information bits in addition to parity bits, see [33], [54] for details.

² A stationary memoryless information source produces an entropy $H(X_1, \ldots, X_n)$ that grows linearly with time $n$ at a rate $H(X)$, hence the name “entropy rate.”

In the PAS architecture with code rate $R_c$ and entropy rate $2H(X)$, the IR can be calculated as [33], [54]

$$\mathrm{IR} = 2\,\bigl(H(X) - m(1 - R_c)\bigr), \tag{5}$$

in bits/symbol per two dimensions. The term $2H(P_X)$ on the right-hand side of (5) is the largest number of information bits that can be contained within a complex symbol (i.e., per two dimensions) with the distribution $P_X$, which is controlled by the rate parameter $\lambda$ for an MB distribution, and the term $2m(1 - R_c)$ quantifies the FEC overhead in bits/symbol per two dimensions. Assuming bit-metric decoding (BMD, cf. Section III-B), $\mathrm{IR}^*$, i.e., the largest AIR for a given SNR and QAM template, can be obtained by maximizing the GMI over all possible MB distributions $P_X$. The result then also represents the capacity of PAS in the auxiliary AWGN channel. The maximization can be done numerically by an exhaustive search or by the bisection method, since the MB distribution has only one free parameter $\lambda$. Rigorously speaking, $\mathrm{IR}^*$ obtained this way does not represent the unconstrained AWGN channel capacity since (i) the finite number of constellation points in the underlying QAM template imposes a weak constraint on the modulation and (ii) the decoding is BMD. However, the gap between $\mathrm{IR}^*$ and the capacity of the auxiliary AWGN channel is negligible [55].

From (5), the required code rate $R_c$ to achieve an IR with a channel input distribution $P_X$ can be calculated as

$$R_c = 1 - \frac{H(X) - \mathrm{IR}/2}{m}. \tag{6}$$

If the DM produces a length-$n_s$ amplitude block from a length-$(k_s - n_s)$ input bit block, with $k_s > n_s$, the sign path in the PAS architecture transports $n_s$ sign bits per block, regardless of whether they are information bits or parity bits from a shaping point of view, hence the PAS architecture implements a shaping rate of

$$R_s = \frac{k_s}{n_s} \tag{7}$$

in bits/symbol per dimension [54]. While a class of FEC mother codes has a relatively low degree of freedom to choose $k_c$ and $n_c$ without shortening or puncturing, limiting the achievable rate adaptability as discussed above, there exists a DM algorithm that
can finely adjust the number of input bits $k_s - n_s$ to be mapped into a length-$n_s$ block of output symbols, hence achieves granularity of the shaping rate $\Delta R_s = 1/n_s$. Denoting by $X^*$ the $M$-PAM symbols that maximize the AIR through the MB distribution $P_{X^*}$, the small shaping granularity lets the realized $R_s$ closely approach the optimal entropy rate $R_s^* \approx H(X^*)$, by choosing a large block length $n_s$. Figure 4(b) shows the optimal shaping rate $R_s^*$ (dashed lines) that produces the largest AIR in the auxiliary AWGN channel and the corresponding optimal code rate $R_c^*$ obtained using (6) with $R_s^* = H(X^*)$. As shown in the figure, when PCS shares the role of rate adaptation with FEC by adjusting both $R_s$ and $R_c$, the optimal code rate (i) is much higher than when FEC alone performs rate adaptation (Fig. 4(a)), and (ii) occupies a much narrower range [55]; in the case of Fig. 4, we have $0.74 < R_c^* \le 1$ for PCS, instead of $0.18 < R_c^* \le 1$ for uniform QAM.

Fig. 6. Time and ensemble averages of symbols created by (a) TDHM and (b) PCS.
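
The relations (4) to (6) lend themselves to a small numerical experiment. The following Python sketch is illustrative only and is not taken from [33] or [55]: it builds the MB distribution of (4), evaluates $H(X)$, and applies (5) and (6). The function names, the bisection interval, and the example operating point (8-PAM per dimension, $H(X) = 2.5$ bits, $R_c = 0.8$) are assumptions made for illustration. The sketch does not compute the optimal $\lambda$ for a given SNR, which would require maximizing the GMI as described above; it only finds the $\lambda$ that yields a prescribed entropy.

```python
import math


def mb_distribution(M, lam):
    """MB probabilities of Eq. (4) on the M-PAM alphabet {±1, ±3, ..., ±(M-1)}."""
    alphabet = list(range(-(M - 1), M, 2))
    weights = [math.exp(-lam * x * x) for x in alphabet]
    total = sum(weights)
    return alphabet, [w / total for w in weights]


def entropy_bits(pmf):
    """Entropy H(X) in bits per PAM symbol (per dimension)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)


def pas_information_rate(H, m, Rc):
    """Eq. (5): IR in bits/symbol per two dimensions."""
    return 2.0 * (H - m * (1.0 - Rc))


def required_code_rate(H, m, IR):
    """Eq. (6): code rate needed to carry IR with per-dimension entropy H."""
    return 1.0 - (H - IR / 2.0) / m


def lambda_for_entropy(M, target_H, lam_max=10.0, iterations=100):
    """Bisection on lambda over the assumed interval [0, lam_max].

    H(X) falls monotonically from m bits at lambda = 0 towards 1 bit as
    lambda grows, so the target must satisfy 1 < target_H < m.
    """
    lo, hi = 0.0, lam_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        _, pmf = mb_distribution(M, mid)
        if entropy_bits(pmf) > target_H:
            lo = mid  # still too close to uniform: shape more strongly
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    M, m = 8, 3  # 8-PAM per dimension, i.e. a 64-QAM template
    lam = lambda_for_entropy(M, target_H=2.5)
    _, pmf = mb_distribution(M, lam)
    H = entropy_bits(pmf)
    IR = pas_information_rate(H, m, Rc=0.8)
    print(f"lambda = {lam:.4f}, H(X) = {H:.4f}, IR = {IR:.4f} bits/symbol")
    print(f"code rate recovered from (6): {required_code_rate(H, m, IR):.4f}")
```

With these assumed numbers, (5) gives $2(2.5 - 3 \cdot 0.2) = 3.8$ bits/symbol per two dimensions, and (6) returns the code rate 0.8 that was put in, which is a quick consistency check between the two equations.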
The narrow range of optimum FEC rates for PCS suggests the potential use of a single (or a small number of) fixed-rate FEC code(s), whereby rate adaptation is performed (almost) exclusively by PCS. This then gives a code rate-constrained AIR (with a weak modulation constraint given by the underlying QAM template). Remarkably, it was shown in [55] that the performance loss due to fixed-rate FEC with $R_c = 0.8$ does not exceed 0.07 bits/symbol of IR per two dimensions or 0.3 dB of SNR in the AWGN channel, valid for all square $M^2$-QAM templates with $M^2 \le 1024$. This assumes ideal PCS with a DM that maps $k_s - n_s$ information bits into $n_s$ PAM symbols such that the realized shaping rate $R_s = k_s/n_s$ is exactly equal to $H(X^*)$. Such an ideal DM can be implemented, e.g., by constant composition distribution matching (CCDM) [56], which is asymptotically optimal in block length $n_s$. CCDM achieves close to optimal performance already with a relatively small $n_s \le 10^4$, its hardware architecture is universal for all shaping rates $R_s \le 2m$, and at least in principle it is implementable in today’s hardware. Other DM techniques that are lower-complexity than CCDM at small performance loss are discussed in [57]–[63]. In contrast to shaping, it is extremely difficult for FEC to narrow down the last few tenths of a decibel of coding gap; for example, a rate-1/2 irregular and unstructured LDPC code with block length $n_c = 10^7$ and a maximum variable degree of 200 may approach the (modulation-constrained) AIR to within 0.04 dB at BER $= 10^{-6}$ using belief-propagation decoding with up to 2000 decoding iterations [64].

Fig. 7. Three-dimensional square lattice constellation points contained in (a) a cube, and (b) a ball, and their marginal probability distributions as projected onto each coordinate axis. Figure after [69].

3) Time-Division Hybrid Modulation (TDHM): TDHM time-interleaves symbols picked from different uniform square QAM constellations in a deterministic manner to achieve fine granularity of the IR [65], [66]. For example, using $M_1^2$-QAM for a fraction $0 \le \alpha \le 1$ of the time, and $M_2^2$-QAM for a fraction $1 - \alpha$ of the time, TDHM can realize an arbitrary shaping rate of $R_s = \alpha m_1 + (1 - \alpha) m_2$ bits/symbol per dimension, where $m_1 = \log_2 M_1$ and $m_2 = \log_2 M_2$. When averaged over time, TDHM creates the illusion of an MB-like symbol distribution, cf. Fig. 6. However, TDHM is fundamentally different from probabilistic constellation shaping in that a receiver can separate the constituent constellations deterministically using the a priori knowledge of their temporal locations. (The same is true for any other hybrid modulation scheme that uses multiple orthogonal signal dimensions to carry different uniform QAM constellations in a deterministic manner, such as frequency-division hybrid modulation (FDHM) or digital subcarrier multiplexing [67], [68]). Consequently, while the rate granularity of TDHM can be as fine as that of PCS, the performance of TDHM does not reach that of PCS, as we shall see in the following section.

C. Energy Efficiency

In this section, we illustrate various modulation schemes from the perspective of a multi-dimensional signal space, which gives valuable insights into why PCS is needed to closely approach the AWGN capacity. A set of ‘dimensions’ in signal space corresponds to the collection of any physically orthogonal entities, which may be most intuitively viewed as the real-valued (single-quadrature, PAM) amplitudes of consecutive symbols, which are orthogonal in time. Hence, 4 dimensions may be built by 4 successive PAM symbols. Alternatively, 4 dimensions may be built by 2 successive QAM symbols, or by a single polarization-division multiplexed (PDM) QAM symbol.

1) Uniform QAM: As shown in Fig. 7(a), assume an $n_s$-dimensional (hyper-) cube centered at the origin, each side being parallel with each of the $n_s$ coordinate axes. If the cube is uniformly filled with points on a square lattice grid, the projection of any random selection of points onto any Cartesian
coordinate axis yields a uniform distribution of points (i.e., a PAM constellation), regardless of the cardinality $n_s$. Projections on different axes are independent and identically distributed (IID). Conversely, the $n_s$-fold Cartesian product of zero-mean uniform IID distributions confined on a finite support constructs an $n_s$-dimensional uniform cube.

2) Probabilistic Constellation Shaping (PCS): Instead of the cube, now assume an $n_s$-dimensional (hyper-) ball centered at the origin, again with a uniform density of points within (cf. Fig. 7(b)). The projection onto any one of the Cartesian coordinate axes yields a non-uniform probability density. Since the energy of a signal point is quadratic in distance from the origin, a ball centered at the origin, which by definition is enclosed by a constant-radius surface, is the most energy efficient shape to contain a given number of points in multi-dimensional space. When $n_s = 3$, the points within the ball have $\sim$0.27 dB less average energy than those in the cube, assuming the same number of points (i.e., 512) and the same minimum distance (i.e., 2) between them. This relatively small energy saving is due to the small choice of $n_s$ and the small number of points in this example, and increases with $n_s$.

The energy savings can be translated into a better noise resiliency in a communications context as follows: If the minimum distance of the ball is increased to $\sim$2.06 (i.e., $\Delta \approx 1.03$ in Fig. 7(b)) such that the average energy becomes the same for the ball and the cube, i.e., when we compare signals of equal energy or signals of equal SNR, the points in the ball have now an increased minimum distance, hence are more immune to noise. This suggests that transmitting discrete information symbols in $n_s$ dimensions (e.g., by transmitting successively in $n_s$ time slots), the tightly enclosing shape of the symbols should be an $n_s$-dimensional ball instead of an $n_s$-dimensional cube.

As $n_s \to \infty$, and as the number of points on each axis $M \to \infty$, the probability density of the points projected onto each coordinate axis converges to a Gaussian distribution. Conversely, if we generate an IID zero-mean Gaussian signal in every Cartesian coordinate axis, the composite signal in $n_s$-dimensional space forms a uniformly dense ball as $n_s \to \infty$.

Fig. 8. (a) Gaussian distribution of a signal, (b) the two-dimensional ‘fuzzy’ ball with a non-uniform density created by their 2-fold Cartesian product, and (c) the two-dimensional uniform ball with the same entropy as that of (b).

[…] considerations apply to the constellation entropy (a property of the transmitter), it can be shown that Gaussian signals also result in maximum mutual information between the transmitted and the received signals under a transmission energy constraint in the presence of AWGN [70, Ch. 3], [71, Chs. 8, 9].

True Gaussian signaling requires continuous symbols whose support is not confined to within a finite range of amplitudes. This leads to high required digital-to-analog and analog-to-digital converter resolutions and to large peak-to-average power ratios, which are both problematic engineering aspects in practice. If the symbols are discrete and confined to a finite range on each coordinate axis, it can be shown that the distribution that maximizes the entropy is an MB distribution [5], which is a Gaussian distribution sampled at discrete amplitudes across a finite amplitude range, cf. (4). Here, it should be noted that a continuous Gaussian distribution maximizes both the entropy and the AIR under a transmission energy constraint, but the MB distribution is proven to maximize only the entropy, not the AIR, the latter being maximized using the Blahut-Arimoto algorithm [72], [73]. Nevertheless, the AIR obtained by the MB distribution is very close to the AWGN channel capacity [33].

Creating the shaped distribution in each dimension is the task of the DM. For example, the CCDM algorithm creates a target distribution by fixing the number of occurrences of $M$-PAM symbols in each length-$n_s$ block; i.e., symbol $x_i \in \mathcal{X}$, for $i = 1, \ldots, M$, appears exactly $n_i$ times in each of the length-$n_s$ CCDM blocks, where $n_s = \sum_{i=1}^{M} n_i$, thereby creating a probability mass function (PMF) $P_X = [\frac{n_1}{n_s}, \ldots, \frac{n_M}{n_s}]$ that approximates an MB distribution. Therefore, if we mark a constellation point in $n_s$-dimensional space, whose coordinates are specified by the $n_s$ symbols of the CCDM block, its distance from the origin is a constant $\sqrt{\sum_{i=1}^{M} n_i |x_i|^2}$, hence it lies on an $n_s$-dimensional spherical shell. Knowing that almost the entire volume of a ball is near the surface in high-dimensional space (known as the sphere hardening phenomenon [70]), CCDM casts symbols onto the surface of a ball as $n_s \to \infty$, which is a necessary condition to achieve the optimal energy efficiency. A sufficient condition for the optimal energy efficiency under the constraint on the finite support on each coordinate axis is that the DM maps each of the points in a $k_s$-dimensional uniform cube to a distinct point in an $n_s$-dimensional ball (truncated to within a finite support in each dimension), thereby fulfilling $R_s = k_s/n_s \to H(X)$, where $P_X$ is an MB distribution. This is fulfilled by CCDM, as the block length $n_s \to \infty$. However, if the block length $n_s$ is small, $R_s$ is smaller than $H(X)$, and the volume inside the surface of the ball is not negligible, hence CCDM becomes sub-optimal. In this case, a direct mapping of uniformly distributed information bits to a completely filled $n_s$-dimensional ball-like constellation can outperform CCDM, as is done, e.g., by shell mapping [63], [74]–[76].
Note that this statement only applies for ns → ∞, as composite 3) Time-Domain Hybrid Modulation (TDHM): When
points generated from a finite number of IID Gaussian amplitude speaking of ‘constellation shaping’ it is important to distinguish
distributions will generally result in a ‘fuzzy’ ball with a non- between ensemble-averages and time-averages, as visualized in
uniform density, not a true ball with a uniform density, as shown Fig. 6. The time average over all symbols in a data stream may
in Fig. 8. The energy savings (i.e., the shaping gain) of a ball yield the same symbol amplitude distribution for both TDHM
relative to a cube for the same volume approaches πe/6 ≈ 1.53 and PCS, in fact, the overall amplitude distribution averaged
dB [70], Ch. 14] in the limit of ns → ∞. While the above con- over all symbols in a TDHM stream may even be MB, and this

may suggest that TDHM and PCS should perform the same in terms of their shaping characteristics. However, the ensemble average, i.e., the symbol amplitude distribution within a single time slot when averaged across all possible data streams, looks very different for the two shaping schemes, as shown in Fig. 6. In an ideal PCS implementation, ensemble average and time average result in the same distribution, letting the encoding process be stationary and ergodic, and justifying the AIR calculated based on the entropy as in (5) [77]. As an example, consider the TDHM shown in Fig. 9(a) that interleaves symbols drawn from a uniform binary phase-shift keying (BPSK) alphabet $\mathcal{X}_{\text{BPSK}} = [-1, +1]$ and symbols drawn from a 4-PAM alphabet $\mathcal{X}_{\text{4-PAM}} = [-3, -1, +1, +3]$ at a multiplexing ratio $\alpha = 0.5$ such that an MB distribution $P_X = [p_1, \ldots, p_4] = [\tfrac{1}{8}, \tfrac{3}{8}, \tfrac{3}{8}, \tfrac{1}{8}]$ is observed at the receiver when performing a time average. The shaping rate of this TDHM is $R_s = (1 + 2)/2 = 1.5$ bits/symbol per dimension, and the average symbol energy is $\sum_{m=1}^{4} p_m |x_m|^2 = 3$. Note that PCS can create the same time-averaged distribution (hence the same average symbol energy of 3), as shown in Fig. 9(b), but it can do so at a larger shaping rate of $R_s = H(X) \approx 1.8$ bits/symbol per dimension! This shows that achieving a time-averaged MB distribution is only a necessary condition for optimal energy efficiency.

Fig. 9. Time-averaged distributions generated (a) by TDHM, and (b) by PCS.

Fig. 10. Two-dimensional square lattice constellation points contained in (a) a cube, and (b) a rectangle, and their marginal probability distributions in each coordinate axis.

Fig. 11. AIR of various modulation schemes under bit metric decoding in the AWGN channel.
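The numbers in the TDHM example above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `entropy_bits` and `mean_energy` are ours, and the BPSK alphabet is embedded on the 4-PAM grid purely for bookkeeping; the alphabets, the multiplexing ratio of 0.5, and the resulting PMF are those of the example.

```python
import math
from fractions import Fraction

def entropy_bits(pmf):
    """Shannon entropy H(X) in bits for a PMF given as a list of probabilities."""
    return -sum(float(p) * math.log2(float(p)) for p in pmf if p > 0)

def mean_energy(points, pmf):
    """Average symbol energy sum_m p_m |x_m|^2 (exact when the PMF uses fractions)."""
    return sum(p * x * x for x, p in zip(points, pmf))

# TDHM: interleave uniform BPSK and uniform 4-PAM with multiplexing ratio alpha.
alpha = Fraction(1, 2)
pam4 = [-3, -1, 1, 3]
bpsk_pmf = [Fraction(0), Fraction(1, 2), Fraction(1, 2), Fraction(0)]  # BPSK on the 4-PAM grid
pam4_pmf = [Fraction(1, 4)] * 4

# Time-averaged distribution observed at the receiver.
time_avg_pmf = [alpha * b + (1 - alpha) * q for b, q in zip(bpsk_pmf, pam4_pmf)]

# TDHM carries 1 bit in a BPSK slot and 2 bits in a 4-PAM slot.
rate_tdhm = float(alpha * 1 + (1 - alpha) * 2)
# Ideal PCS with the same PMF carries H(X) bits per symbol.
rate_pcs = entropy_bits(time_avg_pmf)
energy = mean_energy(pam4, time_avg_pmf)

print(time_avg_pmf)                    # the PMF [1/8, 3/8, 3/8, 1/8] as Fraction objects
print(rate_tdhm, round(rate_pcs, 3), energy)
```

Both schemes show the same time-averaged PMF and the same average energy, yet the entropy of that PMF exceeds the TDHM rate, which is the gap the text attributes to the difference between time averages and ensemble averages.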
By using different PAM orders in different time slots, TDHM does not construct a ball but rather constructs a (hyper-) rectangle. As it is the cube (with equal side lengths) that is the most energy-efficient shape among all possible rectangles for the same volume, TDHM performs worse than uniform square-QAM; and as the ball is more energy efficient than the cube, PCS performs best. Figure 10 depicts a two-dimensional example, representing square-QAM and TDHM in 2 dimensions. The points in the rectangle have ∼3.3 dB larger average energy than the points in the cube, with the same number of points (i.e., 64) and the same minimum distance (i.e., 2). The same is evident from Figure 11, which shows that TDHM (lower solid line) can cause a loss of ∼2 dB in SNR [69], or 25% loss in AIR [78], relative to optimal PCS (upper solid line) in the AWGN channel, when all bit levels are encoded jointly by a single FEC code of rate 0.8. If used with a fixed rate-0.8 FEC code, TDHM performs worse than uniform square QAMs with rate-adaptable FEC (cf. dashed lines in Fig. 11). A comparison of rate adaptability and performance of the various coded modulation schemes discussed so far is sketched in Fig. 12.

TABLE I
PERFORMANCE METRICS FOR PCS

III. PERFORMANCE METRICS FOR PCS

To quantify system performance of PCS in conjunction with SD FEC, several approaches with and without an explicit focus on their operational meaning have been taken [79]–[85]. Relevant performance metrics are summarized in Table I. The system model used to obtain these metrics is depicted in Fig. 13(a).

Fig. 12. Rate adaptability and performance of various schemes.

Fig. 13. (a) System model, and architecture of decoders for (b) SMD, (c) multi-level coding and multi-stage decoding (discussed in Appendix), and (d) BMD.

We first consider SMD with non-binary FEC codes that have the same number of symbols in the code alphabet as that of the modulation alphabet, i.e., $M$-ary FEC codes for an $M$-ary constellation. (In principle, the code alphabet need not have the same cardinality as the modulation alphabet, but this restriction makes it simple to develop equations and achieves capacity in a memoryless channel.) As briefly discussed in Section II-B.1, a relevant performance metric for SMD is the MI that quantifies an IR that is achievable (hence an AIR) using infinite code length and unlimited decoder complexity. The channel capacity, known as the Shannon limit (SL), is obtained by maximizing the MI over all possible modulation formats (including continuous-amplitude formats with infinitely many “constellation points”). For the more practical class of BMD systems, a bit-to-symbol mapper transforms an $m$-bit sequence $[B_1, \ldots, B_m]$ to an $M$-ary modulation symbol $X$, cf. Fig. 13(a), where $m = \log_2 M$. If the bit sequences are encoded by binary FEC codes and are decoded using BMD, and if we still allow infinite code length and unlimited decoder complexity, the GMI represents an AIR for BMD, in the same way as the MI represents an AIR for SMD. Maximizing the GMI over all possible input symbol distributions for a square QAM template yields an AIR that is constrained in terms of the code alphabet size, the specific modulation template, and the fact that we are using BMD. In this section, without imposing any complexity constraints on FEC and PCS, we review the MI, GMI, and other related metrics in the context of the underlying transponder architecture. A more realistic scenario will be discussed in Section IV, where practical (non-ideal, pragmatic) FEC and complexity-constrained PCS are assumed.

A. Mutual Information

Assume that we use a length-$n_c$ $M$-ary SD FEC code with code rate $R_c = k_c/n_c$ together with an $M$-ary constellation, and the (auxiliary) channel is memoryless AWGN. In this system, based on perfect knowledge of the transmitted symbols $X$, a measurable statistic of the channel is $P_{Y|X}(Y|X)$, i.e., the probability of the observed physical entity $Y$ given the transmitted physical entity $X$, cf. Fig. 13(b), which is often called the channel transition probability. An SD demapper produces the conditional probability $P_{Y|S}(y_i|s)$ of the $i$-th received symbol $y_i$, for $i = 1, \ldots, n_c$, for every symbol $s$ in the code alphabet. In our system where the FEC code has the same alphabet size as the constellation, this is equivalent to the conditional probability $P_{Y|X}(y_i|x)$ given a transmitted modulation symbol $x \in \mathcal{X}$, which is directly fed to the subsequent SMD as an SD decoding metric. An optimal SMD finds a legitimate codeword $\mathbf{x} = [x_1, \ldots, x_{n_c}]$ that is the most likely to be transmitted among all $M^{k_c}$ possible codewords, given the noisy observation $\mathbf{y} = [y_1, \ldots, y_{n_c}]$, by maximizing the product of the channel transition probabilities over all symbols in $\mathbf{y}$, $P_{Y|X}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n_c} P_{Y|X}(y_i|x_i)$ [71, Ch. 7.7]. It should be noted that there are only $M^{k_c}$ codewords that are legitimate for the underlying code, while $M^{n_c}$ uncoded sequences can exist for an $M$-ary alphabet. Therefore, only one out of $M^{n_c}/M^{k_c} = M^{n_c(1-R_c)}$ possible words is a legitimate codeword, which allows a decoder to select the nearest codeword from a noisy non-codeword word. (This illustrates the fundamental operation of FEC.) An AIR of the ideal and optimal SMD is the MI, defined as

$$I(X;Y) \triangleq \mathbb{E}_{X,Y}\!\left[\log_2 \frac{P_{Y|X}(Y|X)}{P_Y(Y)}\right] = \mathbb{E}_{X,Y}\!\left[\log_2 \frac{P_{Y|X}(Y|X)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P_{Y|X}(Y|x')}\right] \tag{8}$$

in bits/symbol per dimension, where $X$ is a random variable for the one-dimensional transmitted signal, $Y$ is a random variable

for the corresponding received signal in the AWGN channel with a known noise variance, and $\mathbb{E}_{X,Y}(\cdot)$ denotes the expectation taken over $X$ and $Y$. Here, by “ideal” SMD, we mean that a code is of infinite length ($n_c \to \infty$), and by “optimal” SMD, we mean that (i) the code rate $R_c$ is chosen to match the channel condition, and (ii) no other codeword has a higher likelihood than the codeword chosen by SMD, since the decoder is (unrealistically) capable of sorting all $M^{k_c}$ codewords in a descending order of their probabilities $P_{Y|X}(\mathbf{y}|\mathbf{x})$. The supremum of (8) over all possible (continuous- and discrete-amplitude) input distributions $P_X$ is the channel capacity, which on an (auxiliary) AWGN channel can be achieved by Gaussian signaling, as discussed in Section II.

Although it is in principle possible to use non-binary codes and SMD in the PAS architecture, PCS in optical systems is commonly implemented using binary codes and BMD for complexity reasons, hence the MI does not generally represent the most relevant performance metric.

B. Generalized Mutual Information

Let us next consider BMD in Fig. 13(a), where a bit-to-symbol mapper transforms a vector $\mathbf{B} \triangleq [B_1, \ldots, B_m]$ to a symbol $X$ of an $M$-PAM constellation. It should be first noted that $B_j$ for $j = 1, \ldots, m$ are logical entities that are not directly cast into the channel, but only through their physical representation $X$, e.g., a voltage or an optical field amplitude. On the other hand, in the context of BMD, the decoder estimates bits and not symbols. Therefore, the decoder operates on $P_{Y|B_j}(Y|B_j)$ instead of $P_{Y|X}(Y|X)$, calculated as

$$P_{Y|B_j}(Y|B_j) \triangleq \frac{P_{B_j,Y}(B_j,Y)}{P_{B_j}(B_j)} = \frac{\sum_{x' \in \mathcal{X}^{(j)}_{b_j(x)}} P_{Y|X}(Y|x')\, P_X(x')}{P_{B_j}(B_j)},$$

where $b_j(x)$ is the $j$-th bit of symbol $x$, and $\mathcal{X}^{(j)}_b \triangleq \{x \in \mathcal{X} : b_j(x) = b\}$ denotes the set of constellation points $x$ whose $j$-th bit representation is $b \in \{0, 1\}$. For example, if we use binary reflected Gray coding (BRGC) {101, 100, 110, 111, 011, 010, 000, 001} to represent the 8-PAM symbol alphabet $\mathcal{X} = \{-7, -5, \ldots, +7\}$, the symbol sets corresponding to a ‘0’ and ‘1’ at the second bit position are $\mathcal{X}^{(2)}_0 = \{-7, -5, +5, +7\}$ and $\mathcal{X}^{(2)}_1 = \{-3, -1, +1, +3\}$, respectively. The conditional probability of observation $y$ given transmitted bit $B_2 = 0$ is then calculated through $P_{Y|X}(Y|X)$ as $P_{Y|B_2}(y|0) = \sum_{x' \in \mathcal{X}^{(2)}_0} P_{Y|X}(y|x')\, P_X(x') / P_{B_2}(0)$. In BMD, we often use the conditional likelihood $P_{B_j|Y}(B_j|Y)$ instead of the conditional probability $P_{Y|B_j}(Y|B_j)$, which can be obtained by Bayes’ rule as

$$P_{B_j|Y}(B_j|Y) = P_{Y|B_j}(Y|B_j)\, \frac{P_{B_j}(B_j)}{P_Y(Y)} = \frac{\sum_{x' \in \mathcal{X}^{(j)}_{b_j(x)}} P_{Y|X}(Y|x')\, P_X(x')}{P_Y(Y)}, \tag{9}$$

which represents the SD decoding metric of BMD. An SD demapper for BMD produces the conditional likelihood $P_{B_j|Y}(b_{i,j}|y_i)$ for the $j$-th bit $b_{i,j}$ of the $i$-th transmitted symbol $x_i$, for $i = 1, \ldots, n_c$, which is then input to the subsequent binary SD decoder, cf. Fig. 13(d). Here, we omit the time index $i$ from $B_{i,j}$ and $Y_i$ since the PCS encoding is a stationary process and the channel is assumed to be stationary as well. For a length-$n_c$ binary code, optimal BMD finds a legitimate codeword $\mathbf{b} = [b_{1,1}, \ldots, b_{n_c/m,\,m}]$ that is the most likely to be transmitted among all $2^{k_c}$ possible codewords by maximizing $P_{\mathbf{B}|Y}(\mathbf{b}|\mathbf{y}) = \prod_{i=1}^{n_c/m} \prod_{j=1}^{m} P_{B_{i,j}|Y}(b_{i,j}|y_i)$, given the noisy observation $\mathbf{y} = [y_1, \ldots, y_{n_c/m}]$. Multiplications in $P_{\mathbf{B}|Y}(\mathbf{b}|\mathbf{y})$ are often removed by taking the logarithm without affecting the decoding performance. In addition, instead of producing two metrics $P_{B_j|Y}(0|y_i)$ and $P_{B_j|Y}(1|y_i)$ for each received symbol $y_i$, the SD BMD demapper can produce only one log-likelihood ratio (LLR) metric

$$\log \frac{P_{B_j|Y}(0|y_i)}{P_{B_j|Y}(1|y_i)}, \tag{10}$$

which will be discussed in Section IV in more detail.

Note that the BMD demapper produces only $\log_2 M$ LLRs per received symbol, whereas an SMD demapper produces $|\mathcal{X}| = M$ LLRs per received symbol, in the form of $\log P_{X|Y}(x_1|y_i)/P_{X|Y}(x|y_i)$ for all $x \in \mathcal{X}$, where $x_1$ denotes the first letter in $\mathcal{X}$. Using the conditional likelihood $P_{B_j|Y}(B_j|Y)$ in (9), the channel transition probability can be approximated as (see Appendix for derivation details and for a clarification of the operational meaning of the obtained results)

$$Q_{Y|X}(Y|X) \triangleq \left[\prod_{j=1}^{m} P_{B_j|Y}(B_j|Y)\right] \frac{P_Y(Y)}{P_X(X)} \approx P_{Y|X}(Y|X). \tag{11}$$

This is called the mismatched decoding metric [86], [87], since $Q_{Y|X}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n_c/m} Q_{Y|X}(y_i|x_i)$ is not a monotonic function of $P_{Y|X}(\mathbf{y}|\mathbf{x})$, causing loss of decoding performance; in other words, the codeword that maximizes $Q_{Y|X}(\mathbf{y}|\mathbf{x})$ does not necessarily maximize $P_{Y|X}(\mathbf{y}|\mathbf{x})$.

Eventually, in analogy to the MI obtained from the exact decoding metric $P_{Y|X}(Y|X)$ as in (8), we obtain the GMI using the approximate decoding metric $Q_{Y|X}(Y|X)$ as

$$\mathrm{GMI}(X;Y) \triangleq \mathbb{E}_{X,Y}\!\left[\log_2 \frac{Q_{Y|X}(Y|X)}{\sum_{x' \in \mathcal{X}} P_X(x')\, Q_{Y|X}(Y|x')}\right] \tag{12}$$

in bits/symbol per dimension. After some mathematical manipulation (see Appendix), we can obtain a compact notation of (12) as

$$\mathrm{GMI}(X;Y) = H(X) - \sum_{j=1}^{m} H(B_j|Y). \tag{13}$$

In case of uniform $P_X$ and independent bit levels, (13) degenerates to

$$\mathrm{GMI}(X;Y) = \sum_{j=1}^{m} I(B_j;Y),$$

which represents an AIR for bit-interleaved coded modulation (BICM) [87]. Importantly, the GMI in (13) has the same form as the “BMD rate” that was first defined in [33], and was proven

to be achievable [82], i.e., there exists a coding scheme such that the post-FEC BER can be made arbitrarily small, as the code length $n_c \to \infty$. The supremum of GMI over all possible $P_X$ is the capacity of PCS under the constraints of a square QAM template and parallel BMD, which can be approximately achieved by an MB distribution.

C. Normalized Generalized Mutual Information

The GMI quantifies the number of information bits per transmitted symbol that can be reliably transmitted through a given channel. After proper normalization of the GMI, we can derive a channel metric that quantifies the number of information bits per transmitted bit, which is called the normalized GMI (NGMI) [79]–[81]. Since the GMI is an AIR of the PAS architecture as per our above discussion, we can replace the IR of (5) with the GMI to obtain the unit-less metric

$$\mathrm{NGMI}(X;Y) = 1 - \frac{H(X) - \mathrm{GMI}(X;Y)}{m}. \tag{14}$$

It immediately follows from (13) and (14) that

$$\mathrm{NGMI}(X;Y) = 1 - \frac{1}{m} \sum_{j=1}^{m} H(B_j|Y). \tag{15}$$

Note that the asymmetric information (ASI) introduced in [85] from a different perspective has the same form as the NGMI.

Suppose that we have obtained the maximum $\mathrm{GMI}(X;Y)$ over all possible distributions of $X$, and denote by $X^*$ the channel input that maximizes the GMI, i.e., $X^* = \arg\max_X \mathrm{GMI}(X;Y)$. It should be noted that $\mathrm{GMI}(X^*;Y)$ and $\mathrm{NGMI}(X^*;Y)$ are not associated with potential imperfections of the underlying transceiver technology but represent channel metrics of the auxiliary AWGN channel, whereas $R_c^*$ in (1) and $R_s^*$ in (7) are the transceiver metrics that need to be used to achieve $\mathrm{GMI}(X^*;Y)$, cf. Table I. In other words, the channel’s transmission capabilities as given by the channel metric $\mathrm{GMI}(X^*;Y)$ are fully exhausted when we use ideal binary FEC with the optimal code rate $R_c^* = \mathrm{NGMI}(X^*;Y)$ and ideal PCS with the optimal shaping rate $R_s^* = H(X^*)$, as summarized in Table I.

IV. IMPACT OF SUB-OPTIMAL PCS AND FEC

GMI and NGMI quantify theoretic channel metrics as well as the limit of transceiver technologies without imposing any constraints on implementation complexity. However, they are also very useful to evaluate and optimize systems with sub-optimal pragmatic PCS and FEC, if shaping and coding gaps are properly taken into account. In what follows, let $P_{X^\dagger}$ denote the distribution that maximizes the IR using a sub-optimal PCS and/or FEC scheme.

A. Sub-Optimal FEC, Optimal Shaping

Since sub-optimal FEC requires more redundancy (i.e., a lower code rate) than optimal FEC to achieve error-free decoding, the largest code rate for error-free decoding is

$$R_c^\dagger = \mathrm{NGMI}(X^\dagger;Y) - \delta_c,$$

where $\delta_c \ge 0$ is the coding gap. The coding gap $\delta_c$ quantifies how many fewer information bits are conveyed per transmitted bit by sub-optimal coding compared to optimal coding. In [80], FEC decoding simulations are performed using spatially-coupled (SC) LDPC codes, showing that for each code rate $R_c^\dagger$ the coding gap $\delta_c$ is nearly constant across various distributions $P_X$ and $M^2$-QAM constellation templates; the most widely applicable coding gap is conservatively chosen as that of the smallest constellation (i.e., 4-QAM) since it is marginally the greatest among those of all $P_X$ and $M^2$-QAM. This implies that we can with high confidence declare error-free decoding if the channel metric $\mathrm{NGMI}(X^\dagger;Y)$ is larger than the code rate $R_c^\dagger$ by $\delta_c$, independent of modulation. Therefore, if only one FEC code of rate $r_c$ with coding gap $\delta_c$ is available, the optimal shaping distribution can be obtained as

$$P_{X^\dagger} = \arg\max_{P_X} \mathrm{GMI}(X;Y) \quad \text{subject to} \quad \mathrm{NGMI}(X;Y) \ge r_c + \delta_c, \tag{16}$$

where the last condition ensures error-free decoding. It has been shown in [88] that the loss of IR due to a constant coding gap $\delta_c$ is approximately proportional to $m$, which importantly implies that a small QAM template with moderate shaping performs better than a large QAM template with strong shaping.

B. Optimal FEC, Sub-Optimal Shaping

If the FEC is optimal but PCS is sub-optimal, we can calculate the IR loss $\Delta_s \ge 0$ that quantifies how many fewer information bits are transmitted per transmitted symbol per dimension by a sub-optimal shaping algorithm compared to optimal shaping. Formally, the IR loss due to a sub-optimal shaping algorithm is $\Delta_s \triangleq H(X^\dagger) - R_s^\dagger$, where $X^\dagger$ is the output of the sub-optimal shaping algorithm whose probability approximately follows an MB distribution and $R_s^\dagger \le H(X^\dagger)$ is the realized shaping rate (7). If we define a shaping gap as the unit-less ratio of the IR loss relative to the entropy $H(X^\dagger)$ for the same average symbol energy $\mathbb{E}[|X^\dagger|^2]$, i.e.,

$$\delta_s \triangleq \frac{\Delta_s}{H(X^\dagger)} = 1 - \frac{R_s^\dagger}{H(X^\dagger)},$$

the IR obtained by sub-optimal shaping is a fraction $R_s^\dagger/H(X^\dagger) = 1 - \delta_s \le 1$ of the GMI. Also, by substituting $R_s^\dagger$ for $H(X^\dagger)$ in (5), we have

$$\mathrm{IR} = R_s^\dagger - m\left(1 - R_c^\dagger\right) = H(X^\dagger)(1 - \delta_s) - m\left(1 - R_c^\dagger\right)$$

in bits/symbol per dimension. It follows from $\mathrm{IR} = \mathrm{GMI}(X^\dagger;Y)(1 - \delta_s)$ that the optimal code rate that achieves this IR is then given by

$$R_c^\dagger = 1 - \frac{H(X^\dagger) - \mathrm{GMI}(X^\dagger;Y)}{m}(1 - \delta_s) = \mathrm{NGMI}(X^\dagger;Y)(1 - \delta_s) + \delta_s. \tag{17}$$

If only one FEC code of rate $r_c$ with $\delta_c = 0$ is available, and if the shaping gap $\delta_s$ is known for every realized MB distribution $P_X$ of the shaping algorithm, the optimal distribution for this

sub-optimal shaping scheme can be obtained by

$$P_{X^\dagger} = \arg\max_{P_X} \mathrm{GMI}(X;Y) \quad \text{subject to} \quad \mathrm{NGMI}(X;Y) \ge \frac{r_c - \delta_s}{1 - \delta_s}. \tag{18}$$

Fig. 14. Shaping gap $\delta_s$ of (a) CCDM and (b) MR-PCDM, with 4-PAM (dotted lines), 8-PAM (dashed lines), and 16-PAM (solid lines) constellations. The numbers in parentheses show the block length $n_s$.

In Fig. 14, the shaping gap is estimated for two sub-optimal finite-length DM algorithms: (a) CCDM [56], and (b) low-complexity multi-rate prefix-free code DM (MR-PCDM) [89]. For some cases in Fig. 14, the shaping gap is almost constant across the realized shaping rates $R_s$, e.g., when $n_s \ge 320$ with CCDM, or when $n_s \ge 1280$ with MR-PCDM for 8- and 16-PAMs. This constant shaping gap simplifies the maximization problem (18) and facilitates the analysis, as will be shown in the following section.

C. Sub-Optimal FEC and Sub-Optimal Shaping

Combining the above results, if FEC and PCS are both sub-optimal, after penalizing GMI by $\delta_c$ and $\delta_s$, the IR can be calculated as

$$\mathrm{IR} = \left[\mathrm{GMI}(X^\dagger;Y) - m\delta_c\right](1 - \delta_s). \tag{19}$$

At the same time, from (5) we have

$$\mathrm{IR} = H(X^\dagger)(1 - \delta_s) - m\left(1 - R_c^\dagger\right) \tag{20}$$

in bits/symbol per dimension. Therefore, the optimal code rate is given by relating (19) and (20) as

$$R_c^\dagger = \left[\mathrm{NGMI}(X^\dagger;Y) - (1 + \delta_c)\right](1 - \delta_s) + 1. \tag{21}$$

In case where a fixed rate-$r_c$ code is used with a pre-determined coding gap $\delta_c$, if we assume a nearly constant shaping gap of $\delta_s$ over all $R_s$, (20) shows that the practically achieved IR is increasing with the entropy rate $H(X^\dagger)$. Therefore, the optimal distribution $P_{X^\dagger}$ for the sub-optimal PCS and FEC can be obtained by solving

$$P_{X^\dagger} = \arg\max_{P_X} H(X) \quad \text{subject to} \quad \mathrm{NGMI}(X;Y) \ge \frac{r_c - \delta_s}{1 - \delta_s} + \delta_c. \tag{22}$$

Fig. 15. IR of non-ideal PCS with $\delta_s = 0.025$, and non-ideal FEC with $\delta_c = 0$ (solid lines), $\delta_c = 0.05$ (dashed lines), and $\delta_c = 0.10$ (dotted lines).

Figures 15 and 16 show the IRs obtained by solving the maximization problem (22), with coding gaps $\delta_c = 0, 0.05, 0.1$, and shaping gaps $\delta_s = 0, 0.025, 0.05$. Note that state-of-the-art soft-decision FEC codes have coding gaps of $\delta_c \le 0.1$, and CCDM with a block length $\ge 480$ produces shaping gaps of $\delta_s \lesssim 0.02$, as shown in Fig. 14(a). It can be seen from Figs. 15 and 16 that a reduction of the coding gap is crucial to more closely approach the channel capacity, but the effect of a shaping gap on the IR is relatively insignificant, except at high SNR where the IR is saturated. In practice, however, the IR at high SNR can be recovered if uniform QAM is used.
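The relations (14) and (17)–(22) are simple enough to sanity-check numerically. The following Python sketch is an illustration only: the function names are ours, and the operating point ($H = 2.6$, $\mathrm{GMI} = 2.2$, $m = 3$, $\delta_c = 0.05$, $\delta_s = 0.025$) is a hypothetical example rather than a value taken from the figures. It checks that the code rate of (21) makes the two IR expressions (19) and (20) agree, and that the NGMI threshold in (22) is the inverse of (21).

```python
def ngmi(h, gmi, m):
    """NGMI from the entropy H(X), the GMI, and the bits per symbol m, as in (14)."""
    return 1.0 - (h - gmi) / m

def optimal_code_rate(ngmi_value, delta_c=0.0, delta_s=0.0):
    """Code rate of (21); gives (17) for delta_c = 0 and NGMI - delta_c for delta_s = 0."""
    return (ngmi_value - (1.0 + delta_c)) * (1.0 - delta_s) + 1.0

def ir_from_gmi(gmi, m, delta_c, delta_s):
    """IR of (19): the GMI penalized by the coding gap and the shaping gap."""
    return (gmi - m * delta_c) * (1.0 - delta_s)

def ir_from_rates(h, m, code_rate, delta_s):
    """IR of (20): realized shaping rate minus the FEC redundancy."""
    return h * (1.0 - delta_s) - m * (1.0 - code_rate)

def ngmi_threshold(r_c, delta_c, delta_s):
    """Right-hand side of the constraint in (22) for a fixed code rate r_c."""
    return (r_c - delta_s) / (1.0 - delta_s) + delta_c

# Hypothetical operating point, chosen only for illustration.
H, GMI, M_BITS = 2.6, 2.2, 3
DELTA_C, DELTA_S = 0.05, 0.025

n = ngmi(H, GMI, M_BITS)
rc = optimal_code_rate(n, DELTA_C, DELTA_S)
print(round(n, 4), round(rc, 5))
print(round(ir_from_gmi(GMI, M_BITS, DELTA_C, DELTA_S), 5),
      round(ir_from_rates(H, M_BITS, rc, DELTA_S), 5))
```

Setting `delta_c` or `delta_s` to zero in `optimal_code_rate` recovers the two single-impairment cases of Sections IV-A and IV-B, which is a convenient way to see that (21) generalizes both.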

Fig. 16. IR of non-ideal FEC with $\delta_c = 0.05$, and non-ideal PCS with $\delta_s = 0$ (solid lines), $\delta_s = 0.025$ (dashed lines), and $\delta_s = 0.05$ (dotted lines).

V. IMPLEMENTATION ASPECTS

A. Distribution Matching

CCDM [56] is one of the most commonly assumed DMs for PCS in optical communications, since (i) it is asymptotically optimal in block length, simplifying the analysis of experimental results, and (ii) it can be implemented on the same architecture for any shaping rate. However, CCDM uses modified arithmetic coding that involves multiplications, divisions, and comparisons of real numbers. An approximate implementation of CCDM using fixed-point operations still needs multiplications and divisions of (possibly large) integer numbers, see, e.g., [90]; the effect of limited numerical precision on the performance can be analyzed following [91]. Furthermore, and more fundamentally, arithmetic coding is intrinsically serial in each block, and the block size should be large to approach capacity, which impedes parallel ASIC implementations.

Approaches to design a DM algorithm that is computationally efficient and also good for parallelization include PCDM, which was used in early demonstrations of PCS in optical communications [36]. This scheme is implemented using small look-up tables (LUTs), and a framing method for PCDM is presented in [58], [59], which allows variable-length prefix-free codes to be contained in a fixed-length block. Without framing, PCDM approaches the optimal energy efficiency to within a few tenths of a dB across a wide range of shaping rates with very fine granularity. Even after framing, the shaping gap is kept to within a few tenths of a dB if the block length is large. Like CCDM, PCDM is also an asymptotically good algorithm in block length. Indeed, the asymptotically good performance of CCDM and PCDM is intrinsic, since they are both designed to avoid the exponential complexity associated with the direct mapping of uniformly distributed information bits to an $n_s$-dimensional ball of constellation points, by generating IID MB distributions in large dimensions. Conversely, though, both schemes can result in a significant shaping gap for short block lengths. However, for short block lengths (i.e., small dimensions $n_s \le 100$), it is feasible by today’s implementation technology to perform direct mapping of information bits to an $n_s$-dimensional ball-like constellation in an algorithmic manner, e.g., using shell mapping [63], [74]–[76]. Shell mapping was adopted in dial-up and fax modems in the mid-1990s, as defined in the ITU-T Standard V.34 [7]. Obviously, the shaping performance of shell mapping is somewhat sub-optimal due to its limited block length.

B. SD FEC

In BMD, the SD decoding metric of the $j$-th bit level can be represented by an LLR as (cf. (10))

$$L_j(y) = \log \frac{P_{B_j|Y}(0|y)}{P_{B_j|Y}(1|y)} = \log \frac{\sum_{x \in \mathcal{X}^{(j)}_0} P_{Y|X}(y|x)\, P_X(x)}{\sum_{x \in \mathcal{X}^{(j)}_1} P_{Y|X}(y|x)\, P_X(x)}. \tag{23}$$

When symbol $X$ is uniformly distributed over $\mathcal{X}$, the LLR reduces to

$$L_j(y) = \log \frac{\sum_{x \in \mathcal{X}^{(j)}_0} P_{Y|X}(y|x)}{\sum_{x \in \mathcal{X}^{(j)}_1} P_{Y|X}(y|x)}$$

and an efficient piecewise-linear approximation of $L_j$ [92] leads to near-optimal decoding performance in belief-propagation decoding of LDPC codes [93]. If we use PS QAM with an MB distribution $P_X$ in an AWGN channel with noise variance $\sigma^2$, the LLR $L_j$ can be calculated from the received signal $y$ as

$$L_j(y) = \log \frac{\sum_{x \in \mathcal{X}^{(j)}_0} \exp\!\left(-\frac{(y-x)^2}{2\sigma^2} - \lambda x^2\right)}{\sum_{x \in \mathcal{X}^{(j)}_1} \exp\!\left(-\frac{(y-x)^2}{2\sigma^2} - \lambda x^2\right)}. \tag{24}$$

Let us denote the symbols that have a dominant effect in decoding as

$$x_0 = \arg\max_{x \in \mathcal{X}^{(j)}_0} \exp\!\left(-\frac{(y-x)^2}{2\sigma^2} - \lambda x^2\right)$$

and

$$x_1 = \arg\max_{x \in \mathcal{X}^{(j)}_1} \exp\!\left(-\frac{(y-x)^2}{2\sigma^2} - \lambda x^2\right),$$

respectively, from the numerator and the denominator of (24). Then, the max-log approximation of (24) using $x_0$ and $x_1$ leads to an LLR estimate of the $j$-th bit level, which is a linear function of $y$ as

$$\tilde{L}_j(y) = \underbrace{\frac{x_0 - x_1}{\sigma^2}}_{(a)}\, y - \underbrace{\left(\frac{1}{2\sigma^2} + \lambda\right)}_{(b)} \left(x_0^2 - x_1^2\right). \tag{25}$$

The term (a) is a function of the channel parameter $\sigma$, and the term (b) is a joint function of the channel ($\sigma$) and shaping ($\lambda$). When PS QAM degenerates to uniform QAM by $\lambda = 0$, (25) reduces to the conventional linear LLR approximation of uniform QAM, $L_j(y) = (x_0 - x_1)/\sigma^2 \times (y - (x_0 + x_1)/2)$. Figure 17 shows the exact and piecewise-linear approximate LLRs of the first 3 bit levels (i.e., of one quadrature) of a PS 64-QAM constellation with BRGC [101, 100, 110, 111, 011, 010, 000, 001].
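The exact LLR of (24) and its max-log approximation (25) can be compared directly in a short Python sketch. The code below is illustrative only: the function names are ours, the 8-PAM alphabet and BRGC labels are those quoted in the text (with bit levels counted from the leftmost label bit, which reproduces the sets $\mathcal{X}^{(2)}_0$ and $\mathcal{X}^{(2)}_1$ given in Section III-B), and the values of $\sigma^2$ and $\lambda$ are arbitrary assumptions.

```python
import math

# 8-PAM alphabet with the BRGC labels quoted in the text (leftmost bit = bit level 1).
POINTS = [-7, -5, -3, -1, 1, 3, 5, 7]
LABELS = ["101", "100", "110", "111", "011", "010", "000", "001"]

def metric(y, x, sigma2, lam):
    """Exponent of (24): AWGN log-likelihood plus the MB log-prior (up to a constant)."""
    return -(y - x) ** 2 / (2.0 * sigma2) - lam * x * x

def subsets(j):
    """Split the alphabet into the points whose j-th bit (1-based) is 0 and 1."""
    x0 = [x for x, b in zip(POINTS, LABELS) if b[j - 1] == "0"]
    x1 = [x for x, b in zip(POINTS, LABELS) if b[j - 1] == "1"]
    return x0, x1

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def llr_exact(y, j, sigma2, lam):
    """Exact LLR of bit level j, as in (24)."""
    s0, s1 = subsets(j)
    return (logsumexp([metric(y, x, sigma2, lam) for x in s0])
            - logsumexp([metric(y, x, sigma2, lam) for x in s1]))

def llr_maxlog(y, j, sigma2, lam):
    """Max-log LLR of (25), built from the dominant symbols x0 and x1."""
    s0, s1 = subsets(j)
    x0 = max(s0, key=lambda x: metric(y, x, sigma2, lam))
    x1 = max(s1, key=lambda x: metric(y, x, sigma2, lam))
    slope = (x0 - x1) / sigma2                                    # term (a)
    offset = (1.0 / (2.0 * sigma2) + lam) * (x0 * x0 - x1 * x1)   # term (b) times (x0^2 - x1^2)
    return slope * y - offset

# Assumed example parameters (not taken from the paper's figures).
SIGMA2, LAM = 1.0, 0.05
for y in (-2.0, 0.3, 4.1):
    print(y, [round(llr_exact(y, j, SIGMA2, LAM), 3) for j in (1, 2, 3)],
          [round(llr_maxlog(y, j, SIGMA2, LAM), 3) for j in (1, 2, 3)])
```

Because each sum in (24) has four terms, the exact and max-log values can differ by at most $\log 4$ in magnitude, and the difference is smallest where one symbol dominates each sum.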

Fig. 17. Exact (solid lines) and piecewise-linear approximate (dashed lines) LLRs of the (a) first, (b) second, and (c) third bit levels, with $H(X) = 2.6$ on the 64-QAM template at SNR = 13 dB.

The piecewise-linear approximation (dashed) yields LLRs that are indistinguishable from the exact (solid) LLRs when their magnitudes (i.e., the absolute values $|\tilde{L}_j(y)|$ on the y-axis) are small; i.e., the approximation error is negligible for those LLRs that play a crucial role in SD decoding. The approximation leads to an increasing discrepancy as the magnitude grows. This, however, has an insignificant impact on decoding performance, and almost no impact at high SNR.

Fig. 18. Probabilities of 8-PAM constellation points $X$ and LLRs $L_j$ with (a) $H(X) = 2.7$ at SNR = 12.9 dB, and (b) $H(X) = 1.8$ at SNR = 5.1 dB.

SD FEC codes are typically designed by assuming symmetric LLR distributions, which occur, e.g., as a consequence of BICM with uniform QAM constellations. However, when a constellation is strongly shaped such that its shaping rate $R_s$ is much smaller than $2m$, LLRs can have highly asymmetric distributions. Therefore, performance loss can be observed in pragmatic FEC decoding if the constellation is strongly shaped. As an example, the probability distribution of the input symbol, $P_X(X)$, and that of the LLR, $P_{L_i}(L_i)$, are evaluated for two shaping rates $R_s = 2H(X)$ with $H(X) = 2.7$ and 1.8 in Fig. 18, using the 64-QAM template, $m = 3$, and the BRGC [101, 100, 110, 111, 011, 010, 000, 001] in each dimension. The LLR distributions are obtained at SNRs of 12.9 dB and 5.1 dB, respectively, which are the SNRs that achieve capacity with $R_s^* = 2H(X)$. With weak shaping of $H(X) = 2.7$, all LLR distributions are symmetric or close to symmetric. With strong shaping of $H(X) = 1.8$, however, $L_2$ and $L_3$ become highly asymmetric around zero. In particular, at the second bit level, $P(L_2 < 0) \approx 0.9963$ and $P(L_2 > 0) \approx 0.0037$, hence the hard decision (HD) value of the demapper output is almost always bit 1. This results in the effect that the code bits are nearly shortened at the second bit level, which amounts to 1/3 of the code bits. In the extreme case where $\lambda \to \infty$, hence $H(X) = 1$, only the innermost constellation points have a non-zero probability of occurrence, which results in complete shortening of the code bits that are mapped to outer symbols (i.e., the code bits at the second and third bit levels in this example). Therefore, in order to support strong shaping, FEC codes should be designed to be robust to shortening at the bit levels with a highly asymmetric LLR distribution. With this, and looking back at the fact that a fixed coding gap causes a loss of IR that increases with $m$, overly strong shaping of a large QAM template, such as used, e.g., in [94], should be avoided for pragmatic FEC decoding. Instead, one should switch to a smaller QAM template whenever the shaping gap becomes small enough with weak shaping.

C. Pre-FEC Performance Metrics and HD FEC

In terms of reporting raw transmission performance (pre-FEC BER or Q-factors), attention has to be paid to how these are determined for a shaped constellation. When performing HD of the received symbols according to the maximum a posteriori (MAP) decision rule, the decoder chooses $\hat{x} = \arg\max_{x \in \mathcal{X}} P_{X|Y}(x|y)$. If we represent the constellation symbols $X$ in a binary form $\mathbf{B} = [B_1 \ldots B_m]$ using the BRGC, two nearest-neighbor symbols $x_L, x_R \in \mathcal{X}$ of a received symbol $y$ differ in only one bit. Denote this bit level by $j$. Then, the MAP decision can be made as $\hat{x} = \arg\max_{x \in \{x_L, x_R\}} P_{B_j|Y}(b_j(x)|y)$. In other words, $\hat{x} = x_L$ if $P_{B_j|Y}(b_j(x_L)|y) > P_{B_j|Y}(b_j(x_R)|y)$, and $\hat{x} = x_R$ otherwise. Therefore, an optimal decision boundary is given by the value $d$ such that $P_{B_j|Y}(b_j(x_L)|d) = P_{B_j|Y}(b_j(x_R)|d)$. That is, $P_{B_j|Y}(b_j(x_L)|d)/P_{B_j|Y}(b_j(x_R)|d) = 1$, hence $L_j(d) = 0$ (cf. (23)). The HD boundaries are a union of the HD boundaries of constituent bit levels. Since evaluation of the exact $L_j(y)$ is complicated as shown in (24), and knowing that the piecewise-linear approximation of the LLR is very accurate in low-magnitude regimes (near $L_j(y) = 0$), we can obtain the HD boundaries from (25) by solving $\tilde{L}_j(\tilde{d}) = 0$. Therefore, from (25), the union of HD boundaries of all bit levels is given by

$$\tilde{d}_k = \left(1 + 2\lambda\sigma^2\right) \frac{x_k + x_{k+1}}{2}, \tag{26}$$

for the $M$-PAM constellation $\mathcal{X} = [x_1, \ldots, x_M]$ with $x_1 < \ldots < x_M$. Notice that the boundary $\tilde{d}_k$ is a joint function of

the channel ($\sigma$) and shaping ($\lambda$). For uniform PAM with $\lambda = 0$, the boundaries in (26) reduce to $\tilde{d}_k = (x_k + x_{k+1})/2$, which is independent of the parameters $\sigma$ and $\lambda$. Interestingly, given $\sigma$ and $\lambda$, the PS PAM boundaries are simply a constant multiple of the uniform PAM boundaries, hence forming a uniform grid; e.g., if $D_U = [\tilde{d}_1, \ldots, \tilde{d}_{M-1}] = [-6, -4, \ldots, +6]$ for uniform 8-PAM, then $D_{PCS} = [-6\Delta, -4\Delta, \ldots, +6\Delta]$ for PS 8-PAM, where $\Delta = 1 + 2\lambda\sigma^2$. Therefore, when PCS is used, the raw pre-FEC BER should be calculated based on $D_{PCS}$ instead of $D_U$. Figure 19 shows that, when PCS is performed, $Q_U = 10\log_{10}\mathrm{BER}$ obtained with the uniform 16-QAM boundaries $D_U$ can lead to > 0.5 dB of loss compared to $Q_{PCS}$ obtained with the optimal PS 16-QAM boundaries $D_{PCS}$.

Fig. 19. Penalty in Q factor when the HD boundaries of uniform 16-QAM are used for PS 16-QAM.

VI. CONCLUSION

In this paper, we reviewed the theoretical foundation of PCS and discussed the merits of PCS over other constellation shaping techniques. Information-theoretic measures such as MI, GMI, and NGMI were explained with their operational meanings. Based on these measures, optimization problems were formulated for systems with optimal and sub-optimal PCS/FEC schemes, the solution of which provides the parameters of PCS and FEC that achieve the maximum IR under a given channel condition. We revisited important assumptions that are commonly made for ideal PCS and FEC systems, and addressed the potential pitfalls that should be avoided in practice.

APPENDIX

In this section, we show that $Q_{Y|X}(Y|X)$ in (11) represents an approximated channel transition probability that derives the GMI, in analogy to $P_{Y|X}(Y|X)$ that derives the MI, and we illustrate its operational meaning.

When binary codes are used with non-binary signaling, multi-level coding and multi-stage decoding (MLC-MSD) [95], illustrated in Fig. 13(c), can achieve the SMD capacity. MLC-MSD encodes each bit level using a different binary FEC code whose rate is matched to the bit level, and decodes the received symbols in a successive manner from the 1st constituent bit level to the $m$-th bit level, where each of the $m$ decoders uses the (error-free) output of all the preceding decoders (cf. Fig. 13(c)). The reason why MLC-MSD can achieve the SMD capacity will become clear below.

First, recall that $\mathbf{B}$ is merely a binary representation of the non-binary symbol $X$, hence we have

$$P_{Y|X}(Y|X) = P_{Y|\mathbf{B}}(Y|\mathbf{B}) = \frac{P_{\mathbf{B}|Y}(\mathbf{B}|Y)\, P_Y(Y)}{P_{\mathbf{B}}(\mathbf{B})}, \tag{27}$$

where the last equation is again due to Bayes' rule. Here, using the chain rule, the likelihood can be rewritten as

$$
\begin{aligned}
P_{\mathbf{B}|Y}(\mathbf{B}|Y) &= P_{B_1 \ldots B_m|Y}(B_1 \ldots B_m|Y) \\
&= P_{B_1|Y}(B_1|Y) \times P_{B_2|B_1 Y}(B_2|B_1 Y) \cdots \times P_{B_m|B_1 \ldots B_{m-1} Y}(B_m|B_1 \ldots B_{m-1} Y) \\
&= \prod_{j=1}^{m} P_{B_j|B_1 \ldots B_{j-1} Y}(B_j|B_1 \ldots B_{j-1} Y).
\end{aligned} \tag{28}
$$

For example, with the BRGC {101, 100, 110, 111, 011, 010, 000, 001} of the 8-PAM constellation $\mathcal{X} = \{-7, -5, \ldots, +7\}$, we have $\mathcal{X}_{00}^{(1,2)} \triangleq \{x \in \mathcal{X} : b_1(x) = 0, b_2(x) = 0\} = \{+5, +7\}$, hence $P_{Y|B_1 B_2}(y|00) = \sum_{x' \in \mathcal{X}_{00}^{(1,2)}} P_{Y|X}(y|x')$ is calculated using the measurable $P_{Y|X}(Y|X)$, which in turn can be plugged into $P_{B_2|B_1 Y}(0|0y) = P_{Y|B_1 B_2}(y|00)\, P_{B_2}(0) / P_Y(y)$ to evaluate (28). Eventually, by plugging (28) into (27), we obtain an equivalent form of $P_{Y|X}(Y|X)$ expressed using the metrics of BMD as

$$P_{Y|X}(Y|X) = \left[ \prod_{j=1}^{m} P_{B_j|B_1 \ldots B_{j-1} Y}(B_j|B_1 \ldots B_{j-1} Y) \right] \frac{P_Y(Y)}{P_X(X)}. \tag{29}$$

Using (29), an optimal MLC-MSD that consists of $m$ different length-$n$ binary FEC codes finds $m$ codewords $\mathbf{b}_1, \ldots, \mathbf{b}_m$ such that the product of the channel transition probabilities

$$\prod_{i=1}^{n} \underbrace{\left[ \prod_{j=1}^{m} P_{B_{i,j}|B_{i,1} \ldots B_{i,j-1} Y}(b_{i,j}|b_{i,1}, \ldots, b_{i,j-1}, y_i) \right]}_{(a)} \underbrace{\frac{P_Y(y_i)}{P_X(x_i)}}_{(b)} \tag{30}$$

is maximized, where $b_{i,j}$ denotes the $j$-th bit of the transmitted symbol $x_i$. When $n_c \to \infty$, the terms (a) and (b) can be factored out of the product as $\lim_{n_c \to \infty} \prod_{i=1}^{n_c} [(a)(b)] = \lim_{n_c \to \infty} \prod_{i=1}^{n_c} (a) \cdot \lim_{n_c \to \infty} \prod_{i=1}^{n_c} (b)$, since both limits separately exist. In particular, due to the asymptotic equipartition property (AEP), $\prod_{i=1}^{n_c} (b)$ becomes concentrated at a fixed value $2^{-n_c (H(Y) - H(X))}$ that is independent of the choice of the codeword (i.e., independent of decoding), as $n_c \to \infty$. Therefore, decoding in MLC-MSD is a function only of the remaining term $\prod_{i=1}^{n_c} (a)$. The chain operations in (a) describe the successive decoding procedure of the MLC-MSD depicted in Fig. 13(c). This shows why MLC-MSD can achieve the SMD capacity using binary codes and successive BMD.

MLC-MSD has a high complexity due to the use of multiple different FEC codes and a long latency due to successive decoding of bit levels, and is hence not very practical. The parallel BMD architecture shown in Fig. 13(d) is a low-complexity, low-latency alternative to MLC-MSD. Parallel BMD approximates the term (a) in (30) without relying on knowledge of any other bit levels as

$$P_{B_j|B_1, \ldots, B_{j-1}, Y}(B_j|B_1, \ldots, B_{j-1}, Y) \approx P_{B_j|Y}(B_j|Y). \tag{31}$$

By plugging the right-hand side of (31) into (29), it follows that the channel transition probability $P_{Y|X}(Y|X)$ can be approximated as (11), in which the term $P_Y(Y)/P_X(X)$ has a vanishing effect on decoding as the code length increases, for the same reason as in (30). Therefore, optimal BMD finds a codeword that maximizes the product of $P_{B_j|Y}(B_j|Y)$ over the received symbols that span all of the $n_c$ codeword bits. Note that the mismatched decoding metric in (11) is valid for arbitrary distributions $P_X$, whereas the mismatched decoding metric has been derived for uniform $P_X$ in most cases. In the special case where $P_X$ is uniform and the bit levels $B_1, \ldots, B_m$ are independent of each other, such as in BICM with BRGC, the mismatched decoding metric can be simplified as $Q_{Y|X}(Y|X) = \prod_{j=1}^{m} q_{Y|B_j}(Y|B_j)$, where $q_{Y|B_j} \triangleq \sum_{x' \in \mathcal{X}_{b_j(x)}^{(j)}} P_{Y|X}(Y|x')$, as derived in [87].

We now derive (13). First, by substituting (11) into (12), we have

$$
\begin{aligned}
\mathrm{GMI}(X;Y) &= \mathbb{E}_{X,Y}\left[ \log_2 \frac{Q_{Y|X}(Y|X)}{\sum_{x' \in \mathcal{X}} P_X(x')\, Q_{Y|X}(Y|x')} \right] \\
&= \sum_{x \in \mathcal{X}} \int_y P_{X,Y}(x,y) \log_2 \frac{Q_{Y|X}(y|x)}{\sum_{x' \in \mathcal{X}} P_X(x')\, Q_{Y|X}(y|x')}\, dy \\
&= \sum_{x \in \mathcal{X}} \int_y P_{X,Y}(x,y) \log_2 \frac{\prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y)}{P_X(x) \sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} P_{B_j|Y}(b_j(x')|y)}\, dy \\
&= \sum_{x \in \mathcal{X}} \int_y P_{X,Y}(x,y) \log_2 \frac{\prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y)}{P_X(x) \sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} \frac{P_{B_j Y}(b_j(x'), y)}{P_Y(y)}}\, dy \\
&= \sum_{x \in \mathcal{X}} \int_y P_{X,Y}(x,y) \log_2 \frac{\prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y)}{\frac{P_X(x)}{P_Y(y)^m} \sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} P_{B_j Y}(b_j(x'), y)}\, dy.
\end{aligned} \tag{32}
$$

In the denominator of the log term,

$$
\begin{aligned}
\sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} P_{B_j Y}(b_j(x'), y) &= \sum_{[b_1 \ldots b_m] \in \{0,1\}^m} \prod_{j=1}^{m} P_{B_j Y}(b_j, y) \\
&= \sum_{[b_2 \ldots b_m] \in \{0,1\}^{m-1}} P_{B_1 Y}(0, y) \prod_{j=2}^{m} P_{B_j Y}(b_j, y) + \sum_{[b_2 \ldots b_m] \in \{0,1\}^{m-1}} P_{B_1 Y}(1, y) \prod_{j=2}^{m} P_{B_j Y}(b_j, y) \\
&= \left( P_{B_1 Y}(0, y) + P_{B_1 Y}(1, y) \right) \sum_{[b_2 \ldots b_m] \in \{0,1\}^{m-1}} \prod_{j=2}^{m} P_{B_j Y}(b_j, y) \\
&= P_Y(y) \sum_{[b_2 \ldots b_m] \in \{0,1\}^{m-1}} \prod_{j=2}^{m} P_{B_j Y}(b_j, y).
\end{aligned}
$$

By recursion, therefore, we obtain

$$\sum_{x' \in \mathcal{X}} \prod_{j=1}^{m} P_{B_j Y}(b_j(x'), y) = P_Y(y)^m.$$

By substituting this into (32), we have

$$
\begin{aligned}
\mathrm{GMI}(X;Y) &= \sum_{x \in \mathcal{X}} \int_y P_{X,Y}(x,y) \log_2 \frac{\prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y)}{P_X(x)}\, dy \\
&= \underbrace{\sum_{x \in \mathcal{X}} \int_y \left[ P_{X,Y}(x,y) \log_2 \prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y) \right] dy}_{(a)} \; \underbrace{- \sum_{x \in \mathcal{X}} \int_y \left[ P_{X,Y}(x,y) \log_2 P_X(x) \right] dy}_{(b)}.
\end{aligned}
$$

The term (a) can be developed as

$$
\begin{aligned}
(a) &= \sum_{x \in \mathcal{X}} \int_y \left[ P_{X,Y}(x,y) \log_2 \prod_{j=1}^{m} P_{B_j|Y}(b_j(x)|y) \right] dy \\
&= \sum_{j=1}^{m} \int_y \sum_{x \in \mathcal{X}} \left[ P_{X,Y}(x,y) \log_2 P_{B_j|Y}(b_j(x)|y) \right] dy \\
&= -\sum_{j=1}^{m} H(B_j|Y).
\end{aligned}
$$

The term (b) can be developed as

$$
\begin{aligned}
(b) &= -\sum_{x \in \mathcal{X}} \left[ \int_y P_{X,Y}(x,y)\, dy \right] \log_2 P_X(x) \\
&= -\sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x) \\
&= H(X).
\end{aligned}
$$

Therefore, we obtain

$$\mathrm{GMI}(X;Y) = (a) + (b) = H(X) - \sum_{j=1}^{m} H(B_j|Y),$$

which is equal to (13).
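The identity obtained at the end of this derivation can be checked numerically. The sketch below is illustrative only and is not the authors' code: it assumes 8-PAM with the BRGC labels quoted above, a Maxwell-Boltzmann input distribution $P_X(x) \propto e^{-\lambda x^2}$ with an arbitrarily chosen $\lambda = 0.05$, a real-valued AWGN channel with $\sigma = 1.5$, and a Riemann sum in place of the integral over $y$. The names `POINTS`, `LABELS`, `gauss`, and `evaluate` are hypothetical. It evaluates the GMI once from the definition, using the metric $Q_{Y|X}$ built from $P_{B_j|Y}$, and once as $H(X) - \sum_j H(B_j|Y)$.

```python
import math

# Assumed setup for illustration: 8-PAM, BRGC labels from the text,
# Maxwell-Boltzmann P_X with parameter LAM, AWGN with std. dev. SIGMA.
POINTS = [-7, -5, -3, -1, 1, 3, 5, 7]
LABELS = ["101", "100", "110", "111", "011", "010", "000", "001"]
LAM, SIGMA, STEP = 0.05, 1.5, 0.02
M = 3  # bits per PAM symbol

weights = [math.exp(-LAM * x * x) for x in POINTS]
P_X = [w / sum(weights) for w in weights]


def gauss(y, x):
    """Gaussian channel density P_{Y|X}(y|x)."""
    return math.exp(-(y - x) ** 2 / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))


def evaluate():
    """Return (GMI from the definition, H(X) - sum_j H(B_j|Y), H(X)) in bits."""
    gmi_def = 0.0     # E[log2 Q(y|x) / sum_x' P_X(x') Q(y|x')]
    sum_h_bj_y = 0.0  # sum over bit levels of H(B_j|Y)
    lo = POINTS[0] - 8 * SIGMA            # symmetric grid [lo, -lo]
    n = int(round(2 * abs(lo) / STEP))
    for k in range(n + 1):
        y = lo + k * STEP
        p_xy = [P_X[i] * gauss(y, x) for i, x in enumerate(POINTS)]
        p_y = sum(p_xy)
        # p_b[j][b] = P_{B_j|Y}(b|y), marginalized over symbols with bit j equal to b
        p_b = [[0.0, 0.0] for _ in range(M)]
        for i in range(len(POINTS)):
            for j in range(M):
                p_b[j][int(LABELS[i][j])] += p_xy[i] / p_y
        # Bit-metric decoding metric: Q(y|x) = prod_j P_{B_j|Y}(b_j(x)|y) * P_Y(y) / P_X(x)
        q = [
            math.prod(p_b[j][int(LABELS[i][j])] for j in range(M)) * p_y / P_X[i]
            for i in range(len(POINTS))
        ]
        denom = sum(P_X[i] * q[i] for i in range(len(POINTS)))
        for i in range(len(POINTS)):
            if p_xy[i] > 0.0:
                gmi_def += STEP * p_xy[i] * math.log2(q[i] / denom)
        for j in range(M):
            for b in (0, 1):
                if p_b[j][b] > 0.0:
                    sum_h_bj_y -= STEP * p_y * p_b[j][b] * math.log2(p_b[j][b])
    h_x = -sum(p * math.log2(p) for p in P_X)
    return gmi_def, h_x - sum_h_bj_y, h_x


if __name__ == "__main__":
    gmi_def, gmi_closed, h_x = evaluate()
    print(f"GMI (definition)        = {gmi_def:.6f} bits")
    print(f"H(X) - sum_j H(B_j|Y)   = {gmi_closed:.6f} bits")
    print(f"H(X)                    = {h_x:.6f} bits")
```

The two values should agree up to the discretization error of the grid, because the sum of $\prod_j P_{B_j|Y}(b_j(x')|y)$ over all symbols equals 1 whenever the labeling is a bijection onto $\{0,1\}^m$. Changing `LAM` or `SIGMA` changes both values together, which is one way to see that the identity does not rely on a uniform $P_X$.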

REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
[2] G. D. Forney Jr., R. G. Gallager, G. R. Lang, F. M. Longstaff, and S. U. Qureshi, “Efficient modulation for band-limited channels,” IEEE J. Sel. Areas Commun., vol. SAC-2, no. 5, pp. 632–647, Sep. 1984.
[3] A. R. Calderbank and L. H. Ozarow, “Nonequiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 36, no. 4, pp. 726–740, Jul. 1990.
[4] G. D. Forney Jr., “Trellis shaping,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 281–300, Mar. 1992.
[5] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for Gaussian channels,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913–929, May 1993.
[6] F.-W. Sun and H. C. A. van Tilborg, “Approaching capacity by equiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 1714–1716, Sep. 1993.
[7] A Modem Operating at Data Signalling Rates of Up to 33 600 Bit/S for Use On the General Switched Telephone Network and On Leased Point-To-Point 2-Wire Telephone-Type Circuits, ITU-T Recommendation V.34, Feb. 1998.
[8] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1,” in Proc. IEEE Int. Conf. Commun., Geneva, Switzerland, May 1993, vol. 2, pp. 1064–1070.
[9] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[10] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 32, no. 18, pp. 1645–1646, Aug. 1996.
[11] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
[12] Unified high-speed wireline-based home networking transceivers – System architecture and physical layer specification, ITU-T Recommendation G.9960, Jul. 2015.
[13] IEEE Standard for Ethernet, IEEE Std 802.3, Sep. 2015.
[14] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Std 802.11, Dec. 2016.
[15] IEEE Standard for Air Interface for Broadband Wireless Access Systems, IEEE Std 802.16, Aug. 2012.
[16] Second Generation Framing Structure, Channel Coding and Modulation Systems for Broadcasting, Interactive Services, News Gathering and Other Broadband Satellite Applications; Part 1 (DVB-S2), ETSI EN 302 307-1, Nov. 2014.
[17] A. Leven and L. Schmalen, “Status and recent advances on forward error correction technologies for lightwave systems,” J. Lightw. Technol., vol. 32, no. 16, pp. 2735–2750, Aug. 2014.
[18] G. Tzimpragos, C. Kachris, I. B. Djordjevic, M. Cvijetic, D. Soudris, and I. Tomkos, “A survey on FEC codes for 100 G and beyond optical networks,” IEEE Commun. Surv. Tut., vol. 18, no. 1, pp. 209–221, Jan.–Mar. 2016.
[19] L. Duan, B. Rimoldi, and R. Urbanke, “Approaching the AWGN channel capacity without active shaping,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 1997, p. 374.
[20] D. Raphaeli and A. Gurevitz, “Constellation shaping for pragmatic turbo-coded modulation with high spectral efficiency,” IEEE Trans. Commun., vol. 52, no. 3, pp. 341–345, Mar. 2004.
[21] S. L. Goff, B. Sharif, and S. Jimaa, “Bit-interleaved turbo-coded modulation using shaping coding,” IEEE Commun. Lett., vol. 9, no. 3, pp. 246–248, Mar. 2005.
[22] F. Schreckenbach and P. Henkel, “Signal shaping using non-unique symbol mappings,” in Proc. Allerton Conf. Commun. Control Comput., Sep. 2005, pp. 1–10.
[23] B. K. Khoo, S. Le Goff, B. Sharif, and C. Tsimenidis, “Bit-interleaved coded modulation with iterative decoding using constellation shaping,” IEEE Trans. Commun., vol. 54, no. 9, pp. 1517–1520, Sep. 2006.
[24] S. Kaimalettu, A. Thangaraj, M. Bloch, and S. McLaughlin, “Constellation shaping using LDPC codes,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 2366–2370.
[25] H. Cronie, “Signal shaping for bit-interleaved coded modulation on the AWGN channel,” IEEE Trans. Commun., vol. 58, no. 12, pp. 3428–3435, Dec. 2010.
[26] R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity limits of optical fiber networks,” J. Lightw. Technol., vol. 28, no. 4, pp. 662–701, Feb. 2010.
[27] I. B. Djordjevic, H. G. Batshon, L. Xu, and T. Wang, “Coded polarization-multiplexed iterative polar modulation (PM-IPM) for beyond 400 Gb/s serial optical transmission,” in Proc. Opt. Fiber. Conf., San Diego, CA, USA, Mar. 2010, Paper OMK2.
[28] T. H. Lotz et al., “Coded PDM-OFDM transmission with shaped 256-iterative-polar-modulation achieving 11.15-b/s/Hz intrachannel spectral efficiency and 800-km reach,” J. Lightw. Technol., vol. 31, no. 4, pp. 538–545, Feb. 2013.
[29] J.-X. Cai et al., “70.46 Tb/s over 7,600 km and 71.65 Tb/s over 6,970 km transmission in C+L band using coded modulation with hybrid constellation shaping and nonlinearity compensation,” J. Lightw. Technol., vol. 36, no. 1, pp. 114–121, Jan. 2018.
[30] R. T. Jones, T. A. Eriksson, Y. P. Metodi, and D. Zibar, “Deep learning of geometric constellation shaping including fiber nonlinearities,” in Proc. Eur. Conf. Opt. Commun., Rome, Italy, Sep. 2018, Paper We1F.5.
[31] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long, dispersion-uncompensated fiber links,” Opt. Express, vol. 21, no. 22, pp. 25685–25699, Nov. 2013.
[32] T. Fehenberger, A. Alvarado, G. Böcherer, and N. Hanik, “On probabilistic shaping of quadrature amplitude modulation for the nonlinear fiber channel,” J. Lightw. Technol., vol. 34, no. 21, pp. 5063–5073, Nov. 2016.
[33] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[34] T. Fehenberger, G. Böcherer, A. Alvarado, and N. Hanik, “LDPC coded modulation with probabilistic shaping for optical fiber systems,” in Proc. Opt. Fiber Commun. Conf., Los Angeles, CA, USA, Mar. 2015, Paper Th.2.A.23.
[35] F. Buchali, G. Böcherer, W. Idler, L. Schmalen, P. Schulte, and F. Steiner, “Experimental demonstration of capacity increase and rate-adaptation by probabilistically shaped 64-QAM,” in Proc. Eur. Conf. Opt. Commun., Valencia, Spain, Sep. 2015, Paper PDP.3.4.
[36] S. Chandrasekhar et al., “High-spectral-efficiency transmission of PDM 256-QAM with parallel probabilistic shaping at record rate-reach trade-offs,” in Proc. Eur. Conf. Opt. Commun., Dusseldorf, Germany, Sep. 2016, Paper Th.3.C.1.
[37] A. Ghazisaeidi et al., “65 Tb/s transoceanic transmission using probabilistically-shaped PDM-64QAM,” in Proc. Eur. Conf. Opt. Commun., Dusseldorf, Germany, Sep. 2016, Paper Th.3.C.4.
[38] J. Cho et al., “Trans-Atlantic field trial using high spectral efficiency probabilistically shaped 64-QAM and single-carrier real-time 250-Gb/s 16-QAM,” J. Lightw. Technol., vol. 36, no. 1, pp. 103–113, Jan. 2018.
[39] S. L. I. Olsson, J. Cho, S. Chandrasekhar, X. Chen, P. J. Winzer, and S. Makovejs, “Probabilistically shaped PDM 4096-QAM transmission over up to 200 km of fiber using standard intradyne detection,” Opt. Express, vol. 26, no. 4, pp. 4522–4530, Feb. 2018.
[40] S. L. I. Olsson, J. Cho, S. Chandrasekhar, X. Chen, E. C. Burrows, and P. J. Winzer, “Record-high 17.3-bit/s/Hz spectral efficiency transmission over 50 km using probabilistically shaped PDM 4096-QAM,” in Proc. Opt. Fiber Commun. Conf., San Diego, CA, USA, Mar. 2018, Paper Th4C.5.
[41] Nokia Corporation, Nokia Photonic Service Engine 3. 2018. [Online]. Available: https://networks.nokia.com/photonic-service-engine-3
[42] J. Li et al., “Field trial of probabilistic-shaping-programmable real-time 200-Gb/s coherent transceivers in an intelligent core optical network,” in Proc. Asia Commun. Photon. Conf., Hangzhou, China, Oct. 2018, Paper Su2C.1.
[43] G. Kramer, M. I. Yousefi, and F. R. Kschischang, “Upper bound on the capacity of a cascade of nonlinear and noisy channels,” in Proc. Inf. Theory Workshop, Jerusalem, Israel, Apr. 2015, pp. 1–4.
[44] P. Poggiolini, G. Bosco, A. Carena, V. Curri, Y. Jiang, and F. Forghieri, “The GN-model of fiber non-linear propagation and its applications,” J. Lightw. Technol., vol. 32, no. 4, pp. 694–721, Feb. 2014.
[45] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Accumulation of nonlinear interference noise in fiber-optic systems,” Opt. Express, vol. 22, no. 12, pp. 14199–14211, May 2014.
[46] R. Dar, M. Shtaif, and M. Feder, “New bounds on the capacity of the nonlinear fiber-optic channel,” Opt. Lett., vol. 39, no. 2, pp. 398–401, Jan. 2014.
[47] S. Ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.

[48] T. Tian and C. R. Jones, “Construction of rate-compatible LDPC codes utilizing information shortening and parity puncturing,” EURASIP J. Wireless Commun. Netw., vol. 2005, no. 5, pp. 789–795, Dec. 2005.
[49] T. V. Nguyen, A. Nosratinia, and D. Divsalar, “The design of rate-compatible protograph LDPC codes,” IEEE Trans. Commun., vol. 60, no. 10, pp. 2841–2850, Oct. 2012.
[50] D. G. M. Mitchell, M. Lentmaier, A. E. Pusane, and D. J. Costello, “Randomly punctured LDPC codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 408–421, Feb. 2016.
[51] J. Ha, J. Kim, and S. McLaughlin, “Rate-compatible puncturing of low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2824–2826, Nov. 2004.
[52] C.-H. Hsu and A. Anastasopoulos, “Capacity achieving LDPC codes through puncturing,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4698–4706, Oct. 2008.
[53] R. Asvadi and A. H. Banihashemi, “A rate-compatible puncturing scheme for finite-length LDPC codes,” IEEE Commun. Lett., vol. 17, no. 1, pp. 147–150, Jan. 2013.
[54] J. Cho, X. Chen, S. Chandrasekhar, and P. Winzer, “On line rates, information rates, and spectral efficiencies in probabilistically shaped QAM systems,” Opt. Express, vol. 26, no. 8, pp. 9784–9791, Apr. 2018.
[55] J. Cho, “Balancing probabilistic shaping and forward error correction for optimal system performance,” in Proc. Opt. Fiber. Conf., San Diego, CA, USA, Mar. 2018, Paper M3C-2.
[56] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[57] J. Cho, S. Chandrasekhar, R. Dar, and P. J. Winzer, “Low-complexity shaping for enhanced nonlinearity tolerance,” in Proc. Eur. Conf. Opt. Commun., Dusseldorf, Germany, Sep. 2016, Paper W1C.2.
[58] J. Cho, “Prefix-free code distribution matching for probabilistic constellation shaping,” IEEE Trans. Commun., submitted for publication.
[59] J. Cho et al., “Probabilistic signal shaping and codes therefor,” U.S. Patent Appl. 15/374397, Dec. 9, 2016.
[60] G. Böcherer, F. Steiner, and P. Schulte, “Fast probabilistic shaping implementation for long-haul fiber-optic communication systems,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.3.
[61] T. Yoshida, M. Karlsson, and E. Agrell, “Short-block-length shaping by simple mark ratio controllers for granular and wide-range spectral efficiencies,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.2.
[62] T. Yoshida, M. Karlsson, and E. Agrell, “Low-complexity variable-length output distribution matching with periodical distribution uniformalization,” in Proc. Opt. Fiber. Conf., San Diego, CA, USA, Mar. 2018, Paper M4E.2.
[63] P. Schulte and F. Steiner, “Divergence-optimal fixed-to-fixed length distribution matching with shell mapping,” IEEE Wireless Commun. Lett., to be published.
[64] S.-Y. Chung, G. D. Forney, T. J. Richardson, and R. Urbanke, “On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.
[65] W.-R. Peng, I. Morita, and H. Tanaka, “Hybrid QAM transmission techniques for single-carrier ultra-dense WDM systems,” in Proc. Opto-Electron. Commun. Conf., Kaohsiung, Taiwan, Jul. 2011, pp. 824–825.
[66] X. Zhou et al., “4000 km transmission of 50 GHz spaced, 10 × 494.85-Gb/s hybrid 32-64QAM using cascaded equalization and training-assisted phase recovery,” in Proc. Opt. Fiber. Conf., Los Angeles, CA, USA, Mar. 2012, Paper PDP5C.6.
[67] M. Xiang et al., “Multi-subcarrier flexible bit-loading enabled capacity improvement in meshed optical networks with cascaded ROADMs,” Opt. Express, vol. 25, no. 21, pp. 25046–25058, Oct. 2017.
[68] F. P. Guiomar, L. Bertignono, A. Nespola, and A. Carena, “Frequency-domain hybrid modulation formats for high bit-rate flexibility and nonlinear robustness,” J. Lightw. Technol., vol. 36, no. 20, pp. 4856–4870, Oct. 2018.
[69] J. Cho, S. Chandrasekhar, X. Chen, G. Raybon, and P. J. Winzer, “High spectral efficiency transmission with probabilistic shaping,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Th.1.E.1.
[70] G. D. Forney, Principles of Digital Communication II. Cambridge, MA, USA: MIT OpenCourseWare, Sep. 7, 2018. [Online]. Available: https://ocw.mit.edu
[71] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: Wiley, 2006.
[72] S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, Jan. 1972.
[73] R. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, Jul. 1972.
[74] G. R. Lang and F. M. Longstaff, “A Leech lattice modem,” IEEE J. Sel. Areas Commun., vol. 7, no. 6, pp. 968–973, Aug. 1989.
[75] A. K. Khandani and P. Kabal, “Shaping multidimensional signal spaces. I. Optimum shaping, shell mapping,” IEEE Trans. Inf. Theory, vol. 39, no. 6, pp. 1799–1808, Nov. 1993.
[76] R. Laroia, N. Farvardin, and S. A. Tretter, “On optimal shaping of multidimensional constellations,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1044–1056, Jul. 1994.
[77] H. D. Pfister, J. B. Soriaga, and P. H. Siegel, “On the achievable information rates of finite state ISI channels,” in Proc. IEEE GlobeCom, San Antonio, TX, USA, Nov. 2001, pp. 2992–2996.
[78] J. Cho, S. Chandrasekhar, and P. Winzer, “Rate-adaptive modulation schemes for high spectral efficiency optical communications,” in Proc. OSA Frontiers Opt., Washington, DC, USA, Sep. 2018, Paper FW5B-1.
[79] A. Alvarado, E. Agrell, D. Lavery, R. Maher, and P. Bayvel, “Replacing the soft-decision FEC limit paradigm in the design of optical communication systems,” J. Lightw. Technol., vol. 33, no. 20, pp. 4338–4352, Oct. 2015.
[80] J. Cho, L. Schmalen, and P. Winzer, “Normalized generalized mutual information as a forward error correction threshold for probabilistically shaped QAM,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper M.2.D.2.
[81] A. Alvarado, T. Fehenberger, B. Chen, and F. M. J. Willems, “Achievable information rates for fiber optics: Applications and computations,” J. Lightw. Technol., vol. 36, no. 2, pp. 424–439, Jan. 2018.
[82] G. Böcherer, “Achievable rates for probabilistic shaping,” 2018, arXiv:1707.01134.
[83] G. Böcherer, “On joint design of probabilistic shaping and FEC for optical systems,” in Proc. Opt. Fiber Conf., San Diego, CA, USA, Mar. 2018, Paper M4E-1.
[84] G. Böcherer, P. Schulte, and F. Steiner, “Probabilistic shaping and forward error correction for fiber-optic communication systems,” J. Lightw. Technol., to be published.
[85] T. Yoshida, M. Karlsson, and E. Agrell, “Performance metrics for systems with soft-decision FEC and probabilistic shaping,” IEEE Photon. Technol. Lett., vol. 29, no. 23, pp. 2111–2114, Dec. 2017.
[86] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai Shitz, “On information rates for mismatched decoders,” IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1953–1967, Nov. 1994.
[87] A. Martinez, A. G. i Fàbregas, G. Caire, and F. M. J. Willems, “Bit-interleaved coded modulation revisited: A mismatched decoding perspective,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, Jun. 2009.
[88] J. Cho, S. L. I. Olsson, S. Chandrasekhar, and P. Winzer, “Information rate of probabilistically shaped QAM with non-ideal forward error correction,” in Proc. Eur. Conf. Opt. Commun., Rome, Italy, Sep. 2018, Paper Th1H.5.
[89] J. Cho and P. J. Winzer, “Multi-rate prefix-free code distribution matching,” in Proc. Opt. Fiber Commun. Conf., to be published.
[90] G. Böcherer, F. Steiner, and P. Schulte, “Fast probabilistic shaping implementation for long-haul fiber-optic communication systems,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.3.
[91] T. V. Ramabadran, “A coding scheme for m-out-of-n codes,” IEEE Trans. Commun., vol. 38, no. 8, pp. 1156–1163, Aug. 1990.
[92] F. Tosato and P. Bisaglia, “Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2,” in Proc. Int. Conf. Commun., New York, NY, USA, May 2002, vol. 2, pp. 664–668.
[93] G. Baruffa and L. Rugini, “Soft-output demapper with approximated LLR for DVB-T2 systems,” in Proc. IEEE GlobeCom, San Diego, CA, USA, Dec. 2015, pp. 1–6.
[94] R. Maher, K. Croussore, M. Lauermann, R. Going, X. Xu, and J. Rahn, “Constellation shaped 66 GBd DP-1024QAM transceiver with 400 km transmission over standard SMF,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Th.PDP.B.2.
[95] H. Imai and S. Hirakawa, “A new multilevel coding method using error-correcting codes,” IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371–377, May 1977.

Junho Cho (M’10) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Seoul, South Korea. He has been with Bell Labs, Seoul, South Korea from 2010 to 2014, and with Holmdel, NJ, USA since 2014. He was a Ph.D. dissertation committee member for Seoul National University. He has authored or coauthored numerous papers and serves as a reviewer for a wide range of IEEE journals, the scope of which includes the optics, communications, circuits and systems, and computer. His current research interests are probabilistic constellation shaping, forward error correction, and signal processing. He was the recipient of the Outstanding Research Award under the Brain Korea 21 Project while studying with Seoul National University in 2009.

Peter J. Winzer (F’09) received the Ph.D. degree from the Vienna University of Technology, Vienna, Austria, where he worked on space-borne lidar and laser communications for the European Space Agency. Since 2000, he has been with Bell Labs, Holmdel, NJ, USA, and has focused on many aspects of fiber-optic communications and networking, from advanced optical modulation, multiplexing, and detection to cross-layer network architectures. He has contributed to several high-speed optical transmission records from 100 Gb/s to 1 Tb/s in laboratory experiments and field trials, and has been widely promoting spatial multiplexing to overcome the optical networks capacity crunch. He has amply authored or coauthored and patented, and is actively involved with the IEEE Photonics Society and the Optical Society of America, including service as the Program Chair of ECOC 2009, Program/General Chair of OFC 2015/17, and the former Editor-in-Chief for the IEEE/OSA JOURNAL OF LIGHTWAVE TECHNOLOGY. He was the recipient of multiple awards for his work and is a highly cited researcher. He is a Fellow of Bell Labs and the OSA, and an elected member of the U.S. National Academy of Engineering.
