Prob Shaping
(Invited Paper)
Fig. 2. Architectures for PCS.

Fig. 3. Schematic illustration of the AIR of the auxiliary AWGN channel modeling an optical fiber channel. Upper solid line: Gaussian signaling (i.e., AWGN capacity), lower solid line: uniform QAMs with arbitrarily rate-adaptable FEC (i.e., modulation-constrained AIR), staircase lines: uniform QAMs with nine different fixed-rate FEC codes (i.e., modulation- and code-constrained AIRs).

Gray mapping increases the complexity of demapping symbols to soft-decision bit metrics.

It is only four years ago that constellation shaping began to attract significant attention, both in research and in rapidly following productization, in the form of probabilistic constellation shaping (PCS), which shapes the probability of occurrence of the constellation points rather than their locations to approximate Gaussian signaling, as shown in Fig. 1(b). In contrast to GCS, (i) it is simple to optimize these probabilities through a single parameter to match any given channel condition, (ii) constellation points are placed on the rectilinear grid of a square QAM template, which facilitates coherent DSP by robust state-of-the-art square-QAM algorithms, and (iii) Gray mapping facilitates symbol demapping for subsequent SD FEC.

Combinations of PCS and GCS have also been studied in the context of optical communications [29], [30], but these have yielded little gain over pure PCS based on square QAM templates, which already approach the Shannon limit to within 0.1 dB in the AWGN channel. Nevertheless, the combination of GCS and PCS to combat channel nonlinearities [31], [32] is not yet a completely resolved problem.

PCS is practically enabled by the probabilistic amplitude shaping (PAS) architecture [33], which shows capacity-approaching performance with a practical shaping and coding implementation and elegantly resolves the long-standing problem of PCS in terms of combining shaping and coding, as visualized in Fig. 2: The problem with previously known PCS architectures is that performing coding after shaping at the transmitter distorts the shaped symbol distribution, as FEC parity bits are generally not shaped, see Fig. 2(a). On the other hand, performing coding before shaping at the transmitter can cause error bursts upon de-shaping erroneously received symbols at the receiver, see Fig. 2(b). The PAS architecture elegantly circumvents this problem by optimally intertwining shaping and coding in a capacity-approaching and efficiently implementable way, cf. Fig. 2(c). Coding and shaping are decoupled through a parallel transmitter architecture (as reviewed in Section II-A.) such that their independent optimization leads to jointly optimal performance. This greatly simplifies the implementation of encoder and decoder by allowing the use of off-the-shelf modern SD FEC codes, with minimum to no specific tailoring for the use in a PCS application.

PCS based on the PAS architecture in optical communications was first demonstrated by full-field simulations [34] and transmission experiments [35] in 2015. Record SEs using PCS were then demonstrated across a wide range of transmission distances from 500 km to 4,000 km [36], and a capacity of 65 Tb/s was demonstrated at a record SE using PCS, exploiting C and L bands over 6,600 km in a laboratory experiment [37]. The first field trial over a trans-oceanic submarine cable using PCS achieved a record SE over 5,500 km and 11,000 km [38]. Over a short distance of 50 km, a record SE of 17.3 b/s/Hz was demonstrated using PCS on a 10-subcarrier superchannel [39], [40]. The first commercial transponder using PCS was recently announced [41]. The first real-time experimental demonstration of PCS was reported in [42]. The numerous milestones that have been achieved in only 4 years and the rapid adoption of PCS in the commercial sector bear testimony to the significance of PCS in improving the performance of optical fiber communications.

II. BENEFITS OF PCS IN OPTICAL TRANSMISSION

A. Fiber Channel Capacity and Achievable Information Rates

The trade-off between the achievable information rate (AIR) and the transmission distance in a fiber-optic transmission system is illustrated in Fig. 3; as the figure merely visualizes general trade-offs, the exact axis labels that vary depending on the underlying system assumptions are omitted. While the nonlinear fiber channel is a non-AWGN channel with memory, whose general capacity has been estimated but is not exactly known [26], [43], it can under certain assumptions be accurately modeled as a memoryless AWGN channel [26], [44]–[46]. The AIR for this auxiliary AWGN channel can then be maximized over all possible input distributions, assuming ideal FEC coding with infinite code length and unlimited decoder complexity, leading to a capacity estimate of the fiber channel as represented by its auxiliary AWGN channel. The capacity of the auxiliary
1592 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 37, NO. 6, MARCH 15, 2019
may suggest that TDHM and PCS should perform the same in terms of their shaping characteristics. However, the ensemble average, i.e., the symbol amplitude distribution within a single time slot when averaged across all possible data streams, looks very different for the two shaping schemes, as shown in Fig. 6. In an ideal PCS implementation, ensemble average and time average result in the same distribution, letting the encoding process be stationary and ergodic, and justifying the AIR calculated based on the entropy as in (5) [77]. As an example, consider the TDHM shown in Fig. 9(a) that interleaves symbols drawn from a uniform binary phase-shift keying (BPSK) alphabet $\mathcal{X}_{\text{BPSK}} = [-1, +1]$ and symbols drawn from a 4-PAM alphabet $\mathcal{X}_{\text{4-PAM}} = [-3, -1, +1, +3]$ at a multiplexing ratio $\alpha = 0.5$ such that an MB distribution $P_X = [p_1, \ldots, p_4] = [\tfrac{1}{8}, \tfrac{3}{8}, \tfrac{3}{8}, \tfrac{1}{8}]$ is observed at the receiver when performing a time average. The shaping rate of this TDHM is $R_s = (1 + 2)/2 = 1.5$ bits/symbol per dimension, and the average symbol energy is $\sum_{m=1}^{4} p_m |x_m|^2 = 3$. Note that PCS can create the same time-averaged distribution (hence the same average symbol energy of 3), as shown in Fig. 9(b), but it can do so at a larger shaping rate of $R_s = H(X) \approx 1.8$ bits/symbol per dimension! This shows that achieving a time-averaged MB distribution is only a necessary condition for optimal energy efficiency.

By using different PAM orders in different time slots, TDHM does not construct a ball but rather constructs a (hyper-)rectangle. As it is the cube (with equal side lengths) that is the most energy-efficient shape among all possible rectangles for the same volume, TDHM performs worse than uniform square-QAM; and as the ball is more energy efficient than the cube, PCS performs best. Figure 10 depicts a two-dimensional example, representing square-QAM and TDHM in 2 dimensions. The points in the rectangle have ∼3.3 dB larger average energy than the points in the cube, with the same number of points (i.e., 64) and the same minimum distance (i.e., 2). The same is evident from Figure 11, which shows that TDHM (lower solid line) can cause a loss of ∼2 dB in SNR [69], or 25% loss in AIR [78], relative to optimal PCS (upper solid line) in the AWGN channel, when all bit levels are encoded jointly by a single FEC code of rate 0.8. If used with a fixed rate-0.8 FEC code, TDHM performs worse than uniform square QAMs with rate-adaptable FEC (cf. dashed lines in Fig. 11). A comparison of rate adaptability and performance of the various coded modulation schemes discussed so far is sketched in Fig. 12.

Fig. 10. Two-dimensional square lattice constellation points contained in (a) a cube, and (b) a rectangle, and their marginal probability distributions in each coordinate axis.

Fig. 11. AIR of various modulation schemes under bit metric decoding in the AWGN channel.

TABLE I
PERFORMANCE METRICS FOR PCS

III. PERFORMANCE METRICS FOR PCS

To quantify system performance of PCS in conjunction with SD FEC, several approaches with and without an explicit focus on their operational meaning have been taken [79]–[85]. Relevant performance metrics are summarized in Table I. The system model used to obtain these metrics is depicted in Fig. 13(a).
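The numbers quoted in Section II for the TDHM example of Fig. 9 and for the cube-versus-rectangle example of Fig. 10 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the helper names are hypothetical, and the 4 × 16 rectangle is an assumption (the text states only the number of points and the minimum distance), chosen because it reproduces the quoted energy penalty.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_energy(points, probs):
    """Average symbol energy sum_m p_m |x_m|^2."""
    return sum(p * x * x for x, p in zip(points, probs))


def pam(order):
    """PAM alphabet with minimum distance 2, e.g. pam(4) = [-3, -1, 1, 3]."""
    return [2 * i - (order - 1) for i in range(order)]


# TDHM of Fig. 9(a): uniform BPSK and uniform 4-PAM, multiplexing ratio 0.5.
alpha = 0.5
points = pam(4)
bpsk = {-1: 0.5, 1: 0.5}
time_avg = [alpha * bpsk.get(x, 0.0) + (1 - alpha) * 0.25 for x in points]

tdhm_rate = alpha * 1 + (1 - alpha) * 2   # bits/symbol per dimension
pcs_rate = entropy_bits(time_avg)          # H(X) of the same distribution
energy = mean_energy(points, time_avg)


def uniform_energy(order):
    """Average energy of a uniform PAM alphabet of the given order."""
    return mean_energy(pam(order), [1 / order] * order)


# Fig. 10: 64 points as an 8 x 8 cube versus an ASSUMED 4 x 16 rectangle.
cube = uniform_energy(8) + uniform_energy(8)
rect = uniform_energy(4) + uniform_energy(16)
penalty_db = 10 * math.log10(rect / cube)

print(time_avg, tdhm_rate, round(pcs_rate, 3), energy, round(penalty_db, 2))
```

Under these assumptions the time-averaged distribution is [1/8, 3/8, 3/8, 1/8] with energy 3, the entropy is about 1.81 bits against the TDHM rate of 1.5 bits, and the rectangle costs about 3.3 dB more energy than the cube.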
CHO AND WINZER: PROBABILISTIC CONSTELLATION SHAPING FOR OPTICAL FIBER COMMUNICATIONS 1597
Fig. 13. (a) System model, and architecture of decoders for (b) SMD, (c) multi-level coding and multi-stage decoding (discussed in Appendix), and (d) BMD.

A. Mutual Information

We first consider SMD with non-binary FEC codes that have the same number of symbols in the code alphabet as that of the modulation alphabet, i.e., $M$-ary FEC codes for an $M$-ary constellation. (In principle, the code alphabet need not have the same cardinality as the modulation alphabet, but this restriction makes it simple to develop equations and achieves capacity in a memoryless channel.) As briefly discussed in Section II-B.1, a relevant performance metric for SMD is the MI that quantifies an IR that is achievable (hence an AIR) using infinite code length and unlimited decoder complexity. The channel capacity, known as the Shannon limit (SL), is obtained by maximizing the MI.

Assume that we use a length-$n_c$ $M$-ary SD FEC code with code rate $R_c = k_c/n_c$ together with an $M$-ary constellation, and the (auxiliary) channel is memoryless AWGN. In this system, based on perfect knowledge of the transmitted symbols $X$, a measurable statistic of the channel is $P_{Y|X}(Y|X)$, i.e., the probability of the observed physical entity $Y$ given the transmitted physical entity $X$, cf. Fig. 13(b), which is often called the channel transition probability. An SD demapper produces the conditional probability $P_{Y|S}(y_i|s)$ of the $i$-th received symbol $y_i$, for $i = 1, \ldots, n_c$, for every symbol $s$ in the code alphabet. In our system where the FEC code has the same alphabet size as the constellation, this is equivalent to the conditional probability $P_{Y|X}(y_i|x)$ given a transmitted modulation symbol $x \in \mathcal{X}$, which is directly fed to the subsequent SMD as an SD decoding metric. An optimal SMD finds a legitimate codeword $\mathbf{x} = [x_1, \ldots, x_{n_c}]$ that is the most likely to be transmitted among all $M^{k_c}$ possible codewords, given the noisy observation $\mathbf{y} = [y_1, \ldots, y_{n_c}]$, by maximizing the product of the channel transition probabilities over all symbols in $\mathbf{y}$, $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n_c} P_{Y|X}(y_i|x_i)$ [71, Ch. 7.7]. It should be noted that there are only $M^{k_c}$ codewords that are legitimate for the underlying code, while $M^{n_c}$ uncoded sequences can exist for an $M$-ary alphabet. Therefore, only one out of $M^{n_c}/M^{k_c} = M^{n_c(1-R_c)}$ possible words is a legitimate codeword, which allows a decoder to select the nearest codeword from a noisy non-codeword. (This illustrates the fundamental operation of FEC.) An AIR of the ideal and optimal SMD is the MI, defined as

$$I(X;Y) \triangleq \mathbb{E}_{X,Y}\left[\log_2 \frac{P_{Y|X}(Y|X)}{P_Y(Y)}\right] = \mathbb{E}_{X,Y}\left[\log_2 \frac{P_{Y|X}(Y|X)}{\sum_{x' \in \mathcal{X}} P_X(x')\, P_{Y|X}(Y|x')}\right] \tag{8}$$

in bits/symbol per dimension, where $X$ is a random variable for the one-dimensional transmitted signal, $Y$ is a random variable
for the corresponding received signal in the AWGN channel with a known noise variance, and $\mathbb{E}_{X,Y}(\cdot)$ denotes the expectation taken over $X$ and $Y$. Here, by “ideal” SMD, we mean that a code is of infinite length ($n_c \to \infty$), and by “optimal” SMD, we mean that (i) the code rate $R_c$ is chosen to match the channel condition, and (ii) no other codeword has a higher likelihood than the codeword chosen by SMD, since the decoder is (unrealistically) capable of sorting all $M^{k_c}$ codewords in a descending order of their probabilities $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$. The supremum of (8) over all possible (continuous- and discrete-amplitude) input distributions $P_X$ is the channel capacity, which on an (auxiliary) AWGN channel can be achieved by Gaussian signaling, as discussed in Section II.

Although it is in principle possible to use non-binary codes and SMD in the PAS architecture, PCS in optical systems is commonly implemented using binary codes and BMD for complexity reasons, hence the MI does not generally represent the most relevant performance metric.

B. Generalized Mutual Information

Let us next consider BMD in Fig. 13(a), where a bit-to-symbol mapper transforms a vector $\mathbf{B} \triangleq [B_1, \ldots, B_m]$ to a symbol $X$ of an $M$-PAM constellation. It should be first noted that $B_j$ for $j = 1, \ldots, m$ are logical entities that are not directly cast into the channel, but only through their physical representation $X$, e.g., a voltage or an optical field amplitude. On the other hand, in the context of BMD, the decoder estimates bits and not symbols. Therefore, the decoder operates on $P_{Y|B_j}(Y|B_j)$ instead of $P_{Y|X}(Y|X)$, calculated as

$$P_{Y|B_j}(Y|B_j) \triangleq \frac{P_{B_j,Y}(B_j, Y)}{P_{B_j}(B_j)} = \frac{\sum_{x' \in \mathcal{X}^{(j)}_{B_j}} P_{Y|X}(Y|x')\, P_X(x')}{P_{B_j}(B_j)},$$

where $b_j(x)$ is the $j$-th bit of symbol $x$, and $\mathcal{X}^{(j)}_b \triangleq \{x \in \mathcal{X} : b_j(x) = b\}$ denotes the set of constellation points $x$ whose $j$-th bit representation is $b \in \{0, 1\}$. For example, if we use binary reflected Gray coding (BRGC) {101, 100, 110, 111, 011, 010, 000, 001} to represent the 8-PAM symbol alphabet $\mathcal{X} = \{-7, -5, \ldots, +7\}$, the symbol sets corresponding to a ‘0’ and ‘1’ at the second bit position are $\mathcal{X}^{(2)}_0 = \{-7, -5, +5, +7\}$ and $\mathcal{X}^{(2)}_1 = \{-3, -1, +1, +3\}$, respectively. The conditional probability of observation $y$ given transmitted bit $B_2 = 0$ is then calculated through $P_{Y|X}(Y|X)$ as $P_{Y|B_2}(y|0) = \sum_{x' \in \mathcal{X}^{(2)}_0} P_{Y|X}(y|x')\, P_X(x') / P_{B_2}(0)$. In BMD, we often use the conditional likelihood $P_{B_j|Y}(B_j|Y)$ instead of the conditional probability $P_{Y|B_j}(Y|B_j)$, which can be obtained by Bayes’ rule as

$$P_{B_j|Y}(B_j|Y) = P_{Y|B_j}(Y|B_j)\, \frac{P_{B_j}(B_j)}{P_Y(Y)} = \frac{\sum_{x' \in \mathcal{X}^{(j)}_{B_j}} P_{Y|X}(Y|x')\, P_X(x')}{P_Y(Y)}, \tag{9}$$

which represents the SD decoding metric of BMD. An SD demapper for BMD produces the conditional likelihood $P_{B_j|Y}(b_{i,j}|y_i)$ for the $j$-th bit $b_{i,j}$ of the $i$-th transmitted symbol $x_i$, for $i = 1, \ldots, n_c$, which is then input to the subsequent binary SD decoder, cf. Fig. 13(d). Here, we omit the time index $i$ from $B_{i,j}$ and $Y_i$ since the PCS encoding is a stationary process and the channel is assumed to be stationary as well. For a length-$n_c$ binary code, optimal BMD finds a legitimate codeword $\mathbf{b} = [b_{1,1}, \ldots, b_{n_c/m,\, m}]$ that is the most likely to be transmitted among all $2^{k_c}$ possible codewords by maximizing $P_{\mathbf{B}|\mathbf{Y}}(\mathbf{b}|\mathbf{y}) = \prod_{i=1}^{n_c/m} \prod_{j=1}^{m} P_{B_{i,j}|Y}(b_{i,j}|y_i)$, given the noisy observation $\mathbf{y} = [y_1, \ldots, y_{n_c/m}]$. Multiplications in $P_{\mathbf{B}|\mathbf{Y}}(\mathbf{b}|\mathbf{y})$ are often removed by taking the logarithm without affecting the decoding performance. In addition, instead of producing two metrics $P_{B_j|Y}(0|y_i)$ and $P_{B_j|Y}(1|y_i)$ for each received symbol $y_i$, the SD BMD demapper can produce only one log-likelihood ratio (LLR) metric

$$\log \frac{P_{B_j|Y}(0|y_i)}{P_{B_j|Y}(1|y_i)}, \tag{10}$$

which will be discussed in Section IV in more detail.

Note that the BMD demapper produces only $\log_2 M$ LLRs per received symbol, whereas an SMD demapper produces $|\mathcal{X}| = M$ LLRs per received symbol, in the form of $\log [P_{X|Y}(x_1|y_i)/P_{X|Y}(x|y_i)]$ for all $x \in \mathcal{X}$, where $x_1$ denotes the first letter in $\mathcal{X}$. Using the conditional likelihood $P_{B_j|Y}(B_j|Y)$ in (9), the channel transition probability can be approximated as (see Appendix for derivation details and for a clarification of the operational meaning of the obtained results)

$$Q_{Y|X}(Y|X) \triangleq \left[\prod_{j=1}^{m} P_{B_j|Y}(B_j|Y)\right] \frac{P_Y(Y)}{P_X(X)} \approx P_{Y|X}(Y|X). \tag{11}$$

This is called the mismatched decoding metric [86], [87], since $Q_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n_c/m} Q_{Y|X}(y_i|x_i)$ is not a monotonic function of $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$, causing loss of decoding performance; in other words, the codeword that maximizes $Q_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ does not necessarily maximize $P_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$.

Finally, in analogy to the MI obtained from the exact decoding metric $P_{Y|X}(Y|X)$ as in (8), we obtain the GMI using the approximate decoding metric $Q_{Y|X}(Y|X)$ as

$$GMI(X;Y) \triangleq \mathbb{E}_{X,Y}\left[\log_2 \frac{Q_{Y|X}(Y|X)}{\sum_{x' \in \mathcal{X}} P_X(x')\, Q_{Y|X}(Y|x')}\right] \tag{12}$$

in bits/symbol per dimension. After some mathematical manipulation (see Appendix), we can obtain a compact notation of (12) as

$$GMI(X;Y) = H(X) - \sum_{j=1}^{m} H(B_j|Y). \tag{13}$$

In case of uniform $P_X$ and independent bit levels, (13) degenerates to

$$GMI(X;Y) = \sum_{j=1}^{m} I(B_j;Y),$$

which represents an AIR for bit-interleaved coded modulation (BICM) [87]. Importantly, the GMI in (13) has the same form as the “BMD rate” that was first defined in [33], and was proven
to be achievable [82], i.e., there exists a coding scheme such that the post-FEC BER can be made arbitrarily small, as the code length $n_c \to \infty$. The supremum of GMI over all possible $P_X$ is the capacity of PCS under the constraints of a square QAM template and parallel BMD, which can be approximately achieved by an MB distribution.

C. Normalized Generalized Mutual Information

The GMI quantifies the number of information bits per transmitted symbol that can be reliably transmitted through a given channel. After proper normalization of the GMI, we can derive a channel metric that quantifies the number of information bits per transmitted bit, which is called the normalized GMI (NGMI) [79]–[81]. Since the GMI is an AIR of the PAS architecture as per our above discussion, we can replace the IR of (5) with the GMI to obtain the unit-less metric

$$NGMI(X;Y) = 1 - \frac{H(X) - GMI(X;Y)}{m}. \tag{14}$$

It immediately follows from (13) and (14) that

$$NGMI(X;Y) = 1 - \frac{1}{m} \sum_{j=1}^{m} H(B_j|Y). \tag{15}$$

Note that the asymmetric information (ASI) introduced in [85] from a different perspective has the same form as the NGMI.

Suppose that we have obtained the maximum $GMI(X;Y)$ over all possible distributions of $X$, and denote by $X^*$ the channel input that maximizes the GMI, i.e., $X^* = \arg\max_X GMI(X;Y)$. It should be noted that $GMI(X^*;Y)$ and $NGMI(X^*;Y)$ are not associated with potential imperfections of the underlying transceiver technology but represent channel metrics of the auxiliary AWGN channel, whereas $R_c^*$ in (1) and $R_s^*$ in (7) are the transceiver metrics that need to be used to achieve $GMI(X^*;Y)$, cf. Table I. In other words, the channel’s transmission capabilities as given by the channel metric $GMI(X^*;Y)$ are fully exhausted when we use ideal binary FEC with the optimal code rate $R_c^* = NGMI(X^*;Y)$ and ideal PCS with the optimal shaping rate $R_s^* = H(X^*)$, as summarized in Table I.

IV. IMPACT OF SUB-OPTIMAL PCS AND FEC

GMI and NGMI quantify theoretic channel metrics as well as the limit of transceiver technologies without imposing any constraints on implementation complexity. However, they are also very useful to evaluate and optimize systems with sub-optimal pragmatic PCS and FEC, if shaping and coding gaps are properly taken into account. In what follows, let $P_{X^\dagger}$ denote the distribution that maximizes the IR using a sub-optimal PCS and/or FEC scheme.

A. Sub-Optimal FEC, Optimal Shaping

Since sub-optimal FEC requires more redundancy (i.e., a lower code rate) than optimal FEC to achieve error-free decoding, the largest code rate for error-free decoding is

$$R_c^\dagger = NGMI(X^\dagger;Y) - \delta_c,$$

where $\delta_c \ge 0$ is the coding gap. The coding gap $\delta_c$ quantifies how many fewer information bits are conveyed per transmitted bit by sub-optimal coding compared to optimal coding. In [80], FEC decoding simulations are performed using spatially-coupled (SC) LDPC codes, showing that for each code rate $R_c^\dagger$ the coding gap $\delta_c$ is nearly constant across various distributions $P_X$ and $M^2$-QAM constellation templates; the most widely applicable coding gap is conservatively chosen as that of the smallest constellation (i.e., 4-QAM) since it is the marginally greatest among those of all $P_X$ and $M^2$-QAM. This implies that we can with high confidence declare error-free decoding if the channel metric $NGMI(X^\dagger;Y)$ is larger than the code rate $R_c^\dagger$ by $\delta_c$, independent of modulation. Therefore, if only one FEC code of rate $r_c$ with coding gap $\delta_c$ is available, the optimal shaping distribution can be obtained as

$$P_{X^\dagger} = \arg\max_{P_X} GMI(X;Y) \quad \text{subject to} \quad NGMI(X;Y) \ge r_c + \delta_c, \tag{16}$$

where the last condition ensures error-free decoding. It has been shown in [88] that the loss of IR due to a constant coding gap $\delta_c$ is approximately proportional to $m$, which importantly implies that a small QAM template with moderate shaping performs better than a large QAM template with strong shaping.

B. Optimal FEC, Sub-Optimal Shaping

If the FEC is optimal but PCS is sub-optimal, we can calculate the IR loss $\Delta_s \ge 0$ that quantifies how many fewer information bits are transmitted per transmitted symbol per dimension by a sub-optimal shaping algorithm compared to optimal shaping. Formally, the IR loss due to a sub-optimal shaping algorithm is $\Delta_s \triangleq H(X^\dagger) - R_s^\dagger$, where $X^\dagger$ is the output of the sub-optimal shaping algorithm whose probability approximately follows an MB distribution and $R_s^\dagger \le H(X^\dagger)$ is the realized shaping rate (7). If we define a shaping gap as the unit-less ratio of the IR loss relative to the entropy $H(X^\dagger)$ for the same average symbol energy $\mathbb{E}[|X^\dagger|^2]$, i.e.,

$$\delta_s \triangleq \frac{\Delta_s}{H(X^\dagger)} = 1 - \frac{R_s^\dagger}{H(X^\dagger)},$$

the IR obtained by sub-optimal shaping is a fraction $R_s^\dagger / H(X^\dagger) = 1 - \delta_s \le 1$ of the GMI. Also, by substituting $R_s^\dagger$ for $H(X^\dagger)$ in (5), we have

$$IR = R_s^\dagger - m\,(1 - R_c^\dagger) = H(X^\dagger)(1 - \delta_s) - m\,(1 - R_c^\dagger)$$

in bits/symbol per dimension. It follows from $IR = GMI(X^\dagger;Y)(1 - \delta_s)$ that the optimal code rate that achieves this IR is then given by

$$R_c^\dagger = 1 - \frac{H(X^\dagger) - GMI(X^\dagger;Y)}{m}\,(1 - \delta_s) = NGMI(X^\dagger;Y)(1 - \delta_s) + \delta_s. \tag{17}$$

If only one FEC code of rate $r_c$ with $\delta_c = 0$ is available, and if the shaping gap $\delta_s$ is known for every realized MB distribution $P_X$ of the shaping algorithm, the optimal distribution for this
Fig. 15. IR of non-ideal PCS with δs = 0.025, and non-ideal FEC with
δc = 0 (solid lines), δc = 0.05 (dashed lines), and δc = 0.10 (dotted lines).
calculated as
IR = GM I X † ; Y − mδc (1 − δs ) . (19)
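As a numerical illustration of how the GMI in (13), the NGMI in (14), and the gaps of Section IV interact, the following Python sketch evaluates them for a shaped 8-PAM constellation. It is a sketch under stated assumptions, not the authors' evaluation code: the MB form $P_X(x) \propto e^{-\lambda x^2}$, the values of $\lambda$ and $\sigma^2$, the Riemann-sum integration, and all function names are illustrative choices, and the gap values are borrowed from the Fig. 15 caption. Only the two single-gap cases of Sections IV-A and IV-B are covered.

```python
import math

# 8-PAM points and the BRGC labels quoted in the text (per dimension, m = 3).
POINTS = [-7, -5, -3, -1, 1, 3, 5, 7]
LABELS = ["101", "100", "110", "111", "011", "010", "000", "001"]
M_BITS = 3


def mb_distribution(lam):
    """Assumed MB form: P_X(x) proportional to exp(-lam * x^2)."""
    weights = [math.exp(-lam * x * x) for x in POINTS]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def gmi_bits(probs, sigma2, step=0.01, span=8.0):
    """GMI per (13): H(X) - sum_j H(B_j|Y), using a Riemann sum over y."""
    sigma = math.sqrt(sigma2)
    lo = POINTS[0] - span * sigma
    n_steps = int((POINTS[-1] + span * sigma - lo) / step)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    cond_entropy = 0.0
    for k in range(n_steps + 1):
        y = lo + k * step
        # Joint densities P_X(x) * P_{Y|X}(y|x) for every constellation point.
        joint = [p * norm * math.exp(-(y - x) ** 2 / (2.0 * sigma2))
                 for x, p in zip(POINTS, probs)]
        p_y = sum(joint)
        if p_y <= 0.0:
            continue
        for j in range(M_BITS):
            # Posterior P_{B_j|Y}(0|y) as in (9).
            p0 = sum(v for v, lab in zip(joint, LABELS) if lab[j] == "0") / p_y
            cond_entropy += p_y * binary_entropy(p0) * step
    return entropy_bits(probs) - cond_entropy


def ngmi(h_x, gmi):
    """NGMI per (14)."""
    return 1.0 - (h_x - gmi) / M_BITS


def info_rate(rate_s, rate_c):
    """IR = R_s - m (1 - R_c), the relation used in Section IV-B."""
    return rate_s - M_BITS * (1.0 - rate_c)


probs = mb_distribution(lam=0.03)   # illustrative shaping parameter
h_x = entropy_bits(probs)
gmi = gmi_bits(probs, sigma2=1.0)   # illustrative noise variance
ngmi_val = ngmi(h_x, gmi)

delta_c, delta_s = 0.05, 0.025      # gap values as in the Fig. 15 caption

# Section IV-A: sub-optimal FEC, ideal shaping (R_s = H(X)).
rc_fec = ngmi_val - delta_c
ir_fec = info_rate(h_x, rc_fec)

# Section IV-B: ideal FEC, sub-optimal shaping, code rate from (17).
rc_shape = ngmi_val * (1.0 - delta_s) + delta_s
ir_shape = info_rate(h_x * (1.0 - delta_s), rc_shape)

print(h_x, gmi, ngmi_val, ir_fec, ir_shape)
```

With these definitions the coding gap alone lowers the IR from the GMI by $m\delta_c$, and the shaping gap alone scales it by $(1 - \delta_s)$, which is what the two `info_rate` calls reproduce.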
B. SD FEC

In BMD, the SD decoding metric of the $j$-th bit level can be represented by an LLR as (cf. (10))

$$L_j(y) = \log \frac{P_{B_j|Y}(0|y)}{P_{B_j|Y}(1|y)} = \log \frac{\sum_{x \in \mathcal{X}^{(j)}_0} P_{Y|X}(y|x)\, P_X(x)}{\sum_{x \in \mathcal{X}^{(j)}_1} P_{Y|X}(y|x)\, P_X(x)}. \tag{23}$$
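A direct evaluation of (23) may help make the bit-level sets concrete. The Python sketch below is illustrative only: it assumes the 8-PAM alphabet with the BRGC labels quoted in the text, an MB distribution of the form $P_X(x) \propto e^{-\lambda x^2}$, and a Gaussian channel with noise variance $\sigma^2$ per dimension; the function names and parameter values are hypothetical.

```python
import math

POINTS = [-7, -5, -3, -1, 1, 3, 5, 7]
LABELS = ["101", "100", "110", "111", "011", "010", "000", "001"]  # BRGC


def mb_distribution(lam):
    """Assumed MB form: P_X(x) proportional to exp(-lam * x^2)."""
    weights = [math.exp(-lam * x * x) for x in POINTS]
    total = sum(weights)
    return [w / total for w in weights]


def bit_set(j, b):
    """Constellation points whose j-th bit (1-indexed) equals b."""
    return [x for x, lab in zip(POINTS, LABELS) if lab[j - 1] == str(b)]


def llr(j, y, probs, sigma2):
    """Exact LLR of bit level j per (23); positive values favour bit 0.

    The Gaussian normalisation cancels in the ratio. For y far outside the
    constellation the exponentials can underflow; this sketch does not
    guard against that.
    """
    def mass(b):
        return sum(p * math.exp(-(y - x) ** 2 / (2.0 * sigma2))
                   for x, lab, p in zip(POINTS, LABELS, probs)
                   if lab[j - 1] == str(b))
    return math.log(mass(0) / mass(1))


probs = mb_distribution(lam=0.05)   # illustrative shaping parameter
sigma2 = 1.0                        # illustrative noise variance

print(bit_set(2, 0), bit_set(2, 1))
print([round(llr(j, 0.5, probs, sigma2), 3) for j in (1, 2, 3)])
```

With this labeling the first bit is 0 for positive amplitudes, so its LLR changes sign at $y = 0$, while the second bit is 1 for the four inner points, so its LLR is negative near the origin.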
Fig. 17. Exact (solid lines) and piecewise-linear approximate (dashed lines) LLRs of the (a) first, (b) second, and (c) third bit levels, with $H(X) = 2.6$ on the 64-QAM template at SNR = 13 dB.

The piecewise-linear approximation (dashed) yields LLRs that are indistinguishable from the exact (solid) LLRs when their magnitudes (i.e., the absolute values $|\tilde{L}_j(y)|$ on the y-axis) are small; i.e., the approximation error is negligible for those LLRs that play a crucial role in SD decoding. The approximation leads to an increasing discrepancy as the magnitude grows. This, however, has an insignificant impact on decoding performance, and almost no impact at high SNR.

SD FEC codes are typically designed by assuming symmetric LLR distributions, which occur, e.g., as a consequence of BICM with uniform QAM constellations. However, when a constellation is strongly shaped such that its shaping rate $R_s$ is much smaller than $2m$, LLRs can have highly asymmetric distributions. Therefore, performance loss can be observed in pragmatic FEC decoding if the constellation is strongly shaped. As an example, the probability distribution of input symbol, $P_X(X)$, and that of the LLR, $P_{L_i}(L_i)$, are evaluated for two shaping rates $R_s = 2H(X)$ with $H(X) = 2.7$ and $1.8$ in Fig. 18, using the 64-QAM template, $m = 3$, and the BRGC [101, 100, 110, 111, 011, 010, 000, 001] in each dimension. The LLR distributions are obtained at SNRs of 12.9 dB and 5.1 dB, respectively, which are the SNRs that achieve capacity with $R_s^* = 2H(X)$. With weak shaping of $H(X) = 2.7$, all LLR distributions are symmetric or close to symmetric. With strong shaping of $H(X) = 1.8$, however, $L_2$ and $L_3$ become highly asymmetric around zero. In particular, at the second bit level, $P(L_2 < 0) \approx 0.9963$ and $P(L_2 > 0) \approx 0.0037$, hence the hard decision (HD) value of the demapper output is almost always bit 1. This results in the effect that the code bits are nearly shortened at the second bit level, which amounts to 1/3 of the code bits. In the extreme case where $\lambda \to \infty$, hence $H(X) = 1$, only the innermost constellation points have a non-zero probability of occurrence, which results in complete shortening of the code bits that are mapped to outer symbols (i.e., the code bits at the second and third bit levels in this example). Therefore, in order to support strong shaping, FEC codes should be designed to be robust to shortening at the bit levels with a highly asymmetric LLR distribution. With this, and looking back at the fact that a fixed coding gap causes a loss of IR that increases with $m$, overly strong shaping of a large QAM template, such as used, e.g., in [94], should be avoided for pragmatic FEC decoding. Instead, one should switch to a smaller QAM template whenever the shaping gap becomes small enough with weak shaping.

C. Pre-FEC Performance Metrics and HD FEC

In terms of reporting raw transmission performance (pre-FEC BER or Q-factors), attention has to be paid to how these are determined for a shaped constellation. When performing HD of the received symbols according to the maximum a posteriori (MAP) decision rule, the decoder chooses $\hat{x} = \arg\max_{x \in \mathcal{X}} P_{X|Y}(x|y)$. If we represent the constellation symbols $\mathcal{X}$ in a binary form $\mathbf{B} = [B_1 \ldots B_m]$ using the BRGC, two nearest-neighbor symbols $x_L, x_R \in \mathcal{X}$ of a received symbol $y$ differ in only one bit. Denote this bit level by $j$. Then, the MAP decision can be made as $\hat{x} = \arg\max_{x \in \{x_L, x_R\}} P_{B_j|Y}(b_j(x)|y)$. In other words, $\hat{x} = x_L$ if $P_{B_j|Y}(b_j(x_L)|y) > P_{B_j|Y}(b_j(x_R)|y)$, and $\hat{x} = x_R$ otherwise. Therefore, an optimal decision boundary is given by the value $d$ such that $P_{B_j|Y}(b_j(x_L)|d) = P_{B_j|Y}(b_j(x_R)|d)$. That is, $P_{B_j|Y}(b_j(x_L)|d) / P_{B_j|Y}(b_j(x_R)|d) = 1$, hence $L_j(d) = 0$ (cf. (23)). The HD boundaries are a union of the HD boundaries of constituent bit levels. Since evaluation of exact $L_j(y)$ is complicated as shown in (24), and by knowing that the piecewise-linear approximation of the LLR is very accurate in low-magnitude regimes (near $L_j(y) = 0$), we can obtain the HD boundaries using (25) by setting $\tilde{L}_j(\tilde{d}) = 0$. Therefore, from (25), the union of HD boundaries of all bit levels is given by

$$\tilde{d}_k = \left(1 + 2\lambda\sigma^2\right) \frac{x_k + x_{k+1}}{2}, \tag{26}$$

for the $M$-PAM constellation $\mathcal{X} = [x_1, \ldots, x_M]$ with $x_1 < \ldots < x_M$. Notice that the boundary $\tilde{d}_k$ is a joint function of
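The HD boundaries of (26) can be sanity-checked against the pairwise MAP rule between adjacent constellation points. The Python sketch below is a minimal illustration under assumptions not restated in this excerpt: an MB prior of the form $P_X(x) \propto e^{-\lambda x^2}$, Gaussian noise of variance $\sigma^2$ per dimension, and hypothetical function names and parameter values.

```python
def hd_boundaries(points, lam, sigma2):
    """Boundaries from (26): (1 + 2*lam*sigma2) times each neighbour midpoint."""
    return [(1.0 + 2.0 * lam * sigma2) * (a + b) / 2.0
            for a, b in zip(points, points[1:])]


def log_posterior(x, y, lam, sigma2):
    """Unnormalised log of P_X(x) * P_{Y|X}(y|x) for the assumed MB prior."""
    return -lam * x * x - (y - x) ** 2 / (2.0 * sigma2)


points = [-7, -5, -3, -1, 1, 3, 5, 7]
lam, sigma2 = 0.05, 1.5   # illustrative values

bounds = hd_boundaries(points, lam, sigma2)

# At each boundary the two neighbouring symbols should be equally likely
# a posteriori, so the difference of their log-posteriors should vanish.
gaps = [log_posterior(a, d, lam, sigma2) - log_posterior(b, d, lam, sigma2)
        for a, b, d in zip(points, points[1:], bounds)]

print(bounds)
print(gaps)
```

For $\lambda = 0$ (uniform signaling) the boundaries reduce to the usual midpoints; for $\lambda > 0$ they move outward, enlarging the decision regions of the more probable inner points.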
where the last equation is again due to Bayes’ rule. Here, using
the chain rule, the likelihood can be rewritten as
[48] T. Tian and C. R. Jones, “Construction of rate-compatible LDPC codes utilizing information shortening and parity puncturing,” EURASIP J. Wireless Commun. Netw., vol. 2005, no. 5, pp. 789–795, Dec. 2005.
[49] T. V. Nguyen, A. Nosratinia, and D. Divsalar, “The design of rate-compatible protograph LDPC codes,” IEEE Trans. Commun., vol. 60, no. 10, pp. 2841–2850, Oct. 2012.
[50] D. G. M. Mitchell, M. Lentmaier, A. E. Pusane, and D. J. Costello, “Randomly punctured LDPC codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 408–421, Feb. 2016.
[51] J. Ha, J. Kim, and S. McLaughlin, “Rate-compatible puncturing of low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 50, no. 11, pp. 2824–2826, Nov. 2004.
[52] C.-H. Hsu and A. Anastasopoulos, “Capacity achieving LDPC codes through puncturing,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4698–4706, Oct. 2008.
[53] R. Asvadi and A. H. Banihashemi, “A rate-compatible puncturing scheme for finite-length LDPC codes,” IEEE Commun. Lett., vol. 17, no. 1, pp. 147–150, Jan. 2013.
[54] J. Cho, X. Chen, S. Chandrasekhar, and P. Winzer, “On line rates, information rates, and spectral efficiencies in probabilistically shaped QAM systems,” Opt. Express, vol. 26, no. 8, pp. 9784–9791, Apr. 2018.
[55] J. Cho, “Balancing probabilistic shaping and forward error correction for optimal system performance,” in Proc. Opt. Fiber. Conf., San Diego, CA, USA, Mar. 2018, Paper M3C-2.
[56] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[57] J. Cho, S. Chandrasekhar, R. Dar, and P. J. Winzer, “Low-complexity shaping for enhanced nonlinearity tolerance,” in Proc. Eur. Conf. Opt. Commun., Dusseldorf, Germany, Sep. 2016, Paper W1C.2.
[58] J. Cho, “Prefix-free code distribution matching for probabilistic constellation shaping,” IEEE Trans. Commun., submitted for publication.
[59] J. Cho et al., “Probabilistic signal shaping and codes therefor,” U.S. Patent Appl. 15/374397, Dec. 9, 2016.
[60] G. Böcherer, F. Steiner, and P. Schulte, “Fast probabilistic shaping implementation for long-haul fiber-optic communication systems,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.3.
[61] T. Yoshida, M. Karlsson, and E. Agrell, “Short-block-length shaping by simple mark ratio controllers for granular and wide-range spectral efficiencies,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.2.
[62] T. Yoshida, M. Karlsson, and E. Agrell, “Low-complexity variable-length output distribution matching with periodical distribution uniformalization,” in Proc. Opt. Fiber. Conf., San Diego, CA, USA, Mar. 2018, Paper M4E.2.
[63] P. Schulte and F. Steiner, “Divergence-optimal fixed-to-fixed length distribution matching with shell mapping,” IEEE Wireless Commun. Lett., to be published.
[64] S.-Y. Chung, G. D. Forney, T. J. Richardson, and R. Urbanke, “On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.
[65] W.-R. Peng, I. Morita, and H. Tanaka, “Hybrid QAM transmission techniques for single-carrier ultra-dense WDM systems,” in Proc. Opto-Electron. Commun. Conf., Kaohsiung, Taiwan, Jul. 2011, pp. 824–825.
[66] X. Zhou et al., “4000 km transmission of 50 GHz spaced, 10 × 494.85-Gb/s hybrid 32-64QAM using cascaded equalization and training-assisted phase recovery,” in Proc. Opt. Fiber. Conf., Los Angeles, CA, USA, Mar. 2012, Paper PDP5C.6.
[67] M. Xiang et al., “Multi-subcarrier flexible bit-loading enabled capacity
[72] S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,” IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, Jan. 1972.
[73] R. Blahut, “Computation of channel capacity and rate-distortion functions,” IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, Jul. 1972.
[74] G. R. Lang and F. M. Longstaff, “A Leech lattice modem,” IEEE J. Sel. Areas Commun., vol. 7, no. 6, pp. 968–973, Aug. 1989.
[75] A. K. Khandani and P. Kabal, “Shaping multidimensional signal spaces. I. Optimum shaping, shell mapping,” IEEE Trans. Inf. Theory, vol. 39, no. 6, pp. 1799–1808, Nov. 1993.
[76] R. Laroia, N. Farvardin, and S. A. Tretter, “On optimal shaping of multidimensional constellations,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1044–1056, Jul. 1994.
[77] H. D. Pfister, J. B. Soriaga, and P. H. Siegel, “On the achievable information rates of finite state ISI channels,” in Proc. IEEE GlobeCom, San Antonio, TX, USA, Nov. 2001, pp. 2992–2996.
[78] J. Cho, S. Chandrasekhar, and P. Winzer, “Rate-adaptive modulation schemes for high spectral efficiency optical communications,” in Proc. OSA Frontiers Opt., Washington, DC, USA, Sep. 2018, Paper FW5B-1.
[79] A. Alvarado, E. Agrell, D. Lavery, R. Maher, and P. Bayvel, “Replacing the soft-decision FEC limit paradigm in the design of optical communication systems,” J. Lightw. Technol., vol. 33, no. 20, pp. 4338–4352, Oct. 2015.
[80] J. Cho, L. Schmalen, and P. Winzer, “Normalized generalized mutual information as a forward error correction threshold for probabilistically shaped QAM,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper M.2.D.2.
[81] A. Alvarado, T. Fehenberger, B. Chen, and F. M. J. Willems, “Achievable information rates for fiber optics: Applications and computations,” J. Lightw. Technol., vol. 36, no. 2, pp. 424–439, Jan. 2018.
[82] G. Böcherer, “Achievable rates for probabilistic shaping,” 2018, arXiv:1707.01134.
[83] G. Böcherer, “On joint design of probabilistic shaping and FEC for optical systems,” in Proc. Opt. Fiber Conf., San Diego, CA, USA, Mar. 2018, Paper M4E-1.
[84] G. Böcherer, P. Schulte, and F. Steiner, “Probabilistic shaping and forward error correction for fiber-optic communication systems,” J. Lightw. Technol., to be published.
[85] T. Yoshida, M. Karlsson, and E. Agrell, “Performance metrics for systems with soft-decision FEC and probabilistic shaping,” IEEE Photon. Technol. Lett., vol. 29, no. 23, pp. 2111–2114, Dec. 2017.
[86] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai Shitz, “On information rates for mismatched decoders,” IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1953–1967, Nov. 1994.
[87] A. Martinez, A. G. i Fàbregas, G. Caire, and F. M. J. Willems, “Bit-interleaved coded modulation revisited: A mismatched decoding perspective,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, Jun. 2009.
[88] J. Cho, S. L. I. Olsson, S. Chandrasekhar, and P. Winzer, “Information rate of probabilistically shaped QAM with non-ideal forward error correction,” in Proc. Eur. Conf. Opt. Commun., Rome, Italy, Sep. 2018, Paper Th1H.5.
[89] J. Cho and P. J. Winzer, “Multi-rate prefix-free code distribution matching,” in Proc. Opt. Fiber Commun. Conf., to be published.
[90] G. Böcherer, F. Steiner, and P. Schulte, “Fast probabilistic shaping implementation for long-haul fiber-optic communication systems,” in Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper Tu.2.D.3.
improvement in meshed optical networks with cascaded ROADMs,” Opt. [91] T. V. Ramabadran, “A coding scheme for m-out-of-n codes,” IEEE Trans.
Express, vol. 25, no. 21, pp. 25046–25058, Oct. 2017. Commun., vol. 38, no. 8, pp. 1156–1163, Aug. 1990.
[68] F. P. Guiomar, L. Bertignono, A. Nespola, and A. Carena, “Frequency- [92] F. Tosato and P. Bisaglia, “Simplified soft-output demapper for binary
domain hybrid modulation formats for high bit-rate flexibility and nonlin- interleaved COFDM with application to HIPERLAN/2,” in Proc. Int. Conf.
ear robustness,” J. Lightw. Technol., vol. 36, no. 20, pp. 4856–4870, Oct. Commun., New York, NY, USA, May 2002, vol. 2, pp. 664–668.
2018. [93] G. Baruffa and L. Rugini, “Soft-output demapper with approximated LLR
[69] J. Cho, S. Chandrasekhar, X. Chen, G. Raybon, and P. J. Winzer, for DVB-T2 systems,” in Proc. IEEE GlobeCom, San Diego, CA, USA,
“High spectral efficiency transmission with probabilistic shaping,” in Dec. 2015, pp. 1–6.
Proc. Eur. Conf. Opt. Commun., Gothenburg, Sweden, Sep. 2017, Paper [94] R. Maher, K. Croussore, M. Lauermann, R. Going, X. Xu, and J. Rahn,
Th.1.E.1. “Constellation shaped 66 GBd DP-1024QAM transceiver with 400 km
[70] G. D. Forney, Principles of Digital Communication II. Cambridge, transmission over standard SMF,” in Proc. Eur. Conf. Opt. Commun.,
MA, USA: MIT OpenCourseWare, Sep. 7, 2018. [Online]. Available: Gothenburg, Sweden, Sep. 2017, Paper Th.PDP.B.2.
https://ocw.mit.edu [95] H. Imai and S. Hirakawa, “A new multilevel coding method using error-
[71] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. correcting codes,” IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 371–377,
Hoboken, NJ, USA: Wiley, 2006. May 1977.
CHO AND WINZER: PROBABILISTIC CONSTELLATION SHAPING FOR OPTICAL FIBER COMMUNICATIONS
Junho Cho (M’10) received the B.S., M.S., and Ph.D. degrees in electrical
engineering and computer science from Seoul National University, Seoul, South
Korea. He was with Bell Labs, Seoul, South Korea, from 2010 to 2014, and has
been with Bell Labs, Holmdel, NJ, USA, since 2014. He was a Ph.D. dissertation
committee member for Seoul National University. He has authored or coauthored
numerous papers and serves as a reviewer for a wide range of IEEE journals
spanning optics, communications, circuits and systems, and computing. His
current research interests are probabilistic constellation shaping, forward
error correction, and signal processing. He was the recipient of the
Outstanding Research Award under the Brain Korea 21 Project while studying at
Seoul National University in 2009.

Peter J. Winzer (F’09) received the Ph.D. degree from the Vienna University
of Technology, Vienna, Austria, where he worked on space-borne lidar and laser
communications for the European Space Agency. Since 2000, he has been with
Bell Labs, Holmdel, NJ, USA, and has focused on many aspects of fiber-optic
communications and networking, from advanced optical modulation, multiplexing,
and detection to cross-layer network architectures. He has contributed to
several high-speed optical transmission records from 100 Gb/s to 1 Tb/s in
laboratory experiments and field trials, and has been widely promoting spatial
multiplexing to overcome the optical network capacity crunch. He has authored,
coauthored, and patented extensively, and is actively involved with the IEEE
Photonics Society and the Optical Society of America, including service as the
Program Chair of ECOC 2009, Program/General Chair of OFC 2015/17, and former
Editor-in-Chief of the IEEE/OSA JOURNAL OF LIGHTWAVE TECHNOLOGY. He was the
recipient of multiple awards for his work and is a highly cited researcher. He
is a Fellow of Bell Labs and the OSA, and an elected member of the U.S.
National Academy of Engineering.