Abstract—We study shaping codes for noiseless finite-state channels with cost and i.i.d. sources. We establish a relationship between the code rate and the minimum average symbol cost. We then determine the rate that minimizes the average cost per source symbol (total cost). An equivalence is established between codes minimizing average symbol cost and codes minimizing total cost, and a separation theorem is proved, showing that optimal shaping can be achieved by a concatenation of optimal compression and optimal shaping for a uniform i.i.d. source.

I. INTRODUCTION

Shaping codes are used to encode information for use on channels with symbol costs under an average cost constraint. They find application in data transmission with a power constraint, where constellation shaping is achieved by addressing into a suitably designed multidimensional constellation or, equivalently, by incorporating, either explicitly or implicitly, some form of non-equiprobable signaling. More recently, shaping codes have been proposed for use in data storage applications: coding for flash memory to reduce device wear [17], and coding for efficient DNA synthesis in DNA-based storage [14]. Motivated by these applications, [18] investigated information-theoretic properties and design of rate-constrained fixed-to-variable length shaping codes for memoryless noiseless channels with cost and general i.i.d. sources. In this paper, we extend the results in [18] to rate-constrained shaping codes for finite-state noiseless channels with cost and general i.i.d. sources.

Finite-state noiseless channels with cost trace their conceptual origins to Shannon's 1948 paper that launched the study of information theory [23]. In that paper, Shannon considered the problem of transmitting information over a telegraph channel. The telegraph channel is a finite-state graph, and the channel symbols – dots and dashes – have different time durations, which can be interpreted as integer transmission costs. Shannon defined the combinatorial capacity of this channel and gave an explicit formula. He also determined the symbol probabilities that maximize the entropy per unit cost, and showed the equivalence of this probabilistic definition of capacity to the combinatorial capacity. In [4], this result was generalized to arbitrary non-negative symbol costs. In [11], a new proof technique for deriving the combinatorial capacity was introduced for non-integer costs, and another proof of the equivalence of the combinatorial and probabilistic definitions of capacity was given. In [2] and [3], a generating function approach was used to extend the equivalence to a larger class of constrained systems.

We refer to the problem of designing codes that achieve the capacity, i.e., that maximize the information rate per unit cost, or, equivalently, that minimize the cost per information bit, as the type-II coding problem. In [22], an arithmetic coding technique for finite-state noiseless channels was introduced. Several works extend coding algorithms for memoryless channels to finite-state channels. In [2], a finite-state graph was transformed into its memoryless representation M, and a normalized geometric Huffman code was used to design an asymptotically capacity-achieving code on M. In [8], the author extended the dynamic programming algorithm introduced in [7] to finite-state channels. The proposed algorithm finds locally optimal codes for each starting state, but it does not guarantee global optimality. In [6], an iterative algorithm that can find globally optimal codes was proposed.

The concepts of combinatorial capacity and probabilistic capacity can be generalized to the setting where there is a constraint on the average cost per transmitted channel symbol. The probabilistic capacity was determined in [20] and [9], where the entropy-maximizing stationary Markov chain satisfying the average cost constraint was found. The relationship between cost-constrained combinatorial capacity and probabilistic capacity was also addressed in [10]. The equivalence of the two definitions of cost-constrained capacity was proved in [25], and an alternative proof was recently given in [15], where methods of analytic combinatorics in several variables were used to directly evaluate the cost-constrained combinatorial capacity.

We refer to the problem of designing codes that achieve the cost-constrained capacity as the type-I coding problem. This problem has also been addressed by several authors. In [10], an asymptotically optimal block code was introduced by considering codewords that start and end at the same state. In [12], the authors construct fixed-to-fixed length and variable-to-fixed length codes based on state-splitting methods [1] for magnetic recording and constellation shaping applications. Other constructions can be found in [13], [24], and [26].

In this paper, we address the problem of designing shaping codes for noiseless finite-state channels with cost and general i.i.d. sources. We systematically study the fundamental properties of these codes from the perspective of symbol distribution, average cost, and entropy rate using the theory of finite-state word-valued sources. We derive fundamental bounds relating these quantities and establish an equivalence between optimal type-I and type-II shaping codes. A generalization of Varn coding [28] is shown to provide an asymptotically optimal type-II shaping code for uniform i.i.d. sources. Finally, we prove separation theorems showing that optimal shaping for a general i.i.d. source can be achieved by a concatenation of optimal lossless compression with an optimal shaping code for a uniform i.i.d. source.
In Section II, we define finite-state channels with cost and review the combinatorial and probabilistic capacities associated with the type-I and type-II coding problems. In Section III, we define finite-state variable-length shaping codes for channels with cost and characterize properties of the codeword process using the theory of finite-state word-valued sources. In Section IV, we analyze shaping codes for a fixed code rate, which we call type-I shaping codes. We develop a theoretical bound on the trade-off between the rate – or, more precisely, the corresponding expansion factor – and the average cost of a type-I shaping code. We then study shaping codes that minimize the average cost per source symbol (total cost). We refer to this class of shaping codes as type-II shaping codes. We derive the relationship between the code expansion factor and the total cost and determine the optimal expansion factor. In Section V, we consider the problem of designing optimal shaping codes. We prove an equivalence theorem showing that both type-I and type-II shaping codes can be realized using a type-II shaping code for a channel with modified edge costs. Using a generalization of Varn coding [28], we propose an asymptotically optimal type-II shaping code on this modified channel for a uniform i.i.d. source. We then extend our construction to arbitrary i.i.d. sources by introducing a separation theorem, which states that optimal shaping can be achieved by a concatenation of lossless compression and optimal shaping for a uniform i.i.d. source.

Due to space constraints, we must omit many detailed proofs, which can be found in [16]. However, we remark that several new proof techniques are required to extend the results on block shaping codes for memoryless channels in [18] to the corresponding results on finite-state shaping codes for finite-state channels in this paper.

II. NOISELESS FINITE-STATE COSTLY CHANNEL

Let H = (V, E) be an irreducible finite directed graph, with vertices V and edges E. A finite-state costly channel is a noiseless channel with cost associated with H, where each edge e ∈ E is assigned a non-negative cost w(e) ≥ 0. We assume that between any pair of vertices (v_i, v_j) ∈ V × V there is at most one edge. If not, we can always convert H to another graph that satisfies this condition by state splitting [19]. An example of such a channel is given in Example 1.

Example 1. In SLC NAND flash memory, cells are arranged in a grid and programming a cell affects its neighbors. One example of this phenomenon is inter-cell interference (ICI) [27]. Cells have two states: programmed, corresponding to bit 1, and erased, corresponding to bit 0. Due to ICI, programming a cell will damage its neighboring cells. Each length-3 sequence has a cost associated with the damage to the middle bit, as shown in Table I. We can convert this table into a directed graph with vertices V = {00, 01, 10, 11}, as shown in Fig. 1.

TABLE I: Flash memory channel cost

Fig. 1: Flash memory channel

A. Channel capacity with cost constraint

The cost-constrained combinatorial capacity of the channel is defined as

C_{I,comb}(W) ≜ limsup_{n→∞} (1/n) log2 |K_n(W)|.   (1)

We also refer to this definition as the type-I combinatorial capacity. Let E be a stationary Markov process with entropy rate H(E) and average cost A(E). The probabilistic capacity for a given average cost constraint W, or cost-constrained probabilistic capacity, is

C_{I,prob}(W) ≜ sup_{E : A(E) ≤ W} H(E).   (2)

The maxentropic Markov chain for a given W was derived in [9] and [20]. The result relies on the one-step cost-enumerator matrix D(S), where S ≥ 0, with entries

d_{ij}(S) = 2^{−S w(e_{ij})} if there is an edge e_{ij} between (v_i, v_j), and d_{ij}(S) = 0 if the edge e_{ij} does not exist.   (3)

Denote by λ(S) its Perron root and by E_L = [P_{v_i}/ρ_i] and E_R = [ρ_i]^T the corresponding left and right eigenvectors, normalized such that E_L E_R = 1. Given an average cost constraint W(S), the maxentropic Markov chain has transition probabilities

P_{ij}(S) = (2^{−S w(e_{ij})} / (ρ_i λ(S))) ρ_j,   (4)

such that

W(S) = (1/λ(S)) ∑_{ij} P_{v_i} 2^{−S w(e_{ij})} (ρ_j/ρ_i) w(e_{ij}),   (5)

and the type-I probabilistic capacity of this channel is

C_{I,prob}(W(S)) = log2 λ(S) + S W(S).   (6)

It was shown in [25], [15] that C_{I,comb}(W) = C_{I,prob}(W).

B. Channel capacity without cost constraint

Denote by K(W) the number of distinct sequences e* with cost equal to W. The combinatorial capacity, or the type-II combinatorial capacity, of this channel is defined as

C_{II,comb} ≜ limsup_{W→∞} (1/W) log2 K(W).   (7)
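The quantities in equations (3)–(6) can be checked numerically. The sketch below uses an assumed two-state toy channel (not the flash channel of Example 1, whose cost table is given above): it forms D(S), obtains the Perron root λ(S) and right eigenvector ρ by power iteration, builds the maxentropic transition probabilities of eq. (4), and verifies that the chain's entropy rate equals log2 λ(S) + S·W(S), as in eq. (6).

```python
import math

# Assumed toy 2-state channel with illustrative costs; None marks a missing edge.
w = [[1.0, 2.0],
     [1.0, None]]
S = 0.5            # cost parameter S >= 0
n = len(w)

# One-step cost-enumerator matrix D(S), eq. (3): d_ij = 2^{-S w(e_ij)}, 0 if no edge.
D = [[0.0 if w[i][j] is None else 2.0 ** (-S * w[i][j]) for j in range(n)]
     for i in range(n)]

# Perron root lambda(S) and right eigenvector rho via power iteration.
rho = [1.0] * n
norm = 1.0
for _ in range(500):
    nxt = [sum(D[i][j] * rho[j] for j in range(n)) for i in range(n)]
    norm = max(nxt)
    rho = [x / norm for x in nxt]
lam = norm  # dominant eigenvalue after convergence

# Maxentropic transition probabilities, eq. (4): P_ij = rho_j 2^{-S w_ij} / (rho_i lam).
P = [[D[i][j] * rho[j] / (rho[i] * lam) for j in range(n)] for i in range(n)]
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)  # rows are stochastic

# Stationary distribution pi of the chain (left power iteration).
pi = [1.0 / n] * n
for _ in range(500):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Average cost W(S) and entropy rate H of the maxentropic chain.
W = sum(pi[i] * P[i][j] * w[i][j]
        for i in range(n) for j in range(n) if w[i][j] is not None)
H = -sum(pi[i] * P[i][j] * math.log2(P[i][j])
         for i in range(n) for j in range(n) if P[i][j] > 0)

# Check the capacity formula, eq. (6): H = log2 lambda(S) + S * W(S).
assert abs(H - (math.log2(lam) + S * W)) < 1e-9
```

The final assertion holds for any S and any irreducible cost matrix, since the ρ terms in log2 P_{ij} telescope under the stationary distribution; only the toy costs above are assumptions.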
We call shaping codes that achieve the minimum average cost for a given expansion factor optimal type-I shaping codes. We solve the following optimization problem:

minimize over {P̂_{e_{ij}}}:  ∑_{ij} P̂_{e_{ij}} w(e_{ij})

subject to  H(Ê) = −∑_{ij} P̂_{e_{ij}} log2 ( P̂_{e_{ij}} / ∑_j P̂_{e_{ij}} ) ≥ H(X)/f,   (29)

∑_j P̂_{e_{ji}} = ∑_j P̂_{e_{ij}},  and  ∑_{ij} P̂_{e_{ij}} = 1.

In [15], the authors discuss cost-diverse and cost-uniform graphs. A graph is cost-diverse if it has at least one pair of equal-length paths with different costs that connect the same pair of vertices. Otherwise it is called cost-uniform. It can be proved that the edge costs w(e_{ij}) of a cost-uniform graph can be expressed as w(e_{ij}) = −μ_i + μ_j − α. The following theorem gives the achievable minimum average cost of a finite-state shaping code.

Theorem 5. On a cost-diverse graph, the average cost of a type-I shaping code φ : V × X^q → E* with expansion factor f is lower bounded by

A_min(f) = ∑_{ij} P̂_{e_{ij}} w(e_{ij}) = H(X)/(S f) − log2 λ(S)/S,   (30)

where P̂_{e_{ij}} = (P̂_{v_i} 2^{−S w(e_{ij})} / (ρ_i λ(S))) ρ_j, λ(S) is the Perron root of the matrix D(S), and E_L = [P̂_{v_i}/ρ_i], E_R = [ρ_i]^T are the corresponding left and right eigenvectors such that E_L E_R = 1.

For an optimal type-II shaping code (Theorem 6), S* is the constant such that λ(S*) = 1, and E_L = [P̂*_{v_i}/ρ_i] and E_R = [ρ_i]^T are the corresponding eigenvectors such that E_L E_R = 1. The corresponding expansion factor f* is

f* = H(X) / ( −∑_{ij} P̂*_{e_{ij}} log2 ( P̂*_{e_{ij}} / P̂*_{v_i} ) ) = H(X) / ( S* ∑_{ij} P̂*_{e_{ij}} w(e_{ij}) ).   (35)

If there is a cost-0 cycle in H, the total cost is a decreasing function of f. □

V. OPTIMAL SHAPING CODE DESIGN

In this section, we consider the problem of designing optimal type-I and type-II shaping codes.

A. Equivalence Theorem

We consider the channel with modified edge costs

w′(e_{ij}) = −log2 ( P̂*_{e_{ij}} / P̂*_{v_i} ),   (36)

where P̂*_{e_{ij}} and P̂*_{v_i} are given in Theorem 6. It is easy to check that the optimal type-II shaping codes on this channel are also optimal on the original channel, in the sense that the symbol occurrence probabilities {P̂_{e_{ij}}} are identical on both channels. We can prove the following lemma.

Lemma 7. Given a noiseless finite-state costly channel with edge costs {w(e_{ij})}. If there is a shaping code φ : V × X^q → E* such that

| f ∑_{ij} P̂_{e_{ij}} w′(e_{ij}) − H(X) | < δ,   (37)
where w′(e_{ij}) = −log2 ( P̂*_{e_{ij}} / P̂*_{v_i} ) = S* w(e_{ij}) + log2 ρ_i − log2 ρ_j, for some δ > 0, then the total cost of this code satisfies

| f ∑_{ij} P̂_{e_{ij}} w(e_{ij}) − H(X)/S* | < δ/S*.   (38) □

The next two theorems establish the equivalence between type-I and type-II shaping codes.

Theorem 8. Given a noiseless finite-state costly channel with edge costs {w(e_{ij})}. For any γ, η > 0, there exists a δ > 0 such that if there exists a shaping code φ : V × X^q → E* with expansion factor f′ such that

| f′ ∑_{ij} P̂′_{e_{ij}} w′(e_{ij}) − H(X) | < δ,   (39)

where w′(e_{ij}) = −log2 ( P̂*_{e_{ij}} / P̂*_{v_i} ) = S* w(e_{ij}) + log2 ρ_i − log2 ρ_j, then the average cost of this code satisfies

| ∑_{ij} P̂′_{e_{ij}} w(e_{ij}) − ( H(X)/(S f) − log2 λ(S)/S ) | < γ,   (40)

and the expansion factor f′ of this code satisfies |f′ − f| < η.

Theorem 9. Given a noiseless finite-state costly channel that does not contain a cost-0 cycle. Denote by S* the constant such that λ(S*) = 1 and by f* the expansion factor of an optimal type-II shaping code. For any γ > 0, there exist δ, η > 0 such that if a shaping code φ : V × X^q → E* with expansion factor f′ satisfies

| ∑_{ij} P̂′_{e_{ij}} w(e_{ij}) − A_min(f′) | < δ  and  |f′ − f*| < η,   (41)

then the total cost of this code satisfies

| f′ ∑_{ij} P̂′_{e_{ij}} w(e_{ij}) − H(X)/S* | < γ.   (42)

B. Generalized Varn Code

We now describe an asymptotically optimal type-II shaping code for uniform i.i.d. sources based on a generalization of Varn coding [28]. Given a uniform i.i.d. input source X, a generalized Varn code on the noiseless finite-state costly channel is a collection of tree-based variable-length mappings φ : V × X^q → E*. Denote by Y_k the set of codewords starting from state v_k, namely

Y_k = { φ(v_k, x^q) | x^q ∈ X^q }.   (43)

Codewords in Y_k are generated according to the following steps.
• Set state v_k ∈ V as the root of the tree.
• Expand the root node. The edge costs {w′(e_{kl})} are the modified costs defined in Lemma 7. The cost of a leaf node is the cost of the path from the root node to the leaf node.
• Expand the leaf node that has the lowest cost.
• Repeat the previous steps until the total number of leaf nodes M ≥ |X|^q. Delete the leaf nodes that have the largest cost until the number of leaf nodes equals |X|^q.
Each path from the root node v_k to a leaf node represents one codeword in Y_k.

The following lemma gives an upper bound on the total cost of a generalized Varn code.

Lemma 10. The total cost of a generalized Varn code φ : V × X^q → E* is upper bounded by

T(φ) ≤ (log2 M)/q + (max_{ij} w′(e_{ij}))/q,   (44)

which tends to log2 |X| as q → ∞.

Remark 3. By extending some leaf nodes to states that are not visited by the original code, we can make the graph G′ complete. Then we can choose any state as the starting state. This operation only adds a constant to the cost of a codeword and therefore does not affect the asymptotic performance of the generalized Varn code.

Example 2. For the channel introduced in Example 1, the optimal symbol distributions that minimize the total cost are shown in Table II. Based on this distribution, we can design a generalized Varn code on the channel with the modified edge costs shown in Table III. The total cost as a function of codebook size is shown in Fig. 3.

TABLE II: Probabilities for the SLC flash channel that minimize total cost.
u:    000     001     010     011     100     101     110     111
P̂_u:  0.4318  0.1323  0.1135  0.0593  0.1323  0.0405  0.0593  0.0310

TABLE III: Modified cost for the flash memory channel.
u:     000     001     010     011     100     101     110     111
C(u):  0.3805  2.0923  0.6068  1.5423  0.3855  2.0923  0.6068  1.5423

Fig. 3: The total cost of a generalized Varn code on the SLC flash channel

C. Separation Theorem

We now present a separation theorem for shaping codes. It states that the minimum total cost can be achieved by a concatenation of optimal lossless compression with an optimal shaping code for a uniform i.i.d. source.

Theorem 11. Given an i.i.d. source X and a noiseless finite-state costly channel with edge costs {w(e_{ij})}, the minimum total cost can be achieved by a concatenation of an optimal lossless compression code with a binary optimal type-II shaping code for a uniform i.i.d. source.

Theorem 12. Given the i.i.d. source X, the noiseless finite-state costly channel with edge costs {w(e_{ij})}, and the expansion factor f, the minimum average cost can be achieved by a concatenation of an optimal lossless compression code with a binary optimal type-I shaping code for a uniform i.i.d. source and expansion factor f′ = f / H(X).

By Theorem 9, the optimal type-I shaping code for a uniform i.i.d. source in Theorem 12 can be replaced by a suitable optimal type-II shaping code for a uniform i.i.d. source. □
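The tree-growing steps of the generalized Varn construction above can be sketched in code. The Python below is illustrative only: the channel and its modified edge costs w′ are assumed toy values (not the flash-channel costs of Table III), and `varn_codebook` is a hypothetical helper, not an implementation from this paper. It repeatedly expands the cheapest leaf using a heap, then deletes the costliest surplus leaves.

```python
import heapq

# Assumed toy channel: w_mod[state] maps next_state -> modified edge cost w'.
# Hypothetical values chosen for illustration; every state has out-degree 2,
# so each expansion strictly increases the number of leaves.
w_mod = {
    0: {0: 1.0, 1: 1.0},
    1: {0: 0.5, 1: 1.5},
}

def varn_codebook(root, num_codewords):
    """Grow a code tree from `root`, always expanding the cheapest leaf,
    until at least num_codewords leaves exist; then delete the costliest
    extra leaves, as in the generalized Varn construction."""
    # Each heap entry is (path cost, current state, tuple of edges taken).
    heap = [(0.0, root, ())]
    while len(heap) < num_codewords:
        cost, state, path = heapq.heappop(heap)          # cheapest leaf
        for nxt, c in sorted(w_mod[state].items()):      # expand it
            heapq.heappush(heap, (cost + c, nxt, path + (nxt,)))
    # Keep the num_codewords cheapest leaves (delete the largest-cost ones).
    return sorted(heap)[:num_codewords]

# Codebook for q = 3 binary source symbols: |X|^q = 8 codewords from state 0.
book = varn_codebook(0, 8)
```

Because leaves are expanded cheapest-first, the surviving leaf costs differ by at most the largest modified edge cost; normalized by q, that gap is the slack term in the bound of Lemma 10.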
REFERENCES
[1] R. Adler, D. Coppersmith, and M. Hassner, “Algorithms for sliding block
codes,” IEEE Trans. Inf. Theory, vol. IT-29, no. 1, pp. 5–22, Jan. 1983.
[2] G. Böcherer, “Capacity-Achieving Probabilistic Shaping for Noisy and
Noiseless Channels”, Ph.D. dissertation, RWTH Aachen University, 2012.
[3] G. Böcherer, R. Mathar, V. C. da Rocha Jr., and C. Pimentel, “On the
capacity of constrained systems,” in Proc. Int. ITG Conf. Source Channel
Coding (SCC), 2010.
[4] I. Csiszár, “Simple proofs of some theorems on noiseless channels”, Inf.
Contr., vol. 14, pp. 285–298, 1969.
[5] R. Durrett, Probability: Theory and Examples, 3rd ed. Belmont, CA:
Duxbury, 2004.
[6] R. Fujita, K. Iwata, and H. Yamamoto, “An Iterative Algorithm to
Optimize the Average Performance of Markov Chains with Finite States,”
in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019, pp.
1902–1906.
[7] M. J. Golin and G. Rote, “A dynamic programming algorithm for
constructing optimal prefix-free codes with unequal letter costs,” IEEE
Trans. Inf. Theory, vol. 44, no. 5, pp. 1770–1781, Sep. 1998.
[8] K. Iwata and T. Koyama, “A prefix-free coding for finite-state noiseless
channels with small coding delay,” in Proc. 2010 Int. Symp. Inf. Theory
& its Applications, Taichung, Taiwan, Oct. 2010, pp. 473–477.
[9] J. Justesen and T. Høholdt, “Maxentropic Markov chains,” IEEE Trans.
Inf. Theory, vol. IT-30, no. 4, pp. 665–667, Jul. 1984.
[10] R. Karabed, D. L. Neuhoff, A. Khayrallah, The Capacity of Costly
Noiseless Channels, Research report, IBM Research Division, 1988.
[11] A. Khandekar, R. J. McEliece, and E. Rodemich, “The Discrete Noiseless
Channel Revisited,” in Proc. 1999 Int. Symp. Communication Theory and
Applications, pp. 115-137, 1999.
[12] A. S. Khayrallah and D. L. Neuhoff, “Coding for channels with cost
constraints,” IEEE Trans. Inf. Theory, vol. 42, pp. 854-867, May 1996.
[13] V. Y. Krachkovsky, R. Karabed, S. Yang, and B. A. Wilson, “On
modulation coding for channels with cost constraints,” in Proc. IEEE
Int. Symp. Inf. Theory, Honolulu, HI, Jun.–Jul. 2014, pp. 421–425.
[14] A. Lenz, Y. Liu, C. Rashtchian, P. H. Siegel, A. Wachter-Zeh, and E.
Yaakobi, “Coding for efficient DNA synthesis,” in Proc. IEEE Int. Symp.
Inf. Theory, Los Angeles, CA, Jun. 2020, pp. 2885-2890.
[15] A. Lenz, S. Melczer, C. Rashtchian, and P. H. Siegel, “Multivariate
Analytic Combinatorics for Cost Constrained Channels and Subsequence
Enumeration”, arXiv:2111.06105 [cs.IT], Nov. 2021.
[16] Y. Liu, “Coding Techniques to Extend the Lifetime of Flash Mem-
ories”, Ph.D. dissertation, University of California, San Diego, 2020.
https://escholarship.org/uc/item/43k8v2hz
[17] Y. Liu and P. H. Siegel, “Shaping codes for structured data,” in Proc.
IEEE Globecom, Washington, D.C., Dec. 4-8, 2016, pp. 1–5.
[18] Y. Liu, P. Huang, A. W. Bergman, P. H. Siegel, “Rate-constrained shaping
codes for structured sources”, IEEE Trans. Inf. Theory, vol. 66, no. 8,
pp. 5261–5281, Aug. 2020.
[19] B. H. Marcus, R.M. Roth, and P.H. Siegel, An Introduction to Cod-
ing for Constrained Systems, Lecture Notes, 2001, available online at:
ronny.cswp.cs.technion.ac.il/wp-content/uploads/sites/54/2016/05/chapters1-9.pdf
[20] R. J. McEliece and E. R. Rodemich, “A maximum entropy Markov chain”, in Proc.
17th Conf. Inf. Sciences and Systems, Johns Hopkins University, Mar. 1983, pp.
245-248.
[21] M. Nishiara and H. Morita, “On the AEP of word-valued sources,” IEEE Trans. Inf.
Theory, vol. 46, no. 3, pp. 1116–1120, May 2000.
[22] S. A. Savari and R. G. Gallager, “Arithmetic coding for finite-state noiseless
channels,” IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 100–107, Jan. 1994.
[23] C. E. Shannon, “A mathematical theory of communication, Part I, Part II,” Bell Syst.
Tech. J, vol. 27, pp. 379–423, 1948.
[24] J. B. Soriaga and P. H. Siegel, “On distribution shaping codes for partial-response
channels,” in Proc. 41st Annual Allerton Conference on Communication, Control,
and Computing, Monticello, IL, USA, Oct. 2003, pp. 468–477.
[25] J. B. Soriaga and P. H. Siegel, “On the design of finite-state shaping encoders
for partial-response channels,” in Proc. 2006 Inf. Theory and Applications
Workshop (ITA 2006), San Diego, CA, USA, Feb. 2006.
[26] J. B. Soriaga and P. H. Siegel, “Near-capacity coding systems for partial-response
channels,” in Proc. IEEE Int. Symp. Inf. Theory, Chicago, IL, USA, Jun. 2004,
p. 267.
[27] V. Taranalli, H. Uchikawa, and P. H. Siegel, “Error analysis and inter-cell interference
mitigation in multi-level cell flash memories,” in Proc. IEEE Int. Conf. Commun.
(ICC), London, UK, Jun. 2015, pp. 271–276.
[28] B. Varn, “Optimal variable length codes (arbitrary symbol cost and equal code word
probability),” Inform. Contr., vol. 19, pp. 289–301, 1971.