Notes 2
We begin by answering the question raised at the end of the previous notes on the existence of
asymptotically good codes.
Suppose we are interested in q-ary codes (not necessarily linear) of block length n and minimum
distance d that have many codewords. What is the largest size such a code can have? This is a
fundamental quantity for which we define a notation below.
Definition 1 Let $A_q(n, d)$ be the largest size of a q-ary code of block length n and minimum distance d. The binary case is of special importance, and in this case $A_2(n, d)$ is denoted simply as $A(n, d)$.
There is a natural greedy approach to construct a code of distance at least d: start with any
codeword, and keep on adding codewords which have distance at least d from all previously chosen
codewords, until we can proceed no longer. Suppose this procedure halts after picking a code C.
Then Hamming balls in $\{0, 1, \ldots, q-1\}^n$ of radius d − 1 centered at the codewords of C must cover
the whole space. (Otherwise, we can pick one more codeword which has distance at least d from
every element of C, and the process would not have terminated.)
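To make the procedure concrete, here is a minimal Python sketch of the greedy construction (a brute-force illustration with names of our choosing; it scans the whole space, so it is exponential in n and only usable for tiny parameters):

```python
from itertools import product

def hamming_dist(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def greedy_code(q, n, d):
    """Scan {0,...,q-1}^n in lexicographic order, keeping every string at
    distance >= d from all previously kept strings."""
    code = []
    for w in product(range(q), repeat=n):
        if all(hamming_dist(w, c) >= d for c in code):
            code.append(w)
    return code

# Example: q = 2, n = 7, d = 3. The covering argument above guarantees at
# least 2^7 / Vol_2(7,2) = 128/29 > 4 codewords; the greedy scan does much
# better here and finds 16 (in fact the [7,4,3] Hamming code).
print(len(greedy_code(2, 7, 3)))
```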
Definition 2 For integers q, n, ℓ, denote by $\mathrm{Vol}_q(n, \ell)$ the volume of (i.e., the number of strings in) a Hamming ball of radius ℓ in $\{0, 1, \ldots, q-1\}^n$. Note that this number does not depend on where the ball is centered and equals
$$\mathrm{Vol}_q(n, \ell) \;=\; \sum_{j=0}^{\ell} \binom{n}{j} (q-1)^j \,.$$
Lemma 3 (Gilbert-Varshamov bound) The maximal size of a q-ary code of block length n and
distance d satisfies
$$A_q(n, d) \;\ge\; \frac{q^n}{\mathrm{Vol}_q(n, d-1)} \;=\; \frac{q^n}{\sum_{j=0}^{d-1} \binom{n}{j} (q-1)^j} \,. \qquad (1)$$
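Numerically evaluating the bound (1) is straightforward; the sketch below (helper names are ours) computes $\mathrm{Vol}_q(n, d-1)$ and the resulting guarantee on $A_q(n, d)$:

```python
from math import comb

def vol(q, n, r):
    """Vol_q(n, r) = sum_{j=0}^{r} C(n,j) (q-1)^j."""
    return sum(comb(n, j) * (q - 1) ** j for j in range(r + 1))

def gv_lower_bound(q, n, d):
    """Guarantee of Lemma 3: A_q(n, d) >= q^n / Vol_q(n, d-1)."""
    return -(-q ** n // vol(q, n, d - 1))  # ceiling, since A_q is an integer

# For q = 2, n = 7, d = 3: Vol_2(7,2) = 29, so A(7,3) >= ceil(128/29) = 5.
print(gv_lower_bound(2, 7, 3))
```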
There also exist linear codes of size given by the Gilbert-Varshamov bound:
Exercise 1 By a suitable adaptation of the greedy procedure, prove that there also exists a linear code over $\mathbb{F}_q$ of dimension at least $n - \lfloor \log_q \mathrm{Vol}_q(n, d-1) \rfloor$.
The Gilbert-Varshamov bound was actually proved in two independent works (Gilbert, 1952) and
(Varshamov, 1957). The latter actually proved the existence of linear codes and in fact got a slightly
sharper bound stated below. (You can verify that the Hamming code in fact attains this bound for
d = 3.)
Exercise 2 For every prime power q, and integers n, k, d, prove that there exists an $[n, k, d]_q$ linear code with
$$k \;\ge\; n - \left\lfloor \log_q \left( \sum_{j=0}^{d-2} \binom{n-1}{j} (q-1)^j \right) \right\rfloor - 1 \,.$$
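As a quick check of the earlier remark that the Hamming code attains this bound for d = 3, one can evaluate the guarantee for the parameters of the $[7, 4, 3]_2$ Hamming code; a sketch (function name is ours; the floating-point log is accurate enough at this scale):

```python
from math import comb, floor, log

def varshamov_dim(q, n, d):
    """Dimension guaranteed by Exercise 2:
    n - floor(log_q(sum_{j=0}^{d-2} C(n-1,j)(q-1)^j)) - 1."""
    s = sum(comb(n - 1, j) * (q - 1) ** j for j in range(d - 1))
    return n - floor(log(s, q)) - 1

# q = 2, n = 7, d = 3: the sum is C(6,0) + C(6,1) = 7, so
# k >= 7 - floor(log2 7) - 1 = 7 - 2 - 1 = 4, matching the Hamming code.
print(varshamov_dim(2, 7, 3))
```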
In fact, one can prove that a random linear code almost matches the Gilbert-Varshamov bound
with high probability, so such linear codes exist in abundance. But before stating this, we will
switch to the asymptotic viewpoint, expressing the lower bound in terms of the rate vs. relative
distance trade-off.
We now give an asymptotic estimate of the volume $\mathrm{Vol}_q(n, d)$ when d = pn for p ∈ [0, 1 − 1/q] held fixed and n growing. This volume turns out to be very well approximated by the exponential $q^{h_q(p)\, n}$, where $h_q(\cdot)$ is the “entropy function” defined below.
Definition 4 (Entropy function) For an integer q ≥ 2, define the q-ary entropy function $h_q : [0, 1] \to \mathbb{R}$ by
$$h_q(x) \;=\; x \log_q(q-1) - x \log_q x - (1-x) \log_q(1-x) \,,$$
with the convention $0 \log_q 0 = 0$. For q = 2 this is the binary entropy function $h(p) = -p \log_2 p - (1-p) \log_2(1-p)$.
If X is the {0, 1}-valued random variable such that P[X = 1] = p and P[X = 0] = 1 − p, then the Shannon entropy of X, H(X), equals h(p). In other words, h(p) is the uncertainty in the outcome of a p-biased coin toss (which lands heads with probability p and tails with probability 1 − p). The function $h_q$ is continuous and increasing in the interval [0, 1 − 1/q] with $h_q(0) = 0$ and $h_q(1 - 1/q) = 1$. The binary entropy function is symmetric about the line x = 1/2: h(1 − x) = h(x).
We can define the inverse of the entropy function as follows. For y ∈ [0, 1], the inverse $h_q^{-1}(y)$ is equal to the unique x ∈ [0, 1 − 1/q] satisfying $h_q(x) = y$.
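Since $h_q$ is continuous and increasing on [0, 1 − 1/q], this inverse can be computed numerically by bisection; a minimal sketch (function names are ours):

```python
from math import log

def h_q(q, x):
    """q-ary entropy h_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x)."""
    if x == 0:
        return 0.0  # convention: 0 log 0 = 0
    if x == 1:
        return log(q - 1, q)
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def h_q_inv(q, y, iters=60):
    """h_q^{-1}(y): the unique x in [0, 1 - 1/q] with h_q(x) = y, by bisection,
    using that h_q is increasing on this interval."""
    lo, hi = 0.0, 1.0 - 1.0 / q
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h_q(q, mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(h_q(2, 0.5))      # 1.0: the binary entropy peaks at x = 1/2
print(h_q_inv(2, 1.0))  # 0.5
```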
Lemma 5 For an integer q ≥ 2 and p ∈ [0, 1 − 1/q],
$$\mathrm{Vol}_q(n, pn) \;\le\; q^{h_q(p)\, n} \,.$$
Proof: We have
$$\frac{\mathrm{Vol}_q(n, pn)}{q^{h_q(p) n}} \;=\; \frac{\sum_{j=0}^{pn} \binom{n}{j} (q-1)^j}{(q-1)^{pn}\, p^{-pn}\, (1-p)^{-(1-p)n}} \;=\; \sum_{j=0}^{pn} \binom{n}{j} (q-1)^j (q-1)^{-pn} p^{pn} (1-p)^{(1-p)n} \;=\; \sum_{j=0}^{pn} \binom{n}{j} (q-1)^j (1-p)^n \left( \frac{p}{(q-1)(1-p)} \right)^{pn} .$$
Since p ≤ 1 − 1/q, we have $\frac{p}{q-1} \le 1-p$, i.e., $\frac{p}{(q-1)(1-p)} \le 1$, and therefore the above quantity is at most
$$\sum_{j=0}^{pn} \binom{n}{j} (q-1)^j (1-p)^n \left( \frac{p}{(q-1)(1-p)} \right)^{j} \;=\; \sum_{j=0}^{pn} \binom{n}{j}\, p^j (1-p)^{n-j} \;\le\; \sum_{j=0}^{n} \binom{n}{j}\, p^j (1-p)^{n-j} \;=\; 1 \,,$$
which proves the lemma.
The above upper bound is tight up to lower order terms: the quantity $\mathrm{Vol}_q(n, pn)$ is at least as large as $\binom{n}{pn} (q-1)^{pn}$. By Stirling's formula $m! = \sqrt{2\pi m}\, (m/e)^m (1 + o(1))$, it follows that
$$\binom{n}{pn} \;\ge\; \left( \frac{1}{p} \right)^{pn} \left( \frac{1}{1-p} \right)^{(1-p)n} \exp(-o(n)) \;=\; 2^{h(p)n - o(n)}$$
and therefore
$$\mathrm{Vol}_q(n, pn) \;\ge\; \binom{n}{pn} (q-1)^{pn} \;\ge\; q^{h_q(p)n - o(n)} \,.$$
For a self-contained derivation of the entropy estimate for the binomial coefficients, we can work with a crude estimate of m! given by the integral estimate
$$\sum_{i=1}^{m-1} \ln i \;\le\; \int_1^m \ln x \, dx \;\le\; \sum_{i=2}^{m} \ln i \,,$$
which gives
$$\frac{m^m}{e^{m-1}} \;\le\; m! \;\le\; \frac{m^{m+1}}{e^{m-1}} \,.$$
This immediately gives the lower bound
$$\binom{n}{pn} \;\ge\; 2^{h(p)n} \cdot \frac{1}{e\, n^2\, p(1-p)} \;\ge\; 2^{h(p)n - o(n)} \,.$$
We summarize the above discussion in the following important estimate.
Lemma 6 For positive integers n and q ≥ 2, and real p with 0 ≤ p ≤ 1 − 1/q,
$$q^{(h_q(p) - o(1)) n} \;\le\; \mathrm{Vol}_q(n, pn) \;\le\; q^{h_q(p)\, n} \,.$$
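Lemma 6 is easy to check numerically. The sketch below (reusing the vol and h_q helpers from the earlier snippets) shows that $(1/n) \log_q \mathrm{Vol}_q(n, pn)$ approaches $h_q(p)$ from below as n grows:

```python
from math import log

q, p = 2, 0.3
for n in (100, 1000, 10000):
    exponent = log(vol(q, n, int(p * n)), q) / n  # (1/n) log_q Vol_q(n, pn)
    print(n, round(exponent, 4), round(h_q(q, p), 4))
```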
Combining the greedy construction of Lemma 3 with the estimate of the Hamming volume from
Lemma 6 gives the following asymptotic version of the Gilbert-Varshamov bound.
Theorem 7 (Asymptotic Gilbert-Varshamov bound) For every q and δ ∈ [0, 1 − 1/q], there
exists an infinite family C of q-ary codes with rate
$$R(C) \;\ge\; 1 - h_q(\delta) - o(1) \,.$$
(In fact, such codes exist for every block length.)
Since $h_q(\delta) < 1$ for δ < 1 − 1/q, the above implies that for every δ < 1 − 1/q there exists an asymptotically good family of q-ary codes of rate at least some $R_0(\delta) > 0$ and relative distance at least δ. By Exercises 1 and 2 this also holds for linear codes over $\mathbb{F}_q$. We now give an alternate proof based on the probabilistic method.
Theorem 8 For every prime power q, every δ ∈ [0, 1 − 1/q), every ε with $0 < \varepsilon < 1 - h_q(\delta)$, and all sufficiently large positive integers n, the following holds for $k = \lceil (1 - h_q(\delta) - \varepsilon) n \rceil$. If $G \in \mathbb{F}_q^{n \times k}$ is chosen uniformly at random, then the linear code with G as generator matrix has rate at least $1 - h_q(\delta) - \varepsilon$ and relative distance at least δ with probability at least $1 - e^{-\Omega(n)}$.
Proof: The claim about the rate follows whenever G has full column rank. The probability that the i'th column is in the span of the first i − 1 columns is at most $q^{i-1}/q^n$. By a union bound, G has rank k with probability at least $1 - \frac{k}{q^{n-k}} \ge 1 - e^{-\Omega(n)}$.
For each nonzero $x \in \mathbb{F}_q^k$, the vector Gx is a uniformly random element of $\mathbb{F}_q^n$. (Indeed, say $x_k \ne 0$. Then, conditioning on the choice of the first k − 1 columns of G, which form a matrix G′, we have $Gx = G' x' + x_k g_k$ (with x′ the first k − 1 coordinates of x), and this is uniformly distributed since the k'th column $g_k$ is chosen uniformly at random from $\mathbb{F}_q^n$.) Therefore the probability that $\mathrm{wt}(Gx) \le \delta n$ is at most
$$\frac{\mathrm{Vol}_q(n, \delta n)}{q^n} \;\le\; q^{(h_q(\delta) - 1) n} \,.$$
Now a union bound over all nonzero x implies that the probability that the code generated by the columns of G has distance at most δn is bounded from above by
$$q^k \cdot q^{(h_q(\delta) - 1) n} \;\le\; q^{(1 - h_q(\delta) - \varepsilon) n + 1} \cdot q^{(h_q(\delta) - 1) n} \;=\; q \cdot q^{-\varepsilon n} \;\le\; e^{-\Omega(n)} \,.$$
We conclude that with probability at least $1 - e^{-\Omega(n)}$, the code generated by G has relative distance at least δ and rate at least $1 - h_q(\delta) - \varepsilon$.
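The proof of Theorem 8 translates directly into a small experiment. Below is a sketch over $\mathbb{F}_2$ (q = 2) that samples a random generator matrix and computes the minimum weight by brute force over all $2^k$ codewords, so it is only feasible for small parameters; the function name and parameter choices are ours:

```python
import random
from itertools import product

def min_weight_random_code(n, k, rng=random):
    """Sample a uniformly random n x k generator matrix over F_2 and return
    min { wt(Gx) : x != 0 } (the minimum distance when G has full column rank)."""
    G = [[rng.randrange(2) for _ in range(k)] for _ in range(n)]
    best = n
    for x in product(range(2), repeat=k):
        if any(x):
            w = sum(sum(G[i][j] * x[j] for j in range(k)) % 2 for i in range(n))
            best = min(best, w)
    return best

# Rate-1/2 example: at n = 20, k = 10, Theorem 8 predicts relative distance
# near h^{-1}(1/2) ~ 0.11 for large n; at this tiny block length most samples
# already have minimum distance around 2-4.
print(min_weight_random_code(20, 10))
```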
Exercise 3 Establish a similar result by picking a random (n − k) × n parity check matrix for the
code.
1.4 Some comments on attaining/beating the GV bound
We have seen that there exist binary linear codes that meet the Gilbert-Varshamov bound, and thus have rate approaching 1 − h(δ) for a target relative distance δ, 0 < δ < 1/2. The proof of this was non-constructive: it was based either on an exponential time greedy algorithm for constructing such a code, or on picking a generator matrix (or a parity check matrix) at random. The latter leads to a polynomial time randomized Monte Carlo construction. If there were a way to ascertain whether a randomly chosen linear code has the claimed relative distance, then this would be a practical method to construct codes of good distance: we would have a Las Vegas construction that picks a random linear code and then checks that it has good minimum distance. Unfortunately, given a linear code, computing (or even approximating) its minimum distance is NP-hard.
A natural challenge therefore is to give an explicit (i.e., deterministic polynomial time) construction of a code that meets the Gilbert-Varshamov bound (i.e., has rate R and relative distance close to $h_q^{-1}(1 - R)$). Giving such a construction of binary codes (even non-linear ones) remains an outstanding open question.
For prime powers $q = p^{2k}$ with q ≥ 49, explicit constructions of q-ary linear codes that not only attain
but surpass the GV bound are known! These are based on algebraic geometry and a beautiful
construction of algebraic curves with many rational points and small genus. This is also one of the
rare examples in combinatorics where we know an explicit construction that beats the parameters
obtained by the probabilistic method. (Another notable example is the Lubotzky-Phillips-Sarnak
construction of Ramanujan graphs whose girth surpasses the probabilistic bound.)
What about codes over smaller alphabets, and in particular binary codes? The Hamming upper bound on the size of codes (Lemma 13 in Notes 1) leads to the asymptotic upper bound R ≤ 1 − h(δ/2)
on the rate. This is off by a factor of 2 in the coefficient of δ compared to the achievable 1 − h(δ)
rate. We will later see improvements to the Hamming bound, but the best bound will still be
far from the Gilbert-Varshamov bound. Determining the largest rate possible for binary codes of
relative distance δ ∈ (0, 1/2) is another fundamental open problem in the subject. The popular
conjecture seems to be that the Gilbert-Varshamov bound on rate is asymptotically tight (i.e., that a binary code of relative distance δ can have rate at most 1 − h(δ) + o(1)), but arguably there is no strong evidence that this must be the case.
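To see the gap concretely, one can tabulate the two bounds side by side (a sketch reusing the h_q helper from above):

```python
# Rate bounds for binary codes: GV lower bound vs. Hamming upper bound.
for delta in (0.05, 0.1, 0.2, 0.3, 0.4):
    gv = 1 - h_q(2, delta)           # achievable rate (Theorem 7)
    hamming = 1 - h_q(2, delta / 2)  # upper bound on rate (Hamming bound)
    print(f"delta = {delta:.2f}:  GV >= {gv:.3f},  Hamming <= {hamming:.3f}")
```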
While we do not know explicit constructions of binary codes approaching the GV bound, it is still interesting to construct codes that achieve good trade-offs. This leads to the following questions, which are the central questions in coding theory for any noise model once some existential bounds on the achievable trade-offs have been established. (Here we pose them for the worst-case, or adversarial, noise model, where we impose no restriction on the channel other than a limit on the total number of errors it causes.)
1. Can one explicitly construct an asymptotically good family of binary codes with a “good”
rate vs. relative distance trade-off?
2. Can one construct such codes together with an efficient algorithm to correct a fraction of errors approaching half the relative distance (or even beyond)?