Codes, Curves and Cryptography: Informal Notes. I



"God created the integers. Everything else is the work of Man."

Leopold Kronecker

Viet Nguyen-Khac

Hanoi Institute of Mathematics

Galois Fields

$\mathbb{F}_p$, $p$ - a prime (easy to imagine)

$\mathbb{F}_q$, $q = p^e$, $e > 1$ - a bit more difficult

(Hint: $\mathbb{F}_q \cong \mathbb{F}_p[x]/(f(x))$ for some irreducible (better: primitive) polynomial $f$ of degree $e$)
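The hint can be tried out on a small case. The sketch below is only an illustration: it assumes $p = 2$, $e = 3$ and the primitive polynomial $f(x) = x^3 + x + 1$ (a choice made here, not taken from the notes), and stores a polynomial as an integer whose bit $i$ is the coefficient of $x^i$. The helper name `gf_mul` is hypothetical.

```python
def gf_mul(a, b, modulus=0b1011, degree=3):
    """Multiply a and b in F_2[x]/(f); bit i of an integer is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:                 # current coefficient of b is 1: add the shifted a
            result ^= a
        b >>= 1
        a <<= 1                   # multiply a by x
        if (a >> degree) & 1:
            a ^= modulus          # reduce modulo f
    return result

# Powers x, x^2, ..., x^7 of the class of x (encoded as 0b010).
powers = []
element = 1
for _ in range(7):
    element = gf_mul(element, 0b010)
    powers.append(element)

print(powers)  # [2, 4, 3, 6, 7, 5, 1]
```

Every non-zero element of $\mathbb{F}_8$ shows up exactly once among the powers, which is what "primitive" means for $f$.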

Finite Metric Spaces

$A$ - an alphabet, usually $A = \mathbb{F}_q$

$A^n := A \times \dots \times A$ ($n$ times)

Hamming distance:

for $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n) \in \mathbb{F}_q^n$

$$d(x, y) := \#\{i : x_i \neq y_i\}$$

$(\mathbb{F}_q^n, d)$ - a finite metric ($\mathbb{F}_q$-vector) space
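As a minimal sketch, the Hamming distance counts the coordinates in which two words differ. The function name and the sample words are illustrative choices, not part of the notes.

```python
def hamming_distance(x, y):
    """Number of coordinates in which the words x and y differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

x = (1, 0, 1, 1, 0)
y = (1, 1, 1, 0, 0)
z = (0, 1, 1, 0, 1)

print(hamming_distance(x, y))  # 2
print(hamming_distance(y, z))  # 2
print(hamming_distance(x, z))  # 4
```

On this sample the triangle inequality $d(x, z) \le d(x, y) + d(y, z)$ holds with equality.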
The ISBN-Code: $a_1$-$a_2a_3a_4$-$a_5a_6a_7a_8a_9$-$a_{10}$

ISBN - (International Standard Book Number)

$$\sum_{i=1}^{10} i\,a_i \equiv 0 \pmod{11}$$ ($a_{10}$ is the check digit)

If $a_{10} \equiv 10 \pmod{11}$, then it is written as the symbol X (see also the Maple worksheet).

The code can detect a single error
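The check condition can be sketched in a few lines. The function names are hypothetical, and the validity test is simplified (it accepts X in any position); the sample numbers are commonly quoted valid ISBN-10s.

```python
def isbn10_check_digit(first_nine):
    """Return a_10 (as a character) for the nine leading digits a_1..a_9."""
    s = sum(i * a for i, a in enumerate(first_nine, start=1))
    # 10 = -1 (mod 11), so s + 10*a_10 = 0 (mod 11) forces a_10 = s (mod 11).
    a10 = s % 11
    return "X" if a10 == 10 else str(a10)

def is_valid_isbn10(isbn):
    """Check sum_{i=1}^{10} i*a_i = 0 (mod 11); hyphens are ignored."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    digits = [10 if c == "X" else int(c) for c in chars]
    return sum(i * a for i, a in enumerate(digits, start=1)) % 11 == 0

print(isbn10_check_digit([0, 3, 0, 6, 4, 0, 6, 1, 5]))  # 2
print(is_valid_isbn10("0-306-40615-2"))                 # True
print(is_valid_isbn10("0-306-40615-3"))                 # False (single error)
print(is_valid_isbn10("0-306-40651-2"))                 # False (adjacent transposition)
```

Because 11 is prime and the weights $1, \dots, 10$ are non-zero and distinct mod 11, a single wrong digit or a swap of two different digits always changes the sum mod 11.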

The Repetition Code

Instead of an information bit $a$, just send $aaa$ (three times): this code can detect 2 errors and correct 1 error.

Quiz: How about sending $n$ times?

In general it is inefficient (Topological Economy Principle in AG - V. I. Arnold?!)
Basic Concepts

A code $C$ = a subset of $\mathbb{F}_q^n$

Codewords = elements of $C$

A linear code $C$ = an $\mathbb{F}_q$-linear subspace of $\mathbb{F}_q^n$

$w(x) := \#\{i : x_i \neq 0\}$ - the Hamming weight

The minimum distance:

$d_{\min}(C) := \min\{d(x, y) : x \neq y \in C\}$

$d_{\min}(C) = \min\{w(a) : a \neq 0 \in C\}$ for linear $C$

Fact. An $[n, k, d]_q$-code $C$ can correct $t := \lfloor (d-1)/2 \rfloor$ errors and detect $\lfloor d/2 \rfloor$ errors.
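For a small code the two descriptions of $d_{\min}$ can be compared by brute force. The generator matrix below is a hypothetical binary $[5, 2]$ example chosen for illustration, and the helper names are assumptions of this sketch; arithmetic is mod a prime $q$.

```python
from itertools import product

def codewords(G, q):
    """All codewords u*G for u in F_q^k (q prime, arithmetic mod q)."""
    k, n = len(G), len(G[0])
    return {
        tuple(sum(u[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for u in product(range(q), repeat=k)
    }

def weight(x):
    return sum(1 for xi in x if xi != 0)

def distance(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]   # hypothetical example code, not from the notes
C = codewords(G, 2)

d_pairs = min(distance(x, y) for x in C for y in C if x != y)
d_weight = min(weight(c) for c in C if any(c))
t = (d_weight - 1) // 2

print(d_pairs, d_weight, t)  # 3 3 1
```

Both minima agree because $d(x, y) = w(x - y)$ and $x - y$ is again a codeword of a linear code.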

The Code Domain

$M := |C|$ - the number of codewords

$k(C) := \log_q M$ - the log-cardinality of $C$

($k(C) = \dim_{\mathbb{F}_q} C$, if $C$ is a linear code)

$R(C) := k/n$ - the information rate

$\delta(C) := d_{\min}/n$ - the relative minimum distance

In the (reversed) plane $(R, \delta)$:

$V_q := \{(R(C), \delta(C))\} \subset [0, 1] \times [0, 1]$

(the points are counted up to equivalence of codes and with multiplicities)

$U_q := \{\text{limit points of } V_q\}$, $V_q \setminus U_q =: \{\text{isolated codes}\}$
Encoding+Transmission (error)+Decoding:

$\mathcal{M}$ - the message space

encoding = $E : \mathcal{M} \to C$ (usually an inclusion)

decoding = $D : \mathbb{F}_q^n \to C$ s.t. $D(a) = a$, $\forall a \in C$ (a retraction)

the decoding strategy: nearest neighbour (or standard) decoding

$D(b)$ = the codeword nearest to $b$ (which may not be unique)

A brute-force method is to compare $b$ with all codewords (infeasible when $k$ is large)
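The brute-force strategy can be sketched directly. The four-word code is a hypothetical example and the function names are illustrative; the decoder returns every nearest codeword so that ties stay visible.

```python
def hamming_distance(x, y):
    return sum(1 for a, b in zip(x, y) if a != b)

def nearest_codewords(received, code):
    """Return the list of all codewords at minimum distance from `received`."""
    best = min(hamming_distance(received, c) for c in code)
    return [c for c in code if hamming_distance(received, c) == best]

# Hypothetical binary code with 4 codewords of length 5.
code = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 0, 1, 1), (1, 1, 1, 0, 1)]

print(nearest_codewords((1, 0, 0, 1, 0), code))  # [(1, 0, 1, 1, 0)]
print(nearest_codewords((1, 1, 0, 0, 0), code))  # [(0, 0, 0, 0, 0), (1, 1, 1, 0, 1)]
```

The second received word is at distance 2 from two codewords, so the nearest neighbour is not unique there. The cost of the search is proportional to $q^k$, which is why this only works for toy codes.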

The Main Problem of Coding Theory.
Find good codes, i.e. with both R, δ large
(efficiency + high capability to correct errors)

The whole space $\mathbb{F}_q^n$ is an $[n, n, 1]_q$-code.

The ISBN-code is not linear, but it can be considered as a subcode of a linear code $\subset \mathbb{F}_{11}^{10}$.

The parity check code: an $[8, 7, 2]_2$-code going back to the time of punched paper tape

$$C := \Big\{x \in \mathbb{F}_2^8 : \sum_{i=1}^{8} x_i = 0\Big\}$$

The repetition code has parameters $[n, 1, n]_q$.

A refined ISBN-code: an $[11, 9, 3]_{11}$-code

$$C := \Big\{(x_0, \dots, x_{10}) \in \mathbb{F}_{11}^{11} : \sum_{i=0}^{10} x_i = 0,\ \sum_{i=0}^{10} i\,x_i = 0\Big\}$$

(a dual first order Reed-Muller code)
Linear Codes

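$\varphi : \mathbb{F}_q^k \to \mathbb{F}_q^n \longleftrightarrow G$ - generator $k \times n$-matrix

$\psi : \mathbb{F}_q^n \to \mathbb{F}_q^{n-k} \longleftrightarrow H$ - parity check $(n-k) \times n$-matrix

$C := \{u \cdot G : u \in \mathbb{F}_q^k\}$ (Im $\varphi$)

$C := \{x \in \mathbb{F}_q^n : H \cdot {}^t x = 0\}$ (Ker $\psi$)

$u_1 u_2 \dots u_k$ - a message; $x_1 x_2 \dots x_n$ - a codeword

$x_i = u_i$, $i = 1, \dots, k$; $x_{k+1}, \dots, x_n$ - check digits

Then $H = (A \mid I_{n-k})$, $G = (I_k \mid B)$ with $B = -{}^t A$.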
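The relation $B = -{}^t A$ can be checked on a small case. The sketch below assumes a prime $q$ and a hypothetical systematic generator matrix over $\mathbb{F}_3$ (chosen so that the minus sign matters); the function name is an illustration only.

```python
def parity_check_from_systematic(G, q):
    """Given G = (I_k | B) over F_q (q prime), return H = (A | I_{n-k}) with A = -B^T."""
    k, n = len(G), len(G[0])
    r = n - k
    B = [row[k:] for row in G]
    A = [[(-B[i][j]) % q for i in range(k)] for j in range(r)]
    identity = [[1 if j == l else 0 for l in range(r)] for j in range(r)]
    return [A[j] + identity[j] for j in range(r)]

q = 3
G = [[1, 0, 1, 2],
     [0, 1, 1, 1]]      # hypothetical [4, 2] code over F_3
H = parity_check_from_systematic(G, q)
print(H)  # [[2, 2, 1, 0], [1, 2, 0, 1]]

# Every row of G must be orthogonal to every row of H.
products = [sum(h * g for h, g in zip(h_row, g_row)) % q
            for h_row in H for g_row in G]
print(products)  # [0, 0, 0, 0]
```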



$C^{\perp} := \{x \in \mathbb{F}_q^n : x \cdot c := \sum_i x_i c_i = 0,\ \forall c \in C\}$ - dual code

Proposition. For a non-zero (linear) code $C$,

$$d_{\min}(C) = \max\{d : \text{any } d-1 \text{ column vectors of } H \text{ are linearly independent}\}$$
Decoding Linear Codes

A standard coset partition: $t = q^{n-k} - 1$

$$\mathbb{F}_q^n = C \cup (a_1 + C) \cup (a_2 + C) \cup \dots \cup (a_t + C)$$

Every $a_i$ is a coset leader, i.e. a minimum weight vector in its coset (chosen randomly if not unique).

The decoder's strategy is to find the coset containing the received message $y$, take its coset leader, say $\hat{e}$, and decode $\hat{x} = y - \hat{e}$.

The standard array (maximum likelihood decoding): the first row consists of the $s + 1$ ($= q^k$) codewords

$$\begin{array}{cccc}
0 & c^{(1)} & \cdots & c^{(s)} \\
a_1 & a_1 + c^{(1)} & \cdots & a_1 + c^{(s)} \\
\vdots & & & \vdots \\
a_t & a_t + c^{(1)} & \cdots & a_t + c^{(s)}
\end{array}$$

The syndrome: $S(y) := H \cdot {}^t y$ - a column vector of length $n - k$
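Instead of storing the whole standard array, it is enough to keep one coset leader per syndrome. The sketch below assumes a hypothetical binary $[5, 2, 3]$ code given by its parity check matrix; ties between leaders are broken by enumeration order rather than randomly.

```python
from itertools import product

H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]   # hypothetical parity check matrix, not from the notes

def syndrome(y):
    return tuple(sum(h * yi for h, yi in zip(row, y)) % 2 for row in H)

# Visit all vectors by increasing weight; the first vector seen with a given
# syndrome is a minimum weight vector of its coset, i.e. a coset leader.
leaders = {}
for e in sorted(product((0, 1), repeat=5), key=sum):
    leaders.setdefault(syndrome(e), e)

def decode(y):
    e_hat = leaders[syndrome(y)]
    return tuple((yi - ei) % 2 for yi, ei in zip(y, e_hat))

print(len(leaders))             # 8 cosets = 2^(n-k)
print(decode((1, 0, 0, 1, 0)))  # (1, 0, 1, 1, 0)
```

The table has $q^{n-k}$ entries, so this trades the $q^k$ comparisons of brute-force decoding for memory that grows with the number of check digits.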
Properties: 1) $S(y) = H \cdot {}^t e$, where $y = x + e$ with $x$ a codeword. In particular $S(y) = 0 \iff y$ is a codeword.

2) Two vectors are in the same coset $\iff$ they have the same syndrome.

3) For a binary code, if $e = (0 \cdots 0 \underset{a}{1} 0 \cdots \underset{b}{1} \cdots \underset{c}{1} \cdots)$, then

$$S(y) = \sum_i e_i h_i = h_a + h_b + h_c + \cdots$$

that is, $S(y)$ = the sum of the columns of $H$ where the errors occurred. So the MLD problem is to find a minimal subset $M$ of $\{h_i\}$ s.t. $S(y) \in \langle M \rangle$.

Theorem (Berlekamp-McEliece-van Tilborg, 1978). The MLD problem is NP-complete (equivalent to the MAX-CUT problem).

(cf. Madhu Sudan, Algorithmic Introduction to Coding Theory, 2001).
The Hamming Code

R. A. Fisher, The theory of confounding in factorial experiments in relation to the theory of groups, Ann. Eugenics, 11 (1942), 341–353.

R. A. Fisher, A system of confounding for factors with more than two alternatives, giving completely orthogonal cubes and higher powers, Ann. Eugenics, 12 (1945), 283–290.

M. J. E. Golay, Notes on digital coding, Proc. IRE, 37 (1949), 657.

R. Hamming, Error detecting and error correcting codes, Bell System Tech. J., 29 (1950), 147–160.

see also R. Hamming (1915-1998)

The Hamming $[n, n-r, 3]_2$-code $H_r$:

The parity check matrix $H_r$ consists of all $n$ ($= 2^r - 1$) non-zero binary column vectors of length $r$, namely the binary representations of the $n$ numbers $1, 2, \dots, 2^r - 1$, e.g. with $r = 3$ it is a $[7, 4, 3]_2$-code with

$$H_3 = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}$$

Let $e_i \in \mathbb{F}_2^n$ be the $i$-th standard basis vector; clearly $H_r \cdot {}^t e_i$ = the binary representation of $i$. So if $S(y) = h_j$, then the error occurred at the $j$-th position: just flip the bit there.
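The decoding rule above can be sketched as follows: build $H_r$ column by column from the binary representations of $1, \dots, n$, read the syndrome as a binary number $j$, and flip bit $j$. Function names and the sample word are illustrative assumptions.

```python
def hamming_parity_check(r):
    """Rows of H_r; column j (1-based) is the binary representation of j, top bit first."""
    n = 2**r - 1
    return [[(j >> (r - 1 - i)) & 1 for j in range(1, n + 1)] for i in range(r)]

def correct_single_error(y, r):
    """Return (corrected word, error position j), with j = 0 if the syndrome vanishes."""
    H = hamming_parity_check(r)
    s = [sum(h * yi for h, yi in zip(row, y)) % 2 for row in H]
    j = int("".join(map(str, s)), 2)   # syndrome read as a binary number
    corrected = list(y)
    if j:
        corrected[j - 1] ^= 1
    return corrected, j

print(hamming_parity_check(3))
# [[0, 0, 0, 1, 1, 1, 1], [0, 1, 1, 0, 0, 1, 1], [1, 0, 1, 0, 1, 0, 1]]

x = [1, 1, 1, 0, 0, 0, 0]          # a codeword: columns 1, 2, 3 of H_3 sum to zero
y = [1, 1, 1, 0, 1, 0, 0]          # x with an error in position 5
print(correct_single_error(y, 3))  # ([1, 1, 1, 0, 0, 0, 0], 5)
```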

The dual of $H_r$ is the so-called simplex $[n, r, (n+1)/2]_2$-code $S_r$ (all the non-zero codewords have the same weight $(n+1)/2$), e.g. $S_2$ (the tetrahedron code) is the parity check $[3, 2, 2]_2$-code and $H_2$ is the repetition $[3, 1, 3]_2$-code.

In general one has the Hamming $[n, n-r, 3]_q$-code with $n = (q^r - 1)/(q - 1)$.
Perfect Codes

$V_q(n, r) := \#B_r(x_0) := \#\{y \in \mathbb{F}_q^n : d(y, x_0) \le r\}$

The Hamming or sphere-packing bound:

$$M \cdot V_q(n, t) \le q^n.$$

A code $C$ is called perfect if it meets the Hamming bound, or equivalently

$$\mathbb{F}_q^n = \bigsqcup_{c \in C} B_t(c) \quad \text{(disjoint union).}$$

The trivial perfect codes are: $\mathbb{F}_q^n$ (the whole space), the single word code, a binary repetition code of odd length. Among non-trivial ones there are the Hamming codes, the Golay $[23, 12, 7]_2$-code $G_{23}$ and the Golay $[11, 6, 5]_3$-code $G_{11}$.
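Whether given parameters meet the sphere-packing bound is a finite computation, using $V_q(n, r) = \sum_{i=0}^{r} \binom{n}{i}(q-1)^i$. The sketch below only tests the numerical equality for linear parameters $[n, k, d]_q$; it says nothing about whether such a code exists. Function names are illustrative.

```python
from math import comb

def ball_volume(q, n, r):
    """Number of words of F_q^n within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def meets_hamming_bound(q, n, k, d):
    """True if q^k * V_q(n, t) == q^n with t = floor((d - 1) / 2)."""
    t = (d - 1) // 2
    return q**k * ball_volume(q, n, t) == q**n

print(meets_hamming_bound(2, 7, 4, 3))    # True  (Hamming code)
print(meets_hamming_bound(2, 23, 12, 7))  # True  (binary Golay code)
print(meets_hamming_bound(3, 11, 6, 5))   # True  (ternary Golay code)
print(meets_hamming_bound(2, 5, 2, 3))    # False
```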

Theorem (Tietäväinen, van Lint, 1973). A non-trivial perfect code over $\mathbb{F}_q$ must have the same parameters $[n, M, d]$ as one of the Hamming or Golay codes.
MDS Codes
(Maximum Distance Separable Codes)

The Singleton bound (1964): $d \le n - k + 1$ ($\iff R + \delta \le 1 + 1/n$) (cf. Y. Komamiya, 1953)

Codes that meet Singleton's bound are called MDS.

Examples: the trivial code ($[n, n, 1]_q$), the parity check code ($[n, n-1, 2]_q$), the repetition code ($[n, 1, n]_q$) are MDS (trivial series), the RS code, ... Actually MDS codes are isolated.

Theorem. In an MDS code $k \le q - 1$ if $d > 2$, and $d \le q$ if $k \ge 2$. In particular, if $q = 2$ the only MDS codes are those of the trivial series.

The Main Conjecture on MDS Codes: non-trivial MDS codes are short.
The Reed-Solomon code

$k \le n \le q$; $\alpha_1, \dots, \alpha_n$ - distinct elements of $\mathbb{F}_q$

$L_{k-1} := \{f \in \mathbb{F}_q[x] : \deg(f) < k\}$.
Consider the evaluation map
$$ev : L_{k-1} \to \mathbb{F}_q^n, \quad f \mapsto (f(\alpha_1), \dots, f(\alpha_n))$$

$RS(q, n, k) := ev(L_{k-1})$
is an MDS code with parameters $[n, k, n-k+1]_q$
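The evaluation map is easy to try over a prime field, where arithmetic is just reduction mod $p$. The sketch below assumes $q = p = 7$, $n = 7$, $k = 3$ (parameters picked for illustration) and confirms the minimum distance by brute force; `rs_encode` is a hypothetical helper name.

```python
from itertools import product

def rs_encode(message, alphas, p):
    """Evaluate f(x) = sum message[i] * x^i at every alpha (arithmetic mod prime p)."""
    return tuple(
        sum(c * pow(a, i, p) for i, c in enumerate(message)) % p
        for a in alphas
    )

p, n, k = 7, 7, 3
alphas = list(range(n))          # n distinct elements of F_7

# Minimum weight over all non-zero messages (the code is linear).
min_weight = min(
    sum(1 for s in rs_encode(m, alphas, p) if s != 0)
    for m in product(range(p), repeat=k)
    if any(m)
)

print(rs_encode((0, 1, 0), alphas, p))  # (0, 1, 2, 3, 4, 5, 6), i.e. f(x) = x
print(min_weight, n - k + 1)            # 5 5
```

The bound comes from counting roots: a non-zero polynomial of degree $< k$ vanishes at no more than $k - 1$ of the $\alpha_i$, so every non-zero codeword has weight at least $n - k + 1$.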

Applications of RS codes: used in CD Digital Audio and related formats, e.g. CD, CD-ROM, DVD, DTV, ... (cf. the attached Web page).

Goppa's generalization. Let $X$ be an algebraic variety$/\mathbb{F}_q$, and let $\mathcal{P} = \{P_1, P_2, \dots, P_n\}$ consist of distinct elements of $X(\mathbb{F}_q)$. Let $L$ be an $\mathbb{F}_q$-vector space of rational functions $\subset \mathbb{F}_q(X)$. The map $ev_{\mathcal{P}} : L \to \mathbb{F}_q^n$, evaluating at the points of $\mathcal{P}$ as above, gives rise to a promising code belonging to the class of Algebraic Geometry codes, or Goppa Algebraic-Geometric codes.
Cyclic Codes

BCH Codes with $t \ge 2$: R. C. Bose-D. K. Ray-Chaudhuri (1960), A. Hocquenghem (1959)

The Hamming $[n, n-r, 3]_2$-code needed $r$ parity checks to correct one error ($n = 2^r - 1$). In the abbreviated form below each entry stands for the corresponding binary $r$-tuple:

$$H = \begin{bmatrix} 1 & 2 & \cdots & n \\ f(1) & f(2) & \cdots & f(n) \end{bmatrix}$$

With $f(i) := i^3$ it is an $[n, n-2r, \ge 5]_2$-code (the arithmetic operations in $\mathbb{F}_{2^r}$ play an important role here).

$(c_0, c_1, \dots, c_{n-1}) \leftrightarrow c(x) := c_0 + c_1 x + \dots + c_{n-1} x^{n-1}$

Definition. A (linear) code $C$ is cyclic if it is invariant under any cyclic shift, or equivalently it is an ideal of $R_n := \mathbb{F}_q[x]/(x^n - 1)$.
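The ideal description can be checked on a small case. The sketch assumes $q = 2$, $n = 7$ and the generator polynomial $g(x) = x^3 + x + 1$, a divisor of $x^7 - 1$ chosen here for illustration; polynomials are stored as bit masks (bit $i$ = coefficient of $x^i$).

```python
def poly_mul_gf2(a, b):
    """Product of two polynomials over F_2 (no reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def cyclic_shift(c, n):
    """Multiply c(x) by x modulo x^n - 1, i.e. shift the word cyclically by one place."""
    c <<= 1
    if (c >> n) & 1:
        c = (c ^ (1 << n)) | 1   # x^n wraps around to the constant term
    return c

n, g = 7, 0b1011                  # g(x) = x^3 + x + 1
# All multiples m(x) * g(x) with deg m < 4 have degree < 7, so no reduction is needed.
code = {poly_mul_gf2(m, g) for m in range(2**4)}

closed = all(cyclic_shift(c, n) in code for c in code)
min_weight = min(bin(c).count("1") for c in code if c)

print(len(code), closed, min_weight)  # 16 True 3
```

With this $g$ the code is a cyclic form of the $[7, 4, 3]_2$ Hamming code.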

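Examples: The ideal $\langle x - 1 \rangle = \{f : f(1) = 0\} \leftrightarrow$ the parity check code. In the other extreme case, the ideal $\langle g(x) \rangle$ with $g(x) := 1 + x + \dots + x^{n-1}$, i.e. $\{\text{scalar multiples of } g\}$, $\leftrightarrow$ the repetition code.

$(n, q) = 1$, $m = \min\{a : n \mid q^a - 1\}$, $\alpha \in \mathbb{F}_{q^m}^*$ has order $n$ (primitive $n$-th root of unity)

BCH Codes of designed distance $\delta$: $b \ge 0$, $\delta \ge 1$

$$C := \{c : c(\alpha^b) = c(\alpha^{b+1}) = \dots = c(\alpha^{b+\delta-2}) = 0\}$$

Theorem. $d_{\min}(C) \ge \delta$.

$b = 1$, $\delta = 3$: the binary Hamming code $H_r$

$\mathbb{F}_{2^r}^* = \langle \alpha \rangle$, $n = 2^r - 1$, $H = [1, \alpha, \alpha^2, \dots, \alpha^{n-1}]$
$c(\alpha) = 0$ (so $c(\alpha^2) = 0$) $\iff H \cdot {}^t c = 0$

$b = 1$, $\delta = 5$: the binary double-error-correcting BCH code $c(\alpha) = c(\alpha^3) = 0 \leftrightarrow M^{(1)}(x) \cdot M^{(3)}(x)$

The RS code with $n \mid q - 1$ $\longleftrightarrow \prod_{j=1}^{n-k} (x - \alpha^j)$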
Asymptotic Bounds

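$A_q(n, d) := \max\{M : \exists \text{ an } [n, M, d]_q\text{-code}\}$

The Asymptotic Problem: Let $\{d_n\}$ be a sequence of natural numbers s.t. $d_n/n \to \delta \in [0, 1]$. Investigate the behaviour of $A_q(n, d_n)$ as $n \to \infty$.

$\{C_i\}$ - a family of $[n_i, k_i, d_i]$-codes

$R(\{C_i\}) := \lim_{i \to \infty} R(C_i)$, $\delta(\{C_i\}) := \lim_{i \to \infty} \delta(C_i)$

The family $\{C_i\}$ is called asymptotically good if $R(\{C_i\}) > 0$ and $\delta(\{C_i\}) > 0$.

$U_q = \{(R, \delta) : 0 \le R \le \alpha_q(\delta)\}$, where

$$\alpha_q(\delta) := \limsup_{n \to \infty} \frac{\log_q A_q(n, [\delta n])}{n}$$

($\alpha_q(\delta) = \sup\{R : \exists \{C_i\} \text{ s.t. } R_i \to R,\ \delta_i \to \delta\}$)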


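Theorem (Manin, 1981). $\alpha_q(\delta)$ is a continuous function, decreasing on $[0, \theta]$, satisfying $\alpha_q(0) = 1$, $\alpha_q(\delta) = 0$ on $[\theta, 1]$, and on $[0, \theta]$:

• $\alpha_q(\delta) \le 1 - \delta/\theta$ (Plotkin bound),

• $\alpha_q(\delta) \le 1 - H_q(\delta/2)$ (Hamming bound),

• $\alpha_q(\delta) \le 1 - H_q\left(\theta - \sqrt{\theta(\theta - \delta)}\right)$ (Bassalygo-Elias bound),

• $\alpha_q(\delta) \ge 1 - H_q(\delta)$ (Gilbert-Varshamov bound).

Here $\theta := 1 - 1/q$ and $H_q(\delta)$ denotes the ($q$-ary) entropy function

$$H_q(\delta) := \begin{cases} \delta \log_q(q-1) - \delta \log_q \delta - (1-\delta)\log_q(1-\delta), & 0 < \delta \le \theta \\ 0, & \delta = 0 \end{cases}$$

Remark. In fact $\alpha_q(\delta)$ is Lipschitz, but it is an open question whether it is differentiable.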
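The four bounds are easy to evaluate numerically. The sketch below is a plain transcription of the formulas, with $\theta = 1 - 1/q$; the function names and the sample point $q = 2$, $\delta = 0.1$ are choices made here for illustration.

```python
from math import log, sqrt

def entropy(delta, q):
    """q-ary entropy H_q(delta) for 0 <= delta <= 1 - 1/q."""
    if delta == 0:
        return 0.0
    return (delta * log(q - 1, q) - delta * log(delta, q)
            - (1 - delta) * log(1 - delta, q))

def bounds(delta, q):
    """Upper bounds (Plotkin, Hamming, Bassalygo-Elias) and the GV lower bound on alpha_q."""
    theta = 1 - 1 / q
    return {
        "plotkin": 1 - delta / theta,
        "hamming": 1 - entropy(delta / 2, q),
        "elias": 1 - entropy(theta - sqrt(theta * (theta - delta)), q),
        "gv": 1 - entropy(delta, q),
    }

b = bounds(0.1, 2)
for name, value in b.items():
    print(f"{name:8s} {value:.4f}")
```

At this sample point the Gilbert-Varshamov value lies below all three upper bounds, and the Bassalygo-Elias bound is the tightest of the three.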
Algebraic-Geometric Codes

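V. D. Goppa, Codes on Algebraic Curves, Doklady USSR, 259 (1981), 1289-1290.

$X$ - an algebraic curve$/\mathbb{F}_q$, $g := g(X)$

$\mathcal{P} = \{P_1, P_2, \dots, P_n\}$ - distinct points, $\subset X(\mathbb{F}_q)$

$D = \sum n_P P \in \mathrm{Div}_{\mathbb{F}_q}(X)$, $\mathrm{Supp}\, D \cap \mathcal{P} = \emptyset$

$L := L(D) := \{f \in \mathbb{F}_q(X)^* : \mathrm{div}(f) + D \ge 0\} \cup \{0\}$

'$\ge 0$' = 'effective'; '$f \in L$' = '$\forall P : \mathrm{ord}_P(f) \ge -n_P$'

$$ev_{\mathcal{P}} : L \to \mathbb{F}_q^n, \quad f \mapsto (f(P_1), \dots, f(P_n))$$
Goppa's code $C(D, \mathcal{P}) := ev_{\mathcal{P}}(L)$, $k := \dim_{\mathbb{F}_q} C(D, \mathcal{P})$

Theorem. If $2g - 2 < \deg(D) < n$, then

(i) $k = \deg(D) - g + 1$ (Riemann-Roch),

(ii) $d_{\min} \ge n - \deg(D)$ (Bézout).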


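Idea: $R + \delta \ge 1 - (g - 1)/n$ ($g = 0 \Rightarrow$ the RS code), so, fixing the ratio $\deg(D)/n$, one wants $(g - 1)/n$ to be as small as possible.

Problem: Find $X/\mathbb{F}_q$ with small $g$ and large $\#X(\mathbb{F}_q)$

$\#X(\mathbb{F}_q) \le q + 1 + g[2\sqrt{q}]$ (Hasse-Weil-Serre bound)

Fermat (or Hermitian) curves are maximal:

$$x^{q+1} + y^{q+1} + z^{q+1} = 0 \ /\ \mathbb{F}_{q^2}.$$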
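Maximality can be verified by hand in the smallest case $q = 2$, i.e. the curve $x^3 + y^3 + z^3 = 0$ over $\mathbb{F}_4$. The sketch below counts projective points by brute force. It assumes the representation $\mathbb{F}_4 = \mathbb{F}_2[w]/(w^2 + w + 1)$ and uses the standard genus formula $g = q(q-1)/2$ for this curve, which is not stated in the notes.

```python
from itertools import product

def gf4_mul(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); elements are 0, 1, 2 (= w), 3 (= w + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100:
            a ^= 0b111            # reduce modulo w^2 + w + 1
    return result

def cube(a):
    return gf4_mul(a, gf4_mul(a, a))

# Non-zero solutions of x^3 + y^3 + z^3 = 0 (addition in F_4 is XOR).
solutions = [
    (x, y, z)
    for x, y, z in product(range(4), repeat=3)
    if (x, y, z) != (0, 0, 0) and cube(x) ^ cube(y) ^ cube(z) == 0
]
# Each projective point has |F_4^*| = 3 non-zero representatives.
num_points = len(solutions) // 3

q = 2
g = q * (q - 1) // 2                  # genus of the curve (assumed standard fact)
hasse_weil = q**2 + 1 + g * (2 * q)   # Q + 1 + g * [2 * sqrt(Q)] with Q = q^2

print(num_points, hasse_weil)  # 9 9
```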

$$S_q(X) := \frac{g - 1}{\#X(\mathbb{F}_q) - 1}, \qquad S_q := \liminf_{g \to \infty} S_q(X) \ge \frac{1}{[2\sqrt{q}]}$$

Theorem (Tsfasman, 1982). The interval $R + \delta = 1 - S_q$, $0 \le R, \delta \le 1$, lies entirely in the code domain $U_q$. If $q \ge 49$, it intersects the Gilbert-Varshamov curve at two points, and the part of the interval between them lies above the curve.

Remark. Plotkin's bound gives a better estimate for $S_2$, $S_3$.
$$A_q := \limsup_{g \to \infty} \frac{\#X(\mathbb{F}_q)}{g(X)} = S_q^{-1} \quad \text{(Ihara's notation)}$$

Theorem (Ihara, 1981). $A_{q^2} \ge q - 1$.

Theorem (Drinfel'd-Vladuts, 1983). $A_q \le \sqrt{q} - 1$. In particular, if $q$ is a square, then $A_q = \sqrt{q} - 1$.

Theorem (Serre, 1983). There is an absolute constant $c > 0$ s.t. $A_q > c \log q$. As for $q = 2$, we have $A_2 \ge 2/9$.

Observation (Ihara & Tsfasman-Vladuts-Zink). $\exists$ a sequence $X_i/\mathbb{F}_{q^2}$ with $\#X_i(\mathbb{F}_{q^2})/g(X_i) \to q - 1$. These are (Drinfel'd) modular curves, having plenty of supersingular points.

Theorem (Tsfasman-Vladuts-Zink, 1982). $\exists$ a sequence of Goppa codes over $\mathbb{F}_{q^2}$ with limit point on or above the Tsfasman interval (better than the GV bound when $q \ge 49$).
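The threshold 49 can be reproduced numerically. In the sketch below $Q$ denotes the size of the field (a square), so the line is $R + \delta = 1 - 1/(\sqrt{Q} - 1)$ and the GV curve is $R = 1 - H_Q(\delta)$. The difference between them is largest at $\delta = (Q-1)/(2Q-1)$, obtained by setting the derivative of $H_Q(\delta) - \delta$ to zero. The names `entropy` and `max_gap` are illustrative.

```python
from math import log, sqrt

def entropy(delta, Q):
    """Q-ary entropy function for 0 < delta < 1."""
    return (delta * log(Q - 1, Q) - delta * log(delta, Q)
            - (1 - delta) * log(1 - delta, Q))

def max_gap(Q):
    """Largest value of (line R + delta = 1 - 1/(sqrt(Q) - 1)) minus (GV curve), Q a square."""
    delta = (Q - 1) / (2 * Q - 1)          # where H_Q(delta) - delta is maximal
    line = 1 - delta - 1 / (sqrt(Q) - 1)
    gv = 1 - entropy(delta, Q)
    return line - gv

for Q in (16, 25, 49, 64):
    print(Q, max_gap(Q) > 0)
# 16 False
# 25 False
# 49 True
# 64 True
```

A positive gap means that part of the line lies strictly above the GV curve for that field size.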

Conjecture. The GV bound is best possible for $q = 2$.
Some Additional References

Y. Ihara, Some remarks on the number of rational points of algebraic curves over finite fields, J. Fac. Sci. Univ. Tokyo, 28 (1981), Sec. IA, 721–724.

F. J. MacWilliams, N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland Publ., Amsterdam, 1977.

Yu. I. Manin, What is the maximal number of points on a curve over $\mathbb{F}_2$, J. Fac. Sci. Univ. Tokyo, 28 (1981), Sec. IA, 715–720.

Yu. I. Manin, S. G. Vladuts, Linear Codes and Modular Curves, Contemporary Problems of Mathematics, 25 (1984), 209–257 (Russian).

M. A. Tsfasman, S. G. Vladuts, Algebraic-Geometric Codes, Kluwer Acad. Publ., Dordrecht-Boston-London, 1991.